The Human Testing Element is Still Important Despite Advances in AI

Written by andrianbudanstov | Published 2025/04/24
Tech Story Tags: ai | vibe | testing | software-testing | software-development | artificial-intelligence-trends | ai-for-qa-testing | ai-for-documentation

TL;DR: This article examines why human involvement remains essential in software testing despite AI advancements. While AI and Large Language Models (LLMs) are increasingly used in coding through "vibe coding," humans are still needed for oversight, especially in usability and accessibility testing. LLMs are trained to understand imperfect input and fill gaps, making them poor usability testers since they can't identify what's unintuitive for human users. In accessibility testing, LLMs lack the ability to recognize issues like poor color contrast or keyboard navigation problems. Additionally, humans must manage testing risk assessment and accountability, as AI cannot make justifiable trade-offs. The article concludes that while AI can assist with test data preparation, documentation, and analytics, human expertise remains crucial because quality software must ultimately serve human users.

As AI becomes more advanced and capable, it’s being applied to a growing number of use cases. Software developers are increasingly using it as a tool, and the recent emergence of ‘vibe coding’ has certainly been grabbing a lot of media attention.

Just a few months ago, Andrej Karpathy coined this term to describe the process of asking LLMs such as ChatGPT and Claude to write code from simple, natural-language instructions. While this approach is capable of generating working code, it can’t produce advanced software programs unassisted – not yet, anyway. But LLMs are already being used as tools by developers as they build software intended for real-world use cases.

The key here is that even though the LLMs are writing the code, the people instructing them to do so are still capable of understanding what's been written. If they couldn't, then debugging, maintaining and securing the codebase effectively would be impossible. AI is an assistant used to make suggestions, rather than to provide answers that can be trusted 100%.

As in other parts of software development, human oversight still plays an important role. It's especially vital for humans to be involved in testing the usability and accessibility of software – but it’s also critical that humans remain in charge of all software testing processes.

LLMs can write code – but are poor usability testers

LLMs have been trained on vast reams of data, images and text from a multitude of sources around the internet, and are designed to behave in an expected manner when carrying out tasks such as writing an email. They’ve been trained to code by ingesting simple programs written by human software engineers.

LLMs have also been trained to understand imperfect input. They’re designed to try to understand the user at all times, so they will attempt to fill in the gaps when the user skips words or doesn’t express themselves clearly. And this ‘understand at all costs’ approach is in conflict with the idea of usability testing.

Usability testing requires developers to switch to a specific frame of reference, asking whether the behaviour of the software is clear, or to work with a focus group to establish that the product is intuitively usable. LLMs are not well-positioned to test for flaws in usability because they are trained to understand everything – for them, there’s no difference between a bad interface and a good one.

But for human users, usability is judged in terms of whether software works in line with their preconceptions, which are often based on previous experience. You can’t ask human users to re-learn their approach to using software – the program has to meet their expectations. Even if software comes with documentation, you can’t rely on users to read the instructions. That’s why a significant part of usability testing is making sure that software matches users’ implicit assumptions about how things should work.

Accessibility testing requires human oversight

Poor accessibility is something that impacts many different groups of people — not just those with permanent disabilities. It also includes temporary impairments (like a broken arm) and situational ones (like bright sunlight or background noise). These concepts, popularized by Microsoft’s Inclusive Design work in the mid-2010s, show that accessibility isn’t just about edge cases — it’s about building software that works for everyone, in any context.

Currently, LLMs are unlikely to have been trained to recognise poor colour contrast that could affect users with visual impairments – because they are trained to understand everything, they will see every pixel as perfect. Nor will they pick up issues with bad keyboard navigation, text with poor readability and a lack of zoom functionality. However, it is entirely possible to train models to check for accessibility issues and offer advice to developers. But LLMs could only ever provide assistance in these scenarios – human oversight would be necessary to understand what accessibility aspects we are looking to address.
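To make this concrete, here is a minimal sketch of the kind of mechanical check that conventional tooling already automates well: the WCAG 2.x contrast-ratio calculation. The formula follows the published WCAG definition; the function names and example colours are illustrative only.

```python
def _linearise(channel: int) -> float:
    """Linearise one sRGB channel (0-255) as defined by WCAG 2.x."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB colour."""
    r, g, b = (_linearise(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio between a foreground and a background colour."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Light grey text on a white background: fails the 4.5:1 minimum for body text.
ratio = contrast_ratio((150, 150, 150), (255, 255, 255))
print(f"Contrast ratio {ratio:.2f}:1 -", "PASS" if ratio >= 4.5 else "FAIL")
```

Checks like this are easy to automate; the harder, human part is deciding which checks matter for which users and in which contexts.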

AI cannot manage risk or be held accountable

With large, complex software applications, systematic testing is necessary prior to release. However, exhaustive testing isn't possible – there are just too many potential use cases and inputs that the application could encounter. That means decisions have to be made about what to test – and what tests don't need to be carried out.
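A rough, purely illustrative calculation shows why. Suppose an input form has ten independent fields and each field has only five values worth distinguishing – already a very generous simplification of real software:

```python
# Hypothetical numbers: 10 independent input fields, 5 meaningful values each.
fields = 10
values_per_field = 5

total_combinations = values_per_field ** fields
print(f"{total_combinations:,} input combinations")  # 9,765,625

# Even at 1,000 automated checks per second, covering just these inputs
# exhaustively would take hours - and real applications have far more state
# than ten neatly enumerable fields.
hours = total_combinations / 1_000 / 3600
print(f"~{hours:.1f} hours at 1,000 checks per second")
```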

This is a vital part of QA – and it cannot be left to AI. QA teams must draw on their experience, knowledge, and judgement to make informed decisions. They must plan, prioritize and make risk assessments. They will understand that there are trade-offs that need to be made – but they can make accountable decisions and justify them to relevant stakeholders. If AI makes these decisions, then there's no accountability if something goes wrong.

Where AI can streamline software testing and test management

There are many areas of the software testing process where AI can be of assistance. There’s a lot of heavy lifting involved in software testing such as preparing test data, suggesting the specific scenarios that need to be tested, and writing automated tests. AI agents are already capable of performing some basic tests today, and it’s likely they’ll take on even more complex testing tasks in the near future.
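As a hedged sketch of what that heavy lifting can look like, the snippet below shows the sort of boundary-value test an AI assistant might draft for human review. The validate_discount_code function is hypothetical – a stand-in for whatever the real application exposes – and the cases are exactly the kind of list a tester would be expected to trim or extend.

```python
import pytest

# Hypothetical function under test; in practice an AI assistant would
# target the application's real API.
def validate_discount_code(code: str) -> bool:
    return code.isalnum() and 6 <= len(code) <= 12

# Boundary-value cases an AI assistant might draft for a human to review.
@pytest.mark.parametrize("code, expected", [
    ("SAVE10", True),           # shortest valid length (6)
    ("ABCDEF123456", True),     # longest valid length (12)
    ("SHORT", False),           # one character too short
    ("ABCDEF1234567", False),   # one character too long
    ("SAVE 10", False),         # whitespace is not alphanumeric
    ("", False),                # empty input
])
def test_validate_discount_code(code, expected):
    assert validate_discount_code(code) is expected
```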

And AI also has utility for software test management. It can help with writing test documentation and suggest ideas for key areas to test. It can research other, similar software projects to surface relevant data and insights that could assist the testing. And it can perform data analysis of the test results, quickly creating reports for relevant stakeholders so they don't have to wait for a person to compile them.
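A minimal sketch of that kind of roll-up, assuming test outcomes have been exported to a CSV file (the file name and column names here are assumptions, not any real tool's format):

```python
import csv
from collections import Counter, defaultdict

# Assumed export format: one row per executed test with "suite" and "status" columns.
per_suite: dict[str, Counter] = defaultdict(Counter)

with open("test_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        per_suite[row["suite"]][row["status"]] += 1

# Summarise pass rates per suite so stakeholders can spot hotspots at a glance.
for suite, counts in sorted(per_suite.items()):
    total = sum(counts.values())
    passed = counts.get("passed", 0)
    print(f"{suite:<24} {passed}/{total} passed ({passed / total:.0%})")
```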

Humans are the end users – so human expertise is needed

Quality in software is non-negotiable. We’ve all heard stories about disastrous software launches where products have been shipped with massive issues that have made them completely unusable for a certain group of people. For now, usability and accessibility are human domains so it is natural to use humans for testing these aspects.

In the era of AI, it's important to maintain human oversight. If AI is used in software creation, testing or test management, it is vital to check the results. QA teams need to be the ones that take accountability for software test management processes. Only they have the domain knowledge and understanding of business priorities to determine test coverage and acceptable levels of risk. While AI can support these processes with data and recommendations, it cannot currently make business-critical decisions on its own.


Written by andrianbudanstov | Hypersequent CEO and creator of QA Sphere
Published by HackerNoon on 2025/04/24