
Can you trust AI symptom checkers?

Symptom checking apps gave conflicting results and advice when we presented them with the same set of symptoms

Online and app-based symptom checkers powered by artificial intelligence (AI) algorithms aim to offer more accurate health advice than self-diagnosis through search engines, but we found you get different advice depending on which one you use.

Plenty of people have turned to Dr Google at some point with a health query, but a new wave of AI-powered symptom checkers aims to make that process more clinically accurate, helping to direct you towards the most appropriate next steps.

A symptom checker tool – a version of the NHS 111 triage service – features as part of the new NHS app. It’s also part of the service offered by Babylon – the private-health tech company which runs the online NHS GP practice, GP at Hand.

When we put these symptom checkers to the test, we found that results vary depending on which one you use, with the potential for incorrect or inadequate advice being given to patients (though most of the apps come with the caveat that they should not serve as a replacement for a medical diagnosis).



How do AI symptom checkers work?

Typically, you enter your symptoms and the app asks you follow-up questions and reacts to your answers.

At the end of questioning, there will usually be a list of possible conditions – and sometimes onward treatment advice – according to the perceived urgency of your issue.


Don’t rely on the diagnosis

Babylon’s terms and conditions state that its symptom-checking services ‘do not constitute medical advice, diagnosis or treatment’. The second symptom checker we tested, Ada, has a similar caveat.

This seems to contradict the basis for offering the apps. Babylon talks about its symptom checker in a way that you’d be forgiven for thinking meant diagnosis was precisely the point.

But Babylon says that it can use the information you enter to provide triage advice, and that information on potential diagnoses simply provides context for why it advises a particular course of action.

What happened when we used three symptom checkers

We tried Babylon’s symptom checker, and another popular app, Ada, as well as the NHS 111 online triage service, using two health scenarios designed by a GP to test the services.

  • We got different responses for the same medical queries from each symptom checker
  • The NHS checker tended to err on the side of caution
  • Ada and Babylon leaned the other way, sometimes missing potentially significant red flags

The healthcare scenarios we used to test the apps were:

  • Someone with insomnia and underlying mental health issues (which should have been detected based on the information they supplied)
  • Someone with flu symptoms that could potentially be meningitis

Some apps missed potential ‘red flags’ with insomnia patient

The importance of how you describe your symptoms, and the limitations of a check-box approach, became clear in our snapshot test.

Babylon only had an option for ‘restless sleep’ to describe insomnia symptoms.

Selecting this option directed patients to talk to their GP about mental health issues; otherwise, the app only provided basic fact sheets about sleep. Our experts felt these were inadequate and could leave patients undiagnosed if they relied on them for answers (though Babylon is clear that the app shouldn't be used this way).

Ada was considered thorough by our experts for the insomnia scenario, but its question-based format didn't allow important contextual information to come out: it discounted potential red flags when our respondent said they were 'unsure' whether they had suicidal thoughts.

Possible meningitis symptoms missed

When we described our symptoms to the Babylon app as flu-like, it simply assumed we had flu, rather than checking further.

When we put in additional symptoms, it did suggest meningitis, but our experts pointed out that the maximum six-hour timeframe it gave to wait before seeking medical advice was a long time for a serious illness that requires prompt treatment.

Babylon maintains that the six-hour timeframe was appropriate in this case.

The Ada app failed to properly rule out meningitis. It didn’t ask about key symptoms that would help to do so, and didn’t suggest it as a possibility.

Ada told us meningitis would have been flagged had we reported a severe instead of moderate headache, and a fever instead of ‘not sure’, but when we tried this, it still didn’t suggest meningitis.

NHS 111 plays it safe

The NHS 111 symptom checker works slightly differently. It’s based on the established NHS Pathways triage program, which is known to be extremely cautious.

As such, it played things safer than the other apps, suggesting our insomnia patient seek emergency advice within the hour, and our meningitis patient seek help within two hours.

While this caution is appropriate for meningitis, our experts felt the checker didn't do enough to establish how urgent the insomnia patient's mental health needs actually were – and that sending such cases to emergency services by default could overwhelm them with unnecessary callouts.


Experts’ views: are AI symptom checkers safe?

Advocates believe they can reduce the strain on the NHS by effectively directing patients to the most appropriate source of help, but not everyone is convinced.

Elizabeth Murray, Professor of eHealth and Primary Care at University College London, thinks it is unlikely that these symptom checkers will be able to make a safe diagnosis, because the apps haven't been developed on the basis of robust evidence, such as peer review or clinical trials.

These processes are at odds with how the tech industry likes to work: quickly, and with an emphasis on marketing.

Dr Whitaker, GP and New Statesman columnist, puts it more bluntly. He thinks these algorithms are ‘basically disasters’, and argues strongly for the importance of face-to-face interaction at the initial stage of a patient’s diagnosis.

It’s possible that this technology simply needs more time. Alastair McLellan, editor of the Health Service Journal, sees potential – and says that the algorithms are developing quickly.

He argues that there’s always a risk in healthcare of missing things, and that AI could eventually make better judgements than most GPs. For example, AI could collate and interpret patient notes and peer-reviewed studies much faster and with better recall than humans.

But we aren’t there yet, and 72% of Which? members told us they’d be concerned about an AI robot doing triage in place of a human GP.

According to GP Dr Margaret McCartney, the danger is that the apps subvert best practice for a normal consultation. ‘Usually it is considered good practice to ensure that the patient can talk freely for the first couple of minutes, to get the story of what has happened and why they are there,’ she points out, ‘but there’s no ability for the app to dissect free text. It’s like playing “20 questions” at a party.’


The bottom line: should you use symptom checkers?

A symptom checker could be a useful tool to help establish possible diagnoses and what to do next. But bear in mind that they aren't perfect, and that some may over- or underestimate the seriousness of your condition.

Take the advice you get with a grain of salt, and if in any doubt, consult your GP.
