Press release

ChatGPT and Gemini among AI tools giving risky consumer advice, Which? finds

About half of people are now using AI to search online, but new Which? research into AI tools finds the likes of ChatGPT, Gemini and Meta AI giving inaccurate, unclear and risky advice that could prove costly if followed.
9 min read
Which? Editorial team

Under controlled lab conditions, Which? tested six AI tools - ChatGPT, Google Gemini, Gemini AI Overview (AIO), Microsoft’s Copilot, Meta AI and Perplexity - to establish how well they could answer common consumer questions spanning topics as diverse as personal finance, legal queries, health and diet concerns, consumer rights and travel issues. 

Altogether, researchers put 40 questions to each of the tools, and answers were then assessed by Which? experts to establish accuracy, relevance, clarity, usefulness and ethical responsibility. These ratings were then combined to create an overall score out of 100 for each AI tool. Separately, Which? also surveyed over 4,000 UK adults about their use of AI.

Meta AI received the worst score in Which?’s tests, achieving just 55 per cent overall. The most used tool according to Which?’s survey, ChatGPT, came second to bottom with an overall score of 64 per cent, while Copilot and Gemini took middling spots with scores of 68 and 69 per cent respectively. Gemini’s AIO (which provides AI summaries at the top of Google search) slightly edged out its standard counterpart with a score of 70 per cent, while lesser-known tool Perplexity topped the table with 71 per cent. It received the highest scores for accuracy, relevance, clarity and usefulness of any of the tools on test.

While AI does have strong uses in terms of being able to read the web and create digestible summaries, Which?’s findings show there is still substantial room for improvement when it comes to answering consumer queries.

Despite its deficiencies, trust in AI’s output is already remarkably high. About half of respondents to Which?’s survey (51%) said they use AI to search the web for information, equivalent to more than 25 million people nationally. Of those, nearly half (47%) said they trusted the information they received to a ‘great’ or ‘reasonable’ extent. This rose to nearly two thirds (65%) among frequent users.

A third of respondents (34%) to Which?’s survey also believe AI draws on authoritative sources for its information - but Which? found this may not always be the case.

In some examples it was unclear which sources had been used, and in others they were arguably unreliable - for instance, old forum posts. When researchers asked when the best time to book flights is, Gemini’s AIO used a three-year-old Reddit thread as a source. Similarly, when asked ‘Is vaping actually worse than smoking cigarettes?’, ChatGPT also pointed to Reddit. The latter example is particularly alarming given how many people always or often rely on AI for medical advice - a fifth (19%) according to Which?’s survey.

In an example of advice running contrary to NHS recommendations, Meta advised against using vaping to quit smoking.

Even where a reputable source was listed, it wasn’t always read correctly - for example, when answering another travel query, Copilot listed Which? as a source, and then ignored the advice given, leaning instead on other research.

Answers varied significantly in terms of accuracy. As many as one in six (17%) people surveyed said they rely on AI for financial advice, yet responses to many money queries were worrying. For example, when Which? placed a deliberate mistake in a question it posed about the ISA allowance, asking ‘How should I invest my £25k annual ISA allowance?’, both ChatGPT and Copilot failed to notice that the allowance is in fact only £20,000. Instead of correcting the error, both gave advice which could risk someone oversubscribing to ISAs in breach of HMRC rules.

In another example, researchers asked the AI tools to check which tax code they should be on, and how to claim a tax refund from HMRC. Worryingly, ChatGPT and Perplexity both presented links to premium tax-refund companies alongside the free Government service. These companies are notorious for charging high fees and adding on spurious charges, and Which? has seen reports of some sites submitting fraudulent or deliberately incomplete claims. This issue is not unique to AI, however - previous Which? research* has highlighted examples of ads for similar firms, including those offering premium US visa services, appearing around traditional search engine results.

When asked about your rights if a flight is cancelled or delayed, Copilot misleadingly said that you’re always entitled to a full refund, which isn’t the case. When Meta was consulted on flight-delay compensation options, it got both the timings and the amounts you can claim wrong. In other cases the advice given seemed overly ‘airline-friendly’, suggesting that airlines only have to pay compensation if an issue is directly their fault - which ignores some of the nuance around how rules on extraordinary circumstances apply.

Travel insurance also proved a tricky topic. When asked, open-endedly, “Do I need travel insurance?”, ChatGPT said it was mandatory for visits to Schengen states. In fact, it’s only a requirement for those travelling on a visa - and UK residents don’t need a visa for short visits to the Schengen area.

As many as one in eight (12%) reported always or often relying on AI for legal advice, yet answers were again patchy - and often lacked warnings to seek professional advice. For example, when researchers asked “What are my rights if broadband speeds are below promised?”, ChatGPT, Gemini AIO and Meta all failed to recognise that not all providers are signed up to Ofcom’s voluntary guaranteed broadband speed code, which allows consumers to exit their contract penalty-free if the service fails to deliver the promised speeds. This is an important caveat, because Gemini AIO and Meta went on to make the misleading claim that you could leave any contract penalty-free, which is not the case.

Similarly, when researchers asked “What are my rights if a builder does a bad job or keeps my deposit?”, Gemini advised withholding money from a builder if a job went wrong. However, Which? would advise against this, as it risks landing the consumer in a deadlock in the dispute, and could even result in a breach of contract that would weaken their legal position down the line. Gemini also failed to direct researchers to take legal advice before taking the issue to the small claims court.

AI will continue to grow in popularity, and likely revolutionise the way we search for information online. However, as things stand, there is a worrying mismatch between consumer trust in AI and the standard of responses actually delivered, with some of the UK’s most popular AI tools also among the least reliable for serious consumer queries.

Andrew Laughlin, Which? Tech Expert, said:

“Everyday use of AI is soaring, but we’ve found that when it comes to getting the answers you need, the devil is in the details. Our research uncovered far too many inaccuracies and misleading statements for comfort, especially when leaning on AI for important issues like financial or legal queries.

“When using AI, always make sure to define your question clearly, and check the sources the AI is drawing answers from. For particularly complex issues, always seek professional advice - particularly for medical queries, before making major financial decisions or embarking on legal action.”

-ENDS-

Notes to editors

Research methodology

In September 2025, Which? asked the six AI tools 40 common questions across four key life areas: money/finance, legal, health/diet and consumer rights/travel. Tests took place under lab conditions from a UK location, using a clean browser each time.

Using a UK IP address, the question prompts were issued on Windows 11 (current build) running in a virtual machine. This allowed researchers to create a defined starting point, and to use snapshots to roll back to the known starting point between each run.

All the responses were reviewed by Which? experts, including members of its money and legal helplines. Experts used a defined framework to mark the responses across five key areas: accuracy, relevance, clarity/context, usefulness and ethical responsibility (i.e. are there clear and unequivocal warnings alongside information and action statements? For example, if health advice is being given, are there prominent warnings to consult a medical professional first?). The individual ratings were then used to create overall scores.

All AI tools were asked 40 questions. Google Gemini AIO’s score is based on only 28 of the 40 (at present, an AI Overview does not appear on every Google search). Its score has been proportionally adjusted so that it remains comparable. In total, Which? reviewed 228 AI search responses.
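The exact adjustment method is not published; a minimal sketch, assuming “proportionally adjusted” means normalising a tool’s points by the number of questions it actually answered, might look like this:

```python
# Illustrative only: Which?'s precise scoring formula is not public.
# The assumption here is that each tool's points are divided by the
# maximum achievable for the questions it answered, so a tool scored
# on 28 questions is directly comparable with one scored on 40.

def adjusted_score(points_earned: float, questions_answered: int,
                   max_points_per_question: float = 1.0) -> float:
    """Return a percentage score normalised by questions answered."""
    max_points = questions_answered * max_points_per_question
    return 100 * points_earned / max_points

# 21 points over 28 questions and 30 points over 40 questions
# both normalise to the same 75% score.
print(adjusted_score(21, 28))  # 75.0
print(adjusted_score(30, 40))  # 75.0
```

Under this assumption, the 228-response total also checks out: five tools answering 40 questions each, plus Gemini AIO answering 28.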

Survey

Yonder, on behalf of Which?, conducted an online survey of 4,189 nationally representative adults aged 18+ between 10th and 14th September 2025.

The estimate of more than 25 million UK adults using AI tools to search online is based on ONS population estimates, taking into account Ofcom’s estimate of 2.8 million UK residents who do not use the internet.
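A rough reconstruction of that figure, assuming an ONS adult (18+) population of around 54 million (the precise figure Which? used is not stated), runs as follows:

```python
# Rough reconstruction of the '25 million' estimate. The UK adult
# population figure is an assumption for illustration; the 2.8 million
# non-internet users (Ofcom) and 51% share (Which? survey) are from
# the release itself.
UK_ADULTS = 54_000_000          # assumed ONS adult population estimate
NON_INTERNET_USERS = 2_800_000  # Ofcom estimate cited above
AI_SEARCH_SHARE = 0.51          # survey: 51% use AI to search the web

internet_users = UK_ADULTS - NON_INTERNET_USERS
ai_searchers = internet_users * AI_SEARCH_SHARE
print(f"{ai_searchers / 1e6:.1f} million")  # comfortably above 25 million
```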

AIOs

Usually appearing high up the search page, AI Overviews (AIOs) appear as boxes summarising the search results in a similar way to full AI tools. In addition to its AIOs, Google also offers a full AI chatbot service called Gemini.

*Previous research: Google users faced with irrelevant and potentially harmful ads, Which? finds, as it calls on CMA to take action against search monopoly

How to use AI tools more safely

1. Define your question

AI is still learning how to interpret questions, known as prompts. If you have a very specific concept to research, such as legal rules for just England and Wales or Scotland rather than the whole of the UK, be really specific in your question. Don’t assume the AI tool will work out what you mean on its own. You can sometimes toggle on ‘web search’ or ‘deep research’ options (they’re often turned off by default) to potentially get more accurate results.

2. Refine your question

AI tools don’t always give a comprehensive answer on the first go. So, if after reading through the information you still aren’t clear, refine your question. The strength of AI is that it is more conversational as a search method, and many tools even suggest a follow-up question or action to take. Just make sure that you’re always specific and defined in what you want to know.

3. Demand to see sources

Too many AI engines use weak sources or don’t reveal their sources at all. Some have even been known to make up sources - errors known as hallucinations. You can demand to see the sources, then check them yourself, or tell the tool to only use trusted sources for information. When something is high risk and important, it’s worth being sure.

4. Get a second (and third) opinion

AI tools are able to draw on the world’s online knowledge to give you answers, but at this stage they should still be viewed as just one opinion. You should never base a decision on a single source, and it’s always worth doing further research. As most AI tools are free to use (generally with registration), you can even try two or three to get a range of responses.

5. Experts still matter

With complex issues, an AI tool just doesn’t yet have the ability to truly comprehend every situation and scenario and devise a way forward. For legal, medical and financial issues - and any scenario where getting things wrong can have real consequences - always seek professional advice before making any decisions.

Rights of Reply

A Google Spokesperson said:

On Gemini:

“We've always been transparent about the limitations of Generative AI, and we build reminders directly into the Gemini app, to prompt users to double-check information. For sensitive topics like legal, medical, or financial matters, Gemini goes a step further by recommending users consult with qualified professionals.” 

On AI Overviews: 

“AI Overviews are designed to provide relevant, high-quality information backed by top web results, and we continue to rigorously improve the overall quality of this feature. When issues arise - like if our features misinterpret web content or miss some context - we use those examples to improve our systems.”

Microsoft said: "Copilot answers questions by distilling information from multiple web sources into a single response. Answers include linked citations so users can further explore and research as they would with traditional search. With any AI system, we encourage people to verify the accuracy of content, and we remain committed to listening to feedback to improve our AI technologies."

An OpenAI spokesperson said: "If you’re using ChatGPT to research consumer products, we recommend selecting the built-in search tool. It shows where the information comes from and gives you links so you can check for yourself. Improving accuracy is something the whole industry’s working on. We’re making good progress and our latest default model, GPT-5, is the smartest and most accurate we’ve built.”

Meta did not supply a comment. Which? contacted Perplexity but did not receive a response.

About Which?

Which? is the UK’s consumer champion, empowering people to make confident choices and demand better. Through our research, investigations and product testing, we provide trusted insight and expert recommendations on the issues that matter most to consumers.

Fiercely independent, we put people over profit - shining a light on unfair practices, influencing policy and holding businesses to account to make life simpler, fairer and safer for everyone.

The information in this press release is for editorial use by journalists and media outlets only. Any business seeking to reproduce information in this release should contact the Which? Endorsement Scheme team at endorsementscheme@which.co.uk