AI Chatbots Give Bad Health Advice, Study Finds

A new study found AI chatbots gave inaccurate health advice in nearly half of cases, presenting false claims as equal to science. This is worse than last year's findings.

New research indicates artificial intelligence chatbots frequently deliver inaccurate and inconsistent health information, raising significant concerns about their use for medical advice. A comprehensive study, primarily led by the University of Oxford and published in Nature Medicine, found that users relying on these tools for medical decisions did not achieve better outcomes than those using traditional methods like online searches or their own judgment. The investigation, involving the largest user study of large language models (LLMs) for public health assistance, highlights that AI systems are not yet ready to assume roles traditionally held by medical professionals.

The core issue identified is the tendency of AI chatbots to generate responses that are not only factually incorrect but also vary wildly depending on how a question is phrased, leaving users uncertain about what advice to trust. This inconsistency is compounded by the frequent production of fabricated citations and "hallucinations," where false information is presented as established fact. Completeness scores for the references provided by these chatbots averaged a mere 40 percent, suggesting a profound lack of reliable sourcing.

Read More: Companies move AI models to own servers for privacy

Study Warns AI Chatbots Often Provide Inaccurate Medical Information - 1

The study evaluated chatbot responses across various health topics, including cancer, vaccines, stem cells, nutrition, and athletic performance. A notable finding indicated that nearly half of the responses presented a false equivalence between scientifically validated claims and non-science-based assertions. While some models, such as Google's Gemini, reportedly generated fewer highly problematic responses compared to others, the overall trend points to significant shortcomings. Open-ended prompts, in particular, were found to elicit a higher proportion of problematic advice.

Researchers emphasize that the continued deployment of these chatbots without adequate public education and stringent oversight risks exacerbating the spread of misinformation. The advice given is often delivered with a high degree of apparent confidence, masking underlying inaccuracies and incompleteness. This is particularly dangerous as users may struggle to discern reliable information from flawed suggestions, especially when presented with multiple potential conditions without clear guidance on which is most likely.

Read More: Eli Lilly's Foundayo drug: 1,400 prescriptions in first week, faces FDA checks

Study Warns AI Chatbots Often Provide Inaccurate Medical Information - 2

Background and Methodologies

The findings, published across multiple outlets including The British Medical Journal (BMJ) Open and Nature Medicine, stem from studies that tested AI chatbots against various medical scenarios and user queries. Some research also specifically investigated the vulnerability of foundational LLMs – including OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-90B Vision, and xAI's Grok Beta – to malicious instructions aimed at spreading health disinformation. These tests involved customising chatbots to produce incorrect responses with an authoritative, convincing, and scientific tone on topics ranging from vaccine-autism links to dietary cancer cures.

While some experts suggest AI could potentially augment patient interactions with human doctors, particularly in non-emergency situations, there is a clear consensus that AI is not a substitute for professional medical consultation. The research underscores the behavioral limitations of these systems and the urgent need to re-evaluate their integration into public health communication channels, stressing the necessity for rigorous, real-world testing with diverse user groups, analogous to clinical trials for pharmaceuticals.

Read More: Apple iPhone Shipments Up 20% in China While Market Falls

Frequently Asked Questions

Q: Why are AI chatbots not good for medical advice?
A new study found AI chatbots often give wrong and different health advice. This means users get uncertain about what to trust.
Q: What did the study find about AI chatbot accuracy?
The study showed AI chatbots gave wrong health information and made up references. They were only 40% complete on average for sources.
Q: Who is affected by bad AI health advice?
People looking for health answers online are affected. They might get incorrect information that is presented confidently, leading to bad health choices.
Q: What happens next with AI in health?
Researchers say AI chatbots are not ready for medical advice. More testing and rules are needed before they can be used safely for public health.
Q: Did any AI chatbots perform better than others?
Some models like Google's Gemini gave fewer very bad answers. However, the study found many AI systems still have big problems with giving correct health guidance.