Researchers at the University of Massachusetts Amherst studied how often large language models (LLMs) produce false or misleading information, known as hallucinations, when generating medical summaries. The study analyzed 100 medical summaries from two recent LLMs, GPT-4o and Llama-3, and found hallucinations in almost all of them. The most common hallucinations concerned symptoms, diagnoses, and medication instructions. The researchers highlighted the risks of relying on these AI-generated summaries: inaccuracies could lead to incorrect treatment or cause critical information in medical records to be overlooked. The study calls for a better framework for detecting and addressing AI hallucinations in the healthcare industry.