A study compared six humans with OpenAI's GPT-4 and Anthropic's Claude3-Opus on answering medical questions. The humans outperformed both models, and GPT-4 performed worse than Claude3-Opus. The questions, some 105,000 in all, were generated from a Knowledge Graph built by Kahun from peer-reviewed sources. Both models handled semantic questions better than numerical ones. While LLMs can answer some medical questions, they are not yet reliable enough to assist physicians. Kahun aims to build more transparent AI that incorporates verified sources, since doctors want to understand the basis of AI recommendations.