OpenAI's GPT-4 model rivals ophthalmologists in diagnosing eye problems, according to a new research paper.
Researchers recently tested how GPT-4 would perform in 87 patient scenarios. While the model fell short of expert-level ophthalmologists and made some diagnostic mistakes, the researchers found that it performed better than junior doctors and matched many specialists in addressing eye problems.
"Large language models (LLMs) are approaching expert-level performance in advanced ophthalmology questions," the researchers wrote in a paper published in the PLOS Digital Health journal. They added that GPT-4 was able to outscore "some expert ophthalmologists" in diagnosing eye problems.
AI has been disrupting nearly every industry to varying degrees, but researchers are especially excited about the technology's potential applications in health care. With AI's help, they hope to catch missed diagnoses and generally improve patient outcomes. For that to happen, however, LLMs still need significant improvement: they can be accurate in some cases, but they are nowhere near ready for clinical settings.
This latest research, however, suggests GPT-4 is getting close. In the study, the researchers posed 347 ophthalmology questions across the 87 scenarios to GPT-4 and asked doctors to assess the accuracy and relevance of its answers. In general, GPT-4 performed exceptionally well, though it failed to correctly answer a handful of questions on topics ranging from glaucoma and cataracts to pediatric ophthalmology. The researchers found no association between those incorrect answers and the doctors' responses, suggesting GPT-4 underperformed in those topic areas for no identifiable reason. Regardless, the researchers were impressed by the results.
"The remarkable performance of GPT-4 in ophthalmology examination questions suggests that LLMs may be able to provide useful input in clinical contexts, either to assist clinicians in their day-to-day work or with their education or preparation for examinations," they wrote in their paper.
Nevertheless, they cautioned that GPT-4 isn't necessarily ready to handle patient visits on its own, and said that there are very real ethical implications to turning over medical diagnoses to a large language model.
"Our study found that despite meeting expert standards, state-of-the-art LLMs such as GPT-4 do not match top-performing ophthalmologists," the researchers wrote. "Moreover, there remain controversial ethical questions about what roles should and should not be assigned to inanimate AI models, and to what extent human clinicians must remain responsible for their patients."
Looking ahead, the researchers think GPT-4 and its successors could benefit from additional context and "fine-tuning" with "high quality ophthalmological text data," along with an "uncertainty indicator" that would tell doctors how confident GPT-4 is in its diagnosis. Even now, in the absence of experts, GPT-4 may prove better than the average doctor at diagnosing eye problems.
"GPT-4 may prove especially useful where access to ophthalmologists is limited," the researchers said, adding that its "knowledge and reasoning ability is likely to be superior to non-specialist doctors and allied health care professionals working without support, as their exposure to and knowledge of eye care is limited."