Study

GPT versus resident physicians — a benchmark based on official board scores.

Katz U, Cohen E, Shachar E, et al. GPT versus resident physicians — a benchmark based on official board scores. NEJM AI. 2024;1(5):5. doi:10.1056/aidbp2300192.

July 31, 2024

Before large language models (LLMs) can be integrated into clinical care, they must be shown to perform at least as well as physicians. This study compared two publicly available GPT models with official physician scores on the Israeli board residency examinations in five core medical disciplines: internal medicine, general surgery, pediatrics, psychiatry, and obstetrics and gynecology (OB/GYN). GPT-4 performance was comparable to that of physicians taking the exam, whereas GPT-3.5 did not reach passing levels on any of the five exams.
