Diagnostic accuracy of a large language model in pediatric case studies.

Joseph Barile; Alex Margolis; Grace Cason; Rachel Kim; Saia Kalash; Alexis Tchaconas; Ruth Milanaik

Study

January 17, 2024

Barile J, Margolis A, Cason G, et al. JAMA Pediatr. 2024;178(3):313-315.

Clinicians and the public are increasingly interested in using chatbots like ChatGPT to learn more about health care, particularly diagnoses. This study asked ChatGPT to provide a differential diagnosis list and a final diagnosis for 100 pediatric case studies. ChatGPT had an overall error rate of 83%. Many of the incorrect diagnoses were clinically related to the correct diagnosis but too broad to be classified as correct, and just over half involved the same organ system. Despite the error rate, the authors still thought that large language models (LLMs) could be helpful to clinicians as a tool, and they suggest that training chatbots may improve diagnostic accuracy.
