Performance of large language models on medical oncology examination questions.
Longwell JB, Hirsch I, Binder F, et al. Performance of large language models on medical oncology examination questions. JAMA Netw Open. 2024;7(6):e2417641. doi:10.1001/jamanetworkopen.2024.17641.
A common way to test the accuracy and limitations of large language models (LLMs) is to prompt them to answer standardized questions. This study used medical oncology examination questions from the American Society of Clinical Oncology (ASCO) and the European Society for Medical Oncology (ESMO), along with original questions constructed by the study team, to test the accuracy of several open-source and proprietary LLMs. Accuracy varied by LLM, and many incorrect answers, if acted upon in practice, had the potential for patient harm.