Does one size fit all? Developing an evaluation strategy to assess large language models for patient safety event report analysis.
Fong A, Adams KT, Boxley C, et al. Does one size fit all? Developing an evaluation strategy to assess large language models for patient safety event report analysis. JAMIA Open. 2024;7(4):ooae128. doi:10.1093/jamiaopen/ooae128.
Free-text narratives in patient safety event (PSE) reports provide rich detail, but reading and analyzing them is resource-intensive. This study compares how accurately four large language models (LLMs) analyze PSE reports. No single model outperformed the others on all tasks. Models with more parameters generally performed better than smaller models but took significantly longer to run, which suggests that using multiple models may be the best approach.