Accuracy of a proprietary large language model in labeling obstetric incident reports.
Voluntary incident reporting is an important resource for identifying adverse events and near misses, but the volume of reports can pose challenges. This study used the large language model (LLM) ChatGPT-3.5 in a secure environment to assign labels (e.g., neonatal resuscitation supplies, lactation support) to a sample of obstetric incident reports. Compared with the human-assigned labels, which served as the gold standard, ChatGPT demonstrated high sensitivity and specificity.
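
For readers less familiar with the metrics, the following is a minimal sketch of how sensitivity and specificity can be computed for a single label when human-assigned labels are treated as the gold standard. It is not the study's code: the function name `sensitivity_specificity` and the toy data are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import Sequence


def sensitivity_specificity(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> tuple[float, float]:
    """Return (sensitivity, specificity) for one label, treating `gold` as truth."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)          # label present, model assigned it
    fn = sum(g and not p for g, p in pairs)      # label present, model missed it
    tn = sum(not g and not p for g, p in pairs)  # label absent, model agreed
    fp = sum(not g and p for g, p in pairs)      # label absent, model assigned it

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")

    sensitivity = tp / (tp + fn)  # share of true positives the model found
    specificity = tn / (tn + fp)  # share of true negatives the model left unlabeled
    return sensitivity, specificity


# Made-up toy data, NOT from the study: whether each of 8 reports
# carries one particular label according to the human reviewer and the model.
human_labels = [True, True, True, False, False, False, False, True]
model_labels = [True, True, False, False, False, True, False, True]

sens, spec = sensitivity_specificity(human_labels, model_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With several labels per report, one plausible approach is to run the same calculation once per label and report the results separately or averaged; the summary above does not say which was done.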