Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine.
Jin Q, Chen F, Zhou Y, et al. Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine. NPJ Digit Med. 2024;7(1):190. doi:10.1038/s41746-024-01185-7.
Numerous studies have evaluated the accuracy of Generative Pre-trained Transformer (GPT) models on text-only questions. This study appraises GPT-4 with Vision (GPT-4V), which analyzes images and text together. GPT-4V performs similarly to physicians in multiple-choice accuracy, but it demonstrates flawed rationales even when it selects the correct response.