Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine.
Numerous studies have evaluated the accuracy of Generative Pre-trained Transformer (GPT) models on text-only questions. This study appraises GPT-4 with Vision (GPT-4V), which analyzes images and text together. GPT-4V performs similarly to physicians in multiple-choice accuracy, but its rationales can be flawed even when it selects the correct answer.