Bias Sensitivity in Diagnostic Decision-Making: Comparing ChatGPT with Residents

Abstract

Background

Diagnostic errors, often caused by biases in clinical reasoning, significantly affect patient care. Artificial intelligence chatbots such as ChatGPT could help mitigate such biases, but whether they are themselves susceptible to bias is unknown.

Methods

This study compared the diagnostic accuracy of ChatGPT with that of 265 medical residents in five previously published experiments designed to induce bias. The residents worked in several major teaching hospitals in the Netherlands. The biases studied were case-intrinsic (salient distracting findings in the patient history, disruptive patient behaviors) and situational (prior availability of a look-alike patient). ChatGPT’s accuracy in identifying the most likely diagnosis was measured.

Results

Diagnostic accuracy of residents and ChatGPT was equivalent. For clinical cases involving case-intrinsic bias, both ChatGPT and the residents exhibited a decline in diagnostic accuracy. Residents’ accuracy decreased by 12% on average, while the accuracy of ChatGPT 4.0 decreased by 21% and that of ChatGPT 3.5 by 9%. These findings suggest that, like human diagnosticians, ChatGPT is sensitive to bias when the biasing information is part of the patient history. When the biasing information was extrinsic to the case, in the form of the prior availability of a look-alike case, residents’ accuracy decreased by 15%. By contrast, ChatGPT’s performance was not affected by the biasing information. Chi-square goodness-of-fit tests corroborated these outcomes.
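As a rough illustration of the kind of chi-square goodness-of-fit test mentioned above, the following minimal Python sketch compares observed counts of correct and incorrect diagnoses with the counts expected under a reference accuracy. The counts, the reference proportions, and the helper names `chi_square_gof` and `p_value_df1` are all hypothetical and chosen only for illustration; they are not data or procedures from the study, and the abstract does not specify how the tests were set up.

```python
import math


def chi_square_gof(observed, expected_props):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test.

    observed: list of observed counts per category.
    expected_props: list of expected proportions, summing to 1.
    """
    total = sum(observed)
    expected = [p * total for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical example, NOT study data: 20 biased cases, 12 diagnosed correctly.
observed = [12, 8]        # [correct, incorrect]
reference = [0.8, 0.2]    # assumed accuracy on comparable unbiased cases

stat, df = chi_square_gof(observed, reference)
p = p_value_df1(stat)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

With only two categories (correct and incorrect), the test has one degree of freedom, which is why the p-value can be obtained from the complementary error function without a statistics library.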

Conclusions

ChatGPT appears not to be sensitive to bias when the biasing information is situational, but it is sensitive to bias when the biasing information is part of the patient’s disease history. It has potential for diagnostic support, but caution is advised. Future research should improve AI’s detection and mitigation of bias to make it truly useful for diagnostic support.

Topic

JGIM

Author Descriptions

Erasmus School of Social and Behavioural Sciences, Erasmus University Rotterdam, Mandeville Building, Room T15-10, P.O. Box 1738, 3000 DR, Rotterdam, The Netherlands
Henk G. Schmidt PhD

Karolinska Institutet, Solna, Sweden
Jerome I Rotgans PhD

Institute of Medical Education Research Rotterdam, Erasmus Medical Center, Dr. Molewaterplein 40, Na-2418, 3015 GD, Rotterdam, The Netherlands
Jerome I Rotgans PhD & Silvia Mamede MD, PhD
