Potential biases in machine learning algorithms using electronic health record data.
Machine learning, a type of computing that uses data and statistical methods to let computers progressively improve their performance on prediction or other tasks, has been widely promoted as a tool to improve health care safety. This commentary describes the potential for machine learning to worsen socioeconomic disparities in health care. Disadvantaged populations are more likely to receive care in multiple health systems. As a result, relevant data about their health may be missing from any individual health system's records, which hinders the performance of machine learning algorithms trained on those records. Racial and ethnic minority patients may also not be present in sufficient numbers for accurate prediction. The authors raise the concern that implicit bias in the care that disadvantaged populations receive may influence algorithms, which could in turn amplify that bias. They recommend including sociodemographic characteristics in algorithms, building and testing algorithms in diverse health care systems, and conducting follow-up testing to ensure that machine learning does not perpetuate or exacerbate health care disparities.
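As an illustration only, the follow-up testing recommended above might begin with a per-subgroup check of sample size and sensitivity. The sketch below is not taken from the commentary: the function name `subgroup_report`, the `min_n` threshold, and the example records are all hypothetical, and sensitivity is just one of several metrics such an audit could compare.

```python
from collections import defaultdict


def subgroup_report(records, min_n=30):
    """Summarize sample size and sensitivity for each subgroup.

    records: iterable of (group, y_true, y_pred) tuples with 0/1 labels.
    min_n:   hypothetical threshold below which a subgroup is flagged as
             too small to evaluate reliably (an assumption, not a standard).
    """
    counts = defaultdict(lambda: {"n": 0, "tp": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        if y_true == 1:
            # Among true positives, track hits (tp) and misses (fn).
            if y_pred == 1:
                c["tp"] += 1
            else:
                c["fn"] += 1

    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        report[group] = {
            "n": c["n"],
            # Sensitivity is undefined when a subgroup has no positive cases.
            "sensitivity": c["tp"] / positives if positives else None,
            "underpowered": c["n"] < min_n,
        }
    return report


# Made-up example records: (subgroup label, true outcome, model prediction).
example = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 1, 0),
    ("B", 1, 0), ("B", 1, 1), ("B", 0, 0),
]
report = subgroup_report(example, min_n=4)
for group, stats in sorted(report.items()):
    print(group, stats)
```

A large gap in sensitivity between subgroups, or a subgroup flagged as underpowered, would be a signal to investigate further before deployment rather than proof of bias in itself.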