Health Care Data Science for Quality Improvement and Patient Safety

Alvin Rajkomar, MD | October 1, 2016

Perspective

Background

Having spent countless sleepless nights on call during my internship hunched over coffee-stained tables writing notes by hand, I eagerly anticipated implementation of a state-of-the-art electronic health record (EHR) during my final year of residency. As a regular user of Amazon and Netflix, I assumed that merely adopting such technology would enhance my productivity in the hospital, the way it improved my life outside the hospital. Yet, I soon confronted the same clunky user interfaces and unintuitive EHR designs that have frustrated health care practitioners everywhere.(1) Optimistically, I assumed that even if the user interfaces were subpar, at least the data we so painstakingly entered into the system could be used to improve patient care.

Although the log of all EHR activity is dumped into large databases daily, the process of piecing together analyzable datasets is done by report writers. This new class of health information technology (IT) professionals has direct access to stores of data packed in large spreadsheets. To find a piece of data, they look up the aisle and shelf where it is stored and retrieve it for you, analogous to a librarian finding a single book in an enormous library. Yet clinical data is complex and contextual—a heart rate may be listed under the formal vital sign table or under nursing documentation, where it is listed as a pulse. A report writer without clinical background may not appreciate that a request for heart rate should actually include data from both tables, and a clinician requesting data would not even know to give that level of specification.
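The heart rate example can be made concrete with a small sketch. The table and column names below (vital_signs, nursing_flowsheet, item, value) are hypothetical and invented for illustration; real EHR databases use vendor-specific schemas. The point is only that a clinically complete request has to combine both sources.

```python
import sqlite3

# Hypothetical schema for illustration; real EHR table names will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vital_signs (patient_id INTEGER, recorded_at TEXT, heart_rate INTEGER);
    CREATE TABLE nursing_flowsheet (patient_id INTEGER, recorded_at TEXT, item TEXT, value INTEGER);
    INSERT INTO vital_signs VALUES (1, '2016-01-01 08:00', 82);
    INSERT INTO nursing_flowsheet VALUES
        (1, '2016-01-01 12:00', 'pulse', 118),
        (1, '2016-01-01 12:00', 'pain_score', 4);
""")


def heart_rates(conn, patient_id):
    """Return every heart rate reading for a patient, whichever table holds it."""
    query = """
        SELECT recorded_at, heart_rate AS bpm, 'vital_signs' AS source
        FROM vital_signs WHERE patient_id = ?
        UNION ALL
        SELECT recorded_at, value AS bpm, 'nursing_flowsheet' AS source
        FROM nursing_flowsheet WHERE patient_id = ? AND item = 'pulse'
        ORDER BY recorded_at
    """
    return conn.execute(query, (patient_id, patient_id)).fetchall()


readings = heart_rates(conn, 1)
for recorded_at, bpm, source in readings:
    print(recorded_at, bpm, source)
```

A query against the vital sign table alone would return one reading and miss the elevated pulse that a nurse documented four hours later.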

After failing a few times to get valid datasets pertaining to a specific patient population from report writers, I hypothesized that I could do it faster and more accurately if I extracted the data myself. Therefore, I went through a lengthy training process (24 hours of in-person classes, a 4-hour project, and 4 hours of supervised testing) to get direct access to the EHR database. I was only the second of more than 2400 UCSF faculty members to do so.

Complex Health Care Data Requires Multidisciplinary Teams to Understand

Once I had direct access to billions of rows of data spread across thousands of tables, I realized that I could theoretically answer nearly any clinical question I had. Did clinicians at my institution transfuse blood in a way consistent with modern guidelines? Could we predict which patients would have avoidable emergency department visits? I had expected that extracting data to answer these questions would be similar to the process of examining laboratory results for my hospitalized patients—as long as you know where to look, the process would be transparent.

However, harnessing EHR data along with sensor, app, and patient-reported outcomes to improve patient safety and quality is challenging. The data encodes nearly every detail of the work done by other health professionals. Therefore, understanding the data requires close collaboration among multiple health professionals. A data analyst who is not comfortable seeking out nurses, pharmacists, or laboratory technicians would have an even harder time piecing together the raw data in a meaningful way.

As an example, a week after I had finished caring for patients in the hospital, I wanted to review my use of antibiotics. I had assumed that knowing when and how to order a medication were sufficient to analyze the data on how I prescribed medications. Instead, I found an incredibly complex trail of data showing every step in which multiple health care professionals took care of countless details of formulating medications, transferring them to the right unit, and administering them. As a clinician, I ordered medications, but I soon realized that I only initiated a Rube Goldberg–like machine that handled all the details for this seemingly simple order to be carried out. Moreover, I noticed an unanticipated consequence of poor user interfaces. With multiple ways to order the same laboratory test or medication, clinicians frequently must cancel and reorder services. Rather than reflecting the clinician's intention, the data faithfully encodes the ostensibly capricious computer instructions of clinicians trying to get the system to do what they want.
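One way to picture the cancel-and-reorder problem is a short sketch that separates the raw order log from what the clinician apparently intended. The event structure, order IDs, and medication strings here are invented for illustration, and a real order log would carry many more fields and edge cases (modified orders, discontinuations, pharmacy substitutions).

```python
from collections import namedtuple

# Hypothetical, simplified order log; real EHR audit trails are far richer.
OrderEvent = namedtuple("OrderEvent", "order_id medication action minute")

events = [
    OrderEvent(101, "ceftriaxone 1 g IV", "order", 0),
    OrderEvent(101, "ceftriaxone 1 g IV", "cancel", 2),
    OrderEvent(102, "ceftriaxone 1 g IV", "order", 3),
    OrderEvent(102, "ceftriaxone 1 g IV", "cancel", 4),
    OrderEvent(103, "ceftriaxone 1 g IV", "order", 5),
]


def orders_left_standing(events):
    """Return the order events that were never cancelled."""
    cancelled = {e.order_id for e in events if e.action == "cancel"}
    return [e for e in events
            if e.action == "order" and e.order_id not in cancelled]


raw_order_count = sum(e.action == "order" for e in events)
intended = orders_left_standing(events)
print(raw_order_count, "orders placed;", len(intended), "left standing")
```

Counting rows in the raw log would suggest three antibiotic orders in five minutes, when the clinician most likely meant to place one.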

Emerging Role for a Clinician Data Scientist

After observing this phenomenon time and again, I recognized a glaring need for a clinician–data translator. The core skills would be domain expertise in clinical systems, ability to extract data from large electronic stores, and thorough understanding of how to rigorously analyze large datasets. In real life, these skills are intertwined because if a researcher wants to add a variable to analysis, the translator would need to assess the benefit of that addition, whether the data exists within the database, and if it can be extracted successfully. In other fields, the combination of a domain expert, a computer scientist, and a statistician is referred to as a data scientist.

The benefits of a clinician data scientist extend far beyond simply accessing large, accurate datasets, although that is the first step. Consider the real-world problem of credit card fraud detection, where suspicious purchases are flagged and occasionally credit card orders are declined to prevent fraudulent purchases. For example, PayPal found that purchases from multiple parts of the world may indicate fraud, but adopting this rule universally would unnecessarily flag the purchases from a pilot. To prevent this, humans review the results from computer purchasing algorithms. Now imagine that, instead of looking for fraud, we are looking for adverse events, such as a strange antibiotic pattern or constellation of vital signs, which could prompt closer review by a clinician to catch events early or even before they occur. With a foot in both the clinical and data science realms, clinician data scientists are best positioned to determine which questions warrant significant investment to answer and, just as importantly, how to integrate them into workflows. This role is distinct from that of a clinician–statistician; biostatisticians have traditionally focused on the design of clinical studies that have defined protocols of data collection and analysis.

In other domains where prediction is applied—like which movie or product you may enjoy—the consequences of acting on an inaccurate recommendation are limited. In medicine, accepting a suggested clinical intervention generated by a flawed algorithm carries significant risk to patients and even clinicians, who might be operating in a busy environment where second guessing every decision is impractical. Clinician data scientists can help bridge the gap of where to apply novel algorithms and how to design safeguards to prevent mistaken applications of algorithms.

There is no current pathway to train such clinician data scientists. Many data science training pathways and degrees are domain agnostic: designed to create professionals who can work in a variety of fields. The training focuses on methodology and technical skills that are broadly applicable but, frankly, technical. However, learning how to leverage distributed file systems for batch analysis does not appeal to most clinicians, who may be interested in knowing enough to work productively with data scientists but not in creating a complex pipeline of data infrastructure. Just as clinician researchers must know enough about logistic regression and survival analysis to work productively with biostatisticians, clinician data scientists should know enough of the technical details of the data flows and programming to be conversant with the data scientists they will collaborate with, although they do not need to become expert programmers themselves. New training programs must be created that blend the technical training of data scientists with particular emphasis on applications to the health care domain, which requires collaboration with multiple health care professionals.

As the work of clinician data scientists becomes more prominent, rank-and-file clinicians will also need additional training to work with data products. Recently, a representative of an EHR vendor recounted the story of one physician who was shown an algorithm that predicted a high readmission risk for a patient driven by the high number of medications prescribed. He asked a data scientist if he should prescribe fewer medications to reduce this risk. The doctor failed to appreciate that the machine learning algorithm found a significant correlation between the number of medications and readmission risk, but a high number of medications did not cause the patient to be readmitted.
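The distinction between correlation and causation in this story can be illustrated with invented numbers. The sketch below assumes, purely for illustration, that illness severity drives both the number of medications and the readmission risk; the source does not say which factor actually explained the association, and the counts are made up.

```python
# Invented counts: (severity, many_medications) -> (patients, readmissions).
# Severity is an assumed confounder chosen for illustration only.
cohort = {
    ("low", False): (80, 8),
    ("low", True): (20, 2),
    ("high", False): (20, 8),
    ("high", True): (80, 32),
}


def readmission_rate(cohort, many_meds, severity=None):
    """Readmission rate for a medication group, optionally within one severity stratum."""
    patients = readmissions = 0
    for (sev, meds), (n, r) in cohort.items():
        if meds == many_meds and severity in (None, sev):
            patients += n
            readmissions += r
    return readmissions / patients


# Crude comparison: many medications looks harmful.
print("crude:", readmission_rate(cohort, False), readmission_rate(cohort, True))

# Within each severity stratum the difference disappears.
for sev in ("low", "high"):
    print(sev, readmission_rate(cohort, False, sev), readmission_rate(cohort, True, sev))
```

In this toy cohort, patients on many medications are readmitted about twice as often overall, yet within each severity group the rates are identical, so prescribing fewer medications would not be expected to lower anyone's risk.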

Although the call for clinician data scientists has largely been in the context of precision medicine (focused on "-omic" data), it should be supplemented with a call for clinician data scientists who can harness clinical datasets to improve quality and safety. EHR datasets contain valuable information that can provide insight on delivering better care, whether through retrospective analysis or enabling prospective trials. This work will depend on involving clinician data scientists intimately in the process as a bridge between raw data and the clinical activity to be understood and optimized.

Alvin Rajkomar, MD Assistant Professor Division of Hospital Medicine University of California, San Francisco

References

1. Rosenbaum L. Transitional chaos or enduring harm? The EHR and the disruption of medicine. N Engl J Med. 2015;373:1585-1588.

In Conversation With… Richard Platt, MD, MSc

October 1, 2016

Editor's note: Dr. Platt is Professor and Chair of the Harvard Medical School Department of Population Medicine. He is principal investigator (PI) of the FDA Sentinel System, which performs postmarketing safety surveillance using the electronic health data from more than 175 million people. Dr. Platt is also co-PI of the Patient-Centered Outcomes Research Institute's (PCORI) PCORnet coordinating center, a consortium of 34 networks that will use electronic health data to conduct comparative effectiveness research. We spoke with him about big data and patient safety.

Dr. Robert M. Wachter: What got you interested in using databases to improve the safety of care?

Dr. Richard Platt: For over 20 years, I was a hospital epidemiologist and focused on developing and implementing systems for improving one of the most important patient safety fields: prevention of health care–associated infections, for both patients and health care workers. As part of that work, I was asked to develop an antimicrobial stewardship program and that led to a general interest in the safety of drugs, vaccines, and other medical products. I have balanced these two streams of work since then. Both of those streams have matured in a way that let us do things that we only really dreamed about for a long time. That's largely because systems for collecting and analyzing large amounts of both inpatient and ambulatory data have matured. We've been able to piggyback on that to answer questions that can inform system-level policy and practice.

RW: I imagine that, as you began your work, it was mostly paper and pencil; whereas, it has become an era where the data are collected electronically. How does that enhance the work and change it fundamentally?

RP: It's now possible to talk about the experience of large, well-defined populations, which is a great improvement over learning only about individuals to whom something important has happened. We are now able to characterize everyone who's eligible for care, those who have received specific diagnoses, tests, and treatment, and to assess their health outcomes. We spent a lot of time trying to understand how to repurpose electronic health record and billing data for these new purposes.

There is a fairly common belief that it is easy to do these things. However, most electronic health data is not automatically useful for these other purposes. A great deal of additional work is usually required to make it useful for new purposes like patient safety and medication safety. Several people in our group maintain contact with our partner practices, hospital systems, and health plans to understand the care behind their data and to identify systematic differences, gaps, or discontinuities. It's possible to do these things where it wasn't before. But a lot of work is needed to understand what's in the data and to standardize it before we can really take advantage of it.

RW: Do you think that work is largely around creating standard ways of recording things? Or might we be reaching an era where the technical capacity of IT systems to find data, even when it's recorded in different places and in different ways, is going to be so good that (the same way we don't pay as much attention to filing things in our computers because search is so good) we'll be able to get what we need even in messy systems?

RP: The future will be terrific, eventually. For now, though, we need information that's often not in the electronic data at all, or that is collected in entirely different ways by different systems. The extensive customization that is a feature of many EHRs invites clinicians to record the same information in different ways in different systems. We often need to know what options they are offered to make sense of the information. Similarly, we need to know insurance system benefits to understand whether the system captures specific kinds of care with an appropriate level of granularity. If the data aren't captured or there's no record of how they got to be the way they are, then we're going to have trouble.

RW: Of course we hear lots of complaining now from clinicians who are being asked to capture data through checkboxes and standardized forms for quality measurement, surveillance, regulatory, and all of that. In some ways the challenge is that well-meaning organizations are looking for data in standardized ways that allow them to analyze it, but the clinicians are being turned into expensive data entry clerks. How do we reconcile that?

RP: There are a couple of approaches to consider. One is to ask the patients to directly enter information about race, ethnicity, education, sexual orientation and gender identity, smoking status, and medical history. Another is to substitute computable definitions for reportable events and conditions that now require substantial skilled personnel resources. These definitions should use information collected as part of care delivery without making clinicians collect more or different information than they would ordinarily. An example is the recent change in the Centers for Disease Control and Prevention's surveillance definitions for complications of ventilation. It used to require surveillance personnel to identify ventilator-associated pneumonia by reviewing clinical status, radiology reports, and sputum production. This has been replaced by a definition that depends on fully objective, routinely recorded, electronic data about ventilator settings and inspired oxygen concentration. It is straightforward to build this detection capability into EHRs. That has the double advantage that we get much more standardized, useful, and actionable information while reducing the burden on skilled personnel. We need to move in that direction as much as we can.
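To show what a computable definition of this kind might look like, here is a deliberately simplified sketch. It is not the official surveillance definition: the rule (two stable or improving days followed by a rise sustained for two days) and the thresholds (3 cmH2O for PEEP, 20 percentage points for inspired oxygen) are assumptions chosen for illustration and should be checked against the current surveillance protocol. The function names are hypothetical.

```python
def sustained_rise(daily_min, threshold):
    """Return indexes of days that begin a sustained rise after a stable baseline.

    Day d is flagged when days d-2 and d-1 are stable or decreasing and the
    values on days d and d+1 both exceed the day d-2 value by at least threshold.
    """
    flagged = []
    for d in range(2, len(daily_min) - 1):
        baseline_first, baseline_second = daily_min[d - 2], daily_min[d - 1]
        stable = baseline_second <= baseline_first
        worsened = min(daily_min[d], daily_min[d + 1]) - baseline_first >= threshold
        if stable and worsened:
            flagged.append(d)
    return flagged


def ventilator_event_days(peep, fio2_percent, peep_rise=3, fio2_rise=20):
    """Flag days on which either ventilator measure shows a sustained worsening.

    peep: daily minimum PEEP in cmH2O; fio2_percent: daily minimum FiO2 as a
    whole-number percentage. Thresholds are illustrative assumptions.
    """
    days = set(sustained_rise(peep, peep_rise)) | set(sustained_rise(fio2_percent, fio2_rise))
    return sorted(days)


# Invented ventilator data for one patient over five days.
peep = [5, 5, 8, 8, 8]
fio2 = [40, 40, 40, 40, 40]
print(ventilator_event_days(peep, fio2))  # [2]
```

Because the rule reads only values the ventilator flowsheet already records, it could run without any chart review, which is the advantage described above.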

RW: As you look at the IT systems out there and the ecosystem that exists between academicians like you and your group, the CDC, and the Epics and Cerners of the world, are the relationships sufficiently well developed to move from an example just like the one you gave, a new standard way of collecting a certain piece of data that will enhance safety and improve quality, to having that get embedded in the IT systems in a way that makes all of this real? Or does that require a lot of work?

RP: It certainly requires a lot of work. Both the Office of the National Coordinator for Health Information Technology and a private consortium are committed to making changes that would make it easier to do. At the moment, there are three major kinds of problems. One is the extensive customization of EHRs that leads to problematic nonstandardization. A second is the difficulty of extracting information from EHRs or of making it available to third-party programs that can use it. A third is poorly documented changes over time that result from routine upgrades. For example, we were completely misled by a false signal of an influenza outbreak that resulted from a system upgrade that stored more diagnoses per encounter. That meant our historical baseline rates were too low. So the system fired off signals of an outbreak when in fact nothing had been going on.

Those kinds of changes are rarely documented. There was no reason why the system architect should have thought to tell us or even annotate the system. But if you're trying to understand what's happening over time in our health care system or understand differences between organizations or regions, those things all come to the fore.
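The false influenza signal can be reproduced in miniature. The sketch below uses invented encounters and a simple mean-plus-two-standard-deviations threshold, which is an assumption for illustration and not a description of the surveillance system discussed in the interview. The only thing that changes between the two runs is how many diagnoses per encounter the system stores.

```python
from statistics import mean, stdev


def week_of_encounters(week):
    """Invented data: flu is the first diagnosis in some visits, the third in others."""
    primary = [["influenza", "cough"]] * (10 + week % 3)
    tertiary = [["asthma", "fever", "influenza"]] * 10
    return primary + tertiary


def flu_count(encounters, max_stored):
    """Count encounters whose stored diagnoses include influenza."""
    return sum("influenza" in dx[:max_stored] for dx in encounters)


def alert_weeks(baseline, recent, z=2.0):
    """Return indexes of recent weeks whose count exceeds the baseline threshold."""
    threshold = mean(baseline) + z * stdev(baseline)
    return [i for i, count in enumerate(recent) if count > threshold]


# Historical baseline: the old system stored only 2 diagnoses per encounter.
baseline = [flu_count(week_of_encounters(w), max_stored=2) for w in range(6)]

# Same underlying illness pattern, with and without the storage upgrade.
unchanged = [flu_count(week_of_encounters(w), max_stored=2) for w in range(6, 10)]
upgraded = [flu_count(week_of_encounters(w), max_stored=4) for w in range(6, 10)]

print(alert_weeks(baseline, unchanged))  # []
print(alert_weeks(baseline, upgraded))   # [0, 1, 2, 3]
```

Nothing about the patients differs between the two runs, yet every week after the upgrade trips the alert because the historical baseline undercounted influenza diagnoses.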

RW: What are you seeing in terms of your ability to follow populations over time as people move from one IT system to another—if they aren't in a VA or in Kaiser Permanente for example? Is that getting better, and do you see a path toward a day where you will be able to seamlessly follow a population over months and years?

RP: We haven't been able to solve that problem. For the FDA's program called Sentinel, we've built a distributed data network that involves most large national health plans, insurance companies, and large HMOs. We have hundreds of millions of person-years of data in the distributed dataset. But we are at the mercy of the fact that people change the location of their care, their health care providers, and their insurers. So, although we have hundreds of millions of years of experience to draw on, the median duration that we're following people at present is 2 or 3 years. A closely related need is to link information in EHRs to claims data. We need that capability because EHRs only contain information provided by the host system and don't know about care provided elsewhere. Claims data contain much of that data about out-of-system care. In both instances it will likely be necessary to link confidential data held by different organizations. This poses both technical and governance challenges. FDA Sentinel and PCORI's PCORnet have addressed these topics in a white paper. We have recently begun a PCORI-sponsored effort to develop and implement governance and technical capabilities to link EHR and claims data held by different organizations.

RW: As the patient safety field emerged, we looked to infection control as maybe the only example within hospitals of an area where people had been focusing on safety, had developed standardized definitions, and had experts already in the building with time allocated to do this work. Do you find that a useful analogy?

RP: Yes! There are very useful things for the more general field. Let's tick off some of them. One is having highly trained experts who are embedded in the care delivery system. The second is emphasis on surveillance to assess outcomes, drive improvement, and measure its impact. As we mentioned a moment ago, there has been a laudable transformation to focus on more standard definitions that could be applied evenly and then to reduce the effort involved in doing that surveillance. The transformation of hospital infection prevention work (being much more proactive than it used to be, to become prevention rather than response) is important. Another thing that took me a while to appreciate in hospital infection control is understanding where the most important opportunities are. On a societal level, the burden of morbidity is really greatest because the "normal" rate is higher than it needs to be, rather than because a small fraction of hospitals or practices have outlier rates. While we might look hard at an institution that has higher-than-average surgical site infection rates, the real opportunity for improvement is getting the hospitals that have a 1.5% infection rate down to a 0.75% infection rate.

RW: As we become more electronic, it's not only gathering data in new ways for surveillance, but providing clinicians and managers immediate feedback in some ways illustrating that a patient is at risk for a bad thing happening or a bad thing is starting to happen. If you intervene, you can prevent it. How is that going? It always seems to sound good on a PowerPoint slide, but in real life you are throwing lots of alerts at people and it's hard to change behavior.

RP: You're absolutely right. Alert fatigue is an important problem. I think there are several ways to address it. One is to improve our modeling so that the predictive value of an alert is high enough. We're working on a real-time prediction model for patients admitted with pneumonia, to assist prescribing choices regarding broad versus narrow spectrum antibiotics.

The second possibility to improve safety is to build in certain kinds of requirements into the workflow without getting in the way of the efficient delivery of care. For a study we did of decolonization in ICUs, the Hospital Corporation of America built a screen into the daily nursing section of their EHR so the staff affirmed bathing with antimicrobial soap at the same time the daily bath was required. Although this didn't require new data collection, it notified and reinforced appropriate behavior and drove very high levels of compliance.

A third approach to avoiding fatigue is to shift some responsibilities away from frontline providers. For instance, major responsibility for following up women with gestational diabetes who have not had recommended postpartum glucose testing could be assigned to someone who does not have direct patient care responsibilities. These kinds of follow-up are important, but not necessarily very time sensitive.
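The first approach above, raising the predictive value of an alert, rests on a standard piece of arithmetic: positive predictive value depends on how common the event is, not only on how accurate the model is. The sensitivity, specificity, and prevalence figures below are invented for illustration and do not describe any model mentioned in the interview.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a patient who triggers the alert truly has the event."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical model (90% sensitive, 90% specific) at three event rates.
for prevalence in (0.01, 0.10, 0.30):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

With these made-up numbers, the same model yields roughly 8%, 50%, and 79% predictive value as the event becomes more common, so an alert aimed at a rare event will mostly fire on patients who are fine unless the model is far more specific or is applied to a higher-risk group.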

RW: As you look at the technological advances on the horizon, we already talked about interoperability and standardized definitions. Are there other things, for example, natural language processing or other advances in artificial intelligence that will fundamentally transform this kind of work?

RP: Natural language processing and machine learning seem quite promising eventually. At present, I think we'll have to work out the uses one by one and start with the simplest things. For instance, identifying cardiac ejection fractions in catheterization or ultrasound reports will be more straightforward than assigning class 3 or 4 heart failure status on the basis of symptoms reported in a history of present illness. We are having success using structured data for predictive modeling, for instance in identifying individuals who are good candidates for pre-exposure prophylaxis against HIV infection (PrEP).

RW: It always struck me during the Ebola time, doesn't the State Department have in its passport records the knowledge that you just came back from Liberia, and somehow if those databases got merged with a patient presenting with fever rather than these dual assessments, could we get better at this?

RP: There is no question that we could do a lot that would be really useful. But merging confidential and proprietary data remains a big challenge. This is partly for technical reasons and even more so because of legal and other considerations.

RW: We talked about the futuristic IT-driven piece of this, but we have a lot of new information today and skilled people trying to do this work. What do you think we should be doing now?

RP: We could do a much better job using existing data to identify patients who need follow-up care and don't receive it, to monitor clinical outcomes, and to support population medicine. We already mentioned finding women who require postpartum assessment of diabetes status. Another example is confirming dispensing of prescribed medication—it can be a useful way to assess adherence, especially to medications intended to be used chronically. In the arena of outcomes assessment, we've shown that both CMS and state inpatient databases based on claims provide much more complete and useful information about hospitals' rates of surgical site infection than hospitals are able to develop using their own resources because so many patients receive care for the complications at other institutions. And on the population medicine front, we've partnered with the Massachusetts Department of Public Health and three large multisite practices to use EHR data to provide more timely and detailed information about the prevalence of obesity, smoking, poorly controlled hypertension, and other measures than are available from other means. We could build systems that would make it much easier to identify patients who are at risk for problems with opioid dependence or complications of opioid therapy that would guide clinicians with better prescribing. We could do a lot in the near future using the tools we have now while we build new ones.

This project was funded under contract number 75Q80119C00004 from the Agency for Healthcare Research and Quality (AHRQ), U.S. Department of Health and Human Services. The authors are solely responsible for this report’s contents, findings, and conclusions, which do not necessarily represent the views of AHRQ. Readers should not interpret any statement in this report as an official position of AHRQ or of the U.S. Department of Health and Human Services. None of the authors has any affiliation or financial involvement that conflicts with the material presented in this report.