
Interpreting the Patient Safety Literature

Kaveh G. Shojania, MD | June 1, 2005


Five years ago, a widely publicized randomized trial reported a 90% reduction in the incidence of contrast dye-induced renal failure when patients were pretreated with acetylcysteine, an agent previously used to treat acetaminophen overdoses and bronchitis.(1) When I asked a nephrologist colleague what he thought of this dramatic result, he replied: "Nothing works that well."

Time would prove my colleague correct. In the 5 years since the original study, 19 more trials have appeared.(2) Two reported results in the same ballpark as the first, but most suggested more modest benefits or no effect at all. Combining the results of these trials produces a reduction in risk of contrast nephropathy of not 90%, but 27%, a risk reduction of only borderline statistical significance.(2)

No single principle can encapsulate all of the interpretive issues for a body of literature, but nothing works that well comes close. To allow some wiggle room for the discovery of penicillin and other occasional quantum leaps in medical care, we can tone it down a little: most things don't work that well. The relevant corollary is that any study reporting dramatic improvements in any major clinical outcome is probably flawed. When clinical interventions do work, they tend to bring very modest gains: relative improvements of 20% to 40% are often cause for celebration, and absolute improvements in the 5% to 10% range represent major advances in care. If an article reports improvements in these ranges, scrutinize it closely. If the improvements exceed these ranges, expect subsequent studies to show less impressive effects, or even no benefit.
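The difference between relative and absolute improvements is easy to lose when reading an abstract. The following is a minimal sketch of the arithmetic, not taken from any study cited here: the function name `risk_reductions` and the event counts are hypothetical and chosen only to land inside the "cause for celebration" ranges described above.

```python
def risk_reductions(control_events, control_n, treated_events, treated_n):
    """Return (absolute risk reduction, relative risk reduction, number needed to treat)."""
    control_risk = control_events / control_n
    treated_risk = treated_events / treated_n
    arr = control_risk - treated_risk          # absolute risk reduction
    rrr = arr / control_risk                   # relative risk reduction
    nnt = 1 / arr if arr > 0 else float("inf") # number needed to treat
    return arr, rrr, nnt


# Hypothetical trial (illustrative numbers only): 20 of 100 control patients
# and 14 of 100 treated patients experience the outcome.
arr, rrr, nnt = risk_reductions(20, 100, 14, 100)
print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

With these made-up counts, the relative improvement is 30% and the absolute improvement is 6 percentage points: a result that would count as a major advance, yet one that sounds far less dramatic than a 90% reduction.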

If such modest gains are the best we can hope for in conventional clinical research, consider the obstacles facing patient safety research. We have relatively superficial understandings of the causes of even common errors and adverse events, never mind rare but catastrophic ones. The interventions designed to reduce these events are often intrinsically complex and must be implemented in the messy setting of routine care, rather than the highly controlled environments of most clinical trials. Finally, patient safety research is funded with only a fraction of the dollars available to traditional biomedical research.(3) So, if you think a particular patient safety intervention is going to save large numbers of lives or reduce serious injuries substantially, well, then there's a lovely piece of property I'd like to show you.

The Users' Guide to the Medical Literature series (originally published in the Journal of the American Medical Association and now available in book form [4]) presents many of the key issues facing the interpretation of different study types. Rather than attempting to compress this comprehensive series into a brief commentary, the discussion below highlights issues particularly relevant to producers and consumers of patient safety research.

The Design of the Study

The before-after study is the most commonly encountered design in quality improvement research (5) and likely in patient safety as well. Unfortunately, many potentially relevant changes may occur between "before" and "after" periods of measurement. For example, consider a 2001 study in which rotating the antibiotics used in an intensive care unit was associated with a six-fold reduction in infection-related mortality during the intervention period (remember: most things don't work that well).(6) Is this cause and effect? We know little about other infection control programs introduced at the time, changes to ICU staffing or organization, changes in the patient population, or other improvement trends that may have already been underway, producing so-called secular trends.

One alternative to the before-after design is to collect data from a control site contemporaneously. For instance, in a well-known study (7) of pharmacists on clinical rounds in the ICU, investigators collected before-after data from units within the hospital that carried out this intervention as well as control units that did not. This study provided much more compelling evidence than a simple before-after study would have, because it eliminated the possibility that other changes in the hospital could account for the improvements seen in the intervention units. Another alternative is to add more time points. For instance, a study of an intervention to improve compliance with hand washing provided annual compliance data for several years before and after implementation of the hand hygiene campaign. The observed trend provides much more compelling support for the intervention than would a study with one data point before and one after the campaign.(8)
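One way to see why a contemporaneous control helps is to subtract the change in the control units from the change in the intervention units, an approach often called difference-in-differences. The sketch below is an illustration under stated assumptions, not the analysis used in any of the cited studies: the function name and all event rates are hypothetical.

```python
def difference_in_differences(intervention_before, intervention_after,
                              control_before, control_after):
    """Change in the intervention group minus the change in the control group."""
    change_intervention = intervention_after - intervention_before
    change_control = control_after - control_before
    return change_intervention - change_control


# Hypothetical adverse event rates per 1000 patient-days (illustrative only).
intervention_before, intervention_after = 10.0, 4.0
control_before, control_after = 10.0, 8.0

# A simple before-after comparison credits the intervention with the whole drop.
naive_change = intervention_after - intervention_before

# The controlled comparison removes the drop that happened everywhere (the secular trend).
adjusted_change = difference_in_differences(
    intervention_before, intervention_after, control_before, control_after
)

print(f"Naive before-after change: {naive_change:+.1f}")
print(f"Change net of control units: {adjusted_change:+.1f}")
```

In this invented example, a third of the apparent improvement would have occurred without the intervention. The same logic underlies adding more time points: a long run of "before" measurements shows whether the trend was already heading downward.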

Could the intervention have changed the data rather than changing care?

Sometimes implementation of an intervention changes the measurement of the study's outcomes. Consider a study of the impact of a medical emergency team on survival from cardiac arrest on general hospital wards.(9) Using a before-after design, the study reported an absolute reduction in mortality from "cardiac arrest" of 22%. Most things don't work that well.

The study focuses on patients with "calls for cardiac arrest." Before the intervention, cardiac arrests meant literal cardiac arrests. After the intervention, they referred to any event that resulted in the emergency team being called—low blood pressure, agitation, or a variety of other problems short of cardiac arrest. That many more patients with these pseudo-arrests should survive compared with patients who have no peripheral pulse because of a malignant ventricular rhythm should come as no surprise.
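A small calculation shows how a broader definition alone can produce a large apparent benefit. This is a sketch with invented counts, not data from the study discussed above; the function name `mortality_rate` is likewise hypothetical.

```python
def mortality_rate(deaths, calls):
    """Fraction of emergency calls that end in death."""
    return deaths / calls


# Hypothetical "before" period: every call is a literal cardiac arrest.
before_rate = mortality_rate(deaths=40, calls=50)

# Hypothetical "after" period: the same 50 true arrests with the same 40 deaths,
# plus 50 calls for lesser problems (low blood pressure, agitation), 5 of which end in death.
true_arrest_rate_after = mortality_rate(deaths=40, calls=50)
after_rate = mortality_rate(deaths=40 + 5, calls=50 + 50)

print(f"Before, all calls: {before_rate:.0%}")
print(f"After, all calls: {after_rate:.0%}")
print(f"After, true arrests only: {true_arrest_rate_after:.0%}")
```

Here mortality among true arrests is unchanged, yet mortality per call falls sharply because the denominator now includes patients who were never in arrest.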

Even if the definition of an outcome does not change, its measurement may change as a result of an intervention. For instance, computerized order entry systems and greater participation of pharmacists in clinical activities may play roles in detecting medication errors, not just reducing them. Well-designed medication safety studies have taken care to collect data in the same way before and after implementation of the intervention, avoiding having the intervention do double duty as the main data source.(7,10)

Do the outcomes involve subjective judgments?

Clinicians reviewing medical records often disagree in labeling cases as adverse events from medical care, and even more in judging the preventability of such events.(11) It is crucial, therefore, that studies focused on outcomes involving such judgments use multiple reviewers and report the degree to which they agreed. For instance, in a recent study of medication reconciliation (12), the significance of the reported outcome—unintended discrepancies between discharge and admission prescriptions—depended on the identification of discrepancies with the potential to cause harm. The investigators had three clinicians independently review each outcome, and found that they agreed only 26% more often than would have been expected by chance.
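Agreement beyond chance is usually summarized with a kappa statistic. The sketch below computes Cohen's kappa for two reviewers as a simplified illustration; the study described above used three reviewers, which would call for a multi-rater variant, and the ratings here are invented rather than drawn from that study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two reviewers rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both reviewers must rate the same non-empty set of cases.")
    n = len(ratings_a)
    # Observed agreement: share of cases given the same label by both reviewers.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if each reviewer assigned labels independently
    # at his or her own overall rates.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both reviewers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 10 discrepancies: "H" = potentially harmful, "N" = not harmful.
reviewer_1 = ["H", "H", "H", "H", "H", "N", "N", "N", "N", "N"]
reviewer_2 = ["H", "H", "H", "N", "N", "H", "H", "N", "N", "N"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the reviewers agree on 6 of 10 cases, which sounds respectable until one notes that 5 of 10 agreements would be expected by chance alone. Kappa captures that gap.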

Given this relatively poor agreement, how does one assign the final outcome for each case: by majority rule or by discussion to achieve consensus? This particular study (12) chose the latter, which sounds fine but can be subtly problematic, since reviewers of these cases might share common biases. For instance, researchers might push each other to find potentially harmful errors wherever they look, thereby legitimizing the study and its importance. Conversely, clinicians with no research agenda might be prone to undercall errors, especially if they know the cases come from their own department. One solution is to use reviewers who do not know why the study is being conducted or, in the case of clinical trials, do not know whether the records came from the control or the intervention group. A recent study of the effect of a new rotation schedule on errors due to fatigue took this precaution.(13)

How strong is the connection between the study's outcomes and true patient outcomes?

Only a minority of patient safety studies report impacts on morbidity or mortality. Typically, studies report surrogate outcomes (e.g., error rates, changes in safety culture) or process measures, such as hand-washing rates or compliance with a protocol for surgical site identification. In both cases, the key question is how tightly these surrogate outcomes or process measures are linked to clinical endpoints. For instance, in the previously mentioned study of medication reconciliation (12), the outcome was "potential for harm." Even putting aside the issues of interrater reliability, the question remains: how many adverse clinical events would result from, say, 100 unintended discrepancies? The example given to illustrate the most significant class of discrepancy was a patient mistakenly discharged on the antihypertensive agent ramipril. If 100 patients were discharged on this medication, some would have the error caught by their regular physician, others would experience asymptomatic reduction in blood pressure or asymptomatic increases in serum potassium or creatinine. Perhaps some might become symptomatic. But how many? The question is particularly important because only 6% of the discrepancies fell in this "serious" category to begin with.

In some cases, a surrogate outcome has no established relationship to patient outcomes. For instance, various tools for assessing patient safety culture have been described in the literature.(14) But the degree to which improvements in "safety culture" (as measured by any of these instruments) produce actual improvements in safety remains to be established. The lack of measured patient outcomes does not invalidate such research entirely: such a link may ultimately be established and seems plausible enough. But this limitation should engender a healthy sense of skepticism when reading studies reporting this as the key outcome.

Could the intervention have had adverse effects not reported in the study?

Any intervention can produce unintended effects.(15) We take this concept for granted with new clinical interventions such as drugs, but it has received less attention in patient safety research. For instance, interventions to reduce work hours of physician trainees might result in more frequent hand-offs and, therefore, an increase in errors due to greater discontinuity in care.(16) Thus, the important outcomes in a clinical trial of a new intern rotation schedule include not just errors made by interns in the trial but also errors made by other clinicians caring for the same patients.(13) Future studies of computerized order entry and bar coding as error-reduction strategies will hopefully report rates of errors created, not just errors averted, particularly since recent qualitative studies have identified new opportunities for error associated with these technologies.(17,18)


Those who conduct and interpret patient safety research face many of the same challenges faced by their colleagues in more traditional clinical research.(4,15) However, the intrinsic complexity of many safety interventions, the frequent use of before-after designs, and the subjective nature of many important outcomes create additional pitfalls. In the midst of this interpretive complexity, most things don't work that well provides a surprisingly helpful guiding principle.

Kaveh G. Shojania, MD
Canada Research Chair in Patient Safety and Quality Improvement
Assistant Professor of Medicine
University of Ottawa


1. Tepel M, van der Giet M, Schwarzfeld C, Laufer U, Liermann D, Zidek W. Prevention of radiographic-contrast-agent-induced reductions in renal function by acetylcysteine. N Engl J Med. 2000;343:180-184.

2. Nallamothu BK, Shojania KG, Saint S, et al. Is acetylcysteine effective in preventing contrast-related nephropathy? A meta-analysis. Am J Med. 2004;117:938-947.

3. Wachter R, Shojania K. Internal Bleeding: The Truth Behind America's Terrifying Epidemic of Medical Mistakes. New York, NY: Rugged Land; 2004.

4. Guyatt G, Rennie D, eds. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, IL: AMA Press; 2002.

5. Shojania KG, Grimshaw JM. Evidence-based quality improvement: the state of the science. Health Aff (Millwood). 2005;24:138-150.

6. Raymond DP, Pelletier SJ, Crabtree TD, et al. Impact of a rotating empiric antibiotic schedule on infectious mortality in an intensive care unit. Crit Care Med. 2001;29:1101-1108.

7. Leape LL, Cullen DJ, Clapp MD, et al. Pharmacist participation on physician rounds and adverse drug events in the intensive care unit. JAMA. 1999;282:267-270.

8. Pittet D, Hugonnet S, Harbarth S, et al. Effectiveness of a hospital-wide programme to improve compliance with hand hygiene. Infection Control Programme. Lancet. 2000;356:1307-1312.

9. Buist MD, Moore GE, Bernard SA, Waxman BP, Anderson JN, Nguyen TV. Effects of a medical emergency team on reduction of incidence of and mortality from unexpected cardiac arrests in hospital: preliminary study. BMJ. 2002;324:387-390.

10. Bates DW, Leape LL, Cullen DJ, et al. Effect of computerized physician order entry and a team intervention on prevention of serious medication errors. JAMA. 1998;280:1311-1316.

11. Hayward RA, Hofer TP. Estimating hospital deaths due to medical errors: preventability is in the eye of the reviewer. JAMA. 2001;286:415-420.

12. Cornish PL, Knowles SR, Marchesano R, et al. Unintended medication discrepancies at the time of hospital admission. Arch Intern Med. 2005;165:424-429.

13. Landrigan CP, Rothschild JM, Cronin JW, et al. Effect of reducing interns' work hours on serious medical errors in intensive care units. N Engl J Med. 2004;351:1838-1848.

14. Nieva VF, Sorra J. Safety culture assessment: a tool for improving patient safety in healthcare organizations. Qual Saf Health Care. 2003;12(suppl 2):ii17-23.

15. Shojania KG, Duncan BW, McDonald KM, Wachter RM. Safe but sound: patient safety meets evidence-based medicine. JAMA. 2002;288:508-513.

16. Laine C, Goldman L, Soukup JR, Hayes JG. The impact of a regulation restricting medical house staff working hours on the quality of patient care. JAMA. 1993;269:374-378.

17. Koppel R, Metlay JP, Cohen A, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA. 2005;293:1197-1203.

18. Patterson ES, Cook RI, Render ML. Improving patient safety by identifying side effects from introducing bar coding in medication administration. J Am Med Inform Assoc. 2002;9:540-553.

This project was funded under contract number 75Q80119C00004 from the Agency for Healthcare Research and Quality (AHRQ), U.S. Department of Health and Human Services. The authors are solely responsible for this report’s contents, findings, and conclusions, which do not necessarily represent the views of AHRQ. Readers should not interpret any statement in this report as an official position of AHRQ or of the U.S. Department of Health and Human Services. None of the authors has any affiliation or financial involvement that conflicts with the material presented in this report.