In Conversation With… Shantanu Nundy, MD
Editor's note: Dr. Nundy is the Director of the nonprofit organization called the Human Diagnosis Project, a unique effort to crowdsource the diagnostic process. He also practices primary care at a federally qualified health center for low-income and uninsured individuals in Washington, DC. We spoke with him about his work with the Human Diagnosis Project.
Dr. Robert M. Wachter: Tell us about the genesis of the Human Diagnosis Project.
Dr. Shantanu Nundy: It started back in medical school when I was on the wards, learning how to do my basic physical exam and history. As a second-year medical student, they always make you meet the nicest patient in the hospital. For me, it was this young woman in her 30s, a Marine who had been admitted because she, for the past year, had these debilitating nightly fevers. Despite a year's worth of medical care—she started off in Southeast Asia, then Germany, then to Walter Reed, and finally to Hopkins, where I was a student—she was about to be discharged yet again without a diagnosis when I was in the room examining her. This retired infectious disease physician bounded into the room and promptly asked her a series of questions, did some basic exam maneuvers, and said, "I know exactly what you have—and you're going to get better." And she did. For me, that was the genesis.
RW: What was it that she had?
SN: She had adult-onset Still disease. At that time, I was just a couple of years out of college. I studied computer science and engineering, and I was just baffled and stunned by what I saw. That over the course of the past year, she had seen hundreds of doctors in multiple parts of the world, yet just one physician was able to get to the crux of what was going on. So, I sat there thinking, this is a huge information problem because someone, somewhere knew what to do with her. But in subsequent years, I also realized there were a couple of other challenges.
One is the information gap of knowing who knows what, about what. The other one was that there really is no good way for doctors to collaborate. She had seen hundreds of doctors, and maybe some had been thinking about rheumatic causes of her fevers even though she was stationed in Southeast Asia, and most of them thought infectious. But how do you collaborate and not simply communicate across different doctors? The third one was how do you measure individual doctors' clinical reasoning abilities? Because, without that, a lot of the doctors that saw her were going in with that bias of "this must be infectious" (there was some weird PCR test that was positive), so I'm just going to keep digging along that path. Yet this physician, who came out of retirement to see her, hadn't seen the electronic health record and hadn't talked to the team. He came in without that framing bias.
Long story short, I was a second-year medical student, and I had this crazy idea to start a website called Wikimedicine. The idea was that there ought to be a place where doctors can post cases in a way that's searchable, so that if I saw a patient like this I could type in her basic medical problems, her age, and other things and be able to find similar cases. That promptly failed because there were large parts of the problem I didn't understand. Almost a decade later, one of my good friends was starting this thing, which at that time was called Step Four, and approached me. I said I tried to work on something like this, and that eventually became the Human Diagnosis Project.
RW: Why did your original wiki approach fail?
SN: It failed on a couple of levels. One is I didn't have a sense of end-user design, and that was critical to figuring out how to get doctors to contribute their cases. The second one is I had a simple idea around how you index these cases by different ages, genders, and other things. But I didn't realize that in order to figure out if one case is similar to another case, you need a data structure that connects the key findings ("graph-based") underneath. The hardest part is determining the similarity between cases. I didn't build this with the big data, machine learning approach that was necessary to find cases similar to the patient I was seeing. Finally, I didn't think about resources. I was just thinking I'm going to put up a website and people are going to use it.
RW: In Silicon Valley, every successful entrepreneur has a great failure story. You've given us the micro lessons. What macro lessons from that failure have been important to you in your career?
SN: One is that a lot of physicians I see, in academia or other jobs, do things off the side of the desk. The classic model in academia is someone asks you to work on a cool project, and you say sure. Someone asks you to mentor, and you say sure. One thing I learned is that everything takes pi plus 2x longer than you think it's going to take. My joke is that if you tell your wife, "Just 5 more minutes and I'll finish this email," you end up going downstairs 17 minutes later, which is pi plus 2x longer. You have to give yourself the opportunity to take a big bet and focus on it. I was trying to develop Wikimedicine while I was a medical student doing 100 different things. Having the time and the space to work on something as hard as this was a key lesson.
RW: As a computer science major jazzed about diagnostic errors, I would have thought the natural path would have been to create a bigger and better artificial intelligence (AI), computer-driven diagnostic engine on the shoulders of DXplain or all the ones that were tried in the past and failed. Why did you have this instinct to approach it in a more social, wiki kind of way?
SN: Two reasons. One is we just think it's the right thing to do. Such a system, if we could do what all these other organizations claim to be doing or are interested in doing, ought to be something that's available to every person on earth. And that's just fundamental to our belief.
The second one came from seeing Deep Blue beat Garry Kasparov in chess. Look at what they used as a training set: chess matches played between different grand masters at different levels all around the world. They had a data set of every match, every move on every piece, what the next move was. If you think about what most of these other organizations are doing with respect to just buying up EHR companies or claims datasets, basically it's the equivalent of telling you who won every chess match going back the last 50 years and saying, "Now can you play chess." You'd say, no, of course not. You cannot learn how to reason on sparse data that way.
Similarly, that is what is happening with EHR-based datasets. All you can say is that on this date, Dr. Nundy saw a patient and diagnosed this. It's the end result of the diagnostic process. It's not every piece on every board. So the second part of what drove us to our approach is realizing that the other process doesn't work. The way I reason through the step-by-step process I go through when I'm working up a patient is simply not in the dataset. These other organizations say their NLP [natural language processing] is better, but I don't care how good your NLP is. If the data is not in the notes, there is no way you're going to extract the answer.
RW: My sense is that you're democratizing it and using the wisdom of crowds, and the crowds aren't all chess masters. Whereas, Deep Blue's approach wasn't to look at every chess game of every schlub. It was to look at the greatest chess players in the universe and see how they reason. How did you reconcile that tension?
SN: One reason is that, in medicine, attending physicians have a lot of expertise. Obviously, some are significantly better than others. But in many ways when it comes to health and not just health care, physicians are at the top of their field and have a tremendous amount of expertise—and that expertise is incredibly distributed. In academia, we have this notion that so and so at such and such place is the best. But the reality is we don't know. Some of the community-based settings have lots of experience, lots of insights, lots of strong practitioners. It's hard to know a priori where they are.
Point number two is foundational to our approach of what we call collective intelligence. The difference between collective intelligence and crowdsourcing is that in crowdsourcing you sum the input from the different people and give them all equal weight. Collective intelligence says we should add up all those people in a weighted average. We should weight their input on this clinical case based on their topical expertise pertinent to this case.
If we could design such a calculator, then we can create this more democratized process but maintain a really high level of quality. That's how other systems work. If you look at Wikipedia, how is it that anyone can edit the Theory of Relativity page yet it's still so good that Encyclopedia Britannica doesn't exist anymore? It's because they have an internal system that tells them who knows what about what. Those people are allowed to not only suggest edits but accept edits to the Theory of Relativity page. It's the same concept of wisdom from the crowds, but doing so in a smart way.
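The contrast drawn above between equal-weight crowdsourcing and an expertise-weighted average can be sketched in a few lines. This is only an illustration under stated assumptions, not the Human Diagnosis Project's actual scoring: the function names, the contributor IDs, the expertise numbers, and the `default_weight` for unknown contributors are all hypothetical.

```python
from collections import defaultdict


def crowdsource(votes):
    """Equal-weight tally: every contributor's diagnosis counts the same."""
    if not votes:
        return {}
    tally = defaultdict(float)
    for _contributor, diagnosis in votes:
        tally[diagnosis] += 1.0
    total = sum(tally.values())
    return {dx: score / total for dx, score in tally.items()}


def collective_intelligence(votes, expertise, default_weight=0.1):
    """Weighted tally: each vote counts in proportion to the contributor's
    (hypothetical) expertise score for the topic of this case."""
    if not votes:
        return {}
    tally = defaultdict(float)
    for contributor, diagnosis in votes:
        # Contributors with no track record get a small default weight (assumption).
        tally[diagnosis] += expertise.get(contributor, default_weight)
    total = sum(tally.values())
    return {dx: score / total for dx, score in tally.items()}


# Made-up example: three contributors suspect infection, one suspects Still disease.
votes = [
    ("a", "infection"),
    ("b", "infection"),
    ("c", "infection"),
    ("d", "adult-onset Still disease"),
]
# Hypothetical topical expertise scores for this kind of case.
expertise = {"a": 0.2, "b": 0.2, "c": 0.2, "d": 0.9}

equal = crowdsource(votes)
weighted = collective_intelligence(votes, expertise)
```

With these made-up numbers, the equal-weight tally favors the majority answer, while the weighted tally favors the single contributor with the strongest track record on this kind of case.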
RW: Since your first effort in this area didn't go well, tell us how you made some of the big decisions when you decided to reboot this, in terms of how you were structuring the organization and what it was designed to do.
SN: Part one was realizing that if we were successful in answering the specific problem of the patient I saw when I was a medical student, what we would basically be collecting is the equivalent of a global positioning system or a map. A map enables other datasets, so that I can figure out how to get from point A to point B the fastest way. Similarly in health care, we have lots of data but are missing the fundamental base layer of data, which is collective insight that allows us to use evidence-based medicine, your Fitbit, and your genetics to get from point A to point B. So one realization was just how big this was. And that led to the name "The Human Diagnosis Project."
Second was realizing how critical having a truly interdisciplinary approach was. The project is set up with six disciplines that make it possible: medicine, community, operations, design, engineering, and commerce. All six are equally important in the way we're building the project. Rather than the [clinical-first] approach I took before, it's realizing that without designers to figure out how you get doctors to care and want to contribute when they're so busy; without engineers to build that graph-based data structure and make it into a simple tool; without the operations people to support what is now the largest open medical project in the world—it wouldn't work.
RW: For those of us who haven't used it either as an end user or as a contributor to the database, take us through an hour in the life of those two people, someone who is trying to contribute to this effort and how an end user might access it and what they would see.
SN: First of all, it's a website or it's an application. It's very easy to use. Like many other applications, people self-onboard and start to use it themselves. There are two primary use cases. One is medical education, which is most of the usage on the system. The other is case collaboration.
For medical education, people are doing one of two things. One is posting a teaching case: "I saw this really interesting case that I learned a lot from, and I'm going to put it into this project so other people can learn from it and so the system can collect the data." The second activity is to solve a two-minute long clinical problem-solving case. They get a score based on how well they did on different dimensions of clinical reasoning.
On the clinical collaboration side, I use it in my clinic. I share a case, a clinical presentation, my clinical question. Then other people can provide input on what they think the right approach is. It's like a Q&A around that clinical question.
RW: It sounds like one of the things you tripped up on in the beginning was in figuring out how to motivate busy clinicians to do this work. What turns out to be the answer?
SN: The answer was told to me by Maureen Bisognano. She said, "I think I know what you guys have figured out. There are the three 'Ms' that motivate any professional. It's meaning, it's membership, and it's mastery." For us, the meaning is the fact that I can see the impact I'm having both on other doctors when I help them with their cases as well as the impact on the overall project. The way we do this is we score with a system that we call "impact." After I solve a case, I see my impact go up, and it allows me to understand what impact I'm having.
The second is membership, which is the fact that I belong to something that matters. We call everyone a contributor. We don't call them users. We describe ourselves as a movement, not as a company. People have this sense of I'm part of something big, and the success of this big thing is my success. The third is mastery. People want to do something and feel like they're getting smarter or better because of it. That gets to some of the learning aspects. We're giving them feedback and helping them understand how their clinical reasoning is getting better with every case. Those are the secret sauce of how we've built the engagement around the system.
RW: You mentioned earlier that it's not purely wisdom of crowds in an egalitarian way, but some people's opinions carry more weight than others. How do you adjudicate that in the case of Still disease you saw as a medical student? Would someone with board certification in rheumatology or somehow a predefined knowledge of Still disease, published papers on it, have an advantage in the weighting of his or her opinion? Or it's all internal to what they've done on the site before?
SN: Today, it's all internal based on the system. That's how we've created something so egalitarian. When you enter Human Dx, we don't know if you're a doctor or not. We don't know if you're board certified or not. We don't know all the titles and degrees and papers that you have. But as you use the system and solve reference cases or teaching cases, we learn. We understand where you have strengths and weaknesses, and we're understanding who knows what about what. That's a way that we can build a project that's both high quality and egalitarian. We can have people in other countries where there aren't board scores and certifications, and we could even have patients—particularly around elements of their conditions that they understand best, like self-management of IBD [inflammatory bowel disease]. We can invite those individuals to contribute their knowledge while simultaneously making sure that we maintain the highest level of data quality and accuracy.
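One way to picture learning "who knows what about what" purely from activity on the system is a per-topic track record built from solved reference cases. The sketch below is an assumption-laden illustration, not the project's method: the `ExpertiseTracker` class, its smoothing prior, and the topic labels are all hypothetical.

```python
from collections import defaultdict


class ExpertiseTracker:
    """Estimates per-topic expertise from reference-case results alone,
    with no knowledge of titles, degrees, or certifications."""

    def __init__(self, prior_correct=1.0, prior_total=2.0):
        # Smoothing prior (assumption): a new contributor starts at 0.5.
        self.prior_correct = prior_correct
        self.prior_total = prior_total
        self._correct = defaultdict(float)
        self._total = defaultdict(float)

    def record(self, contributor, topic, correct):
        """Log the outcome of one solved reference or teaching case."""
        key = (contributor, topic)
        self._total[key] += 1.0
        if correct:
            self._correct[key] += 1.0

    def weight(self, contributor, topic):
        """Smoothed fraction of cases solved correctly in this topic."""
        key = (contributor, topic)
        solved = self._correct.get(key, 0.0) + self.prior_correct
        seen = self._total.get(key, 0.0) + self.prior_total
        return solved / seen


tracker = ExpertiseTracker()
for _ in range(8):
    tracker.record("contributor_1", "rheumatology", correct=True)
for outcome in (True, False, False, False):
    tracker.record("contributor_1", "infectious disease", correct=outcome)

rheum = tracker.weight("contributor_1", "rheumatology")
infectious = tracker.weight("contributor_1", "infectious disease")
newcomer = tracker.weight("contributor_2", "rheumatology")
```

The smoothing keeps a contributor with one lucky answer from outranking someone with a long, strong record, which is one simple way to stay open to anyone while still protecting data quality.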
RW: If you had access to a database that you thought credibly told you that contributor A is really an expert in a relevant area and contributor B was not, would you not integrate that into the scoring system? Or is it a core principle that everybody starts out equal?
SN: That's a great question. It's not a core principle. Over time, we'd love to be able to ingest adjacent datasets to understand that. Our whole approach is using all available data. It's humans plus machines plus data. We haven't taken a completely human-based approach, and that's true with everything that we've done. The best answer is going to come from aggregating all these different information sources together.
RW: Who do you think your competition is? Is it UpToDate? Is it the Isabels of the world? Or the Watsons of the world? Is it doctors who think they don't need any help? Who are you trying to win over?
SN: I think it's everyone and no one. There are at least three categories that we think about. First, we're in many respects competing for doctors' time and attention. We're particularly competing against doctors' online time and attention. Of organizations trying to do that, some are nonmedical like Facebook and some are medical like SERMO or Figure 1 or Doximity. That's one category, those social online medical communities. The second is the decision aids. You mentioned DXplain; there are also other systems like eConsult. So a lot of different systems are competing with more of the use case around clinical decision-making or clinical collaboration. The third is competitors around what we're ultimately building around this graph-based data structure, machine learning, AI, or however you want to term it. What's exciting for us is as an open project, we generally think that there is a place for nearly all these organizations in an ecosystem. We want to partner with as many of these organizations as possible to get to the answer or the care that patients deserve faster.
RW: Do you see ultimately that you're going and approaching large health care organizations and connecting with their personnel through them and they link the tool to their EHR as a kind of enterprise decision support? Or is this all at the level of individuals?
SN: We want to move into more of an enterprise approach. Where we're focused now is the United States safety net, and that gets to a little bit about our mission and the needs of the underserved. But we would love to serve every large hospital system, and not only have this individual user but also have "enterprise" users.
RW: Give us a sense of the numbers. How many cases, how many users, how much growth have you had?
SN: We started building the community 2 and a half years ago. It was me and a couple of friends from medical school. We recently crossed the 7000 physicians in over 70 countries mark. We're really inspired by Linux, the largest operating system in the world. There are around 15 million software engineers in the world and similarly around 15 million doctors in the world. To date, Linux has had around 15,000 individuals contribute to the Linux code base. When we think about our 7000 physician mark, that's the analog we think about.
RW: How much wind in the sails have you gotten from the emphasis over the last 5 years on diagnostic errors within the field of patient safety?
SN: It has been tremendous. As a young nonprofit organization, the first couple of grants were in large part due to those tailwinds, including the National Academy of Medicine report on diagnostic errors. It's not only from a funding perspective; how we've prioritized, where we should demonstrate the value, and the evidence behind the technology are all driven by the patient safety and diagnostic safety agenda.
RW: Take us forward 5 or 10 years. You've been successful beyond your dreams. Sketch out what it looks like. How is it used? What does the tool look like? How is it implemented?
SN: One of the areas we're most focused on is closing the specialty access gap for the 30 million uninsured Americans that rely on safety net clinics nationally. We've set a goal to close the specialty access gap for our nation's underserved. We think that is the first national-scale application of our work.
RW: Does that mean the primary care doctor in a rural community who doesn't have access to a gastroenterologist logs on and asks the question or finds the right diagnosis and an answer? Or does that mean the patient with a set of symptoms and signs logs on and obviates the need to see a doctor?
SN: It's the former. The idea is to create this as a tool for primary care doctors in the front lines of safety net practices to be able to access the insight they need to provide a higher level of care to their patients who cannot otherwise access it.
RW: You've mentioned one of your epiphanies from your early experience was the need for focus. Is this what you do 100% of your time or do you still practice?
SN: I do see patients in the safety net here in Washington, DC, every Friday. I don't miss that for anything. But otherwise, this is my full-time work, between this and my two little girls at home.
RW: When you're in your practice on Fridays, how has this work changed the way you practice?
SN: In a very tangible way for us. What's fun from an innovation perspective is I have all these brilliant ideas Monday through Thursday. On Friday, when I'm in clinic and three patients late and have someone with complex social issues, it reminds me that I have to go back on Monday and change the design we're working on. So it has been really important in terms of creating that focus to bridge both worlds, and also heartening to use this to start serving my patients. We've had a number of my patients benefit from the access the system has been able to provide and cannot wait to bring this to more clinics and providers.
RW: You talked about how you've been both inspired and helped by the diagnostic safety movement and the National Academy of Medicine report. What do you see as the role of your project in the broader field of patient safety, in that you're pioneering an approach that's different from anything we've done in any other areas of safety, and maybe even in the areas of quality and evidence-based practice?
SN: Fundamentally, what we're doing is very human. In many respects, we've encapsulated the best of what I had in my medical training, which was rounding. A group of physicians, some with more expertise than others, hunkering down at a patient's bedside trying to figure out what's the right answer. We've used technology to enable that clinical collaboration and decision-making process in a way that the majority of doctors around the world can access in a busy clinical practice.
Our focus right now with some research we're doing at UCSF and elsewhere is around the diagnosis decision. Because it's hard. We don't talk about it, but when we're in a room with a patient and they're coming in with this, that, and the other thing—we're wrinkling our eyebrows and we have like 5 minutes. That is not easy to do. It's using technology to bring more people, more expertise, to bear in terms of these really hard decisions, like deciding whether a patient should be palliative or whether you should continue aggressive oncology care. So many decisions are at the heart of patient safety gaps. The approaches I see today are approaches that are around the doctor, but not with the doctor. Number two is making things more human, not less. Bringing more expertise to bear, not additional levels of automation or checklists or decision trees. Inasmuch as that approach is validated, we believe it can be used for other problems across the patient safety spectrum.