'Things get built and forgotten about': The challenging work of maintaining AI tools in clinical settings

With AI being all the buzz in healthcare today, many questions about the technology still remain. Do clinicians understand and trust AI tools? Are algorithms accurate and free from bias? What guardrails are there to protect patient privacy?

At the 2024 SXSW conference, the technology garnered considerable attention across sessions. One panel on the future of AI in healthcare, which drew a particularly large crowd, covered everything from concerns to potential to what's needed for safer, more integrated and more impactful systems.

Claire Novorol, M.D., Ph.D., co-founder and chief medical officer of Ada Health, emphasized that many general-purpose AI tools are not trained on healthcare data. They have meaningful potential, but they also come with more risk, she said.

As a consumer-facing app doing clinical assessments, Ada is more than just an administrative AI tool. It supports clinical decisions.

“That means you need an enormous focus on safety, on quality,” Novorol said on the panel. “That’s a huge mountain to climb and needs years of dedicated focus.” 

Jesse Ehrenfeld, M.D., president of the American Medical Association (AMA), cautioned that HIPAA applies only to covered entities and their business associates—a category that excludes many consumer-facing private companies. One company, which he declined to name, claims its app is HIPAA-compliant. That claim is misleading, Ehrenfeld noted, because the company was never bound by HIPAA in the first place.

“Companies should not do that, and yet we’re starting to see examples of that unfortunately on the consumer-facing side," he said during the panel. 

Nevertheless, patients are increasingly excited about AI, Alex Stinard, M.D., regional medical director of Envision Healthcare, said on the panel. When Stinard, who specializes in computer vision and large language models, first wore Google Glass to see patients a few years ago, many were confused by it.

Today, when he tells people he’s using a tool like an AI scribe, he sees a lot of excitement from patients and physicians. “The buzz of AI hitting the community makes everybody super excited about it,” Stinard said on the panel. “Now that it’s getting incorporated into their life, they see the actual true value of it.”

Fierce Healthcare caught up with AMA's Ehrenfeld after the panel about benchmarks for AI tools, the challenge of maintaining them and how to be transparent with patients. 

This interview has been edited and condensed for clarity.

Fierce Healthcare: The Open Source Imaging Consortium Data Repository, a collaborative with Microsoft and PwC, is collecting de-identified medical imaging to predict the progression of a rare disease. Something they've told me in the past is that it's difficult to get providers to share that kind of data. How do you think about the need for that type of collaboration? 

Jesse Ehrenfeld, M.D.: There's a need. And one challenge is there are many consortiums, many centers, many industry players that are all trying to do the same thing. And it makes it really challenging to actually understand the efficacy of a given tool. 

If there were some framework where that problem was solved, where we thought we really had ground truth in a data set that wasn't biased, that was representative—that would be really helpful. It doesn't exist. Could somebody build that? Maybe. Is there a role for the federal government to take that on? I don't know.

But as the FDA figures out their regulatory framework for tools, knowing what those tools are benchmarked against is going to be really important for professionals to understand how well they work, as well as for patients.

The question you asked—foundationally, what are these things being tested against—is a really important one, as is how we get data into those systems. I think there's been concern around how data is getting monetized. There's been some pretty negative press when patient data is sold without patients’ explicit consent. So there's a lot to be sorted out as we solve this problem of having data repositories that can truly allow us to not just develop, but also benchmark, these tools.

FH: Say a health organization is implementing an AI tool, how should they think about collecting data on how it's impacting patient safety or just user feedback?

JE: I saw this in several of the health systems where I've worked. It's really easy to build an algorithm, to build a clinical decision support pop-up to remind me to do a thing. Things get built and forgotten about. And then they just run forever. And we can't allow that to happen—particularly when things change, or an algorithm evolves or no longer works. I built software. I deployed software. And it ran fine. But unless you are very intentional about monitoring the performance of these tools over time, they degrade.

We had a lab system change, and they changed the label on a glucose value in one system, and it broke everything we built. And nobody realized it for three weeks. Those examples happen all of the time in the pretty routine, state-of-the-art clinical decision support development space that's not even AI.

Imagine this problem extending across AI tools. There has to be an intentionality and some diligence in making sure that we have an understanding of how these tools are holding up over time as they're deployed in the real world. 

FH: Do you think that's on the provider's internal tech team, or that's on the company that's providing that product?

JE: I think it depends on who creates the tool. For most of these tools, performance is going to be shaped by the conditions of deployment, and that is often something the companies aren't going to have control over. But it takes a lot of resources to do that. It takes a ton of time and money and energy to deploy something, and then just doing the work to understand how it's actually playing out is almost as resource-intensive.

FH: It sounds like you're saying a lot of these tools that are being deployed ultimately aren't effective long-term because these things aren't being tracked.

JE: I don't think we know. Nothing is worse than having pop-ups or alerts that are meaningless. It drives clinicians crazy. We have too much of that already. I remember when we switched electronic health record vendors at an institution where I used to work—there were thousands of rules firing in the background, and nobody had cataloged them.

Nobody really knew all the things that were going on until we tried to figure out, well, what do we actually need to rebuild or carry forward? And there was all this stuff that just didn't make sense, but obviously nobody had been keeping track.

FH: With AI tools that help with clinical decision-making, do you have a concern about skill atrophy? Like maybe physicians lose some of their own skill sets?

JE: I don't. At the end of the day, there will always be a moment when the power might go off and my ventilator might [not] work or the computer system will go down and we need to account for those things, we need to plan for those things. 

But to not benefit from what these tools can do to make systems more resilient and more reliable, I think, would be a mistake.

FH: How do you think providers should broach the conversation of 'here's how I use AI in my practice, here's how it's going to affect you'? How should a provider start bringing that up?

JE: It’s a space where we don't really know what the best practice ought to be. Certainly there should be some transparency. If we're sharing data and it's going somewhere to a third party, there are probably circumstances where that ought to be shared with the patient so that they know that.

If somebody is listening to a conversation, whether that's a virtual scribe in another country, down the hallway, or AI, you should probably let the patient know that so that they have an understanding and agree to allow those tools to be used. But I don't think that there's a clear set of best practices. I think it's a space where we have more work to do.