At the Coalition for Health AI’s Global Summit in Las Vegas, CHAI members gathered to tease out some of the most basic questions about their proposed national network of AI assurance labs.
CHAI staff and working group leaders sought feedback on the ideas they’ve developed on their recently released AI model cards and assurance labs. Approximately 80 people from CHAI member organizations hashed out their questions about assurance labs, including who the labs’ primary customers would be and how the labs will disclose conflicts of interest.
CHAI has been rapidly developing a national network of AI assurance labs.
CHAI told Fierce Healthcare in August that it will announce the first two assurance labs by the end of the year. There are roughly 32 CHAI member organizations that are interested in becoming assurance labs, CHAI executives said on an August episode of "Podnosis."
The UMass Chan Medical School and the MITRE Corporation announced the launch of their AI assurance lab in March.
But the CHAI community is still hashing out some of the most basic questions around assurance labs, like who will pay for model validation and what information the lab will be validating.
The discussion was held under the Chatham House Rule, a guideline that allows participants in a meeting to share information from the discussion without revealing the identity of the speaker. The rule was applied to the roundtable discussions so journalists could attend.
The following are some of the biggest questions that participants discussed at the summit.
Who is the primary customer of the assurance labs, and who should pay?
The roundtable participants openly wondered who the primary customer for an assurance lab will be.
Many participants suggested that model developers would seek validation from the assurance labs before undergoing FDA review. In this case, participants agreed that the model developer would pay for assurance lab certification.
From the group discussion, pre-FDA model validation seemed to be a solid use case for the labs. But the scenarios only grow more complicated from there.
A swath of algorithms never needs to go through FDA authorization. These algorithms may be developed internally by a large academic medical center. Some of the most prolific algorithm developers are Duke University, Mayo Clinic and Stanford University, a source told Fierce Healthcare outside of the discussion.
It’s unclear whether health systems like Duke, Mayo and Stanford would use an assurance lab for a model they developed on their data and intend to deploy for their patients alone.
An assurance lab could be useful when a health system wants to buy an AI product but wants it tested first to see whether it will work well on its patient population. This seems to be another prime use case, along with premarket testing.
But, in this case, there is debate about which entity would pay for the product testing. A doctor from a large academic medical center said that the buyer of the product, the health system, should pay. After all, any beta testing or internal validation efforts are currently borne by the buyer, they said.
“The quality of your assurance is probably relative to the size of your organization and the number of dollars you have. So there's a disparity kind of intrinsic in that. … if someone is taking work off my plate … perhaps I would be willing to pay for that information,” they said.
Some said that if the model vendor assumed the cost of validation, it would likely incorporate the additional cost into its pricing, which would ultimately be paid by the buyer. Another potential issue with the vendor paying for validation is that it may expect a stamp of approval in return.
“When you're thinking about the funding models … healthcare organizations may be more willing to pay more to prevent that conflict of interest and that potential bias,” an attendee said in reference to model vendors paying for assurance lab certification.
The large academic medical center doctor suggested the buyer and the developer could split the cost of validation, with the developer assuming a greater percentage over time.
“I think it's got to be borne partly by the commercial efforts that are creating this. Who's making the money off of selling these things has to have part of it, but organizations have to be able to cover their own cost of governance,” another attendee said.
How specific will each lab be, and what will they be validating?
Participants discussed what the labs would be validating and how they will manage breadth versus depth of testing.
CHAI has circulated the idea that each lab would be equipped with local data that represent its part of the country.
Thus, if a health system or model developer intends to use an algorithm on mostly urban Black patients, they could use an assurance lab that could test the algorithm on a new set of data of urban Black patients.
AI is a big category, and models for use in healthcare range from advanced image reading to automatic billing. One attendee compared the diversity of models to the differences between a microwave, a boat and an airplane.
Testing the accuracy of different products could require widely different inputs and resources, one attendee noted. It will be difficult to balance breadth and depth of assurance within the labs, they said.
Assuring a single product could require deep expertise in the technology. But if a lab spends the time and money to become deeply knowledgeable about one kind of imaging algorithm, for example, it would likely not be able to sustain itself financially.
Participants said the AI assurance labs would be most useful if they compared vendors’ claims against one another, essentially cutting through the chaos of competing AI vendor claims. It’s unclear how the proposed national registry for AI validation would compare vendor claims.
Participants seemed to agree that the assurance labs’ national registry of algorithms would work well to avoid duplicating models. But there’s debate over how accessible that information will be.
“Somebody develops a sepsis model … It doesn't need to be recreated over and over and over again, right? … CHAI could be part of that brokerage of what has already been tested and could be used, and whether that's licensed or free, I don't know,” one participant said.
A December 2023 paper written by CHAI board members Nigam Shah, Ph.D., John Halamka, M.D., and Suchi Saria said: “Such labs could provide different levels of evaluation, ranging from a technical evaluation of model performance and bias for a specific use case, to an interpretation of its performance for stratified subgroups of patients, to a prospective evaluation of usability and adoption via human-machine teaming and pre-deployment simulation of the consequences of using the model’s output in light of specific policies and work capacity constraints.”
How will rural healthcare facilities factor in?
The CHAI Global Summit participants collectively stressed that all types of healthcare facilities should have access to safe and reliable AI, including rural hospitals and clinics. But a rural hospital likely does not have the budget or capacity to internally validate an AI vendor’s claims.
Some participants said that a network of assurance labs overseen by CHAI could be good for this purpose. Moreover, the national registry of assurance lab results that CHAI has touted could also help democratize the results of AI product testing. It is unclear whether there would be a fee to access the model results.
Algorithms also require ongoing monitoring, which small healthcare organizations would likely struggle to do on their own. One participant suggested that large health systems could bear the responsibility for helping smaller clinics keep tabs on their algorithms.
There was an idea circulating in the discussion that better-resourced systems should bear the brunt of the cost and the initial effort to make the results of AI validation public.
“If we had a group of 10 health systems [come] together, [start] reporting on these things ... is there a world where resource-rich health systems can do this initial sort of validation, have this reporting, and then we report that out to the rest of the nation?” one participant asked.
This approach could pose an issue if considerations for rural patients are not reflected in the data and testing of large systems.
One attendee said that to relieve the burden on small and rural facilities, they should have a simple and streamlined way to engage with the assurance labs.
Is it possible to pool resources?
The amount of money that large health systems are pouring into AI could collectively be used to accomplish a lot for the whole healthcare industry, but what is the likelihood that these health systems would be willing to cooperate?
“If you aggregate all health systems evaluating one technology ... we're spending, like probably, in aggregate, hundreds of millions of dollars on developing the product, but because everyone's doing it on their own, the quality of that evaluation is just what they can afford to do," one participant said.
Both health systems and companies are worried about their intellectual property being shared if they work too closely with an assurance lab that is housed within a competing institution, summit attendees said.
How will labs disclose conflicts of interest?
The conference attendees discussed what kind of disclosures should be required of the assurance labs.
A CHAI staff member pointed to the manufacturing industry, which makes detailed disclosures of conflicts of interest, and suggested that the health AI industry may do the same.
“In the manufacturing space, there are labs that can both create and validate their own products, but the level of disclosure that's required to be able to do that is very detailed. It's very robust," one attendee said.
Republican lawmakers have expressed concern that CHAI’s biggest members, which include Google and Microsoft, would be responsible for validating the technologies of smaller competitors. CHAI has said this won’t be the case, though the assurance labs will likely be existing AI companies, health systems and academic medical centers.