HLTH24: Atropos Health unveils chat-based AI co-pilot for real-world evidence

LAS VEGAS—Atropos Health's large language model for real-world evidence is now ready for prime time after a successful beta program.

The startup, maker of a real-world data platform, announced this week that ChatRWD, a generative AI solution, is now more broadly available to clinicians and researchers.

A year ago, the company launched a new operating system, Geneva OS, and a chatbot interface to help generate observational studies rapidly and at scale. That technology, ChatRWD, is a chat-based AI co-pilot that cuts the time needed to produce high-quality, publication-grade real-world evidence from months to minutes, according to the company. The generative AI application is designed to deliver full observational studies on healthcare data, the company claims.

ChatRWD answers clinical questions more accurately than existing large language models like ChatGPT and Google Gemini, according to the company. It is fueled by data from Atropos' evidence network, which the company describes as the largest federated data network, with over 300 million anonymous patient records.

Every answer comes with a "real world fitness score" (RWFS), which evaluates whether the dataset used to answer the question is "fit for purpose." The average RWFS across the questions in the evaluation was 79.8, indicating high-quality data, Atropos said.

During the beta program, testers from dozens of health system and life science organizations asked more than 2,000 questions. Testers reported significant time savings over other forms of “information foraging,” such as literature reviews and second-opinion consults, which can take hours or days. The average time from question to study was 5.23 minutes.

Observational research on real-world data is not new, noted Brigham Hyde, Ph.D., CEO and co-founder of Atropos Health, but it typically takes weeks or months to go from the data layer to a publication-grade piece of evidence. Atropos Health uses LLMs and automation to significantly speed up that process.

ChatRWD also outperforms other large language models (LLMs) when it comes to trust, quality, and accuracy, Hyde said.

During the beta, Atropos Health conducted a comparison of LLMs, which highlighted several critical outcomes: General-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) produced relevant and evidence-based answers less than 10% of the time. ChatRWD outperformed all other LLMs, completing 94% of the answers. Independent physician reviewers found that ChatRWD produced the best answers for novel questions in 87% of the cases.

"What that showed is that we're answering far more questions than those LLMs more accurately. And, we really won on physician trust. We were the most trusted answer most often. We can produce high-quality results and evidence that physicians trust. That's why we're coming out of beta right now, because we've proven that to a level that even beyond the paper and we think it's ready for prime time," Hyde said.

Physicians who tested the technology during its beta phase reported that ChatRWD provided "clear, concise answers, backed up by hardcore data and statistics that was transparent," Hyde noted. "Everything needs to be backed up by transparency and accuracy," he added.

Clinicians and researchers can register to use ChatRWD at this site.

"The pace of innovation in AI and LLM technologies and their adoption in healthcare is rapid," said Neil Sanghavi, president of Atropos Health, in a statement. "High-quality clinical evidence must remain the cornerstone of value in healthcare. ChatRWD's results during the beta outperformed our expectations, both technically and on user satisfaction, which led us to today’s release."

Atropos Health sits at the intersection of two rapidly growing fields: real-world evidence research and artificial intelligence.

The startup, founded in 2019 as a spin-out of the “Green Button” technology developed at Stanford University, developed a consultation service for doctors powered by publication-grade real-world evidence to guide clinical decisions. The technology can quickly answer clinical questions, such as which drug is most effective for certain cancer patients.

There’s a severe evidence gap in healthcare treatment guidelines, and physicians often rely on their own judgment and educated guesswork. According to a 2017 BMJ study, only 18% of clinical recommendations made by primary care physicians are based on current evidence. Clinical trials shape most care guidelines and payer decisions, yet they exclude about 70% of the U.S. population, according to Hyde.

At the same time, physicians are eager to use generative AI-based tools to help with notetaking and research, and there has been significant uptake of publicly available tools like ChatGPT, which is not trained on medical data.

Fierce Healthcare collaborated with physician social network Sermo to conduct a survey of physicians. It found that 76% of respondents reported using general-purpose LLMs in clinical decision-making, as Senior Writer Anastassia Gliadkovskaya reports. More than 60% of physicians reported using LLMs like ChatGPT to check drug interactions, while more than half use them for diagnosis support. Nearly half use them to generate clinical documentation, and more than 40% use them for treatment planning. Seventy percent use them for patient education and literature search. 

"People, including doctors, love this user experience," Hyde said, referring to the generative AI-based tool. "They like automating things that are more arduous than it should be for them. You have to pay attention to that user experience demand. But our belief is, if you look at even your best big tech LLM, even the best ones, they're not accurate enough. We had independent clinicians review the detailed answers. There were situations where they came to the wrong conclusion, but it looked as if there was a confident conclusion. There were situations where they made up a reference. That stuff is what's coming behind us. I believe that doctors will have to trust this to continue using it. If they go there because they like the user experience, trust is the very next wave beyond that."

Since the beta launch a year ago, Atropos Health has made substantial technical upgrades to ChatRWD, including a 4x increase to the phenotype library; expanded support for multiple study designs, including case series, cross-sectional and comparative analysis studies; and the ability to detect study design. With these advancements, Atropos Health predicts ChatRWD will be able to answer 70-80% of all potential questions across both descriptive and causal use cases and enable 10x the study volume per user.