New peer-reviewed data cast doubt on a proprietary sepsis prediction algorithm developed by Epic and implemented at hundreds of hospitals in the U.S.
Called the Epic Sepsis Model, the tool is included as part of Epic’s electronic health record platform. According to the company, it calculates and indicates “the probability of a likelihood of sepsis” to help clinicians identify hard-to-spot cases.
While some providers have reported success with the tool, researchers affiliated with the University of Michigan Medical School in Ann Arbor found its output to be “substantially worse” than what was reported by the vendor when applied to a large retrospective sample of more than 27,000 adult Michigan Medicine patients.
“Our study has important national implications,” the researchers wrote in JAMA Internal Medicine. “The increase and growth in deployment of proprietary models has led to an underbelly of confidential, non–peer-reviewed model performance documents that may not accurately reflect real-world model performance.”
The Epic Sepsis Model is used by 170 customers that represent hundreds of hospitals, according to Epic. Sepsis is responsible for nearly 1 million hospitalizations each year and is a substantial driver of in-hospital mortality and healthcare spending.
In what they described as the first independent validation of the tool, the researchers reviewed roughly 38,500 hospitalizations recorded at the academic health system between Dec. 6, 2018, and Oct. 20, 2019. Sepsis occurred among 2,552 (7%) of these included patients.
The Epic Sepsis Model demonstrated an area under the curve of 0.63, which the researchers said was well below the 0.76 to 0.83 range described in Epic’s internal documentation and other data reported by the vendor alongside researchers from other health systems.
Further, at the threshold value adopted by Michigan Medicine that was within the range suggested by Epic, the researchers observed sensitivity of 33%, specificity of 83%, positive predictive value of 12% and negative predictive value of 95%.
The tool, they wrote, “identifies only 7% of patients with sepsis who were missed by a clinician … highlighting the low sensitivity of the [Epic Sepsis Model] in comparison with contemporary clinical practice. The [Epic Sepsis Model] also did not identify 67% of patients with sepsis despite generating alerts on 18% of all hospitalized patients, thus creating a large burden of alert fatigue.”
In an emailed statement responding to the findings, a representative of Epic said the researchers’ analysis didn’t take into account the required tuning that should precede real-world deployment of the tool.
The threshold selected by the researchers and Michigan Medicine was relatively low and “would be appropriate for a rapid response team that wants to cast a wide net to assess more patients,” the representative wrote. A higher threshold reduces false positives and would be more appropriate for clinical use by attending physicians and nurses, Epic's spokesperson said.
Epic’s response also contested the researchers’ characterization of the model’s inner workings as a company secret.
“The full mathematical formula and model inputs are available to administrators on their systems,” the representative wrote. “Accuracy measurements and information on model training are also on Epic’s UserWeb, which is available to our customers.”
The study’s results, however, have drawn concerns from the researchers and other experts regarding the future of these algorithms.
In an accompanying editorial, authors affiliated with the University of California, San Francisco and Kaiser Permanente said the data highlight a need for external validation of proprietary prediction models prior to widespread clinical use.
Models that are successful in one setting or time period may not be applicable to others, as appeared to be the case with Michigan Medicine and the systems that previously presented sepsis prediction results alongside Epic, they wrote.
“Health systems must support data scientists who can evaluate such models in the same way that health systems currently support clinicians in tailoring national clinical guidelines to their local patient populations,” they wrote in the editorial.
This type of work will become critical as algorithms expand beyond “relatively straightforward” logistic regressions and more often employ iterative machine learning methods, they continued.
“These more complex prediction tools may present insurmountable barriers to local external validation, which will place more responsibility on model developers either to publish model performance characteristics, the variables included and the settings in which they were obtained or to provide code and data on request to enable others to verify the model transferability to the local setting,” they wrote.