Industry Voices—How Amazon's health venture could help turn unstructured EHR data into refined fuel

Amazon's natural language processing solution needs refined health data to make an impact. (Getty/monkeybusinessimages)

As I’ve written before, we need fresh thinking and new investments in healthcare, but this is a complex industry with some unique challenges.

In the long run, a mash-up of old and new could lead to real breakthroughs. In the short run, we should expect failures and flameouts. Recent reports about Amazon’s new health data service provide a good opportunity to explore these issues.

Amazon is starting to apply advanced natural-language processing (NLP) technology developed for its retail business to the problem of extracting discrete data elements from the unstructured data typically found in documents and reports in an electronic health record (EHR).

There is a lot of this “buried treasure,” and traditional extraction methods are labor-intensive and error-prone. Automating extraction with AI would be a boon to healthcare, but it presents significant challenges. To get their perspectives, I connected with two experts: Eric Rosow, CEO of Diameter Health, who has 30 years of experience in healthcare technology, and Murali Minnah, strategy officer for WiredInformatics.

From raw crude to refined fuel

Rosow uses an analogy to gasoline production.

“To fuel the healthcare ecosystem, you first have to process the massive volumes of unrefined ‘digital crude oil’ that is often contained in disparate ‘tanks’ and silos across the continuum of care,” he said. “A primary challenge is connecting ‘the pipes’ to EHRs and other repositories.”

“True semantic interoperability must comprise the aggregate of both infrastructure (‘the pipes’) as well as normalization and enrichment software (‘the refinery’) to account for shortcomings in current clinical data exchange standards,” he added.

The “data refinery” model: From raw crude to highly refined and enhanced (Sansoro Health)

From a pipes perspective, the location of data within an EHR database can vary widely, creating huge mapping problems. Legacy EHR integration solutions are cumbersome, limited in capability, brittle, and expensive to build and maintain. API-based integration solves many of these problems and is gaining traction in health IT.

From a refinery perspective, healthcare data is unique. It is extremely heterogeneous and denormalized with enormous variation in how it is collected, coded, and processed. There’s a bewildering array of formats with their own quirks and unique sets of rules. GIGO (garbage in, garbage out) rules. Rosow’s colleague, Mark Andersen, observes, “Your surgeon wouldn’t do surgery with dirty hands and similarly, you shouldn’t do analytics with dirty data!”
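To make the “refinery” idea concrete, here is a minimal sketch of one normalization step: mapping the same lab test, reported by two sources under different labels and units, onto one canonical form. The source names, local codes, and the `normalize` function are all invented for illustration; this is not how any particular vendor does it, and a real pipeline would map to standard terminologies and handle far more cases.

```python
# Hypothetical mapping from (source, local code) to one canonical test name.
# The sources and codes below are made up for this sketch.
CODE_MAP = {
    ("hospital_a", "GLU"): "glucose",
    ("clinic_b", "Glucose, serum"): "glucose",
}

# Conversion factors into the canonical unit, mg/dL.
# 1 mmol/L of glucose is about 18.016 mg/dL (molar mass ~180.16 g/mol).
UNIT_FACTORS = {
    ("glucose", "mg/dL"): 1.0,
    ("glucose", "mmol/L"): 18.016,
}


def normalize(record):
    """Return a cleaned record, or None if the code or unit is unknown."""
    name = CODE_MAP.get((record["source"], record["code"].strip()))
    if name is None:
        return None  # unmapped code: flag for review rather than guess
    factor = UNIT_FACTORS.get((name, record["unit"]))
    if factor is None:
        return None  # unknown unit: also left for a human to resolve
    value = round(float(record["value"]) * factor, 1)
    return {"test": name, "value": value, "unit": "mg/dL"}


raw = [
    {"source": "hospital_a", "code": "GLU", "value": "99", "unit": "mg/dL"},
    {"source": "clinic_b", "code": " Glucose, serum ", "value": "5.5", "unit": "mmol/L"},
    {"source": "clinic_b", "code": "???", "value": "7", "unit": "mmol/L"},
]
cleaned = [normalize(r) for r in raw]
```

In this toy example the first two records come out as comparable glucose values in mg/dL, and the third is rejected instead of being passed downstream as “dirty data.”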

Rosow concludes, “Today we suffer from ‘clinical data disorder’ with gaps in content and enormous variation in the way information is collected, coded and processed. Healthcare data, in particular, is far more complex and varied than other industries. We’ve spent years attacking these problems. This is not for the faint of heart.”

Building, fueling and tuning the engine

NLP applications like Amazon’s must be trained and fine-tuned using this “fuel.”

Like normalization, this is an iterative process that is both art and science. Various companies have been solving these problems with a specific focus on health data, and that experience provides valuable insights.

“What we have experienced over the last few years is the high intensity and rigor required to solve NLP use cases in healthcare,” says Minnah.

“Tuning the engine’s output for use cases requiring high precision to those that require high recall is very unique to healthcare and demands interdisciplinary expertise that needs to be established for each use case,” he added. “This core expertise has to be built over several years.”
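The precision-versus-recall trade-off Minnah describes can be illustrated with a small sketch. Assume an NLP model assigns each candidate finding a confidence score, and a threshold decides which findings are accepted. The scores, labels, and the `precision_recall` helper below are invented for illustration and do not come from any real system.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall when accepting scores >= threshold."""
    tp = fp = fn = 0
    for score, is_true in zip(scores, labels):
        accepted = score >= threshold
        if accepted and is_true:
            tp += 1  # correctly extracted finding
        elif accepted and not is_true:
            fp += 1  # false alarm
        elif not accepted and is_true:
            fn += 1  # missed finding
    # Convention chosen for this sketch: report 1.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up model confidences and ground-truth labels for six candidates.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

strict = precision_recall(scores, labels, 0.70)   # favors precision
lenient = precision_recall(scores, labels, 0.35)  # favors recall
```

With the strict threshold, everything accepted is correct but one true finding is missed; with the lenient one, every true finding is caught at the cost of a false alarm. Which setting is right depends on the use case, which is where the interdisciplinary judgment comes in.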

Time will tell how effectively, and how quickly, Amazon can solve these challenges. Either way, Minnah believes Amazon’s entry into this field will spur further rapid innovation among agile, nimble competitors.

Dave Levin, M.D., is the chief medical officer for Sansoro Health. He is a nationally recognized speaker, author and the former CMIO for the Cleveland Clinic. You can follow him @DaveLevinMD.