Industry Voices—How to transform healthcare with real-time deep link analytics? Graph databases

By Gaurav Deshpande Jun 19, 2019 11:37am

Americans spent $3.65 trillion on healthcare in 2018, federal actuaries estimate. That’s an increase of 4.4% over 2017, and spending is projected to grow an average of 5.5% per year through 2027. The United States is not the only country facing this challenge.

Global spending on healthcare is expected to top $18 trillion by 2040.

To address this challenge, healthcare payers and providers are focused on four strategic imperatives. Big data technology is proving crucial in how they’re addressing each one. However, it’s important for these stakeholders to understand what works well and what doesn’t. Here's a look:

1. Rising costs

Healthcare payers and providers are seeking ways to contain rising costs while also exploring options to improve the quality of care for all of their members. To deliver on this imperative, it’s important for them to understand the relationships among the members or patients and the prescribers or doctors, especially as they pertain to the member’s journey to wellness.

Unfortunately, most tools for storing and analyzing healthcare data are built on relational databases. These databases store the data for each entity such as member, prescriber, claim, or facility (e.g., hospital or treatment center) in separate tables or databases. To understand the relationships among members, prescribers, facilities, and the claims connecting them to each other, all of these tables or databases must be joined together.
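Here is a minimal Python sketch of what that join requirement looks like, using hypothetical in-memory tables (lists of dicts) in place of a real relational database. Even a simple relationship question, such as which prescribers a given member has seen, means joining the claims table to both the member table and the prescriber table. The table contents, column names, and the `hash_join` helper are illustrative assumptions, not any particular product's schema.

```python
# Hypothetical tables, stored separately as they would be in a relational schema.
members = [
    {"member_id": 1, "member_name": "Ana"},
    {"member_id": 2, "member_name": "Ben"},
]
prescribers = [
    {"prescriber_id": 10, "prescriber_name": "Dr. Cho"},
    {"prescriber_id": 11, "prescriber_name": "Dr. Diaz"},
]
claims = [
    {"claim_id": 100, "member_id": 1, "prescriber_id": 10, "amount": 120.0},
    {"claim_id": 101, "member_id": 1, "prescriber_id": 11, "amount": 80.0},
    {"claim_id": 102, "member_id": 2, "prescriber_id": 10, "amount": 200.0},
]


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key`, building a hash index over `right`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**row, **match} for row in left for match in index.get(row[key], [])]


def prescribers_for_member(member_id):
    """Answer one relationship question by joining claims to members, then to prescribers."""
    claim_member = hash_join(claims, members, "member_id")
    claim_member_prescriber = hash_join(claim_member, prescribers, "prescriber_id")
    return sorted({r["prescriber_name"] for r in claim_member_prescriber
                   if r["member_id"] == member_id})
```

Here `prescribers_for_member(1)` returns `['Dr. Cho', 'Dr. Diaz']`. This sketch joins whole tables before filtering; a real query optimizer would usually push the filter down, but every additional entity type (facility, pharmacy) still adds another join to the query.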

As the size and complexity of the data grow, database joins become time-consuming and computationally expensive, making the relational database an expensive and impractical solution for understanding and analyzing relationships. Relational databases also fall short in “similarity analysis”: the ability to delve deeply into interconnected entities (similar treatment, similar providers, etc.) and uncover patterns in real time.
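One common way to express "similar providers" is to compare the sets of treatments each provider bills. The sketch below uses Jaccard similarity (shared items divided by all distinct items) over hypothetical treatment codes. The metric, the threshold, and the data are assumptions chosen for illustration, not a method the article prescribes.

```python
from itertools import combinations

# Hypothetical data: prescriber -> set of treatment codes they have billed.
treatments_by_prescriber = {
    "Dr. Cho": {"T1", "T2", "T3"},
    "Dr. Diaz": {"T2", "T3", "T4"},
    "Dr. Evans": {"T5"},
}


def jaccard(a, b):
    """Jaccard similarity: |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(profiles, threshold):
    """Return (name1, name2, score) for every pair scoring at or above threshold, highest first."""
    pairs = []
    for (p1, s1), (p2, s2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((p1, p2, score))
    return sorted(pairs, key=lambda t: t[2], reverse=True)
```

With a threshold of 0.4, only Dr. Cho and Dr. Diaz match (2 shared codes out of 4 distinct, a score of 0.5). Note that the sketch compares every pair of providers, which is the kind of all-pairs work that becomes expensive as the number of entities grows.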

As UnitedHealth Group CEO David Wichmann told CNBC, “It’s about making the best use of the data that you have and converting it to information, applying it against best-known science, identifying gaps in care, places where you can, you know, change the way in which you access the health system and make yourself better.”

2. Rampant waste, fraud, and abuse

In March 2018, Kaiser Health News shared a YouTube video showing how an unscrupulous laboratory billed nearly $2,000 for a simple urine test by piling on unnecessary tests to pad the bill. This is but one of many examples of healthcare-related fraud that, according to the National Health Care Anti-Fraud Association, costs the United States about $68 billion annually—3% of the nation’s total healthcare spending.

Detecting this kind of fraud requires analyzing internal data about patients, hospitals, and physicians and connecting it with external data, such as recently used addresses and phone numbers, to find hidden connections. It also requires benchmarking the end-to-end cost of care for every member across a network of doctors and treatment facilities to identify higher-than-average costs, especially for new insurance policies issued under the Affordable Care Act (Obamacare) or other government healthcare plans.
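Finding hidden connections through shared addresses or phone numbers can be sketched as a graph problem: link each entity to its identifying attributes, then collect the groups of entities that are connected directly or transitively. The minimal Python sketch below does this with a breadth-first search over hypothetical records; the entity names, the `phone:`/`addr:` prefixes, and the `connected_groups` helper are assumptions for illustration only.

```python
from collections import defaultdict, deque

# Hypothetical records: entity -> identifying attributes from internal and external data.
# Prefixes keep attribute values from colliding with entity names.
records = {
    "Lab A": {"phone:555-0100", "addr:12 Elm St"},
    "Clinic B": {"phone:555-0100", "addr:99 Oak Ave"},
    "Dr. X": {"addr:99 Oak Ave"},
    "Pharmacy C": {"phone:555-0199"},
}


def connected_groups(records):
    """Group entities linked, directly or transitively, by any shared attribute value."""
    # Undirected graph with entity nodes and attribute-value nodes.
    graph = defaultdict(set)
    for entity, attrs in records.items():
        for attr in attrs:
            graph[entity].add(attr)
            graph[attr].add(entity)

    seen, groups = set(), []
    for start in records:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            if node in records:  # report entities only, not attribute nodes
                group.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(group)
    return groups
```

Here Lab A, Clinic B, and Dr. X end up in one group (a shared phone number links the first two, a shared address links the last two), while Pharmacy C stands alone. A multi-entity group is a lead for an investigator to review, not evidence of fraud by itself.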

3. Linking public data with internal health records

Linking public data with internal records is necessary to gain additional insight into drug and treatment efficacy. For example, adverse reactions observed in members being treated with a specific drug must be connected with publicly reported data to determine whether the guidelines for using that drug need to be updated.
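The linking step can be sketched as a join on (drug, reaction) pairs. In the minimal Python sketch below, both datasets are hypothetical: the internal events stand in for observations from member records, and the public counts stand in for aggregated reports from a source such as FAERS (the code does not call any real FDA service). Real drug-safety analysis also applies statistical measures; this shows only how the two sources are connected.

```python
from collections import Counter

# Hypothetical internal observations: (drug, reaction) pairs recorded while treating members.
internal_events = [
    ("DrugA", "nausea"), ("DrugA", "nausea"), ("DrugA", "rash"),
    ("DrugB", "headache"),
]
# Hypothetical public report counts, standing in for data aggregated from a public source.
public_report_counts = {
    ("DrugA", "nausea"): 1200,
    ("DrugB", "headache"): 300,
}


def link_adverse_events(internal, public_counts, min_internal=1):
    """Join internal (drug, reaction) counts to public report counts.

    Returns (drug, reaction, internal_count, public_count) rows. Reactions seen
    internally but absent from the public data (public_count == 0) come first,
    since they may warrant a review of the drug's usage guidelines.
    """
    counts = Counter(internal)
    rows = [
        (drug, reaction, n, public_counts.get((drug, reaction), 0))
        for (drug, reaction), n in counts.items()
        if n >= min_internal
    ]
    # Unreported reactions first, then by internal frequency (descending), then by name.
    return sorted(rows, key=lambda r: (r[3] > 0, -r[2], r[0], r[1]))
```

For the sample data, the rash reported with DrugA is listed first because it has no public counterpart.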

The process has been especially challenging because the information infrastructure for the healthcare industry is built on relational databases that require several months to integrate with new data sources. Healthcare organizations are looking for a more agile solution to integrate and connect public data with internal data to improve the quality of care for their members.

4. Member satisfaction

The net promoter score (NPS) is commonly used as the key metric to measure and improve member satisfaction. However, NPS surveys reach only a small fraction of members and provide useful but limited insight into their satisfaction. A member’s behavior is a better indicator of satisfaction with healthcare services. Are they calling customer service often with issues? Do they need to call or make contact multiple times to resolve an issue? Have they missed or canceled multiple appointments with their primary care provider or specialist? Are they refilling their prescriptions on time?
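The contrast between survey scores and behavior can be sketched in a few lines of Python. `net_promoter_score` uses the standard NPS definition (percentage of promoters rating 9 or 10, minus percentage of detractors rating 0 to 6). The `MemberActivity` fields and the thresholds in `dissatisfaction_signals` are hypothetical, chosen only to mirror the questions above.

```python
from dataclasses import dataclass


def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100..100 scale."""
    if not ratings:
        raise ValueError("no survey responses")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


@dataclass
class MemberActivity:
    # Hypothetical behavioral fields mirroring the questions in the text.
    service_calls_90d: int
    contacts_per_issue: float
    missed_or_canceled_appts: int
    late_refills: int


def dissatisfaction_signals(a, max_calls=3, max_contacts=1.5, max_missed=1, max_late=1):
    """List the behavioral warning signs a member shows; thresholds are illustrative."""
    signals = []
    if a.service_calls_90d > max_calls:
        signals.append("frequent service calls")
    if a.contacts_per_issue > max_contacts:
        signals.append("repeat contacts per issue")
    if a.missed_or_canceled_appts > max_missed:
        signals.append("missed or canceled appointments")
    if a.late_refills > max_late:
        signals.append("late prescription refills")
    return signals
```

The survey score applies only to members who responded, while the behavioral check can run for every member whose activity data is available.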

All of this information about the patient journey is key to improving NPS, and it requires the healthcare industry to build a 360-degree view of the member or patient that combines data from all parts of the organization. Data warehouses, as well as data marts based on Hadoop, have created massive repositories holding hundreds of terabytes to petabytes of this data. However, these systems struggle to understand, connect, and analyze the data in real time to produce member satisfaction insights.

Graph databases

Fortunately, the latest graph databases are designed to handle these challenges. They are purpose-built to address all four imperatives because they model data as interconnected business entities. The relationships among members, prescribers, facilities, and claims are modeled in the schema, so queries do not require expensive joins: the business entities are already connected.
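A toy adjacency-list model in Python illustrates the idea of relationships stored as connections. This is not how any particular graph database stores data internally; the `Graph` class, vertex IDs, and edge types are hypothetical. The point is that answering "which prescribers has this member seen?" follows stored edges from member to claims to prescribers, with no join step.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph: typed vertices and undirected edges labeled with a type."""

    def __init__(self):
        self.vertex_type = {}
        self.adj = defaultdict(list)  # vertex -> [(edge_type, neighbor)]

    def add_vertex(self, vid, vtype):
        self.vertex_type[vid] = vtype

    def add_edge(self, a, b, etype):
        # Store the edge in both directions so traversal works either way.
        self.adj[a].append((etype, b))
        self.adj[b].append((etype, a))

    def neighbors(self, vid, vtype=None):
        """Adjacent vertices, optionally restricted to one vertex type."""
        return [n for _, n in self.adj[vid] if vtype is None or self.vertex_type[n] == vtype]


g = Graph()
for vid, vtype in [("m1", "member"), ("c100", "claim"), ("c101", "claim"),
                   ("p10", "prescriber"), ("p11", "prescriber"), ("f1", "facility")]:
    g.add_vertex(vid, vtype)
g.add_edge("m1", "c100", "filed")
g.add_edge("m1", "c101", "filed")
g.add_edge("c100", "p10", "prescribed_by")
g.add_edge("c101", "p11", "prescribed_by")
g.add_edge("c100", "f1", "served_at")


def prescribers_of(graph, member):
    """Member -> claims -> prescribers by following stored edges; no join step needed."""
    return sorted({p for c in graph.neighbors(member, "claim")
                   for p in graph.neighbors(c, "prescriber")})
```

Here `prescribers_of(g, "m1")` returns `['p10', 'p11']`, and reaching the facility behind a claim is one more edge lookup rather than another table join.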

First- and second-generation graph databases are excellent solutions for small data sets with moderate complexity. However, they can’t scale to meet requirements as data volumes grow to medium and large sizes, or when the analysis must follow relationships more than three levels deep.

Native parallel graph databases, by contrast, are built to understand, explore, and analyze the complex relationships in healthcare data, allowing data scientists and business users, such as fraud investigators, to go 10 or more levels deep into the data in real time across billions of claims and millions of members and prescribers.
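Going "N levels deep" is a multi-hop traversal. The minimal, single-threaded Python sketch below uses breadth-first search with a depth limit on a hypothetical member/claim/prescriber chain. It does not model the parallel execution the author describes; it only shows what a depth-limited traversal computes. With an average of b connections per vertex, the set of vertices reached can grow roughly like b to the power of the depth, which is why deep queries on large graphs are demanding.

```python
from collections import deque


def k_hop_neighborhood(adj, start, max_depth):
    """Breadth-first search from `start`; returns {vertex: hop_distance} up to max_depth hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_depth:
            continue  # do not expand past the depth limit
        for n in adj.get(v, ()):
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return dist


# Hypothetical undirected chain: member m1 -> claim c1 -> prescriber p1 -> claim c2 -> member m2.
adj = {
    "m1": ["c1"], "c1": ["m1", "p1"], "p1": ["c1", "c2"],
    "c2": ["p1", "m2"], "m2": ["c2"],
}
```

With `max_depth=2`, the search from `m1` stops at prescriber `p1`; it takes four hops to reach the second member `m2`, who is connected to `m1` only through a shared prescriber.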

The objective should be a real-time, 360-degree view of healthcare. And that’s challenging because it requires combining three types of data from a variety of internal sources.

The first type of data is master data about members or patients, prescribers, healthcare providers, hospitals, and their various facilities. Master data covers information such as name, address, email, and phone number, as well as details regarding specialty, subspecialty, equipment, and healthcare services offered.

The second type is operational or transactional data, which includes healthcare claims and payments as well as the member’s electronic health records, which are updated throughout the member’s journey to wellness. The third type is historical data: petabytes of information stored in data warehouses, data marts, and massive Hadoop data lakes.

All of this internal data must be combined with data from partners such as hospitals and other healthcare providers; third-party sources such as OpenCorporates (the world’s largest open database of corporate information), Thomson Reuters, and Dun & Bradstreet; and public sources including FAERS (the FDA Adverse Event Reporting System) and openFDA.
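A 360-degree view amounts to assembling everything known about one member from each of these sources. The minimal Python sketch below merges hypothetical in-memory records keyed by member ID; all field names and values are assumptions, and it does not model real-time ingestion or graph storage.

```python
# Hypothetical records keyed by member ID, one structure per type of data.
master = {"M1": {"name": "Ana Ruiz", "phone": "555-0100"}}
transactional = [
    {"member_id": "M1", "claim_id": "C1", "amount": 120.0},
    {"member_id": "M1", "claim_id": "C2", "amount": 80.0},
]
historical = {"M1": {"claims_prior_years": 14}}
external = {"M1": {"address_verified": True}}  # e.g. from a third-party source


def member_360(member_id):
    """Combine master, transactional, historical, and external data into one view."""
    view = {"member_id": member_id}
    view.update(master.get(member_id, {}))
    view["claims"] = [t for t in transactional if t["member_id"] == member_id]
    view["total_claimed"] = sum(t["amount"] for t in view["claims"])
    view.update(historical.get(member_id, {}))
    view.update(external.get(member_id, {}))  # later sources win on duplicate keys
    return view
```

For `"M1"`, the view holds the member's name and phone number, both claims with a total of 200.0, the historical claim count, and the external verification flag. Letting later sources overwrite earlier ones on duplicate keys is a simplifying choice; a production system would need explicit rules for resolving conflicts between sources.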

Native parallel graph databases combine these three types of data in real time to address the four strategic imperatives. They help organizations deliver consistently high-quality care while controlling costs; detect and prevent waste, fraud, and abuse; link public data with internal data to improve healthcare outcomes; and improve net promoter scores.

Rising healthcare costs are a complex issue with many causes. But native parallel graph databases are just what the doctor ordered to cure some of the contributing factors.

Gaurav Deshpande is vice president of marketing at TigerGraph.
