A global collaborative teamed up to help diagnose rare lung disease with AI. Here's how they did it

Editor's Note: A previous version of this article misstated one of the OSIC partners. It is BI, or Boehringer Ingelheim, not NBI.

It’s been less than a year since Microsoft, PwC and the nonprofit Open Source Imaging Consortium (OSIC) teamed up to improve the speed and accuracy of diagnoses for idiopathic pulmonary fibrosis (IPF), a rare lung disease.

Already, the trio has learned a lot and run into challenges. Time to diagnosis can exceed three years, and one study estimates that at least half of IPF patients are misdiagnosed at least once. No formal staging system exists to predict how the disease will progress in patients.

With input from 122 experts in fields including respiratory medicine, radiology, neurology and machine learning, the partners are building out the Open Source Imaging Consortium Data Repository, a database of medical imaging and clinical data that they claim is the first of its kind and the world’s largest.

Given existing data silos and the difficulty of identifying clinical patterns, the hope is that clinicians and researchers who access the new centralized repository can use predictive modeling and their own insights to better target patient treatment, executives told Fierce Healthcare.

Flipping innovation on its head

After collecting some privacy-compliant scans, OSIC took an open-source approach to build an algorithm to detect IPF progression, opening two Kaggle challenges up to AI developers around the globe. Roughly 2,500 experts looked at the data, according to OSIC.

"That's not an approach we typically take in medicine," PwC principal Will Perry told Fierce Healthcare, adding that usually these processes are done in a “walled garden.” Microsoft provided the Azure platform and AI tool sets while PwC performed the implementation. “Then we relied on the world to do the heavy lifting,” Perry said. 

The winning algorithm spotted the disease notably earlier than humans, and more accurately, Perry said. Ultimately, the platform was designed to explain the patterns detected by its AI to clinicians or other users and to help diagnose sooner and pick a more accurate treatment path.

When the partnership was first announced in September 2021, its goal was to reach 15,000 anonymized scans by the first quarter of 2022. But getting there has taken longer than anticipated, OSIC Executive Director Elizabeth Estes acknowledged. Now, 15,000 scans are projected by the end of 2022.

The biggest challenge has been trying to collect data in an industry that, generally, “has not shared readily,” Estes told Fierce Healthcare, particularly in the U.S. “There’s a lot of fear in medicine.”

Acquiring and analyzing consumer data is a multibillion-dollar industry in the U.S., and health data are no exception. Hospitals have increasingly come under fire for selling troves of medical data. Some have reportedly granted access to patient data to tech giants like Microsoft and Amazon, even though not all of it has been de-identified.

To address potential concerns about patient privacy, OSIC went through an eight-month international review to ensure the database was compliant with GDPR, the EU’s personal data protection law, which is widely hailed as the gold standard.

Focused on progress 

Many stakeholders are involved in the database, ranging from philanthropy to academia to private companies.

“It’s a really cool mix of brainpower,” Estes said. In some cases, direct competitors are working together on the project, like Galapagos, Boehringer Ingelheim and CSL Behring. 

Looking ahead, OSIC intends to expand its collection of images beyond IPF and is already in talks to bring in bronchiectasis scans. (Different scans will be separated by a firewall in the database.) It is also considering accepting other scan modalities and exploring the combination of synthetic data with digital twins. The consortium is excited by the possibility that a similar approach could democratize biomarkers in the future. Sharing these sorts of data and making them accessible is thought to facilitate more informed research and treatment.

But the data repository’s efforts won’t be successful unless they reach clinicians and help them determine whether a patient’s disease will progress. “In our perfect nirvana, somebody creates a screening device,” Estes said, “and you can diagnose this sooner.”

While scans are still being collected, the data set is not necessarily perfect, because real-world data aren’t either, Estes said, adding that this imperfection is critical to algorithm training. For instance, OSIC accepts scans that are thicker or contain less clean data than usual.

The database is not currently open to the public, but interested academic medical institutions can get free access in exchange for contributing 500 anonymized scans. Others, such as private companies, are charged for access on a sliding scale based on their revenue, Estes said. The database may eventually be opened to a broader audience.

“We're not here to make money,” Estes said. “We’re here to make progress.”