Researchers have shown how easy it is to re-identify patients in de-identified data, yet de-identified data can lose its value as more identifying factors are stripped out.
In a study published in the Journal of the American Medical Informatics Association, researchers from Vanderbilt University and elsewhere extended an algorithm to explore policy options that balance risk of violating a patient's privacy vs. the use of data for society.
The Safe Harbor model defined by HIPAA is one policy that specifies 18 rules, including suppression of explicit identifiers such as names, and generalization of "quasi-identifiers," such as date of birth, requiring recording the age of all patients over 90 as 90+. This rigid rule-based policy might not be ideal for sharing every data set, such as studies on dementia patients.
So the law allows alternatives, provided the risk of re-identification is appropriately measured and mitigated. A Centers for Medicare & Medicaid Services dataset, for instance, published on the Internet would carry a high risk because the system is completely open and the users unknown. Health data to be used by a trusted party with a data-use agreement and strong information security practices could be allowed a policy that favors utility over risk.
The researchers used the Sublattice Heuristic Search algorithm with U.S. census data from 10 states to show it can be applied to recommended rule-based de-identification policy alternatives for patient-level datasets with less risk and more utility than Safe Harbor and other models.
Harvard researchers have shown that patients can be re-identified with just their Zip code, date of birth and gender, along with other publicly available data such as voter rolls.
The Health Information Trust Alliance recently released a new framework for de-identification of sensitive patient information as part of a risk-management strategy.
To learn more:
- here's the research