Researchers from the Data Privacy Lab at Harvard University have shown how easy it is to identify and link anonymous participants in a public DNA study with their personal data.
Professor Latanya Sweeney, director of the lab, and her research assistants linked data back to participants' names in 42 percent of a sample of records from the Personal Genome Project, according to a Forbes article.
More than 2,500 people have submitted their DNA and personal information to the Personal Genome Project, which aims to sequence and publish the complete genomes and medical records of 100,000 volunteers in an effort to advance personalized medicine.
Though participants' names do not appear on the Internet, their publicly posted medical records include sensitive information such as abortions, illegal drug use, alcoholism, depression and sexually transmitted diseases.
Of the 1,130 records examined, 579 listed the participant's ZIP code, date of birth and gender, three pieces of information that can be combined with other publicly available data, such as voter rolls, to learn a person's name. Sweeney succeeded in naming 241 of those 579 participants, or about 42 percent. A secondary method involved checking files that participants had uploaded to their profiles; according to a website the researchers set up to explain their methods, these files often included the participant's name. A separate page on that site lets anyone check how common their own combination of ZIP code, date of birth and gender is. To help protect themselves, Sweeney suggests research volunteers provide less complete data, such as only their birth year rather than their full date of birth.
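The linkage described above amounts to a join on the three shared fields. The Python sketch below illustrates the general idea on a handful of invented toy records; the field names, the `link` and `reidentified` helpers, and all names, ZIP codes and dates are hypothetical and do not represent the Data Privacy Lab's actual code or data. It also suggests why reporting only the birth year helps: the coarser key matches several voters, so no single name can be singled out.

```python
from collections import defaultdict

# Hypothetical toy data: every name, ZIP code and date here is invented.
# Research profiles carry the three quasi-identifiers but no name.
profiles = [
    {"id": "P1", "zip": "02138", "dob": "1961-07-04", "sex": "F"},
    {"id": "P2", "zip": "02139", "dob": "1975-01-15", "sex": "M"},
    {"id": "P3", "zip": "02139", "dob": "1980-03-02", "sex": "F"},
]

# A hypothetical public voter roll: names alongside the same three fields.
voters = [
    {"name": "Alice Adams", "zip": "02138", "dob": "1961-07-04", "sex": "F"},
    {"name": "Beth Brown", "zip": "02138", "dob": "1961-11-20", "sex": "F"},
    {"name": "Carl Clark", "zip": "02139", "dob": "1975-01-15", "sex": "M"},
    {"name": "Dan Davis", "zip": "02139", "dob": "1975-06-30", "sex": "M"},
]


def full_date_key(record):
    """Link on ZIP code, full date of birth and gender."""
    return (record["zip"], record["dob"], record["sex"])


def year_only_key(record):
    """Link on ZIP code, birth year only and gender (a generalized key)."""
    return (record["zip"], record["dob"][:4], record["sex"])


def link(profiles, voters, key):
    """Map each profile id to the voter names that share its key."""
    index = defaultdict(list)
    for voter in voters:
        index[key(voter)].append(voter["name"])
    return {p["id"]: index.get(key(p), []) for p in profiles}


def reidentified(matches):
    """A profile counts as named only when exactly one voter matches it."""
    return {pid: names[0] for pid, names in matches.items() if len(names) == 1}


print(reidentified(link(profiles, voters, full_date_key)))
# prints {'P1': 'Alice Adams', 'P2': 'Carl Clark'}
print(reidentified(link(profiles, voters, year_only_key)))
# prints {}
```

With the full date of birth, two of the three toy profiles match exactly one voter; with the birth year alone, each of those profiles matches two voters, so the join no longer yields a unique name.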
A 24-page consent letter states that the information participants provide "may be used, on its own or in combination with your previously shared data, to identify you as a participant in otherwise private and/or confidential research," iHealthBeat notes.
Research published in January from the Whitehead Institute for Biomedical Research in Cambridge, Mass., found that genome research participants could be identified even from pooled data.
Concern reportedly is rising that de-identified patient data can later be re-identified, and the U.S. Department of Health & Human Services' Office for Civil Rights has acknowledged in guidance that no method of de-identifying patient data is 100 percent effective.