Abstract

The privacy risks posed by individuals’ genomes have garnered increasing attention. Recent research studies and forensic work have underscored the ability to re-identify a person using genomically identified relatives and quasi-identifiers such as sex, birthdate, and zip code. However, summary omics data, such as gene expression values and DNA methylation sites, are generally treated as safe to share, with low privacy risks, even though research studies have indicated that they could be linked to existing genomes. We have demonstrated that some types of summary omics data can be accurately linked to a unique genome. We also developed methods to match such data against genotypes in consumer genealogy databases using only the restricted tools those databases provide. Thus, the theoretical privacy concerns regarding summary omics data are now practically relevant. Linking sets of quasi-identifiers can reveal a research participant’s identity and protected health information. Most importantly, such risks increase over time, activated by new techniques, new knowledge, and new databases. Thus, public omics data may become privacy time bombs: safe at the time of distribution, but increasingly likely to compromise personal information later. The need to preserve individuals’ genomic privacy for their lifetime and beyond (for descendants and relatives) poses unique challenges to the effective sharing of high-throughput molecular data.

Video Recording