Description

Challenges in Genomic Data Sharing and Patient Privacy

A number of large genomic datasets are being generated to understand human disease and history. Data sharing is critical for maximizing the information extracted from these datasets. Patient privacy is an important consideration for public participation in research and data sharing.
 
Recently, the Global Alliance for Genomics and Health (GA4GH) has proposed a new mechanism for data sharing, called beacons. Beacons are web servers that answer allele-presence queries—such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”—with either “yes” or “no.” In this talk, I will show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Since beacons can have a specific disease/phenotype associated with them, confirming membership can be informative about disease status and thus compromises privacy. Specifically, I propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. In a simulated beacon with 1,000 individuals, re-identification is possible with just 5,000 beacon queries. Relatives can also be identified in the beacon. With just 1,000 SNP queries, I was able to confirm the presence of an individual genome from the Personal Genome Project in an existing beacon. These results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. I will discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
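As a rough illustration of the kind of likelihood-ratio test described above, the sketch below scores a set of beacon responses for one query genome. It is a simplified sketch under stated assumptions, not necessarily the formulation presented in the talk: every queried site is one where the query genome carries the allele, the population frequency of each allele is known, and beacon members are unrelated diploid individuals. The function name, its parameters, and the default mismatch rate are hypothetical.

```python
import math

def beacon_log_likelihood_ratio(responses, freqs, n_individuals, mismatch_rate=1e-6):
    """Log-likelihood ratio of 'query genome is in the beacon' (H1)
    versus 'query genome is not in the beacon' (H0).

    responses     -- list of bools, True if the beacon answered "yes"
    freqs         -- assumed population allele frequencies, each in (0, 1)
    n_individuals -- assumed number of diploid genomes in the beacon
    mismatch_rate -- assumed chance that the beacon's copy of the query
                     genome fails to show the allele (illustrative default)
    """
    if len(responses) != len(freqs):
        raise ValueError("responses and freqs must have the same length")

    llr = 0.0
    for answered_yes, f in zip(responses, freqs):
        # H0: "no" means none of the 2N chromosomes carries the allele.
        p_no_h0 = (1.0 - f) ** (2 * n_individuals)
        # H1: "no" requires a mismatch for the query genome and no carrier
        # among the other N - 1 individuals.
        p_no_h1 = mismatch_rate * (1.0 - f) ** (2 * (n_individuals - 1))
        if answered_yes:
            llr += math.log(1.0 - p_no_h1) - math.log(1.0 - p_no_h0)
        else:
            llr += math.log(p_no_h1) - math.log(p_no_h0)
    return llr
```

Under these assumptions, each "yes" at a rare allele nudges the score toward membership, while a single "no" is strong evidence against it. A positive total favors the hypothesis that the genome is in the beacon; in practice the decision threshold would have to be calibrated against a null distribution, which this sketch does not do.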

All scheduled dates:

Past