If there’s a will, there’s a way to hack global genomic databases. Researchers in The American Journal of Human Genetics describe how this could happen, and what steps could help prevent such privacy breaches.
The Global Alliance for Genomics and Health (GA4GH) has created a network of “beacons,” web servers that facilitate the sharing of genomic data by answering allele-presence questions about the genomes they hold. For example, a user could ask a beacon whether any of its genomes carries a specific nucleotide at a given position, and the beacon would respond “yes” or “no.”
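The yes-or-no exchange can be pictured with a small sketch. This is a hypothetical in-memory stand-in, not the GA4GH beacon software or its real API; the class name `ToyBeacon`, the `query` method, and the example variants are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, alternate allele).
Variant = Tuple[str, int, str]


class ToyBeacon:
    """Illustrative beacon: remembers which alleles appear in any member genome."""

    def __init__(self, genomes: Iterable[Set[Variant]]) -> None:
        self._alleles: Set[Variant] = set()
        for genome in genomes:
            self._alleles |= genome  # pool alleles; individual genomes are not kept

    def query(self, chrom: str, pos: int, allele: str) -> bool:
        """Answer only "yes" (True) or "no" (False) for one allele."""
        return (chrom, pos, allele) in self._alleles


# Made-up genomes, used only to exercise the sketch.
genome_a = {("1", 12345, "T"), ("2", 67890, "G")}
genome_b = {("1", 12345, "T"), ("X", 555, "A")}
beacon = ToyBeacon([genome_a, genome_b])

print(beacon.query("1", 12345, "T"))  # allele carried by a member genome
print(beacon.query("3", 1, "C"))      # allele carried by no member genome
```

Even in this toy version the response reveals nothing but presence or absence, which is the only signal the attack described in the study relies on.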
Although beacons are designed to share data while protecting patient privacy, in some cases they could leak phenotype and membership information.
“For instance, identifying that a given genome is part of the SFARI [Simons Foundation Autism Research Initiative] beacon, which contains genomic data from families with a child affected by autism spectrum disorder, means that the individual belongs to a family where some member has autism spectrum disorder,” explained authors Suyash Shringarpure, PhD, and Carlos Bustamante, PhD, researchers at Stanford University School of Medicine.
The Genetic Information Nondiscrimination Act (GINA) offers some genetic privacy protection, but it does not cover every type of insurance: long-term care, life, and disability insurance fall outside its scope.
To determine whether identities could be compromised through the beacon system under various scenarios, Shringarpure and Bustamante applied a likelihood-ratio test (LRT) that uses “allele presence or absence responses from a beacon to predict whether a given individual genome is present in the beacon database,” they explained.
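A simplified version of such a test can be sketched as follows. It is not the authors' exact formulation: it assumes each queried allele has a known population frequency, that individuals are diploid and unrelated, and that a small fixed error rate accounts for sequencing mismatches. The function name `membership_llr` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def membership_llr(responses: Sequence[bool], freqs: Sequence[float],
                   n_individuals: int, error_rate: float = 1e-6) -> float:
    """Return log L(H1) - log L(H0) for one query genome.

    H0: the genome is not in the beacon. H1: the genome is in the beacon.
    Each query targets an allele the genome carries. Assumes 0 < f < 1 for
    every frequency and 0 < error_rate < 1.
    """
    llr = 0.0
    for answered_yes, f in zip(responses, freqs):
        log_miss = math.log1p(-f)  # log P(one chromosome lacks the allele)
        # Under H0, "no" means none of the 2N chromosomes carries the allele.
        log_no_h0 = 2 * n_individuals * log_miss
        # Under H1, "no" needs a sequencing mismatch and N-1 non-carriers.
        log_no_h1 = math.log(error_rate) + 2 * (n_individuals - 1) * log_miss
        if answered_yes:
            # log(1 - exp(x)) computed stably as log(-expm1(x)).
            llr += math.log(-math.expm1(log_no_h1)) - math.log(-math.expm1(log_no_h0))
        else:
            llr += log_no_h1 - log_no_h0
    return llr


# Illustrative inputs only: 100 rare alleles, all answered "yes", 1,000 individuals.
score = membership_llr([True] * 100, [0.001] * 100, 1000)
print(score)
```

In this sketch a positive score favors membership and a negative score favors non-membership. Each "yes" for a rare allele adds a little evidence, while a single "no" weighs heavily against membership because it can only be explained by a sequencing mismatch.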
Through simulations, they demonstrated that by making just 5,000 queries, it was possible to identify someone or their relatives in a beacon with 1,000 individuals. Even with sequencing errors, it is possible to re-identify an individual, they found. “Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon,” the researchers indicated.
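The intuition behind these simulations can be reproduced with a toy experiment. The sketch below is not the authors' simulation: the numbers are scaled down, the allele frequencies are made up, sites are treated as independent, and sequencing errors are ignored. The helper names `simulate_genome` and `yes_fraction` are hypothetical.

```python
import random


def simulate_genome(freqs, rng):
    """Indices of sites where a simulated diploid genome carries the alternate allele."""
    return {i for i, f in enumerate(freqs) if rng.random() < 1 - (1 - f) ** 2}


rng = random.Random(0)  # fixed seed so the run is repeatable
n_sites, n_individuals = 20_000, 100  # assumed, scaled-down sizes
freqs = [rng.uniform(0.0002, 0.002) for _ in range(n_sites)]  # rare alleles only

members = [simulate_genome(freqs, rng) for _ in range(n_individuals)]
beacon_alleles = set().union(*members)  # everything the beacon answers "yes" to


def yes_fraction(genome):
    """Share of a genome's alternate alleles that the beacon reports as present."""
    return sum(site in beacon_alleles for site in genome) / len(genome)


member = members[0]
outsider = simulate_genome(freqs, rng)  # drawn the same way, never added to the beacon

print("member:", yes_fraction(member))
print("outsider:", yes_fraction(outsider))
```

With no sequencing errors, every query about a member's own alleles returns "yes," whereas an outsider's rare alleles are often absent from the beacon. That gap is what a membership test exploits, and it narrows as the beacon grows or as more common alleles are queried.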
Several methods could potentially make it harder to detect membership in a beacon, but also might complicate things for legitimate users of these web services, the researchers indicated. Increasing beacon size is one option, yet “protection against genome-wide re-identification attacks will require tens of thousands of individuals,” they noted.
Other potential solutions involve aggregating datasets to obscure data sources or calling for minimum beacon sizes.
Disallowing anonymous access by requiring that users authenticate their identity before querying these web servers might be the most effective step toward patching security holes in the beacon system, the researchers suggested.
Shringarpure and Bustamante have been working with investigators from GA4GH to mitigate security breaches in genomic databases.