In a report that appears in the Jan. 18 issue of the journal Science, researchers say they were able to determine the identities of nearly 50 people who had undergone genome sequencing. Only males could be directly identified because the scientists used information found only on the Y chromosome.
The first step was going online and pulling up the anonymous but unique genetic information.
“Right now in genomics, we have databases that are publicly available and they contain thousands of genomes — but without explicit identifiers, without the name and the surname of the person,” explained study author Yaniv Erlich. “What we’ve shown is that you can take a genome from this publicly available database for research, and analyze the Y chromosome in this genome if this is a male.”
That may be possible for Erlich — a distinguished fellow at the Whitehead Institute for Biomedical Research — but what about someone without specialized knowledge?
“You need some skills to do that,” he said. “There are available tools that you can use, but you need to know what to do. It’s not like a layperson can start tomorrow doing this research. Surely there is some learning curve.”
But, he added, “You don’t need a lab, just a computer with Internet connection.”
The next step involved genetic genealogy sites — such as Family Tree DNA — that allow people to search for their ancestors.
“So we analyze the Y chromosome, and then you can take this data and go to a different database of recreational genetic genealogy,” Erlich said. “And some of these databases, they have a search engine where you can plug these Y chromosome markers in search for matches.”
And if the anonymous genome donor is related through the male line to anyone on the ancestor search site, voila, the last name appears.
“And it wouldn’t have to be their brothers,” Erlich said. “We found it could be your second cousin, once removed. It can be even larger than that. It can propagate quite far on your family tree — this connection between a surname and the Y chromosome.”
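The matching step Erlich describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the study's actual method. The surnames and repeat counts are invented, and the database is a hypothetical in-memory list rather than any real genealogy site's search engine. The one-mismatch tolerance is an assumption standing in for the small differences that can separate distant paternal relatives. Only the marker names (DYS393 and the like) refer to real Y-chromosome loci.

```python
from collections import Counter

# Hypothetical genealogy records: (surname, {marker: repeat count}).
# All surnames and values are made up for illustration.
GENEALOGY_DB = [
    ("Alder", {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}),
    ("Alder", {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}),
    ("Birch", {"DYS393": 12, "DYS390": 22, "DYS19": 15, "DYS391": 10}),
    ("Cedar", {"DYS393": 14, "DYS390": 23, "DYS19": 16, "DYS391": 9}),
]


def marker_mismatches(a, b):
    """Count shared markers whose repeat counts differ."""
    shared = a.keys() & b.keys()
    if not shared:
        # No markers in common, so the profiles cannot be compared.
        return float("inf")
    return sum(1 for marker in shared if a[marker] != b[marker])


def candidate_surnames(query, db, max_mismatches=1):
    """Return surnames of close matches, most frequent first.

    max_mismatches is an assumed tolerance, not a value from the study.
    """
    hits = Counter(
        surname
        for surname, profile in db
        if marker_mismatches(query, profile) <= max_mismatches
    )
    return [name for name, _ in hits.most_common()]


# Markers read from an "anonymous" genome (again, invented values).
query = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
print(candidate_surnames(query, GENEALOGY_DB))
```

Allowing a small number of mismatches is what lets a distant relative's record still point to the same surname, which is the effect Erlich describes.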
Women — who only have X chromosomes and who rarely pass their surnames on through generations — cannot be directly identified with this method.
Even though researchers had uncovered the last name, another step remained.
“There are tens of thousands of people in the U.S. with [the same] surname,” Erlich said. But he noted that HIPAA [the U.S. Health Insurance Portability and Accountability Act] privacy rules allow participants’ ages and states to be posted on research databases.
Once age, state and name are all revealed, he said, the possibilities narrow to about 12 people. Internet search tools and other public information can further allow a user to pinpoint an individual.
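The narrowing works like a series of filters applied one after another. A rough sketch, assuming a hypothetical list of public records (the `Person` type, the `narrow` function and every record below are invented for illustration and do not reproduce the study's figures):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    surname: str
    age: int
    state: str


# Invented records standing in for public listings; none describe real people.
PEOPLE = [
    Person("Alder", 52, "UT"),
    Person("Alder", 52, "CA"),
    Person("Alder", 34, "UT"),
    Person("Alder", 52, "UT"),
    Person("Birch", 52, "UT"),
]


def narrow(people, surname, age, state):
    """Apply the three filters in turn and count the survivors of each."""
    by_surname = [p for p in people if p.surname == surname]
    by_age = [p for p in by_surname if p.age == age]
    by_state = [p for p in by_age if p.state == state]
    return len(by_surname), len(by_age), len(by_state)


# Counts after filtering on surname, then age, then state.
print(narrow(PEOPLE, "Alder", 52, "UT"))
```

Each added attribute shrinks the pool of candidates. Whoever remains would still have to be told apart using other public information.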
“If you look at the genomes of someone, you can see predisposition for certain [medical] conditions, but maybe this is not the most sensitive information,” Erlich said. For instance, he noted, evidence of non-paternity within a family could be revealed. Medical insurers aren’t allowed to deny coverage based on genetic data, he said, but life insurers are not prevented from doing so.
Laura Lyman Rodriguez, director of the Office of Policy, Communications and Education at the U.S. National Human Genome Research Institute, discussed the balance between protecting privacy and advancing science and health.
“Tightening security or locking down data is not always the best answer,” said Rodriguez, who co-wrote an editorial accompanying the study. Genomic databases provide a huge, vastly improved pool of knowledge for scientists everywhere, she said.
With the Human Genome Project, it was decided to put all information “into the public domain within 24 hours of it being produced,” she explained. “So it wasn’t then only a benefit to those trying to complete mapping the genome and the production of the sequence itself, but also to begin using it to understand biology, which can then improve health; to look at what genes might be contributing to disease.”
Rodriguez and Erlich emphasized both the importance of sharing public data and the need for full disclosure to research participants.
“We want to be open and transparent with the participants. We want to tell them the benefits — and I’m not diminishing the benefits,” Erlich said. “We want to tell them the risks. And we are very open and clear about the risks: You might be identified. It’s about autonomy. It’s about empowering these people who take this and form their decision.”
Rodriguez said the study highlights the need to stay on top of privacy issues.
“There are privacy protections in place for participants and we just need to make sure that those remain up to date with the technology and methods and information available to the public, so that they remain as robust as we want them to be and the public wants them to be,” she said.