UCSC already has a claim to fame in the history of genomic data; it was a team from the university that published the first draft of the human genome online in 2000. Now, with a new $600,000 National Science Foundation grant, another UCSC-led team could be on its way to making genomic history—this time, defining what constitutes privacy when the information at stake is what makes you who you are.
Abhradeep Guha Thakurta, assistant professor of computer science and engineering at UCSC, is on a team exploring how to best give researchers access to increasing amounts of genomic data. The stakes are high, promising unprecedented insight into what causes—and could possibly cure—a range of diseases and chronic conditions.
How to share that valuable information without revealing deeply personal medical details is the balance that Guha Thakurta will try to strike, along with UCSC Assistant Professor of Biomolecular Engineering Russ Corbett-Detig, UCSC Professor of Computer Science Dimitris Achlioptas, and Temple University Assistant Professor of Statistical Science Vishesh Karwa.
“Your genome sequence is your fingerprint,” says Guha Thakurta. It offers a clue to highly individualized strengths and weaknesses in human biology, and it is increasingly of interest to at-home gene analysis companies, drug makers, advertisers, and other business and research interests.
The explosion in genetic data is fueled in part by a huge decrease in the cost of genetic sequencing, from around $3 billion for the groundbreaking Human Genome Project to $1,000 today for whole-genome sequencing. Companies like 23andMe offer a less-detailed view of a person’s DNA for as little as $100.
Companies are cropping up to charge people for all kinds of insights purportedly based on their DNA. Many operate in the field of “personalized medicine,” offering a chance to adapt medical care and behavior to individual genetic health risks. And then there are ventures like Helix, which offers products “personalized by your DNA,” from $90 weight-loss plans and $60 wine recommendations to color-coded genetic results printed on socks, shirts and tote bags.
When people take the plunge to learn about their DNA, it’s not just their own information they’re sharing (or wearing). Some 60 percent of Americans of Northern European descent can be identified through genetic databases, regardless of whether they’ve personally joined, a recent study found. That number could reach 90 percent within three years.
With companies and researchers vying for gene data for their own purposes, the researchers at UCSC are trying to allow medical teams to access more shared data—wherever it may be—without compromising deeply personal details. “Privacy is not a scientific word,” Guha Thakurta says. “It is an expectation of people.”
He brings years of experience dealing with this gray area, including privacy work at Microsoft Research, Yahoo Labs and the security group at UC Berkeley. Guha Thakurta also worked at Apple from 2015 to 2017 on “differential privacy,” a way of gaining insights from a group of users’ data without revealing information about individual users. So far, that’s been difficult to do with hyper-specific genetic data.
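For readers curious what differential privacy can look like in practice, here is a minimal sketch of one common textbook approach, the Laplace mechanism, applied to a simple count. It is an illustration only, not the method used at Apple or planned by the UCSC team. The function name `private_count`, the `carrier` field, and the `epsilon` value are all assumptions made for the example.

```python
import random


def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records matching `predicate`.

    Hypothetical helper illustrating the Laplace mechanism; smaller
    `epsilon` means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    true_count = sum(1 for record in records if predicate(record))

    # Adding or removing one person changes a count by at most 1,
    # so the query's sensitivity is 1 and the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon

    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centered at 0.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    # Made-up data: 1,000 people, one in four flagged as carrying a variant.
    people = [{"carrier": i % 4 == 0} for i in range(1000)]
    noisy = private_count(people, lambda p: p["carrier"], epsilon=1.0)
    print(f"Noisy carrier count: {noisy:.1f}")
```

The published number stays close to the true group total, while the random noise makes it hard to tell whether any one person is in the data. A full genome carries far more identifying detail than a single yes-or-no field, which is part of why the approach has been hard to carry over to genetic data.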
As it stands, when someone spits in a tube and sends it to a private company to be sequenced, they often don’t know where their data is going or how it’s going to be used. But there is at least one nearby startup trying to change that, offering customers a chance to control their DNA—and make money off of it.
Most people are paying personal genomics companies “for the privilege of having them take your data and resell it,” says Kamal Obbad, CEO and co-founder of San Francisco startup Nebula Genomics. He pitches a world where the cost of gene sequencing shifts from individuals to organizations using their data by letting people sell directly to researchers or buyers like biopharmaceutical companies.
That makes it more important to answer social and regulatory questions about who genetic data belongs to, Guha Thakurta says. Ultimately, he hopes the new grant project will yield privacy protections that go beyond an academic paper, to actually be used by those who control genomic data—whoever they may be.