UC Santa Cruz research explores the genetic basis for cancer
It took more than 10 years for researchers at UC Santa Cruz to compile the first sequence of the human genome—the complete collection of DNA that serves as the alphabet of life.
Since that milestone in 2003, sequencing technology has progressed at a breathtaking rate, and it now takes just months to amass the same information.
This has created a surprising predicament on the UCSC campus: sequence data is piling up faster than researchers can analyze it. The mounting collection of tumor sequences is particularly unmanageable, say scientists.
“It’s surprising that we have so much information and yet we don’t know how one body of data relates to another,” says Ted Goldstein, who recently obtained a doctorate in biomolecular engineering at UCSC. “We need to interconnect all [of] the genomic information with clinical data.”
A new $3.5 million grant aims to bridge this gap. The funds were awarded by the National Cancer Institute earlier this month, and will help researchers at the UCSC Center for Biomolecular Science and Engineering develop data management tools of unprecedented sophistication.
The primary goal is to develop models that relate genetic information to clinical outcomes like drug resistance and survival rates. This will allow doctors to treat a new patient based on the molecular signature of their tumor cells, says Goldstein.
A Much Bigger Puzzle
UC Santa Cruz archives tumor sequences from doctors and scientists across the nation. Data often include RNA sequences that can be used to measure gene expression.
“Our contribution will be the development of tools that can be used to associate sequence information with clinical data,” says Josh Stuart, professor of biomolecular engineering and the principal investigator on the grant project.
The clinical data corresponding to each tumor sequence can sometimes be accessed through the National Cancer Institute, allowing researchers to ask a range of questions. Mutations have already been linked to drug resistance, and tumor cells take on different growth patterns depending on the clusters of genes expressed at any given time.
“I picture it like an onion,” says Stuart. “In the core are DNA sequence reads of the tumor cell, and then there are variants between tumor cells and healthy cells. We also have gene expression patterns and signaling pathways. The goal is to develop computational methods that inter-relate all the layers, and link the results to clinical outcomes in patients.”
This is easier said than done. Simply locating genes in a sequence of DNA takes a special algorithm, as does identifying the variation within and between tumor types. Statistical analysis must be used to compare every gene in a resistant tumor to the corresponding gene in a treatable tumor. Taken together, the relative differences across genes become a signature used to predict clinical outcomes.
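The gene-by-gene comparison described above can be illustrated with a minimal sketch. It assumes that the "relative difference" is a log2 ratio of average expression levels, which is one common choice and not necessarily the measure the UCSC team uses. The gene names, expression values and the `expression_signature` function are hypothetical.

```python
import math
from statistics import mean


def expression_signature(resistant, treatable, pseudocount=1.0):
    """Return a per-gene log2 ratio of mean expression (resistant vs. treatable).

    Both arguments map a gene name to a list of expression values, one per tumor.
    The pseudocount avoids division by zero for genes that are not expressed.
    """
    signature = {}
    # Only compare genes that were measured in both groups of tumors.
    for gene in resistant.keys() & treatable.keys():
        r = mean(resistant[gene]) + pseudocount
        t = mean(treatable[gene]) + pseudocount
        signature[gene] = math.log2(r / t)
    return signature


# Made-up expression values for three placeholder genes, two tumors per group.
resistant = {"GENE_A": [7.0, 7.0], "GENE_B": [1.0, 1.0], "GENE_C": [3.0, 3.0]}
treatable = {"GENE_A": [1.0, 1.0], "GENE_B": [7.0, 7.0], "GENE_C": [3.0, 3.0]}

signature = expression_signature(resistant, treatable)
for gene in sorted(signature):
    print(gene, signature[gene])
```

In this toy data, a positive value marks a gene that is more active in resistant tumors, a negative value marks one that is less active, and zero marks no difference. A real analysis would also need a statistical test to separate genuine differences from noise.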
This is an expensive and time-consuming process, which is why studies have so far focused on a single type of cancer. “Sometimes cancers arise in multiple tissues, and they share molecular properties,” says Stuart. “This is one of the reasons we want to compare data across tumor types.”
Competing for Survival
Piecing together the complete puzzle requires relating molecular information for brain, breast and many other cancers. This, in turn, requires rigorous testing of the models developed for each tumor type, says Adam Margolin, director of computational biology at the Seattle, Wash.-based Sage Bionetworks.
Margolin will help Stuart identify the best models for predicting clinical outcomes. Stuart and his collaborators will design a wide variety of computational methods, and Margolin will test each model on the same group of patients.
Margolin and his colleagues recently evaluated 1,400 models used to predict survival in breast cancer patients. To monitor gene expression, many models look to messenger RNA, which carries a copy of a gene’s DNA sequence to the cellular machinery responsible for protein synthesis. The relative abundance of each messenger RNA indicates which genes are being translated into proteins, and this is thought to predict survival.
There is just one problem: researchers can’t agree on which patterns of messenger RNA are the best predictors. “The literature is inconsistent,” says Margolin.
To tackle this debate, he and his colleagues collected models from 350 teams of researchers located in 35 countries. They tested each model on the same batch of patients and selected a superior model based on its accuracy. The findings appeared in Science Translational Medicine last April.
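The head-to-head format can be sketched in a few lines. This is only an illustration of scoring every model on the same patients. It assumes each model makes a yes-or-no survival prediction and that accuracy is the fraction of correct predictions, which may differ from the scoring used in the actual competition. The patients, marker names and models are invented.

```python
def accuracy(model, patients):
    """Fraction of patients whose outcome the model predicts correctly."""
    correct = sum(1 for features, survived in patients if model(features) == survived)
    return correct / len(patients)


def rank_models(models, patients):
    """Score every model on the same patients and sort from best to worst."""
    scores = {name: accuracy(predict, patients) for name, predict in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical patients: (measured markers, whether the patient survived).
patients = [
    ({"marker_x": 0.9, "marker_y": 0.2}, False),
    ({"marker_x": 0.8, "marker_y": 0.7}, False),
    ({"marker_x": 0.1, "marker_y": 0.9}, True),
    ({"marker_x": 0.3, "marker_y": 0.1}, True),
]

# Hypothetical competing models, each a simple threshold rule on one marker.
models = {
    "model_x": lambda p: p["marker_x"] < 0.5,
    "model_y": lambda p: p["marker_y"] > 0.5,
}

ranking = rank_models(models, patients)
print(ranking)
```

Because every model sees the same patients, the scores are directly comparable, which is the point of running the comparison as a competition.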
Margolin will now run this same competition for the models developed by Stuart and his team. “The idea is to test methods on larger patient groups and answer a wider range of clinical questions,” he says.
Medicine Based on Molecular Profiles
Three years ago, Stuart and his team set out to identify the genetic factors underlying drug-resistant breast cancer. The findings recently appeared in the academic journal Nature and offer a glimpse of the connections the researchers will make over the coming years.
When a patient is diagnosed with breast cancer, doctors biopsy the tumor and profile receptors on the cell surface. If there is an abundance of estrogen receptors, it’s likely the tumor grows in response to the hormone estrogen. Other types of receptors suggest the cell is addicted to progesterone or human epidermal growth factor.
Blocking the receptors is the most common treatment. For example, patients with estrogen-addicted tumors are given blockers that specifically target estrogen receptors, “but sometimes this type of personal therapy doesn’t work,” says Goldstein, who collaborated with Stuart on the project.
To find out why some patients are more responsive to treatment, the team scanned the whole genomes of 77 drug-resistant tumors. They identified mutations in several important genes, including MALAT1 and BIRC6, which appear to drive tumor growth even when estrogen is blocked at the cell surface.
“If you have one of these mutations, estrogen blockers may not help,” says Goldstein.
The model may one day allow doctors to prescribe treatments based on patient mutation profiles. If a new breast cancer patient tests positive for an estrogen-addicted tumor, it will be feasible to look for mutations in MALAT1 and BIRC6 and choose a more aggressive treatment.
This wasn’t possible 10 years ago, but advances in sequencing technology and the development of data repositories now allow researchers to compare multiple human genomes. While it may be too expensive to compile the entire genome for every cancer patient, regions of tumor DNA can be quickly sequenced.
Stuart now hopes to broaden this analysis. “No one had looked at so many whole genomes before—at least not for breast cancer,” he says. “But it’s the classic example of a study trapped in one tissue type. We now want to broaden our view and look across tissues.”
Knowledge in a Nutshell
UCSC is renowned for its contributions to the field of computational biology. Yet despite its sophisticated staff and decades of publications, whole genome analysis takes time.
Stuart and his group will begin by organizing the petabytes of data stored in the UCSC Cancer Genomics Hub and other repositories so that information is in the same format.
Researchers will then start working on predictive models, and Margolin and his team will identify the best of the batch.
The end product will be a graph-based database that works much like Google Maps, which overlays spatial information with restaurant reviews and parking information. Called the Biomedical Evidence Graph, the database will contain information that relates sequence information, gene expression patterns and clinical outcomes.
“Instead of turning on the restaurants, we might be able to turn on all the p53 mutants and see which cancers they are associated with,” says Stuart. Mutations in the p53 gene are among the most common mutations found in tumor cells.
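A toy version of that kind of lookup might resemble the sketch below. It is not the Biomedical Evidence Graph itself, whose design the article does not describe. The `EvidenceGraph` class, the node labels and the sample data are all hypothetical, chosen only to show how linking genes, tumor samples and cancer types in a graph makes the question easy to ask.

```python
from collections import defaultdict


class EvidenceGraph:
    """A tiny undirected graph whose nodes are (kind, name) pairs."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_edge(self, a, b):
        self._edges[a].add(b)
        self._edges[b].add(a)

    def neighbors(self, node, kind):
        """Return the neighbors of a node that have the requested kind."""
        return {n for n in self._edges.get(node, ()) if n[0] == kind}


def cancers_with_mutation(graph, gene):
    """Follow gene -> sample -> cancer links to list the associated cancers."""
    cancers = set()
    for sample in graph.neighbors(("gene", gene), "sample"):
        cancers |= graph.neighbors(sample, "cancer")
    return cancers


graph = EvidenceGraph()
# Made-up tumor samples, each linked to a cancer type and a mutated gene.
graph.add_edge(("sample", "S1"), ("cancer", "breast"))
graph.add_edge(("sample", "S2"), ("cancer", "brain"))
graph.add_edge(("sample", "S3"), ("cancer", "breast"))
graph.add_edge(("gene", "p53"), ("sample", "S1"))
graph.add_edge(("gene", "p53"), ("sample", "S2"))
graph.add_edge(("gene", "GENE_X"), ("sample", "S3"))

p53_cancers = cancers_with_mutation(graph, "p53")
print(sorted(name for _, name in p53_cancers))
```

The same traversal could run in the other direction, starting from a cancer type and collecting the genes mutated in its samples.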
“We also plan to flag patients with unique mutations or genetic signatures,” says Goldstein.
No one has the tools to do this yet, and if the researchers are successful, they stand to revolutionize the field of personalized medicine. “The idea is to build a discovery environment so that information can be shared by the larger medical community,” says Stuart.