It’s one of the most daring projects in biology today – reading the entire genome of every bird, mammal, lizard, fish, and every other creature with a backbone.
And now comes the first major payoff from the Vertebrate Genomes Project (VGP): near-complete, high-quality genomes of 25 species, as Howard Hughes Medical Institute (HHMI) Investigator Erich Jarvis and numerous co-authors report on April 28, 2021, in the journal Nature. These species include the greater horseshoe bat, the Canada lynx, the platypus, and the kākāpō parrot – one of the first high-quality genomes of an endangered vertebrate species.
The paper also lays out the technical advances that allow scientists to reach a new level of precision and completeness, and it paves the way for decoding the genomes of the roughly 70,000 vertebrate species living today, says HHMI Investigator and study co-author David Haussler, a computational geneticist at the University of California, Santa Cruz (UCSC). “We will get a spectacular picture of how nature has in fact filled all ecosystems with this incredibly diverse range of animals.”
Together with a host of accompanying papers, the work is beginning to deliver on this promise. The project team discovered previously unknown chromosomes in the zebra finch genome, for example, and made a surprising discovery about genetic differences between the marmoset and the human brain. The new research also offers hope of saving the endangered kākāpō parrot and vaquita porpoise from extinction.
“These 25 genomes represent a milestone,” says Jarvis, chair of the VGP and a neurogeneticist at Rockefeller University. “We are learning a lot more than we expected,” he says. “The work is proof of principle for what is to come.”
From 10K to 70K
The VGP milestone has been years in the making. The origins of the project date back to the late 2000s, when Haussler, geneticist Stephen O’Brien, and Oliver Ryder, director of conservation genetics at the San Diego Zoo, thought it was time to think big.
Instead of sequencing just a few species, such as humans and model organisms like fruit flies, why not read the complete genomes of ten thousand animals in a bold “Genome 10K” effort? At the time, however, the price tag ran to hundreds of millions of dollars, and the plan never really got off the ground. “Everyone knew it was a great idea, but no one wanted to pay for it,” recalls HHMI Investigator Beth Shapiro, an evolutionary biologist at UCSC and a co-author of the Nature paper.
In addition, scientists’ early efforts to spell out, or “sequence,” all the letters of DNA in an animal’s genome were riddled with errors. In the original approach used to complete the first rough human genome in 2003, scientists cut DNA into small pieces a few hundred letters long and read those letters. Then came the devilishly difficult job of putting the pieces together in the correct order. The methods were not up to the task, leading to misassemblies, major gaps, and other errors. Often, it was not even possible to map genes onto individual chromosomes.
The introduction of new sequencing technologies with shorter reads helped make the idea of reading thousands of genomes possible. These rapidly developing technologies reduced costs, but they also reduced the quality of the genome assemblies. Then, in 2015, Haussler and his colleagues brought in Jarvis, a pioneer in deciphering the complex neural circuits that allow birds to trill new melodies after listening to the songs of others. Jarvis had previously shown a knack for managing large, complex efforts. In 2014, he and more than 100 colleagues sequenced the genomes of 48 species of birds, which revealed new genes involved in vocal learning. “David and others asked me to take the lead on the Genome 10K project,” recalls Jarvis. “They felt I had the personality for it.” Or, as Shapiro puts it, “Erich is a very pushy leader, in a beautiful way. What he wants to happen, he will make happen.”
Jarvis expanded and rebranded the Genome 10K idea to include all vertebrate genomes. He also helped launch a new sequencing center at Rockefeller which, along with one at the Max Planck Institute in Germany led by former HHMI Janelia Research Campus Group Leader Gene Myers, and another at the Sanger Institute in the UK led by Richard Durbin and Mark Blaxter, currently produces most of the VGP genomic data. He asked Adam Phillippy, a leading genome expert at the National Human Genome Research Institute (NHGRI), to chair the VGP assembly team. Then he found around 60 top scientists willing to use their own grant money to pay for sequencing at the centers in order to tackle the genomes they were most interested in. The team also negotiated with Māori in New Zealand and officials in Mexico to obtain samples of kākāpō and vaquita in “a fine example of international collaboration,” says Sadye Paez, VGP program director at Rockefeller.
The huge team of researchers has achieved a series of technological advances. New sequencing machines allow them to read pieces of DNA 10,000 letters long or more, instead of a few hundred. The researchers have also developed clever ways to assemble these segments into individual chromosomes. And they have been able to tell apart the copies of each gene inherited from the mother and from the father. This fixes a particularly thorny problem known as “false duplication,” where scientists mistakenly label maternal and paternal copies of the same gene as two separate genes.
“I think this work opens a really important set of doors, because the technical aspects of assembly have been the bottleneck of genome sequencing in the past,” says Jenny Tung, a geneticist at Duke University who was not directly involved in the research. Having high-quality sequencing data “will transform the types of questions people can ask,” she says.
The team’s improved precision shows that previous genome sequences are seriously incomplete. In the zebra finch, for example, the team found eight new chromosomes and around 900 genes that had been thought to be missing. Previously unknown chromosomes have also turned up in the platypus, as members of the team reported online in Nature earlier this year. The researchers also worked through and correctly assembled long stretches of repetitive DNA, most of which contain only two of the four genetic letters. Some scientists had considered these sections to be non-functional “junk” or “dark matter.” Wrong. Many of the repeats occur in regions of the genome that code for proteins, Jarvis says, suggesting that this DNA plays a surprisingly crucial role in turning genes on or off.
This is only the beginning of what the Nature paper envisions as “a new era of discovery in the life sciences.” With each newly sequenced genome, Jarvis and his collaborators make new, often unexpected, discoveries. Jarvis’s lab, for example, has finally pinned down the regulatory region of a key gene that parrots and songbirds need to learn tunes; next, his team will try to figure out how it works. The marmoset genome has yielded several surprises. While the brain genes of the marmoset and the human are largely conserved, the marmoset has several genes encoding amino acid variants that are pathogenic in humans. This underlines the need to take genomic context into account when developing animal models, the team reports in an accompanying paper in Nature. And in results published last year in Nature, a group led by Professor Emma Teeling of University College Dublin in Ireland found that some bats have lost immunity-related genes, which may help explain their ability to tolerate viruses like SARS-CoV-2, the virus responsible for COVID-19.
The new information can also boost efforts to save rare species. “It is an extremely important moral duty to help endangered species,” says Jarvis. That’s why the team collected samples from a kākāpō parrot named Jane, part of a captive breeding program that has brought the parrot back from the brink of extinction. In an article published in the new journal Cell Genomics, of the Cell family of journals, Nicolas Dussex of the University of Otago and his colleagues described their studies of Jane’s genes along with those of other individuals. The work revealed that the last surviving kākāpō population, isolated on an island off New Zealand for the past 10,000 years, has somehow purged deleterious mutations, despite the species’ low genetic diversity. A similar finding was reported for the vaquita, which has only around 10 to 20 individuals remaining on the planet, in a study published in Molecular Ecology Resources and led by Phil Morin at the National Oceanic and Atmospheric Administration Fisheries in La Jolla, California. “This means there is hope for the conservation of the species,” Jarvis concludes.
A clear path
The VGP is now focusing on sequencing even more species. The project team’s next goal is to complete 260 genomes, representing all orders of vertebrates, and then to secure enough funding to tackle thousands more, representing all families. This job will not be easy and will inevitably bring new technical and logistical challenges, Tung says. Once the hundreds or even thousands of animals that are easy to find in zoos or laboratories have been sequenced, scientists may face ethical hurdles in obtaining samples from other species, especially when the animals are rare or endangered.
But with the new paper, the way forward seems clearer than it has been in years. The VGP model is even inspiring other major sequencing efforts, including the Earth BioGenome Project, which aims to decode the genomes of all eukaryotic species within 10 years. Perhaps for the first time, it seems possible to fulfill the dream that Haussler and many others share of reading every letter of every organism’s genome. Darwin saw the enormous diversity of life on Earth as “endless forms most beautiful,” Haussler observes. “Now we have an incredible opportunity to see how these forms came to be.”