It ultimately became a Nobel Prize-winning revolution when researchers first engineered CRISPR as a gene-editing technology for bacterial, plant, animal, and human cells. The technology's potential is great, ranging from curing genetic diseases to applications in agricultural and industrial biotechnology, but challenges remain.
One of these challenges is selecting a so-called gRNA molecule, which is designed to guide the Cas9 protein to the right place in the DNA, where it makes the cut that allows the gene to be edited.
“Typically, there are several possible gRNAs, and not all of them are equally effective. Therefore, the challenge is to select the few that work with high efficiency, and this is precisely what our new method does,” says Yonglun Luo, associate professor in the Department of Biomedicine at Aarhus University.
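To illustrate why several candidate gRNAs usually exist for one gene, here is a minimal sketch that lists candidate target sites in a DNA sequence. It assumes the commonly used Cas9 from Streptococcus pyogenes, which needs a 20-nucleotide target followed by an "NGG" motif (the PAM); neither detail is stated in this article, and the function names are hypothetical. It only enumerates candidates and says nothing about how efficient each one is.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_grnas(dna, spacer_len=20):
    """List (strand, start, spacer, pam) for every site followed by an NGG PAM.

    Assumption: SpCas9-style targeting (20-nt spacer, NGG PAM).
    For the "-" strand, start is an index into the reverse-complemented sequence.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # The lookahead lets overlapping candidate sites all be reported.
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits
```

Even a short gene typically yields many such sites, which is why a way of ranking them by expected efficiency is useful.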
The new method builds on new data generated by the researchers and on an algorithm that predicts which gRNAs will work most efficiently.
“By combining our own data with publicly available data and including knowledge about the molecular interactions between gRNA, DNA and the CRISPR-Cas9 protein, we have succeeded in developing a better method,” says Jan Gorodkin, professor in the Department of Veterinary and Animal Sciences at the University of Copenhagen.
Data, deep learning and molecular interactions
Jan Gorodkin’s research group with Giulia Corsi and Christian Anthon collaborated with Yonglun Luo’s research group to obtain the new results. The experimental part of the study was conducted by Luo’s group while Gorodkin’s group led the computer modeling.
“In our study, we quantified the efficiency of gRNA molecules for more than 10,000 different sites. The work was carried out using a massive, high-throughput library-based method, which would not be possible with traditional methods,” says Yonglun Luo.
To generate the data, the researchers started from the concept of having a virus deliver a gRNA together with a synthetic target site into one cell at a time. The synthetic target sites have exactly the same DNA sequences as the corresponding target sites in the genome. These synthetic sites therefore serve as so-called surrogate target sites that capture the efficiency of CRISPR-Cas9 editing. Together with colleagues from BGI-Research’s Lars Bolund Institute of Regenerative Medicine and Harvard Medical School, they generated high-quality CRISPR-Cas9 activity data for more than 10,000 gRNAs.
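As a rough illustration of how a surrogate target site can yield an efficiency number, the sketch below assumes that efficiency is measured as the fraction of sequencing reads in which the surrogate site has been edited. The article does not specify the exact measure, so this definition, the read counts, the gRNA names and the minimum-coverage cutoff are all hypothetical.

```python
def editing_efficiency(edited_reads, total_reads, min_reads=100):
    """Fraction of surrogate-site reads that carry an edit.

    Returns None when coverage is below min_reads (an assumed cutoff),
    because a ratio from very few reads is unreliable.
    """
    if total_reads < min_reads:
        return None
    return edited_reads / total_reads


# Hypothetical counts per gRNA: (reads with an edited surrogate site, all reads).
counts = {
    "gRNA_A": (450, 500),
    "gRNA_B": (60, 400),
    "gRNA_C": (3, 20),  # too few reads to trust
}

efficiencies = {name: editing_efficiency(e, t) for name, (e, t) in counts.items()}

# Rank the gRNAs that have a usable measurement, most efficient first.
ranked = sorted(
    (name for name, value in efficiencies.items() if value is not None),
    key=lambda name: efficiencies[name],
    reverse=True,
)
```

Repeating this for every gRNA in a library gives the kind of table of sequences and measured efficiencies that a prediction model can then be trained on.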
With this dataset of gRNAs with known efficiencies ranging from low to high, the researchers were able to build a model that can predict the efficiency of gRNAs it has never seen before.
“In order to train an algorithm to become accurate, you need to have a large data set. With our virus library, we have obtained data that is the perfect starting point to train our deep learning algorithm to predict the efficiency of gRNAs for gene editing. Our new method is more precise than the other methods currently available,” explains Jan Gorodkin.
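The general idea of learning from measured gRNAs and then scoring unseen ones can be sketched with a deliberately simple stand-in. The example below is not the researchers' deep learning model: it is a plain linear regression on one-hot encoded sequences, fitted by stochastic gradient descent, and all function names and parameter values are assumptions chosen for illustration.

```python
NUCLEOTIDES = "ACGT"


def one_hot(spacer):
    """Encode a gRNA sequence as four 0/1 indicators per position."""
    vec = []
    for base in spacer.upper():
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def fit_linear(spacers, efficiencies, epochs=50, lr=0.02):
    """Fit weights and a bias by stochastic gradient descent on squared error.

    The default lr is an assumed value suited to 20-nt sequences.
    """
    X = [one_hot(s) for s in spacers]
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, efficiencies):
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - target
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b


def predict(model, spacer):
    """Predict the efficiency of a gRNA the model was not trained on."""
    w, b = model
    return b + sum(wi * xi for wi, xi in zip(w, one_hot(spacer)))
```

A model like this would be trained on the measured gRNAs and then used to rank new candidates by predicted efficiency. A linear model can only capture the separate contribution of each position, which is one reason a larger dataset and a more expressive model are attractive.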