The fact that the human body is made up of cells is a basic, well-understood concept. Yet surprisingly, scientists are still trying to determine the different types of cells that make up our organs and contribute to our health.
A relatively new technique called single-cell sequencing allows researchers to recognize and categorize cell types based on characteristics such as the genes they express. But this type of research generates huge amounts of data, with data sets of hundreds of thousands to millions of cells.
A new algorithm developed by Joshua Welch, Ph.D., of the Department of Computational Medicine and Bioinformatics, Ph.D. candidate Chao Gao and their team uses online learning, dramatically speeding up this process and giving researchers around the world a way to analyze large data sets using the amount of memory found on a standard laptop. The results are described in the journal Nature Biotechnology.
“Our technique allows anyone with a computer to perform analyses at the scale of an entire organism,” Welch explains. “This is really where the field is heading.”
The team demonstrated their proof of principle using data sets from the National Institutes of Health’s BRAIN Initiative, a project that aims to understand the human brain by mapping every cell, with investigative teams across the country, including Welch’s lab.
Typically, Welch explains, for projects like this, each single-cell data set that is submitted must be reanalyzed together with the previous data sets, in the order they arrive. The new approach allows new data sets to be added to existing ones without reprocessing the older data sets. It also allows researchers to divide data sets into so-called mini-batches to reduce the amount of memory needed to process them.
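The mini-batch idea described above can be illustrated with a small Python sketch. This is not the team's algorithm, which the article does not detail. It swaps in a much simpler stand-in, a running per-gene average, purely to show the pattern: the summary is updated one mini-batch at a time, and a later data set is folded in without revisiting earlier ones. The names `mini_batches`, `RunningGeneMean` and `add_dataset`, and the toy expression values, are all hypothetical.

```python
from typing import Iterator, List, Sequence

Cell = Sequence[float]  # one cell = expression values for a fixed list of genes


def mini_batches(cells: Sequence[Cell], size: int) -> Iterator[Sequence[Cell]]:
    """Yield consecutive slices of at most `size` cells."""
    for start in range(0, len(cells), size):
        yield cells[start:start + size]


class RunningGeneMean:
    """Hypothetical stand-in for a model that is updated incrementally.

    It keeps only a per-gene mean and a cell count, so memory use does not
    grow with the number of cells already processed.
    """

    def __init__(self, n_genes: int) -> None:
        self.n_cells = 0
        self.mean: List[float] = [0.0] * n_genes

    def update(self, batch: Sequence[Cell]) -> None:
        """Fold one mini-batch into the running mean."""
        for cell in batch:
            self.n_cells += 1
            for g, value in enumerate(cell):
                # Incremental mean update: no earlier cells are needed.
                self.mean[g] += (value - self.mean[g]) / self.n_cells


def add_dataset(model: RunningGeneMean, cells: Sequence[Cell], batch_size: int) -> None:
    """Process a newly arrived data set in mini-batches."""
    for batch in mini_batches(cells, batch_size):
        model.update(batch)


# Toy data (made up): 2 genes, 3 cells in the first data set, 1 in the second.
first_dataset = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
second_dataset = [[7.0, 6.0]]

model = RunningGeneMean(n_genes=2)
add_dataset(model, first_dataset, batch_size=2)
# The second data set arrives later; first_dataset is not reprocessed.
add_dataset(model, second_dataset, batch_size=2)
```

In this toy version both data sets sit in memory as lists. In practice each mini-batch would be read from disk as needed, so only one batch has to be held in memory at a time.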
“This is crucial as data sets with millions of cells are increasingly being generated,” Welch says. “This year there have been five to six articles with two million cells or more, and the amount of memory you need just to store the raw data is significantly more than anyone has on their computer.”
Welch compares the online technique to the continuous processing of data performed by social media platforms like Facebook and Twitter, which must process data continuously generated by users and deliver relevant posts to people’s feeds. “Here, instead of people writing tweets, we have labs around the world doing experiments and publishing their data.”
This discovery has the potential to dramatically improve the efficiency of other ambitious projects such as the Human Body Map and the Human Cell Atlas. According to Welch, “Understanding the normal complement of cells in the body is the first step towards understanding how they go wrong in disease.”
Story source:
Materials provided by Michigan Medicine – University of Michigan. Original written by Kelly Malcom. Note: Content may be edited for style and length.