AI to help researchers see the bigger picture in cell biology | MIT News

Studying gene expression in a cancer patient’s cells may help clinical biologists understand the cancer’s origin and predict the success of various treatments. But cells are complex and contain many layers, so how the biologist conducts measurements affects which data they will obtain. For example, measuring proteins in a cell could yield different information about the effects of cancer than measuring gene expression or cell morphology.

Where within the cell the data comes from matters. But to capture complete information about the state of the cell, scientists often must conduct many measurements using different techniques and analyze them one by one. Machine-learning methods can speed up the process, but existing methods lump all the data from each measurement modality together, making it difficult to work out which data came from which part of the cell.

To overcome this problem, researchers at the Broad Institute of MIT and Harvard and ETH Zurich/Paul Scherrer Institute (PSI) developed an artificial intelligence-driven framework that learns which information about a cell’s state is shared across different measurement modalities and which information is unique to a particular measurement type.

By pinpointing which information came from which cell parts, the approach provides a more holistic view of the cell’s state, making it easier for a biologist to see the whole picture of cellular interactions. This could help scientists understand disease mechanisms and track the progression of cancer, neurodegenerative disorders such as Alzheimer’s, and metabolic diseases like diabetes.

“When we study cells, one measurement is usually not sufficient, so scientists develop new technologies to measure different aspects of cells. While we have many ways of looking at a cell, at the end of the day we only have one underlying cell state. By putting the information from all these measurement modalities together in a smarter way, we can have a fuller picture of the state of the cell,” says lead author Xinyi Zhang SM ’22, PhD ’25, a former graduate student in the MIT Department of Electrical Engineering and Computer Science (EECS) and an affiliate of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, who is now a group leader at AITHYRA in Vienna, Austria.

Zhang is joined on a paper about the work by G.V. Shivashankar, a professor in the Department of Health Sciences and Technology at ETH Zurich and head of the Laboratory of Multiscale Bioimaging at PSI; and senior author Caroline Uhler, a professor in EECS and the Institute for Data, Systems, and Society (IDSS) at MIT, a member of MIT’s Laboratory for Information and Decision Systems (LIDS), and director of the Eric and Wendy Schmidt Center at the Broad Institute. The research appears today in Nature Computational Science.

Manipulating multiple measurements

There are many tools scientists can use to capture information about a cell’s state. For example, they can measure RNA to see if the cell is growing, or they can measure chromatin morphology to see if the cell is responding to external physical or chemical signals.

“When scientists perform multimodal analysis, they gather information using multiple measurement modalities and integrate it to better understand the underlying state of the cell. Some information is captured by one modality only, while other information is shared across modalities. To fully understand what is going on inside the cell, it is important to know where the information came from,” says Shivashankar.

Often, for scientists, the only way to sort this out is to conduct multiple individual experiments and compare the results. This slow and cumbersome process limits the amount of information they can gather.

In the new work, the researchers built a machine-learning framework that specifically understands which information overlaps between different modalities, and which information is unique to a particular modality and not captured by others.

“As a user, you can simply input your cell data and it automatically tells you which data are shared and which data are modality-specific,” Zhang says.

To build this framework, the researchers rethought the typical way machine-learning models are designed to capture and interpret multimodal cellular measurements.

Often these methods, known as autoencoders, have one model for each measurement modality, and each model encodes a separate representation of the data captured by that modality. The representation is a compressed version of the input data that discards any irrelevant details.

The MIT method has a shared representation space where data that overlap between multiple modalities are encoded, as well as separate spaces where unique data from each modality are encoded.

In essence, one can think of it like a Venn diagram of cellular data.

The researchers also used a special, two-step training procedure that helps their model handle the complexity involved in deciding which data are shared across multiple data modalities. After training, the model can identify which data are shared and which are unique when fed cell data it has never seen before.
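To make the shared/specific split concrete, here is a minimal, illustrative sketch in PyTorch of a two-modality autoencoder whose latent space is divided into a block shared across modalities and a block specific to each modality. The class name, layer sizes, and the simple alignment loss are assumptions made for illustration only; they are not the authors’ actual architecture or two-step training procedure.

```python
import torch
import torch.nn as nn

class SharedSpecificAutoencoder(nn.Module):
    """Toy two-modality autoencoder: each modality has its own encoder,
    whose output is split into a block shared across modalities and a
    block specific to that modality (the 'Venn diagram' idea)."""

    def __init__(self, dim_a, dim_b, shared_dim=16, specific_dim=8):
        super().__init__()
        latent = shared_dim + specific_dim
        self.shared_dim = shared_dim
        self.enc_a = nn.Sequential(nn.Linear(dim_a, 128), nn.ReLU(), nn.Linear(128, latent))
        self.enc_b = nn.Sequential(nn.Linear(dim_b, 128), nn.ReLU(), nn.Linear(128, latent))
        # Each decoder reconstructs its modality from the shared block plus its own specific block.
        self.dec_a = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, dim_a))
        self.dec_b = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, dim_b))

    def forward(self, x_a, x_b):
        z_a, z_b = self.enc_a(x_a), self.enc_b(x_b)
        shared_a, spec_a = z_a[:, :self.shared_dim], z_a[:, self.shared_dim:]
        shared_b, spec_b = z_b[:, :self.shared_dim], z_b[:, self.shared_dim:]
        rec_a = self.dec_a(torch.cat([shared_a, spec_a], dim=1))
        rec_b = self.dec_b(torch.cat([shared_b, spec_b], dim=1))
        return rec_a, rec_b, shared_a, shared_b

def loss_fn(x_a, x_b, rec_a, rec_b, shared_a, shared_b, align_weight=1.0):
    # Reconstruction terms keep modality-specific information; the alignment
    # term pulls the shared blocks of the two modalities toward each other.
    mse = nn.functional.mse_loss
    return mse(rec_a, x_a) + mse(rec_b, x_b) + align_weight * mse(shared_a, shared_b)
```

In this simplified setup, the reconstruction terms preserve what is unique to each modality, while the alignment term encourages the shared blocks to encode only information both modalities see, one basic way to obtain a Venn-diagram-like decomposition.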

Distinguishing data

In tests on synthetic datasets, the framework correctly captured known shared and modality-specific information. When they applied their method to real-world single-cell datasets, it comprehensively and automatically distinguished between gene activity captured jointly by two measurement modalities, such as transcriptomics and chromatin accessibility, while also correctly identifying which information came from only one of those modalities.

In addition, the researchers used their method to identify which measurement modality captured a certain protein marker that indicates DNA damage in cancer patients. Knowing where this information came from would help a clinical scientist determine which technique they should use to measure that marker.

“There are too many modalities in a cell and we can’t possibly measure all of them, so we need a prediction tool. But then the question is: Which modalities should we measure and which modalities should we predict? Our method can answer that question,” Uhler says.
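Continuing the hypothetical sketch above (not the authors’ actual pipeline), cross-modality prediction could look roughly like this: a cell measured only with modality A is encoded, its shared block is kept, and the missing modality-B-specific block is filled with a placeholder before decoding.

```python
# Hypothetical usage: predict modality B for new cells measured only with modality A.
model.eval()
with torch.no_grad():
    z_a = model.enc_a(x_a_new)                # x_a_new: new cells, modality A only (assumed tensor)
    shared = z_a[:, :model.shared_dim]        # the part of the cell state both modalities capture
    # No modality-B-specific information is available, so fill that block with zeros;
    # only the shared portion of the cell state transfers across modalities.
    filler = torch.zeros(shared.shape[0], z_a.shape[1] - model.shared_dim)
    x_b_pred = model.dec_b(torch.cat([shared, filler], dim=1))
```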

In the future, the researchers want to enable the model to offer more interpretable information about the state of the cell. They also want to conduct additional experiments to ensure it correctly disentangles cellular information, and to apply the model to a wider range of clinical questions.

“It is not sufficient to just integrate the information from all these modalities,” Uhler says. “We can learn a lot about the state of a cell if we carefully compare the different modalities to understand how different components of cells regulate one another.”

This research is funded, in part, by the Eric and Wendy Schmidt Center at the Broad Institute, the Swiss National Science Foundation, the U.S. National Institutes of Health, the U.S. Office of Naval Research, AstraZeneca, the MIT-IBM Watson AI Lab, the MIT J-Clinic for Machine Learning and Health, and a Simons Investigator Award.
