A causal theory for studying the cause-and-effect relationships of genes | MIT News

By studying changes in gene expression, researchers learn how cells function at a molecular level, which could help them understand the development of certain diseases.

But a human has about 20,000 genes that may affect one another in complex ways, so even knowing which groups of genes to focus on is an enormously complicated problem. Also, genes work together in modules that regulate one another.

MIT researchers have now developed theoretical foundations for methods that could identify the best way to aggregate genes into related groups so that they can efficiently learn the underlying cause-and-effect relationships between many genes.

Importantly, this new method accomplishes this using only observational data. This means researchers don’t need to perform costly, and sometimes infeasible, interventional experiments to obtain the data needed to infer the underlying causal relationships.

In the long run, this technique could help scientists identify potential gene targets to induce certain behavior in a more accurate and efficient manner, potentially enabling them to develop precise treatments for patients.

“In genomics, it is very important to understand the mechanism underlying cell states. But cells have a multiscale structure, so the level of summarization is very important, too. If you figure out the right way to aggregate the observed data, the information you learn about the system should be more interpretable and useful,” says graduate student Jiaqi Zhang, an Eric and Wendy Schmidt Center Fellow and co-lead author of a paper on this technique.

Zhang is joined on the paper by co-lead author Ryan Welch, currently a master’s student in engineering; and senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS) who is also director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS). The research will be presented at the Conference on Neural Information Processing Systems.

Learning from observational data

The problem the researchers set out to tackle involves learning programs of genes. These programs describe which genes function together to regulate other genes in a biological process, such as cell development or differentiation.

Since scientists can’t efficiently study how all 20,000 genes interact, they use a technique called causal disentanglement to learn how to combine related groups of genes into a representation that allows them to efficiently explore cause-and-effect relationships.

In previous work, the researchers demonstrated how this could be done effectively in the presence of interventional data, which are data obtained by perturbing variables in the network.

But it is often expensive to conduct interventional experiments, and there are some scenarios where such experiments are either unethical or the technology is not good enough for the intervention to succeed.

With only observational data, researchers can’t compare genes before and after an intervention to find out how groups of genes function together.

“Most research in causal disentanglement assumes access to interventions, so it was unclear how much information you can disentangle with just observational data,” Zhang says.

The MIT researchers developed a more general approach that uses a machine-learning algorithm to effectively identify and aggregate groups of observed variables, e.g., genes, using only observational data.

They can use this technique to identify causal modules and reconstruct an accurate underlying representation of the cause-and-effect mechanism. “While this research was motivated by the problem of elucidating cellular programs, we first had to develop novel causal theory to understand what could and could not be learned from observational data. With this theory in hand, in future work we can apply our understanding to genetic data and identify gene modules as well as their regulatory relationships,” Uhler says.

A layerwise representation

Using statistical techniques, the researchers can compute a mathematical function known as the variance for the Jacobian of each variable’s score. Causal variables that don’t affect any subsequent variables should have a variance of zero.

The researchers reconstruct the representation in a layer-by-layer structure, starting by removing the variables in the bottom layer that have a variance of zero. Then they work backward, layer-by-layer, removing the variables with zero variance to determine which variables, or groups of genes, are connected.
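As a rough illustration of the zero-variance idea described above, the following Python sketch peels variables off a small toy model one at a time. It is not the researchers' algorithm: the three-variable chain, the helper names (`PARENT_FN`, `sample`, `log_density`, `score_jacobian_diag`, `peel_leaves`), and the assumption that the log-density is known in closed form are all hypothetical choices made for this example. In practice the score would have to be estimated from data, and the variables of interest would be aggregated groups rather than directly observed ones.

```python
import numpy as np

# Hypothetical toy model (not from the article): x0 -> x1 -> x2, where each
# variable is a function of its parent plus unit-variance Gaussian noise.
PARENT_FN = {
    0: lambda x: np.zeros(len(x)),   # x0 has no parent
    1: lambda x: x[:, 0] ** 2,       # x1 = x0^2 + noise
    2: lambda x: np.sin(x[:, 1]),    # x2 = sin(x1) + noise
}

def sample(n, seed=0):
    """Draw n observational samples from the toy model."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, 3))
    for j in range(3):
        x[:, j] = PARENT_FN[j](x) + rng.standard_normal(n)
    return x

def log_density(x, active):
    """Log-density (up to a constant) of the variables in `active`.

    Assumption: `active` always contains the parents of its members, which
    holds here because only childless variables are ever removed.
    """
    return sum(-0.5 * (x[:, j] - PARENT_FN[j](x)) ** 2 for j in active)

def score_jacobian_diag(x, j, active, h=1e-2):
    """Second derivative of the log-density with respect to x_j at each sample
    (the j-th diagonal entry of the score's Jacobian), by central differences."""
    step = np.zeros(x.shape[1])
    step[j] = h
    return (log_density(x + step, active)
            - 2.0 * log_density(x, active)
            + log_density(x - step, active)) / h ** 2

def peel_leaves(x):
    """Repeatedly remove the variable whose Jacobian entry has the smallest
    (numerically zero) variance across samples, i.e. the current leaf."""
    active = list(range(x.shape[1]))
    order = []
    while active:
        variances = {j: np.var(score_jacobian_diag(x, j, active)) for j in active}
        leaf = min(variances, key=variances.get)
        order.append(leaf)
        active.remove(leaf)
    return order

order = peel_leaves(sample(2000))
print(order)  # variables listed from the bottom layer upward
```

In this toy chain the variable with no children contributes a purely quadratic term to the log-density, so its diagonal Jacobian entry is constant across samples, while variables with children pick up sample-dependent terms. The hard part that the sketch sidesteps is doing this without a known density and for groups of variables rather than single ones.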

“Identifying the variances that are zero quickly becomes a combinatorial objective that is pretty hard to solve, so deriving an efficient algorithm that could solve it was a major challenge,” Zhang says.

In the end, their method outputs an abstracted representation of the observed data with layers of interconnected variables that accurately summarizes the underlying cause-and-effect structure.

Each variable represents an aggregated group of genes that function together, and the relationship between two variables represents how one group of genes regulates another. Their method effectively captures all the information used in determining each layer of variables.

After proving that their technique was theoretically sound, the researchers conducted simulations to show that the algorithm can efficiently disentangle meaningful causal representations using only observational data.

In the future, the researchers want to apply this technique in real-world genetics applications. They also want to explore how their method could provide additional insights in situations where some interventional data are available, or help scientists understand how to design effective genetic interventions. Ultimately, this method could help researchers more efficiently determine which genes function together in the same program, which could help identify drugs that could target those genes to treat certain diseases.

This research is funded, in part, by the MIT-IBM Watson AI Lab and the U.S. Office of Naval Research.
