New data science tool greatly accelerates molecular analysis of the environment

A research team led by scientists at the University of California, Riverside, has developed a computational workflow for analyzing large data sets in the field of metabolomics, the study of small molecules found in cells, biofluids, tissues, and entire ecosystems.

Most recently, the team applied this new computational tool to study pollutants in seawater in Southern California. The team quickly captured the chemical profiles of coastal environments and highlighted potential sources of pollution.

“We’re interested in understanding how such pollutants get introduced into the ecosystem,” said Daniel Petras, an assistant professor of biochemistry at UC Riverside, who led the research team. “Determining which molecules in the ocean are important for environmental health isn’t straightforward because of the ocean’s sheer chemical diversity. The protocol we developed greatly accelerates this process. More efficient sorting of the data means we can understand problems related to ocean pollution faster.”

Petras and his colleagues report in the journal Nature Protocols that their protocol is designed not only for experienced researchers but also for educational purposes, making it an ideal resource for students and early-career scientists. The computational workflow is accompanied by a web application with a graphical user interface that makes metabolomics data analysis accessible to non-experts and enables them to gain statistical insights into their data within minutes.

“This tool is accessible to a broad range of researchers, from absolute beginners to experts, and is tailored for use with the molecular networking software my group is developing,” said coauthor Mingxun Wang, an assistant professor of computer science and engineering at UCR. “For beginners, the guidelines and code we provide make it easier to understand common data processing and analysis steps. For experts, it accelerates reproducible data analysis, enabling them to share their statistical data analysis workflows and results.”

Petras explained that the research paper is unique, serving as a large educational resource organized through a virtual research group called Virtual Multiomics Lab, or VMOL. With more than 50 scientists participating from around the world, VMOL is a community-driven, open-access initiative. It aims to simplify and democratize the chemical analysis process, making it accessible to researchers worldwide, regardless of their background or resources.

“I’m incredibly proud to see how this project evolved into something impactful, involving experts and students from across the globe,” said Abzer Pakkir Shah, a doctoral student in Petras’ group and the first author of the paper. “By removing physical and economic barriers, VMOL provides training in computational mass spectrometry and data science and aims to launch virtual research projects as a new form of collaborative science.”

All software the team developed is free and publicly available. The software development was initiated during a summer school for non-targeted metabolomics in 2022 at the University of Tübingen, where the team also launched VMOL.

Petras expects the protocol will be especially useful to environmental researchers, as well as to scientists working in the biomedical field and researchers doing clinical studies in microbiome science.

“The flexibility of our protocol extends to a wide range of fields and sample types, including combinatorial chemistry, doping analysis, and trace contamination of food, pharmaceuticals, and other industrial products,” he said.

Petras received his master’s degree in biotechnology from the University of Applied Science Darmstadt and his doctoral degree in biochemistry from the Technical University Berlin. He did postdoctoral research at UC San Diego, where he focused on the development of large-scale environmental metabolomics methods. In 2021, he launched the Functional Metabolomics Lab at the University of Tübingen. In January 2024, he joined UCR, where his lab focuses on the development and application of mass spectrometry-based methods to visualize and assess chemical exchange within microbial communities.