Characterized by weakened or damaged heart musculature, heart failure results in the gradual buildup of fluid in a patient’s lungs, legs, feet, and other parts of the body. The condition is chronic and incurable, often leading to arrhythmias or sudden cardiac arrest. For many centuries, bloodletting and leeches were the treatment of choice, famously practiced by barber surgeons in Europe, during a time when physicians rarely operated on patients.
In the 21st century, the management of heart failure has become decidedly less medieval: Today, patients undergo a combination of healthy lifestyle changes and prescription medications, and sometimes use pacemakers. Yet heart failure remains one of the leading causes of morbidity and mortality, placing a substantial burden on health-care systems across the globe.
“About half of the people diagnosed with heart failure will die within five years of diagnosis,” says Teya Bergamaschi, an MIT PhD student in the lab of Nina T. and Robert H. Rubin Professor Collin Stultz and the co-first author of a new paper introducing a deep learning model for predicting heart failure. “Understanding how a patient will fare after hospitalization is really important in allocating finite resources.”
The paper, published in Lancet eClinical Medicine by a team of researchers at MIT, Mass General Brigham, and Harvard Medical School, shares results from developing and testing PULSE-HF, which stands loosely for “Predict changes in left ventricULar Systolic function from ECGs of patients who have Heart Failure.” The project was conducted in Stultz’s lab, which is affiliated with the MIT Abdul Latif Jameel Clinic for Machine Learning in Health. Developed and retrospectively tested across three different patient cohorts from Massachusetts General Hospital, Brigham and Women’s Hospital, and MIMIC-IV (a publicly available dataset), the deep learning model accurately predicts changes in the left ventricular ejection fraction (LVEF), which is the percentage of blood being pumped out of the left ventricle of the heart.
A healthy human heart pumps out about 50 to 70 percent of blood from the left ventricle with each beat; anything less is considered a sign of a potential problem. “The model takes an [electrocardiogram] and outputs a prediction of whether or not there will be an ejection fraction within the next year that falls below 40 percent,” says Tiffany Yau, an MIT PhD student in Stultz’s lab who is also co-first author of the PULSE-HF paper. “That is the most severe subgroup of heart failure.”
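To make the prediction target concrete, here is a minimal Python sketch of how such a yes-or-no label could be derived from follow-up echocardiograms. The article does not describe how the team actually built its labels, so everything here is an assumption made for illustration: the function name `lvef_label`, the 365-day window, and the choice to return `None` when no echocardiogram falls inside the window.

```python
from datetime import date, timedelta

def lvef_label(ecg_date, echoes, horizon_days=365, threshold=40.0):
    """Hypothetical label: did any LVEF reading fall below `threshold`
    within `horizon_days` after the ECG?

    `echoes` is a list of (echo_date, lvef_percent) pairs.
    Returns True or False, or None when no echocardiogram falls inside
    the window (the label is simply unavailable for that ECG).
    """
    window_end = ecg_date + timedelta(days=horizon_days)
    # Keep only measurements taken after the ECG and inside the window.
    in_window = [lvef for d, lvef in echoes if ecg_date < d <= window_end]
    if not in_window:
        return None
    return any(lvef < threshold for lvef in in_window)

if __name__ == "__main__":
    ecg = date(2020, 1, 1)
    print(lvef_label(ecg, [(date(2020, 6, 1), 35.0)]))  # reading below 40 in window
    print(lvef_label(ecg, [(date(2020, 6, 1), 55.0)]))  # reading stays above 40
    print(lvef_label(ecg, [(date(2021, 6, 1), 30.0)]))  # no echo inside the window
```

Returning `None` rather than `False` for a missing follow-up keeps "no decline observed" separate from "no measurement available," a distinction that matters when training labels are incomplete.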
If PULSE-HF predicts that a patient’s ejection fraction is likely to worsen within a year, the clinician can prioritize the patient for follow-up. In turn, lower-risk patients can reduce their number of hospital visits and the amount of time spent getting 10 electrodes adhered to their body for a 12-lead ECG. The model can also be deployed in low-resource clinical settings, including doctors’ offices in rural areas that don’t typically have a cardiac sonographer employed to run ultrasounds every day.
“The biggest thing that distinguishes [PULSE-HF] from other heart failure ECG methods is instead of detection, it does forecasting,” says Yau. The paper notes that to date, no other methods exist for predicting future LVEF decline among patients with heart failure.
During the testing and validation process, the researchers used a metric known as “area under the receiver operating characteristic curve” (AUROC) to measure PULSE-HF’s performance. AUROC is typically used to measure a model’s ability to discriminate between classes on a scale from 0 to 1, with 0.5 being random and 1 being perfect. PULSE-HF achieved AUROCs ranging from 0.87 to 0.91 across all three patient cohorts.
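For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen patient who did decline receives a higher risk score than a randomly chosen patient who did not, with ties counted as half. The short Python sketch below computes it directly from that definition. It is an illustration only, not the evaluation code used in the paper, and the scores in the example are made up.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positives and negatives.

    `labels` holds 1 (positive) or 0 (negative); `scores` holds the
    model's risk scores. Runs in O(P * N) time, which is fine for a
    small illustration but slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Made-up scores: three of the four positive/negative pairs are ranked correctly.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A model that assigns every patient the same score lands at 0.5 under this definition, which is why 0.5 is treated as the chance baseline.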
Notably, the researchers also built a version of PULSE-HF for single-lead ECGs, meaning only one electrode needs to be placed on the body. While 12-lead ECGs are generally considered superior for being more comprehensive and accurate, the performance of the single-lead version of PULSE-HF was just as strong as that of the 12-lead version.
As with most clinical AI research, the elegant simplicity of the idea behind PULSE-HF belies a laborious execution. “It’s taken years [to complete this project],” Bergamaschi recalls. “It’s gone through many iterations.”
One of the team’s biggest challenges was collecting, processing, and cleaning the ECG and echocardiogram datasets. While the model aims to forecast a patient’s ejection fraction, the labels for the training data weren’t always available. Much as an answer key helps a student learning from a textbook, labels are critical for helping machine-learning models correctly identify patterns in data.
Clean, linear text in the form of TXT files typically works best when training models. But echocardiogram files typically come in the form of PDFs, and when PDFs are converted to TXT files, the text (which gets broken up by line breaks and formatting) becomes difficult for the model to read. The unpredictable nature of real-life scenarios, like a restless patient or a loose lead, also marred the data. “There are a lot of signal artifacts that need to be cleaned,” Bergamaschi says. “It’s kind of a never-ending rabbit hole.”
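As a rough illustration of the text problem, the Python sketch below rejoins lines that a PDF-to-text conversion has split apart. It is not the team's pipeline, which the article does not describe. The function name `reflow` and the sample report text are invented, and real echocardiogram reports would likely need format-specific handling well beyond this.

```python
import re

def reflow(raw):
    """Turn hard-wrapped, PDF-extracted text into clean paragraphs.

    Assumes blank lines separate paragraphs. The de-hyphenation step is
    a heuristic: it will also merge genuine compounds that happen to
    break at a line end (for example "left-\\nsided").
    """
    # Rejoin words hyphenated across a line break: "ejec-\ntion" -> "ejection".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Split on blank lines, then collapse all whitespace inside each paragraph.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)

if __name__ == "__main__":
    sample = "Left ventricular ejec-\ntion fraction:\n  35 %\n\nNormal wall\nmotion."
    print(reflow(sample))
```

Even a simple pass like this makes it easier to pull a value such as the ejection fraction out of a report, though each added heuristic brings its own edge cases.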
While Bergamaschi and Yau acknowledge that more complicated methods could help filter the data for better signals, there is a limit to the usefulness of those approaches. “At what point do you stop?” Yau asks. “You have to think about the use case: is it easiest to have this model that works on data that’s slightly messy? Because it probably will be.”
The researchers anticipate that the next step for PULSE-HF will be testing the model in a prospective study on real patients, whose future ejection fraction is unknown.
Despite the challenges inherent in bringing clinical AI tools like PULSE-HF over the finish line, including the possible risk of prolonging a PhD by another year, the students feel that the years of hard work were worthwhile.
“I think things are rewarding partially because they’re challenging,” Bergamaschi says. “A friend said to me, ‘If you think you will find your calling after graduation, if your calling is truly calling, it will be there in the one additional year it takes you to graduate.’ … The way we’re measured as researchers in [the ML and health] space is different from other researchers in ML space. Everyone in this community understands the unique challenges that exist here.”
“There’s too much suffering in the world,” says Yau, who joined Stultz’s lab after a health event made her realize the importance of machine learning in health care. “Anything that tries to ease suffering is something that I would consider a valuable use of my time.”

