Machine learning (ML) programs computers to learn the way we do: through continuous evaluation of data and identification of patterns based on past results. ML can quickly spot trends in large data sets, operate with little or no human intervention, and improve its predictions over time. Thanks to these capabilities, it is rapidly finding a place in medical research.
People with breast cancer could soon be diagnosed by ML faster than by biopsy. ML could also help paralyzed people regain autonomy through prosthetics controlled by models that detect patterns in brain scan data. ML research promises these and other possibilities for helping people lead healthier lives. But even as the number of ML studies grows, its actual use in doctors' offices has not kept pace.
The limitations lie in the small sample sizes and idiosyncratic datasets typical of medical research. Small data keeps machines from identifying meaningful patterns: the more data, the more accurate ML's diagnoses and predictions become. Many diagnostic applications would require thousands of subjects, but most studies enroll far fewer, often only dozens.
But there are ways to make a small data set appear to yield meaningful results if you know how to manipulate the numbers. Running statistical tests over and over on different subsets of the data can make a pattern look significant when, in reality, it may be nothing more than random noise.
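To see how this goes wrong, consider a minimal sketch, purely illustrative and not drawn from any study mentioned here. The data below are pure random noise, yet re-running a standard significance test on enough different subsets eventually produces a p-value under the conventional 0.05 threshold.

```python
# Illustrative sketch: repeated testing on subsets of pure noise can
# produce a "significant" result. Sample sizes here are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 40 "subjects" with a measurement and a group label, both random noise:
# there is no real effect to find.
measurements = rng.normal(size=40)
labels = rng.integers(0, 2, size=40)

best_p = 1.0
for trial in range(100):
    # Re-test on a different random subset each time (the p-hacking step).
    subset = rng.choice(40, size=20, replace=False)
    group_a = measurements[subset][labels[subset] == 0]
    group_b = measurements[subset][labels[subset] == 1]
    if len(group_a) > 1 and len(group_b) > 1:
        _, p = stats.ttest_ind(group_a, group_b)
        best_p = min(best_p, p)

# With enough retries, the smallest p-value tends to dip below 0.05
# by chance alone.
print(f"smallest p-value after 100 subset tests: {best_p:.4f}")
```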
This tactic, known as p-hacking, or feature hacking in ML, produces predictive models that are too narrow to be useful in the real world. What looks good on paper does not translate into a doctor’s ability to diagnose or treat us. These statistical errors, often made unknowingly, can lead to dangerous conclusions.
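The ML version of the error is just as easy to reproduce. The sketch below is an illustrative assumption, not any researcher's actual analysis: selecting "predictive" features on an entire noise dataset before cross-validating the model leaks information and makes a useless classifier look accurate, while doing the selection inside each fold reveals chance-level performance.

```python
# Illustrative sketch of "feature hacking" on pure noise data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2000))   # 50 subjects, 2000 random "features"
y = rng.integers(0, 2, size=50)   # random labels: true accuracy ~50%

# Wrong: pick the 10 features most correlated with y using ALL the data,
# then cross-validate only the classifier.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(), X_leaky, y, cv=5).mean()

# Right: do the feature selection inside each cross-validation fold.
honest_pipe = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
honest = cross_val_score(honest_pipe, X, y, cv=5).mean()

print(f"leaky accuracy:  {leaky:.2f}")   # typically well above chance
print(f"honest accuracy: {honest:.2f}")  # hovers around 0.5
```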
To help scientists avoid these mistakes and advance ML applications, Konrad Kording, a Penn Integrates Knowledge University Professor with appointments in the Department of Neuroscience at the Perelman School of Medicine and the Departments of Bioengineering and Computer and Information Science in the School of Engineering and Applied Science, leads one part of a large NIH-funded program known as CENTER (Creating an Educational Nexus for Training in Experimental Rigor). Kording will lead the Penn cohort in creating the Community for Rigor, which will provide open-access resources on conducting rigorous scientific research. Members of this inclusive scientific community will be able to take part in ML simulations and discussion-based courses.
“The absence of ML in real-world scenarios is due to poor use of statistics rather than the limitations of the tool itself,” says Kording. “If a study publishes a claim that seems too good to be true, it usually is, and we can often trace that back to the use of statistics.”
To make significant progress in the field of ML in biomedical research, it will be necessary to raise awareness of these issues, help researchers understand how to identify and mitigate them, and create a stronger culture around scientific rigor within the research community.
Kording’s point is that just because integrating machine learning into biomedical research can introduce bias does not mean scientists should avoid it; they simply need to understand how to use it meaningfully.
The Community for Rigor aims to address these challenges through specific projects, including a module on machine learning in biomedical research that will walk participants through datasets and statistical tests and pinpoint exactly where biases are commonly introduced.
This story is by Melissa Pappas. Read more on Penn Engineering Today.