Eight quick tips for biologically and medically informed machine learning

Machine learning has become a powerful tool for computational analysis in the biomedical sciences, with its effectiveness significantly enhanced by integrating domain-specific knowledge. This integration has give rise to informed machine learning, in contrast…
Octavio Hahn · 4 days ago · 3 minutes read


## Biologically and Medically Informed Machine Learning: A Guide to Best Practices**Introduction**Machine learning has become a powerful tool in computational biology and medicine, but it also faces challenges. Uninformed machine learning approaches treat all variables equally, which can lead to errors. Informed machine learning, on the other hand, integrates domain knowledge into the learning process, enhancing its accuracy, explainability, and reliability.### Understanding Informed Machine Learning**Types of Modeling Approaches:*** Full-knowledge models rely on domain expertise and incorporate little data into the model.* Zero-knowledge models utilize large datasets and make predictions without prior domain knowledge.* Partial-knowledge models (informed machine learning) combine domain knowledge with data-driven approaches, enhancing the strengths of both.### Quick Tips for Informed Machine Learning**1. Recognize the Limitations of Domain Knowledge*** Domain knowledge used to inform machine learning models can be limited or outdated.* Communicate any limitations of the knowledge used to avoid misleading results.**2. Explore Different Approaches to Informed Machine Learning*** Pre-processing: Incorporate domain knowledge into data preparation and selection.* In-processing: Embed domain knowledge directly into the machine learning model's learning algorithm.* Post-processing: Refine the model's output to align with domain knowledge and expectations.**3. Use Pre-processing Appropriately*** Pre-processing modifies input features, data, and/or selects specific machine learning algorithms based on prior domain knowledge.* This can improve the performance and interpretability of the machine learning model.**4. Use In-processing Appropriately*** In-processing integrates domain laws and principles into the learning process to ensure adherence to known scientific knowledge.* This can result in more interpretable and reliable models.**5. Use Post-processing Appropriately*** Post-processing aligns the output of a machine learning model with domain knowledge and practical considerations.* This can ensure the model's decisions are consistent and make sense in the context of the problem space.**6. Involve a Biomedical Expert*** Collaborate with a biologist or medical doctor throughout the machine learning project.* Their insights will ensure the accuracy and relevance of the scientific question and the interpretation of results.**7. Evaluate and Compare Different Approaches*** Compare the performance of a machine learning model using the following approaches: domain knowledge-only, uninformed machine learning, and informed machine learning.* This can identify the best approach for the specific problem.**8. Follow Open Science Best Practices*** Promote open source software code, open data release, and open access publication.* This ensures reproducibility, collaboration, and a broader impact of the scientific study.### ConclusionInformed machine learning offers significant advantages over uninformed approaches, but careful consideration should be given to the recommendations outlined in this guide to avoid common pitfalls. By adhering to these best practices, researchers can generate more reliable and accurate results in biomedical studies using machine learning.