Motivation

Would it not be great to have a virtual replica of ourselves to explore our interaction with the real world in real time? A living, digital representation of ourselves that integrates machine learning and multiscale modeling to continuously learn and dynamically update itself as our environment changes in real life? A virtual mirror of ourselves that allows us to simulate our personal medical history and health condition using data-driven analytical algorithms and theory-driven physical knowledge? These are the objectives of the Digital Twin.1 In health care, a Digital Twin would allow us to improve health, sports, and education by integrating population data with personalized data, all adjusted in real time, on the basis of continuously recorded health and lifestyle parameters from various sources.2,3,4 But, realistically, how long will it take before we have a Digital Twin by our side? Can we leverage our knowledge of machine learning and multiscale modeling in the biological, biomedical, and behavioral sciences to accelerate developments towards a Digital Twin? Do we already have digital organ models that we could integrate into a full Digital Twin? And what are the challenges, open questions, opportunities, and limitations? Where do we even begin? Fortunately, we do not have to start entirely from scratch. Over the past two decades, multiscale modeling has emerged as a promising tool to build individual organ models by systematically integrating knowledge from the tissue, cellular, and molecular levels, in part fueled by initiatives like the United States Federal Interagency Modeling and Analysis Group IMAG5. Depending on the scale of interest, multiscale modeling approaches fall into two categories: ordinary differential equation-based and partial differential equation-based approaches. Within both categories, we can distinguish data-driven and theory-driven machine learning approaches. Here we discuss these four approaches towards developing a Digital Twin.

Ordinary differential equations characterize the temporal evolution of biological systems

Ordinary differential equations are widely used to simulate the integral response of a system during development, disease, environmental changes, or pharmaceutical interventions. Systems of ordinary differential equations allow us to explore the dynamic interplay of key characteristic features to understand the sequence of events, the progression of disease, or the timeline of treatment. Applications range from the molecular, cellular, tissue, and organ levels all the way to the population level, and include immunology to correlate protein–protein interactions and immune response,6 microbiology to correlate growth rates and bacterial competition, metabolic networks to correlate genome and physiome,7,8 neuroscience to correlate protein misfolding to biomarkers of neurodegeneration,9 oncology to correlate perturbations to tumorigenesis,10 and epidemiology to correlate disease spread to public health. In essence, ordinary differential equations are a powerful tool to study the dynamics of biological, biomedical, and behavioral systems in an integral sense, irrespective of the regional distribution of the underlying features.
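To make this class of models concrete, the following minimal Python sketch integrates a classical SIR (susceptible-infected-recovered) system of ordinary differential equations; the model choice, rate constants, and initial condition are illustrative assumptions rather than calibrated values.

```python
# Minimal sketch: an SIR epidemiological model as a system of ordinary
# differential equations, integrated with SciPy. The rate constants and initial
# condition are illustrative, not calibrated to any dataset.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta=0.3, gamma=0.1):
    S, I, R = y
    dS = -beta * S * I                 # infection depletes susceptibles
    dI = beta * S * I - gamma * I      # new infections minus recoveries
    dR = gamma * I                     # recovered individuals accumulate
    return [dS, dI, dR]

sol = solve_ivp(sir, (0, 160), [0.99, 0.01, 0.0], t_eval=np.linspace(0, 160, 400))
print("peak infected fraction:", sol.y[1].max())
```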

Partial differential equations characterize the spatio-temporal evolution of biological systems

In contrast to ordinary differential equations, partial differential equations are typically used to study spatial patterns of inherently heterogeneous, regionally varying fields, for example, the flow of blood through the cardiovascular system11 or the elastodynamic contraction of the heart.12 Unavoidably, these equations are nonlinear and highly coupled, and we usually employ computational tools, for example, finite difference or finite element methods, to approximate their solution numerically. Finite element methods have a long history of success at combining ordinary differential equations and partial differential equations to pass knowledge across the scales.13 They are naturally tailored to represent the small-scale behavior locally through constitutive laws using ordinary differential equations and spatial derivatives, and to embed this knowledge globally into physics-based conservation laws using partial differential equations. Assuming we know the governing ordinary and partial differential equations, finite element models can predict the behavior of the system from given initial and boundary conditions measured at a few selected points. This approach is incredibly powerful, but it requires that we actually know the physics of the system, for example through the underlying kinematic equations or the balance of mass, momentum, or energy. Yet, to close the system of equations, we need constitutive equations that characterize the behavior of the system, which we need to calibrate either with experimental data or with data generated via multiscale modeling.
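As a minimal sketch of the numerical approximation described above, the snippet below solves a one-dimensional diffusion equation with an explicit finite-difference scheme; a realistic model would use finite elements on a patient-specific geometry, and the diffusivity, grid, and boundary conditions here are illustrative assumptions.

```python
# Minimal sketch: explicit finite-difference solution of the one-dimensional
# diffusion equation u_t = D u_xx with fixed (Dirichlet) boundary values.
# Diffusivity, grid, and time step are illustrative only.
import numpy as np

D, L, nx = 1.0, 1.0, 101                  # diffusivity, domain length, grid points
dx = L / (nx - 1)
dt = 0.4 * dx**2 / D                      # within the explicit stability limit dt <= dx^2 / (2D)
u = np.zeros(nx)
u[nx // 2] = 1.0                          # initial concentration spike at the center

for _ in range(2000):
    u[1:-1] += dt * D * (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2   # interior update
    u[0] = u[-1] = 0.0                    # Dirichlet boundary conditions

print("remaining total concentration:", u.sum() * dx)
```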

Multiscale modeling seeks to predict the behavior of biological, biomedical, and behavioral systems

Toward this goal, the main objective of multiscale modeling is to identify causality and establish causal relations between data. Our experience has taught us that most engineering materials display an elastic, viscoelastic, or elastoplastic constitutive behavior. However, biological and biomedical materials are often more complex, simply because they are alive.14 They continuously interact with and adapt to their environment and dynamically respond to biological, chemical, or mechanical cues.15 Unlike classical engineering materials, living matter has amazing abilities to generate force, actively contract, rearrange its architecture, and grow or shrink in size.16 To appropriately model these phenomena, we not only have to rethink the underlying kinetics, the balance of mass, and the laws of thermodynamics, but often have to include the biological, chemical, or electrical fields that act as stimuli of this living response.17 This is where multiphysics multiscale modeling becomes important:18,19 multiscale modeling allows us to thoroughly probe biologically relevant phenomena at a smaller scale and seamlessly embed the relevant mechanisms at the larger scale to predict the dynamics of the overall system.20 Importantly, rather than making phenomenological assumptions about the behavior at the larger scale, multiscale models postulate that the behavior at the larger scale emerges naturally from the collective action at the smaller scales. Yet, this attention to detail comes at a price. While multiscale models can provide unprecedented insight into mechanistic detail, they are not only expensive, but also introduce a large number of unknowns, both in the form of unknown physics and unknown parameters.21,22 Fortunately, with the increasing ability to record and store information, we now have access to massive amounts of biological and biomedical data that allow us to systematically discover details about these unknowns.

Machine learning seeks to infer the dynamics of biological, biomedical, and behavioral systems

Toward this goal, the main objective of machine learning is to identify correlations among big data. The focus in the biological, biomedical, and behavioral sciences is currently shifting from solving forward problems based on sparse data towards solving inverse problems to explain large datasets.23 Today, multiscale simulations in the biological, biomedical, and behavioral sciences seek to infer the behavior of the system, assuming that we have access to massive amounts of data, while the governing equations and their parameters are not precisely known.24,25,26 This is where machine learning becomes critical: machine learning allows us to systematically preprocess massive amounts of data, integrate and analyze them from different input modalities and different levels of fidelity, identify correlations, and infer the dynamics of the overall system. Similarly, we can use machine learning to quantify the agreement of correlations, for example by comparing computationally simulated and experimentally measured features across multiple scales using Bayesian inference and uncertainty quantification.27

Machine learning and multiscale modeling mutually complement one another

Where machine learning reveals correlation, multiscale modeling can probe whether the correlation is causal; where multiscale modeling identifies mechanisms, machine learning, coupled with Bayesian methods, can quantify uncertainty. This natural synergy presents exciting challenges and new opportunities in the biological, biomedical, and behavioral sciences.28 On a more fundamental level, there is a pressing need to develop the appropriate theories to integrate machine learning and multiscale modeling. For example, it seems intuitive to a priori build physics-based knowledge in the form of partial differential equations, boundary conditions, and constraints into a machine learning approach.22 Especially when the available data are limited, we can increase the robustness of machine learning by including physical constraints such as conservation, symmetry, or invariance. On a more translational level, there is a need to integrate data from different modalities to build predictive simulation tools of biological systems.29 For example, it seems reasonable to assume that experimental data from cell- and tissue-level experiments, animal models, and patient recordings are strongly correlated and obey similar physics-based laws, even if they do not originate from the same system. Naturally, while data and theory go hand in hand, some of the approaches to integrate information are more data driven, seeking to answer questions about the quality of the data, identify missing information, or supplement sparse training data,30,31 while others are more theory driven, seeking to answer questions about robustness and efficiency, analyze sensitivity, quantify uncertainty, and choose appropriate learning tools.

Figure 1 illustrates the integration of machine learning and multiscale modeling on the parameter level, by constraining parameter spaces, identifying parameter values, and analyzing their sensitivity, and on the system level, by exploiting the underlying physics, constraining design spaces, and identifying system dynamics. Machine learning provides the appropriate tools for supplementing training data, preventing overfitting, managing ill-posed problems, creating surrogate models, and quantifying uncertainty. Multiscale modeling integrates the underlying physics for identifying relevant features, exploring their interaction, elucidating mechanisms, bridging scales, and understanding the emergence of function. We have structured this review around four distinct but overlapping methodological areas: ordinary and partial differential equations, and data- and theory-driven machine learning. These four themes roughly map onto the four corners of the data-physics space, where the amount of available data increases from top to bottom and physical knowledge increases from left to right. For each area, we identify challenges, open questions, and opportunities, and highlight various examples from the life sciences. For convenience, we summarize the most important terms and technologies associated with machine learning, with examples from multiscale modeling, in Box 1. We envision that our article will spark discussion and inspire scientists in the fields of machine learning and multiscale modeling to join forces towards creating predictive tools that reliably and robustly predict the behavior of biological, biomedical, and behavioral systems for the benefit of human health.

Fig. 1: Machine learning and multiscale modeling in the biological, biomedical, and behavioral sciences.
figure 1

Machine learning and multiscale modeling interact on the parameter level via constraining parameter spaces, identifying parameter values, and analyzing sensitivity, and on the system level via exploiting the underlying physics, constraining design spaces, and identifying system dynamics. Machine learning provides the appropriate tools towards supplementing training data, preventing overfitting, managing ill-posed problems, creating surrogate models, and quantifying uncertainty, with the ultimate goal of exploring massive design spaces and identifying correlations. Multiscale modeling integrates the underlying physics towards identifying relevant features, exploring their interaction, elucidating mechanisms, bridging scales, and understanding the emergence of function, with the ultimate goal of predicting system dynamics and identifying causality.

Challenges

A major challenge in the biological, biomedical, and behavioral sciences is to understand systems for which the underlying data are incomplete and the physics are not yet fully understood. In other words, with a complete set of high-resolution data, we could apply machine learning to explore design spaces and identify correlations; with a validated and calibrated set of physics equations and material parameters, we could apply multiscale modeling to predict system dynamics and identify causality. By integrating machine learning and multiscale modeling we can leverage the potential of both, with the ultimate goal of providing quantitative predictive insight into biological systems. Figure 2 illustrates how we could integrate machine learning and multiscale modeling to better understand the cardiac system.

Fig. 2: Machine learning and multiscale modeling of the cardiac system.
figure 2

Multiscale modeling can teach machine learning how to exploit the underlying physics described by, e.g., the ordinary differential equations of cellular electrophysiology and the partial differential equations of electro-mechanical coupling, and constrain the design spaces; machine learning can teach multiscale modeling how to identify parameter values, e.g., the gating variables that govern local ion channel dynamics, and identify system dynamics, e.g., the anisotropic signal propagation that governs global diffusion. This natural synergy presents new challenges and opportunities in the biological, biomedical, and behavioral sciences.

Ordinary differential equations encode temporal evolution into machine learning

Ordinary differential equations in time are ubiquitous in the biological, biomedical, and behavioral sciences. This is largely because it is relatively easy to make observations and acquire data at the molecular, cellular, organ, or population scales without accounting for spatial heterogeneity, which is often more difficult to access. The descriptions typically range from single ordinary differential equations to large systems of ordinary differential equations or stochastic ordinary differential equations. Consequently, the number of parameters is large and can easily reach thousands or more.32,33 Given adequate data, the challenge begins with identifying the nonlinear, coupled driving terms.34 To analyze the data, we can apply formal methods of system identification, including classical regression and stepwise regression.24,26 These approaches are posed as nonlinear optimization problems that determine the set of coefficients multiplying combinations of algebraic and rate terms that results in the best fit to the observations. Given adequate data, system identification works with notable robustness and can learn a parsimonious set of coefficients, especially when using stepwise regression. In addition to identifying coefficients, system identification should also address uncertainty quantification and account for both measurement errors and model errors. The Bayesian setting provides a formal framework for this purpose.35 Recent system identification techniques24,26,36,37,38,39,40 start from a large space of candidate terms in the ordinary differential equations to systematically control and treat model errors. Machine learning can provide a powerful approach to reduce the number of dynamical variables and parameters while maintaining the biological relevance of the model.24,41
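A minimal sketch of such a system identification workflow is shown below: data are generated from a logistic-growth ordinary differential equation, a library of candidate terms is assembled, and a parsimonious model is recovered by sequentially thresholded least squares, in the spirit of sparse regression approaches to model discovery; the library, threshold, and noise-free data are illustrative assumptions.

```python
# Minimal sketch: recover a parsimonious ODE (logistic growth) from data by
# sparse regression over a library of candidate terms, in the spirit of
# sequentially thresholded least squares. Library, threshold, and noise-free
# data are illustrative choices only.
import numpy as np
from scipy.integrate import solve_ivp

r, K = 1.0, 2.0                                   # "true" model: dx/dt = r x (1 - x/K)
sol = solve_ivp(lambda t, x: r * x * (1 - x / K), (0, 10), [0.1],
                t_eval=np.linspace(0, 10, 500))
t, x = sol.t, sol.y[0]
dxdt = np.gradient(x, t)                          # numerical time derivative

library = np.column_stack([np.ones_like(x), x, x**2, x**3])   # candidate terms
names = ["1", "x", "x^2", "x^3"]

xi = np.linalg.lstsq(library, dxdt, rcond=None)[0]
for _ in range(10):                               # sequential thresholding
    small = np.abs(xi) < 0.05
    xi[small] = 0.0
    xi[~small] = np.linalg.lstsq(library[:, ~small], dxdt, rcond=None)[0]

print({n: round(c, 3) for n, c in zip(names, xi) if c != 0.0})
# expected to be close to dx/dt = 1.0*x - 0.5*x^2
```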

Partial differential equations encode physics-based knowledge into machine learning

The interaction between the different scales, from the cell to the tissue and organ levels, is generally complex and involves temporally and spatially varying fields with many unknown parameters.42 Prior physics-based information in the form of partial differential equations, boundary conditions, and constraints can regularize a machine learning approach in such a way that it can robustly learn from small and noisy data that evolve in time and space. Gaussian processes and neural networks have proven particularly powerful in this regard.43,44,45 For Gaussian process regression, the partial differential equation is encoded in an informative function prior;46 for deep neural networks, the partial differential equation induces a new neural network coupled to the standard uninformed data-driven neural network,22 see Fig. 3. This coupling of data and partial differential equations into a deep neural network presents itself as an approach to impose physics as a constraint on the expressive power of the latter. New theory-driven approaches are required to extend this approach to stochastic partial differential equations using generative adversarial networks, to fractional partial differential equations in systems with memory using high-order discrete formulas, and to coupled systems of partial differential equations in multiscale multiphysics modeling. Multiscale modeling is a critical step, since biological systems typically possess a hierarchy of structure, mechanical properties, and function across the spatial and temporal scales. Over the past decade, modeling multiscale phenomena has been a major point of attention, which has advanced detailed deterministic models and their coupling across scales.13 Recently, machine learning has permeated into the multiscale modeling of hierarchical engineering materials3,44,47,48 and into the solution of high-dimensional partial differential equations with deep learning methods.34,43,49,50,51,52,53 Uncertainty quantification in material properties is also gaining relevance,54 with examples of Bayesian model selection to calibrate strain energy functions55,56 and uncertainty propagation with Gaussian processes of nonlinear mechanical systems.57,58,59 These trends for non-biological systems point towards immediate opportunities for integrating machine learning and multiscale modeling in the biological, biomedical, and behavioral sciences and open new perspectives that are unique to the living nature of biological systems.

Fig. 3: Partial differential equations encode physics-based knowledge into machine learning.
figure 3

Physics imposed on neural networks. The neural network on the left, as yet unconstrained by physics, represents the solution u(x, t) of the partial differential equation; the neural network on the right describes the residual f(x, t) of the partial differential equation. The example illustrates a one-dimensional version of the Schrödinger equation with unknown parameters λ1 and λ2 to be learned. In addition to unknown parameters, we can learn missing functional terms in the partial differential equation. Currently, this optimization is done empirically based on trial and error by a human-in-the-loop. Here, the u-architecture is a fully connected neural network, while the f-architecture is dictated by the partial differential equation and is, in general, not possible to visualize explicitly. Its depth is proportional to the highest derivative in the partial differential equation times the depth of the uninformed neural network.
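The sketch below illustrates this coupling in a setting simpler than the Schrödinger example of the figure: a physics-informed neural network, written in PyTorch, whose data loss fits u(x, t) to observations and whose residual loss enforces a one-dimensional diffusion equation u_t = λ u_xx, with the diffusivity λ treated as a trainable parameter; the network size, synthetic data, and optimizer settings are illustrative assumptions.

```python
# Minimal sketch (PyTorch): a physics-informed neural network for the one-
# dimensional diffusion equation u_t = lambda * u_xx. The u-network approximates
# u(x, t); the residual f = u_t - lambda * u_xx is built from it by automatic
# differentiation. Data, network size, and training settings are illustrative.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1))
log_lam = torch.nn.Parameter(torch.tensor(-1.0))   # unknown diffusivity, log-scale

def residual(x, t):
    u = net(torch.cat([x, t], dim=1))
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - torch.exp(log_lam) * u_xx

# synthetic "measurements" of the exact solution u = exp(-t) sin(x), i.e. lambda = 1
x_d, t_d = torch.rand(200, 1) * 3.14, torch.rand(200, 1)
u_d = torch.exp(-t_d) * torch.sin(x_d)
# collocation points where the residual is penalized
x_c = (torch.rand(500, 1) * 3.14).requires_grad_()
t_c = torch.rand(500, 1).requires_grad_()

opt = torch.optim.Adam(list(net.parameters()) + [log_lam], lr=2e-3)
for step in range(5000):
    opt.zero_grad()
    data_loss = ((net(torch.cat([x_d, t_d], dim=1)) - u_d) ** 2).mean()
    phys_loss = (residual(x_c, t_c) ** 2).mean()
    (data_loss + phys_loss).backward()
    opt.step()

print("learned lambda:", torch.exp(log_lam).item())   # should approach 1.0
```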

Data-driven machine learning seeks correlations in big data

Machine learning can be regarded as an extension of classical statistical modeling that can digest massive amounts of data to identify high-order correlations and generate predictions. This is not only important in view of the rapid developments of ultra-high-resolution measurement techniques,60 including cryo-EM, high-resolution imaging flow cytometry, or four-dimensional-flow magnetic resonance imaging, but also when analyzing large-scale health data from wearables and smartphone apps.61,62 Machine learning can play a central role in helping us mine these data more effectively and bring experiment, modeling, and computation closer together.63 We can use machine learning as a tool in developing artificial intelligence applications to solve complex biological, biomedical, or behavioral problems.64 Figure 4 illustrates a framework for integrating machine learning and multiscale modeling with a view towards data-driven approaches. Most data-driven machine learning techniques seek correlation rather than causality. Some machine learning techniques, e.g., Granger causality65 or dynamic causal modeling,66 do seek causality, but without mechanisms. In contrast to machine learning, multiscale modeling seeks to provide not only correlation or causality but also the underlying mechanism.20 This suggests that machine learning and multiscale modeling can effectively complement one another when analyzing big data: Where machine learning reveals a correlation, multiscale modeling can probe whether this correlation is causal and can unpack the cause into mechanisms or mechanistic chains at lower scales.28 This unpacking is particularly important in personalized medicine, where each patient’s disease process is a unique variant, traditionally lumped into large populations by evidence-based medicine, whether through the use of statistics, machine learning, or artificial intelligence. Multiscale models can split the variegated patient population apart by identifying mechanistic variants based on differences in the patient’s genome, the genomes of invasive organisms or tumor cells, or immunological history. This is an important step towards creating a Digital Twin, a multiscale model of an organ system or a disease process, in which we can develop therapies without risk to the patient. As multiscale modeling attempts to leverage the vast volume of experimental data to gain understanding, machine learning will provide invaluable tools to preprocess these data, automate the construction of models, and analyze the similarly vast output data generated by multiscale modeling.67,68

Fig. 4: Data-driven machine learning seeks correlations in big data.
figure 4

This general framework integrates data-driven multiscale modeling and machine learning by performing organ, cellular, or molecular level simulations and systematically comparing the simulation results against experimental target data using machine learning analyses, including clustering, regression, dimensionality reduction, reinforcement learning, and deep learning, with the objective of identifying parameters, generating new hypotheses, or optimizing treatment.
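A minimal sketch of one step in such a pipeline is shown below: outputs simulated over a range of model parameters are reduced by principal component analysis and clustered, and the cluster closest to a synthetic experimental target is used to narrow the plausible parameter range; the toy "simulator", parameter grid, and target are illustrative assumptions.

```python
# Minimal sketch: compare simulated outputs against an experimental target using
# dimensionality reduction and clustering, then read off which parameter values
# produce simulations closest to the data. The toy "simulator" and target are
# illustrative stand-ins for a multiscale model and a measurement.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def simulate(rate, t=np.linspace(0, 10, 50)):
    return np.exp(-rate * t)                      # toy single-curve "multiscale model"

rates = np.linspace(0.1, 2.0, 200)                # candidate parameter values
sims = np.array([simulate(r) for r in rates])
target = simulate(0.7) + 0.02 * np.random.default_rng(0).normal(size=50)

scores = PCA(n_components=2).fit_transform(np.vstack([sims, target[None, :]]))
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scores[:-1])
target_cluster = km.predict(scores[-1:])[0]       # cluster closest to the measurement
plausible = rates[km.labels_ == target_cluster]
print("plausible rate range: %.2f - %.2f" % (plausible.min(), plausible.max()))
```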

Theory-driven machine learning seeks causality by integrating physics and big data

The basic question of theory-driven machine learning is: given a physics-based ordinary or partial differential equation, how can we leverage structured physical laws and mechanistic models as informative prior information in a machine learning pipeline towards advancing modeling capabilities and expediting multiscale simulations? Figure 5 illustrates the integration of theory-driven machine learning and multiscale modeling to accelerate model- and data-driven discovery. Historically, we have solved this problem using dynamic programming and variational methods. Both are extremely powerful when we know the physics of the problem and can constrain the parameter space to reproduce experimental observations. However, when the underlying physics are unknown, or there is uncertainty about their form, we can adopt machine learning techniques that learn the underlying system dynamics. Theory-driven machine learning allows us to seamlessly integrate physics-based models at multiple temporal and spatial scales. For example, multifidelity techniques can combine coarse measurements and reduced order models to significantly accelerate the prediction of expensive experiments and large-scale computations.29,69 In drug development, for example, we can leverage theory-driven machine learning techniques to integrate information across ten orders of magnitude in space and time towards developing interpretable classifiers to characterize the pro-arrhythmic potential of drugs.70 Specifically, we can employ Gaussian process regression to effectively explore the interplay between drug concentration and drug toxicity using coarse, low-cost models, anchored by a few, judiciously selected, high-resolution simulations.27 Theory-driven machine learning techniques can also leverage probabilistic formulations to inform the judicious acquisition of new data and actively expedite tasks such as exploring massive design spaces or identifying system dynamics. For example, we could devise an effective data acquisition policy for choosing the most informative mesoscopic simulations that need to be performed to recover detailed constitutive laws as appropriate closures for macroscopic models of complex fluids.71 More recently, efforts have been made to directly bake theory into machine learning practice. This enables the construction of predictive models that adhere to the underlying physical principles, including conservation, symmetry, or invariance, while remaining robust even when the observed data are very limited. For example, a recent model utilized only the conservation laws of the reactions to model the metabolism of a cell; while the exact functional forms of the rate laws were unknown, the equations were solved using machine learning.72 An intriguing implication of such models is their ability to leverage auxiliary observations to infer quantities of interest that are difficult to measure in practice.22 Another example is the use of neural networks constrained by physics to infer the arterial blood pressure directly and non-invasively from four-dimensional magnetic resonance images of blood velocities and arterial wall displacements by leveraging the known dynamic correlations induced by first principles in fluid and solid mechanics.11 In personalized medicine, we can use theory-driven machine learning to classify patients into specific treatment regimens. While this is typically done by genome profiling alone, models that supplement the training data using simulations based on biological or physical principles can have greater classification power than models built on observed data alone. For the examples of radiation impact on cells and Boolean cancer modeling, a recent study has shown that, for small training datasets, simulation-based kernel methods that use approximate simulations to build a kernel improve the downstream machine learning performance and are superior to standard no-prior-knowledge machine learning techniques.73
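A minimal sketch of this low-/high-fidelity idea is shown below: a Gaussian process is first fit to many cheap low-fidelity evaluations, and a second Gaussian process then learns the discrepancy from a handful of expensive high-fidelity evaluations, so that corrected predictions remain inexpensive; the two toy "models" and sample sizes are illustrative assumptions rather than the specific workflow of the cited studies.

```python
# Minimal sketch: two-level multi-fidelity regression with Gaussian processes.
# A GP fit to many cheap low-fidelity runs is corrected by a second GP fit to the
# discrepancy at a few expensive high-fidelity runs. The toy low- and
# high-fidelity "models" stand in for coarse and detailed simulations.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def lo(x):  return np.sin(8 * x)                  # cheap, biased model
def hi(x):  return np.sin(8 * x) + 0.3 * x        # expensive "ground truth"

x_lo = np.linspace(0, 1, 40)[:, None]             # many low-fidelity samples
x_hi = np.linspace(0, 1, 6)[:, None]              # few high-fidelity samples

gp_lo = GaussianProcessRegressor(RBF(0.1)).fit(x_lo, lo(x_lo).ravel())
delta = hi(x_hi).ravel() - gp_lo.predict(x_hi)    # discrepancy at high-fidelity points
gp_delta = GaussianProcessRegressor(RBF(0.3)).fit(x_hi, delta)

x_new = np.array([[0.37]])
prediction = gp_lo.predict(x_new) + gp_delta.predict(x_new)
print("multi-fidelity prediction:", prediction[0], " true value:", hi(0.37))
```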

Fig. 5: Theory-driven machine learning seeks causality by integrating prior knowledge and big data.
figure 5

Accelerating model- and data-driven discovery by integrating theory-driven machine learning and multiscale modeling. Theory-driven machine learning can yield data-efficient workflows for predictive modeling by synthesizing prior knowledge and multimodality data at different scales. Probabilistic formulations can also enable the quantification of predictive uncertainty and guide the judicious acquisition of new data in a dynamic model-refinement setting.

Open questions and opportunities

Numerous open questions and opportunities emerge from integrating machine learning and multiscale modeling in the biological, biomedical, and behavioral sciences. We address some of the most urgent ones below.

Managing ill-posed problems

Can we solve ill-posed inverse problems that arise during parameter or system identification? Unfortunately, many of the inverse problems for biological systems are ill posed. Mathematically speaking, they constitute boundary value problems with unknown boundary values. Classical mathematical approaches are not suitable in these cases. Methods for backward uncertainty quantification could potentially deal with the uncertainty involved in inverse problems, but these methods are difficult to scale to realistic settings. In view of the high-dimensional input space and the inherent uncertainty of biological systems, inverse problems will always be challenging. For example, it is difficult to determine if there are multiple solutions or no solutions at all, or to quantify the confidence in the prediction of an inverse problem with high-dimensional input data. Does the inherent regularization in the loss function of neural networks allow us to deal with ill-posed inverse partial differential equations without boundary or initial conditions and discover hidden states?

Identifying missing information

Are the parameters of the proposed model sufficient to provide a basic set to produce higher-scale system dynamics? Multiscale simulations and generative networks can be set up to work in parallel, alongside the experiment, to provide an independent confirmation of parameter sensitivity. For example, circadian rhythm generators provide relatively simple dynamics but have a very complex dependence on numerous underlying parameters, which multiscale modeling can reveal. An open opportunity exists to use generative models to identify both the underlying low dimensionality of the dynamics and the high dimensionality associated with parameter variation. Inadequate multiscale models could then be identified by the failure of generative model predictions.

Creating surrogate models

Can we use generative adversarial networks to create new test datasets for multiscale models? Conversely, can we use multiscale modeling to provide training or test instances to create new surrogate models using deep learning? By using deep learning networks, we could provide answers more quickly than by using complex and sophisticated multiscale models. This could, for example, have significant applications in predicting pharmaceutical efficacy for patients with particular genetic inheritance in personalized medicine.
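A minimal sketch of the surrogate idea is shown below: a small neural network is trained on input-output pairs generated by an (here, toy) expensive model and then answers new queries at negligible cost; the placeholder model, network size, and training settings are illustrative assumptions.

```python
# Minimal sketch: train a cheap neural-network surrogate on samples generated by
# an expensive model, then query the surrogate instead of rerunning the model.
# "expensive_model" is a toy stand-in for a multiscale simulation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def expensive_model(params):                      # placeholder for a multiscale simulation
    k1, k2 = params
    return np.exp(-k1) * np.sin(3 * k2)           # scalar quantity of interest

X = rng.uniform(0, 1, size=(500, 2))              # sampled parameter combinations
y = np.array([expensive_model(p) for p in X])

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(X, y)

query = np.array([[0.4, 0.6]])
print("surrogate prediction:", surrogate.predict(query)[0],
      " full model:", expensive_model(query[0]))
```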

Discretizing space and time

Can we remove or automate the tyranny of grid generation in conventional methods? Discretization of complex and moving three-dimensional domains remains a time- and labor-intensive challenge. It generally requires specific expertise and many hours of dedicated labor, and has to be redone for every individual model. This becomes particularly relevant when creating personalized models with complex geometries at multiple spatial and temporal scales. While many efforts in machine learning are devoted to solving partial differential equations in a given domain, new opportunities arise for machine learning when dealing directly with the creation of the discrete problem. This includes automatic mesh generation, meshless interpolation, and parameterization of the domain itself as one of the inputs for the machine learning algorithm. Neural networks constrained by physics remove the notion of a mesh, but retain the more fundamental concept of basis functions: they impose the conservation laws of mass, momentum, and energy at, e.g., collocation points that, while neither connected through a regular lattice nor through an unstructured grid, serve to determine the parameters that define the basis functions.
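A minimal sketch of the meshless idea is shown below: collocation points are drawn by Latin hypercube sampling over a bounding box and retained only if they fall inside an irregular domain, so that no lattice or grid connectivity is ever constructed; the circular domain and sample size are illustrative assumptions.

```python
# Minimal sketch: meshless collocation points for an irregular domain. Points are
# drawn by Latin hypercube sampling over a bounding box and retained only if they
# lie inside the domain; no mesh or grid connectivity is ever constructed.
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=2, seed=0)
points = qmc.scale(sampler.random(2000), l_bounds=[-1, -1], u_bounds=[1, 1])

inside = np.sum(points**2, axis=1) < 1.0          # irregular domain: the unit disk
collocation = points[inside]                      # collocation points, no mesh required
print(collocation.shape[0], "points at which a PDE residual would be enforced")
```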

Bridging the scales

Can machine learning provide scale bridging in cases where a relatively clean separation of scales is possible? For example, in cancer, machine learning could be used to explore responses of both immune cells and tumor cells based on single-cell data. This example points towards opportunities to build a multiscale model on the families of solutions to codify the evolution of the tumor at the organ or metastasis scales.

Supplementing training data

Can we use simulated data to supplement training data? Supervised learning, as used in deep networks, is a powerful technique, but requires large amounts of training data. Recent studies have shown that, in the area of object detection in image analysis, simulation augmented by domain randomization can be used successfully as a supplement to existing training data. In areas where multiscale models are well developed, simulation across vast areas of parameter space can, for example, supplement existing training data for nonlinear diffusion models to provide physics-informed machine learning. Similarly, multiscale models can be used in biological, biomedical, and behavioral systems to augment insufficient experimental or clinical datasets.
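A minimal sketch of this augmentation strategy is shown below: a classifier trained on a small "experimental" set is compared with one trained on the same set supplemented by model-generated samples; the Gaussian data generator stands in for a calibrated multiscale model, and all sample sizes are illustrative assumptions. The benefit naturally hinges on how faithfully the simulator reflects the true system.

```python
# Minimal sketch: supplement a small experimental training set with model-generated
# (simulated) samples and compare classifier accuracy. The Gaussian "simulator"
# stands in for a calibrated multiscale model; all sizes are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def draw(label, n):                               # toy data generator for two classes
    center = np.array([0.0, 0.0]) if label == 0 else np.array([1.5, 1.0])
    return center + rng.normal(size=(n, 2))

X_exp = np.vstack([draw(0, 10), draw(1, 10)]);    y_exp = np.array([0] * 10 + [1] * 10)
X_sim = np.vstack([draw(0, 500), draw(1, 500)]);  y_sim = np.array([0] * 500 + [1] * 500)
X_test = np.vstack([draw(0, 200), draw(1, 200)]); y_test = np.array([0] * 200 + [1] * 200)

clf_exp = LogisticRegression().fit(X_exp, y_exp)                     # experiment only
clf_aug = LogisticRegression().fit(np.vstack([X_exp, X_sim]),        # experiment + simulation
                                   np.hstack([y_exp, y_sim]))

print("experiment only :", accuracy_score(y_test, clf_exp.predict(X_test)))
print("with simulations:", accuracy_score(y_test, clf_aug.predict(X_test)))
```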

Quantifying uncertainty

Can theory-driven machine learning approaches enable the reliable characterization of predictive uncertainty and pinpoint its sources? Uncertainty quantification is the backbone of decision-making. This has many practical applications, such as decision-making in the clinic, the robust design of synthetic biology pathways, drug target identification, and drug risk assessment. There are also opportunities to use uncertainty quantification to guide the informed, targeted acquisition of new data.

Exploring massive design spaces

Can theory-driven machine learning approaches uncover meaningful and compact representations for complex inter-connected processes and, subsequently, enable the cost-effective exploration of vast combinatorial spaces? While this is already relatively common in the design of biomolecules with target properties in drug development, there are many other applications in biology and biomedicine that could benefit from these technologies.

Elucidating mechanisms

Can theory-driven machine learning approaches enable the discovery of interpretable models that can not only explain data, but also elucidate mechanisms, distill causality, and help us probe interventions and counterfactuals in complex multiscale systems? For instance, causal inference generally uses various statistical measures, such as partial correlation, to infer causal influence. If, instead, the appropriate statistical measure were known from the underlying physics, would the causal inference be more accurate or interpretable as a mechanism?

Understanding emergence of function

Can theory-driven machine learning, combined with sparse and indirect measurements, produce a mechanistic understanding of the emergence of biological function? Understanding the emergence of function is of critical importance in biology and medicine, environmental studies, biotechnology, and other biological sciences. The study of emergence critically relies on our ability to model collective action at a lower scale to predict how phenomena at the higher scale emerge from this collective action.

Harnessing biologically inspired learning

Can we harness biological learning to design more efficient algorithms and architectures? Artificial intelligence through deep learning is an exciting recent development that has seen remarkable success in solving problems that are difficult for humans. Typical examples include chess and Go, as well as the classical problem of image recognition, which, although superficially easy, engages broad areas of the brain. By contrast, activities that neuronal networks are particularly good at remain beyond the reach of these techniques; for example, the control systems of a mosquito engaged in evasion and targeting are remarkable considering the small neuronal network involved. This limitation provides opportunities for more detailed brain models to assist in developing new architectures and new learning algorithms.

Preventing overfitting

Can we use prior physics-based knowledge to avoid overfitting or non-physical predictions? How can we calibrate and validate the proposed models without overfitting? How can we apply cross-validation to simulated data, especially when the simulations may contain long-time correlations? From a conceptual point of view, this is a problem of supplementing the set of known physics-based equations with constitutive equations, an approach that has long been used in traditional engineering disciplines. While data-driven methods can provide solutions that are not constrained by preconceived notions or models, their predictions should not violate the fundamental laws of physics. Sometimes it is difficult to determine whether the model predictions obey these fundamental laws, especially when the functional form of the model cannot be determined explicitly. This makes it difficult to know whether the analysis predicts the correct answer for the right reasons. There are well-known examples of deep learning neural networks that appear to be highly accurate, but make highly inaccurate predictions when faced with data outside their training regime, and others whose predictions degrade dramatically with seemingly minor changes to the target data. To address this limitation, there are numerous opportunities to combine machine learning and multiscale modeling towards a priori satisfying the fundamental laws of physics and, at the same time, preventing overfitting of the data.
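One concrete way to address the cross-validation question raised above is to respect temporal correlations when splitting simulated trajectories. The minimal sketch below contrasts ordinary shuffled K-fold splits with time-ordered splits in scikit-learn on an illustrative autocorrelated signal; the toy signal and model are assumptions made only for demonstration.

```python
# Minimal sketch: cross-validation for temporally correlated (simulated) data.
# Shuffled K-fold leaks information between correlated neighbors, whereas
# TimeSeriesSplit keeps training folds strictly earlier than validation folds.
# The autocorrelated toy trajectory and the model are illustrative only.
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
t = np.arange(2000)
y = np.cumsum(rng.normal(size=2000))              # long-time correlated trajectory
X = t[:, None].astype(float)

model = KNeighborsRegressor(n_neighbors=5)
shuffled = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
blocked = cross_val_score(model, X, y, cv=TimeSeriesSplit(5))

print("shuffled K-fold R^2 (optimistic):", shuffled.mean())
print("time-ordered split R^2         :", blocked.mean())
```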

Minimizing data bias

Can an arrhythmia patient trust a neural net controller embedded in a pacemaker that was trained under environmental conditions different from those of the patient's own use? Training data come at various scales and different levels of fidelity. Data are typically generated by existing models, experimental assays, historical data, and other surveys, all of which come with their own inductive biases. Machine learning algorithms can only be as good as the data they have seen. This implies that proper care needs to be taken to safeguard against biased datasets. New theory-driven approaches could provide a rigorous foundation to estimate the range of validity, quantify the uncertainty, and characterize the level of confidence of machine-learning-based approaches.

Increasing rigor and reproducibility

Can we establish rigorous validation tests and guidelines to thoroughly test the predictive power of models built with machine learning algorithms? The use of open source codes and data sharing by the machine learning community is a positive step, but more benchmarks and guidelines are needed for neural networks constrained by physics. Reproducibility has to be quantified in terms of statistical metrics, as many optimization methods are stochastic in nature and may lead to different results. In addition to memory limitations, the 32-bit arithmetic of current GPU systems is particularly troubling for modeling biological systems, where steep gradients and very fast multirate dynamics may require 64-bit arithmetic, which, in turn, may require ten times more computational time with current technologies.
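The 32-bit concern raised above can be illustrated in a few lines: accumulating a fast, small-increment process in single precision silently loses updates that double precision retains; the increment size and step count are illustrative assumptions.

```python
# Minimal sketch: single vs. double precision when accumulating many small, fast
# updates, as can occur with multirate dynamics. In float32 the small increment
# falls below the resolution of the running sum and is silently lost.
import numpy as np

increment, steps = 1e-8, 1_000_000

acc32 = np.float32(1.0)
acc64 = np.float64(1.0)
for _ in range(steps):
    acc32 += np.float32(increment)    # 1.0 + 1e-8 rounds back to 1.0 in float32
    acc64 += increment

print("float32 result:", acc32)       # stays at 1.0
print("float64 result:", acc64)       # approximately 1.01
```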

Conclusions

Machine learning and multiscale modeling naturally complement and mutually benefit one another. Machine learning can explore massive design spaces to identify correlations, and multiscale modeling can predict system dynamics to identify causality. Recent trends suggest that integrating machine learning and multiscale modeling could become key to better understanding biological, biomedical, and behavioral systems. Along those lines, we have identified five major challenges in moving the field forward.

The first challenge is to create robust predictive mechanistic models when dealing with sparse data. The lack of sufficient data is a common problem in modeling biological, biomedical, and behavioral systems. For example, it can result from an inadequate experimental resolution or an incomplete medical history. A critical first step is to systematically identify the missing information. Experimentally, this can guide the judicious acquisition of new data or even the design of new experiments to complement the knowledge base. Computationally, this can motivate supplementing the available training data by performing computational simulations. Ultimately, the challenge is to maximize information gain and optimize efficiency by combining low- and high-resolution data and integrating data from different sources, which, in machine learning terms, introduces a multifidelity, multimodality approach.

The second challenge is to manage ill-posed problems. Unfortunately, ill-posed problems are relatively common in the biological, biomedical, and behavioral sciences and can result from inverse modeling, for example, when identifying parameter values or identifying system dynamics. A potential solution is to combine deterministic and stochastic models. Coupling the deterministic equations of classical physics—the balance of mass, momentum, and energy—with the stochastic equations of living systems—cell-signaling networks or reaction-diffusion equations—could help guide the design of computational models for problems that are otherwise ill-posed. Along those lines, physics-informed neural networks and physics-informed deep learning are promising approaches that inherently use constrained parameter spaces and constrained design spaces to manage ill-posed problems. Beyond improving and combining existing techniques, we could even think of developing entirely novel architectures and new algorithms to understand ill-posed biological problems inspired by biological learning.

The third challenge is to efficiently explore massive design spaces to identify correlations. With the rapid developments in gene sequencing and wearable electronics, personalized biomedical data have become more accessible and inexpensive than ever before. However, efficiently analyzing big datasets within massive design spaces remains a logistic and computational challenge. Multiscale modeling allows us to integrate physics-based knowledge to bridge the scales and efficiently pass information across temporal and spatial scales. Machine learning can utilize these insights for efficient model reduction towards creating surrogate models that drastically reduce the underlying parameter space. Ultimately, the efficient analysis of big data, ideally in real time, is a challenging step towards bringing artificial intelligence solutions into the clinic.

The fourth challenge is to robustly predict system dynamics to identify causality. Indeed, this is the actual driving force behind integrating machine learning and multiscale modeling for biological, biomedical, and behavioral systems. Can we eventually utilize our models to identify relevant biological features and explore their interaction in real time? A very practical example of immediate translational value is whether we can identify disease progression biomarkers and elucidate mechanisms from massive datasets, for example, early biomarkers of neurodegenerative disease, by exploiting the fundamental laws of physics. On a more abstract level, the ultimate challenge is to advance data- and theory-driven approaches to create a mechanistic understanding of the emergence of biological function that explains phenomena at higher scales as the result of the collective action at lower scales.

The fifth challenge is to know the limitations of machine learning and multiscale modeling. Important steps in this direction are analyzing sensitivity and quantifying uncertainty. While machine learning tools are increasingly used to perform sensitivity analysis and uncertainty quantification for biological systems, they are at a high risk of overfitting and generating non-physical predictions. Ultimately, our approaches can only be as good as the underlying models and the data they have been trained on, and we have to be aware of model limitations and data bias. Preventing overfitting, minimizing data bias, and increasing rigor and reproducibility have been and will always remain major challenges in creating predictive models for biological, biomedical, and behavioral systems.