Science-Informed Deep Learning (ScIDL) With Applications to Wireless Communications

Atefeh Termehchi, Ekram Hossain, and Isaac Woungang

Atefeh Termehchi and Ekram Hossain are with the Department of Electrical and Computer Engineering at the University of Manitoba, Winnipeg, Canada (emails: atefeh.termehchi@umanitoba.ca and ekram.hossain@umanitoba.ca), and Isaac Woungang is with the Department of Computer Science, Toronto Metropolitan University, Toronto, Canada (email: iwoungan@torontomu.ca).

Abstract

Given the extensive and growing capabilities offered by deep learning (DL), more researchers are turning to DL to address complex challenges in next-generation (xG) communications. However, despite its progress, DL also reveals several limitations that are becoming increasingly evident. One significant issue is its lack of interpretability, which is especially critical for safety-sensitive applications. Another significant consideration is that DL may not comply with the constraints set by physics laws or given security standards, which are essential for reliable DL. Additionally, DL models often struggle outside their training data distributions, which is known as poor generalization. Moreover, there is a scarcity of theoretical guidance on designing DL algorithms. These challenges have prompted the emergence of a burgeoning field known as science-informed DL (ScIDL). ScIDL aims to integrate existing scientific knowledge with DL techniques to develop more powerful algorithms. The core objective of this article is to provide a brief tutorial on ScIDL that illustrates its building blocks and distinguishes it from conventional DL. Furthermore, we discuss both recent applications of ScIDL and potential future research directions in the field of wireless communications.

Index Terms:

Science-informed deep learning (ScIDL), interpretability, generalization, physics consistency, scientific knowledge, wireless communications.

I Introduction

Given the substantial success of deep learning (DL) across various research and commercial domains, it is viewed as a promising alternative to traditional scientific methods in wireless communications. This is particularly evident in tackling intricate challenges within 5G and early 6G applications such as autonomous vehicles, e-health systems, and Industry 4.0. Traditional scientific methods, using physics-based models and conventional optimization algorithms, have successfully advanced wireless communications technologies from 1G to 5G. However, most physics-based models inherently involve approximations due to an incomplete understanding of real-world conditions, including uncertainties, noise, perturbations, and disturbances. In addition, they frequently entail complex nonlinear models with numerous parameters that must be estimated from limited observed data. For example, in terahertz (THz) communication systems, physics-based models for signal propagation are grounded in fundamental principles, yet they often rely on approximations because real-world conditions are incompletely understood. This can lead to inaccuracies in predicting the signal strength. Moreover, the conventional optimization algorithms used in scientific methods have exhibited high computational complexity in solving 5G and early 6G problems. Consequently, a purely scientific approach may not be suitable for solving problems online and in large-scale, complicated wireless network applications. These are the reasons DL has attracted significant interest in the field of wireless communications.

DL algorithms learn models of the world from data and make decisions without the need for any existing theories. Over the past decades, significant advancements have occurred in DL, encompassing: i) the integration of deeper neural network (NN) designs, ii) the design of improved training algorithms, iii) the rise of more powerful computing architectures, iv) the introduction of enhanced capacities for collecting and storing vast amounts of sensory data, and v) improved online connectivity [1, 2]. These crucial breakthroughs have elevated the importance of DL in tackling various challenges across different fields, including wireless communication. Consequently, a dominant research theme has emerged: “Apply DL to problem X”. However, despite this progress, various limitations of DL have come to light.

The primary concern with most DL methods lies in their lack of interpretability. While these methods can effectively learn complex phenomena, they are often perceived as “black boxes”, with limited insight into how they represent and analyze information about the world. From a physics perspective, interpretability revolves around the use of governing equations to explain phenomena such as signal attenuation, interference, and channel fading. These equations contain a consistent set of terms with clear physical interpretations. For instance, the principles governing signal propagation remain the same in urban and rural areas; only parameters such as signal strength and interference levels change. The lack of interpretability is particularly critical in safety-critical wireless applications, where understanding the rationale behind DL’s predictions is necessary [2]. Moreover, DL methods can yield results that contradict established scientific principles or security standards. For example, DL algorithms applied in wireless communications may occasionally generate outcomes that are inconsistent with known laws of electromagnetic propagation. This makes it harder to ensure reliable and accurate signal transmission in communication networks [3]. Another significant issue with DL methods is their frequent inability to generalize beyond training scenarios. Relationships established by DL models are typically valid only for the specific combinations of variables present in the training dataset (i.e., the training data distribution) and do not extend to unseen scenarios. Therefore, employing DL methods to address problems in non-stationary wireless environments presents significant challenges. Finally, there is a lack of theoretical guidance regarding the design of DL algorithms suitable for specific tasks. 
While emerging fields such as neural architecture search are starting to provide more automated approaches, selecting deep neural network (DNN) architectures often still relies on trial-and-error methods.

Both scientific and DL methods have distinct advantages and have been successfully applied to various wireless network applications. However, they face particular limitations when applied to high-dimensional problems with significant uncertainty and a non-stationary environment. Consequently, a burgeoning field called science-informed deep learning (ScIDL) has emerged [1, 4, 5, 6, 7]. ScIDL aims to integrate existing scientific knowledge, such as physics laws or expert knowledge, with DL. The goal of ScIDL is to enhance DL algorithms by embedding centuries of scientific knowledge into the DL framework, with the added aim of leveraging DL to discover new or more accurate scientific understanding (Fig. 1).

Figure 1: Science-informed deep learning

Indeed, DL and data-driven methods can offer valuable tools for advancing our understanding of physics and improving the accuracy and applicability of models in various domains. Notably, research in this area has been referred to by various terms, including “physics-guided machine learning”, “physics-informed machine learning”, “theory-guided data science”, “physics-informed neural network”, “hybrid model-based/data-driven”, and “physics-aware AI”. Although many of these terms mention physics, the approach is applied across multiple scientific fields. In this article, we provide a brief tutorial overview of ScIDL and present a taxonomy of research themes in ScIDL. In addition, our article provides a roadmap for researchers, which includes the challenges, implementation guidelines, and future research directions for applying ScIDL in the field of wireless communications.

II Science-informed Deep Learning

As discussed above, neither a purely data-driven DL approach nor an exclusively scientific knowledge-based approach is sufficient for addressing complex scientific and engineering challenges. Therefore, researchers are now exploring the integration of scientific knowledge and DL techniques in a complementary and synergistic way [5]. This methodology is substantially different from conventional practice in DL, where domain-specific knowledge is typically utilized only during feature engineering or in the post-processing stage. In contrast, the new paradigm focuses on embedding scientific knowledge directly within the DL framework. For example, physics-based equations can be incorporated as constraints into the loss function of an NN, as in the physics-informed NN (PINN) [8] or the variational physics-informed NN (VPINN) [9], which integrates the variational form of the underlying differential equation into the loss function. Moreover, in this new paradigm, DL and data-driven techniques are used to discover or refine existing scientific knowledge. There are diverse sources and representations of scientific knowledge across various disciplines and applications. Accordingly, researchers have developed numerous methods for integrating scientific knowledge into the framework of DL. In this section, we first explain the different possible sources of scientific knowledge that can be integrated into DL and briefly discuss how this knowledge can be represented. Second, we present a categorization of the methods for embedding scientific knowledge into DL. Finally, we explore how DL and data-driven methods can enhance scientific knowledge.

II-A Source and Representation of Scientific Knowledge

II-A1 Source of Scientific Knowledge

Scientific knowledge refers to the understanding gained from a rigorous examination of the dynamics of the natural and physical world. This understanding is primarily achieved through observation, measurement, experimentation, and the development of formal models to explain these findings [10]. The source of scientific knowledge can be categorized into physics knowledge, world facts, and expert knowledge, all of which are formalized and validated.

Physics knowledge encompasses an understanding of the fundamental principles, theories, and laws that govern the behavior of physical phenomena. These are formulated through mathematical equations. For example, the Shannon-Hartley theorem is a piece of physics knowledge that dictates the maximum rate of reliable information transmission over a noisy channel.

World facts are empirical observations about the natural or physical world. These are objective pieces of data collected from real-world environments, which are used to inform the models and theories. For instance, real-world measurements of how wireless signals propagate in urban environments include factors like building materials, density, and reflections.

Expert knowledge refers to specialized understanding and skills acquired through extensive training and experience in a particular field. For instance, network access, control policies, and intrusion detection protocols are typically crafted by security experts who understand the threats and vulnerabilities specific to wireless networks. Security policies regarding which devices can access the network, and under what conditions, can be integrated into a deep reinforcement learning (DRL) model.

II-A2 Representation of Scientific Knowledge

Here, we introduce a categorization of the formal frameworks used to represent scientific knowledge in a systematic manner. It encompasses differential equations, logic rules, algebraic equations, graph-based representation, production rules, and geometric properties [6, 11].

Differential equations: Differential equations are mathematical equations that describe the relationship between a function and its rates of change. In wireless communications, the wave equation is a common differential equation used to model the propagation of electromagnetic waves.

Logic rules: Logic rules, also known as logical expressions or propositional logic, are a formal way of representing scientific knowledge. They describe facts and dependencies using statements that can be either true or false. These rules use logical operators such as AND, OR, NOT, and IMPLIES.

Algebraic equations: Algebraic equations are mathematical statements that express the equality or inequality of two algebraic expressions involving variables and constants. For instance, a path loss model that relates the signal attenuation to the link distance is given by an algebraic equation.

Graph-based representation: A graph is a structured representation of knowledge consisting of nodes (vertices) and edges. The nodes represent entities or concepts, while the edges represent the relationships or connections between these entities. There are multiple types of graph-based representations, such as semantic networks and Bayesian networks. Semantic networks play a crucial role in natural language processing by capturing the meaning of words and their relationships.

Production rules: Production rules represent the knowledge and control of the reasoning process in rule-based systems. A production rule consists of two parts: a condition and an action (also called a consequent or conclusion). The condition part outlines a collection of conditions or criteria that need to be satisfied for the rule to be activated or executed. For example, security policies can be represented as production rules.
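As a minimal sketch of this condition-action structure, the following Python fragment encodes a small access-control policy as an ordered list of production rules. The rule names, the device fields (`registered`, `firmware`), and the actions are hypothetical and chosen only for illustration; they are not taken from any particular standard or from the cited works.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Device = Dict[str, Any]

@dataclass
class ProductionRule:
    name: str
    condition: Callable[[Device], bool]  # IF part: must hold for the rule to fire
    action: str                          # THEN part: the conclusion of the rule

# Hypothetical security policy; the list order encodes rule priority.
RULES: List[ProductionRule] = [
    ProductionRule("block_unregistered", lambda d: not d["registered"], "deny"),
    ProductionRule("quarantine_outdated", lambda d: d["firmware"] < 3, "quarantine"),
    ProductionRule("default_allow", lambda d: True, "allow"),
]

def decide(device: Device) -> str:
    """Fire the first rule whose condition is satisfied and return its action."""
    for rule in RULES:
        if rule.condition(device):
            return rule.action
    raise ValueError("no rule matched")
```

For example, `decide({"registered": True, "firmware": 2})` fires the second rule. Such a rule set could serve as a hard filter on, or a penalty term for, the actions proposed by a learned policy.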

Geometric properties: Geometric properties describe the characteristics that remain unchanged under mathematical transformations such as translations and rotations. If an object remains unchanged after such transformations, it possesses symmetry. Similarly, a function can be considered invariant if it produces the same outcome when its argument is subjected to a symmetric transformation.

II-B Methods for Embedding Science Knowledge into DL

We describe five broad categories of methods for embedding scientific knowledge into DL, along with instances from the field of wireless communication. This classification is based on the different phases of the DL process: i) science-informed initialization, ii) science-informed design and architecture of DL, iii) science-informed loss function, iv) science-informed optimization algorithm, and v) science-informed refinement of DL results (Fig. 2). Notably, various combinations of these methods can be employed for a specific problem.

II-B1 Science-Informed Initialization

The initialization phase of a DL method includes data generation, validation and analysis of data, and initial parameter setting. Here, we focus on methods of embedding scientific knowledge into the initialization phase.

Science-informed data generation: Obtaining a sufficient amount of observed data in real-world systems can be challenging, and creating new data can be both expensive and time-consuming. In the field of wireless communication, traditional physics-based approaches for generating virtual simulation data can be useful under specific conditions. However, these approaches involve running physics-based model simulations or conducting experiments, which can be extremely time-consuming. Hence, generative DL methods have recently gained significant attention across diverse domains. The idea behind these methods is to learn the underlying probability distribution in order to create new data that resembles the original training data. Nonetheless, a notable drawback of generative DL methods is that they require large amounts of high-quality data and significant computational power to ensure physics consistency and generalization. Consequently, an emerging area of study focuses on developing generative DL capable of leveraging scientific knowledge such as physical laws. Specifically, a constrained generative adversarial network (GAN) that incorporates known statistical properties and behaviors of a system results in an improved DL-based emulator. In [12], the authors introduced a constrained GAN that enforces covariance constraints. This improved the GAN’s ability to emulate the data, resulting in a model that accurately captures the true and known statistics of the training dataset. Indeed, integrating scientific knowledge with generative DL can enhance data efficiency and lower computational complexity. Another approach for incorporating scientific knowledge into data generation is to utilize known geometric attributes of the system: the real-world dataset can be augmented by applying knowledge of geometric properties such as symmetry. 
For example, in wireless communications, antenna radiation patterns often exhibit symmetric properties. Specifically, a dipole antenna typically has a symmetric radiation pattern. By exploiting this symmetry, additional training data can be created by generating mirrored versions of existing radiation patterns. This augmented dataset can then be used to train DL algorithms for tasks such as beamforming or antenna selection. Accordingly, dataset augmentation can improve the generalizability of DL.
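The mirroring idea can be sketched in a few lines of Python. The sketch assumes an idealized short-dipole power pattern proportional to sin²(θ), with θ measured from the dipole axis, so that the pattern is symmetric about θ = 90°. The function names and the sample angles are illustrative assumptions, not part of the original text.

```python
import math

def dipole_gain(theta_deg):
    """Idealized short-dipole power pattern sin^2(theta) (assumed model)."""
    return math.sin(math.radians(theta_deg)) ** 2

def mirror_augment(samples):
    """Append the mirror image (theta -> 180 - theta) of each (angle, gain) sample."""
    mirrored = [(180.0 - theta, gain) for theta, gain in samples]
    return samples + [s for s in mirrored if s not in samples]

# Stand-in for measurements taken only on one side of the pattern.
measured = [(theta, dipole_gain(theta)) for theta in (10.0, 40.0, 70.0)]
augmented = mirror_augment(measured)  # now also covers 110, 140 and 170 degrees
```

The augmented samples are valid only insofar as the assumed symmetry holds for the actual antenna; asymmetries caused by mounting or nearby scatterers would make the mirrored labels inaccurate.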

Science-informed validation and analysis of data: In traditional DL, prior knowledge is integrated through labeling or feature engineering. However, this section focuses on the validation and analysis of data using domain scientific knowledge. It involves two key steps to ensure the quality, reliability, and relevance of the training data for a given application:

• Science consistency checks: These ensure that the data adhere to known domain-specific rules and patterns. For example, the signal strength values should agree with known propagation models and physical constraints (e.g., in an open environment, the signal strength decreases with the distance according to the inverse square law).

• Completeness: This refers to using domain scientific knowledge to verify the coverage of the dataset and to infer or estimate missing values. For instance, the dataset should cover the relevant operating conditions, including frequencies, antenna configurations, and environmental scenarios.
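A science consistency check of the kind described above can be sketched as follows. The sketch assumes unity antenna gains and uses the free-space path loss as an upper bound on the received power, with a tolerance `tol_db` for measurement error and constructive multipath. The function names, the tolerance, and the example values are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)

def is_consistent(sample, tx_power_dbm, freq_hz, tol_db=6.0):
    """Return True if a (distance_m, rx_power_dbm) sample is physically plausible.

    The received power should not exceed the free-space prediction by more
    than tol_db (assumed tolerance); non-positive distances are rejected.
    """
    distance_m, rx_dbm = sample
    if distance_m <= 0:
        return False
    upper_bound_dbm = tx_power_dbm - fspl_db(distance_m, freq_hz)
    return rx_dbm <= upper_bound_dbm + tol_db

samples = [(100.0, -75.0), (100.0, -30.0), (-5.0, -60.0)]
clean = [s for s in samples if is_consistent(s, tx_power_dbm=20.0, freq_hz=2.4e9)]
```

With a 20 dBm transmitter at 2.4 GHz, the free-space loss at 100 m is about 80 dB, so only the first sample is retained.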

Science-informed initial parameter setting: DL methods, especially iterative ones, often require an initial choice of parameters to start the learning process. If this initialization is inadequate, it can result in poor DL models. Employing domain scientific knowledge during initialization can steer the learning algorithm toward models that are both generalizable and physically consistent [1]. This is particularly crucial in DNNs, where improper initialization may cause the models to become trapped in poor local optima. Moreover, leveraging scientific knowledge to guide the initialization of weights can expedite the model training phase, reducing the number of epochs needed for convergence and the volume of training data required to attain a satisfactory performance [13]. One approach to inform DL initialization with scientific knowledge is to use a DL technique called transfer learning [5]. In transfer learning, a model is initially trained on a related task before being fine-tuned with a limited amount of training data for the target task. This pre-trained model provides a well-informed initial state, which is closer to the appropriate parameters for the target task than a random initialization. A practical way to implement this is to pre-train the DL model using simulated data from a physics-based model. For instance, when designing an NN for channel estimation in a satellite-aided network, pre-training the weights on data simulated from known wireless channel propagation characteristics can expedite the convergence of the model during the training process. This approach not only accelerates the training process but also reduces the reliance on extensive training datasets.
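The effect of physics-based pre-training can be illustrated with a deliberately small example. The sketch below replaces the NN with a linear model of path loss versus log-distance, pre-trains it on data simulated with a free-space exponent of 2, and then fine-tunes it on four synthetic "measured" samples with an exponent of 2.7. All numerical values, the linear model, and the function names are assumptions chosen for illustration; the sketch is not the channel-estimation setup discussed above.

```python
import math

def mse(params, data):
    """Mean squared error of the linear model y = w*x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, lr=0.1, steps=50):
    """Plain gradient descent on the MSE, starting from the given parameters."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

# x = log10(distance in m), y = path loss in dB.
distances = [10 ** (k / 10) for k in range(5, 21)]        # about 3 m to 100 m
sim_data = [(math.log10(d), 40.0 + 20.0 * math.log10(d))  # simulated, exponent 2
            for d in distances]
real_data = [(x, 42.0 + 27.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # scarce target data

pretrained = train((0.0, 0.0), sim_data, steps=2000)      # physics-based pre-training
from_pretrained = train(pretrained, real_data, steps=50)  # fine-tuning
from_scratch = train((0.0, 0.0), real_data, steps=50)     # uninformed initialization
```

With the same small fine-tuning budget, `mse(from_pretrained, real_data)` is lower than `mse(from_scratch, real_data)` because the pre-trained parameters start much closer to the target.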

II-B2 Science-Informed Design and Architecture of DL

Interpretability is a desirable but often missing feature in DL methods, which typically operate as black boxes. To address this, recent research has focused on developing new DL architectures that leverage the unique characteristics of the problem at hand. These architectures integrate scientific knowledge into the design process. This approach not only enhances the model’s ability to solve specific problems but also improves the interpretability of DL [5]. The inherent flexibility and modular design of NNs make them ideal candidates for modifications and customization in their architecture. In the following, we outline four potential directions for integrating scientific knowledge into DL architecture (e.g., the architecture of a DNN).

Science-informed node and layer connections in DNNs: To capture the known relationships among variables, domain knowledge can be utilized in defining the node connections. In recent years, a growing number of researchers have advocated using graph neural networks (GNNs) to embed graph-based knowledge into NNs, especially within the field of wireless communications [14]. This approach has significantly enhanced the NNs’ ability to learn from and model the interactions between the nodes with greater accuracy and efficiency. In wireless communication, the network topology frequently changes due to resource reallocation and the movement of users, resulting in dynamic graphs. Consequently, the authors in [15] developed a learning model that considers the interdependencies among communication devices and the evolving nature of wireless networks. Another related approach is deep unfolding, which maps the iterations of an optimization algorithm into trainable NN layers. For example, in [16], the authors employed deep unfolding for signal detection in a multiple-input multiple-output (MIMO) network. In addition, in [17], the authors proposed a deep-unfolded weighted minimum mean square error algorithm for beamforming in a multi-user multiple-input single-output network. They utilized deep unfolding instead of end-to-end learning to manage the computational complexity of beamforming optimization in this network. In deep unfolding, the number of layers can be fixed based on the required performance. Therefore, the computational complexity of this method increases proportionally with the sum of the sizes of the input and output variables, whereas the complexity of traditional optimization may increase exponentially. Additionally, deep unfolding helps address the challenges of architecture selection and interpretability that are common in end-to-end learning.
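The following Python sketch shows the structure of deep unfolding on a simplified linear detection problem y = Hx. Each "layer" is one gradient iteration x ← x + α Hᵀ(y − Hx), and the per-layer step sizes α play the role of trainable parameters. This is an illustrative assumption rather than the architecture of [16] or [17]: the training loop and any nonlinearity (e.g., projection onto a symbol constellation) are omitted, and the step sizes are simply fixed to 0.5.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def unfolded_detector(H, y, step_sizes):
    """Unrolled gradient descent: one layer per entry of step_sizes.

    Layer k computes x <- x + alpha_k * H^T (y - H x); in a trained model
    each alpha_k would be learned from data instead of fixed by hand.
    """
    Ht = transpose(H)
    x = [0.0] * len(H[0])
    for alpha in step_sizes:  # the depth is fixed by the number of step sizes
        residual = [yi - hi for yi, hi in zip(y, matvec(H, x))]
        grad = matvec(Ht, residual)
        x = [xi + alpha * gi for xi, gi in zip(x, grad)]
    return x

H = [[1.0, 0.5], [0.2, 1.0]]  # toy 2x2 channel matrix (assumed values)
x_true = [1.0, -1.0]
y = matvec(H, x_true)         # noiseless received signal
x_hat = unfolded_detector(H, y, step_sizes=[0.5] * 30)
```

Because the depth is fixed, the cost of a forward pass is known in advance, and each layer retains the interpretation of one iteration of the underlying algorithm.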

Define intermediate physical variables: One approach for integrating physics knowledge into NN design is to associate physical meaning with specific neurons within the network. This can be achieved by explicitly defining physically relevant variables and computing the intermediate physical variables along the neural pathway from inputs to outputs. For example, consider a scenario where a THz-enabled base station (BS) aims to dynamically allocate resources to the users in its coverage area. Specifically, the BS needs to allocate resources such as frequency bands and transmit power levels to the users to maximize spectral efficiency while adhering to the physical constraints and quality of service (QoS) requirements. A long short-term memory (LSTM)-based auto-encoder framework can be used to extract the temporal features from historical network data. These features capture the temporal variations of traffic patterns (e.g., users’ locations). By utilizing the extracted temporal features along with additional environmental factors (e.g., the concentration of water vapor molecules in the propagation environment, the speed of light, and the operating THz frequency), another LSTM predicts an intermediate physical quantity, namely the path loss. This prediction is made while ensuring that it adheres to the related physics knowledge (by using the science-informed loss function methods discussed in the next subsection). Then, a multi-layer perceptron model can be utilized to combine the predicted path loss with the input features (e.g., QoS demands) to allocate the resources to each user. This example is inspired by the architecture proposed in [18]. Furthermore, a related method involves setting specific weights within the NN to fixed values with physical meaning, thereby mimicking known physical behaviors for a variable. 
For instance, specific key weights in an NN can be determined during the training phase to mimic the free-space path loss model and ensure that the NN respects the inverse square law of signal attenuation with distance. These critical weights can then be fixed to maintain the mimicking behavior. Afterward, the NN continues training with the remaining adjustable weights on real-world data. This approach allows the NN to adapt to complex patterns while maintaining the mimicking behavior. Another similar method is to use scientific knowledge to inform the connections between layers. For example, a complicated problem can be decomposed into several sub-problems, each of which is addressed using one or a few of the NN’s layers. The inputs and outputs of these sub-models are then linked together based on the physical relationships among the sub-problems.

Embedding invariance and symmetry properties into DL: Symmetries such as translational and rotational invariance serve important roles in physics, influencing the formulation of fundamental laws governing nature’s behavior. Therefore, incorporating symmetries into DL models is likely to enhance the physical consistency and generalizability of DL. State-of-the-art DL architectures already capture certain types of invariance. For instance, recurrent neural networks (RNNs) capture temporal invariance, while convolutional neural networks (CNNs) implicitly handle spatial translation; rotation and scale invariance are not built into standard CNNs. A more recent trend involves incorporating further spatial invariances as part of the CNN architecture, giving rise to a field known as geometric DL. Similarly, other types of invariance based on physical laws need to be embodied in DL methods. For example, the authors in [19] introduced several techniques to enforce various symmetries such as translation, rotation, uniform motion, and scale in CNNs for modeling dynamical systems.

Other directions: Various other methods are possible for incorporating scientific knowledge into the architecture of machine learning (ML) models. For example, Gaussian process regression (GPR) is a non-parametric Bayesian method for regression that is gaining popularity in ML. A novel approach to embedding physical knowledge into GPR involves integrating differential equations directly into the kernel function. Indeed, the authors in [20] demonstrated that the covariance function can explicitly represent the fundamental physical laws expressed by differential equations. Another way to integrate scientific knowledge into NN design is the science-informed choice of the activation function. Activation functions are crucial in NNs as they introduce non-linearity, enabling the network to learn complex patterns and make sophisticated decisions. For example, the authors in [21] proposed two learnable activation functions, one at the layer level and the other at the neuron level. In the layer-wise approach, a scalable parameter is introduced for each layer, whereas in the neuron-wise approach, a separate scalable parameter is introduced for each neuron. These parameters are then optimized alongside the weights and biases using the stochastic gradient descent algorithm. In another related work, the authors in [22] used the stability properties of differential equations from dynamic systems modeling to guide the design of activation functions in an RNN.

II-B3 Science-Informed Loss Function

DL algorithms typically involve a loss function, which provides feedback to the learning algorithm, guiding the adjustment of the DL model’s parameters. This guidance is typically achieved through optimization methods like gradient descent to minimize the loss. In the case of using DL for behavior prediction in an environment, the loss function involves measuring the difference between the learned model’s predictions and the actual target values. However, in scientific problems, there are intricate relationships among numerous physical variables that change across space and time at various scales. Conventional DL models may struggle to directly capture these relationships from the data, particularly when confronted with limited observational data. This limitation contributes to the DL models’ inability to generalize to situations not present in the training dataset [5]. To address this, modifying the loss function to incorporate scientific constraints is proposed as follows [7]:

\begin{equation}
\begin{split}
LossFunc &= W_{data}\,Loss_{data}(Y_{actual},Y_{pred})\\
&\quad + W_{sci}\,Loss_{sci}(Y_{pred}),
\end{split}
\tag{1}
\end{equation}

where $Loss_{data}$ is the supervised loss that measures the error between the actual labels ( $Y_{actual}$ ) and the predicted labels ( $Y_{pred}$ ), and $Loss_{sci}$ is the unsupervised loss related to scientific constraints. In addition, $W_{data}$ and $W_{sci}$ are the weights used to balance the two loss terms. These weights can be user-defined or automatically tuned, and they are crucial for enhancing the generalizability of the learned model. This modification is known as a ‘soft’ way of enforcing scientific constraints in DL. It helps the DL models to learn more generalizable dynamic patterns and remain consistent with the established scientific knowledge. For instance, constraints such as logic rules [23] or algebraic equations [24] have been incorporated into loss functions. More specifically, in [23], a semantic loss function is formulated to connect the neural output vectors with the logical constraints. In addition, in [25], the authors introduced physics-informed neural operators to train Fourier neural operators with a physics-informed loss function. These Fourier neural operators map between function spaces using stacked Fourier layers, which transform the inputs into the Fourier domain and truncate them to a fixed number of modes. Using Fourier layers enables invariant mappings between the function spaces, ensuring robustness to variations in the input data. As an example, wireless/mobile edge computing involves managing the queue lengths and offloading computing tasks from the mobile devices efficiently, which is crucial to ensuring low latency. The dynamics of queue lengths and offloading decisions can be modeled using differential equations. Integrating these equations into the loss function of an NN model that predicts the queue lengths helps the model respect the underlying principles governing the queue behavior and offloading mechanisms. 
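A minimal Python sketch of the loss in (1) is given below. The data term is a mean squared error, and the scientific term penalizes predictions that violate a simple, hypothetical constraint, namely that predicted queue lengths must be non-negative. The specific constraint, the default weights, and the function names are assumptions for illustration; in practice $Loss_{sci}$ would encode the residual of the governing equations.

```python
def data_loss(y_actual, y_pred):
    """Supervised term: mean squared error between labels and predictions."""
    return sum((a - p) ** 2 for a, p in zip(y_actual, y_pred)) / len(y_pred)

def science_loss(y_pred):
    """Unsupervised term: squared violation of the assumed constraint y >= 0."""
    return sum(max(0.0, -p) ** 2 for p in y_pred) / len(y_pred)

def loss_func(y_actual, y_pred, w_data=1.0, w_sci=10.0):
    """Weighted sum of the data term and the scientific term, as in (1)."""
    return w_data * data_loss(y_actual, y_pred) + w_sci * science_loss(y_pred)

consistent = loss_func([1.0, 0.0], [1.0, 0.0])   # no data error, no violation
violating = loss_func([1.0, 0.0], [1.0, -0.5])   # a negative queue length is penalized
```

Note that the scientific term depends only on the predictions, so it can also be evaluated on unlabeled inputs.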
Adding scientific knowledge to the loss function is similar to adding a scientific term to the reward function when using deep reinforcement learning (DRL), where the optimization objective is to maximize the reward function. For example, in [26], the authors utilized the queuing dynamics model to propose a Lyapunov candidate function and then derived the Lyapunov drift from the definition of this function. Specifically, they aimed to enhance the queuing stability in mobile edge computing networks by incorporating the Lyapunov drift into the reward function, along with a penalty for end-to-end latency. The optimization of the DRL policy thus focuses on reducing the Lyapunov drift, to improve the system stability, and on reducing the penalty, to achieve a smaller end-to-end latency. A hyperparameter is introduced to regulate the trade-off between the system stability and penalty reduction. Incorporating scientific knowledge in this way can notably enhance the DL models’ convergence and their generalization capabilities, and decrease the need for extensive training data. However, the learned model or policy usually minimizes the expected loss function value across the training data, which means it may not fully adhere to the constraints imposed by its knowledge-based terms. Indeed, these terms in the loss function act as regularizers, constraining the range of models that can be learned, akin to prior knowledge in Bayesian modeling [2]. To address this challenge, the authors in [27, 28] customized the DRL optimization objective to find a learning policy that maximizes the reward function while adhering to scientific knowledge-based constraints. Specifically, instead of adding the knowledge-based term to the reward function, they directly incorporated the term into the proximal policy optimization function using the Lagrangian method. 
They adjusted the Lagrangian multipliers through gradient ascent, thereby guaranteeing the satisfaction of the scientific knowledge-based constraints.
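The Lagrangian idea can be illustrated on a toy problem. The sketch below is not the proximal policy optimization procedure of [27, 28]. It assumes a hypothetical scalar "policy parameter" $\theta$, a hypothetical reward $r(\theta) = -(\theta - 2)^2$, and a hypothetical constraint $\theta \le 1$. The parameter is updated by gradient ascent on the Lagrangian, while the multiplier is updated by projected gradient ascent on the constraint violation.

```python
def constrained_ascent(steps=2000, lr_theta=0.05, lr_lambda=0.05, limit=1.0):
    """Primal-dual updates for: maximize -(theta - 2)^2 subject to theta <= limit.

    Lagrangian (toy, assumed): L(theta, lam) = -(theta - 2)^2 - lam * (theta - limit).
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on the Lagrangian with respect to theta.
        grad_theta = -2.0 * (theta - 2.0) - lam
        theta += lr_theta * grad_theta
        # Dual step: the multiplier grows while the constraint is violated
        # and is kept non-negative.
        lam = max(0.0, lam + lr_lambda * (theta - limit))
    return theta, lam

theta_star, lam_star = constrained_ascent()
print(theta_star, lam_star)
```

In this toy setting the unconstrained maximizer ($\theta = 2$) is infeasible, and the iterates settle at the constraint boundary $\theta = 1$ with a positive multiplier. The multiplier plays the role of an automatically tuned penalty weight, in contrast to the fixed weight of a soft penalty.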

II-B4 Science-Informed Optimization Algorithm

In DL, gradient-based methods are commonly used to optimize the model parameters during the training phase because of their low computational complexity. However, standard gradient-based optimization must be modified to manage the interactions between the multiple terms of a scientific knowledge-informed loss function. Specifically, different loss components (such as those enforcing the physical constraints and those fitting the data) can have gradients that interact in complex ways, and these interactions can lead to suboptimal performance if not managed correctly. To address this challenge, an adaptive gradient descent algorithm (AGDA) is introduced in [29] based on an analysis of these interaction mechanisms. AGDA dynamically adjusts the learning rates based on the interactions between the various loss gradients. Consequently, AGDA can overcome the limitations of traditional gradient descent (GD) methods and enhance the model performance, stability, and generalization. Furthermore, advanced GD techniques such as projected GD (PGD) can help maintain the parameter constraints and ensure compliance with scientific knowledge-based constraints. In PGD, after the gradient of the loss function with respect to the parameters is computed, a tentative update is taken in the direction of the negative gradient, as in standard GD. Before this update is committed, however, the tentative parameters are projected back onto the feasible region defined by the constraints, which ensures that the parameters satisfy the constraints after each update. In NNs, the projection step therefore occurs after the backpropagation step and before the parameters of the NN are finalized for that iteration. In short, PGD helps maintain the constraints on the parameters, ensuring that they stay within the feasible region and adhere to the physical principles of the problem. 
To address the computational complexity of the original PGD framework, which consists of one GD step followed by a projection in each iteration, the general PGD framework can be employed [16]. In the general PGD framework, the single GD step is extended to multiple ( $m$ ) GD steps.
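The PGD iteration with $m$ GD steps per projection can be sketched as follows. This is a minimal illustration under stated assumptions, not the detector of [16]: the feasible region is assumed to be a simple box, the objective is a hypothetical quadratic, and the function names and step size are chosen only for the example.

```python
import numpy as np

def project_box(x, lower, upper):
    """Euclidean projection onto the box [lower, upper] (assumed feasible region)."""
    return np.clip(x, lower, upper)

def generalized_pgd(grad_fn, x0, lower, upper, lr=0.1, m=1, iters=200):
    """Run `iters` outer iterations, each with m GD steps followed by one projection.

    Setting m = 1 gives the original PGD iteration.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        for _ in range(m):
            x = x - lr * grad_fn(x)       # plain GD step
        x = project_box(x, lower, upper)  # restore feasibility
    return x

# Hypothetical objective: ||x - target||^2, whose unconstrained minimizer lies outside the box.
target = np.array([2.0, -0.5, 0.3])
grad = lambda x: 2.0 * (x - target)

x_pgd = generalized_pgd(grad, np.zeros(3), 0.0, 1.0, m=1)
x_gen = generalized_pgd(grad, np.zeros(3), 0.0, 1.0, m=3)
print(x_pgd, x_gen)
```

With $m > 1$, the projection is applied less often per GD step, which is where the savings arise when the projection is the expensive operation. For this separable toy objective both settings reach the same feasible point, although that need not hold in general.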

II-B5 Science-Informed Refinement of DL Results

The outputs of the DL models can be refined by incorporating explicit or implicit scientific knowledge, resulting in numerous advantages. Firstly, this allows for consistency with scientific knowledge, thereby reducing the solution search space for DL models. Secondly, DL models that adhere to the desired physical properties are more likely to generalize to out-of-sample scenarios than standard DL models. In the field of spatiotemporal data analysis, a substantial amount of literature focuses on refining the model outputs to maintain spatial consistency and temporal coherence across predictions. In [28], the authors utilize a DRL output refinement to account for the speed constraint of THz-enabled UAVs in a wireless communications scenario. Additionally, they employ another refinement strategy to ensure that the UAVs remain within the predefined area by utilizing the $\text{clip}(.)$ function. Specifically, the authors use refinement strategies rather than considering the physical constraints as additional reward terms. This approach helps to mitigate the complexity of the rewards and the challenge of adjusting the hyperparameters.
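An output-refinement step of this kind can be sketched in a few lines. The code below is an illustrative assumption and not the procedure of [28]: the speed limit is enforced by rescaling the raw velocity to a maximum norm, the area is taken to be an axis-aligned rectangle, and all names and numbers are hypothetical.

```python
import numpy as np

def refine_action(position, raw_velocity, v_max, area_min, area_max, dt=1.0):
    """Refine a raw policy output so that speed and area constraints hold.

    1) Rescale the velocity if its norm exceeds v_max (assumed speed constraint).
    2) Clip the resulting position to the allowed area.
    """
    velocity = np.asarray(raw_velocity, dtype=float)
    speed = np.linalg.norm(velocity)
    if speed > v_max:
        velocity = velocity * (v_max / speed)
    next_position = np.clip(np.asarray(position, dtype=float) + velocity * dt,
                            area_min, area_max)
    return velocity, next_position

# Hypothetical example: the raw action is too fast and would leave a 10 x 10 area.
vel, pos = refine_action(position=[9.0, 9.0], raw_velocity=[3.0, 4.0],
                         v_max=2.5, area_min=0.0, area_max=10.0)
print(vel, pos)
```

Because the constraints are enforced on the output itself, they hold at every step regardless of how well the policy has been trained, and no extra reward weight has to be tuned.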

II-C Improve Scientific Knowledge Using DL

In Section II-B, we presented how scientific knowledge can improve DL methods. However, the interaction between scientific knowledge and DL is mutual: just as scientific knowledge can enhance the performance of DL algorithms, DL techniques can also contribute to advancing scientific knowledge (Fig. 1). By utilizing large volumes of measurement data, DL methods have the transformative potential to enhance scientific knowledge and even uncover new physical principles. For example, in 2019, researchers employed DL methods to uncover new phases of matter within intricate quantum systems. By employing NNs to analyze extensive data from quantum simulations, they successfully categorized various phases, some of which were unanticipated by conventional theoretical frameworks. Additionally, DL can enhance the accuracy of physical modeling for a phenomenon. For example, if a second-order differential equation initially represents a phenomenon, the model’s accuracy can be improved by adding third-order and fourth-order terms to capture additional complexity. When these additional terms are identified from measurement data using ML techniques, the refined model represents the phenomenon’s behavior more accurately. An example from wireless communications involves developing more accurate channel fading and path loss models for satellite-aided networks. In these networks, the signals can be attenuated by atmospheric conditions, such as rain, fog, and clouds. Additionally, multipath propagation caused by reflections and scattering can lead to signal fading and, in turn, unreliable communication links. Data-driven and DL techniques can be employed to enhance the physics-based models of channel fading and path loss in these networks.
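As a minimal sketch of refining a physics-based model with data, the code below starts from an assumed log-distance path loss baseline and fits a low-order polynomial correction to the residual by least squares. It uses a linear least-squares fit rather than a DL model to keep the example short. The baseline parameters, the synthetic "measurements", and the function names are hypothetical and serve only to illustrate the workflow; they are not taken from any measurement campaign.

```python
import numpy as np

def baseline_path_loss_db(d, pl0_db=40.0, exponent=2.0, d0=1.0):
    """Assumed physics baseline: log-distance path loss in dB."""
    return pl0_db + 10.0 * exponent * np.log10(d / d0)

def fit_correction(d, measured_db, degree=2):
    """Least-squares polynomial (in log10 distance) fitted to the baseline residual."""
    x = np.log10(d)
    residual = measured_db - baseline_path_loss_db(d)
    features = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    coeffs, *_ = np.linalg.lstsq(features, residual, rcond=None)
    return coeffs

def refined_path_loss_db(d, coeffs):
    """Baseline plus the learned correction."""
    features = np.vander(np.log10(d), len(coeffs), increasing=True)
    return baseline_path_loss_db(d) + features @ coeffs

# Synthetic data (illustration only): the baseline plus an extra, unmodeled attenuation.
d = np.linspace(1.0, 100.0, 50)
measured = baseline_path_loss_db(d) + 3.0 + 1.5 * np.log10(d) ** 2

coeffs = fit_correction(d, measured)
err_baseline = np.max(np.abs(measured - baseline_path_loss_db(d)))
err_refined = np.max(np.abs(measured - refined_path_loss_db(d, coeffs)))
print(coeffs, err_baseline, err_refined)
```

The physics baseline is kept intact and only the discrepancy is learned, so the refined model stays interpretable: each fitted coefficient indicates how the data depart from the original model.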

III Challenges and Future Research Directions

It is essential to address the outlined weaknesses of conventional DL methods, namely the lack of interpretability, physics consistency, and theoretical guidance in designing learning algorithms, as well as limited generalizability. As discussed in the previous sections, a promising trend is integrating scientific knowledge into DL methods, called ScIDL. In the following subsections, we first highlight some major challenges in ScIDL and then present promising future research directions for applying ScIDL in wireless communications.

III-A Challenges

Although ScIDL offers promising solutions to the limitations of conventional DL, there are significant challenges in ScIDL that require attention. Here, we discuss these challenges.

III-A1 Lack of Real-world Datasets and Benchmarks

In fields such as imaging, speech, and natural language processing, it is common to use standard datasets to evaluate and compare the computational costs and generalizability of various ML techniques. However, for applications in wireless communication, datasets and benchmarks derived from real-world scenarios are scarce. This scarcity of real-world datasets and benchmarks is particularly pronounced in the context of ScIDL. As ScIDL concepts evolve and mature, there is an increasing need to evaluate their scalability and generalizability to real-world problems. While proof-of-concept studies using simplified, toy problems offer valuable insights, it is crucial to acknowledge the potential scaling issues. For instance, computational complexity can be an issue when applying ScIDL techniques to real-world problems. Indeed, the scaling issues may persist even if the technique demonstrates superiority over conventional DL methods in toy problems. Therefore, overcoming the challenge of insufficient real-world datasets and benchmarks is imperative to guarantee the broad applicability of ScIDL in solving real-world wireless problems.

III-A2 New Computational Techniques and Framework

Developing ScIDL approaches requires a productive collaboration between DL, optimization, numerical analysis, and rigorous scientific knowledge. This collaboration has the potential not only to develop more robust and effective training algorithms but also to establish a solid foundation for a new generation of computational methods [7]. Specifically, DL methods are trained using gradient-based optimization algorithms with a low computational cost. However, the training dynamics of these methods are often not well understood. Consequently, it is necessary to develop new algorithms that are more understandable. Moreover, employing the discussed ScIDL methods, such as customizing DNN architecture, incorporating the knowledge terms into the cost function, or customizing optimization algorithms such as projected gradient descent, can increase computational complexity. In essence, ensuring the interpretability, generalization, and physics consistency of DL methods may require greater computational resources. Therefore, it is essential to develop new computational algorithms and frameworks that are more understandable and suitable for ScIDL methods. The new computational algorithms and frameworks should include the requisite software tools, libraries, and infrastructure. This could significantly enhance the adoption of ScIDL across various fields.

III-A3 New Mathematical Metrics

The primary objective of ScIDL is to integrate the current scientific knowledge with DL to overcome the limitations of traditional DL methods. It is therefore crucial to assess how effectively ScIDL approaches resolve limitations such as limited generalizability and the lack of interpretability and physics consistency. As multi-scale and multi-physics problems add complexity to ScIDL approaches, this assessment will become increasingly important. For example, multi-function and multi-access technologies make xG wireless networks inherently multi-scale and multi-physics. These systems must integrate and optimize across various functions and physical phenomena, spanning different temporal and spatial scales, to deliver the advanced capabilities envisioned for xG wireless networks. Therefore, developing new mathematical metrics to evaluate aspects such as the generalizability or interpretability of ScIDL methods is imperative.

III-A4 Real-time Data Integration

As discussed earlier, the interaction between DL and scientific knowledge is reciprocal. DL methods can be enhanced and updated by incorporating scientific knowledge, and conversely, scientific knowledge can be refined through the use of DL and sensor data. However, generating and integrating sensor data in real-time is a significant challenge due to several factors such as high sensor data volume, sensor accuracy, data consistency from multiple sensors, latency, real-time processing, and security and privacy. One advanced technology that facilitates the closed-loop interaction between DL and scientific knowledge is the digital twin (DT). Initially introduced by General Electric, a DT is a virtual representation of a physical object, system, or process that is used to understand, analyze, and optimize its real-world counterpart. A DT can find new information and create a feedback loop that continuously improves DL, the scientific understanding of the system, and the digital model (Fig. 3). However, generating and integrating sensor data in real-time remains an important challenge that requires attention.

III-B Research Directions in ScIDL for Wireless Communications

As mentioned earlier, the traditional model-based methods are unsuitable for xG wireless communications due to ever-increasing computational complexity and scaling challenges. On the other hand, although DL methods have been proposed to address many challenges in wireless communications, they face significant limitations, as previously discussed. Therefore, substantial research utilizing ScIDL techniques is envisioned to address various challenges in xG wireless communications. Here, we classify the general difficulties of xG wireless communications problems into five categories: (i) computational complexity in solving optimization problems, (ii) the necessity of estimating the environmental behavior due to the need for adaptability, (iii) the Shannon physical-layer capacity limit, (iv) safety and security, and (v) the need to continually update and refine our knowledge of the system behavior. Accordingly, future research directions for employing ScIDL in xG wireless communications are categorized as: (i) ScIDL-based optimization methods, (ii) ScIDL-based estimation of environmental behavior, (iii) ScIDL-based goal-oriented and semantic communication, (iv) ScIDL-based secure and safe methods, and (v) ScIDL-based scientific knowledge extraction, which are described as follows.

III-B1 ScIDL-Based Optimization Methods

The increasing need for reduced latency, higher data rates, and reliability in emerging applications such as connected robots and extended reality, coupled with constraints such as limited energy and bandwidth, drives the development of new wireless technologies. These technologies include dynamic link adaptation, beamforming, reconfigurable intelligent surfaces (RIS), massive MIMO, satellite-aided networks, and multi-function multi-access systems. However, developing these technologies requires addressing the intricate optimization challenges across various layers of the wireless network. Meanwhile, the scale of xG wireless networks is expanding to meet the increasing demands for faster, more reliable, and ubiquitous connectivity across various environments and user scenarios. Consequently, the complexity of the xG wireless optimization problems intensifies due to high dimensionality, non-convexity, and the requirement for joint optimization across multiple variables and constraints. Therefore, advanced optimization algorithms are required to tackle the computational complexity in xG wireless networks. ScIDL methods can be employed to address the computational complexity in those problems while also considering interpretability, generalization, and physics consistency. For example, joint user scheduling and beamforming in multi-user massive MIMO considering user demands and channel conditions is a non-convex optimization problem that requires the integration of DL, traditional optimization, physics, and signal processing knowledge. The authors in [17] proposed an unfoldable weighted minimum mean square error algorithm for beamforming in an MU-MISO downlink channel by using a science-informed design and architecture method, effectively handling the complexity versus performance trade-off. 
Furthermore, by using the science-informed loss function methods, DL can be used to solve the beamforming optimization problem in RIS-assisted multi-user MIMO, reducing the complexity while maintaining the physics consistency and generality. Specifically, to improve generalization and the physics consistency of the employed DL, physically consistent constraints such as radiation patterns can be added to the DL’s cost function. In addition, to ensure the constraints are satisfied, a customized optimization algorithm can be proposed based on the science-informed optimization algorithm framework.

III-B2 ScIDL-Based Estimation of Environmental Behavior

The wireless communications environment is highly dynamic due to factors such as user mobility, changing traffic demand of users, changing channel conditions, and interference. Therefore, to respond promptly to the highly dynamic and variable nature of the wireless communications environment, different solutions such as dynamic link adaptation, beamforming, load balancing, adaptive MAC design, and adaptive resource allocation in xG networks have been proposed. This adaptability is essential for maintaining optimal network performance, ensuring efficient resource utilization, and providing a high-quality user experience. To ensure appropriate adaptiveness in each proposed solution, it is necessary to estimate or predict the environmental behavior. For instance, in dynamic link adaptation and beamforming, predicting or estimating the channel state information (CSI) is crucial. Similarly, efficient load balancing in the network requires predicting user movement, traffic patterns, and CSI. Accordingly, one promising research direction is to predict future environmental behavior using ScIDL and to continually refine the prediction model by incorporating real-time sensor data and scientific knowledge. An effective technology that leverages ScIDL to forecast future environmental behavior is the DT (Fig. 3). The DT can simulate and predict how the real-world environment will behave, enabling proactive decision-making and optimization in various applications such as dynamic link adaptation, beamforming, load balancing, and adaptive resource allocation in xG networks. Thus, the DT can use ScIDL methods to mirror and predict the dynamics of the physical environment, enhancing the efficiency and performance of xG wireless network operations. However, predicting the behavior of highly nonlinear, multiscale, and multiphysics systems presents a significant challenge in this area. 
This challenge can be addressed through approaches such as Fourier neural networks, the Koopman method, and domain decomposition. For instance, employing the Koopman operator theory can help in obtaining linear models in a higher-dimensional space (often infinite-dimensional) of a nonlinear dynamical system. This transformation allows for the employment of linear analysis tools, which can facilitate the understanding and prediction of the system’s behavior.
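The Koopman idea can be illustrated with a small data-driven sketch. The code below uses a hypothetical two-state nonlinear system chosen so that a hand-picked dictionary of observables, $(x_1, x_2, x_1^2)$, evolves exactly linearly. A finite-dimensional approximation of the Koopman operator is then fitted by least squares, in the style of extended dynamic mode decomposition. The system, the dictionary, and the names are assumptions for illustration and do not model any wireless scenario.

```python
import numpy as np

A, B = 0.9, 0.5  # hypothetical system parameters

def step(x):
    """Toy nonlinear dynamics: x1 decays linearly, x2 is driven by x1 squared."""
    x1, x2 = x
    return np.array([A * x1, B * x2 + (A**2 - B) * x1**2])

def lift(x):
    """Hand-picked observables in which the toy dynamics become linear."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])

def fit_koopman(states, next_states):
    """Least-squares fit of K such that lift(x_next) is approximately K @ lift(x)."""
    psi = np.array([lift(x) for x in states])
    psi_next = np.array([lift(x) for x in next_states])
    k_transposed, *_ = np.linalg.lstsq(psi, psi_next, rcond=None)
    return k_transposed.T

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
next_states = np.array([step(x) for x in states])
K = fit_koopman(states, next_states)

# Predict 10 steps ahead with the linear model and compare with the nonlinear simulation.
x0 = np.array([0.8, -0.4])
z, x = lift(x0), x0.copy()
for _ in range(10):
    z, x = K @ z, step(x)
print(z[:2], x)
```

Once the lifted dynamics are linear, multi-step prediction reduces to repeated matrix multiplication, and standard tools such as eigenvalue analysis of the fitted matrix become available. For general systems a finite dictionary yields only an approximation, and choosing or learning the observables is the difficult part.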

III-B3 ScIDL-Based Goal-oriented and Semantic Communication

Goal-oriented and semantic communications represent cutting-edge technologies within xG networks, aiming to transcend the conventional communication theory boundaries. These technologies rely on artificial intelligence and a shared knowledge base to ensure an alignment between the sender and the recipient. However, implementing these technologies faces a significant challenge due to computational complexity, especially on Internet of Things (IoT) devices with limited processing power. To address this challenge, ScIDL offers a solution by streamlining complex DNNs and reducing the data required for training the semantic and channel encoders and decoders. For example, utilizing symbolic NNs with a loss function that embeds the logical constraints or production rules derived from scientific knowledge can effectively reduce the training data necessary for these components.
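A loss term that embeds a logical constraint can be sketched with the "exactly one" rule, for which the semantic loss of [23] takes the closed form $-\log \sum_i p_i \prod_{j \neq i} (1 - p_j)$, where $p_i$ is the predicted probability that output $i$ is active. The code below is a minimal illustration of that single constraint; the function name and the example probabilities are hypothetical, and applying the term to a semantic encoder or decoder is an assumption rather than a reported design.

```python
import numpy as np

def exactly_one_semantic_loss(p):
    """Semantic loss for the constraint 'exactly one output variable is true'.

    p[i] is the predicted probability that variable i is true. The loss is the
    negative log-probability that independent draws from p satisfy the constraint.
    """
    p = np.asarray(p, dtype=float)
    satisfied = 0.0
    for i in range(len(p)):
        others_false = np.prod(np.delete(1.0 - p, i))  # all other variables false
        satisfied += p[i] * others_false
    return -np.log(satisfied)

loss_confident = exactly_one_semantic_loss([0.9, 0.05, 0.05])
loss_uniform = exactly_one_semantic_loss([1 / 3, 1 / 3, 1 / 3])
loss_two_way = exactly_one_semantic_loss([0.5, 0.5])
print(loss_confident, loss_uniform, loss_two_way)
```

The term requires no labels, so it can be evaluated on unlabeled samples. It is small when the output is close to a valid one-hot assignment and grows as probability mass spreads over assignments that violate the rule, which is how such a constraint can reduce the amount of labeled data needed.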

III-B4 ScIDL-Based Secure and Safe Methods

Safety and security are critical in wireless communications. For example, in applications such as autonomous vehicles or remote healthcare, where real-time data transmission is crucial, any compromise in security could lead to data breaches or delays, affecting the system’s responsiveness and reliability. Safety is also closely linked with other critical requirements such as reliability, security, and robustness. However, a major drawback common to most DL methods is that they do not ensure safety [28]. Similarly, DL methods have demonstrated vulnerability to adversarial examples, which are intentionally crafted inputs (often created by adding slight, but precise modifications to valid training data) designed to produce incorrect outputs [30]. As a result, cyber attackers can exploit these vulnerabilities to degrade the system’s performance. Therefore, developing safe and secure DL methods is essential, yet it remains a highly challenging task. One promising research direction in xG wireless communications involves leveraging ScIDL frameworks to enhance the safety and security of DL methods. For example, the authors in [28] used a science-informed loss function method for safe DRL. The paper aims to satisfy the cooperative safety constraints (avoiding collision between UAVs) while maximizing the energy efficiency in a THz-enabled UAV-aided wireless network. Additionally, ScIDL can be used to detect anomalies by incorporating domain-specific knowledge about expected behaviors and physical constraints in the DL’s cost function or in the optimization algorithm.

III-B5 ScIDL-Based Scientific Knowledge Extraction

xG wireless networks are transitioning into highly nonlinear, multiscale, and multiphysics systems with the integration of new technologies such as aerial and space base stations (UAVs, satellites), integrated sensing and communications, multi-band massive MIMO communications, and adaptive multilayer beamforming, to name a few. Our current scientific understanding of these systems is limited, making it crucial to enhance our scientific knowledge. We anticipate that ScIDL frameworks will emerge as a quintessential tool for advancing scientific discovery in future research. Specifically, as discussed previously, the interaction between DL and scientific knowledge is mutual in ScIDL. DL can utilize vast amounts of sensory data to analyze and enhance our scientific understanding through data-driven techniques, and in turn, this improved knowledge can be utilized to refine the DL methods. Additionally, as described earlier, DT technology can play a pivotal role in facilitating this process.

IV Conclusion

Conventional scientific methods, including physics-based models and traditional optimization algorithms, have effectively driven the evolution of wireless communication from 1G to 5G. However, these approaches are inadequate for the next generation of wireless communication because of the ever-increasing computational complexity and scaling challenges. On the other hand, while DL methods have been suggested to tackle the challenges in 5G and 6G wireless communication, they encounter significant limitations. These limitations include the lack of interpretability, physics consistency, and theoretical guidance in designing learning algorithms, as well as limited generalizability. In this article, we envision that the emerging field of ScIDL will play a pivotal role in tackling current and future challenges in wireless communication. ScIDL aims to leverage centuries of scientific knowledge to overcome the limitations of conventional DL. In ScIDL, scientific knowledge acts as a teacher for DL, improving the performance of DL algorithms. Moreover, DL and data-driven methods can provide valuable tools for enhancing our scientific knowledge and increasing the accuracy and applicability of physics-based models in wireless communications. In this article, we have provided a concise tutorial on ScIDL. Furthermore, we have provided a roadmap for researchers that outlines the challenges, implementation guidelines, and future research directions for applying ScIDL in wireless communications.

References

[1] A. Karpatne, G. Atluri, J. H. Faghmous, M. Steinbach, A. Banerjee, A. Ganguly, S. Shekhar, N. Samatova, and V. Kumar, “Theory-guided data science: A new paradigm for scientific discovery from data,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 10, pp. 2318–2331, 2017.
[2] B. Moseley, Physics-informed machine learning: from concepts to real-world applications. PhD thesis, University of Oxford, 2022.
[3] M. Akrout, A. Feriani, F. Bellili, A. Mezghani, and E. Hossain, “Domain generalization in machine learning models for wireless communications: Concepts, state-of-the-art, and open issues,” IEEE Communications Surveys & Tutorials, 2023.
[4] N. Baker, F. Alexander, T. Bremer, A. Hagberg, Y. Kevrekidis, H. Najm, M. Parashar, A. Patra, J. Sethian, S. Wild, et al., “Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence,” tech. rep., USDOE Office of Science (SC), Washington, DC (United States), 2019.
[5] J. Willard, X. Jia, S. Xu, M. Steinbach, and V. Kumar, “Integrating scientific knowledge with machine learning for engineering and environmental systems,” ACM Computing Surveys, vol. 55, no. 4, pp. 1–37, 2022.
[6] L. Von Rueden, S. Mayer, K. Beckh, B. Georgiev, S. Giesselbach, R. Heese, B. Kirsch, J. Pfrommer, A. Pick, R. Ramamurthy, et al., “Informed machine learning–a taxonomy and survey of integrating prior knowledge into learning systems,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 1, pp. 614–633, 2021.
[7] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics, vol. 3, no. 6, pp. 422–440, 2021.
[8] A. Karpatne, W. Watkins, J. Read, and V. Kumar, “Physics-guided neural networks (pgnn): An application in lake temperature modeling,” arXiv preprint arXiv:1710.11431, vol. 2, 2017.
[9] E. Kharazmi, Z. Zhang, and G. E. Karniadakis, “Variational physics-informed neural networks for solving partial differential equations,” arXiv preprint arXiv:1912.00873, 2019.
[10] “The cambridge dictionary of philosophy.” https://dictionary.cambridge.org/. Accessed: 2024-06-27.
[11] N. Cercone and G. McCalla, “What is knowledge representation?,” in The Knowledge Frontier: Essays in the Representation of Knowledge, pp. 1–43, Springer, 1987.
[12] J.-L. Wu, K. Kashinath, A. Albert, D. Chirila, H. Xiao, et al., “Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems,” Journal of Computational Physics, vol. 406, p. 109209, 2020.
[13] X. Jia, J. Willard, A. Karpatne, J. S. Read, J. A. Zwart, M. Steinbach, and V. Kumar, “Physics-guided machine learning for scientific discovery: An application in simulating lake temperature profiles,” ACM/IMS Transactions on Data Science, vol. 2, no. 3, pp. 1–26, 2021.
[14] S. He, S. Xiong, Y. Ou, J. Zhang, J. Wang, Y. Huang, and Y. Zhang, “An overview on the application of graph neural networks in wireless networks,” IEEE Open Journal of the Communications Society, vol. 2, pp. 2547–2565, 2021.
[15] S. Zhang, B. Yin, W. Zhang, and Y. Cheng, “Topology aware deep learning for wireless network optimization,” IEEE Transactions on Wireless Communications, vol. 21, no. 11, pp. 9791–9805, 2022.
[16] L. He, Z. Wang, S. Yang, T. Liu, and Y. Huang, “Generalizing projected gradient descent for deep-learning-aided massive mimo detection,” IEEE Transactions on Wireless Communications, 2023.
[17] L. Pellaco, M. Bengtsson, and J. Jaldén, “Matrix-inverse-free deep unfolding of the weighted mmse beamforming algorithm,” IEEE Open Journal of the Communications Society, vol. 3, pp. 65–81, 2021.
[18] A. Daw, R. Q. Thomas, C. C. Carey, J. S. Read, A. P. Appling, and A. Karpatne, “Physics-guided architecture (pga) of neural networks for quantifying uncertainty in lake temperature modeling,” in Proceedings of the 2020 SIAM International Conference on Data Mining, pp. 532–540, SIAM, 2020.
[19] R. Wang, R. Walters, and R. Yu, “Incorporating symmetry into deep dynamics models for improved generalization,” International Conference on Learning Representations (ICLR), 2021.
[20] M. Raissi and G. E. Karniadakis, “Hidden physics models: Machine learning of nonlinear partial differential equations,” Journal of Computational Physics, vol. 357, pp. 125–141, 2018.
[21] A. D. Jagtap, K. Kawaguchi, and G. Em Karniadakis, “Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks,” Proceedings of the Royal Society A, vol. 476, no. 2239, p. 20200334, 2020.
[22] B. Chang, M. Chen, E. Haber, and E. H. Chi, “Antisymmetricrnn: A dynamical system view on recurrent neural networks,” arXiv preprint arXiv:1902.09689, 2019.
[23] J. Xu, Z. Zhang, T. Friedman, Y. Liang, and G. Broeck, “A semantic loss function for deep learning with symbolic knowledge,” in International Conference on Machine Learning, pp. 5502–5511, PMLR, 2018.
[24] R. Stewart and S. Ermon, “Label-free supervision of neural networks with physics and domain knowledge,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017.
[25] Z. Li, H. Zheng, N. Kovachki, D. Jin, H. Chen, B. Liu, K. Azizzadenesheli, and A. Anandkumar, “Physics-informed neural operator for learning partial differential equations,” ACM/IMS Journal of Data Science, vol. 1, no. 3, pp. 1–27, 2024.
[26] Z. Zhuang, J. Wang, Q. Qi, J. Liao, and Z. Han, “Adaptive and robust routing with lyapunov-based deep rl in mec networks enabled by blockchains,” IEEE Internet of Things Journal, vol. 8, no. 4, pp. 2208–2225, 2020.
[27] M. Han, Y. Tian, L. Zhang, J. Wang, and W. Pan, “Reinforcement learning control of constrained dynamic systems with uniformly ultimate boundedness stability guarantee,” Automatica, vol. 129, p. 109689, 2021.
[28] A. Termehchi, A. Syed, W. S. Kennedy, and M. Erol-Kantarci, “Distributed safe multi-agent reinforcement learning: Joint design of thz-enabled uav trajectory and channel allocation,” IEEE Transactions on Vehicular Technology, Early access.
[29] X. Li, Y. Liu, and Z. Liu, “Physics-informed neural network based on a new adaptive gradient descent algorithm for solving partial differential equations of flow problems,” Physics of Fluids, vol. 35, no. 6, 2023.
[30] L. Pellaco, Machine learning for wireless communications: Hybrid data-driven and model-based approaches. PhD thesis, KTH Royal Institute of Technology, 2022.