Optimal transport with constraints: from mirror descent to classical mechanics
Abstract
Finding optimal trajectories for multiple traffic demands in a congested network is a challenging task. Optimal transport theory is a principled approach that has been used successfully to study various transportation problems. Its usage is limited by the lack of principled and flexible ways to incorporate realistic constraints. We propose a principled physics-based approach to impose constraints flexibly in optimal transport problems. Constraints are included in mirror descent dynamics using the principle of D’Alembert-Lagrange from classical mechanics. This results in a sparse, local and linear approximation of the feasible set leading in many cases to closed-form updates.
Introduction.
Optimal transport in networks has important applications in different disciplines, in particular in urban transportation networks Arnott and Small (1994). Congestion not only increases travel time for users and decreases productivity, but it also drives air pollution. Reducing congestion and making transportation more efficient are also a core objective for EU policies, as highlighted throughout the EU Transport White Paper and the Strategic Plan 2020-2024 Commission (2011, 2020).
The design of efficient transportation networks is a complex task that requires a multifaceted solution. One of these facets is the problem of finding optimal routes for passengers. This is a well-studied problem in operations research Ahuja et al. (1988) where minimum-cost optimization is often considered to model discrete flows and can be solved using classical techniques from linear programming. In our work, we consider the continuous case, where flows are real-valued quantities. A variety of approaches have been suggested to model transport in networks using techniques from physics of complex systems Morris and Barthelemy (2012); Gao et al. (2019). Path optimality and congestion control have been studied in discrete settings Noh and Rieger (2002); Dobrin and Duxbury (2001); Bayati et al. (2008) or using the cavity method Yeung and Saad (2012); Yeung et al. (2013). These usually rely on ad-hoc algorithmic updates that depend on the specific type of constraints. The computational complexity of the ad-hoc updates is greatly influenced by the constraints. Other approaches have been proposed to investigate navigation in complex systems Solé-Ribalta et al. (2016); Gómez-Gardenes and Latora (2008); Lacasa et al. (2009); Sneppen et al. (2005); Rosvall et al. (2005); Zhao et al. (2005); Estrada et al. (2023), where the focus lies on investigating the properties of flows, rather than their optimization, as we consider here. In addition, these models often assume that passengers follow their shortest paths, an assumption, which may not be satisfied in practice. Adaptation dynamics Tero et al. (2010); Hu and Cai (2013); Ronellenfitsch and Katifori (2016) have been proposed to model biological distribution networks. However, these methods fall short of describing realistic scenarios where transport flows are limited by constraints.
In the following we cast the problem of designing efficient transportation networks under the broader framework of optimal transport theory (OT) Santambrogio (2015). This has been used to model and optimize various aspects of transport networks such as network design Tero et al. (2010); Ronellenfitsch and Katifori (2016); Baptista et al. (2020); Leite and De Bacco (2022) and traffic flows Bonifaci et al. (2012); Lonardi et al. (2021); Bohn and Magnasco (2007); Ibrahim et al. (2021); Lonardi et al. (2023a). These approaches guarantee a principled and computationally efficient way of solving transportation problems on networks. In addition, they model traffic congestion with a single tuning parameter that enables a transition between opposite traffic regimes, where traffic congestion can either be consolidated or discouraged. In standard OT methods, beyond few obvious constraints (e.g. conservation of mass), the amount of flow passing through an edge of the transportation network is unconstrained. As a result, traffic tends to concentrate on path trajectories that may be structurally unfeasible, which severely limits the applicability of OT models in real-world situations, where, for example, roads have a limited capacity of vehicles traveling at the same time. This letter proposes an approach to avoid this crucial flaw of OT models by imposing constraints. Applying this approach significantly impacts the overall network topology induced by the optimal flows, as the resulting path trajectories have different path lengths and traffic distribution than those obtained from unconstrained scenarios.
Our approach has not only a solid foundation via the principle of D’Alembert-Lagrange from classical mechanics Lanczos (1949), but also leads to algorithms that are computationally efficient and have a low implementation complexity. The key idea is to consider mirror descent dynamics of an OT problem, where constraints are included on a velocity level. This leads to a sparse, local and linear approximation of the feasible set which, in many cases, allows for a closed-form update rule, even in situations where the feasible set is nonconvex.
The model.
In analogy with electrical grids or hydraulic networks, we model mass flow on a transportation network using conductivities and flows on network edges. We consider a multi-commodity scenario Lonardi et al. (2021); Bonifaci et al. (2022), where mass of different type can move along different trajectories. The flow of mass of type along an edge can be described by where is a pressure potential at node for passenger of type , is the length of the edge and its conductivity. This latter quantity can be seen as proportional to the size of an edge, and is the main variable of interest in determining optimal trajectories. Once the conductivity is known, the pressure differences can then be calculated from Kirchhoff’s law, which in turns determines the flows , see Supporting Material (SM) sup . In the absence of constraints, the optimal conductivities are the stationary solutions of the dynamics , where
(1) |
with and denotes the Euclidean norm. Intuitively, this equation describes a positive feedback mechanism where conductivities increase for larger fluxes and decrease for negligible ones Tero et al. (2010). It can be shown that the dynamics in Eq. 1 admits a Lyapunov function which can be interpreted as a combination of the cost to operate the network and that of building the infrastructure Lonardi et al. (2021), see SM sup . Moreover, we have that , where is a diagonal matrix with diagonal entries and Eq. 1 can therefore be seen as a mirror descent for the cost function Bonifaci (2021). This scaling in has the advantage of ensuring good behavior of the resulting numerical methods. One can also reinterpret Eq. 1 as a classical gradient descent by applying a suitable transformation Facca and Benzi (2021), we do not explore this here.
Variants of these dynamics have been proposed to model distributions over networks Hu and Cai (2013); Bohn and Magnasco (2007); Katifori et al. (2010); Banavar et al. (2000); Ronellenfitsch and Katifori (2016). The constant regulates the desired transportation regime. The setting penalizes traffic congestion by distributing paths on more edges, encourages path consolidation into fewer highways, and is shortest path-like.
In addition to imposing Kirchhoff’s law on nodes to ensure mass conservation, solving these dynamics outputs otherwise unconstrained optimal and (see SM sup ). While this may be enough in ideal cases, in more realistic scenarios it is important to further constrain the solution. For instance, structural constraints may limit the maximum amount of flow that an edge can carry, or a budget constraint may be used to limit the infrastructure cost for building the network. Hence, the dynamics must be altered to account for these additional constraints.
There are many ways in which constraints can be added. A popular approach is to add constraints on a so-called position level, which leads to gradient inclusions in continuous time (Aubin and Cellina, 1984, Ch 3.4), and projected gradient descent in discrete time. Unfortunately, the scope of projected gradients is limited, due to the fact that projections can only be efficiently evaluated for constraints that have a particular structure (such as a low-dimensional hyperplane, the probability simplex, or a Euclidean norm ball). When the feasible set is nonconvex and/or fails to have a simple structure, evaluating projections is a computationally daunting task. This motivates our formulation (see also Muehlebach and Jordan (2022)), which includes constraints on a velocity level and yields a sparse local and linear approximation of the feasible set. As a consequence, the updates for can often still be evaluated in closed-form (or there is an efficient way of computing them numerically) even though the underlying feasible set is nonconvex or fails to have a simple structure. We will highlight explicit examples of such situations in the remainder of this letter.
We define as the set of feasible conductivities , with a constraint function that we assume continuously differentiable and is the number of network edges. Interpreting as a “position” variable we can equivalently express the constraints in in terms of a “velocity” variable by imposing , where is the set of feasible velocities and is a constant typically referred to as a “restitution” parameter or “slackness”, see Appendix for details.
For and an active constraint , the constraint is equivalent to , which ensures that potential constraint violations decay at the rate . The situation is visualized in Fig. 1(A).
In order to account for the velocity constraint we augment the dynamics with a reaction force that forces the solution to remain within the desired constraints:
(2) |
where denotes the normal cone of the set at . Due to the scaling of the gradient with , the normal cone is defined with respect to the inner product , where are arbitrary vectors. This has the important effect of guaranteeing that (of the unconstrained dynamics) is still a Lyapunov function also in the constrained setting and that is monotonically decreasing along the trajectories of Eq. 2. A detailed derivation is included in SM sup .
The addition of ensures that even if pushes away from , as shown in Fig. 1(B), the force , which is orthogonal to the set , annihilates the component of that would lead to a constraint violation and ensures that . As discussed above, we can therefore conclude that for all and for .
In addition, we infer from Fig. 1 that the resulting in Eq. 2 is nothing but the projection of onto the set and as a result, we can rewrite in the following way:
(3) |
which can also be equivalently reformulated as the quadratic program (QP)
(4) |
This reformulation is not only useful for numerical computations, but also highlights that the velocity is chosen, at each point in time, to match the unconstrained . Fig. 1(A) visualizes the set and the set of feasible velocities and at points and , respectively. Point lies on the boundary of , while is infeasible. We note that the cone includes an offset, which is controlled by the restitution parameter ; this ensures that any leads to a decrease in constraint violation. Fig. 1 (B) shows that when the vector field is pushing away from , a force is added to the dynamics. The force annihilates the component of that would lead to a constraint violation and ensures , where is chosen as close as possible to . This can also be interpreted as Gauss’s principle of least constraint. It is important to note that is a polyhedral set that only includes the constraints , a subset of the original constraints . The set represents therefore a sparse, local and linear approximation of the feasible set. The solution of Eq. 3 can then be used to update the conductivity with a discrete-time algorithm:
(5) |
where is the step size.
This general formalism can be applied to a variety of scenarios, provided one can compute , which determines the set . We can then solve Eq. 4 by using numerical solvers tailored to QP, which then yields the update Eq. 5. Additional details about the computational complexity for solving Eq. 5 are described in SM sup . However, in important special cases, the optimization Eq. 5 can be solved in closed-form, as we illustrate below with three relevant examples.
Capacity constraints.
In cases of structural constraints that strictly limit the amount of mass that can travel along any given edge, one can consider capacities on edges and set constraints as . The velocity constraint in Eq. 3 reads as , for , which is strictly negative, since (SM sup ). As previously discussed, is a restitution parameter that dictates the rate at which constraint violations decay. In discrete time, one should choose such that to guarantee convergence (see Muehlebach and Jordan (2022)). We can then solve Eq. 3 in closed-form for edges violating the constraint obtaining . In summary, for each edge , we have:
(6) |
Fig. 2 shows the path topologies with capacity constraints on synthetic data, compared against the unconstrained case. We generate random planar networks as the Delaunay triangulation (Guibas and Stolfi, 1985) of points in the plane. We measure the Gini coefficient calculated on the traffic on edges, defined as the -dimensional vector with entries , where is the number of passengers. The coefficient has value in and it determines how traffic is distributed along network edges, with meaning equally-balanced or highly unbalanced traffic on few edges, respectively. The choice of the edge capacity influences this value, with lower imposing stricter constraint and thus encouraging traffic to distribute more equally along the edge, i.e. lower Gini, as shown in Fig. 2(A). Conversely, this implies longer routes for passengers, as measured by an increasing average total path length compared to the unconstrained solution, as shown in Fig. 2(B).
Budget constraint.
As a second example, we consider a global constraint that involves all the edges at once, a budget constraint . This is relevant when a network manager has a fixed limited amount of resources to invest. We note that, while the Lyapunov function contains a similar budget term–the cost to build the infrastracture–this cost is not regarded as a constraint in standard approaches Hu and Cai (2013); Lonardi et al. (2021) but as part of the energy consumption, and the budget is not a Lagrange multiplier but a measurable constant. Furthermore, unlike the previous case where including a positivity constraint is optional (but it can in principle be imposed as well, see SM sup ), here we need to include that explicitly. In the standard OT formalism positivity is ensured, provided is initialized as a positive quantity. Adding constraint may not preserve positivity anymore during the updates, this is the case for the budget constraint, as we observed empirically. Positivity is enforced by adding , i.e. .
In this budget constraint setting, the conductivities violate the constraint whenever . We derive a closed-form solution as: , if , and otherwise, where , a Lagrange multiplier for the budget constraint, can be numerically determined via fixed-point iteration (SM sup ).
Combining linear and non-linear constraints.
All the previous examples considered linear constraints, where it is simple to derive analytical solutions. In general, constraints can be more complicated and thus require numerical methods to solve the constrained QP in Eq. 3. In this scenario, we consider a non-linear budget constraint of the form: , where is a nonlinearity parameter. Setting gives a linear budget constraint as the one discussed earlier. A non-linear example is a volume-preserving constraint where , this is relevant for biological processes such as leaf venation and vascular systems Takamatsu et al. (2017); Ronellenfitsch and Katifori (2016). This non-linear budget induces the velocity constraint . In addition, we also consider a capacity constraint as in the first scenario studied above. Overall, three functions are required: i) to impose non-linear budget constraint; ii) to impose edge capacity and iii) to ensure positivity. We derive the closed-form solution as
(7) |
where and . The value of can be determined numerically using fixed-point iteration (SM sup ). The value ensures there is no violation on the edge capacity, imposes positivity constraint and captures budget violation. Overall, this scenario ensures that the velocity has an upper bound of and lower bound of . The choice of impacts the topological properties of the resulting network, e.g., the total path length. In the numerical experiments, we set the nonlinearity parameter as .
network.
We examine the topology of various constrained solutions on the road network of the city of Kujala et al. (2018), see Fig. 3(A). This has 640 nodes and 740 edges. As a relevant example, we set the central bus station as the destination node and select the remaining nodes as origins, but our method still applies to other choices of origin-destination pairs, e.g. peripheral nodes connecting to other peripheral nodes or to various hubs. This can be specified inside Kirchhoff’s law, see SM sup .
Routes generated from the non-linear constraint scenario balance traffic more than the unconstrained case and result in longer routes, see Fig. 3(B-C). Adding a budget constraint for results in more distributed traffic (lower Gini) without increasing much the total path length, compared to the unconstrained case. This could be used for instance to allocate to roads infrastructural works aimed at maintenance or upgrade when having a restricted budget.
Discussion.
Distributing flows in a transportation network is challenging. Approaches based on optimal transport theory are promising, but they are limited by the lack of a mechanism to incorporate realistic constraints. We show how to impose arbitrary constraints on OT problems in a principled and flexible way. The constraints are lifted from a position to a velocity level and are included in the corresponding mirror descent dynamics. This results in a scalable algorithm that solves constrained OT problems in a computationally efficient manner. The algorithm relies on a sparse local approximation of the feasible set at each iteration. Thus, closed-form updates can often be derived, even if the underlying feasible set is nonconvex or nonlinear. Otherwise, one can resort to efficient numerical methods to solve at most a quadratic program. Our physics-based approach is a change of paradigm with regard to how OT problems are modelled and solved numerically. This calls for a generalization of transportation problems in wider scenarios, e.g. in networks with multiple transport modes Ibrahim et al. (2021), with real-time traffic demands Lonardi et al. (2023b) or with noise-induced resonances Folz et al. (2023).
We provide an open source implementation McOpt (2023).
Acknowledgements.
Acknowledgments: The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting AAI. MM thanks the German Research Foundation and the Branco Weiss Fellowship, administered by ETH Zurich, for the support.Appendix A Appendix
Details about setting the constraints.
We define as the set of feasible conductivities , with a constraint function that we assume continuously differentiable and is the number of network edges. We focus on those edges where constraints are not satisfied, and denote the set of active constraints for a given as . Interpreting as a “position” variable, a constraint to ensure , can be equivalently formulated as a constraint on its velocity , with , where denotes the tangent cone of the feasible set at , see Rockafellar and Wets (1998). However, it will be convenient to slightly extend the notion of tangent cone to also account for infeasible initial conditions (this is particularly important for the discretization), which is achieved by imposing , where , and is a constant typically referred to as a “restitution” parameter or “slackness”. We note that generalizes the notion of the tangent cone, since for , . We assume mild regularity conditions (constraint qualification). A sufficient condition is, for example, the existence of such that for all .
For , the constraint is equivalent to , , which ensures that potential constraint violations decay at the rate .
Details about our method.
From a variational optimization perspective, our approach is related to successive linear and sequential quadratic programming Nocedal and Wright (2006); Bertsekas (1999); Luenberger and Ye (2016). The underlying idea of these methods is to linearize the objective function and the constraints about the current iterate and to solve a local linear and/or quadratic program. Our work improves upon these ideas and tailors these to optimal transport problems in the following way: i) we linearize a subset of constraints at every iteration, which means that the subproblem Eq. 3 typically includes very few constraints and can be solved efficiently; ii) we introduce a non-Euclidean inner product that is adapted to optimal transport problems and is used to show that is a Lyapunov function; iii) we provide closed-form updates in various problem instances that are practically relevant.
References
- Arnott and Small (1994) R. Arnott and K. Small, American scientist 82, 446 (1994).
- Commission (2011) E. Commission, “White paper of 28 march 2011: “roadmap to a single european transport area—towards a competitive and resource efficient transport system”,” (2011).
- Commission (2020) E. Commission, “Strategic plan 2020-2024,” (2020).
- Ahuja et al. (1988) R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, “Network flows. alfred p. sloan school of management,” (1988).
- Morris and Barthelemy (2012) R. G. Morris and M. Barthelemy, Physical review letters 109, 128703 (2012).
- Gao et al. (2019) L. Gao, P. Shu, M. Tang, W. Wang, and H. Gao, Physical Review E 100, 012310 (2019).
- Noh and Rieger (2002) J. D. Noh and H. Rieger, Physical Review E 66, 066127 (2002).
- Dobrin and Duxbury (2001) R. Dobrin and P. Duxbury, Physical Review Letters 86, 5076 (2001).
- Bayati et al. (2008) M. Bayati, C. Borgs, A. Braunstein, J. Chayes, A. Ramezanpour, and R. Zecchina, Physical review letters 101, 037208 (2008).
- Yeung and Saad (2012) C. H. Yeung and D. Saad, Physical review letters 108, 208701 (2012).
- Yeung et al. (2013) C. H. Yeung, D. Saad, and K. M. Wong, Proceedings of the National Academy of Sciences 110, 13717 (2013).
- Solé-Ribalta et al. (2016) A. Solé-Ribalta, S. Gómez, and A. Arenas, Physical review letters 116, 108701 (2016).
- Gómez-Gardenes and Latora (2008) J. Gómez-Gardenes and V. Latora, Physical Review E 78, 065102 (2008).
- Lacasa et al. (2009) L. Lacasa, M. Cea, and M. Zanin, Physica A: Statistical Mechanics and its Applications 388, 3948 (2009).
- Sneppen et al. (2005) K. Sneppen, A. Trusina, and M. Rosvall, Europhysics Letters 69, 853 (2005).
- Rosvall et al. (2005) M. Rosvall, A. Grönlund, P. Minnhagen, and K. Sneppen, Physical Review E 72, 046117 (2005).
- Zhao et al. (2005) L. Zhao, Y.-C. Lai, K. Park, and N. Ye, Physical Review E 71, 026125 (2005).
- Estrada et al. (2023) E. Estrada, J. Gómez-Gardeñes, and L. Lacasa, Proceedings of the National Academy of Sciences 120, e2305001120 (2023).
- Tero et al. (2010) A. Tero, S. Takagi, T. Saigusa, K. Ito, D. P. Bebber, M. D. Fricker, K. Yumiki, R. Kobayashi, and T. Nakagaki, Science 327, 439 (2010).
- Hu and Cai (2013) D. Hu and D. Cai, Physical review letters 111, 138701 (2013).
- Ronellenfitsch and Katifori (2016) H. Ronellenfitsch and E. Katifori, Physical review letters 117, 138301 (2016).
- Santambrogio (2015) F. Santambrogio, Birkäuser, NY 55, 94 (2015).
- Baptista et al. (2020) D. Baptista, D. Leite, E. Facca, M. Putti, and C. De Bacco, Scientific reports 10, 20806 (2020).
- Leite and De Bacco (2022) D. Leite and C. De Bacco, arXiv preprint arXiv:2209.06751 (2022).
- Bonifaci et al. (2012) V. Bonifaci, K. Mehlhorn, and G. Varma, Journal of theoretical biology 309, 121 (2012).
- Lonardi et al. (2021) A. Lonardi, E. Facca, M. Putti, and C. De Bacco, Physical Review Research 3, 043010 (2021).
- Bohn and Magnasco (2007) S. Bohn and M. O. Magnasco, Physical review letters 98, 088702 (2007).
- Ibrahim et al. (2021) A. A. Ibrahim, A. Lonardi, and C. D. Bacco, Algorithms 14, 189 (2021).
- Lonardi et al. (2023a) A. Lonardi, D. Baptista, and C. De Bacco, Frontiers in Physics 11, 65 (2023a).
- Lanczos (1949) C. Lanczos, The Variational Principles of Mechanics (University of Toronto Press, 1949).
- Bonifaci et al. (2022) V. Bonifaci, E. Facca, F. Folz, A. Karrenbauer, P. Kolev, K. Mehlhorn, G. Morigi, G. Shahkarami, and Q. Vermande, Theoretical Computer Science 920, 1 (2022).
- (32) See Supporting Material, which includes Refs. [27-31], for additional information about the derivations and a detailed discussion of the numerical simulations .
- Ibrahim et al. (2022) A. A. Ibrahim, D. Leite, and C. De Bacco, Physical Review E 105, 064302 (2022).
- Davis (2004) T. A. Davis, ACM Trans. Math. Softw. 30, 196–199 (2004).
- Briggs et al. (2000) W. L. Briggs, V. E. Henson, and S. F. McCormick, A multigrid tutorial (SIAM, 2000).
- Nesterov (2004) Y. Nesterov, Introductory lectures on convex optimization: A basic course (Springer Science & Business Media LLC, 2004).
- Bonifaci (2021) V. Bonifaci, Computational Optimization and Applications 79, 441 (2021).
- Facca and Benzi (2021) E. Facca and M. Benzi, SIAM Journal on Scientific Computing 43, A2295 (2021).
- Katifori et al. (2010) E. Katifori, G. J. Szöllősi, and M. O. Magnasco, Physical review letters 104, 048704 (2010).
- Banavar et al. (2000) J. R. Banavar, F. Colaiori, A. Flammini, A. Maritan, and A. Rinaldo, Physical Review Letters 84, 4745 (2000).
- Aubin and Cellina (1984) J. P. Aubin and A. Cellina, Differential Inclusions (Springer, 1984).
- Muehlebach and Jordan (2022) M. Muehlebach and M. I. Jordan, Journal of Machine Learning Research 23, 1 (2022).
- Guibas and Stolfi (1985) L. Guibas and J. Stolfi, ACM transactions on graphics (TOG) 4, 74 (1985).
- Takamatsu et al. (2017) A. Takamatsu, T. Gomi, T. Endo, T. Hirai, and T. Sasaki, Journal of Physics D: Applied Physics 50, 154003 (2017).
- Kujala et al. (2018) R. Kujala, C. Weckström, R. K. Darst, M. N. Mladenović, and J. Saramäki, Scientific data 5, 1 (2018).
- Lonardi et al. (2023b) A. Lonardi, E. Facca, M. Putti, and C. De Bacco, Physical Review E 107, 024302 (2023b).
- Folz et al. (2023) F. Folz, K. Mehlhorn, and G. Morigi, Physical Review Letters 130, 267401 (2023).
- McOpt (2023) McOpt, https://github.com/aleable/McOpt (2023).
- Rockafellar and Wets (1998) R. T. Rockafellar and R. J.-B. Wets, Variational Analysis (Springer Verlag Berlin-Heidelberg, 1998).
- Nocedal and Wright (2006) J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. (Springer, 2006).
- Bertsekas (1999) D. P. Bertsekas, Nonlinear Programming, 2nd ed. (Athena Scientific, 1999).
- Luenberger and Ye (2016) D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 4th ed. (Springer, 2016).