Characterization of the Distortion-Perception Tradeoff for Finite Channels with Arbitrary Metrics
Abstract
When inspected by humans, reconstructed signals should not be distinguishable from real ones. Typically, such high perceptual quality comes at the price of high reconstruction error, and vice versa. We study this distortion-perception (DP) tradeoff over finite-alphabet channels, for the Wasserstein-1 distance induced by a general metric as the perception index, and an arbitrary distortion matrix. Under this setting, we show that computing the DP function and the optimal reconstructions is equivalent to solving a set of linear programming problems. We provide a structural characterization of the DP tradeoff, where the DP function is piecewise linear in the perception index. We further derive a closed-form expression for the case of binary sources.
I Introduction
The reconstruction of a signal from degraded data is required in numerous settings across science and engineering. Until recently, the performance of reconstruction algorithms has been measured by their mean distortion, such as mean squared error (MSE). For that reason, many methods aimed to optimize distortion-based measures such as MSE and peak signal-to-noise ratio (PSNR). However, in systems whose outputs are inspected by human users, reconstructions should not be easily distinguished from signals typical of the source domain. Therefore, many current works target perceptual quality rather than distortion (e.g. in image restoration, see [1, 2, 3, 4]).
Mathematically, the optimal probability of success in a hypothesis test between two distributions is known to increase linearly with the Total-Variation (TV) distance between them [5]. Hence, high perceptual quality is considered to be achieved when the distribution of restored signals is close to the distribution of real signals [6].
Good perceptual quality generally comes at the price of high reconstruction error and vice versa. This leads to a tradeoff between distortion and perception, first studied in [6]. The central problem is thus to quantify the distortion-perception (DP) function, which is the minimal distortion achievable for a given level of perceptual quality. The DP problem was studied by various authors. Specifically, [7] studied the DP function in real spaces, for the MSE distortion and the Wasserstein-2 perception index. In discrete spaces, [8] characterized the special case of a binary source, for the Hamming distortion and the TV perception index.
In this paper, we focus on discrete spaces, and investigate the DP tradeoff for general finite-alphabet channels and general distortion matrices. As the perception index, we consider the Wasserstein-1 distance induced by a general metric, which generalizes the TV distance [9, 10, 11]. We show that finding the DP function and the optimal reconstruction for this setting is equivalent to solving a set of linear programs, and that the result is always a piecewise linear function of the perception index, regardless of the channel size, the underlying distributions or the distortion measure. This stems from the properties of the dual feasible set. We further revisit the binary setting of [8], and derive a closed-form expression for the DP function, now considering a general distortion measure. We provide a self-contained proof for this case based on our novel analysis of the general setting.
II Preliminaries
II-A The distortion-perception tradeoff
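Let $X, Y$ be random variables taking values in some complete separable metric spaces $\mathcal{X}, \mathcal{Y}$, respectively. We assume the existence of the joint probability $p_{X,Y}$ on $\mathcal{X}\times\mathcal{Y}$, and a Borel lower-bounded distortion function $d:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$. An estimator $\hat{X}$ is a random variable on $\mathcal{X}$, defined by its distribution conditioned on the measurement $Y$, $p_{\hat{X}|Y}$, with marginal distribution $p_{\hat{X}}$.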
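An optimal estimator for the DP tradeoff is an estimator $\hat{X}$ that minimizes the expected distortion $\mathbb{E}[d(X,\hat{X})]$ under the perception constraint $d_p(p_X, p_{\hat{X}}) \le P$. Here, $d_p$ is a divergence between probability measures. [6] introduced the DP function
$$D(P) = \min_{p_{\hat{X}|Y}} \left\{ \mathbb{E}\left[d(X,\hat{X})\right] \,:\, d_p(p_X, p_{\hat{X}}) \le P \right\}. \tag{1}$$
The expectation is taken w.r.t. the probability measure induced by $p_{X,Y}$ and $p_{\hat{X}|Y}$, where we assume that $X, \hat{X}$ are independent given $Y$. We have the following result of [6, Thm. 2].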
Theorem 1.
(The perception-distortion tradeoff). If $d_p(p, q)$ is convex in its second argument, then the distortion-perception function (1) is monotonically non-increasing and convex.
This is the case for the TV and Wasserstein distances discussed in this paper.
II-B Related work
Apart from the general properties given by Theorem 1, the precise nature of DP functions depends on the exact setup. [7] fully characterized this function in real spaces, considering the MSE and Wasserstein-2 indices. Under this setting, the distortion-perception function is always quadratic, and possesses a closed-form expression for Gaussian channels. In this work we discuss discrete signals, where we provide an analogous structural characterization, in which the DP function is always piecewise linear (Theorem 6).
Reconstruction problems with constrained output distributions were studied in optimal transport [12] and in lossy compression and quantization [13, 14]. Recently, [15] investigated the cost of perfect perceptual consistency constraints in online estimation settings. The DP tradeoff was also extended to lossy compression by introducing the rate-distortion-perception (RDP) function [16, 17, 18], which is the minimal rate of a code whose decoding allows a desired tradeoff between reconstruction and perceptual quality. A coding theorem was established for this setting [19, 20], and the properties of optimal codes were investigated in [21].
In the context of RDP theory, [8] investigated channels with binary sources. They showed [8, Thm. 7] that for the Hamming loss with the TV perceptual index, the DP function is a piecewise linear function whose breakpoints are given by an explicit formula. Here, we extend this result, considering an arbitrary distortion measure (Theorem 9).
III Problem formulation
In this paper, we discuss the discrete case, where and are finite spaces. Let be discrete variables defined on finite alphabets , where is the variable of interest, and is a measurement of over a noisy channel. Their joint probability is represented by the matrix , and the marginal distributions and are given by the vectors . We assume that for each letter in the channel’s output, (i.e., we ignore unused symbols). A randomized estimator of from is defined by a stochastic transition matrix whose entries are the probabilities to reconstruct the symbol given that the channel output is . We assume the Markov relation where are independent given . The arbitrary distortion matrix is given by , where the expected distortion
(2) |
should be minimized w.r.t. . The marginal distribution of is given by the vector . We are interested in analyzing the distortion-perception (DP) function [6]
(3) |
For simplicity, let us first consider the TV distance as the perceptual index , given by
(4) |
Note that using this definition, , and iff . Now,
(5) | ||||
(8) |
where the Frobenius inner product , is the dimensional all-ones vector, and for the constraint is applied elementwise. We start by presenting some elementary properties of (5).
Proposition 2.
Let . The optimization problem (5) is feasible (namely, the constraints are satisfiable), and its optimal value is bounded from below.
Proof.
The posterior sampling solution is feasible for every , since , yielding . For every stochastic matrix Q,
(9) |
hence the optimal value is bounded. ∎
Proposition 3.
Denote the matrix , whose entries are given by . Then, for any ,
(10) |
A corresponding optimal estimator is given by
(11) |
Trivially, holds for every .
The proof is straightforward.
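As a small numerical illustration of Propositions 2 and 3, the sketch below builds a posterior sampling estimator and an unconstrained minimum-distortion estimator for a toy channel. The joint matrix `p_xy`, the Hamming distortion, the row convention `q[y][xh]` and all function names are illustrative assumptions rather than notation taken from the paper.

```python
# Toy joint distribution p_xy[x][y] and Hamming distortion (both assumed for illustration).
p_xy = [[0.3, 0.2],
        [0.1, 0.4]]
dist = [[0.0, 1.0],
        [1.0, 0.0]]

n_x, n_y = len(p_xy), len(p_xy[0])
p_x = [sum(p_xy[x]) for x in range(n_x)]
p_y = [sum(p_xy[x][y] for x in range(n_x)) for y in range(n_y)]


def expected_distortion(q):
    """E[d(X, Xhat)] for an estimator q[y][xh] = Pr(Xhat = xh | Y = y)."""
    return sum(p_xy[x][y] * q[y][xh] * dist[x][xh]
               for x in range(n_x) for y in range(n_y) for xh in range(n_x))


def output_marginal(q):
    """Marginal distribution of Xhat induced by q and p_y."""
    return [sum(p_y[y] * q[y][xh] for y in range(n_y)) for xh in range(n_x)]


# Posterior sampling: draw Xhat from p(x | y); its output marginal equals p_x.
q_post = [[p_xy[xh][y] / p_y[y] for xh in range(n_x)] for y in range(n_y)]

# Unconstrained estimator: for each y pick the symbol with the least posterior cost.
q_free = []
for y in range(n_y):
    cost = [sum(p_xy[x][y] * dist[x][xh] for x in range(n_x)) for xh in range(n_x)]
    best = cost.index(min(cost))
    q_free.append([1.0 if xh == best else 0.0 for xh in range(n_x)])

print(output_marginal(q_post), expected_distortion(q_post))
print(output_marginal(q_free), expected_distortion(q_free))
```

In this toy case posterior sampling reproduces the source marginal exactly but pays a higher distortion, while the unconstrained rule attains the smallest distortion with a shifted output marginal; the DP function lies between these two values.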
IV Linear Programming formulation
We now observe that the perceptual constraint in (5) is equivalent to the set of linear constraints
(12) |
Taking all possible sign combinations we attain linear constraints, where the constraints for which the signs are either all positive or all negative are redundant since for probability vectors and a stochastic matrix the LHS of (12) vanishes. Together with (5), we can reformulate the DP function as the following Linear Program (LP) [22, 23]
(13) | ||||
(17) |
In (13), we have variables (the entries of ), and constraints.
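The equivalence between the TV constraint and a family of signed linear inequalities can be checked numerically on small alphabets. The sketch below assumes that the TV distance is normalized as half the L1 norm (so that it lies in [0, 1]) and that each constraint has the form s·(p − q) ≤ 2P for a sign vector s; both are assumptions about the stripped equations, and the function names are hypothetical.

```python
from itertools import product


def tv_distance(p, q):
    """Total-variation distance, normalized here as half the L1 norm (an assumption)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_via_signs(p, q):
    """Half the largest signed sum s . (p - q) over all sign vectors s in {-1, +1}^n."""
    diff = [a - b for a, b in zip(p, q)]
    return 0.5 * max(sum(s * d for s, d in zip(signs, diff))
                     for signs in product((-1.0, 1.0), repeat=len(diff)))


def satisfies_sign_constraints(p, q, budget):
    """Check the 2^n linear inequalities s . (p - q) <= 2 * budget."""
    diff = [a - b for a, b in zip(p, q)]
    return all(sum(s * d for s, d in zip(signs, diff)) <= 2.0 * budget + 1e-12
               for signs in product((-1.0, 1.0), repeat=len(diff)))


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(tv_distance(p, q), tv_via_signs(p, q))
print(satisfies_sign_constraints(p, q, 0.30), satisfies_sign_constraints(p, q, 0.25))
```

The all-plus and all-minus sign vectors contribute a signed sum of zero for any two probability vectors, which matches the remark that those two constraints are redundant. The enumeration is exponential in the alphabet size and is meant only as a check, not as a practical solver.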
IV-A Total Variation as a Wasserstein distance
Let be the Hamming distance matrix, let be the set of probability measures on , and let be a coupling between and (parameterized by a matrix ). It is well known [10] that taking as a metric on , the TV distance coincides with the Wasserstein- distance on , namely
(18) | ||||
(19) |
where the minimum is attained [11, Lemma 3.4.1]. Wasserstein distances are convex metrics on [9]. Using (19), we can rewrite (13) as the linear problem
(20) | ||||
(28) |
where is a slack variable. The problem (20) possesses variables and only constraints, from which are independent.
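To make the identity between the TV distance and the Wasserstein-1 distance under the Hamming metric concrete, the following sketch builds one coupling that keeps as much mass as possible on the diagonal and checks that its Hamming transport cost equals half the L1 distance. The construction (often called a maximal coupling) and the names used are illustrative assumptions; the paper itself only states that the minimum is attained.

```python
def maximal_coupling(p, q):
    """Coupling of p and q that keeps as much mass as possible on the diagonal."""
    n = len(p)
    common = [min(a, b) for a, b in zip(p, q)]
    excess_p = [a - c for a, c in zip(p, common)]   # mass p must send elsewhere
    excess_q = [b - c for b, c in zip(q, common)]   # mass q must receive from elsewhere
    moved = sum(excess_p)                           # total off-diagonal mass
    pi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        pi[i][i] = common[i]
        for j in range(n):
            if i != j and moved > 0.0:
                pi[i][j] = excess_p[i] * excess_q[j] / moved
    return pi


def hamming_cost(pi):
    """Transport cost under the Hamming metric: total off-diagonal mass."""
    return sum(pi[i][j] for i in range(len(pi)) for j in range(len(pi)) if i != j)


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
pi = maximal_coupling(p, q)
rows = [sum(r) for r in pi]
cols = [sum(pi[i][j] for i in range(len(p))) for j in range(len(p))]
tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
print(rows, cols, hamming_cost(pi), tv)
```

Any coupling must move at least the mass that the two distributions do not share, so a coupling whose cost equals the TV distance is optimal for the Hamming metric.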
Interestingly, the form (20) allows us to discuss a more general family of perceptual divergences, namely Wasserstein-1 distances (19) induced by arbitrary metrics on , which we will consider to be the case from this point on. We will assume w.l.o.g. that takes values in , hence the results of Propositions 2 and 3 hold trivially in this case.
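A minimal numerical sketch of the coupling-based linear program may help fix ideas. It assumes NumPy and SciPy are available and uses `scipy.optimize.linprog` with the HiGHS backend. The variable layout (estimator entries followed by coupling entries), the use of a direct inequality in place of the slack variable, and the name `dp_function` are choices made here for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def dp_function(p_xy, dist, metric, budget):
    """Minimal expected distortion under a Wasserstein-1 perception budget.

    p_xy[x, y]    : joint probability of source symbol x and channel output y
    dist[x, xh]   : distortion of reconstructing x as xh
    metric[x, xh] : ground metric inducing the Wasserstein-1 distance
    Returns the optimal value and an optimal estimator q[y, xh].
    """
    nx, ny = p_xy.shape
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    cost_q = p_xy.T @ dist                      # cost_q[y, xh] = sum_x p_xy[x, y] dist[x, xh]
    n_q, n_pi = ny * nx, nx * nx
    c = np.concatenate([cost_q.ravel(), np.zeros(n_pi)])

    a_eq, b_eq = [], []
    for y in range(ny):                         # each row of the estimator sums to one
        row = np.zeros(n_q + n_pi)
        row[y * nx:(y + 1) * nx] = 1.0
        a_eq.append(row)
        b_eq.append(1.0)
    for x in range(nx):                         # first marginal of the coupling is p_x
        row = np.zeros(n_q + n_pi)
        row[n_q + x * nx:n_q + (x + 1) * nx] = 1.0
        a_eq.append(row)
        b_eq.append(p_x[x])
    for xh in range(nx):                        # second marginal is the estimator output law
        row = np.zeros(n_q + n_pi)
        row[n_q + xh:n_q + n_pi:nx] = 1.0       # sum over x of pi[x, xh]
        row[xh:n_q:nx] = -p_y                   # minus sum over y of p_y[y] q[y, xh]
        a_eq.append(row)
        b_eq.append(0.0)

    # Perception constraint: transport cost of the coupling is at most the budget.
    a_ub = np.concatenate([np.zeros(n_q), metric.ravel()])[None, :]
    res = linprog(c, A_ub=a_ub, b_ub=[budget], A_eq=np.array(a_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x[:n_q].reshape(ny, nx)


# Toy binary example (assumed): Hamming distortion and Hamming ground metric.
p_xy = np.array([[0.3, 0.2], [0.1, 0.4]])
hamming = 1.0 - np.eye(2)
values = {b: dp_function(p_xy, hamming, hamming, b)[0] for b in (0.0, 0.05, 0.1, 1.0)}
print(values)
```

Sweeping the budget over a grid with this routine is one way to trace a DP curve numerically; for this toy channel the values decrease linearly up to a budget of 0.1 and stay flat afterwards.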
IV-B The Dual Problem
Consider the general linear programming problem [22]
(29) |
where , and the inequality is elementwise. Its dual problem (DLP) is given by
(30) |
Finally, recall that strong duality holds for feasible and bounded LP problems [22], namely, the problem (30) is feasible and
(31) |
We next derive the dual form of (20). For convenience, we split the variables in (30) into four groups: variables related to the stochasticity constraint on for each symbol in , the two groups of variables and related to the constraints on the marginals of , and the variable related to the perception constraint . We denote
(32) |
and explicitly write the dual problem of (20) as (see derivation in the Appendix),
(33) | ||||
(36) |
From the strong duality property, we have that (33) is feasible; indeed, we can choose and . This choice of variables recovers the lower bound of Proposition 3, where for .
Remark 4.
It is easy to see that in this case , while one constraint is redundant, namely we can eliminate a linear constraint from the primal program (20) (a row of ) such that the row rank of the problem is full. Equivalently, we can set one of the variables to , and the dual feasible set (projected onto ) will not contain a line. This implies the existence of an extreme point in this dual set in (see [22, Thm. 2.6]).
Given a value , can be calculated by numerically solving (20) (equivalently, (33)). However, finding a closed-form solution remains an open problem. In Section V-B we find such an expression for small alphabets. We also observe that the objective of (33) is linear in the perception index, hence the maximal value for a given is attained by some non-increasing linear function of the form . We further develop this insight below.
V Main results
V-A Piecewise linearity of DP functions
While the problem of finding an exact formula for is still open, here we exploit the properties of the dual problem (33) in order to show the general property that is piecewise linear in the perception index . Moreover, the breakpoints and slopes of this function are determined by the vertices of a convex set in . We will utilize the following property of LP problems.
Lemma 5.
([22, Thm. 2.8]) For a bounded LP problem, if there exists an extreme point in the feasible set, then the optimal solution is obtained at an extreme point.
This is true of course also for the dual problem. We now use this result to prove the following.
Theorem 6.
For , the DP function (20) is a non-increasing piecewise linear function of with a non-decreasing slope. Furthermore, there exists such that .
The proof is based on analyzing the dual formulation (33). Due to strong duality (31) this matches the primal problem. The feasible set of (33) has a finite number of vertices, and this set is independent of the perceptual index . The solution to (33) must occur at one of these vertices. Thus, the interval may be partitioned into sub-intervals, so that in each sub-interval the solution to (33) is at the same vertex. For a fixed choice of variables and in (33), the function is linear with slope . Hence, the DP function is piecewise linear. Since DP functions are non-increasing and convex (see Thm. 1), the slope cannot decrease.
Proof of Theorem 6.
Let and , both in . We can write the objective (33) as
(37) |
where and is the set of feasible solutions to the dual problem (33) where we choose to set (see Remark 4). Let denote the vertices of in . Note that the set of vertices is non-empty, finite, and importantly, independent of . Lemma 5 above implies that the dual optimal value is obtained on this set. We now have from strong duality,
(38) |
where we denote the projections
(39) | ||||
(40) |
As a maximum of a finite set of linear functions, (38) is a piecewise linear function. The non-decreasing slope property can be easily deduced from (38), or from the fact that DP functions are convex [6]. ∎
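The structure used in the proof, a pointwise maximum over finitely many lines, can be illustrated directly. In the sketch below each line is a hypothetical pair `(a, b)` standing for the function a − b·P with b ≥ 0; the specific numbers are made up for illustration and are not vertices computed from any particular channel.

```python
def envelope(lines, budget):
    """Pointwise maximum of the lines a - b * budget."""
    return max(a - b * budget for a, b in lines)


def breakpoints(lines, lo=0.0, hi=1.0, tol=1e-9):
    """Intersections of pairs of lines that lie on the upper envelope inside (lo, hi)."""
    points = set()
    for i, (a1, b1) in enumerate(lines):
        for a2, b2 in lines[i + 1:]:
            if abs(b1 - b2) < tol:
                continue                          # parallel lines never cross
            crossing = (a1 - a2) / (b1 - b2)
            on_envelope = abs(envelope(lines, crossing) - (a1 - b1 * crossing)) < tol
            if lo < crossing < hi and on_envelope:
                points.add(round(crossing, 12))
    return sorted(points)


# Three hypothetical lines; the last one is never active on [0, 1].
lines = [(0.30, 0.0), (1.0 / 3.0, 1.0 / 3.0), (0.25, 0.5)]
bps = breakpoints(lines)
print(bps, [envelope(lines, b) for b in (0.0, 0.05, 0.1, 0.5)])
```

The resulting envelope is convex and non-increasing, with a slope that moves from −1/3 to 0 at the single breakpoint. The third line never touches the envelope, which mirrors the observation that not every vertex is a candidate for optimality.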
Corollary 7.
The breakpoints of the function lie within the set
(41) |
As we show next, not every vertex is a candidate for optimality in (38); optimal solutions must be obtained on a 2-D convex hull. Using the notation of the proof of Theorem 6, denote the set which represents the (finite) set of linear curves on the 2-dimensional plane by the projections of their corresponding vertices (39)-(40).
Theorem 8.
For any , there exists a vertex of such that , and is an extreme point of .
Proof.
Let be the set of extremals of . The set is finite, hence its convex hull is bounded. We can write any point in as a convex combination , thus we have
(42) |
∎
The results of Theorems 6 and 8 are illustrated in Fig. 2 for alphabet sizes and , where we considered the TV distance and distortion given by a random matrix . We numerically solve (33) for different values of along the DP tradeoff and project the optimal solutions according to (39)-(40). We also calculate the extreme points of the feasible set to obtain (for a discussion about finding the vertices of a feasible set, we refer the reader to [22, Sec. 2.2]). It can be seen that optimal solutions to (33) correspond to the linear segments of the DP function, and are obtained on extreme points of in the -plane.
V-B Full characterization of channels with binary sources
We next focus on the case of binary sources, where with probabilities , respectively, and is of arbitrary size . It suffices to analyze the TV distance (4) as the perceptual index, since every metric defining the Wasserstein-1 distance is proportional to the Hamming distance in the binary case. The distortion matrix is arbitrary, yielding the matrix defined in (32). Denote which is half the cost of reconstructing as over reconstructing as , and assume w.l.o.g. that . We define , which is right-continuous with left limit . We further denote the symbols whose is non-zero, namely
(43) | |||
(44) |
Theorem 9.
Assume that , and let . Then, the DP function is piecewise linear with breakpoints given by
(45) |
where, specifically, . The DP function is then given by
(46) |
If , then similarly , and , while it is non-negative, and is determined analogously. In the case , and for all .
Remark 10.
If then and this yields a ‘degenerate’ interval. If , then (45) can alternatively be written more simply as .
The results of Theorem 9 are illustrated in Fig. 1. These results confirm the intuition that channel outputs in should be mapped to symbols in in a greedy fashion. At the point , each is reconstructed with a minimal penalty, without any perceptual constraints (as in Proposition 3). This can be done by setting, e.g., . At the point , ’s are still reconstructed optimally, but now under a perception constraint. This can be obtained by rearranging the mapping of symbols whose , which yields no extra cost in distortion. Now, suppose that is not ‘fully allocated’, that is, . As the perception constraint becomes more restrictive (lower ), the estimator will seek the minimal-cost symbols that are mapped to with probability less than , and increase this probability. For a small change of , the cost in distortion is . This is done until is met, namely .
Corollary 11.
At the breakpoints where , an optimal estimator is given by a deterministic rule (for , given by ). Interestingly, at , the estimator is given by the convex combination of estimators at the interval edges, , with .
This result implies that in order to construct an estimator for any point along the tradeoff at test time, without any additional calculations, it is sufficient to calculate estimators beforehand, one at each breakpoint (and at ).
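The interpolation idea can be sketched as follows. Because both the expected distortion and the output marginal are linear in the estimator matrix, a convex combination of two estimators yields the same convex combination of their distortions and output distributions. The toy channel, the two edge estimators `q_lo` and `q_hi`, and the function names are hypothetical and serve only to illustrate this linearity.

```python
# Toy joint distribution p_xy[x][y] and Hamming distortion (assumed for illustration).
p_xy = [[0.3, 0.2],
        [0.1, 0.4]]
dist = [[0.0, 1.0],
        [1.0, 0.0]]
p_y = [0.4, 0.6]


def expected_distortion(q):
    """E[d(X, Xhat)] for an estimator q[y][xh] = Pr(Xhat = xh | Y = y)."""
    return sum(p_xy[x][y] * q[y][xh] * dist[x][xh]
               for x in range(2) for y in range(2) for xh in range(2))


def output_marginal(q):
    """Marginal distribution of Xhat induced by q and p_y."""
    return [sum(p_y[y] * q[y][xh] for y in range(2)) for xh in range(2)]


def mix(q_a, q_b, alpha):
    """Entrywise convex combination alpha * q_a + (1 - alpha) * q_b."""
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(q_a, q_b)]


# Two hypothetical estimators sitting at the two ends of one linear segment.
q_lo = [[1.0, 0.0], [1.0 / 6.0, 5.0 / 6.0]]
q_hi = [[1.0, 0.0], [0.0, 1.0]]

q_mid = mix(q_lo, q_hi, 0.5)
print(q_mid, output_marginal(q_mid), expected_distortion(q_mid))
```

At test time only the mixing weight has to be chosen, so the stored edge estimators can be reused for every operating point inside the segment.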
Acknowledgements
The research of NW was partially supported by the Israel Science Foundation (ISF), grant no. 1782/22. The work of RM was partially supported by the Skillman chair in biomedical sciences and by the Ollendorff Minerva Center, ECE Faculty, Technion.
References
- [1] T. Adrai, G. Ohayon, T. Michaeli, and M. Elad, “Deep optimal transport: A practical algorithm for photo-realistic image restoration,” arXiv preprint arXiv:2306.02342, 2023.
- [2] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
- [3] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
- [4] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4681–4690.
- [5] F. Nielsen, “Hypothesis testing, information divergence and computational geometry,” in International Conference on Geometric Science of Information. Springer, 2013, pp. 241–248.
- [6] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6228–6237.
- [7] D. Freirich, T. Michaeli, and R. Meir, “A theory of the distortion-perception tradeoff in Wasserstein space,” Advances in Neural Information Processing Systems, vol. 34, pp. 25661–25672, 2021.
- [8] J. Qian, G. Zhang, J. Chen, and A. Khisti, “A rate-distortion-perception theory for binary sources,” in International Zurich Seminar on Information and Communication (IZS 2022). Proceedings. ETH Zurich, 2022, pp. 34–38.
- [9] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2008.
- [10] R. Van Handel, “Probability in high dimension,” Lecture Notes (Princeton University), 2014.
- [11] M. Raginsky, I. Sason et al., “Concentration of measure inequalities in information theory, communications, and coding,” Foundations and Trends® in Communications and Information Theory, vol. 10, no. 1-2, pp. 1–246, 2013.
- [12] Y. Bai, X. Wu, and A. Özgür, “Information constrained optimal transport: From Talagrand, to Marton, to Cover,” IEEE Transactions on Information Theory, vol. 69, no. 4, pp. 2059–2073, 2023.
- [13] N. Saldi, T. Linder, and S. Yüksel, “Randomized quantization and source coding with constrained output distribution,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 91–106, 2015.
- [14] ——, “Output constrained lossy source coding with limited common randomness,” IEEE Transactions on Information Theory, vol. 61, no. 9, pp. 4984–4998, 2015.
- [15] D. Freirich, T. Michaeli, and R. Meir, “Perceptual Kalman filters: Online state estimation under a perfect perceptual-quality constraint,” arXiv preprint arXiv:2306.02400, 2023.
- [16] Y. Blau and T. Michaeli, “Rethinking lossy compression: The rate-distortion-perception tradeoff,” in International Conference on Machine Learning. PMLR, 2019, pp. 675–685.
- [17] S. Salehkalaibar, B. Phan, A. Khisti, and W. Yu, “Rate-distortion-perception tradeoff based on the conditional perception measure,” in 2023 Biennial Symposium on Communications (BSC). IEEE, 2023, pp. 31–37.
- [18] Z. Yan, F. Wen, and P. Liu, “Optimally controllable perceptual lossy compression,” arXiv preprint arXiv:2206.10082, 2022.
- [19] L. Theis and A. B. Wagner, “A coding theorem for the rate-distortion-perception function,” arXiv preprint arXiv:2104.13662, 2021.
- [20] A. B. Wagner, “The rate-distortion-perception tradeoff: The role of common randomness,” arXiv preprint arXiv:2202.04147, 2022.
- [21] J. Chen, L. Yu, J. Wang, W. Shi, Y. Ge, and W. Tong, “On the rate-distortion-perception function,” arXiv preprint arXiv:2204.06049, 2022.
- [22] D. Bertsimas and J. N. Tsitsiklis, Introduction to linear optimization. Athena Scientific Belmont, MA, 1997, vol. 6.
- [23] R. J. Vanderbei et al., Linear programming. Springer, 2020.
In this Appendix, we start with an extended review of Linear Programs and their Dual Problems. We derive the dual forms of both formulations (13) and (20) (Eq. (33)). We then provide a detailed proof for Theorem 9 in the text.
-C The linear optimization problem and strong duality
Consider the general Linear Programming (LP) problem [22]
(47) |
are real matrices, . The Dual Linear Programming problem (DLP) is given by
(48) |
Recall that by slight abuse of notation, here, similarly to the main text, we use and to denote their indices and , respectively.
Dual problems are useful for establishing lower bounds on the optimal value, due to the property of weak duality, which assures that every feasible value for the Primal problem is greater than or equal to every feasible value of its Dual, yielding (in case where both problems are feasible)
(49) |
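Weak duality can be checked by hand on a tiny instance. The sketch below assumes the textbook standard form (minimize c·x subject to Ax = b, x ≥ 0, with dual maximize b·y subject to Aᵀy ≤ c), which may differ in layout from (47)-(48); the data and helper names are made up for illustration.

```python
# Tiny standard-form LP (assumed): minimize c.x subject to A x = b, x >= 0.
A = [[1.0, 1.0]]
b = [1.0]
c = [2.0, 3.0]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_feasible(x, tol=1e-9):
    """x >= 0 and A x = b."""
    return (all(xi >= -tol for xi in x)
            and all(abs(dot(row, x) - bi) <= tol for row, bi in zip(A, b)))


def dual_feasible(y, tol=1e-9):
    """A^T y <= c, checked column by column."""
    return all(sum(A[i][j] * y[i] for i in range(len(A))) <= c[j] + tol
               for j in range(len(c)))


x_feas, y_feas = [0.5, 0.5], [1.5]     # some feasible pair
x_opt, y_opt = [1.0, 0.0], [2.0]       # optimal pair for this instance

print(dot(c, x_feas), dot(b, y_feas))  # primal value is at least the dual value
print(dot(c, x_opt), dot(b, y_opt))    # the two values coincide at the optimum
```

For every feasible pair the primal objective upper-bounds the dual objective, and in this instance the gap closes at the optimal pair.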
For feasible, bounded LP problems, strong duality further holds, namely the problem (48) is feasible and
(50) |