Characterization of the Distortion-Perception Tradeoff for Finite Channels with Arbitrary Metrics

Dror Freirich, Nir Weinberger and Ron Meir
Viterbi Faculty of Electrical and Computer Engineering
Technion - Israel Institute of Technology

Abstract

Whenever inspected by humans, reconstructed signals should not be distinguished from real ones. Typically, such a high perceptual quality comes at the price of high reconstruction error, and vice versa. We study this distortion-perception (DP) tradeoff over finite-alphabet channels, for the Wasserstein- $1$ distance induced by a general metric as the perception index, and an arbitrary distortion matrix. Under this setting, we show that computing the DP function and the optimal reconstructions is equivalent to solving a set of linear programming problems. We provide a structural characterization of the DP tradeoff, where the DP function is piecewise linear in the perception index. We further derive a closed-form expression for the case of binary sources.

I Introduction

Figure 1: The distortion-perception (DP) function. (Left) The minimal distortion possible for a certain level of perceptual quality forms a convex, non-increasing curve. The region below the curve cannot be attained by any reconstruction method. (Right) In our discrete setting, $D(P)$ is a piecewise linear function. Breakpoints $P^{*}_{i}$ and slopes $2u_{i}$ are given explicitly by Theorem 9 for binary sources.

The reconstruction of a signal from degraded data is required in numerous settings across science and engineering. Until recently, the performance of reconstruction algorithms has been measured by their mean distortion, such as the mean squared error (MSE). For that reason, many methods aimed to minimize distortion measures such as the MSE, or equivalently to maximize the peak signal-to-noise ratio (PSNR). However, in systems whose outputs are inspected by human users, reconstructions should not be easily distinguished from signals typical of the source domain. Therefore, many current works target perceptual quality rather than distortion (e.g., in image restoration, see [1, 2, 3, 4]).

Mathematically, the probability of success in a hypothesis test that tries to tell reconstructions from real signals is known to increase linearly with the Total-Variation (TV) distance between their distributions [5]. Hence, high perceptual quality is considered to be achieved when the distribution of restored signals is close to the distribution of real signals [6].

Good perceptual quality generally comes at the price of high reconstruction error and vice versa. This leads to a tradeoff between distortion and perception, first studied in [6]. The central problem is thus to quantify the distortion-perception (DP) function, which is the minimal distortion possible for a certain level of perceptual quality. The DP problem was studied by various authors. Specifically, [7] studied the DP function in real spaces, for the MSE distortion and the Wasserstein- $2$ perception index. In discrete spaces, [8] characterized the special case of a binary source, for the Hamming distortion and the TV perception index.

In this paper, we focus on discrete spaces, and investigate the DP tradeoff for general finite-alphabet channels and general distortion matrices. As the perception index, we consider the Wasserstein- $1$ distance induced by a general metric, which generalizes the TV distance [9, 10, 11]. We show that finding the DP function and the optimal reconstruction for this setting is equivalent to solving a set of linear problems, and the result is always a piecewise linear function of the perception index, regardless of the channel size, the underlying distributions or distortion measure. This stems from the properties of the dual feasible set. We further revisit the binary setting of [8], and derive a closed-form expression for the DP function, now considering a general distortion measure. We provide a self-contained proof for this case based on our novel analysis of the general setting.

II Preliminaries

II-A The distortion-perception tradeoff

Let $X,Y$ be random variables taking values in some complete separable metric spaces $\mathcal{X},\mathcal{Y}$, respectively. We assume the existence of the joint probability $p_{X,Y}$ on $\mathcal{X}\times\mathcal{Y}$, and a Borel lower-bounded distortion function $d:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}^{+}\cup\{0\}$. An estimator $\hat{X}\in\mathcal{X}$ is a random variable on $\mathcal{X}$, defined by its distribution conditioned on the measurement $Y$, $p_{\hat{X}|Y}$, with marginal distribution $p_{\hat{X}}$.

An optimal estimator for the DP tradeoff is an estimator that minimizes the expected distortion $\mathbb{E}[d(X,\hat{X})]$ under the perception constraint $d_{p}(p_{X},p_{\hat{X}})\leq P$. Here, $d_{p}$ is a divergence between probability measures. The DP function was introduced in [6] as

$$D(P)\triangleq\min_{p_{\hat{X}|Y}}\left\{\mathbb{E}[d(X,\hat{X})]\;:\;d_{p}(p_{X},p_{\hat{X}})\leq P\right\}. \tag{1}$$

The expectation is taken w.r.t. the probability measure induced by $p_{X,Y}$ and $p_{\hat{X}|Y}$, where we assume that $X,\hat{X}$ are independent given $Y$. We have the following result of [6, Thm. 2].

Theorem 1.

(The perception-distortion tradeoff). If $d_{p}(p,q)$ is convex in its second argument, then the distortion-perception function (1) is monotonically non-increasing and convex.

This is the case for the TV and Wasserstein distances discussed in this paper.

II-B Related work

Apart from the general properties given by Theorem 1, the precise nature of DP functions depends on the exact setup. [7] fully characterized this function in real spaces, considering the MSE and Wasserstein- $2$ indices. Under this setting, the distortion-perception function is always quadratic, and possesses a closed-form expression for Gaussian channels. In this work we discuss discrete signals, where we provide an analogous structural characterization, in which $D(P)$ is always piecewise linear (Theorem 6).

Reconstruction problems with constrained output distributions were studied in optimal transport [12], lossy compression and quantization [13, 14]. Recently, [15] investigated the cost of perfect perceptual consistency constraints in online estimation settings. The DP tradeoff was also extended to lossy compression by presenting the rate-distortion-perception (RDP) function [16, 17, 18], which is the minimal rate of a code whose decoding allows a desired tradeoff between reconstruction and perceptual quality. A coding theorem was introduced for this setting [19, 20], where the properties of optimal codes are investigated [21].

In the context of RDP theory, [8] investigated channels with binary sources. They showed [8, Thm. 7] that for the Hamming loss with the TV perceptual index, the DP function is a piecewise linear function whose breakpoints are given by an explicit formula. Here, we extend this result, considering an arbitrary distortion measure (Theorem 9).

III Problem formulation

In this paper, we discuss the discrete case, where $\mathcal{X}$ and $\mathcal{Y}$ are finite spaces. Let $X,Y$ be discrete variables defined on finite alphabets $\mathcal{X}=\{x_{1},\ldots,x_{n_{x}}\}$, $\mathcal{Y}=\{y_{1},\ldots,y_{n_{y}}\}$, where $X$ is the variable of interest, and $Y$ is a measurement of $X$ over a noisy channel. Their joint probability $p_{X,Y}\in\mathcal{P}(\mathcal{X}\times\mathcal{Y})$ is represented by the matrix ${\bf P}_{X,Y}=\{p(x,y)\}_{(x,y)\in\mathcal{X}\times\mathcal{Y}}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}$, and the marginal distributions $p_{X}$ and $p_{Y}$ are given by the vectors ${\bf P}_{X}\in\mathbb{R}^{|\mathcal{X}|}$, ${\bf P}_{Y}\in\mathbb{R}^{|\mathcal{Y}|}$. We assume that for each letter in the channel's output, $p_{Y}(y_{i})>0$ (i.e., we ignore unused symbols). A randomized estimator $\hat{X}\in\mathcal{X}$ of $X$ from $Y$ is defined by a stochastic transition matrix ${\bf Q}={\bf Q}_{\hat{X}|Y}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}$ whose entries are the probabilities $q(\hat{x}|y)$ of reconstructing the symbol $\hat{x}\in\mathcal{X}$ given that the channel output is $Y=y\in\mathcal{Y}$. We assume the Markov relation where $X,\hat{X}$ are independent given $Y$. The arbitrary distortion matrix is given by ${\bf D}=\{d(x,\hat{x})\}_{(x,\hat{x})\in\mathcal{X}^{2}}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{X}|}$, where the expected distortion

$$\mathbb{E}_{{\bf Q}}\left[d(X,\hat{X})\right]=\mathrm{Tr}\left\{{\bf P}_{X,Y}^{\top}{\bf D}{\bf Q}\right\} \tag{2}$$

should be minimized w.r.t. $q(\hat{x}|y),\hat{x},y\in\mathcal{X}\times\mathcal{Y}$ . The marginal distribution $p_{\hat{X}}$ of $\hat{X}$ is given by the vector ${\bf P}_{\hat{X}}={\bf Q}{\bf P}_{Y}$ . We are interested in analyzing the distortion-perception (DP) function [6]

$$D(P)=\min_{{\bf Q}_{\hat{X}|Y}}\left\{\mathbb{E}_{{\bf Q}}\left[d(X,\hat{X})\right]\;:\;d_{p}(p_{X},p_{\hat{X}})\leq P\right\}. \tag{3}$$

For simplicity, let us first consider the TV distance as the perceptual index $d_{p}$ , given by

$$\begin{aligned}d_{TV}({\bf P}_{X},{\bf P}_{\hat{X}})&\triangleq\frac{1}{2}\sum_{x\in\mathcal{X}}\left|{\bf P}_{X}(x)-{\bf P}_{\hat{X}}(x)\right|\\ &=\sup_{A\subseteq\mathcal{X}}\left|p_{X}(A)-p_{\hat{X}}(A)\right|.\end{aligned} \tag{4}$$

Note that using this definition, $d_{TV}({\bf P}_{X},{\bf P}_{\hat{X}})\in[0,1]$ , and $d_{TV}({\bf P}_{X},{\bf P}_{\hat{X}})=0$ iff ${\bf P}_{X}={\bf P}_{\hat{X}}$ . Now,

$$D(P)=\min_{{\bf Q}\geq 0}\left\{({\bf D}^{\top}{\bf P}_{X,Y})\bullet{\bf Q}\;:\;\begin{array}{c}\boldsymbol{1}^{|\mathcal{X}|}\cdot{\bf Q}=\boldsymbol{1}^{|\mathcal{Y}|}\\ d_{TV}({\bf P}_{X},{\bf Q}{\bf P}_{Y})\leq P\end{array}\right\}, \tag{5}$$

where $A\bullet B=\mathrm{Tr}\left\{A^{\top}B\right\}$ denotes the Frobenius inner product, $\boldsymbol{1}^{d}$ is the $1\times d$ all-ones vector, and for ${\bf Q}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}$ the constraint ${\bf Q}\geq 0$ is applied elementwise. We start by presenting some elementary properties of (5).
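The quantities entering (5) can be evaluated directly for a given estimator. The following is a minimal sketch, assuming matrices are stored as nested Python lists indexed as `P_xy[x][y]`, `D[x][xhat]` and `Q[xhat][y]`; the function names and the toy channel are illustrative and are not taken from the paper. It computes the expected distortion (2), the output marginal ${\bf Q}{\bf P}_{Y}$, and the TV distance (4) in both of its forms.

```python
from itertools import combinations

def expected_distortion(P_xy, D, Q):
    """Tr{P_XY^T D Q} = sum over (x, xhat, y) of p(x,y) d(x,xhat) q(xhat|y)."""
    nx, ny = len(P_xy), len(P_xy[0])
    return sum(P_xy[x][y] * D[x][xh] * Q[xh][y]
               for x in range(nx) for xh in range(nx) for y in range(ny))

def output_marginal(Q, P_y):
    """Marginal of the reconstruction, P_Xhat = Q P_Y."""
    return [sum(Q[xh][y] * P_y[y] for y in range(len(P_y)))
            for xh in range(len(Q))]

def tv_half_l1(p, q):
    """First form of (4): half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def tv_sup_subsets(p, q):
    """Second form of (4): supremum over all subsets A of the alphabet.
    Exponential in the alphabet size, so only meant for tiny examples."""
    idx = range(len(p))
    return max(abs(sum(p[i] - q[i] for i in A))
               for r in range(len(p) + 1) for A in combinations(idx, r))

# Toy binary symmetric setting (hypothetical numbers).
P_xy = [[0.4, 0.1], [0.1, 0.4]]          # joint p(x, y)
P_x, P_y = [0.5, 0.5], [0.5, 0.5]        # marginals
D = [[0, 1], [1, 0]]                     # Hamming distortion
Q_identity = [[1.0, 0.0], [0.0, 1.0]]    # deterministic estimator xhat = y
Q_posterior = [[0.8, 0.2], [0.2, 0.8]]   # posterior sampling, q = p(x|y)

for Q in (Q_identity, Q_posterior):
    print(expected_distortion(P_xy, D, Q),
          tv_half_l1(P_x, output_marginal(Q, P_y)))
```

In this symmetric toy channel both estimators reproduce the source marginal, so both satisfy the perception constraint with $P=0$, while their distortions differ.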

Proposition 2.

Let $P\in[0,1]$ . The optimization problem (5) is feasible (namely, the constraints are satisfiable), and its optimal value is bounded from below.

Proof.

The posterior sampling solution ${\bf Q}={\bf P}_{X|Y}=\{p_{X,Y}(x,y)/p_{Y}(y)\}_{(x,y)\in\mathcal{X}\times\mathcal{Y}}$ is feasible for every $P\geq 0$, since ${\bf P}_{\hat{X}}={\bf Q}{\bf P}_{Y}={\bf P}_{X}$, yielding $d_{TV}({\bf P}_{\hat{X}},{\bf P}_{X})=0$. For every stochastic matrix ${\bf Q}$,

$$({\bf D}^{\top}{\bf P}_{X,Y})\bullet{\bf Q}\in\left[\min D_{x,\hat{x}},\max D_{x,\hat{x}}\right], \tag{9}$$

hence the optimal value is bounded. ∎

Proposition 3.

Denote the matrix $\rho\triangleq{\bf D}^{\top}{\bf P}_{X,Y}$, whose entries are given by $\rho_{\hat{x},y}={\bf P}_{Y}(y)\mathbb{E}\left[d(X,\hat{x})|Y=y\right]$. Then, for any $P\geq 1$,

$$D(P)=\sum_{y}\min_{\hat{x}\in\mathcal{X}}\rho_{\hat{x},y}\triangleq D^{*}. \tag{10}$$

A corresponding optimal estimator is given by

$$\hat{X}^{*}(Y)\in\mathop{\mathrm{argmin}}_{\hat{x}}\rho_{\hat{x},Y}. \tag{11}$$

Trivially, $D(P)\geq D^{*}$ holds for every $P\in[0,1]$ .

The proof is straightforward.
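As a small illustration of Proposition 3, the sketch below builds $\rho={\bf D}^{\top}{\bf P}_{X,Y}$ and evaluates (10) and (11). It assumes nested-list matrices indexed as `D[x][xhat]` and `P_xy[x][y]`; the helper names and the toy numbers are hypothetical.

```python
def rho_matrix(D, P_xy):
    """rho = D^T P_XY, i.e. rho[xhat][y] = sum_x d(x, xhat) p(x, y)."""
    nx, ny = len(P_xy), len(P_xy[0])
    return [[sum(D[x][xh] * P_xy[x][y] for x in range(nx)) for y in range(ny)]
            for xh in range(nx)]

def unconstrained_optimum(D, P_xy):
    """Return (D_star, estimator) following (10)-(11).

    estimator[y] is an index xhat minimizing rho[xhat][y]; ties are broken
    by taking the smallest index, which is one valid choice among the argmin.
    """
    rho = rho_matrix(D, P_xy)
    ny = len(P_xy[0])
    estimator = [min(range(len(rho)), key=lambda xh: rho[xh][y])
                 for y in range(ny)]
    d_star = sum(rho[estimator[y]][y] for y in range(ny))
    return d_star, estimator

# Toy binary symmetric channel with Hamming distortion (hypothetical numbers).
P_xy = [[0.4, 0.1], [0.1, 0.4]]
D = [[0, 1], [1, 0]]
d_star, estimator = unconstrained_optimum(D, P_xy)
print(d_star, estimator)
```

For Hamming distortion the resulting rule is the maximum a posteriori decision, and $D^{*}$ is its error probability.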

IV Linear Programming formulation

We now observe that the perceptual constraint $\frac{1}{2}\sum_{x\in\mathcal{X}}\left|{\bf P}_{X}(x)-\sum_{y\in\mathcal{Y}}{\bf P}_{Y}(y){\bf Q}(x|y)\right|\leq P$ in (5) is equivalent to the set of linear constraints

$$\sum_{x\in\mathcal{X}}\pm\left({\bf P}_{X}(x)-\sum_{y\in\mathcal{Y}}{\bf P}_{Y}(y){\bf Q}(x|y)\right)\leq 2P. \tag{12}$$

Taking all possible sign combinations, we obtain $2^{|\mathcal{X}|}$ linear constraints, where the $2$ constraints for which the signs are either all positive or all negative are redundant, since for probability vectors ${\bf P}_{X},{\bf P}_{Y}$ and a stochastic matrix ${\bf Q}$ the LHS of (12) vanishes. Together with (5), we can reformulate the DP function as the following Linear Program (LP) [22, 23]

$$D(P)=\min_{{\bf Q}\geq 0}\left\{\rho\bullet{\bf Q}\;:\;\begin{array}{c}\boldsymbol{1}^{|\mathcal{X}|}\cdot{\bf Q}=\boldsymbol{1}^{|\mathcal{Y}|},\ {\bf Q}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}\\ \sum_{x\in\mathcal{X}}\pm\left({\bf P}_{X}(x)-\sum_{y\in\mathcal{Y}}{\bf P}_{Y}(y){\bf Q}_{x|y}\right)\leq 2P\end{array}\right\}. \tag{13}$$

In (13), we have $|\mathcal{X}|\times|\mathcal{Y}|$ variables (the entries of ${\bf Q}=\{q(\hat{x}|y)\}$ ), and $|\mathcal{Y}|+2^{|\mathcal{X}|}-2$ constraints.
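The equivalence between the absolute-value constraint and the sign-pattern constraints (12) can be checked numerically on a small example. The sketch below assumes the two marginals are given as plain lists; `sign_pattern_values` is a hypothetical helper name. It enumerates all $2^{|\mathcal{X}|}$ patterns, confirms that the largest left-hand side equals $2d_{TV}$, and confirms that the two uniform patterns give zero.

```python
from itertools import product

def sign_pattern_values(p_x, p_xhat):
    """Map each sign pattern s in {+1,-1}^n to sum_x s_x (p_x(x) - p_xhat(x))."""
    diffs = [a - b for a, b in zip(p_x, p_xhat)]
    return {s: sum(si * d for si, d in zip(s, diffs))
            for s in product((1, -1), repeat=len(diffs))}

p_x = [0.5, 0.3, 0.2]
p_xhat = [0.2, 0.3, 0.5]           # hypothetical reconstruction marginal
values = sign_pattern_values(p_x, p_xhat)

n = len(p_x)
all_plus, all_minus = (1,) * n, (-1,) * n
two_tv = sum(abs(a - b) for a, b in zip(p_x, p_xhat))

print(len(values))                  # number of sign patterns, 2**n
print(max(values.values()), two_tv) # tightest constraint versus 2 * TV
print(values[all_plus], values[all_minus])
print(len(values) - 2)              # non-redundant perception constraints
```

Since the maximum over sign patterns equals $\sum_{x}|\cdot|$, requiring every pattern to be at most $2P$ is the same as requiring $d_{TV}\leq P$.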

IV-A Total Variation as a Wasserstein distance

Let ${\bf H}=\{1-\delta_{x,\hat{x}}\}_{(x,\hat{x})\in\mathcal{X}\times\mathcal{X}}$ be the Hamming distance matrix, let $\mathscr{P}(\mathcal{X})$ be the set of probability measures on $\mathcal{X}$, and let $\Pi\in\mathscr{P}(\mathcal{X}\times\mathcal{X})$ be a coupling between ${\bf P}_{X}$ and ${\bf P}_{\hat{X}}$ (parameterized by a matrix ${\bf\Pi}_{x,\hat{x}}$). It is well known [10] that taking ${\bf H}$ as a metric on $\mathcal{X}$, the TV distance coincides with the Wasserstein-$1$ distance on $\mathscr{P}(\mathcal{X})$, namely

$$d_{TV}({\bf P}_{X},{\bf P}_{\hat{X}})=\inf_{\Pi}\Pi\left[x\neq\hat{x}\right]=W_{1,H}({\bf P}_{X},{\bf P}_{\hat{X}}), \tag{18}$$
$$W_{1,H}({\bf P}_{X},{\bf P}_{\hat{X}})\triangleq\inf_{{\bf\Pi}}{\bf\Pi}\bullet{\bf H}=\inf_{{\bf\Pi}}\sum_{x,\hat{x}}{\bf\Pi}_{x,\hat{x}}{\bf H}_{x,\hat{x}}, \tag{19}$$

where the minimum is attained [11, Lemma 3.4.1]. Wasserstein distances are convex metrics on $\mathcal{P}(\mathcal{X})$ [9]. Using (19), we can rewrite (13) as the linear problem

$$D(P)=\min_{{\bf Q},{\bf\Pi},\varepsilon\,\geq 0}\left\{\rho\bullet{\bf Q}\;:\;\begin{array}{l}\sum_{\hat{x}\in\mathcal{X}}{\bf P}_{Y}(y){\bf Q}_{\hat{x}|y}={\bf P}_{Y}(y),\ \forall y\in\mathcal{Y}\\ \sum_{\hat{x}\in\mathcal{X}}{\bf\Pi}_{x,\hat{x}}={\bf P}_{X}(x),\ \forall x\in\mathcal{X}\\ \sum_{x\in\mathcal{X}}{\bf\Pi}_{x,\hat{x}}=\sum_{y\in\mathcal{Y}}{\bf P}_{Y}(y){\bf Q}_{\hat{x}|y},\ \forall\hat{x}\in\mathcal{X}\\ {\bf\Pi}\bullet{\bf H}+\varepsilon=P,\ {\bf Q}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|},\ {\bf\Pi}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{X}|}\end{array}\right\} \tag{20}$$

where $\varepsilon$ is a slack variable. The problem (20) possesses $|\mathcal{X}|(|\mathcal{Y}|+|\mathcal{X}|)+1$ variables and only $|\mathcal{Y}|+2|\mathcal{X}|+1$ constraints, of which $|\mathcal{Y}|+2|\mathcal{X}|$ are independent.

Interestingly, the form (20) allows us to discuss a more general family of perceptual divergences, namely the Wasserstein-$1$ distances (19) induced by arbitrary metrics $H$ on $\mathcal{X}$, which we will consider to be the case from this point on. We will assume w.l.o.g. that $H$ takes values in $[0,1]$, hence the results of Propositions 2 and 3 hold trivially in this case.
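For the Hamming metric, the identity (18) can be made concrete by writing down a coupling whose transport cost equals the TV distance. The sketch below is an illustration under stated assumptions: it uses the standard maximal-coupling construction (keep the common mass on the diagonal and spread the remaining mass off the diagonal), with hypothetical function names. It exhibits a feasible ${\bf\Pi}$ attaining $d_{TV}$; it does not by itself prove that no cheaper coupling exists.

```python
def maximal_coupling(p, q):
    """Coupling Pi with row sums p and column sums q that keeps
    min(p_i, q_i) on the diagonal."""
    n = len(p)
    common = [min(a, b) for a, b in zip(p, q)]
    excess_p = [a - c for a, c in zip(p, common)]   # mass that must leave i
    excess_q = [b - c for b, c in zip(q, common)]   # mass that must arrive at j
    moved = sum(excess_p)                            # equals the TV distance
    Pi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Pi[i][i] = common[i]
        if moved > 0:
            for j in range(n):
                if i != j:
                    Pi[i][j] = excess_p[i] * excess_q[j] / moved
    return Pi

def transport_cost(Pi, H):
    """Frobenius inner product Pi . H, the objective in (19)."""
    return sum(Pi[i][j] * H[i][j]
               for i in range(len(Pi)) for j in range(len(Pi)))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
H = [[0 if i == j else 1 for j in range(3)] for i in range(3)]  # Hamming
Pi = maximal_coupling(p, q)
tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
print(transport_cost(Pi, H), tv)
```

Replacing `H` by another metric matrix changes the cost of this particular coupling, but it is then no longer guaranteed to be optimal; in general the infimum in (19) requires solving the transport LP.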

IV-B The Dual Problem

Consider the general linear programming problem [22]

$${\rm(LP)}\quad\min_{q}z^{\top}q,\quad\mathrm{s.t.}\ {\bf A}q=b\ \mathrm{and}\ q\geq 0, \tag{29}$$

where $q,z\in\mathbb{R}^{n},b\in\mathbb{R}^{n_{c}},{\bf{A}}\in\mathbb{R}^{n_{c}\times n}$ , and the inequality is elementwise. Its dual problem (DLP) is given by

$${\rm(DLP)}\quad\max_{w}w^{\top}b,\quad\mathrm{s.t.}\ w^{\top}{\bf A}\leq z^{\top}. \tag{30}$$

Finally, recall that strong duality holds for feasible and bounded LP problems [22]; namely, the problem (30) is feasible and

$$\min_{q}z^{\top}q=\max_{w}w^{\top}b. \tag{31}$$

We next derive the dual form of (20). For convenience, we split the variables in (30) into four groups: $|\mathcal{Y}|$ variables $\{w_{y}\}_{y\in\mathcal{Y}}$ related to the stochasticity constraint on ${\bf Q}$ for each symbol in $\mathcal{Y}$, the two groups of $|\mathcal{X}|$ variables $\{r_{x}\}$ and $\{\nu_{\hat{x}}\}$ related to the constraints on the marginals of ${\bf\Pi}$, and the variable $l$ related to the perception constraint ${\bf\Pi}\bullet{\bf H}+\varepsilon=P$. We denote

$$\rho^{\prime}_{\hat{x},y}\triangleq\frac{\rho_{\hat{x},y}}{{\bf P}_{Y}(y)}=\mathbb{E}\left[d(X,\hat{x})|Y=y\right], \tag{32}$$

and explicitly write the dual problem of (20) as (see derivation in the Appendix),

$$\begin{aligned}&\max_{w,r,\nu,l}\left[\sum_{y\in\mathcal{Y}}p_{y}w_{y}+\sum_{x\in\mathcal{X}}p_{x}r_{x}-lP\right]\\ &\mathrm{s.t.}\ \left\{l\geq 0,\ \begin{array}{ll}w_{y}\leq\rho^{\prime}_{\hat{x},y}-\nu_{\hat{x}},&\forall\hat{x},y\in\mathcal{X}\times\mathcal{Y}\\ r_{x}\leq{\bf H}_{x,\hat{x}}l+\nu_{\hat{x}},&\forall x,\hat{x}\in\mathcal{X}\times\mathcal{X}\end{array}\right..\end{aligned} \tag{33}$$

From the strong duality property, we have that (33) is feasible; indeed, we can choose $w_{y}=\min_{\hat{x}\in\mathcal{X}}\rho^{\prime}_{\hat{x},y}$ and $r_{x},\nu_{\hat{x}},l=0$. This choice of variables recovers the lower bound of Proposition 3, where $D(P)\geq D^{*}$ for $P\in[0,1]$.
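The feasibility of this particular dual point, and the fact that its objective equals $D^{*}$ for every $P$, can be verified mechanically. The sketch below assumes nested-list inputs `rho_prime[xhat][y]` and `H[x][xhat]`; the function names and the toy numbers are hypothetical.

```python
def dual_objective(w, r, l, p_y, p_x, P):
    """Objective of (33): sum_y p_y w_y + sum_x p_x r_x - l P."""
    return (sum(py * wy for py, wy in zip(p_y, w))
            + sum(px * rx for px, rx in zip(p_x, r)) - l * P)

def is_dual_feasible(w, r, nu, l, rho_prime, H, tol=1e-12):
    """Check the constraints of (33) up to a small numerical tolerance."""
    nx, ny = len(nu), len(w)
    return (l >= 0
            and all(w[y] <= rho_prime[xh][y] - nu[xh] + tol
                    for xh in range(nx) for y in range(ny))
            and all(r[x] <= H[x][xh] * l + nu[xh] + tol
                    for x in range(nx) for xh in range(nx)))

# Toy binary symmetric channel with Hamming distortion (hypothetical numbers):
# rho' holds the conditional expected distortions E[d(X, xhat) | Y = y].
rho_prime = [[0.2, 0.8], [0.8, 0.2]]
p_y, p_x = [0.5, 0.5], [0.5, 0.5]
H = [[0, 1], [1, 0]]

# The dual point from the text: w_y = min_xhat rho'[xhat][y], r = nu = 0, l = 0.
w = [min(rho_prime[xh][y] for xh in range(2)) for y in range(2)]
r, nu, l = [0.0, 0.0], [0.0, 0.0], 0.0

print(is_dual_feasible(w, r, nu, l, rho_prime, H))
print([dual_objective(w, r, l, p_y, p_x, P) for P in (0.0, 0.5, 1.0)])
```

Because $l=0$, the objective does not depend on $P$, which is why this point only yields the constant lower bound $D^{*}$.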

Remark 4.

It is easy to see that in this case ${\rm rank}({\bf A})=|\mathcal{Y}|+2|\mathcal{X}|$, while one constraint is redundant; namely, we can eliminate a linear constraint from the primal program (20) (a row of ${\bf A}$) such that the row rank of the problem is full. Equivalently, we can set one of the variables $r_{x},\nu_{\hat{x}}$ to $0$, and the dual feasible set (projected onto $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|}$) will not contain a line. This implies the existence of an extreme point in this dual set in $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|}$ (see [22, Thm. 2.6]).

Given a value $P$ , $D(P)$ can be calculated by numerically solving (20) (equivalently, (33)). However, finding a closed-form solution remains an open problem. In Section V-B we find such an expression for small alphabets. We also observe that the objective of (33) is linear in the perception index, hence the maximal value for a given $P$ is attained by some non-increasing linear function of the form $p_{0}+p_{1}P$ . We further develop this insight below.

V Main results

V-A Piecewise linearity of DP functions

While the problem of finding an exact formula for $D(P)$ is still open, here we exploit the properties of the dual problem (33) in order to show the general property that $D(P)$ is piecewise linear in the perception index $P$ . Moreover, the breakpoints and slopes of this function are determined by the vertices of a convex set in $\mathbb{R}^{2}$ . We will utilize the following property of LP problems.

Lemma 5.

([22, Thm. 2.8]) For a bounded LP problem, if there exists an extreme point in the feasible set, then the optimal solution is obtained at an extreme point.

This is true of course also for the dual problem. We now use this result to prove the following.

Theorem 6.

For $P\in[0,\infty)$ , the DP function (20) is a non-increasing piecewise linear function of $P$ with a non-decreasing slope. Furthermore, there exists $P^{*}\in[0,1]$ such that $D(P)=D^{*},P\geq P^{*}$ .

The proof is based on analyzing the dual formulation (33). Due to strong duality (31), its optimal value matches that of the primal problem. The feasible set of (33) has a finite number of vertices, and this set is independent of the perceptual index $P$. The solution to (33) must occur at one of these vertices. Thus, the interval $[0,1]$ may be partitioned into sub-intervals, so that in each sub-interval the solution to (33) is at the same vertex. For a fixed choice of variables $w,r,\nu$ and $l$ in (33), the $D(P)$ function is linear with slope $-l$. Hence, the DP function is piecewise linear. Since DP functions are non-increasing and convex (see Thm. 1), the slope cannot decrease.

Proof of Theorem 6.

Let $d=[0^{|\mathcal{Y}|+2|\mathcal{X}|},\ 1]^{\top}$ and $b_{0}=[{\bf P}_{Y}^{\top},{\bf P}_{X}^{\top},0^{|\mathcal{X}|+1}]$ , both in $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|+1}$ . We can write the objective (33) as

$$[w,r,\nu,l]^{\top}b(P)\rightarrow\max_{w,r,\nu,l\in\mathcal{S}}, \tag{37}$$

where $b(P)=b_{0}-dP$ and $\mathcal{S}$ is the set of feasible solutions to the dual problem (33), where we choose to set $\nu_{|\mathcal{X}|}\equiv 0$ (see Remark 4). Let $\mathrm{ext}\left(\mathcal{S}\right)=\{p^{i}=[w^{i},r^{i},\nu^{i},l^{i}]\}$ denote the vertices of $\mathcal{S}$ in $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|}$. Note that the set of vertices is non-empty, finite, and importantly, independent of $P$. Lemma 5 above implies that the dual optimal value is obtained on this set. We now have from strong duality,

$$\begin{aligned}D(P)&=\max_{i}p^{i}\cdot b(P)\\ &=\max_{i}[w^{i},r^{i},\nu^{i},l^{i}]^{\top}b(P)=\max_{i}\left[p_{0}^{i}+p_{1}^{i}P\right],\end{aligned} \tag{38}$$

where we denote the projections

$$p_{0}^{i}=p^{i}\cdot b_{0}, \tag{39}$$
$$p_{1}^{i}=-p^{i}\cdot d. \tag{40}$$

As the maximum of a finite set of linear functions, (38) is a piecewise linear function. The non-decreasing slope property can be easily deduced from (38), or from the fact that DP functions are convex [6]. ∎

Corollary 7.

The breakpoints of the $D(P)$ function lie within the set

$$\mathcal{P}=\left\{\frac{p_{0}^{i}-p_{0}^{j}}{p_{1}^{j}-p_{1}^{i}}\;:\;\begin{array}{c}p^{i},p^{j}\textrm{ are vertices of the set of }\\ \textrm{feasible solutions to the dual problem}\end{array}\right\}. \tag{41}$$
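The structure described by (38) and Corollary 7 can be sketched as follows. The code assumes the projected vertices $(p_{0}^{i},p_{1}^{i})$ are already available as a list of pairs; enumerating the vertices of the dual feasible set is not done here, and the numbers used are hypothetical rather than derived from an actual channel. The function names are illustrative.

```python
def dp_from_lines(lines, P):
    """Upper envelope of (38): max_i (p0_i + p1_i * P)."""
    return max(p0 + p1 * P for p0, p1 in lines)

def candidate_breakpoints(lines):
    """All pairwise intersections, i.e. the set (41)."""
    points = set()
    for i, (a0, a1) in enumerate(lines):
        for b0, b1 in lines[i + 1:]:
            if a1 != b1:                      # parallel lines never intersect
                points.add((a0 - b0) / (b1 - a1))
    return sorted(points)

def envelope_breakpoints(lines, tol=1e-9):
    """Candidates in [0, inf) where two different slopes attain the maximum."""
    result = []
    for P in candidate_breakpoints(lines):
        if P < 0:
            continue
        value = dp_from_lines(lines, P)
        active_slopes = {p1 for p0, p1 in lines
                         if abs(p0 + p1 * P - value) <= tol}
        if len(active_slopes) >= 2:
            result.append(P)
    return result

# Hypothetical projections (p0, p1); slopes p1 = -l are non-positive.
lines = [(0.5, -2.0), (0.4, -1.0), (0.2, 0.0), (0.1, -0.5)]
print(candidate_breakpoints(lines))
print(envelope_breakpoints(lines))
print([dp_from_lines(lines, P) for P in (0.0, 0.1, 0.2, 1.0)])
```

Only a subset of the candidate set (41) consists of actual breakpoints: an intersection counts only when both lines attain the maximum there.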

As we show next, not every vertex is a candidate for optimality in (38); optimal solutions must be obtained on a 2-D convex hull. Using the notation of the proof of Theorem 6, denote the set $\mathcal{S}_{2}=\left\{\left(p_{0}^{i},p_{1}^{i}\right):p_{0}^{i}=p^{i}\cdot b_{0},\ p_{1}^{i}=-p^{i}\cdot d,\ p^{i}\in\mathrm{ext}\left(\mathcal{S}\right)\right\}\subseteq\mathbb{R}^{2}$, which represents the (finite) set of linear curves $\left\{p_{0}^{i}+p_{1}^{i}P,\ p^{i}\in\mathrm{ext}\left(\mathcal{S}\right)\right\}$ on the $2$-dimensional plane by the projections of their corresponding vertices (39)-(40).

Theorem 8.

For any $P\geq 0$, there exists a vertex $p^{k}$ of $\mathcal{S}$ such that $p^{k}\in\mathop{\mathrm{argmax}}_{p\in\mathrm{ext}\left(\mathcal{S}\right)}p\cdot b(P)$, and $\left(p_{0}^{k},p_{1}^{k}\right)$ is an extreme point of $\mathrm{conv}\left(\mathcal{S}_{2}\right)$.

Proof.

Let $\left\{\left(\widetilde{p}_{0}^{k},\widetilde{p}_{1}^{k}\right)\right\}_{k=1}^{M}\subseteq\mathcal{S}_{2}$ be the set of extremals of $\mathrm{conv}\left(\mathcal{S}_{2}\right)$. The set $\mathcal{S}_{2}$ is finite, hence its convex hull is bounded. We can write any point in $\mathcal{S}_{2}$ as a convex combination $\left(p_{0}^{i},p_{1}^{i}\right)=\sum_{k=1}^{M}\alpha_{ik}\left(\widetilde{p}_{0}^{k},\widetilde{p}_{1}^{k}\right)$, thus we have

$$\begin{aligned}p^{i}\cdot b(P)&=p_{0}^{i}+p_{1}^{i}P=\sum_{k=1}^{M}\alpha_{ik}\left(\widetilde{p}_{0}^{k}+\widetilde{p}_{1}^{k}P\right)\\ &\leq\max_{k}\left(\widetilde{p}_{0}^{k}+\widetilde{p}_{1}^{k}P\right)=\max_{k}p^{k}\cdot b(P).\end{aligned} \tag{42}$$

∎
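Theorem 8 suggests a simple pruning step: lines whose projections are not extreme points of $\mathrm{conv}(\mathcal{S}_{2})$ can be discarded without changing the maximum in (38). The sketch below illustrates this with a standard monotone-chain convex hull on hypothetical points $(p_{0},p_{1})$; the function names and the data are assumptions made for illustration only.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_vertices(points):
    """Extreme points of the convex hull (monotone chain, collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def envelope(points, P):
    """max over the given (p0, p1) pairs of p0 + p1 * P."""
    return max(p0 + p1 * P for p0, p1 in points)

# Hypothetical S_2; the point (0.3, -0.9) lies strictly inside the hull.
S2 = [(0.5, -2.0), (0.4, -1.0), (0.2, 0.0), (0.1, -0.5), (0.3, -0.9)]
extreme = hull_vertices(S2)
print(extreme)
for P in (0.0, 0.05, 0.1, 0.15, 0.2, 0.5, 1.0):
    # The envelope over extreme points matches the envelope over all of S_2.
    print(P, envelope(S2, P), envelope(extreme, P))
```

The converse does not hold: an extreme point of the hull, such as $(0.1,-0.5)$ in this example, need not be optimal for any $P\geq 0$.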

The results of Theorems 6 and 8 are illustrated in Fig. 2 for alphabet sizes $|\mathcal{X}|=3$ and $|\mathcal{Y}|=5$ , where we considered the TV distance and distortion given by a random matrix ${\bf D}$ . We numerically solve (33) for different values of $P$ along the DP tradeoff and project the optimal solutions according to (39)-(40). We also calculate the extreme points of the feasible set to obtain $\mathcal{S}_{2}$ (for a discussion about finding the vertices of a feasible set, we refer the reader to [22, Sec. 2.2]). It can be seen that optimal solutions to (33) correspond to the linear segments of the DP function, and are obtained on extreme points of $\mathrm{conv}\left(\mathcal{S}_{2}\right)$ in the $(p_{0},p_{1})$ -plane.

V-B Full characterization of channels with binary sources

We next focus on the case of binary sources, where $\mathcal{X}=\{x_{1},x_{2}\}$ with probabilities $p_{x_{1}},p_{x_{2}}$, respectively, and $\mathcal{Y}$ is of arbitrary size $n_{y}$. It suffices to analyze the TV distance (4) as the perceptual index, since every metric defining the Wasserstein-$1$ distance is proportional to the Hamming distance in the binary case. The distortion matrix is arbitrary, yielding the matrix $\rho^{\prime}$ defined in (32). Denote $u_{y}=\frac{1}{2}(\rho^{\prime}_{\hat{x}_{1}y}-\rho^{\prime}_{\hat{x}_{2}y})$, which is half the excess cost of reconstructing $y$ as $x_{1}$ over reconstructing it as $x_{2}$, and assume w.l.o.g. that $u_{y_{1}}\leq u_{y_{2}}\leq\ldots\leq u_{y_{n_{y}}}$. We define $P_{Y}^{-}(u)={\rm Pr}\{u_{Y}\leq u\}=\sum_{y:u_{y}\leq u}{\bf P}_{Y}(y)$, which is right-continuous with left limit $P_{Y}^{-}(u^{-})={\rm Pr}\{u_{Y}<u\}=\sum_{y:u_{y}<u}{\bf P}_{Y}(y)$. We further denote by $y^{*}_{i}$ the symbols whose $u_{y}$ is non-zero, namely

	$\displaystyle 0=u_{0}<u_{1}=u_{y^{*}_{1}}\leq\ldots\leq u_{M^{+}}=u_{y^{*}_{M^{+}}},$		(43)
	$\displaystyle u_{-{M^{-}}}=u_{y^{*}_{-M^{-}}}\leq\ldots\leq u_{-1}=u_{y^{*}_{-1}}<0=u_{0}.$		(44)

Theorem 9.

Assume that $p_{x_{1}}\geq P_{Y}^{-}(0)$ , and let $I=\max\{i\colon p_{x_{1}}\geq P_{Y}^{-}(u_{i})\}$ . Then, the DP function $D(P)$ is piecewise linear with breakpoints $\{P^{*}_{i}\}_{i=0}^{I}$ given by

	$\displaystyle P^{*}_{i}=p_{x_{1}}-P_{Y}^{-}(u_{i})$		(45)

where, specifically, $P_{0}^{*}=p_{x_{1}}-P_{Y}^{-}(0)=P^{*}$ . The DP function is then given by

	$\displaystyle D(P)=\begin{cases}D^{*},&P\geq P_{0}^{*}\\ D(P_{i-1}^{*})+2u_{i}\left(P_{i-1}^{*}-P\right),&P_{i}^{*}\leq P\leq P_{i-1}^{*}\\ D(P_{I}^{*})+2u_{I+1}\left(P_{I}^{*}-P\right),&0\leq P\leq P_{I}^{*}\end{cases}.$		(46)

If $P_{Y}^{-}(0^{-})\geq p_{x_{1}}$, then similarly $P^{*}_{0}=P_{Y}^{-}(0^{-})-p_{x_{1}}$ and $P^{*}_{i}=P_{Y}^{-}(u_{-i-1})-p_{x_{1}}$ for as long as this quantity is non-negative, and $D(P)$ is determined analogously. In the case $P_{Y}^{-}(0)\geq p_{x_{1}}\geq P_{Y}^{-}(0^{-})$, we have $P^{*}=0$ and $D(P)\equiv D^{*}$ for all $P\geq 0$.

Remark 10.

If $u_{i}=u_{i-1}$ then $P^{*}_{i}=P^{*}_{i-1}$ and this yields a ‘degenerate’ interval. If $u_{i}>u_{i-1}$ , then (45) can alternatively be written more simply as $P^{*}_{i}=P^{*}_{i-1}-{\bf P}_{Y}(y^{*}_{i})$ .

The results of Theorem 9 are illustrated in Fig. 1. These results confirm the intuition that channel outputs in $\mathcal{Y}$ should be mapped to symbols in $\{x_{1},x_{2}\}$ in a greedy fashion. At the point $P=1$, each $y$ is reconstructed with minimal penalty, without any perceptual constraint (as in Proposition 3). This can be done by setting, e.g., $q(\hat{x}_{1}|y)=\delta_{u_{y}\leq 0}$. At the point $P=P^{*}$, the $y$’s are still reconstructed optimally, but now under a perception constraint. This can be obtained by rearranging the mapping of the symbols for which $u_{y}=0$, which incurs no extra cost in distortion. Now, suppose that $x_{1}$ is not ‘fully allocated’, that is, $p_{x_{1}}\geq P^{-}_{Y}(0)$. As the perception constraint becomes more restrictive (lower $P$), the estimator seeks the minimal-cost symbols $y\in\mathcal{Y}$ that are mapped to $x_{1}$ with probability less than $1$, and increases this probability. For a small change $\Delta P$, the cost in distortion is $2u_{y}\Delta P$. This continues until $P=0$ is reached, namely $p_{\hat{x}_{1}}=p_{x_{1}}$.
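The greedy picture above can be made concrete with a short numerical sketch. The code below is only an illustration under stated assumptions: it covers the case $p_{x_{1}}\geq P_{Y}^{-}(0)$ of Theorem 9, it takes the values $u_{y}$ and ${\bf P}_{Y}(y)$ as given inputs (so that moving probability mass $\Delta P$ costs $2u_{y}\Delta P$, as stated in the text), and it returns the excess distortion $D(P)-D^{*}$ rather than $D(P)$, since $D^{*}$ depends on quantities defined elsewhere. The function names `binary_dp_curve` and `excess_distortion` and the numerical values are hypothetical.

```python
def binary_dp_curve(u, p_y, p_x1):
    """Breakpoints P*_i of (45) and segment slopes 2*u_i of (46).

    Sketch for the case p_x1 >= P_Y^-(0) only.
    """
    def cdf(t):
        # P_Y^-(t) = Pr{u_Y <= t}
        return sum(p for uy, p in zip(u, p_y) if uy <= t)

    if p_x1 < cdf(0.0):
        raise ValueError("this sketch assumes p_x1 >= P_Y^-(0)")

    # u_0 = 0 followed by the distinct positive values of u_y, ascending.
    # Using distinct values skips the degenerate intervals of Remark 10.
    levels = [0.0] + sorted({uy for uy in u if uy > 0})

    breakpoints = []
    for t in levels:
        gap = p_x1 - cdf(t)
        if gap < 0:
            break
        breakpoints.append(gap)

    # Segment k runs from breakpoints[k] down to the next breakpoint (or to 0)
    # and has slope 2 * levels[k + 1] in the direction of decreasing P.
    slopes = [2.0 * t for t in levels[1:len(breakpoints) + 1]]
    return breakpoints, slopes


def excess_distortion(P, breakpoints, slopes):
    """D(P) - D*, accumulated segment by segment as in (46)."""
    total = 0.0
    for k, slope in enumerate(slopes):
        hi = breakpoints[k]
        lo = breakpoints[k + 1] if k + 1 < len(breakpoints) else 0.0
        total += slope * max(0.0, hi - max(P, lo))
    return total


# Hypothetical example: four equiprobable outputs.
u = [-1.0, 0.0, 0.5, 2.0]
p_y = [0.25, 0.25, 0.25, 0.25]
breakpoints, slopes = binary_dp_curve(u, p_y, p_x1=0.875)
print(breakpoints, slopes)                          # [0.375, 0.125] [1.0, 4.0]
print(excess_distortion(0.0, breakpoints, slopes))  # 0.75
```

In this example, half of the output mass has $u_{y}\leq 0$, so $P^{*}=0.375$. Lowering $P$ first shifts the symbol with $u_{y}=0.5$ (slope $1$), and only then the symbol with $u_{y}=2$ (slope $4$).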

Corollary 11.

At the breakpoints where $P^{*}_{i}\neq 0$, an optimal estimator is given by a deterministic rule $Q_{P^{*}_{i}}$ (for $p_{x_{1}}\geq P_{Y}^{-}(0)$, given by $Q_{P^{*}_{i}}=\left\{q(x_{1}|y)=\delta_{u_{y}\leq u_{i}}\right\}$). Interestingly, at $P\in\left[P_{i}^{*},P_{i-1}^{*}\right]$, the estimator is given by the convex combination of the estimators at the interval edges, $Q_{P}=\alpha Q_{P_{i-1}^{*}}+(1-\alpha)Q_{P_{i}^{*}}$, with $\alpha=\frac{P-P_{i}^{*}}{P_{i-1}^{*}-P_{i}^{*}}$.

This result implies that in order to construct an estimator for any point along the tradeoff at test time, without any additional calculations, it is sufficient to calculate $\mathcal{O}(|\mathcal{Y}|)$ estimators beforehand, one at each breakpoint (and at $P=0$ ).
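As a minimal sketch of how such an interpolation could be carried out at test time, the code below builds the deterministic rules of Corollary 11 and mixes the two rules at the edges of the interval containing $P$. It assumes the case $p_{x_{1}}\geq P_{Y}^{-}(0)$, strictly positive output probabilities, and a value of $P$ between the last and the first breakpoint. The function name `interpolated_estimator` and the example values are hypothetical.

```python
def interpolated_estimator(u, p_y, p_x1, P):
    """q(x1|y) for every y at perception level P, following Corollary 11.

    Assumes p_x1 >= P_Y^-(0) and p_y[k] > 0 for all k.
    """
    def cdf(t):
        # P_Y^-(t) = Pr{u_Y <= t}
        return sum(p for uy, p in zip(u, p_y) if uy <= t)

    levels = [0.0] + sorted({uy for uy in u if uy > 0})

    # (u_i, P*_i) pairs from (45), kept while P*_i is non-negative.
    pairs = []
    for t in levels:
        gap = p_x1 - cdf(t)
        if gap < 0:
            break
        pairs.append((t, gap))
    if not pairs or not (pairs[-1][1] <= P <= pairs[0][1]):
        raise ValueError("P must lie between the last and first breakpoints")

    def rule(t):
        # Deterministic estimator at a breakpoint: q(x1|y) = 1 iff u_y <= t.
        return [1.0 if uy <= t else 0.0 for uy in u]

    for (t_prev, b_prev), (t_next, b_next) in zip(pairs, pairs[1:]):
        if b_next <= P <= b_prev:
            alpha = (P - b_next) / (b_prev - b_next)
            return [alpha * a + (1.0 - alpha) * b
                    for a, b in zip(rule(t_prev), rule(t_next))]
    return rule(pairs[0][0])  # a single breakpoint, so P equals P*_0


# Hypothetical example: four equiprobable outputs.
u = [-1.0, 0.0, 0.5, 2.0]
p_y = [0.25, 0.25, 0.25, 0.25]
p_x1 = 0.875
q = interpolated_estimator(u, p_y, p_x1, P=0.25)
p_hat = sum(p * qi for p, qi in zip(p_y, q))
print(q, p_x1 - p_hat)  # [1.0, 1.0, 0.5, 0.0] 0.25
```

In this example, the resulting gap $p_{x_{1}}-p_{\hat{x}_{1}}$ equals the requested $P$, so the mixed estimator meets the perception constraint with equality.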

Acknowledgements

The research of NW was partially supported by the Israel Science Foundation (ISF), grant no. 1782/22. The work of RM was partially supported by the Skillman chair in biomedical sciences and by the Ollendorff Minerva Center, ECE Faculty, Technion.

References

[1] T. Adrai, G. Ohayon, T. Michaeli, and M. Elad, “Deep optimal transport: A practical algorithm for photo-realistic image restoration,” arXiv preprint arXiv:2306.02342, 2023.
[2] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018, pp. 0–0.
[3] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
[4] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4681–4690.
[5] F. Nielsen, “Hypothesis testing, information divergence and computational geometry,” in International Conference on Geometric Science of Information. Springer, 2013, pp. 241–248.
[6] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6228–6237.
[7] D. Freirich, T. Michaeli, and R. Meir, “A theory of the distortion-perception tradeoff in Wasserstein space,” Advances in Neural Information Processing Systems, vol. 34, pp. 25 661–25 672, 2021.
[8] J. Qian, G. Zhang, J. Chen, and A. Khisti, “A rate-distortion-perception theory for binary sources,” in International Zurich Seminar on Information and Communication (IZS 2022). Proceedings. ETH Zurich, 2022, pp. 34–38.
[9] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2008.
[10] R. Van Handel, “Probability in high dimension,” Lecture Notes (Princeton University), 2014.
[11] M. Raginsky, I. Sason et al., “Concentration of measure inequalities in information theory, communications, and coding,” Foundations and Trends® in Communications and Information Theory, vol. 10, no. 1-2, pp. 1–246, 2013.
[12] Y. Bai, X. Wu, and A. Özgür, “Information constrained optimal transport: From Talagrand, to Marton, to Cover,” IEEE Transactions on Information Theory, vol. 69, no. 4, pp. 2059–2073, 2023.
[13] N. Saldi, T. Linder, and S. Yüksel, “Randomized quantization and source coding with constrained output distribution,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 91–106, 2015.
[14] ——, “Output constrained lossy source coding with limited common randomness,” IEEE Transactions on Information Theory, vol. 61, no. 9, pp. 4984–4998, 2015.
[15] D. Freirich, T. Michaeli, and R. Meir, “Perceptual Kalman filters: Online state estimation under a perfect perceptual-quality constraint,” arXiv preprint arXiv:2306.02400, 2023.
[16] Y. Blau and T. Michaeli, “Rethinking lossy compression: The rate-distortion-perception tradeoff,” in International Conference on Machine Learning. PMLR, 2019, pp. 675–685.
[17] S. Salehkalaibar, B. Phan, A. Khisti, and W. Yu, “Rate-distortion-perception tradeoff based on the conditional perception measure,” in 2023 Biennial Symposium on Communications (BSC). IEEE, 2023, pp. 31–37.
[18] Z. Yan, F. Wen, and P. Liu, “Optimally controllable perceptual lossy compression,” arXiv preprint arXiv:2206.10082, 2022.
[19] L. Theis and A. B. Wagner, “A coding theorem for the rate-distortion-perception function,” arXiv preprint arXiv:2104.13662, 2021.
[20] A. B. Wagner, “The rate-distortion-perception tradeoff: The role of common randomness,” arXiv preprint arXiv:2202.04147, 2022.
[21] J. Chen, L. Yu, J. Wang, W. Shi, Y. Ge, and W. Tong, “On the rate-distortion-perception function,” arXiv preprint arXiv:2204.06049, 2022.
[22] D. Bertsimas and J. N. Tsitsiklis, Introduction to linear optimization. Athena Scientific Belmont, MA, 1997, vol. 6.
[23] R. J. Vanderbei et al., Linear programming. Springer, 2020.

In this Appendix, we start with an extended review of Linear Programs and their Dual Problems. We derive the dual forms of both formulations (13) and (20) (Eq. (33)). We then provide a detailed proof for Theorem 9 in the text.

-C The linear optimization problem and strong duality

Consider the general Linear Programming (LP) problem [22]

	$\displaystyle\begin{cases}\rho\bullet{\bf Q}&\rightarrow\min_{\bf Q}\\ \mathrm{s.t.}&a_{i}\bullet{\bf Q}=b_{i},\,i\in M_{1}\\ &s_{i}\bullet{\bf Q}\leq b_{i},\,i\in M_{2}\\ &{\bf Q}\geq 0\end{cases}~.$		(47)

Here ${\bf Q},\rho,a_{i},s_{i}$ are real $|\mathcal{X}|\times|\mathcal{Y}|$ matrices, and $b=\{b_{i}\}_{i\in M}\in\mathbb{R}^{n_{c}}$. The Dual Linear Programming problem (DLP) is given by

	$\displaystyle\begin{cases}w^{\top}b&\rightarrow\max_{w}\\ \mathrm{s.t.}&w_{i}\leq 0,\,i\in M_{2}\\ &w_{i}\in\mathbb{R},\,i\in M_{1}\\ &\sum_{i\in M_{1}}w_{i}\{a_{i}\}_{x,y}+\sum_{i\in M_{2}}w_{i}\{s_{i}\}_{x,y}\leq\rho_{x,y},\\ &\forall x,y\in\mathcal{X}\times\mathcal{Y}\end{cases}.$		(48)

Recall that, by a slight abuse of notation and similarly to the main text, we use $x=x_{\alpha}$ and $y=y_{\beta}$ to denote their indices $\alpha$ and $\beta$, respectively.

Dual problems are useful for establishing lower bounds on the optimal value, due to the property of weak duality, which ensures that every feasible value of the Primal problem is greater than or equal to every feasible value of its Dual, yielding (when both problems are feasible)

	$\displaystyle\min_{{\bf Q}}\rho\bullet{\bf Q}\geq\max_{w}w^{\top}b.$		(49)

For feasible, bounded LP problems, strong duality further holds; namely, the problem (48) is feasible and

	$\displaystyle\min_{{\bf Q}}\rho\bullet{\bf Q}=\max_{w}w^{\top}b.$		(50)
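One practical consequence of (49)-(50) is that a primal-feasible ${\bf Q}$ and a dual-feasible $w$ with equal objective values certify that both are optimal. The sketch below checks this on a small hypothetical instance of (47)-(48) with $|\mathcal{X}|=|\mathcal{Y}|=2$. All numerical values are illustrative assumptions, and the product $\bullet$ is assumed to be the sum of elementwise products, consistent with the entrywise dual constraint in (48).

```python
def inner(a, b):
    """Assumed meaning of the bullet product: sum of elementwise products."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical instance of (47); rows index x, columns index y.
rho = [[1.0, 3.0], [2.0, 1.0]]
A = [[[1.0, 0.0], [1.0, 0.0]],       # a_1: column y_1 of Q sums to 1
     [[0.0, 1.0], [0.0, 1.0]]]       # a_2: column y_2 of Q sums to 1
b_eq = [1.0, 1.0]                    # b_i for i in M_1
S = [[[-1.0, -1.0], [0.0, 0.0]]]     # s_1: -(Q[0][0] + Q[0][1]) <= -1.5
b_in = [-1.5]                        # b_i for i in M_2

Q = [[1.0, 0.5], [0.0, 0.5]]         # candidate primal point
w_eq, w_in = [-1.0, 1.0], [-2.0]     # candidate dual point

primal_feasible = (
    all(inner(a, Q) == b for a, b in zip(A, b_eq))
    and all(inner(s, Q) <= b for s, b in zip(S, b_in))
    and all(q >= 0 for row in Q for q in row)
)
dual_feasible = all(w <= 0 for w in w_in) and all(
    sum(w * a[x][y] for w, a in zip(w_eq, A))
    + sum(w * s[x][y] for w, s in zip(w_in, S)) <= rho[x][y]
    for x in range(2) for y in range(2)
)
primal_value = inner(rho, Q)
dual_value = (sum(w * b for w, b in zip(w_eq, b_eq))
              + sum(w * b for w, b in zip(w_in, b_in)))

print(primal_feasible, dual_feasible, primal_value, dual_value)
# True True 3.0 3.0
```

Since both points are feasible and the two objective values coincide, weak duality (49) implies that neither can be improved, which is the equality asserted in (50) for this instance.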

-D A dual form for the TV distance setting

For the analysis that follows, it is convenient to derive the dual of the formulation (13) of $D(P)$. In this formulation, we have $\rho={\bf D}^{\top}{\bf P}_{X,Y}$, and we can write the parameter $b$ in (47) as

	$\displaystyle b^{\top}=b(P)^{\top}=\left[p_{y_{1}},\ldots,p_{y_{n}},2P-S_{1}^{\top}{\bf P}_{X},\ldots,2P-S_{2^{|\mathcal{X}|}-2}^{\top}{\bf P}_{X}\right],$		(51)

where $P$ is the perception index. Also,

	$\displaystyle a_{j}$	$\displaystyle={\bf P}_{Y}({y_{j}})\boldsymbol{1}^{|\mathcal{X}|\top}e_{j},\,j=1,\ldots,|\mathcal{Y}|~,$		(52)
	$\displaystyle s_{i}$	$\displaystyle=S_{i}{\bf P}_{Y}^{\top},\,i=1,\ldots,2^{|\mathcal{X}|}-2~,$		(53)

Here, $S_{i}$ are the vectors of the set $\left\{-1,1\right\}^{|\mathcal{X}|}\backslash\left\{\pm\left[1,\ldots,1\right]\right\}$, and $e_{j}$ is the $j$-th unit vector in the standard basis.
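To illustrate why sign vectors can encode a TV constraint with linear inequalities, the sketch below enumerates the set of $S_{i}$ and uses the standard identity $\|p-q\|_{1}=\max_{S\in\{-1,1\}^{n}}S^{\top}(p-q)$. It assumes that the TV distance is half the $\ell_{1}$ distance between the two probability vectors. The two constant vectors $\pm[1,\ldots,1]$ always give $S^{\top}(p-q)=0$ for probability vectors, so dropping them does not change the maximum. The function names `sign_vectors` and `tv_via_signs` and the example vectors are hypothetical.

```python
from itertools import product


def sign_vectors(n):
    """All vectors in {-1, 1}^n except the two constant ones (requires n >= 2)."""
    all_plus = (1,) * n
    all_minus = (-1,) * n
    return [s for s in product((-1, 1), repeat=n) if s not in (all_plus, all_minus)]


def tv_via_signs(p, q):
    """TV distance computed as half the maximum of S^T (p - q) over the sign set."""
    return 0.5 * max(
        sum(si * (pi - qi) for si, pi, qi in zip(s, p, q))
        for s in sign_vectors(len(p))
    )


p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(len(sign_vectors(3)))  # 6, i.e. 2**3 - 2
print(tv_via_signs(p, q))    # 0.25
```

Bounding this maximum by a constant is equivalent to imposing one linear inequality per sign vector, which is why the number of perception-related entries in $b$ is $2^{|\mathcal{X}|}-2$.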

For convenience, let us split the decision variables in (48) into two groups: the $|\mathcal{Y}|$ variables $\{w_{y}\}_{y\in\mathcal{Y}}$ related to the stochasticity constraint for each symbol in $\mathcal{Y}$, and the $2^{|\mathcal{X}|}-2$ variables $\{\nu_{i}\}$ related to the perception constraints (12). Now, (48) becomes

\begin{cases}[w^{\top},\nu^{\top}]b&\rightarrow\max_{w,\nu}\\ \mathrm{s.t.}&\nu_{i}\leq 0,\,\,i=1,\ldots,2^{|\mathcal{X}|}-2\\ &w_{y}\in\mathbb{R},\,\forall y\in\mathcal{Y}\\ &\sum_{j}w_{j}\{a_{j}\}_{x,y}-\sum_{i}\nu_{i}\{s_{i}\}_{x,y}\leq\rho_{x,y},\\ &\forall x,y\in\mathcal{X}\times\mathcal{Y}\end{cases},