Characterization of the Distortion-Perception Tradeoff for Finite Channels with Arbitrary Metrics
License: arXiv.org perpetual non-exclusive license
arXiv:2402.02265v1 [cs.IT] 03 Feb 2024


Dror Freirich, Nir Weinberger and Ron Meir
Viterbi Faculty of Electrical and Computer Engineering
Technion - Israel Institute of Technology
Abstract

Whenever inspected by humans, reconstructed signals should not be distinguished from real ones. Typically, such a high perceptual quality comes at the price of high reconstruction error, and vice versa. We study this distortion-perception (DP) tradeoff over finite-alphabet channels, for the Wasserstein-$1$ distance induced by a general metric as the perception index, and an arbitrary distortion matrix. Under this setting, we show that computing the DP function and the optimal reconstructions is equivalent to solving a set of linear programming problems. We provide a structural characterization of the DP tradeoff, where the DP function is piecewise linear in the perception index. We further derive a closed-form expression for the case of binary sources.

I Introduction

Figure 1: The distortion-perception (DP) function. (Left) The minimal distortion possible for a certain level of perceptual quality forms a convex, non-increasing curve. The region below the curve cannot be attained by any reconstruction method. (Right) In our discrete setting, $D(P)$ is a piecewise linear function. Breakpoints $P^{*}_{i}$ and slopes $2u_{i}$ are given explicitly by Theorem 9 for binary sources.

The reconstruction of a signal from degraded data is required in numerous settings across science and engineering. Until recently, the performance of reconstruction algorithms has been measured by their mean distortion, such as mean squared error (MSE). For that reason, many methods aimed to optimize distortion-based measures such as MSE and peak signal-to-noise ratio (PSNR). However, in systems whose outputs are inspected by human users, reconstructions should not be easily distinguished from signals typical of the source domain. Therefore, many current works target perceptual quality rather than distortion (e.g., in image restoration, see [1, 2, 3, 4]).

Mathematically, the optimal probability of success in a hypothesis test between two distributions is known to increase linearly with the Total-Variation (TV) distance between them [5]. Hence, high perceptual quality is considered to be achieved when the distribution of restored signals is close to the distribution of real signals [6].

Good perceptual quality generally comes at the price of high reconstruction error and vice versa. This leads to a tradeoff between distortion and perception, first studied in [6]. The central problem is thus to quantify the distortion-perception (DP) function, which is the minimal distortion possible for a certain level of perceptual quality. The DP problem was studied by various authors. Specifically, [7] studied the DP function in real spaces, for the MSE distortion and the Wasserstein-$2$ perception index. In discrete spaces, [8] characterized the special case of a binary source, for the Hamming distortion and the TV perception index.

In this paper, we focus on discrete spaces, and investigate the DP tradeoff for general finite-alphabet channels and general distortion matrices. As the perception index, we consider the Wasserstein-$1$ distance induced by a general metric, which generalizes the TV distance [9, 10, 11]. We show that finding the DP function and the optimal reconstruction for this setting is equivalent to solving a set of linear programming problems, and that the result is always a piecewise linear function of the perception index, regardless of the channel size, the underlying distributions or the distortion measure. This stems from the properties of the dual feasible set. We further revisit the binary setting of [8], and derive a closed-form expression for the DP function, now considering a general distortion measure. We provide a self-contained proof for this case based on our novel analysis of the general setting.

II Preliminaries

II-A The distortion-perception tradeoff

Let $X,Y$ be random variables taking values in some complete separable metric spaces $\mathcal{X},\mathcal{Y}$, respectively. We assume the existence of the joint probability $p_{X,Y}$ on $\mathcal{X}\times\mathcal{Y}$, and a Borel lower-bounded distortion function $d:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}^{+}\cup\{0\}$. An estimator $\hat{X}\in\mathcal{X}$ is a random variable on $\mathcal{X}$, defined by its distribution conditioned on the measurement $Y$, $p_{\hat{X}|Y}$, with marginal distribution $p_{\hat{X}}$.

An optimal estimator for the DP tradeoff is an estimator that minimizes the expected distortion $\mathbb{E}[d(X,\hat{X})]$ under the perception constraint $d_{p}(p_{X},p_{\hat{X}})\leq P$. Here, $d_{p}$ is a divergence between probability measures. [6] introduced the DP function

$$D(P)\triangleq\min_{p_{\hat{X}|Y}}\left\{\mathbb{E}[d(X,\hat{X})]\;:\;d_{p}(p_{X},p_{\hat{X}})\leq P\right\}. \tag{1}$$

The expectation is taken w.r.t. the probability measure induced by $p_{X,Y}$ and $p_{\hat{X}|Y}$, where we assume that $X,\hat{X}$ are independent given $Y$. We have the following result of [6, Thm. 2].

Theorem 1.

(The perception-distortion tradeoff). If $d_{p}(p,q)$ is convex in its second argument, then the distortion-perception function (1) is monotonically non-increasing and convex.

This is the case for the TV and Wasserstein distances discussed in this paper.

II-B Related work

Apart from the general properties given by Theorem 1, the precise nature of DP functions depends on the exact setup. [7] fully characterized this function in real spaces, considering the MSE and Wasserstein-$2$ indices. Under this setting, the distortion-perception function is always quadratic, and possesses a closed-form expression for Gaussian channels. In this work we discuss discrete signals, where we provide an analogous structural characterization, in which $D(P)$ is always piecewise linear (Theorem 6).

Reconstruction problems with constrained output distributions were studied in optimal transport [12], lossy compression and quantization [13, 14]. Recently, [15] investigated the cost of perfect perceptual consistency constraints in online estimation settings. The DP tradeoff was also extended to lossy compression through the rate-distortion-perception (RDP) function [16, 17, 18], which is the minimal rate of a code whose decoding allows a desired tradeoff between reconstruction and perceptual quality. A coding theorem was introduced for this setting [19, 20], and the properties of optimal codes were investigated in [21].

In the context of RDP theory, [8] investigated channels with binary sources. They showed [8, Thm. 7] that for the Hamming loss with the TV perceptual index, the DP function is a piecewise linear function whose breakpoints are given by an explicit formula. Here, we extend this result, considering an arbitrary distortion measure (Theorem 9).

III Problem formulation

In this paper, we discuss the discrete case, where $\mathcal{X}$ and $\mathcal{Y}$ are finite spaces. Let $X,Y$ be discrete variables defined on finite alphabets $\mathcal{X}=\{x_{1},\ldots,x_{n_{x}}\}$, $\mathcal{Y}=\{y_{1},\ldots,y_{n_{y}}\}$, where $X$ is the variable of interest, and $Y$ is a measurement of $X$ over a noisy channel. Their joint probability $p_{X,Y}\in\mathcal{P}(\mathcal{X}\times\mathcal{Y})$ is represented by the matrix $\mathbf{P}_{X,Y}=\{p(x,y)\}_{x,y\in\mathcal{X}\times\mathcal{Y}}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}$, and the marginal distributions $p_{X}$ and $p_{Y}$ are given by the vectors $\mathbf{P}_{X}\in\mathbb{R}^{|\mathcal{X}|}$, $\mathbf{P}_{Y}\in\mathbb{R}^{|\mathcal{Y}|}$. We assume that for each letter in the channel's output, $p_{Y}(y_{i})>0$ (i.e., we ignore unused symbols). A randomized estimator $\hat{X}\in\mathcal{X}$ of $X$ from $Y$ is defined by a stochastic transition matrix $\mathbf{Q}=\mathbf{Q}_{\hat{X}|Y}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}$ whose entries are the probabilities $q(\hat{x}|y)$ to reconstruct the symbol $\hat{x}\in\mathcal{X}$ given that the channel output is $Y=y\in\mathcal{Y}$. We assume the Markov relation where $X,\hat{X}$ are independent given $Y$. The arbitrary distortion matrix is given by $\mathbf{D}=\{d(x,\hat{x})\}_{x,\hat{x}\in\mathcal{X}^{2}}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{X}|}$, where the expected distortion

$$\mathbb{E}_{\mathbf{Q}}\left[d(X,\hat{X})\right]=\mathrm{Tr}\left\{\mathbf{P}_{X,Y}^{\top}\mathbf{D}\mathbf{Q}\right\} \tag{2}$$

should be minimized w.r.t. $q(\hat{x}|y)$, $\hat{x},y\in\mathcal{X}\times\mathcal{Y}$. The marginal distribution $p_{\hat{X}}$ of $\hat{X}$ is given by the vector $\mathbf{P}_{\hat{X}}=\mathbf{Q}\mathbf{P}_{Y}$. We are interested in analyzing the distortion-perception (DP) function [6]

$$D(P)=\min_{\mathbf{Q}_{\hat{X}|Y}}\left\{\mathbb{E}_{\mathbf{Q}}\left[d(X,\hat{X})\right]\;:\;d_{p}(p_{X},p_{\hat{X}})\leq P\right\}. \tag{3}$$
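As a quick sanity check of (2) and of the marginal $\mathbf{P}_{\hat{X}}=\mathbf{Q}\mathbf{P}_{Y}$, the following sketch evaluates both sides of (2) on a toy binary channel. The joint distribution, the Hamming distortion and the estimator `Q` are illustrative assumptions, not values from the paper; exact rationals are used so that the two sides can be compared without rounding.

```python
from fractions import Fraction as F

# Toy binary channel; all numbers are illustrative assumptions.
P_XY = [[F(4, 10), F(2, 10)],   # P_XY[x][y] = p(x, y)
        [F(1, 10), F(3, 10)]]
D = [[F(0), F(1)],              # D[x][x_hat] = d(x, x_hat), Hamming here
     [F(1), F(0)]]
Q = [[F(9, 10), F(3, 10)],      # Q[x_hat][y] = q(x_hat | y), columns sum to 1
     [F(1, 10), F(7, 10)]]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


# Left-hand side of (2): explicit sum over x, y and x_hat.
direct = sum(P_XY[x][y] * D[x][xh] * Q[xh][y]
             for x in range(2) for y in range(2) for xh in range(2))

# Right-hand side of (2): Tr{P_XY^T D Q}.
via_trace = trace(matmul(matmul(transpose(P_XY), D), Q))

# Marginals: P_Y sums the joint over x, and P_Xhat = Q P_Y.
P_X = [sum(row) for row in P_XY]
P_Y = [sum(col) for col in zip(*P_XY)]
P_Xhat = [sum(q * p for q, p in zip(row, P_Y)) for row in Q]
```

For this particular `Q` the output marginal coincides with $\mathbf{P}_{X}$, so the estimator would satisfy the perception constraint in (3) even for $P=0$.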

For simplicity, let us first consider the TV distance as the perceptual index $d_{p}$, given by

\begin{align}
d_{TV}(\mathbf{P}_{X},\mathbf{P}_{\hat{X}}) &\triangleq\frac{1}{2}\sum_{x\in\mathcal{X}}|\mathbf{P}_{X}(x)-\mathbf{P}_{\hat{X}}(x)| \nonumber\\
&=\sup_{A\subseteq\mathcal{X}}|p_{X}(A)-p_{\hat{X}}(A)|. \tag{4}
\end{align}
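The two expressions in (4) can be compared by brute force on a small alphabet. The sketch below is only an illustration: the distributions `p` and `q` are arbitrary assumptions, and the enumeration over all events $A\subseteq\mathcal{X}$ is exponential in the alphabet size.

```python
from fractions import Fraction as F
from itertools import combinations


def tv_half_l1(p, q):
    """First line of (4): half the L1 distance between the two vectors."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def tv_sup_over_events(p, q):
    """Second line of (4): largest probability gap over all events A."""
    best = F(0)
    for size in range(len(p) + 1):
        for A in combinations(range(len(p)), size):
            gap = abs(sum(p[i] for i in A) - sum(q[i] for i in A))
            best = max(best, gap)
    return best


# Illustrative distributions on a three-letter alphabet (assumptions).
p = [F(1, 2), F(3, 10), F(1, 5)]
q = [F(1, 5), F(2, 5), F(2, 5)]

tv_a = tv_half_l1(p, q)
tv_b = tv_sup_over_events(p, q)
```

In this example the supremum is attained by the single-letter event containing only the first symbol.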

Note that using this definition, $d_{TV}(\mathbf{P}_{X},\mathbf{P}_{\hat{X}})\in[0,1]$, and $d_{TV}(\mathbf{P}_{X},\mathbf{P}_{\hat{X}})=0$ iff $\mathbf{P}_{X}=\mathbf{P}_{\hat{X}}$. Now,

$$D(P)=\min_{\mathbf{Q}\geq 0}\left\{(\mathbf{D}^{\top}\mathbf{P}_{X,Y})\bullet\mathbf{Q}\;:\;\begin{array}{c}\boldsymbol{1}^{|\mathcal{X}|}\cdot\mathbf{Q}=\boldsymbol{1}^{|\mathcal{Y}|}\\ d_{TV}(\mathbf{P}_{X},\mathbf{Q}\mathbf{P}_{Y})\leq P\end{array}\right\}, \tag{5}$$

where $A\bullet B=\mathrm{Tr}\left\{A^{\top}B\right\}$ is the Frobenius inner product, $\boldsymbol{1}^{d}$ is the $1\times d$ dimensional all-ones vector, and for $\mathbf{Q}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}$ the constraint $\mathbf{Q}\geq 0$ is applied elementwise. We start by presenting some elementary properties of (5).

Proposition 2.

Let $P\in[0,1]$. The optimization problem (5) is feasible (namely, the constraints are satisfiable), and its optimal value is bounded from below.

Proof.

The posterior sampling solution $\mathbf{Q}=\mathbf{P}_{X|Y}=\{p_{X,Y}(x,y)/p_{Y}(y)\}_{x,y\in\mathcal{X}\times\mathcal{Y}}$ is feasible for every $P\geq 0$, since $\mathbf{P}_{\hat{X}}=\mathbf{Q}\mathbf{P}_{Y}=\mathbf{P}_{X}$, yielding $d_{TV}(\mathbf{P}_{\hat{X}},\mathbf{P}_{X})=0$. For every stochastic matrix $\mathbf{Q}$,

$$(\mathbf{D}^{\top}\mathbf{P}_{X,Y})\bullet\mathbf{Q}\in\left[\min D_{x,\hat{x}},\max D_{x,\hat{x}}\right], \tag{9}$$

hence the optimal value is bounded. ∎
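The two steps of the proof can be replayed on a toy example. The sketch below builds the posterior sampling matrix, checks that it is column-stochastic and reproduces $\mathbf{P}_{X}$ at the output, and checks the bound (9). The joint distribution and the Hamming distortion are illustrative assumptions.

```python
from fractions import Fraction as F

# Toy joint distribution and Hamming distortion (illustrative assumptions).
P_XY = [[F(4, 10), F(2, 10)],   # P_XY[x][y] = p(x, y)
        [F(1, 10), F(3, 10)]]
D = [[F(0), F(1)],              # D[x][x_hat] = d(x, x_hat)
     [F(1), F(0)]]
n_x, n_y = len(P_XY), len(P_XY[0])

P_X = [sum(row) for row in P_XY]
P_Y = [sum(col) for col in zip(*P_XY)]

# Posterior sampling: Q[x_hat][y] = p(x_hat, y) / p_Y(y).
Q = [[P_XY[xh][y] / P_Y[y] for y in range(n_y)] for xh in range(n_x)]

# Feasibility: columns of Q sum to one and the output marginal equals P_X,
# so the TV distance to P_X is zero and the constraint holds for any P >= 0.
col_sums = [sum(col) for col in zip(*Q)]
P_Xhat = [sum(q * p for q, p in zip(row, P_Y)) for row in Q]

# Boundedness: the expected distortion is an average of entries of D.
distortion = sum(P_XY[x][y] * D[x][xh] * Q[xh][y]
                 for x in range(n_x) for y in range(n_y) for xh in range(n_x))
d_min = min(min(row) for row in D)
d_max = max(max(row) for row in D)
```

Posterior sampling is only a feasible point here; it is generally not the minimizer of (5).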

Proposition 3.

Denote the matrix $\rho\triangleq\mathbf{D}^{\top}\mathbf{P}_{X,Y}$, whose entries are given by $\rho_{\hat{x},y}=\mathbf{P}_{Y}(y)\,\mathbb{E}\left[d(X,\hat{x})|Y=y\right]$. Then, for any $P\geq 1$,

$$D(P)=\sum_{y}\min_{\hat{x}\in\mathcal{X}}\rho_{\hat{x},y}\triangleq D^{*}. \tag{10}$$

A corresponding optimal estimator is given by

$$\hat{X}^{*}(Y)\in\mathop{\mathrm{argmin}}_{\hat{x}}\rho_{\hat{x},Y}. \tag{11}$$

Trivially, $D(P)\geq D^{*}$ holds for every $P\in[0,1]$.

The proof is straightforward.
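A minimal sketch of (10) and (11), assuming the same kind of toy binary channel with Hamming distortion (the numbers are illustrative, not taken from the paper). It forms $\rho=\mathbf{D}^{\top}\mathbf{P}_{X,Y}$, picks a column-wise minimizer for each $y$, and reports the TV distance of the resulting deterministic estimator.

```python
from fractions import Fraction as F

# Toy joint distribution and Hamming distortion (illustrative assumptions).
P_XY = [[F(4, 10), F(2, 10)],   # P_XY[x][y] = p(x, y)
        [F(1, 10), F(3, 10)]]
D = [[F(0), F(1)],              # D[x][x_hat] = d(x, x_hat)
     [F(1), F(0)]]
n_x, n_y = len(P_XY), len(P_XY[0])

# rho[x_hat][y] = sum_x d(x, x_hat) p(x, y), i.e. rho = D^T P_XY.
rho = [[sum(D[x][xh] * P_XY[x][y] for x in range(n_x)) for y in range(n_y)]
       for xh in range(n_x)]

# (11): for each y choose an x_hat minimizing rho[x_hat][y]
# (ties are broken by the lowest index).
x_hat_star = [min(range(n_x), key=lambda xh: rho[xh][y]) for y in range(n_y)]

# (10): D* is the sum of the column-wise minima of rho.
D_star = sum(rho[x_hat_star[y]][y] for y in range(n_y))

# Perception index of this estimator: TV between P_X and its output marginal.
P_X = [sum(row) for row in P_XY]
P_Y = [sum(col) for col in zip(*P_XY)]
P_Xhat = [sum(P_Y[y] for y in range(n_y) if x_hat_star[y] == xh)
          for xh in range(n_x)]
tv = sum(abs(a - b) for a, b in zip(P_X, P_Xhat)) / 2
```

In this toy case the unconstrained minimizer already has a TV distance of $1/10$, so $D^{*}$ is reached well before $P=1$; the proposition only guarantees it for $P\geq 1$.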

IV Linear Programming formulation

We now observe that the perceptual constraint $\frac{1}{2}\sum_{x\in\mathcal{X}}|\mathbf{P}_{X}(x)-\sum_{y\in\mathcal{Y}}\mathbf{P}_{Y}(y)\mathbf{Q}(x|y)|\leq P$ in (5) is equivalent to the set of linear constraints

$$\sum_{x\in\mathcal{X}}\pm\left(\mathbf{P}_{X}(x)-\sum_{y\in\mathcal{Y}}\mathbf{P}_{Y}(y)\mathbf{Q}(x|y)\right)\leq 2P. \tag{12}$$

Taking all possible sign combinations, we obtain $2^{|\mathcal{X}|}$ linear constraints, where the $2$ constraints for which the signs are either all positive or all negative are redundant, since for probability vectors $\mathbf{P}_{X},\mathbf{P}_{Y}$ and a stochastic matrix $\mathbf{Q}$ the LHS of (12) vanishes. Together with (5), we can reformulate the DP function as the following Linear Program (LP) [22, 23]

$$D(P)=\min_{\mathbf{Q}\geq 0}\left\{\rho\bullet\mathbf{Q}\;:\;\begin{array}{c}\boldsymbol{1}^{|\mathcal{X}|}\cdot\mathbf{Q}=\boldsymbol{1}^{|\mathcal{Y}|},\ \mathbf{Q}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|}\\ \sum_{x\in\mathcal{X}}\pm\left(\mathbf{P}_{X}(x)-\sum_{y\in\mathcal{Y}}\mathbf{P}_{Y}(y)\mathbf{Q}_{x|y}\right)\leq 2P\end{array}\right\}. \tag{13}$$

In (13), we have $|\mathcal{X}|\times|\mathcal{Y}|$ variables (the entries of $\mathbf{Q}=\{q(\hat{x}|y)\}$), and $|\mathcal{Y}|+2^{|\mathcal{X}|}-2$ constraints.
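The linearization step behind (12) can be illustrated by enumerating all sign patterns for a fixed pair of marginals. In the sketch below, `P_Xhat` stands in for $\mathbf{Q}\mathbf{P}_{Y}$ and both vectors are illustrative assumptions; the code shows that the largest left-hand side equals $2\,d_{TV}$ and that the all-plus and all-minus patterns evaluate to zero.

```python
from fractions import Fraction as F
from itertools import product


def sign_pattern_values(P_X, P_Xhat):
    """Left-hand side of (12) for every sign pattern in {+1, -1}^{|X|}."""
    diff = [a - b for a, b in zip(P_X, P_Xhat)]
    return {signs: sum(s * d for s, d in zip(signs, diff))
            for signs in product((1, -1), repeat=len(diff))}


# Illustrative marginals (assumptions); P_Xhat plays the role of Q P_Y.
P_X = [F(1, 2), F(3, 10), F(1, 5)]
P_Xhat = [F(1, 5), F(2, 5), F(2, 5)]

values = sign_pattern_values(P_X, P_Xhat)
tv = sum(abs(a - b) for a, b in zip(P_X, P_Xhat)) / 2

# The tightest pattern gives exactly 2 * d_TV, so requiring every pattern
# to be at most 2P is the same as requiring d_TV <= P.
tightest = max(values.values())

# The all-plus and all-minus patterns are redundant: both vectors sum to one.
redundant = (values[(1, 1, 1)], values[(-1, -1, -1)])
```

The number of patterns grows as $2^{|\mathcal{X}|}$, which is the reason the constraint count of (13) is exponential in the alphabet size.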

IV-A Total Variation as a Wasserstein distance

Let $\mathbf{H}=\{1-\delta_{x,\hat{x}}\}_{x,\hat{x}\in\mathcal{X}\times\mathcal{X}}$ be the Hamming distance matrix, let $\mathcal{P}(\mathcal{X})$ be the set of probability measures on $\mathcal{X}$, and let $\Pi\in\mathcal{P}(\mathcal{X}\times\mathcal{X})$ be a coupling between $\mathbf{P}_{X}$ and $\mathbf{P}_{\hat{X}}$ (parameterized by a matrix $\boldsymbol{\Pi}_{x,\hat{x}}$). It is well known [10] that taking $\mathbf{H}$ as a metric on $\mathcal{X}$, the TV distance coincides with the Wasserstein-$1$ distance on $\mathcal{P}(\mathcal{X})$, namely

\begin{align}
d_{TV}(\mathbf{P}_{X},\mathbf{P}_{\hat{X}}) &=\inf_{\Pi}\Pi\left[x\neq\hat{x}\right]=W_{1,H}(\mathbf{P}_{X},\mathbf{P}_{\hat{X}}), \tag{18}\\
W_{1,H}(\mathbf{P}_{X},\mathbf{P}_{\hat{X}}) &\triangleq\inf_{\boldsymbol{\Pi}}\boldsymbol{\Pi}\bullet\mathbf{H}=\inf_{\boldsymbol{\Pi}}\sum_{x,\hat{x}}\boldsymbol{\Pi}_{x,\hat{x}}\mathbf{H}_{x,\hat{x}}, \tag{19}
\end{align}

where the minimum is attained [11, Lemma 3.4.1]. Wasserstein distances are convex metrics on $\mathcal{P}(\mathcal{X})$ [9]. Using (19), we can rewrite (13) as the linear program

$$D(P)=\min_{\substack{\mathbf{Q},\boldsymbol{\Pi},\\ \varepsilon\geq 0}}\left\{\rho\bullet\mathbf{Q}\;:\;\begin{array}{l}\sum_{\hat{x}\in\mathcal{X}}\mathbf{P}_{Y}(y)\mathbf{Q}_{\hat{x}|y}=\mathbf{P}_{Y}(y),\ \forall y\in\mathcal{Y}\\ \sum_{\hat{x}\in\mathcal{X}}\boldsymbol{\Pi}_{x,\hat{x}}=\mathbf{P}_{X}(x),\ \forall x\in\mathcal{X}\\ \sum_{x\in\mathcal{X}}\boldsymbol{\Pi}_{x,\hat{x}}=\sum_{y\in\mathcal{Y}}\mathbf{P}_{Y}(y)\mathbf{Q}_{\hat{x}|y},\ \forall\hat{x}\in\mathcal{X}\\ \boldsymbol{\Pi}\bullet\mathbf{H}+\varepsilon=P,\ \mathbf{Q}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{Y}|},\ \boldsymbol{\Pi}\in\mathbb{R}^{|\mathcal{X}|\times|\mathcal{X}|}\end{array}\right\} \tag{20}$$

where $\varepsilon$ is a slack variable. The problem (20) possesses $|\mathcal{X}|(|\mathcal{Y}|+|\mathcal{X}|)+1$ variables and only $|\mathcal{Y}|+2|\mathcal{X}|+1$ constraints, of which $|\mathcal{Y}|+2|\mathcal{X}|$ are independent.

Interestingly, the form (20) allows us to discuss a more general family of perceptual divergences: Wasserstein-1 distances (19) induced by arbitrary metrics $H$ on $\mathcal{X}$, which we will consider to be the case from this point on. We will assume w.l.o.g. that $H$ takes values in $[0,1]$, hence the results of Propositions 2 and 3 hold trivially in this case.

IV-B The Dual Problem

Consider the general linear programming problem [22]

$$\mathrm{(LP)}\quad\min_{q}z^{\top}q,\ \ \mathrm{s.t.}\ \mathbf{A}q=b~\mathrm{and}~q\geq 0, \tag{29}$$

where $q,z\in\mathbb{R}^{n}$, $b\in\mathbb{R}^{n_c}$, $\mathbf{A}\in\mathbb{R}^{n_c\times n}$, and the inequality is elementwise. Its dual problem (DLP) is given by

$$\mathrm{(DLP)}\quad\max_{w}w^{\top}b,\ \ \mathrm{s.t.}\ w^{\top}\mathbf{A}\leq z^{\top}. \tag{30}$$

Finally, recall that strong duality holds for feasible and bounded LP problems [22]; namely, the problem (30) is feasible and

$$\min_{q}z^{\top}q=\max_{w}w^{\top}b. \tag{31}$$

We next derive the dual form of (20). For convenience, we split the variables in (30) into four groups: $|\mathcal{Y}|$ variables $\{w_y\}_{y\in\mathcal{Y}}$ related to the stochasticity constraint on $\mathbf{Q}$ for each symbol in $\mathcal{Y}$, the two groups of $|\mathcal{X}|$ variables $\{r_x\}$ and $\{\nu_{\hat{x}}\}$ related to the constraints on the marginals of $\mathbf{\Pi}$, and the variable $l$ related to the perception constraint $\mathbf{\Pi}\bullet\mathbf{H}+\varepsilon=P$. We denote

$$\rho'_{\hat{x},y}\triangleq\frac{\rho_{\hat{x},y}}{\mathbf{P}_Y(y)}=\mathbb{E}\left[d(X,\hat{x})\,|\,Y=y\right], \tag{32}$$

and explicitly write the dual problem of (20) as (see derivation in the Appendix),

$$\max_{w,r,\nu,l}\left[\sum_{y\in\mathcal{Y}}p_y w_y+\sum_{x\in\mathcal{X}}p_x r_x-lP\right] \tag{33}$$
$$\mathrm{s.t.}\ \left\{l\geq 0,\ \begin{array}{ll}w_y\leq\rho'_{\hat{x},y}-\nu_{\hat{x}}, & \forall\hat{x},y\in\mathcal{X}\times\mathcal{Y}\\ r_x\leq\mathbf{H}_{x,\hat{x}}\,l+\nu_{\hat{x}}, & \forall x,\hat{x}\in\mathcal{X}\times\mathcal{X}\end{array}\right.. \tag{36}$$

From the strong duality property, we have that (33) is feasible; indeed, we can choose $w_y=\min_{\hat{x}\in\mathcal{X}}\rho'_{\hat{x},y}$ and $r_x,\nu_{\hat{x}},l=0$. This choice of variables recovers the lower bound of Proposition 3, where $D(P)\geq D^{*}$ for $P\in[0,1]$.
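The feasibility argument above can be checked numerically. The following is a minimal pure-Python sketch, not the authors' code: the function names, the nested-list layout `P_xy[x][y]` for the joint distribution, and the toy channel are all assumptions made for illustration. It computes $\rho'$ as in (32), builds the point $w_y=\min_{\hat{x}}\rho'_{\hat{x},y}$, $r=\nu=0$, $l=0$, and tests it against the dual constraints (36).

```python
def conditional_distortion(P_xy, d):
    """Return (rho_p, p_y) with rho_p[xh][y] = E[d(X, xh) | Y = y], cf. (32).

    P_xy[x][y] is an assumed joint pmf layout; every p_y must be positive.
    """
    nx, ny = len(P_xy), len(P_xy[0])
    p_y = [sum(P_xy[x][y] for x in range(nx)) for y in range(ny)]
    rho_p = [[sum(P_xy[x][y] * d[x][xh] for x in range(nx)) / p_y[y]
              for y in range(ny)] for xh in range(nx)]
    return rho_p, p_y


def trivial_dual_point(rho_p, p_y):
    """w_y = min_xh rho'[xh][y] with r = nu = 0, l = 0, and its objective value."""
    ny = len(p_y)
    w = [min(row[y] for row in rho_p) for y in range(ny)]
    value = sum(p_y[y] * w[y] for y in range(ny))  # the -l*P term vanishes
    return w, value


def is_dual_feasible(w, r, nu, l, rho_p, H, tol=1e-12):
    """Check the constraints (36) of the dual problem."""
    nx, ny = len(rho_p), len(w)
    ok_w = all(w[y] <= rho_p[xh][y] - nu[xh] + tol
               for xh in range(nx) for y in range(ny))
    ok_r = all(r[x] <= H[x][xh] * l + nu[xh] + tol
               for x in range(nx) for xh in range(nx))
    return l >= 0 and ok_w and ok_r


# Toy example (assumed): binary source with P_X = (0.7, 0.3) observed
# through a binary symmetric channel with crossover probability 0.4.
P_xy = [[0.42, 0.28], [0.12, 0.18]]
hamming = [[0, 1], [1, 0]]
rho_p, p_y = conditional_distortion(P_xy, hamming)
w, lower_bound = trivial_dual_point(rho_p, p_y)
```

In this toy example the objective value of the trivial dual point equals the distortion of the best unconstrained estimator, which is the quantity $D^{*}$ in the bound $D(P)\geq D^{*}$.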

Remark 4.

It is easy to see that in this case $\mathrm{rank}(\mathbf{A})=|\mathcal{Y}|+2|\mathcal{X}|$, while one constraint is redundant; namely, we can eliminate a linear constraint from the primal program (20) (a row of $\mathbf{A}$) such that the row rank of the problem is full. Equivalently, we can set one of the variables $r_x,\nu_{\hat{x}}$ to $0$, and the dual feasible set (projected onto $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|}$) will not contain a line. This implies the existence of an extreme point in this dual set in $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|}$ (see [22, Thm. 2.6]).

Given a value $P$, $D(P)$ can be calculated by numerically solving (20) (equivalently, (33)). However, finding a closed-form solution remains an open problem. In Section V-B we find such an expression for small alphabets. We also observe that the objective of (33) is linear in the perception index, hence the maximal value for a given $P$ is attained by some non-increasing linear function of the form $p_0+p_1P$. We further develop this insight below.
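As an illustration of the numerical route, the primal program can be handed to an off-the-shelf LP solver. The sketch below is one possible setup and not the authors' implementation: it assumes NumPy and SciPy are available, uses `scipy.optimize.linprog` with the HiGHS backend, and the function name `dp_function`, the array layouts, and the toy channel are hypothetical choices. It also assumes $\rho_{\hat{x},y}=\sum_x \mathbf{P}_{XY}(x,y)\,d(x,\hat{x})$, which is consistent with (32).

```python
import numpy as np
from scipy.optimize import linprog


def dp_function(P_xy, d, H, P):
    """Solve the primal LP for D(P).

    P_xy[x, y]: joint pmf, d[x, xh]: distortion, H[x, xh]: metric in [0, 1].
    Variables are stacked as [Q (xh, y), Pi (x, xh), eps], all non-negative.
    """
    nx, ny = P_xy.shape
    p_x, p_y = P_xy.sum(axis=1), P_xy.sum(axis=0)
    rho = d.T @ P_xy                      # rho[xh, y] = sum_x P_xy[x, y] d[x, xh]
    nq, npi = nx * ny, nx * nx
    n = nq + npi + 1
    iq = lambda xh, y: xh * ny + y        # index of Q[xh | y]
    ipi = lambda x, xh: nq + x * nx + xh  # index of Pi[x, xh]

    c = np.zeros(n)
    c[:nq] = rho.ravel()
    rows, rhs = [], []
    for y in range(ny):                   # stochasticity of Q, scaled by p_y
        a = np.zeros(n)
        for xh in range(nx):
            a[iq(xh, y)] = p_y[y]
        rows.append(a); rhs.append(p_y[y])
    for x in range(nx):                   # first marginal of Pi is P_X
        a = np.zeros(n)
        for xh in range(nx):
            a[ipi(x, xh)] = 1.0
        rows.append(a); rhs.append(p_x[x])
    for xh in range(nx):                  # second marginal of Pi is P_Xhat
        a = np.zeros(n)
        for x in range(nx):
            a[ipi(x, xh)] = 1.0
        for y in range(ny):
            a[iq(xh, y)] = -p_y[y]
        rows.append(a); rhs.append(0.0)
    a = np.zeros(n)                       # perception constraint with slack
    a[nq:nq + npi] = H.ravel()
    a[-1] = 1.0
    rows.append(a); rhs.append(P)

    res = linprog(c, A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=(0, None), method="highs")
    return res.fun


# Toy example (assumed): P_X = (0.7, 0.3), binary symmetric channel with
# crossover 0.4, Hamming distortion and Hamming metric (so W1 equals TV).
P_xy = np.array([[0.42, 0.28], [0.12, 0.18]])
ham = 1.0 - np.eye(2)
curve = [dp_function(P_xy, ham, ham, P) for P in np.linspace(0.0, 0.4, 9)]
```

Evaluating `dp_function` on a grid of $P$ values traces the tradeoff curve; the redundant equality constraint noted in Remark 4 is left in place here and is handled by the solver's presolve.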

V Main results

V-A Piecewise linearity of DP functions

Figure 2: Numerical illustration of Theorem 6 and Theorem 8 for $|\mathcal{X}|=3$, $|\mathcal{Y}|=5$. In the (Middle) pane we present the set $\mathcal{S}_2$ and its convex hull in the $(p_0,p_1)$-plane. The (Right) pane shows the optimal solutions obtained by numerically solving (33) for different values of $P$. We can see that the solutions, corresponding to the linear segments of $D(P)$ (Left pane), occur at extreme points of $\mathrm{conv}(\mathcal{S}_2)$.

While the problem of finding an exact formula for $D(P)$ is still open, here we exploit the properties of the dual problem (33) in order to show the general property that $D(P)$ is piecewise linear in the perception index $P$. Moreover, the breakpoints and slopes of this function are determined by the vertices of a convex set in $\mathbb{R}^{2}$. We will utilize the following property of LP problems.

Lemma 5.

([22, Thm. 2.8]) For a bounded LP problem, if there exists an extreme point in the feasible set, then the optimal solution is obtained at an extreme point.

This is true of course also for the dual problem. We now use this result to prove the following.

Theorem 6.

For $P\in[0,\infty)$, the DP function (20) is a non-increasing piecewise linear function of $P$ with a non-decreasing slope. Furthermore, there exists $P^{*}\in[0,1]$ such that $D(P)=D^{*}$ for $P\geq P^{*}$.

The proof is based on analyzing the dual formulation (33). Due to strong duality (31) this matches the primal problem. The feasible set of (33) has a finite number of vertices, and this set is independent of the perceptual index $P$. The solution to (33) must occur at one of these vertices. Thus, the interval $[0,1]$ may be partitioned into sub-intervals, so that in each sub-interval the solution to (33) is at the same vertex. For a fixed choice of variables $w,r,\nu$ and $l$ in (33), the $D(P)$ function is linear with slope $-l$. Hence, the DP function is piecewise linear. Since DP functions are non-increasing and convex (see Thm. 1), the slope cannot decrease.

Proof of Theorem 6.

Let $d=[0^{|\mathcal{Y}|+2|\mathcal{X}|},\ 1]^{\top}$ and $b_0=[\mathbf{P}_Y^{\top},\mathbf{P}_X^{\top},0^{|\mathcal{X}|+1}]$, both in $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|+1}$. We can write the objective (33) as

$$[w,r,\nu,l]^{\top}b(P)\rightarrow\max_{w,r,\nu,l\in\mathcal{S}}, \tag{37}$$

where $b(P)=b_0-dP$ and $\mathcal{S}$ is the set of feasible solutions to the dual problem (33) where we choose to set $\nu_{|\mathcal{X}|}\equiv 0$ (see Remark 4). Let $\mathrm{ext}(\mathcal{S})=\{p^{i}=[w^{i},r^{i},\nu^{i},l^{i}]\}$ denote the vertices of $\mathcal{S}$ in $\mathbb{R}^{|\mathcal{Y}|+2|\mathcal{X}|}$. Note that the set of vertices is non-empty, finite, and importantly, independent of $P$. Lemma 5 above implies that the dual optimal value is obtained on this set. We now have from strong duality,

$$\begin{aligned}D(P)&=\max_{i}\,p^{i}\cdot b(P)\\ &=\max_{i}\,[w^{i},r^{i},\nu^{i},l^{i}]^{\top}b(P)=\max_{i}\left[p_{0}^{i}+p_{1}^{i}P\right],\end{aligned} \tag{38}$$

where we denote the projections

$$p_{0}^{i}=p^{i}\cdot b_{0}, \tag{39}$$
$$p_{1}^{i}=-p^{i}\cdot d. \tag{40}$$

As a maximum of a finite set of linear functions, (38) is a piecewise linear function. The non-decreasing slope property can be easily deduced from (38), or from the fact that DP functions are convex [6]. ∎

Corollary 7.

The breakpoints of the $D(P)$ function lie within the set

$$\mathcal{P}=\left\{\frac{p_{0}^{i}-p_{0}^{j}}{p_{1}^{j}-p_{1}^{i}}:\ \begin{array}{c}p^{i},p^{j}\text{ are vertices of the set of}\\ \text{feasible solutions to the dual problem}\end{array}\right\}. \tag{41}$$
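To make the role of the set $\mathcal{P}$ concrete, the following pure-Python sketch evaluates the upper envelope (38) from a list of projected vertices $(p_0^i,p_1^i)$ and filters the pairwise intersections of (41) down to the points where the maximizing line actually changes. The function names and the sample values are hypothetical; the projected vertices are assumed to have been computed elsewhere.

```python
def dp_from_vertices(lines, P):
    """D(P) = max_i (p0_i + p1_i * P), cf. (38); lines = [(p0, p1), ...]."""
    return max(p0 + p1 * P for p0, p1 in lines)


def candidate_breakpoints(lines):
    """Pairwise intersections (p0_i - p0_j) / (p1_j - p1_i), cf. (41)."""
    cands = set()
    for i, (a0, a1) in enumerate(lines):
        for b0, b1 in lines[i + 1:]:
            if a1 != b1:  # parallel lines never intersect
                cands.add((a0 - b0) / (b1 - a1))
    return sorted(cands)


def breakpoints(lines, tol=1e-9):
    """Candidates P >= 0 where two lines of different slope attain the maximum."""
    out = []
    for P in candidate_breakpoints(lines):
        if P < 0:
            continue
        top = dp_from_vertices(lines, P)
        slopes = {p1 for p0, p1 in lines if abs(p0 + p1 * P - top) <= tol}
        if len(slopes) > 1 and (not out or P - out[-1] > tol):
            out.append(P)
    return out


# Hypothetical projected vertices; the last line is dominated everywhere.
lines = [(0.18, 0.0), (0.26, -0.2), (0.30, -0.6), (0.05, 0.0)]
kinks = breakpoints(lines)
```

Not every element of $\mathcal{P}$ is a breakpoint: intersections of lines that lie below the envelope are discarded by the filter, which matches the statement that the breakpoints lie within, rather than equal, the set $\mathcal{P}$.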

As we show next, not every vertex is a candidate for optimality in (38); optimal solutions must be obtained on a 2-D convex hull. Using the notation of the proof of Theorem 6, denote the set $\mathcal{S}_2=\left\{\left(p_0^i,p_1^i\right):p_0^i=p^i\cdot b_0,\ p_1^i=-p^i\cdot d,\ p^i\in\mathrm{ext}(\mathcal{S})\right\}\subseteq\mathbb{R}^{2}$, which represents the (finite) set of linear curves $\left\{p_0^i+p_1^iP,\ p^i\in\mathrm{ext}(\mathcal{S})\right\}$ on the 2-dimensional plane by the projections of their corresponding vertices (39)-(40).

Theorem 8.

For any $P\geq 0$, there exists a vertex $p^{k}$ of $\mathcal{S}$ such that $p^{k}\in\mathop{\mathrm{argmax}}_{p\in\mathrm{ext}(\mathcal{S})}p\cdot b(P)$, and $\left(p_0^k,p_1^k\right)$ is an extreme point of $\mathrm{conv}(\mathcal{S}_2)$.

Proof.

Let $\left\{\left(\widetilde{p}_0^k,\widetilde{p}_1^k\right)\right\}_{k=1}^{M}\subseteq\mathcal{S}_2$ be the set of extremals of $\mathrm{conv}(\mathcal{S}_2)$. The set $\mathcal{S}_2$ is finite, hence its convex hull is bounded. We can write any point in $\mathcal{S}_2$ as a convex combination $\left(p_0^i,p_1^i\right)=\sum_{k=1}^{M}\alpha_{ik}\left(\widetilde{p}_0^k,\widetilde{p}_1^k\right)$, thus we have

$$\begin{aligned}p^{i}\cdot b(P)&=p_0^i+p_1^iP=\sum_{k=1}^{M}\alpha_{ik}\left(\widetilde{p}_0^k+\widetilde{p}_1^kP\right)\\ &\leq\max_{k}\left(\widetilde{p}_0^k+\widetilde{p}_1^kP\right)=\max_{k}\,p^{k}\cdot b(P).\end{aligned} \tag{42}$$

The results of Theorems 6 and 8 are illustrated in Fig. 2 for alphabet sizes $|\mathcal{X}|=3$ and $|\mathcal{Y}|=5$, where we considered the TV distance and distortion given by a random matrix $\mathbf{D}$. We numerically solve (33) for different values of $P$ along the DP tradeoff and project the optimal solutions according to (39)-(40). We also calculate the extreme points of the feasible set to obtain $\mathcal{S}_2$ (for a discussion about finding the vertices of a feasible set, we refer the reader to [22, Sec. 2.2]). It can be seen that optimal solutions to (33) correspond to the linear segments of the DP function, and are obtained on extreme points of $\mathrm{conv}(\mathcal{S}_2)$ in the $(p_0,p_1)$-plane.
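The geometric content of Theorem 8 can be reproduced on a small point set. The sketch below is an illustration under assumed inputs, not the code behind Fig. 2: it computes the extreme points of $\mathrm{conv}(\mathcal{S}_2)$ with Andrew's monotone-chain algorithm (a standard planar hull method, chosen here for simplicity) and compares the maximum of $p_0+p_1P$ over all points with the maximum over hull vertices only. The function names and the sample points are hypothetical.

```python
def hull_vertices(points):
    """Extreme points of the convex hull of planar points (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            # drop points that are collinear or make a right turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]


def envelope(points, P):
    """max over (p0, p1) in points of p0 + p1 * P."""
    return max(p0 + p1 * P for p0, p1 in points)


# Hypothetical S_2: (0.2, -0.3) lies strictly inside the hull.
S2 = [(0.18, 0.0), (0.26, -0.2), (0.30, -0.6), (0.05, 0.0), (0.2, -0.3)]
hull = hull_vertices(S2)
agree = all(envelope(S2, k / 10) == envelope(hull, k / 10) for k in range(11))
```

Restricting the maximization to hull vertices leaves the envelope unchanged, which is exactly the inequality (42). The converse does not hold: a hull vertex such as `(0.05, 0.0)` in this sample is extreme but never optimal for $P\geq 0$.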

V-B Full characterization of channels with binary sources

We next focus on the case of binary sources, where $\mathcal{X}=\{x_1,x_2\}$ with probabilities $p_{x_1},p_{x_2}$, respectively, and $\mathcal{Y}$ is of arbitrary size $n_y$. It suffices to analyze the TV distance (4) as the perceptual index, since every metric defining the Wasserstein-1 distance is proportional to the Hamming distance in the binary case. The distortion matrix is arbitrary, yielding the matrix $\rho'$ defined in (32). Denote $u_y=\frac{1}{2}(\rho'_{\hat{x}_1 y}-\rho'_{\hat{x}_2 y})$ which is half the cost of reconstructing $y$ as $x_1$ over reconstructing as $x_2$, and assume w.l.o.g. that $u_{y_1}\leq u_{y_2}\leq\ldots\leq u_{y_n}$. We define $P_Y^{-}(u)=\Pr\{u_Y\leq u\}=\sum_{y:u_y\leq u}\mathbf{P}_Y(y)$, which is right-continuous with left limit $P_Y^{-}(u^{-})=\Pr\{u_Y<u\}=\sum_{y:u_y<u}\mathbf{P}_Y(y)$. We further denote the symbols $y^{*}_i$ whose $u_y$ is non-zero, namely

$$0=u_0<u_1=u_{y^{*}_1}\leq\ldots\leq u_{M^{+}}=u_{y^{*}_{M^{+}}}, \tag{43}$$
$$u_{-M^{-}}=u_{y^{*}_{-M^{-}}}\leq\ldots\leq u_{-1}=u_{y^{*}_{-1}}<0=u_0. \tag{44}$$
Theorem 9.

Assume that $p_{x_1}\geq P_Y^{-}(0)$, and let $I=\max\{i\colon p_{x_1}\geq P_Y^{-}(u_i)\}$. Then, the DP function $D(P)$ is piecewise linear with breakpoints $\{P^{*}_i\}_{i=0}^{I}$ given by

Pi*=px1PY(ui)subscriptsuperscript𝑃𝑖subscript𝑝subscript𝑥1superscriptsubscript𝑃𝑌subscript𝑢𝑖P^{*}_{i}=p_{x_{1}}-P_{Y}^{-}(u_{i})italic_P start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_p start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) (45)

where, specifically, P0*=px1PY(0)=P*superscriptsubscript𝑃0subscript𝑝subscript𝑥1superscriptsubscript𝑃𝑌0superscript𝑃P_{0}^{*}=p_{x_{1}}-P_{Y}^{-}(0)=P^{*}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = italic_p start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ( 0 ) = italic_P start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT. The DP function is then given by

D(P)={D*,PP0*D(Pi1*)+2ui(Pi1*P),Pi*PPi1*D(PI*)+2uI+1(PI*P),0PPI*.𝐷𝑃casessuperscript𝐷𝑃superscriptsubscript𝑃0𝐷superscriptsubscript𝑃𝑖12subscript𝑢𝑖superscriptsubscript𝑃𝑖1𝑃superscriptsubscript𝑃𝑖𝑃superscriptsubscript𝑃𝑖1𝐷superscriptsubscript𝑃𝐼2subscript𝑢𝐼1superscriptsubscript𝑃𝐼𝑃0𝑃superscriptsubscript𝑃𝐼D(P)=\begin{cases}D^{*},&P\geq P_{0}^{*}\\ D(P_{i-1}^{*})+2u_{i}\left(P_{i-1}^{*}-P\right),&P_{i}^{*}\leq P\leq P_{i-1}^{% *}\\ D(P_{I}^{*})+2u_{I+1}\left(P_{I}^{*}-P\right),&0\leq P\leq P_{I}^{*}\end{cases}.italic_D ( italic_P ) = { start_ROW start_CELL italic_D start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , end_CELL start_CELL italic_P ≥ italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_D ( italic_P start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ) + 2 italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_P start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT - italic_P ) , end_CELL start_CELL italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ≤ italic_P ≤ italic_P start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_D ( italic_P start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ) + 2 italic_u start_POSTSUBSCRIPT italic_I + 1 end_POSTSUBSCRIPT ( italic_P start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT - italic_P ) , end_CELL start_CELL 0 ≤ italic_P ≤ italic_P start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_CELL end_ROW . (46)

If PY(0)px1superscriptsubscript𝑃𝑌superscript0subscript𝑝subscript𝑥1P_{Y}^{-}(0^{-})\geq p_{x_{1}}italic_P start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ( 0 start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) ≥ italic_p start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, then similarly P0*=PY(0)px1subscriptsuperscript𝑃0superscriptsubscript𝑃𝑌superscript0subscript𝑝subscript𝑥1P^{*}_{0}=P_{Y}^{-}(0^{-})-p_{x_{1}}italic_P start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_P start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ( 0 start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) - italic_p start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, and Pi*=PY(ui1)px1subscriptsuperscript𝑃𝑖superscriptsubscript𝑃𝑌subscript𝑢𝑖1subscript𝑝subscript𝑥1P^{*}_{i}=P_{Y}^{-}(u_{-i-1})-p_{x_{1}}italic_P start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_P start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ( italic_u start_POSTSUBSCRIPT - italic_i - 1 end_POSTSUBSCRIPT ) - italic_p start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, while it is non-negative, and D(P)𝐷𝑃D(P)italic_D ( italic_P ) is determined analogously. 
In the case PY(0)px1PY(0)superscriptsubscript𝑃𝑌0subscript𝑝subscript𝑥1superscriptsubscript𝑃𝑌superscript0P_{Y}^{-}(0)\geq p_{x_{1}}\geq P_{Y}^{-}(0^{-})italic_P start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ( 0 ) ≥ italic_p start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≥ italic_P start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ( 0 start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ), P*=0superscript𝑃0P^{*}=0italic_P start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = 0 and D(P)D*𝐷𝑃superscript𝐷D(P)\equiv D^{*}italic_D ( italic_P ) ≡ italic_D start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT for all P0𝑃0P\geq 0italic_P ≥ 0.

Remark 10.

If $u_i=u_{i-1}$ then $P^{*}_i=P^{*}_{i-1}$ and this yields a ‘degenerate’ interval. If $u_i>u_{i-1}$, then (45) can alternatively be written more simply as $P^{*}_i=P^{*}_{i-1}-\mathbf{P}_Y(y^{*}_i)$.

The results of Theorem 9 are illustrated in Fig. 1. They confirm the intuition that channel outputs in $\mathcal{Y}$ should be mapped to symbols in $\{x_1,x_2\}$ in a greedy fashion. At the point $P=1$, each $y$ is reconstructed with minimal penalty, without any perceptual constraint (as in Proposition 3). This can be done by setting, e.g., $q(\hat{x}_1|y)=\delta_{u_y\leq 0}$. At the point $P=P^{*}$, the $y$'s are still reconstructed optimally, but now under a perception constraint. This can be achieved by rearranging the mapping of symbols with $u_y=0$, which incurs no extra cost in distortion. Now, suppose that $x_1$ is not ‘fully allocated’, that is, $p_{x_1}\geq P^{-}_Y(0)$. As the perception constraint becomes more restrictive (lower $P$), the estimator seeks the minimal-cost symbols $y\in\mathcal{Y}$ that are mapped to $x_1$ with probability less than $1$, and increases this probability.
For a small change $\Delta P$, the cost in distortion is $2u_y\Delta P$. This continues until $P=0$ is reached, namely $p_{\hat{x}_1}=p_{x_1}$.
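The greedy picture above can be turned into a short numerical sketch of (45) and (46). This is only an illustration under stated assumptions, not the authors' implementation: it covers the case $p_{x_1}\geq P_Y^{-}(0)$ only, and it assumes that $P_Y^{-}(t)$ denotes the total probability of outputs $y$ with $u_y\leq t$ (which is consistent with Remark 10 but is not defined in this section). The names `dp_curve` and `dp_value`, and the toy numbers, are hypothetical.

```python
def dp_curve(u, p_y, p_x1, d_star):
    """Breakpoints, values and slopes of D(P) following (45)-(46).

    Covers only the case p_x1 >= P_Y^-(0).
    Assumption: P_Y^-(t) is the total mass of outputs y with u_y <= t.
    """
    def cdf(t):
        return sum(p for uy, p in zip(u, p_y) if uy <= t)

    if p_x1 < cdf(0.0):
        raise ValueError("sketch covers only the case p_x1 >= P_Y^-(0)")

    breakpoints = [p_x1 - cdf(0.0)]   # P*_0
    values = [d_star]                 # D(P*_0) = D*
    slopes = []                       # 2*u_i, one per segment below P*_0
    for ui in sorted(uy for uy in u if uy > 0):   # u_1 <= ... <= u_{M+}
        if breakpoints[-1] <= 0.0:
            break                     # P = 0 already reached
        # P*_i while non-negative; clipped to 0 on the last segment (i = I + 1)
        nxt = max(p_x1 - cdf(ui), 0.0)
        values.append(values[-1] + 2.0 * ui * (breakpoints[-1] - nxt))
        breakpoints.append(nxt)
        slopes.append(2.0 * ui)
    return breakpoints, values, slopes


def dp_value(P, breakpoints, values, slopes):
    """Evaluate the piecewise-linear D(P) of (46) at a given P >= 0."""
    if P >= breakpoints[0]:
        return values[0]
    for i, slope in enumerate(slopes):
        if breakpoints[i + 1] <= P <= breakpoints[i]:
            return values[i] + slope * (breakpoints[i] - P)
    raise ValueError("P must be non-negative")


# Hypothetical toy instance: four channel outputs with costs u_y and masses p_y.
u = [-1.0, 0.0, 0.5, 2.0]
p_y = [0.2, 0.1, 0.3, 0.4]
breakpoints, values, slopes = dp_curve(u, p_y, p_x1=0.7, d_star=1.0)
```

Ties among the $u_i$ simply produce repeated breakpoints, which correspond to the degenerate intervals of Remark 10.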

Corollary 11.

At the breakpoints where $P^{*}_i\neq 0$, an optimal estimator is given by a deterministic rule $Q_{P^{*}_i}$ (for $p_{x_1}\geq P_Y^{-}(0)$, given by $Q_{P^{*}_i}=\left\{q(x_1|y)=\delta_{u_y\leq u_i}\right\}$).
Interestingly, at $P\in\left[P_i^{*},P_{i-1}^{*}\right]$, the estimator is given by the convex combination of estimators at the interval edges, $Q_P=\alpha Q_{P_{i-1}^{*}}+(1-\alpha)Q_{P_i^{*}}$, with $\alpha=\frac{P-P_i^{*}}{P_{i-1}^{*}-P_i^{*}}$.

This result implies that in order to construct an estimator for any point along the tradeoff at test time, without any additional calculations, it is sufficient to calculate $\mathcal{O}(|\mathcal{Y}|)$ estimators beforehand, one at each breakpoint (and at $P=0$).
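The interpolation in Corollary 11 can be sketched as follows. This is a hedged illustration rather than the paper's code: it handles only the case $p_{x_1}\geq P_Y^{-}(0)$ and only $P\geq P^{*}_I$ (the final segment down to $P=0$ is not covered), and it again assumes that $P_Y^{-}(t)$ is the total mass of outputs with $u_y\leq t$. The function names `threshold_rule` and `estimator` are hypothetical.

```python
def threshold_rule(u, t):
    """Deterministic rule q(x1|y) = 1 if u_y <= t, else 0."""
    return [1.0 if uy <= t else 0.0 for uy in u]


def estimator(P, u, p_y, p_x1):
    """q(x1|y) for every y at perception index P, via Corollary 11.

    Assumption: P_Y^-(t) is the total mass of outputs y with u_y <= t.
    Covers only p_x1 >= P_Y^-(0) and P >= P*_I.
    """
    def cdf(t):
        return sum(p for uy, p in zip(u, p_y) if uy <= t)

    # Distinct thresholds u_0 = 0 < u_1 < ... whose breakpoints are non-negative.
    thresholds = [0.0] + sorted({uy for uy in u if uy > 0})
    knots = [(t, p_x1 - cdf(t)) for t in thresholds if p_x1 - cdf(t) >= 0.0]
    if not knots:
        raise ValueError("sketch covers only the case p_x1 >= P_Y^-(0)")

    if P >= knots[0][1]:
        # Above P*_0 the unconstrained rule is already admissible.
        return threshold_rule(u, 0.0)

    for (t_hi, p_hi), (t_lo, p_lo) in zip(knots, knots[1:]):
        if p_lo <= P <= p_hi and p_hi > p_lo:
            alpha = (P - p_lo) / (p_hi - p_lo)
            q_hi = threshold_rule(u, t_hi)   # estimator at P*_{i-1}
            q_lo = threshold_rule(u, t_lo)   # estimator at P*_i
            return [alpha * a + (1.0 - alpha) * b for a, b in zip(q_hi, q_lo)]
    raise ValueError("P below the last breakpoint P*_I is not covered by this sketch")


# Hypothetical toy instance.
u = [-1.0, 0.0, 0.5, 2.0]
p_y = [0.2, 0.1, 0.3, 0.4]
q = estimator(0.25, u, p_y, p_x1=0.7)
```

In practice the threshold rules and breakpoints would be computed once and cached, so that only the scalar $\alpha$ is evaluated at test time. As a consistency check, the induced gap $p_{x_1}-\sum_y p_y\,q(x_1|y)$ equals the requested $P$ on each covered segment.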

Acknowledgements

The research of NW was partially supported by the Israel Science Foundation (ISF), grant no. 1782/22. The work of RM was partially supported by the Skillman chair in biomedical sciences and by the Ollendorff Minerva Center, ECE Faculty, Technion.

References

  • [1] T. Adrai, G. Ohayon, T. Michaeli, and M. Elad, “Deep optimal transport: A practical algorithm for photo-realistic image restoration,” arXiv preprint arXiv:2306.02342, 2023.
  • [2] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018, pp. 0–0.
  • [3] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
  • [4] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4681–4690.
  • [5] F. Nielsen, “Hypothesis testing, information divergence and computational geometry,” in International Conference on Geometric Science of Information.   Springer, 2013, pp. 241–248.
  • [6] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6228–6237.
  • [7] D. Freirich, T. Michaeli, and R. Meir, “A theory of the distortion-perception tradeoff in Wasserstein space,” Advances in Neural Information Processing Systems, vol. 34, pp. 25 661–25 672, 2021.
  • [8] J. Qian, G. Zhang, J. Chen, and A. Khisti, “A rate-distortion-perception theory for binary sources,” in International Zurich Seminar on Information and Communication (IZS 2022). Proceedings.   ETH Zurich, 2022, pp. 34–38.
  • [9] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows: in metric spaces and in the space of probability measures.   Springer Science & Business Media, 2008.
  • [10] R. Van Handel, “Probability in high dimension,” Lecture Notes (Princeton University), 2014.
  • [11] M. Raginsky, I. Sason et al., “Concentration of measure inequalities in information theory, communications, and coding,” Foundations and Trends® in Communications and Information Theory, vol. 10, no. 1-2, pp. 1–246, 2013.
  • [12] Y. Bai, X. Wu, and A. Özgür, “Information constrained optimal transport: From Talagrand, to Marton, to Cover,” IEEE Transactions on Information Theory, vol. 69, no. 4, pp. 2059–2073, 2023.
  • [13] N. Saldi, T. Linder, and S. Yüksel, “Randomized quantization and source coding with constrained output distribution,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 91–106, 2015.
  • [14] ——, “Output constrained lossy source coding with limited common randomness,” IEEE Transactions on Information Theory, vol. 61, no. 9, pp. 4984–4998, 2015.
  • [15] D. Freirich, T. Michaeli, and R. Meir, “Perceptual Kalman filters: Online state estimation under a perfect perceptual-quality constraint,” arXiv preprint arXiv:2306.02400, 2023.
  • [16] Y. Blau and T. Michaeli, “Rethinking lossy compression: The rate-distortion-perception tradeoff,” in International Conference on Machine Learning.   PMLR, 2019, pp. 675–685.
  • [17] S. Salehkalaibar, B. Phan, A. Khisti, and W. Yu, “Rate-distortion-perception tradeoff based on the conditional perception measure,” in 2023 Biennial Symposium on Communications (BSC).   IEEE, 2023, pp. 31–37.
  • [18] Z. Yan, F. Wen, and P. Liu, “Optimally controllable perceptual lossy compression,” arXiv preprint arXiv:2206.10082, 2022.
  • [19] L. Theis and A. B. Wagner, “A coding theorem for the rate-distortion-perception function,” arXiv preprint arXiv:2104.13662, 2021.
  • [20] A. B. Wagner, “The rate-distortion-perception tradeoff: The role of common randomness,” arXiv preprint arXiv:2202.04147, 2022.
  • [21] J. Chen, L. Yu, J. Wang, W. Shi, Y. Ge, and W. Tong, “On the rate-distortion-perception function,” arXiv preprint arXiv:2204.06049, 2022.
  • [22] D. Bertsimas and J. N. Tsitsiklis, Introduction to linear optimization.   Athena Scientific Belmont, MA, 1997, vol. 6.
  • [23] R. J. Vanderbei et al., Linear programming.   Springer, 2020.

In this Appendix, we start with an extended review of Linear Programs and their Dual Problems. We derive the dual forms of both formulations (13) and (20) (Eq. (33)). We then provide a detailed proof for Theorem 9 in the text.

-C The linear optimization problem and strong duality

Consider the general Linear Programming (LP) problem [22]

$$\begin{cases}\rho\bullet\mathbf{Q} & \rightarrow\min_{\mathbf{Q}}\\ \mathrm{s.t.} & a_i\bullet\mathbf{Q}=b_i,\; i\in M_1\\ & s_i\bullet\mathbf{Q}\leq b_i,\; i\in M_2\\ & \mathbf{Q}\geq 0\end{cases}. \tag{47}$$

Here, $\mathbf{Q},\rho,a_i$ are real $|\mathcal{X}|\times|\mathcal{Y}|$ matrices, $b=\{b_i\}_{i\in M}\in\mathbb{R}^{n_c}$. The Dual Linear Programming problem (DLP) is given by

$$\begin{cases}w^{\top}b & \rightarrow\max_{w}\\ \mathrm{s.t.} & w_i\leq 0,\; i\in M_2\\ & w_i\in\mathbb{R},\; i\in M_1\\ & \sum_{i\in M_1}w_i\{a_i\}_{x,y}+\sum_{i\in M_2}w_i\{s_i\}_{x,y}\leq\rho_{x,y},\\ & \forall x,y\in\mathcal{X}\times\mathcal{Y}\end{cases}. \tag{48}$$

Recall that by slight abuse of notation, here, similarly to the main text, we use $x=x_{\alpha}$ and $y=y_{\beta}$ to denote their indices $\alpha$ and $\beta$, respectively.

Dual problems are useful for establishing lower bounds on the optimal value, due to the property of weak duality, which ensures that every feasible value of the Primal problem is greater than or equal to every feasible value of its Dual, yielding (in the case where both problems are feasible)

$$\min_{\mathbf{Q}}\rho\bullet\mathbf{Q}\geq\max_{w}w^{\top}b. \tag{49}$$

For feasible, bounded LP problems, strong duality also holds, namely the problem (48) is feasible and

$$\min_{\mathbf{Q}}\rho\bullet\mathbf{Q}=\max_{w}w^{\top}b. \tag{50}$$
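As a small sanity check of the primal-dual pair (47)-(48), consider the following toy instance. The numbers are hypothetical and chosen only for illustration: $\mathbf{Q}=(Q_1,Q_2)$ is treated as a $1\times 2$ matrix, $\rho=(1,2)$, there is one equality constraint with $a_1=(1,1)$, $b_1=1$, and one inequality constraint with $s_1=(1,0)$, $b_2=\tfrac12$.

```latex
% Toy instance of (47)-(48); all numbers are illustrative assumptions.
\begin{aligned}
\text{Primal:}\quad & \min_{Q\ge 0}\; Q_1 + 2Q_2
   \quad \text{s.t.}\quad Q_1 + Q_2 = 1,\;\; Q_1 \le \tfrac12, \\
 & \text{on the feasible set the objective equals } 2 - Q_1,
   \text{ so the optimum is } Q = (\tfrac12,\tfrac12) \text{ with value } \tfrac32. \\[4pt]
\text{Dual:}\quad & \max_{w}\; w_1 + \tfrac12 w_2
   \quad \text{s.t.}\quad w_2 \le 0,\;\; w_1 + w_2 \le 1,\;\; w_1 \le 2, \\
 & \text{the optimum is } w = (2,-1) \text{ with value } 2 - \tfrac12 = \tfrac32.
\end{aligned}
```

Both optimal values equal $\tfrac32$, in agreement with (50), and the inequality constraint receives a non-positive multiplier, as required in (48).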

-D A dual form for the TV distance setting

For the analysis that follows, it is convenient to derive the dual of formulation (13) of $D(P)$. In this formulation, we have $\rho=\mathbf{D}^{\top}\mathbf{P}_{X,Y}$, and we can write the parameter $b$ in (47) as

$$b^{\top}=b(P)^{\top}=\left[p_{y_1},\ldots,p_{y_n},\,2P-S_1^{\top}\mathbf{P}_X,\ldots,2P-S_{2^{|\mathcal{X}|}-2}^{\top}\mathbf{P}_X\right], \tag{51}$$

where $P$ is the perception index. Also,

$$a_j=\mathbf{P}_Y(y_j)\,\boldsymbol{1}^{|\mathcal{X}|\top}e_j,\quad j=1,\ldots,|\mathcal{Y}|, \tag{52}$$
$$s_i=S_i\mathbf{P}_Y^{\top},\quad i=1,\ldots,2^{|\mathcal{X}|}-2, \tag{53}$$

where $S_i$ are the vectors of the set $\left\{-1,1\right\}^{|\mathcal{X}|}\backslash\left\{\pm\left[1,\ldots,1\right]\right\}$, and $e_j$ is the $j$-th unit vector in the standard basis.
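To make the bookkeeping in (51) concrete, the following sketch enumerates the sign vectors $S_i$ and assembles $b(P)$. It is an illustration only: the helper names `sign_vectors` and `build_b` are hypothetical, and the ordering of the $S_i$ (lexicographic here) is an arbitrary choice not fixed by the text.

```python
from itertools import product


def sign_vectors(n_x):
    """All vectors in {-1, +1}^n_x except the two constant ones (2^n_x - 2 in total)."""
    return [s for s in product((-1, 1), repeat=n_x) if len(set(s)) > 1]


def build_b(p_y, p_x, perception_index):
    """Right-hand side b(P) of (51).

    First the |Y| output probabilities p_y, then one entry 2P - S_i^T P_X
    for each admissible sign vector S_i.
    """
    tail = [
        2.0 * perception_index - sum(s_k * p_k for s_k, p_k in zip(s, p_x))
        for s in sign_vectors(len(p_x))
    ]
    return list(p_y) + tail


# Hypothetical example with |X| = 3 and |Y| = 2.
b = build_b(p_y=[0.5, 0.5], p_x=[0.2, 0.3, 0.5], perception_index=0.1)
```

The number of perception-related entries grows as $2^{|\mathcal{X}|}-2$, so this explicit enumeration is practical only for small source alphabets.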

For convenience, let us split the decision variables in (48) into two groups: the $|\mathcal{Y}|$ variables $\{w_y\}_{y\in\mathcal{Y}}$ related to the stochasticity constraint for each symbol in $\mathcal{Y}$, and the $2^{|\mathcal{X}|}-2$ variables $\{\nu_i\}$ related to the perception constraints (12). Now, (48) becomes

{[w,νにゅー]bmaxw,νにゅーs.t.νにゅーi