Randomness-Efficient Constructions of Capacity-Achieving List-Decodable Codes

Jonathan Mosheiff. Department of Computer Science, Ben-Gurion University. Research supported by an Alon Fellowship. Part of this work was conducted while the author was visiting the Simons Institute for the Theory of Computing. mosheiff@bgu.ac.il

Nicolas Resch. Informatics Institute, University of Amsterdam. Research supported by a Veni grant (VI.Veni.222.347) from the Dutch Research Council (NWO). Part of this work was conducted while the author was visiting the Simons Institute for the Theory of Computing. n.a.resch@uva.nl

Kuo Shang. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University. billy63878@sjtu.edu.cn

Chen Yuan. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University. chen_yuan@sjtu.edu.cn
(May 15, 2024)
Abstract

In this work, we consider the task of generating list-decodable codes over small (say, binary) alphabets using as little randomness as possible. Specifically, we hope to generate codes achieving what we term the Elias bound, which means that they are $(\rho,L)$-list-decodable with rate $R\geq 1-h(\rho)-O(1/L)$. A long line of work shows that uniformly random linear codes (RLCs) achieve the Elias bound: hence, we know $O(n^{2})$ random bits suffice. Prior works (Guruswami and Mosheiff, FOCS 2022; Putterman and Pyne, arXiv 2023) demonstrate that just $O(Ln)$ random bits suffice, via puncturing of low-bias codes. These recent constructions are essentially combinatorial, and rely (directly or indirectly) on graph expansion.

We provide two new constructions, which are algebraic. Compared to prior works, our constructions are considerably simpler and more direct. Furthermore, our codes are designed in such a way that their duals are also quite easy to analyze. Our first construction, which can be seen as a generalization of the celebrated Wozencraft ensemble, achieves the Elias bound and consumes $Ln$ random bits. Additionally, its dual code achieves the Gilbert-Varshamov bound with high probability, and both the primal and dual admit quasilinear-time encoding algorithms. The second construction consumes $2nL$ random bits and yields a code such that both it and its dual achieve the Elias bound. As we discuss, properties of a dual code are often crucial for applications of error-correcting codes in cryptography.

In all of the above cases, including the prior works achieving randomness complexity $O(Ln)$, the codes are designed to “approximate” RLCs. More precisely, for a given locality parameter $L$ we construct codes achieving the same $L$-local properties as RLCs. This allows one to appeal to known list-decodability results for RLCs and thereby conclude that the code approximating an RLC also achieves the Elias bound (with high probability). As a final contribution, we indicate that such a proof strategy is inherently unable to generate list-decodable codes of rate $R$ over $\mathbb{F}_{q}$ with fewer than $L(1-R)n\log_{2}(q)$ bits of randomness.

1 Introduction

The basic task of coding theory is to define subsets $\mathcal{C}\subseteq[q]^{n}$, where $q\in\mathbb{N}$ is the alphabet size and $n\in\mathbb{N}$ is the block-length, that satisfy two conflicting desiderata. Firstly, the code $\mathcal{C}$ should be as large as possible, as this corresponds to the amount of information that one transmits in $n$ symbol transmissions. But secondly, the elements of $\mathcal{C}$, termed codewords, should be as spread out as possible in order to minimize the likelihood that two distinct codewords are confused should errors be introduced. In this work, we will focus almost exclusively on linear codes, in which case we require $q$ to be a prime power and insist that $\mathcal{C}\leq\mathbb{F}_{q}^{n}$, i.e., $\mathcal{C}$ is a subspace of the vector space $\mathbb{F}_{q}^{n}$. Unless otherwise mentioned, from now on all codes are linear.

Typically, instead of directly working with the cardinality $|\mathcal{C}|$ of a code, one analyzes its rate $R=\frac{\log_{q}|\mathcal{C}|}{n}=\frac{\dim(\mathcal{C})}{n}$, which measures the amount of information transmitted per codeword symbol. To measure a code’s error-resilience, various metrics can be used. The most basic (at least for the setting of worst-case errors) is $\mathcal{C}$’s minimum distance $\delta:=\min\{d(\bm{x},\bm{y}):\bm{x},\bm{y}\in\mathcal{C},\bm{x}\neq\bm{y}\}$, where here and throughout $d(\bm{x},\bm{y}):=\frac{1}{n}|\{i\in\{1,2,\dots,n\}:x_{i}\neq y_{i}\}|$ is the (normalized) Hamming distance. A classical observation is that as long as a $\rho<\delta/2$ fraction of symbols is corrupted, one can always uniquely-decode to recover the original codeword (at least information-theoretically; algorithmic decoding is a separate challenge).

The first question one might ask, then, is what sort of tradeoffs one can achieve between rate and distance. A classical result due to Gilbert [Gil52] and Varshamov [Var57] states that, for any $R,\delta$ satisfying $R<1-h_{q}(\delta)$, there exist infinite families of codes of rate at least $R$ and distance at least $\delta$. (Here, $h_{q}(\cdot)$ denotes the $q$-ary entropy function, which we define formally in Section 2.1.) We say that codes which achieve this tradeoff (or, in some cases, get $\varepsilon$-close for some small $\varepsilon$) achieve the GV bound.

A natural relaxation of unique-decoding that we focus upon is list-decoding: for a parameter $\rho\in(0,1-1/q)$ and an integer $L\geq 1$ we call a code $\mathcal{C}$ $(\rho,L)$-list-decodable if for any $\bm{z}\in\mathbb{F}_{q}^{n}$, the number of codewords at distance at most $\rho$ from $\bm{z}$ is less than $L$. In notation:

$$\forall\bm{z}\in\mathbb{F}_{q}^{n},\quad |\{\bm{c}\in\mathcal{C}:d(\bm{c},\bm{z})\leq\rho\}|<L\ .$$
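For very small codes the definition above can be checked by exhaustive search. The following is a minimal sketch, assuming binary codes given as sets of tuples; the helper names (`span_f2`, `max_ball_occupancy`, `is_list_decodable`) and the choice of the $[7,4]$ Hamming code as a test case are illustrative and do not come from the paper. The search takes time exponential in $n$ and is meant only to make the quantifiers concrete.

```python
from itertools import product

def hamming_fraction(x, y):
    """Normalized Hamming distance d(x, y) between equal-length tuples."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def span_f2(generators):
    """All F_2-linear combinations of the given generator rows."""
    n = len(generators[0])
    return {
        tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 2
              for i in range(n))
        for coeffs in product((0, 1), repeat=len(generators))
    }

def max_ball_occupancy(code, rho):
    """Largest number of codewords within distance rho of any center z."""
    n = len(next(iter(code)))
    return max(
        sum(hamming_fraction(c, z) <= rho for c in code)
        for z in product((0, 1), repeat=n)
    )

def is_list_decodable(code, rho, L):
    """Definition as stated above: every ball holds fewer than L codewords."""
    return max_ball_occupancy(code, rho) < L

# Illustrative test case: a systematic generator matrix of the [7,4] Hamming code.
HAMMING_7_4 = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
code = span_f2(HAMMING_7_4)
print(max_ball_occupancy(code, 1 / 7), max_ball_occupancy(code, 2 / 7))
```

Because the definition uses a strict inequality, a code whose fullest ball of radius $\rho$ contains $m$ codewords is $(\rho,L)$-list-decodable exactly for $L>m$ in this sketch.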

Early work due to Elias and Wozencraft [Eli57, Woz58, Eli91] proposed list-decodable codes as an object of study, largely as an intermediate target on the way to unique-decoding. In the past 30 years or so, list-decodable codes have seen increased attention due to their connections to other parts of theoretical computer science, particularly complexity theory, cryptography and pseudorandomness [GL89, BFNW90, Lip90, KM93, Jac97, STV01]. Note that the above discussion of unique-decodability implies that any code with distance $\delta$ is $(\delta/2,1)$-list-decodable. In particular, by choosing a code $\mathcal{C}$ achieving the GV bound, we can have a rate $R<1-h_{q}(2\rho)$ code which is $(\rho,1)$-list-decodable.

If one allows the list-size parameter $L$ to grow, the list-decoding capacity theorem essentially says that we can correct up to twice as many errors for the same rate. More precisely, there exist $(\rho,L)$-list-decodable codes of rate $1-h_{q}(\rho)-O(1/L)$. Informally, one says that a code achieves list-decoding capacity if its rate is arbitrarily close to $1-h_{q}(\rho)$ with list-size $L\leq\mathrm{poly}(n)$. (Technically, one should speak of an infinite family of codes of increasing block-length whose rates have limit $R$. In this work, we will not be too careful with this formalism, but it should be clear that our constructions lead to such infinite families.) For our purposes, we are interested in codes that achieve the tradeoff achieved by random codes. Introducing some terminology, we will say a code construction $\mathcal{C}$ achieves the Elias bound if it is $(\rho,L)$-list-decodable and has rate at least $1-h_{q}(\rho)-O(1/L)$.

We also mention that a generalization of list-decoding, termed list-recovery, has seen increasing attention in recent years. It was originally abstracted as a useful primitive in list-decoding concatenated codes [GI01, GI02, GI03, GI04]. However, it has recently proved itself to merit investigation in its own right, finding applications in cryptography [HIOS15, HLR21], randomness extraction [GUV09], hardness amplification [DMOZ20], group testing [INR10, NPR11], streaming algorithms [DW22], and beyond. The interested reader is directed to Section 2.3 for the precise definition of list-recovery; for now, suffice it to say that all of the preceding and ensuing discussion generalizes cleanly to list-recovery as well.

An outstanding problem in the theory of error-correcting codes is to provide explicit constructions of capacity-achieving list-decodable codes. (While we will not be too precise with the meaning of “explicit” in this work, we informally mean that a description of the code can be constructed deterministically in time polynomial in $n$.) The problem in the regime of “large alphabet” has seen tremendous progress in the last quarter of a century. Since Guruswami and Rudra demonstrated that folded Reed-Solomon codes achieve list-decoding capacity [GR08], a long line of work has now led to explicit constructions of capacity-achieving codes: namely, codes of rate $R$ which are $(1-R-\varepsilon,\exp(\mathrm{poly}(1/\varepsilon)))$-list-decodable, assuming $q\geq(1/\varepsilon)^{\Omega(1/\varepsilon^{2})}$ [GRZ21]. While achieving optimal tradeoffs between all the parameters involved is still not completely resolved, it is fair to say that we have very satisfactory constructions, assuming $q$ is sufficiently large. However, when it comes to explicitly constructing list-decodable codes over the binary alphabet, the existing results are quite paltry. The only notable successes concern the regime of very high noise, where one hopes to decode at radius $\frac{1}{2}-\varepsilon$ with codes of rate $\Omega(\varepsilon^{2})$, matching (up to constant factors) the rate-distance tradeoff achieved by random linear codes. The current state of the art is Ta-Shma’s code [TS17] achieving rate $\Omega(\varepsilon^{2+o(1)})$, for which we now additionally have efficient unique- and list-decoding algorithms [GJQST20, JST21].

In light of the difficulty of explicitly constructing list-decodable codes over small alphabets, we focus on a more modest goal: let’s construct them randomly using as little randomness as possible. And in this case, we would like to achieve the Elias bound, i.e., for $(\rho,L)$-list-decodability the rate $R$ should be at least $1-h_{q}(\rho)-O(1/L)$. For example, the classical argument of Elias (which argues that random subsets $\mathcal{C}\subseteq\{0,1\}^{n}$ of size $2^{Rn}$ are $(\rho,L)$-list-decodable assuming $R<1-h_{2}(\rho)-1/L$) shows that with exponentially many random bits we can have such a code. This generalizes to $R<1-h_{q}(\rho)-1/L$ for general alphabet size $q$.
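To get a feel for the numbers, the rate thresholds discussed so far can be evaluated directly. The sketch below assumes the standard formula for the $q$-ary entropy, $h_q(x) = x\log_q(q-1) - x\log_q x - (1-x)\log_q(1-x)$ (the paper defers its formal definition to Section 2.1), and takes the list-size loss to be exactly $1/L$ as in Elias’s argument; the function names are illustrative.

```python
import math

def q_ary_entropy(x, q=2):
    """Standard q-ary entropy h_q(x) for 0 <= x <= 1 (assumed definition)."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 0:
        return 0.0
    if x == 1:
        return math.log(q - 1, q)
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))

def elias_rate_threshold(rho, L, q=2):
    """Rate 1 - h_q(rho) - 1/L from Elias's random-coding argument."""
    return 1 - q_ary_entropy(rho, q) - 1 / L

def gv_unique_decoding_rate(rho, q=2):
    """Rate 1 - h_q(2*rho) available for unique decoding via the GV bound."""
    return 1 - q_ary_entropy(2 * rho, q)

rho = 0.11
for L in (2, 10, 100):
    print(L, round(elias_rate_threshold(rho, L), 4),
          round(gv_unique_decoding_rate(rho), 4))
```

For a fixed radius $\rho$, the list-decoding threshold approaches $1-h_q(\rho)$ as $L$ grows, whereas the unique-decoding rate stays at $1-h_q(2\rho)$.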

If rather than a plain random code one instead samples a random linear code (RLC), a long line of works [ZP81, GHSZ02, GHK11, CGV13, Woo13, RW14, RW18, LW21, GLM+22, AGL23] shows that they achieve the Elias bound in most parameter regimes. In particular, [LW21, GLM+22] settle the binary case. Hence, $O(n^{2})$ random bits are sufficient.

To push beyond this, [MRR+20] shows that random low-density parity-check (LDPC) codes also achieve list-decoding capacity efficiently, and such codes can be sampled with $O(n\log n)$ bits of randomness. This work actually argues something stronger: namely, any local property that is satisfied by a random linear code of rate $R$ with high probability is also satisfied by a random LDPC code of rate $R-o(1)$.

While we precisely define local properties in Section 2.4, for now we give the following intuitive explanation: for a given locality parameter $\ell=O(1)$, $\ell$-local properties are defined by excluding a collection of “forbidden subsets” of size $\ell$. In the case of list-decodability, the collection would be defined as the family of all $L$-tuples of vectors $\bm{x}_{1},\dots,\bm{x}_{L}$ which all lie in a Hamming ball of radius $\rho$. That is, $(\rho,L)$-list-decodability is an $L$-local property. The same in fact holds for $(\rho,\lambda,L)$-list-recoverability: it is also an $L$-local property.

Subsequent work by Guruswami and Mosheiff [GM22] provides a means of sampling codes achieving list-decoding capacity efficiently with only $O(n)$ randomness. In fact, as is the case for LDPC codes, these codes achieve the same local properties as RLCs. First, note that an RLC is nothing but a random puncturing of the Hadamard code. (Recall that the Hadamard code encodes a message $\bm{m}\in\mathbb{F}_{2}^{k}$ into a length-$2^{k}$ codeword by computing $\langle\bm{m},\bm{x}\rangle$ for every $\bm{x}\in\mathbb{F}_{2}^{k}$.) Observe further that the Hadamard code is optimally balanced, in the sense that every non-zero codeword has weight precisely $1/2$. Guruswami and Mosheiff then suggest puncturing some other explicitly chosen “mother code” of block-length $N$: so long as this code is nearly balanced, in the sense that all non-zero codewords have weight $\approx 1/2$, a random puncturing will again “look like” an RLC from the perspective of local properties. Assuming $N\leq\mathrm{poly}(n)$, we need $n\log N=O(n\log n)$ random bits to sample such a code, matching the guarantee for LDPC codes. To achieve $O(n)$ randomness, one must ensure $N\leq O(n)$ (by choosing, e.g., Ta-Shma’s codes [TS17] for the mother code) and then puncture without replacement: one thus requires only $\log\binom{N}{n}=O(n)$ bits of randomness.
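The randomness counts in this paragraph are easy to tabulate. The following sketch compares three sampling strategies for the binary case; it assumes that the RLC is sampled through a uniformly random $k\times n$ generator matrix, and the concrete values of $n$, $k$ and $N$ are arbitrary illustrations rather than parameters from the paper.

```python
import math

def index_bits(num_options):
    """Bits needed to name one of `num_options` outcomes, i.e. ceil(log2)."""
    return (num_options - 1).bit_length()

def bits_rlc(n, k):
    """Uniformly random k x n binary generator matrix (assumed RLC model)."""
    return k * n

def bits_puncture_with_replacement(n, N):
    """n independent coordinates of a length-N mother code: n * log N."""
    return n * index_bits(N)

def bits_puncture_without_replacement(n, N):
    """One n-subset of the N coordinates: log binom(N, n)."""
    return index_bits(math.comb(N, n))

n, k, N = 1000, 500, 4000   # illustrative values with N = O(n)
print(bits_rlc(n, k))
print(bits_puncture_with_replacement(n, N))
print(bits_puncture_without_replacement(n, N))
```

Since $\binom{N}{n}\leq 2^{N}$, the last count is at most $N$, which is $O(n)$ whenever $N\leq O(n)$.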

Very recently, another derandomization has been offered. Putterman and Pyne [PP23] demonstrate that instead of choosing each coordinate independently one can choose them via an expander random walk. This then means that we only require $O(n\log d)$ bits of randomness to sample the code, where $d$ is the degree of the expander graph. Assuming $d=O(1)$ (which is achievable if one is interested in local properties) we in particular find that $O(n)$ bits of randomness suffice.

Thus, we currently know how to construct list-decodable binary codes achieving capacity efficiently with $O(n)$ bits of randomness. As elaborated below, this seems like a hard barrier for current techniques.

The above constructions are quite “indirect,” requiring the existence of a sufficiently nice mother code that can then be punctured. While explicit constructions of such highly balanced codes are known, the constructions are all quite nontrivial. This status naturally leads us to wonder if we can provide more “direct” randomness-efficient constructions of binary codes achieving the Elias bound.

Furthermore, we also find motivations stemming from code-based cryptography. In this setting, one would often like to generate codes that “look like” random codes, but in fact admit efficient descriptions, as the description of the code is often some sort of public parameter that must be known by all parties making use of the cryptographic scheme. We elaborate upon this connection below. And in this case, one would often like the dual code to also look random (again, we discuss this motivation further below). We observe that while the dual of an RLC is again an RLC – and hence will also satisfy the Elias bound with high probability – the above constructions (LDPC or puncturing-based) do not have such guarantees. And indeed, the dual of an LDPC code certainly cannot even have linear minimum distance! As for the puncturing-based constructions, it is unclear to us whether random puncturing can yield the Elias bound and good dual distance; at the very least, such a proof would require new techniques.

1.1 Our Results

In this work, we provide two new randomized constructions of codes achieving the Elias bound and consuming only $O(Ln)$ bits of randomness. In fact, for any (constant) locality parameter $\ell$, we show that these codes are $\ell$-locally similar (see Definition 2.14) to RLCs, which implies that any $\ell$-local property satisfied by RLCs with high probability is also satisfied by our codes with high probability (in fact, the success probability will be of the form $1-q^{-\Omega(n)}$). In particular, taking $\ell=L$ implies that all our codes achieve list-decoding capacity efficiently with high probability. We provide our constructions for general (but constant) field size $q$, although we are mostly motivated by the binary case.

The notions of local property and local similarity are thoroughly defined and discussed in Section 2.4. For concreteness, we give a shorter and less precise description here, and for simplicity we restrict attention to the binary case. Fix a locality parameter $\ell\in\mathbb{N}$ and consider the set of all $n\times\ell$ binary matrices. We generally think of $\ell$ as constant while $n$ tends to infinity. A code $\mathcal{C}\subseteq\mathbb{F}_{2}^{n}$ is said to contain a matrix $A\in\mathbb{F}_{2}^{n\times\ell}$ if it contains all the columns of $A$ as codewords. We group the matrices in $\mathbb{F}_{2}^{n\times\ell}$ according to their row distribution. More precisely, we associate with $A\in\mathbb{F}_{2}^{n\times\ell}$ a distribution $\mathsf{Emp}_{A}\sim\mathbb{F}_{2}^{\ell}$ that yields a vector $x\in\mathbb{F}_{2}^{\ell}$ proportionally to the number of times that $x$ appears as a row in $A$; namely, it is the distribution $\tau$ with $\tau(x)=\frac{|\{i\in[n]:A_{i}=x\}|}{n}$, where $A_{1},\dots,A_{n}\in\mathbb{F}_{2}^{\ell}$ denote $A$’s rows. We denote the set of all matrices in $\mathbb{F}_{2}^{n\times\ell}$ with row distribution $\tau$ by $\mathcal{M}_{n,\tau}$. We can now define the notion of local-similarity to an RLC for binary codes.
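The objects just introduced are simple to compute for a concrete matrix. The sketch below, with illustrative helper names, builds the empirical row distribution of a matrix given as a list of rows, measures its entropy in bits, counts the matrices sharing that row distribution (a multinomial coefficient), and tests containment in a code given as a set of tuples.

```python
from collections import Counter
from fractions import Fraction
import math

def empirical_row_distribution(A):
    """Emp_A as a dict mapping each row x to the fraction of rows equal to x."""
    n = len(A)
    counts = Counter(tuple(row) for row in A)
    return {x: Fraction(c, n) for x, c in counts.items()}

def entropy_bits(tau):
    """Shannon entropy of a distribution, measured in bits."""
    return -sum(float(p) * math.log2(float(p)) for p in tau.values())

def type_class_size(tau, n):
    """Number of n-row matrices whose row distribution is exactly tau."""
    size = math.factorial(n)
    for p in tau.values():
        count = p * n
        assert count.denominator == 1, "n * tau(x) must be an integer"
        size //= math.factorial(count.numerator)
    return size

def code_contains(code, A):
    """True if every column of A is a codeword of `code` (a set of tuples)."""
    return all(tuple(col) in code for col in zip(*A))

A = [(0, 0), (0, 1), (0, 1), (1, 1)]      # n = 4 rows, ell = 2 columns
tau = empirical_row_distribution(A)
print(tau, entropy_bits(tau), type_class_size(tau, len(A)))
```

The size of the class of matrices with row distribution $\tau$ is at most $2^{H(\tau)n}$, where $H$ is the entropy in bits; this standard counting estimate is what makes entropy the natural scale for the definition that follows.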

Definition 1.1 (Local similarity to RLC in the binary case).

Let $\mathcal{C}\leq\mathbb{F}_{2}^{n}$ be a linear code sampled from some ensemble. We say that $\mathcal{C}$ is $\ell$-locally-similar to an RLC of rate $R$ if, for every $1\leq b\leq\ell$ and every distribution $\tau\sim\mathbb{F}_{2}^{b}$ with $\dim(\tau)=b$, we have

$$\mathop{\mathbb{E}}_{\mathcal{C}}\left[|\{A\in\mathcal{M}_{n,\tau}:A\subseteq\mathcal{C}\}|\right]\leq 2^{(H_{2}(\tau)-b(1-R))n}\ .$$

Above, $H_{2}(\tau)$ denotes the entropy of the distribution $\tau$, measured in bits.

Less formally, $\mathcal{C}$ is locally similar to an RLC if, for every $\tau$, the expected number of matrices from $\mathcal{M}_{n,\tau}$ in $\mathcal{C}$ is not much larger than that in an RLC. The motivation for this definition is that local-similarity of $\mathcal{C}$ to an RLC implies that $\mathcal{C}$ almost surely satisfies every local property (a notion formulated in Definition 2.9) that is satisfied by an RLC with high probability. As important motivating special cases, we note that list-decodability and list-recoverability are both local properties; this is established in e.g. [MRR+20, Res20]. Therefore, we can morally say that any code satisfying Definition 1.1 is likely to be list-decodable and list-recoverable with similar parameters to those of an RLC. In particular, such a code is likely to achieve the Elias bound.
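As a sanity check on the exponent in Definition 1.1, the following short calculation sketches why a genuine RLC meets the bound. It rests on assumptions that are ours and are not fixed by the text above: the RLC is modeled as $\ker H$ for a uniformly random parity-check matrix $H\in\mathbb{F}_{2}^{(1-R)n\times n}$ with rows $\bm{h}_{j}$, the condition $\dim(\tau)=b$ is read as saying that every $A\in\mathcal{M}_{n,\tau}$ has $b$ linearly independent columns, and $|\mathcal{M}_{n,\tau}|\leq 2^{H_{2}(\tau)n}$ is the standard counting estimate.

```latex
\begin{align*}
\Pr_{H}\left[A \subseteq \ker H\right]
  &= \Pr_{H}\left[HA = 0\right]
   = \prod_{j=1}^{(1-R)n} \Pr_{\bm{h}_j}\left[\bm{h}_j A = \bm{0}\right]
   = \left(2^{-b}\right)^{(1-R)n}
   = 2^{-b(1-R)n}, \\
\mathop{\mathbb{E}}_{H}\left[\,|\{A \in \mathcal{M}_{n,\tau} : A \subseteq \ker H\}|\,\right]
  &= |\mathcal{M}_{n,\tau}| \cdot 2^{-b(1-R)n}
   \leq 2^{H_2(\tau)n} \cdot 2^{-b(1-R)n}
   = 2^{(H_2(\tau) - b(1-R))n}.
\end{align*}
```

The first line uses that, for $A$ of rank $b$, the vector $\bm{h}_{j}A$ is uniform over $\mathbb{F}_{2}^{b}$ when $\bm{h}_{j}$ is uniform over $\mathbb{F}_{2}^{n}$, and that the rows of $H$ are independent; the second line is linearity of expectation.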

We now turn to describing our constructions. In contrast to prior works, neither of our constructions relies on an explicit “mother code” which we then puncture; instead, both are built “from scratch.” Our constructions also have the pleasing property of being rather simple. A final major bonus of our codes is that their duals also satisfy non-trivial properties: for the first construction, its dual achieves the GV bound with high probability; for the second, its dual is also $\ell$-locally similar to RLCs!

Our first construction, which yields a code over $\mathbb{F}_{q}$ of block-length $n$, uses linearized polynomials $f(X)$ with $q$-degree at most $\ell-1$. That is, $f(X)$ is of the form $\sum_{i=0}^{\ell-1}f_{i}X^{q^{i}}$ where each $f_{i}\in\mathbb{F}_{q^{n}}$, the degree $n$ extension of $\mathbb{F}_{q}$. As we review in Section 2.2, such linearized polynomials define $\mathbb{F}_{q}$-linear maps $\mathbb{F}_{q^{n}}\to\mathbb{F}_{q^{n}}$. That is, for all $a,b\in\mathbb{F}_{q}$ and $\alpha,\beta\in\mathbb{F}_{q^{n}}$, we have $f(a\alpha+b\beta)=af(\alpha)+bf(\beta)$. The code is sampled by sampling the coefficients $f_{i}\in\mathbb{F}_{q^{n}}$ independently and uniformly at random. In particular, this requires only $\ell\lceil n\log_{2}q\rceil$ uniformly random bits.
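The linearity property can be verified exhaustively in a tiny field. The sketch below works in $\mathbb{F}_{2^{4}}$, so $q=2$ and $\mathbb{F}_{q}$-linearity reduces to additivity; field elements are encoded as integers below 16, addition is XOR, and multiplication is reduced modulo $x^{4}+x+1$. The encoding, the modulus, the coefficient values and the function names are illustrative choices rather than anything specified in the paper. For contrast, the non-linearized monomial $X^{3}$ is tested as well.

```python
N = 4                 # extension degree n (tiny, for illustration only)
MODULUS = 0b10011     # x^4 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply two elements of F_{2^N}, encoded as integers below 2**N."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:      # degree reached N: reduce modulo MODULUS
            a ^= MODULUS
    return result

def eval_linearized(coeffs, x):
    """Evaluate f(x) = sum_i coeffs[i] * x^(2^i) in F_{2^N}."""
    acc, power = 0, x
    for f_i in coeffs:
        acc ^= gf_mul(f_i, power)
        power = gf_mul(power, power)   # x^(2^i) -> x^(2^(i+1))
    return acc

def is_additive(func):
    """Check func(a + b) == func(a) + func(b) for all field elements."""
    field = range(1 << N)
    return all(func(a ^ b) == func(a) ^ func(b) for a in field for b in field)

coeffs = [3, 7, 12]   # arbitrary coefficients f_0, f_1, f_2 (so ell = 3)

def linearized(x):
    return eval_linearized(coeffs, x)

def cube(x):
    return gf_mul(gf_mul(x, x), x)     # X^3 is not a linearized monomial

print(is_additive(linearized), is_additive(cube))
```

Additivity holds because squaring is additive in characteristic 2; the same argument with the map $X\mapsto X^{q}$ gives $\mathbb{F}_{q}$-linearity in general.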

To provide codes with rate $R=k/n$, we fix an $\mathbb{F}_q$-linear subspace $V\subseteq\mathbb{F}_{q^n}$ of dimension $k$. The code is then defined as $\{\varphi(f(\alpha)):\alpha\in V\}$, where $\varphi:\mathbb{F}_{q^n}\to\mathbb{F}_q^n$ is any bijective $\mathbb{F}_q$-linear map. Recall that such a map exists as $\mathbb{F}_{q^n}$ is of dimension $n$ as a vector space over $\mathbb{F}_q$, and any two vector spaces over the same field of the same dimension are isomorphic.
For example, if $\omega_1,\dots,\omega_n$ is a basis for $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$, we could set $\varphi:\sum_{i=1}^{n}x_i\omega_i\mapsto(x_1,\dots,x_n)$. We say that $\mathcal{C}$ is a pseudorandom code from linearized polynomials of rate $R$ and degree $\ell$, or just $\mathrm{PCLP}(R,\ell)$ for short, if it is sampled according to the above procedure. [Footnote 6: The dependence on $V$ and $\varphi$ is not made explicit in this notation, as they will turn out to have no impact on our results regarding $\mathcal{C}$’s combinatorial properties.] Requiring the polynomial $f$ to be linearized guarantees that the resulting code is linear, as desired.
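The sampling procedure above can be sketched in a few lines of Python. This is only a toy illustration under assumptions that are not from the paper: we fix $q=2$ and $n=4$, represent $\mathbb{F}_{2^4}$ as 4-bit integers modulo $x^4+x+1$, take $\varphi$ to be the coordinate map for the polynomial basis, and pick $V=\mathrm{span}\{1,x\}$. All function names are hypothetical.

```python
import random

N = 4          # extension degree n (toy value, an assumption)
MOD = 0b10011  # x^4 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply two elements of F_{2^N}, stored as N-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MOD
    return result

def frobenius_power(x, i):
    """Return x^(2^i) by squaring i times."""
    for _ in range(i):
        x = gf_mul(x, x)
    return x

def eval_linearized(coeffs, x):
    """Evaluate f(x) = sum_i coeffs[i] * x^(2^i)."""
    acc = 0
    for i, c in enumerate(coeffs):
        acc ^= gf_mul(c, frobenius_power(x, i))
    return acc

def phi(x):
    """Coordinate vector of x in the basis 1, x, ..., x^(N-1)."""
    return tuple((x >> t) & 1 for t in range(N))

def span(basis):
    """All F_2-linear combinations of the given field elements."""
    elems = {0}
    for b in basis:
        elems |= {e ^ b for e in elems}
    return elems

def sample_pclp(ell, v_basis, rng):
    """Sample ell coefficients (ell * N random bits) and list the code."""
    coeffs = [rng.randrange(1 << N) for _ in range(ell)]
    code = {phi(eval_linearized(coeffs, a)) for a in span(v_basis)}
    return coeffs, code

rng = random.Random(0)
coeffs, code = sample_pclp(ell=2, v_basis=[0b0001, 0b0010], rng=rng)
```

Note that the set `code` can contain fewer than $2^k$ words when the sampled $f$ is not injective on $V$; the sketch makes no attempt to detect or exclude this event.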

Not only are we able to show that such codes achieve the Elias bound with high probability, we also show that their dual code achieves the GV bound. As we elaborate upon below, for cryptographic applications a code’s dual distance is often a crucial parameter of interest. We remark that we are not aware of any prior construction of binary codes that consumes $O(n)$ randomness and outputs codes with both distance and dual distance lying on the GV bound.

Having realized that with $\ell n$ randomness we can construct a binary code that is $\ell$-locally similar to RLCs with dual code achieving the GV bound (which informally follows from being $1$-locally similar to RLCs), it is natural to wonder if it is possible to get both primal and dual code $\ell$-locally similar to RLCs. We emphasize again that this would imply that both the primal and the dual code achieve the Elias bound for list size $L=\ell$. The answer to this question is yes: our second construction has exactly this property. We now turn to describing this construction.

First, fix distinct elements $\alpha_1,\dots,\alpha_n\in\mathbb{F}_{q^n}$ (they need not be linearly independent over $\mathbb{F}_q$). Let $\gamma:\mathbb{F}_{q^n}\to\mathbb{F}_q^n$ be a full-rank linear map (as with $\varphi$ before), and let $\eta:\mathbb{F}_{q^n}\to\mathbb{F}_q^k$ be any surjective $\mathbb{F}_q$-linear map.
For example, if $\omega_1,\dots,\omega_n$ is a basis for $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$, we could set $\gamma:\sum_{i=1}^{n}x_i\omega_i\mapsto(x_1,\dots,x_n)$ and $\eta:\sum_{i=1}^{n}x_i\omega_i\mapsto(x_1,\dots,x_k)$.

For a given rate $R=k/n$, we independently choose two polynomials $f(X),g(X)\in\mathbb{F}_{q^n}[X]$ uniformly amongst all such polynomials of degree at most $\ell-1$ (unlike in the previous construction, these polynomials need not be linearized). Note that this requires only $2\ell\lceil n\log_2 q\rceil$ uniformly random bits. We then define the following two matrices $G',G''\in\mathbb{F}_q^{k\times n}$:

  • For each $i\in[k]$, the $i$-th row of $G'$ is defined to be $\gamma(f(\alpha_i))$.

  • For each $j\in[n]$, the $j$-th column of $G''$ is defined to be $\eta(g(\alpha_j))$.

We then define $G:=G'+G''$ and set $\mathcal{C}:=\{\bm{x}G:\bm{x}\in\mathbb{F}_q^k\}$. We call a code constructed in this way a pseudorandom code from row and column polynomials of rate $R$ and degree $\ell$, or just $\mathrm{PCRCP}(R,\ell)$. [Footnote 7: For reasons analogous to before, this notation omits mention of $\gamma$, $\eta$ and $\alpha_1,\dots,\alpha_n$.] Informally, the matrix $G'$ is responsible for ensuring that the primal code is $\ell$-locally similar to RLCs, while the matrix $G''$ guarantees the same holds for the dual code. In particular, if we had just set $G=G'$ then only the primal code would be $\ell$-locally similar to RLCs, while if we had just set $G=G''$ then only the dual code would be $\ell$-locally similar to RLCs.
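As a rough illustration of how the generator matrix $G=G'+G''$ is assembled, here is a toy Python sketch. It rests on several assumptions not taken from the paper: $q=2$, $n=4$, $k=2$, the field $\mathbb{F}_{2^4}$ stored as 4-bit integers modulo $x^4+x+1$, evaluation points $\alpha_j=j$, and $\gamma,\eta$ given by the polynomial-basis coordinates (all of them, respectively the first $k$). The helper names are hypothetical.

```python
import random

N, K = 4, 2          # toy block length n and dimension k (assumptions)
MOD = 0b10011        # x^4 + x + 1, irreducible over F_2
ALPHAS = [1, 2, 3, 4]  # distinct elements alpha_1..alpha_n of F_{2^N}

def gf_mul(a, b):
    """Multiply two elements of F_{2^N}, stored as N-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MOD
    return result

def eval_poly(coeffs, x):
    """Evaluate an ordinary polynomial sum_i coeffs[i] * x^i via Horner."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def gamma(x):
    """Bijective F_2-linear map F_{2^N} -> F_2^N (bit coordinates)."""
    return [(x >> t) & 1 for t in range(N)]

def eta(x):
    """Surjective F_2-linear map F_{2^N} -> F_2^K (first K coordinates)."""
    return gamma(x)[:K]

def build_generator(f, g):
    """Return G = G' + G'' as a K x N list of bits."""
    g_prime = [gamma(eval_poly(f, ALPHAS[i])) for i in range(K)]   # rows
    cols = [eta(eval_poly(g, ALPHAS[j])) for j in range(N)]        # columns
    g_dprime = [[cols[j][i] for j in range(N)] for i in range(K)]
    return [[g_prime[i][j] ^ g_dprime[i][j] for j in range(N)] for i in range(K)]

rng = random.Random(1)
ell = 2
f = [rng.randrange(1 << N) for _ in range(ell)]  # degree at most ell - 1
g = [rng.randrange(1 << N) for _ in range(ell)]
G = build_generator(f, g)
```

The two coefficient lists together consume $2\ell n$ random bits, matching the count stated above for $q=2$.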

This second construction thus yields the following result.

Theorem 1.2 (Informal; follows from Theorem 4.2).

Let $\ell,n\in\mathbb{N}$ and $R\in(0,1)$ be such that $Rn\in\mathbb{N}$, and let $q$ be a prime power. Let $\mathcal{P},\mathcal{P}^{\perp}$ be $\ell$-local properties, and suppose that an $\mathrm{RLC}(R)$ satisfies $\mathcal{P}$ with probability $1-q^{-\Omega(n)}$ and an $\mathrm{RLC}(1-R)$ satisfies $\mathcal{P}^{\perp}$ with probability $1-q^{-\Omega(n)}$. Then, for sufficiently large $n$, there exists a randomized procedure consuming $O(\ell n\log q)$ bits of randomness outputting a code $\mathcal{C}$ such that, with probability at least $1-q^{-\Omega(n)}$:

  • $\mathcal{C}$ has rate $R$;

  • $\mathcal{C}$ satisfies $\mathcal{P}$; and

  • $\mathcal{C}^{\perp}$ satisfies $\mathcal{P}^{\perp}$, where $\mathcal{C}^{\perp}$ is the code dual to $\mathcal{C}$.

As mentioned above, we do know of other code ensembles sampled with linear randomness that share local properties with RLCs. However, we are not aware of any other code ensembles for which the dual code also shares local properties with RLCs.

Lastly, we mention one other pleasing feature of our first construction based on linearized polynomials. Namely, a careful choice of representation for $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$ allows one to view the task of encoding a message as a constant number of polynomial multiplications, which can be computed in $O(n\log n)$ time via standard FFT-type methods. Thus, we can also claim the following result.

Theorem 1.3 (Informal; follows from Theorem 3.4 and Proposition 3.5).

Let $\ell,n\in\mathbb{N}$ and $R\in(0,1)$ be such that $Rn\in\mathbb{N}$, and let $q$ be a prime power. Let $\mathcal{P}$ be an $\ell$-local property, and suppose that an $\mathrm{RLC}(R)$ satisfies $\mathcal{P}$ with probability $1-q^{-\Omega(n)}$. Then, for sufficiently large $n$, there exists a randomized procedure consuming $O(\ell n\log q)$ bits of randomness outputting a code $\mathcal{C}$ such that, with probability at least $1-q^{-\Omega(n)}$:

  • $\mathcal{C}$ has rate $R$;

  • $\mathcal{C}$ satisfies $\mathcal{P}$;

  • $\mathcal{C}^{\perp}$ achieves the GV bound; and

  • $\mathcal{C}$ is encodable in $O(n\log n)$ time.

Cryptographic motivation.

We remark that codes that “look like random linear codes” but are in fact samplable with less randomness are highly motivated by cryptographic considerations. In fact, achieving good dual distance is often a crucial desideratum: the security of a cryptosystem is typically tied to the dual distance of the code, whether this is provably the case (e.g., with secret-sharing schemes [CDD+15],[CXY20]) or plausibly the case (e.g., the linear tests framework for learning parity with noise [BCG+22]). Moreover, codes that require less randomness to generate allow for reduced public key sizes: key size is typically the major drawback of post-quantum public-key cryptosystems. Hence the popularity of, e.g., quasi-cyclic [MAB+18] and moderate-density parity-check codes [ABB+22].

While for McEliece-type encryption schemes [McE78] an important requirement is that the code admits an efficient decoding algorithm, this is not in fact required for recent applications of error-correcting codes in multi-party computation – e.g., in the context of pseudorandom correlation generators (PCGs). In fact, one typically hopes that such codes do not admit efficient decoding [BIP+18, DGH+21]. A current “rule-of-thumb” is that the employed code should have good dual distance. In our view, a much more satisfying guarantee is that the dual code in fact shares more sophisticated properties with random linear codes, e.g., list-decodability/-recoverability, as our techniques can establish.

We further remark that many recent code constructions for PCGs [BCG+20, BCG+22, RRT23] in fact only admit randomized constructions that fail with probability $1/\mathrm{poly}(n)$; that is, they fail with non-negligible probability. This implies that the resulting constructions technically fail to satisfy standard security definitions. In contrast, all of our code constructions satisfy the targeted combinatorial properties with probability at least $1-\exp(-\Omega(n))$.

Concretely, one can plug the (dual of) our first code construction into the framework of [BCG+19] to obtain PCGs for standard correlations like oblivious transfers with quasi-linear computation time for the involved parties. While constructions of such efficiency were known previously, we view our additional guarantee of local-similarity to RLCs as a stronger security guarantee than prior constructions offered (which only promised good minimum distance). Additionally, as emphasized above, ours is the first construction of such efficiency with negligible failure probability (in the randomized construction of the utilized code). We leave further investigation of the PCG implications of our codes for future research.

Finally, we recall that a linear code with large distance and dual distance yields a linear secret sharing scheme with small reconstruction and large privacy, and moreover that an asymptotically good linear code with asymptotically good dual yields an asymptotically good linear secret sharing scheme. The asymptotic linear secret sharing scheme was first considered and realized in [CC06], thereby enabling an “asymptotic version” of the general MPC theorem from [BGW88]. Since 2007, with the advent of the so-called “MPC-in-the-head paradigm” [IKOS09], these asymptotically-good schemes have been further exposed as a central theoretical primitive in numerous constant communication-rate results in multi-party cryptographic scenarios and – perhaps surprisingly – in two-party cryptography as well. Druk and Ishai [DI14] utilize an expander graph to construct a linear time encodable code; such a code combined with a linear-time universal hash function [CDD+15] yields an asymptotically good linear secret sharing scheme equipped with a linear time encoding algorithm. Recently, Cramer, Xing and Yuan [CXY20] construct an asymptotically good secret sharing scheme with quasi-linear time encoding and decoding algorithm.

We remark that the privacy and reconstruction of all the above-mentioned asymptotically good schemes do not achieve the optimal trade-off, i.e., the GV bound. In contrast, the linear code derived from our linearized polynomials yields an asymptotically good linear secret sharing scheme with a quasi-linear-time encoding algorithm, and moreover the privacy and reconstruction of the resulting scheme achieve the optimal trade-off.

Challenge of sublinear randomness.

As a final contribution, we highlight the inherent challenge of designing code ensembles that consume $o(n)$ random bits and output codes achieving the Elias bound with high probability, or for that matter, even the GV bound. [Footnote 8: Of course, recent breakthroughs [TS17] provide explicit binary codes of rate nearly $\Omega(\varepsilon^{2})$ with minimum distance $\frac{1}{2}-\varepsilon$; however, such constructions seem inherently unable to reach the GV bound in other regimes, and even in the $\frac{1}{2}-\varepsilon$ distance regime the constant in front of the rate is unlikely to be pushed to $\frac{2}{\ln 2}$, as one would hope.] More precisely, we observe that any code ensemble that is $\ell$-locally similar to RLCs requires at least $\ell(1-R)n\log_2 q$ bits of randomness. This is not much more than an observation (namely, that the granularity required by certain distributions is only achievable with this many bits of randomness), but we nonetheless believe that elucidating this shortcoming is insightful. Note that local similarity to RLCs is merely a sufficient condition for a code ensemble to share combinatorial properties with RLCs. However, we emphasize that all the previous works (including our own) rely on local similarity.

We also observe that our lower bound is tight: a simple twist on our codes from linearized polynomials $\mathrm{PCLP}$ requires only $\ell(1-R)n\log_2 q$ random bits to sample and is $\ell$-locally similar to RLCs. In fact, this code is a natural generalization of the famous Wozencraft ensemble [Mas63]. That is, recall that the Wozencraft ensemble is obtained by uniformly sampling $\beta\in\mathbb{F}_{q^k}$ and then defining

$$\mathcal{C}:=\{(\varphi(\alpha),\varphi(\beta\alpha)):\alpha\in\mathbb{F}_{q^k}\}\ ,$$

where $\varphi:\mathbb{F}_{q^k}\to\mathbb{F}_q^k$ is a full-rank $\mathbb{F}_q$-linear map. Defining $f(X)=\beta X$, the codewords of $\mathcal{C}$ are thus of the form $(\varphi(\alpha),\varphi(f(\alpha)))$. Note that $f(X)$ is in fact a uniformly random linearized polynomial of $q$-degree at most $0$. The generalization that we consider is thus to allow $f(X)$ to have $q$-degree at most $\ell-1$, and we observe that indeed this code ensemble will be $\ell$-locally similar to RLCs. However, a drawback is that this construction only naturally produces codes of rate $1/2$; by sampling $r$ independent linearized polynomials we can also achieve rates of the form $1/r$ for $r\in\mathbb{N}$, but not any possible rate as we can with $\mathrm{PCLP}$s (which can themselves be similarly considered a different generalization of the Wozencraft ensemble). Further discussion of this construction is provided in Section 3.1.
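The rate-$1/2$ generalized Wozencraft ensemble just described can be sketched as follows. This is a toy illustration only, under assumptions not from the paper: $q=2$, $k=4$, the field $\mathbb{F}_{2^4}$ stored as 4-bit integers modulo $x^4+x+1$, and $\varphi$ given by polynomial-basis coordinates; the function names are hypothetical. With `ell=1` the sketch reduces to the classical ensemble with $f(X)=\beta X$.

```python
import random

K = 4          # message length k; the block length is n = 2k (toy value)
MOD = 0b10011  # x^4 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply two elements of F_{2^K}, stored as K-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> K) & 1:
            a ^= MOD
    return result

def eval_linearized(coeffs, x):
    """Evaluate f(x) = sum_i coeffs[i] * x^(2^i)."""
    acc, power = 0, x
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, power)  # next Frobenius power x^(2^(i+1))
    return acc

def phi(x):
    """Coordinate vector of x in the basis 1, x, ..., x^(K-1)."""
    return tuple((x >> t) & 1 for t in range(K))

def sample_generalized_wozencraft(ell, rng):
    """Use ell * K random bits; codewords are (phi(a), phi(f(a)))."""
    coeffs = [rng.randrange(1 << K) for _ in range(ell)]
    code = {phi(a) + phi(eval_linearized(coeffs, a)) for a in range(1 << K)}
    return coeffs, code

rng = random.Random(2)
coeffs, code = sample_generalized_wozencraft(ell=3, rng=rng)
beta, classical = sample_generalized_wozencraft(ell=1, rng=rng)
```

Because the first half of every codeword is $\varphi(\alpha)$, the code always has exactly $2^k$ codewords, and the $\ell k$ random bits used agree with $\ell(1-R)n$ for $R=1/2$ and $n=2k$.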

To conclude this discussion, we provide Figure 1 summarizing our contributions and the prior state-of-the-art.

| Source | Code | Randomness | Dual Code |
|---|---|---|---|
| [GLM+22] | Random Linear Code | $O(n^{2})$ | EB |
| [MRR+20] | Low-Density Parity-Check Codes | $O(Ln\log n)$ | |
| [GM22] | Puncturing of Low-Bias Code | $O(Ln)$ | |
| [PP23] | Expander-Puncturing of Low-Bias Code | $O(Ln)$ | |
| Section 3 | Codes from Linearized Polynomials | $Ln$ | GV |
| Section 3 | Generalized Wozencraft Ensemble | $L(1-R)n$, $R=\frac{1}{\text{integer}}$ | |
| Section 4 | Row-Column Polynomial Codes | $2Ln$ | EB |
| Section 5 | Lower Bound for RLC-similarity | $L(1-R-\varepsilon)n$ | |

Figure 1: Randomness requirements for binary codes achieving the Elias Bound. We remark that all the above constructions generalize to larger (but constant) $q$. Regarding the dual code criterion, “EB” means that the dual code also achieves the Elias Bound (for lists of size $L$), while “GV” means that the dual distance achieves the GV bound. An ✗ means that no guarantees are provided (and, in certain cases, cannot hold). The lower bound applies to all ensembles that achieve similarity to RLC (a stronger property than the Elias bound; see Definitions 1.1, 2.14), including all constructions listed in this table.

1.2 Techniques

Given a random code $\mathcal{C}$ of (design) rate $R$ sampled according to either of the above constructions, we wish to demonstrate that it behaves combinatorially much as an RLC $\mathcal{D}$ of rate $R$. More precisely, we consider any property $\mathcal{P}$ obtained by forbidding $\ell$-sized sets of vectors and wish to show that $\mathcal{C}$ satisfies the property $\mathcal{P}$ with probability roughly the same as $\mathcal{D}$. As discussed above, these properties capture well-studied notions like list-decodability and list-recoverability.

Fortunately, recent works [MRR+20, GM22] have introduced a calculus for making such arguments. Intuitively, a conclusion of these works is that it suffices to argue that, for any $S\subseteq\mathbb{F}_q^n$ of size $\ell$, the probability that $S$ is contained in $\mathcal{C}$ is roughly the same as the probability this holds for $\mathcal{D}$. Of course, this latter probability is $q^{-(1-R)n\operatorname{rank}(S)}$, where $\operatorname{rank}(S)$ denotes the dimension of the vector space spanned by $S$. [Footnote 9: At least, this holds exactly if one samples an RLC by choosing a uniformly random parity-check matrix, which is the model we consider in this work. For other natural models, e.g., sampling a uniformly random generator matrix, $q^{-(1-R)n\operatorname{rank}(S)}$ gives an upper bound on this probability.]
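The containment probability $q^{-(1-R)n\operatorname{rank}(S)}$ in the parity-check model can be checked exhaustively for tiny parameters. The sketch below is an illustration we add under our own assumptions ($q=2$, $n=4$, $k=2$, vectors stored as bitmask integers); it enumerates every parity-check matrix $H\in\mathbb{F}_2^{(n-k)\times n}$ and counts those whose kernel contains $S$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def in_kernel(h_rows, x):
    """True if every row of H is orthogonal to x over F_2."""
    return all(bin(row & x).count("1") % 2 == 0 for row in h_rows)

def containment_probability(S, n, k):
    """Pr over uniform H in F_2^{(n-k) x n} that S lies in ker(H)."""
    total = hits = 0
    for h_rows in product(range(1 << n), repeat=n - k):
        total += 1
        hits += all(in_kernel(h_rows, x) for x in S)
    return Fraction(hits, total)

def rank_f2(vectors):
    """Rank over F_2 via elimination on the leading bit."""
    pivots = {}  # leading bit position -> basis vector
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

def predicted(S, n, k):
    """The formula 2^{-(n-k) * rank(S)}, i.e. q^{-(1-R) n rank(S)}."""
    return Fraction(1, 2 ** ((n - k) * rank_f2(S)))

n, k = 4, 2
independent_pair = [0b0001, 0b0110]
dependent_triple = [0b0001, 0b0110, 0b0111]  # third vector is the sum
```

For both sets above the rank is $2$, so the enumeration and the formula should agree on $2^{-4}$; only the rank of $S$ matters, not its size.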

In a bit more detail, these works in fact view such sets as matrices in $\mathbb{F}_q^{n\times\ell}$, and observe that the forbidden matrices for properties like list-decoding are closed under row-permutation. One can thus restrict to the various orbit classes of this action, and study these orbit classes one at a time. The requirement is in fact that, for each orbit class, the expected number of matrices from that class lying in $\mathcal{C}$ is roughly the same as the expected number lying in $\mathcal{D}$. By identifying these orbit classes with row distributions, one obtains Definition 1.1.

For our specific constructions, for fixed vectors $\bm{x}\in\mathbb{F}_q^n$ we consider indicator random variables $X_{\bm{x}}$ that equal $1$ if $\bm{x}\in\mathcal{C}$, and observe that, for any $1\leq b\leq\ell$, a $b$-tuple of random variables $(X_{\bm{x}_1},\dots,X_{\bm{x}_b})$ is independent if $\bm{x}_1,\dots,\bm{x}_b$ are linearly independent. Of course, this also holds for random linear codes (in fact, it holds for tuples of all sizes), and this is the sense in which our constructions approximate the “local behaviour” of random linear codes, which we can then bootstrap into full-blown $\ell$-local similarity via the machinery of [MRR+20, GM22]. [Footnote 10: At least, this is true if one defines a random linear code by sampling a uniformly random parity-check matrix, which we here implicitly assume.]

To analyze our first construction based on linearized polynomials, we exploit the fact that for any fixed tuple of inputs and outputs $(x_1,y_1),\dots,(x_b,y_b)$ with $x_1,\dots,x_b\in\mathbb{F}_{q^n}$ linearly independent over $\mathbb{F}_q$, over a uniformly random choice of linearized polynomial $f(X)$ of $q$-degree at most $\ell-1$, the vector $(f(x_1),\dots,f(x_b))$ is distributed uniformly at random over $\mathbb{F}_{q^n}^{b}$. This follows readily from properties of the Moore matrix $M=(\alpha_i^{q^{j}})_{ij}$ (recalling $b\leq\ell$).
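The uniformity claim can be verified by brute force in a small field. The following sketch is our own illustration under assumed toy parameters ($q=2$, $n=3$, $\mathbb{F}_{2^3}$ stored as 3-bit integers modulo $x^3+x+1$); it enumerates all linearized polynomials of $q$-degree at most $\ell-1$ and tabulates the output vector $(f(x_1),\dots,f(x_b))$. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

N = 3         # extension degree n (toy value, an assumption)
MOD = 0b1011  # x^3 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply two elements of F_{2^N}, stored as N-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MOD
    return result

def eval_linearized(coeffs, x):
    """Evaluate f(x) = sum_i coeffs[i] * x^(2^i)."""
    acc, power = 0, x
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, power)  # next Frobenius power
    return acc

def output_distribution(xs, ell):
    """Count how often each output tuple occurs over all choices of f."""
    counts = Counter()
    for coeffs in product(range(1 << N), repeat=ell):
        counts[tuple(eval_linearized(coeffs, x) for x in xs)] += 1
    return counts

# x_1 = 1 and x_2 = x are linearly independent over F_2, and b = ell = 2.
uniform = output_distribution(xs=[0b001, 0b010], ell=2)

# x_3 = x_1 + x_2 is linearly dependent, so f(x_3) is forced by linearity.
dependent = output_distribution(xs=[0b001, 0b010, 0b011], ell=3)
```

In the independent case every one of the $8^2$ output pairs arises from exactly one polynomial, whereas in the dependent case only $64$ of the $512$ possible triples ever occur.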

Next, we consider the code’s dual. In order to show that the dual of a $\mathrm{PCLP}(R,\ell)$ code achieves the GV bound with high probability, we exploit a pleasant characterization of its dual. Namely, the dual is of the form $\{\psi(f_0^{-1}\cdot\beta):\beta\in W\}$, where $W$ is connected to the dual of $V$ and $\psi$ is (morally) dual to $\varphi$. [Footnote 11: More precisely, if $\{\alpha_1,\dots,\alpha_n\}$ is a basis for $\mathbb{F}_{q^n}$ for which $\varphi\left(\sum_i x_i\alpha_i\right)=(x_1,\dots,x_n)$, then $\psi\left(\sum_i y_i\beta_i\right)=(y_1,\dots,y_n)$ where $\{\beta_1,\dots,\beta_n\}$ is the dual basis.] In particular, the dual is essentially another $\mathrm{PCLP}(1-R,1)$-code! Hence, the previous discussion implies it is $1$-locally similar to an RLC, which means in particular that it achieves the GV bound.

For the second construction, i.e., pseudorandom codes from row and column polynomials, upon summing over all choices of (full-rank) sets of message vectors one can observe that the desired behaviour of the random variables $X_{\bm{x}_i}$ in fact follows from the following: if $X\in\mathbb{F}_q^{b\times k}$ is of rank $b$, then over the randomness of the generator $G$, $XG\in\mathbb{F}_q^{b\times n}$ is uniformly random. Recalling $G=G'+G''$, we need only show that $XG'$ is uniformly random. Exploiting the requirement that $\gamma:\mathbb{F}_{q^n}\to\mathbb{F}_q^n$ is an isomorphism, it suffices to show that the tuple

\[
\left(\sum_{i=1}^{k}X_{ji}f(\alpha_i)\right)_{j\in[b]}\in\mathbb{F}_{q^n}^{b} \tag{1}
\]

is uniformly random over the choice of $f$. This follows naturally from properties of the Vandermonde matrix, as the $\alpha_i$'s are distinct and $f$ is chosen uniformly amongst all polynomials of degree $\leq\ell-1$ (recalling again that $b\leq\ell$).
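The Vandermonde fact invoked here can be checked exhaustively in its simplest form. The sketch below is only a toy illustration: it works over a small prime field $\mathbb{F}_p$ rather than $\mathbb{F}_{q^n}$ (an assumption made for brevity), and the names `p`, `ell`, and `points` are hypothetical. It tabulates the evaluations of every polynomial of degree at most $\ell-1$ at $b\leq\ell$ distinct points, so that one can confirm each evaluation tuple arises equally often.

```python
from collections import Counter
from itertools import product


def eval_poly(coeffs, x, p):
    """Evaluate sum_i coeffs[i] * x^i modulo the prime p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def evaluation_histogram(p, ell, points):
    """Count how often each evaluation tuple arises over all polynomials
    of degree <= ell - 1 with coefficients in F_p (p prime)."""
    hist = Counter()
    for coeffs in product(range(p), repeat=ell):
        hist[tuple(eval_poly(coeffs, x, p) for x in points)] += 1
    return hist


# Toy parameters (assumptions): b = 2 distinct points and ell = 3, so b <= ell.
p, ell, points = 5, 3, (1, 2)
hist = evaluation_histogram(p, ell, points)
```

Every one of the $p^b$ possible tuples appears the same number of times in `hist`, which is exactly the statement that the tuple of evaluations is uniform when the polynomial is uniform.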

The argument establishing $\ell$-local-similarity for the dual is almost identical to the above argument for the primal. Here, it suffices to consider a matrix $X\in\mathbb{F}_q^{n\times b}$ of rank $b$ and argue that over the randomness of $G''$ now, $G''X$ is uniformly random. And to do this, one reduces to studying a tuple of random variables analogous to those in Equation 1, although in this case the polynomial $g(X)$ will play the starring role. Since $g(X)$ is again uniformly sampled from all polynomials of degree at most $\ell-1$, the desired conclusion follows.

2 Preliminaries

2.1 Miscellaneous Notation

By default, $\mathbb{N}=\{1,2,\dots\}$, i.e., $0\notin\mathbb{N}$. For a positive integer $n\in\mathbb{N}$, $[n]:=\{1,\dots,n\}$. Throughout, $q$ denotes a prime power, $\mathbb{F}_q$ denotes a finite field with $q$ elements, and $\mathbb{F}_{q^n}$ denotes a degree $n$ extension of $\mathbb{F}_q$ (which of course has size $q^n$). The $q$-ary entropy function is defined for $x\in(0,1)$ as

\[
h_q(x):=x\log_q(q-1)+x\log_q\frac{1}{x}+(1-x)\log_q\frac{1}{1-x}
\]

and extended by continuity to $h_q(0)=0$ and $h_q(1)=\log_q(q-1)$. This function is known to be monotonically increasing from $0$ to $1$ on the interval $[0,1-1/q]$, and hence we can define its inverse $h_q^{-1}:[0,1]\to[0,1-1/q]$.
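As a concrete companion to this definition, the following sketch evaluates $h_q$ numerically and inverts it on $[0,1-1/q]$ by bisection, relying on the monotonicity just stated. The function names and the tolerance `tol` are illustrative choices, not notation from this paper.

```python
import math


def q_ary_entropy(x, q):
    """q-ary entropy h_q(x) for x in [0, 1], extended by continuity at the endpoints."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 0.0:
        return 0.0
    if x == 1.0:
        return math.log(q - 1, q)
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))


def q_ary_entropy_inverse(y, q, tol=1e-12):
    """Inverse of h_q restricted to [0, 1 - 1/q], computed by bisection.

    Correctness relies on h_q being increasing on that interval.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    lo, hi = 0.0, 1.0 - 1.0 / q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if q_ary_entropy(mid, q) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, `q_ary_entropy_inverse(1 - R, q)` gives the relative distance $h_q^{-1}(1-R)$ associated with the GV bound at rate $R$.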

Given a discrete distribution $\tau$ and a universe $U$, we write $\tau\sim U$ to denote that $\tau$ is distributed over $U$, i.e., that $\tau$'s support $\mathrm{supp}(\tau):=\{x:\tau(x)>0\}\subseteq U$. In general, we write vectors in boldface (e.g., $\bm{x}$, $\bm{y}$, etc.), while scalars are unbolded.

2.2 Algebraic Concepts

Let $\operatorname{Tr}:\mathbb{F}_{q^n}\rightarrow\mathbb{F}_q$ be the trace function, i.e.,

\[
\operatorname{Tr}(\alpha)=\sum_{i=0}^{n-1}\alpha^{q^i}.
\]

Lemma 2.1.

Suppose $\alpha_1,\ldots,\alpha_n$ is a basis of $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$. We can always find a dual basis $\beta_1,\ldots,\beta_n$ in $\mathbb{F}_{q^n}$ such that

\[
\operatorname{Tr}(\alpha_i\beta_j)=\delta_{ij} \tag{2}
\]

where $\delta_{ij}=0$ for any $i\neq j$ and is otherwise $1$.

Proof.

We provide the proof for completeness. Write $\beta_i=\sum_{r=1}^{n}b_{i,r}\alpha_r$ with $b_{i,r}\in\mathbb{F}_q$ and consider the equations

\[
\delta_{ji}=\operatorname{Tr}(\alpha_j\beta_i)=\operatorname{Tr}\left(\alpha_j\sum_{r=1}^{n}b_{i,r}\alpha_r\right),\quad j\in[n]\ .
\]

Define the $n\times n$ matrix $T=(\operatorname{Tr}(\alpha_j\alpha_r))_{j,r\in[n]}$ over $\mathbb{F}_q$; by the $\mathbb{F}_q$-linearity of the trace, the above $n$ equations can be written as $T(b_{i,1},\ldots,b_{i,n})^{\top}=\bm{e}_i$, where $\bm{e}_i$ is the $i$-th vector in the standard basis of $\mathbb{F}_q^n$. Since $\alpha_1,\ldots,\alpha_n$ forms a basis of $\mathbb{F}_{q^n}$, $T$ has full rank and there must exist a nonzero solution for $b_{i,1},\ldots,b_{i,n}$. Thus, we can always find $\beta_1,\ldots,\beta_n$ which satisfy (2). It remains to show that $\beta_1,\ldots,\beta_n$ are $\mathbb{F}_q$-linearly independent. Suppose not; without loss of generality, we may assume that $\beta_n$ can be written as a linear combination of $\beta_1,\ldots,\beta_{n-1}$, i.e., $\beta_n=\sum_{i=1}^{n-1}\lambda_i\beta_i$ with $\lambda_1,\ldots,\lambda_{n-1}\in\mathbb{F}_q$. From (2) we have

\[
1=\operatorname{Tr}(\alpha_n\beta_n)=\sum_{i=1}^{n-1}\lambda_i\operatorname{Tr}(\alpha_n\beta_i)=0\ ,
\]

a clear contradiction. ∎
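The proof is constructive, and its steps can be sketched in code for a toy case. The example below assumes $q=2$, $n=3$, the field $\mathbb{F}_8=\mathbb{F}_2[x]/(x^3+x+1)$ with elements stored as bit masks, and the polynomial basis $\{1,x,x^2\}$; all of these choices, as well as the helper names, are illustrative and not taken from the paper. It builds the trace matrix $T$, solves $T\bm{b}_i=\bm{e}_i$ over $\mathbb{F}_2$, and assembles each $\beta_i$.

```python
N = 3
MODULUS = 0b1011  # x^3 + x + 1, irreducible over F_2 (toy choice)


def gf_mul(a, b):
    """Multiply two elements of F_{2^N}, stored as bit masks of coefficients."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N & 1:  # reduce as soon as the degree reaches N
            a ^= MODULUS
    return result


def trace(a):
    """Tr(a) = a + a^2 + ... + a^(2^(N-1)); the result is 0 or 1."""
    total, power = 0, a
    for _ in range(N):
        total ^= power
        power = gf_mul(power, power)
    return total


def solve_mod2(T, rhs):
    """Solve T b = rhs over F_2 by Gauss-Jordan elimination (T invertible)."""
    n = len(T)
    rows = [T[j][:] + [rhs[j]] for j in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col]:
                rows[r] = [u ^ v for u, v in zip(rows[r], rows[col])]
    return [rows[j][n] for j in range(n)]


def dual_basis(alphas):
    """Return betas with Tr(alpha_j * beta_i) = delta_{ji}, following the proof."""
    n = len(alphas)
    T = [[trace(gf_mul(aj, ar)) for ar in alphas] for aj in alphas]
    betas = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]
        coeffs = solve_mod2(T, e_i)
        beta = 0
        for c, alpha in zip(coeffs, alphas):
            if c:
                beta ^= alpha  # addition in characteristic 2 is XOR
        betas.append(beta)
    return betas


alphas = [0b001, 0b010, 0b100]  # the basis 1, x, x^2
betas = dual_basis(alphas)
```

Checking `trace(gf_mul(alphas[i], betas[j]))` for all pairs `i, j` reproduces the Kronecker delta of Equation (2).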

We now introduce the concept of orthogonality for two $\mathbb{F}_q$-subspaces of $\mathbb{F}_{q^n}$.

Definition 2.2.

Let $V,W\subseteq\mathbb{F}_{q^n}$ be $\mathbb{F}_q$-linear subspaces. $W$ is said to be orthogonal to $V$ if the following holds:

\[
\operatorname{Tr}(ab)=0,\quad\forall a\in V,\ b\in W\ .
\]

We write $W\perp V$ to denote that $W$ is orthogonal to $V$. If, in addition, $\dim(W)+\dim(V)=n$, then $W$ is said to be the dual space of $V$.

Finally, we collect terminology connected to linearized polynomials.

Definition 2.3 (Linearized Polynomial).

We call a polynomial $f(X)\in\mathbb{F}_{q^n}[X]$ a linearized polynomial if it is of the form $\sum_{i=0}^{d}f_iX^{q^i}$ with $f_i\in\mathbb{F}_{q^n}$ and $d\in\mathbb{N}$. That is, the only monomials appearing in $f(X)$ have exponent a power of $q$. Its $q$-degree is $\max\{i:f_i\neq 0\}$.

Recall that the Frobenius automorphism $\mathrm{Frob}:\mathbb{F}_{q^n}\to\mathbb{F}_{q^n}$ defined by $\mathrm{Frob}(\alpha)=\alpha^q$ is $\mathbb{F}_q$-linear, i.e., it holds that for all $a,b\in\mathbb{F}_q$ and $\alpha,\beta\in\mathbb{F}_{q^n}$,

\[
(a\alpha+b\beta)^q=a\alpha^q+b\beta^q\ .
\]

This readily implies that any linearized polynomial is also an $\mathbb{F}_q$-linear map, justifying the name. We record this fact now.

Proposition 2.4.

Any linearized polynomial defines an $\mathbb{F}_q$-linear map from $\mathbb{F}_{q^n}\to\mathbb{F}_{q^n}$.
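This proposition is easy to sanity-check by brute force in a small field. The sketch below assumes $q=2$ and $n=4$, with $\mathbb{F}_{16}=\mathbb{F}_2[x]/(x^4+x+1)$ stored as bit masks; the coefficient list `coeffs` is an arbitrary illustrative choice. Since the only scalars in $\mathbb{F}_2$ are $0$ and $1$, $\mathbb{F}_2$-linearity amounts to additivity, which is what `is_additive` tests. For contrast, the non-linearized map $X\mapsto X^3$ is tested as well.

```python
N = 4
MODULUS = 0b10011  # x^4 + x + 1, irreducible over F_2 (toy choice)


def gf_mul(a, b):
    """Multiply two elements of F_{2^N}, stored as bit masks of coefficients."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N & 1:
            a ^= MODULUS
    return result


def eval_linearized(coeffs, x):
    """Evaluate sum_i coeffs[i] * x^(2^i) in F_{2^N}."""
    total, power = 0, x
    for c in coeffs:
        total ^= gf_mul(c, power)
        power = gf_mul(power, power)  # apply Frobenius: x^(2^i) -> x^(2^(i+1))
    return total


def is_additive(func):
    """Check func(a + b) = func(a) + func(b) for all a, b in F_{2^N}."""
    size = 1 << N
    return all(func(a ^ b) == func(a) ^ func(b)
               for a in range(size) for b in range(size))


coeffs = [3, 7, 0, 9]  # arbitrary coefficients f_0, ..., f_3 in F_16


def linearized(x):
    return eval_linearized(coeffs, x)


def cube(x):
    """The map x -> x^3, which is not a linearized polynomial."""
    return gf_mul(x, gf_mul(x, x))
```

Here `is_additive(linearized)` holds while `is_additive(cube)` does not, since $(a+b)^3$ contains the cross terms $a^2b+ab^2$.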

2.3 Coding Theory

A linear code $\mathcal{C}$ is a subspace of $\mathbb{F}_q^n$ for a prime power $q$. Such a code may always be presented in terms of its generator matrix, which is a matrix $G\in\mathbb{F}_q^{k\times n}$ for which $\mathcal{C}=\{\bm{m}G:\bm{m}\in\mathbb{F}_q^k\}$. When $q=2$, a code is called binary. The block-length of the code is $n$ and its rate is $R:=\frac{k}{n}$, where $k=\dim(\mathcal{C})$. We endow $\mathbb{F}_q^n$ with the (relative) Hamming metric $d(\bm{x},\bm{y}):=\tfrac{1}{n}|\{i\in[n]:x_i\neq y_i\}|$ for $\bm{x},\bm{y}\in\mathbb{F}_q^n$. For a linear code $\mathcal{C}\leq\mathbb{F}_q^n$, its dual code is defined as $\mathcal{C}^{\perp}:=\{\bm{y}\in\mathbb{F}_q^n:\forall\bm{x}\in\mathcal{C},\ \langle\bm{x},\bm{y}\rangle=0\}$. (Footnote 12: Note the contrast with Definition 2.2: that definition is concerned with $\mathbb{F}_q$-linear subspaces of the ambient space $\mathbb{F}_{q^n}$, which is endowed with a different inner-product than $\mathbb{F}_q^n$. The appropriate meaning of $\perp$ can therefore be deduced from the context.)

A random linear code $\mathcal{C}$ of rate $R=k/n$ (briefly, an $\mathrm{RLC}(R)$) is defined to be the kernel of a uniformly random matrix $H\in\mathbb{F}_q^{(n-k)\times n}$, i.e., $\mathcal{C}:=\{\bm{x}\in\mathbb{F}_q^n:H\bm{x}^{\top}=0\}$.
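To make this definition concrete, here is a minimal sketch that samples such a code for toy parameters. It assumes $q$ is prime (so that arithmetic modulo $q$ is field arithmetic) and enumerates the kernel by brute force, which is only feasible for very small $n$; the function name `sample_rlc` and the fixed seed are illustrative.

```python
import random
from itertools import product


def sample_rlc(n, k, q, rng):
    """Sample H uniformly from F_q^{(n-k) x n} (q prime) and return (H, kernel of H).

    The kernel is found by exhaustive search over F_q^n, so keep n small.
    """
    H = [[rng.randrange(q) for _ in range(n)] for _ in range(n - k)]
    code = [x for x in product(range(q), repeat=n)
            if all(sum(h * xi for h, xi in zip(row, x)) % q == 0 for row in H)]
    return H, code


rng = random.Random(0)  # fixed seed only so that the example is reproducible
H, code = sample_rlc(n=6, k=3, q=2, rng=rng)
```

The kernel always has dimension at least $k$, with equality exactly when $H$ has full row rank, so `len(code)` is a power of $q$ that is at least $q^k$.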

This work concerns combinatorial properties of linear codes. The quintessential example of such a property is minimum distance, defined as $\delta=\min\{d(\bm{x},\bm{y}):\bm{x}\neq\bm{y},\ \bm{x},\bm{y}\in\mathcal{C}\}$. Equivalently, it is $\min\{\mathrm{wt}(\bm{x}):\bm{x}\in\mathcal{C}\setminus\{0\}\}$, the minimum weight of a non-zero codeword, where $\mathrm{wt}(\bm{x}):=d(\bm{x},\bm{0})$. By the triangle inequality for the Hamming metric, it is immediate that $\delta/2$ is the maximum radius at which one can hope to uniquely-decode from worst-case errors. If one relaxes the requirement for unique-decoding and is satisfied with outputting a list of possible messages, then one arrives at list-decoding.
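The minimum-weight characterization gives a direct, if exponential-time, way to compute $\delta$ from a generator matrix. The sketch below assumes $q$ is prime and uses a systematic generator matrix of the $[7,4]$ binary Hamming code as a toy input; both the helper names and the choice of example code are illustrative.

```python
from fractions import Fraction
from itertools import product


def codewords(G, q):
    """Yield every codeword mG for m in F_q^k (q prime)."""
    k, n = len(G), len(G[0])
    for m in product(range(q), repeat=k):
        yield tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))


def relative_min_distance(G, q):
    """Minimum relative weight over the non-zero codewords generated by G."""
    n = len(G[0])
    weights = [sum(1 for s in c if s) for c in codewords(G, q) if any(c)]
    return Fraction(min(weights), n)


# Systematic generator matrix of the [7,4] binary Hamming code (toy example).
G_HAMMING = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
delta = relative_min_distance(G_HAMMING, 2)
```

For this code the minimum weight is $3$, so `delta` equals $3/7$ and unique decoding is possible from a single error.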

Definition 2.5 (List-Decodability).

Let $\rho\in(0,1-1/q)$ and $L\geq 1$. A code $\mathcal{C}\subseteq\mathbb{F}_q^n$ is $(\rho,L)$-list-decodable if for all $\bm{z}\in\mathbb{F}_q^n$,

\[
|\{\bm{c}\in\mathcal{C}:d(\bm{c},\bm{z})\leq\rho\}|<L\ .
\]
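For very small codes this definition can be checked exhaustively. The sketch below scans every center $\bm{z}\in\mathbb{F}_q^n$ and records the largest number of codewords within relative distance $\rho$; it then applies the strict inequality exactly as written in the definition above. The helper names are hypothetical, $\rho$ is passed as a `Fraction` to avoid floating-point comparisons, and the $[7,4]$ binary Hamming code serves as a toy input.

```python
from fractions import Fraction
from itertools import product


def hamming_count(x, y):
    """Number of coordinates in which x and y differ (unnormalized distance)."""
    return sum(1 for a, b in zip(x, y) if a != b)


def max_list_size(code, q, rho):
    """Largest number of codewords within relative distance rho of any center z."""
    n = len(code[0])
    worst = 0
    for z in product(range(q), repeat=n):
        in_ball = sum(1 for c in code if hamming_count(c, z) <= rho * n)
        worst = max(worst, in_ball)
    return worst


def is_list_decodable(code, q, rho, L):
    """Check the definition as stated: every ball contains fewer than L codewords."""
    return max_list_size(code, q, rho) < L


# Toy input: the [7,4] binary Hamming code, generated from a systematic matrix.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
code = [tuple(sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7))
        for m in product(range(2), repeat=4)]
```

Because the Hamming code is perfect, every ball of relative radius $1/7$ contains exactly one codeword, so `max_list_size(code, 2, Fraction(1, 7))` is $1$; larger radii admit longer lists.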

A generalization of list-decoding is proffered by list-recovery. For this notion, we extend the definition of Hamming distance to allow one of the arguments to be a tuple of sets $\bm{S}=(S_1,\dots,S_n)$, where each $S_i\subseteq\mathbb{F}_q$, as follows: $d(\bm{x},\bm{S}):=\frac{1}{n}|\{i\in[n]:x_i\notin S_i\}|$.

Definition 2.6 (List-Recovery).

Let $\rho\in(0,1-1/q)$, $1\leq\lambda\leq q-1$ and $L\geq 1$. A code $\mathcal{C}\subseteq\mathbb{F}_q^n$ is $(\rho,\lambda,L)$-list-recoverable if for all tuples $\bm{S}=(S_1,\dots,S_n)$ with each $S_i\subseteq\mathbb{F}_q$ satisfying $|S_i|\leq\lambda$,

\[
|\{\bm{c}\in\mathcal{C}:d(\bm{c},\bm{S})\leq\rho\}|<L\ .
\]
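The extended distance $d(\bm{x},\bm{S})$ and the resulting list are simple to compute for a fixed tuple of sets. The following sketch uses the ternary repetition code of length $3$ as a toy example; the helper names are hypothetical, and exact fractions are used so that the comparison with $\rho$ is not affected by rounding.

```python
from fractions import Fraction


def set_distance(x, S):
    """d(x, S): fraction of coordinates i with x_i not in S_i."""
    return Fraction(sum(1 for xi, Si in zip(x, S) if xi not in Si), len(x))


def recovery_list(code, S, rho):
    """All codewords c with d(c, S) <= rho."""
    return [c for c in code if set_distance(c, S) <= rho]


# Toy input: the length-3 repetition code over F_3 and input lists of size 2.
code = [(a, a, a) for a in range(3)]
S = [{0, 1}, {0, 1}, {0, 1}]
found = recovery_list(code, S, Fraction(1, 3))
```

Here `found` consists of the codewords `(0, 0, 0)` and `(1, 1, 1)`. Taking every $S_i$ to be a singleton $\{z_i\}$ recovers the ordinary Hamming distance to $\bm{z}$, which is how list-decoding arises as the case $\lambda=1$.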

These are both special cases of the much more general class of local properties, which we now introduce. The technical terminology takes some time to motivate and define, but will allow for a very clean argument once we have it in place.

2.4 Local Properties

We now introduce the specialized notations and tools that we need in order to apply the machinery of [MRR+20, GM22]. Generally speaking, this machinery allows us to efficiently reason about the probability that sets of $\ell$ vectors (for any integer $\ell=O(1)$) lie in random ensembles of codes. In fact, it is convenient to (arbitrarily) order these sets and thereby view them as matrices. Thus, for $A\in\mathbb{F}_q^{n\times\ell}$ we will talk about events of the form “$A\subseteq\mathcal{C}$”, which denotes the event that every column of $A$ is contained in $\mathcal{C}$.

To index these events, we assign to each matrix a type, which is determined by the empirical row distribution of the matrix.

Definition 2.7 (Empirical Row Distribution).

Let A𝔽qn×𝐴superscriptsubscript𝔽𝑞𝑛A\in\mathbb{F}_{q}^{n\times\ell}italic_A ∈ blackboard_F start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n × roman_ℓ end_POSTSUPERSCRIPT. We define its empirical row distribution 𝖤𝗆𝗉A𝔽q