(Translated by https://www.hiragana.jp/)
Ditto: Quantization-aware Secure Inference of Transformers upon MPC

Ditto: Quantization-aware Secure Inference of Transformers upon MPC

Haoqi Wu    Wenjing Fang    Yancheng Zheng    Junming Ma    Jin Tan    Yinggui Wang    Lei Wang
Abstract

Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into the MPC domain remains unclear. To bridge this gap, we propose the framework named Ditto to enable more efficient quantization-aware secure Transformer inference. Concretely, we first incorporate an MPC-friendly quantization into Transformer inference and employ a quantization-aware distillation procedure to maintain the model utility. Then, we propose novel MPC primitives to support the type conversions that are essential in quantization and implement the quantization-aware MPC execution of secure quantized inference. This approach significantly decreases both computation and communication overhead, leading to improvements in overall efficiency. We conduct extensive experiments on Bert and GPT2 models to evaluate the performance of Ditto. The results demonstrate that Ditto is about 3.144.40×3.14\sim 4.40\times3.14 ∼ 4.40 × faster than MPCFormer (ICLR 2023) and 1.442.35×1.44\sim 2.35\times1.44 ∼ 2.35 × faster than the state-of-the-art work PUMA with negligible utility degradation.

Machine Learning, ICML

1 Introduction

The recent advance of pre-trained Transformer (Vaswani et al., 2017) models in domains like visual recognition (Dosovitskiy et al., 2021; Chen et al., 2021) and natural language processing (Devlin et al., 2019; Radford et al., 2019) have led to their widespread adoption for machine learning (ML) inference services. However, despite their increasing popularity, a major concern revolves around data security. In ML services like ChatGPT (Brown et al., 2020), the model owner offers an API that receives user prompts as input and generates answers in return. However, this process involves sending user prompts to the server in plaintext, potentially exposing sensitive information. An alternative approach is to employ secure multi-party computation (MPC) (Shamir, 1979; Yao, 1986) to offer provable security.

However, the huge computation and communication overhead introduced by MPC techniques hinders the application of MPC-based secure Transformer inference. For one thing, non-linear functions like GeLU are frequently invoked, which are expensive in MPC. For another, the overhead is amplified in Transformers due to their large model size. In general, Transformers have millions to billions of parameters. As to the former problem, (Chou et al., 2018; Li et al., 2023; Akimoto et al., 2023; Liu & Liu, 2023) replace these non-linear functions with MPC-friendly approximations. Regarding the latter, there have been practices in plaintext inference (Dettmers et al., 2022; Kim et al., 2021) that quantize the model to lower bits and employ low-bit integer arithmetic, thus accelerating the inference. However, plaintext quantization cannot be trivially incorporated into secure inference upon MPC.

Refer to caption
Figure 1: Communication size breakdown on standard vs. quantized secure matrix multiplication. 𝐘=𝐗8×768𝐖768×3072𝐘superscript𝐗8768superscript𝐖7683072\mathbf{Y}=\mathbf{X}^{8\times 768}\cdot\mathbf{W}^{768\times 3072}bold_Y = bold_X start_POSTSUPERSCRIPT 8 × 768 end_POSTSUPERSCRIPT ⋅ bold_W start_POSTSUPERSCRIPT 768 × 3072 end_POSTSUPERSCRIPT. Scale denotes multiplying the dot product by the scaling factor s𝑠sitalic_s. Clip typically restricts the values within the range [128,127]128127[-128,127][ - 128 , 127 ] for int8 quantization.

Toy Example. For a linear layer in Transformers 𝐘=𝐗𝐖𝐘𝐗𝐖\mathbf{Y}=\mathbf{X}\cdot\mathbf{W}bold_Y = bold_X ⋅ bold_W, 𝐘B×M𝐘superscript𝐵𝑀\mathbf{Y}\in\mathbb{R}^{B\times M}bold_Y ∈ blackboard_R start_POSTSUPERSCRIPT italic_B × italic_M end_POSTSUPERSCRIPT, 𝐗B×N𝐗superscript𝐵𝑁\mathbf{X}\in\mathbb{R}^{B\times N}bold_X ∈ blackboard_R start_POSTSUPERSCRIPT italic_B × italic_N end_POSTSUPERSCRIPT and 𝐖N×M𝐖superscript𝑁𝑀\mathbf{W}\in\mathbb{R}^{N\times M}bold_W ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_M end_POSTSUPERSCRIPT, where B𝐵Bitalic_B is the batch size, N𝑁Nitalic_N and M𝑀Mitalic_M are hidden dimensions. A standard floating-point matrix multiplication over MPC typically operates over uniform 64-bit fixed-point numbers 𝐘fxp64=𝐗fxp64𝐖fxp64subscript𝐘fxp64subscript𝐗fxp64subscript𝐖fxp64\mathbf{Y}_{\text{fxp}64}=\mathbf{X}_{\text{fxp}64}\cdot\mathbf{W}_{\text{fxp}% 64}bold_Y start_POSTSUBSCRIPT fxp 64 end_POSTSUBSCRIPT = bold_X start_POSTSUBSCRIPT fxp 64 end_POSTSUBSCRIPT ⋅ bold_W start_POSTSUBSCRIPT fxp 64 end_POSTSUBSCRIPT , while quantized matrix multiplication goes as 𝐘i8=(𝐗i8𝐖i8)i32s\mathbf{Y}_{i8}=\lfloor(\mathbf{X}_{i8}\cdot\mathbf{W}_{i8})_{i32}\cdot s\rceilbold_Y start_POSTSUBSCRIPT italic_i 8 end_POSTSUBSCRIPT = ⌊ ( bold_X start_POSTSUBSCRIPT italic_i 8 end_POSTSUBSCRIPT ⋅ bold_W start_POSTSUBSCRIPT italic_i 8 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i 32 end_POSTSUBSCRIPT ⋅ italic_s ⌉, where the multiply-and-accumulate of 𝐗i8𝐖i8subscript𝐗𝑖8subscript𝐖𝑖8\mathbf{X}_{i8}\cdot\mathbf{W}_{i8}bold_X start_POSTSUBSCRIPT italic_i 8 end_POSTSUBSCRIPT ⋅ bold_W start_POSTSUBSCRIPT italic_i 8 end_POSTSUBSCRIPT is computed over 32-bit integers, s𝑠sitalic_s is a floating-point scaling factor and delimited-⌊⌉\lfloor\cdot\rceil⌊ ⋅ ⌉ denotes the clip operation. As demonstrated in Figure 1, while quantization indeed diminishes the bitwidth, thereby reducing the communication of dot product, the additional procedures, i.e., scaling and clipping impose substantial overhead, surpassing the benefits derived from quantization.

It is worth noting that there are several cross-domain gaps between the worlds of ML and MPC since

  • ML experts mainly focus on designing delicate quantization methods to improve efficiency, which, however, may not be MPC-friendly. The type conversions between data types like INT8 and FP16 in plaintext quantization that involves scaling and clipping operations are not trivial in MPC. Besides, applying quantization directly may cause substantial utility drop.

  • MPC experts mainly focus on constructing efficient underlying primitives and may not be aware of employing mixed-precision quantization to enhance end-to-end inference efficiency, thus lacking the capability to support quantization-aware secure inference.

Therefore, the problem naturally arises:

Can we perform quantization-aware secure inference with negligible utility degradation?

As an answer to the above question, we develop Ditto, an easy-to-deploy framework for secure and efficient Transformer inference based on a co-design of novel MPC primitives and ML quantization. Our contributions are as follows:

  • MPC-friendly Quantization-Aware Distillation. We propose to incorporate static dyadic quantization (i.e., from floating-point to fixed-point representation) to avoid CPU-cheap yet MPC-expensive dynamic quantization operations like clip and max. With lower quantization precision, a smaller bitwidth is required, thus reducing the overall overhead in secure inference. Besides, we utilize knowledge distillation to perform quantization-aware distillation on pre-trained models to retain the model utility.

  • Secure Quantized Inference Framework. To the best of our knowledge, Ditto is the first framework that supports the MPC execution of quantization-aware secure inference. Concretely, the layer-wise quantized computations are automatically mapped to secure computations over different data types (essentially different rings). To do so, we propose novel MPC primitives to support efficient interchangeable type conversions. We will open-source the code once accepted.

  • Empirical Evaluations. We evaluate Ditto in the secure inference of several Transformer models, i.e., Bert and GPT2. The performance is evaluated from two metrics: the model utility and efficiency. The evaluation results indicate that efficiency improvement can be achieved without a significant utility drop. Compared to prior works, Ditto is about 3.144.40×3.14\sim 4.40\times3.14 ∼ 4.40 × faster than MPCFormer (Li et al., 2023) and 1.442.35×1.44\sim 2.35\times1.44 ∼ 2.35 × faster than PUMA (Dong et al., 2023).

2 Related Work

Secure Transformer Inference.

MPC originates from the Billionaire problem (Yao, 1986; Shamir, 1979) and aims to enable multiple untrusted parties to jointly compute a function while keeping the inputs private. There have been many prior works working on privacy-preserving machine learning (mainly focusing on convolutional neural networks), including two-party computation (2PC) setting (Mohassel & Zhang, 2017; Patra et al., 2021; Huang et al., 2022), 3PC setting (Mohassel & Rindal, 2018; Tan et al., 2021; Wagh et al., 2021) and even 4PC setting (Byali et al., 2020; Dalskov et al., 2021). Recent works (Li et al., 2023; Hao et al., 2022; Akimoto et al., 2023; Liang et al., 2023; Liu & Liu, 2023; Dong et al., 2023; Zhang et al., 2023) further study the secure inference of more complex Transformer models. These approaches mainly use MPC-friendly approximations for non-linear functions. For example, MPCFormer (Li et al., 2023), MPCViT (Zeng et al., 2023) and SAL-ViT (Zhang et al., 2023) propose more efficient softmax approximations, which are complementary to our work. We take the first step towards leveraging MPC-friendly quantization to enhance the efficiency. Among these works, 3PC in semi-honest and honest-majority setting (Li et al., 2023; Dong et al., 2023) achieves the overall best efficiency. In this work, we also adopt this setting.

Secure Quantized Inference. In MPC-based secure inference, floating-point arithmetic is typically quantized to fixed-points to enable integer computations over rings (Li et al., 2023; Dong et al., 2023; Mohassel & Rindal, 2018; Dalskov et al., 2020). XONN (Riazi et al., 2019) and Quotient (Agrawal et al., 2019) utilize binary or ternary quantized weights to accelerate inference. Nonetheless, such quantization may result in utility drop, especially in large Transformers. SecureQ8 (Dalskov et al., 2020) pioneers the use of int8 quantization in secure inference. However, the quantization of plaintext weights/activations to 8-bit integers does not extend to ciphertexts, which remain encoded over a uniformly large ring, typically 64-bit 264subscriptsuperscript264\mathbb{Z}_{2^{64}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT 64 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. This discrepancy stems from the absence of type conversion methods and the necessity for higher bitwidths in certain non-linear computations to maintain sufficient precision. This inconsistency impedes the potential for efficiency gains that SecureQ8 might otherwise offer. In this paper, we aim to resolve the above drawbacks for secure Transformer inference.

3 Background

In this section, we briefly introduce the Transformer models and the underlying MPC protocols, specifically 2-out-of-3 replicated secret sharing. Finally, we introduce the quantization methods, along with the fixed-point quantization.

3.1 Transformer and its Variants

Transformer models generally consist of three parts: 1) the Embedding module that maps a discrete token to its continuous hidden vector representation; 2) a stack of Transformer Block; 3) the last Prediction module that maps the hidden vector to task-specific representation. For Transformer Block, it typically has Attention and Feed-Forward Network (FFN) modules.

Attention module can be formulated as 𝖲𝗈𝖿𝗍𝗆𝖺𝗑(𝖰𝖪+𝖬)𝖵𝖲𝗈𝖿𝗍𝗆𝖺𝗑𝖰superscript𝖪top𝖬𝖵\mathsf{Softmax}(\mathsf{Q}\cdot\mathsf{K}^{\top}+\mathsf{M})\cdot\mathsf{V}sansserif_Softmax ( sansserif_Q ⋅ sansserif_K start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + sansserif_M ) ⋅ sansserif_V, where 𝖰,𝖪,𝖵𝖰𝖪𝖵\mathsf{Q},\mathsf{K},\mathsf{V}sansserif_Q , sansserif_K , sansserif_V denote the vectors obtained by the matrix multiplication of input activation and three weight matrices, and 𝖬𝖬\mathsf{M}sansserif_M denotes the attention mask. The two widely-used variants, Bert and GPT, use different masks.

FFN module can be formulated as 𝖫𝗂𝗇𝖾𝖺𝗋(𝖦𝖾𝖫𝖴(𝖫𝗂𝗇𝖾𝖺𝗋(x,w0,b0)),w1,b1)𝖫𝗂𝗇𝖾𝖺𝗋𝖦𝖾𝖫𝖴𝖫𝗂𝗇𝖾𝖺𝗋𝑥subscript𝑤0subscript𝑏0subscript𝑤1subscript𝑏1\mathsf{Linear}(\mathsf{GeLU}(\mathsf{Linear}(x,w_{0},b_{0})),w_{1},b_{1})sansserif_Linear ( sansserif_GeLU ( sansserif_Linear ( italic_x , italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ) , italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ), where wi,bisubscript𝑤𝑖subscript𝑏𝑖w_{i},b_{i}italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT denote the parameters for i𝑖iitalic_i-th linear layer. It consists of two linear layers and an activation function 𝖦𝖾𝖫𝖴𝖦𝖾𝖫𝖴\mathsf{GeLU}sansserif_GeLU.

3.2 Secure Multi-Party Computation

2-out-of-3 replicated secret sharing (RSS) (Araki et al., 2016; Mohassel & Rindal, 2018), a widely-used MPC technique, runs by splitting a secret value x𝑥xitalic_x into several random values (denoted as shares) as x={x0\llbracket x\rrbracket=\{x_{0}⟦ italic_x ⟧ = { italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, x1subscript𝑥1x_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, x2}x_{2}\}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT }, s.t., x=x0+x1+x2mod2𝑥modulosubscript𝑥0subscript𝑥1subscript𝑥2superscript2x=x_{0}+x_{1}+x_{2}\mod 2^{\ell}italic_x = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT, where \ellroman_ℓ denotes the ring size. All the computations are performed over the ring 2superscriptsubscript2\mathbb{Z}_{2}^{\ell}blackboard_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT. In RSS, the three shares are distributed to three computing parties 𝒫={P0,P1,P2}𝒫subscript𝑃0subscript𝑃1subscript𝑃2\mathcal{P}=\{P_{0},P_{1},P_{2}\}caligraphic_P = { italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT }, where Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT holds two shares {xi,xi+1}subscript𝑥𝑖subscript𝑥𝑖1\{x_{i},x_{i+1}\}{ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT } (x3subscript𝑥3x_{3}italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT corresponds to x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT).

In this paper, we use [[]]subscriptdelimited-[]delimited-[][\![\cdot]\!]_{\ell}[ [ ⋅ ] ] start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT to denote RSS over 2subscriptsuperscript2\mathbb{Z}_{2^{\ell}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. For 11\ell\geq 1roman_ℓ ≥ 1 that supports arithmetic operations like +,,+,-,\cdot+ , - , ⋅, we denote such type as arithmetic sharing. In the case of =11\ell=1roman_ℓ = 1 that only supports boolean operations like bit-wise direct-sum\oplus and \land, we refer to this type as boolean sharing.

In order to incorporate floating-point arithmetic, which is extensively used in ML, into MPC that operates over a ring, we employ fixed-point quantization to encode floating-point numbers as integers. This approach can be considered as a branch of quantization techniques (refer to Section 3.3).

Linear Algebra. Let a,b,c𝑎𝑏𝑐a,b,citalic_a , italic_b , italic_c be public constants and x,y\llbracket x\rrbracket,\llbracket y\rrbracket⟦ italic_x ⟧ , ⟦ italic_y ⟧ be arithmetic-shared values. ax+by+ca\llbracket x\rrbracket+b\llbracket y\rrbracket+citalic_a ⟦ italic_x ⟧ + italic_b ⟦ italic_y ⟧ + italic_c only involves addition and multiplication by a public constant. Hence, ax+by+cdelimited-⟦⟧𝑎𝑥𝑏𝑦𝑐\llbracket ax+by+c\rrbracket⟦ italic_a italic_x + italic_b italic_y + italic_c ⟧ can be computed as (ax0+by0+c,ax1+by1,ax2+by2)𝑎subscript𝑥0𝑏subscript𝑦0𝑐𝑎subscript𝑥1𝑏subscript𝑦1𝑎subscript𝑥2𝑏subscript𝑦2(ax_{0}+by_{0}+c,ax_{1}+by_{1},ax_{2}+by_{2})( italic_a italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_b italic_y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_c , italic_a italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_b italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_a italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_b italic_y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) locally. While for the multiplication of two shared values, xydelimited-⟦⟧𝑥𝑦\llbracket x\cdot y\rrbracket⟦ italic_x ⋅ italic_y ⟧ can be decomposed into (x0+x1+x2)(y0+y1+y2)=i=02zi=(xiyi+xi+1yi+xiyi+1)subscript𝑥0subscript𝑥1subscript𝑥2subscript𝑦0subscript𝑦1subscript𝑦2superscriptsubscript𝑖02subscript𝑧𝑖subscript𝑥𝑖subscript𝑦𝑖subscript𝑥𝑖1subscript𝑦𝑖subscript𝑥𝑖subscript𝑦𝑖1(x_{0}+x_{1}+x_{2})\cdot(y_{0}+y_{1}+y_{2})=\sum_{i=0}^{2}z_{i}=(x_{i}y_{i}+x_% {i+1}y_{i}+x_{i}y_{i+1})( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ⋅ ( italic_y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_y start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ), with Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT computes zisubscript𝑧𝑖z_{i}italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. To obtain xydelimited-⟦⟧𝑥𝑦\llbracket x\cdot y\rrbracket⟦ italic_x ⋅ italic_y ⟧, the parties should perform re-share zisubscript𝑧𝑖z_{i}italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (Mohassel & Rindal, 2018), which requires communication with each other.

Non-linear Functions. In addition to linear algebra operations, non-linear functions such as Softmax and Layer Normalization are commonly employed. To implement these functions, we leverage several underlying MPC computation primitives proposed in prior works (Mohassel & Rindal, 2018). We omit the descriptions for primitives like comparison, which are used as black boxes, and only present the functionalities of primitives explicitly mentioned in this paper. We refer to (Dong et al., 2023; Lu et al., 2020) to construct 𝖤𝗑𝗉(x)=ex{\mathsf{Exp}}(\llbracket x\rrbracket)=\llbracket e^{x}\rrbracketsansserif_Exp ( ⟦ italic_x ⟧ ) = ⟦ italic_e start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT ⟧ and 𝗋𝖲𝗊𝗋𝗍(x)=1/x{\mathsf{rSqrt}}(\llbracket x\rrbracket)=\llbracket 1/\sqrt{x}\rrbracketsansserif_rSqrt ( ⟦ italic_x ⟧ ) = ⟦ 1 / square-root start_ARG italic_x end_ARG ⟧.

3.3 Model Quantization

Quantization (Gholami et al., 2021) refers to converting floating-point numbers to low-bit integer representation like 8-bit integer. This can be formulated as x^=𝖢𝗅𝗂𝗉(𝖨𝗇𝗍(x/s),min,max)^𝑥𝖢𝗅𝗂𝗉𝖨𝗇𝗍𝑥𝑠𝑚𝑖𝑛𝑚𝑎𝑥\hat{x}=\mathsf{Clip}(\mathsf{Int}(x/s),min,max)over^ start_ARG italic_x end_ARG = sansserif_Clip ( sansserif_Int ( italic_x / italic_s ) , italic_m italic_i italic_n , italic_m italic_a italic_x ), where min,max𝑚𝑖𝑛𝑚𝑎𝑥min,maxitalic_m italic_i italic_n , italic_m italic_a italic_x denote the clipping bound and s𝑠sitalic_s denotes the scaling factor. Generally, quantization methods can be divided into two categories: post-training quantization (PTQ) (Yao et al., 2022; Dettmers et al., 2022; Frantar et al., 2023) and quantization-aware training (QAT) (Kim et al., 2021; Yao et al., 2021). The former PTQ methods allow one-shot quantization while requiring more complex quantization schemes, e.g., more fine-grained quantization (Yao et al., 2022) or dynamic quantization (Dettmers et al., 2022). The latter QAT methods allow for more diverse quantization by quantizing the weights and activations during the training of the model. Hence, cheaper quantization methods like static quantization are feasible despite the cost of re-training the model.

Most of the quantization methods utilize floating-point scales to achieve adequate precision. Among these works, dyadic quantization (Yao et al., 2021; Kim et al., 2021) is a typical class for integer-only quantization, where the scale s𝑠sitalic_s is a dyadic number c/2f𝑐superscript2𝑓c/2^{f}italic_c / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT, c𝑐citalic_c is an integer and f𝑓fitalic_f is the precision bit. In this paper, we employ a modified version called fixed-point quantization (s=1/2f𝑠1superscript2𝑓s=1/2^{f}italic_s = 1 / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT, with c=1𝑐1c=1italic_c = 1) to accommodate floating-point computations into fixed-point arithmetic, which is crucial in MPC. Mathematically, this can be expressed as x^=FXPf(x)=x2fmod2\hat{x}=\texttt{FXP}_{\ell}^{f}(x)=\lfloor x*2^{f}\rceil\mod 2^{\ell}over^ start_ARG italic_x end_ARG = FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ( italic_x ) = ⌊ italic_x ∗ 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ⌉ roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT.

4 Design

In this section, we begin by introducing the high-level workflow of Ditto. Then we elaborate on two ingredients in Ditto: 1) the MPC-friendly quantization and distillation of Transformers; 2) the quantization-aware secure inference of the quantized and distilled model upon MPC.

4.1 High-Level Workflow

Setting. In this paper, we consider the inference scenario, where the model owner provides a trained model \mathcal{M}caligraphic_M, and the client provides input data x𝑥xitalic_x for the inference task. The inference computation can be formulated as y=θ(x)𝑦subscript𝜃𝑥y=\mathcal{M}_{\theta}(x)italic_y = caligraphic_M start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_x ), where θ𝜃\thetaitalic_θ denotes the parameters for the model \mathcal{M}caligraphic_M. The security concern is that both the parameters θ𝜃\thetaitalic_θ and input data x𝑥xitalic_x are unknown to each other, along with potential attackers. Only the inference result y𝑦yitalic_y can be revealed to the client.

Similar to prior works (Li et al., 2023; Dong et al., 2023), we consider the secure outsourced 3PC setting. That is, we offload the inference computation to an MPC system consisting of three computing parties 𝒫={P0,P1,P2}𝒫subscript𝑃0subscript𝑃1subscript𝑃2\mathcal{P}=\{P_{0},P_{1},P_{2}\}caligraphic_P = { italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT }. The client encrypts xdelimited-⟦⟧𝑥\llbracket x\rrbracket⟦ italic_x ⟧ using RSS and sends the shares to corresponding computing parties. Similarly, the model owner encrypts the model parameters θ𝜃\thetaitalic_θ and sends θdelimited-⟦⟧𝜃\llbracket\theta\rrbracket⟦ italic_θ ⟧ to 𝒫𝒫\mathcal{P}caligraphic_P. The computing parties 𝒫𝒫\mathcal{P}caligraphic_P then carry out the secure inference and obtain the inference result ydelimited-⟦⟧𝑦\llbracket y\rrbracket⟦ italic_y ⟧. 𝒫𝒫\mathcal{P}caligraphic_P then sends all the shares of y𝑦yitalic_y to the client, who can reveal the plaintext of y𝑦yitalic_y.

Security Model. In the MPC system, we consider semi-honest and honest-majority adversary (Dong et al., 2023; Tan et al., 2021), where the adversary corrupts no more than half of the computing parties (one exactly in 3PC) and strictly follow the protocol to perform computations but might try to crack sensitive information by analyzing the messages they receive. We note that output privacy, where the inference outputs can be utilized to infer information like membership (Shokri et al., 2017), is beyond the scope.

Refer to caption
Figure 2: High-level workflow of Ditto.

With the setting and security model in mind, we hereby present the high-level workflow of Ditto in Figure 2. In general, this is a two-step inference scheme via a co-design of ML quantization and efficient MPC computation. The first step (the upper left part) is quantizing and distilling the model to a more MPC-friendly version (Section 4.2). This step is performed by the model owner locally using plaintext computation. The second step (the bottom part) involves quantization-aware secure inference of the MPC-friendly model obtained from the first step. We design novel MPC primitives to support essential type conversion in quantization (Section 4.3).

4.2 MPC-friendly Model Quantization and Distillation

4.2.1 MPC-friendly Fixed-point Model Quantization

Our motivation to employ quantization in secure inference originates from the following two considerations. 1) The necessity of model quantization is amplified in secure inference of Transformers since it requires communicating messages between computing parties and the communication size heavily depends on the message bitwidth. Therefore, applying quantization to decrease bitwidth can theoretically shrink the overall volume of transmitted data, leading to improved inference efficiency. 2) The feasibility of using low-bit quantization is also evidenced in previous works. As observed in (Bombari et al., 2022), the neural network models are typically over-parameterized, thus leaving room for reducing the precision and computing with lower bitwidth while maintaining the model utility. The recent success of quantization in large language models (Dettmers et al., 2022; Frantar et al., 2022) also proves the feasibility of quantization in more complex models.

Whereas there exist several gaps between plaintext quantized inference and secure quantized inference.

Gap 1: Dynamic quantization is expensive in MPC. The state-of-the-art plaintext quantization works (Dettmers et al., 2022; Frantar et al., 2023) allow the entire inference to be carried out using low-bit integers like INT8 or even INT4. Despite achieving considerable speed up, tailored quantization operations are required, like dynamically computing the min/max/outlier to obtain the scaling factor s𝑠sitalic_s and calculating xsdelimited-⌊⌉𝑥𝑠\lfloor x\cdot s\rceil⌊ italic_x ⋅ italic_s ⌉ with clipping. Such operations are quite expensive in MPC. In this case, directly transplanting existing quantization strategies into secure inference is not viable, where the overhead to perform quantization alone outweighs the overhead reduction brought by quantization.

Solution. To mitigate this issue, we adopt static dyadic quantization to avoid dynamically computing scale in inference. By dyadic, we quantize all the weights and activations into fixed-point numbers, where the scaling factor is in the form of 1/2f1superscript2𝑓1/2^{f}1 / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT. In this way, the scale conversion can be implemented using left-shift or right-shift (aka. truncation (Mohassel & Rindal, 2018; Escudero et al., 2020)), which is much cheaper in MPC. We also adopt layer-wise quantization. That is, we use different quantization scales for different layers. By enabling a smaller quantization scale for linear layers that are not sensitive to precision, we can improve efficiency. While for those non-linear layers like layer normalization, we use a larger scale to avoid a significant accuracy drop. (Micikevicius et al., 2017) 111Similar precision practices are also used in PyTorch: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float16.

Gap 2: Type conversions are difficult in MPC. In plaintext quantization, linear operations utilize INT8 for computation and INT32 for accumulation to prevent overflow, while non-linear functions compute in FP32 to maintain sufficient precision. This necessitates type conversions throughout the model inference to avert overflow and substantial loss of precision. However, we note that these type conversions between INT8 and INT32 are straightforward in plaintext but pose a novel challenge in MPC, which operates over rings. Such type conversion involves converting shares among different rings, which cannot be done locally.

Solution. To bridge this gap, we propose efficient type conversion MPC primitives (Section 4.3.1). Besides, to avoid frequent type conversions, we use uniform low-bitwidth fixed-point encoding (FXP328superscriptsubscriptFXP328\texttt{FXP}_{32}^{8}FXP start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT) in intermediate Transformer blocks for linear operations. For the non-linear functions and the last prediction layer, we instead use a higher-bitwidth fixed-point encoding (FXP6418superscriptsubscriptFXP6418\texttt{FXP}_{64}^{18}FXP start_POSTSUBSCRIPT 64 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 18 end_POSTSUPERSCRIPT) for the sake of adequate precision. We note that 32/64-bit integers can accommodate the activation range, of which the distribution is illustrated in Appendix F, thus avoiding overflow.

To summarize, we quantize all the weights and activations into fixed-point numbers using layer-wise static quantization with dyadic scales. In Ditto, all the variables (i.e., activations and weights) are quantized with different precision. We provide an illustration of the difference between fixed-point inference and traditional floating-point inference in Appendix G. Note that for the non-linear layers, we exploit fixed-point conversion to raise the precision.

4.2.2 Quantization-aware Activation Functions

We hereby introduce the quantization-aware non-linear activation functions using fixed-point numbers of different bits. The formulated algorithms are deferred to Appendix B.

Softmax. We use Softmax to map the inputs into the range (0,1)01(0,1)( 0 , 1 ) as 𝖲𝗈𝖿𝗍𝗆𝖺𝗑(x)=exiiexi𝖲𝗈𝖿𝗍𝗆𝖺𝗑𝑥superscript𝑒subscript𝑥𝑖subscript𝑖superscript𝑒subscript𝑥𝑖\mathsf{Softmax}(x)=\frac{e^{x_{i}}}{\sum_{i}e^{x_{i}}}sansserif_Softmax ( italic_x ) = divide start_ARG italic_e start_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG. For numerical stability, we first ‘normalize’ the input by computing x=x𝗆𝖺𝗑i(x)𝑥𝑥subscript𝗆𝖺𝗑𝑖𝑥x=x-\mathsf{max}_{i}(x)italic_x = italic_x - sansserif_max start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x ). Since 𝗆𝖺𝗑𝗆𝖺𝗑\mathsf{max}sansserif_max does not require a high precision, we compute this part using FXP328superscriptsubscriptFXP328\texttt{FXP}_{32}^{8}FXP start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT. For the following exponential and division, we use existing protocols 𝖤𝗑𝗉𝖤𝗑𝗉{\mathsf{Exp}}sansserif_Exp (tailored approximation for Softmax, detailed in Appendix J) and 𝖱𝖾𝖼𝗂𝗉𝖱𝖾𝖼𝗂𝗉{\mathsf{Recip}}sansserif_Recip (Catrina & Saxena, 2010). These two functions are computed over FXP6418superscriptsubscriptFXP6418\texttt{FXP}_{64}^{18}FXP start_POSTSUBSCRIPT 64 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 18 end_POSTSUPERSCRIPT to maintain adequate precision.

GeLU. The original GeLU function computes 𝖦𝖾𝖫𝖴(x)=x2(1+tanh(2/π(x+0.044715x3)))𝖦𝖾𝖫𝖴𝑥𝑥212𝜋𝑥0.044715superscript𝑥3\mathsf{GeLU}(x)=\frac{x}{2}\cdot(1+\tanh(\sqrt{2/\pi}\cdot(x+0.044715\cdot x^% {3})))sansserif_GeLU ( italic_x ) = divide start_ARG italic_x end_ARG start_ARG 2 end_ARG ⋅ ( 1 + roman_tanh ( square-root start_ARG 2 / italic_π end_ARG ⋅ ( italic_x + 0.044715 ⋅ italic_x start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ) ) ). To precisely compute GeLU, we first try to use high-order chebyshev polynomial to approximate tanh\tanhroman_tanh. However, it requires several multiplication, thus leading to significant overhead. Inspired by (Li et al., 2023), we use a quantized polynomial to directly approximate 𝖦𝖾𝖫𝖴(x)=0.125x2+0.25x+0.5𝖦𝖾𝖫𝖴𝑥0.125superscript𝑥20.25𝑥0.5\mathsf{GeLU}(x)=0.125x^{2}+0.25x+0.5sansserif_GeLU ( italic_x ) = 0.125 italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 0.25 italic_x + 0.5 over FXP328superscriptsubscriptFXP328\texttt{FXP}_{32}^{8}FXP start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT. The Quad approximation is worth mentioning as it evaluates a two-order polynomial, allowing it to be computed with lower precision.

Layer Normalization. Given a vector of x𝑥xitalic_x as input, 𝖫𝖺𝗒𝖾𝗋𝖭𝗈𝗋𝗆=xμσ+ϵg+b𝖫𝖺𝗒𝖾𝗋𝖭𝗈𝗋𝗆𝑥𝜇𝜎italic-ϵgb\mathsf{LayerNorm}=\frac{x-\mu}{\sqrt{\sigma+\epsilon}}\cdot\textbf{g}+\textbf% {b}sansserif_LayerNorm = divide start_ARG italic_x - italic_μ end_ARG start_ARG square-root start_ARG italic_σ + italic_ϵ end_ARG end_ARG ⋅ g + b, where μ𝜇\muitalic_μ and σ𝜎\sigmaitalic_σ denote mean and variance, g,bgb\textbf{g},\textbf{b}g , b denote scale and bias, and ϵitalic-ϵ\epsilonitalic_ϵ is a small constant. To avoid significant precision loss, we upcast the inputs and perform the layer normalization with a relatively higher precision FXP6418superscriptsubscriptFXP6418\texttt{FXP}_{64}^{18}FXP start_POSTSUBSCRIPT 64 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 18 end_POSTSUPERSCRIPT. The final outputs are downcasted back to FXP328superscriptsubscriptFXP328\texttt{FXP}_{32}^{8}FXP start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT for subsequent computations.

4.2.3 Quantization-aware Distillation

Despite the efficiency gain from the above MPC-friendly quantization and approximation, these two steps can cause the precision drop. We illustrate the loss between the outputs from the original model and the quantized and approximated model in Appendix E. Consequently, the converted model \mathcal{M}caligraphic_M is of low utility. In order to compensate for the error introduced by these two methods, we adopt knowledge distillation (KD) (Jiao et al., 2020; Li et al., 2023).

Without special declaration, we denote the original model as 𝒯𝒯\mathcal{T}caligraphic_T and the converted model as \mathcal{M}caligraphic_M. We quantize the weights of 𝒯𝒯\mathcal{T}caligraphic_T to low-precision version in layer-wise granularity, which serve as the initial weights of \mathcal{M}caligraphic_M. All the computations in \mathcal{M}caligraphic_M use integer-only arithmetic. We leverage layer-wise distillation, considering that we use layer-wise quantization. Concretely, we capture the hidden states of all the Transformer layers from both 𝒯𝒯\mathcal{T}caligraphic_T and \mathcal{M}caligraphic_M and use the Mean Squared Error loss (MSE) between these two outputs to measure the distillation loss.

4.3 Secure Model Inference upon MPC

To perform the aforementioned type conversions in layer-wise quantization, we propose efficient MPC primitives.

4.3.1 Type Conversion MPC primitives

The type conversion can be divided into upcast and downcast. Upcast refers to converting values from a smaller fixed-point representation to a larger fixed-point representation, while downcast is the opposite. In MPC, type conversions additionally involves share conversions among different rings. We consider convert the input x\llbracket x\rrbracket_{\ell}⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT from FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT to FXPfsuperscriptsubscriptFXPsuperscriptsuperscript𝑓\texttt{FXP}_{\ell^{\prime}}^{f^{\prime}}FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT. Due to page limitation, we defer the correctness analysis and security proof to Appendix Bsimilar-to\simD.

Downcast (>,f>fformulae-sequencesuperscript𝑓superscript𝑓\ell>\ell^{\prime},f>f^{\prime}roman_ℓ > roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_f > italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, Algorithm 4). It suffices to a right-shift followed by a modulo operation as x=𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍(x)=xi(ff)mod2\llbracket x^{\prime}\rrbracket_{\ell^{\prime}}=\mathsf{DownCast}(\llbracket x% \rrbracket_{\ell})=x_{i}\gg(f-f^{\prime})\mod 2^{\ell^{\prime}}⟦ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = sansserif_DownCast ( ⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT ) = italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≫ ( italic_f - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT for i{0,1,2}𝑖012i\in\{0,1,2\}italic_i ∈ { 0 , 1 , 2 }. The local right-shift by (ff)𝑓superscript𝑓(f-f^{\prime})( italic_f - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) bits first lowers the precision to 2fsuperscript2superscript𝑓2^{f^{\prime}}2 start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT. The subsequent local modulo operation, i.e., dropping the most significant ()superscript(\ell-\ell^{\prime})( roman_ℓ - roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) bits, converts the shares to a smaller ring, s.t., x/2f=x/2f𝑥superscript2𝑓superscript𝑥superscript2superscript𝑓x/2^{f}=x^{\prime}/2^{f^{\prime}}italic_x / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT / 2 start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT.

Upcast (<,f<fformulae-sequencesuperscript𝑓superscript𝑓\ell<\ell^{\prime},f<f^{\prime}roman_ℓ < roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_f < italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, Algorithm 1). It suffices to convert xdelimited-⟦⟧𝑥\llbracket x\rrbracket⟦ italic_x ⟧ from 2subscriptsuperscript2\mathbb{Z}_{2^{\ell}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT to 2subscriptsuperscript2superscript\mathbb{Z}_{2^{\ell^{\prime}}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT, followed by a left-shift operation. The left-shift can be implemented directly by left-shifting the shares locally. While for the ring conversion, it is not trivial. As shown in Equation 1, there may be potential wrap w𝑤witalic_w of the sum of xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT modulo 2superscript22^{\ell}2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT, i.e., w=(x0+x1+x2)/2𝑤subscript𝑥0subscript𝑥1subscript𝑥2superscript2w=\lfloor(x_{0}+x_{1}+x_{2})/2^{\ell}\rflooritalic_w = ⌊ ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) / 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT ⌋. w𝑤witalic_w cannot be implicitly erased since <superscript\ell<\ell^{\prime}roman_ℓ < roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and w2mod2modulo𝑤superscript2superscript2superscriptw\cdot 2^{\ell}\mod 2^{\ell^{\prime}}italic_w ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT might not equal zero. However, directly computing w𝑤witalic_w that involves non-linear comparisons is expensive in MPC.

xmod2=(x0+x1+x2w2)mod2=x0mod2+x1mod2+x2mod2(w2)mod2modulo𝑥superscript2modulosubscript𝑥0subscript𝑥1subscript𝑥2𝑤superscript2superscript2modulosubscript𝑥0modulosuperscript2superscriptsubscript𝑥1modulosuperscript2superscriptsubscript𝑥2modulosuperscript2superscript𝑤superscript2superscript2superscript\small\begin{split}x&\mod 2^{\ell}=(x_{0}+x_{1}+x_{2}-w\cdot 2^{\ell})\mod 2^{% \ell}\\ &=x_{0}\mod 2^{\ell^{\prime}}+x_{1}\mod 2^{\ell^{\prime}}+x_{2}\mod 2^{\ell^{% \prime}}\\ &-(w\cdot 2^{\ell})\mod 2^{\ell^{\prime}}\\ \end{split}start_ROW start_CELL italic_x end_CELL start_CELL roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT = ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_w ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL - ( italic_w ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW (1)
Algorithm 1 Secure 𝖴𝗉𝖢𝖺𝗌𝗍𝖴𝗉𝖢𝖺𝗌𝗍\mathsf{UpCast}sansserif_UpCast Protocol
0:  RSS-shared x\llbracket x\rrbracket_{\ell}⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT over FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT;
0:  RSS-shared x\llbracket x^{\prime}\rrbracket_{\ell^{\prime}}⟦ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT over FXPfsuperscriptsubscriptFXPsuperscriptsuperscript𝑓\texttt{FXP}_{\ell^{\prime}}^{f^{\prime}}FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT, where x=xsuperscript𝑥𝑥x^{\prime}=xitalic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_x.
1:  P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT samples bits {ri}subscript𝑟𝑖\{r_{i}\}{ italic_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } for i[0,1]𝑖01i\in[0,\ell-1]italic_i ∈ [ 0 , roman_ℓ - 1 ] and computes r=i=01ri2i𝑟superscriptsubscript𝑖01subscript𝑟𝑖superscript2𝑖r=\sum_{i=0}^{\ell-1}r_{i}*2^{i}italic_r = ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ - 1 end_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∗ 2 start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT.
2:  P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT generates 2-out-of-2 sharing of r𝑟ritalic_r and r1subscript𝑟1r_{\ell-1}italic_r start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT as r={r0,r1}subscriptdelimited-⟨⟩𝑟superscriptsubscript𝑟0subscript𝑟1\langle r\rangle_{\ell^{\prime}}=\{r_{0},r_{1}\}⟨ italic_r ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = { italic_r start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT }, r1={r1,0,r1,1}subscriptdelimited-⟨⟩subscript𝑟1superscriptsubscript𝑟10subscript𝑟11\langle r_{\ell-1}\rangle_{\ell^{\prime}}=\{r_{\ell-1,0},r_{\ell-1,1}\}⟨ italic_r start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = { italic_r start_POSTSUBSCRIPT roman_ℓ - 1 , 0 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT roman_ℓ - 1 , 1 end_POSTSUBSCRIPT }.
3:  P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT sends the corresponding shares to P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT holds risubscript𝑟𝑖r_{i}italic_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and r1,isubscript𝑟1𝑖r_{\ell-1,i}italic_r start_POSTSUBSCRIPT roman_ℓ - 1 , italic_i end_POSTSUBSCRIPT.
4:  Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for i{0,1,2}𝑖012i\in\{0,1,2\}italic_i ∈ { 0 , 1 , 2 } generate random numbers z0,z22subscript𝑧0subscript𝑧2subscriptsuperscript2superscriptz_{0},z_{2}\in\mathbb{Z}_{2^{\ell^{\prime}}}italic_z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT using PRF: P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT samples z0subscript𝑧0z_{0}italic_z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT; P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT samples z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.
5:  P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT obtain rsubscriptdelimited-⟨⟩𝑟\langle r\rangle_{\ell}⟨ italic_r ⟩ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT by invoking 𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍(r(ff))𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍much-less-thansubscriptdelimited-⟨⟩𝑟superscriptsuperscript𝑓𝑓{\mathsf{DownCast}}(\langle r\rangle_{\ell^{\prime}}\ll(f^{\prime}-f))sansserif_DownCast ( ⟨ italic_r ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ≪ ( italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_f ) ). using r from Step-3using r from Step-3~{}~{}~{}~{}~{}~{}\vartriangleright\text{using $\langle r\rangle_{\ell^{\prime% }}$ from Step-3}⊳ using ⟨ italic_r ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT from Step-3
6:  P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT convert x\llbracket x\rrbracket_{\ell}⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT to x^={x^0,x^1}subscriptdelimited-⟨⟩^𝑥subscript^𝑥0subscript^𝑥1\langle\hat{x}\rangle_{\ell}=\{\hat{x}_{0},\hat{x}_{1}\}⟨ over^ start_ARG italic_x end_ARG ⟩ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT = { over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } by constructing x^0=x1+x2subscript^𝑥0subscript𝑥1subscript𝑥2\hat{x}_{0}=x_{1}+x_{2}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, x^1=x0subscript^𝑥1subscript𝑥0\hat{x}_{1}=x_{0}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, where Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT holds x^isubscript^𝑥𝑖\hat{x}_{i}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and x=x^𝑥^𝑥x=\hat{x}italic_x = over^ start_ARG italic_x end_ARG.
7:  P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT executes the following steps:
y=x^+r and open y=𝖱𝖾𝗏𝖾𝖺𝗅(y),y=x^+rw^=r1¬y1,w^=r1¬y1x=yr+w^2,x=yr+w^2\begin{split}&\langle y\rangle_{\ell}=\langle\hat{x}\rangle_{\ell}+\langle r% \rangle_{\ell}\text{ and open $y={\mathsf{Reveal}}(\langle y\rangle_{\ell})$},% ~{}~{}~{}\vartriangleright y=\hat{x}+r\\ &\langle\hat{w}\rangle_{\ell^{\prime}}=\langle r_{\ell-1}\rangle_{\ell^{\prime% }}\cdot\neg y_{\ell-1},~{}~{}~{}\vartriangleright\hat{w}=r_{\ell-1}\cdot\neg y% _{\ell-1}\\ &\langle x^{\prime}\rangle_{\ell^{\prime}}=y-\langle r\rangle_{\ell^{\prime}}+% \langle\hat{w}\rangle_{\ell^{\prime}}\cdot 2^{\ell},~{}~{}~{}~{}~{}~{}% \vartriangleright x=y-r+\hat{w}\cdot 2^{\ell}\end{split}start_ROW start_CELL end_CELL start_CELL ⟨ italic_y ⟩ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT = ⟨ over^ start_ARG italic_x end_ARG ⟩ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT + ⟨ italic_r ⟩ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT and open italic_y = sansserif_Reveal ( ⟨ italic_y ⟩ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT ) , ⊳ italic_y = over^ start_ARG italic_x end_ARG + italic_r end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ⟨ over^ start_ARG italic_w end_ARG ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = ⟨ italic_r start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ⋅ ¬ italic_y start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT , ⊳ over^ start_ARG italic_w end_ARG = italic_r start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT ⋅ ¬ italic_y start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ⟨ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = italic_y - ⟨ italic_r ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT + ⟨ over^ start_ARG italic_w end_ARG ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT , ⊳ italic_x = italic_y - italic_r + over^ start_ARG italic_w end_ARG ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_CELL end_ROW
and outputs x={x0,x1}subscriptdelimited-⟨⟩superscript𝑥superscriptsubscriptsuperscript𝑥0subscriptsuperscript𝑥1\langle x^{\prime}\rangle_{\ell^{\prime}}=\{x^{\prime}_{0},x^{\prime}_{1}\}⟨ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟩ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = { italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT }, s.t., x0+x1mod2=xmodulosubscriptsuperscript𝑥0subscriptsuperscript𝑥1superscript2superscript𝑥x^{\prime}_{0}+x^{\prime}_{1}\mod 2^{\ell^{\prime}}=xitalic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT = italic_x.
8:  Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for i{0,1}𝑖01i\in\{0,1\}italic_i ∈ { 0 , 1 } proceed using randomness from Step-4: 1) P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT computes x0=x0z0subscriptsuperscript𝑥0subscriptsuperscript𝑥0subscript𝑧0x^{\prime}_{0}=x^{\prime}_{0}-z_{0}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT; 2) P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT computes x1=x1z2subscriptsuperscript𝑥1subscriptsuperscript𝑥1subscript𝑧2x^{\prime}_{1}=x^{\prime}_{1}-z_{2}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.
9:  P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT exchanges x0subscriptsuperscript𝑥0x^{\prime}_{0}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and x1subscriptsuperscript𝑥1x^{\prime}_{1}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.
10:  P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT outputs (z0,x0+x1)subscript𝑧0subscriptsuperscript𝑥0subscriptsuperscript𝑥1(z_{0},x^{\prime}_{0}+x^{\prime}_{1})( italic_z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ), P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT outputs (x0+x1,z2)subscriptsuperscript𝑥0subscriptsuperscript𝑥1subscript𝑧2(x^{\prime}_{0}+x^{\prime}_{1},z_{2})( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) and P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT outputs (z2,z0)subscript𝑧2subscript𝑧0(z_{2},z_{0})( italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ).
11:  return  x={z0,x0+x1,z2}\llbracket x^{\prime}\rrbracket_{\ell^{\prime}}=\{z_{0},x^{\prime}_{0}+x^{% \prime}_{1},z_{2}\}⟦ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = { italic_z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT }.

Optimization Trick. We here take the intuition of mask-and-open that goes as x=((x+r)mod2+w^2r)mod2𝑥modulomodulo𝑥𝑟superscript2^𝑤superscript2𝑟superscript2superscriptx=((x+r)\mod 2^{\ell}+\hat{w}\cdot 2^{\ell}-r)\mod 2^{\ell^{\prime}}italic_x = ( ( italic_x + italic_r ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT + over^ start_ARG italic_w end_ARG ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT - italic_r ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT, where w^=(x+r)>?2^𝑤𝑥𝑟?superscript2\hat{w}=(x+r)\overset{?}{>}2^{\ell}over^ start_ARG italic_w end_ARG = ( italic_x + italic_r ) over? start_ARG > end_ARG 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT. The problem now reduces to compute another potential wrap w^^𝑤\hat{w}over^ start_ARG italic_w end_ARG, which is easier than computing w𝑤witalic_w. To further facilitate the computation, we exploit a positive heuristic trick. That is, supposing the input x[22,221]𝑥superscript22superscript221x\in[-2^{\ell-2},2^{\ell-2}-1]italic_x ∈ [ - 2 start_POSTSUPERSCRIPT roman_ℓ - 2 end_POSTSUPERSCRIPT , 2 start_POSTSUPERSCRIPT roman_ℓ - 2 end_POSTSUPERSCRIPT - 1 ] of which the MSB is unknown, we add a large bias 22superscript222^{\ell-2}2 start_POSTSUPERSCRIPT roman_ℓ - 2 end_POSTSUPERSCRIPT to x𝑥xitalic_x to ensure x=x+r[0,211]superscript𝑥𝑥𝑟0superscript211x^{\prime}=x+r\in[0,2^{\ell-1}-1]italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_x + italic_r ∈ [ 0 , 2 start_POSTSUPERSCRIPT roman_ℓ - 1 end_POSTSUPERSCRIPT - 1 ] is positive. Take =88\ell=8roman_ℓ = 8 as an example. An 8-bit signed integer can represent the range [128,127]128127[-128,127][ - 128 , 127 ]. We here assume the range of x𝑥xitalic_x as x[64,63]𝑥6463x\in[-64,63]italic_x ∈ [ - 64 , 63 ], which can be represented with a 7-bit integer. With this assumption, we have x+64[0,127]𝑥640127x+64\in[0,127]italic_x + 64 ∈ [ 0 , 127 ]. This ensures that the MSB of x+64𝑥64x+64italic_x + 64, when encoded as an 8-bit integer, will always be 0. As a result, we can compute w^=r1¬y1^𝑤subscript𝑟1subscript𝑦1\hat{w}=r_{\ell-1}\wedge\neg y_{\ell-1}over^ start_ARG italic_w end_ARG = italic_r start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT ∧ ¬ italic_y start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT with one AND operation, where y=x+rmod2𝑦modulosuperscript𝑥𝑟superscript2y=x^{\prime}+r\mod 2^{\ell}italic_y = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT + italic_r roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT. After the upcast conversion, the bias can be directly subtracted to eliminate its influence. Noted that the sharing of r𝑟ritalic_r over both 2subscriptsuperscript2\mathbb{Z}_{2^{\ell}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT and 2subscriptsuperscript2superscript\mathbb{Z}_{2^{\ell^{\prime}}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT (Step-5) can be implemented using 𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍\mathsf{DownCast}sansserif_DownCast.

Communication Complexity Analysis. The upcast protocol in total requires sending 3+3superscript3\ell+\ell^{\prime}3 roman_ℓ + roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT bits in 3 communication rounds. In Step-3, P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT sends randomness of 222\ell2 roman_ℓ bits. In Step-7, P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT communicates \ellroman_ℓ bits to reveal y𝑦yitalic_y. Finally, P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT exchanges superscript\ell^{\prime}roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT bits to obtain the output.

4.3.2 Quantization-aware MPC execution

With the underlying MPC primitives ready, we proceed to implement an end-to-end secure quantization-aware inference framework. We enhance the SecretFlow-SPU (Ma et al., 2023) framework, which supports the compilation of front-end models into privacy-preserving versions but lacks support for quantization. Concretely, we design and implement the quantization-aware compiler to allow dynamic ring support and automatic type conversions. The compiler can directly load the models from Huggingface and automatically rewrite the directed acyclic graph (DAG) to allow seamless execution of secure quantized inference.

Figure 3 depicts the DAG of the Softmax function, where the right part illustrates the modified quantization-aware variant in the domain of secure computation. The upcast and downcast operations are automatically called to align types (i.e., different quantizations and rings) upon MPC.

Refer to caption
Figure 3: DAG of Softmax (x8×10𝑥superscript810x\in\mathbb{R}^{8\times 10}italic_x ∈ blackboard_R start_POSTSUPERSCRIPT 8 × 10 end_POSTSUPERSCRIPT). Low-bit and high-bit computations are marked in Green and Orange, resp.

Dynamic Ring Support. In many MPC frameworks, including SPU, secure computations are limited to a uniform, statically-configured ring, a constraint that hampers the versatile deployment of fixed-point numbers with variable bitwidths. To overcome this limitation, we devise a dynamic mapping strategy that maps plaintext data types from front-end to corresponding ciphertext types and, consequently the back-end rings of different sizes. For instance, FP16 and FP32 are associated with FXP32 over 232subscriptsuperscript232\mathbb{Z}_{2^{32}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT 32 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT and FXP64 over 264subscriptsuperscript264\mathbb{Z}_{2^{64}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT 64 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT, respectively. As a result, the secure computations can be dynamically dispatched to the appropriate rings.

Automatic Type Conversion. We further refine the compiler to track the input and compute types associated with each operator. As inference proceeds, variables are encoded using different types (i.e., different precisions and bitwidths). We exploit the MPC primitives to perform the interleaving type conversions that automatically maintain type consistency. For instance, if an operator (such as exponential) is specified with an input type of FXP32 and a compute type of FXP64, an upcast operation will seamlessly transition the input to FXP64 before carrying out further computations.

5 Experiments

We evaluate Ditto mainly from three aspects: 1) model utility (Section 5.1); 2) inference efficiency (Section 5.2); 3) scalability experiments and ablation studies (Section 5.3).

Experimental setup. We implement Ditto upon the framework SecretFlow-SPU 222SPU: https://github.com/secretflow/spu. that supports privacy-preserving machine learning. We use pure fixed-point arithmetic during the quantization and distillation procedure. We conduct the experiments on one CentOS 8 machine equipped with one AMD Ryzen CPU (32 cores and 3.60GHz) and 256GB of RAM. We consider two network environments: 1) LAN setting with a bandwidth of 5Gbps and 0.4ms round-trip time; 2) WAN setting with a bandwidth of 400Mbps and 40ms round-trip time. We simulate the network environments using the Linux tc tool.

Model architectures and datasets. We use the pre-trained Bert models and GPT models in Hugging Face (Wolf et al., 2020). For Bert, we use Bert-base and Bert-large pre-trained over BookCorpus (Zhu et al., 2015) and English Wikipedia (Wikipedia contributors, 2004) datasets. For GPT, we use GPT2-base and GPT2-medium pre-trained over the Wikitext-103 dataset (Merity et al., 2016). We evaluate Bert over RTE, CoLA, QQP and QNLI from GLUE benchmarks (Wang et al., 2019), and GPT2 on the validation set of Wikitext-103. The detailed hyper-parameters for fine-tuning and distillation are in Appendix H.

Baselines. We adopt secure inference upon SPU as the vanilla baseline. The ablation models are denoted as Dittow/o{a}subscriptDitto𝑤𝑜𝑎\texttt{Ditto}_{w/o\{a\}}Ditto start_POSTSUBSCRIPT italic_w / italic_o { italic_a } end_POSTSUBSCRIPT with quantization, and Ditto with both quantization and non-linear function approximation. To make a more comprehensive comparison, we compare with two state-of-the-art work MPCFormer (Li et al., 2023) and PUMA (Dong et al., 2023), which are similar to our setting.

5.1 Utility Evaluation

The evaluation of model utility is based on various accuracy metrics for downstream tasks. Concretely, we adopt Accuracy for RTE and QNLI, Matthews correlation for CoLA, F1 score for QQP, and Perplexity for Wikitext-103. In the GLUE benchmark, the input sequence length is set to 128 for Bert-base and Bert-large. For Wikitext-103, the input sequence length is set to 50 for GPT2-base and GPT2-medium. Regarding MPCFormer, we explore two variants: Quad-alone and Quad+2ReLU. For PUMA, the GeLU function is computed using their Poly approximation (cf., Appendix I).

The results are provided in Table 1. In general, with mere Quad approximation, Ditto (Quad) achieves similar results to that of MPCFormer (Quad) and slightly lower than the baseline without any quantization or approximation. The utility degradation is negligible on most datasets, except CoLA. The lower utility of both MPCFormer and Ditto on CoLA could be attributed to its smaller size, leading to unstable distillation performance. Notably we observe that with 2ReLU approximation, both MPCFormer and Ditto incur noticeable utility drops in Bert tasks. This is in line with the results reported in MPCFormer, thus indicating that Softmax is more sensitive to precision. Regarding PUMA, it is worth noting that it incurs almost no accuracy drop due to the usage of a more accurate Polynomial approximation. However, as demonstrated in the following experiment, this improved accuracy comes at the cost of more communication overhead. To balance between utility and efficiency, we mainly use Quad approximation for GeLU in Ditto.

Table 1: Model utility on GLUE benchmark for Bert and on Wikitext-103 dataset for GPT2.
Method Approx. Bert-base Bert-large GPT2-base GPT2-medium
RTE CoLA QQP QNLI (\uparrow) RTE CoLA QQP QNLI (\uparrow) Wikitext-103 (\downarrow)
Baseline - 68.59 57.06 87.96 91.62 72.56 63.09 88.52 92.58 12.25 10.60
MPCFormer Quad 67.85 54.47 87.76 91.68 71.86 57.53 88.34 92.53 - -
Quad+2ReLU 64.30 52.75 86.95 90.76 70.29 55.53 87.64 91.85 - -
PUMA Poly 68.47 56.96 87.95 91.48 72.56 62.60 88.50 92.55 12.25 10.49
Dittow/o{a}subscriptDitto𝑤𝑜𝑎\texttt{Ditto}_{w/o\{a\}}Ditto start_POSTSUBSCRIPT italic_w / italic_o { italic_a } end_POSTSUBSCRIPT - 67.87 54.17 87.15 91.74 72.55 56.25 88.22 92.58 12.99 10.61
Ditto Quad 67.82 54.52 87.72 91.78 71.84 56.45 88.23 92.58 13.78 11.35
Quad+2ReLU 63.89 52.78 86.92 87.71 71.48 51.69 87.51 87.53 - -

5.2 Efficiency Evaluation

We measure the end-to-end runtime and concrete communication size to evaluate the efficiency. The experiments are conducted with a batch size of 1 in the LAN setting (refer to Appendix A for experiments in WAN). For Bert models, the input sequence length is set to 128. We choose the CoLA task as a representative since other tasks share the same model architecture as CoLA, resulting in similar inference overhead. As for GPT2 models, we generate 1 new token with an input length of 32. We run experiments for MPCFormer 333MPCFormer is configured to run on CPU for fair comparisons. and Ditto with and without the Quad approximation of GeLU for a comprehensive comparison.

Refer to caption
Refer to caption
(a) Bert-base
Refer to caption
(b) Bert-large
Refer to caption
(c) GPT2-base
Refer to caption
(d) GPT2-medium
Figure 4: Efficiency evaluation on Bert and GPT2 models. The closer to the bottom left corner, the better the performance.

In Figure 4, the x𝑥xitalic_x- and y𝑦yitalic_y- axes represent communication size (in GB) and runtime (in seconds), respectively. The closer the dot is to the bottom left corner, the better the performance. On the four models, Ditto (marked in red star) generally has the best efficiency. Regarding communication size, Dittow/o{a}subscriptDitto𝑤𝑜𝑎\texttt{Ditto}_{w/o\{a\}}Ditto start_POSTSUBSCRIPT italic_w / italic_o { italic_a } end_POSTSUBSCRIPT incurs 1.372.25×1.37\sim 2.25\times1.37 ∼ 2.25 × lower communication than PUMA, and 1.251.66×1.25\sim 1.66\times1.25 ∼ 1.66 × than MPCFormer. When combined with Quad approximation, both MPCFormer (Quad) and Ditto incur lower communication than PUMA. Concretely, the communication size of Ditto is 1.281.70×1.28\sim 1.70\times1.28 ∼ 1.70 × lower than MPCFormer (Quad) and 2.373.43×2.37\sim 3.43\times2.37 ∼ 3.43 × than PUMA. Owing to the reduction of communication size, Ditto is 3.144.40×3.14\sim 4.40\times3.14 ∼ 4.40 × and 1.442.35×1.44\sim 2.35\times1.44 ∼ 2.35 × faster than MPCFormer (Quad) and PUMA, respectively. One exception is that MPCFormer incurs lower communication size on GPT2-base. This may be because GPT2 models have a larger vocabulary size, thus leading to a much higher overhead in the embedding layer for Ditto 444We convert the input token ids to one-hot vectors using MPC, while MPCFormer performs the conversion in the client locally..

5.3 Extensive Experiments

We hereby evaluate the efficiency with varying input sequence length. The experiments for varying batch size and network environments, along with comparison to state-of-the-art 2PC works are presented in Appendix A.

Table 2: Inference efficiency on Bert-base and GPT2-base with varying input sequence length.
Model Method #Input Length
32 64 128 256
Comm. Time Comm. Time Comm. Time Comm. Time
Bert-base Baseline 2.79 13.57 6.24 27.32 15.12 53.17 40.65 114.83
MPCFormer 2.08 35.15 3.02 63.94 6.70 124.81 19.12 253.20
PUMA 2.16 12.97 4.65 22.59 10.59 43.98 26.07 88.70
Ditto 0.72 7.36 1.68 14.80 4.35 30.58 12.78 62.48
3.00×\times× 1.76×\times× 2.77×\times× 1.53×\times× 2.43×\times× 1.44×\times× 2.04×\times× 1.42×\times×
GPT2-base Baseline 8.88 37.88 12.50 52.63 22.03 84.04 47.91 157.47
MPCFormer 3.05 57.57 4.01 104.97 7.73 182.20 20.24 403.85
PUMA 8.61 35.22 11.95 48.33 17.57 73.77 33.00 131.33
Ditto 3.59 19.52 5.18 29.41 8.25 49.93 17.44 93.52
2.40×\times× 1.80×\times× 2.31×\times× 1.64×\times× 2.13×\times× 1.48×\times× 1.89×\times× 1.40×\times×

Varying Input Sequence Length. The language models typically have conversation sentences as inputs, thus having different input lengths. We hereby conduct experiments with input length {32,64,128,256}absent3264128256\in\{32,64,128,256\}∈ { 32 , 64 , 128 , 256 } on Bert-base and GPT2-base to make a more comprehensive evaluation.

The results are shown in Table 2 (the speedup numbers are against PUMA). In general, the communication size of Ditto is about 23×2\sim 3\times2 ∼ 3 × lower than the state-of-the-art PUMA. Owing to the communication reduction, Ditto achieves a speedup of about 1.41.8×1.4\sim 1.8\times1.4 ∼ 1.8 × against PUMA and a speedup of about 2.94.8×2.9\sim 4.8\times2.9 ∼ 4.8 × against MPCFormer.

Ablation Studies. We study the effects of quantization and approximation on Bert models. As shown in Table 3, the quantization with the Quad approximation for GeLU generally results in negligible degradation in utility. The speedup achieved against the vanilla baseline is approximately 1.411.56×1.41\sim 1.56\times1.41 ∼ 1.56 × with quantization alone and 1.742.09×1.74\sim 2.09\times1.74 ∼ 2.09 × with the additional GeLU approximation.

Table 3: Ablation studies of Ditto on Bert models.
Method Approx. Bert-base Bert-large
RTE CoLA QQP QNLI Speedup RTE CoLA QQP QNLI Speedup
Baseline - 68.59 57.06 87.96 91.62 - 72.56 63.09 88.52 92.58 -
Dittow/o{a}subscriptDitto𝑤𝑜𝑎\texttt{Ditto}_{w/o\{a\}}Ditto start_POSTSUBSCRIPT italic_w / italic_o { italic_a } end_POSTSUBSCRIPT - 67.87 54.17 87.15 91.74 1.41×\times× 72.55 56.25 88.22 92.58 1.56×\times×
Ditto Quad 67.82 54.52 87.72 91.78 1.74×\times× 71.84 56.45 88.23 92.57 2.09×\times×

6 Conclusion

In this paper, we propose a framework Ditto to enable secure quantization-aware inference of Transformer models. By incorporating MPC-friendly ML quantization and quantization-aware MPC execution, Ditto reduces the overhead and enhances the inference efficiency. In the future, we plan to investigate adopting more aggressive quantization methods, i.e., using lower bits in secure inference.

Acknowledgement

We would like to thank the anonymous reviewers and program chairs for their invaluable comments. We would also like to acknowledge the insightful feedback and efforts provided by Ye Dong on the final version of our paper. We extend our appreciation to Wen-jie Lu, Zhicong Huang, and Cheng Hong for their significant contributions to SPU and Ditto. Lastly, we thank all members of the SecretFlow team for their supports throughout this project.

Impact Statements

This paper presents work whose goal is to advance the field of Machine Learning. Our work aims to address the critical privacy concerns arising in the widely used Transformer-based inference. We employ cryptographic secure multi-party computation techniques, which are of provable security, to protect both the user query and model weights during the Transformer inference. However, it is also worth acknowledging the efficiency decay caused by secure computation. One notable contribution of our work is the novel optimizations that incorporate plaintext quantization and secure computations over dynamic rings. The advancements could have broader implications to make secure Transformer inference more practical in the real world.

References

  • Agrawal et al. (2019) Agrawal, N., Shamsabadi, A. S., Kusner, M. J., and Gascón, A. QUOTIENT: two-party secure neural network training and prediction. In Cavallaro, L., Kinder, J., Wang, X., and Katz, J. (eds.), Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pp.  1231–1247. ACM, 2019. doi: 10.1145/3319535.3339819. URL https://doi.org/10.1145/3319535.3339819.
  • Akimoto et al. (2023) Akimoto, Y., Fukuchi, K., Akimoto, Y., and Sakuma, J. Privformer: Privacy-preserving transformer with mpc. In 2023 IEEE 8th European Symposium on Security and Privacy (EuroSP), pp.  392–410, Los Alamitos, CA, USA, 2023. IEEE Computer Society. doi: 10.1109/EuroSP57164.2023.00031. URL https://doi.ieeecomputersociety.org/10.1109/EuroSP57164.2023.00031.
  • Araki et al. (2016) Araki, T., Furukawa, J., Lindell, Y., Nof, A., and Ohara, K. High-throughput semi-honest secure three-party computation with an honest majority. In Weippl, E. R., Katzenbeisser, S., Kruegel, C., Myers, A. C., and Halevi, S. (eds.), Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, pp.  805–817. ACM, 2016. doi: 10.1145/2976749.2978331. URL https://doi.org/10.1145/2976749.2978331.
  • Bombari et al. (2022) Bombari, S., Amani, M. H., and Mondelli, M. Memorization and optimization in deep neural networks with minimum over-parameterization. In NeurIPS, 2022. URL http://papers.nips.cc/paper_files/paper/2022/hash/323746f0ae2fbd8b6f500dc2d5c5f898-Abstract-Conference.html.
  • Brown et al. (2020) Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. Language models are few-shot learners, 2020.
  • Byali et al. (2020) Byali, M., Chaudhari, H., Patra, A., and Suresh, A. FLASH: fast and robust framework for privacy-preserving machine learning. Proc. Priv. Enhancing Technol., 2020(2):459–480, 2020. doi: 10.2478/popets-2020-0036. URL https://doi.org/10.2478/popets-2020-0036.
  • Catrina & Saxena (2010) Catrina, O. and Saxena, A. Secure computation with fixed-point numbers. In Sion, R. (ed.), Financial Cryptography and Data Security, 14th International Conference, FC 2010, Tenerife, Canary Islands, Spain, January 25-28, 2010, Revised Selected Papers, volume 6052 of Lecture Notes in Computer Science, pp.  35–50. Springer, 2010. doi: 10.1007/978-3-642-14577-3“˙6. URL https://doi.org/10.1007/978-3-642-14577-3_6.
  • Chen et al. (2021) Chen, X., Xie, S., and He, K. An empirical study of training self-supervised vision transformers. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pp.  9620–9629. IEEE, 2021. doi: 10.1109/ICCV48922.2021.00950. URL https://doi.org/10.1109/ICCV48922.2021.00950.
  • Chou et al. (2018) Chou, E., Beal, J., Levy, D., Yeung, S., Haque, A., and Fei-Fei, L. Faster cryptonets: Leveraging sparsity for real-world encrypted inference. CoRR, abs/1811.09953, 2018. URL http://arxiv.org/abs/1811.09953.
  • Dalskov et al. (2020) Dalskov, A. P. K., Escudero, D., and Keller, M. Secure evaluation of quantized neural networks. Proc. Priv. Enhancing Technol., 2020(4):355–375, 2020. doi: 10.2478/POPETS-2020-0077. URL https://doi.org/10.2478/popets-2020-0077.
  • Dalskov et al. (2021) Dalskov, A. P. K., Escudero, D., and Keller, M. Fantastic four: Honest-majority four-party secure computation with malicious security. In Bailey, M. and Greenstadt, R. (eds.), 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, pp.  2183–2200. USENIX Association, 2021. URL https://www.usenix.org/conference/usenixsecurity21/presentation/dalskov.
  • Dettmers et al. (2022) Dettmers, T., Lewis, M., Belkada, Y., and Zettlemoyer, L. Llm.int8(): 8-bit matrix multiplication for transformers at scale. CoRR, abs/2208.07339, 2022. doi: 10.48550/arXiv.2208.07339. URL https://doi.org/10.48550/arXiv.2208.07339.
  • Devlin et al. (2019) Devlin, J., Chang, M., Lee, K., and Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T. (eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp.  4171–4186. Association for Computational Linguistics, 2019. doi: 10.18653/v1/n19-1423. URL https://doi.org/10.18653/v1/n19-1423.
  • Dong et al. (2023) Dong, Y., Lu, W., Zheng, Y., Wu, H., Zhao, D., Tan, J., Huang, Z., Hong, C., Wei, T., and Chen, W. PUMA: secure inference of llama-7b in five minutes. CoRR, abs/2307.12533, 2023. doi: 10.48550/arXiv.2307.12533. URL https://doi.org/10.48550/arXiv.2307.12533.
  • Dosovitskiy et al. (2021) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021. URL https://openreview.net/forum?id=YicbFdNTTy.
  • Escudero et al. (2020) Escudero, D., Ghosh, S., Keller, M., Rachuri, R., and Scholl, P. Improved primitives for MPC over mixed arithmetic-binary circuits. In Micciancio, D. and Ristenpart, T. (eds.), Advances in Cryptology - CRYPTO 2020 - 40th Annual International Cryptology Conference, CRYPTO 2020, Santa Barbara, CA, USA, August 17-21, 2020, Proceedings, Part II, volume 12171 of Lecture Notes in Computer Science, pp.  823–852. Springer, 2020. doi: 10.1007/978-3-030-56880-1“˙29. URL https://doi.org/10.1007/978-3-030-56880-1_29.
  • Frantar et al. (2022) Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. GPTQ: accurate post-training quantization for generative pre-trained transformers. CoRR, abs/2210.17323, 2022. doi: 10.48550/arXiv.2210.17323. URL https://doi.org/10.48550/arXiv.2210.17323.
  • Frantar et al. (2023) Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. OPTQ: accurate quantization for generative pre-trained transformers. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/pdf?id=tcbBPnfwxS.
  • Gholami et al. (2021) Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M. W., and Keutzer, K. A survey of quantization methods for efficient neural network inference. CoRR, abs/2103.13630, 2021. URL https://arxiv.org/abs/2103.13630.
  • Hao et al. (2022) Hao, M., Li, H., Chen, H., Xing, P., Xu, G., and Zhang, T. Iron: Private inference on transformers. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=deyqjpcTfsG.
  • Huang et al. (2022) Huang, Z., Lu, W., Hong, C., and Ding, J. Cheetah: Lean and fast secure two-party deep neural network inference. In Butler, K. R. B. and Thomas, K. (eds.), 31st USENIX Security Symposium, USENIX Security 2022, Boston, MA, USA, August 10-12, 2022, pp.  809–826. USENIX Association, 2022. URL https://www.usenix.org/conference/usenixsecurity22/presentation/huang-zhicong.
  • Jiao et al. (2020) Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., and Liu, Q. Tinybert: Distilling BERT for natural language understanding. In Cohn, T., He, Y., and Liu, Y. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, pp.  4163–4174. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.findings-emnlp.372. URL https://doi.org/10.18653/v1/2020.findings-emnlp.372.
  • Kim et al. (2021) Kim, S., Gholami, A., Yao, Z., Mahoney, M. W., and Keutzer, K. I-BERT: integer-only BERT quantization. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pp.  5506–5518. PMLR, 2021. URL http://proceedings.mlr.press/v139/kim21d.html.
  • Li et al. (2023) Li, D., Wang, H., Shao, R., Guo, H., Xing, E., and Zhang, H. MPCFORMER: FAST, PERFORMANT AND PRIVATE TRANSFORMER INFERENCE WITH MPC. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=CWmvjOEhgH-.
  • Liang et al. (2023) Liang, Z., Wang, P., Zhang, R., Xing, L., Xu, N., and Zhang, S. Merge: Fast private text generation, 2023.
  • Liu & Liu (2023) Liu, X. and Liu, Z. Llms can understand encrypted prompt: Towards privacy-computing friendly transformers, 2023.
  • Lu et al. (2020) Lu, W., Fang, Y., Huang, Z., Hong, C., Chen, C., Qu, H., Zhou, Y., and Ren, K. Faster secure multiparty computation of adaptive gradient descent. In Zhang, B., Popa, R. A., Zaharia, M., Gu, G., and Ji, S. (eds.), PPMLP’20: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, Virtual Event, USA, November, 2020, pp.  47–49. ACM, 2020. doi: 10.1145/3411501.3419427. URL https://doi.org/10.1145/3411501.3419427.
  • Lu et al. (2023) Lu, W., Huang, Z., Gu, Z., Li, J., Liu, J., Ren, K., Hong, C., Wei, T., and Chen, W. Bumblebee: Secure two-party inference framework for large transformers. IACR Cryptol. ePrint Arch., pp.  1678, 2023. URL https://eprint.iacr.org/2023/1678.
  • Ma et al. (2023) Ma, J., Zheng, Y., Feng, J., Zhao, D., Wu, H., Fang, W., Tan, J., Yu, C., Zhang, B., and Wang, L. Secretflow-spu: A performant and user-friendly framework for privacy-preserving machine learning. In Lawall, J. and Williams, D. (eds.), 2023 USENIX Annual Technical Conference, USENIX ATC 2023, Boston, MA, USA, July 10-12, 2023, pp.  17–33. USENIX Association, 2023. URL https://www.usenix.org/conference/atc23/presentation/ma.
  • Merity et al. (2016) Merity, S., Xiong, C., Bradbury, J., and Socher, R. Pointer sentinel mixture models, 2016.
  • Micikevicius et al. (2017) Micikevicius, P., Narang, S., Alben, J., Diamos, G. F., Elsen, E., García, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., and Wu, H. Mixed precision training. CoRR, abs/1710.03740, 2017. URL http://arxiv.org/abs/1710.03740.
  • Mohassel & Rindal (2018) Mohassel, P. and Rindal, P. Aby3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp.  35–52, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450356930. doi: 10.1145/3243734.3243760. URL https://doi.org/10.1145/3243734.3243760.
  • Mohassel & Zhang (2017) Mohassel, P. and Zhang, Y. Secureml: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pp.  19–38. IEEE Computer Society, 2017. doi: 10.1109/SP.2017.12. URL https://doi.org/10.1109/SP.2017.12.
  • Patra et al. (2021) Patra, A., Schneider, T., Suresh, A., and Yalame, H. ABY2.0: improved mixed-protocol secure two-party computation. In Bailey, M. and Greenstadt, R. (eds.), 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, pp.  2165–2182. USENIX Association, 2021. URL https://www.usenix.org/conference/usenixsecurity21/presentation/patra.
  • Radford et al. (2019) Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners. 2019.
  • Riazi et al. (2019) Riazi, M. S., Samragh, M., Chen, H., Laine, K., Lauter, K. E., and Koushanfar, F. XONN: xnor-based oblivious deep neural network inference. In Heninger, N. and Traynor, P. (eds.), 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, August 14-16, 2019, pp.  1501–1518. USENIX Association, 2019. URL https://www.usenix.org/conference/usenixsecurity19/presentation/riazi.
  • Shamir (1979) Shamir, A. How to share a secret. Commun. ACM, 22(11):612–613, 1979. doi: 10.1145/359168.359176. URL https://doi.org/10.1145/359168.359176.
  • Shokri et al. (2017) Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pp.  3–18. IEEE Computer Society, 2017. doi: 10.1109/SP.2017.41. URL https://doi.org/10.1109/SP.2017.41.
  • Tan et al. (2021) Tan, S., Knott, B., Tian, Y., and Wu, D. J. Cryptgpu: Fast privacy-preserving machine learning on the GPU. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021, pp.  1021–1038. IEEE, 2021. doi: 10.1109/SP40001.2021.00098. URL https://doi.org/10.1109/SP40001.2021.00098.
  • Vaswani et al. (2017) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp.  5998–6008, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
  • Wagh et al. (2021) Wagh, S., Tople, S., Benhamouda, F., Kushilevitz, E., Mittal, P., and Rabin, T. Falcon: Honest-majority maliciously secure framework for private deep learning. Proc. Priv. Enhancing Technol., 2021(1):188–208, 2021. doi: 10.2478/popets-2021-0011. URL https://doi.org/10.2478/popets-2021-0011.
  • Wang et al. (2019) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S. R. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL https://openreview.net/forum?id=rJ4km2R5t7.
  • Wikipedia contributors (2004) Wikipedia contributors. Plagiarism — Wikipedia, the free encyclopedia, 2004. URL https://en.wikipedia.org/w/index.php?title=Plagiarism&oldid=5139350. [Online; accessed 22-July-2004].
  • Wolf et al. (2020) Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T. L., Gugger, S., Drame, M., Lhoest, Q., and Rush, A. M. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp.  38–45, Online, October 2020. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/2020.emnlp-demos.6.
  • Yao (1986) Yao, A. C. How to generate and exchange secrets (extended abstract). In 27th Annual Symposium on Foundations of Computer Science, Toronto, Canada, 27-29 October 1986, pp.  162–167. IEEE Computer Society, 1986. doi: 10.1109/SFCS.1986.25. URL https://doi.org/10.1109/SFCS.1986.25.
  • Yao et al. (2021) Yao, Z., Dong, Z., Zheng, Z., Gholami, A., Yu, J., Tan, E., Wang, L., Huang, Q., Wang, Y., Mahoney, M. W., and Keutzer, K. HAWQ-V3: dyadic neural network quantization. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pp.  11875–11886. PMLR, 2021. URL http://proceedings.mlr.press/v139/yao21a.html.
  • Yao et al. (2022) Yao, Z., Aminabadi, R. Y., Zhang, M., Wu, X., Li, C., and He, Y. Zeroquant: Efficient and affordable post-training quantization for large-scale transformers. In NeurIPS, 2022. URL http://papers.nips.cc/paper_files/paper/2022/hash/adf7fa39d65e2983d724ff7da57f00ac-Abstract-Conference.html.
  • Zeng et al. (2023) Zeng, W., Li, M., Xiong, W., Tong, T., Lu, W., Tan, J., Wang, R., and Huang, R. Mpcvit: Searching for accurate and efficient mpc-friendly vision transformer with heterogeneous attention. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pp.  5029–5040. IEEE, 2023. doi: 10.1109/ICCV51070.2023.00466. URL https://doi.org/10.1109/ICCV51070.2023.00466.
  • Zhang et al. (2023) Zhang, Y., Chen, D., Kundu, S., Li, C., and Beerel, P. A. Sal-vit: Towards latency efficient private inference on vit using selective attention search with a learnable softmax approximation. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pp.  5093–5102. IEEE, 2023. doi: 10.1109/ICCV51070.2023.00472. URL https://doi.org/10.1109/ICCV51070.2023.00472.
  • Zhu et al. (2015) Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., and Fidler, S. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In The IEEE International Conference on Computer Vision (ICCV), December 2015.

Appendix A Supplementary Experiments

Varying Batch Size.

We evaluate the secure inference with batch size {1,2,4,8}absent1248\in\{1,2,4,8\}∈ { 1 , 2 , 4 , 8 } on Bert-base model. As shown in Table 4, the communication size and runtime increase about linearly to the batch size for all the methods. Ditto remains about 1.4×1.4\times1.4 × and 4.0×4.0\times4.0 × faster than PUMA and MPCFormer, respectively.

Table 4: Inference efficiency on Bert-base with varying batch size. The input length is set to 128.
Model Method #Batch Size
1 2 4 8
Comm. Time Comm. Time Comm. Time Comm. Time
Bert-base Baseline 15.12 53.17 30.21 97.56 60.49 185.63 120.77 365.76
MPCFormer 6.70 124.81 11.77 227.82 21.93 432.06 42.25 839.83
PUMA 10.59 43.98 21.20 79.70 42.27 153.88 84.65 297.90
Ditto 4.35 30.58 8.68 56.29 17.35 107.52 34.96 208.63
2.43×\times× 1.44×\times× 2.44×\times× 1.42×\times× 2.44×\times× 1.43×\times× 2.42×\times× 1.43×\times×
Varying Network Environment.

We evaluate the secure inference under two different network settings, i.e., LAN and WAN. Since secure inference based on MPC is communication-bound, the network status has a significant effect on the efficiency. As shown in Table 5, the runtime increases dramatically in WAN, which is nearly 10×10\times10 × that in LAN. This is because WAN has a smaller bandwidth and higher latency. Compared to PUMA, Ditto is still 1.461.53×1.46\sim 1.53\times1.46 ∼ 1.53 × faster in WAN due to the reduction of communication overhead.

Table 5: Inference efficiency of Bert-base and GPT2-base under different network environments.
Network Model Runtime (s)
Baseline MPCFormer PUMA Ditto
Bert-base LAN 53.17 124.81 43.98 30.58 1.44×\times×
WAN 551.45 888.55 444.43 303.50 1.46×\times×
GPT2-base LAN 39.60 57.57 38.33 20.30 1.89×\times×
WAN 382.98 588.14 357.65 233.32 1.53×\times×
Comparison to 2PC works.

We also evaluate the state-of-the-art 2PC works for secure Transformer inference to make a comprehensive comparison. The concrete runtime and communication size are from the paper  (Lu et al., 2023). The efficiency results are illustrated in Figure 5. In general, Ditto is orders of magnitude faster than Iron (Hao et al., 2022) and BumbleBee (Lu et al., 2023) on Bert (base and large) models with input sequence length of 128.

Refer to caption
Refer to caption
(a) Bert-base
Refer to caption
(b) Bert-large
Figure 5: Efficiency comparison to 2PC methods on Bert models. The input sequences are of length 128.

Appendix B Formulated Protocol Constructions

In this section, we present the formulated protocol constructions for the fixed-point computation of GeLU and Softmax functions mentioned in Section 4.2.2, and the type cast MPC primitives introduced in Section 4.3.1.

The approximation of GeLU function that computes 𝖦𝖾𝖫𝖴(x)=0.125x2+0.25x+0.5𝖦𝖾𝖫𝖴𝑥0.125superscript𝑥20.25𝑥0.5\mathsf{GeLU}(x)=0.125x^{2}+0.25x+0.5sansserif_GeLU ( italic_x ) = 0.125 italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 0.25 italic_x + 0.5 is presented in Algorithm 2. The Softmax function that computes with different fixed-point representations are shown in Algorithm 3.

Algorithm 2 Approximated 𝖦𝖾𝖫𝖴𝖦𝖾𝖫𝖴\mathsf{GeLU}sansserif_GeLU Protocol
0:  Fixed-point x𝑥xitalic_x over FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT; Polynomial coefficients a,b,c={0.125,0.25,0.5}𝑎𝑏𝑐0.1250.250.5a,b,c=\{0.125,0.25,0.5\}italic_a , italic_b , italic_c = { 0.125 , 0.25 , 0.5 }
0:  Fixed-point y𝑦yitalic_y over FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT;
1:  aint=a2fa_{int}=\lfloor a\cdot 2^{f}\rceilitalic_a start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT = ⌊ italic_a ⋅ 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ⌉, bint=b2fb_{int}=\lfloor b\cdot 2^{f}\rceilitalic_b start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT = ⌊ italic_b ⋅ 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ⌉, cint=c2fc_{int}=\lfloor c\cdot 2^{f}\rceilitalic_c start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT = ⌊ italic_c ⋅ 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ⌉
2:  y^=aintx/2f+bint,0.125x+0.25^𝑦subscript𝑎𝑖𝑛𝑡𝑥superscript2𝑓subscript𝑏𝑖𝑛𝑡0.125𝑥0.25\hat{y}=a_{int}\cdot x/2^{f}+b_{int},~{}~{}~{}~{}~{}~{}\vartriangleright 0.125% \cdot x+0.25over^ start_ARG italic_y end_ARG = italic_a start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT ⋅ italic_x / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT + italic_b start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT , ⊳ 0.125 ⋅ italic_x + 0.25
3:  y=xy^/2f+cint,x(0.125x+0.25)+0.5𝑦𝑥^𝑦superscript2𝑓subscript𝑐𝑖𝑛𝑡𝑥0.125𝑥0.250.5y=x\cdot\hat{y}/2^{f}+c_{int},~{}~{}~{}~{}~{}~{}\vartriangleright x\cdot(0.125% \cdot x+0.25)+0.5italic_y = italic_x ⋅ over^ start_ARG italic_y end_ARG / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT + italic_c start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT , ⊳ italic_x ⋅ ( 0.125 ⋅ italic_x + 0.25 ) + 0.5
4:  return  y𝑦yitalic_y
Algorithm 3 Approximated 𝖲𝗈𝖿𝗍𝗆𝖺𝗑𝖲𝗈𝖿𝗍𝗆𝖺𝗑\mathsf{Softmax}sansserif_Softmax Protocol
0:  Fixed-point x𝑥xitalic_x over FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT; <superscript\ell<\ell^{\prime}roman_ℓ < roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and f<f𝑓superscript𝑓f<f^{\prime}italic_f < italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT
0:  Fixed-point y𝑦yitalic_y over FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT;
1:  x=x𝖬𝖺𝗑(x)𝑥𝑥𝖬𝖺𝗑𝑥x=x-\mathsf{Max}(x)italic_x = italic_x - sansserif_Max ( italic_x ),       \vartriangleright Max computes with precision bit f𝑓fitalic_f
2:  x^=𝖢𝖺𝗌𝗍(x,FXPf,FXPf)^𝑥𝖢𝖺𝗌𝗍𝑥superscriptsubscriptFXP𝑓superscriptsubscriptFXPsuperscriptsuperscript𝑓\hat{x}=\mathsf{Cast}(x,\texttt{FXP}_{\ell}^{f},\texttt{FXP}_{\ell^{\prime}}^{% f^{\prime}})over^ start_ARG italic_x end_ARG = sansserif_Cast ( italic_x , FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT , FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ),       \vartriangleright from FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT to FXPfsuperscriptsubscriptFXPsuperscriptsuperscript𝑓\texttt{FXP}_{\ell^{\prime}}^{f^{\prime}}FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT
3:  x^exp=𝖤𝗑𝗉(x^)subscript^𝑥𝑒𝑥𝑝𝖤𝗑𝗉^𝑥\hat{x}_{exp}=\mathsf{Exp}(\hat{x})over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT = sansserif_Exp ( over^ start_ARG italic_x end_ARG ),       \vartriangleright Exponential computes with precision bit fsuperscript𝑓f^{\prime}italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT
4:  y^=x^exp/𝖲𝗎𝗆(x^exp,axis=1)^𝑦subscript^𝑥𝑒𝑥𝑝𝖲𝗎𝗆subscript^𝑥𝑒𝑥𝑝𝑎𝑥𝑖𝑠1\hat{y}=\hat{x}_{exp}/\mathsf{Sum}(\hat{x}_{exp},axis=-1)over^ start_ARG italic_y end_ARG = over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT / sansserif_Sum ( over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT , italic_a italic_x italic_i italic_s = - 1 )
5:  y=𝖢𝖺𝗌𝗍(y^,FXPf,FXPf)𝑦𝖢𝖺𝗌𝗍^𝑦superscriptsubscriptFXPsuperscriptsuperscript𝑓superscriptsubscriptFXP𝑓y=\mathsf{Cast}(\hat{y},\texttt{FXP}_{\ell^{\prime}}^{f^{\prime}},\texttt{FXP}% _{\ell}^{f})italic_y = sansserif_Cast ( over^ start_ARG italic_y end_ARG , FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT , FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ),       \vartriangleright from FXPfsuperscriptsubscriptFXPsuperscriptsuperscript𝑓\texttt{FXP}_{\ell^{\prime}}^{f^{\prime}}FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT to FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT
6:  return  y𝑦yitalic_y

In the following, the Algorithm 4 depicts the construction of downcast protocol in the MPC domain.

Algorithm 4 Secure 𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍\mathsf{DownCast}sansserif_DownCast Protocol
0:  RSS-shared x\llbracket x\rrbracket_{\ell}⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT over FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT;
0:  RSS-shared x\llbracket x^{\prime}\rrbracket_{\ell^{\prime}}⟦ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT over FXPfsuperscriptsubscriptFXPsuperscriptsuperscript𝑓\texttt{FXP}_{\ell^{\prime}}^{f^{\prime}}FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT, where x/2f=x/2f𝑥superscript2𝑓superscript𝑥superscript2superscript𝑓x/2^{f}=x^{\prime}/2^{f^{\prime}}italic_x / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT / 2 start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT.
1:  Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for i{0,1,2}𝑖012i\in\{0,1,2\}italic_i ∈ { 0 , 1 , 2 } proceed as follows:
xi=xi(ff)mod2xi+1=xi+1(ff)mod2,x/2ffmod2=xmod2\begin{split}&x^{\prime}_{i}=x_{i}\gg(f-f^{\prime})\mod 2^{\ell^{\prime}}\\ &x^{\prime}_{i+1}=x_{i+1}\gg(f-f^{\prime})\mod 2^{\ell^{\prime}},~{}~{}~{}~{}~% {}~{}\vartriangleright x/2^{f-f^{\prime}}\mod 2^{\ell^{\prime}}=x^{\prime}\mod 2% ^{\ell^{\prime}}\\ \end{split}start_ROW start_CELL end_CELL start_CELL italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≫ ( italic_f - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ≫ ( italic_f - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT , ⊳ italic_x / 2 start_POSTSUPERSCRIPT italic_f - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW
2:  return  x={x0,x1,x2}\llbracket x^{\prime}\rrbracket_{\ell^{\prime}}=\{x^{\prime}_{0},x^{\prime}_{1% },x^{\prime}_{2}\}⟦ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = { italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT }.

Appendix C Correctness Analysis

In this section, we analyze the correctness of proposed type cast protocols in Section 4.3.1. The type cast in MPC involves converting shares among rings of different sizes. Consider two rings, 2subscriptsuperscript2\mathbb{Z}_{2^{\ell}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT and 2subscriptsuperscript2superscript\mathbb{Z}_{2^{\ell^{\prime}}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT, and let x={x0,x1,x2}\llbracket x\rrbracket_{\ell}=\{x_{0},x_{1},x_{2}\}⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT = { italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT } be the input sharing of x𝑥xitalic_x over the first ring. Our goal is to obtain the sharing of x𝑥xitalic_x over the second ring, denoted as x={x0,x1,x2}\llbracket x^{\prime}\rrbracket_{\ell^{\prime}}=\{x^{\prime}_{0},x^{\prime}_{1% },x^{\prime}_{2}\}⟦ italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = { italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT }. We note that x\llbracket x\rrbracket_{\ell}⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT and x\llbracket x\rrbracket_{\ell^{\prime}}⟦ italic_x ⟧ start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT are encoded over FXPfsuperscriptsubscriptFXP𝑓\texttt{FXP}_{\ell}^{f}FXP start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT and FXPfsuperscriptsubscriptFXPsuperscriptsuperscript𝑓\texttt{FXP}_{\ell^{\prime}}^{f^{\prime}}FXP start_POSTSUBSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT, respectively. Hence, we require x/2f=x/2f𝑥superscript2𝑓superscript𝑥superscript2superscript𝑓x/2^{f}=x^{\prime}/2^{f^{\prime}}italic_x / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT / 2 start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT. Note that to ensure correctness, we have the assumption that xsuperscript𝑥x^{\prime}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT can be represented using superscript\ell^{\prime}roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT bits, i.e., x[21,211]superscript𝑥superscript2superscript1superscript2superscript11x^{\prime}\in[-2^{\ell^{\prime}-1},2^{\ell^{\prime}-1}-1]italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ [ - 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT - 1 ].

Proof.

Based on the relationship between {,f}𝑓\{\ell,f\}{ roman_ℓ , italic_f } and {,f}superscriptsuperscript𝑓\{\ell^{\prime},f^{\prime}\}{ roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT }, we have two cases that correspond to downcast and upcast, respectively.

Case 1: {,f}>{,f}𝑓superscriptsuperscript𝑓\{\ell,f\}>\{\ell^{\prime},f^{\prime}\}{ roman_ℓ , italic_f } > { roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT } (Downcast). In Algorithm 4, the input x𝑥xitalic_x is firstly right-shifted by t=ff𝑡𝑓superscript𝑓t=f-f^{\prime}italic_t = italic_f - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT bits to lower the precision. The following step is to convert x/2ff𝑥superscript2𝑓superscript𝑓x/2^{f-f^{\prime}}italic_x / 2 start_POSTSUPERSCRIPT italic_f - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT to the smaller ring 2subscriptsuperscript2superscript\mathbb{Z}_{2^{\ell^{\prime}}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT using modulo operation.

The above steps can be formulated as

x=x/2t=((x0+x1+x2)mod2)/2t=x0/2t+x1/2t+x2/2tw2t+w=(x0/2t+x1/2t+x2/2tw2t+w)mod2=(x0/2tmod2)+(x1/2tmod2)+(x2/2tmod2)(w2tmod2)+wmod2superscript𝑥𝑥superscript2𝑡modulosubscript𝑥0subscript𝑥1subscript𝑥2superscript2superscript2𝑡subscript𝑥0superscript2𝑡subscript𝑥1superscript2𝑡subscript𝑥2superscript2𝑡𝑤superscript2𝑡superscript𝑤modulosubscript𝑥0superscript2𝑡subscript𝑥1superscript2𝑡subscript𝑥2superscript2𝑡𝑤superscript2𝑡superscript𝑤superscript2superscriptmodulomodulosubscript𝑥0superscript2𝑡superscript2superscriptmodulosubscript𝑥1superscript2𝑡superscript2superscriptmodulosubscript𝑥2superscript2𝑡superscript2superscriptmodulo𝑤superscript2𝑡superscript2superscriptsuperscript𝑤superscript2superscript\begin{split}x^{\prime}=x/2^{t}&=((x_{0}+x_{1}+x_{2})\mod 2^{\ell})/2^{t}\\ &=x_{0}/2^{t}+x_{1}/2^{t}+x_{2}/2^{t}-w\cdot 2^{\ell-t}+w^{\prime}\\ &=(x_{0}/2^{t}+x_{1}/2^{t}+x_{2}/2^{t}-w\cdot 2^{\ell-t}+w^{\prime})\mod 2^{% \ell^{\prime}}\\ &=(x_{0}/2^{t}\mod 2^{\ell^{\prime}})+(x_{1}/2^{t}\mod 2^{\ell^{\prime}})+(x_{% 2}/2^{t}\mod 2^{\ell^{\prime}})\\ &-(w\cdot 2^{\ell-t}\mod 2^{\ell^{\prime}})+w^{\prime}\mod 2^{\ell^{\prime}}\\ \end{split}start_ROW start_CELL italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_x / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT end_CELL start_CELL = ( ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT ) / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT - italic_w ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ - italic_t end_POSTSUPERSCRIPT + italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT - italic_w ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ - italic_t end_POSTSUPERSCRIPT + italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ) + ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ) + ( italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL - ( italic_w ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ - italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ) + italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW (2)

where w{0,1,2}superscript𝑤012w^{\prime}\in\{0,1,2\}italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ { 0 , 1 , 2 } denotes the potential carry bits from the lower t𝑡titalic_t bits. Since we have t=64(188)=54𝑡6418854\ell-t=64-(18-8)=54roman_ℓ - italic_t = 64 - ( 18 - 8 ) = 54, =32superscript32\ell^{\prime}=32roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 32, w2tmod2=0modulo𝑤superscript2𝑡superscript2superscript0w\cdot 2^{\ell-t}\mod 2^{\ell^{\prime}}=0italic_w ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ - italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT = 0. We can finally get

x=(x0/2tmod2)+(x1/2tmod2)+(x2/2tmod2)+w=(x0+x1+x2+w)mod2superscript𝑥modulosubscript𝑥0superscript2𝑡superscript2superscriptmodulosubscript𝑥1superscript2𝑡superscript2superscriptmodulosubscript𝑥2superscript2𝑡superscript2superscriptsuperscript𝑤modulosubscriptsuperscript𝑥0subscriptsuperscript𝑥1subscriptsuperscript𝑥2superscript𝑤superscript2superscript\begin{split}x^{\prime}&=(x_{0}/2^{t}\mod 2^{\ell^{\prime}})+(x_{1}/2^{t}\mod 2% ^{\ell^{\prime}})+(x_{2}/2^{t}\mod 2^{\ell^{\prime}})+w^{\prime}\\ &=(x^{\prime}_{0}+x^{\prime}_{1}+x^{\prime}_{2}+w^{\prime})\mod 2^{\ell^{% \prime}}\end{split}start_ROW start_CELL italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_CELL start_CELL = ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ) + ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ) + ( italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ) + italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW (3)

The probabilistic wsuperscript𝑤w^{\prime}italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT occurs at the lowest significant bit, thus merely having a negligible impact of precision 2fsuperscript2superscript𝑓2^{-f^{\prime}}2 start_POSTSUPERSCRIPT - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT. The correctness of Algorithm 4 thus holds. ∎

Case 2: {,f}<{,f}𝑓superscriptsuperscript𝑓\{\ell,f\}<\{\ell^{\prime},f^{\prime}\}{ roman_ℓ , italic_f } < { roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT } (Upcast). The input x𝑥xitalic_x is firstly converted to the larger ring 2subscriptsuperscript2superscript\mathbb{Z}_{2^{\ell^{\prime}}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT using Algorithm 1, followed by a left-shifting of t𝑡titalic_t bits. Regarding Algorithm 1, the masking goes as

(x+r)mod2=x+rw^2modulo𝑥𝑟superscript2𝑥𝑟^𝑤superscript2\begin{split}(x+r)\mod 2^{\ell}=x+r-\hat{w}\cdot 2^{\ell}\end{split}start_ROW start_CELL ( italic_x + italic_r ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT = italic_x + italic_r - over^ start_ARG italic_w end_ARG ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_CELL end_ROW (4)

where w^=(x+r)>?2^𝑤𝑥𝑟?superscript2\hat{w}=(x+r)\overset{?}{>}2^{\ell}over^ start_ARG italic_w end_ARG = ( italic_x + italic_r ) over? start_ARG > end_ARG 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT. The above equation can be transformed into

xmod2=(x+r)mod2r+w^2mod2modulo𝑥superscript2superscriptmodulo𝑥𝑟modulosuperscript2𝑟^𝑤superscript2superscript2superscript\begin{split}x\mod 2^{\ell^{\prime}}=(x+r)\mod 2^{\ell}-r+\hat{w}\cdot 2^{\ell% }\mod 2^{\ell^{\prime}}\end{split}start_ROW start_CELL italic_x roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT = ( italic_x + italic_r ) roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT - italic_r + over^ start_ARG italic_w end_ARG ⋅ 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW (5)

The correctness holds as long as w^^𝑤\hat{w}over^ start_ARG italic_w end_ARG is correct. Recall that we add a bias to ensure that the MSB of x𝑥xitalic_x is 0, x+r𝑥𝑟x+ritalic_x + italic_r wraps around 2superscript22^{\ell}2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT if and only if the MSB of r𝑟ritalic_r (i.e., r1subscript𝑟1r_{\ell-1}italic_r start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT) is 1 and the MSB of y=x+rmod2𝑦modulo𝑥𝑟superscript2y=x+r\mod 2^{\ell}italic_y = italic_x + italic_r roman_mod 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT (i.e., y1subscript𝑦1y_{\ell-1}italic_y start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT) is 0. Therefore, we can correctly compute the wrap as w^=r1¬y1^𝑤subscript𝑟1subscript𝑦1\hat{w}=r_{\ell-1}\wedge\neg y_{\ell-1}over^ start_ARG italic_w end_ARG = italic_r start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT ∧ ¬ italic_y start_POSTSUBSCRIPT roman_ℓ - 1 end_POSTSUBSCRIPT. As to the bias, we have a trick that limits the range of x[22,221]𝑥superscript22superscript221x\in[-2^{\ell-2},2^{\ell-2}-1]italic_x ∈ [ - 2 start_POSTSUPERSCRIPT roman_ℓ - 2 end_POSTSUPERSCRIPT , 2 start_POSTSUPERSCRIPT roman_ℓ - 2 end_POSTSUPERSCRIPT - 1 ] and choose 22superscript222^{\ell-2}2 start_POSTSUPERSCRIPT roman_ℓ - 2 end_POSTSUPERSCRIPT as the bias. As a result, any input x=x+r[0,211]𝑥𝑥𝑟0superscript211x=x+r\in[0,2^{\ell-1}-1]italic_x = italic_x + italic_r ∈ [ 0 , 2 start_POSTSUPERSCRIPT roman_ℓ - 1 end_POSTSUPERSCRIPT - 1 ] is positive. After the conversion, the bias can be conveniently subtracted to eliminate its influence. The following left-shift operation can be regarded as multiplication by a public constant 2tsuperscript2𝑡2^{t}2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT, thus satisfying x/2f=x2t/2f=x/2fsuperscript𝑥superscript2superscript𝑓𝑥superscript2𝑡superscript2superscript𝑓𝑥superscript2𝑓x^{\prime}/2^{f^{\prime}}=x\cdot 2^{t}/2^{f^{\prime}}=x/2^{f}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT / 2 start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT = italic_x ⋅ 2 start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT / 2 start_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT = italic_x / 2 start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT.

Appendix D Security Proof

Theorem D.1.

Based on replicated secret sharing, the protocol 𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍{\mathsf{DownCast}}sansserif_DownCast securely performs the share extension against the semi-honest adversary, with honest-majority assumption.

Proof.

The 𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍{\mathsf{DownCast}}sansserif_DownCast protocol relies on local right-shift and modulo operations, which are performed individually by each party on the shares they hold. No communication of shares between parties is required for these computations. Owing to the nature of the underlying RSS scheme, each party alone cannot reveal the secret data, proving the overall security of the protocol. ∎

Theorem D.2.

Based on replicated secret sharing, the protocol 𝖴𝗉𝖢𝖺𝗌𝗍𝖴𝗉𝖢𝖺𝗌𝗍{\mathsf{UpCast}}sansserif_UpCast securely performs the share extension against the semi-honest adversary in the (𝖯𝖱𝖥,𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍)𝖯𝖱𝖥𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍(\mathsf{PRF},\mathsf{DownCast})( sansserif_PRF , sansserif_DownCast )-hybrid model, with honest-majority assumption.

Proof.

𝖴𝗉𝖢𝖺𝗌𝗍𝖴𝗉𝖢𝖺𝗌𝗍{\mathsf{UpCast}}sansserif_UpCast facilitates the computation by letting P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT locally sample data-independent correlated randomness and offload the subsequent computations to the left two parties, i.e., P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. Recall that we have honest-majority assumption, P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT cannot collude with either P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT or P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. Hence, although P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT knows the plaintext value of randomness, he cannot reveal the input x𝑥xitalic_x without the information of the revealed y𝑦yitalic_y in Step-5. The randomness r𝑟ritalic_r is a random \ellroman_ℓ-bit integer. Its sharing over 2subscriptsuperscript2superscript\mathbb{Z}_{2^{\ell^{\prime}}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT is generated by P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and we let P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT use 𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍\mathsf{DownCast}sansserif_DownCast to obtain its sharing over 2subscriptsuperscript2\mathbb{Z}_{2^{\ell}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT in Step-3. As long as the security of 𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍𝖣𝗈𝗐𝗇𝖢𝖺𝗌𝗍\mathsf{DownCast}sansserif_DownCast holds, the security of Step-3 holds. Subsequently, the mask-and-open operation computes y=x+r𝑦𝑥𝑟y=x+ritalic_y = italic_x + italic_r over 2subscriptsuperscript2\mathbb{Z}_{2^{\ell}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. Since r𝑟ritalic_r is uniformly sampled over 2subscriptsuperscript2\mathbb{Z}_{2^{\ell}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT, and the computation modulos 2superscript22^{\ell}2 start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT, information-theoretical security is guaranteed. Despite the information of y𝑦yitalic_y, both P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT cannot crack x𝑥xitalic_x. Regarding the computation of w^^𝑤\hat{w}over^ start_ARG italic_w end_ARG, it also merely involves local computations, thus leaking no information to help crack x𝑥xitalic_x. Finally, in Step-6, the two random values z0,z2subscript𝑧0subscript𝑧2z_{0},z_{2}italic_z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT over 2subscriptsuperscript2superscript\mathbb{Z}_{2^{\ell^{\prime}}}blackboard_Z start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT roman_ℓ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT generated using 𝖯𝖱𝖥𝖯𝖱𝖥\mathsf{PRF}sansserif_PRF in Step-2 are used to convert the two-out-of-two sharing of xsuperscript𝑥x^{\prime}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT into RSS. P0subscript𝑃0P_{0}italic_P start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and P1subscript𝑃