Efficient estimation of partially linear additive Cox models and variance estimation under shape restrictions

Junjun Lang KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China Yukun Liu Corresponding author: ykliu@sfs.ecnu.edu.cn KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China Jing Qin National Institute of Allergy and Infectious Diseases, National Institutes of Health, USA

Abstract

Shape-restricted inference has exhibited empirical success in various applications with survival data. However, some existing works lack a rigorous theoretical justification and an easy-to-use variance estimator with theoretical guarantees. Motivated by Deng et al. (2023), this paper studies an additive and shape-restricted partially linear Cox model for right-censored data, where each additive component satisfies a specific shape restriction, namely monotonically increasing/decreasing or convex/concave. We systematically investigate the consistency and convergence rates of the shape-restricted maximum partial likelihood estimator (SMPLE) of all the underlying parameters. We further establish the asymptotic normality and semiparametric efficiency of the SMPLE for the linear covariate effect. To estimate the asymptotic variance, we propose a data-splitting variance estimation method that is versatile and broadly applicable. Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE performs comparably to the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and the method of Huang (1999) when the Cox model is violated or the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates based on the SMPLE and its competitors.

Keywords: Shape restriction; Right-censored data; Additive model; Semiparametric efficiency; Variance estimation

1 Introduction

Shape restrictions (such as monotonicity and convexity) arise naturally in numerous practical scenarios. For instance, the growth curves of animals and plants in ecology and the dose-response curves in medicine must be non-decreasing (Chang et al., 2007; Wang and Ghosh, 2012). In economics, utility and production functions are often concave in income and prices (Matzkin, 1991; Varian, 1984), and cost functions are monotone increasing, concave in input prices, and may exhibit non-increasing or non-decreasing returns to scale (Horowitz and Lee, 2017). In genetic epidemiology studies, the cumulative risk of a disease for individuals is monotone (Qin et al., 2014). In reliability analysis, the bathtub curve describing the failure rate is typically convex.

Incorporating shape restrictions into statistical analysis, apart from offering interpretability and the ability to enforce domain-specific constraints, often results in an estimation procedure that is free of tuning parameters, which enhances its efficiency and robustness. Shape-restricted techniques have therefore become increasingly popular tools for statistical inference and learning in various settings over the past decades. A comprehensive review of shape-restricted nonparametric inference can be found in Groeneboom and Jongbloed (2014) and the references therein. Recently, Chen and Samworth (2016) developed an algorithm for the estimation of the generalized additive model in which each of the additive components is linear or subject to a shape restriction. Balabdaoui et al. (2019) considered the estimation of the index parameter in a single-index model with a monotonically increasing link function. Deng and Zhang (2020) studied minimax and adaptation rates in general multiple isotonic regression. Feng et al. (2022) systematically investigated the theoretical properties of the least squares estimator of an S-shaped regression function.

This paper focuses on statistical inference for right-censored survival data. Let $T$ denote the survival time and $(Z,X)\in\mathbb{R}^{p}\times\mathbb{R}^{d}$ denote a $(p+d)\times 1$ vector of covariates. We consider the partially linear Cox model (PLCM) of Sasieni (1992) for modelling the conditional hazard function, i.e.

$$\lambda_{T}(t\mid x,z)=\lambda(t)\exp(\beta^{\top}x+g(z)),\tag{1}$$

where $\lambda(\cdot)$ is the unspecified baseline hazard function, $\beta\in\mathbb{R}^{d}$ is an unknown parameter vector and $g(\cdot):\mathbb{R}^{p}\mapsto\mathbb{R}$ is an unknown function. This model reduces to the renowned Cox proportional hazards model (Cox, 1972, 1975) when the covariate $Z$ is absent, and it becomes the nonparametric Cox model (Sleeper and Harrington, 1990; O’Sullivan, 1993) in the absence of $X$.

Many nonparametric techniques have been developed for the estimation of the PLCM, in particular for the linear covariate effect. Examples include profile partial likelihood together with a kernel technique (Heller, 2001), maximum likelihood estimation with a deep neural network (Zhong et al., 2022), and a kernel machine representation method (Rong et al., 2024). However, these methods are hampered by the curse of dimensionality, lack interpretability for $g(Z)$, or require tuning parameters whose selection is not always straightforward. Alternatively, Huang (1999) proposed to model $g(Z)$ by a generalized additive model (Hastie and Tibshirani, 1986, 1990), which effectively avoids the curse of dimensionality and enforces an additive effect for the covariate $Z$. Specifically,

$$g(Z)=\sum_{j=1}^{p}g_{j}(Z_{(j)}),\tag{2}$$

where for $1\leq j\leq p$, $Z_{(j)}$ is the $j$-th component of $Z$ and $g_{j}$ is an unknown function. Huang (1999) proposed the use of polynomial splines to fit the unknown additive components. This method entails a number of tuning parameters and yields convergence rates that lack a concise form. Furthermore, the spline method does not provide good interpretability for the effect of the additive covariate $Z_{(j)}$.

Our paper is motivated by the work of Deng et al. (2023), which studied a shape-restricted and additive PLCM. Specifically, under models (1) and (2), they assume that each $g_{j}$ is monotonically increasing/decreasing or convex/concave. An active-set optimization algorithm was provided to calculate the shape-restricted maximum likelihood estimator. The shape-restriction strategy makes it possible to use prior knowledge about the effect of each covariate $Z_{(j)}$ on the log conditional hazard function and leads to a tuning-parameter-free estimation procedure. However, they proved only a consistency result and did not provide any asymptotic normality results. Qin et al. (2021) studied a PLCM with a single additive component subject to shape restrictions, but they did not establish any $\sqrt{n}$-consistency result. In addition, in general shape-restricted inference, even if asymptotic normality results can be established, it is generally challenging to construct reasonable estimators for the asymptotic variances with theoretical guarantees (Groeneboom and Hendrickx, 2017).

This paper makes two main contributions to the literature on additive and shape-restricted PLCMs for survival data. The first contribution is to provide strong statistical guarantees for the shape-restricted maximum partial likelihood estimator (SMPLE) and the induced Breslow-type estimator for the baseline cumulative hazard function under the model assumption of Deng et al. (2023). This includes a thorough convergence rate analysis for the estimators of the infinite-dimensional parameters, as well as asymptotic normality and semiparametric efficiency for the estimator of the linear covariate effect. Our second contribution is to offer an easy-to-use estimator for the asymptotic variance of the linear covariate effect estimator. We show that this variance estimation method provides consistent estimators whenever the corresponding asymptotic normality result holds. This method is very flexible and generally applicable, especially in shape-restricted inference, where a theoretical guarantee for a bootstrap variance estimator is generally rather challenging to obtain (Groeneboom and Hendrickx, 2017). Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE performs comparably to the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and the method of Huang (1999) when the Cox model is violated and the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates for the SMPLE and its competitors.

The rest of this paper is organized as follows. Section 2 introduces notation, data, and the shape-restricted maximum partial likelihood estimators (SMPLEs). Section 3 investigates the convergence rates of the SMPLEs for all the unknown parameters, including $\beta$, the unknown additive components, and the baseline cumulative hazard function. Section 4 establishes the asymptotic normality and semiparametric efficiency of the SMPLE for $\beta$. A novel method is also provided to estimate the asymptotic variance of the SMPLE of $\beta$. A simulation study and a real data analysis are presented in Sections 5 and 6, respectively. Section 7 contains concluding remarks. For clarity, all technical proofs are postponed to the supplementary material.

2 Methodology

2.1 Data and model assumptions

Let $T$ and $(X,Z)$ be the survival time and the vector of covariates, respectively, as defined in the introduction. Suppose that given $(X,Z)$, the conditional hazard function of $T$ satisfies model (1) with $g(Z)$ satisfying (2). The survival time $T$ may be right censored by a censoring time $C$, and we only observe $Y=\min(T,C)$. Throughout this paper, we use $\mathbf{1}(A)$ to denote the indicator function of the set $A$ and use a subscript 0 to highlight the true counterpart of a parameter. Let $\Delta=\mathbf{1}(T\leq C)$ be the non-censoring indicator. Given $n$ independent and identically distributed (iid) observations $(X_{i},Z_{i},Y_{i},\Delta_{i})$, $1\leq i\leq n$, from $(X,Z,Y,\Delta)$, we wish to infer $(\beta_{0},g_{0})$ and the baseline cumulative hazard function $\Lambda_{0}(y)=\int_{0}^{y}\lambda_{0}(t)dt$.
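To make the data structure concrete, the following is a minimal simulation sketch of right-censored observations $(X_{i},Z_{i},Y_{i},\Delta_{i})$ from models (1) and (2). Everything specific here is an assumption made for illustration and is not taken from the paper: a constant baseline hazard (`baseline_rate`), uniform covariates, uniform censoring on $[0,\texttt{censor\_max})$, and the function name `simulate_right_censored`. The sketch also does not enforce any centering of the additive components.

```python
import numpy as np

def simulate_right_censored(n, beta, g_components, baseline_rate=1.0,
                            censor_max=3.0, seed=0):
    """Draw (X, Z, Y, Delta) with hazard baseline_rate * exp(beta'x + sum_j g_j(z_j)).

    Illustrative assumptions: constant baseline hazard, X uniform on [-1, 1]^d,
    Z uniform on [0, 1]^p, and censoring time C uniform on [0, censor_max).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    x = rng.uniform(-1.0, 1.0, size=(n, beta.size))
    z = rng.uniform(0.0, 1.0, size=(n, len(g_components)))
    # Additive nonparametric part g(Z) = sum_j g_j(Z_(j)).
    g = sum(g_j(z[:, j]) for j, g_j in enumerate(g_components))
    risk = x @ beta + g
    # With a constant baseline hazard, T is exponential with rate baseline_rate * exp(risk).
    t = rng.exponential(size=n) / (baseline_rate * np.exp(risk))
    c = rng.uniform(0.0, censor_max, size=n)
    y = np.minimum(t, c)               # observed time Y = min(T, C)
    delta = (t <= c).astype(int)       # non-censoring indicator 1(T <= C)
    return x, z, y, delta
```

For example, `simulate_right_censored(200, [0.5], [lambda u: u, lambda u: (u - 0.5) ** 2])` would produce one linear covariate, one increasing component, and one convex component.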

The identifiability issue of models (1) and (2) was investigated by Deng et al. (2023), following which we assume ${\mathbb{E}}\{g_{0,j}(Z_{(j)})\Delta\}=0$, $j=1,2,\cdots,p$, for identifiability. Furthermore, we assume that for $1\leq j\leq p$, $g_{0,j}(\cdot)$ satisfies one of the four shape restrictions: monotone increasing, monotone decreasing, convex and concave, which are encoded as shape types 1, 2, 3 and 4, respectively. For any additive function $g=\sum_{j=1}^{p}g_{j}(z_{(j)})$, we define ${\rm sha}(g)=({\rm sha}(g_{1}),\cdots,{\rm sha}(g_{p}))^{\top}$, where ${\rm sha}(h)\in\{1,2,3,4\}$ denotes the shape type of a univariate function $h$. We always denote $\boldsymbol{k}_{0}={\rm sha}(g_{0})$. Let $\mathcal{X}$ be the support of $X$ and, for simplicity, we assume that the support of $Z_{(j)}$ is $[0,1]$ for $j=1,2,\cdots,p$.
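The shape-type encoding above can be illustrated with a small numerical check. The sketch below tests whether the values of a univariate function on a finite grid are compatible with a given shape type, using first differences for monotonicity and differences of slopes for convexity/concavity. The helper name `satisfies_shape` and the tolerance `tol` are hypothetical and serve illustration only; a check on a finite grid cannot certify the shape of the function between grid points.

```python
import numpy as np

# Shape-type codes as in the text:
# 1 = monotone increasing, 2 = monotone decreasing, 3 = convex, 4 = concave.
def satisfies_shape(z, values, shape_type, tol=1e-10):
    """Return True if the points (z, values) are compatible with shape_type."""
    z = np.asarray(z, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(z)
    z, values = z[order], values[order]
    diffs = np.diff(values)
    if shape_type == 1:
        return bool(np.all(diffs >= -tol))
    if shape_type == 2:
        return bool(np.all(diffs <= tol))
    # Convexity/concavity: successive slopes must be monotone.
    # This assumes the grid points are distinct.
    slopes = diffs / np.diff(z)
    if shape_type == 3:
        return bool(np.all(np.diff(slopes) >= -tol))
    if shape_type == 4:
        return bool(np.all(np.diff(slopes) <= tol))
    raise ValueError("shape_type must be 1, 2, 3 or 4")
```

On the grid `np.linspace(0, 1, 11)`, the function $z\mapsto z^{2}$ passes the check for types 1 and 3 but fails for types 2 and 4.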

2.2 SMPLE

For any $(\beta,g)$, denote $\eta=(\beta,g)$ and $R_{\eta}(U)=X^{\top}\beta+g(Z)$, where $U=(X^{\top},Z^{\top})^{\top}$. The usual log partial likelihood, multiplied by $1/n$, is

$$L_{n}(\eta)=\frac{1}{n}\sum_{i=1}^{n}\Delta_{i}\left[R_{\eta}(U_{i})-\log\left(\sum_{j=1}^{n}\mathbf{1}(Y_{j}\geq Y_{i})\exp(R_{\eta}(U_{j}))\right)\right].$$

We propose to estimate $\eta$ by the shape-restricted maximum partial likelihood estimator (SMPLE),

$$\hat{\eta}:=(\hat{\beta},\hat{g})=\operatorname{argmax}_{\eta\in\mathbb{R}^{d}\times\mathcal{G}_{\boldsymbol{k}_{0}}}L_{n}(\eta),\tag{3}$$

where

$$\mathcal{G}_{\boldsymbol{k}_{0}}=\bigg\{g:[0,1]^{p}\mapsto\mathbb{R}\mid g(z)=\sum_{j=1}^{p}g_{j}(z_{(j)}),~{\rm sha}(g)=\boldsymbol{k}_{0},~\mathbb{E}\left[\Delta g_{j}(Z_{(j)})\right]=0,~1\leq j\leq p\bigg\}$$

is the parameter space of $g$. With the SMPLE in (3), we estimate $\Lambda_{0}(y)$ by the Breslow-type estimator

$$\hat{\Lambda}(y;\hat{\eta})=\frac{1}{n}\sum_{j=1}^{n}\frac{\Delta_{j}}{S_{0,n}(Y_{j},\hat{\eta})}\mathbf{1}(y\geq Y_{j}),\tag{4}$$

where $S_{0,n}(y,\eta)=(1/n)\sum_{i=1}^{n}\{\mathbf{1}(Y_{i}\geq y)\exp(R_{\eta}(U_{i}))\}.$
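As an illustration of the Breslow-type estimator in (4), the sketch below evaluates $\hat{\Lambda}(y;\eta)$ on a grid of time points from the fitted risk scores $R_{\eta}(U_{i})$. The function name `breslow_cumulative_hazard` and its argument layout are hypothetical; the code is a direct $O(n^{2})$ transcription of the formula for small samples, not the authors' implementation.

```python
import numpy as np

def breslow_cumulative_hazard(y_grid, risk_score, y, delta):
    """Evaluate the Breslow-type estimator (4) at each point of y_grid.

    risk_score[i] plays the role of R_eta(U_i); y and delta are the observed
    times and non-censoring indicators.
    """
    risk_score = np.asarray(risk_score, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = y.size
    # at_risk[j, i] = 1(Y_i >= Y_j), so row j describes the risk set at time Y_j.
    at_risk = (y[None, :] >= y[:, None]).astype(float)
    # s0[j] = S_{0,n}(Y_j, eta) = (1/n) * sum_i 1(Y_i >= Y_j) * exp(R_eta(U_i)).
    s0 = at_risk @ np.exp(risk_score) / n
    # Jump of the estimator at Y_j: Delta_j / (n * S_{0,n}(Y_j, eta)).
    jumps = delta / (n * s0)
    y_grid = np.atleast_1d(np.asarray(y_grid, dtype=float))
    # Sum the jumps at all observed times not exceeding each grid point.
    return (y_grid[:, None] >= y[None, :]).astype(float) @ jumps
```

When all risk scores are zero, the estimator reduces to the Nelson-Aalen estimator, which gives a simple sanity check: for three uncensored times 1, 2, 3 the jumps are 1/3, 1/2 and 1.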

The SMPLE defined in (3) can be calculated with the active-set algorithm introduced in Deng et al. (2023). Let $\hat{g}_{j}$ be functions satisfying $\hat{g}(Z)=\sum_{j=1}^{p}\hat{g}_{j}(Z_{(j)})$ for all $Z$. The function $\hat{g}(Z)$ is unique only at the observed $Z_{i}$'s and is therefore typically non-unique at other values of $Z$, which is akin to general shape-restricted regression estimators (Chen and Samworth, 2016). This implies that $\hat{g}_{j}(\cdot)$ is usually non-unique for $1\leq j\leq p$, and the solution set of $\hat{g}_{j}(\cdot)$ always contains a piecewise linear function (Deng et al., 2023). See Figure 3 for an illustration.
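The objective maximized in (3) depends on $\eta$ only through the risk scores $R_{\eta}(U_{i})$ at the observed data points, which is consistent with $\hat{g}$ being determined only at the observed $Z_{i}$'s. The following minimal sketch evaluates $L_{n}(\eta)$ from those scores. The function name `scaled_log_partial_likelihood` is hypothetical, the code uses an $O(n^{2})$ risk-set matrix for readability, and it does not implement the active-set algorithm or the shape constraints.

```python
import numpy as np

def scaled_log_partial_likelihood(risk_score, y, delta):
    """Evaluate L_n(eta), i.e. 1/n times the log partial likelihood.

    risk_score[i] plays the role of R_eta(U_i) = X_i' beta + g(Z_i).
    """
    risk_score = np.asarray(risk_score, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = y.size
    # at_risk[i, j] = 1(Y_j >= Y_i): subject j is still at risk at time Y_i.
    at_risk = (y[None, :] >= y[:, None]).astype(float)
    # log of sum_j 1(Y_j >= Y_i) * exp(R_eta(U_j)) for each i.
    log_risk_sum = np.log(at_risk @ np.exp(risk_score))
    return float(np.sum(delta * (risk_score - log_risk_sum)) / n)
```

Adding a constant to all risk scores leaves the value unchanged, which reflects why the additive components need a centering condition for identifiability.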

3 Rate of convergence

The consistency of the SMPLEs $\hat{\beta}$, $\hat{g}$, and $\hat{g}_{j}$ was established by Deng et al. (2023). In this section, we establish their convergence rates. We make the following assumptions.

Assumption 1.

(i) The observed $\{Y_{i}\}_{i=1}^{n}$ lie in the interval $[0,\tau]$ for some $\tau>0$. (ii) Given $U$, $T$ and $C$ are independent. (iii) $\Lambda_{0}(\tau)<\infty$ and ${\rm pr}(C\geq\tau\mid U)\geq c>0$ almost surely for some constant $c$. (iv) $\mathbb{E}[\Delta X]=0$ and $\mathbb{E}[\Delta]>0$.

Let $\|\cdot\|$ denote the usual Euclidean norm and $\|f(\cdot)\|_{\infty}$ the supremum norm of a real-valued function $f$. For any constant $M>0$, define

$$\mathcal{K}_{M,\boldsymbol{k}_{0}}:=\left\{\eta\mid\eta=(\beta,g),~g\in\mathcal{G}_{\boldsymbol{k}_{0}},~\|\beta\|+\sum_{j=1}^{p}\|g_{j}(\cdot)\|_{\infty}\leq M\right\}.$$

Assumption 2.

The support $\mathcal{X}$ of $X$ is a bounded subset of $\mathbb{R}^{d}$ and there exists a constant $M_{0}>0$ such that $\eta_{0}\in\mathcal{K}_{M_{0},\boldsymbol{k}_{0}}$.

Assumption 3.

There exists a small positive constant $\epsilon$ such that ${\rm pr}(\Delta=1\mid U)>\epsilon$ almost surely with respect to the probability measure of $U$.

Assumption 4.

The joint density of $(Y,Z,\Delta)$ satisfies

$$0<\inf_{(y,z)\in[0,\tau]\times[0,1]^{p}}{\rm pr}(Y=y,Z=z,\Delta=1)\leq\sup_{(y,z)\in[0,\tau]\times[0,1]^{p}}{\rm pr}(Y=y,Z=z,\Delta=1)<\infty.$$

Assumption 5.

When ${\rm sha}(g_{0,j})\in\{3,4\}$, the density function of $Z_{(j)}$ with respect to the Lebesgue measure has uniform upper and lower bounds on $[0,1]$.

Assumptions 1–2 are standard in the theoretical analysis of the traditional Cox model and its variants (Huang, 1999; Zhong et al., 2022). Assumption 3 ensures that the probability of being uncensored is positive regardless of the covariate values, and it is used to establish the convergence rate results in Theorem 1. Assumption 4 is used in the calculation of the semiparametric efficiency lower bound (Huang, 1999). In Assumption 5, the upper bound requirement is used to calculate some entropy results needed in the proof of Proposition 1, and the lower bound requirement guarantees that the errors of piecewise linear approximations to the convex/concave additive components are small enough in the proof of Theorem 3.

The proposition below establishes the consistency of $R_{\hat{\eta}}(\cdot)$ as an estimator of $R_{\eta_{0}}(\cdot)$, which roughly implies the consistency of $\hat{\eta}=(\hat{\beta},\hat{g})$. We could have proved the consistency of $\hat{\beta}$ and of each $\hat{g}_{j}$ separately. However, these results are not needed in the proofs of the subsequent convergence rate results, given the consistency of $R_{\hat{\eta}}(\cdot)$.

Proposition 1.

Suppose that models (1) and (2) and Assumptions 1, 2 and 5 are satisfied. As $n\rightarrow\infty$ , we have

$$\|R_{\hat{\eta}}(\cdot)-R_{\eta_{0}}(\cdot)\|_{\infty}=o_{p}(1).\tag{5}$$

Define $d^{2}(\eta,\eta_{0})=\mathbb{E}_{U}\left\{(R_{\eta}(U)-R_{\eta_{0}}(U))^{2}\right\},$ where $\mathbb{E}_{U}$ denotes the expectation with respect to $U$. Let $\|\cdot\|_{L_{2}}$ denote the $L_{2}(P)$ norm and $\rho(\boldsymbol{k}_{0})=0.5+0.5\cdot\mathbf{1}(\cup_{i=1}^{p}({\rm sha}(g_{0,i})\in\{1,2\}))$, so that $\rho(\boldsymbol{k}_{0})=1$ if at least one additive component is monotone and $\rho(\boldsymbol{k}_{0})=0.5$ otherwise. One of our main results establishes the convergence rate of the SMPLE $\hat{\eta}$.

Theorem 1.

Assume the same conditions as in Proposition 1. As $n\rightarrow\infty$, we have

$$d(\hat{\eta},\eta_{0})=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).$$

Furthermore, if Assumptions 3–4 are satisfied and $I(\beta_{0})$ (defined in (7)) is non-singular, then for $1\leq j\leq p$,

$$\|\hat{\beta}-\beta_{0}\|=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right),\quad\|\hat{g}_{j}(Z_{(j)})-g_{0,j}(Z_{(j)})\|_{L_{2}}=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).$$

According to Theorem 1, the rates of convergence of all $\hat{g}_{j}$ are $O_{p}(n^{-2/5})$ if none of the additive components of $g$ is monotonic. Conversely, if at least one additive component of $g$ is monotonic, then their convergence rates all slow down to $O_{p}(n^{-1/3})$. An explanation for this finding is that the complexity of the class of bounded and monotonic functions is much larger than that of the class of bounded and convex (or concave) functions. These convergence rates do not depend on the covariate dimension and have a much simpler form than those in Huang (1999) and Zhong et al. (2022). Theorem 1 also establishes the convergence rate of $\hat{\beta}$, although it is sub-optimal.
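The dependence of the rate on the shape vector can be made explicit with a few lines of code. The sketch below computes $\rho(\boldsymbol{k}_{0})$ and the exponent $1/(2+\rho(\boldsymbol{k}_{0}))$ appearing in Theorem 1 from a list of shape-type codes. The function names `rho` and `rate_exponent` are hypothetical and introduced only for this illustration.

```python
def rho(shape_types):
    """rho(k_0): 1.0 if any component is monotone (types 1 or 2), else 0.5."""
    if any(k not in (1, 2, 3, 4) for k in shape_types):
        raise ValueError("shape types must be 1, 2, 3 or 4")
    return 1.0 if any(k in (1, 2) for k in shape_types) else 0.5

def rate_exponent(shape_types):
    """Exponent a such that the rate in Theorem 1 is O_p(n^{-a})."""
    return 1.0 / (2.0 + rho(shape_types))
```

With only convex or concave components, for example `rate_exponent([3, 4])`, the exponent is $1/2.5=2/5$; as soon as one component is monotone, for example `rate_exponent([1, 3])`, it drops to $1/3$.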

With Theorem 1, we are able to establish the uniform rate of convergence for the estimator $\hat{\Lambda}(y;\hat{\eta})$ in (4) of the baseline cumulative hazard function $\Lambda_{0}(y)$. It turns out that $\hat{\Lambda}(y;\hat{\eta})$ has the same convergence rate as $\hat{\eta}$, although their convergence rates are quantified by different distances.

Theorem 2.

Assume the same conditions as in Proposition 1. As $n\rightarrow\infty$ , it holds that

$$\sup_{y\in[0,\tau]}\left|\hat{\Lambda}(y;\hat{\eta})-\Lambda_{0}(y)\right|=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).$$

Remark 1.

In practice, one may impose a combination of monotonicity and convexity/concavity constraints on the additive components according to domain knowledge. See Chen and Samworth (2016), Kuchibhotla et al. (2023), and Deng et al. (2023) for further motivation on additional shape constraints. Proposition 1 and Theorems 1–4 still hold when model (2) incorporates additive components that satisfy both monotonicity and convexity/concavity restrictions. An intuitive explanation for this result is that additional constraints on the additive components shrink the parameter space $\mathcal{G}_{\boldsymbol{k}_{0}}$, which can only maintain or improve the convergence rates of the SMPLE.

4 Asymptotic normality and efficiency

Building on the convergence rate results in the previous section, we further show in this section that the SMPLE $\hat{\beta}$ in (3) of the linear covariate effect is asymptotically normal and semiparametrically efficient, in the sense that its asymptotic variance achieves the semiparametric efficiency lower bound (Bickel et al., 1993), or the information bound, for estimating $\beta_{0}$ under models (1) and (2).

We begin by presenting the information bound for $\beta_{0}$. Recall that $U=(X^{\top},Z^{\top})^{\top}$, and define

$$M(y)\equiv M(y\mid Y,\Delta,U)=\Delta\mathbf{1}(Y\leq y)-\int_{0}^{y}\mathbf{1}(Y\geq t)\exp\{R_{\eta_{0}}(U)\}d\Lambda_{0}(t),$$

which is a counting process martingale associated with the Cox model. The log-likelihood of model (1) based on one observation $(X,Z,Y,\Delta)$ is (up to an additive constant)

$$\ell(\beta,g,\Lambda)=\Delta\log\lambda(Y)+\Delta\{X^{\top}\beta+g(Z)\}-\Lambda(Y)\exp\{X^{\top}\beta+g(Z)\}.\tag{6}$$

Consider a parametric smooth submodel $\{\lambda_{(\nu)}:\nu\in\mathbb{R}\}$ and $\{g_{j,(\nu)}:\nu\in\mathbb{R}\}$, $1\leq j\leq p$, with $\lambda_{(0)}=\lambda_{0}$ and $g_{j,(0)}=g_{0,j}$. Define $L_{2}(P_{Y})$ to be the set of functions $a(\cdot)$ satisfying ${\mathbb{E}}\{\Delta a^{2}(Y)\}<\infty$ and $a(y)=\partial\log\lambda_{(\nu)}(y)/\partial\nu|_{\nu=0}$ for some submodel. Similarly, for $1\leq j\leq p$, define $L_{2}^{0}(P_{Z_{(j)}})$ to be the set of functions $h_{j}$ satisfying ${\mathbb{E}}\{\Delta h_{j}(Z_{(j)})\}=0$, ${\mathbb{E}}\{\Delta h_{j}^{2}(Z_{(j)})\}<\infty$ and $h_{j}(z_{(j)})=\partial g_{j,(\nu)}(z_{(j)})/\partial\nu|_{\nu=0}$ for some submodel.
The following lemma, which is Theorem 3.1 of Huang (1999), gives the information bound for $\beta_{0}$.

Lemma 1 (Theorem 3.1 of Huang (1999)).

Suppose that models (1) and (2) and Assumptions 1–4 are satisfied. Let $((\boldsymbol{a}^{\star})^{\top},(\boldsymbol{h}_{1}^{\star})^{\top},\cdots,(\boldsymbol{h}_{p}^{\star})^{\top})^{\top}$ be the unique vector-valued function in $L_{2}(P_{Y})^{d}\times L_{2}^{0}(P_{Z_{(1)}})^{d}\times\cdots\times L_{2}^{0}(P_{Z_{(p)}})^{d}$ that minimizes

$${\mathbb{E}}\left\{\Delta\|X-\boldsymbol{a}(Y)-\boldsymbol{h}_{1}(Z_{(1)})-\cdots-\boldsymbol{h}_{p}(Z_{(p)})\|^{2}\right\}.$$

(1) The efficient score for estimation of $\beta_{0}$ is

$$\ell_{\beta_{0}}^{\star}(Y,\Delta,U)=\int_{0}^{\tau}\{X-\boldsymbol{a}^{\star}(y)-\boldsymbol{h}^{\star}(Z)\}\,dM(y),$$

where $\boldsymbol{h}^{\star}(Z)=\sum_{j=1}^{p}\boldsymbol{h}_{j}^{\star}(Z_{(j)})$ and $\boldsymbol{a}^{\star}(y)={\mathbb{E}}\left\{X-\boldsymbol{h}^{\star}(Z)\mid Y=y,\Delta=1\right\}$.

(2) The information bound for estimation of $\beta_{0}$ is

$$I(\beta_{0})={\mathbb{E}}\left[\left\{\ell_{\beta_{0}}^{\star}(Y,\Delta,U)\right\}^{\otimes 2}\right]={\mathbb{E}}\left[\Delta\left\{X-\boldsymbol{a}^{\star}(Y)-\boldsymbol{h}^{\star}(Z)\right\}^{\otimes 2}\right], \tag{7}$$

where $A^{\otimes 2}=AA^{\top}$ for any vector or matrix $A$.
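
The second equality in (7) can be read off from a standard counting-process identity. The following is an informal sketch, not part of the lemma. It assumes that $\Lambda_{0}$ is continuous and that observed failure times lie in $[0,\tau]$, and it introduces the shorthand $N(y)=\Delta\mathbf{1}(Y\leq y)$ and $f(y)=X-\boldsymbol{a}^{\star}(y)-\boldsymbol{h}^{\star}(Z)$, neither of which is notation from the source. Since $f$ is predictable and the predictable variation of $M$ is the compensator of $N$,

$${\mathbb{E}}\left[\left\{\int_{0}^{\tau}f(y)\,dM(y)\right\}^{\otimes 2}\right]={\mathbb{E}}\left[\int_{0}^{\tau}f(y)^{\otimes 2}\,\mathbf{1}(Y\geq y)\exp\{R_{\eta_{0}}(U)\}\,d\Lambda_{0}(y)\right]={\mathbb{E}}\left[\int_{0}^{\tau}f(y)^{\otimes 2}\,dN(y)\right]={\mathbb{E}}\left[\Delta f(Y)^{\otimes 2}\right].$$

The middle step uses the fact that $N$ minus its compensator is the mean-zero martingale $M$, and the last step uses the fact that $N$ has a single jump of size $\Delta$ at $Y$.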

Additional assumptions are needed to obtain the asymptotic normality and efficiency of $\hat{\beta}$. Denote $\boldsymbol{h}_{j}^{\star}=(\boldsymbol{h}_{j,1}^{\star},\cdots,\boldsymbol{h}_{j,d}^{\star})^{\top}$ for $1\leq j\leq p$.

Assumption 6.

When ${\rm sha}(g_{0,j})\in\{1,2\}$, there exist constants $\tilde{C}_{1}>0$ and $\tilde{C}_{2}>0$ such that $\|\boldsymbol{h}^{\star}_{j}(x_{1})-\boldsymbol{h}^{\star}_{j}(y_{1})\|\leq\tilde{C}_{1}|x_{1}-y_{1}|$ and $|g_{0,j}^{-1}(x_{2})-g_{0,j}^{-1}(y_{2})|\leq\tilde{C}_{2}|x_{2}-y_{2}|$ for all $x_{1},y_{1}\in[0,1]$ and $x_{2},y_{2}\in[-M_{0},M_{0}]$.

Assumption 7.

When ${\rm sha}(g_{0,j})\in\{3,4\}$, the function $\boldsymbol{h}_{j,\tilde{i}}^{\star}(\cdot)$ is $\bar{p}$-Hölder continuous for all $1\leq\tilde{i}\leq d$ and some $\bar{p}\in(\rho(\boldsymbol{k}_{0}),\;2]$.

Assumption 8.

When ${\rm sha}(g_{0,j})\in\{3\}$, the function $g_{0,j}(t)-\tilde{C}t^{2}$ is convex for some constant $\tilde{C}>0$; when ${\rm sha}(g_{0,j})\in\{4\}$, the function $g_{0,j}(t)+\tilde{C}t^{2}$ is concave for some constant $\tilde{C}>0$.

Assumption 6 is used to control the fluctuation of the score function corresponding to the monotone components in the direction of the projection defined in Lemma 1; Huang (2002) and Cheng (2009) adopted a similar assumption. Assumptions 7–8, which are analogues of Assumption B1 of Kuchibhotla et al. (2023), are used for approximations of the convex (concave) components.

Theorem 3.

Suppose that models (1) and (2) and Assumptions 1–8 are satisfied. Further assume that $I(\beta_{0})$ is non-singular. Then as $n\rightarrow\infty$, $\sqrt{n}(\hat{\beta}-\beta_{0})\stackrel{d}{\longrightarrow}N(0,I^{-1}(\beta_{0}))$. This implies that the asymptotic variance of $\hat{\beta}$ achieves the information bound, so that $\hat{\beta}$ is asymptotically semiparametric efficient among all regular estimators of $\beta_{0}$.

By Theorem 3, $\hat{\beta}$ is asymptotically normal with asymptotic variance $I^{-1}(\beta_{0})$. Inference about $\beta_{0}$ based on this theorem requires a consistent estimator of $I^{-1}(\beta_{0})$. However, $I^{-1}(\beta_{0})$, or equivalently $I(\beta_{0})$, has a rather complicated form, which makes its plug-in estimator difficult to use. To address this difficulty, we propose a novel data-splitting method to estimate $I^{-1}(\beta_{0})$.

4.1 Data-splitting variance estimation and inference on $\beta$

We introduce the proposed data-splitting variance estimation method in a general setting, as it is broadly applicable. Let $\theta_{0}\in\mathbb{R}^{d}$ be a functional of a statistical population $\mathscr{P}$ and $\hat{\theta}_{n}$ be an estimator of $\theta_{0}$ based on i.i.d. samples $\{O_{i}\}_{i=1}^{n}$ from $\mathscr{P}$. Suppose that $\sqrt{n}(\hat{\theta}_{n}-\theta_{0})\overset{d}{\longrightarrow}N(0,\Sigma)$, where $\Sigma$ is a positive semidefinite matrix. Let $k_{n}<n$ and $k_{n}\rightarrow\infty$. We partition the sample into $\lfloor k_{n}\rfloor$ subsamples, each of which has $m_{n}=\lfloor n/k_{n}\rfloor$ observations, and let $\hat{\theta}_{ni}$ denote the estimator of $\theta_{0}$ based on the $i$-th subsample, $1\leq i\leq\lfloor k_{n}\rfloor$. Our data-splitting estimator of the asymptotic variance $\Sigma$ is defined as

$$\hat{\Sigma}=\frac{m_{n}}{\lfloor k_{n}\rfloor}\sum_{i=1}^{\lfloor k_{n}\rfloor}(\hat{\theta}_{ni}-\bar{\hat{\theta}}_{n})(\hat{\theta}_{ni}-\bar{\hat{\theta}}_{n})^{\top}, \tag{8}$$

where $\bar{\hat{\theta}}_{n}$ is the sample mean of $\hat{\theta}_{n1},\ldots,\hat{\theta}_{n\lfloor k_{n}\rfloor}$. For greater stability, we may repeat the above splitting and estimation procedure many times and take the average of the resulting variance estimates as the final variance estimate.
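
The estimator in (8), together with the repeated-splitting average, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `estimator` callback (any routine that maps a subsample to a length-$d$ vector), and the choice to form subsamples from a random permutation of the indices are all hypothetical. The number of subsamples follows the choice $k_{n}=n^{\tilde{\alpha}}$.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Matrix = List[List[float]]


def data_splitting_variance(
    sample: Sequence,
    estimator: Callable[[list], Sequence[float]],
    alpha_tilde: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """One evaluation of (8) for a generic estimator (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    n = len(sample)
    k_n = n ** alpha_tilde           # k_n = n^alpha_tilde
    k = math.floor(k_n)              # number of subsamples, floor(k_n)
    m = math.floor(n / k_n)          # subsample size m_n = floor(n / k_n)
    if k < 2 or m < 1:
        raise ValueError("need at least two non-empty subsamples")

    # Assumption: subsamples come from a random permutation of the indices.
    idx = list(range(n))
    rng.shuffle(idx)
    estimates = [
        list(estimator([sample[j] for j in idx[i * m:(i + 1) * m]]))
        for i in range(k)
    ]

    d = len(estimates[0])
    mean = [sum(e[a] for e in estimates) / k for a in range(d)]
    sigma = [[0.0] * d for _ in range(d)]
    for e in estimates:
        for a in range(d):
            for b in range(d):
                sigma[a][b] += (e[a] - mean[a]) * (e[b] - mean[b])
    # Scale the sum of outer products by m_n / floor(k_n), as in (8).
    return [[m / k * v for v in row] for row in sigma]


def repeated_data_splitting_variance(
    sample: Sequence,
    estimator: Callable[[list], Sequence[float]],
    alpha_tilde: float = 0.5,
    n_repeats: int = 10,
    seed: int = 0,
) -> Matrix:
    """Average (8) over several random splits for a more stable estimate."""
    rng = random.Random(seed)
    runs = [
        data_splitting_variance(sample, estimator, alpha_tilde, rng)
        for _ in range(n_repeats)
    ]
    d = len(runs[0])
    return [
        [sum(r[a][b] for r in runs) / n_repeats for b in range(d)]
        for a in range(d)
    ]
```

As a sanity check, passing the sample mean as `estimator` (for example `lambda s: [sum(s) / len(s)]`) should return a $1\times 1$ matrix close to the population variance when $n$ is large, since that is the asymptotic variance of the sample mean.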

Theorem 4.

Let $\hat{\theta}_{n}$ be an estimator of $\theta_{0}$ based on i.i.d. samples $\{O_{i}\}_{i=1}^{n}$ such that $\sqrt{n}(\hat{\theta}_{n}-\theta_{0})\overset{d}{\longrightarrow}N(0,\Sigma)$. Let $k_{n}=n^{\tilde{\alpha}}$ for some $\tilde{\alpha}\in(0,1)$, $m_{n}=\lfloor n^{1-\tilde{\alpha}}\rfloor$, and let $\hat{\Sigma}$ be the variance estimator in (8). Then $\hat{\Sigma}=\Sigma+o_{p}(1)$ as $n\rightarrow\infty$.

Theorem 4 guarantees the validity of the data-splitting variance estimator. The method is easy to use and flexible enough for general-purpose application. Alternatively, we may construct a variance estimator by the bootstrap. However, the consistency of a bootstrap variance estimator often requires stronger conditions (Groeneboom and Hendrickx, 2017) and is often very difficult to prove, especially under shape restrictions.
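
An informal way to see why (8) targets $\Sigma$ is the following. It is offered as a heuristic reading, not as the argument behind Theorem 4. The subsamples are disjoint, so $\hat{\theta}_{n1},\ldots,\hat{\theta}_{n\lfloor k_{n}\rfloor}$ are i.i.d., and since $m_{n}\rightarrow\infty$ each of them satisfies $\sqrt{m_{n}}(\hat{\theta}_{ni}-\theta_{0})\overset{d}{\longrightarrow}N(0,\Sigma)$. Writing $W_{ni}=\sqrt{m_{n}}(\hat{\theta}_{ni}-\theta_{0})$ and $\bar{W}_{n}=\lfloor k_{n}\rfloor^{-1}\sum_{i}W_{ni}$ (shorthand introduced here, not notation from the source), the estimator (8) can be rewritten as

$$\hat{\Sigma}=\frac{1}{\lfloor k_{n}\rfloor}\sum_{i=1}^{\lfloor k_{n}\rfloor}(W_{ni}-\bar{W}_{n})(W_{ni}-\bar{W}_{n})^{\top},$$

which is the sample covariance matrix of $\lfloor k_{n}\rfloor\rightarrow\infty$ i.i.d. vectors whose common distribution is approximately $N(0,\Sigma)$. The heuristic glosses over the fact that convergence in distribution alone does not imply convergence of second moments.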

As a specific application, we apply the data-splitting method to construct an estimator of the asymptotic variance $I^{-1}(\beta_{0})$ of $\hat{\beta}$. Denote the resulting estimator by $\widehat{I^{-1}(\beta_{0})}$, which is consistent for $I^{-1}(\beta_{0})$ by Theorem 4. Therefore $n(\hat{\beta}-\beta_{0})^{\mathrm{\scriptscriptstyle\top}}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta_{0})$ asymptotically follows $\chi^{2}(d)$, the chi-square distribution with $d$ degrees of freedom. For $\alpha\in(0,1)$, let $\chi^{2}_{1-\alpha}(d)$ be the $(1-\alpha)$ quantile of $\chi^{2}(d)$. A $(1-\alpha)$-level confidence region for $\beta_{0}$ can be constructed as

$$\{\beta:n(\hat{\beta}-\beta)^{\mathrm{\scriptscriptstyle\top}}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta)\leq\chi^{2}_{1-\alpha}(d)\}. \tag{9}$$

For testing the hypothesis $H_{0}:\beta=\beta_{0}$ against $H_{1}:\beta\neq\beta_{0}$, we propose to reject $H_{0}$ at the significance level $\alpha$ if

$$n(\hat{\beta}-\beta_{0})^{\mathrm{\scriptscriptstyle\top}}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta_{0})>\chi^{2}_{1-\alpha}(d). \tag{10}$$

By Theorems 3 and 4, the confidence region (9) has asymptotically correct coverage probability $1-\alpha$, and the test defined by the rejection region (10) has asymptotically correct type I error $\alpha$.
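
For scalar $\beta$ ($d=1$), the region (9) reduces to an interval and the test (10) to a two-sided Wald test, because $\chi^{2}_{1-\alpha}(1)$ equals the square of the standard normal $(1-\alpha/2)$ quantile. The sketch below uses only the Python standard library. The function names and the argument `avar_hat`, which stands for $\widehat{I^{-1}(\beta_{0})}$, are hypothetical. For $d>1$ a chi-square quantile function would be needed instead, for example `scipy.stats.chi2.ppf`.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_interval(
    beta_hat: float, avar_hat: float, n: int, alpha: float = 0.05
) -> Tuple[float, float]:
    """Interval form of (9) when d = 1: beta_hat -/+ z * sqrt(avar_hat / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(avar_hat / n)
    return beta_hat - half_width, beta_hat + half_width


def wald_test(
    beta_hat: float, beta_null: float, avar_hat: float, n: int, alpha: float = 0.05
) -> Tuple[float, bool]:
    """Test statistic and rejection decision of (10) when d = 1."""
    stat = n * (beta_hat - beta_null) ** 2 / avar_hat
    # The (1 - alpha) quantile of chi-square(1) is the squared normal quantile.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    return stat, stat > critical
```

Here `avar_hat` is on the scale of $\sqrt{n}(\hat{\beta}-\beta_{0})$, so it is divided by `n` before taking the square root.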

5 Simulations

In this section, we conduct simulations to assess the finite-sample performance of the proposed SMPLE $\hat{\beta}$ and the proposed confidence region (9) for the linear covariate effect $\beta$. To generate data, we take $X$ and $Z$ to be two scalar random variables that are i.i.d. from the standard normal distribution, and take the conditional distribution of $T$ given $(X,Z)$ to be an exponential distribution with mean $1/\exp(\beta_{0}X+g_{0}(Z))$. Therefore the conditional hazard function of $T$ given $(X,Z)$ is $\exp(\beta_{0}X+g_{0}(Z))$. We set the censoring time $C$ to follow a uniform distribution on $(0,c)$. We consider three scenarios for $g_{0}$: (I) $g_{0}(z)=-2z$, (II) $g_{0}(z)=-|z|^{3}/2$ and (III) $g_{0}(z)=2|z|$. We set $\beta_{0}=-2$ and consider two choices for $c$, $5$ and $10$, and two sample sizes, $600$ and $800$. A larger $c$ results in a smaller censoring proportion.

When implementing our SMPLE, we set $\boldsymbol{k}_{0}$ to be $3,4,3$ in Scenarios I–III, respectively. For comparison, we also consider the traditional Cox regression estimator (TCR) of $\beta$ and the partial likelihood estimator of $\beta$ based on polynomial splines of order $r$ under the partially linear additive model (SPLA-$r$; Huang, 1999), where $r$ is $2$, $3$ or $4$. We generate 1000 random samples to evaluate the performance of the above five estimation methods.
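
The data-generating mechanism just described can be sketched as follows. This is an illustration using only the Python standard library, not the authors' simulation code. The function name `simulate_dataset` is hypothetical, and the sketch assumes the usual right-censoring convention $Y=\min(T,C)$ and $\Delta=\mathbf{1}(T\leq C)$.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

# The three choices of g_0 listed in the text.
G0: Dict[str, Callable[[float], float]] = {
    "I": lambda z: -2.0 * z,
    "II": lambda z: -abs(z) ** 3 / 2.0,
    "III": lambda z: 2.0 * abs(z),
}


def simulate_dataset(
    n: int,
    c: float,
    scenario: str,
    beta0: float = -2.0,
    seed: Optional[int] = None,
) -> List[Tuple[float, float, float, int]]:
    """Return n tuples (X, Z, Y, Delta) from the stated design."""
    rng = random.Random(seed)
    g0 = G0[scenario]
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        rate = math.exp(beta0 * x + g0(z))   # conditional hazard of T given (X, Z)
        t = rng.expovariate(rate)            # exponential with mean 1 / rate
        cens = rng.uniform(0.0, c)           # censoring time C ~ Uniform(0, c)
        # Assumed convention: observe Y = min(T, C) and Delta = 1(T <= C).
        rows.append((x, z, min(t, cens), int(t <= cens)))
    return rows
```

Averaging the last component of the returned tuples gives the proportion of uncensored observations, which can be used to check how the censoring proportion changes with $c$.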

5.1 Point estimation

Table 1 presents $100$ times the simulated root mean square errors (RMSEs) and absolute biases (BIASs) of these estimators. The model assumptions of SMPLE and SPLA-$r$ hold in all three scenarios, whereas the standard Cox model is correctly specified only in Scenario I. As expected, in Scenario I, TCR uniformly has the smallest RMSEs and BIASs among the five estimators, although the SMPLE and SPLA-$r$ estimators have almost the same RMSEs and BIASs as TCR. When the standard Cox model is misspecified, as in Scenarios II and III, TCR has much larger RMSEs and BIASs than SMPLE and SPLA-$r$; that is, SMPLE and SPLA-$r$ have a clear advantage over TCR. Compared with SPLA-$r$, SMPLE is comparable or slightly inferior in Scenarios I and II, but has uniformly smaller RMSEs and BIASs in Scenario III, markedly so relative to SPLA-$2$ and SPLA-$3$. A possible explanation for this phenomenon is that, although continuous in all three scenarios, the hazard function is smooth in Scenarios I and II but nonsmooth in Scenario III. As the sample size $n$ or the constant $c$ increases, more observations are uncensored, and consequently the estimators generally perform better when the underlying model assumption is correct. An exception is TCR in Scenarios II and III, whose RMSEs and BIASs grow as $n$ or $c$ increases.

Figure 1 displays boxplots of the SMPLE and SPLA-$r$ ($r=2,3,4$) estimates of $\beta$ (minus $\beta_{0}$) when the sample size is $n=800$. TCR is excluded because it has extremely large RMSEs and BIASs in Scenarios II and III. SMPLE and the three SPLA-$r$ estimators perform almost identically in Scenario I. In Scenario II, where the true hazard function is smooth, the four methods have similar variances, but their BIASs decrease progressively from SMPLE to SPLA-$2$, SPLA-$3$ and SPLA-$4$. In Scenario III, where the true hazard function is nonsmooth, the four methods again have similar variances; however, the three SPLA-$r$ estimators have much larger BIASs than SMPLE, whose BIASs are negligible.
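
The two summaries reported in Table 1 can be computed from the Monte Carlo replicates as sketched below. The source does not spell out the formula for BIAS, so the sketch assumes it is the average of $|\hat{\beta}_{b}-\beta_{0}|$ over the replicates; that reading, like the function name, is an assumption.

```python
import math
from typing import Sequence, Tuple


def rmse_and_abs_bias(
    estimates: Sequence[float], beta0: float, scale: float = 100.0
) -> Tuple[float, float]:
    """Return (scale * RMSE, scale * mean absolute deviation from beta0)."""
    errors = [b - beta0 for b in estimates]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Assumed definition of BIAS: average absolute deviation from beta0.
    abs_bias = sum(abs(e) for e in errors) / len(errors)
    return scale * rmse, scale * abs_bias
```

The default `scale=100.0` mirrors the multiplication by 100 used in the table.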

Table 1: Simulated root mean square errors and absolute biases (in parentheses) of the five point estimators under comparison. All results have been multiplied by 100.

Scenario	$c$	$n$	SMPLE	TCR	SPLA-$2$	SPLA-$3$	SPLA-$4$
I	5	600	$\underset{(7.29)}{9.25}$	$\underset{(7.17)}{9.13}$	$\underset{(7.21)}{9.17}$	$\underset{(7.27)}{9.22}$	$\underset{(7.32)}{9.29}$
	5	800	$\underset{(6.36)}{8.04}$	$\underset{(6.30)}{7.95}$	$\underset{(6.33)}{7.99}$	$\underset{(6.35)}{8.01}$	$\underset{(6.36)}{8.03}$
	10	600	$\underset{(7.19)}{8.96}$	$\underset{(7.07)}{8.86}$	$\underset{(7.11)}{8.90}$	$\underset{(7.17)}{8.96}$	$\underset{(7.21)}{9.01}$
	10	800	$\underset{(6.07)}{7.65}$	$\underset{(6.03)}{7.57}$	$\underset{(6.05)}{7.61}$	$\underset{(6.05)}{7.62}$	$\underset{(6.06)}{7.64}$
II	5	600	$\underset{(8.19)}{10.29}$	$\underset{(67.57)}{68.89}$	$\underset{(7.65)}{9.46}$	$\underset{(7.66)}{9.46}$	$\underset{(7.78)}{9.70}$
	5	800	$\underset{(6.68)}{8.49}$	$\underset{(69.02)}{70.06}$	$\underset{(6.53)}{8.17}$	$\underset{(6.44)}{8.10}$	$\underset{(6.36)}{8.09}$
	10	600	$\underset{(7.86)}{9.89}$	$\underset{(77.53)}{78.58}$	$\underset{(7.42)}{9.21}$	$\underset{(7.40)}{9.18}$	$\underset{(7.44)}{9.30}$
	10	800	$\underset{(6.46)}{8.20}$	$\underset{(78.79)}{79.62}$	$\underset{(6.48)}{8.04}$	$\underset{(6.38)}{7.93}$	$\underset{(6.15)}{7.78}$
III	5	600	$\underset{(6.87)}{8.54}$	$\underset{(63.15)}{63.46}$	$\underset{(18.83)}{20.54}$	$\underset{(17.63)}{19.39}$	$\underset{(7.95)}{9.74}$
	5	800	$\underset{(5.74)}{7.28}$	$\underset{(63.35)}{63.61}$	$\underset{(19.60)}{20.92}$	$\underset{(18.62)}{19.98}$	$\underset{(7.74)}{9.28}$
	10	600	$\underset{(6.78)}{8.43}$	$\underset{(63.40)}{63.70}$	$\underset{(19.40)}{21.05}$	$\underset{(18.21)}{19.91}$	$\underset{(8.04)}{9.86}$
	10	800	$\underset{(5.66)}{7.17}$	$\underset{(63.47)}{63.71}$	$\underset{(20.01)}{21.25}$	$\underset{(19.04)}{20.31}$	$\underset{(7.78)}{9.34}$

Figure 1: Boxplots of the SMPLE, SPLA-$2$, SPLA-$3$ and SPLA-$4$ estimates (minus $\beta_{0}$) of $\beta$ when the sample size is $n=800$.

Table 2: Simulated coverage probabilities and average interval lengths (in parentheses) of the confidence intervals at the 95% confidence level based on the five estimators under comparison.

Scenario	$c$	$n$	SMPLE	TCR	SPLA-$2$	SPLA-$3$	SPLA-$4$
I	5	600	$\underset{(0.412)}{0.947}$	$\underset{(0.384)}{0.930}$	$\underset{(0.396)}{0.940}$	$\underset{(0.409)}{0.944}$	$\underset{(0.422)}{0.948}$
	5	800	$\underset{(0.343)}{0.941}$	$\underset{(0.324)}{0.937}$	$\underset{(0.332)}{0.937}$	$\underset{(0.340)}{0.941}$	$\underset{(0.348)}{0.949}$
	10	600	$\underset{(0.388)}{0.940}$	$\underset{(0.364)}{0.930}$	$\underset{(0.374)}{0.937}$	$\underset{(0.385)}{0.940}$	$\underset{(0.395)}{0.942}$
	10	800	$\underset{(0.327)}{0.945}$	$\underset{(0.310)}{0.935}$	$\underset{(0.317)}{0.941}$