Abstract: |
We revisit the classic semiparametric problem of inference on a
low-dimensional parameter θ_0 in the presence of high-dimensional nuisance
parameters η_0. We depart from the classical setting by allowing η_0 to be
so high-dimensional that the traditional assumptions, such as Donsker
properties, that limit the complexity of the parameter space for this object break
down. To estimate η_0, we consider the use of statistical or machine learning
(ML) methods, which are particularly well-suited to estimation in modern, very
high-dimensional cases. ML methods perform well in practice by employing
regularization to reduce variance and by trading off regularization bias
against overfitting. However, both regularization bias and overfitting in estimating η_0
cause a heavy bias in estimators of θ_0 that are obtained by naively plugging
ML estimators of η_0 into estimating equations for θ_0. This bias results in
the naive estimator failing to be N^(-1/2)-consistent, where N is the sample
size. We show that the impact of regularization bias and overfitting on
estimation of the parameter of interest θ_0 can be removed by using two
simple, yet critical, ingredients: (1) using Neyman-orthogonal moments/scores
that have reduced sensitivity with respect to nuisance parameters to estimate
θ_0, and (2) making use of cross-fitting, which provides an efficient form of
data-splitting. We call the resulting set of methods double or debiased ML
(DML). We verify that DML delivers point estimators that concentrate in an
N^(-1/2)-neighborhood of the true parameter values and are approximately
unbiased and normally distributed, which allows construction of valid
confidence statements. The generic statistical theory of DML is elementary and
simultaneously relies on only weak theoretical requirements, which will admit
the use of a broad array of modern ML methods for estimating the nuisance
parameters, such as random forests, lasso, ridge, deep neural nets, boosted
trees, and various hybrids and ensembles of these methods. We illustrate the
general theory by applying it to provide theoretical properties of DML for
learning the main regression parameter in a partially linear regression model,
the coefficient on an endogenous variable in a partially linear instrumental
variables model, the average treatment effect and the average treatment effect
on the treated under unconfoundedness, and the local average treatment effect
in an instrumental variables setting. In addition to these theoretical
applications, we illustrate the use of DML in three empirical examples. |
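
As a concrete illustration of ingredients (1) and (2), below is a minimal
sketch of cross-fitted DML in the partially linear regression model
Y = D θ_0 + g_0(X) + U with D = m_0(X) + V, based on the Neyman-orthogonal
(partialling-out) score ψ(W; θ, η) = (Y - ℓ(X) - θ(D - m(X)))(D - m(X)),
where ℓ_0(X) = E[Y|X] and m_0(X) = E[D|X]. The random-forest learners, the
fold count, and the function name dml_plm are illustrative assumptions, not a
prescription of the paper; any of the ML methods listed above could serve as
the nuisance learners.

```python
# Sketch: DML with cross-fitting in the partially linear model
#   Y = D*theta_0 + g_0(X) + U,   D = m_0(X) + V.
# Neyman-orthogonal (partialling-out) score:
#   psi(W; theta, eta) = (Y - l(X) - theta*(D - m(X))) * (D - m(X)),
# with l_0(X) = E[Y|X] and m_0(X) = E[D|X].
# Learners, fold count, and names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plm(Y, D, X, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta_0 and its standard error."""
    U_res = np.zeros_like(Y, dtype=float)  # out-of-fold Y - l_hat(X)
    V_res = np.zeros_like(D, dtype=float)  # out-of-fold D - m_hat(X)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        # Fit nuisance estimators on the complement of each fold ...
        l_hat = RandomForestRegressor(random_state=seed).fit(X[train], Y[train])
        m_hat = RandomForestRegressor(random_state=seed).fit(X[train], D[train])
        # ... and residualize on the held-out fold (cross-fitting),
        # so own-observation overfitting cannot contaminate the score.
        U_res[test] = Y[test] - l_hat.predict(X[test])
        V_res[test] = D[test] - m_hat.predict(X[test])
    # Solve the empirical orthogonal moment condition for theta.
    theta = np.sum(V_res * U_res) / np.sum(V_res ** 2)
    # Sandwich variance of the orthogonal score for confidence intervals.
    psi = (U_res - theta * V_res) * V_res
    J = np.mean(V_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / len(Y))
    return theta, se
```

Because the score is Neyman-orthogonal, first-order errors in the nuisance
fits cancel, and cross-fitting removes the bias from overfitting; a call such
as theta, se = dml_plm(Y, D, X) then yields the approximately unbiased,
normally distributed estimator described above, with theta ± 1.96·se giving a
valid confidence interval.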