Generalized Estimation and Information

Paul Vos (East Carolina University, vosp@ecu.edu) and Qiang Wu (East Carolina University, wuq@ecu.edu)
Abstract

This paper extends the idea of a generalized estimator for a scalar parameter (Vos, 2022) to multi-dimensional parameters both with and without nuisance parameters. The title reflects the fact that generalized estimators provide more than simply another method to find point estimators, and that the methods to assess generalized estimators differ from those for point estimators. By generalized estimation we mean the use of generalized estimators together with an extended definition of information to assess their inferential properties. We show that Fisher information provides an upper bound for the information utilized by an estimator and that the score attains this bound.

Key words: Cramér-Rao bound, Fisher information, geometry, score, slope

1 Introduction

The maximum likelihood estimator need not be efficient and, among the class of biased estimators, it need not be admissible. These issues with maximum likelihood estimation and the parameter dependence of other point estimators are addressed using generalized estimators. Generalized estimators are described by information rather than variance, and the Fisher information provides an upper bound for the information of an estimator. This bound applies to all generalized estimators; it does not require estimators to be unbiased. The score is a generalized estimator and its information equals the Fisher information.

A point estimator assigns to each value $y$ in the sample space a point in the parameter space $\Theta$. A generalized estimator $g$ assigns to each $y$ a function $g_y$ on $\Theta$ where $g_y(\theta)$ indicates the consistency of $\theta$ with $y$. The function $g_y$ can be thought of as a continuum of test statistics evaluated at $y$. The information of $g$ describes the average rate at which these test statistics change with $\theta$. Section 2 presents the scalar parameter case in a manner that extends naturally to multi-dimensional parameters in Section 3. Section 4 presents two examples: one to illustrate the role of information in assessing estimators and the other to illustrate how confidence intervals can be obtained from a generalized estimate.

2 One Parameter Families

Because we want inferences to be unaffected by the choice of parameterization, we first describe the basics of inference without reference to one. A parameterization will be introduced later to describe the smooth structures of estimators.

Let $M_{\mathcal{X}}$ be a family of probability measures having common support $\mathcal{X}$. While $\mathcal{X}$ can be an abstract space, for most applications $\mathcal{X}\subset\mathbb{R}^{d}$. Points in $M_{\mathcal{X}}$ serve as models for a population whose individuals take values in $\mathcal{X}$. We consider inference for models from $M_{\mathcal{X}}$ based on a sample that is denoted by $y$ and let $\mathcal{Y}$ be the corresponding sample space. The relationship between $\mathcal{X}$ and $\mathcal{Y}$ will depend on the sampling plan, conditioning, and dimension reduction using sufficient statistics. For a simple random sample of size $n$ without conditioning and no dimension reduction, $\mathcal{Y}=\mathcal{X}^{n}$.

Let $M=M_{\mathcal{Y}}$ be the family of probability measures obtained from $M_{\mathcal{X}}$ using a sampling plan whose sample space is $\mathcal{Y}$. For $\mathcal{Y}=\mathcal{X}^{n}$,

$$M=\left\{m:m(y)=\prod m_{\mathcal{X}}(x_{i}),\ m_{\mathcal{X}}\in M_{\mathcal{X}}\right\}.$$

For the Bernoulli family of distributions, $\mathcal{X}=\left\{0,1\right\}$,

$$M_{\mathcal{X}}=\left\{m:0<m(1)<1,\ m(0)+m(1)=1\right\}.$$

For a sample of size $n$ we use the sufficient statistic $y=\sum x_{i}$ so that $\mathcal{Y}=\left\{0,1,2,\ldots,n\right\}$ and

$$M=\left\{m:m(y)={n\choose y}m_{\mathcal{X}}(1)^{y}m_{\mathcal{X}}(0)^{n-y},\ m_{\mathcal{X}}\in M_{\mathcal{X}}\right\}. \tag{1}$$
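As a minimal sketch of the binomial family (1), the following Python snippet tabulates one model $m$ on $\mathcal{Y}=\{0,\ldots,n\}$ from a Bernoulli model with $m_{\mathcal{X}}(1)=p$. The function name `binomial_model` and the numerical values are illustrative assumptions, not part of the paper.

```python
from math import comb


def binomial_model(n, p):
    """Tabulate m(y) for y = 0..n, given the Bernoulli model with m_X(1) = p.

    Hypothetical helper: the name and the list representation are assumptions.
    """
    if not 0.0 < p < 1.0:
        # The Bernoulli family requires 0 < m_X(1) < 1.
        raise ValueError("m_X(1) must lie strictly between 0 and 1")
    q = 1.0 - p  # m_X(0)
    return [comb(n, y) * p**y * q ** (n - y) for y in range(n + 1)]


m = binomial_model(10, 0.3)
total_mass = sum(m)  # a probability measure on Y should have total mass one
```

Each choice of $p$ in $(0,1)$ gives one point of $M$; the parameter $p$ is used here only to index the models.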

When $\mathcal{Y}$ is open it will be convenient to let $m$ be a probability density with respect to a dominating measure $\mu$. For $\mathcal{X}=\mathbb{R}$ and function $\phi>0$ such that $\int\phi(x)\,d\mu=1$ there is a location family

$$M_{\mathcal{X}}=\left\{m:m(x)=\phi(x-a),\ a\in\mathbb{R}\right\}.$$

For a simple random sample with $y=\left(x_{1},x_{2},\ldots,x_{n}\right)^{\mathtt{t}}$,

$$M=\left\{m:m(y)=\prod\phi(x_{i}-a),\ a\in\mathbb{R}\right\}.$$

If $\phi(x)=\left(2\pi\right)^{-1/2}\exp\left(-\frac{1}{2}x^{2}\right)$ then $M_{\mathcal{X}}$ is the normal location family with unit variance. Using the sufficient statistic $y=\bar{x}=(\sum x_{i})/n\in\mathcal{Y}=\mathbb{R}$,

$$M=\left\{m:m\left(y\right)=\sqrt{n}\,\phi\left(\sqrt{n}\left(y-a\right)\right),\ a\in\mathbb{R}\right\}. \tag{2}$$

If $\phi(x)=\pi^{-1}\left(1+x^{2}\right)^{-1}$ then $M_{\mathcal{X}}$ is the Cauchy location family with unit scale factor. There is no sufficient statistic of dimension less than $n$ so we use $y=\left(x_{1},x_{2},\ldots,x_{n}\right)^{\mathtt{t}}$,

$$M=\left\{m:m(y)=\pi^{-n}\prod\left(1+\left(x_{i}-a\right)^{2}\right)^{-1},\ a\in\mathbb{R}\right\}. \tag{3}$$

For real-valued measurable function $h$ we define the expected value of $h$ at $m$,

$$E_{m}h=\int_{\mathcal{Y}}h(y)m(y)\,d\mu$$

when $\mathcal{Y}$ is open and $E_{m}h=\sum_{y\in\mathcal{Y}}h(y)m(y)$ when $\mathcal{Y}$ is discrete. We use the following Hilbert space

$$H_{M}=\left\{h:E_{m}h^{2}<\infty,\ \forall\ m\in M\right\}$$

which has a family of inner products indexed by $M$,

$$\langle h,h^{\prime}\rangle_{m}=E_{m}\left(hh^{\prime}\right)\ \mbox{for all }h,h^{\prime}\in H_{M}.$$

When $E_{m}(hh^{\prime})=0$ the vectors $h$ and $h^{\prime}$ are $m$-orthogonal and we write $h\perp_{m}h^{\prime}$. At each $m\in M$ there is a copy of $H_{M}$ and this collection we denote by

$$H\!M=M\times H_{M}.$$

The copy of $H_{M}$ at $m$ with inner product $\langle\cdot,\cdot\rangle_{m}$ is $H_{m}$, which we also write as $H_{m}M$ to indicate its relationship to $H\!M$. For inference, $H_{m}$ will be restricted to the orthogonal complement of the space $H_{m}^{0}$ of constant functions, $H_{m}^{\perp}=\{h\in H_{m}:E_{m}h=0\}$, so that

$$H_{m}=H_{m}^{\perp}\oplus H_{m}^{0}\ \mbox{and}\ H_{m}^{\perp}\perp_{m}H_{m}^{0}. \tag{4}$$
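The decomposition (4) can be illustrated numerically on a discrete sample space. The sketch below, which assumes a binomial model and represents functions on $\mathcal{Y}=\{0,\ldots,n\}$ as Python lists, computes the inner product $\langle h,h^{\prime}\rangle_{m}$ and splits a function into its mean-zero part and its constant part. The helper names `model`, `inner`, and `decompose` are illustrative assumptions.

```python
from math import comb


def model(n, p):
    """Binomial model m on Y = {0,...,n} (assumed example family)."""
    return [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]


def inner(h1, h2, m):
    """Inner product <h1, h2>_m = E_m(h1 * h2) for tabulated functions."""
    return sum(a * b * w for a, b, w in zip(h1, h2, m))


def decompose(h, m):
    """Split h into (h_perp, h_const) where E_m h_perp = 0 and h_const is constant."""
    one = [1.0] * len(h)
    mean = inner(h, one, m)  # expected value of h at m
    h_const = [mean] * len(h)
    h_perp = [v - mean for v in h]
    return h_perp, h_const


n = 5
m = model(n, 0.4)
h = [float(y * y) for y in range(n + 1)]  # h(y) = y^2
h_perp, h_const = decompose(h, m)
orthogonality = inner(h_perp, h_const, m)  # should vanish up to rounding
```

The two parts sum back to $h$, and they are $m$-orthogonal for this particular $m$; repeating the computation with a different $p$ changes $h^{\perp}$ but not the space of constants.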

Note $E_{m}h=\langle h,1\rangle_{m}$ and $H_{m}^{0}$ does not depend on $m$. Since (4) holds for each $m$ we write

$$H\!M=H^{\perp}\!M\oplus H^{0}M\ \mbox{and}\ H^{\perp}M\perp H^{0}M \tag{5}$$

where $\perp$ indicates that $\perp_{m}$ holds for $H_{m}^{\perp}M=H_{m}^{\perp}$.

As the notation suggests, $H\!M$ is a vector bundle on $M$ with vector space $H_{M}$. It extends the tangent bundle $T\!M$ since $T\!M\subset H^{\perp}\!M$.

For inference regarding models in $M$, we consider functions $g_{M}:\mathcal{Y}\times M\rightarrow\mathbb{R}$ such that

$$g_{M}\left(\cdot,m\right)\in H_{m}^{\perp}\ \mbox{ for all }m\in M. \tag{6}$$

We also want $g_{M}$ to be continuous on $M$,

$$g_{M}\left(y,\cdot\right)\in C(M)\ \mbox{ for a.e. }y\in\mathcal{Y}, \tag{7}$$

so that the expectation of $g_{M}$ is a continuous function. For point estimators of a parameter, say $\theta$, the expectation of the estimator $\hat{\theta}$ is a real number. To emphasize this distinction we use the sans serif font to indicate the expectation of $g_{M}$:

$$\mathsf{E}g_{M}\in C(M)\ \mbox{ while }\ E\hat{\theta}\in\mathbb{R}.$$

Expectation $\mathsf{E}$ operates on $C(M)$-valued distributions, whereas $E$ operates on $\mathbb{R}$-valued distributions. To be a generalized estimator, $g_{M}(y,\cdot)$ will be required to have continuous derivatives on $M$ and these will be described using parameterizations that are diffeomorphisms.

We assume $M$ is a 1-dimensional smooth manifold. While more general manifolds can be considered (e.g., Fisher's circle model), we will only consider families that have a global parameterization

$$\theta:M\rightarrow\Theta\subset\mathbb{R} \tag{8}$$

and are connected so that $\Theta$ is an open interval. For $g_{M}:\mathcal{Y}\times M\rightarrow\mathbb{R}$ we define $g_{\Theta}=g_{M}\circ\theta^{-1}:\mathcal{Y}\times\Theta\rightarrow\mathbb{R}$. Unless more than one parameterization is being used, we drop the subscript and write $g$ for $g_{\Theta}$. The log likelihood function on $\Theta$ for $y$ is the function defined by

$$\ell=\ell(y,\cdot)=\ell_{M}(y,\cdot)\circ\theta^{-1}$$

where $\ell_{M}\left(y,m\right)=\log m\left(y\right)$. The score function on $\Theta$ for $y$ is

$$s=\nabla\ell=\partial\ell/\partial\theta.$$

We only consider $M$ such that $s(\cdot,\theta)\in H_{M}$ for all $\theta\in\Theta$. Because $M$ is a smooth manifold, $s\left(y,\cdot\right)\in C^{1}(\Theta)$ a.e. $y$, and since $\mathsf{E}s=0$,

$$s(\cdot,\theta)\in H_{\theta}^{\perp}. \tag{9}$$

These properties of $s$ are used to define generalized estimators.

Definition 1.

A generalized estimator for scalar parameter $\theta$ is a function

$$g:\mathcal{Y}\times\Theta\longrightarrow\mathbb{R}$$

and $g=g(y,\cdot)$ is the corresponding generalized estimate at $y$ if

(i) $g\left(y,\cdot\right)\in C^{1}(\Theta)$ a.e. $y$
(ii) $g\left(\cdot,\theta\right)\in H_{\theta}^{\perp}$ for all $\theta$
(iii) $\mathsf{V}\left(g\right)>0$

where $\mathsf{V}\left(g\right)=\mathsf{E}\left(g^{2}\right)\in C^{1}(\Theta)$.

The space of generalized estimators for $\theta$ is $\mathcal{G}$, which we write as $\mathcal{G}_{\Theta}$ if we consider more than one parameterization. Any function $f\not\in H_{\theta}^{\perp}$ that satisfies $f\in H_{\theta}$ and conditions (i) and (iii) of Definition 1 is a pre generalized estimator, or simply, a pre-estimator. For any pre-estimator $f$, its orthogonalization

$$f^{\perp}=f-f^{\top}\in\mathcal{G}, \tag{10}$$

where $f^{\top}=\mathsf{E}f$.
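To make the orthogonalization (10) concrete, here is a small sketch on the binomial family, assuming the parameterization $\theta=m_{\mathcal{X}}(1)$. It takes a biased point estimator, treats it as a function of $(y,\theta)$ that is constant in $\theta$, and subtracts its expectation. The estimator $(y+1)/(n+2)$ and the helper names are illustrative choices, not taken from the paper.

```python
from math import comb


def pmf(n, theta):
    """Binomial probabilities on Y = {0,...,n} (assumed example family)."""
    return [comb(n, y) * theta**y * (1 - theta) ** (n - y) for y in range(n + 1)]


def expect(f, n, theta):
    """Value at theta of the function E f, for f(y, theta) on the binomial family."""
    return sum(f(y, theta) * w for y, w in enumerate(pmf(n, theta)))


def orthogonalize(f, n):
    """Return f_perp(y, theta) = f(y, theta) - (E f)(theta)."""
    return lambda y, theta: f(y, theta) - expect(f, n, theta)


n = 8
# Hypothetical biased point estimator, constant in theta for each fixed y.
shrunk = lambda y, theta: (y + 1) / (n + 2)
g = orthogonalize(shrunk, n)

# E g should vanish and E g^2 should be positive at every theta.
means = [expect(g, n, t) for t in (0.2, 0.5, 0.9)]
variances = [expect(lambda y, t: g(y, t) ** 2, n, t) for t in (0.2, 0.5, 0.9)]
```

Although `shrunk` does not depend on $\theta$, its orthogonalization `g` does, because the subtracted expectation varies with $\theta$.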

Godambe (1960) has similar criteria but allows $\mathsf{V}(g)=0$ for some $\theta\in\Theta$ and adds that $\mathsf{E}\left(\nabla g\right)^{2}>0$ so that $\mathsf{E}\left(\nabla g\right)$ can never be zero on $\Theta$. We do not need this restriction since we describe estimators in terms of information rather than variance. Allowing $\mathsf{E}\left(\nabla g\right)$ to be zero will be useful for nuisance parameters in the multi-dimensional setting. Because $\mathsf{V}(g)>0$ we can define the standardization of $g$ as

$$\bar{g}=\frac{g}{\sqrt{\mathsf{V}(g)}}.$$

Since $\bar{g}(\cdot,\theta)\in H_{\theta}^{\perp}$ is a vector of unit length, $\bar{g}$ is also called the direction of $g$. Standardized estimators are the same in every parameterization. That is, for any $m^{\prime}\in M$, $\bar{g}_{\Theta}(\cdot,\theta^{\prime})=\bar{g}_{\Xi}(\cdot,\xi^{\prime})$ where $\theta^{\prime}=\theta(m^{\prime})$ and $\xi^{\prime}=\xi(m^{\prime})$.

A non-degenerate point estimator $\hat{\theta}$ whose first two moments are smooth functions on $\Theta$ is a pre-estimator so that

$$\hat{\theta}-\mathsf{E}\hat{\theta}\in\mathcal{G}.$$

We use the sans serif notation because, as a pre-estimator, $\hat{\theta}(y,\cdot)$ is a function on the parameter space, the constant function taking the value of the point estimate at $y$. The estimator need not be unbiased, so that generalized estimation can be used to compare biased and unbiased point estimators as well as estimators not constrained to be constant on $\Theta$. Generalized estimators are compared in terms of their information.

Definition 2.

The information for scalar parameter $\theta$ utilized by $g$ is

$$\Lambda(g)=\left(\mathsf{E}\nabla\bar{g}\right)^{2}=\frac{(\mathsf{E}\nabla g)^{2}}{\mathsf{E}(g^{2})}, \tag{11}$$

where the second equality follows from the definition of $\bar{g}$ and $\mathsf{E}g=0$.
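The ratio in (11) can be evaluated directly for simple families. The sketch below assumes the binomial family with $\theta=m_{\mathcal{X}}(1)$ and computes $\Lambda$ for three generalized estimators: the score, a centered biased point estimator, and the centered square of $y$. The derivatives with respect to $\theta$ are written out by hand, and all function names are illustrative assumptions.

```python
from math import comb


def expect(f, n, theta):
    """Value at theta of E f for f(y, theta) on the binomial family (assumed example)."""
    pmf = [comb(n, y) * theta**y * (1 - theta) ** (n - y) for y in range(n + 1)]
    return sum(f(y, theta) * w for y, w in enumerate(pmf))


def information(g, dg, n, theta):
    """Lambda(g) = (E grad g)^2 / E(g^2), with dg the theta-derivative of g."""
    numerator = expect(dg, n, theta) ** 2
    denominator = expect(lambda y, t: g(y, t) ** 2, n, theta)
    return numerator / denominator


n = 12

# Score of the binomial log likelihood and its derivative in theta.
score = lambda y, t: y / t - (n - y) / (1 - t)
dscore = lambda y, t: -y / t**2 - (n - y) / (1 - t) ** 2

# Centered biased point estimator: (y + 1)/(n + 2) minus its expectation.
g_shrunk = lambda y, t: (y - n * t) / (n + 2)
dg_shrunk = lambda y, t: -n / (n + 2)

# Centered y^2, using E y^2 = n t (1 - t) + (n t)^2.
g_square = lambda y, t: y**2 - (n * t * (1 - t) + (n * t) ** 2)
dg_square = lambda y, t: -(n * (1 - 2 * t) + 2 * n**2 * t)

theta = 0.3
fisher = n / (theta * (1 - theta))  # binomial Fisher information
info_score = information(score, dscore, n, theta)
info_shrunk = information(g_shrunk, dg_shrunk, n, theta)
info_square = information(g_square, dg_square, n, theta)
```

In this example the score and the centered biased estimator both use all of the Fisher information, since each is proportional to $y-n\theta$, while the centered square uses strictly less.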

The Fisher information for a sample of size $n$, $I_{(n)}$, and the Fisher information in a single observation, $I_{(1)}$, satisfy $I_{(n)}=nI_{(1)}$. This relationship also holds for the information utilized by an estimator,

$$\Lambda(g_{(n)})=n\Lambda(g_{(1)}). \tag{12}$$

When considering only samples of size $n$ we use $I=I_{(n)}$ and $\Lambda(g)=\Lambda(g_{(n)})$.
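Relation (12) can be checked numerically under one natural reading, stated here as an assumption: $g_{(n)}$ is the sum of $g_{(1)}$ over the $n$ independent observations. For Bernoulli observations with $g_{(1)}(x,\theta)=x-\theta$ this gives $g_{(n)}(y,\theta)=y-n\theta$ with $y=\sum x_{i}$. The function name below is illustrative.

```python
from math import comb


def info_sum_estimator(n, theta):
    """Lambda for g_(n)(y, theta) = y - n*theta on the binomial(n, theta) family.

    Assumes g_(n) is the sum of g_(1)(x, theta) = x - theta over n observations.
    """
    pmf = [comb(n, y) * theta**y * (1 - theta) ** (n - y) for y in range(n + 1)]
    expected_gradient = -float(n)  # theta-derivative of y - n*theta, constant in y
    expected_square = sum((y - n * theta) ** 2 * w for y, w in enumerate(pmf))
    return expected_gradient**2 / expected_square


theta = 0.25
lam_1 = info_sum_estimator(1, theta)  # information in a single observation
lam_7 = info_sum_estimator(7, theta)  # information in a sample of size 7
```

Under this assumption `lam_7` agrees with `7 * lam_1` up to rounding, because both the expected gradient and the variance scale linearly in $n$.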

Just as the score is the archetype for a generalized estimator $g$, the log likelihood function is the archetype for the scalar potential.

Definition 3.

A scalar potential of $g$ is any function $G:\mathcal{Y}\times\Theta\longrightarrow\mathbb{R}$ such that $\nabla G=g$.

While Gβˆ‰π’’πΊπ’’G\not\in\mathcal{G}italic_G βˆ‰ caligraphic_G we define the information utilized by G𝐺Gitalic_G to be the information of its derivative: Λらむだ⁒(G)=Λらむだ⁒(g)Λらむだ𝐺Λらむだ𝑔\Lambda(G)=\Lambda(g)roman_Λらむだ ( italic_G ) = roman_Λらむだ ( italic_g ). Information is a local property and so does not distinguish between a generalized estimator and its scalar potential. The scalar potential is useful for finding confidence regions especially when the parameterization is multidimensional.

We assume differentiation commutes with the integral sign so that for any pre-estimator $f$

$$\nabla\left(\mathsf{E}f\right)=\mathsf{E}\left(\nabla f\right)+\left(\nabla\mathsf{E}\right)\left(f\right) \tag{13}$$

where $\left(\nabla\mathsf{E}\right)$ is the linear operator on $H_{M}$ defined by

$$\left(\nabla\mathsf{E}\right)(h)=\mathsf{E}\left(\left(\nabla\ell\right)h\right).$$

Note that we use $f$ and $g$ for functions on $\mathcal{Y}\times\Theta$ while $h\in H_{M}$ is a function on $\mathcal{Y}$. For a generalized estimator $g$, $\mathsf{E}g$ vanishes, so (13) becomes, after switching left- and right-hand sides and writing $s=\nabla\ell$ for the score, the score equation

$$\mathsf{E}\left(\nabla g\right)+\mathsf{E}\left(sg\right)=0. \tag{14}$$

When $g=s$, the score equation gives the equivalent definitions of the Fisher information for $\theta$

$$I=-\mathsf{E}(\nabla s)=\mathsf{E}(s^{2}).$$
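As a numerical check of these two expressions for $I$, the following sketch evaluates both for the binomial model obtained from a sample of size $n$ from the Bernoulli family. The helper names (`pmf`, `expect`, `score`, `score_slope`) and the values $n=20$, $p=0.3$ are illustrative assumptions, not notation from the paper.

```python
from math import comb

def pmf(y, n, p):
    """Binomial(n, p) probability of y successes."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def expect(fn, n, p):
    """Expectation of fn(y) over the binomial sample space {0, ..., n}."""
    return sum(fn(y) * pmf(y, n, p) for y in range(n + 1))

def score(y, n, p):
    """Score: derivative of the binomial log likelihood with respect to p."""
    return y / p - (n - y) / (1 - p)

def score_slope(y, n, p):
    """Derivative of the score with respect to p."""
    return -y / p**2 - (n - y) / (1 - p) ** 2

n, p = 20, 0.3  # illustrative values (assumption)
info_from_slope = -expect(lambda y: score_slope(y, n, p), n, p)  # -E(grad s)
info_from_square = expect(lambda y: score(y, n, p) ** 2, n, p)   # E(s^2)
closed_form = n / (p * (1 - p))                                  # n / (p(1-p))

print(info_from_slope, info_from_square, closed_form)
```

Under these assumptions the three printed values should agree up to floating-point rounding.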

The information upper bound follows from the score equation.

Theorem 1.

The information for $\theta$ utilized by $g$ is bounded by the Fisher information

$$\Lambda(g)\leq I.$$

Furthermore, the score $s$ attains this bound and for any $g\in\mathcal{G}$

$$\begin{aligned}\Lambda(g) &= \mathsf{V}(\mathsf{P}_{g}s)\\ &= \mathsf{R}^{2}I\end{aligned}$$

where $\mathsf{P}_{g}s$ is the projection of $s$ onto the space spanned by $g$ and $\mathsf{R}=\mathsf{E}(\bar{s}\bar{g})$ is the correlation between $s$ and $g$.

Proof.

From the score equation

$$\Lambda(g)=\mathsf{E}^{2}(s\bar{g}). \tag{15}$$

The second displayed equality follows upon noting that $\mathsf{E}^{2}(s\bar{g})=\mathsf{E}^{2}(\bar{s}\bar{g})I=\mathsf{R}^{2}I$. The first equality follows by expressing the projection using the basis vector $\bar{g}$:

$$\begin{aligned}\mathsf{V}(\mathsf{P}_{g}s) &= \mathsf{V}\left(\mathsf{E}(s\bar{g})\bar{g}\right)\\ &= \mathsf{E}^{2}(s\bar{g}).\end{aligned}$$

Since $\mathsf{R}^{2}\leq 1$, the bound $\Lambda(g)\leq I$ follows, with equality when $g=s$.

∎
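Theorem 1 can be checked numerically in a small case. The sketch below uses the binomial model with $n=20$ and a generalized estimator built by centering a hypothetical pre-estimator (the arcsine square-root transform of $y/n$, chosen only for illustration). All function names and numerical values are assumptions. The quantity $\mathsf{E}\nabla g$ is obtained by a central finite difference so that the comparison with $\mathsf{R}^{2}I$ does not simply reuse the score equation.

```python
from math import asin, comb, sqrt

def pmf(y, n, p):
    """Binomial(n, p) probability of y successes."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def expect(fn, n, p):
    """Expectation of fn(y) over the binomial sample space {0, ..., n}."""
    return sum(fn(y) * pmf(y, n, p) for y in range(n + 1))

def score(y, n, p):
    """Score of the binomial model in the p parameterization."""
    return (y - n * p) / (p * (1 - p))

def pre_estimator(y, n):
    """Hypothetical pre-estimator f(y): arcsine square-root of y/n."""
    return asin(sqrt(y / n))

def mean_pre(n, p):
    """E_p f, the quantity subtracted to centre the pre-estimator."""
    return expect(lambda y: pre_estimator(y, n), n, p)

def variance_g(n, p):
    """E(g^2) for g = f - E_p f."""
    mu = mean_pre(n, p)
    return expect(lambda y: (pre_estimator(y, n) - mu) ** 2, n, p)

def information(n, p, h=1e-5):
    """Lambda(g) = (E grad g)^2 / E(g^2); grad g = -d(E_p f)/dp for every y."""
    slope = -(mean_pre(n, p + h) - mean_pre(n, p - h)) / (2 * h)
    return slope**2 / variance_g(n, p)

def squared_correlation(n, p):
    """R^2, the squared correlation between the score s and g."""
    mu = mean_pre(n, p)
    cov = expect(lambda y: score(y, n, p) * (pre_estimator(y, n) - mu), n, p)
    fisher = n / (p * (1 - p))
    return cov**2 / (fisher * variance_g(n, p))

n, p = 20, 0.3  # illustrative values (assumption)
fisher = n / (p * (1 - p))
lam = information(n, p)
r2 = squared_correlation(n, p)
print(lam, r2 * fisher, fisher)
```

In this sketch the first two printed values should agree up to the finite-difference error, and neither should exceed the third.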

Efficiency of a point estimator is defined using the ratio of its variance to the variance bound. Efficiency of a generalized estimator is defined as the ratio of its information to the information bound, $I$.

Definition 4.

The $\Lambda$-efficiency of $g$ is

$$\mbox{Eff}^{\Lambda}(g)=I^{-1}\Lambda(g).$$

An immediate corollary of Theorem 1 is that the $\Lambda$-efficiency is the square of the correlation between the estimator and the score.

Corollary 1.

$$\begin{aligned}\mbox{Eff}^{\Lambda}(g) &= \mathsf{V}\left(\mathsf{P}_{g}\bar{s}\right)\\ &= \mathsf{R}^{2}.\end{aligned}$$

The $\Lambda$-efficiency of a point estimator $\hat{\theta}$ is the $\Lambda$-efficiency of its generalized estimator $g_{\hat{\theta}}=\hat{\theta}-\mathsf{E}\hat{\theta}$. When $\hat{\theta}$ is unbiased, $\Lambda(g_{\hat{\theta}})=\mathsf{V}^{-1}(\hat{\theta})$ so that $\Lambda$-efficiency is identical to efficiency based on variance.
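To illustrate, the sketch below computes $\Lambda(g_{\hat{\theta}})$ for two point estimators of a binomial proportion: the unbiased $y/n$ and a biased shrinkage estimator $(y+1)/(n+2)$. Both estimators, the helper names, and the values $n=20$, $p=0.3$ are assumptions made for illustration. Because each $g_{\hat{\theta}}$ is linear in $y$, its derivative with respect to $p$ is constant and is written in closed form.

```python
from math import comb

def pmf(y, n, p):
    """Binomial(n, p) probability of y successes."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def expect(fn, n, p):
    """Expectation of fn(y) over the binomial sample space {0, ..., n}."""
    return sum(fn(y) * pmf(y, n, p) for y in range(n + 1))

def information_linear(a, b, n, p):
    """Lambda(g) and V for the point estimator a*y + b, with g = a*y + b - E_p(a*y + b)."""
    mean = a * n * p + b   # E_p of the point estimator
    grad_g = -a * n        # derivative of g with respect to p (same for every y)
    var = expect(lambda y: (a * y + b - mean) ** 2, n, p)
    return grad_g**2 / var, var

n, p = 20, 0.3  # illustrative values (assumption)
fisher = n / (p * (1 - p))

lam_unbiased, var_unbiased = information_linear(1 / n, 0.0, n, p)
lam_shrunk, var_shrunk = information_linear(1 / (n + 2), 1 / (n + 2), n, p)

print(lam_unbiased, 1 / var_unbiased, fisher)  # unbiased estimator y/n
print(lam_shrunk, 1 / var_shrunk, fisher)      # biased estimator (y+1)/(n+2)
```

In this sketch the information of the unbiased estimator coincides with the reciprocal of its variance, while for the shrinkage estimator the information and the reciprocal variance differ.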

Even though these efficiencies can take the same numerical value, it is incorrect to characterize the information as the reciprocal of the variance. The information at $\theta^{\prime}=\theta(m^{\prime})$, $\Lambda(g)|_{\theta=\theta^{\prime}}$, is a measure of how $g$ changes in a neighborhood of $m^{\prime}\in M$; that is, information depends on $M$. The variance at $\theta^{\prime}$, $\mathsf{V}(g)|_{\theta=\theta^{\prime}}$, depends only on $m^{\prime}$; it is the same for the countless manifolds we could choose that contain $m^{\prime}$. Another difference is that variance is defined on horizontal distributions while information is defined on vertical distributions. Horizontal and vertical distributions are described in Example 1.

Example 1.

We consider inference for the proportion of a population having a genetic variation or other specified characteristic. We let $1$ ($0$) indicate the characteristic is present (absent) so $\mathcal{X}=\left\{0,1\right\}$ and for a sample of size $n$, $M$ is given by (1). Figure 1 shows the standardized score

$$\bar{s}=\frac{y-np}{\sqrt{np(1-p)}}$$

where $n=20$ and $p$ is the parameter defined by $p(m)=m(1)$ with parameter space $P=(0,1)$. The graph of the estimate $\bar{s}_{y}$ when $y=6$ is the black curve. The estimator $\bar{s}$ is represented by the family of 21 curves, one for each $y$ in the sample space (unrealized estimates are shown in white).

Figure 1: The standardized score estimate $\bar{s}_{6}$ obtained from the sample with $y=6$ and $n=20$ for the Bernoulli manifold with the parameter $p=m(1)$ is shown by the black curve. The standardized score estimator $\bar{s}$ is represented by the family of 21 curves, one for each $y$ in the sample space (unrealized estimates are shown in white). Of the continuum of vertical slices two are shown at $p=.50$ and $p=.55$. The distribution of the point estimate $\hat{p}$ is shown by the intersection of these 21 curves with the horizontal axis. Note that for two of these curves the intersection occurs for a value outside of the parameter space.

Of the continuum of vertical slices two are shown, one at $p=.50$ and another at $p=.55$. Every vertical slice for $0<p<1$ intersects all 21 curves and, while the ordinates of these points of intersection depend on $p$, the resulting distributions all have mean zero and variance one. These vertical distributions are the same in every parameterization. For any parameter $\theta$, $\bar{s}(y,p(m^{\prime}))=\bar{s}_{\Theta}(y,\theta(m^{\prime}))$ for all $y$ and all $m^{\prime}$. In contrast, the abscissa values obtained from the intersection of these curves with the parameter axis are the same for all $p$ but the mean and variance of these horizontal distributions depend on the value of the parameter and on the choice of parameterization. The horizontal distributions describe the inferential properties in terms of the mean and variance of the roots of $s$ while the vertical distributions describe how each estimate $\bar{s}_{y}$ changes with the parameter.
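The contrast between the two kinds of distribution can be sketched numerically for the two slices mentioned above. The code below is an illustrative sketch with assumed helper names: it computes the mean and variance of the vertical distribution of $\bar{s}$ at $p=.50$ and $p=.55$, and of the horizontal distribution of the roots $y/n$ in the $p$ parameterization.

```python
from math import comb, sqrt

def pmf(y, n, p):
    """Binomial(n, p) probability of y successes."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def expect(fn, n, p):
    """Expectation of fn(y) over the binomial sample space {0, ..., n}."""
    return sum(fn(y) * pmf(y, n, p) for y in range(n + 1))

def standardized_score(y, n, p):
    """Standardized score (y - n p) / sqrt(n p (1 - p))."""
    return (y - n * p) / sqrt(n * p * (1 - p))

def vertical_moments(n, p):
    """Mean and variance of the ordinates of the n + 1 curves at the slice p."""
    mean = expect(lambda y: standardized_score(y, n, p), n, p)
    var = expect(lambda y: (standardized_score(y, n, p) - mean) ** 2, n, p)
    return mean, var

def horizontal_moments(n, p):
    """Mean and variance of the roots y/n of the curves under the model at p.

    The roots for y = 0 and y = n fall on the boundary, outside P = (0, 1).
    """
    mean = expect(lambda y: y / n, n, p)
    var = expect(lambda y: (y / n - mean) ** 2, n, p)
    return mean, var

n = 20
for p in (0.50, 0.55):
    print(p, vertical_moments(n, p), horizontal_moments(n, p))
```

The vertical moments should not change between the two slices, whereas the horizontal moments depend on $p$.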

When the maximum likelihood estimator exists and is unique it is, by definition, the parameter-intercept of the score, $\hat{p}=s^{-1}(0)$. For $y=0$ and for $y=20$, the maximum likelihood estimate does not exist since $s_{y}$ does not cross the parameter axis. Even when the point estimate does not exist, confidence regions can be constructed from the standardized score $\bar{s}$. All 21 estimates $\bar{s}_{y}$ provide $z$-standard deviation intervals

$$\mbox{CI}_{-z}(y)=\left\{p:\bar{s}_{y}(p)\geq-z\right\},\qquad\mbox{CI}_{+z}(y)=\left\{p:\bar{s}_{y}(p)\leq z\right\}.$$

The intersections of the curve $\bar{s}_{6}$ with the white lines at $\bar{s}_{y}=\pm2$ in Figure 1 show the endpoints of $\mbox{CI}_{-2}(6)$ and $\mbox{CI}_{+2}(6)$. Since generalized estimators are parameter invariant, these intervals correspond to subsets of the space of models $M$. The interpretation of these intervals can be stated in terms of their complement: if the true model is not in $\mbox{CI}_{-2}(y)$ or $\mbox{CI}_{+2}(y)$ then the score test statistic for the observed data $y$ is at least two standard deviations from zero. That is, for models outside these intervals the observed data $y$ would be improbable since the score is at least two standard deviations from zero. Intervals based on tail probabilities can be obtained by allowing $z$ to be a function of the parameter; for $\mbox{CI}_{+z}(6)$ the value for $z$ would be obtained using the mass assigned to the values $\{0,1,\ldots,5,6\}$.
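The endpoints of $\mbox{CI}_{-2}(6)$ and $\mbox{CI}_{+2}(6)$ can be approximated numerically. The sketch below is one possible approach, not the authors' procedure: it uses bisection, which relies on $\bar{s}_{y}(p)$ being decreasing in $p$, and the function names and bracketing values are assumptions.

```python
from math import sqrt

def standardized_score(y, n, p):
    """Standardized score (y - n p) / sqrt(n p (1 - p))."""
    return (y - n * p) / sqrt(n * p * (1 - p))

def solve_level(y, n, level, lo=1e-12, hi=1 - 1e-12, iterations=200):
    """Find p in (0, 1) where the standardized score equals `level`.

    Bisection: the score is above `level` to the left of the root and
    below it to the right, because the curve is decreasing in p.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if standardized_score(y, n, mid) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y, n, z = 6, 20, 2.0
upper_end = solve_level(y, n, -z)  # CI_{-z}(y) = {p : score >= -z} = (0, upper_end]
lower_end = solve_level(y, n, +z)  # CI_{+z}(y) = {p : score <= +z} = [lower_end, 1)

print(lower_end, upper_end)
```

The intersection of the two one-sided sets is the interval between `lower_end` and `upper_end`, which under these assumptions contains $y/n=0.3$.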

Figure 2 shows the log likelihood ratio statistic $S$ for $y=6$ and its distribution on the other 20 values in the sample space. The vertical slices at $p=.50$ and $p=.55$ correspond to those from Figure 1 but the circles are only plotted when the slope of the intersecting curve is negative. Each vertical slice has 6 points of intersection corresponding to samples as extreme as $y=6$. The resulting p-value is the same as for the score. This will be true for any vertical slice so that inference from the score and the signed log likelihood ratio are identical in this example. This will not be true when the curves of the estimator $g$ intersect. Also, inference from $g$ and the unsigned scalar potential function $G$ will not be identical. In particular, inference from the score and from the unsigned log likelihood ratio is not identical in this example.

Figure 2: Twice the log likelihood ratio statistic obtained from observing $y=6$ out of a sample of size $n=20$ for the Bernoulli manifold with the parameter $p=m(1)$ is shown by the black curve. The distribution of twice the log likelihood ratio statistic is represented by the black curve and 20 white curves.

Example 2.

We consider the same population as before but now the variable of interest is a measured quantity and we choose $M_{\mathcal{X}}$ to be the Cauchy family so that for a random sample of size $n$, $M$ is given by (3). For comparison we also consider models from the Normal family for which the family of sampling distributions is given by (2); we use $M_{{\tt Gaus}}$ to identify this manifold. For parameterization $\theta$, the graph of a generalized estimate $\bar{g}_{y}$ for an observation $y=\left(x_{1},x_{2},\ldots,x_{n}\right)^{{\tt t}}$ is a curve over the parameter space $\Theta$. This corresponds to the black curve in the previous example. The distribution of the estimator $\bar{g}$ is more difficult to represent since there is a continuum of curves indexed by $y$. For $M_{{\tt Gaus}}$ there is also a continuum of curves but now the sufficient statistic $\bar{x}=n^{-1}\sum x_{i}$ provides a one dimensional index. Nevertheless, the properties of the vertical distributions for $M$ and $M_{{\tt Gaus}}$ still hold and confidence regions for $g_{y}$ are defined in the same way.

3 Multi-parameter Families

We consider inference for a parameter $\theta=\left(\theta^{1},\theta^{2},\ldots,\theta^{k}\right)^{{\tt t}}\in\mathbb{R}^{k}$ in the presence of a $k^{\prime}$-dimensional nuisance parameter $\undertilde{\theta}=(\undertilde{\theta}^{1},\undertilde{\theta}^{2},\ldots,\undertilde{\theta}^{k^{\prime}})^{{\tt t}}$ so that $M$ is a manifold of dimension $(k+k^{\prime})$ and $\underline{\theta}^{{\tt t}}=(\theta^{{\tt t}},\undertilde{\theta}^{{\tt t}})$ is a global parameterization $\underline{\theta}:M\rightarrow\underline{\Theta}$. We use $\overline{\nabla}$, $\nabla$, and $\widetilde{\nabla}$ to indicate differentiation with respect to $\underline{\theta}$, $\theta$, and $\undertilde{\theta}$, respectively, so that

$$\undertilde{s}=\widetilde{\nabla}\ell=(\partial\ell/\partial\undertilde{\theta}^{1},\partial\ell/\partial\undertilde{\theta}^{2},\ldots,\partial\ell/\partial\undertilde{\theta}^{k^{\prime}})^{{\tt t}}.$$

Note that subscripts are used for the components of $g$ while superscripts are used for $\theta$. This convention allows us to use the Einstein summation convention for calculations involving bases. It also reminds us that the component $g_{a}$ is not a point estimate for $\theta^{a}$; if it were, we would want to use superscripts for the components of $g$. While $\theta$ and $g$ are both $k$-tuples, geometrically $\theta$ is a contravariant (tangent) vector while $g$ is a covariant vector as its components co-vary with the change of basis.

Generalized estimators may depend on the value of the nuisance parameter but we can make them independent of the nuisance parameterization by restricting to functions that are orthogonal to $\undertilde{s}$. For any fixed $m_{\circ}\in M$ there is a $k^{\prime}$-dimensional submanifold through $m_{\circ}$

$$M|_{m_{\circ}}=\left\{m\in M:\theta(m)=\theta_{\circ}\right\}$$

where $\theta_{\circ}=\theta(m_{\circ})$. The tangent space of $M|_{m_{\circ}}$ at $m\in M|_{m_{\circ}}$ is

$$\widetilde{T}_{m}M=\mbox{span}\{\undertilde{s}(\cdot,\underline{\theta})\}|_{\underline{\theta}^{{\tt t}}=(\theta_{\circ}^{{\tt t}},\undertilde{\theta}^{{\tt t}})}.$$

We will require estimators to be orthogonal to $\widetilde{T}_{m}M$ and so define

$$H_{m}^{\bot}=\left\{h\in H_{M}:E_{m}h=0,\ h\perp_{m}\widetilde{T}_{m}M\right\}.$$

Equations (4) and (5) for the one dimensional case become

$$H_{m}=H_{m}^{\perp}\oplus\widetilde{T}_{m}M\oplus H_{m}^{0} \tag{16}$$

$$H\!M=H^{\perp\!}M\oplus\widetilde{T}\!M\oplus H^{0}\!M$$

When $M$ is parameterized by $\underline{\theta}$, $m$ in (16) is replaced with $\underline{\theta}=\underline{\theta}(m)$.
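The decomposition (16) can be made concrete on a three-point sample space, where each of the three summands is one dimensional. The sketch below rests on several assumptions that are not taken from the paper: a hypothetical model on $\{0,1,2\}$ with interest parameter $\theta=m(1)$ and nuisance parameter $\undertilde{\theta}=m(2)$, the inner product $\mathsf{E}_{m}(h_{1}h_{2})$ at $m$, and the reading of $H_{m}^{0}$ as the constant functions.

```python
def model(theta, nuisance):
    """Probabilities on the sample space {0, 1, 2} (hypothetical model)."""
    return [1.0 - theta - nuisance, theta, nuisance]

def expect(h, m):
    """Expectation of the function h (a list of 3 values) under m."""
    return sum(hx * mx for hx, mx in zip(h, m))

def nuisance_score(m):
    """Derivative of log m(x) with respect to the nuisance parameter."""
    return [-1.0 / m[0], 0.0, 1.0 / m[2]]

def decompose(h, m):
    """Split h into constant, nuisance-tangent, and orthogonal components."""
    s_tilde = nuisance_score(m)
    mean = expect(h, m)
    # Projection coefficient onto the span of the (mean-zero) nuisance score.
    coef = expect([hx * sx for hx, sx in zip(h, s_tilde)], m) / expect(
        [sx * sx for sx in s_tilde], m
    )
    constant = [mean] * 3
    tangent = [coef * sx for sx in s_tilde]
    orthogonal = [hx - cx - tx for hx, cx, tx in zip(h, constant, tangent)]
    return constant, tangent, orthogonal

m = model(theta=0.3, nuisance=0.2)
h = [1.0, 4.0, -2.0]  # arbitrary function on the sample space (assumption)
constant, tangent, orthogonal = decompose(h, m)
s_tilde = nuisance_score(m)

# The orthogonal component has mean zero and is orthogonal to the nuisance score.
print(expect(orthogonal, m))
print(expect([ox * sx for ox, sx in zip(orthogonal, s_tilde)], m))
```

Under these assumptions both printed values should be zero up to rounding, so the third component satisfies the two conditions that define $H_{m}^{\bot}$.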

Definition 5.

A generalized estimator for a $k$-dimensional parameter $\theta$ is a function

$$g:\mathcal{Y}\times\underline{\Theta}\longrightarrow\mathbb{R}^{k}$$

and $g_{y}=g(y,\cdot)$ is the corresponding generalized estimate at $y$ if

$$\begin{aligned}\mathrm{(I)}&\ \ g\left(y,\cdot\right)\in C^{1}(\underline{\Theta},\mathbb{R}^{k})\mbox{ a.e. }y\\ \mathrm{(II)}&\ \ g(\cdot,\underline{\theta})\in H_{\underline{\theta}}^{\bot}\mbox{ for all }\underline{\theta}\in\underline{\Theta}\\ \mathrm{(III)}&\ \ \mathsf{V}\left(g\right)>0\end{aligned}$$

where $\mathsf{V}(g)=\mathsf{E}(gg^{\mathsf{t}})\in C^{1}\left(\underline{\Theta},\mathbb{R}^{k\times k}\right)$.

The space of generalized estimators for $\theta$ is $\mathcal{G}$, which we write as $\mathcal{G}_{\Theta}$ if we consider more than one parameterization. If $f_{\underline{\theta}}=f(\cdot,\underline{\theta})\in H_{\underline{\theta}}$ for all $\underline{\theta}\in\underline{\Theta}$ and $f$ satisfies conditions (I) and (III) of Definition 5 but $f_{\underline{\theta}}\notin H_{\underline{\theta}}^{\perp}$, then $f$ is a pre-estimator. The orthogonalization of $f$ at $\underline{\theta}$ is

$$f_{\underline{\theta}}^{\bot}=f_{\underline{\theta}}-f_{\underline{\theta}}^{\top}\in H_{\underline{\theta}}^{\perp} \tag{17}$$

where $f_{\underline{\theta}}^{\top}=E_{\underline{\theta}}(f_{\underline{\theta}})+\widetilde{P}_{\underline{\theta}}f_{\underline{\theta}}$ and $\widetilde{P}_{\underline{\theta}}$ is the orthogonal projection onto $\widetilde{T}_{\underline{\theta}}M$. Since (17) holds for all $\underline{\theta}$, and expectation and orthogonal projection are smooth functions, we have

$$f^{\perp}=f-f^{\top}\in\mathcal{G}$$

where $f^{\top}=\mathsf{E}f+\widetilde{\mathsf{P}}f$.

The score $\nabla\ell$ is a pre-estimator so that we define $s$ to be the orthogonalized score

$$s=(\nabla\ell)^{\bot}\in\mathcal{G}.$$

The Fisher information for $\theta$ is $I=\mathsf{V}\left(\nabla\ell\right)$ and the nuisance orthogonalized Fisher information is $I^{\perp}=\mathsf{V}\left((\nabla\ell)^{\bot}\right)=\mathsf{V}\left(s\right)$; both can be functions of $\undertilde{\theta}$ but only $I^{\perp}$ is the same for all nuisance parameterizations.

The relationship between the score (information) and the orthogonalized score (orthogonalized information) expressed in the $\underline{\theta}$ parameterization is

\begin{align*}
\left(\nabla\ell\right)^{\perp} &= \nabla\ell-I_{\nabla\widetilde{\nabla}}\,\undertilde{I}^{-1}\,\widetilde{\nabla}\ell\\
I^{\perp} &= I-I_{\nabla\widetilde{\nabla}}\,\undertilde{I}^{-1}\,I_{\widetilde{\nabla}\nabla}
\end{align*}

where

$$\underline{I}=I_{\overline{\nabla}\overline{\nabla}}=\begin{pmatrix} I & I_{\nabla\widetilde{\nabla}}\\ I_{\widetilde{\nabla}\nabla} & \undertilde{I}\end{pmatrix}$$

and $I=I_{\nabla\nabla}$ and $\undertilde{I}=I_{\widetilde{\nabla}\widetilde{\nabla}}$ are the Fisher informations for $\theta$ and $\undertilde{\theta}$. When $I_{\nabla\widetilde{\nabla}}$ vanishes on $\underline{\Theta}$, parameterizations $\theta$ and $\undertilde{\theta}$ are orthogonal.
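As a small illustration of the two displayed formulas, the following sketch assumes a Gaussian linear model $y_i\sim N(\theta x_i+\undertilde{\theta}z_i,1)$ with scalar $\theta$ and scalar nuisance parameter $\undertilde{\theta}$. This model is our own choice for illustration and is not one of the paper's examples; for it the information blocks are $I=\sum x_i^2$, $I_{\nabla\widetilde{\nabla}}=\sum x_iz_i$ and $\undertilde{I}=\sum z_i^2$. All function names and data values are hypothetical.

```python
def info_blocks(x, z):
    """Information blocks for the assumed model y_i ~ N(theta*x_i + nuis*z_i, 1)."""
    i_tt = sum(a * a for a in x)              # I: block for the interest parameter
    i_tn = sum(a * b for a, b in zip(x, z))   # cross block between interest and nuisance
    i_nn = sum(b * b for b in z)              # block for the nuisance parameter
    return i_tt, i_tn, i_nn


def orthogonalized_info(i_tt, i_tn, i_nn):
    """I_perp = I - I_cross * I_nuis^{-1} * I_cross, written for scalar blocks."""
    return i_tt - i_tn * i_tn / i_nn


def orthogonalized_score(y, x, z, theta, nuis):
    """(grad l)_perp = grad l - I_cross * I_nuis^{-1} * (nuisance score)."""
    i_tt, i_tn, i_nn = info_blocks(x, z)
    resid = [yi - theta * xi - nuis * zi for yi, xi, zi in zip(y, x, z)]
    score_theta = sum(r * xi for r, xi in zip(resid, x))
    score_nuis = sum(r * zi for r, zi in zip(resid, z))
    return score_theta - i_tn / i_nn * score_nuis


# Hypothetical data: z is an intercept column, so the nuisance parameter is a mean level.
x = [1.0, 2.0, 3.0, 4.0]
z = [1.0, 1.0, 1.0, 1.0]
y = [0.8, 2.1, 2.9, 4.3]

i_perp = orthogonalized_info(*info_blocks(x, z))
s_a = orthogonalized_score(y, x, z, theta=0.5, nuis=0.0)
s_b = orthogonalized_score(y, x, z, theta=0.5, nuis=7.0)
```

With an intercept as the nuisance direction, `i_perp` is the centered sum of squares of `x`, and in this particular model the orthogonalized score takes the same value whatever nuisance value is supplied, so `s_a` and `s_b` agree up to rounding.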

The definition of the scalar potential in the multi-parameter case is straightforward. As in the scalar parameter case, the log likelihood $\ell$ is the scalar potential for $s$.

Definition 6.

A scalar potential of $g\in\mathcal{G}$ is any function $G:\mathcal{Y}\times\underline{\Theta}\longrightarrow\mathbb{R}$ such that $g=(\nabla G)^{\bot}$.

The multivariate version of (13) is

$$\nabla\mathsf{E}(f^{\mathsf{t}})=\mathsf{E}\left(\nabla f^{\mathsf{t}}\right)+\left(\nabla\mathsf{E}\right)(f^{\mathsf{t}}) \tag{18}$$

where

$$\left(\nabla\mathsf{E}\right)\left(f^{\mathsf{t}}\right)=\mathsf{E}\left(\left(\nabla\ell\right)f^{\mathsf{t}}\right).$$

Since $g\in H_{M}^{\perp}$ we have $\mathsf{E}\left(\left(\nabla\ell\right)g^{\mathsf{t}}\right)=\mathsf{E}\left(sg^{\mathsf{t}}\right)$ so that the multivariate version of the score equation (14) is

$$\mathsf{E}\left(\nabla g^{\mathsf{t}}\right)+\mathsf{E}\left(sg^{\mathsf{t}}\right)=0. \tag{19}$$

Differentiating with respect to the nuisance parameter we obtain

$$\mathsf{E}(\widetilde{\nabla}g^{\mathsf{t}})+\mathsf{E}(\undertilde{s}g^{\mathsf{t}})=0 \tag{20}$$

so that $g$ being nuisance orthogonal means that the average change of $g$ in the direction of the nuisance parameter is zero.

For the mean slope to be meaningful we need to use its standardized version.

Definition 7.

For $g\in\mathcal{G}$, define

$$\bar{g}=\mathsf{V}^{-1/2}g$$

where $\mathsf{V}=\mathsf{V}\left(g\right)$ so that $\mathsf{V}\left(\bar{g}\right)$ is $I_{\mathsf{id}}$, the $k\times k$ identity matrix. Any $g$ such that $\mathsf{V}\left(g\right)=I_{\mathsf{id}}$ is called a standardized estimator.

Definition 8.

The information for $\theta$ utilized by $g$ is

\begin{align*}
\Lambda\left(g\right) &= \left(\mathsf{E}\nabla\bar{g}^{\mathsf{t}}\right)\left(\mathsf{E}\nabla\bar{g}^{\mathsf{t}}\right)^{\mathsf{t}}\\
&= \left(\mathsf{E}\nabla g^{\mathsf{t}}\right)\mathsf{V}^{-1}(g)\left(\mathsf{E}\nabla g^{\mathsf{t}}\right)^{\mathsf{t}}.
\end{align*}

The scalar information for $\theta$ utilized by $g$ is

$$\lambda(g)=\operatorname{tr}\Lambda(g).$$

Note $\Lambda(g)\in C^{1}(\underline{\Theta},\mathbb{R}^{k}\times\mathbb{R}^{k})$. Using the Frobenius norm for matrix $A$, $\|A\|=\sqrt{\operatorname{tr}(A^{\mathsf{t}}A)}$, we see that the scalar information is the square of the norm of $\mathsf{E}\nabla\bar{g}^{\mathsf{t}}$:

$$\lambda(g)=\|\mathsf{E}\nabla\bar{g}^{\mathsf{t}}\|^{2}.$$

By replacing $\nabla$ with $\widetilde{\nabla}$ in Definition 8 we could define $\undertilde{\Lambda}(g)$, the information for $\undertilde{\theta}$. However, equation (20) shows $\undertilde{\Lambda}(g)=0$ for all $g\in\mathcal{G}$. Restricting estimators to be orthogonal to the space spanned by the nuisance parameters makes inferences independent of the choice of the nuisance parameter but also means that estimators for the parameter of interest have no information for the nuisance parameter.
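To make Definition 8 concrete, consider the scalar case $k=1$ without nuisance parameters, where the definition reduces to $\lambda(g)=(\mathsf{E}\,\partial g/\partial\theta)^{2}/\mathsf{V}(g)$. The sketch below assumes $n$ independent observations from $N(\theta,1)$ and compares the score $\sum(y_i-\theta)$ with the estimating function $\sum(y_i-\theta)^{3}$. The cubic function is a hypothetical choice of ours, and we assume without checking that it satisfies the conditions of Definition 5. The computation uses the exact normal moments $\mathsf{E}Z^{2m}=(2m-1)!!$.

```python
def normal_even_moment(m):
    """E[Z^(2m)] for Z ~ N(0,1), which equals (2m-1)!! = 1*3*...*(2m-1)."""
    out = 1
    for j in range(1, 2 * m, 2):
        out *= j
    return out


def scalar_information(mean_slope, variance):
    """Scalar-parameter form of Definition 8: (E dg/dtheta)^2 / V(g)."""
    return mean_slope ** 2 / variance


n = 10  # illustrative sample size

# g1 = sum(y_i - theta): each term has slope -1 and variance E[Z^2].
info_score = scalar_information(-1.0 * n, n * normal_even_moment(1))

# g2 = sum((y_i - theta)^3): each term has mean slope -3*E[Z^2] and variance E[Z^6].
info_cubic = scalar_information(-3.0 * n * normal_even_moment(1),
                                n * normal_even_moment(3))

ratio = info_cubic / info_score
```

Under these assumptions the score uses information $n$, the Fisher information of the sample, while the cubic function uses $9n/15$, so `ratio` is 0.6 regardless of $n$.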

Theorem 2.

For $k$-dimensional parameter $\theta$ let $s=(\nabla\ell)^{\perp}$ and let $I^{\perp}=\mathsf{V}(s)$ be the orthogonalized Fisher information for $\theta$. For any $g\in\mathcal{G}$, $\Lambda(g)\leq I^{\perp}$ and $s$ attains this bound, $\Lambda(s)=I^{\perp}$. Furthermore,

\begin{align*}
\Lambda(g) &= \mathsf{V}(\mathsf{P}_{g}s)=\mathsf{V}(\mathsf{P}_{g}\nabla\ell)\\
&= (I^{\perp})^{1/2}\,\mathsf{R}\mathsf{R}^{\mathsf{t}}\,(I^{\perp})^{1/2}
\end{align*}

where $\mathsf{R}=\mathsf{E}(\bar{s}\bar{g}^{\mathsf{t}})$ is the correlation matrix between $s$ and $g$.

Proof.

The displayed equations in the Theorem are obtained from the score equation (19), which gives

$$\Lambda(g)=\mathsf{E}(s\bar{g}^{\mathsf{t}})\,\mathsf{E}(\bar{g}s^{\mathsf{t}}).$$

The first equation follows from the definition of the projection and its variance: $\mathsf{P}_{g}s=\mathsf{E}(s\bar{g}^{\mathsf{t}})\bar{g}$ so $\mathsf{V}(\mathsf{P}_{g}s)=\mathsf{E}(s\bar{g}^{\mathsf{t}})\,\mathsf{E}(\bar{g}s^{\mathsf{t}})$. The second equation follows because $\nabla\ell=s+(\nabla\ell)^{\top}$ and $g$ is orthogonal to $\left(\nabla\ell\right)^{\top}$.
The third equation follows from $\mathsf{E}(s\bar{g}^{\mathsf{t}})=(I^{\bot})^{1/2}\,\mathsf{E}(\bar{s}\bar{g}^{\mathsf{t}})$ since $\mathsf{V}(s)=I^{\bot}$. The inequality $\Lambda(g)\leq I^{\bot}$ follows because the squared length of a projection cannot exceed the squared length of the original vector. ∎

When there are no nuisance parameters Theorem 2 holds with $I^{\bot}=I$ and $s=\nabla\ell$.

Definition 9.

The $\Lambda$-efficiency of $g$ is

$$\mathrm{Eff}^{\Lambda}\left(g\right)=(I^{\perp})^{-1/2}\,\Lambda(g)\,(I^{\perp})^{-1/2}.$$

Corollary 2 follows immediately from Theorem 2.

Corollary 2.

\begin{align*}
\mathrm{Eff}^{\Lambda}\left(g\right) &= \mathsf{V}(\mathsf{P}_{g}\bar{s})\\
&= \mathsf{R}\mathsf{R}^{\mathsf{t}}.
\end{align*}
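Corollary 2 can be checked by simulation in the scalar case, where $\mathsf{R}\mathsf{R}^{\mathsf{t}}$ is the squared correlation between $s$ and $g$. The sketch below assumes a single observation from $N(\theta,1)$, so that $s=y-\theta$ and $I=1$, and uses the hypothetical estimating function $g=\tanh(y-\theta)$, which we assume belongs to $\mathcal{G}$. It estimates the efficiency once from the correlation and once from the mean slope in Definition 8; by the score equation (19) the two should agree up to Monte Carlo error.

```python
import math
import random


def efficiency_by_correlation(g, draws):
    """Squared sample correlation between the score s(u) = u and g(u), with u = y - theta."""
    n = len(draws)
    e_sg = sum(u * g(u) for u in draws) / n
    v_s = sum(u * u for u in draws) / n
    v_g = sum(g(u) ** 2 for u in draws) / n
    return e_sg ** 2 / (v_s * v_g)


def efficiency_by_slope(dg, g, draws, fisher_info=1.0):
    """Lambda(g) / I with Lambda(g) = (mean slope)^2 / V(g).

    The slope with respect to theta is -dg(u); the sign disappears after squaring.
    """
    n = len(draws)
    mean_slope = sum(dg(u) for u in draws) / n
    v_g = sum(g(u) ** 2 for u in draws) / n
    return mean_slope ** 2 / v_g / fisher_info


def tanh_slope(u):
    """Derivative of tanh(u) with respect to u."""
    return 1.0 - math.tanh(u) ** 2


rng = random.Random(0)  # fixed seed so that repeated runs give the same draws
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

eff_corr = efficiency_by_correlation(math.tanh, draws)
eff_slope = efficiency_by_slope(tanh_slope, math.tanh, draws)
```

The correlation form never exceeds one for any set of draws, which mirrors the bound $\Lambda(g)\leq I^{\perp}$ of Theorem 2.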

4 Examples

4.1 Normal and $t$-distributions

We consider two one-dimensional manifolds: the normal family and the family of $t$ distributions with 3 degrees of freedom. Both are location families so Fisher information is the same at each distribution in the manifold. We compare three estimators: the sample mean, sample median and the mle obtained from the $t_{3}$ distribution. We did not include the score for the $t_{3}$ distribution since it is very close to the corresponding mle. The sample mean is the mle for normal data.

The sample mean attains the information bound for the normal family and the $t_{3}$ score attains the information bound for the $t_{3}$ family. We use the information of these estimators to assess the cost of model misspecification and explore the relationship between information and the tails of the distribution.
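The cost of misspecification can be sketched per observation with the scalar form of Definition 8, $\lambda(g)=(\mathsf{E}\,\partial g/\partial\theta)^{2}/\mathsf{V}(g)$. The code below is our own numerical illustration and not the computation behind the paper's figures. It assumes unit scale, represents the sample mean by $g=y-\theta$ and the $t_{3}$ estimator by the $t_{3}$ location score $4u/(3+u^{2})$ with $u=y-\theta$, and evaluates expectations by the midpoint rule. For the $t_{3}$ expectations it uses the substitution $u=\sqrt{3}\tan\varphi$, under which the $t_{3}$ density becomes $(2/\pi)\cos^{2}\varphi$ on $(-\pi/2,\pi/2)$.

```python
import math


def t3_expectation(h, steps=20000):
    """E[h(U)] for U ~ t_3, using u = sqrt(3)*tan(phi) and weight (2/pi)*cos(phi)^2."""
    width = math.pi / steps
    total = 0.0
    for j in range(steps):
        phi = -math.pi / 2 + (j + 0.5) * width  # midpoint of the j-th cell
        total += h(math.sqrt(3.0) * math.tan(phi)) * math.cos(phi) ** 2
    return total * width * 2.0 / math.pi


def normal_expectation(h, half_width=10.0, steps=20000):
    """E[h(Z)] for Z ~ N(0,1) by the midpoint rule on [-half_width, half_width]."""
    width = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        u = -half_width + (j + 0.5) * width
        total += h(u) * math.exp(-0.5 * u * u)
    return total * width / math.sqrt(2.0 * math.pi)


def information(expect, g, dg):
    """Scalar information (E dg)^2 / E g^2 for a mean-zero estimating function g(y - theta)."""
    return expect(dg) ** 2 / expect(lambda u: g(u) ** 2)


def mean_g(u):
    return u


def mean_dg(u):
    return 1.0


def t3_g(u):
    return 4.0 * u / (3.0 + u * u)  # location score of the t_3 family


def t3_dg(u):
    return 4.0 * (3.0 - u * u) / (3.0 + u * u) ** 2


# Efficiency of the mean-type function when the data follow t_3.
eff_mean_under_t3 = (information(t3_expectation, mean_g, mean_dg)
                     / information(t3_expectation, t3_g, t3_dg))

# Efficiency of the t_3 score when the data are normal.
eff_t3_under_normal = (information(normal_expectation, t3_g, t3_dg)
                       / information(normal_expectation, mean_g, mean_dg))
```

Under these assumptions the mean-type function has information $1/3$ against a $t_{3}$ Fisher information of $2/3$, an efficiency of one half, while `eff_t3_under_normal` measures what the $t_{3}$ score loses on normal data.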

Figure 3 is based on 100,000 samples of size 10 from a normal distribution and another 100,000 samples of size 10 from the $t_{3}$ distribution. For the graph on the left, 99 quantiles, from .005 to .995, obtained from the 100,000 sample means for the normal data were calculated. Using the empirical cdf for the 100,000 medians these 99 quantiles gave 100 tail areas (the median was included in both tails). Each tail area $T\!A$ was converted to a $\zeta$-score that measures the distance into the tail of the distribution. For continuous random variable $X$ define $\zeta:X\rightarrow\mathbb{R}$ by

ΞΆγœγƒΌγŸ={log2⁑(2⁒Pr⁒(X≀x))ifΒ Pr⁒(X≀x)≀1/2βˆ’log2⁑(2⁒Pr⁒(Xβ‰₯x))ifΒ Pr⁒(X≀x)>1/2.𝜁casessubscript22Pr𝑋π‘₯