Variance-stabilizing transformation
In applied statistics, a variance-stabilizing transformation is a data transformation that is specifically chosen either to simplify considerations in graphical exploratory data analysis or to allow the application of simple regression-based or analysis of variance techniques.[1]
Overview
The aim behind the choice of a variance-stabilizing transformation is to find a simple function ƒ to apply to values x in a data set to create new values y = ƒ(x) such that the variability of the values y is not related to their mean value. For example, suppose that the values x are realizations from different Poisson distributions: i.e. the distributions each have different mean values μ. Then, because for the Poisson distribution the variance is identical to the mean, the variance varies with the mean. However, if the simple variance-stabilizing transformation

$y = \sqrt{x}$

is applied, the sampling variance associated with each observation will be nearly constant: see Anscombe transform for details and some alternative transformations.
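As a quick illustration, the following sketch (a hypothetical example using NumPy; the means and sample size are arbitrary choices) simulates Poisson data with widely different means and compares the variance of the raw and square-root-transformed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson samples with widely different means: the raw variance tracks
# the mean, while the variance of sqrt(x) settles near the constant 1/4.
for mean in (2, 10, 50, 250):
    x = rng.poisson(mean, size=100_000)
    print(f"mean={mean:4d}  var(x)={x.var():8.2f}  var(sqrt(x))={np.sqrt(x).var():.3f}")
```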
While variance-stabilizing transformations are well known for certain parametric families of distributions, such as the Poisson and the binomial distribution, some types of data analysis proceed more empirically: for example, by searching among power transformations to find a suitable fixed transformation. Alternatively, if data analysis suggests a functional form for the relation between variance and mean, this can be used to deduce a variance-stabilizing transformation.[2] Thus if, for a mean μ, the variance is given as h(μ), a suitable basis for a variance-stabilizing transformation would be

$y \propto \int^{x} \frac{d\mu}{\sqrt{h(\mu)}}$

where the arbitrary constant of integration and an arbitrary scaling factor can be chosen for convenience.
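To make this recipe concrete, here is a small numerical sketch (hypothetical code; SciPy's quad performs the integration, and the lower limit and overall scale are the arbitrary choices mentioned above). For the Poisson case h(μ) = μ it reproduces the square-root transformation, up to a factor of 2:

```python
import numpy as np
from scipy.integrate import quad

def vst(x, h, lower=0.0):
    """y(x) = integral from `lower` to x of d(mu) / sqrt(h(mu)).

    The lower limit and the overall scale are the arbitrary
    choices mentioned in the text.
    """
    value, _ = quad(lambda mu: 1.0 / np.sqrt(h(mu)), lower, x)
    return value

# Poisson case h(mu) = mu: the integral gives 2*sqrt(x),
# i.e. the square-root transformation up to an arbitrary factor.
for x in (1.0, 4.0, 9.0):
    print(x, vst(x, h=lambda mu: mu), 2 * np.sqrt(x))
```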
Example: relative variance
If X is a positive random variable and the variance is given as h(μ) = s²μ² for some constant s, then the standard deviation is proportional to the mean, which is called fixed relative error. In this case, the variance-stabilizing transformation is

$y = \int^{x} \frac{d\mu}{\sqrt{s^2 \mu^2}} = \frac{1}{s} \ln x \propto \log x$

That is, the variance-stabilizing transformation is the logarithmic transformation.
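A brief simulation sketch (hypothetical; the relative error s and the means are arbitrary choices) shows the effect: the raw variance grows like (sμ)², while the variance after the logarithm stays near the constant s²:

```python
import numpy as np

rng = np.random.default_rng(1)
s = 0.1  # fixed relative error: standard deviation = s * mean

# var(x) grows like (s*mean)^2, while var(log x) stays near s^2 = 0.01.
for mean in (1.0, 10.0, 100.0):
    x = rng.normal(mean, s * mean, size=100_000)
    print(f"mean={mean:6.1f}  var(x)={x.var():10.3f}  var(log x)={np.log(x).var():.4f}")
```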
Example: absolute plus relative variance
If the variance is given as h(μ) = σ² + s²μ², then the variance is dominated by the fixed variance σ² when |μ| is small enough and by the relative variance s²μ² when |μ| is large enough. In this case, the variance-stabilizing transformation is

$y = \int^{x} \frac{d\mu}{\sqrt{\sigma^2 + s^2 \mu^2}} = \frac{1}{s} \operatorname{asinh} \frac{s x}{\sigma} \propto \operatorname{asinh} \frac{x}{\lambda}, \quad \lambda = \sigma/s$

That is, the variance-stabilizing transformation is the inverse hyperbolic sine of the scaled value x/λ for λ = σ/s.
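The same kind of sketch (hypothetical; σ, s and the means are arbitrary choices) illustrates this case: the raw variance spans several orders of magnitude across the means, while the variance of asinh(x/λ) stays near s²:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, s = 1.0, 0.1   # absolute and relative noise scales
lam = sigma / s       # lambda = sigma / s, as in the text

# var(x) ranges over several orders of magnitude, while
# var(asinh(x / lam)) stays near s^2 = 0.01 throughout.
for mean in (0.0, 10.0, 1000.0):
    x = rng.normal(mean, np.sqrt(sigma**2 + (s * mean) ** 2), size=100_000)
    y = np.arcsinh(x / lam)
    print(f"mean={mean:7.1f}  var(x)={x.var():12.2f}  var(y)={y.var():.4f}")
```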
Example: Pearson correlation
The Fisher transformation, z = artanh(r), is a variance-stabilizing transformation for the Pearson correlation coefficient: for samples of size n from a bivariate normal distribution, the variance of z is approximately 1/(n − 3), essentially independent of the true correlation.
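A brief sketch (hypothetical; NumPy, a bivariate normal model, and the chosen n and correlations are assumptions) checks that the variance of z = artanh(r) is close to 1/(n − 3) regardless of the true correlation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30  # sample size per replication

for rho in (0.0, 0.5, 0.9):
    cov = [[1.0, rho], [rho, 1.0]]
    zs = []
    for _ in range(5_000):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        r = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
        zs.append(np.arctanh(r))  # Fisher z = artanh(r)
    # var(z) is close to 1/(n - 3) for every rho, unlike var(r).
    print(f"rho={rho:.1f}  var(z)={np.var(zs):.4f}  1/(n-3)={1 / (n - 3):.4f}")
```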
Relationship to the delta method
Here, the delta method is presented in a rough way, but it is enough to see the relation with variance-stabilizing transformations. For a more formal approach, see delta method.
Let X be a random variable with E[X] = μ and Var(X) = σ². Define Y = g(X), where g is a regular function. A first-order Taylor approximation for Y = g(X) is:

$Y = g(X) \approx g(\mu) + g'(\mu)(X - \mu)$

From the equation above, we obtain:

$E[Y] = g(\mu)$ and $\operatorname{Var}[Y] = \sigma^2 g'(\mu)^2$
This approximation method is called the delta method.
Consider now a random variable X such that E[X] = μ and Var[X] = h(μ). Notice the relation between the variance and the mean, which implies, for example, heteroscedasticity in a linear model. Therefore, the goal is to find a function g such that Y = g(X) has a variance independent (at least approximately) of its expectation.

Imposing the condition $\operatorname{Var}[Y] \approx h(\mu) g'(\mu)^2 = \text{constant}$, this equality implies the differential equation:

$\frac{dg}{d\mu} = \frac{C}{\sqrt{h(\mu)}}$

This ordinary differential equation has, by separation of variables, the following solution:

$g(\mu) = \int \frac{C \, d\mu}{\sqrt{h(\mu)}}$
This last expression appeared for the first time in a 1947 paper by M. S. Bartlett.[3]
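The derivation can be checked symbolically; a small sketch (hypothetical, using SymPy) evaluates the integral for the Poisson case h(μ) = μ and the relative-variance case h(μ) = s²μ², recovering the square-root and logarithmic transformations from the examples above:

```python
import sympy as sp

mu, s, C = sp.symbols("mu s C", positive=True)

# g(mu) = integral of C / sqrt(h(mu)) d(mu), the solution derived above.
for h in (mu, s**2 * mu**2):
    g = sp.integrate(C / sp.sqrt(h), mu)
    print(f"h(mu) = {h}:  g(mu) = {sp.simplify(g)}")

# Expected output (up to the arbitrary constants):
#   h(mu) = mu:         g(mu) = 2*C*sqrt(mu)  -> square root
#   h(mu) = mu**2*s**2: g(mu) = C*log(mu)/s   -> logarithm
```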
References
1. Everitt, B. S. (2002). The Cambridge Dictionary of Statistics (2nd ed.). CUP. ISBN 0-521-81099-X.
2. Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9.
3. Bartlett, M. S. (1947). "The Use of Transformations". Biometrics. 3: 39–52. doi:10.2307/3001536.