Jackknife resampling

[Figure: schematic of jackknife resampling]

In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size \(n\), a jackknife estimator can be built by aggregating the parameter estimates from each subsample of size \(n-1\) obtained by omitting one observation.[1]

The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.[2]

The jackknife is a linear approximation of the bootstrap.[2]

A simple example: mean estimation

The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset, calculating the parameter estimate over the remaining observations, and then aggregating these calculations.

For example, if the parameter to be estimated is the population mean of a random variable \(x\), then for a given set of i.i.d. observations \(x_1, \ldots, x_n\) the natural estimator is the sample mean:

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i\in[n]} x_i, \]

where the last sum uses another way of indicating that the index \(i\) runs over the set \([n] = \{1, \ldots, n\}\).

Then we proceed as follows: for each \(i \in [n]\) we compute the mean \(\bar{x}_{(i)}\) of the jackknife subsample consisting of all but the \(i\)-th data point, and this is called the \(i\)-th jackknife replicate:

\[ \bar{x}_{(i)} = \frac{1}{n-1}\sum_{j\in[n],\, j\neq i} x_j, \qquad i = 1, \dots, n. \]

It may help to think of these \(n\) jackknife replicates \(\bar{x}_{(1)}, \ldots, \bar{x}_{(n)}\) as an approximation of the distribution of the sample mean \(\bar{x}\), an approximation that improves as \(n\) grows. To obtain the jackknife estimator we then take the average of these \(n\) jackknife replicates:

\[ \bar{x}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n} \bar{x}_{(i)}. \]
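
As a concrete illustration, the following minimal Python sketch (NumPy-based; the function name jackknife_mean and the simulated data are ours, not from the source) builds the leave-one-out replicates and averages them:

```python
import numpy as np

def jackknife_mean(x):
    """Return the leave-one-out replicates and the jackknife estimator of the mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # i-th replicate: mean of the sample with the i-th observation removed
    replicates = np.array([np.delete(x, i).mean() for i in range(n)])
    return replicates, replicates.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=20)
reps, xbar_jack = jackknife_mean(x)
print(xbar_jack, x.mean())  # for the mean, the two estimates coincide
```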

One may ask about the bias and the variance of \(\bar{x}_{\mathrm{jack}}\). From the definition of \(\bar{x}_{\mathrm{jack}}\) as the average of the jackknife replicates, one can try to calculate these explicitly; the bias is a trivial calculation, but the variance of \(\bar{x}_{\mathrm{jack}}\) is more involved since the jackknife replicates are not independent.

For the special case of the mean, one can show explicitly that the jackknife estimate equals the usual estimate:

\[ \frac{1}{n}\sum_{i=1}^{n} \bar{x}_{(i)} = \bar{x}. \]
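
Indeed, writing each replicate in terms of the full-sample mean (a step not spelled out in the original) gives the identity directly:

\[ \bar{x}_{(i)} = \frac{n\bar{x} - x_i}{n-1}, \qquad \text{so} \qquad \frac{1}{n}\sum_{i=1}^{n} \bar{x}_{(i)} = \frac{1}{n}\cdot\frac{n^2\bar{x} - n\bar{x}}{n-1} = \bar{x}. \]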

This establishes the identity \(\bar{x}_{\mathrm{jack}} = \bar{x}\). Taking expectations gives \(E[\bar{x}_{\mathrm{jack}}] = E[\bar{x}] = E[x]\), so \(\bar{x}_{\mathrm{jack}}\) is unbiased, while taking variances gives \(V[\bar{x}_{\mathrm{jack}}] = V[\bar{x}] = V[x]/n\). However, these properties do not generally hold for parameters other than the mean.

This simple example of mean estimation merely illustrates the construction of a jackknife estimator; the real subtleties (and the usefulness) emerge when estimating other parameters, such as moments higher than the mean or other functionals of the distribution.

\(\bar{x}_{\mathrm{jack}}\) could be used to construct an empirical estimate of the bias of \(\bar{x}\), namely \(\widehat{\operatorname{bias}}(\bar{x})_{\mathrm{jack}} = c(\bar{x}_{\mathrm{jack}} - \bar{x})\) with some suitable factor \(c > 0\). In this case we know that \(\bar{x}_{\mathrm{jack}} = \bar{x}\), so the construction adds no meaningful knowledge, but it does give the correct estimate of the bias (which is zero).

A jackknife estimate of the variance of \(\bar{x}\) can be calculated from the variance of the jackknife replicates \(\bar{x}_{(i)}\):[3][4]

\[ \widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}} = \frac{n-1}{n}\sum_{i=1}^{n} \bigl(\bar{x}_{(i)} - \bar{x}_{\mathrm{jack}}\bigr)^2 = \frac{1}{n(n-1)}\sum_{i=1}^{n} (x_i - \bar{x})^2. \]

The left equality defines the estimator \(\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}}\) and the right equality is an identity that can be verified directly. Taking expectations gives \(E[\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}}] = V[x]/n = V[\bar{x}]\), so this is an unbiased estimator of the variance of \(\bar{x}\).
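
The identity can also be checked numerically. A minimal Python sketch (the helper jackknife_var_of_mean is our own naming, not from the source) compares the replicate-based formula with the familiar \(s^2/n\) expression:

```python
import numpy as np

def jackknife_var_of_mean(x):
    """Jackknife estimate of the variance of the sample mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    reps = np.array([np.delete(x, i).mean() for i in range(n)])
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=30)
lhs = jackknife_var_of_mean(x)
rhs = np.sum((x - x.mean()) ** 2) / (len(x) * (len(x) - 1))  # right-hand side of the identity
print(np.isclose(lhs, rhs))  # True: both expressions agree for the mean
```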

Estimating the bias of an estimator

The jackknife technique can be used to estimate (and correct) the bias of an estimator calculated over the entire sample.

Suppose \(\theta\) is the target parameter of interest, which is assumed to be some functional of the distribution of \(x\). Based on a finite set of observations \(x_1, \ldots, x_n\), which is assumed to consist of i.i.d. copies of \(x\), the estimator \(\hat{\theta}\) is constructed:

\[ \hat{\theta} = f_n(x_1, \ldots, x_n). \]

The value of \(\hat{\theta}\) is sample-dependent, so this value will change from one random sample to another.

By definition, the bias of \(\hat{\theta}\) is as follows:

\[ \operatorname{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta. \]

One may wish to compute several values of \(\hat{\theta}\) from several samples and average them to obtain an empirical approximation of \(E[\hat{\theta}]\), but this is impossible when there are no "other samples": the entire set of available observations \(x_1, \ldots, x_n\) was used to calculate \(\hat{\theta}\). In this kind of situation the jackknife resampling technique may be of help.

We construct the jackknife replicates:

\[ \hat{\theta}_{(1)} = f_{n-1}(x_2, x_3, \ldots, x_n) \]
\[ \hat{\theta}_{(2)} = f_{n-1}(x_1, x_3, \ldots, x_n) \]
\[ \vdots \]
\[ \hat{\theta}_{(n)} = f_{n-1}(x_1, x_2, \ldots, x_{n-1}) \]

where each replicate is a "leave-one-out" estimate based on the jackknife subsample consisting of all but one of the data points:

\[ \hat{\theta}_{(i)} = f_{n-1}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), \qquad i = 1, \dots, n. \]

Then we define their average:

\[ \hat{\theta}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n} \hat{\theta}_{(i)}. \]

The jackknife estimate of the bias of \(\hat{\theta}\) is given by:

\[ \widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = (n-1)\bigl(\hat{\theta}_{\mathrm{jack}} - \hat{\theta}\bigr) \]

and the resulting bias-corrected jackknife estimate of \(\theta\) is given by:

\[ \hat{\theta}^{*}_{\mathrm{jack}} = \hat{\theta} - \widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = n\hat{\theta} - (n-1)\hat{\theta}_{\mathrm{jack}}. \]

This removes the bias in the special case that the bias is \(O(n^{-1})\) and reduces it to \(O(n^{-2})\) in other cases.[2]
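
A minimal Python sketch of the procedure (the function jackknife_bias_correct and the plug-in variance example are ours, used only for illustration): the plug-in variance estimator, which divides by \(n\), has bias of order \(1/n\), and the jackknife correction recovers the unbiased sample variance exactly.

```python
import numpy as np

def jackknife_bias_correct(x, estimator):
    """Bias-corrected jackknife estimate of a statistic computed on the full sample."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = estimator(x)                                # full-sample estimate
    reps = np.array([estimator(np.delete(x, i)) for i in range(n)])
    theta_jack = reps.mean()                                # average of leave-one-out replicates
    bias_jack = (n - 1) * (theta_jack - theta_hat)          # jackknife bias estimate
    return theta_hat - bias_jack                            # equals n*theta_hat - (n-1)*theta_jack

rng = np.random.default_rng(2)
x = rng.normal(size=25)

plug_in_var = lambda a: np.mean((a - a.mean()) ** 2)        # biased: divides by n
print(np.isclose(jackknife_bias_correct(x, plug_in_var), x.var(ddof=1)))  # True
```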

Estimating the variance of an estimator

The jackknife technique can also be used to estimate the variance of an estimator calculated over the entire sample.
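
Although the section above does not spell it out, the standard jackknife variance estimator (commonly attributed to Tukey) mirrors the formula given for the mean, with the general leave-one-out replicates in place of \(\bar{x}_{(i)}\):

\[ \widehat{\operatorname{var}}(\hat{\theta})_{\mathrm{jack}} = \frac{n-1}{n}\sum_{i=1}^{n} \bigl(\hat{\theta}_{(i)} - \hat{\theta}_{\mathrm{jack}}\bigr)^2. \]

A minimal Python sketch (the function name and the standard-deviation example are ours, for illustration only):

```python
import numpy as np

def jackknife_variance(x, estimator):
    """Jackknife estimate of the variance of an estimator of a single-sample statistic."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    reps = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=50)

# Estimated standard error of the sample standard deviation
se_sd = np.sqrt(jackknife_variance(x, lambda a: a.std(ddof=1)))
print(se_sd)
```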

See also

  • Leave-one-out error


Notes

  1. Efron 1982, p. 2.
  2. Cameron & Trivedi 2005, p. 375.
  3. Efron 1982, p. 14.
  4. McIntosh, Avery I. "The Jackknife Estimation Method" (PDF). Boston University. Archived from the original (PDF) on 2016-05-14. Retrieved 2016-04-30, p. 3.

References

  • Cameron, Adrian; Trivedi, Pravin K. (2005). Microeconometrics: Methods and Applications. Cambridge; New York: Cambridge University Press. ISBN 9780521848053.
  • Efron, Bradley; Stein, Charles (May 1981). "The Jackknife Estimate of Variance". The Annals of Statistics. 9 (3): 586–596. doi:10.1214/aos/1176345462. JSTOR 2240822.
  • Efron, Bradley (1982). The jackknife, the bootstrap, and other resampling plans. Philadelphia, PA: Society for Industrial and Applied Mathematics. ISBN 9781611970319.
  • Quenouille, Maurice H. (September 1949). "Problems in Plane Sampling". The Annals of Mathematical Statistics. 20 (3): 355–375. doi:10.1214/aoms/1177729989. JSTOR 2236533.
  • Quenouille, Maurice H. (1956). "Notes on Bias in Estimation". Biometrika. 43 (3–4): 353–360. doi:10.1093/biomet/43.3-4.353. JSTOR 2332914.
  • Tukey, John W. (1958). "Bias and confidence in not quite large samples (abstract)". The Annals of Mathematical Statistics. 29 (2): 614. doi:10.1214/aoms/1177706647.