Edgeworth series

The Gram–Charlier A series (named in honor of Jørgen Pedersen Gram and Carl Charlier), and the Edgeworth series (named in honor of Francis Ysidro Edgeworth) are series that approximate a probability distribution in terms of its cumulants.[1] The series are the same, but the arrangement of terms (and thus the accuracy of truncating the series) differ.[2] The key idea of these expansions is to write the characteristic function of the distribution whose probability density function f is to be approximated in terms of the characteristic function of a distribution with known and suitable properties, and to recover f through the inverse Fourier transform.

Gram–Charlier A series

We examine a continuous random variable. Let f̂ be the characteristic function of its distribution, whose density function is f, and let κ_r be its cumulants. We expand in terms of a known distribution with probability density function ψ, characteristic function ψ̂, and cumulants γ_r. The density ψ is generally chosen to be that of the normal distribution, but other choices are possible as well. By the definition of the cumulants, we have (see Wallace, 1958)[3]

\hat{f}(t) = \exp\left[\sum_{r=1}^{\infty}\kappa_r \frac{(it)^r}{r!}\right] and
\hat{\psi}(t) = \exp\left[\sum_{r=1}^{\infty}\gamma_r \frac{(it)^r}{r!}\right],

which gives the following formal identity:

\hat{f}(t) = \exp\left[\sum_{r=1}^{\infty}(\kappa_r - \gamma_r)\frac{(it)^r}{r!}\right]\hat{\psi}(t)\,.

By the properties of the Fourier transform, (it)^r ψ̂(t) is the Fourier transform of (−1)^r [D^r ψ](−x), where D is the differential operator with respect to x. Thus, after replacing x with −x on both sides of the equation, we find for f the formal expansion

f(x) = \exp\left[\sum_{r=1}^{\infty}(\kappa_r - \gamma_r)\frac{(-D)^r}{r!}\right]\psi(x)\,.

If ψ is chosen as the normal density

\phi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]

with mean and variance as given by f, that is, mean μ = κ_1 and variance σ² = κ_2, then the expansion becomes

f(x) = \exp\left[\sum_{r=3}^{\infty}\kappa_r\frac{(-D)^r}{r!}\right]\phi(x),

since γ_r = 0 for all r > 2, as the higher cumulants of the normal distribution are 0. By expanding the exponential and collecting terms according to the order of the derivatives, we arrive at the Gram–Charlier A series. Such an expansion can be written compactly in terms of Bell polynomials as

\exp\left[\sum_{r=3}^{\infty}\kappa_r\frac{(-D)^r}{r!}\right] = \sum_{n=0}^{\infty}B_n(0,0,\kappa_3,\ldots,\kappa_n)\frac{(-D)^n}{n!}.
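Numerically, the complete Bell polynomials appearing here can be generated with the standard recursion B_{m+1} = ∑_{i=0}^{m} C(m, i) B_{m−i} x_{i+1}, starting from B_0 = 1. Below is a minimal Python sketch; the function name is ours, and the cumulant values are illustrative placeholders:

```python
from math import comb

def complete_bell(xs):
    """Return [B_0, B_1, ..., B_n] of the complete Bell polynomials
    evaluated at xs = (x_1, ..., x_n), via the recursion
    B_{m+1} = sum_{i=0}^{m} C(m, i) * B_{m-i} * x_{i+1}."""
    B = [1.0]  # B_0 = 1
    for m in range(len(xs)):
        B.append(sum(comb(m, i) * B[m - i] * xs[i] for i in range(m + 1)))
    return B

# Arguments (0, 0, kappa_3, ..., kappa_n) as in the Gram-Charlier series;
# the cumulant values below are purely illustrative.
k3, k4, k5, k6 = 1.5, 0.8, -0.4, 0.3
B = complete_bell((0.0, 0.0, k3, k4, k5, k6))
print(B[3], B[4])  # B_3 = kappa_3, B_4 = kappa_4
print(B[6])        # B_6 = kappa_6 + 10*kappa_3**2
```

In particular B_3 = κ_3, B_4 = κ_4, B_5 = κ_5, and B_6 = κ_6 + 10κ_3², matching what one obtains by expanding the exponential and collecting derivative orders by hand.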

Since the n-th derivative of the Gaussian function φ is given in terms of Hermite polynomials as

\phi^{(n)}(x) = \frac{(-1)^n}{\sigma^n}He_n\left(\frac{x-\mu}{\sigma}\right)\phi(x),

this gives us the final expression of the Gram–Charlier A series as

f(x) = \phi(x)\sum_{n=0}^{\infty}\frac{1}{n!\,\sigma^n}B_n(0,0,\kappa_3,\ldots,\kappa_n)\,He_n\left(\frac{x-\mu}{\sigma}\right).

Integrating the series gives us the cumulative distribution function

F(x) = \int_{-\infty}^{x} f(u)\,du = \Phi(x) - \phi(x)\sum_{n=3}^{\infty}\frac{1}{n!\,\sigma^{n-1}}B_n(0,0,\kappa_3,\ldots,\kappa_n)\,He_{n-1}\left(\frac{x-\mu}{\sigma}\right),

where Φ is the CDF of the normal distribution.
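Truncating the CDF series at n = 4 (where B_3 = κ_3 and B_4 = κ_4) gives a directly computable approximation. A minimal Python sketch, with function and parameter names of our own choosing:

```python
from math import erf, exp, pi, sqrt

def gc_cdf(x, mu, sigma2, k3, k4):
    """Gram-Charlier CDF truncated at n = 4:
    Phi(z) - phi(x) * [k3/(3! s^2) He_2(z) + k4/(4! s^3) He_3(z)],
    with z = (x - mu)/s and phi(x) the N(mu, s^2) density at x."""
    s = sqrt(sigma2)
    z = (x - mu) / s
    big_phi = 0.5 * (1 + erf(z / sqrt(2)))        # standard normal CDF at z
    phi_x = exp(-z * z / 2) / (sqrt(2 * pi) * s)  # N(mu, s^2) density at x
    he2 = z * z - 1
    he3 = z**3 - 3 * z
    return big_phi - phi_x * (k3 / (6 * s**2) * he2 + k4 / (24 * s**3) * he3)

# With k3 = k4 = 0 the correction vanishes and this reduces to the normal CDF.
print(gc_cdf(0.0, 0.0, 1.0, 0.0, 0.0))  # 0.5
```

A positive κ_3 raises the CDF at the mean (since He_2(0) = −1), reflecting the leftward shift of probability mass in a right-skewed distribution.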

If we include only the first two correction terms to the normal distribution, we obtain

f(x) \approx \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\left[1 + \frac{\kappa_3}{3!\,\sigma^3}He_3\left(\frac{x-\mu}{\sigma}\right) + \frac{\kappa_4}{4!\,\sigma^4}He_4\left(\frac{x-\mu}{\sigma}\right)\right],

with He_3(x) = x^3 - 3x and He_4(x) = x^4 - 6x^2 + 3.
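The same truncation for the density is equally short to code. A minimal Python sketch (names ours); with κ_3 = κ_4 = 0 it reduces to the plain normal density:

```python
from math import exp, pi, sqrt

def gc_pdf(x, mu, sigma2, k3, k4):
    """Gram-Charlier density with the first two correction terms:
    phi(x) * [1 + k3/(3! s^3) He_3(z) + k4/(4! s^4) He_4(z)], z = (x - mu)/s."""
    s = sqrt(sigma2)
    z = (x - mu) / s
    phi = exp(-z * z / 2) / (sqrt(2 * pi) * s)  # N(mu, s^2) density at x
    he3 = z**3 - 3 * z
    he4 = z**4 - 6 * z**2 + 3
    return phi * (1 + k3 / (6 * s**3) * he3 + k4 / (24 * s**4) * he4)

print(gc_pdf(0.0, 0.0, 1.0, 0.0, 0.0))  # 1/sqrt(2*pi) ~ 0.3989
```

Evaluating this on a grid makes the non-positivity issue discussed below easy to observe: for large enough κ_3 or κ_4 the bracketed factor goes negative in the tails.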

Note that this expression is not guaranteed to be positive, and is therefore not a valid probability distribution. The Gram–Charlier A series diverges in many cases of interest; it converges only if f(x) falls off faster than exp(−x²/4) at infinity (Cramér 1957). When it does not converge, the series is also not a true asymptotic expansion, because it is not possible to estimate the error of the expansion. For this reason, the Edgeworth series (see next section) is generally preferred over the Gram–Charlier A series.

The Edgeworth series

Edgeworth developed a similar expansion as an improvement to the central limit theorem.[4] The advantage of the Edgeworth series is that the error is controlled, so that it is a true asymptotic expansion.

Let {Z_i} be a sequence of independent and identically distributed random variables with finite mean μ and variance σ², and let X_n be their standardized sums:

X_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{Z_i - \mu}{\sigma}.

Let F_n denote the cumulative distribution functions of the variables X_n. Then by the central limit theorem,

\lim_{n\to\infty} F_n(x) = \Phi(x) \equiv \int_{-\infty}^{x}\tfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}q^2}\,dq

for every x, as long as the mean and variance are finite.

The standardization of {Z_i} ensures that the first two cumulants of X_n are κ_1^{F_n} = 0 and κ_2^{F_n} = 1. Now assume that, in addition to having mean μ and variance σ², the i.i.d. random variables Z_i have higher cumulants κ_r. From the additivity and homogeneity properties of cumulants, the cumulants of X_n in terms of the cumulants of Z_i are, for r ≥ 2,

\kappa_r^{F_n} = \frac{n\kappa_r}{\sigma^r n^{r/2}} = \frac{\lambda_r}{n^{r/2-1}}, \quad\text{where}\quad \lambda_r = \frac{\kappa_r}{\sigma^r}.

If we expand the formal expression of the characteristic function f̂_n(t) of F_n in terms of the standard normal distribution, that is, if we set

\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}x^2\right),

then the cumulant differences in the expansion are

\kappa_1^{F_n} - \gamma_1 = 0,
\kappa_2^{F_n} - \gamma_2 = 0,
\kappa_r^{F_n} - \gamma_r = \frac{\lambda_r}{n^{r/2-1}}, \qquad r \geq 3.

The Gram–Charlier A series for the density function of X_n is now

f_n(x) = \phi(x)\sum_{r=0}^{\infty}\frac{1}{r!}B_r\left(0,0,\frac{\lambda_3}{n^{1/2}},\ldots,\frac{\lambda_r}{n^{r/2-1}}\right)He_r(x).

The Edgeworth series is developed similarly to the Gram–Charlier A series, only that now terms are collected according to powers of n. The coefficients of the n^{-m/2} term can be obtained by collecting the monomials of the Bell polynomials corresponding to the integer partitions of m. Thus, we have the characteristic function as

\hat{f}_n(t) = \left[1 + \sum_{j=1}^{\infty}\frac{P_j(it)}{n^{j/2}}\right]\exp(-t^2/2)\,,

where P_j(x) is a polynomial of degree 3j. Again, after an inverse Fourier transform, the density function f_n follows as

f_n(x) = \phi(x) + \sum_{j=1}^{\infty}\frac{P_j(-D)}{n^{j/2}}\phi(x)\,.

Likewise, integrating the series, we obtain the distribution function

F_n(x) = \Phi(x) + \sum_{j=1}^{\infty}\frac{1}{n^{j/2}}\frac{P_j(-D)}{D}\phi(x)\,.

We can explicitly write the polynomial P_m(−D) as

P_m(-D) = \sum \prod_i \frac{1}{k_i!}\left(\frac{\lambda_{l_i}}{l_i!}\right)^{k_i}(-D)^s,

where the summation is over all the integer partitions of m such that ∑_i i·k_i = m, with l_i = i + 2 and s = ∑_i k_i l_i.

For example, if m = 3, then there are three ways to partition this number: 1 + 1 + 1 = 2 + 1 = 3. As such we need to examine three cases:

  • 1 + 1 + 1 = 1 · k1, so we have k1 = 3, l1 = 3, and s = 9.
  • 1 + 2 = 1 · k1 + 2 · k2, so we have k1 = 1, k2 = 1, l1 = 3, l2 = 4, and s = 7.
  • 3 = 3 · k3, so we have k3 = 1, l3 = 5, and s = 5.

Thus, the required polynomial is

\begin{aligned}
P_3(-D) &= \frac{1}{3!}\left(\frac{\lambda_3}{3!}\right)^3(-D)^9 + \frac{1}{1!\,1!}\left(\frac{\lambda_3}{3!}\right)\left(\frac{\lambda_4}{4!}\right)(-D)^7 + \frac{1}{1!}\left(\frac{\lambda_5}{5!}\right)(-D)^5\\
&= \frac{\lambda_3^3}{1296}(-D)^9 + \frac{\lambda_3\lambda_4}{144}(-D)^7 + \frac{\lambda_5}{120}(-D)^5.
\end{aligned}
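This partition bookkeeping can be automated in a few lines of Python; the helper names are ours, and exact rational coefficients are kept with fractions.Fraction:

```python
from fractions import Fraction
from math import factorial

def partitions(m, max_part=None):
    """Yield the integer partitions of m as {part: multiplicity} dicts."""
    max_part = m if max_part is None else max_part
    if m == 0:
        yield {}
        return
    for part in range(min(m, max_part), 0, -1):
        for rest in partitions(m - part, part):
            p = dict(rest)
            p[part] = p.get(part, 0) + 1
            yield p

def edgeworth_terms(m):
    """Terms of P_m(-D) as (coefficient, {l_i: k_i}, s) triples, each meaning
    coefficient * prod_i lambda_{l_i}^{k_i} * (-D)^s."""
    terms = []
    for p in partitions(m):
        coef, mono, s = Fraction(1), {}, 0
        for i, k in p.items():
            l = i + 2
            coef *= Fraction(1, factorial(k)) * Fraction(1, factorial(l)) ** k
            mono[l] = k
            s += k * l
        terms.append((coef, mono, s))
    return terms

for coef, mono, s in sorted(edgeworth_terms(3), key=lambda t: t[2]):
    print(f"{coef} * lambda-monomial {mono} * (-D)^{s}")
```

For m = 3 this reproduces the coefficients 1/120, 1/144, and 1/1296 on (−D)^5, (−D)^7, and (−D)^9 from the worked example above.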

The first five terms of the expansion are[5]

\begin{aligned}
f_n(x) &= \phi(x)\\
&\quad - \frac{1}{n^{1/2}}\left(\tfrac{1}{6}\lambda_3\,\phi^{(3)}(x)\right)\\
&\quad + \frac{1}{n}\left(\tfrac{1}{24}\lambda_4\,\phi^{(4)}(x) + \tfrac{1}{72}\lambda_3^2\,\phi^{(6)}(x)\right)\\
&\quad - \frac{1}{n^{3/2}}\left(\tfrac{1}{120}\lambda_5\,\phi^{(5)}(x) + \tfrac{1}{144}\lambda_3\lambda_4\,\phi^{(7)}(x) + \tfrac{1}{1296}\lambda_3^3\,\phi^{(9)}(x)\right)\\
&\quad + \frac{1}{n^2}\left(\tfrac{1}{720}\lambda_6\,\phi^{(6)}(x) + \left(\tfrac{1}{1152}\lambda_4^2 + \tfrac{1}{720}\lambda_3\lambda_5\right)\phi^{(8)}(x) + \tfrac{1}{1728}\lambda_3^2\lambda_4\,\phi^{(10)}(x) + \tfrac{1}{31104}\lambda_3^4\,\phi^{(12)}(x)\right)\\
&\quad + O\left(n^{-5/2}\right).
\end{aligned}

Here, φ^(j)(x) is the j-th derivative of φ(·) at point x. Recalling that the derivatives of the density of the normal distribution are related to the normal density by φ^(n)(x) = (−1)^n He_n(x) φ(x) (where He_n is the Hermite polynomial of order n), this explains the alternative representations in terms of the density function. Blinnikov and Moessner (1998) have given a simple algorithm to calculate higher-order terms of the expansion.

Note that in the case of lattice distributions (which take only discrete values), the Edgeworth expansion must be adjusted to account for the discontinuous jumps between lattice points.[6]

Illustration: density of the sample mean of three χ² distributions

Figure: density of the sample mean of three χ² variables, comparing the true density, the normal approximation, and two Edgeworth expansions.

Take X_i ~ χ²(k = 2), i = 1, 2, 3 (n = 3), and the sample mean X̄ = (1/3)∑_{i=1}^{3} X_i.

We can use several distributions for X̄:

  • The exact distribution, which follows a gamma distribution: X̄ ~ Gamma(α = nk/2, θ = 2/n) = Gamma(α = 3, θ = 2/3).
  • The asymptotic normal distribution: X̄ → N(k, 2k/n) = N(2, 4/3) as n → ∞.
  • Two Edgeworth expansions, of degrees 2 and 3.
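Since χ²(k) has cumulants κ_r = 2^{r−1}(r−1)! k, all ingredients are explicit here: μ = k = 2, σ² = 2k = 4, and λ_3 = κ_3/σ³ = 8k/(2k)^{3/2} = 2. The sketch below (function names ours) compares the exact gamma density, the normal approximation, and a first-order Edgeworth correction for the sample mean:

```python
from math import exp, pi, sqrt

n, k = 3, 2
mu, sigma = k, sqrt(2 * k)       # chi^2(k): mean k, variance 2k
lam3 = (8 * k) / (2 * k) ** 1.5  # lambda_3 = kappa_3 / sigma^3, with kappa_3 = 8k

def phi(z):
    """Standard normal density."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def exact_pdf(x):
    """Mean of n = 3 chi^2(2) variables ~ Gamma(alpha = 3, theta = 2/3)."""
    return x**2 * exp(-1.5 * x) * 27 / 16

def normal_pdf(x):
    """CLT approximation N(mu, sigma^2 / n)."""
    s = sigma / sqrt(n)
    return phi((x - mu) / s) / s

def edgeworth1_pdf(x):
    """First-order Edgeworth density for the standardized mean, mapped to x."""
    z = sqrt(n) * (x - mu) / sigma
    he3 = z**3 - 3 * z
    return phi(z) * (1 + lam3 / (6 * sqrt(n)) * he3) * sqrt(n) / sigma

for x in (1.0, 2.0, 3.0):
    print(x, exact_pdf(x), normal_pdf(x), edgeworth1_pdf(x))
```

Away from the mean (e.g. at x = 1) the Edgeworth correction noticeably reduces the error of the plain normal approximation, consistent with the chart; at x = μ = 2 the skewness term vanishes, since He_3(0) = 0.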

Discussion of results

  • For finite samples, an Edgeworth expansion is not guaranteed to be a proper probability distribution, as the CDF values at some points may go beyond [0, 1].
  • Edgeworth expansions guarantee (asymptotically) absolute errors, but relative errors can be easily assessed by comparing the leading Edgeworth term in the remainder with the overall leading term.[2]

References

  1. ^ Stuart, A., & Kendall, M. G. (1968). The advanced theory of statistics. Hafner Publishing Company.
  2. ^ a b Kolassa, John E. (2006). Series approximation methods in statistics (3rd ed.). Springer. ISBN 0387322272.
  3. ^ Wallace, D. L. (1958). "Asymptotic Approximations to Distributions". Annals of Mathematical Statistics. 29 (3): 635–654. doi:10.1214/aoms/1177706528. JSTOR 2237255.
  4. ^ Hall, P. (2013). The bootstrap and Edgeworth expansion. Springer Science & Business Media.
  5. ^ Weisstein, Eric W. "Edgeworth Series". MathWorld.
  6. ^ Kolassa, John E.; McCullagh, Peter (1990). "Edgeworth series for lattice distributions". Annals of Statistics. 18 (2): 981–985. doi:10.1214/aos/1176347637. JSTOR 2242145.

Further reading

  • H. Cramér. (1957). Mathematical Methods of Statistics. Princeton University Press, Princeton.
  • Wallace, D. L. (1958). "Asymptotic approximations to distributions". Annals of Mathematical Statistics. 29 (3): 635–654. doi:10.1214/aoms/1177706528.
  • M. Kendall & A. Stuart. (1977), The advanced theory of statistics, Vol 1: Distribution theory, 4th Edition, Macmillan, New York.
  • P. McCullagh (1987). Tensor Methods in Statistics. Chapman and Hall, London.
  • D. R. Cox and O. E. Barndorff-Nielsen (1989). Asymptotic Techniques for Use in Statistics. Chapman and Hall, London.
  • P. Hall (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
  • "Edgeworth series", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
  • Blinnikov, S.; Moessner, R. (1998). "Expansions for nearly Gaussian distributions" (PDF). Astronomy and Astrophysics Supplement Series. 130: 193–205. arXiv:astro-ph/9711239. Bibcode:1998A&AS..130..193B. doi:10.1051/aas:1998221.
  • Martin, Douglas; Arora, Rohit (2017). "Inefficiency and bias of modified value-at-risk and expected shortfall". Journal of Risk. 19 (6): 59–84. doi:10.21314/JOR.2017.365.
  • J. E. Kolassa (2006). Series Approximation Methods in Statistics (3rd ed.). (Lecture Notes in Statistics #88). Springer, New York.