Taylor expansions for the moments of functions of random variables


In probability theory, it is possible to approximate the moments of a function $f$ of a random variable $X$ using Taylor expansions, provided that $f$ is sufficiently differentiable and that the moments of $X$ are finite.

First moment

Given $\mu_X$ and $\sigma_X^2$, the mean and the variance of $X$, respectively,[1] a Taylor expansion of the expected value of $f(X)$ can be found via

$$
\begin{aligned}
\operatorname{E}[f(X)] &= \operatorname{E}\left[f\left(\mu_X + (X - \mu_X)\right)\right] \\
&\approx \operatorname{E}\left[f(\mu_X) + f'(\mu_X)(X - \mu_X) + \tfrac{1}{2} f''(\mu_X)(X - \mu_X)^2\right] \\
&= f(\mu_X) + f'(\mu_X)\operatorname{E}[X - \mu_X] + \tfrac{1}{2} f''(\mu_X)\operatorname{E}\!\left[(X - \mu_X)^2\right].
\end{aligned}
$$

Since $\operatorname{E}[X - \mu_X] = 0$, the second term vanishes, and $\operatorname{E}\!\left[(X - \mu_X)^2\right] = \sigma_X^2$. Therefore,

$$\operatorname{E}[f(X)] \approx f(\mu_X) + \frac{f''(\mu_X)}{2}\,\sigma_X^2.$$
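
As a quick numerical sanity check, this approximation can be compared against a Monte Carlo estimate. The sketch below is a minimal illustration, not from the source; it assumes the choices $f = \exp$ and $X \sim N(0.5,\, 0.2^2)$, for which $\operatorname{E}[f(X)] = e^{\mu + \sigma^2/2}$ is known exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (assumptions, not from the source):
# X ~ Normal(mu, sigma) and f(x) = exp(x), so f''(x) = exp(x)
# and E[f(X)] = exp(mu + sigma^2 / 2) in closed form.
mu, sigma = 0.5, 0.2

x = rng.normal(mu, sigma, size=1_000_000)
mc_mean = np.exp(x).mean()                              # Monte Carlo estimate of E[f(X)]
taylor_mean = np.exp(mu) + 0.5 * np.exp(mu) * sigma**2  # f(mu) + f''(mu) sigma^2 / 2

print(mc_mean, taylor_mean, np.exp(mu + sigma**2 / 2))  # all agree to ~3 decimals
```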

It is possible to generalize this to functions of more than one variable using multivariate Taylor expansions. For example,

$$\operatorname{E}\!\left[\frac{X}{Y}\right] \approx \frac{\operatorname{E}[X]}{\operatorname{E}[Y]} - \frac{\operatorname{cov}[X,Y]}{\operatorname{E}[Y]^2} + \frac{\operatorname{E}[X]}{\operatorname{E}[Y]^3}\,\operatorname{var}[Y].$$
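
A small sketch of this ratio approximation, assuming a jointly normal pair $(X, Y)$ with $\operatorname{E}[Y]$ well away from zero (illustrative parameter values, not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed correlated pair; E[Y] = 5 keeps X / Y well behaved.
mean = np.array([2.0, 5.0])
cov = np.array([[0.30, 0.10],
                [0.10, 0.20]])
x, y = rng.multivariate_normal(mean, cov, size=1_000_000).T

mc = np.mean(x / y)
taylor = (mean[0] / mean[1]
          - cov[0, 1] / mean[1]**2
          + mean[0] * cov[1, 1] / mean[1]**3)
print(mc, taylor)  # approximately equal, up to higher-order terms
```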

Second moment

Similarly,[1]

$$\operatorname{var}[f(X)] \approx \left(f'(\operatorname{E}[X])\right)^2 \operatorname{var}[X] - \frac{1}{4}\left(f''(\operatorname{E}[X])\right)^2 \operatorname{var}[X]^2 = \left(f'(\mu_X)\right)^2 \sigma_X^2 - \frac{1}{4}\left(f''(\mu_X)\right)^2 \sigma_X^4.$$

The above is obtained using a second-order approximation, following the method used in estimating the first moment. It will be a poor approximation in cases where $f(X)$ is highly nonlinear. This is a special case of the delta method.

Indeed, taking the first-moment approximation derived above,

$$\operatorname{E}[f(X)] \approx f(\mu_X) + \frac{f''(\mu_X)}{2}\,\sigma_X^2,$$

and applying it with $f(X) = g(X)^2$ yields an approximation for $\operatorname{E}[Y^2]$, where $Y = g(X)$. The variance is then computed from the identity $\operatorname{var}[Y] = \operatorname{E}[Y^2] - \mu_Y^2$.
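
The resulting variance approximation can likewise be checked numerically. The following sketch reuses the same assumed $f = \exp$ and Gaussian $X$ as in the earlier example, comparing the plain delta-method term and the corrected second-order value against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(2)

# f(x) = exp(x), so f'(x) = f''(x) = exp(x).
mu, sigma = 0.5, 0.2
x = rng.normal(mu, sigma, size=1_000_000)

mc_var = np.exp(x).var()
first_order = np.exp(mu)**2 * sigma**2                        # (f'(mu))^2 sigma^2
second_order = first_order - 0.25 * np.exp(mu)**2 * sigma**4  # minus (1/4)(f''(mu))^2 sigma^4

print(mc_var, first_order, second_order)  # same order of magnitude; correction is small
```

For this particular $f$ and a normal $X$, the Gaussian-specific formula given further below is the more accurate of the two second-order variants.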

For example,

$$\operatorname{var}\!\left[\frac{X}{Y}\right] \approx \frac{\operatorname{var}[X]}{\operatorname{E}[Y]^2} - \frac{2\operatorname{E}[X]}{\operatorname{E}[Y]^3}\,\operatorname{cov}[X,Y] + \frac{\operatorname{E}[X]^2}{\operatorname{E}[Y]^4}\,\operatorname{var}[Y].$$
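
A numerical check of this ratio-variance formula, reusing the assumed jointly normal pair from the earlier sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

mean = np.array([2.0, 5.0])
cov = np.array([[0.30, 0.10],
                [0.10, 0.20]])
x, y = rng.multivariate_normal(mean, cov, size=1_000_000).T

mc = np.var(x / y)
taylor = (cov[0, 0] / mean[1]**2
          - 2 * mean[0] * cov[0, 1] / mean[1]**3
          + mean[0]**2 * cov[1, 1] / mean[1]**4)
print(mc, taylor)  # approximately equal, up to higher-order terms
```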

The second-order approximation, when $X$ follows a normal distribution, is:[2]

$$\operatorname{var}[f(X)] \approx \left(f'(\operatorname{E}[X])\right)^2 \operatorname{var}[X] + \frac{\left(f''(\operatorname{E}[X])\right)^2}{2}\,\operatorname{var}[X]^2 + f'(\operatorname{E}[X])\,f'''(\operatorname{E}[X])\,\operatorname{var}[X]^2 = \left(f'(\mu_X)\right)^2 \sigma_X^2 + \frac{1}{2}\left(f''(\mu_X)\right)^2 \sigma_X^4 + f'(\mu_X)\,f'''(\mu_X)\,\sigma_X^4.$$
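
Since the exact variance of $\exp(X)$ is known for normal $X$ (the lognormal variance $(e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$), the generic and Gaussian-specific second-order formulas can be compared directly. A minimal sketch, assuming $f = \exp$ so that $f' = f'' = f''' = f$ (illustrative parameters):

```python
import numpy as np

# X ~ Normal(mu, sigma); exp(X) is lognormal with known variance.
mu, sigma = 0.5, 0.2
d = np.exp(mu)  # common value of f(mu), f'(mu), f''(mu), f'''(mu)

generic  = d**2 * sigma**2 - 0.25 * d**2 * sigma**4                   # general formula above
gaussian = d**2 * sigma**2 + 0.5 * d**2 * sigma**4 + d**2 * sigma**4  # normal-X formula
exact    = (np.exp(sigma**2) - 1.0) * np.exp(2 * mu + sigma**2)

print(generic, gaussian, exact)  # the Gaussian-specific value lands closest
```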

First product moment

To find a second-order approximation for the covariance of functions of two random variables (with the same function applied to both), one can proceed as follows. First, note that $\operatorname{cov}[f(X), f(Y)] = \operatorname{E}[f(X)f(Y)] - \operatorname{E}[f(X)]\operatorname{E}[f(Y)]$. Since a second-order expansion for $\operatorname{E}[f(X)]$ has already been derived above, it only remains to find $\operatorname{E}[f(X)f(Y)]$. Treating $f(X)f(Y)$ as a two-variable function, the second-order Taylor expansion is as follows:

$$
\begin{aligned}
f(X)f(Y) \approx{}& f(\mu_X)f(\mu_Y) + (X - \mu_X)f'(\mu_X)f(\mu_Y) + (Y - \mu_Y)f(\mu_X)f'(\mu_Y) \\
&+ \frac{1}{2}\Big[(X - \mu_X)^2 f''(\mu_X)f(\mu_Y) + 2(X - \mu_X)(Y - \mu_Y)f'(\mu_X)f'(\mu_Y) + (Y - \mu_Y)^2 f(\mu_X)f''(\mu_Y)\Big]
\end{aligned}
$$

Taking expectation of the above and simplifying, making use of the identities $\operatorname{E}(X^2) = \operatorname{var}(X) + [\operatorname{E}(X)]^2$ and $\operatorname{E}(XY) = \operatorname{cov}(X,Y) + [\operatorname{E}(X)][\operatorname{E}(Y)]$, leads to

$$\operatorname{E}[f(X)f(Y)] \approx f(\mu_X)f(\mu_Y) + f'(\mu_X)f'(\mu_Y)\operatorname{cov}(X,Y) + \frac{1}{2}f''(\mu_X)f(\mu_Y)\operatorname{var}(X) + \frac{1}{2}f(\mu_X)f''(\mu_Y)\operatorname{var}(Y).$$

Hence,

$$
\begin{aligned}
\operatorname{cov}[f(X), f(Y)] \approx{}& f(\mu_X)f(\mu_Y) + f'(\mu_X)f'(\mu_Y)\operatorname{cov}(X,Y) + \frac{1}{2}f''(\mu_X)f(\mu_Y)\operatorname{var}(X) + \frac{1}{2}f(\mu_X)f''(\mu_Y)\operatorname{var}(Y) \\
&- \left[f(\mu_X) + \frac{1}{2}f''(\mu_X)\operatorname{var}(X)\right]\left[f(\mu_Y) + \frac{1}{2}f''(\mu_Y)\operatorname{var}(Y)\right] \\
={}& f'(\mu_X)f'(\mu_Y)\operatorname{cov}(X,Y) - \frac{1}{4}f''(\mu_X)f''(\mu_Y)\operatorname{var}(X)\operatorname{var}(Y)
\end{aligned}
$$
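
A sketch checking this covariance approximation, again with the assumed choice $f = \exp$ (so $f' = f'' = f$) and an illustrative jointly normal $(X, Y)$, for which $\operatorname{cov}(e^X, e^Y)$ has a known closed form:

```python
import numpy as np

rng = np.random.default_rng(4)

mean = np.array([0.3, 0.6])
cov = np.array([[0.04, 0.02],
                [0.02, 0.05]])
x, y = rng.multivariate_normal(mean, cov, size=1_000_000).T

mc = np.cov(np.exp(x), np.exp(y))[0, 1]
fx, fy = np.exp(mean)  # f(mu_X), f(mu_Y); also the first and second derivatives here
taylor = fx * fy * cov[0, 1] - 0.25 * fx * fy * cov[0, 0] * cov[1, 1]

print(mc, taylor)  # agree to within higher-order terms
```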

Random vectors

If $X$ is a random vector, the approximations for the mean and variance of $f(X)$ are given by[3]

$$
\begin{aligned}
\operatorname{E}(f(X)) &\approx f(\mu_X) + \frac{1}{2}\operatorname{trace}\left(H_f(\mu_X)\,\Sigma_X\right) \\
\operatorname{var}(f(X)) &\approx \nabla f(\mu_X)^{t}\,\Sigma_X\,\nabla f(\mu_X) + \frac{1}{2}\operatorname{trace}\left(H_f(\mu_X)\,\Sigma_X\,H_f(\mu_X)\,\Sigma_X\right).
\end{aligned}
$$

Here $\nabla f$ and $H_f$ denote the gradient and the Hessian matrix of $f$, respectively, and $\Sigma_X$ is the covariance matrix of $X$.
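
As an illustration, for the assumed quadratic test function $f(x_1, x_2) = x_1 x_2$ the gradient and Hessian are available in closed form, and for Gaussian $X$ both approximations are in fact exact, which makes a convenient end-to-end check:

```python
import numpy as np

rng = np.random.default_rng(5)

# f(x1, x2) = x1 * x2: gradient (x2, x1), constant Hessian [[0, 1], [1, 0]].
mu = np.array([1.0, 2.0])
Sigma = np.array([[0.20, 0.05],
                  [0.05, 0.10]])

grad = np.array([mu[1], mu[0]])
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])

mean_approx = mu[0] * mu[1] + 0.5 * np.trace(H @ Sigma)
var_approx = grad @ Sigma @ grad + 0.5 * np.trace(H @ Sigma @ H @ Sigma)

xs = rng.multivariate_normal(mu, Sigma, size=1_000_000)
prod = xs[:, 0] * xs[:, 1]
print(mean_approx, prod.mean())  # exact for this quadratic f
print(var_approx, prod.var())
```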

Notes

  1. Benaroya, Haym; Han, Seon Mi; Nagurka, Mark. Probability Models in Engineering and Science. CRC Press, 2005, p. 166.
  2. Hendeby, Gustaf; Gustafsson, Fredrik. "On Nonlinear Transformations of Gaussian Distributions" (PDF). Retrieved 5 October 2017.
  3. Rego, Bruno V.; Weiss, Dar; Bersi, Matthew R.; Humphrey, Jay D. (14 December 2021). "Uncertainty quantification in subject-specific estimation of local vessel mechanical properties". International Journal for Numerical Methods in Biomedical Engineering. 37 (12): e3535. doi:10.1002/cnm.3535. ISSN 2040-7939. PMC 9019846. PMID 34605615.