Moment-generating function

Concept in probability theory and statistics

In probability theory and statistics, the moment-generating function of a real-valued random variable is an alternative specification of its probability distribution. Thus, it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the moment-generating functions of distributions defined by the weighted sums of random variables. However, not all random variables have moment-generating functions.

As its name implies, the moment-generating function can be used to compute a distribution’s moments: the nth moment about 0 is the nth derivative of the moment-generating function, evaluated at 0.

In addition to real-valued distributions (univariate distributions), moment-generating functions can be defined for vector- or matrix-valued random variables, and can even be extended to more general cases.

The moment-generating function of a real-valued distribution does not always exist, unlike the characteristic function. There are relations between the behavior of the moment-generating function of a distribution and properties of the distribution, such as the existence of moments.

Definition

Let $X$ be a random variable with CDF $F_X$. The moment generating function (mgf) of $X$ (or $F_X$), denoted by $M_X(t)$, is

$$M_X(t) = \operatorname{E}\left[e^{tX}\right]$$

provided this expectation exists for $t$ in some open neighborhood of 0. That is, there is an $h > 0$ such that for all $t$ in $-h < t < h$, $\operatorname{E}\left[e^{tX}\right]$ exists. If the expectation does not exist in an open neighborhood of 0, we say that the moment generating function does not exist.[1]

In other words, the moment-generating function of $X$ is the expectation of the random variable $e^{tX}$. More generally, when $\mathbf{X} = (X_1, \ldots, X_n)^{\mathrm{T}}$ is an $n$-dimensional random vector and $\mathbf{t}$ is a fixed vector, one uses $\mathbf{t} \cdot \mathbf{X} = \mathbf{t}^{\mathrm{T}} \mathbf{X}$ instead of $tX$:

$$M_{\mathbf{X}}(\mathbf{t}) := \operatorname{E}\left(e^{\mathbf{t}^{\mathrm{T}} \mathbf{X}}\right).$$

$M_X(0)$ always exists and is equal to 1. However, a key problem with moment-generating functions is that moments and the moment-generating function may not exist, as the integrals need not converge absolutely. By contrast, the characteristic function or Fourier transform always exists (because it is the integral of a bounded function on a space of finite measure), and for some purposes may be used instead.
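As a minimal sketch of the definition, the following Python snippet evaluates the expectation of $e^{tX}$ by direct summation for a finite discrete distribution. The helper name `mgf_discrete` and the fair six-sided die are illustrative assumptions, not part of the definition itself.

```python
import math

def mgf_discrete(pmf, t):
    """Return E[exp(t*X)] for a finite discrete distribution.

    `pmf` maps each possible value x to its probability P(X = x).
    """
    return sum(p * math.exp(t * x) for x, p in pmf.items())

# Illustrative example: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}

value_at_zero = mgf_discrete(die, 0.0)  # the mgf at t = 0 is always 1
value_at_half = mgf_discrete(die, 0.5)  # a finite sum, so it exists for every t
```

Because the sum is finite, this particular moment-generating function exists for every real $t$; the convergence problems described above arise only for distributions with unbounded support.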

The moment-generating function is so named because it can be used to find the moments of the distribution.[2] The series expansion of $e^{tX}$ is

$$e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots + \frac{t^n X^n}{n!} + \cdots .$$

Hence

$$\begin{aligned}
M_X(t) = \operatorname{E}(e^{tX}) &= 1 + t\operatorname{E}(X) + \frac{t^2 \operatorname{E}(X^2)}{2!} + \frac{t^3 \operatorname{E}(X^3)}{3!} + \cdots + \frac{t^n \operatorname{E}(X^n)}{n!} + \cdots \\
&= 1 + t m_1 + \frac{t^2 m_2}{2!} + \frac{t^3 m_3}{3!} + \cdots + \frac{t^n m_n}{n!} + \cdots ,
\end{aligned}$$

where $m_n$ is the $n$th moment. Differentiating $M_X(t)$ $i$ times with respect to $t$ and setting $t = 0$, we obtain the $i$th moment about the origin, $m_i$; see Calculations of moments below.

If $X$ is a continuous random variable, the following relation between its moment-generating function $M_X(t)$ and the two-sided Laplace transform of its probability density function $f_X(x)$ holds:

$$M_X(t) = \mathcal{L}\{f_X\}(-t),$$

since the PDF's two-sided Laplace transform is given as

$$\mathcal{L}\{f_X\}(s) = \int_{-\infty}^{\infty} e^{-sx} f_X(x)\,dx,$$

and the moment-generating function's definition expands (by the law of the unconscious statistician) to

$$M_X(t) = \operatorname{E}\left[e^{tX}\right] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.$$

This is consistent with the characteristic function of $X$ being a Wick rotation of $M_X(t)$ when the moment generating function exists, as the characteristic function of a continuous random variable $X$ is the Fourier transform of its probability density function $f_X(x)$, and in general when a function $f(x)$ is of exponential order, the Fourier transform of $f$ is a Wick rotation of its two-sided Laplace transform in the region of convergence. See the relation of the Fourier and Laplace transforms for further information.
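The integral form of the definition can be checked numerically. The sketch below approximates the integral of $e^{tx} f_X(x)$ for an exponential density with the midpoint rule and compares it with the closed form $(1 - t/\lambda)^{-1}$, valid for $t < \lambda$. The function names, the truncation point, and the step count are arbitrary choices made for this illustration.

```python
import math

def exponential_pdf(x, rate):
    """Density of an exponential distribution with the given rate, for x >= 0."""
    return rate * math.exp(-rate * x)

def mgf_by_integration(t, rate, upper=40.0, steps=200_000):
    """Approximate the integral of exp(t*x) * f(x) over [0, upper] by the midpoint rule.

    The density is zero for x < 0, and `upper` truncates the infinite range;
    both the cutoff and the step count are assumptions of this sketch.
    """
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += math.exp(t * x) * exponential_pdf(x, rate)
    return total * width

rate, t = 2.0, 0.5
numeric = mgf_by_integration(t, rate)
closed_form = 1.0 / (1.0 - t / rate)  # requires t < rate
```

For $t \geq \lambda$ the integrand no longer decays and the truncated sum would grow with `upper`, which is the numerical face of the moment-generating function failing to exist there.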

Examples

Here are some examples of the moment-generating function and the characteristic function for comparison. It can be seen that the characteristic function is a Wick rotation of the moment-generating function $M_X(t)$ when the latter exists.

| Distribution | Moment-generating function $M_X(t)$ | Characteristic function $\varphi(t)$ |
| --- | --- | --- |
| Degenerate $\delta_a$ | $e^{ta}$ | $e^{ita}$ |
| Bernoulli $P(X=1)=p$ | $1-p+pe^{t}$ | $1-p+pe^{it}$ |
| Geometric $(1-p)^{k-1}\,p$ | $\frac{pe^{t}}{1-(1-p)e^{t}},~t<-\ln(1-p)$ | $\frac{pe^{it}}{1-(1-p)\,e^{it}}$ |
| Binomial $B(n,p)$ | $\left(1-p+pe^{t}\right)^{n}$ | $\left(1-p+pe^{it}\right)^{n}$ |
| Negative binomial $\operatorname{NB}(r,p)$ | $\left(\frac{p}{1-e^{t}+pe^{t}}\right)^{r},~t<-\ln(1-p)$ | $\left(\frac{p}{1-e^{it}+pe^{it}}\right)^{r}$ |
| Poisson $\operatorname{Pois}(\lambda)$ | $e^{\lambda(e^{t}-1)}$ | $e^{\lambda(e^{it}-1)}$ |
| Uniform (continuous) $\operatorname{U}(a,b)$ | $\frac{e^{tb}-e^{ta}}{t(b-a)}$ | $\frac{e^{itb}-e^{ita}}{it(b-a)}$ |
| Uniform (discrete) $\operatorname{DU}(a,b)$ | $\frac{e^{at}-e^{(b+1)t}}{(b-a+1)(1-e^{t})}$ | $\frac{e^{ait}-e^{(b+1)it}}{(b-a+1)(1-e^{it})}$ |
| Laplace $L(\mu,b)$ | $\frac{e^{t\mu}}{1-b^{2}t^{2}},~\lvert t\rvert<1/b$ | $\frac{e^{it\mu}}{1+b^{2}t^{2}}$ |
| Normal $N(\mu,\sigma^{2})$ | $e^{t\mu+\frac{1}{2}\sigma^{2}t^{2}}$ | $e^{it\mu-\frac{1}{2}\sigma^{2}t^{2}}$ |
| Chi-squared $\chi_{k}^{2}$ | $(1-2t)^{-\frac{k}{2}},~t<1/2$ | $(1-2it)^{-\frac{k}{2}}$ |
| Noncentral chi-squared $\chi_{k}^{2}(\lambda)$ | $e^{\lambda t/(1-2t)}(1-2t)^{-\frac{k}{2}}$ | $e^{i\lambda t/(1-2it)}(1-2it)^{-\frac{k}{2}}$ |
| Gamma $\Gamma(k,\theta)$ | $(1-t\theta)^{-k},~t<\tfrac{1}{\theta}$ | $(1-it\theta)^{-k}$ |
| Exponential $\operatorname{Exp}(\lambda)$ | $\left(1-t\lambda^{-1}\right)^{-1},~t<\lambda$ | $\left(1-it\lambda^{-1}\right)^{-1}$ |
| Beta | $1+\sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{t^{k}}{k!}$ | ${}_{1}F_{1}(\alpha;\alpha+\beta;i\,t)$ (see Confluent hypergeometric function) |
| Multivariate normal $N(\boldsymbol{\mu},\boldsymbol{\Sigma})$ | $e^{\mathbf{t}^{\mathrm{T}}\left(\boldsymbol{\mu}+\frac{1}{2}\boldsymbol{\Sigma}\mathbf{t}\right)}$ | $e^{\mathbf{t}^{\mathrm{T}}\left(i\boldsymbol{\mu}-\frac{1}{2}\boldsymbol{\Sigma}\mathbf{t}\right)}$ |
| Cauchy $\operatorname{Cauchy}(\mu,\theta)$ | Does not exist | $e^{it\mu-\theta\lvert t\rvert}$ |
| Multivariate Cauchy $\operatorname{MultiCauchy}(\mu,\Sigma)$ [3] | Does not exist | $e^{i\mathbf{t}^{\mathrm{T}}\boldsymbol{\mu}-\sqrt{\mathbf{t}^{\mathrm{T}}\boldsymbol{\Sigma}\mathbf{t}}}$ |

Calculation

The moment-generating function is the expectation of a function of the random variable. When $X$ has a continuous probability density function $f(x)$, $M_X(-t)$ is the two-sided Laplace transform of $f(x)$, and the moment-generating function can be written as:

$$\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} e^{tx} f(x)\,dx \\
&= \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^2 x^2}{2!} + \cdots + \frac{t^n x^n}{n!} + \cdots\right) f(x)\,dx \\
&= 1 + t m_1 + \frac{t^2 m_2}{2!} + \cdots + \frac{t^n m_n}{n!} + \cdots ,
\end{aligned}$$

where $m_n$ is the $n$th moment.

Linear transformations of random variables

If a random variable $X$ has moment generating function $M_X(t)$, then $\alpha X + \beta$ has moment generating function $M_{\alpha X+\beta}(t) = e^{\beta t} M_X(\alpha t)$, since

$$M_{\alpha X+\beta}(t) = E[e^{(\alpha X+\beta)t}] = e^{\beta t} E[e^{\alpha X t}] = e^{\beta t} M_X(\alpha t).$$
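The identity can be sanity-checked on the normal family, where a linear transformation of a normal variable is again normal with mean $\alpha\mu + \beta$ and standard deviation $|\alpha|\sigma$. The sketch below compares both sides of the identity at one point; the parameter values are arbitrary and chosen only for illustration.

```python
import math

def normal_mgf(t, mu, sigma):
    """Moment-generating function of N(mu, sigma^2): exp(mu*t + sigma^2 * t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma**2 * t**2)

mu, sigma = 1.0, 2.0      # parameters of X (illustrative)
alpha, beta = -3.0, 0.5   # coefficients of the linear transformation
t = 0.2

# Left side: mgf of alpha*X + beta, computed from its own normal parameters.
lhs = normal_mgf(t, alpha * mu + beta, abs(alpha) * sigma)

# Right side: exp(beta*t) * M_X(alpha*t), as given by the identity.
rhs = math.exp(beta * t) * normal_mgf(alpha * t, mu, sigma)
```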

Linear combination of independent random variables

If $S_n = \sum_{i=1}^{n} a_i X_i$, where the $X_i$ are independent random variables and the $a_i$ are constants, then the probability density function for $S_n$ is the convolution of the probability density functions of the scaled variables $a_i X_i$, and the moment-generating function for $S_n$ is given by

$$M_{S_n}(t) = M_{X_1}(a_1 t)\, M_{X_2}(a_2 t) \cdots M_{X_n}(a_n t).$$
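A familiar special case is a sum of $n$ independent Bernoulli variables with all $a_i = 1$, which is binomial. The sketch below compares the product of Bernoulli moment-generating functions with a direct summation over the binomial probability mass function; the function names and parameter values are illustrative assumptions.

```python
import math

def bernoulli_mgf(t, p):
    """Moment-generating function of a Bernoulli(p) variable: 1 - p + p*e^t."""
    return 1 - p + p * math.exp(t)

def binomial_mgf_direct(t, n, p):
    """E[exp(t*S)] summed directly over the binomial probability mass function."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * math.exp(t * k)
        for k in range(n + 1)
    )

n, p, t = 10, 0.3, 0.4
product_of_mgfs = bernoulli_mgf(t, p) ** n  # product rule with every a_i = 1
direct = binomial_mgf_direct(t, n, p)
```

The two values agree up to floating-point rounding, and no convolution of probability mass functions has to be carried out to obtain the product form.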

Vector-valued random variables

For vector-valued random variables $\mathbf{X}$ with real components, the moment-generating function is given by

$$M_{\mathbf{X}}(\mathbf{t}) = E\left(e^{\langle \mathbf{t}, \mathbf{X} \rangle}\right)$$

where $\mathbf{t}$ is a vector and $\langle \cdot, \cdot \rangle$ is the dot product.
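As a small illustration of the vector case, the sketch below evaluates the expectation of $e^{\langle \mathbf{t}, \mathbf{X} \rangle}$ for a random vector with finitely many outcomes. The two independent Bernoulli components and the name `vector_mgf` are assumptions made for this example; with independent components the result factors into the product of the marginal moment-generating functions.

```python
import math
from itertools import product

def vector_mgf(joint_pmf, t):
    """E[exp(<t, X>)] for a random vector with finitely many outcomes.

    `joint_pmf` maps each outcome (a tuple) to its probability.
    """
    return sum(
        prob * math.exp(sum(ti * xi for ti, xi in zip(t, x)))
        for x, prob in joint_pmf.items()
    )

# Illustrative joint distribution: two independent Bernoulli components.
success_probs = (0.3, 0.6)
joint = {}
for outcome in product((0, 1), repeat=2):
    prob = 1.0
    for xi, pi in zip(outcome, success_probs):
        prob *= pi if xi == 1 else 1 - pi
    joint[outcome] = prob

t = (0.5, -1.0)
value = vector_mgf(joint, t)
```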

Important properties

Moment generating functions are positive and log-convex, with $M(0) = 1$.

An important property of the moment-generating function is that it uniquely determines the distribution. In other words, if $X$ and $Y$ are two random variables and for all values of $t$ in some open neighborhood of 0,

$$M_X(t) = M_Y(t),$$

then

$$F_X(x) = F_Y(x)$$

for all values of $x$ (or equivalently $X$ and $Y$ have the same distribution). This statement is not equivalent to the statement "if two distributions have the same moments, then they are identical at all points." This is because in some cases, the moments exist and yet the moment-generating function does not, because the limit

$$\lim_{n\to\infty} \sum_{i=0}^{n} \frac{t^i m_i}{i!}$$

may not exist. The log-normal distribution is an example of when this occurs.
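The log-normal case can be made concrete. For a log-normal variable with parameters $\mu$ and $\sigma$, the $n$th moment is $m_n = e^{n\mu + n^2\sigma^2/2}$, so every moment is finite, yet the terms $t^n m_n / n!$ eventually grow without bound for any $t > 0$. The sketch below works with logarithms of the terms to avoid overflow; the helper name and the choice $t = 0.1$ are illustrative.

```python
import math

def log_series_term(n, t, mu=0.0, sigma=1.0):
    """Logarithm of t**n * m_n / n! for a log-normal variable,
    where m_n = exp(n*mu + n**2 * sigma**2 / 2) is its nth moment."""
    log_moment = n * mu + 0.5 * (n * sigma) ** 2
    return n * math.log(t) + log_moment - math.lgamma(n + 1)

t = 0.1
log_terms = {n: log_series_term(n, t) for n in (1, 5, 10, 50, 100)}

for n, value in log_terms.items():
    print(f"n={n:3d}  log(term)={value:10.2f}")
```

The quadratic growth of $n^2\sigma^2/2$ outpaces both $n \log t$ and $\log n!$, so the terms do not tend to zero and the series cannot converge.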

Calculations of moments

The moment-generating function is so called because if it exists on an open interval around $t = 0$, then it is the exponential generating function of the moments of the probability distribution:

$$m_n = E\left(X^n\right) = M_X^{(n)}(0) = \left.\frac{d^n M_X}{dt^n}\right|_{t=0}.$$

That is, with $n$ being a nonnegative integer, the $n$th moment about 0 is the $n$th derivative of the moment generating function, evaluated at $t = 0$.
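The derivative rule can be illustrated numerically. The sketch below approximates the first two derivatives of a normal moment-generating function at $t = 0$ with central finite differences and compares them with the known moments $\mu$ and $\mu^2 + \sigma^2$. The step size and parameter values are arbitrary choices, and finite differences are only one of several ways to approximate the derivatives.

```python
import math

MU, SIGMA = 1.5, 2.0  # illustrative parameters of a normal distribution

def normal_mgf(t):
    """Moment-generating function of N(MU, SIGMA^2)."""
    return math.exp(MU * t + 0.5 * SIGMA**2 * t**2)

def first_derivative_at_zero(f, h=1e-3):
    """Central-difference approximation of f'(0)."""
    return (f(h) - f(-h)) / (2 * h)

def second_derivative_at_zero(f, h=1e-3):
    """Central-difference approximation of f''(0)."""
    return (f(h) - 2 * f(0.0) + f(-h)) / h**2

m1 = first_derivative_at_zero(normal_mgf)   # approximates E[X] = MU
m2 = second_derivative_at_zero(normal_mgf)  # approximates E[X^2] = MU^2 + SIGMA^2
```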

Other properties

Jensen's inequality provides a simple lower bound on the moment-generating function:

$$M_X(t) \geq e^{\mu t},$$

where $\mu$ is the mean of $X$.
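A quick numerical check of this lower bound, using a Poisson variable whose mean equals its rate $\lambda$. The rate and the grid of $t$ values below are arbitrary illustrative choices.

```python
import math

def poisson_mgf(t, lam):
    """Moment-generating function of a Poisson(lam) variable."""
    return math.exp(lam * (math.exp(t) - 1))

lam = 3.0  # the mean of a Poisson(lam) variable is lam
ts = [-2.0, -0.5, 0.0, 0.5, 2.0]

# Gap between the mgf and the Jensen lower bound exp(mean * t) at each t.
gaps = [poisson_mgf(t, lam) - math.exp(lam * t) for t in ts]
```

The gap is zero at $t = 0$, where both sides equal 1, and positive elsewhere for this distribution.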

The moment-generating function can be used in conjunction with Markov's inequality to bound the upper tail of a real random variable $X$. This statement is also called the Chernoff bound. Since $x \mapsto e^{xt}$ is monotonically increasing for $t > 0$, we have

$$P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq e^{-at} E[e^{tX}] = e^{-at} M_X(t)$$

for any $t > 0$ and any $a$, provided $M_X(t)$ exists. For example, when $X$ has a standard normal distribution and $a > 0$, we can choose $t = a$ and recall that $M_X(t) = e^{t^2/2}$. This gives $P(X \geq a) \leq e^{-a^2/2}$, which exceeds the exact value by a factor that grows only linearly in $a$.
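The standard normal example can be tabulated directly. The sketch below compares the bound $e^{-a^2/2}$ with the exact tail probability, computed from the complementary error function; the selection of $a$ values is arbitrary.

```python
import math

def normal_upper_tail(a):
    """Exact P(X >= a) for a standard normal X, via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

def chernoff_bound(a):
    """exp(-a*t) * M_X(t) with M_X(t) = exp(t**2 / 2), evaluated at t = a."""
    return math.exp(-a * a) * math.exp(a * a / 2)

for a in (0.5, 1.0, 2.0, 4.0):
    exact = normal_upper_tail(a)
    bound = chernoff_bound(a)
    print(f"a={a}: exact={exact:.3e}  bound={bound:.3e}  ratio={bound / exact:.2f}")
```

The choice $t = a$ minimizes $e^{-at} e^{t^2/2}$ over $t > 0$, so this is the tightest bound of this form for the standard normal.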

Various lemmas, such as Hoeffding's lemma or Bennett's inequality, provide bounds on the moment-generating function in the case of a zero-mean, bounded random variable.

When $X$ is non-negative, the moment generating function gives a simple, useful bound on the moments:

$$E[X^m] \leq \left(\frac{m}{te}\right)^m M_X(t),$$

for any $X, m \geq 0$ and $t > 0$.

This follows from the inequality $1 + x \leq e^x$, into which we can substitute $x' = tx/m - 1$ in place of $x$ to obtain $tx/m \leq e^{tx/m - 1}$ for any $x, t, m \in \mathbb{R}$ with $m \neq 0$. Now, if $t > 0$ and $x, m \geq 0$, this can be rearranged to $x^m \leq (m/(te))^m e^{tx}$. Taking the expectation on both sides gives the bound on $E[X^m]$ in terms of $E[e^{tX}]$.

As an example, consider $X \sim \text{Chi-Squared}$ with $k$ degrees of freedom. Then from the examples $M_X(t) = (1-2t)^{-k/2}$. Picking $t = m/(2m+k)$ and substituting into the bound:

$$E[X^m] \leq (1 + 2m/k)^{k/2} e^{-m} (k + 2m)^m.$$

We know that in this case the exact value is $E[X^m] = 2^m \Gamma(m + k/2)/\Gamma(k/2)$. To compare the two, we can consider the asymptotics for large $k$. Here the moment-generating function bound is $k^m(1 + m^2/k + O(1/k^2))$, whereas the exact value is $k^m(1 + (m^2 - m)/k + O(1/k^2))$. The moment-generating function bound is thus very strong in this case.
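The comparison can be reproduced numerically. The sketch below evaluates the moment-generating function bound at $t = m/(2m+k)$ and divides it by the exact chi-squared moment $2^m \Gamma(m + k/2)/\Gamma(k/2)$, computed through the log-gamma function; the values of $m$ and $k$ are illustrative.

```python
import math

def mgf_moment_bound(m, k):
    """(m/(t*e))**m * M_X(t) for chi-squared with k degrees of freedom, at t = m/(2m+k)."""
    t = m / (2 * m + k)
    mgf = (1 - 2 * t) ** (-k / 2)
    return (m / (t * math.e)) ** m * mgf

def exact_moment(m, k):
    """E[X**m] = 2**m * Gamma(m + k/2) / Gamma(k/2), computed through log-gamma."""
    return 2**m * math.exp(math.lgamma(m + k / 2) - math.lgamma(k / 2))

m = 3
ratios = {k: mgf_moment_bound(m, k) / exact_moment(m, k) for k in (5, 50, 500)}

for k, ratio in ratios.items():
    print(f"k={k:3d}  bound / exact = {ratio:.4f}")
```

The ratio stays above 1 and approaches 1 as $k$ grows, in line with the two expansions agreeing to leading order $k^m$.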

Relation to other functions

Related to the moment-generating function are a number of other transforms that are common in probability theory:

Characteristic function
The characteristic function $\varphi_X(t)$ is related to the moment-generating function via $\varphi_X(t) = M_{iX}(t) = M_X(it)$: the characteristic function is the moment-generating function of $iX$ or the moment generating function of $X$ evaluated on the imaginary axis. This function can also be viewed as the Fourier transform of the probability density function, which can therefore be deduced from it by inverse Fourier transform.
Cumulant-generating function
The cumulant-generating function is defined as the logarithm of the moment-generating function; some instead define the cumulant-generating function as the logarithm of the characteristic function, while others call this latter the second cumulant-generating function.
Probability-generating function
The probability-generating function is defined as $G(z) = E\left[z^X\right]$. This immediately implies that $G(e^t) = E\left[e^{tX}\right] = M_X(t)$.
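The relation between the probability-generating function and the moment-generating function can be checked for a Poisson variable. The sketch below evaluates $G(z)$ as a truncated sum over the probability mass function and compares $G(e^t)$ with the closed-form Poisson moment-generating function $e^{\lambda(e^t - 1)}$. The truncation at 100 terms and the parameter values are assumptions of this illustration.

```python
import math

def poisson_pgf(z, lam, terms=100):
    """G(z) = E[z**X] for a Poisson(lam) variable, as a truncated sum over its pmf."""
    return sum(
        z**k * math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(terms)
    )

def poisson_mgf(t, lam):
    """Closed-form moment-generating function of a Poisson(lam) variable."""
    return math.exp(lam * (math.exp(t) - 1))

lam, t = 2.5, 0.3
via_pgf = poisson_pgf(math.exp(t), lam)  # G(e^t)
via_mgf = poisson_mgf(t, lam)            # M_X(t)
```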

References

Citations

  1. Casella, George; Berger, Roger L. (1990). Statistical Inference. Wadsworth & Brooks/Cole. p. 61. ISBN 0-534-11958-1.
  2. Bulmer, M. G. (1979). Principles of Statistics. Dover. pp. 75–79. ISBN 0-486-63760-3.
  3. Kotz et al., p. 37, using 1 as the number of degrees of freedom to recover the Cauchy distribution.

Sources

  • Casella, George; Berger, Roger (2002). Statistical Inference (2nd ed.). pp. 59–68. ISBN 978-0-534-24312-8.