Paley–Zygmund inequality

In mathematics, the Paley–Zygmund inequality bounds the probability that a positive random variable is small, in terms of its first two moments. The inequality was proved by Raymond Paley and Antoni Zygmund.

Theorem: If $Z \geq 0$ is a random variable with finite variance, and if $0 \leq \theta \leq 1$, then

$$\operatorname{P}(Z > \theta \operatorname{E}[Z]) \geq (1-\theta)^2 \, \frac{\operatorname{E}[Z]^2}{\operatorname{E}[Z^2]}.$$

Proof: First,

$$\operatorname{E}[Z] = \operatorname{E}\bigl[Z\,\mathbf{1}_{\{Z \leq \theta \operatorname{E}[Z]\}}\bigr] + \operatorname{E}\bigl[Z\,\mathbf{1}_{\{Z > \theta \operatorname{E}[Z]\}}\bigr].$$

The first addend is at most $\theta \operatorname{E}[Z]$, while the second is at most $\operatorname{E}[Z^2]^{1/2} \operatorname{P}(Z > \theta \operatorname{E}[Z])^{1/2}$ by the Cauchy–Schwarz inequality. Combining the two estimates gives $(1-\theta)\operatorname{E}[Z] \leq \operatorname{E}[Z^2]^{1/2} \operatorname{P}(Z > \theta \operatorname{E}[Z])^{1/2}$; squaring both sides and dividing by $\operatorname{E}[Z^2]$ yields the desired inequality. ∎
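
As a quick numerical illustration (a minimal sketch, not from the original article): the snippet below, assuming NumPy is available, draws samples of an exponential random variable and compares the empirical probability with the Paley–Zygmund lower bound. The choice of distribution and all parameter values are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative non-negative Z: Exponential(1), so E[Z] = 1 and E[Z^2] = 2.
    z = rng.exponential(scale=1.0, size=1_000_000)

    theta = 0.5
    mean = z.mean()
    second_moment = (z ** 2).mean()

    empirical = (z > theta * mean).mean()                  # estimate of P(Z > theta E[Z])
    bound = (1 - theta) ** 2 * mean ** 2 / second_moment   # Paley-Zygmund lower bound

    print(f"empirical probability: {empirical:.4f}")  # about 0.607 = exp(-1/2)
    print(f"Paley-Zygmund bound:   {bound:.4f}")      # about 0.125

Here the bound, $1/8$, lies well below the true probability $e^{-1/2} \approx 0.607$, illustrating that the inequality can be far from tight when $\operatorname{E}[Z^2]$ is large relative to $\operatorname{E}[Z]^2$.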

Related inequalities

The Paley–Zygmund inequality can be written as

$$\operatorname{P}(Z > \theta \operatorname{E}[Z]) \geq \frac{(1-\theta)^2 \, \operatorname{E}[Z]^2}{\operatorname{Var} Z + \operatorname{E}[Z]^2}.$$
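
The two forms agree because of the standard identity, stated here for completeness:

$$\operatorname{E}[Z^2] = \operatorname{Var} Z + \operatorname{E}[Z]^2.$$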

This can be improved. By the Cauchy–Schwarz inequality,

$$\operatorname{E}[Z - \theta \operatorname{E}[Z]] \leq \operatorname{E}\bigl[(Z - \theta \operatorname{E}[Z])\,\mathbf{1}_{\{Z > \theta \operatorname{E}[Z]\}}\bigr] \leq \operatorname{E}[(Z - \theta \operatorname{E}[Z])^2]^{1/2} \, \operatorname{P}(Z > \theta \operatorname{E}[Z])^{1/2},$$

which, after rearranging, implies that

$$\operatorname{P}(Z > \theta \operatorname{E}[Z]) \geq \frac{(1-\theta)^2 \operatorname{E}[Z]^2}{\operatorname{E}[(Z - \theta \operatorname{E}[Z])^2]} = \frac{(1-\theta)^2 \operatorname{E}[Z]^2}{\operatorname{Var} Z + (1-\theta)^2 \operatorname{E}[Z]^2}.$$
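
Spelling out the rearrangement, a routine computation: the left-hand side of the previous display equals $(1-\theta)\operatorname{E}[Z]$, so squaring and solving for the probability gives the first expression, while expanding the square in the denominator gives the second:

$$\operatorname{E}[(Z - \theta \operatorname{E}[Z])^2] = \operatorname{E}[Z^2] - 2\theta \operatorname{E}[Z]^2 + \theta^2 \operatorname{E}[Z]^2 = \operatorname{Var} Z + (1-\theta)^2 \operatorname{E}[Z]^2.$$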


This inequality is sharp; equality is achieved if Z almost surely equals a positive constant.
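
Indeed, if $Z = c$ almost surely for some constant $c > 0$ and $\theta < 1$, then both sides equal $1$:

$$\operatorname{P}(Z > \theta c) = 1 = \frac{(1-\theta)^2 c^2}{0 + (1-\theta)^2 c^2}.$$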

In turn, this implies another convenient form (known as Cantelli's inequality), which is

$$\operatorname{P}(Z > \mu - \theta \sigma) \geq \frac{\theta^2}{1 + \theta^2},$$

where $\mu = \operatorname{E}[Z]$ and $\sigma^2 = \operatorname{Var} Z$. This follows from the preceding inequality by substituting $1 - \theta\sigma/\mu$ for the parameter appearing there, which is valid when $0 \leq \mu - \theta\sigma \leq \mu$.
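
Explicitly, under that substitution the left-hand side of the preceding inequality becomes $\operatorname{P}(Z > \mu - \theta\sigma)$, and its right-hand side becomes

$$\frac{(\theta\sigma/\mu)^2 \mu^2}{\sigma^2 + (\theta\sigma/\mu)^2 \mu^2} = \frac{\theta^2 \sigma^2}{\sigma^2 + \theta^2 \sigma^2} = \frac{\theta^2}{1 + \theta^2}.$$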

A strengthened form of the Paley–Zygmund inequality states that if Z is a non-negative random variable, then

$$\operatorname{P}(Z > \theta \operatorname{E}[Z \mid Z > 0]) \geq \frac{(1-\theta)^2 \, \operatorname{E}[Z]^2}{\operatorname{E}[Z^2]}$$

for every $0 \leq \theta \leq 1$. This inequality follows by applying the usual Paley–Zygmund inequality to the conditional distribution of $Z$ given that it is positive and noting that the various factors of $\operatorname{P}(Z > 0)$ cancel.
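
To see the cancellation (assuming $\operatorname{P}(Z > 0) > 0$ and writing $p_0 = \operatorname{P}(Z > 0)$), note that $\operatorname{E}[Z \mid Z > 0] = \operatorname{E}[Z]/p_0$ and $\operatorname{E}[Z^2 \mid Z > 0] = \operatorname{E}[Z^2]/p_0$, while the event $\{Z > \theta \operatorname{E}[Z \mid Z > 0]\}$ is contained in $\{Z > 0\}$. Hence

$$\operatorname{P}(Z > \theta \operatorname{E}[Z \mid Z > 0]) = p_0 \, \operatorname{P}(Z > \theta \operatorname{E}[Z \mid Z > 0] \mid Z > 0) \geq p_0 \cdot \frac{(1-\theta)^2 (\operatorname{E}[Z]/p_0)^2}{\operatorname{E}[Z^2]/p_0} = \frac{(1-\theta)^2 \operatorname{E}[Z]^2}{\operatorname{E}[Z^2]}.$$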

Both this inequality and the usual Paley–Zygmund inequality also admit $L^p$ versions:[1] if $Z$ is a non-negative random variable and $p > 1$, then

$$\operatorname{P}(Z > \theta \operatorname{E}[Z \mid Z > 0]) \geq \frac{(1-\theta)^{p/(p-1)} \, \operatorname{E}[Z]^{p/(p-1)}}{\operatorname{E}[Z^p]^{1/(p-1)}}$$

for every $0 \leq \theta \leq 1$. This follows by the same proof as above, but using Hölder's inequality in place of the Cauchy–Schwarz inequality.
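
In outline, for the unconditional version: with $A = \{Z > \theta \operatorname{E}[Z]\}$, Hölder's inequality with exponents $p$ and $p/(p-1)$ gives

$$(1-\theta) \operatorname{E}[Z] \leq \operatorname{E}[Z\,\mathbf{1}_A] \leq \operatorname{E}[Z^p]^{1/p} \, \operatorname{P}(A)^{(p-1)/p},$$

and raising both sides to the power $p/(p-1)$ yields the stated bound; the conditional version follows by the same conditioning argument as before.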

See also

  • Cantelli's inequality
  • Second moment method
  • Concentration inequality – a summary of tail bounds on random variables.

References

  1. ^ Petrov, Valentin V. (1 August 2007). "On lower bounds for tail probabilities". Journal of Statistical Planning and Inference. 137 (8): 2703–2705. doi:10.1016/j.jspi.2006.02.015.

Further reading

  • Paley, R. E. A. C.; Zygmund, A. (April 1932). "On some series of functions, (3)". Mathematical Proceedings of the Cambridge Philosophical Society. 28 (2): 190–205. Bibcode:1932PCPS...28..190P. doi:10.1017/S0305004100010860. S2CID 178702376.
  • Paley, R. E. A. C.; Zygmund, A. (July 1932). "A note on analytic functions in the unit circle". Mathematical Proceedings of the Cambridge Philosophical Society. 28 (3): 266–272. Bibcode:1932PCPS...28..266P. doi:10.1017/S0305004100010112. S2CID 122832495.