Notation in probability and statistics

Probability theory and statistics have some commonly used conventions, in addition to standard mathematical notation and mathematical symbols.

Probability theory

  • Random variables are usually written in upper case Roman letters: $X$, $Y$, etc.
  • Particular realizations of a random variable are written in corresponding lower case letters. For example, $x_1, x_2, \ldots, x_n$ could be a sample corresponding to the random variable $X$. A cumulative probability is formally written $P(X \le x)$ to differentiate the random variable from its realization.[1]
  • The probability is sometimes written $\mathbb{P}$ to distinguish it from other functions and from a general measure $P$, so that one need not state separately that "$P$ is a probability". $\mathbb{P}(X \in A)$ is short for $P(\{\omega \in \Omega : X(\omega) \in A\})$, where $\Omega$ is the sample space and $X(\omega)$ is the value the random variable $X$ takes at the outcome $\omega$. The notation $\Pr(A)$ is also used.
  • $\mathbb{P}(A \cap B)$ or $\mathbb{P}[B \cap A]$ denotes the probability that events A and B both occur. The joint probability distribution of random variables X and Y is denoted $P(X, Y)$, their joint probability mass function or probability density function $f(x, y)$, and their joint cumulative distribution function $F(x, y)$.
  • $\mathbb{P}(A \cup B)$ or $\mathbb{P}[B \cup A]$ denotes the probability of either event A or event B occurring ("or" in this case means one or the other or both).
  • σ-algebras are usually written with uppercase calligraphic letters (e.g. $\mathcal{F}$ for the set of sets on which the probability P is defined).
  • Probability density functions (pdfs) and probability mass functions are denoted by lowercase letters, e.g. $f(x)$ or $f_X(x)$.
  • Cumulative distribution functions (cdfs) are denoted by uppercase letters, e.g. $F(x)$ or $F_X(x)$.
  • Survival functions, or complementary cumulative distribution functions, are often denoted by placing an overbar over the symbol for the cumulative distribution function, $\overline{F}(x) = 1 - F(x)$, or by $S(x)$.
  • In particular, the pdf of the standard normal distribution is denoted by $\varphi(z)$, and its cdf by $\Phi(z)$.
  • Some common operators:
    • $\mathrm{E}[X]$: expected value of X
    • $\operatorname{var}[X]$: variance of X
    • $\operatorname{cov}[X, Y]$: covariance of X and Y
  • "X is independent of Y" is often written $X \perp Y$ or $X \perp\!\!\!\perp Y$, and "X is independent of Y given W" is often written $X \perp\!\!\!\perp Y \mid W$ or $X \perp Y \mid W$.
  • $P(A \mid B)$, the conditional probability, is the probability of $A$ given $B$; when $P(B) > 0$ it equals $P(A \cap B) / P(B)$.[2]
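
The probability notation above can be made concrete on a small finite probability space. The following Python sketch assumes two fair six-sided dice with the uniform measure on $\Omega$; the helper names (`prob`, `prob_in`, `cdf`, `cond_prob`, `expect`, `cov`) and the events A and B are illustrative choices, not a standard API. Random variables are ordinary functions on $\Omega$, so $\mathbb{P}(X \in A)$ is computed literally as the probability of the set $\{\omega \in \Omega : X(\omega) \in A\}$. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space Omega: ordered outcomes (first die, second die), each with probability 1/36.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(A) for an event A given as a predicate on outcomes omega (uniform measure)."""
    favourable = sum(1 for omega in OMEGA if event(omega))
    return Fraction(favourable, len(OMEGA))


# Random variables (upper case) are functions from Omega to numbers.
def X(omega):
    return omega[0]  # value shown by the first die


def Y(omega):
    return omega[0] + omega[1]  # sum of both dice


def prob_in(rv, values):
    """P(X in A), i.e. P({omega in Omega : X(omega) in A})."""
    return prob(lambda omega: rv(omega) in values)


def cdf(rv, x):
    """F_X(x) = P(X <= x), where x is a realization (lower case)."""
    return prob(lambda omega: rv(omega) <= x)


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    return prob(lambda omega: a(omega) and b(omega)) / prob(b)


def expect(rv):
    """E[X] under the uniform measure on Omega."""
    return sum(Fraction(rv(omega)) for omega in OMEGA) / len(OMEGA)


def cov(rv1, rv2):
    """cov[X, Y] = E[XY] - E[X]E[Y]; var[X] is cov[X, X]."""
    return expect(lambda omega: rv1(omega) * rv2(omega)) - expect(rv1) * expect(rv2)


def A(omega):
    return omega[0] == 6  # event: first die shows 6


def B(omega):
    return omega[1] % 2 == 0  # event: second die is even


p_and = prob(lambda omega: A(omega) and B(omega))  # P(A ∩ B)
p_or = prob(lambda omega: A(omega) or B(omega))  # P(A ∪ B)

print(prob(A), prob(B), p_and, p_or)  # 1/6 1/2 1/12 7/12
print(cond_prob(A, B))  # P(A | B) = 1/6
print(cdf(X, 2), 1 - cdf(X, 2))  # F_X(2) = 1/3, survival 2/3
print(expect(X), cov(X, X), cov(X, Y))  # 7/2 35/12 35/12
```

With these choices A and B are independent, since $P(A \cap B) = 1/12 = P(A)P(B)$, so $P(A \mid B)$ equals $P(A) = 1/6$. Because Y is X plus an independent second die, $\operatorname{cov}[X, Y] = \operatorname{var}[X] = 35/12$.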

Statistics

  • Greek letters (e.g. θ, β) are commonly used to denote unknown parameters (population parameters).[3]
  • A tilde (~) denotes "has the probability distribution of", e.g. $X \sim \mathcal{N}(\mu, \sigma^2)$.
  • Placing a hat, or caret (also known as a circumflex), over a true parameter denotes an estimator of it, e.g. $\widehat{\theta}$ is an estimator for $\theta$.
  • The arithmetic mean of a series of values $x_1, x_2, \ldots, x_n$ is often denoted by placing an "overbar" over the symbol, e.g. $\bar{x}$, pronounced "x bar".
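  • Some commonly used symbols for sample statistics are given below:
    • the sample mean $\bar{x}$,
    • the sample variance $s^2$,
    • the sample standard deviation $s$,
    • the sample correlation coefficient $r$,
    • the sample cumulants $k_r$.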
  • Some commonly used symbols for population parameters are given below:
    • the population mean $\mu$,
    • the population variance $\sigma^2$,
    • the population standard deviation $\sigma$,
    • the population correlation $\rho$,
    • the population cumulants $\kappa_r$.
  • $x_{(k)}$ is used for the $k^{\text{th}}$ order statistic, where $x_{(1)}$ is the sample minimum and $x_{(n)}$ is the sample maximum from a total sample size $n$.[4]
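
The sample mean $\bar{x}$, the sample variance $s^2$ and standard deviation $s$, and the order statistics $x_{(k)}$ correspond directly to simple computations. A minimal Python sketch using the standard `statistics` module, assuming a small made-up sample; the data and the helper `order_statistic` are hypothetical. Here $\bar{x}$ plays the role of the estimator $\widehat{\mu}$ of the population mean $\mu$.

```python
import statistics

# Hypothetical sample x_1, ..., x_n with n = 5.
sample = [4.0, 7.0, 1.0, 8.0, 5.0]
n = len(sample)

x_bar = statistics.fmean(sample)  # sample mean x-bar, used as the estimator mu-hat of mu
s2 = statistics.variance(sample)  # sample variance s^2 (divides by n - 1)
s = statistics.stdev(sample)  # sample standard deviation s = sqrt(s^2)


def order_statistic(xs, k):
    """Return x_(k), the k-th smallest value (1-indexed, matching the notation)."""
    if not 1 <= k <= len(xs):
        raise ValueError("k must satisfy 1 <= k <= n")
    return sorted(xs)[k - 1]


x_min = order_statistic(sample, 1)  # x_(1), the sample minimum
x_max = order_statistic(sample, n)  # x_(n), the sample maximum

print(x_bar, s2)  # 5.0 7.5
print(x_min, x_max)  # 1.0 8.0
```

The 1-indexed `k` is a deliberate choice so that `order_statistic(sample, 1)` and `order_statistic(sample, n)` read like $x_{(1)}$ and $x_{(n)}$.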

Critical values

The α-level upper critical value of a probability distribution is the value exceeded with probability $\alpha$, that is, the value $x_\alpha$ such that $F(x_\alpha) = 1 - \alpha$, where $F$ is the cumulative distribution function. There are standard notations for the upper critical values of some commonly used distributions in statistics:

  • $z_\alpha$ or $z(\alpha)$ for the standard normal distribution
  • $t_{\alpha,\nu}$ or $t(\alpha, \nu)$ for the t-distribution with $\nu$ degrees of freedom
  • ${\chi_{\alpha,\nu}}^2$ or $\chi^2(\alpha, \nu)$ for the chi-squared distribution with $\nu$ degrees of freedom
  • $F_{\alpha,\nu_1,\nu_2}$ or $F(\alpha, \nu_1, \nu_2)$ for the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom
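
For the normal distribution, the upper critical value $z_\alpha$ and the functions $\varphi$ and $\Phi$ are available in Python's standard library through `statistics.NormalDist` (Python 3.8 or later). The sketch below assumes the standard normal (mean 0, standard deviation 1); the helper names `phi`, `Phi`, and `z_upper` are illustrative. It computes $z_\alpha$ as the inverse cdf at $1 - \alpha$ and checks the defining property: the survival function $1 - \Phi(z_\alpha)$ returns $\alpha$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def phi(z):
    """Standard normal pdf, phi(z)."""
    return STD_NORMAL.pdf(z)


def Phi(z):
    """Standard normal cdf, Phi(z)."""
    return STD_NORMAL.cdf(z)


def z_upper(alpha):
    """Upper critical value z_alpha, i.e. the z with Phi(z) = 1 - alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1 - alpha)


for alpha in (0.10, 0.05, 0.025, 0.01):
    z = z_upper(alpha)
    survival = 1 - Phi(z)  # 1 - Phi(z_alpha) should recover alpha
    print(f"alpha = {alpha}: z_alpha = {z:.4f}, 1 - Phi(z_alpha) = {survival:.4f}")
```

This reproduces the familiar values $z_{0.05} \approx 1.645$ and $z_{0.025} \approx 1.960$. The standard library has no t, chi-squared, or F distribution; if SciPy is available, the analogous upper critical values are `scipy.stats.t.ppf(1 - alpha, nu)`, `scipy.stats.chi2.ppf(1 - alpha, nu)`, and `scipy.stats.f.ppf(1 - alpha, nu1, nu2)`.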

Linear algebra

  • Matrices are usually denoted by boldface capital letters, e.g. $\mathbf{A}$.
  • Column vectors are usually denoted by boldface lowercase letters, e.g. $\mathbf{x}$.
  • The transpose operator is denoted by either a superscript T (e.g. $\mathbf{A}^\mathrm{T}$) or a prime symbol (e.g. $\mathbf{A}'$).
  • A row vector is written as the transpose of a column vector, e.g. $\mathbf{x}^\mathrm{T}$ or $\mathbf{x}'$.
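
Storing a matrix as a list of rows makes the transpose convention concrete. The dependency-free Python sketch below uses hypothetical helpers `transpose` and `matmul`, written only for illustration (a numerical library would normally be used instead). It assumes a column vector $\mathbf{x}$ is stored as an $n \times 1$ matrix, so the row vector $\mathbf{x}^\mathrm{T}$ is $1 \times n$ and $\mathbf{x}^\mathrm{T}\mathbf{x}$ is a $1 \times 1$ matrix holding the squared length of $\mathbf{x}$.

```python
def transpose(m):
    """Return A^T (also written A') for a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Matrix product AB for lists of rows; the inner dimensions must agree."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 2], [3, 4], [5, 6]]  # bold A: a 3 x 2 matrix
x = [[1], [0], [2]]  # bold x: a 3 x 1 column vector

x_T = transpose(x)  # row vector x^T, shape 1 x 3
print(transpose(A))  # [[1, 3, 5], [2, 4, 6]]
print(x_T)  # [[1, 0, 2]]
print(matmul(x_T, x))  # x^T x = [[5]]
print(matmul(transpose(A), x))  # A^T x = [[11], [14]]
```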

Abbreviations

Common abbreviations include:

References

  1. "Calculating Probabilities from Cumulative Distribution Function". 2021-08-09. Retrieved 2024-02-26.
  2. "Probability and stochastic processes". Applied Stochastic Processes. Chapman and Hall/CRC. 2013-07-22. pp. 9–36. ISBN 978-0-429-16812-3. Retrieved 2023-12-08.
  3. "Letters of the Greek Alphabet and Some of Their Statistical Uses". les.appstate.edu. 1999-02-13. Retrieved 2024-02-26.
  4. "Order Statistics" (PDF). colorado.edu. Retrieved 2024-02-26.
  • Halperin, Max; Hartley, H. O.; Hoel, P. G. (1965), "Recommended Standards for Statistical Symbols and Notation. COPSS Committee on Symbols and Notation", The American Statistician, 19 (3): 12–14, doi:10.2307/2681417, JSTOR 2681417

External links

  • Earliest Uses of Symbols in Probability and Statistics, maintained by Jeff Miller.