Summary statistics

Box plot of the Michelson–Morley experiment, showing several summary statistics.

In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in terms of:

  • a measure of location, or central tendency, such as the arithmetic mean
  • a measure of statistical dispersion, such as the standard deviation or the mean absolute deviation
  • a measure of the shape of the distribution, such as skewness or kurtosis
  • if more than one variable is measured, a measure of statistical dependence such as a correlation coefficient

A common collection of order statistics used as summary statistics is the five-number summary, sometimes extended to a seven-number summary, together with the associated box plot.
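As a minimal sketch, the five-number summary (minimum, lower quartile, median, upper quartile, maximum) can be computed with Python's standard library. The function name `five_number_summary` and the sample values are illustrative assumptions, and the quartiles depend on the interpolation convention chosen; the "inclusive" method is used here only as one reasonable choice.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)
    # method="inclusive" interpolates between the sample minimum and maximum;
    # other quartile conventions can give slightly different values.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Made-up observations, for illustration only.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(five_number_summary(sample))
```

These five values are the ones a box plot draws as the whisker ends, the box edges, and the central line, although many box plot conventions place the whiskers differently when outliers are present.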

Entries in an analysis of variance table can also be regarded as summary statistics.[1]: 378 

Examples

Location

Common measures of location, or central tendency, are the arithmetic mean, median, mode, and interquartile mean.[2][3]
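The four measures named above can be sketched as follows. The helper `interquartile_mean` is a hypothetical, simplified version that assumes the number of observations is divisible by four (the general definition weights partial observations at the quartile boundaries); the data are made up.

```python
from statistics import mean, median, multimode


def interquartile_mean(data):
    """Mean of the middle half of the data.

    Simplifying assumption: len(data) is divisible by 4, so exactly one
    quarter of the observations is discarded from each end.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch requires a non-empty sample with len(data) % 4 == 0")
    quarter = n // 4
    return mean(ordered[quarter:n - quarter])


# Made-up observations with one large value (30) to show robustness.
sample = [1, 3, 4, 4, 5, 6, 7, 30]
print("mean:", mean(sample))
print("median:", median(sample))
print("mode(s):", multimode(sample))
print("interquartile mean:", interquartile_mean(sample))
```

In this sample the single large value pulls the arithmetic mean well above the median and the interquartile mean, which illustrates why several measures of location are often reported together.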

Spread

Common measures of statistical dispersion are the standard deviation, variance, range, interquartile range, absolute deviation, mean absolute difference and the distance standard deviation. Measures that assess spread in comparison to the typical size of data values include the coefficient of variation.
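Several of these dispersion measures can be computed directly, as in the sketch below. It assumes the data are treated as a complete population (hence `pstdev` and `pvariance` rather than the sample versions), takes the absolute deviation about the mean, and uses the "inclusive" quartile convention; the function name `spread_summary` and the data are illustrative.

```python
from statistics import mean, pstdev, pvariance, quantiles


def spread_summary(data):
    """Return a dict of common dispersion measures for a population."""
    mu = mean(data)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    sd = pstdev(data)
    return {
        "variance": pvariance(data),
        "standard deviation": sd,
        "range": max(data) - min(data),
        "interquartile range": q3 - q1,
        # Mean absolute deviation about the mean.
        "mean absolute deviation": mean(abs(x - mu) for x in data),
        # Spread relative to the typical size of the values (requires mu != 0).
        "coefficient of variation": sd / mu,
    }


# Made-up observations, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in spread_summary(sample).items():
    print(f"{name}: {value}")
```

The coefficient of variation is dimensionless, so it can be compared across data sets measured in different units, whereas the other quantities carry the units of the data (or their square, for the variance).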

The Gini coefficient was originally developed to measure income inequality. It is equivalent to the ratio of the second L-moment to the mean, that is, the L-moment analogue of the coefficient of variation.
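One common way to compute the Gini coefficient is as half the mean absolute difference divided by the mean. The sketch below assumes the population form, averaging over all n² ordered pairs (including each value paired with itself), and non-negative data with a positive mean; the function names and the income figures are made up for illustration.

```python
from statistics import mean


def mean_absolute_difference(data):
    """Average of |x_i - x_j| over all n*n ordered pairs (population form)."""
    n = len(data)
    return sum(abs(x - y) for x in data for y in data) / (n * n)


def gini_coefficient(data):
    """Gini coefficient as mean absolute difference / (2 * mean)."""
    mu = mean(data)
    if mu <= 0:
        raise ValueError("this sketch requires a positive mean")
    return mean_absolute_difference(data) / (2 * mu)


# Hypothetical incomes: perfectly equal, and concentrated in one person.
print(gini_coefficient([5, 5, 5, 5]))
print(gini_coefficient([0, 0, 0, 10]))
```

With this convention, perfect equality gives 0 and a sample of size n in which one member holds everything gives (n − 1)/n, which approaches 1 as n grows. The double loop is quadratic in the sample size; sorting-based formulas are preferable for large data sets.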

A simple summary of a dataset is sometimes given by quoting particular order statistics as approximations to selected percentiles of a distribution.

Shape

Common measures of the shape of a distribution are skewness and kurtosis, while alternatives can be based on L-moments. A different measure is the distance skewness, for which a value of zero implies central symmetry.
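As an illustration, moment-based skewness and kurtosis can be computed from the central moments. The sketch assumes the simple population (uncorrected) estimators, m₃/m₂^1.5 and m₄/m₂², rather than the bias-adjusted versions some software reports; the function names and data are illustrative.

```python
from statistics import mean


def central_moment(data, k):
    """k-th central moment of the data, using the population divisor n."""
    mu = mean(data)
    return sum((x - mu) ** k for x in data) / len(data)


def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (requires m2 > 0)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2**2 (3 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


# Made-up samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 6]
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

The symmetric sample has zero skewness, while the sample with a long right tail has positive skewness. Subtracting 3 from the kurtosis gives the excess kurtosis, which some sources report instead.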

Dependence

The most common measure of dependence between paired random variables is the Pearson product-moment correlation coefficient, while a common alternative summary statistic is Spearman's rank correlation coefficient. A value of zero for the distance correlation implies independence, which is not true of the Pearson coefficient: uncorrelated variables can still be dependent.
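The difference between the two coefficients can be sketched with a small hand-rolled implementation. Spearman's coefficient is computed here as the Pearson coefficient of the ranks, with tied values given their average rank; the function names and the data are illustrative assumptions, and the sketch requires both variables to be non-constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: y = x**2 is monotone in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson:", pearson(x, y))
print("Spearman:", spearman(x, y))
```

Because the relationship is monotone but not linear, Spearman's coefficient is 1 while Pearson's is slightly below 1.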

Human perception of summary statistics

Humans efficiently use summary statistics to quickly perceive the gist of auditory and visual information.[4][5][6]

References

  1. ^ Upton, Graham; Cook, Ian (2 October 2008). "Dictionary (S)". A Dictionary of Statistics (Second (revised) ed.). Oxford University Press. ISBN 978-0199541454. LCCN 2008300706. OCLC 935100347. OL 23145891M – via Internet Archive. p. 378: summary statistics [...] *ANOVA table might be referred to as summary statistics
  2. ^ Bullen, P. S. (31 August 2003). Handbook of Means and Their Inequalities. Mathematics and Its Applications. Vol. 560 (2 ed.). Springer Dordrecht. doi:10.1007/978-94-017-0399-4. ISBN 978-1-4020-1522-9. LCCN 2003060794. OCLC 939214285. OL 8370727M.
  3. ^ Grabisch, Michel; Marichal, Jean-Luc; Mesiar, Radko; Pap, Endre (2009). Aggregation Functions. Cambridge University Press. ISBN 978-0521519267.
  4. ^ Piazza, Elise A.; Sweeny, Timothy D.; Wessel, David; Silver, Michael A.; Whitney, David (2013). "Humans Use Summary Statistics to Perceive Auditory Sequences". Psychological Science. 24 (8): 1389–1397. doi:10.1177/0956797612473759. PMC 4381997. PMID 23761928.
  5. ^ Alexander, R. G.; Schmidt, J.; Zelinsky, G. Z. (2014). "Are summary statistics enough? Evidence for the importance of shape in guiding visual search". Visual Cognition. 22 (3–4): 595–609. doi:10.1080/13506285.2014.890989. PMC 4500174. PMID 26180505.
  6. ^ Utochkin, Igor S. (2015). "Ensemble summary statistics as a basis for rapid visual categorization". Journal of Vision. 15 (4): 8. doi:10.1167/15.4.8. PMID 26317396.
