Tschuprow's T


In statistics, Tschuprow's T is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables. It was published by Alexander Tschuprow (alternative spelling: Chuprov) in 1939.[1]

Definition

For an r × c contingency table with r rows and c columns, let $\pi_{ij}$ be the proportion of the population in cell $(i,j)$ and let

  $\pi_{i+} = \sum_{j=1}^{c} \pi_{ij}$  and  $\pi_{+j} = \sum_{i=1}^{r} \pi_{ij}.$

Then the mean square contingency is given as

  $\phi^{2} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\pi_{ij} - \pi_{i+}\pi_{+j})^{2}}{\pi_{i+}\pi_{+j}},$

and Tschuprow's T as

  $T = \sqrt{\frac{\phi^{2}}{\sqrt{(r-1)(c-1)}}}.$
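The two definitions above translate directly into code. The following is a minimal sketch using a made-up 2 × 3 table of population proportions (the numbers are purely illustrative):

```python
from math import sqrt

# Hypothetical 2x3 table of population proportions pi[i][j]; entries sum to 1.
pi = [
    [0.20, 0.10, 0.10],
    [0.10, 0.25, 0.25],
]
r, c = len(pi), len(pi[0])

# Marginal proportions pi_{i+} (row sums) and pi_{+j} (column sums).
row = [sum(pi[i]) for i in range(r)]
col = [sum(pi[i][j] for i in range(r)) for j in range(c)]

# Mean square contingency phi^2: sum of (pi_ij - pi_i+ pi_+j)^2 / (pi_i+ pi_+j).
phi2 = sum(
    (pi[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
    for i in range(r)
    for j in range(c)
)

# Tschuprow's T = sqrt( phi^2 / sqrt((r-1)(c-1)) ).
T = sqrt(phi2 / sqrt((r - 1) * (c - 1)))
```

Note the two nested square roots: the inner one applies only to the degrees-of-freedom term $(r-1)(c-1)$, the outer one to the whole ratio.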

Properties

T equals zero if and only if independence holds in the table, i.e., if and only if $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all i and j. T equals one if and only if there is perfect dependence in the table, i.e., if and only if for each i there is exactly one j such that $\pi_{ij} > 0$, and vice versa. Hence, T can only equal 1 for square tables (r = c). In this it differs from Cramér's V, which can equal 1 for any rectangular table.
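Both boundary cases can be checked numerically. The sketch below wraps the definition in a small helper function (`tschuprow_t` is an illustrative name, not an established API) and evaluates it on an independent table and on a perfectly dependent square table:

```python
from math import sqrt

def tschuprow_t(pi):
    """Tschuprow's T for a table of population proportions (illustrative helper)."""
    r, c = len(pi), len(pi[0])
    row = [sum(pi[i]) for i in range(r)]
    col = [sum(pi[i][j] for i in range(r)) for j in range(c)]
    phi2 = sum(
        (pi[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
        for i in range(r)
        for j in range(c)
    )
    return sqrt(phi2 / sqrt((r - 1) * (c - 1)))

# Independence: every cell is the product of its marginals, so T = 0.
indep = [[0.2 * 0.5, 0.2 * 0.5],
         [0.8 * 0.5, 0.8 * 0.5]]

# Perfect dependence on a square table: one positive cell per row
# and per column, so phi^2 = 1 and T = 1.
perfect = [[0.5, 0.0],
           [0.0, 0.5]]
```

For the perfectly dependent table, each zero cell still has positive marginals, so every term in the sum is well defined.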

Estimation

If we have a multinomial sample of size n, the usual way to estimate T from the data is via the formula

  $\hat{T} = \sqrt{\frac{\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(p_{ij} - p_{i+}p_{+j})^{2}}{p_{i+}p_{+j}}}{\sqrt{(r-1)(c-1)}}},$

where $p_{ij} = n_{ij}/n$ is the proportion of the sample in cell $(i,j)$. This is the empirical value of T. With $\chi^{2}$ the Pearson chi-square statistic, this formula can also be written as

  $\hat{T} = \sqrt{\frac{\chi^{2}/n}{\sqrt{(r-1)(c-1)}}}.$
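The chi-square form of the estimator is convenient in code, since $\chi^{2}$ is computed from expected counts $n_{i+} n_{+j} / n$. A minimal sketch with made-up observed counts:

```python
from math import sqrt

# Hypothetical observed counts n_ij for a 2x3 contingency table.
counts = [
    [30, 10, 10],
    [10, 25, 15],
]
r, c = len(counts), len(counts[0])
n = sum(sum(rw) for rw in counts)

# Marginal counts n_{i+} and n_{+j}.
row = [sum(counts[i]) for i in range(r)]
col = [sum(counts[i][j] for i in range(r)) for j in range(c)]

# Pearson chi-square: sum of (observed - expected)^2 / expected,
# with expected counts e_ij = n_{i+} n_{+j} / n.
chi2 = sum(
    (counts[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
    for i in range(r)
    for j in range(c)
)

# Empirical Tschuprow's T: T_hat = sqrt( (chi2 / n) / sqrt((r-1)(c-1)) ).
T_hat = sqrt((chi2 / n) / sqrt((r - 1) * (c - 1)))
```

Dividing $\chi^{2}$ by n recovers the sample mean square contingency, so this agrees term by term with the first formula above.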

See also

Other related articles:

  • Effect size

References

  1. ^ Tschuprow, A. A. (1939). Principles of the Mathematical Theory of Correlation. Translated by M. Kantorowitsch. W. Hodge & Co.
  • Liebetrau, A. (1983). Measures of Association (Quantitative Applications in the Social Sciences). Sage Publications.