Ordered logit

Regression model for ordinal dependent variables


In statistics, the ordered logit model (also ordered logistic regression or proportional odds model) is an ordinal regression model, that is, a regression model for ordinal dependent variables, first considered by Peter McCullagh.^[1] For example, if one question on a survey is to be answered by a choice among "poor", "fair", "good", "very good" and "excellent", and the purpose of the analysis is to see how well that response can be predicted by the responses to other questions, some of which may be quantitative, then ordered logistic regression may be used. It can be thought of as an extension of logistic regression, which applies to dichotomous dependent variables, to more than two (ordered) response categories.

The model and the proportional odds assumption

The model only applies to data that meet the proportional odds assumption, the meaning of which can be exemplified as follows. Suppose there are five outcomes: "poor", "fair", "good", "very good", and "excellent". We assume that the probabilities of these outcomes are given by p₁(x), p₂(x), p₃(x), p₄(x), p₅(x), all of which are functions of some independent variable(s) x. Then, for a fixed value of x, the logarithms of the odds (not the logarithms of the probabilities) of answering in certain ways are:

$$
\begin{aligned}
\text{poor: } &\log \frac{p_{1}(x)}{p_{2}(x)+p_{3}(x)+p_{4}(x)+p_{5}(x)},\\[8pt]
\text{poor or fair: } &\log \frac{p_{1}(x)+p_{2}(x)}{p_{3}(x)+p_{4}(x)+p_{5}(x)},\\[8pt]
\text{poor, fair, or good: } &\log \frac{p_{1}(x)+p_{2}(x)+p_{3}(x)}{p_{4}(x)+p_{5}(x)},\\[8pt]
\text{poor, fair, good, or very good: } &\log \frac{p_{1}(x)+p_{2}(x)+p_{3}(x)+p_{4}(x)}{p_{5}(x)}
\end{aligned}
$$

The proportional odds assumption states that the numbers added to each of these logarithms to get the next are the same regardless of x. In other words, the logarithm of the odds of answering "poor" or "fair" minus the logarithm of the odds of answering "poor" is the same regardless of x; similarly, the logarithm of the odds of answering "poor", "fair", or "good" minus the logarithm of the odds of answering "poor" or "fair" is the same regardless of x; and so on.^[2]
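The assumption can be checked numerically. The following minimal Python sketch assumes the usual cumulative-logit form of the model, in which the log-odds of answering in category j or below equal a cutpoint minus a coefficient times x. The cutpoint values, the coefficient, and the function names are illustrative choices, not taken from any source. It computes the five probabilities, forms the four log-odds listed above, and prints the gaps between consecutive log-odds for several values of x.

```python
import math


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, beta, cutpoints):
    """Return p_1(x), ..., p_K(x) under an assumed cumulative-logit model."""
    eta = beta * x
    # Cumulative probabilities P(answer <= j | x); the last one is 1.
    cum = [logistic_cdf(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def cumulative_log_odds(probs):
    """Log-odds of 'category j or below' versus 'above j', for j = 1..K-1."""
    return [
        math.log(sum(probs[:j]) / sum(probs[j:]))
        for j in range(1, len(probs))
    ]


def gaps(x, beta, cutpoints):
    """Differences between consecutive cumulative log-odds at a given x."""
    lo = cumulative_log_odds(category_probs(x, beta, cutpoints))
    return [b - a for a, b in zip(lo, lo[1:])]


# Hypothetical parameter values, chosen only for illustration.
CUTPOINTS = [-2.0, -0.5, 0.5, 2.0]  # four cutpoints -> five categories
BETA = 0.8

for x in (0.0, 1.0, 3.0):
    print(x, [round(g, 6) for g in gaps(x, BETA, CUTPOINTS)])
```

Under these assumptions the printed gaps do not change with x, which is what the proportional odds assumption requires; the log-odds themselves shift with x, but all four shift by the same amount.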

Examples of multiple ordered response categories include bond ratings, opinion surveys with responses ranging from "strongly agree" to "strongly disagree", levels of state spending on government programs (high, medium, or low), the level of insurance coverage chosen (none, partial, or full), and employment status (not employed, employed part-time, or fully employed).^[3]

Ordered logit can be derived from a latent-variable model, similar to the one from which binary logistic regression can be derived. Suppose the underlying process to be characterized is

$$y^{*}=\mathbf{x}^{\mathsf{T}}\beta+\varepsilon,$$

where $y^{*}$ is an unobserved dependent variable (perhaps the exact level of agreement with the statement proposed by the pollster); $\mathbf{x}$ is the vector of independent variables; $\varepsilon$ is the error term, assumed to follow a standard logistic distribution; and $\beta$ is the vector of regression coefficients which we wish to estimate. Further suppose that we cannot observe $y^{*}$ itself, but only the categories of response

$$
y=\begin{cases}
0 & \text{if } y^{*}\leq \mu_{1},\\
1 & \text{if } \mu_{1}<y^{*}\leq \mu_{2},\\
2 & \text{if } \mu_{2}<y^{*}\leq \mu_{3},\\
\vdots \\
N & \text{if } \mu_{N}<y^{*}
\end{cases}
$$

where the parameters $\mu_{i}$ are the externally imposed endpoints of the observable categories. Then the ordered logit technique will use the observations on $y$, which are a form of censored data on $y^{*}$, to fit the parameter vector $\beta$.
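The latent-variable construction can be illustrated by simulation. The sketch below is a hedged example with a single independent variable and hypothetical values for the coefficient and the endpoints; the function names are likewise illustrative. It draws the latent variable with a standard logistic error, maps it to an observed category using the rule above, and compares the observed category frequencies with the probabilities the model implies, namely the logistic CDF evaluated at consecutive endpoints minus the linear predictor, differenced.

```python
import bisect
import math
import random


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_logistic(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return math.log(u / (1.0 - u))


def observe(y_star, cutpoints):
    """Map latent y* to a category 0..N: the number of cutpoints strictly below y*."""
    return bisect.bisect_left(cutpoints, y_star)


def model_probs(eta, cutpoints):
    """P(y = j) = F(mu_{j+1} - eta) - F(mu_j - eta), with F(-inf) = 0 and F(+inf) = 1."""
    cum = [0.0] + [logistic_cdf(c - eta) for c in cutpoints] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]


# Hypothetical values, chosen only for illustration.
CUTPOINTS = [-1.0, 0.0, 1.5]  # mu_1, mu_2, mu_3 -> categories 0..3
BETA = 0.8
X = 1.0

rng = random.Random(0)
n = 20000
counts = [0] * (len(CUTPOINTS) + 1)
for _ in range(n):
    y_star = BETA * X + draw_logistic(rng)  # latent variable
    counts[observe(y_star, CUTPOINTS)] += 1  # only the category is recorded

empirical = [c / n for c in counts]
theoretical = model_probs(BETA * X, CUTPOINTS)
for j, (e, t) in enumerate(zip(empirical, theoretical)):
    print(j, round(e, 3), round(t, 3))
```

With a large enough sample, the simulated frequencies should lie close to the model probabilities, which shows in what sense the observed categories are a coarsened record of the latent variable.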

Estimation

For details on how the parameters of the model are estimated, see the article Ordinal regression.
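As a rough illustration only, the sketch below fits a single coefficient by maximum likelihood on simulated data. It rests on several simplifying assumptions that are not part of the article: one independent variable, endpoints treated as known and held fixed, and a coarse grid search in place of a proper numerical optimizer. All names and parameter values are hypothetical.

```python
import bisect
import math
import random


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def category_prob(y, eta, cutpoints):
    """P(y | eta) for categories 0..N under the latent-variable model."""
    upper = 1.0 if y == len(cutpoints) else logistic_cdf(cutpoints[y] - eta)
    lower = 0.0 if y == 0 else logistic_cdf(cutpoints[y - 1] - eta)
    return upper - lower


def log_likelihood(beta, cutpoints, xs, ys):
    """Sum of log category probabilities over all observations."""
    return sum(
        math.log(max(category_prob(y, beta * x, cutpoints), 1e-300))
        for x, y in zip(xs, ys)
    )


def simulate(beta, cutpoints, n, rng):
    """Generate (x, y) pairs from the latent-variable model."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        y_star = beta * x + math.log(u / (1.0 - u))  # logistic error
        xs.append(x)
        ys.append(bisect.bisect_left(cutpoints, y_star))
    return xs, ys


# Hypothetical values, chosen only for illustration.
CUTPOINTS = [-1.0, 0.0, 1.5]
TRUE_BETA = 0.8

xs, ys = simulate(TRUE_BETA, CUTPOINTS, 2000, random.Random(1))

# Coarse grid search over beta in [-2, 2] with step 0.05.
grid = [k / 100 for k in range(-200, 201, 5)]
beta_hat = max(grid, key=lambda b: log_likelihood(b, CUTPOINTS, xs, ys))
print("estimated beta:", beta_hat)
```

Statistical packages generally estimate the endpoints jointly with the coefficients and use gradient-based optimization; the grid search here is only meant to make the likelihood being maximized visible.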

References

[1] McCullagh, Peter (1980). "Regression Models for Ordinal Data". Journal of the Royal Statistical Society. Series B (Methodological). 42 (2): 109–142. JSTOR 2984952.
[2] "rologit.pdf" (PDF). Stata.
[3] Greene, William H. (2012). Econometric Analysis (Seventh ed.). Boston: Pearson Education. pp. 824–827. ISBN 978-0-273-75356-8.

External links

Simon, Steve (2004-09-22). "Sample size for an ordinal outcome". STATS − STeve's Attempt to Teach Statistics. Retrieved 2014-08-22.
Rodríguez, Germán. "Ordered Logit Models". Princeton University.