Semiparametric regression

In statistics, semiparametric regression comprises regression models that combine parametric and nonparametric components. Such models are often used when a fully nonparametric model may not perform well, or when the researcher wants to use a parametric model but does not know the functional form with respect to a subset of the regressors or the density of the errors. Semiparametric regression models are a particular type of semiparametric model. Because they contain a parametric component, they rely on parametric assumptions and, just like fully parametric models, may be misspecified and inconsistent.

Methods

Many semiparametric regression methods have been proposed. Among the most widely used are partially linear models, index models and varying coefficient models.

Partially linear models

A partially linear model is given by

$$Y_i = X_i'\beta + g(Z_i) + u_i, \qquad i = 1, \ldots, n,$$

where $Y_i$ is the dependent variable, $X_i$ is a $p \times 1$ vector of explanatory variables, $\beta$ is a $p \times 1$ vector of unknown parameters and $Z_i \in \mathbb{R}^q$. The parametric part of the partially linear model is the parameter vector $\beta$, while the nonparametric part is the unknown function $g(Z_i)$. The data are assumed to be i.i.d. with $E(u_i \mid X_i, Z_i) = 0$, and the model allows for a conditionally heteroskedastic error process $E(u_i^2 \mid X_i = x, Z_i = z) = \sigma^2(x, z)$ of unknown form. This type of model was proposed by Robinson (1988) and extended to handle categorical covariates by Racine and Li (2007).

This method is implemented by first obtaining a $\sqrt{n}$-consistent estimator $\hat{\beta}$ of $\beta$ and then estimating $g(Z_i)$ by a nonparametric regression of $Y_i - X_i'\hat{\beta}$ on $Z_i$, using an appropriate nonparametric regression method.[1]
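Robinson's (1988) route to the $\sqrt{n}$-consistent first step is a double-residual argument. Taking expectations conditional on $Z_i$ gives $E(Y_i \mid Z_i) = E(X_i \mid Z_i)'\beta + g(Z_i)$, and subtracting this from the model removes $g$, leaving $Y_i - E(Y_i \mid Z_i) = \left(X_i - E(X_i \mid Z_i)\right)'\beta + u_i$. The Python sketch below replaces both conditional means with leave-one-out Nadaraya-Watson kernel estimates, runs least squares on the residuals, and then smooths $Y_i - X_i\hat{\beta}$ on $Z_i$. It is a minimal illustration, assuming scalar $X_i$ and $Z_i$, a Gaussian kernel and a hand-picked bandwidth `h`; it omits trimming and data-driven bandwidth selection, and the function names are hypothetical.

```python
import math
import random


def gaussian_kernel(t):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def nw_fit(z_train, v_train, z0, h, skip=None):
    """Nadaraya-Watson estimate of E(v | Z = z0); `skip` drops one index (leave-one-out)."""
    num = den = 0.0
    for j, (zj, vj) in enumerate(zip(z_train, v_train)):
        if j == skip:
            continue
        w = gaussian_kernel((zj - z0) / h)
        num += w * vj
        den += w
    return num / den


def robinson_partially_linear(y, x, z, h):
    """Double-residual estimate of beta in Y = X*beta + g(Z) + u (scalar X and Z)."""
    n = len(y)
    # Step 1: leave-one-out kernel estimates of E(Y|Z) and E(X|Z).
    ey = [nw_fit(z, y, z[i], h, skip=i) for i in range(n)]
    ex = [nw_fit(z, x, z[i], h, skip=i) for i in range(n)]
    # Step 2: least squares (no intercept) of Y - E(Y|Z) on X - E(X|Z).
    ry = [yi - eyi for yi, eyi in zip(y, ey)]
    rx = [xi - exi for xi, exi in zip(x, ex)]
    beta_hat = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
    # Step 3: kernel regression of the partial residual Y - X*beta_hat on Z.
    partial = [yi - xi * beta_hat for yi, xi in zip(y, x)]

    def g_hat(z0):
        return nw_fit(z, partial, z0, h)

    return beta_hat, g_hat


# Simulated example: beta = 2, g(z) = sin(2*pi*z), X correlated with Z.
rng = random.Random(0)
n = 300
z = [rng.uniform(0.0, 1.0) for _ in range(n)]
x = [zi + rng.gauss(0.0, 1.0) for zi in z]
y = [2.0 * xi + math.sin(2.0 * math.pi * zi) + rng.gauss(0.0, 0.3) for xi, zi in zip(x, z)]
beta_hat, g_hat = robinson_partially_linear(y, x, z, h=0.05)
```

With this simulated data, `beta_hat` should come out close to 2, and `g_hat(0.25)` and `g_hat(0.75)` close to $1$ and $-1$. Because the regressor is correlated with $Z$, naively regressing $Y$ on $X$ alone would not recover $\beta$.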

Index models

A single index model takes the form

$$Y = g(X'\beta_0) + u,$$

where $Y$ and $X$ are defined as earlier, $\beta_0$ is the vector of unknown parameters, and the error term $u$ satisfies $E(u \mid X) = 0$. The single index model takes its name from its parametric part, $X'\beta_0$, which is a scalar (a single index). The nonparametric part is the unknown function $g(\cdot)$.

Ichimura's method

The single index model method developed by Ichimura (1993) is as follows. Consider the case in which $Y$ is continuous. If the form of the function $g(\cdot)$ were known, $\beta_0$ could be estimated by nonlinear least squares, minimizing

$$\sum_{i=1}^{n} \left(Y_i - g(X_i'\beta)\right)^2.$$
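As a parametric benchmark, the nonlinear least squares step can be sketched with Gauss-Newton iterations when $g$ is known. The logistic link, the two regressors, the simulated data and the function names below are assumptions chosen only for illustration.

```python
import math
import random


def logistic(t):
    """Known link g(t) = 1 / (1 + exp(-t)), assumed for this illustration."""
    return 1.0 / (1.0 + math.exp(-t))


def nls_known_link(y, X, beta0, iters=50):
    """Gauss-Newton for min_b sum_i (y_i - logistic(x_i'b))^2 with two regressors."""
    b = list(beta0)
    for _ in range(iters):
        # Accumulate J'J (2x2) and J'r (2x1), where J_i = g'(x_i'b) * x_i.
        a11 = a12 = a22 = c1 = c2 = 0.0
        for yi, (x1, x2) in zip(y, X):
            p = logistic(b[0] * x1 + b[1] * x2)
            d = p * (1.0 - p)  # derivative of the logistic link
            j1, j2 = d * x1, d * x2
            r = yi - p
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            c1 += j1 * r
            c2 += j2 * r
        # Solve the 2x2 normal equations (J'J) step = J'r explicitly.
        det = a11 * a22 - a12 * a12
        step1 = (a22 * c1 - a12 * c2) / det
        step2 = (a11 * c2 - a12 * c1) / det
        b = [b[0] + step1, b[1] + step2]
        if abs(step1) + abs(step2) < 1e-10:
            break
    return b


rng = random.Random(4)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
y = [logistic(x1 - 0.5 * x2) + rng.gauss(0, 0.05) for x1, x2 in X]
b_hat = nls_known_link(y, X, beta0=(0.0, 0.0))
```

Here `b_hat` should be near $(1, -0.5)$. Note that with $g$ known, the scale of $\beta$ is identified; once $g$ has to be estimated, it no longer is.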

Since the functional form of $g(\cdot)$ is not known, it must be estimated. For a given value of $\beta$, the function

$$G(X_i'\beta) = E(Y_i \mid X_i'\beta) = E\left[g(X_i'\beta_0) \mid X_i'\beta\right]$$

can be estimated using a kernel method. Ichimura (1993) proposes estimating $g(X_i'\beta)$ with

$$\hat{G}_{-i}(X_i'\beta),$$

the leave-one-out nonparametric kernel estimator of $G(X_i'\beta)$, and then choosing $\beta$ to minimize $\sum_{i=1}^{n} \left(Y_i - \hat{G}_{-i}(X_i'\beta)\right)^2$.
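Ichimura's semiparametric least squares idea can be sketched as follows. Because $g$ is unknown, $\beta$ is identified only up to location and scale, so this sketch uses two regressors, normalizes $\beta = (1, b)$ and searches $b$ over a grid, minimizing the sum of squared leave-one-out residuals. The Gaussian kernel, the bandwidth `h`, the grid, the simulated $g(t) = 2\tanh(t)$ and the function names are assumptions; the trimming used in Ichimura (1993) is omitted.

```python
import math
import random


def loo_nw(v, y, h):
    """Leave-one-out Nadaraya-Watson fits G_{-i}(v_i) of y on the index v."""
    n = len(v)
    fits = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j != i:
                w = math.exp(-0.5 * ((v[j] - v[i]) / h) ** 2)
                num += w * y[j]
                den += w
        fits.append(num / den)
    return fits


def ichimura_sls(y, X, grid, h):
    """Grid search over beta = (1, b) minimizing sum_i (y_i - G_{-i}(x_i'beta))^2."""
    best_b, best_ssr = None, float("inf")
    for b in grid:
        v = [x1 + b * x2 for x1, x2 in X]  # single index x_i'beta
        fits = loo_nw(v, y, h)
        ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))
        if ssr < best_ssr:
            best_b, best_ssr = b, ssr
    return best_b


rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
y = [2 * math.tanh(x1 - 0.5 * x2) + rng.gauss(0, 0.2) for x1, x2 in X]
grid = [k / 20 for k in range(-40, 41)]  # b in [-2, 2], step 0.05
b_hat = ichimura_sls(y, X, grid, h=0.3)
```

The estimate `b_hat` should fall near the true value $-0.5$ even though the sketch never uses the form of $g$. In practice a numerical optimizer usually replaces the grid search when there are more than a couple of regressors.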

Klein and Spady's estimator

If the dependent variable $Y$ is binary and $X_i$ and $u_i$ are assumed to be independent, Klein and Spady (1993) propose a technique for estimating $\beta$ using maximum likelihood methods. The log-likelihood function is given by

$$L(\beta) = \sum_i (1 - Y_i) \ln\left(1 - \hat{g}_{-i}(X_i'\beta)\right) + \sum_i Y_i \ln\left(\hat{g}_{-i}(X_i'\beta)\right),$$

where $\hat{g}_{-i}(X_i'\beta)$ is the leave-one-out kernel estimator of $P(Y_i = 1 \mid X_i'\beta)$.
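A minimal sketch of maximizing this likelihood follows. With $g$ unknown only the direction of $\beta$ is identified, so the sketch uses two regressors, normalizes $\beta = (1, b)$ and searches $b$ over a grid. Here $\hat{g}_{-i}$ is a leave-one-out Nadaraya-Watson estimate of $P(Y_i = 1 \mid X_i'\beta)$, clipped to $[\varepsilon, 1 - \varepsilon]$ so the logarithms stay finite. The clipping, Gaussian kernel, bandwidth, grid, simulated data and function names are assumptions of this illustration; the trimming and kernel refinements of the original paper are omitted.

```python
import math
import random


def loo_prob(v, y, h):
    """Leave-one-out kernel estimate of P(Y=1 | index) at each v_i."""
    n = len(v)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j != i:
                w = math.exp(-0.5 * ((v[j] - v[i]) / h) ** 2)
                num += w * y[j]
                den += w
        out.append(num / den)
    return out


def klein_spady(y, X, grid, h, eps=1e-3):
    """Grid search maximizing the Klein-Spady log-likelihood over beta = (1, b)."""
    best_b, best_ll = None, -float("inf")
    for b in grid:
        v = [x1 + b * x2 for x1, x2 in X]
        # Clip fitted probabilities away from 0 and 1 so the logs are finite.
        p = [min(max(pi, eps), 1.0 - eps) for pi in loo_prob(v, y, h)]
        ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
        if ll > best_ll:
            best_b, best_ll = b, ll
    return best_b


rng = random.Random(2)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
# Binary response whose latent error is drawn independently of X.
y = [1 if 2 * (x1 - 0.5 * x2) + rng.gauss(0, 1) > 0 else 0 for x1, x2 in X]
grid = [k / 10 for k in range(-20, 21)]  # b in [-2, 2], step 0.1
b_hat = klein_spady(y, X, grid, h=0.3)
```

The latent index in the simulation is proportional to $x_1 - 0.5\,x_2$, so `b_hat` should land near $-0.5$. A binary response carries less information than a continuous one, so the estimate is noisier than in a comparable least squares setting.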

Smooth coefficient/varying coefficient models

Hastie and Tibshirani (1993) propose a smooth coefficient model given by

$$Y_i = \alpha(Z_i) + X_i'\beta(Z_i) + u_i = \left(1, X_i'\right) \begin{pmatrix} \alpha(Z_i) \\ \beta(Z_i) \end{pmatrix} + u_i = W_i'\gamma(Z_i) + u_i,$$

where $X_i$ is a $k \times 1$ vector, $\alpha(z)$ is an unspecified smooth function of $z$, $\beta(z)$ is a vector of unspecified smooth functions of $z$, $W_i = (1, X_i')'$ and $\gamma(z) = \left(\alpha(z), \beta(z)'\right)'$.

Provided $E[W_i W_i' \mid Z_i]$ is invertible, $\gamma(\cdot)$ may be expressed as

$$\gamma(Z_i) = \left(E\left[W_i W_i' \mid Z_i\right]\right)^{-1} E\left[W_i Y_i \mid Z_i\right].$$
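Replacing the two conditional expectations in this expression with kernel-weighted sample moments gives a local-constant estimator of $\gamma(z_0)$ at a point $z_0$. The sketch below assumes a scalar $X_i$, so $W_i = (1, X_i)'$ and the matrix to invert is $2 \times 2$, together with a Gaussian kernel and a fixed bandwidth; the function name and the simulated coefficient functions $\alpha(z) = \sin(2\pi z)$ and $\beta(z) = 1 + z$ are hypothetical.

```python
import math
import random


def varying_coef(y, x, z, z0, h):
    """Local-constant kernel estimate of gamma(z0) = (alpha(z0), beta(z0)).

    Sample analogue of (E[W W' | Z=z0])^{-1} E[W Y | Z=z0] with W = (1, x).
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for yi, xi, zi in zip(y, x, z):
        k = math.exp(-0.5 * ((zi - z0) / h) ** 2)
        # Kernel-weighted entries of sum W W' (s..) and sum W Y (t..).
        s11 += k
        s12 += k * xi
        s22 += k * xi * xi
        t1 += k * yi
        t2 += k * xi * yi
    # Invert the symmetric 2x2 matrix explicitly.
    det = s11 * s22 - s12 * s12
    alpha = (s22 * t1 - s12 * t2) / det
    beta = (s11 * t2 - s12 * t1) / det
    return alpha, beta


rng = random.Random(3)
n = 400
z = [rng.uniform(0.0, 1.0) for _ in range(n)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [math.sin(2 * math.pi * zi) + (1 + zi) * xi + rng.gauss(0.0, 0.3)
     for xi, zi in zip(x, z)]
alpha_hat, beta_hat = varying_coef(y, x, z, z0=0.5, h=0.05)
```

At $z_0 = 0.5$ the simulated truth is $\alpha(0.5) = 0$ and $\beta(0.5) = 1.5$, so the estimates should be close to those values. Evaluating `varying_coef` over a grid of $z_0$ values traces out the whole coefficient curves.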

Notes

  1. ^ See Li and Racine (2007) for an in-depth look at nonparametric regression methods.

References

  • Robinson, P.M. (1988). "Root-n Consistent Semiparametric Regression". Econometrica. 56 (4). The Econometric Society: 931–954. doi:10.2307/1912705. JSTOR 1912705.
  • Li, Qi; Racine, Jeffrey S. (2007). Nonparametric Econometrics: Theory and Practice. Princeton University Press. ISBN 978-0-691-12161-1.
  • Racine, J.S.; Qui, L. (2007). "A Partially Linear Kernel Estimator for Categorical Data". Unpublished Manuscript, McMaster University.
  • Ichimura, H. (1993). "Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single Index Models". Journal of Econometrics. 58 (1–2): 71–120. doi:10.1016/0304-4076(93)90114-K.
  • Klein, R. W.; Spady, R. H. (1993). "An Efficient Semiparametric Estimator for Binary Response Models". Econometrica. 61 (2). The Econometric Society: 387–421. CiteSeerX 10.1.1.318.4925. doi:10.2307/2951556. JSTOR 2951556.
  • Hastie, T.; Tibshirani, R. (1993). "Varying-Coefficient Models". Journal of the Royal Statistical Society, Series B. 55: 757–796.