Least trimmed squares

Least trimmed squares (LTS), or least trimmed sum of squares, is a robust statistical method that fits a function to a set of data whilst not being unduly affected by the presence of outliers.[1] It is one of a number of methods for robust regression.

Description of method

Instead of the standard least squares method, which minimises the sum of squared residuals over all n points, the LTS method attempts to minimise the sum of squared residuals over a subset of k of those points. The unused n − k points do not influence the fit.

In a standard least squares problem, the estimated parameter values β are defined to be those values that minimise the objective function S(β), the sum of squared residuals:

$$ S(\beta) = \sum_{i=1}^{n} r_{i}(\beta)^{2}, $$

where the residuals are defined as the differences between the values of the dependent variables (observations) and the model values:

$$ r_{i}(\beta) = y_{i} - f(x_{i}, \beta), $$

and where n is the overall number of data points. For a least trimmed squares analysis, this objective function is replaced by one constructed as follows. For a fixed value of β, let $r_{(j)}(\beta)$ denote the absolute values of the residuals sorted into increasing order, so that $r_{(1)}(\beta) \le r_{(2)}(\beta) \le \dots \le r_{(n)}(\beta)$. In this notation, the standard sum of squares function is

$$ S(\beta) = \sum_{j=1}^{n} r_{(j)}(\beta)^{2}, $$

while the objective function for LTS is

$$ S_{k}(\beta) = \sum_{j=1}^{k} r_{(j)}(\beta)^{2}. $$
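For a fixed β, this objective is straightforward to evaluate: compute every residual, sort the absolute values, and sum the squares of the k smallest. The following sketch shows this in Python with NumPy; the function name lts_objective, the model argument, and the toy data are illustrative assumptions rather than part of any published implementation.

    import numpy as np

    def lts_objective(beta, x, y, k, model):
        """S_k(beta): the sum of the k smallest squared residuals."""
        abs_residuals = np.abs(y - model(x, beta))  # |r_i(beta)| = |y_i - f(x_i, beta)|
        smallest = np.sort(abs_residuals)[:k]       # ordered absolute residuals r_(1), ..., r_(k)
        return np.sum(smallest ** 2)

    # Toy example with a straight line f(x, beta) = beta[0] + beta[1] * x.
    # The last observation is a gross outlier; with k = 4 it is trimmed,
    # so it contributes nothing to the objective.
    line = lambda x, beta: beta[0] + beta[1] * x
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([0.1, 1.1, 1.9, 3.0, 40.0])
    print(lts_objective(np.array([0.0, 1.0]), x, y, k=4, model=line))  # ≈ 0.03

Note that the k smallest absolute residuals are also the k smallest squared residuals, so sorting before or after squaring gives the same result.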

Computational considerations

Because the method is combinatorial, in that each data point is either included in the fit or excluded from it, no closed-form solution exists. Algorithms for finding the LTS solution therefore search over subsets of the data, seeking the k-point subset whose least squares fit has the lowest sum of squared residuals. Exact algorithms exist for small n, but the number of candidate subsets grows combinatorially with n, so in practice one resorts to methods that find approximate (but generally sufficient) solutions, such as the random-restart heuristic sketched below.
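A widely used heuristic alternates ordinary least squares fits with "concentration" steps: fit on a candidate subset of k points, replace the subset with the k points having the smallest absolute residuals under the new fit, and repeat until the subset stabilises; random restarts reduce the risk of stopping at a poor local optimum. The sketch below illustrates this for a straight-line model in Python with NumPy; names and parameters such as lts_fit, n_starts and n_steps are assumptions for illustration, not a reference implementation.

    import numpy as np

    def lts_fit(x, y, k, n_starts=50, n_steps=20, seed=None):
        """Approximate LTS fit of y = b0 + b1*x via random restarts and
        concentration steps (a heuristic sketch, not an exact solver)."""
        rng = np.random.default_rng(seed)
        X = np.column_stack([np.ones_like(x), x])   # design matrix for a line
        best_beta, best_obj = None, np.inf
        for _ in range(n_starts):
            # Start from a least squares fit on a random k-point subset.
            subset = rng.choice(len(x), size=k, replace=False)
            for _ in range(n_steps):
                beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
                # Concentration step: keep the k points with the smallest
                # absolute residuals under the current fit.
                new_subset = np.argsort(np.abs(y - X @ beta))[:k]
                if np.array_equal(np.sort(new_subset), np.sort(subset)):
                    break                           # subset has stabilised
                subset = new_subset
            obj = np.sum((y[subset] - X[subset] @ beta) ** 2)  # S_k(beta)
            if obj < best_obj:
                best_obj, best_beta = obj, beta
        return best_beta, best_obj

In practice k is commonly chosen between roughly half and three quarters of n, trading robustness (smaller k tolerates a larger fraction of outliers) against statistical efficiency.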

References

  1. ^ Fox, John (2015). Applied Regression Analysis and Generalized Linear Models (3rd ed.). Thousand Oaks, CA: Sage. Chapter 19.
  • Rousseeuw, P. J. (1984). "Least Median of Squares Regression". Journal of the American Statistical Association. 79 (388): 871–880. doi:10.1080/01621459.1984.10477105. JSTOR 2288718.
  • Rousseeuw, P. J.; Leroy, A. M. (2005) [1987]. Robust Regression and Outlier Detection. Wiley. doi:10.1002/0471725382. ISBN 978-0-471-85233-9.
  • Li, L. M. (2005). "An algorithm for computing exact least-trimmed squares estimate of simple linear regression with constraints". Computational Statistics & Data Analysis. 48 (4): 717–734. doi:10.1016/j.csda.2004.04.003.
  • Atkinson, A. C.; Cheng, T.-C. (1999). "Computing least trimmed squares regression with the forward search". Statistics and Computing. 9 (4): 251–263. doi:10.1023/A:1008942604045.
  • Jung, Kang-Mo (2007). "Least Trimmed Squares Estimator in the Errors-in-Variables Model". Journal of Applied Statistics. 34 (3): 331–338. doi:10.1080/02664760601004973.