Scoring algorithm

The scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used in statistics to solve maximum likelihood equations numerically. It is named after Ronald Fisher.

Sketch of derivation

Let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with twice-differentiable p.d.f. $f(y;\theta)$, and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function, $V(\theta) = \sum_{i=1}^{n} \nabla_{\theta} \log f(Y_i;\theta)$, about $\theta_0$:

$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$

where

$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left. \nabla \nabla^{\top} \right|_{\theta = \theta_0} \log f(Y_i;\theta)$$

is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$, and rearranging gives us:

$$\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0)\, V(\theta_0).$$

We therefore use the algorithm

$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m)\, V(\theta_m),$$

and under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^*$.
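As a concrete illustration (a minimal sketch, not drawn from the sources above), take i.i.d. draws from an exponential density $f(y;\theta) = \theta e^{-\theta y}$, for which the score is $V(\theta) = n/\theta - \sum_i Y_i$ and the observed information is $\mathcal{J}(\theta) = n/\theta^2$. The function name below is illustrative only:

```python
import numpy as np

def newton_scoring_exponential(y, theta0, n_iter=25, tol=1e-10):
    """Newton-type scoring iteration theta <- theta + J(theta)^{-1} V(theta)
    for an i.i.d. exponential(rate theta) sample (illustrative sketch)."""
    theta = theta0
    n = len(y)
    for _ in range(n_iter):
        score = n / theta - y.sum()      # V(theta), gradient of the log-likelihood
        obs_info = n / theta**2          # J(theta), observed information
        step = score / obs_info          # J(theta)^{-1} V(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)          # true rate theta = 1/scale = 0.5
# starting value chosen inside the region of convergence (0, 2/mean(y))
print(newton_scoring_exponential(y, theta0=0.3))  # approaches the MLE 1/mean(y)
```

Because this model is a one-parameter exponential family in canonical form, the observed and expected information coincide, so the Newton iteration above and Fisher scoring (next section) produce identical updates in this particular example.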

Fisher scoring

In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, thus giving us the Fisher scoring algorithm:

$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m)\, V(\theta_m).$$

Under some regularity conditions, if $\theta_m$ is a consistent estimator, then $\theta_{m+1}$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
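As a hedged sketch of why this substitution helps, consider estimating the location $\theta$ of a standard Cauchy sample: the expected information is simply $\mathcal{I}(\theta) = n/2$, whereas the observed information depends on the data and need not even be positive. The code below assumes this model; the function name is illustrative only:

```python
import numpy as np

def fisher_scoring_cauchy_location(y, theta0, n_iter=50, tol=1e-10):
    """Fisher scoring for the location of a standard Cauchy sample,
    using the expected information I(theta) = n/2 in place of the
    (possibly indefinite) observed information (illustrative sketch)."""
    theta = theta0
    n = len(y)
    for _ in range(n_iter):
        r = y - theta
        score = np.sum(2.0 * r / (1.0 + r**2))   # V(theta)
        fisher_info = n / 2.0                    # I(theta) = E[J(theta)]
        step = score / fisher_info
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(1)
y = rng.standard_cauchy(5000) + 3.0              # true location = 3.0
print(fisher_scoring_cauchy_location(y, theta0=np.median(y)))
```

Starting from the sample median, which is a consistent estimator of the location, the result cited above implies that even a single scoring step already has the same asymptotic error distribution as the full maximum likelihood estimate.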

See also

  • Score (statistics)
  • Score test
  • Fisher information

References

  1. ^ Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika. 74 (4): 817–827. doi:10.1093/biomet/74.4.817.
  2. ^ Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer New York. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6.

Further reading

  • Jennrich, R. I. & Sampson, P. F. (1976). "Newton-Raphson and Related Algorithms for Maximum Likelihood Variance Component Estimation". Technometrics. 18 (1): 11–17. doi:10.1080/00401706.1976.10489395. JSTOR 1267911.