AdaBoost

AdaBoost (Adaptive Boosting) is a machine learning algorithm formulated by Yoav Freund and Robert Schapire[1]. It is a meta-algorithm, and can be used in combination with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that each subsequent classifier is built to compensate for the mistakes of the previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it is less susceptible to overfitting than many other learning algorithms.

AdaBoost applies a weak classifier $h_t$ to each example in turn, for $t = 1$ up to $t = T$, and checks whether each classifier predicts that example correctly. The weight $D_t(i)$ associated with a misclassified example is increased (equivalently, the weights of correctly classified examples are decreased). These example weights steer the next round $t$ toward a classifier that correctly handles the examples the previous classifiers got wrong.
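For concreteness, here is a made-up numeric illustration of the reweighting (the numbers are not from the original text; the formula for $\alpha_t$ is the one given in the algorithm below): if a weak classifier attains weighted error $\epsilon_t = 0.2$, then

$\alpha_t = \frac{1}{2} \ln \frac{1 - 0.2}{0.2} \approx 0.693$

so, before renormalization, the weight of every misclassified example is multiplied by $e^{0.693} \approx 2$ and the weight of every correctly classified example by $e^{-0.693} \approx 0.5$, roughly quadrupling the relative emphasis placed on the mistakes.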

The algorithm for binary classification

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y = \{-1, +1\}$

Initialize $D_1(i) = \frac{1}{m}$ for $i = 1, \ldots, m$.

For $t = 1, \ldots, T$:

  • Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to the distribution $D_t$:
$h_t = \underset{h_j \in \mathcal{H}}{\operatorname{argmin}} \; \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$
  • If $\epsilon_t \geq \frac{1}{2}$ then stop.
  • Choose $\alpha_t \in \mathbf{R}$, typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$, where $\epsilon_t$ is the weighted error rate of the classifier $h_t$.
  • Update:
$D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$

where $Z_t = \sum_{i=1}^{m} D_t(i) \exp(-\alpha_t y_i h_t(x_i))$ is a normalization factor, chosen so that $D_{t+1}$ is a probability distribution, i.e. sums to one over all examples.

Output the final classifier:

$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

The equation used to update the distribution $D_t$ is constructed so that:

$-\alpha_t y_i h_t(x_i) \begin{cases} < 0, & y_i = h_t(x_i) \\ > 0, & y_i \neq h_t(x_i) \end{cases}$

Thus, after selecting an optimal classifier $h_t$ for the distribution $D_t$, the examples $x_i$ that the classifier $h_t$ identified correctly are weighted less and those that it identified incorrectly are weighted more. Therefore, when the algorithm is testing the classifiers on the distribution $D_{t+1}$, it will select a classifier that better identifies the examples that the previous classifier missed.
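The loop above translates almost line-for-line into code. The following is a minimal sketch in Python with NumPy, using an exhaustive decision-stump search as the weak learner; the helper names (fit_stump, stump_predict, adaboost, predict) and the synthetic data are illustrative assumptions of this sketch, not part of the original article.

import numpy as np

def fit_stump(X, y, D):
    """Search every feature, threshold and polarity for the decision stump
    that minimizes the weighted error sum_i D[i] * [y_i != h(x_i)]."""
    m, n = X.shape
    best = (np.inf, 0, 0.0, 1)            # (error, feature, threshold, polarity)
    for j in range(n):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = D[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(X, feature, threshold, polarity):
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def adaboost(X, y, T=20):
    """AdaBoost for labels y in {-1, +1}; returns a list of (alpha_t, stump_t)."""
    m = X.shape[0]
    D = np.full(m, 1.0 / m)               # D_1(i) = 1/m
    ensemble = []
    for t in range(T):
        err, j, thr, pol = fit_stump(X, y, D)
        if err >= 0.5:                     # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)              # guard against log(0) for a perfect stump
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = stump_predict(X, j, thr, pol)
        D *= np.exp(-alpha * y * pred)     # up-weight mistakes, down-weight hits
        D /= D.sum()                       # divide by Z_t: keep D a distribution
        ensemble.append((alpha, (j, thr, pol)))
    return ensemble

def predict(ensemble, X):
    """H(x) = sign(sum_t alpha_t h_t(x))."""
    score = np.zeros(X.shape[0])
    for alpha, (j, thr, pol) in ensemble:
        score += alpha * stump_predict(X, j, thr, pol)
    return np.sign(score)

# Tiny usage example on synthetic data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    model = adaboost(X, y, T=10)
    print("training accuracy:", (predict(model, X) == y).mean())

In this sketch the renormalization D /= D.sum() plays the role of dividing by $Z_t$, and the early break corresponds to the stopping condition $\epsilon_t \geq \frac{1}{2}$ above.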

A statistical understanding of boosting

Boosting can be viewed as the minimization of a convex loss function over a convex set of functions[2]. Specifically, the loss that is minimized is the exponential loss function

$\sum_i e^{-y_i f(x_i)}$

and the minimization is carried out over functions of the form

$f = \sum_t \alpha_t h_t$.
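This view also motivates the particular choice of $\alpha_t$ used in the algorithm above. As a supplementary, standard stagewise calculation not spelled out in the original text: holding the ensemble built so far fixed and writing $\epsilon_t$ for the weighted error of $h_t$ under $D_t$, the exponential loss incurred by adding a term $\alpha h_t$ is proportional to

$L(\alpha) = (1 - \epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha}$

and setting $\frac{dL}{d\alpha} = -(1 - \epsilon_t) e^{-\alpha} + \epsilon_t e^{\alpha} = 0$ gives $e^{2\alpha} = \frac{1 - \epsilon_t}{\epsilon_t}$, i.e. $\alpha = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$, which is exactly the weight $\alpha_t$ used in the update step.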

See also

Footnotes

  1. ^ Yoav Freund, Robert E. Schapire (1995). “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”. Retrieved July 9, 2010.
  2. ^ T. Zhang, "Convex Risk Minimization", Annals of Statistics, 2004.

External links

  • icsiboost, an open source implementation of Boostexter
  • NPatternRecognizer, a fast machine learning algorithm library written in C#. It contains support vector machines, neural networks, Bayes, boosting, k-nearest neighbor, decision trees, ..., etc.
  • Adaboost in C++, an implementation of Adaboost in C++ and boost by Antonio Gulli
  • Easy readable Matlab Implementation of Classic AdaBoost
  • Boosting.org, a site on boosting and related ensemble learning methods
  • JBoost, a site offering a classification and visualization package, implementing AdaBoost among other boosting algorithms.
  • AdaBoost Presentation summarizing Adaboost (see page 4 for an illustrated example of performance)
  • A Short Introduction to Boosting Introduction to Adaboost by Freund and Schapire from 1999
  • A decision-theoretic generalization of on-line learning and an application to boosting Journal of Computer and System Sciences, no. 55. 1997 (Original paper of Yoav Freund and Robert E. Schapire where AdaBoost is first introduced.)
  • An applet demonstrating AdaBoost
  • Ensemble Based Systems in Decision Making, R. Polikar, IEEE Circuits and Systems Magazine, vol.6, no.3, pp. 21-45, 2006. A tutorial article on ensemble systems including pseudocode, block diagrams and implementation issues for AdaBoost and other ensemble learning algorithms.
  • A Matlab Implementation of AdaBoost
  • Additive logistic regression: a statistical view of boosting by Jerome Friedman, Trevor Hastie, Robert Tibshirani. Paper introducing probabilistic theory for AdaBoost, and introducing GentleBoost
  • OpenCV implementation of several boosting variants
  • MATLAB AdaBoost toolbox. Includes Real AdaBoost, Gentle AdaBoost and Modest AdaBoost implementations.
  • AdaBoost and the Super Bowl of Classifiers - A Tutorial on AdaBoost.
  • Rapid Object Detection using a Boosted Cascade of Simple Features