A Bayesian Approach for Online Classifier Ensemble


We propose a Bayesian approach for recursively estimating the classifier weights in online learning of a classifier ensemble. In contrast with past methods, such as stochastic gradient descent or online boosting, our approach estimates the weights by recursively updating their posterior distribution. For a specified class of loss functions, we show that it is possible to formulate a suitably defined likelihood function and hence use the posterior distribution as an approximation to the global empirical loss minimizer. If the stream of training data is sampled from a stationary process, we can also show that our approach converges to the expected loss minimizer at a rate superior to that of standard stochastic gradient descent.
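
A minimal sketch of the idea (illustrative only, not the authors' released code): the posterior over ensemble weights is tracked on a finite set of candidate weight vectors, and each new labeled example multiplies it by a pseudo-likelihood proportional to exp(-loss). All names here (make_weight_grid, hinge, recursive_update, posterior_mean) are hypothetical.

  import numpy as np

  def make_weight_grid(n_classifiers, n_candidates=400, seed=0):
      # Hypothetical helper: candidate weight vectors sampled from the simplex.
      rng = np.random.default_rng(seed)
      return rng.dirichlet(np.ones(n_classifiers), size=n_candidates)

  def hinge(margin):
      # Example loss on the ensemble margin y * w^T h(x).
      return np.maximum(0.0, 1.0 - margin)

  def recursive_update(log_post, candidates, h_t, y_t, loss=hinge):
      # One Bayesian step for the example (h_t, y_t), where h_t holds the base
      # classifiers' predictions in {-1, +1} and y_t is the true label:
      # add the log pseudo-likelihood -loss(margin), then renormalize.
      margins = y_t * (candidates @ h_t)
      log_post = log_post - loss(margins)
      return log_post - np.logaddexp.reduce(log_post)

  def posterior_mean(log_post, candidates):
      # Point estimate of the ensemble weights: posterior mean over candidates.
      return np.exp(log_post) @ candidates

  # Usage on a stream of (h_t, y_t) pairs:
  #   candidates = make_weight_grid(n_classifiers=10)
  #   log_post = np.full(len(candidates), -np.log(len(candidates)))
  #   for h_t, y_t in stream:
  #       log_post = recursive_update(log_post, candidates, h_t, y_t)
  #   w_hat = posterior_mean(log_post, candidates)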

For more details, see the ICML'14 paper, the full version, and the slides.

Downloads

[Code*] [Dataset* + Result*]
* Coming soon!

[BibTex]
@inproceedings{bai2014bayesian,
  title={A Bayesian framework for online classifier ensemble},
  author={Bai, Qinxun and Lam, Henry and Sclaroff, Stan},
  booktitle={Proceedings of the 31st International Conference on Machine Learning (ICML)},
  year={2014}
}

Online Ensemble Learning

A Stochastic Optimization Problem

Motivation

Two classical results in Bayesian statistics
  • The Bayesian posterior distribution concentrates around the maximum-likelihood estimate (MLE) of the same likelihood function.
  • The MLE minimizes the empirical negative log-likelihood and thus, asymptotically, the expected negative log-likelihood.
Our idea
  • The above results hold regardless of whether the likelihood actually describes the data-generating process.
  • Derive an artificial likelihood from a predefined loss function and run the recursive Bayesian procedure on it (the recursion is sketched just after this list).
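
Concretely, the recursion we have in mind can be sketched as follows (the notation here is ours, not copied from the paper): with base-classifier predictions h_t for the example (x_t, y_t) and a predefined loss ℓ, treat exp(−ℓ(y_t, wᵀh_t)) as the likelihood of the ensemble weights w and update

  p_t(w) ∝ p_{t−1}(w) · exp(−ℓ(y_t, wᵀh_t)),

so that after t examples the posterior concentrates around the minimizer of the cumulative loss Σ_s ℓ(y_s, wᵀh_s), just as an ordinary posterior concentrates around the MLE.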

Main results

A Bayesian scheme for online classifier ensemble
  • Formulate online ensemble learning as a stochastic optimization problem and estimate the ensemble weights through a recursive Bayesian procedure.
  • Under some regularity conditions, the procedure converges to the global optimum of the expected loss, in contrast with many methods that only guarantee local convergence.
  • The convergence rate is superior to that of standard stochastic gradient descent in terms of asymptotic variance; empirically, it outperforms state-of-the-art SGD methods (a toy SGD step is sketched after this list).
  • Promising performance in experiments compared with online boosting methods.
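
For contrast, a standard stochastic gradient step on the same hinge loss updates a single weight vector rather than a posterior. The sketch below is illustrative only; the learning rate and the crude simplex projection are assumptions, not the baselines used in the paper.

  import numpy as np

  def sgd_step(w, h_t, y_t, lr=0.1):
      # One (sub)gradient step on the hinge loss of the ensemble margin,
      # followed by a crude projection onto the simplex (clip + renormalize).
      margin = y_t * np.dot(w, h_t)
      if margin < 1.0:                  # subgradient of max(0, 1 - margin)
          w = w + lr * y_t * h_t
      w = np.clip(w, 0.0, None)
      return w / max(w.sum(), 1e-12)

The asymptotic-variance comparison above is between estimators of this SGD type and the recursive Bayesian estimate; the toy code is only meant to make the contrast concrete.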