RT-75

14th International Congress
THE "NEW FRONTIERS"
OF ARRHYTHMIAS 2000

Jan. 29 - Feb. 5, 2000
Marilleva, Trento, Italy

Bagging as a predictive method for landscape epidemiology of Lyme disease

Cesare Furlanello, Stefano Merler, Annapaola Rizzoli*, Claudio Chemini*, Claudio Genchi**.
ITC-IRST, Trento, *Centro di Ecologia Alpina, Trento, **Institute of General Pathology and Parasitology, Veterinary Medicine, Milano University, Italy

Bagging classification trees for predictive risk modeling

A statistical procedure called bagging (for Bootstrap AGGregatING) has recently been proposed for combining an ensemble of predictive models25. The key idea is to grow many versions of the same predictor over resampled versions of the data set L available for model development. Given this learning set L = {(y_n, x_n), n = 1, ..., N} of N observations, where the y_n are the class labels and the x_n are the multivariate vectors of predictor variables, assume there exists a procedure to develop from L a model f(x, L) for predicting the class y of a generic and possibly unseen vector x. When small changes in L can result in large changes in the classification model f, future classification is improved by using a combination of different models f(x, L_b), each developed on a different learning set L_b. The output of the bagged classifier f_B(x) is defined by voting, i.e. it is the class receiving the most votes: if N_j = #{b : f(x, L_b) = j} is the number of models voting for class j, then f_B(x) = argmax_j N_j. In the case of regression models, the output is the average of the f(x, L_b). Additional data are not required: the replicate learning sets L_b are obtained as bootstrap samples of L, i.e. data sets of N cases drawn at random, but with replacement, from L.
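The procedure just described (bootstrap replicates L_b, one model f(x, L_b) per replicate, output by majority vote) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the study: the names fit_stump, bootstrap_sample and bagging are hypothetical, the base model is a one-split tree (a "stump") standing in for a full classification tree, and the toy learning set is invented for the example.

```python
import random
from collections import Counter


def fit_stump(data):
    """Fit a one-split tree on a list of (y, x) pairs (hypothetical helper).

    Tries every feature/threshold pair and keeps the split with the fewest
    training errors; each side of the split predicts its majority class.
    """
    best = None
    for j in range(len(data[0][1])):
        for t in sorted({x[j] for _, x in data}):
            left = [y for y, x in data if x[j] <= t]
            right = [y for y, x in data if x[j] > t]
            if not left or not right:
                continue
            yl = Counter(left).most_common(1)[0][0]
            yr = Counter(right).most_common(1)[0][0]
            errors = sum(y != yl for y in left) + sum(y != yr for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, yl, yr)
    if best is None:
        # No valid split (e.g. all cases identical): constant predictor.
        majority = Counter(y for y, _ in data).most_common(1)[0][0]
        return lambda x: majority
    _, j, t, yl, yr = best
    return lambda x: yl if x[j] <= t else yr


def bootstrap_sample(data, rng):
    """Draw N cases at random, with replacement, from the N cases in data."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def bagging(data, n_replicates, fit=fit_stump, seed=0):
    """Return the bagged classifier f_B built from n_replicates models."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(data, rng)) for _ in range(n_replicates)]

    def f_B(x):
        # N_j = number of models voting for class j; return argmax_j N_j.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return f_B


# Invented toy learning set L of (y_n, x_n) pairs with two predictors.
L = [
    (0, (0.1, 1.0)), (0, (0.3, 0.2)), (0, (0.4, 0.9)),
    (1, (0.6, 0.1)), (1, (0.8, 0.7)), (1, (0.9, 0.4)),
]
f_B = bagging(L, n_replicates=25)
print(f_B((0.05, 0.5)), f_B((0.95, 0.5)))
```

For regression models the vote in f_B would be replaced by the mean of the individual predictions. The fit argument is left open so that any other base learner with the same interface could be plugged in.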
The bagging process is illustrated in figure 1. The technique can reduce output variability and thus induce a surprisingly large reduction in error rates. The gain is especially large with recursive partitioning procedures (tree-based classifiers), which have low bias but depend strongly on the training data. A single classification tree is not only appropriate for modeling phenomena described by both discrete and continuous variables, but also yields a simple representation of the model. The bagging procedure, in which the output is obtained by voting or averaging, can reduce the variability of the error on novel data; however, it generally produces models that are less immediately interpretable. In our landscape epidemiology problem, the model outputs constitute a digital map: using a geographical information system (GIS) as an interface thus eliminates the problem of interpreting bagged models.
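A standard textbook argument, given here as an illustration rather than as the authors' derivation, shows why averaging reduces variability. Assume that, at a fixed x, the B predictions f(x, L_b) are identically distributed with variance σ² and pairwise correlation ρ. The symbols B (number of bootstrap replicates), σ² and ρ are introduced for this sketch only. The variance of the averaged output is then:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f(x, L_b)\right)
  = \frac{1}{B^{2}}\left[B\sigma^{2} + B(B-1)\rho\sigma^{2}\right]
  = \rho\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

The second term shrinks as B grows, so the reduction is largest for predictors with a large σ², such as trees that change substantially when the learning set changes. Under the same assumption the expected value of the average equals that of a single model, so the bias is left unchanged. The formula applies directly to the averaging (regression) case; for voting the same intuition holds only qualitatively.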

 

Fig. 1: The general bagging procedure.
