In this work we consider statistical learning problems. A learning machine aims to extract information from a set of training examples so that it can predict the associated label on unseen examples. We consider the case where the resulting classification or regression rule is a combination of simple rules, also called base hypotheses. So-called boosting algorithms iteratively find a weighted linear combination of base hypotheses that predicts well on unseen data. We address the following issues:

- The statistical learning theory framework for analyzing boosting methods. We study learning-theoretic guarantees on the prediction performance on unseen examples. Recently, large margin classification techniques have emerged as a practical result of the theory of generalization, in particular Boosting and Support Vector Machines. A large margin implies good generalization performance. Hence, we analyze how large the margins in boosting are and derive an improved algorithm that is able to generate the maximum margin solution.
- How can boosting methods be related to mathematical optimization techniques? To analyze the properties of the resulting classification or regression rule, it is of high importance to understand whether and under which conditions boosting converges. We show that boosting can be used to solve large-scale constrained optimization problems whose solutions are well characterized. To show this, we relate boosting methods to methods known from mathematical optimization and derive convergence guarantees for a quite general family of boosting algorithms.
- How can boosting be made noise robust? One problem of current boosting techniques is that they are sensitive to noise in the training sample. In order to make boosting robust, we transfer the soft margin idea from support vector learning to boosting. We develop theoretically motivated regularized algorithms that exhibit high noise robustness.
- How can boosting be adapted to regression problems? Boosting methods were originally designed for classification problems. To extend the boosting idea to regression problems, we use the previous convergence results and relations to semi-infinite programming to design boosting-like algorithms for regression problems. We show that these leveraging algorithms have desirable theoretical and practical properties.
- Can boosting techniques be useful in practice? The presented theoretical results are accompanied by simulation results, either to illustrate properties of the proposed algorithms or to show that they work well in practice. We report on successful applications in a non-intrusive power monitoring system, chaotic time series analysis and a drug discovery process.

---
Note: The author received the Michelson Prize, awarded by the Faculty of Mathematics and Natural Sciences of the University of Potsdam, for the best dissertation of the year 2001/2002.
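The core boosting loop described above — iteratively adding weighted base hypotheses and re-weighting the training examples — can be illustrated with a minimal AdaBoost sketch. This is a generic textbook-style illustration on 1-D data with threshold stumps, not the specific algorithms developed in this work; the function names `train_adaboost` and `predict` are hypothetical.

```python
import math

def train_adaboost(xs, ys, n_rounds=10):
    """Minimal AdaBoost on 1-D data with threshold stumps.

    xs: list of floats; ys: list of labels in {-1, +1}.
    Returns a list of (alpha, threshold, direction) weak hypotheses,
    i.e. a weighted linear combination of base hypotheses.
    """
    n = len(xs)
    w = [1.0 / n] * n                  # uniform initial example weights
    ensemble = []
    thresholds = sorted(set(xs))
    for _ in range(n_rounds):
        # pick the stump (threshold, direction) with lowest weighted error
        best = None
        for t in thresholds:
            for d in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (d if x > t else -d) != y)
                if best is None or err < best[0]:
                    best = (err, t, d)
        err, t, d = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # numeric guard
        alpha = 0.5 * math.log((1 - err) / err)  # hypothesis weight
        ensemble.append((alpha, t, d))
        # re-weight: emphasize examples the chosen stump got wrong
        w = [wi * math.exp(-alpha * y * (d if x > t else -d))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all base hypotheses."""
    score = sum(a * (d if x > t else -d) for a, t, d in ensemble)
    return 1 if score >= 0 else -1
```

The margin discussed in the first item corresponds to the normalized `score` of an example times its label: the larger it is, the more confidently the combined rule classifies that example.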
Questions: Can forest site characteristics be used to predict Ellenberg indicator values for soil moisture? Which averaged mean value is best suited for modelling? Does the distribution of soil moisture depend on spatial information?

Location: Bavarian Alps, Germany.

Methods: We used topographic, climatic and edaphic variables to model the mean soil moisture value as found on 1505 forest plots from the database WINALPecobase. All predictor variables were taken from area-wide geodata layers, so the model can be applied to some 250 000 ha of forest in the target region. We adopted methods developed in species distribution modelling to regionalize Ellenberg indicator values. To this end, we used the additive georegression framework for the spatial prediction of Ellenberg values with the R library mboost, which allows environmental effects, spatial autocorrelation, predictor interactions and non-stationarity in our data to be considered simultaneously. The framework is much more flexible than established statistical and machine-learning models in species distribution modelling. We estimated five different mboost models reflecting different model structures, each on 50 bootstrap samples.

Results: Median R2 values calculated on independent test samples ranged from 0.28 to 0.45. Our results show a significant influence of interactions and non-stationarity in addition to environmental covariates. Unweighted mean indicator values could be modelled better than abundance-weighted values, and the consideration of bryophytes did not improve model performance. Partial response curves indicate meaningful dependencies between moisture indicator values and environmental covariates. However, mean indicator values <4.5 and >6.0 could not be modelled correctly, since they were poorly represented in our calibration sample. The final map provides high-resolution information on site hydrological conditions.
Conclusions: Indicator values offer an effect-oriented alternative to physically based hydrological models for predicting water-related site conditions, even at landscape scale. The presented approach is applicable to all kinds of Ellenberg indicator values. It is therefore a significant step towards a new generation of models of forest site types and potential natural vegetation.
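The componentwise fitting idea behind mboost can be sketched in a few lines: in each iteration, a simple base-learner is fitted to the current residuals for every covariate, and only the best-fitting one is updated by a small, shrunken step. The following Python sketch assumes squared-error loss and plain linear base-learners on centred covariates; the names `boost_l2`, `boost_predict` and the step length `nu` are illustrative, not mboost's actual API, and the real package additionally supports smooth, spatial and interaction base-learners.

```python
def boost_l2(X, y, n_iter=200, nu=0.1):
    """Componentwise L2 boosting with linear base-learners.

    X: list of rows (lists of floats); y: list of floats.
    Covariates are centred so that slope-only base-learners suffice.
    Returns (intercept, coefficients, covariate means).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    intercept = sum(y) / n              # offset = mean response
    coef = [0.0] * p
    fit = [intercept] * n
    for _ in range(n_iter):
        resid = [yi - fi for yi, fi in zip(y, fit)]
        best = None                     # (loss, covariate index, slope)
        for j in range(p):
            xj = [row[j] for row in Xc]
            sxx = sum(x * x for x in xj)
            if sxx == 0:                # constant covariate: skip
                continue
            b = sum(x * r for x, r in zip(xj, resid)) / sxx  # LS slope
            loss = sum((r - b * x) ** 2 for r, x in zip(resid, xj))
            if best is None or loss < best[0]:
                best = (loss, j, b)
        _, j, b = best
        coef[j] += nu * b               # shrunken update of one covariate
        fit = [fi + nu * b * row[j] for fi, row in zip(fit, Xc)]
    return intercept, coef, means

def boost_predict(model, row):
    intercept, coef, means = model
    return intercept + sum(c * (x - m) for c, x, m in zip(coef, row, means))
```

Because only one covariate is updated per iteration and each step is shrunken by `nu`, early stopping of the iteration count acts as regularization and performs intrinsic variable selection, which is what makes the framework attractive for the many geodata predictors used here.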