Carsten F. Dormann, Jane Elith, Sven Bacher, Carsten M. Buchmann, Gudrun Carl, Gabriel Carre, Jaime R. Garcia Marquez, Bernd Gruber, Bruno Lafourcade, Pedro J. Leitao, Tamara Münkemüller, Colin McClean, Patrick E. Osborne, Bjoern Reineking, Boris Schröder-Esselbach, Andrew K. Skidmore, Damaris Zurell, Sven Lautenbach
Collinearity refers to the non-independence of predictor variables, usually in a regression-type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time and used to predict to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and change through time.
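The variance inflation described above is conventionally quantified by the variance inflation factor (VIF), a standard diagnostic that this abstract does not itself name. For ordinary least squares, $\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{\sum_i (x_{ij}-\bar x_j)^2}\cdot\frac{1}{1-R_j^2}$, where $R_j^2$ comes from regressing predictor $x_j$ on all other predictors. With exactly two predictors, $R_j^2 = r^2$. The minimal Python sketch below assumes that two-predictor case, and its function name is hypothetical. It tabulates how the VIF and the standard-error inflation ($\sqrt{\text{VIF}}$) grow with the correlation $r$ between the predictors.

```python
import math

def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for either slope in an OLS model with two predictors.

    With two predictors, the R^2 from regressing one on the other is r^2,
    so VIF = 1 / (1 - r^2). It is undefined for perfectly collinear predictors.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("VIF is undefined when |r| >= 1")
    return 1.0 / (1.0 - r * r)

# Tabulate how coefficient variance (VIF) and standard error (sqrt(VIF))
# grow as the correlation between the two predictors increases.
for r in (0.0, 0.5, 0.7, 0.9, 0.99):
    vif = vif_two_predictors(r)
    print(f"|r| = {r:.2f}   VIF = {vif:6.2f}   SE inflated {math.sqrt(vif):.2f}x")
```

At |r| = 0.7 the VIF is about 1.96, so standard errors are roughly 1.4 times larger than with uncorrelated predictors. At |r| = 0.99 they are about 7 times larger.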
Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors and threshold-based pre-selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor-response relationships of increasing complexity and eight levels of collinearity, we compared ways to address collinearity with standard multiple regression and machine-learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating their performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold-based pre-selection when omitted variables are considered in the final interpretation. However, all approaches tested yielded degraded predictions under change in collinearity structure, and the 'folk lore' threshold of |r| > 0.7 for correlation coefficients between predictor variables was an appropriate indicator for when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre-analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.
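Threshold-based pre-selection with the |r| > 0.7 rule can be sketched as a greedy filter. The abstract does not specify how predictors are ranked, so this sketch makes an assumption: the absolute Pearson correlation with the response stands in for univariate importance. A predictor is kept only if its |r| with every already-kept predictor does not exceed the threshold. This is an illustration, not the procedure used in the paper. The function names, predictor names and toy data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_by_correlation(predictors, response, threshold=0.7):
    """Greedy pre-selection: keep a predictor only if |r| <= threshold with all kept ones.

    predictors: dict mapping predictor name -> list of values.
    ASSUMPTION: univariate importance is measured as |r| with the response.
    """
    # Visit predictors from most to least important (stable sort keeps dict order on ties).
    ranked = sorted(
        predictors,
        key=lambda name: abs(pearson_r(predictors[name], response)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson_r(predictors[name], predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: "pet" is strongly collinear with "temp" (r ~ 0.94).
y = [1, 2, 3, 4, 5, 6]
data = {
    "temp": [1, 2, 3, 4, 5, 6],
    "pet": [1, 2, 3, 4, 6, 5],
    "soil_ph": [2, 1, 2, 1, 2, 1],
}
print(select_by_correlation(data, y))        # ['temp', 'soil_ph']
print(select_by_correlation(data, y, 0.95))  # ['temp', 'pet', 'soil_ph']
```

The importance ranking decides which member of a correlated pair survives. The dropped predictor ("pet" here) may still carry the true effect, which is why the abstract stresses considering omitted variables in the final interpretation.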
Metadata

| Field | Value |
|---|---|
| Author details | Carsten F. Dormann, Jane Elith, Sven Bacher, Carsten M. Buchmann, Gudrun Carl, Gabriel Carre, Jaime R. Garcia Marquez, Bernd Gruber, Bruno Lafourcade, Pedro J. Leitao, Tamara Münkemüller, Colin McClean, Patrick E. Osborne, Bjoern Reineking, Boris Schröder-Esselbach, Andrew K. Skidmore, Damaris Zurell, Sven Lautenbach |
| DOI | https://doi.org/10.1111/j.1600-0587.2012.07348.x |
| ISSN | 0906-7590 |
| ISSN | 1600-0587 |
| Title of parent work (English) | Ecography : pattern and diversity in ecology ; research papers forum |
| Publisher | Wiley-Blackwell |
| Place of publishing | Hoboken |
| Publication type | Article |
| Language | English |
| Year of first publication | 2013 |
| Publication year | 2013 |
| Release date | 2017/03/26 |
| Volume | 36 |
| Issue | 1 |
| Number of pages | 20 |
| First page | 27 |
| Last page | 46 |
| Funding institution | Helmholtz Association [VH-NG-247]; German Science Foundation [4851/220/07, SCHR 1000/3-1, 14-2]; Australian Centre of Excellence for Risk Analysis; Australian Research Council [DP0772671]; Hesse's Ministry of Higher Education, Research, and the Arts; Portuguese Science and Technology Foundation FCT [SFRH/BD/12569/2003]; 'Bavarian Climate Programme 2020' within the joint research centre FORKAST |
| Organizational units | Mathematisch-Naturwissenschaftliche Fakultät / Institut für Biochemie und Biologie |
| Peer review | Refereed |