Collinearity: a review of methods to deal with it and a simulation study evaluating their performance
(2013)
Collinearity refers to the non-independence of predictor variables, usually in a regression-type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time, and predicted to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and through time. Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors, threshold-based pre-selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor-response relationships of increasing complexity and eight levels of collinearity we compared ways to address collinearity with standard multiple regression and machine-learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating its performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold-based pre-selection when omitted variables are considered in the final interpretation. However, all approaches tested yielded degraded predictions under change in collinearity structure, and the 'folklore' threshold of |r| > 0.7 for correlation coefficients between predictor variables was an appropriate indicator for when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre-analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.
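As an illustration of the threshold-based pre-selection and the |r| > 0.7 rule of thumb mentioned in the abstract above, the following is a minimal sketch, not the procedure used in the study. The function names, the simulated predictors (temp, elev, rain) and the greedy "keep the first, drop later correlated ones" order are all assumptions made for this example; the abstract itself recommends that ecological understanding should guide which variable of a correlated pair is retained.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.7):
    """Return (name_i, name_j, r) for every predictor pair with |r| > threshold."""
    r = np.corrcoef(X, rowvar=False)  # columns of X are predictors
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

def preselect(X, names, threshold=0.7):
    """Greedy pre-selection (an assumed ordering rule): keep a predictor only
    if its |r| with every already-kept predictor is at most the threshold."""
    r = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(len(names)):
        if all(r[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[k] for k in kept]

# Hypothetical simulated predictors: 'elev' is built to be strongly
# (negatively) correlated with 'temp'; 'rain' is independent of both.
rng = np.random.default_rng(0)
n = 500
temp = rng.normal(size=n)
elev = -0.9 * temp + 0.3 * rng.normal(size=n)
rain = rng.normal(size=n)

X = np.column_stack([temp, elev, rain])
names = ["temp", "elev", "rain"]

flagged = correlated_pairs(X, names)  # pairs exceeding the |r| > 0.7 rule
kept = preselect(X, names)            # predictors retained for modelling
print(flagged)
print(kept)
```

Whichever variable is dropped, the abstract's caveat still applies: the omitted variable should be considered when interpreting the final model, because its effect is absorbed by the retained correlate.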
SDM performance varied for different range dynamics. Prediction accuracies decreased when abrupt range shifts occurred as species were outpaced by the rate of climate change, and increased again when a new equilibrium situation was realised. When ranges contracted, prediction accuracies increased as the absences were predicted well. Far-dispersing species were faster in tracking climate change, and were predicted more accurately by SDMs than short-dispersing species. BRTs mostly outperformed GLMs. The presence of a predator, and the inclusion of its incidence as an environmental predictor, made BRTs and GLMs perform similarly. Results are discussed in light of other studies dealing with effects of ecological traits and processes on SDM performance. Perspectives are given on further advancements of SDMs and for possible interfaces with more mechanistic approaches in order to improve predictions under environmental change.
Within the field of species distribution modelling an apparent dichotomy exists between process-based and correlative approaches, where the processes are explicit in the former and implicit in the latter. However, these intuitive distinctions can become blurred when comparing species distribution modelling approaches in more detail. In this review article, we contrast the extremes of the correlative-process spectrum of species distribution models with respect to core assumptions, model building and selection strategies, validation, uncertainties, common errors and the questions they are most suited to answer. The extremes of such approaches differ clearly in many aspects, such as model building approaches, parameter estimation strategies and transferability. However, they also share strengths and weaknesses. We show that claims of one approach being intrinsically superior to the other are misguided and that they ignore the process-correlation continuum as well as the domains of questions that each approach is addressing. Nonetheless, the application of process-based approaches to species distribution modelling lags far behind more correlative (process-implicit) methods and more research is required to explore their potential benefits. Critical issues for the employment of species distribution modelling approaches are given, together with a guideline for appropriate usage. We close with challenges for future development of process-explicit species distribution models and how they may complement current approaches to study species distributions.