Constrain to perform: regularization of habitat models

Reineking, Björn; Schröder-Esselbach, Boris

doi:10.1016/j.ecolmodel.2005.10.003

Predictive habitat models are an important tool for ecological research and conservation. A major cause of unreliable models is excessive model complexity, and regularization methods aim to improve the predictive performance by adequately constraining model complexity. We compare three regularization methods for logistic regression: variable selection, lasso, and ridge. They differ in the way model complexity is measured: variable selection uses the number of estimated parameters, the lasso uses the sum of the absolute values of the parameter estimates, and the ridge uses the sum of the squared values of the parameter estimates. We performed a simulation study with environmental data of a real landscape and artificial species occupancy data.
We investigated the effect of three factors on relative model performance: (1) the number of parameters (16, 10, 6, 2) in the 'true' model that determined the distribution of the artificial species, (2) the prevalence, i.e. the proportion of sites occupied by the species, and (3) the sample size (measured in events per variable, EPV). Regularization improved model discrimination and calibration. However, no regularization method performed best under all circumstances: the ridge generally performed best in the 16-parameter scenario. The lasso generally performed best in the 10-parameter scenario. Variable selection with AIC was best at large sample sizes (EPV >= 10) when less than half of the variables influenced the species distribution. However, at low sample sizes (EPV < 10), ridge and lasso always performed best, regardless of the parameter scenario or prevalence. Overall, calibration was best in ridge models. Other methods showed overconfidence, particularly at low sample sizes. The percentage of correctly identified models was low for both lasso and variable selection. Variable selection should be used with caution. Although it can produce the best performing models under certain conditions, these situations are difficult to infer from the data. Ridge and lasso are risk-averse model strategies that can be expected to perform well under a wide range of underlying species-habitat relationships, particularly at small sample sizes.
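
The three complexity measures named in the abstract, and the EPV sample-size measure, can be illustrated with a minimal sketch. The function names, the tolerance `tol`, and the example numbers are hypothetical and not taken from the article; EPV is assumed here to follow the common definition (count of the rarer outcome divided by the number of candidate parameters), which the abstract itself does not spell out.

```python
def complexity_measures(coefs, tol=1e-12):
    """Complexity of a coefficient vector (intercept excluded) under each method.

    Hypothetical helper: `tol` decides when a coefficient counts as zero.
    """
    return {
        # variable selection: number of estimated (non-zero) parameters
        "n_params": sum(1 for b in coefs if abs(b) > tol),
        # lasso: sum of the absolute values of the parameter estimates
        "l1": sum(abs(b) for b in coefs),
        # ridge: sum of the squared values of the parameter estimates
        "l2": sum(b * b for b in coefs),
    }


def events_per_variable(n_presence, n_absence, n_candidate_params):
    """EPV, assuming the usual definition: rarer outcome count per candidate parameter."""
    return min(n_presence, n_absence) / n_candidate_params


# Made-up coefficients and counts, for illustration only.
example = complexity_measures([0.5, -2.0, 0.0, 1.5])
epv = events_per_variable(n_presence=40, n_absence=160, n_candidate_params=16)
print(example, epv)
```

Under this assumed definition, 40 presences among 200 sites with 16 candidate parameters fall in the low-sample-size regime (EPV < 10) in which the abstract reports ridge and lasso performing best.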

Author details:	Björn Reineking, Boris Schröder-Esselbach
URL:	http://www.sciencedirect.com/science/journal/03043800
DOI:	https://doi.org/10.1016/j.ecolmodel.2005.10.003
ISSN:	0304-3800
Publication type:	Article
Language:	English
Year of first publication:	2006
Publication year:	2006
Release date:	2017/03/24
Source:	Ecological modelling. - ISSN 0304-3800. - 193 (2006), 3-4, pp. 675-690
Organizational units:	Mathematisch-Naturwissenschaftliche Fakultät / Institut für Biochemie und Biologie
Peer review:	Refereed
