Refine
Has Fulltext
- no (3)
Document Type
- Article (3)
Language
- English (3)
Is part of the Bibliography
- yes (3)
Keywords
- Record linkage (1)
- animal movement (1)
- clustering (1)
- data matching (1)
- deduplication (1)
- entity resolution (1)
- kernel density estimation (1)
- local convex hull (1)
- minimum convex polygon (1)
- range distribution (1)
Moving in the Anthropocene
(2018)
Animal movement is fundamental for ecosystem functioning and species survival, yet the effects of the anthropogenic footprint on animal movements have not been estimated across species. Using a unique GPS-tracking database of 803 individuals across 57 species, we found that movements of mammals in areas with a comparatively high human footprint were on average one-half to one-third the extent of their movements in areas with a low human footprint. We attribute this reduction to behavioral changes of individual animals and to the exclusion of species with long-range movements from areas with higher human impact. Global loss of vagility alters a key ecological trait of animals that affects not only population persistence but also ecosystem processes such as predator-prey interactions, nutrient cycling, and disease transmission.
Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive data set of GPS locations from 369 individuals representing 27 species distributed across five continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated‐Gaussian reference function [AKDE], Silverman's rule of thumb, and least squares cross‐validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half‐sample cross‐validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation ( N̂ area
) to quantify the information content of each data set. We found that AKDE 95% area estimates were larger than conventional IID‐based estimates by a mean factor of 2. The median number of cross‐validated locations included in the hold‐out sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing N̂ area. To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small N̂ area. While 72% of the 369 empirical data sets had >1,000 total observations, only 4% had an N̂ area >1,000, where 30% had an N̂ area <30. In this frequently encountered scenario of small N̂ area, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.
Duplicate detection algorithms produce clusters of database records, each cluster representing a single real-world entity. As most of these algorithms use pairwise comparisons, the resulting (transitive) clusters can be inconsistent: Not all records within a cluster are sufficiently similar to be classified as duplicate. Thus, one of many subsequent clustering algorithms can further improve the result. <br /> We explain in detail, compare, and evaluate many of these algorithms and introduce three new clustering algorithms in the specific context of duplicate detection. Two of our three new algorithms use the structure of the input graph to create consistent clusters. Our third algorithm, and many other clustering algorithms, focus on the edge weights, instead. For evaluation, in contrast to related work, we experiment on true real-world datasets, and in addition examine in great detail various pair-selection strategies used in practice. While no overall winner emerges, we are able to identify best approaches for different situations. In scenarios with larger clusters, our proposed algorithm, Extended Maximum Clique Clustering (EMCC), and Markov Clustering show the best results. EMCC especially outperforms Markov Clustering regarding the precision of the results and additionally has the advantage that it can also be used in scenarios where edge weights are not available.