Statistical modelling of species distributions using presence-only data - Research Portal

Lancaster Environment Centre

Electronic data

2021Escamilla-MolgoraPhD
Final published version, 49.5 MB, PDF document
Available under license: CC BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/1209
Final published version

Keywords

SPECIES DISTRIBUTION MODELS, Spatial statistics, Cloud computing environments, Big Data for Ecology, Knowledge based systems

Statistical modelling of species distributions using presence-only data: A semantic and graphical approach using the tree of life

Research output: Thesis › Doctoral Thesis

Published

Juan Manuel Escamilla Molgora

Publication date	12/01/2021
Number of pages	305
Qualification	PhD
Awarding Institution	Lancaster University
Supervisors/Advisors	Atkinson, Peter (Supervisor); Sedda, Luigi (Supervisor); Diggle, Peter (Supervisor)
Thesis sponsors	CONACYT; Faculty of Science and Technology, Lancaster University
Award date	29/09/2020
Publisher	Lancaster University
Original language	English

Abstract

Understanding the mechanisms that determine and differentiate the establishment of organisms in space is an old and fundamental question in ecology. The emergence of life’s spatial patterns is guided by the confluence of three forces: environmental filtering, which skews the probability of establishment for organisms according to their evolutionary adaptations to local environmental conditions; biological interactions, which restrict their establishment according to the presence (or absence) of other organisms; and the diversification of organisms’ strategies (traits) to migrate and adapt to changing environments.
The main hypothesis of this research is that the accumulated knowledge of biodiversity occurrences, the taxonomic classification of species and geospatial environmental data can be integrated into a unified modelling framework to characterise the joint effect of these three forces and thus contribute more general, accurate and statistically sound species distribution models (SDMs).

The first part of this thesis describes the design and implementation of a knowledge engine capable of synthesising and integrating environmental geospatial data, taxonomic relationships and species occurrences. It uses semantic queries to instantiate complex data structures, represented as networks of concepts (knowledge graphs). Local taxonomic trees, distributed over a hierarchical spatial system of regular lattices, are used as knowledge graphs to perform data synthesis, geoprocessing and transformations. The implementation uses efficient call-by-need evaluations that facilitate spatial and scale analysis on large datasets.
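
As an illustration of call-by-need evaluation on a taxonomic tree, the following is a minimal Python sketch. It is not the thesis's knowledge engine: the class `TaxonNode`, its attributes and the occurrence counts are hypothetical, chosen only to show how an aggregate over a tree can be deferred until it is first requested and then cached.

```python
from functools import cached_property


class TaxonNode:
    """Hypothetical node of a local taxonomic tree (illustrative, not the thesis's API)."""

    def __init__(self, name, children=(), occurrences=0):
        self.name = name
        self.children = list(children)
        self.own_occurrences = occurrences
        self.evaluations = 0  # counts how often the aggregate is actually computed

    @cached_property
    def total_occurrences(self):
        # Call-by-need: nothing is computed until the first access,
        # and the result is cached for every later access.
        self.evaluations += 1
        return self.own_occurrences + sum(
            child.total_occurrences for child in self.children
        )


# Made-up occurrence counts for one lattice cell.
species_a = TaxonNode("Quercus rugosa", occurrences=12)
species_b = TaxonNode("Quercus laurina", occurrences=5)
genus = TaxonNode("Quercus", children=[species_a, species_b])
family = TaxonNode("Fagaceae", children=[genus])
```

Building the tree triggers no aggregation. Accessing `family.total_occurrences` walks the tree once, and repeated accesses reuse the cached value, which is the property that makes deferred evaluation attractive on large datasets.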

The second part of the thesis corresponds to the statistical specification and implementation of two modelling frameworks for species distribution models (one for single species and another for multiple species). These models are designed for presence-only observations obtained from the knowledge engine. The common specification of these models is that presence-only observations are the joint effect of two latent processes: one that defines the species presence (ecological suitability) and another that defines the probability of being sampled (sampling effort). The single-species framework uses an informative sample, chosen by the modeller, to account for the sampling effort. Three modelling strategies are proposed to account for the joint effect of the ecological and sampling processes (independent processes, a common spatial random effect and correlated processes). The three models were compared to the maximum entropy model (MaxEnt), a popular algorithm used in SDMs. In all cases, at least one model showed better predictive performance than MaxEnt.
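
The idea that a presence-only record arises from two latent processes can be sketched with a small simulation. The code below assumes the simplest of the three strategies (independent processes) and uses a logistic link; the function name, the coefficients `beta` and `alpha`, and the inputs are hypothetical and are not taken from the thesis.

```python
import math
import random


def logistic(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_presence_only(covariate, effort, beta=1.5, alpha=2.0, seed=0):
    """Simulate records as the joint outcome of two independent latent processes.

    covariate: environmental value per cell (drives ecological suitability)
    effort:    sampling-effort proxy per cell (drives the chance of being sampled)
    beta, alpha: hypothetical coefficients, chosen for illustration only
    """
    rng = random.Random(seed)
    records = []
    for x, e in zip(covariate, effort):
        present = rng.random() < logistic(beta * x)   # ecological process
        sampled = rng.random() < logistic(alpha * e)  # sampling process
        records.append(present and sampled)           # recorded only if both occur
    return records
```

Under these assumptions, cells with high suitability but very low effort produce almost no records, which is the bias that the sampling-effort process is meant to absorb.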

The multi-species modelling framework is a generalisation of the single-species framework for developing a joint species distribution model for presence-only data. The specification is a multilevel hierarchical logistic model with a single spatial random effect, common to all species of interest. The sampling effort is modelled as a complementary sample, obtained from complementary observations of the taxa of interest using a regional taxonomic tree. The model was tested against simulated data. The credible intervals of the posterior samples covered all simulated parameters. A case study in Eastern Mexico was presented as an application of the model. The results obtained in the case study were consistent with macroecological theory. The model proved effective in removing the bias and noise introduced by the sampling effort. This effect was particularly marked in urban areas, where the sampling intensity is greater. The research presented here provides an interdisciplinary approach for modelling joint species distributions aided by the automated selection of biological, spatial and environmental context.
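
One generic way to write a multilevel logistic model with a single spatial random effect shared by all species is given below. This is a sketch under stated assumptions, not the thesis's specification: the symbols $y_{ij}$, $p_{ij}$, $\mathbf{x}_i$, $\boldsymbol{\beta}_j$, $S_i$ and $\Sigma$ are introduced here for illustration, and the treatment of the sampling-effort process is omitted.

```latex
\begin{align}
  y_{ij} \mid p_{ij} &\sim \mathrm{Bernoulli}(p_{ij}),
    && i = 1,\dots,n \ \text{(cells)},\quad j = 1,\dots,J \ \text{(species)} \\
  \operatorname{logit}(p_{ij}) &= \mathbf{x}_i^{\top}\boldsymbol{\beta}_j + S_i \\
  \mathbf{S} = (S_1,\dots,S_n)^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma)
\end{align}
```

Here each species $j$ has its own coefficients $\boldsymbol{\beta}_j$, while the spatial effect $S_i$ carries no species index, so every species in cell $i$ shares it.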

Related research outputs

A joint distribution framework to improve presence-only species distribution models by exploiting opportunistic surveys

Biospytial: spatial graph-based computing for ecological big data
