Biospytial: spatial graph-based computing for ecological big data

Research output: Contribution to journal › Journal article

Article number: giaa039
Journal publication date: 11/05/2020
Journal: GigaScience
Issue number: 5
Volume: 9
Number of pages: 25
Pages (from-to): 1-25
Publication status: Published
Original language: English

Abstract

Biospytial is a modular open-source knowledge engine designed to import, organise, analyse and visualise big spatial ecological datasets using the power of graph theory. Specifically, it handles species occurrences and their taxonomic classification to support ecological analysis of biodiversity and species distributions. The engine uses a hybrid graph-relational approach to store and access information. The data are linked by relationships that are stored in a graph database, while tabular and geospatial (vector and raster) data are stored in a relational database management system (RDBMS). The graph data structure provides a scalable design that eases the problem of merging datasets from different sources. The linkage relationships use semantic structures (objects and predicates) to answer scientific questions represented as complex data structures stored in the graph database. To this end, we used species occurrences, taxonomic classification, and climatic datasets to build a knowledge graph of the Tree of Life embedded in an environmental and geographical grid. Biospytial comprises three interconnected components: i) a Geospatial Processing Unit (GPU) supported by an RDBMS with geoprocessing capabilities, ii) a Graph Storage and Querying Unit, and iii) a graph-relational package, called the Biospytial Computing Engine (BCE), that integrates all the system’s components. It also includes tools such as interactive notebooks (Jupyter), graph analytic libraries (NetworkX) and statistical frameworks (PyMC3). The Biospytial approach reduces the complexity of joining datasets through multiple primary-foreign key relations, a drawback of RDBMSs. Applied to ecological data, it allows the discovery and inference of relationships using the interconnected network of taxonomic and spatial relationships.
Its modular and scalable design makes it possible to run and distribute several instances simultaneously, allowing fast and efficient handling of big and complex ecological datasets. An example is included that applies the engine to the conservation of threatened species from the IUCN Red List, using the co-occurrence of jaguars (Panthera onca). This example demonstrates the engine’s capabilities in basic taxonomic tree manipulation, and in the analysis and visualisation of taxonomic groups co-occurring in space.
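
The hybrid graph-relational idea described in the abstract can be illustrated with a minimal sketch. This is not the Biospytial API: it assumes an in-memory SQLite table as a stand-in for the RDBMS and a plain Python dictionary of child-to-parent edges as a stand-in for the graph database. The table name `occurrence`, the grid cell identifiers, the occurrence records, and the helper functions `lineage`, `cooccurring_species` and `cooccurring_tree` are all hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Relational side (stand-in for the RDBMS): occurrence records tagged with a
# grid cell id. The records are invented for illustration, not real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occurrence (id INTEGER PRIMARY KEY, species TEXT, cell TEXT)"
)
conn.executemany(
    "INSERT INTO occurrence (species, cell) VALUES (?, ?)",
    [
        ("Panthera onca", "c1"),
        ("Tapirus bairdii", "c1"),
        ("Ateles geoffroyi", "c1"),
        ("Panthera onca", "c2"),
        ("Tapirus bairdii", "c2"),
        ("Puma concolor", "c3"),
    ],
)

# Graph side (stand-in for the graph database): child -> parent edges of a
# small, simplified taxonomic tree (species, family, order, class).
PARENT = {
    "Panthera onca": "Felidae",
    "Puma concolor": "Felidae",
    "Felidae": "Carnivora",
    "Tapirus bairdii": "Tapiridae",
    "Tapiridae": "Perissodactyla",
    "Ateles geoffroyi": "Atelidae",
    "Atelidae": "Primates",
    "Carnivora": "Mammalia",
    "Perissodactyla": "Mammalia",
    "Primates": "Mammalia",
}


def lineage(taxon):
    """Follow parent edges from a taxon up to the root of the toy tree."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def cooccurring_species(focal):
    """Return species recorded in any grid cell where the focal species occurs."""
    rows = conn.execute(
        """
        SELECT DISTINCT o2.species
        FROM occurrence AS o1
        JOIN occurrence AS o2 ON o1.cell = o2.cell
        WHERE o1.species = ? AND o2.species <> ?
        ORDER BY o2.species
        """,
        (focal, focal),
    ).fetchall()
    return [row[0] for row in rows]


def cooccurring_tree(focal):
    """Merge the lineages of co-occurring species into a parent -> children map."""
    tree = defaultdict(set)
    for species in cooccurring_species(focal):
        path = lineage(species)
        for child, parent in zip(path, path[1:]):
            tree[parent].add(child)
    return {parent: sorted(children) for parent, children in tree.items()}


if __name__ == "__main__":
    for parent, children in cooccurring_tree("Panthera onca").items():
        print(parent, "->", children)
```

In this sketch the spatial question (which species share a grid cell with the jaguar) is answered by the relational store, and the taxonomic question (how those species are related) is answered by traversing the graph, so no chain of primary-foreign key joins is needed to rebuild the tree.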