Biospytial - Research Portal | Lancaster University

Associated organisational units

Text available via DOI:

https://doi.org/10.1093/gigascience/giaa039
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Biospytial: spatial graph-based computing for ecological big data

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Biospytial: spatial graph-based computing for ecological big data. / Escamilla Molgora, Juan Manuel; Sedda, Luigi; Atkinson, Peter.
In: GigaScience, Vol. 9, No. 5, giaa039, 11.05.2020, p. 1-25.


Bibtex

@article{6cd48b9664b34963b776c441981e4fc7,

title = "Biospytial: spatial graph-based computing for ecological big data",

abstract = "Biospytial is a modular open source knowledge engine designed to import, organise, analyse and visualise big spatial ecological datasets using the power of graph theory. Specifically, it handles species occurrences and their taxonomic classification for performing ecological analysis on biodiversity and species distributions. The engine uses a hybrid graph-relational approach to store and access information. The data are linked with relationships that are stored in a graph database, while tabular and geospatial (vector and raster) data are stored in a relational database management system (RDBMS). The graph data structure provides a scalable design that eases the problem of merging datasets from different sources. The linkage relationships use semantic structures (objects and predicates) to answer scientific questions represented as complex data structures stored in the graph database. In this sense, we used species occurrences, taxonomic classification, and climatic datasets to build a knowledge graph of the Tree of Life embedded in an environmental and geographical grid. Biospytial comprises three interconnected components: i) a Geospatial Processing unit (GPU) supported by a RDBMS with geoprocessing capabilities, ii) a Graph Storage and Querying Unit, and iii) a graph-relational package, called: The Biospytial Computing Engine (BCE) that integrates all the system{\textquoteright}s components. It also includes tools like: interactive notebooks (Jupyter), graph analytic libraries (NetworkX) and statistical frameworks (PyMC3). The Biospytial approach reduces the complexity of joining datasets using multiple primary-foreign key relations, a drawback in RDBMS. Applied to ecological data, it allows the discovery and inference of relationships using the interconnected network of taxonomic and spatial relationships. 
Its modular and scalable design makes it possible to run and distribute several instances simultaneously, allowing fast and efficient handling of big and complex ecological datasets. An example applied to the conservation of threatened species from the IUCN Red List using the co-occurrence of jaguars (Panthera onca) is included. This example demonstrates the engine{\textquoteright}s capabilities in performing basic taxonomic trees manipulation, analysis and visualization of taxonomic groups co-occurring in space.",

author = "{Escamilla Molgora}, {Juan Manuel} and Luigi Sedda and Peter Atkinson",

year = "2020",

month = may,

day = "11",

doi = "10.1093/gigascience/giaa039",

language = "English",

volume = "9",

pages = "1--25",

journal = "GigaScience",

issn = "2047-217X",

publisher = "Oxford University Press",

number = "5",

}

RIS

TY  - JOUR
T1  - Biospytial: spatial graph-based computing for ecological big data
AU  - Escamilla Molgora, Juan Manuel
AU  - Sedda, Luigi
AU  - Atkinson, Peter
PY  - 2020/5/11
Y1  - 2020/5/11
AB  - Biospytial is a modular open source knowledge engine designed to import, organise, analyse and visualise big spatial ecological datasets using the power of graph theory. Specifically, it handles species occurrences and their taxonomic classification for performing ecological analysis on biodiversity and species distributions. The engine uses a hybrid graph-relational approach to store and access information. The data are linked with relationships that are stored in a graph database, while tabular and geospatial (vector and raster) data are stored in a relational database management system (RDBMS). The graph data structure provides a scalable design that eases the problem of merging datasets from different sources. The linkage relationships use semantic structures (objects and predicates) to answer scientific questions represented as complex data structures stored in the graph database. In this sense, we used species occurrences, taxonomic classification, and climatic datasets to build a knowledge graph of the Tree of Life embedded in an environmental and geographical grid. Biospytial comprises three interconnected components: i) a Geospatial Processing unit (GPU) supported by a RDBMS with geoprocessing capabilities, ii) a Graph Storage and Querying Unit, and iii) a graph-relational package, called: The Biospytial Computing Engine (BCE) that integrates all the system’s components. It also includes tools like: interactive notebooks (Jupyter), graph analytic libraries (NetworkX) and statistical frameworks (PyMC3). The Biospytial approach reduces the complexity of joining datasets using multiple primary-foreign key relations, a drawback in RDBMS. Applied to ecological data, it allows the discovery and inference of relationships using the interconnected network of taxonomic and spatial relationships. Its modular and scalable design makes it possible to run and distribute several instances simultaneously, allowing fast and efficient handling of big and complex ecological datasets. An example applied to the conservation of threatened species from the IUCN Red List using the co-occurrence of jaguars (Panthera onca) is included. This example demonstrates the engine’s capabilities in performing basic taxonomic trees manipulation, analysis and visualization of taxonomic groups co-occurring in space.
U2  - 10.1093/gigascience/giaa039
DO  - 10.1093/gigascience/giaa039
M3  - Journal article
VL  - 9
SP  - 1
EP  - 25
JO  - GigaScience
JF  - GigaScience
SN  - 2047-217X
IS  - 5
M1  - giaa039
ER  -
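The abstract describes a hybrid design: linkage relationships (taxonomic parent links and species-to-grid-cell links) live in a graph, while tabular occurrence records stay in a relational store, and the worked example looks at taxa co-occurring with the jaguar. The following is a toy, standard-library-only sketch of that idea under stated assumptions; it is not Biospytial's API. An in-memory SQLite table stands in for the RDBMS, a plain Python adjacency dict stands in for the graph database, and the species records, grid cells, predicate names (`IS_A`, `OCCURS_IN`, `HAS_OCCURRENCE`) and all function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy records: (occurrence_id, species, grid_cell_id).
OCCURRENCES = [
    (1, "Panthera onca", "cell_A"),
    (2, "Puma concolor", "cell_A"),
    (3, "Tapirus bairdii", "cell_A"),
    (4, "Panthera onca", "cell_B"),
    (5, "Ateles geoffroyi", "cell_B"),
    (6, "Puma concolor", "cell_C"),
]

# Hypothetical child -> parent taxonomy (species -> genus -> family -> order).
TAXONOMY = {
    "Panthera onca": "Panthera", "Panthera": "Felidae",
    "Puma concolor": "Puma", "Puma": "Felidae", "Felidae": "Carnivora",
    "Tapirus bairdii": "Tapirus", "Tapirus": "Tapiridae",
    "Tapiridae": "Perissodactyla",
    "Ateles geoffroyi": "Ateles", "Ateles": "Atelidae", "Atelidae": "Primates",
}


def build_store(occurrences):
    """Relational side: tabular occurrence records in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE occurrence (id INTEGER PRIMARY KEY, species TEXT, cell TEXT)"
    )
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", occurrences)
    return conn


def build_graph(conn, taxonomy):
    """Graph side: (predicate, target) edges for taxonomy and species-cell links."""
    graph = defaultdict(set)
    for child, parent in taxonomy.items():
        graph[child].add(("IS_A", parent))
    for species, cell in conn.execute("SELECT DISTINCT species, cell FROM occurrence"):
        graph[species].add(("OCCURS_IN", cell))
        graph[cell].add(("HAS_OCCURRENCE", species))
    return graph


def neighbours(graph, node, predicate):
    """Targets reachable from `node` through edges labelled `predicate`."""
    return {target for pred, target in graph.get(node, ()) if pred == predicate}


def lineage(graph, taxon):
    """Follow IS_A edges from a taxon up to the root of its tree."""
    path = [taxon]
    parents = neighbours(graph, taxon, "IS_A")
    while parents:
        (parent,) = parents  # a tree gives each taxon exactly one parent
        path.append(parent)
        parents = neighbours(graph, parent, "IS_A")
    return path


def co_occurring_by_rank(graph, focal, level):
    """Group species sharing a grid cell with `focal` by their ancestor at `level`."""
    groups = defaultdict(set)
    for cell in neighbours(graph, focal, "OCCURS_IN"):
        for species in neighbours(graph, cell, "HAS_OCCURRENCE") - {focal}:
            path = lineage(graph, species)
            groups[path[min(level, len(path) - 1)]].add(species)
    return dict(groups)


def records_in_cells(conn, cells):
    """Relational lookup: fetch the full tabular rows for the selected cells."""
    if not cells:
        return []
    placeholders = ",".join("?" for _ in cells)
    query = (
        "SELECT id, species, cell FROM occurrence "
        f"WHERE cell IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(query, sorted(cells)).fetchall()


if __name__ == "__main__":
    conn = build_store(OCCURRENCES)
    graph = build_graph(conn, TAXONOMY)
    jaguar_cells = neighbours(graph, "Panthera onca", "OCCURS_IN")
    print(co_occurring_by_rank(graph, "Panthera onca", level=2))
    print(records_in_cells(conn, jaguar_cells))
```

In this sketch the graph answers "which cells and which taxa" by traversal, and SQL is used only to return the detailed rows for the cells the traversal selected, which loosely mirrors the abstract's point about avoiding chains of primary-foreign key joins. A real deployment would replace both stand-ins with a geoprocessing-capable RDBMS and a graph database.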