
Linking DNA Metabarcoding and Text Mining to Create Network-Based Biomonitoring Tools: A Case Study on Boreal Wetland Macroinvertebrate Communities

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter

  • Zacchaeus G. Compson
  • Wendy A. Monk
  • Colin J. Curry
  • Dominique Gravel
  • Alex Bush
  • Christopher J.O. Baker
  • Mohammad Sadnan Al Manir
  • Alexandre Riazanov
  • Mehrdad Hajibabaei
  • Shadi Shokralla
  • Joel F. Gibson
  • Sonja Stefani
  • Michael T.G. Wright
  • Donald J. Baird
Publication date: 2018
Host publication: Advances in Ecological Research
Editors: David A. Bohan, Alex J. Dumbrell, Guy Woodward, Michelle Jackson
Number of pages: 42
ISBN (Print): 9780128143179
Original language: English

Publication series

Name: Advances in Ecological Research
ISSN (Print): 0065-2504


Ecological networks are powerful tools for visualizing biodiversity data and assessing ecosystem health and function. Constructing them requires considerable empirical effort and remains highly challenging because of sampling limitations and the laborious, error-prone and notoriously limited process of traditional taxonomic identification. Recent advances in high-throughput gene sequencing and high-performance computing offer new ways to address these challenges. DNA metabarcoding, a method of bulk taxonomic identification from DNA extracted from environmental samples, can generate detailed biodiversity information through a standardizable analytical pipeline for species detection. When this biodiversity information is annotated with prior knowledge of taxon interactions, body size, and trophic position, it becomes possible to generate trait-based networks, which we call “heuristic food webs”. Curating the trait matrices needed to construct heuristic food webs through manual literature surveys is laborious and often intractable, but text mining can greatly accelerate the process by gathering knowledge of relevant traits across large databases. To explore this possibility, we used the General Architecture for Text Engineering (GATE) system to create a hybrid text-mining pipeline combining rule-based and machine-learning modules. We then used this pipeline to query online repositories of published papers for missing data on a key trait, body size, that could not be obtained from existing trophic link libraries of freshwater benthic macroinvertebrates. Combining the text-mined body size information with feeding information from existing sources allowed us to build a database of over 20,000 pairwise trophic interactions. Next, we developed a pipeline that takes taxa lists generated by DNA metabarcoding and annotates them with trophic information from existing databases and text-mined body size data.
In this way, we generated heuristic food webs for wetland sites within the Peace–Athabasca delta, a large delta complex formed by the confluence of the Peace and Athabasca rivers in northern Alberta. Finally, we used these putative food webs and their network properties to resolve spatial and temporal differences between the benthic subwebs of wetlands in the Peace and Athabasca sectors of the delta complex. Specifically, we asked two questions: (1) How do food web properties (e.g. number of links, linkage density, trophic height) differ between the wetlands of the Peace and Athabasca deltas? (2) How do these properties change over time in the wetlands of the two deltas? We discuss the use of DNA-generated, trait-based food webs as a powerful tool for rapid bioassessment, assess the limitations of our current approach, and outline a path toward making this tool more widely available to land managers and conservation biologists.
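The general idea of annotating a taxa list with feeding and body size information and then summarizing the resulting web can be illustrated with a minimal Python sketch. It assumes a hypothetical set of feeding-group rules and a simple "consumers eat only smaller resources" body-size rule. The names `Taxon`, `DIET_RULES`, `heuristic_food_web` and `web_properties`, and every trait value, are illustrative placeholders rather than the authors' pipeline or data. The sketch computes two of the properties named in the abstract: the number of links L and linkage density L/S, where S is the number of taxa.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    name: str
    body_size_mm: float  # hypothetical text-mined body length; 0.0 for basal resources
    feeding_group: str   # hypothetical coarse feeding category


# Hypothetical trophic rules: which resource groups each feeding group may eat.
# Placeholder values for illustration only, not the link library used in the chapter.
DIET_RULES = {
    "grazer": {"basal"},
    "collector": {"basal"},
    "predator": {"grazer", "collector", "predator"},
}


def heuristic_food_web(taxa):
    """Return directed (consumer, resource) links inferred from feeding group and body size."""
    links = set()
    for consumer in taxa:
        allowed = DIET_RULES.get(consumer.feeding_group, set())
        for resource in taxa:
            if resource.name == consumer.name:
                continue  # this simple sketch ignores cannibalism
            # Assumed rule: a consumer eats an allowed resource group only if it is smaller.
            if resource.feeding_group in allowed and resource.body_size_mm < consumer.body_size_mm:
                links.add((consumer.name, resource.name))
    return links


def web_properties(taxa, links):
    """Number of links L and linkage density L/S for a web with S taxa."""
    s, l = len(taxa), len(links)
    return {"taxa": s, "links": l, "linkage_density": l / s if s else 0.0}


if __name__ == "__main__":
    # Hypothetical site taxa list, standing in for metabarcoding output annotated with traits.
    site_taxa = [
        Taxon("algae", 0.0, "basal"),
        Taxon("detritus", 0.0, "basal"),
        Taxon("grazer_A", 5.0, "grazer"),
        Taxon("collector_B", 8.0, "collector"),
        Taxon("predator_C", 12.0, "predator"),
        Taxon("predator_D", 30.0, "predator"),
    ]
    links = heuristic_food_web(site_taxa)
    print(web_properties(site_taxa, links))
```

For this made-up six-taxon site the script prints `{'taxa': 6, 'links': 9, 'linkage_density': 1.5}`. Keeping the rules as a plain lookup table separates the trophic knowledge (which, in the chapter, comes from link libraries and text mining) from the web-building step, so comparing sites or sampling dates only requires swapping in a different taxa list.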