Data Flows in Genomic and Environmental Science

Project: Non-funded Project (Research)


The aim of the theme is to re-describe how data moves in NGS (Next Generation Sequencing) genomic sciences and in ENS (Embedded Networked Sensing) environmental sciences. In the life sciences, NGS and ENS epitomise very different data ‘topographies’. The mini-theme will map the trajectories that data follow from collection (NGS instruments, embedded network sensors) through analysis, storage, visualisations and models to publications. We have identified two related points of analytical focus: replication and durability. In the NGS setting, replication is a key problem. At the recent BBSRC workshop ‘Challenges of Visualizing Biological Data’ (Bristol, November 2010), Reinhard Schneider (EMBL-Heidelberg) pointed to the very low replicability rate of genomic data (~2%). In the ENS setting, durability is a key problem. Recent work at CENS (UCLA) and elsewhere indicates that obtaining high-volume environmental data over time requires careful planning to ensure that environmental and computer scientists keep an ongoing stake in maintaining the sensor network.
One key reference point for e-Science is what we are calling ‘data metrology’. Across both NGS and ENS, there are many general measurements of data size and quantity. In practice, these are a poor guide to the problems of doing e-Science. There are good reasons to develop better metrologies of data, and metrics that indicate where the value in data comes from. However, realistic metrics may be hard to establish precisely because data topographies keep evolving.