DNA sequence data are currently viewed as a ‘bedrock’ or ‘backbone’ of modern biological science. This article traces DNA sequence data produced by so-called ‘next generation sequencing’ (NGS) platforms as they move into a biological data infrastructure called the Sequence Read Archive (SRA). Since 2007, the SRA has been the leading repository for NGS-produced nucleotide (DNA and RNA) sequences. The way sequence data move into the SRA, we suggest, is symptomatic of a decisive shift towards post-archival genomics. This term refers to the increasing importance of the logistics, rather than the biology, of sequence data. In the SRA, logistical concerns with the bulk movement of sequence data partly supplant the emphasis in earlier genomic and biological databases on contextualising particular sequences and cross-linking different forms of biological data. At the same time, post-archival logistics do not necessarily flatten genomic research into global homogeneity. Rather, the SRA provides evidence of an increasingly polymorphous flow of sequence data, deriving from an expansion and diversification of sequencing techniques and instruments. The patterns of data movement in and around the SRA suggest that sequence data are proliferating in various overlapping and sometimes disparate forms. By mapping differences in content across the SRA, by tracking patterns of absence or ‘missingness’ in metadata, and by following how changes in file formats highlight uncertainties in the definitions of seemingly obvious DNA-related artefacts such as a sequencer ‘run’, we highlight the growing lability of nucleotide sequence data. The movements of data in the SRA attest to a decisive mutation in sequences, from biological bedrock to an increasingly expandable material whose epistemic and technological value remains open to reinvention.
This is a post-peer-review, pre-copyedit version of an article published in BioSocieties. The definitive publisher-authenticated version, ‘Post-archival genomics and the bulk logistics of DNA sequences’ by Adrian Mackenzie, Ruth McNally, Richard Mills and Stuart Sharples, is available online at: http://www.palgrave-journals.com/biosoc/journal/v11/n1/full/biosoc201522a.html