Submitted manuscript, 541 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article
}
TY - JOUR
T1 - Tempus et Locus: a tool for extracting precisely dated viral sequences from GenBank, and its application to the phylogenetics of primate erythroparvovirus 1 (B19V)
AU - Carter, Alice R.
AU - Gatherer, Derek
PY - 2016/7/4
Y1 - 2016/7/4
AB - The presence of data in the collection_date field of a GenBank sequence record is of great assistance in the use of that sequence for Bayesian phylogenetics using tip-dating. We present Tempus et Locus (TeL), a tool for extracting such sequences from a GenBank-formatted sequence database. TeL shows that 60% of viral sequences in GenBank have collection date fields, but that this varies considerably between species. Primate erythroparvovirus 1 (human parvovirus B19 or B19V) has only 40% of its sequences dated, of which only 112 are of more than 4 kb. 100 of these are from B19V sub-genotype 1a and were collected from a mere 6 studies conducted in 5 countries between 2002 and 2013. Nevertheless, Bayesian phylogenetic analysis of this limited set gives a date for the common ancestor of sub-genotype 1a in 1990 (95% HPD 1981-1996) which is in reasonable agreement with estimates of previous studies where collection dates have been assembled by more laborious methods of literature search and direct enquiries to sequence submitters. We conclude that although collection dates should become standard for all future GenBank submissions of virus sequences, accurate dating of ancestors is possible with even a small number of sequences if sampling information is high quality.
KW - primate erythroparvovirus
KW - parvovirus B19
KW - Parvoviridae
KW - phylogenetics
KW - Tempus et Locus
KW - TeL
KW - virus
KW - evolution
KW - bioinformatics
U2 - 10.1101/061697
DO - 10.1101/061697
M3 - Journal article
VL - 2016
JO - bioRxiv
JF - bioRxiv
ER -