An identity crisis in the life sciences

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

Published
  • Jun Zhao
  • Carole Goble
  • Robert Stevens
Publication date: 2006
Host publication: Provenance and Annotation of Data: International Provenance and Annotation Workshop, IPAW 2006, Chicago, IL, USA, May 3-5, 2006, Revised Selected Papers
Editors: Luc Moreau, Ian Foster
Place of Publication: Berlin
Publisher: Springer
Pages: 254-269
Number of pages: 16
ISBN (electronic): 9783540463030
ISBN (print): 9783540463023
Original language: English

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 4145
ISSN (Print): 0302-9743

Abstract

myGrid is an e-Science project assisting life scientists to build workflows that gather data from distributed, autonomous, replicated and heterogeneous resources. The provenance logs of workflow executions are recorded as RDF graphs. The log of one workflow run is used to trace the history of its execution process. However, by aggregating provenance logs of many workflow runs, one may gather the provenance of a common data product shared in multiple derivation paths. A successful aggregation relies on accurate and universal identification of each data product. The nature of bioinformatics data and services, however, makes this difficult. We describe the identity problem in bioinformatics data, and present a protocol for managing identity co-references and allocating identity to gathered and computed data products. The ability to overcome this problem means that the provenance of workflows in bioinformatics and other domains can be exploited to enhance the practice of e-Science.