OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster

Research output: Contribution to Journal/Magazine > Journal article > peer-review

Published

Standard

OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster. / Miles, Alistair; Zhao, Jun; Klyne, Graham et al.
In: Journal of Biomedical Informatics, Vol. 43, No. 5, Oct. 2010, pp. 752-761.

Harvard

Miles, A, Zhao, J, Klyne, G, White-Cooper, H & Shotton, D 2010, 'OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster', Journal of Biomedical Informatics, vol. 43, no. 5, pp. 752-761. https://doi.org/10.1016/j.jbi.2010.04.004

APA

Miles, A., Zhao, J., Klyne, G., White-Cooper, H., & Shotton, D. (2010). OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster. Journal of Biomedical Informatics, 43(5), 752-761. https://doi.org/10.1016/j.jbi.2010.04.004

Vancouver

Miles A, Zhao J, Klyne G, White-Cooper H, Shotton D. OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster. Journal of Biomedical Informatics. 2010 Oct;43(5):752-761. doi: 10.1016/j.jbi.2010.04.004

Author

Miles, Alistair; Zhao, Jun; Klyne, Graham et al. / OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster. In: Journal of Biomedical Informatics. 2010; Vol. 43, No. 5. pp. 752-761.

Bibtex

@article{f5a5ff3850504884b312a64dc4e99237,
title = "OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster",
abstract = "Motivation: Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavy weight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. Results: We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData{\textquoteright}s services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. 
The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. Availability: The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.",
keywords = "Chado, Data integration, Data web, Drosophila, Gene expression, Performance, RDF, SPARQL, Triple store, User interface",
author = "Alistair Miles and Jun Zhao and Graham Klyne and Helen White-Cooper and David Shotton",
year = "2010",
month = oct,
doi = "10.1016/j.jbi.2010.04.004",
language = "English",
volume = "43",
pages = "752--761",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "5",

}

RIS

TY - JOUR

T1 - OpenFlyData

T2 - an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster

AU - Miles, Alistair

AU - Zhao, Jun

AU - Klyne, Graham

AU - White-Cooper, Helen

AU - Shotton, David

PY - 2010/10

Y1 - 2010/10

N2 - Motivation: Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavy weight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. Results: We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData’s services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. 
In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. Availability: The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.

KW - Chado

KW - Data integration

KW - Data web

KW - Drosophila

KW - Gene expression

KW - Performance

KW - RDF

KW - SPARQL

KW - Triple store

KW - User interface

U2 - 10.1016/j.jbi.2010.04.004

DO - 10.1016/j.jbi.2010.04.004

M3 - Journal article

VL - 43

SP - 752

EP - 761

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - 5

ER -