Genome signatures, self-organizing maps and higher order phylogenies: a parametric analysis

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Genome signatures, self-organizing maps and higher order phylogenies: a parametric analysis. / Gatherer, Derek.
In: Evolutionary Bioinformatics, Vol. 2007, No. 3, 17.09.2007, p. 211-236.

Author

Gatherer, Derek. / Genome signatures, self-organizing maps and higher order phylogenies : a parametric analysis. In: Evolutionary Bioinformatics. 2007 ; Vol. 2007, No. 3. pp. 211-236.

BibTeX

@article{22e093ef05ac48bcbca3ee7d78d28a9c,
title = "Genome signatures, self-organizing maps and higher order phylogenies: a parametric analysis",
abstract = "Genome signatures are data vectors derived from the compositional statistics of DNA. The self-organizing map (SOM) is a neural network method for the conceptualisation of relationships within complex data, such as genome signatures. The various parameters of the SOM training phase are investigated for their effect on the accuracy of the resulting output map. It is concluded that larger SOMs, as well as taking longer to train, are less sensitive in phylogenetic classification of unknown DNA sequences. However, where a classification can be made, a larger SOM is more accurate. Increasing the number of iterations in the training phase of the SOM only slightly increases accuracy, without improving sensitivity. The optimal length of the DNA sequence k-mer from which the genome signature should be derived is 4 or 5, but shorter values are almost as effective. In general, these results indicate that small, rapidly trained SOMs are generally as good as larger, longer trained ones for the analysis of genome signatures. These results may also be more generally applicable to the use of SOMs for other complex data sets, such as microarray data.",
keywords = "Genome Signature, Self-Organizing Map, Viruses, Phylogeny, Jack-Knife Method, Microarray, Metagenomics, Herpesvirus, CHAOS GAME REPRESENTATION, LARGE PROTEIN DATABASES, ART. NO. 23, BACTERIAL GENOMES, GENE-EXPRESSION, MICROARRAY DATA, BREAST-CANCER, DNA, CLASSIFICATION, SEQUENCES",
author = "Derek Gatherer",
year = "2007",
month = sep,
day = "17",
language = "English",
volume = "2007",
pages = "211--236",
journal = "Evolutionary Bioinformatics",
issn = "1176-9343",
publisher = "Libertas Academica Ltd.",
number = "3",

}

RIS

TY  - JOUR
T1  - Genome signatures, self-organizing maps and higher order phylogenies: a parametric analysis
AU  - Gatherer, Derek
PY  - 2007/9/17
Y1  - 2007/9/17
AB  - Genome signatures are data vectors derived from the compositional statistics of DNA. The self-organizing map (SOM) is a neural network method for the conceptualisation of relationships within complex data, such as genome signatures. The various parameters of the SOM training phase are investigated for their effect on the accuracy of the resulting output map. It is concluded that larger SOMs, as well as taking longer to train, are less sensitive in phylogenetic classification of unknown DNA sequences. However, where a classification can be made, a larger SOM is more accurate. Increasing the number of iterations in the training phase of the SOM only slightly increases accuracy, without improving sensitivity. The optimal length of the DNA sequence k-mer from which the genome signature should be derived is 4 or 5, but shorter values are almost as effective. In general, these results indicate that small, rapidly trained SOMs are generally as good as larger, longer trained ones for the analysis of genome signatures. These results may also be more generally applicable to the use of SOMs for other complex data sets, such as microarray data.
KW  - Genome Signature
KW  - Self-Organizing Map
KW  - Viruses
KW  - Phylogeny
KW  - Jack-Knife Method
KW  - Microarray
KW  - Metagenomics
KW  - Herpesvirus
KW  - CHAOS GAME REPRESENTATION
KW  - LARGE PROTEIN DATABASES
KW  - ART. NO. 23
KW  - BACTERIAL GENOMES
KW  - GENE-EXPRESSION
KW  - MICROARRAY DATA
KW  - BREAST-CANCER
KW  - DNA
KW  - CLASSIFICATION
KW  - SEQUENCES
M3  - Journal article
VL  - 2007
SP  - 211
EP  - 236
JO  - Evolutionary Bioinformatics
JF  - Evolutionary Bioinformatics
SN  - 1176-9343
IS  - 3
ER  -
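The abstract describes genome signatures as vectors of DNA compositional statistics and names the SOM parameters the study varies: map size, number of training iterations and k-mer length. The Python sketch below is a minimal illustration of those three parameters under stated assumptions, not the paper's method. It assumes the signature is the plain normalized k-mer frequency vector (the paper may use a different normalization, such as dinucleotide relative abundance), and it uses a generic online Kohonen SOM with a Gaussian neighbourhood and exponentially decaying learning rate and radius (the paper's training schedule is not given in this record). The function names `genome_signature`, `best_matching_unit` and `train_som`, and all default values, are hypothetical.

```python
import math
import random
from itertools import product


def genome_signature(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of each k-mer over {A, C, G, T}, in lexicographic order.

    Assumption (hypothetical): plain normalized k-mer counts; windows that
    contain any other symbol (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:  # skip windows with ambiguous bases
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def best_matching_unit(weights: list[list[float]], x: list[float]) -> int:
    """Index of the map node whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows, cols, iterations, lr0=0.5, seed=0):
    """Train a rows x cols online Kohonen SOM on the vectors in `data`.

    Assumption (hypothetical schedule): random initial weights, Gaussian
    neighbourhood, learning rate and radius both decaying exponentially.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per map node, stored row-major.
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2
    # Time constant chosen so the radius shrinks from radius0 to about 1.
    tau = iterations / math.log(radius0) if radius0 > 1 else iterations
    for t in range(iterations):
        x = data[rng.randrange(len(data))]  # one randomly drawn training vector
        br, bc = divmod(best_matching_unit(weights, x), cols)
        radius = radius0 * math.exp(-t / tau)
        lr = lr0 * math.exp(-t / iterations)
        for n, w in enumerate(weights):
            r, c = divmod(n, cols)
            # Nodes near the best-matching unit on the grid move furthest.
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
            for j in range(dim):
                w[j] += lr * h * (x[j] - w[j])
    return weights
```

For example, `train_som([genome_signature(s, k=4) for s in seqs], rows=5, cols=5, iterations=1000)` trains a 5 x 5 map on hypothetical input sequences `seqs`, and an unknown sequence is then placed at the node returned by `best_matching_unit(weights, genome_signature(unknown, k=4))`. The map size, iteration count and k in this example are arbitrary illustrations of the parameters studied, not the settings recommended in the paper.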