Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences

Research output: Contribution to Journal/Magazine, Journal article, peer-review

Published

Standard

Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences. / Gatherer, Derek.
In: Bioinformatics and Biology Insights, Vol. 2007, No. 1, 12.12.2007, p. 101-126.

Vancouver

Gatherer D. Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences. Bioinformatics and Biology Insights. 2007 Dec 12;2007(1):101-126.

Author

Gatherer, Derek. / Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences. In: Bioinformatics and Biology Insights. 2007 ; Vol. 2007, No. 1. pp. 101-126.

Bibtex

@article{dc158323d1e248c48e5faa6079188cda,
title = "Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences",
abstract = "A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%-70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.",
author = "Derek Gatherer",
year = "2007",
month = dec,
day = "12",
language = "English",
volume = "2007",
pages = "101--126",
journal = "Bioinformatics and Biology Insights",
publisher = "Libertas Academica Ltd.",
number = "1",
}

RIS

TY - JOUR

T1 - Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences

AU - Gatherer, Derek

PY - 2007/12/12

Y1 - 2007/12/12

AB - A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%-70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.

M3 - Journal article

C2 - 20066129

VL - 2007

SP - 101

EP - 126

JO - Bioinformatics and Biology Insights

JF - Bioinformatics and Biology Insights

IS - 1

ER -