
Phylogenetic differences in content and intensity of periodic proteins

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Journal publication date: 04/2005
Journal: Journal of Molecular Evolution
Issue number: 4
Number of pages: 15
Pages (from-to): 447-461
Publication status: Published
Original language: English


Many proteins exhibit sequence periodicity, often correlated with a visible structural periodicity. The statistical significance of such periodicity can be assessed with a chi-squared-based test, using significance thresholds calculated from shuffled sequences. A comparison of the complete proteomes of 45 species reveals striking differences both in the proportion of periodic proteins and in the intensity of the most significant periodicities. Eukaryotes tend to have a higher proportion of periodic proteins than eubacteria, which in turn tend to have more than archaea. The intensity of periodicity in the most periodic proteins is also greatest in eukaryotes. By contrast, the relatively small group of periodic proteins in archaea also tends to be only weakly periodic compared with those of eukaryotes and eubacteria. Exceptions to this general rule are found in prokaryotes with multicellular life-cycle phases, e.g., Methanosarcina sp. or Anabaena sp., which have more periodicities than prokaryotes in general, and in unicellular eukaryotes, which have fewer than multicellular eukaryotes. In eukaryotes, significantly periodic proteins are distributed over a wide range of period lengths, whereas prokaryotic proteins typically show a more limited set of period lengths. This difference is investigated further by repeating the analysis on the NRL-3D database of proteins of solved structure. Some short-range periodicities can be explained by basic secondary structure, e.g., alpha helices, while middle-range periodicities frequently correspond to known short Pfam domains, e.g., leucine-rich repeats, tetratricopeptide repeats or armadillo domains. However, not all periodicities can be explained in this way.
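The abstract does not give the exact test statistic, so the following is only a hedged sketch of one plausible formulation of "a chi-squared-based test with thresholds from shuffled sequences", not the authors' method. It assumes that, for a candidate period p, each residue is assigned to a phase (its position mod p) and a chi-squared statistic measures the association between phase and residue type. Shuffling the sequence preserves its composition, so the empirical null distribution of the statistic accounts for amino-acid bias. The function names, the 200 shuffles, the 99th-percentile cut-off and the ratio used to pick a "most significant" period are all hypothetical choices.

```python
import random
from collections import Counter


def periodicity_chi2(seq: str, period: int) -> float:
    """Chi-squared statistic for phase (position mod period) vs. residue type.

    Hypothetical formulation: a period x residue contingency table.
    """
    if not 2 <= period <= len(seq):
        raise ValueError("period must be between 2 and len(seq)")
    # One Counter per phase: counts of each residue at positions i % period.
    table = [Counter() for _ in range(period)]
    for i, aa in enumerate(seq):
        table[i % period][aa] += 1
    residues = sorted(set(seq))
    n = len(seq)
    row_totals = [sum(c.values()) for c in table]
    col_totals = {aa: sum(c[aa] for c in table) for aa in residues}
    chi2 = 0.0
    for r, counts in enumerate(table):
        for aa in residues:
            expected = row_totals[r] * col_totals[aa] / n
            if expected > 0:
                chi2 += (counts[aa] - expected) ** 2 / expected
    return chi2


def shuffled_threshold(seq: str, period: int, n_shuffles: int = 200,
                       quantile: float = 0.99, rng=None) -> float:
    """Empirical quantile of the statistic over composition-preserving shuffles."""
    rng = rng or random.Random(0)  # fixed seed so results are reproducible
    residues = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null.append(periodicity_chi2("".join(residues), period))
    null.sort()
    idx = min(int(quantile * n_shuffles), n_shuffles - 1)
    return null[idx]


def is_periodic(seq: str, period: int, **kwargs) -> bool:
    """True if the observed statistic exceeds the shuffled-sequence threshold."""
    return periodicity_chi2(seq, period) > shuffled_threshold(seq, period, **kwargs)


def best_period(seq: str, periods, **kwargs):
    """Return (period, score); score = observed / threshold (hypothetical choice)."""
    best = None
    for p in periods:
        thr = shuffled_threshold(seq, p, **kwargs)
        score = periodicity_chi2(seq, p) / thr if thr > 0 else 0.0
        if best is None or score > best[1]:
            best = (p, score)
    return best


if __name__ == "__main__":
    repeat = "ACDEFGH" * 10  # toy sequence with an exact 7-residue repeat
    print(is_periodic(repeat, 7))                # expected: True
    print(best_period(repeat, [2, 3, 5, 7])[0])  # expected: 7
```

In this toy example every phase of the period-7 repeat holds a single residue type, so the statistic reaches its maximum (n × (7 − 1) = 420 for 70 residues), far above any shuffled value. A real proteome scan would also need to correct for testing many periods per protein, which this sketch does not attempt.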