Languages: English
Types: Doctoral thesis
Protein threading, also referred to as fold recognition, aligns a probe amino acid sequence onto a library of representative folds of known structure in order to identify structural similarity. Following the structural profile approach to threading, this research developed and evaluated a new framework, Mixed Environment Specific Substitution Mapping (MESSM), for protein threading with artificial neural networks (ANNs) and support vector machines (SVMs). MESSM offers a new process for building an efficient protein fold recognition tool: it improves efficiency while retaining prediction effectiveness. MESSM has three key components, each of which is a step in the threading framework. First, building the fold profile library: given a protein structure with a residue-level environmental description, neural networks are used to generate an environment-specific amino acid substitution (3D-1D) mapping. Second, mixed substitution mapping: a mixed environment-specific substitution mapping is developed by combining the structure-derived substitution score with a sequence profile from well-developed amino acid substitution matrices. Third, confidence evaluation: a support vector machine is employed to measure the significance of the sequence-structure alignment. Four computational experiments were carried out to verify the performance of MESSM, using the Fischer, ProSup, Lindahl and Wallner benchmarks. On the Fischer, Lindahl and Wallner benchmarks, MESSM achieved fold recognition performance comparable to that of energy-potential-based threading models. On the Fischer benchmark, MESSM correctly recognised 56 out of 68 pairs, the same as COBLATH and SPARKS. The experiments also show that MESSM is fast: it can align a probe sequence (150 amino acids) against a profile of 4775 template proteins in 30 seconds on a Pentium IV PC with 1 GB of memory. On the ProSup benchmark, MESSM achieved an alignment accuracy of 59.7%, which is better than that of current models. The research was then extended to develop a threading score following the contact potential approach to threading: a TES (Threading with Environment-specific Score) model was constructed using neural networks.
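To make the second component more concrete, the following is a minimal sketch of how a mixed score could drive a sequence-structure alignment. It is not the thesis implementation. The abstract does not say how the structure-derived score and the sequence-matrix score are combined, so the weighted sum, the `weight` and `gap` parameters, the function names, the global (Needleman-Wunsch style) alignment and the lookup tables are all assumptions made for illustration. In MESSM the structure-derived scores come from a trained neural network; here they are a plain dictionary.

```python
from typing import Dict, List, Tuple

Table = Dict[Tuple[str, str], float]


def mixed_score(struct_score: float, seq_score: float, weight: float = 0.5) -> float:
    """Combine a structure-derived (3D-1D) score with a sequence-matrix score.

    The weighted sum is an assumption; the abstract does not give the formula.
    """
    return weight * struct_score + (1.0 - weight) * seq_score


def align_score(probe: str,
                template_envs: List[str],
                template_seq: str,
                struct_table: Table,
                seq_table: Table,
                gap: float = -1.0,
                weight: float = 0.5) -> float:
    """Global alignment score of a probe sequence against one template.

    template_envs[j] is a (hypothetical) environment label for template
    position j, and template_seq[j] is the residue at that position.
    Pairs missing from either table score 0.0.
    """
    n, m = len(probe), len(template_envs)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps are penalised linearly.
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = mixed_score(
                struct_table.get((template_envs[j - 1], probe[i - 1]), 0.0),
                seq_table.get((template_seq[j - 1], probe[i - 1]), 0.0),
                weight,
            )
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align residue to position
                           dp[i - 1][j] + gap,     # gap in the template
                           dp[i][j - 1] + gap)     # gap in the probe
    return dp[n][m]


# Toy data, invented purely for illustration.
toy_struct = {("buried", "A"): 2.0, ("exposed", "G"): 1.0}
toy_seq = {("A", "A"): 4.0, ("G", "G"): 6.0}
toy_result = align_score("AG", ["buried", "exposed"], "AG", toy_struct, toy_seq)
```

In a fold recognition setting, a score like this would be computed for the probe against every template in the library, and the ranked scores would then be passed to a separate significance step, which is the role the abstract assigns to the support vector machine.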
  • BioEntity    Site Name
    1bfd         Protein Data Bank
    1bp2         Protein Data Bank
    1cbh         Protein Data Bank
    1lh1         Protein Data Bank
    1nre         Protein Data Bank
    1p2p         Protein Data Bank
    1ppt         Protein Data Bank
    1rei         Protein Data Bank
    2cdv         Protein Data Bank
    2ci2         Protein Data Bank
    2cro         Protein Data Bank
    2cyp         Protein Data Bank
    2tmn         Protein Data Bank
    5pad         Protein Data Bank
