Polajnar, T.; Rogers, S.; Girolami, M. (2009)
Publisher: Springer Berlin / Heidelberg
Languages: English
Types: Article
Subjects: QA75

Classified by OpenAIRE into

ACM Ref: ComputingMethodologies_PATTERNRECOGNITION
The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text, the Gaussian process (GP). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework, make GPs an appealing alternative worthy of further adoption.
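To make the idea of kernel-based sentence classification with a GP concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not state their features, kernel, or inference scheme, so this sketch assumes whitespace-tokenised bag-of-words features, a cosine kernel, and the simple "label regression" shortcut (GP regression on +1/-1 targets) in place of a proper probabilistic classification likelihood. The example sentences, the protein names, and the `noise` value are all made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased token counts (assumed features, not the paper's)."""
    return Counter(sentence.lower().split())


def cosine_kernel(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_scores(train, labels, test, noise=0.1):
    """GP posterior mean k_*^T (K + noise*I)^{-1} y for each test sentence.

    Labels are +1 (interaction sentence) or -1 (other); the sign of the
    returned score is the predicted class.
    """
    feats = [bag_of_words(s) for s in train]
    n = len(feats)
    K = [[cosine_kernel(feats[i], feats[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [float(y) for y in labels])
    scores = []
    for s in test:
        f = bag_of_words(s)
        k_star = [cosine_kernel(f, g) for g in feats]
        scores.append(sum(k * a for k, a in zip(k_star, alpha)))
    return scores


# Toy, invented data: two interaction sentences and two unrelated ones.
train = [
    "ProtA binds ProtB in vitro",
    "ProtC interacts with ProtD",
    "Samples were stored at room temperature",
    "The study was approved by the committee",
]
labels = [1, 1, -1, -1]
test = ["ProtE interacts with ProtF", "Samples were approved by the committee"]

scores = gp_scores(train, labels, test)
predictions = [1 if s > 0 else -1 for s in scores]
print(predictions)
```

Note that the only quantity fixed by hand here is the noise level; there is no SVM-style margin parameter to tune, which is the practical advantage the abstract points to. A full GP classifier would also return a calibrated class probability rather than a raw score.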