Brewster, Christopher; Jupp, Simon; Luciano, Joanne; Shotton, David; Stevens, Robert D; Zhang, Ziqi (2009)
Publisher: BioMed Central
Journal: BMC Bioinformatics
Languages: English
Types: Article
Subjects: Molecular Biology, Biochemistry, Computer Science Applications, Proceedings
Ontology construction for any domain is a labour-intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how much could be done in a simple and relatively rapid manner using a corpus of journal papers. We used a sequence of pre-existing text processing steps, and here describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a number of hierarchies. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus.

Results - Using mainly automated techniques, we were able to construct an 18,055-term ontology-like structure with 73% recall of animal behaviour terms, but a precision of only 26%. We were able to clean unwanted terms from the nascent ontology using lexico-syntactic patterns that tested the validity of term inclusion within the ontology. We used the same technique to test for subsumption relationships between the remaining terms, adding structure to the initially broad and shallow structure we generated. All outputs are available at http://thirlmere.aston.ac.uk/~kiffer/animalbehaviour/.

Conclusion - We present a systematic method for the initial steps of ontology or structured vocabulary construction for scientific domains that requires limited human effort and can make a contribution both to ontology learning and maintenance. The method is useful both for the exploration of a scientific domain and as a stepping stone towards formally rigorous ontologies. The filtering of recognised terms from a heterogeneous corpus, so as to focus upon those that are the topic of the ontology, is identified as one of the main challenges for research in ontology learning.
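The use of lexico-syntactic patterns to test subsumption between two terms can be illustrated with a minimal sketch. This is not the authors' implementation: the templates below are generic "such as" / "and other" style patterns chosen for illustration, and the function name, the naive plural handling, and the sample sentences are all assumptions.

```python
import re

# Hypothetical templates (an assumption, not taken from the paper).
# {hypo} is the candidate narrower term, {hyper} the candidate broader term.
TEMPLATES = [
    "{hyper} such as {hypo}",
    "{hypo} and other {hyper}",
    "{hypo} or other {hyper}",
    "{hyper},? including {hypo}",
]


def _term(term):
    """Escape a term and allow a naive plural 's' (a simplifying assumption)."""
    return re.escape(term) + "s?"


def count_subsumption_evidence(hypo, hyper, sentences):
    """Count sentences containing a template instance for 'hypo is a kind of hyper'."""
    regexes = [
        re.compile(
            r"\b" + t.format(hypo=_term(hypo), hyper=_term(hyper)) + r"\b",
            re.IGNORECASE,
        )
        for t in TEMPLATES
    ]
    return sum(1 for s in sentences if any(r.search(s) for r in regexes))


# Illustrative sentences, invented for this sketch.
SENTENCES = [
    "Behaviours such as grooming were recorded daily.",
    "Grooming and other behaviours were scored by two observers.",
    "The cage was cleaned weekly.",
]

if __name__ == "__main__":
    print(count_subsumption_evidence("grooming", "behaviour", SENTENCES))
    print(count_subsumption_evidence("behaviour", "grooming", SENTENCES))
```

A pair of terms with a non-zero count in one direction and zero in the other is a candidate parent-child link. In practice a threshold on the count, and a much larger text collection than three sentences, would be needed before trusting such evidence.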