Iñurrieta, Uxoa; Díaz de Ilarraza, Arantza; Labaka, Gorka; Sarasola, Kepa; Aduriz, Itziar; Carroll, John (2016)
Publisher: International Committee on Computational Linguistics (ICCL)
Languages: English
Types: Other
Subjects: QA75
We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method that uses this information to improve VNC identification. First, a sample of frequent VNCs is analysed in depth and tagged along lexico-semantic and morphosyntactic dimensions, with satisfactory inter-annotator agreement scores. Then, a VNC identification experiment is undertaken in which the analysed linguistic data is combined with chunking information and syntactic dependencies. A comparison between the results of this experiment and those obtained by a basic detection method shows that linguistic information can greatly improve VNC identification: a large number of additional occurrences are detected with high precision.
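The contrast between a basic detection method and dependency-informed identification can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon, the `Token` structure, the function names, and the Universal Dependencies-style labels (`obj`, `nsubj:pass`) are all assumptions made for illustration, and the chunking information and lexico-semantic tags mentioned in the abstract are left out.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    lemma: str
    pos: str     # coarse part of speech, e.g. "VERB", "NOUN"
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # dependency label (assumed UD-style naming)


# Hypothetical lexicon of known combinations: (verb lemma, noun lemma).
VNC_LEXICON = {("take", "decision"), ("make", "mistake")}


def baseline_detect(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Assumed baseline: the verb lemma must be directly followed by the noun lemma."""
    hits = []
    for first, second in zip(tokens, tokens[1:]):
        if (first.lemma, second.lemma) in VNC_LEXICON:
            hits.append((first.idx, second.idx))
    return hits


def dependency_detect(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Match a noun attached to its verb as object or passive subject, at any distance."""
    by_idx = {t.idx: t for t in tokens}
    hits = []
    for tok in tokens:
        if tok.pos != "NOUN" or tok.deprel not in {"obj", "nsubj:pass"}:
            continue
        head = by_idx.get(tok.head)
        if head and head.pos == "VERB" and (head.lemma, tok.lemma) in VNC_LEXICON:
            hits.append((head.idx, tok.idx))
    return hits


# Hand-annotated toy parse of "She took a very difficult decision".
SENTENCE = [
    Token(1, "she", "PRON", 2, "nsubj"),
    Token(2, "take", "VERB", 0, "root"),
    Token(3, "a", "DET", 6, "det"),
    Token(4, "very", "ADV", 5, "advmod"),
    Token(5, "difficult", "ADJ", 6, "amod"),
    Token(6, "decision", "NOUN", 2, "obj"),
]

print(baseline_detect(SENTENCE))    # the verb and noun are not adjacent
print(dependency_detect(SENTENCE))  # the verb and noun are linked by an "obj" arc
```

In this toy sentence the adjacency baseline finds nothing because a determiner and a modified adjective separate the verb from the noun, whereas the dependency-based check recovers the pair. That is the kind of additional occurrence the abstract refers to, though the real experiment relies on richer information than this sketch does.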