CLARIN.SI repository
Data Repository
27 Research Data
OpenAIRE Data (funded, referenced datasets)
More information
Detailed data provider information (Re3data)


  • No data provider publications found
  • Emoji Sentiment Ranking 1.0

    Kralj Novak, Petra; Smailović, Jasmina; Sluban, Borut; Mozetič, Igor (2015)
    Publisher: Jožef Stefan Institute
    Projects: EC | SIMPOL (610704)
    Embargo end date: 2016/04/14
    A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The ranking process and its analysis are described in the paper: Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296

    Serbian-English parallel corpus srenWaC 1.0

    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio (2016)
    Publisher: Jožef Stefan Institute
    Projects: EC | ABU-MATRAN (324414)
    Embargo end date: 2016/03/09
    The srenWaC corpus version 1.0 consists of parallel Serbian-English texts crawled from the .rs top-level domain for Serbia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bitext extraction. Given the evaluation results on other languages, the accuracy of the extracted bitext can be estimated at 74% at the segment level and 76% at the word level.

    Inflectional lexicon hrLex 1.2

    Ljubešić, Nikola; Klubička, Filip; Boras, Damir (2016)
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Projects: EC | ABU-MATRAN (324414)
    Embargo end date: 2016/09/19
    hrLex is a large inflectional lexicon of the Croatian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The frequencies of the (wordform, lemma, MSD) triples are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the MULTEXT-East V5 tagset for Croatian, available at http://nl.ijs.si/ME/V5/msd/html/msd-hr.html.

    xLiMe Twitter Corpus XTC 1.0.1

    Rei, Luis; Krek, Simon; Mladenić, Dunja (2016)
    Publisher: Jožef Stefan Institute
    Projects: EC | XLIME (611346)
    Embargo end date: 2016/11/28
    The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish, manually annotated with part-of-speech tags, named entities, and message-level sentiment polarity. In total, the corpus contains almost 20K annotated messages and 350K tokens. The corpus is described in Luis Rei, Dunja Mladenić, Simon Krek. A Multilingual Social Media Linguistic Corpus. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities. 27–28 September 2016, Ljubljana, Slovenia. http:/...

    MULTEXT-East free lexicons 4.0

    Erjavec, Tomaž; Bruda, Ştefan; Derzhanski, Ivan; Dimitrova, Ludmila; Garabík, Radovan; Holozan, Peter; Ide, Nancy; Kaalep, Heiki-Jaan; Kotsyba, Natalia; Oravecz, Csaba; Petkevič, Vladimír; Priest-Dorman, Greg; Shevchenko, Igor; Simov, Kiril; Sinapova, Lydia;... (2010)
    Publisher: Jožef Stefan Institute
    Projects: EC | MONDILEX (211938)
    Embargo end date: 2015/06/15
    The MULTEXT-East morphosyntactic lexicons have a simple structure, where each line is a lexical entry with three tab-separated fields: (1) the word-form, the inflected form of the word; (2) the lemma, the base-form of the word; (3) the MSD, the morphosyntactic description of the word-form, i.e., its fine-grained PoS tag, as defined in the MULTEXT-East morphosyntactic specifications. This submission contains the freely available MULTEXT-East lexicons, while a separate submission (http://hd...
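The three-field line structure described for the MULTEXT-East lexicons could be read along the following lines. This is only a minimal sketch: the names `LexEntry` and `parse_lexicon` are hypothetical, the sample lines are made up for illustration and are not taken from the lexicons, and skipping blank lines is an assumption rather than something the description states.

```python
from typing import Iterable, Iterator, NamedTuple


class LexEntry(NamedTuple):
    """One lexical entry (hypothetical helper type, not part of the resource)."""
    wordform: str  # the inflected form of the word
    lemma: str     # the base form of the word
    msd: str       # morphosyntactic description (fine-grained PoS tag)


def parse_lexicon(lines: Iterable[str]) -> Iterator[LexEntry]:
    """Yield one LexEntry per line holding three tab-separated fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines carry no entry
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(fields)}"
            )
        yield LexEntry(*fields)


# Made-up sample lines for illustration only.
sample = ["walks\twalk\tVmip3s\n", "walk\twalk\tNcns\n"]
entries = list(parse_lexicon(sample))
for entry in entries:
    print(entry.wordform, entry.lemma, entry.msd)
```

With a real lexicon file, an open file handle (for example `open(path, encoding="utf-8")`, where `path` is whatever file was downloaded) can be passed in place of `sample`, since the function accepts any iterable of lines.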