CLARIN.SI repository
Data Repository
26 Research Data
OpenAIRE Data (funded, referenced datasets)


  • No data provider publications found
  • Emoji Sentiment Ranking 1.0

    Kralj Novak, Petra; Smailović, Jasmina; Sluban, Borut; Mozetič, Igor (2015)
    Publisher: Jožef Stefan Institute
    Projects: EC | SIMPOL (610704)
    Embargo end date: 2016/04/14
    A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ranking is described in the paper: Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296
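The description does not spell out the scoring formula. As a rough illustration only, a ranking could be computed as below. The sketch assumes that each tweet carries one label (negative, neutral or positive, mapped to -1, 0 and +1) and that an emoji's score is the mean label over all of its occurrences. The function name and label strings are hypothetical. The actual procedure, including any smoothing of the estimates, is the one described in the cited paper. For simplicity, the sketch only matches emojis that are a single Unicode code point.

```python
from collections import defaultdict

# Assumed label encoding (hypothetical): negative = -1, neutral = 0, positive = +1.
LABEL_VALUE = {"negative": -1, "neutral": 0, "positive": 1}


def emoji_sentiment(tweets, emojis):
    """Return {emoji: (occurrences, mean sentiment)} for emojis seen in `tweets`.

    `tweets` is an iterable of (text, label) pairs. Every occurrence of an
    emoji contributes the label value of the tweet it appears in. Only
    single-code-point emojis are matched (a simplifying assumption).
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for text, label in tweets:
        value = LABEL_VALUE[label]
        for ch in text:
            if ch in emojis:
                counts[ch] += 1
                totals[ch] += value
    # Mean label per emoji, together with how often it occurred.
    return {e: (counts[e], totals[e] / counts[e]) for e in counts}
```

With a handful of hand-labeled toy tweets, an emoji that appears mostly in positive tweets gets a score closer to +1. The occurrence count is kept alongside the score because means based on few occurrences are unreliable.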

    Serbian-English parallel corpus srenWaC 1.0

    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio (2016)
    Publisher: Jožef Stefan Institute
    Projects: EC | ABU-MATRAN (324414)
    Embargo end date: 2016/03/09
    The srenWaC corpus version 1.0 consists of parallel Serbian-English texts crawled from the .rs top-level domain for Serbia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together SpiderLing, used for crawling, and Bitextor, used for bitext extraction. Based on evaluation results for other languages, the accuracy of the extracted bitext can be estimated at 74% on the segment level and 76% on the word level.

    Inflectional lexicon hrLex 1.2

    Ljubešić, Nikola; Klubička, Filip; Boras, Damir (2016)
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Projects: EC | ABU-MATRAN (324414)
    Embargo end date: 2016/09/19
    hrLex is a large inflectional lexicon of the Croatian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple, where MSD is the morphosyntactic description. The frequencies of the (wordform, lemma, MSD) triples are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the MULTEXT-East V5 tagset for Croatian, available at http://nl.ijs.si/ME/V5/msd/html/msd-hr.html.
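The following is a minimal sketch of reading hrLex-style entries into the 5-tuple described above. It assumes one entry per line with five tab-separated columns in the listed order, but this page does not confirm the file layout. The `LexEntry`, `parse_line` and `forms_by_lemma` names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class LexEntry(NamedTuple):
    wordform: str
    lemma: str
    msd: str        # MULTEXT-East V5 morphosyntactic description, e.g. "Ncfsn"
    freq: int       # absolute frequency of the (wordform, lemma, MSD) triple
    freq_pm: float  # frequency per million corpus tokens


def parse_line(line: str) -> LexEntry:
    # Assumption: five tab-separated columns in the order given above.
    wordform, lemma, msd, freq, freq_pm = line.rstrip("\n").split("\t")
    return LexEntry(wordform, lemma, msd, int(freq), float(freq_pm))


def forms_by_lemma(lines):
    """Group entries by lemma, most frequent wordform first; blank lines are skipped."""
    index = defaultdict(list)
    for line in lines:
        if line.strip():
            entry = parse_line(line)
            index[entry.lemma].append(entry)
    for entries in index.values():
        entries.sort(key=lambda e: e.freq, reverse=True)
    return dict(index)
```

For example, given toy lines for the lemma "kuća" (the frequencies are made up, not taken from hrLex), `forms_by_lemma` returns that lemma's wordforms with their MSD tags, ordered by descending frequency.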

    xLiMe Twitter Corpus XTC 1.0.1

    Rei, Luis; Krek, Simon; Mladenić, Dunja (2016)
    Publisher: Jožef Stefan Institute
    Projects: EC | XLIME (611346)
    Embargo end date: 2016/11/28
    The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish manually annotated with part-of-speech, named entities, and message-level sentiment polarity. In total, the corpus contains almost 20K annotated messages and 350K tokens. The corpus is described in Luis Rei, Dunja Mladenić, Simon Krek. A Multilingual Social Media Linguistic Corpus. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities. 27–28 September 2016, Ljubljana, Slovenia. http:/...

    Dataset of European Parliament roll-call votes and Twitter activities MEP 1.0

    Cherepnalkoski, Darko; Karpf, Andreas; Mozetič, Igor; Grčar, Miha (2016)
    Publisher: Jožef Stefan Institute
    Projects: EC | DOLFINS (640772)
    Embargo end date: 2016/08/05
    The resource consists of two datasets related to Members of the 8th European Parliament (MEPs). The first is a dataset of 2,535 roll-call votes of MEPs up to 2016-03-01. The second is a dataset of 26,133 retweets between MEPs in the period from 2014-10-01 to 2016-03-01. The data can be used to examine patterns of co-voting and retweeting among MEPs and to analyze the extent to which they are similar. The resource is presented and used in the paper: Darko Cherepnalkoski, Andrea...
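The following sketch shows one simple way to put co-voting and retweeting side by side. It makes two assumptions about the data layout: each MEP's votes are a mapping from a vote id to a choice, and retweets are (retweeter, author) pairs. Agreement here is plain percentage agreement over the votes both MEPs cast. This is a hypothetical stand-in and not necessarily the measure used in the paper. All names are illustrative.

```python
from collections import Counter
from itertools import combinations


def agreement(votes_a, votes_b):
    """Share of roll-call votes cast by both MEPs on which they chose the same option.

    Each argument maps a (hypothetical) vote id to a choice such as
    "for", "against" or "abstain". Returns None if no votes are shared.
    """
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return None
    same = sum(votes_a[v] == votes_b[v] for v in shared)
    return same / len(shared)


def pair_table(votes, retweets):
    """Map each MEP pair to (co-voting agreement, number of retweets between them).

    `votes` maps MEP id -> {vote id: choice}; `retweets` is an iterable of
    (retweeter, author) pairs, counted regardless of direction. Self-retweets
    are ignored.
    """
    rt = Counter(frozenset(pair) for pair in retweets if pair[0] != pair[1])
    table = {}
    for a, b in combinations(sorted(votes), 2):
        table[(a, b)] = (agreement(votes[a], votes[b]), rt[frozenset((a, b))])
    return table
```

Pairs with high agreement and many retweets, or high agreement and none, are the ones such a comparison would surface. A proper analysis would use a chance-corrected agreement measure rather than raw percentages.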
