CLARIN.SI repository
Data Repository
26 Research Data
OpenAIRE Data (funded, referenced datasets)
More information
Detailed data provider information (Re3data)


  • No data provider publications found
  • Post-edited and error annotated machine translation corpus PE²rr 1.0

    Popović, Maja; Arčan, Mihael (2016)
    Publisher: Insight Centre for Data Analytics, National University of Ireland, Galway
    Projects: EC | TraMOOC (644333)
    Embargo end date: 2016/05/29
    The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main advantage of the corpus is the fusion of post-editing and error classification tasks, which have usually been seen as two independent tasks, although naturally they are not. The corpus is further described in: Maja Popović...

    Dataset of European Parliament roll-call votes and Twitter activities MEP 1.0

    Cherepnalkoski, Darko; Karpf, Andreas; Mozetič, Igor; Grčar, Miha (2016)
    Publisher: Jožef Stefan Institute
    Projects: EC | DOLFINS (640772)
    Embargo end date: 2016/08/05
    The resource consists of two datasets related to Members of the 8th European Parliament (MEPs). The first one is a dataset of 2,535 roll-call votes of MEPs until 2016-03-01. The second one is a dataset of 26,133 retweets between MEPs in the period between 2014-10-01 and 2016-03-01. The data can be used to examine the patterns of covoting and retweeting of MEPs and analyze the extent to which they are similar. The resource is presented and used in the paper: Darko Cherepnalkoski, Andrea...

    Inflectional lexicon srLex 1.2

    Ljubešić, Nikola; Klubička, Filip; Boras, Damir (2016)
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Projects: EC | ABU-MATRAN (324414)
    Embargo end date: 2016/09/19
    srLex is a large inflectional lexicon of the Serbian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The frequencies of the (wordform, lemma, MSD) triples are calculated on the srWaC v1.2 corpus. The MSD tagset follows the MULTEXT-East V5 tagset for Bosnian available at http://nl.ijs.si/ME/V5/msd/html/msd-bs.html.

    MULTEXT-East "1984" document corpus 4.0

    Erjavec, Tomaž; Bruda, Ştefan; Dimitrova, Ludmila; Ide, Nancy; Kaalep, Heiki-Jaan; Krstev, Cvetana; Orav, Heili; Oravecz, Csaba; Paldre, Leho; Petkevič, Vladimír; Priest-Dorman, Greg; Simov, Kiril; Sinapova, Lydia; Sokolovsky, Paul; Sryvkin, Sergey;... (2010)
    Publisher: Jožef Stefan Institute
    Projects: EC | MONDILEX (211938)
    Embargo end date: 2015/06/15
    The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words in length) and its translations into a number of languages. This version of the corpus contains structurally annotated texts only, with elements such as paragraphs, footnotes, and highlighted text. In terms of linguistic annotations, the texts contain names and sentences. The lin...

    Emoji Sentiment Ranking 1.0

    Kralj Novak, Petra; Smailović, Jasmina; Sluban, Borut; Mozetič, Igor (2015)
    Publisher: Jožef Stefan Institute
    Projects: EC | SIMPOL (610704)
    Embargo end date: 2016/04/14
    A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ranking is described in the paper: Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296
  • Latest Documents Timeline [chart]

    Document Types [chart]

    Projects with most Research Data [chart]
