Name: LINDAT/CLARIN repository
Type: Data Repository
Items: 33 Research Data
Compatibility: OpenAIRE Data (funded, referenced datasets)
OAI-PMH: http://lindat.mff.cuni.cz/repository/oai/openaire_data
More information: Detailed data provider information (Re3data)
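The OAI-PMH endpoint listed above can be harvested with standard OAI-PMH 2.0 requests. The following is a minimal Python sketch, not an official LINDAT or OpenAIRE client. It assumes the endpoint answers the ListRecords verb with the oai_dc metadata prefix. The OAI-PMH specification requires every repository to support oai_dc, but this endpoint may also expose other formats, which the ListMetadataFormats verb would list. The helper names (list_records_url, parse_page, harvest_identifiers) are hypothetical. The sketch pages through the results by following resumption tokens until an empty token marks the last page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as listed on this data provider page.
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/openaire_data"

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    In OAI-PMH 2.0, resumptionToken is an exclusive argument, so follow-up
    requests carry only the verb and the token, not metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_data):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_data)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken element means this is the last page.
    text = (token_el.text or "").strip() if token_el is not None else ""
    return ids, (text or None)


def harvest_identifiers(base_url=BASE_URL, metadata_prefix="oai_dc"):
    """Collect all record identifiers by following resumption tokens (needs network access)."""
    ids, token = [], None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        with urlopen(url) as resp:
            page_ids, token = parse_page(resp.read())
        ids.extend(page_ids)
        if token is None:
            return ids
```

Calling harvest_identifiers() would return the OAI identifiers of every record exposed through this set. Whether oai_dc is the most informative prefix for these datasets is an assumption. Passing a different prefix only changes the first request, because later pages are fetched by token alone.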

No data provider publications found.

    English-Hindi Parallel Corpus

    Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel; Jain, Gaurav; Damani, Om Prakesh (2010)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | EUROMATRIXPLUS (231720)
    Embargo end date: 2011/11/07
    English-Hindi parallel corpus collected from several sources. Tokenized and sentence-aligned. A part of the data is our patch for the Emille parallel corpus.

    MSTperl parser

    Rosa, Rudolf (2014)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | QTLEAP (610516)
    Embargo end date: 2014/04/07
    MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). The MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser: a tool that takes a sentence and returns its dependency tree. MSTperl implements only part of the original functionality; its limitations include the following: the parser is non-projective, currently with no possibility of enforcing the requirement of...

    Many Czech References for 50 Sentences Selected from WMT11 Data

    Bojar, Ondřej; Macháček, Matouš; Tamchyna, Aleš; Zeman, Daniel (2013)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | MOSESCORE (288487)
    Embargo end date: 2013/12/10
    This dataset contains a very large set of Czech translations for 50 English source sentences taken from the WMT11 test set (http://www.statmt.org/wmt11). In total, there are 15,431,447 Czech sentences, i.e. about 300k reference translations per English source sentence on average, although the exact number varies greatly across sentences. More details can be found in the included README file. If you use this dataset, please cite the following paper which describes the technique used to construc...

    Czech-Slovak Parallel Corpus

    Galuščáková, Petra; Garabík, Radovan; Bojar, Ondřej (2012)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | EUROMATRIXPLUS (231720)
    Embargo end date: 2012/05/15
    Czech-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2], Official Journal of the European Union [3] and part of the OPUS corpus [4]: EMEA, EUConst, KDE4 and PHP) and a downloaded website of the European Commission [5]. The corpus is published both in plaintext format and with automatic morphological annotation. References: [1] http://langtech.jrc.it/JRC-Acquis.html/ [2] http://www.statmt.org/europarl/ [3] http://apertium.eu/data [4] ht...

    Urdu Monolingual Corpus

    Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej (2014)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | MOSESCORE (288487)
    Embargo end date: 2014/03/27
    We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the choices suggested by each tagger. We run this ensemble on a large monolingual corpus and release both the plain and the tagged corpora.
