Name: LINDAT/CLARIN repository
Type: Data Repository
Items: 39 Research Data
Compatibility: OpenAIRE Data (funded, referenced datasets)
OAI-PMH: http://lindat.mff.cuni.cz/repository/oai/openaire_data
More information: Detailed data provider information (Re3data)
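The OAI-PMH endpoint listed above can be queried with standard protocol verbs such as `Identify` and `ListRecords`. The following is a minimal sketch that only builds request URLs and performs no network access. The helper name `oai_request_url` is hypothetical, and whether the endpoint is still reachable at this address is an assumption. `oai_dc` is the metadata prefix that the OAI-PMH specification requires every repository to support.

```python
from urllib.parse import urlencode

# Endpoint as listed on this page; it may have moved since (assumption).
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/openaire_data"


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb (hypothetical helper)."""
    query = {"verb": verb}
    # Skip parameters explicitly passed as None.
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)


# Describe the repository.
identify_url = oai_request_url(BASE_URL, "Identify")

# List records in Dublin Core, the mandatory OAI-PMH metadata format.
list_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_url)
```

The resulting URLs can be fetched with any HTTP client; the responses are XML documents defined by the OAI-PMH schema.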

 

    Hindi Web Texts

    Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel (2011)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | EUROMATRIXPLUS (231720)
    Embargo end date: 2011/11/23
    A Hindi corpus of texts downloaded mostly from news sites. It contains both the original raw texts and an extensively cleaned-up and tokenized version suitable for language modeling. 18M sentences, 308M tokens.

    Eye-Tracking Recordings from a Pilot Study of WMT-style MT Outputs Ranking

    Bojar, Ondřej; Děchtěrenko, Filip; Zelenina, Maria (2016)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | QT21 (645452)
    Embargo end date: 2016/04/01
    This package contains the eye-tracker recordings of 8 subjects evaluating English-to-Czech machine translation quality using the WMT-style ranking of sentences. We provide the set of sentences evaluated, the exact screens presented to the annotators (including bounding box information for every area of interest and even for individual letters in the text) and finally the raw EyeLink II files with gaze trajectories. The description of the experiment can be found in the paper: Ondře...

    WMT16 Tuning Shared Task Models (English-to-Czech)

    Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos (2016)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | QT21 (645452)
    Embargo end date: 2016/03/22
    This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. The CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre) is used for training the translation models. The data is tokenized (using the Moses tokenizer) and lowercased, and sentences longer than 60 words or shorter than 4 words are removed before training. Alignment is done using fast_align (https://github.com/clab/fast_align) and the standard Moses pipeline is used for training. Two 5-gram l...

    Khresmoi Summary Translation Test Data 1.1

    Dušek, Ondřej; Hajič, Jan; Hlaváčová, Jaroslava; Pecina, Pavel; Tamchyna, Aleš; Urešová, Zdeňka (2014)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | KHRESMOI (257528)
    Embargo end date: 2014/04/28
    This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.

    WMT 2011 Testing Set

    Galuščáková, Petra; Bojar, Ondřej (2012)
    Publisher: Charles University in Prague, UFAL
    Projects: EC | EUROMATRIXPLUS (231720)
    Embargo end date: 2012/05/15
    Test set from the WMT 2011 [1] competition, manually translated from Czech and English into Slovak. The test set contains 3003 sentences in Czech, Slovak and English and is described in [2]. References: [1] http://www.statmt.org/wmt11/evaluation-task.html [2] Petra Galuščáková and Ondřej Bojar. Improving SMT by Using Parallel Data of a Closely Related Language. In Human Language Technologies - The Baltic Perspective - Proceedings of the Fifth International Conference Baltic HLT 20...
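
The WMT16 Tuning Shared Task Models entry above describes lowercasing the data and removing sentences longer than 60 words or shorter than 4 words before training. A rough sketch of that filtering step follows. It is not the authors' pipeline: whitespace splitting stands in for the Moses tokenizer, and the function name `filter_sentences` is hypothetical.

```python
def filter_sentences(sentences, min_len=4, max_len=60):
    """Lowercase each sentence and keep those with min_len..max_len tokens.

    Assumption: whitespace splitting is used here as a stand-in for
    the Moses tokenizer named in the dataset description.
    """
    kept = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Drop sentences shorter than min_len or longer than max_len words.
        if min_len <= len(tokens) <= max_len:
            kept.append(" ".join(tokens))
    return kept


sample = [
    "Too short",
    "This sentence has Five words",
    " ".join(["word"] * 61),  # 61 tokens, above the limit
]
filtered = filter_sentences(sample)
print(filtered)
```

Only the second sample sentence survives, since the first has 2 tokens and the third has 61.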