37 research data, page 1 of 4

WMT16 Tuning Shared Task Models (English-to-Czech)

Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos (2016)
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Projects: EC | QT21 (645452)
Embargo end date: 2016/03/22
This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. The CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre) is used to train the translation models. The data is tokenized (using the Moses tokenizer) and lowercased, and sentences longer than 60 words or shorter than 4 words are removed before training. Alignment is done using fast_align (https://github.com/clab/fast_align) and the standard Moses pipeline is used for training. Two 5-gram l...

Khresmoi Query Translation Test Data 1.0

Pecina, Pavel; Dušek, Ondřej; Hajič, Jan; Urešová, Zdeňka (2013)
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Projects: EC | KHRESMOI (257528)
Embargo end date: 2014/04/02
This package contains data sets for development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and medical experts.

Prague Czech-English Dependency Treebank 2.0 Coref

Nedoluzhko, Anna; Novák, Michal; Cinková, Silvie; Mikulová, Marie; Mírovský, Jiří (2016)
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Projects: EC | QTLEAP (610516)
Embargo end date: 2016/03/30
The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended manual annotation of coreference, as well as with an improved automatic annotation of the coreferential expression alignment.

Test Data DE-EN APE Shared Task WMT17

Turchi, Marco; Chatterjee, Rajen; Negri, Matteo (2017)
Publisher: Fondazione Bruno Kessler, Trento, Italy
Projects: EC | QT21 (645452)
Embargo end date: 2017/04/11
Test data for the WMT 2017 Automatic post-editing task (the same used for the Sentence-level Quality Estimation task). It consists of German-English pairs (source and target) belonging to the pharmacological domain and already tokenized. The test set contains 2,000 pairs. All data is provided by the EU project QT21 (http://www.qt21.eu/).

Many Czech References for 50 Sentences Selected from WMT11 Data

Bojar, Ondřej; Macháček, Matouš; Tamchyna, Aleš; Zeman, Daniel (2013)
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Projects: EC | MOSESCORE (288487)
Embargo end date: 2013/12/10
This dataset contains the full set of Czech reference translations for 50 English source sentences from the WMT11 test set (http://www.statmt.org/wmt11). In total, there are 15431447 Czech sentences, i.e. about 300k reference translations per English source sentence on average, but the exact number varies greatly across sentences. More details are in the included README file. If you use this dataset, please cite the following paper which describes the technique used to construc...

Khresmoi Summary Translation Test Data 2.0

Dušek, Ondřej; Hajič, Jan; Hlaváčová, Jaroslava; Libovický, Jindřich; Pecina, Pavel; Tamchyna, Aleš; Urešová, Zdeňka (2017)
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Projects: EC | KHRESMOI (257528)
Embargo end date: 2017/04/03
This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.

WMT16 APE Shared Task Data

Turchi, Marco; Chatterjee, Rajen; Negri, Matteo (2016)
Publisher: Fondazione Bruno Kessler, Trento, Italy
Projects: EC | QT21 (645452)
Embargo end date: 2016/02/21
Training, development and test data (the same used for the Sentence-level Quality Estimation task) consist of English-German triplets (source, target and post-edit) belonging to the IT domain and already tokenized. Training and development respectively contain 12,000 and 1,000 triplets, while the test set contains 2,000 instances. All data is provided by the EU project QT21 (http://www.qt21.eu/).

WMT18 APE Shared Task: En-DE NMT Train and Dev Data

Turchi, Marco; Negri, Matteo; Chatterjee, Rajen (2018)
Publisher: Fondazione Bruno Kessler, Trento, Italy
Projects: EC | QT21 (645452)
Embargo end date: 2018/02/12
Training and development data for the WMT 2018 Automatic post-editing task. They consist of English-German triplets (source, target and post-edit) belonging to the information technology domain and already tokenized. Training and development respectively contain 13,442 and 1,000 triplets. A neural machine translation system has been used to generate the target segments. All data is provided by the EU project QT21 (http://www.qt21.eu/).

Khresmoi Query Translation Test Data 2.0

Pecina, Pavel; Dušek, Ondřej; Hajič, Jan; Libovický, Jindřich; Urešová, Zdeňka (2017)
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Projects: EC | KHRESMOI (257528)
Embargo end date: 2017/04/03
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. The queries come from the general public and medical experts. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.

WMT 2011 Testing Set

Galuščáková, Petra; Bojar, Ondřej (2012)
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Projects: EC | EUROMATRIXPLUS (231720)
Embargo end date: 2012/05/15
Test set from the WMT 2011 [1] competition, manually translated from Czech and English into Slovak. The test set contains 3003 sentences in Czech, Slovak and English and is described in [2]. References: [1] http://www.statmt.org/wmt11/evaluation-task.html [2] Petra Galuščáková and Ondřej Bojar. Improving SMT by Using Parallel Data of a Closely Related Language. In Human Language Technologies - The Baltic Perspective - Proceedings of the Fifth International Conference Baltic HLT 20...