39 research data, page 1 of 4

WMT16 Quality Estimation Shared Task Training and Development Data

Specia, Lucia; Logacheva, Varvara; Scarton, Carolina (2016)
Publisher: University of Sheffield
Projects: EC | QT21 (645452)
Embargo end date: 2016/02/29
Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its previous four editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence and word-level tasks will explore a large dataset produced from post-editions by professional translat...

WMT 2011 Testing Set

Galuščáková, Petra; Bojar, Ondřej (2012)
Publisher: Charles University in Prague, UFAL
Projects: EC | EUROMATRIXPLUS (231720)
Embargo end date: 2012/05/15
Testing set from WMT 2011 [1] competition, manually translated from Czech and English into Slovak. Test set contains 3003 sentences in Czech, Slovak and English. Test set is described in [2]. References: [1] http://www.statmt.org/wmt11/evaluation-task.html [2] Petra Galuščáková and Ondřej Bojar. Improving SMT by Using Parallel Data of a Closely Related Language. In Human Language Technologies - The Baltic Perspective - Proceedings of the Fifth International Conference Baltic HLT 20...

Prague Czech-English Dependency Treebank 2.0

Texts: The Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) is a major update of the Prague Czech-English Dependency Treebank 1.0 (LDC2004T25). It is a manually parsed Czech-English parallel corpus sized over 1.2 million running words in almost 50,000 sentences for each part. Data: The English part contains the entire Penn Treebank - Wall Street Journal Section (LDC99T42). The Czech part consists of Czech translations of all of the Penn Treebank-WSJ texts. The corpus is 1:1 ...

Manually Classified Errors in Cs->Sk Translation

Galuščáková, Petra; Bojar, Ondřej (2012)
Publisher: Charles University in Prague, UFAL
Projects: EC | EUROMATRIXPLUS (231720)
Embargo end date: 2012/05/15
Manual classification of errors of Czech-Slovak translation according to the classification introduced by Vilar et al. [1]. First 50 sentences from WMT 2010 test set were translated by 5 MT systems (Česílko, Česílko2, Google Translate and two Moses setups) and MT errors were manually marked and classified. Classification was applied in MT systems comparison [3]. Reference translation is included. References: [1] David Vilar, Jia Xu, Luis Fernando D’Haro and Hermann Ney. Error Analysi...

WMT16 Tuning Shared Task Models (Czech-to-English)

Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos (2016)
Publisher: Charles University in Prague, UFAL
Projects: EC | QT21 (645452)
Embargo end date: 2016/03/22
The item contains models to tune for the WMT16 Tuning shared task for Czech-to-English. CzEng 1.6pre (http://ufal.mff.cuni.cz/czeng/czeng16pre) corpus is used for the training of the translation models. The data is tokenized (using Moses tokenizer), lowercased and sentences longer than 60 words and shorter than 4 words are removed before training. Alignment is done using fast_align (https://github.com/clab/fast_align) and the standard Moses pipeline is used for training. Two 5-gram la...

Czech-Slovak Parallel Corpus

Galuščáková, Petra; Garabík, Radovan; Bojar, Ondřej (2012)
Publisher: Charles University in Prague, UFAL
Projects: EC | EUROMATRIXPLUS (231720)
Embargo end date: 2012/05/15
Czech-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2], Official Journal of the European Union [3] and part of OPUS corpus [4] – EMEA, EUConst, KDE4 and PHP) and downloaded website of European Commission [5]. Corpus is published both in plaintext format and with an automatic morphological annotation. References: [1] http://langtech.jrc.it/JRC-Acquis.html/ [2] http://www.statmt.org/europarl/ [3] http://apertium.eu/data [4] ht...

WMT17 Quality Estimation Shared Task Training and Development Data

Specia, Lucia; Logacheva, Varvara (2017)
Publisher: University of Sheffield
Projects: EC | QT21 (645452)
Embargo end date: 2017/02/27
Training and development data for the WMT17 QE task. Test data will be published as a separate item. This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-editions by professional translators. The data will be ...

MSTperl parser (2015-05-19)

Rosa, Rudolf (2015)
Publisher: Charles University in Prague, UFAL
Projects: EC | FAUST (247762)
Embargo end date: 2015/05/19
MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser -- a tool that takes a sentence and returns its dependency tree. In MSTperl, only some functionality was implemented; the limitations include the following: the parser is a non-projective one, currently with no possibility of enforcing the requirement of...

WMT16 Tuning Shared Task Models (English-to-Czech)

Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos (2016)
Publisher: Charles University in Prague, UFAL
Projects: EC | QT21 (645452)
Embargo end date: 2016/03/22
This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. CzEng 1.6pre (http://ufal.mff.cuni.cz/czeng/czeng16pre) corpus is used for the training of the translation models. The data is tokenized (using Moses tokenizer), lowercased and sentences longer than 60 words and shorter than 4 words are removed before training. Alignment is done using fast_align (https://github.com/clab/fast_align) and the standard Moses pipeline is used for training. Two 5-gram l...

Khresmoi Summary Translation Test Data 1.1

Dušek, Ondřej; Hajič, Jan; Hlaváčová, Jaroslava; Pecina, Pavel; Tamchyna, Aleš; Urešová, Zdeňka (2014)
Publisher: Charles University in Prague, UFAL
Projects: EC | KHRESMOI (257528)
Embargo end date: 2014/04/28
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.