Publication Year

2012 (8)
2016 (8)
2017 (7)
2014 (5)
2015 (5)
2013 (3)
2011 (2)
2010 (1)

Type

Dataset (31)
Software (8)

39 research data, page 1 of 4

WMT17 Quality Estimation Shared Task Training and Development Data

Specia, Lucia; Logacheva, Varvara (2017)
Publisher: University of Sheffield
Projects: EC | QT21 (645452)
Embargo end date: 2017/02/27
Training and development data for the WMT17 QE task. Test data will be published as a separate item. This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-editions by professional translators. The data will be ...

Depfix: Automatic Post-editing of SMT

Rosa, Rudolf (2015)
Publisher: Charles University in Prague, UFAL
Projects: EC | QTLEAP (610516)
Embargo end date: 2015/01/29
Depfix, a tool for Automatic Post-editing of SMT. See the project website for more information.

Additional German-Czech reference translations of the WMT'11 test set

Three additional Czech reference translations of the whole WMT 2011 data set (http://www.statmt.org/wmt11/test.tgz), translated from the German originals. The original segmentation of the WMT 2011 data is preserved.

WMT16 Tuning Shared Task Models (English-to-Czech)

Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos (2016)
Publisher: Charles University in Prague, UFAL
Projects: EC | QT21 (645452)
Embargo end date: 2016/03/22
This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. The CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre) is used for training the translation models. The data is tokenized (using the Moses tokenizer) and lowercased, and sentences longer than 60 words or shorter than 4 words are removed before training. Alignment is done using fast_align (https://github.com/clab/fast_align), and the standard Moses pipeline is used for training. Two 5-gram l...
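
The preprocessing described in this entry (lowercasing, then dropping sentences outside the 4 to 60 word range) can be illustrated with a minimal sketch. It is not the authors' pipeline: whitespace splitting stands in for the Moses tokenizer, the function names are hypothetical, and dropping a sentence pair when either side is out of range is an assumption.

```python
from typing import Iterable, List, Tuple


def normalize(sentence: str) -> List[str]:
    """Lowercase and split on whitespace (a stand-in for the Moses tokenizer)."""
    return sentence.lower().split()


def within_limits(tokens: List[str], min_len: int = 4, max_len: int = 60) -> bool:
    """True if the sentence is neither shorter than min_len nor longer than max_len."""
    return min_len <= len(tokens) <= max_len


def preprocess(pairs: Iterable[Tuple[str, str]],
               min_len: int = 4,
               max_len: int = 60) -> List[Tuple[str, str]]:
    """Lowercase both sides and keep only pairs whose two sides pass the length filter.

    Assumption: a pair is dropped if EITHER side violates the limits.
    """
    kept = []
    for source, target in pairs:
        src_tokens = normalize(source)
        tgt_tokens = normalize(target)
        if within_limits(src_tokens, min_len, max_len) and \
           within_limits(tgt_tokens, min_len, max_len):
            kept.append((" ".join(src_tokens), " ".join(tgt_tokens)))
    return kept
```

Calling `preprocess` on a list of (English, Czech) string pairs returns the lowercased pairs that survive the filter; a real setup would replace `normalize` with an actual tokenizer.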

Prague Czech-English Dependency Treebank 2.0 Coref

Nedoluzhko, Anna; Novák, Michal; Cinková, Silvie; Mikulová, Marie; Mírovský, Jiří (2016)
Publisher: Charles University in Prague, UFAL
Projects: EC | QTLEAP (610516)
Embargo end date: 2016/03/30
The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended manual annotation of coreference, as well as with an improved automatic annotation of the coreferential expression alignment.

Urdu Monolingual Corpus

Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej (2014)
Publisher: Charles University in Prague, UFAL
Projects: EC | MOSESCORE (288487)
Embargo end date: 2014/03/27
We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the different choices suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release both the plain and tagged corpora.
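
The entry does not say which voting scheme is used, so the following is only a sketch of one simple possibility: per-token majority voting over the outputs of several taggers, falling back to a designated tagger when no tag wins a strict majority. The function name, the fallback rule, and the tag labels are hypothetical placeholders, not the actual Urdu tagset or the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tag_sequences: Sequence[Sequence[str]],
                  fallback_tagger: int = 0) -> List[str]:
    """Combine per-token tags from several taggers by strict majority.

    tag_sequences holds one tag list per tagger, all for the same sentence.
    If no tag is chosen by more than half of the taggers, the tag from
    tag_sequences[fallback_tagger] is used (an assumed tie-break rule).
    """
    if len({len(seq) for seq in tag_sequences}) != 1:
        raise ValueError("all taggers must tag the same number of tokens")

    voted = []
    for candidates in zip(*tag_sequences):
        tag, count = Counter(candidates).most_common(1)[0]
        if count * 2 > len(candidates):
            voted.append(tag)
        else:
            voted.append(candidates[fallback_tagger])
    return voted
```

With three taggers, a token gets the tag that at least two of them agree on; if all three disagree, the fallback tagger decides.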

QTLeap WSD/NED corpus

Agirre, Eneko; Branco, António; Popel, Martin; Simov, Kiril (2015)
Publisher: University of the Basque Country, UPV/EHU
Projects: EC | QTLEAP (610516)
Embargo end date: 2015/05/15
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgarian, Czech, English, Portuguese and Spanish. The texts have been automatically annotated with NLP tools, including Word Sense Disambiguation, Named Entity Disambiguation and Coreference resolution. Please check deliverab...

Hindi Web Texts

Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel (2011)
Publisher: Charles University in Prague, UFAL
Projects: EC | EUROMATRIXPLUS (231720)
Embargo end date: 2011/11/23
A Hindi corpus of texts downloaded mostly from news sites. It contains both the original raw texts and an extensively cleaned-up and tokenized version suitable for language modeling. Size: 18M sentences, 308M tokens.

Moses Web Demo

Bojar, Ondřej; Cífka, Ondřej; Pecina, Pavel; Tamchyna, Aleš (2014)
Publisher: Charles University in Prague, UFAL
Projects: EC | MOSESCORE (288487)
Embargo end date: 2014/10/23
An interactive web demo of selected ÚFAL MT systems.

WMT17 Quality Estimation Shared Test Data

Specia, Lucia; Logacheva, Varvara (2017)
Publisher: University of Sheffield
Projects: EC | QT21 (645452)
Embargo end date: 2017/04/13
Test data for the WMT17 QE task. Training data can be downloaded from http://hdl.handle.net/11372/LRT-1974. This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-editions by professional translators. The data will ...