Search filters

Projects

QT21 (53)
HimL (35)
CLARA (28)
FAUST (18)
KConnect (15)
T4ME NET (11)
CSET CNGL: Next Gen... (1)

Publication Year

2012 (48)
2016 (45)
2014 (37)
2015 (34)
2017 (30)
2011 (28)
2013 (26)
2010 (25)
2009 (16)
2018 (2)
1962 (1)

Document Language

Undetermined (265)
English (27)

292 documents, page 1 of 30

Representing Layered and Structured Data in the CoNLL-ST Format

Štěpánek, Jan; Straňák, Pavel (2010)
Projects: EC | EUROMATRIXPLUS (231720)
In this paper, we investigate the CoNLL Shared Task format, its properties, and the possibility of using it for complex annotations. We argue that, perhaps despite the original intent, it is one of the most important current formats for syntactically annotated data. We show the limits of the CoNLL-ST data format in its current form and propose several simple enhancements that push those limits further and make the format more robust and future-proof. We analyse several different linguistic ...

Khresmoi Summary Translation Test Data 1.1

Pecina, Pavel; Tamchyna, Aleš; Urešová, Zdeňka; Hlaváčová, Jaroslava; Hajič, Jan; Dušek, Ondřej (2014)
Projects: EC | KHRESMOI (257528)
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.

Selecting Data for English-to-Czech Machine Translation

Bojar, Ondřej; Tamchyna, Aleš; Kamran, Amir; Galuščáková, Petra; Stanojević, Miloš (2012)
Projects: EC | EUROMATRIXPLUS (231720)
We provide a few insights into data selection for machine translation. We evaluate the quality of the new CzEng 1.0, a parallel data source used in WMT12. We describe a simple technique for reducing the out-of-vocabulary rate after phrase extraction. We discuss the benefits of tuning towards multiple reference translations for the English-Czech language pair. We introduce a novel approach to data selection by full-text indexing and search: we select sentences similar to th...

CUNI System for WMT17 Automatic Post-Editing Task

Variš, Dušan; Bojar, Ondřej (2017)
Projects: EC | HimL (644402), EC | QT21 (645452)
Following up on last year's CUNI system for automatic post-editing of machine translation output, we focus on exploiting the potential of sequence-to-sequence neural models for this task. In this system description paper, we compare several encoder-decoder architectures on smaller-scale models and, based on this preliminary comparison, present the system we submitted to the WMT 2017 Automatic Post-Editing shared task. We also show how the simple inclusion of synthetic data can improve the overa...


Bojar, Ondřej (2017)
Projects: EC | QT21 (645452)
A summary of the development of neural MT at Charles University.

Improving a Neural-based Tagger for Multiword Expression Identification

Variš, Dušan; Klyueva, Natalia (2018)
Projects: EC | T4ME NET (249119)
In this paper, we present a set of improvements introduced to MUMULS, a tagger for the automatic detection of verbal multiword expressions. Our tagger participated in the PARSEME shared task, where it was the only one based on neural networks. We show that character-level embeddings can improve performance, mainly by reducing the out-of-vocabulary rate. Furthermore, replacing the softmax layer in the decoder with a conditional random field classifier brings additional improvements. Finally, we...

Incorporation of a valency lexicon into a TectoMT pipeline

Kuboň, Vladislav; Klyueva, Natalia (2016)
Projects: EC | QTLEAP (610516)
In this paper, we focus on the incorporation of a valency lexicon into the TectoMT system for the Czech-Russian language pair. We demonstrate valency errors in MT output and describe how the introduction of a lexicon influenced the translation results. Although there was no impact on the BLEU score, manual inspection of concrete cases showed some improvement.

Producing Unseen Morphological Variants in Statistical Machine Translation

Tamchyna, Aleš; Huck, Matthias; Bojar, Ondřej; Fraser, Alexander (2017)
Projects: EC | QT21 (645452), EC | HimL (644402)
Translating into morphologically rich languages is difficult. Although the coverage of lemmas may be reasonable, many morphological variants cannot be learned from the training data. We present a statistical translation system that is able to produce these inflected word forms. Unlike most previous work, we do not separate morphological prediction from lexical choice into two consecutive steps. Our approach is novel in that it is integrated in decoding and takes advanta...

A Summary of Research Activities: Technologies – Demands – Gaps – Roadmaps

Hajič, Jan (2015)
Projects: EC | CRACKER (645357)
A summary of research activities in the area of machine translation in the EU (Technologies – Demands – Gaps – Roadmaps) has been presented, including contributions from multiple other EU-funded projects.

Maximum Entropy Translation Model in Dependency-Based MT Framework

Popel, Martin; Žabokrtský, Zdeněk; Mareček, David (2010)
Projects: EC | FAUST (247762), EC | EUROMATRIXPLUS (231720)
The maximum entropy principle has been used successfully in various NLP tasks. In this paper, we propose a forward translation model consisting of a set of maximum entropy classifiers: a separate classifier is trained for each (sufficiently frequent) source-side lemma. In this way, the estimates of translation probabilities can be sensitive to a large number of features derived from the source sentence (including non-local features, features making use of sentence s...