Search filters

Refine by

Project


EUROMATRIXPLUS (77)
QT21 (54)
QTLEAP (47)
KHRESMOI (36)
HimL (36)
MOSESCORE (29)
CLARA (28)
FAUST (18)
KConnect (15)
T4ME NET (11)
CRACKER (7)
DASMT (2)
ABU-MATRAN (1)
EUDAT (1)
PANACEA (1)
SUMMA (1)
CSET CNGL: Next Gen... (1)

Publication Year


2012 (48)
2016 (44)
2014 (37)
2015 (35)
2017 (32)
2011 (28)
2013 (26)
2010 (24)
2009 (17)
2018 (2)

Document Language

Undetermined (272)
English (21)
293 documents, page 1 of 30

Representing Layered and Structured Data in the CoNLL-ST Format

Štěpánek, Jan; Straňák, Pavel (2010)
Projects: EC | EUROMATRIXPLUS (231720)
In this paper, we investigate the CoNLL Shared Task format, its properties, and the possibility of using it for complex annotations. We argue that, perhaps despite the original intent, it is one of the most important current formats for syntactically annotated data. We show the limits of the CoNLL-ST data format in its current form and propose several simple enhancements that push those limits further and make the format more robust and future-proof. We analyse several different linguistic ...

HamleDT - HArmonized Multi-LanguagE Dependency Treebank

Ramasamy, Loganathan; Hajič, Jan; Žabokrtský, Zdeněk; Štěpánek, Jan; Popel, Martin; Zeman, Daniel; Mareček, David (2011)
Projects: EC | EUROMATRIXPLUS (231720), EC | T4ME NET (249119)
HamleDT is a set of software tools for converting (harmonizing) treebanks, which were created for different languages and based on various linguistic formalisms, into the same annotation framework. The main aim is to facilitate the development of multilingual technology. HamleDT covers as many as 28 languages in its current version.

Selecting Data for English-to-Czech Machine Translation

Bojar, Ondřej; Tamchyna, Aleš; Kamran, Amir; Galuščáková, Petra; Stanojević, Miloš (2012)
Projects: EC | EUROMATRIXPLUS (231720)
We provide a few insights on data selection for machine translation. We evaluate the quality of the new CzEng 1.0, a parallel data source used in WMT12. We describe a simple technique for reducing the out-of-vocabulary rate after phrase extraction. We discuss the benefits of tuning towards multiple reference translations for the English-Czech language pair. We introduce a novel approach to data selection by full-text indexing and search: we select sentences similar to th...

Automatic Source Code Reduction

Diviš, Jiří; Bojar, Ondřej (2010)
Projects: EC | EUROMATRIXPLUS (231720)
The aim of this paper is to introduce Reductor, a program that automatically removes unused parts of the source code of valid programs written in the Mercury language. Reductor implements two main kinds of reductions: statical reduction and dynamical reduction. In the statical reduction, Reductor exploits semantic analysis of the Melbourne Mercury Compiler to find routines which can be removed from the program. Dynamical reduction of routines additionally uses Mercury Deep Profiler and some ...

Khresmoi Summary Translation Test Data 1.1

Pecina, Pavel; Tamchyna, Aleš; Urešová, Zdeňka; Hlaváčová, Jaroslava; Hajič, Jan; Dušek, Ondřej (2014)
Projects: EC | KHRESMOI (257528)
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.

CUNI System for WMT17 Automatic Post-Editing Task

Variš, Dušan; Bojar, Ondřej (2017)
Projects: EC | HimL (644402), EC | QT21 (645452)
Following up on last year's CUNI system for automatic post-editing of machine translation output, we focus on exploiting the potential of sequence-to-sequence neural models for this task. In this system description paper, we compare several encoder-decoder architectures on smaller-scale models and present the system we submitted to the WMT 2017 Automatic Post-Editing shared task based on this preliminary comparison. We also show how simple inclusion of synthetic data can improve the overa...

An Exploration of Word Embedding Initialization in Deep-Learning Tasks

Kocmi, Tom; Bojar, Ondřej (2017)
Projects: EC | HimL (644402), EC | QT21 (645452)
Word embeddings are the interface between the world of discrete units of text processing and the continuous, differentiable world of neural networks. In this work, we examine various random and pretrained initialization methods for embeddings used in deep networks and their effect on the performance on four NLP tasks with both recurrent and convolutional architectures. We confirm that pretrained embeddings are a little better than random initialization, especi...

Česko-slovenský paralelný korpus (Czech-Slovak Parallel Corpus)

Garabík, Radovan; Galuščáková, Petra; Bojar, Ondřej (2012)
Projects: EC | EUROMATRIXPLUS (231720)
A Czech-Slovak parallel corpus consisting of several freely available corpora. The corpus is provided both in plaintext format and with automatic morphological annotation.

Probes in a Taxonomy of Factored Phrase-Based Models

Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej (2012)
Projects: EC | EUROMATRIXPLUS (231720)
We introduce a taxonomy of factored phrase-based translation scenarios and conduct a range of experiments in this taxonomy. We point out several common pitfalls when designing factored setups. The paper also describes our WMT12 submissions CU-BOJAR and CU-POOR-COMB.

Statistical Machine Translation between Related and Unrelated Languages

Kolovratník, David; Bojar, Ondřej; Klyueva, Natalia (2009)
Projects: EC | EUROMATRIXPLUS (231720)
In this paper we describe an attempt to compare how the relatedness of languages can influence the performance of statistical machine translation (SMT). We apply the Moses toolkit to the Czech-English-Russian corpus UMC 0.1 in order to train two translation systems: Russian-Czech and English-Czech. The quality of the translation is evaluated on an independent test set of 1000 sentences parallel in all three languages using an automatic metric (BLEU score) as well as manual judgm...