284 documents, page 1 of 29

TectoMT – a deep-linguistic core of the combined Chimera MT system

Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair it also uses the Depfix post-correction system. All the components run on Unix/Linux platforms and are open source (available from the Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The development is currently supported by the QTLeap 7th FP project (http://qtleap.eu).

HamleDT - HArmonized Multi-LanguagE Dependency Treebank

HamleDT is a set of software tools for converting (harmonizing) treebanks that were created for different languages and based on various linguistic formalisms into the same annotation framework. The main aim is to facilitate the development of multilingual technology. HamleDT covers as many as 28 languages in its current version.

Towards an Indonesian-English SMT System: A Case Study of an Under-Studied and Under-Resourced Language, Indonesian

This paper describes work on preparing an Indonesian-English Statistical Machine Translation (SMT) system. It includes the creation of an Indonesian morphological analyzer, MorphInd, and the compilation of an Indonesian-English parallel corpus, IDENTIC. We build an SMT system using the state-of-the-art phrase-based SMT system, MOSES. We show several scenarios in which the morphological tool is used to incorporate morphological information into the SMT system trained on the compiled parallel corpus.

Trainable Tokenizer

Trainable Tokenizer can tokenize and segment most languages based on a supplied configuration and sample data. The tokenizer is not intended for languages such as Chinese, which have no explicit delimitation of words.

Selecting Data for English-to-Czech Machine Translation

We provide a few insights on data selection for machine translation. We evaluate the quality of the new CzEng 1.0, a parallel data source used in WMT12. We describe a simple technique for reducing the out-of-vocabulary rate after phrase extraction. We discuss the benefits of tuning towards multiple reference translations for the English-Czech language pair. We introduce a novel approach to data selection by full-text indexing and search: we select sentences similar to th...

Prague Czech-English Dependency Treebank 2.0 Coref

We present an extended version of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0). It includes all annotation of coreference (the original one from PCEDT 2.0 as well as the new one) and improved cross-lingual alignment of coreferential expressions. The corpus released as PCEDT 2.0 Coref is publicly available.

Machine Translation of Medical Text in the KConnect Project

This paper presents the work on Machine Translation (MT) that has been conducted within the KConnect project, funded under the H2020 programme and focused on development and commercialization of cloud-based services for multilingual Semantic Annotation, Semantic Search and Machine Translation of electronic health records and medical publications. We first present the main goal and role of MT in the project and then briefly describe the main methods and components developed in the project, inc...

Coreferential expressions in English and Czech

In this talk, we present a comprehensive study on mappings between certain classes of coreferential expressions in English and Czech. We focused on central pronouns, relative pronouns and anaphoric zeros. For instance, the English sentence "It switched to a caffeine-free formula using its new Coke in 1985" has been translated in PCEDT as "V roce 1985 přešla na bezkofeinovou recepturu, kterou používá pro svojí novou kolu". This pair of sentences exhibits several types of changes in expressing ...

Twitter Crowd Translation -- Design and Objectives

The paper describes the design and implementation of a system for human and machine translation of tweets. In these early experiments, we limit the system to following selected sources from Ukraine (the source languages are primarily Ukrainian and Russian, sometimes English).

Khresmoi Summary Translation Test Data 1.1

This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.