Biblio at Institute of Formal and Applied Linguistics
Institutional Repository
273 Publications
OpenAIRE 3.0 (OA, funding)


  • Representing Layered and Structured Data in the CoNLL-ST Format

    In this paper, we investigate the CoNLL Shared Task format, its properties, and the possibility of using it for complex annotations. We argue that, perhaps despite the original intent, it is one of the most important current formats for syntactically annotated data. We show the limits of the CoNLL-ST data format in its current form and propose several simple enhancements that push those limits further and make the format more robust and future-proof. We analyse several different linguistic ...

    Yet Another Language Identifier

    Language identification of written text has been studied for several decades. Despite this, most research focuses on a few of the most widely spoken languages, whereas minor ones are ignored. Identifying a larger number of languages brings new difficulties that do not occur with only a few languages, and these difficulties decrease accuracy. The objective of this paper is to investigate the sources of this degradation. In order to isolate the impact of individual factors, 5 ...

    Twitter Crowd Translation -- Design and Objectives

    The paper describes the design and implementation of a system for human and machine translation of tweets. In these early experiments, we limit the system to follow some selected sources from Ukraine (the source language is primarily Ukrainian and Russian, sometimes English).

    A corpus-based finite-state morphological toolkit for contemporary Arabic

    We develop an open-source large-scale finite-state morphological processing toolkit (AraComLex) for Modern Standard Arabic (MSA) distributed under the GPLv3 license (http://aracomlex.sourceforge.net). The morphological transducer is based on a lexical database specifically constructed for this purpose. In contrast to previous resources, the database is tuned to MSA, eliminating lexical entries no longer attested in contemporary use. The database is built using a corpus of 1,089,111,204 word toke...

    English-Slovak Parallel Corpus

    An English-Slovak parallel corpus consisting of several freely available corpora. The corpus is provided both in plaintext format and with automatic morphological annotation.
  • Charts (not loaded): Latest Documents Timeline; Document Types; Funders in data provider publications; Projects with most Publications
