Name: LINDAT/CLARIN repository
Type: Data Repository
Items: 42 Research Data
Compatibility: OpenAIRE Data (funded, referenced datasets)
OAI-PMH: http://lindat.mff.cuni.cz/repository/oai/openaire_data
More information: Detailed data provider information (Re3data)
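The OAI-PMH endpoint listed above can be harvested with standard OAI-PMH 2.0 requests. The following is a minimal Python sketch, not an official client: it assumes the endpoint serves the `oai_dc` metadata format (mandatory for every OAI-PMH repository, although the OpenAIRE-specific records may also be exposed under another prefix), and the helper names `list_records_url`, `parse_page` and `harvest_identifiers` are hypothetical. It builds `ListRecords` requests, follows `resumptionToken` paging, and collects record identifiers.

```python
"""Minimal OAI-PMH ListRecords harvesting sketch (hypothetical helpers, oai_dc assumed)."""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint taken from the provider record above.
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/openaire_data"

# Standard OAI-PMH 2.0 XML namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When continuing an incomplete list, OAI-PMH expects resumptionToken
    to be the only argument besides verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken element marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest_identifiers(base_url=BASE_URL, metadata_prefix="oai_dc"):
    """Follow resumption tokens until the list is complete (performs network I/O)."""
    identifiers, token = [], None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        with urllib.request.urlopen(url) as resp:
            ids, token = parse_page(resp.read())
        identifiers.extend(ids)
        if token is None:
            return identifiers
```

Keeping URL construction and XML parsing separate from the network loop makes the paging logic easy to check against a saved response before pointing `harvest_identifiers()` at the live endpoint.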

No publications from this data provider were found. Its research data include:

    Test Data DE-EN APE Shared Task WMT17

    Turchi, Marco; Chatterjee, Rajen; Negri, Matteo (2017)
    Publisher: Fondazione Bruno Kessler, Trento, Italy
    Projects: EC | QT21 (645452)
    Embargo end date: 2017/04/11
    Test data for the WMT 2017 Automatic Post-Editing task (the same data used for the Sentence-level Quality Estimation task). It consists of German-English pairs (source and target) from the pharmacological domain, already tokenized. The test set contains 2,000 pairs. All data is provided by the EU project QT21 (http://www.qt21.eu/).

    Khresmoi Query Translation Test Data 1.0

    Pecina, Pavel; Dušek, Ondřej; Hajič, Jan; Urešová, Zdeňka (2013)
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Projects: EC | KHRESMOI (257528)
    Embargo end date: 2014/04/02
    This package contains data sets for developing and testing machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and from medical experts.

    WMT16 APE Shared Task Data

    Turchi, Marco; Chatterjee, Rajen; Negri, Matteo (2016)
    Publisher: Fondazione Bruno Kessler, Trento, Italy
    Projects: EC | QT21 (645452)
    Embargo end date: 2016/02/21
    The training, development and test data (the same data used for the Sentence-level Quality Estimation task) consist of English-German triplets (source, target and post-edit) from the IT domain, already tokenized. The training and development sets contain 12,000 and 1,000 triplets respectively, while the test set contains 2,000 instances. All data is provided by the EU project QT21 (http://www.qt21.eu/).

    WMT16 Tuning Shared Task Models (English-to-Czech)

    Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos (2016)
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Projects: EC | QT21 (645452)
    Embargo end date: 2016/03/22
    This item contains the models to be tuned in the WMT16 Tuning shared task for English-to-Czech. The CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre) is used to train the translation models. Before training, the data is tokenized (using the Moses tokenizer) and lowercased, and sentences longer than 60 words or shorter than 4 words are removed. Alignment is done using fast_align (https://github.com/clab/fast_align), and the standard Moses pipeline is used for training. Two 5-gram l...

    Prague Czech-English Dependency Treebank 2.0 Coref

    Nedoluzhko, Anna; Novák, Michal; Cinková, Silvie; Mikulová, Marie; Mírovský, Jiří (2016)
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Projects: EC | QTLEAP (610516)
    Embargo end date: 2016/03/30
    The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended manual annotation of coreference, as well as with an improved automatic annotation of the coreferential expression alignment.
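The preprocessing described for the WMT16 Tuning Shared Task models (lowercasing tokenized text, then removing sentences longer than 60 or shorter than 4 words) can be sketched as below. This is an illustration, not the authors' script: the function name `clean_pairs` is hypothetical, and it assumes that a sentence pair is dropped when either side falls outside the bounds, which is how the length check in Moses' `clean-corpus-n.perl` behaves.

```python
"""Length-based cleaning of a tokenized parallel corpus (illustrative sketch)."""

MIN_WORDS = 4   # sentences shorter than 4 words are removed
MAX_WORDS = 60  # sentences longer than 60 words are removed


def clean_pairs(pairs, min_words=MIN_WORDS, max_words=MAX_WORDS):
    """Lowercase (source, target) pairs and keep those whose both sides fit the bounds.

    Assumes the input is already tokenized (tokens separated by whitespace)
    and that a pair is dropped if EITHER side is out of range (assumption).
    """
    kept = []
    for src, tgt in pairs:
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if min_words <= src_len <= max_words and min_words <= tgt_len <= max_words:
            kept.append((src.lower(), tgt.lower()))
    return kept


if __name__ == "__main__":
    corpus = [
        ("The cat sat down .", "Kočka si sedla ."),
        ("Too short", "Příliš krátké"),
    ]
    for src, tgt in clean_pairs(corpus):
        print(src, "|||", tgt)
```

Filtering pairs rather than single sentences keeps the source and target sides aligned line by line, which word aligners such as fast_align require.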
