  • Catalogue of Language Resources

    ELRA releases free Language Resources.

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.

    A growing number of LRs across the various fields of Human Language Technology (HLT) are distributed on behalf of ELRA by its operational body, ELDA, thanks to contributions from members of the HLT community.

    Through this repository, we aim to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-W0125 : TRAD Chinese-French Parallel Text - Blog
    This parallel corpus contains 15,809
    Chinese characters and reference French
    translations totalling 11,769 words.
    The source texts are a selection of blog

  • ELRA-L0100 : French dictionary of definitions (SYNAPSE)
    The French dictionary of definitions
    (SYNAPSE) consists of 216,835 entries
    (147,378 nouns, 80,552 adjectives,
    24,001 verbs, 4,677 adverbs, 1,560
    prefixes, 107 prepositions, 614
    interjections, 147 pronouns, 42
    conjunctions, 27 articles), 309,078
    definitions and 7,395 phraseological
    units (phrases). Grammatical information
    for each entry consists of grammatical
    category, gender, number, and inflected
    forms. The dictionary is provided in
    XML format together with its DTD.

  • ELRA-W0124 : English-Vietnamese Parallel Corpus
    This is a corpus of 500,000
    English-Vietnamese sentence pairs. The
    parallel corpus contains English
    documents translated by professional
    translators into Vietnamese. The source
    texts include books, dictionaries,
    newspapers, and online news. The texts
    are provided in TEI format.

  • ELRA-S0394 : Metalogue Multi-Issue Bargaining Dialogue
    This corpus consists of approximately
    2.5 hours of semantically annotated
    English dialogue data that includes
    speech and transcripts. Six unique
    subjects (undergraduates between 19 and
    25 years of age) participated in the
    collection. The dialogue speech was
    captured with two headset microphones
    and saved in 16 kHz, 16-bit mono linear
    PCM FLAC format. Transcripts were
    produced semi-automatically, using an
    automatic speech recognizer followed by
    manual correction. All text is presented
    in UTF-8 as either plain text or XML.

  • ELRA-S0395 : Nautilus Speaker Characterization (NSC) Corpus
    This corpus comprises clean microphone
    recordings of conversational speech from
    300 German speakers (126 males and 174
    females) aged 18 to 35 years, with no
    marked dialect or accent. The recordings
    were made in an acoustically isolated
    room in 2016/2017.
    Four scripted and four semi-spontaneous
    dialogs were elicited from the speakers,
    simulating telephone call inquiries.
    Additionally, spontaneous neutral and
    emotional speech utterances and
    questions were produced. All labels are
    provided, together with the speech
    recordings and the speakers' metadata.

  • (last update: May 2018)

    Copyright © 2008 ELRA