ELRA
    Catalogue of Language Resources

    ELRA releases free Language Resources.


    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.



    An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body ELDA, thanks to the contribution of various players of the HLT community.

    Our aim in providing Language Resources through this repository is to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0399 : GlobalPhone Multilingual Model Package
    The GlobalPhone Multilingual Model
    Package contains about 22 hours of
    transcribed read speech spoken by native
    speakers in 22 languages (Arabic,
    Bulgarian, Chinese-Mandarin,
    Chinese-Shanghai, Croatian, Czech,
    French, German, Hausa, Japanese, Korean,
    Polish, Portuguese (Brazilian), Russian,
    Spanish (Latin America), Swahili,
    Swedish, Tamil, Thai, Turkish,
    Ukrainian, and Vietnamese). The
    GlobalPhone Multilingual Model Package
    covers about 1 hour of transcribed
    speech from 10 speakers (5 male, 5
    female) from each of the above listed 22
    languages.

  • ELRA-S0400 : GlobalPhone 2000 Speaker Package
    The GlobalPhone 2000 Speaker Package
    contains transcribed read speech spoken
    by 2000 native speakers in 22 languages
    (Arabic, Bulgarian, Chinese-Mandarin,
    Chinese-Shanghai, Croatian, Czech,
    French, German, Hausa, Japanese, Korean,
    Polish, Portuguese (Brazilian), Russian,
    Spanish (Latin America), Swahili,
    Swedish, Tamil, Thai, Turkish,
    Ukrainian, and Vietnamese). The
    GlobalPhone 2000 Speaker Package covers
    about 9,000 randomly selected utterances
    read by 2000 native speakers in 22
    languages, i.e. on average 4.5
    utterances corresponding to 40 seconds
    of speech per speaker amounting to a
    total of 22 hours of speech.

  • ELRA-S0401 : Persian Audio Dictionary
    This dictionary consists of more than
    50,000 entries (along with almost all
    wordforms and proper names) with
    corresponding audio files in MP3 and
    English transliterations. The words have
    been recorded with standard Persian
    (Farsi) pronunciation (all by a single
    speaker). This dictionary is provided
    with its software.

  • ELRA-W0127 : Normalized Arabic Fragments for Inestimable Stemming (NAFIS)
    This is an Arabic stemming gold standard
    corpus composed of a collection of 37
    sentences, selected to be representative
    of Arabic stemming tasks and manually
    annotated. The compiled sentences come
    from various sources (poems, the Holy
    Quran, books, and periodicals) of
    diverse kinds (proverb and dictum,
    article commentary, religious text,
    literature, historical fiction). NAFIS
    is represented according to the TEI
    standard.

  • ELRA-W0126 : Training and test data for Arabizi detection and transliteration
    The dataset is composed of two parts: a
    collection of mixed English and Arabizi
    text, intended to train and test a system
    for the automatic detection of
    code-switching in mixed English and
    Arabizi texts; and a set of 3,452
    Arabizi tokens manually transliterated
    into Arabic, intended to train and test
    a system that performs Arabizi-to-Arabic
    transliteration.
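
The size figures quoted for the two GlobalPhone packages (ELRA-S0399 and ELRA-S0400) can be cross-checked with a little arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and the inputs are the rounded values given in the descriptions above, so the results are approximate.

```python
# Rounded figures taken from the catalogue descriptions (not exact counts).
languages = 22

# ELRA-S0399: about 1 hour of speech from 10 speakers per language.
model_hours_per_language = 1
model_speakers_per_language = 10
model_total_hours = languages * model_hours_per_language
model_total_speakers = languages * model_speakers_per_language

# ELRA-S0400: about 9,000 utterances read by 2000 speakers,
# with roughly 40 seconds of speech per speaker on average.
utterances = 9_000
speakers = 2_000
seconds_per_speaker = 40
utterances_per_speaker = utterances / speakers
speaker_total_hours = speakers * seconds_per_speaker / 3600

print(f"S0399: {model_total_hours} hours, {model_total_speakers} speakers")
print(f"S0400: {utterances_per_speaker:.1f} utterances per speaker, "
      f"{speaker_total_hours:.1f} hours")
```

Both packages come out at roughly 22 hours of speech, which matches the totals stated in their descriptions.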

    (last update: October 2018)

    Copyright © 2008 ELRA