  • Catalogue of Language Resources

    ELRA releases free Language Resources.


    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.



    An increasing number of LRs in the various fields of Human Language Technology (HLT) are distributed on behalf of ELRA via its operational body ELDA, thanks to contributions from various players in the HLT community.

    Through this repository, we aim to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0402 : Speaking atlas of the regional languages of France
    The Speaking atlas of the regional
    languages of France offers the same
    Aesop’s fable read in French and in a
    number of varieties of languages of
    France. This work, which has both a
    scientific and a heritage dimension,
    aims to highlight the linguistic
    diversity of Metropolitan France and
    Overseas Territories, through recordings
    collected in the field and presented via
    an interactive map, with their
    orthographic transcription. As far as
    Occitan is concerned, about sixty
    varieties were collected in Gascony,
    Languedoc, Provence, northern Occitania
    and the Linguistic Crescent. Varieties
    of Basque, Breton, Franconian, West
    Flemish, Alsatian, Corsican, Catalan,
    Francoprovençal and Oïl language(s) are
    also provided, as well as about fifty
    languages of the French Overseas
    Territories and non-territorial
    languages such as Rromani and French
    Sign Language.

  • ELRA-S0399 : GlobalPhone Multilingual Model Package
    The GlobalPhone Multilingual Model
    Package contains about 22 hours of
    transcribed read speech spoken by native
    speakers in 22 languages (Arabic,
    Bulgarian, Chinese-Mandarin,
    Chinese-Shanghai, Croatian, Czech,
    French, German, Hausa, Japanese, Korean,
    Polish, Portuguese (Brazilian), Russian,
    Spanish (Latin America), Swahili,
    Swedish, Tamil, Thai, Turkish,
    Ukrainian, and Vietnamese). The
    GlobalPhone Multilingual Model Package
    covers about 1 hour of transcribed
    speech from 10 speakers (5 male, 5
    female) from each of the above listed 22
    languages.

  • ELRA-S0400 : GlobalPhone 2000 Speaker Package
    The GlobalPhone 2000 Speaker Package
    contains transcribed read speech spoken
    by 2000 native speakers in 22 languages
    (Arabic, Bulgarian, Chinese-Mandarin,
    Chinese-Shanghai, Croatian, Czech,
    French, German, Hausa, Japanese, Korean,
    Polish, Portuguese (Brazilian), Russian,
    Spanish (Latin America), Swahili,
    Swedish, Tamil, Thai, Turkish,
    Ukrainian, and Vietnamese). The
    GlobalPhone 2000 Speaker Package covers
    about 9,000 randomly selected utterances
    read by 2000 native speakers in 22
    languages, i.e. on average 4.5
    utterances (40 seconds of speech) per
    speaker, amounting to a total of 22
    hours of speech.

  • ELRA-S0401 : Persian Audio Dictionary
    This dictionary consists of more than
    50,000 entries (along with almost all
    wordforms and proper names) with
    corresponding MP3 audio files and
    English transliterations. The words have
    been recorded with standard Persian
    (Farsi) pronunciation (all by a single
    speaker). This dictionary is provided
    with its software.

  • ELRA-W0127 : Normalized Arabic Fragments for Inestimable Stemming (NAFIS)
    This is an Arabic stemming gold standard
    corpus composed of 37 sentences,
    selected to be representative of Arabic
    stemming tasks and manually annotated.
    The sentences are drawn from various
    sources (poems, the Holy Quran, books,
    and periodicals) and are of diverse
    kinds (proverb and dictum, article
    commentary, religious text, literature,
    historical fiction). NAFIS is encoded
    according to the TEI standard.

  • (last update: November 2018)

    Copyright © 2008 ELRA