ELRA
R&D Catalogue of Language Resources

    Considering the needs expressed by several academic institutions in the Human Language Technology field, ELDA is pleased to offer access to a version of its Catalogue of Language Resources dedicated to academic research. On various occasions, in discussions with members of the academic R&D community, we concluded that it was important to provide quick and easy access to a list of resources produced specifically for R&D purposes in Human Language Technology.

    We therefore now provide a list of Language Resources, available at very affordable prices and dedicated to research use. To facilitate access to this list, we have preserved the interface and browsing tools of the ELDA catalogue. Of course, you may return to the full version of the catalogue at any time. Very soon, we will also implement an advanced search that will allow you to browse our catalogue using predefined selection criteria, such as the type of resource or the price (and many more criteria).

    As in the full version of the catalogue, the language resources available here are divided into four categories: "Speech and Related Resources", "Written Resources", "Terminological Resources", and "Multimodal/Multimedia Resources".

    1/ Spoken LRs

    a - Telephone recordings
    The databases catalogued in this section were produced from speaker recordings made over the telephone network (fixed or mobile) or through a microphone. You will find speech resources recorded in various environments and covering a large number of European and non-European languages, e.g. the databases produced in the framework of the SpeechDat project.

    b - Desktop/Microphone recordings
    The databases catalogued in this section were produced from speaker recordings made with a microphone, e.g. the databases produced in the framework of the BABEL project.

    c - Broadcast Resources
    The databases catalogued in this section were produced from speaker recordings taken from radio, television or internet broadcasts, such as the Italian Broadcast News Corpus.

    d - Speech Related Resources
    In this section you will find pronunciation and phonetic lexicons, such as the BDLEX, PHONOLEX, and MHATLEX databases.

    2/ Written LRs

    a - Corpora
    This section contains monolingual and multilingual corpora, parallel or not, which may also be annotated. Examples include the corpora developed in the framework of the MULTEXT project, the Multilingual and Parallel Corpora (MLCC), French scientific corpora, newspaper corpora in Arabic, etc.

    b - Monolingual lexicons
    The section dedicated to monolingual lexicons contains various types of dictionaries, e.g. a dictionary of French verbs, the Japanese word dictionary, some PAROLE lexicons in many languages, etc.

    c - Multilingual lexicons
    Here you can find either bilingual or multilingual dictionaries and lexicons, such as the EuroWordNet databases.

    3/ Terminological LRs

    Monolingual, bilingual and multilingual terminological databases are available. They cover a large number of specialised domains, e.g. automobile engineering, insurance, linguistics, finance, etc., in a wide variety of languages.

    4/ Multimodal/Multimedia LRs

    The resources you will find in this section have been produced using different modalities, including speech. One example is the database produced in the framework of the M2VTS project.


    LATEST UPDATES:

    New Resources
  • ELRA-T0378 : English-Persian database of idioms and expressions
    This database consists of about 30,000
    bilingual parallel sentences and phrases
    in English and Persian (15,000 in each
    language). It comes with software
    that lets users search for a word,
    phrase or chunk and retrieve all
    idioms and expressions related to the
    query. The database is provided in
    Access format and the software runs
    on Windows systems.

  • ELRA-S0406 : Glissando-sp
    Glissando-sp includes more than 12 hours
    of speech in Spanish, recorded under
    optimal acoustic conditions,
    orthographically transcribed,
    phonetically aligned and annotated with
    prosodic information (location of the
    stressed syllables and prosodic
    phrasing). The corpus was recorded by 8
    professional speakers and 20
    non-professional speakers: 4 “news
    broadcaster” professional speakers (2
    male and 2 female), 4 “advertising”
    professional speakers (2 male and 2
    female), and 20 non-professional
    speakers (10 male and 10 female).
    Glissando-sp consists of three
    subcorpora: readings of real news texts
    (provided by the “Cadena Ser” radio
    station), goal-oriented interactions
    between two speakers in the domain of
    information requests, and
    conversations between people who have
    some degree of familiarity with each
    other.

  • ELRA-T0380 : English-Persian terminology database of management and economics
    This bilingual terminology database
    consists of around 15,000 terms in the
    field of management and economic
    sciences. It comes with software that
    lets users search for a word, phrase or
    chunk and retrieve all entries related
    to the query. The main database of the
    software is provided in Access format
    and the software itself runs on Windows
    systems.

  • ELRA-S0407 : Glissando-ca
    Glissando-ca includes more than 12 hours
    of speech in Catalan, recorded under
    optimal acoustic conditions,
    orthographically transcribed,
    phonetically aligned and annotated with
    prosodic information (location of the
    stressed syllables and prosodic
    phrasing). The corpus was recorded by 8
    professional speakers and 20
    non-professional speakers: 4 “news
    broadcaster” professional speakers (2
    male and 2 female), 4 “advertising”
    professional speakers (2 male and 2
    female), and 20 non-professional
    speakers (10 male and 10 female).
    Glissando-ca consists of three
    subcorpora: readings of real news texts
    (provided by the “Cadena Ser” radio
    station), goal-oriented interactions
    between two speakers in the domain of
    information requests, and
    conversations between people who have
    some degree of familiarity with each
    other.

  • ELRA-T0379 : English-Persian terminology database of computer and IT
    This bilingual terminology database
    consists of around 25,000 terms in the
    field of computer engineering, computer
    science and information technology. It
    comes with software that lets users
    search for a word, phrase or chunk and
    retrieve all entries related to the
    query. The database is provided in
    Access format and the software runs
    on Windows systems.

    (last update: July 2019)

    Copyright © 2006 ELRA