ELRA
    Catalog Reference: ELRA-M0051
    EnToSSLNE - a Lexicon of Parallel Named Entities from English to South Slavic Languages
    This lexicon also contains multiword entries that are not strictly named entities but contain a word that is one. For example, German shepherd is included: the phrase itself is not a proper name, since it denotes a breed to which many dogs belong, but the adjective German makes it a named entity in a broader sense. Accordingly, many multiword units in the lexicon contain ethnonyms. Similarly, Planck's law belongs in the lexicon because it contains the personal name Planck.

    Certain natural terms like biological species and substances, which are sometimes considered named entities, are not included in the lexicon.

    Languages
    The lexicon consists of 26,155 parallel named entities in seven languages: English and six South Slavic ones: Bosnian, Bulgarian, Croatian, Macedonian, Serbian, and Slovenian.

    Slovenian, Croatian and Bosnian are written in Latin script; Macedonian and Bulgarian are written in Cyrillic. Serbian is a special case: it can be written in either script (Cyrillic or Latin) and has two pronunciation variants (ekavica and ijekavica). This lexicon uses the ekavica variant of Serbian, written in Cyrillic.
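    The script assignment above lends itself to a simple consistency check. The sketch below is a hypothetical helper, not something shipped with the lexicon: it takes the first word of each letter's Unicode character name (e.g. LATIN or CYRILLIC) as a heuristic for its script and compares the result with the script stated for each language. It also catches Latin lookalike letters slipped into Cyrillic words.

```python
import unicodedata

# Expected script per language, as stated in the catalogue description
# (Serbian is distributed in its Cyrillic, ekavica form).
EXPECTED_SCRIPT = {
    "Slovenian": "LATIN",
    "Croatian": "LATIN",
    "Bosnian": "LATIN",
    "Serbian": "CYRILLIC",
    "Macedonian": "CYRILLIC",
    "Bulgarian": "CYRILLIC",
}


def scripts_used(text: str) -> set[str]:
    """Return the scripts of the letters in text (heuristic: first word of the Unicode name)."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. 'LATIN SMALL LETTER C WITH CARON' or 'CYRILLIC CAPITAL LETTER EN'
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def check_script(language: str, text: str) -> bool:
    """True if every letter in text belongs to the script expected for language."""
    return scripts_used(text) == {EXPECTED_SCRIPT[language]}


print(check_script("Serbian", "Немачки овчар"))       # True
print(check_script("Slovenian", "Nemški ovčar"))      # True
print(check_script("Serbian", "Nemački ovčar"))       # False: Latin, not Cyrillic
print(check_script("Bulgarian", "Немска \u006fвчарка"))  # False: Latin 'o' lookalike
```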

    Classification
    The tags used for named entities are: ORGANIZATION, LOCATION, PERSON, PRODUCT and MISC. Each named entity belongs to one of these classes. The classes comprise:
    ORGANIZATION: political organizations, companies, schools, rock bands, sport teams
    LOCATION: geographical terms, fictional places, cosmic terms
    PERSON: humans, gods, saints, fictional characters
    PRODUCT: industrial products, software products, weapons, art works, documents, concepts, standards, formats, anthems, algorithms, journals, coats of arms, platforms, websites
    MISC: events, languages, peoples, tribes, alliances, orders, scientific discoveries, theories, titles, currencies, holidays, dynasties, positions, projects, historical periods, competitions, diseases, breeds, programs, sets of locations, awards, musical genres, missions, artistic directions, sets of organizations, networks

    Each of the 26,155 entries is assigned exactly one tag. The classes are distributed as follows:
    ORGANIZATION: 1,575 entries
    LOCATION: 6,327 entries
    PERSON: 8,584 entries
    PRODUCT: 1,716 entries
    MISC: 7,953 entries
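    As a quick arithmetic check, the five class counts listed above add up to the stated lexicon size, and their shares can be computed directly. This is a minimal sketch using only the figures from this page; PERSON comes out as the largest class (about 32.8%) and ORGANIZATION as the smallest (about 6.0%).

```python
# Class counts as listed in the catalogue entry.
CLASS_COUNTS = {
    "ORGANIZATION": 1575,
    "LOCATION": 6327,
    "PERSON": 8584,
    "PRODUCT": 1716,
    "MISC": 7953,
}


def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each class's share of all entries, in percent."""
    total = sum(counts.values())
    return {tag: 100 * n / total for tag, n in counts.items()}


total = sum(CLASS_COUNTS.values())
print(total)  # 26155, matching the stated number of entries

for tag, share in class_shares(CLASS_COUNTS).items():
    print(f"{tag:<12} {CLASS_COUNTS[tag]:>6,} {share:5.1f}%")
```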

    Formats
    The lexicon is available in two formats, CSV and XML.
    In the CSV file, the first row is a header row and fields are separated by tabs, e.g.:
    German Shepherd	Nemški ovčar	Njemački ovčar	Njemački ovčar	Немачки овчар	Германски овчар	Немска овчарка	MISC
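    A file in this layout can be read with Python's standard csv module, as sketched below. The sketch assumes the file is UTF-8 encoded and that fields are not quoted. The header names in the sample (en, sl, hr, bs, sr, mk, bg, class) are hypothetical placeholders in the order suggested by the example row; the actual header row of the distributed file supplies the real column names, and the two identical Croatian and Bosnian values in the example do not reveal which of those two columns comes first.

```python
import csv
import io


def read_lexicon(stream):
    """Parse a tab-separated lexicon whose first row holds the column names."""
    # QUOTE_NONE treats quote characters as ordinary text (assumes unquoted fields).
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical header names; the real file's first row defines the actual ones.
sample = (
    "en\tsl\thr\tbs\tsr\tmk\tbg\tclass\n"
    "German Shepherd\tNemški ovčar\tNjemački ovčar\tNjemački ovčar\t"
    "Немачки овчар\tГермански овчар\tНемска овчарка\tMISC\n"
)

entries = read_lexicon(io.StringIO(sample))
print(entries[0]["en"], "->", entries[0]["class"])  # German Shepherd -> MISC

# For the real file (file name hypothetical):
# with open("lexicon.csv", encoding="utf-8", newline="") as f:
#     entries = read_lexicon(f)
```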

    In the XML file, the class tag is stored as an attribute and each language is an element, e.g.:

    German Shepherd
    Nemški ovčar
    Njemački ovčar
    Njemački ovčar
    Немачки овчар
    Германски овчар
    Немска овчарка
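    Because each entry keeps its class in an attribute and its translations in child elements, the XML file can be walked with Python's xml.etree.ElementTree. The element and attribute names are not given on this page, so the sketch takes them as parameters; the sample markup (lexicon, entry, the type attribute, and the en/sl/bg language elements) is entirely hypothetical and should be replaced by the names used in the actual file.

```python
import xml.etree.ElementTree as ET


def read_lexicon_xml(xml_text, entry_tag, class_attr):
    """Return (class, {language_element: text}) pairs for every entry element.

    entry_tag and class_attr must match the names used in the real file.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(entry_tag):
        # Each child element holds the name in one language.
        names = {child.tag: (child.text or "").strip() for child in entry}
        entries.append((entry.get(class_attr), names))
    return entries


# Hypothetical markup: all element and attribute names are placeholders.
sample = """<lexicon>
  <entry type="MISC">
    <en>German Shepherd</en>
    <sl>Nemški ovčar</sl>
    <bg>Немска овчарка</bg>
  </entry>
</lexicon>"""

for tag, names in read_lexicon_xml(sample, entry_tag="entry", class_attr="type"):
    print(tag, names["en"], names["bg"])  # MISC German Shepherd Немска овчарка
```

    For a file on disk, ET.parse(path).getroot() yields the same root element that ET.fromstring returns here.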

    Technical Information
    Distribution medium: Downloadable
    Contents: written lexicon
    Member Prices
    Academic - Commercial 1000.00 EUR
    Academic - Research 300.00 EUR
    Commercial - Commercial 1000.00 EUR
    Commercial - Research 1000.00 EUR
    Non-Member Prices
    Academic - Commercial 2000.00 EUR
    Academic - Research 600.00 EUR
    Commercial - Commercial 2000.00 EUR
    Commercial - Research 2000.00 EUR

    Copyright © 2008 ELRA