Have you heard of … CoDeRooMor?

Språkbanken has published a morphological database in which more than 16,000 words from two corpora are annotated with their morphological components:

The CoDeRooMor dataset (version 1.0) contains 16 230 lemgrams generated from COCTAILL (course book corpus) and SweLL-pilot (learner essay corpus). They represent vocabulary relevant to learners of Swedish as a second language and are hypothesised to contain the most frequent vocabulary of Swedish. The lemgrams in CoDeRooMor have been manually analysed for roots, prefixes, suffixes, infixes/binding morphemes (sv: fogemorfem) and other morpheme types, e.g. o-är-lig: "o" prefix, "är" root, "lig" suffix.

The dataset covers 4 429 unique roots, 259 unique derivational suffixes, 155 unique prefixes, 12 unique binding morphemes (infixes), and a few inflectional morphemes that have been analysed as part of lexicalised forms or similar.
Each lemgram is assigned a word formation mechanism, such as derivation, compounding, or root lexeme.
The morphological annotation scheme follows the principles outlined in the Swedish Academy Grammar (SAG) and SAOL/SO.

Discovered on the Nordeuropa-Blog
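To make the annotation idea concrete, here is a minimal sketch of how a segmented lemgram such as o-är-lig could be held in memory, and how the "unique roots / prefixes / suffixes / binding morphemes" figures quoted above could be counted. This is an illustration under stated assumptions, not CoDeRooMor's actual file format: the `Morpheme` and `Lemgram` classes, the type labels, and the helper functions are hypothetical. Apart from o-är-lig, the example words (ärlig, fotboll, dagsljus) are illustrative analyses and are not taken from the dataset.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


# Hypothetical in-memory model of one annotated lemgram.
# This is NOT the actual CoDeRooMor file format.
@dataclass(frozen=True)
class Morpheme:
    form: str
    kind: str  # assumed labels: "prefix", "root", "suffix", "infix" (binding morpheme)


@dataclass(frozen=True)
class Lemgram:
    lemma: str
    morphemes: tuple[Morpheme, ...]
    mechanism: str  # e.g. "derivation", "compounding", "root lexeme"


def segmentation(lg: Lemgram) -> str:
    """Render the analysis in hyphenated form, e.g. 'o-är-lig'."""
    return "-".join(m.form for m in lg.morphemes)


def unique_counts(lemgrams: list[Lemgram]) -> dict[str, int]:
    """Count distinct morpheme forms per morpheme type across all lemgrams."""
    seen: dict[str, set[str]] = defaultdict(set)
    for lg in lemgrams:
        for m in lg.morphemes:
            seen[m.kind].add(m.form)
    return {kind: len(forms) for kind, forms in seen.items()}


# Illustrative entries; only o-är-lig comes from the dataset description.
LEMGRAMS = [
    Lemgram("oärlig",
            (Morpheme("o", "prefix"), Morpheme("är", "root"), Morpheme("lig", "suffix")),
            "derivation"),
    Lemgram("ärlig",
            (Morpheme("är", "root"), Morpheme("lig", "suffix")),
            "derivation"),
    Lemgram("fotboll",
            (Morpheme("fot", "root"), Morpheme("boll", "root")),
            "compounding"),
    Lemgram("dagsljus",
            (Morpheme("dag", "root"), Morpheme("s", "infix"), Morpheme("ljus", "root")),
            "compounding"),
]

if __name__ == "__main__":
    print(segmentation(LEMGRAMS[0]))  # o-är-lig
    print(unique_counts(LEMGRAMS))    # {'prefix': 1, 'root': 5, 'suffix': 1, 'infix': 1}
```

Counting distinct forms per type, rather than occurrences, matches how the dataset statistics are phrased: the root "är" appears in two lemgrams above but is counted once.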
