The Leipzig Corpora Collection: Monolingual Corpora of Standard Size

Publication

Authors: Biemann, Chris; Heyer, Gerhard

Year: 2007

Abstract

We describe the Leipzig Corpora Collection (LCC), a freely available resource for corpora and corpus statistics that currently covers more than 20 languages. Its unified format and easy accessibility encourage incorporation of the data into many projects and make the collection a useful resource, especially in multilingual settings and for small languages. The preparation of monolingual corpora of standard sizes from different sources (web, newspaper, Wikipedia) is described in detail.
