Abstract
This paper describes a small, structured English corpus designed for translation into Less Commonly Taught Languages (LCTLs), and a set of reusable tools for creating similar corpora. The corpus systematically explores meanings that are known to affect morphology or syntax in the world's languages. Each sentence is associated with a feature structure showing the elements of meaning that are represented in the sentence. The corpus is highly structured so that it can support machine learning with only a small amount of data. As part of the REFLEX program, the corpus will be translated into multiple LCTLs, resulting in parallel corpora that can be used for training MT and other language technologies. Only the untranslated English corpus is described in this paper.
Original language | English |
---|---|
Pages | 5-8 |
Number of pages | 4 |
State | Published - 2006 |
Event | 2006 Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, HLT-NAACL 2006 - New York, United States; Duration: Jun 4 2006 → Jun 9 2006 |
Conference
Conference | 2006 Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, HLT-NAACL 2006 |
---|---|
Country/Territory | United States |
City | New York |
Period | Jun 4 2006 → Jun 9 2006 |