Reinventing Part-Of-Speech Tagging

Ezra Black; Stephen Eubank; Hideki Kashioka; David Magerman; Jared Saia; Akira Ushioda

doi:10.5715/jnlp.5.3


Keywords: corpus-based language modeling, part-of-speech tagging

Journal, free access

1998, Volume 5, Issue 1, pp. 3-23



Abstract

Part-of-speech tagging methodology has succeeded, but on problems that may lack real-world application. Redirection of the field is indicated, toward potentially more useful, but harder and more sophisticated tagging tasks: (1) using much more detailed tagsets (semantically and syntactically); (2) testing performance on treebanks reflecting the huge gamut of domains, etc., characterizing real-world applications; (3) understanding the magnitude of the unknown-word and unknown-tag problems, then overcoming them. Tagging results are presented on two versions of a new, highly variegated treebank, featuring tagsets of 2720 and 443 tags, respectively, and utilizing a dictionaryless, decision-tree tagger.

