Abstract
Part-of-speech tagging methodology has succeeded, but on problems that may lack real-world application. The field should be redirected toward potentially more useful, but harder and more sophisticated, tagging tasks: (1) using much more detailed tagsets, both semantically and syntactically; (2) testing performance on treebanks that reflect the huge gamut of domains, etc., characterizing real-world applications; and (3) understanding the magnitude of the unknown-word and unknown-tag problems, then overcoming them. Tagging results are presented on two versions of a new, highly variegated treebank, featuring tagsets of 2720 and 443 tags respectively, using a dictionaryless decision-tree tagger.
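To make item (3) concrete, the Python sketch here shows one plausible way to measure the two problems on a train/test split. The definitions are assumptions, not necessarily the authors' procedure: a test token counts as an unknown word if its word form never occurs in the training data, and as an unknown tag if the word does occur in training but never with the tag it carries in the test data. The function name `unknown_rates` and the toy tags are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

Token = tuple[str, str]  # (word, tag) pair; hypothetical representation


def unknown_rates(train: Iterable[Token], test: Iterable[Token]) -> tuple[float, float]:
    """Return (unknown-word rate, unknown-tag rate) over the test tokens.

    Assumed definitions (illustrative only):
      unknown word: the word form never occurs in the training data.
      unknown tag:  the word occurs in training, but never with this tag.
    """
    # Collect, for each training word, the set of tags it was seen with.
    tags_seen: dict[str, set[str]] = defaultdict(set)
    for word, tag in train:
        tags_seen[word].add(tag)

    total = unk_word = unk_tag = 0
    for word, tag in test:
        total += 1
        if word not in tags_seen:  # membership test does not create entries
            unk_word += 1
        elif tag not in tags_seen[word]:  # known word, unseen tag for it
            unk_tag += 1

    if total == 0:
        return 0.0, 0.0
    return unk_word / total, unk_tag / total


if __name__ == "__main__":
    train = [("the", "AT"), ("dog", "NN"), ("runs", "VBZ")]
    test = [("the", "AT"), ("cat", "NN"), ("runs", "NNS"), ("dog", "NN")]
    print(unknown_rates(train, test))
```

On the toy data, "cat" is an unknown word and "runs" tagged NNS is an unknown tag, so each rate is 1 of 4 test tokens. With a tagset of thousands of fine-grained tags, the second rate would plausibly matter much more than it does for small tagsets, since a known word can carry many tags never observed for it in training.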