Abstract
This paper describes a system for extracting named entities. The system is based on a maximum entropy (ME) model and transformation rules. IREX-NE defines eight types of named entities, and each named entity either consists of one or more morphemes or includes a substring of a morpheme. We define 40 named entity labels, which mark a morpheme as the beginning, the middle, or the end of a named entity, and we extract named entities that consist of one or more morphemes by estimating these labels with the ME model. The trained ME model captures the relationship between features and the named entity labels assigned to morphemes. The features are the clues used for estimating labels: the lexical item and part-of-speech of the target morpheme, and the lexical items and parts-of-speech of four surrounding morphemes, two to its left and two to its right. After estimating the named entity labels with the ME model, we use transformation rules to extract named entities that include a substring of a morpheme. These rules are acquired automatically by comparing the named entity labels in a tagged corpus with those our system assigns to the same corpus with the tags removed. Through several comparative experiments, this paper also evaluates how accuracy depends on the transformation rules, on the features, and on the amount of training data.
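As an illustration of the feature window described above, the following is a minimal sketch, not the system's actual implementation. It assumes each morpheme is given as a (lexical item, part-of-speech) pair; the function name `window_features`, the feature key format, and the `<PAD>` placeholder for positions outside the sentence are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# A morpheme is represented here as (lexical item, part-of-speech).
Morpheme = Tuple[str, str]

# Hypothetical placeholder for positions before or after the sentence.
PAD: Morpheme = ("<PAD>", "<PAD>")


def window_features(morphemes: List[Morpheme], i: int, width: int = 2) -> Dict[str, str]:
    """Collect lexical and part-of-speech features for the target morpheme
    at index i and for `width` morphemes on each side of it."""
    features: Dict[str, str] = {}
    for offset in range(-width, width + 1):
        j = i + offset
        # Fall back to the padding morpheme when the window leaves the sentence.
        lex, pos = morphemes[j] if 0 <= j < len(morphemes) else PAD
        features[f"lex[{offset:+d}]"] = lex
        features[f"pos[{offset:+d}]"] = pos
    return features


# Toy input with made-up part-of-speech tags (an assumption for illustration).
sentence: List[Morpheme] = [("Tokyo", "noun"), ("de", "particle"), ("kaigi", "noun")]
features = window_features(sentence, 0)
```

With the default width of 2, each target morpheme yields ten features: a lexical item and a part-of-speech for the target and for each of its four neighbors. A classifier such as an ME model would then estimate one named entity label per morpheme from these features.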