Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Paper
Learning Pairwise Classifiers Using Combinations of Features from Different Examples
Satoshi Oyama, Christopher D. Manning

2005, Vol. 20, No. 2, pp. 105-116

Abstract

We propose a kernel method for using combinations of features across example pairs in learning pairwise classifiers. Pairwise classifiers, which decide whether two examples belong to the same class, are important components in duplicate detection, entity matching, and other clustering applications. Existing methods for learning pairwise classifiers from labeled training data are based on string edit distance or on the features two examples have in common. However, when two examples from the same class share few features, these methods have difficulty finding such pairs and achieving high recall. A typical case is checking whether two abbreviated author names in different citations refer to the same person. Because the similarity between such same-class examples is close to zero, classifiers fail to distinguish positive pairs from negative pairs. One way to avoid the problem of zero similarity is to use conjunctions of different features across examples, but implementing this idea straightforwardly makes the computational cost prohibitive for practical problems. By using a kernel on pair instances, our method can use feature conjunctions across examples without explicitly computing the feature mapping, which is computationally expensive. The kernel is a tensor product of two inner products on the original feature space. Its corresponding feature mapping generates conjunctions only of features from the two different examples, whereas the feature mapping of the conventional polynomial kernel also generates conjunctions of features from the same example, which are irrelevant to pairwise classification and degrade accuracy. Our experiments on the author matching problem show that this method can achieve a precision 4 to 8 times higher than that of previous methods at medium recall levels.
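The following is a minimal sketch of a pair kernel of the kind the abstract describes, assuming binary bag-of-features vectors. The vocabulary, feature names, and functions (`encode`, `pair_kernel`, `cross_conjunctions`, `poly2_kernel`) are hypothetical illustrations, not the paper's implementation or data. The kernel is taken to be K((x1, x2), (y1, y2)) = <x1, y1> <x2, y2>. This equals the inner product of the outer products x1 ⊗ x2 and y1 ⊗ y2, so its implicit features are exactly the cross-example conjunctions x1[i] * x2[j]. The sketch checks that identity and shows that K can be nonzero for a test pair whose two members share no features. It also contrasts K with a degree-2 polynomial kernel on the concatenated pair.

```python
from itertools import product

# Hypothetical binary feature vocabulary (coauthor names and title words of a citation).
VOCAB = ["coauthor:A. Lee", "coauthor:B. Kim",
         "word:kernel", "word:clustering", "word:svm", "word:matching"]

def encode(features):
    """Binary bag-of-features vector over VOCAB."""
    return [1 if f in features else 0 for f in VOCAB]

def dot(u, v):
    """Inner product on the original feature space."""
    return sum(a * b for a, b in zip(u, v))

def pair_kernel(p, q):
    """Tensor product of two inner products: K((x1, x2), (y1, y2)) = <x1, y1> * <x2, y2>."""
    (x1, x2), (y1, y2) = p, q
    return dot(x1, y1) * dot(x2, y2)

def cross_conjunctions(p):
    """Explicit feature map of pair_kernel: every product x1[i] * x2[j],
    i.e. conjunctions pairing a feature of one example with a feature of the other."""
    x1, x2 = p
    return [a * b for a, b in product(x1, x2)]

def poly2_kernel(p, q):
    """Degree-2 polynomial kernel on the concatenated pair vector (list concatenation).
    Its feature map also contains same-example conjunctions x1[i]*x1[j] and x2[i]*x2[j]."""
    (x1, x2), (y1, y2) = p, q
    return dot(x1 + x2, y1 + y2) ** 2

# Two citations of the same abbreviated author with no features in common (hypothetical).
test_pair = (encode({"coauthor:A. Lee", "word:kernel"}),
             encode({"coauthor:B. Kim", "word:clustering"}))
# A labeled positive training pair with the same cross-example pattern (hypothetical).
train_pair = (encode({"coauthor:A. Lee", "word:svm"}),
              encode({"coauthor:B. Kim", "word:matching"}))

print(dot(*test_pair))                                                      # → 0
print(pair_kernel(test_pair, train_pair))                                   # → 1
print(dot(cross_conjunctions(test_pair), cross_conjunctions(train_pair)))   # → 1
print(poly2_kernel(test_pair, train_pair))                                  # → 4
```

The direct similarity within the test pair is 0, yet the pair kernel still relates the test pair to the training pair through the conjunction (A. Lee, B. Kim). Expanding the polynomial kernel gives (<x1, y1> + <x2, y2>)^2 = <x1, y1>^2 + <x2, y2>^2 + 2 <x1, y1> <x2, y2>. The squared terms come from same-example conjunctions, which the abstract identifies as irrelevant to pairwise classification. This sketch ignores pair order. It does not cover how the original method treats (x1, x2) versus (x2, x1).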

© 2005 JSAI (The Japanese Society for Artificial Intelligence)