We propose a method for acquiring, from a single corpus, knowledge of correspondences between abbreviations and their original words. It improves on our previous method by attaining higher precision at the same recall. This knowledge is useful for tasks such as information retrieval, word sense disambiguation, and summarization. Our method searches for “abbreviation candidates” and for “original word candidates” corresponding to them, using information about the characters that compose each candidate. Then, to decide whether an abbreviation and an original word correspond, it calculates the similarity between the abbreviation candidate and the original word candidate using statistical information from the single corpus. For example, our method extracts the correspondence between the abbreviation “gempatsu” (a nuclear power station) and the original word “genshiryoku hatsudensho” (a nuclear power station). Note that our method does not assume it is given information on whether each noun in the corpus is an abbreviation or an original word. Experimental results show that our method is promising, attaining a precision of 73.4%. We also compare our method with our previous method, and the experimental results suggest that it extracts correspondences between abbreviations and original words more appropriately than the previous method.
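The abstract does not specify which character information or which statistics the method uses, so the following is only a minimal sketch under two stated assumptions. First, a pair is treated as a candidate when the abbreviation's characters appear in the same order inside the original word (as 原発, gempatsu, does inside 原子力発電所, genshiryoku hatsudensho). Second, the "statistical information" is stood in for by cosine similarity of co-occurrence context counts. Both choices, and all function names (`is_char_subsequence`, `context_vector`, `rank_pairs`), are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from math import sqrt


def is_char_subsequence(abbrev: str, original: str) -> bool:
    """ASSUMPTION: a candidate pair requires every character of the
    abbreviation to occur in the original word in the same order."""
    it = iter(original)
    # `ch in it` advances the iterator, so order is enforced.
    return all(ch in it for ch in abbrev)


def context_vector(word: str, sentences: list[list[str]], window: int = 2) -> Counter:
    """Count tokens within `window` positions of each occurrence of `word`."""
    ctx: Counter = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        ctx[tokens[j]] += 1
    return ctx


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_pairs(abbrevs, originals, sentences, threshold=0.0):
    """ASSUMPTION: score each character-compatible pair by context similarity
    and return (abbreviation, original, score) tuples, best first."""
    pairs = []
    for a in abbrevs:
        va = context_vector(a, sentences)
        for o in originals:
            if len(a) < len(o) and is_char_subsequence(a, o):
                score = cosine(va, context_vector(o, sentences))
                if score > threshold:
                    pairs.append((a, o, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Toy, pre-tokenized corpus (hypothetical data, for illustration only).
    corpus = [
        ["原発", "の", "再稼働"],
        ["原子力発電所", "の", "再稼働"],
    ]
    print(rank_pairs(["原発"], ["原子力発電所", "発電所"], corpus))
```

Here 発電所 is never scored because 原 does not occur in it, while 原子力発電所 passes the character filter and shares its contexts with 原発. A real system would also need a segmenter to produce the noun candidates, which this sketch takes as given.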