The nearest neighbor algorithm is one of the most basic classes of techniques in the field of pattern classification. It is also a basic and important technique in fields such as case-based reasoning (CBR), memory-based reasoning (MBR), and example-based natural language processing (EBNLP). In the nearest neighbor algorithm, the computational cost of example retrieval is one of the most important issues, especially when the number of examples in the database becomes large. In the field of pattern classification, several techniques exist for reducing the computational cost of the nearest neighbor algorithm, whereas in other fields such as CBR, MBR, and EBNLP, there has been no technique except for the one that uses massively parallel computers. This paper proposes a novel technique for optimizing the nearest neighbor algorithm, especially for use in CBR, MBR, and EBNLP. The basic idea is to use a similarity calculation template, a data structure that enumerates all the possible patterns of calculating similarity between two examples. In this method, the nearest neighbor retrieval process is optimized by generating retrieval queries from an input and the similarity calculation templates in a certain order. Its major advantages are as follows: 1) it can be implemented without any expensive hardware such as massively parallel computers, and 2) it is easy to add new examples to the example database. Experimental results show that nearly constant-time nearest neighbor retrieval is achieved, independently of the number of examples in the database.
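
The abstract does not spell out the form of a similarity calculation template or the order in which queries are generated, so the following is only a minimal sketch of one possible reading, not the paper's implementation. It assumes that examples are tuples of categorical features, that similarity is a sum of positive per-feature weights over matching features, that a template is a subset of feature positions required to match exactly, and that templates are tried in descending order of score against hash tables. The names `TemplateIndex`, `add`, and `nearest` are hypothetical.

```python
from itertools import combinations


class TemplateIndex:
    """Hypothetical sketch: one hash table per template (subset of features)."""

    def __init__(self, weights):
        # Assumption: all weights are positive, so a superset of matching
        # features always scores strictly higher than any of its subsets.
        self.weights = weights
        n = len(weights)
        masks = [m for r in range(n + 1) for m in combinations(range(n), r)]
        # Templates ordered from highest to lowest similarity score.
        self.templates = sorted(masks, key=lambda m: -self._score(m))
        self.tables = {m: {} for m in self.templates}

    def _score(self, mask):
        return sum(self.weights[i] for i in mask)

    def add(self, example, label):
        # Adding an example only touches one bucket per template.
        for m in self.templates:
            key = tuple(example[i] for i in m)
            self.tables[m].setdefault(key, []).append((example, label))

    def nearest(self, query):
        # Generate one exact-match query per template, best score first;
        # the first non-empty bucket holds the most similar examples.
        for m in self.templates:
            key = tuple(query[i] for i in m)
            hits = self.tables[m].get(key)
            if hits:
                return self._score(m), hits
        return None  # only reached when the database is empty


index = TemplateIndex(weights=[3, 2, 1])
index.add(("a", "b", "c"), "X")
index.add(("a", "y", "z"), "Y")
score, hits = index.nearest(("a", "b", "q"))
```

In this sketch the number of lookups per retrieval is bounded by the number of templates rather than by the number of stored examples, which is one way the near-constant retrieval time reported above could arise. The price is memory: each example is stored once per template, that is, 2^n times for n features.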