As with the Internet, scientific and technological communication is accelerating rapidly in the 21st century. Among the many papers published in scientific journals, researchers must find the relevant ones, using the search engines of journal databases as well as other techniques that enable "pinpoint" searching. We developed such a pinpoint search system, named "Defect dat@base" (http://www.kc.tsukuba.ac.jp/div-media/defect/), based on social bookmarking technology. This database covers the specialized research area of "Defects in Semiconductors and Semiconductor Devices", and collects and precisely classifies important papers in this area in cooperation with specialist members. So that the collection can be extended and maintained by computers as well as by human specialists, we studied statistical and morphological algorithms. As a result, we learned how to select important papers as accurately as human specialists do. Using over 16,000 physics and engineering papers and over 77,000 e-mail texts, we carried out a large-scale comparative study of the differences between human and computer selection, and reached the following conclusion: scientific papers are not as easy to select as other types of text, and better selection requires focusing on the technical terms of the relevant area as well as on the abstracts and title words of the relevant papers.
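The conclusion above, that selection should rely on technical terms together with title words and abstracts, can be illustrated with a minimal sketch. This is not the algorithm used in the study: the term list, the function names (`count_terms`, `score_paper`, `rank_papers`) and the title weight are all hypothetical, chosen only to show the idea of scoring a paper by how often area-specific terms appear in its title and abstract.

```python
import re

# Hypothetical term list for illustration; not taken from the Defect dat@base.
TECHNICAL_TERMS = {"dislocation", "vacancy", "deep level", "point defect"}


def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term in text."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    )


def score_paper(title, abstract, terms=TECHNICAL_TERMS, title_weight=3.0):
    """Score a paper; title hits count more than abstract hits (assumed weight)."""
    return title_weight * count_terms(title, terms) + count_terms(abstract, terms)


def rank_papers(papers, terms=TECHNICAL_TERMS):
    """Return papers sorted from highest to lowest score."""
    return sorted(
        papers,
        key=lambda p: score_paper(p["title"], p["abstract"], terms),
        reverse=True,
    )
```

Calling `rank_papers` on a list of dictionaries with `title` and `abstract` keys returns the same dictionaries ordered by score. Weighting title words more heavily than abstract words is an assumption made here for illustration, not a result reported in the abstract.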