JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
Volume 2009 , Issue SWO-020
The 20th SIG-SWO
  • Atsushi FUJII, Akihiko SANJOUBA
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 01-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    Reflecting the rapid growth of science, technology, and culture, it has become common practice to look up terms using tools on the World Wide Web. Existing search engines provide an enormous volume of information, but the retrieved information is not organized. Hand-compiled encyclopedias provide organized information, but the quantity of information is limited. To combine the advantages of both tools, we have been proposing methods for encyclopedic search targeting information on the Web and patent information. In this paper, we propose a method to categorize multiple expository texts for a single term based on viewpoints. Because the viewpoints required for an explanation differ depending on the type of term, such as animals and diseases, it is difficult to produce a large-scale system manually. We use Wikipedia to extract a prototype of a viewpoint structure for each term type. We also use Wikipedia articles to train a machine learning method that categorizes a given text into an appropriate viewpoint. We evaluate the effectiveness of our method experimentally.

  • Tatsuya MORI, Hidetaka MASUDA, Yoji KIYOTA, Hiroshi NAKAGAWA
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 02-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS
  • Takuya TEJIMA, Shinya SAKURAI, Takeshi MORITA, Noriaki IZUMI, Takahira ...
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 03-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS
  • Takafumi NAKANISHI, Koji ZETTSU, Yutaka KIDAWARA, Yasushi KIYOKI
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 04-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    This paper presents an interconnection method for heterogeneous knowledge bases that utilizes relation information extracted from Wikipedia. Recently, a growing number of users employ search engines not only to retrieve Web pages but also to understand or learn an arbitrary concept. It is difficult to understand and learn an arbitrary concept using most current search engines. To understand or learn an arbitrary concept thoroughly, it is necessary to be able to easily determine the various relationships between heterogeneous knowledge bases. We consider the use of Wikipedia resources to be one method for interconnecting knowledge bases in heterogeneous fields.

  • Masahiro ITO, Kotaro NAKAYAMA, Takahiro HARA, Shojiro NISHIO
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 05-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    Wikipedia, a huge-scale Web-based encyclopedia, attracts great attention as an invaluable corpus for knowledge extraction because it has various impressive characteristics, such as a huge number of articles, live updates, a dense link structure, brief anchor texts, and URL identification for concepts. We have already shown that Wikipedia can be used to construct a huge-scale, accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts, and its accuracy has been confirmed in detailed experiments. In this paper, we introduce our project for constructing a high-quality association thesaurus from Wikipedia.

  • Yuka YAMAZAKI, Takaichi ITO, Takashi IBA, Kenji KUMASAKA
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 06-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    The purpose of this study is to show a law of the growth of Japanese Wikipedia by analyzing its rules and diversity. For this purpose, we analyze the frequency distributions of hyperlinks. First, for Wikipedia's data as a whole, the frequency distribution of hyperlinks is the same at every point in time, year after year. This means that Wikipedia keeps the rule of the distribution even though its number of articles grows dramatically. Second, for the data of each category, the hyperlink distributions vary. This means that Wikipedia's growth is diverse across categories, even though it keeps the rule of the distribution as a whole. In conclusion, the coexistence of the rule of the distribution over the whole data and the diversity of the distributions of individual categories can be regarded as a law of Wikipedia's growth. This study suggests a new viewpoint on the construction of knowledge on digital media through the analysis of Wikipedia as a growing encyclopedia.

  • Satoshi ITOH, Takaichi ITO, Kenji KUMASAKA, Takashi IBA
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 07-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    The purpose of this paper is to clarify the creation process in mass collaboration. To this end, we analyzed how editors wrote articles in Japanese Wikipedia from 2002 to the end of June 2008, using several data logs of "Featured Articles". First, we classified editors into three types: "originator", "writer", and "corrector". As a result, we found that most editors work as correctors. Second, we visualized the interactions among editors as networks and classified them into four types. The results imply that Japanese Wikipedia has a variety of editors' roles in common and diversity in the interaction structure between editors.

  • Yohei NODA, Yoji KIYOTA, Hiroshi NAKAGAWA
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 08-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    Articles in Wikipedia are classified from many standpoints by Wikipedia's category system. Using this property, we can detect unexpected information in Wikipedia articles. For example, the article "Taro Aso" belongs not only to the category "Prime Ministers of Japan" but also to the category "Olympic shooters of Japan". In this study, we focus on such relations between categories and statistically process the graph network of Wikipedia categories. Finally, we detect unexpected information using the results of the statistical processing.

  • Yu SUZUKI, Keitaku KANEMOTO, Kyoji KAWAGOE
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 09-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    In this paper, we propose an editor-based credibility calculation method for articles in Wikipedia. Wikipedia is an encyclopedia that can be edited by anyone who accesses it. Therefore, when an author adds an uncertain or non-credible description to an article, the edited article becomes non-credible. We assume that the credibility of an article is based on the credibility of its authors. This means that non-credible authors frequently write non-credible descriptions, whereas credible authors frequently write credible descriptions. We also assume that non-credible articles are quickly edited by other, credible authors who correct the descriptions. The ratio of an author's text that remains in articles should then depend on the credibility of the author. In our experimental evaluation, we confirmed that our proposed method achieves better accuracy.

  • Jian CHEN, Roman Y. SHTYKH, Qun JIN
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 10-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    In this study, we present an information recommendation model based on a set of concept classes that are extracted from Wikipedia categories and pages. The indices of all the pages are organized so that they represent concepts. Using this information, data representing the users' access behavior are collected and categorized according to the concept classes. The proposed model is then established by analyzing the preprocessed data in terms of short, medium, and long periods and by calculating the probabilities corresponding to each concept.

  • Kazuki TANAKA, Noboru SUGAMURA
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 11-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS
  • Mariko KAWABA, Hiroyuki NAKASAKI, Takehito UTSURO, Tomohiro FUKUHARA
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 12-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    This paper studies how to estimate the distribution of topics in the Japanese blogosphere, where about 300,000 Wikipedia entries are used to represent a hierarchy of topics. First, to estimate whether there exists at least one blog feed closely related to a given topic, we use the number of hits of the topic keyword in the blogosphere. We empirically examine the range of the number of hits and conclude that the range should be 10,000 to 500,000. According to our manual evaluation of this range, about 70% of Wikipedia entries can be linked to at least one blog feed, which partially justifies our claim. Then, we apply SVMs to the task of judging whether, given a topic, each blog feed is closely related to that topic. Based on the learned SVM model, we further automatically judge whether there exists at least one blog feed closely related to a given topic. Finally, we study how to discover Wikipedia categories with Wikipedia entries, where more than 30 to 40% of them can be linked to blog feeds closely related to the corresponding topic.

  • Kentaro HORI, Tetsuya OISHI, Tsunenori MINE, Ryuzo HASEGAWA, Hiroshi F ...
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 13-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    This paper proposes a web retrieval system with extended queries generated from the contents of Wikipedia. By using the extended queries, we aim to support users' retrieval and knowledge acquisition. To extract extended query items, we place importance on hyperlinks in Wikipedia in addition to the related word extraction algorithm. We evaluated the system through experimental use by several participants and questionnaires given to them. Experimental results show that our system works well for users' retrieval and knowledge acquisition.

  • Koji EGUCHI, Hitohiro SHIOZAKI
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 14-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    Very recently, topic model-based retrieval methods have produced good results using the Latent Dirichlet Allocation (LDA) model or its variants in the language modeling framework. However, for the task of retrieving annotated documents, LDA-based methods cannot directly make use of the multiple attribute types that are specified by the annotations. In this paper, we explore a new retrieval method using a multitype topic model that can directly handle multiple word types, such as annotated entities, category labels, and other words that are typically used in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from a type-annotated collection, and then show through experiments on the task of entity ranking using a Wikipedia collection that our proposed method significantly outperforms several state-of-the-art methods.

  • [in Japanese], [in Japanese], [in Japanese], [in Japanese]
    Type: SIG paper
    2009 Volume 2009 Issue SWO-020 Pages 15-
    Published: January 22, 2009
    Released: September 17, 2021
    RESEARCH REPORT / TECHNICAL REPORT OPEN ACCESS

    Interlanguage links (ILLs) among Wikipedias are an important multilingual resource. In this paper, we describe (1) the results of an analysis of ILLs among the Chinese, Japanese, Korean, and English (CJKE) Wikipedias, (2) the results of evaluating ILLs using traditional dictionaries, (3) a hierarchical analysis of the category links of Wikipedias, and (4) our cross-lingual keyword navigation system using ILLs.
