JSAI Technical Report, Type 2 SIG

Modeling Term Descriptions Using Wikipedia and its Application to Encyclopedic Search

Atsushi FUJII, Akihiko SANJOUBA

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 01-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_01

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

Reflecting the rapid growth of science, technology, and culture, it has become common practice to look up various terms using tools on the World Wide Web. Existing search engines provide an enormous volume of information, but the retrieved information is not organized. Hand-compiled encyclopedias provide organized information, but the quantity of information is limited. To integrate the advantages of both tools, we have been proposing methods for encyclopedic search targeting information on the Web and patent information. In this paper, we propose a method to categorize multiple expository texts for a single term based on viewpoints. Because the viewpoints required for an explanation differ depending on the type of term, such as animals and diseases, it is difficult to produce a large-scale system manually. We use Wikipedia to extract a prototype of a viewpoint structure for each term type. We also use Wikipedia articles for a machine learning method that categorizes a given text into an appropriate viewpoint. We evaluate the effectiveness of our method experimentally.

Wik-IE: A tool to extract structure of Wikipedia entries

Tatsuya MORI, Hidetaka MASUDA, Yoji KIYOTA, Hiroshi NAKAGAWA

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 02-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_02

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

[in Japanese]

A Domain Ontology Development Environment with Wikipedia and Folksonomy Tags

Takuya TEJIMA, Shinya SAKURAI, Takeshi MORITA, Noriaki IZUMI, Takahira ...

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 03-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_03

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

[in Japanese]

An Interconnection Method for Heterogeneous Knowledge Bases by Utilizing Wikipedia

Takafumi NAKANISHI, Koji ZETTSU, Yutaka KIDAWARA, Yasushi KIYOKI

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 04-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_04

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

This paper presents an interconnection method for heterogeneous knowledge bases that utilizes relation information extracted from Wikipedia. Recently, the number of users who employ search engines not only to retrieve Web pages but also to understand or learn an arbitrary concept has been increasing. However, it is difficult to understand and learn an arbitrary concept using most current search engines. To understand or learn an arbitrary concept thoroughly, it is necessary to be able to determine easily the various relationships between heterogeneous knowledge bases. We consider the use of Wikipedia resources to be one method for interconnecting knowledge bases in heterogeneous fields.

Our Association Thesaurus Construction Project from Wikipedia

Masahiro ITO, Kotaro NAKAYAMA, Takahiro HARA, Shojiro NISHIO

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 05-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_05

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

Wikipedia, a huge-scale Web-based encyclopedia, attracts great attention as an invaluable corpus for knowledge extraction because of its impressive characteristics, such as a huge number of articles, live updates, a dense link structure, brief anchor texts, and URL identification for concepts. We have already shown that Wikipedia can be used to construct a huge-scale, accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts, and its accuracy has been demonstrated in detailed experiments. In this paper, we introduce our project for constructing a high-quality association thesaurus from Wikipedia.

Rules and Diversity of Wikipedia's Growth

Yuka YAMAZAKI, Takaichi ITO, Takashi IBA, Kenji KUMASAKA

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 06-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_06

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

The purpose of this study is to identify a law governing the growth of Japanese Wikipedia by analyzing its rules and diversity. To this end, we analyze the frequency distributions of hyperlinks. First, for Wikipedia's data as a whole, the frequency distribution of hyperlinks is the same at any point in time in every year. This means that Wikipedia maintains the rule of the distribution even though its number of articles grows dramatically. Second, for the data of individual categories, the hyperlink distributions vary. This means that Wikipedia's growth is diverse across categories even though the distribution rule holds for the whole. In conclusion, the coexistence of the rule in the distribution of the whole data and the diversity in the distributions of individual categories can be regarded as a law of Wikipedia's growth. Through the analysis of Wikipedia as a growing encyclopedia, this study suggests a new viewpoint on the construction of knowledge in digital media.

A Study on Creation Process in Mass Collaboration

Satoshi ITOH, Takaichi ITO, Kenji KUMASAKA, Takashi IBA

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 07-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_07

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

The purpose of this paper is to clarify the creation process in mass collaboration. To this end, we analyzed how editors wrote articles in Japanese Wikipedia from 2002 to the end of June 2008, using several data logs of "Featured Articles". First, we classified editors into three types: "originator", "writer", and "corrector". As a result, we found that most editors work as correctors. Second, we visualized the interactions among editors as networks and classified them into four types. The results imply that Japanese Wikipedia commonly exhibits a variety of editor roles and a diversity of interaction structures among editors.

The Analysis of Wikipedia categories for detecting unexpected knowledge

Yohei NODA, Yoji KIYOTA, Hiroshi NAKAGAWA

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 08-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_08

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

Articles in Wikipedia are classified from many standpoints by Wikipedia's category system. Using this property, we can detect unexpected information in Wikipedia articles. For example, the article "Taro Aso" belongs not only to the category "Prime Ministers of Japan" but also to the category "Olympic shooters of Japan". In this study, we focus on such relations between categories and statistically process the graph network of Wikipedia categories. Finally, we detect unexpected information using the results of the statistical processing.

A Reliability Measure of Articles in Wikipedia based on Edit History

Yu SUZUKI, Keitaku KANEMOTO, Kyoji KAWAGOE

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 09-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_09

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

In this paper, we propose an editor-based credibility calculation method for articles in Wikipedia. Wikipedia is an encyclopedia that can be edited by anyone who accesses it. Therefore, when an author adds an uncertain or non-credible description to an article, the edited article becomes non-credible. We assume that the credibility of an article is based on the credibility of its authors. This means that non-credible authors frequently write non-credible descriptions, whereas credible authors frequently write credible descriptions. We also assume that non-credible articles are promptly edited by other, credible authors to correct the descriptions. The ratio of an author's text that remains in articles should therefore depend on the credibility of the author. In our experimental evaluation, we confirmed that our proposed method achieves better accuracy.

An Information Recommendation Model Based on Concept Classes Extracted from Wikipedia Categories

Jian CHEN, Roman Y. SHTYKH, Qun JIN

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 10-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_10

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

In this study, we present an information recommendation model based on a set of concept classes extracted from Wikipedia categories and pages. The indices of all the pages are organized so that they represent concepts. Using this information, data representing users' access behavior are collected and categorized according to the concept classes. The proposed model is then established by analyzing the preprocessed data over short, medium, and long periods and calculating the probabilities corresponding to each concept.

Construction and Evaluation of Language Model for Speech Recognition using Wikipedia

Kazuki TANAKA, Noboru SUGAMURA

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 11-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_11

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

[in Japanese]

Estimating Topic Distribution of Japanese Blogosphere based on Wikipedia Topic Hierarchy

Mariko KAWABA, Hiroyuki NAKASAKI, Takehito UTSURO, Tomohiro FUKUHARA

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 12-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_12

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

This paper studies how to estimate the distribution of topics in the Japanese blogosphere, where about 300,000 Wikipedia entries are used to represent a hierarchy of topics. First, to estimate whether there exists at least one blog feed closely related to a given topic, we use the number of hits of the topic keyword in the blogosphere. We empirically examine the range of the number of hits and conclude that the range should be 10,000 to 500,000. According to our manual evaluation of this range, about 70% of Wikipedia entries can be linked to at least one blog feed, which partially justifies our claim. Then, we apply SVMs to the task of judging, for a given topic, whether each blog feed is closely related to that topic. Based on the learned SVM model, we further automatically judge whether there exists at least one blog feed closely related to a given topic. Finally, we study how to discover Wikipedia categories containing Wikipedia entries of which more than 30 to 40% can be linked to blog feeds closely related to the corresponding topic.

Web Retrieval with Extended Queries Generated from Wikipedia and Its Evaluation

Kentaro HORI, Tetsuya OISHI, Tsunenori MINE, Ryuzo HASEGAWA, Hiroshi F ...

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 13-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_13

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

This paper proposes a web retrieval system with extended queries generated from the contents of Wikipedia. By using the extended queries, we aim to support users' retrieval and knowledge acquisition. To extract extended query items, we place importance on hyperlinks in Wikipedia in addition to the related word extraction algorithm. We evaluated the system through trial use by several participants and questionnaires given to them. The experimental results show that our system works well for users' retrieval and knowledge acquisition.

Wikipedia Retrieval using Multitype Topic Models

Koji EGUCHI, Hitohiro SHIOZAKI

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 14-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_14

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

Very recently, topic model-based retrieval methods have produced good results using the Latent Dirichlet Allocation (LDA) model or its variants in the language modeling framework. However, for the task of retrieving annotated documents, LDA-based methods cannot directly make use of the multiple attribute types specified by the annotations. In this paper, we explore a new retrieval method using a multitype topic model that can directly handle multiple word types, such as annotated entities, category labels, and other words that are typically used in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from a type-annotated collection, and then show through experiments on the task of entity ranking using a Wikipedia collection that our proposed method significantly outperforms several state-of-the-art methods.

Crosslingual Information Access using Wikipedia: Analysis and Application of Interlanguage-links of Wikipedias

[in Japanese], [in Japanese], [in Japanese], [in Japanese]

Article type: SIG paper
2009 Volume 2009 Issue SWO-020 Pages 15-
Published: January 22, 2009
Released on J-STAGE: September 17, 2021

DOI: https://doi.org/10.11517/jsaisigtwo.2009.SWO-020_15

RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

Interlanguage links (ILLs) among Wikipedias are an important multilingual resource. In this paper, we describe (1) the results of an analysis of ILLs among the Chinese, Japanese, Korean, and English (CJKE) Wikipedias, (2) the results of evaluating ILLs using traditional dictionaries, (3) a hierarchical analysis of the category links of Wikipedias, and (4) our cross-lingual keyword navigation system using ILLs.
