Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Volume 23, Issue 5
Special Issue: Information Compilation: foundations and possibilities
Technical Papers
  • Ichiro Ide, Tomoyoshi Kinoshita, Tomokazu Takahashi, Hiroshi Mo, Norio ...
    2008 Volume 23 Issue 5 Pages 282-292
    Published: 2008
    Released on J-STAGE: June 03, 2008
    JOURNAL FREE ACCESS
    The recent increase in digital storage capacity has enabled the creation of large-scale on-line broadcast video archives. To make full use of the data in such an archive, a user must be able to easily grasp which video data are available and what they contain. With this problem in mind, we have been investigating efficient and effective methodologies for retrieving and reusing archived video data. The archive used as a test-bed consists of more than 1,000 hours of news video obtained from a Japanese news program during the past six years. This paper first proposes a news topic tracking and structuring method. A structure called the 'topic thread structure' is organized so that it represents the temporal flow of news topics originating from a specified news story. The paper next introduces a browsing and editing interface that enables the user to browse through news stories along the topic thread structure, and also assists the compilation of selected news stories into a customized video summary or a documentary. The method was applied to the archived news video data in order to observe the quality of the topic thread structure and the usability of the prototype interface. As a result, some structures represented the flow of topics in a way quite close to real-world comprehension. In addition, experiments showed that when the structure could be considered meaningful, the interface combined with the structure could drastically reduce the time needed to browse through the archive for news stories related to the user's interest.
  • Tsuyoshi Murata, Tomoyuki Ikeya
    2008 Volume 23 Issue 5 Pages 293-302
    Published: 2008
    Released on J-STAGE: June 03, 2008
    JOURNAL FREE ACCESS
    Visualizing and analyzing the social interactions of CGM (Consumer Generated Media) are important for understanding overall activities on the internet. Social interactions are often represented as simple networks that are composed of homogeneous nodes and edges between them. However, related entities in the real world are often not homogeneous. Such relations are naturally represented as heterogeneous networks composed of more than one kind of node and the edges connecting them. In the case of CGM, for example, users and their contents constitute the nodes of heterogeneous networks. There are related users (user communities) and related contents (contents communities) in the heterogeneous networks. Discovering both kinds of communities and finding the correspondence among them will clarify the characteristics of the communities. This paper describes an attempt at visualizing and analyzing the social interactions of Yahoo! Chiebukuro (Japanese Yahoo! Answers). New criteria for measuring the correspondence between user communities and board communities are defined, and the characteristics of both kinds of communities are analyzed using the criteria.
  • Tsunenori ISHIOKA
    2008 Volume 23 Issue 5 Pages 303-309
    Published: 2008
    Released on J-STAGE: June 03, 2008
    JOURNAL FREE ACCESS
    To more accurately assess the logical structure of Japanese essays, I have devised a technique that uses end-of-sentence modality and demonstrative pronouns referencing earlier paragraphs as new indicators of structure, in addition to conjunctive expressions, which have hitherto often been used for Japanese as well as for European languages. It is hoped that this will yield better results because conjunctive expressions are intentionally avoided in Japanese. I applied this technique to the editorial and commentary (Yoroku) columns of the Mainichi Shimbun newspaper and used it to represent the structure and development of the arguments made by these articles in the form of constellation diagrams, which are used in the field of statistics. As a result, I found that this graph is useful in that it enables the overall distribution to be ascertained, and allows the temporal changes in the logical structure of the data in question to be ascertained.
  • Tatsunori MORI, Atsushi FUJIOKA, Ichiro MURATA
    2008 Volume 23 Issue 5 Pages 310-318
    Published: 2008
    Released on J-STAGE: June 03, 2008
    JOURNAL FREE ACCESS
    In order to summarize trend information in documents and visualize it, we need a method to automatically extract statistical information from documents. In this paper, we investigate the automated extraction of statistical information, in particular expressions that name statistical information. First, we classify those expressions into three categories, namely, the action type, the attribute type, and the definition type. Second, we examine their internal structures. According to the internal structures, we define an XML tag set to annotate each part of the names of statistical information. As a feasibility study of their automated extraction, we conducted an experiment in which parts of the names of statistics are extracted by using a standard chunking algorithm. The experimental result shows that the parts of the names of statistics defined by the tag set can be extracted with good accuracy when we can prepare a training corpus from a domain similar to that of the target documents. On the other hand, the extraction accuracy degrades when we cannot prepare such a training corpus.
  • Ken-ichi Fukui, Kazumi Saito, Masahiro Kimura, Masayuki Numao
    2008 Volume 23 Issue 5 Pages 319-329
    Published: 2008
    Released on J-STAGE: June 03, 2008
    JOURNAL FREE ACCESS
    We have been developing a neural network-based approach for visual information compilation. We have extended the Self-Organizing Map (SOM) model by introducing a sequencing weight function into the neuron topology; the resulting model is called Sequence-based SOM (SbSOM). SbSOM visualizes the dynamics of various clusters such as their generation or extinction, convergence or divergence, and merging or division. By utilizing the neuron topology and the neighborhood function of SOM, SbSOM can mitigate the problems associated with the conventional sliding-window method. We clarified a target problem class of SbSOM and confirmed the basic properties of the proposed method using a two-dimensional simulated sequential dataset. Moreover, our experiment using a dataset of real-world news articles indicates that topic transition can indeed be seen from the acquired map. Visualization of sequential changes in clusters aids in the comprehension of such phenomena, which is useful in various domains such as fault diagnosis and medical check-up, among others.
  • Kazuki Takegawa, Yoshinori Hijikata, Shogo Nishida
    2008 Volume 23 Issue 5 Pages 330-343
    Published: 2008
    Released on J-STAGE: June 03, 2008
    JOURNAL FREE ACCESS
    Recently, the volume of music data on the Internet has increased rapidly. This has increased the cost to users of finding music data that suits their preferences in such a large data set. We propose a content-based music search and recommendation system. This system has an interface for searching and finding music data and an interface for editing a user profile, which is necessary for music recommendation. By exploiting the visualization of the feature space of music and the visualization of the user profile, the user can search music data and edit the user profile. Furthermore, by exploiting the information which can be acquired from each visualized object in a mutually complementary manner, we make it easier for the user to search music data and edit the user profile. Concretely, the system gives the user information obtained from the user profile when searching music data, and information obtained from the feature space of music when editing the user profile.
Regular
Technical Papers
  • Shin Ando, Einoshin Suzuki
    2008 Volume 23 Issue 5 Pages 344-354
    Published: 2008
    Released on J-STAGE: July 03, 2008
    JOURNAL FREE ACCESS
    Identifying atypical objects is one of the classic tasks in machine learning. Recent work, e.g., One-class Clustering and Minority Detection, has explored the task further to identify clusters of atypical objects which contrast strongly with the rest of the dataset. In such problems, avoiding false positive detection is an important yet significantly difficult issue. In this paper, we propose an information theoretic clustering which aims to compactly represent the global and local structures of the dataset and identify atypical clusters in terms of information geometric distance. The former objective contributes to reducing the number of false positive detections. Its formalization further yields a unifying view of classic outlier detection and the novel tasks. We present a scalable algorithm for detecting multiple clusters of atypical objects without a pre-defined number of clusters. The algorithm is evaluated as an unsupervised two-class classification using simulated datasets and a text classification benchmark.
  • Tadahiko Kumamoto, Katsumi Tanaka
    2008 Volume 23 Issue 5 Pages 355-363
    Published: 2008
    Released on J-STAGE: July 15, 2008
    JOURNAL FREE ACCESS
    This paper proposes a Web retrieval system that accurately and exhaustively collects web pages related to a user-specified topic from the Web. When a user enters a character string as a query into our proposed system, the system lexically paraphrases and expands the character string. Consequently, the system can present more topic-related web pages than conventional search engines do. First, our proposed system extracts nouns, adjectives, verbs, and katakana characters as target words from the query or character string entered by the user, obtains candidate words for paraphrasing the target words based on information retrieval on the Web, and tests the validity of the paraphrases using two kinds of co-occurrence dictionaries. Then, the system expands the initial query by replacing zero or more of the target words with the candidate words that were determined to be valid. A distinctive point of the system is that, for the validity test, it uses not only a co-occurrence dictionary that describes "preceding," "following," and "predicate" relationships between words but also an impression dictionary that describes co-occurrence relationships between words and two contrasting sets of impression words. We also evaluated the performance of the proposed system on paraphrasing and on information retrieval on the Web using seven sample queries. The results confirmed its effectiveness.
  • Norihito YASUDA, Hiroyuki TODA
    2008 Volume 23 Issue 5 Pages 364-373
    Published: 2008
    Released on J-STAGE: July 18, 2008
    JOURNAL FREE ACCESS
    Geographic information retrieval (GIR) is a new research area that aims at the retrieval of geographically related documents based not only on keyword relevance but also on geographic relationships between the query and the geographic information in texts. It is natural for people to want information related to just their surroundings. Conventional GIR systems, however, have relatively poor granularity, such as city or province, because they use geographic information in restricted ways, mostly just for filtering. To address this problem, we propose a geographic scoring method that considers the extent implied by each geographic name appearing in a text, so as to emphasize geographic names that focus on specific areas rather than broad geographic names. Furthermore, to improve robustness against errors in pre-processing such as geo-parsing and geo-coding, we also propose a noise elimination method based on clustering. Evaluation is conducted using standard TREC-style evaluation metrics including MAP, R-precision, and so on. The results show that our method outperforms two baseline approaches: full-text search and using the nearest point in the text.
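For readers unfamiliar with the TREC-style metrics named in the last abstract, the following is a minimal sketch of how MAP and R-precision are commonly computed. It is not taken from the paper: the function names, the document identifiers, and the assumption that each ranked list contains no duplicate documents are all illustrative.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision for one query.

    Sums precision@k at every rank k that holds a relevant document,
    then divides by the total number of relevant documents.
    Assumes `ranked` contains no duplicates (illustrative assumption).
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def r_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Precision within the top R results, where R = number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """MAP: the mean of per-query average precision over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy data: two queries with made-up document ids.
query_1 = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
query_2 = (["a", "b"], {"b"})
map_score = mean_average_precision([query_1, query_2])
```

Both metrics reward placing relevant documents early in the ranking, which is why they suit a comparison between a geographic scoring method and baselines that return the same documents in a different order.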