JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
Volume 2010, Issue DMSM-A903
The 12th SIG-DMSM
  • Kenta SUZUKI, Rei HAMAKAWA
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 01-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    This paper proposes a method for recommending Web content (novels, comics) that matches a user's taste, based on the similarity of reviews, to users who have evaluated Web content. In recent years, many studies have recommended Web content to users by acquiring their tastes. These studies present personalized information so that the recommended content matches the tastes of various users. Our method infers the user's taste from the content reviews obtained from that user. Beforehand, the sentences of content reviews are classified into "sentences related to the content" and "sentences expressing the impression of the reviewer" and accumulated in the system. The method then recommends content that resembles the user's taste by comparing "the content reviews accumulated in the system" with "the content reviews obtained from the user" for each classification.

    Download PDF (620K)
  • Tsukasa ISHIGAKI, Takeshi TAKENAKA, Yoichi MOTOMURA
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 02-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    In this report, we describe a method for discovering knowledge from a probabilistic structure model constructed by large-scale data fusion concerning buying behavior in daily life. A latent class model is proposed to segment customers and items into customer categories and item categories, which are estimated from ID-POS data and questionnaire data on customers' lifestyles and personalities. The variables, which include these category labels and the features of customers and items, are modeled as a Bayesian network for knowledge discovery.

    Download PDF (542K)
  • Masatoshi NAKAMURA, Toshio SHIMOKAWA, Masashi GOTO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 03-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS
  • Susumu SHIRAYAMA
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 04-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    Owing to the volume of data generated in recent computations and experiments, it is quite difficult to extract useful information from these data, even when using scientific or information visualization techniques. A method or methodology for extracting useful information from such data is needed. Several concepts of very large scale visualization have been proposed in this situation. Most of them are based on high-performance computing techniques or highly efficient devices for computer graphics. Although such studies have succeeded in visualizing ultra-scale data, several issues remain unsolved. In this paper, we introduce a flexible visualization methodology based on a "post visualization process", which includes a human recognition process and quantitative evaluations of visualized results. Finally, we describe the possibility that a visualization agent designed from a process model helps to reduce the difficulty of handling huge data.

    Download PDF (331K)
  • Hidenao ABE, Shusaku TSUMOTO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 05-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    In this paper, we present a method for characterizing given datasets based on objective rule evaluation indices and classification rule learning algorithms. In transfer learning approaches, most methods for detecting the limitations use performance indices of sets of classifiers, such as the accuracies of classifier sets. However, the indices of each individual classifier are also useful. Considering this issue, we performed a case study to identify the similarity of datasets, even when the datasets have totally different attribute sets, in comparison with a conventional data characterization technique.

    Download PDF (100K)
  • Junichi KOBAYASHI, Kazuaki KOMOTO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 06-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    Stochastic gradient boosting is a boosting method invented by Jerome H. Friedman, and it is known to be a very powerful method for building predictive models in some cases. In fact, FEG won the second prize in KDD Cup 2009 using this method. We survey the methodology of stochastic gradient boosting and introduce our analytical procedure in KDD Cup 2009. It is a good example in which stochastic gradient boosting shows its effectiveness.

    Download PDF (319K)
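
As a rough illustration of the stochastic gradient boosting idea surveyed in the entry above, the following minimal sketch fits regression stumps to residuals on a random subsample at each round. It is not the procedure used by the authors in KDD Cup 2009; the squared-error loss, the one-dimensional stumps, the toy step-function data, and all names (`fit_stump`, `sgb_fit`, `sgb_predict`) are assumptions made for illustration.

```python
import random

def fit_stump(xs, residuals):
    """Least-squares regression stump (one split) on 1-D inputs."""
    best = None
    # Splitting at the largest x would leave the right side empty, so skip it.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]

def stump_predict(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean

def sgb_fit(xs, ys, n_rounds=200, shrinkage=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)                 # initial constant model
    preds = [f0] * len(xs)
    stumps = []
    k = max(2, int(subsample * len(xs)))
    for _ in range(n_rounds):
        # The "stochastic" part: each round sees a subsample drawn without replacement.
        idx = rng.sample(range(len(xs)), k)
        sub_x = [xs[i] for i in idx]
        sub_r = [ys[i] - preds[i] for i in idx]   # negative gradient of squared loss
        stump = fit_stump(sub_x, sub_r)
        stumps.append(stump)
        preds = [p + shrinkage * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return f0, shrinkage, stumps

def sgb_predict(model, x):
    f0, shrinkage, stumps = model
    return f0 + shrinkage * sum(stump_predict(s, x) for s in stumps)

# Toy data (hypothetical): a step function on the integers 0..19.
xs = list(range(20))
ys = [0.0 if x < 10 else 1.0 for x in xs]
model = sgb_fit(xs, ys)
```

The subsample fraction and the shrinkage factor are the two knobs that distinguish this variant from plain gradient boosting; the values used here are arbitrary defaults.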
  • Takanori AYANO, Joe SUZUKI
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 07-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS
  • Y-h. TAGUCHI
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 08-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    Detection of genes that are differentially expressed between distinct conditions is an important task in bioinformatics. Recently, epigenetic markers have turned out to have a more direct relationship with phenotypes than gene expression. In this talk, we will demonstrate how well epigenetic markers can be used to detect differences between conditions. In particular, using PCA is more efficient for achieving this task.

    Download PDF (478K)
  • [in Japanese]
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 09-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    We propose a learning algorithm for nonparametric estimation and online prediction for general stationary ergodic sources. We divide the real space R into a set A of finite subsets, transform a given sequence in R into a sequence in A, and encode the latter using universal coding for finite sequences with distortion. We prepare infinitely many such A and mix the estimated measures to obtain a measure of sequences in R, which may be either discrete or continuous. If the sequence is emitted by a stationary ergodic source, then the Kullback-Leibler information divided by the sequence length n converges to zero as n goes to infinity. In particular, for continuous sources, the method does not require the existence of a probability density function. In this sense, this paper extends Ryabko's universal measure. The measure can be used for online prediction to estimate the next data given the past sequence.

    Download PDF (123K)
  • Shinichi YOSHIDA, Kohei HATANO, Eiji TAKIMOTO, Masayuki TAKEDA
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 10-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    We propose online prediction algorithms for data streams whose characteristics might change over time. Our algorithms are applications of online learning with experts. In particular, our algorithms use base predictors over sliding windows of different lengths as experts. As a result, our algorithms are guaranteed to be competitive with the base predictor with the best fixed-length sliding window in hindsight.

    Download PDF (488K)
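
The idea in the entry above of treating sliding-window predictors as experts can be sketched as follows. The abstract does not say which expert-combination rule is used, so this sketch assumes a standard exponentially weighted average forecaster with squared loss on outcomes in [0, 1]; the window-mean base predictor, the window lengths, the learning rate `eta`, and the function names are likewise assumptions.

```python
import math

def window_mean(history, w, default=0.0):
    """Base predictor: mean of the last w observations (default before any data)."""
    recent = history[-w:]
    return sum(recent) / len(recent) if recent else default

def predict_stream(stream, windows=(1, 4, 16), eta=2.0):
    """Combine window-mean experts with exponential weights; outcomes assumed in [0, 1]."""
    weights = [1.0 / len(windows)] * len(windows)
    history, forecasts = [], []
    for y in stream:
        expert_preds = [window_mean(history, w) for w in windows]
        # Forecast is the weight-averaged expert prediction (weights sum to 1).
        forecasts.append(sum(wt * p for wt, p in zip(weights, expert_preds)))
        # Penalize each expert by its squared loss, then renormalize.
        weights = [wt * math.exp(-eta * (p - y) ** 2)
                   for wt, p in zip(weights, expert_preds)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
        history.append(y)
    return forecasts, weights

# Toy stream (hypothetical) whose level changes abruptly halfway through.
stream = [0.0] * 30 + [1.0] * 30
forecasts, weights = predict_stream(stream)
```

On a stream like this one, short windows recover from the change faster than long ones, so the weight shifts toward the shortest window.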
  • Takafumi KANAMORI, Taiji SUZUKI, Masashi SUGIYAMA
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 11-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    Density ratio estimation has gathered a great deal of attention recently since it can be used for various data processing tasks. In this paper, we consider three methods of density ratio estimation: (A) the numerator and denominator densities are separately estimated and then the ratio of the estimated densities is computed, (B) a logistic regression classifier discriminating denominator samples from numerator samples is learned and then the ratio of the posterior probabilities is computed, and (C) the density ratio function is directly modeled and learned by minimizing the empirical Kullback-Leibler divergence. We first prove that when the numerator and denominator densities are known to be members of the exponential family, (A) is better than (B) and (B) is better than (C). Then we show that once the model assumption is violated, (C) is better than (A) and (B). Thus in practical situations where no exact model is available, (C) would be the most promising approach to density ratio estimation.

    Download PDF (183K)
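
To make approaches (A) and (B) from the entry above concrete, here is a minimal one-dimensional sketch. It assumes Gaussian models for (A) and a linear logistic model fitted by plain gradient ascent for (B); the synthetic data, the sample sizes, and the function names are illustrative assumptions, and approach (C) is not shown.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ratio_separate(nu, de):
    """(A): fit a Gaussian to each sample separately, then divide the densities."""
    def fit(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, v
    (m_nu, v_nu), (m_de, v_de) = fit(nu), fit(de)
    return lambda x: gaussian_pdf(x, m_nu, v_nu) / gaussian_pdf(x, m_de, v_de)

def ratio_logistic(nu, de, lr=0.5, n_iter=300):
    """(B): logistic regression with numerator samples as label 1, denominator as 0."""
    data = [(x, 1.0) for x in nu] + [(x, 0.0) for x in de]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p            # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * x      # gradient of the log-likelihood w.r.t. b1
        b0 += lr * g0 / len(data)
        b1 += lr * g1 / len(data)
    # Ratio of posteriors P(nu|x)/P(de|x) = exp(b0 + b1*x), corrected by sample sizes.
    prior = len(de) / len(nu)
    return lambda x: prior * math.exp(b0 + b1 * x)

# Synthetic data (hypothetical): numerator N(1, 1), denominator N(0, 1),
# so the true density ratio is exp(x - 0.5).
rng = random.Random(0)
nu = [rng.gauss(1.0, 1.0) for _ in range(1000)]
de = [rng.gauss(0.0, 1.0) for _ in range(1000)]
r_a = ratio_separate(nu, de)
r_b = ratio_logistic(nu, de)
```

In this toy setting both models are correctly specified, which is the regime where the abstract says (A) and (B) do well; the comparison changes once the model assumption is violated.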
  • Masanori KAWAKITA, Jun'ichi TAKEUCHI
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 12-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    This paper studies a technique for improving regression with unlabeled data. The key idea of our proposal is that semi-supervised learning can be recast as a regression problem under covariate shift. The weighted likelihood approach is a natural choice for estimating regression parameters under covariate shift. Reference [9] showed that the optimal choice of weight function is the ratio of the labeled data density to the unlabeled data density. When this idea is applied to our setting, the optimal weight function trivially always takes the value one. However, our proposal is to discard this optimal weight function and to estimate it instead. This is deeply related to the work of [5]. Experiments show that the resulting algorithm performs well.

    Download PDF (169K)
  • Viet Anh NGUYEN, Akihiro YAMAMOTO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 13-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    We study the problem of mining closed frequent tree patterns from tree databases that are updated regularly over time. Frequent tree mining, like frequent itemset mining, is often a very time-consuming process, and thus it is undesirable to mine from scratch when the change to the database is small. The set of previously mined patterns, which can also be considered a description of the database, should be reused as much as possible to compute newly emerging patterns. In this paper, we propose a novel and efficient incremental mining algorithm for closed frequent labeled ordered trees. We adopt a divide-and-conquer strategy and apply different mining techniques in different parts of the mining process. No additional scan of the whole database is needed, and only a relatively small amount of information from the previous mining iteration has to be maintained. Our experimental study on real-life datasets demonstrates the efficiency and scalability of our algorithm.

    Download PDF (257K)
  • Keiko YAMAMOTO, Satoru HAYAMIZU, Atsuyuki KAMEYAMA, Yoshikazu UCHIYAMA ...
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 14-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    This paper addresses the problem of estimating disease risks using a large health checkup database. The proposed method uses a naive Bayesian classifier extended with a two-dimensional kernel density estimation technique. The framework is tested by estimating the risks of three diseases for examinees: hypertension, diabetes, and dyslipidemia. Combining attribute interactions with the naive Bayesian method shows considerable improvement in the estimation experiments.

    Download PDF (96K)
  • Atsunori MINAMIKAWA, Hiroyuki YOKOYAMA
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 15-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    In this paper, we propose a method for estimating egograms from weblog text data. An egogram is a personality model that illustrates the ego states of users. In our method, features appropriate for the egogram are selected using the information gain of each word contained in the weblog text, and estimation is performed by Multinomial Naive Bayes classifiers. We evaluate our method in several classification scenarios and show its effectiveness.

    Download PDF (323K)
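
The two steps named in the entry above, word selection by information gain followed by Multinomial Naive Bayes classification, can be sketched on a toy corpus as follows. The tokenized documents, the `high_CP` / `low_CP` labels, the presence-based definition of information gain, and the Laplace smoothing are all assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs in a document."""
    has = [y for d, y in zip(docs, labels) if word in d]
    lacks = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (has, lacks) if part)
    return entropy(labels) - conditional

def select_features(docs, labels, k):
    vocab = sorted({w for d in docs for w in d})
    return sorted(vocab, key=lambda w: -information_gain(docs, labels, w))[:k]

def train_mnb(docs, labels, features):
    """Multinomial Naive Bayes with add-one smoothing over the selected features."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(w for w in d if w in features)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(features)
        loglik[c] = {w: math.log((counts[c][w] + 1) / total) for w in features}
    return prior, loglik

def classify(model, doc):
    prior, loglik = model
    def score(c):
        # Words outside the selected feature set are ignored.
        return prior[c] + sum(loglik[c][w] for w in doc if w in loglik[c])
    return max(prior, key=score)

# Toy corpus (hypothetical): tokenized posts labeled by one ego-state level.
docs = [["must", "rule", "today"], ["must", "strict", "today"],
        ["fun", "play", "today"], ["fun", "wow", "today"]]
labels = ["high_CP", "high_CP", "low_CP", "low_CP"]
features = select_features(docs, labels, k=2)
model = train_mnb(docs, labels, features)
predicted = classify(model, ["must", "work"])
```

A word such as "today" that appears in every document has zero information gain and is dropped, which is the point of the selection step.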
  • Yohji AKAMA, Yasutaka UWANO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 16-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    We study the empirical spectral distribution of so-called large dimensional random matrices. Using empirical process theory and measure concentration inequalities, we provide a sufficient condition for the sum of the largest eigenvalues of the sample covariance matrix to be consistent in the limit of the sample size n, with the dimension d of the data in the sample varying with n.

    Download PDF (141K)
  • Kohei HAYASHI, Takashi TAKENOUCHI, Tomohiro SHIBATA, Yuki KAMIYA, Dais ...
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 17-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    We study tensor-based Bayesian probabilistic modeling of heterogeneously attributed multi-dimensional arrays, each of which assumes a different exponential-family distribution. Simulation experiments show that our method outperforms other methods, such as PARAFAC and Tucker decomposition, in missing-value prediction for cross-national statistics. We further show that the method is applicable to discovering anomalies in heterogeneous office-logging data.

    Download PDF (211K)
  • Y. NISHIMORI
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 18-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    We review algorithms and theory of manifold learning in machine learning.

    Download PDF (101K)
  • Ai AZUMA, Masashi SHIMBO, Yuji MATSUMOTO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 19-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    When we apply machine learning or data mining techniques to sequential data, it is often necessary to take a summation over all possible sequences. In practice, we cannot calculate such a summation directly from its definition. Although the ordinary forward-backward algorithm provides an efficient way to do so, it is applicable to quite limited types of summations. In this paper, we propose general algebraic frameworks for generalizing the forward-backward algorithm. We show some examples falling within this framework and their importance.

    Download PDF (155K)
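
The entry above contrasts a summation over all sequences with the efficient forward-backward recursion. The sketch below shows one well-known way to generalize the forward half of that recursion, by passing the "plus" and "times" operations as parameters (a semiring view); it is not claimed to be the algebraic framework proposed in the paper. The two-state model, its numbers, and the function names are hypothetical.

```python
from itertools import product

def forward(init, trans, emit, obs, plus, times):
    """Forward recursion: 'plus' over all state sequences of the 'times' of path weights."""
    states = range(len(init))
    alpha = [times(init[s], emit[s][obs[0]]) for s in states]
    for o in obs[1:]:
        new_alpha = []
        for s in states:
            acc = None
            for r in states:
                term = times(times(alpha[r], trans[r][s]), emit[s][o])
                acc = term if acc is None else plus(acc, term)
            new_alpha.append(acc)
        alpha = new_alpha
    total = alpha[0]
    for a in alpha[1:]:
        total = plus(total, a)
    return total

def brute_force(init, trans, emit, obs, plus, times):
    """Same quantity computed directly from its definition (exponential in len(obs))."""
    total = None
    for path in product(range(len(init)), repeat=len(obs)):
        w = times(init[path[0]], emit[path[0]][obs[0]])
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            w = times(times(w, trans[prev][cur]), emit[cur][o])
        total = w if total is None else plus(total, w)
    return total

def add(a, b):
    return a + b

def mul(a, b):
    return a * b

# Hypothetical two-state hidden Markov model and observation sequence.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]

likelihood = forward(init, trans, emit, obs, plus=add, times=mul)      # sum-product
best_weight = forward(init, trans, emit, obs, plus=max, times=mul)     # max-product
```

Swapping `plus` from addition to `max` turns the likelihood computation into the weight of the single best path, while the recursion itself is unchanged; the brute-force function is included only to check the recursion against the definition on small inputs.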
  • Yoshiharu MAENO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 20-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

    A method is presented for discovering the network topology and transmission parameters behind an infectious disease outbreak from a given time sequence dataset. A likelihood function is derived analytically from the equations that describe the stochastic process for reaction and diffusion in a metapopulation network. The method is potentially applicable to discovering the networks that mediate the diffusion of rumors, information, new ideas, or influence.

    Download PDF (151K)
  • Akihiro INOKUCHI, Takashi WASHIO
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages 21-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS
  • [in Japanese]
    Article type: SIG paper
    2010 Volume 2010 Issue DMSM-A903 Pages c01-
    Published: March 29, 2010
    Released on J-STAGE: August 28, 2021
    RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS
    Download PDF (235K)