This paper proposes a method for recommending Web content (novels, comics) that matches a user's tastes, based on the similarity of reviews of content the user has already evaluated. In recent years there have been many studies that recommend Web content by learning the user's tastes; these studies present personalized information so as to recommend content matched to the tastes of each user. Our method infers the user's tastes from the reviews of content obtained from the user. Beforehand, the sentences of content reviews are classified into "sentences about the content itself" and "sentences expressing the impressions of the reviewer", and stored in the system. The method then recommends content matching the user's tastes by comparing, for each sentence class, the reviews stored in the system with the reviews obtained from the user.
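As an illustration of the per-class comparison step, the following sketch computes a TF-IDF cosine similarity separately for the two sentence classes and averages them; the sentence classifier itself is not shown, and all names are assumptions rather than the authors' implementation.

```python
# Minimal sketch: compare a user's review with a stored review per
# sentence class, assuming sentences are already split into the two
# classes ("content" vs. "impression"). Names are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def review_similarity(user_review, stored_review):
    """Each review is a dict: {"content": [...sentences], "impression": [...]}."""
    score = 0.0
    for part in ("content", "impression"):
        vec = TfidfVectorizer()
        docs = [" ".join(user_review[part]), " ".join(stored_review[part])]
        tfidf = vec.fit_transform(docs)
        score += cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    return score / 2  # average over the two sentence classes

# Recommend stored items whose reviews most resemble the user's review:
# ranked = sorted(catalog, key=lambda r: review_similarity(user_review, r), reverse=True)
```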
In this report, we describe a knowledge discovery method based on a probabilistic structure model constructed by large-scale data fusion concerning buying behavior in daily life. A latent class model is proposed in order to segment customers and items into categories, estimated from ID-POS data and questionnaire data on customers' lifestyles and personalities. The variables, which include these category labels and the features of customers and items, are then modeled as a Bayesian network for knowledge discovery.
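A minimal sketch of the latent-class segmentation step, written as a Bernoulli mixture over binary customer-item purchase indicators fitted by EM. This is a generic latent class model; the binary encoding, the class count K, and all names are assumptions, not the paper's exact formulation.

```python
# Latent class model as a Bernoulli mixture over a customers-x-items
# binary matrix X, fitted by EM. Returns class priors, per-class item
# probabilities, and a category label for each customer.
import numpy as np

def latent_class_em(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                 # class priors
    theta = rng.uniform(0.25, 0.75, (K, d))  # P(item bought | class)
    for _ in range(iters):
        # E-step: responsibilities from per-class log-likelihoods
        log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_p += np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update priors and item probabilities
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, r.argmax(axis=1)  # category label per customer
```

The resulting category labels could then enter a Bayesian network alongside customer and item features, as the abstract describes.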
In regression analysis, one commonly assumes a (probabilistic) model between the response variable and the explanatory variables and gives a statistical interpretation of the observed data in terms of that model. However, when there is one response variable and several explanatory variables, a linear regression model that assumes linearity (additivity) in the parameters makes it difficult to build a model that captures the real phenomenon. One remedy is to use tree-structured approaches, which can incorporate nonlinear structure and interaction structure into the model. Since the classification and regression trees (CART) method was proposed by Breiman et al. (1984), a variety of tree-structured methods have been proposed in statistical science and data mining. In recent years, methods for overcoming the low predictive accuracy of tree-structured approaches, namely ensemble learning, have attracted attention. Ensemble learning achieves high predictive accuracy by combining tree models (weak learners); one representative method is the random forest (RF; Breiman, 2001). Friedman & Popescu (2004) pointed out that better estimators can be obtained by incorporating shrinkage estimation into the tree-building process. In this talk, we propose the Lasso-adjusted random forest (Lasso-RF) method, which incorporates the Lasso (Tibshirani, 1996), one of the shrinkage estimators, into the RF method.
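One plausible reading of such a combination, sketched below: fit a random forest, then re-weight its individual trees by a Lasso regression on their predictions, in the spirit of Friedman & Popescu's post-processing. The authors' Lasso-RF may differ in detail; function names and parameters are illustrative.

```python
# Sketch: Lasso shrinkage applied to the trees of a fitted random forest.
# The Lasso can shrink the weight of unhelpful trees to zero.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

def lasso_rf(X_train, y_train, n_trees=500):
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    rf.fit(X_train, y_train)
    # columns = per-tree predictions on the training data
    P = np.column_stack([t.predict(X_train) for t in rf.estimators_])
    lasso = LassoCV(cv=5).fit(P, y_train)

    def predict(X):
        P_new = np.column_stack([t.predict(X) for t in rf.estimators_])
        return lasso.predict(P_new)

    return predict
```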
Owing to the volume of data generated in recent computations and experiments, it is quite difficult to extract useful information from these data even with scientific/information visualization techniques. A method or methodology for extracting useful information from such data should therefore be considered. Several concepts for very-large-scale visualization have been proposed in this situation, most of them based on high-performance computing techniques or highly efficient computer-graphics devices. Although such studies have succeeded in visualizing ultra-scale data, several issues remain unsolved. In this paper, a flexible visualization methodology based on a "post-visualization process", which includes a human recognition process and quantitative evaluation of visualized results, is introduced. Finally, the possibility that a visualization agent designed from a process model can help reduce the difficulty of handling huge data is described.
In this paper, we present a method to characterize given datasets based on objective rule evaluation indices and classification rule learning algorithms. In transfer learning, most methods for detecting its limitations use performance indices of sets of classifiers, such as the accuracies of classifier sets; however, the indices of each individual classifier are also useful. With this in mind, we performed a case study to identify the similarity of datasets even when the datasets have totally different attribute sets, comparing our method with a conventional data characterization technique.
Stochastic gradient boosting is one of the boosting methods invented by Jerome H. Friedman, and it is known to be a very powerful method for building predictive models in some cases. In fact, FEG won second prize in KDD Cup 2009 using this method. We survey the methodology of stochastic gradient boosting and introduce our analytical procedure in KDD Cup 2009, which is a good example of the effectiveness of stochastic gradient boosting.
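For concreteness, the stochastic variant is obtained from ordinary gradient boosting by fitting each stage on a random subsample of the training data. A minimal sketch with scikit-learn follows; the hyperparameter values are illustrative, not those used in the competition.

```python
# Stochastic gradient boosting (Friedman, 2002): subsample < 1.0 makes
# each boosting stage fit on a random fraction of the training data.
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_estimators=500,    # number of boosting stages
    learning_rate=0.05,  # shrinkage per stage
    max_depth=3,         # weak learners: shallow trees
    subsample=0.5,       # the "stochastic" part: half the data per stage
    random_state=0,
)
# model.fit(X_train, y_train); model.predict_proba(X_test)
```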
Detecting genes that are differentially expressed between distinct conditions is an important task in bioinformatics. Recently, epigenetic markers have turned out to have a more direct relationship with phenotypes than gene expression. In this talk, we demonstrate how well epigenetic markers can be used to detect differences between conditions. In particular, we show that using PCA is an efficient way to achieve this task.
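A minimal sketch of the PCA step, assuming a samples-by-markers matrix and binary condition labels; the separation score is one simple way to quantify how well the leading components distinguish the conditions, not the authors' exact criterion.

```python
# Project samples (e.g., methylation profiles) onto principal components
# and measure how far apart the two condition means fall. Illustrative only.
import numpy as np
from sklearn.decomposition import PCA

def pca_condition_separation(X, labels, n_components=2):
    """X: samples x markers matrix; labels: 0/1 condition per sample."""
    Z = PCA(n_components=n_components).fit_transform(X)
    # distance between condition means along the leading components
    return np.linalg.norm(Z[labels == 0].mean(axis=0) - Z[labels == 1].mean(axis=0))
```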
We propose a learning algorithm for nonparametric estimation and online prediction for general stationary ergodic sources. We divide the real line R into a set A of finitely many subsets, transform a given sequence in R into a sequence in A, and encode the latter using universal coding for finite sequences with distortion. We prepare infinitely many such partitions A and mix the estimated measures to obtain a measure on sequences in R that may be either discrete or continuous. If the sequence is emitted by a stationary ergodic source, then the Kullback-Leibler divergence divided by the sequence length n converges to zero as n goes to infinity. In particular, for continuous sources the method does not require the existence of a probability density function. In this sense, this paper extends Ryabko's universal measure. The measure can be used in online prediction to estimate the next datum given the past sequence.
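In our own notation (which may differ from the paper's), the construction and the stated convergence can be summarized as follows, where A_k is the k-th finite partition of R, nu_k the measure estimated through that partition, and w_k positive weights summing to one:

```latex
% Sketch only; notation ours, details differ in the paper.
% Mixture of the measures estimated over all partitions A_k:
\nu(x^n) \;=\; \sum_{k=1}^{\infty} w_k \,\nu_k(x^n),
\qquad w_k > 0,\quad \sum_{k=1}^{\infty} w_k = 1,
% and for any stationary ergodic source P, the stated convergence is
\frac{1}{n}\, D\!\left(P^n \,\middle\|\, \nu\right) \;\longrightarrow\; 0
\qquad (n \to \infty).
```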
We propose online prediction algorithms for data streams whose characteristics may change over time. Our algorithms are applications of online learning with experts. In particular, our algorithms combine base predictors over sliding windows of different lengths as experts. As a result, our algorithms are guaranteed to be competitive with the base predictor using the best fixed-length sliding window in hindsight.
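A minimal sketch of the construction, assuming squared loss and an exponentially weighted (Hedge-style) combination; the window lengths, the learning rate eta, and the base predictor (a sliding-window mean) are illustrative choices, not the paper's.

```python
# Experts = sliding-window means of different lengths, combined online
# with exponential weights. Illustrative parameters throughout.
import numpy as np

def online_predict(stream, windows=(1, 2, 4, 8, 16, 32), eta=0.5):
    w = np.ones(len(windows))
    history, preds = [], []
    for x in stream:
        experts = np.array([np.mean(history[-k:]) if history else 0.0
                            for k in windows])
        preds.append(np.dot(w, experts) / w.sum())  # weighted combination
        losses = (experts - x) ** 2                 # squared loss per expert
        w *= np.exp(-eta * losses)                  # exponential weights update
        history.append(x)
    return preds
```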
Density ratio estimation has gathered a great deal of attention recently since it can be used for various data processing tasks. In this paper, we consider three methods of density ratio estimation: (A) the numerator and denominator densities are separately estimated and then the ratio of the estimated densities is computed, (B) a logistic regression classifier discriminating denominator samples from numerator samples is learned and then the ratio of the posterior probabilities is computed, and (C) the density ratio function is directly modeled and learned by minimizing the empirical Kullback-Leibler divergence. We first prove that when the numerator and denominator densities are known to be members of the exponential family, (A) is better than (B) and (B) is better than (C). Then we show that once the model assumption is violated, (C) is better than (A) and (B). Thus in practical situations where no exact model is available, (C) would be the most promising approach to density ratio estimation.
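For concreteness, a minimal sketch of method (B): a logistic regression classifier is trained to discriminate numerator samples from denominator samples, and the density ratio is recovered from the posterior odds corrected by the class-size ratio. Names are illustrative.

```python
# Method (B): density ratio from a probabilistic classifier.
# r(x) = (n_den / n_num) * P(y=1|x) / P(y=0|x), where y=1 marks
# numerator samples and y=0 denominator samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_logreg(X_num, X_den):
    X = np.vstack([X_num, X_den])
    y = np.r_[np.ones(len(X_num)), np.zeros(len(X_den))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    def ratio(X_test):
        p = clf.predict_proba(X_test)[:, 1]
        return (len(X_den) / len(X_num)) * p / (1 - p)

    return ratio
```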
This paper studies a technique to improve regression with unlabeled data. The key idea of our proposal is that semi-supervised learning can be recast as a regression problem under covariate shift. The weighted likelihood approach is a natural choice for estimating regression parameters under covariate shift. The literature [9] showed that the optimal choice of weight function is the ratio of the labeled data density to the unlabeled data density. Applying this idea to our setting, the optimal weight function trivially takes the value one everywhere. However, our proposal is to discard this optimal weight function and to estimate the weight instead. This is deeply related to the work of [5]. The resulting algorithm is shown to perform well in several experiments.
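A minimal sketch of the weighted-likelihood step, assuming the weight function has already been estimated (the weight estimator, which is the heart of the proposal, is not shown); the linear model is a stand-in for whatever regression model is used.

```python
# Importance-weighted regression: fit parameters on the labeled data,
# weighting each example by its estimated weight w(x_i).
from sklearn.linear_model import LinearRegression

def weighted_regression(X_labeled, y_labeled, weights):
    """weights[i] ~ estimated value of the weight function at x_i."""
    model = LinearRegression()
    model.fit(X_labeled, y_labeled, sample_weight=weights)
    return model
```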
We study the problem of mining closed frequent tree patterns from tree databases that are updated regularly over time. Frequent tree mining, like frequent itemset mining, is often a very time-consuming process, and thus it is undesirable to mine from scratch when the change to the database is small. The set of previously mined patterns, which can also be considered a description of the database, should be reused as much as possible to compute newly emerging patterns. In this paper, we propose a novel and efficient incremental mining algorithm for closed frequent labeled ordered trees. We adopt a divide-and-conquer strategy and apply different mining techniques in different parts of the mining process. No additional scan of the whole database is needed, and only a relatively small amount of information from the previous mining iteration has to be maintained. Our experimental study on real-life datasets demonstrates the efficiency and scalability of our algorithm.
This paper addresses the problem of estimating disease risks with a large health checkup database. The proposed method uses a naive Bayes classifier extended with a two-dimensional kernel density estimation technique. The framework is tested by estimating examinees' risks for three diseases: hypertension, diabetes, and dyslipidemia. The combination of attribute interactions and the naive Bayes method shows considerable improvement in the estimation experiments.
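A minimal sketch of a kernel-density naive Bayes classifier; only the one-dimensional base case is shown, whereas the paper's extension applies two-dimensional kernel density estimates to interacting attribute pairs. The class and method names are assumptions.

```python
# Naive Bayes with class-conditional densities estimated by Gaussian KDE
# instead of parametric distributions (1-D per attribute; the 2-D pair
# extension would use gaussian_kde on attribute pairs).
import numpy as np
from scipy.stats import gaussian_kde

class KDENaiveBayes:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        # one KDE per (class, attribute)
        self.kdes_ = {c: [gaussian_kde(X[y == c, j]) for j in range(X.shape[1])]
                      for c in self.classes_}
        return self

    def predict(self, X):
        scores = np.array([
            np.log(self.priors_[c]) +
            sum(np.log(kde(X[:, j]) + 1e-300) for j, kde in enumerate(self.kdes_[c]))
            for c in self.classes_])
        return self.classes_[scores.argmax(axis=0)]
```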
In this paper, we propose a method for estimating egograms from weblog text data. An egogram is a personality model that illustrates the ego states of a user. In our method, features appropriate for egogram estimation are selected using the information gain of each word contained in the weblog text, and estimation is performed by multinomial naive Bayes classifiers. We evaluate our method in several classification scenarios and show its effectiveness.
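A minimal sketch of the pipeline, with scikit-learn's mutual_info_classif standing in for the paper's information-gain score; the vocabulary size k and all other settings are illustrative.

```python
# Word counts -> information-gain-style feature selection -> multinomial
# naive Bayes. Illustrative stand-in for the paper's pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB

pipeline = make_pipeline(
    CountVectorizer(),                         # word counts from weblog text
    SelectKBest(mutual_info_classif, k=1000),  # keep the informative words
    MultinomialNB(),                           # egogram class per user
)
# pipeline.fit(train_texts, train_egogram_labels)
# predicted = pipeline.predict(test_texts)
```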
We study the empirical spectral distribution of so-called large-dimensional random matrices. Using empirical process theory and measure concentration inequalities, we provide a sufficient condition for the sum of the largest eigenvalues of the sample covariance matrix to be consistent, in the limit where the sample size n grows to infinity and the dimension d of the data varies along with n.
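For concreteness, the statistic under study can be simulated as follows: the sum of the k largest eigenvalues of a sample covariance matrix, with the dimension d growing along with n. The Gaussian data are purely illustrative.

```python
# Sum of the k largest eigenvalues of a sample covariance matrix,
# in the regime where d grows with n. Illustrative simulation only.
import numpy as np

def top_eigsum(n, d, k, rng):
    X = rng.standard_normal((n, d))  # n samples in dimension d
    S = (X.T @ X) / n                # sample covariance (mean-zero data)
    return np.sort(np.linalg.eigvalsh(S))[-k:].sum()

# e.g., track top_eigsum(n, d=n // 5, k=5, rng=np.random.default_rng(0)) as n grows
```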
We study tensor-based Bayesian probabilistic modeling of heterogeneously attributed multi-dimensional arrays, each of which assumes a different exponential-family distribution. Simulation experiments show that our method outperforms other methods, such as PARAFAC and Tucker decomposition, in missing-value prediction for cross-national statistics. We further show that the method can be applied to discover anomalies in heterogeneous office-logging data.
When we apply machine learning or data mining techniques to sequential data, it is often necessary to take a summation over all possible sequences. In practice, such a summation cannot be calculated directly from its definition. The ordinary forward-backward algorithm provides an efficient way to do this, but it is applicable only to quite limited types of summations. In this paper, we propose general algebraic frameworks that generalize the forward-backward algorithm. We show some examples that fall within these frameworks and discuss their importance.
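A minimal sketch of the algebraic viewpoint: the forward recursion written over an abstract semiring, so that instantiating (plus, times) with (+, x) yields the ordinary sum over all state sequences, while (max, x) yields the Viterbi score. The HMM parameterization and all names are illustrative; the paper's framework is more general.

```python
# Forward algorithm with the "sum" and "product" abstracted to a semiring.
def reduce_plus(xs, plus, zero):
    acc = zero
    for x in xs:
        acc = plus(acc, x)
    return acc

def forward(init, trans, emit, obs, plus, times, zero):
    """Generic forward recursion over a semiring.
    init[s], trans[s][t], emit[s][o] are the usual HMM quantities."""
    states = range(len(init))
    alpha = [times(init[s], emit[s][obs[0]]) for s in states]
    for o in obs[1:]:
        alpha = [times(emit[t][o],
                       # semiring "sum" over predecessor states
                       reduce_plus([times(alpha[s], trans[s][t]) for s in states],
                                   plus, zero))
                 for t in states]
    return reduce_plus(alpha, plus, zero)

# total probability: forward(..., plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0)
# Viterbi score:     forward(..., plus=max,                times=lambda a, b: a * b, zero=0.0)
```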
A method is presented to discover the network topology and transmission parameters behind an infectious disease outbreak from a given time-sequence dataset. A likelihood function is derived analytically from the equations that describe the stochastic process of reaction and diffusion in a metapopulation network. The method is potentially applicable to discovering the networks that mediate the diffusion of rumors, information, new ideas, or influence.
The mining of a complete set of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences (dynamic graphs or evolving graphs). In this paper, we define a novel class of subgraph subsequences called "induced subgraph subsequences" to enable efficient mining of a complete set of frequent patterns from graph sequences containing large graphs and long sequences. We also propose an efficient method to mine frequent patterns, called FRISSs (Frequent, Relevant, and Induced Subgraph Subsequences), from graph sequences. The fundamental performance of the method was evaluated using artificial datasets, and its practicality was confirmed through experiments on a real-world dataset.