This study aims to develop a system that visualizes subjective information. Focusing on onomatopoeias as one such kind of information, we estimate which senses an onomatopoeia belongs to among "touch", "taste", "smell", "hearing", "sight", "pleasure (positive)", and "unpleasure (negative)". For this purpose, we use a machine learning method (Support Vector Machine) that takes as features the phonetic symbols in an onomatopoeia and their numbers of occurrences. The evaluation experiment demonstrates that (1) the best performance is achieved for "hearing" and "sight", and (2) the performance of the classifier is similar to that of humans. Finally, we propose a system that creates city maps displaying the distribution of subjective information for each sense.
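As a rough illustration of the feature representation described above, the following sketch counts symbol occurrences in a romanized onomatopoeia. The symbol inventory `SYMBOLS` and the function name `phoneme_count_features` are hypothetical; the study's actual phonetic symbol set and preprocessing are not given here, so romanized letters stand in for them.

```python
from collections import Counter

# Hypothetical inventory: romanized letters stand in for the phonetic
# symbols used in the study, which are not specified in the abstract.
SYMBOLS = list("aiueokstnhmyrwgzdbp")
SYMBOL_SET = set(SYMBOLS)


def phoneme_count_features(word):
    """Return one occurrence count per symbol, in the order of SYMBOLS."""
    counts = Counter(ch for ch in word.lower() if ch in SYMBOL_SET)
    return [counts[s] for s in SYMBOLS]


# Example: "kirakira" (glittering) contains k, i, r, a twice each.
features = phoneme_count_features("kirakira")
```

A vector of this kind could then be passed to any SVM implementation, with one binary classifier per sense being one plausible setup.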
We report characteristics of user behavior in explorative information access that reflect differences in the environments the user works in and the tasks the user engages in. Using a model of information access behavior and a log-data coding scheme based on that model, we analyzed the log data obtained in VisEx, an experiment for evaluating interactive and explorative information access environments. The analysis shows that the introduced retrieval methods, narrowing-down and similarity-based retrieval, are used as substitutes for sequential document checking, and that their effectiveness differs depending on task characteristics.
This paper aims to raise the accuracy of multi-class text classification by means of graph-based semi-supervised learning (GBSSL). In GBSSL, it is essential to construct a proper graph expressing the relations among nodes. We propose a method that constructs a similarity graph by employing both surface information and latent information to express the similarity between nodes. Experiments on the Reuters-21578 corpus confirm that the proposed method raises the accuracy of GBSSL in the multi-class text classification task.
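The idea of a similarity graph built from two information sources can be sketched as follows. This is a generic illustration, not the authors' method: it assumes surface information is a word-count vector, latent information is a topic-proportion vector, and the two cosine similarities are mixed with a hypothetical weight `alpha`. The propagation step is a plain label-propagation loop with clamped labeled nodes.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_graph(surface, latent, alpha=0.5):
    """Edge weight = alpha * surface similarity + (1 - alpha) * latent similarity."""
    n = len(surface)
    return [[0.0 if i == j else
             alpha * cosine(surface[i], surface[j])
             + (1 - alpha) * cosine(latent[i], latent[j])
             for j in range(n)] for i in range(n)]


def propagate(W, labels, n_classes, iters=50):
    """Spread class scores over the graph; labeled nodes stay fixed."""
    n = len(W)
    F = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            total = sum(W[i])
            if labels[i] is not None or total == 0:
                new_F.append(F[i])  # clamp labeled or isolated nodes
                continue
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / total
                          for c in range(n_classes)])
        F = new_F
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy data (made up for illustration): documents 0 and 1 are labeled.
surface = [[2, 1, 0, 0], [0, 0, 1, 2], [1, 2, 0, 0], [0, 0, 2, 1]]
latent = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
W = combined_graph(surface, latent)
predicted = propagate(W, [0, 1, None, None], n_classes=2)
```

In this toy setting the unlabeled documents take the label of the labeled document they resemble in both views; how the two views should be weighted in practice is a design choice the sketch leaves open.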
We have proposed a method to raise the accuracy of text classification based on latent topic information, introducing several techniques such as extracting important words with the PageRank algorithm and reducing the size of target documents by replacing them with their own important sentences. In experiments on text classification with the Reuters-21578 data set, we confirmed that the proposed method raised classification accuracy. In this paper, we aim to verify the method with additional experiments using the 20 Newsgroups data set and report the experimental results.
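One common way to score word importance with PageRank is to run it over a word co-occurrence graph. The sketch below assumes words are linked when they appear in the same sentence; the abstract does not say how the graph is built, so this construction, the function names, and the damping value are illustrative assumptions.

```python
from itertools import combinations


def build_cooccurrence(sentences):
    """Undirected graph: words are linked if they share a sentence."""
    graph = {}
    for sent in sentences:
        words = set(sent)
        for w in words:
            graph.setdefault(w, set())
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iters=50):
    """Power iteration; each word shares its rank equally among neighbors."""
    n = len(graph)
    rank = {w: 1.0 / n for w in graph}
    for _ in range(iters):
        rank = {
            w: (1 - damping) / n
            + damping * sum(rank[v] / len(graph[v]) for v in graph[w])
            for w in graph
        }
    return rank


# Toy tokenized sentences (made up for illustration).
sentences = [
    ["oil", "price", "rise"],
    ["oil", "export", "fall"],
    ["oil", "price", "fall"],
]
rank = pagerank(build_cooccurrence(sentences))
top_word = max(rank, key=rank.get)
```

Words with the highest scores could then be kept as important words, and sentences could be ranked by the scores of the words they contain.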
This paper considers a collaborative policy for combining tools in the development of systems using TETDM. TETDM is a total environment for text data mining that can be prepared for various mining tasks by combining small mining tools. However, useful guidance for designing a system constructed from several small tools developed by different tool developers has not yet been considered. This paper describes a design guide for reconciling the user's purpose with the system's specifications when constructing a general-purpose system, and shows a practical example.