Bulletin of the Computational Statistics of Japan
Online ISSN : 2189-9789
Print ISSN : 0914-8930
ISSN-L : 0914-8930
Volume 28, Issue 1
Displaying 1-19 of 19 articles from this issue
President's Address
Papers
  • Masashi Hyodo, Hiroki Watanabe, Takahiro Nishiyama
    2015 Volume 28 Issue 1 Pages 3-17
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    In this paper, we consider estimation of the inverse of the covariance matrix. Estimating the inverse of the covariance matrix under a multivariate normal distribution is an important problem both in practice and in theory. When the dimension is larger than the sample size, the Wishart matrix is singular, and many estimators have therefore been constructed via regularized estimation of the Wishart matrix. On the other hand, even when the sample size exceeds the dimension, the usual estimator is well known to be ill-conditioned when the dimension is large. For such situations, we propose new estimators based on the unbiased estimator of the inverse of the covariance matrix. Asymptotic optimality of these estimators with respect to a loss function is also established. Finally, the performance of our estimators is investigated by Monte Carlo simulations.
    Download PDF (1955K)
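As a rough numerical illustration of the role of the unbiased estimator (not the estimators proposed in the paper), the following Python sketch assumes p-variate normal data with n > p + 2 and compares the naive inverse of the sample covariance matrix with the unbiased estimator (n - p - 2)A^-1 of Sigma^-1, where A is the centered sum-of-squares (Wishart) matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 5, 50, 2000
Sigma_inv_trace = p                      # true Sigma = I, so tr(Sigma^-1) = p
avg_naive = np.zeros((p, p))
avg_unbiased = np.zeros((p, p))
for _ in range(reps):
    X = rng.standard_normal((n, p))      # sample from N_p(0, I)
    Xc = X - X.mean(axis=0)
    A_inv = np.linalg.inv(Xc.T @ Xc)     # A ~ Wishart_p(I, n - 1)
    avg_naive += (n - 1) * A_inv / reps          # inverse of sample covariance
    avg_unbiased += (n - p - 2) * A_inv / reps   # unbiased for Sigma^-1
bias_naive = abs(np.trace(avg_naive) - Sigma_inv_trace)
bias_unbiased = abs(np.trace(avg_unbiased) - Sigma_inv_trace)
```

With n = 50 and p = 5 the naive estimator overshoots by the factor (n - 1)/(n - p - 2) ≈ 1.14, while the unbiased version centers on Sigma^-1; as p approaches n both become unstable, which motivates the regularized estimators discussed in the paper.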
Software
  • Yoshikazu Yamamoto, Takahiko Ozaki
    2015 Volume 28 Issue 1 Pages 19-27
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    In this paper, we introduce our software, written in the Java language, for analyzing huge amounts of data.
    Our software provides several aggregation methods and a predictive-simulation method that run on a Hadoop cluster.
    The Apache Hadoop software library is a framework for the parallel and distributed processing of huge data sets on clusters of computers. MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. Apache Hadoop MapReduce is an implementation of this model for Apache Hadoop.
    As data sizes grow, we need to analyze huge amounts of data that could not be handled until now. Such analyses become possible with big-data technologies such as Apache Hadoop, and Apache Hadoop MapReduce applications can be developed in the Java language. Our software performs parallel and distributed processing of huge amounts of data. First, we show a simple example of a MapReduce application to explain the MapReduce model and the key-value store. Next, we introduce our software's methods for aggregation and predictive simulation, using data sets from the Joint Association Study Group of Management Science.
    Download PDF (1883K)
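The key-value flow that the paper's introductory example illustrates can be sketched in plain Python as a toy word count (a stand-in for a Hadoop job, with no Hadoop dependency; all names here are illustrative, not taken from the authors' software):

```python
from collections import defaultdict

def mapper(line):
    # map phase: emit one ("word", 1) key-value pair per token
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # shuffle phase: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # reduce phase: aggregate the values for one key
    return key, sum(values)

lines = ["Hadoop processes big data", "MapReduce processes data in parallel"]
pairs = [kv for line in lines for kv in mapper(line)]
counts = dict(reducer(k, vs) for k, vs in shuffle(pairs).items())
```

In a real Hadoop MapReduce job the shuffle step is performed by the framework between the map and reduce phases, and the mappers and reducers run in parallel across the cluster.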
Reviews
  • [in Japanese]
    2015 Volume 28 Issue 1 Pages 29-30
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    Download PDF (1734K)
  • Shinobu Ogi
    2015 Volume 28 Issue 1 Pages 31-40
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    The technology called “text mining” has recently been attracting a great deal of attention from researchers and business people. In this paper, text mining itself is surveyed from various viewpoints. One viewpoint is historical: the chronological trends of themes involving text mining in research papers, patent claims, and published books are reviewed based on various data sources. A taxonomic survey of text-mining research is also given briefly, to provide readers with a clear and tangible picture of this research area; it covers subject data classes, software tools, algorithmic tools, and dictionaries. Important case studies can be described only briefly in various parts of this survey because of space limitations, but the author hopes this paper will help readers understand the current status of this rapidly developing research area and its applications.
    Download PDF (2281K)
  • Takehiko Yasukawa
    2015 Volume 28 Issue 1 Pages 41-55
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    Nonnegative Matrix Factorization (NMF), which decomposes a nonnegative matrix into two nonnegative component matrices, provides a new approach to text data analysis. Text data are typically represented by a term-by-document matrix, which is nonnegative, so NMF is well suited to analyzing them.
    In this paper, I introduce the basic idea of NMF from the viewpoint of text data analysis and report an analysis of the abstracts of the Bulletin of the Computational Statistics of Japan over the past 25 years.
    Download PDF (3090K)
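A minimal sketch of the NMF idea, using the classical Lee–Seung multiplicative updates on a small random matrix standing in for a term-by-document matrix (not the paper's algorithm or data):

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.random((6, 8))         # stand-in for a term-by-document matrix (terms x docs)
k = 2                          # number of latent topics
W = rng.random((6, k)) + 0.1   # term-topic factor
H = rng.random((k, 8)) + 0.1   # topic-document factor
eps = 1e-9                     # guard against division by zero
for _ in range(500):           # Lee-Seung multiplicative updates for V ~ W @ H
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
err = np.linalg.norm(V - W @ H)
```

Both factors stay nonnegative throughout, which is what makes the columns of W interpretable as topics over terms when V is a real term-by-document matrix.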
  • Atsuko Hara, Makoto Saegusa, Yuichi Ishibashi
    2015 Volume 28 Issue 1 Pages 57-68
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    A pathology report is a document, written by a pathologist, that contains the diagnosis determined by examining cells and tissues under a microscope. It plays an important role in cancer diagnosis and staging, which helps to decide treatment options. In this paper, we first describe an algorithm for the numeric transformation of pathology reports using both text mining and statistical analysis, and we attempt to model the diagnostic process performed by a pathologist. Furthermore, a pathological-diagnosis support system is presented, comprising (1) extraction and presentation of similar archival reports using latent semantic analysis, (2) calculation of probabilities for possible diseases by Bayes' theorem, and (3) verification of consistency between the diagnosis and the details of the reports. In the future, we aim to improve the precision of pathological diagnosis using this practical system.
    Download PDF (2571K)
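Step (2) of the system, computing probabilities for possible diseases by Bayes' theorem, can be illustrated with a two-disease toy calculation (the disease names, priors, and likelihoods below are invented for illustration and do not come from the paper):

```python
# P(disease) priors and P(finding | disease) likelihoods -- hypothetical numbers
priors = {"disease_A": 0.7, "disease_B": 0.3}
likelihood = {"disease_A": 0.2, "disease_B": 0.9}

# Bayes' theorem: P(d | finding) = P(finding | d) P(d) / sum_d' P(finding | d') P(d')
evidence = sum(priors[d] * likelihood[d] for d in priors)
posterior = {d: priors[d] * likelihood[d] / evidence for d in priors}
# despite its lower prior, disease_B dominates once the finding is observed
```

The same computation extends directly to many candidate diseases and to multiple findings extracted from a report, provided the likelihoods can be estimated from archival data.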
  • Soichi Noguchi, Makiko Yuasa, Keisuke Iwamoto, Shin Maruyama
    2015 Volume 28 Issue 1 Pages 69-80
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    The spread of social networking services (SNSs) such as Twitter has allowed us to maintain records of our daily communication without any special effort and, consequently, enabled us to gather large amounts of information on our communicative activities. However, despite the seeming ease with which this information gathering is possible, we should invest more effort in examining what kinds of techniques would be useful for understanding the details of the content and structure of data gathered from SNSs. In this report, we present an example of a way to analyze daily tweets using a text-mining technique. To do so, we sampled text datasets posted by an astronaut on Twitter during his long-duration space flight and after his return to earth (the primary author was the astronaut himself), and reactions from people on the ground (i.e., the followers) to the astronaut's tweets. Our analysis demonstrated that tweets by the astronaut that contained photos or movies elicited various reactions from his followers (e.g., an increase in the number of retweets and quicker responses to tweets). We further suggest that our analytic technique has the potential to be applied to various other fields such as marketing research and public information gathering.
    Download PDF (10581K)
  • Shizue Izumi, Kenichi Satoh, Noriyuki Kawano
    2015 Volume 28 Issue 1 Pages 81-92
    Published: 2015
    Released on J-STAGE: May 01, 2017
    JOURNAL FREE ACCESS
    Recently, texts posted to social networking services such as Twitter and Facebook have attracted attention as big data. These texts can be treated as longitudinally observed text data. Extracting the longitudinal trends of keyword appearance and classifying them can summarize the changing characteristics of longitudinal text data. We propose an analytical method for longitudinally observed text data, applying the semiparametric varying-coefficient estimation method based on a mixed-effects model proposed by Satoh and Tonda (2013). Our method consists of a series of analytical steps: estimating the probability of keyword appearance using logistic regression on the keyword appearances in the longitudinally observed text data, and classifying and visualizing the longitudinal trends of keyword appearance using summaries of the predictors. Analysis of the Hiroshima Peace Declarations enabled us to describe the longitudinal trends of keyword appearance in the text data. The time-dependent classification results and the keyword locations are visualized in a two-dimensional scatter plot, which provides additional information on the analogy between the two classifications and the closeness to keywords. Furthermore, practical interpretations of the classification results in light of the social background suggest the appropriateness of our proposal.
    Download PDF (6519K)
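The first step of the pipeline, estimating keyword-appearance probability over time with logistic regression, can be sketched on toy data (the years, the appearance pattern, and the plain gradient-ascent fit below are all illustrative; this is not the Satoh–Tonda (2013) varying-coefficient estimator itself):

```python
import numpy as np

# hypothetical data: one keyword, observed each year as appeared (1) / absent (0)
years = np.arange(1990, 2015, dtype=float)
appeared = (years >= 2000).astype(float)   # toy trend: the keyword emerges around 2000
t = (years - years.mean()) / years.std()   # standardized time covariate

w, b = 0.0, 0.0
for _ in range(2000):                      # gradient ascent on the log-likelihood
    prob = 1.0 / (1.0 + np.exp(-(w * t + b)))
    w += 0.05 * np.sum((appeared - prob) * t)
    b += 0.05 * np.sum(appeared - prob)

p_first = 1.0 / (1.0 + np.exp(-(w * t[0] + b)))
p_last = 1.0 / (1.0 + np.exp(-(w * t[-1] + b)))
# a positive slope w means the estimated appearance probability rises over the years
```

Fitting one such curve per keyword yields the longitudinal trends that the paper then classifies and visualizes in a two-dimensional scatter plot.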
Report of Activities
Editorial Board