In this paper, we apply machine learning to analyze remarks made in the Diet by short-term parliamentarians and ministers. For short-term lawmakers, we focus on the so-called “Children Politicians,” who won election with the support of the mass media, and compare the questions asked by those who were re-elected with the questions asked by those who were not. We then analyze the answers given in the Diet by ministers who stepped down because of inappropriate remarks or scandals and by those who served for a long time as cabinet ministers. For classification, we apply the maximum entropy method and the naive Bayes method with features extracted by discriminant analysis, and compare their performance. The results show that short-term “Children Politicians” use polite expressions less often and care more about profits and losses. Short-term ministers more frequently use rude expressions unsuited to the Diet, while often insisting on high ideals and their own efforts.
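As a rough illustration of the naive Bayes classification step mentioned above, the following minimal sketch trains a multinomial naive Bayes model with add-one smoothing on tokenized remarks. The toy tokens, the labels `reelected` and `short_term`, and the function names are all hypothetical; the discriminant-analysis feature extraction and the maximum entropy classifier used in the paper are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"priors": {}, "likelihoods": {}}
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents in this class.
        model["priors"][label] = math.log(n_docs / len(labels))
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        # Smoothed log likelihood of each vocabulary word given the class.
        model["likelihoods"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior score."""
    scores = {}
    for label, prior in model["priors"].items():
        likelihood = model["likelihoods"][label]
        # Words never seen in training are ignored.
        scores[label] = prior + sum(likelihood[w] for w in tokens if w in likelihood)
    return max(scores, key=scores.get)


# Hypothetical toy data; real input would be tokenized Diet remarks.
docs = [
    ["respectfully", "request", "policy"],
    ["profit", "loss", "budget"],
    ["kindly", "ask", "policy"],
    ["loss", "profit", "cost"],
]
labels = ["reelected", "short_term", "reelected", "short_term"]

model = train_naive_bayes(docs, labels)
print(predict(model, ["profit", "cost"]))
```

Working in log space avoids numerical underflow when remarks contain many tokens, and the smoothing term keeps unseen word and class combinations from receiving zero probability.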
In this paper, we propose a method for generating regional business sentiment indexes from text that regional banks record for business purposes. Banks hold various kinds of textual data; in this study, we focus on contact histories. A contact history is recorded whenever an employee communicates with a customer, and it covers a wide range of topics. We hypothesize that analyzing contact histories reveals the business sentiment of a region, and we therefore generate regional business sentiment indexes from them. First, we select the optimal model for index creation using the Economy Watchers Survey. We then create regional business sentiment indexes with this model and compare them with existing indicators.
In this paper, we try to extract relationships between company presidents’ messages (top messages) and company profitability. Although top messages are directed outside the company, we consider that they affect profitability because they include a business overview, future goals, and mission statements, and are presented to the general public, including stakeholders. However, because many companies publish their top messages on their own websites, collecting them takes a lot of time. To evaluate the influence of many top messages continuously, we therefore collected them automatically by Web crawling. In this study, the automatic collection method extracted 825 top messages from 3420 websites, and 62 top messages were analyzed. Because few features were extracted and no significant difference was obtained, the experimental results need to be examined carefully; nevertheless, they suggest that references to group companies may affect profitability.
We describe a performance evaluation of a method for recognizing utterances in local assembly minutes. The experimental datasets were collected from the local assembly minutes of four municipalities (Tokyo, Aomori, Osaka, and Fukuoka) for 4 years from April 2011 to March 2013. We manually annotated each sentence as an utterance or a non-utterance. Using the data of 4 years, we conducted two experiments that differ in how the training data and the test data are paired: “data of same municipality” and “data of different municipalities.” The average correct answer rate of SVM was the highest, at 0.985 and 0.951, respectively. In addition, we conducted an experiment with reduced training data, in which the dataset consists of assembly minutes from one year. In this setting, the average correct answer rate of LSTM was 0.926, which was 0.061 higher than that of SVM.
In this research, we propose a method for extracting sentences that include causal information concerning business performance (e.g., “Orders of semiconductor manufacturing equipment were good”) from summaries of financial statements. A previous method for extracting such sentences from summaries of financial statements exists. In contrast, our method automatically generates training data by narrowing down, more accurately, the sentences extracted by the previous method. Trained on these data with deep learning, our method extracts more sentences including causal information concerning business performance than the previous method.
We created a corpus of prefectural assembly minutes. Each record in the corpus has a “speaker’s name” field, which is manually checked to improve the credibility of the corpus. For assembly members, the corpus also contains speaker attributes (namely, electoral district, birth year, and gender). We analyze the corpus to illustrate the activities of all assembly members in Japan. This paper describes: 1) differences in the amount of speech depending on age and gender, 2) differences in the contents of speech depending on gender, and 3) differences in the contents of speech depending on age.
With the development and maturation of computer and measurement technologies, opportunities to handle multidimensional data are increasing. Many research reports focus on multidimensional data processing and aim to construct a framework for the statistical analysis of such data, so constructing a methodology of statistical analysis for multidimensional data is important. In this paper, we consider a statistical method for multidimensional data. We call data whose boundaries are vague fuzzy interval data, and we assume that random variables are observed as fuzzy interval data. We formulate an approximate Bayesian estimation using Zadeh’s concept of the probability of fuzzy events. However, treating the membership functions of fuzzy interval data precisely makes the calculation complex. To solve this problem, we introduce a method that uses the middle points of the membership functions as representative points. We suppose that the fuzzy interval data are obtained from a multidimensional normal population. When the prior distribution of the population parameter is a multidimensional normal distribution, we show that, under the proposed method, the posterior distribution is approximately a multidimensional normal distribution. As a result, even when only fuzzy interval data are available, we can formulate an approximate multidimensional Bayesian estimation that does not differ greatly from conventional Bayesian estimation. Finally, we provide numerical examples to illustrate the proposed model. In realistic situations, it is not always possible to determine the shape of the membership functions as the ideal left-right symmetric type. In the examples, we therefore study the practicality of the proposed method under conditions in which the left-right symmetric trapezoidal membership function is not perfectly satisfied, and the results show that the method is practical.
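The representative-point idea can be sketched in one dimension as follows. This is only an illustration under stated assumptions, not the paper's formulation: each fuzzy interval is a hypothetical trapezoid `(a, b, c, d)`, its "middle point" is taken to be the midpoint of the core `[b, c]`, the observation variance is assumed known, and the standard normal-normal conjugate update is applied to the midpoints. The multidimensional case and Zadeh's probability of fuzzy events are not reproduced.

```python
def core_midpoint(trapezoid):
    """Representative point of a trapezoidal fuzzy interval (a, b, c, d).

    Assumption: the midpoint of the core [b, c] is used as the middle point.
    """
    a, b, c, d = trapezoid
    return (b + c) / 2.0


def normal_posterior(prior_mean, prior_var, points, noise_var):
    """Conjugate update for a normal mean with known observation variance."""
    n = len(points)
    post_precision = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + sum(points) / noise_var) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical fuzzy interval observations; the second and third are not
# left-right symmetric.
observations = [
    (0.5, 1.0, 2.0, 2.5),
    (1.5, 2.0, 3.0, 4.0),
    (2.0, 3.0, 3.0, 3.5),
]
midpoints = [core_midpoint(obs) for obs in observations]

post_mean, post_var = normal_posterior(
    prior_mean=0.0, prior_var=4.0, points=midpoints, noise_var=1.0
)
print(post_mean, post_var)
```

Because only the representative points enter the update, the posterior keeps the same normal form as in the crisp case, which is the simplification the abstract describes.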
Many studies have addressed fuzzy modeling (learning of fuzzy inference systems) with vector quantization (VQ) and the steepest descent method (SDM). These methods are known to require fewer rules (parameters) than other learning methods. Most conventional VQ-based learning methods determine the initial assignment of the center parameters of the membership functions in the antecedent part using either the input part alone or both the input and output parts of the learning data. A VQ learning method that performs supervised learning on the learning data is also known. A learning method that combines these VQ methods is therefore expected to perform well. In this paper, we propose a learning method combining the VQ and SDM methods. To demonstrate the effectiveness of the proposed method, we perform numerical simulations on function approximation and pattern classification problems.
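The general pattern of initializing membership-function centers by VQ and then tuning a fuzzy inference system by steepest descent can be sketched as below. This is a simplified stand-in rather than the proposed method: VQ is plain k-means on the input part only, the model is a simplified fuzzy inference system with Gaussian membership functions and constant consequents, and steepest descent updates only the consequent weights. All names, the fixed width, and the sine target are assumptions made for illustration.

```python
import math
import random


def vq_centers(xs, k, iters=20, seed=0):
    """Plain k-means on scalar inputs, used here as the VQ step."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Keep the old center if no input was assigned to it.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


def firing(x, centers, width):
    """Gaussian membership degrees of x for each rule."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def infer(x, centers, width, weights):
    """Simplified fuzzy inference: membership-weighted average of the weights."""
    mu = firing(x, centers, width)
    return sum(m * w for m, w in zip(mu, weights)) / sum(mu)


def train(xs, ts, k=5, width=0.15, lr=0.5, epochs=200):
    """VQ initialization of centers, then steepest descent on the weights."""
    centers = vq_centers(xs, k)
    weights = [0.0] * k
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            mu = firing(x, centers, width)
            total = sum(mu)
            err = sum(m * w for m, w in zip(mu, weights)) / total - t
            # For E = 0.5 * err**2, dE/dw_i = err * mu_i / total.
            weights = [w - lr * err * m / total for w, m in zip(weights, mu)]
    return centers, weights


def mse(xs, ts, centers, width, weights):
    """Mean squared error of the model on the given data."""
    errors = [(infer(x, centers, width, weights) - t) ** 2 for x, t in zip(xs, ts)]
    return sum(errors) / len(errors)


# Hypothetical function approximation task: one period of a sine wave.
WIDTH = 0.15
xs = [i / 20 for i in range(21)]
ts = [math.sin(2 * math.pi * x) for x in xs]

centers, weights = train(xs, ts, k=5, width=WIDTH)
baseline_mse = mse(xs, ts, centers, WIDTH, [0.0] * len(centers))
trained_mse = mse(xs, ts, centers, WIDTH, weights)
print(baseline_mse, trained_mse)
```

Placing the centers with VQ before gradient training means steepest descent starts from rules that already cover the input data, which is one plausible reason such methods can work with few rules.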