In recent years, a growing number of data analysis systems have combined natural language generation with visualization. Text has an advantage over visualization in that users need no special expertise to understand the important facts buried in data. However, automatically summarizing a large data set produces text that is too long, because in tabular data the number of statistical features grows with the number of attributes. In this study, we address this scalability problem by exploiting the hierarchical structure that many large data sets have, explicitly or implicitly, in both items and attributes. First, the system focuses text generation and visualization only on the areas the user is interested in. Second, it interactively shifts the focus as the user's interest shifts. In this way, we propose a system that allows users to explore the entire data set smoothly while limiting the amount of text presented at once. We also implement the system and show that this idea is effective.
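The focus-limited summarization described above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the `Node` hierarchy, the sales data, and the per-node summaries are all invented for the example. The key point is that the amount of text returned depends only on the focused node's fan-out, not on the total data size.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in the item/attribute hierarchy with a precomputed summary."""
    name: str
    summary: str
    children: list = field(default_factory=list)

def focused_summaries(root, focus_path):
    """Walk down the hierarchy along focus_path and return only the
    summaries of the focused node and its immediate children, keeping
    the amount of presented text bounded regardless of data size."""
    node = root
    for name in focus_path:
        node = next(c for c in node.children if c.name == name)
    return [node.summary] + [c.summary for c in node.children]

# Invented example hierarchy: sales grouped by region, then by product.
tree = Node("all", "Total sales: 1.2M units.", [
    Node("east", "East region: 700K units, up 5%.", [
        Node("gadgets", "Gadgets lead east sales at 400K units."),
    ]),
    Node("west", "West region: 500K units, flat."),
])

# Shifting the focus path models the user's interest shifting.
print(focused_summaries(tree, []))        # overview level
print(focused_summaries(tree, ["east"]))  # drilled into one region
```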
We have been studying data sharing among autonomous, independent sites in distributed systems to meet various application demands. Among these, the need for collaborative data sharing has attracted attention in many fields, where not only the owner of the original data but also the receiver can update the shared data. The BCDS Agent is a new building unit for configuring such systems with scalability and versatility. It rests on the novel features of bidirectional programming, which encourage a compositional approach to developing distributed systems with data consistency.
We present the key issues in designing the BCDS Agent with some examples.
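Bidirectional programming is commonly explained in terms of lenses: a pair of functions, `get` to extract a shared view from a source and `put` to write an updated view back. The following minimal sketch illustrates that idea and the round-trip laws that make collaborative updates consistent; it is a generic lens example, not the BCDS Agent's actual API, and `price_lens` and the record shape are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Lens:
    """A bidirectional transformation: `get` extracts a view from a source,
    `put` writes an updated view back into the source."""
    get: Callable[[Any], Any]
    put: Callable[[Any, Any], Any]

# A lens exposing the "price" field of a record as the shared view.
price_lens = Lens(
    get=lambda src: src["price"],
    put=lambda src, view: {**src, "price": view},
)

src = {"item": "book", "price": 10}
view = price_lens.get(src)          # the receiver sees 10
updated = price_lens.put(src, 12)   # the receiver's update flows back

# Well-behavedness (round-trip) laws that keep both sides consistent:
assert price_lens.get(price_lens.put(src, 12)) == 12    # PutGet
assert price_lens.put(src, price_lens.get(src)) == src  # GetPut
print(updated)
```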
We propose a framework for investigating the causes of system faults using both software logs and infrastructure logs. Computer systems have grown increasingly complex because they include not only software built from scratch but also infrastructure, packaged software, cloud services, and networks. In addition, these components influence one another within the system, making it harder for software engineers to identify the true causes of system faults. The proposed framework helps engineers detect such causes even when faults arise from infrastructure rather than only from software. Its key feature is the analysis of gaps in system logs between normal processing and error processing; these gaps lead developers to the true causes of system faults. We also developed tools that support the framework, and applied it to faults in a real computer system. The framework and tools helped us detect the true causes of two real system faults.
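The log-gap idea can be sketched with a plain textual diff: compare the log of a normal run against the log of a failing run and keep only the lines that differ. This is a minimal sketch using Python's standard `difflib`, not the authors' tools; the log lines are invented for the example.

```python
import difflib

def log_gaps(normal_log, error_log):
    """Return the log lines that differ between a normal run and a
    failing run; the first divergence often points toward the fault."""
    diff = difflib.unified_diff(normal_log, error_log, lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

# Invented example logs from a normal and a failing run.
normal = ["start service", "open db connection", "query ok", "stop service"]
error  = ["start service", "open db connection", "timeout: db unreachable"]

for gap in log_gaps(normal, error):
    print(gap)
# The gap shows the point of divergence: the query never completed,
# and an infrastructure-level timeout appeared instead.
```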
In the software industry, companies have adopted a variety of training courses to teach entry-level engineers the basics of computer science and programming languages. While such training courses are advocated as effective, little is known about their actual impact. To fill this gap, we conducted an empirical study analyzing the impact of a month-long training course in which 23 entry-level engineers of a company participated. In a nutshell, we found that the source code written by participants tends to be more functional and less redundant after the training course.
In this study, we quantitatively compare the effects of outlier handling methods applied to training data sets on eight software effort estimation models (e.g., linear multiple regression, regression trees, random forests, and support vector regression), and we evaluate the effectiveness of our previously proposed data smoothing method. In our experiments, we compare three outlier removal methods (removal based on Cook's distance, TEAK, and Filter-INC) in addition to the data smoothing method. Experimental results showed that the data smoothing method, combined with outlier detection based on Cook's distance or Filter-INC, builds models with good estimation performance.
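Outlier removal with Cook's distance can be illustrated for the simple-linear-regression case: fit the model, compute each point's distance, and drop points above a cutoff. This is a generic sketch, not the paper's exact pipeline; the data are synthetic, and the 4/n cutoff is a common rule of thumb rather than the threshold used in the study.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple regression y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2                                  # parameters: intercept and slope
    mse = sum(r * r for r in resid) / (n - p)
    leverages = (1 / n + (xi - mx) ** 2 / sxx for xi in x)
    return [(r * r / (p * mse)) * (h / (1 - h) ** 2)
            for r, h in zip(resid, leverages)]

# Synthetic effort data on a perfect line, with one injected outlier.
x = list(range(20))
y = [2 * xi + 1 for xi in x]
y[0] += 15                                 # inject an outlier

d = cooks_distance(x, y)
keep = [di < 4 / len(x) for di in d]       # common 4/n rule of thumb
x_clean = [xi for xi, k in zip(x, keep) if k]
y_clean = [yi for yi, k in zip(y, keep) if k]
print(keep[0])  # the injected outlier is flagged for removal
```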