Recently, Transformer-based pre-trained language models have achieved great success in natural language processing. As a result, there is growing interest in software engineering in applying pre-trained language models to large amounts of program code. For example, CodeT5, a T5 model pre-trained on program code, has shown significant performance improvements in various software development tasks such as code generation, code summarization, and code classification. However, building such models requires a huge amount of computing resources and time, which puts it out of reach for many researchers and developers. We propose a method of continued pretraining that adapts multilingual T5 to Python. This study reports that the proposed method improves performance on tasks such as code generation and error diagnosis.
In software development, the ability to predict fault-prone modules, that is, modules likely to contain bugs, with high accuracy leads to more efficient testing and debugging. To improve prediction accuracy, prior work has studied removing outliers from the training data of prediction models, since such outliers adversely affect prediction. In this paper, we propose a more robust outlier removal method for cross-version prediction that identifies and removes outliers in the training data using a third-party dataset obtained from projects other than the one being predicted. The results of evaluation experiments show that the proposed method improves prediction accuracy for the majority of projects and is more effective than existing outlier removal methods such as MOA and CC-MOA.
We propose a method for localizing defects in programming exercises based on automatic program repair. Defects are localized by comparing a learner's program with model answers at the level of program segments, which are sequences of statements with no branches, and replacing the learner's segments with segments from the model answers until all test cases pass. We implemented a prototype tool for localizing defects and confirmed that it can find defects well enough for practical use.
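The segment-replacement idea above can be illustrated with a minimal sketch. It is not the authors' tool: it assumes that the learner's program and a single model answer have already been split into the same number of positionally aligned, branch-free segments that share variable names, and it adds a revert step (our own simplification) to discard replacements that turn out to be unnecessary. The names `run_segments`, `passes_all`, and `localize_defects` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Segment = str                      # a branch-free run of statements, as source text
TestCase = Tuple[Dict, Dict]       # (input variables, expected output variables)


def run_segments(segments: Sequence[Segment], inputs: Dict) -> Dict:
    """Execute the segments in order on a copy of the inputs; return the final variables."""
    env = dict(inputs)
    for seg in segments:
        exec(seg, env)             # toy interpreter: fine for a sketch, unsafe for untrusted code
    return env


def passes_all(segments: Sequence[Segment], tests: Sequence[TestCase]) -> bool:
    """True if every test case produces the expected output variables."""
    for inputs, expected in tests:
        try:
            env = run_segments(segments, inputs)
        except Exception:
            return False
        if any(env.get(name) != value for name, value in expected.items()):
            return False
    return True


def localize_defects(learner: Sequence[Segment],
                     model: Sequence[Segment],
                     tests: Sequence[TestCase]) -> List[int]:
    """Return indices of learner segments that must be replaced for the tests to pass."""
    if len(learner) != len(model):
        raise ValueError("this sketch assumes positionally aligned segments")
    patched = list(learner)
    replaced: List[int] = []
    # Step 1: replace differing segments one by one until all tests pass.
    for i in range(len(patched)):
        if passes_all(patched, tests):
            break
        if patched[i] != model[i]:
            patched[i] = model[i]
            replaced.append(i)
    # Step 2 (assumption, not from the abstract): revert replacements that were not needed.
    for i in list(replaced):
        patched[i] = learner[i]
        if passes_all(patched, tests):
            replaced.remove(i)
        else:
            patched[i] = model[i]
    return replaced


learner = ["total = c + b + a", "avg = total / 2", "result = round(avg, 1)"]
model = ["total = a + b + c", "avg = total / 3", "result = round(avg, 1)"]
tests = [({"a": 1, "b": 2, "c": 3}, {"result": 2.0}),
         ({"a": 3, "b": 3, "c": 3}, {"result": 3.0})]
print(localize_defects(learner, model, tests))
```

In this toy example the first segment differs from the model answer only in operand order, so the revert step drops it and only the segment that divides by 2 is reported.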
This study proposes a framework for evaluating the reliability of obfuscating transformations applied to program code. An obfuscating transformation is considered reliable if it makes program code harder to analyze while preserving its functionality. The proposed framework applies obfuscating transformations to a collection of programs, executes their test cases, and measures two quantities: the ratio of obfuscated programs that pass the test cases, and the mean distance between the opcode sequences before and after obfuscation. We conducted two experiments to evaluate the reliability of 43 existing obfuscating transformations implemented in two well-known obfuscation tools, Tigress and Obfuscator-LLVM. The framework revealed combinations of obfuscating transformations that did not preserve the functionality of programs, even though each transformation worked properly on those programs when applied individually.
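The two measurements named above can be sketched as follows. This is an illustration under stated assumptions rather than the framework itself: the abstract does not say which distance is used, so Levenshtein edit distance over opcode mnemonics is assumed here, the mean is taken over all obfuscated programs regardless of test outcome, and the opcode sequences and test verdicts are assumed to have been collected beforehand. `ObfuscationResult`, `edit_distance`, and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two opcode sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


@dataclass
class ObfuscationResult:
    original_opcodes: List[str]     # opcode sequence before obfuscation
    obfuscated_opcodes: List[str]   # opcode sequence after obfuscation
    passed_tests: bool              # did the obfuscated program pass its test cases?


def evaluate(results: Sequence[ObfuscationResult]) -> Tuple[float, float]:
    """Return (ratio of programs passing their tests, mean opcode distance)."""
    if not results:
        raise ValueError("no results to evaluate")
    pass_ratio = sum(r.passed_tests for r in results) / len(results)
    mean_distance = sum(
        edit_distance(r.original_opcodes, r.obfuscated_opcodes) for r in results
    ) / len(results)
    return pass_ratio, mean_distance


results = [
    ObfuscationResult(["mov", "add", "ret"], ["mov", "xor", "add", "jmp", "ret"], True),
    ObfuscationResult(["mov", "ret"], ["mov", "ret"], False),
]
print(evaluate(results))
```

A high pass ratio indicates that functionality is preserved, while a larger mean distance suggests that the code was changed more substantially; the two values are meant to be read together.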
In IoT development, boilerplate implementations of communication protocols, device operations, and so on are frequently needed. Consequently, when a patch is prepared for a particular defect, similar patches may need to be applied to other code fragments. In this study, we propose an approach that, given a single patch for a defect, detects code clones of the defect and generates patches for them. In a case study, we extracted 26 cases in which code clones of the defects existed from a dataset of IoT defects and applied the proposed approach to them. The approach successfully generated patches for all 26 cases.
Dynamically typed languages, such as JavaScript, require more memory than statically typed languages because they determine data types at runtime and must store type information in memory along with each value. However, even in dynamically typed languages, collections such as arrays are often used to store values of the same type. To reduce memory usage, we implemented a technique called “storage strategy” for arrays that store only values of the same type. This technique holds the type information in the collection itself rather than storing it with each individual value. To reduce memory usage further, we also introduced smaller element types for arrays, such as 1-byte integers. We implemented our proposal in a JavaScript virtual machine designed specifically for embedded systems and evaluated its performance.
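The storage-strategy idea can be illustrated with a small sketch. It is written in Python rather than inside a JavaScript virtual machine, so it shows only the general mechanism and not the authors' implementation: the array keeps one type tag for the whole collection, stores homogeneous values in a compact typed buffer (starting with 1-byte integers), and falls back to a generic list once a value of another type is stored. The class `StrategyArray` and the strategy names are hypothetical.

```python
from array import array

# Type codes of the standard array module for each compact strategy.
TYPECODES = {"int8": "b", "int64": "q", "float64": "d"}


class StrategyArray:
    """Array that holds a single type tag for all elements while they are homogeneous."""

    def __init__(self):
        self.strategy = "empty"
        self.storage = []

    @staticmethod
    def _strategy_for(value):
        """Pick the most compact strategy that can hold this one value."""
        if type(value) is int:
            if -128 <= value <= 127:
                return "int8"
            if -2**63 <= value < 2**63:
                return "int64"
        if type(value) is float:
            return "float64"
        return "generic"

    def _switch(self, new):
        """Move the existing elements into the storage used by the new strategy."""
        if new == self.strategy:
            return
        old = list(self.storage)
        code = TYPECODES.get(new)
        self.storage = array(code, old) if code else old
        self.strategy = new

    def append(self, value):
        wanted = self._strategy_for(value)
        if self.strategy == "empty":
            self._switch(wanted)
        elif wanted != self.strategy:
            # Small and large integers widen to int64; any other mix becomes generic.
            widened = "int64" if {wanted, self.strategy} == {"int8", "int64"} else "generic"
            self._switch(widened)
        self.storage.append(value)

    def __getitem__(self, index):
        return self.storage[index]

    def __len__(self):
        return len(self.storage)


a = StrategyArray()
for v in (1, 2, 3):
    a.append(v)
print(a.strategy, a.storage.itemsize)   # compact 1-byte integer storage
a.append(1000)
print(a.strategy)                        # widened to 8-byte integers
a.append("x")
print(a.strategy, list(a))               # generic fallback keeps all values
```

The design choice mirrored here is that type information lives once per collection, so the per-element cost is just the raw value as long as the array stays homogeneous.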
We propose a subsystem of the concurrent separation logic with fractional permissions introduced by Brotherston et al. Separation logic is an extension of Hoare logic for reasoning about programs that use shared mutable data. It has a separating conjunction, which asserts that its subformulas hold for separate (disjoint) parts of the heap. Fractional permissions manage access permissions to resources shared between concurrent threads. Brotherston et al. introduced an extension of concurrent separation logic with fractional permissions, but the decidability of the logic has not yet been discussed. The key idea of this paper is to restrict the formulas of the system to symbolic heaps. We present examples illustrating that our system is suitable for proving entailments about data structures such as list segments with cycles. We eliminate permissions by normalization, which reduces the entailment checking problem to an existing decidable entailment checking problem.