-
Misaki Kojima, Naoki Nishida
Article type: Regular Paper
Subject area: Special Section on Programming
2024 Volume 32 Pages 417-435
Published: 2024
Released on J-STAGE: May 15, 2024
JOURNAL
FREE ACCESS
An all-path reachability (APR, for short) problem of a logically constrained term rewrite system (LCTRS, for short) is a pair of constrained terms representing state sets. An APR problem is demonically valid if every finite execution path from any state in the first set to an irreducible state includes a state in the second set. We have proposed a framework to reduce the non-occurrence of specified error states in a transition system represented by an LCTRS to an APR problem with irreducible constant destinations. In this paper, by focusing on quasi-termination of constrained narrowing, we characterize a class of LCTRSs of which APR problems with constant destinations are decidable. Quasi-termination of a (constrained) term w.r.t. narrowing is the finiteness of the set of reachable narrowed (constrained) terms from the initial (constrained) term up to variable renaming (and equivalence of constraints). To this end, we first introduce an inference rule for disproofs to a proof system for APR problems with constant destinations and formulate (dis)proof trees of APR problems in the style of cyclic proofs. Secondly, to prove correctness of disproof trees, we adapt constrained narrowing to LCTRSs and show soundness of constrained narrowing of LCTRSs w.r.t. constrained rewriting. Thirdly, we show a sufficient condition of LCTRSs for quasi-termination of constrained narrowing starting from linear constrained terms that have no nesting of defined symbols: Rewrite rules are right-linear and nesting-free w.r.t. defined symbols, and the application of rewrite rules for (mutually) recursive defined symbols does not increase the height of resulting constrained terms of narrowing. Finally, we show that if a constrained term is quasi-terminating w.r.t. narrowing of an LCTRS over a sort-wise finite signature with a decidable theory, then the APR problem consisting of the constrained term and an irreducible constant is decidable.
-
Masahiro Yasugi, Kento Emoto, Tasuku Hiraishi
Article type: Regular Paper
Subject area: Special Section on Programming
2024 Volume 32 Pages 436-450
Published: 2024
Released on J-STAGE: May 15, 2024
JOURNAL
FREE ACCESS
Mechanisms for legitimate execution stack access (LESA for short) provide legitimate access to values of callers' variables sleeping deeply in a C-like execution stack. LESA mechanisms enable efficient implementations of high-level services that require dynamic rearrangement of running software, such as garbage collection, first-class continuations, and dynamic load balancing, for implementing efficient and safe high-level languages. As a nested-function-style LESA mechanism, we can create a closure from a nested function definition and indirectly call the nested function via a pointer to the closure, achieving access to values of the variables captured in the closure-creation-time environment. In this study, we propose restartable exception handling mechanisms as new LESA mechanisms, which provide callable exception handlers without non-local exiting; handlers are found on the basis of dynamic scope without requiring pointers. In addition, the LESA mechanisms can be used at multiple levels; e.g., garbage collection can be started during the creation of a first-class continuation. In this paper, we design core languages that feature leveled restartable exception handling mechanisms and discuss their properties and implementations.
-
Bach Nguyen Trong, Kanae Tsushima, Zhenjiang Hu
Article type: Regular Paper
Subject area: Special Section on Programming
2024 Volume 32 Pages 451-465
Published: 2024
Released on J-STAGE: May 15, 2024
JOURNAL
FREE ACCESS
Although many advanced synthesizers can generate unidirectional programs on relations (tables of relational database systems) from user-provided examples, the difficulty of synthesizing bidirectional programs - each being a well-behaved pair comprising a forward program and a backward program - from examples involving tables that have internal functional dependencies, still remains an unsolved issue. In this paper, we propose an approach that leverages the cutting-edge template-based synthesizer, PROSYNTH, to achieve the synthesis of bidirectional programs from examples with functional dependencies. We forward propagate functional dependencies from the source through intermediate relations to the view via a set of atomic forward programs, which together compose a complete forward program. Then, we illustrate how to formulate Datalog templates that encapsulate well-behaved view update strategies with minimal effects, along with additional templates that capture the constraints and effects introduced by the specified functional dependencies. Utilizing these templates, PROSYNTH could be employed to synthesize atomic backward programs that can subsequently be combined to form a complete backward program. We have implemented our proposed approach in a prototype named SYNTHBP and evaluated it on 38 practical benchmarks. Our evaluation shows that SYNTHBP is able to automate 37 of these benchmarks effectively.
-
Atsuki Tamekuri, Saneyasu Yamaguchi
Article type: Regular Paper
Subject area: Special Section on Databases
2024 Volume 32 Pages 466-470
Published: 2024
Released on J-STAGE: May 15, 2024
JOURNAL
FREE ACCESS
Deep neural networks have greatly improved natural language processing and text analysis technologies. In particular, pre-trained large language models have achieved significant improvement. However, it has been argued that they are black boxes and that it is important to provide interpretability. In our previous work, we focused on self-attention and proposed methods for providing and evaluating interpretability. However, the work did not use large language models, and the evaluation method used unusual sentences by deleting words. In this paper, we focus on BERT, which is a popular large language model, and its masking function instead of deleting words. We then show a problem of using this masking function to provide interpretability, which is that the mask token is not neutral for decision. We then propose an evaluation method based on this masking function with training to learn that the mask token is neutral.
-
Bach Nguyen Trong, Kanae Tsushima, Zhenjiang Hu
Article type: Regular Paper
Subject area: Special Section on Databases
2024 Volume 32 Pages 471-486
Published: 2024
Released on J-STAGE: May 15, 2024
JOURNAL
FREE ACCESS
Bidirectional transformations between different representations of related information appear frequently in many different areas like databases, software engineering, and programming languages. A bidirectional program expressing a bidirectional transformation includes a pair of programs - a get that defines a view over a source and a put that translates view updates to source updates - and provides strong guarantees about the well-behavedness of the get and the put. It is known to be challenging to develop bidirectional programs that are both well-behaved and practically useful. In this paper, we propose an approach to synthesizing well-behaved and practical bidirectional programs on relations from user-provided examples and data schemas. We start by synthesizing a get and decomposing it into a set of simple, atomic get_a programs whose corresponding put_a programs exist, thereby reducing the synthesis of (get, put) to sub-syntheses of (get_a, put_a). Then, we solve each sub-synthesis task with a set of well-designed templates and combine all results to form the final bidirectional program. We have implemented our approach in a framework called SYNTHBX and evaluated it on a benchmark suite of 56 tasks from three sources. SYNTHBX successfully synthesizes well-behaved bidirectional programs for 52 tasks, with an average synthesis time of 19 seconds per task and within 3 seconds each for 37 of them, which shows the practical usefulness of our approach.
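As a rough illustration of what "well-behaved" means for a get/put pair, the sketch below checks the two round-trip laws commonly used in the bidirectional-transformation literature (GetPut and PutGet) on a toy selection view. The relation layout, the `dept` parameter, and both function bodies are assumptions made for this example; they are not taken from SYNTHBX.

```python
# Toy source relation: a set of (name, department) rows.
# Hypothetical view: the rows whose department equals `dept`.

def get(source, dept="sales"):
    """Forward program: select the rows that belong to the view."""
    return {row for row in source if row[1] == dept}

def put(source, view, dept="sales"):
    """Backward program: replace the selected rows of the source by the view."""
    if any(row[1] != dept for row in view):
        # Rows outside the selection could never be produced by get,
        # so such a view update is rejected instead of silently dropped.
        raise ValueError("view contains rows outside the selection")
    untouched = {row for row in source if row[1] != dept}
    return untouched | set(view)

source = {("ann", "sales"), ("bob", "hr")}

# GetPut: putting back an unmodified view leaves the source unchanged.
getput_holds = put(source, get(source)) == source

# PutGet: after putting an updated view, get returns exactly that view.
new_view = {("cy", "sales")}
putget_holds = get(put(source, new_view)) == new_view
```

Both flags evaluate to `True` here; a pair that violated either law would not count as well-behaved in this sense.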
-
Yuki Kuwabara, Yu Suzuki
Article type: Regular Paper
Subject area: Special Section on Databases
2024 Volume 32 Pages 487-495
Published: 2024
Released on J-STAGE: May 15, 2024
JOURNAL
FREE ACCESS
Stopwords are generally used to improve the accuracy of document classification and retrieval. We believe that setting appropriate stopwords improves classification accuracy. However, in our preliminary experiments, in document classification tasks using BERT, existing stopword lists are not effective for improving classification accuracy. To solve this problem, we construct a method for generating stopwords using the attention mechanism of the classifiers. In this method, words with high attention in misclassified input documents and low attention in correctly classified documents are treated as stopwords. The system probabilistically removes stopwords. The system automatically sets the probability of each word in input documents being a stopword when it builds the classification model. We conduct experiments to confirm effectiveness of our stopword generation method. Our experimental results show that there are cases using stopwords generated by our method that improve the classification accuracy. Three of the six classification tasks tested in this study show significant differences in accuracy improvement.
-
Yuichi Tokunaga
Article type: Special Issue of Intelligent Transportation Systems and Pervasive Systems that Recreate the Value of Physical Mobility
2024 Volume 32 Pages 496-497
Published: 2024
Released on J-STAGE: June 15, 2024
JOURNAL
FREE ACCESS
-
Hisatomo Hanabusa, Tadashi Komiya, Kyohei Ichinose, Koji Takahashi, Ry ...
Article type: Special Issue of Intelligent Transportation Systems and Pervasive Systems that Recreate the Value of Physical Mobility
Subject area: Invited Papers
2024 Volume 32 Pages 498-508
Published: 2024
Released on J-STAGE: June 15, 2024
JOURNAL
FREE ACCESS
This study aims to develop a nowcast crowd simulation framework using real-time observed data. The framework is a part of the nowcast simulation framework we suggest that can reproduce the present crowd flow of pedestrian space in transportation terminals or city blocks sequentially in real-time. This mechanism aims to understand the crowd situation and avoid the city's concentration on social resource use. The framework consists of the precast process to prepare the input data of the average crowd situation and the nowcast process to reproduce the current crowd situation. This paper is focused on the nowcast process. It suggests a method to understand pedestrian route and OD (Origin to Destination) patterns and to estimate OD demands as the input of the crowd simulation model. The applicability for practical use is confirmed by discussing the framework's requirements and the demonstration in the real field. Finally, the discussions of the nowcast, forecast, and backcast process in the city using the framework are shared as future work.
-
Shota Horisaki, Kazushige Matama, Katsuhiro Naito, Hidekazu Suzuki
Article type: Special Issue of Intelligent Transportation Systems and Pervasive Systems that Recreate the Value of Physical Mobility
Subject area: Wireless/Mobile Networks
2024 Volume 32 Pages 509-519
Published: 2024
Released on J-STAGE: June 15, 2024
JOURNAL
FREE ACCESS
CYber PHysical overlay network over Internet Communication (CYPHONIC) has been proposed as a communication architecture that simultaneously achieves communication connectivity and mobility transparency in a mixed IPv4/IPv6 environment. Using CYPHONIC, applications running on mobile devices and IoT devices can realize end-to-end encrypted communication across an overlay network. However, if firewalls installed on the communication path between end nodes do not allow the CYPHONIC protocol, the overlay network cannot be constructed. This paper proposes CYPHONIC-over-QUIC, which integrates QUIC, a standardized general-purpose transport protocol designed for web communications, into CYPHONIC to provide end-to-end encrypted communications that can pass through firewalls and NATs. We implemented CYPHONIC-over-QUIC on two Raspberry Pi 4s and Linux servers running on AWS EC2, and evaluated its communication performance using the actual Internet environment. As a result, we confirmed that the signaling process at the start of communication does not affect the application communication and that the throughput performance is equivalent to that of the conventional CYPHONIC.
-
Ayumu Harada, Akihito Hiromori, Hirozumi Yamaguchi
Article type: Special Issue of Intelligent Transportation Systems and Pervasive Systems that Recreate the Value of Physical Mobility
Subject area: Mobile Computing
2024 Volume 32 Pages 520-532
Published: 2024
Released on J-STAGE: June 15, 2024
JOURNAL
FREE ACCESS
Japan has been severely impacted by natural disasters including but not limited to earthquakes such as the Great Hanshin-Awaji Earthquake in 1995 and the Kumamoto Earthquake in 2016. These seismic events have underscored the high number of casualties that result from individuals becoming trapped in collapsed buildings or affected by fires, thereby accentuating the need for building-specific earthquake assessments. Although experts have performed detailed analyses using automated satellite imagery and UAV-captured photos for longer-term objectives such as secondary disaster prevention, reconstruction, and insurance claim verification, these require a substantial amount of time to implement. This paper introduces two methods that employ unmanned aerial vehicles (UAVs) to rapidly detect anomalous building structures, allowing for the simultaneous observation of multiple buildings. First, we present a method for identifying building structures from incomplete three-dimensional point clouds acquired by UAVs that move faster over the target area in a limited time for rapid assessment. The method efficiently identifies the structural characteristics of buildings by operating under certain geometric assumptions, such as the angles between building sides being approximately 90 degrees and vertical consistency in building shape. We also present a method for identifying collapsed buildings by extracting features from point clouds as Fisher vectors and normal histograms and using a machine-learning model for detection. In our evaluations, we have shown that, even when observations are limited to less than half of the building structure, the first method can successfully recognize the geometric shape of 70% of the undamaged buildings. In the experiments for the second method, both feature extraction methods achieved Receiver Operating Characteristic Area Under the Curve (ROC-AUC) values greater than 0.99.
-
Cher Yen Tan, Ryotaro Akagawa, Tatsuya Yamazaki, Motohiko Yamazaki
Article type: Regular Paper
Subject area: Special Section on Consumer Device & System
2024 Volume 32 Pages 533-542
Published: 2024
Released on J-STAGE: June 15, 2024
JOURNAL
FREE ACCESS
Lung cancer is the most common type of cancer and is the leading cause of cancer-related deaths in Japan. Regarding lung cancer diagnosis, a pivotal aspect in lung cancer treatment is tumor detection and selection of appropriate cancer treatment. Computed Tomography (CT) images are usually used to detect tumors. After tumor detection, identifying mutations in the Epidermal Growth Factor Receptor (EGFR) gene is essential for cancer treatment selection. The EGFR gene is a key factor associated with cancer cell proliferation, and its mutation appears both inside and around the tumors. This paper proposes a lung cancer diagnostic system designed to streamline the process from tumor detection to EGFR gene mutation identification. The proposed system consists of three modules: an input interface module, an automated lung tumor segmentation module, and an EGFR mutation prediction module. The system is characterized in that all modules are consistently based on image processing. Consequently, the proposed system enables users to acquire the diagnosis results of tumor detection as well as EGFR gene mutation prediction by just providing the input CT image data and the patients' clinical features. Our experimental results confirm that the system achieves performance levels comparable to existing research, both in terms of lung tumor segmentation accuracy and the precision of EGFR mutation predictions.
-
Koga Toriumi, Takeshi Kamiyama, Masato Oguchi, Saneyasu Yamaguchi
Article type: Special Issue of Intelligent Transportation Systems and Pervasive Systems that Recreate the Value of Physical Mobility
Subject area: Special Section on Consumer Device & System
2024 Volume 32 Pages 543-551
Published: 2024
Released on J-STAGE: June 15, 2024
JOURNAL
FREE ACCESS
Smartphones can be used in portrait or landscape mode, and their operating systems detect the device orientation and automatically adjust the screen view orientation. However, the adjustment policy is pessimistic, and the delay between device rotation and screen view rotation is large. We believe that this delay may degrade the user experience. In this paper, we propose a method to predict device rotation in the near future based on acceleration using a support vector machine, and to quickly adjust the screen view orientation based on the prediction results. We then show that the method can predict device rotation with high or non-low accuracy (with low false positive and false negative probabilities) and that the method can adjust the screen view orientation with a short delay time through our experimental results on three to six subjects using a smartphone with the Android operating system installed with the proposed method.
-
Shunya Oguchi, Shoji Yuen
Article type: Regular Paper
Subject area: Special Section on Programming
2024 Volume 32 Pages 552-564
Published: 2024
Released on J-STAGE: July 15, 2024
JOURNAL
FREE ACCESS
We present bidirectional data flow analysis for constant propagation in the concurrent reversible intermediate language, CRIL, proposed by the authors. A CRIL program consists of basic blocks, each with one instruction enclosed by entry and exit labels. In CRIL, basic blocks are executed concurrently in forward and backward directions, keeping causal safety and causal liveness as the fundamental correctness for reversibility. To optimize CRIL programs, we focus on bidirectional data flow among basic blocks. We propose data flow analysis among basic blocks to reveal data flow conflicts in both directions. The bidirectional data flow analysis enables backward constant propagation in addition to the forward one. We define directed edges labeled with variables among basic blocks for possible forward data flows and give directed edges for possible backward data flows using the edges of forward data flows. It is shown that the forward and backward data flow edges cover all the data flows in executing the program in either direction. Based on this soundness property, if a constant propagates by the data flow edges, then it propagates in all concurrent executions of both directions. We show bidirectional constant propagation in a CRIL example by the data flow analysis.
-
Yuuichi Teranishi
Article type: Special Issue of “Applications and the internet” in conjunction with the main topics of COMPSAC 2023
2024 Volume 32 Pages 565
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
-
Daiki Nobayashi, Renju Akashi, Kazuya Tsukamoto, Takeshi Ikenaga, Myun ...
Article type: Special Issue of “Applications and the internet” in conjunction with the main topics of COMPSAC 2023
Subject area: Invited Papers
2024 Volume 32 Pages 566-574
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
We propose a floating cyber-physical system (F-CPS) as a novel platform for data distribution and application execution to realize a CPS. The F-CPS constructs a local-oriented and distributed CPS using devices with computing capability close to the user, such as smartphones, sensor devices, vehicles, and roadside units. This paper first provides an overview of the F-CPS and then discusses the challenges posed by the spatio-temporal data retention system (STD-RS), a network technology for data distribution and maintenance in the local area of the F-CPS. To address the challenges associated with the STD-RS, we propose adaptive data transmission control to effectively retain the STD in a certain area. Finally, we performed network simulations to demonstrate the effectiveness of the proposed adaptive method.
-
Hangli Ge, Takashi Michikata, Noboru Koshizuka
Article type: Special Issue of “Applications and the internet” in conjunction with the main topics of COMPSAC 2023
Subject area: Algorithm Theory
2024 Volume 32 Pages 575-585
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
In this paper, a method of multi-weighted graph learning for passenger count prediction in railway networks is presented. Traffic prediction can provide significant insights for railway system optimization, urban planning, smart city development, etc. However, affected by the complexity of the railway networks as well as other factors, including spatial, temporal, and other external ones, traffic prediction on railway networks remains a critical task. To achieve high prediction performance of the models and discover the correlation between the models' performance and features, we propose six heterogeneous weight graphs, i.e., the connection graph, distance graph, correlation graph, and their fused graphs, to fully construct the spatial and geometrical features of the railway network. Two representative graph neural networks (GNNs), that is, the graph convolutional network (GCN) and the graph attention network (GAT), were implemented for evaluation. The evaluation results demonstrate that the proposed GAT model learning on the correlation graph achieves the best performance, as it can reduce the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) on average by 19.7%, 6.9%, and 27.9%, respectively. Moreover, to investigate the trade-off between the graph scale and the prediction performance, we improved our model to search for the optimal K value of neighbor nodes for graph partition. The K-neighboring method shows that a small-scale graph can also achieve promising prediction results. The effectiveness of the proposal, including the multiple weight graphs and the optimal K value for neighboring, was also investigated. It can provide interpretability for traffic prediction tasks on the railway network.
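For readers unfamiliar with the three error metrics named in this abstract, the following minimal sketch computes MAE, RMSE, and MAPE using their standard textbook definitions. The function names and the sample passenger counts are illustrative assumptions and do not come from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no true value is zero; otherwise the ratio is undefined.
    """
    ratios = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)

# Hypothetical passenger counts for three stations.
observed = [100, 200, 400]
predicted = [110, 190, 400]

errors = (mae(observed, predicted),
          rmse(observed, predicted),
          mape(observed, predicted))
```

Lower is better for all three. MAPE is scale-free, which is one reason it is often reported alongside MAE and RMSE when station volumes differ widely.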
-
Jie Yin, Yutaka Ishikawa, Atsuko Takefusa
Article type: Special Issue of “Applications and the internet” in conjunction with the main topics of COMPSAC 2023
Subject area: System Software Design and Implementation
2024 Volume 32 Pages 586-595
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
Cybersecurity has always been a challenging topic, along with the accelerating growth in the number of connected Internet of Things (IoT) devices and their heterogeneity. System monitoring, as one of the predominant security hardening approaches, is often introduced to IoT systems for detecting anomalous activities and ongoing intrusions. System auditing is one of the fundamental approaches for implementing such systems. However, most of the existing monitoring techniques for IoT systems heavily rely on network traffic analysis. In the previous work, we emphasized the device endpoint itself, proposed a flexible and extensible monitoring framework for Linux-based IoT systems, and presented the feasibility and performance evaluation of the framework by implementing a monitoring prototype and an IoT application simulating a real-world surveillance scenario on an ARM device. In this work, we further improved the implementation of the monitoring prototype and introduced Linux control groups (a.k.a. cgroups) for meticulous resource management between the monitoring components and the application processes. By conducting a series of comparative evaluation experiments under different CPU isolation and scheduling methods, our experimental results showcased the significance of the CPU isolation and process scheduling methods in terms of performance, and the minimal overhead cost of the proposed monitoring framework on IoT devices.
-
Kyosuke Teramoto, Tomoki Haruyama, Takuru Shimoyama, Fumihiko Kato, Hi ...
Article type: Special Issue of “Applications and the internet” in conjunction with the main topics of COMPSAC 2023
Subject area: Embedded System Technology
2024 Volume 32 Pages 596-604
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
With the recent development of smart homes and IoT, sensing of daily activities has been attracting attention. Wi-Fi CSI-based methods have been studied as a method for activity recognition because they can reduce costs and protect privacy by not showing the identity of people. In a previous study, it was reported that CSI can recognize simple human motions indoors. However, in the previous studies, the CSI-based methods were not able to sufficiently recognize the various activities of people in their daily lives. In addition, the training data is often manually annotated, and the training cost is high. Therefore, in this study, we performed action recognition using semi-supervised learning with a 1D-CNN on three receivers for everyday actions. The proposed method first uses a 1D-CNN model to identify which room the subject is in (global learning) and then uses semi-supervised learning based on the FixMatch method (local learning) to recognize more detailed actions. Although FixMatch was originally designed for image classification, the proposed method is implemented in such a way that it is effective even for time-series data by using a data expansion method for time-series data. In basic experiments, we confirmed that the proposed method achieves an accuracy of 98% for global learning to determine rooms, and an accuracy higher than that of conventional supervised learning for local learning to recognize detailed actions.
-
Tatsuya Sato, Taku Shimosawa, Nariyoshi Yamai
Article type: Special Issue of “Applications and the internet” in conjunction with the main topics of COMPSAC 2023
Subject area: Distributed Systems Operation and Management
2024 Volume 32 Pages 605-617
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
Enterprises have focused much of their attention on consortium blockchains, in which multiple authorized organizations participate to form a consortium, in contrast to public blockchains, which consist of unspecified participants. Since a system using a blockchain is based on participants providing/using resources such as nodes to each other, a mechanism to motivate participants to provide resources is important. Typically, a public blockchain realizes an incentive mechanism for providing nodes by unspecified participants through mining rewards and transaction fees associated with a consensus algorithm such as PoW on the platform layer. On the other hand, a consortium blockchain usually does not have an incentive mechanism in the platform itself. In order to maintain and operate a consortium, it is essential that participating organizations provide resources or pay the costs of providing resources. However, because participating organizations use different percentages of the resources, a uniform fee could create inequalities or unfair situations among organizations. Therefore, it is necessary to manage billing to remove the gap between the costs borne and the actual usage and/or the unfairness among participants in the consortium. This research aims to realize a billing management method for consortium blockchain-based systems. We propose a smart contract-based service billing management mechanism named “BillingOpsSC” to ensure cross-organizational transparency for billing calculation and management. Furthermore, to ensure fairness among participants, we design a billing calculation model based on metering of the “contribution” by each organization, such as providing resources, and the system “usage” by each organization in the consortium.
We implement a prototype of the proposed method including the billing management mechanism and calculation model for Hyperledger Fabric, which is one of the major consortium blockchains, and evaluate the method using the prototype. The results show that our proposed method is effective in ensuring transparency and fairness in billing management among multiple organizations, and in terms of performance is practical enough to be used in making monthly billing calculations.
-
Masataka Kakinouchi, Kazumasa Omote
Article type: Regular Paper
Subject area: Security and Society
2024 Volume 32 Pages 618-627
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
Information security measures are essential for users of information and communication technology (ICT). This necessity may lead to mental exhaustion (i.e., security fatigue) due to the troublesome and time-consuming measures. It has already been shown that security fatigue reduces the effectiveness of information security measures. However, there are no studies on security fatigue for the visually impaired, so there is a problem that the results of studies conducted for the sighted cannot be applied to the visually impaired. It is necessary to propose methods to reduce the degree of fatigue, not only for those with low levels of security measures, but also for those with high fatigue, because the levels of security measures may decrease as fatigue accumulates. In this study, we analyze, for visually impaired university students only, the relationship between the degree of security fatigue and the degree of taking security measures. From this point of view, we propose two new security education methods that will contribute to an increase in the levels of taking security measures for the visually impaired. Thus, in response to these security education methods, we engage in problem-posing activities during individual interviews and analyze the vital aspects of screen-reader education.
-
Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, Sadao Kurohashi
Article type: Regular Paper
Subject area: Natural Language Processing
2024 Volume 32 Pages 628-640
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
Lecture transcript translation helps learners understand online courses; however, building a high-quality lecture machine translation system lacks publicly available parallel corpora. To address this, we examine a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera. To create the parallel corpora, we propose a dynamic programming based sentence alignment algorithm which leverages the cosine similarity of machine-translated sentences. The sentence alignment F1 score reaches 96%, which is higher than using the BERTScore, LASER, or sentBERT methods. For both English-Japanese and English-Chinese lecture translations, we extracted parallel corpora of approximately 50,000 lines and created development and test sets through manual filtering for benchmarking translation performance. Through machine translation experiments, we show that the mined corpora enhance the quality of lecture transcript translation when used in conjunction with out-of-domain parallel corpora via multistage fine-tuning. Furthermore, this study also suggests guidelines for gathering and cleaning corpora, mining parallel sentences, cleaning noise in the mined data, and creating high-quality evaluation splits. For the sake of reproducibility, we have released the corpora as well as the code to create them.
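The idea of aligning sentences by dynamic programming over cosine similarities can be sketched as follows. This is a generic monotonic one-to-one aligner written for illustration, not the algorithm from the paper: the bag-of-words `cosine` function, the `threshold` parameter, and the assumption that the source side has already been machine-translated into the target language are all choices made for this example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sentences under a bag-of-words model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def align(src, tgt, threshold=0.5):
    """Return index pairs (i, j) of a monotonic 1-to-1 alignment that
    maximizes the summed similarity; pairs below `threshold` are skipped."""
    n, m = len(src), len(tgt)
    sim = [[cosine(s, t) for t in tgt] for s in src]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gain = sim[i - 1][j - 1] if sim[i - 1][j - 1] >= threshold else 0.0
            score[i][j] = max(score[i - 1][j - 1] + gain,  # align i with j
                              score[i - 1][j],              # skip source i
                              score[i][j - 1])              # skip target j
    # Backtrack from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = sim[i - 1][j - 1]
        if s >= threshold and score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

# Hypothetical data: `translated` stands in for machine-translated source lines.
translated = ["the cat sat on the mat", "welcome to the course", "thank you"]
target = ["welcome to this course", "intro music", "thank you"]
pairs = align(translated, target)
```

On this toy input the aligner pairs the second translated line with the first target line and the two "thank you" lines with each other, leaving the unmatched lines out. A practical system would replace the bag-of-words similarity with a stronger sentence representation and would likely allow one-to-many alignments.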
-
Mohammad Soltanian, Keivan Borna
Article type: Regular Paper
Subject area: Image Processing
2024Volume 32 Pages
641-651
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
The classification of encoded frame level descriptors is a common approach to video event recognition. However, the structural incorporation of visual temporal clues to the encoding process is often ignored, resulting in reduced recognition accuracy. In this paper, a spatio-temporal video encoding method is proposed that improves the accuracy of video event recognition. By fine-tuning the Convolutional Neural Network (CNN) concept score extractors, pre-trained on ImageNet, frame level descriptors are computed. The descriptors are then encoded and normalized and are fed to the classifier to discriminate between the video events. The main contribution here is to use the temporal dimension of video signals to construct a spatio-temporal vector of locally aggregated descriptors (VLAD) encoding scheme. The proposed encoding is shown to be in the form of a non-convex constrained optimization problem with ℓ0 norm terms, which is transformed, by a Gaussian approximation, into a smoothed version. This makes the cost function differentiable and overcomes the non-smoothness challenge. Compared to the state-of-the-art video event recognition schemes, the proposed method achieves comparable performance as examined over two public datasets: Columbia consumer video (CCV) and Kinetics-400.
-
Vijay Daultani, Héctor Javier Vázquez Martínez, Naoaki Okazaki
Article type: Regular Paper
Subject area: Special Section on Databases
2024Volume 32 Pages
652-666
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
The success of Language Models (LMs) on a variety of NLP tasks has prompted the design and analysis of natural language benchmarks to evaluate their fitness for particular applications. In this work, we focus on the NLP task of acceptability rating, whereby a given model must rate the ‘goodness’ of a series of tokens. We find the current commonly used datasets to benchmark for LM sentence acceptability fail to capture the distribution of naturally occurring written data. Moreover, we find that the bias toward shorter (5-8 word) sentences is a strong confounding factor that contributes positively to LMs' performance. We then introduce seven datasets collected from the NLP literature that closely follow the sentence length distribution of naturally occurring written text. In our experiments, when sentence length is controlled by adjusting the distribution to match naturally occurring data, we observe a performance drop for current commonly used datasets of up to 48 points in MCC. We conclude with a discussion on implications for current applications and recommendations to improve our current commonly used acceptability benchmarking datasets.
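For readers unfamiliar with the metric named in this abstract, the sketch below computes the Matthews correlation coefficient (MCC) from a binary confusion matrix using its standard definition. The function name and the convention of returning 0.0 when the denominator vanishes are choices made here, and reading "points" as MCC multiplied by 100 is an assumption that the abstract does not state.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for binary labels, in the range [-1, 1]."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
```

Under the assumed scaling, a drop of 48 points would correspond to a change of 0.48 on this scale.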
-
Noriaki Matsumura, Takahiro Hasegawa
Article type: Regular Paper
Subject area: Special Section on digital practices
2024Volume 32 Pages
667-677
Published: 2024
Released on J-STAGE: August 15, 2024
JOURNAL
FREE ACCESS
Three Management System Standards (MSS) published by the International Organization for Standardization (ISO) are applicable to organizations providing IT services: the MSS of Information Security, Service, and Business Continuity (3MSS). Operating 3MSS without integrating processes, including Risk Assessments (RA), may result in duplication of processes and inconsistency in assessment results. Although the ISO provides examples for integrating MSS requirements, it does not provide specifics on how to integrate RA, which are the core elements of MSS. Studies related to the integration of MSS have not yet revealed any methods for integrating RA. Here, we devise and present a method for integrating RA in 3MSS, called the Integrated Risk Assessment Method for Three Management System Standards (IRA-3MSS). The Business Impact Analysis (BIA), which is required by the Business Continuity Management System (BCMS) shows priorities of IT services to be followed. The IRA-3MSS incorporates those priorities as parameters into the integrated RA method for the Information Security Management System (ISMS), and the Service Management System (SMS). The case study results showed that the duplication of RA processes in 3MSS could be avoided using IRA-3MSS. Because IRA-3MSS also integrates the calculation of risk levels for assets and IT services through an established formula, inconsistency in assessment results did not occur. These results demonstrate the effectiveness of IRA-3MSS and provide a novel perspective for studies related to MSS integration.
-
Mitsuaki Akiyama
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
2024Volume 32 Pages
678
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
-
Shogo Shiraki, Hayato Kimura, Takanori Isobe
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: Security Infrastructure
2024Volume 32 Pages
679-689
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
The global outbreak of COVID-19 has rapidly increased the use of video conferencing systems, highlighting the importance of end-to-end encryption (E2EE) technology for protecting user privacy. This paper analyzes the security of SFrame, an E2EE technology used in Cisco Webex and similar systems. Specifically, we assess the security of SFrame as described in two versions published by the IETF: an older version known as draft-omara-sframe-01 and the latest version at the time of writing this paper, draft-ietf-sframe-enc-03. The draft-omara-sframe-01 had signature vulnerabilities pointed out by Isobe et al., and the draft-ietf-sframe-enc-03 includes updates addressing these issues. First, we analyze the older version of SFrame, draft-omara-sframe-01, and propose new attacks that allow us to manipulate more plaintext than the attacks identified by Isobe et al. Next, we study the latest version, draft-ietf-sframe-enc-03, showing that many of the attacks proposed by Isobe et al. and our new attacks are still effective against this latest version, even when it uses the same signature process as the older version.
-
Ren Ishibashi, Kazuki Yoneyama
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: Security Infrastructure
2024Volume 32 Pages
690-709
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
Authenticated Key Exchange (AKE) is a cryptographic protocol to share a common session key among multiple parties. At ISC 2021, Liu, Tang, and Zhou proposed a modular multi-factor AKE framework resilient to a characteristic attack called server compromise impersonation based on big data in the bounded-retrieval model and concrete post-quantum big data-based AKE schemes. They also formulated a security model (LTZ model) that captures perfect forward secrecy, key compromise impersonation, and server compromise impersonation. However, the LTZ model does not consider the compromise of ephemeral secret keys, and their schemes rely on the random oracle model. In this paper, we extend the LTZ model (LTZ-eCK model) to capture the compromise of ephemeral secret keys and propose a generic construction of big data-based AKE resilient to both server compromise impersonation and ephemeral key leakage in the standard model. Our generic construction allows us to achieve the post-quantum big data-based AKE scheme (from isogenies, lattice, etc.) in the LTZ-eCK model without random oracles.
-
Yoshiyuki Sakamaki, Takeru Fukuoka, Junpei Yamaguchi, Masanobu Morinag ...
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: Security Infrastructure
2024Volume 32 Pages
710-718
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
A social network is a social structure formed by the participants and their relationships. There are many studies and research papers regarding the various social networks. We regard a peer-to-peer supply chain network as a social network. On social networks, nodes' (participants') metrics regarding reputation and reliability are helpful information for forming and improving their relationships. Many researchers have proposed various mathematical models and node metrics on social networks. A network modeled as an edge-weighted directed graph is called a weighted signed network (WSN). We assume each node subjectively evaluates other nodes related to itself by scores and therefore refer to them as subjective scores. We obtain a weighted signed network by relating subjective scores to edge weights. There are many studies of methods to calculate the reputation and reliability metrics of nodes from the viewpoint of a whole network by using these subjective scores. However, subjective scores of each node tend to be confidential information for a person and organization. Nodes therefore wish to keep their scores confidential. This paper proposes exponentially convergent scores called 2-fairness and 2-goodness, for nodes of weighted signed networks and proposes a secure rating computation for them that keeps each node's subjective scores and related information secret.
-
Koji Shima, Hiroshi Doi
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: Security Infrastructure
2024Volume 32 Pages
719-730
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
Hierarchical secret-sharing schemes (HSSSs) are known for the manner in which they share a secret among a group of participants partitioned into levels. First, we developed a fast HSSS that requires a minimal number of higher-level participants for recovering the secret. Next, in HSSSs, depending on the allocation of participant identities, there exists a specific case in which the secret cannot be recovered in spite of the presence of an authorized subset. For practical use, this traditional allocation issue should be eliminated in advance by adopting some simple condition. In this paper, we propose an XOR-based HSSS that can be applied to any level. It theoretically resolves the allocation issue for threshold k=3 with a simple condition and is therefore ideal for practical use. We also present an evaluation of our software implementation.
-
Qingxin Mao, Daisuke Makita, Michel van Eeten, Katsunari Yoshioka, Tsu ...
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: Network Security
2024Volume 32 Pages
731-747
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
Carpet bombing-type DDoS attacks, which target a wide network range rather than a single IP address, have threatened the Internet. Some researchers have investigated the characteristics of single-target DDoS attacks. Still, much less is known about the characteristics of carpet bombing, or even about the differences between the two. In this paper, we profile the characteristics of carpet bombing using data from amplification DDoS honeypots and examine the differences between single-target DRDoS attacks and carpet bombing. We analyze attacks highly concentrated on a specific network in terms of victims, duration, number of packets, ports, and TTLs. Our analysis at the level of Autonomous Systems demonstrates that carpet bombing attacks target more hosting networks, including some critical targets, than single-target attacks. We found carpet bombing attacks targeting more “Corporate” networks. We also found that each IP address targeted by carpet bombing receives fewer packets than one targeted by a single-target DRDoS attack. According to the comparison of attack duration and TTL, carpet bombing lasted longer and showed diverse TTL values in the packets. In contrast, most single-target DRDoS attacks have a single TTL value in the packets. This implies carpet bombing has a higher probability of originating from multiple sources. Finally, comparing ports shows that using various ports for Carpet Bombing is highly proportional to single-target DRDoS attacks.
-
Ngoc Minh Phung, Mamoru Mimura
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: Network Security
2024Volume 32 Pages
748-756
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
Malicious JavaScript detection using machine learning models has shown many great results over the years. However, real-world data contains only a small fraction of malicious JavaScript. Many previous techniques ignore most of the benign samples and focus on training a machine learning model with a balanced dataset. This paper continues the previous work (Phung and Mimura, 2023): it uses a support vector machine (SVM) and a multi-layer perceptron (MLP) as classifiers and trains the models with a Doc2Vec-based filter that can quickly classify JavaScript malware using Natural Language Processing (NLP) and feature re-sampling. In this paper, the total features of the benign samples are reduced using a combination of word vectors and a clustering model. Random seed oversampling generates new malicious training data based on the original training dataset. We evaluate our models with a dataset of over 30,000 samples obtained from top popular websites, PhishTank, and GitHub. The experimental results show that abstract syntax tree (AST) parsing has the greatest effect on improving the detection scores.
-
Takayuki Miura, Masanobu Kii, Toshiki Shibahara, Kazuki Iwahana, Tetsu ...
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: Security and Society
2024Volume 32 Pages
757-766
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
Synthetic data generation techniques are promising for anonymizing high-dimensional tabular datasets, and their privacy protection can be evaluated by membership inference attacks. However, the existing evaluation framework has limitations from two perspectives: (1) it cannot evaluate the worst-case because a target sample is chosen randomly; and (2) the decision criterion of an adversary's inference is black box since the adversary conducts membership inference by using machine learning models. In this paper, we propose a framework to overcome the above limitations in a simple and clear fashion. To cope with limitation (1), we introduce a statistical distance to choose a vulnerable target sample. To cope with limitation (2), we propose two interpretable inference methods. One is a method with typical statistics scores, and the other is a method with the Euclidean distance from the target sample. We conduct extensive experiments on two datasets and five synthesis algorithms to confirm the effectiveness of our framework. The experiments show that our framework enables us to evaluate privacy in synthetic data generation techniques more tightly.
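One of the interpretable inference methods named in this abstract relies on the Euclidean distance from the target sample. A minimal sketch of that general idea follows, under assumptions the abstract does not confirm: the adversary looks at the nearest synthetic record and guesses "member" when it lies within a fixed threshold. The function names, the nearest-neighbor rule, and the threshold value are all hypothetical.

```python
import math


def nearest_distance(target, synthetic):
    """Euclidean distance from the target record to its closest synthetic record."""
    return min(math.dist(target, row) for row in synthetic)


def infer_membership(target, synthetic, threshold):
    """Guess that the target was in the training data if a synthetic record is close enough.

    The decision rule (nearest neighbor plus threshold) is an assumption for illustration.
    """
    return nearest_distance(target, synthetic) <= threshold


synthetic = [(0.0, 0.0), (3.0, 4.0)]
close_guess = infer_membership((0.0, 1.0), synthetic, threshold=1.5)
far_guess = infer_membership((10.0, 10.0), synthetic, threshold=1.5)
```

Because the rule is a single distance comparison, the reason behind each guess can be read off directly, which is the kind of interpretability the abstract contrasts with black-box machine learning adversaries.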
-
Tomohiko Yano, Hiroki Kuzuno
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: System Security
2024Volume 32 Pages
767-778
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
In recent years, open-source software (OSS) has become a mainstream technology essential to information systems. However, its secure application requires a comprehensive understanding of its various security risks. One of them is vulnerability risk. A vulnerability risk involves the discovery of a new vulnerability in the OSS in use, which security administrators must address immediately, for example through software updates. On the other hand, developmental risks involve OSS that is not in active development. If the development of an OSS is stalled, an alternative OSS should be considered because newly identified vulnerabilities may not be fixed. Therefore, a specialized method is required to analyze the vulnerability and developmental risks of OSS while accounting for their dependencies. This paper proposes a method that identifies such security risks of OSS by extracting, linking, and visualizing the vulnerabilities, development status, and dependency information. The proposed method enables security administrators to check visualization results, identify OSS with security risks, and consider appropriate countermeasures. We experimentally evaluate the adequacy of the visualizations for identifying security risks, and measure the processing time required to visualize the risks.
-
Huang Shu-Pei, Takuya Watanabe, Mitsuaki Akiyama, Tatsuya Mori
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: System Security
2024Volume 32 Pages
779-788
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
This study aims to explore the privacy concerns associated with static hard-coded URLs used by virtual reality (VR) applications. Among these URLs, we identified destinations leading to advertising and analytics services, which may raise user privacy concerns. Our primary focus was on external libraries or sources that developers import, including privacy-related URLs. Through our measurement study of the VR device Oculus Go, we used a proposed categorization methodology to disclose popular advertising and analytics sources. The results revealed the presence of potential privacy threats and their implications for user rights. Consequently, we discuss possible improvements for VR manufacturers to enhance future VR experiences.
-
Takayuki Sasaki, Jia Wang, Kazumasa Omote, Katsunari Yoshioka, Tsutomu ...
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: System Security
2024Volume 32 Pages
789-800
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
In recent years, Ethereum, which is a leading application for realizing blockchain services, has received much attention for its usability and functionality. Ethereum executes smart contracts and arbitrary programmable calculations, in addition to cryptocurrency trading. However, cyberattacks target misconfigured Ethereum clients with application programming interface (API) enabled, specifically JSON-RPC. Herein, we propose EtherWatch, a framework to detect and analyze malicious and/or suspicious Ethereum accounts using three data sources (a honeypot, an internet-wide scanner, and a blockchain explorer). The honeypot, named Etherpot, leverages a proxy server placed between a real Ethereum client and the internet. It modifies client responses to attract attackers, identifies malicious accounts, and analyzes their behaviors. Using scan results from Shodan, we also detect suspicious Ethereum accounts registered on multiple nodes. Finally, we utilize Etherscan, a well-known blockchain explorer, to track and analyze the activities of the detected accounts. During six weeks of observations, we discovered 538 hosts attempting to call JSON-RPC of our honeypots using 41 types of methods, including a type of unreported attack in the wild. Specifically, we observed account hijacking, mining, and smart contract attacks. We detected 16 malicious accounts using the honeypots and 64 suspicious accounts from the Shodan scan results, with five overlapping accounts. Finally, from Etherscan, we collected records of activities related to the detected accounts, including transactions of 21.50 ETH and mining of 22.61 ETH (equivalent to 39,494 US$ and 41,533 US$, respectively, as of June 9, 2023).
-
Takuya Watanabe, Eitaro Shioji, Mitsuaki Akiyama, Tatsuya Mori
Article type: Special Issue of Cybersecurity Technologies for Secure Supply-Chain
Subject area: System Security
2024Volume 32 Pages
801-816
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
Intermediary web services such as web proxies, translators, and archives have become pervasive as a means to enhance the openness of the web. These services aim to remove the intrinsic obstacles to web access; i.e., access blocking, language barriers, and missing websites. In this study, we refer to these services as web rehosting services and make the first exploration of their security flaws. Web rehosting services use a single domain name to rehost several websites that have distinct domain names; this characteristic makes web rehosting services intrinsically vulnerable to violating the same origin policy if not operated carefully. Based on the intrinsic vulnerability of web rehosting services, we demonstrate that an attacker can perform five different types of attacks that target users of web rehosting services: persistent man-in-the-middle attack, abusing privileges to access various resources, stealing credentials, stealing browser history, and session hijacking/injection. Our extensive analysis of 21 web rehosting services, which have more than 200 million accesses per day, revealed that these attacks are feasible. In response to this observation, we provide effective countermeasures against each type of attack. Through our efforts, several major web services have successfully completed improvements to make their architecture more secure.
-
Kenji Hisazumi
Article type: Special Issue of Embedded Systems Engineering
2024Volume 32 Pages
817
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
-
Yoshitada Takaso, Nao Yoshimura, Hiroshi Oyama, Hiroaki Takada, Takuya ...
Article type: Special Issue of Embedded Systems Engineering
Subject area: Software Analysis and Design
2024Volume 32 Pages
818-828
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
As Internet of Things (IoT) technology continues to advance, the necessity for enhanced reliability in embedded systems becomes increasingly paramount. At the same time, the use of reliable multiprocessor real-time operating systems (RTOSs) is becoming increasingly prevalent. However, the extensive development effort and limited reusability of multiprocessor RTOS-based development remain hindrances to their use. To solve this problem, this paper proposes a component framework for reliable multiprocessor RTOSs. In this paper, OS functionalities are modularized into components, facilitating flexible processor and memory allocation based on component specifications. We also provide plugins for automated file generation from these specifications, and a GUI application for configuring time partitioning. Performance evaluations using test programs demonstrate that our framework maintains the target OS's main functionalities and performance, while enabling component-based extensions.
-
Yixiao Xing, Yixiao Li, Hiroaki Takada
Article type: Special Issue of Embedded Systems Engineering
Subject area: Architecture
2024Volume 32 Pages
829-843
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
The adoption of multi-core platforms in embedded systems is growing steadily. Because current RTOSs mainly target dual cores, they tend to encounter performance bottlenecks as the number of cores keeps increasing. However, existing benchmark tools cannot effectively identify these issues. In this paper, we propose a benchmark methodology with multi-core specific factors taken into consideration. We further implement it on an experiment platform based on the TOPPERS/FMP RTOS kernel and 4-core Xilinx ZCU-104 board. The evaluation results show that our benchmark can successfully assess the performance characteristics under different kernel lock designs and various workload scenarios. We also reveal new insights on the possible challenges to implement an efficient multi-core RTOS.
-
Siqi Peng, Akihiro Yamamoto
Article type: Regular Paper
Subject area: Application Systems
2024Volume 32 Pages
844-860
Published: 2024
Released on J-STAGE: September 15, 2024
JOURNAL
FREE ACCESS
Concept lattice reduction is an important task related to formal concept analysis (FCA), a technique for knowledge extraction from binary relational data. Concept lattice reduction aims to approximate the input of FCA, called formal contexts, into simpler ones so that the volume and complexity of the output of FCA, called formal concepts, will also be reduced. A widely-preferred strategy for the task is object reduction, which approximates the input context by merging similar objects in it. While many methods were developed based on this strategy, we found that they may have two problems. First, the “approximate” context generated by such methods may introduce too many unnecessary modifications to the original one, making it hard to be considered a proper approximation. Second, these methods may unexpectedly insert or eliminate some concepts after reduction, which may cause significant changes to the knowledge extracted by FCA. To solve these problems, we introduce a new method using integer linear programming, which translates concept lattice reduction into an integer linear optimization problem and can suppress the changes caused to the input contexts and extracted concepts by adding corresponding linear constraints. We conduct experiments on several data sets to prove that our method works.
-
Takuya Maekawa
Article type: Special Issue of Ubiquitous Computing Systems (XII)
2024Volume 32 Pages
861-862
Published: 2024
Released on J-STAGE: October 15, 2024
JOURNAL
FREE ACCESS
-
Teerawat Kumrai, Zesheng Cai, Takuya Maekawa, Takahiro Hara, Kazuya Oh ...
Article type: Special Issue of Ubiquitous Computing Systems (XII)
Subject area: Mobile Computing
2024Volume 32 Pages
863-872
Published: 2024
Released on J-STAGE: October 15, 2024
JOURNAL
FREE ACCESS
This paper proposes a deep learning-based angle-of-arrival (AoA) estimation method using Wi-Fi channel state information (CSI). In MIMO-OFDM systems, data are transmitted using orthogonal subcarriers at different frequencies using multiple transmit-receive antenna pairs. However, because different subcarriers have different wavelengths, the impact of environmental noise on the CSI of the subcarriers is also different in real environments. In addition, we utilize compressed CSI data, involving the reduction of dimensionality from raw CSI data. This introduces a possibility of losing information about individual antennas in some subcarriers, potentially impacting the accuracy of AoA estimation. In this study, we propose to selectively use optimal subcarriers for estimating the AoA using a deep neural network. Our designed neural network, named AoA-Net, has an end-to-end architecture that automatically performs subcarrier selection using CSI data as input and then performs AoA estimation using information regarding the selected subcarriers. In addition, we employ multi-task learning to efficiently acquire knowledge that assists in selecting optimal subcarriers. To the best of our knowledge this is the first CSI-based study that estimates the AoA using end-to-end deep learning based on subcarrier selection. We demonstrate the effectiveness of our method using data collected in real environments, and our method exhibits state-of-the-art performance when using leave-one-environment-out cross-validation.
-
Kazuma Iwase, Tomoyuki Takenawa
Article type: Regular Paper
Subject area: Machine Learning & Data Mining
2024Volume 32 Pages
873-880
Published: 2024
Released on J-STAGE: October 15, 2024
JOURNAL
FREE ACCESS
Recent advances in numerical simulation methods based on physical models and their combination with machine learning have improved the accuracy of weather forecasts. However, the accuracy decreases in complex terrains such as mountainous regions because these methods usually use grids of several kilometers square and simple machine learning models. While deep learning has also made significant progress in recent years, it is difficult for its direct application to utilize the physical knowledge used in the simulation. This paper proposes a method that uses machine learning to interpolate future weather in mountainous regions using forecast data from surrounding plains and past observed data to improve weather forecasts in mountainous regions. We focus on mountainous regions in Japan and predict temperature and precipitation mainly using LightGBM as a machine learning model. Despite the use of a small dataset, through feature engineering and model tuning, our method partially achieves improvements in the RMSE with significantly less training time.
-
Chengguang Gan, Qinghao Zhang, Tatsunori Mori
Article type: Regular Paper
Subject area: Natural Language Processing
2024Volume 32 Pages
881-893
Published: 2024
Released on J-STAGE: October 15, 2024
JOURNAL
FREE ACCESS
The automation of resume screening is a crucial aspect of the recruitment process in organizations. Automated resume screening systems often encompass a range of natural language processing (NLP) tasks. This paper introduces a novel Large Language Models (LLMs) based agent framework for resume screening, aimed at enhancing efficiency and time management in recruitment processes. Our framework is distinct in its ability to efficiently summarize and grade each resume from a large dataset. Moreover, it utilizes LLM agents for decision-making. To evaluate our framework, we constructed a dataset from actual resumes and simulated a resume screening process. Subsequently, the outcomes of the simulation experiment were compared and subjected to detailed analysis. The results demonstrate that our automated resume screening framework is 11 times faster than traditional manual methods. Furthermore, by fine-tuning the LLMs, we observed a significant improvement in the F1 score, reaching 87.73%, during the resume sentence classification phase. In the resume summarization and grading phase, our fine-tuned model surpassed the baseline performance of the GPT-3.5 model. Analysis of the decision-making efficacy of the LLM agents in the final offer stage further underscores the potential of LLM agents in transforming resume screening processes.
-
Daiki Watanabe, Takumi Tada, Kazuya Haraguchi
Article type: Regular Paper
Subject area: Information Mathematics
2024Volume 32 Pages
894-902
Published: 2024
Released on J-STAGE: November 15, 2024
JOURNAL
FREE ACCESS
Suppose that we are given a tuple $(G, I, \sigma)$ of an undirected graph $G=(V, E)$, an itemset $I=\{1, 2, \ldots, q\}$, and a mapping $\sigma: V \to 2^I$. For $S \subseteq V$, let us denote $I_\sigma(S) := \bigcap_{v \in S} \sigma(v)$. We say that $S$ induces a support-closed connected subgraph if $S$ is maximal with respect to the connectivity and the common itemset (i.e., the subgraph $G[S]$ induced by $S$ is connected and, for any $v \in V \setminus S$, $G[S \cup \{v\}]$ is disconnected or $I_\sigma(S \cup \{v\}) \subsetneq I_\sigma(S)$). Several algorithms have been developed for enumerating all support-closed connected induced subgraphs, where some of them are shown to be polynomial-amortized or polynomial-delay. However, a theoretical complexity bound does not necessarily represent the empirical performance of an algorithm. In this paper, we compare the empirical performance of four enumeration algorithms named COPINE (branch-and-bound), COOMA (dynamic programming), FT (family-tree) and BP (binary partition), respectively. We observe that, for random instances, COOMA is generally the fastest, followed by BP, and that these two algorithms are more robust to parameters of random instances than the other two. We then extend BP to a problem that arises from bacterial GWAS (Genome-Wide Association Studies) and observe that its implementation is much faster than an existing implementation of CALDERA, a previous algorithm.
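The definition in this abstract can be checked directly for a single vertex set. The sketch below tests whether a given set S induces a support-closed connected subgraph; it is a definition checker only and does not reproduce any of the four enumeration algorithms compared in the paper. The adjacency-dictionary representation, the function names, and the choice to treat the empty set as not connected are assumptions.

```python
from collections import deque


def is_connected(adj, S):
    """True if the subgraph induced by S is connected (empty S is treated as not connected)."""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in S and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == S


def common_items(sigma, S):
    """Common itemset of S: the intersection of sigma(v) over all v in S."""
    return set.intersection(*(set(sigma[v]) for v in S))


def is_support_closed(adj, sigma, S):
    """Check the maximality condition from the definition for every vertex outside S."""
    S = set(S)
    if not is_connected(adj, S):
        return False
    base = common_items(sigma, S)
    for v in set(adj) - S:
        T = S | {v}
        # Adding v can only shrink the common itemset, so equality means "not strictly smaller".
        if is_connected(adj, T) and common_items(sigma, T) == base:
            return False
    return True


# Path graph 1 - 2 - 3 with item sets attached to each vertex.
adj = {1: [2], 2: [1, 3], 3: [2]}
sigma = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"a"}}
```

In this toy instance, {1, 2} is support-closed because adding vertex 3 shrinks the common itemset from {a, b} to {a}, whereas {1} is not, since vertex 2 can be added without losing any item.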
-
Tianjia Ni, Kento Sugiura, Yoshiharu Ishikawa, Kejing Lu
Article type: Regular Paper
Subject area: Data Models and Database Design
2024 Volume 32 Pages 903-915
Published: 2024
Released on J-STAGE: November 15, 2024
JOURNAL
FREE ACCESS
In recent years, efficient query processing in databases has become more crucial with the sophistication of analysis requirements. Approximate query processing (AQP) is one of the approaches to dealing with database queries on big data. In this research, we focus on synopsis construction on a relational database and the query technology based on it, called bounded approximate query processing (BAQ). This paper points out the limitations of existing research on BAQ and addresses them with the proposed BAQ±. BAQ± is capable of dealing with datasets over a broader range while ensuring the exact error bound. Furthermore, compared to the original BAQ, BAQ± generates a more compact synopsis under various data distributions. We introduce an innovative bucketing approach to construct smaller synopses while keeping the same properties as in BAQ. Additionally, we propose novel rewrite methods for answering online queries by deriving error guarantee conditions. We provide extensive experimental assessments using datasets with different distributions. Our BAQ± provides a smaller synopsis, at 64% of the size of that of BAQ, and efficiently executes online queries within the exact error bound.
-
Naoya Yokoyama, Kiyofumi Tanaka
Article type: Regular Paper
Subject area: Network Service Basics
2024 Volume 32 Pages 916-928
Published: 2024
Released on J-STAGE: November 15, 2024
JOURNAL
FREE ACCESS
When offering services such as e-commerce on the cloud, it is necessary to adjust the amount of server resources provided in response to irregularly increasing and decreasing traffic. This need arises when there is a desire to maintain a constant level of service while, at the same time, minimizing costs as much as possible. There is an abundance of prior research regarding the prediction of future loads from past time-series data. Many of these approaches rely on traditional time-series forecasting, which necessitates that the data used for learning adhere to stationary or unit root processes, or they use deep learning approaches that require a vast amount of data and parameters. In this study, we propose a traffic forecasting method that includes dynamic window size changes that can follow even slight variations in trends. This method incorporates fuzzy-entropy-based burst traffic detection into regression estimation with sliding-window learning. In the evaluation, we conduct four comparative experiments using actual traffic rather than simulations. As a result, compared to the baseline, the proposed method reduced the number of request failures and improved the Mean Squared Error between the ideal and the actual container count by 26.4 points on average.
-
Yuma Dose, Shuichiro Haruta, Takahiro Hara
Article type: Regular Paper
Subject area: Network Services
2024 Volume 32 Pages 929-937
Published: 2024
Released on J-STAGE: November 15, 2024
JOURNAL
FREE ACCESS
Graph-based recommendation models, which utilize a user-item interaction graph whose edges represent users' preferences, are well known for their effectiveness. However, many existing models are intrinsically transductive, meaning they can make recommendations only for users and items that exist in the training data. To address this limitation, some researchers focus on recommendations in an inductive scenario where new users/items emerge in the inference phase. In this paper, we propose a contrastive learning framework for the inductive scenario, INductive Contrastive Learning (INCL). The situation where new users/items emerge can be interpreted as changes in the interaction graph. Therefore, it is crucial to capture the change in the interaction graph for inductive recommendations. Based on this perspective, INCL creates an augmented graph by adding or removing edges of the original interaction graph and thereby simulates changes in the interaction graph. By performing contrastive learning, INCL encourages the representations generated from the original interaction graph and from the augmented graph to be similar. As a result, INCL is trained to generate robust representations that can adapt to changes in the interaction graph. Experimental results on real-world datasets demonstrate that INCL outperforms existing models in the inductive scenario, achieving up to 2.13% improvement in Recall@20.
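The idea of building an augmented graph by adding or removing edges can be illustrated with a short sketch. This is not INCL's actual augmentation procedure, which the abstract does not specify; uniform random edge dropping and insertion, as well as the function name and the `drop_rate`, `add_rate` and `seed` parameters, are assumptions made purely for illustration.

```python
import random


def augment_edges(edges, users, items, drop_rate=0.1, add_rate=0.1, seed=0):
    """Return a perturbed copy of a user-item edge set.

    Each original edge is dropped independently with probability drop_rate,
    and about add_rate * |edges| previously absent user-item pairs are added.
    Uniform random sampling is an illustrative assumption, not INCL's method.
    """
    rng = random.Random(seed)
    original = set(edges)
    # Iterate in sorted order so the result is reproducible for a given seed.
    kept = {e for e in sorted(original) if rng.random() >= drop_rate}
    n_add = int(add_rate * len(original))
    candidates = [
        (u, i)
        for u in sorted(users)
        for i in sorted(items)
        if (u, i) not in original
    ]
    added = set(rng.sample(candidates, min(n_add, len(candidates))))
    return kept | added


# Hypothetical toy interaction graph.
users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3"]
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3")]
augmented = augment_edges(edges, users, items, drop_rate=0.25, add_rate=0.5)
```

In a contrastive setup, representations computed from `edges` and from `augmented` would form the two views that the training objective pulls together.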
-
Mariko Chiba, Wataru Yamada, Keiichi Ochiai
Article type: Regular Paper
Subject area: User Interfaces and Interactive Systems
2024 Volume 32 Pages 938-947
Published: 2024
Released on J-STAGE: November 15, 2024
JOURNAL
FREE ACCESS
In oral communication, especially public speaking, it is essential to speak at a speed that is appropriate for the situation. However, speech control requires substantial training. Although several speech-training systems that provide automated feedback on users' speech quality or behavior have been developed, users are still required to consciously control their ways of speaking to improve their speech. This study proposes a speech-supporting system that enables users to speak at a pace close to the target rate with minimum conscious adjustment. Because auditory feedback on the speaker's voice with a short delay disturbs speaking, we used auditory feedback with continuously varying delay times to slow the speaker's speech rate when speaking fast. We implemented a prototype and conducted a user study with ten speakers to confirm the effects on the speech style of the speaker during public speaking. The results show that the proposed system can slow the speech rate when the user speaks quickly without instructing the speaker on how to respond to auditory feedback. The findings also suggest that the proposed system causes less discomfort to the speaker than delayed auditory feedback with a constant delay.
-
Makoto Miyazaki, Hiroyoshi Watanabe, Mieko Masaka, Kumiko Takai
Article type: Regular Paper
Subject area: Education
2024 Volume 32 Pages 948-962
Published: 2024
Released on J-STAGE: November 15, 2024
JOURNAL
FREE ACCESS
In higher education, the assessment and development of generic skills are important tasks that complement the learning of specialized knowledge. This study presents several requirements for generic skills assessment systems based on an analysis of the characteristics of generic skills assessment activities at universities. Specifically, an appropriate system should be able to deal with the parent-child relationship within the assessment indexes because generic skills comprise various subordinate skills. Moreover, the system must be interoperable with other educational systems such as learning management systems. We developed a self-assessment system that meets these requirements and conforms with two 1EdTech technical standards, i.e., CASE (Competencies and Academic Standards Exchange) and LTI (Learning Tools Interoperability). The developed system was used 50 times for generic skills assessment activities with different groups of students in distinct contexts within the Department of Information and Electronic Engineering at Teikyo University. The results of these real-world applications demonstrate that the system worked well in the activities and was well received by students, indicating that it can be employed as an assessment system at universities with 4-year curriculums.