Journal of Information Processing

Efficient Pattern Matching over Out-of-Order Event Streams Using Sliding Buffer

Rina Trisminingsih, Savong Bou, Toshiyuki Amagasa

Article type: Regular Paper
Subject area: Special Section on Databases
2024Volume 32 Pages 963-972
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.963

JOURNAL FREE ACCESS

Show abstractHide abstract

Pattern matching over event streams has been widely used in many modern applications. A key technology tailored for pattern matching in streaming settings is Complex Event Processing (CEP). The most common CEP-based systems assume that input events arrive in total order based on their occurrence timestamps. However, this assumption fails in real-world applications for several reasons, such as network latency, machine failure, causing out-of-order events. Consequently, the current CEP systems face challenges to ensure accuracy and timeliness in pattern matching. To solve this problem, a number of out-of-order event stream handling approaches have been proposed. This paper introduces a new approach to out-of-order event stream handling for nondeterministic finite automaton (NFA)-based pattern matching. The key idea is to employ a sliding buffer structure that utilizes the sliding window concept and temporal indexing to incrementally evaluate the predefined pattern over streams. Further, we present a new matching algorithm that supports an invalidation mechanism for incorrect prior pattern matches affected by out-of-order events. We conduct extensive experimentation with synthetic and real-world datasets, and the results confirm that our proposed approach can deal with out-of-order event streams in pattern matching better than the state-of-the-art approach, especially in computational latency.

View full abstract

Download PDF (33411K)
PhiSN: Phishing URL Detection Using Segmentation and NLP Features

Eint Sandi Aung, Hayato Yamana

Article type: Regular Paper
Subject area: Special Section on Databases
2024Volume 32 Pages 973-989
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.973

JOURNAL FREE ACCESS

Show abstractHide abstract

Phishing is a cyberattack method employed by malicious actors who impersonate authorized personnel to illicitly obtain confidential information. Phishing URLs constitute a form of URL-based intrusion designed to entice users into disclosing sensitive data. This study focuses on URL-based phishing detection, which involves identifying phishing webpages by analyzing their URLs. We primarily address two unresolved issues of previous research: (i) insufficient word tokenization of URLs, which leads to the inclusion of meaningless and unknown words, thereby diminishing the accuracy of phishing detection, (ii) dataset-dependent lexical features, such as phish-hinted words extracted exclusively from a given dataset, which restricts the robustness of the detection method. To solve the first issue, we propose a new segmentation-based word-level tokenization algorithm called SegURLizer. The second issue is addressed by dataset-independent natural language processing (NLP) features, incorporating various features extracted from domain-part and path-part of URLs. Then, we employ the segmentation-based features with SegURLizer on two neural network models - long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM). We then train the 36 selected NLP-based features on deep neural networks (DNNs). Afterward, we combine the outputs of the two models to develop a hybrid DNN model, named “PhiSN, ” representing phishing URL detection through segmentation and NLP features. Our experiments, conducted on five open datasets, confirmed the higher performance of our PhiSN model compared to state-of-the-art methods. Specifically, our model achieved higher F1-measure and accuracy, ranging from 95.30% to 99.75% F1-measure and from 95.41% to 99.76% accuracy, on each dataset.

View full abstract

Download PDF (3766K)
Measuring Local and Shuffled Privacy of Gradient Randomized Response

Marin Matsumoto, Tsubasa Takahashi, Seng Pei Liew, Masato Oguchi

Article type: Regular Paper
Subject area: Special Section on Databases
2024Volume 32 Pages 990-1002
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.990

JOURNAL FREE ACCESS

Show abstractHide abstract

Local differential privacy (LDP) provides a strong privacy guarantee in a distributed setting such as federated learning (FL). When a central curator deploys local randomizers satisfying ε₀-LDP, how can we confirm and measure the given privacy guarantees at clients? To answer the above question, we introduce an empirical privacy test in FL clients by measuring the lower bounds of LDP, which gives us empirical ε₀ and probability that the two gradients can be distinguished. To audit the given privacy guarantees (i.e., ε₀), we first discover a worst-case scenario that reaches the theoretical upper bound of LDP, which is essential to empirically materialize the given privacy guarantees. We further instantiate several adversaries in FL under LDP to observe empirical LDP at various attack surfaces. The empirical privacy test with those adversary instantiations enables FL clients to understand how the given privacy level protects them more intuitively and verify that mechanisms claiming ε₀-LDP provide equivalent privacy protection. We also demonstrate numerical observations of the measured privacy in these adversarial settings, and the randomization algorithm LDP-SGD is vulnerable to gradient manipulation and a maliciously well-manipulated model. We further discuss employing a shuffler to measure empirical privacy in a collaborative way and also measuring the privacy of the shuffled model. Our observation suggests that the theoretical ε in the shuffle model has room for improvement.

View full abstract

Download PDF (933K)
Practical Persistent Multi-word Compare-and-Swap Algorithms for Many-core CPUs

Kento Sugiura, Manabu Nishimura, Yoshiharu Ishikawa

Article type: Regular Paper
Subject area: Special Section on Databases
2024Volume 32 Pages 1003-1012
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1003

JOURNAL FREE ACCESS

Show abstractHide abstract

In the last decade, academic and industrial researchers have focused on persistent memory because of the development of the first practical product, Intel Optane. One of the main challenges of persistent memory programming is to guarantee consistent durability over separate memory addresses, and Wang et al. proposed a persistent multi-word compare-and-swap (PMwCAS) algorithm to solve this problem. However, their algorithm contains redundant compare-and-swap (CAS) and cache flush instructions and does not achieve sufficient performance on many-core CPUs. This paper proposes a new algorithm to improve performance on many-core CPUs by removing useless CAS/flush instructions from PMwCAS operations. We also exclude dirty flags, which help ensure consistent durability in the original algorithm, from our algorithm using PMwCAS descriptors as write-ahead logs. Experimental results show that the proposed method is up to ten times faster than the original algorithm and suggests several productive uses of PMwCAS operations.

View full abstract

Download PDF (1248K)
Qobra: Accelerating Transactional Serializability Verifier by Quantum Annealing

Natsuki Hamada, Kazuhiro Saito, Hideyuki Kawashima

Article type: Regular Paper
Subject area: Special Section on Advanced Computing Systems
2024Volume 32 Pages 1013-1022
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1013

JOURNAL FREE ACCESS

Show abstractHide abstract

Serializability is a standard that guarantees the correct execution of database operations. Since cloud databases do not always guarantee serializability, users must check for serializability on their own. However, what operations are executed in the cloud databases is a black box for users, making it difficult for them to make a judgment. This is a combinatorial optimization problem called the “black-box serializability problem, ” which is a satisfiability problem known to be NP-complete. A seminal work called Cobra has proposed an architecture that solves this problem using the SMT solver, a general-purpose solver for the satisfiability problem. Still, as the number of transactions increases, the search space will expand exponentially, so it becomes challenging to determine serializability. On the other hand, quantum annealing is excellent for fast-solving combinatorial optimization problems and can be applied to this satisfiability problem. This paper proposes a fast solver for the black-box serializability problem using quantum annealing extending Cobra. The evaluation results show that the proposed method is 751-890 times faster than Cobra.

View full abstract

Download PDF (474K)
A Message Reduction Scheme Based on Geographical Information in Initiative-evacuation Induction Using Social Graphs

Sora Okamoto, Tomoya Kawakami

Article type: Regular Paper
Subject area: Special Section on Consumer Device & System
2024Volume 32 Pages 1023-1032
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1023

JOURNAL FREE ACCESS

Show abstractHide abstract

In Japan, tsunamis and landslides are frequently caused as secondary disasters after earthquakes, and early evacuation is an important issue to reduce the damage. In this study, the authors propose a method to reduce the number of evacuation messages based on geographical information while maintaining the time to complete evacuation using social graphs. We define evacuation target areas and evacuate only designated targets within these areas. The proposed method relays evacuation status only when there is a person (node) within the evacuation area. This method reduces the number of messages while maintaining the evacuation completion time. The evaluation results confirmed that the proposed method can reduce the number of messages during evacuation while maintaining the time required to complete the evacuation, compared to the case where all nodes outside the evacuation target area relay the evacuation status and the case where no relay is performed.

View full abstract

Download PDF (3157K)
Privacy-protective Distributed Machine Learning between Rich Devices and Edge Servers

Saki Takano, Akihiro Nakao, Saneyasu Yamaguch, Masato Oguchi

Article type: Regular Paper
Subject area: Special Section on Consumer Device & System
2024Volume 32 Pages 1033-1043
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1033

JOURNAL FREE ACCESS

Show abstractHide abstract

The utilization of data collected by edge devices, including personal information, in machine learning has emerged as a significant trend in recent years. In most distributed machine learning methods, like Federated Learning, data or training results are typically aggregated and managed on high-performance servers. However, transferring users' personal information to an external server can raise privacy concerns due to the potential risk of data leakage. To tackle this issue, we propose a distributed machine learning model that offers robust privacy protection, allowing users to decide whether or not to share personal data with the server. In the proposed model, the edge device takes over the training at the edge server and sends only the results for which the user has given permission to the edge server for integration. To validate the effectiveness of the proposed model, we performed experiments on facial image recognition using Jetson Nano as an edge device. The experimental results confirmed that edge devices were capable of utilizing personal information in a short time, while the edge server achieved enhanced accuracy by integrating multiple training results. Thus, the results show that the proposed model enables the safe and efficient utilization of data collected by edge devices.

View full abstract

Download PDF (1289K)
Comfortable PandA: Student-Developed Extension for Enhancing the Usability of Sakai LMS at Kyoto University

Kazuki Takeda, Shoji Kajita

Article type: Regular Paper
Subject area: Special Section on digital practices
2024Volume 32 Pages 1044-1055
Published: 2024
Released on J-STAGE: November 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1044

JOURNAL FREE ACCESS

Show abstractHide abstract

The COVID-19 pandemic brought a rapid shift to online education at higher educational institutions, significantly increasing the usage of Learning Management System (LMS). Although LMS is a feature-rich educational platform, it is primarily designed to meet the needs of faculty and instructors, and its usability from the student's perspective is often overlooked. This paper presents Comfortable PandA, a web browser extension developed by a student to enhance the usability of Kyoto University's LMS “PandA” based on the open source Sakai LMS. The extension aims to improve the user interface of the LMS by providing new features, such as a list of assignments and test&quizzes and a notification system for new assignments. In this paper, the extension's development, features, and collaboration with Kyoto University for its official recognition and regulation for future enhancements are described in detail. An evaluation based on student surveys is also presented to demonstrate the extension's effectiveness. Notably, the extension's usage has been increasing since its release in May 2020, and it is currently used by over 75% of undergraduate students at Kyoto University. This paper highlights the importance of student-led development in enhancing the usability of the LMS and emphasizes the critical role of integrating perspectives from the student community in the evolution of the LMS.

View full abstract

Download PDF (3205K)
Editor's Message to Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities

Midori Inaba

Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
2024Volume 32 Pages 1056
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1056

JOURNAL FREE ACCESS

Download PDF (33K)
Development and Trial Application of an Improved MRC-EDC Method for Risk Assessment of Attacks on Humans by Generative AI

Ryoichi Sasaki, Kenta Onishi, Yoshihiro Mitsui, Masato Terada

Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
Subject area: Evaluation and Management
2024Volume 32 Pages 1057-1065
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1057

JOURNAL FREE ACCESS

Show abstractHide abstract

The authors previously proposed classifying the relationship between AI and security into four types: attacks using AI, attacks by AI, attacks to AI, and security measures using AI. Subsequently, generative AI such as ChatGPT has become widely used. Therefore, we examined the impact of the emergence of generative AI on the relationship between AI and security and demonstrated a pressing need for countermeasures against attacks by generative AI. The authors then categorized three types of attacks from generative AI to humans: “Terminator, ” “2001: A Space Odyssey, ” and “Mad Scientist, ” and proposed potential countermeasures against them. The MRC-EDC method developed earlier by the authors aimed to optimize the combination of countermeasures, but it was not suitable for this subject due to its full-quantitative approach, necessitating rigorous cost and risk estimation. Consequently, we developed an improved MRC-EDC method that partially incorporates a semi-quantitative approach and conducted a trial to propose countermeasures against attacks by generative AI. As a result, five cost-effective countermeasures were identified, confirming the effectiveness of this method.

View full abstract

Download PDF (7031K)
Man-in-the-Portal: Breaking SSL/TLS Silently Abusing Captive Portal

Keiichiro Kimura, Hiroki Kuzuno, Yoshiaki Shiraishi, Masakatu Morii

Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
Subject area: Wireless/Mobile Networks
2024Volume 32 Pages 1066-1081
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1066

JOURNAL FREE ACCESS

Show abstractHide abstract

The proliferation of public hotspots has led to the use of captive portals to protect hotspots and ensure their appropriate use. Captive portals control external sessions until user authentication at the hotspot is complete. One feature of captive portals is that these can redirect the authenticated user to an arbitrary website. However, cyber-attacks have been reported that exploit captive portals and there is an urgent need to improve the protocol of captive portals. In this paper, we reveal a critical flaw in the captive portal protocol and propose a man-in-the-middle attack that exploits the flaw to disable SSL/TLS. We name this attack Man-in-the-Portal (MITP). The attack is the first to exploit the post-authentication redirection of a captive portal as a starting point to disable SSL/TLS. The attacker can easily eavesdrop on and tamper with a victim device's communications. Our attack is also feasible without requiring any special privileges or tools. To demonstrate the effectiveness and practicality, we evaluate the MITP attack on five commercially available wireless devices as our proof-of-concept, and show that the attack poses significant threats. Furthermore, we analyze the root causes of the MITP attack and present protocol-level countermeasures to improve the security of wireless communications.

View full abstract

Download PDF (22695K)
Investigate the Countermeasure to Mitigate the Privacy Leakage in DNS over QUIC

Guannan Hu, Kensuke Fukuda

Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
Subject area: Network Security
2024Volume 32 Pages 1082-1089
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1082

JOURNAL FREE ACCESS

Show abstractHide abstract

Original DNS packets are unencrypted, which leads to information leakage while users visit websites. The adversary could monitor the DNS communication and infer the users' Internet preferences, which may contain private sensitive content, such as health, finance, and religion. Although several encrypted DNS protocols have been proposed - DNS over HTTPS (DoH), DNS over TLS (DoT), and DNS over QUIC (DoQ), recent research shows that the adversary could still infer the category of websites even using DoT and DoH. This paper studies the privacy leakage problem of DoQ protocol with two different DNS recursive resolvers (NextDNS and Bind). We investigate 30 categories from Alexa's top websites for binary and multi-classification. As a baseline analysis, we first show the classification performance of the binary classification (i.e., sensitive and non-sensitive). We find that the classification performance of the websites is high both in NextDNS and Bind resolvers for identifying whether the category of websites is sensitive. More particularly, we indicate that discriminative features are mainly related to the inter-arrival time of packets and packet length. For the multi-classification, we notice the performances decrease as the number of categories increases, meaning that the impact of the leakage is limited. Next, we further investigate four possible countermeasures that could affect the classification results: using AdBlocker extension, disabling DNS prefetch, adding random delay in responses, and padding the DNS payload. 1) We show that using AdBlocker and disabling DNS prefetch are less effective in mitigating the attack. 2) We find that mean F1 scores decrease as the delays increase. Specifically, it decreases the classification performance by 22% with NextDNS and 18% with Bind. 3) DNS padding decreases the classification performance by 9%. We further investigate the combination of the two countermeasures: both adding random (0-60ms and 0-100ms) delays and padding the DNS payload with binary and multi-classification of 30 categories. We confirm that the combined method could greatly reduce the classification performance, on average 27% of binary and 22% of multi-classification in Bind. These results indicate that adding random time and padding can protect users' information from the website fingerprinting attack, though the random delay might affect the user experiences.

View full abstract

Download PDF (686K)
Proposal of Open Source Software Security Risk Indicator Based on Vulnerability Management Interview

Hiroki Kuzuno, Tomohiko Yano, Kazuki Omo, Jeroen van der Ham, Toshihir ...

Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
Subject area: Network Security
2024Volume 32 Pages 1090-1104
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1090

JOURNAL FREE ACCESS

Show abstractHide abstract

Open source software (OSS) has gained significant prominence in recent years. The security of OSS is a vital concern within information systems relying on OSS. When vulnerabilities are identified in OSS projects, previous security research has suggested approaches such as vulnerability classification, risk estimation, and exploitability analysis. Evaluating whether these vulnerabilities and the associated risks in the OSS development process pose security threats remains a complex challenge. In this study, through an interview survey on existing vulnerability management, we have determined the need for ongoing and automated countermeasures against attacks by comprehending OSS security risks based on the survey findings. We then evaluated the effectiveness of these countermeasures and proposed a security risk indicator for OSS to address these issues. The proposed method measures OSS security risk indicators by integrating vulnerability information with the development status of OSS. The suggested security risk indicator for OSS serves as a benchmark for security measures in the operation of information systems. In our evaluation, we evaluated the effectiveness of the proposed OSS security risk indicator in identifying threats from multiple OSS and assessed the computational cost associated with calculating OSS security risk indicators.

View full abstract

Download PDF (1079K)
A Machine Learning Based Three-step Framework for Malicious URL Detection

Qisheng Chen, Kazumasa Omote

Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
Subject area: Network Security
2024Volume 32 Pages 1105-1113
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1105

JOURNAL FREE ACCESS

Show abstractHide abstract

Malicious URL is a security problem that has plagued the Internet for a long time. Previously, people usually used the method of establishing blacklists to distinguish between malicious URLs and benign URLs, but to solve the shortcomings of using blacklist method to detect malicious URLs, such as slow update speed, the research of using machine learning to detect malicious URLs is increasing. These research projects have proposed their own methods and obtained great accuracy, but the summary research on malicious URLs detection is insufficient. In this paper, we propose a three-step framework: Segmentation step, Embedding step and Machine Learning step, for malicious URLs detection, which makes sense for systematically summarizing different machine learning based malicious URL detection methods. We overview 14 related works by our three-step framework and find that almost all research on malicious URLs detection using machine learning can be classified by the three-step framework. We evaluate some context-considering methods, the methods that consider the corpus's context during the vector generation, and machine learning models to test their suitability using our three-step framework. According to the results, we verify the importance of considering context and find that context-considering embedding methods are more important and the malicious URLs detection accuracy improved with context-considering methods.

View full abstract

Download PDF (338K)
Can We Create a TLS Lie Detector?

Atsushi Kanda, Masaki Hashimoto, Takao Okubo

Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
Subject area: Network Security
2024Volume 32 Pages 1114-1124
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1114

JOURNAL FREE ACCESS

Show abstractHide abstract

As encryption technology has become more widely used, attackers have begun to use techniques that increase the stealth nature of their attacks based on the assumption that encrypted communications are being used. As an example, some threat actors, including Lazarus, have been reported to use a sophisticated technique, named “FakeTLS”. This is a method that aims to avoid detection and blocking by Deep Packet Inspection (DPI) by disguising its appearance as Transport Layer Security (TLS) communication. In this study, based on the FakeTLS method used by Lazarus, we attempted to distinguish whether TLS communication is spoofed or not without decrypting the communication content. We have created a dataset of normal TLS and FakeTLS based on command output results, which attackers often collect in the early stages of an intrusion. FakeTLS data were encrypted with algorithms often used by threat actors. For some algorithms, we reproduced exactly the same algorithms as Lazarus's methods. We collected the features based on the Shannon entropy and randomness testings from the encrypted part of the TLS communications and constructed a classifier named TLS Lie Detector using novelty detection methods. Our experimental results showed that the classifier can detect lies with an F0.5 score of 0.88, an F1 score of 0.78, an F2 score of 0.70, and a Matthews correlation coefficient of 0.74. In particular, our proposed method could completely detect FakeTLS using weak encryption algorithms.

View full abstract

Download PDF (789K)
Quantitative Estimation of Local Distortions in Historical Maps by Moving Least Squares Technique

Takeshi Miura

Article type: Regular Paper
Subject area: Applications in Humanities
2024Volume 32 Pages 1125-1128
Published: 2024
Released on J-STAGE: December 15, 2024

DOIhttps://doi.org/10.2197/ipsjjip.32.1125

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper proposes a method to quantitatively estimate the local-distortion characteristics in historical maps. First, the Euclidean regression technique is applied to a given historical map to fit its size, direction and location to those of a present map. Next, the moving least squares technique is used to formulate the relationship between the present map and the fitted historical map as a localized similarity-transformation function. The coefficients of the obtained function quantitatively and separately give the amounts of two main local-distortion processes at each local area: rotation heuristics and implicit scaling. The experimental results show the effectiveness of the proposed method.

View full abstract

Download PDF (7182K)

Register with J-STAGE for free!