Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
Volume 32
Displaying 101-116 of 116 articles from this issue
 
  • Rina Trisminingsih, Savong Bou, Toshiyuki Amagasa
    Article type: Regular Paper
    Subject area: Special Section on Databases
    2024 Volume 32 Pages 963-972
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    Pattern matching over event streams has been widely used in many modern applications. A key technology tailored for pattern matching in streaming settings is Complex Event Processing (CEP). Most CEP-based systems assume that input events arrive in total order based on their occurrence timestamps. However, this assumption fails in real-world applications for several reasons, such as network latency and machine failure, which cause events to arrive out of order. Consequently, current CEP systems face challenges in ensuring accuracy and timeliness in pattern matching. To solve this problem, a number of out-of-order event stream handling approaches have been proposed. This paper introduces a new approach to out-of-order event stream handling for nondeterministic finite automaton (NFA)-based pattern matching. The key idea is to employ a sliding buffer structure that combines the sliding window concept with temporal indexing to incrementally evaluate the predefined pattern over streams. Further, we present a new matching algorithm that supports an invalidation mechanism for prior pattern matches rendered incorrect by out-of-order events. We conduct extensive experiments with synthetic and real-world datasets, and the results confirm that our proposed approach handles out-of-order event streams in pattern matching better than the state-of-the-art approach, especially in terms of computational latency.
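    The sliding-buffer idea can be illustrated with a minimal sketch: events are kept sorted by timestamp inside a time window, and a late arrival signals the caller that earlier matches may need invalidation. All names here are hypothetical; the paper's actual data structures and NFA machinery are not shown.

```python
import bisect

class SlidingBuffer:
    """Keeps events sorted by timestamp within a time window (illustrative sketch)."""

    def __init__(self, window):
        self.window = window
        self.events = []   # list of (timestamp, payload), kept sorted by timestamp
        self.latest = 0    # highest timestamp seen so far

    def insert(self, ts, payload):
        out_of_order = ts < self.latest          # arrived after a later event
        self.latest = max(self.latest, ts)
        bisect.insort(self.events, (ts, payload))  # temporal index: sorted insert
        # evict events that fell out of the sliding window
        low = self.latest - self.window
        while self.events and self.events[0][0] < low:
            self.events.pop(0)
        return out_of_order                      # caller may invalidate prior matches
```

    When `insert` returns `True`, matches that overlap the late event's timestamp can be re-evaluated against the buffer, mirroring the invalidation mechanism described in the abstract.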

    Download PDF (33411K)
  • Eint Sandi Aung, Hayato Yamana
    Article type: Regular Paper
    Subject area: Special Section on Databases
    2024 Volume 32 Pages 973-989
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    Phishing is a cyberattack method employed by malicious actors who impersonate authorized personnel to illicitly obtain confidential information. Phishing URLs constitute a form of URL-based intrusion designed to entice users into disclosing sensitive data. This study focuses on URL-based phishing detection, which involves identifying phishing webpages by analyzing their URLs. We primarily address two unresolved issues in previous research: (i) insufficient word tokenization of URLs, which leads to the inclusion of meaningless and unknown words, thereby diminishing the accuracy of phishing detection; and (ii) dataset-dependent lexical features, such as phish-hinted words extracted exclusively from a given dataset, which restrict the robustness of the detection method. To solve the first issue, we propose a new segmentation-based word-level tokenization algorithm called SegURLizer. The second issue is addressed by dataset-independent natural language processing (NLP) features, incorporating various features extracted from the domain and path parts of URLs. We employ the segmentation-based features produced by SegURLizer on two neural network models: long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM). We then train the 36 selected NLP-based features on deep neural networks (DNNs). Afterward, we combine the outputs of the two models into a hybrid DNN model, named “PhiSN,” representing phishing URL detection through segmentation and NLP features. Our experiments, conducted on five open datasets, confirmed that our PhiSN model outperforms state-of-the-art methods. Specifically, our model achieved higher F1-measure and accuracy, ranging from 95.30% to 99.75% F1-measure and from 95.41% to 99.76% accuracy, on each dataset.
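    As a rough illustration of segmentation-based word-level tokenization, a greedy longest-match segmenter over a word list might look like the sketch below. This is not SegURLizer's actual algorithm; the vocabulary and the fallback handling of unknown substrings are hypothetical simplifications.

```python
import re

# Toy vocabulary for illustration; a real word list would be far larger.
VOCAB = {"secure", "login", "pay", "pal", "paypal", "account", "update", "verify"}

def segment(url_part, vocab=VOCAB):
    """Greedy longest-match word segmentation of a URL string (illustrative only)."""
    tokens = []
    for chunk in re.split(r"[^a-z0-9]+", url_part.lower()):
        i = 0
        while i < len(chunk):
            for j in range(len(chunk), i, -1):   # try the longest candidate first
                if chunk[i:j] in vocab:
                    tokens.append(chunk[i:j])
                    i = j
                    break
            else:                                 # no dictionary word starts here
                tokens.append(chunk[i])           # emit single char as an unknown
                i += 1
    return [t for t in tokens if t]
```

    Splitting compounds such as "loginupdate" into known words is what reduces the meaningless and unknown tokens that the abstract identifies as the first issue.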

    Download PDF (3766K)
  • Marin Matsumoto, Tsubasa Takahashi, Seng Pei Liew, Masato Oguchi
    Article type: Regular Paper
    Subject area: Special Section on Databases
    2024 Volume 32 Pages 990-1002
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    Local differential privacy (LDP) provides a strong privacy guarantee in a distributed setting such as federated learning (FL). When a central curator deploys local randomizers satisfying ε0-LDP, how can we confirm and measure the given privacy guarantees at clients? To answer this question, we introduce an empirical privacy test for FL clients that measures the lower bounds of LDP, giving us an empirical ε0 and the probability that two gradients can be distinguished. To audit the given privacy guarantees (i.e., ε0), we first identify a worst-case scenario that reaches the theoretical upper bound of LDP, which is essential to empirically materialize the given privacy guarantees. We further instantiate several adversaries in FL under LDP to observe empirical LDP at various attack surfaces. The empirical privacy test with these adversary instantiations enables FL clients to understand more intuitively how the given privacy level protects them, and to verify that mechanisms claiming ε0-LDP provide equivalent privacy protection. We also present numerical observations of the measured privacy in these adversarial settings, showing that the randomization algorithm LDP-SGD is vulnerable to gradient manipulation and a maliciously well-manipulated model. We further discuss employing a shuffler to measure empirical privacy in a collaborative way and also measuring the privacy of the shuffled model. Our observations suggest that the theoretical ε in the shuffle model has room for improvement.
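    The lower-bound idea can be illustrated with a toy mechanism: run an ε0-LDP randomizer on two distinguishable inputs and take the log-ratio of observed output frequencies as an empirical privacy loss, which can never (up to sampling error) exceed the claimed ε0. This sketch uses binary randomized response rather than the paper's FL gradients; all names are hypothetical.

```python
import math, random

def randomized_response(bit, eps):
    """eps-LDP randomized response: report the true bit w.p. e^eps / (1 + e^eps)."""
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < p_true else 1 - bit

def empirical_eps(mechanism, x0, x1, trials=200_000):
    """Lower-bound eps by the log-ratio of output frequencies on two inputs (sketch)."""
    c0 = sum(mechanism(x0) for _ in range(trials))  # how often output is 1 on x0
    c1 = sum(mechanism(x1) for _ in range(trials))  # how often output is 1 on x1
    p, q = c0 / trials, c1 / trials
    return abs(math.log(p / q))                     # empirical privacy loss at output=1
```

    For a correctly implemented 1.0-LDP randomizer the estimate converges to 1.0; a mechanism whose estimate exceeds its claimed ε0 is not delivering the promised protection.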

    Download PDF (933K)
  • Kento Sugiura, Manabu Nishimura, Yoshiharu Ishikawa
    Article type: Regular Paper
    Subject area: Special Section on Databases
    2024 Volume 32 Pages 1003-1012
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    In the last decade, academic and industrial researchers have focused on persistent memory because of the development of the first practical product, Intel Optane. One of the main challenges of persistent memory programming is to guarantee consistent durability across separate memory addresses, and Wang et al. proposed a persistent multi-word compare-and-swap (PMwCAS) algorithm to solve this problem. However, their algorithm contains redundant compare-and-swap (CAS) and cache flush instructions and does not achieve sufficient performance on many-core CPUs. This paper proposes a new algorithm that improves performance on many-core CPUs by removing useless CAS/flush instructions from PMwCAS operations. We also exclude dirty flags, which help ensure consistent durability in the original algorithm, by using PMwCAS descriptors as write-ahead logs. Experimental results show that the proposed method is up to ten times faster than the original algorithm and suggest several productive uses of PMwCAS operations.
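    The descriptor idea behind multi-word CAS can be sketched in a deliberately simplified, single-threaded form: a descriptor records every (address, expected, desired) triple, and the operation either applies all words or none. A real PMwCAS installs descriptor pointers with hardware CAS, lets concurrent threads help complete the operation, and issues cache-line flushes for persistence; none of that concurrency or durability machinery appears in this sketch, and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class WordDescriptor:
    addr: int        # index into a word array, standing in for a memory address
    expected: int
    desired: int

@dataclass
class MwCASDescriptor:
    words: list = field(default_factory=list)

def mwcas(memory, desc):
    """Illustrative all-or-nothing multi-word CAS (no concurrency, no persistence)."""
    # Phase 1: verify all expected values still hold.
    if any(memory[w.addr] != w.expected for w in desc.words):
        return False                  # one word changed, so the whole operation fails
    # Phase 2: commit every new value (real PMwCAS would also flush cache lines).
    for w in desc.words:
        memory[w.addr] = w.desired
    return True
```

    Because the descriptor contains everything needed to finish or undo the operation, it can double as a write-ahead log, which is the role the paper exploits to eliminate dirty flags.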

    Download PDF (1248K)
  • Natsuki Hamada, Kazuhiro Saito, Hideyuki Kawashima
    Article type: Regular Paper
    Subject area: Special Section on Advanced Computing Systems
    2024 Volume 32 Pages 1013-1022
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    Serializability is a standard that guarantees the correct execution of database operations. Since cloud databases do not always guarantee serializability, users must check for serializability on their own. However, which operations are executed inside a cloud database is a black box for users, making it difficult for them to make such a judgment. This is a combinatorial optimization problem called the “black-box serializability problem,” a satisfiability problem known to be NP-complete. A seminal work called Cobra proposed an architecture that solves this problem using an SMT solver, a general-purpose solver for satisfiability problems. Still, as the number of transactions increases, the search space expands exponentially, so determining serializability becomes challenging. On the other hand, quantum annealing excels at quickly solving combinatorial optimization problems and can be applied to this satisfiability problem. This paper proposes a fast solver for the black-box serializability problem that uses quantum annealing, extending Cobra. The evaluation results show that the proposed method is 751-890 times faster than Cobra.
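    At its core, serializability testing asks whether a transaction dependency graph consistent with the observed history is acyclic; the hard, exponential part that Cobra and the annealing-based solver tackle is choosing among edges whose direction is unknown. For a fully known graph the check reduces to plain cycle detection, sketched below as an illustration (function and variable names are hypothetical).

```python
def conflict_serializable(edges, txns):
    """A history is conflict-serializable iff its precedence graph is acyclic (sketch)."""
    graph = {t: [] for t in txns}
    for a, b in edges:                 # edge a -> b: transaction a must precede b
        graph[a].append(b)

    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited / on current path / finished
    color = {t: WHITE for t in txns}

    def has_cycle(t):
        color[t] = GRAY
        for nxt in graph[t]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and has_cycle(nxt)):
                return True            # back edge found: a precedence cycle exists
        color[t] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in txns)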

    Download PDF (474K)
  • Sora Okamoto, Tomoya Kawakami
    Article type: Regular Paper
    Subject area: Special Section on Consumer Device & System
    2024 Volume 32 Pages 1023-1032
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    In Japan, tsunamis and landslides frequently occur as secondary disasters after earthquakes, and early evacuation is essential to reduce the damage. In this study, the authors propose a method that uses social graphs to reduce the number of evacuation messages based on geographical information while maintaining the time required to complete evacuation. We define evacuation target areas and evacuate only designated targets within these areas. The proposed method relays evacuation status only when there is a person (node) within the evacuation area, reducing the number of messages while maintaining the evacuation completion time. The evaluation results confirmed that the proposed method can reduce the number of messages during evacuation while maintaining the time required to complete the evacuation, compared to the case where all nodes outside the evacuation target area relay the evacuation status and the case where no relay is performed.

    Download PDF (3157K)
  • Saki Takano, Akihiro Nakao, Saneyasu Yamaguchi, Masato Oguchi
    Article type: Regular Paper
    Subject area: Special Section on Consumer Device & System
    2024 Volume 32 Pages 1033-1043
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    The utilization of data collected by edge devices, including personal information, in machine learning has emerged as a significant trend in recent years. In most distributed machine learning methods, like Federated Learning, data or training results are typically aggregated and managed on high-performance servers. However, transferring users' personal information to an external server can raise privacy concerns due to the potential risk of data leakage. To tackle this issue, we propose a distributed machine learning model that offers robust privacy protection, allowing users to decide whether or not to share personal data with the server. In the proposed model, the edge device takes over the training at the edge server and sends only the results for which the user has given permission to the edge server for integration. To validate the effectiveness of the proposed model, we performed experiments on facial image recognition using Jetson Nano as an edge device. The experimental results confirmed that edge devices were capable of utilizing personal information in a short time, while the edge server achieved enhanced accuracy by integrating multiple training results. Thus, the results show that the proposed model enables the safe and efficient utilization of data collected by edge devices.

    Download PDF (1289K)
  • Kazuki Takeda, Shoji Kajita
    Article type: Regular Paper
    Subject area: Special Section on digital practices
    2024 Volume 32 Pages 1044-1055
    Published: 2024
    Released on J-STAGE: November 15, 2024
    JOURNAL FREE ACCESS

    The COVID-19 pandemic brought a rapid shift to online education at higher educational institutions, significantly increasing the usage of Learning Management Systems (LMS). Although an LMS is a feature-rich educational platform, it is primarily designed to meet the needs of faculty and instructors, and its usability from the student's perspective is often overlooked. This paper presents Comfortable PandA, a web browser extension developed by a student to enhance the usability of Kyoto University's LMS “PandA,” which is based on the open source Sakai LMS. The extension improves the user interface of the LMS by providing new features, such as a list of assignments and Tests & Quizzes and a notification system for new assignments. In this paper, the extension's development, its features, and the collaboration with Kyoto University for its official recognition and regulation for future enhancements are described in detail. An evaluation based on student surveys is also presented to demonstrate the extension's effectiveness. Notably, the extension's usage has been increasing since its release in May 2020, and it is currently used by over 75% of undergraduate students at Kyoto University. This paper highlights the importance of student-led development in enhancing the usability of the LMS and emphasizes the critical role of integrating perspectives from the student community in the evolution of the LMS.

    Download PDF (3205K)
  • Midori Inaba
    Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
    2024 Volume 32 Pages 1056
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS
    Download PDF (33K)
  • Ryoichi Sasaki, Kenta Onishi, Yoshihiro Mitsui, Masato Terada
    Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
    Subject area: Evaluation and Management
    2024 Volume 32 Pages 1057-1065
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS

    The authors previously proposed classifying the relationship between AI and security into four types: attacks using AI, attacks by AI, attacks to AI, and security measures using AI. Subsequently, generative AI such as ChatGPT has become widely used. We therefore examined the impact of the emergence of generative AI on the relationship between AI and security and demonstrated a pressing need for countermeasures against attacks by generative AI. The authors then categorized attacks by generative AI on humans into three types, “Terminator,” “2001: A Space Odyssey,” and “Mad Scientist,” and proposed potential countermeasures against them. The MRC-EDC method developed earlier by the authors aimed to optimize the combination of countermeasures, but its fully quantitative approach, which necessitates rigorous cost and risk estimation, made it unsuitable for this subject. Consequently, we developed an improved MRC-EDC method that partially incorporates a semi-quantitative approach and conducted a trial to propose countermeasures against attacks by generative AI. As a result, five cost-effective countermeasures were identified, confirming the effectiveness of the method.

    Download PDF (7031K)
  • Keiichiro Kimura, Hiroki Kuzuno, Yoshiaki Shiraishi, Masakatu Morii
    Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
    Subject area: Wireless/Mobile Networks
    2024 Volume 32 Pages 1066-1081
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS

    The proliferation of public hotspots has led to the use of captive portals to protect hotspots and ensure their appropriate use. Captive portals control external sessions until user authentication at the hotspot is complete. One feature of captive portals is that they can redirect the authenticated user to an arbitrary website. However, cyber-attacks that exploit captive portals have been reported, and there is an urgent need to improve the captive portal protocol. In this paper, we reveal a critical flaw in the captive portal protocol and propose a man-in-the-middle attack that exploits the flaw to disable SSL/TLS. We name this attack Man-in-the-Portal (MITP). It is the first attack to exploit the post-authentication redirection of a captive portal as a starting point for disabling SSL/TLS. The attacker can easily eavesdrop on and tamper with a victim device's communications. Our attack is also feasible without requiring any special privileges or tools. To demonstrate its effectiveness and practicality, we evaluate the MITP attack on five commercially available wireless devices as a proof-of-concept, and show that the attack poses significant threats. Furthermore, we analyze the root causes of the MITP attack and present protocol-level countermeasures to improve the security of wireless communications.

    Download PDF (22695K)
  • Guannan Hu, Kensuke Fukuda
    Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
    Subject area: Network Security
    2024 Volume 32 Pages 1082-1089
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS

    Plain DNS packets are unencrypted, which leads to information leakage while users visit websites. An adversary could monitor the DNS communication and infer the users' Internet preferences, which may involve privacy-sensitive content, such as health, finance, and religion. Although several encrypted DNS protocols have been proposed - DNS over HTTPS (DoH), DNS over TLS (DoT), and DNS over QUIC (DoQ) - recent research shows that an adversary can still infer the category of websites even when DoT and DoH are used. This paper studies the privacy leakage problem of the DoQ protocol with two different DNS recursive resolvers (NextDNS and Bind). We investigate 30 categories from Alexa's top websites for binary and multi-class classification. As a baseline analysis, we first show the performance of the binary classification (i.e., sensitive vs. non-sensitive). We find that classification performance is high with both the NextDNS and Bind resolvers for identifying whether the category of a website is sensitive. In particular, we show that the discriminative features are mainly related to the inter-arrival time of packets and packet length. For the multi-class classification, performance decreases as the number of categories increases, meaning that the impact of the leakage is limited. Next, we investigate four possible countermeasures that could affect the classification results: using an AdBlocker extension, disabling DNS prefetch, adding random delay to responses, and padding the DNS payload. 1) We show that using an AdBlocker and disabling DNS prefetch are less effective in mitigating the attack. 2) We find that mean F1 scores decrease as the delays increase; specifically, random delay decreases the classification performance by 22% with NextDNS and 18% with Bind. 3) DNS padding decreases the classification performance by 9%. We further investigate the combination of two countermeasures, adding random (0-60 ms and 0-100 ms) delays and padding the DNS payload, with binary and multi-class classification of 30 categories. We confirm that the combined method greatly reduces the classification performance, on average by 27% for binary and 22% for multi-class classification in Bind. These results indicate that adding random delay and padding can protect users' information from website fingerprinting attacks, though the random delay might affect the user experience.
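    The two discriminative features named above, packet length and inter-arrival time, and the payload-padding countermeasure can be sketched as follows. This is a simplified illustration with hypothetical function names, not the study's actual feature set or classifier.

```python
import statistics

def extract_features(packets):
    """Summarize a trace of (timestamp, size) pairs by length and gap statistics."""
    lengths = [size for _, size in packets]
    times = [t for t, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]   # inter-arrival times
    return {
        "mean_len": statistics.mean(lengths),
        "mean_gap": statistics.mean(gaps) if gaps else 0.0,
    }

def pad_to_block(packets, block=128):
    """Padding countermeasure: round every payload size up to a block boundary."""
    return [(t, ((size + block - 1) // block) * block) for t, size in packets]
```

    Padding collapses distinct payload sizes onto a few block multiples, and random response delay blurs the inter-arrival times, which is why the combination degrades the fingerprinting classifier most.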

    Download PDF (686K)
  • Hiroki Kuzuno, Tomohiko Yano, Kazuki Omo, Jeroen van der Ham, Toshihir ...
    Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
    Subject area: Network Security
    2024 Volume 32 Pages 1090-1104
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS

    Open source software (OSS) has gained significant prominence in recent years, and the security of OSS is a vital concern for information systems that rely on it. When vulnerabilities are identified in OSS projects, previous security research has suggested approaches such as vulnerability classification, risk estimation, and exploitability analysis. However, evaluating whether these vulnerabilities and the associated risks in the OSS development process pose security threats remains a complex challenge. In this study, through an interview survey on existing vulnerability management, we determined the need for ongoing, automated countermeasures against attacks based on an understanding of OSS security risks. To address these issues, we propose a security risk indicator for OSS. The proposed method measures OSS security risk indicators by integrating vulnerability information with the development status of the OSS. The proposed indicator serves as a benchmark for security measures in the operation of information systems. In our evaluation, we assessed the effectiveness of the proposed OSS security risk indicator in identifying threats across multiple OSS projects and measured the computational cost of calculating the indicators.

    Download PDF (1079K)
  • Qisheng Chen, Kazumasa Omote
    Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
    Subject area: Network Security
    2024 Volume 32 Pages 1105-1113
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS

    Malicious URLs are a security problem that has plagued the Internet for a long time. Traditionally, blacklists were used to distinguish malicious URLs from benign ones, but to overcome the shortcomings of the blacklist method, such as its slow update speed, research on detecting malicious URLs with machine learning is increasing. These research projects have proposed their own methods and achieved high accuracy, but survey research on malicious URL detection is insufficient. In this paper, we propose a three-step framework for malicious URL detection - a Segmentation step, an Embedding step, and a Machine Learning step - which provides a systematic way to summarize different machine-learning-based malicious URL detection methods. We review 14 related works through our three-step framework and find that almost all research on malicious URL detection using machine learning can be classified by it. We also evaluate several context-aware methods - methods that consider the corpus's context during vector generation - and machine learning models to test their suitability within our framework. The results verify the importance of considering context: detection accuracy improved with context-aware embedding methods.

    Download PDF (338K)
  • Atsushi Kanda, Masaki Hashimoto, Takao Okubo
    Article type: Special Issue of Information Security and Trust to Support Social and Ethical Digital Activities
    Subject area: Network Security
    2024 Volume 32 Pages 1114-1124
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS

    As encryption technology has become more widely used, attackers have begun to adopt techniques that increase the stealthiness of their attacks on the assumption that encrypted communications are in use. For example, some threat actors, including Lazarus, have been reported to use a sophisticated technique named “FakeTLS,” which aims to avoid detection and blocking by Deep Packet Inspection (DPI) by disguising traffic as Transport Layer Security (TLS) communication. In this study, based on the FakeTLS method used by Lazarus, we attempt to distinguish spoofed TLS communication from genuine TLS without decrypting the communication content. We created a dataset of normal TLS and FakeTLS traffic based on command output results, which attackers often collect in the early stages of an intrusion. The FakeTLS data were encrypted with algorithms often used by threat actors; for some, we reproduced exactly the same algorithms as Lazarus's methods. We collected features based on Shannon entropy and randomness tests from the encrypted part of the TLS communications and constructed a classifier named TLS Lie Detector using novelty detection methods. Our experimental results showed that the classifier can detect lies with an F0.5 score of 0.88, an F1 score of 0.78, an F2 score of 0.70, and a Matthews correlation coefficient of 0.74. In particular, our proposed method detected all FakeTLS traffic that used weak encryption algorithms.
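    The entropy feature mentioned here is standard: a well-encrypted payload looks like uniform random bytes and approaches 8 bits of Shannon entropy per byte, while weakly encrypted or structured data falls measurably short. A minimal computation (an illustration, not the authors' exact feature set):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; uniform random data approaches 8.0."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

    Payloads that score well below 8 bits per byte despite claiming to be TLS ciphertext are exactly the weak-encryption FakeTLS cases the abstract reports detecting completely.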

    Download PDF (789K)
  • Takeshi Miura
    Article type: Regular Paper
    Subject area: Applications in Humanities
    2024 Volume 32 Pages 1125-1128
    Published: 2024
    Released on J-STAGE: December 15, 2024
    JOURNAL FREE ACCESS

    This paper proposes a method to quantitatively estimate the local-distortion characteristics of historical maps. First, the Euclidean regression technique is applied to a given historical map to fit its size, direction and location to those of a present map. Next, the moving least squares technique is used to formulate the relationship between the present map and the fitted historical map as a localized similarity-transformation function. The coefficients of the obtained function quantitatively and separately give the amounts of the two main local-distortion processes at each local area: rotation and scaling. The experimental results show the effectiveness of the proposed method.
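    The global alignment step, fitting a similarity transform (scale, rotation, translation) by least squares, can be sketched compactly with complex arithmetic; the moving least squares step then repeats such a fit with distance-based weights at every local area. This unweighted sketch is illustrative only, and the function names are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares fit of z -> a*z + b mapping src points onto dst points.

    Treating 2D points as complex numbers, the angle of `a` is the rotation,
    |a| is the uniform scale, and `b` is the translation.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    mz = sum(zs) / n                      # centroid of the source points
    mw = sum(ws) / n                      # centroid of the target points
    num = sum((w - mw) * (z - mz).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - mz) ** 2 for z in zs)
    a = num / den                         # rotation + scale
    b = mw - a * mz                       # translation
    return a, b
```

    In a localized variant, the per-area coefficients `a` separate out exactly the rotation (its angle) and scaling (its magnitude) that the paper reads off as local-distortion amounts.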

    Download PDF (7182K)