Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
Volume 26
Showing 51-91 articles out of 91 articles from the selected issue
  • Masafumi Oyamada, Jianquan Liu, Shinji Ito, Kazuyo Narita, Takuya Arak ...
    Type: Regular Papers
    Subject area: Special Section on Databases
    2018 Volume 26 Pages 416-426
    Published: 2018
    Released: May 15, 2018
    JOURNALS FREE ACCESS

    In this paper, we present CVS (Compressed Vector Set), a fast and space-efficient data mining framework that efficiently handles both sparse and dense datasets. CVS holds a set of vectors in a compressed format and conducts primitive vector operations, such as the lp-norm and dot product, without decompression. By combining these primitive operations, CVS accelerates prominent data mining and machine learning algorithms, including the k-nearest neighbor algorithm, stochastic gradient descent on logistic regression, and kernel methods. In contrast to the commonly used sparse matrix/vector representation, which is not effective for dense datasets, CVS handles sparse and dense datasets in a unified manner. Our experimental results demonstrate that CVS can process both dense and sparse datasets faster than the conventional sparse vector representation, with smaller memory usage.
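    The CVS format itself is not described in the abstract; as a toy illustration of the general idea of executing vector primitives directly on compressed data, the following sketch computes a dot product over run-length-encoded vectors without decompressing them (the encoding and function names are ours, not the paper's):

```python
# Toy illustration (not the paper's CVS format): run-length encoding
# of a vector, with the dot product computed directly on the runs.

def rle_encode(vec):
    """Compress a vector into [value, run_length] pairs."""
    runs = []
    for x in vec:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])
    return runs

def rle_dot(runs_a, runs_b):
    """Dot product of two RLE vectors without decompressing them."""
    total = 0.0
    ia = ib = 0      # current run index in each vector
    ra = rb = 0      # elements already consumed from the current runs
    while ia < len(runs_a) and ib < len(runs_b):
        va, la = runs_a[ia]
        vb, lb = runs_b[ib]
        step = min(la - ra, lb - rb)   # overlap of the two runs
        total += va * vb * step
        ra += step
        rb += step
        if ra == la:
            ia, ra = ia + 1, 0
        if rb == lb:
            ib, rb = ib + 1, 0
    return total
```

    On vectors with long constant (e.g., zero) runs, each run pair is handled in O(1), which is the flavor of saving the paper targets for both sparse and dense data.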

    Download PDF (466K)
  • Suppanut Pothirattanachaikul, Takehiro Yamamoto, Sumio Fujita, Akira T ...
    Type: Regular Papers
    Subject area: Special Section on Databases
    2018 Volume 26 Pages 427-438
    Published: 2018
    Released: May 15, 2018
    JOURNALS FREE ACCESS

    Web searchers often use a Web search engine to find a way or means to achieve a goal. For example, a user intending to solve a sleeping problem may issue the query “sleeping pills.” However, there may be other solutions that achieve the same goal, such as “have a cup of hot milk” or “stroll before bedtime.” The problem is that the user may not be aware that these solutions exist, and will thus probably choose to take a sleeping pill without considering them. In this study, we define and tackle the alternative action mining problem. In particular, we attempt to develop a method for mining alternative actions for a given query. We define alternative actions as actions that share the same goal, and define the alternative action mining problem analogously to search result diversification. To tackle the problem, we propose leveraging a community Q&A (cQA) corpus for mining alternative actions. The cQA corpus can be seen as an archival dataset comprising dialogues between questioners, who want to know the solutions to their problems, and respondents, who suggest different solutions. We propose a method that computes how well two actions can serve as alternatives by using the question-answer structure in a cQA corpus. Our method builds a question-action bipartite graph and recursively computes how well two actions can substitute for each other. We conducted experiments to investigate the effectiveness of our method using two newly built test collections, each containing 50 queries. The experimental results indicated that, for the Japanese test collection, our proposed method significantly outperformed two types of baselines, one using conventional query suggestions and the other extracting alternative actions from Web documents, in terms of D#-nDCG@8. Also, for the English test collection, our method significantly outperformed the baseline using conventional query suggestions in terms of D#-nDCG@8.
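    The abstract does not give the recursion itself; the following sketch, with made-up function names, shows a SimRank-style mutual recursion of the kind described: two actions score as alternatives when they answer similar questions, and two questions are similar when they attract similar actions.

```python
# SimRank-style sketch (not the authors' exact formulation) over a
# question-action bipartite graph extracted from a cQA corpus.

def alternative_scores(edges, iters=5, decay=0.8):
    """edges: list of (question, action) pairs. Returns action-pair scores."""
    questions = sorted({q for q, _ in edges})
    actions = sorted({a for _, a in edges})
    q_nbrs = {q: [a for qq, a in edges if qq == q] for q in questions}
    a_nbrs = {a: [q for q, aa in edges if aa == a] for a in actions}
    # Similarity starts as the identity on both sides of the graph.
    sim_q = {(x, y): 1.0 if x == y else 0.0 for x in questions for y in questions}
    sim_a = {(x, y): 1.0 if x == y else 0.0 for x in actions for y in actions}
    for _ in range(iters):
        new_q, new_a = {}, {}
        for x in questions:
            for y in questions:
                if x == y:
                    new_q[(x, y)] = 1.0
                    continue
                pairs = [(a, b) for a in q_nbrs[x] for b in q_nbrs[y]]
                new_q[(x, y)] = (decay * sum(sim_a[p] for p in pairs) / len(pairs)
                                 if pairs else 0.0)
        for x in actions:
            for y in actions:
                if x == y:
                    new_a[(x, y)] = 1.0
                    continue
                pairs = [(p, r) for p in a_nbrs[x] for r in a_nbrs[y]]
                new_a[(x, y)] = (decay * sum(sim_q[p] for p in pairs) / len(pairs)
                                 if pairs else 0.0)
        sim_q, sim_a = new_q, new_a
    return sim_a
```

    Note how “sleeping pills” and “stroll before bedtime” can receive a nonzero score even without a shared question, via the recursion through “hot milk”.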

    Download PDF (573K)
  • Takashi Isozaki, Satoshi Hasegawa, Hajime Anada
    Type: Regular Papers
    Subject area: Algorithm Theory
    2018 Volume 26 Pages 439-444
    Published: 2018
    Released: June 15, 2018
    JOURNALS FREE ACCESS

    Ant colony optimization (ACO) is a well-known swarm intelligence algorithm with a population-based approach inspired by the foraging strategies of real ants. ACO has been applied to various combinatorial optimization problems that are non-deterministic polynomial-time hard (NP-hard). Of these, the traveling salesman problem (TSP) is one of the most important problems in the fields of technology and science. ACO algorithms provide promising results; however, they cannot compete with front-line algorithms when solving the TSP. In this context, we consider the Max-Min Ant System (MMAS) to be a fundamental and extensible ACO algorithm. Therefore, we constructed a new algorithm by introducing three elements to the MMAS: individual ant memories, initialized using the nearest neighbor method, for memorizing a good solution; New Solution Importance; and a local search procedure. Finally, we confirmed the effectiveness of the proposed algorithm by comparing it to other ACO algorithms on several benchmark problems.
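    As a sketch of the first of the three elements, a memorized initial solution can be produced with the nearest neighbor heuristic (function names and the Euclidean distance metric are illustrative assumptions, not taken from the paper):

```python
import math

# Sketch: initialize an ant's memorized tour with the nearest neighbor
# heuristic, i.e., repeatedly move to the closest unvisited city.

def nearest_neighbor_tour(cities, start=0):
    """Greedy TSP tour over a list of (x, y) city coordinates."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        cur = cities[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(cur, cities[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(cities, tour):
    """Total length of the closed tour (returning to the start)."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```

    The resulting tour is not optimal but gives each ant a reasonable solution to remember before MMAS pheromone updates and local search refine it.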

    Download PDF (254K)
  • Keita Doi, Ryota Shioya, Hideki Ando
    Type: Regular Papers
    Subject area: Computer Architecture
    2018 Volume 26 Pages 445-460
    Published: 2018
    Released: June 15, 2018
    JOURNALS FREE ACCESS

    Current multicore processors achieve high throughput by executing multiple independent programs in parallel. However, it is difficult to utilize multiple cores effectively to reduce the execution time of a single program. This is due to a variety of problems, including slow inter-thread communication and high-overhead thread creation. Dramatic improvements in the single-core architecture have reached their limit; thus, it is necessary to effectively use multiple cores to reduce single-program execution time. Tightly coupled multicore architectures provide a potential solution because of their very low-latency inter-thread communication and very light-weight thread creation. One such multicore architecture called SKY has been proposed. SKY has shown its effectiveness in multithreaded execution of a single program, but several problems must be overcome before further performance improvements can be achieved. The problems this paper focuses on are as follows: 1) The SKY compiler partitions programs at a basic block level, but does not explore the inside of basic blocks. This misses an opportunity to find good partitioning. 2) The SKY processor always sequentializes a new thread if the forking core in which it is supposed to be created is busy. However, this is not necessarily a good decision. 3) If the execution of register communication instructions among cores is delayed, the other register communication instructions can be delayed, causing the following thread execution to stall. This situation occurs when the instruction window of a core becomes full. To address these problems, we propose the following three software and hardware techniques: 1) Instruction-level thread partitioning: the compiler explores the inside of basic blocks to find a better program partition. 2) Selective thread creation: the hardware selectively sequentializes or waits for the creation of a new thread to achieve better performance. 
3) Automatic register communication: register communication is performed automatically by small dedicated hardware instead of using instruction window resources. We evaluated the performance of SKY using SPEC2000 benchmark programs. Results on four cores show that the proposed techniques improved performance by 4% and 26% on average (maximum of 11% and 206%) for SPECint2000 and SPECfp2000 programs, respectively, compared with the case where the proposed techniques are not applied. As a result, performance improvements of 1.21 and 1.93 times on average (maximum of 1.52 and 3.30 times) were achieved, respectively, compared with the performance of a single core.

    Download PDF (1583K)
  • Yoshihiro Oyama
    Type: Regular Papers
    Subject area: Network Security
    2018 Volume 26 Pages 461-476
    Published: 2018
    Released: June 15, 2018
    JOURNALS FREE ACCESS

    Once malware has infected a system, it may lie dormant (or asleep) to control resource consumption speeds, remain undetected until the time of an attack, and thwart dynamic analysis. Because of their aggressive and abnormal use of sleep behavior, malware programs are expected to exhibit traits that distinguish them from other programs. However, the details of the sleep behavior of real malware are not sufficiently understood, and the diversity of sleep behavior among different malware samples or families is also unclear. In this paper, we discuss the characteristic sleep behavior of recent malware and explore the potential for applying the features of sleep behavior to malware classification. Specifically, we demonstrate that a wide variety of sleeps are executed by a set of malware samples and that sleeps are a promising source of features for distinguishing between different malware samples. Furthermore, we show that applying a learning algorithm to sleep behavior information can result in high classification accuracy and present several examples of typical and rare sleep behaviors observed in the execution of real malware.
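    As a hypothetical sketch of how sleep behavior might be turned into classification features (the paper's actual feature set is not given in the abstract), one can summarize a trace of requested sleep durations:

```python
# Hypothetical sketch (feature choices are ours, not the paper's):
# summarize a trace of requested Sleep() durations, in milliseconds,
# into a small feature vector for later classification.

def sleep_features(durations_ms):
    """(call count, total requested ms, longest sleep, distinct durations)."""
    if not durations_ms:
        return (0, 0.0, 0.0, 0)
    return (len(durations_ms),
            sum(durations_ms),        # total requested dormancy
            max(durations_ms),        # longest single sleep
            len(set(durations_ms)))   # diversity of distinct durations
```

    Aggressive dormancy then shows up as large totals or long single sleeps, while the diversity of distinct durations helps separate samples or families that sleep in characteristic patterns.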

    Download PDF (739K)
  • Ryo Nojima, Hidenobu Oguri, Hiroaki Kikuchi, Hiroshi Nakagawa, Koki Ha ...
    Type: Regular Papers
    Subject area: Security and Society
    2018 Volume 26 Pages 477-485
    Published: 2018
    Released: June 15, 2018
    JOURNALS FREE ACCESS

    Many companies and organizations have been collecting personal data with the aim of sharing it with partners. To prevent re-identification, the data should be anonymized before being shared. Although many anonymization methods have been proposed thus far, choosing among them is not trivial since there are no widely accepted criteria. To overcome this situation, we have been conducting a data anonymization and re-identification competition, called PWS CUP, in Japan. In this paper, we introduce a problem that appeared at the competition, named excessive anonymization, and show how to handle it formally.

    Download PDF (1169K)
  • Masato Hashimoto, Qiangfu Zhao
    Type: Regular Papers
    Subject area: Knowledge Processing
    2018 Volume 26 Pages 486-496
    Published: 2018
    Released: June 15, 2018
    JOURNALS FREE ACCESS

    Aware agents (A-agents) are programs that can be aware of various situations and support human decisions. The aim of this study is to implement A-agents on portable/wearable computing devices (P/WCDs) such as smartphones. Since P/WCDs usually have limited resources, it is difficult to implement many A-agents together in one P/WCD. A cloud-based system is a solution, but such systems raise privacy problems. To use a cloud server while preserving privacy, we propose a new protocol with four features: 1) divide each A-agent into two parts, and implement the computationally expensive part on the cloud server; 2) encrypt the data before sending them to the server; 3) share the same black-box computing model on the server side among various A-agents; and 4) make the final decisions on the P/WCD side using selected results obtained from the server. Experimental results show that the performance of the A-agents does not change significantly even if they share the same black-box model. In addition, the P/WCD can be more energy efficient. Therefore, the proposed protocol can be very useful for improving the usability of P/WCDs.

    Download PDF (2384K)
  • Yoshitatsu Matsuda, Takayuki Sekiya, Kazunori Yamaguchi
    Type: Regular Papers
    Subject area: Education
    2018 Volume 26 Pages 497-508
    Published: 2018
    Released: June 15, 2018
    JOURNALS FREE ACCESS

    The design of appropriate curricula is one of the most important issues in higher educational institutions, and there are many features to be considered. In this paper, two key features (“locality bias” and “combination of two simple factors”) were discovered by investigating the actual computer science (CS) curricula of top-ranked universities on the basis of Computer Science Curricula 2013 (CS2013), in which CS topics are classified into 18 Knowledge Areas (KAs). We applied a machine learning method named simplified, supervised latent Dirichlet allocation (ssLDA) to the actual syllabi of the CS departments of the 47 top-ranked universities. ssLDA estimates the relative weights of the KAs of CS2013 in each syllabus. Each CS department was then characterized by the averaged weights of the KAs over its syllabi. We applied three well-known data analysis methods (hierarchical cluster analysis, principal component analysis, and non-negative matrix factorization) to the averaged weights of each department and identified the above two key features quantitatively and objectively.
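    The final analysis step can be sketched as follows: each department is a row vector of averaged KA weights, and PCA (one of the three methods used) reveals the dominant factors. The 3x5 matrix below is made-up toy data, not from the paper:

```python
import numpy as np

# Sketch of the analysis step: departments as rows of averaged
# Knowledge Area weights, reduced with PCA via SVD.

def pca(weights, k=2):
    """Return the top-k principal component scores of the rows."""
    X = weights - weights.mean(axis=0)      # center each KA column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                     # project onto the top-k PCs

depts = np.array([
    [0.5, 0.1, 0.1, 0.2, 0.1],   # toy "theory-heavy" department
    [0.1, 0.5, 0.1, 0.2, 0.1],   # toy "systems-heavy" department
    [0.3, 0.3, 0.1, 0.2, 0.1],   # an even mix of the two
])
scores = pca(depts, k=2)
```

    On this toy data the first component separates the two extreme departments symmetrically, with the mixed department at the origin, which is the kind of factor structure such an analysis exposes.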

    Download PDF (1684K)
  • Ryusuke Mori, Masato Oguchi, Saneyasu Yamaguchi
    Type: Regular Papers
    Subject area: Special Section on Consumer Device & System
    2018 Volume 26 Pages 509-517
    Published: 2018
    Released: June 15, 2018
    JOURNALS FREE ACCESS

    Android Runtime (ART), the standard application runtime environment of Android, has a garbage collection (GC) function. ART implements generational GC, which clusters objects into two groups: the young and old generations. An object in the young generation is promoted to the old generation after surviving several GC executions. In this paper, we propose adjusting the promotion condition based on an object feature, namely the size of each object. We then evaluate our proposed method and demonstrate that it can reduce the memory consumption of applications with a smaller performance decline than a method without feature consideration.
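    The proposed adjustment is only outlined in the abstract; the following sketch illustrates one possible direction with purely illustrative thresholds (the concrete condition, and whether large objects should be promoted later or earlier, is our assumption, not ART's or the paper's):

```python
# Illustrative sketch only: a size-dependent promotion condition for a
# generational GC. Thresholds and the direction of the adjustment are
# assumptions for illustration, not the paper's actual policy.

def promotion_threshold(size_bytes, base=3, large_cutoff=4096):
    """Require large objects to survive more GCs before promotion, so
    short-lived big allocations stay reclaimable in the young generation."""
    return base * 2 if size_bytes >= large_cutoff else base

class TrackedObject:
    def __init__(self, size_bytes):
        self.size = size_bytes
        self.survived = 0

    def on_gc_survive(self):
        """Record one survived GC; return True when promotion is due."""
        self.survived += 1
        return self.survived >= promotion_threshold(self.size)
```

    The point of such a feature-based condition is that the survival count alone is a coarse proxy for lifetime; folding in the object size lets the collector trade promotion speed against old-generation growth per object class.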

    Download PDF (2086K)
  • Shintaro Oishi, Yoshihiro Shirakawa, Masatsugu Ichino, Hiroshi Yoshiur ...
    Type: Regular Papers
    Subject area: System Security
    2018 Volume 26 Pages 518-528
    Published: 2018
    Released: July 15, 2018
    JOURNALS FREE ACCESS

    A personal identification method has been developed for searching for a specific person among other people that fuses iris and periocular features using AdaBoost. It effectively integrates scores for many features using an AdaBoost configuration in which feature selection corresponds to weak classifier selection. Our evaluation revealed three interesting findings. First, evaluation using up to eight features showed that identification accuracy increased with the number of features used: the lowest equal error rate (EER) was 1.3% and the highest identification rate was 94.1%, both when eight features were used. Second, the advantage of the proposed method over a weighted-sum method increased with the number of features used; with eight features, the difference in EER was 1.1% and the difference in identification rate was 1.8%, in both cases due to the generation of a nonlinear decision boundary. Finally, using an effective combination of information from both eyes further improved the accuracy (between the four-feature and eight-feature cases, the difference in EER was 0.7% and the difference in identification rate was 4.6%).
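    The described configuration, in which each weak classifier thresholds a single matching score so that boosting doubles as feature selection, can be sketched with decision stumps (toy scores and labels below; not the paper's data or exact formulation):

```python
import math

# Sketch of AdaBoost with single-feature threshold stumps: picking the
# best stump each round is simultaneously a feature selection step.

def train_adaboost(X, y, rounds=5):
    """X: list of score vectors; y: +1 (same person) / -1 (different)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):                  # candidate feature
            for t in sorted({x[f] for x in X}):     # candidate threshold
                for sign in (1, -1):
                    pred = [sign if x[f] >= t else -sign for x in X]
                    err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, sign, pred)
        err, f, t, sign, pred = best
        err = max(err, 1e-10)                        # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, sign))
        # Re-weight: misclassified samples gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    score = sum(a * (s if x[f] >= t else -s) for a, f, t, s in model)
    return 1 if score >= 0 else -1
```

    Because each round can pick a different feature and threshold, the combined classifier realizes a piecewise (nonlinear) decision boundary, which is the advantage the paper reports over a weighted sum of scores.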

    Download PDF (2506K)
  • Chenhui Chu, Raj Dabre, Sadao Kurohashi
    Type: Regular Papers
    Subject area: Natural Language Processing
    2018 Volume 26 Pages 529-538
    Published: 2018
    Released: July 15, 2018
    JOURNALS FREE ACCESS

    Neural machine translation (NMT) has shown very promising results when large amounts of parallel corpora are available. However, for low-resource domains, vanilla NMT cannot give satisfactory performance due to overfitting on the small parallel corpora. Two categories of domain adaptation approaches have been proposed for low-resource NMT, i.e., adaptation using out-of-domain parallel corpora and adaptation using in-domain monolingual corpora. In this paper, we conduct a comprehensive empirical comparison of the methods in both categories. For domain adaptation using out-of-domain parallel corpora, we further propose a novel domain adaptation method named mixed fine tuning, which combines two existing methods, namely fine tuning and multi-domain NMT. For domain adaptation using in-domain monolingual corpora, we compare two existing methods, namely language model fusion and synthetic data generation. In addition, we propose a method that combines these two categories. We empirically compare all the methods and discuss their benefits and shortcomings. To the best of our knowledge, this is the first comprehensive empirical comparison of domain adaptation methods for NMT.
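    The data-preparation side of mixed fine tuning can be sketched as mixing the out-of-domain corpus with an oversampled in-domain corpus for the fine-tuning stage (the oversampling ratio and function name here are our assumptions for illustration):

```python
import random

# Sketch of the corpus-mixing step for "mixed fine tuning": after
# pre-training on out-of-domain data, fine-tune on out-of-domain data
# mixed with the small in-domain corpus, oversampled to comparable size.

def mixed_finetune_corpus(out_domain, in_domain, seed=0):
    """Return a shuffled mix of both corpora, with the (small) in-domain
    corpus repeated to roughly the out-of-domain size."""
    factor = max(1, len(out_domain) // max(1, len(in_domain)))
    mixed = list(out_domain) + list(in_domain) * factor
    random.Random(seed).shuffle(mixed)
    return mixed
```

    Training on this mix, rather than on the in-domain corpus alone, is what counters the overfitting that plain fine tuning suffers on very small in-domain data.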

    Download PDF (782K)
  • Hironori Nakajo
    Type: Special Issue of Embedded Systems Engineering
    2018 Volume 26 Pages 539
    Published: 2018
    Released: August 15, 2018
    JOURNALS FREE ACCESS
    Download PDF (33K)
  • Zhongqi Ma, Ryo Kurachi, Gang Zeng, Hiroaki Takada
    Type: Special Issue of Embedded Systems Engineering
    Subject area: Implementation of Operating System Functionality
    2018 Volume 26 Pages 540-548
    Published: 2018
    Released: August 15, 2018
    JOURNALS FREE ACCESS

    The recently developed FMLP+ provides significant advantages for partitioned fixed-priority scheduling, since it ensures asymptotically optimal O(n) maximum priority-inversion blocking. The constraints under the FMLP+ can be exploited to determine bounds on the blocking time. However, these bounds may be pessimistic, since shared resources local to a processor do not incur priority-inversion blocking in some cases. Consequently, a schedulable task set may be erroneously judged unschedulable because of these pessimistic values. Based on our analysis, we added constraints to compute the maximum blocking-time bound of each task with linear programming, along with the corresponding worst-case response time. The results of experiments show that our proposed strategy is less pessimistic than existing strategies. We also demonstrate that local resource sharing should be used instead of global resource sharing where possible.

    Download PDF (1013K)
  • Takuro Yamamoto, Takuma Hara, Takuya Ishikawa, Hiroshi Oyama, Hiroaki ...
    Type: Special Issue of Embedded Systems Engineering
    Subject area: Development Environments
    2018 Volume 26 Pages 549-561
    Published: 2018
    Released: August 15, 2018
    JOURNALS FREE ACCESS

    In embedded network software running on embedded systems within the Internet of Things (IoT), high levels of runtime efficiency and user productivity are required. As an approach to improving the productivity of software development, the mruby on TOPPERS embedded component system (TECS) framework has been proposed; this framework employs a scripting language (a lightweight Ruby implementation) and supports component-based development. In this paper, we propose an extended mruby on TECS framework for developing software for IoT devices, including sensors and actuators. Our proposed framework enables mruby programs to utilize Tomakomai Internetworking (TINET), a TCP/IP protocol stack specifically designed for use in embedded systems. Further, the proposed framework incorporates two component-based functions, i.e., a componentized TINET stack called TINET+TECS and a componentized Two-Level Segregate Fit (TLSF) dynamic memory allocator called TLSF+TECS. TINET+TECS improves configurability and scalability and offers software developers high productivity through variable network buffer sizes and the ability to add or remove various TCP (or UDP) functions; it utilizes a dynamic TECS component connection method to satisfy the original TINET specifications. TLSF+TECS is a thread-safe memory allocator that runs at high speed and consumes memory efficiently. The experimental results of a comparison between TINET+TECS and the original TINET show that execution time and memory consumption overhead are both reduced, and we conclude that configurability is improved. Finally, the TLSF+TECS function that obtains and reports statistical information regarding the memory usage of mruby's virtual machine (VM) helps developers debug and verify their embedded IoT systems.

    Download PDF (1829K)
  • Renzhi Wang, Mizuho Iwaihara
    Type: Regular Papers
    Subject area: Special Section on Databases
    2018 Volume 26 Pages 562-570
    Published: 2018
    Released: August 15, 2018
    JOURNALS FREE ACCESS

    Wikipedia is the largest online encyclopedia and is widely utilized as a machine-readable semantic resource. A link within Wikipedia indicates that two articles, or parts of them, are topically related. Existing link detection methods focus on linking to article titles, because most links in Wikipedia point to article titles. However, a number of links in Wikipedia point to specific segments, such as paragraphs, because the whole article is too general and it would be hard for readers to grasp the intention of the link. We propose a method that automatically predicts whether a link target should be a specific segment or the whole article, and evaluates which segment is most relevant. We propose a combination of Latent Dirichlet Allocation (LDA) and Maximum Likelihood Estimation (MLE) to represent every segment as a vector, from which we obtain the similarity of each segment pair. We then utilize the variance, standard deviation, and other statistical features of these similarities to produce prediction results. We also apply word embeddings to map all segments into a semantic space and calculate cosine similarities between segment pairs. Finally, we train a Random Forest classifier to predict link scopes. Evaluations on Wikipedia articles show that an ensemble of the proposed features achieved the best results.
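    The scoring step can be sketched as follows: given per-segment vectors (the toy vectors below stand in for the LDA/embedding representations), compute anchor-to-segment cosine similarities and the spread statistics used as features for the article-versus-segment decision (function names are ours):

```python
import math

# Sketch of the link-scope features: cosine similarities between a link
# anchor's vector and each candidate segment's vector, plus their spread.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def link_scope_features(anchor_vec, segment_vecs):
    """Best-matching segment plus mean/std of similarities as features."""
    sims = [cosine(anchor_vec, s) for s in segment_vecs]
    mean = sum(sims) / len(sims)
    var = sum((s - mean) ** 2 for s in sims) / len(sims)
    # A large spread suggests one segment stands out -> segment-level link.
    return {"best": max(range(len(sims)), key=sims.__getitem__),
            "mean": mean, "std": math.sqrt(var)}
```

    Feature dictionaries like this one would then be fed to the Random Forest classifier described in the abstract.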

    Download PDF (1804K)
  • Michiharu Takemoto
    Type: Special Issue of Applications and the Internet in Conjunction with Main Topics of COMPSAC 2017
    2018 Volume 26 Pages 571
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS
    Download PDF (35K)
  • Fevzi Belli, Nevin Güler Dincer, Michael Linschulte, Tugkan Tuglular
    Type: Special Issue of Applications and the Internet in Conjunction with Main Topics of COMPSAC 2017
    Subject area: Invited Papers
    2018 Volume 26 Pages 572-589
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    Model-based testing of large systems usually requires decomposition of the model into hierarchical submodels for generating test sequences, which fulfills the goals of module testing, but not those of system testing. System testing requires that test sequences be generated from a fully resolved model, which necessitates refining the top-level model, that is, replacing its elements with the submodels they represent. If the model hierarchy is deep, the number of test sequences, along with their length, increases, resulting in high test costs. To resolve this conflict, a novel approach is introduced that generates test sequences based on the top-level model and replaces elements of these sequences with corresponding, optimized test sequences generated from the submodels. To compensate for the resulting loss of test accuracy, the present approach selects the components that lower the overall system reliability. The objective is to increase the reliability of these critical components by intensive testing and appropriate correction, which, as a consequence, also increases the overall reliability at less test effort without losing accuracy. An empirical study based on a large web-based commercial system is performed to validate the approach, analyze its characteristics, and discuss its strengths and weaknesses.

    Download PDF (3125K)
  • Shuji Sannomiya, Akira Sato, Kenichi Yoshida, Hiroaki Nishikawa
    Type: Special Issue of Applications and the Internet in Conjunction with Main Topics of COMPSAC 2017
    Subject area: Network Quality and Control
    2018 Volume 26 Pages 590-600
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    To identify abnormal traffic such as P2P flows, DDoS attacks, and Internet worms, this paper discusses a circuit design to realize real-time abnormal traffic detection in broadband networks. Real-time counting of cardinality is the key feature of the circuit. Although our previous study showed that cardinality counting is effective for detecting various types of abnormal traffic, the slowness of DRAM access prevented us from deploying cardinality counting in backbone networks. To address the problem of DRAM access time, this paper proposes a new algorithm for cardinality counting. By changing the order of the cardinality counting process, the proposed algorithm enables parallel accesses of DRAM circuits, which hides the slow DRAM access time through a pipeline circuit. In addition, we propose a new hashing function that also hides the DRAM access problem. It partially replaces scattered addresses with successive addresses, in order to use a faster DRAM burst access. We also report the accuracy of the cardinality counting of the new algorithm, and describe the estimated processing performance based on a pipeline tact level circuit simulation. Our experimental results show that the use of the self-timed pipeline circuit can help realize cardinality counting at rates up to 100Gbps.
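    The kind of per-key distinct counting such a circuit realizes can be sketched in software with bitmap-based linear counting (this models only the counting itself, not the paper's DRAM pipelining, burst-access hashing, or self-timed pipeline):

```python
import hashlib
import math

# Software sketch of bitmap-based ("linear counting") cardinality
# estimation: hash each item to a bit, then estimate the number of
# distinct items from the fraction of bits still zero.

class LinearCounter:
    def __init__(self, bits=1024):
        self.m = bits
        self.bitmap = bytearray(bits // 8)

    def add(self, item):
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        i = h % self.m
        self.bitmap[i // 8] |= 1 << (i % 8)

    def estimate(self):
        zeros = sum(8 - bin(b).count("1") for b in self.bitmap)
        if zeros == 0:
            return float(self.m)          # bitmap saturated
        return self.m * math.log(self.m / zeros)
```

    Because duplicates set the same bit, the estimate depends only on distinct items, which is what makes cardinality a robust signal for P2P flows, DDoS attacks, and worm scans; the paper's contribution is making the bitmap updates fast enough for backbone rates despite slow DRAM.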

    Download PDF (1374K)
  • Kenji Fujikawa, Ved P. Kafle, Pedro Martinez-Julia, Abu Hena Al Muktad ...
    Type: Special Issue of Applications and the Internet in Conjunction with Main Topics of COMPSAC 2017
    Subject area: Distributed Systems Operation and Management
    2018 Volume 26 Pages 601-611
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    In this paper, we propose a mechanism for the automatic configuration of name-bound virtual networks (NBVNs) for the Internet of Things (IoT). Generally, IoT devices indicate their correspondent nodes by names. However, current technologies for the construction of virtual networks (VNs) rely on VLANs, IP routing, and OpenFlow control, and thus do not provide a name-based solution. Our proposal fills this gap. We first define business players and their roles in constructing and using NBVNs. The players are the application service provider (ASP), the virtual network operator (VNO), and the infrastructure provider (InP). Subsequently, we propose a system to automatically construct NBVNs. It automatically allocates and assigns the IPv6 addresses required by the network nodes, including IoT devices, of each NBVN. Moreover, it auto-configures the mechanisms for data forwarding and name resolution. Thus, the proposed system constructs area- and/or time-bound VNs for offering network services to event-centric IoT applications, such as outdoor concerts and sporting events. Furthermore, we apply our proposed system to automatically construct wide-area management networks. Finally, we demonstrate that the constructed NBVNs are capable of configuring thousands of addresses and name entries within a minute, so that IoT devices can communicate with each other by using names instead of addresses.

    Download PDF (1789K)
  • Ei Khaing Win, Tomoki Yoshihisa, Yoshimasa Ishi, Tomoya Kawakami, Yuui ...
    Type: Special Issue of Applications and the Internet in Conjunction with Main Topics of COMPSAC 2017
    Subject area: Security Infrastructure
    2018 Volume 26 Pages 612-624
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    In this paper, we propose an elliptic curve cryptography (ECC)-based certificateless multi-receiver encryption scheme for device-to-device communications in Internet of Things (IoT) applications. The proposed scheme eliminates computationally expensive pairing operations to provide a lightweight multi-receiver encryption scheme, which has favourable properties for IoT applications. In addition to lower time usage for both sender and receiver, the proposed scheme offers the necessary security properties, such as source authentication, implicit user authentication, message integrity, and replay attack prevention, for secure data exchange. In this paper, we give a security proof for the proposed scheme based on the Elliptic Curve Discrete Logarithm Problem (ECDLP). We implemented our proposed scheme on a real embedded Android device and confirmed that it achieves lower time costs for both encryption and decryption compared with the most efficient existing certificate-based and certificateless multi-receiver encryption schemes.

    Download PDF (432K)
  • Tomoka Azakami, Chihiro Shibata, Ryuya Uda
    Type: Special Issue of Applications and the Internet in Conjunction with Main Topics of COMPSAC 2017
    Subject area: Machine Learning & Data Mining
    2018 Volume 26 Pages 625-636
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    Although most existing text-based CAPTCHAs use distorted images of alphanumerics, they have been criticized because large image distortions make it difficult for human beings to recognize the characters, despite the ease with which computers can eliminate the distortions and consequently recognize them. Ergonomically designed CAPTCHAs, which exploit human-specific phenomena, are a solution to this problem. Ergonomic design enables humans to recognize the characters instantly, while significantly increasing the computational cost for machines to recognize them. Recently, owing to the development of deep learning, the image recognition capability of machines has improved dramatically. Characters are easily recognized within reasonable time by deep convolutional neural networks (DCNNs), which have architectures similar to the visual perception mechanisms of the brain. In this paper, to clarify whether ergonomically designed CAPTCHAs can withstand state-of-the-art deep learning methods, we use several kinds of DCNNs to measure the classification rates of the characters displayed. With respect to ergonomic designs, we first use the Amodal CAPTCHA proposed by Mori et al., which exploits two human-specific phenomena: amodal completion and aftereffects. Second, we modify the Amodal CAPTCHA by adding jagged lines to the edges of characters, aiming to prevent DCNNs from recognizing them correctly, since edges are one of the most fundamental features for DCNNs. Experimental results, however, show that both naive and jagged-lined Amodal CAPTCHAs are almost completely broken. Another approach we took was to use only complete characters without shielding as training data, assuming that attackers have no information about how amodal completion and jagged edges were applied. However, even under this assumption, the classification rate of DCNNs is still sufficiently high. 
On the whole, our results in this paper show that any ergonomic effects such as amodal completion and jagged edges are no longer countermeasures against character recognition by DCNNs.

    Download PDF (3217K)
  • Isao Echizen
    Type: Special Issue of Computer Security Technologies for a Super Smart Society
    2018 Volume 26 Pages 637
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS
    Download PDF (37K)
  • Hiroaki Kikuchi, Chika Hamanaga, Hideo Yasunaga, Hiroki Matsui, Hideki ...
    Type: Special Issue of Computer Security Technologies for a Super Smart Society
    Subject area: Security and Society
    2018 Volume 26 Pages 638-647
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    This paper studies the feasibility of privacy-preserving data mining in epidemiological studies. As the data-mining algorithm, we focus on linear multiple regression, which can be used to identify the most significant factors among many possible variables, such as the history of many diseases. We attempt to identify a linear model that quantifies the most significant causes of death from distributed datasets containing patient and disease information. In this paper, we conduct an experiment using a real medical dataset related to stroke and apply multiple regression with six predictors, including age, sex, and medical scales such as the Japan Coma Scale and the modified Rankin Scale. The contributions of this paper are: (1) to propose a practical privacy-preserving protocol for linear multiple regression with vertically partitioned datasets; (2) to show the feasibility of the proposed system using a real medical dataset distributed between two parties, the hospital, which knows the technical details of diseases while patients are hospitalized, and the local government, which knows about residents even after a patient has left the hospital; (3) to show the accuracy and performance of the PPDM system, which allows us to estimate the expected processing time when an arbitrary number of predictors are used; and (4) to study the complexity of the extended models of vertical partitioning.
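    The non-private computation underlying such a protocol is ordinary multiple regression; since the normal-equation terms are built from column products, vertical partitioning splits the work across the parties' columns, with the cross-party products being what the secure protocol must handle. A baseline sketch with synthetic toy values (not the medical data, and without the privacy-preserving steps):

```python
import numpy as np

# Non-private baseline for the vertically partitioned setting: each
# party holds some predictor columns for the same records, and a plain
# least-squares fit is computed on the pooled columns. The secure
# protocol's goal is to obtain the same coefficients without pooling.

def fit_linear(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

# Two "parties" each hold predictor columns for the same patients.
hospital_cols = np.array([[1.0], [2.0], [3.0], [4.0]])   # e.g., a medical scale
gov_cols = np.array([[65.0], [58.0], [80.0], [72.0]])    # e.g., age
X = np.hstack([hospital_cols, gov_cols])
y = 0.5 + 2.0 * X[:, 0] + 0.1 * X[:, 1]                  # synthetic outcome
coef = fit_linear(X, y)
```

    With noise-free synthetic data the fit recovers the generating coefficients exactly, which makes it a convenient correctness check for any privacy-preserving variant.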

    Download PDF (733K)
  • Mitsuhiro Hattori, Takato Hirano, Nori Matsuda, Fumio Omatsu, Rina Shi ...
    Type: Special Issue of Computer Security Technologies for a Super Smart Society
    Subject area: Security and Society
    2018 Volume 26 Pages 648-661
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    Privacy-preserving data mining technologies have been studied extensively, and as one general approach, Calmon and Fawaz have proposed a data distortion mechanism based on a statistical inference attack framework. This theory has been extended by Erdogdu et al. to time-series data and applied to energy disaggregation of smart-meter data. However, their theory assumes that both smart-meter data and sensitive appliance state information are available when applying the privacy-preserving mechanism, which is impractical in typical smart-meter systems where only the total power usage is available. In this paper, we extend their approach so that a privacy-utility tradeoff mechanism can be applied in such practical settings. Firstly, we define a system model that captures both the architecture of the smart-meter system and the practical constraint that the power usage of each appliance cannot be measured individually. This enables us to formalize the tradeoff problem more rigorously. Secondly, we propose a privacy-utility tradeoff mechanism for that system. We apply a linear Gaussian model assumption to the system and thereby reduce the problem of obtaining unobservable information to that of learning the system parameters. Finally, we conduct two experiments applying the proposed mechanism to the power usage data of actual households. The results show that the proposed mechanism is partially effective: it prevents usage analysis of certain types of sensitive appliances while preserving that of non-sensitive appliances.
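    As a toy illustration of the disaggregation setting — not the paper's mechanism, which learns system parameters under a linear Gaussian model — inferring hidden appliance on/off states from the total reading alone can be framed as a maximum-likelihood search under Gaussian noise. All appliance signatures below are hypothetical.

```python
from itertools import product

def disaggregate(total_watts, signatures):
    """Brute-force maximum-likelihood on/off states under a Gaussian noise
    model: choose the 0/1 state vector whose summed signature is closest to
    the observed total. Tractable only for a handful of appliances."""
    best = min(product((0, 1), repeat=len(signatures)),
               key=lambda s: (total_watts -
                              sum(w * b for w, b in zip(signatures, s))) ** 2)
    return list(best)
```

    For example, with hypothetical signatures of 1500W (heater), 60W (lamp), and 200W (TV), a reading of 1510W is best explained by the heater alone.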

    Download PDF (1065K)
  • Chun-Jung Wu, Ying Tie, Satoshi Hara, Kazuki Tamiya, Akira Fujita, Kat ...
    Type: Special Issue of Computer Security Technologies for a Super Smart Society
    Subject area: System Security
    2018 Volume 26 Pages 662-672
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    In recent years, many Internet-of-Things (IoT) devices, such as home routers and Internet Protocol (IP) cameras, have been compromised through infection by malware as a consequence of weak authentication and other vulnerabilities. Malware infection can lead to functional disorders and/or misuse of these devices in cyberattacks of various kinds. However, unlike personal computers (PCs), low-cost IoT devices lack rich computational resources, with the result that conventional protection mechanisms, such as signature-based anti-virus software, cannot be used. In this study, we present IoTProtect, a lightweight, whitelist-based protection mechanism that can be deployed easily on existing commercial products with very little modification of their firmware. IoTProtect uses a whitelist to check processes running on IoT devices and periodically terminates unknown processes. Our experiments using four low-cost IoT devices and 4,981 in-the-wild malware binaries show that IoTProtect successfully terminated 99.92% of the processes created by the binaries within 44 seconds of infection, with a central processing unit (CPU) overhead of 24% and a disk space overhead of 288KB.
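    The whitelist check at the core of such a mechanism could be sketched as follows. This is our own minimal illustration of the idea — scan the Linux `/proc` filesystem and flag processes not on the whitelist — not IoTProtect's actual implementation; a real agent would run the scan periodically and kill each returned PID.

```python
import os

def unknown_pids(whitelist, proc_root="/proc"):
    """Scan a /proc-style tree for running processes and return the PIDs
    whose command name is not on the whitelist."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():       # skip non-process entries like /proc/net
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:               # process exited during the scan
            continue
        if name not in whitelist:
            pids.append(int(entry))
    return pids
```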

    Download PDF (1888K)
  • Yuhei Kawakoya, Eitaro Shioji, Yuto Otsuki, Makoto Iwamura, Jun Miyosh ...
    Type: Regular Papers
    Subject area: Network Security
    2018 Volume 26 Pages 673-686
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    Understanding how application programming interfaces (APIs) are used in a program plays an important role in malware analysis. This, however, has resulted in an endless battle between malware authors and malware analysts around the development of API [de]obfuscation techniques over the last few decades. Our goal in this paper is to show the limit of existing API de-obfuscation techniques. To do that, we first analyzed existing API [de]obfuscation techniques and clarified that an attack vector commonly exists in these techniques; then, we present Stealth Loader, which is a program loader to bypass all existing API de-obfuscation techniques. The core idea of Stealth Loader is to load a dynamic link library (DLL) and resolve its dependency without leaving any traces on memory to be detected. We demonstrated the effectiveness of Stealth Loader by analyzing a set of Windows executables and malware protected with Stealth Loader using major dynamic and static analysis tools. The results indicate that among other obfuscation tools, only Stealth Loader is able to successfully bypass all analysis tools.

    Download PDF (750K)
  • Yuichi Taguchi, Tsutomu Yoshinaga
    Type: Regular Papers
    Subject area: Special Section on Advanced Computing Systems
    2018 Volume 26 Pages 687-695
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    A method for assessing the performance of a cloud-based remote data-protection system is proposed. The method makes it possible to determine whether the system achieves its recovery point objective (RPO). It is also necessary to size the data-protection system so that the required performance is satisfied at low cost; we therefore established a formula for simulating system performance. Based on the simulation results, we show that it is possible to determine an appropriate system size that achieves the RPO while minimizing the amount of system resources. It is experimentally demonstrated that there is no gap between the measured and simulated recovery points at the peak of the system workload, so we conclude that the proposed simulation is sufficiently accurate. We also estimate that 89% of system resource expenses can be saved compared with a data-protection system designed to fit the peak workload.
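    The relationship between workload, replication bandwidth, and recovery point can be sketched with a toy backlog simulation. This is our own illustration of the general mechanism, not the paper's formula; the units and trace below are hypothetical.

```python
def max_recovery_point(write_mib_per_min, link_mib_per_min):
    """Simulate an asynchronous-replication backlog minute by minute.
    The recovery point at any instant is the time needed to drain the
    backlog over the link; the return value is its worst case (in minutes)
    over the given workload trace."""
    backlog, worst = 0.0, 0.0
    for written in write_mib_per_min:
        backlog = max(0.0, backlog + written - link_mib_per_min)
        worst = max(worst, backlog / link_mib_per_min)
    return worst
```

    Sizing then amounts to finding the smallest link capacity whose worst-case recovery point stays under the RPO, rather than provisioning for the peak write rate.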

    Download PDF (1363K)
  • Junji Yamada, Ushio Jimbo, Ryota Shioya, Masahiro Goshima, Shuichi Sak ...
    Type: Regular Papers
    Subject area: Special Section on Advanced Computing Systems
    2018 Volume 26 Pages 696-705
    Published: 2018
    Released: September 15, 2018
    JOURNALS FREE ACCESS

    The region that includes the register file is a hot spot in high-performance cores that limits the clock frequency. Although multibanking drastically reduces the area and energy consumption of the register files of superscalar processor cores, it suffers from low IPC due to bank conflicts. This paper proposes a bank-aware instruction scheduler that selects instructions so that no bank conflicts occur, except for at most one instruction. The evaluation results show that, compared with NORCS, the latest research on area- and energy-efficient register files, the proposed register file with 24 banks achieves a 20.9% reduction in circuit area and a 56.0% reduction in energy consumption, while maintaining a relative IPC of 97.0% and the original latency of the instruction scheduler.
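    The core idea of conflict-aware selection can be sketched as a greedy pass over the ready instructions, skipping any whose source operands would oversubscribe a bank's read ports this cycle. This is a simplified illustration under an assumed modulo register-to-bank mapping, not the paper's scheduler circuit.

```python
def select_conflict_free(ready, n_banks, ports_per_bank=1):
    """Greedy issue selection: walk ready instructions (each a list of
    source physical register numbers) in age order, issuing only those
    whose operands fit within each bank's remaining read ports.
    Registers map to banks by reg % n_banks (an assumption)."""
    used = [0] * n_banks        # read ports consumed per bank this cycle
    issued = []
    for idx, srcs in enumerate(ready):
        banks = [r % n_banks for r in srcs]
        need = {b: banks.count(b) for b in banks}
        if all(used[b] + n <= ports_per_bank for b, n in need.items()):
            for b, n in need.items():
                used[b] += n
            issued.append(idx)
    return issued
```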

    Download PDF (2775K)
  • Tsutomu Terada
    Type: Special Issue of Ubiquitous Computing Systems (VII)
    2018 Volume 26 Pages 706
    Published: 2018
    Released: October 15, 2018
    JOURNALS FREE ACCESS
    Download PDF (34K)
  • Tatsuya Ishihara, Kris M. Kitani, Chieko Asakawa, Michitaka Hirose
    Type: Special Issue of Ubiquitous Computing Systems (VII)
    Subject area: Sensing Systems
    2018 Volume 26 Pages 707-717
    Published: 2018
    Released: October 15, 2018
    JOURNALS FREE ACCESS

    For many automated navigation applications, the underlying localization algorithm must be able to continuously produce results that are both accurate and stable. To date, various types of localization approaches including GPS, Wi-Fi, Bluetooth and cameras have been studied extensively. Image-based localization approaches have been developed using commodity devices, such as smartphones, and have been shown to produce accurate localization systems. However, image-based localization approaches do not work well in environments that lack visual features. Therefore, we propose a novel approach that combines radio-wave information with computer-vision-based localization. In particular, we assume that Bluetooth low energy (BLE) devices are already installed in the environment. We integrate radio-wave information with two types of well-known image-based localization approaches: a Structure from Motion (SfM) based approach and a deep convolutional neural network (CNN) based approach. Our experimental results show that both image-based localization approaches become more accurate when combined with radio-wave signals. The results also show that the localization accuracy of the proposed deep CNN approach is comparable to that of the SfM-based approach while being significantly more robust. In addition, the proposed deep CNN approach was found to be robust to BLE device failures.
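    One generic way to combine an image-based position estimate with a BLE-based one — not necessarily the integration used in the paper — is inverse-variance weighting, which trusts each source in proportion to its confidence:

```python
def fuse(img_pos, img_var, ble_pos, ble_var):
    """Inverse-variance weighting of two 2-D position estimates: the
    fused coordinate is the confidence-weighted mean of the image-based
    and BLE-based estimates. A textbook sensor-fusion sketch."""
    wi, wb = 1.0 / img_var, 1.0 / ble_var
    return [(wi * i + wb * b) / (wi + wb) for i, b in zip(img_pos, ble_pos)]
```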

    Download PDF (3072K)
  • Yu Enokibori, Kenji Mase
    Type: Special Issue of Ubiquitous Computing Systems (VII)
    Subject area: Health Care
    2018 Volume 26 Pages 718-727
    Published: 2018
    Released: October 15, 2018
    JOURNALS FREE ACCESS

    E-textiles are beginning to replace several types of common equipment, such as bed sheets. One application of the body pressure data collected through such bed-sheet-type sensors is in-bed posture classification, which is expected to help prevent pressure ulcers. Since body pressure data is a kind of low-resolution image, Deep Neural Network (DNN) based algorithms seem suitable. However, it is difficult to collect enough data for DNN training in this domain because the number of sleep postures obtained from one experiment is small. For example, the number of postures collected from 19 subjects sleeping four hours each is only 224. Data augmentation techniques have been proposed to overcome such small data sizes in DNNs, but random augmentations are not well suited to this domain. We therefore investigated appropriate augmentation parameters. As a result, the combination of random shifts of up to ±20% and ±40% along the short and long sides of the bed, rotations of up to ±10 degrees, and no other transformations showed the best performance. With these parameters, the resulting DNN achieved 99.7% accuracy and a 0.997 weighted F1-score for three-posture classification (supine, left and right lateral positions), and 97.1% accuracy and a 0.970 weighted F1-score for four-posture classification (supine, prone, left and right lateral positions).
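    A random-shift augmentation of the kind tuned in the paper might be sketched as follows; this is our own minimal implementation over a pressure frame represented as a list of rows, with rotation handled analogously in a full pipeline. The percentage limits from the paper would be converted to max_dx/max_dy cell counts for the bed's short and long sides.

```python
import random

def random_shift(frame, max_dy, max_dx, fill=0):
    """Randomly translate a pressure frame by up to ±max_dy rows and
    ±max_dx columns, padding vacated cells with `fill`. The output has
    the same shape as the input, as required for DNN training."""
    dy = random.randint(-max_dy, max_dy)
    dx = random.randint(-max_dx, max_dx)
    h, w = len(frame), len(frame[0])
    return [[frame[y - dy][x - dx]
             if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)] for y in range(h)]
```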

    Download PDF (3869K)
  • Ren Ohmura, Kodai Yamamoto
    Type: Special Issue of Ubiquitous Computing Systems (VII)
    Subject area: Health Care
    2018 Volume 26 Pages 728-735
    Published: 2018
    Released: October 15, 2018
    JOURNALS FREE ACCESS

    In this study, with the aim of improving bystander CPR support using a smartphone and a smartwatch, we evaluated six feedback methods under practical conditions. Since CPR sometimes has to be performed in a noisy place, each method was evaluated in 50dB and 80dB noise environments, which correspond to a quiet office and a noisy construction site, respectively. Considering that a bystander must maintain his/her own safety while giving appropriate care to the patient, we also evaluated the ability to notice changes in the patient's condition during CPR. The evaluation results show that, when both a smartphone and a smartwatch are available, the best feedback method combines voice, a metronome sound, and a graphic display on the smartphone with vibration and a graphic display on the smartwatch. With only a smartphone, feedback using only voice is better in the loud environment, while feedback using voice and clicking sounds is best in the quiet environment. Moreover, in terms of subjective impressions, feedback using only a smartwatch is worse than the other methods, and we recommend using it in conjunction with a smartphone.

    Download PDF (1077K)
  • Toru Kobayashi, Taishi Miyazaki, Rinsuke Uchida, Hiroe Tanaka, Kenichi ...
    Type: Special Issue of Ubiquitous Computing Systems (VII)
    Subject area: Social Media Applications
    2018 Volume 26 Pages 736-746
    Published: 2018
    Released: October 15, 2018
    JOURNALS FREE ACCESS

    We propose a Social Media Agency Robot that mediates interactive communication between elderly people and the younger generation via existing social media. The system has been implemented on a cloud service and on a single-board computer embedded in a humanoid robot equipped with a microphone, camera, speaker, sensors, and network access, so that elderly people can send and receive information by voice via social media without using smartphones. We employed LINE, a proprietary instant-messaging application for electronic devices such as smartphones. LINE requires a message destination to be selected before a message is sent, whereas our robot is operated by voice to keep the user interface simple. We therefore propose a message-exchange-learning destination estimation method that frees elderly people from specifying a message destination explicitly. We also designed the Social Media Agency Robot based on a service-oriented architecture, which makes it easy to incorporate external cloud-based services. We developed a prototype system including the destination estimation method and evaluated it at a house for the elderly with care services and at detached houses in Nagasaki prefecture. We confirmed the effectiveness of the destination estimation through real message exchanges by the elderly participants and evaluated the usability of the prototype through interviews with them.
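    A hypothetical stand-in for such destination estimation — scoring candidate recipients by recency-weighted exchange frequency, so the robot can default to the most likely partner — might look like this. It is our own illustration, not the paper's learning method; the names and the decay parameter are invented.

```python
def estimate_destination(history, decay=0.9):
    """Guess the intended recipient from past exchanges: each past message
    votes for its partner, with exponentially less weight the older it is
    (the most recent entry is last in `history`)."""
    scores = {}
    weight = 1.0
    for partner in reversed(history):
        scores[partner] = scores.get(partner, 0.0) + weight
        weight *= decay
    return max(scores, key=scores.get)
```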

    Download PDF (1754K)
  • Shu Kaneko, Eiichiro Chishiro
    Type: Regular Papers
    Subject area: Special Section on Programming
    2018 Volume 26 Pages 747-754
    Published: 2018
    Released: October 15, 2018
    JOURNALS FREE ACCESS

    Resource Description Framework (RDF) is widely used as a framework for uniformly describing resources on the web. RDF expresses various data as triples consisting of a subject, a predicate, and an object, and the data can be queried using the standard query language SPARQL. Since the amount of retrieval processing grows with large-scale data, many methods that improve processing performance through distributed processing on multiple servers have been proposed. For queries that perform aggregation, however, such as SPARQL duplicate elimination and the average function, an increase in processing time with growing data volume is unavoidable. In this paper, we propose a query conversion method that allows a distributed processing method to be applied to queries that include aggregation. Based on a distributed processing method that uses summary information of the data, we evaluated retrieval with the proposed method on real data and confirmed that distributed processing speeds up the search.
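    Why aggregation queries need conversion before distribution can be illustrated with AVG: averaging per-server averages is wrong in general when servers hold different row counts, so a converted query must ship mergeable partial states instead. The sketch below is a generic illustration of that principle, not the paper's conversion method.

```python
def partial_avg(rows):
    """Per-server partial state for AVG: a (sum, count) pair."""
    return (sum(rows), len(rows))

def merge_avg(partials):
    """Combine per-server (sum, count) partials into the global average.
    Unlike a mean of per-server means, this is exact regardless of how
    unevenly the rows are distributed across servers."""
    s = sum(p[0] for p in partials)
    c = sum(p[1] for p in partials)
    return s / c
```

    Here server 1 holds three rows and server 2 holds one; the naive mean of means would give (2 + 10) / 2 = 6, while the merged result is the correct 4.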

    Download PDF (928K)
  • Art Subpa-asa, Ying Fu, Yinqiang Zheng, Toshiyuki Amano, Imari Sato
    Type: Regular Papers
    Subject area: Image Processing
    2018 Volume 26 Pages 755-767
    Published: 2018
    Released: November 15, 2018
    JOURNALS FREE ACCESS

    Direct and global component separation is an approach to studying light transport that provides a basic understanding of the properties of a scene. Conventional separation techniques rely on multiple images or on an approximation that sacrifices spatial resolution. In this article, we propose a novel single-image separation technique based on a linear basis equation at full resolution. We evaluate the data-independent Fourier basis and a learning-based PCA basis to identify the better basis representation of direct and global components. We carefully analyze the importance of high-spatial-frequency patterns to the effectiveness of our technique. Moreover, we propose a performance enhancement technique that reduces memory usage and computation time for practical implementation. The experimental results confirm that our proposed method delivers higher separation accuracy and better image quality than previous methods and is applicable to challenging video sequences.

    Download PDF (3714K)
  • Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai
    Type: Regular Papers
    Subject area: Special Section on Databases
    2018 Volume 26 Pages 768-778
    Published: 2018
    Released: November 15, 2018
    JOURNALS FREE ACCESS

    We attempt to tackle the problem of evaluating textual, multi-round, task-oriented dialogues between a customer and a helpdesk, such as those that take the form of online chats. As an initial step towards the automatic evaluation of helpdesk agent systems, we constructed a test collection comprising 3,700 real customer-helpdesk multi-round dialogues by mining Weibo, a major Chinese microblogging platform. Each dialogue has been annotated with multiple subjective quality annotations and nugget annotations, where a nugget is a minimal sequence of posts by the same utterer that helps towards problem solving. In addition, 34% of the dialogues have been manually translated into English. We first propose a nugget-based dialogue quality evaluation measure called Utility for Customer and Helpdesk (UCH). We also propose a simple neural-network-based approach to predicting dialogue quality scores from the entire dialogue, which we call the Neural Evaluation Machine (NEM). According to our experiments with DCH-1, UCH correlates better with the appropriateness of utterances than with customer satisfaction. In contrast, as NEM leverages natural language expressions within the dialogue, it correlates relatively well with customer satisfaction.

    Download PDF (1239K)
  • Akitoshi Okumura, Takamichi Hoshino, Susumu Handa, Eiko Yamada, Masahi ...
    Type: Regular Papers
    Subject area: Special Section on Consumer Device & System
    2018 Volume 26 Pages 779-788
    Published: 2018
    Released: November 15, 2018
    JOURNALS FREE ACCESS

    This paper proposes an identity-verification system for attendees of large-scale events that uses face recognition on selfies taken with smartphone cameras. Such a system is required to prevent illegal resale such as ticket scalping. The problem in verifying ticket holders is how to verify identities efficiently while preventing impersonation at large-scale events in which tens of thousands of people participate. We previously developed two ticket ID systems for identifying the purchaser and holder of a ticket, based on tablet-based and non-stop face-recognition systems that require ID equipment such as a tablet terminal with a camera, a card reader, and a ticket-issuing printer. These systems proved effective in preventing illegal resale at large concerts of popular music groups and have been used at more than 100 concerts. However, simplifying the ID equipment is necessary from an operational viewpoint. It is also necessary to secure clear facial photos from a technical viewpoint of face-recognition accuracy, because recognition fails on unclear photos, i.e., when individuals have their eyes closed, are not looking directly forward, or have hair covering their faces. We propose a system that uses attendees' selfies as input photos for face recognition; it simplifies the ID equipment by requiring only smartphone cameras and obtains clear face photos by allowing attendees to take their own photos. The system achieved 97.5% face-recognition accuracy in preliminary tests on 240 photos taken under various brightness and background conditions.

    Download PDF (2426K)
  • Kouichi Mouri
    Type: Special Issue of Toward Realizing Trusted Infrastructure by Security Technology and Human Resources for It
    2018 Volume 26 Pages 789
    Published: 2018
    Released: December 15, 2018
    JOURNALS FREE ACCESS
    Download PDF (36K)
  • Takayuki Sasaki, Katsunari Yoshioka, Tsutomu Matsumoto
    Type: Special Issue of Toward Realizing Trusted Infrastructure by Security Technology and Human Resources for It
    Subject area: Network Security
    2018 Volume 26 Pages 790-803
    Published: 2018
    Released: December 15, 2018
    JOURNALS FREE ACCESS

    New attack methods, such as new malware and exploits, are released every day, and attack information is essential for improving defense mechanisms. However, there are barriers to sharing attack information. One is that most targeted organizations do not want to disclose attack and incident information because they fear the negative public relations that disclosure could cause. Another is that attack and incident information includes confidential information. To address this problem, we propose a confidentiality-preserving collaborative defense architecture that analyzes incident information without disclosing the confidential information of the attacked organizations. To avoid such disclosure, the key features of the proposed architecture are (1) the exchange of trained classifiers, e.g., neural networks, which represent abstract information rather than raw attack/incident information, and (2) classifier aggregation via ensemble learning to build an accurate classifier using the information of the collaborating organizations. We implement and evaluate an initial prototype of the proposed architecture. The results indicate that malware classification accuracy improved from 90.4% to 92.2% when aggregating five organizations' classifiers. We conclude that the proposed architecture is feasible and demonstrates practical performance. We expect that the proposed architecture will facilitate an effective and collaborative response to current attack-defense situations.
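    The classifier-exchange-plus-ensemble step might be sketched, in its simplest majority-vote form, as follows. This is an illustration under our own assumptions — each organization's trained model is abstracted as a callable — not the paper's implementation, which aggregates neural-network classifiers.

```python
def ensemble_classify(classifiers, sample):
    """Majority vote over classifiers trained independently by different
    organizations. Only the trained models, not the underlying incident
    data, cross organizational boundaries."""
    votes = [clf(sample) for clf in classifiers]
    return max(set(votes), key=votes.count)
```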

    Download PDF (874K)
  • Mamoru Mimura, Hidema Tanaka
    Type: Regular Papers
    Subject area: Network Security
    2018 Volume 26 Pages 804-812
    Published: 2018
    Released: December 15, 2018
    JOURNALS FREE ACCESS

    Cyberattack techniques continue to evolve every day, and detecting unseen drive-by-download (DbD) attacks or C&C traffic is a challenging task. Pattern-matching techniques and blacklists of malicious sites are no longer efficient, because attackers easily change their traffic patterns or infrastructure to avoid detection. Therefore, many behavior-based detection methods that rely on immutable characteristics of the traffic have been proposed. These previous methods, however, focus on the attack technique and can only detect DbD attacks or C&C traffic that exhibit such immutable characteristics; they also have to hand-craft feature vectors. This paper proposes a generic detection method that is independent of the attack method and does not require devising feature vectors. Our method uses Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length texts, together with classifiers; Paragraph Vector captures the context in proxy server logs. We conducted cross-validation, timeline analysis, and cross-dataset validation with multiple datasets. The experimental results show that our method can detect unseen DbD attacks and C&C traffic in proxy server logs. The best F-measure reached 0.97 in the timeline analysis and 0.96 on the other dataset.
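    The overall pipeline — variable-length log text mapped to a fixed-length vector that feeds a classifier — can be illustrated with a hashing-trick stand-in. Paragraph Vector itself (e.g., gensim's Doc2Vec) learns the embedding rather than hashing tokens; this sketch only mirrors the interface, and the tokenization rule is our own simplification.

```python
import hashlib

def log_to_vector(log_line, dim=16):
    """Turn a proxy-log line into a fixed-length count vector: split the
    line into crude tokens, hash each token into one of `dim` buckets,
    and count. md5 keeps the mapping deterministic across runs."""
    vec = [0] * dim
    for token in log_line.replace("/", " ").replace("?", " ").split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1
    return vec
```

    Each log line then becomes a point in a fixed-dimensional space, so any off-the-shelf classifier can be trained on labeled benign/malicious traffic.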

    Download PDF (645K)
  • Yuhei Kawakoya, Makoto Iwamura, Jun Miyoshi
    Type: Regular Papers
    Subject area: System Security
    2018 Volume 26 Pages 813-824
    Published: 2018
    Released: December 15, 2018
    JOURNALS FREE ACCESS

    Windows Application Programming Interface (API) is an important data source for analysts to effectively understand the functions of malware. Due to this, malware authors are likely to hide the imported APIs in their malware by taking advantage of various obfuscation techniques. In this paper, we first build a formal model of the Import Address Table (IAT) reconstruction procedure to keep our description independent of specific implementations and then formally point out that the current IAT reconstruction is vulnerable to position obfuscation techniques, which are anti-analysis techniques obfuscating the positions of loaded APIs or Dynamic Link Libraries (DLLs). Next, we introduce an approach for API name resolution, which is an essential step in IAT reconstruction, on the basis of taint analysis to defeat position obfuscation techniques. The key idea of our approach is that we first define taint tags, each of which has a unique value for each API, apply the taint of the API to each of its instructions, track the movement of the API instructions by propagating the tags, and then resolve API names from the propagated tags for IAT reconstruction after acquiring a memory dump of the process under analysis. Finally, we experimentally demonstrate that a system in which our proposed API name resolution has been implemented enables us to correctly identify imported APIs even when malware authors apply various position obfuscation techniques to their malware.

    Download PDF (883K)