Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking
-
Fukuhito OOSHITA
2020 Volume E103.D Issue 12 Pages 2411
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
-
Lucas Saad Nogueira NUNES, Jacir Luiz BORDIM, Yasuaki ITO, Koji NAKANO
Article type: PAPER
Subject area: Fundamentals of Information Systems
2020 Volume E103.D Issue 12 Pages 2412-2420
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
The volume of digital information is growing at an extremely fast pace, which, in turn, exacerbates the need for efficient mechanisms to find the presence of a pattern in an input text or a set of input strings. Combining the processing power of the Graphics Processing Unit (GPU) with matching algorithms is a natural way to speed up string matching. This work proposes a Parallel Rabin-Karp implementation (PRK) that encompasses a fast parallel prefix-sums algorithm to maximize parallelization and accelerate the matching verification. Given an input text T of length n and p patterns of length m, the proposed implementation finds all occurrences of the patterns in T in O(m+q+n/τ+nm/q) time, where q is a sufficiently large prime number and τ is the number of available threads. Sequential and parallel versions of PRK have been implemented. Experiments were executed on p≥1 patterns of length m=10, 20, and 30 characters against a text string of length n=2^27. The results show that the parallel implementation of the PRK algorithm on an NVIDIA V100 GPU provides speedups surpassing 372 times over the sequential implementation and 12.59 times over an OpenMP implementation running on a multi-core server with 128 threads. Compared to another prominent GPU implementation, the PRK implementation attained a speedup surpassing 37 times.
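For readers unfamiliar with the underlying algorithm, the heart of Rabin-Karp is a rolling hash modulo a large prime q: the hash of each length-m window of T is derived from the previous window's hash in O(1), and only hash matches are verified character by character. The sketch below is a minimal sequential Python illustration for a single pattern (our own; it does not reproduce the paper's prefix-sums-based GPU parallelization):

```python
def rabin_karp(text: str, pattern: str, q: int = 1_000_000_007, base: int = 256) -> list:
    """Return the start indices of all occurrences of `pattern` in `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(base, m - 1, q)                # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):                     # hashes of the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[i])) % q
        t_hash = (t_hash * base + ord(text[i])) % q
    hits = []
    for s in range(n - m + 1):
        if p_hash == t_hash and text[s:s + m] == pattern:  # verify: q only bounds collisions
            hits.append(s)
        if s < n - m:                      # roll the window one character rightward
            t_hash = ((t_hash - ord(text[s]) * h) * base + ord(text[s + m])) % q
    return hits

print(rabin_karp("abracadabra", "abra"))   # [0, 7]
```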
-
Jingcheng SHEN, Fumihiko INO, Albert FARRÉS, Mauricio HANZICH
Article type: PAPER
Subject area: Fundamentals of Information Systems
2020 Volume E103.D Issue 12 Pages 2421-2434
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Graphics processing units (GPUs) are highly efficient architectures for parallel stencil code; however, the small device (i.e., GPU) memory capacity (several tens of GBs) necessitates the use of out-of-core computation to process excess data. Great programming effort is needed to manually implement efficient out-of-core stencil code. To relieve such programming burdens, directive-based frameworks have emerged, such as the pipelined accelerator (PACC); however, they usually lack specific optimizations to reduce data transfer. In this paper, we extend PACC with two data-centric optimizations to address data transfer problems. The first is a direct-mapping scheme that eliminates host (i.e., CPU) buffers, which mediate between the original data and device buffers. The second is a region-sharing scheme that significantly reduces host-to-device data transfer. The extended PACC was applied to an acoustic wave propagator, automatically extending the original serial code 2.3-fold in length to obtain the out-of-core code. Experimental results revealed that on a Tesla V100 GPU, the generated code ran 41.0, 22.1, and 3.6 times as fast as implementations based on Open Multi-Processing (OpenMP), Unified Memory, and the previous PACC, respectively. The generated code also proved useful with small datasets that fit in the device capacity, running 1.3 times as fast as an in-core implementation.
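The out-of-core pattern itself is compact: when the data exceeds device memory, process it chunk by chunk, shipping each chunk plus a small halo so chunks can be updated independently. The NumPy sketch below illustrates only this pattern on the CPU, under an assumed chunk size and stencil; it is not the PACC directive framework:

```python
import numpy as np

def stencil_out_of_core(u: np.ndarray, steps: int, chunk: int = 4096) -> np.ndarray:
    """3-point averaging stencil applied chunk-by-chunk with 1-cell halos."""
    n = u.size
    for _ in range(steps):
        nxt = u.copy()
        for lo in range(1, n - 1, chunk):    # only one chunk is "resident" at a time
            hi = min(lo + chunk, n - 1)
            halo = u[lo - 1:hi + 1]          # the chunk plus its halo cells
            nxt[lo:hi] = (halo[:-2] + halo[1:-1] + halo[2:]) / 3.0
        u = nxt
    return u

print(stencil_out_of_core(np.arange(8.0), steps=1))
```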
-
Ke CUI, Michihiro KOIBUCHI
Article type: PAPER
Subject area: Fundamentals of Information Systems
2020 Volume E103.D Issue 12 Pages 2435-2443
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Random network topologies have been proposed as low-latency networks for parallel computers. Although multicast is a common collective-communication operation, multicast algorithms, each of which consists of a large number of unicasts, are not well optimized for random network topologies. In this study, we first apply a two-opt algorithm to build efficient multicasts on random network topologies. The two-opt algorithm creates a well-ordered list of visiting nodes that minimizes the total path hops or the total possible contention counts of the unicasts that form the target multicast. We then extend the two-opt algorithm to other collective-communication operations, e.g., allreduce and allgather. SimGrid discrete-event simulation results show that the two-opt multicast outperforms that of a typical MPI implementation by up to 22% in the execution time of an MPI program that repeats the MPI_Bcast function. The two-opt allreduce and allgather operations also improve the execution time by up to 15% and 14%, respectively, compared to those used in typical MPI implementations.
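As a rough illustration of the reordering idea (not necessarily the paper's exact procedure), a 2-opt pass over the visiting order of a unicast chain looks as follows; hops(a, b) is an assumed callback returning precomputed shortest-path hop counts on the topology, and the first element (the multicast source) is kept fixed:

```python
def two_opt_order(nodes: list, hops) -> list:
    """Improve a unicast visiting order by 2-opt: keep reversing a sub-segment
    while doing so lowers the total hop count of the resulting chain."""
    def cost(o):
        return sum(hops(o[i], o[i + 1]) for i in range(len(o) - 1))
    order, improved = nodes[:], True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):      # i starts at 1: the source stays first
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if cost(cand) < cost(order):
                    order, improved = cand, True
    return order

# e.g., with hop counts on a path graph:
print(two_opt_order([0, 3, 1, 2], hops=lambda a, b: abs(a - b)))  # [0, 1, 2, 3]
```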
-
Chenxu WANG, Yutong LU, Zhiguang CHEN, Junnan LI
Article type: PAPER
Subject area: Fundamentals of Information Systems
2020 Volume E103.D Issue 12 Pages 2444-2456
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Training deep learning (DL) networks is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High-performance computing clusters, especially supercomputers, are equipped with a large amount of computing resources, storage resources, and efficient interconnects, which can train DL networks better and faster. In this paper, we propose a method to train DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which can make full use of hardware resources and greatly increase computational efficiency. Second, we present a two-level parameter synchronization scheme which reduces communication overhead by transmitting the parameters of first-level models in shared memory. Third, we optimize parallel I/O by making each reader read data as contiguously as possible to avoid the high overhead of discontinuous data reading. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has tremendous performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) can increase computing efficiency by about 20 times.
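The hierarchy in the first contribution can be pictured as two nested averaging steps: average gradients inside each node first, then across nodes. The toy NumPy sketch below (ours; arrays stand in for real shared-memory and inter-node communication) shows only that structure:

```python
import numpy as np

def hierarchical_sgd_step(w, grads_per_node, lr=0.01):
    """One synchronous step with two-level gradient averaging.

    grads_per_node[i] is an array of per-worker gradients on node i.
    Level 1 averages inside each node (cheap, shared memory);
    level 2 averages the per-node means (inter-node communication)."""
    local_means = [np.mean(g, axis=0) for g in grads_per_node]   # level 1
    global_mean = np.mean(local_means, axis=0)                   # level 2
    return w - lr * global_mean

w = np.zeros(4)
grads = [np.ones((8, 4)), 3 * np.ones((8, 4))]   # two nodes, 8 workers each
print(hierarchical_sgd_step(w, grads))           # [-0.02 -0.02 -0.02 -0.02]
```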
-
Yuxi SUN, Hideharu AMANO
Article type: PAPER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2457-2462
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Recurrent neural networks (RNNs) have proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be exploited to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to an increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedups, respectively, compared with an Intel CPU.
-
Masayuki SHIMODA, Youki SADA, Ryosuke KURAMOCHI, Shimpei SATO, Hiroki ...
Article type: PAPER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2463-2470
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
In realizing convolutional neural networks (CNNs) on resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it difficult to utilize the underlying parallelism. To address this problem, we present SENTEI, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. Filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight-skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, in which a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher, respectively, than those of a mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.
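The equalization constraint at the core of filter-wise pruning is simple to state: every filter keeps exactly the same number of nonzero weights, so a hardware block can process filters in lockstep. A minimal NumPy sketch of that constraint follows (ours; the distillation-based retraining that restores accuracy is a separate step and is not shown):

```python
import numpy as np

def prune_filterwise(weights: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` largest-magnitude weights in every filter.

    `weights` has shape (num_filters, fan_in); after pruning, every filter
    holds exactly `keep` nonzeros, enabling inter-filter parallelism."""
    pruned = np.zeros_like(weights)
    for f in range(weights.shape[0]):
        idx = np.argsort(np.abs(weights[f]))[-keep:]   # indices of the largest |w|
        pruned[f, idx] = weights[f, idx]
    return pruned

w = np.random.randn(4, 9)
print((prune_filterwise(w, keep=3) != 0).sum(axis=1))  # [3 3 3 3]
```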
-
Ryuta KAWANO, Ryota YASUDO, Hiroki MATSUTANI, Michihiro KOIBUCHI, Hide ...
Article type: PAPER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2471-2479
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Network throughput has become an important issue for big-data analysis on Warehouse-Scale Computing (WSC) systems. It has been reported that randomly connected inter-switch networks can increase network throughput. For irregular networks, a multi-path routing method called k-shortest path routing is conventionally utilized. However, it cannot efficiently exploit longer-than-shortest paths that would serve as detour paths to avoid bottlenecks. In this work, a novel routing method called k-optimized path routing is proposed to achieve high throughput on irregular networks. We introduce a heuristic that selects detour paths to avoid bottlenecks in the network and improve the average-case network throughput. Experimental results from network simulation show that the proposed k-optimized path routing can improve the saturation throughput by up to 18.2% compared to the conventional k-shortest path routing. Moreover, it can reduce the computation time required for optimization to as little as 1/2760 of that of our previously proposed method.
-
Yao HU, Michihiro KOIBUCHI
Article type: PAPER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2480-2493
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Due to recent technological progress in big-data processing, many applications present irregular or unpredictable communication patterns among compute nodes in high-performance computing (HPC) systems. Traditional communication infrastructures, e.g., torus or fat-tree interconnection networks, may not be well matched to these newly emerging applications. Many communication-efficient application mapping algorithms exist for such typical non-random network topologies, which use nearby compute nodes to reduce network distances. However, for unpredictable communication patterns, it is difficult to map applications efficiently onto non-random network topologies. In this context, we recommend random network topologies as the communication infrastructure; they have drawn increasing attention for HPC interconnects due to their small diameter and average shortest path length (ASPL). We conduct a comparative study to analyze application mapping performance on non-random and random network topologies. We propose using topology embedding metrics, i.e., diameter and ASPL, and list several diameter/ASPL-based application mapping algorithms to compare their job scheduling performance, assuming that the communication pattern of each application is unpredictable to the computing system. Evaluation with a large compound application workload shows that, compared to non-random topologies, random topologies reduce the average turnaround time by up to 39.3% with a random connected mapping method and by up to 72.1% with a diameter/ASPL-based mapping algorithm. Moreover, compared to the baseline topology mapping method, the proposed diameter/ASPL-based topology mapping strategy reduces makespan by up to 48.0% and average turnaround time by up to 78.1%, and improves system utilization by up to 1.9x on random topologies.
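The two embedding metrics the mapping algorithms build on, diameter and ASPL, are standard graph quantities and can be computed directly; the short sketch below uses networkx, with a random regular graph standing in for a random inter-switch topology (parameters are illustrative):

```python
import networkx as nx

# A random 4-regular graph stands in for a random inter-switch topology.
G = nx.random_regular_graph(d=4, n=64, seed=1)
assert nx.is_connected(G)        # random regular graphs are almost surely connected
print("diameter:", nx.diameter(G))
print("ASPL:", round(nx.average_shortest_path_length(G), 3))
```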
-
Hiromu MIYAZAKI, Takuto KANAMORI, Md Ashraful ISLAM, Kenji KISE
Article type: PAPER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2494-2503
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
RISC-V is a RISC-based, open, and royalty-free instruction set architecture that has been developed since 2010 and can be used for cost-effective soft processors on FPGAs. The basic 32-bit integer instruction set in RISC-V is defined as RV32I, which is sufficient to support operating system environments and suits embedded systems. In this paper, we propose an optimized RV32I soft processor named RVCoreP that adopts five-stage pipelining. Three effective methods are applied to the processor to improve the operating frequency: instruction fetch unit optimization, ALU optimization, and data memory optimization. We implement RVCoreP in Verilog HDL and verify its behavior using Verilog simulation and an actual Xilinx Artix-7 FPGA board. We evaluate IPC (instructions per cycle), operating frequency, hardware resource utilization, and processor performance. The evaluation results show that RVCoreP achieves a 30.0% performance improvement compared with VexRiscv, a high-performance, open-source RV32I processor selected from related work.
-
Elsayed A. ELSAYED, Kenji KISE
Article type: PAPER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2504-2517
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Data sorting is an important operation in computer science. It is extensively used in several applications such as databases and searching. While high-performance sorting accelerators are in demand, it is important to pay attention to the hardware resources required by such high-performance sorters. In this paper, we propose three FPGA-based architectures that accelerate sorting based on the merge sorting algorithm. We call our proposals WMS: Wide Merge Sorter, EHMS: Efficient Hardware Merge Sorter, and EHMSP: Efficient Hardware Merge Sorter Plus. We target the Virtex UltraScale FPGA device. Evaluation results show that our proposed merge sorters maintain both high performance and cost effectiveness. While using far fewer hardware resources, they achieve higher performance than the state-of-the-art. For instance, with 256 sorted records produced per cycle, implementation results for the proposed EHMS show a significant reduction in the required number of Flip-Flops (FFs) and Look-Up Tables (LUTs), to about 66% and 79%, respectively, of the state-of-the-art merge sorter. Moreover, while requiring fewer hardware resources, EHMS achieves about 1.4x higher throughput than the state-of-the-art merge sorter. For the same number of produced records, the proposed WMS also achieves about 1.6x throughput improvement over the state-of-the-art while requiring about 81% of the FFs and 76% of the LUTs needed by the state-of-the-art sorter.
-
Shougo INOUE, Satoshi FUJITA
Article type: PAPER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2518-2524
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
In this paper, we consider the collaborative editing of two-dimensional (2D) data such as handwritten letters and illustrations. In contrast to the editing of 1D data, which is generally realized by combinations of insertions and deletions of characters, the overriding of strokes can have a specific meaning in editing 2D data. In other words, the appearance of the resulting picture depends on the order in which strokes are reflected onto the shared canvas, in addition to the absolute coordinates of the strokes. We propose a Peer-to-Peer (P2P) collaborative drawing system consisting of several nodes with replica canvases, in which consistency among the replica canvases is maintained through the data channel of WebRTC. The system supports three editing modes concerning the reflection order of strokes generated by different users. Experimental results indicate that the proposed system realizes a short latency of around 120 ms, half that of a cloud-based system implemented with the Firebase Realtime Database. In addition, it realizes smooth drawing of pictures on remote canvases with a refresh rate of 12 fps.
-
SeokHwan KONG, Saikia DIPJYOTI, JaiYong LEE
Article type: LETTER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2525-2527
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
With the spread of smart cities through 5G and the development of IoT devices, the number of services requiring firm assurance of high capacity and ultra-low delay in various forms is increasing. However, the continuous growth of data volumes makes it difficult for a centralized cloud to ensure quality of service. To address this, a variety of distributed application architectures, such as MEC (Mobile/Multi-access Edge Computing), are being researched. However, vendor-dependent MEC technology based on VNFs (Virtual Network Functions) has performance and scalability issues when deploying a variety of 5G-based services. This paper proposes PRISM-MECR, an SDN (Software Defined Network)-based hardware-accelerated MEC router using a P4 [3] programmable chip, to improve forwarding performance while minimizing the load on the host CPU cores in charge of forwarding.
-
Yasuhiro NAKAHARA, Masato KIYAMA, Motoki AMAGASAKI, Masahiro IIDA
Article type: LETTER
Subject area: Computer System
2020 Volume E103.D Issue 12 Pages 2528-2529
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Quantization is an important technique for implementing convolutional neural networks on edge devices. Quantization often requires relearning, but relearning cannot always be applied because of issues such as cost or privacy. In such cases, it is important to know the numerical precision required to maintain accuracy. We accurately simulate calculations on hardware and precisely measure the relationship between accuracy and numerical precision.
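The kind of hardware-accurate simulation described here reduces, per operation, to rounding and clipping each value to a chosen fixed-point format. A minimal sketch of such a rounding step (format, names, and values are our own illustration):

```python
import numpy as np

def quantize(x: np.ndarray, int_bits: int, frac_bits: int) -> np.ndarray:
    """Simulate signed fixed-point arithmetic with the given bit widths."""
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** (int_bits + frac_bits - 1))   # smallest representable code
    hi = -lo - 1                                # largest representable code
    return np.clip(np.round(x * scale), lo, hi) / scale

x = np.array([0.1234, -1.5, 3.9999])
print(quantize(x, int_bits=3, frac_bits=5))     # [ 0.125  -1.5  3.96875]
```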
-
Xiaoxuan GUO, Renxi GONG, Haibo BAO, Zhenkun LU
Article type: PAPER
Subject area: Fundamentals of Information Systems
2020 Volume E103.D Issue 12 Pages 2549-2558
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
It is well known that large-scale access of wind power to the power system affects the economic and environmental objectives of power generation scheduling, and also brings new challenges to traditional deterministic power generation scheduling because of the intermittency and randomness of wind power. To deal with these problems, a multiobjective optimization dispatch method for wind-thermal power systems is proposed. The method can be described as follows: a multiobjective interval power generation scheduling model of the wind-thermal power system is first established by describing the wind speed on the wind farm as an interval variable, with the minimization of the fuel cost and the pollution gas emission cost of the thermal power units chosen as the objective functions. Then, the optimistic and pessimistic Pareto frontiers of the multiobjective interval power generation scheduling are obtained by an improved normal boundary intersection (NBI) method combined with a bilevel optimization method to solve the model. Finally, the optimistic and pessimistic compromise solutions are determined by a distance evaluation method. The calculation results for a 16-unit 174-bus system show that the proposed method obtains uniform optimistic and pessimistic Pareto frontiers, and the impact of wind speed interval uncertainty on the economic and environmental indicators can be quantified. In addition, it is verified that the Pareto frontier in the actual scenario is distributed between the optimistic and pessimistic Pareto frontiers, and the influence of different wind power access levels on the optimistic and pessimistic Pareto frontiers is analyzed.
-
Ho-Young KIM, Seong-Won LEE
Article type: PAPER
Subject area: Software System
2020 Volume E103.D Issue 12 Pages 2559-2567
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
In an Internet of Things (IoT) system using an energy harvesting device and a secondary (2nd) battery, power management that does not account for the aging of the 2nd battery shortens the lifespan of the entire system. In this paper, we propose a scheme that extends the lifetime of an energy-harvesting-based IoT system equipped with a lithium 2nd battery. The proposed scheme includes several policies: using a supercapacitor as the primary energy storage, limiting the charging level according to the predicted harvesting energy, swinging the energy level around the minimum-stress state-of-charge (SOC) level, and delaying the charge start time. Experiments with natural solar energy measurements based on a battery aging approximation model show that the proposed method can extend the operating lifetime of an existing IoT system from less than one and a half years to more than four years.
-
Akkharawoot TAKHOM, Sasiporn USANAVASIN, Thepchai SUPNITHI, Prachya BO ...
Article type: PAPER
Subject area: Software Engineering
2020 Volume E103.D Issue 12 Pages 2568-2577
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
An ontology describes concepts and relations in a specific knowledge domain, which are important for knowledge representation and knowledge sharing. In the past few years, several tools have been introduced for ontology modeling and editing. Designing and developing an ontology is a challenging task, with challenges quite similar to those of software development, as it requires many collaborative activities from many stakeholders (e.g., domain experts, knowledge engineers, application users, etc.) throughout the development cycle. Most existing tools do not provide collaborative features to support stakeholders in working together effectively. In addition, there is a lack of standard process adoption for ontology development tasks. Thus, in this work, we incorporate the ontology development process into the Scrum process, which is used as a standard process in software engineering. Based on Scrum, we can perform standard agile development of ontologies, which can shorten the development cycle and respond to changes better and faster. To support this idea, we propose a Scrum Ontology Development Framework, an online collaborative framework for agile ontology design and development. Each ontology development process based on the Scrum model is supported by different services in our framework, aiming to promote collaborative activities among different roles of stakeholders. In addition to services such as visual ontology modeling and editing, we provide three more important features: 1) concept/relation misunderstanding diagnosis, 2) cross-domain concept detection, and 3) concept classification. All these features allow stakeholders to share their understanding and collaboratively discuss improving the quality of domain ontologies through community consensus.
-
Ying JI, Yu WANG, Jien KATO, Kensaku MORI
Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2020 Volume E103.D Issue 12 Pages 2578-2589
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
With the rapid development of multimedia, violent videos can be easily accessed in games, movies, websites, and so on. Identifying violent videos and rating their violence extent is of great importance for media filtering and child protection. Many previous studies address only violence scene detection and violent action recognition, while the violence rating problem remains unsolved. In this paper, we present a novel video-level rating prediction method to estimate violence extent automatically. It has two main characteristics: (1) a two-stream network is fine-tuned to construct effective representations of violent videos; (2) a violence rating prediction machine is designed to learn the strength relationship among different videos. Furthermore, we present a novel violent video dataset with a total of 1,930 human-involved violent videos designed for violence rating analysis. Each video is annotated with 6 fine-grained objective attributes, which are considered to be closely related to violence extent. The ground truth of the violence rating is given by a pairwise comparison method. The dataset is evaluated for both stability and convergence. Experimental results on this dataset demonstrate the effectiveness of our method compared with state-of-the-art classification methods.
-
Hayato YAMAKI, Hiroaki NISHI, Shinobu MIWA, Hiroki HONDA
Article type: PAPER
Subject area: Information Network
2020 Volume E103.D Issue 12 Pages 2590-2599
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
We propose a technique to reduce compulsory misses of the packet processing cache (PPC), which largely affect both the throughput and energy of core routers. Rather than prefetching data, our technique, called response prediction cache (RPC), speculatively stores predicted data in the PPC without additional accesses to the low-throughput and power-consuming memory (i.e., TCAM). RPC predicts the data related to a response flow at the arrival of the corresponding request flow, based on the request-response model of Internet communications. Our experimental results with 11 real-network traces show that RPC can reduce the PPC miss rate by 13.4% upstream and 47.6% downstream on average, assuming a three-layer PPC. Moreover, we extend RPC to adaptive RPC (A-RPC), which selects whether to use RPC in each direction within a core router for a further reduction in PPC misses. Finally, we show that A-RPC achieves 1.38x table-lookup throughput with 74% of the energy consumption per packet compared to a conventional PPC.
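The mechanism can be sketched in a few lines: on a request-flow miss, besides caching the TCAM result for the request, install a predicted entry for the reverse 5-tuple so that the first response packet already hits. The dict-backed Python sketch below is our illustration; tcam() and predict() are assumed callbacks, the latter deriving the response-side result from the request's result, which is what avoids a second TCAM access:

```python
def reverse_flow(flow):
    """Predict the 5-tuple of the response flow from a request flow."""
    src, dst, sport, dport, proto = flow
    return (dst, src, dport, sport, proto)

class ResponsePredictionCache:
    """Toy flow cache with response prediction (illustrative only)."""
    def __init__(self, tcam, predict):
        self.tcam = tcam        # slow, power-hungry lookup (assumed callback)
        self.predict = predict  # derives the response-side result (assumed callback)
        self.cache = {}
    def lookup(self, flow):
        if flow in self.cache:
            return self.cache[flow]          # hit: TCAM untouched
        result = self.tcam(flow)             # miss: one TCAM access
        self.cache[flow] = result
        # Speculatively install the entry the response flow will need.
        self.cache.setdefault(reverse_flow(flow), self.predict(result))
        return result
```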
-
Sanghun CHOI, Yichen AN, Iwao SASASE
Article type: PAPER
Subject area: Information Network
2020 Volume E103.D Issue 12 Pages 2600-2610
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Flooding DDoS attacks are a serious problem these days. To detect them, both survival approaches and mitigation approaches have been investigated. Since survival approaches place the burden on the victims, mitigation approaches are mainly studied. Among mitigation approaches, conventional schemes using Bloom filters, machine learning, and pattern analysis have been investigated to detect flooding DDoS attacks. However, those schemes are not effective at ensuring high accuracy (ACC), a high true positive rate (TPR), and a low false positive rate (FPR). In addition, their data size and calculation time are large. Moreover, their performance degrades under fluctuating attack rates (packets per second, pps). To detect flooding DDoS attacks effectively, we propose lightweight detection using a Bloom filter. To ensure high accuracy, a high true positive rate, and a low false positive rate, the dec-all (decrement-all) operation and the checkpoint are flexibly adjusted according to the fluctuating pps in the Bloom filter. Since we consider only the IP address, all kinds of flooding attacks can be detected without a blacklist or whitelist. Moreover, recognizing an attack involves no extra complexity. Computer simulations with the datasets show that our scheme achieves an accuracy of 97.5%; the true positive rate and false positive rate are 97.8% and 6.3%, respectively. The data size for processing is as small as 280 bytes. Furthermore, our scheme can detect a flooding DDoS attack within 11.1 s of calculation time.
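The two ingredients named in the abstract, a counting Bloom filter over source IPs and a dec-all decay applied at checkpoints, can be sketched as follows (sizes, hash choice, and threshold are our own placeholders; the scheme's contribution is adapting the decay and the checkpoint to the fluctuating pps):

```python
import hashlib

class CountingBloom:
    """Counting Bloom filter with a periodic 'dec-all' decay (illustrative)."""
    def __init__(self, size=1024, k=3):
        self.size, self.k = size, k
        self.counts = [0] * size
    def _slots(self, key: str):
        for i in range(self.k):                       # k independent hash slots
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size
    def add(self, key: str):                          # one arriving packet
        for s in self._slots(key):
            self.counts[s] += 1
    def dec_all(self, step: int = 1):                 # decay at each checkpoint
        self.counts = [max(0, c - step) for c in self.counts]
    def is_suspicious(self, key: str, threshold: int) -> bool:
        return min(self.counts[s] for s in self._slots(key)) >= threshold

cbf = CountingBloom()
for _ in range(100):
    cbf.add("10.0.0.1")                               # high-rate source
cbf.dec_all(step=10)
print(cbf.is_suspicious("10.0.0.1", threshold=50))    # True
```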
-
Haitao XIE, Qingtao FAN, Qian XIAO
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2020 Volume E103.D Issue 12 Pages 2611-2619
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Nowadays, recommender systems (RS) keep drawing attention from academia, and collaborative filtering (CF) is the most successful technique for building RS. To overcome the inherent limitation of CF, referred to as data sparsity, various solutions have been proposed to incorporate additional social information into recommendation processes, such as trust networks. However, existing methods suffer from multi-source data integration (i.e., fusion of social information and ratings), which is the basis for similarity calculation of user preferences. To this end, we propose a social collaborative filtering method based on novel trust metrics. Firstly, we use Graph Convolutional Networks (GCNs) to learn the associations between social information and user ratings while considering the underlying social network structures. Secondly, we measure the direct-trust values between neighbors by representing multi-source data as user ratings on popular items, and then calculate the indirect-trust values based on trust propagation. Thirdly, we employ all trust values to create a social regularization term in user-item rating matrix factorization in order to avoid overfitting. Experiments on real datasets show that our approach outperforms other state-of-the-art methods in using multi-source data to alleviate data sparsity.
-
Kazuki SESHIMO, Akira OTA, Daichi NISHIO, Satoshi YAMANE
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2020 Volume E103.D Issue 12 Pages 2620-2631
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
In recent years, the use of big data has attracted more attention, and many techniques for data analysis have been proposed. Big data analysis is difficult, however, because such data varies greatly in its regularity. Heterogeneous mixture machine learning is one algorithm for analyzing such data efficiently. In this study, we propose online heterogeneous learning based on an online EM algorithm. Experiments show that this algorithm achieves higher learning accuracy than a conventional method and is practical. The online learning approach will make this algorithm useful in the field of data analysis.
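To make the online-EM idea concrete, the sketch below runs one E-step and one decaying-step-size update per arriving sample for a 1-D Gaussian mixture. This is a generic stochastic-approximation formulation, not necessarily the paper's exact update rules:

```python
import numpy as np

def online_em_gmm(stream, K=2, eta=lambda t: (t + 2) ** -0.6, seed=0):
    """Online EM for a 1-D Gaussian mixture: each sample triggers one E-step
    and one decaying-step-size parameter update (no re-scan of past data)."""
    rng = np.random.default_rng(seed)
    w, mu, var = np.full(K, 1.0 / K), rng.normal(size=K), np.ones(K)
    for t, x in enumerate(stream):
        # E-step: responsibilities of the single new point (constants cancel)
        p = w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(var)
        p /= p.sum()
        # M-step: stochastic-approximation update of the parameters
        g = eta(t)
        w = (1 - g) * w + g * p
        mu += g * p * (x - mu) / w
        var += g * p * ((x - mu) ** 2 - var) / w
        var = np.maximum(var, 1e-6)          # numerical floor
    return w, mu, var

rng = np.random.default_rng(1)
data = rng.permutation(np.r_[rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
print(online_em_gmm(data))                    # weights, means, variances
```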
-
Hiroyuki OKUDA, Nobuto SUGIE, Tatsuya SUZUKI, Kentaro HARAGUCHI, Zibo ...
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2020 Volume E103.D Issue 12 Pages 2632-2642
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Path planning and motion control are fundamental components for realizing safe and reliable autonomous driving. The distinction between the roles of these two components, however, is somewhat obscure because of the strong mathematical interaction between them. This often results in redundant computation in implementations. One attractive idea to overcome this redundancy is simultaneous path planning and motion control (SPPMC) based on a model predictive control framework. SPPMC finds the optimal control input considering not only the vehicle dynamics but also various constraints that reflect physical limitations, safety requirements, and so on, to achieve the goal of a given behavior. In driving in real traffic environments, decision making also interacts strongly with planning and control. This is much more pronounced when several tasks are switched in context to realize higher-level tasks. This paper presents a basic idea for integrating decision making, path planning, and motion control that can be executed in real time. In particular, lane-changing behavior together with the decision of its initiation is selected as the target task. The proposed idea is based on nonlinear model predictive control with appropriate switching of the cost function and the constraints within it. As a result, the decision of initiation, planning, and control of lane-changing behavior are achieved by solving a single optimization problem under several constraints such as safety. The validity of the proposed method is tested using a vehicle simulator.
-
Liyang ZHANG, Hiroyuki SUZUKI, Akio KOYAMA
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2020 Volume E103.D Issue 12 Pages 2643-2648
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
In recent years, with the improvement of health awareness, people have paid more and more attention to eating properly. Existing research has shown that proper meals can help people prevent lifestyle diseases such as diabetes. In this research, by attaching sensors to tableware, information during a meal can be captured; after processing and analyzing it, meal information such as the time and sequence of the meal can be obtained. This paper introduces how to use supervised learning and multi-instance learning to process meal information, and a detailed comparison is made. Three supervised learning algorithms and two multi-instance learning algorithms are used in the experiment. The experimental results show that although the supervised learning algorithms achieved good results in F-score, the multi-instance learning algorithms achieved better results not only in accuracy but also in F-score.
-
Kazunori IWATA
Article type: PAPER
Subject area: Pattern Recognition
2020 Volume E103.D Issue 12 Pages 2649-2658
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
The nearest neighbor method is a simple and flexible scheme for the classification of data points in a vector space. It predicts the class label of an unseen data point using a majority rule over the labels of known data points inside a neighborhood of the unseen point. Because it sometimes achieves good performance even for complicated problems, several derivatives of it have been studied. Among them, the discriminant adaptive nearest neighbor method is particularly worth revisiting to demonstrate its application. The main idea of this method is to adjust the neighborhood metric of an unseen data point to the set of known data points before label prediction. It often improves the prediction, provided the neighborhood metric is adjusted well. In statistical shape analysis, shape classification attracts attention because it is a vital topic. However, because a shape is generally expressed as a matrix, it is non-trivial to apply the discriminant adaptive nearest neighbor method to shape classification. Thus, in this study, we develop the discriminant adaptive nearest neighbor method to make it slightly more useful in shape classification. To achieve this, a mixture model and an optimization algorithm for shape clustering are incorporated into the method. Furthermore, we describe several helpful techniques for the initial guess of the model parameters in the optimization algorithm. Using several shape datasets, we demonstrate that our method is successful for shape classification.
-
Noriyuki MATSUNAGA, Yamato OHTANI, Tatsuya HIRAHARA
Article type: PAPER
Subject area: Speech and Hearing
2020 Volume E103.D Issue 12 Pages 2659-2672
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Deep neural network (DNN)-based speech synthesis has become popular in recent years and is expected to soon be widely used in embedded devices and environments with limited computing resources. The key intention of these systems in poor computing environments is to reduce the computational cost of generating speech parameter sequences while maintaining voice quality. However, reducing computational costs is challenging for the two primary conventional DNN-based methods used for modeling speech parameter sequences. In feed-forward neural networks (FFNNs) with maximum likelihood parameter generation (MLPG), the MLPG reconstructs the temporal structure of the speech parameter sequences that FFNNs ignore, but it requires additional computational cost that grows with the sequence length. In recurrent neural networks, the recursive structure allows for the generation of speech parameter sequences that considers temporal structure without the MLPG, but it increases the computational cost compared to FFNNs. We propose a new approach that lets DNNs acquire parameters capturing the temporal structure by backpropagating the errors of multiple attributes of the temporal sequence via the loss function. This method enables FFNNs to generate speech parameter sequences that account for their temporal structure without the MLPG. We generated fundamental frequency sequences and mel-cepstrum sequences with our proposed method and conventional methods, and then synthesized and subjectively evaluated speech from these sequences. The proposed method enables even FFNNs that work on a frame-by-frame basis to generate speech parameter sequences that account for the temporal structure, and to generate sequences perceptually superior to those of the conventional methods.
-
Junya KOGUCHI, Shinnosuke TAKAMICHI, Masanori MORISE, Hiroshi SARUWATA ...
Article type: PAPER
Subject area: Speech and Hearing
2020 Volume E103.D Issue 12 Pages 2673-2681
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
We propose a speech analysis-synthesis and deep neural network (DNN)-based text-to-speech (TTS) synthesis framework using Gaussian mixture model (GMM)-based approximation of full-band spectral envelopes. GMMs have excellent properties as acoustic features in statistical parametric speech synthesis. Each Gaussian function of a GMM fits the local resonance of the spectrum. The GMM retains the fine spectral envelope and achieves high controllability of the structure. However, since conventional speech analysis methods (i.e., GMM parameter estimation) have been formulated for narrow-band speech, they degrade the quality of synthetic speech. Moreover, a DNN-based TTS synthesis method using GMM-based approximation has not been formulated, in spite of its excellent expressive ability. Therefore, we employ peak-picking-based initialization for full-band speech analysis to provide better initialization for the iterative estimation of the GMM parameters. We introduce not only the prediction error of the GMM parameters but also the reconstruction error of the spectral envelopes as objective criteria for training the DNN. Furthermore, we propose a method for multi-task learning based on minimizing these errors simultaneously. We also propose a post-filter based on variance scaling of the GMM for our framework to enhance synthetic speech. Experimental results from evaluating our framework indicate that 1) the initialization method of our framework outperforms the conventional one in the quality of analysis-synthesized speech; 2) introducing the reconstruction error in DNN training significantly improves the synthetic speech; 3) our variance-scaling-based post-filter further improves the synthetic speech.
-
Tomohiro TAKAHASHI, Katsumi KONISHI, Kazunori URUMA, Toshihiro FURUKAW ...
Article type: PAPER
Subject area: Image Processing and Video Processing
2020 Volume E103.D Issue 12 Pages 2682-2692
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
This paper proposes an image inpainting algorithm based on multiple linear models and matrix rank minimization. Several inpainting algorithms have previously been proposed based on the assumption that an image can be modeled by autoregressive (AR) models. However, these algorithms perform poorly when applied to natural photographs because they assume that an image is modeled by a position-invariant linear model with a fixed model order. To improve inpainting quality, this work introduces a multiple AR model and proposes an image inpainting algorithm based on multiple matrix rank minimization with sparse regularization. A practical algorithm is provided based on the iterative partial matrix shrinkage algorithm, and numerical examples show the effectiveness of the proposed algorithm.
-
Kangbo SUN, Jie ZHU
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2020 Volume E103.D Issue 12 Pages 2693-2700
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
The location and feature representation of an object's parts play key roles in fine-grained visual recognition. To improve the final recognition accuracy without any bounding-box/part annotations, many studies adopt object location networks to propose bounding boxes/part annotations using only category labels, and then crop the images into partial images to help the classification network make the final decision. In our work, to propose more informative partial images and effectively extract discriminative features from the original and partial images, we propose a two-stage approach that fuses the original features and partial features by evaluating and ranking the information content of partial images. Experimental results show that our proposed approach achieves excellent performance on two benchmark datasets, which demonstrates its effectiveness.
-
Manabu OKAWA
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2020 Volume E103.D Issue 12 Pages 2701-2708
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
In this paper, we propose a novel single-template strategy based on a mean template set and locally/globally weighted dynamic time warping (LG-DTW) to improve the performance of online signature verification. Specifically, in the enrollment phase, we implement a time series averaging method, Euclidean barycenter-based DTW barycenter averaging, to obtain a mean template set considering intra-user variability among reference samples. Then, we acquire a local weighting estimate considering a local stability sequence that is obtained analyzing multiple matching points of an optimal match between the mean template and reference sets. Thereafter, we derive a global weighting estimate based on the variable importance estimated by gradient boosting. Finally, in the verification phase, we apply both local and global weighting methods to acquire a discriminative LG-DTW distance between the mean template set and a query sample. Experimental results obtained on the public SVC2004 Task2 and MCYT-100 signature datasets confirm the effectiveness of the proposed method for online signature verification.
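For context, the unweighted DTW distance that LG-DTW extends is the classic dynamic program below (our plain illustration; the paper's local/global weighting of each matching point is precisely the part omitted here):

```python
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Plain DTW distance between two 1-D sequences (no weighting)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])            # local matching cost
            D[i, j] = cost + min(D[i - 1, j],          # insertion
                                 D[i, j - 1],          # deletion
                                 D[i - 1, j - 1])      # match
    return D[n, m]

print(dtw(np.array([0., 1., 2., 1.]), np.array([0., 1., 1., 2., 1.])))  # 0.0
```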
-
Shi QIU, German M. DANIEL, Katsuro INOUE
Article type: LETTER
Subject area: Software Engineering
2020 Volume E103.D Issue 12 Pages 2709-2712
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
For Free and Open Source Software (FOSS), identifying copyright notices is important. However, both the collaborative manner of FOSS project development and the large number of source files increase the difficulty of this task. In this paper, we aim to automatically identify the copyright notices in source files based on machine learning techniques. An evaluation experiment shows that our method outperforms FOSSology, the only existing method, which is based on regular expressions.
-
Guangyuan LIU, Daokun CHEN
Article type: LETTER
Subject area: Information Network
2020 Volume E103.D Issue 12 Pages 2713-2716
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
How to restore a virtual network after a substrate network failure (e.g., a link cut) is one of the key challenges of network virtualization. Traditional virtual network recovery (VNR) methods are mostly based on the idea of centralized control. However, if multiple virtual networks fail at the same time, their recovery processes are usually queued according to a specific priority, which may increase the average waiting time of users. In this letter, we study a distributed virtual network recovery (DVNR) method to improve virtual network recovery efficiency. We establish an exclusive virtual machine (VM) for each virtual network and process recovery requests of multiple virtual networks in parallel. Simulation results show that the proposed DVNR method obtains a recovery success rate close to that of the centralized VNR method while yielding ~70% less average recovery time.
-
Mengmeng LI, Xiaoguang REN, Yanzhen WANG, Wei QIN, Yi LIU
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2020 Volume E103.D Issue 12 Pages 2717-2720
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Feature selection is important for learning algorithms, and it is still an open problem. The antlion optimizer is an excellent nature-inspired method, but it does not work well for feature selection. This paper proposes a hybrid approach called the Ant-Antlion Optimizer, which combines the advantages of the antlion's smart behavior in the antlion optimizer and the ant's powerful searching movement in ant colony optimization. A mutation operator is also adopted to strengthen exploration ability. Comprehensive experiments on binary classification problems show that the proposed algorithm is superior to other state-of-the-art methods on four performance indicators.
-
Farzin MATIN, Yoosoo JEONG, Hanhoon PARK
Article type: LETTER
Subject area: Image Processing and Video Processing
2020 Volume E103.D Issue 12 Pages 2721-2724
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Multiscale retinex is one of the most popular image enhancement methods. However, its control parameters, such as the Gaussian kernel sizes, gain, and offset, must be tuned carefully according to the image contents. In this letter, we propose a new method that optimizes the parameters using particle swarm optimization and a multi-objective function. The method iteratively verifies the visual quality (i.e., brightness, contrast, and colorfulness) of the enhanced image using the multi-objective function while subtly adjusting the parameters. Experimental results show that the proposed method achieves better image quality, both qualitatively and quantitatively, than other image enhancement methods.
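The tuned parameters enter the multiscale retinex formula directly: the enhanced channel is gain * mean_i(log I - log(G_sigma_i * I)) + offset over the chosen Gaussian scales sigma_i. A single-channel sketch follows (ours; the particle-swarm loop that tunes the sigmas, gain, and offset is not shown):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def msr(img: np.ndarray, sigmas=(15, 80, 250), gain=1.0, offset=0.0) -> np.ndarray:
    """Multiscale retinex on a single channel in [0, 1].

    Each scale contributes log(I) - log(G_sigma * I); the scales are
    averaged, then gain/offset map the result back to display range.
    These are exactly the control parameters the letter optimizes."""
    img = img.astype(np.float64) + 1e-6                  # avoid log(0)
    r = np.zeros_like(img)
    for s in sigmas:
        r += np.log(img) - np.log(gaussian_filter(img, s) + 1e-6)
    r = gain * (r / len(sigmas)) + offset
    return np.clip((r - r.min()) / (r.max() - r.min() + 1e-12), 0.0, 1.0)

out = msr(np.random.rand(64, 64))
print(out.shape, float(out.min()), float(out.max()))
```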
-
Yubo LIU, Yangting LAI, Jianyong CHEN, Lingyu LIANG, Qiaoming DENG
Article type: LETTER
Subject area: Computer Graphics
2020 Volume E103.D Issue 12 Pages 2725-2729
Published: December 01, 2020
Released on J-STAGE: December 01, 2020
Computer-aided design (CAD) technology is widely used for architectural design, but current CAD tools still require high-level design specifications from humans. It would be significant to construct an intelligent CAD system allowing automatic architectural layout parsing (AutoALP), which generates candidate designs or predicts architectural attributes without much user intervention. To tackle these problems, many learning-based methods have been proposed, and benchmark datasets are one of the essential elements for data-driven AutoALP. This paper proposes a new dataset called SCUT-AutoALP for multi-paradigm applications. It contains two subsets: 1) Subset-I, for floor plan design, containing 300 residential floor plan images with layout, boundary, and attribute labels; 2) Subset-II, for urban plan design, containing 302 campus plan images with layout, boundary, and attribute labels. We analyzed the samples and labels statistically, and evaluated SCUT-AutoALP on different layout parsing tasks for floor plans/urban plans based on conditional generative adversarial network (cGAN) models. The results verify the effectiveness and indicate potential applications of SCUT-AutoALP. The dataset is available at https://github.com/designfuturelab702/SCUT-AutoALP-Database-Release.