IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E104.D, Issue 10
Special Section on Formal Approaches
  • Fuyuki ISHIKAWA
    2021 Volume E104.D Issue 10 Pages 1514
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS
  • Yoshinao ISOBE, Nobuhiko MIYAMOTO, Noriaki ANDO, Yutaka OIWA
    Article type: PAPER
    2021 Volume E104.D Issue 10 Pages 1515-1532
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In this paper, we demonstrate that a formal approach is effective for improving reliability of cooperative robot designs, where the control logics are expressed in concurrent FSMs (Finite State Machines), especially in accordance with the standard FSM4RTC (FSM for Robotic Technology Components), by a case study of cooperative transport robots. In the case study, FSMs are modeled in the formal specification language CSP (Communicating Sequential Processes) and checked by the model-checking tool FDR, where we show techniques for modeling and verification of cooperative robots implemented with the help of the RTM (Robotic Technology Middleware).

  • Ryoga NOGUCHI, Yoshikazu HANATANI, Kazuki YONEYAMA
    Article type: PAPER
    2021 Volume E104.D Issue 10 Pages 1533-1543
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Home Energy Management Systems (HEMS) contain devices of multiple manufacturers. Also, a large number of groups of devices must be managed according to several clustering situations. Hence, since it is necessary to establish a common secret group key among group members, the group key management scheme of IEEE 802.21 is used. However, no security verification result by formal methods is known. In this paper, we give the first formal verification result of secrecy and authenticity of the group key management scheme of IEEE 802.21 against insider and outsider attacks using ProVerif, which is an automatic verification tool for cryptographic protocols. As a result, we clarify that a spoofing attack by an insider and a replay attack by an outsider are found for the basic scheme, but these attacks can be prevented by using the scheme with the digital signature option.

Special Section on Picture Coding and Image Media Processing
  • Toshiaki FUJII
    2021 Volume E104.D Issue 10 Pages 1544
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS
  • Hideaki KIMATA, Xiaojun WU, Ryuichi TANIDA
    Article type: PAPER
    2021 Volume E104.D Issue 10 Pages 1545-1554
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    The need for real-time use of human dynamics data is increasing. The technical requirements for this include improved databases for handling a large amount of data as well as highly accurate sensing of people's movements. A bitmap index format has been proposed for high-speed processing of data that spreads in a two-dimensional space. Using the same format is expected to provide a service that searches queries, reads out desired data, visualizes it, and analyzes it. In this study, we propose a coding format that compresses human dynamics data to a target data size, in order to save storage as the volume of real-time human dynamics data keeps growing. In the proposed method, the spatial population distribution, which is expressed by a probability distribution, is approximated and compressed using the one-pixel one-byte data format normally used for image coding. We utilize two kinds of approximation, namely accuracy of probability and precision of spatial location, in order to control the data size and the amount of information. For accuracy of probability, we propose a non-linear mapping method for the spatial distribution, and for precision of spatial location, we propose spatial scalable layered coding to refine the mesh level of the spatial distribution. Also, in order to enable additional detailed analysis, we propose another scalable layered coding that improves the accuracy of the distribution. We demonstrate through experiments that the proposed data approximation and coding format achieve sufficient approximation of the spatial population distribution under the given target data size.

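As a rough illustration of the "accuracy of probability" approximation described in the abstract above, the sketch below maps a probability to a single byte on a logarithmic scale and back. The logarithmic curve, the function names, and the `p_min` floor are all assumptions made for this example; the abstract does not specify the paper's actual non-linear mapping.

```python
import math


def encode_probability(p, p_min=1e-6):
    """Map a probability in [0, 1] to one byte on a logarithmic scale.

    p_min is a hypothetical floor: anything at or below it becomes code 0.
    """
    if p <= p_min:
        return 0
    # log(p) runs from log(p_min) up to 0; rescale that range onto codes 1..255.
    frac = 1.0 - math.log(p) / math.log(p_min)
    return max(1, min(255, round(1 + 254 * frac)))


def decode_probability(code, p_min=1e-6):
    """Invert encode_probability up to quantization error."""
    if code == 0:
        return 0.0
    frac = (code - 1) / 254
    return p_min ** (1.0 - frac)
```

A logarithmic curve spends the 255 non-zero codes evenly across orders of magnitude, so sparsely populated cells keep roughly the same relative accuracy as dense ones. A linear mapping would instead round most small probabilities to zero.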
  • Chao WANG, Michihiko OKUYAMA, Ryo MATSUOKA, Takahiro OKABE
    Article type: PAPER
    2021 Volume E104.D Issue 10 Pages 1555-1562
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Water detection is important for machine vision applications such as visual inspection and robot motion planning. In this paper, we propose an approach to per-pixel water detection on unknown surfaces with a hyperspectral image. Our proposed method is based on the spectral characteristics of water: water is transparent for visible light but translucent/opaque for near-infrared light, and therefore the apparent near-infrared spectral reflectance of a surface is smaller than the original one when water is present on it. Specifically, we use a linear combination of a small number of basis vectors to approximate the spectral reflectance and estimate the original near-infrared reflectance from the visible reflectance (which does not depend on the presence or absence of water) to detect water. We conducted a number of experiments using real images and show that our method, which estimates near-infrared spectral reflectance based on the visible spectral reflectance, has better performance than existing techniques.

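The idea in the abstract above (fit basis coefficients on the visible bands, then predict what the near-infrared reflectance should be on a dry surface) can be sketched as follows. This is a minimal sketch that assumes exactly two basis vectors and a simple ratio test; the function names, the basis values, and `ratio_threshold` are hypothetical and are not taken from the paper.

```python
def fit_coefficients(basis_vis, refl_vis):
    """Least-squares fit of two basis coefficients to visible reflectance.

    basis_vis: list of (b1, b2) pairs, one per visible band.
    refl_vis:  observed reflectance for the same bands.
    """
    # Normal equations for a two-column design matrix.
    a11 = sum(b1 * b1 for b1, _ in basis_vis)
    a12 = sum(b1 * b2 for b1, b2 in basis_vis)
    a22 = sum(b2 * b2 for _, b2 in basis_vis)
    y1 = sum(b1 * r for (b1, _), r in zip(basis_vis, refl_vis))
    y2 = sum(b2 * r for (_, b2), r in zip(basis_vis, refl_vis))
    det = a11 * a22 - a12 * a12
    return (y1 * a22 - y2 * a12) / det, (a11 * y2 - a12 * y1) / det


def predict_nir(basis_nir, coeffs):
    """Extrapolate the dry-surface reflectance in the near-infrared bands."""
    c1, c2 = coeffs
    return [c1 * b1 + c2 * b2 for b1, b2 in basis_nir]


def looks_wet(observed_nir, predicted_nir, ratio_threshold=0.8):
    """Flag water when the observed NIR is clearly below the prediction."""
    return sum(observed_nir) < ratio_threshold * sum(predicted_nir)
```

Because the visible bands are unaffected by water, the fitted coefficients describe the underlying surface, and any shortfall in the observed near-infrared signal relative to the prediction is attributed to water.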
  • Kazuki KASAI, Kaoru KAWAKITA, Akira KUBOTA, Hiroki TSURUSAKI, Ryosuke ...
    Article type: PAPER
    2021 Volume E104.D Issue 10 Pages 1563-1571
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In this paper, we present an efficient and robust method for estimating the homography matrix for soccer field registration between a captured camera image and a soccer field model. The presented method first detects reliable field lines from the camera image through clustering. Constructing a novel directional feature of the intersection points of the lines in both the camera image and the model, the presented method then finds matching pairs of these points between the image and the model. Finally, homography matrix estimation and validation are performed using the obtained matching pairs, which can reduce the required number of homography matrix calculations. Our presented method also uses possible intersection points outside the image for the point matching. This effectively improves the robustness and accuracy of homography estimation, as demonstrated by the experimental results.

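The abstract above relies on intersection points of field lines, including points that fall outside the image. One standard way to compute such points is with homogeneous coordinates, sketched below. The function names are illustrative, and this covers only the intersection step, not the paper's directional feature or matching procedure.

```python
def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through points p and q."""
    (x1, y1), (x2, y2) = p, q
    # Cross product of (x1, y1, 1) and (x2, y2, 1).
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersection(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if w == 0:
        return None
    # Cross product of the two lines, divided by its third component.
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)
```

Two segments detected inside a 100-pixel-wide frame, such as (0, 0)-(100, 10) and (0, 50)-(100, 45), meet far to the right of the frame. The computation never clips to the image bounds, so such a point is still available as a matching candidate.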
  • Yuya KAMATAKI, Yusuke KAMEDA, Yasuyo KITA, Ichiro MATSUDA, Susumu ITOH
    Article type: LETTER
    2021 Volume E104.D Issue 10 Pages 1572-1575
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    This paper proposes a lossless coding method for HDR color images stored in a floating point format called Radiance RGBE. In this method, three mantissa parts and a common exponent part, each of which is represented in 8-bit depth, are encoded using the block-adaptive prediction technique with some modifications considering the data structure.

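For readers unfamiliar with the data structure mentioned in the abstract above, the sketch below shows the conventional conversion between floating-point RGB and the Radiance RGBE layout (three 8-bit mantissas sharing one 8-bit exponent). It illustrates only the pixel format, not the proposed lossless coder, and it assumes non-negative input values; the function names are chosen for this example.

```python
import math


def float_to_rgbe(r, g, b):
    """Pack non-negative float RGB into three 8-bit mantissas and a shared exponent."""
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)
    # v = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(v)
    scale = mantissa * 256.0 / v
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)


def rgbe_to_float(rm, gm, bm, e):
    """Unpack an RGBE pixel back to float RGB."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - (128 + 8))
    return (rm * f, gm * f, bm * f)
```

The brightest channel determines the shared exponent, so the two dimmer channels lose precision relative to it. The four bytes also have different statistics, which is presumably why the abstract mentions modifications that take the data structure into account.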
  • Takayuki HATTORI, Kohei INOUE, Kenji HARA
    Article type: LETTER
    2021 Volume E104.D Issue 10 Pages 1576-1579
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    We propose a generalization of the rolling guidance filter (RGF) to a similarity-based clustering (SBC) algorithm which can handle general vector data. The proposed RGF-based SBC algorithm makes the similarities between data clearer than the original similarity values computed from the original data. On the basis of the similarity values, we assign cluster labels to data by an SBC algorithm. Experimental results show that the proposed algorithm achieves better clustering results than a naive application of the SBC algorithm to the original similarity values. Additionally, we study the convergence of a unimodal vector dataset to its mean vector.

Regular Section
  • Hongwei YANG, Fucheng XUE, Dan LIU, Li LI, Jiahui FENG
    Article type: PAPER
    Subject area: Computer System
    2021 Volume E104.D Issue 10 Pages 1580-1591
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Service composition optimization is a classic NP-hard problem. How to quickly select high-quality services that meet user needs from a large number of candidate services is a hot topic in cloud service composition research. In this study, an efficient second-order beetle swarm optimization with a global search ability is proposed to solve the problem of cloud service composition optimization. First, the beetle antennae search algorithm is introduced into the modified particle swarm optimization algorithm, the population is initialized using a chaotic sequence, and modified nonlinear dynamic trigonometric learning factors are adopted to control the expanding capacity of particles and the global convergence capability. Second, modified secondary oscillation factors are incorporated, increasing the search precision and global searching ability of the algorithm. An adaptive step adjustment is utilized to improve the stability of the algorithm. Experimental results based on a real data set indicate that the proposed global optimization algorithm can solve web service composition optimization problems in a cloud environment. It exhibits excellent global searching ability, comparatively fast convergence, and favorable stability, and it requires less time.

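The abstract above mentions initializing the population with a chaotic sequence. A common choice for this in swarm optimization is the logistic map, sketched below. The abstract does not say which chaotic map the authors use, so the logistic map, the seed `x0`, and the function names are assumptions made for illustration.

```python
def logistic_sequence(x0, length, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), which stays in [0, 1] for r = 4."""
    xs, x = [], x0
    for _ in range(length):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_population(size, dim, lower, upper, x0=0.37):
    """Spread `size` particles of dimension `dim` over [lower, upper].

    x0 must avoid the map's fixed and pre-fixed points (0, 0.25, 0.5, 0.75, 1),
    otherwise the sequence collapses to a constant.
    """
    seq = logistic_sequence(x0, size * dim)
    return [
        [lower + (upper - lower) * seq[i * dim + d] for d in range(dim)]
        for i in range(size)
    ]
```

Unlike a pseudo-random generator, the sequence is fully determined by `x0`, so runs are repeatable while the initial particles still cover the search range irregularly.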
  • Yosuke MUKASA, Tomoya WAKAIZUMI, Shu TANAKA, Nozomu TOGAWA
    Article type: PAPER
    Subject area: Computer System
    2021 Volume E104.D Issue 10 Pages 1592-1600
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In an amusement park, an attraction-visiting route considering the waiting time and traveling time improves visitors' satisfaction and experience. To solve the problem, we focus on Ising machines, which are expected to solve combinatorial optimization problems at high speed by mapping the problems to Ising models or quadratic unconstrained binary optimization (QUBO) models. We propose a mapping of the visiting-route recommendation problem in amusement parks to a QUBO model for solving it using Ising machines. By using an actual Ising machine, we could obtain feasible solutions one order of magnitude faster with almost the same accuracy as the simulated annealing method for the visiting-route recommendation problem.

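To make the notion of a QUBO mapping in the abstract above concrete, the sketch below encodes a toy decision: pick exactly one attraction to visit next at minimum waiting cost. The one-hot constraint is folded into the objective as a penalty term. This is not the paper's formulation, which covers whole routes with waiting and traveling times. The cost values and the penalty weight are hypothetical, and an exhaustive search stands in for the Ising machine.

```python
from itertools import product


def build_qubo(costs, penalty):
    """QUBO for: minimize sum(c_i * x_i) + penalty * (sum(x_i) - 1)**2.

    Expanding the square with x_i**2 == x_i gives diagonal terms (c_i - penalty)
    and off-diagonal terms 2 * penalty; the constant +penalty is dropped.
    """
    n = len(costs)
    q = {}
    for i in range(n):
        q[(i, i)] = costs[i] - penalty
        for j in range(i + 1, n):
            q[(i, j)] = 2 * penalty
    return q


def energy(q, x):
    """Evaluate the QUBO objective for a binary assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force_minimum(q, n):
    """Exhaustive search over all 2**n assignments (only viable for tiny n)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, x))
```

With costs `[30, 10, 20]` and a penalty of 100, the minimum-energy assignment is `(0, 1, 0)`. The penalty has to outweigh the largest cost so that choosing nothing, or choosing two attractions at once, is never cheaper than a valid choice.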
  • Natthawute SAE-LIM, Shinpei HAYASHI, Motoshi SAEKI
    Article type: PAPER
    Subject area: Software Engineering
    2021 Volume E104.D Issue 10 Pages 1601-1615
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Code smells can be detected using tools such as a static analyzer that detects code smells based on source code metrics. Developers perform refactoring activities based on the result of such detection tools to improve source code quality. However, such an approach can be considered as reactive refactoring, i.e., developers react to code smells after they occur. This means that developers first suffer the effects of low-quality source code before they start solving code smells. In this study, we focus on proactive refactoring, i.e., refactoring source code before it becomes smelly. This approach would allow developers to maintain source code quality without having to suffer the impact of code smells. To support the proactive refactoring process, we propose a technique to detect decaying modules, which are non-smelly modules that are about to become smelly. We present empirical studies on open source projects with the aim of studying the characteristics of decaying modules. Additionally, to facilitate developers in the refactoring planning process, we perform a study on using a machine learning technique to predict decaying modules and report a factor that contributes most to the performance of the model under consideration.

  • Satoshi FUJITA
    Article type: PAPER
    Subject area: Information Network
    2021 Volume E104.D Issue 10 Pages 1616-1623
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    To realize information-centric networking, IPFS (InterPlanetary File System) generates a unique ContentID for each content by applying a cryptographic hash to the content itself. Although this could improve security against attacks such as falsification, it makes it difficult to realize a similarity search in the framework of IPFS, since the similarity of contents is not reflected in the proximity of ContentIDs. To overcome this issue, we propose a method that applies a locality sensitive hash (LSH) to feature vectors extracted from contents and uses the result as the key of indexes stored in IPFS. By conducting experiments with 10,000 random points corresponding to stored contents, we found that more than half of randomly given queries return a non-empty result for the similarity search, and yield an accurate result which is outside the σ confidence interval of an ordinary flooding-based method. Note that such a collection of random points corresponds to the worst case scenario for the proposed scheme, since the performance of similarity search could improve when points and queries follow an uneven distribution.

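The contrast drawn in the abstract above (a cryptographic hash scatters similar contents, whereas a locality sensitive hash keeps them together) can be illustrated with random-hyperplane LSH, one well-known LSH family. The abstract does not state which LSH family the paper uses, so this choice, the function names, and the parameters are assumptions.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplane normals; the seed must be shared by all peers."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return "".join(
        "1" if sum(h * v for h, v in zip(plane, vector)) >= 0 else "0"
        for plane in hyperplanes
    )
```

Vectors separated by a small angle agree on most bits, so they tend to land under the same or nearby index keys. With a cryptographic hash, even a one-bit change in the content yields an unrelated ContentID.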
  • Satoshi FUJITA
    Article type: PAPER
    Subject area: Information Network
    2021 Volume E104.D Issue 10 Pages 1624-1631
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In this paper, we propose a method to enhance the download efficiency of BitTorrent protocol with the notion of structures in the set of pieces generated from a shared file and the swarm of peers downloading the same shared file. More specifically, as for the set of pieces, we introduce the notion of super-pieces called clusters, which is aimed to enlarge the granularity of the management of request-and-reply of pieces, and as for the swarm of peers, we organize a clique consisting of several peers with similar upload capacity, to improve the smoothness of the flow of pieces associated with a cluster. As is shown in the simulation results, the proposed extensions significantly reduce the download time of the first 75% of the downloaders, and thereby improve the performance of P2P-assisted video streaming such as Akamai NetSession and BitTorrent DNA.

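The two structures named in the abstract above, clusters of pieces and cliques of peers with similar upload capacity, can be pictured with the small sketch below. The abstract does not describe how the paper forms either structure, so fixed-size contiguous clusters and sort-then-chunk cliques are assumptions, as are all function names.

```python
def group_into_clusters(num_pieces, cluster_size):
    """Partition piece indices 0..num_pieces-1 into contiguous super-pieces."""
    return [
        list(range(start, min(start + cluster_size, num_pieces)))
        for start in range(0, num_pieces, cluster_size)
    ]


def missing_clusters(have, clusters):
    """Indices of clusters with at least one piece not yet downloaded."""
    return [i for i, c in enumerate(clusters) if any(p not in have for p in c)]


def form_cliques(upload_capacity, clique_size):
    """Group peers so that members of a clique have similar upload capacity."""
    ranked = sorted(upload_capacity, key=upload_capacity.get)
    return [ranked[i:i + clique_size] for i in range(0, len(ranked), clique_size)]
```

Requesting whole clusters means fewer, larger request-and-reply exchanges, and matching peers by capacity keeps a slow uploader from stalling the others in its clique.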
  • HongYuan CAO, Tsuyoshi KATO
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2021 Volume E104.D Issue 10 Pages 1632-1639
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Contamination of water resources with pathogenic microorganisms excreted in human feces is a worldwide public health concern. Surveillance of fecal contamination is commonly performed by routine monitoring for a single type or a few types of microorganism(s). To design a feasible routine for periodic monitoring and to control risks of exposure to pathogens, reliable statistical algorithms for inferring correlations between concentrations of microorganisms in water need to be established. Moreover, because pathogens are often present in low concentrations, some contaminations are likely to be under a detection limit. This yields a pairwise left-censored dataset and complicates computation of correlation coefficients. Errors of correlation estimation can be smaller if undetected values are imputed better. To obtain better imputations, we utilize side information and develop a new technique, the asymmetric Tobit model which is an extension of the Tobit model so that domain knowledge can be exploited effectively when fitting the model to a censored dataset. The empirical results demonstrate that imputation with domain knowledge is effective for this task.

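As background for the abstract above, the sketch below evaluates the log-likelihood of the standard Tobit model for left-censored data. A detected value contributes a normal density term, while a reading below the detection limit contributes the probability mass below that limit. This is the textbook model only; the paper's asymmetric extension is not reproduced here. Marking censored readings with `None` is a convention chosen for this example.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(values, limit, mu, sigma):
    """Log-likelihood under N(mu, sigma**2) with left-censoring at `limit`.

    values: observations, with None marking a reading below the detection limit.
    """
    total = 0.0
    for y in values:
        if y is None:
            # Censored: only know that the true value lies below the limit.
            total += math.log(normal_cdf((limit - mu) / sigma))
        else:
            total += normal_logpdf((y - mu) / sigma) - math.log(sigma)
    return total
```

Maximizing this function over `mu` and `sigma` uses the undetected samples as partial evidence instead of discarding them or substituting an arbitrary value.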
  • Tengfei SHAO, Yuya IEIRI, Reiko HISHIYAMA
    Article type: PAPER
    Subject area: Office Information Systems, e-Business Modeling
    2021 Volume E104.D Issue 10 Pages 1640-1650
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Tourist satisfaction plays a very important role in the development of local community tourism. For the development of tourist destinations in local communities, it is important to measure, maintain, and improve tourist destination loyalty over the medium to long term. It has been proven that improving tourist satisfaction is a major factor in improving tourist destination loyalty. Therefore, to improve tourist satisfaction in local communities, we identified multiple clusters of sightseeing spots and determined that the satisfaction of tourists can be increased based on these clusters of sightseeing spots. Our discovery flow can be summarized as follows. First, we extracted tourism keywords from guidebooks on sightseeing spots. We then constructed a complex network of tourists and sightseeing spots based on the data collected from experiments conducted in Kyoto. Next, we added the corresponding tourism keywords to each sightseeing spot. Finally, by analyzing network motifs, we successfully discovered multiple clusters of sightseeing spots that could be used to improve tourist satisfaction.

  • Huiling LI, Cong LIU, Qingtian ZENG, Hua HE, Chongguang REN, Lei WANG, ...
    Article type: PAPER
    Subject area: Office Information Systems, e-Business Modeling
    2021 Volume E104.D Issue 10 Pages 1651-1660
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Effective emergency resource allocation is essential to guarantee a successful emergency disposal, and it has become a research focus in the area of emergency management. Emergency event logs are accumulated in modern emergency management systems and can be analyzed to support effective resource allocation. This paper proposes a novel approach for efficient emergency resource allocation by mining emergency event logs. More specifically, an emergency event log with various attributes, e.g., emergency task name, emergency resource type (reusable and consumable ones), required resource amount, and timestamps, is first formalized. Then, a novel algorithm is presented to discover emergency response process models, represented as an extension of Petri net with resource and time elements, from emergency event logs. Next, based on the discovered emergency response process models, the minimum resource requirements for both reusable and consumable resources are obtained, and two resource allocation strategies, i.e., the Shortest Execution Time (SET) strategy and the Least Resource Consumption (LRC) strategy, are proposed to support efficient emergency resource allocation decision-making. Finally, a chlorine tank explosion emergency case study is used to demonstrate the applicability and effectiveness of the proposed resource allocation approach.

  • Sahoko NAKAYAMA, Andros TJANDRA, Sakriani SAKTI, Satoshi NAKAMURA
    Article type: PAPER
    Subject area: Speech and Hearing
    2021 Volume E104.D Issue 10 Pages 1661-1677
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    The phenomenon where a speaker mixes two or more languages within the same conversation is called code-switching (CS). Handling CS is challenging for automatic speech recognition (ASR) and text-to-speech (TTS) because it requires coping with multilingual input. Although CS text or speech may be found in social media, the datasets of CS speech and corresponding CS transcriptions are hard to obtain even though they are required for supervised training. This work adopts a deep learning-based machine speech chain to train CS ASR and CS TTS with each other with semisupervised learning. After supervised learning with monolingual data, the machine speech chain is then carried out with unsupervised learning of either the CS text or speech. The results show that the machine speech chain trains ASR and TTS together and improves performance without requiring the pair of CS speech and corresponding CS text. We also integrate language embedding and language identification into the CS machine speech chain in order to handle CS better by giving language information. We demonstrate that our proposed approach can improve the performance on both a single CS language pair and multiple CS language pairs, including the unknown CS excluded from training data.

  • Ruochen LIAO, Kousuke MORIWAKI, Yasushi MAKIHARA, Daigo MURAMATSU, Nor ...
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2021 Volume E104.D Issue 10 Pages 1678-1690
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In this study, we propose a method to estimate body composition-related health indicators (e.g., ratio of body fat, body water, and muscle, etc.) using video-based gait analysis. This method is more efficient than individual measurement using a conventional body composition meter. Specifically, we designed a deep-learning framework with a convolutional neural network (CNN), where the input is a gait energy image (GEI) and the output consists of the health indicators. Although a vast amount of training data is typically required to train network parameters, it is unfeasible to collect sufficient ground-truth data, i.e., pairs consisting of the gait video and the health indicators measured using a body composition meter for each subject. We therefore use a two-step approach to exploit an auxiliary gait dataset that contains a large number of subjects but lacks the ground-truth health indicators. At the first step, we pre-train a backbone network using the auxiliary dataset to output gait primitives such as arm swing, stride, the degree of stoop, and the body width — considered to be relevant to the health indicators. At the second step, we add some layers to the backbone network and fine-tune the entire network to output the health indicators even with a limited number of ground-truth data points of the health indicators. Experimental results show that the proposed method outperforms the other methods when training from scratch as well as when using an auto-encoder-based pre-training and fine-tuning approach; it achieves relatively high estimation accuracy for the body composition-related health indicators except for body fat-relevant ones.

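The network input named in the abstract above, the gait energy image (GEI), is conventionally the pixel-wise average of size-normalized, aligned binary silhouettes over a gait cycle. The sketch below shows that averaging step only. It assumes the silhouettes are already extracted and aligned, and the nested-list representation is chosen to keep the example dependency-free.

```python
def gait_energy_image(silhouettes):
    """Average binary silhouettes (lists of rows of 0/1) pixel by pixel."""
    n = len(silhouettes)
    height = len(silhouettes[0])
    width = len(silhouettes[0][0])
    return [
        [sum(frame[y][x] for frame in silhouettes) / n for x in range(width)]
        for y in range(height)
    ]
```

Pixels that belong to the body in every frame, such as the torso, average to 1, while pixels swept by the arms and legs take intermediate values. A single image therefore captures both body shape and motion.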
  • Takahisa YAMAMOTO, Shiki TAKEUCHI, Atsushi NAKAZAWA
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2021 Volume E104.D Issue 10 Pages 1691-1701
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Visual sentiment analysis has a lot of applications, including image captioning, opinion mining, and advertisement; however, it is still a difficult problem and existing algorithms cannot produce satisfactory results. One of the difficulties in classifying images into emotions is that visual sentiments are evoked by different types of information: visual information, which includes colors or textures, and semantic information, which includes types of objects evoking emotions and/or their combinations. In contrast to the existing methods that use only visual information, this paper presents a novel algorithm for image emotion recognition that uses both types of information simultaneously. For semantic features, we introduce an object vector and a word vector. The object vector is created by an object detection method and reflects existing objects in an image. The word vector is created by transforming the names of detected objects through a word embedding model. This vector will be similar among objects that are semantically similar. These semantic features and a visual feature made by a fine-tuned convolutional neural network (CNN) are concatenated. We perform the classification using the concatenated feature vector. Extensive evaluation experiments using emotional image datasets show that our method achieves the best accuracy except for one dataset against other existing methods. The improvement in accuracy of our method from existing methods is 4.54% at the highest.

  • Ying KANG, Cong LIU, Ning WANG, Dianxi SHI, Ning ZHOU, Mengmeng LI, Yu ...
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2021 Volume E104.D Issue 10 Pages 1702-1711
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Siamese visual tracking, viewed as a problem of max-similarity matching to the target template, has attracted increasing attention in computer vision. However, current Siamese trackers struggle to balance accuracy in real-time tracking with robustness in long-term tracking. This work proposes a new Siamese-based tracker with a dual-pipeline correlated fusion network (named ADF-SiamRPN), which combines an initial template for robust correlation with a transient template that adaptively selects optimal features for accurate correlation. A learnable correlation-response fusion network is then applied to improve the overall tracking performance. To compare the performance of ADF-SiamRPN with state-of-the-art trackers, we conduct extensive experiments on benchmarks such as OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT and TrackingNet. The experimental results demonstrate that ADF-SiamRPN outperforms all the compared trackers and achieves the best balance between accuracy and robustness.

  • Shu JIANG, Rui WANG, Zuchao LI, Masao UTIYAMA, Kehai CHEN, Eiichiro SU ...
    Article type: PAPER
    Subject area: Natural Language Processing
    2021 Volume E104.D Issue 10 Pages 1712-1723
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Standard neural machine translation (NMT) rests on the assumption that sentences are independent of their document-level context. Most existing document-level NMT approaches settle for a coarse sense of global document-level information, while this work focuses on exploiting detailed document-level context by means of a memory network. The ability of a memory network to retrieve from memory the parts most relevant to the current sentence makes it a natural way to model rich document-level context. In this work, the proposed document-aware memory network is implemented to enhance the Transformer NMT baseline. Experiments on several tasks show that the proposed method significantly improves the NMT performance over strong Transformer baselines and other related studies.

  • Zhengjie LI, Jiabao GAO, Jinmei LAI
    Article type: PAPER
    Subject area: Biocybernetics, Neurocomputing
    2021 Volume E104.D Issue 10 Pages 1724-1733
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In recent years, FPGAs have become popular for CNN acceleration, and many CNN-to-FPGA toolchains have been proposed to deploy CNNs on FPGAs quickly. However, with these toolchains, updating the CNN means regenerating the RTL code and re-implementing the design, which is time-consuming and may run into timing-closure problems. We therefore propose HBDCA, a toolchain and corresponding accelerator. The CNN on HBDCA is defined by the content of BRAM. The toolchain integrates the UpdateMEM utility of Xilinx, which updates the content of BRAM without re-synthesis or re-implementation. The toolchain also integrates TensorFlow Lite, which provides high-accuracy quantization. HBDCA supports 8-bit per-channel quantization of weights and 8-bit per-layer quantization of activations. Upgrading the CNN on the accelerator means that the kernel size of the CNN may change. The flexible structure of HBDCA supports kernel-level parallelism with three different sizes (3×3, 5×5, 7×7). HBDCA implements four types of parallelism in the convolution layer and two types of parallelism in the fully-connected layer. To reduce the number of memory accesses, both spatial and temporal data-reuse techniques are applied to the convolution and fully-connected layers. In particular, temporal reuse is adopted at both the row and column levels of an Input Feature Map of the convolution layer, so data can be read just once from BRAM and reused in the following clock cycle. Experiments show that, by updating the BRAM content with a single UpdateMEM command, three CNNs with different kernel sizes (3×3, 5×5, 7×7) are implemented on HBDCA. Compared with the traditional design flow, UpdateMEM reduces development time by 7.6X-9.1X across different synthesis and implementation strategies. Compared with similar CNNs created by toolchains, HBDCA has lower latency (9.97µs-50.73µs) and eliminates re-implementation when the CNN is updated. Compared with similar CNNs created by dedicated designs, HBDCA also has the lowest latency (9.97µs), the highest accuracy (99.14%), and the lowest power (1.391W). Compared with different CNNs created by similar toolchains that eliminate the re-implementation process, HBDCA achieves a higher speedup of 120.28X.

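The per-channel weight quantization mentioned in the abstract above can be illustrated with a symmetric 8-bit scheme: each output channel gets its own scale, derived from its largest absolute weight, and values are rounded into the range [-127, 127]. This is a generic sketch of the technique, not code from HBDCA or TensorFlow Lite; the function names and the flat-list weight layout are assumptions.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per channel.

    weights: list of channels, each a flat list of floats.
    Returns (quantized integers, per-channel scales).
    """
    quantized, scales = [], []
    for channel in weights:
        peak = max(abs(w) for w in channel)
        # An all-zero channel gets a dummy scale to avoid dividing by zero.
        scale = peak / 127.0 if peak > 0 else 1.0
        scales.append(scale)
        quantized.append([max(-127, min(127, round(w / scale))) for w in channel])
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate float weights from integers and scales."""
    return [[q * s for q in channel] for channel, s in zip(quantized, scales)]
```

Giving each channel its own scale matters when channels differ widely in magnitude. With one scale for the whole layer, a channel of small weights would collapse onto a handful of integer levels.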
  • Motohiro SUNOUCHI, Masaharu YOSHIOKA
    Article type: PAPER
    Subject area: Music Information Processing
    2021 Volume E104.D Issue 10 Pages 1734-1748
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    This paper proposes new acoustic feature signatures based on the multiscale fractal dimension (MFD), which are robust against the diversity of environmental sounds, for the content-based similarity search. The diversity of sound sources and acoustic compositions is a typical feature of environmental sounds. Several acoustic features have been proposed for environmental sounds. Among them is the widely-used Mel-Frequency Cepstral Coefficients (MFCCs), which describes frequency-domain features. However, in addition to these features in the frequency domain, environmental sounds have other important features in the time domain with various time scales. In our previous paper, we proposed enhanced multiscale fractal dimension signature (EMFD) for environmental sounds. This paper extends EMFD by using the kernel density estimation method, which results in better performance of the similarity search tasks. Furthermore, it newly proposes another acoustic feature signature based on MFD, namely very-long-range multiscale fractal dimension signature (MFD-VL). The MFD-VL signature describes several features of the time-varying envelope for long periods of time. The MFD-VL signature has stability and robustness against background noise and small fluctuations in the parameters of sound sources, which are produced in field recordings. We discuss the effectiveness of these signatures in the similarity sound search by comparing with acoustic features proposed in the DCASE 2018 challenges. Due to the unique descriptiveness of our proposed signatures, we confirmed the signatures are effective when they are used with other acoustic features.

    Download PDF (3323K)
  • Jiao GUAN, Jueping CAI, Ruilian XIE, Yequn WANG, Jinzhi LAI
    Article type: LETTER
    Subject area: Computer System
    2021 Volume E104.D Issue 10 Pages 1749-1752
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    This letter presents an oblivious and load-balanced routing (OLBR) method without virtual channels for 2D mesh Network-on-Chip (NoC). To balance the traffic load of the network and avoid deadlock, OLBR divides the network nodes into two regions. One region contains the nodes on the east and west sides of the NoC, in which packets are routed by the odd-even turn rule with Y direction preference (OE-YX). The remaining nodes form the other region, in which packets are routed by the odd-even turn rule with alterable priority arbitration (OE-APA). Simulation results show that OLBR can improve saturation throughput over related works by 11.73% and that OLBR balances the traffic load over the entire network.

    Download PDF (439K)
  • Jun MENG, Gangyi DING, Laiyang LIU
    Article type: LETTER
    Subject area: Data Engineering, Web Information Systems
    2021 Volume E104.D Issue 10 Pages 1753-1757
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In view of the different spatial and temporal resolutions of observed multi-source heterogeneous carbon dioxide data and the uncertain quality of observations, a data fusion prediction model for observed multi-scale carbon dioxide concentration data is studied. First, a wireless carbon sensor network is created, the gross error data in the original dataset are eliminated, and the remaining valid data are combined using the kriging method to generate a series of continuous surfaces that express specific features and provide unified, spatio-temporally normalized data for subsequent prediction models. Then, a long short-term memory network is used to process these continuous time- and space-normalized data to obtain a carbon dioxide concentration prediction model at any scale. Finally, the experimental results illustrate that the proposed method with spatio-temporal features is more accurate than the single-sensor monitoring method without spatio-temporal features.

    Download PDF (1875K)
  • Hao ZHOU, Zhuangzhuang ZHANG, Yun LIU, Meiyan XUAN, Weiwei JIANG, Hail ...
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2021 Volume E104.D Issue 10 Pages 1758-1761
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    The single-image dehazing algorithm based on the Dark Channel Prior (DCP) is widely known, and more and more image dehazing algorithms based on DCP have been proposed. However, we found that it is more effective to apply DCP to RAW images before the ISP pipeline. In addition, to address the failure of DCP in the sky area, we propose an algorithm that segments the sky region and compensates the transmission. Extensive experimental results on both subjective and objective evaluation demonstrate that the performance of the modified DCP (MDCP) is greatly improved and that it is competitive with state-of-the-art methods.

    Download PDF (530K)
  • Jae-Won KIM, Hochong PARK
    Article type: LETTER
    Subject area: Speech and Hearing
    2021 Volume E104.D Issue 10 Pages 1762-1765
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    We propose a new method for improving the recognition performance of phonemes, speech emotions, and music genres using multi-task learning. When tasks are closely related, multi-task learning can improve the performance of each task by learning a common feature representation for all the tasks. However, the recognition tasks considered in this study demand different input signals of speech and music at different time scales, resulting in input features with different characteristics. In addition, a training dataset with multiple labels for all information sources is not available. Considering these issues, we conduct multi-task learning in a sequential training process using input features with a single label for one information source. A comparative evaluation confirms that the proposed method for multi-task learning provides higher performance for all recognition tasks than individual learning for each task, as in conventional methods.

    Download PDF (663K)
  • Lili WEI, Zhenglong YANG, Zhenming WANG, Guozhong WANG
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2021 Volume E104.D Issue 10 Pages 1766-1769
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Since HEVC intra rate control has no prior information to rely on for coding, it is difficult to obtain the optimal λ for every coding tree unit (CTU). In this paper, a convolutional neural network (CNN) based intra rate control is proposed. Firstly, a CNN with two final output channels is used to predict the key parameters of the CTU R-λ curve. To train the CNN well, a combined loss function is built and the balance factor γ is explored to achieve the minimum loss. Secondly, the initial CTU λ is calculated from the predicted results of the CNN and the allocated bits per pixel (bpp). According to the rate distortion optimization (RDO) of a frame, a spatial equation is derived between the CTU λ and the frame λ. Lastly, the CTU clipping function is used to obtain the optimal CTU λ for the intra rate control. The experimental results show that the proposed algorithm improves the intra rate control performance significantly with good rate control accuracy.

    Download PDF (445K)
  • Daming LIN, Jie WANG, Yundong LI
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2021 Volume E104.D Issue 10 Pages 1770-1774
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Rapid building damage identification plays a vital role in rescue operations when disasters strike, especially when rescue resources are limited. In the past years, supervised machine learning has made considerable progress in building damage identification. However, the use of supervised machine learning remains challenging for two reasons: 1) the massive number of samples from the current damage imagery is difficult to label and thus cannot satisfy the training requirement of deep learning, and 2) the similarity between partially damaged and undamaged buildings is high, hindering accurate classification. Leveraging the abundant samples of auxiliary domains, domain adaptation aims to transfer a classifier trained on historical damage imagery to the current task. However, traditional domain adaptation approaches do not fully consider the category-specific information during feature adaptation, which might cause negative transfer. To address this issue, we propose a novel domain adaptation framework that individually aligns each category of the target domain to that of the source domain. Our method combines the variational autoencoder (VAE) and the Gaussian mixture model (GMM). First, the GMM is established to characterize the distribution of the source domain. Then, the VAE is constructed to extract the features of the target domain. Finally, the Kullback-Leibler (KL) divergence is minimized to force the features of the target domain to follow the GMM of the source domain. Two damage detection tasks using post-earthquake and post-hurricane imagery are used to verify the effectiveness of our method. Experiments show that the proposed method obtains improvements of 4.4% and 9.5%, respectively, compared with the conventional method.

    Download PDF (824K)
  • Rui SUN, Qili LIANG, Zi YANG, Zhenghui ZHAO, Xudong ZHANG
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2021 Volume E104.D Issue 10 Pages 1775-1779
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian and to reduce the weight of incomplete pedestrian images, alleviating the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.

    Download PDF (1934K)
  • Enze YANG, Shuoyan LIU, Yuxin LIU, Kai FANG
    Article type: LETTER
    Subject area: Biocybernetics, Neurocomputing
    2021 Volume E104.D Issue 10 Pages 1780-1783
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Crowd flow prediction in high-density urban scenes is involved in a wide range of intelligent transportation and smart city applications, and it has become a significant topic in urban computing. In this letter, a CNN-based framework called Pyramidal Spatio-Temporal Network (PSTNet) for crowd flow prediction is proposed. Spatial encoding is employed for the spatial representation of external factors, while a prior pyramid enhances the feature dependence of spatial scale distances and temporal spans. After that, a post pyramid is proposed to fuse the heterogeneous spatio-temporal features of multiple scales. Experimental results based on TaxiBJ and MobileBJ demonstrate that the proposed PSTNet outperforms the state-of-the-art methods.

    Download PDF (522K)
  • Song CHENG, Zixuan LI, Yongsen WANG, Wanbing ZOU, Yumei ZHOU, Delong S ...
    Article type: LETTER
    Subject area: Biocybernetics, Neurocomputing
    2021 Volume E104.D Issue 10 Pages 1784-1788
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    Binary neural networks (BNNs), where both activations and weights are radically quantized to {-1, +1}, can massively accelerate the run-time performance of convolutional neural networks (CNNs) for edge devices by reducing computational complexity and saving memory footprint. However, the non-differentiable binarizing function used in BNNs makes the binarized models hard to optimize and introduces significant performance degradation relative to the full-precision models. Many previous works have tried to correct the backward gradient of the binarizing function with various improved versions of straight-through estimation (STE), or with a gradual approximation approach, but the gradient suppression problem was not analyzed or handled. Thus, we propose a novel gradient corrected approximation (GCA) method to reduce the discrepancy between the binarizing function and the backward gradient in a gradual and stable way. Our work has two primary contributions. The first is to approximate the backward gradient of the binarizing function using a simple leaky-steep function with variable window size. The second is to correct the gradient approximation by standardizing the backward gradient propagated through the binarizing function. Experimental results show that the proposed method outperforms the baseline by 1.5% Top-1 accuracy on the ImageNet dataset without introducing extra computation cost.

    Download PDF (1650K)
  • Kaiyu WANG, Sichen TAO, Rong-Long WANG, Yuki TODO, Shangce GAO
    Article type: LETTER
    Subject area: Biocybernetics, Neurocomputing
    2021 Volume E104.D Issue 10 Pages 1789-1792
    Published: October 01, 2021
    Released on J-STAGE: October 01, 2021
    JOURNAL FREE ACCESS

    In 2019, a new selection method, named fitness-distance balance (FDB), was proposed. FDB has been shown to have a significant effect on improving the search capability of evolutionary algorithms. However, it still suffers from poor flexibility when encountering various optimization problems. To address this issue, we propose a functional weights-enhanced FDB (FW). These functional weights change the original weights in FDB from fixed values to ones randomly generated by a distribution function, thereby enabling the algorithm to select more suitable individuals during the search. As a case study, FW is incorporated into the spherical search algorithm. Experimental results based on various IEEE CEC2017 benchmark functions demonstrate the effectiveness of FW.

    Download PDF (206K)