IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E104.D, Issue 12
23 articles in this issue
Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking
  • Shinya TAKAMAEDA
    2021 Volume E104.D Issue 12 Pages 2028
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS
  • Kohei ITO, Kensuke IIZUKA, Kazuei HIRONAKA, Yao HU, Michihiro KOIBUCHI ...
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2029-2039
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Multi-FPGA systems have gained attention because of their high performance and power efficiency. A multi-FPGA system called Flow-in-Cloud (FiC) is currently being developed as an accelerator for multi-access edge computing (MEC). FiC consists of multiple mid-range FPGAs tightly connected by high-speed serial links. Since time-critical jobs are assumed in MEC, a circuit-switched network with static time-division multiplexing (STDM) switches has been implemented on FiC. This paper investigates techniques for enhancing the interconnection performance of FiC. Unlike the switching fabrics of Networks-on-Chip or parallel machines, economical multi-FPGA systems such as FiC use Xilinx Aurora IP and FireFly cables with multiple lanes. To exploit multiple lanes, we adopted link aggregation and slot distribution. To mitigate the bottleneck between an STDM switch and user logic, we also propose a multi-ejection STDM switch. We evaluated various combinations of these techniques using three practical applications on an FiC prototype with 24 boards. When the number of slots was large and the transferred data size was small, slot distribution was sometimes more effective, whereas link aggregation was superior in most other cases. Our multi-ejection STDM switch mitigated the bottleneck at the ejection ports and successfully reduced the number of time slots. As a result, combining link aggregation with the multi-ejection STDM switch improved communication performance by up to 7.50 times with few additional resources. Although the performance of the fast Fourier transform, which has the highest communication ratio, could not be improved by using multiple boards when a single lane was used, a 1.99-fold improvement over a single board was achieved by using 8 boards with four lanes and our multi-ejection switch.

  • Akira JINGUJI, Shimpei SATO, Hiroki NAKAHARA
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2040-2047
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Convolutional neural networks (CNNs) achieve high recognition rates in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image recognition platforms because they achieve real-time performance at a low cost. However, CNNs have a large number of parameters, called weights, and large internal data, called feature maps, which challenge FPGAs in terms of both performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting each feature map into smaller patches, allowing the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces computational cost and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory-buffering schedule for a split-CNN, and implemented them on the PYNQ-Z1 FPGA board, which carries a low-end FPGA. An experiment on classification using VGG16 shows that our implementation is 3.1 times faster than a GPU and 5.4 times faster than an existing FPGA implementation.
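
    A minimal sketch of why splitting a feature map into patches need not change a convolution's output, assuming 2D "valid" convolutions and overlapping input tiles that carry a halo of k - 1 pixels. The function names and the halo handling are illustrative assumptions; the paper's split-CNN may treat patch boundaries differently and is not reproduced here.

```python
def conv2d_valid(fmap, kernel):
    """Plain 2D 'valid' convolution (cross-correlation) of a list-of-lists map."""
    k = len(kernel)
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += fmap[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def split_conv2d(fmap, kernel, patch):
    """Compute the same output tile by tile.

    Each output tile of up to patch x patch values needs an input tile of
    (patch + k - 1) x (patch + k - 1) values, i.e. a halo of k - 1 pixels,
    so only that small tile has to be resident at a time.
    """
    k = len(kernel)
    out_h = len(fmap) - k + 1
    out_w = len(fmap[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for oi in range(0, out_h, patch):
        for oj in range(0, out_w, patch):
            ph = min(patch, out_h - oi)
            pw = min(patch, out_w - oj)
            # Input tile including the halo needed by the kernel.
            tile = [row[oj:oj + pw + k - 1] for row in fmap[oi:oi + ph + k - 1]]
            tile_out = conv2d_valid(tile, kernel)
            for i in range(ph):
                for j in range(pw):
                    out[oi + i][oj + j] = tile_out[i][j]
    return out
```

    With halos, each patch's result matches the corresponding region of the full convolution, so the on-chip buffer only has to hold one (patch + k - 1)-sized tile at a time.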

  • Koki HONDA, Kaijie WEI, Masatoshi ARAI, Hideharu AMANO
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2048-2056
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Automobile companies have been trying to replace the side mirrors of cars with small cameras to reduce air resistance. Cameras also make it possible to apply image processing to improve image quality. Contrast Limited Adaptive Histogram Equalization (CLAHE) is one such technique for improving the quality of side-mirror camera images, but it requires high computational performance. Here, we propose a method for implementing CLAHE on a low-end FPGA board using high-level synthesis (HLS). CLAHE has two main processing parts: cumulative distribution function (CDF) generation and bilinear interpolation. During CDF generation, the effect of an increased loop initiation interval can be greatly reduced by placing multiple processing elements (PEs). During interpolation, latency and BRAM usage were reduced by revising how the CDF is held and how the interpolation is calculated. Finally, by connecting the modules with streaming interfaces, using dataflow pragmas, overlapping processing, and hiding data transfer, our HLS implementation achieved results comparable to those of an HDL implementation. We parameterized the components of the algorithm so that the number of tiles and the image size can be easily changed. The source code for this research can be downloaded from https://github.com/kokihonda/fpga_clahe.
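
    A minimal sketch of the per-tile CDF generation step of CLAHE, assuming 8-bit pixels, a clip limit given as an absolute bin count, and uniform redistribution of the clipped excess. The function name and the redistribution rule are illustrative assumptions rather than the paper's HLS code, and the bilinear interpolation between neighbouring tiles' lookup tables is omitted.

```python
def clipped_cdf_lut(tile_pixels, clip_limit, levels=256):
    """Return a lookup table mapping each gray level of one tile to its equalized value."""
    n = len(tile_pixels)
    if n == 0:
        raise ValueError("empty tile")
    # 1. Histogram of the tile.
    hist = [0] * levels
    for p in tile_pixels:
        hist[p] += 1
    # 2. Clip each bin at clip_limit and collect the excess counts.
    excess = 0
    for v in range(levels):
        if hist[v] > clip_limit:
            excess += hist[v] - clip_limit
            hist[v] = clip_limit
    # 3. Redistribute the excess uniformly; the remainder goes to the lowest bins.
    share, rem = divmod(excess, levels)
    for v in range(levels):
        hist[v] += share + (1 if v < rem else 0)
    # 4. CDF scaled to the output range (the histogram still sums to n).
    lut, cum = [], 0
    for v in range(levels):
        cum += hist[v]
        lut.append(round((levels - 1) * cum / n))
    return lut
```

    Clipping limits how steep the CDF can become, which is what bounds contrast amplification in flat regions; with a very large clip limit the function reduces to ordinary histogram equalization of the tile.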

  • Tomoya ITSUBO, Michihiro KOIBUCHI, Hideharu AMANO, Hiroki MATSUTANI
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2057-2067
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Since deep learning workloads perform a large number of matrix operations on training data, GPUs (Graphics Processing Units) are especially efficient for the training phase. A cluster of computers, each equipped with multiple GPUs, can significantly accelerate deep learning workloads. Training uses a back-propagation algorithm based on gradient descent. Although gradient computation is still a major bottleneck of training, gradient aggregation and optimization impose both communication and computation overheads, which should also be reduced to further shorten the training time. To address this issue, in this paper, multiple GPUs are interconnected using PCI Express (PCIe) over 10Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected by network switches, gradient aggregation and optimizers (e.g., SGD, AdaGrad, Adam, and SMORMS3) are offloaded to FPGA-based 10GbE switches between the remote GPUs; thus, gradient aggregation and parameter optimization are completed in the network. The proposed FPGA-based 10GbE switches with the four optimizers are implemented on a NetFPGA-SUME board. Their resource utilization increases with the PEs added for the optimizers, reaching up to 56% of the resources. Evaluation results using four remote GPUs connected via the proposed FPGA-based switch demonstrate that these optimizers are accelerated by up to 3.0x and 1.25x compared with CPU and GPU implementations, respectively. Also, the gradient aggregation throughput of the FPGA-based switch reaches up to 98.3% of the 10GbE line rate.
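
    A minimal sketch of the computation an in-network switch would perform for one Adam step, assuming the gradients from the remote GPUs are averaged element-wise before the update. The names `aggregate` and `Adam` are hypothetical, and averaging rather than summing is an assumption; the update itself is the standard bias-corrected Adam rule.

```python
import math


def aggregate(worker_grads):
    """Element-wise mean of the gradient vectors received from each worker."""
    n = len(worker_grads)
    return [sum(column) / n for column in zip(*worker_grads)]


class Adam:
    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentered variance) estimates
        self.t = 0                 # step counter used for bias correction

    def step(self, params, grads):
        """Update params in place with one Adam step and return them."""
        self.t += 1
        c1 = 1 - self.beta1 ** self.t
        c2 = 1 - self.beta2 ** self.t
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / c1
            v_hat = self.v[i] / c2
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return params
```

    For example, `Adam(2, lr=0.1).step([1.0, 1.0], aggregate([[1.0, -2.0], [0.0, -2.0]]))` moves each parameter by roughly `lr` against the sign of its averaged gradient on the first step, because the bias-corrected moments then satisfy m_hat / sqrt(v_hat) = sign(g).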

  • Ryosuke KURAMOCHI, Hiroki NAKAHARA
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2068-2077
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Convolutional neural networks (CNNs) are widely used for image processing tasks in both embedded systems and data centers. In data centers, high accuracy and low latency are desired for various tasks such as image processing of streaming videos. We propose an FPGA-based low-latency CNN inference accelerator for randomly wired convolutional neural networks (RWCNNs), whose layer structures are based on random graph models. Because RWCNNs have several convolution layers with no direct dependencies between them, our architecture can process them efficiently using a pipeline method. Each layer takes the calculation results of multiple layers as its input, so we use an FPGA with HBM2 to enable parallel access to the input data through multiple HBM2 channels. We schedule the execution order of the layers to improve pipeline efficiency, build a conflict graph from the scheduling results, and then allocate the calculation results of each layer to the HBM2 channels by coloring the graph. Because the pipeline execution must be properly controlled, we developed a tool that automatically generates the hardware functions. We implemented the proposed architecture on the Alveo U50 FPGA. We investigated the trade-off between latency and recognition accuracy for the ImageNet classification task by comparing inference performance for different input image sizes. We compared our accelerator with a conventional accelerator for ResNet-50; the results show that our accelerator reduces latency by 2.21 times. We also obtained 12.6 and 4.93 times better efficiency than a CPU and a GPU, respectively. Thus, our accelerator for RWCNNs is suitable for low-latency inference.
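
    A minimal sketch of the channel-allocation idea, assuming that two layer outputs conflict when their lifetimes in the schedule (from the step that produces them to their last use) overlap, so they may be read in parallel and should be placed in different HBM2 channels. The conflict criterion, the greedy largest-degree-first coloring, and all names are illustrative assumptions, not the paper's generation tool.

```python
def build_conflict_graph(lifetimes):
    """lifetimes maps a layer name to (first_step, last_step) of its output."""
    names = list(lifetimes)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
            if s1 <= e2 and s2 <= e1:  # closed intervals overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def assign_channels(graph, num_channels):
    """Greedy coloring: visit nodes by descending degree, take the lowest free channel."""
    order = sorted(graph, key=lambda n: (-len(graph[n]), n))
    channel = {}
    for node in order:
        taken = {channel[m] for m in graph[node] if m in channel}
        free = next((c for c in range(num_channels) if c not in taken), None)
        if free is None:
            raise ValueError(f"no free HBM2 channel for {node!r}")
        channel[node] = free
    return channel
```

    Greedy coloring is not guaranteed to use the minimum number of colors, but it is fast and always produces a valid assignment when enough channels are available; an exact or smarter heuristic could replace it without changing the graph construction.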

  • Miho YAMAKURA, Ryousei TAKANO, Akram BEN AHMED, Midori SUGAYA, Hidehar ...
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2078-2088
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    FPGA (Field Programmable Gate Array) based accelerators are attracting significant interest in cloud computing systems. Combining multi-FPGA systems with cloud computing brings a new perspective to reconfigurable computing research. However, the multi-tenancy of multi-FPGA systems has not been fully discussed in previous research. In this paper, we propose a multi-tenant resource management system, named FiC-RM, for a multi-FPGA cloud system. FiC-RM provides users with a set of FPGA resources according to their requirements and allows them to exclusively access FPGA boards and the interconnection network. To achieve this, we propose a placement algorithm, which is key to sharing the limited resources efficiently. We demonstrate that FiC-RM controls a practical-scale multi-FPGA system. Moreover, our simulation study shows that our placement algorithm achieved a 3 to 4% improvement in average resource usage and a 20-second reduction in response time compared with existing naive algorithms.

  • Kouki OZAWA, Takahiro HIROFUCHI, Ryousei TAKANO, Midori SUGAYA
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2089-2096
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    With the development of IoT devices and sensors, edge computing is leading toward new services such as autonomous cars and smart cities. Low-latency data access is an essential requirement for such services, and a large-capacity cache server is needed on the edge side. However, it is not realistic to build a large-capacity cache server using only DRAM, because DRAM is expensive and consumes a substantial amount of power. A hybrid main memory system, in which main memory consists of DRAM and non-volatile memory, is promising for addressing this issue. It achieves a large main memory capacity within the power supply capabilities of current servers. In this paper, we propose Fogcached, an extension of a widely used KVS (Key-Value Store) server program, Memcached, that exploits both DRAM and non-volatile main memory (NVMM). We used Intel Optane DCPM as the NVMM in the prototype. Fogcached implements a Dual-LRU (Least Recently Used) mechanism that seamlessly extends the memory management of Memcached to hybrid main memory. Fogcached reuses the segmented LRU of Memcached to manage cached objects in DRAM, adds another segmented LRU for those in DCPM, and bridges the two LRUs with a mechanism that automatically moves cached objects between DRAM and DCPM according to their access frequencies. Through experiments, we confirmed that Fogcached improved the peak value of the latency distribution by about 40% compared with Memcached.
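
    A minimal sketch of a two-tier LRU in the spirit of the Dual-LRU described above, assuming plain (not segmented) LRU lists, demotion of the DRAM tier's least recently used entry to the NVMM tier, and promotion back to DRAM after a hypothetical `promote_after` number of hits in NVMM. This is not Memcached's or Fogcached's code; `collections.OrderedDict` simply stands in for each LRU list.

```python
from collections import OrderedDict


class DualLRU:
    """Two LRU tiers: a small fast tier (DRAM) backed by a larger slow tier (NVMM)."""

    def __init__(self, dram_capacity, nvm_capacity, promote_after=2):
        self.dram_capacity = dram_capacity
        self.nvm_capacity = nvm_capacity
        self.promote_after = promote_after
        self.dram = OrderedDict()  # key -> value; first item is least recently used
        self.nvm = OrderedDict()   # key -> [value, hits since demotion]

    def _insert_dram(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)
        while len(self.dram) > self.dram_capacity:
            old_key, old_value = self.dram.popitem(last=False)
            self._insert_nvm(old_key, old_value)  # demote instead of dropping

    def _insert_nvm(self, key, value):
        self.nvm[key] = [value, 0]
        self.nvm.move_to_end(key)
        while len(self.nvm) > self.nvm_capacity:
            self.nvm.popitem(last=False)  # evicted from the cache entirely

    def set(self, key, value):
        self.nvm.pop(key, None)  # a fresh write always lands in DRAM
        self._insert_dram(key, value)

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.nvm:
            entry = self.nvm[key]
            entry[1] += 1
            self.nvm.move_to_end(key)
            if entry[1] >= self.promote_after:  # frequently accessed: promote
                del self.nvm[key]
                self._insert_dram(key, entry[0])
            return entry[0]
        return None
```

    Requiring several NVMM hits before promotion keeps one-off reads of cold objects from pushing hot objects out of DRAM.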

  • Koki HIGASHI, Yoichi ISHIWATA, Takeshi OHKAWA, Midori SUGAYA
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2097-2108
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Recently, edge servers, which are located closer to devices than the cloud, have been expected to process the large amount of sensor data generated by IoT devices such as robots. To obtain high responsiveness, previous research has proposed applying a KVS (Key-Value Store) at the edge as a cache server. In particular, a hybrid-KVS server that uses both DRAM and NVMM (Non-Volatile Main Memory) devices is expected to achieve both responsiveness and reliability. However, its effectiveness has not been verified in actual applications, and it is not clear how it compares with the cloud. The purpose of this study is to evaluate the effectiveness of hybrid-KVS servers using SLAM (Simultaneous Localization and Mapping), a widely used application in robots and autonomous driving that is well suited to edge servers and requires both responsiveness and reliability. SLAM is generally implemented on ROS (Robot Operating System) middleware and communicates with the server through ROS. However, hybrid-KVS cannot be used directly at the edge with SLAM and ROS, because ROS message objects differ from the format expected by the KVS. Therefore, in this research, we propose a mechanism for storing ROS memory objects in hybrid-KVS by designing and implementing a data serialization function that extends ROS. Our evaluation of the proposed fogcached-ros confirms its low API overhead, its support for the data used by SLAM, and a low latency difference between the edge and the cloud.

  • Kazuichi OE, Takeshi NANRI
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2109-2120
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Hybrid storage techniques are useful for improving the cost-performance of input-output (IO) intensive workloads. These techniques choose areas of concentrated IO accesses and migrate them to an upper tier to extract as much performance as possible through greater use of upper-tier areas. Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system situated between non-volatile memories (NVMs) and solid-state drives (SSDs). ATSMF aims to reduce the average response time for IO accesses by migrating areas of concentrated IO access from an SSD to an NVM. When a concentrated IO access finishes, the system migrates these areas from the NVM back to the SSD. Unfortunately, the published ATSMF implementation temporarily consumes a large amount of NVM capacity when migrating concentrated IO access areas to the NVM, because its algorithm executes NVM migration with high priority. As a result, it often delays evicting areas whose IO concentration has ended back to the SSD. Therefore, to reduce NVM consumption while maintaining the average response time, we developed new techniques for making ATSMF more practical. The first is a queue handling technique for NVM migration and eviction based on the number of IO accesses. The second is an eviction method that selects only the write-accessed partial regions in finished areas. The third is a variable eviction timing technique that balances NVM consumption against average response time. Experimental results indicate that the average response times of the proposed ATSMF are almost the same as those of the published ATSMF, while NVM consumption is three times lower in the best case.

  • Hiroki OKADA, Masato YOSHIMI, Celimuge WU, Tsutomu YOSHINAGA
    Article type: PAPER
    2021 Volume E104.D Issue 12 Pages 2121-2130
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    In this study, we propose a mechanism called adaptive failsoft control to address peak traffic in mobile live streaming that uses a chasing playback function. Although a cache system is available to support the chasing playback function for live streaming in a base station and in device-to-device communication, request concentration caused by highlight scenes affects the traffic load owing to data unavailability. To avoid data unavailability, we exploit two features of live streaming: (1) streaming data while switching the video quality, and (2) the time variability of the number of requests. The second feature enables a fallback mechanism for the cache system that prioritizes cache eviction and terminates the transfer of cache-missed requests. This paper discusses simulation results of the proposed mechanism, which adopts a request model appropriate for (a) avoiding peak traffic and (b) maintaining continuity of service.

Regular Section
  • Ryoma SENDA, Yoshiaki TAKATA, Hiroyuki SEKI
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2021 Volume E104.D Issue 12 Pages 2131-2144
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    A pushdown system (PDS) is known as an abstract model of recursive programs. Model checking methods for PDS have been studied and applied to various software verification tasks such as interprocedural data flow analysis and malware detection. However, a PDS cannot manipulate data values from an infinite domain. A register PDS (RPDS) extends PDS with registers that handle data values in a restricted way. This paper proposes algorithms for LTL model checking problems for RPDS with simple and regular valuations, which label configurations with atomic propositions under reasonable restrictions. First, we introduce RPDS and related models, and then define the LTL model checking problems for RPDS. Second, we give algorithms for solving these problems and show that the problems are EXPTIME-complete. As practical examples, we show how malware detection and XML schema checking can be solved in the proposed framework.

  • Weijun LIU
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2021 Volume E104.D Issue 12 Pages 2145-2153
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Computing the Lempel-Ziv factorization (LZ77) of a string is one of the most important problems in computer science. It is widely used in applications such as data compression, text indexing, and pattern discovery, and is at the heart of many file compressors such as gzip and 7zip. In this paper, we present a linear-time algorithm called Xone for computing the LZ77 factorization, which has the same space requirement as BGone, the previous linear-time LZ77 factorization algorithm with the best space requirement. Xone greatly improves on the efficiency of BGone: experiments show that the two versions of Xone, XoneT and XoneSA, are about 27% and 31% faster than BGoneT and BGoneSA, respectively.
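
    For reference, a naive sketch of the LZ77 factorization itself (not Xone or BGone), assuming the common definition in which each factor is either a character not seen before, written `(c, 0)`, or the longest prefix of the remaining suffix that also starts at an earlier position, written `(position, length)`; the earlier occurrence may overlap the factor. The function names are illustrative. It takes cubic time in the worst case, which is precisely what linear-time algorithms avoid.

```python
def lz77_factorize(s):
    """Greedy LZ77 factorization; literals are (char, 0), copies are (pos, length)."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_pos = 0, -1
        # Try every earlier start position j and extend the match as far as possible.
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, j
        if best_len == 0:
            factors.append((s[i], 0))  # character has not occurred before
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the string; copying one character at a time supports overlaps."""
    out = []
    for a, b in factors:
        if b == 0:
            out.append(a)
        else:
            for k in range(b):
                out.append(out[a + k])
    return "".join(out)
```

    For example, `lz77_factorize("abababa")` yields two literals followed by one self-overlapping copy `(0, 5)`.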

  • Ran LI, Huibiao ZHU, Jiaqi YIN
    Article type: PAPER
    Subject area: Software System
    2021 Volume E104.D Issue 12 Pages 2154-2163
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Ceph is an object-based parallel distributed file system that provides excellent performance, reliability, and scalability. Ceph also provides the Cephx authentication system to identify and authenticate users. In this paper, we first model the basic architecture of Ceph using the process algebra CSP (Communicating Sequential Processes). Using the model checker PAT (Process Analysis Toolkit), we then verify several properties of the constructed model, including Deadlock Freedom, Data Reachability, Data Write Integrity, Data Consistency, and Authentication. The verification results show that the original model does not satisfy the Authentication property. Therefore, we formalize a new model of Ceph that adopts Cephx. The new verification results show that this model satisfies all of these properties.

  • Yuto JUMONJI, Hiroshi YAMADA
    Article type: PAPER
    Subject area: Dependable Computing
    2021 Volume E104.D Issue 12 Pages 2164-2172
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Reboot-based recovery is a simple but powerful method for recovering applications from failures and unstable states. However, applying reboot-based recovery to a new type of application, in-memory databases (DBs), is challenging. Unlike with legacy applications, rebooting an in-memory DB loses memory objects such as key-value pairs and DB blocks, which must then be restored, causing severe performance degradation after the reboot. This paper presents an approach that allows reboot-based recovery of in-memory DBs with less performance degradation. Our key insight is to decouple data content objects from the other memory objects. Our approach treats data items as data content objects, preserves them in memory across reboots, and makes the restarted in-memory DBs attach them. To show the effectiveness of our approach, we apply the idea to two real-world DBs, MyRocks and memcached. The prototypes successfully mitigate performance degradation after reboot-based recovery.

  • Yuki KAJIWARA, Junjun ZHENG, Koichi MOURI
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2021 Volume E104.D Issue 12 Pages 2173-2183
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    The number of malware samples, including variants and new types, has been increasing dramatically over the years, making malware one of the greatest cybersecurity threats today. To counteract such threats, it is crucial to detect malware accurately and early enough. Recent advances in machine learning technology have brought increasing interest in machine-learning-based malware detection, and a number of studies have been conducted in the field. It is well known that malware detection accuracy largely depends on the training dataset used, so creating a suitable training dataset for efficient malware detection is crucial. Different works usually use their own datasets; therefore, a dataset is effective for only one detection method, and strictly comparing several methods using a common training dataset is difficult. In this paper, we focus on how to create a training dataset for efficiently detecting malware. The first step toward this goal is to clarify which information can accurately characterize malware. This paper concentrates on threads, treating them as important information for characterizing malware. Specifically, from the dynamic analysis logs of Alkanet, a system call tracer, we obtain thread information and classify the processing of that information into four patterns. Malware detection is then performed using the number of system call transitions appearing in the thread as a feature. Our comparative experimental results show that the primary thread information is important and useful for detecting malware with high accuracy.
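
    A minimal sketch of a transition-count feature, assuming a thread's trace is a list of system call names and a "transition" is a pair of consecutive calls. The Windows native API names used in the example and the fixed vocabulary of call pairs are illustrative assumptions, not Alkanet's log format or the paper's feature set.

```python
from collections import Counter


def transition_counts(syscalls):
    """Count consecutive (previous, next) system call pairs in one thread's trace."""
    return Counter(zip(syscalls, syscalls[1:]))


def feature_vector(thread_trace, vocabulary):
    """Fixed-length vector of transition counts, one entry per pair in vocabulary."""
    counts = transition_counts(thread_trace)
    return [counts[pair] for pair in vocabulary]  # unseen pairs count as 0


trace = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtClose"]
vocab = [("NtOpenFile", "NtReadFile"), ("NtReadFile", "NtClose"), ("NtClose", "NtOpenFile")]
features = feature_vector(trace, vocab)
```

    Fixing the vocabulary in advance gives every thread a vector of the same length, which is what a standard classifier needs as input.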

  • Ruicong ZHI, Caixia ZHOU, Junwei YU, Tingting LI, Ghada ZAMZMI
    Article type: PAPER
    Subject area: Human-computer Interaction
    2021 Volume E104.D Issue 12 Pages 2184-2194
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Pain is an essential physiological phenomenon in human beings, and accurate pain assessment is important for developing proper treatment. Although self-report is the gold standard in pain assessment, it is not applicable to individuals with communicative impairments. Non-verbal pain indicators, such as pain-related facial expressions and changes in physiological parameters, could provide valuable insights for pain assessment. In this paper, we propose a multimodal Stream Integrated Neural Network with Different Frame Rates (SINN) that combines facial expressions and biomedical signals for automatic pain assessment. The main contributions of this research are threefold. (1) The SINN has four input streams for facial expression feature extraction; the different facial features are integrated with biomedical features, and the joint features are used for pain assessment. (2) Dynamic facial features are learned in both implicit and explicit manners to better represent the facial changes that occur during pain experience. (3) Multiple modalities, including facial expressions and biomedical signals, are utilized to identify various pain states. Experiments are conducted on publicly available pain datasets, and the performance is compared with that of several deep learning models. The experimental results illustrate the superiority of the proposed model, which achieves the highest accuracy, 68.2%, up to 5% higher than basic deep learning models for binary pain classification.

  • Sashi NOVITASARI, Sakriani SAKTI, Satoshi NAKAMURA
    Article type: PAPER
    Subject area: Speech and Hearing
    2021 Volume E104.D Issue 12 Pages 2195-2208
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to a target language in real time. Such systems can be achieved by performing low-latency processing in the ASR (automatic speech recognition) module before passing the output to the MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several recent studies have proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than standard attention-based ASR because they must decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR), which learns from attention-based non-incremental ASR to achieve low-delay end-to-end speech recognition. ISR involves a trade-off between delay and performance, so we investigate how to reduce the AT-ISR delay without a significant performance drop. Our experiments show that AT-ISR achieves performance comparable to that of non-incremental ASR when incremental recognition begins after the speech utterance reaches 25% of the complete utterance length. We also perform additional experiments to investigate the effect of ISR on translation tasks, focusing on finding the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR yields the best translation quality, with the lowest WER and the lowest uncovered-word rate.

  • Hongcui WANG, Pierre ROUSSEL, Bruce DENBY
    Article type: PAPER
    Subject area: Speech and Hearing
    2021 Volume E104.D Issue 12 Pages 2209-2217
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    A Silent Speech Interface (SSI) is a sensor-based, Artificial Intelligence (AI) enabled system in which articulation is performed without the use of the vocal cords, resulting in a voice interface that conserves the ambient audio environment, protects private data, and also functions in noisy environments. Although portable SSIs based on ultrasound imaging of the tongue have obtained word error rates rivaling those of acoustic speech recognition, SSIs remain relegated to the laboratory because of stability issues. Indeed, reliable extraction of acoustic features from ultrasound tongue images in real-life situations has proven elusive. Recently, Representation Learning has shown considerable success in learning underlying structure in noisy, high-dimensional raw data. In its unsupervised form, Representation Learning can reveal structure in unlabeled data, thus greatly simplifying data preparation. In the present article, a 3D Convolutional Neural Network (3DCNN) architecture is applied to unlabeled ultrasound images and is shown to reliably predict future tongue configurations. By comparing the 3DCNN with a simple previous-frame predictor, it is possible to recognize tongue trajectories comprising transitions between regions of stability that correlate with formant trajectories in a spectrogram of the signal. Prospects for using the underlying structural representation to provide features for subsequent speech processing tasks are presented.

  • Wenyi GE, Yi LIN, Zhitao WANG, Guigui WANG, Shihan TAN
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2021 Volume E104.D Issue 12 Pages 2218-2225
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    In this paper, we present a simple yet powerful deep neural network for natural image dehazing. The proposed method is based on the U-Net architecture, with several design changes to improve it. First, we replace Batch Normalization with Group Normalization to address the small batch sizes imposed by hardware limitations. Second, we introduce FReLU activation into the U-Net block, which can capture complicated visual layouts with regular convolutions. Experimental results on public benchmarks demonstrate the effectiveness of the modified components. On the SOTS Indoor and Outdoor datasets, the method obtains PSNRs of 32.23 dB and 31.64 dB, respectively, which are comparable to those of state-of-the-art methods. The code will be made publicly available online.
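
    A minimal sketch of the PSNR metric used for such comparisons, assuming flattened 8-bit images with a peak value of 255; evaluations on images scaled to [0, 1] would pass `max_value=1.0`. The function name is illustrative, and the benchmark's exact evaluation protocol is not reproduced here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

    Because PSNR is logarithmic in the mean squared error, halving the MSE raises PSNR by about 3 dB.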

  • Uuganbayar GANBOLD, Junya SATO, Takuya AKASHI
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2021 Volume E104.D Issue 12 Pages 2226-2236
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    Horizon detection is useful in maritime image processing for various purposes, such as estimating camera orientation, registering consecutive frames, and restricting the object search region. Existing horizon detection methods are based on edge extraction. For accuracy, they use multiple images filtered with different filter sizes, which increases the processing time. In addition, these methods are not robust to blurring. Therefore, we developed a horizon detection method that does not extract candidates from edge information, by formulating horizon detection as a global optimization problem. A horizon line in the image plane was represented by two parameters, which were optimized by an evolutionary algorithm (a genetic algorithm). Thus, the local and global features of a horizon were concurrently utilized in the optimization process, which was accelerated by a coarse-to-fine strategy. As a result, we could detect the horizon line in high-resolution maritime images in about 50 ms. The performance of the proposed method was tested on 49 videos from the Singapore marine dataset and the Buoy dataset, which contain over 16,000 frames under different scenarios. Experimental results show that the proposed method achieves higher accuracy than state-of-the-art methods.

  • Zifen HE, Shouye ZHU, Ying HUANG, Yinhui ZHANG
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2021 Volume E104.D Issue 12 Pages 2237-2243
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    This paper presents a novel method for weakly supervised semantic segmentation of 3D point clouds using a graph and edge convolutional neural network (GECNN), with only 1% or 10% of the points labeled. Our general framework facilitates semantic segmentation by encoding both global- and local-scale features via a parallel graph and edge aggregation scheme. More specifically, global-scale graph structure cues of point clouds are captured by a graph convolutional neural network, which propagates information from a pairwise affinity representation over the whole graph established in a d-dimensional feature embedding space. We integrate local-scale features derived from a dynamic edge feature aggregation convolutional neural network, which allows us to fuse both global and local cues of 3D point clouds. The proposed GECNN model is trained using a comprehensive objective consisting of incomplete, inexact, self-supervision, and smoothness constraints based on partially labeled points. The proposed approach enforces global and local consistency constraints directly on the objective losses. It inherently handles the challenges of segmenting sparse 3D point clouds with limited annotations in a large-scale point cloud space. Our experiments on the ShapeNet and S3DIS benchmarks demonstrate the effectiveness of the proposed approach for efficient (within 20 epochs) learning of large-scale point cloud semantics despite very limited labels.

  • Sang-Hoon KIM
    Article type: LETTER
    Subject area: Software System
    2021 Volume E104.D Issue 12 Pages 2244-2247
    Published: December 01, 2021
    Released on J-STAGE: December 01, 2021
    JOURNAL FREE ACCESS

    There have been increasing demands for distributed operating systems that better utilize resources scattered over multiple nodes. This paper highlights the challenges and requirements for the communication layers of distributed operating systems, and makes a case for a versatile, high-performance communication layer over InfiniBand networks.
