IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E106.D, Issue 10
Special Section on Picture Coding and Image Media Processing
  • Ichiro MATSUDA
    2023 Volume E106.D Issue 10 Pages 1620
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS
    Download PDF (60K)
  • Xin JIN, Jia GUO
    Article type: PAPER
    2023 Volume E106.D Issue 10 Pages 1621-1626
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Human motion prediction has always been an interesting research topic in computer vision and robotics. It involves forecasting future human movements conditioned on historical 3-dimensional human skeleton sequences. Existing prediction algorithms usually rely on extensive annotated or non-annotated motion capture data and are non-adaptive. This paper addresses the problem of few-frame human motion prediction, in the spirit of the recent progress on manifold learning. More precisely, our approach is based on the insight that achieving an accurate prediction relies on a sufficiently linear expression in the latent space from a few training data in observation space. To accomplish this, we propose the Regressive Gaussian Process Latent Variable Model (RGPLVM), which introduces a novel regressive kernel function for the model training. By doing so, our model produces a linear mapping from the training data space to the latent space, while effectively transforming the prediction of human motion in physical space into an equivalent linear regression analysis in the latent space. The comparison with two learning-based motion prediction approaches (the state-of-the-art meta learning and the classical LSTM-3LR) demonstrates that our GPLVM significantly improves the prediction performance on various actions in the small-sample-size regime.

    Download PDF (1122K)
  • Norihiko KAWAI, Hiroaki KOIKE
    Article type: PAPER
    2023 Volume E106.D Issue 10 Pages 1627-1637
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Due to the global outbreak of coronaviruses, people are increasingly wearing masks even when photographed. As a result, photos uploaded to web pages and social networking services with the lower half of the face hidden are less likely to convey the attractiveness of the photographed persons. In this study, we propose a method to complete facial mask regions using StyleGAN2, a type of generative adversarial network (GAN). In the proposed method, a reference image of the same person without a mask is prepared separately from a target image of the person wearing a mask. After the mask region in the target image is temporarily inpainted, the face orientation and contour of the person in the reference image are changed to match those of the target image using StyleGAN2. The changed image is then composited into the mask region with color-tone correction, producing a mask-free image that preserves the person's features.

    Download PDF (6638K)
  • Ying JI, Yu WANG, Kensaku MORI, Jien KATO
    Article type: PAPER
    2023 Volume E106.D Issue 10 Pages 1638-1649
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Social relationships (e.g., couples, opponents) are the foundational part of society. Social relation atmosphere describes the overall interaction environment between social relationships. Discovering social relation atmosphere can help machines better comprehend human behaviors and improve the performance of social intelligent applications. Most existing research focuses on investigating social relationships, while ignoring the social relation atmosphere. Due to the complexity of the expressions in video data and the uncertainty of the social relation atmosphere, the atmosphere is difficult even to define and evaluate. In this paper, we innovatively analyze the social relation atmosphere in video data. We introduce a Relevant Visual Concept (RVC) from the social relationship recognition task to facilitate social relation atmosphere recognition, because social relationships contain useful information about human interactions and surrounding environments, which are crucial clues for social relation atmosphere recognition. Our approach consists of two main steps: (1) we first generate a group of visual concepts that preserve the inherent social relationship information by utilizing a 3D explanation module; (2) the extracted relevant visual concepts are used to supplement the social relation atmosphere recognition. In addition, we present a new dataset based on the existing Video Social Relation Dataset. Each video is annotated with four kinds of social relation atmosphere attributes and one social relationship. We evaluate the proposed method on our dataset. Experiments with various 3D ConvNets and fusion methods demonstrate that the proposed method can effectively improve recognition accuracy compared to end-to-end ConvNets. The visualization results also indicate that essential information in social relationships can be discovered and used to enhance social relation atmosphere recognition.

    Download PDF (12044K)
  • Akira KUBOTA, Kazuya KODAMA, Daiki TAMURA, Asami ITO
    Article type: PAPER
    2023 Volume E106.D Issue 10 Pages 1650-1660
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Focal stacks (FS) have attracted attention as an alternative representation of light field (LF). However, the problem of reconstructing LF from its FS is considered ill-posed. Although many regularization methods have been discussed, no method has been proposed to solve this problem perfectly. This paper showed that the LF can be perfectly reconstructed from the FS through a filter bank in theory for Lambertian scenes without occlusion if the camera aperture for acquiring the FS is a Cauchy function. The numerical simulation demonstrated that the filter bank allows perfect reconstruction of the LF.

    Download PDF (7825K)
  • Onhi KATO, Akira KUBOTA
    Article type: PAPER
    2023 Volume E106.D Issue 10 Pages 1661-1672
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Various haze removal methods based on the atmospheric scattering model have been presented in recent years. Most methods have targeted strong haze images where light is scattered equally in all color channels. This paper presents a haze removal method using near-infrared (NIR) images for relatively weak haze images. In order to recover the lost edges, the presented method first extracts edges from an appropriately weighted NIR image and fuses it with the color image. By introducing a wavelength-dependent scattering model, our method then estimates the transmission map for each color channel and recovers the color more naturally from the edge-recovered image. Finally, the edge-recovered and the color-recovered images are blended. In this blending process, the regions with high lightness, such as sky and clouds, where unnatural color shifts are likely to occur, are effectively estimated, and the optimal weighting map is obtained. Our qualitative and quantitative evaluations using 59 pairs of color and NIR images demonstrated that our method can recover edges and colors more naturally in weak haze images than conventional methods.
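
    As a rough illustration of the atmospheric scattering model the abstract builds on, the sketch below uses its commonly cited form, I = J*t + A*(1 - t), with a per-channel transmission t_c = exp(-beta_c * d). The function names, the scattering coefficients, the airlight value, and the lower bound on t are illustrative assumptions and are not taken from the paper.

```python
import math

def transmission(beta, depth):
    """Per-channel transmission t_c = exp(-beta_c * depth)."""
    return [math.exp(-b * depth) for b in beta]

def add_haze(scene, airlight, t):
    """Forward model per channel: I_c = J_c * t_c + A_c * (1 - t_c)."""
    return [j * tc + a * (1.0 - tc) for j, a, tc in zip(scene, airlight, t)]

def remove_haze(hazy, airlight, t, t_min=0.1):
    """Invert the model per channel; t is clamped so we never divide by ~0."""
    return [(i - a) / max(tc, t_min) + a for i, a, tc in zip(hazy, airlight, t)]

# Hypothetical coefficients (R, G, B): shorter wavelengths scatter more.
beta_rgb = [0.2, 0.3, 0.5]
t_rgb = transmission(beta_rgb, depth=2.0)

scene = [0.2, 0.5, 0.7]        # haze-free pixel (assumed)
airlight = [0.9, 0.9, 0.9]     # atmospheric light (assumed)
hazy = add_haze(scene, airlight, t_rgb)
restored = remove_haze(hazy, airlight, t_rgb)
```

    With a known transmission the inversion recovers the scene pixel exactly; in practice the transmission map and airlight have to be estimated, which is the part the paper addresses.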

    Download PDF (7277K)
  • Keiichiro TAKADA, Yasuaki TOKUMO, Tomohiro IKAI, Takeshi CHUJOH
    Article type: LETTER
    2023 Volume E106.D Issue 10 Pages 1673-1676
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Video-based point cloud compression (V-PCC) utilizes video compression technology to efficiently encode dense point clouds, providing state-of-the-art compression performance with a relatively small computation burden. V-PCC converts 3-dimensional point cloud data into three types of 2-dimensional frames, i.e., occupancy, geometry, and attribute frames, and encodes them via video compression. However, the quality of these frames may be degraded by video compression. This paper proposes an adaptive neural network-based post-processing filter on attribute frames to alleviate the degradation problem. Furthermore, a novel training method using occupancy frames is studied. The experimental results show average BD-rate gains of 3.0%, 29.3% and 22.2% for Y, U and V respectively.

    Download PDF (811K)
Regular Section
  • Sinyu JUNG, Keiichi KANEKO
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2023 Volume E106.D Issue 10 Pages 1677-1685
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    A feedback node set (FNS) of a graph is a subset of the nodes of the graph whose deletion makes the residual graph acyclic. By finding an FNS in an interconnection network, we can set a checkpoint at each node in it to avoid a livelock configuration. Hence, finding an FNS is critical to enhancing the dependability of a parallel computing system. In this paper, we propose a method to find FNSs in n-pancake graphs and n-burnt pancake graphs. By analyzing the types of cycles proposed in our method, we also give the number of nodes in the FNS of an n-pancake graph, (n-2.875)(n-1)!+1.5(n-3)!, and that of an n-burnt pancake graph, 2^(n-1)(n-1)!(n-3.5).
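
    To make the two closed-form sizes concrete, the sketch below evaluates them exactly with rational arithmetic. It assumes the second expression is read as 2^(n-1)(n-1)!(n-3.5), with the exponent restored from the flattened text; the function names are illustrative, and the range of n for which the formulas hold is not stated in the abstract.

```python
from fractions import Fraction
from math import factorial

def fns_size_pancake(n):
    """(n - 2.875)(n-1)! + 1.5(n-3)!, evaluated exactly."""
    return ((Fraction(n) - Fraction(23, 8)) * factorial(n - 1)
            + Fraction(3, 2) * factorial(n - 3))

def fns_size_burnt_pancake(n):
    """2^(n-1) (n-1)! (n - 3.5), evaluated exactly (exponent is an assumed reading)."""
    return 2 ** (n - 1) * factorial(n - 1) * (Fraction(n) - Fraction(7, 2))

n = 5
pancake_nodes = factorial(n)           # an n-pancake graph has n! nodes
burnt_nodes = 2 ** n * factorial(n)    # an n-burnt pancake graph has n! * 2^n nodes

print(fns_size_pancake(n), "of", pancake_nodes)
print(fns_size_burnt_pancake(n), "of", burnt_nodes)
```

    For n = 5 this gives 54 of 120 nodes for the pancake graph and 576 of 3840 nodes for the burnt pancake graph.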

    Download PDF (648K)
  • Yunqi MA, Satoshi FUJITA
    Article type: PAPER
    Subject area: Information Network
    2023 Volume E106.D Issue 10 Pages 1686-1693
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Peer-to-peer (P2P) technology has gained popularity as a way to enhance system performance. Nodes in a P2P network work together by providing network resources to one another. In this study, we examine the use of P2P technology for video streaming and develop a distributed incentive mechanism to prevent free-riding. Our proposed solution combines WebTorrent and the Solana blockchain and can be accessed through a web browser. To incentivize uploads, some of the received video chunks are encrypted using AES. Smart contracts on the blockchain are used for third-party verification of uploads and for managing access to the video content. Experimental results on a test network showed that our system can encrypt and decrypt chunks in about 1/40th the time it takes using WebRTC, without affecting the quality of video streaming. Smart contracts were also found to quickly verify uploads in about 860 milliseconds. The paper also explores how to effectively reward virtual points for uploads.

    Download PDF (486K)
  • Shiling SHI, Stefan HOLST, Xiaoqing WEN
    Article type: PAPER
    Subject area: Dependable Computing
    2023 Volume E106.D Issue 10 Pages 1694-1704
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    High power dissipation during scan test often causes undue yield loss, especially for low-power circuits. One major reason is that the resulting IR-drop in shift mode may corrupt test data. A common approach to solving this problem is partial-shift, in which multiple scan chains are formed and only one group of scan chains is shifted at a time. However, existing partial-shift based methods suffer from two major problems: (1) their IR-drop estimation is not accurate enough or computationally too expensive to be done for each shift cycle; (2) partial-shift is hence applied to all shift cycles, resulting in long test time. This paper addresses these two problems with a novel IR-drop-aware scan shift method, featuring: (1) Cycle-based IR-Drop Estimation (CIDE) supported by a GPU-accelerated dynamic power simulator to quickly find potential shift cycles with excessive peak IR-drop; (2) a scan shift scheduling method that generates a scan chain grouping targeted for each considered shift cycle to reduce the impact on test time. Experiments on ITC'99 benchmark circuits show that: (1) the CIDE is computationally feasible; (2) the proposed scan shift schedule can achieve a global peak IR-drop reduction of up to 47%. Its scheduling efficiency is 58.4% higher than that of an existing typical method on average, which means our method has less test time.

    Download PDF (1587K)
  • Yingyao WANG, Han WANG, Chaoqun DUAN, Tiejun ZHAO
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2023 Volume E106.D Issue 10 Pages 1705-1714
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Question-answering tasks over structured knowledge (i.e., tables and graphs) require the ability to encode structural information. Traditional pre-trained language models trained on linear-chain natural language cannot be directly applied to encode tables and graphs. The existing methods adopt the pre-trained models in such tasks by flattening structured knowledge into sequences. However, the serialization operation leads to the loss of the structural information of knowledge. To better employ pre-trained transformers for structured knowledge representation, we propose a novel structure-aware transformer (SATrans) that injects the local-to-global structural information of the knowledge into the mask of the different self-attention layers. Specifically, in the lower self-attention layers, SATrans focuses on the local structural information of each knowledge token to learn a more robust representation of it. In the upper self-attention layers, SATrans further injects the global information of the structured knowledge to integrate the information among knowledge tokens. In this way, SATrans can effectively learn the semantic representation and structural information from the knowledge sequence and the attention mask, respectively. We evaluate SATrans on the table fact verification task and the knowledge base question-answering task. Furthermore, we explore two methods to combine symbolic and linguistic reasoning for these tasks to solve the problem that the pre-trained models lack symbolic reasoning ability. The experimental results reveal that the methods consistently outperform strong baselines on the two benchmarks.

    Download PDF (976K)
  • Baoxian WANG, Zhihao DONG, Yuzhao WANG, Shoupeng QIN, Zhao TAN, Weigan ...
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2023 Volume E106.D Issue 10 Pages 1715-1722
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    As a typical surface defect of tunnel lining structures, cracking disease affects the durability of tunnel structures and poses hidden dangers to tunnel driving safety. Factors such as interference from the complex service environment of the tunnel and the low signal-to-noise ratio of the crack targets themselves have led to existing crack recognition methods based on semantic segmentation being unable to meet actual engineering needs. Based on this, this paper uses the Unet network as the basic framework for crack identification and proposes to construct a multi-kernel convolution cascade enhancement (MKCE) model to achieve accurate detection and identification of crack diseases. First of all, to ensure the performance of crack feature extraction, the model replaces the main feature extraction network in the basic framework with the ResNet-50 residual network. Compared with the VGG-16 network, this modification can extract richer crack detail features while reducing model parameters. Secondly, considering that the Unet network cannot effectively perceive multi-scale crack features in the skip connection stage, a multi-kernel convolution cascade enhancement module is proposed by combining a cascaded connection of multi-kernel convolution groups and multi-expansion rate dilated convolution groups. This module achieves a comprehensive perception of local details and the global content of tunnel lining cracks. In addition, to better weaken the effect of tunnel background clutter interference, a convolutional block attention calculation module is further introduced after the multi-kernel convolution cascade enhancement module, which effectively reduces the false alarm rate of crack recognition. The algorithm is tested on a large number of subway tunnel crack image datasets. The experimental results show that, compared with other crack recognition algorithms based on deep learning, the method in this paper has achieved the best results in terms of accuracy and intersection over union (IoU) indicators, which verifies that the method has better applicability.
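
    For readers unfamiliar with the intersection over union (IoU) indicator mentioned above, here is a minimal sketch of how it is commonly computed for binary segmentation masks. The mask layout (nested lists of 0/1) and the convention for two empty masks are assumptions made for illustration, not details from the paper.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0   # pixel marked as crack in both masks
            union += 1 if (p or t) else 0    # pixel marked as crack in either mask
    # Two empty masks are treated as a perfect match (a convention assumed here).
    return inter / union if union else 1.0

pred = [[0, 1, 1, 0],
        [0, 1, 0, 0]]
truth = [[0, 1, 0, 0],
         [0, 1, 1, 0]]
score = iou(pred, truth)   # 2 shared pixels out of 4 marked in either mask
```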

    Download PDF (864K)
  • Takao YAMANAKA, Tatsuya SUZUKI, Taiki NOBUTSUNE, Chenjunlin WU
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2023 Volume E106.D Issue 10 Pages 1723-1731
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Omni-directional images have been used in a wide range of applications including virtual/augmented realities, self-driving cars, robotics simulators, and surveillance systems. For these applications, it would be useful to estimate saliency maps representing probability distributions of gazing points with a head-mounted display, to detect important regions in the omni-directional images. This paper proposes a novel saliency-map estimation model for the omni-directional images by extracting overlapping 2-dimensional (2D) plane images from omni-directional images at various directions and angles of view. While 2D saliency maps tend to have high probability at the center of images (center bias), the high-probability region appears at horizontal directions in omni-directional saliency maps when a head-mounted display is used (equator bias). Therefore, the 2D saliency model with a center-bias layer was fine-tuned with an omni-directional dataset by replacing the center-bias layer with an equator-bias layer conditioned on the elevation angle for the extraction of the 2D plane image. The limited availability of omni-directional images in saliency datasets can be compensated for by using the well-established 2D saliency model pretrained on a large number of training images with the ground truth of 2D saliency maps. In addition, this paper proposes a multi-scale estimation method by extracting 2D images in multiple angles of view to detect objects of various sizes with variable receptive fields. The saliency maps estimated from the multiple angles of view were integrated by using pixel-wise attention weights calculated in an integration layer for weighting the optimal scale to each object. The proposed method was evaluated using a publicly available dataset with evaluation metrics for omni-directional saliency maps. It was confirmed that the accuracy of the saliency maps was improved by the proposed method.

    Download PDF (2089K)
  • Takehiro TAKAYANAGI, Kiyoshi IZUMI
    Article type: PAPER
    Subject area: Natural Language Processing
    2023 Volume E106.D Issue 10 Pages 1732-1741
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Personalized stock recommendations aim to suggest stocks tailored to individual investor needs, significantly aiding the financial decision making of an investor. This study shows the advantages of incorporating context into personalized stock recommendation systems. We embed item contextual information such as technical indicators, fundamental factors, and business activities of individual stocks. Simultaneously, we consider user contextual information such as investors' personality traits, behavioral characteristics, and attributes to create a comprehensive investor profile. Our model incorporating contextual information, validated on novel stock recommendation tasks, demonstrated a notable improvement over baseline models when incorporating these contextual features. Consistent outperformance across various hyperparameters further underscores the robustness and utility of our model in integrating stocks' features and investors' traits into personalized stock recommendations.

    Download PDF (1212K)
  • Jonghyeok YOU, Heesoo KIM, Kilho LEE
    Article type: LETTER
    Subject area: Software System
    2023 Volume E106.D Issue 10 Pages 1742-1746
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    This paper proposes a fault-resilient ROS platform supporting rapid fault detection and recovery. The platform employs heartbeat-based fault detection and node replication-based recovery. Our prototype implementation on top of ROS Melodic shows strong performance in evaluations with an NVIDIA development board and an inverted pendulum device.
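
    The general idea of heartbeat-based detection with replica-based recovery can be sketched as follows. This is a generic illustration only: it does not use any ROS API, and the class, function, and node names as well as the timeout value are hypothetical rather than taken from the paper.

```python
class HeartbeatMonitor:
    """Flags a node as failed when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, node, now):
        """Record a heartbeat from `node` at time `now` (seconds)."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


def recover(active, replicas, failed):
    """Promote a standby replica for each failed node, if one is available."""
    for node in failed:
        if replicas.get(node):
            active[node] = replicas[node].pop(0)
    return active


monitor = HeartbeatMonitor(timeout=0.5)
monitor.beat("controller", now=0.0)
monitor.beat("sensor", now=0.0)
monitor.beat("sensor", now=0.4)    # the controller has stopped sending heartbeats

failed = monitor.failed_nodes(now=0.8)
active = recover({"controller": "controller#0", "sensor": "sensor#0"},
                 {"controller": ["controller#1"]},
                 failed)
```

    Timestamps are passed in explicitly so the behavior is deterministic; a real system would read a clock and run the check periodically.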

    Download PDF (1060K)
  • Hongli ZHANG, Jinglei LIU
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2023 Volume E106.D Issue 10 Pages 1747-1751
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    With the emergence of large quantities of data in science and industry, it is urgent to improve the prediction accuracy and reduce the high complexity of Gaussian process regression (GPR). However, the traditional global and local approximations each have shortcomings: global approximation tends to ignore local features, while local approximation is prone to over-fitting. To solve these problems, a large-scale Gaussian process regression algorithm (RFFLT) combining random Fourier features (RFF) and local approximation is proposed. 1) To speed up training, we use a random Fourier feature map to project the input data into a random low-dimensional feature space for processing. The main innovation of the algorithm is to design features by using existing fast linear processing methods, so that the inner product of the transformed data is approximately equal to the inner product in the feature space of the shift-invariant kernel specified by the user. 2) The generalized robust Bayesian committee machine (GRBCM) based on the Tsallis mutual information method is used in local approximation, which enhances the flexibility of the model and generates a sparse representation of the expert weight distribution compared with previous work. The RFFLT algorithm was tested on six real data sets, where it greatly shortened the regression prediction time and improved the prediction accuracy.
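
    The property described in step 1), that inner products of the transformed data approximate a shift-invariant kernel, can be illustrated with the standard random Fourier feature construction for the Gaussian (RBF) kernel. This is a sketch of the textbook technique, not the RFFLT algorithm itself; the function names, the choice of kernel, the bandwidth, and the number of features are assumptions.

```python
import math
import random

def make_rff(dim, num_features, sigma, seed=0):
    """Return a feature map z with dot(z(x), z(y)) ~ exp(-|x - y|^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1/sigma^2), offsets uniformly from [0, 2*pi).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]
    return z

def rbf_kernel(x, y, sigma):
    """Exact Gaussian kernel value for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, y = [0.1, 0.4, -0.3], [0.5, 0.0, 0.2]
z = make_rff(dim=3, num_features=4000, sigma=1.0)
approx = sum(a * b for a, b in zip(z(x), z(y)))   # linear inner product of features
exact = rbf_kernel(x, y, sigma=1.0)
```

    Because the kernel is replaced by an explicit finite-dimensional feature map, downstream regression can use linear methods whose cost scales with the number of features rather than with the square or cube of the number of samples.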

    Download PDF (174K)
  • Jinsheng WEI, Haoyu CHEN, Guanming LU, Jingjie YAN, Yue XIE, Guoying Z ...
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2023 Volume E106.D Issue 10 Pages 1752-1756
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    Micro-expression recognition (MER) draws intensive research interest as micro-expressions (MEs) can infer genuine emotions. Prior information can guide the model to learn discriminative ME features effectively. However, most works focus on researching general models with a stronger representation ability to adaptively aggregate ME movement information in a holistic way, which may ignore the prior information and properties of MEs. To solve this issue, driven by the prior information that the category of ME can be inferred from the relationship between the actions of different facial components, this work designs a novel model that can conform to this prior information and learn ME movement features in an interpretable way. Specifically, this paper proposes a Decomposition and Reconstruction-based Graph Representation Learning (DeRe-GRL) model to effectively learn high-level ME features. DeRe-GRL includes two modules: Action Decomposition Module (ADM) and Relation Reconstruction Module (RRM), where ADM learns action features of facial key components and RRM explores the relationship between these action features. Based on facial key components, ADM divides the geometric movement features extracted by the graph model-based backbone into several sub-features, and learns the map matrix to map these sub-features into multiple action features; then, RRM learns weights to weight all action features to build the relationship between action features. The experimental results demonstrate the effectiveness of the proposed modules, and the proposed method achieves competitive performance.

    Download PDF (686K)
  • Wan Yeon LEE, Yun-Seok CHOI, Tong Min KIM
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2023 Volume E106.D Issue 10 Pages 1757-1760
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS

    We propose a quantitative measurement technique for video forgery that eliminates the burden of deciding the subtle boundary between normal and tampered patterns. We also propose an automatic adjustment scheme for spatial and temporal target zones, which maximizes the abnormality measurement of forged videos. Evaluation shows that the proposed scheme provides manifest detection capability against both inter-frame and intra-frame forgeries.

    Download PDF (694K)
  • Editorial Committee of Special Section on Formal Approaches
    2023 Volume E106.D Issue 10 Pages 1761
    Published: October 01, 2023
    Released on J-STAGE: October 01, 2023
    JOURNAL FREE ACCESS
    Download PDF (13K)