IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E105.D, Issue 10
Showing 1-25 of 25 articles from the selected issue
Special Section on Formal Approaches
Special Section on Picture Coding and Image Media Processing
  • Ichiro MATSUDA
    2022 Volume E105.D Issue 10 Pages 1678
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS
  • Kohei TATEISHI, Chihiro TSUTAKE, Keita TAKAHASHI, Toshiaki FUJII
    Article type: PAPER
    2022 Volume E105.D Issue 10 Pages 1679-1690
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    A light field (LF), which is represented as a set of dense, multi-view images, has been used in various 3D applications. To make LF acquisition more efficient, researchers have investigated compressive sensing methods by incorporating certain coding functionalities into a camera. In this paper, we focus on a challenging case called snapshot compressive LF imaging, in which an entire LF is reconstructed from only a single acquired image. To embed a large amount of LF information in a single image, we consider two promising methods based on rapid optical control during a single exposure: time-multiplexed coded aperture (TMCA) and coded focal stack (CFS), which were proposed individually in previous works. Both TMCA and CFS can be interpreted in a unified manner as extensions of the coded aperture (CA) and focal stack (FS) methods, respectively. By developing a unified algorithm pipeline for TMCA and CFS, based on deep neural networks, we evaluated their performance with respect to other possible imaging methods. We found that both TMCA and CFS can achieve better reconstruction quality than the other snapshot methods, and they also perform reasonably well compared to methods using multiple acquired images. To our knowledge, we are the first to present an overall discussion of TMCA and CFS and to compare and validate their effectiveness in the context of compressive LF imaging.

  • Yoshitaka KIDANI, Haruhisa KATO, Kei KAWAMURA, Hiroshi WATANABE
    Article type: PAPER
    2022 Volume E105.D Issue 10 Pages 1691-1703
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Geometric partitioning mode (GPM) is a new inter prediction tool adopted in versatile video coding (VVC), the latest international video coding standard, developed by the Joint Video Experts Team in 2020. Unlike regular inter prediction, which is performed on rectangular blocks, GPM separates a coding block into two regions using one of 64 pre-defined straight lines, generates inter predicted samples for each separated region, and then blends them to obtain the final inter predicted samples. With this feature, GPM improves the prediction accuracy at the boundary between a foreground and a background with different motions. However, GPM has room to further improve the prediction accuracy if the final predicted samples can be generated using not only inter prediction but also intra prediction. In this paper, we propose a GPM with inter and intra prediction to achieve further enhanced compression capability beyond VVC. To maximize the coding performance of the proposed method, we also propose restricting the number of applicable intra prediction modes and prohibiting the application of intra prediction to both GPM-separated regions. The experimental results show that the proposed method improves the coding performance gain of the conventional GPM method of VVC by a factor of 1.3, and provides an additional coding performance gain of 1% bitrate savings in one of the coding structures for low-latency video transmission where the conventional GPM method cannot be utilized.

  • Seung-Tak NOH, Hiroki HARADA, Xi YANG, Tsukasa FUKUSATO, Takeo IGARASH ...
    Article type: PAPER
    2022 Volume E105.D Issue 10 Pages 1704-1711
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    It is important to consider curvature properties around the control points to produce natural-looking results in vector illustration. C2 interpolating splines satisfy point interpolation with local support. Unfortunately, they cannot control the sharpness of a segment because they use a trigonometric blending function that has no degrees of freedom. In this paper, we alter the definition of C2 interpolating splines in both the interpolation curve and the blending function. For the interpolation curve, we adopt a rational Bézier curve that enables the user to tune the shape of the curve around the control point. For the blending function, we generalize the weighting scheme of C2 interpolating splines and replace the trigonometric weight with our novel hyperbolic blending function. By extending this basic definition, we can also handle exact non-C2 features, such as cusps and fillets, without losing generality. In our experiment, we provide both quantitative and qualitative comparisons with existing parametric curve models and discuss the differences among them.
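
    To illustrate how a rational Bézier curve lets a weight tune the shape near a control point, here is a minimal sketch of the textbook rational quadratic Bézier evaluation. The abstract does not state the degree or parametrization the authors use, so the quadratic form, the fixed end weights of 1, and the names `rational_quadratic_bezier` and `w1` are assumptions made for illustration only.

```python
def rational_quadratic_bezier(p0, p1, p2, w1, t):
    """Evaluate a rational quadratic Bezier curve at parameter t in [0, 1].

    p0, p1, p2 are control points (tuples of equal length).
    The end weights are fixed to 1 (an assumption of this sketch);
    w1 > 0 is the weight of the middle control point.
    """
    # Weighted Bernstein basis values.
    b0 = (1.0 - t) ** 2
    b1 = 2.0 * t * (1.0 - t) * w1
    b2 = t ** 2
    denom = b0 + b1 + b2
    # Rational combination of the control points, coordinate by coordinate.
    return tuple((b0 * a + b1 * b + b2 * c) / denom
                 for a, b, c in zip(p0, p1, p2))


# A larger middle weight pulls the curve toward p1 (a sharper-looking segment).
p0, p1, p2 = (0.0, 0.0), (1.0, 2.0), (2.0, 0.0)
mid_plain = rational_quadratic_bezier(p0, p1, p2, 1.0, 0.5)
mid_sharp = rational_quadratic_bezier(p0, p1, p2, 3.0, 0.5)
```

    With `w1 = 1` the curve reduces to the ordinary quadratic Bézier curve; increasing `w1` moves the midpoint closer to `p1` while the endpoints stay interpolated.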

  • Wenhao HUANG, Akira TSUGE, Yin CHEN, Tadashi OKOSHI, Jin NAKAZAWA
    Article type: PAPER
    2022 Volume E105.D Issue 10 Pages 1712-1720
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    The crowdedness of buses is playing an increasingly important role in the disease control of COVID-19. The lack of a practical approach to sensing the crowdedness of buses is a major problem. This paper proposes a bus crowdedness sensing system that exploits deep learning-based object detection to count the numbers of passengers getting on and off a bus and thus estimate the crowdedness of buses in real time. In our prototype system, we combine the YOLOv5s object detection model with a Kalman filter object tracking algorithm to implement a sensing algorithm running on a Jetson Nano-based vehicular device mounted on a bus. Using driving recorder video data taken from a real bus, we experimentally evaluate the performance of the proposed sensing system and verify that it improves counting accuracy and achieves real-time processing on the Jetson Nano platform.
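
    One common way to turn tracked detections into boarding and alighting counts is to test whether each track crosses a virtual line near the door. The abstract does not say how the authors derive counts from tracks, so the line-crossing rule below, the function name `count_crossings`, and the convention that increasing `y` means moving into the bus are all hypothetical.

```python
def count_crossings(tracks, line_y):
    """Count passengers getting on and off from tracked centroid positions.

    tracks: dict mapping a track id to the list of centroid y-coordinates
            of that tracked person over time (output of some tracker).
    line_y: y-coordinate of a virtual door line (hypothetical setup).
    Returns (boarded, alighted).
    """
    boarded = 0
    alighted = 0
    for ys in tracks.values():
        if len(ys) < 2:
            continue  # a single observation cannot cross the line
        start, end = ys[0], ys[-1]
        if start < line_y <= end:
            boarded += 1    # moved from outside to inside
        elif end < line_y <= start:
            alighted += 1   # moved from inside to outside
    return boarded, alighted


tracks = {1: [10, 40, 80], 2: [90, 60, 20], 3: [10, 20, 30]}
boarded, alighted = count_crossings(tracks, line_y=50)
occupancy_change = boarded - alighted
```

    Accumulating `boarded - alighted` over stops would give a running occupancy estimate under these assumptions.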

  • Kotaro MATSUURA, Chihiro TSUTAKE, Keita TAKAHASHI, Toshiaki FUJII
    Article type: LETTER
    2022 Volume E105.D Issue 10 Pages 1721-1725
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Inspired by the framework of algorithm unrolling, we propose a scalable network architecture that computes layer patterns for light field displays, enabling control of the trade-off between the display quality and the computational cost on a single pre-trained network.

Regular Section
  • Ana GUASQUE, Patricia BALBASTRE
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2022 Volume E105.D Issue 10 Pages 1726-1733
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    In order to obtain a feasible schedule of a hard real-time system, heuristic-based techniques are the solution of choice. In the last few years, optimization solvers have gained attention from research communities due to their capability of handling a large number of constraints. Recently, some works have used integer linear programming (ILP) for solving monoprocessor scheduling of real-time systems. In fact, ILP is commonly used for static scheduling of multiprocessor systems. However, two main solvers are used interchangeably to solve the problem. Which one is best for obtaining a feasible schedule for hard real-time systems? This paper compares two well-known optimization software packages (CPLEX and GUROBI) on the problem of finding a feasible schedule on monoprocessor hard real-time systems.

  • Kenya TAJIMA, Takahiko HENMI, Tsuyoshi KATO
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2022 Volume E105.D Issue 10 Pages 1734-1742
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Domain knowledge is useful to improve the generalization performance of learning machines. Sign constraints are a handy representation to combine domain knowledge with learning machines. In this paper, we consider constraining the signs of the weight coefficients in learning the linear support vector machine, and develop an optimization algorithm for minimizing the empirical risk under the sign constraints. The algorithm is based on the Frank-Wolfe method, which converges sublinearly and possesses a clear termination criterion. We show that each iteration of the Frank-Wolfe method requires O(nd + d^2) computational cost. Furthermore, we derive the explicit expression for the minimal iteration number to ensure an ε-accurate solution by analyzing the curvature of the objective function. Finally, we empirically demonstrate that the sign constraints are a promising technique when similarities to the training examples compose the feature vector.
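
    For orientation, the generic Frank-Wolfe iteration and its standard sublinear bound can be written as follows. This is the textbook form for minimizing a convex, differentiable f over a compact convex set D with curvature constant C_f; the abstract does not state which formulation the authors apply the method to or their exact constants, so the symbols below are assumptions and not the paper's own derivation.

```latex
% Generic Frank-Wolfe iteration for min_{x in D} f(x), D compact and convex.
\begin{align*}
s_t &\in \operatorname*{arg\,min}_{s \in \mathcal{D}} \langle s, \nabla f(x_t) \rangle, \\
x_{t+1} &= (1 - \gamma_t)\, x_t + \gamma_t\, s_t, \qquad \gamma_t = \frac{2}{t+2}, \\
g(x_t) &= \max_{s \in \mathcal{D}} \langle x_t - s, \nabla f(x_t) \rangle \;\ge\; f(x_t) - f(x^\star), \\
f(x_t) - f(x^\star) &\le \frac{2 C_f}{t+2}
\quad\Longrightarrow\quad
t \ge \frac{2 C_f}{\varepsilon} - 2 \ \text{suffices for an } \varepsilon\text{-accurate solution.}
\end{align*}
```

    The duality gap g(x_t) is computable at every iteration and upper-bounds the suboptimality, which is what makes a clear termination criterion possible in the generic setting.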

  • Jinyan LU, Quanzhen HUANG, Shoubing LIU
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2022 Volume E105.D Issue 10 Pages 1743-1750
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    For intelligent vision measurement, geometric image feature extraction is an essential issue. A contour primitive of interest (CPI) is a regular-shaped contour feature lying on a target object, and is widely used for geometric calculation in vision measurement and servoing. So that the CPI extraction model can be flexibly applied to different novel objects, one-shot learning based CPI extraction can be implemented with a deep convolutional neural network, using only one annotated support image to guide the CPI extraction process. In this paper, we propose a multi-stage contour primitives of interest extraction network (MS-CPieNet). First, it uses a multi-stage strategy to improve the ability to discriminate between CPI and complex background. Second, a spatial non-local attention module is utilized to enhance the deep features by globally fusing the image features with both short and long ranges. Moreover, dense 4-direction classification is designed to obtain the normal direction of the contour, and the directions can be further used for the contour thinning post-process. The effectiveness of the proposed methods is validated by experiments with the OCP and ROCM datasets. 2-D measurement experiments are conducted to demonstrate the convenient application of the proposed MS-CPieNet.

  • Quan XIU HO, Takao JINNO, Yusuke UCHIMI, Shigeru KURIYAMA
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2022 Volume E105.D Issue 10 Pages 1751-1758
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    The colors of objects in natural images are affected by the color of lighting, and accurately estimating an illuminant's color is indispensable in analyzing scenes lit by colored lighting. Recent lighting environments have become more colorful due to the spread of light-emitting diode (LED) lighting whose colors are flexibly controlled across the full visible spectrum. However, existing color estimation methods mainly focus on a single illuminant within normal color ranges. The estimation of multiple illuminants with unusual color settings, such as high-chroma blue or red, has not been studied yet. Therefore, new color estimation methods should be developed for multiple illuminants of various colors. In this article, we propose a color estimation method for LED lighting using Color Line features, which regard the color distribution in a local area as a straight line. This local estimate is suitable for estimating various colors of multiple illuminants. The features are sampled at many small regions in an image and aggregated to estimate a few global colors using supervised learning with a convolutional neural network. We demonstrate the higher accuracy of our method over existing ones for such colorful lighting environments by producing an image dataset lit by multiple LED lights in a full-color range.

  • Yuichiro NOMURA, Takio KURITA
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2022 Volume E105.D Issue 10 Pages 1759-1768
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of data for training. Since it is very expensive to ask experts to label the data, many non-expert data collection methods such as web crawling have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Since DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies showed that DNNs are robust to noisy labels in the early stage of learning, before overfitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output true labels for samples with noisy labels in the early stage of learning, and the number of false predictions for samples with noisy labels is higher than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL using the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. Then our method iteratively performs sample selection and trains a DNN model using the updated dataset. Since the model is trained with more clean samples and records more accurate false predictions for sample selection, the generalization performance of the model gradually increases. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and obtained results that are better than or comparable to state-of-the-art approaches.
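
    The selection step described above can be sketched as counting, over a recent window of recorded predictions, how often the model disagreed with each sample's given label, and keeping the samples with few disagreements. The function name `select_clean_samples` and the parameters `window` and `max_false` are hypothetical; the abstract does not give the recording period, window length, or threshold, and the training loop itself is omitted.

```python
def select_clean_samples(prediction_history, given_labels, window, max_false):
    """Select indices of samples that are likely to have clean labels.

    prediction_history: list of records, one per recording step; each record
                        is a list with the predicted class of every sample.
    given_labels:       the (possibly noisy) label of every sample.
    window:             number of most recent records to use (assumed).
    max_false:          keep a sample if its false-prediction count within
                        the window is at most this value (assumed).
    """
    recent = prediction_history[-window:]
    selected = []
    for i, label in enumerate(given_labels):
        # A "false prediction" is a prediction that disagrees with the given label.
        false_count = sum(1 for preds in recent if preds[i] != label)
        if false_count <= max_false:
            selected.append(i)
    return selected


history = [[0, 1, 2], [0, 2, 2], [0, 2, 1], [0, 2, 2]]
labels = [0, 1, 2]  # sample 1 is consistently predicted as class 2
clean_indices = select_clean_samples(history, labels, window=3, max_false=1)
```

    In an iterative scheme, the model would then be trained on the selected subset and new records collected before the next selection round.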

  • Nenghuan ZHANG, Yongbin WANG, Xiaoguang WANG, Peng YU
    Article type: PAPER
    Subject area: Multimedia Pattern Processing
    2022 Volume E105.D Issue 10 Pages 1769-1779
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Recently, multi-modal fusion methods based on remote sensing data and social sensing data have been widely used in the field of urban region function recognition. However, due to the high complexity of the noise problem, most of the existing methods are not robust enough when applied in real-world scenes, which seriously affects their application value in urban planning and management. In addition, how to extract valuable periodic features from social sensing data still needs further study. To this end, we propose a multi-modal fusion network guided by feature co-occurrence for urban region function recognition, which leverages the co-occurrence relationship between multi-modal features to identify abnormal noise features, so as to guide the fusion network to suppress noise features and focus on clean features. Furthermore, we employ a graph convolutional network that incorporates a node weighting layer and an interactive update layer to effectively extract valuable periodic features from social sensing data. Lastly, experimental results on publicly available datasets indicate that our proposed method yields promising improvements in both accuracy and robustness over several state-of-the-art methods.

  • Manaya TOMIOKA, Tsuneo KATO, Akihiro TAMURA
    Article type: PAPER
    Subject area: Natural Language Processing
    2022 Volume E105.D Issue 10 Pages 1780-1789
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    A neural conversational model (NCM) based on an encoder-decoder recurrent neural network (RNN) with an attention mechanism learns different sequence-to-sequence mappings from what neural machine translation (NMT) learns even when based on the same technique. In the NCM, we confirmed that target-word-to-source-word mappings captured by the attention mechanism are not as clear and stationary as those for NMT. Considering that vector norms indicate a magnitude of information in the processing, we analyzed the inner workings of an encoder-decoder GRU-based NCM focusing on the norms of word embedding vectors and hidden vectors. First, we conducted correlation analyses on the norms of word embedding vectors with frequencies in the training set and with conditional entropies of a bi-gram language model to understand what is correlated with the norms in the encoder and decoder. Second, we conducted correlation analyses on norms of change in the hidden vector of the recurrent layer with their input vectors for the encoder and decoder, respectively. These analyses were done to understand how the magnitude of information propagates through the network. The analytical results suggested that the norms of the word embedding vectors are associated with their semantic information in the encoder, while those are associated with the predictability as a language model in the decoder. The analytical results further revealed how the norms propagate through the recurrent layer in the encoder and decoder.

  • Yang LI, Rui QI
    Article type: PAPER
    Subject area: Natural Language Processing
    2022 Volume E105.D Issue 10 Pages 1790-1798
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Stance prediction on social media aims to infer the stances of users towards a specific topic or event, which are not expressed explicitly. It is of great significance for public opinion analysis to extract and determine users' stances using user-generated content on social media. Existing research makes use of various signals, ranging from text content to online network connections of users on these platforms. However, joint modeling of this heterogeneous information for stance prediction is lacking. In this paper, we propose a self-supervised heterogeneous graph contrastive learning framework for stance prediction in online debate forums. First, we perform data augmentation on the original heterogeneous information network to generate an augmented view. The original view and the augmented view are each learned with a meta-path based graph encoder. Then, contrastive learning between the two views is conducted to obtain high-quality representations of users and issues. Finally, the stance prediction is accomplished by matrix factorization between users and issues. The experimental results on an online debate forum dataset show that our model outperforms other competitive baseline methods significantly.

  • Taekeun PARK, Keewon KIM
    Article type: LETTER
    Subject area: Information Network
    2022 Volume E105.D Issue 10 Pages 1799-1802
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    In this paper, we propose a scheme to strengthen network-based moving target defense with disposable identifiers. The main idea is to change disposable identifiers for each packet to maximize unpredictability with large hopping space and substantially high hopping frequency. It allows network-based moving target defense to defeat active scanning, passive scanning, and passive host profiling attacks. Experimental results show that the proposed scheme changes disposable identifiers for each packet while requiring low overhead.

  • Yang WANG, Hongliang FU, Huawei TAO, Jing YANG, Hongyi GE, Yue XIE
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2022 Volume E105.D Issue 10 Pages 1803-1806
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    This letter focuses on the cross-corpus speech emotion recognition (SER) task, in which the training and testing speech signals belong to different speech corpora. Existing algorithms are incapable of effectively extracting common sentiment information between different corpora to facilitate knowledge transfer. To address this challenging problem, a novel convolutional auto-encoder and adversarial domain adaptation (CAEADA) framework for cross-corpus SER is proposed. The framework first constructs a one-dimensional convolutional auto-encoder (1D-CAE) for feature processing, which can explore the correlation among adjacent one-dimensional statistical features; the feature representation is enhanced by the encoder-decoder-style architecture. Subsequently, the adversarial domain adaptation (ADA) module alleviates the feature distribution discrepancy between the source and target domains by confusing the domain discriminator, and specifically employs maximum mean discrepancy (MMD) to better accomplish feature transformation. To evaluate the proposed CAEADA, extensive experiments were conducted on the EmoDB, eNTERFACE, and CASIA speech corpora, and the results show that the proposed method outperformed other approaches.
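
    As a reminder of what maximum mean discrepancy measures, the sketch below computes the standard biased estimate of squared MMD between two small feature sets using a Gaussian (RBF) kernel. The kernel choice, the bandwidth `sigma`, and the function names are assumptions; the abstract does not specify how MMD is configured inside CAEADA.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian kernel between two feature vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, sigma) for u in a for v in b)
        return total / (len(a) * len(b))

    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


source = [(0.0, 0.0), (1.0, 0.0)]
target_same = [(0.0, 0.0), (1.0, 0.0)]
target_shifted = [(5.0, 5.0), (6.0, 5.0)]
gap_same = mmd_squared(source, target_same)
gap_shifted = mmd_squared(source, target_shifted)
```

    The estimate is zero when the two sets coincide and grows as the source and target feature distributions move apart, which is why it can serve as a domain-alignment penalty.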

  • Joanna Kazzandra DUMAGPI, Yong-Jin JEONG
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2022 Volume E105.D Issue 10 Pages 1807-1811
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Fine-grained image analysis, such as pixel-level approaches, improves threat detection in x-ray security images. In the practical setting, the cost of obtaining complete pixel-level annotations increases significantly, which can be reduced by partially labeling the dataset. However, handling partially labeled datasets can lead to training complicated multi-stage networks. In this paper, we propose a new end-to-end object separation framework that trains a single network on a partially labeled dataset while also alleviating the inherent class imbalance at the data and object proposal level. Empirical results demonstrate significant improvement over existing approaches.

  • Kaito SATTA, Hiroaki SASAKI
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2022 Volume E105.D Issue 10 Pages 1812-1816
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    The purpose of graph embedding is to learn a lower-dimensional embedding function for graph data. Existing methods usually rely on maximum likelihood estimation (MLE), and often learn an embedding function through conditional mean estimation (CME). However, MLE is well-known to be vulnerable to the contamination of outliers. Furthermore, CME might restrict the applicability of the graph embedding methods to a limited range of graph data. To cope with these problems, this paper proposes a novel method for graph embedding called the robust ratio graph embedding (RRGE). RRGE is based on the ratio estimation between the conditional and marginal probability distributions of link weights given data vectors, and would be applicable to a wider range of graph data than CME-based methods. Moreover, to achieve outlier-robust estimation, the ratio is estimated with the γ-cross entropy, which is a robust alternative to the standard cross entropy. Numerical experiments on artificial data show that RRGE is robust against outliers and performs well even when CME-based methods do not work at all. Finally, the performance of the proposed method is demonstrated on real-world datasets using neural networks.

  • Jiyeon LEE, Kilho LEE
    Article type: LETTER
    Subject area: Human-computer Interaction
    2022 Volume E105.D Issue 10 Pages 1817-1820
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Privacy violations via spy cameras are becoming increasingly serious. With the recent advent of various smart home IoT devices, such as smart TVs and robot vacuum cleaners, spycam attacks that steal users' information are being carried out in more unpredictable ways. In this paper, we introduce a new spycam attack on a mobile WebVR environment. It is performed by a web attacker who maliciously accesses the back-facing cameras of victims' mobile devices while they are browsing the attacker's WebVR site. This has the power to allow the attacker to capture victims' surroundings even at the desired field of view through sophisticated content placement in VR scenes, resulting in serious privacy breaches for mobile VR users. In this letter, we introduce a new threat facing mobile VR and show that it practically works with major browsers in a stealthy manner.

  • Zhi LIU, Jia CAO, Xiaohan GUAN, Mengmeng ZHANG
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2022 Volume E105.D Issue 10 Pages 1821-1824
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Inter-channel correlation is one of the redundancies that need to be eliminated in video coding. In the latest video coding standard, H.266/VVC, the DM (Direct Mode) and CCLM (Cross-component Linear Model) modes have been introduced to reduce the similarity between luminance and chroma. However, inter-channel correlation is still observed. In this paper, a new inter-channel prediction algorithm is proposed, which utilizes a coloring principle to predict chroma pixels. From the coloring perspective, for most natural content video frames, the three components Y, U and V always demonstrate similar coloring patterns. Therefore, the U and V components can be predicted using the coloring pattern of the Y component. In the proposed algorithm, correlation coefficients are obtained in a lightweight way to describe the coloring relationship between the current pixel and the reference pixel in the Y component, and are used to predict chroma pixels. The optimal position for the reference samples is also designed. Based on the selected position of the reference samples, two new chroma prediction modes are defined. Experimental results show that, compared with VTM 12.1, the proposed algorithm achieves average BD-rate changes of -0.92% and -0.96% for the U and V components, respectively, under the All Intra (AI) configuration. At the same time, the increase in encoding and decoding time is negligible.

  • Zhi LIU, Fangyuan ZHAO, Mengmeng ZHANG
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2022 Volume E105.D Issue 10 Pages 1825-1828
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    In the video-text retrieval task, the mainstream framework consists of three parts: a video encoder, a text encoder, and similarity calculation. MMT (Multi-modal Transformer) achieves remarkable performance for this task; however, it faces the problem of insufficient training data. In this paper, an efficient multimodal aggregation network for video-text retrieval is proposed. Unlike prior work that uses MMT to fuse video features, the proposed network introduces NetVLAD, which has fewer parameters and is feasible to train with small datasets. In addition, since the function of CLIP (Contrastive Language-Image Pre-training) can be considered as learning language models from visual supervision, it is introduced as the text encoder in the proposed network to avoid overfitting. Meanwhile, in order to make full use of the pre-trained model, a two-step training scheme is designed. Experiments show that the proposed model achieves competitive results compared with the latest work.

  • Koki TSUBOTA, Hiroaki AKUTSU, Kiyoharu AIZAWA
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2022 Volume E105.D Issue 10 Pages 1829-1833
    Published: 2022/10/01
    Released on J-STAGE: 2022/10/01
    JOURNAL FREE ACCESS

    Image quality assessment (IQA) is a fundamental metric for image processing tasks (e.g., compression). With full-reference IQAs, traditional IQAs, such as PSNR and SSIM, have been used. Recently, IQAs based on deep neural networks (deep IQAs), such as LPIPS and DISTS, have also been used. It is known that image scaling is inconsistent among deep IQAs, as some perform down-scaling as pre-processing, whereas others instead use the original image size. In this paper, we show that the image scale is an influential factor that affects deep IQA performance. We comprehensively evaluate four deep IQAs on the same five datasets, and the experimental results show that image scale significantly influences IQA performance. We found that the most appropriate image scale is often neither the default nor the original size, and the choice differs depending on the methods and datasets used. We visualized the stability and found that PieAPP is the most stable among the four deep IQAs.
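
    For context on the traditional full-reference metrics mentioned above, PSNR can be computed directly from the mean squared error between a reference and a distorted image. The sketch below uses flat lists of pixel values and an assumed peak value of 255 (8-bit images); it illustrates PSNR only and says nothing about how the deep IQAs in the letter are evaluated.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    reference, distorted: flat sequences of pixel values.
    max_value:            peak pixel value (255 assumed for 8-bit images).
    """
    n = len(reference)
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [100, 100, 100, 100]
distorted = [110, 90, 110, 90]
score = psnr(reference, distorted)
```

    Because PSNR is computed per pixel, resizing the images before evaluation changes the pixels being compared, which is one simple way to see why image scale can matter for a quality metric.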
