Journal of Signal Processing
Online ISSN : 1880-1013
Print ISSN : 1342-6230
ISSN-L : 1342-6230
Volume 28, Issue 6
Special Issue on Nonlinear Circuits, Communications and Signal Processing (Editor-in-Chief: Takashi Yahagi)
  • Hidenori Matsuzaki
    2024 Volume 28 Issue 6 Pages 257-265
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    In this paper, we present an online method for estimating the spectral flatness of a stochastic process, in which a flatness measure is computed as a function of reflection coefficients obtained by linear prediction. Its implementation is straightforward, as the task of linear prediction is performed using a well-established algorithm known as the gradient adaptive lattice predictor. Several simulation results show that the algorithm can discriminate the magnitude of flatness, particularly the deviation from ideal flatness, in real time. This capability appears suitable for detecting anomalies in a nearly white stochastic process, a typical example being the innovation process in Kalman filtering.
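
    The idea in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not state which function of the reflection coefficients is used, so the sketch assumes the classical product prod(1 - k_m^2), which equals the order-M prediction error power divided by the signal power and approaches the spectral flatness as the order grows. The function name `gal_flatness`, the step size `mu`, the smoothing factor `beta`, and the test signals are all hypothetical choices.

```python
import random


def gal_flatness(samples, order=4, mu=0.01, beta=0.99, energy0=1.0):
    """Track prod(1 - k_m^2) with a gradient adaptive lattice (GAL) predictor.

    Assumed flatness measure; the paper's exact definition is not given
    in the abstract.
    """
    k = [0.0] * order            # reflection coefficients k_1..k_M
    b_delayed = [0.0] * order    # previous backward error at each stage input
    energy = [energy0] * order   # smoothed input power of each stage
    track = []
    for x in samples:
        f = b = x                # stage-0 forward and backward errors
        flatness = 1.0
        for m in range(order):
            b_old = b_delayed[m]
            b_delayed[m] = b
            energy[m] = beta * energy[m] + (1.0 - beta) * (f * f + b_old * b_old)
            f_new = f - k[m] * b_old          # forward prediction error
            b_new = b_old - k[m] * f          # backward prediction error
            # Normalised stochastic-gradient step on f_new^2 + b_new^2.
            k[m] += mu * (f_new * b_old + b_new * f) / (energy[m] + 1e-12)
            k[m] = max(-0.999, min(0.999, k[m]))  # keep |k| < 1
            flatness *= 1.0 - k[m] * k[m]
            f, b = f_new, b_new
        track.append(flatness)
    return track


def tail_mean(track, n=2000):
    """Average of the last n flatness estimates."""
    return sum(track[-n:]) / n


# Hypothetical test signals: white noise and an AR(1) process with pole 0.9.
gen = random.Random(0)
white = [gen.gauss(0.0, 1.0) for _ in range(10000)]
coloured, prev = [], 0.0
for w in white:
    prev = 0.9 * prev + w
    coloured.append(prev)

print("white   :", tail_mean(gal_flatness(white)))
print("coloured:", tail_mean(gal_flatness(coloured)))
```

    For white noise the estimate should stay close to 1, while for the AR(1) signal the ideal value is 1 - 0.9^2 = 0.19; the adaptive estimate fluctuates around these values because the coefficients are updated by a noisy gradient.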

  • Yoto Ikezaki, Yuna Harada, Yuting Geng, Masato Nakayama, Takanobu Nish ...
    2024 Volume 28 Issue 6 Pages 267-275
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    Highly realistic sound-field reproduction systems have been attracting attention owing to their ability to reproduce a sound field. These systems commonly use electrodynamic loudspeakers (EDLs) to construct sound images. The directivity of EDLs is broad, so the constructed sound images become diffused owing to the reverberation characteristics of the room. Therefore, sharp sound-image construction requires a large, complex system. As a more feasible approach, we focus on parametric array loudspeakers (PALs). The directivity of PALs is narrow, so the constructed sound image is sharp. Therefore, we propose a sound-field reproduction system that involves using a PAL to construct a sharp sound image and EDLs to reproduce the sensation of reverberation. To reproduce the sensation of reverberation, the early reflections of the target sound field are calculated using the mirror-image method, and virtual early reflections are produced. Thus, the sharpness of the sound image and the sensation of reverberation can be easily controlled to reproduce a highly realistic sound field. We conducted objective evaluation experiments to verify the effectiveness of the proposed system.
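
    The mirror-image calculation mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: it handles first-order reflections in a rigid rectangular room with walls at 0 and at the room dimension along each axis, and the function name `first_order_reflections`, the speed of sound, and the single wall reflection coefficient are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_reflections(room, source, listener, wall_reflection=0.8):
    """Return (delay_s, gain) for the six first-order image sources.

    room, source, listener are (x, y, z) tuples in metres; walls are
    assumed to lie at 0 and room[axis] on each axis.
    """
    reflections = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # mirror source in the wall
            distance = math.dist(image, listener)
            delay = distance / SPEED_OF_SOUND
            gain = wall_reflection / distance        # 1/r spreading, one bounce
            reflections.append((delay, gain))
    return sorted(reflections)


# Hypothetical geometry: 5 m x 4 m x 3 m room.
room = (5.0, 4.0, 3.0)
source = (1.0, 2.0, 1.5)
listener = (4.0, 2.0, 1.5)
for delay, gain in first_order_reflections(room, source, listener):
    print(f"delay {delay * 1000:.2f} ms, gain {gain:.3f}")
```

    Each delay and gain pair could then drive one virtual early reflection; higher-order reflections would require mirroring the image sources again, which this sketch omits.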

  • Hayaki Ito, Shuya Shida, Yutaka Suzuki
    2024 Volume 28 Issue 6 Pages 277-283
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    Proficiency assessments of violin playing are subjective, posing a challenge to beginners in evaluating their skill levels. To address this concern, we conceptualized the quantitative assessment of proficiency levels using acoustic analysis, thereby objectively evaluating performance. The first experiment used a sharpness analysis for staccato playing. Acoustic features were observed at the beginning of the playing. In the second experiment, roughness analysis was utilized 0.2 s after the beginning of playing. Differences are evident in the number of peaks in the roughness time series. A relationship between the results of acoustic analysis and sensory evaluation was observed. Therefore, the results suggest that acoustic features can serve as proficiency indicators.

  • Siwei Wang, Kaoru Arakawa
    2024 Volume 28 Issue 6 Pages 285-292
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    In this paper, we propose a novel approach to makeup style transfer including hairstyle on the basis of StarGAN-v2, which is a generative adversarial network for multidomain image translation. The proposed method allows users to generate personalized ideal makeup face images by incorporating interactive evolutionary computation (IEC) into the StarGAN-v2 model. Here, the style codes of face images in StarGAN-v2 are considered chromosomes of individuals in the genetic algorithm, being optimized in the process of IEC. Unlike traditional makeup transfer methods that are limited to a fixed style, our system enables users to actively participate in the style selection process, resulting in more diverse and satisfactory makeup face images, considering the human subjective criteria. The results of computer simulations and their subjective evaluation by users are shown to verify the high performance of the proposed method.

  • Masahiro Ogawa, Takeshi Kumaki
    2024 Volume 28 Issue 6 Pages 293-299
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    In recent years, food problems have arisen due to population changes. To address them, advanced technologies such as robots and artificial intelligence are increasingly being used to improve the efficiency of agriculture. In particular, plant factories are attracting attention because they have a high affinity for advanced technologies and allow production regardless of the cultivation location and climate. However, production in plant factories incurs higher management costs and yields lower profitability than traditional cultivation methods. It is thought that this problem can be solved by predicting plant growth and notifying the farm manager. In this research, we use data that can be measured at plant factories to create a machine learning model that predicts both the size and weight of an agricultural product from a single piece of data. As a result, we were able to predict multiple items using a relatively lightweight model. The overall error was small, with an average error rate of about 15%. Although the average error rate for weight was about 30%, we were able to create a model that behaves close to the actual measured values.

  • Yu Osuka, Kota Yoshida, Shunsuke Okura
    2024 Volume 28 Issue 6 Pages 301-307
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    For the forthcoming Internet of Things (IoT) era, it will be important to reduce both the data output from sensors and their energy consumption. Since conventional image sensor output data for photography are often redundant in AI applications, we propose a CMOS image sensor that can generate both RGB color images for humans and feature data for deep learning (DL). Use of feature data makes it possible to reduce the energy consumption of the image classification system and to save storage space for imaging data. We performed experiments to demonstrate that the simulated feature data are suitable for use in image classification tasks. A five-layer convolutional neural network (CNN) classifier was trained and tested using the aggressively quantized feature data generated from a person dataset, where image classification accuracy was also improved when applying contrast enhancement. According to the experimental results, an accuracy of 95.9% was achieved using 1-bit feature data, resulting in a 93.75% reduction in the amount of data compared to RGB color images.

  • Aulia Adila, Candy Olivia Mawalim, Takuto Isoyama, Masashi Unoki
    2024 Volume 28 Issue 6 Pages 309-313
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    A reliable speech watermarking technique must balance satisfying four requirements: inaudibility, robustness, blind detectability, and confidentiality. A previous study proposed a speech watermarking technique based on direct spread spectrum (DSS) using a linear prediction (LP) scheme, i.e., LP-DSS, that could simultaneously satisfy these four requirements. However, an inaudibility issue was found due to the incorporation of a blind detection scheme with frame synchronization. In this paper, we investigate the feasibility of utilizing a psychoacoustical model, which simulates auditory masking, to control the suitable embedding level of the watermark signal to resolve the inaudibility issue in the LP-DSS scheme. Evaluation results confirmed that controlling the embedding level with the psychoacoustical model, with a constant scaling factor setting, could balance the trade-off between inaudibility and detection ability with a payload up to 64 bps.

  • Ye Htet, Thi Thi Zin, Hiroki Tamura, Kazuhiro Kondo, Etsuo Chosa
    2024 Volume 28 Issue 6 Pages 315-319
    Published: November 01, 2024
    Released on J-STAGE: November 01, 2024
    JOURNAL FREE ACCESS

    This study addresses the efficient recognition of elderly people's daily actions, emphasizing transition states, using privacy-preserving depth data and deep learning algorithms. Stereo-depth cameras collect data from an elder care center, ensuring privacy by capturing only depth information without revealing identifiable details. The research investigates spatial and temporal features in movement patterns by employing a Convolutional Neural Network (CNN) for transfer learning on segmented person image sequences to extract spatial features, while a Recurrent Neural Network (RNN) decoder extracts temporal features. The proposed study evaluated various CNN and RNN integrated architectures, assessing algorithmic performance on real-world data from three elderly participants. Experimental outcomes reveal the best model achieving 95% overall accuracy for all actions and an average accuracy of over 80% for classifying transition states. Beyond accuracy, comprehensive evaluation includes precision, recall, and F1-score, offering a thorough assessment of the developed algorithm's practical effectiveness on real-world data.
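
    The evaluation metrics named in this abstract (accuracy, precision, recall, and F1-score) can be computed from a confusion matrix as in the sketch below. The matrix values are made up for illustration and are not results from the paper; the function names and the convention that rows are true classes and columns are predicted classes are assumptions.

```python
def per_class_metrics(confusion):
    """Return a list of (precision, recall, f1) per class.

    confusion[i][j] counts samples of true class i predicted as class j
    (assumed convention).
    """
    n = len(confusion)
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, wrong
        fn = sum(confusion[c]) - tp                       # true c, missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def accuracy(confusion):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Made-up two-class example (not data from the study).
example = [[8, 2],
           [1, 9]]
print("accuracy:", accuracy(example))
for label, (p, r, f1) in enumerate(per_class_metrics(example)):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

    Reporting per-class values alongside overall accuracy matters when some classes, such as short transition states, are much rarer than others, since accuracy alone can hide poor performance on them.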
