ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Current Issue
Showing 1-23 of 23 articles from the selected issue
Special Section on Technologies for Post-COVID 3D Media
Special Section on Fast-track Review
Regular Section
  • Yuta Mimura
    2025 Volume 13 Issue 1 p. 126-135
    Publication date: 2025
    Release date: 2025/01/01
    Journal: Free access

    Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges, especially in domains where data collection is costly or challenging. Current data-efficient methods largely focus on Generative Adversarial Network (GAN) architectures, leaving a gap in training other types of generative models. Our study introduces “phased data augmentation” as a novel technique that addresses this gap by optimizing training in limited data scenarios without altering the inherent data distribution. By limiting the augmentation intensity throughout the learning phases, our method enhances the model's ability to learn from limited data, thus maintaining fidelity. Applied to a model integrating PixelCNNs with Vector Quantized Variational AutoEncoder 2 (VQ-VAE-2), our approach demonstrates superior performance in both quantitative and qualitative evaluations across diverse datasets. This represents an important step forward in the efficient training of likelihood-based models, extending the usefulness of data augmentation techniques beyond just GANs.
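The core idea of the abstract, stepping down augmentation intensity as training progresses so that later phases see data closer to the true distribution, can be sketched as a simple schedule. This is a hypothetical illustration: the phase boundaries, the intensity values, and the toy jitter augmentation are assumptions, not the paper's actual settings.

```python
import random

def phase_intensity(epoch, total_epochs, phases=(1.0, 0.5, 0.0)):
    """Hypothetical phased schedule: augmentation intensity is stepped
    down across equal-length training phases, so the final phase trains
    on (nearly) unaugmented data and preserves fidelity."""
    phase_len = total_epochs / len(phases)
    idx = min(int(epoch // phase_len), len(phases) - 1)
    return phases[idx]

def augment(x, intensity, rng=random.Random(0)):
    """Toy augmentation: additive jitter scaled by the current phase's
    intensity (a stand-in for crops, flips, color jitter, etc.)."""
    return [v + intensity * rng.uniform(-0.1, 0.1) for v in x]

# Early epochs: strong augmentation; final phase: none.
print(phase_intensity(0, 90), phase_intensity(45, 90), phase_intensity(89, 90))
```

A likelihood-based model such as the PixelCNN/VQ-VAE-2 combination would simply call `augment(batch, phase_intensity(epoch, total_epochs))` inside its training loop.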

  • Kazutaka Hayakawa, Haruki Nishio, Yoshiaki Nakagawa, Tomokazu Sato
    2025 Volume 13 Issue 1 p. 136-146
    Publication date: 2025
    Release date: 2025/01/01
    Journal: Free access

    In this article, we propose a novel method that automatically estimates the camera posture parameters of an in-vehicle camera unit. Several methods have already been proposed that estimate these parameters, relying mainly on known texture patterns on the ground (e.g., road signs and lane markers). Unlike conventional methods, ours achieves camera calibration without given texture patterns by using the camera trajectory estimated by Structure from Motion (SfM) as a clue. As another contribution, we evaluate the effectiveness of multiple techniques that are empirically known to improve the robustness and accuracy of SfM but have not been well discussed in the literature. In an experiment, we show that the pose parameters can be estimated automatically in real driving environments, and the results of the proposed and compared methods are quantitatively evaluated.
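To illustrate how a camera trajectory can serve as a calibration clue, the toy sketch below recovers a yaw offset from SfM camera positions. It is a strong simplification of the abstract's idea, not the paper's method: it assumes straight driving, a 2-D ground plane, and that the camera's optical axis is the +x axis of the SfM frame.

```python
import math

def yaw_offset_from_trajectory(positions):
    """Hypothetical simplification: during straight driving the vehicle's
    forward axis is the mean direction of successive SfM camera positions;
    the camera's yaw relative to the vehicle is the angle between that
    direction and the (assumed +x) optical axis."""
    dxs = [b[0] - a[0] for a, b in zip(positions, positions[1:])]
    dys = [b[1] - a[1] for a, b in zip(positions, positions[1:])]
    fx, fy = sum(dxs) / len(dxs), sum(dys) / len(dys)
    return math.degrees(math.atan2(fy, fx))

# SfM camera positions while driving straight at 45 degrees to +x.
traj = [(t * 1.0, t * 1.0) for t in range(10)]
print(round(yaw_offset_from_trajectory(traj), 1))  # 45.0
```

A full solution would estimate all rotation (and translation) components robustly over curved trajectories, which is where the SfM-robustness techniques the paper evaluates come in.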

  • Atsuro Ichigaya, Shunsuke Iwamura, Shimpei Nemoto, Yuichi Kondo, Kazuh ...
    2025 Volume 13 Issue 1 p. 147-154
    Publication date: 2025
    Release date: 2025/01/01
    Journal: Free access

    The new versatile video coding (VVC) standard is designed as a multi-purpose coding method that enhances both compression performance and functionality. Multilayer coding is one of the significant features of this functionality. The mainstream single-layer profiles in traditional standards such as AVC/H.264 and HEVC/H.265 were not designed to support multilayer coding; that functionality was implemented in extended multilayer profiles. In contrast, VVC treats multilayer coding as a basic function, implementing both the Main 10 and Multilayer Main 10 profiles as primary profiles on a common architecture. The Multilayer Main 10 profile efficiently encodes multiple videos with varied resolutions, frame rates, and qualities. This study proposes a novel multilayer functionality using the Multilayer Main 10 profile, termed content layering, and introduces a pioneering sign-language video service system for broadcasting, envisaged as a next-generation broadcasting service.
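Conceptually, "content layering" lets a receiver decode only the layers it needs, e.g. a base program layer plus an optional sign-language layer. The sketch below is purely illustrative: the layer ids and role labels are assumptions for this example and are not part of the VVC specification.

```python
def select_layers(available, want_sign_language):
    """Hypothetical receiver-side layer selection for content layering:
    always take the base-program layer, and add the sign-language
    enhancement layer only when the viewer has requested it."""
    chosen = [layer for layer in available
              if layer["role"] == "base"
              or (want_sign_language and layer["role"] == "sign")]
    return [layer["id"] for layer in chosen]

layers = [{"id": 0, "role": "base"}, {"id": 1, "role": "sign"}]
print(select_layers(layers, want_sign_language=True))   # [0, 1]
print(select_layers(layers, want_sign_language=False))  # [0]
```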

  • Keiji Uemura, Kiyoshi Kiyokawa, Nobuchika Sakata
    2025 Volume 13 Issue 1 p. 155-165
    Publication date: 2025
    Release date: 2025/01/01
    Journal: Free access

    The proliferation of surveillance cameras and advancements in image analysis technology have facilitated the extraction of extensive information from images. However, existing techniques extract features from the person's region in an image and provide information based only on the person's appearance. This paper proposes a method for estimating the areas of interest of a person captured in an image by reconstructing the scene from that person's first-person perspective. In this study, we aim to reconstruct a subject-viewpoint image by estimating the gaze direction of the person captured in an image. To achieve this, we propose a method that utilizes keypoints from 3D posture estimation for gaze direction estimation. Compared with a deep-neural-network-based approach that directly estimates the gaze direction from images, the proposed method exhibits comparable accuracy and processing speed. In addition, our subject experiment reveals the characteristics of our method and its remaining challenges.
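One simple way to turn 3D posture keypoints into a gaze direction, offered here only as an illustrative sketch (the keypoint choice and the geometry are assumptions, not the paper's algorithm), is to take the head orientation as the vector from the midpoint between the ears toward the nose:

```python
import math

def gaze_from_keypoints(nose, l_ear, r_ear):
    """Hypothetical sketch: approximate the gaze (head) direction as the
    unit vector from the midpoint between the two ear keypoints toward
    the nose keypoint, using 3D coordinates from a pose estimator."""
    mid = [(l + r) / 2 for l, r in zip(l_ear, r_ear)]
    v = [n - m for n, m in zip(nose, mid)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

g = gaze_from_keypoints(nose=(0.0, 0.0, 1.0),
                        l_ear=(-0.1, 0.0, 0.0),
                        r_ear=(0.1, 0.0, 0.0))
print(g)  # [0.0, 0.0, 1.0]
```

Because it needs only a handful of keypoints rather than a full image crop, this kind of geometric estimate can run at speeds comparable to, or faster than, a dedicated gaze-estimation network, which matches the trade-off the abstract describes.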

  • Yuki Kakui, Kota Araki, Changyo Han, Shogo Fukushima, Takeshi Naemura
    2025 Volume 13 Issue 1 p. 166-178
    Publication date: 2025
    Release date: 2025/01/01
    Journal: Free access
    Electronic supplementary material

    Screen-camera communication enables instantaneous retrieval of on-screen information. Imperceptible screen-camera communication, embedding data in videos unnoticeably, holds great promise as it does not interfere with the viewing experience. The imperceptible color vibration method achieves this by alternately displaying two colors with the same luminance for each pixel. However, decoding performance may deteriorate due to interframe differences in the original video content. To address this, we propose a novel decoding approach using a dual-camera smartphone to capture two images with different modulation values simultaneously. This method allows for computing color differences between temporally close images, reducing artifacts from temporal changes in the original content. Our experimental results demonstrate an improved decoding rate and decreased recognition time compared to the previous method.
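The decoding idea can be sketched in a few lines: because the two cameras capture the two alternating modulation states at (nearly) the same instant, a per-pixel color difference isolates the embedded signal from temporal changes in the content. The toy below reads one bit per region from the sign of that difference; the 1-D chroma values and the zero threshold are assumptions for illustration, not the paper's decoder.

```python
def decode_bits(frame_a, frame_b, threshold=0.0):
    """Hypothetical dual-camera decoder sketch: each embedded bit is
    read from the sign of the chroma difference between the two
    simultaneously captured frames, one value per pixel region."""
    return [1 if (a - b) > threshold else 0
            for a, b in zip(frame_a, frame_b)]

# Per-region chroma values from the two simultaneous captures.
cam_a = [0.52, 0.48, 0.51, 0.47]
cam_b = [0.48, 0.52, 0.47, 0.51]
print(decode_bits(cam_a, cam_b))  # [1, 0, 1, 0]
```

With a single camera, the same differencing would have to be done across consecutive video frames, which is exactly where interframe changes in the original content corrupt the signal.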

  • Akio Kobayashi, Junji Onishi
    2025 Volume 13 Issue 1 p. 179-186
    Publication date: 2025
    Release date: 2025/01/01
    Journal: Free access

    This study undertakes an end-to-end braille translation approach for Japanese speech to address the needs of deafblind individuals. The Japanese language uses many character types, such as kanji and kana, so manual translation requires considerable effort. In Japan, automated braille translation is therefore anticipated to enhance information accessibility for deafblind individuals. Conventionally, braille translation relies on separate automatic speech recognition (ASR) and braille software, creating a two-step process. This two-step process is inefficient because the speech-to-braille translation is performed via kana-kanji characters. On the other hand, Japanese Braille exhibits strong compatibility with ASR owing to its predominant use of kana characters, which mirror Japanese phonetic features. Therefore, one-step speech-to-braille translation is expected to perform better than the conventional two-step method. In this study, we propose an end-to-end (E2E) approach using neural networks to translate Japanese Braille directly from speech and compare it with the conventional two-step method.

  • Jikai Li, Shogo Muramatsu
    2025 Volume 13 Issue 1 p. 187-199
    Publication date: 2025
    Release date: 2025/01/01
    Journal: Free access

    This study develops a self-supervised image denoising technique that combines a structured deep image prior (DIP) approach with Stein's unbiased risk estimator and linear expansion of thresholding (SURE-LET). It leverages the interscale and interchannel dependencies of images to develop a multichannel denoising approach. The original DIP, introduced by Ulyanov et al. in 2018, uses a random image as the input for restoration and offers the advantage of requiring no training data. However, the interpretability of the network's role is limited, and customizing its architecture to incorporate domain knowledge is challenging. This work integrates SURE-LET with Monte Carlo computation into the DIP framework, providing a rationale for the random-image input and shifting the focus from generator design to restorer design, thus enabling the network structure of DIP to reflect domain knowledge more easily. The significance of the developed method is confirmed through denoising simulations on the Kodak image dataset.
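The Monte Carlo computation mentioned in the abstract refers to estimating the divergence term of Stein's unbiased risk estimator with a random probe, which is what makes SURE usable with a black-box denoiser. The sketch below shows that standard SURE-MC recipe on a 1-D signal with known Gaussian noise level; it illustrates the estimator itself, not the paper's full SURE-LET/DIP pipeline.

```python
import random

def sure_mc(y, denoiser, sigma, eps=1e-3, rng=random.Random(0)):
    """Monte-Carlo SURE sketch (assumptions: 1-D signal, known Gaussian
    noise std sigma). The divergence of the denoiser is approximated by
    a finite difference along a random +/-1 probe vector."""
    n = len(y)
    fy = denoiser(y)
    b = [rng.choice((-1.0, 1.0)) for _ in range(n)]            # probe
    f_pert = denoiser([yi + eps * bi for yi, bi in zip(y, b)])
    div = sum(bi * (fp - f) for bi, fp, f in zip(b, f_pert, fy)) / eps
    mse_term = sum((f - yi) ** 2 for f, yi in zip(fy, y)) / n
    return mse_term - sigma ** 2 + 2 * sigma ** 2 * div / n

# Sanity check with the identity "denoiser": divergence is n, so SURE
# reduces to -sigma^2 + 2*sigma^2 = sigma^2.
val = sure_mc([0.3, -0.1, 0.7], denoiser=lambda x: list(x), sigma=0.1)
print(round(val, 6))  # 0.01
```

In a DIP-style setting, this risk estimate (rather than a ground-truth loss) is what gets minimized over the restorer's parameters, which is why no clean training data is needed.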
