ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Current issue
Displaying 1-23 of 23 articles from this issue
Special Section on Technologies for Post-COVID 3D Media
  • Hideki Kakeya
    2025 Volume 13 Issue 1 Pages 1
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS
    Download PDF (25K)
  • Hiroto Omori, Hideki Kakeya
    2025 Volume 13 Issue 1 Pages 2-7
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS
    Supplementary material

    In this paper, we propose coarse integral imaging composed of a fine interleaved Fresnel lens array. By using elemental lenses with smaller widths, the parallax between adjacent elemental images is reduced, resulting in improved image continuity. Additionally, the small parallax leads to clearer images with minimal blurring even when the images are blended by the interleaved Fresnel lens array. To address the narrow viewing zone caused by the smaller elemental lenses, eye-tracking technology is used to follow the motion of the viewer, which enables observation of the image from a wide viewing angle. Single-pass multi-view rendering is introduced to generate a large number of small elemental images in real time.

    Download PDF (3141K)
  • Hiroki Takatsuka, Takumi Watanabe, Shiro Suyama, Munekazu Date, Hirots ...
    2025 Volume 13 Issue 1 Pages 8-13
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    A display that covers the entire field of view with an image is highly attractive for various applications. To realize an ultra-wide field of view, this paper proposes reconstructing an aerial image behind the viewer's eyes, whose parallax differs from the normal convergence-side parallax. Because an aerial display has no hardware around the aerial image, the viewer's eyes can be located between the display hardware and the aerial image formed behind them. We have developed a prototype aerial display that forms an image just behind the viewing eyes using aerial imaging by retro-reflection (AIRR). Even when the aerial image is reconstructed behind the viewing eyes, a left-right reversed image with diverged absolute parallax can be observed. Furthermore, the proposed method can cover the entire field of view with images.

    Download PDF (3194K)
  • Shinya Sakane, Hiroki Takatsuka, Shiro Suyama, Hirotsugu Yamamoto
    2025 Volume 13 Issue 1 Pages 14-22
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    Our goal is to realize a large-scale aerial display for traffic information. One of the primary challenges is to enhance the floating sensation of the aerial image. Depth perception diminishes with increasing distance due to inherent limitations of human vision, so ingenuity is needed to make the aerial image appear to float at long distances. We propose the use of an aerial guide, a frame that surrounds the aerial sign on the aerial image plane, to enhance the observer's perception of the floating sensation at extended distances. The purpose of this paper is to confirm the effectiveness of the aerial guide. We investigate how its presence alters the observer's floating sensation using our prototype. Additionally, we examine the impact of varying colors on the floating effect of the aerial images through the method of paired comparison.

    Download PDF (3777K)
  • Kyoya Hino, Kensuke Tamano, Shiro Suyama, Hirotsugu Yamamoto
    2025 Volume 13 Issue 1 Pages 23-30
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    We propose an aerial see-through Depth-Fused 3D (DFD) display that allows the depth position of the DFD image to be changed freely and enables the observer to see both the DFD image and the background through the image. We investigate this through psychophysical experiments on depth perception. Since the positions of both the floating edge image formed by the Arc 3D display and the aerial image formed by aerial imaging by retro-reflection (AIRR) can be changed, the aerial see-through DFD display allows the position of the DFD image to be expressed freely. In addition, since the Arc 3D substrate appears nearly colorless and transparent, the background can be observed through the DFD image.

    Download PDF (4899K)
  • Shoma Yada, Yasushi Onishi, Takafumi Koike
    2025 Volume 13 Issue 1 Pages 31-37
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    We evaluate the image quality of aerial images formed by a Retroreflective Mirror Array (RMA) and propose a model in which misalignment of the glass plates during manufacturing causes the degradation of aerial image quality. We evaluated the RMA's resolution characteristics using the Modulation Transfer Function and modeled the cause of distortion in aerial imaging.

    Download PDF (2620K)
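The MTF evaluation mentioned above reduces, at each spatial frequency, to a ratio of modulation contrasts. A minimal sketch under that textbook definition (the test pattern and contrast values are hypothetical, not taken from the paper):

```python
import numpy as np

def modulation(intensity):
    """Michelson contrast of a sinusoidal pattern: (Imax - Imin) / (Imax + Imin)."""
    i_max, i_min = intensity.max(), intensity.min()
    return (i_max - i_min) / (i_max + i_min)

def mtf(captured, reference):
    """MTF at one spatial frequency: output modulation over input modulation."""
    return modulation(captured) / modulation(reference)

# Toy example: an aerial image that halves the contrast of the source pattern.
x = np.linspace(0, 4 * np.pi, 200)
source = 0.5 + 0.5 * np.sin(x)        # full-contrast sinusoid
aerial = 0.5 + 0.25 * np.sin(x)       # degraded aerial image
print(round(mtf(aerial, source), 2))  # → 0.5
```

In practice the measurement would be repeated over a range of spatial frequencies to obtain the full MTF curve.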
  • Riku Shiobara, Garimagai Borjigin, Hideki Kakeya
    2025 Volume 13 Issue 1 Pages 38-43
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    In this paper, we propose an autostereoscopic display with a time-multiplexed directional backlight that uses an electroluminescent display as the light source to improve power efficiency and luminance uniformity. We made a prototype system composed of a 26.5-inch OLED display with a refresh rate of 120 Hz for the backlight, a 24-inch TN LCD panel with the same refresh rate for imaging, and an interleaved linear Fresnel lens array to generate collimated directional light. We evaluate the crosstalk level, the ratio of luminance to power consumption, and the luminance uniformity of the conventional system, which uses a pair of LCD panels, and of the proposed system, which combines an LCD panel and an OLED display. It is confirmed that our proposed method improves power efficiency and luminance uniformity.

    Download PDF (2990K)
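Two of the metrics compared in this abstract have standard forms; a minimal sketch, assuming the common black-corrected crosstalk definition (all numeric values hypothetical):

```python
def crosstalk(leak, signal, black=0.0):
    """3D crosstalk: luminance leaking from the other eye's image
    relative to the intended image, both corrected for black level."""
    return (leak - black) / (signal - black)

def luminance_per_watt(luminance_cd_m2, power_w):
    """Power-efficiency figure: displayed luminance per watt consumed."""
    return luminance_cd_m2 / power_w

print(crosstalk(12.0, 102.0, black=2.0))  # → 0.1
print(luminance_per_watt(150.0, 60.0))    # → 2.5
```

Luminance uniformity would additionally be assessed from luminance samples across the screen, e.g. as a min/max ratio.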
  • Yusuke Matsuda, Yukihiro Hirata
    2025 Volume 13 Issue 1 Pages 44-52
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS
    Supplementary material

    In this study, we developed a system that allows users to switch between first- and third-person perspectives in a virtual reality (VR) environment. We investigated how this perspective-switching system affects the sense of embodiment, sense of presence, VR motion sickness, and task efficiency, compared to using only a first- or third-person perspective. The task was an "object-carrying task," in which a designated object had to be placed in a designated box within a limited time while "walking" in the VR environment. This task had a higher degree of freedom and complexity than those in previous studies. The experiment revealed that switching to a third-person perspective improved the sense of embodiment. However, there was no significant advantage over the first-person perspective.

    Download PDF (1175K)
  • Takeru Nishiyama, Shiro Suyama, Hirotsugu Yamamoto
    2025 Volume 13 Issue 1 Pages 53-60
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    To realize large-scale aerial imaging by retro-reflection (AIRR) while reducing the effect of retro-reflector gaps, we propose a method that visually complements the gaps in the aerial image. AIRR composed of multiple retro-reflectors for a large image size suffers from gaps in the aerial image caused by the gaps between the reflectors. In this paper, we derive equations for the width of the gap between retro-reflectors and the width of the retro-reflectors that enable perception of the entire aerial image. The equations give the maximum retro-reflector width for which the gap can be complemented, which depends on the viewing distance and the distance from the eye to the retro-reflectors.

    Download PDF (3445K)
  • Tatsuya Shiratori, Kengo Fujii, Tomohiro Yendo
    2025 Volume 13 Issue 1 Pages 61-74
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    One of the problems with light field displays is that the resolution decreases with distance from the display surface, which limits the displayable range in the depth direction. To solve this problem, the display surface is multilayered: the distance between the displayed object and the nearest display surface becomes smaller, the resolution improves, and the displayable range in the depth direction is expanded. As a method to multilayer the display surface, this study proposes the use of two lens arrays with multiple types of lenses. In addition, this study proposes a method that allows a larger spacing between display surfaces using the two lens arrays. Simulations confirm that the display surfaces are multilayered and the resolution is improved.

    Download PDF (10584K)
  • Genki Takeuchi, Garimagai Borjigin, Hideki Kakeya
    2025 Volume 13 Issue 1 Pages 75-82
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS
    Supplementary material

    This paper proposes a novel method for realizing high-definition super-multiview three-dimensional displays using time-division multiplexing parallax barriers. Unlike conventional methods that present parallax in only one direction, the proposed method alternately displays a parallax barrier pattern with two differently tilted angles and corresponding images with parallax. This approach enables reproduction of parallax in all directions and focal effects for lines with arbitrary angles. Experimental results demonstrate that the proposed method induces focal accommodation of human eyes, as evidenced by studies involving human subjects. Overall, this study presents a promising approach for achieving high-quality three-dimensional displays with enhanced parallax and focal effects.

    Download PDF (4589K)
  • Masanobu Gido, Shota Nakagawa, Kensaku Mori, Hideki Kakeya
    2025 Volume 13 Issue 1 Pages 83-89
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    In this paper, we present a deep learning-based method for kidney and tumor region segmentation using 3D CT data from multiple sources. We conduct experiments by training with mixed datasets and fine-tuning transfer learning. Throughout these experiments, data augmentation is applied by blending arterial phase CT images and portal vein phase CT images. Our findings reveal a trend: higher accuracy in predicting kidney labels is achieved when fine-tuning transfer learning is applied, while higher accuracy in predicting tumor labels is attained when training with a mixed dataset. This suggests the effectiveness of fine-tuning when the variations in the datasets are relatively small, as seen in the case of kidneys. Conversely, training with mixed datasets proves effective when the variation in prediction targets is relatively large, such as with tumors. It is also confirmed that integrating the results by different training policies improves the overall segmentation results.

    Download PDF (2623K)
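The augmentation step described above blends arterial- and portal-phase CT images; a minimal sketch of such a convex blend, assuming co-registered volumes (function name, shapes, and the toy data are hypothetical):

```python
import numpy as np

def blend_phases(arterial, portal, alpha):
    """Hypothetical augmentation: convex blend of co-registered
    arterial- and portal-phase CT volumes, alpha in [0, 1]."""
    assert arterial.shape == portal.shape
    return alpha * arterial + (1.0 - alpha) * portal

# Toy 3D arrays standing in for CT volumes (depth x height x width).
rng = np.random.default_rng(0)
arterial = rng.normal(size=(4, 8, 8))
portal = rng.normal(size=(4, 8, 8))
augmented = blend_phases(arterial, portal, alpha=0.3)
print(augmented.shape)  # → (4, 8, 8)
```

Varying alpha per training sample yields intermediate-contrast volumes between the two acquisition phases.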
Special Section on Fast-track Review
  • Shingo Ando
    2025 Volume 13 Issue 1 Pages 90
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS
    Download PDF (26K)
  • Reo Fukunaga, Soh Yoshida, Ryota Higashimoto, Mitsuji Muneyasu
    2025 Volume 13 Issue 1 Pages 91-105
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    Recent methods for learning with noisy labels often mitigate the effects of noisy labels by sample selection and label correction. However, high feature similarity between classes can reduce the effectiveness of these methods. In this paper, we propose a learning method that uses contrastive learning to explicitly disentangle features of highly similar classes in the feature space. Specifically, we first compute the similarity between classes to identify similar classes. Next, we introduce a new loss function that separates the features of similar class samples in the feature space. This solves the problem of the mixing of similar classes, which affected previous methods. Our proposed method can easily be integrated into the loss functions of various existing methods. Experiments on CIFAR-10, CIFAR-100, WebVision, and Clothing1M show our method achieves high accuracy on datasets with various noise patterns, outperforming existing methods significantly at high noise rates.

    Download PDF (1083K)
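The idea of explicitly pushing apart features of highly similar classes can be illustrated with a small sketch; this is a generic margin-based separation term and prototype-similarity computation, not the paper's actual loss (all names and the margin value are assumptions):

```python
import numpy as np

def class_similarity(prototypes):
    """Cosine similarity between class prototype vectors (rows),
    used to identify which classes count as 'similar'."""
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return normed @ normed.T

def separation_loss(features, labels, similar_pairs, margin=0.5):
    """Margin-based separation term: for samples of class pairs flagged
    as similar, penalize cosine similarity above the margin, pushing
    their features apart in the embedding space."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    loss, count = 0.0, 0
    for a, b in similar_pairs:
        for fa in normed[labels == a]:
            for fb in normed[labels == b]:
                loss += max(0.0, float(fa @ fb) - margin)
                count += 1
    return loss / max(count, 1)

# Toy demo: overlapping features of two similar classes give a positive loss.
feats = np.array([[1.0, 0.0], [0.8, 0.6], [1.0, 0.0]])
labels = np.array([0, 0, 1])
print(separation_loss(feats, labels, [(0, 1)]) > 0.0)  # → True
```

Such a term would be added to the base classification loss, which is why it can be integrated into various existing methods.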
  • Tomoki Abe, Soh Yoshida, Mitsuji Muneyasu
    2025 Volume 13 Issue 1 Pages 106-118
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    Fake news has become a significant societal problem, and the need for automatic fake news detection techniques is growing. In recent years, graph-based methods focusing on the structure of news propagation have been proposed and significantly improved detection accuracy. Although some methods consider the temporal evolution of the propagation structure using dynamic graphs, they typically use a two-step approach, where structural features are first extracted independently of the temporal information and are then combined with temporal features in a separate step. In this study, we propose a novel fake news detection method based on a dynamic graph convolutional network that directly incorporates time series information during structural feature extraction. By introducing time series-aware structural feature extraction, our method more effectively captures the temporal evolution of the news propagation structure, improving fake news detection performance. We evaluated the effectiveness of the proposed method through experiments on two real-world datasets, FakeNewsNet and FibVID.

    Download PDF (552K)
  • Ryotaro Ooe, Kazuhiro Fujita, Koji Shinomiya
    2025 Volume 13 Issue 1 Pages 119-125
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    License plates captured in surveillance videos often have insufficient resolution, making it difficult to recognize the numbers. In this paper, we propose a novel method for license plate number recognition using sparse PCA coefficients and a naive Bayes classifier. The proposed method is applied to low-resolution license plate images, and its performance is compared with two conventional methods: moment features with a Bayes classifier and PCA coefficients with a naive Bayes classifier. The evaluation results, including the first-candidate recognition rate, the recognition rate up to the second candidate, and the classification cross-entropy, show that the proposed method achieves the best performance.

    Download PDF (428K)
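The classifier side of the pipeline above is a naive Bayes model over projection coefficients. A minimal Gaussian naive Bayes sketch, assuming the sparse PCA coefficients have already been extracted (the toy coefficient data here is hypothetical):

```python
import numpy as np

def fit_gnb(coeffs, labels):
    """Gaussian naive Bayes over (sparse) PCA coefficients: per-class
    mean and variance of each coefficient, plus class priors."""
    stats = {}
    for c in np.unique(labels):
        x = coeffs[labels == c]
        stats[c] = (x.mean(axis=0), x.var(axis=0) + 1e-6, len(x) / len(coeffs))
    return stats

def predict_gnb(stats, x):
    """Return the class with the highest naive-Bayes log-posterior."""
    def log_post(c):
        mean, var, prior = stats[c]
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        return ll + np.log(prior)
    return max(stats, key=log_post)

# Toy demo: two well-separated coefficient clusters as stand-in classes.
rng = np.random.default_rng(1)
low = rng.normal(0.0, 0.1, size=(20, 3))
high = rng.normal(5.0, 0.1, size=(20, 3))
stats = fit_gnb(np.vstack([low, high]), np.array([0] * 20 + [1] * 20))
print(predict_gnb(stats, np.zeros(3)))  # → 0
```

Ranking classes by log-posterior also yields the second candidate used in the paper's evaluation.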
Regular Section
  • Yuta Mimura
    2025 Volume 13 Issue 1 Pages 126-135
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges, especially in domains where data collection is costly or challenging. Current data-efficient methods largely focus on Generative Adversarial Network (GAN) architectures, leaving a gap in training other types of generative models. Our study introduces “phased data augmentation” as a novel technique that addresses this gap by optimizing training in limited data scenarios without altering the inherent data distribution. By limiting the augmentation intensity throughout the learning phases, our method enhances the model's ability to learn from limited data, thus maintaining fidelity. Applied to a model integrating PixelCNNs with Vector Quantized Variational AutoEncoder 2 (VQ-VAE-2), our approach demonstrates superior performance in both quantitative and qualitative evaluations across diverse datasets. This represents an important step forward in the efficient training of likelihood-based models, extending the usefulness of data augmentation techniques beyond just GANs.

    Download PDF (1517K)
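The phased-augmentation idea of limiting augmentation intensity across learning phases can be sketched as a simple step schedule; the number of phases, the intensity levels, and the decreasing direction are all assumptions for illustration, not the paper's actual schedule:

```python
def augmentation_intensity(step, total_steps, phases=(1.0, 0.5, 0.1)):
    """Hypothetical phased schedule: training is split into equal phases
    and augmentation intensity is capped at the phase's level, so late
    training sees data closer to the true distribution."""
    phase = min(int(step / total_steps * len(phases)), len(phases) - 1)
    return phases[phase]

# The returned intensity would scale each transform's strength/probability.
print([augmentation_intensity(s, 9) for s in range(9)])
# → [1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1]
```

Because the cap reaches a low value by the final phase, the model's last updates are dominated by the unaltered data distribution, which is the fidelity argument made in the abstract.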
  • Kazutaka Hayakawa, Haruki Nishio, Yoshiaki Nakagawa, Tomokazu Sato
    2025 Volume 13 Issue 1 Pages 136-146
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    In this article, we propose a novel method to automatically estimate the camera posture parameters of an in-vehicle camera unit. Several methods have already been proposed that estimate those parameters mainly by relying on known texture patterns on the ground (e.g., road signs and lane markers). Unlike conventional methods, our method achieves camera calibration without given texture patterns by using the camera trajectory estimated by Structure from Motion (SfM) as a clue. As another contribution, we evaluate the effectiveness of multiple techniques that are empirically known to improve the robustness and accuracy of SfM but have not been well discussed in the literature. In an experiment, we show that the pose parameters can be estimated automatically in real driving environments, and the results of the proposed and compared methods are quantitatively evaluated.

    Download PDF (5016K)
  • Atsuro Ichigaya, Shunsuke Iwamura, Shimpei Nemoto, Yuichi Kondo, Kazuh ...
    2025 Volume 13 Issue 1 Pages 147-154
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    The new versatile video coding (VVC) standard is designed as a multi-purpose coding method that enhances both compression performance and functionality. Multilayer coding is one of the significant features of this functionality. The mainstream single-layer profiles of the traditional standards, such as AVC/H.264 and HEVC/H.265, are not designed to support multilayer coding; that functionality is implemented in extension multilayer profiles. In contrast, VVC provides multilayer coding as a basic function, implementing both the Main 10 and Multilayer Main 10 profiles as primary profiles based on a common architecture. The Multilayer Main 10 profile efficiently encodes multiple videos with varied resolutions, frame rates, and qualities. This study proposes a novel use of multilayer functionality in the Multilayer Main 10 profile, termed content layering, and introduces a sign-language video service system for broadcasting, which is envisaged as a next-generation broadcasting service.

    Download PDF (2268K)
  • Keiji Uemura, Kiyoshi Kiyokawa, Nobuchika Sakata
    2025 Volume 13 Issue 1 Pages 155-165
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    The proliferation of surveillance cameras and advancements in image analysis technology have facilitated the extraction of extensive information from images. However, existing techniques extract features from the person's region in an image and provide information based only on the person's appearance. This paper proposes a method for estimating the areas of interest of a person captured in an image by reconstructing the image from that person's first-person perspective. In this study, we aim to reconstruct a subject-viewpoint image by estimating the gaze direction of the person captured in the image. To achieve this, we propose a method that utilizes keypoints from 3D posture estimation for gaze direction estimation. Compared with a deep-neural-network-based approach that directly estimates the gaze direction from images, the proposed method exhibits comparable accuracy and processing speed. In addition, our subject experiment reveals the characteristics of our method and its remaining challenges.

    Download PDF (1611K)
  • Yuki Kakui, Kota Araki, Changyo Han, Shogo Fukushima, Takeshi Naemura
    2025 Volume 13 Issue 1 Pages 166-178
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS
    Supplementary material

    Screen-camera communication enables instantaneous retrieval of on-screen information. Imperceptible screen-camera communication, embedding data in videos unnoticeably, holds great promise as it does not interfere with the viewing experience. The imperceptible color vibration method achieves this by alternately displaying two colors with the same luminance for each pixel. However, decoding performance may deteriorate due to interframe differences in the original video content. To address this, we propose a novel decoding approach using a dual-camera smartphone to capture two images with different modulation values simultaneously. This method allows for computing color differences between temporally close images, reducing artifacts from temporal changes in the original content. Our experimental results demonstrate an improved decoding rate and decreased recognition time compared to the previous method.

    Download PDF (3034K)
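The decoding step described above compares two simultaneously captured frames with opposite modulation values. A minimal sketch of sign-based decoding from the per-pixel color difference; the choice of the blue channel and all pixel values are illustrative assumptions:

```python
import numpy as np

def decode_bits(frame_a, frame_b, threshold=0.0):
    """Hypothetical decoder: the two dual-camera frames capture opposite
    modulation values, so the sign of their per-pixel color difference
    (here taken on the blue channel) recovers the embedded bit."""
    diff = frame_a[..., 2].astype(float) - frame_b[..., 2].astype(float)
    return (diff > threshold).astype(np.uint8)

# Toy 2x2 region where modulation raises/lowers the blue channel.
a = np.zeros((2, 2, 3))
b = np.zeros((2, 2, 3))
a[..., 2] = [[128, 120], [120, 128]]
b[..., 2] = [[120, 128], [128, 120]]
print(decode_bits(a, b))
# → [[1 0]
#    [0 1]]
```

Because both frames are captured at the same instant, temporal changes in the underlying video cancel out of the difference, which is the key advantage over single-camera decoding across consecutive frames.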
  • Akio Kobayashi, Junji Onishi
    2025 Volume 13 Issue 1 Pages 179-186
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    This study undertakes an end-to-end braille translation approach for Japanese speech to address the needs of deafblind individuals. Japanese uses many character types, such as kanji and kana, and manual braille translation requires considerable effort. Thus, in Japan, automated braille translation is anticipated to enhance information accessibility for deafblind individuals. Conventionally, braille translation relies on separate automatic speech recognition (ASR) and braille-translation software, creating a two-step process. This two-step process is inefficient because the speech-to-braille translation is performed via kana-kanji characters. On the other hand, Japanese Braille is highly compatible with ASR owing to its predominant use of kana characters, which mirror Japanese phonetic features. Therefore, one-step speech-to-braille translation is expected to perform better than the conventional two-step method. In this study, we propose an end-to-end (E2E) approach using neural networks to translate Japanese Braille directly from speech and compare it with the conventional two-step method.

    Download PDF (427K)
  • Jikai Li, Shogo Muramatsu
    2025 Volume 13 Issue 1 Pages 187-199
    Published: 2025
    Released on J-STAGE: January 01, 2025
    JOURNAL FREE ACCESS

    This study develops a self-supervised image denoising technique that incorporates a structured deep image prior (DIP) approach with Stein's unbiased risk estimator and linear expansion of thresholding (SURE-LET). It leverages the interscale and interchannel dependencies of images to develop a multichannel denoising approach. The original DIP, introduced by Ulyanov et al. in 2018, requires a random image as the input for restoration and offers the advantage of not requiring training data. However, the interpretability of the network's role is limited, and customizing its architecture to incorporate domain knowledge is challenging. This work integrates SURE-LET with Monte Carlo computation into the DIP framework, providing a rationale for the random-image input and shifting the focus from generator design to restorer design, thus enabling the network structure of DIP to reflect domain knowledge more easily. The significance of the developed method is confirmed through denoising simulations using the Kodak image dataset.

    Download PDF (6923K)