ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Current issue
Displaying 1-18 of 18 articles from this issue
Special Section on 3D Media Technology 2026
  • Hiroyasu Ujike
    2026 Volume 14 Issue 1 Pages 1
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS
    Download PDF (408K)
  • Tomoki Inoue, Chihiro Tsutake, Keita Takahashi, Toshiaki Fujii
    2026 Volume 14 Issue 1 Pages 2-10
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    We propose a hybrid reconstruction method for coded light-field imaging. Most previous methods used pre-trained reconstruction, in which the reconstruction process is first pre-trained on a light-field dataset taken from various 3-D scenes and then applied to new target 3-D scenes. However, pre-trained reconstruction is not necessarily optimal for a specific 3-D scene and sometimes yields insufficient reconstruction quality for fine details. To address this issue, we first introduce a self-supervised reconstruction method that focuses on the data observed from a specific 3-D scene. To this end, we incorporate a learning-based 3-D representation technique, neural radiance fields (NeRFs), into the framework of coded light-field imaging. Moreover, we seamlessly combine the pre-trained and self-supervised approaches to synergize the strengths of both. Experimental results demonstrate that our method consistently achieves better reconstruction quality than previous pre-trained methods over various 3-D scenes.

    Download PDF (3318K)
  • Kaito Hori, Chihiro Tsutake, Keita Takahashi, Toshiaki Fujii
    2026 Volume 14 Issue 1 Pages 11-17
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    To acquire the intensity information for phase retrieval time-efficiently and stably under coherent illumination, we leverage an event-based vision sensor (EVS) that can detect changes in logarithmic intensity at the pixel level with a wide dynamic range. In our optical system, we translate the EVS along the optical axis, where it records the intensity changes induced by defocus as events. To recover phase distributions, we formulate a partial differential equation, referred to as the transport of event equation, which gives a linear relationship between the defocus events and the phase distribution. We demonstrate through experiments that the EVS is more advantageous than a conventional image sensor for rapidly and stably detecting the intensity information (defocus events), which enables accurate phase retrieval, particularly under low-light conditions.

    Download PDF (4846K)
  • Riku Shiobara, Hideki Kakeya
    2026 Volume 14 Issue 1 Pages 18-24
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    The interleaved linear Fresnel lens is known to improve image quality in autostereoscopic displays with a time-multiplexed directional backlight. In this paper, we propose a method to reduce crosstalk by narrowing the width of the elemental lenses in the interleaved linear Fresnel lens array, thereby decreasing the width of the directional light. Furthermore, to address the luminance non-uniformity that stands out when using a fine-pitch interleaved linear Fresnel lens array, we propose a technique in which an adaptive mask pattern is displayed on a liquid crystal display (LCD) panel and superimposed on the lens array. The proposed method reduced the crosstalk level by approximately half and, with the adaptive mask pattern, improved luminance uniformity regardless of the viewing angle.

    Download PDF (4817K)
  • Hiroto Omori, Hideki Kakeya
    2026 Volume 14 Issue 1 Pages 25-32
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    High-resolution coarse integral imaging (CII) utilizes a fine-pitch interleaved Fresnel lens array; however, the smaller lens pitch dramatically increases the number of views to be rendered, hindering real-time performance. Moreover, extending the viewing zone requires elemental images to be generated in real time to match the viewer's eye position, making high frame rates indispensable. To address these challenges, we introduce a cluster-level culling technique that accelerates multi-view rendering for CII and sustains real-time frame rates even for scenes with millions of polygons. We further implement the technique in a prototype system on an iPad and an iPhone, both of which achieve real-time performance, demonstrating that CII can be realized with an exceptionally simple mobile-device setup.
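    The cluster-level culling described above can be illustrated with a minimal, hypothetical sketch (not the authors' implementation): each polygon cluster is tested once, via its bounding sphere, against frustum planes, and a culled cluster is skipped for every view. All plane and cluster values below are invented for illustration.

```python
import numpy as np

def sphere_visible(center, radius, planes):
    """Sphere-frustum test: visible unless fully behind some plane."""
    for n, d in planes:              # plane: points with n.x + d >= 0 are inside
        if np.dot(n, center) + d < -radius:
            return False
    return True

# A toy "frustum": the axis-aligned box [-1, 1]^3 written as 6 planes.
frustum = [(np.array([ 1.0, 0.0, 0.0]), 1.0), (np.array([-1.0, 0.0, 0.0]), 1.0),
           (np.array([ 0.0, 1.0, 0.0]), 1.0), (np.array([ 0.0,-1.0, 0.0]), 1.0),
           (np.array([ 0.0, 0.0, 1.0]), 1.0), (np.array([ 0.0, 0.0,-1.0]), 1.0)]

clusters = [(np.array([0.0, 0.0, 0.0]), 0.5),   # well inside: rendered
            (np.array([5.0, 0.0, 0.0]), 0.5),   # far outside: culled once for all views
            (np.array([1.2, 0.0, 0.0]), 0.5)]   # straddles a plane: conservatively kept

visible = [i for i, (c, r) in enumerate(clusters)
           if sphere_visible(c, r, frustum)]
print(visible)   # -> [0, 2]
```

    Testing at cluster granularity rather than per polygon is what keeps the per-view cost low when the number of rendered views grows.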

    Download PDF (9311K)
  • Kyosuke Yanagida, Takafumi Koike
    2026 Volume 14 Issue 1 Pages 33-41
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    We developed an autostereoscopic 3D live system using a minimal configuration of real-time stereo capture and a binocular autostereoscopic 3D display. We compared the displayed content of the autostereoscopic 3D live system and conventional 2D video conferencing through both quantitative analysis and subjective evaluation. For the quantitative evaluation, we measured the fixation duration on the face area of the conversational partner using eye tracking. Our system showed longer fixation durations and a greater number of fixations on the face area than 2D video conferencing. For the subjective evaluation, we surveyed participants and analyzed the results using Wilcoxon signed-rank tests to compare our system with 2D video conferencing. The results show that our system achieves higher ratings for sense of presence and realism.

    Download PDF (3920K)
  • Ryotaro Umemoto, Sei Sato, Kengo Fujii, Tomohiro Yendo, Masahiro Iwaha ...
    2026 Volume 14 Issue 1 Pages 42-50
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    This paper presents a novel AR 3D display system using a semi-transparent anisotropic reflective screen with polarization selectivity. The proposed screen efficiently reflects projector light to generate parallax while remaining transparent to background light, thus improving display visibility and reducing unwanted illumination of background objects. The screen is fabricated from a wire grid polarizer with embossed dimples, achieving both anisotropic reflectivity and polarization selectivity. We describe the system principle, fabrication method, and experimental evaluation. Experimental results show that the screen achieves high visibility and horizontal parallax. The display will contribute to the development of immersive, large, high-visibility AR 3D displays.

    Download PDF (6648K)
  • Yasunori Akashi, Daisuke Kuramoto, Changyo Han, Takeshi Naemura
    2026 Volume 14 Issue 1 Pages 51-56
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    In tabletop systems using mid-air images, making the mid-air images viewable from multiple directions around the table enables various experiences. When displaying mid-air images of 3D objects using a Retro-Transmissive Plate (RT Plate), two RT Plates are commonly used to correct pseudoscopic effects. However, since the two plates must be positioned with an offset, the viewing area of the mid-air image is limited to one direction of the table, and there is a distance between the observer and the mid-air image. The proposed method uses multiple symmetrical mirror structures perpendicular to the two plates. By using mirror images to virtually double the length of the plates, the method expands the viewing area, allowing the mid-air image to be viewed from opposite sides of the table. Additionally, eliminating the positional offset between the two plates reduces the distance from the observers to the area where the mid-air image can be displayed.

    Download PDF (5337K)
  • Takeru Nishiyama, Shiro Suyama, Hirotsugu Yamamoto
    2026 Volume 14 Issue 1 Pages 57-64
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    In this study, we propose a method for reducing aerial image gaps in a dual-sided aerial imaging optical system based on AIRR by periodically moving the retro-reflector. Slitted retro-reflectors in the dual-sided display create areas in the image-formation optical path where no retro-reflector exists, resulting in aerial image gaps. Because periodically moving the retro-reflector continuously shifts the gap position, the visibility of the aerial image can be improved.

    Download PDF (8508K)
  • Sotaro Kaneko, Kazuaki Takiyama, Shiro Suyama, Hirotsugu Yamamoto
    2026 Volume 14 Issue 1 Pages 65-72
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    We propose an aerial heater optical system that enhances the realism and operability of interactive aerial displays by providing thermal sensation at the aerial image position. The system is based on aerial imaging by retro-reflection (AIRR) by use of a punched metal beam splitter and an aluminum-coated corner cube array to converge far-infrared radiation (FIR) from a heat source. Ray-tracing simulation confirms the principle and quantifies the optical efficiency based on structural parameters such as the hole diameter, pitch, and thickness of the punched metal, and the prism size of the retro-reflector. A prototype has been developed and indicates a strong correlation between the simulated optical efficiency and the measured temperature increase.

    Download PDF (6465K)
  • Ryota Yamada, Shiro Suyama, Hirotsugu Yamamoto
    2026 Volume 14 Issue 1 Pages 73-77
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    This study demonstrates that a system for estimating the 3D shape of a convex aerial image can be implemented using a hand-tracking device, the Leap Motion Controller 2. In forming 3D aerial images based on Aerial Imaging by Retro-Reflection (AIRR), if a perfectly spherical screen is assumed during image projection but the actual screen is elliptical, distortions and degraded interaction accuracy result. Whenever the screen shape changes due to replacement or aging, the shape must be remeasured. In this study, the observer points to five positions on the aerial image: the center and the top, bottom, left, and right edges. These positions are captured by the Leap Motion, and the principal-axis lengths are easily and cost-effectively estimated as the 3D shape. The estimated principal-axis lengths of the aerial image closely matched those of the actual screen obtained from photographic measurements.

    Download PDF (3131K)
  • Keitaro Sameshima, Ann Ito, Kimitaka Tsutsumi
    2026 Volume 14 Issue 1 Pages 78-84
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    In this study, we develop the basic principle of a new on-ear headphone driven by a hemispherical speaker array. It reproduces a desired sound field near the ear canal using the speaker array. By imposing an l2-norm ball constraint on the loss function, we derive a convex projection-based method that minimizes the error between the desired and reproduced sound fields while preventing the speaker-gain fluctuation that is likely to occur with the conventional pseudo-inverse method. Computer simulation results show that the convex projection method achieves a signal-to-noise ratio (SNR) of approximately 20 dB while keeping the standard deviation of the speaker gains to 0.95 dB, smaller than that of the pseudo-inverse method.
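    The constraint idea can be sketched roughly (this is not the authors' formulation): a least-squares sound-field match is solved subject to an l2-norm ball on the speaker gains via projected gradient descent, and compared with the unconstrained pseudo-inverse. The matrix A, target b, and radius below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))   # maps 16 speaker gains to 8 control points
b = rng.standard_normal(8)         # desired sound field at the control points
radius = 1.0                       # l2-ball radius bounding total gain energy

def project_l2_ball(x, r):
    """Project x onto the ball {x : ||x||_2 <= r}."""
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

step = 1.0 / np.linalg.norm(A, 2) ** 2   # step size at 1/Lipschitz constant
x = np.zeros(16)
for _ in range(500):
    # gradient step on ||Ax - b||^2 / 2, then projection onto the ball
    x = project_l2_ball(x - step * A.T @ (A @ x - b), radius)

x_pinv = np.linalg.pinv(A) @ b           # pseudo-inverse: no gain constraint
print(np.linalg.norm(x), np.linalg.norm(x_pinv))
```

    The projection bounds the gain energy at every iteration, which is what suppresses the gain fluctuation the pseudo-inverse can exhibit.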

    Download PDF (1362K)
  • Tokio Satou, Kimitaka Tsutsumi
    2026 Volume 14 Issue 1 Pages 85-91
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    In this study, we investigate a method of accurately capturing spatial sound in the format of higher-order ambisonics (HOA), in which the sound pressure distribution is expressed as a spherical harmonic series expansion, with a small number of microphones by explicitly modeling the sparsity of sound sources in space. To incorporate the sparsity, the proposed method first finds a sparse solution in the domain of plane-wave expansion using the least absolute shrinkage and selection operator (LASSO), then converts the solution into a spherical harmonic series. Computer simulation results show that the proposed method achieves improvements of approximately 30 dB in signal-to-noise ratio over the conventional spherical-harmonic-expansion-based method, especially when the number of microphones is small.

    Download PDF (1705K)
Special Section on Fast-track Review
  • Shingo Ando
    2026 Volume 14 Issue 1 Pages 92
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS
    Download PDF (386K)
  • Yuki Rogi, Kota Yoshida, Ayaka Banno, Takeshi Fujino, Shunsuke Okura
    2026 Volume 14 Issue 1 Pages 93-101
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    With the development of IoT technology, edge AI is expected to spread widely. Security and recovery from attacks are important for the further development of edge AI. One attack on edge AI is the adversarial example (AE) attack, which artificially causes misrecognition by adding a perturbation to the input. As one solution, a defense method has been proposed that removes the adversarial perturbation by adding disturbance noise and then applying a denoising autoencoder (DAE). In this paper, we first show that the effectiveness of this defense is low when the disturbance noise is based on predictable pseudorandom numbers. Next, we propose a defense method based on the unpredictable pixel-reset noise of a CMOS image sensor, together with a pre-processing step that enhances the randomness of the disturbance noise. Simulation results confirm that the defense performance against AE attacks is improved by approximately 30%.
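    The overall defense flow can be sketched as below (hypothetical, not the authors' model): unpredictable noise is added to a possibly adversarial image, then the image is denoised before classification. A 3x3 mean filter stands in for the trained DAE, and an unseeded generator stands in for sensor pixel-reset noise.

```python
import numpy as np

rng = np.random.default_rng()       # no fixed seed: noise is not reproducible

def add_disturbance(img, sigma=0.1):
    """Drown small adversarial perturbations in random noise."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def mean_filter(img):
    """Placeholder denoiser: 3x3 box filter with edge padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

img = np.full((8, 8), 0.5)
img[2:6, 2:6] += 0.01 * rng.choice([-1.0, 1.0], (4, 4))  # tiny "perturbation"
defended = mean_filter(add_disturbance(img))
print(defended.shape)
```

    The key point the paper makes is that the noise source must be unpredictable: if an attacker can anticipate the pseudorandom noise, the perturbation can be crafted to survive this pipeline.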

    Download PDF (4818K)
  • Keiichiro Kuroda, Yudai Morikaku, Yu Osuka, Ryoya Iegaki, Ryuichi Ujii ...
    2026 Volume 14 Issue 1 Pages 102-109
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    Anticipating the rise of the Internet of Things (IoT) era, we have proposed an object detection framework that employs a CMOS image sensor with binary feature extraction to reduce power requirements. First, we present a lightweight deep neural network for the feature data based on YOLOv7 that is comparable to YOLOv7-tiny in the number of parameters and FLOPs but improves large-object recognition accuracy (APL50) by 6.6%. Moreover, our approach achieves a 48.8% reduction in GPU power consumption compared to YOLOv7. Additionally, we introduce an on-chip signal processing method for the binary feature data. The proposed method achieves a compression rate of 64.1% and increases GPU power consumption by only 14.9% during the decoding process preceding object detection. Moreover, the size of the 1-bit feature data is reduced by 96.0%, and object recognition accuracy is improved by 4.0% relative to 1-bit RGB color images.

    Download PDF (3675K)
  • Jinlong Zhu, Keigo Sakurai, Ren Togo, Takahiro Ogawa, Miki Haseyama
    2026 Volume 14 Issue 1 Pages 110-118
    Published: 2026
    Released on J-STAGE: January 01, 2026
    JOURNAL FREE ACCESS

    We propose a novel text-controllable polyphonic symbolic music generation method based on diffusion models. Symbolic music generation has garnered significant attention due to its flexibility and seamless integration with Digital Audio Workstations (DAWs), as it enables the generation of MIDI files, facilitating easier modification compared to waveform music. Although existing techniques enable control through chords or other metadata, few methods allow intuitive control via text prompts, which better align with user preferences. To address this limitation, we introduce Text-Controllable Polyphonic Symbolic Music Generation (TPSMG), a diffusion model specifically designed for text-conditioned symbolic music generation. Our approach incorporates a text condition module into a U-Net backbone within a Denoising Diffusion Probabilistic Model. This module translates text prompts into embeddings that steer the denoising process, thereby enabling precise, text-based control over music generation. Experimental results demonstrate that our method generates high-quality polyphonic symbolic music outputs that closely reflect the intended textual input.
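    The conditioning mechanism can be illustrated with a toy sketch (not the TPSMG architecture): a text embedding c is injected into the noise predictor so that it steers every DDPM-style reverse step. The linear "network", dimensions, and schedule below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16                                    # latent and embedding size (toy)
W_x = 0.1 * rng.standard_normal((d, d))   # weights on the noisy sample
W_c = 0.1 * rng.standard_normal((d, d))   # weights on the text embedding

def predict_noise(x_t, t, c):
    """Toy conditional noise predictor eps_theta(x_t, t, c)."""
    return W_x @ x_t + W_c @ c + 0.01 * t

def reverse_step(x_t, t, c, beta=0.02):
    """One DDPM-style reverse update using the conditional prediction."""
    alpha = 1.0 - beta
    eps = predict_noise(x_t, t, c)
    return (x_t - beta / np.sqrt(1.0 - alpha ** t) * eps) / np.sqrt(alpha)

c = rng.standard_normal(d)                # stands in for a text-prompt embedding
x = rng.standard_normal(d)                # x_T: start from pure noise
for t in range(50, 0, -1):
    x = reverse_step(x, t, c)             # the text condition steers every step
print(x.shape)
```

    In the paper's setting, the predictor is a U-Net over a piano-roll-like representation and c comes from encoding the text prompt; because c enters every denoising step, the generated score tracks the prompt throughout sampling.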

    Download PDF (2353K)
Regular Section