ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Volume 9, Issue 1
Special Section on Advanced Imaging and Computer Graphics Technology
  • Nobuhiko Mukai
    2021 Volume 9 Issue 1 Pages 1
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS
  • Issei Fujishiro, Anri Kobayashi
    2021 Volume 9 Issue 1 Pages 2-12
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS
    Supplementary material

    When a musical instrument player performs music, the accompanying visual information can have a significant effect on the performance. In this paper, we present an ambient music co-player (AMP) as a system that generates background videos in response to the impromptu performance of a single musical instrument player. The AMP system evaluates the performance to interpret the real player's emotional impression and generates an influential video based on the results of the evaluation. The player tends to change their performance while being inspired by the generated video, further triggering the system to modify the video. The AMP system aims to establish an affective loop where the system continues applying stimuli to the performance of the real player. The final goal of this study is to make the system act as a “co-player” of the player and to amplify the quality of the player's performing experience entirely through interactions between the two. A user evaluation showed that the AMP system was able to inspire an amateur guitarist, the subject of the evaluation, through affective video generation and to make his performance better than when playing alone.

  • Sayaka Minewaki, Yo Umeki, Ryosuke Harakawa, Masahiro Iwahashi
    2021 Volume 9 Issue 1 Pages 13-24
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    An image after tone mapping (TM) has noise bias, i.e., noise values with a non-zero mean, because of the non-linearity of the TM function. Therefore, noise reduction filters based on the zero-mean assumption do not work well for such images. To overcome this limitation, noise bias compensation (NBC) divides pixels into subsets depending on their values and adaptively adjusts them using a Bayesian approach. However, previous studies on NBC target only gray-scale images and assume that the noise mean before TM is zero. This paper proposes a method for NBC that targets color images processed by TM with a non-zero noise mean. The proposed method adaptively calculates the compensation values based on prior knowledge that represents noise corresponding to each pixel value of RGB channels with a Bayesian approach. Experimental results show this Bayesian approach successfully reduces noise even for color images containing noise with a non-zero mean.
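
    The noise bias described in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' method: the gamma curve standing in for the TM function, the Gaussian noise model, and the names `tone_map` and `noise_bias` are all assumptions made for illustration. It shows only that zero-mean noise acquires a non-zero mean after a non-linear mapping.

```python
import random


def tone_map(x, gamma=2.2):
    """Hypothetical TM curve (assumption): clip to [0, 1], then apply a gamma curve."""
    x = min(max(x, 0.0), 1.0)
    return x ** (1.0 / gamma)


def noise_bias(signal, sigma=0.05, gamma=2.2, trials=100_000, seed=0):
    """Estimate the mean of the noise after tone mapping by Monte Carlo.

    Zero-mean Gaussian noise is added to `signal` before the mapping; the
    returned value is the average difference between the mapped noisy value
    and the mapped clean value.
    """
    rng = random.Random(seed)
    clean = tone_map(signal, gamma)
    total = 0.0
    for _ in range(trials):
        noisy = signal + rng.gauss(0.0, sigma)
        total += tone_map(noisy, gamma) - clean
    return total / trials


if __name__ == "__main__":
    # Non-linear mapping: the mean noise after TM is clearly non-zero.
    print("gamma 2.2:", noise_bias(0.2))
    # Linear mapping (gamma = 1): the mean noise stays close to zero.
    print("gamma 1.0:", noise_bias(0.2, gamma=1.0))
```

    Because the assumed curve is concave, the estimated bias is negative for mid-range pixel values; a filter that assumes zero-mean noise would not remove this offset.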

  • Kohei Sakai, Yasutaka Inagaki, Keita Takahashi, Toshiaki Fujii, Hajime ...
    2021 Volume 9 Issue 1 Pages 25-32
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS
    Supplementary material

    A light field can carry rich visual information of a real 3-D scene, leading to many attractive applications. However, the acquisition of a light field is challenging due to the large amount of data. In our previous work, we proposed an efficient method for this task using a coded-aperture camera with a convolutional neural network (CNN) which can computationally reconstruct a light field from several images acquired with different aperture patterns. In this work, we report two follow-up contributions to the previous work. First, we integrated a color filter array, which is common in RGB cameras, and the related color processing into the algorithm pipeline. This integration led to better reconstruction quality for color light fields. We then analyzed how the reconstruction quality obtained with our method was affected by the complexity of light fields. We also showed the possibility of using this analysis to predict the reconstruction quality from the acquired images.

  • Takuya Natsume, Masamichi Oishi, Marie Oshima, Nobuhiko Mukai
    2021 Volume 9 Issue 1 Pages 33-41
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    The study of fluid analysis includes many examples of particle methods such as SPH and MPS. We have also performed a simulation and compared the results with those of a physical experiment in which viscous fluid was injected from a circular tube into a water tank in order to investigate the droplet formation process. Previous studies used an interfacial tension model that could consider the influence of the other phase in two-phase flow. However, the environment of the physical experiment differed from that of the simulation. Therefore, in this study we performed a viscous fluid injection simulation using the same environment as in the physical experiment. On the basis of the results, we have validated the proposed method by comparing the droplet size and the formation cycle between the physical experiment and the simulation.

  • Tomoki Haruyama, Sho Takahashi, Takahiro Ogawa, Miki Haseyama
    2021 Volume 9 Issue 1 Pages 42-53
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    A new method that generates user-selectable event summaries from unedited raw soccer videos is presented in this paper. Since there are more unedited raw soccer videos than broadcast or distributed soccer videos, and unedited videos have various viewers, it is necessary to analyze these videos to meet the demands of those viewers. The proposed method introduces a multimodal CNN-BiLSTM architecture for analyzing unedited raw soccer videos. This architecture extracts candidate scenes for event summarization from unedited soccer videos and classifies these scenes into typical events. Finally, our method generates user-selectable event summaries by simultaneously considering the importance of candidate scenes and the event classification results. Experimental results using real unedited raw soccer videos show the effectiveness of our method.

  • Kazuma Ohtomo, Ryosuke Harakawa, Takahiro Ogawa, Miki Haseyama, Masahi ...
    2021 Volume 9 Issue 1 Pages 54-61
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    Tumblr is a popular micro-blogging service on which users can share posts comprising text and images. This paper presents a method for personalizing post recommendations for each user from a large number of posts. Specifically, we develop a supervised multi-variational auto encoder considering user preference (SMVAE-UP). SMVAE-UP can extract relationships between text and image features by considering class information representing a user's preference for each post; thus, preference-aware multimodal features can be calculated. Furthermore, for each target user, a network that enables comparison between a user and posts in the same feature space is constructed using the preference-aware multimodal features and metadata on posts. By applying graph convolutional networks (GCNs) to the network constructed for each target user, an accurate recommendation matching each user's preferred posts becomes feasible. Experimental results for real-world datasets including six users and 99,844 posts show the effectiveness of our method.

Special Section on ITE Awards Selection
Regular Section
  • Peng Wang, Jay Arre Toque, Ari Ide-Ektessabi
    2021 Volume 9 Issue 1 Pages 71-79
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    In the digital archiving of cultural heritage objects that have uneven and textured surfaces, it is challenging for researchers to acquire as much information as possible without having to use multiple techniques or devices. In this paper, a photometric stereo method based on a line-sensor camera is proposed for high-resolution 2.5D surface shape reconstruction. Compared to an area-sensor camera, a line-sensor camera can acquire higher-resolution images with less geometric distortion, which can yield more accurate depth maps. A technique for estimating the lighting direction in such a system is described in detail for high-resolution surface shape reconstruction with improved efficiency. Experimental results yielded reconstructed depth maps with accuracy comparable to that of conventional methods such as a laser ranger. In addition to surface reconstruction, the images acquired are colorimetrically accurate. This means that the method can produce both stereoscopic and spectroscopic information for digitizing cultural heritage with textured surfaces.

  • Garimagai Borjigin, Hideki Kakeya
    2021 Volume 9 Issue 1 Pages 80-85
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    In this paper, we propose autostereoscopic displays with novel directional backlight designs. A high level of stereoscopic crosstalk is the main problem to be solved in conventional systems. To reduce crosstalk, we propose a directional backlight system that suppresses the effect of field curvature with only a single layer of curved lens array. It is confirmed that the crosstalk level is reduced notably by the proposed methods. The uniformity of backlight intensity is also increased by using a lens array composed of trapezoidal elemental lenses in place of rectangular elemental lenses.

  • Kota Imaeda, Keita Takahashi, Toshiaki Fujii, Yukihiro Bandoh, Seishi ...
    2021 Volume 9 Issue 1 Pages 86-94
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    An efficient coding method for light fields (LFs) is presented. The method is based on a sophisticated video coding standard called High Efficiency Video Coding (HEVC), but does not directly encode the LF images using the HEVC codec. Instead, the LF images are first transformed into a smaller set of images, called basis images, to remove the redundancies among the images. The basis images are then reordered to produce a temporally smooth video sequence, which is finally encoded using the HEVC codec. In the decoding process, the decoded frames are inversely transformed into the original LF. The first and final transformations are modeled using neural networks and optimized for the target LF. The frame reordering is formulated as a traveling salesman problem (TSP) and solved using a greedy method. The experimental results show that our method can achieve better rate-distortion performance than other HEVC-based light-field coding methods.
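
    The frame-reordering step in this abstract can be sketched with one common greedy TSP heuristic, nearest neighbour. This is an illustrative assumption: the abstract does not say which greedy rule the authors use, and the distance measure (sum of squared differences) and the names `frame_distance` and `greedy_order` are hypothetical.

```python
def frame_distance(a, b):
    """Sum of squared differences between two frames given as flat lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_order(frames, start=0):
    """Order frames so that consecutive frames are similar (nearest-neighbour heuristic).

    Starting from `start`, repeatedly append the not-yet-used frame closest to
    the last chosen one. Ties are broken by the lower index. The result is a
    list of frame indices, not an optimal TSP tour.
    """
    if not frames:
        return []
    remaining = set(range(len(frames)))
    remaining.remove(start)
    order = [start]
    while remaining:
        last = frames[order[-1]]
        nxt = min(remaining, key=lambda i: (frame_distance(last, frames[i]), i))
        remaining.remove(nxt)
        order.append(nxt)
    return order


if __name__ == "__main__":
    # Four one-pixel "frames"; similar values end up next to each other.
    frames = [[0.0], [10.0], [1.0], [9.0]]
    print(greedy_order(frames))
```

    A smoother frame sequence generally helps the inter-frame prediction of a video codec, which is the motivation the abstract gives for the reordering.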

  • Hiroki Tsurusaki, Keisuke Nonaka, Ryosuke Watanabe, Tomoaki Konno, Sei ...
    2021 Volume 9 Issue 1 Pages 95-104
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    Sports scene analysis is an important technology to quantify a player's action and visualize game statistics. To realize such technology, camera calibration is required to recognize the player's position from a video. In this paper, we propose an automatic camera calibration method that uses intersection resorting and refinement. Our contributions are 1) flexible intersection selection and 2) intersection refinement to improve accuracy in calibration. A homography matrix is used to convert world coordinates to image coordinates for the calibration. Sports scenes can be estimated by using a priori information such as the length and position of field lines and their intersections. Conventional methods using the field lines and intersections cannot realize sufficient calibration accuracy because the intersections are selected from the combination of horizontal and vertical lines. Moreover, displacement occurs between the detected position of an intersection and its real position on the input image. Our proposed method can solve these problems by flexible intersection selection and refinement. As a result, a player's position in the real world is identified from the video by using the estimated homography matrix. Our experimental results show that the proposed method achieves higher accuracy than conventional methods.
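
    The world-to-image conversion mentioned in this abstract follows the standard planar homography mapping, which can be sketched as below. The function name `apply_homography` and the example matrix are hypothetical; estimating the matrix from field-line intersections, which is the subject of the paper, is not shown.

```python
def apply_homography(H, point):
    """Map a 2-D point on the field plane to image coordinates.

    H is a 3x3 homography given as nested lists. The point is lifted to
    homogeneous coordinates (x, y, 1), multiplied by H, and divided by the
    third component.
    """
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


if __name__ == "__main__":
    # Hypothetical matrix: scale by 2, then shift by (100, 50) pixels.
    H = [[2.0, 0.0, 100.0],
         [0.0, 2.0, 50.0],
         [0.0, 0.0, 1.0]]
    print(apply_homography(H, (10.0, 20.0)))
```

    Mapping a detected image position back to the field plane would use the inverse matrix in the same way.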

  • Shinji Kimura, Yuji Aburakawa, Fumiaki Watanabe, Shiho Torashima, Shun ...
    2021 Volume 9 Issue 1 Pages 105-112
    Published: 2021
    Released on J-STAGE: January 01, 2021
    JOURNAL FREE ACCESS

    Online video communication systems are widely used among users between remote locations, but the communication quality is still inferior to face-to-face communication. To enhance the quality of video communication systems, it is necessary to provide a sense of presence of the users at a high level and to realize eye contact for better non-verbal communication. The technologies that realize virtual image projection and frontal image capture are promising for such purposes. However, conventional systems require bulky display screens. Thus, we proposed a thinner system using a holographic optical element (HOE), which was utilized as a transparent off-axis mirror and helped increase the flexibility of the system configuration, thus reducing the depth of the space in front. To verify the feasibility of the proposed system, we established a proof-of-concept system with dispersion compensation optics and a full-color HOE, and the system simultaneously realized virtual image projection and frontal image capture.
