ITE Transactions on Media Technology and Applications

Special Section on Advanced Imaging and Computer Graphics Technology

[Foreword] Welcome to the Special Section on Advanced Imaging and Computer Graphics Technology

Nobuhiko Mukai

2021, Volume 9, Issue 1, p. 1
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.1

Journal, free access

[Invited Paper] Ambient Music Co-player: Generating Affective Video in Response to Impromptu Music Performance

Issei Fujishiro, Anri Kobayashi

2021, Volume 9, Issue 1, p. 2-12
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.2

Journal, free access
Supplementary material

When a musical instrument player performs music, the accompanying visual information can have a significant effect on the performance. In this paper, we present an ambient music co-player (AMP) as a system that generates background videos in response to the impromptu performance of a single musical instrument player. The AMP system evaluates the performance to interpret the real player's emotional impression and generates an influential video based on the results of the evaluation. The player tends to change their performance while being inspired by the generated video, further triggering the system to modify the video. The AMP system aims to establish an affective loop where the system continues applying stimuli to the performance of the real player. The final goal of this study is to make the system act as a “co-player” of the player and to amplify the quality of the player's performing experience entirely through interactions between the two. A user evaluation showed that the AMP system was able to inspire the subject, an amateur guitarist, through affective video generation and to make his performance better than when playing alone.

[Paper] Noise Bias Compensation for Color Images after Tone Mapping

Sayaka Minewaki, Yo Umeki, Ryosuke Harakawa, Masahiro Iwahashi

2021, Volume 9, Issue 1, p. 13-24
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.13

Journal, free access

An image after tone mapping (TM) has noise bias, i.e., noise values with a non-zero mean, because of the non-linearity of the TM function. Therefore, noise reduction filters based on the zero-mean assumption do not work well for such images. To overcome this limitation, noise bias compensation (NBC) divides pixels into subsets depending on their values and adaptively adjusts them using a Bayesian approach. However, previous studies on NBC target only gray-scale images and assume that the noise mean before TM is zero. This paper proposes a method for NBC that targets color images processed by TM with a non-zero noise mean. The proposed method adaptively calculates the compensation values based on prior knowledge that represents noise corresponding to each pixel value of RGB channels with a Bayesian approach. Experimental results show this Bayesian approach successfully reduces noise even for color images containing noise with a non-zero mean.
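
The noise bias described in this abstract can be illustrated numerically. The sketch below is not the paper's NBC method; it only shows that zero-mean noise acquires a non-zero mean after a non-linear mapping. The gamma curve `gamma_tone_map`, the function `noise_bias`, and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def gamma_tone_map(x, gamma=1 / 2.2):
    """Hypothetical gamma-style TM curve (an assumption, not the paper's TM function)."""
    return max(x, 0.0) ** gamma


def noise_bias(pixel, sigma, tm=gamma_tone_map, trials=50_000, seed=0):
    """Estimate E[tm(pixel + n)] - tm(pixel) for zero-mean Gaussian noise n."""
    rng = random.Random(seed)
    mapped = [tm(pixel + rng.gauss(0.0, sigma)) for _ in range(trials)]
    return statistics.fmean(mapped) - tm(pixel)


if __name__ == "__main__":
    # Concave TM curve: the mean of the mapped noisy values falls below the clean value.
    print("bias after gamma TM:", noise_bias(0.2, 0.05))
    # Linear map: the estimated bias stays close to zero.
    print("bias after linear map:", noise_bias(0.2, 0.05, tm=lambda x: x))
```

With a concave curve such as this one, the estimated bias is negative, which is consistent with the abstract's point that filters built on a zero-mean assumption are mismatched to tone-mapped images.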

[Paper] CFA Handling and Quality Analysis for Compressive Light Field Camera

Kohei Sakai, Yasutaka Inagaki, Keita Takahashi, Toshiaki Fujii, Hajime ...

2021, Volume 9, Issue 1, p. 25-32
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.25

Journal, free access
Supplementary material

A light field can carry rich visual information of a real 3-D scene, leading to many attractive applications. However, the acquisition of a light field is challenging due to the large amount of data. In our previous work, we proposed an efficient method for this task using a coded-aperture camera with a convolutional neural network (CNN) which can computationally reconstruct a light field from several images acquired with different aperture patterns. In this work, we report two follow-up contributions to the previous work. First, we integrated a color filter array, which is common in RGB cameras, and the related color processing into the algorithm pipeline. This integration led to better reconstruction quality for color light fields. We then analyzed how the reconstruction quality obtained with our method was affected by the complexity of light fields. We also showed the possibility of using this analysis to predict the reconstruction quality from the acquired images.

[Paper] Droplet Formulation Method for Viscous Fluid Injection Considering the Effect of Liquid-Liquid Two-Phase Flow

Takuya Natsume, Masamichi Oishi, Marie Oshima, Nobuhiko Mukai

2021, Volume 9, Issue 1, p. 33-41
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.33

Journal, free access

The study of fluid analysis includes many examples of particle methods such as SPH and MPS. We have also performed a simulation and compared the results with those of a physical experiment in which viscous fluid was injected from a circular tube into a water tank in order to investigate the droplet formation process. Previous studies used an interfacial tension model that could consider the influence of the other phase in two-phase flow. However, the environment of the physical experiment differed from that of the simulation. Therefore, in this study we performed a viscous fluid injection simulation using the same environment as in the physical experiment. On the basis of the results, we have validated the proposed method by comparing the droplet size and the formation cycle between the physical experiment and the simulation.

[Paper] User-selectable Event Summarization in Unedited Raw Soccer Video via Multimodal Bidirectional LSTM

Tomoki Haruyama, Sho Takahashi, Takahiro Ogawa, Miki Haseyama

2021, Volume 9, Issue 1, p. 42-53
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.42

Journal, free access

A new method that generates user-selectable event summaries from unedited raw soccer videos is presented in this paper. Since there are more unedited raw soccer videos than broadcast or distributed soccer videos, and unedited videos have various viewers, it is necessary to analyze these videos to meet the demands of those viewers. The proposed method introduces a multimodal CNN-BiLSTM architecture for analyzing unedited raw soccer videos. This architecture extracts candidate scenes for event summarization from unedited soccer videos and classifies these scenes into typical events. Finally, our method generates user-selectable event summaries by simultaneously considering the importance of candidate scenes and the event classification results. Experimental results using real unedited raw soccer videos show the effectiveness of our method.

[Paper] Personalized Recommendation of Tumblr Posts Using Graph Convolutional Networks with Preference-aware Multimodal Features

Kazuma Ohtomo, Ryosuke Harakawa, Takahiro Ogawa, Miki Haseyama, Masahi ...

2021, Volume 9, Issue 1, p. 54-61
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.54

Journal, free access

Tumblr is a popular micro-blogging service on which users can share posts comprising text and images. This paper presents a method for personalizing post recommendations for each user from a large number of posts. Specifically, we develop a supervised multi-variational auto encoder considering user preference (SMVAE-UP). SMVAE-UP can extract relationships between text and image features by considering class information representing a user's preference for each post; thus, preference-aware multimodal features can be calculated. Furthermore, for each target user, a network that enables comparison between a user and posts in the same feature space is constructed using the preference-aware multimodal features and metadata on posts. By applying graph convolutional networks (GCNs) to the network constructed for each target user, an accurate recommendation matching each user's preferred posts becomes feasible. Experimental results for real-world datasets including six users and 99,844 posts show the effectiveness of our method.

Special Section on ITE Awards Selection

[Foreword] Welcome to the Special Section on ITE Awards Selection

Toshiaki Fujii

2021, Volume 9, Issue 1, p. 62
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.62

Journal, free access

[Invited Paper] Fast and Accurate Whole-Body Pose Estimation in the Wild and Its Applications

Jianfeng XU, Kazuyuki TASAKA, Masashi YAMAGUCHI

2021, Volume 9, Issue 1, p. 63-70
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.63

Journal, free access

Recently, multi-person pose estimation techniques have drawn significant attention in both academia and industry due to their great potential utility. Here, we present a fast and accurate in-the-wild whole-body pose estimation system. Our system detects not only body keypoints, but also foot and hand keypoints, with high accuracy in real time. Furthermore, we present two typical applications of our system, in the areas of fitness and speed climbing, and describe the optimization of these applications, which has yielded both computing acceleration and improved detection accuracy.

Regular Section

[Paper] A 2.5D Surface Structure Reconstruction Method using an Imaging System with Linear Sensor

Peng Wang, Jay Arre Toque, Ari Ide-Ektessabi

2021, Volume 9, Issue 1, p. 71-79
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.71

Journal, free access

In digital archiving of cultural heritage with uneven and textured surfaces, it is challenging for researchers to acquire as much information as possible without having to use multiple techniques or devices. In this paper, a photometric stereo method based on a line-sensor camera is proposed for high-resolution 2.5D surface shape reconstruction. Compared to an area-sensor camera, a line-sensor camera can acquire higher-resolution images with less geometric distortion, which can yield more accurate depth maps. A technique for estimating the lighting direction in such a system is described in detail for high-resolution surface shape reconstruction with improved efficiency. Experimental results yielded reconstructed depth maps comparable in accuracy with those of a conventional method such as a laser ranger. In addition to surface reconstruction, the images acquired are colorimetrically accurate. This means that the method can produce both stereoscopic and spectroscopic information for digitizing cultural heritage with textured surfaces.

[Paper] Autostereoscopic Displays with Time-multiplexed Directional Backlight Using Curved Lens Arrays

Garimagai Borjigin, Hideki Kakeya

2021, Volume 9, Issue 1, p. 80-85
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.80

Journal, free access

In this paper, we propose autostereoscopic displays with novel directional backlight designs. A high level of stereoscopic crosstalk is the main problem to be solved in conventional systems. To reduce crosstalk, we propose a directional backlight system that suppresses the effect of field curvature with only a single layer of curved lens array. It is confirmed that the crosstalk level is reduced notably by the proposed methods. The uniformity of backlight intensity is also increased by using a lens array composed of trapezoidal elemental lenses in place of rectangular elemental lenses.

[Paper] HEVC-based Light-field Coding using Basis Images and Frame Reordering

Kota Imaeda, Keita Takahashi, Toshiaki Fujii, Yukihiro Bandoh, Seishi ...

2021, Volume 9, Issue 1, p. 86-94
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.86

Journal, free access

An efficient coding method for light fields (LFs) is presented. The method is based on a sophisticated video coding standard called High Efficiency Video Coding (HEVC), but does not directly encode the LF images using the HEVC codec. Instead, the LF images are first transformed into a smaller set of images, called basis images, to remove the redundancies among the images. The basis images are then reordered to produce a temporally smooth video sequence, which is finally encoded using the HEVC codec. In the decoding process, the decoded frames are inversely transformed into the original LF. The first and final transformations are modeled using neural networks and optimized for the target LF. The frame reordering is formulated as a traveling salesman problem (TSP) and solved using a greedy method. The experimental results show that our method can achieve better rate-distortion performance than other HEVC-based light-field coding methods.
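
The frame-reordering step can be sketched as a nearest-neighbor tour, which is one common greedy heuristic for the TSP. The abstract does not state which greedy variant or distance measure the authors use, so the squared-difference distance, the fixed starting frame, and the names `frame_distance`, `greedy_order`, and `path_cost` are all assumptions made for illustration.

```python
def frame_distance(a, b):
    """Sum of squared differences between two frames given as flat lists (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_order(frames, start=0):
    """Nearest-neighbor ordering: repeatedly append the unvisited frame closest to the last one."""
    remaining = set(range(len(frames)))
    remaining.remove(start)
    order = [start]
    while remaining:
        last = frames[order[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(remaining, key=lambda i: (frame_distance(last, frames[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def path_cost(frames, order):
    """Total distance between consecutive frames in the given order."""
    return sum(frame_distance(frames[a], frames[b]) for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    # Toy one-pixel "frames"; real basis images would be flattened pixel arrays.
    frames = [[0.0], [10.0], [1.0], [9.0], [2.0]]
    order = greedy_order(frames)
    print(order, path_cost(frames, order))
```

A lower path cost means smaller changes between consecutive frames, which is the temporal smoothness that the abstract says the reordering aims for before HEVC encoding. A nearest-neighbor tour is not guaranteed to be optimal.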

[Paper] Sports Camera Calibration using Flexible Intersection Selection and Refinement

Hiroki Tsurusaki, Keisuke Nonaka, Ryosuke Watanabe, Tomoaki Konno, Sei ...

2021, Volume 9, Issue 1, p. 95-104
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.95

Journal, free access

Sports scene analysis is an important technology to quantify a player's action and visualize game statistics. To realize such technology, camera calibration is required to recognize the player's position from a video. In this paper, we propose an automatic camera calibration method by using intersection resorting and refinement. Our contributions are 1) flexible intersection selection and 2) intersection refinement to improve accuracy in calibration. A homography matrix is used to convert world coordinates to image coordinates for the calibration. Sports scenes can be estimated by using a priori information such as length and position of field lines and their intersections. Conventional methods using the field lines and intersections cannot realize sufficient calibration accuracy because the intersections are selected from the combination of horizontal and vertical lines. Moreover, displacement at the intersections occurs between the detected position and the real one on the input image. Our proposed method can solve these problems by flexible intersection selection and refinement. As a result, a player's position in the real world is identified from the video by using the estimated homography matrix. Our experimental results show that the proposed method achieves higher accuracy than conventional methods.
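
The role of the homography matrix in this abstract can be shown with a short sketch of the standard planar projective mapping from field (world) coordinates to image coordinates. It does not reproduce the paper's intersection selection or refinement. The function name `world_to_image` and the example matrices are hypothetical.

```python
def world_to_image(H, point):
    """Map a 2-D world point to image coordinates with a 3x3 homography H (nested lists)."""
    x, y = point
    # Multiply H by the homogeneous point (x, y, 1).
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to 2-D image coordinates.
    return (u / w, v / w)


if __name__ == "__main__":
    # Made-up matrix: scale by 2 and shift by (10, 20); not an estimated calibration.
    H = [[2.0, 0.0, 10.0],
         [0.0, 2.0, 20.0],
         [0.0, 0.0, 1.0]]
    print(world_to_image(H, (3.0, 4.0)))
```

Mapping an image position back to the field, as needed to locate a player in the real world, would use the inverse of the same matrix.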

[Paper] Holographic Video Communication System Realizing Virtual Image Projection and Frontal Image Capture

Shinji Kimura, Yuji Aburakawa, Fumiaki Watanabe, Shiho Torashima, Shun ...

2021, Volume 9, Issue 1, p. 105-112
Published: 2021
Released: 2021/01/01

DOI: https://doi.org/10.3169/mta.9.105

Journal, free access

Online video communication systems are widely used by users in remote locations, but the communication quality is still inferior to face-to-face communication. To enhance the quality of video communication systems, it is necessary to provide a strong sense of the users' presence and to realize eye contact for better non-verbal communication. The technologies that realize virtual image projection and frontal image capture are promising for such purposes. However, conventional systems require bulky display screens. Thus, we proposed a thinner system using a holographic optical element (HOE), which was utilized as a transparent off-axis mirror and helped increase the flexibility of the system configuration, thus reducing the depth of the space in front. To verify the feasibility of the proposed system, we established a proof-of-concept system with dispersion compensation optics and a full-color HOE, and the system simultaneously realized virtual image projection and frontal image capture.

