ITE Transactions on Media Technology and Applications

Special Section on Multimedia Content Analysis

[Foreword] Welcome to the Special Section on Multimedia Content Analysis in the Transactions on Media Technology and Applications

Shin'ichi Satoh

2013, Vol. 1, No. 2, p. 90
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.90

Journal (free access)

[Invited Paper] Content Analysis for Home Videos

Naoko Nitta, Noboru Babaguchi

2013, Vol. 1, No. 2, p. 91-100
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.91

Journal (free access)

The popularity of hand-held video camcorders has increased the amount of poor-quality home video captured by amateur users. This paper introduces content analysis techniques, namely techniques for segmentation, indexing, and static and dynamic representation generation, that take the characteristics of home videos into account and have been developed to help viewers watch such poor-quality videos.

[Invited Paper] Explain This to Me!

A Study on Automatic Recompilation of Broadcast News Video

Ichiro Ide, Frank Nack

2013, Vol. 1, No. 2, p. 101-117
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.101

Journal (free access)

This paper addresses a framework that facilitates the semi-automated authoring of already edited news stories available in the repository of a news corporation or a public broadcast video archive. The newly generated video explains the chronological development of a current event, such as the resignation of a Prime Minister. The aim is to provide a journalist with an audio-visual body on the basis of which he/she can finalize the explanatory piece. The framework introduces techniques that exploit demoscopic data in the form of polls for the development of the general story outline; the automatic retrieval of relevant material by using a combination of event templates and automatic news summarization over topic threads; and the generation of the final video by applying a set of trimming rules. Example generations are presented and discussed, along with an outline of future work.

[Paper] Using Trajectory Features to Recognize Human Actions Within Crowd Sequences of Real Surveillance Video

Masaki Takahashi, Masahide Naemura, Mahito Fujii, Shin'ichi Satoh

2013, Vol. 1, No. 2, p. 118-126
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.118

Journal (free access)

There is growing interest in automatic recognition of human actions in video sequences shot by surveillance cameras. However, it is difficult to analyze human actions in real environments: almost all current techniques can only detect simple actions in video sequences of controlled environments. We propose action recognition methods based on multiple trajectories that can identify human actions within crowd sequences of real surveillance video. The methods use novel techniques for detecting diverse actions: a motion-speed-invariant feature descriptor made from a key-point trajectory, and weighting and clustering of the trajectory features. We conducted several experiments on the proposed methods, using our previously proposed single-trajectory method as a baseline and the dataset of the TRECVID Surveillance Event Detection task. Through an analysis of these experimental results, we discuss how to select the proper method to detect actions in crowd situations.

[Paper] Inferring Segmentation Label and Color Distribution in a Unified Framework using Global Constraints

Viet-Quoc Pham, Keita Takahashi, Takeshi Naemura

2013, Vol. 1, No. 2, p. 127-137
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.127

Journal (free access)

In this paper, we propose a unified framework for inferring the segmentation label and color distribution of an image region of interest. Recent studies have shown that segmentation with global consistency measures outperforms conventional techniques based on pixelwise measures. However, such global approaches require a precise input distribution to obtain a correct extraction. To relax this strict requirement, we propose a new approach in which the given reference distribution plays a guiding role in inferring the latent distribution and its consistent region. The inference is based on the assumption that the latent distribution resembles the distribution of the consistent region but is distinct from the distribution of the complement region. We state the problem as the minimization of an energy function consisting of global similarities and implement an iterative scheme for jointly optimizing distribution and segmentation. Extensive experimental results demonstrate the advantages of our approach on various segmentation problems.

[Paper] Relative-Distance-Based Soft Voting for Human Attribute Analysis using Top-View Images

Toshihiko Yamasaki, Tsuhan Chen

2013, Vol. 1, No. 2, p. 138-147
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.138

Journal (free access)

This paper proposes a soft-voting-based bag-of-features (BoF) model that considers the relative distance of the feature vectors to the nearest-neighbor codeword. The proposed method is more efficient than the kernel-distance-based soft voting method, which requires brute-force parameter optimization. The proposed algorithm is applied to human attribute analysis using top-view images and to conventional object classification. The experimental results for the human attribute analysis demonstrated 100% accuracy for both gender classification and bag possession status classification. They also demonstrated that the discriminative ability is comparable to that of the fine-tuned kernel-distance-based soft voting method.

[Paper] Aerial Image Matching using Adaptive Selection of Orientation Code Image Pyramids

Mitsuhiro Ishimaru, Makoto Sato

2013, Vol. 1, No. 2, p. 148-156
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.148

Journal (free access)

A new pyramidal approach for aerial image matching is proposed. Challenges associated with aerial imagery, such as its complexity and diversity, variations with time, and large data size, have led to the exploration of various techniques. One method uses orientation code matching, which together with a pyramidal approach can achieve efficient wide-area aerial image matching. However, as the pyramid levels deepen, the matching success rate tends to decrease. To avoid this problem, we classify aerial imagery broadly into two types of scenes and define a different pyramid generation method appropriate to each type. The proposed technique produces two orientation code pyramids, from which the appropriate one can be selected adaptively, so that robust and efficient matching can be obtained for either type of scene. Experimental results obtained using both urban and mountainous scenes demonstrate that the matching success rate at the upper pyramid levels is superior to that obtained when using only one generation method.

[Paper] Scene Detection Using a Large Number of Text Features

Ichiro Yamada, Yohei Nakada, Atsushi Matsui, Takashi Matsumoto, Kikuka ...

2013, Vol. 1, No. 2, p. 157-166
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.157

Journal (free access)

Broadcasting stations store a large volume of TV programs and manage them in their archives. To enable such programs to be used effectively, techniques for analyzing what is depicted in each scene play a crucial role. TV programs often contain typical scenes that are used for specific purposes. This paper proposes a novel method of detecting such typical scenes by analyzing the context of closed captions. The proposed method handles a huge number of text features extracted from the closed captions through its use of a Monte Carlo based boosting algorithm. In experiments, we classified text segments extracted from the closed captions according to whether or not the corresponding scene is a typical one. The results confirmed that our method achieved accuracy comparable to that of a conventional method using the AdaBoost algorithm, with a dramatic reduction in learning time.

Regular Section

[Paper] Service Differentiation Based Incentive Mechanism for P2P Streaming in Hybrid Overlay Network

Suphakit Awiphan, Zhou Su, Jiro Katto

2013, Vol. 1, No. 2, p. 167-177
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.167

Journal (free access)

In Peer-to-Peer (P2P) networks, an incentive mechanism is a necessary component for dealing with free-riding behavior. The challenge is that direct reciprocal incentives (e.g., tit-for-tat), which consider the cooperation of peers in a pair-wise manner, are not well suited to P2P streaming. In this paper, we propose a new service differentiation mechanism to provide a redistribution incentive for P2P streaming in a hybrid overlay network. The contribution of a peer can be measured from the number of video sub-streams that it uploads to other peers. The number of sub-streams that each peer can retrieve by sending one request message varies with its contribution level. An altruistic peer thus has to send fewer request messages and will experience smoother video quality than a selfish peer. Through simulations, we demonstrate that our solution can provide service differentiation among peers with better streaming quality than the tit-for-tat scheme.

[Paper] Quality Estimation Method for Fractal Compressed Images

Megumi Takezawa, Hirofumi Sanada, Miki Haseyama

2013, Vol. 1, No. 2, p. 178-183
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.178

Journal (free access)

A method for estimating the quality of images compressed by fractal image compression is presented in this paper. Fractal image compression based on an iterated function system is one of the compression techniques for digital images. It utilizes the self-similarity of images and achieves high image-compression performance. However, fractal image compression is currently not in widespread use because it does not necessarily provide high-quality compressed images, and whether a given image is unsuitable for fractal image compression cannot be determined without encoding it. Therefore, in this paper, we propose a new criterion for estimating the suitability of fractal image compression for a given image. By using the proposed criterion, we can estimate the quality of the compressed image in a short time without actually encoding the image.

[Paper] A CMOS Optoelectronic Neural Interface Device Based on an Image Sensor with On-chip Light Stimulation and Extracellular Neural Signal Recording for Optogenetics

Yosmongkol Sawadsaringkarn, Tomoaki Miyatani, Toshihiko Noda, Kiyotaka ...

2013, Vol. 1, No. 2, p. 184-189
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.184

Journal (free access)

A CMOS-based optoelectronic device is proposed for on-chip neural stimulation and observation with optogenetic methodology. The device is capable of local light delivery for stimulation and electrical neural signal recording. The device consists of an array of InGaN light emitting diodes (LEDs) and Au stacked bump electrodes integrated on a CMOS image sensor. Capabilities of on-chip light stimulation and signal recording were quantitatively characterized. We have also confirmed that neuron-like cells can be cultured on the surface of the device.

[Paper] Semantic Concept Detection based on Spatial Pyramid Matching and Semi-supervised Learning

Yoshihiko Kawai, Mahito Fujii

2013, Vol. 1, No. 2, p. 190-198
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.190

Journal (free access)

Analyzing video for semantic content is very important for finding a desired video among a huge amount of accumulated video data. One conventional method for detecting objects depicted in video is the bag-of-visual-words method, which is based on local feature occurrence frequencies. We propose a method that improves on the detection accuracy of the traditional method by dividing video frames into overlapping sub-regions of various sizes. The method computes local and global features for each of these sub-regions to reflect spatial positioning in the feature vectors. These changes make the method robust to variations in the size and position of objects appearing in the video. We also propose a training framework based on semi-supervised learning that uses a small number of labeled data points as a starting point and generates additional labeled training data efficiently, with few errors. Experiments using a video data set confirmed improved detection accuracy over earlier methods.

[Letter] Pseudo-haptic Sensation Elicited by Background Visual Motion

Junji Watanabe

2013, Vol. 1, No. 2, p. 199-202
Published: 2013
Released: 2013/04/01

DOI: https://doi.org/10.3169/mta.1.199

Journal (free access)

Haptic interaction techniques based on visual feedback have been proposed. The primary principle for presenting pseudo-haptic sensations is to change the ratio between the displacement of the user's input and the visual displacement of its indicator. Here I report that a pseudo-haptic sensation can also be produced simply by modulating the speed of background visual images, without changing the movement of the visual indicator itself. In this work, I performed two psychophysical experiments to study the dominant parameters for generating the pseudo-haptic sensation elicited by background visual motion.
