The popularity of hand-held camcorders has increased the number of poor-quality home videos captured by amateur users. This paper introduces content analysis techniques that take the characteristics of home videos into account: techniques for segmentation, indexing, and static and dynamic representation generation. They have been developed to help viewers watch such poor-quality videos.
This paper presents a framework for the semi-automated authoring of new videos from already edited news stories held in the repository of a news corporation or a public broadcast video archive. The newly generated video explains the chronological development of a current event, such as the resignation of a Prime Minister. The aim is to give a journalist audio-visual material on which he/she can base the final explanatory piece. The framework introduces three techniques. Demoscopic data in the form of polls is exploited to develop the general story outline. Relevant material is retrieved automatically by combining event templates with automatic news summarization over topic threads. The final video is generated by applying a set of trimming rules. Example generations are presented and discussed, and future work is outlined.
There is growing interest in the automatic recognition of human actions in video sequences shot by surveillance cameras. However, analyzing human actions in real environments is difficult, and almost all current techniques can detect only simple actions in video sequences recorded in controlled environments. We propose action recognition methods based on multiple trajectories that can identify human actions in crowded scenes from real surveillance video. The methods use two novel techniques for detecting diverse actions. The first is a motion-speed-invariant feature descriptor built from a key-point trajectory. The second is weighting and clustering of the trajectory features. We evaluated the proposed methods in several experiments on the TRECVID Surveillance Event Detection dataset, using our previously proposed single-trajectory method as the baseline. Based on an analysis of these results, we discuss how to select the appropriate method for detecting actions in crowded situations.
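The abstract does not define the descriptor. The sketch below shows one common way to make a key-point trajectory independent of motion speed, assuming 2-D image coordinates. The trajectory is resampled at equal arc-length intervals instead of equal time steps, and each step is then encoded as a unit direction vector. The function names and the sample count are hypothetical, and this is not presented as the authors' descriptor.

```python
import math


def resample_by_arc_length(points, n_samples):
    """Resample a 2-D trajectory to n_samples points equally spaced along its path."""
    cum = [0.0]  # cumulative arc length at each original point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:  # stationary key point: no direction information
        return [points[0]] * n_samples
    out, j = [], 0
    for i in range(n_samples):
        target = total * i / (n_samples - 1)
        # Move forward to the segment [j, j + 1] that contains the target arc length.
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0.0 else min((target - cum[j]) / seg, 1.0)
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def speed_invariant_descriptor(points, n_samples=9):
    """Hypothetical descriptor: unit step directions of the arc-length-resampled path."""
    pts = resample_by_arc_length(points, n_samples)
    desc = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        if step == 0.0:
            desc.extend([0.0, 0.0])
        else:
            desc.extend([(x1 - x0) / step, (y1 - y0) / step])
    return desc
```

Sampling depends only on the path geometry. As a result, the same path traced slowly (many frames) or quickly (few frames) yields the same descriptor.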
In this paper, we propose a unified framework for inferring both the segmentation label and the color distribution of an image region of interest. Recent studies have shown that segmentation with global consistency measures outperforms conventional techniques based on pixelwise measures. However, such global approaches require a precise input distribution to extract the region correctly. To relax this strict assumption, we propose an approach in which a given reference distribution guides the inference of the latent distribution and its consistent region. The inference assumes that the latent distribution resembles the distribution of the consistent region but differs from that of the complementary region. We formulate the problem as the minimization of an energy function consisting of global similarity terms, and implement an iterative scheme that jointly optimizes the distribution and the segmentation. Extensive experimental results demonstrate the advantages of our approach on various segmentation problems.
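The following is a deliberately simplified sketch of the alternating idea, assuming intensities in [0, 1] quantized into histogram bins. Each pixel is labeled by comparing the latent histogram with the complement histogram, which is a pixelwise rule standing in for the paper's global-similarity energy. The latent distribution is then re-estimated as a blend of the reference and the current region's histogram. The blend weight `alpha` and all function names are hypothetical.

```python
def histogram(values, n_bins, eps=1e-6):
    """Normalized histogram of values in [0, 1]; eps avoids empty bins."""
    counts = [eps] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def joint_segment(pixels, ref_hist, alpha=0.5, max_iters=20):
    """Alternate between labeling pixels and re-estimating the latent distribution."""
    n_bins = len(ref_hist)
    bin_of = [min(int(v * n_bins), n_bins - 1) for v in pixels]
    latent = list(ref_hist)              # start from the (imprecise) reference
    comp = histogram(pixels, n_bins)     # initial complement model: the whole image
    labels = None
    for _ in range(max_iters):
        # Segmentation step: a pixel joins the region if the latent model explains it better.
        new_labels = [latent[b] > comp[b] for b in bin_of]
        if new_labels == labels:         # converged: labels no longer change
            break
        labels = new_labels
        inside = [v for v, lab in zip(pixels, labels) if lab]
        outside = [v for v, lab in zip(pixels, labels) if not lab]
        # Distribution step: the reference guides, the current region refines.
        region_hist = histogram(inside, n_bins)
        latent = [alpha * r + (1 - alpha) * h for r, h in zip(ref_hist, region_hist)]
        comp = histogram(outside, n_bins)
    return labels, latent
```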
This paper proposes a soft-voting bag-of-features (BoF) model that considers the distance of each feature vector relative to its nearest-neighbor codeword. The proposed method is more efficient than the kernel-distance-based soft voting method, which requires brute-force parameter optimization. The proposed algorithm is applied to human attribute analysis using top-view images and to conventional object classification. In the human attribute analysis experiments, it achieved 100% accuracy for both gender classification and bag-possession classification. Its discriminative ability was also shown to be comparable to that of the fine-tuned kernel-distance-based soft voting method.
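The abstract does not give the weighting rule, so the sketch below uses an assumed one. Each feature votes for every codeword whose distance is within `ratio_threshold` times the nearest-codeword distance, with weight `d_min / d_k`, and its votes are normalized to sum to one. Because the weights depend only on distance ratios, no kernel bandwidth has to be tuned. This is an illustrative rule, not necessarily the authors' formulation. The threshold value and function name are hypothetical.

```python
import math


def soft_vote_histogram(features, codebook, ratio_threshold=1.5):
    """Hypothetical relative-distance soft voting for a bag-of-features histogram."""
    hist = [0.0] * len(codebook)
    for f in features:
        dists = [math.dist(f, c) for c in codebook]
        d_min = min(dists)
        if d_min == 0.0:
            # Feature coincides with a codeword: give it the whole vote.
            hist[dists.index(0.0)] += 1.0
            continue
        # Vote only for codewords not much farther than the nearest one.
        weights = {k: d_min / d for k, d in enumerate(dists) if d / d_min <= ratio_threshold}
        total = sum(weights.values())
        for k, w in weights.items():
            hist[k] += w / total  # each feature contributes a total vote of 1
    n = len(features)
    return [h / n for h in hist] if n else hist
```

A feature far closer to one codeword degenerates to a hard vote. A feature lying between two codewords splits its vote.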
A new pyramidal approach for aerial image matching is proposed. Challenges associated with aerial imagery, such as scene complexity and diversity, variation over time, and large data size, have led to the exploration of various techniques. One such technique is orientation code matching, which, combined with a pyramidal approach, can achieve efficient wide-area aerial image matching. However, the matching success rate tends to decrease as the number of pyramid levels increases. To avoid this problem, we broadly classify aerial imagery into two types of scenes and define a pyramid generation method suited to each. The proposed technique produces two orientation code pyramids, from which the appropriate one can be selected adaptively, so that matching remains robust and efficient regardless of the scene type. Experimental results on both urban and mountainous scenes show that the matching success rate at the upper pyramid levels is higher than that obtained with a single generation method.
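As background, here is a minimal single-level orientation code matching sketch on grayscale images given as lists of rows. Gradient directions are quantized into `n_codes` codes. Low-contrast pixels get the special code `n_codes`, and the best position minimizes the mean cyclic code difference. The gradient threshold and the penalty for comparing a valid code with a low-contrast code are assumptions. The scene-adaptive pyramid generation proposed in the paper is not reproduced; `downsample` shows only plain 2x2 averaging.

```python
import math


def orientation_codes(img, n_codes=16, grad_threshold=10.0):
    """Quantized gradient direction per pixel; n_codes marks low-contrast or border pixels."""
    h, w = len(img), len(img[0])
    step = 2 * math.pi / n_codes
    codes = [[n_codes] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            if abs(gx) + abs(gy) > grad_threshold:
                theta = math.atan2(gy, gx) % (2 * math.pi)
                codes[y][x] = int(theta / step) % n_codes
    return codes


def code_distance(a, b, n_codes=16):
    """Cyclic difference between codes; the low-contrast penalty is an assumed value."""
    if a == n_codes and b == n_codes:
        return 0
    if a == n_codes or b == n_codes:
        return n_codes // 2  # assumption: treat as the largest cyclic difference
    diff = abs(a - b)
    return min(diff, n_codes - diff)


def mean_code_distance(codes_img, codes_tpl, oy, ox, n_codes=16):
    """Average code distance over the template interior (its border has no valid code)."""
    th, tw = len(codes_tpl), len(codes_tpl[0])
    total, count = 0, 0
    for y in range(1, th - 1):
        for x in range(1, tw - 1):
            total += code_distance(codes_img[oy + y][ox + x], codes_tpl[y][x], n_codes)
            count += 1
    return total / count


def match(img, tpl, n_codes=16, grad_threshold=10.0):
    """Exhaustive search; returns the (row, col) of the best template position."""
    ci = orientation_codes(img, n_codes, grad_threshold)
    ct = orientation_codes(tpl, n_codes, grad_threshold)
    best = None
    for oy in range(len(img) - len(tpl) + 1):
        for ox in range(len(img[0]) - len(tpl[0]) + 1):
            d = mean_code_distance(ci, ct, oy, ox, n_codes)
            if best is None or d < best[0]:
                best = (d, oy, ox)
    return best[1], best[2]


def downsample(img):
    """One conventional pyramid level: 2x2 block averaging."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]
```

Orientation codes depend only on gradient direction, so a uniform brightness change in the template does not affect the match.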
Broadcasting stations store and manage a large volume of TV programs in their archives. To use these programs effectively, techniques for analyzing what is depicted in each scene play a crucial role. TV programs often contain typical scenes that serve specific purposes. This paper proposes a novel method for detecting such typical scenes by analyzing the context of closed captions. The proposed method uses a Monte Carlo-based boosting algorithm to handle the huge number of text features extracted from the closed captions. In experiments, we classified text segments extracted from closed captions according to whether the corresponding scene is a typical one. The results confirmed that our method achieved classification accuracy comparable to that of a conventional AdaBoost-based method while dramatically reducing the learning time.
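The abstract does not specify the Monte Carlo component. One plausible reading is sketched below: AdaBoost with decision stumps on binary text features, where each round evaluates only a random subset of `n_sampled` features instead of all of them. This cuts the per-round cost when the feature set is huge. The sampling scheme, parameter names, and function names are hypothetical.

```python
import math
import random


def train_sampled_adaboost(X, y, n_rounds=10, n_sampled=50, seed=0):
    """AdaBoost with stumps on binary features, searching a random feature subset per round.

    X: list of 0/1 feature lists; y: list of labels in {+1, -1}.
    Returns a list of (alpha, feature index, polarity) weak learners.
    """
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        # Monte Carlo step (assumed): only a random subset of features is evaluated.
        candidates = rng.sample(range(n_features), min(n_sampled, n_features))
        best = None
        for j in candidates:
            for polarity in (1, -1):
                # Stump predicts `polarity` if feature j is present, `-polarity` otherwise.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Re-weight: misclassified samples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * (polarity if xi[j] else -polarity))
             for xi, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model


def predict(model, x):
    """Sign of the weighted stump votes (ties go to +1)."""
    score = sum(alpha * (polarity if x[j] else -polarity) for alpha, j, polarity in model)
    return 1 if score >= 0 else -1
```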
In peer-to-peer (P2P) networks, an incentive mechanism is needed to deal with free-riding behavior. The challenge is that direct reciprocal incentives such as tit-for-tat, which consider the cooperation of peers in a pairwise manner, are not suited to P2P streaming. In this paper, we propose a new service differentiation mechanism that provides a redistribution incentive for P2P streaming in a hybrid overlay network. The contribution of a peer is measured by the number of video sub-streams that it uploads to other peers. The number of sub-streams that a peer can retrieve with one request message varies with its contribution level. An altruistic peer therefore needs to send fewer request messages and experiences smoother video quality than a selfish peer. Through simulations, we demonstrate that our solution provides service differentiation among peers with better streaming quality than the tit-for-tat scheme.
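The incentive can be illustrated with a toy model, assuming a hypothetical linear mapping from a peer's contribution (uploaded sub-streams) to the number of sub-streams granted per request message. The paper's actual mapping is not given in the abstract, and `total_substreams` and `max_contribution` are illustrative parameters.

```python
import math


def substreams_per_request(contribution, total_substreams, max_contribution):
    """Hypothetical linear mapping: more uploading earns more sub-streams per request."""
    ratio = min(max(contribution / max_contribution, 0.0), 1.0)
    return max(1, round(ratio * total_substreams))  # every peer gets at least one


def requests_needed(contribution, total_substreams=8, max_contribution=8):
    """Request messages a peer must send to retrieve all sub-streams."""
    per_request = substreams_per_request(contribution, total_substreams, max_contribution)
    return math.ceil(total_substreams / per_request)
```

Under this model a fully contributing peer needs one request for all eight sub-streams, while a free rider needs eight.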
This paper presents a method for estimating the quality of images compressed by fractal image compression. Fractal image compression, which is based on an iterated function system, is a compression technique for digital images that exploits image self-similarity and achieves high compression performance. However, it is not currently in widespread use because it does not always yield high-quality compressed images. Moreover, whether a given image is unsuitable for fractal compression cannot be determined without encoding it. We therefore propose a new criterion for estimating how suitable a given image is for fractal image compression. With this criterion, the quality of the compressed image can be estimated quickly without actually encoding the image.
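To show why encoding is expensive, here is a minimal sketch of the standard partitioned-IFS block search, not the paper's criterion. For each range block, every downsampled domain block is tried, and a least-squares contrast `s` and brightness `o` with `s * d + o ≈ r` are fitted. Contractivity constraints on `s` and the eight block isometries are omitted for brevity, and the function names are hypothetical.

```python
def fit_affine(domain, rng_block):
    """Least-squares s, o minimizing sum((s*d + o - r)^2); returns (s, o, squared error)."""
    n = len(domain)
    sd, sr = sum(domain), sum(rng_block)
    sdd = sum(d * d for d in domain)
    sdr = sum(d * r for d, r in zip(domain, rng_block))
    denom = n * sdd - sd * sd
    s = 0.0 if denom == 0 else (n * sdr - sd * sr) / denom  # flat domain: brightness only
    o = (sr - s * sd) / n
    err = sum((s * d + o - r) ** 2 for d, r in zip(domain, rng_block))
    return s, o, err


def block(img, y, x, size):
    """Flattened size x size block with top-left corner (y, x)."""
    return [img[y + i][x + j] for i in range(size) for j in range(size)]


def shrink(img, y, x, size):
    """Average a 2*size square at (y, x) down to size x size (flattened)."""
    return [(img[y + 2 * i][x + 2 * j] + img[y + 2 * i][x + 2 * j + 1]
             + img[y + 2 * i + 1][x + 2 * j] + img[y + 2 * i + 1][x + 2 * j + 1]) / 4.0
            for i in range(size) for j in range(size)]


def encode_range_block(img, ry, rx, size):
    """Search all domain blocks (step = size); returns (err, dy, dx, s, o) of the best."""
    target = block(img, ry, rx, size)
    h, w = len(img), len(img[0])
    best = None
    for dy in range(0, h - 2 * size + 1, size):
        for dx in range(0, w - 2 * size + 1, size):
            s, o, err = fit_affine(shrink(img, dy, dx, size), target)
            if best is None or err < best[0]:
                best = (err, dy, dx, s, o)
    return best
```

This search runs once per range block, so its cost grows with both the number of range blocks and the number of domain blocks. A criterion that predicts quality without it avoids that cost.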
A CMOS-based optoelectronic device is proposed for on-chip neural stimulation and observation using optogenetic methods. The device can deliver light locally for stimulation and record electrical neural signals. It consists of an array of InGaN light-emitting diodes (LEDs) and Au stacked bump electrodes integrated on a CMOS image sensor. Its on-chip light stimulation and signal recording capabilities were characterized quantitatively. We also confirmed that neuron-like cells can be cultured on the surface of the device.
Analyzing video for semantic content is very important for finding desired videos in huge amounts of accumulated video data. One conventional method for detecting objects depicted in video is the bag-of-visual-words method, which is based on local feature occurrence frequencies. We propose a method that improves on the detection accuracy of this traditional method by dividing video frames into overlapping sub-regions of various sizes. The method computes local and global features for each sub-region so that spatial position is reflected in the feature vectors. These changes make the method robust to variations in the size and position of objects appearing in the video. We also propose a training framework based on semi-supervised learning that starts from a small number of labeled data points and efficiently generates additional labeled training data with few errors. Experiments on a video data set confirmed improved detection accuracy over earlier methods.
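A minimal sketch of the sub-region idea is shown below. It assumes key points have already been assigned visual-word IDs. The frame is covered by overlapping windows at several scales, and the normalized bag-of-visual-words histograms of all windows are concatenated. Only the local-feature histogram is shown; the per-region global features mentioned above are omitted. The `scales` and `overlap` values and the function names are hypothetical, and `overlap` must be below 1.

```python
def overlapped_regions(width, height, scales=(1.0, 0.5), overlap=0.5):
    """(x0, y0, x1, y1) windows of several relative sizes with the given overlap (< 1)."""
    regions = []
    for s in scales:
        rw, rh = width * s, height * s
        step_x, step_y = rw * (1 - overlap), rh * (1 - overlap)
        y0 = 0.0
        while y0 + rh <= height + 1e-9:
            x0 = 0.0
            while x0 + rw <= width + 1e-9:
                regions.append((x0, y0, x0 + rw, y0 + rh))
                x0 += step_x
            y0 += step_y
    return regions


def spatial_bovw(keypoints, vocab_size, width, height, scales=(1.0, 0.5), overlap=0.5):
    """Concatenate per-region visual-word histograms; keypoints are (x, y, word_id)."""
    feature = []
    for x0, y0, x1, y1 in overlapped_regions(width, height, scales, overlap):
        hist = [0] * vocab_size
        for x, y, word in keypoints:
            if x0 <= x < x1 and y0 <= y < y1:
                hist[word] += 1
        n = sum(hist)
        feature.extend([h / n for h in hist] if n else [0.0] * vocab_size)
    return feature
```

With the defaults, the whole frame plus a 3x3 grid of half-size windows gives 10 regions. The same object therefore contributes to several histograms at different positions and scales.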
Haptic interaction techniques based on visual feedback have been proposed. The primary principle for presenting pseudo-haptic sensations is to change the ratio between the displacement of the user's input and the visual displacement of its indicator. Here I report that a pseudo-haptic sensation can also be produced simply by modulating the speed of background visual images, without changing the movement of the visual indicator itself. In this work, I performed two psychophysical experiments to identify the dominant parameters for generating the pseudo-haptic sensation elicited by background visual motion.
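The two manipulations can be contrasted in a toy one-dimensional model. In the conventional one, the indicator moves by the input displacement divided by a control-display ratio. In the background one, the indicator follows the input 1:1 while the background moves by a gain times the input. The assumption that perceived motion is the indicator's displacement relative to the background, and the name `background_gain`, are hypothetical and are not claims from the abstract.

```python
def cd_ratio_display(input_dx, cd_ratio):
    """Conventional pseudo-haptics: the indicator moves input_dx / cd_ratio on screen."""
    return input_dx / cd_ratio


def background_relative_motion(input_dx, background_gain):
    """Indicator moves 1:1 with the input; the background moves background_gain * input_dx.

    Returns the indicator's displacement relative to the background (assumed percept).
    """
    indicator_dx = input_dx
    background_dx = background_gain * input_dx
    return indicator_dx - background_dx
```

Under this assumption, a background gain of 0.5 yields the same relative displacement as a control-display ratio of 2, even though the indicator itself is unchanged.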