In lectures that use blackboard writing, the timing of students' behaviors (note-taking and listening) plays an important role in evaluating their understanding of, and interest in, the given content. In this report, we discuss embedding movies in handwritten notes using a wearable camera in order to analyze the sequence of students' behaviors. Concretely, we describe image processing methods for classifying students' behaviors and for detecting the pen position in the movie recorded by the wearable camera. Moreover, we describe a method for associating the movie with the contents of the notes based on time.
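Time-based correspondence between notes and movie can be sketched as a nearest-timestamp lookup. This is a minimal sketch, not the authors' method. It assumes that each pen stroke and each wearable-camera frame carries a timestamp on a common clock. The names `nearest_frame` and `link_strokes_to_frames` are hypothetical.

```python
import bisect

def nearest_frame(frame_times, t):
    """Return the index of the frame whose timestamp is closest to t.

    frame_times must be non-empty and sorted in ascending order (seconds).
    Ties go to the earlier frame.
    """
    i = bisect.bisect_left(frame_times, t)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    before, after = frame_times[i - 1], frame_times[i]
    return i if after - t < t - before else i - 1

def link_strokes_to_frames(stroke_times, frame_times):
    """Map each (hypothetical) pen-stroke timestamp to its nearest frame index."""
    return [nearest_frame(frame_times, t) for t in stroke_times]
```

For example, with frames at 0.0, 0.5, 1.0 and 1.5 s, strokes at 0.7, 1.25 and 9.0 s map to frames 1, 2 and 3. Binary search keeps each lookup logarithmic in the number of frames.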
This paper describes several methods for 3D measurement from a single uncalibrated image using various geometric clues in a real scene. First, we take a specified plane in real space as the base plane (reference plane) and estimate the camera projection matrix using two image points whose actual heights above the reference plane are known. Second, based on the projection matrix, we estimate a new plane relative to the reference plane, whose parameters are already determined. Specifically, we propose two new methods for estimating a plane that intersects the reference plane at an arbitrary angle: (i) a method using an imaged circle and the imaged center of that circle, both lying on the plane, and (ii) a method using two imaged circles that both lie on the plane. Furthermore, we show how to extract other 3D information related to the reference plane from the single image. Finally, we verify the validity and effectiveness of our methods through experiments.
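A projection matrix relates image points to the reference plane in a standard way. This sketch assumes that a 3x4 projection matrix P is already known and that the reference plane is Z = 0 in world coordinates. It does not estimate P from two points of known height as the paper does. For points on Z = 0, P reduces to a 3x3 homography made of its first, second and fourth columns, so image points on that plane can be back-projected. All function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve a @ x = b for a 3x3 system using Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("singular system")
    x = []
    for k in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][k] = b[r]  # replace column k with the right-hand side
        x.append(det3(m) / d)
    return x

def project(P, X):
    """Project a 3D point X = (X, Y, Z) with a 3x4 camera matrix P."""
    Xh = [X[0], X[1], X[2], 1.0]
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return u / w, v / w

def image_to_reference_plane(P, u, v):
    """Back-project image point (u, v) onto the reference plane Z = 0."""
    H = [[P[r][0], P[r][1], P[r][3]] for r in range(3)]  # columns 1, 2, 4
    X, Y, W = solve3(H, [u, v, 1.0])
    return X / W, Y / W
```

Projecting a point on the plane and back-projecting it returns the original (X, Y), as long as the plane does not pass through the camera center (H nonsingular).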
In recent years, various methods have been reported for measuring human body motion with Kinect as a markerless motion capture device. The main proposals in these reports are techniques that enable measurement from all directions, and the main solution is to use two or more Kinects. In this case, it is necessary to estimate which Kinect's position data is the most accurate and to reconstruct the position data from them. This paper describes the construction of a motion capture system using two Kinects, a method for coordinate conversion between the Kinects, and a method for reconstructing position data from them.
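Coordinate conversion between two Kinects can be sketched as a rigid transform estimated from corresponding joint positions. This is not necessarily the paper's method. It assumes both sensors stand level, so the two frames differ only by a rotation about the vertical (y) axis plus a translation. Under that assumption, the least-squares yaw angle has a closed form from the cross and dot products of the centered x-z coordinates. The function names are hypothetical.

```python
import math

def apply_yaw_transform(p, theta, t):
    """Rotate p in the x-z plane (about the vertical y axis) by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * z + t[0], y + t[1], s * x + c * z + t[2])

def estimate_yaw_transform(src, dst):
    """Least-squares yaw rotation plus translation mapping src onto dst.

    src, dst: lists of corresponding (x, y, z) joint positions measured by
    the two Kinects. Assumes both sensors are level (rotation only about y).
    """
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]  # source centroid
    cd = [sum(p[i] for p in dst) / n for i in range(3)]  # target centroid
    num = den = 0.0
    for a, b in zip(src, dst):
        ax, az = a[0] - cs[0], a[2] - cs[2]
        bx, bz = b[0] - cd[0], b[2] - cd[2]
        num += ax * bz - az * bx  # accumulates sin(theta) * spread
        den += ax * bx + az * bz  # accumulates cos(theta) * spread
    theta = math.atan2(num, den)
    # The translation carries the rotated source centroid onto the target centroid.
    rx, _, rz = apply_yaw_transform(cs, theta, (0.0, 0.0, 0.0))
    return theta, (cd[0] - rx, cd[1] - cs[1], cd[2] - rz)
```

Once the transform is known, every joint from the second Kinect can be converted into the first Kinect's frame with `apply_yaw_transform` before the two data sets are merged.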
To improve the quality of images magnified by image enlargement, super-resolution using total variation (TV) regularization has been proposed. Our previous method can generate many kinds of high-frequency images by changing the TV regularization parameter, and by using these high-frequency images we can generate magnified images of various quality. However, our experiments show that the optimum parameter for generating a high-quality magnified image differs depending on the features of the image. In this paper, an input image is classified into texture areas and other areas, and the optimum parameter is determined for each area to generate a high-quality magnified image. Experimental results show that the magnified image produced by the proposed method is finer than those produced by linear interpolation and by the previous super-resolution method.
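A per-area TV parameter can be illustrated by evaluating a TV-regularized energy with a region-dependent weight. This sketch only evaluates that energy; it does not perform the super-resolution minimization. It assumes an anisotropic TV with forward differences and a boolean texture mask. The paper's exact functional and classifier are not given here, and the names `tv_energy` and `lambda_map` are hypothetical.

```python
def tv_energy(u, f, lam):
    """Energy sum((u - f)^2) + sum(lam[i][j] * TV_ij(u)).

    TV_ij is the anisotropic total variation at pixel (i, j) using forward
    differences; lam is a per-pixel weight map so that texture and
    non-texture areas can use different regularization parameters.
    """
    rows, cols = len(u), len(u[0])
    fidelity = 0.0
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            fidelity += (u[i][j] - f[i][j]) ** 2
            local = 0.0
            if i + 1 < rows:
                local += abs(u[i + 1][j] - u[i][j])  # vertical difference
            if j + 1 < cols:
                local += abs(u[i][j + 1] - u[i][j])  # horizontal difference
            tv += lam[i][j] * local
    return fidelity + tv

def lambda_map(texture_mask, lam_texture, lam_other):
    """Build a per-pixel weight map from a boolean texture mask."""
    return [[lam_texture if m else lam_other for m in row] for row in texture_mask]
```

A smaller weight in texture areas penalizes fine detail less there, while a larger weight elsewhere suppresses noise and ringing.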
The scalable Poisson disk sampling technique is one of the most powerful NPR (non-photorealistic rendering) techniques for generating artistic images from input photos. It can generate various artistic images, for example watercolor painting, oil painting, colored paper mosaic, and more. However, a weak point of the scalable Poisson disk sampling technique is that it takes a long time. We propose and develop a fast scalable Poisson disk sampling technique that refers to pre-sampled data. Experimental results show that our proposed method reduces the computational time for various types of artistic images.
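Basic Poisson disk sampling can be sketched with dart throwing. This is the classic naive algorithm, not the fast pre-sampled method proposed here. "Scalable" is modeled, as an assumption, by a radius that varies with position, for example smaller disks where the photo has more detail. The name `poisson_disk_samples` is hypothetical.

```python
import math
import random

def poisson_disk_samples(radius_fn, n_trials, seed=0):
    """Dart-throwing Poisson disk sampling in the unit square.

    radius_fn(p) gives the disk radius at point p. A candidate p is accepted
    only if its distance to every accepted sample q is at least
    max(r(p), r(q)).
    """
    rng = random.Random(seed)
    accepted = []  # list of (point, radius)
    for _ in range(n_trials):
        p = (rng.random(), rng.random())
        r = radius_fn(p)
        if all(math.dist(p, q) >= max(r, rq) for q, rq in accepted):
            accepted.append((p, r))
    return [p for p, _ in accepted]
```

Each candidate is checked against every accepted sample, which shows why plain dart throwing becomes slow as the number of samples grows. Accelerations such as spatial grids or reuse of pre-sampled patterns are not shown.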
An effective combination of music and video images produces a beneficial interaction between auditory and visual processing. Such techniques are therefore used in concert halls, in promotional videos, and in TV commercials. However, professional knowledge and/or tools are necessary to create video images that take this interaction into account. Against this background, we previously proposed a method that automatically creates a video image well suited to a piece of music based on its characteristics and on chorus section detection. However, that method did not take the tempo and impression of the music into account. In this study, we therefore propose a method that creates a video better suited to a given piece of music by considering the tempo and impression extracted from it. The results of a verification experiment show the effectiveness of the proposed method.
The DCT (discrete cosine transform) used in JPEG is one of the effective image coding methods based on an orthogonal transform with energy compaction. However, the visual quality of the encoded image deteriorates in edge regions when the compression ratio is high. Independent component analysis (ICA) can obtain a set of basis functions corresponding to the structural features of the input image, and an image coding method using ICA has been proposed that exploits the sparseness of the ICA coefficients. However, this previous method has a coding performance problem: it requires a higher bit rate in gradation regions than in edge regions of the input image. This paper presents a new image coding method that uses both ICA and the DCT based on image segmentation.
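Energy compaction can be seen directly in a small example. The sketch below implements the orthonormal 1D DCT-II, the building block of the separable 8x8 DCT in JPEG, and measures how much energy lands in the first few coefficients. The function names are illustrative.

```python
import math

def dct_ii(x):
    """Orthonormal 1D DCT-II; preserves signal energy (Parseval)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def energy_fraction(coeffs, m):
    """Fraction of total energy held by the first m coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:m]) / total
```

For a smooth ramp 0, 1, ..., 7, the first two coefficients hold more than 99% of the energy. A signal with a sharp edge spreads energy across many coefficients, which is why coarse quantization damages edge regions.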
Many computational algorithms for the discrete cosine transform (DCT) have been proposed to reduce the computational load. They have concentrated on four-point, eight-point, and 16-point DCTs, as required by compression methods. In this paper, the design and approximation of DCTs of size 32 and larger are considered using the lifting scheme based on Wang's factorization method. The effect of approximating the lifting coefficients is shown through simulations.
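The lifting idea can be shown on a single plane rotation, the butterfly that DCT factorizations are built from. This sketch does not implement Wang's factorization. It uses the standard identity R(theta) = [[1, p], [0, 1]] [[1, 0], [u, 1]] [[1, p], [0, 1]] with p = -tan(theta/2) and u = sin(theta). Rounding p and u to dyadic values and flooring each step keeps the transform exactly invertible on integers. The names and the `bits` parameter are hypothetical.

```python
import math

def lifting_coeffs(theta, bits=None):
    """Lifting coefficients (p, u) for a plane rotation by theta.

    If bits is given, both coefficients are rounded to multiples of
    2**-bits (a dyadic approximation suitable for shift-and-add hardware).
    """
    p, u = -math.tan(theta / 2), math.sin(theta)
    if bits is not None:
        q = 2 ** bits
        p, u = round(p * q) / q, round(u * q) / q
    return p, u

def lift_forward(x, y, p, u):
    """Integer-to-integer approximation of the rotation via three lifting steps."""
    x += math.floor(p * y)
    y += math.floor(u * x)
    x += math.floor(p * y)
    return x, y

def lift_inverse(x, y, p, u):
    """Undo lift_forward exactly by reversing the steps and their signs."""
    x -= math.floor(p * y)
    y -= math.floor(u * x)
    x -= math.floor(p * y)
    return x, y
```

Perfect reconstruction holds for any coefficient values. Approximating the coefficients therefore only changes how closely the forward transform matches the true rotation, which is the effect the simulations evaluate.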
To improve the coding performance of subband image coding, the optimum frequency band partition (OFBP) has been proposed. The previous OFBP has the problem that the partition pattern is fixed even when the coding rate changes, because its parameter is the signal power in each subband before quantization. This paper shows that the similarity between the probability density functions (PDFs) of subband signals is closely related to the improvement in coding performance achieved by the OFBP. Based on our theoretical derivation, the new OFBP is determined using a new parameter that evaluates the similarity between PDFs. The experimental results show that the proposed method achieves lower entropy than the previous OFBP at the same image quality.
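Why PDF similarity matters for merging subbands can be illustrated with entropy. Suppose two equally sized subbands are coded with one shared model instead of two separate ones. The entropy increase per coefficient is H((P + Q)/2) - (H(P) + H(Q))/2, the Jensen-Shannon divergence. It is zero when the PDFs are identical. This sketch uses that divergence as an illustrative similarity measure only; it is not necessarily the parameter derived in the paper. The function names are hypothetical.

```python
import math
from collections import Counter

def pmf(samples):
    """Empirical probability mass function of quantized subband coefficients."""
    counts = Counter(samples)
    n = len(samples)
    return {k: c / n for k, c in counts.items()}

def entropy(p):
    """Entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two pmfs.

    Equals the per-coefficient entropy increase from coding two equally
    sized subbands with their pooled pmf instead of separate pmfs.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * p.get(k, 0.0) + 0.5 * q.get(k, 0.0) for k in keys}
    return entropy(m) - 0.5 * entropy(p) - 0.5 * entropy(q)
```

Subbands with near-zero divergence can share a partition at little cost, while dissimilar ones are better kept apart.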
The purpose of this study is to infer the foreground regions of objects in an image together with their categories. First, useful regions are extracted using a saliency map. Then, the category of each extracted region is recognized. Finally, the foreground regions are inferred using the color information of the recognized category. In the experiment, we used five categories of the Microsoft Research Cambridge 21 Dataset. The best accuracy obtained was an F-measure of 0.50.
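The F-measure reported above is the harmonic mean of precision and recall. This sketch assumes a pixel-wise evaluation on flattened binary masks; the abstract does not state the evaluation unit. The name `f_measure` is illustrative.

```python
def f_measure(pred, truth):
    """Pixel-wise precision, recall and F-measure for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, a prediction that hits one of two foreground pixels and marks one background pixel has precision 0.5, recall 0.5, and F-measure 0.5.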
In recent years, much research has been conducted on estimating emotion from facial expressions. Human communication uses not only the meanings of words but also many kinds of nonverbal information, such as tone of voice and facial expression. Therefore, to achieve smooth communication between robots and humans, for example, it is important to estimate the emotion of the conversation partner. In this study, we propose a method to recognize nine facial emotions (happiness, sadness, surprise, anger, fear, disgust, contempt, kissing, and neutral) in real time using a Kinect.