To investigate how stimuli on adjacent fingers affect roughness perception and the perceived uniformity of tactile sensation, we measured the magnitude of roughness perceived on the middle finger under several conditions in which different roughness stimuli were presented to the index and ring fingers. The magnitude of the roughness perceived on the middle finger depended on the stimuli on the adjacent fingers. We describe a multiple roughness perception model with an averaging mechanism for tactile sensation of the fingers. In addition, we found that a uniform visual stimulus enhanced tactile uniformity.
We describe a method that generates dance motions with human emotions from motion-capture data. To generate the dance motions, we developed an emotional motion editor (EME). The EME adds human emotions to the dance motions by interactively modifying the original motion-capture data, for instance by changing the speed of motion or by altering the joint angles. To evaluate the emotional expressions in the dance motions generated by the EME, we conducted a questionnaire survey and examined the results with a t-test. The results confirmed that dance motions with human emotions can be obtained with the EME by adjusting only a few of its parameters.
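The two edits named in the abstract above (changing the speed of motion and altering the joint angles) can be sketched as follows. This is a minimal illustration, not the EME itself: the frame layout (one list of joint angles in degrees per frame) and the function names `change_speed` and `scale_angles` are assumptions made for this example.

```python
from typing import List, Optional, Sequence

Frame = List[float]  # assumed layout: one joint angle (degrees) per joint


def change_speed(frames: List[Frame], speed: float) -> List[Frame]:
    """Resample a clip by linear interpolation; speed > 1 shortens it."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if len(frames) < 2:
        return [list(f) for f in frames]
    last = len(frames) - 1
    n_out = max(2, int(round(last / speed)) + 1)
    out = []
    for i in range(n_out):
        t = i * last / (n_out - 1)      # position in the source clip
        lo = min(int(t), last - 1)      # index of the frame before t
        w = t - lo                      # blend weight toward the next frame
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


def scale_angles(frames: List[Frame], gain: float,
                 joints: Optional[Sequence[int]] = None) -> List[Frame]:
    """Exaggerate (gain > 1) or damp (gain < 1) angles about the mean pose."""
    n_joints = len(frames[0])
    mean = [sum(f[j] for f in frames) / len(frames) for j in range(n_joints)]
    chosen = set(range(n_joints)) if joints is None else set(joints)
    return [[mean[j] + gain * (f[j] - mean[j]) if j in chosen else f[j]
             for j in range(n_joints)]
            for f in frames]


clip = [[0.0, 10.0], [10.0, 10.0], [20.0, 10.0], [30.0, 10.0], [40.0, 10.0]]
faster = change_speed(clip, 2.0)             # half as many intervals
bigger = scale_angles(clip, 2.0, joints=[0])  # widen only joint 0
```

Scaling about the mean pose keeps the overall posture in place while widening or narrowing the movement, which is one plausible way to expose a single "intensity" parameter to the user.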
To study the application of event-related potentials (ERPs) to picture quality evaluation, ERPs were measured for both still and blurred pictures that were subjectively evaluated. A comparison method with separate quality scales for “better” and “worse” opinions was used. The P300 amplitude was largest for the “much better” and “much worse” opinions, and the P300 amplitude and latency varied with the opinion on each quality scale, which suggests that ERPs can serve as one method for picture quality evaluation.
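As a rough illustration of how a P300 amplitude and latency could be read from an averaged epoch, the sketch below takes the largest positive value inside a search window. The 250 to 500 ms window, the assumption that sample 0 is stimulus onset, and the function name `p300_peak` are choices made for this example and are not taken from the study.

```python
from typing import Sequence, Tuple


def p300_peak(samples: Sequence[float], sample_rate_hz: float,
              window_ms: Tuple[float, float] = (250.0, 500.0)
              ) -> Tuple[float, float]:
    """Return (amplitude, latency_ms) of the largest value in the window.

    samples[0] is assumed to coincide with stimulus onset.
    """
    start = int(round(window_ms[0] * sample_rate_hz / 1000.0))
    stop = min(len(samples),
               int(round(window_ms[1] * sample_rate_hz / 1000.0)) + 1)
    if start >= stop:
        raise ValueError("window lies outside the epoch")
    idx = max(range(start, stop), key=lambda i: samples[i])
    return samples[idx], idx * 1000.0 / sample_rate_hz


# Synthetic epoch sampled at 1 kHz: an early deflection at 100 ms that the
# window must ignore, and two later deflections inside the window.
epoch = [0.0] * 600
epoch[100] = 8.0
epoch[320] = 5.0
epoch[400] = 3.0
amplitude, latency_ms = p300_peak(epoch, 1000.0)
```

Comparing the returned pairs across opinion categories would then give the kind of amplitude and latency trends the abstract describes.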
We investigated the effects of voice training on the subjective evaluation of voice. First, we collected 305 words that represent voice characteristics and narrowed them down to 10 for use in the evaluation: ‘sweet’, ‘vivid’, ‘calm’, ‘pleasant’, ‘muffled’, ‘high-pitched’, ‘thin’, ‘husky’, ‘clear’, and ‘thick’. We then carried out a subjective evaluation experiment using these words on the speech of announcers, students who had received no training, and students who received voice training (evaluated before, during, and after the training). We found that the voices of announcers were more intelligible than those of students, and that the voices of students with training became more intelligible as the training advanced.
We analyzed how the individual characteristics of visual impressions, such as activity (e.g., colorfulness, warmth, contrast, and waviness) and regularity (e.g., whiteness, regularness, naturalness), depend on the perceptual quality scale for JPEG-coded images. The individual characteristics of visual impressions were grouped into three classes on the basis of similar features; the characteristics of each class (naturalness - regularity, brightness - complexity) change differently as the value on the quality scale changes. The characteristics of an average impression differed from those of an individual impression. We therefore focused this study on image quality control based on the individual characteristics of visual impressions.
An advanced binocular active camera system for wide-area surveillance is proposed. The system is equipped with high-resolution actuators, wide-angle lens cameras, and telescope lens cameras, which enable it to smoothly track a pedestrian from a long distance and to continuously shoot stable, detailed video. Because it is a stereo vision system, depth information is available, and differences in depth between the target and the background can be used to distinguish one from the other. To achieve robust tracking against a complicated background, a 3D tracking image processing algorithm based on binocular disparity is proposed; it separates the tracked target from the background even when the target's appearance changes as it rotates. Moreover, to avoid the difficulty of finding corresponding points between a pair of stereo images, the baseline distance between the two eyes of the proposed system is kept very short. However, this leads to poor vergence control, a problem that is rectified by applying a mathematical model based on the neural pathways of human binocular eye movement, thus enabling stable binocular tracking.
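The role of disparity in the abstract above can be illustrated with the standard pinhole stereo relation Z = f·B/d and a simple disparity threshold. This is only a sketch under assumed values: the focal length, baseline, tolerance, and the function names are hypothetical, and the paper's actual tracking algorithm is not reproduced here.

```python
from typing import List


def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Depth of a point for rectified parallel cameras: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to infinite depth
    return focal_px * baseline_m / disparity_px


def segment_by_disparity(disparity_map: List[List[float]],
                         target_disparity: float,
                         tolerance: float) -> List[List[bool]]:
    """Mark pixels whose disparity is close to that of the tracked target."""
    return [[abs(d - target_disparity) <= tolerance for d in row]
            for row in disparity_map]


# Assumed camera: 800-pixel focal length, 0.125 m baseline.
target_depth_m = depth_from_disparity(8.0, 800.0, 0.125)

# Background pixels sit near disparity 2, the target near disparity 8.
disparity_map = [[2.0, 2.0, 8.0],
                 [2.0, 7.5, 8.5]]
mask = segment_by_disparity(disparity_map, target_disparity=8.0, tolerance=1.0)
```

Because the mask depends only on depth, it is unaffected by changes in the target's appearance. The same relation also shows the cost of a short baseline: for a fixed depth, a smaller B gives a smaller disparity and therefore coarser depth resolution.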
This paper presents a user-friendly texture mapping engine for semi-automatically texturing 3D models from real-world images. The engine implements a novel approach to simplify the user's interactions when registering images to geometry. Our approach has three benefits. First, the number of interactions required by the user is significantly reduced. Second, the system works well even with low-precision corrections by the user. Third, these interactions are simple because they consist only of dragging operations controlled via real-time feedback. The key idea is to take advantage of a three-dimensional orientation sensor attached to the camera so as to simplify the object pose estimation. Geometric computation is introduced to implement this idea. To robustly refine the pose, we implemented an existing tracking method by extensively taking advantage of Lie group formalism. A set of experiments demonstrating the efficiency and practicability of this approach was conducted.
Computer-generated holography (CGH) is a technology that generates hologram data by simulating the recording process of holography. However, there are still many problems with CGH. We describe a CGH calculation method that takes the reflectance distribution of object surfaces into consideration in order to improve the realism of reconstructed images. This method is based on the Blinn and Torrance-Sparrow reflection models used in CG. In these models, object surfaces are divided into a set of mirror-like microfacets. By calculating the phase differences of object surfaces with these models, various reflectance distributions can be expressed in CGH. We conducted computer simulations that compared the simulated light intensity distributions for each reflectance factor, as well as optical reconstruction experiments. These experiments showed that the proposed method can produce various reflectance distributions in reconstructed images.
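For readers unfamiliar with the reflection models mentioned above, the sketch below evaluates the widely used Blinn-Phong specular term, in which the reflected intensity depends on how closely the surface normal aligns with the half vector between the light and viewing directions. It illustrates only the reflectance lobe from CG; how the paper maps such a lobe onto phase differences in the hologram is not shown, and the function names and the `shininess` parameter are assumptions for this example.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def normalize(v: Vec) -> Vec:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(c * c for c in v))
    return v if n == 0.0 else tuple(c / n for c in v)


def blinn_specular(normal: Vec, to_light: Vec, to_viewer: Vec,
                   shininess: float) -> float:
    """Blinn-Phong specular term: max(0, N.H) ** shininess."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half vector
    n_dot_h = max(0.0, sum(a * b for a, b in zip(n, h)))
    return n_dot_h ** shininess


normal = (0.0, 0.0, 1.0)
light = (1.0, 0.0, 1.0)
mirror_view = (-1.0, 0.0, 1.0)   # exact mirror direction of the light
off_view = (0.0, 0.0, 1.0)       # viewer looking straight down the normal

peak = blinn_specular(normal, light, mirror_view, 50.0)
glossy = blinn_specular(normal, light, off_view, 50.0)
matte = blinn_specular(normal, light, off_view, 5.0)
```

A larger exponent concentrates the reflection around the mirror direction (a glossier surface), while a smaller one spreads it out, which is the kind of variation in reflectance distribution the abstract refers to.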
An architecture for a programmable systolic array processor is proposed for the discrete wavelet transform (DWT). This transform requires a huge amount of data to be filtered, so many processor elements (PEs) are implemented. However, the multiplier hardware needed for multiply-accumulate operations is large, and complicated connections among PEs lower flexibility and scalability. By using the time-divided multiple-operation method, an execution unit with a simple structure of shifters and a three-input adder achieved the same performance as a multiplier and an adder at 50% of the hardware size. The unique network mechanism among PEs and the systolic array architecture provided a high level of data transfer, flexibility, and scalability. With this architecture, a processor with ten PEs executes the DWT for a 1024×1024-pixel image in 26.3 ms.
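One way to picture replacing a multiplier with shifters and a three-input adder is the shift-and-add sketch below, which spreads a constant multiplication over several cycles and adds two shifted copies of the input to the accumulator in each cycle. This is a software illustration under assumptions: the scheduling, the fixed-point format (`frac_bits`), and the function names are invented for this example and are not the paper's execution unit.

```python
from typing import Sequence, Tuple


def shift_add_multiply(x: int, coeff: int) -> Tuple[int, int]:
    """Return (x * coeff, cycles) using only shifts and three-input adds.

    Each cycle computes acc + (x << s1) + (x << s2), so a coefficient with
    k set bits needs ceil(k / 2) cycles.
    """
    if coeff < 0:
        raise ValueError("this sketch handles non-negative coefficients only")
    shifts = [bit for bit in range(coeff.bit_length()) if (coeff >> bit) & 1]
    acc = 0
    cycles = 0
    for i in range(0, len(shifts), 2):
        first = x << shifts[i]
        second = x << shifts[i + 1] if i + 1 < len(shifts) else 0
        acc = acc + first + second   # one three-input addition
        cycles += 1
    return acc, cycles


def filter_output(samples: Sequence[int], coeffs: Sequence[int],
                  frac_bits: int = 8) -> int:
    """One FIR output with fixed-point coefficients scaled by 2**frac_bits."""
    total = 0
    for sample, coeff in zip(samples, coeffs):
        product, _ = shift_add_multiply(sample, coeff)
        total += product
    return total >> frac_bits


product, cycles = shift_add_multiply(7, 0b1011)    # 7 * 11 in two cycles
smoothed = filter_output([4, 8, 4], [64, 128, 64])  # taps 0.25, 0.5, 0.25
```

The trade-off is time for area: coefficients with few set bits finish in one or two cycles, which is why such a unit can approach the throughput of a full multiplier for filter coefficients chosen with that in mind.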
We have developed a single-chip HDTV encoder LSI for H.264/AVC High422P@L4.1 that is optimized for broadcasting transport systems. A bit-rate reduction of around 50% or more relative to MPEG-2 is achieved by an originally developed motion vector coding algorithm and by subjective quality enhancement through adaptive bit allocation that takes the human visual system into account. The LSI also enables a low-delay encoder implementation, with an encoding delay as short as 50 ms, through advanced coding control that optimizes the balance between buffering delay and coded image quality, together with a variable delay time control mechanism. We adopted a hierarchical multi-CPU framework and a CPU interface that enable flexible encoder control per GOP/Picture/MB, to realize various image quality and encoding delay adjustments. The LSI is fabricated in a 90-nm CMOS process and can be integrated on a 9 × 9 mm² chip.
We developed a real-time electronic zoom system that can crop any area in a four-sensor pick-up type Super Hi-Vision (SHV) video and output the area as an HDTV video. The cropped area can be selected using the type of remote control used in a conventional robotic camera system. The maximum zoom ratio is 4X to maintain the HDTV quality. This electronic zoom system enables new program production techniques. For example, an HDTV video of a decisive moment can be cropped from a wide-angle SHV video shot. To evaluate the quality of the cropped image, we examined the resolution, S/N, and chromatic aberrations of generated images. The evaluation results showed that the quality of the cropped HDTV video is good enough for use in HDTV program production.
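The relation between the zoom ratio and the cropped area can be sketched as below. The sketch assumes a 7680×4320 SHV frame and a 1920×1080 HDTV output, which is consistent with the 4X limit stated above (7680 / 1920 = 4) but is not given explicitly in the abstract; the function name `crop_window` is hypothetical.

```python
from typing import Tuple

SHV_W, SHV_H = 7680, 4320   # assumed SHV frame size
HD_W, HD_H = 1920, 1080     # assumed HDTV output size


def crop_window(center_x: int, center_y: int,
                zoom: float) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of the SHV area to crop.

    zoom = 1.0 uses the whole SHV frame (scaled down to HDTV);
    zoom = 4.0 crops 1920x1080 pixels one-to-one. Beyond that the crop
    would have to be upscaled, which is why the ratio is capped.
    """
    max_zoom = SHV_W / HD_W
    if not 1.0 <= zoom <= max_zoom:
        raise ValueError("zoom must be between 1.0 and %.1f" % max_zoom)
    width = round(SHV_W / zoom)
    height = round(SHV_H / zoom)
    # Keep the window inside the frame when the center is near an edge.
    left = min(max(center_x - width // 2, 0), SHV_W - width)
    top = min(max(center_y - height // 2, 0), SHV_H - height)
    return left, top, width, height


centered = crop_window(3840, 2160, 4.0)
corner = crop_window(7680, 4320, 2.0)
```

A remote-control pan or tilt command would then simply move `center_x` and `center_y`, and a zoom command would change `zoom`, with the clamping keeping the cropped area valid at the frame edges.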
A floating millivolt reference circuit for generating a PTAT current was developed using MOSFETs operated in the subthreshold region. The circuit generates a floating voltage of 10 mV. The reference varies by ±2.7% over a temperature range from -20 to 100°C. The total power consumption of the circuit is 1.3 μW.
We propose a method for understanding the real environment from visual information. Robot systems that can handle the difficult problems facing nursing care, other medical fields, welfare, and similar areas have recently become necessary. However, it is difficult for such systems to understand and adapt to the surrounding environment from real images. We constructed a computer system for understanding the real environment with a neural network model of associative memories. The system associates the category of an unknown object with its context in a virtual room interior, and it stores data for many contexts as environmental knowledge. The experimental results indicate the effectiveness of the system. This paper provides a quick overview of the system as applied to object recognition.
An optical retransmission system for 12- and 21-GHz satellite broadcasting radio signals and 10-Gbps-class PON signals is proposed. The system uses the same-frequency pass-through method. The minimum received optical power and the coupling optical power ratio between the satellite broadcasting radio signals and the 10-Gbps-class PON signals were calculated.
The use of a nonlinear filter for super-resolution of a single image is presented. A nonlinear iterative solution procedure is derived from an approximate Newton method for minimizing errors in low-resolution images reconstructed from estimated high-resolution images. We use the bilateral filter for image smoothing in the reconstruction steps. Our method was shown to produce high-resolution images without halos around edges and with enhanced fine textures.
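The general idea of alternating edge-preserving smoothing with a correction that keeps the estimate consistent with the low-resolution input can be sketched in one dimension as follows. Note the substitution: this uses plain iterative back-projection rather than the approximate Newton procedure derived in the paper, and the pair-averaging downsampler, the duplication upsampler, and all parameter values are assumptions chosen for illustration.

```python
import math
from typing import List


def downsample(x: List[float]) -> List[float]:
    """Assumed imaging model: each low-res sample is the mean of a pair."""
    return [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]


def upsample(y: List[float]) -> List[float]:
    """Duplicate every sample to double the length."""
    return [v for v in y for _ in range(2)]


def bilateral(x: List[float], radius: int = 2, sigma_s: float = 1.5,
              sigma_r: float = 0.2) -> List[float]:
    """1D bilateral filter: weights fall off with distance and with
    intensity difference, so smoothing does not cross strong edges."""
    out = []
    for i, xi in enumerate(x):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(x), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((x[j] - xi) ** 2) / (2 * sigma_r ** 2))
            num += w * x[j]
            den += w
        out.append(num / den)
    return out


def super_resolve(low: List[float], iterations: int = 10) -> List[float]:
    """Smooth, then back-project the low-resolution reconstruction error."""
    high = upsample(low)
    for _ in range(iterations):
        high = bilateral(high)
        residual = [a - b for a, b in zip(low, downsample(high))]
        high = [h + r for h, r in zip(high, upsample(residual))]
    return high


low_res = [0.0, 0.0, 1.0, 1.0, 0.2]
high_res = super_resolve(low_res)
error = max(abs(a - b) for a, b in zip(low_res, downsample(high_res)))
```

Because the back-projection step runs last in each iteration, downsampling the result reproduces the input up to rounding error, while the bilateral step shapes the detail between samples without smearing the step edge.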
This paper describes the making of stereoscopic 3-D movies using the Metaverse. The authors focused on machinima, a way of making movies with PCs and video games. A tool for making stereoscopic 3-D movies in Second Life is proposed. The tool also makes it possible to experiment with different methods of stereo shooting. The purpose of this study was to propose a novel method that helps movie creators make stereoscopic 3-D movies easily.