A 1/4-inch VGA wide-dynamic-range complementary metal oxide semiconductor (CMOS) image sensor with resistance to high temperatures has been developed using a lateral overflow capacitor in each pixel, a very low dark-current front-end-of-line (VLDC FEOL), and either an inorganic cap layer on an on-chip micro-lens (OCML) or a metal hermetically sealed package to suppress the degradation of the spectral response of the OCML and color filter. The dark current was reduced to 25 e-/sec/pixel at 60°C. Sensor chips with no cap layer and chips with the inorganic cap layer on the OCML were assembled into either a metal hermetically sealed package or a conventional package. The chips with the inorganic cap layer and those in the metal hermetically sealed package showed no significant degradation of the spectral response in any of the R/G/B pixels, even after a thermal stress test at 150°C. Improved image sensing performance was observed up to 85°C, and the dynamic range was extended to 94 dB.
We investigated a negative feedback method for adding functionality to a CMOS image sensor. Our sensor uses this method to set an arbitrary intermediate voltage on the photodiode capacitance while the pixel circuit is operating. The negative feedback reset functions as a noise cancellation technique and can obtain intermediate image data during charge accumulation. Using these features, we achieved duplicated interlaced processing and were able to output frame-difference images without frame buffers. The experimental results obtained with a chip fabricated in a 0.25-μm CMOS process demonstrate that intra-frame motion detection is an effective application of negative feedback resetting.
Depth estimation based on image sensing requires the system to handle a large number of comparisons and operations, so achieving high-speed performance is a challenge. To solve this problem, we propose a depth estimation system that uses a smart image sensor with an in-focus judgment function operating on multiple-focus images. This judgment function is based on the "depth from focus" (DFF) technique, which has the following advantages: 1) a simple algorithm, 2) a single camera, and 3) high accuracy over a broad depth-estimation range. Furthermore, we propose an interpolation method for obtaining depth values in textureless regions, where the judgment function of the smart image sensor cannot produce them. We evaluated the proposed method in computer simulation, and we show experimental results obtained with a prototype chip.
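The core of the DFF idea can be sketched in software: compute a per-pixel sharpness (focus) measure for each image in a focus stack, then assign each pixel the depth of the frame in which it is sharpest. The following is a minimal illustration, not the sensor's on-chip judgment circuit; the modified-Laplacian focus measure and all names are our own choices.

```python
import numpy as np

def focus_measure(img):
    """Modified-Laplacian focus measure: absolute second differences
    along x and y at each pixel (wrap-around at the borders)."""
    lap_x = np.abs(np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 2 * img)
    lap_y = np.abs(np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) - 2 * img)
    return lap_x + lap_y

def depth_from_focus(stack, depths):
    """For each pixel, pick the depth whose image is sharpest there.
    `stack` is a list of images taken at the focus settings `depths`."""
    measures = np.stack([focus_measure(f) for f in stack])  # (n, H, W)
    best = np.argmax(measures, axis=0)                      # sharpest frame index
    return np.asarray(depths)[best]                         # per-pixel depth map
```

A pixel in a textureless region has a near-zero focus measure in every frame, so `argmax` is meaningless there; this is exactly the case the abstract's interpolation method is meant to handle.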
We introduce a multiple-viewpoint 3-D measurement system suitable for acquiring high-speed, high-accuracy 3-D model movies in real time. The system consists of multiple range finders, each comprising a camera and a projector. Using a high-speed 3-D image sensor as the camera, we measured various target objects from multiple viewpoints with the range finders operating in a time-division manner. We then carried out a general linear calibration and evaluated the range accuracy. We demonstrated real-time acquisition speeds of up to 10.6 range maps/s with two cameras and 6.1 range maps/s with three cameras. The system is suitable for high-accuracy 3-D model acquisition, with an error-to-distance ratio of less than 0.2% at 1200 mm. We also show that the system remains applicable when a strong light source is used. The system is suitable for complete 3-D model acquisition, since the undetectable overlap areas in multiple-viewpoint 3-D range finding are small.
The optical construction of images, namely their image quality, affects the mind of the viewer. Film, TV, and photography professionals therefore craft images to affect viewers' emotions and feelings. We investigated how this type of image quality affects viewers' sense of time. In experiment I, impression evaluations using the normalized ranking method were conducted on images that had been digitally processed to create different types of image quality. The results show that the color of images is strongly connected with the sense of the past. In experiment II, we evaluated the impression of time given by monochrome images that had been digitally processed into different colors, to investigate how these hues affect the viewer's impression of time. In experiment III, we evaluated sepia-toned images, which aroused a strong sense of the past in the viewers. We found that chroma, especially in sepia tones, plays an important role in the sense-of-the-past feeling these images give viewers.
We propose non-deterministic methods for automatically detecting occlusions of people's faces by superimposed symbols. We trained a face detector using an ensemble-learning algorithm (GibbsBoost) that is based on the sequential Monte Carlo method. We implemented an occlusion detector using a mixture of two discriminant functions that were related to the size of the detected face region and the occluded face area. One realization of this detector achieved a true positive detection rate of 90%. We present experimental results and discuss possibilities for further improvements.
It is very important to understand the relationship in a VR environment between the type of image projected and the user's perception of distance, speed, and time. However, this relationship has not previously been studied. We studied it using virtual routes projected in stereo vision in a five-screen CAVE. We arranged objects around the virtual route under two conditions: a low-density condition, in which the route was surrounded by few objects, and a high-density condition, in which there were many. The results of our experiment show that the number of objects (the density of objects arranged around the virtual route) affected the user's perception of distance, speed, and time, although the influence on the perception of time was smaller than that on distance and speed. Under the high-density condition, users felt that the distance and time were longer and the speed faster.
Moiré appears when two patterns with slightly different pitches are overlaid. If there is a gap between the two overlapping patterns, the moiré acquires a binocular parallax, which lets the viewer perceive depth (floating above or sinking below the surface). We explain how, by overlapping moiré with an ordinary two-dimensional image, we produced a prototype "pseudoscopic 3D display" in which the moiré had depth in front of and behind the 2D image. We also quantitatively analyzed the relationship between moiré pitch and depth reproduction; the analysis results closely matched those of a subjective evaluation test. Furthermore, to verify the effectiveness of our approach, we produced two types of pseudoscopic 3D display (a print type and a full-color moving-image type) and confirmed that a pseudoscopic 3D display with moiré has visually adequate depth.
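The pitch-depth relationship can be illustrated with a toy calculation. The beat pitch of two overlaid gratings is the standard P = p1·p2 / |p2 − p1|; for the fringe depth we use a simple similar-triangles, distant-viewer approximation, which is our own simplification and not the paper's quantitative analysis. All parameter values below are made up for illustration.

```python
def moire_pitch(p1, p2):
    """Beat pitch of two overlaid gratings with pitches p1 and p2 (same units)."""
    return p1 * p2 / abs(p2 - p1)

def apparent_depth(p1, p2, gap):
    """Toy similar-triangles estimate of where the moire fringe plane appears,
    for a front grating of pitch p1 and a rear grating of pitch p2 separated
    by `gap`, viewed from far away.  Positive = behind the front layer,
    negative = floating in front of it."""
    return gap * p1 / (p1 - p2)

# Illustrative numbers: 1.00 mm and 1.05 mm gratings, 2 mm apart.
print(moire_pitch(1.00, 1.05))          # 21.0  (mm beat pitch)
print(apparent_depth(1.00, 1.05, 2.0))  # -40.0 (mm, i.e. in front)
```

The key qualitative point survives the simplification: a small pitch difference produces a large depth magnification, so fine control of pitch gives a wide range of reproduced depths.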
In the field of medical image processing, 3-D (three-dimensional) image processing is often required to deal with MRI, CT, and positron emission tomography (PET) data. However, constructing complex 3-D processing procedures manually is much harder than constructing 2-D ones. We therefore previously proposed a method named 3-D Automatic Construction of Tree-structural Image Transformation (3D-ACTIT) for building various 3-D image-processing procedures by learning from examples. In this paper, we apply 3D-ACTIT to 3-D PET image data. PET is a kind of medical imaging that indicates the metabolism of the human body. It has recently been used to detect cancer, but its low resolution and unclear rendering make it difficult to distinguish the outlines of internal organs. Experimental results show that a procedure for liver region segmentation from 3-D PET image data can be constructed automatically by the proposed method.
Voice activity detection (VAD) is an important part of developing speech functions for on-board car navigation and assistance systems. It is difficult to detect voice activity using only sound information in a vehicle environment, which contains a wide variety of sounds and noises. We propose a suitable image feature and an integration method for building a robust bimodal VAD system using the driver's voice and facial images. As image features for VAD, we selected the normalized correlation value between sequential mouth images and the number of low-intensity pixels in the mouth image. In the proposed system, the discrimination function consists of the weighted sum of single-feature discrimination functions together with combinations of their logical addition and multiplication. The experimental results show that the proposed sound and image features are useful and that the proposed integration method achieves a 97% hit rate at a false-alarm rate of about 12%, which is 9 points better than the previous integration method.
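The integration scheme described above can be sketched as follows: each image feature gets its own threshold rule, and the rules are combined both by a weighted sum and by logical operations. This is a minimal sketch in the spirit of the abstract, not the paper's actual discrimination function; all thresholds and weights are invented placeholders.

```python
def corr_discriminant(corr, thresh=0.9):
    """Single-feature rule: low correlation between sequential mouth
    images suggests mouth movement, hence speech.  Placeholder threshold."""
    return corr < thresh

def dark_pixel_discriminant(n_dark, thresh=120):
    """Single-feature rule: many low-intensity pixels suggest an open
    mouth.  Placeholder threshold."""
    return n_dark > thresh

def vad_decision(corr, n_dark, w=(0.6, 0.4), vote_thresh=0.5):
    """Combine the single-feature rules: a weighted vote plus a
    logical-AND term, mixing a weighted sum with a logical product."""
    d1 = corr_discriminant(corr)
    d2 = dark_pixel_discriminant(n_dark)
    weighted = w[0] * d1 + w[1] * d2
    return weighted >= vote_thresh or (d1 and d2)
```

Tuning the weights and thresholds trades hit rate against false-alarm rate, which is how an operating point such as the 97% hit rate at a 12% false-alarm rate would be selected.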
Many people would like to record their daily lives and retrieve their experiences effectively. For this purpose, we have developed a wearable video system that captures personal experiences. However, segmenting and retrieving those experiences remains a significant problem: used in everyday life, a wearable video system records many different scenes in various locations, and one natural segmentation point is when the wearer moves from one place to another. We propose a method that enables a wearable video system to detect such scene changes. The method has two stages: the first detects changes in physiological data (heat flux and skin temperature) to extract candidate scenes, and the second analyzes those candidates for environmental changes by examining histogram changes in the video frames. In our experiments, the accuracy of detecting environmental changes was 62% in the first stage and rose to 70% in the second stage, while the recall rate remained above 80%.
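The two-stage pipeline can be sketched as follows: stage 1 flags frames where the physiological signals jump, and stage 2 keeps only those candidates whose video-frame intensity histograms also change. This is an illustrative sketch under our own assumptions; the thresholds, histogram distance, and signal handling are not the paper's.

```python
import numpy as np

def candidate_changes(heat_flux, skin_temp, hf_thresh=1.0, st_thresh=0.5):
    """Stage 1: flag frame indices where either physiological signal
    jumps between consecutive samples.  Thresholds are placeholders."""
    hf_jump = np.abs(np.diff(heat_flux)) > hf_thresh
    st_jump = np.abs(np.diff(skin_temp)) > st_thresh
    return np.where(hf_jump | st_jump)[0] + 1

def confirm_by_histogram(frames, candidates, bins=16, dist_thresh=0.3):
    """Stage 2: keep candidates whose frame histograms differ enough.
    Uses a normalized L1 histogram distance in [0, 1]."""
    confirmed = []
    for i in candidates:
        h1 = np.histogram(frames[i - 1], bins=bins, range=(0, 256))[0] / frames[i - 1].size
        h2 = np.histogram(frames[i], bins=bins, range=(0, 256))[0] / frames[i].size
        if 0.5 * np.abs(h1 - h2).sum() > dist_thresh:
            confirmed.append(i)
    return confirmed
```

Running the cheap physiological test first keeps recall high while the histogram check removes false candidates, which matches the reported improvement from 62% to 70% accuracy with recall staying above 80%.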