This paper proposes a stereo camera using two fisheye cameras, whose wide field of view increases the measurement range. The fisheye images are reprojected with an equirectangular projection to simplify the stereo correspondence search. The stereo images are rectified to correct for the extrinsic parameters between the left and right cameras; the rectification parameters are estimated from feature points obtained from an image of an arbitrary environment, removing errors due to deviations in the position and rotation of the two cameras. Corresponding points are found by stereo matching in the reprojected images, and distance is measured from the disparity. The performance of the fisheye stereo camera using equirectangular images is evaluated by simulations and experiments.
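As a hedged illustration of the final step (not the authors' code): once rectification makes corresponding points share a row of the equirectangular images, the distance can be triangulated from the two azimuth angles. The sketch below assumes a horizontal baseline and uses the law of sines; all names are hypothetical.

```python
import math

def triangulate_distance(theta_l, theta_r, baseline):
    """Distance from the left camera to a point, given the azimuth angles
    (radians, measured from the baseline direction) at which the left and
    right cameras observe it. The angular disparity is theta_r - theta_l."""
    disparity = theta_r - theta_l
    # law of sines in the triangle (left camera, right camera, point)
    return baseline * math.sin(theta_r) / math.sin(disparity)

# point 1.0 m ahead and 0.5 m to the side of the left camera, baseline 1.0 m
theta_l = math.atan2(1.0, 0.5)
theta_r = math.atan2(1.0, -0.5)
dist = triangulate_distance(theta_l, theta_r, 1.0)
```

As the angular disparity shrinks toward zero (distant points), the estimate becomes increasingly sensitive to angle noise, which is why the wide field of view matters more for coverage than for far-range accuracy.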
Level 3 autonomous driving requires handing the operation over to the driver when the system cannot continue driving automatically. To meet this requirement, a driver monitoring system that estimates the driver's posture and facial angle is an important technology. In addition, it has to detect whether a person is sitting in the driver's seat as a prior task. Conventional methods are time-consuming because they extract hand-crafted features and train a classifier for each task. In this paper, we introduce Heterogeneous Learning for the simultaneous learning of multiple tasks by a DCNN. It shares feature extraction and outputs multiple tasks at the same time. We combine two tasks, body posture estimation and facial angle estimation, in a single DCNN, and also attempt the combination of driver sitting detection and body posture estimation. The processing time of our method is 2.6 ms on a GPU, whereas the conventional DCNN takes 3.4 ms. The proposed methods with Heterogeneous Learning achieve accuracy comparable to single-task DCNNs; our method is thus faster than the conventional method with equivalent performance.
Companies require a new human resource management approach that brings out each worker's strengths. In this paper, we propose an integrated approach to “measurement-based management” based on statistical analysis of workers' body motion data captured by wearable sensors. A feature of the proposed approach is that it includes a process in which workers interpret the results by themselves, encouraging changes in their behavior at work. When the proposed method was applied to an actual organization, it was confirmed that workers successfully achieved interpretation, consensus formation, and validation of the work measures. This confirms that the proposed method is effective for office workers to improve their work style proactively and continuously.
This paper proposes a novel approach for video-based person re-identification that exploits deep convolutional neural networks to learn the similarity of persons observed by video cameras. Using a convolutional neural network (CNN), each video sequence of a person is mapped to a Euclidean space where distances between feature embeddings directly correspond to measures of person similarity. With an improved parameter learning method called Entire Triplet Loss, all possible triplets in the mini-batch are taken into account to update the network parameters at once. This simple change to the parameter update significantly improves network training, making the embeddings more discriminative. Experimental results show that the proposed model achieves new state-of-the-art rank-1 identification rates of 78.3% and 83.9% on the iLIDS-VID and PRID-2011 datasets, respectively.
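The batch-all strategy described above can be sketched as follows (an illustrative NumPy version, not the authors' implementation): every valid (anchor, positive, negative) triplet in the mini-batch contributes a hinge-loss term to a single update, instead of sampling one triplet at a time.

```python
import numpy as np

def batch_all_triplet_loss(emb, labels, margin=0.2):
    """Mean hinge loss over all valid triplets in the mini-batch.
    emb: (N, D) embeddings; labels: (N,) identity labels."""
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=2)  # pairwise distances
    n_samples = len(labels)
    losses = []
    for a in range(n_samples):
        for p in range(n_samples):
            if p == a or labels[p] != labels[a]:
                continue  # positive must share the anchor's identity
            for n in range(n_samples):
                if labels[n] == labels[a]:
                    continue  # negative must have a different identity
                losses.append(max(0.0, d[a, p] - d[a, n] + margin))
    return sum(losses) / max(len(losses), 1)

# toy batch: two samples of person 0, one of person 1
loss = batch_all_triplet_loss(np.array([[0.0], [1.0], [0.5]]),
                              np.array([0, 0, 1]))
```

In practice this is computed with a vectorized mask rather than explicit loops, but the triple loop makes the "all possible triplets" idea explicit.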
We propose a practical framework for statistically evaluating object recognition results. The framework maps recognition results of objects seen from various viewpoints into three-dimensional space. In this paper, the following two effects obtained with this framework are described: (1) statistically stable recognition results can be obtained by integrating recognition results from various viewpoints; (2) the relationship between viewpoint and recognition rate can be visualized, so the robustness of the learned model to viewpoint changes can be evaluated. We demonstrate the effectiveness of the framework through experiments using several real objects.
A lighting module using a holographic optical element (HOE) for glossy surface inspection has been developed and demonstrated. This lighting module has several advantages for this kind of inspection compared to a conventional lighting system; the differences between the HOE lighting system and a conventional one are discussed. The HOEs were fabricated by our originally developed ultra-high-speed collinear holographic printer. Several optical systems and an image processing method using hue were also developed, and the validity of our system for actual defects was investigated.
This paper proposes an extended Generalized Hough Transform (GHT) that introduces a training process based on Partial Least Squares (PLS) regression analysis. The Hough transform can detect patterns robustly against noise and occlusion, and the GHT adapts it to generic object detection. However, the GHT is weak at handling changes in object shape. We therefore introduce a training process that determines the voting weights of the GHT using PLS regression analysis; we call our method the “PLS Hough transform”. This makes generic object detection possible while maintaining the framework of Hough-based detection. In addition, features are selected by the Variable Influence on Projection (VIP) score or by the regression coefficients obtained from the PLS regression analysis. To confirm the effectiveness of our method, we applied the PLS Hough transform to vehicle detection in aerial and satellite images.
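To make the weighted-voting idea concrete, here is a minimal sketch (hypothetical names, not the paper's code) of GHT voting in which each R-table bin casts votes with a weight assumed to have been learned beforehand, e.g. by PLS regression:

```python
import numpy as np

def weighted_ght_vote(edge_points, r_table, weights, acc_shape):
    """Accumulate weighted GHT votes for the object reference point.
    edge_points: list of (x, y, gradient_bin)
    r_table: dict gradient_bin -> list of (dx, dy) offsets to the reference point
    weights: dict gradient_bin -> vote weight (assumed learned, e.g. by PLS)"""
    acc = np.zeros(acc_shape)
    for (x, y, b) in edge_points:
        for (dx, dy) in r_table.get(b, []):
            cx, cy = x + dx, y + dy
            if 0 <= cx < acc_shape[1] and 0 <= cy < acc_shape[0]:
                acc[cy, cx] += weights.get(b, 1.0)  # weighted vote
    return acc

# two edge points voting for the same reference cell with different weights
acc = weighted_ght_vote(
    [(5, 5, 0), (5, 7, 1)],
    {0: [(0, 1)], 1: [(0, -1)]},
    {0: 2.0, 1: 1.0},
    (10, 10),
)
```

The peak of the accumulator then gives the detected reference point; in the plain GHT all weights would be 1.0.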
This paper describes a method for automatically detecting small dents on a metal plate. We employ photometric stereo as the three-dimensional measurement method, which has advantages in terms of low cost and short measurement time. In addition, we realize a high-precision measurement system by using an 18-bit camera. A small dent on the surface of the metal plate is detected from the inner products of the surface normal vectors measured by photometric stereo. We confirmed the effectiveness of our method through detection experiments.
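The core computation can be sketched as follows (an illustrative NumPy version under the usual Lambertian assumption, not the authors' system): normals are recovered by least squares from images under known light directions, and the dent score is the inner product with a reference normal (low values flag candidate dents).

```python
import numpy as np

def photometric_normals(images, light_dirs):
    """Recover unit surface normals from K images under K known lights.
    images: (K, H, W) intensities; light_dirs: (K, 3) unit light vectors."""
    K, H, W = images.shape
    I = images.reshape(K, -1)                             # (K, H*W)
    # solve L @ G = I in the least-squares sense; G holds albedo-scaled normals
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)    # (3, H*W)
    n = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-12)
    return n.reshape(3, H, W)

def dent_score(normals, reference=np.array([0.0, 0.0, 1.0])):
    # inner product with the reference normal; dips below 1 indicate dents
    return np.einsum('i,ihw->hw', reference, normals)

# synthetic flat plate (normal +z, albedo 1) under three known lights
L = np.array([[0.0, 0.0, 1.0], [0.7071, 0.0, 0.7071], [0.0, 0.7071, 0.7071]])
imgs = np.stack([np.full((2, 2), L[k] @ np.array([0.0, 0.0, 1.0]))
                 for k in range(3)])
score = dent_score(photometric_normals(imgs, L))
```

On a perfectly flat plate the score is 1 everywhere; a dent perturbs the local normals, so thresholding the score localizes it.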
In this paper, we propose a method of human action recognition for videos in which actions transition continuously. First, we build a pose estimator that has learned joint coordinates using a convolutional neural network (CNN) and extract features from its intermediate layers. Second, we train an action recognizer structured with Long Short-Term Memory (LSTM), using pose features and environmental features as inputs; here we propose Pose-Centric Learning. In addition, from the pose features we calculate an attention map that represents the per-element importance of the environmental features, and filter the latter by this attention to make them more effective. We structure the action recognizer as a hierarchical LSTM model. In experiments, we compared our method with a conventional method and achieved a 15.7% improvement over it on a challenging action recognition dataset.
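The attention filtering step can be sketched as follows (a hedged toy version; the weight matrix `W` and the feature shapes are hypothetical, and in the actual model `W` would be learned jointly with the network): the pose feature produces a per-element gate in (0, 1) that scales the environmental feature.

```python
import numpy as np

def attention_filter(pose_feat, env_feat, W):
    """Gate the environmental feature element-wise by attention
    computed from the pose feature (sigmoid of a linear map)."""
    attention = 1.0 / (1.0 + np.exp(-(W @ pose_feat)))  # values in (0, 1)
    return attention * env_feat

# with W = 0 the gate is uniformly 0.5, i.e. the environmental
# feature is passed through at half strength
pose = np.ones(4)
env = np.array([2.0, 4.0, 6.0])
out = attention_filter(pose, env, np.zeros((3, 4)))
```

The gated environmental feature is then concatenated with the pose feature and fed to the hierarchical LSTM.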
Machine vision technology is important because it is used for automated assembly and visual inspection in factories. Template matching is one of the general recognition methods used for this purpose. However, this method needs to match a large number of templates to detect rotated objects in an acquired image, which creates computing-time problems. We solve this problem by proposing a fast matching method using “manifolds”, i.e., trajectories in a multi-dimensional space comprising continuous points that correspond to multiple images. Since the computing time depends on the manifold's shape, the proposed method reduces it by optimizing that shape. Experimental results showed that the method achieved a 92.5% recognition rate and a 236 ms processing time.
In fisheries, understanding the fishing grounds and estimating the amount of resources are required for increasing production stability, strengthening international competitiveness, and recovering from disasters. Recently, investigations of aquatic resources have been carried out using DV cameras, which damage the fishing grounds less than capturing the resources does. Seabed videos are taken to measure scallops in the scattered scallop fishery in Hokkaido, Japan. Investigation and fishing can be improved if we obtain not only the number of scallops but also the bottom sediment they inhabit. Our research aims at classifying four sediment types: sand, ballast, gravel, and shell beds (accumulations of scallop carcasses). Seabed images taken by a DV camera enable high-precision information to be obtained over a wide area. In this paper, we consider a bottom sediment classification method for seabed images using a convolutional neural network. Experiments show an accuracy of more than 95% for all sediment types.
We propose an image matching method that keeps recognizing with high accuracy over a long period of time. On a production line, a multitude of components of the same kind can be recognized, but the appearance of the target object changes over time. Usually, to accommodate this change in appearance, the template used for image recognition is periodically updated using past recognition results. Due to false recognition, however, the template might be updated with data not only from the target object but also from other objects, which degrades performance. In this research, we define the pixels that cause this degradation as “outlier pixels”. With the proposed method, outlier pixels in past recognition results are extracted and excluded from the template update, so the template can be updated in a stable manner. To evaluate the proposed method, we used 5000 images in which the appearance of the target object changes (due to lighting variation and adhesion of dirt). According to the evaluation results, the proposed method achieves a recognition rate of 99.5%, higher than that of a conventional update-type template-matching method.
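One way to realize such outlier-pixel exclusion is sketched below (a hedged illustration using a per-pixel median/MAD rule; the paper's actual extraction criterion may differ): samples that deviate strongly from the per-pixel median of past results are masked out before averaging into the new template.

```python
import numpy as np

def update_template(past_results, k=3.0):
    """Update a template from aligned past recognition crops, excluding
    outlier pixels. past_results: (N, H, W) stack of aligned crops."""
    med = np.median(past_results, axis=0)
    mad = np.median(np.abs(past_results - med), axis=0) + 1e-6
    # a sample is an outlier at a pixel if it deviates > k * MAD from the median
    inlier = np.abs(past_results - med) <= k * mad
    # average only the inlier samples at each pixel
    return np.sum(past_results * inlier, axis=0) / np.maximum(inlier.sum(axis=0), 1)

# five aligned crops of a stable target, one corrupted by a bright speck
stack = np.full((5, 3, 3), 10.0)
stack[4, 0, 0] = 100.0           # e.g. adhesion of dirt in one past result
tmpl = update_template(stack)
```

The corrupted sample is excluded only at the affected pixel, so the rest of its data still contributes to the update.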
Visual inspection is a vital step for maintaining the quality of industrial parts. In visual inspection, the relation among three factors (camera pose, light direction, and the surface normal of the part) is very important for detecting abnormalities on the part. We propose a visual inspection method for industrial parts that uses an image sequence captured while the part rotates on a rotating table; rotating the part makes it easy to change the relation among the three factors. We track points on the part's surface through the image sequence and discriminate abnormal surface points, such as scratches, from normal ones based on their pixel-value transitions. For accurate tracking of the surface points, we propose a novel method for calibrating the rotating table and a camera with a telecentric lens, together with a pose estimation method for the rotating part. We use angular-invariant feature vectors to improve robustness to the types and shapes of the industrial parts being inspected. We present experimental results on real industrial parts and verify that the proposed method can detect scratches on their surfaces.
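The discrimination step can be sketched as follows (a hedged toy version; the robust-statistics rule here is illustrative, not necessarily the paper's classifier): a scratch reflects light differently as the part rotates, so its pixel-value transition deviates from the median transition of normal points.

```python
import numpy as np

def flag_scratches(profiles, k=3.0):
    """Flag tracked surface points whose pixel-value transition deviates
    from the majority. profiles: (P, T) intensities of P points over T frames."""
    med_profile = np.median(profiles, axis=0)          # typical transition
    dev = np.linalg.norm(profiles - med_profile, axis=1)
    med_dev = np.median(dev)
    mad = np.median(np.abs(dev - med_dev)) + 1e-9
    return dev > med_dev + k * mad                      # boolean mask of anomalies

# nine normal points share the same transition; one (a scratch) glints
profiles = np.tile(np.linspace(0.0, 1.0, 8), (10, 1))
profiles[3] += 5.0
flags = flag_scratches(profiles)
```

With real data the normal transitions also vary with surface orientation, which is where the angular-invariant features come in.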
In this paper, we propose a new method to segment a book shape, represented as a 3D point cloud, into individual pages as an elemental technology of book digitization based on 3D sensing. First, we note that each page of a book can be treated as a two-dimensional surface embedded in three-dimensional Euclidean space. We then derive the Lipschitz continuity of both the normal and the curvature of the surfaces representing each page, based on the nature of book shapes and the differential-geometric properties of surfaces. From this Lipschitz continuity, we derive conditions for two arbitrary points to lie on the same surface. The proposed method achieves page segmentation by clustering the point cloud with a disjoint-set data structure based on these conditions. In experiments, we evaluated the computation time and accuracy of the proposed method and showed that it achieves a success rate of about 97% within several milliseconds.
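The clustering step can be sketched as follows (a simplified illustration, not the paper's implementation: only a normal-continuity condition is checked here, whereas the paper also uses curvature): nearby points are merged in a disjoint-set structure when their normals are consistent with lying on the same smooth surface.

```python
import numpy as np

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def segment_surfaces(points, normals, radius=0.05, max_angle=0.2):
    """Merge neighboring points whose unit normals differ by less than
    max_angle (radians) -- a simplified same-surface condition."""
    n = len(points)
    ds = DisjointSet(n)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < radius:
                cos = np.clip(abs(np.dot(normals[i], normals[j])), 0.0, 1.0)
                if np.arccos(cos) < max_angle:
                    ds.union(i, j)
    return [ds.find(i) for i in range(n)]

# two flat patches with different orientations -> two clusters
pts = np.array([[0, 0, 0], [0.02, 0, 0], [0.04, 0, 0], [1.0, 0, 0], [1.02, 0, 0]], float)
nrm = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 1], [1, 0, 0], [1, 0, 0]], float)
labels = segment_surfaces(pts, nrm)
```

A spatial index (k-d tree) replaces the quadratic neighbor search in any real implementation, which is how millisecond runtimes become possible.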
This paper proposes a novel method to improve the robustness of scale-reconstructible structure from motion (SfM) using refraction. SfM is a three-dimensional (3D) measurement method using a single moving camera. Conventional SfM can simultaneously estimate the 3D positions of objects and the camera poses, but the real-world scales of the objects cannot be determined. To solve this problem, an SfM method using refraction was proposed, in which it was verified that object shapes could be reconstructed with their real-world scales from just two images captured through a refractive plate. However, SfM using refraction is strongly influenced by measurement errors, which can lead to reconstruction failure. The purpose of this paper is to improve the robustness of scale-reconstructible SfM using refraction. To that end, we propose a bundle adjustment that accounts for the influence of refraction, together with approaches for selecting its initial values and for the evaluation function of the optimization. Simulations verified that the 3D reconstruction succeeds with the proposed method even when the measurement errors are large, and the object's scale was also reconstructed successfully in a real experiment.
In this paper, a novel method for 6-degrees-of-freedom (DoF) localization of a single spherical camera in a man-made environment is proposed. Taking advantage of the line features that are usually present in such an environment, a technique is developed to match the 2D line feature information inside a spherical image to the 3D line segment information available in a known 3D model of the environment. There are two main challenges to overcome. The first is the detection of line feature information in a spherical image and its abstraction into a descriptor that is compatible with the 3D line feature information in the model. The second is to evaluate the similarity between the line feature information from the 2D image and that from arbitrary 6-DoF poses in the 3D environment model in order to localize the camera. For the former, a randomized Hough transform with spherical gradient-based filtering is used to accurately detect line features in the image and create a line feature descriptor; the same descriptor is created from arbitrary 6-DoF poses in the 3D model. For the latter, the Earth Mover's Distance (EMD) is used to evaluate their similarity. The proposed method was evaluated in a real environment with its 3D model, and the results demonstrate that it can effectively estimate the 6-DoF pose of a spherical camera from a single image.
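For intuition about the EMD similarity step, here is a simplified illustrative case (not the paper's descriptor): for two normalized 1-D histograms over equal, unit-spaced bins, the EMD reduces to the L1 distance between their cumulative sums.

```python
import numpy as np

def emd_1d(h1, h2):
    """Earth Mover's Distance between two normalized 1-D histograms with
    equal, unit-spaced bins: the L1 distance between their CDFs."""
    return float(np.abs(np.cumsum(h1) - np.cumsum(h2)).sum())

# moving all mass two bins to the right costs 1 (mass) * 2 (bins) = 2
cost = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

Unlike a bin-wise distance, the EMD stays small when the two distributions are merely shifted slightly, which makes it forgiving of small pose perturbations when comparing line-feature descriptors.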
The bird's-eye view system is an image presentation system for teleoperation. A bird's-eye view image helps an operator visually recognize the surrounding environment because of its high visibility. However, when there are obstacles higher than the floor or the ground, their positions appear incorrectly in the bird's-eye view image because of image distortion. This paper presents a new method to visualize obstacle positions correctly in the bird's-eye view image by omnidirectional 3D ranging. Specifically, distances to obstacles measured by LiDAR are superimposed onto the floor or the ground as points colored using fisheye camera images. As the experimental results show, the image generated by the proposed method can improve safety and visibility in teleoperation.