Path prediction methods based on deep learning architectures consider the interactions of pedestrians with features of the surrounding physical environment. However, these methods treat all targets as a single category, making it difficult to predict a suitable path for each category. In real scenes, both pedestrians and vehicles must be considered, and taking the types of multiple targets into account makes it possible to predict a path appropriate to each target type. Therefore, to achieve path prediction adapted to individual categories, we propose a path prediction method that represents the target type as an attribute while also considering physical environment information. Our method feeds feature vectors representing i) the past object trajectory, ii) the attribute, and iii) the semantics of the surrounding area into a long short-term memory, making it possible to predict a proper path for each target. Experiments show that our approach predicts paths with higher precision. We also analyze the effectiveness of introducing the attribute of the prediction target and the physical environment information.
The purpose of this study is to develop a wide-range measurement system for parking support. A fisheye stereo camera is used as the sensor: its wide angle of view makes it well suited for parking assistance. In previous research, distance measurement was performed with a fisheye stereo camera using the conventional binocular stereo method, but the accuracy was insufficient. This research therefore introduces motion stereo in addition to binocular stereo. Since the two measurement methods have different properties, fusing them can be expected to compensate for each other's weaknesses. For the motion stereo, three types of matching are introduced, yielding three measurement results with different baseline directions, so fusing them can also be expected to improve the accuracy of distance measurement. To exploit the advantages of each measurement method, we propose a fusion method called the bilateral-like filter. This filter combines a continuity term, based on the dense measurement values of binocular stereo, with a weight that accounts for the image regions where each method is most accurate.
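The bilateral-like fusion described above can be illustrated with a minimal per-pixel sketch. This is not the paper's actual filter; the function name, the position-dependent weight `w_pos`, and the continuity scale `sigma_d` are assumptions introduced for illustration.

```python
import math

def bilateral_like_fusion(d_bino, d_motion, w_pos, sigma_d=0.5):
    """Fuse binocular and motion-stereo depths per pixel (hypothetical sketch).

    d_bino, d_motion : per-pixel depth lists from each method
    w_pos            : per-pixel prior weight (0..1) favoring motion stereo
                       in image regions where it is assumed more accurate
    sigma_d          : continuity scale; motion-stereo values that deviate
                       from the dense binocular value are down-weighted
    """
    fused = []
    for db, dm, wp in zip(d_bino, d_motion, w_pos):
        # Continuity term: penalize motion-stereo values far from the
        # locally dense binocular measurement.
        w_cont = math.exp(-((dm - db) ** 2) / (2 * sigma_d ** 2))
        w = wp * w_cont
        fused.append(w * dm + (1 - w) * db)
    return fused
```

With this weighting, a motion-stereo value close to the binocular value dominates where its prior weight is high, while a strongly deviating value is suppressed and the binocular measurement is kept.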
Inspection of concrete structures such as tunnels and bridges is most often performed in outdoor environments where wind and vehicle noise are strongly present. Therefore, inspection methods must be robust against acoustic noise. The use of an impact hammer, which has a force sensor embedded in its head, has the advantage of being inherently robust against acoustic noise compared with the commonly used acoustic hammering inspection method, while retaining the same ease of use. However, because data can be captured only during the short impact time, the force sensor alone does not allow acceptable defect detection. Therefore, in this study, defect detection performance was improved by considering the position of cracks on the concrete surface and the sampling position obtained from a camera image, in addition to the response of the impact hammer's force sensor. Experimental results obtained using concrete test blocks under laboratory conditions show that the ability to detect defects with an impact hammer was significantly improved.
This paper proposes a novel method for on-site software customization to improve the performance of facial image sensing. Facial image sensors have become increasingly useful in fields such as driver monitoring systems in automobiles and elderly-person monitoring systems in lifestyle support services. However, it is difficult to perform facial image sensing with high accuracy when a user has large scars on the face, or defects or deformations of facial parts such as the eyes, nose, and mouth; in such cases a driver monitoring system, for example, contributes little to safe driving. The software for facial image sensing then needs to be customized on-site, for example at a dealer shop. With the proposed method, only a few simple parameters need to be set, and no relearning of facial feature values is required, yet the method achieves high sensing performance for the intended user's face as well as for faces in general. We also show how to collect the intended user's facial images for the software customization and its accuracy validation using portable, user-friendly equipment. Experiments using actual facial images confirm that the proposed method is effective.
We investigate how to represent the probability of gaze distributions, indicating which body parts observers frequently view when judging impression words for the subjects in person images. In cognitive science, analytical studies have reported how observers view person images and judge the impressions of the subjects. However, how to represent the probability of gaze distributions during impression judgment has not been discussed. Our method gives observers the task of judging impression words related to a formal scene and measures their gaze locations. From the measured gaze locations, we compute a conditional gaze probability over body parts. We evaluated how the gaze probabilities change across the impression words and body parts included in the tasks, and confirmed a tendency for impression words to produce different gaze behavior, because the divergences between the conditional gaze probabilities of the body parts were large.
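The conditional gaze probability above amounts to normalized fixation counts per impression word. A minimal sketch, assuming a hypothetical `(word, body_part)` pair format for the measured gaze samples:

```python
from collections import Counter, defaultdict

def conditional_gaze_probability(fixations):
    """Estimate P(body part | impression word) from gaze samples.

    fixations: list of (impression_word, body_part) pairs, one pair per
    measured gaze location (hypothetical data format for illustration).
    """
    counts = defaultdict(Counter)
    for word, part in fixations:
        counts[word][part] += 1
    probs = {}
    for word, counter in counts.items():
        total = sum(counter.values())
        # Normalize fixation counts into a conditional distribution.
        probs[word] = {part: n / total for part, n in counter.items()}
    return probs
```

Comparing the resulting distributions across impression words (e.g., via a divergence measure) is then what reveals whether different words elicit different viewing behavior.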
In this paper, simple tasks mimicking visual label inspection are described to compare the accuracy of humans with that of deep learning techniques. Using these simple tasks, we investigate the number of training samples required to obtain accuracy equal to or higher than that of human inspection. In our method, letters printed on test labels are represented as symbols. Variations in the symbols are controlled by changing the rotation angle, the defect position, and the defect rate. Training samples consisting of images and defect bounding boxes are generated automatically. The experimental results show that on the order of several thousand training samples were needed to reach accuracy equal to or higher than that of humans in the simple task. They also show that on the order of tens of thousands of training samples were needed when the defect rate of the symbols was low.
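The automatic sample generation with a controlled defect rate could be sketched as follows. This is an illustrative toy, not the paper's generator: the eight candidate defect positions and the 4-pixel bounding-box geometry are assumptions.

```python
import random

def generate_samples(n, defect_rate, seed=0):
    """Generate hypothetical training annotations for the symbol task.

    Each sample records a rotation angle, whether a defect is present
    (drawn with the given defect rate), a defect position, and the
    bounding box that would accompany the rendered image.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        angle = rng.uniform(0.0, 360.0)
        defective = rng.random() < defect_rate
        # Assumed: 8 candidate defect positions along the symbol.
        pos = rng.randrange(8) if defective else None
        bbox = (pos * 4, 0, pos * 4 + 4, 4) if defective else None
        samples.append({"angle": angle, "defective": defective,
                        "position": pos, "bbox": bbox})
    return samples
```

Lowering `defect_rate` makes positive examples rare, which is the regime where the abstract reports that far more training samples were needed.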
This paper proposes a visualization method for inspecting the manufacturing defects of products. The standardization of the inspection task and the elimination of re-manufacturing are the main problems in the production of various kinds of products in small quantities. The proposed method visualizes manufacturing defects by superimposing a 3D CAD model on a monocular image using AR (augmented reality) technology. Industrial products have many straight lines and little texture, so this method uses edges to estimate the six-degrees-of-freedom (6DoF; rotational and translational) object pose. Only partial manual manipulation by the user is needed to realize coarse-to-fine pose estimation accurately and easily. Moreover, manufacturing defects are detected and visualized by robust registration based on the least median of squares (LMedS) method. The advantages of the proposed method are as follows: 1) it needs only one monocular camera and a CAD model for inspection; 2) it does not need data acquisition for learning or conditional settings of the environment in advance. This method enables novices to inspect products on-site efficiently. We demonstrate the effectiveness of the proposed method by creating an original dataset based on a public dataset.
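LMedS minimizes the median of the squared residuals instead of their sum, so up to half the data can be outliers. As an illustrative sketch (2D line fitting, not the paper's actual 6DoF registration):

```python
import random

def lmeds_line_fit(points, n_trials=200, seed=0):
    """Fit y = a*x + b by least median of squares (illustrative sketch).

    Repeatedly fits a line to a random minimal sample (two points) and
    keeps the hypothesis whose median squared residual over all points
    is smallest, which is robust to up to ~50% outliers.
    """
    rng = random.Random(seed)
    best, best_med = None, float("inf")
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate sample, cannot define a line
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = sorted((y - (a * x + b)) ** 2 for x, y in points)
        med = residuals[len(residuals) // 2]
        if med < best_med:
            best_med, best = med, (a, b)
    return best
```

In the registration setting, the same idea makes the model-to-image alignment insensitive to edge correspondences that fall on defective regions, which is precisely what lets the defects be detected as outliers afterwards.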
This study proposes a novel camera localization method using the color information of spherical camera images and 3D distance information obtained from a 3D model of the environment. To achieve higher accuracy, all pixels in the spherical camera images are used for the estimation instead of particular features. Continuous tracking of the camera pose is achieved by updating the 3D distance information using the previously estimated camera pose. The effectiveness of this method is confirmed by experiments.
This paper proposes a novel method to improve the automatic detection of small tumors in the segmentation of computed tomography (CT) images. Medical image datasets are highly biased, in that most image regions are non-tumor. Recent automatic tumor detection methods use convolutional neural networks (CNNs), but small tumors are hardly detected because they are difficult to find in a single slice image. In the proposed method, sequential multiple slices including the target slice are used as the input patch. To train the CNN model, we propose a loss function called the Multi-Slices (MS) loss, which is calculated from the annotations of the sequential slices. By using multiple slices for the segmentation of CT images, the model learns to recognize a small tumor that appears across several sequential slices. The proposed multi-slice method improves the Dice coefficient by 5.9% compared with a conventional single-slice method, showing that it is effective for the detection of small tumors.
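The multi-slice idea can be sketched as a loss averaged over the annotations of neighboring slices. This is a minimal toy in plain Python, not the paper's exact MS loss; using a soft Dice term per slice is an assumption.

```python
def dice(pred, gt, eps=1e-6):
    """Soft Dice coefficient between flat prediction and label lists."""
    inter = sum(p * g for p, g in zip(pred, gt))
    return (2 * inter + eps) / (sum(pred) + sum(gt) + eps)

def multi_slice_loss(preds, gts):
    """Hypothetical MS-style loss: average the per-slice Dice losses over
    sequential slices, so a small tumor visible in neighboring slices
    still contributes a training signal even if the target slice alone
    gives little evidence."""
    losses = [1.0 - dice(p, g) for p, g in zip(preds, gts)]
    return sum(losses) / len(losses)
```

A perfect multi-slice prediction drives the loss to zero, while missing the tumor in any neighboring slice keeps it high, which is what pushes the network to exploit inter-slice context.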
We present a system for creating short summarized videos from longer cooking videos. Such videos can be easily shared on social media to convey the required steps of a particular recipe. Creating them manually is typically time-consuming and requires video-editing skills. We therefore propose a semi-automatic system that uses the information of an online recipe website, which contains both images and text descriptions. We first search for key frames in the video based on their similarity to the image of each recipe step. For more accurate key-frame matching, we match labels between objects detected in the video and the text description, and use regions of interest defined by motion. Both user studies and qualitative experiments confirmed the usability and effectiveness of our proposed system.
Random noise degrades both basic image quality and subsequent image-processing procedures. Low-pass filters are commonly used for image denoising; they reduce noise, but they also blur edges as a side effect. To suppress this side effect, we propose an edge-preserving noise reduction filter using a fast M-estimation method. When the proposed method was applied to noisy images, the noise was clearly reduced while edge preservation was achieved at the same time. Further experimental study of the robustness of the proposed method is left for future work.
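The edge-preserving behavior of an M-estimation filter can be illustrated with a 1D sketch: each sample is replaced by a robustly weighted local mean, where neighbors whose intensity differs strongly get small influence. This is a generic illustration (using a Geman-McClure-style weight), not the paper's fast M-estimation algorithm.

```python
def robust_smooth(signal, radius=2, c=30.0, n_iter=3):
    """Edge-preserving smoothing via iterative M-estimation (sketch).

    Each output sample is a robustly weighted local mean: neighbors
    whose intensity differs strongly from the current estimate receive
    small weights, so step edges are not averaged across.
    """
    est = list(signal)
    for _ in range(n_iter):
        new = []
        for i in range(len(signal)):
            num = den = 0.0
            for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
                r = signal[j] - est[i]
                w = 1.0 / (1.0 + (r / c) ** 2)  # robust influence weight
                num += w * signal[j]
                den += w
            new.append(num / den)
        est = new
    return est
```

On a step edge, samples from the other side of the edge have large residuals and near-zero weight, so the edge stays sharp while same-side noise is averaged out; a plain low-pass filter would instead smear the step.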
Recent learning-based multi-view stereo (MVS) approaches have shown excellent performance. These approaches typically train a deep neural network to estimate dense depth maps from multiple images. However, most of them require large-scale dense depth maps as supervisory signals during training. This paper proposes a self-supervised learning framework for MVS, which learns to estimate dense depth maps from multiple images without dense depth supervision. Taking an arbitrary number of images as input, we produce sparse depth maps using structure from motion and use them as self-supervision. We apply reconstruction and smoothness losses to regions where there is no sparse depth. For stable training, we introduce a pseudo-depth loss, which is the difference between the depth maps estimated by the network with the current and past parameters. Experimental results on multiple datasets demonstrate the effectiveness of our self-supervised learning framework.
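The pseudo-depth loss compares predictions made with the current parameters against those made with past parameters. A minimal sketch, assuming a mean-absolute-difference form (the paper's exact distance is not specified here):

```python
def pseudo_depth_loss(depth_current, depth_past):
    """Hypothetical pseudo-depth loss: mean absolute difference between
    depth maps predicted with the current and the past network
    parameters, discouraging abrupt changes in the predictions and
    stabilizing training where sparse supervision is absent."""
    assert len(depth_current) == len(depth_past)
    n = len(depth_current)
    return sum(abs(c - p) for c, p in zip(depth_current, depth_past)) / n
```

In practice the past-parameter prediction would be treated as a fixed target (no gradient flows through it), so the loss only pulls the current network toward its own earlier, smoother estimates.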
In the computer graphics field, microfacet theory has been used in many studies as an effective model for expressing light reflection from a rough surface, but no study has actually observed microfacets as images. To establish a thin-film interference color simulation of a titanium oxide film based on a physical model, in this study we attempted observation via an optical microscope to obtain a microfacet normal distribution model of an oxide-layered coating on titanium. We successfully observed properties and optical phenomena corresponding to the microfacet concept. We verified the acquired microfacet images and propose a method to calculate the microfacets' normal distribution via color image analysis. The calculated distribution was well approximated by a Gaussian distribution with an added distortion component.
In the event of a disaster, unmanned construction with remotely operated construction machinery is critical for quick disaster recovery. Such machines can weigh several tons and can easily sink into inadequate soil. Therefore, it is important to judge the trafficability of remotely operated construction machinery at a disaster site. In this research, we propose a non-contact method for judging trafficability. The proposed method classifies the soil type and estimates the water content using spectral images; the cone index is then estimated, and the trafficability is judged from the cone index. In the experiments, we judged the trafficability for a real construction machine using the proposed method. The results showed the effectiveness of the proposed method based on soil type classification and water content estimation.