In this study, to promote the translation and digitization of historical documents, we attempted to recognize Japanese classical kuzushiji characters using the dataset released by the Center for Open Data in the Humanities (CODH). Using deep learning, which has undergone remarkable development in the field of image classification, we analyzed through experiments how successfully deep learning could classify kuzushiji characters across more than 1,500 classes, and what made kuzushiji characters difficult to classify. In addition, we introduced a method to automatically reject characters that were difficult to classify or that were not seen during training. In actual translation work, such ambiguous characters can be left as the unknown character “〓” (geta) and passed on to an expert for a decision. Finally, our experiments showed that the classification rate improved from 72.10% to over 90% by performing data augmentation and classifying only characters with high confidence, confirming the effectiveness of our proposed method.
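The confidence-based rejection described above can be sketched as follows; the 0.9 threshold and the toy label set are illustrative assumptions, as the abstract does not state the actual settings.

```python
import numpy as np

GETA = "\u3013"  # 〓, the placeholder character left for expert review

def classify_with_rejection(probs, labels, threshold=0.9):
    """Return the top-scoring label, or GETA when the highest softmax
    probability falls below the confidence threshold."""
    top = int(np.argmax(probs))
    if probs[top] < threshold:
        return GETA  # ambiguous or unseen character: defer to an expert
    return labels[top]
```

Only high-confidence predictions are kept, which is what raises the classification rate on the accepted subset while deferring the rest.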
When recognizing an image taken from an arbitrary angle in a real environment, it can be difficult to obtain a good recognition result. That is, recognition accuracy is affected by changes in viewpoint (the direction from which the object is viewed). This phenomenon may stem from the fact that people tend to photograph target objects from easily interpretable aspects. Although it has long been known empirically that the recognition rate varies with viewpoint, the topic remains underexplored. Therefore, in this research, we aim to clarify and visualize the existence of viewpoint biases in existing large object recognition image datasets. We also quantify these viewpoint biases by defining a viewpoint bias index.
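The abstract does not define the index itself; one hypothetical formulation, assuming viewpoints are discretized into bins, measures how far the dataset's viewpoint histogram deviates from uniform:

```python
import math
from collections import Counter

def viewpoint_bias_index(viewpoint_labels, num_bins=8):
    """Hypothetical bias index: 1 minus the normalized entropy of the
    viewpoint histogram. 0.0 means viewpoints are uniformly covered
    (no bias); 1.0 means every image shares one viewpoint (maximal bias)."""
    if num_bins <= 1:
        return 0.0
    counts = Counter(viewpoint_labels)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(num_bins)
```

An entropy-based index of this kind is only one plausible choice; the paper's actual definition may differ.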
We propose a method for classifying gender using training samples to which privacy protection has been applied. Recently, training samples containing individuals have been required to have their privacy protected, and head regions of training samples are usually manipulated for this purpose. However, the accuracy of gender classification degrades when the protected training samples are used directly. Here, we aim to exploit the human visual capability to correctly recognize males and females even when head regions are manipulated. We use the gaze distributions of observers who view stimulus images to preprocess the gender classifier's training samples. Experimental results show that our method improves the accuracy of gender classification after the training samples are manipulated by masking, pixelation, and blurring for privacy protection.
In this paper, we tackle a novel setting in which a neural network generates object images with transferred attributes by conditioning on natural language commands. Conventional methods for object image transformation have used visual attributes, which are components that describe the object's color, posture, etc. This paper builds on that approach and presents an algorithm that precisely extracts information from natural language commands, transfers the attributes of an image, and completes this image translation model. The effectiveness of our information extraction model is evaluated experimentally, with additional tests to verify that changes in visual attributes are correctly reflected in the generated image.
It is important to visualize a teleoperated robot and its surrounding environment for the operator. An arbitrary-viewpoint visualization system uses onboard sensors to generate images that simulate views taken from an external viewpoint. For field use of such a visualization system, the sensors must be calibrated. In this research, a calibration method for a laser radar and multiple fisheye cameras is proposed. The proposed method uses the shape model of the machine and multiple planes in the surrounding environment to calibrate the fisheye cameras and the laser radar. Calibration experiments were conducted with a real machine. The image obtained using the calibration result of the proposed method shows low discrepancy between the laser-radar mesh and the fisheye-camera texture.
In recent years, much research on surrounding-environment recognition using LIDAR has been actively conducted to advance autonomous driving. PointNet, a DNN-based object recognition method that directly processes 3D point clouds, has shown good performance. However, it has the limitation that, because the number of input points is fixed, it cannot be applied to point clouds containing varying numbers of points. Therefore, this paper proposes a novel sampling module that allows an arbitrary number of points to be input into PointNet. The proposed module features two functions: down-sampling that maintains the shape of target objects, and up-sampling based on LIDAR characteristics. To evaluate the effectiveness of the proposed method, an experiment is conducted on point clouds provided by the KITTI Vision Benchmark Suite. The experimental results show that the recognition method using PointNet with the proposed sampling module outperforms conventional methods.
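A minimal sketch of such a module is shown below. Farthest point sampling stands in for the shape-preserving down-sampling, and random point duplication stands in for the LIDAR-characteristic-based up-sampling; neither is necessarily the paper's exact algorithm.

```python
import numpy as np

def farthest_point_downsample(points, n):
    """Shape-preserving down-sampling: greedily pick the point farthest
    from those already selected (farthest point sampling)."""
    selected = [0]
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n - 1):
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return points[selected]

def resample_to_fixed_size(points, n):
    """Map a cloud with an arbitrary number of points to exactly n points
    so it can be fed to a fixed-input network such as PointNet."""
    if len(points) > n:
        return farthest_point_downsample(points, n)
    # up-sample by duplicating random points (a placeholder for the
    # paper's LIDAR-characteristic-based up-sampling)
    extra = points[np.random.randint(0, len(points), n - len(points))]
    return np.concatenate([points, extra], axis=0)
```

Either branch yields an (n, 3) array, so clouds of any size can share one network input layer.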
Cooking everyday dishes is a vital part of daily life. This paper proposes a novel system that takes a grocery-store food flyer and automatically recommends popular recipes containing ingredients that appear in the flyer. Based on an optical character recognition (OCR) technique, our algorithm extracts ingredient names from the words in the flyer by matching them against a dedicated ingredient dictionary. The extracted ingredients are then used as queries to retrieve cooking information from a recipe database. A newly proposed word-correction scheme using multiple similarity measures robustly corrects characters misrecognized by the OCR, boosting extraction performance. We conducted both quantitative and subjective evaluations to confirm the effectiveness of the proposed method. The subjective evaluation revealed that more than 90% of the participants rated the proposed system as practical and were satisfied with the quality of the recommended recipes.
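A word-correction step of this kind might combine, for example, an edit-similarity ratio with character-bigram Jaccard similarity; the measures, weights, and toy dictionary below are illustrative assumptions, not the paper's actual choices.

```python
from difflib import SequenceMatcher

INGREDIENT_DICT = ["tomato", "potato", "onion", "carrot"]  # toy dictionary

def bigram_jaccard(a, b):
    """Jaccard similarity over character bigrams."""
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(A & B) / len(A | B) if A | B else 0.0

def correct_word(word, dictionary=INGREDIENT_DICT, min_score=0.5):
    """Correct an OCR-misrecognized word by averaging two similarity
    measures; return None when nothing is similar enough."""
    def score(cand):
        seq = SequenceMatcher(None, word, cand).ratio()
        return 0.5 * seq + 0.5 * bigram_jaccard(word, cand)
    best = max(dictionary, key=score)
    return best if score(best) >= min_score else None

correct_word("tomaho")  # → "tomato"
```

Combining measures makes the correction robust: an OCR error that defeats one measure (e.g. a transposition) is often still caught by the other.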
Recently, the development of deep learning has enabled robots to grasp objects more reliably than ever, and there is an increasing demand for helper robots and home robots. To realize such robots, they need to understand not only how to grasp objects but also the objects' functions. We propose a new representation for the functions of objects, the task-oriented function, which is based on operational task input. This representation makes it possible to describe a variety of ways to use an object. We also propose a new dataset for task-oriented functions and a network to detect them. The model achieved 79.7% mean IoU on our dataset.
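For reference, mean IoU over a set of predictions can be computed as below, assuming axis-aligned boxes given as (x1, y1, x2, y2); the paper's detector may instead score pixel-wise regions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union else 0.0

def mean_iou(preds, gts):
    """Average IoU over matched prediction/ground-truth pairs."""
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```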
In the field of Kansei (affective) engineering, a common approach is to model a product's Kansei index to meet users' affective needs. This study models the Kansei index automatically by machine learning, using review texts and images of products on the web. The proposed method proceeds as follows: (1) extract the main impressions of the target domain and, by text mining, calculate text impression scores that express the strength of each impression in the review texts; (2) create a product image dataset whose training labels are derived from the distribution of human evaluations of the products' impressions; and (3) construct a deep neural network that estimates a product's image impression score using this dataset. The proposed method was applied to wristwatches as the target domain, and the estimation accuracy of the constructed deep neural network was verified. As a result, a high correlation coefficient of 0.67 was confirmed between image impression scores and text impression scores, demonstrating the effectiveness of the proposed method. In addition, since this exceeded the correlation coefficient of 0.51 obtained from another deep neural network that had not learned the evaluation distribution, learning the distribution was shown to be effective for improving estimation accuracy.
The purpose of this research is to develop vision-driven image captioning technology capable of creating not just simple situation descriptions but refined expressions such as jokes. The proposed method consists of three phases: collection, joke, and assessment. In the “collection” phase, we collect 270,000 images and 5,000,000 funny jokes (captions) from the Japanese joke website “Bokete” and build a joke database named BoketeDB. In the “joke” phase, we adopt a step function that changes a weight according to the evaluation of each caption in BoketeDB; the function outputs a Funny Score representing that evaluation. We use the Funny Score to tune parameters when training a conventional CNN-LSTM model. In the “assessment” phase, we prepare two ways of evaluating machine-generated jokes. One is to collect questionnaires about the generated jokes from 121 unspecified subjects. The other is to post the generated jokes directly to the Bokete website, where they are evaluated by people browsing the site. We have verified that the proposed method is superior to conventional image captioning methods at producing jokes.
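The Funny Score weighting can be sketched as a step function over a caption's star rating; the thresholds and weight values below are hypothetical, since the abstract does not give the actual numbers.

```python
def funny_score(num_stars):
    """Hypothetical step function mapping a caption's Bokete star count
    to a training weight (the paper's exact thresholds are not given)."""
    if num_stars >= 100:
        return 3.0   # very funny: emphasize strongly during training
    if num_stars >= 10:
        return 2.0
    if num_stars >= 1:
        return 1.0
    return 0.0       # unrated captions contribute nothing

def weighted_loss(base_loss, num_stars):
    """Per-sample weight applied to the CNN-LSTM caption loss."""
    return funny_score(num_stars) * base_loss
```

The effect is that highly rated jokes dominate the gradient, steering the captioner toward funnier phrasings.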
In this paper, we analyze headlight flicker patterns to improve a pedestrian's detectability by a driver. Recently, headlights have become capable of selectively projecting light onto a pedestrian in addition to the normal forward projection. However, it has not yet been analyzed how the light should be projected to effectively improve pedestrian detectability. First, we confirm the effectiveness of flicker light projection in a real-world setting. Next, we conduct experiments under various ambient light conditions using a driving simulator to find the effective flicker pattern for each condition. As a result, we confirmed that flicker light projection improves a pedestrian's detectability by the driver, and that the effective fundamental frequencies of the flicker lights differ depending on the ambient light conditions.