Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
35th (2021)
Displaying 251-300 of 514 articles from this issue
  • Shunsuke HABARA, Yoshiaki KUROSAWA, Kazuya MERA, Toshiyuki TAKEZAWA
    Session ID: 2Xin5-22
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    There is a growing trend towards technologies that use deep neural networks to improve sound quality, such as signal denoising and systems that convert voice quality in real time for online conferences. In the field of computer vision, inpainting techniques based on deep neural networks have also been developed in recent years. In this paper, we focus on an inpainting technique with contextual attention to recover spectrograms. We apply a mask along the time direction of the spectrogram and examine whether the spectrogram can be recovered from the non-masked area. We propose a method to improve the accuracy of speech restoration by adding a gradient in the frequency direction to the spectrogram. As a result, our proposed method improved one of the sound-quality metrics, Mel-Cepstral Distortion. We also confirmed that the attention map showed improved attention in the frequency direction.
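The masking setup can be sketched as follows; the spectrogram shape, the masked frame range, and the linear form of the frequency gradient are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
spec = rng.random((128, 200))           # e.g. 128 mel bins x 200 time frames (assumed)

# Mask a contiguous block of frames along the time direction.
t0, t1 = 80, 120                        # masked frame range (assumed)
mask = np.ones_like(spec)
mask[:, t0:t1] = 0.0
masked_spec = spec * mask

# Add a gradient in the frequency direction (0 at the lowest bin,
# 1 at the highest), broadcast across all time frames.
freq_gradient = np.linspace(0.0, 1.0, spec.shape[0])[:, None]
conditioned = masked_spec + freq_gradient
```

The gradient gives every time frame a fixed frequency-dependent offset, so the network can tell frequency bins apart even inside the masked region.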

    Download PDF (767K)
  • Takamichi TODA, Yuta TOMOMATSU, Masakazu SUGIYAMA
    Session ID: 2Yin5-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, more and more companies have been introducing chatbots in the field of customer support, but maintaining chatbots requires a great deal of manpower. We propose a system to streamline the redesign of response candidates, one of the most time-consuming maintenance tasks. Answer candidates can be redesigned in two ways: by tying them to existing answers or by creating new answers. For the former, we propose an automatic linking method based on the continuity of speech, and for the latter, a method for extracting new inquiries by using clauses. We evaluated these methods by applying them to the logs of chatbots used in actual customer support and showed their effectiveness in multiple domains.

    Download PDF (295K)
  • Sota HORIUCHI, Ryuichiro HIGASHINAKA
    Session ID: 2Yin5-02
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In order for a dialogue system to provide information and recommendations tailored to the user, it is important to ask the user questions and obtain the necessary information. However, depending on the question, asking it in the middle of a dialogue may disrupt the flow of the conversation or decrease the user’s satisfaction. In this research, we aim to develop a response generation model for a chat-oriented dialogue system that can ask specific questions naturally. Specifically, we constructed a response generation model which generates utterances on the basis of both the dialogue context and the question to be asked. As a result of experiments using dialogue simulation, we confirmed that the proposed model has the potential to ask specific questions while maintaining naturalness.

    Download PDF (354K)
  • Atsushi KEYAKI, Yuuki TACHIOKA
    Session ID: 2Yin5-03
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In this study, we aim to generate high quality data by considering the similarity between entities. The experimental evaluations showed that the classification accuracy was improved by considering the similarity of entities.

    Download PDF (436K)
  • Masakazu SUGIYAMA, Ryoma YOSHIMURA, Yuta TOMOMATSU, Mamoru KOMACHI
    Session ID: 2Yin5-04
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, the performance of speech recognition and speech synthesis has improved, and automatic voice response services using them have begun to be widely provided. In such services, the accuracy of speech recognition is an important factor directly linked to service quality, but despite the improved performance, speech recognition accuracy is not perfect. We therefore consider correcting speech recognition errors in the same way as grammatical error correction. The performance of grammatical error correction has improved dramatically due to the rise of deep learning methods using language models pre-trained on huge corpora, but there is no huge Japanese speech recognition error corpus. Therefore, we analyzed the tendency of errors in a small Japanese speech recognition corpus, formulated error assignment rules, and applied these rules to a huge Japanese corpus to automatically create a pseudo speech recognition error corpus. In this study, we perform error correction experiments with a Transformer using pseudo error corpora created under multiple settings for pre-training, and evaluate the effect of the corpus creation settings on accuracy.
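A minimal sketch of rule-based pseudo error injection: the substitution rules below are invented placeholders, whereas the paper's rules were derived from an actual Japanese ASR error corpus.

```python
import random

# Hypothetical word-level substitution rules imitating recognition errors.
ERROR_RULES = {
    "recognition": ["recondition"],
    "speech": ["speach"],
    "corpus": ["corps"],
}

def inject_errors(sentence, rules, p=0.5, seed=0):
    """Replace rule-matching words with probability p, producing a
    (noisy, clean) pair for error-correction training."""
    rng = random.Random(seed)
    noisy = []
    for word in sentence.split():
        cands = rules.get(word)
        if cands and rng.random() < p:
            noisy.append(rng.choice(cands))
        else:
            noisy.append(word)
    return " ".join(noisy), sentence

noisy, clean = inject_errors("speech recognition needs a corpus", ERROR_RULES, p=1.0)
```

Applying such rules to every sentence of a large clean corpus yields the pseudo parallel corpus used for pre-training.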

    Download PDF (424K)
  • Taiki MIYANISHI, Motoaki KAWANABE
    Session ID: 2Yin5-05
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    We propose an open-ended multimodal video question answering (VideoQA) method that simultaneously takes motion, appearance, and audio signals as input and then outputs textual answers. Although audio information is useful for understanding video content along with visual information, standard open-ended VideoQA methods exploit only the motion-appearance signals and ignore audio. Moreover, due to the lack of fine-grained modeling of multimodal data and effective fusion, the few prior works using motion, visual appearance, and audio signals showed poor results on public benchmarks. To address these problems, we propose multi-stream 3-dimensional convolutional networks (3D ConvNets) modulated with textual conditioning information. Our model integrates the fine-grained motion-appearance and audio information into multiple 3D ConvNets and then modulates their intermediate representations using question-guided spatiotemporal information. Experimental results on public open-ended VideoQA datasets with audio tracks show that our VideoQA method effectively combines motion, appearance, and audio signals and outperforms state-of-the-art methods.

    Download PDF (797K)
  • Ayana NIWA, Hiroshi MATSUDA
    Session ID: 2Yin5-06
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, much research on sentiment analysis has focused on the emotional aspects of sentiment, such as the causes of emotions; this study focuses on the "diversity of emotion sensitivities." In constructing datasets for sentiment analysis, it is common to set up various grammatical rules and word recognition criteria to guarantee labeling consistency, because emotional understanding fluctuates among annotators. However, strict criteria can cause biases, such as partially excluding emotional expressions that readers naturally perceive from the annotation targets. Therefore, in this study, we propose a policy for the intuitive annotation of emotional expressions by readers. We then analyze the fluctuation of emotional interpretations and annotations in the constructed dataset. In addition, we evaluate the ability of semi-supervised learning using unlabeled data to absorb the fluctuation of polarity expressions and labels.

    Download PDF (364K)
  • Estimation of Multiple aspect category polarities and target phrases
    Yoshihide MIURA, Etwi Barimah APPIAH, Masayasu ATSUMI
    Session ID: 2Yin5-07
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Sentiment analysis is a task that aims to analyze opinions, feelings, and attitudes in texts and classify their polarity as positive or negative. One of its subtasks is aspect-based sentiment analysis, which extracts entities and attributes as aspectual information contained in the text and classifies their polarity from context. In this paper, we propose a neural network model that solves three tasks, identification of multiple aspect categories, polarity classification, and identification of target phrases for each aspect category, by using the pre-trained language model BERT for text encoding. The performance of the model is evaluated using the SemEval dataset. Experiments show that the accuracy of the model in identifying aspect categories in texts and estimating their polarity is 98% and 95%, respectively, and the accuracy of target phrase estimation is 81%.

    Download PDF (748K)
  • Hiroaki TAKATSU, Ryota ANDO, Yoichi MATSUYAMA, Tetsunori KOBAYASHI
    Session ID: 2Yin5-08
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    We propose a method to identify the emotion labels of titles and sentences of news articles as a sequence labeling problem, using a model combining BERT and BiLSTM-CRF. First, we constructed a dataset in which emotion labels ("positive," "negative," or "neutral") are annotated for titles and sentences of news articles, and then evaluated the effectiveness of the model on this dataset. Furthermore, as an application example, we demonstrate a task of manually classifying articles written about a certain keyword into positive, negative, or neutral. We confirmed that when the colors of titles and sentences were emphasized according to the estimated emotion labels, the classification work could be completed in a shorter time than without such emphasis.

    Download PDF (378K)
  • Shiho SASAKI, Wataru OKAMOTO, Kouhei OSATO, Kazuhiko KAWAMOTO
    Session ID: 2Yin5-09
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In robot control using reinforcement learning, it is becoming common to acquire policies in a simulation environment and then apply them to a real environment. Since there is a gap between these environments, several methods have been proposed for bridging it by training robots in various simulation environments. In this work, we propose a curriculum reinforcement learning method for robots that can walk on various terrains. For the curriculum learning, the terrain in the simulation environment is represented by an Ising model, and its interaction parameter is used to determine the complexity of the terrain shape. From the nature of the Ising model, the terrain becomes flat when the interaction parameter is large and uneven when it is small. The evaluation experiments show the effectiveness of this terrain parameterization for curriculum reinforcement learning.
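The terrain parameterization can be sketched with a standard Metropolis-sampled Ising model; the grid size, step count, and the roughness measure below are illustrative assumptions, not the paper's setup.

```python
import math
import random

def ising_terrain(n=16, J=1.0, steps=20000, seed=0):
    """Sample an n x n spin grid with Metropolis updates; a larger
    interaction parameter J yields a flatter (more aligned) terrain."""
    rng = random.Random(seed)
    s = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        nb = (s[(i - 1) % n][j] + s[(i + 1) % n][j]
              + s[i][(j - 1) % n] + s[i][(j + 1) % n])
        delta_e = 2.0 * J * s[i][j] * nb   # energy change if s[i][j] flips
        if delta_e <= 0 or rng.random() < math.exp(-delta_e):
            s[i][j] = -s[i][j]
    return s

def roughness(s):
    """Fraction of horizontally adjacent cells with opposite spin."""
    n = len(s)
    diff = sum(s[i][j] != s[i][(j + 1) % n] for i in range(n) for j in range(n))
    return diff / (n * n)

flat_terrain = roughness(ising_terrain(J=2.0))    # strong coupling -> flat
rough_terrain = roughness(ising_terrain(J=0.05))  # weak coupling -> uneven
```

Sweeping J from large to small then yields a curriculum from flat to increasingly uneven terrain.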

    Download PDF (284K)
  • Ayano ENTA, Ichiro KOBAYASHI, Lis Kanashiro PEREIRA
    Session ID: 2Yin5-10
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    The internal behavior of a model acquired by reinforcement learning cannot be understood by humans because the model itself is a black box. Therefore, we apply fuzzy modeling to the input-output relationships of a deep reinforcement learning model and express the relationships with fuzzy linguistic variables to create linguistic control rules. In this study, using CartPole as the experimental subject, we express the control rules of a model learned by Deep Q-Network in natural language and try to control CartPole using those rules.
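A minimal sketch of expressing a policy as linguistic fuzzy rules for CartPole; the membership functions and the two rules below are invented for illustration, not the rules extracted from the Deep Q-Network.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_action(pole_angle):
    """Linguistic rules (hypothetical):
    IF pole is leaning_right THEN push right (action 1);
    IF pole is leaning_left  THEN push left  (action 0)."""
    leaning_left = tri(pole_angle, -0.4, -0.2, 0.0)
    leaning_right = tri(pole_angle, 0.0, 0.2, 0.4)
    return 1 if leaning_right >= leaning_left else 0
```

The rule with the highest membership fires, so the black-box policy is replaced by human-readable IF-THEN statements.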

    Download PDF (566K)
  • Sio RYUU, Masayasu ATSUMI, Yuuki MURATA
    Session ID: 2Yin5-11
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    This paper proposes a method of global self-localization based on a deep neural network for spatial feature recognition. The spatial feature recognition network consists of four modules: a spatial feature extraction CNN, a spatial category classification CNN, a semantic segmentation network for estimating the surrounding semantic segment distribution, and an instance category classification CNN. Global self-localization is performed based on instance categories and guide signs, which are recognized by OCR of sign segments. Experiments are conducted to evaluate the performance of the proposed global localization method.

    Download PDF (781K)
  • Kota MANTANI
    Session ID: 2Yin5-12
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, underwater drones have been developed and introduced for underwater investigation. To carry out investigations or operations efficiently and safely with an unmanned underwater robot, it is indispensable to correctly measure the distance between the robot and obstacles. In this paper, in order to automatically achieve depth estimation from images taken by an underwater drone equipped with a monocular camera, we propose a machine learning based algorithm that enables underwater depth estimation by adapting a depth estimation model from previous work trained for ground conditions. In addition, to confirm the performance of the proposed method, a quantitative evaluation of the generated underwater depth maps is performed. As a result, we succeeded in applying the ground depth estimation model to underwater images and obtained highly accurate underwater depth maps. Furthermore, by using the distance information from an equipped sonar, the depth of the target object can be corrected with high accuracy.
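One simple way to use a sonar range for correction is to rescale the relative depth map so it matches the sonar-measured distance at a reference pixel. This is a hypothetical sketch with toy values, since the abstract does not detail the paper's actual correction procedure.

```python
def correct_depth(depth_map, sonar_range_m, ref_row, ref_col):
    """Rescale a relative depth map so that the reference pixel matches
    the sonar-measured distance (monocular depth is scale-ambiguous)."""
    scale = sonar_range_m / depth_map[ref_row][ref_col]
    return [[d * scale for d in row] for row in depth_map]

relative = [[1.0, 2.0], [3.0, 4.0]]   # model output in relative units (toy values)
corrected = correct_depth(relative, sonar_range_m=6.0, ref_row=1, ref_col=0)
```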

    Download PDF (509K)
  • Yuki MURATA, Masayasu ATSUMI
    Session ID: 2Yin5-13
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In the field of person re-identification, it is necessary to deal with various poses and clothing changes to identify the same person from multiple camera images. However, existing deep learning methods, which are strongly affected by appearance of person images, have problems in extracting person features invariant to poses and clothing. To solve this problem, we propose models that jointly learn networks for reconstructing shape and texture features used in 3DCG person synthesis in addition to a person feature extraction network for person re-identification. The proposed model has an OSNet as the backbone and consists of a shape feature reconstruction module, a texture reconstruction module, and a person re-identification module. We evaluate the robustness of the proposed model to changes by using the LTCC dataset, where changes in clothing are taken into account, and the reconstructed Market-1501 dataset to take into account changes in poses.

    Download PDF (907K)
  • Katsuhiro ARAYA, Yasuyuki NAKAMURA, Shigemitsu YAMAOKA, Kazuhiko NISHI
    Session ID: 2Yin5-14
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In today's Japan, dangerous tailgating has become a social problem. In this study, we show that a system can be used to detect dangerous driving and warn the driver. We also demonstrate the feasibility of the system in an experiment using actual data.

    Download PDF (726K)
  • Junichi OKUBO, Kouhei OZASA, Masahiro OKANO, Hiroaki SUGAWARA
    Session ID: 2Yin5-15
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In Smart City planning, vehicle counting is a very important task. For this task, we propose using a Re-Identification model. Using a CNN with cosine similarity, we evaluate Re-Identification as a replacement for centroid object tracking. We reached 89% accuracy, which is slightly behind centroid object tracking, but the approach is simpler and has ample room for improvement.
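The matching step can be sketched as follows, assuming per-vehicle CNN embedding vectors and an arbitrary similarity threshold; both are illustrative choices, not the paper's settings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_or_count(embedding, gallery, threshold=0.9):
    """Match a detection to a known vehicle by best cosine similarity;
    return its gallery index, or None if it should be counted as new."""
    best_i, best_s = None, -1.0
    for i, g in enumerate(gallery):
        s = cosine(embedding, g)
        if s > best_s:
            best_i, best_s = i, s
    return best_i if best_s >= threshold else None

gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]       # known vehicle embeddings (toy)
same = match_or_count([0.98, 0.05, 0.0], gallery)  # close to vehicle 0
new = match_or_count([0.0, 0.0, 1.0], gallery)     # unseen vehicle
```

Each detection either increments the count of an existing vehicle or adds a new gallery entry, replacing positional tracking with appearance matching.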

    Download PDF (551K)
  • Masahiro OKANO, Junichi OKUBO, Kouhei OZASA, Hiroaki SUGAWARA, Junichi ...
    Session ID: 2Yin5-16
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Traffic surveys are carried out manually. For automation, a traffic counting application that classifies vehicle types using object detection technology has been developed. In previous studies, SSD300 with VGG16 as the base network was adopted; however, in actual operation, the required data processing speed and accuracy differ depending on the survey conditions, and it is therefore desirable to choose the model according to the situation. This study compared the performance (inference time and mAP) of SSD models with three base networks (VGG, MobileNetV1, and MobileNetV2) trained on the same dataset. MobileNetV1 and V2 outperformed VGG in inference time while achieving similar mAP, and V1 and V2 had similar inference times to each other.

    Download PDF (376K)
  • Takuya SAWADA
    Session ID: 2Yin5-17
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, cameras have been introduced as an important kind of sensor in many automobiles in the field of automated driving. In particular, the recognition accuracy of the camera is an important research issue that affects the safety of both drivers and pedestrians, and the degradation of recognition performance under hazardous conditions is a critical issue. The proposed method is based on panorama images, which contain information from 360° of directions in a single image. In this paper, we propose a rainfall and raindrop removal model for panorama images in order to improve their recognition accuracy in rainy conditions. A traditional model dedicated to rainfall conditions is trained on panorama images, because the distortions specific to panorama images otherwise degrade recognition accuracy. Moreover, the proposed model is also verified on panorama images including water droplets. Finally, the proposed method is evaluated on panorama images including rainfall and raindrops with respect to PSNR and SSIM.

    Download PDF (570K)
  • Shigeru SUGAWARA
    Session ID: 2Yin5-18
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Machine learning was applied to automate the data analysis of mid-infrared hyperspectral imaging. We verified whether the distribution of five types of black marking pen on recycled paper could be visualized accurately. The measurement results of blackened circles were used as training data, and the measurement results of the cross letters and the all-pens sample were used as verification data. First, difference spectra were calculated for the spectra and their second derivatives. Next, principal component scores were obtained by principal component analysis. Classification accuracy was 94% using the second discriminant analysis. The difference in inks between the crossed lines could be visualized accurately. For the all-pens sample, although there were some misidentifications, the ink distribution could be identified with some accuracy.
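The preprocessing described above can be sketched as follows, with synthetic spectra standing in for the hyperspectral measurements; the discriminant analysis step is omitted, and the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
spectra = rng.random((20, 50))                # 20 pixels x 50 wavenumbers (synthetic)

# Second derivative along the spectral axis (finite differences).
second_deriv = np.diff(spectra, n=2, axis=1)  # shape (20, 48)

# Principal component scores: center, then project onto the
# right singular vectors (equivalent to PCA via SVD).
centered = second_deriv - second_deriv.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:3].T                  # first 3 PC scores per pixel
```

The per-pixel PC scores would then feed the discriminant analysis that assigns each pixel to one of the five pen inks.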

    Download PDF (632K)
  • Shinya OKONOGI, Shinnosuke TOMIYAMA, Masashi YAMAMOTO, Jun KITAGAWA, F ...
    Session ID: 2Yin5-19
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In previous research, we introduced Phase-Only Correlation (POC) into the image-processing-based alignment method for lithography. However, the accuracy of the alignment results varied from case to case. In this research, we carry out an accuracy evaluation in which multiple AI models are applied and compared. We achieved 96% accuracy and 99% recall with the best-performing model.

    Download PDF (508K)
  • Shunnsuke ASAHI, Hiromi NARIMATSU, Junji YAMATO, Hirotoshi TAIRA
    Session ID: 2Yin5-20
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    We have developed a system that recognizes objects in illustration images with high accuracy. Illustration images generally include types such as color images, grayscale images, and line art. These types have different characteristics from each other, which hinders high-precision image recognition. In this study, we propose a new method for recognizing objects in illustration images: after classifying the illustration images into the three types, the method recognizes objects using a recognizer dedicated to each type. In the experiment, the method achieved object recognition in illustration images with higher accuracy than existing methods.

    Download PDF (667K)
  • Yuma TAMURA
    Session ID: 2Yin5-21
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, unmanned underwater vehicles (UUVs) have attracted much attention for ocean exploration. For a UUV, object recognition plays an important role in automatic cruising. However, underwater object recognition faces particular difficulties due to non-linear image deterioration. Furthermore, in the case of a neural network (NN) based object recognition system, real-time processing on an embedded GPU is required. In this paper, we evaluate the real-time performance of an existing technique on an embedded GPU module (Jetson Xavier NX), and a new correction method is proposed based on the evaluation results. In addition, existing algorithms are implemented on the Jetson Xavier NX to compare their structural similarity and real-time performance. Finally, a new neural network is proposed to enable real-time recognition based on the evaluation results. The proposed method improved peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) in regions with less red light compared to previous methods. Furthermore, this method can restore the perceptual and statistical qualities of distorted images in real time.
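PSNR, one of the reported metrics, can be computed as in this sketch; SSIM additionally compares local luminance, contrast, and structure and is not shown. The tiny arrays and the [0, 255] pixel range are illustrative assumptions.

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given as nested lists of pixel values."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return float("inf")    # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

clean = [[100, 110], [120, 130]]
noisy = [[101, 109], [122, 129]]
score = psnr(clean, noisy)
```

Higher PSNR means the restored image is closer to the reference, which is how the correction methods are ranked.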

    Download PDF (489K)
  • Sayako WATANABE, Lis Kanashiro PEREIRA, Ichiro KOBAYASHI
    Session ID: 2Yin5-22
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Although recent text-to-image models have achieved great success in generating images from the description of an object, such as "a bird with brown and black striped wings and a yellow beak," these models may still struggle to generate images based on an understanding of the attributes of the object. We propose a text-to-image model that better reflects the meaning of words that express an object's attributes (i.e., adjectives). More specifically, we consider the case where the vector representations of shoe images are changed with four adjectives, i.e., sporty, comfortable, pointy, and open, and we generate images that better reflect the meaning of these adjectives.

    Download PDF (1512K)
  • Mie HAYASHI, Naoki MORI
    Session ID: 3D1-OS-12a-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, the application of artificial intelligence technology to the fashion field has attracted attention. This research proposes a method of optimizing a fashion outfit schedule by acquiring the performance scores of outfits from a deep learning model that learns outfits composed of images of multiple clothes and accessories. In the proposed method, outfits combining the available clothes are first input into a pre-trained Bi-LSTM + VSE model to obtain performance scores. Based on the scores, a list of multiple outfits, that is, a mix-and-match clothing plan, is created using the Thermodynamical Genetic Algorithm (TDGA). We impose the restrictions that the same outfit should not be used during the period, the same item should not be used within 3 days, and there should be no items that are never used during the period. These restrictions make it possible to create a mix-and-match clothing plan while considering diversity. To confirm the effectiveness of the proposed method as a recommendation system, numerical experiments were carried out using real fashion item data.
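The three restrictions can be checked as in this sketch; the outfit and item names and the simple violation-counting scheme are illustrative assumptions, not the TDGA's actual penalty terms.

```python
def violations(schedule, wardrobe, min_gap=3):
    """Count violations of the three restrictions for a day-ordered
    schedule of outfits (tuples of item names)."""
    count = 0
    # 1) the same outfit must not appear twice in the period
    count += len(schedule) - len(set(schedule))
    # 2) the same item must not be reused within min_gap days
    last_worn = {}
    for day, outfit in enumerate(schedule):
        for item in outfit:
            if item in last_worn and day - last_worn[item] < min_gap:
                count += 1
            last_worn[item] = day
    # 3) no wardrobe item may go entirely unused
    used = {item for outfit in schedule for item in outfit}
    count += len(set(wardrobe) - used)
    return count

wardrobe = ["shirt_a", "shirt_b", "shirt_c", "pants_a", "pants_b", "pants_c"]
good_plan = [("shirt_a", "pants_a"), ("shirt_b", "pants_b"), ("shirt_c", "pants_c")]
bad_plan = [("shirt_a", "pants_a"), ("shirt_a", "pants_a")]
```

A genetic algorithm can then penalize candidate plans by their violation count while maximizing the model's outfit scores.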

    Download PDF (1259K)
  • Arisa OBA, Shoki OTA, Taichi AMANO, Seiya YAMAMOTO, Takehiro MOTOMITSU ...
    Session ID: 3D1-OS-12a-02
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    There have been attempts to automatically generate creative elements such as game quests and dungeons, but automatic game generation has not focused on immersiveness, which is important. In this study, the goal is not only to automatically generate games using artificial intelligence, but also to automatically generate games that appeal to the user's sensibilities. We developed a system for action role-playing games that integrates automatic scenario generation, BGM selection, facial image generation, and conversion of sentence endings. In the scenario generation, quest scenarios were automatically generated based on the results of analyzing moving scenes in existing works. Next, BGM selection was realized based on the emotional state of each scene. Furthermore, in order to realize natural characters, facial images could be generated according to character attributes, and sentence endings converted automatically. The system was constructed by integrating these elements in an action role-playing game format in which the user can act flexibly.

    Download PDF (601K)
  • Sayaka NADAMOTO, Naoki MORI, Makoto OKADA
    Session ID: 3D1-OS-12a-03
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, recognition technology using artificial intelligence has been attracting attention against the background of the rapid development of machine learning. Especially in the field of image recognition, research results have been actively reported, and the advent of neural networks has made it possible to recognize objects with high accuracy in a wide range of tasks. On the other hand, in astrophotography, it is difficult to draw a line between the target celestial body and other areas, and the state of the stars differs greatly depending on the photographer and the equipment, so identifying the constellations in astrophotography is a very difficult task even with artificial intelligence. In addition, the lack of datasets on astrophotography necessary for learning artificial intelligence is one of the factors that make it difficult to identify constellations in astrophotography. In this study, we propose a method to generate constellation images from star map data and use them to identify constellations in astrophotography. The effectiveness of the proposed method has been confirmed by numerical experiments.

    Download PDF (1398K)
  • Keita TSUKIMAWASHI, Miki UENO
    Session ID: 3D1-OS-12a-04
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Recently, document digitization has reduced opportunities for handwriting. On the other hand, this tendency enriches handwriting in some special situations. In order to put thought into handwriting and express personality, people sometimes debate whether a character style suits the situation, especially considering shape and balance. It is difficult to select a character style when writing unless one is vividly aware of the attractive shapes of characters. Styles are broadly divided into two, "formal" and "casual"; the former means characters that suit a curriculum vitae, the latter characters that suit a private memo. In this study, we built a system that visualizes the probabilities of classifying characters as "formal" or "casual" with Convolutional Neural Networks (CNN), in order to suggest suitable character styles. Furthermore, we introduced a function that automatically generates character images in an ideal style with Generative Adversarial Networks (GAN).

    Download PDF (1206K)
  • Yuuri SAITO, Hajime MURAI
    Session ID: 3D2-OS-12b-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Characters are among the most important elements in creating an engaging story, and there are many works made more interesting by their characters. Therefore, in the fields of narrative theory and natural language processing, many studies have been conducted on the roles of characters and their relationships. However, there have been few quantitative studies focusing on the number of characters. In this study, we analyzed the number of characters and their roles in stories in boys' comics, and accumulated numerical data on the appearance, exit, and role changes of characters at appropriate points in a full-length story. In addition, we conducted factor analysis to extract combination patterns of character roles. As a result, it became clear that there is a common feature in the number of characters in boys' comic stories. In addition, we extracted frequent patterns in the arrangement of characters over medium- to long-term time series.

    Download PDF (370K)
  • Wataru YOSHIDA, Akira TERAUCHI, Naoki MORI, Makoto OKADA
    Session ID: 3D2-OS-12b-02
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Data Augmentation (DA) is a technique to generate additional data from existing data and effectively interpolate the data space. However, choosing the right DA for a given dataset and task requires a high level of expertise, a large amount of time, and in many cases a deep understanding of the data domain. In order to solve the above problems, with the recent development of automatic machine learning, attention has been paid to automatic DA application methods for image recognition tasks, and it has been reported that AutoAugment and Fast AutoAugment provide the best results in various image classification tasks. However, the conventional automatic application methods of DA have only dealt with benchmark problems and have not yet been applied to real-world data. Therefore, in this study, we investigate effective augmentation methods for cartoon data by using TDGA AutoAugment (TDGA AA), which is a method proposed by the authors to search for augmentation strategies that fit the problem while maintaining diversity. The effectiveness of the proposed TDGA AA is confirmed by computer simulation taking several cartoon datasets as examples.

    Download PDF (622K)
  • Ryoma OKAMOTO, Akira TERAUCHI, Naoki MORI, Makoto OKADA
    Session ID: 3D2-OS-12b-03
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, semantic segmentation by DeepLabv3+ has attracted much attention. One of the features of this model is atrous convolution, which adjusts the convolutional range of the filter (its field-of-view) by dilating its convolutional positions (the viewpoints of the filter). However, previous atrous convolution only adjusts the field-of-view of the filter and does not adjust the placement of its viewpoints. In this paper, we propose free configurational atrous convolution, which extends the viewpoint placement of previous atrous convolution, and semantic segmentation based on this method. In the proposed method, we first partition the viewpoints of the 3 × 3 convolutional filter into two groups. Next, we apply atrous convolution with different rates to the two groups. Then, by adding the outputs, we realize a convolution that adjusts both the field-of-view and the viewpoints of the filter. The effectiveness of the proposed method is confirmed by computer simulations using a benchmark dataset for semantic segmentation.
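The group-wise idea can be sketched in plain NumPy for a single channel; the cross-versus-corners partition of the 3 × 3 viewpoints and the two rates are assumptions for illustration, not necessarily the partition used in the paper.

```python
import numpy as np

def atrous_conv(image, kernel, rate):
    """Single-channel 3x3 atrous convolution (cross-correlation form,
    zero padding) with viewpoint offsets dilated by `rate`."""
    h, w = image.shape
    pad = rate
    padded = np.pad(image, pad)
    out = np.zeros_like(image, dtype=float)
    for ki in range(3):
        for kj in range(3):
            di, dj = (ki - 1) * rate, (kj - 1) * rate
            out += kernel[ki, kj] * padded[pad + di: pad + di + h,
                                           pad + dj: pad + dj + w]
    return out

def free_config_atrous(image, kernel, rate_a=1, rate_b=2):
    """Split the 3x3 viewpoints into two groups (cross vs. corners,
    an assumed partition), convolve each at its own rate, and sum."""
    cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
    corners = 1.0 - cross
    return (atrous_conv(image, kernel * cross, rate_a)
            + atrous_conv(image, kernel * corners, rate_b))

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3)) / 9.0
out = free_config_atrous(img, k)
```

With equal rates the two masked convolutions sum back to an ordinary atrous convolution, so the extension strictly generalizes the original operator.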

    Download PDF (412K)
  • Kiichi HIRAI, Miki UENO
    Session ID: 3D2-OS-12b-04
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Photographic paper sets the impression of a photograph and strongly relates to the expression of creative work. Nowadays, many types of photo paper are available, such as glossy paper and matte paper. These options provide creators with expressive capacity; on the other hand, they make it difficult to select the optimal paper to harmonize with a certain photo. For this reason, novice photographers often give up on printing, and for experts it costs a lot of time to select a suitable paper on their own when applying to photographic contests. To solve this problem, we aim to improve printing technique and provide creative support for both kinds of photographers. In this study, we built a system that estimates the optimal paper for a photo using machine learning techniques. The dataset was prepared from a photographic contest, and the model was constructed with fine-tuning. Furthermore, we developed a function to visualize the part of a photo that contributes to the estimation.

    Download PDF (1059K)
  • Toshinori AOKI, Miki UENO
    Session ID: 3D2-OS-12b-05
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Photographers carefully select the main object, sub-theme, location, and time in order to create their works. They tend to rely on their own difficulties and experiences when evaluating their photos. Thus, they sometimes miss an objective perspective, despite its importance for improving photographic technique. Nowadays, it is easy to obtain "like" evaluations from others on social media. However, such evaluations cover only a part of the works on the web, so they do not always lead to suggestions for improving technique. In this research, we constructed an original dataset considering the suitability of target objects, relationships between target objects, composition, and the Japanese sense of the season. In addition, we built a machine learning system that estimates, for photographers, the points others would evaluate.

    Download PDF (849K)
  • Hajime MURAI
    Session ID: 3D4-OS-12c-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Previous research on automatic story generation has focused mainly on simple patterned stories. In this research, complex story patterns were targeted and analyzed as combinations of fundamental story patterns. As a result, 43 fundamental story patterns were extracted. Moreover, the extracted fundamental story patterns were utilized to build new story frameworks based on the common elements they include. This research proposes two types of structure for combining fundamental story patterns: the continuous structure and the nested structure.
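
    The two combination types can be sketched as simple list operations. This is a minimal illustration with hypothetical pattern names; the actual 43 fundamental patterns are not reproduced here:

```python
def combine_continuous(p1, p2):
    """Continuous structure: the second fundamental pattern follows the first."""
    return p1 + p2

def combine_nested(outer, inner, at):
    """Nested structure: the inner pattern is embedded inside the outer one at position `at`."""
    return outer[:at] + inner + outer[at:]

# Hypothetical fundamental patterns, each a sequence of story events
quest = ["departure", "trial", "return"]
romance = ["meeting", "conflict", "union"]
```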

    Download PDF (542K)
  • Shuuhei TOYOSAWA, Hajime MURAI
    Session ID: 3D4-OS-12c-02
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    To automatically generate a story, it is necessary to create the plots that compose the story and to generate sentences based on those plots. The content must also be made meaningful at the stage of automatically generating the plot. One of the interesting aspects of a story is a surprising punch line, and to generate one, a punch-line plot must be incorporated at the plot-creation stage. Shinichi Hoshi's flash fiction is one body of work that includes such surprising punch lines. For Hoshi's works, detailed patterns of punch lines have been clarified for some themes. In addition, it has become clear that the classification of the elements that make up each pattern, and the functions, context dependencies, and sentence expressions of those elements, can be extracted as structured data. In this study, we automatically generated story plots including punch lines, using the punch-line patterns and elements clarified in previous studies. Specifically, we created an algorithm that uses two patterns from the theme of "robbery, thieves, fraud, and prisoners" and generates plots automatically from the data of 168 elements that can be assigned to those patterns. In the future, we plan to convert the data and automatically generate elements that can be assigned to the SF pattern, a genre representative of Shinichi Hoshi.
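
    The generation step can be pictured as filling a pattern's functional slots with compatible elements. The miniature below is purely illustrative: the slot names and example sentences are invented stand-ins for the study's two patterns and 168 elements:

```python
import random

random.seed(0)

# Hypothetical miniature of the structured data: a punch-line pattern is a
# sequence of functional slots, and each slot has candidate elements.
pattern = ["setup", "scheme", "twist"]
elements = {
    "setup": ["A thief plans to rob a mansion.", "A con man targets a bank."],
    "scheme": ["He disguises himself as a guard.", "He forges the owner's keys."],
    "twist": ["The mansion was already empty.", "The guard he imitated was also a thief."],
}

def generate_plot(pattern, elements):
    """Assign one candidate element to each functional slot of the pattern, in order."""
    return [random.choice(elements[slot]) for slot in pattern]

plot = generate_plot(pattern, elements)
```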

    Download PDF (412K)
  • Riku IIKURA, Makoto OKADA, Naoki MORI
    Session ID: 3D4-OS-12c-03
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    We study the problem of automatically generating a well-coherent story by computer. In general, stories such as novels and movies are required to be coherent; that is, the beginning and ending of the story need to be properly connected by multiple related events with emotional ups and downs. Against this background, we set up a new task: generate a story by taking its first and last sentences as input and complementing what lies between them. In this paper, we propose a model that considers information from the last sentence while generating sentences forward from the first sentence. We evaluate the generated stories using a story-coherence evaluation model, newly built for this paper on top of a general-purpose language model, instead of conventional evaluation metrics that compare the generated story with a ground truth. Through experiments, we show that the proposed method can generate well-coherent stories.

    Download PDF (431K)
  • Motoki SAKAI, Ren OHMURA, Shogo OKADA, Masaki SHUZO, Hirohiko SUWA, No ...
    Session ID: 3E1-OS-5a-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In this research, a consortium was formed by 11 universities, including Tokyo Denki University, to collect a multimodal group discussion (GD) dataset under a unified experimental protocol. So far, multimodal data, including video and audio, have been collected from two types of GD: brainstorming (BS) and GD aimed at consensus building. During GD, ECG, EMG, acceleration and angular velocity of the user's body and head, line of sight, and pulse signals were measured as multimodal data. In addition, emotional annotations or classification tags were labeled for each utterance to analyze communication skills. The collected dataset can be used for various purposes.

    Download PDF (762K)
  • Mika NAKANO
    Session ID: 3E1-OS-5a-02
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Recent attention has been given to argumentation due to its value for individual development as well as for adapting to changes in society (VUCA). Although studies of argumentative discourse are useful and of great value, the number of such studies is limited. Nakano (2018) revealed, through observation, two factors that facilitate argumentative discourse: task structure (structured/non-structured) and relationship (competition/cooperation). The present paper analyzed the effects of these factors on participants. An experiment was conducted with thirty-six students in twelve groups, each group allocated to one of four conditions. A factor analysis of the participants' post-survey found three factors with subjective effects on argumentative discourse. The results of a two-way ANOVA indicated that two factors had statistically significant effects on students' argumentative discourse, and condition B (non-structured/competition) was revealed to have the greatest effect overall.
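
    For readers unfamiliar with the analysis, a balanced two-way ANOVA for a 2 × 2 design (task structure × relationship) can be computed as below. The scores are synthetic stand-ins, not the study's survey data, and the cell size and effect sizes are our own assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy scores for a balanced 2x2 design:
# factor A = task structure (structured / non-structured)
# factor B = relationship (competition / cooperation)
a_levels, b_levels, r = 2, 2, 9          # 9 participants per cell (illustrative)
effect_a = np.array([0.0, 1.0])          # assumed: non-structured scores higher
effect_b = np.array([0.0, 0.3])
y = np.empty((a_levels, b_levels, r))
for i in range(a_levels):
    for j in range(b_levels):
        y[i, j] = 3 + effect_a[i] + effect_b[j] + 0.5 * rng.standard_normal(r)

grand = y.mean()
mean_a = y.mean(axis=(1, 2))             # factor A level means
mean_b = y.mean(axis=(0, 2))             # factor B level means
mean_ab = y.mean(axis=2)                 # cell means

# Sum-of-squares decomposition for the balanced design
ss_a = b_levels * r * ((mean_a - grand) ** 2).sum()
ss_b = a_levels * r * ((mean_b - grand) ** 2).sum()
ss_ab = r * ((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
ss_err = ((y - mean_ab[:, :, None]) ** 2).sum()

df_a, df_b = a_levels - 1, b_levels - 1
df_err = a_levels * b_levels * (r - 1)

# F statistic and p-value for the main effect of factor A
f_a = (ss_a / df_a) / (ss_err / df_err)
p_a = stats.f.sf(f_a, df_a, df_err)
```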

    Download PDF (337K)
  • Masahide YUASA
    Session ID: 3E1-OS-5a-03
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Several studies have reported different ways to improve the discussion skills of students, as many companies have attempted to use both job interviews and group discussions in their recruitment processes. However, these studies have introduced methods for improving the skills of students who already possess adequate discussion skills; such methods may be ineffective in improving the skills of those students who do not have sufficient discussion skills. In addition, the advice in the literature is not easy to understand for students who are not specialists in communication. Therefore, this study attempted to provide suitable advice based on the existing skill level of the student and to establish effective advice methods. The advice in this study was prepared by the students and provided to other students. Based on interviews, we propose a chart to provide simple advice that can be easily understood by students. We have described the results of the interviews and the criteria for giving advice and its effects on students.

    Download PDF (957K)
  • Akihiro TAKAGI, Masaki SHUZO, Motoki SAKAI
    Session ID: 3E1-OS-5a-04
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    If we can recognize emotions in real time from people's biometric data, it will be possible to respond and speak according to their emotions. In this study, we conducted group discussion experiments with 15 university students participating in eight sessions. During these experiments, we obtained the participants' body movement data and emotional annotation data by subjective evaluation. A machine learning method was used to recognize emotions from acceleration and gyro sensor data. In an initial examination using Random Forest, we obtained a discrimination rate of more than 70% for pleasant and unpleasant states.
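
    A minimal sketch of this kind of pipeline is shown below, using scikit-learn's Random Forest on synthetic stand-ins for windowed acceleration/gyro features (the feature construction and data here are our own assumptions, not the study's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Illustrative stand-in for windowed motion features (e.g. per-window mean
# and variance of each acceleration/gyro axis); label 1 = "pleasant".
n = 400
X = rng.standard_normal((n, 12))
y = (X[:, 0] + 0.5 * X[:, 3] + 0.3 * rng.standard_normal(n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)              # held-out discrimination rate
```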

    Download PDF (555K)
  • Ryosuke UENO, Tatuya SAKATO, Yukiko NAKANO
    Session ID: 3E2-OS-5b-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Providing feedback to a speaker is an essential communication signal for maintaining a conversation. In addition to verbal feedback responses, facial expressions are also effective modalities for conveying the listener's response to the speaker's utterances. Moreover, not only the type of facial expression but also its intensity may influence the meaning of a specific feedback. In this study, we propose a multimodal deep neural network model that predicts the intensity of facial expressions co-occurring with feedback responses. We collected 33 video-mediated conversations among groups of three people and obtained language, facial, and audio data for each participant. We also annotated feedback responses and clustered their BERT embeddings to classify them. In the proposed method, a decoder with an attention mechanism over audio, visual, and language modalities produces the intensities of 17 AUs frame by frame, and it is trained jointly with a feedback-label classifier by multi-task learning. In evaluating the prediction of feedback labels, the performance was biased across categories. For AU intensity prediction, the multi-task model had a smaller loss than the single-task model, indicating a better model.

    Download PDF (522K)
  • Atsushi ITO, Tatsuya SAKATO, Yukiko NAKANO
    Session ID: 3E2-OS-5b-02
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    There are many occasions on which people make decisions through group discussions, and persuasiveness is one of the important skills in such discussions. With the goal of estimating the persuasiveness of conversation participants, we first collected a group discussion corpus and annotated the persuasiveness of each participant over one-minute intervals. Then, using this dataset, we created GRU-based neural network models that estimate a participant's persuasiveness from speech, language, and visual information. Comparing unimodal and multimodal models, we found that a multimodal model combining language and audio information performed best, with an accuracy of 0.5625.
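
    The benefit of combining modalities can be illustrated with a much simpler stand-in than GRU models: below, a logistic regression (not the authors' architecture) is trained on synthetic audio features, language features, and their early fusion. The data and features are invented for illustration; on this toy data, the multimodal variant scores highest:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy stand-in: each one-minute interval has audio and language features,
# and the persuasiveness label depends on both modalities together.
n = 300
aud = rng.standard_normal((n, 4))
lang = rng.standard_normal((n, 4))
y = (aud[:, 0] + lang[:, 0] + 0.5 * rng.standard_normal(n) > 0).astype(int)

def acc(X):
    """Mean 5-fold cross-validated accuracy of a linear classifier."""
    return cross_val_score(LogisticRegression(), X, y, cv=5).mean()

acc_aud = acc(aud)                       # audio only
acc_lang = acc(lang)                     # language only
acc_multi = acc(np.hstack([aud, lang]))  # early fusion of the two modalities
```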

    Download PDF (534K)
  • Taiga MORI, Yasuharu DEN
    Session ID: 3E2-OS-5b-03
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In this paper, we focused on changes in group interaction as intimacy increases. The results suggest that in the initial stage, physical environments such as the seat configuration have a large effect on the frequency of exchanges, but the effect of psychological factors such as intimacy becomes larger as personal relationships develop. Moreover, participants intimate with a particular co-participant used designs that exclude other co-participants from the ongoing topic.

    Download PDF (495K)
  • Yuriko TACHIBANA, Yutarou TAKEUCHI, Norihisa SEGAWA
    Session ID: 3E2-OS-5b-04
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    With the spread of COVID-19 throughout the world, it is important to minimize transmission until effective drugs and vaccines are developed against this disease. Concentrated contact therefore needs to be reduced to prevent human-to-human transmission. In this paper, we propose a simplified system that uses a monocular camera to measure physical distancing and warn people of concentrated contact with others. We use an inexpensive camera and equipment to monitor a space and warn people who are in concentrated-contact conditions.
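
    The abstract does not specify how distance is recovered from a single camera; one standard monocular approximation such a system might use is the pinhole relation Z = f·H/h, where f is the focal length in pixels, H a person's assumed real height, and h the height of their detected bounding box in pixels. A sketch under that assumption:

```python
def distance_from_height(focal_px, real_height_m, pixel_height_px):
    """Pinhole-camera approximation of a person's distance from the camera:
    Z = f * H / h. Assumes the person's real height is (roughly) known."""
    return focal_px * real_height_m / pixel_height_px

# With a 1000 px focal length, a 1.7 m person whose bounding box is
# 170 px tall is estimated to stand about 10 m from the camera.
z = distance_from_height(1000, 1.7, 170)
```

    Pairwise physical distances between detected people can then be estimated from their projected ground positions and compared against a distancing threshold.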

    Download PDF (994K)
  • Its availability for evaluation of group interaction
    Kazuaki KONDO, Taichi NAKAMURA, Yuichi NAKAMURA, Shinichi SATOH
    Session ID: 3E2-OS-5b-05
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    Facial expression is an important and widely used clue for estimating the emotion of a target person. However, in daily situations, the facial expression captured in an image is often ambiguous and unreliable for estimating the emotion behind it. In this report, we introduce a new approach to this issue that recognizes changes in facial expression related to smiling. The proposed method compares two input images to estimate the ascent or descent of smiling intensity. It achieved over 90% recognition accuracy for facial expression changes during daily conversations. We also discuss the applicability of recognizing facial expression changes in several applications and the remaining issues to be solved for practical use.

    Download PDF (1546K)
  • Kota NINOMIYA, Hiroki SHINOHARA, Satoshi KODERA, Susumu KATSUSHIKA, Mi ...
    Session ID: 3F1-GS-10i-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    To build semantic segmentation models for interpreting medical images, doctors need to create supervised images. Active learning is adopted to decrease the annotation cost. However, little is known about the effective allocation of supervised and unsupervised data between the first and later rounds of training. We investigated the effect of varying this allocation using 1463 intravascular ultrasound images and discuss effective strategies for active learning. We first built three models using 107, 359, and 723 supervised images, respectively. Three further models were then built using 400, 700, and 1000 supervised images together with image sets predicted by the first models. The results suggest that the same accuracy is reached regardless of the number of images at the beginning. We conclude that it is efficient to start training even from a small number of supervised images and to build subsequent models using annotated images with higher uncertainty together with predicted images.
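
    The uncertainty-driven loop described here can be sketched generically: train on a small labeled set, score the unlabeled pool by predictive uncertainty, and send the most uncertain items for annotation. The sketch below uses a logistic regression on toy features as a stand-in for the segmentation model; the pool sizes and batch size are our own assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for per-image features; the study itself uses a segmentation
# model on intravascular ultrasound images.
X_pool = rng.standard_normal((500, 8))
y_pool = (X_pool[:, 0] - X_pool[:, 1] > 0).astype(int)

labeled = list(range(20))                # small initial supervised set
unlabeled = list(range(20, 500))

for _ in range(3):                       # a few active-learning rounds
    model = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
    proba = model.predict_proba(X_pool[unlabeled])
    # Predictive entropy as the uncertainty score
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    picks = np.argsort(entropy)[-20:]    # annotate the 20 most uncertain items
    labeled += [unlabeled[i] for i in picks]
    unlabeled = [i for j, i in enumerate(unlabeled) if j not in set(picks)]
```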

    Download PDF (457K)
  • Mitsuhiko NAKAMOTO, Susumu KATSUSHIKA, Satoshi KODERA, Hiroki SHINOHAR ...
    Session ID: 3F1-GS-10i-02
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    The development of deep learning algorithms usually requires large labeled training datasets. However, some kinds of medical data, such as echocardiogram videos of cardiac sarcoidosis, are highly difficult to collect. The purpose of this study was to develop a deep learning model to detect cardiac sarcoidosis using a small dataset of 302 echocardiogram videos. We compared several model architectures, including 2D and 3D models, and also examined the effect of pretraining on a large open dataset of echocardiogram videos. We found that 3D models outperform 2D models, and that pretraining improved the performance of the model from an AUC of 0.761 (95% CI 0.610, 0.911) to 0.841 (95% CI 0.716, 0.968).

    Download PDF (488K)
  • Haruto ONIZUKA, Tougorou MATSUI
    Session ID: 3F1-GS-10i-03
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In this paper, we propose a method to impute missing values in blood test data. In recent years, the digitization of hospital charts has progressed, and the amount of electronic data that can be utilized is enormous. However, these data are used only in routine operations, not for secondary purposes. By analyzing large numbers of test results that are difficult for doctors to distinguish and presenting the results to them, oversights are expected to be prevented. Since doctors select blood test items as needed, the values of many test items are unobserved (missing). The multiple imputation method, which imputes on the premise that missingness occurs at random, estimates missing values using a linear model, but missing values in blood test data do not occur at random. Therefore, in this paper, we impute by estimating the missing values with a nonlinear model. In addition, we classify blood test data using multiple machine learning methods and confirm the effects.
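
    One way to realize nonlinear imputation, shown here only as an illustrative sketch (the data are synthetic and the choice of a random forest inside scikit-learn's iterative imputer is our assumption, not necessarily the authors' model), is to replace the linear estimator in iterative imputation with a nonlinear one:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy stand-in for two blood-test items with a nonlinear dependency;
# real chart data is not reproduced here.
n = 300
a = rng.uniform(-2, 2, n)
b = a ** 2 + 0.1 * rng.standard_normal(n)   # item b depends nonlinearly on a
X = np.column_stack([a, b])

X_miss = X.copy()
miss = rng.random(n) < 0.3                  # 30% of item b unobserved
X_miss[miss, 1] = np.nan

# Linear baseline (default BayesianRidge estimator)
X_lin = IterativeImputer(random_state=0).fit_transform(X_miss)
err_lin = np.abs(X_lin[miss, 1] - X[miss, 1]).mean()

# Nonlinear model (random forest) inside the same iterative imputer
imp = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    random_state=0,
)
X_rf = imp.fit_transform(X_miss)
err_rf = np.abs(X_rf[miss, 1] - X[miss, 1]).mean()
```

    On this toy quadratic relation the linear imputer cannot capture the dependency, while the nonlinear estimator recovers the missing values far more accurately.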

    Download PDF (402K)
  • Takumi YOSHIDA, Tohgoroh MATSUI
    Session ID: 3F1-GS-10i-04
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In this paper, we extract important test items from blood test data using a game-theoretic importance measure. In previous work, weights were based on the number of occurrences of the test items selected by the stepwise method, so they were assigned uniformly regardless of the order of occurrence and were not necessarily appropriate as measures of importance. We therefore propose a method of assigning weights to the test items selected by the stepwise method using an importance measure based on game theory. We also apply it to actual blood test data to extract test items that are considered important.
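
    A common game-theoretic importance measure is the Shapley value, where each test item is a "player" and the characteristic function is the performance of a model trained on a subset of items. The sketch below computes exact Shapley values on synthetic data (the data, the R² characteristic function, and the linear model are our own illustrative assumptions, not the paper's setup):

```python
import numpy as np
from itertools import combinations
from math import factorial
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy stand-in for blood-test data: item 0 matters most, item 2 is noise.
n = 200
X = rng.standard_normal((n, 3))
y = 2 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)

def value(subset):
    """Characteristic function: R^2 of a model using only the items in `subset`."""
    if not subset:
        return 0.0
    Xs = X[:, list(subset)]
    return LinearRegression().fit(Xs, y).score(Xs, y)

def shapley(d):
    """Exact Shapley value of each of d features for the game `value`."""
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

phi = shapley(3)   # per-item importance weights
```

    The efficiency property of Shapley values guarantees that the weights sum to the value of the full item set, which makes them directly comparable as shares of the model's explanatory power.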

    Download PDF (381K)
  • Genya NOBUHARA, Hideki FUJII, Hideaki UCHIDA, Shinobu YOSHIMURA
    Session ID: 3F1-GS-10i-05
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In the current home medical care system, the matching of patients and doctors and the scheduling of medical care are done manually, which is inefficient for doctors. For home medical care to become more widespread, scheduling must be more efficient and automated. The goal of this research is to develop an efficient algorithm that helps create such schedules. As a first step, the authors applied deep reinforcement learning to the vehicle routing problem (VRP), the problem of minimizing the travel costs of multiple vehicles that depart from a starting point and satisfy all demand points. The problem was then extended to the scheduling of patient visits by adding conditions specific to home medical care, such as time constraints for treating patients within their desired time frames and matching patients and doctors according to symptoms, gender, and so on.
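
    To make the optimization objective concrete, the sketch below sets up a toy single-vehicle instance and the travel-cost function such methods minimize, with a greedy nearest-neighbor baseline (this is not the paper's deep reinforcement learning solver; the instance is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: a depot at the origin and patient locations on a plane.
depot = np.zeros(2)
patients = rng.uniform(0, 10, (8, 2))

def nearest_neighbor_route(depot, points):
    """Greedy baseline: repeatedly visit the closest unvisited patient."""
    route, pos = [], depot
    remaining = list(range(len(points)))
    while remaining:
        nxt = min(remaining, key=lambda i: np.linalg.norm(points[i] - pos))
        route.append(nxt)
        pos = points[nxt]
        remaining.remove(nxt)
    return route

def route_cost(depot, points, route):
    """Total travel distance of a depot -> patients -> depot tour
    (the quantity a VRP solver minimizes)."""
    path = [depot] + [points[i] for i in route] + [depot]
    return sum(np.linalg.norm(b - a) for a, b in zip(path, path[1:]))

route = nearest_neighbor_route(depot, patients)
cost = route_cost(depot, patients, route)
```

    A learned policy, time-window constraints, and patient-doctor matching would replace the greedy selection step while keeping the same cost objective.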

    Download PDF (445K)
  • Miho KASAMATSU, Takehito UTSURO, Yu SAITO, Yumiko ISHIKAWA
    Session ID: 3F2-GS-10j-01
    Published: 2021
    Released on J-STAGE: June 14, 2021
    CONFERENCE PROCEEDINGS FREE ACCESS

    In developmental psychology research, there is a specific order to infants' development, and infants show reactions in accordance with their developmental stages. Ishikawa and Maekawa (1996) focused on the relationship between infants' developmental stages and the characteristics of picture books, or the interactions induced by them. They showed that there is a specific order to infants' development in picture book reading and daily life. However, their study was based on a questionnaire survey conducted in 1996, and since then there have been many changes in the daily lives of infants and their relationship to picture books. Based on this observation, we conducted another questionnaire survey in 2020 that is almost the same as the 1996 one, and compared the analyzed order of infants' development between 1996 and 2020. It is shown that developments related to interest in picture books tend to be observed earlier in 2020 than in 1996, while developments related to picture book reading accompanied by infants' utterances do not tend to be observed earlier.

    Download PDF (586K)