-
Takumi KIMURA, Takashi MATSUBARA, Kuniaki UEHARA
Session ID: 3I4-GS-7a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
A three-dimensional point cloud is used in a wide range of fields such as robotics and autonomous driving and is becoming popular as a compact representation of an object's surface. Deep generative models for point clouds have typically modeled variation as a map from a ball-like set of latent variables. However, previous approaches have paid little attention to the topological structure of a point cloud; a single continuous map cannot express a varying number of holes and intersections. In this paper, we propose a flow-based deep generative model with multiple latent labels. By maximizing mutual information, a map conditioned by a label is assigned to a continuous subset of a given point cloud, like a chart of a manifold. This enables our model to preserve the topological structure with clear boundaries, whereas previous approaches tend to suffer from blur and fail to generate holes. Experimental results demonstrate that our model achieves state-of-the-art performance in generation and reconstruction among sampling-based point cloud generators.
-
Kazuhiro YAMAWAKI, Xian-Hua HAN
Session ID: 3I4-GS-7a-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
With the recent development of deep learning, single-image super-resolution has achieved substantial gains in accuracy. However, these methods are trained in a fully supervised manner to super-resolve LR images observed under a known degradation model, and thus have difficulty recovering high-resolution images from low-resolution images captured under unknown degradation models. In this study, we propose a deep unsupervised learning network to solve this problem. The proposed architecture consists of a generative network that predicts the high-resolution image and a degradation module that automatically learns the degradation operations applied to the observed low-resolution image. Specifically, we adopt an encoder-decoder structure as the generative network, which has proven capable of modeling high-quality images, while the degradation module is implemented with a special depth-wise convolution layer whose parameters are learnable. The proposed unsupervised SR framework is therefore trained end-to-end together with the degradation module. To verify the effectiveness of the proposed method, we conduct extensive experiments on three publicly available benchmark datasets and demonstrate superior performance even for LR images captured under complex degradation models.
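The abstract describes the degradation module as a depth-wise convolution with learnable parameters. A minimal pure-Python sketch of that idea (no deep-learning framework; the kernel values, sizes, and stride below are illustrative assumptions, not the authors' settings) might look like this:

```python
# Hypothetical sketch of a learnable degradation module: a depth-wise
# (per-channel) convolution followed by sub-sampling, which maps a
# high-resolution image to a low-resolution observation.

def depthwise_degrade(image, kernel, stride=2):
    """Convolve a single-channel image with `kernel` (valid padding),
    then sub-sample rows and columns by `stride`."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    blurred = [
        [sum(kernel[i][j] * image[y + i][x + j]
             for i in range(kh) for j in range(kw))
         for x in range(w - kw + 1)]
        for y in range(h - kh + 1)
    ]
    return [row[::stride] for row in blurred[::stride]]

# A 3x3 box-blur kernel; in the paper this would be a learned parameter.
box = [[1 / 9] * 3 for _ in range(3)]
hr = [[float(x + y) for x in range(6)] for y in range(6)]
lr = depthwise_degrade(hr, box)   # 2x2 low-resolution output
```

In a framework such as PyTorch the same operation would be a grouped convolution with one group per channel; here the point is only that the degradation is a small, differentiable operator whose weights can be learned jointly with the generator.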
-
Kenichiro TSUJI, Sho MITARAI, Nagisa MUNEKATA
Session ID: 3J1-GS-6a-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The damage caused by communications fraud has become a major social problem, and various organizations and institutions are taking countermeasures. However, it is difficult to deal with every case because the keywords used in fraudulent calls change as the methods diversify. In this paper, we attempted to extract the speech characteristics of a suspect from recorded communications-fraud calls. Morphological analysis showed that many words related to time were used, which gave a sense of urgency to the victims. In addition, the speaking speed was slightly faster than in normal conversation. Using these results, we constructed a discriminant model for communications fraud and verified its classification accuracy.
-
Through the Linguistic Factors in Recent Benchmark Tasks
Hiromitsu OTA
Session ID: 3J1-GS-6a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, the development of Transformer-derived models, mainly BERT and RoBERTa, has been remarkable, and they have been put to practical use in all fields of natural language processing, such as machine translation, automatic summarization, and automatic sentence generation. Knowledge representation and reasoning support these applications, and there are active movements aimed at improving the accuracy of information retrieval and question answering by incorporating general knowledge into machines such as robots. In this study, given that the approach centered on BERT is now established, we assume that improvements to the corpus provide the intrinsic value, and we consider, from a linguistic perspective, which factors contribute to improved accuracy and where they remain insufficient. In particular, the area of common sense reasoning is driven by international benchmark tasks, which are frequently criticized because the language models are built on datasets with limited distributions; it is therefore necessary to examine the contents of each task's leaderboard. Resources such as Wikipedia and ConceptNet are expected to improve accuracy through common sense reasoning over written language, and we also propose, from a linguistic standpoint, how to integrate common sense reasoning into interactive spoken dialogue.
-
Comprehensive analysis of situations based on distributed representation
Masaaki OZAKI, Emiko UCHIYAMA, Yoshifumi NISHIDA
Session ID: 3J1-GS-6a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we propose a new method for situational relative-risk analysis that integrates a natural language processing technique with a basic epidemiological indicator, relative risk. Free-description text data on accident situations were divided into two parts, the pre- and post-accident situations, and converted into situation vectors using distributed representations. The relative risk was calculated for each part, and the relationship between the relative risks of the two parts was analyzed. To verify our method, we used the disaster benefit system of the Japan Sport Council (JSC), which constitutes complete data on school accidents in Japan. The proposed method enabled us to analyze how the relative risk changes with the situation and to extract dangerous combinations of pre- and post-accident situations.
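The epidemiological indicator the abstract builds on, relative risk, is the ratio of the incidence of an outcome between an exposed and an unexposed group. A minimal sketch (the counts below are made-up illustrations, not JSC data):

```python
def relative_risk(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total):
    """RR = (risk in the exposed group) / (risk in the unexposed group).
    RR > 1 means the exposure (here, a situation cluster) is associated
    with a higher accident risk."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Toy example: 30 severe injuries among 100 accidents in one situation
# cluster vs. 10 among 100 elsewhere.
rr = relative_risk(30, 100, 10, 100)   # 3.0
```

In the paper's setting, the "groups" would be clusters of situation vectors rather than hand-labeled exposures, but the indicator itself is this ratio.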
-
Akito ARITA, Masayuki KOMAI, Daisuke SATOH, Ryousuke MARUKO, Megumi OH ...
Session ID: 3J1-GS-6a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
It is generally said that tens of thousands or more training examples are required to obtain high accuracy in text processing tasks using deep learning. In practical tasks, however, only small amounts of data are available for training. In this paper, we propose a method to improve the accuracy of a text classification task. The method uses data augmentation that exchanges phrases depending on the same phrase.
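The abstract only sketches the augmentation. Under one plausible reading (swapping dependent phrases between sentences that share the same head phrase), a toy version might look like this; the phrase/head pairing and the example sentences are our assumptions, not the authors' pipeline:

```python
# Hypothetical phrase-exchange augmentation: if two sentences' parses
# attach a dependent phrase to the same head phrase, the dependents are
# interchangeable, yielding new grammatical training sentences.

def exchange_phrases(sent_a, sent_b):
    """Each sentence is (dependent_phrase, head_phrase). If the heads
    match, return two new sentences with the dependents swapped."""
    dep_a, head_a = sent_a
    dep_b, head_b = sent_b
    if head_a != head_b:
        return []   # only phrases sharing a head are exchangeable
    return [(dep_b, head_a), (dep_a, head_b)]

aug = exchange_phrases(("the red car", "stopped suddenly"),
                       ("an old truck", "stopped suddenly"))
```

A real implementation would obtain the dependent/head pairs from a dependency parser rather than hand-made tuples.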
-
Naoki KOSAKA, Tetsunori KOBAYASHI, Yoshihiko HAYASHI
Session ID: 3J1-GS-6a-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The graph-based representation of a text that properly captures its linguistic structures has been a central concern in natural language processing. It has attracted more researchers recently, as a graph is an explicit symbolic representation that can be readily combined with external knowledge resources. We explore the efficacy of graph-based text representations by devising and comparing reading comprehension models. Specifically, we construct the graph-based representation of an input text from the dependency structures of its sentences and enhance it with several methods that add inter-sentence edges. The resulting edge-rich graph is then fed into a graph convolutional network to acquire a vector representation that is essential for solving the target multiple-choice reading comprehension task. The experimental results suggest that the proposed graph-based model is promising and may further improve performance when coupled with a model relying on a large-scale pre-trained language model.
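The core operation of a graph convolutional network over such a dependency graph is neighborhood aggregation followed by a linear map. A minimal single-layer sketch in pure Python (mean aggregation and ReLU; the actual paper's layer details are not specified in the abstract):

```python
# One graph-convolution layer over an adjacency matrix built from
# dependency edges: each node averages its own and its neighbours'
# features, then applies a linear map and ReLU.

def gcn_layer(adj, feats, weight):
    """adj: n x n 0/1 adjacency; feats: n x d; weight: d x out_dim."""
    n, d = len(feats), len(feats[0])
    out_dim = len(weight[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]  # self-loop
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(d)]
        out.append([max(0.0, sum(agg[k] * weight[k][o] for k in range(d)))
                    for o in range(out_dim)])
    return out

# A 3-word sentence whose dependency edges form a chain 0-1-2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, identity)
```

Stacking such layers (and adding the inter-sentence edges the abstract mentions) lets information flow between words of different sentences before pooling into a single text vector.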
-
Takumi TSURUE, Yongwoon CHOI
Session ID: 3J2-GS-6b-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
For a robot to respond to human requests, it is very difficult to accurately understand natural language, in which a single command can be expressed in many different ways. One existing approach performs semantic analysis with an attention-based Seq2Seq (Sequence to Sequence) model to understand commands given by humans. However, since the output of that method is expressed in a complex logical form, it must be converted before the robot can perform the commanded task. Here, we propose a method that lets robots understand commands with output suited to task execution. The output generated by the proposed method consists of the information (words) necessary for a task, in the order ["task", "target"]. Experimental results using the instruction sentences from a RoboCup@Home league show that the instructions can be understood without using logical expressions.
-
Yosuke KISHINAMI, Reina AKAMA, Shiki SATO, Jun SUZUKI, Ryoko TOKUHISA, ...
Session ID: 3J2-GS-6b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In human-human conversation, the current utterance in a dialog is often influenced by previous and future contexts. Among these, looking ahead over future context is one of the most critical factors for active conversation. In this paper, we propose a novel training strategy to help neural response generation models generate responses that take into account information from the future context. Our training strategy considers a sequence consisting of the response and its future context as an output sequence, and the model learns to generate the output sequence from an input sequence, i.e., past utterances. In our experiments, we investigate the effect of the proposed strategy on the look-ahead ability of a dialog system via the "Lookahead Chit Chat Task."
-
Daiki HOMMA, Tatsuya AOKI, Takato HORII, Takayuki NAGAI
Session ID: 3J2-GS-6b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, home service robots that assist humans in daily life have been developed. However, such robots are not yet widely deployed in real environments because of the difficulty of understanding human language commands. For instance, when a robot receives the command "Put in the kitchen sink," it must grasp an appropriate object before moving to the kitchen sink. The robot must determine from language commands when it should perform a task. Furthermore, it must recognize the validity of language commands, because humans sometimes make mistakes. This paper tackles these issues with probabilistic models. Our proposed model learns the relationship between robot observations (e.g., object images, the robot's position) and verbal commands in an unsupervised manner. We evaluate the proposed method on tasks that recognize the tense and validity of verbal commands. The results reveal that our proposed model outperforms other machine learning methods.
-
Hiroshi HONDA, Johane TAKEUCHI, Mikio NAKANO
Session ID: 3J2-GS-6b-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We propose interactive FAQ systems that can identify what users want. In recent years, many services that explain product manuals, such as FAQ systems, have been provided. However, users cannot always express in language the product functions they want to know about. Furthermore, users may even have misunderstandings about the functions. Therefore, we develop an interactive FAQ system that can understand what users want even when they ask ambiguous questions or hold misunderstandings. Using logistic regression, we trained models that predict car functions and misunderstandings about cars from user utterances. The evaluation confirmed high accuracy for both function prediction and misunderstanding prediction despite the relatively small amount of training data. In addition, a subjective evaluation confirmed that the system was easy for users to use.
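A logistic-regression predictor of the kind the abstract describes scores an utterance against per-word weights and squashes the score through a sigmoid. A toy sketch (the vocabulary, weights, and target function are invented for illustration, not the paper's trained model):

```python
import math

def predict_function(utterance, vocab_weights, bias=0.0):
    """Probability that the utterance refers to one target car
    function, via bag-of-words logistic regression."""
    score = bias + sum(vocab_weights.get(w, 0.0)
                       for w in utterance.lower().split())
    return 1.0 / (1.0 + math.exp(-score))

# Toy weights for a hypothetical "power windows" function.
weights = {"windows": 2.0, "open": 1.5, "noise": -0.5}
p = predict_function("how do the windows open", weights)   # close to 1
```

In practice one trained model per function (and per misunderstanding type) can be run over an utterance, and the highest-probability candidates drive the interactive dialogue.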
-
Tingxuan LI, Shuting BAI, Seiji SUZUKI, Takehito UTSURO, Yasuhide KAWA ...
Session ID: 3J2-GS-6b-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In the field of factoid question answering (QA), the state of the art is known to have achieved accuracy comparable to humans. In the area of non-factoid QA, however, only a limited number of datasets are available for training QA models. Within non-factoid QA, we therefore develop a dataset for training Japanese tip-QA models. Although the trained Japanese tip-QA model can be shown to outperform the factoid QA model, this paper further aims to answer tip questions more closely related to daily life. Specifically, we collect community QA examples from a community QA site and apply the trained Japanese tip-QA model to them. Evaluation results again show that the trained tip-QA model outperforms the factoid QA model when tested on those community QA examples.
-
Soichiro KAKU, Kyosuke NISHIDA, Sen YOSHIDA
Session ID: 3J4-GS-6c-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Quantization techniques that approximate float values with a small number of bits have been attracting attention as a way to reduce the model size and inference time of pre-trained language models such as BERT. On the other hand, quantization of activations (the inputs to each layer) is mostly done with 8 bits, and it is empirically known that approximation with fewer than 8 bits makes it difficult to maintain accuracy. In this study, we identify outliers in the intermediate representations of BERT as the problem and propose a ternarization method that can handle outliers in the activations of each layer of pre-trained BERT. Experimental results show that the model with ternarized weights and activations outperformed the previous method in language modeling and downstream tasks.
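Ternarization maps each value to one of three levels, {-1, 0, +1}, times a scale. A minimal sketch of the idea, including the outlier clipping that motivates the paper (the threshold and clipping rules below are generic illustrations, not the authors' method):

```python
def ternarize(values, threshold, clip):
    """Map each value to a code in {-1, 0, +1} plus a shared scale.
    Values beyond `clip` are clipped first, so a few extreme
    activations (outliers) do not inflate the quantization scale."""
    clipped = [max(-clip, min(clip, v)) for v in values]
    large = [abs(v) for v in clipped if abs(v) > threshold]
    scale = sum(large) / max(1, len(large))   # mean of non-zero levels
    codes = [0 if abs(v) <= threshold else (1 if v > 0 else -1)
             for v in clipped]
    return codes, scale

# The 7.0 outlier is clipped to 1.0 before it can distort the scale.
codes, scale = ternarize([0.05, 0.8, -0.9, 7.0],
                         threshold=0.1, clip=1.0)
```

A dequantized value is then `code * scale`, so a ternary tensor needs only 2 bits per element plus one float.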
-
Kohji DOHSAKA, Hiromi NARIMATSU, Kohei KOYAMA, Ryuichiro HIGASHINAKA, ...
Session ID: 3J4-GS-6c-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Due to the explosive increase in academic papers and the need to cite appropriate references when writing papers, research on paper-writing support has been conducted. In this paper, we focus on the citation-worthiness task of detecting which sentences need a citation. First, we developed a detection model based on transfer learning of the large-scale language model BERT using the existing Citation Worthiness dataset, and obtained a significant performance improvement over the conventional method using convolutional neural networks. Next, we developed a detection model for each citation function using the Citation Function dataset. The evaluation results showed that citation-worthiness detection performance varies by citation function. Citation functions such as ``Background,'' which are expressed in varied ways, tended to show lower performance than those such as ``Compare & Contrast,'' which are expressed in limited surface forms. The error analysis indicated the need for a detection model that takes the citation context into account.
-
Tadatomo UDAGAWA, Daisuke KUBO, Takuya MATSUZAKI
Session ID: 3J4-GS-6c-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we investigate how the accuracy of Japanese dependency analysis is improved by using BERT. We compare two neural models based on BERT with models based only on traditional features. In our experiments, both of the BERT-based models outperformed the models based only on traditional features. We analyze the difference, in terms of POS combinations and the distance between bunsetsu pairs, to find the main factor of the improvement.
-
Shigeaki GOTO, Eiji TSUCHIYA, Yoshihiro MIZUNO
Session ID: 3J4-GS-6c-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
X as a Service, represented by Software as a Service, often adopts the DevOps method, which integrates development and operations. The purpose of adopting DevOps is to grow a system into what its users want by repeatedly acquiring user needs during operation and reflecting them during development. In this research, we investigate a natural language processing algorithm that extracts descriptions of user needs from natural language texts such as SNS posts and automatically converts them into a SysML-compliant representation that can easily be reflected in system development, so that DevOps can be executed faster. In this paper, we report the definition of the natural language processing task, its implementation by applying BERT-based named entity recognition, and trial results confirming an F-measure of 69.3%.
-
Hikaru TOMONARI, Masaaki NISHINO, Akihiro YAMAMOTO
Session ID: 3J4-GS-6c-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Even a neural network (NN) model with high prediction accuracy may change its prediction due to small noise (a perturbation) in the input data. The presence of perturbations can cause problems when NN models are used for text classification or machine translation. To reduce such risks, we need to know how robust an NN model is against perturbations. A method has been proposed to check the robustness of NN models with image inputs exactly using a mathematical optimization solver; this is called verification of neural networks. When text is used as input, however, it is difficult to define perturbations because of the discrete nature of characters and words. In this study, we propose a method for verification of neural networks on text by defining perturbations similar to those for images, using word embedding vectors as input. We also conducted an experiment to check the validity of the verification method and found a correlation between the proposed method's results and the robustness of several models.
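One way to picture the perturbation the abstract defines is an L-infinity ball around each word-embedding vector. The sketch below checks whether a toy linear classifier's decision can flip anywhere in that ball; corner enumeration is exact here only because the classifier is linear, whereas the paper verifies real networks with an optimization solver. The vectors and weights are toy assumptions:

```python
from itertools import product

def robust_under_linf(embedding, weights, eps):
    """True if sign(w . x) is constant for all x in the eps-ball
    (in the L-infinity norm) around `embedding`. For a linear scorer
    the extremes occur at the ball's corners, so checking them is
    sufficient."""
    signs = set()
    for corner in product((-eps, eps), repeat=len(embedding)):
        x = [e + c for e, c in zip(embedding, corner)]
        score = sum(w * v for w, v in zip(weights, x))
        signs.add(score > 0)
    return len(signs) == 1

# A small eps leaves the decision unchanged; a large one can flip it.
ok_small = robust_under_linf([0.5, -0.2], [1.0, 1.0], eps=0.05)
ok_large = robust_under_linf([0.5, -0.2], [1.0, 1.0], eps=0.2)
```

For non-linear networks the corner check is no longer sound, which is exactly why solver-based verification is needed.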
-
Tomoya MATSUBARA, Ahmed MOUSTAFA
Session ID: 3N1-IS-2d-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper proposes an approach for detecting yawning under a mask. The ultimate goal is to quantify drowsiness and fatigue with a driver monitor even when the driver wears a mask, by analyzing the wrinkles that form on the mask when the driver yawns. You Only Look Once (YOLO) is used as the detection method, and when the confidence of YOLO's prediction is low, a brute-force feature matcher is used to improve the overall accuracy. To evaluate the proposed approach, a test is performed on actual yawning footage, examining whether accuracy improves when the proposed method is used rather than YOLO alone.
-
Shunsuke TAKAO
Session ID: 3N1-IS-2d-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Although underwater images are important in many fields, image degradation caused by the complex ocean environment, such as color distortion and reduced contrast, is a serious problem. Learning-based approaches such as deep learning are a prominent solution for removing the strong noise in underwater images, but building a large underwater dataset is challenging, unlike for images taken on land. Artificial images are commonly used instead of real images to provide sufficient data in underwater image processing, but previous underwater image models are simplified and lack realism. To enhance underwater images, this research constructs a large underwater dataset based on a correct underwater image model. An analysis of the constructed dataset and the performance of the proposed model is also presented. The PSNR of the proposed dataset is distributed over a wider range, suggesting the realism of the proposed dataset.
-
Yuriko YAMAYA, Shintaro KAWAMURA, Seigo HARASHIMA, Shinya IGUCHI
Session ID: 3N1-IS-2d-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
One of the major tasks of customer engineers in the precision equipment manufacturing industry is repairing customers' machines. When they encounter unknown or difficult procedures, they need to check manuals during the repair. Furthermore, their work is often done in narrow spaces and makes their hands dirty, so there is a strong need for hands-free guides such as audio text guides. Manuals contain not only text but also images that show the positional relationships between a target part and its peripheral parts and the directions in which the target part can be moved; that is, the text of the manuals alone is insufficient for carrying out the work. We propose to generate procedure explanations in text for hands-free guides by acquiring information on the relationship between the target part and peripheral parts from the images and adding it to the information on the target part's operation.
-
Yingfeng FU, Yusuke TANIMURA, Hidemoto NAKADA
Session ID: 3N1-IS-2d-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Distributed word representations have greatly promoted research in NLP. Like language, MIDI music is constructed as a sequence, with a fixed alphabet of notes and events. We propose a way of training MIDI note embeddings with an adaptation of Facebook's fastText model. We then evaluate the model with word similarity, word analogy, and a classification task. The results show that the adapted fastText model generalizes well on MIDI data and is promising for future downstream tasks.
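To feed MIDI into a fastText-style model, each note event must first become a "word", and fastText then builds word vectors from character n-grams of that word. A sketch of such a tokenization (the `n<pitch>_d<duration>` token format is our assumption, not necessarily the paper's encoding):

```python
# Turn a MIDI note into a fastText-style token and its subword units.

def note_token(pitch, duration):
    """Encode one note event as a word, e.g. pitch 60 held for 8
    ticks becomes 'n60_d8'."""
    return "n{}_d{}".format(pitch, duration)

def char_ngrams(word, n=3):
    """Character n-grams with fastText's boundary markers < and >,
    which serve as the subword units the model sums into a vector."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

tok = note_token(60, 8)        # "n60_d8"
grams = char_ngrams(tok)       # shared n-grams link similar notes
```

Because nearby pitches and durations share n-grams (e.g. "n60_d8" and "n60_d4" share "<n6", "n60", "60_"), the subword mechanism can transfer information between related notes, mirroring how fastText handles morphologically related words.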
-
Taichi HOSOI, Hirohisa HIOKI
Session ID: 3N1-IS-2d-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Recent achievements in image processing technologies enable us to automatically extract various information from sports videos and utilize it for purposes such as analyzing games. For sports played with equipment, such as tennis, tracking the equipment's movements matters as much as tracking the players'. For players' movements, there are already methods that can estimate joint positions from videos. For equipment, meanwhile, although object detection methods can locate it in videos, such location information is not always enough for our purpose: we require more detailed information, such as the direction a racket is facing. We hence propose a method to track the tip of a tennis racket in a video in order to analyze its movements. Considering applicability and usability, we aim to make our method work on single video streams taken under various conditions (courts, racket colors, clothes, and weather) and to track a racket tip stably even when it is occluded by a player or appears blurred. For this purpose, we employ a CNN (convolutional neural network) that processes time-sequential images. We performed an experiment and found that our method seems to work better than a method that processes images one by one.
-
David John Lucien FELICES, Mitsuhiko KIMOTO, Shoya MATSUMORI, Michita ...
Session ID: 3N3-IS-2e-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In reinforcement learning, the Deep Deterministic Policy Gradient (DDPG) algorithm is considered a powerful tool for continuous control tasks. However, in complex environments, DDPG does not always show positive results because of its inefficient exploration mechanism. To deal with this, several studies increased the number of actors, but without considering whether there is an actual optimal number of actors for an agent. We propose MAC-DDPG, which consists of a DDPG architecture with a variable number of actor networks. We also compare the computational cost and learning curves for different numbers of actor networks on various OpenAI Gym environments. The main goal of this research is to keep the computational cost as low as possible while improving deep exploration, so that increasing the number of actors is not detrimental to solving less complex environments quickly. Currently, results show a potential increase in scores on some environments (around +10%) compared with classic DDPG, but the time needed to run the same number of epochs increases greatly (linearly with the number of actors).
-
Paulino CRISTOVAO, Hidemoto NAKADA, Yusuke TANIMURA, Hideki ASOH
Session ID: 3N3-IS-2e-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We investigate few-shot learning based on the weight-imprinting technique. The performance of imprinted weights depends deeply on the quality of the representation the encoder creates. However, although the quality of the extracted representation is known to affect the performance of the imprinted model, it is not known what characteristics are required for weight imprinting: the representation that yields the highest classification accuracy on the base classes might not be the best one for downstream imprinting tasks. We are investigating how to obtain a "better" representation in terms of weight imprinting, currently focusing on regularization, model architecture, data augmentation, auxiliary datasets, and auxiliary tasks.
-
Daiko KISHIKAWA, Sachiyo ARAI
Session ID: 3N3-IS-2e-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Recently, inverse reinforcement learning (IRL), which estimates the reward from an expert's trajectories, has been attracting attention for imitating complex behaviors and estimating intentions. This study proposes a novel deep IRL method that combines LogReg-IRL, an IRL method based on a linearly solvable Markov decision process, with ALOCC, an adversarial one-class classification method. The proposed method can quickly learn rewards and state values without running reinforcement learning or requiring trajectories for comparison. Through computer experiments, we show that the proposed method obtains a more expert-like gait than LogReg-IRL on the BipedalWalker task.
-
Masaki ITO, [in Japanese]
Session ID: 4C3-OS-1a-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, opportunities for online chatting have increased, and comments that hurt others may be sent intentionally or unintentionally. In many systems, inappropriate words are registered in advance, and comments containing them are blocked. However, this method does not provide a fundamental solution because it cannot change the sender's awareness. Therefore, in this study, we propose a function that prevents the transmission of slanderous comments by displaying a message to the sender of a potentially slanderous comment, encouraging awareness and confirmation of its content, as well as a function that estimates and visualizes the accumulated damage that slander has caused to the user receiving the comments. These functions help change the user's awareness when sending a comment. Through experiments, we verified whether the proposed functions can prevent the transmission of slanderous comments.
-
Yuto KUDO, [in Japanese]
Session ID: 4C3-OS-1a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, social networking services (SNS) have become very popular. While it is easy to connect with many people, it is also easy to get into trouble. One way to prevent trouble is to check information about a prospective interaction partner in advance. In this study, we propose a Twitter-based system that supports the selection of an interaction partner by displaying personality estimates for each user based on the percentage of emotional words used in their tweets. Users of this system can search for people with whom they want to interact by narrowing down users based on the provided personality information and then checking the actual tweets of the remaining users. Through experiments, we verified whether the system lets users smoothly find Twitter users with whom they want to interact.
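The feature the abstract builds on, the percentage of emotional words per user, can be sketched as a simple lexicon lookup; the tiny lexicon and tweets below are illustrative assumptions, not the system's actual dictionary:

```python
def emotion_ratios(tweets, lexicon):
    """Fraction of a user's tweet words falling in each emotion
    category of `lexicon` (category -> set of words)."""
    counts = {cat: 0 for cat in lexicon}
    total = 0
    for tweet in tweets:
        for w in tweet.lower().split():
            total += 1
            for cat, words in lexicon.items():
                if w in words:
                    counts[cat] += 1
    return {cat: (c / total if total else 0.0)
            for cat, c in counts.items()}

lex = {"joy": {"happy", "great"}, "anger": {"hate", "angry"}}
r = emotion_ratios(["so happy today", "i hate mondays"], lex)
```

A real system would use a full emotion dictionary and tokenizer (for Japanese, a morphological analyzer), then map these ratios to personality estimates.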
-
Kenji KOBAYASHI, Hiroki SHIBATA, Yasufumi TAKAMA
Session ID: 4C3-OS-1a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper proposes an interactive topic modeling method based on GDM (Geometric Dirichlet Means) and verifies its effectiveness by applying it to a news corpus. Topic modeling is a method for probabilistically analyzing the latent topics in a set of documents. As it is unsupervised, it may produce results the analyst does not intend. To solve this problem, this paper introduces the concept of Human-in-the-Loop to obtain topics corresponding to the analyst's intention by incorporating the analyst's knowledge into the learning process. The proposed method employs GDM, which is based on geometric computation and has a high affinity with document clustering. Model-change operations with adjustable parameters are defined, and their effectiveness is shown in a verification experiment.
-
Masayuki ANDO, Wataru SUNAYAMA, Yuji HATANAKA
Session ID: 4C3-OS-1a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In deep learning, the concrete classification patterns from which a classification decision is derived are often incomprehensible. In this paper, we propose a system that extracts classification patterns from deep learning networks and verify its effectiveness. The proposed system extracts classification patterns from trained LSTM networks using an HMM, and then displays the extracted patterns so that users can interpret the networks. In verification experiments, interpretations of the extracted classification patterns were compared with interpretations of classification patterns based on a TF-IDF ranking. The results showed that the proposed system can extract classification patterns that are effective for interpreting the learning networks.
-
Lessons Learned from PBL for the First and Second Year University Students
Munehiko SASAJIMA, Ken ISHIBASHI, Takehiro YAMAMOTO, Naoki KATOH, Hiro ...
Session ID: 4C4-OS-1b-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Although many faculties offer courses centered on statistics and computer science to educate data scientists, it is necessary to foster not only technical skills in programming and statistics but also problem-identifying and problem-solving abilities. Since 2019, the authors' faculty, the Department of Social Information Science at the University of Hyogo, in cooperation with KOHYO (a supermarket chain in the Kinki district) and Macromill, Inc. (a marketing research company), has carried out a half-year PBL (Problem-Based Learning) program for all first-year students (101 in total). The program used real survey data on approximately 30,000 consumers collected by Macromill, and the students made proposals for solving the problems of seven real supermarkets belonging to the KOHYO chain. According to questionnaires conducted after the PBL, most of the students realized what skills they need to become professional data scientists. In 2020 in particular, under the spread of COVID-19, the PBL was carried out with a mix of online and offline teaching. The authors found some changes in the students' attitudes, as well as issues in managing practical PBL at a large scale.
View full abstract
-
Masaya SATO, Wataru SUNAYAMA
Session ID: 4C4-OS-1b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Nowadays, the training of data scientists has become an urgent task, and data science education is becoming compulsory at universities. In data science education, however, the focus is often on individual analytical methods, and there are few opportunities to explain the full sequence of procedures from having data at hand to acquiring knowledge as an analysis result. In this research, we therefore aim to teach this sequence of data analysis procedures: using the text mining tool “TETDM”, we propose a voice navigation system that helps beginners in data analysis perform the full sequence from data input to knowledge acquisition. Experimental results confirmed that the proposed navigation system supports the smooth execution of a series of text analysis procedures.
View full abstract
-
Katsuhiro NAKAI, Xian-Hua HAN
Session ID: 4C4-OS-1b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Progression staging and classification of liver cirrhosis play an important role in determining the appropriate treatment and assessing clinical efficacy. Currently, liver biopsy, which samples real liver tissue, is the gold-standard method for liver cirrhosis staging, but it imposes a heavy burden on the patient. To alleviate this burden, recent research has paid extensive attention to non-invasive methods such as blood tests and medical imaging for liver cirrhosis diagnosis. In this paper, we investigate a non-invasive progression staging method for liver cirrhosis using MRI images and deep learning. This study exploits a novel module (dubbed the AFM module) consisting of an additive angular margin and a Fisher margin, and integrates it into a deep learning network to maximize the separability of cirrhosis stages. Experiments on MRI images provided by Shandong University, covering three progression stages of liver cirrhosis (early, middle, and late), validate that integrating the proposed AFM module yields performance gains of 3% to 7% over the baseline models VGG16, ResNet18, and ResNet50.
View full abstract
-
Yuto SAITO, Ryota MATSUBARA, Bin Mohd Anuardi Muhammad Nur ADILIN, Mid ...
Session ID: 4D2-OS-4a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Various methods using biometric data have been proposed for analyzing the mental states underlying human error. The purpose of this study is to construct a model that predicts errors in real time from mental states measured with EEG, heartbeat, and questionnaire results. We built a per-individual error prediction model using the EEG, heart rate, and questionnaire results obtained from the Stroop task. We found that some indices of the EEG, heartbeat, and questionnaire results were related to errors, and incorporated these indices into the error prediction model. In addition, we tested whether human errors can be prevented by predicting them in real time; when an error was predicted, its occurrence was confirmed in 97% of cases.
View full abstract
-
Yoshiyuki SATO, Yuta HORAGUCHI, Lorraine VANEL, Satoshi SHIOIRI
Session ID: 4D2-OS-4a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We face an ever-increasing amount of image content, including photos we take ourselves and images posted on SNS sites by others. In this situation, it is essential to develop techniques that can recommend images a user prefers without imposing much effort on the user. In this study, we conducted an experiment to obtain image preference data and developed a machine learning model that predicts image preference. In addition to the presented images, we utilized recorded facial images as implicit information, and compared which features better predict image preference. Furthermore, we used two different image domains (lunchboxes and landscapes) to investigate how the image domain influences which facial features are useful for preference prediction. We showed that, in both domains, the performance of preference prediction improved significantly by incorporating facial features. By analyzing the contribution of facial features to model prediction, we also showed that facial features related to positive and negative emotions were important for lunchbox and landscape images, respectively. This suggests that human image preferences in different image domains are well predicted by a machine learning model, though the preference manifests as distinct facial features across image domains.
View full abstract
-
Shiro KUMANO, Akihiro MATSUFUJI, Yan ZHOU
Session ID: 4D2-OS-4a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Conventional automatic emotion estimation has mainly targeted the person's own emotional state or the aggregate impressions of multiple external observers; limited effort has been made to estimate the impressions of a single other person. To this end, we previously proposed a model that assumes conditional independence of the target and the rater, but due to its simplicity, its prediction performance for unknown subjects and unknown raters was limited. In this study, we attempted to improve the prediction performance by using deep learning. Emotion recognition experiments on facial expression images confirmed the effectiveness of the proposed method.
View full abstract
-
Sayyedjavad ZIARATNIA, Peeraya SRIPIAN, Kazuo OHZEKI, Midori SUGAYA
Session ID: 4D3-OS-4b-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Various industries widely use emotion estimation to evaluate consumer satisfaction with their products. Generally, emotion can be estimated from observable expressions, such as facial expressions, or from unobservable signals, such as biological signals. Although widely studied, Facial Expression Recognition (FER) lacks precision for expressions that are very similar to each other, and for situations where the shown expression differs from the subject's real emotion. On the other hand, biological signal indices such as pNN50 can act as a supportive mechanism to improve emotion estimation from observable expressions such as FER. pNN50 is a reliable index for estimating stress and relaxation, and it originates from unconscious responses that cannot be manipulated. In this work, we propose a method for estimating pNN50 from facial video using a deep learning model. Transfer learning and a pre-trained image recognition Convolutional Neural Network (CNN) are employed to estimate pNN50 from a spatiotemporal map created from a series of frames in a facial video. The model, trained on low, middle, and high pNN50 values, shows an accuracy of about 80%. This indicates the potential of our proposed method, which we plan to expand to categorize pNN50 values at a more detailed level.
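The estimation target, pNN50, has a simple standard definition: the percentage of successive normal-to-normal (NN) heartbeat intervals that differ by more than 50 ms. A minimal computation is sketched below; the interval values are made up for illustration.

```python
def pnn50(nn_intervals_ms):
    """pNN50: percentage of successive NN-interval differences exceeding 50 ms."""
    diffs = [abs(b - a) for a, b in zip(nn_intervals_ms, nn_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two intervals")
    return 100.0 * sum(d > 50 for d in diffs) / len(diffs)

# Example NN intervals in milliseconds (hypothetical values)
intervals = [800, 860, 845, 910, 900, 955]
print(round(pnn50(intervals), 1))  # → 60.0
```

Here the successive differences are 60, 15, 65, 10, and 55 ms, three of which exceed 50 ms, giving 3/5 = 60%. Higher pNN50 generally indicates a more relaxed (parasympathetic-dominant) state, which is why it serves as a stress-relaxation index.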
View full abstract
-
Kazuaki OHMORI, Kazuki MIYAZAWA, Tatsuya AOKI, Takato HORII, Takayuki ...
Session ID: 4D3-OS-4b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The brain receives various signals through its own body. These signals are classified into exteroception, interoception, and proprioception, and are structurally integrated. This integrated structure is considered to be the basis of intelligence, including emotions. However, there are few studies on constructing emotional and cognitive models from actual sensory signals, since these signals are difficult to measure continuously. In this study, we capture multimodal sensory signals from a real human body and attempt to integrate and structure this information by applying machine learning methods. We then discuss the possibility of reproducing the concepts in the brain by analyzing the integrated structure. In particular, we report on concept formation based on signals obtained in an eating task, and on how signals obtained in non-eating tasks are perceived.
View full abstract
-
Yume HIRAI, Takato HORII, Takayuki NAGAI
Session ID: 4D3-OS-4b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we propose a computational model that integrates interoception, exteroception, and proprioception in an upper hierarchy, based on the predictive coding mechanism. In the experiment, we simulate a robot arm with these three perceptions, which carries out the task of lifting objects. The proposed model was able to form concepts related to the objects in the upper hierarchy through repeated experience of the task, and to predict interoception and proprioception from the input exteroception. Furthermore, we confirmed that the prediction error of the sensory signal changes according to the degree of concept formation in the upper hierarchy. In addition, by classifying the differential values of the interoceptive prediction errors computed by the proposed model, we can discuss the relationship between interoceptive prediction errors and basic emotions.
View full abstract
-
Seiichi HARATA, Takuto SAKUMA, Shohei KATO
Session ID: 4D3-OS-4b-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
To emulate human emotions in agents, a mathematical representation of emotion (an emotional space) is essential for each component, such as emotion recognition, generation, and expression. This study aims to model human emotion perception by acquiring a modality-independent emotional space that extracts shared emotional information from different modalities. We propose a method of acquiring a hyperspherical emotional space by fusing multiple modalities on a DNN and combining an emotion recognition task with a unification task: the emotion recognition task learns the representation of emotions, and the unification task learns an identical emotional space from each modality. Through experiments with audio-visual data, we confirmed that the proposed method can adequately represent emotions in a low-dimensional hyperspherical emotional space under this paper's experimental conditions. We also confirmed that the proposed method's emotional representation is modality-independent by measuring the robustness of emotion recognition over the available modalities in a modality ablation experiment.
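A hyperspherical embedding space is commonly realized by L2-normalizing the fused embedding onto the unit sphere. The sketch below shows only that projection step, under the assumption that the fusion DNN has already produced a raw embedding vector; the network itself is omitted.

```python
import math

def to_hypersphere(embedding):
    """Project a raw embedding onto the unit hypersphere via L2 normalization."""
    norm = math.sqrt(sum(v * v for v in embedding))
    if norm == 0:
        raise ValueError("zero vector has no direction on the sphere")
    return [v / norm for v in embedding]

e = to_hypersphere([3.0, 4.0])
print(e)  # → [0.6, 0.8]
```

After this projection, only the direction of the embedding carries information, so distances between emotions reduce to angles on the sphere, which is what makes a shared space across modalities comparable.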
View full abstract
-
Akihiro MATSUFUJI, Erina KASANO, Eri SATO-SHIMOKAWARA, Toru YAMAGUCHI
Session ID: 4D3-OS-4b-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
It is desirable for interactive robots and artificial agents to take the user's emotion into account and provide appropriate empathetic output. Advances in machine learning and deep learning have driven major progress in machine emotion understanding, yet these sophisticated techniques still struggle with individual differences in emotion. In this paper, we present an ensemble learning method for emotion recognition that considers such individual differences. Our method divides the training data by person and trains an independent model for each person as a submodel of the ensemble architecture. Furthermore, we implement a dynamic weight decision that selects the appropriate submodel for recognizing a user's emotion from a few samples of that user's emotional behavior. As a result, our architecture performed better than a conventional machine learning model.
View full abstract
-
Kazuhiro SHIDARA, Hiroki TANAKA, Hiroyoshi ADACHI, Daisuke KANAYAMA, Y ...
Session ID: 4D4-OS-4c-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Cognitive behavior therapy with virtual agents has been proposed for promoting mental health; however, quantitative analysis of the dialogue content is lacking. We therefore analyzed users' automatic thoughts using dialogue data based on cognitive restructuring with a virtual agent. According to an evaluation by a psychiatrist, 36.1% of the experimental participants failed to identify their automatic thoughts. We therefore propose a classifier that judges the success or failure of identifying automatic thoughts, as a basic technology for guiding their identification. We performed supervised learning using the automatic-thought sentences collected in the dialogue experiments and the automatic thoughts published in medical books as training data, obtaining an F1-score of 0.833. This classifier has the potential to allow virtual agents to automatically guide the identification of automatic thoughts.
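The reported F1-score is the harmonic mean of precision and recall; a minimal computation follows. The labels below are invented to illustrate one count pattern (5 true positives, 1 false positive, 1 false negative) that happens to yield 0.833; they are not the paper's data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 5 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # → 0.833
```

Because F1 ignores true negatives, it is a reasonable headline metric here: what matters is how well the classifier catches failed identifications, not how many unremarkable successes it also labels correctly.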
View full abstract
-
Yuki SAMEI, Komei HIRUTA, Satoshi SUGA, Yoji KAWANO, Eichi TAKAYA, Yos ...
Session ID: 4D4-OS-4c-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
One reason why current dialogue systems cannot interact like humans is that they lack the ability to reflect human emotions in interaction. In this study, we develop a new dialogue system that interacts using multimodal emotions such as facial expressions, tone of voice, and speech content. When the multimodal emotion values obtained from the user are input into the system, it speaks while displaying pictograms with facial expressions appropriate to each situation. An experiment with a comparison model showed that the proposed model can display more appropriate facial expressions.
View full abstract
-
Tung The NGUYEN, Koichiro YOSHINO, Sakriani SAKTI, Satoshi NAKAMURA
Session ID: 4E1-OS-11a-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Reusing a policy trained on an existing domain in a new domain is an important problem in dialogue management research based on reinforcement learning. This work defines action-relation probabilities between the action spaces of the existing and target domains using mixture density networks for policy reuse. Experimental results showed that the proposed modeling of action-relation probabilities, based on component matching via regression, realizes effective policy reuse.
View full abstract
-
Atsumoto OHASHI, Ryuichiro HIGASHINAKA
Session ID: 4E1-OS-11a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In order to accomplish tasks, it is important for task-oriented dialogue systems to adapt to users and dialogue situations. However, in many systems, each module is developed separately and connected, which makes it difficult for a system to respond flexibly to unexpected users and dialogue situations. In this research, we aim to realize a system that can adapt to users and dialogue situations by making each module share its own information with others and learn how to behave in order to maximize the system performance through reinforcement learning. With dialogue simulations in a tourist domain, we confirmed that the proposed method leads to an improvement in the task completion rate.
View full abstract
-
Hiroaki SUGIYAMA, Hiromi NARIMATSU, Masahiro MIZUKAMI, Tsunehiro ARIMO ...
Session ID: 4E1-OS-11a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, several high-performance conversational systems based on the transformer encoder-decoder model have been proposed, achieving natural response generation by increasing the system scale (model parameters, amount of training data, etc.). While previous studies have analyzed how system size and decoding method affect the subjective evaluation of dialogues, they have not analyzed the differences among fine-tuning corpora. In addition, conventional analysis has focused only on overall naturalness and superiority, and has not sufficiently analyzed the relationship with multifaceted, detailed impressions. We evaluate and analyze the impressions of human dialogues under different fine-tuning corpora, system sizes, and uses of additional information.
View full abstract
-
Yoshiki OHIRA, Takahisa UCHIDA, Takashi MINATO, Hiroshi ISHIGURO
Session ID: 4E1-OS-11a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this research is to develop a dialogue system that models its user's preferences and experiences through daily dialogue. Understanding the user's preferences and experiences is important for increasing the user's dialogue satisfaction, and when acquiring user information, the dialogue must continue according to the user's knowledge. In this paper, we propose a recovery method that identifies the intended concept in the user's utterance by comparing the utterance with the system's concepts when the concept is not identified (an error). The context of the dialogue is defined as a frame representation, and the system updates the context to identify the intended concept based on information obtained from the user's previous utterances. In addition, when the user's utterance is ambiguous, the system estimates the intended concept using common-sense knowledge based on third-party experience data obtained in advance. The goal is to identify the intended concept without decreasing the user's motivation to talk. This kind of error recovery is important not only for robust dialogue during user information acquisition, but also for promoting mutual understanding between users and the system.
View full abstract
-
Toshiki MUROMACHI, Yoshinobu KANO
Session ID: 4E1-OS-11a-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Backchannels could allow spoken dialogue systems to make communication smoother and to elicit more conversation from users. We propose a model that uses acoustic features, linguistic features, and dialogue histories to predict appropriate timings for backchannels. Our experimental results show that the proposed method performs better than a baseline model that uses acoustic and linguistic features only. Furthermore, we conducted a subjective experiment on predicting backchannel timings, whose results showed that the proposed method can predict the timings for giving backchannels with performance similar to that of a human annotator. The proposed method also obtained a higher score than the baseline model in a five-grade evaluation by seven human subjects, confirming its effectiveness.
View full abstract
-
Hirofumi KIKUCHI, JIE YANG, Hideaki KIKUCHI
Session ID: 4E2-OS-11b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Recently, the number of elderly people living alone is increasing in Japan, and in these households the frequency of conversation is decreasing. There are concerns that less frequent conversation will lead to a decline in health. Spoken dialogue systems are expected to meet this demand for conversation; however, they have the problem of decreasing users' desire to continue the dialogue. In this research, we aim to solve this problem of dialogue breakdown. We previously confirmed, using a single speaker's utterances, that there exists an acceptable range of system responses to user utterances. In this paper, we recorded user utterances from nine speakers and conducted a listening evaluation experiment to confirm the existence of acceptance for various types of user utterances. As a result, we clarified the tendency of the relationship between user utterances and system responses that is related to users' acceptance judgments.
View full abstract
-
Kazuya MERA, Mayuna ISHIDA, Shunsuke HABARA, Yoshiaki KUROSAWA, Toshiy ...
Session ID: 4E2-OS-11b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
To handle a user's emotion in a rule-based dialogue system, a vast number of rules must be prepared, because the number of rules is multiplied by the number of emotion types. A statistical dialogue system, on the other hand, is robust to various inputs but can deal only with text information. This paper proposes a statistical dialogue system that can deal with the user's emotions by converting them into emojis. The user's emotion is estimated from the tone of the user's voice and appended to the tail of the input sentence as an emoji; the emoji in the output sentence is then treated as the agent's emotion when the output voice is synthesized. Question-answer pairs including emojis are collected from tweet-reply pairs on Twitter, and these pairs are also used for fine-tuning. Experimental results revealed that replies generated by the proposed method were better suited to the user's emotions than those of a no-emoji method.
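The emoji-conversion step described above, appending an emoji for the voice-estimated emotion to the tail of the input sentence, can be sketched as follows; the emotion-to-emoji mapping and function names are illustrative assumptions, not the paper's actual table.

```python
# Hypothetical mapping from an estimated voice emotion to an emoji tag
EMOTION_EMOJI = {"joy": "😊", "anger": "😠", "sadness": "😢", "neutral": ""}

def tag_with_emotion(utterance, estimated_emotion):
    """Append the emoji for the estimated voice emotion to the input text,
    so a text-only statistical dialogue model can condition on emotion."""
    emoji = EMOTION_EMOJI.get(estimated_emotion, "")
    return utterance + emoji

print(tag_with_emotion("I passed the exam", "joy"))  # → I passed the exam😊
```

Because the emotion is folded into the text itself, the same encoder-decoder response model can be fine-tuned on emoji-bearing tweet-reply pairs without any architectural change; the emoji in the generated reply is then read back off as the agent's emotion for speech synthesis.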
View full abstract
-
Sanae YAMASHITA, Noriyuki OKUMURA
Session ID: 4E2-OS-11b-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Dialogue systems have a problem of lacking and inconsistent characteristics or personality. This paper describes a method for replacing text at the subword level using BERT's masked token prediction with transfer learning. As a result, we found that the SentencePiece method without morphological analysis replaces tokens more fluently than Byte Pair Encoding after morphological analysis. Moreover, for the conscientiousness and neuroticism factors of the Big Five, SentencePiece shows that transfer learning on individual tweets of research participants can reflect the personality of the writer.
View full abstract