-
Takumi KIMURA, Takashi MATSUBARA, Kuniaki UEHARA
Session ID: 3I4-GS-7a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
A three-dimensional point cloud is used in a wide range of fields such as robotics and autonomous driving and is becoming popular as a compact representation of an object's surface. Deep generative models for point clouds have typically modeled variation as a map from a ball-like set of latent variables. However, previous approaches have paid little attention to the topological structure of a point cloud; a single continuous map cannot express a varying number of holes and intersections. In this paper, we propose a flow-based deep generative model with multiple latent labels. By maximizing mutual information, a map conditioned by a label is assigned to a continuous subset of a given point cloud, like a chart of a manifold. This enables our model to preserve the topological structure with clear boundaries, whereas previous approaches tend to suffer from blur and fail to generate holes. Experimental results demonstrate that our model achieves state-of-the-art performance in generation and reconstruction among sampling-based point cloud generators.
-
Kazuhiro YAMAWAKI, Xian-Hua HAN
Session ID: 3I4-GS-7a-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
With the recent development of deep learning, single-image super-resolution has achieved substantial gains in accuracy. However, these methods are trained in a fully supervised manner to super-resolve LR images observed under a known degradation model, and thus have difficulty recovering high-resolution images from low-resolution images captured under unknown degradation models. In this study, we propose a deep unsupervised learning network to solve this problem. The proposed architecture consists of a generative network that predicts the high-resolution image and a degradation module that automatically learns the degradation operations applied to the observed low-resolution image. Specifically, we adopt an encoder-decoder structure as the generative network, which has proven capable of modeling high-quality images, while the degradation module is implemented with a special depth-wise convolution layer whose parameters are learnable. The proposed unsupervised SR framework is therefore trained end-to-end together with the degradation module. To verify the effectiveness of the proposed method, we conduct extensive experiments on three publicly available benchmark datasets and demonstrate superior performance even for LR images captured under complex degradation models.
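The abstract describes the degradation module as a depth-wise convolution with learnable parameters. A minimal pure-Python sketch of that idea (no deep-learning framework; the kernel values, sizes, and stride below are illustrative assumptions, not the authors' settings) might look like this:

```python
# Hypothetical sketch of a learnable degradation module: a depth-wise
# (per-channel) convolution followed by sub-sampling, which maps a
# high-resolution image to a low-resolution observation.

def depthwise_degrade(image, kernel, stride=2):
    """Convolve a single-channel image with `kernel` (valid padding),
    then sub-sample rows and columns by `stride`."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    blurred = [
        [sum(kernel[i][j] * image[y + i][x + j]
             for i in range(kh) for j in range(kw))
         for x in range(w - kw + 1)]
        for y in range(h - kh + 1)
    ]
    return [row[::stride] for row in blurred[::stride]]

# A 3x3 box-blur kernel; in the paper this would be a learned parameter.
box = [[1 / 9] * 3 for _ in range(3)]
hr = [[float(x + y) for x in range(6)] for y in range(6)]
lr = depthwise_degrade(hr, box)   # 2x2 low-resolution output
```

In a framework such as PyTorch the same operation would be a grouped convolution with one group per channel; here the point is only that the degradation is a small, differentiable operator whose weights can be learned jointly with the generator.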
-
Kenichiro TSUJI, Sho MITARAI, Nagisa MUNEKATA
Session ID: 3J1-GS-6a-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The damage caused by communications fraud has become a major social problem, and various organizations and institutions are taking countermeasures. However, it is difficult to deal with every case because the keywords used in fraudulent calls change as the methods diversify. In this paper, we attempted to extract the speech characteristics of a suspect from recorded communications-fraud calls. Morphological analysis showed that many words related to time were used, which gave a sense of urgency to the victims. In addition, the speaking speed was slightly faster than in normal conversation. Using these results, we constructed a discriminant model for communications fraud and verified its classification accuracy.
-
Through the Linguistic Factors in Recent Benchmark Tasks
Hiromitsu OTA
Session ID: 3J1-GS-6a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, the development of Transformer-derived models, mainly BERT and RoBERTa, has been remarkable, and they have been put to practical use in all fields of natural language processing, such as machine translation, automatic summarization, and automatic sentence generation. Knowledge representation and reasoning support these applications, and there are active movements aimed at improving the accuracy of information retrieval and question answering by incorporating general knowledge into machines such as robots. In this study, given that the approach centered on BERT is now established, we assume that improvements to the corpus provide the intrinsic value, and we consider, from a linguistic perspective, which factors contribute to improved accuracy and where they remain insufficient. In particular, the area of common sense reasoning is driven by international benchmark tasks, which are frequently criticized because the language models are built on datasets with limited distributions; it is therefore necessary to examine the contents of each task's leaderboard. Resources such as Wikipedia and ConceptNet are expected to improve accuracy through common sense reasoning over written language, and we also propose, from a linguistic standpoint, how to integrate common sense reasoning into interactive spoken dialogue.
-
Comprehensive analysis of situations based on distributed representation
Masaaki OZAKI, Emiko UCHIYAMA, Yoshifumi NISHIDA
Session ID: 3J1-GS-6a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we propose a new method for situational relative-risk analysis that integrates a natural language processing technique with a basic epidemiological indicator, relative risk. Free-description text data on accident situations were divided into two parts, the pre- and post-accident situations, and converted into situation vectors using distributed representations. The relative risk was calculated for each part, and the relationship between the relative risks of the two parts was analyzed. To verify our method, we used the disaster benefit system of the Japan Sport Council (JSC), which constitutes complete data on school accidents in Japan. The proposed method enabled us to analyze how the relative risk changes with the situation and to extract dangerous combinations of pre- and post-accident situations.
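The epidemiological indicator the abstract builds on, relative risk, is the ratio of the incidence of an outcome between an exposed and an unexposed group. A minimal sketch (the counts below are made-up illustrations, not JSC data):

```python
def relative_risk(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total):
    """RR = (risk in the exposed group) / (risk in the unexposed group).
    RR > 1 means the exposure (here, a situation cluster) is associated
    with a higher accident risk."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Toy example: 30 severe injuries among 100 accidents in one situation
# cluster vs. 10 among 100 elsewhere.
rr = relative_risk(30, 100, 10, 100)   # 3.0
```

In the paper's setting, the "groups" would be clusters of situation vectors rather than hand-labeled exposures, but the indicator itself is this ratio.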
-
Akito ARITA, Masayuki KOMAI, Daisuke SATOH, Ryousuke MARUKO, Megumi OH ...
Session ID: 3J1-GS-6a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
It is generally said that tens of thousands or more training examples are required to obtain high accuracy in text processing tasks using deep learning. In practical tasks, however, only small amounts of data are available for training. In this paper, we propose a method to improve the accuracy of a text classification task. The method uses data augmentation that exchanges phrases depending on the same phrase.
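The abstract only sketches the augmentation. Under one plausible reading (swapping dependent phrases between sentences that share the same head phrase), a toy version might look like this; the phrase/head pairing and the example sentences are our assumptions, not the authors' pipeline:

```python
# Hypothetical phrase-exchange augmentation: if two sentences' parses
# attach a dependent phrase to the same head phrase, the dependents are
# interchangeable, yielding new grammatical training sentences.

def exchange_phrases(sent_a, sent_b):
    """Each sentence is (dependent_phrase, head_phrase). If the heads
    match, return two new sentences with the dependents swapped."""
    dep_a, head_a = sent_a
    dep_b, head_b = sent_b
    if head_a != head_b:
        return []   # only phrases sharing a head are exchangeable
    return [(dep_b, head_a), (dep_a, head_b)]

aug = exchange_phrases(("the red car", "stopped suddenly"),
                       ("an old truck", "stopped suddenly"))
```

A real implementation would obtain the dependent/head pairs from a dependency parser rather than hand-made tuples.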
-
Naoki KOSAKA, Tetsunori KOBAYASHI, Yoshihiko HAYASHI
Session ID: 3J1-GS-6a-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The graph-based representation of a text that properly captures its linguistic structures has been a central concern in natural language processing. It has attracted more researchers recently, as a graph is an explicit symbolic representation that can be readily combined with external knowledge resources. We explore the efficacy of graph-based text representations by devising and comparing reading comprehension models. Specifically, we construct the graph-based representation of an input text from the dependency structures of its sentences and enhance it with several methods that add inter-sentence edges. The resulting edge-rich graph is then fed into a graph convolutional network to acquire a vector representation that is essential for solving the target multiple-choice reading comprehension task. The experimental results suggest that the proposed graph-based model is promising and may further improve performance when coupled with a model relying on a large-scale pre-trained language model.
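The core operation of a graph convolutional network over such a dependency graph is neighborhood aggregation followed by a linear map. A minimal single-layer sketch in pure Python (mean aggregation and ReLU; the actual paper's layer details are not specified in the abstract):

```python
# One graph-convolution layer over an adjacency matrix built from
# dependency edges: each node averages its own and its neighbours'
# features, then applies a linear map and ReLU.

def gcn_layer(adj, feats, weight):
    """adj: n x n 0/1 adjacency; feats: n x d; weight: d x out_dim."""
    n, d = len(feats), len(feats[0])
    out_dim = len(weight[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]  # self-loop
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(d)]
        out.append([max(0.0, sum(agg[k] * weight[k][o] for k in range(d)))
                    for o in range(out_dim)])
    return out

# A 3-word sentence whose dependency edges form a chain 0-1-2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, identity)
```

Stacking such layers (and adding the inter-sentence edges the abstract mentions) lets information flow between words of different sentences before pooling into a single text vector.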
-
Takumi TSURUE, Yongwoon CHOI
Session ID: 3J2-GS-6b-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
For a robot to respond to human requests, it is very difficult to accurately understand natural language, in which a single command can be expressed in many different ways. One existing approach performs semantic analysis with an attention-based Seq2Seq (Sequence to Sequence) model to understand commands given by humans. However, since the output of that method is expressed in a complex logical form, it must be converted before the robot can perform the commanded task. Here, we propose a method that lets robots understand commands with output suited to task execution. The output generated by the proposed method consists of the information (words) necessary for a task, in the order ["task", "target"]. Experimental results using the instruction sentences from a RoboCup@Home league show that the instructions can be understood without using logical expressions.
-
Yosuke KISHINAMI, Reina AKAMA, Shiki SATO, Jun SUZUKI, Ryoko TOKUHISA, ...
Session ID: 3J2-GS-6b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In human-human conversation, the current utterance in a dialog is often influenced by previous and future contexts. Among these, looking ahead over future context is one of the most critical factors for active conversation. In this paper, we propose a novel training strategy to help neural response generation models generate responses that take into account information from the future context. Our training strategy considers a sequence consisting of the response and its future context as an output sequence, and the model learns to generate the output sequence from an input sequence, i.e., past utterances. In our experiments, we investigate the effect of the proposed strategy on the look-ahead ability of a dialog system via the "Lookahead Chit Chat Task."
-
Daiki HOMMA, Tatsuya AOKI, Takato HORII, Takayuki NAGAI
Session ID: 3J2-GS-6b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, home service robots that assist humans in daily life have been developed. However, such robots are not yet widely deployed in real environments because of the difficulty of understanding human language commands. For instance, when a robot receives the command "Put in the kitchen sink," it must grasp an appropriate object before moving to the kitchen sink. The robot must determine from language commands when it should perform a task. Furthermore, it must recognize the validity of language commands, because humans sometimes make mistakes. This paper tackles these issues with probabilistic models. Our proposed model learns the relationship between robot observations (e.g., object images, the robot's position) and verbal commands in an unsupervised manner. We evaluate the proposed method on tasks that recognize the tense and validity of verbal commands. The results reveal that our proposed model outperforms other machine learning methods.
-
Hiroshi HONDA, Johane TAKEUCHI, Mikio NAKANO
Session ID: 3J2-GS-6b-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We propose interactive FAQ systems that can identify what users want. In recent years, many services that explain product manuals, such as FAQ systems, have been provided. However, users cannot always express in language the product functions they want to know about. Furthermore, users may even have misunderstandings about the functions. Therefore, we develop an interactive FAQ system that can understand what users want even when they ask ambiguous questions or hold misunderstandings. Using logistic regression, we trained models that predict car functions and misunderstandings about cars from user utterances. The evaluation confirmed high accuracy for both function prediction and misunderstanding prediction despite the relatively small amount of training data. In addition, a subjective evaluation confirmed that the system was easy for users to use.
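A logistic-regression predictor of the kind the abstract describes scores an utterance against per-word weights and squashes the score through a sigmoid. A toy sketch (the vocabulary, weights, and target function are invented for illustration, not the paper's trained model):

```python
import math

def predict_function(utterance, vocab_weights, bias=0.0):
    """Probability that the utterance refers to one target car
    function, via bag-of-words logistic regression."""
    score = bias + sum(vocab_weights.get(w, 0.0)
                       for w in utterance.lower().split())
    return 1.0 / (1.0 + math.exp(-score))

# Toy weights for a hypothetical "power windows" function.
weights = {"windows": 2.0, "open": 1.5, "noise": -0.5}
p = predict_function("how do the windows open", weights)   # close to 1
```

In practice one trained model per function (and per misunderstanding type) can be run over an utterance, and the highest-probability candidates drive the interactive dialogue.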
-
Tingxuan LI, Shuting BAI, Seiji SUZUKI, Takehito UTSURO, Yasuhide KAWA ...
Session ID: 3J2-GS-6b-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In the field of factoid question answering (QA), the state of the art is known to have achieved accuracy comparable to humans. In the area of non-factoid QA, however, only a limited number of datasets are available for training QA models. Within non-factoid QA, we therefore develop a dataset for training Japanese tip-QA models. Although the trained Japanese tip-QA model can be shown to outperform the factoid QA model, this paper further aims to answer tip questions more closely related to daily life. Specifically, we collect community QA examples from a community QA site and apply the trained Japanese tip-QA model to them. Evaluation results again show that the trained tip-QA model outperforms the factoid QA model when tested on those community QA examples.
-
Soichiro KAKU, Kyosuke NISHIDA, Sen YOSHIDA
Session ID: 3J4-GS-6c-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Quantization techniques that approximate float values with a small number of bits have been attracting attention as a way to reduce the model size and inference time of pre-trained language models such as BERT. On the other hand, quantization of activations (the inputs to each layer) is mostly done with 8 bits, and it is empirically known that approximation with fewer than 8 bits makes it difficult to maintain accuracy. In this study, we identify outliers in the intermediate representations of BERT as the problem and propose a ternarization method that can handle outliers in the activations of each layer of pre-trained BERT. Experimental results show that the model with ternarized weights and activations outperformed the previous method in language modeling and downstream tasks.
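Ternarization maps each value to one of three levels, {-1, 0, +1}, times a scale. A minimal sketch of the idea, including the outlier clipping that motivates the paper (the threshold and clipping rules below are generic illustrations, not the authors' method):

```python
def ternarize(values, threshold, clip):
    """Map each value to a code in {-1, 0, +1} plus a shared scale.
    Values beyond `clip` are clipped first, so a few extreme
    activations (outliers) do not inflate the quantization scale."""
    clipped = [max(-clip, min(clip, v)) for v in values]
    large = [abs(v) for v in clipped if abs(v) > threshold]
    scale = sum(large) / max(1, len(large))   # mean of non-zero levels
    codes = [0 if abs(v) <= threshold else (1 if v > 0 else -1)
             for v in clipped]
    return codes, scale

# The 7.0 outlier is clipped to 1.0 before it can distort the scale.
codes, scale = ternarize([0.05, 0.8, -0.9, 7.0],
                         threshold=0.1, clip=1.0)
```

A dequantized value is then `code * scale`, so a ternary tensor needs only 2 bits per element plus one float.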
-
Kohji DOHSAKA, Hiromi NARIMATSU, Kohei KOYAMA, Ryuichiro HIGASHINAKA, ...
Session ID: 3J4-GS-6c-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Due to the explosive increase in academic papers and the need to cite appropriate references when writing papers, research on paper-writing support has been conducted. In this paper, we focus on the citation-worthiness task of detecting which sentences need a citation. First, we developed a detection model based on transfer learning of the large-scale language model BERT using the existing Citation Worthiness dataset, and obtained a significant performance improvement over the conventional method using convolutional neural networks. Next, we developed a detection model for each citation function using the Citation Function dataset. The evaluation results showed that citation-worthiness detection performance varies by citation function. Citation functions such as ``Background,'' which are expressed in varied ways, tended to show lower performance than those such as ``Compare & Contrast,'' which are expressed in limited surface forms. The error analysis indicated the need for a detection model that takes the citation context into account.
-
Tadatomo UDAGAWA, Daisuke KUBO, Takuya MATSUZAKI
Session ID: 3J4-GS-6c-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we investigate how the accuracy of Japanese dependency analysis is improved by using BERT. We compare two neural models based on BERT with models based only on traditional features. In our experiments, both of the BERT-based models outperformed the models based only on traditional features. We analyze the difference, in terms of POS combinations and the distance between bunsetsu pairs, to find the main factor of the improvement.
-
Shigeaki GOTO, Eiji TSUCHIYA, Yoshihiro MIZUNO
Session ID: 3J4-GS-6c-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
X as a Service, represented by Software as a Service, often adopts the DevOps method, which integrates development and operations. The purpose of adopting DevOps is to grow a system into what its users want by repeatedly acquiring user needs during operation and reflecting them during development. In this research, we investigate a natural language processing algorithm that extracts descriptions of user needs from natural language texts such as SNS posts and automatically converts them into a SysML-compliant representation that can easily be reflected in system development, so that DevOps can be executed faster. In this paper, we report the definition of the natural language processing task, its implementation by applying BERT-based named entity recognition, and trial results confirming an F-measure of 69.3%.
-
Hikaru TOMONARI, Masaaki NISHINO, Akihiro YAMAMOTO
Session ID: 3J4-GS-6c-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Even a neural network (NN) model with high prediction accuracy may change its prediction due to small noise (a perturbation) in the input data. The presence of perturbations can cause problems when NN models are used for text classification or machine translation. To reduce such risks, we need to know how robust an NN model is against perturbations. A method has been proposed to check the robustness of NN models with image inputs exactly using a mathematical optimization solver; this is called verification of neural networks. When text is used as input, however, it is difficult to define perturbations because of the discrete nature of characters and words. In this study, we propose a method for verification of neural networks on text by defining perturbations similar to those for images, using word embedding vectors as input. We also conducted an experiment to check the validity of the verification method and found a correlation between the proposed method's results and the robustness of several models.
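One way to picture the perturbation the abstract defines is an L-infinity ball around each word-embedding vector. The sketch below checks whether a toy linear classifier's decision can flip anywhere in that ball; corner enumeration is exact here only because the classifier is linear, whereas the paper verifies real networks with an optimization solver. The vectors and weights are toy assumptions:

```python
from itertools import product

def robust_under_linf(embedding, weights, eps):
    """True if sign(w . x) is constant for all x in the eps-ball
    (in the L-infinity norm) around `embedding`. For a linear scorer
    the extremes occur at the ball's corners, so checking them is
    sufficient."""
    signs = set()
    for corner in product((-eps, eps), repeat=len(embedding)):
        x = [e + c for e, c in zip(embedding, corner)]
        score = sum(w * v for w, v in zip(weights, x))
        signs.add(score > 0)
    return len(signs) == 1

# A small eps leaves the decision unchanged; a large one can flip it.
ok_small = robust_under_linf([0.5, -0.2], [1.0, 1.0], eps=0.05)
ok_large = robust_under_linf([0.5, -0.2], [1.0, 1.0], eps=0.2)
```

For non-linear networks the corner check is no longer sound, which is exactly why solver-based verification is needed.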
-
Tomoya MATSUBARA, Ahmed MOUSTAFA
Session ID: 3N1-IS-2d-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper proposes an approach for detecting yawning under a mask. The ultimate goal is to quantify drowsiness and fatigue with a driver monitor even when the driver wears a mask, by analyzing the wrinkles that form on the mask when the driver yawns. You Only Look Once (YOLO) is used as the detection method, and when the confidence of YOLO's prediction is low, a brute-force feature matcher is used to improve the overall accuracy. To evaluate the proposed approach, a test is performed on actual yawning footage, examining whether accuracy improves when the proposed method is used rather than YOLO alone.
-
Shunsuke TAKAO
Session ID: 3N1-IS-2d-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Although underwater images are important in many fields, image degradation caused by the complex ocean environment, such as color distortion and reduced contrast, is a serious problem. Learning-based approaches such as deep learning are a prominent solution for removing the strong noise in underwater images, but building a large underwater dataset is challenging, unlike for images taken on land. Artificial images are commonly used instead of real images to provide sufficient data in underwater image processing, but previous underwater image models are simplified and lack realism. To enhance underwater images, this research constructs a large underwater dataset based on a correct underwater image model. An analysis of the constructed dataset and the performance of the proposed model is also presented. The PSNR of the proposed dataset is distributed over a wider range, suggesting the realism of the proposed dataset.
-
Yuriko YAMAYA, Shintaro KAWAMURA, Seigo HARASHIMA, Shinya IGUCHI
Session ID: 3N1-IS-2d-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
One of the major tasks of customer engineers in the precision equipment manufacturing industry is repairing customers' machines. When they encounter unknown or difficult procedures, they need to check manuals during the repair. Furthermore, their work is often done in narrow spaces and makes their hands dirty, so there is a strong need for hands-free guides such as audio text guides. Manuals contain not only text but also images that show the positional relationships between a target part and its peripheral parts and the directions in which the target part can be moved; that is, the text of the manuals alone is insufficient for carrying out the work. We propose to generate procedure explanations in text for hands-free guides by acquiring information on the relationship between the target part and peripheral parts from the images and adding it to the information on the target part's operation.
-
Yingfeng FU, Yusuke TANIMURA, Hidemoto NAKADA
Session ID: 3N1-IS-2d-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Distributed word representations have greatly promoted research in NLP. Like language, MIDI music is constructed as a sequence, with a fixed alphabet of notes and events. We propose a way of training MIDI note embeddings with an adaptation of Facebook's fastText model. We then evaluate the model with word similarity, word analogy, and a classification task. The results show that the adapted fastText model generalizes well on MIDI data and is promising for future downstream tasks.
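To feed MIDI into a fastText-style model, each note event must first become a "word", and fastText then builds word vectors from character n-grams of that word. A sketch of such a tokenization (the `n<pitch>_d<duration>` token format is our assumption, not necessarily the paper's encoding):

```python
# Turn a MIDI note into a fastText-style token and its subword units.

def note_token(pitch, duration):
    """Encode one note event as a word, e.g. pitch 60 held for 8
    ticks becomes 'n60_d8'."""
    return "n{}_d{}".format(pitch, duration)

def char_ngrams(word, n=3):
    """Character n-grams with fastText's boundary markers < and >,
    which serve as the subword units the model sums into a vector."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

tok = note_token(60, 8)        # "n60_d8"
grams = char_ngrams(tok)       # shared n-grams link similar notes
```

Because nearby pitches and durations share n-grams (e.g. "n60_d8" and "n60_d4" share "<n6", "n60", "60_"), the subword mechanism can transfer information between related notes, mirroring how fastText handles morphologically related words.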
-
Taichi HOSOI, Hirohisa HIOKI
Session ID: 3N1-IS-2d-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Recent achievements in image processing technologies enable us to automatically extract various information from sports videos and utilize it for purposes such as analyzing games. For sports played with equipment, such as tennis, tracking the equipment's movements matters as much as tracking the players'. For players' movements, there are already methods that can estimate joint positions from videos. For equipment, meanwhile, although object detection methods can locate it in videos, such location information is not always enough for our purpose: we require more detailed information, such as the direction a racket is facing. We hence propose a method to track the tip of a tennis racket in a video in order to analyze its movements. Considering applicability and usability, we aim to make our method work on single video streams taken under various conditions (courts, racket colors, clothes, and weather) and to track a racket tip stably even when it is occluded by a player or appears blurred. For this purpose, we employ a CNN (convolutional neural network) that processes time-sequential images. We performed an experiment and found that our method seems to work better than a method that processes images one by one.
-
David John Lucien FELICES, Mitsuhiko KIMOTO, Shoya MATSUMORI, Michita ...
Session ID: 3N3-IS-2e-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In reinforcement learning, the Deep Deterministic Policy Gradient (DDPG) algorithm is considered a powerful tool for continuous control tasks. However, in complex environments, DDPG does not always show positive results because of its inefficient exploration mechanism. To deal with this, several studies increased the number of actors, but without considering whether there is an actual optimal number of actors for an agent. We propose MAC-DDPG, which consists of a DDPG architecture with a variable number of actor networks. We also compare the computational cost and learning curves for different numbers of actor networks on various OpenAI Gym environments. The main goal of this research is to keep the computational cost as low as possible while improving deep exploration, so that increasing the number of actors is not detrimental to solving less complex environments quickly. Currently, results show a potential increase in scores on some environments (around +10%) compared with classic DDPG, but the time needed to run the same number of epochs increases greatly (linearly with the number of actors).
-
Paulino CRISTOVAO, Hidemoto NAKADA, Yusuke TANIMURA, Hideki ASOH
Session ID: 3N3-IS-2e-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We investigate few-shot learning based on the weight-imprinting technique. The performance of imprinted weights depends deeply on the quality of the representation the encoder creates. However, although the quality of the extracted representation is known to affect the performance of the imprinted model, it is not known what characteristics are required for weight imprinting: the representation that yields the highest classification accuracy on the base classes might not be the best one for downstream imprinting tasks. We are investigating how to obtain a "better" representation in terms of weight imprinting, currently focusing on regularization, model architecture, data augmentation, auxiliary datasets, and auxiliary tasks.
-
Daiko KISHIKAWA, Sachiyo ARAI
Session ID: 3N3-IS-2e-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Recently, inverse reinforcement learning (IRL), which estimates the reward from an expert's trajectories, has been attracting attention for imitating complex behaviors and estimating intentions. This study proposes a novel deep IRL method that combines LogReg-IRL, an IRL method based on a linearly solvable Markov decision process, with ALOCC, an adversarial one-class classification method. The proposed method can quickly learn rewards and state values without running reinforcement learning or requiring trajectories for comparison. Through computer experiments, we show that the proposed method obtains a more expert-like gait than LogReg-IRL on the BipedalWalker task.
-
Masaki ITO, [in Japanese]
Session ID: 4C3-OS-1a-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, opportunities for online chatting have increased, and comments that hurt others may be sent intentionally or unintentionally. In many systems, inappropriate words are registered in advance, and comments containing them are blocked. However, this method does not provide a fundamental solution because it cannot change the sender's awareness. Therefore, in this study, we propose a function that prevents the transmission of slanderous comments by displaying a message to the sender of a potentially slanderous comment, encouraging awareness and confirmation of its content, as well as a function that estimates and visualizes the accumulated damage that slander has caused to the user receiving the comments. These functions help change the user's awareness when sending a comment. Through experiments, we verified whether the proposed functions can prevent the transmission of slanderous comments.
-
Yuto KUDO, [in Japanese]
Session ID: 4C3-OS-1a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, social networking services (SNS) have become very popular. While it is easy to connect with many people, it is also easy to get into trouble. One way to prevent trouble is to check information about a prospective interaction partner in advance. In this study, we propose a Twitter-based system that supports the selection of an interaction partner by displaying personality estimates for each user based on the percentage of emotional words used in their tweets. Users of this system can search for people with whom they want to interact by narrowing down users based on the provided personality information and then checking the actual tweets of the remaining users. Through experiments, we verified whether the system lets users smoothly find Twitter users with whom they want to interact.
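The feature the abstract builds on, the percentage of emotional words per user, can be sketched as a simple lexicon lookup; the tiny lexicon and tweets below are illustrative assumptions, not the system's actual dictionary:

```python
def emotion_ratios(tweets, lexicon):
    """Fraction of a user's tweet words falling in each emotion
    category of `lexicon` (category -> set of words)."""
    counts = {cat: 0 for cat in lexicon}
    total = 0
    for tweet in tweets:
        for w in tweet.lower().split():
            total += 1
            for cat, words in lexicon.items():
                if w in words:
                    counts[cat] += 1
    return {cat: (c / total if total else 0.0)
            for cat, c in counts.items()}

lex = {"joy": {"happy", "great"}, "anger": {"hate", "angry"}}
r = emotion_ratios(["so happy today", "i hate mondays"], lex)
```

A real system would use a full emotion dictionary and tokenizer (for Japanese, a morphological analyzer), then map these ratios to personality estimates.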
-
Kenji KOBAYASHI, Hiroki SHIBATA, Yasufumi TAKAMA
Session ID: 4C3-OS-1a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper proposes an interactive topic modeling method based on GDM (Geometric Dirichlet Means) and verifies its effectiveness by applying it to a news corpus. Topic modeling is a method for probabilistically analyzing the latent topics in a set of documents. As it is unsupervised, it may produce results the analyst does not intend. To solve this problem, this paper introduces the concept of Human-in-the-Loop to obtain topics corresponding to the analyst's intention by incorporating the analyst's knowledge into the learning process. The proposed method employs GDM, which is based on geometric computation and has a high affinity with document clustering. Model-change operations with adjustable parameters are defined, and their effectiveness is shown in a verification experiment.
-
Masayuki ANDO, Wataru SUNAYAMA, Yuji HATANAKA
Session ID: 4C3-OS-1a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In deep learning, the concrete classification patterns from which a classification decision is derived are often incomprehensible. In this paper, we propose a system that extracts classification patterns from deep learning networks and verify its effectiveness. The proposed system extracts classification patterns from trained LSTM networks using an HMM, and then displays the extracted patterns so that users can interpret the networks. In verification experiments, interpretations of the extracted classification patterns were compared with interpretations of classification patterns based on a TF-IDF ranking. The results showed that the proposed system can extract classification patterns that are effective for interpreting the learning networks.
-
Lessons Learned from PBL for the First and Second Year University Students
Munehiko SASAJIMA, Ken ISHIBASHI, Takehiro YAMAMOTO, Naoki KATOH, Hiro ...
Session ID: 4C4-OS-1b-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Although many faculties offer courses centered on statistics and computer science to educate data scientists, it is necessary to foster not only technical skills in programming and statistics but also problem-identifying and problem-solving abilities. Since 2019, the authors' faculty, the Department of Social Information Science at the University of Hyogo, in cooperation with KOHYO (a supermarket chain in the Kinki district) and Macromill, Inc. (a marketing research company), has carried out a half-year PBL (Problem-Based Learning) program for all first-year students (101 in total). The program used real survey data on approximately 30,000 consumers collected by Macromill, and the students made proposals for solving the problems of seven real supermarkets belonging to the KOHYO chain. According to questionnaires conducted after the PBL, most of the students realized what skills they need to become professional data scientists. In 2020 in particular, under the spread of COVID-19, the PBL was carried out with a mix of online and offline teaching. The authors found some changes in the students' attitudes, as well as issues in managing practical PBL at a large scale.
View full abstract
-
Masaya SATO, Wataru SUNAYAMA
Session ID: 4C4-OS-1b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Nowadays, the training of data scientists has become an urgent task, and data science education is becoming compulsory at universities. In data science education, however, the focus is often on individual analytical methods, and there are few opportunities to explain the full sequence of procedures from having data at hand to acquiring knowledge as an analysis result. In this research, we therefore aim to teach this sequence of data analysis procedures: using the text mining tool “TETDM”, we propose a voice navigation system that helps beginners in data analysis perform the full sequence from data input to knowledge acquisition. Experimental results confirmed that the proposed navigation system supports the smooth execution of a series of text analysis procedures.
View full abstract
-
Katsuhiro NAKAI, Xian-Hua HAN
Session ID: 4C4-OS-1b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Progression staging and classification of liver cirrhosis play an important role in determining the appropriate treatment and assessing clinical efficacy. Currently, liver biopsy, which samples real liver tissue, is the gold-standard method for liver cirrhosis staging, but it imposes a heavy burden on the patient. To alleviate this burden, recent research has paid extensive attention to non-invasive methods such as blood tests and medical imaging for liver cirrhosis diagnosis. In this paper, we investigate a non-invasive progression staging method for liver cirrhosis using MRI images and deep learning. This study exploits a novel module (dubbed the AFM module) consisting of an additive angular margin and a Fisher margin, and integrates it into a deep learning network to maximize the separability of cirrhosis stages. Experiments on MRI images provided by Shandong University, covering three progression stages of liver cirrhosis (early, middle, and late), validate that integrating the proposed AFM module yields performance gains of 3% to 7% over the baseline models VGG16, ResNet18, and ResNet50.
View full abstract
-
Yuto SAITO, Ryota MATSUBARA, Bin Mohd Anuardi Muhammad Nur ADILIN, Mid ...
Session ID: 4D2-OS-4a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Various methods using biometric data have been proposed for analyzing the mental states underlying human error. The purpose of this study is to construct a model that predicts errors in real time from mental states measured with EEG, heartbeat, and questionnaire results. We built a per-individual error prediction model using the EEG, heart rate, and questionnaire results obtained from the Stroop task. We found that some indices of the EEG, heartbeat, and questionnaire results were related to errors, and incorporated these indices into the error prediction model. In addition, we tested whether human errors can be prevented by predicting them in real time; when an error was predicted, its occurrence was confirmed in 97% of cases.
View full abstract
-
Yoshiyuki SATO, Yuta HORAGUCHI, Lorraine VANEL, Satoshi SHIOIRI
Session ID: 4D2-OS-4a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We face an ever-increasing amount of image content, including photos we take ourselves and images posted on SNS sites by others. In this situation, it is essential to develop techniques that can recommend images a user prefers without imposing much effort on the user. In this study, we conducted an experiment to obtain image preference data and developed a machine learning model that predicts image preference. In addition to the presented images, we utilized recorded facial images as implicit information, and compared which features better predict image preference. Furthermore, we used two different image domains (lunchboxes and landscapes) to investigate how the image domain influences which facial features are useful for preference prediction. We showed that, in both domains, the performance of preference prediction improved significantly by incorporating facial features. By analyzing the contribution of facial features to model prediction, we also showed that facial features related to positive and negative emotions were important for lunchbox and landscape images, respectively. This suggests that human image preferences in different image domains are well predicted by a machine learning model, though the preference manifests as distinct facial features across image domains.
View full abstract
-
Shiro KUMANO, Akihiro MATSUFUJI, Yan ZHOU
Session ID: 4D2-OS-4a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Conventional automatic emotion estimation has mainly targeted the person's own emotional state or the aggregate impressions of multiple external observers; limited effort has been made to estimate the impressions of a single other person. To this end, we previously proposed a model that assumes conditional independence of the target and the rater, but due to its simplicity, its prediction performance for unknown subjects and unknown raters was limited. In this study, we attempted to improve the prediction performance by using deep learning. Emotion recognition experiments on facial expression images confirmed the effectiveness of the proposed method.
View full abstract
-
Sayyedjavad ZIARATNIA, Peeraya SRIPIAN, Kazuo OHZEKI, Midori SUGAYA
Session ID: 4D3-OS-4b-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Various industries widely use emotion estimation to evaluate consumer satisfaction with their products. Generally, emotion can be estimated from observable expressions, such as facial expressions, or from unobservable signals, such as biological signals. Although widely studied, Facial Expression Recognition (FER) lacks precision for expressions that are very similar to each other, and for situations where the shown expression differs from the subject's real emotion. On the other hand, biological signal indices such as pNN50 can act as a supportive mechanism to improve emotion estimation from observable expressions such as FER. pNN50 is a reliable index for estimating stress and relaxation, and it originates from unconscious responses that cannot be manipulated. In this work, we propose a method for estimating pNN50 from facial video using a deep learning model. Transfer learning and a pre-trained image recognition Convolutional Neural Network (CNN) are employed to estimate pNN50 from a spatiotemporal map created from a series of frames in a facial video. The model, trained on low, middle, and high pNN50 values, shows an accuracy of about 80%. This indicates the potential of our proposed method, which we plan to expand to categorize pNN50 values at a more detailed level.
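The estimation target, pNN50, has a simple standard definition: the percentage of successive normal-to-normal (NN) heartbeat intervals that differ by more than 50 ms. A minimal computation is sketched below; the interval values are made up for illustration.

```python
def pnn50(nn_intervals_ms):
    """pNN50: percentage of successive NN-interval differences exceeding 50 ms."""
    diffs = [abs(b - a) for a, b in zip(nn_intervals_ms, nn_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two intervals")
    return 100.0 * sum(d > 50 for d in diffs) / len(diffs)

# Example NN intervals in milliseconds (hypothetical values)
intervals = [800, 860, 845, 910, 900, 955]
print(round(pnn50(intervals), 1))  # → 60.0
```

Here the successive differences are 60, 15, 65, 10, and 55 ms, three of which exceed 50 ms, giving 3/5 = 60%. Higher pNN50 generally indicates a more relaxed (parasympathetic-dominant) state, which is why it serves as a stress-relaxation index.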
View full abstract
-
Kazuaki OHMORI, Kazuki MIYAZAWA, Tatsuya AOKI, Takato HORII, Takayuki ...
Session ID: 4D3-OS-4b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The brain receives various signals through its own body. These signals are classified into exteroception, interoception, and proprioception, and are structurally integrated. This integrated structure is considered to be the basis of intelligence, including emotions. However, there are few studies on constructing emotional and cognitive models from actual sensory signals, since these signals are difficult to measure continuously. In this study, we capture multimodal sensory signals from a real human body and attempt to integrate and structure this information by applying machine learning methods. We then discuss the possibility of reproducing the concepts in the brain by analyzing the integrated structure. In particular, we report on concept formation based on signals obtained in an eating task, and on how signals obtained in non-eating tasks are perceived.
View full abstract
-
Yume HIRAI, Takato HORII, Takayuki NAGAI
Session ID: 4D3-OS-4b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we propose a computational model that integrates interoception, exteroception, and proprioception in an upper hierarchy, based on the predictive coding mechanism. In the experiment, we simulate a robot arm with these three perceptions, which carries out the task of lifting objects. The proposed model was able to form concepts related to the objects in the upper hierarchy through repeated experience of the task, and to predict interoception and proprioception from the input exteroception. Furthermore, we confirmed that the prediction error of the sensory signal changes according to the degree of concept formation in the upper hierarchy. In addition, by classifying the differential values of the interoceptive prediction errors computed by the proposed model, we can discuss the relationship between interoceptive prediction errors and basic emotions.
View full abstract
-
Seiichi HARATA, Takuto SAKUMA, Shohei KATO
Session ID: 4D3-OS-4b-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
To emulate human emotions in agents, a mathematical representation of emotion (an emotional space) is essential for each component, such as emotion recognition, generation, and expression. This study aims to model human emotion perception by acquiring a modality-independent emotional space that extracts shared emotional information from different modalities. We propose a method of acquiring a hyperspherical emotional space by fusing multiple modalities on a DNN and combining an emotion recognition task with a unification task: the emotion recognition task learns the representation of emotions, and the unification task learns an identical emotional space from each modality. Through experiments with audio-visual data, we confirmed that the proposed method can adequately represent emotions in a low-dimensional hyperspherical emotional space under this paper's experimental conditions. We also confirmed that the proposed method's emotional representation is modality-independent by measuring the robustness of emotion recognition over the available modalities in a modality ablation experiment.
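A hyperspherical embedding space is commonly realized by L2-normalizing the fused embedding onto the unit sphere. The sketch below shows only that projection step, under the assumption that the fusion DNN has already produced a raw embedding vector; the network itself is omitted.

```python
import math

def to_hypersphere(embedding):
    """Project a raw embedding onto the unit hypersphere via L2 normalization."""
    norm = math.sqrt(sum(v * v for v in embedding))
    if norm == 0:
        raise ValueError("zero vector has no direction on the sphere")
    return [v / norm for v in embedding]

e = to_hypersphere([3.0, 4.0])
print(e)  # → [0.6, 0.8]
```

After this projection, only the direction of the embedding carries information, so distances between emotions reduce to angles on the sphere, which is what makes a shared space across modalities comparable.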
View full abstract
-
Akihiro MATSUFUJI, Erina KASANO, Eri SATO-SHIMOKAWARA, Toru YAMAGUCHI
Session ID: 4D3-OS-4b-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
It is desirable for interactive robots and artificial agents to take the user's emotion into account and provide appropriate empathetic output. Advances in machine learning and deep learning have driven major progress in machine emotion understanding, yet these sophisticated techniques still struggle with individual differences in emotion. In this paper, we present an ensemble learning method for emotion recognition that considers such individual differences. Our method divides the training data by person and trains an independent model for each person as a submodel of the ensemble architecture. Furthermore, we implement a dynamic weight decision that selects the appropriate submodel for recognizing a user's emotion from a few samples of that user's emotional behavior. As a result, our architecture performed better than a conventional machine learning model.
View full abstract
-
Kazuhiro SHIDARA, Hiroki TANAKA, Hiroyoshi ADACHI, Daisuke KANAYAMA, Y ...
Session ID: 4D4-OS-4c-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Cognitive behavior therapy with virtual agents has been proposed for promoting mental health; however, quantitative analysis of the dialogue content is lacking. We therefore analyzed users' automatic thoughts using dialogue data based on cognitive restructuring with a virtual agent. According to an evaluation by a psychiatrist, 36.1% of the experimental participants failed to identify their automatic thoughts. We therefore propose a classifier that judges the success or failure of identifying automatic thoughts, as a basic technology for guiding their identification. We performed supervised learning using the automatic-thought sentences collected in the dialogue experiments and the automatic thoughts published in medical books as training data, obtaining an F1-score of 0.833. This classifier has the potential to allow virtual agents to automatically guide the identification of automatic thoughts.
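The reported F1-score is the harmonic mean of precision and recall; a minimal computation follows. The labels below are invented to illustrate one count pattern (5 true positives, 1 false positive, 1 false negative) that happens to yield 0.833; they are not the paper's data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 5 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # → 0.833
```

Because F1 ignores true negatives, it is a reasonable headline metric here: what matters is how well the classifier catches failed identifications, not how many unremarkable successes it also labels correctly.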
View full abstract
-
Yuki SAMEI, Komei HIRUTA, Satoshi SUGA, Yoji KAWANO, Eichi TAKAYA, Yos ...
Session ID: 4D4-OS-4c-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
One reason why current dialogue systems cannot interact like humans is that they lack the ability to reflect human emotions in interaction. In this study, we develop a new dialogue system that interacts using multimodal emotions such as facial expressions, tone of voice, and speech content. When the multimodal emotion values obtained from the user are input into the system, it speaks while displaying pictograms with facial expressions appropriate to each situation. An experiment with a comparison model showed that the proposed model can display more appropriate facial expressions.
View full abstract
-
Tung The NGUYEN, Koichiro YOSHINO, Sakriani SAKTI, Satoshi NAKAMURA
Session ID: 4E1-OS-11a-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Reusing a policy trained on an existing domain in a new domain is an important problem in dialogue management research based on reinforcement learning. This work defines action-relation probabilities between the action spaces of the existing and target domains using mixture density networks for policy reuse. Experimental results showed that the proposed modeling of action-relation probabilities, based on component matching via regression, realizes effective policy reuse.
View full abstract
-
Atsumoto OHASHI, Ryuichiro HIGASHINAKA
Session ID: 4E1-OS-11a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In order to accomplish tasks, it is important for task-oriented dialogue systems to adapt to users and dialogue situations. However, in many systems, each module is developed separately and connected, which makes it difficult for a system to respond flexibly to unexpected users and dialogue situations. In this research, we aim to realize a system that can adapt to users and dialogue situations by making each module share its own information with others and learn how to behave in order to maximize the system performance through reinforcement learning. With dialogue simulations in a tourist domain, we confirmed that the proposed method leads to an improvement in the task completion rate.
View full abstract
-
Hiroaki SUGIYAMA, Hiromi NARIMATSU, Masahiro MIZUKAMI, Tsunehiro ARIMO ...
Session ID: 4E1-OS-11a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, several high-performance conversational systems based on the transformer encoder-decoder model have been proposed, achieving natural response generation by increasing the system scale (model parameters, amount of training data, etc.). While previous studies have analyzed how system size and decoding method affect the subjective evaluation of dialogues, they have not analyzed the differences among fine-tuning corpora. In addition, conventional analysis has focused only on overall naturalness and superiority, and has not sufficiently analyzed the relationship with multifaceted, detailed impressions. We evaluate and analyze the impressions of human dialogues under different fine-tuning corpora, system sizes, and uses of additional information.
View full abstract
-
Yoshiki OHIRA, Takahisa UCHIDA, Takashi MINATO, Hiroshi ISHIGURO
Session ID: 4E1-OS-11a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this research is to develop a dialogue system that models its user's preferences and experiences through daily dialogue. Understanding the user's preferences and experiences is important for increasing the user's dialogue satisfaction, and when acquiring user information, the dialogue must continue according to the user's knowledge. In this paper, we propose a recovery method that identifies the intended concept in the user's utterance by comparing the utterance with the system's concepts when the concept is not identified (an error). The context of the dialogue is defined as a frame representation, and the system updates the context to identify the intended concept based on information obtained from the user's previous utterances. In addition, when the user's utterance is ambiguous, the system estimates the intended concept using common-sense knowledge based on third-party experience data obtained in advance. The goal is to identify the intended concept without decreasing the user's motivation to talk. This kind of error recovery is important not only for robust dialogue during user information acquisition, but also for promoting mutual understanding between users and the system.
View full abstract
-
Toshiki MUROMACHI, Yoshinobu KANO
Session ID: 4E1-OS-11a-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Backchannels could allow spoken dialogue systems to make communication smoother and to elicit more conversation from users. We propose a model that uses acoustic features, linguistic features, and dialogue histories to predict appropriate timings for backchannels. Our experimental results show that the proposed method performs better than a baseline model that uses acoustic and linguistic features only. Furthermore, we conducted a subjective experiment on predicting backchannel timings, whose results showed that the proposed method can predict the timings for giving backchannels with performance similar to that of a human annotator. The proposed method also obtained a higher score than the baseline model in a five-grade evaluation by seven human subjects, confirming its effectiveness.
View full abstract
-
Hirofumi KIKUCHI, JIE YANG, Hideaki KIKUCHI
Session ID: 4E2-OS-11b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Recently, the number of elderly people living alone is increasing in Japan, and in these households the frequency of conversation is decreasing. There are concerns that less frequent conversation will lead to a decline in health. Spoken dialogue systems are expected to meet this demand for conversation; however, they have the problem of decreasing users' desire to continue the dialogue. In this research, we aim to solve this problem of dialogue breakdown. We previously confirmed, using a single speaker's utterances, that there exists an acceptable range of system responses to user utterances. In this paper, we recorded user utterances from nine speakers and conducted a listening evaluation experiment to confirm the existence of acceptance for various types of user utterances. As a result, we clarified the tendency of the relationship between user utterances and system responses that is related to users' acceptance judgments.
View full abstract
-
Kazuya MERA, Mayuna ISHIDA, Shunsuke HABARA, Yoshiaki KUROSAWA, Toshiy ...
Session ID: 4E2-OS-11b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
To handle a user's emotion in a rule-based dialogue system, a vast number of rules must be prepared, because the number of rules is multiplied by the number of emotion types. A statistical dialogue system, on the other hand, is robust to various inputs but can deal only with text information. This paper proposes a statistical dialogue system that can deal with the user's emotions by converting them into emojis. The user's emotion is estimated from the tone of the user's voice and appended to the tail of the input sentence as an emoji; the emoji in the output sentence is then treated as the agent's emotion when the output voice is synthesized. Question-answer pairs including emojis are collected from tweet-reply pairs on Twitter, and these pairs are also used for fine-tuning. Experimental results revealed that replies generated by the proposed method were better suited to the user's emotions than those of a no-emoji method.
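The emoji-conversion step described above, appending an emoji for the voice-estimated emotion to the tail of the input sentence, can be sketched as follows; the emotion-to-emoji mapping and function names are illustrative assumptions, not the paper's actual table.

```python
# Hypothetical mapping from an estimated voice emotion to an emoji tag
EMOTION_EMOJI = {"joy": "😊", "anger": "😠", "sadness": "😢", "neutral": ""}

def tag_with_emotion(utterance, estimated_emotion):
    """Append the emoji for the estimated voice emotion to the input text,
    so a text-only statistical dialogue model can condition on emotion."""
    emoji = EMOTION_EMOJI.get(estimated_emotion, "")
    return utterance + emoji

print(tag_with_emotion("I passed the exam", "joy"))  # → I passed the exam😊
```

Because the emotion is folded into the text itself, the same encoder-decoder response model can be fine-tuned on emoji-bearing tweet-reply pairs without any architectural change; the emoji in the generated reply is then read back off as the agent's emotion for speech synthesis.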
View full abstract
-
Sanae YAMASHITA, Noriyuki OKUMURA
Session ID: 4E2-OS-11b-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Dialogue systems have a problem of lacking and inconsistent characteristics or personality. This paper describes a method for replacing text at the subword level using BERT's masked token prediction with transfer learning. As a result, we found that the SentencePiece method without morphological analysis replaces tokens more fluently than Byte Pair Encoding after morphological analysis. Moreover, for the conscientiousness and neuroticism factors of the Big Five, SentencePiece shows that transfer learning on individual tweets of research participants can reflect the personality of the writer.
View full abstract