-
Koki USUI, Kazunori TERADA, Celso M. de MELO
Session ID: 2I5-OS-9a-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Expressed emotions (facial expressions) can change the decision-making and behavior of observers. In the present study, we show that when an industrial robot expresses emotions in a situation that requires cooperative decision-making with human partners, the time to reach an agreement increases.
-
Seiichi HARATA, Takuto SAKUMA, Shohei KATO
Session ID: 2I5-OS-9a-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
This study aims to obtain a mathematical representation of emotions (an emotional space) common to the modalities. We compare methods for embedding emotions into non-Euclidean spaces using multimodal DNNs. The proposed model fuses the emotional spaces for each modality embedded in the latent space based on the Klein model, a hyperbolic space model. Then, we train the model by multitasking the emotion recognition task and the latent space unification task. In the experiment using audio-visual data, we compare the representation on hyperbolic space with the Euclidean or Hemi-hyperspherical representation considered in previous studies. We evaluate the robustness of emotion recognition when the modality is missing and confirm that the proposed method obtains shared representations of emotions across modalities in a low-dimensional hyperbolic space. We also compare the emotion recognition tendency of the proposed model with human raters to examine the representational power of the proposed emotional space.
-
Mayuko OZAWA, Ryotaro NAGASE, Takahiro FUKUMORI, Yoichi YAMASHITA
Session ID: 2I5-OS-9a-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we analyze the impact of topic information on the performance of speech emotion recognition (SER). Conventional methods estimate emotions considering individuality, such as age and gender. However, factors other than individuality also influence the expression of emotions. For example, when people talk about the death of a friend, they may feel more sadness than anger or happiness. We propose two SER methods that take topic information into account. One uses a one-hot representation of the topic label. The other uses the occurrence ratio of emotions in each topic. We run emotion recognition experiments on the IEMOCAP corpus. The two methods performed as well as or better than the baseline. These results show that topic information helps improve SER performance.
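The two topic features named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topic names, emotion labels, and toy samples are hypothetical, and how the features are fed to the recognizer is not specified in the abstract.

```python
from collections import Counter, defaultdict


def one_hot(topic, topics):
    """Return a one-hot list marking the position of `topic` in `topics`."""
    return [1.0 if t == topic else 0.0 for t in topics]


def emotion_ratios(samples, emotions):
    """Map each topic to the occurrence ratio of each emotion within it."""
    counts = defaultdict(Counter)
    for topic, emotion in samples:
        counts[topic][emotion] += 1
    ratios = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        ratios[topic] = [counter[e] / total for e in emotions]
    return ratios


# Hypothetical (topic, emotion) training pairs, for illustration only.
samples = [
    ("bereavement", "sad"),
    ("bereavement", "sad"),
    ("bereavement", "sad"),
    ("bereavement", "angry"),
    ("wedding", "happy"),
]
topics = ["bereavement", "wedding"]
emotions = ["angry", "happy", "sad"]

topic_one_hot = one_hot("wedding", topics)
topic_ratios = emotion_ratios(samples, emotions)
```

Either vector could then be concatenated with the acoustic features of an utterance before classification; that step is an assumption here, not something the abstract states.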
-
Yan ZHOU, Tsukasa ISHIGAKI, Shiro KUMANO
Session ID: 2I5-OS-9a-05
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Post hoc approaches that seek explanations in deep models using reverse engineering and other methods have been widely used in Explainable AI. However, approaches that aim to build models that are inherently interpretable by limiting complexity have not yet been much explored, at least in the field of Affective Computing. In this study, we aim to achieve both high predictive performance and interpretability by integrating an explanatory item response model for ordinal scales, for which psychological interpretation is well established, into a deep neural network. Experiments were conducted to confirm the extent to which the proposed method can predict an individual's perceived emotion from the facial expressions of others, and good prediction results were obtained. The proposed method is expected to be used for education and support of interpersonal interaction as a complementary method to the post-hoc method.
-
Kana MIYAMOTO, Hiroki TANAKA, Satoshi NAKAMURA
Session ID: 2I6-OS-9b-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
We have developed an emotion induction system that predicts participants' emotions from EEG and provides personalized music. Although a sufficient amount of data is important for training emotion prediction models, recording EEG data for a long time is a burden on participants. In this study, we investigate a training method that uses only a small amount of EEG data. We propose using meta-learning to train a pre-trained model that can be adapted easily to each participant. When predicting valence and arousal from EEG, the method with meta-learning showed a significantly lower prediction error than the method without meta-learning (p<.001).
-
Kazuhiro SHIDARA, Hiroki TANAKA, Hiroyoshi ADACHI, Daisuke KA ...
Session ID: 2I6-OS-9b-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Cognitive Behavior Therapy (CBT) is a psychotherapy that is well established as a method of mental health care for the general public, in addition to the treatment of psychiatric disorders such as depressive disorders. In CBT, a human therapist asks a patient questions to evaluate an automatic thought and guides the patient to improve their mood. Virtual agents are expected to be able to provide CBT automatically. In this study, we investigate the effect of a virtual agent's questions on the mood of participants. We implemented two scenarios with different numbers of questions and conducted a two-group comparison experiment. The improvement in mood was significantly larger in the scenario with many questions than in the scenario with few questions. This result implies that the virtual agent's questions contribute to the improvement of the user's mood.
-
Ryota MATSUKUMA, Shogo OKADA, Maiko MATSUMOTO, Atsushi NAKAMOTO
Session ID: 2I6-OS-9b-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Research on social signal processing aims to compute and understand the aspects of sociality that humans form through communication and behavior, using multimodal information observed from communicators. In recent years, machine learning methods have been proposed to model the skills of communicators. We collected data on online counseling and created a multimodal corpus for the purpose of analyzing and modeling the consultation skills of professional advisors. The corpus is composed of images, voices, and utterances of counselors and clients, together with empathy labels. In this paper, we outline the corpus and report the results of the initial analysis, which focuses on the synchrony of the actions of counselor and client in estimating the scenes in which the counselor and client gained empathy and the client gained conviction.
-
Tomoya OHBA, Gai SUZUKI, Haruki KUROKI, Shogo OKADA
Session ID: 2I6-OS-9b-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this study is to develop a system to evaluate skills in job interviews and to provide feedback for skill improvement. We constructed a VR-experience type humanoid agent system and collected a dataset of job interview dialogues. This dataset includes multimodal information of the interviewee’s video, biometric signals, gaze, and speech during the dialogue, interview skill scores annotated by expert interview trainers for each question-answer, and self-annotations of confidence in the interview. We report the results of our analysis of a model we built to predict interview skills and subjects’ confidence level.
-
Tomoki YAMAUCHI, Kei NAKAGAWA, Kentaro MINAMI, Kentaro IMAJO
Session ID: 2J4-GS-10-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, many investors have developed quantitative stock prediction models based on machine learning. It is difficult to put a machine learning-based stock price prediction model into practical use due to two challenges: market efficiency and lack of interpretability. The Trader-Company (TC) method is a recently developed evolutionary method that finds interpretable temporal rules with high prediction accuracy. However, the TC method does not take into account regime changes, and regime changes may worsen the prediction accuracy. Therefore, in this study, we propose the Multiple-World Trader-Company (MWTC) method to improve robustness against regime changes. In the MWTC method, the Company model that manages Traders is used as a weak learner, and multiple Companies individually learn the training data divided by regime. Empirical analysis using actual market data shows that the MWTC method achieves better prediction accuracy than the baseline method.
-
Kentaro MINAMI, Kentaro IMAJO, Kei NAKAGAWA, Taku IMAHASE
Session ID: 2J4-GS-10-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Full-Scale Optimization (FSO) is a framework of portfolio construction that directly maximizes expected utility over a sample of historical returns. The existing formulation of FSO is based merely on the empirical distribution of returns, which can lead to poor out-of-sample performance when the return distribution is time-varying. In this paper, we propose a framework of portfolio construction, Predictive Full-Scale Optimization (PFSO), which combines FSO and distributional prediction. PFSO is flexible enough to incorporate investors' risk appetite and perspectives on future return distribution. Also, we propose a novel continuous optimization algorithm for FSO that rapidly converges to optimal solutions under hierarchical budget constraints. We perform numerical experiments on real-world portfolio data and demonstrate the effectiveness of our proposed method.
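The core idea of FSO stated in this abstract, maximizing expected utility directly over a sample of historical returns, can be sketched as follows. This is an illustrative toy, not the paper's algorithm: it swaps in a plain grid search over two long-only assets instead of the continuous optimization the authors propose, assumes log utility, ignores hierarchical budget constraints, and uses made-up returns.

```python
import math


def expected_utility(weights, returns, utility):
    """Average utility of portfolio gross returns over the historical sample."""
    total = 0.0
    for period in returns:
        portfolio_return = sum(w * r for w, r in zip(weights, period))
        total += utility(1.0 + portfolio_return)
    return total / len(returns)


def full_scale_search(returns, utility, steps=100):
    """Grid search over long-only two-asset weights that sum to one."""
    best_weights, best_value = None, -math.inf
    for i in range(steps + 1):
        w = i / steps
        weights = (w, 1.0 - w)
        value = expected_utility(weights, returns, utility)
        if value > best_value:
            best_weights, best_value = weights, value
    return best_weights, best_value


# Hypothetical per-period returns for (risky asset, safe asset).
history = [(0.05, 0.01), (-0.03, 0.01), (0.04, 0.01), (-0.02, 0.01)]
weights, value = full_scale_search(history, math.log)
```

A predictive variant in the spirit of PFSO would replace `history` with samples drawn from a forecast return distribution; how that distribution is produced is outside this sketch.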
-
Daiki KATO, Jiacheng LI, Masato NOTO
Session ID: 2J4-GS-10-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, there has been a lot of research on stock price forecasting. Technical analysis, which uses opening and closing prices to make forecasts, has become the mainstream method for stock price forecasting. Since this method forecasts by grasping trends in the data, how easy forecasting is depends on the data. Traditionally, daily charts have been widely used; depending on the research, other charts such as minute charts have been selected by hand. Currently, there is little evidence that the selected scales are the most appropriate, and the results could be further improved. In this study, we propose a method to relate the results of machine learning and statistical methods. In the experiment, we use USDJPY and multiple time series. First, we investigate easily predictable scales by using an LSTM model. The aim is to gain a foothold in explainability by providing statistical support for the results. As a result of the experiment, we find differences in predictive accuracy across scales. In addition, the correlation with the data is confirmed. Finally, we discuss how to use this research to expand explainability.
-
A case of the rental office market in Tokyo
Takeshi MIZUTA
Session ID: 2J4-GS-10-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
There is growing concern about the impact of the COVID-19 pandemic on the commercial real estate market and related financial systems. This research performs a comparative analysis of office rent and demand forecasts in the Tokyo office market. First, we estimate the office rent trend after 2020 using DiPasquale and Wheaton’s office rent model (DiPW, hereinafter), one of the traditional regression-based forecasting strategies from real estate economics. Second, we construct an office space demand model using a recurrent neural network (RNN) and embed it into the DiPW model. We also apply dimension reduction via dynamic factor analysis (DFA) to summarize macro-economic trends and compare these forecast models in terms of predictive accuracy. By combining RNN and DFA, we examine the predictive relationship between office space demand and macro-economic trends and find the following. First, the prediction accuracy is improved by introducing dynamic factors into the machine learning model. Second, not only GDP-related indices but also economic indices related to labor, firms, and public finance are important factors in forecasting office space demand in Tokyo.
-
Shaofeng YANG, Yoshiki OGAWA, Koji IKEUCHI, Ryosuke SHIBASAKI
Session ID: 2J4-GS-10-05
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
This study proposes an economic damage estimation model based on the Graph Neural Network (GNN) method, assuming that economic damage spreads through networks among firms when firms are affected by natural disasters. In the proposed model, various inter-firm networks (seven types in total, such as business relationship, investment relationship, same industry, same region, etc.) are extracted from the corporate credit survey data set. Next, graph data is created from the features of individual firms and the network structure of the extracted firms, and a learning model based on the GNN is constructed. To test the effectiveness of the model, we trained seven types of networks from actual company data for the period from 2009 to 2019 and validated their forecasting accuracy. As a result, we found that the prediction error (MSE) of the networks among firms with investments, business partners, and the same municipality was relatively small, and in particular, the model of the relationship with the investee had the smallest prediction error. This implies that the damage spillover of the transaction value of firms is more likely to be influenced by each firm's investees, business partners, and firms in the same city.
-
Junichi ARAHORI
Session ID: 2J5-OS-24a-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Discussions on AI principles have shifted from content to implementation methods. Legislation to realize the principles is being discussed globally, and its impact on future AI development is still unknown. This paper gives a bird's-eye view of the strict legal obligations on AI providers that the European Commission proposed in 2021.
-
Roy SUGIMURA
Session ID: 2J5-OS-24a-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
International standardization of AI started in 2018 at ISO/IEC JTC 1/SC 42. In this paper, we give an overview of the flow of research and development and commercialization, which are important for the standardization of AI, and explain the basic information for considering the direction of discussions on ethics and governance.
-
Focus: ISO/IEC and IEEE
Takashi EGAWA
Session ID: 2J5-OS-24a-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper describes how ISO/IEC and IEEE standards contribute to the realization of AI ethics. First it clarifies what aspect of AI ethics can be handled by standards. Then it introduces the latest discussion about values to be standardized, and argues that most of them are traditional ones, and fine tuning of existing standards is sufficient. Then it shows the details of how AI ethics are standardized by describing ISO/IEC TR 24027 (bias) and others.
-
Minao KUKITA
Session ID: 2J6-OS-24b-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this presentation is to shed light on what ethical governance must be, especially in comparison to risk management. To do so, the author will refer to the proposed AI regulation published by the European Commission in April 2021. It categorizes AI systems into those that create "unacceptable risk," "high risk," and "low or minimal risk," and proposes to impose different levels of legal regulation for each. The author will discuss what the proposed regulations consider to be risks, and how dealing with such risks relates to ethics, or where there is a difference. In conclusion, the author will argue that while risk analysis methods are based on predetermined evaluation criteria, ethics, in its essence, must include questioning and revising existing evaluation criteria. This implies that risk management and ethical governance must be complementary in methodology.
-
Yuri NAKAO, Kenji KOBAYASHI, Simone STUMPF
Session ID: 2J6-OS-24b-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
As artificial intelligence (AI) has come to be used for social decisions such as hiring, loan decisions, and bail decisions, it has been pointed out that AI models can reflect the discriminatory bias contained in their training data. Although various measures to mitigate the bias and achieve fair AI have been taken, truly fair AI is difficult to realize because the concept of fairness cannot be defined in a single way: it varies with culture and position. In this paper, we propose a framework called "Fairness by Design" to reflect the diverse concepts of fairness. The framework consists of a series of workshops to clarify the design requirements of AI systems with people from diverse cultures and positions, the development of interactive AI systems, the aggregation of suggestions for model adjustments by people in various cultures and positions, and the mitigation of bias with intersectional bias taken into account. We apply this framework to loan decision data and show that the concept of fairness varies across cultures and that different AI models are obtained for different cultures.
-
Takuya YOKOTA, Yuri NAKAO
Session ID: 2J6-OS-24b-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Recently, fairness-aware artificial intelligence (AI) technology has been developed to remove discriminatory bias, e.g., bias on race and gender. On the other hand, there are trade-offs among the metrics of accuracy and fairness in AI models and different stakeholders have different preferences for the metrics. Hence, to form an agreement on the preferences, existing research has explored workshop approaches encouraging dialogue among stakeholders. However, it is practically difficult for multiple stakeholders to have conversations at the same place and time. In this paper, we propose a method to aggregate the preference of each stakeholder using an online survey. In our study, 739 crowdsourced participants are randomly divided into 4 stakeholder groups and asked to rank 5 machine learning models. Through this survey, we calculate the preference of each stakeholder group. We examine the preference of each stakeholder and whether the information on the other stakeholder affects a stakeholder's preference for the metrics. As a result, through our method, the preference of each stakeholder successfully meets the requirements of their role. On the other hand, it is clarified that the preference for the metrics is not affected by the information on the other stakeholders.
-
Hirohito OKUDA, Tetsuya ISHIDA
Session ID: 2J6-OS-24b-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Co-creation business is emerging as a way to adopt rapidly developing technologies and respond to increasingly complex markets. In co-creation businesses, multiple actors with different interests and expected responsibilities collaborate toward common business goals. We discuss challenges in implementing AI principles for co-creation businesses.
-
AI Ethics Impact Assessment Method
Izumi NITTA, Kyouko OHHASHI, Satoko SHIGA, Sachiko ONODERA
Session ID: 2J6-OS-24b-05
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, "from principle to practice" has become an important issue in AI ethics. In this presentation, we propose a method called "AI Ethical Impact Assessment Method" for AI developers and providers to implement the principles of AI ethics in their AI systems. The AI Ethical Impact Assessment Method analyzes where and what ethical risks exist in an AI system. By formulating guidelines for AI ethics and associating them with AI systems and relationships with stakeholders, it is possible to identify events that appear as ethical issues and the factors that cause them. This method was applied to cases from the AI Incident Database operated by Partnership on AI, and the effectiveness of the method was confirmed for incident cases in various industries and AI applications. A collection of these case studies and a toolkit of the method will be published, and we will exchange opinions with people who use this method and make improvements. In this presentation, we introduce an overview of the AI Ethical Impact Assessment Method and our efforts to improve and disseminate it.
-
Shunya OCHIAI, Tohgoroh MATSUI, Yuya OKAMURA, Shinji KAGEYAMA, Takeshi ...
Session ID: 2K4-GS-10-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
We describe a new information retrieval system to utilize business records. NAL, the target company of this study, is engaged in contract work for automobile maintenance. NAL undertakes contracts for comprehensive car maintenance with many companies that have cars. The goal of this system is to support decision-making. We built a system to search for similar sentences in the memorandum text in the business records. The user can find past situations similar to the current one. We use BERT, a transformer neural network, to implement semantic search. However, our trial system could not find the appropriate sentences because the queries or memorandum texts contain company-specific jargon and abbreviations used in NAL. We propose a method to improve the semantic search by using Sentence-BERT and creating training data from a small jargon dictionary and a general similar-sentence dataset.
-
Yutaka TAKAHASHI
Session ID: 2K4-GS-10-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
It has been a long time since the need to promote digital transformation (DX) was recognized. Nonetheless, there are not many successful examples of DX in Japan. The purpose of this study is to verify whether Japan's strategic planning direction for DX is appropriate. To test this, we focused on the performance trends of companies selected as “DX Stocks.” The result is that the companies selected as “DX Stocks” did not perform as well as the other “non-DX Stocks” companies, which indicates that DX was not a success. This suggests that there are problems with the qualitative criteria used to select “DX Stocks” and with the ways of formulating individual companies’ DX strategies. To deal with this, this paper proposes the use of quantitative simulations. This suggests AI providers’ role should expand to business model simulations.
-
Comparison of the trust game when the counterparty is a "human" and a "robot"
Yoshiki YAMASHITA, Midori NOMA, Minori FUJIOKA, Hiroaki OZAWA, Wakana ...
Session ID: 2K4-GS-10-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this research is to clarify the mechanism of trust formation in mutual economic transactions between humans and AI. Nowadays, there are more and more opportunities for long-term relationships between humans and robots, such as nursing robots and asset management robots. Trust is essential in such continuous mutual transactions, and various studies have been conducted on trust between humans and robots. In this experiment, we examined the relationship between emotion and trust in a continuous trust relationship using a repeated trust game for both human-human and human-robot pairs. The results showed that social emotions (gratitude and anger) had no effect on trust in a continuous relationship. However, the amount of money transferred when the counterparty was a human was larger than when it was a robot in successful transactions (Return minus Investment, RMI≧0), and smaller in unsuccessful transactions (RMI≦0). This has implications for the cooperative behavior of humans and AI.
-
Nami IINO, Hiroya MIURA, Hideaki TAKEDA, Masatoshi HAMANAKA, Takuichi ...
Session ID: 2K4-GS-10-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Instrumental performance is one of the most difficult skills to evaluate quantitatively. There are also various ways to analyze the verbal and non-verbal information generated in the field of instruction. In our previous study, we analyzed the performance and speech segments in one-to-one classical guitar lessons, and defined instructional labels that represent the "instructor's perspective." In this study, we attempted to analyze the lesson structurally by aggregating the above segment information and the information of the instructional labels assigned to them. The results indicate that there is commonality in group interpretations in the tree structure and that new labels that semantically categorize the content of teachers' speeches are useful in determining the hierarchy.
-
Taiki IEDA, Yuji NOZAKI, Maki SAKAMOTO
Session ID: 2K4-GS-10-05
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper presents a method for building a regression model that predicts the atmosphere of cities, represented on onomatopoeia scales, by using fundamental statistics of cities such as population and number of restaurants. To build our regression model, we conducted two experiments. The first was an onomatopoeia selection experiment to select suitable onomatopoeia for the scales, in which we asked participants to answer with the onomatopoeia that represents the atmosphere around a station. The second was an annotation experiment, in which we conducted a questionnaire to quantify the atmosphere on the onomatopoeia scales. Our regression model was a support vector regression model built from training data collected in the annotation experiment and statistical information. As a result, the accuracy of our regression model for several target atmospheres ("kibi-kibi", "howa-howa", "yuru-yuru" and "iso-iso") was greater than 0.5.
-
Kazuma KOBAYASHI, Yasuyuki TAKAMIZAWA, Sono ITO, Mototaka MIYAKE, Yuki ...
Session ID: 2K5-OS-1a-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Synthetic data produced by generative models have been attracting attention in recent years. One promising application of synthetic data in medical imaging is to generate medical images with particular clinical findings, compensating for the fundamental difficulty of collecting large-scale datasets due to privacy concerns. However, generative adversarial networks have an inherent tendency to overfit the most frequent features in a dataset. Therefore, an elaborate approach is needed to obtain synthetic data for specific clinical findings. In this article, we propose a novel image generation pipeline that can incorporate expert knowledge of clinical medicine by editing generated medical images.
-
Rina KAGAWA, Masaru SHIRASUNA, Atsushi IKEDA, Masaru SANUKI, Hidehito ...
Session ID: 2K5-OS-1a-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
The development of statistical learning techniques generally requires large, accurately annotated data sets. However, for tasks where the correct label cannot be uniquely defined, especially highly specialized tasks such as those involving medical data, it is difficult to obtain large, accurately annotated data sets. We hypothesized that there exists an appropriate thinking time that balances the trade-off between accuracy and mental strain. We tested the effect of an intervention in which participants deciding whether a medical image was abnormal or normal were prevented from answering for a certain period of time after the image was presented. In two behavioral experiments with physicians (N=634), the expectation of a correct response increased when the image was made unanswerable for one second after presentation. This study showed that annotation quality can be improved in a simple and cost-effective way by utilizing human cognitive characteristics.
-
Mitsuhiko NAKAMOTO, Satoshi KODERA, Hirotoshi TAKEUCHI, Shinnosuke SAW ...
Session ID: 2K5-OS-1a-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Self-supervised learning has been demonstrated to be a powerful way to use unlabeled data in computer vision tasks. In this study, we propose a self-supervised pretraining approach to improve the performance of deep learning models that detect left ventricular systolic dysfunction from 12-lead electrocardiography data. We first pretrain an encoder that can extract rich features from unlabeled electrocardiography data using self-supervised contrastive learning, and then fine-tune the model on the downstream dataset using the pretrained encoder. In experiments, our proposed approach achieved higher performance than the supervised baseline method, using only 28% of the labels used by the baseline method.
-
Shogo TAKAOKA, Atsushi IKEDA, Hirokazu NOSATO, Hidenori SAKANASHI, Mas ...
Session ID: 2K5-OS-1a-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper proposes a lesion detection model using annotation expansion for Narrow Band Imaging (NBI). In the proposed method, a Variational Autoencoder (VAE) is used to extend the annotations made by medical specialists on a dataset. This method compensates for the lack of NBI data, which is not easy to collect, and leads to improved lesion detection performance. To verify the effectiveness of the proposed method, experiments using actual bladder endoscopic images were performed. The experiments yielded a sensitivity of 74.9%, a specificity of 98.2%, and an F value of 78.0%. This result shows an improvement in lesion detection performance compared to the model without annotation expansion, and confirms the effectiveness of the proposed method for lesion detection.
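For readers unfamiliar with the three reported metrics, the sketch below shows how they are conventionally computed from confusion-matrix counts. It assumes the "F value" is the F1 score (harmonic mean of precision and sensitivity), which the abstract does not state explicitly; the counts in the usage line are invented and unrelated to the paper's data.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, F1) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true lesions that were detected
    specificity = tn / (tn + fp)  # share of non-lesions correctly rejected
    precision = tp / (tp + fp)    # share of detections that were true lesions
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1


# Hypothetical counts, for illustration only.
sensitivity, specificity, f1 = detection_metrics(tp=3, fp=1, tn=9, fn=1)
```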
-
Ryuunosuke KOUNOSU, Hirokazu NOSATO, Yuu NAKAJIMA
Session ID: 2K5-OS-1a-05
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
When applying artificial intelligence to medical imaging, deep learning models pre-trained on ImageNet are commonly used. However, ImageNet cannot be used for commercial purposes, making it difficult to put such models to practical use even if excellent diagnostic support is achieved. Therefore, we propose a method to apply a deep learning model pre-trained on the FractalDB dataset, an automatically generated image dataset, to medical imaging. In this paper, we use cystoscopy images to validate the effectiveness of the proposed pre-training method on medical images. As a result, the classification model pre-trained on FractalDB-1k, the 1000-class variant of FractalDB, outperformed the classification model trained only on cystoscopy images in terms of Accuracy, Sensitivity, Specificity, F1-Score, Precision, and AUC.
-
Yukari SHIROTA
Session ID: 2K6-OS-1b-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we explain the Shapley value, which is widely used to increase the explainability of AI-based analysis results, using a concrete example: a prefectural comparison of the birthrate in Japan. Suppose that we conduct a regression with the birthrate as the target variable and several predictor variables, such as the number of marriages. We did not find a relationship between the target and the raw predictor variable values. On the other hand, when the Shapley value is used instead of the raw predictor value, a stronger correlation is obtained. This is because the Shapley values are calculated based on the characteristic functions of the individual data (in this case, each prefecture). Structural characteristics vary from prefecture to prefecture, so the effect differs even between prefectures with the same number of marriages. Similarly, in the medical field, the incidence rate is considered to differ even under the same conditions, depending on the characteristics of the individual person. The advantage of interpretation by Shapley values is that multiple factors can be examined using characteristic functions and the importance of each factor can be determined based on the structural characteristics of the individual. The intrinsic meaning of Shapley values is explained in this paper in an easy-to-understand way through visualization.
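To make the role of the characteristic function concrete, here is a minimal sketch of the exact Shapley value for a tiny cooperative game, computed by averaging each player's marginal contribution over all orderings. The two factor names and the characteristic function `v` are hypothetical illustrations, not the paper's regression model or data.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def v(coalition):
    """Hypothetical characteristic function with an interaction term."""
    base = {"marriages": 0.3, "income": 0.1}
    total = sum(base[f] for f in coalition)
    if {"marriages", "income"} <= coalition:
        total += 0.2  # extra effect only when both factors are present
    return total


phi = shapley_values(["marriages", "income"], v)
print(phi)
```

The interaction term is split equally between the two factors, and the values sum to the worth of the full coalition, which is the efficiency property of the Shapley value.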
-
Ko MURASE, Shinichirou YOKOYAMA, Shogo HUKUDA, Ken INOUE, Shogo OKADA
Session ID: 2K6-OS-1b-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, there has been concern that the number of patients with dementia will increase as the population ages. On the other hand, if the tendency toward dementia can be detected at an early stage, it may be possible to delay the progression of symptoms by providing appropriate treatment. Against this background, the establishment of a behavioral recognition model for estimating dementia tendency based on behavioral information is one of the most important issues in health care technology. In this study, we focus on the findings that dementia tends to be associated with reduced daytime activity, sleep disturbance, and irregular sleep patterns. First, we obtained various behavioral data from 132 subjects, including those diagnosed with dementia, and created a data set that included the Mini Mental State Examination (MMSE) at the time of measurement. The behavioral data obtained were specifically activity per minute, heart rate, and respiration rate, each of which was obtained for 24 hours for each subject. In this presentation, we report the results of the construction and evaluation of a classification model for MMSE scores indicating dementia tendency using the dataset. The result of the three-class classification using Random Forest was 0.459 in F-value. We also report the results of visualization of the contribution using SHAP in this case.
-
Iko NAKARI, Keiki TAKADAMA
Session ID: 2K6-OS-1b-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper proposes a novel Sleep Apnea Syndrome (SAS) detection method based on the comparison of Random Forests (RFs). Concretely, the method compares two RFs, one for SAS subjects and one for non-SAS subjects, to discover physiological characteristics, and detects SAS according to the difference between the RFs. The method uses bio-vibration data acquired by a mattress sensor during sleep as the input of the RF and WAKE (shallow sleep) or non-WAKE as the output, so that the RF learns the characteristics of WAKE. A human-subject experiment with nine SAS and nine non-SAS subjects revealed the following: (1) the accuracy of SAS detection with the proposed method is 88.9%, and (2) a comparison of the RFs of SAS and non-SAS subjects shows that it is difficult to estimate WAKE for SAS subjects based solely on the magnitude of body movement, whereas it is easy to estimate WAKE on that basis for non-SAS subjects.
-
Satoshi KODERA, Kota NINOMIYA, Shinnosuke SAWANO, Susumu KATSUSHIKA, H ...
Session ID: 2K6-OS-1b-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Since medical AI is expected to be socially implemented and to have a great impact on patients, it is important to investigate patient awareness of medical AI. In this study, we conducted an internet survey of patients' awareness of medical AI. Taking gender and age into account, the questionnaire was administered to 1240 people who regularly visit medical institutions and 620 people who do not. 78.1% of respondents had expectations for medical AI, and 47.7% were worried about it. The percentage of people who accepted a diagnosis by medical AI was 56.7% when the diagnostic accuracy was high, 41.4% when the explanation was clear, and 84.8% when the diagnostic accuracy was high and the explanation was clear. When the opinions of the attending physician and medical AI differed and their diagnostic accuracy was the same, an overwhelming 81.3% accepted the doctor's diagnosis. When the medical AI diagnosis was wrong, 65.7% of the respondents thought that the doctor was responsible for the wrong drug treatment. This study showed that both accuracy and accountability are important in medical AI, and that patients trust their doctor more than medical AI.
-
Kota NINOMIYA, Hiroki SHINOHARA, Kodera SATOSHI, Katsushika SUSUMU, Sh ...
Session ID: 2K6-OS-1b-05
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In order to properly interpret medical images, a great deal of experience is required in addition to specialized knowledge. However, noise caused by the limitations of examination equipment and other factors makes interpretation difficult. Although various denoising methods using deep learning have been proposed, it is not always clear which denoising method is appropriate for medical image interpretation by a specialist. In this study, we investigated denoising methods suitable for medical image interpretation through evaluation experiments on four kinds of videos: echocardiography (gray scale and color), coronary angiography (gray scale), and in-vehicle videos in a city (gray scale). Videos processed with DnCNN, PPN2V, and Real ESRGAN, which are denoising methods based on deep learning, were ranked together with the original videos by five cardiologists. Real ESRGAN was consistently rated higher than the original videos, except for coronary angiography. The other methods showed equal or slightly inferior results compared to the original videos. This suggests that, for medical images, a combination of Real ESRGAN and a denoising method that preserves the edges and structure of objects will enable better interpretation support.
-
Koki YAMADA, Ayako YAMAGIWA, Yosuke TAKAO, Masayuki GOTO
Session ID: 2L1-GS-2-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Dedicated apps offering child-care QA systems, on which users can ask questions about child-rearing, are widely used by parents. The problems that parents have are expected to change with their children's life stage. If the problems of parents raising small children can be understood from the question data, the QA service can give users appropriate information at the right time for those problems, which contributes to user satisfaction. In this study, we propose an analytical model that extracts the topics of users' problems from question data and captures their topic transitions over the stages of child-rearing. Furthermore, by taking a probabilistic view of the topic transitions, we construct a method for estimating the topic that is most likely to shift from a previous topic at a given time to the next non-question period. Finally, we show the results of applying the proposed method to real data and conduct an evaluation experiment to show the usefulness of the estimation results.
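One simple way to picture a probabilistic view of topic transitions is a first-order Markov estimate built from counts, sketched below. This is an assumption made for illustration, not the analytical model proposed in the paper. The topic labels and the per-user sequences are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next topic | previous topic) from per-user topic sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def most_likely_next(probs, topic):
    """Return the topic with the highest estimated transition probability."""
    return max(probs[topic], key=probs[topic].get)


# Hypothetical topic sequences, one list per user, ordered by child age.
sequences = [
    ["feeding", "sleep", "weaning"],
    ["feeding", "sleep", "sleep"],
    ["feeding", "weaning"],
]
probs = transition_probabilities(sequences)
print(most_likely_next(probs, "feeding"))
```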
-
Kazunori YAWATA, Keisuke KIRYU, Kota KATAYANAGI, Ken MOHRI, Kazuho SEK ...
Session ID: 2L1-GS-2-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, there has been remarkable development in natural language processing technology using deep learning, such as BERT developed by Google and the GPT-x series developed by OpenAI. Research is now being conducted not only on simple tasks such as categorizing sentences, but also on generative tasks such as creating and summarizing sentences. In this experiment, we built a pre-trained GPT-2 model and fine-tuned it for the question-answering task in order to verify whether GPT-2 can be applied to question-answering chatbots. For fine-tuning, we used the FAQ data of a life insurance company. As a result, we obtained natural answers for about 80% of the test data and ideal answers for about 60%. We believe that this mechanism can be used to build a question-answering system with an approach different from that of rule-based systems.
-
Ryohei KANEDA, Daichi HAGA, Hiroaki SUGIYAMA, Masaki SHUZO, Eisaku MAE ...
Session ID: 2L1-GS-2-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Advances in neural language processing technology make it possible to generate more natural speech in non-task-oriented dialogues such as chatting. In order to achieve natural and diverse speech production, it is necessary to generate utterances by referring not only to the history of previous utterances but also to appropriate external knowledge. In addition to the structured information used in task-oriented dialogues (e.g., price and access in travel guide dialogues), unstructured information represented by user-generated content (e.g., review text from general users) is expected to be utilized as external knowledge. However, it is not always easy to extract appropriate external knowledge according to context when structured and unstructured information are mixed. In this study, we investigated a knowledge selection method for speech generation using BERT. We took the travel guide domain as a case study and examined the input information needed for appropriate knowledge selection.
-
Keita MORIWAKI, Shun OONO, Hiroaki SUGIYAMA, Masaki SHUZO, Eisaku MAED ...
Session ID: 2L1-GS-2-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
To curb factual inconsistency in neurally generated sentences, we attempted to post-fix the inconsistent sentences. The detection and modification models were trained on a pseudo dataset created by rewriting the original dataset used for the neural sentence generator. Our experimental results show that a sequential process of detecting and then modifying inconsistencies was effective, whereas modification alone tended to change some consistent sentences. For the inconsistent sentences that remain, additional training datasets that let the detection and modification models work complementarily should improve the post-fix performance.
-
Masashi OKADA, Tadahiro TANIGUCHI
Session ID: 2M1-OS-19a-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
The present paper proposes a novel reinforcement learning method with world models, DreamingV2, a collaborative extension of DreamerV2 and Dreaming. DreamerV2 is a cutting-edge model-based method for reinforcement learning from pixels that uses discrete world models to represent latent states with categorical variables. Dreaming is also a form of reinforcement learning from pixels that attempts to avoid the autoencoding process in general world model training by using a reconstruction-free contrastive learning objective. The proposed DreamingV2 is a novel approach that adopts both the discrete representation of DreamerV2 and the reconstruction-free objective of Dreaming. Compared to DreamerV2 and other recent model-based methods without reconstruction, DreamingV2 achieves the best scores on five challenging simulated 3D robot arm tasks. We believe that DreamingV2 will be a reliable solution for robot learning, since its discrete representation is suitable for describing discontinuous environments and its reconstruction-free objective handles complex vision observations well.
-
Eri KURODA, Ichiro KOBAYASHI
Session ID: 2M1-OS-19a-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
The use of machine learning to understand the real world has been one of the most important challenges in recent years. Variational Temporal Abstraction (VTA) is a model that extracts the latent structure of a changing environment from visual information. However, VTA extracts the structure of the changing points of image features, not the latent structure that represents the changing points based on the physical behavior of objects in the image. In this study, we improved VTA to extract the latent structure that represents the change point by expressing the physical relationship based on the behavior of the object in the image expressed as a graph structure. By doing so, we tried to realize world recognition based on the recognition of the physical behavior of objects, as humans do. We also verified the accuracy of judging collision, disappearance, stopping, etc. of objects represented as graphs using the proposed method.
-
Kaito KUSUMOTO, Shingo MURATA
Session ID: 2M1-OS-19a-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Representation learning of multi-modal data has a potential to understand a shared structure across modalities. The objective of this study is to develop a computational framework that can learn to extract latent representations from multi-modal data by using a deep generative model. A particular modality is considered to hold low-dimensional latent representations; however, these representations are not always fully shared with another modality. Therefore, we assume that each modality holds both shared and private latent representations. Under this assumption, we propose a deep generative model that can learn to extract these different latent representations from both non-time-series and time-series data in an end-to-end manner. To evaluate this framework, we conducted a simulation experiment in which an artificial multi-modal dataset consisting of images and strokes with shared and private information was utilized. Experimental results demonstrate that the proposed framework successfully learned to extract both the shared and private latent representations.
-
Minori TOYODA, Kanata SUZUKI, Yoshihiko HAYASHI, Tetsuya OGATA
Session ID: 2M1-OS-19a-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we achieved bidirectional translation between description and action using small paired data. The ability to mutually generate descriptions and actions is essential for robots to collaborate with humans in their daily lives. The robots need to associate real-world objects with linguistic expressions, and machine learning approaches require large-scale paired data. However, a paired dataset is costly to construct and difficult to collect. We propose a two-stage training method for the bidirectional translation that does not require complete paired data. In the proposed method, we pre-trained autoencoders for description and action with a large amount of non-paired data. Then, we fine-tuned the entire model to combine their intermediate representations using the small paired data. We experimentally evaluated our method using a paired dataset consisting of motion-captured actions and descriptions. The results showed that our method performed well even when the number of paired data to train was small.
-
Atsuya KITADA, Yusuke IWASAWA, Yutaka MATSUO
Session ID: 2M1-OS-19a-05
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Conventional deep learning methods make a priori assumptions about the model structure for input and teacher data. Consequently, many individually optimized models have been proposed for each task. In recent years, self-supervised learning, in which models are learned from input data alone, without the use of supervisory data, to obtain a generic representation, has been actively studied. On the other hand, many of these methods still learn by pre-defining the model structure. We propose a framework of recursive self-supervised learning. The proposed method iteratively and recursively predicts the mid-layer features generated by a network trained by self-supervised learning. In this way, the proposed method stacks feature extraction layers in a bottom-up manner, producing higher-order features that integrate the input. Experiments qualitatively and quantitatively validated the effectiveness of the proposed method by weight visualization of the feature extraction layers and linear classification accuracy of the mid-layer features. The effectiveness of the proposed method was demonstrated when the task of self-supervised learning was properly set up. This study suggests that the proposed method can construct an appropriate structure depending on the input.
-
Masaya KAGEYAMA, Masashi OKADA, Tadahiro TANIGUCHI
Session ID: 2M4-OS-19b-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
World models will be an effective paradigm for enabling control systems in partially observable environments, e.g., visual servoing. However, the previous literature represents controllers by neural policies, which makes them difficult to apply to industrial scenarios with strict requirements such as real-time control and explainable behavior. This paper aims to obtain world models in which we can utilize PID controllers, a well-established and time-tested scheme in industry. For this purpose, we introduce a new loss function and heuristics to previous world model methods, Recurrent State Space Model and Dreaming, and discuss how to realize PID-controllable world models through visualization analysis and control experiments in learned world models.
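For readers unfamiliar with the controller the abstract refers to, the following is a minimal textbook discrete-time PID sketch. It does not reproduce the paper's loss function, heuristics, or latent space. The gains, the time step, and the toy plant (a pure integrator, `x += u * dt`) are hypothetical choices made only for illustration.

```python
class PID:
    """Textbook discrete-time PID controller."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target, measured):
        error = target - measured
        self.integral += error * self.dt                    # I term state
        derivative = (error - self.prev_error) / self.dt    # D term estimate
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps=500, target=1.0):
    """Drive a toy integrator plant toward the target and return its final state."""
    dt = 0.1
    controller = PID(kp=2.0, ki=0.5, kd=0.05, dt=dt)  # hypothetical gains
    x = 0.0
    for _ in range(steps):
        u = controller.step(target, x)
        x += u * dt  # toy plant: the state integrates the control input
    return x


print(simulate())
```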
-
Keno HARADA, Masahiro SUZUKI, Yutaka MATSUO
Session ID: 2M4-OS-19b-02
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In reinforcement learning, an action is treated as a point in the action space, with little emphasis on the design of the action space. In contrast to existing reinforcement learning frameworks, and with reference to the human action process, we consider an action to be the amount of change in the latent space needed to reach the target state, and define this as a latent action. We propose a representation learning method using a Predictive Variational Autoencoder, in which taking the latent action that minimizes the distance to the goal state in the latent space corresponds to the optimal action in the actual input space. We verify by experiments that action selection by latent actions using the Predictive Variational Autoencoder achieves more stable control than a method that applies a Variational Autoencoder to the current observation and selects actions based on errors from the control goal in the input space. We also discuss possible issues in extending the action selection method using latent actions.
-
Eiji UCHIBE
Session ID: 2M4-OS-19b-03
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
Reinforcement learning algorithms are categorized into model-based methods, which explicitly estimate an environmental model and a reward function, and model-free methods, which directly learn a policy from real or generated experiences. So far, we have proposed the asynchronous parallel reinforcement learning algorithm for training multiple model-free and model-based reinforcement learners. The experimental results show a simple algorithm can contribute to complex algorithms' learning. However, a learner was selected stochastically according to the value function, and therefore, learning mechanisms have not been discussed. In addition, several components such as state prediction and value prediction errors were not taken into account. In this study, we compare several adaptive coordination mechanisms. For example, we evaluate the coordination based on the value functions, state prediction and value prediction errors, weighted coordination, and learning the weights. Then, we discuss learning efficiency, the ability to follow the changes in the environment, and the perspective of neuroscience.
-
Katsuyoshi MAEYAMA, Tadahiro TANIGUCHI
Session ID: 2M4-OS-19b-04
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we propose an imitation learning method based on minimizing the KL divergence between the state transition prediction results of the learned policy and the state estimation results of expert data. We use the Recurrent State Space Model (RSSM), a kind of world model, for state estimation. RSSM has been used in PlaNet and Dreamer, which are state-of-the-art deep reinforcement learning methods. We compared learning results in the MuJoCo simulation environment. From the experimental results, we found that the proposed method can obtain higher total rewards. Learning by this method enables imitation learning based on state transitions, rather than the direct imitation of actions.
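As an illustration of the quantity being minimized, the sketch below computes the closed-form KL divergence between two diagonal Gaussians. It assumes that both the predicted and the estimated state distributions are diagonal Gaussians, which the abstract does not specify, and it does not claim which direction of the divergence the authors use. The function name and the example values are hypothetical.

```python
import math


def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for diagonal Gaussians given per-dimension means and std devs."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q)
    )


# Hypothetical 2-D latent states: policy prediction (p) vs. expert estimate (q).
predicted_mu, predicted_sigma = [0.0, 0.5], [1.0, 1.0]
expert_mu, expert_sigma = [1.0, 0.5], [1.0, 1.0]
print(kl_diag_gaussians(predicted_mu, predicted_sigma, expert_mu, expert_sigma))
```

The divergence is zero when the two distributions coincide and grows as the predicted states drift away from the expert's estimated states.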
-
Kento KAWAHARAZUKA, Kei OKADA, Masayuki INABA
Session ID: 2M5-OS-19c-01
Published: 2022
Released on J-STAGE: July 11, 2022
CONFERENCE PROCEEDINGS
FREE ACCESS
When a robot performs a task, it is necessary to model the relationships among its body, target objects, tools, and environments, and to control the body so as to realize the target states. However, when these relationships are complex, it is difficult to model them using classical methods, and when they change with time, it is necessary to deal with the temporal changes in the model. In this study, we have developed the Deep Predictive Model with Parametric Bias (DPMPB) to cope with these modeling difficulties and temporal model changes. We summarize the theory and experiments on various robots, and discuss its effectiveness.