Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
  • Yuki SHIMIZU, Takuto YAMAUCHI, Kenji TEI
    Session ID: 1N1-GS-5-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    System optimization needs to be achieved according to evaluation indicators. A previous study on qualitative makespan comparison provided an algorithm that not only guarantees safety but also automatically generates the process that is optimal in terms of time efficiency from qualitative execution time data. However, this makespan comparison algorithm has a high computational cost and cannot be applied to large models. In this paper, we propose a partial problematization algorithm for the makespan comparison problem that reduces the computational cost. The proposed method finds the state space to be compared in the model and then performs the makespan comparison for each partial problem. We tested our method on multiple patterns of models and succeeded in reducing the comparison time by up to 91.7% while maintaining the equivalence of the generated controller for models consisting of multiple subproblems with convergence to a single end state and branches with contingencies.

  • Koshi SHIBATA, Kei KIMURA, Taiki TODO, Makoto YOKOO
    Session ID: 1N1-GS-5-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    The multi-agent pathfinding (MAPF) problem takes as input a set of agents with different starts and goals, and assigns each agent a path such that no conflicts occur among the agents. Minimizing the sum of the costs for each agent to reach its goal is known to be NP-hard, and a powerful exact algorithm for this objective is Meta-Agent Conflict Based Search (MA-CBS). The behavior of MA-CBS varies depending on a condition called the merge policy, and it is known to require a huge amount of execution time for some MAPF instances under the conventional static merge policy. In this paper, we propose a new merge policy that dynamically changes the properties of the algorithm, and show by computer experiments that the algorithm implementing the proposed merge policy runs efficiently on a wider variety of instances than the conventional one.
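
    As an illustration of the quantities involved, the sketch below counts vertex and swap conflicts between two paths, computes the sum of costs, and applies a fixed-threshold merge test of the kind usually described for the static policy. It is not the authors' method: the function names, the grid-cell path representation, the wait-at-goal assumption, and the `bound` parameter are all assumptions made for this example, and real MA-CBS accumulates conflict counts over the course of the search rather than from a single pair of paths.

```python
def position_at(path, t):
    """Return the cell occupied at time t (assumption: agents wait at their goal)."""
    return path[t] if t < len(path) else path[-1]


def count_conflicts(path_a, path_b):
    """Count vertex conflicts and edge (swap) conflicts between two paths."""
    conflicts = 0
    horizon = max(len(path_a), len(path_b))
    for t in range(horizon):
        a_now, b_now = position_at(path_a, t), position_at(path_b, t)
        if a_now == b_now:
            conflicts += 1  # vertex conflict: same cell at the same time
            continue
        if t + 1 < horizon:
            a_next = position_at(path_a, t + 1)
            b_next = position_at(path_b, t + 1)
            if a_now == b_next and b_now == a_next:
                conflicts += 1  # edge conflict: the agents swap cells
    return conflicts


def sum_of_costs(paths):
    """Sum over agents of the number of moves (path length minus one)."""
    return sum(len(p) - 1 for p in paths)


def should_merge(conflict_count, bound):
    """Static rule (hypothetical helper): merge once the count exceeds a fixed bound."""
    return conflict_count > bound


if __name__ == "__main__":
    a = [(0, 0), (0, 1), (0, 2)]
    b = [(0, 2), (0, 1), (0, 0)]
    print(count_conflicts(a, b), sum_of_costs([a, b]))
```

    A dynamic policy, as proposed in the abstract, would replace `should_merge` with a rule whose behavior changes during the search; the abstract does not specify that rule, so it is not sketched here.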

  • Kenta HANADA, Yuki AMEMIYA, Kenji SUGIMOTO
    Session ID: 1N1-GS-5-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    We aim to obtain feasible solutions of the Generalized Mutual Assignment Problem (GMAP) that are as good as possible in a distributed environment (multi-agent system). Existing studies have proposed distributed Lagrangian heuristic algorithms in which the agents try to find feasible solutions based on prices (Lagrange multipliers), and it is known that good-quality feasible solutions can be obtained by updating the prices based on the solution of a convex optimization problem (the Lagrangian dual problem). In this study, we introduce the Alternating Direction Method of Multipliers (ADMM) as an algorithm for updating prices. ADMM is an algorithm that solves convex optimization problems in a distributed environment using a proximal operator (mapping). However, the proximal mapping cannot be calculated easily because the objective function of the Lagrangian dual problem is a black box. Therefore, the proximal mapping is calculated by linear approximation of the objective function. The effectiveness of the proposed algorithm is evaluated by numerical experiments.
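
    To make the linear-approximation step concrete, the following is a generic worked derivation, not taken from the paper. It assumes a convex function f to be minimized, a step parameter \lambda > 0, an input point v, a current iterate x^k, and a (sub)gradient g^k of f at x^k; all of these symbols are introduced here for illustration only.

```latex
% Standard definition of the proximal mapping of f with parameter \lambda > 0
\operatorname{prox}_{\lambda f}(v)
  = \arg\min_{x} \Bigl( f(x) + \tfrac{1}{2\lambda}\,\lVert x - v \rVert_2^2 \Bigr)

% First-order (linear) approximation of f around the current iterate x^k
f(x) \approx f(x^k) + (g^k)^{\top} (x - x^k)

% Substituting the approximation and dropping terms constant in x,
% then setting the gradient g^k + (x - v)/\lambda to zero:
\operatorname{prox}_{\lambda f}(v)
  \approx \arg\min_{x} \Bigl( (g^k)^{\top} x + \tfrac{1}{2\lambda}\,\lVert x - v \rVert_2^2 \Bigr)
  = v - \lambda\, g^k
```

    Under these assumptions the approximate proximal step reduces to a single (sub)gradient step, which needs only one evaluation of g^k and no closed form for f.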

  • Toshihiro MATSUI
    Session ID: 1N1-GS-5-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Equality among agents in multiagent reinforcement learning is an important issue in practical domains. An existing study proposed a centralized solution method, based on multi-objective reinforcement learning, that reduces the action cost of each agent and improves the equality among them. However, it applies a modified Q-learning method on the joint state-action space of all agents. Its state-action space is therefore relatively large, and there are opportunities for decentralized methods that decompose the state-action space into multiple parts and integrate them via multiagent cooperation techniques. Toward applying the existing method to decentralized approaches, we investigate a solution method that employs decomposed learning tables for the partial joint state-action spaces of pairs of agents, and experimentally evaluate the influence of the proposed approach.

  • Daiki SHIMOKAWA, Naoto YOSHIDA, Shuzo KOYAMA, Satoshi KURIHARA
    Session ID: 1N4-OS-10a-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    A great deal of current research addresses so-called Narrow AI, but the study of Artificial General Intelligence (AGI) is still in its early phase. Realizing AGI requires a planning method that integrates System 1 and System 2. Activity Propagation Multi-Agent Planning has been proposed for this purpose as a method that combines immediate and deliberative planning. However, no parameter tuning method for running this network has been established, and a huge number of parameters have had to be determined manually. In this paper, we therefore attempt to automate the parameter adjustment using evolutionary computation, taking into account the similarity of agents. The experiments were conducted in a simulation environment called Tile World. The results show that NeuroEvolution of Augmenting Topologies (NEAT), an evolutionary computation method, achieved a high degree of fitness in the experimental environment and automatically adjusted the parameters while taking the similarity of agents into account.

  • Yuya OSAKI, Yoshihiko KATO, Yutaro YAMANAKA, Taro TOKUI, Satoshi KURIH ...
    Session ID: 1N4-OS-10a-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Object detection in images plays an important role in helping visually impaired people, who have limited visual information, to walk. In recent years, neural network-based inference has become the mainstream approach to object detection. However, because of the structure of object detectors, the input image is reduced to a fixed size, which makes the detection of small objects difficult. One way to address this problem is to divide the input image into segments and perform inference on each segment, but this increases the processing time significantly. In this study, we propose a method that creates a knowledge database similar to the knowledge we have and uses it to perform inference focused on areas where the target object is likely to be. We conducted experiments under the assumption that we know there is a traffic light behind a pedestrian crossing, and showed that we can detect traffic lights that are not detected by inference on the input image alone. We also showed that our system reduces the processing time compared with a method that performs inference on a segmented input image.

  • Yuya KOBAYASHI, Masahiro SUZUKI, Yutaka MATSUO
    Session ID: 1N4-OS-10a-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In this research, we are interested in text-to-image generation from the point of view of pondering. Pondering is a mental activity that updates inference results using observations and past experiences. Recently, text-to-image models using CLIP have been gaining much attention for their high generation quality. These models update their latent variables in order to adapt their output to the input prompt, and this process can be regarded as a kind of pondering. However, these models update their entire output (latent variable) and cannot attend to specific areas. Although we regard this as a kind of pondering, we also think the process is not structured enough and is still closer to intuition. In this research, we propose structured models that can attend to specific areas and update them independently. We hope this improves the generation quality and gives us a clue for thinking about pondering.

  • Ayame SHIMIZU, Kei WAKABAYASHI, Masaki MATSUBARA, Ito HIROYOSHI, Moris ...
    Session ID: 1N4-OS-10a-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Many processes within machine learning models are a black box, and in most cases their inference cannot be explained in a way humans can understand. This becomes a serious problem when implementing machine learning in domains requiring accountability. Wan et al. have proposed a method called NBDT, which generates, from a deep learning model for multi-class classification, a classification model with a tree structure in which each node is a binary classifier, but it is unclear what kind of decision each node represents. In our work, we propose a method to reconstruct a machine learning model whose features used for judgment can be explained in natural language, by incorporating a tree-structured model constructed by NBDT into a human-in-the-loop process. The proposed method extracts only the nodes whose judgments are clear to humans and reconstructs a transparent machine learning model based on human annotation. Through crowdsourcing experiments, we show that it is possible to build machine learning models based on human-interpretable judgments expressed in natural language.

  • Satoshi KURIHARA, Takumi SUGIURA
    Session ID: 1N4-OS-10a-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Efforts toward the realization of intelligent architectures since the start of AI research can be summarized by the question of how to achieve both deliberate (System 2) and immediate (System 1) responses. Historically, deliberative architectures were the first to be studied; adaptive architectures, which can operate in the complex and dynamically changing real world, attracted attention later. The current situation is the reverse: immediate adaptive architectures based on machine learning are taking the lead, and attention to deliberative architectures is increasing. In this paper, we discuss key technologies for integrating System 1 and System 2, such as the free energy principle, reservoir computing, and the activation-spreading multi-agent planning proposed by the authors.

  • Hiroshi YAMAKAWA, Ayako FUKAWA, Yutaka MATSUO
    Session ID: 1N5-OS-10b-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In current consciousness research, the major theories that originate in neuroscience deal with the integration of information, and many of the theoretical arguments originate from existence in the world. However, none of these approaches provide sufficient insights into computational mechanisms. In contrast, the entification process, in which the search for existence is performed on a data structure (aligned structure) that allows inductive reasoning, can provide a basic mechanism for information integration to recognize existence. In this study, we set up a working hypothesis that consciousness is an entification process. Next, we will consider the multifaceted position of the working hypothesis, and discuss approaches to studying the brain as a reference for the mechanism that performs the entification process.

  • Yuya KONDO, Naoto YOSHIDA, Shuzo KOYAMA, Daiki SHIMOKAWA, Yoshihiko KA ...
    Session ID: 1N5-OS-10b-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, artificial intelligence based on neural networks has become widespread, but it remains specialized artificial intelligence. On the other hand, Agent Network Architecture (ANA) exists as a planning method that combines both readability of the thought process and immediacy and deliberation of planning required for general-purpose artificial intelligence. However, the network connectivity of ANA has been designed manually in the past. In this paper, we propose a method to automatically generate the coupling relations of ANA networks based on the difference of environmental information before and after an action. In the process of generating network connections, we provide incentives to perform unlearned actions as “curiosity”. Through experiments in a virtual environment, we show that the proposed method can generate network coupling more efficiently than the methods that perform actions randomly or give priority to unlearned executable actions. Furthermore, by applying the proposed method to a real robot, we show that the proposed method can be useful in a real environment by incorporating a mechanism that avoids the execution of error-prone actions.

  • Kazuma NAGASHIMA, Ryo YONEDA, Jumpei NISHIKAWA, Junya MORITA, Tetsuya ...
    Session ID: 1N5-OS-10b-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Mind wandering is caused by saturation (boredom) with the task. In other words, the cognitive resources required to perform the task decrease with mastery of the task. The surplus resources, which increase with mastery, are naturally directed to the thinking patterns usually used in default mode activity. Initially, mind wandering consumes only surplus resources. However, as it continues, the feedback loop accompanying mind wandering is strengthened and begins to affect task performance. In this study, we represent this process by combining functions of ACT-R (a cognitive architecture) using declarative memory activation. In addition, we represent task mastery as motor learning using feedback from online task performance. Furthermore, stochastic fluctuations are introduced into the memory activation to change the transition probability between the loops of thinking. These fluctuations are assumed to correspond to parasympathetic activity, which increases with time and decreases with novel stimuli. Simulations with the implemented mechanism showed a learning curve that is consistent with the optimal level theory of arousal (an inverted U-shaped curve).

  • Shoichi HASEGAWA, Yoshinobu HAGIWARA, Akira TANIGUCHI, Lotfi El HAFI, ...
    Session ID: 1N5-OS-10b-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    We propose a method that integrates probabilistic logic and spatial concepts to enable a robot to acquire knowledge of the relationships between objects and places in a new environment with only a few learning trials. By combining logical inference with prior knowledge and cross-modal inference within the spatial concept, the robot can infer the place of an object even when the probability of its existence is a priori unknown. We conducted experiments in which a robot searched for objects in a simulation environment using four methods: 1) spatial concept only, 2) prior knowledge only, 3) spatial concept and prior knowledge, and 4) probabilistic logic and spatial concept (proposed). We confirmed the effectiveness of the proposed method by comparing the number of place visits the robot needed to find all the objects. We observed that the robot could find the objects faster using the proposed method.

  • Reo KOBAYASHI, Naoto YOSHIDA, Sawako TAJIMA, Yuuki SAMEI, Yoshihiko KA ...
    Session ID: 1N5-OS-10b-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, there has been a great deal of research on autonomous agents that can achieve their own goals by interacting with the environment through sensors and actuators. This includes research on architectures for autonomous agents in which the agent’s overall action selection is determined by multi-agent negotiation and the agent operates in a real environment. In this paper, we propose an architecture that can cope with dynamic environments, interact with users and other systems, apply optimization algorithms, and achieve multiple goals. It is based on a multi-agent method called ANA, with two extensions: a change in the way sensors, actuators, and planners are arranged, and a change in the way modules in the planner are connected. To confirm its effectiveness, we performed simulations on a game engine. The results were compared with those of the existing ANA, and it was confirmed that the proposed method enables user interaction and reactive action selection.

  • Taro HATAKEYAMA, Ryusuke SAITO, Komei HIRUTA, Atsushi HASHIMOTO, Satos ...
    Session ID: 1O1-GS-7-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Manga is one of the representative cultures of Japan. In general, manga artists depict characters with black lines on a white background and describe their appearance and movements abstractly, while exaggerating their characteristics geometrically. Aiming to reproduce such information processing capabilities of humans computationally, we propose Conditional GAN Inversion, which is the application of GAN Inversion to Conditional GAN, to realize the translation from photos to manga faces. Conditional GAN learns multiple domains in a shared network. It enables geometrically large deformations and the preservation of the identity of original images. Experimental results show that our method generates high-quality manga faces preserving the drawing style and the identities compared to other related state-of-the-art methods.

  • Tomohiro SHIZUNO, Taiyo NAKAGAWA, Daiki ISHIGURO, Daii HOSOKAWA, Yuki ...
    Session ID: 1O1-GS-7-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Since jewelry images are characterized by light reflections, they require more elaborate corrections than landscape images. In this study, we develop a system that uses CycleGAN to learn the translation between a captured image and the same image processed by an expert using Photoshop. We applied various low-light image correction methods to jewelry images and compared them using the evaluation metrics of PSNR, SSIM, and color similarity in the LAB color space. The results show that CycleGAN, with the help of an image registration technique, outperforms the other methods.
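
    For reference, PSNR follows directly from the mean squared error between two images. The sketch below uses the standard definition, 10 * log10(MAX^2 / MSE), on flat lists of pixel values; the function name, the flat-list representation, and the default peak of 255 (8-bit images) are assumptions for this example and say nothing about the authors' evaluation code.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not reference or len(reference) != len(test):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    expert = [100, 100, 200, 200]      # hypothetical expert-corrected pixels
    corrected = [110, 90, 205, 195]    # hypothetical model output
    print(f"{psnr(expert, corrected):.2f} dB")
```

    Higher values indicate that the corrected image is closer to the expert-processed target; SSIM and LAB color similarity capture structural and perceptual color differences that PSNR does not.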

  • Komei HIRUTA, Ryusuke SAITO, Taro HATAKEYAMA, Atsushi HASHIMOTO, Satos ...
    Session ID: 1O1-GS-7-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    One of the requirements for creating engaging content such as comics and games is to design fascinating characters. Fascinating characters may arise from intuitive inspiration that cannot be expressed in words. On the other hand, excellent inspiration, which is the source of creative ideas, is not something that people can simply come up with at will. Therefore, the purpose of this research is to help creators expand their imagination by controlling the generated images based on non-verbal impressions. First, we propose Conditional FastGAN, which can generate high-quality data even from small datasets such as artworks. In addition, in order to extract as features the "impressions" that people receive from images, we have developed an annotation system that uses "color" as a non-verbal impression medium. Our experiments used cartoon face images extracted from Osamu Tezuka's works, the MUCT dataset consisting of face photographs in various orientations, and a color-based impression information dataset obtained by our system. The results showed the possibility of generating images corresponding to the conditions of live images, cartoons, and impressions.

  • Masaya HOJO, Kota YOSHIDA, Takeshi FUJINO
    Session ID: 1O1-GS-7-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Object detection technology using deep neural networks (DNNs) is used in advanced driver assistance systems (ADAS). A visible-light (RGB) camera is often used in automobiles; however, its performance degrades when the ambient light source is insufficient, because the light reflected from objects is weak. A far-infrared (FIR) camera, which detects FIR light radiated from objects, is not affected by ambient light sources. On the other hand, FIR images must be collected in various seasons and weather conditions, because the appearance of objects in FIR images depends on the ambient temperature. Furthermore, vehicles are not commonly equipped with FIR cameras, which makes it difficult to acquire large numbers of FIR images. In this paper, we generate FIR images from RGB images by using multiple CycleGAN models trained at different ambient temperatures. The object detection model trained on the generated images is highly robust against changes in ambient temperature.

  • Masafumi OKAMOTO, Hiroshi SHIMBO, Sano SHUNSUKE, Toshiaki MIZOBUCHI, T ...
    Session ID: 1O1-GS-7-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In recent years, there has been a demand for comprehensive deterioration evaluation of aging concrete structures. One of the methods of comprehensive deterioration evaluation is impact-echo monitoring. In this paper, as a part of the quantification of impact-echo monitoring by machine learning model, we investigate the improvement of discrimination accuracy of the model by data augmentation of impact-echo sound. Based on the existing impact-echo sound data collected from real concrete structures, fake sound data is generated by a generative model using the GAN method to extend the training data for the discriminative model. By comparing the discriminative model trained on the training data consisting only of the existing impact-echo sound data and the discriminative model trained on the training data after the data augmentation, the improvement of the discrimination accuracy for the test data of impact-echo sound is demonstrated.

  • Zhen ZHAO, Yoshiki HASIOKA
    Session ID: 1O4-GS-7-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Extracting information from images of cards such as driver’s licenses or credit cards is a computer vision task with widespread needs. In many cases, images taken with a smartphone are taken from an arbitrary position and angle specified by people. To recognize the card’s text with OCR, it is necessary first to localize the card within the image, transform it to a rectangle, and then rotate it to the correct orientation. Deep learning-based methods are able to perform these localization and rotation tasks with high accuracy. However, handling the two tasks with two separate models results in increased processing times. In this work, we propose a solution to this problem which uses a single object detection model to perform both the localization and rotation tasks, thereby allowing cards to be processed quickly without sacrificing accuracy.

  • Yuto ICHIKAWA, Kennichiro SHIMADA, Ryosuke TANNO, Tomonori IZUMITANI
    Session ID: 1O4-GS-7-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Data augmentation by randomly pasting cropped rectangular images that contain objects is commonly used to generate training data for object detection models. With this simple method, the boundary between the original image and the pasted images tends to be unnatural, which may affect detection performance. In addition, it is costly to crop images into the object shapes with the masks obtained by unsupervised learning methods. We propose a method that generates images with natural boundaries using the copy-paste GAN. The method can produce augmented images without mask creation costs. To show the effectiveness of the method, we compared it with conventional methods in terms of detection accuracy and confidence score using two image datasets, the Airbus Ship Detection Challenge dataset and the Happy-whale dataset. The results demonstrate the effectiveness of our data augmentation framework.

  • Masahiro OKANO, Riku OGATA, Junichi OKUBO, Junichiro FUJII, Taka ...
    Session ID: 1O4-GS-7-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Assessing the value of public spaces is an issue in urban planning. The time people stay in the target space is used as one of the indexes to quantify the value of a public space. Until now, surveys have been conducted by people visually checking videos, which is labor-intensive, so greater work efficiency is needed. Therefore, in this study, we examined a method to automatically evaluate the staying time by using RTFM (Robust Temporal Feature Magnitude learning) proposed by Tian et al. RTFM is a model for detecting abnormal actions in surveillance camera footage, and we applied it to detect specific human behavior in public spaces. In this paper, we trained RTFM on images taken with a handheld camera in a public space, and compared the accuracy when the loss function and activation function were changed.

  • Hidenori YAMATO, Makoto OKADA, Naoki MORI
    Session ID: 1O4-GS-7-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Since comics are multimodal creations consisting of pictures and dialogues, they are attracting attention in research on the understanding of human creations by artificial intelligence. Identifying the relations between pictures and dialogues is an important task for artificial intelligence to understand the contents of comics. In this study, we focus on the anaphoric relations of demonstratives in reference resolution for the purpose of understanding comics with artificial intelligence. We constructed a dataset of the anaphoric relations between demonstratives and their antecedents in comic dialogues, and applied machine learning methods to the constructed dataset to verify the feasibility of anaphora resolution for demonstratives in comics and the effectiveness of the dataset.

  • Mizuki TADA, Makoto OKADA, Naoki MORI
    Session ID: 1O4-GS-7-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Understanding various rhetorical expressions is an important task in computerized text comprehension. We attempted to construct a dataset of sarcastic expressions in Japanese comics for the purpose of understanding sarcasm in comics. One problem is the lack of data on sarcastic expressions in Japanese. To address this problem, we used an English sarcasm corpus to construct several sarcasm estimation models with deep learning techniques, and applied these models to the comic dialogue data in order to confirm the effectiveness of the English dataset for Japanese data and the feasibility of the proposed dataset and method.

  • Takayuki YAMAUCHI, Yuichi TOKUYAMA, Tomoya KAWAI, Hidetaka EGUCHI, Gak ...
    Session ID: 1O5-GS-7-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Classification with imbalanced class distributions is a major problem in machine learning, well known as the imbalanced data problem. One known effective method for handling this problem is to equalize the number of data in each class by upsampling or downsampling. Another is to apply weights in the learning process according to the number of data in each class. However, these methods may reduce the generalization performance of deep learning models or discard information during learning, resulting in lower classification accuracy. In this study, we investigate a learning process for deep learning models that uses typical images (master images) capturing the characteristics of the images in each class. In this paper, we show the effect of the proposed method on the imbalanced data problem by applying it to imbalanced data of semiconductor wafer defect maps.
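
    The two conventional remedies mentioned above can be sketched in a few lines. The example below shows one common inverse-frequency weighting scheme (weight = N / (K * n_c), for N samples, K classes, and n_c samples in class c) and simple random oversampling; the function names, the weighting formula, and the toy labels are assumptions chosen for illustration, and the master-image method proposed in the abstract is not shown.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights N / (K * n_c), so rare classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((sample, label) for sample in items + extra)
    return balanced


if __name__ == "__main__":
    labels = ["normal"] * 90 + ["defect"] * 10   # hypothetical 9:1 imbalance
    samples = list(range(len(labels)))
    print(inverse_frequency_weights(labels))
    print(Counter(label for _, label in random_oversample(samples, labels)))
```

    Oversampling repeats the same minority samples, which is one way the loss of generalization noted in the abstract can arise; downsampling instead discards majority samples and with them information.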

  • Masaya OIKAWA, Takuto YAMAUCHI, Kenji TEI
    Session ID: 1O5-GS-7-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Instance segmentation is a pixel-by-pixel segmentation technique for individual object instances. There have been many studies on building detection models from small datasets to solve the practical problem of insufficient training datasets. One of the studies is zero-shot learning, which is a task to correctly detect even unseen objects by transferring visual knowledge obtained from seen objects through intermediate representations such as word vectors. In this paper, we propose ZSIwithCLIP, which introduces a domain-independent image classification model (CLIP) to improve the class recognition accuracy of the zero-shot instance segmentation model (ZSI). We also conducted learning and inference experiments to examine the effectiveness of ZSIwithCLIP. As a quantitative evaluation, the overall accuracy of instance segmentation is slightly reduced, while as a qualitative evaluation, the accuracy of classifying semantically similar classes is improved for simple images with a small number of objects and little overlap between objects.

  • Ayaka TAKAMOTO, Kyoji UMEMURA
    Session ID: 1O5-GS-7-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Classifying music by genre or composer is a classical task in music classification. In this paper, we propose a novel method for classifying music by composer. In general, the whole of a music piece is used when estimating its composer. This research instead estimates the composer from segmented data rather than from the whole piece. In our proposed method, we divide the data into short segments and use majority voting. For this experiment, we collected 15 piano pieces from each of 5 composers and used a total of 75 pieces of MIDI data. The experiment shows that our proposed method produces a higher number of correct answers than previous studies that use the information of entire pieces. We also show that our proposed method can be used for a detailed analysis of musical pieces.
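
    The segment-then-vote idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the note-list representation, the fixed segment length, the function names, and the toy per-segment classifier are all hypothetical, and the abstract does not say how segments are classified or how ties are resolved.

```python
from collections import Counter


def split_into_segments(notes, segment_length):
    """Cut a note sequence into consecutive chunks (the last one may be shorter)."""
    return [notes[i:i + segment_length] for i in range(0, len(notes), segment_length)]


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    if not predictions:
        raise ValueError("no predictions to vote on")
    return Counter(predictions).most_common(1)[0][0]


def classify_piece(notes, segment_length, classify_segment):
    """Classify each segment independently, then vote over the segment labels."""
    segments = split_into_segments(notes, segment_length)
    return majority_vote([classify_segment(segment) for segment in segments])


if __name__ == "__main__":
    # Toy stand-in for a trained per-segment model (assumption for illustration)
    def toy_classifier(segment):
        return "Composer A" if sum(segment) / len(segment) < 60 else "Composer B"

    pitches = [50, 52, 55, 57, 70, 72, 74, 76, 48, 50, 52, 53]
    print(classify_piece(pitches, 4, toy_classifier))
```

    Because each segment receives its own label, the per-segment predictions can also be inspected directly, which is in line with the abstract's remark that the method supports detailed analysis of a piece.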

  • Takuya ARIMOTO, Kazuaki KONDO, Kei SHIMONISHI, Yuichi NAKAMURA
    Session ID: 1O5-GS-7-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    While a groupwork activity is a good tool for learning how to cooperate with others, it requires participants to engage in productive reflection. To develop an information medium that supports reflection on a groupwork activity, we propose a new method for characterizing groupwork scenes captured in a video. It automatically forms a low-level feature space that approximates the given scenes well, based on a topic model called Latent Dirichlet Allocation (LDA). We define elementary "interaction words" to describe participants' cooperative behaviors and "scenes" as collections of them. Because their relation corresponds to that between "words" and "documents" assumed in the topic model, LDA can be used to characterize groupwork activity records. In the experimental evaluation, we applied the proposed method to a cooperative groupwork task in which participants build a tower as high as possible within a time limit. We confirmed that each groupwork scene is reasonably characterized from the viewpoint of cooperation.

    Download PDF (896K)
  • Yoshihiro YAMAZAKI, Shota ORIHASHI, Ryo MASUMURA, Mihiro UCHIDA, Akihi ...
    Session ID: 1O5-GS-7-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Many attempts have been made to build multimodal dialog systems that can answer a question about given audio-visual information, and a representative task for such systems is Audio Visual Scene-Aware Dialog (AVSD). For understanding visual information, most conventional AVSD models adopt a Convolutional Neural Network (CNN)-based video feature extractor. Although CNNs tend to extract spatially and temporally local information, global information is also crucial for video understanding, since AVSD requires long-term temporal visual dependencies and information from the whole video. In this study, we apply a video understanding module with space-time attention that can capture spatially and temporally global representations more efficiently than the CNN-based module. We observed that our AVSD model achieved high objective and subjective performance scores for answer generation.

    Download PDF (866K)
  • Yoshiki OHIRA, Takahisa UCHIDA, Takashi MINATO, Hiroshi ISHIGURO
    Session ID: 1P1-GS-10-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    The purpose of this research is to develop a dialogue system that models user opinions in daily dialogue. Modeling the user's opinion is important for increasing the user's satisfaction with the dialogue. In this research, we use a model that abstracts the opinions of multiple people (a social opinion model) to model the opinions of users, and we model the opinions of individuals in terms of their correspondence to and differences from it. We also realize a dialogue that updates the social opinion model from the individual opinion model. To construct a social opinion model, an individual opinion model is abstracted from multiple viewpoints (what, where, who, how). This makes it possible to generate various social models. First, we analyzed opinion data collected in advance and extracted the social model. As a result, it was confirmed that a social model that is referred to in dialogue from a specific viewpoint can be constructed. Next, based on this, we examined a dialogue strategy that models both the social model and the individual model. Specifically, we explored dialogue strategies for the case where there are multiple social opinion models, and rules for updating social models from individual models acquired through dialogue.

    Download PDF (487K)
  • Makoto KOIKE
    Session ID: 1P1-GS-10-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    US Patent No. 3951134 relates to a machine for changing the brain waves of a subject. The machine is a kind of anti-personnel radar, comprising a transmitter for irradiating microwaves onto the head of the subject; a receiver for receiving the reflection from the head of the subject; and a computer for analyzing the reflection to obtain brain waves. A feedback signal is transmitted to the head of the subject so as to change the brain waves, thereby allowing the machine to intervene in the free will of the subject. US Patent No. 7150715 improves the neurofeedback machine such that distributed artificial intelligence is used as the computer. Since these machines apply anti-personnel radar, the brain waves can be obtained without any consent from the subject. But as an ethical matter, the consent of the subject should be mandatory. From the perspective of philosophy, Leibniz submits that the preestablished harmony is realized through monads. Presumably, the preestablished harmony would be achieved when the machine intervenes in the free will of humanity. Similarly, the categorical imperative, which is the core of the moral philosophy of Kant, would be realized by the machine for intervening in free will.

    Download PDF (381K)
  • Hikaru ISHIZUKA, Shun SHIRAMATSU, Keiko ONO
    Session ID: 1P1-GS-10-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    The term "Sublation" or "Aufheben" refers to the process of arriving at a developed answer to two opposing arguments without denying either of them. Sublation is important for building a consensus that can be accepted by everyone in public debates. In this study, we quantified the degree of sublation, conducted a discussion experiment, and analyzed the results in order to investigate the factors necessary for the cessation of conflicting opinions in discussions. As a result, a weak positive correlation was found between the number of URLs posted as evidence for one's opinion during a discussion and the cessation of opposition to a consensus proposal. However, in actual discussions and debates, there are times when everyone has similar arguments and there are few or no opposing arguments. In such a situation, it can be said that there is a bias of opinions in the discussion, and it is difficult to stop the discussion. To address this problem, we prototyped a method to eliminate bias in opinions, in which a bot posts information that reinforces the opinion of a minority in a discussion.

    Download PDF (679K)
  • Chenhao CHEN, Kosuke TOKUHARA, Yutaka ARAKAWA, Ko WATANABE, Shoya ISHI ...
    Session ID: 1P1-GS-10-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In this paper, we present an end-to-end system for quantifying online meetings, which can accurately detect and quantify three micro-behavior indicators (speaking, nodding, and smiling) for online meeting evaluation. For active speaker detection (ASD), we build a multi-modal neural network framework that consists of audio and video temporal encoders, an audio-visual cross-attention mechanism for inter-modality interaction, and a self-attention mechanism to capture long-term speaking evidence. For nodding detection, we estimate the head pitch angles as the nodding feature, based on the WHENet framework proposed in the research field of head pose estimation (HPE). We then build a gated recurrent unit (GRU) network with a squeeze-and-excitation (SE) module to recognize nodding movement from videos. Finally, we utilize a Haar cascade classifier for smile detection. The experimental results using K-fold cross validation show that the F1-scores of the three detection modules reach 94.9%, 79.67% and 71.19%, respectively.
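
    The evaluation protocol named above, F1-score under K-fold cross validation, can be sketched as follows. This is a minimal illustration under stated assumptions: the interleaved fold assignment and the averaging of per-fold scores are choices made for this sketch, and the abstract does not say how folds were formed or how the scores were aggregated.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives.

    Algebraically equal to 2 * precision * recall / (precision + recall).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; sample i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def mean_f1(fold_counts):
    """Average per-fold F1; fold_counts is a list of (tp, fp, fn) tuples."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in fold_counts]
    return sum(scores) / len(scores)
```

    In use, a detector would be trained on each `train` split, its confusion counts collected on the matching `test` split, and the counts passed to `mean_f1`.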

    Download PDF (253K)
  • Shougo NAKAMURA, Hajime MURAI
    Session ID: 1P1-GS-10-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In the automatic generation of stories, methods based on quantitative structure analysis have been tried. However, conventional methods target only short story patterns. Except for the genre of role-playing games (RPGs), analysis and application of complex plot structures that combine multiple short stories have hardly been carried out. Moreover, the diversity and time-series change of complex plot structures have not been clarified. In this study, we analyzed the narrative structure and created data on the continuous, nested, and parallel relationships among multiple short stories. By comparing series of works, we extracted the features of each series. In addition, the features of the time-series change of the complex plot structure were clarified. The results of this research can be utilized as a basis for automatic plot generation of long and complex stories.

    Download PDF (631K)
  • Ryuji KAWANO, Daiki TAMASHIRO, Satoshi ONO
    Session ID: 1P4-GS-6-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Recent studies have shown that Deep Neural Networks (DNNs) can be made to misclassify by adversarial examples (AEs), which are inputs containing carefully designed perturbations. This paper proposes an adversarial attack method against DNNs for Japanese language processing. The proposed method adds perturbations to a Japanese sentence by character type conversion, i.e., converting word notations in the sentence between kanji, hiragana, and katakana. Experimental results showed that the proposed character type conversion attack successfully made DNNs misclassify Japanese sentences.
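
    As a small illustration of character type conversion, the sketch below converts between hiragana and katakana using the fixed offset of 0x60 between the two Unicode blocks. It is not the attack itself: the function names are assumptions, kanji conversion is omitted because it needs a reading dictionary, and choosing which words to convert so that a classifier's output changes is outside this sketch.

```python
KANA_OFFSET = 0x60  # katakana code point minus the matching hiragana code point


def hiragana_to_katakana(text):
    """Convert hiragana (U+3041..U+3096) to katakana; leave other characters."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def katakana_to_hiragana(text):
    """Convert katakana (U+30A1..U+30F6) to hiragana; leave other characters."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


perturbed = hiragana_to_katakana("猫がすき")  # kanji is left unchanged
```

    The converted sentence reads the same to a human reader, while a model that tokenizes on surface characters receives a different input sequence.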

    Download PDF (522K)
  • Yoichi ISHIBASHI, Sho YOKOI, Katsuhito SUDOH, Satoshi NAKAMURA
    Session ID: 1P4-GS-6-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Word embedding is a fundamental technology in natural language processing. Representation learning for words has been progressing; however, word set operations in embedding spaces have not been defined yet, although they are often performed on sets of words. In this study, we aim to compute representations of sets and set operations using only pre-trained word embeddings. We focused on quantum logic and demonstrated that quantum logic can function as a linguistic set operation in the embedding space.

    Download PDF (432K)
  • Mikiteru KOBAYASHI, Yoshihide KATO, Shigeki Kobayashi MATSUBARA
    Session ID: 1P4-GS-6-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Syntactic conversion methods are useful for comparing the performance of parsers that are based on different syntactic formalisms. Most of these methods, however, do not treat empty categories, which current phrase structure parsers can identify. This paper proposes a method that uses empty category information for conversion from phrase structures into dependency structures. Our proposed method converts a phrase structure of the Penn Treebank (PTB) into a dependency structure of Enhanced Universal Dependencies (EUD). The method copies the phrase structures corresponding to empty categories and replaces each empty category with the corresponding phrase structure. This treatment of empty categories enables us to derive dependency relations that are represented by empty categories.

    Download PDF (349K)
  • Yuki NAKATANI, Tomoyuki KAJIWARA, Takashi NINOMIYA
    Session ID: 1P4-GS-6-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    In text generation tasks such as machine translation, models are generally trained using cross-entropy loss. However, mismatches between the loss function and the evaluation metric are often problematic. It is known that this problem can be addressed by direct optimization to the evaluation metric with reinforcement learning. In machine translation, previous studies have used BLEU to calculate rewards for reinforcement learning, but BLEU is not well correlated with human evaluation. In this study, we investigate the impact on machine translation quality through reinforcement learning based on evaluation metrics that are more highly correlated with human evaluation. Experimental results show that reinforcement learning based on BERT trained on the STS task can improve various evaluation metrics.

    Download PDF (288K)
  • Mana IHORI, Hiroshi SATO, Tomohiro TANAKA, Ryo MASUMURA
    Session ID: 1P4-GS-6-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    This paper defines the document revision task and proposes a novel modeling method. In this task, we aim to simultaneously consider multiple perspectives for writing support. To this end, it is important not only to correct grammatical errors but also to improve readability and perspicuity; however, it is difficult to prepare a sufficiently large matched dataset that handles multiple perspectives simultaneously. To mitigate this problem, our idea is to utilize not only a limited matched dataset but also various partially-matched datasets that handle individual perspectives. Since suitable partially-matched datasets have either been published or can easily be made, we expect to be able to prepare a large amount of such data. To effectively utilize these datasets, our proposed modeling method incorporates ``on-off'' switches into a sequence-to-sequence model to distinguish the matched dataset from the individual partially-matched datasets. Experiments using the document revision dataset demonstrate the effectiveness of the proposed method.

    Download PDF (1147K)
  • Soichiro MURAKAMI, Sho HOSHINO, Peinan ZHANG
    Session ID: 1P5-GS-6-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Natural language generation for advertising is emerging as an effective tool that helps advertisers meet the demand for an increasing number of online ads. In this survey paper, we give an overview of the research trends of the past decade, including template-based methods and neural generation models. We also discuss key challenges and directions for future work, such as faithful generation, utilization of multimodal information, and development of benchmark datasets.

    Download PDF (643K)
  • Yoshiyuki KOBAYASHI, Takafumi KOSHINAKA
    Session ID: 1P5-GS-6-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Many EC sites publish reviews, which are feedback from purchasers of each product, to help customers make purchase decisions and improve site sales. A review consists of review text, which is an impression of the product, and a numerical rating, such as a five-point scale. It is beneficial for both the site operator and the customers if this information is properly provided. In this study, we conduct a regression analysis to predict the rating from the review text. An experiment using 60,000 reviews from Rakuten Ichiba shows that the Transformer model can better predict the five levels of ratings given by customers compared to linear prediction models and RNN-based models. In addition, while the ratings given by the consumers and the review text generally agree, there are cases where they do not, and qualitative analysis confirms the possibility that the predictive model can detect and correct such discrepancies.

    Download PDF (465K)
  • Hayate WAKASAKI, Shunsuke KANDO, Yusuke MIYAO
    Session ID: 1P5-GS-6-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Detailed analysis of the content and writing style of music reviews has not yet been sufficiently conducted, despite the potential for industrial and academic development. In this research, we tackle the task of automatically classifying each sentence of a music review according to its content. We also developed a classification scheme for this purpose and constructed an annotated corpus based on the scheme. Each sentence of a review may consist of parts describing the background of the composition, the personal information of the player, the characteristics of the performance, or the author's impression of the performance. In order to create a classifier that can automatically perform such a classification, we first designed an annotation scheme that enables clear classification by humans. Based on this scheme, we manually annotated review text collected from a music review website and constructed a dataset for training and evaluating a machine learning classifier. The results were evaluated by the degree of agreement between annotators and the accuracy (F-measure) of the classifier.

    Download PDF (346K)
  • Kohei FUKAMACHI
    Session ID: 1P5-GS-6-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Onomatopoeias can express, sensitively and concisely, concepts that are otherwise difficult to verbalize. Interactive AI systems, such as smart speakers, are becoming increasingly important, but it is difficult for them to understand the subtleties of new onomatopoeias or onomatopoeias that are unique to a particular person. Our work attempted a more practical classification of onomatopoeic usage by focusing on the accent position in spoken onomatopoeia, with reference to a previous work. As a result, our method achieves higher classification accuracy than the previous work.

    Download PDF (318K)
  • Yuuki SHIGEMATSU, Motoki AMAGASAKI
    Session ID: 1P5-GS-6-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    The task of summary generation is important in a society with rapidly growing data. We can expect that summary generation tasks, which extract only the necessary information from a large amount of information, will attract more and more attention in the future. This study focuses on summary generation for the purpose of assisting people in acquiring information. We consider an extractive summary generation model using unsupervised learning. Specifically, we propose EST (Extractive Summarization from masked sentences prediction Transformer), which generates extractive summaries from inter-sentence information. In our evaluation, although the average of the evaluation results was lower than that of the related studies, it was found that some articles obtained higher scores than others.

    Download PDF (399K)
  • Akira MATSUKI
    Session ID: 1S1-IS-3-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Japan has experienced many disasters, but the ability to respond to disasters varies by region. Agent simulators for disaster response are therefore expected to provide virtual experience in response. However, many regions still do not know the correct strategies due to lack of experience and cannot make effective use of an agent-based strategy planning system. We therefore attempt to bridge the gap in response capability between regions by transferring strategies from regions with rich experience in disaster response to regions with insufficient experience. Specifically, we will construct a disaster response simulator with expert agent functions: the simulator imitates the rescue strategies of experienced agents, and these imitated expert strategies are used to support strategy planning when the simulator is used in other regions. However, uniform agent strategies are inconvenient, because the quality of the activities is evaluated differently in each region, depending on factors such as whether the region has experienced a disaster. To solve this problem, we add a mechanism that lets the decision makers who use the system modify the strategies, and introduce a human-in-the-loop (HITL) framework that continuously learns from people to obtain better strategies.

    Download PDF (278K)
  • Shiyao DING, Hideki AOYAMA, Donghui LIN
    Session ID: 1S1-IS-3-02
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Most prior work on Multiagent path finding (MAPF), the problem of identifying a set of collision-free paths for multiple agents, assumes grid graphs in which each agent can only move in one of four directions (up, down, right, left) or wait. We study a new MAPF problem that does not rely on such assumptions and is defined more generally on a non-grid graph. Some algorithms for solving traditional MAPF can also be applied to this new problem; they fall into two types: search-based methods and learning-based methods. However, the challenges created by the non-grid setting, such as a large state/action space, hinder the application of either type of method. Thus, we propose a third approach that combines a multi-agent reinforcement learning (MARL) algorithm with a search method to accelerate the learning process. Specifically, pathfinding for one part of the agents is solved according to predefined rules. Then, based on these pathfinding results, the remaining agents are trained by MARL. Finally, the experimental results show that our proposed method is more effective than some existing algorithms.
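
    What "collision-free" means on a general (non-grid) graph can be made concrete with a small checker. This sketch uses the two conflict types common in the MAPF literature, vertex conflicts and edge (swap) conflicts. The path representation, the function names, and the assumption that an agent waits at its goal after arriving are choices made for illustration and are not taken from the paper.

```python
def position(path, t):
    """Vertex occupied at time t; the agent waits at its last vertex."""
    return path[min(t, len(path) - 1)]


def is_valid_path(graph, path):
    """Each step must either wait or follow an edge of the adjacency dict."""
    return all(v == u or v in graph[u] for u, v in zip(path, path[1:]))


def first_conflict(paths):
    """Return (kind, agent_i, agent_j, time) for the earliest conflict, or None."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a_now, b_now = position(paths[i], t), position(paths[j], t)
                if a_now == b_now:
                    return ("vertex", i, j, t)  # same vertex at the same time
                a_next, b_next = position(paths[i], t + 1), position(paths[j], t + 1)
                if a_now == b_next and b_now == a_next:
                    return ("edge", i, j, t)  # agents swap along one edge
    return None


# A small non-grid graph given as an adjacency dict with arbitrary vertex names.
graph = {"A": ["B", "D"], "B": ["A", "C", "D"], "C": ["B"], "D": ["A", "B"]}
head_on = first_conflict([["A", "B", "C"], ["C", "B", "A"]])
```

    Because vertices are arbitrary labels and moves are checked against an adjacency dict, nothing in the checker depends on four-directional grid moves.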

    Download PDF (478K)
  • Jawad HAQBEEN, Sofia SAHAB, Takayuki ITO
    Session ID: 1S1-IS-3-03
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Although expert rating and crowd voting are still the dominant approaches for selecting winners in contests for creative works, relying only on human-led mechanisms in the winner selection process is untenable at the participation scale of large idea contests. Thus, a few digital participatory platforms have recently used a “scoring system” to award points to participants, that is, to let the submitted opinions be evaluated by AI. Drawing upon tournament theory, we investigate how a contest’s reliance on AI-enabled, point-based system ranking can help experts and crowds select winners, and we assess how each of these selection mechanisms (system ranking/scoring, expert rating, and crowd voting) affects the selection of contest winners. This enables us to collect large-scale participation, to explore participation ranking in light of AI, expert rating, and crowd voting, and to compare the selection mechanisms with respect to the final winners. The results show that the contest’s reliance on system ranking is positively associated with the expert rating and crowd voting mechanisms. Moreover, AI-enabled winning incentive mechanisms have proven essential for helping experts and crowds select appropriate authors of ideas, and are more appealing for winner selection in large-scale idea contests.

    Download PDF (1178K)
  • Ning DING, Kazuya TAKEDA, Keisuke FUJII
    Session ID: 1S1-IS-3-04
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    Evaluating the performance of players in dynamic competition plays a vital role in effective sports coaching. However, quantitative evaluation of players in racket sports remains difficult, because performance is derived from the integration of complex tactical and technical (i.e., whole-body movement) performances. In this paper, we propose a new evaluation method for racket sports based on deep reinforcement learning, which can analyze the player's motion in more detail than the results (i.e., scores). Our method uses historical data including players' tactical and technical performance information to learn the next score probability as a Q-function, which is used to value players’ actions. We verified our approach by comparing various models, and we present the effectiveness of our method through use cases that analyze the performance of the top badminton players in world-class events.

    Download PDF (1027K)
  • Kentaro NAKAI, Ritsuko IWAI, Takatsune KUMADA
    Session ID: 1S1-IS-3-05
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    For humans, meals are significant not only for nutrient intake and satisfaction, but also for feeling connected with family and society through interpersonal communication. The purpose of this study is to estimate and examine psychological states and traits from texts describing eating experiences using BERT. Texts about positive, negative and neutral eating experiences are collected from 877 crowdworkers, along with their psychological traits (loneliness and depression). The accuracy of 6-label classification of three psychological states x two eating situations (co-eating or eating alone) is 72%. Although the accuracies of binary classification of loneliness and of depression are around 55%, they are comparable with those of crowdworkers. These results suggest that, from a single text per crowdworker, estimating psychological traits is more difficult than estimating psychological states. Further analyses reveal that the fine-tuned BERT classifiers of psychological traits use different language features from human raters.

    Download PDF (261K)
  • Xiaoyun GUO, Guanhong LI
    Session ID: 1S4-IS-1-01
    Published: 2022
    Released on J-STAGE: July 11, 2022
    CONFERENCE PROCEEDINGS FREE ACCESS

    With the accelerating development of AI technology, society has entered the Society 5.0 era, and the level of digitization of society has risen like never before. As a result, IoT systems built on AI technology have become the infrastructure of society, and their defects may have a huge impact, causing the loss of property and even human lives. Therefore, we have higher requirements for the reliability and safety of IoT systems. To ensure the reliability and safety of IoT systems, formal verification is an effective method for checking IoT systems with abstract mathematical models during the system design phase. However, the application prospects and potential problems of this approach in the Society 5.0 era are not clear. In this paper, we analyze and discuss this issue. We discuss the prospects for applying formal verification by analyzing its applicability and the way it would be applied in IoT systems. We conclude that the reliability and security of IoT systems can be improved by the large-scale application of formal verification. Based on this, we also discuss the security and privacy issues that need to be considered for the large-scale application of formal verification in IoT systems.

    Download PDF (150K)