-
Dilemma in Socially Responsible Investing
Takayuki MIZUNO, Shouhei DOI, Takahiro TSUCHIYA, Shuhei KURIZAKI
Session ID: 4H2-GS-11c-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We connect the corporate ownership network to the network of financial instruments that inject money into the ownership network. Almost every asset manager has at least one path through which its equity stakes eventually reach a munitions company two or more links away in the global ownership network. This result suggests that two obstacles encourage the decoupling of equity stakes and social responsibilities, so that socially responsible investors become incapable of making positive impacts with their investing strategy. The first obstacle is the fact that ETFs and other similar financial instruments separate capital and corporate control. The second obstacle is the complexity of the ownership network itself. While investors may have the potential to control munitions companies, the complexity of the ownership network is likely to prevent them from knowing their own potential.
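The reachability claim above can be checked with a breadth-first search over an ownership graph. The sketch below uses purely hypothetical company names and edges as stand-ins for the global ownership network data used in the study:

```python
from collections import deque

# Hypothetical ownership edges: owner -> owned entities (illustrative names only).
ownership = {
    "AssetManagerA": ["FundX", "HoldingB"],
    "FundX": ["IndustrialC"],
    "HoldingB": ["IndustrialC", "RetailD"],
    "IndustrialC": ["MunitionsE"],
}

def reachable_munitions(start, munitions, graph):
    """Return (company, link depth) pairs for munitions firms reachable from `start`."""
    hits, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in munitions:
                hits.append((nxt, depth + 1))
            queue.append((nxt, depth + 1))
    return hits

# The asset manager reaches the munitions firm only via intermediate links.
assert reachable_munitions("AssetManagerA", {"MunitionsE"}, ownership) == [("MunitionsE", 3)]
```

The depth recorded per hit is what distinguishes direct holdings from the indirect, second-or-further-link exposure the abstract describes.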
-
Kanoko SUZUKI, Takanori MATSUI, Shun KAWAKUBO, Naoki MASUHARA, Asako I ...
Session ID: 4H3-GS-11d-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Working on the SDGs and sharing successful practices with wider stakeholders are important for achieving the SDGs. In this study, using a deep-learning natural language processing model, BERT, we aimed to (1) build a classifier that maps the meanings of practices and issues into the SDGs context, (2) visualize the nexus between the SDGs, and (3) build a matching system between local issues and the initiatives that can solve them. First, we collected documents published by the United Nations and the Japanese Government, as well as proposals for solving SDG-related issues gathered by the Cabinet Office. With those data, a data frame with each document and multi-labels corresponding to the SDGs was constructed, and a text-data augmentation method based on the WordNet database was applied to the data frame. Next, a pretrained Japanese BERT model was fine-tuned on a multi-label text classification task, and nested cross-validation was conducted to optimize the hyperparameters and estimate the cross-validation accuracy. Finally, the co-occurrence network among the SDGs was visualized with the fine-tuned BERT model, and a matching system was developed by computing the cosine similarity between the embedded vectors of local issues and initiatives.
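The matching step above can be sketched as a cosine-similarity lookup between embedding vectors. The toy vectors below merely stand in for the fine-tuned BERT embeddings; all names and numbers are illustrative assumptions:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for BERT embeddings of local issues and initiatives.
issues = {"issue_1": np.array([1.0, 0.2, 0.0]), "issue_2": np.array([0.0, 1.0, 0.5])}
initiatives = {"init_a": np.array([0.9, 0.1, 0.0]), "init_b": np.array([0.1, 0.8, 0.6])}

def best_match(issue_vec, candidates):
    # Return the initiative whose embedding is most similar to the issue.
    return max(candidates, key=lambda k: cosine_similarity(issue_vec, candidates[k]))

assert best_match(issues["issue_1"], initiatives) == "init_a"
```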
-
Daisuke OBA, Naoki YOSHINAGA, Masashi TOYODA
Session ID: 4H3-GS-11d-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, as an approach to providing deeper insight into neural Natural Language Processing (NLP) models, we propose a method for analyzing the roles of individual neurons, the finest-grained components of such models, by observing the sentences that strongly activate them. Our method retrieves the sentences from massive corpora and abstracts them for interpretation using data-mining techniques. In the experiments, we demonstrate that our method can give thoughtful insights into what linguistic aspects each neuron of a given model captures, as well as how multiple neurons relate to each other.
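The retrieval step — finding the sentences that most strongly activate a given neuron — can be sketched as follows; the activation values and sentences are hypothetical stand-ins for a real model and corpus:

```python
import heapq

# Hypothetical per-sentence activations of three neurons (one list entry per neuron).
activations = {
    "the stock price rose sharply": [0.1, 2.3, 0.0],
    "she walked to the station":    [1.8, 0.2, 0.4],
    "profits fell last quarter":    [0.0, 2.9, 0.1],
}

def top_sentences(neuron_idx, k=2):
    """Sentences that most strongly activate one neuron, strongest first."""
    return heapq.nlargest(k, activations, key=lambda s: activations[s][neuron_idx])

# Neuron 1 responds most to the finance-themed sentences in this toy corpus.
assert top_sentences(1) == ["profits fell last quarter", "the stock price rose sharply"]
```

Abstracting the retrieved sentences (e.g., by frequent-pattern mining over them) would be the subsequent step described in the abstract.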
-
An analysis of word and readability
Yoshitaka HIROSE
Session ID: 4H3-GS-11d-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this study is to identify the linguistic characteristics of public sector accounting information disclosed by the world's major cities. The linguistic information of public sector accounting in the annual reports disclosed by cities was analyzed using morphological analysis, TF-IDF, DP, N-grams, the FOG index, and other readability indicators, and the following results were obtained. First, the analysis of city annual reports revealed that traditionally used accounting technical terms such as assets, earnings, and cash are the representative words. Second, a cross-city comparison of key words revealed that different cities emphasize different explanatory items. Third, the readability of the cities' annual reports was found to be, on average, at the level of a third-year college student. This study is the first to reveal the actual status of public accounting language information in the annual reports of major cities around the world.
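The Gunning FOG index mentioned above is 0.4 × (average sentence length + percentage of words with three or more syllables). A minimal sketch with a crude vowel-group syllable heuristic (not the authors' implementation):

```python
import re

def syllables(word):
    # Rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

score = fog_index("The city reported assets and earnings. Cash increased substantially.")
assert 15.1 < score < 15.2  # 9 words, 2 sentences, 3 complex words
```

A FOG score of roughly 15 corresponds to about fifteen years of schooling, consistent with the third-year-college-student reading level reported in the abstract.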
-
Shogo SHIROSAKI, Tsugumu MATSUI, Taketomo KANAZAWA, Ayu ITOU, Satsuki ...
Session ID: 4H3-GS-11d-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we extracted the portions of companies' annual reports that clearly stated that flexible work arrangements, such as telework and flextime systems, had been introduced during 2019, before the impact of COVID-19. Using text mining, we analyzed the backgrounds of the companies that had introduced flexible work arrangements and investigated their trends. After morphological analysis, we extracted 100 words before and after the target word using KWIC, compiled the frequent words into a corpus, and visualized them by creating two-word co-occurrence network diagrams. As a result, we found that keywords related to change and to positive impacts on business, such as "productivity improvement," "efficiency improvement," and "reform," were frequently used. In addition, the results showed a bias toward the "information and communication industry" in the industry category and toward the "electronics and information communication" category.
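The KWIC (keyword-in-context) extraction described above can be sketched as a windowed scan over tokens. The tiny window and sample sentence below are illustrative only; the study uses a 100-word window on real annual-report text:

```python
def kwic(tokens, target, window=3):
    """Keyword-in-context: the tokens surrounding each occurrence of `target`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            hits.append(tokens[max(0, i - window):i + window + 1])
    return hits

tokens = "we introduced telework to improve productivity and telework reduced costs".split()
assert kwic(tokens, "telework", window=2) == [
    ["we", "introduced", "telework", "to", "improve"],
    ["productivity", "and", "telework", "reduced", "costs"],
]
```

Counting word pairs inside each extracted window is then enough to build the two-word co-occurrence network the abstract visualizes.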
-
Concept of issue identification process support system for child maltreatment cases
Kota TAKAOKA, Yoichi MOTOMURA, Ken SATO, Yoshiaki NISHIGAI
Session ID: 4H3-GS-11d-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In judicial proceedings, identifying the issues is important for making a legal decision. However, it is sometimes very difficult to identify the issues that should be argued, because the investigation and judgement process differs from case to case. To solve this problem, this study aims to clarify the concept of data-driven issue identification process support and to build a demo system. As a use case, we employ child abuse cases and deploy artificial intelligence that conducts probabilistic causal inference with a Bayesian network to support judgement. We then discuss the future issues of this identification process support system, including whether it can be applied in the trial process with an advanced Bayesian network, from the perspectives of each case and of the user and manager positions.
-
Mai YOKOZEKI, Natsuki MURAKAMI, Riko SUZUKI, Hitomi YANAKA, Koji MINES ...
Session ID: 4I1-GS-7b-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper introduces a new video-and-language dataset with human actions for multimodal inference. The dataset consists of 200 videos, 5554 action labels, and 1942 action triplets of the form <subject, action, object>. The action labels contain various expressions, such as aspectual and intentional phrases, that are characteristic of videos but do not appear in existing video and image datasets. The dataset is expected to be applied to the evaluation of multimodal inference systems that relate videos to semantically complicated sentences involving, for example, negation and quantity.
-
Takumi OHKUMA, Hideki NAKAYAMA
Session ID: 4I1-GS-7b-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
One-class image classification is the task of discriminating whether images belong to a certain class, and this task is important for the recognition of particular visual concepts. Humans are good at solving this task from only a few examples, whereas the few-shot learning methods of previous works perform far below human level. To improve the performance, we propose the ``Multi-modal Belongingness Network (MMBeNet)'', an extended model of the ``Belongingness Network (BeNet)'' \cite{BeNet}, which uses not only a few image examples but also semantic information such as attributes and word vectors; we call this task ``multi-modal few-shot one-class image classification''. We consider semantic information to be an important factor in the high ability of humans, and we confirm through experiments that it is effective in improving performance on this task. Moreover, a single MMBeNet model can solve not only multi-modal tasks but also image-only few-shot and zero-shot tasks.
-
Yumana HIRATA, Masashi KASAMATSU, Yukikazu MURAKAMI, Hayate WAKISAKA
Session ID: 4I1-GS-7b-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, the working population of Japanese agriculture has been declining, and there is concern that this tendency will continue to worsen. In order to realize sustainable agriculture, it is necessary to pass on the skills of veteran farmers to novice farmers. However, the opportunities for novice farmers to receive direct guidance from veteran farmers have been decreasing, so the preparation of agricultural work manuals has become more important. Many conventional farm work manuals, however, consist only of sentences and images, which makes it difficult for novice farmers with little experience to understand them intuitively. In response, in this study we used an eye camera and Mask R-CNN, one of the instance segmentation algorithms, to capture what a veteran farmer sees, based on the differences in line-of-sight movement between veteran and novice farmers, and we propose creating a web-based agricultural work succession manual. In this research, we created a web manual for strawberry farming. For the work process, we recorded the line of sight during seedling raising, cutting, flooding, planting, harvesting, and packing, conducted interviews based on the differences in the line of sight between veteran and novice farmers, and created the manual.
-
Shin ASAKAWA, Jun MUTO
Session ID: 4I1-GS-7b-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Various attempts to explain the task performances of aphasic patients have been made in practice. These attempts have often been based on explanatory models that originated in the 1980s. We attempted to update these models by using deep learning models to process the visual features of picture naming tasks and the semantic features. The representations, including the penultimate layers, that convert visual inputs such as line drawings into language responses can be explained by convolutional neural network models, while the task performance of semantically impaired patients can be explained by word embedding models. The real images frequently used to assess patients with aphasia were used as visual stimuli. We adopted ResNet and VGG16 for the recognition of the line drawing stimuli, and employed word2vec for the semantic representation as a lexical representation. This made it possible to provide a more detailed explanation of the empirical data. The approach is expected to contribute to the interpretation and implementation of neuroscience tests in the following ways: 1) clarification of the meaning of stimulus pictures and words in aphasia tests, and 2) provision of selection criteria for training materials in rehabilitation.
-
Yanjun SUN, Ichiro KOBAYASHI
Session ID: 4I1-GS-7b-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we aim to investigate whether multimodal information can improve the understanding of uni-modal information by clarifying the relationship between the variables of each modality in the latent space. Here, we especially focus on two modalities, image and natural language, and have investigated whether an image shared by synonymous sentences is useful for converting between those sentences through the latent space. As a result of a preliminary experiment, we confirmed that the accuracy and efficiency of reconstructing the input sentence are higher when using an image whose content reflects that of the sentence than without such an image.
-
Shoki SAKAMOTO, Akira TANIGUCHI, Tadahiro TANIGUCHI, Hirokazu KAMEOKA
Session ID: 4I2-GS-7c-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The star generative adversarial network for voice conversion (StarGAN-VC) is a method that allows non-parallel many-to-many voice conversion. Although the retention of linguistic information is very important in voice conversion, speech converted by StarGAN-VC sometimes loses linguistic information. This is because StarGAN-VC does not use any linguistic information while learning the voice conversion; it focuses only on non-symbolic acoustic features. This paper proposes a method that exploits recognition results produced by automatic speech recognition (ASR) in training StarGAN-VC's generator. The experiment shows that our proposed method makes StarGAN-VC retain more linguistic information than the vanilla StarGAN-VC.
-
Soichiro KOMURA, Kaede HAYASHI, Akira TANIGUCHI, Tadahiro TANIGUCHI, H ...
Session ID: 4I2-GS-7c-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The nonparametric Bayesian double articulation analyzer (NPB-DAA) is a method for discovering words and phoneme units from continuous speech signals in an unsupervised manner. However, acoustic features are speaker-dependent, which prevents NPB-DAA from discovering words and phoneme units from multi-speaker utterances. This paper proposes using the star generative adversarial network for voice conversion (StarGAN-VC) to extract speaker-independent acoustic features, and optimizing NPB-DAA and StarGAN-VC simultaneously by mutual learning based on the Neuro-SERKET framework. The effect of the mutual learning is shown through an experiment.
-
Hiroshi FUKETA, Yukinori MORITA
Session ID: 4I2-GS-7c-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we propose neural network models based on the neural ordinary differential equation (NODE) for small-footprint keyword spotting (KWS). KWS, which detects pre-defined keywords in input audio data, draws much attention as a promising technique for realizing the so-called "voice user interface" that can control mobile phones and smart speakers by voice. Recently, many researchers have demonstrated KWS with artificial neural networks and have achieved high inference accuracy. Voice-controlled devices are, however, usually battery-operated, and hence memory footprint and compute resources are severely restricted. To cope with this restriction, we present techniques for applying NODE to KWS that make it possible to reduce the number of parameters and computations during inference. Finally, we show that the number of model parameters of the proposed model is 68% smaller than that of the conventional KWS model.
-
Shintaro ISHIKAWA, Komei SUGIURA
Session ID: 4I2-GS-7c-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Currently, domestic service robots have an insufficient ability to interact naturally through language, because understanding human instructions is complicated by a variety of ambiguities and missing information. Existing methods are insufficient for modeling referring expressions that specify relationships between objects. In this paper, we propose Target-dependent UNITER, which directly learns the relationship between the target object and other objects by focusing on the relevant regions within an image, instead of the whole image. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.
-
an Approach Based on Graph Neural Networks
Shun TAKAHASHI, Sakti SAKRIANI, Satoshi NAKAMURA
Session ID: 4I2-GS-7c-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Zero resource speech technology aims to discover discrete units in a limited amount of unannotated, raw speech data. Previous studies have mainly focused on learning the discrete units from acoustic features segmented into fixed small time frames. While achieving high unit quality, they suffer from a high bitrate due to the time-frame encoding. In this work, in order to lower the bitrate, we propose a novel approach based on a discrete autoencoder and graph convolutional networks. We exploit speech features discretized by vector-quantization encoding. Since the maximum number of discretized features is predetermined, we consider a directed graph in which each node represents a discretized acoustic feature and each edge a transition from one feature to another. Using graph convolution, we extract and encode the topological features of the graph into each node, and then symmetrize the graph to apply spectral clustering to the node features. In terms of ABX error rate and bitrate estimation, we demonstrate that our model successfully decreases the bitrate while retaining the unit quality.
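The graph construction described above — one node per vector-quantized unit, one edge per observed transition, then symmetrization — can be sketched as follows with a hypothetical unit sequence:

```python
import numpy as np

# Hypothetical sequence of vector-quantized unit IDs for one utterance.
units = [0, 1, 2, 1, 0, 2]
n = 3  # predetermined codebook size

# Directed transition counts: edge u -> v for each pair of consecutive units.
A = np.zeros((n, n))
for u, v in zip(units, units[1:]):
    A[u, v] += 1

# Symmetrize so that spectral clustering can be applied to the node features.
S = A + A.T
assert np.allclose(S, S.T)
assert S[0, 1] == 2  # 0->1 occurs once and 1->0 occurs once
```

Spectral clustering on the symmetrized graph then merges redundant codebook entries, which is what lowers the bitrate.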
-
Jeffrey KOUGO, Takayuki WATANABE, Junji YAMATO, Hirotoshi TAIRA, Hirom ...
Session ID: 4I3-GS-7d-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Illustration images, such as those used as options in English exam questions, tend to yield lower accuracy than photographs in object recognition using CNNs and similar models. It has been pointed out that CNNs learn texture more than shape in object recognition, which is presumably an obstacle to improving the recognition rate for illustration images. In this study, we tried a method that inhibits the learning of texture information by synthesizing various textures for object images of the same shape using style transfer, thereby promoting the learning of shape information, and we confirmed an improvement in the recognition rate for illustration images.
-
Yuki IKEDA, Yongwoon CHOI
Session ID: 4I3-GS-7d-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this study is to propose a method for image segmentation without hand-made training data, by moving the camera itself and generating differential images of the object. Recent image segmentation methods incorporating deep learning have greatly improved in speed and accuracy and are increasingly used in various fields. These methods produce their output end-to-end without human help, provided that a large amount of training data necessary for their learning is available. However, preparing training data that yields good results and accuracy requires the time and effort of many people, and the correct answers must be reliably included; otherwise, correct results generally cannot be expected. Thus, our proposed method will be useful as a way to avoid the need for a large amount of hand-made training data. Here, the effectiveness of the proposed method is demonstrated through experimental results obtained using an image segmentation architecture that includes a robot.
-
Kousuke UO, Hiroyoshi ITO, Masaki MATSUBARA, Atsuyuki MORISHIMA, Yukin ...
Session ID: 4I3-GS-7d-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
With the recent developments in deep learning, the performance of semantic segmentation has been greatly improved. However, creating a large set of training data requires high annotation costs. One way to reduce the annotation cost is active learning, which selects the data that are most uncertain for the current model. Most active learning methods assume that the annotation cost of each data point is constant; in practice, however, the annotation cost varies with the data. This paper proposes an active learning strategy that selects image regions that are expected to be informative and whose annotation cost is low. Our method predicts the annotation time of each region and combines it with the uncertainty to calculate a score. The results of our preliminary experiment demonstrate that the proposed method is able to reduce the annotation cost in the early stage of training.
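One way to combine uncertainty with predicted annotation time, as the method above does, is to score each region by uncertainty per unit of predicted cost. The scoring rule and numbers below are illustrative assumptions, not the authors' exact formula:

```python
import math

def entropy(probs):
    # Shannon entropy as a simple uncertainty measure.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical regions: predicted class probabilities and predicted annotation time (s).
regions = {
    "r1": {"probs": [0.5, 0.5], "time": 10.0},   # uncertain, cheap to annotate
    "r2": {"probs": [0.5, 0.5], "time": 40.0},   # uncertain, expensive
    "r3": {"probs": [0.99, 0.01], "time": 5.0},  # confident, cheap
}

def score(region):
    # Uncertainty per unit of predicted annotation cost.
    return entropy(region["probs"]) / region["time"]

best = max(regions, key=lambda r: score(regions[r]))
assert best == "r1"  # equally uncertain as r2, but far cheaper
```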
-
Sota KATO, Kazuhiro HOTTA
Session ID: 4I3-GS-7d-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In semantic segmentation, where all pixels in an image are labeled, CNNs are known to produce highly accurate results, and they have been applied to automated driving techniques. Cross-entropy loss is often used for training semantic segmentation, and Intersection over Union (IoU) is often used as the evaluation metric. In order to achieve more accurate predictions, loss functions that directly optimize the IoU have been studied in recent years. However, most of the previous studies have shown effectiveness only for two-class segmentation, and few studies have confirmed effectiveness for multi-class segmentation. In this study, we propose a new loss function that improves the IoU by optimizing the matrix composed of class probabilities. The effectiveness of the proposed method is confirmed by experiments on two-class and multi-class segmentation.
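A common way to make IoU differentiable from a matrix of class probabilities is a soft IoU loss. The sketch below is one such formulation under our own assumptions, not necessarily the authors' proposed loss:

```python
import numpy as np

def soft_iou_loss(probs, onehot, eps=1e-7):
    """1 - mean soft IoU over classes; probs and onehot have shape (pixels, classes)."""
    inter = (probs * onehot).sum(axis=0)
    # Soft union: |A| + |B| - |A ∩ B|, computed elementwise on probabilities.
    union = (probs + onehot - probs * onehot).sum(axis=0)
    return 1.0 - float(np.mean(inter / (union + eps)))

probs = np.array([[0.9, 0.1], [0.2, 0.8]])   # predicted class probabilities per pixel
onehot = np.array([[1.0, 0.0], [0.0, 1.0]])  # ground-truth one-hot labels

# Predictions close to the labels give a lower loss than predictions far from them.
assert soft_iou_loss(probs, onehot) < soft_iou_loss(1 - probs, onehot)
```

Because every term is a product or sum of probabilities, the loss is differentiable and can be minimized by gradient descent alongside or instead of cross-entropy.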
-
Ryuto YOSHIDA, Junichiro FUJII, Junichi OKUBO, Masazumi AMAKATA
Session ID: 4I3-GS-7d-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In deep learning, data cleansing is effective in improving the accuracy of a model. On the other hand, the amount of data is also an important factor for proper training. Therefore, when performing data cleansing, it is necessary to apply an effective method. Based on this problem, this study verified the effect of data cleansing on crack segmentation for revetments. In the verification, various datasets were created based on the features of the training images, and the training results were compared for each dataset.
-
Akiho IWATA, Hirono KAWASHIMA, Makoto KAWANO, Jin NAKAZAWA
Session ID: 4I4-GS-7e-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we work on the automatic classification of the elements of the step sequence from figure skating videos. In figure skating, the scoring of every performance consists of judging each element and evaluating the performance, and all of this is done visually by the referees. However, it is costly for the referees to judge the elements and evaluate the performance at the same time. By automatically recognizing the elements, the burden on the referees can be reduced, allowing them to focus on evaluating the performance. Formulating element recognition as a video classification problem, we needed to build a figure skating dataset, which has several undesirable properties. We use a convolutional neural network for element recognition with several techniques that address these properties. In the experiment, we conducted ablation studies to verify which techniques are useful for the figure skating dataset and report the results of the studies.
-
Kohei HIRAKI, Masahiro SUZUKI, Yutaka MATSUO
Session ID: 4I4-GS-7e-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this study is to predict the fall point of a tennis serve from the pose information of the player. Compared to other racquet sports, the speed of a tennis serve is faster and the court is wider, which makes the serve more difficult to return. Therefore, predicting the course of the serve and returning it are considered important for winning points. As a previous study, there is a method for predicting the fall point of a serve in table tennis. However, since a player in a tennis video appears smaller than a table tennis player, applying this method to tennis degrades the performance of pose estimation, resulting in failure to predict the fall point. We propose to improve pose estimation performance by dividing the detection into two stages: human detection and pose estimation.
-
Kotaro KITAYAMA, Jun SUZUKI, Nobuyuki SHIMIZU
Session ID: 4I4-GS-7e-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Automatic video summarization is one of the crucial technologies for alleviating the cost to developers and end-users of checking the contents of videos. Moreover, it can also serve as a clue for video retrieval, to obtain only the required videos from extremely many consumer-generated videos. This paper specifically focuses on a video summarization task which we call video key-frame captioning. This task requires systems to extract a predefined number of key-frames and simultaneously generate a description of the series of extracted key-frames that summarizes the given video well. We introduce a formal task definition of our new task and discuss procedures for creating a dataset for the evaluation of key-frame captioning tasks.
-
Three dimensional coordinates prediction of landmarks on craniofacial bone
Soh NISHIMOTO
Session ID: 4I4-GS-7e-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Objective: X-ray CT images of the craniofacial region are now routinely used in the diagnosis and treatment of craniofacial conditions. 3D reconstruction is possible, and the distance between landmarks and the angle between reference planes can be measured in 3D. To do so, it is necessary to plot the landmarks, but this requires time and experience. In this study, we attempted to predict the 3D coordinates of feature points from a set of CT images by using two-phase deep learning networks. Methods: In the first phase, the DICOM image set was compressed and a model was trained. In the second phase, 3D images around the coordinates of each landmark were cropped at the original resolution and models were trained. Results: The prediction error was smaller when the estimation was done in two phases than when it was done only in the first phase.
-
Takumi EGE, Takuma TERADA, Hiroto NAGAYOSHI
Session ID: 4I4-GS-7e-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper presents a study of worker activity recognition that recognizes the time series of worker activities from videos. Our approach adopts pose estimation to detect worker postures from videos and applies an action segmentation model to the estimated postures to reduce over-segmentation errors. The results of experiments using newly created simulated datasets revealed that highly accurate time-series recognition of worker activities is possible while reducing over-segmentation errors by applying an action segmentation model.
-
Misaki ISHIJIMA, Daisuke BEKKI
Session ID: 4J1-GS-6d-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The development of CCG parsers requires a CCG treebank, which is known to be costly. This is mainly because checking the consistency of a tree structure is a more complex task than that of a linear structure or a dependency tree. In order to reduce this cost, we need a development environment that has knowledge of grammar: for example, when a part of a parse tree is modified, the system automatically re-parses the sentence to reflect the modification and checks the grammaticality of the modified tree. In our project, we are building such a grammar development environment as a GUI system. We implemented the system by using Yesod, a web application framework written in Haskell, and employ lightblue as a Japanese CCG parser. We demonstrate how we can display and manipulate CCG parse trees in the GUI setting.
-
Sora TAGAMI, Daisuke BEKKI
Session ID: 4J1-GS-6d-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we apply the method of knowledge distillation to the Japanese morphological analyzer rakkyo and evaluate whether the method compresses its model size and whether the training converges on smaller datasets. Recently, Japanese morphological analyzers have achieved high performance in both accuracy and speed. From the viewpoint of practical use, however, it is preferable to reduce the model size. The rakkyo model, among others, succeeded in significantly reducing its model size by using only character unigrams and discarding the dictionary, through training on silver data of 500 million sentences generated by Juman++. We tried to further compress rakkyo by constructing a neural morphological analyzer for Japanese using the outputs of rakkyo, namely the probability distributions, as training data. The evaluation was done against silver data generated by rakkyo, and it suggests that our model approaches the accuracy of rakkyo with a smaller amount of data.
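Distillation on probability distributions, as described above, typically minimizes the divergence between the teacher's and the student's output distributions. A minimal sketch with hypothetical per-character label distributions (KL divergence stands in for the actual training objective):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between teacher (p) and student (q) label distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical distributions over morphological labels for one character position.
teacher = [0.7, 0.2, 0.1]       # soft targets from the teacher analyzer
student_good = [0.6, 0.3, 0.1]  # student close to the teacher
student_bad = [0.1, 0.2, 0.7]   # student far from the teacher

# Training drives the student toward the lower-divergence state.
assert kl_divergence(teacher, student_good) < kl_divergence(teacher, student_bad)
```

Using the teacher's full distributions rather than hard labels is what lets the student converge on far less data, as the abstract reports.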
-
Taro YANO, Takeoka KUNIHIRO, Masafumi OYAMADA
Session ID: 4J1-GS-6d-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Zero-shot hierarchical text classification, which classifies a text into a class in a hierarchy consisting of both seen and unseen classes, is important in applications such as news recommendation and product categorization. Two existing approaches, (1) the matching approach and (2) the hierarchical classification-based approach, have different performance characteristics on seen and unseen classes: the matching approach performs well on unseen classes but worse on seen classes, and vice versa for the hierarchical classification-based approach. In this paper, we propose a zero-shot hierarchical text classification method that combines and generalizes the two approaches to improve the performance on both seen and unseen classes. Experimental results on real-world datasets demonstrate the superiority of our proposed method over the baselines.
-
Sora KADOTANI, Tomoyuki KAJIWARA, Yuki ARASE, Makoto ONIZUKA
Session ID: 4J1-GS-6d-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Style transfer is a natural language processing task that transforms the expressions of an input sentence while retaining its meaning. We attempt to improve the quality of style transfer by using curriculum learning, a method that designs a training process progressing from easy training samples to difficult ones. We propose edit distance as a measure of the difficulty of a transformation. Experiments on formality style transfer showed that our model improves the quality of style transfer.
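Ordering training pairs by edit distance, as proposed above, can be sketched with a standard Levenshtein implementation; the informal-to-formal pairs below are invented examples, not from the study's data:

```python
def edit_distance(a, b):
    """Levenshtein distance via single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # prev holds the old dp[j-1] (diagonal cell) at this point.
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

# Hypothetical informal -> formal pairs; sorting yields an easy-to-hard curriculum.
pairs = [("gonna go", "going to go"), ("hi", "hello"), ("u ok?", "are you okay?")]
curriculum = sorted(pairs, key=lambda p: edit_distance(*p))
assert curriculum[-1] == ("u ok?", "are you okay?")  # largest rewrite comes last
```

Training then proceeds through `curriculum` in order, so the model sees near-copy transformations before heavy rewrites.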
-
Motonari KAMBARA, Komei SUGIURA
Session ID: 4J1-GS-6d-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this paper is to extend the dataset based on a cross-modal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as ``Move the blue flip-flop to the lower left box.'' Unlike existing methods, the CRT uses a Transformer to capture the visual and geometric features of objects in an image. The Case Relation Block allows the CRT to process the objects. We conducted comparative experiments and human evaluations. The experimental results showed that the CRT outperformed the baseline methods.
View full abstract
-
Kazutaka KINUGAWA, Hitoshi ITO, Hideya MINO, Isao GOTO, Ichiro YAMADA
Session ID: 4J2-GS-6e-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Temporal expression recognition is a long-standing problem in natural language processing (NLP). One difficulty of this task is disambiguating temporal expressions whose meanings change depending on context. This is an essential issue especially in the Japanese news domain, where such temporal expressions occur frequently and consequently mislead NLP systems. One effective approach to this problem is to build a supervised classification model, but preparing a sufficient amount of labeled training data is costly. In this paper, we present an automatic data labelling method for one such ambiguous Japanese temporal term. We leverage word alignment in a Japanese-English parallel corpus and resolve the ambiguities using information from both the Japanese and English sides. We efficiently build a dataset and conduct a manual inspection of it to confirm the efficacy of our technique. We train several baseline models on this dataset and obtain consistent performance.
View full abstract
-
Kana KOYANO, Riko SUZUKI, Izumi HARUTA, Hitomi YANAKA, Daisuke BEKKI
Session ID: 4J2-GS-6e-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In the semantic analysis of real texts, there is a social need for a task that automatically computes the similarities and differences between descriptions in multiple documents. Recently, an approach using the natural language inference system ccg2lambda has been attempted for this task. It applies syntactic analysis, semantic analysis, and automated theorem proving to two documents to determine bi-directional entailment relations, and uses the information on unproven terms to compute similarities and differences. In this research, we focus on the similarity/difference computation of sentences containing numerical expressions, and aim at detailed computation by enabling inferences such as "three times" implies "two or three times". Specifically, we modified the ccg2lambda pipeline by (1) correctly rewriting the syntax trees of sentences containing quantitative expressions using Tsurgeon, and (2) modifying the semantic templates according to the rewritten syntax trees.
View full abstract
-
Kento TANAKA, Taichi NISHIMURA, Keisuke SHIRAI, Hirotaka KAMEKO, Shins ...
Session ID: 4J2-GS-6e-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In language learning, training output skills such as speaking and writing is vital for retaining learned knowledge. However, scoring descriptive questions by hand is costly, which is why automatic scoring systems have attracted attention. In this research, we aim to realize an automatic scoring system for picture description. Concretely, we (i) first analyze the trends of errors that English learners make, (ii) then create a pseudo dataset by artificially mimicking those errors, and (iii) finally build a model that judges whether a given pair of a picture and a sentence is valid. In experiments, we trained the model on the created pseudo data and evaluated it on answers provided by actual learners. The experimental results show that our model outperforms a random baseline.
View full abstract
-
Shintaro SUDA, Akihiro TSUJI, Akito SUZUKI, Tokuma SUZUKI, Ryo ITO
Session ID: 4J2-GS-6e-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this paper is to explore the use of distributed representations of news entities for predicting news impact in financial markets. We propose a method to embed various kinds of news entities (event, country, and asset information) into a vector space. We then apply the entity representations to predicting FX volatility, so as to take into account the country where the news occurs and the news category of the event. Our work provides a direction for future research that applies the relationships among entities to various problems in finance.
View full abstract
-
Shingo SASHIDA, Kei NAKAGAWA
Session ID: 4J2-GS-6e-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, various text mining techniques have been utilized in both academic and practical finance. One example is the economic causal chain, a cause-and-effect network structure constructed by extracting descriptions indicating causal relationships from the texts of financial statement summaries. There is a lead-lag effect that spreads to the 'effect' stock group when a large price fluctuation occurs in the 'cause' stock group of the causal chain. However, in economic causality among companies, a positive effect on one company can affect other causally related companies either positively or negatively. That is, considering positive and negative sentiment is important when exploiting the lead-lag effect in the economic causal chain. The SSESTM (Supervised Sentiment Extraction via Screening and Topic Modeling) model has been proposed as a sentiment analysis method specialized for stock return forecasting, and it produced a substantial profit in the U.S. stock market. In this study, we propose an investment strategy that exploits the lead-lag effect in the causal chain while taking sentiment into account with the SSESTM model. We confirm the profitability of the proposed strategy and find evidence of stock return predictability across causally linked companies when sentiment is considered.
View full abstract
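The signed lead-lag propagation idea can be sketched minimally as follows; the tickers and edge signs are hypothetical, and in the actual strategy the signs would come from SSESTM-estimated sentiment rather than being given by hand:

```python
def lead_lag_signal(returns, causal_edges):
    # Propagate each 'cause' firm's signed return along the causal chain,
    # flipping the sign on edges labeled with negative sentiment.
    # causal_edges: (cause, effect, sign) triples with sign in {+1, -1}.
    signal = {}
    for cause, effect, sign in causal_edges:
        signal[effect] = signal.get(effect, 0.0) + sign * returns.get(cause, 0.0)
    return signal

# Hypothetical example: firm A's move helps B but hurts C.
edges = [("A", "B", +1), ("A", "C", -1)]
print(lead_lag_signal({"A": 2.0}, edges))
```

The resulting per-firm signal would then drive a long-short portfolio over the 'effect' stock groups.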
-
Takashi KAMBE, Sho YOKOI, Masashi YOSHIKAWA, Kentaro INUI
Session ID: 4J3-GS-6f-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
A broad range of applications in natural language processing and text mining, such as similarity-based text retrieval and automatic evaluation of generated texts, requires the computation of sentence similarity. However, studies of sentence similarity have largely ignored multi-word expressions (MWEs), an important component of natural language. MWEs are phrases whose meaning cannot be naturally inferred from the meanings of their constituent words, such as “hot dog.” Needless to say, when computing the meaning of a whole sentence, accurate processing of the meaning of MWEs is as important as that of each word. To introduce the perspective of MWEs into the study of textual similarity, we attempt to create a new textual similarity dataset that requires semantic computation of MWEs. Specifically, we exploited (1) a combination of back-translation and constrained decoding, and (2) mask prediction by BERT. We showed that our proposed method can produce balanced sentence-similarity evaluation data.
View full abstract
-
Hitomi YANAKA, Koji MINESHIMA
Session ID: 4J3-GS-6f-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper introduces JSICK, a Japanese dataset for Recognizing Textual Entailment (RTE) and Semantic Textual Similarity (STS), manually translated from the English dataset SICK that focuses on compositional aspects of natural language inferences. Each sentence in JSICK is annotated with semantic tags to analyze whether models can capture diverse semantic phenomena. We perform a baseline evaluation of BERT-based RTE and STS models on JSICK, as well as a stress test in terms of word order scrambling in the JSICK test set. The results suggest that there is room for improving the performance on complex inferences and the generalization capacity of the models.
View full abstract
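The word-order scrambling stress test can be illustrated with a minimal sketch; whitespace tokenization is a simplification here, since Japanese text would require a morphological analyzer:

```python
import random

def scramble(sentence, seed=0):
    # Shuffle token order while keeping the bag of words intact. A model
    # whose predictions are insensitive to this shuffle is likely ignoring
    # word order (and, in Japanese, the case particles it interacts with).
    tokens = sentence.split()
    random.Random(seed).shuffle(tokens)
    return " ".join(tokens)

print(scramble("a man is cutting a tomato"))
```

Comparing model scores on original versus scrambled test sentences exposes order-insensitivity.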
-
Shuting BAI, Tingxuan LI, Seiji SUZUKI, Takehito UTSURO, Yasuhide KAWA ...
Session ID: 4J3-GS-6f-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we focus on how-to tip machine reading comprehension (MRC), a non-factoid MRC task. We propose a method for building a context dataset by retrieving candidate context paragraphs that are expected to contain answers to a given question, using column pages collected from how-to tip Web sites as the information source. We show that it is easy to develop a context dataset consisting of more than a few thousand context paragraphs. We then propose a procedure that combines a TF-IDF-based search module with a BERT machine reading comprehension model, which is evaluated on the context dataset developed in this paper.
View full abstract
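The retrieval half of such a pipeline can be sketched with a plain TF-IDF ranker; the paragraphs and question below are hypothetical, and a BERT reader would subsequently extract an answer span from the top-ranked paragraphs:

```python
import math
from collections import Counter

def tfidf_rank(question, paragraphs):
    # Rank candidate context paragraphs by TF-IDF overlap with the question;
    # the top-ranked paragraphs would be passed on to a BERT reader.
    docs = [p.lower().split() for p in paragraphs]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    q_terms = Counter(question.lower().split())
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = sum(q_terms[w] * tf[w] * idf.get(w, 0.0) ** 2 for w in q_terms)
        scored.append((score, i))
    return [paragraphs[i] for _, i in sorted(scored, reverse=True)]

paragraphs = [
    "Whisk the eggs before adding milk.",
    "Store knives safely in a wooden block.",
    "To sharpen a dull knife, draw it across a whetstone at a shallow angle.",
]
print(tfidf_rank("how do I sharpen a knife", paragraphs)[0])
```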
-
Tomoki NISHIYAMA, Kazuaki ANDO
Session ID: 4J3-GS-6f-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, elementary schools have been practicing Newspaper in Education (NIE), education that uses newspapers as teaching materials. However, it is difficult for elementary school students to understand the contents of newspaper articles, because the articles are written for general readers. If the text of the articles could be automatically paraphrased into simple text, this problem could be alleviated. Currently, there are few Japanese paraphrase corpora or datasets for children. In this study, we focus on news articles on NHK NEWS WEB EASY (NNWE), a website for children that provides simple, easy-to-understand news articles created by manually paraphrasing selected general news articles. The purpose of this study is to construct a Japanese paraphrase corpus by mapping each sentence of related news articles on NNWE and NHK News Web (NNW) as easy and difficult sentences, respectively. As a result of experiments, we confirmed the effectiveness of a method that computes sentence similarity based on alignment between word embeddings on the Japanese dataset, and the accuracy of the method was improved by specifying the parts of speech used for the computation.
View full abstract
-
Mana ISHIDA, Hitomi YANAKA, Kana MANOME, Daisuke BEKKI
Session ID: 4J3-GS-6f-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Temporal relation recognition between events in clinical texts is a challenging and valuable task. To realize medical information retrieval for time-series events, we propose a method for performing inferences over temporal relations in the medical domain using ccg2lambda, an integrated system that maps a sentence to a higher-order logical formula via syntactic analysis and semantic composition based on Combinatory Categorial Grammar (CCG). However, the current ccg2lambda cannot parse multi-word expressions in clinical texts. To solve this issue, we add a multi-word expression analysis module to ccg2lambda. We construct a dataset annotated with the semantic relations of multi-word expressions in clinical texts and implement the analysis module using BiLSTMs. Our enhanced ccg2lambda with the multi-word expression analysis module enables us to correctly map some sentences in clinical texts to their semantic representations.
View full abstract
-
Shinichi TACHIBANA, Tomohiko HARADA, Kazuhiko TSUDA
Session ID: 4J4-GS-6g-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this study is to demonstrate a text-mining method for understanding the intentions behind taste expressions in word-of-mouth data from cooking recipe websites. As an example to verify the method, this study focuses on the use of the word "KOKU" (a Japanese term for rich taste). By applying LIME to the predictions of a machine learning model trained on word-of-mouth data containing KOKU and SAPPARI (light taste), we identified the characteristic ingredients contributing to KOKU. An analysis using food composition data then confirmed a statistically significant difference in lipids and other components between the ingredients identified as characteristic of KOKU and those characteristic of SAPPARI. We will verify the method further by increasing the amount of target data.
View full abstract
-
Interpreting Herman Melville's Typee from the Perspective of Natural Language Processing
Jun KANEKO, Takashi OTSUKI, Takayuki SAKAGUCHI
Session ID: 4J4-GS-6g-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
English and American literature has so far been studied using stylistics; here, we attempt an interpretation from the perspective of natural language processing. The words that writers use in their works sometimes differ slightly in nuance from those used by people in general, and we sought to capture these differences in word meaning not qualitatively but quantitatively. Herman Melville's first novel, Typee, was analyzed here. By comparing word vectors from general linguistic data (a pretrained fastText model) with vectors trained by fastText on the writer's texts, the writer's sense of language was revealed. Based on previous scholarship on American literature, some important words were selected. We confirmed that the differences in the meanings of these words could be demonstrated quantitatively through their similarity scores.
View full abstract
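Because vectors from separately trained fastText models live in different spaces, one simple way to quantify a writer's idiosyncratic word sense is to compare nearest-neighbor sets rather than raw vectors. A toy sketch with hypothetical two-dimensional embeddings:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def neighbors(word, emb, k=2):
    # k nearest neighbors of a word within one embedding space.
    others = [w for w in emb if w != word]
    return set(sorted(others, key=lambda w: -cosine(emb[word], emb[w]))[:k])

def sense_shift(word, general_emb, author_emb, k=2):
    # 0.0 = the author's neighborhood matches general usage;
    # 1.0 = completely different neighbors, i.e. a shifted sense.
    return 1.0 - len(neighbors(word, general_emb, k) & neighbors(word, author_emb, k)) / k

# Hypothetical embeddings: in the author's texts, "sea" drifts toward danger.
general = {"sea": (1.0, 0.0), "ocean": (0.95, 0.05), "ship": (0.9, 0.1),
           "island": (0.2, 0.8), "fear": (0.0, 1.0)}
author = {"sea": (0.0, 1.0), "ocean": (1.0, 0.0), "ship": (0.9, 0.0),
          "island": (0.2, 0.9), "fear": (0.1, 0.95)}
print(sense_shift("sea", general, author))
```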
-
Yutaka TANABE, Yuto KAMISHIRO, Seiki MATOBA, Kosuke HISHINUMA, Ichiro ...
Session ID: 4J4-GS-6g-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
We have developed a solver for the Japanese driver's license test as an example of an automatic rule-judgment system. In this study, we examined how to automatically divide a given sentence into a situation-explanation part and a question part, and how to solve the problem as a recognizing textual entailment task. We solved it using BERT, one of the large-scale general-purpose language representation models. We found that the rule-based division method we devised improved the accuracy of automatic answering.
View full abstract
-
Ryuku NOBUSUE, Masanori AKIYOSHI
Session ID: 4J4-GS-6g-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we propose a Haiku generation method that injects a sense of surprise using adversarial learning. The method measures the unexpectedness and subject clarity of a Haiku with a topic model pre-trained on a Wikipedia corpus. These features are fed to the generator of a GAN, whose discriminator judges whether a generated Haiku is man-made or not. Evaluation was conducted with several questionnaires, including AHP, and the experimental results show that the proposed method achieves almost the same sense of surprise as Haikus composed by humans.
View full abstract
-
Higasa TAKAHIRO, Yoji KAWANO, Satoshi KURIHARA
Session ID: 4J4-GS-6g-05
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, the market for content generation has been expanding, and demand for the stories and scenarios underlying content has been increasing. However, the shortage of scenario writers and the limited range of story patterns they can create have become serious problems, so there is a need for systems that can stimulate writers' creativity in scenario generation. In this study, we adopted the 13-phase theory to generate plots, which are abstractions of scenarios, by focusing on the roles of the characters. We confirmed that generating plots in a way that considers the characters produces more consistent plots than generating them randomly.
View full abstract
-
Rui YOSHINAGA, Natsuki OKA, Kazuaki TANAKA
Session ID: 4N1-IS-3a-01
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
Our goal is to build an agent that listens to music together with people. We believe the agent can interact with people more naturally if it has music preferences. Madison and Schiolde (2017) found through listening experiments that repeated listening increases liking for music. The purpose of this study is therefore to build an agent that comes to like songs it hears repeatedly. We regard prediction accuracy as the degree of familiarity with a song: the agent listens to a song as raw audio, predicts the song's continuation with a generative model, compares the prediction with the actual input, and judges the music's familiarity. We implemented the agent and investigated its feasibility.
View full abstract
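The familiarity-as-prediction-accuracy idea can be sketched with a toy symbolic predictor; a bigram model over note names stands in for the generative audio model:

```python
from collections import Counter, defaultdict

class FamiliarityListener:
    # Toy stand-in for the generative model: a bigram predictor over
    # symbolic note names; prediction accuracy serves as familiarity.
    def __init__(self):
        self.counts = defaultdict(Counter)

    def listen(self, song):
        # Fraction of notes correctly predicted from the previous note,
        # measured before updating the model on this hearing.
        correct = 0
        for prev, cur in zip(song, song[1:]):
            seen = self.counts[prev]
            if seen and seen.most_common(1)[0][0] == cur:
                correct += 1
            seen[cur] += 1
        return correct / (len(song) - 1)

listener = FamiliarityListener()
scores = [round(listener.listen(list("CDECDE")), 2) for _ in range(3)]
print(scores)  # familiarity rises with repeated listening
```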
-
Keisuke MORITA, Natsuki OKA, Kazuaki TANAKA, Masahiro MIYATA, Takashi ...
Session ID: 4N1-IS-3a-02
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
It is believed that non-cognitive abilities grow significantly during childhood, and the Japanese early-childhood education community has begun to focus on teaching them. However, methods for measuring and estimating non-cognitive abilities remain challenging in terms of quantitativeness and objectivity, and there is still little research on the active use of AI technology in education. In this study, we proposed a method for quantitatively estimating "motivation for class," one of children's non-cognitive abilities. Specifically, using information on the direction of the children's faces and gazes during class, we examined the number and distribution of intersections between the children's gazes and determined whether all the children focused their gazes on a single point. We also quantitatively examined whether each child behaved according to the scene by measuring the distance between each child's gaze and the center of gravity of the intersections of the other children's gazes. As a result, we found that we could capture the characteristics of each child's behavior. In the future, we plan to use inverse reinforcement learning to estimate intrinsic motivation from children's gaze information.
View full abstract
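The per-child distance measure described above can be sketched as follows; the gaze points are hypothetical 2D coordinates where each child's gaze meets a reference plane:

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def attention_scores(gaze_points):
    # For each child, the distance between their own gaze point and the
    # center of gravity of the other children's gaze points; small values
    # suggest the group shares a single focus and the child follows it.
    scores = []
    for i, (x, y) in enumerate(gaze_points):
        cx, cy = centroid(gaze_points[:i] + gaze_points[i + 1:])
        scores.append(math.hypot(x - cx, y - cy))
    return scores

# Hypothetical gaze points: two children on the blackboard, one looking away.
print(attention_scores([(0.0, 0.0), (0.0, 0.0), (3.0, 4.0)]))
```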
-
Yuki KUBO, Natsuki OKA, Subaru HANADA, Kazuaki TANAKA, Tomomi TAKAHASH ...
Session ID: 4N1-IS-3a-03
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
There is a growing need for dialogue systems that can interact appropriately with children. Pechat is a button-shaped speaker that is attached to a stuffed animal; caregivers can hold a voice dialogue with children through the stuffed animal by selecting utterances on their smartphones. Our goal is to automate this utterance selection so that Pechat can interact with children without parental intervention. The system learned to select an appropriate response from a given set of candidates. The information available for learning was the caregiver's utterance-selection history and an evaluation of each system utterance judged from the prosodic features of the child's response. Prosody was used as the reward for reinforcement learning and also defined its states. The experimental results revealed the difference in learning with and without the use of prosody.
View full abstract
-
Ahmed MOUSTAFA, Daiki SETOGUCHI
Session ID: 4N1-IS-3a-04
Published: 2021
Released on J-STAGE: June 14, 2021
CONFERENCE PROCEEDINGS
FREE ACCESS
In the field of automated negotiation, there has been growing interest in models that can explain the rational decisions of automated negotiating agents in order to gain the trust of users; such models enable humans to trust agents by understanding their behavioral principles. Specifically, in automated negotiation, appropriate compromises need to be made during the negotiation to match the other negotiating party, in order to reach an agreement that is mutually beneficial. However, current negotiating agents use simple negotiation models. In this paper, we propose an automated negotiation model based on Q-learning, which enables the negotiating agent to make appropriate compromises to match the other party, resulting in greater mutual benefit. The experimental evaluations show that the proposed agent is faster and achieves better results than existing agents.
View full abstract
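A tabular Q-learning loop for concession decisions can be sketched as follows; the opponent model, rewards, and concession steps are all hypothetical simplifications, not the paper's actual negotiation protocol:

```python
import random

def train_negotiator(episodes=5000, rounds=5, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    # State: (round, total concession so far). Actions: concession step of
    # 0 (hold firm), 1 (small), or 2 (large). The toy opponent accepts once
    # total concession reaches 3; conceding more than needed costs utility,
    # and failing to agree within the deadline yields nothing.
    rng = random.Random(seed)
    actions = [0, 1, 2]
    q = {}
    for _ in range(episodes):
        conceded = 0
        for t in range(rounds):
            s = (t, conceded)
            qs = q.setdefault(s, [0.0] * len(actions))
            a = rng.randrange(3) if rng.random() < eps else qs.index(max(qs))
            conceded += actions[a]
            if conceded >= 3:                      # agreement reached
                reward, done = 10.0 - conceded, True
            elif t == rounds - 1:                  # negotiation breaks down
                reward, done = 0.0, True
            else:
                reward, done = 0.0, False
            nxt = 0.0 if done else max(q.setdefault((t + 1, conceded), [0.0] * 3))
            qs[a] += alpha * (reward + gamma * nxt - qs[a])   # Q-learning update
            if done:
                break
    return q

q = train_negotiator()
best_first_move = q[(0, 0)].index(max(q[(0, 0)]))
```

Under this setup the agent should learn to concede early rather than hold firm until the deadline, which is the qualitative behavior the proposed model targets.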