Detailed Search Results
Query search: "Multimodal Sentiment"
Showing results 1-8 of 8
  • *立石 修平, 小瀬木 悠佳, 八島 浩文, 中辻 真
    人工知能学会全国大会論文集
    2022, Volume JSAI2022, 3P4-GS-2-01
    Published: 2022
    Released: 2022/07/11
    Proceedings, free access

    In today's information technology field, the term "metaverse" has become a major shared theme and generated great excitement. For machine learning to engage with the metaverse concept, models are needed that can analyze phenomena based on multiple elements of that world. Fortunately, learning methods that combine input data from multiple sources have long been known in machine learning as "multimodal learning," and a variety of techniques have been proposed. The persistent challenge, however, is how to combine these inputs so that the result is more accurate than simply aggregating the analyses of each input on its own. We address this problem with three components: (1) enriching each word with semantic information, (2) extracting relations between modalities with an attention mechanism, and (3) adding topic information derived from a latent space over the whole utterance that unifies the modality information. The resulting model surpasses previous accuracy in multimodal sentiment analysis.
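
    As an illustration only (this is not the authors' code), the following PyTorch sketch wires together the three ingredients the abstract names: word embeddings assumed to already carry the added semantic information, a cross-modal attention step, and an utterance-level topic vector. All module names, dimensions, and the three-class output are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalSentiment(nn.Module):
    def __init__(self, dim=128, n_heads=4, n_topics=16, n_classes=3):
        super().__init__()
        # (2) cross-modal attention: text tokens query the audio/visual stream
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # (3) topic information from a latent space over the whole utterance
        self.topic_proj = nn.Linear(n_topics, dim)
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, text, audio_visual, topic):
        # text: (B, T, dim) word embeddings, assumed to already carry the
        # (1) added semantic information; audio_visual: (B, S, dim); topic: (B, n_topics)
        fused, _ = self.cross_attn(query=text, key=audio_visual, value=audio_visual)
        utterance = fused.mean(dim=1)  # pool token features into one utterance vector
        utterance = torch.cat([utterance, self.topic_proj(topic)], dim=-1)
        return self.classifier(utterance)

model = CrossModalSentiment()
logits = model(torch.randn(2, 10, 128), torch.randn(2, 20, 128), torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 3])
```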

  • Shiyu TENG, Jiaqing LIU, Yue HUANG, Shurong CHAI, Tomoko TATEYAMA, Xinyin HUANG, Lanfen LIN, Yen-Wei CHEN
    IEICE Transactions on Information and Systems
    2024, Volume E107.D, Issue 3, Pages 342-353
    Published: 2024/03/01
    Released: 2024/03/01
    Journal, free access

    Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus-specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.
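
    A hedged PyTorch sketch of the invariant/specific decomposition described above, not the published model: one shared encoder maps every emotional stimulus category onto a common subspace, while a private encoder per category captures its distinct characteristics. Feature sizes and the number of stimulus categories are assumed.

```python
import torch
import torch.nn as nn

class InvariantSpecificEncoders(nn.Module):
    def __init__(self, in_dim=256, sub_dim=64, n_stimuli=3):
        super().__init__()
        # stimulus-invariant subspace: one encoder shared by all categories
        self.invariant = nn.Sequential(nn.Linear(in_dim, sub_dim), nn.ReLU())
        # stimulus-specific subspaces: one private encoder per category
        self.specific = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, sub_dim), nn.ReLU())
            for _ in range(n_stimuli)
        )

    def forward(self, feats):
        # feats: one (B, in_dim) fused feature tensor per stimulus category
        inv = [self.invariant(f) for f in feats]                 # shared information
        spec = [enc(f) for enc, f in zip(self.specific, feats)]  # distinct characteristics
        # concatenate both subspaces per category for downstream prediction
        return [torch.cat(pair, dim=-1) for pair in zip(inv, spec)]

enc = InvariantSpecificEncoders()
outs = enc([torch.randn(4, 256) for _ in range(3)])
print(outs[0].shape)  # torch.Size([4, 128])
```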

  • グエン テ トウン, 吉野 幸一郎, サクリアニ サクティ, 中村 哲
    人工知能学会研究会資料 言語・音声理解と対話処理研究会
    2019, Volume 87
    Published: 2019/11/20
    Released: 2021/06/28
    Proceedings, free access
  • 松本 紗規子, 荒木 雅弘
    人工知能学会研究会資料 言語・音声理解と対話処理研究会
    2018, Volume 84
    Published: 2018/10/15
    Released: 2021/06/28
    Proceedings, free access
  • Dongni HU, Chengxin CHEN, Pengyuan ZHANG, Junfeng LI, Yonghong YAN, Qingwei ZHAO
    IEICE Transactions on Information and Systems
    2021, Volume E104.D, Issue 8, Pages 1391-1394
    Published: 2021/08/01
    Released: 2021/08/01
    Journal, free access

    Recently, automated recognition and analysis of human emotion have attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize emotional information from multiple modalities simultaneously. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.
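
    A minimal, assumption-laden PyTorch sketch in the spirit of the described two-stage fusion (not the authors' MAF implementation): self-attention models intra-modality interaction, cross-attention models inter-modality interaction, and residual connections let the blocks be stacked. Dimensions and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """One stackable block: intra-modality self-attention, then
    inter-modality cross-attention, each with a residual connection."""
    def __init__(self, dim=64, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, other):
        h, _ = self.self_attn(x, x, x)           # intra-modality interaction
        x = self.norm1(x + h)
        h, _ = self.cross_attn(x, other, other)  # inter-modality interaction
        return self.norm2(x + h)

# stacking several blocks, which the letter reports improves performance
blocks = nn.ModuleList(FusionBlock() for _ in range(2))
x, other = torch.randn(2, 8, 64), torch.randn(2, 12, 64)
for blk in blocks:
    x = blk(x, other)
print(x.shape)  # torch.Size([2, 8, 64])
```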

  • Lai Po Hung, Suraya Alias
    Journal of Advanced Computational Intelligence and Intelligent Informatics
    2023, Volume 27, Issue 1, Pages 84-95
    Published: 2023/01/20
    Released: 2023/01/20
    Journal, open access

    Sentiment analysis is probably one of the best-known areas of text mining. However, in recent years, as big data has risen in popularity, more areas of text classification have been explored. Perhaps the next task to catch on is emotion detection, the task of identifying emotions. This is because emotions are the finer-grained information that can be extracted from opinions, so besides writer sentiment, writer emotion is also valuable data. Emotion detection can be done using text, facial expressions, verbal communications and brain waves; however, the focus of this review is on text-based sentiment analysis and emotion detection. The internet has provided an avenue for the public to express their opinions easily. These expressions not only contain positive or negative sentiments, they contain emotions as well. These emotions can help in social behaviour analysis and in decision- and policy-making for companies and governments. Emotion detection can further support other tasks such as opinion mining and early depression detection. This review provides a comprehensive analysis of the shift in recent trends from text sentiment analysis to emotion detection and the challenges in these tasks. We summarize some of the recent works of the last five years and look at the methods they used. We also look at the models of emotion classes that are generally referenced. The trend of text-based emotion detection has shifted from early keyword-based comparisons to machine learning and deep learning algorithms that provide more flexibility to the task and better performance.

  • Rizal Setya PERDANA, Yoshiteru ISHIDA
    IEICE Transactions on Information and Systems
    2021, Volume E104.D, Issue 6, Pages 828-839
    Published: 2021/06/01
    Released: 2021/06/01
    Journal, free access

    Automatic generation of textual stories from visual data, known as visual storytelling, is a recent advancement in the image-to-text problem. Instead of using a single image as input, visual storytelling processes a sequential array of images into coherent sentences. A story contains non-visual concepts as well as descriptions of literal object(s). While previous approaches have applied external knowledge, our approach regards the non-visual concept as the semantic correlation between the visual modality and the textual modality. This paper therefore presents a new feature representation based on a canonical correlation analysis between the two modalities. An attention mechanism is adopted as the underlying architecture for the image-to-text problem, rather than standard encoder-decoder models. Canonical Correlation Attention Mechanism (CAAM), the proposed end-to-end architecture, extracts time-series correlation by maximizing the cross-modal correlation. Extensive experiments on the VIST dataset ( http://visionandlanguage.net/VIST/dataset.html ) were conducted to demonstrate the effectiveness of the architecture in terms of automatic metrics, with additional experiments showing the impact of the modality fusion strategy.
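
    A small sketch of the kind of objective the abstract describes: maximizing cross-modal correlation between projected visual and textual features, in the spirit of canonical correlation analysis. This illustrative per-dimension Pearson loss is an assumption, not the paper's CAAM.

```python
import torch

def correlation_loss(v, t, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between two
    batches of projected features v and t, each of shape (B, d)."""
    v = v - v.mean(dim=0)                    # center each feature dimension
    t = t - t.mean(dim=0)
    cov = (v * t).sum(dim=0)                 # per-dimension covariance (unnormalized)
    denom = v.norm(dim=0) * t.norm(dim=0) + eps
    return -(cov / denom).mean()             # minimizing this maximizes correlation

visual = torch.randn(32, 16, requires_grad=True)  # projected visual features
textual = torch.randn(32, 16)                     # projected textual features
loss = correlation_loss(visual, textual)
loss.backward()                                   # gradients flow to the projection
print(float(loss))
```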

  • 中野 幹生, 東中 竜一郎
    人工知能学会研究会資料 言語・音声理解と対話処理研究会
    2023, Volume 99
    Published: 2023/12/04
    Released: 2023/12/04
    Proceedings, authentication required

    Applying dialogue system technology to a variety of social problems requires interdisciplinary collaboration between experts in those problems and dialogue system experts. To ease this collaboration, it is effective to enable domain experts with no programming experience to build dialogue systems themselves. This presentation describes D4AC, a tool that makes it possible to build multimodal dialogue systems without writing any code. Used together with a text-based dialogue system construction tool such as xAIML-SUNABA, D4AC can build multimodal dialogue systems that adapt the dialogue flow using the age, gender, emotion, and dialogue engagement information obtained from the user's face image. D4AC can be installed, launched, and reconfigured without technical knowledge. We also report on the use of D4AC in student projects of the TMI program at Nagoya University.
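
    A toy Python illustration (not D4AC's actual interface) of the adaptation idea described above: branching a dialogue flow on attributes estimated from the user's face image. The attribute names, thresholds, and prompts are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class UserState:
    """Attributes assumed to be estimated from the user's face image."""
    age: int
    gender: str
    emotion: str       # e.g. "happy", "sad", "neutral"
    engagement: float  # 0.0 (disengaged) .. 1.0 (fully engaged)

def next_prompt(state: UserState) -> str:
    """Pick the next system utterance from the estimated user state."""
    if state.engagement < 0.3:
        return "Shall we take a short break?"      # re-engage the user
    if state.emotion == "sad":
        return "Is something bothering you?"       # empathetic branch
    return "Great, let's continue to the next topic."

print(next_prompt(UserState(age=21, gender="female", emotion="sad", engagement=0.8)))
```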
