人工知能学会研究会資料言語・音声理解と対話処理研究会

同時通訳者の発話理解における主題情報の構造化

石塚浩之

原稿種別: 研究会資料
p. 01-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_01

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

Through the qualitative analysis of a transcript from an English-Japanese simultaneous interpreting performance, this study explores the actuality of the interpreter's mental operations in utterance comprehension. The transcript prepared for this study is a set of parallel texts, which represents temporal correspondence between the source utterances and the interpreter's translation. This study focuses on how the interpreter structures and retains topical information, and on how she uses it throughout the rest of her performance. The analysis suggests that the interpreter's mental representation is not simply an accumulation of linguistic information received from the source speech, but a complex that can include an implicit structure as a result of cognitive operations such as pragmatic inferences and the construction of mental models.

抄録全体を表示

PDF形式でダウンロード (662K)
音声対話システムにおける\\発話の誤分割修復要否判定のユーザ適応

堀田尚希, 駒谷和範, 佐藤理史, 中野幹生

原稿種別: 研究会資料
p. 02-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_02

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

A spoken dialogue system should respond quickly after a user finishes speaking, but this often causes incorrect segmentation of user utterances by erroneous voice activity detection. We previously developed a method that performs a posteriori restoration for the incorrectly segmented utterances. A crucial part of the method is to classify whether the restoration is required or not. In this paper, we improve the accuracy by adapting the classification to each user. We focus on speaking tempo of each user, which can be obtained during dialogues. We reveal a correlation between each user's tempos and their appropriate thresholds used in the classification. We then derive a linear regression function that converts the tempos into the thresholds. We adapt two classifiers: that simply using a threshold and decision tree learning. Experimental results showed the proposed user adaptation for the two classifiers improved the classification accuracies by 3.3% and 2.1%.

抄録全体を表示

PDF形式でダウンロード (1008K)
母語（日本語）における話者交替タイミングの獲得

市川熹, 川端良子, 菊池英明, 堀内靖雄, 黒岩眞吾

原稿種別: 研究会資料
p. 03-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_03

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

In the dialog among the mother tongue talkers, overlap utterances are done in transition-relevance places (TRP). This appearance seems to appear as the result of some capability which makes the mental burden of the dialog by the mother tongue light. We examined the age to win this capability about the Japanese mother tongue talkers. Last year, it did an analysis to the 6 year-old nursery school children. As a result, it found that it was already won. This time, it analyzed the dialog of the 5 year-old kindergartner. A difference among individuals was seen among the nursery school children to the acquisition.

抄録全体を表示

PDF形式でダウンロード (1202K)
傾聴対話における相槌形態と先行発話の統語構造の関係の分析

山口貴史, 井上昂治, 吉野幸一郎, 高梨克也, 河原達也

原稿種別: 研究会資料
p. 04-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_04

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

We investigate the relationship between backchannels and the syntactic structure in the delimited preceding utterances in attentive listening such as counseling. First, we find out the relationship between particular patterns of backchannels and the category of the clause boundary. Next, we analyze the syntactic structure by using the depth of the syntax tree and the number of cases related to the end of utterance. It is shown that there is a relationship between particular patterns of backchannels and the complexity of the preceding utterances. The results suggest that we can choose different kinds of backchannels depending on the preceding utterance.

抄録全体を表示

PDF形式でダウンロード (581K)
模擬遠隔聴診における相互行為パターンのマルチモーダル・マルチチャネル連鎖分析

高梨克也, 堀謙太, 内藤知佐子, 黒田知宏

原稿種別: 研究会資料
p. 05-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_05

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

In tele-auscultation, since a doctor cannot operate an auscultator by herself, she must refer a target point by a marker and have a helper in a distant place move an auscultator by proxy. This article analyzes a simulated tele-auscultation experiment and proposes an interactional pattern observed in the process of multimodal communication from pointing by marker by a doctor, operation of an auscultator by a helper to auscultation by the doctor. This pattern is then considered in terms of division of transmission between multi-channels of a system for tele-auscultation and a tele-conference system for conversation. At the end, problems on the system environment found in the experiment are addressed.

抄録全体を表示

PDF形式でダウンロード (741K)
認知症者のための会話支援エージェントの開発

蛇穴祐稀, 今渕貴志, プリマオキディッキ A., 伊藤久祥, 安田清

原稿種別: 研究会資料
p. 06-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_06

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

In this study, we developed an interactive conversational agent software to ameliorate the sympthon of dimentia patients. This software works as a speech therapy tool, which acts as a conversation partner to a patient. We defined three sets of reminiscent questions into the software. Each set containes 15 questions. The software utilizes constrained local model (CLM) and voice detections to determine the utterances of patients. Once the CLM recognizes a patient's facial landmarks, it starts to ask him using the pre-defined questions. The software will continue to ask using subsequent questions when it doesn't detect utterances from either distance changes between mouth landmarks or changes of voice of the patient. Our experiments show that the voice detection solely enables utterance detections in a low environmental noise while the CLM succeeds to detect utterances regardless of the environmental noise.

抄録全体を表示

PDF形式でダウンロード (7102K)
仮想的演技空間の創出方略の分析～同一ネタの漫才とコントの違いを通じて～

土肥健太, 寺岡丈博, 榎本美香

原稿種別: 研究会資料
p. 07-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_07

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

To reveal methods of making effective pauses and communicating as a character in a comedy skit, we have analyzed strategies for creating a histrionic, i.e., exaggerated and overly theatrical, comic performance that is often observed in comedy skits. We compared a manzai performance, which is considered to be a realistic comedy skit, with a comedy skit that had a histrionic performance by the comedy duo ``Sandwich Man'' doing the same material, in order to investigate the differences in the inter-utterance structure and posture-configuration structure of the two comedy styles. The results for the inter-utterance structure indicated that there were differences in the pauses between utterances, but there were no differences in the speech rates (sec/mora) between the comedy styles. Additionally, we found that in the posture-configuration structure, one of the performers turned his face toward his partner's face for a longer time in the histrionic comedy skit than that in the manzai skit. Also, in the histrionic comedy skit, the sum of the performers' shoulder-widths was shorter as seen by the audience than that in the manzai performance. Therefore, we concluded that utterance pauses and the performers' shoulder-widths are important factors in creating a histrionic comic show.

抄録全体を表示

PDF形式でダウンロード (874K)
物理的環境に埋め込まれた多人数インタラクションとしての指揮に伴う言語行為分析

白土峻平, 寺岡丈博, 榎本美香

原稿種別: 研究会資料
p. 08-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_08

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

The purpose of this paper is demonstrate a general pattern of sequences of speech acts between a commander and a large number of receivers carrying out the commanded action. We analyze interaction data in which multi-participants collaboratively drag huge trees (about 18M) from a mountain hill to a village for the fire festival at Nozawa Onsen in Nagano. In the result, we reveal that the basic sequence is lined with `a command to start an action', `an acceptance of the command', `a rallying cry to start the action', and `a responding cry to start the action. In a smooth commanding, `a command to end the action' is put at the end of the sequence.

抄録全体を表示

PDF形式でダウンロード (1754K)
身体に刻み込まれた記憶 -音声から手話へ，手話から触手話へ-

坊農真弓

原稿種別: 研究会資料
p. 09-
発行日: 2015/03/05
公開日: 2021/06/28

DOIhttps://doi.org/10.11517/jsaislud.73.0_09

会議録・要旨集フリー

抄録を表示する抄録を非表示にする

This study offers a critique of representationalist theories of cognition by observing how embodied actions, such as speakers' mouth movements during speech and listeners' nodding to indicate a collaborative attitude, are encoded as bodily memories. This paper draws on a corpus-based micro-analysis of multimodal interaction using sign language and tactile sign language and considers two phenomena: (1) the use of mouthing during sign language interaction, and (2) the use of nodding and backchannel cues during tactile sign language interaction. In analysis 1, I found that native signers used mouthing in ways that resembled its original function (e.g., for conveying images of unknown words in their minds). In analysis 2, I found examples in which, at early stages of using tactile sign language, deafblind individuals with congenital deafness used nodding and backchannel cues similar to a visual signer's. However, deafblind individuals with a long history of tactile signing shifted drastically toward a more tactile modality for expressing backchannel cues. As a result of these observations, I apply insights from research regarding embodied actions to communication involving sign language and tactile sign language.

抄録全体を表示

PDF形式でダウンロード (7127K)

J-STAGEへの登録はこちら（無料）