Journal of the Acoustical Society of Japan (日本音響学会誌)
Online ISSN : 2432-2040
Print ISSN : 0369-4232
An On-line Question-Answering System by Speech
好田 正紀, 中津 良平, 鹿野 清宏, 伊藤 憲三

1978, Vol. 34, No. 3, pp. 194-203

Abstract

Recently, research on Speech Understanding Systems (SUS) has attracted great interest as a new approach to continuous speech recognition. The SUS concept has three distinguishing features: (1) the content of the conversation is restricted to a defined domain; (2) emphasis is placed on understanding the meaning and content of the input speech rather than on recognizing each word or phrase; and (3) the input speech is recognized through question-answering between a computer and a user. This paper describes an SUS that the authors studied from 1974 to 1976 and that can operate in on-line mode. The task performed by the system is a train seat reservation service covering 28 stations and 181 trains. Table 2 shows the seven reservation items. The input vocabulary consists of 112 words. As shown in Fig. 1, the system consists of three parts: the acoustic processor, the linguistic processor, and the audio response unit. Figure 2 illustrates the computer system on which the question-answering system is implemented. The acoustic processor and the audio response unit are implemented on a NEAC 3200/70, and the linguistic processor on a PF U-400. High-speed speech processors connected to the NEAC 3200/70 and high-speed data transmission between the two computers make on-line processing possible. The detailed construction of the system is shown in Fig. 3. The acoustic processor performs feature extraction and phoneme recognition and represents the results as a phoneme lattice. The linguistic processor grasps the meaning and content of the input speech through word recognition, syntactic analysis, and inference. Response sentences are then composed according to the recognition results, and the audio response unit synthesizes these sentences as the response to the user.
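
The hand-off from the phoneme lattice to word recognition can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes a fixed segmentation with exactly one phoneme per lattice segment, and the names `score_word`, `recognize_word`, `LEXICON`, and `LATTICE`, as well as all scores, are hypothetical.

```python
from typing import Dict, List

# One dict per time segment: candidate phoneme -> score (illustrative values).
Lattice = List[Dict[str, float]]


def score_word(lattice: Lattice, phonemes: List[str]) -> float:
    """Average score of a word's phonemes, aligned one phoneme per segment.

    Returns 0.0 when the word is empty or its length differs from the
    lattice length; a phoneme absent from a segment's candidates adds 0.0.
    """
    if not phonemes or len(phonemes) != len(lattice):
        return 0.0
    total = sum(seg.get(p, 0.0) for seg, p in zip(lattice, phonemes))
    return total / len(phonemes)


def recognize_word(lattice: Lattice, lexicon: Dict[str, List[str]]) -> str:
    """Return the lexicon word with the highest average lattice score."""
    return max(lexicon, key=lambda word: score_word(lattice, lexicon[word]))


# Hypothetical two-word lexicon (assumption: not the paper's 112-word vocabulary).
LEXICON = {
    "tokyo": ["t", "o", "k", "y", "o"],
    "kyoto": ["k", "y", "o", "t", "o"],
}

# Hypothetical noisy lattice for an utterance of "tokyo".
LATTICE = [
    {"t": 0.6, "k": 0.3},
    {"o": 0.9},
    {"k": 0.7, "t": 0.2},
    {"y": 0.8},
    {"o": 0.9},
]

if __name__ == "__main__":
    print(recognize_word(LATTICE, LEXICON))
```

Keeping several scored candidates per segment, rather than a single best phoneme, lets the later word-level stage recover from local acoustic confusions; a real system would also have to handle insertions, deletions, and variable segment boundaries, which this sketch ignores.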
Input speech to the system must have short pauses of more than 0.5 sec between adjacent phrases. Apart from this constraint, a user may speak freely to the system, without being restricted by the order of the reservation items or by grammar. A conversation model was prepared so that the computer and the user can carry out smooth and natural question-answering. Table 3 shows the seven states of the conversation model, for each of which particular response sentences are prepared. Figure 4 shows the transitions among these states. Inference using the timetable is executed during the transitions among states, which helps reduce the number of question-answering cycles. The output speech of the system is synthesized using words or phrases as units. For this purpose, 23 sentence patterns and 460 words or phrases to be inserted into these patterns are prepared. The performance of the system was tested in on-line question-answering experiments. Eight male speakers attempted a total of 320 seat reservations (40 per speaker), and 99.1% of the reservations were completed successfully. The average number of question-answering cycles needed to complete a reservation, excluding the first input, was 3.21. The detailed analysis of the question-answering exchanges in Table 5 reveals that re-input due to rejection or misrecognition was seldom needed. These results show that the system operates fairly well in the on-line question-answering mode. The average time for acoustic and linguistic processing is 5.0 times real time. Figure 6 shows an example of a processing time chart.
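
The combination of order-free input, timetable inference, and pattern-based responses can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the paper: the slot names in `SLOTS`, the contents of `TIMETABLE`, the single `QUESTION_PATTERN`, and the functions `update` and `next_question` are all hypothetical, and the paper's seven reservation items and seven conversation states are not reproduced here.

```python
from typing import Dict, Optional

# Hypothetical reservation items (assumption: the paper's Table 2 lists seven).
SLOTS = ["date", "train", "from_station", "to_station", "seats"]

# Hypothetical timetable: train name -> stations served, in travel order.
TIMETABLE = {
    "hikari 1": ["tokyo", "nagoya", "kyoto", "osaka"],
    "kodama 3": ["tokyo", "shizuoka", "nagoya"],
}

# One hypothetical sentence pattern with a slot for the item being requested.
QUESTION_PATTERN = "Please tell me the {item}."


def update(slots: Dict[str, object], recognized: Dict[str, object]) -> Dict[str, object]:
    """Merge newly recognized items, in any order, then apply timetable inference."""
    merged = {**slots, **recognized}
    origin = merged.get("from_station")
    destination = merged.get("to_station")
    if merged.get("train") is None and origin and destination:
        # Keep trains that serve both stations in the right direction.
        candidates = [
            name
            for name, stops in TIMETABLE.items()
            if origin in stops
            and destination in stops
            and stops.index(origin) < stops.index(destination)
        ]
        # Fill the train only when the timetable leaves a single candidate,
        # which saves one question-answering cycle.
        if len(candidates) == 1:
            merged["train"] = candidates[0]
    return merged


def next_question(slots: Dict[str, object]) -> Optional[str]:
    """Return a question for the first missing item, or None when complete."""
    for name in SLOTS:
        if slots.get(name) is None:
            return QUESTION_PATTERN.format(item=name.replace("_", " "))
    return None


if __name__ == "__main__":
    state = update({}, {"to_station": "kyoto", "from_station": "tokyo"})
    print(state.get("train"), "|", next_question(state))
```

In this sketch the user never states the train: once both stations are known and only one timetable entry connects them, the train slot is filled by inference and the next question moves on to a different item.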

© 1978 The Acoustical Society of Japan