IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Interactive Learning of Spoken Words and Their Meanings Through an Audio-Visual Interface
Naoto IWAHASHI
Author information
JOURNAL FREE ACCESS

2008 Volume E91.D Issue 2 Pages 312-321

Details
Abstract
This paper presents a new interactive learning method for spoken word acquisition through human-machine audio-visual interfaces. During the course of learning, the machine makes a decision about whether an orally input word is a word in the lexicon the machine has learned, using both speech and visual cues. Learning is carried out on-line, incrementally, based on a combination of active and unsupervised learning principles. If the machine judges with a high degree of confidence that its decision is correct, it learns the statistical models of the word and a corresponding image category as its meaning in an unsupervised way. Otherwise, it asks the user a question in an active way. The function used to estimate the degree of confidence is also learned adaptively on-line. Experimental results show that the combination of active and unsupervised learning principles enables the machine and the user to adapt to each other, which makes the learning process more efficient.
Content from these authors
© 2008 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top