Proceedings of the Fuzzy System Symposium
Session ID : 2C3-3
Extraction of features to identify a speaker in videos using deep learning
*Minori Omura, Yutaka Matsushita, Junnosuke Suzumori
Abstract

This study examines whether discriminant analysis or neural networks are more effective at predicting utterance versus non-utterance, using lip features as explanatory variables. First, the maximum amplitude and frequency derived from the lip-movement wave, together with the coordinates of four fixed points on the lips, are defined as feature values. For the coordinates, three cases are considered: using both the x- and y-coordinates, only the x-coordinate, and only the y-coordinate. Second, these feature values are applied to discriminant analysis and to neural networks to predict utterance versus non-utterance. The results show that a neural network using only the y-coordinates of the lips as explanatory variables achieves high prediction accuracy.
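The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values here are synthetic stand-ins for the paper's lip features (maximum amplitude and frequency of the lip-movement wave plus lip-point coordinates), and scikit-learn's linear discriminant analysis and a small feed-forward network are assumed as generic realizations of the two classifier families.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's features: per frame window, the maximum
# amplitude and frequency of the lip-movement wave, plus the y-coordinates of
# four fixed lip points (6 features total). Utterance frames get larger values.
n = 400
speaking = rng.integers(0, 2, size=n)            # 1 = utterance, 0 = no utterance
amplitude = rng.normal(loc=1.0 + speaking, scale=0.4, size=n)
frequency = rng.normal(loc=2.0 + speaking, scale=0.5, size=n)
lip_y = rng.normal(loc=speaking[:, None] * 0.8, scale=0.5, size=(n, 4))
X = np.column_stack([amplitude, frequency, lip_y])
y = speaking

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Discriminant-analysis baseline
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)

# Small feed-forward neural network
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)

print(f"LDA accuracy: {lda.score(X_te, y_te):.2f}")
print(f"MLP accuracy: {mlp.score(X_te, y_te):.2f}")
```

On real data, the same loop would be repeated for the three coordinate cases (x and y, x only, y only) to reproduce the paper's comparison.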

© 2023 Japan Society for Fuzzy Theory and Intelligent Informatics