Humans develop the concept of an object by classifying it into a category, and they simultaneously acquire language through interaction with others. The meaning of a word can thus be learnt by connecting the recognized word with the corresponding concept. We consider this ability important for allowing robots to flexibly develop their knowledge of language and concepts, and we propose a method that enables robots to acquire such knowledge. Object concepts are formed by classifying multimodal information acquired from objects, and the language model is acquired from human speech describing object features. We propose a stochastic model of language and concepts, and knowledge is learnt by estimating the model parameters. The key point is that language and concepts are interdependent: the same words are likely to be uttered about objects in the same category, and objects described by the same words are likely to share the same features. By exploiting this relation, the proposed method improves the accuracy of both speech recognition and object classification. However, directly estimating the parameters of the proposed model is difficult because of the large number of parameters required. We therefore approximate the model and estimate its parameters using a nested Pitman--Yor language model and multimodal latent Dirichlet allocation to acquire the language and concepts, respectively. Experimental results show that the proposed method improves the accuracy of both speech recognition and object classification.
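The interdependence exploited above can be illustrated with a toy Bayesian sketch: an acoustic score over candidate words is reweighted by a word distribution conditioned on the object's inferred category, so the object concept disambiguates the speech. This is only a minimal illustration of the idea, not the paper's actual model; all function names, words, and probabilities below are invented for the example.

```python
def recognize(acoustic_scores, word_given_category, category_probs):
    """Combine acoustic evidence with a category-conditioned word model.

    P(w | audio, object) ∝ P(audio | w) * sum_c P(w | c) * P(c | object)
    All inputs are plain dicts; probabilities are renormalized at the end.
    """
    posterior = {}
    for w, acoustic in acoustic_scores.items():
        # Category-conditioned "language model" score for word w.
        lang = sum(word_given_category[c].get(w, 0.0) * p
                   for c, p in category_probs.items())
        posterior[w] = acoustic * lang
    z = sum(posterior.values())
    return {w: p / z for w, p in posterior.items()}

# Hypothetical example: the acoustics slightly favour "battle", but the
# robot has classified the object as a container, so "bottle" wins.
acoustic_scores = {"bottle": 0.45, "battle": 0.55}
word_given_category = {"container": {"bottle": 0.6},
                       "war": {"battle": 0.7}}
category_probs = {"container": 0.9, "war": 0.1}
posterior = recognize(acoustic_scores, word_given_category, category_probs)
```

In the full model this coupling runs both ways: recognized words also sharpen the object classification, which is what the joint parameter estimation captures.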
Predicting human activities is important for improving recommender systems and for analyzing social relationships among users. Such activities are usually represented as multi-object relationships (e.g. users tagging items, or users tweeting at particular locations). Since multi-object relationships are naturally represented as a tensor, tensor factorization is becoming increasingly important for predicting users' possible activities. However, its prediction accuracy is poor for ambiguous and/or sparsely observed objects. Our solution, Semantic data Representation for Tensor Factorization (SRTF), tackles these problems by incorporating semantics into tensor factorization based on the following ideas: (1) it first links objects to vocabularies/taxonomies, resolving the ambiguity caused by objects that can be used for multiple purposes; (2) it then links objects to composite classes that merge classes from different kinds of vocabularies/taxonomies (e.g. classes in vocabularies for movie genres and those for directors), avoiding the low prediction accuracy caused by coarse-grained semantics; (3) it finally lifts sparsely observed objects into their classes, alleviating the sparsity problem for rarely observed objects. To the best of our knowledge, this is the first study that leverages semantics to inject expert knowledge into tensor factorization. Experiments show that SRTF achieves up to 10% higher accuracy (lower RMSE) than state-of-the-art methods.
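The abstract does not spell out the underlying factorization, but the baseline it builds on can be sketched as a plain CP (CANDECOMP/PARAFAC) decomposition fitted by stochastic gradient descent: each (user, item, tag) entry is approximated by a sum of rank-one products, and unobserved entries are predicted from the learned factors. This is a generic illustration under those assumptions, not SRTF itself; the toy data and hyperparameters are invented.

```python
import random

def cp_predict(U, V, W, i, j, k):
    # Reconstruct one tensor entry from rank-R factor matrices:
    # X[i, j, k] ≈ sum_r U[i][r] * V[j][r] * W[k][r]
    return sum(U[i][r] * V[j][r] * W[k][r] for r in range(len(U[0])))

def cp_factorize(entries, shape, rank=2, lr=0.02, epochs=2000, seed=0):
    """Fit CP factors to observed (i, j, k, value) entries by SGD."""
    random.seed(seed)
    I, J, K = shape
    U = [[random.uniform(0.2, 0.8) for _ in range(rank)] for _ in range(I)]
    V = [[random.uniform(0.2, 0.8) for _ in range(rank)] for _ in range(J)]
    W = [[random.uniform(0.2, 0.8) for _ in range(rank)] for _ in range(K)]
    for _ in range(epochs):
        for i, j, k, x in entries:
            err = cp_predict(U, V, W, i, j, k) - x
            for r in range(rank):
                u, v, w = U[i][r], V[j][r], W[k][r]
                # Gradient of 0.5 * err**2 w.r.t. each factor entry.
                U[i][r] -= lr * err * v * w
                V[j][r] -= lr * err * u * w
                W[k][r] -= lr * err * u * v
    return U, V, W

# Toy data: entries of an exactly rank-1 tensor, so a low-rank fit exists.
a, b, c = [0.5, 1.0, 1.5], [1.0, 0.5, 1.0], [0.5, 1.0, 0.5]
entries = [(i, j, k, a[i] * b[j] * c[k])
           for i in range(3) for j in range(3) for k in range(3)]
U, V, W = cp_factorize(entries, shape=(3, 3, 3))
sse = sum((cp_predict(U, V, W, i, j, k) - x) ** 2 for i, j, k, x in entries)
```

SRTF's contribution sits on top of such a factorization: linking objects to (composite) taxonomy classes and lifting rare objects into their classes changes what the indices above range over, which is how it addresses ambiguity and sparsity.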