Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
32nd (2018)
Session ID : 1J2-01

A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions
*Sho YOKOI, Sosuke KOBAYASHI, Kenji FUKUMIZU, Kentaro INUI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Estimating pointwise mutual information (PMI), a well-known co-occurrence measure between linguistic expressions, involves a trade-off between learning time and robustness to data sparsity. We propose a new kernel-based co-occurrence measure, pointwise HSIC (PHSIC). Intuitively, PHSIC is a "smoothed PMI" obtained through kernels, which makes it robust to data sparsity; furthermore, its estimator reduces to an efficient linear-time matrix calculation. In our experiments, we apply PHSIC to a dialogue response selection task using sparse language data. The results show that learning is about 100 times faster than with a recurrent neural network-based PMI estimator; moreover, when the data set is small, its predictive performance hardly deteriorates, in contrast to that of PMI.
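
The abstract does not spell out the estimator, so the following is only a minimal sketch of what a covariance-based "smoothed" co-occurrence score computed by a linear-time matrix calculation might look like. It assumes that each expression is already represented by an explicit finite-dimensional feature vector (equivalent to using a linear kernel) and that the score of a pair is the bilinear form of the centered vectors with the empirical cross-covariance matrix. The function names `fit_cross_covariance` and `cooccurrence_score`, the feature dimensions, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_cross_covariance(X, Y):
    """Estimate the means and the cross-covariance of paired feature vectors.

    X has shape (n, dx) and Y has shape (n, dy); row i of X co-occurs with
    row i of Y. The cost is O(n * dx * dy), i.e. linear in the number of pairs.
    """
    mean_x = X.mean(axis=0)
    mean_y = Y.mean(axis=0)
    C = (X - mean_x).T @ (Y - mean_y) / X.shape[0]
    return mean_x, mean_y, C


def cooccurrence_score(x, y, mean_x, mean_y, C):
    """Score one pair: centered x, times cross-covariance, times centered y."""
    return float((x - mean_x) @ C @ (y - mean_y))


# Synthetic paired data (an assumption for illustration): Y is a noisy copy of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X + 0.1 * rng.normal(size=(200, 3))

mean_x, mean_y, C = fit_cross_covariance(X, Y)

probe = np.array([3.0, 0.0, 0.0])
aligned = cooccurrence_score(probe, probe, mean_x, mean_y, C)
opposed = cooccurrence_score(probe, -probe, mean_x, mean_y, C)
print("aligned pair:", aligned)
print("opposed pair:", opposed)
```

Under these assumptions, a pair that follows the dependence seen in the training data receives a positive score and a pair that contradicts it receives a negative one. Averaging the score over the training pairs gives the squared Frobenius norm of the cross-covariance matrix, which is the sense in which such a score is a pointwise contribution to an overall dependence measure. Because scoring a new pair only compares feature vectors, an unseen expression that lies close to seen ones still gets a sensible score, whereas a count-based PMI estimate would have nothing to go on.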

© 2018 The Japanese Society for Artificial Intelligence