In this paper, we investigate a problem in Japanese word sense disambiguation (WSD) through a Hiragana-Kanji conversion task. To choose which words to use as features, we propose a method that employs word embeddings and pointwise mutual information, and we evaluate it experimentally. The experimental results suggest that our method is more effective than other methods using word embeddings. We also conduct an experiment on the SemEval 2010 Japanese WSD Task, where the proposed method achieves better accuracy. We further compare accuracy while varying the amount of training data and find that the difference in accuracy between the methods becomes small when a very large amount of training data is used. This confirms that methods which improve accuracy with less training data are important in WSD, because the number of sentences required to obtain high accuracy increases exponentially. We also experiment with the domain of the data and confirm that using datasets that match the ambiguity of each domain is important for improving accuracy.
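As a rough illustration of the pointwise mutual information (PMI) named above, the following sketch scores how strongly each context word is associated with each candidate Kanji reading, using the standard definition PMI(w, s) = log2(p(w, s) / (p(w) p(s))) estimated from sentence-level counts. The abstract does not specify how PMI is combined with word embeddings, so this is not the proposed method; the function name `pmi_table`, the toy samples, and the sense labels are hypothetical.

```python
import math
from collections import Counter


def pmi_table(samples):
    """Return {(word, sense): PMI in bits} from (sense, context_words) pairs.

    Probabilities are estimated from sentence-level counts; a word is counted
    at most once per sentence. Hypothetical helper, for illustration only.
    """
    word_counts = Counter()
    sense_counts = Counter()
    pair_counts = Counter()
    n = 0
    for sense, words in samples:
        n += 1
        sense_counts[sense] += 1
        for w in set(words):
            word_counts[w] += 1
            pair_counts[(w, sense)] += 1
    # PMI = log2( p(w, s) / (p(w) * p(s)) ) = log2( c * n / (c_w * c_s) )
    return {
        (w, s): math.log2(c * n / (word_counts[w] * sense_counts[s]))
        for (w, s), c in pair_counts.items()
    }


# Toy data (assumed): the Hiragana "hashi" converted to "bridge" or "chopsticks".
samples = [
    ("bridge", ["river", "cross"]),
    ("bridge", ["river", "build"]),
    ("chopsticks", ["eat", "rice"]),
    ("chopsticks", ["eat", "river"]),
]
table = pmi_table(samples)
# Rank candidate feature words for one sense by their PMI score.
ranked = sorted(
    (w for (w, s) in table if s == "chopsticks"),
    key=lambda w: table[(w, "chopsticks")],
    reverse=True,
)
```

Words with high PMI for a sense (here "eat" and "rice" for "chopsticks") would be natural candidates for features, while words that occur with both senses (here "river") receive lower scores.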
Learning semantic textual relatedness is a core research subject in natural language processing. Vector-based models are often used to compute sentence representations from words or predicate-argument structures, but these models cannot capture semantics with consistent accuracy. Conversely, logical semantic representations can capture sentence semantics in depth and with much greater accuracy, but their symbolic nature does not offer graded notions of textual similarity. We propose a method for learning semantic textual relatedness by combining shallow features with features extracted from natural deduction proofs of bidirectional entailment relations between sentence pairs. For the natural deduction proofs, we use ccg2lambda, a higher-order automatic inference system that converts Combinatory Categorial Grammar (CCG) derivation trees into semantic representations and conducts natural deduction proofs. We evaluate our system on two major NLP tasks: learning textual similarity and recognizing textual entailment (RTE). Our experiments demonstrate that our approach can outperform other logic-based systems and that it achieves high performance on the RTE task using the SICK dataset. Our evaluations also demonstrate that features derived from the proofs are effective for learning semantic textual relatedness, and we quantify our contribution to the research area.