2018, Volume 25, Issue 3, Pages 295-324
Learning semantic textual relatedness is a core research subject in natural language processing. Vector-based models are often used to compute sentence representations from words or predicate-argument structures, but these models cannot capture the semantics of sentences accurately and consistently. Conversely, logical semantic representations can capture sentence semantics in depth and with much greater accuracy, but their symbolic nature does not offer graded notions of textual similarity. We propose a method for learning semantic textual relatedness that combines shallow features with features extracted from natural deduction proofs of bidirectional entailment relations between sentence pairs. To produce the natural deduction proofs, we use ccg2lambda, a higher-order automatic inference system that converts Combinatory Categorial Grammar (CCG) derivation trees into semantic representations and conducts natural deduction proofs. We evaluate our system on two major NLP tasks: learning semantic textual similarity (STS) and recognizing textual entailment (RTE). Our experiments demonstrate that our approach outperforms other logic-based systems and achieves high accuracy on the RTE task with the SICK dataset. Our evaluation also demonstrates that the features derived from the proofs are effective for learning semantic textual relatedness, and we quantify the contribution of these features to overall performance.
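To make the feature-combination idea concrete, the sketch below pairs a shallow word-overlap feature with features from proof attempts in both directions (A entails B, and B entails A) and trains a regressor on gold similarity scores. It is a minimal illustration only: the try_proof stub, the exact feature set, and the random forest learner are assumptions made here for a runnable example, not the paper's pipeline, which drives the proving step with ccg2lambda's CCG-based natural deduction prover.

"""Minimal sketch: shallow + bidirectional proof features for similarity.

Not the authors' implementation. `try_proof` is a hypothetical stand-in
for a ccg2lambda proof attempt; in the real system, both sentences would
be parsed to CCG derivations, converted to logical forms, and proved by
natural deduction.
"""

from sklearn.ensemble import RandomForestRegressor


def try_proof(premise: str, hypothesis: str) -> dict:
    # Hypothetical stub: pretend the proof succeeds when every hypothesis
    # word appears in the premise, and report a fake proof-step count so
    # the sketch runs end to end without a theorem prover.
    prem, hyp = set(premise.split()), set(hypothesis.split())
    covered = len(prem & hyp)
    return {"proved": covered == len(hyp),
            "steps": max(1, len(hyp) - covered)}


def extract_features(a: str, b: str) -> list:
    # Shallow feature plus proof features for both entailment directions.
    fwd, bwd = try_proof(a, b), try_proof(b, a)
    wa, wb = set(a.split()), set(b.split())
    jaccard = len(wa & wb) / max(len(wa | wb), 1)  # shallow word overlap
    return [jaccard,
            float(fwd["proved"]), float(fwd["steps"]),   # A |- B
            float(bwd["proved"]), float(bwd["steps"])]   # B |- A


# Toy training pairs with gold similarity scores in [1, 5] (invented data).
pairs = [("a man is playing a guitar", "a man is playing a guitar", 5.0),
         ("a man is playing a guitar", "a woman is slicing an onion", 1.2)]

X = [extract_features(a, b) for a, b, _ in pairs]
y = [score for _, _, score in pairs]

model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
print(model.predict([extract_features("a man plays a guitar",
                                      "a man is playing a guitar")]))

Running proofs in both directions is what turns a binary entailment judgment into a graded signal: mutual entailment suggests near-paraphrase, one-directional entailment suggests partial overlap, and failure in both directions suggests unrelated sentences.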