The increasing number of clinical texts, such as electronic medical records and discharge summaries, has led to ongoing research on natural language processing in the medical field. Because medical compound words appear frequently in clinical texts, in-depth semantic analysis and natural language inference (NLI) for such texts are challenging. In this study, we developed a semantic analysis and logical inference system for clinical texts, called Medc2l, by improving ccg2lambda, an integrated system that maps a sentence to a higher-order logical formula through syntactic analysis and semantic composition and performs inference between logical formulas. To handle compound words in clinical texts, we added a compound-word analysis module to ccg2lambda. The module first assigns semantic tags to the constituents of compound words using sequence-labeling models. Based on these tags, it derives Combinatory Categorial Grammar (CCG) trees for the compound words and obtains semantic representations that allow logical inference. In addition, we created an inference test set involving compound words in clinical texts. Experiments on this test set showed that our inference system achieved performance comparable or superior to that of deep learning-based NLI models. In particular, our system predicted labels accurately for non-entailment problems.