Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 3S1-OS-7b-04

Training Dataset for Japanese Simplification in Medical Domain
*Koki HORIGUCHI, Tomoyuki KAJIWARA, Takashi NINOMIYA, Shoko WAKAMIYA, Eiji ARAMAKI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

We release a large-scale parallel corpus for medical text simplification in Japanese. The corpus can be used to train text simplification models that paraphrase medical terms into expressions patients can readily understand. To address the low-resource problem for this task in Japanese, we automatically extracted 17,300 semantically equivalent sentence pairs from the professional and consumer versions of articles in online medical dictionaries. We compared several Japanese sentence embedding models and extracted sentence pairs for simplification from article pairs by embedding-based bipartite graph matching. Experiments on Japanese text simplification tasks in four domains showed that models trained on our medical text simplification corpus achieved high performance in medical domains.
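
The abstract names embedding-based bipartite graph matching but does not give the details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each sentence has already been turned into a vector by some sentence embedding model, scores every professional/consumer sentence pair by cosine similarity, and picks the one-to-one assignment with the highest total similarity. The function name `match_sentences`, the `threshold` value, and the toy vectors are all hypothetical, and the brute-force search is suitable only for very small articles; a real pipeline would likely use a polynomial-time assignment solver.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_sentences(pro_vecs, con_vecs, threshold=0.8):
    """Align sentences of a professional article with a consumer article.

    pro_vecs / con_vecs: lists of sentence embeddings (lists of floats).
    Returns sorted (pro_index, con_index, similarity) triples taken from a
    maximum-weight one-to-one matching, keeping only pairs whose similarity
    reaches `threshold` (a hypothetical cutoff, not a value from the paper).
    """
    sim = [[cosine(p, c) for c in con_vecs] for p in pro_vecs]
    n_pro, n_con = len(pro_vecs), len(con_vecs)

    # Assign every sentence of the shorter article to a distinct sentence
    # of the longer one; `swap` records which side is the shorter.
    swap = n_pro > n_con
    rows, cols = (n_con, n_pro) if swap else (n_pro, n_con)

    def score(r, c):
        return sim[c][r] if swap else sim[r][c]

    # Brute-force search over all assignments (toy sizes only).
    best = max(
        permutations(range(cols), rows),
        key=lambda perm: sum(score(r, c) for r, c in enumerate(perm)),
    )

    pairs = []
    for r, c in enumerate(best):
        i, j = (c, r) if swap else (r, c)
        if sim[i][j] >= threshold:
            pairs.append((i, j, sim[i][j]))
    return sorted(pairs)


if __name__ == "__main__":
    # Toy embeddings standing in for real sentence vectors.
    professional = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    consumer = [[0.0, 0.9, 0.1], [1.0, 0.1, 0.0]]
    for i, j, s in match_sentences(professional, consumer):
        print(f"professional[{i}] <-> consumer[{j}]  similarity={s:.3f}")
```

Because the matching is one-to-one, a professional sentence with no close consumer counterpart is simply left unpaired once the threshold is applied, which is one plausible way to keep only semantically equivalent pairs.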

© 2024 The Japanese Society for Artificial Intelligence