Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
We release a large-scale parallel corpus for medical text simplification in Japanese. The corpus can be used to train text simplification models that paraphrase medical terms into expressions patients can readily understand. To address the scarcity of Japanese resources for this task, we automatically extracted 17,300 semantically equivalent sentence pairs from the professional and consumer versions of articles in online medical dictionaries. We compared several Japanese sentence embedding models and extracted sentence pairs for simplification from each article pair by embedding-based bipartite graph matching. Experiments on Japanese text simplification tasks in four domains showed that models trained on our medical text simplification corpus achieved high performance in medical domains.
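As a rough illustration of embedding-based bipartite graph matching, the sketch below aligns the sentences of one professional article with those of its consumer counterpart. It is not the authors' implementation. The cosine similarity measure, the Hungarian-style assignment solver, the threshold value of 0.8, and the function names `cosine`, `max_weight_matching`, and `align_sentences` are all assumptions made for this example, and the toy vectors stand in for the output of a real sentence embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def max_weight_matching(sim):
    """Maximum-weight one-to-one matching on a similarity matrix.

    Uses the classic O(n^3) Hungarian algorithm on negated similarities.
    Returns a sorted list of (row, col) index pairs.
    """
    if not sim or not sim[0]:
        return []
    transposed = len(sim) > len(sim[0])
    if transposed:  # the solver below needs rows <= cols
        sim = [list(col) for col in zip(*sim)]
    n, m = len(sim), len(sim[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row (1-based) matched to column j
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -sim[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    pairs = [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]
    if transposed:
        pairs = [(c, r) for r, c in pairs]
    return sorted(pairs)


def align_sentences(pro_vecs, con_vecs, threshold=0.8):
    """Match professional to consumer sentences; keep pairs above threshold."""
    sim = [[cosine(a, b) for b in con_vecs] for a in pro_vecs]
    return [(i, j, sim[i][j]) for i, j in max_weight_matching(sim)
            if sim[i][j] >= threshold]


# Toy embeddings (hypothetical): 3 professional and 2 consumer sentences.
pro = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
con = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0]]
pairs = align_sentences(pro, con)
```

Solving the assignment over the whole article pair, rather than greedily taking the best consumer sentence for each professional sentence, keeps the alignment one-to-one. The threshold then discards matched pairs whose similarity is too low to be treated as semantically equivalent, such as a professional sentence with no counterpart in the consumer article.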