Abstract
We study automatic simplification of Japanese conversational utterances
(single-turn spoken-style inputs) into easier Japanese intended to support older
adults. Because parallel corpora for spoken-style simplification are scarce, we adopt
a two-stage optimization: supervised fine-tuning (SFT) to obtain an initial policy,
followed by reinforcement learning with Proximal Policy Optimization (PPO) [2] to
better match conversational input distributions. For PPO, we use conversational utterances
extracted from the Nagoya University Conversation Corpus (NUCC) [10]
and additionally include “difficultified” conversational variants to strengthen robustness
to conversational difficulty factors. PPO is trained with a composite reward that balances meaning preservation and naturalness against lexical/syntactic simplicity, with penalties for conversational difficulty factors. We evaluate outputs via paired comparisons between SFT and PPO on the same prompts. On valid paired outputs (N = 964), PPO tends to shorten outputs (median Δlength ratio = −0.041; median Δoutput length = −4 characters), while meaning-related scores decrease slightly on average (median ΔBERTScore Precision = −0.004, Recall = −0.015) [3]. We further analyze BERTScore changes binned by the PPO/SFT output-length ratio and observe that Precision can decrease even when PPO outputs are shorter, suggesting sensitivity to surface-form changes. For robustness, we analyze invalid outputs over N = 1000 paired generations; PPO reduces the invalid-output rate from 3.5% to 2.1%. An exact McNemar test indicates this reduction is statistically significant (two-sided p = 5.19 × 10⁻⁴; one-sided p = 2.59 × 10⁻⁴).
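The paired length statistics above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function name `paired_length_stats` is hypothetical, and it assumes the length ratio is defined as output characters divided by input characters, with Δ taken as PPO minus SFT per prompt.

```python
import statistics

def paired_length_stats(inputs, sft_outputs, ppo_outputs):
    """Per-prompt deltas between PPO and SFT outputs (hypothetical helper).

    Length ratio = len(output) / len(input) in characters;
    Δlength ratio = PPO ratio - SFT ratio; Δoutput length in characters.
    Returns the medians over all prompts.
    """
    d_ratio, d_len = [], []
    for src, sft, ppo in zip(inputs, sft_outputs, ppo_outputs):
        d_ratio.append(len(ppo) / len(src) - len(sft) / len(src))
        d_len.append(len(ppo) - len(sft))
    return statistics.median(d_ratio), statistics.median(d_len)

# Toy example (illustrative strings, not corpus data):
med_ratio, med_len = paired_length_stats(
    ["今日はとても良い天気ですね"],   # input (13 chars)
    ["今日はいい天気です"],           # SFT output (9 chars)
    ["いい天気です"],                 # PPO output (6 chars)
)
print(med_ratio, med_len)  # negative values: PPO output is shorter
```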
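The exact McNemar test on paired valid/invalid outcomes can be sketched as below. The discordant counts `b = 15, c = 1` are not taken from the paper; they are one pair of counts consistent with the reported p-values, shown here only to illustrate the computation.

```python
from math import comb

def mcnemar_exact(b: int, c: int):
    """Exact (binomial) McNemar test on discordant pair counts.

    b: pairs where the SFT output is invalid but the PPO output is valid
    c: pairs where the SFT output is valid but the PPO output is invalid
    Under H0, the b + c discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    # One-sided p: P(X <= min(b, c)) under Binomial(n, 0.5).
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    two_sided = min(1.0, 2 * one_sided)
    return one_sided, two_sided

one_sided, two_sided = mcnemar_exact(b=15, c=1)
print(f"{one_sided:.2e} {two_sided:.2e}")  # → 2.59e-04 5.19e-04
```

Concordant pairs (both valid or both invalid) do not enter the test; only the 16 discordant pairs determine the p-value in this illustration.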