Abstract
We study automatic simplification of Japanese conversational utterances
(single-turn spoken-style inputs) into easier Japanese intended to support older
adults. Because parallel corpora for spoken-style simplification are scarce, we adopt
a two-stage optimization: supervised fine-tuning (SFT) to obtain an initial policy,
followed by reinforcement learning with Proximal Policy Optimization (PPO) [2] to
better match conversational input distributions. For PPO, we use conversational utterances
extracted from the Nagoya University Conversation Corpus (NUCC) [10]
and additionally include “difficultified” conversational variants to strengthen robustness
to conversational difficulty factors. PPO is trained with a composite reward that balances meaning preservation and naturalness against lexical/syntactic simplicity, with penalties for conversational difficulty factors. We evaluate outputs via paired comparisons between SFT and PPO on the same prompts. On valid paired outputs (N = 964), PPO tends to shorten outputs (median Δlength ratio = −0.041; median Δoutput length = −4 characters), while meaning-related scores decrease slightly on average (median ΔBERTScore Precision = −0.004, Recall = −0.015) [3]. We further analyze BERTScore changes binned by the PPO/SFT output-length ratio and observe that Precision can decrease even when PPO outputs are shorter, suggesting sensitivity to surface-form changes. For robustness, we analyze invalid outputs over N = 1000 paired generations; PPO reduces the invalid-output rate from 3.5% to 2.1%. An exact McNemar test indicates this reduction is statistically significant (two-sided p = 5.19 × 10⁻⁴; one-sided p = 2.59 × 10⁻⁴).
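The paired length statistics above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function name `paired_length_stats` is hypothetical, and it assumes the length ratio is defined as output characters divided by input characters, with Δ taken as PPO minus SFT per prompt.

```python
import statistics

def paired_length_stats(inputs, sft_outputs, ppo_outputs):
    """Per-prompt deltas between PPO and SFT outputs (hypothetical helper).

    Length ratio = len(output) / len(input) in characters;
    Δlength ratio = PPO ratio - SFT ratio; Δoutput length in characters.
    Returns the medians over all prompts.
    """
    d_ratio, d_len = [], []
    for src, sft, ppo in zip(inputs, sft_outputs, ppo_outputs):
        d_ratio.append(len(ppo) / len(src) - len(sft) / len(src))
        d_len.append(len(ppo) - len(sft))
    return statistics.median(d_ratio), statistics.median(d_len)

# Toy example (illustrative strings, not corpus data):
med_ratio, med_len = paired_length_stats(
    ["今日はとても良い天気ですね"],   # input (13 chars)
    ["今日はいい天気です"],           # SFT output (9 chars)
    ["いい天気です"],                 # PPO output (6 chars)
)
print(med_ratio, med_len)  # negative values: PPO output is shorter
```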
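The exact McNemar test on paired valid/invalid outcomes can be sketched as below. The discordant counts `b = 15, c = 1` are not taken from the paper; they are one pair of counts consistent with the reported p-values, shown here only to illustrate the computation.

```python
from math import comb

def mcnemar_exact(b: int, c: int):
    """Exact (binomial) McNemar test on discordant pair counts.

    b: pairs where the SFT output is invalid but the PPO output is valid
    c: pairs where the SFT output is valid but the PPO output is invalid
    Under H0, the b + c discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    # One-sided p: P(X <= min(b, c)) under Binomial(n, 0.5).
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    two_sided = min(1.0, 2 * one_sided)
    return one_sided, two_sided

one_sided, two_sided = mcnemar_exact(b=15, c=1)
print(f"{one_sided:.2e} {two_sided:.2e}")  # → 2.59e-04 5.19e-04
```

Concordant pairs (both valid or both invalid) do not enter the test; only the 16 discordant pairs determine the p-value in this illustration.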