Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
System Paper (Peer-Reviewed)
Building and Leveraging Domain-specific Pre-trained Models to Support Japanese News Summarization
Shotaro Ishihara, Eiki Murata, Yasufumi Nakama, Hiromu Takahashi

2024 Volume 31 Issue 4 Pages 1717-1745

Abstract

This study presents an editing support system based on domain-specific pre-trained models for summarizing Japanese news articles. Specifically, we organized the real-world system requirements, presented an editing support system built by combining existing technologies, and identified the evaluation points to be investigated. First, we pre-trained and fine-tuned T5 models on Japanese financial news corpora to reproduce a specific writing style and observed that they outperformed general models on the headline and three-line summary generation tasks, despite the smaller size of the training corpus. Second, we quantitatively and qualitatively analyzed the hallucinations produced by the domain-specific T5 models to reveal their characteristics. Finally, we discussed the usefulness of the overall system, including domain-specific BERT models for predicting click-through rates.
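The headline and three-line summary tasks described above are standard sequence-to-sequence generation. As a rough illustration of how a fine-tuned T5 model of this kind would be applied at inference time, the sketch below uses Hugging Face Transformers; it is not the authors' code, and the checkpoint name is hypothetical (the paper's domain-specific models, trained on financial news corpora, are not assumed to be public).

```python
# Minimal sketch (not the authors' implementation): generating a headline
# for a Japanese news article with a fine-tuned T5 model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint name, used only for illustration.
MODEL_NAME = "example-org/t5-japanese-news-headline"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

article = "..."  # body text of a Japanese news article

# Encode the article, truncating to a typical maximum input length.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

# Decode with beam search; a short max_length suits headline generation.
summary_ids = model.generate(
    **inputs,
    num_beams=4,
    max_length=48,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same pattern applies to the three-line summary task, with a longer generation length and a checkpoint fine-tuned on that target format.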

© 2024 The Association for Natural Language Processing