This study explores the effectiveness of fine-tuning the Llama 3.1-8B model to generate plain-Japanese texts in medical contexts. A training dataset was constructed from representative Japanese plain-language medical materials, and the Llama 3.1-8B model was fine-tuned on it. The model's performance in generating texts for medical scenarios was then evaluated by analyzing the generated content and by metrics including accuracy, F1 score, the ROUGE series, and BLEU-4. The results show that, despite a small amount of inaccurate content and occasional grammatical errors, nearly 80% of the output is reasonable, accurately expressed, concise, and easy to understand. The model performs well on short-sentence generation tasks but responds less well to instructions for long-text generation. These findings indicate that plain-Japanese research based on model fine-tuning is feasible, though the scale and quality of the training data must be optimized to improve accuracy and stability.