Mathematical Linguistics
Online ISSN : 2433-0302
Print ISSN : 0453-4611
Paper (A)
An Attempt to Analysis of Formulaic Language in Written Japanese
Using the n-gram Model
Zhenjun Su
JOURNAL OPEN ACCESS

2026 Volume 35 Issue 4 Pages 224-239

Abstract

This study aims to clarify the types and usage frequencies of basic formulaic language in written Japanese by analyzing the publication sub-corpus of the Balanced Corpus of Contemporary Written Japanese. We used an n-gram model to analyze high-frequency n-grams ranging from 3-grams to 7-grams. The results reveal that the formulaic language common to the three genres consists largely of patterns and combinations of multiple content words, with 3-grams and 4-grams accounting for a significant proportion. Many of these expressions are categorized as collocations. In addition, formulaic language consisting of two or more content words tends to feature combinations of [noun/particle/verb], which express more detailed meanings. Finally, in terms of proportion, fixed expressions such as collocations account for a high percentage of the formulaic language, while semi-fixed expressions such as patterns are also present to a notable degree. These findings suggest that regularity and flexibility of vocabulary coexist in written Japanese.
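
The n-gram counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function name `count_ngrams`, the toy sentences, and the assumption that the text has already been segmented into tokens are all hypothetical, and the abstract does not state which segmentation unit or frequency threshold was used.

```python
from collections import Counter


def count_ngrams(sentences, n_min=3, n_max=7):
    """Count contiguous token n-grams for every n from n_min to n_max.

    `sentences` is an iterable of token lists. Tokenization is assumed
    to have been done beforehand (hypothetical preprocessing step).
    """
    counts = {n: Counter() for n in range(n_min, n_max + 1)}
    for tokens in sentences:
        for n in range(n_min, n_max + 1):
            # Slide a window of width n over the sentence; windows never
            # cross sentence boundaries because each sentence is separate.
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


# Toy, hand-segmented input for illustration only (not corpus data).
sentences = [
    ["こと", "が", "でき", "る"],
    ["見る", "こと", "が", "でき", "る"],
]

counts = count_ngrams(sentences)

# Most frequent 3-grams in the toy input, as (n-gram, frequency) pairs.
top_3grams = counts[3].most_common(2)
```

Each value in `counts` is a `Counter` keyed by token tuples, so `most_common` gives a frequency-ranked list for each n that could then be inspected and classified by hand, for example as collocations or patterns.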

© 2026 The Mathematical Linguistic Society of Japan

This article is licensed under a Creative Commons [Attribution-NonCommercial-NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ja