2026 Volume 35 Issue 4 Pages 224-239
This study aims to clarify the types and usage frequency of basic formulaic language in written Japanese by analyzing the publication sub-corpus of the Balanced Corpus of Contemporary Written Japanese. We used an n-gram model to analyze high-frequency n-grams ranging from 3-grams to 7-grams. The results reveal that the formulaic language common across the three genres consists largely of patterns and combinations of multiple content words, with 3-grams and 4-grams accounting for a significant proportion. Many of these expressions are categorized as collocations. In addition, formulaic language consisting of two or more content words tends to feature [noun/particle/verb] combinations, which express more detailed meanings. Finally, in terms of the proportion of formulaic language, fixed expressions such as collocations are found to account for a high percentage, while semi-fixed expressions such as patterns are also present to a notable degree. These findings suggest that regularity and flexibility of vocabulary coexist in written Japanese.
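The n-gram extraction step described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the text has already been split into tokens by a morphological analyzer, and the function names, the toy sentences, and the `min_count` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def count_ngrams(tokens, n_min=3, n_max=7):
    """Count every contiguous n-gram of length n_min..n_max in one token list."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequent_ngrams(sentences, min_count=2, n_min=3, n_max=7):
    """Return n-grams that occur at least min_count times across all sentences.

    N-grams are counted per sentence so that none spans a sentence boundary.
    min_count is an illustrative cutoff, not a value taken from the study.
    """
    total = Counter()
    for tokens in sentences:
        total.update(count_ngrams(tokens, n_min, n_max))
    return {gram: c for gram, c in total.items() if c >= min_count}


# Toy, pre-tokenized input (assumed output of a morphological analyzer).
sentences = [
    ["影響", "を", "与える", "こと", "が", "ある"],
    ["大きな", "影響", "を", "与える"],
]
result = frequent_ngrams(sentences, min_count=2)
```

In this toy input the only sequence that recurs is the [noun/particle/verb] 3-gram 影響 / を / 与える, the kind of combination the abstract singles out. Classifying the surviving n-grams as collocations or patterns is a separate analytical step that this sketch does not attempt.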