Studies in the Japanese Language
Online ISSN : 2189-5732
Print ISSN : 1349-5119
[Notes and Discussion]
Word Alignment between The Tales of the Heike (Amakusa Edition) and its Source Using Word Vectors
Yūho KITAZAKI

2025 Volume 21 Issue 2 Pages 53-61

Abstract

Creating parallel corpora from texts that share a plot but come from different historical periods supports efficient diachronic comparison and quantitative analysis. This paper examines methods for automatic word alignment of historical Japanese texts, focusing on The Tales of the Heike (Amakusa Edition) and the source text from which its colloquial translation was made.

A straightforward approach to word alignment is to compare lemma strings by edit distance, but this method has difficulty identifying “substitution relationships between different words,” that is, places where one text uses a different word in place of another. To address this limitation, we employ Word2Vec, a word-vector model that represents semantic similarity between words numerically, which enables more accurate alignment than edit distance alone.
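The contrast between the two cost measures can be illustrated with a minimal sketch. It assumes both texts are already lemmatized into lists of strings and aligns them with a Needleman-Wunsch-style dynamic program; this alignment procedure, the gap cost of 0.3, the lemmas, and the three-dimensional vectors in `TOY_VECTORS` are all hypothetical choices for illustration, not the paper's method or data. `mosu` and `iu` are used because both are Japanese verbs meaning "to say" yet share almost no characters. In a real setup the vectors would come from a trained Word2Vec model (for example, a gensim `KeyedVectors` object, whose `similarity` method returns cosine similarity).

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def edit_cost(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1] by the longer lemma."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def cosine(u, v) -> float:
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Hypothetical toy vectors; a real setup would look these up in a trained Word2Vec model.
TOY_VECTORS = {
    "kiyomori": [0.0, 0.0, 1.0],
    "mosu": [0.9, 0.1, 0.0],
    "iu": [0.85, 0.15, 0.05],
    "koto": [0.1, 0.9, 0.0],
}

def vector_cost(a: str, b: str, vectors=TOY_VECTORS) -> float:
    """1 - cosine similarity; falls back to edit_cost for out-of-vocabulary lemmas."""
    if a == b:
        return 0.0
    if a not in vectors or b not in vectors:
        return edit_cost(a, b)
    return 1.0 - cosine(vectors[a], vectors[b])

def align(src, tgt, cost, gap=0.3):
    """Global alignment of two lemma lists; returns (src, tgt) pairs, None marks a gap."""
    n, m = len(src), len(tgt)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + cost(src[i - 1], tgt[j - 1]),  # pair the two lemmas
                          D[i - 1][j] + gap,   # source lemma has no counterpart
                          D[i][j - 1] + gap)   # target lemma has no counterpart
    # Trace back from the bottom-right cell to recover the chosen operations.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(D[i][j], D[i - 1][j - 1] + cost(src[i - 1], tgt[j - 1])):
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(D[i][j], D[i - 1][j] + gap):
            pairs.append((src[i - 1], None))
            i -= 1
        else:
            pairs.append((None, tgt[j - 1]))
            j -= 1
    return pairs[::-1]

src = ["kiyomori", "mosu", "koto"]   # hypothetical lemmas, source side
tgt = ["kiyomori", "iu", "koto"]     # hypothetical lemmas, Amakusa side
print(align(src, tgt, edit_cost))
# → [('kiyomori', 'kiyomori'), (None, 'iu'), ('mosu', None), ('koto', 'koto')]
print(align(src, tgt, vector_cost))
# → [('kiyomori', 'kiyomori'), ('mosu', 'iu'), ('koto', 'koto')]
```

With this gap cost, the normalized edit distance between `mosu` and `iu` (3/4 = 0.75) exceeds the cost of two gaps, so the edit-distance alignment leaves both words unpaired, while the vector-based cost pairs them as a substitution.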

© 2025 The Society for Japanese Linguistics