Aligning the Amakusa Edition of The Tales of the Heike with Its Source Text Using Word Vectors

北﨑 勇帆

doi:10.20666/nihongonokenkyu.21.2_53

Abstract

Building parallel corpora from texts that share the same plot but come from different historical periods makes diachronic comparison and quantitative analysis more efficient. This paper examines methods for automatic word alignment of historical Japanese texts, focusing on The Tales of the Heike (Amakusa Edition), a vernacular rendering, and the source text from which it was translated.

A straightforward approach to word alignment is to compute the edit distance between lemma strings. This method, however, has difficulty identifying substitution relationships between different words, that is, cases where a word in the source is replaced by a different word in the translation. To address this limitation, we use Word2Vec, a word vector model that represents semantic similarity between words numerically, which enables more accurate alignment than plain edit distance.
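The abstract does not spell out the alignment procedure, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It runs a Levenshtein-style alignment over lemma sequences and, when vectors are enabled, replaces the uniform substitution cost for two different lemmas with 1 minus their cosine similarity. The vectors in `VECTORS`, the example lemmas, and the helpers `cosine`, `sub_cost`, and `align` are all hypothetical; in practice the vectors would come from a Word2Vec model trained on lemmatized text.

```python
import math

# HYPOTHETICAL toy vectors standing in for a trained Word2Vec model.
# Real vectors would be learned from lemmatized historical Japanese text.
VECTORS = {
    "言う": [0.9, 0.1, 0.0],
    "申す": [0.8, 0.2, 0.1],
    "武士": [0.1, 0.9, 0.2],
    "侍": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sub_cost(a, b, use_vectors):
    """Cost of pairing lemma a with lemma b (0.0 means identical)."""
    if a == b:
        return 0.0
    if use_vectors and a in VECTORS and b in VECTORS:
        # Semantically similar lemmas become cheap to substitute.
        return min(1.0, max(0.0, 1.0 - cosine(VECTORS[a], VECTORS[b])))
    # Plain edit distance: every mismatch costs the same.
    return 1.0


def align(src, tgt, use_vectors=True, gap=1.0):
    """Return (total cost, aligned pairs); None marks an unaligned lemma."""
    n, m = len(src), len(tgt)
    # d[i][j] = minimum cost of aligning src[:i] with tgt[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1], use_vectors),
                d[i - 1][j] + gap,  # lemma present only in the source
                d[i][j - 1] + gap,  # lemma present only in the translation
            )
    # Trace back from the bottom-right cell, preferring the diagonal step.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
            d[i][j], d[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1], use_vectors)
        ):
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(d[i][j], d[i - 1][j] + gap):
            pairs.append((src[i - 1], None))
            i -= 1
        else:
            pairs.append((None, tgt[j - 1]))
            j -= 1
    return d[n][m], pairs[::-1]


if __name__ == "__main__":
    src = ["武士", "が", "言う"]  # hypothetical source lemmas
    tgt = ["侍", "申す"]  # hypothetical translation lemmas
    print(align(src, tgt, use_vectors=False)[1])
    # [('武士', None), ('が', '侍'), ('言う', '申す')]
    print(align(src, tgt, use_vectors=True)[1])
    # [('武士', '侍'), ('が', None), ('言う', '申す')]
```

With a uniform substitution cost, pairing 武士 with 侍 and pairing が with 侍 cost exactly the same, so the result depends only on the backtrace's tie-breaking order, and here it picks the wrong pair. The vector-weighted cost makes the semantically close pairs 武士/侍 and 言う/申す nearly free, so the intended alignment becomes the unique minimum.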
