Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
The objective of this study was to create a comprehensive machine translation dataset for the Bohairic dialect of Coptic, in order to support both liturgical use within the Coptic Orthodox Church and the broader language revitalization movement. By digitizing a wide range of Bohairic texts, we assembled a dataset of over 400,000 Bohairic Coptic tokens distributed across 27,900 Bohairic Coptic-English translation pairs. The dataset was designed specifically to train models based on the OPUS-MT framework, which are integrated into the Coptic Translator platform to provide accurate and accessible translations. This project demonstrates the application of digital humanities to linguistic preservation and also provides a valuable resource for computational linguistics, contributing to ongoing efforts to revitalize and maintain the Bohairic dialect of the Coptic language.
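The abstract does not state how the translation pairs are stored. The following is a minimal sketch that assumes the pairs are exported as two line-aligned plain-text files, one segment per line. This is the layout commonly read by MarianNMT, the toolkit used to train OPUS-MT models. The function name `write_parallel_corpus`, the `train.cop`/`train.en` file names, and the whitespace-based token count are all hypothetical choices made for illustration. A whitespace count will not necessarily reproduce the token figure reported above.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_parallel_corpus(
    pairs: Iterable[Tuple[str, str]],
    out_dir: str,
    src_lang: str = "cop",  # assumed suffix for Coptic (ISO 639-3 code)
    tgt_lang: str = "en",
) -> Tuple[int, int]:
    """Hypothetical exporter: write (Bohairic, English) pairs to two
    line-aligned files and return (pairs_written, source_tokens)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    src_path = out / f"train.{src_lang}"
    tgt_path = out / f"train.{tgt_lang}"

    n_pairs = 0
    n_src_tokens = 0
    with src_path.open("w", encoding="utf-8") as f_src, \
         tgt_path.open("w", encoding="utf-8") as f_tgt:
        for src, tgt in pairs:
            # Collapse all whitespace (including embedded newlines) so that
            # each segment occupies exactly one line and alignment is kept.
            src = " ".join(src.split())
            tgt = " ".join(tgt.split())
            if not src or not tgt:
                continue  # drop pairs with an empty side
            f_src.write(src + "\n")
            f_tgt.write(tgt + "\n")
            n_pairs += 1
            # Assumption: tokens are approximated by whitespace splitting.
            n_src_tokens += len(src.split())
    return n_pairs, n_src_tokens
```

Line i of the source file is the translation partner of line i of the target file. For that reason, the sketch normalizes newlines away and skips pairs with an empty side instead of writing a blank line that could be mistaken for a valid segment.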