Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 1N4-OS-18-01

Building a Machine Translation Dataset to Support Coptic Language Education and Revitalization Movement
*So MIYAGAWA

Abstract

This study aimed to create a comprehensive machine translation dataset for the Bohairic dialect of Coptic, in support of both liturgical use within the Coptic Orthodox Church and the broader language revitalization movement. By digitizing a wide range of Bohairic texts, we assembled a dataset of over 400,000 Bohairic Coptic tokens distributed across 27,900 Bohairic Coptic-English translation pairs. The dataset was designed to train models based on the OPUS-MT framework, which are integrated into the Coptic Translator platform to provide accurate and accessible translations. The project demonstrates an application of digital humanities to linguistic preservation and provides a resource for computational linguistics, contributing to ongoing efforts to revitalize and maintain the Bohairic dialect of Coptic.

© 2024 The Japanese Society for Artificial Intelligence