2024 Volume 31 Issue 4 Pages 1691-1716
This study clarifies how the domain adaptation of bidirectional encoder representations from transformers (BERT) contributes to the syntactic analysis of mathematical texts, and what its limitations are. Experimental results show that the domain adaptation of BERT is highly effective even with a relatively small amount of raw in-domain data: it improves the accuracy of syntactic dependency analysis by up to four points without any annotated in-domain data. By analyzing the improvement, we found that many errors involving mathematical expressions were corrected. Errors related to structures that are infrequent in the out-of-domain fine-tuning data, however, were difficult to correct through the domain adaptation of BERT alone. This study also revealed that the effectiveness of BERT depends on how mathematical expressions are represented in the input. Among several representations of mathematical expressions, the highest dependency accuracy was achieved by a simple method in which an entire mathematical expression is replaced with a dedicated special token.
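The best-performing representation mentioned above can be illustrated with a minimal preprocessing sketch. This is not the paper's implementation: the `$...$` delimiter convention and the `[MATH]` token name are assumptions made here for illustration.

```python
import re

MATH_TOKEN = "[MATH]"  # hypothetical dedicated special token

def replace_math(text: str) -> str:
    """Replace each $...$-delimited mathematical expression with a single
    special token, so the parser sees one opaque unit per expression."""
    return re.sub(r"\$[^$]+\$", MATH_TOKEN, text)

print(replace_math("Let $x^2 + y^2 = 1$ be the unit circle."))
# prints "Let [MATH] be the unit circle."
```

In a BERT-based pipeline, such a token would typically be registered with the tokenizer as an additional special token so it is never split into subwords.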