2024 Volume 31 Issue 4 Pages 1691-1716
This study clarifies how the domain adaptation of bidirectional encoder representations from transformers (BERT) contributes to the syntactic analysis of mathematical texts, and what its limitations are. Experimental results show that the domain adaptation of BERT is highly effective even with a relatively small amount of raw in-domain data: it improves the accuracy of syntactic dependency analysis by up to four points without any annotated in-domain data. By analyzing the improvement, we found that many errors involving mathematical expressions were corrected. Errors related to structures that are infrequent in the out-of-domain fine-tuning data, however, were difficult to correct through the domain adaptation of BERT alone. This study also revealed that the effectiveness of BERT depends on how mathematical expressions are represented in the input. Among several representations of mathematical expressions, the highest dependency accuracy was achieved by a simple method in which an entire mathematical expression is replaced with a dedicated special token.
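The best-performing representation mentioned above can be illustrated with a minimal preprocessing sketch. This is not the paper's implementation: the `$...$` delimiter convention and the `[MATH]` token name are assumptions made here for illustration.

```python
import re

MATH_TOKEN = "[MATH]"  # hypothetical dedicated special token

def replace_math(text: str) -> str:
    """Replace each $...$-delimited mathematical expression with a single
    special token, so the parser sees one opaque unit per expression."""
    return re.sub(r"\$[^$]+\$", MATH_TOKEN, text)

print(replace_math("Let $x^2 + y^2 = 1$ be the unit circle."))
# prints "Let [MATH] be the unit circle."
```

In a BERT-based pipeline, such a token would typically be registered with the tokenizer as an additional special token so it is never split into subwords.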