Article ID: 2024EDL8107
Compared to general object detection problems, the detection of mathematical expressions (MED) in document images has its own challenges, like the small size of inline formulas, the rich set of mathematical symbols, and the similarity between variables and normal text characters. To deal with those challenges, we transform the multi-class MED task into a multi-label semantic segmentation problem. With a basic encoder-decoder structure of 3.9 million parameters and trained from scratch, our proposed MEDNet model can achieve top detection performance on three public datasets: TFD2019, Marmot, and IBEM2021. MEDNet is especially effective in detecting small formulas when achieving the F1 score of 95.40% for the inline and 95.82% for all expressions on the test set of the IBEM2021 competition data.