2024 Volume 21 Pages 107-125
Since the 2010s, research on accounting fraud detection models has increasingly drawn not only on financial indicators but also on textual features obtained through text analysis. This review focuses on studies that constructed accounting fraud detection models using the text of Form 10-K, and surveys eight studies published from 2010 to 2020, with attention to the feature extraction process and the detection accuracy of the models.
Summarizing the results of these previous studies, the review identifies the following five issues: 1) Studies using the “bag of words” approach face challenges in interpreting and theorizing why features contribute to the detection of accounting fraud; 2) Textual features and financial indicators are complementary in detecting accounting fraud, and future research should examine which financial indicators are highly complementary to textual features; 3) Prior research has not clarified whether features extracted from the whole Form 10-K detect accounting fraud more accurately than features extracted from a specific section such as the MD&A section; 4) Few studies use methods other than matched sampling to select fraudulent and non-fraudulent cases; 5) More studies using time series data are needed.