Abstract
Recently, generic forum boards such as Bulletin Board Systems (BBS) have seen an increase in illegal and harmful activities, especially BBS whose purpose is to exchange user contact IDs for use in private chat applications (so-called “ID exchange BBS”). On such BBS, the transcription of words is often jargonized or intentionally modified, making it difficult to extract information using standard tools. In this study, we first examine the typology of harmful jargonized expressions on ID exchange BBS. Based on this typology, we propose a method for handling the intentional transcription modifications on ID exchange BBS in a robust way. For the evaluation, we developed a system applying the proposed method and verified its performance in estimating the harmfulness of BBS entries. We further improved the system by applying an automatic sentence pattern extraction method to separate harmful from non-harmful entries. The experiment confirmed that the proposed method could eliminate most of the erroneous transcription modifications.