Analysis of Tender Documents Using Sequence Labeling with LLM-based Improver

Tomoki ITO; Hiroki SAKAJI

doi:10.11517/pjsai.JSAI2024.0_2Q5IS101

Abstract

Bidders often spend a long time reading and understanding tender documents, because these documents are generally long and require specialized knowledge. Consequently, there is strong demand for a function that extracts specific items (an item extractor) and a function that highlights words or phrases related to specific items (a word-phrase highlighter). Developing such functions requires solving two problems. The first concerns the annotated dataset. The second concerns the BERT-based sequence labeling approach when only a small training dataset is available. To address the first problem, we created two sequence labeling datasets, one for the Item Extractor and one for the Word-Phrase Highlighter. To address the second, we propose an Information Extraction (IE) method that combines (1) a supervised learning approach using BERT-based sequence labeling and (2) a large language model (LLM)-based improver. Experimental evaluation demonstrates the effectiveness of our approach. Moreover, as an application, we developed a web application system called Tender Document Analyzer (TDDA).
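The abstract does not specify the tagging scheme or the improver's interface, so the following is only a minimal sketch under stated assumptions. It assumes the sequence labeler emits per-token BIO tags (`B-LABEL`, `I-LABEL`, `O`), decodes them into labeled spans, and then passes the candidate spans to an optional improver callable. The names `bio_to_spans`, `extract_items`, and `strip_punct_improver`, as well as the labels `DEADLINE` and `PLACE`, are hypothetical. The improver here is a simple rule-based stand-in, not an LLM call; it only marks where an LLM-based refinement step could plug in.

```python
from typing import Callable, List, Optional, Tuple

# (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]
# (extracted text, label)
Candidate = Tuple[str, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into (start, end, label) spans (assumed scheme)."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # Close any open span and open a new one; a stray I- is treated as B-.
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag == "O":
            # Outside any entity: close the open span, if there is one.
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
        # Otherwise the tag is I- with the current label, so the span continues.
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def extract_items(
    tokens: List[str],
    tags: List[str],
    improver: Optional[Callable[[List[str], List[Candidate]], List[Candidate]]] = None,
) -> List[Candidate]:
    """Turn tagged tokens into candidates, then optionally refine them (hypothetical)."""
    candidates = [(" ".join(tokens[s:e]), lab) for s, e, lab in bio_to_spans(tags)]
    if improver is not None:
        candidates = improver(tokens, candidates)
    return candidates


def strip_punct_improver(tokens: List[str], candidates: List[Candidate]) -> List[Candidate]:
    """Rule-based stand-in for an LLM-based improver: trims stray punctuation."""
    cleaned = [(text.strip(" .,:;"), lab) for text, lab in candidates]
    return [(text, lab) for text, lab in cleaned if text]


if __name__ == "__main__":
    tokens = ["Deadline", ":", "May", "31", ",", "Tokyo", "office"]
    # The labeler over-extends DEADLINE to include the comma.
    tags = ["O", "O", "B-DEADLINE", "I-DEADLINE", "I-DEADLINE", "B-PLACE", "I-PLACE"]
    print(extract_items(tokens, tags))
    print(extract_items(tokens, tags, improver=strip_punct_improver))
```

Keeping the improver as a plain callable separates span decoding from refinement, so a rule-based, LLM-based, or no-op improver can be swapped in without changing the labeler output format.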
