A Large-scale Labeled English Text Dataset for Machine Learning: Case of Issue-based Information System

Jawad HAQBEEN; Sofia SAHAB; Takayuki ITO

doi:10.11517/pjsai.JSAI2023.0_1P5OS16b04

37th (2023)

Session ID: 1P5-OS-16b-04

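Conference information

Organizer: The Japanese Society for Artificial Intelligence

Conference: 2023 Annual Conference of the Japanese Society for Artificial Intelligence (37th)

Edition: 37

Venue: Kumamoto-Jo Hall + online

Dates: 2023/06/06 - 2023/06/09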

Keywords: annotation, text dataset, machine learning, deep learning, natural language processing

Abstract

Textual data has emerged as one of the fastest-growing data types on the internet. This growth has contributed to significant advances in Natural Language Processing (NLP) in recent years, driven primarily by Deep Learning (DL) and Machine Learning (ML) techniques. These methods typically require large amounts of labeled text data in a specific format and structure for training, for example text annotated according to a dialogue-mapping scheme. For instance, the node and link extractor models in D-Agree have been trained on text data annotated in Issue-based Information System (IBIS) notation. However, training such models in English has been difficult because preparing labeled IBIS datasets in English is laborious. In this study, we present a process for annotating and releasing a large quantity of IBIS-based training data for machine learning, giving researchers a freely available resource for training opinion extractor models in English.
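To make the idea of IBIS-labeled training data more concrete, the sketch below shows one possible way to represent annotated discussion posts and derive node labels and link pairs from them. It is a minimal illustration under stated assumptions, not the released dataset's schema or D-Agree's implementation: it assumes four node types (issue, idea, pro, con) and classic IBIS linking rules (ideas respond to issues, pros and cons attach to ideas, issues may question any node or stand as a root). The names `NodeType`, `LabeledPost`, `validate`, and `to_link_pairs`, and the example posts, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class NodeType(Enum):
    # Assumed IBIS node types; the released dataset may use different labels.
    ISSUE = "issue"
    IDEA = "idea"
    PRO = "pro"
    CON = "con"


# Assumed linking rules: which parent node types each node type may attach to.
ALLOWED_PARENT: Dict[NodeType, Set[NodeType]] = {
    NodeType.ISSUE: {NodeType.ISSUE, NodeType.IDEA, NodeType.PRO, NodeType.CON},
    NodeType.IDEA: {NodeType.ISSUE},
    NodeType.PRO: {NodeType.IDEA},
    NodeType.CON: {NodeType.IDEA},
}


@dataclass
class LabeledPost:
    # Hypothetical record layout for one annotated post.
    post_id: int
    text: str
    node_type: NodeType
    parent_id: Optional[int] = None  # None marks a root node


def validate(posts: List[LabeledPost]) -> List[str]:
    """Return annotation errors that violate the assumed IBIS linking rules."""
    by_id = {p.post_id: p for p in posts}
    errors: List[str] = []
    for p in posts:
        if p.parent_id is None:
            # Only an issue may start a discussion tree.
            if p.node_type is not NodeType.ISSUE:
                errors.append(f"post {p.post_id}: only an issue may be a root node")
            continue
        parent = by_id.get(p.parent_id)
        if parent is None:
            errors.append(f"post {p.post_id}: unknown parent {p.parent_id}")
        elif parent.node_type not in ALLOWED_PARENT[p.node_type]:
            errors.append(
                f"post {p.post_id}: {p.node_type.value} cannot link to {parent.node_type.value}"
            )
    return errors


def to_link_pairs(posts: List[LabeledPost]) -> List[Tuple[str, str, str]]:
    """Build (child text, parent text, link label) triples for a link extractor."""
    by_id = {p.post_id: p for p in posts}
    pairs: List[Tuple[str, str, str]] = []
    for p in posts:
        if p.parent_id is not None and p.parent_id in by_id:
            parent = by_id[p.parent_id]
            label = f"{p.node_type.value}->{parent.node_type.value}"
            pairs.append((p.text, parent.text, label))
    return pairs


# Hypothetical example discussion, annotated by hand.
posts = [
    LabeledPost(1, "How can we reduce traffic congestion downtown?", NodeType.ISSUE),
    LabeledPost(2, "Introduce a congestion charge.", NodeType.IDEA, 1),
    LabeledPost(3, "It would fund public transport.", NodeType.PRO, 2),
    LabeledPost(4, "It may burden low-income commuters.", NodeType.CON, 2),
]

if __name__ == "__main__":
    print(validate(posts))  # → []
    for child, parent, label in to_link_pairs(posts):
        print(label, "|", child, "|", parent)
```

In this sketch, each post's `node_type` would serve as the target for a node extractor, while the triples from `to_link_pairs` would serve as examples for a link extractor. Checking the linking rules before release is one way to catch annotation mistakes early, for instance a pro attached directly to an issue.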
