A Proposal for Sentence Classification of Issue Discussions in OSS Projects Using BERT and AutoML

Yuki Yamada; Atsuo Hazeyama; Yutaro Ogawa

doi:10.11517/pjsai.JSAI2020.0_3Rin408

34th (2020)

Session ID : 3Rin4-08

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : 34th Annual Conference, 2020

Number : 34

Location : Online

Date : June 09, 2020 - June 12, 2020

Classification of Issue discussions in Open Source Software Projects using BERT or Automated ML

*Yuki YAMADA, Atsuo HAZEYAMA, Yutaro OGAWA

Keywords: BERT, Automated ML, OSS, Natural Language Processing

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

(1) Purpose: Discovering and retrieving relevant information from lengthy documents, such as product defect reports, call center chat histories, and meeting minutes, is a challenging task. A technique that identifies the information type of each sentence in a document is therefore important. We investigated which types of feature engineering are effective for this task and examined whether the BERT model is effective. As a corpus, we used Issue discussions from open source software projects such as TensorFlow and scikit-learn. (2) Results: Training models with AutoML and computing global feature importance with SHAP showed that sentence length, position in the document, and the time between comments are important features. Limited fine-tuning of BERT, in which only the parameters of the final layer are trained, showed no significant difference in performance from ordinary logistic regression.
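The three features the abstract reports as important (sentence length, position in the document, and time between comments) could be derived per sentence roughly as in the sketch below. The abstract does not say how the authors computed them, so everything here is an assumption made for illustration: the `Comment` class, the naive punctuation-based sentence splitter, the whitespace token count, and the feature names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
import re


@dataclass
class Comment:
    """One comment in an Issue thread (hypothetical structure)."""
    posted_at: datetime
    body: str


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_features(comments):
    """Return one feature dict per sentence, in thread order."""
    rows = []
    previous_time = None
    for comment in comments:
        # Seconds elapsed since the previous comment; 0.0 for the first one.
        if previous_time is None:
            gap = 0.0
        else:
            gap = (comment.posted_at - previous_time).total_seconds()
        previous_time = comment.posted_at
        for sentence in split_sentences(comment.body):
            rows.append({
                "sentence": sentence,
                # Length measured in whitespace-separated tokens.
                "length_tokens": len(sentence.split()),
                "seconds_since_previous_comment": gap,
            })
    # Relative position in the whole thread: 0.0 = first, 1.0 = last sentence.
    last_index = max(len(rows) - 1, 1)
    for index, row in enumerate(rows):
        row["relative_position"] = index / last_index
    return rows


# Made-up example thread, not taken from the corpus used in the paper.
thread = [
    Comment(datetime(2020, 1, 1, 12, 0), "It crashes on import. Any ideas?"),
    Comment(datetime(2020, 1, 1, 12, 30), "Please share the full traceback."),
]
for row in sentence_features(thread):
    print(row)
```

Rows of this shape could then be passed, together with information-type labels, to whichever tabular learner or AutoML tool is in use, and a global importance score could be obtained by averaging absolute SHAP values per feature.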
