Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID : 3Rin4-08

Classification of Issue discussions in Open Source Software Projects using BERT or Automated ML
*Yuki YAMADA, Atsuo HAZEYAMA, Yutaro OGAWA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

(1) Purpose: Discovering and retrieving relevant information from lengthy documents, such as product defect reports, call-center chat histories, and meeting minutes, is a challenging task. A technique for identifying the information type of each sentence in a document is therefore important. We investigated which kinds of feature engineering are effective for this task and examined whether a BERT model is effective. As a corpus, we used issue discussions from open source software projects such as TensorFlow and scikit-learn. (2) Results: Training models with AutoML and computing global feature importance with SHAP showed that sentence length, position in the document, and the time between comments are important features. Limited fine-tuning of BERT, in which only the parameters of the final layer are trained, showed no significant difference in performance from ordinary logistic regression.
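As an illustration of the three features named in the results, the following is a minimal sketch of how they might be computed for each sentence of an issue thread. It is not the authors' pipeline: the `Comment` structure, the `extract_features` function, the feature names, whitespace tokenization, and the sample thread are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Comment:
    """One comment in an issue thread (hypothetical structure)."""
    posted_at: datetime
    sentences: List[str]


def extract_features(thread: List[Comment]) -> List[Dict[str, float]]:
    """Return one feature row per sentence, in thread order."""
    total = sum(len(c.sentences) for c in thread)
    rows: List[Dict[str, float]] = []
    index = 0
    previous_time = None
    for comment in thread:
        # Seconds elapsed since the previous comment; 0.0 for the first one.
        if previous_time is None:
            gap = 0.0
        else:
            gap = (comment.posted_at - previous_time).total_seconds()
        for sentence in comment.sentences:
            rows.append({
                # Sentence length in whitespace-separated tokens (assumption).
                "length_tokens": float(len(sentence.split())),
                # Position scaled to [0, 1] over the whole thread.
                "relative_position": index / (total - 1) if total > 1 else 0.0,
                "seconds_since_previous_comment": gap,
            })
            index += 1
        previous_time = comment.posted_at
    return rows


# Made-up two-comment thread, for illustration only.
thread = [
    Comment(datetime(2020, 1, 1, 9, 0),
            ["The build fails on Windows.", "Here is the traceback."]),
    Comment(datetime(2020, 1, 1, 10, 30),
            ["Can you share your compiler version?"]),
]
rows = extract_features(thread)
```

Rows of this shape could then be passed to any tabular classifier. Whether the study measured length in tokens or characters, or position within a comment or across the thread, is not stated in the abstract.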

© 2020 The Japanese Society for Artificial Intelligence