Natural language processing (NLP) is expected to serve as one of the interfaces that strengthens the tight connection between physical and virtual spaces in the digital twin, and its practical application in the field of civil engineering is under consideration. For this technology to function fully, language models must understand civil engineering terminology in context, and the models must be evaluated appropriately. However, most previous studies have focused on adapting the technology to the civil engineering field and have not addressed the evaluation of language models' capability to generate sentences. Therefore, this study aims to establish a metric for evaluating the capability of language models in the field of civil engineering. As a first step, we created a new evaluation dataset and compared which of the existing metrics are appropriate for the civil engineering field. Finally, we discuss the issues involved in designing an automatic evaluation metric.