Abstract
Instead of relying on human debuggers, an AI could detect bugs in a game video, i.e., identify the frames of a play video that are more likely to contain bugs, by feeding a “bug text” that describes an in-video bug, together with each frame of the play video, into OpenAI’s CLIP and computing the language-image similarity. However, existing studies on CLIP-based bug detection have the problem that the bug text must be written manually. This paper therefore prototypes and validates a method that uses object detection, WordNet, and ConceptNet to automatically generate the “bug text” that is input to CLIP along with each frame of the play video, where a bug means that an object in the game video violates the laws of physics.
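
The frame-scoring step described above can be illustrated with a minimal sketch; this is not the paper’s implementation, and it assumes the Hugging Face transformers CLIP API and hypothetical example bug texts:

```python
# Minimal sketch of scoring one video frame against candidate bug texts
# with CLIP. Not the paper's implementation; the bug texts below are
# hypothetical stand-ins for the automatically generated ones.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical physics-violation descriptions (the paper generates these
# from object detection, WordNet, and ConceptNet).
bug_texts = [
    "a car floating in the air",
    "a character stuck inside a wall",
]
frame = Image.open("frame_0001.png")  # one frame extracted from the play video

inputs = processor(text=bug_texts, images=frame, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image[0, j] is the similarity between the frame and bug text j;
# the best-matching bug text gives the frame's bug-likelihood score.
bug_score = outputs.logits_per_image.max().item()
print(bug_score)
```

Ranking frames of the play video by such a score would surface the frames most likely to contain bugs for human review.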