2021 Volume 50 Issue 4 Pages 614-618
In this project, we are developing a system that removes objects, logos, annotations, and noise from videos with as little human intervention as possible. The system combines two existing methods: SiamMask and a video completion method. The user specifies a target object by drawing a bounding box around it in the first frame of the video, and this bounding box is given to SiamMask as input. SiamMask then tracks the target object and produces its mask in each frame. The resulting masks are passed to the video completion method, which produces the final result. The goal is for the user to obtain the video completion result immediately after drawing the bounding box. However, the masks produced by our current method are not always perfect, and when imperfections arise, the user still has to correct the mask manually using image editing software. As an application of this method, we also propose a method for generating a highly accurate mask of the target object based on the difference between the input video and the restoration result.
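The difference-based mask idea can be illustrated with a minimal sketch. The abstract states only that the mask is derived from the difference between the input video and the restoration result. The per-pixel absolute difference, the fixed threshold, and the names `difference_mask` and `threshold` below are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def difference_mask(original, restored, threshold=30):
    """Return a boolean mask of pixels that differ strongly between two frames.

    original, restored: uint8 arrays of shape (H, W, 3).
    threshold: hypothetical cutoff on the largest per-channel difference.
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(original.astype(np.int16) - restored.astype(np.int16))
    # A pixel counts as changed if any colour channel exceeds the threshold.
    return diff.max(axis=-1) > threshold

# Toy input frame: grey background with a bright 2x2 "object".
original = np.full((4, 4, 3), 100, dtype=np.uint8)
original[1:3, 1:3] = 220
# Stand-in for the restoration result: the object replaced by background.
restored = np.full((4, 4, 3), 100, dtype=np.uint8)

mask = difference_mask(original, restored)
print(mask.astype(int))
```

In this toy case the mask covers exactly the four pixels where the object was removed. On real footage, a threshold of this kind would probably need tuning, and the raw mask might need cleanup, since compression noise and completion artifacts also produce small differences.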