Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌)
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Special Issue Paper: "Agent Technology and Its Applications"
Proposal and Evaluation of a Face Image Inpainting Method Using Voice Cues
赤塚 哲丸, 折原 良平, 清 雄一, 田原 康之, 大須賀 昭彦
Journal, free access

2025, Volume 40, Issue 4, pp. AG25-C_1-9

Abstract

In recent years, inpainting of human face images has attracted considerable attention among inpainting tasks, which aim to complete missing or obscured regions in digital images. Compared to inpainting of abstract objects such as landscapes, buildings, or patterns, inpainting of face images presents unique challenges: human faces have a complex visual structure, and even subtle differences in facial features or proportions can look unnatural or cause a sense of discomfort. Faithfully completing missing regions in face images while maintaining a convincing, realistic appearance is therefore an extremely difficult problem. Although some inpainting techniques fill in missing facial parts by analyzing and extrapolating from the surrounding facial information, most existing methods struggle to produce coherent faces. To address this, we propose leveraging supplementary voice data, which contains cues strongly correlated with an individual’s facial structure and expressions, to guide and enhance face image inpainting. Specifically, our proposed method uses voice segments as additional conditioning inputs when generating missing facial regions, with the aim of improving the fidelity and perceptual realism of completed faces. To evaluate this voice-augmented face inpainting approach, we constructed a large test dataset consisting of around 20,000 pseudo-masked face images paired with corresponding preprocessed voice samples of each individual. In comparative experiments, our method attained significantly higher face completion quality than a baseline model without voice inputs. Additionally, we conducted real-world verification tests using actual masked face images and raw voice data as inputs. Although performance remained insufficient for reliably handling real occluded faces, these experiments confirmed that voice conditioning clearly improves results on artificial test data, demonstrating its viability as a supplementary signal to guide generative face inpainting systems.

© JSAI (The Japanese Society for Artificial Intelligence)