Yuki Endo and Yoshihiro Kanamori
University of Tsukuba
Computer Animation and Virtual Worlds (Computer Graphics International 2022)

This paper tackles the challenging problem of one-shot semantic image synthesis from rough sparse annotations, which we call "semantic scribbles." Specifically, from only a single training pair annotated with semantic scribbles, we generate realistic and diverse images while controlling layout, e.g., the arrangement of facial parts or body poses. We present a training strategy that performs pseudo labeling for semantic scribbles using the StyleGAN prior. Our key idea is to construct a simple mapping between StyleGAN features and each semantic class from a single example of semantic scribbles. With such mappings, we can generate an unlimited number of pseudo semantic scribbles from random noise to train an encoder for controlling a pre-trained StyleGAN generator. Even with our rough pseudo semantic scribbles obtained via one-shot supervision, our method can synthesize high-quality images thanks to our GAN inversion framework. We further offer optimization-based post-processing to refine the pixel alignment of synthesized images. Qualitative and quantitative results on various datasets demonstrate improvement over previous approaches in one-shot settings.
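As a rough illustration of the pseudo-labeling idea, the sketch below shows one plausible way to map per-pixel generator features to semantic classes from a single scribble annotation. It is not the authors' implementation. It assumes a nearest-prototype rule with cosine similarity, and the function names (`class_prototypes`, `pseudo_label`), the `threshold` parameter, and the `ignore_label` convention are all hypothetical. Plain NumPy arrays stand in for StyleGAN feature maps.

```python
import numpy as np

def class_prototypes(features, scribbles, num_classes):
    """Average the feature vectors lying under each class's scribble pixels.

    features:  (H, W, C) per-pixel features (stand-in for StyleGAN features).
    scribbles: (H, W) integer labels; unlabeled pixels hold a value outside
               [0, num_classes), e.g. -1 (an assumed convention).
    """
    protos = np.zeros((num_classes, features.shape[-1]))
    for c in range(num_classes):
        mask = scribbles == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def pseudo_label(features, protos, threshold=0.5, ignore_label=-1):
    """Assign each pixel to its most similar class prototype.

    Pixels whose best cosine similarity falls below `threshold` stay
    unlabeled, which keeps the result sparse like a scribble.
    """
    eps = 1e-8  # avoids division by zero for all-zero vectors
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
    p = protos / (np.linalg.norm(protos, axis=-1, keepdims=True) + eps)
    sim = f @ p.T                      # (H, W, num_classes)
    labels = sim.argmax(axis=-1)
    labels[sim.max(axis=-1) < threshold] = ignore_label
    return labels
```

Under these assumptions, the prototypes would be computed once from the single annotated example, and `pseudo_label` would then be applied to the features of images generated from random noise, yielding training pairs for the encoder.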
Last modified: June 2022