Abstract
In statistical signal processing and machine learning, an open issue has been how to obtain a generative model that can produce samples from high-dimensional data distributions such as images and speech. Generative adversarial networks (GANs) have emerged as a powerful framework that offers clues toward solving this problem. A GAN is composed of two networks: a generator that transforms noise variables into the data space and a discriminator that distinguishes between real and generated data. These two networks are optimized through a min-max game. The generator attempts to deceive the discriminator by generating data indistinguishable from real data, while the discriminator attempts to avoid being deceived by finding the best discrimination between real and generated data. This novel framework enables implicit estimation of a data distribution and allows the generator to produce high-fidelity data that are almost indistinguishable from real data. This powerful property has attracted a great deal of attention, and a wide range of research, from basic studies to practical applications, has recently been conducted. In this paper, I summarize these studies and explain the foundations and applications of GANs. Specifically, I first clarify the relation between GANs and other deep generative models and then present the theory of GANs with mathematical formulations. Next, I introduce recent advances in GANs and describe impressive applications that are closely related to acoustic and speech signal processing. Finally, I conclude this paper by discussing future directions.