GdR ISIS Théorie du deep learning - June 28, 2021
Generative Adversarial Networks (GANs; Goodfellow et al., 2014) have become a canonical approach to generative modeling, as they produce realistic samples for numerous data types, with a plethora of variants (Wang et al., 2021). Much effort has been put into gaining a better understanding of the training process, with a particular focus on studying GAN loss functions to draw conclusions about their comparative advantages. Yet, empirical evaluations (Lucic et al., 2018; Kurach et al., 2019) have shown that different GAN formulations can yield approximately the same performance regardless of the chosen loss. This indicates that, by focusing exclusively on the formal loss function, theoretical studies might not model practical settings adequately.
In particular, the fact that the discriminator is a trained neural network is not taken into account, nor are the corresponding inductive biases, which might considerably alter the generator's loss landscape. Moreover, neglecting this constraint hampers the analysis of gradient-based learning of the generator on finite training sets, since the gradient from the associated discriminator is ill-defined everywhere. These limitations thus hinder the potential of theoretical analyses to explain GANs' empirical behaviour.
In this work, leveraging recent developments in the theory of deep learning driven by Neural Tangent Kernels (NTKs; Jacot et al., 2018), we provide a framework of analysis for GANs that explicitly incorporates the discriminator's architecture, which comes with several advantages.
First, we prove that, in the proposed framework, under mild conditions on its architecture and its loss, the trained discriminator has strong differentiability properties; this result holds for several GAN formulations and standard architectures, thus making the generator's learning problem well-defined. This emphasizes the role of the discriminator's architecture in GANs' trainability.
We then show how our framework can be used to derive both theoretical and empirical analyses of standard losses and architectures. We highlight, for instance, links between Integral Probability Metric (IPM) based GANs and the Maximum Mean Discrepancy (MMD) induced by the discriminator's NTK, as well as the role of the ReLU activation in GAN architectures.
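To make the IPM–MMD connection concrete, the following is a minimal sketch (not the paper's implementation) of how one can compare two sample sets with an MMD whose kernel is the empirical NTK of a small ReLU network. The network architecture, sample distributions, and all variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer ReLU network f(x) = v . relu(W x) / sqrt(width)
# (an illustrative stand-in for the discriminator).
width, dim = 512, 2
W = rng.standard_normal((width, dim))
v = rng.standard_normal(width)

def param_grad(x):
    """Gradient of f at input x w.r.t. all parameters (W, v), flattened."""
    pre = W @ x                       # pre-activations, shape (width,)
    act = np.maximum(pre, 0.0)        # ReLU
    dv = act / np.sqrt(width)         # df/dv
    dW = ((pre > 0) * v)[:, None] * x[None, :] / np.sqrt(width)  # df/dW
    return np.concatenate([dW.ravel(), dv])

def ntk(x, y):
    """Empirical NTK: inner product of parameter gradients at x and y."""
    return param_grad(x) @ param_grad(y)

def mmd2(X, Y, k):
    """Biased estimate of the squared MMD between samples X and Y under kernel k."""
    Kxx = np.mean([[k(a, b) for b in X] for a in X])
    Kyy = np.mean([[k(a, b) for b in Y] for a in Y])
    Kxy = np.mean([[k(a, b) for b in Y] for a in X])
    return Kxx + Kyy - 2.0 * Kxy

# Toy "real" vs. "generated" samples (Gaussians with different means).
X = rng.standard_normal((20, dim))          # "real" samples
Y = rng.standard_normal((20, dim)) + 2.0    # "fake" samples, shifted mean
print(mmd2(X, Y, ntk))  # larger when the two distributions differ more
```

In this view, training the discriminator in the infinite-width limit relates the IPM-GAN objective to an MMD with the discriminator's NTK as the kernel; swapping `ntk` for any other positive-definite kernel in `mmd2` recovers the usual MMD two-sample statistic.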
This is joint work by Jean-Yves Franceschi, Emmanuel de Bézenac, Ibrahim Ayed, Mickaël Chen, Sylvain Lamprier, and Patrick Gallinari.