Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Conditional Batch Normalization

General · Introduced 2000 · 145 papers
Source Paper

Description

Conditional Batch Normalization (CBN) is a class-conditional variant of batch normalization. The key idea is to predict the $\gamma$ and $\beta$ of the batch normalization from an embedding, e.g. a language embedding in VQA. CBN enables the linguistic embedding to manipulate entire feature maps by scaling them up or down, negating them, or shutting them off. CBN has also been used in GANs to allow class information to affect the batch normalization parameters.

Consider a single convolutional layer with batch normalization module $\text{BN}\left(F_{i,c,h,w} \mid \gamma_c, \beta_c\right)$ for which pretrained scalars $\gamma_c$ and $\beta_c$ are available. We would like to directly predict these affine scaling parameters from, e.g., a language embedding $\mathbf{e}_q$. At the start of training, these parameters must stay close to the pretrained values in order to recover the original ResNet model, as a poor initialization could significantly deteriorate performance. Unfortunately, it is difficult to initialize a network to output the pretrained $\gamma$ and $\beta$. For these reasons, the authors instead predict changes $\delta\gamma_c$ and $\delta\beta_c$ to the frozen original scalars, since it is straightforward to initialize a neural network to produce an output with zero mean and small variance.
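This zero-initialization trick can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the names `make_delta_mlp`, `embed_dim`, and `hidden_dim` are hypothetical, and the point is only that a zero-initialized output layer makes the predicted delta exactly zero at the start of training, so the effective parameters equal the pretrained $\gamma$ and $\beta$:

```python
import numpy as np

def make_delta_mlp(embed_dim, hidden_dim, num_channels, rng):
    """One-hidden-layer MLP whose output layer is zero-initialized,
    so the predicted per-channel delta is 0 at the start of training."""
    W1 = rng.normal(0.0, 0.01, size=(embed_dim, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = np.zeros((hidden_dim, num_channels))  # zero init -> zero output
    b2 = np.zeros(num_channels)

    def mlp(e_q):
        h = np.maximum(e_q @ W1 + b1, 0.0)  # ReLU hidden layer
        return h @ W2 + b2                  # one delta per channel

    return mlp

rng = np.random.default_rng(0)
delta_gamma = make_delta_mlp(embed_dim=16, hidden_dim=32, num_channels=8, rng=rng)
e_q = rng.normal(size=16)   # toy stand-in for a question embedding
deltas = delta_gamma(e_q)   # all zeros at initialization
```

Once training starts, gradients flow into `W2` and `b2` and the deltas move away from zero, while the frozen pretrained scalars remain untouched.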

The authors use a one-hidden-layer MLP to predict these deltas from the question embedding $\mathbf{e}_q$ for all feature maps within the layer:

$$\Delta\beta = \text{MLP}\left(\mathbf{e}_q\right)$$

$$\Delta\gamma = \text{MLP}\left(\mathbf{e}_q\right)$$

So, given a feature map with $C$ channels, these MLPs output a vector of size $C$. We then add these predictions to the $\beta$ and $\gamma$ parameters:

β^_c=β_c+Δβ_c\hat{\beta}\_{c} = \beta\_{c} + \Delta\beta\_{c}β^​_c=β_c+Δβ_c

γ^_c=γ_c+Δγ_c\hat{\gamma}\_{c} = \gamma\_{c} + \Delta\gamma\_{c}γ^​_c=γ_c+Δγ_c

Finally, the updated $\hat{\beta}$ and $\hat{\gamma}$ are used as parameters for the batch normalization: $\text{BN}\left(F_{i,c,h,w} \mid \hat{\gamma}_c, \hat{\beta}_c\right)$. The authors freeze all ResNet parameters, including $\gamma$ and $\beta$, during training. A ResNet consists of four stages of computation, each subdivided into several residual blocks; in each block, the authors apply CBN to the three convolutional layers.
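The full CBN computation can be sketched end to end. This is a minimal NumPy sketch under simplifying assumptions (training-mode batch statistics, no running averages); the function name and shapes are illustrative, not the authors' API. With zero deltas it reduces to ordinary batch normalization with the frozen pretrained scalars:

```python
import numpy as np

def conditional_batch_norm(F, gamma, beta, d_gamma, d_beta, eps=1e-5):
    """Batch-normalize F of shape (N, C, H, W) per channel, using the
    frozen pretrained gamma/beta shifted by the predicted deltas:
    gamma_hat = gamma + d_gamma, beta_hat = beta + d_beta."""
    mean = F.mean(axis=(0, 2, 3), keepdims=True)   # per-channel batch mean
    var = F.var(axis=(0, 2, 3), keepdims=True)     # per-channel batch variance
    F_norm = (F - mean) / np.sqrt(var + eps)
    gamma_hat = (gamma + d_gamma).reshape(1, -1, 1, 1)
    beta_hat = (beta + d_beta).reshape(1, -1, 1, 1)
    return gamma_hat * F_norm + beta_hat

rng = np.random.default_rng(1)
N, C, H, W = 4, 8, 5, 5
F = rng.normal(loc=2.0, scale=3.0, size=(N, C, H, W))
gamma, beta = np.ones(C), np.zeros(C)        # frozen pretrained scalars
d_gamma, d_beta = np.zeros(C), np.zeros(C)   # zero deltas -> plain BN
out = conditional_batch_norm(F, gamma, beta, d_gamma, d_beta)
```

In the conditional setting, `d_gamma` and `d_beta` would come from the delta-predicting MLPs applied to the language embedding, letting the condition rescale or shift each of the $C$ feature maps.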

Papers Using This Method

ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks (2024-11-06)
Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization (2024-10-27)
RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations (2024-05-13)
Data-driven Crop Growth Simulation on Time-varying Generated Images using Multi-conditional Generative Adversarial Networks (2023-12-06)
On quantifying and improving realism of images generated with diffusion (2023-09-26)
Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows (2023-09-21)
A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory (2023-07-27)
Pyrus Base: An Open Source Python Framework for the RoboCup 2D Soccer Simulation (2023-07-22)
Diffusion Models Beat GANs on Image Classification (2023-07-17)
Diversity is Strength: Mastering Football Full Game with Interactive Reinforcement Learning of Multiple AIs (2023-06-28)
Rosetta Neurons: Mining the Common Units in a Model Zoo (2023-06-15)
Toward more accurate and generalizable brain deformation estimators for traumatic brain injury detection with unsupervised domain adaptation (2023-06-08)
FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator (2023-06-07)
Action valuation of on- and off-ball soccer players based on multi-agent deep reinforcement learning (2023-05-29)
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL? (2023-05-27)
Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations (2023-05-22)
An Empirical Study on Google Research Football Multi-agent Scenarios (2023-05-16)
The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation (2023-05-05)
SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes (2023-04-11)
VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views (2023-04-10)