Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.




Contrastive Predictive Coding

General · Introduced 2018 · 113 papers
Source Paper: Representation Learning with Contrastive Predictive Coding (van den Oord et al., 2018)

Description

Contrastive Predictive Coding (CPC) learns self-supervised representations by predicting the future in latent space with powerful autoregressive models. It uses a probabilistic contrastive loss that induces the latent space to capture information that is maximally useful for predicting future samples.

First, a non-linear encoder $g_{enc}$ maps the input sequence of observations $x_t$ to a sequence of latent representations $z_t = g_{enc}\left(x_t\right)$, potentially with a lower temporal resolution. Next, an autoregressive model $g_{ar}$ summarizes all $z_{\leq t}$ in the latent space and produces a context latent representation $c_t = g_{ar}\left(z_{\leq t}\right)$.
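
As a concrete illustration, the PyTorch sketch below wires a strided convolutional encoder to a GRU so that `z` carries the per-step latents and `c` the running context. The layer sizes, strides, and class name are illustrative assumptions, not the paper's exact configuration:

```python
# Minimal CPC pipeline sketch (assumed input: 1-D signals such as raw audio).
import torch
import torch.nn as nn

class CPCNetwork(nn.Module):
    def __init__(self, z_dim=256, c_dim=256):
        super().__init__()
        # g_enc: strided convolutions downsample the observations, so the
        # latent sequence z_t has a lower temporal resolution than x_t.
        self.g_enc = nn.Sequential(
            nn.Conv1d(1, z_dim, kernel_size=10, stride=5, padding=3), nn.ReLU(),
            nn.Conv1d(z_dim, z_dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
        )
        # g_ar: a GRU summarizes z_{<=t} into the context vector c_t.
        self.g_ar = nn.GRU(z_dim, c_dim, batch_first=True)

    def forward(self, x):                  # x: (batch, 1, samples)
        z = self.g_enc(x).transpose(1, 2)  # z: (batch, T, z_dim)
        c, _ = self.g_ar(z)                # c[:, t] plays the role of c_t
        return z, c
```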

A density ratio is modelled which preserves the mutual information between $x_{t+k}$ and $c_t$ as follows:

$$f_k\left(x_{t+k}, c_t\right) \propto \frac{p\left(x_{t+k} \mid c_t\right)}{p\left(x_{t+k}\right)}$$

where $\propto$ stands for "proportional to" (i.e. up to a multiplicative constant). Note that the density ratio $f$ can be unnormalized (it does not have to integrate to 1). The authors use a simple log-bilinear model:

$$f_k\left(x_{t+k}, c_t\right) = \exp\left(z_{t+k}^{T} W_k c_t\right)$$
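
A minimal sketch of this score in PyTorch follows. Keeping one learned linear map $W_k$ per prediction step $k$ matches the paper; the dimensions and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

z_dim = c_dim = 256
k_max = 12  # assumed number of future steps to predict
# One bias-free linear map per step k implements z^T W_k c as z . (W_k c).
W = nn.ModuleList([nn.Linear(c_dim, z_dim, bias=False) for _ in range(k_max)])

def log_f(z_tk, c_t, k):
    """log f_k(x_{t+k}, c_t) = z_{t+k}^T W_k c_t, computed per batch element."""
    pred = W[k - 1](c_t)              # W_k c_t: (batch, z_dim)
    return (z_tk * pred).sum(dim=-1)  # bilinear score: (batch,)
```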

Any type of encoder and autoregressive model can be used. The authors opt for strided convolutional layers with residual blocks as the encoder and a GRU as the autoregressive model.

The encoder and autoregressive model are trained jointly to minimize the InfoNCE loss.
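
For reference, the InfoNCE loss from the source paper scores the positive pair against a set $X = \{x_1, \ldots, x_N\}$ containing one positive sample from $p\left(x_{t+k} \mid c_t\right)$ and $N-1$ negative samples from the proposal distribution $p\left(x_{t+k}\right)$:

$$\mathcal{L}_N = -\mathbb{E}_X\left[\log \frac{f_k\left(x_{t+k}, c_t\right)}{\sum_{x_j \in X} f_k\left(x_j, c_t\right)}\right]$$

The sketch below is one common way to realize this in PyTorch, assuming the other sequences in the mini-batch supply the negatives; the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def info_nce(z_tk, pred):
    # z_tk: (batch, z_dim)  encoded latents at step t+k
    # pred: (batch, z_dim)  W_k c_t for each sequence in the batch
    logits = pred @ z_tk.t()               # log f_k for all (c_t, z) pairs
    labels = torch.arange(logits.size(0))  # diagonal pairs are the positives
    # Categorical cross-entropy over the batch recovers the InfoNCE objective.
    return F.cross_entropy(logits, labels)
```

Minimizing this loss maximizes a lower bound on the mutual information between $c_t$ and $x_{t+k}$.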

Papers Using This Method

- Two-Player Zero-Sum Games with Bandit Feedback (2025-06-17)
- Integration of Contrastive Predictive Coding and Spiking Neural Networks (2025-06-10)
- Koopman-Based Event-Triggered Control from Data (2025-04-19)
- Learning Transformer-based World Models with Contrastive Predictive Coding (2025-03-06)
- Contrastive Representation Learning Helps Cross-institutional Knowledge Transfer: A Study in Pediatric Ventilation Management (2025-01-23)
- Performance-Barrier Event-Triggered PDE Control of Traffic Flow (2025-01-01)
- Automated Toll Management System Using RFID and Image Processing (2024-12-02)
- A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning (2024-11-06)
- Trading through Earnings Seasons using Self-Supervised Contrastive Representation Learning (2024-09-25)
- Context-Aware Predictive Coding: A Representation Learning Framework for WiFi Sensing (2024-09-20)
- Hierarchical Event-Triggered Systems: Safe Learning of Quasi-Optimal Deadline Policies (2024-09-15)
- Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder (2024-09-05)
- Contrastive Representation Learning for Dynamic Link Prediction in Temporal Networks (2024-08-22)
- Performance-Barrier Event-Triggered Control of a Class of Reaction-Diffusion PDEs (2024-07-11)
- Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints (2024-06-04)
- Causal Contrastive Learning for Counterfactual Regression Over Time (2024-06-01)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity (2024-05-23)
- Multilingual Turn-taking Prediction Using Voice Activity Projection (2024-03-11)
- Event-Triggered Robust Cooperative Output Regulation for a Class of Linear Multi-Agent Systems with an Unknown Exosystem (2024-03-01)