Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Content-Conditioned Style Encoder

Computer Vision · Introduced 2020 · 1 paper
Source Paper

Description

The Content-Conditioned Style Encoder, or COCO, is a style encoder used for image-to-image translation in the COCO-FUNIT architecture. Unlike the style encoder in FUNIT, COCO takes both the content and the style image as input. This content-conditioning scheme creates a direct feedback path during learning that lets the content image influence how the style code is computed. It also helps reduce the direct influence of the style image on the extracted style code.

The bottom part of the figure details the architecture. First, the content image is fed into an encoder $E_{S,C}$ to compute a spatial feature map. This content feature map is then mean-pooled and mapped to a vector $\zeta_c$. Similarly, the style image is fed into an encoder $E_{S,S}$ to compute a spatial feature map. The style feature map is then mean-pooled and concatenated with an input-independent bias vector: the constant style bias (CSB). Note that while a regular bias in deep networks is added to the activations, the CSB is concatenated with them. The CSB provides a fixed input to the style encoder, which helps compute a style code that is less sensitive to variations in the style image.
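The style branch above can be sketched in a few lines of NumPy. This is a minimal illustration only: the dimensions, the random "fully connected" matrix, and the variable names are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
style_dim, csb_dim, code_dim = 64, 16, 64

# Style feature map from E_{S,S}: (channels, H, W), mean-pooled over space.
style_feat = rng.standard_normal((style_dim, 8, 8))
pooled = style_feat.mean(axis=(1, 2))        # shape: (style_dim,)

# Constant style bias (CSB): a learned, input-independent vector that is
# concatenated with (not added to) the pooled activations.
csb = rng.standard_normal(csb_dim)           # fixed across all inputs
style_in = np.concatenate([pooled, csb])     # shape: (style_dim + csb_dim,)

# A fully connected layer (random placeholder weights here) then maps
# the concatenation to the style vector zeta_s.
fc = rng.standard_normal((code_dim, style_dim + csb_dim))
zeta_s = fc @ style_in                       # shape: (code_dim,)
```

Because `csb` is the same for every input, part of the fully connected layer's input never varies, which is what makes the resulting style vector less sensitive to the style image.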

The concatenation of the style vector and the CSB is mapped to a vector $\zeta_s$ via a fully connected layer. We then take the element-wise product of $\zeta_c$ and $\zeta_s$, which yields the final style code. The style code is then mapped to the AdaIN parameters used to generate the translation. Through this element-wise product, the resulting style code is heavily influenced by the content image. One way to view this mechanism is that it produces a style code customized for the input content image.
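The gating effect of the element-wise product can be seen with toy vectors (the values below are made up purely to show the mechanism):

```python
import numpy as np

zeta_c = np.array([0.0, 1.0, 2.0, -1.0])   # toy content vector
zeta_s = np.array([5.0, 5.0, 5.0,  5.0])   # toy style vector

# Element-wise product: each content channel gates the matching
# style channel, so the final style code is shaped by the content.
z_s = zeta_c * zeta_s
# -> [0., 5., 10., -5.]
```

Channel 0 of the style code is suppressed entirely because the content vector zeroes it, regardless of what the style image contributed.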

COCO is used as a drop-in replacement for the style encoder in FUNIT. Let $\phi$ denote the COCO mapping. The translation output is then computed via

$$z_c = E_C(x_c), \qquad z_s = \phi\big(E_{S,S}(x_s),\, E_{S,C}(x_c)\big), \qquad \bar{\mathbf{x}} = F(z_c, z_s)$$

The style code extracted by COCO is more robust to variations in the style image. Note that we set $E_{S,C} \equiv E_C$ to keep the number of parameters in the model similar to that of FUNIT.
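Putting the pieces of the equation together, here is an end-to-end NumPy sketch. Every network is a trivial placeholder (identity or a random linear map), and all shapes are invented for illustration; only the data flow, including the weight sharing $E_{S,C} \equiv E_C$, follows the description above.

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W, D = 32, 8, 8, 64        # hypothetical channel/spatial/code sizes

# Placeholder parameters standing in for learned weights.
W_s = rng.standard_normal((D, C + 16))   # FC after the CSB concatenation
W_c = rng.standard_normal((D, C))        # maps pooled content to zeta_c
csb = rng.standard_normal(16)            # constant style bias

def E_C(x):
    # Content encoder; identity placeholder. Reused as E_{S,C} to keep
    # the parameter count close to FUNIT's, as in the text above.
    return x

def coco(x_s, x_c):
    # phi: the content-conditioned style encoder.
    zeta_s = W_s @ np.concatenate([x_s.mean(axis=(1, 2)), csb])
    zeta_c = W_c @ E_C(x_c).mean(axis=(1, 2))
    return zeta_c * zeta_s               # element-wise product

def F(z_c, z_s):
    # Decoder placeholder (would consume AdaIN parameters in practice).
    return z_c * z_s.mean()

x_c = rng.standard_normal((C, H, W))     # "content image" features
x_s = rng.standard_normal((C, H, W))     # "style image" features

z_c = E_C(x_c)
z_s = coco(x_s, x_c)
x_bar = F(z_c, z_s)
```

Note that `coco` receives the content image as well as the style image, which is the defining difference from FUNIT's style encoder.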

Papers Using This Method

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder2020-07-15