Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Swinv2-Imagen

Reported on 20 benchmarks across 4 tasks · 1 paper · 8 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Audio · 10 results

  • 10-shot image generation on Multi-Modal-CelebA-HQ
    FID · 2022-10-18
    10.31
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 10-shot image generation on CUB
    Inception score · 2022-10-18
    8.44
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 1 Image, 2*2 Stitchi on Multi-Modal-CelebA-HQ
    FID · 2022-10-18
    10.31
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 1 Image, 2*2 Stitchi on CUB
    Inception score · 2022-10-18
    8.44
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 10-shot image generation on COCO (Common Objects in Context)
    FID · uses extra data · 2022-10-18
    7.21
    best: 5 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 10-shot image generation on COCO (Common Objects in Context)
    Inception score · uses extra data · 2022-10-18
    31.46
    best: 34.67 (FuseDream (k=10, 256))
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 10-shot image generation on CUB
    FID · 2022-10-18
    9.78
    best: 6.36 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 1 Image, 2*2 Stitchi on COCO (Common Objects in Context)
    FID · uses extra data · 2022-10-18
    7.21
    best: 5 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 1 Image, 2*2 Stitchi on COCO (Common Objects in Context)
    Inception score · uses extra data · 2022-10-18
    31.46
    best: 34.67 (FuseDream (k=10, 256))
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • 1 Image, 2*2 Stitchi on CUB
    FID · 2022-10-18
    9.78
    best: 6.36 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549

Medical · 5 results

  • Image Generation on CUB
    Inception score · 2022-10-18
    8.44
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Image Generation on Multi-Modal-CelebA-HQ
    FID · 2022-10-18
    10.31
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Image Generation on COCO (Common Objects in Context)
    FID · uses extra data · 2022-10-18
    7.21
    best: 5 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Image Generation on COCO (Common Objects in Context)
    Inception score · uses extra data · 2022-10-18
    31.46
    best: 34.67 (FuseDream (k=10, 256))
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Image Generation on CUB
    FID · 2022-10-18
    9.78
    best: 6.36 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549

Natural Language Processing · 5 results

  • Text-to-Image Generation on CUB
    Inception score · 2022-10-18
    8.44
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Text-to-Image Generation on Multi-Modal-CelebA-HQ
    FID · 2022-10-18
    10.31
    SOTA
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Text-to-Image Generation on COCO (Common Objects in Context)
    FID · uses extra data · 2022-10-18
    7.21
    best: 5 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Text-to-Image Generation on COCO (Common Objects in Context)
    Inception score · uses extra data · 2022-10-18
    31.46
    best: 34.67 (FuseDream (k=10, 256))
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549
  • Text-to-Image Generation on CUB
    FID · 2022-10-18
    9.78
    best: 6.36 (RAT-Diffusion)
    Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation · arXiv:2210.09549