Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GeminiFusion (Swin-Large)

Reported on 4 benchmarks across 2 tasks · 1 paper · 4 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Medical · 3 results

  • Semantic Segmentation on SUN-RGBD
    Mean IoU · 2024-06-03
    54.6
    SOTA
    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer · arXiv:2406.01210
  • Semantic Segmentation on NYU Depth v2
    Mean IoU · uses extra data · 2024-06-03
    60.9
    best: 63.6 (OmniVec2)
    SOTA
    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer · arXiv:2406.01210
  • Semantic Segmentation on NYU Depth v2
    Mean IoU · 2024-06-03
    60.2
    best: 63.6 (OmniVec2)
    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer · arXiv:2406.01210

Audio · 3 results

  • 10-shot image generation on SUN-RGBD
    Mean IoU · 2024-06-03
    54.6
    SOTA
    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer · arXiv:2406.01210
  • 10-shot image generation on NYU Depth v2
    Mean IoU · uses extra data · 2024-06-03
    60.9
    best: 63.6 (OmniVec2)
    SOTA
    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer · arXiv:2406.01210
  • 10-shot image generation on NYU Depth v2
    Mean IoU · 2024-06-03
    60.2
    best: 63.6 (OmniVec2)
    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer · arXiv:2406.01210