Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Phantom-Wan-14B

Reported on 28 benchmarks across 4 tasks · 1 paper · 8 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
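
As a rough illustration of what exact-name matching implies, the sketch below groups result rows by the literal model-name string, with no case folding or normalization. The function name `group_by_exact_name`, the row layout, and the differently-cased entry are hypothetical and are not taken from the site's actual code or data.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group (model_name, metric, value) rows by the exact model-name string."""
    grouped = defaultdict(list)
    for name, metric, value in rows:
        # The key is the raw string: no lowercasing, trimming, or alias lookup.
        grouped[name].append((metric, value))
    return dict(grouped)


rows = [
    ("Phantom-Wan-14B", "FaceSim", 0.5148),
    ("Phantom-Wan-14B", "GmeScore", 0.7065),
    # Hypothetical entry with different casing, for illustration only.
    ("phantom-wan-14b", "FaceSim", 0.5148),
]

grouped = group_by_exact_name(rows)
for name, results in grouped.items():
    print(name, results)
```

Under this assumption, a differently spelled or cased name lands in its own group, while two distinct variants that share one name would be merged into a single group.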

Computer Vision · 14 results

  • Video on OpenS2V-Eval
    FaceSim · 2025-02-16
    0.5148
    best: 0.5509 (Wan2.1-VACE-14B)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video on OpenS2V-Eval
    GmeScore · 2025-02-16
    0.7065
    best: 0.7138 (Wan2.1-VACE-1.3B-Preview)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Image to Video Generation on OpenS2V-Eval
    FaceSim · 2025-02-16
    0.5148
    best: 0.5509 (Wan2.1-VACE-14B)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Image to Video Generation on OpenS2V-Eval
    GmeScore · 2025-02-16
    0.7065
    best: 0.7138 (Wan2.1-VACE-1.3B-Preview)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video on OpenS2V-Eval
    Aesthetics · 2025-02-16
    0.4639
    best: 0.4824 (Wan2.1-VACE-1.3B)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video on OpenS2V-Eval
    Motion · 2025-02-16
    0.3342
    best: 0.416 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video on OpenS2V-Eval
    NaturalScore · 2025-02-16
    0.6866
    best: 0.7906 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video on OpenS2V-Eval
    NexusScore · 2025-02-16
    0.3743
    best: 0.4592 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video on OpenS2V-Eval
    Total Score · 2025-02-16
    0.5232
    best: 0.5446 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Image to Video Generation on OpenS2V-Eval
    Aesthetics · 2025-02-16
    0.4639
    best: 0.4824 (Wan2.1-VACE-1.3B)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Image to Video Generation on OpenS2V-Eval
    Motion · 2025-02-16
    0.3342
    best: 0.416 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Image to Video Generation on OpenS2V-Eval
    NaturalScore · 2025-02-16
    0.6866
    best: 0.7906 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Image to Video Generation on OpenS2V-Eval
    NexusScore · 2025-02-16
    0.3743
    best: 0.4592 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Image to Video Generation on OpenS2V-Eval
    Total Score · 2025-02-16
    0.5232
    best: 0.5446 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079

Natural Language Processing · 7 results

  • Video Generation on OpenS2V-Eval
    FaceSim · 2025-02-16
    0.5148
    best: 0.5509 (Wan2.1-VACE-14B)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video Generation on OpenS2V-Eval
    GmeScore · 2025-02-16
    0.7065
    best: 0.7138 (Wan2.1-VACE-1.3B-Preview)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video Generation on OpenS2V-Eval
    Aesthetics · 2025-02-16
    0.4639
    best: 0.4824 (Wan2.1-VACE-1.3B)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video Generation on OpenS2V-Eval
    Motion · 2025-02-16
    0.3342
    best: 0.416 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video Generation on OpenS2V-Eval
    NaturalScore · 2025-02-16
    0.6866
    best: 0.7906 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video Generation on OpenS2V-Eval
    NexusScore · 2025-02-16
    0.3743
    best: 0.4592 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • Video Generation on OpenS2V-Eval
    Total Score · 2025-02-16
    0.5232
    best: 0.5446 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079

Audio · 7 results

  • 1 Image, 2*2 Stitchi on OpenS2V-Eval
    FaceSim · 2025-02-16
    0.5148
    best: 0.5509 (Wan2.1-VACE-14B)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • 1 Image, 2*2 Stitchi on OpenS2V-Eval
    GmeScore · 2025-02-16
    0.7065
    best: 0.7138 (Wan2.1-VACE-1.3B-Preview)
    SOTA
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • 1 Image, 2*2 Stitchi on OpenS2V-Eval
    Aesthetics · 2025-02-16
    0.4639
    best: 0.4824 (Wan2.1-VACE-1.3B)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • 1 Image, 2*2 Stitchi on OpenS2V-Eval
    Motion · 2025-02-16
    0.3342
    best: 0.416 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • 1 Image, 2*2 Stitchi on OpenS2V-Eval
    NaturalScore · 2025-02-16
    0.6866
    best: 0.7906 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • 1 Image, 2*2 Stitchi on OpenS2V-Eval
    NexusScore · 2025-02-16
    0.3743
    best: 0.4592 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079
  • 1 Image, 2*2 Stitchi on OpenS2V-Eval
    Total Score · 2025-02-16
    0.5232
    best: 0.5446 (Kling 1.6)
    Phantom: Subject-consistent video generation via cross-modal alignment · arXiv:2502.11079