Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Dita-300M

Reported on 8 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Robots · 8 results

  • Robot Manipulation on SimplerEnv-Google Robot
    Variant Aggregation · uses extra data · 2025-03-25
    0.652
    best: 0.688 (SpatialVLA)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
  • Robot Manipulation on SimplerEnv-Google Robot
    Variant Aggregation-Move Near · uses extra data · 2025-03-25
    0.73
    best: 0.792 (RT-2-X)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
  • Robot Manipulation on SimplerEnv-Google Robot
    Variant Aggregation-Open/Close Drawer · uses extra data · 2025-03-25
    0.37
    best: 0.011 (Octo-Base)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
  • Robot Manipulation on SimplerEnv-Google Robot
    Variant Aggregation-Pick Coke Can · uses extra data · 2025-03-25
    0.855
    best: 0.907 (SoFar)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
  • Robot Manipulation on SimplerEnv-Google Robot
    Visual Matching · uses extra data · 2025-03-25
    0.687
    best: 0.749 (SoFar)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
  • Robot Manipulation on SimplerEnv-Google Robot
    Visual Matching-Move Near · uses extra data · 2025-03-25
    0.76
    best: 0.917 (SoFar)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
  • Robot Manipulation on SimplerEnv-Google Robot
    Visual Matching-Open/Close Drawer · uses extra data · 2025-03-25
    0.463
    best: 0.227 (Octo-Base)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
  • Robot Manipulation on SimplerEnv-Google Robot
    Visual Matching-Pick Coke Can · uses extra data · 2025-03-25
    0.837
    best: 0.923 (SoFar)
    Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · arXiv:2503.19757
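The two overall SimplerEnv-Google Robot scores listed above (Variant Aggregation 0.652, Visual Matching 0.687) appear consistent with an unweighted mean of the three per-task scores in each setting. The page does not state how the overall figure is computed, so the minimal Python sketch below only checks that assumption against the listed numbers; the names `scores` and `task_mean` are illustrative, not part of any benchmark API.

```python
# Dita-300M scores on SimplerEnv-Google Robot, copied from the list above.
# Assumption: each setting's overall score is the unweighted mean of its
# three task scores; this is not stated on the page, only checked here.
scores = {
    "Variant Aggregation": {
        "overall": 0.652,
        "tasks": {"Move Near": 0.73, "Open/Close Drawer": 0.37, "Pick Coke Can": 0.855},
    },
    "Visual Matching": {
        "overall": 0.687,
        "tasks": {"Move Near": 0.76, "Open/Close Drawer": 0.463, "Pick Coke Can": 0.837},
    },
}


def task_mean(tasks):
    """Unweighted mean of the per-task scores (hypothetical helper)."""
    return sum(tasks.values()) / len(tasks)


for setting, entry in scores.items():
    mean = task_mean(entry["tasks"])
    # Compare at the three-decimal precision used on this page.
    matches = round(mean, 3) == entry["overall"]
    print(f"{setting}: mean={mean:.3f} reported={entry['overall']:.3f} match={matches}")
```

Both settings round to the reported overall value, which suggests equal task weighting, though a different aggregation that happens to agree at three decimals cannot be ruled out from these numbers alone.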