Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan

2025-01-29
Tasks: Text-to-Image Generation, Instruction Following, Text to Image Generation, Image Generation, Visual Question Answering

Abstract

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Image Generation | WISE | Biology | 0.36 | Janus-pro |
| Image Generation | WISE | Chemistry | 0.26 | Janus-pro |
| Image Generation | WISE | Cultural | 0.3 | Janus-pro |
| Image Generation | WISE | Overall | 0.35 | Janus-pro |
| Image Generation | WISE | Physics | 0.42 | Janus-pro |
| Image Generation | WISE | Space | 0.49 | Janus-pro |
| Image Generation | WISE | Time | 0.37 | Janus-pro |
| Image Generation | GenEval | Overall | 0.8 | Janus-Pro-7B |
| Image Generation | GenEval | Overall | 0.73 | Janus-Pro-1B |
| Visual Question Answering (VQA) | MM-Vet | GPT-4 score | 50 | Janus-Pro-7B |
| Visual Question Answering (VQA) | MM-Vet | GPT-4 score | 39.8 | Janus-Pro-1B |
| Text-to-Image Generation | GenEval | Overall | 0.8 | Janus-Pro-7B |
| Text-to-Image Generation | GenEval | Overall | 0.73 | Janus-Pro-1B |
| 10-shot image generation | GenEval | Overall | 0.8 | Janus-Pro-7B |
| 10-shot image generation | GenEval | Overall | 0.73 | Janus-Pro-1B |
| Visual Question Answering | MM-Vet | GPT-4 score | 50 | Janus-Pro-7B |
| Visual Question Answering | MM-Vet | GPT-4 score | 39.8 | Janus-Pro-1B |
| 1 Image, 2*2 Stitchi | GenEval | Overall | 0.8 | Janus-Pro-7B |
| 1 Image, 2*2 Stitchi | GenEval | Overall | 0.73 | Janus-Pro-1B |

Related Papers

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)