Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan
In this work, we introduce Janus-Pro, an advanced version of our previous work, Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model sizes. With these improvements, Janus-Pro achieves significant advances in both multimodal understanding and text-to-image instruction following, while also improving the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | WISE | Biology | 0.36 | Janus-Pro |
| Image Generation | WISE | Chemistry | 0.26 | Janus-Pro |
| Image Generation | WISE | Cultural | 0.30 | Janus-Pro |
| Image Generation | WISE | Physics | 0.42 | Janus-Pro |
| Image Generation | WISE | Space | 0.49 | Janus-Pro |
| Image Generation | WISE | Time | 0.37 | Janus-Pro |
| Image Generation | WISE | Overall | 0.35 | Janus-Pro |
| Text-to-Image Generation | GenEval | Overall | 0.80 | Janus-Pro-7B |
| Text-to-Image Generation | GenEval | Overall | 0.73 | Janus-Pro-1B |
| Visual Question Answering (VQA) | MM-Vet | GPT-4 score | 50.0 | Janus-Pro-7B |
| Visual Question Answering (VQA) | MM-Vet | GPT-4 score | 39.8 | Janus-Pro-1B |
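The WISE Overall value in the table is not a plain mean of the six category scores, which would come to about 0.37. The sketch below assumes that WISE computes Overall as a weighted average with weights proportional to each category's share of its 1,000 prompts (Cultural 400, Time 167, Space 133, Biology, Physics and Chemistry 100 each). These weights belong to the WISE benchmark, not to this work, and are stated here as an assumption. The helper `wise_overall` and the constant `WISE_WEIGHTS` are hypothetical names used only for illustration. Under that assumption, the per-category values reported for Janus-Pro reproduce the reported Overall of 0.35.

```python
# ASSUMPTION: WISE "Overall" is a weighted average whose weights are each
# category's share of the benchmark's 1,000 prompts. Hypothetical constant.
WISE_WEIGHTS = {
    "Cultural": 0.400,
    "Time": 0.167,
    "Space": 0.133,
    "Biology": 0.100,
    "Physics": 0.100,
    "Chemistry": 0.100,
}


def wise_overall(scores: dict[str, float]) -> float:
    """Hypothetical helper: weighted average of per-category WISE scores."""
    missing = WISE_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WISE_WEIGHTS[c] * scores[c] for c in WISE_WEIGHTS)


# Per-category WISE values reported for Janus-Pro in the table above.
janus_pro = {
    "Biology": 0.36,
    "Chemistry": 0.26,
    "Cultural": 0.30,
    "Physics": 0.42,
    "Space": 0.49,
    "Time": 0.37,
}

overall = wise_overall(janus_pro)
plain_mean = sum(janus_pro.values()) / len(janus_pro)

print(f"weighted: {overall:.2f}")    # weighted: 0.35 (matches reported Overall)
print(f"unweighted: {plain_mean:.2f}")  # unweighted: 0.37 (does not match)
```

The comparison with the unweighted mean is the point of the check: only the prompt-weighted average lands on the reported 0.35, which is consistent with (but does not prove) the assumed weighting.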