An Impartial Transformer for Story Visualization
Nikolaos Tsakas, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
Abstract
Story Visualization is an advanced task of computed vision that targets sequential image synthesis, where the generated samples need to be realistic, faithful to their conditioning and sequentially consistent. Our work proposes a novel architectural and training approach: the Impartial Transformer achieves both text-relevant plausible scenes and sequential consistency utilizing as few trainable parameters as possible. This enhancement is even able to handle synthesis of 'hard' samples with occluded objects, achieving improved evaluation metrics comparing to past approaches.
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text-To-Image | CLEVR-SV | LPIPS | 0.21 | Impartial Transformer |
Related Papers
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection2025-07-17FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization2025-07-17A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints2025-07-17Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17FADE: Adversarial Concept Erasure in Flow Models2025-07-16CharaConsist: Fine-Grained Consistent Character Generation2025-07-15CATVis: Context-Aware Thought Visualization2025-07-15