Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park, Sunghyun Park, Jaegul Choo

2023-12-04 | CVPR 2024 | Virtual Try-on | Semantic correspondence

Abstract

Given a clothing image and a person image, an image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image. In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task. The main challenge is to preserve the clothing details while effectively utilizing the robust generative capability of the pre-trained model. To tackle this challenge, we propose StableVITON, which learns the semantic correspondence between the clothing and the human body within the latent space of the pre-trained diffusion model in an end-to-end manner. Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process. Through our proposed novel attention total variation loss and augmentation, we obtain a sharp attention map, resulting in a more precise representation of clothing details. StableVITON outperforms the baselines in qualitative and quantitative evaluation, showing promising quality on arbitrary person images. Our code is available at https://github.com/rlawjdghek/StableVITON.
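
The abstract names an attention total variation loss but does not define it. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: for each query position on the person side, take the attention-weighted center of the key positions on the clothing side, then penalize the absolute differences between the centers of neighboring queries. All function names (`attention_centers`, `total_variation`, `one_hot`) are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Tuple

Centers = List[List[Tuple[float, float]]]


def attention_centers(attn: List[List[float]], q_h: int, q_w: int,
                      k_h: int, k_w: int) -> Centers:
    """Return the attention-weighted (x, y) key coordinate for every query.

    attn holds q_h * q_w rows (one per query position, row-major); each row
    holds k_h * k_w non-negative weights over key positions that sum to 1.
    """
    centers: Centers = []
    for i in range(q_h):
        row = []
        for j in range(q_w):
            weights = attn[i * q_w + j]
            assert len(weights) == k_h * k_w
            cx = sum(w * (k % k_w) for k, w in enumerate(weights))
            cy = sum(w * (k // k_w) for k, w in enumerate(weights))
            row.append((cx, cy))
        centers.append(row)
    return centers


def total_variation(centers: Centers) -> float:
    """Sum of absolute differences between horizontally and vertically
    adjacent center coordinates (an L1 total variation)."""
    h, w = len(centers), len(centers[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            x, y = centers[i][j]
            if j + 1 < w:  # right neighbor
                nx, ny = centers[i][j + 1]
                tv += abs(nx - x) + abs(ny - y)
            if i + 1 < h:  # bottom neighbor
                nx, ny = centers[i + 1][j]
                tv += abs(nx - x) + abs(ny - y)
    return tv


def one_hot(index: int, size: int) -> List[float]:
    """Attention row that puts all its weight on a single key position."""
    return [1.0 if k == index else 0.0 for k in range(size)]


# Toy 2x2 query grid attending to a 2x2 key grid (assumed sizes).
# "smooth": every query attends to the key at its own position.
smooth = [one_hot(k, 4) for k in (0, 1, 2, 3)]
# "scattered": neighboring queries jump between opposite corners.
scattered = [one_hot(k, 4) for k in (3, 0, 0, 3)]

smooth_tv = total_variation(attention_centers(smooth, 2, 2, 2, 2))
scattered_tv = total_variation(attention_centers(scattered, 2, 2, 2, 2))
print(smooth_tv, scattered_tv)
```

In this toy case the scattered attention pattern yields a larger penalty than the smooth one, which is the behavior a regularizer of this kind would rely on. How the actual loss is weighted, masked, or normalized is not stated in the abstract.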

Results

Task                  Dataset   Metric  Value  Model
Virtual Try-on        VITON-HD  FID     8.233  StableVITON
1 Image, 2*2 Stitchi  VITON-HD  FID     8.233  StableVITON
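
For context on the metric: FID (Fréchet Inception Distance) compares Gaussian fits to Inception features of real and generated images, and lower values are better. This page does not state the formula. The standard definition is reproduced here, with the symbols as assumptions of notation: $\mu_r, \Sigma_r$ are the feature mean and covariance for real images, and $\mu_g, \Sigma_g$ are those for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The evaluation protocol behind the 8.233 value (image resolution, sample count, paired or unpaired setting) is not given on this page.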

Related Papers

TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model (2025-07-08)
Video Virtual Try-on with Conditional Diffusion Transformer Inpainter (2025-06-26)
RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control (2025-06-15)
Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments (2025-06-14)
Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On (2025-06-12)
Jamais Vu: Exposing the Generalization Gap in Supervised Semantic Correspondence (2025-06-09)
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels (2025-06-05)
MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation (2025-06-03)