Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Chengming Xu, Jinlong Peng, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Yanwei Fu

2024-11-15 · Virtual Try-on
Paper · PDF · Code (official) · Code

Abstract

Although image-based virtual try-on has made considerable progress, emerging approaches still encounter challenges in producing high-fidelity and robust fitting images across diverse scenarios. These methods often struggle with issues such as texture-aware maintenance and size-aware fitting, which hinder their overall effectiveness. To address these limitations, we propose a novel garment perception enhancement technique, termed FitDiT, designed for high-fidelity virtual try-on using Diffusion Transformers (DiT), which allocate more parameters and attention to high-resolution features. First, to improve texture-aware maintenance, we introduce a garment texture extractor that incorporates garment-prior evolution to fine-tune garment features, better capturing rich details such as stripes, patterns, and text. Additionally, we introduce frequency-domain learning by customizing a frequency distance loss to enhance high-frequency garment details. To tackle the size-aware fitting issue, we employ a dilated-relaxed mask strategy that adapts to the correct length of garments, preventing the generated garment from filling the entire mask area during cross-category try-on. Equipped with these designs, FitDiT surpasses all baselines in both qualitative and quantitative evaluations. It excels at producing well-fitting garments with photorealistic, intricate details, while achieving a competitive inference time of 4.57 seconds for a single 1024x768 image after DiT structure slimming, outperforming existing methods.
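The frequency distance loss mentioned in the abstract can be sketched as follows. This is a minimal illustration only, assuming an L1 distance between the 2D Fourier spectra of the generated and target images; the function name and exact formulation are illustrative and not taken from the paper:

```python
import numpy as np

def frequency_distance_loss(generated: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical sketch of a frequency-domain distance loss.

    Compares two images by the L1 distance between their 2D FFT
    spectra. Because the spectrum explicitly represents high-frequency
    content, such a loss emphasizes fine garment details (stripes,
    patterns, text) that pixel-space losses tend to blur.
    """
    # 2D FFT over the spatial axes; works for (H, W) or (H, W, C) arrays
    f_gen = np.fft.fft2(generated, axes=(0, 1))
    f_tgt = np.fft.fft2(target, axes=(0, 1))
    # Mean absolute difference between the complex spectra
    return float(np.mean(np.abs(f_gen - f_tgt)))
```

In a training loop this term would typically be weighted and added to the ordinary pixel- or latent-space reconstruction loss, trading off global structure against high-frequency fidelity.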

Results

Task                    Dataset    Metric  Value   Model
Virtual Try-on          VITON-HD   FID     4.7309  FitDiT
1 Image, 2*2 Stitchi    VITON-HD   FID     4.7309  FitDiT

Related Papers

TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model (2025-07-08)
Video Virtual Try-on with Conditional Diffusion Transformer Inpainter (2025-06-26)
Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments (2025-06-14)
Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On (2025-06-12)
VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration (2025-05-29)
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals (2025-05-27)
VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models (2025-05-26)
HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment (2025-05-26)