Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Images

Abstract

Virtual try-on has become a popular research topic, but most existing methods focus on studio images with a clean background. They can achieve plausible results for this studio try-on setting by learning to warp a garment image to fit a person's body from paired training data, i.e., garment images paired with images of people wearing the same garment. Such data is often collected from commercial websites, where each garment is demonstrated both by itself and on several models. By contrast, it is hard to collect paired data for in-the-wild scenes, and therefore, virtual try-on for casual images of people against cluttered backgrounds is rarely studied.

In this work, we fill the gap in the current virtual try-on research by (1) introducing a Street TryOn benchmark to evaluate performance on street scenes and (2) proposing a novel method that can learn without paired data, directly from a set of in-the-wild person images. Our method achieves robust performance across shop and street domains using a novel DensePose warping correction method combined with diffusion-based inpainting controlled by pose and semantic segmentation. Our experiments demonstrate competitive performance for standard studio try-on tasks and state-of-the-art (SOTA) performance for street try-on and cross-domain try-on tasks.

StreetTryOn Dataset/Benchmark

We introduce a new benchmark, StreetTryOn, derived from the large fashion retrieval dataset DeepFashion2. We filter out over 90% of DeepFashion2 images that are infeasible for try-on tasks (e.g., non-frontal view, large occlusion, or dark environment) to obtain 12,364 and 2,089 street person images for training and validation, respectively.
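
The filtering step described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the ImageMeta fields, the threshold values, and the helper names are hypothetical and are not the DeepFashion2 annotation schema or the authors' actual criteria. Only the three rejection reasons (non-frontal view, large occlusion, dark environment) and the split sizes come from the text.

```python
from dataclasses import dataclass


@dataclass
class ImageMeta:
    """Hypothetical per-image annotations; not the DeepFashion2 schema."""
    name: str
    is_frontal: bool        # person faces the camera
    occlusion_ratio: float  # fraction of the garment that is hidden, 0..1
    mean_brightness: float  # average pixel intensity, 0..255


# Illustrative thresholds only; the source does not state the real values.
MAX_OCCLUSION = 0.3
MIN_BRIGHTNESS = 40.0


def is_tryon_feasible(meta: ImageMeta) -> bool:
    """Keep frontal, mostly unoccluded, reasonably lit images."""
    return (
        meta.is_frontal
        and meta.occlusion_ratio <= MAX_OCCLUSION
        and meta.mean_brightness >= MIN_BRIGHTNESS
    )


def filter_images(candidates: list[ImageMeta]) -> list[ImageMeta]:
    """Return only the candidates that pass every feasibility check."""
    return [m for m in candidates if is_tryon_feasible(m)]


candidates = [
    ImageMeta("a.jpg", True, 0.05, 120.0),
    ImageMeta("b.jpg", False, 0.05, 120.0),  # non-frontal view
    ImageMeta("c.jpg", True, 0.60, 120.0),   # large occlusion
    ImageMeta("d.jpg", True, 0.05, 15.0),    # dark environment
]
kept = filter_images(candidates)
print([m.name for m in kept])

# Split sizes reported in the text.
n_train, n_val = 12_364, 2_089
print(n_train + n_val)
```

In this toy run only the first image survives, and the reported split sizes add up to 14,453 street person images in total.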

Release

The data is released here.

University of Illinois at Urbana-Champaign

WACV 2025