Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Images

Aiyu Cui
Jay Mahajan
Viraj Shah
Preeti Gomathinayagam
Svetlana Lazebnik

University of Illinois at Urbana-Champaign



Virtual try-on has become a popular research topic, but most existing methods focus on studio images with a clean background. They can achieve plausible results for this studio try-on setting by learning to warp a garment image to fit a person's body from paired training data, i.e., garment images paired with images of people wearing the same garment. Such data is often collected from commercial websites, where each garment is demonstrated both by itself and on several models. By contrast, it is hard to collect paired data for in-the-wild scenes, and therefore, virtual try-on for casual images of people against cluttered backgrounds is rarely studied.

In this work, we fill the gap in current virtual try-on research by (1) introducing a Street TryOn benchmark to evaluate performance on street scenes and (2) proposing a novel method that learns without paired data, directly from a set of in-the-wild person images. Our method achieves robust performance across shop and street domains using a novel DensePose warping correction method combined with diffusion-based inpainting controlled by pose and semantic segmentation. Our experiments demonstrate competitive performance on standard studio try-on tasks and SOTA performance on street try-on and cross-domain try-on tasks.
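The pipeline described above can be outlined as a sequence of stages. The sketch below is purely illustrative: every function is a hypothetical stub standing in for a learned component, and none of the names come from the actual implementation.

```python
# Hypothetical high-level sketch of the described pipeline: DensePose-based
# warping with a correction step, followed by diffusion inpainting
# conditioned on pose and semantic segmentation. All stubs are assumptions.

def densepose_warp(garment, person_densepose):
    """Warp the garment onto the target body via DensePose correspondences (stub)."""
    return {"garment": garment, "target": person_densepose, "corrected": False}

def correct_warp(warped):
    """Refine the initial DensePose warp to fix misalignments (stub)."""
    warped["corrected"] = True
    return warped

def diffusion_inpaint(warped, pose, segmentation):
    """Inpaint the final try-on image, conditioned on pose and segmentation (stub)."""
    return {"result": warped, "conditions": (pose, segmentation)}

def try_on(garment, person_densepose, pose, segmentation):
    warped = densepose_warp(garment, person_densepose)
    warped = correct_warp(warped)
    return diffusion_inpaint(warped, pose, segmentation)
```

Because no garment-person pairs are required, each stage can be supervised from a single in-the-wild person image (e.g., by removing and re-synthesizing the worn garment).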

Street2Street TryOn

The Street2Street TryOn task takes a garment from a street image and aims to put it on a person in a casual image against a cluttered background. The images below show that our proposed method can effectively achieve Street2Street virtual try-on.

StreetTryOn Dataset/Benchmark

We introduce a new benchmark, StreetTryOn, derived from the large fashion retrieval dataset DeepFashion2. We filter out over 90% of DeepFashion2 images that are infeasible for try-on tasks (e.g., non-frontal view, large occlusion, dark environment, etc.) to obtain 12,364 street person images for training and 2,089 for validation.
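The filtering pass might look like the heuristic sketch below. The field names and thresholds are illustrative assumptions for the stated criteria (frontal view, limited occlusion, adequate lighting), not the actual StreetTryOn pipeline.

```python
# Hypothetical per-image feasibility check for deriving StreetTryOn-style
# data from DeepFashion2 annotations. All keys and thresholds are assumed.

def is_tryon_feasible(meta):
    """Return True if an image passes simple try-on feasibility heuristics."""
    if meta.get("viewpoint") != "frontal":          # discard non-frontal views
        return False
    if meta.get("occlusion_ratio", 1.0) > 0.3:      # discard large occlusions
        return False
    if meta.get("mean_brightness", 0.0) < 40:       # discard dark scenes (0-255)
        return False
    return True

# Illustrative usage on made-up metadata records:
records = [
    {"viewpoint": "frontal", "occlusion_ratio": 0.1, "mean_brightness": 120},
    {"viewpoint": "side",    "occlusion_ratio": 0.1, "mean_brightness": 120},
    {"viewpoint": "frontal", "occlusion_ratio": 0.5, "mean_brightness": 120},
]
kept = [r for r in records if is_tryon_feasible(r)]
```

Only the first record survives this filter; applied to DeepFashion2, such checks would remove the large majority of images, consistent with the 90%+ reduction reported above.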


The data is released here.

Thanks to Unnat Jain for sharing the template of this project page.