
DiCTI: Diffusion-based Clothing Designer via Text-guided Input

Authors: Ajda Lampe, Julija Stopar, Deepak Kumar Jain, Shinichiro Omachi, Peter Peer, Vitomir Štruc

Abstract: Recent developments in deep generative models have opened up a wide range of opportunities for image synthesis, leading to significant changes in various creative fields, including the fashion industry. While numerous methods have been proposed to benefit buyers, particularly in virtual try-on applications, there has been relatively less focus on facilitating fast prototyping for designers and customers seeking to order new designs. To address this gap, we introduce DiCTI (Diffusion-based Clothing Designer via Text-guided Input), a straightforward yet highly effective approach that allows designers to quickly visualize fashion-related ideas using text inputs only. Given an image of a person and a description of the desired garments as input, DiCTI automatically generates multiple high-resolution, photorealistic images that capture the expressed semantics. By leveraging a powerful diffusion-based inpainting model conditioned on text inputs, DiCTI is able to synthesize convincing, high-quality images with varied clothing designs that viably follow the provided text descriptions, while being able to process very diverse and challenging inputs, captured in completely unconstrained settings. We evaluate DiCTI in comprehensive experiments on two different datasets (VITON-HD and Fashionpedia) and in comparison to the state-of-the-art (SoTA). The results of our experiments show that DiCTI convincingly outperforms the SoTA competitor in generating higher quality images with more elaborate garments and superior text prompt adherence, both according to standard quantitative evaluation measures and human ratings, generated as part of a user study.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted to FG 2024

arXiv:2311.18659 [pdf, other]

Comparison of Autoscaling Frameworks for Containerised Machine-Learning-Applications in a Local and Cloud Environment

Authors: Christian Schroeder, Rene Boehm, Alexander Lampe

Abstract: When deploying machine learning (ML) applications, the automated allocation of computing resources, commonly referred to as autoscaling, is crucial for maintaining a consistent inference time under fluctuating workloads. The objective is to maximize the Quality of Service metrics, emphasizing performance and availability, while minimizing resource costs. In this paper, we compare scalable deployment techniques across three levels of scaling: at the application level (TorchServe, RayServe) and the container level (K3s) in a local environment (production server), as well as at the container and machine levels in a cloud environment (Amazon Web Services Elastic Container Service and Elastic Kubernetes Service). The comparison is conducted through the study of mean and standard deviation of inference time in a multi-client scenario, along with upscaling response times. Based on this analysis, we propose a deployment strategy for both local and cloud-based environments.

Submitted 25 February, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 6 pages, 3 figures

MSC Class: 94-04 ACM Class: I.2.11

arXiv:2212.06550 [pdf, other]

doi 10.1109/ICAIIC54071.2022.9722662

Body Segmentation Using Multi-task Learning

Authors: Julijan Jug, Ajda Lampe, Vitomir Štruc, Peter Peer

Abstract: Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation-Pose-DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.04437 [pdf, other]

doi 10.1109/WACV51458.2022.00226

C-VTON: Context-Driven Image-Based Virtual Try-On Network

Authors: Benjamin Fele, Ajda Lampe, Peter Peer, Vitomir Štruc

Abstract: Image-based virtual try-on techniques have shown great promise for enhancing the user-experience and improving customer satisfaction on fashion-oriented e-commerce platforms. However, existing techniques are currently still limited in the quality of the try-on results they are able to produce from input images of diverse characteristics. In this work, we propose a Context-Driven Virtual Try-On Network (C-VTON) that addresses these limitations and convincingly transfers selected clothing items to the target subjects even under challenging pose configurations and in the presence of self-occlusions. At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result. C-VTON is evaluated in rigorous experiments on the VITON and MPV datasets and in comparison to state-of-the-art techniques from the literature. Experimental results show that the proposed approach is able to produce photo-realistic and visually convincing results and significantly improves on the existing state-of-the-art.

Submitted 8 December, 2022; originally announced December 2022.

Comments: Accepted to WACV 2022

arXiv:2107.09647 [pdf, other]

Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

Authors: Jana Mayer, Johannes Westermann, Juan Pedro Gutiérrez H. Muriedas, Uwe Mettin, Alexander Lampe

Abstract: In recent years, reinforcement learning (RL) has gained increasing attention in control engineering. Policy gradient methods in particular are widely used. In this work, we improve the tracking performance of proximal policy optimization (PPO) for arbitrary reference signals by incorporating information about future reference values. Two variants of extending the argument of the actor and the critic taking future reference values into account are presented. In the first variant, global future reference values are added to the argument. For the second variant, a novel kind of residual space with future reference values applicable to model-free reinforcement learning is introduced. Our approach is evaluated against a PI controller on a simple drive train model. We expect our method to generalize to arbitrary references better than previous approaches, pointing towards the applicability of RL to control real systems.

Submitted 20 July, 2021; originally announced July 2021.

arXiv:1910.00881 [pdf, other]

Scientific opportunies for bERLinPro 2020+, report with ideas and conclusions from bERLinProCamp 2019

Authors: Thorsten Kamps, Michael Abo-Bakr, Andreas Adelmann, Kevin Andre, Deepa Angal-Kalinin, Felix Armborst, Andre Arnold, Michaela Arnold, Raymond Amador, Stephen Benson, Yulia Choporova, Illya Drebot, Ralph Ernstdorfer, Pavel Evtushenko, Kathrin Goldammer, Andreas Jankowiak, Georg Hofftstaetter, Florian Hug, Ji-Gwang Hwang, Lee Jones, Julius Kuehn, Jens Knobloch, Bettina Kuske, Andre Lampe, Sonal Mistry, et al. (16 additional authors not shown)

Abstract: The Energy Recovery Linac (ERL) paradigm offers the promise to generate intense electron beams of superior quality with extremely small six-dimensional phase space for many applications in the physical sciences, materials science, chemistry, health, information technology and security. Helmholtz-Zentrum Berlin started in 2010 an intensive R&D programme to address the challenges related to the ERL as a driver for future light sources by setting up the bERLinPro (Berlin ERL Project) ERL with 50 MeV beam energy and high average current. The project is close to reaching its major milestone in 2020, acceleration and recovery of a high brightness electron beam. The goal of bERLinProCamp 2019 was to discuss scientific opportunities for bERLinPro 2020+. bERLinProCamp 2019 was held on Tuesday, 17 September 2019 at Helmholtz-Zentrum Berlin, Berlin, Germany. This paper summarizes the main themes and output of the workshop.

Submitted 8 January, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

arXiv:1907.07890 [pdf, other]

A feasibility study of deep neural networks for the recognition of banknotes regarding central bank requirements

Authors: Julia Schulte, Daniel Staps, Alexander Lampe

Abstract: This paper contains a feasibility study of deep neural networks for the classification of Euro banknotes with respect to requirements of central banks on the ATM and high speed sorting industry. Instead of concentrating on the accuracy for a large number of classes as in the famous ImageNet Challenge, we focus on conditions with few classes and the requirement of rejection of images belonging clearly to none of the trained classes (i.e. classification in a so-called 0-class). These special requirements are part of frameworks defined by central banks such as the European Central Bank and are met by current ATMs and high speed sorting machines. We also consider training and classification time on state-of-the-art GPU hardware. The study concentrates on banknote recognition, whereas banknote-class-dependent authenticity and fitness checks are a topic of their own and are not considered in this work.

Submitted 7 October, 2019; v1 submitted 18 July, 2019; originally announced July 2019.

Comments: 6 pages, 4 figures

ACM Class: I.7.5; I.5.1; I.2.6; G.1.6; G.3

arXiv:1408.2167 [pdf, other]

The Strength of the Grätzer-Schmidt Theorem

Authors: Katie Brodhead, Mushfeq Khan, Bjørn Kjos-Hanssen, William A. Lampe, Paul Kim Long V. Nguyen, Richard A. Shore

Abstract: The Grätzer-Schmidt theorem of lattice theory states that each algebraic lattice is isomorphic to the congruence lattice of an algebra. We study the reverse mathematics of this theorem. We also show that the set of indices of computable lattices that are complete is $\Pi^1_1$-complete; the set of indices of computable lattices that are algebraic is $\Pi^1_1$-complete; the set of compact elements of a computable lattice is $\Pi^1_1$ and can be $\Pi^1_1$-complete; and the set of compact elements of a distributive computable lattice is $\Pi^0_3$, and there is an algebraic distributive computable lattice such that the set of its compact elements is $\Pi^0_3$-complete.

Submitted 4 January, 2019; v1 submitted 9 August, 2014; originally announced August 2014.

Comments: This journal version replaces the conference version (Computability in Europe, Lecture Notes in Computer Science 5635 (2009), 59–67)

MSC Class: 03D

Journal ref: Archive for Mathematical Logic 55 (2016), no. 5, 687–704

Showing 1–8 of 8 results for author: Lampe, A