
Showing 1–14 of 14 results for author: Blattmann, A

Searching in archive cs.
  1. arXiv:2403.12015  [pdf, other]

    cs.CV

    Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

    Authors: Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach

    Abstract: Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD) aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to its reliance on a fixed pretrained DINOv2 discriminator. We i…

    Submitted 18 March, 2024; originally announced March 2024.

  2. arXiv:2403.03206  [pdf, other]

    cs.CV

    Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

    Authors: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach

    Abstract: Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is n…

    Submitted 5 March, 2024; originally announced March 2024.

  3. arXiv:2311.17042  [pdf, other]

    cs.CV

    Adversarial Diffusion Distillation

    Authors: Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach

    Abstract: We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the l…

    Submitted 28 November, 2023; originally announced November 2023.

  4. arXiv:2311.15127  [pdf, other]

    cs.CV

    Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

    Authors: Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

    Abstract: We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary wi…

    Submitted 25 November, 2023; originally announced November 2023.

  5. arXiv:2307.01952  [pdf, other]

    cs.CV cs.AI

    SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

    Authors: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

    Abstract: We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ra…

    Submitted 4 July, 2023; originally announced July 2023.

  6. arXiv:2304.08818  [pdf, other]

    cs.CV cs.LG

    Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

    Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

    Abstract: Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by int…

    Submitted 27 December, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/

  7. arXiv:2207.13038  [pdf, other]

    cs.CV

    Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

    Authors: Robin Rombach, Andreas Blattmann, Björn Ommer

    Abstract: Novel architectures have recently improved generative image synthesis leading to excellent visual quality in various tasks. Of particular note is the field of "AI-Art", which has seen unprecedented growth with the emergence of powerful multimodal models such as CLIP. By combining speech and image synthesis models, so-called "prompt-engineering" has become established, in which carefully select…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: 4 pages

  8. arXiv:2204.11824  [pdf, other]

    cs.CV

    Semi-Parametric Neural Image Synthesis

    Authors: Andreas Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, Björn Ommer

    Abstract: Novel architectures have recently improved generative image synthesis leading to excellent visual quality in various tasks. Much of this success is due to the scalability of these architectures and hence caused by a dramatic increase in model complexity and in the computational resources invested in training these models. Our work questions the underlying paradigm of compressing large training dat…

    Submitted 24 October, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022

  9. arXiv:2112.10752  [pdf, other]

    cs.CV

    High-Resolution Image Synthesis with Latent Diffusion Models

    Authors: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

    Abstract: By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization o…

    Submitted 13 April, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: CVPR 2022

  10. arXiv:2108.08827  [pdf, other]

    cs.CV

    ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

    Authors: Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer

    Abstract: Autoregressive models and their sequential factorization of the data likelihood have recently demonstrated great potential for image representation and synthesis. Nevertheless, they incorporate image context in a linear 1D order by attending only to previously synthesized image patches above or to the left. Not only is this unidirectional, sequential bias of attention unnatural for images as it di…

    Submitted 19 August, 2021; originally announced August 2021.

  11. arXiv:2107.02790  [pdf, other]

    cs.CV

    iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

    Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

    Abstract: How would a static scene react to a local poke? What are the effects on other parts of an object if you could locally push it? There will be distinctive movement, despite evident variations caused by the stochastic nature of our world. These outcomes are governed by the characteristic kinematics of objects that dictate their overall motion caused by a local interaction. Conversely, the movement of…

    Submitted 6 October, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: ICCV 2021, Project page is available at https://bit.ly/3dJN4Lf

  12. arXiv:2106.11303  [pdf, other]

    cs.CV

    Understanding Object Dynamics for Interactive Image-to-Video Synthesis

    Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

    Abstract: What would be the effect of locally poking a static scene? We present an approach that learns naturally-looking global articulations caused by a local manipulation at a pixel level. Training requires only videos of moving objects but no information of the underlying manipulation of the physical scene. Our generative model learns to infer natural object dynamics as a response to user interaction an…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: CVPR 2021, project page available at https://bit.ly/3cxfA2L

  13. arXiv:2105.04551  [pdf, other]

    cs.CV

    Stochastic Image-to-Video Synthesis using cINNs

    Authors: Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer

    Abstract: Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame. This naturally suggests a bij…

    Submitted 17 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2021

  14. arXiv:2103.04677  [pdf, other]

    cs.CV

    Behavior-Driven Synthesis of Human Dynamics

    Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

    Abstract: Generating and representing human behavior are of major importance for various computer vision applications. Commonly, human video synthesis represents behavior as sequences of postures while directly predicting their likely progressions or merely changing the appearance of the depicted persons, thus not being able to exercise control over their actual behavior during the synthesis process. In con…

    Submitted 22 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021 as Poster