Search Results for author: Stefan Andreas Baumann

Found 6 papers, 4 papers with code

CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models

no code implementations • 13 May 2024 • Nick Stracke, Stefan Andreas Baumann, Joshua M. Susskind, Miguel Angel Bautista, Björn Ommer

Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images.

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

1 code implementation • 25 Mar 2024 • Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer

We demonstrate that these directions can be used to augment the prompt text input with fine-grained control over attributes of specific subjects in a compositional manner (control over multiple attributes of a single subject) without having to adapt the diffusion model.

Attribute

DepthFM: Fast Monocular Depth Estimation with Flow Matching

no code implementations • 20 Mar 2024 • Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer

Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates.

Monocular Depth Estimation

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

1 code implementation • 20 Mar 2024 • Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer

The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures.

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

1 code implementation • 21 Jan 2024 • Katherine Crowson, Stefan Andreas Baumann, Alex Birch, Tanishq Mathew Abraham, Daniel Z. Kaplan, Enrico Shippole

We present the Hourglass Diffusion Transformer (HDiT), an image generative model that exhibits linear scaling with pixel count, supporting training at high resolution (e.g., $1024 \times 1024$) directly in pixel space.

Image Generation

Deeper Convolutional Neural Networks and Broad Augmentation Policies Improve Performance in Musical Key Estimation

1 code implementation • Proceedings of the International Society for Music Information Retrieval Conference (ISMIR) 2021 • Stefan Andreas Baumann

In recent years, complex convolutional neural network architectures such as the Inception architecture have been shown to offer significant improvements over previous architectures in image classification.

Ranked #1 on Key Detection on Giantsteps (using extra training data)

Information Retrieval, Key Detection (+2 more)
