-
A Field of Experts Prior for Adapting Neural Networks at Test Time
Authors:
Neerav Karani,
Georg Brunner,
Ertunc Erdil,
Simin Fei,
Kerem Tezcan,
Krishna Chaitanya,
Ender Konukoglu
Abstract:
Performance of convolutional neural networks (CNNs) in image analysis tasks is often marred in the presence of acquisition-related distribution shifts between training and test images. Recently, it has been proposed to tackle this problem by fine-tuning trained CNNs for each test image. Such test-time-adaptation (TTA) is a promising and practical strategy for improving robustness to distribution shifts as it requires neither data sharing between institutions nor annotating additional data. Previous TTA methods use a helper model to increase similarity between outputs and/or features extracted from a test image with those of the training images. Such helpers, which are typically modeled using CNNs, can be task-specific and themselves vulnerable to distribution shifts in their inputs. To overcome these problems, we propose to carry out TTA by matching the feature distributions of test and training images, as modelled by a field-of-experts (FoE) prior. FoEs model complicated probability distributions as products of many simpler expert distributions. We use 1D marginal distributions of a trained task CNN's features as experts in the FoE model. Further, we compute principal components of patches of the task CNN's features, and consider the distributions of PCA loadings as additional experts. We validate the method on 5 MRI segmentation tasks (healthy tissues in 4 anatomical regions and lesions in 1 anatomy), using data from 17 clinics, and on an MRI registration task, using data from 3 clinics. We find that the proposed FoE-based TTA is generically applicable in multiple tasks, and outperforms all previous TTA methods for lesion segmentation. For healthy tissue segmentation, the proposed method outperforms other task-agnostic methods, but a previous TTA method which is specifically designed for segmentation performs the best for most of the tested datasets. Our code is publicly available.
Submitted 10 February, 2022;
originally announced February 2022.
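As a rough illustration of the product-of-experts idea in the abstract above, the sketch below scores a feature vector by summing per-feature expert log-densities (the log of an unnormalized product). The Gaussian expert form, the function names, and all numbers are assumptions made for this example; they are not taken from the paper or its released code.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Log-density of a 1D Gaussian expert (an assumed expert form)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def foe_log_density(features, experts):
    """Unnormalized log-density of a product of experts:
    the sum of each expert's log-density on its own feature."""
    return sum(gaussian_log_pdf(f, m, s) for f, (m, s) in zip(features, experts))

# Hypothetical (mean, std) per feature, as if estimated from training images.
train_experts = [(0.0, 1.0), (2.0, 0.5)]

in_dist = foe_log_density([0.1, 2.1], train_experts)   # features close to training statistics
shifted = foe_log_density([1.5, 3.5], train_experts)   # features under a distribution shift
```

Under these assumptions, shifted features receive a lower score than in-distribution ones, which is the kind of signal a test-time adaptation objective could try to increase.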
-
Of Non-Linearity and Commutativity in BERT
Authors:
Sumu Zhao,
Damian Pascual,
Gino Brunner,
Roger Wattenhofer
Abstract:
In this work we provide new insights into the transformer architecture, and in particular, its best-known variant, BERT. First, we propose a method to measure the degree of non-linearity of different elements of transformers. Next, we focus our investigation on the feed-forward networks (FFN) inside transformers, which contain 2/3 of the model parameters and have so far not received much attention. We find that FFNs are an inefficient yet important architectural element and that they cannot simply be replaced by attention blocks without a degradation in performance. Moreover, we study the interactions between layers in BERT and show that, while the layers exhibit some hierarchical structure, they extract features in a fuzzy manner. Our results suggest that BERT has an inductive bias towards layer commutativity, which we find is mainly due to the skip connections. This provides a justification for the strong performance of recurrent and weight-shared transformer models.
Submitted 7 May, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
-
Medley2K: A Dataset of Medley Transitions
Authors:
Lukas Faber,
Sandro Luck,
Damian Pascual,
Andreas Roth,
Gino Brunner,
Roger Wattenhofer
Abstract:
The automatic generation of medleys, i.e., musical pieces formed by different songs concatenated via smooth transitions, is not well studied in the current literature. To facilitate research on this topic, we make available a dataset called Medley2K that consists of 2,000 medleys and 7,712 labeled transitions. Our dataset features a rich variety of song transitions across different music genres. We provide a detailed description of this dataset and validate it by training a state-of-the-art generative model in the task of generating transitions between songs.
Submitted 25 August, 2020;
originally announced August 2020.
-
Telling BERT's full story: from Local Attention to Global Aggregation
Authors:
Damian Pascual,
Gino Brunner,
Roger Wattenhofer
Abstract:
We take a deep look into the behavior of self-attention heads in the transformer architecture. In light of recent work discouraging the use of attention distributions for explaining a model's behavior, we show that attention distributions can nevertheless provide insights into the local behavior of attention heads. This way, we propose a distinction between local patterns revealed by attention and global patterns that refer back to the input, and analyze BERT from both angles. We use gradient attribution to analyze how the output of an attention head depends on the input tokens, effectively extending the local attention-based analysis to account for the mixing of information throughout the transformer layers. We find that there is a significant discrepancy between attention and attribution distributions, caused by the mixing of context inside the model. We quantify this discrepancy and observe that, interestingly, there are some patterns that persist across all layers despite the mixing.
Submitted 13 January, 2021; v1 submitted 9 April, 2020;
originally announced April 2020.
-
On Identifiability in Transformers
Authors:
Gino Brunner,
Yang Liu,
Damián Pascual,
Oliver Richter,
Massimiliano Ciaramita,
Roger Wattenhofer
Abstract:
In this paper we delve deep in the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain to a large degree their identity across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.
Submitted 7 February, 2020; v1 submitted 12 August, 2019;
originally announced August 2019.
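The non-identifiability claim in the abstract above can be illustrated with a toy example: when there are more tokens than value dimensions, two different attention rows can produce the same head output. The matrices and names below are hypothetical values chosen by hand for this sketch and do not reproduce the paper's setup.

```python
def attend(weights, values):
    """Apply one row of attention weights to the value vectors (weighted sum)."""
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

# 4 tokens with value vectors of dimension 2 (sequence longer than head dimension).
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# A direction that sums to zero and is orthogonal to every column of `values`,
# so adding it changes the weights but not the weighted sum.
null_dir = [1.0, 1.0, -1.0, -1.0]

a = [0.4, 0.3, 0.2, 0.1]
b = [w + 0.05 * n for w, n in zip(a, null_dir)]  # still non-negative, still sums to 1

out_a = attend(a, values)
out_b = attend(b, values)
```

Both `a` and `b` are valid attention distributions, yet `out_a` and `out_b` agree up to floating-point error, so the weights cannot be recovered from the output alone in this toy case.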
-
Attentive Multi-Task Deep Reinforcement Learning
Authors:
Timo Bram,
Gino Brunner,
Oliver Richter,
Roger Wattenhofer
Abstract:
Sharing knowledge between tasks is vital for efficient learning in a multi-task setting. However, most research so far has focused on the easier case where knowledge transfer is not harmful, i.e., where knowledge from one task cannot negatively impact the performance on another task. In contrast, we present an approach to multi-task deep reinforcement learning based on attention that does not require any a-priori assumptions about the relationships between tasks. Our attention network automatically groups task knowledge into sub-networks on a state level granularity. It thereby achieves positive knowledge transfer if possible, and avoids negative transfer in cases where tasks interfere. We test our algorithm against two state-of-the-art multi-task/transfer learning approaches and show comparable or superior performance while requiring fewer network parameters.
Submitted 5 July, 2019;
originally announced July 2019.
-
Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning
Authors:
Gino Brunner,
Manuel Fritsche,
Oliver Richter,
Roger Wattenhofer
Abstract:
Learning in sparse reward settings remains a challenge in Reinforcement Learning, which is often addressed by using intrinsic rewards. One promising strategy is inspired by human curiosity, requiring the agent to learn to predict the future. In this paper a curiosity-driven agent is extended to use these predictions directly for training. To achieve this, the agent predicts the value function of the next state at any point in time. Subsequently, the consistency of this prediction with the current value function is measured, which is then used as a regularization term in the loss function of the algorithm. Experiments were made on grid-world environments as well as on a 3D navigation task, both with sparse rewards. In the first case the extended agent is able to learn significantly faster than the baselines.
Submitted 30 September, 2018;
originally announced October 2018.
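One possible reading of the regularization described in the abstract above is a penalty on the gap between the agent's predicted next-state value and the value function's own estimate for that state. The squared-error form, the function name, and the default weight are assumptions for this sketch; the abstract does not specify them.

```python
def regularized_loss(base_loss, predicted_next_value, next_state_value, weight=0.1):
    """Add a consistency penalty to a base RL loss.

    `predicted_next_value` is the agent's prediction of the next state's value;
    `next_state_value` is the value function's estimate for that state.
    The squared difference and `weight` are illustrative choices, not the paper's.
    """
    consistency = (predicted_next_value - next_state_value) ** 2
    return base_loss + weight * consistency
```

When the prediction and the value estimate agree, the penalty vanishes and the loss reduces to `base_loss`.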
-
The Urban Last Mile Problem: Autonomous Drone Delivery to Your Balcony
Authors:
Gino Brunner,
Bence Szebedy,
Simon Tanner,
Roger Wattenhofer
Abstract:
Drone delivery has been a hot topic in the industry in the past few years. However, existing approaches either focus on rural areas or rely on centralized drop-off locations from where the last mile delivery is performed. In this paper we tackle the problem of autonomous last mile delivery in urban environments using an off-the-shelf drone. We build a prototype system that is able to fly to the approximate delivery location using GPS and then find the exact drop-off location using visual navigation. The drop-off location could, e.g., be on a balcony or porch, and simply needs to be indicated by a visual marker on the wall or window. We test our system components in simulated environments, including the visual navigation and collision avoidance. Finally, we deploy our drone in a real-world environment and show how it can find the drop-off point on a balcony. To stimulate future research in this topic we open source our code.
Submitted 21 September, 2018;
originally announced September 2018.
-
MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer
Authors:
Gino Brunner,
Andres Konrad,
Yuyi Wang,
Roger Wattenhofer
Abstract:
We introduce MIDI-VAE, a neural network model based on Variational Autoencoders that is capable of handling polyphonic music with multiple instrument tracks, as well as modeling the dynamics of music by incorporating note durations and velocities. We show that MIDI-VAE can perform style transfer on symbolic music by automatically changing pitches, dynamics and instruments of a music piece from, e.g., a Classical to a Jazz style. We evaluate the efficacy of the style transfer by training separate style validation classifiers. Our model can also interpolate between short pieces of music, produce medleys and create mixtures of entire songs. The interpolations smoothly change pitches, dynamics and instrumentation to create a harmonic bridge between two music pieces. To the best of our knowledge, this work represents the first successful attempt at applying neural style transfer to complete musical compositions.
Submitted 20 September, 2018;
originally announced September 2018.
-
Symbolic Music Genre Transfer with CycleGAN
Authors:
Gino Brunner,
Yuyi Wang,
Roger Wattenhofer,
Sumu Zhao
Abstract:
Deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have recently been applied to style and domain transfer for images, and in the case of VAEs, music. GAN-based models employing several generators and some form of cycle consistency loss have been among the most successful for image domain transfer. In this paper we apply such a model to symbolic music and show the feasibility of our approach for music genre transfer. Evaluations using separate genre classifiers show that the style transfer works well. In order to improve the fidelity of the transformed music, we add additional discriminators that cause the generators to keep the structure of the original music mostly intact, while still achieving strong genre transfer. Visual and audible results further show the potential of our approach. To the best of our knowledge, this paper represents the first application of GANs to symbolic music domain transfer.
Submitted 20 September, 2018;
originally announced September 2018.
-
Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations
Authors:
Gino Brunner,
Yuyi Wang,
Roger Wattenhofer,
Michael Weigelt
Abstract:
We train multi-task autoencoders on linguistic tasks and analyze the learned hidden sentence representations. The representations change significantly when translation and part-of-speech decoders are added. The more decoders a model employs, the better it clusters sentences according to their syntactic similarity, as the representation space becomes less entangled. We explore the structure of the representation space by interpolating between sentences, which yields interesting pseudo-English sentences, many of which have recognizable syntactic structure. Lastly, we point out an interesting property of our models: The difference-vector between two sentences can be added to change a third sentence with similar features in a meaningful way.
Submitted 18 January, 2018;
originally announced January 2018.
-
JamBot: Music Theory Aware Chord Based Generation of Polyphonic Music with LSTMs
Authors:
Gino Brunner,
Yuyi Wang,
Roger Wattenhofer,
Jonas Wiesendanger
Abstract:
We propose a novel approach for the generation of polyphonic music based on LSTMs. We generate music in two steps. First, a chord LSTM predicts a chord progression based on a chord embedding. A second LSTM then generates polyphonic music from the predicted chord progression. The generated music sounds pleasing and harmonic, with only few dissonant notes. It has clear long-term structure that is similar to what a musician would play during a jam session. We show that our approach is sensible from a music theory perspective by evaluating the learned chord embeddings. Surprisingly, our simple model managed to extract the circle of fifths, an important tool in music theory, from the dataset.
Submitted 21 November, 2017;
originally announced November 2017.
-
Teaching a Machine to Read Maps with Deep Reinforcement Learning
Authors:
Gino Brunner,
Oliver Richter,
Yuyi Wang,
Roger Wattenhofer
Abstract:
The ability to use a 2D map to navigate a complex 3D environment is quite remarkable, and even difficult for many humans. Localization and navigation is also an important problem in domains such as robotics, and has recently become a focus of the deep reinforcement learning community. In this paper we teach a reinforcement learning agent to read a map in order to find the shortest way out of a random maze it has never seen before. Our system combines several state-of-the-art methods such as A3C and incorporates novel elements such as a recurrent localization cell. Our agent learns to localize itself based on 3D first person images and an approximate orientation angle. The agent generalizes well to bigger mazes, showing that it learned useful localization and navigation capabilities.
Submitted 20 November, 2017;
originally announced November 2017.
-
RAFCON: a Graphical Tool for Task Programming and Mission Control
Authors:
Sebastian G. Brunner,
Franz Steinmetz,
Rico Belder,
Andreas Dömel
Abstract:
There are many application fields for robotic systems including service robotics, search and rescue missions, industry and space robotics. As the scenarios in these areas grow more and more complex, there is a high demand for powerful tools to efficiently program heterogeneous robotic systems. Therefore, we created RAFCON, a graphical tool to develop robotic tasks and to be used for mission control by remotely monitoring the execution of the tasks. To define the tasks, we use state machines which support hierarchies and concurrency. Together with a library concept, even complex scenarios can be handled gracefully. RAFCON supports sophisticated debugging functionality and tightly integrates error handling and recovery mechanisms. A GUI with a powerful state machine editor makes intuitive, visual programming and fast prototyping possible. We demonstrated the capabilities of our tool in the SpaceBotCamp national robotic competition, in which our mobile robot solved all exploration and assembly challenges fully autonomously. It is therefore also a promising tool for various RoboCup leagues.
Submitted 30 May, 2016;
originally announced May 2016.
-
Spitzer Observations of M33 and the Hot Star, H II Region Connection
Authors:
Robert H. Rubin,
Janet P. Simpson,
Sean W. J. Colgan,
Reginald J. Dufour,
Gregory Brunner,
Ian A. McNabb,
Adalbert W. A. Pauldrach,
Edwin F. Erickson,
Michael R. Haas,
Robert I. Citron
Abstract:
We have observed emission lines of [S IV] 10.51, H(7-6) 12.37, [Ne II] 12.81, [Ne III] 15.56, and [S III] 18.71 um in a number of extragalactic H II regions with the Spitzer Space Telescope. A previous paper presented our data and analysis for the substantially face-on spiral galaxy M83. Here we report our results for the local group spiral galaxy M33. The nebulae selected cover a wide range of galactocentric radii (R_G). The observations were made with the Infrared Spectrograph with the short wavelength, high resolution module. The above set of five lines is observed cospatially, thus permitting a reliable comparison of the fluxes. From the measured fluxes, we determine the ionic abundance ratios including Ne++/Ne+, S3+/S++, and S++/Ne+ and find that there is a correlation of increasingly higher ionization with larger R_G. By sampling the dominant ionization states of Ne (Ne+, Ne++) and S (S++, S3+) for H II regions, we can estimate the Ne/H, S/H, and Ne/S ratios. We find from linear least-squares fits that there is a decrease in metallicity with increasing R_G: d log (Ne/H)/dR_G = -0.058+-0.014 and d log (S/H)/dR_G = -0.052+-0.021 dex kpc-1. There is no apparent variation in the Ne/S ratio with R_G. Unlike our previous similar study of M83, where we conjectured that this ratio was an upper limit, for M33 the derived ratios are likely a robust indication of Ne/S. This occurs because the H II regions have lower metallicity and higher ionization than those in M83. Both Ne and S are primary elements produced in alpha-chain reactions, following C and O burning in stars, making their yields depend very little on the stellar metallicity. Thus, it is expected that Ne/S remains relatively constant throughout a galaxy. The median (average) Ne/S ratio derived for H II regions in M33 is 16.3 (16.9), just slightly higher than
Submitted 4 April, 2008;
originally announced April 2008.
-
Warm Molecular Gas in M51: Mapping the Excitation Temperature and Mass of H_2 with the Spitzer Infrared Spectrograph
Authors:
G. Brunner,
K. Sheth,
L. Armus,
M. Wolfire,
S. Vogel,
E. Schinnerer,
G. Helou,
R. Dufour,
J. Smith,
D. Dale
Abstract:
We have mapped the warm molecular gas traced by the H_2 S(0) - H_2 S(5) pure rotational mid-infrared emission lines over a radial strip across the nucleus and disk of M51 (NGC 5194) using the Infrared Spectrograph (IRS) on the Spitzer Space Telescope. The six H_2 lines have markedly different emission distributions. We obtained the H_2 temperature and surface density distributions by assuming a two temperature model: a warm (T = 100 - 300 K) phase traced by the low J (S(0) - S(2)) lines and a hot phase (T = 400 - 1000 K) traced by the high J (S(2) - S(5)) lines. The lowest molecular gas temperatures are found within the spiral arms (T ~ 155 K), while the highest temperatures are found in the inter-arm regions (T > 700 K). The warm gas surface density reaches a maximum of 11 M_sun/pc^2 in the northwestern spiral arm, whereas the hot gas surface density peaks at 0.24 M_sun/pc^2 at the nucleus. The spatial offset between the peaks in the warm and hot phases and the differences in the distributions of the H_2 line emission suggest that the warm phase is mostly produced by UV photons in star forming regions while the hot phase is mostly produced by shocks or X-rays associated with nuclear activity. The warm H_2 is found in the dust lanes of M51, spatially offset from the brightest HII regions. The warm H_2 is generally spatially coincident with the cold molecular gas traced by CO (J = 1 - 0) emission, consistent with excitation of the warm phase in dense photodissociation regions (PDRs). In contrast, the hot H_2 is most prominent in the nuclear region. Here, over a 0.5 kpc radius around the nucleus of M51, the hot H_2 coincides with [O IV](25.89 micron) and X-ray emission indicating that shocks and/or X-rays are responsible for exciting this phase.
Submitted 30 November, 2007;
originally announced December 2007.