-
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks
Authors:
Adrian Rebmann,
Fabian David Schmidt,
Goran Glavaš,
Han van der Aa
Abstract:
The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some extent, that they are able to reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit from an understanding of process behavior. Examples of such tasks include (semantic) anomaly detection and next activity prediction, which both involve considerations of the meaning of activities and their inter-relations. In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks. Furthermore, whereas most works on the intersection of LLMs and process mining only focus on testing these models out of the box, we provide a more principled investigation of the utility of LLMs for process mining, including their ability to obtain process mining knowledge post-hoc by means of in-context learning and supervised fine-tuning. Concretely, we define three process mining tasks that benefit from an understanding of process semantics and provide extensive benchmarking datasets for each of them. Our evaluation experiments reveal that (1) LLMs fail to solve challenging process mining tasks out of the box and when provided only a handful of in-context examples, but (2) they yield strong performance when fine-tuned for these tasks, consistently surpassing smaller, encoder-based language models.
Submitted 2 July, 2024;
originally announced July 2024.
-
Fast Gibbs sampling for the local and global trend Bayesian exponential smoothing model
Authors:
Xueying Long,
Daniel F. Schmidt,
Christoph Bergmeir,
Slawek Smyl
Abstract:
In Smyl et al. [Local and global trend Bayesian exponential smoothing models. International Journal of Forecasting, 2024.], a generalised exponential smoothing model was proposed that is able to capture strong trends and volatility in time series. This method achieved state-of-the-art performance in many forecasting tasks, but its fitting procedure, which is based on the NUTS sampler, is very computationally expensive. In this work, we propose several modifications to the original model, as well as a bespoke Gibbs sampler for posterior exploration; these changes improve sampling time by an order of magnitude, thus rendering the model much more practically relevant. The new model, and sampler, are evaluated on the M3 dataset and are shown to be competitive, or superior, in terms of accuracy to the original method, while being substantially faster to run.
Submitted 29 June, 2024;
originally announced July 2024.
-
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages
Authors:
Fabian David Schmidt,
Philipp Borchert,
Ivan Vulić,
Goran Glavaš
Abstract:
LLMs have become a go-to solution not just for text generation, but also for natural language understanding (NLU) tasks. Acquiring extensive knowledge through language modeling on web-scale corpora, they excel on English NLU, yet struggle to extend their NLU capabilities to underrepresented languages. In contrast, machine translation models (MT) produce excellent multilingual representations, resulting in strong translation performance even for low-resource languages. MT encoders, however, lack the knowledge necessary for comprehensive NLU that LLMs obtain through language modeling training on immense corpora. In this work, we get the best of both worlds by integrating MT encoders directly into LLM backbones via sample-efficient self-distillation. The resulting MT-LLMs preserve the inherent multilingual representational alignment from the MT encoder, allowing lower-resource languages to tap into the rich knowledge embedded in English-centric LLMs. Merging the MT encoder and LLM in a single model, we mitigate the propagation of translation errors and inference overhead of MT decoding inherent to discrete translation-based cross-lingual transfer (e.g., translate-test). Evaluation spanning three prominent NLU tasks and 127 predominantly low-resource languages renders MT-LLMs highly effective in cross-lingual transfer. MT-LLMs substantially and consistently outperform translate-test based on the same MT model, showing that we truly unlock multilingual language understanding for LLMs.
Submitted 18 June, 2024;
originally announced June 2024.
-
News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation
Authors:
Andreea Iana,
Fabian David Schmidt,
Goran Glavaš,
Heiko Paulheim
Abstract:
Rapidly growing numbers of multilingual news consumers pose an increasing challenge to news recommender systems in terms of providing customized recommendations. First, existing neural news recommenders, even when powered by multilingual language models (LMs), suffer substantial performance losses in zero-shot cross-lingual transfer (ZS-XLT). Second, the current paradigm of fine-tuning the backbone LM of a neural recommender on task-specific data is computationally expensive and infeasible in few-shot recommendation and cold-start setups, where data is scarce or completely unavailable. In this work, we propose a news-adapted sentence encoder (NaSE), domain-specialized from a pretrained massively multilingual sentence encoder (SE). To this end, we construct and leverage PolyNews and PolyNewsParallel, two multilingual news-specific corpora. With the news-adapted multilingual SE in place, we test the effectiveness of (i.e., question the need for) supervised fine-tuning for news recommendation, and propose a simple and strong baseline based on (i) frozen NaSE embeddings and (ii) late click-behavior fusion. We show that NaSE achieves state-of-the-art performance in ZS-XLT in true cold-start and few-shot news recommendation.
Submitted 18 June, 2024;
originally announced June 2024.
-
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
Authors:
Minh Duc Bui,
Fabian David Schmidt,
Goran Glavaš,
Katharina von der Wense
Abstract:
Compared to standard language model (LM) pretraining (i.e., from scratch), Knowledge Distillation (KD) entails an additional forward pass through a teacher model that is typically substantially larger than the target student model. As such, KD in LM pretraining materially slows down throughput of pretraining instances vis-a-vis pretraining from scratch. Scaling laws of LM pretraining suggest that smaller models can close the gap to larger counterparts if trained on more data (i.e., processing more tokens), and under a fixed computation budget, smaller models are able to process more data than larger models. We thus hypothesize that KD might, in fact, be suboptimal to pretraining from scratch for obtaining smaller LMs, when appropriately accounting for the compute budget. To test this, we compare pretraining from scratch against several KD strategies for masked language modeling (MLM) in a fair experimental setup, with respect to amount of computation as well as pretraining data. Downstream results on GLUE, however, do not confirm our hypothesis: while pretraining from scratch performs comparably to ordinary KD under a fixed computation budget, more sophisticated KD strategies, namely TinyBERT (Jiao et al., 2020) and MiniLM (Wang et al., 2023), outperform it by a notable margin. We further find that KD yields larger gains over pretraining from scratch when the data must be repeated under the fixed computation budget.
Submitted 30 April, 2024;
originally announced April 2024.
-
Prevalidated ridge regression is a highly-efficient drop-in replacement for logistic regression for high-dimensional data
Authors:
Angus Dempster,
Geoffrey I. Webb,
Daniel F. Schmidt
Abstract:
Logistic regression is a ubiquitous method for probabilistic classification. However, the effectiveness of logistic regression depends upon careful and relatively computationally expensive tuning, especially for the regularisation hyperparameter, and especially in the context of high-dimensional data. We present a prevalidated ridge regression model that closely matches logistic regression in terms of classification error and log-loss, particularly for high-dimensional data, while being significantly more computationally efficient and having effectively no hyperparameters beyond regularisation. We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions derived from the estimated leave-one-out cross-validation error. This exploits quantities already computed in the course of fitting the ridge regression model in order to find the scaling parameter with nominal additional computational expense.
Submitted 28 January, 2024;
originally announced January 2024.
-
Computational homogenization of phase-field fracture
Authors:
Felix Schmidt,
Stefan Schuß,
Christian Hesch
Abstract:
In this contribution we investigate the application of phase-field fracture models to non-linear multiscale computational homogenization schemes. In particular, we introduce different phase-fields on a two-scale problem and develop a thermodynamically consistent model. This allows on the one hand for the prediction of local micro-fracture patterns, which effectively act as an anisotropic damage model on the macroscale. On the other hand, the macro-fracture phase-field model allows the prediction of complex fracture patterns with regard to local microstructures. Both phase-fields are introduced in a common framework, such that a joint consistent linearization for the Newton-Raphson iteration can be developed. Finally, the limits of both models as well as the applicability are shown in different numerical examples.
Submitted 20 December, 2023;
originally announced December 2023.
-
Variational formulation and monolithic solution of computational homogenization methods
Authors:
Christian Hesch,
Felix Schmidt,
Stefan Schuß
Abstract:
In this contribution, we derive a consistent variational formulation for computational homogenization methods and show that traditional FE2 and IGA2 approaches are special discretization and solution techniques of this most general framework. This allows us to enhance dramatically the numerical analysis as well as the solution of the arising algebraic system. In particular, we expand the dimension of the continuous system, discretize the higher dimensional problem consistently and apply afterwards a discrete null-space matrix to remove the additional dimensions. A benchmark problem, for which we can obtain an analytical solution, demonstrates the superiority of the chosen approach aiming to reduce the immense computational costs of traditional FE2 and IGA2 formulations to a fraction of the original requirements. Finally, we demonstrate a further reduction of the computational costs for the solution of general non-linear problems.
Submitted 20 December, 2023;
originally announced December 2023.
-
From Prediction to Action: Critical Role of Performance Estimation for Machine-Learning-Driven Materials Discovery
Authors:
Mario Boley,
Felix Luong,
Simon Teshuva,
Daniel F. Schmidt,
Lucas Foppa,
Matthias Scheffler
Abstract:
Materials discovery driven by statistical property models is an iterative decision process, during which an initial data collection is extended with new data proposed by a model-informed acquisition function--with the goal to maximize a certain "reward" over time, such as the maximum property value discovered so far. While the materials science community achieved much progress in developing property models that predict well on average with respect to the training distribution, this form of in-distribution performance measurement is not directly coupled with the discovery reward. This is because an iterative discovery process has a shifting reward distribution that is over-proportionally determined by the model performance for exceptional materials. We demonstrate this problem using the example of bulk modulus maximization among double perovskite oxides. We find that the in-distribution predictive performance suggests random forests as superior to Gaussian process regression, while the results are reversed in terms of the discovery rewards. We argue that the lack of proper performance estimation methods from pre-computed data collections is a fundamental problem for improving data-driven materials discovery, and we propose a novel such estimator that, in contrast to naïve reward estimation, successfully predicts Gaussian processes with the "expected improvement" acquisition function as the best out of four options in our demonstrational study for double perovskites. Importantly, it does so without requiring the more than a thousand ab initio computations that were needed to confirm this prediction.
Submitted 6 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Scalable Probabilistic Forecasting in Retail with Gradient Boosted Trees: A Practitioner's Approach
Authors:
Xueying Long,
Quang Bui,
Grady Oktavian,
Daniel F. Schmidt,
Christoph Bergmeir,
Rakshitha Godahewa,
Seong Per Lee,
Kaifeng Zhao,
Paul Condylis
Abstract:
The recent M5 competition has advanced the state-of-the-art in retail forecasting. However, we notice important differences between the competition challenge and the challenges we face in a large e-commerce company. The datasets in our scenario are larger (hundreds of thousands of time series), and e-commerce can afford to have a larger assortment than brick-and-mortar retailers, leading to more intermittent data. To scale to larger dataset sizes with feasible computational effort, we first investigate a two-layer hierarchy and propose a top-down approach: forecasting at an aggregated level with fewer series and less intermittency, and then disaggregating to obtain the decision-level forecasts. Probabilistic forecasts are generated under distributional assumptions. Second, direct training at the lower level with subsamples is an alternative way of scaling. Performance of modelling with subsets is evaluated with the main dataset. Apart from a proprietary dataset, the proposed scalable methods are evaluated using the Favorita dataset and the M5 dataset. We are able to show the differences in characteristics of the e-commerce and brick-and-mortar retail datasets. Notably, our top-down forecasting framework enters the top 50 of the original M5 competition, even with models trained at a higher level under a much simpler setting.
Submitted 2 November, 2023;
originally announced November 2023.
-
Bayes beats Cross Validation: Efficient and Accurate Ridge Regression via Expectation Maximization
Authors:
Shu Yu Tew,
Mario Boley,
Daniel F. Schmidt
Abstract:
We present a novel method for tuning the regularization hyper-parameter, $λ$, of a ridge regression that is faster to compute than leave-one-out cross-validation (LOOCV) while yielding estimates of the regression parameters of equal or, particularly in the setting of sparse covariates, superior quality to those obtained by minimising the LOOCV risk. The LOOCV risk can suffer from multiple and bad local minima for finite $n$ and thus requires the specification of a set of candidate $λ$, which can fail to provide good solutions. In contrast, we show that the proposed method is guaranteed to find a unique optimal solution for large enough $n$, under relatively mild conditions, without requiring the specification of any difficult-to-determine hyper-parameters. This is based on a Bayesian formulation of ridge regression that we prove to have a unimodal posterior for large enough $n$, allowing for both the optimal $λ$ and the regression coefficients to be jointly learned within an iterative expectation maximization (EM) procedure. Importantly, we show that by utilizing an appropriate preprocessing step, a single iteration of the main EM loop can be implemented in $O(\min(n, p))$ operations, for input data with $n$ rows and $p$ columns. In contrast, evaluating a single value of $λ$ using fast LOOCV costs $O(n \min(n, p))$ operations when using the same preprocessing. This advantage amounts to an asymptotic improvement of a factor of $l$ for $l$ candidate values for $λ$ (in the regime $q, p \in O(\sqrt{n})$ where $q$ is the number of regression targets).
Submitted 2 November, 2023; v1 submitted 28 October, 2023;
originally announced October 2023.
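The "fast LOOCV" baseline that the abstract above compares against is commonly computed with the leave-one-out shortcut for linear smoothers. The sketch below illustrates that standard shortcut only, not the paper's EM procedure. It assumes a single covariate and no intercept so that it stays dependency-free; the function names are hypothetical.

```python
def ridge_loocv_residuals(x, y, lam):
    """Leave-one-out residuals from ONE ridge fit (single covariate, no intercept).

    Uses the identity  r_i^(loo) = (y_i - x_i * beta) / (1 - h_ii),
    where h_ii = x_i^2 / (sum(x^2) + lam) is the i-th diagonal of the hat matrix.
    Assumes lam > 0 so that every denominator is non-zero.
    """
    s = sum(v * v for v in x) + lam
    beta = sum(a * b for a, b in zip(x, y)) / s
    return [(yi - xi * beta) / (1.0 - xi * xi / s) for xi, yi in zip(x, y)]


def brute_force_loocv_residuals(x, y, lam):
    """Reference implementation: refit the model n times, leaving out one point each time."""
    residuals = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        s = sum(v * v for v in x_rest) + lam
        beta = sum(a * b for a, b in zip(x_rest, y_rest)) / s
        residuals.append(y[i] - x[i] * beta)
    return residuals


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [1.5, 3.5, 7.0]
    fast = ridge_loocv_residuals(x, y, lam=0.5)
    slow = brute_force_loocv_residuals(x, y, lam=0.5)
    # The two lists agree up to floating-point rounding.
    print(fast)
    print(slow)
```

Selecting $λ$ with this shortcut still means repeating the computation for every candidate value, which is the cost the abstract contrasts with its EM iteration.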
-
One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer
Authors:
Fabian David Schmidt,
Ivan Vulić,
Goran Glavaš
Abstract:
Multilingual language models enable zero-shot cross-lingual transfer (ZS-XLT): fine-tuned on sizable source-language task data, they perform the task in target languages without labeled instances. The effectiveness of ZS-XLT hinges on the linguistic proximity between languages and the amount of pretraining data for a language. Because of this, model selection based on source-language validation is unreliable: it picks model snapshots with suboptimal target-language performance. As a remedy, some work optimizes ZS-XLT by extensively tuning hyperparameters: the follow-up work then routinely struggles to replicate the original results. Other work searches over narrower hyperparameter grids, reporting substantially lower performance. In this work, we therefore propose an unsupervised evaluation protocol for ZS-XLT that decouples performance maximization from hyperparameter tuning. As a robust and more transparent alternative to extensive hyperparameter tuning, we propose to accumulatively average snapshots from different runs into a single model. We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER) and find that conventional model selection based on source-language validation quickly plateaus to suboptimal ZS-XLT performance. On the other hand, our accumulative run-by-run averaging of models trained with different hyperparameters boosts ZS-XLT performance and closely correlates with "oracle" ZS-XLT, i.e., model selection based on target-language validation performance.
Submitted 16 October, 2023;
originally announced October 2023.
-
Computing Marginal and Conditional Divergences between Decomposable Models with Applications
Authors:
Loong Kuan Lee,
Geoffrey I. Webb,
Daniel F. Schmidt,
Nico Piatkowski
Abstract:
The ability to compute the exact divergence between two high-dimensional distributions is useful in many applications but doing so naively is intractable. Computing the alpha-beta divergence -- a family of divergences that includes the Kullback-Leibler divergence and Hellinger distance -- between the joint distribution of two decomposable models, i.e., chordal Markov networks, can be done in time exponential in the treewidth of these models. However, reducing the dissimilarity between two high-dimensional objects to a single scalar value can be uninformative. Furthermore, in applications such as supervised learning, the divergence over a conditional distribution might be of more interest. Therefore, we propose an approach to compute the exact alpha-beta divergence between any marginal or conditional distribution of two decomposable models. Doing so tractably is non-trivial as we need to decompose the divergence between these distributions and therefore require a decomposition over the marginal and conditional distributions of these models. Consequently, we provide such a decomposition and also extend existing work to compute the marginal and conditional alpha-beta divergence between these decompositions. We then show how our method can be used to analyze distributional changes by first applying it to a benchmark image dataset. Finally, based on our framework, we propose a novel way to quantify the error in contemporary superconducting quantum computers. Code for all experiments is available at: https://lklee.dev/pub/2023-icdm/code
Submitted 13 October, 2023;
originally announced October 2023.
-
VR-based body tracking to stimulate musculoskeletal training
Authors:
M. Neidhardt,
S. Gerlach,
F. N. Schmidt,
I. A. K. Fiedler,
S. Grube,
B. Busse,
A. Schlaefer
Abstract:
Training helps to maintain and improve sufficient muscle function, body control, and body coordination. These are important to reduce the risk of fracture incidents caused by falls, especially for the elderly or people recovering from injury. Virtual reality training can offer a cost-effective and individualized training experience. We present an application for the HoloLens 2 to enable musculoskeletal training for elderly and impaired persons to allow for autonomous training and automatic progress evaluation. We designed a virtual downhill skiing scenario that is controlled by body movement to stimulate balance and body control. By adapting the parameters of the ski slope, we can tailor the intensity of the training to individual users. In this work, we evaluate whether the movement data of the HoloLens 2 alone is sufficient to control and predict body movement and joint angles during musculoskeletal training. We record the movements of 10 healthy volunteers with external tracking cameras and track a set of body and joint angles of the participant during training. We estimate correlation coefficients and systematically analyze whether whole body movement can be derived from the movement data of the HoloLens 2. No participant reported motion sickness effects, and all were able to quickly interact and control their movement during skiing. Our results show a high correlation between HoloLens 2 movement data and the external tracking of the upper body movement and joint angles of the lower limbs.
Submitted 7 August, 2023;
originally announced August 2023.
-
QUANT: A Minimalist Interval Method for Time Series Classification
Authors:
Angus Dempster,
Daniel F. Schmidt,
Geoffrey I. Webb
Abstract:
We show that it is possible to achieve the same accuracy, on average, as the most accurate existing interval methods for time series classification on a standard set of benchmark datasets using a single type of feature (quantiles), fixed intervals, and an 'off the shelf' classifier. This distillation of interval-based approaches represents a fast and accurate method for time series classification, achieving state-of-the-art accuracy on the expanded set of 142 datasets in the UCR archive with a total compute time (training and inference) of less than 15 minutes using a single CPU core.
Submitted 2 August, 2023;
originally announced August 2023.
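The general idea in the abstract above (quantile features computed over fixed intervals, then passed to an off-the-shelf classifier) can be sketched roughly as follows. This is a simplified illustration, not the authors' configuration: the dyadic splitting scheme, the `depth` and `num_quantiles` parameters, the nearest-rank quantile rule, and the function names are all assumptions made here.

```python
def fixed_intervals(length, depth):
    """Return (start, end) index pairs from splitting [0, length) into 1, 2, 4, ... equal parts."""
    intervals = []
    for d in range(depth + 1):
        parts = 2 ** d
        for i in range(parts):
            start = i * length // parts
            end = (i + 1) * length // parts
            if end > start:  # skip empty intervals on very short series
                intervals.append((start, end))
    return intervals


def quantile_features(series, depth=2, num_quantiles=3):
    """Concatenate evenly spaced order statistics (min ... max) of every fixed interval."""
    features = []
    for start, end in fixed_intervals(len(series), depth):
        window = sorted(series[start:end])
        for q in range(num_quantiles):
            # Nearest-rank position between the minimum (q = 0) and the maximum.
            pos = q * (len(window) - 1) // max(num_quantiles - 1, 1)
            features.append(window[pos])
    return features


if __name__ == "__main__":
    print(quantile_features([3, 1, 2, 4], depth=1, num_quantiles=3))
    # → [1, 2, 4, 1, 1, 3, 2, 2, 4]
```

Because the intervals depend only on the series length, every series of the same length yields a feature vector of the same size, so the output can be fed to any standard tabular classifier.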
-
Adaptive Certified Training: Towards Better Accuracy-Robustness Tradeoffs
Authors:
Zhakshylyk Nurlanov,
Frank R. Schmidt,
Florian Bernard
Abstract:
As deep learning models continue to advance and are increasingly utilized in real-world systems, the issue of robustness remains a major challenge. Existing certified training methods produce models that achieve high provable robustness guarantees at certain perturbation levels. However, the main problem of such models is a dramatically low standard accuracy, i.e. accuracy on clean unperturbed data, that makes them impractical. In this work, we consider a more realistic perspective of maximizing the robustness of a model at certain levels of (high) standard accuracy. To this end, we propose a novel certified training method based on a key insight that training with adaptive certified radii helps to improve both the accuracy and robustness of the model, advancing state-of-the-art accuracy-robustness tradeoffs. We demonstrate the effectiveness of the proposed method on MNIST, CIFAR-10, and TinyImageNet datasets. Particularly, on CIFAR-10 and TinyImageNet, our method yields models with up to two times higher robustness, measured as an average certified radius of a test set, at the same levels of standard accuracy compared to baseline approaches.
Submitted 24 July, 2023;
originally announced July 2023.
-
Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging
Authors:
Fabian David Schmidt,
Ivan Vulić,
Goran Glavaš
Abstract:
Massively multilingual language models have displayed strong performance in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups, where models fine-tuned on task data in a source language are transferred without any or with only a few annotated instances to the target language(s). However, current work typically overestimates model performance as fine-tuned models are frequently evaluated at model checkpoints that generalize best to validation instances in the target languages. This effectively violates the main assumptions of "true" ZS-XLT and FS-XLT. Such XLT setups require robust methods that do not depend on labeled target language data for validation and model selection. In this work, aiming to improve the robustness of "true" ZS-XLT and FS-XLT, we propose a simple and effective method that averages different checkpoints (i.e., model snapshots) during task fine-tuning. We conduct exhaustive ZS-XLT and FS-XLT experiments across higher-level semantic tasks (NLI, extractive QA) and lower-level token classification tasks (NER, POS). The results indicate that averaging model checkpoints yields systematic and consistent performance gains across diverse target languages in all tasks. Importantly, it simultaneously substantially desensitizes XLT to varying hyperparameter choices in the absence of target language validation. We also show that checkpoint averaging benefits performance when further combined with run averaging (i.e., averaging the parameters of models fine-tuned over independent runs).
Submitted 26 May, 2023;
originally announced May 2023.
-
An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set
Authors:
Ali Ismail-Fawaz,
Angus Dempster,
Chang Wei Tan,
Matthieu Herrmann,
Lynn Miller,
Daniel F. Schmidt,
Stefano Berretti,
Jonathan Weber,
Maxime Devanne,
Germain Forestier,
Geoffrey I. Webb
Abstract:
The measurement of progress using benchmark evaluations is ubiquitous in computer science and machine learning. However, common approaches to analyzing and presenting the results of benchmark comparisons of multiple algorithms over multiple datasets, such as the critical difference diagram introduced by Demšar (2006), have important shortcomings and, we show, are open to both inadvertent and intentional manipulation. To address these issues, we propose a new approach to presenting the results of benchmark comparisons, the Multiple Comparison Matrix (MCM), that prioritizes pairwise comparisons and precludes the means of manipulating experimental results in existing approaches. MCM can be used to show the results of an all-pairs comparison, or to show the results of a comparison between one or more selected algorithms and the state of the art. MCM is implemented in Python and is publicly available.
Submitted 19 May, 2023;
originally announced May 2023.
-
Unlocking Layer-wise Relevance Propagation for Autoencoders
Authors:
Kenyu Kobayashi,
Renata Khasanova,
Arno Schneuwly,
Felix Schmidt,
Matteo Casserini
Abstract:
Autoencoders are a powerful and versatile tool often used for various problems such as anomaly detection, image processing and machine translation. However, their reconstructions are not always trivial to explain. Therefore, we propose a fast explainability solution by extending the Layer-wise Relevance Propagation method with the help of the Deep Taylor Decomposition framework. Furthermore, we introduce a novel validation technique for comparing our explainability approach with baseline methods in the case of missing ground-truth data. Our results highlight computational as well as qualitative advantages of the proposed explainability solution with respect to existing methods.
Submitted 21 March, 2023;
originally announced March 2023.
-
Approximation of radiative transfer for surface spectral features
Authors:
Frédéric Schmidt
Abstract:
Remote sensing hyperspectral and, more generally, spectral instruments are common tools to decipher surface features in Earth and Planetary science. While linear mixture is the most common approximation for compound detection (mineral, water, ice, etc.), the transfer of light in surface and atmospheric media is highly non-linear. The exact simulation of non-linearities comes at a very high numerical cost. Here I propose a very simple non-linear form (that includes the regular linear area mixture) of radiative transfer to approximate surface spectral features. I demonstrate that this analytical form is able to approximate the grain size and intimate mixture dependence of surface features. In addition, the same analytical form can approximate the effect of Martian mineral aerosols. Unfortunately, Earth aerosols are more complex (water droplets, water ice, soot, ...) and are not expected to follow the same trend.
Submitted 13 April, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Universe Points Representation Learning for Partial Multi-Graph Matching
Authors:
Zhakshylyk Nurlanov,
Frank R. Schmidt,
Florian Bernard
Abstract:
Many challenges from the natural world can be formulated as a graph matching problem. Previous deep learning-based methods mainly consider a full two-graph matching setting. In this work, we study the more general partial matching problem with multi-graph cycle consistency guarantees. Building on recent progress in deep learning on graphs, we propose a novel data-driven method (URL) for partial multi-graph matching, which uses an object-to-universe formulation and learns latent representations of abstract universe points. The proposed approach advances the state of the art in the semantic keypoint matching problem, evaluated on the Pascal VOC, CUB, and Willow datasets. Moreover, a set of controlled experiments on a synthetic graph matching dataset demonstrates the scalability of our method to graphs with a large number of nodes and its robustness to high partiality.
Submitted 7 December, 2022; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Sparse Horseshoe Estimation via Expectation-Maximisation
Authors:
Shu Yu Tew,
Daniel F. Schmidt,
Enes Makalic
Abstract:
The horseshoe prior is known to possess many desirable properties for Bayesian estimation of sparse parameter vectors, yet its density function lacks an analytic form. As such, it is challenging to find a closed-form solution for the posterior mode. Conventional horseshoe estimators use the posterior mean to estimate the parameters, but these estimates are not sparse. We propose a novel expectation-maximisation (EM) procedure for computing the MAP estimates of the parameters in the case of the standard linear model. A particular strength of our approach is that the M-step depends only on the form of the prior and is independent of the form of the likelihood. We introduce several simple modifications of this EM procedure that allow for straightforward extension to generalised linear models. In experiments performed on simulated and real data, our approach performs comparably to, or better than, state-of-the-art sparse estimation methods in terms of statistical performance and computational cost.
Submitted 6 November, 2022;
originally announced November 2022.
-
Robotic Maintenance of Road Infrastructures: The HERON Project
Authors:
Iason Katsamenis,
Matthaios Bimpas,
Eftychios Protopapadakis,
Charalampos Zafeiropoulos,
Dimitris Kalogeras,
Anastasios Doulamis,
Nikolaos Doulamis,
Carlos Martín-Portugués Montoliu,
Yannis Handanos,
Franziska Schmidt,
Lionel Ott,
Miquel Cantero,
Rafael Lopez
Abstract:
Of all public assets, road infrastructure tops the list. Roads are crucial for economic development and growth, providing access to education, health, and employment. The maintenance, repair, and upgrade of roads are therefore vital to road users' health and safety as well as to a well-functioning and prosperous modern economy. The EU-funded HERON project will develop an integrated automated system to adequately maintain road infrastructure. In turn, this will reduce accidents, lower maintenance costs, and increase road network capacity and efficiency. To coordinate maintenance works, the project will design an autonomous ground robotic vehicle that will be supported by autonomous drones. Sensors and scanners for 3D mapping will be used in addition to artificial intelligence toolkits to help coordinate road maintenance and upgrade workflows.
Submitted 9 May, 2022;
originally announced May 2022.
-
HYDRA: Competing convolutional kernels for fast and accurate time series classification
Authors:
Angus Dempster,
Daniel F. Schmidt,
Geoffrey I. Webb
Abstract:
We demonstrate a simple connection between dictionary methods for time series classification, which involve extracting and counting symbolic patterns in time series, and methods based on transforming input time series using convolutional kernels, namely ROCKET and its variants. We show that by adjusting a single hyperparameter it is possible to move by degrees between models resembling dictionary methods and models resembling ROCKET. We present HYDRA, a simple, fast, and accurate dictionary method for time series classification using competing convolutional kernels, combining key aspects of both ROCKET and conventional dictionary methods. HYDRA is faster and more accurate than the most accurate existing dictionary methods, and can be combined with ROCKET and its variants to further improve the accuracy of these methods.
Submitted 25 March, 2022;
originally announced March 2022.
-
Efficient Runtime Profiling for Black-box Machine Learning Services on Sensor Streams
Authors:
Soeren Becker,
Dominik Scheinert,
Florian Schmidt,
Odej Kao
Abstract:
In highly distributed environments such as cloud, edge and fog computing, the application of machine learning for automating and optimizing processes is on the rise. Machine learning jobs are frequently applied in streaming conditions, where models are used to analyze data streams originating from, e.g., video streams or sensory data. Often the results for particular data samples need to be provided in time before the arrival of the next data. Thus, enough resources must be provided to ensure the just-in-time processing for the specific data stream. This paper focuses on proposing a runtime modeling strategy for containerized machine learning jobs, which enables the optimization and adaptive adjustment of resources per job and component. Our black-box approach assembles multiple techniques into an efficient runtime profiling method, while making no assumptions about underlying hardware, data streams, or applied machine learning jobs. The results show that our method is able to capture the general runtime behaviour of different machine learning jobs already after a short profiling phase.
Submitted 10 March, 2022;
originally announced March 2022.
-
Towards a trustworthy, secure and reliable enclave for machine learning in a hospital setting: The Essen Medical Computing Platform (EMCP)
Authors:
Hendrik F. R. Schmidt,
Jörg Schlötterer,
Marcel Bargull,
Enrico Nasca,
Ryan Aydelott,
Christin Seifert,
Folker Meyer
Abstract:
AI/Computing at scale is a difficult problem, especially in a health care setting. We outline the requirements, planning and implementation choices as well as the guiding principles that led to the implementation of our secure research computing enclave, the Essen Medical Computing Platform (EMCP), affiliated with a major German hospital. Compliance, data privacy and usability were the immutable requirements of the system. We will discuss the features of our computing enclave and we will provide our recipe for groups wishing to adopt a similar setup.
Submitted 13 January, 2022;
originally announced January 2022.
-
Computational homogenization of higher-order continua
Authors:
Felix Schmidt,
Melanie Krüger,
Marc-Andre Keip,
Christian Hesch
Abstract:
We introduce a novel computational framework for the multiscale simulation of higher-order continua that allows for the consideration of first-, second- and third-order effects at both micro- and macro-level. In line with classical two-scale approaches, we describe the microstructure via representative volume elements (RVE) that are attached at each integration point of the macroscopic problem. To take account of the extended continuity requirements of independent fields at micro- and macro-level, we discretize both scales via isogeometric analysis (IGA). As a result, we obtain an IGA²-method that is conceptually similar to the well-known FE²-method. We demonstrate the functionality and accuracy of this novel multiscale method by means of a series of multiscale simulations involving different kinds of higher-order continua.
Submitted 7 March, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
LOS: Local-Optimistic Scheduling of Periodic Model Training For Anomaly Detection on Sensor Data Streams in Meshed Edge Networks
Authors:
Soeren Becker,
Florian Schmidt,
Lauritz Thamsen,
Ana Juan Ferrer,
Odej Kao
Abstract:
Anomaly detection is increasingly important to handle the amount of sensor data in Edge and Fog environments, Smart Cities, as well as in Industry 4.0. To ensure good results, the utilized ML models need to be updated periodically to adapt to seasonal changes and concept drifts in the sensor data. Although the increasing resource availability at the edge can allow for in-situ execution of model training directly on the devices, it is still often offloaded to fog devices or the cloud.
In this paper, we propose Local-Optimistic Scheduling (LOS), a method for executing periodic ML model training jobs in close proximity to the data sources, without overloading lightweight edge devices. Training jobs are offloaded to nearby neighbor nodes as necessary and the resource consumption is optimized to meet the training period while still ensuring enough resources for further training executions. This scheduling is accomplished in a decentralized, collaborative and opportunistic manner, without full knowledge of the infrastructure and workload. We evaluated our method in an edge computing testbed on real-world datasets. The experimental results show that LOS places the training executions close to the input sensor streams, decreases the deviation between training time and training period by up to 40% and increases the amount of successfully scheduled training jobs compared to an in-situ execution.
Submitted 27 September, 2021;
originally announced September 2021.
-
EdgePier: P2P-based Container Image Distribution in Edge Computing Environments
Authors:
Soeren Becker,
Florian Schmidt,
Odej Kao
Abstract:
Edge and fog computing architectures utilize container technologies in order to offer a lightweight application deployment. Container images are stored in registry services and operated by orchestration platforms to download and start the respective applications on nodes of the infrastructure. During large application rollouts, the connection to the registry is prone to become a bottleneck, which results in longer provisioning times and deployment latencies. Previous work has mainly addressed this problem by proposing scalable registries, leveraging the BitTorrent protocol or distributed storage to host container images. However, for lightweight and dynamic edge environments the overhead of several dedicated components is not feasible in regard to its interference of the actual workload and is subject to failures due to the introduced complexity.
In this paper we introduce a fully decentralized container registry called EdgePier, that can be deployed across edge sites and is able to decrease container deployment times by utilizing peer-to-peer connections between participating nodes. Image layers are shared without the need for further centralized orchestration entities. The conducted evaluation shows that the provisioning times are improved by up to 65% in comparison to a baseline registry, even with limited bandwidth to the cloud.
Submitted 27 September, 2021;
originally announced September 2021.
-
Multi-Objective Reconstruction Of Software Architecture
Authors:
Frederick Schmidt,
Stephen MacDonell,
Andy M. Connor
Abstract:
Design erosion is a persistent problem within the software engineering discipline. Software designs tend to deteriorate over time and there is a need for tools and techniques that support software architects when dealing with legacy systems. This paper presents an evaluation of a Search Based Software Engineering (SBSE) approach intended to recover high-level architecture designs of software systems by structuring low-level artefacts into high-level architecture artefact configurations. In particular, this paper describes the performance evaluation of a number of metaheuristic search algorithms applied to architecture reconstruction problems with high dimensionality in terms of objectives. These problems have been selected as representative of the typical challenges faced by software architects dealing with legacy systems and the results inform the ongoing development of a software tool that supports the analysis of trade-offs between different reconstructed architectures.
Submitted 28 April, 2021;
originally announced April 2021.
-
Towards a Cognitive Compute Continuum: An Architecture for Ad-Hoc Self-Managed Swarms
Authors:
Ana Juan Ferrer,
Soeren Becker,
Florian Schmidt,
Lauritz Thamsen,
Odej Kao
Abstract:
In this paper we introduce our vision of a Cognitive Computing Continuum to address the changing IT service provisioning towards a distributed, opportunistic, self-managed collaboration between heterogeneous devices outside the traditional data center boundaries. The focal point of this continuum are cognitive devices, which have to make decisions autonomously using their on-board computation and storage capacity based on information sensed from their environment. Such devices are moving and cannot rely on fixed infrastructure elements, but instead realise on-the-fly networking and thus frequently join and leave temporal swarms. All this creates novel demands for the underlying architecture and resource management, which must bridge the gap from edge to cloud environments, while keeping the QoS parameters within required boundaries. The paper presents an initial architecture and a resource management framework for the implementation of this type of IT service provisioning.
Submitted 10 March, 2021;
originally announced March 2021.
-
Towards AIOps in Edge Computing Environments
Authors:
Soeren Becker,
Florian Schmidt,
Anton Gulenko,
Alexander Acker,
Odej Kao
Abstract:
Edge computing was introduced as a technical enabler for the demanding requirements of new network technologies like 5G. It aims to overcome challenges related to centralized cloud computing environments by distributing computational resources to the edge of the network towards the customers. The complexity of the emerging infrastructures increases significantly, together with the ramifications of outages on critical use cases such as self-driving cars or health care. Artificial Intelligence for IT Operations (AIOps) aims to support human operators in managing complex infrastructures by using machine learning methods. This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments. The overhead of a high-frequency monitoring solution on edge devices is evaluated and performance experiments regarding the applicability of three anomaly detection algorithms on edge devices are conducted. The results show that it is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices with a reasonable overhead on the resource utilization.
Submitted 12 February, 2021;
originally announced February 2021.
-
Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series
Authors:
Kevin Styp-Rekowski,
Florian Schmidt,
Odej Kao
Abstract:
Forecasting of time series in continuous systems is becoming an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA has been applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This optimization method has pitfalls concerning the computational complexity and convergence. Thus, this work focuses on the computationally less expensive Online Gradient Descent optimization method, which has become popular for learning of neural networks in recent years. For the iterative training of such models, we propose a new approach combining different Online Gradient Descent learners (such as Adam, AMSGrad, Adagrad, Nesterov) to achieve fast convergence. The evaluation on synthetic data and experimental datasets shows that the proposed approach outperforms the existing methods, resulting in an overall lower prediction error.
Submitted 25 January, 2021;
originally announced January 2021.
-
Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper
Authors:
Jasmin Bogatinovski,
Sasho Nedelkoski,
Alexander Acker,
Florian Schmidt,
Thorsten Wittkopp,
Soeren Becker,
Jorge Cardoso,
Odej Kao
Abstract:
Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising in the intersection between the research areas of machine learning, big data, streaming analytics, and the management of IT operations. AIOps, as a field, is a candidate to produce the future standard for IT operation management. To that end, AIOps has several challenges. First, it needs to combine separate research branches from other research fields like software reliability engineering. Second, novel modelling techniques are needed to understand the dynamics of different systems. Furthermore, it requires to lay out the basis for assessing: time horizons and uncertainty for imminent SLA violations, the early detection of emerging problems, autonomous remediation, decision making, support of various optimization objectives. Moreover, a good understanding and interpretability of these aiding models are important for building trust between the employed tools and the domain experts. Finally, all this will result in faster adoption of AIOps, further increase the interest in this research field and contribute to bridging the gap towards fully-autonomous operating IT systems.
The main aim of the AIOPS workshop is to bring together researchers from both academia and industry to present their experiences, results, and work in progress in this field. The workshop aims to strengthen the community and unite it towards the goal of joining the efforts for solving the main challenges the field is currently facing. A consensus and adoption of the principles of openness and reproducibility will boost the research in this emerging area significantly.
Submitted 15 January, 2021;
originally announced January 2021.
-
Multidimensional coupling: A variationally consistent approach to fiber-reinforced materials
Authors:
Ustim Khristenko,
Stefan Schuß,
Melanie Krüger,
Felix Schmidt,
Barbara Wohlmuth,
Christian Hesch
Abstract:
A novel mathematical model for fiber-reinforced materials is proposed. It is based on a 1-dimensional beam model for the thin fiber structures, a flexible and general 3-dimensional elasticity model for the matrix and an overlapping domain decomposition approach. From a computational point of view, this is motivated by the fact that matrix and fibers can easily be meshed independently. Our main interest is in fiber-reinforced polymers where the Young's moduli are quite different. Thus the modeling error from the overlapping approach is of no significance. The coupling conditions acknowledge both the forces and the moments of the beam model and transfer them to the background material. A suitable static condensation procedure is applied to remove the beam balance equations. The condensed system then forms our starting point for a numerical approximation in terms of isogeometric analysis. The choice of our discrete basis functions of higher regularity is motivated by the fact that, as a result of the static condensation, we obtain second gradient terms in fiber direction. Eventually, a series of benchmark tests demonstrates the flexibility and robustness of the proposed methodology. As a proof-of-concept, we show that our new model is able to capture bending, torsion and shear dominated situations.
Submitted 10 April, 2021; v1 submitted 7 January, 2021;
originally announced January 2021.
-
The voice of COVID-19: Acoustic correlates of infection
Authors:
Katrin D. Bartl-Pokorny,
Florian B. Pokorny,
Anton Batliner,
Shahin Amiriparian,
Anastasia Semertzidou,
Florian Eyben,
Elena Kramer,
Florian Schmidt,
Rainer Schönweiler,
Markus Wehler,
Björn W. Schuller
Abstract:
COVID-19 is a global health crisis that has been affecting many aspects of our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. A considerable proportion of symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the very first time, the present study aims to investigate voice acoustic correlates of an infection with COVID-19 on the basis of a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /o:/, /u:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with the most prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio, group differences in back vowels /o:/ and /u:/ in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Findings of this study can be considered an important proof-of-concept contribution for a potential future voice-based identification of individuals infected with COVID-19.
Submitted 17 December, 2020;
originally announced December 2020.
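The group comparison described in this abstract can be illustrated with a minimal sketch of the Mann-Whitney U statistic and one possible effect size. The abstract does not name the effect size measure, so the rank-biserial correlation used here is an assumption, as are the function names and the feature values, which are made up for illustration. The p-value computation is omitted.

```python
from itertools import product


def mann_whitney_u(x, y):
    """U statistic for sample x: count of pairs (xi, yj) with xi > yj; ties count 0.5."""
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


def rank_biserial(x, y):
    """Effect size in [-1, 1] (assumed measure): 2U / (n_x * n_y) - 1."""
    return 2.0 * mann_whitney_u(x, y) / (len(x) * len(y)) - 1.0


# Hypothetical values of one acoustic feature (e.g. mean voiced segment length)
# for two small groups; these numbers are NOT taken from the study.
positive = [0.21, 0.25, 0.30, 0.34]
negative = [0.33, 0.40, 0.45, 0.52]

u = mann_whitney_u(positive, negative)
effect = rank_biserial(positive, negative)
print(u, effect)
```

An effect size near -1 or +1 means the two groups barely overlap on that feature, while a value near 0 means they are largely indistinguishable.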
-
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification
Authors:
Angus Dempster,
Daniel F. Schmidt,
Geoffrey I. Webb
Abstract:
Until recently, the most accurate methods for time series classification were limited by high computational complexity. ROCKET achieves state-of-the-art accuracy with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate ROCKET into a new method, MINIROCKET, making it up to 75 times faster on larger datasets, and making it almost deterministic (and optionally, with additional computational expense, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in less than 10 minutes. MINIROCKET is significantly faster than any other method of comparable accuracy (including ROCKET), and significantly more accurate than any other method of even roughly-similar computational expense. As such, we suggest that MINIROCKET should now be considered and used as the default variant of ROCKET.
Submitted 14 July, 2021; v1 submitted 16 December, 2020;
originally announced December 2020.
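The kernel-based transform summarized in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: kernel length is fixed at 9, dilation and padding are left out, and the names `make_kernels` and `transform` are hypothetical. It assumes the two pooled features associated with ROCKET, the maximum and the proportion of positive values (PPV), which would then be passed to a linear classifier.

```python
import random


def make_kernels(num_kernels, length=9, seed=0):
    """Draw random kernels: mean-centred Gaussian weights plus a uniform bias."""
    rng = random.Random(seed)
    kernels = []
    for _ in range(num_kernels):
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean = sum(weights) / length
        weights = [w - mean for w in weights]
        bias = rng.uniform(-1.0, 1.0)
        kernels.append((weights, bias))
    return kernels


def transform(series, kernels):
    """Map one time series to 2 features per kernel: max and PPV of the convolution output."""
    features = []
    for weights, bias in kernels:
        k = len(weights)
        # Slide the kernel over the series (no padding, no dilation in this sketch).
        outputs = [
            sum(w * series[i + j] for j, w in enumerate(weights)) + bias
            for i in range(len(series) - k + 1)
        ]
        features.append(max(outputs))
        features.append(sum(o > 0 for o in outputs) / len(outputs))
    return features


# Toy periodic series, purely for illustration.
series = [float(i % 5) for i in range(50)]
features = transform(series, make_kernels(num_kernels=4))
print(len(features))
```

Fixing the seed makes this sketch reproducible, but that is a convenience of the example and should not be read as the mechanism by which MINIROCKET becomes almost deterministic.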
-
Machine Learning for automatic identification of new minor species
Authors:
Frederic Schmidt,
Guillaume Cruz Mermy,
Justin Erwin,
Severine Robert,
Lori Neary,
Ian R. Thomas,
Frank Daerden,
Bojan Ristic,
Manish R. Patel,
Giancarlo Bellucci,
Jose-Juan Lopez-Moreno,
Ann-Carine Vandaele
Abstract:
One of the main difficulties in analyzing modern spectroscopic datasets is the large amount of data. For example, in atmospheric transmittance spectroscopy, the solar occultation channel (SO) of the NOMAD instrument onboard the ESA ExoMars2016 satellite called Trace Gas Orbiter (TGO) produced $\sim$10 million spectra in 20000 acquisition sequences from the beginning of the mission in April 2018 until 15 January 2020. Other datasets are even larger, with on the order of billions of spectra for OMEGA onboard Mars Express or CRISM onboard Mars Reconnaissance Orbiter. Usually, new lines are discovered after a long iterative process of model fitting and manual residual analysis. Here we propose a new method based on unsupervised machine learning to automatically detect new minor species. Although precise quantification is out of scope, this tool can also be used to quickly summarize the dataset by giving a few endmembers ("sources") and their abundances. We approximate the dataset non-linearity by a linear mixture of abundance and source spectra (endmembers). We use unsupervised source separation in the form of non-negative matrix factorization to estimate those quantities. Several methods are tested on synthetic and simulation data. Our approach is dedicated to detecting minor species spectra rather than precisely quantifying them. On a synthetic example, this approach is able to detect chemical compounds present in the form of 100 hidden spectra out of $10^4$, at 1.5 times the noise level. Results on simulated spectra of NOMAD-SO targeting CH$_{4}$ show that detection limits are in the range of 100-500 ppt in favorable conditions. Results on real Martian data from NOMAD-SO show that CO$_{2}$ and H$_{2}$O are present, as expected, but CH$_{4}$ is absent. Nevertheless, we confirm a set of new unexpected lines in the database, attributed by the ACS instrument team to the CO$_{2}$ magnetic dipole.
Submitted 15 December, 2020;
originally announced December 2020.
-
3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View
Authors:
Marc Badger,
Yufu Wang,
Adarsh Modh,
Ammon Perkes,
Nikos Kolotouros,
Bernd G. Pfrommer,
Marc F. Schmidt,
Kostas Daniilidis
Abstract:
Automated capture of animal pose is transforming how we study neuroscience and social behavior. Movements carry important social cues, but current methods are not able to robustly estimate pose and shape of animals, particularly for social animals such as birds, which are often occluded by each other and objects in the environment. To address this problem, we first introduce a model and multi-view optimization approach, which we use to capture the unique shape and pose space displayed by live birds. We then introduce a pipeline and experiments for keypoint, mask, pose, and shape regression that recovers accurate avian postures from single views. Finally, we provide extensive multi-view keypoint and mask annotations collected from a group of 15 social birds housed together in an outdoor aviary. The project website with videos, results, code, mesh model, and the Penn Aviary Dataset can be found at https://marcbadger.github.io/avian-mesh.
Submitted 13 August, 2020;
originally announced August 2020.
-
A strain-gradient formulation for fiber reinforced polymers: Hybrid phase-field model for porous-ductile fracture
Authors:
Maik Dittman,
Jonathan Schult,
Felix Schmidt,
Christian Hesch
Abstract:
A novel numerical approach to analyze the mechanical behavior within composite materials, including the inelastic regime up to final failure, is presented. To this end, a second-gradient theory is combined with phase-field methods for fracture. In particular, we assume that the polymeric matrix material undergoes ductile fracture, whereas continuously embedded fibers undergo brittle fracture, as is typical, e.g., for roving glass reinforced thermoplastics. A hybrid phase-field approach is developed and applied along with a modified Gurson-Tvergaard-Needleman (GTN) type plasticity model accounting for a temperature-dependent growth of voids on the microscale. The mechanical response of the arising microstructure of the woven fabric gives rise to additional higher-order terms, representing homogenized bending contributions of the fibers. Finally, a series of tests is conducted for this physically comprehensive multifield formulation to investigate different kinds and sequences of failure within long fiber reinforced polymers.
Submitted 19 April, 2021; v1 submitted 21 July, 2020;
originally announced July 2020.
-
Neural Network Virtual Sensors for Fuel Injection Quantities with Provable Performance Specifications
Authors:
Eric Wong,
Tim Schneider,
Joerg Schmitt,
Frank R. Schmidt,
J. Zico Kolter
Abstract:
Recent work has shown that it is possible to learn neural networks with provable guarantees on the output of the model when subject to input perturbations; however, these works have focused primarily on defending against adversarial examples for image classifiers. In this paper, we study how these provable guarantees can be naturally applied to other real-world settings, namely getting performance specifications for robust virtual sensors measuring fuel injection quantities within an engine. We first demonstrate that, in this setting, even simple neural network models are highly susceptible to reasonable levels of adversarial sensor noise, which is capable of increasing the mean relative error of a standard neural network from 6.6% to 43.8%. We then leverage methods for learning provably robust networks and verifying robustness properties, resulting in a robust model which we can provably guarantee has at most 16.5% mean relative error under any sensor noise. Additionally, we show how specific intervals of fuel injection quantities can be targeted to maximize robustness for certain ranges, allowing us to train a virtual sensor for fuel injection which is provably guaranteed to have at most 10.69% relative error under noise while maintaining 3% relative error on non-adversarial data within normalized fuel injection ranges of 0.6 to 1.0.
Submitted 30 June, 2020;
originally announced July 2020.
-
Sensor Artificial Intelligence and its Application to Space Systems -- A White Paper
Authors:
Anko Börner,
Heinz-Wilhelm Hübers,
Odej Kao,
Florian Schmidt,
Sören Becker,
Joachim Denzler,
Daniel Matolin,
David Haber,
Sergio Lucia,
Wojciech Samek,
Rudolph Triebel,
Sascha Eichstädt,
Felix Biessmann,
Anna Kruspe,
Peter Jung,
Manon Kok,
Guillermo Gallego,
Ralf Berger
Abstract:
Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artificial intelligence (AI) has made enormous progress in the area of interpretability of data in recent years. With growing experience, both the potential and the limitations of these new technologies are increasingly better understood. Typically, AI approaches start with the data from which information and directions for action are derived. However, the circumstances under which such data are collected and how they change over time are rarely considered. A closer look at the sensors and their physical properties within AI approaches will lead to more robust and widely applicable algorithms. This holistic approach, which considers entire signal chains from the origin to a data product, "Sensor AI", is a highly relevant topic with great potential. It will play a decisive role in autonomous driving as well as in areas of automated production, predictive maintenance or space research. The goal of this white paper is to establish "Sensor AI" as a dedicated research topic. We want to exchange knowledge on the current state-of-the-art on Sensor AI, to identify synergies among research groups and thus boost the collaboration in this key technology for science and industry.
Submitted 9 June, 2020;
originally announced June 2020.
-
BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward
Authors:
Florian Schmidt,
Thomas Hofmann
Abstract:
Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination. Despite great advances in model architectures, metrics that scale independently of the number of references are still based on n-gram estimates. We show that the underlying operations, counting words and comparing counts, can be lifted to embedding words and comparing embeddings. An in-depth analysis of BERT embeddings shows empirically that contextual embeddings can be employed to capture the required dependencies while maintaining the necessary scalability through appropriate pruning and smoothing techniques. We cast unconditional generation as a reinforcement learning problem and show that our reward function indeed provides a more effective learning signal than n-gram reward in this challenging setting.
Submitted 5 March, 2020;
originally announced March 2020.
-
Outage Duration in Poisson Networks and its Application to Erasure Codes
Authors:
Udo Schilcher,
Siddhartha Borkotoky,
Jorge F. Schmidt,
Christian Bettstetter
Abstract:
We derive the probability distribution of the link outage duration at a typical receiver in a wireless network with Poisson distributed interferers sending messages with slotted random access over a Rayleigh fading channel. This result is used to analyze the performance of random linear network coding, showing that there is an optimum code rate and that interference correlation affects the decoding probability and throughput.
Submitted 26 November, 2019;
originally announced November 2019.
-
Generalization in Generation: A closer look at Exposure Bias
Authors:
Florian Schmidt
Abstract:
Exposure bias refers to the train-test discrepancy that seemingly arises when an autoregressive generative model uses only ground-truth contexts at training time but generated ones at test time. We separate the contributions of the model and the learning framework to clarify the debate on consequences and review proposed counter-measures. In this light, we argue that generalization is the underlying property to address and propose unconditional generation as its fundamental benchmark. Finally, we combine latent variable modeling with a recent formulation of exploration in reinforcement learning to obtain a rigorous handling of true and generated contexts. Results on language modeling and variational sentence auto-encoding confirm the model's generalization capability.
Submitted 7 November, 2019; v1 submitted 1 October, 2019;
originally announced October 2019.
-
InceptionTime: Finding AlexNet for Time Series Classification
Authors:
Hassan Ismail Fawaz,
Benjamin Lucas,
Germain Forestier,
Charlotte Pelletier,
Daniel F. Schmidt,
Jonathan Weber,
Geoffrey I. Webb,
Lhassane Idoumghar,
Pierre-Alain Muller,
François Petitjean
Abstract:
This paper brings deep learning to the forefront of research into Time Series Classification (TSC). TSC is the area of machine learning tasked with the categorization (or labelling) of time series. The last few decades of work in this area have led to significant progress in the accuracy of classifiers, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate, HIVE-COTE cannot be applied to many real-world datasets because of its high training time complexity of $O(N^2 \cdot T^4)$ for a dataset with N time series of length T. For example, it takes HIVE-COTE more than 8 days to learn from a small dataset with N = 1500 time series of short length T = 46. Meanwhile, deep learning has received enormous attention because of its high accuracy and scalability. Recent approaches to deep learning for TSC have been scalable, but less accurate than HIVE-COTE. We introduce InceptionTime - an ensemble of deep Convolutional Neural Network (CNN) models, inspired by the Inception-v4 architecture. Our experiments show that InceptionTime is on par with HIVE-COTE in terms of accuracy while being much more scalable: not only can it learn from 1,500 time series in one hour but it can also learn from 8M time series in 13 hours, a quantity of data that is fully out of reach of HIVE-COTE.
Submitted 5 December, 2020; v1 submitted 11 September, 2019;
originally announced September 2019.
-
Autoregressive Text Generation Beyond Feedback Loops
Authors:
Florian Schmidt,
Stephan Mandt,
Thomas Hofmann
Abstract:
Autoregressive state transitions, where predictions are conditioned on past predictions, are the predominant choice for both deterministic and stochastic sequential models. However, autoregressive feedback exposes the evolution of the hidden state trajectory to potential biases from well-known train-test discrepancies. In this paper, we combine a latent state space model with a CRF observation model. We argue that such autoregressive observation models form an interesting middle ground that expresses local correlations on the word level but keeps the state evolution non-autoregressive. On unconditional sentence generation we show performance improvements compared to RNN and GAN baselines while avoiding some prototypical failure modes of autoregressive models.
Submitted 30 August, 2019;
originally announced August 2019.
-
Adversarial camera stickers: A physical camera-based attack on deep learning systems
Authors:
Juncheng Li,
Frank R. Schmidt,
J. Zico Kolter
Abstract:
Recent work has documented the susceptibility of deep learning systems to adversarial examples, but most such attacks directly manipulate the digital input to a classifier. Although a smaller line of work considers physical adversarial attacks, in all cases these involve manipulating the object of interest, e.g., putting a physical sticker on an object to misclassify it, or manufacturing an object specifically intended to be misclassified. In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself? We show that by placing a carefully crafted and mainly-translucent sticker over the lens of a camera, one can create universal perturbations of the observed images that are inconspicuous, yet misclassify target objects as a different (targeted) class. To accomplish this, we propose an iterative procedure for both updating the attack perturbation (to make it adversarial for a given classifier), and the threat model itself (to ensure it is physically realizable). For example, we show that we can achieve physically-realizable attacks that fool ImageNet classifiers in a targeted fashion 49.6% of the time. This presents a new class of physically-realizable threat models to consider in the context of adversarially robust machine learning. Our demo video can be viewed at: https://youtu.be/wUVmL33Fx54
Submitted 8 June, 2019; v1 submitted 21 March, 2019;
originally announced April 2019.
-
Application-Agnostic Offloading of Packet Processing
Authors:
Oliver Hohlfeld,
Helge Reelfs,
Jan Rüth,
Florian Schmidt,
Torsten Zimmermann,
Jens Hiller,
Klaus Wehrle
Abstract:
As network speed increases, servers struggle to serve all requests directed at them. This challenge is rooted in a partitioned data path where the split between the kernel space networking stack and user space applications induces overheads. To address this challenge, we propose Santa, a new architecture to optimize the data path by enabling server applications to partially offload packet processing to a generic rule processor. We exemplify Santa by showing how it can drastically accelerate kernel-based packet processing - a currently neglected domain. Our evaluation of a broad class of applications, namely DNS, Memcached, and HTTP, highlights that Santa can substantially improve the server performance by a factor of 5.5, 2.1, and 2.5, respectively.
Submitted 1 April, 2019;
originally announced April 2019.
-
Interference Prediction in Wireless Networks: Stochastic Geometry meets Recursive Filtering
Authors:
Jorge F. Schmidt,
Udo Schilcher,
Mahin K. Atiq,
Christian Bettstetter
Abstract:
This article proposes and evaluates a technique to predict the level of interference in wireless networks. We design a recursive predictor that estimates future interference values by filtering measured interference at a given location. The predictor's parameterization is done offline by translating the autocorrelation of interference into an autoregressive moving average (ARMA) representation. This ARMA model is inserted into a steady-state Kalman filter enabling nodes to predict with low computational effort. Results show a good accuracy of predicted values versus true values for relevant time horizons. Although the predictor is parameterized for Poisson-distributed nodes, Rayleigh fading, and fixed message lengths, a sensitivity analysis shows that it also tends to work well in more general network scenarios. Numerical examples for underlay device-to-device communications, a common wireless sensor technology, and coexistence scenarios of Wi-Fi and LTE illustrate its broad applicability. The predictor can be applied as part of interference management to improve medium access, scheduling, and radio resource allocation.
Submitted 10 February, 2021; v1 submitted 26 March, 2019;
originally announced March 2019.