-
TSPP: A Unified Benchmarking Tool for Time-series Forecasting
Authors:
Jan Bączek,
Dmytro Zhylko,
Gilberto Titericz,
Sajad Darabi,
Jean-Francois Puget,
Izzy Putterman,
Dawid Majchrowski,
Anmol Gupta,
Kyle Kranen,
Pawel Morkisz
Abstract:
While machine learning has witnessed significant advancements, the emphasis has largely been on data acquisition and model creation. However, achieving a comprehensive assessment of machine learning solutions in real-world settings necessitates standardization throughout the entire pipeline. This need is particularly acute in time series forecasting, where diverse settings impede meaningful comparisons between various methods. To bridge this gap, we propose a unified benchmarking framework that exposes the crucial modelling and machine learning decisions involved in developing time series forecasting models. This framework fosters seamless integration of models and datasets, aiding both practitioners and researchers in their development efforts. We benchmark recently proposed models within this framework, demonstrating that carefully implemented deep learning models with minimal effort can rival gradient-boosting decision trees requiring extensive feature engineering and expert knowledge.
Submitted 8 January, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Deep learning-based estimation of time-dependent parameters in Markov models with application to nonlinear regression and SDEs
Authors:
Andrzej Kałuża,
Paweł M. Morkisz,
Bartłomiej Mulewicz,
Paweł Przybyłowicz,
Martyna Wiącek
Abstract:
We present a novel deep learning method for estimating time-dependent parameters in Markov processes through discrete sampling. Departing from conventional machine learning, our approach reframes parameter approximation as an optimization problem using the maximum likelihood approach. Experimental validation focuses on parameter estimation in multivariate regression and stochastic differential equations (SDEs). Theoretical results show that, under specific conditions, the true solution is close to the solution of the SDE whose parameters are approximated by our neural network. Our work contributes to SDE-based model parameter estimation, offering a versatile tool for diverse fields.
Submitted 13 December, 2023;
originally announced December 2023.
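The following Python sketch illustrates the maximum-likelihood reframing by fitting a time-dependent drift to a discretely sampled path. It is not the authors' method, and it rests on several assumptions. The model is a scalar SDE $dX_t = a(t) X_t\,dt + \sigma\,dW_t$ with known $\sigma$. Transition densities are approximated by the Gaussian Euler-Maruyama density. The neural network is replaced by a piecewise-constant $a(t)$, for which the likelihood maximiser has a closed form. All function names are hypothetical.

```python
import math
import random

def simulate(a_fn, sigma, x0, h, n, rng):
    """Euler-Maruyama path of dX = a(t) X dt + sigma dW on the grid t_k = k*h."""
    xs = [x0]
    for k in range(n):
        x = xs[-1]
        xs.append(x + a_fn(k * h) * x * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0))
    return xs

def euler_nll(a_values, xs, h, sigma, segments):
    """Negative log-likelihood of the path under the Gaussian Euler transition density.

    a_values[j] is the drift coefficient on block j; the n transitions are
    split into `segments` equal blocks (the last block absorbs any remainder).
    """
    n = len(xs) - 1
    block = n // segments
    var = sigma ** 2 * h
    nll = 0.0
    for k in range(n):
        a = a_values[min(k // block, segments - 1)]
        mean = xs[k] + a * xs[k] * h
        nll += 0.5 * math.log(2 * math.pi * var) + (xs[k + 1] - mean) ** 2 / (2 * var)
    return nll

def fit_piecewise_drift(xs, h, segments):
    """Closed-form maximiser of euler_nll on each block:
    a_j = sum X_k (X_{k+1} - X_k) / (h * sum X_k^2) over the transitions in block j."""
    n = len(xs) - 1
    block = n // segments
    num = [0.0] * segments
    den = [0.0] * segments
    for k in range(n):
        j = min(k // block, segments - 1)
        num[j] += xs[k] * (xs[k + 1] - xs[k])
        den[j] += h * xs[k] ** 2
    return [nu / de for nu, de in zip(num, den)]
```

With a neural-network parametrisation of $a(t)$, the closed-form step would be replaced by minimising `euler_nll` with a gradient-based optimiser.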
-
On the randomized Euler algorithm under inexact information
Authors:
Marcin Baranek,
Andrzej Kałuża,
Paweł M. Morkisz,
Paweł Przybyłowicz,
Michał Sobieraj
Abstract:
This paper analyzes the error of the randomized Euler algorithm when only noisy information about the coefficients of the underlying stochastic differential equation (SDE) and about the driving Wiener process is available. Two classes of disturbed Wiener processes are considered, and the dependence of the algorithm's error on the regularity of the disturbing functions is investigated. The paper also presents results of numerical experiments that support the theoretical findings.
Submitted 10 July, 2023;
originally announced July 2023.
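Below is a minimal sketch of a randomized Euler step under inexact information. It assumes the commonly used form in which the drift is evaluated at a uniformly random time inside each step. The noise model is hypothetical and not taken from the paper. Each coefficient evaluation is perturbed by at most `delta`. The Wiener process is observed as $W(t)$ plus a user-supplied disturbing function `w_noise(t)`.

```python
import math
import random

def randomized_euler(a, b, x0, T, n, rng, delta=0.0, w_noise=lambda t: 0.0):
    """Randomized Euler scheme for dX = a(t, X) dt + b(t, X) dW on [0, T].

    Returns the approximation of X(T) and the exact simulated value W(T).
    Hypothetical noise model: every coefficient evaluation is perturbed by at
    most `delta`, and W is observed as W(t) + w_noise(t).
    """
    h = T / n
    x = x0
    w = 0.0                           # exact Wiener path value W(t_k)
    w_obs_prev = w + w_noise(0.0)     # disturbed observation of W(t_0)
    for k in range(n):
        t = k * h
        theta = t + rng.random() * h  # random drift evaluation time in [t_k, t_k + h)
        w += math.sqrt(h) * rng.gauss(0.0, 1.0)
        w_obs = w + w_noise(t + h)
        drift = a(theta, x) + delta * rng.uniform(-1.0, 1.0)
        diffusion = b(t, x) + delta * rng.uniform(-1.0, 1.0)
        x = x + drift * h + diffusion * (w_obs - w_obs_prev)
        w_obs_prev = w_obs
    return x, w
```

Take $a(t,x) = \cos(t)\,x$ and $b(t,x) = 0.2x$. The exact solution $X_T = x_0 \exp(\sin T - 0.02\,T + 0.2\,W_T)$ is then available, so the returned $W(T)$ can be used to measure the pathwise error with and without disturbances. The random number generator is called the same number of times whatever `delta` is. Runs with the same seed therefore share the same Wiener path.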
-
Heterogenous Ensemble of Models for Molecular Property Prediction
Authors:
Sajad Darabi,
Shayan Fazeli,
Jiwei Liu,
Alexandre Milesi,
Pawel Morkisz,
Jean-François Puget,
Gilberto Titericz
Abstract:
Previous works have demonstrated the importance of considering different modalities of molecules, each of which provides a different granularity of information for downstream property prediction tasks. Our method combines variants of the recent TransformerM architecture with Transformer, GNN, and ResNet backbone architectures. Models are trained on the 2D data, 3D data, and image modalities of molecular graphs. We ensemble these models with a HuberRegressor. The models are trained on 4 different train/validation splits of the original train + valid datasets. This yields a winning solution to the 2nd edition of the OGB Large-Scale Challenge (2022) on the PCQM4Mv2 molecular property prediction dataset. Our proposed method achieves a test-challenge MAE of $0.0723$ and a validation MAE of $0.07145$. Total inference time for our solution is less than 2 hours. We open-source our code at https://github.com/jfpuget/NVIDIA-PCQM4Mv2.
Submitted 20 November, 2022;
originally announced November 2022.
-
A Framework for Large Scale Synthetic Graph Dataset Generation
Authors:
Sajad Darabi,
Piotr Bigaj,
Dawid Majchrowski,
Artur Kasymov,
Pawel Morkisz,
Alex Fit-Florea
Abstract:
Recently there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. However, only a limited number of graph-structured datasets are publicly available, and most of them are tiny compared to production-sized applications or are limited in their application domain. This work tackles this shortcoming by proposing a scalable synthetic graph generation tool that scales datasets to production-size graphs with trillions of edges and billions of nodes. The tool learns a series of parametric models from proprietary datasets. These models can be released to researchers, who can then study various graph methods on the synthetic data, accelerating prototype development and novel applications. We demonstrate the generalizability of the framework across a series of datasets, showing that it mimics structural and feature distributions and can scale them across varying sizes, which makes it useful for benchmarking and model development. Code can be found at https://github.com/NVIDIA/DeepLearningExamples/tree/master/Tools/DGLPyTorch/SyntheticGraphGeneration.
Submitted 5 October, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
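The abstract does not say which parametric structural model the tool fits. As an assumption for illustration only, a widely used scalable choice is the recursive R-MAT (stochastic Kronecker) model. In R-MAT each edge is placed by repeatedly choosing one of four quadrants of the adjacency matrix with probabilities $a, b, c, d$. The sketch below is not the tool's API, and `rmat_edges` is a hypothetical name.

```python
import random

def rmat_edges(scale, num_edges, a, b, c, rng):
    """Sample directed edges of a 2**scale x 2**scale adjacency matrix with R-MAT.

    At each of `scale` recursion levels the edge falls into the top-left (a),
    top-right (b), bottom-left (c) or bottom-right (d = 1 - a - b - c) quadrant,
    which fixes one more bit of the source and destination node ids.
    """
    d = 1.0 - a - b - c
    if min(a, b, c, d) < 0:
        raise ValueError("quadrant probabilities must be non-negative and sum to 1")
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for level in range(scale):
            bit = 1 << (scale - level - 1)
            r = rng.random()
            if r < a:
                pass              # top-left: neither bit set
            elif r < a + b:
                dst |= bit        # top-right
            elif r < a + b + c:
                src |= bit        # bottom-left
            else:
                src |= bit        # bottom-right
                dst |= bit
        edges.append((src, dst))
    return edges
```

In such a model, the fitted probabilities $(a, b, c)$ act as the released parameters. A larger graph with a similar degree skew is obtained by increasing `scale` and `num_edges`.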
-
Tiered Pruning for Efficient Differentiable Inference-Aware Neural Architecture Search
Authors:
Sławomir Kierat,
Mateusz Sieniawski,
Denys Fridman,
Chen-Han Yu,
Szymon Migacz,
Paweł Morkisz,
Alex-Fit Florea
Abstract:
We propose three novel pruning techniques to improve the cost and results of inference-aware Differentiable Neural Architecture Search (DNAS). First, we introduce Prunode, a stochastic bi-path building block for DNAS, which can search over inner hidden dimensions with O(1) memory and compute complexity. Second, we present an algorithm for pruning blocks within a stochastic layer of the SuperNet during the search. Third, we describe a novel technique for pruning unnecessary stochastic layers during the search. The optimized models resulting from the search, called PruNet, establish a new state-of-the-art Pareto frontier of ImageNet Top-1 image classification accuracy versus inference latency on NVIDIA V100. As a backbone, PruNet also outperforms GPUNet and EfficientNet on the COCO object detection task in terms of mean Average Precision (mAP) relative to inference latency.
Submitted 5 January, 2023; v1 submitted 23 September, 2022;
originally announced September 2022.
-
Euler scheme for approximation of solution of nonlinear ODEs under inexact information
Authors:
Natalia Czyżewska,
Paweł M. Morkisz,
Paweł Przybyłowicz
Abstract:
We investigate the error of the Euler scheme in the case when the right-hand side function of the underlying ODE satisfies nonstandard assumptions, such as a local one-sided Lipschitz condition and local Hölder continuity. Moreover, we consider two cases of information availability: exact and noisy information about the right-hand side function. An optimality analysis of the Euler scheme is also provided. Finally, we present the results of numerical experiments.
Submitted 3 August, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
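The setting can be illustrated with a minimal Euler sketch in which the right-hand side is only available up to a perturbation of size at most `delta`. The perturbation function `noise` is a hypothetical stand-in for inexact information. As a test problem we take $f(t,y) = -\operatorname{sign}(y)\sqrt{|y|}$. This function is Hölder continuous with exponent $1/2$ and one-sided Lipschitz, because it is non-increasing in $y$, but it is not Lipschitz at $0$. For $y_0 = 1$ the exact solution is $y(t) = (1 - t/2)^2$, so $y(1) = 0.25$.

```python
import math

def euler_noisy(f, y0, T, n, delta=0.0, noise=lambda t, y: 0.0):
    """Explicit Euler scheme y_{k+1} = y_k + h * f~(t_k, y_k) on [0, T].

    f~ = f + delta * noise is the inexact right-hand side; `noise` is a
    hypothetical perturbation with |noise| <= 1, so |f~ - f| <= delta.
    """
    h = T / n
    y = y0
    for k in range(n):
        t = k * h
        y = y + h * (f(t, y) + delta * noise(t, y))
    return y

if __name__ == "__main__":
    # Hölder-continuous, one-sided Lipschitz right-hand side.
    f = lambda t, y: -math.copysign(math.sqrt(abs(y)), y)
    print(euler_noisy(f, 1.0, 1.0, 1000), euler_noisy(f, 1.0, 1.0, 1000, 1e-3, lambda t, y: math.sin(100 * t)))
```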
-
Relative Molecule Self-Attention Transformer
Authors:
Łukasz Maziarka,
Dawid Majchrowski,
Tomasz Danel,
Piotr Gaiński,
Jacek Tabor,
Igor Podolak,
Paweł Morkisz,
Stanisław Jastrzębski
Abstract:
Self-supervised learning holds promise to revolutionize molecule property prediction, a central task in drug discovery and many other industries, by enabling data-efficient learning from scarce experimental data. Despite significant progress, non-pretrained methods can still be competitive in certain settings. We reason that architecture might be a key bottleneck. In particular, enriching the backbone architecture with domain-specific inductive biases has been key to the success of self-supervised learning in other domains. In this spirit, we methodologically explore the design space of the self-attention mechanism tailored to molecular data. We identify a novel variant of self-attention adapted to processing molecules, inspired by the relative self-attention layer, which involves fusing embedded graph and distance relationships between atoms. Our main contribution is the Relative Molecule Attention Transformer (R-MAT): a novel Transformer-based model built on the developed self-attention layer that achieves state-of-the-art or very competitive results across a wide range of molecule property prediction tasks.
Submitted 12 October, 2021;
originally announced October 2021.
-
Efficient GPU implementation of randomized SVD and its applications
Authors:
Łukasz Struski,
Paweł Morkisz,
Przemysław Spurek,
Samuel Rodriguez Bernabeu,
Tomasz Trzciński
Abstract:
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time. In this work, we leverage efficient processing operations that can be run in parallel on modern Graphics Processing Units (GPUs), the predominant computing architecture used, e.g., in deep learning, to reduce the computational burden of computing matrix decompositions. More specifically, we reformulate the randomized decomposition problem to incorporate fast matrix multiplication operations (BLAS-3) as building blocks. We show that this formulation, combined with fast random number generators, allows us to fully exploit the potential of parallel processing implemented in GPUs. Our extensive evaluation confirms the superiority of this approach over the competing methods, and we release the results of this research as a part of the official CUDA implementation (https://docs.nvidia.com/cuda/cusolver/index.html).
Submitted 12 March, 2024; v1 submitted 5 October, 2021;
originally announced October 2021.
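The randomized decomposition can be sketched on the CPU with NumPy, following the well-known randomized range-finder approach. The steps are a Gaussian sketch, optional power iterations, a QR factorization, and then an SVD of a small projected matrix. This is a generic illustration of why the method maps onto BLAS-3: every expensive step is a dense matrix-matrix product. It is not the cuSOLVER implementation, and `randomized_svd`, `oversample` and `power_iters` are hypothetical names.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    """Rank-k randomized SVD of A built from matrix-matrix (BLAS-3) products."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + oversample, m, n)
    omega = rng.standard_normal((n, l))   # random Gaussian test matrix
    Q, _ = np.linalg.qr(A @ omega)        # orthonormal basis for a sample of range(A)
    for _ in range(power_iters):          # power iterations sharpen the spectral decay
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                           # small l x n projection of A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                         # lift left singular vectors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]
```

Apart from the small QR and SVD factorizations, the work consists of products with `A`, which is where a GPU formulation can profit from BLAS-3 throughput.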
-
Approximation of solutions of DDEs under nonstandard assumptions via Euler scheme
Authors:
Natalia Czyżewska,
Paweł M. Morkisz,
Paweł Przybyłowicz
Abstract:
We deal with the approximation of solutions of delay differential equations (DDEs) via the classical Euler algorithm. We investigate the pointwise error of the Euler scheme under nonstandard assumptions imposed on the right-hand side function $f$. Namely, we assume that $f$ is globally of at most linear growth and satisfies a global one-sided Lipschitz condition, but is only locally Hölder continuous. We provide a detailed error analysis of the Euler algorithm under such nonstandard regularity conditions. Moreover, we report results of numerical experiments.
Submitted 12 January, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
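Below is a minimal sketch of the classical Euler scheme for a DDE with a single constant delay. The problem is $y'(t) = f(t, y(t), y(t-\tau))$ on $[0, T]$ with initial function $y = \varphi$ on $[-\tau, 0]$. The step $h = \tau/m$ is chosen (an assumption for simplicity) so that every delayed argument falls on a grid point. The name `euler_dde` is hypothetical. For the check, $y'(t) = -y(t-1)$ with $\varphi \equiv 1$ has, by the method of steps, $y(1) = 0$ and $y(2) = -1/2$.

```python
def euler_dde(f, phi, tau, T, m):
    """Explicit Euler scheme for y'(t) = f(t, y(t), y(t - tau)) on [0, T],
    with initial function y(t) = phi(t) for t in [-tau, 0].

    The step h = tau / m makes every delayed argument t_k - tau a grid point,
    so no interpolation of past values is needed.
    """
    h = tau / m
    n = round(T / h)
    ys = [phi(0.0)]                     # ys[k] approximates y(t_k), t_k = k*h
    for k in range(n):
        t = k * h
        delayed = phi(t - tau) if k < m else ys[k - m]
        ys.append(ys[k] + h * f(t, ys[k], delayed))
    return ys
```

The right-hand side `f` is an arbitrary callable. Hölder-continuous, one-sided Lipschitz functions of the kind studied in the paper can therefore be plugged in directly.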
-
On mathematical aspects of evolution of dislocation density in metallic materials
Authors:
Natalia Czyżewska,
Jan Kusiak,
Paweł Morkisz,
Piotr Oprocha,
Maciej Pietrzyk,
Paweł Przybyłowicz,
Łukasz Rauch,
Danuta Szeliga
Abstract:
This paper deals with the solution of delay differential equations describing the evolution of dislocation density in metallic materials. Hardening, restoration, and recrystallization, which characterize the evolution of dislocation populations, provide the essential equation of the model. The last of these terms transforms the ordinary differential equation (ODE) into a delay differential equation (DDE) with strong (in general, Hölder) nonlinearity. We prove upper error bounds for the explicit Euler method under the assumption that the right-hand side function is Hölder continuous and monotone. These bounds allow us to compare the accuracy of other numerical methods in our model (e.g. Runge-Kutta), in particular when explicit formulas for solutions are not known. Finally, we test the above results in simulations of a real industrial process.
Submitted 17 November, 2020;
originally announced November 2020.
-
Randomized Runge-Kutta method -- stability and convergence under inexact information
Authors:
Tomasz Bochacik,
Maciej Goćwin,
Paweł M. Morkisz,
Paweł Przybyłowicz
Abstract:
We deal with optimal approximation of solutions of ODEs under a local Lipschitz condition and inexact discrete information about the right-hand side functions. We show that the randomized two-stage Runge-Kutta scheme is optimal among all randomized algorithms based on standard noisy information. We perform numerical experiments that confirm our theoretical findings. Moreover, for the optimal algorithm we rigorously investigate the properties of its regions of absolute stability.
Submitted 22 June, 2020;
originally announced June 2020.
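Below is a sketch of a randomized two-stage Runge-Kutta step. It assumes the form common in the literature: a point $\tau_k \sim U(0,1)$ is drawn in each step, an Euler predictor is taken to the intermediate time $t_k + \tau_k h$, and $f$ is evaluated there. Inexact information is imitated, hypothetically, by perturbing every evaluation of $f$ by at most `delta`. This illustrates the idea and is not a transcription of the paper's algorithm.

```python
import math
import random

def randomized_rk2(f, y0, T, n, rng, delta=0.0):
    """Randomized two-stage Runge-Kutta scheme on [0, T] (assumed common form).

    In step k, tau_k ~ U(0, 1) is drawn and
        y_{k+tau} = y_k + tau_k * h * f~(t_k, y_k)
        y_{k+1}   = y_k + h * f~(t_k + tau_k * h, y_{k+tau}),
    where f~ is f perturbed by at most delta (hypothetical noise model).
    """
    h = T / n
    y = y0
    for k in range(n):
        t = k * h
        tau = rng.random()
        f1 = f(t, y) + delta * rng.uniform(-1.0, 1.0)
        y_mid = y + tau * h * f1
        f2 = f(t + tau * h, y_mid) + delta * rng.uniform(-1.0, 1.0)
        y = y + h * f2
    return y

if __name__ == "__main__":
    # y' = cos(t) y, y(0) = 1 has the exact solution y(t) = exp(sin t).
    f = lambda t, y: math.cos(t) * y
    print(randomized_rk2(f, 1.0, 1.0, 200, random.Random(0)), math.exp(math.sin(1.0)))
```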
-
Randomized derivative-free Milstein algorithm for efficient approximation of solutions of SDEs under noisy information
Authors:
Paweł M. Morkisz,
Paweł Przybyłowicz
Abstract:
We deal with pointwise approximation of solutions of scalar stochastic differential equations in the presence of informational noise about the underlying drift and diffusion coefficients. We define a randomized derivative-free version of the Milstein algorithm, $\mathcal{\bar A}^{df-RM}_n$, and investigate its error. We also study lower bounds on the error of an arbitrary algorithm. It turns out that in some cases the scheme $\mathcal{\bar A}^{df-RM}_n$ is optimal. Finally, in order to test the algorithm $\mathcal{\bar A}^{df-RM}_n$ in practice, we report the results of numerical experiments.
Submitted 14 December, 2019;
originally announced December 2019.
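Below is a sketch of a derivative-free Milstein-type step, under stated assumptions. The derivative $b_x$ in the classical Milstein correction $\tfrac12\, b\, b_x\,((\Delta W)^2 - h)$ is replaced by the finite difference $(b(t, x + b\sqrt{h}) - b(t,x))/\sqrt{h}$, as in standard derivative-free Milstein schemes. The drift is evaluated at a uniformly random time in each step. The abstract does not give the exact form of $\mathcal{\bar A}^{df-RM}_n$, so read this as an illustration of the ingredients, not as the authors' algorithm. Noisy coefficient information is omitted for brevity, and `df_randomized_milstein` is a hypothetical name.

```python
import math
import random

def df_randomized_milstein(a, b, x0, T, n, rng):
    """Derivative-free Milstein-type scheme for scalar dX = a(t,X)dt + b(t,X)dW.

    b * b_x in the Milstein correction is approximated by the finite difference
    (b(t, x + b*sqrt(h)) - b(t, x)) / sqrt(h); the drift is evaluated at a
    uniformly random time in each step. Returns X(T) and the simulated W(T).
    """
    h = T / n
    sq = math.sqrt(h)
    x, w = x0, 0.0
    for k in range(n):
        t = k * h
        theta = t + rng.random() * h       # random drift evaluation time
        dw = sq * rng.gauss(0.0, 1.0)
        bx = b(t, x)
        support = x + bx * sq              # supporting value replacing b_x
        x = (x + a(theta, x) * h + bx * dw
             + (b(t, support) - bx) * (dw * dw - h) / (2.0 * sq))
        w += dw
    return x, w
```

For a linear diffusion $b(t,x) = \sigma x$ the finite difference reproduces $b\,b_x = \sigma^2 x$ exactly. With $a(t,x) = \cos(t)\,x$ the exact solution is then $x_0 \exp(\sin T - \sigma^2 T/2 + \sigma W_T)$, which the returned $W(T)$ lets one evaluate.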
-
Optimal approximation of stochastic integrals in analytic noise model
Authors:
Andrzej Kałuża,
Paweł M. Morkisz,
Paweł Przybyłowicz
Abstract:
We study approximate stochastic Itô integration of processes belonging to a class of progressively measurable stochastic processes that are Hölder continuous in the $r$th mean.
Inspired by the increasing popularity of low-precision computations (used on Graphics Processing Units (GPUs) and standard Central Processing Units (CPUs) for significant speedup), we introduce a suitable analytic noise model of standard noisy information about $X$ and $W$. In this model we show that the upper bounds on the error of the Riemann-Maruyama quadrature are proportional to $n^{-\varrho}+\delta_1+\delta_2$, where $n$ is the number of noisy evaluations of $X$ and $W$, $\varrho\in (0,1]$ is the Hölder exponent of $X$, and $\delta_1,\delta_2\geq 0$ are precision parameters for the values of $X$ and $W$, respectively. Moreover, we show that the error of any algorithm based on at most $n$ noisy evaluations of $X$ and $W$ is at least $C(n^{-\varrho}+\delta_1)$. Finally, we report numerical experiments performed on both CPU and GPU that confirm our theoretical findings, together with a comparison of computational performance between the two architectures.
Submitted 27 December, 2018;
originally announced December 2018.
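The Riemann-Maruyama quadrature is taken here to be the left-point Itô sum $\sum_{i=0}^{n-1} \tilde X(t_i)\,(\tilde W(t_{i+1}) - \tilde W(t_i))$ on a uniform grid. The sketch below uses a hypothetical noise model in which low precision is imitated by rounding each value of $X$ and $W$ to a grid of spacing $2\delta_1$ and $2\delta_2$, respectively. Every perturbation is then at most $\delta_1$ or $\delta_2$, though the paper's analytic noise model may differ. For the check we take $X = W$, for which the Itô integral is exactly $(W_T^2 - T)/2$.

```python
import math
import random

def quantize(v, delta):
    """Round v to the grid 2*delta*Z, so |quantize(v, delta) - v| <= delta (delta = 0: exact)."""
    return v if delta == 0 else 2 * delta * round(v / (2 * delta))

def riemann_maruyama(xs, ws, delta1=0.0, delta2=0.0):
    """Left-point sum of X~(t_i) * (W~(t_{i+1}) - W~(t_i)) approximating int_0^T X dW."""
    total = 0.0
    for i in range(len(ws) - 1):
        x_i = quantize(xs[i], delta1)
        dw = quantize(ws[i + 1], delta2) - quantize(ws[i], delta2)
        total += x_i * dw
    return total

def brownian_path(n, T, rng):
    """Wiener process sampled on the uniform grid t_i = i * T / n."""
    h = T / n
    ws = [0.0]
    for _ in range(n):
        ws.append(ws[-1] + math.sqrt(h) * rng.gauss(0.0, 1.0))
    return ws
```

With exact values, the error of the sum for $X = W$ is $(\sum_i (\Delta W_i)^2 - T)/2$, which shrinks as $n$ grows. The rounding adds a further contribution controlled by $\delta_1$ and $\delta_2$.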