Search | arXiv e-print repository

Showing 1–28 of 28 results for author: Sarlos, T

Searching in archive cs.
  1. arXiv:2406.15881

    cs.LG cs.AI

    Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi

    Abstract: We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, an…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Preprint. Comments welcome

  2. arXiv:2403.00028

    cs.CR cs.LG

    Lower Bounds for Differential Privacy Under Continual Observation and Online Threshold Queries

    Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Uri Stemmer

    Abstract: One of the most basic problems for studying the "price of privacy over time" is the so called private counter problem, introduced by Dwork et al. (2010) and Chan et al. (2010). In this problem, we aim to track the number of events that occur over time, while hiding the existence of every single event. More specifically, in every time step $t\in[T]$ we learn (in an online fashion) that $Δ_t\geq 0$…

    Submitted 17 April, 2024; v1 submitted 28 February, 2024; originally announced March 2024.

  3. arXiv:2312.02132

    cs.LG cs.AI cs.CR cs.DS

    Hot PATE: Private Aggregation of Distributions for Diverse Task

    Authors: Edith Cohen, Benjamin Cohen-Wang, Xin Lyu, Jelani Nelson, Tamas Sarlos, Uri Stemmer

    Abstract: The Private Aggregation of Teacher Ensembles (PATE) framework is a versatile approach to privacy-preserving machine learning. In PATE, teacher models that are not privacy-preserving are trained on distinct portions of sensitive data. Privacy-preserving knowledge transfer to a student model is then facilitated by privately aggregating teachers' predictions on new examples. Employing PATE with gener…

    Submitted 17 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  4. arXiv:2312.01990

    cs.RO cs.AI

    SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention

    Authors: Isabel Leal, Krzysztof Choromanski, Deepali Jain, Avinava Dubey, Jake Varley, Michael Ryoo, Yao Lu, Frederick Liu, Vikas Sindhwani, Quan Vuong, Tamas Sarlos, Ken Oslund, Karol Hausman, Kanishka Rao

    Abstract: We present Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT): a new paradigm for addressing the emerging challenge of scaling up Robotics Transformers (RT) for on-robot deployment. SARA-RT relies on the new method of fine-tuning proposed by us, called up-training. It converts pre-trained or already fine-tuned Transformer-based robotic policies of quadratic time complexity (includi…

    Submitted 4 December, 2023; originally announced December 2023.

  5. arXiv:2311.01960

    cs.DS cs.LG

    Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products

    Authors: Tamas Sarlos, Xingyou Song, David Woodruff, Qiuyi Zhang

    Abstract: Inspired by fast algorithms in natural language processing, we study low rank approximation in the entrywise transformed setting where we want to find a good rank $k$ approximation to $f(U \cdot V)$, where $U, V^\top \in \mathbb{R}^{n \times r}$ are given, $r = O(\log(n))$, and $f(x)$ is a general scalar function. Previous work in sublinear low rank approximation has shown that if both (1)…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted and formatted for NeurIPS 2023

  6. arXiv:2302.01925

    cs.LG

    Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers

    Authors: Krzysztof Marcin Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamas Sarlos, Thomas Weingarten, Adrian Weller

    Abstract: We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding mechanisms (RPEs). These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces. FLTs construct the optimal RPE mechanism implicitly by learning…

    Submitted 3 April, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: AISTATS 2024

  7. arXiv:2302.00942

    cs.LG

    Efficient Graph Field Integrators Meet Point Clouds

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller

    Abstract: We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Metho…

    Submitted 4 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Journal ref: ICML 2023

  8. arXiv:2302.00787

    cs.LG stat.ML

    FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

    Abstract: The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs which approximate Gaussian…

    Submitted 1 February, 2023; originally announced February 2023.

  9. arXiv:2211.12063

    cs.CR cs.DS

    Generalized Private Selection and Testing with High Confidence

    Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Uri Stemmer

    Abstract: Composition theorems are general and powerful tools that facilitate privacy accounting across multiple data accesses from per-access privacy bounds. However, they often result in weaker bounds compared with end-to-end analysis. Two popular tools that mitigate that are the exponential mechanism (or report noisy max) and the sparse vector technique. They were generalized in a couple of recent private…

    Submitted 9 February, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Appeared in ITCS 2023. This version: revised introduction and related work sections.

  10. arXiv:2211.06387

    cs.LG cs.CR cs.DS

    Õptimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization

    Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Uri Stemmer

    Abstract: The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of $O(ξ^{-1} \log(1/β))$ (for generalization error $ξ$ with confidence $1-β$). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dep…

    Submitted 11 November, 2022; originally announced November 2022.

  11. arXiv:2207.00956

    cs.DS cs.CR cs.LG

    Tricking the Hashing Trick: A Tight Lower Bound on the Robustness of CountSketch to Adaptive Inputs

    Authors: Edith Cohen, Jelani Nelson, Tamás Sarlós, Uri Stemmer

    Abstract: CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $\ell_2$-heavy hitters (keys $i$ where $v_i^2 > ε\|\boldsymbol{v}\|_2^2$) and approximate inner products. When the inputs are {\em not adaptive} (do not depend on prior outputs), classic estimators applied to a sketch of size $O(\ell/ε)$ are accurate for a number o…

    Submitted 28 August, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

  12. arXiv:2205.15317

    cs.LG cs.AI

    Chefs' Random Tables: Non-Trigonometric Random Features

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

    Abstract: We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels. CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on the trigonometric maps. We present variants of CRTs where RFs are positive, a key requirement for applications in recent low-rank Transformers. Further variance r…

    Submitted 30 May, 2022; originally announced May 2022.

  13. arXiv:2202.13736

    cs.DS cs.LG

    On the Robustness of CountSketch to Adaptive Inputs

    Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Moshe Shechner, Uri Stemmer

    Abstract: CountSketch is a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The sketch supports recovering $\ell_2$-heavy hitters of a vector (entries with $v[i]^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$). We study the robustness of the sketch in adaptive settings where input vectors may depend on the output from prior inputs. Adaptive s…

    Submitted 28 February, 2022; originally announced February 2022.

  14. arXiv:2107.07999

    cs.LG cs.AI

    From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

    Authors: Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

    Abstract: In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However, by casting the problem as a topo…

    Submitted 27 March, 2023; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 20 pages, 12 figures

  15. arXiv:2101.07415

    cs.LG cs.NE cs.RO

    ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

    Authors: Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

    Abstract: In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and e…

    Submitted 15 March, 2023; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: Previously published at ICLR 2020 NAS Workshop. See https://github.com/google-research/google-research/tree/master/es_enas for associated code

  16. arXiv:2010.13048

    cs.LG cs.CR cs.DS stat.ML

    Differentially Private Weighted Sampling

    Authors: Edith Cohen, Ofir Geri, Tamas Sarlos, Uri Stemmer

    Abstract: Common datasets have the form of elements with keys (e.g., transactions and products) and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluations of query statistics. We propose private weighted sampl…

    Submitted 31 March, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: 38 pages, 9 figures

    Journal ref: AISTATS 2021

  17. arXiv:2009.14794

    cs.LG cs.CL stat.ML

    Rethinking Attention with Performers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random featu…

    Submitted 19 November, 2022; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Published as a conference paper + oral presentation at ICLR 2021. 38 pages. See https://github.com/google-research/google-research/tree/master/protein_lm for protein language model code, and https://github.com/google-research/google-research/tree/master/performer for Performer code. See https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for Google AI Blog

  18. arXiv:2006.03555

    cs.LG cs.CL stat.ML

    Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequen…

    Submitted 30 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: This arXiv submission has been deprecated. Please see "Rethinking Attention with Performers" at arXiv:2009.14794 for the most updated version of the paper

  19. arXiv:2003.13563

    cs.LG stat.ML

    Stochastic Flows and Geometric Optimization on the Orthogonal Group

    Authors: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinf…

    Submitted 30 March, 2020; originally announced March 2020.

  20. arXiv:1907.06511

    cs.NE cs.AI cs.LG cs.RO

    Reinforcement Learning with Chromatic Networks for Compact Architecture Search

    Authors: Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

    Abstract: We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way. By defining the combinatorial search space of NAS to be the set of different edge-partitionings (colorings) into same-weight classes, we represent compact architectures via efficient learned edge-partitionings. For several RL…

    Submitted 6 April, 2021; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Published at ICLR 2020 Neural Architecture Search Workshop. This paper is deprecated; please see arXiv:2101.07415 for the newer version

  21. arXiv:1905.12721

    cs.LG math.OC stat.ML

    Matrix-Free Preconditioning in Online Learning

    Authors: Ashok Cutkosky, Tamas Sarlos

    Abstract: We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix. Our regret bound is never worse than that obtained by diagonal preconditioning, and in certain settings even surpasses that of algorithms with full-matrix preconditioning. Importantly, our algorit…

    Submitted 29 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  22. arXiv:1903.03784

    stat.ML cs.LG

    Orthogonal Estimation of Wasserstein Distances

    Authors: Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller

    Abstract: Wasserstein distances are increasingly used in a wide variety of applications in machine learning. Sliced Wasserstein distances form an important subclass which may be estimated efficiently through one-dimensional sorting operations. In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances and dr…

    Submitted 5 April, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: Published at AISTATS 2019

  23. arXiv:1704.01255

    cs.LG stat.ML

    Linear Additive Markov Processes

    Authors: Ravi Kumar, Maithra Raghu, Tamas Sarlos, Andrew Tomkins

    Abstract: We introduce LAMP: the Linear Additive Markov Process. Transitions in LAMP may be influenced by states visited in the distant history of the process, but unlike higher-order Markov processes, LAMP retains an efficient parametrization. LAMP also allows the specific dependence on history to be learned efficiently from data. We characterize some theoretical properties of LAMP, including its steady-st…

    Submitted 4 April, 2017; originally announced April 2017.

    Comments: Accepted to WWW 2017

  24. arXiv:1610.06209

    cs.LG

    Structured adaptive and random spinners for fast machine learning computations

    Authors: Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Francois Fagan, Cedric Gouy-Pailler, Anne Morvan, Nourhan Sakr, Tamas Sarlos, Jamal Atif

    Abstract: We consider an efficient computational framework for speeding up several machine learning algorithms with almost no loss of accuracy. The proposed framework relies on projections via structured matrices that we call Structured Spinners, which are formed as products of three structured matrix-blocks that incorporate rotations. The approach is highly generic, i.e. i) structured matrices under consid…

    Submitted 26 November, 2016; v1 submitted 19 October, 2016; originally announced October 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1605.09046

  25. arXiv:1605.09046

    cs.LG stat.ML

    TripleSpin - a generic compact paradigm for fast machine learning computations

    Authors: Krzysztof Choromanski, Francois Fagan, Cedric Gouy-Pailler, Anne Morvan, Tamas Sarlos, Jamal Atif

    Abstract: We present a generic compact computational framework relying on structured random matrices that can be applied to speed up several machine learning algorithms with almost no loss of accuracy. The applications include new fast LSH-based algorithms, efficient kernel computations via random feature maps, convex optimization algorithms, quantization techniques and many more. Certain models of the pres…

    Submitted 6 June, 2016; v1 submitted 29 May, 2016; originally announced May 2016.

  26. arXiv:1408.3060

    cs.LG stat.ML

    Fastfood: Approximate Kernel Expansions in Loglinear Time

    Authors: Quoc Viet Le, Tamas Sarlos, Alexander Johannes Smola

    Abstract: Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matric…

    Submitted 13 August, 2014; originally announced August 2014.

  27. arXiv:1004.4240

    cs.DS

    A Sparse Johnson--Lindenstrauss Transform

    Authors: Anirban Dasgupta, Ravi Kumar, Tamás Sarlós

    Abstract: Dimension reduction is a key algorithmic tool with many applications including nearest-neighbor search, compressed sensing and linear algebra in the streaming model. In this work we obtain a {\em sparse} version of the fundamental tool in dimension reduction --- the Johnson--Lindenstrauss transform. Using hashing and local densification, we construct a sparse projection matrix with just…

    Submitted 23 April, 2010; originally announced April 2010.

    Comments: 10 pages, conference version.

    MSC Class: 68Q25; 68Q87; 68W20

    ACM Class: F.2.0; G.3

  28. arXiv:0710.1435

    cs.DS

    Faster Least Squares Approximation

    Authors: Petros Drineas, Michael W. Mahoney, S. Muthukrishnan, Tamas Sarlos

    Abstract: Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. In a typical setting, one lets $n$ be the number of constraints and $d$ be the number of variables, with $n \gg d$. Then, existing exact methods find a solution vector in $O(nd^2)$ time. We present two randomized algorithms that provide very accurate relative-error…

    Submitted 26 September, 2010; v1 submitted 7 October, 2007; originally announced October 2007.

    Comments: 25 pages; minor changes from previous version; this version will appear in Numerische Mathematik
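Entries 11 and 13 study CountSketch, whose abstracts describe it as a randomized linear sketch that supports recovering $\ell_2$-heavy hitters. As a reading aid, here is a minimal illustrative version of the classic non-adaptive estimator, not the adaptive attacks or robust variants those papers analyze. It assumes explicit random tables in place of hash functions, and the class and parameter names (`CountSketch`, `rows`, `width`) are hypothetical:

```python
import random
from statistics import median


class CountSketch:
    """Toy CountSketch over the domain [0, n) with `rows` hash rows of `width` buckets."""

    def __init__(self, n, rows=5, width=64, seed=0):
        rng = random.Random(seed)
        # Assumption: explicit random tables stand in for hash functions
        # h_r: [n] -> [width] and sign functions s_r: [n] -> {-1, +1}.
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]
        self.table = [[0.0] * width for _ in range(rows)]

    def update(self, i, delta):
        # Linear measurement: row r adds s_r(i) * delta to bucket h_r(i).
        for r, row in enumerate(self.table):
            row[self.bucket[r][i]] += self.sign[r][i] * delta

    def estimate(self, i):
        # Each row yields an unbiased estimate of v[i]; the median limits damage from collisions.
        return median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]]
            for r in range(len(self.table))
        )


# Usage: a vector with one l2-heavy coordinate plus small noise.
rng = random.Random(2)
n = 1000
v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
v[7] = 50.0
cs = CountSketch(n, rows=5, width=64, seed=1)
for i, x in enumerate(v):
    cs.update(i, x)
print(round(cs.estimate(7), 2), "vs true", v[7])
```

Because every update is linear, the same table supports insertions and deletions. Adaptivity matters precisely because the random `bucket` and `sign` tables are fixed once and reused across queries.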
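Entries 8, 12 and 17 build on positive random features for the softmax kernel $\exp(x^\top y)$. The sketch below shows the basic positive estimator $\varphi(x)_j = \exp(\omega_j^\top x - \|x\|^2/2)/\sqrt{m}$ with $\omega_j \sim N(0, I_d)$. It is unbiased because $\mathbb{E}[\exp(\omega^\top z)] = \exp(\|z\|^2/2)$, so $\mathbb{E}[\langle\varphi(x), \varphi(y)\rangle] = \exp(x^\top y)$. This is a hedged illustration under the assumption of i.i.d. Gaussian projections: the Performer paper additionally uses orthogonal random features, and the FAVOR# and CRT variants in entries 8 and 12 are not shown. The function names are hypothetical.

```python
import math
import random


def positive_features(x, omegas):
    # phi(x)_j = exp(w_j . x - |x|^2 / 2) / sqrt(m); every entry is strictly positive.
    sq_norm = sum(t * t for t in x)
    scale = 1.0 / math.sqrt(len(omegas))
    return [
        scale * math.exp(sum(w * t for w, t in zip(omega, x)) - sq_norm / 2.0)
        for omega in omegas
    ]


def approx_softmax_kernel(x, y, m=20000, seed=0):
    # Assumption: i.i.d. Gaussian projections w_j ~ N(0, I_d), with no orthogonalization.
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(len(x))] for _ in range(m)]
    phi_x = positive_features(x, omegas)
    phi_y = positive_features(y, omegas)
    # <phi(x), phi(y)> is an unbiased Monte Carlo estimate of exp(x . y).
    return sum(a * b for a, b in zip(phi_x, phi_y))


x = [0.2, -0.1, 0.3]
y = [0.1, 0.2, -0.2]
exact = math.exp(sum(a * b for a, b in zip(x, y)))
approx = approx_softmax_kernel(x, y)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

In attention, stacking $\varphi$ over all queries and keys turns the $L \times L$ kernel matrix into a product of two $L \times m$ feature matrices. That factorization is what allows the linear-time attention described in entry 17. Positivity matters because trigonometric features can produce negative or near-zero kernel estimates, which destabilize the softmax normalizer.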
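Entry 28 concerns overdetermined least squares ($n \gg d$), where exact methods take $O(nd^2)$ time. The general "sketch-and-solve" idea is to compress the $n$ constraints to $k \ll n$ random combinations and solve the small problem instead. The sketch below uses a dense Gaussian sketch purely for illustration. That is a swapped-in choice: a dense sketch costs $O(nkd)$ and is not faster. The paper's algorithms instead apply a randomized Hadamard transform before sampling or sparse projection, and they are not reproduced here. The function names and the sketch size `k = 80` are hypothetical, and the normal-equations solver is only suitable for small, well-conditioned problems.

```python
import math
import random


def lstsq_normal(A, b):
    # Exact least squares via the normal equations (A^T A) x = A^T b;
    # adequate only for small, well-conditioned problems like this demo.
    d = len(A[0])
    M = [
        [sum(row[i] * row[j] for row in A) for j in range(d)]
        + [sum(row[i] * bi for row, bi in zip(A, b))]
        for i in range(d)
    ]
    # Gaussian elimination with partial pivoting on the augmented matrix [A^T A | A^T b].
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (M[i][d] - sum(M[i][j] * x[j] for j in range(i + 1, d))) / M[i][i]
    return x


def residual(A, b, x):
    # ||Ax - b||_2
    return math.sqrt(sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
                         for row, bi in zip(A, b)))


def sketch_and_solve(A, b, k, seed=0):
    # Assumption: dense Gaussian sketch S (k x n); solve min ||S A x - S b|| instead.
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    S = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(n)] for _ in range(k)]
    SA = [[sum(s[t] * A[t][j] for t in range(n)) for j in range(d)] for s in S]
    Sb = [sum(s[t] * b[t] for t in range(n)) for s in S]
    return lstsq_normal(SA, Sb)


# Usage: n = 1000 noisy constraints, d = 2 unknowns (intercept and slope).
rng = random.Random(1)
A = [[1.0, rng.uniform(-1.0, 1.0)] for _ in range(1000)]
b = [2.0 + 3.0 * row[1] + rng.gauss(0.0, 0.5) for row in A]
x_opt = lstsq_normal(A, b)
x_sk = sketch_and_solve(A, b, k=80, seed=2)
ratio = residual(A, b, x_sk) / residual(A, b, x_opt)
print("exact:", x_opt, "sketched:", x_sk, "residual ratio:", round(ratio, 4))
```

The quantity to watch is the residual ratio. It can never drop below 1, and a relative-error guarantee of the kind the abstract mentions bounds it by $1 + \epsilon$ with high probability.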