Search | arXiv e-print repository

Large Row-Constrained Supersaturated Designs for High-throughput Screening

Authors: Byran J. Smucker, Stephen E. Wright, Isaac Williams, Richard C. Page, Andor J. Kiss, Surendra Bikram Silwal, Maria Weese, David J. Edwards

Abstract: High-throughput screening, in which multiwell plates are used to test large numbers of compounds against specific targets, is widely used across many areas of the biological sciences and most prominently in drug discovery. We propose a statistically principled approach to these screening experiments, using the machinery of supersaturated designs and the Lasso. To accommodate limitations on the number of biological entities that can be applied to a single microplate well, we present a new class of row-constrained supersaturated designs. We develop a computational procedure to construct these designs, provide some initial lower bounds on the average squared off-diagonal values of their main-effects information matrix, and study the impact of the constraint on design quality. We also show via simulation that the proposed constrained row screening method is statistically superior to existing methods and demonstrate the use of the new methodology on a real drug-discovery system.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Supplementary materials can be found at https://sites.miamioh.edu/byran-smucker/research-2/

arXiv:2406.02891 [pdf, other]

A Bi-metric Framework for Fast Similarity Search

Authors: Haike Xu, Sandeep Silwal, Piotr Indyk

Abstract: We propose a new "bi-metric" framework for designing nearest neighbor data structures. Our framework assumes two dissimilarity functions: a ground-truth metric that is accurate but expensive to compute, and a proxy metric that is cheaper but less accurate. In both theory and practice, we show how to construct data structures using only the proxy metric such that the query procedure achieves the accuracy of the expensive metric, while only using a limited number of calls to both metrics. Our theoretical results instantiate this framework for two popular nearest neighbor search algorithms: DiskANN and Cover Tree. In both cases we show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor, our data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric. On the empirical side, we apply the framework to the text retrieval problem with two dissimilarity functions evaluated by ML models with vastly different computational costs. We observe that for almost all data sets in the MTEB benchmark, our approach achieves a considerably better accuracy-efficiency tradeoff than the alternatives, such as re-ranking.

Submitted 4 June, 2024; originally announced June 2024.
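
The abstract above names re-ranking as one of the alternatives the bi-metric framework is compared against. As a rough illustration of that baseline only (not the paper's data structure), the sketch below retrieves candidates with a cheap proxy distance and then re-ranks them with the expensive one. The function name `rerank_search` and its parameters are hypothetical, and the linear scan stands in for whatever index a real system would use.

```python
import heapq

def rerank_search(query, items, proxy_dist, true_dist, k_proxy, k_final=1):
    """Two-stage search: shortlist with the proxy, re-rank with the true metric.

    The expensive true_dist is called exactly len(shortlist) <= k_proxy times.
    """
    # Stage 1: cheap proxy distance over all items (a linear scan in this sketch).
    shortlist = heapq.nsmallest(k_proxy, items, key=lambda x: proxy_dist(query, x))
    # Stage 2: expensive distance only on the shortlist.
    return heapq.nsmallest(k_final, shortlist, key=lambda x: true_dist(query, x))

# Toy metrics: the proxy only sees the integer part of each number.
def coarse(q, x):
    return abs(int(q) - int(x))

def exact(q, x):
    return abs(q - x)

best = rerank_search(2.45, [1.9, 2.1, 2.4, 5.0], coarse, exact, k_proxy=2)
```

With a shortlist that is too small, the true nearest neighbor can be missed whenever the proxy ranks it poorly, which is the accuracy-efficiency tension the abstract refers to.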

arXiv:2403.08917 [pdf, other]

Efficiently Computing Similarities to Private Datasets

Authors: Arturs Backurs, Zinan Lin, Sepideh Mahabadi, Sandeep Silwal, Jakub Tarnawski

Abstract: Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data. We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) data structure which approximates $\sum_{x \in X} f(x,y)$ for any query $y$. We consider the cases where $f$ is a kernel function, such as $f(x,y) = e^{-\|x-y\|_2^2/\sigma^2}$ (also known as DP kernel density estimation), or a distance function such as $f(x,y) = \|x-y\|_2$, among others. Our theoretical results improve upon prior work and give better privacy-utility trade-offs as well as faster query times for a wide range of kernels and distance functions. The unifying approach behind our results is leveraging `low-dimensional structures' present in the specific functions $f$ that we study, using tools such as provable dimensionality reduction, approximation theory, and one-dimensional decomposition of the functions. Our algorithms empirically exhibit improved query times and accuracy over prior state of the art. We also present an application to DP classification. Our experiments demonstrate that the simple methodology of classifying based on average similarity is orders of magnitude faster than prior DP-SGD based approaches for comparable accuracy.

Submitted 13 March, 2024; originally announced March 2024.

Comments: To appear at ICLR 2024

arXiv:2312.07535 [pdf, other]

Improved Frequency Estimation Algorithms with and without Predictions

Authors: Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal, Ali Vakilian

Abstract: Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches.

Submitted 12 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023
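
The abstract above cites CountMin as a standard sketching baseline. For readers unfamiliar with it, here is a minimal CountMin sketch; it is the classical baseline, not the paper's new algorithm. The class name, the SHA-256 based row hashes, and the `width`/`depth` values are illustrative assumptions.

```python
import hashlib

class CountMinSketch:
    """Classical CountMin: depth rows of width counters, one hash per row."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        # Assumption: a row-salted SHA-256 digest stands in for a pairwise
        # independent hash family.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # never underestimates the true frequency.
        return min(self.table[row][self._bucket(row, item)]
                   for row in range(self.depth))

stream = ["a"] * 50 + ["b"] * 20 + ["c"] * 5 + [f"rare{i}" for i in range(100)]
sketch = CountMinSketch(width=64, depth=4)
for element in stream:
    sketch.add(element)
```

The one-sided error (estimates are never below the true count) is the worst-case guarantee that learned heavy-hitter oracles aim to tighten.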

arXiv:2310.02219 [pdf, other]

What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

Authors: Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets

Abstract: We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study spans five different PVRs, two different policy-learning paradigms (imitation and reinforcement learning), and three different robots for 5 distinct manipulation and indoor navigation tasks. From this effort, we can arrive at three insights: 1) the performance trends of PVRs in the simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data-augmentation and fine-tuning, also transfer to the real-world performance. See project website for additional details and visuals.

Submitted 3 October, 2023; originally announced October 2023.

Comments: Project website https://pvrs-sim2real.github.io/

MSC Class: 68T45 (Primary) 68T40; 68T05 (Secondary)

ACM Class: I.2.9; I.2.6; I.4.8; I.5.4

arXiv:2309.16840 [pdf, other]

Constant Approximation for Individual Preference Stable Clustering

Authors: Anders Aamand, Justin Y. Chen, Allen Liu, Sandeep Silwal, Pattara Sukprasert, Ali Vakilian, Fred Zhang

Abstract: Individual preference (IP) stability, introduced by Ahmadi et al. (ICML 2022), is a natural clustering objective inspired by stability and fairness constraints. A clustering is $\alpha$-IP stable if the average distance of every data point to its own cluster is at most $\alpha$ times the average distance to any other cluster. Unfortunately, determining if a dataset admits a $1$-IP stable clustering is NP-Hard. Moreover, before this work, it was unknown if an $o(n)$-IP stable clustering always \emph{exists}, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering. We close this gap in understanding and show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering. We also introduce generalizations of IP stability beyond average distance and give efficient, near-optimal algorithms in the cases where we consider the maximum and minimum distances within and between clusters.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 20 pages
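
The definition of $\alpha$-IP stability in the abstract above can be checked directly on a given clustering. The sketch below is a brute-force verifier, not the paper's clustering algorithm. It assumes Euclidean distance, that a point's average distance to its own cluster excludes the point itself, and that singleton clusters are trivially stable; the abstract does not spell out these conventions, and the function name is hypothetical.

```python
import math

def is_ip_stable(points, clusters, alpha=1.0):
    """Return True if every point satisfies the alpha-IP stability condition.

    points   : list of coordinate tuples
    clusters : list of lists of indices into points (a partition)
    """
    def avg_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    for ci, cluster in enumerate(clusters):
        for i in cluster:
            own_members = [j for j in cluster if j != i]
            if not own_members:
                continue  # assumption: singletons are trivially stable
            own = avg_dist(i, own_members)
            for cj, other in enumerate(clusters):
                if cj != ci and other and own > alpha * avg_dist(i, other):
                    return False
    return True

line = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = [[0, 1], [2, 3]]   # nearby points grouped together
bad = [[0, 2], [1, 3]]    # each point paired with a far-away one
```

On this toy input, the natural grouping is 1-IP stable, while the interleaved grouping is not 1-IP stable but does satisfy the condition for a larger $\alpha$ such as 3.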

arXiv:2307.03043 [pdf, other]

A Near-Linear Time Algorithm for the Chamfer Distance

Authors: Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, Erik Waingarten

Abstract: For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\varepsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.

Submitted 6 July, 2023; originally announced July 2023.
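
For reference, the $O(d n^2)$ brute-force computation that the abstract above describes as the naive baseline can be written in a few lines. This is only that baseline, with the Euclidean distance assumed for $d_X$; the near-linear algorithm of the paper is not reproduced here.

```python
import math

def chamfer_distance(A, B):
    """Brute-force CH(A, B): for each a in A, add the distance to its nearest b in B."""
    return sum(min(math.dist(a, b) for b in B) for a in A)

A = [(0.0, 0.0), (3.0, 4.0)]
B = [(0.0, 0.0)]
forward = chamfer_distance(A, B)   # 0 + 5 = 5.0
backward = chamfer_distance(B, A)  # 0.0
```

The two calls differ because the Chamfer distance as defined is not symmetric: it sums over the first argument only.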

arXiv:2306.11312 [pdf, other]

Data Structures for Density Estimation

Authors: Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal

Abstract: We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.

Submitted 20 June, 2023; originally announced June 2023.

Comments: To appear at ICML'23

arXiv:2304.07652 [pdf, other]

Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees

Authors: Nicholas Schiefer, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal, Tal Wagner

Abstract: An $\varepsilon$-approximate quantile sketch over a stream of $n$ inputs approximates the rank of any query point $q$ - that is, the number of input points less than $q$ - up to an additive error of $\varepsilon n$, generally with some probability of at least $1 - 1/\mathrm{poly}(n)$, while consuming $o(n)$ space. While the celebrated KLL sketch of Karnin, Lang, and Liberty achieves a provably optimal quantile approximation algorithm over worst-case streams, the approximations it achieves in practice are often far from optimal. Indeed, the most commonly used technique in practice is Dunning's t-digest, which often achieves much better approximations than KLL on real-world data but is known to have arbitrarily large errors in the worst case. We apply interpolation techniques to the streaming quantiles problem to attempt to achieve better approximations on real-world data sets than KLL while maintaining similar guarantees in the worst case.

Submitted 15 April, 2023; originally announced April 2023.

Comments: 11 pages, 5 figures, published at SIAM ACDA 2023
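
The accuracy notion in the abstract above (rank error at most $\varepsilon n$) is easy to state in code. The sketch below computes exact ranks from a fully stored stream and checks whether a candidate estimate meets the additive-error bound. It uses linear space and is meant only to make the definition concrete; it is not KLL, t-digest, or the paper's method, and the function names are hypothetical.

```python
import bisect

def exact_rank(stream, q):
    """Number of stream elements strictly less than q (stores the full stream)."""
    return bisect.bisect_left(sorted(stream), q)

def is_eps_approximate(estimate, stream, q, eps):
    """True if the rank estimate is within additive error eps * n of the truth."""
    return abs(estimate - exact_rank(stream, q)) <= eps * len(stream)

stream = [5, 1, 3, 2, 4]
rank_of_3 = exact_rank(stream, 3)  # elements 1 and 2 are smaller
```

A sketch such as KLL has to pass the same check for every query while using only $o(n)$ space.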

arXiv:2304.07413 [pdf, other]

Robust Algorithms on Adaptive Inputs from Bounded Adversaries

Authors: Yeshwanth Cherapanamjeri, Sandeep Silwal, David P. Woodruff, Fred Zhang, Qiuyi Zhang, Samson Zhou

Abstract: We study dynamic algorithms robust to adaptive input generated from sources with bounded capabilities, such as sparsity or limited interaction. For example, we consider robust linear algebraic algorithms when the updates to the input are sparse but given by an adversary with access to a query oracle. We also study robust algorithms in the standard centralized setting, where an adversary queries an algorithm in an adaptive manner, but the number of interactions between the adversary and the algorithm is bounded. We first recall a unified framework of [HKM+20, BKM+22, ACSS23] for answering $Q$ adaptive queries that incurs $\widetilde{\mathcal{O}}(\sqrt{Q})$ overhead in space, which is roughly a quadratic improvement over the naïve implementation, and only incurs a logarithmic overhead in query time. Although the general framework has diverse applications in machine learning and data science, such as adaptive distance estimation, kernel density estimation, linear regression, range queries, and point queries, and serves as a preliminary benchmark, we demonstrate even better algorithmic improvements for (1) reducing the pre-processing time for adaptive distance estimation and (2) permitting an unlimited number of adaptive queries for kernel density estimation. Finally, we complement our theoretical results with additional empirical evaluations.

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2303.18240 [pdf, other]

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

Authors: Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

Abstract: We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of pre-training data size and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources (over 4.3M images) and ImageNet to train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average). Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either. Next, we show that task- or domain-specific adaptation of VC-1 leads to substantial gains, with VC-1 (adapted) achieving performance competitive with or superior to the best known results on all of the benchmarks in CortexBench. Finally, we present real-world hardware experiments, in which VC-1 and VC-1 (adapted) outperform the strongest pre-existing PVR.
Overall, this paper presents no new techniques but a rigorous systematic evaluation, a broad set of findings about PVRs (that, in some cases, refute those made in narrow domains in prior work), and open-sourced code and models (that required over 10,000 GPU-hours to train) for the benefit of the research community.

Submitted 1 February, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: Project website: https://eai-vc.github.io

arXiv:2303.01453 [pdf, other]

Improved Space Bounds for Learning with Experts

Authors: Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal

Abstract: We give improved tradeoffs between space and regret for the online learning with expert advice problem over $T$ days with $n$ experts. Given a space budget of $n^{\delta}$ for $\delta \in (0,1)$, we provide an algorithm achieving regret $\tilde{O}(n^2 T^{1/(1+\delta)})$, improving upon the regret bound $\tilde{O}(n^2 T^{2/(2+\delta)})$ in the recent work of [PZ23]. The improvement is particularly salient in the regime $\delta \rightarrow 1$ where the regret of our algorithm approaches $\tilde{O}_n(\sqrt{T})$, matching the $T$ dependence in the standard online setting without space restrictions.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2212.00642 [pdf, other]

Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation

Authors: Ainesh Bakshi, Piotr Indyk, Praneeth Kacham, Sandeep Silwal, Samson Zhou

Abstract: Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency -- given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research efforts. We break the quadratic barrier and obtain $\textit{subquadratic}$ time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from $\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{importance sampling}$ on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in $\textit{sublinear}$ (in the support of the distribution) time.
Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral sparsification.

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.09964 [pdf, ps, other]

Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time

Authors: Yeshwanth Cherapanamjeri, Sandeep Silwal, David P. Woodruff, Samson Zhou

Abstract: We study fundamental problems in linear algebra, such as finding a maximal linearly independent subset of rows or columns (a basis), solving linear regression, or computing a subspace embedding. For these problems, we consider input matrices $\mathbf{A}\in\mathbb{R}^{n\times d}$ with $n > d$. The input can be read in $\text{nnz}(\mathbf{A})$ time, where $\text{nnz}(\mathbf{A})$ denotes the number of nonzero entries of $\mathbf{A}$. In this paper, we show that beyond the time required to read the input matrix, these fundamental linear algebra problems can be solved in $d^{\omega}$ time, where $\omega \approx 2.37$ is the current matrix-multiplication exponent. To do so, we introduce a constant-factor subspace embedding with the optimal $m=\mathcal{O}(d)$ number of rows, and which can be applied in time $\mathcal{O}\left(\frac{\text{nnz}(\mathbf{A})}{\alpha}\right) + d^{2 + \alpha}\text{poly}(\log d)$ for any trade-off parameter $\alpha>0$, tightening a recent result by Chepurko et al. [SODA 2022] that achieves an $\exp(\text{poly}(\log\log n))$ distortion with $m=d\cdot\text{poly}(\log\log d)$ rows in $\mathcal{O}\left(\frac{\text{nnz}(\mathbf{A})}{\alpha}+d^{2+\alpha+o(1)}\right)$ time. Our subspace embedding uses a recently shown property of stacked Subsampled Randomized Hadamard Transforms (SRHT), which actually increase the input dimension, to "spread" the mass of an input vector among a large number of coordinates, followed by random sampling. To control the effects of random sampling, we use fast semidefinite programming to reweight the rows.
We then use our constant-factor subspace embedding to give the first optimal runtime algorithms for finding a maximal linearly independent subset of columns, regression, and leverage score sampling. To do so, we also introduce a novel subroutine that iteratively grows a set of independent rows, which may be of independent interest.

Submitted 19 January, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: SODA 2023

arXiv:2211.03232 [pdf, other]

Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks

Authors: Anders Aamand, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Nicholas Schiefer, Sandeep Silwal, Tal Wagner

Abstract: Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing non-isomorphic graphs is exactly the same as that of the Weisfeiler-Lehman (WL) graph test. In particular, they show that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the 'combine' function of size polynomial or even exponential in the number of graph nodes $n$, as well as feature vectors of length linear in $n$. We present an improved simulation of the WL test on GNNs with \emph{exponentially} lower complexity. In particular, the neural network implementing the combine function in each node has only a polylogarithmic number of parameters in $n$, and the feature vectors exchanged by the nodes of the GNN consist of only $O(\log n)$ bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near)-optimality of our construction.

Submitted 21 December, 2022; v1 submitted 6 November, 2022; originally announced November 2022.

Comments: 22 pages, 5 figures, published at NeurIPS 2022. Updated funding statements

arXiv:2210.15114 [pdf, other]

Faster Linear Algebra for Distance Matrices

Authors: Piotr Indyk, Sandeep Silwal

Abstract: The distance matrix of a dataset $X$ of $n$ points with respect to a distance function $f$ represents all pairwise distances between points in $X$ induced by $f$. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, which are specifically tailored for distance matrices, for fundamental linear algebraic primitives. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the $\ell_1$ metric for which we get a linear runtime, as well as an $\Omega(n^2)$ lower bound for any algorithm which computes a matrix-vector product for the $\ell_{\infty}$ case, showing a separation between the $\ell_1$ and the $\ell_{\infty}$ metrics. Our upper bound results, in conjunction with recent works on the matrix-vector query model, have many further downstream applications, including the fastest algorithm for computing a relative error low-rank approximation for the distance matrix induced by $\ell_1$ and $\ell_2^2$ functions and the fastest algorithm for computing an additive error low-rank approximation for the $\ell_2$ metric, in addition to applications for fast matrix multiplication among others.
We also give algorithms for constructing distance matrices and show that one can construct an approximate $\ell_2$ distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma.

Submitted 26 October, 2022; originally announced October 2022.

Comments: Selected as Oral for NeurIPS 2022
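
To give a feel for why $\ell_1$ distance matrices admit fast matrix-vector products, as the abstract above states, the sketch below computes $y_i = \sum_j \|x_i - x_j\|_1 v_j$ without ever forming the $n \times n$ matrix. It relies on the fact that the $\ell_1$ distance decomposes over coordinates, then uses sorting and running prefix sums per coordinate. This is a standard textbook-style construction running in $O(nd \log n)$ time because of the sort; it is an assumption-laden illustration and not necessarily the algorithm from the paper. A quadratic reference implementation is included for comparison.

```python
def l1_matvec(points, v):
    """Compute D @ v where D[i][j] = ||points[i] - points[j]||_1, without building D."""
    n, d = len(points), len(points[0])
    out = [0.0] * n
    for c in range(d):
        order = sorted(range(n), key=lambda i: points[i][c])
        total_v = sum(v)
        total_xv = sum(points[i][c] * v[i] for i in range(n))
        prefix_v = 0.0   # sum of v[j] over points already visited (coordinate <= x)
        prefix_xv = 0.0  # sum of x_j * v[j] over those same points
        for i in order:
            x = points[i][c]
            left = x * prefix_v - prefix_xv                            # sum (x - x_j) v_j
            right = (total_xv - prefix_xv) - x * (total_v - prefix_v)  # sum (x_j - x) v_j
            out[i] += left + right
            prefix_v += v[i]
            prefix_xv += x * v[i]
    return out

def l1_matvec_bruteforce(points, v):
    """Quadratic-time reference implementation."""
    return [sum(sum(abs(a - b) for a, b in zip(p, q)) * v[j]
                for j, q in enumerate(points))
            for p in points]

pts = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
vec = [1.0, 2.0, 3.0]
fast = l1_matvec(pts, vec)
slow = l1_matvec_bruteforce(pts, vec)
```

No analogous per-coordinate decomposition exists for $\ell_\infty$, which is consistent with the separation between the two metrics that the abstract reports.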

arXiv:2209.10614 [pdf, other]

Learning-Augmented Algorithms for Online Linear and Semidefinite Programming

Authors: Elena Grigorescu, Young-San Lin, Sandeep Silwal, Maoyuan Song, Samson Zhou

Abstract: Semidefinite programming (SDP) is a unifying framework that generalizes both linear programming and quadratically-constrained quadratic programming, while also yielding efficient solvers, both in theory and in practice. However, there exist known impossibility results for approximating the optimal solution when constraints for covering SDPs arrive in an online fashion. In this paper, we study online covering linear and semidefinite programs in which the algorithm is augmented with advice from a possibly erroneous predictor. We show that if the predictor is accurate, we can efficiently bypass these impossibility results and achieve a constant-factor approximation to the optimal solution, i.e., consistency. On the other hand, if the predictor is inaccurate, under some technical conditions, we achieve results that match both the classical optimal upper bounds and the tight lower bounds up to constant factors, i.e., robustness. More broadly, we introduce a framework that extends both (1) the online set cover problem augmented with machine-learning predictors, studied by Bamas, Maggiori, and Svensson (NeurIPS 2020), and (2) the online covering SDP problem, initiated by Elad, Kale, and Naor (ICALP 2016). Specifically, we obtain general online learning-augmented algorithms for covering linear programs with fractional advice and constraints, and initiate the study of learning-augmented algorithms for covering SDP problems.
Our techniques are based on the primal-dual framework of Buchbinder and Naor (Mathematics of Operations Research, 34, 2009) and can be further adjusted to handle constraints where the variables lie in a bounded region, i.e., box constraints.

Submitted 21 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 44 pages, 3 figures. To appear in NeurIPS 2022

arXiv:2206.14354 [pdf, ps, other]

Hardness and Algorithms for Robust and Sparse Optimization

Authors: Eric Price, Sandeep Silwal, Samson Zhou

Abstract: We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression. The goal of the sparse linear regression problem is to identify a small number of key features, while the goal of the robust linear regression problem is to identify a small number of erroneous measurements. Specifically, the sparse linear regression problem seeks a $k$-sparse vector $x\in\mathbb{R}^d$ to minimize $\|Ax-b\|_2$, given an input matrix $A\in\mathbb{R}^{n\times d}$ and a target vector $b\in\mathbb{R}^n$, while the robust linear regression problem seeks a set $S$ that ignores at most $k$ rows and a vector $x$ to minimize $\|(Ax-b)_S\|_2$. We first show bicriteria NP-hardness of approximation for robust regression, building on the work of [OWZ15], which implies a similar result for sparse regression. We further show fine-grained hardness of robust regression through a reduction from the minimum-weight $k$-clique conjecture. On the positive side, we give an algorithm for robust regression that achieves arbitrarily accurate additive error and uses runtime that closely matches the lower bound from the fine-grained hardness result, as well as an algorithm for sparse regression with similar runtime. Both our upper and lower bounds rely on a general reduction from robust linear regression to sparse regression that we introduce. Our algorithms, inspired by the 3SUM problem, use approximate nearest neighbor data structures and may be of independent interest for solving sparse optimization problems.
For instance, we demonstrate that our techniques can also be used for the well-studied sparse PCA problem.

Submitted 28 June, 2022; originally announced June 2022.

Comments: ICML 2022

arXiv:2204.12055 [pdf, other]

Faster Fundamental Graph Algorithms via Learned Predictions

Authors: Justin Y. Chen, Sandeep Silwal, Ali Vakilian, Fred Zhang

Abstract: We consider the question of speeding up classic graph algorithms with machine-learned predictions. In this model, algorithms are furnished with extra advice learned from past or similar instances. Given the additional information, we aim to improve upon the traditional worst-case run-time guarantees. Our contributions are the following: (i) We give a faster algorithm for minimum-weight bipartite matching via learned duals, improving the recent result by Dinitz, Im, Lavastida, Moseley and Vassilvitskii (NeurIPS, 2021); (ii) We extend the learned dual approach to the single-source shortest path problem (with negative edge lengths), achieving an almost linear runtime given sufficiently accurate predictions which improves upon the classic fastest algorithm due to Goldberg (SIAM J. Comput., 1995); (iii) We provide a general reduction-based framework for learning-based graph algorithms, leading to new algorithms for degree-constrained subgraph and minimum-cost $0$-$1$ flow, based on reductions to bipartite matching and the shortest path problem. Finally, we give a set of general learnability theorems, showing that the predictions required by our algorithms can be efficiently learned in a PAC fashion.

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2204.09951 [pdf, other]

Motif Cut Sparsifiers

Authors: Michael Kapralov, Mikhail Makarov, Sandeep Silwal, Christian Sohler, Jakab Tardos

Abstract: A motif is a frequently occurring subgraph of a given directed or undirected graph $G$. Motifs capture higher order organizational structure of $G$ beyond edge relationships, and, therefore, have found wide applications such as in graph clustering, community detection, and analysis of biological and physical networks to name a few. In these applications, the cut structure of motifs plays a crucial role as vertices are partitioned into clusters by cuts whose conductance is based on the number of instances of a particular motif, as opposed to just the number of edges, crossing the cuts. In this paper, we introduce the concept of a motif cut sparsifier. We show that one can compute in polynomial time a sparse weighted subgraph $G'$ with only $\widetilde{O}(n/\epsilon^2)$ edges such that for every cut, the weighted number of copies of $M$ crossing the cut in $G'$ is within a $1+\epsilon$ factor of the number of copies of $M$ crossing the cut in $G$, for every constant size motif $M$. Our work carefully combines the viewpoints of both graph sparsification and hypergraph sparsification. We sample edges which requires us to extend and strengthen the concept of cut sparsifiers introduced in the seminal work of to the motif setting. We adapt the importance sampling framework through the viewpoint of hypergraph sparsification by deriving the edge sampling probabilities from the strong connectivity values of a hypergraph whose hyperedges represent motif instances.
Finally, an iterative sparsification primitive inspired by both viewpoints is used to reduce the number of edges in $G$ to nearly linear. In addition, we present a strong lower bound ruling out a similar result for sparsification with respect to induced occurrences of motifs.

Submitted 12 September, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: 48 pages, 3 figures

arXiv:2204.09136 [pdf, other]

The White-Box Adversarial Data Stream Model

Authors: Miklos Ajtai, Vladimir Braverman, T. S. Jayram, Sandeep Silwal, Alec Sun, David P. Woodruff, Samson Zhou

Abstract: We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the $L_1$-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our $L_1$-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for {\it randomized} algorithms robust to a white-box adversary. In particular, our results show that for all $p\ge 0$, there exists a constant $C_p>1$ such that any $C_p$-approximation algorithm for $F_p$ moment estimation in insertion-only streams with a white-box adversary requires $\Omega(n)$ space for a universe of size $n$.
Similarly, there is a constant $C>1$ such that any $C$-approximation algorithm in an insertion-only stream for matrix rank requires $\Omega(n)$ space with a white-box adversary. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. (Abstract shortened to meet arXiv limits.)
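For readers unfamiliar with the deterministic baseline named in this abstract, the following is a minimal sketch of the classical Misra-Gries summary, not the paper's randomized white-box algorithm. The function name `misra_gries` and the parameter `k` are illustrative choices. With at most `k - 1` counters, every item occurring more than `n/k` times in a stream of length `n` survives in the summary, and each stored count underestimates the true frequency by at most `n/k`.

```python
def misra_gries(stream, k):
    """Classical Misra-Gries summary using at most k - 1 counters.

    Illustrative baseline only; the names here are assumptions, not
    taken from the paper.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical toy stream: "a" occurs 6 times out of n = 12 items.
stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
summary = misra_gries(stream, k=3)
```

Because the summary is deterministic, an adversary who sees the counters gains nothing it could not already compute, which is why it serves as the natural point of comparison in the white-box setting.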

Submitted 23 July, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: PODS 2022

arXiv:2203.13251 [pdf, other]

Dexterous Imitation Made Easy: A Learning-Based Framework for Efficient Dexterous Manipulation

Authors: Sridhar Pandian Arunachalam, Sneha Silwal, Ben Evans, Lerrel Pinto

Abstract: Optimizing behaviors for dexterous manipulation has been a longstanding challenge in robotics, with a variety of methods from model-based control to model-free reinforcement learning having been previously explored in literature. Perhaps one of the most powerful techniques to learn complex manipulation strategies is imitation learning. However, collecting and learning from demonstrations in dexterous manipulation is quite challenging. The complex, high-dimensional action-space involved with multi-finger control often leads to poor sample efficiency of learning-based methods. In this work, we propose 'Dexterous Imitation Made Easy' (DIME), a new imitation learning framework for dexterous manipulation. DIME only requires a single RGB camera to observe a human operator and teleoperate our robotic hand. Once demonstrations are collected, DIME employs standard imitation learning methods to train dexterous manipulation policies. On both simulation and real robot benchmarks we demonstrate that DIME can be used to solve complex, in-hand manipulation tasks such as 'flipping', 'spinning', and 'rotating' objects with the Allegro hand. Our framework along with pre-collected demonstrations is publicly available at https://nyu-robot-learning.github.io/dime.

Submitted 24 March, 2022; originally announced March 2022.

Comments: The first two authors contributed equally

arXiv:2203.09572 [pdf, other]

Triangle and Four Cycle Counting with Predictions in Graph Streams

Authors: Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael Zhang

Abstract: We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.

Submitted 17 March, 2022; originally announced March 2022.

Comments: To be presented at ICLR 2022

arXiv:2110.14094 [pdf, other]

Learning-Augmented $k$-means Clustering

Authors: Jon C. Ergun, Zhili Feng, Sandeep Silwal, David P. Woodruff, Samson Zhou

Abstract: $k$-means clustering is a well-studied problem due to its wide applicability. Unfortunately, there exist strong theoretical limits on the performance of any algorithm for the $k$-means problem on worst-case inputs. To overcome this barrier, we consider a scenario where "advice" is provided to help perform clustering. Specifically, we consider the $k$-means problem augmented with a predictor that, given any point, returns its cluster label in an approximately optimal clustering up to some, possibly adversarial, error. We present an algorithm whose performance improves along with the accuracy of the predictor, even though naïvely following the accurate predictor can still lead to a high clustering cost. Thus if the predictor is sufficiently accurate, we can retrieve a close to optimal clustering with nearly optimal runtime, breaking known computational barriers for algorithms that do not have access to such advice. We evaluate our algorithms on real datasets and show significant improvements in the quality of clustering.

Submitted 19 March, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: ICLR 2022

arXiv:2110.08991 [pdf, other]

Dimensionality Reduction for Wasserstein Barycenter

Authors: Zachary Izzo, Sandeep Silwal, Samson Zhou

Abstract: The Wasserstein barycenter is a geometric construct which captures the notion of centrality among probability distributions, and which has found many applications in machine learning. However, most algorithms for finding even an approximate barycenter suffer an exponential dependence on the dimension $d$ of the underlying space of the distributions. In order to cope with this "curse of dimensionality," we study dimensionality reduction techniques for the Wasserstein barycenter problem. When the barycenter is restricted to support of size $n$, we show that randomized dimensionality reduction can be used to map the problem to a space of dimension $O(\log n)$ independent of both $d$ and $k$, and that \emph{any} solution found in the reduced dimension will have its cost preserved up to arbitrarily small error in the original space. We provide matching upper and lower bounds on the size of the reduced dimension, showing that our methods are optimal up to constant factors. We also provide a coreset construction for the Wasserstein barycenter problem that significantly decreases the number of input distributions. The coresets can be used in conjunction with random projections and thus further improve computation time. Lastly, our experimental results validate the speedup provided by dimensionality reduction while maintaining solution quality.

Submitted 18 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

Comments: Published as a conference paper in NeurIPS 2021

arXiv:2107.01804 [pdf, other]

Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

Authors: Shyam Narayanan, Sandeep Silwal, Piotr Indyk, Or Zamir

Abstract: Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating solutions instead of just their costs. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.

Submitted 5 July, 2021; originally announced July 2021.

Comments: 25 pages. Published as a conference paper in ICML 2021

arXiv:2106.14952 [pdf, other]

Adversarial Robustness of Streaming Algorithms through Importance Sampling

Authors: Vladimir Braverman, Avinatan Hassidim, Yossi Matias, Mariano Schain, Sandeep Silwal, Samson Zhou

Abstract: In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction. For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model. Our results are based on a simple but powerful observation: many importance sampling-based algorithms give rise to adversarial robustness, in contrast to sketching-based algorithms, which are very prevalent in the streaming literature but suffer from adversarial attacks. In addition, we show that the well-known merge and reduce paradigm in streaming is adversarially robust. Since the merge and reduce paradigm allows coreset constructions in the streaming setting, we thus obtain robust algorithms for $k$-means, $k$-median, $k$-center, Bregman clustering, projective clustering, principal component analysis (PCA) and non-negative matrix factorization. To the best of our knowledge, these are the first adversarially robust results for these problems, yet they require no new algorithmic implementations. Finally, we empirically confirm the robustness of our algorithms on various adversarial attacks and demonstrate that, by contrast, some common existing algorithms are not robust. (Abstract shortened to meet arXiv limits)

Submitted 25 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021

arXiv:2106.08396 [pdf, other]

Learning-based Support Estimation in Sublinear Time

Authors: Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner

Abstract: We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to $\pm \varepsilon n$ from a sample of size $O(\log^2(1/\varepsilon) \cdot n/\log n)$, where $n$ is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimation of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to $\log (1/\varepsilon) \cdot n^{1-\Theta(1/\log(1/\varepsilon))}$. We evaluate the proposed algorithms on a collection of data sets, using the neural-network based estimators from Hsu et al. (ICLR'19) as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state of the art algorithm.

Submitted 15 June, 2021; originally announced June 2021.

Comments: 17 pages. Published as a conference paper in ICLR 2021

arXiv:2012.04488 [pdf, ps, other]

A Concentration Inequality for the Facility Location Problem

Authors: Sandeep Silwal

Abstract: We give a concentration inequality for a stochastic version of the facility location problem. We show the objective $C_n = \min_{F \subseteq [0,1]^2}|F|+\sum_{x\in X}\min_{f\in F}\|x-f\|$ is concentrated in an interval of length $O(n^{1/6})$ and $\mathbb{E}[C_n]=\Theta(n^{2/3})$ if the input $X$ consists of i.i.d. uniform points in the unit square. Our main tool is to use a geometric quantity, previously used in the design of approximation algorithms for the facility location problem, to analyze a martingale process. Many of our techniques generalize to other settings.
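To make the objective in this abstract concrete, the sketch below evaluates the quantity inside the minimum, $|F|+\sum_{x\in X}\min_{f\in F}\|x-f\|$, for one fixed facility set $F$. It does not perform the minimization over $F$ that defines $C_n$, and the function name `facility_cost` and the toy inputs are illustrative assumptions rather than anything from the paper.

```python
import math


def facility_cost(points, facilities):
    """Cost of one fixed facility set: |F| plus each point's distance
    to its nearest facility (Euclidean norm in the plane).

    Illustrative helper; evaluates a single candidate F and does not
    search for the optimal one.
    """
    if not facilities:
        raise ValueError("at least one facility is required")
    opening_cost = len(facilities)
    connection_cost = sum(
        min(math.hypot(px - fx, py - fy) for fx, fy in facilities)
        for px, py in points
    )
    return opening_cost + connection_cost


# Hypothetical toy instance in the unit square.
X = [(0.0, 0.0), (1.0, 0.0)]
one_facility = facility_cost(X, [(0.0, 0.0)])   # 1 opened + distances 0 and 1
two_facilities = facility_cost(X, X)            # 2 opened + zero distance
```

On this toy instance both choices cost the same, which illustrates the trade-off between opening more facilities and paying larger connection distances.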

Submitted 23 July, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Operations Research Letters, Volume 50

arXiv:2009.01986 [pdf, other]

Smoothed analysis of the condition number under low-rank perturbations

Authors: Rikhav Shah, Sandeep Silwal

Abstract: Let $M$ be an arbitrary $n$ by $n$ matrix of rank $n-k$. We study the condition number of $M$ plus a \emph{low-rank} perturbation $UV^T$ where $U, V$ are $n$ by $k$ random Gaussian matrices. Under some necessary assumptions, it is shown that $M+UV^T$ is unlikely to have a large condition number. The main advantages of this kind of perturbation over the well-studied dense Gaussian perturbation, where every entry is independently perturbed, are the $O(nk)$ cost to store $U,V$ and the $O(nk)$ increase in time complexity for performing the matrix-vector multiplication $(M+UV^T)x$. This improves the $\Omega(n^2)$ space and time complexity increase required by a dense perturbation, which is especially burdensome if $M$ is originally sparse. Our results also extend to the case where $U$ and $V$ have rank larger than $k$ and to symmetric and complex settings. We also give an application to linear systems solving and perform some numerical experiments. Lastly, barriers in applying low-rank noise to other problems studied in the smoothed analysis framework are discussed.

Submitted 13 July, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

arXiv:2006.05418 [pdf, ps, other]

A note on the universality of ESDs of inhomogeneous random matrices

Authors: Vishesh Jain, Sandeep Silwal

Abstract: In this short note, we extend the celebrated results of Tao and Vu, and Krishnapur on the universality of empirical spectral distributions to a wide class of inhomogeneous complex random matrices, by showing that a technical and hard-to-verify Fourier domination assumption may be replaced simply by a natural uniform anti-concentration assumption. Along the way, we show that inhomogeneous complex random matrices, whose expected squared Hilbert-Schmidt norm is quadratic in the dimension, and whose entries (after symmetrization) are uniformly anti-concentrated at $0$ and infinity, typically have smallest singular value $\Omega(n^{-1/2})$. The rate $n^{-1/2}$ is sharp, and closes a gap in the literature. Our proofs closely follow recent works of Livshyts, and Livshyts, Tikhomirov, and Vershynin on inhomogeneous real random matrices. The new ingredient is a couple of anti-concentration inequalities for sums of independent, but not necessarily identically distributed, complex random variables, which may also be useful in other contexts.

Submitted 9 June, 2020; originally announced June 2020.

Comments: 11 pages; comments welcome!

arXiv:1912.01098 [pdf, ps, other]

Using Dimensionality Reduction to Optimize t-SNE

Authors: Rikhav Shah, Sandeep Silwal

Abstract: t-SNE is a popular tool for embedding multi-dimensional datasets into two or three dimensions. However, it has a large computational cost, especially when the input data has many dimensions. Many use t-SNE to embed the output of a neural network, which is generally of much lower dimension than the original data. This limits the use of t-SNE in unsupervised scenarios. We propose using \textit{random} projections to embed high dimensional datasets into relatively few dimensions, and then using t-SNE to obtain a two dimensional embedding. We show that random projections preserve the desirable clustering achieved by t-SNE, while dramatically reducing the runtime of finding the embedding.
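The preprocessing step described in this abstract can be illustrated with a minimal sketch of a Gaussian random projection in plain Python. This is one standard way to realize a random projection and is an assumption here, since the abstract does not specify the construction; the function name `random_projection`, the `seed` parameter, and the toy data are hypothetical. The subsequent t-SNE step is omitted; the projected points would simply be passed to whatever t-SNE implementation is in use.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Project points to target_dim dimensions with a random Gaussian matrix.

    Entries are drawn from N(0, 1/target_dim), so squared distances are
    preserved in expectation. Illustrative sketch, not the authors' code.
    """
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    # Multiply every point by the projection matrix.
    return [
        [sum(w * x for w, x in zip(row, point)) for row in matrix]
        for point in points
    ]


# Hypothetical toy data: two points in 50 dimensions reduced to 5.
data = [[1.0] * 50, [0.0] * 50]
reduced = random_projection(data, target_dim=5)
```

The design choice worth noting is that the projection is data-independent, so it needs no training pass over the input before the embedding step.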

Submitted 2 December, 2019; originally announced December 2019.

Comments: 11th Annual Workshop on Optimization for Machine Learning (OPT2019)

arXiv:1911.08320 [pdf, ps, other]

Property Testing of LP-Type Problems

Authors: Rogers Epstein, Sandeep Silwal

Abstract: Given query access to a set of constraints $S$, we wish to quickly check if some objective function $\varphi$ subject to these constraints is at most a given value $k$. We approach this problem using the framework of property testing where our goal is to distinguish the case $\varphi(S) \le k$ from the case that at least an $\epsilon$ fraction of the constraints in $S$ need to be removed for $\varphi(S) \le k$ to hold. We restrict our attention to the case where $(S, \varphi)$ are LP-Type problems which is a rich family of combinatorial optimization problems with an inherent geometric structure. By utilizing a simple sampling procedure which has been used previously to study these problems, we are able to create property testers for any LP-Type problem whose query complexities are independent of the number of constraints. To the best of our knowledge, this is the first work that connects the area of LP-Type problems and property testing in a systematic way. Among our results is a tight upper bound on the query complexity of testing clusterability with one cluster considered by Alon, Dar, Parnas, and Ron (FOCS 2000). We also supply a corresponding tight lower bound for this problem and other LP-Type problems using geometric constructions.

Submitted 19 November, 2019; originally announced November 2019.

Comments: 15 pages

arXiv:1911.07324 [pdf, ps, other]

Testing Properties of Multiple Distributions with Few Samples

Authors: Maryam Aliakbarpour, Sandeep Silwal

Abstract: We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from $s$ distributions, $p_1, p_2, \ldots, p_s$, we design testers for the following problems: (1) Uniformity Testing: Testing whether all the $p_i$'s are uniform or $\epsilon$-far from being uniform in $\ell_1$-distance, (2) Identity Testing: Testing whether all the $p_i$'s are equal to an explicitly given distribution $q$ or $\epsilon$-far from $q$ in $\ell_1$-distance, and (3) Closeness Testing: Testing whether all the $p_i$'s are equal to a distribution $q$ which we have sample access to, or $\epsilon$-far from $q$ in $\ell_1$-distance. By assuming an additional natural condition about the source distributions, we provide sample optimal testers for all of these problems.

Submitted 17 November, 2019; originally announced November 2019.

Comments: ITCS 2020

arXiv:1812.11564 [pdf, other]

Spectral methods for testing cluster structure of graphs

Authors: Sandeep Silwal, Jonathan Tidor

Abstract: In the framework of graph property testing, we study the problem of determining if a graph admits a cluster structure. We say that a graph is $(k, \phi)$-clusterable if it can be partitioned into at most $k$ parts such that each part has conductance at least $\phi$. We present an algorithm that accepts all graphs that are $(2, \phi)$-clusterable with probability at least $\frac{2}{3}$ and rejects all graphs that are $\epsilon$-far from $(2, \phi^*)$-clusterable for $\phi^* \le \mu \phi^2 \epsilon^2$ with probability at least $\frac{2}{3}$, where $\mu > 0$ is a parameter that affects the query complexity. This improves upon the work of Czumaj, Peng, and Sohler by removing a $\log n$ factor from the denominator of the bound on $\phi^*$ for the case of $k=2$. Our work was concurrent with the work of Chiplunkar et al., who achieved the same improvement for all values of $k$. Our approach for the case $k=2$ relies on the geometric structure of the eigenvectors of the graph Laplacian and results in an algorithm with query complexity $O(n^{1/2+O(1)\mu} \cdot \text{poly}(1/\epsilon, 1/\phi, \log n))$.

Submitted 30 December, 2018; originally announced December 2018.

Comments: 21 pages, 7 figures

arXiv:1811.12441 [pdf, ps, other]

Infinitely Many Primes Using Generating Functions

Authors: Sandeep Silwal

Abstract: In this short paper we present an elementary proof of the infinitude of primes. Our proof is similar in spirit to Euler's proof that the sum of the reciprocals of the primes diverges and only uses tools from elementary number theory and calculus. In addition, our proof highlights an interesting use of generating functions.

Submitted 31 December, 2018; v1 submitted 29 November, 2018; originally announced November 2018.

arXiv:1808.02046 [pdf, other]

doi 10.1093/comnet/cnz006

Directed Random Geometric Graphs

Authors: Jesse Michel, Sushruth Reddy, Rikhav Shah, Sandeep Silwal, Ramis Movassagh

Abstract: Many real-world networks are intrinsically directed. Such networks include activation of genes, hyperlinks on the internet, and the network of followers on Twitter, among many others. The challenge, however, is to create a network model that has many of the properties of real-world networks, such as power-law degree distributions and the small-world property. To meet these challenges, we introduce the Directed Random Geometric Graph (DRGG) model, which is an extension of the random geometric graph model. We prove that it is scale-free with respect to the indegree distribution, has binomial outdegree distribution, has a high clustering coefficient, has few edges and is likely small-world. These are some of the main features of the aforementioned real-world networks. We empirically observe that word association networks have many of the theoretical properties of the DRGG model.

Submitted 6 August, 2018; originally announced August 2018.

Comments: 14+5 pages, 5 figures, 3 tables

Journal ref: Journal of Complex Networks, Volume 7, Issue 5, October 2019, Pages 792-816

arXiv:1510.05274 [pdf, ps, other]

Solitary pulse solutions of a coupled nonlinear Schrödinger system arising in optics

Authors: Sharad Silwal

Abstract: We prove the existence of travelling-wave solutions for a system of coupled nonlinear Schrödinger equations arising in nonlinear optics. Such a system describes second-harmonic generation in optical materials with $\chi^{(2)}$ nonlinearity. To prove the existence of travelling waves, we employ the method of concentration compactness to prove the relative compactness of minimizing sequences of the associated variational problem.

Submitted 18 October, 2015; originally announced October 2015.

arXiv:1508.06120 [pdf, ps, other]

Existence of bound states for (N+1)-coupled long-wave--short-wave interaction equations

Authors: Sharad Silwal

Abstract: We prove the existence of an infinite family of smooth positive bound states for (N+1)-coupled long-wave--short-wave interaction equations. The system describes the interaction between N short waves and a long wave and is of interest in physics and fluid dynamics.

Submitted 29 August, 2016; v1 submitted 25 August, 2015; originally announced August 2015.

arXiv:1311.5756 [pdf, other]

doi 10.1016/j.exmath.2013.12.008

A visual formalism for weights satisfying reverse inequalities

Authors: Sapto Indratno, Diego Maldonado, Sharad Silwal

Abstract: In this expository article we introduce a diagrammatic scheme to represent reverse classes of weights and some of their properties.

Submitted 13 January, 2014; v1 submitted 22 November, 2013; originally announced November 2013.

Comments: 32 pages, 43 figures. Minor typos fixed. To appear in Expositiones Mathematicae

Showing 1–40 of 40 results for author: Silwal, S