-
Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey
Authors:
Weixu Zhang,
Yifei Wang,
Yuanfeng Song,
Victor Junqiu Wei,
Yuxing Tian,
Yiyan Qi,
Jonathan H. Chan,
Raymond Chi-Wing Wong,
Haiqin Yang
Abstract:
The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data using natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing, the key technology facilitating the translation from natural language to SQL queries or data visualization commands. We then delve into the recent advancements in Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.
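The survey compares Text-to-SQL work partly by metrics. One metric widely used in Text-to-SQL benchmarks is execution accuracy: a predicted query counts as correct if running it returns the same result as the gold query, even when the SQL text differs. The sketch below is a minimal illustration under our own assumptions (the `sales` table, the question, and the helper name `execution_match` are hypothetical); it is not taken from the survey, and real benchmarks add further rules, for example for row ordering and value normalization.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql, ordered=False):
    """Compare a predicted query with a gold query by executing both.

    Results are compared as multisets of rows unless `ordered` is True
    (e.g. when the gold query has an ORDER BY clause).
    """
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as incorrect
    gold = conn.execute(gold_sql).fetchall()
    if ordered:
        return predicted == gold
    return Counter(predicted) == Counter(gold)


# Hypothetical toy table for the question "What are the total sales per city?"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Paris", 10), ("Paris", 5), ("Lyon", 7)])

gold = "SELECT city, SUM(amount) FROM sales GROUP BY city"
# Different SQL text, same result set: counted as correct.
pred_ok = "SELECT city, SUM(amount) AS total FROM sales GROUP BY city ORDER BY total DESC"
# Wrong aggregation: counted as incorrect.
pred_bad = "SELECT city, MAX(amount) FROM sales GROUP BY city"

print(execution_match(conn, pred_ok, gold))   # True
print(execution_match(conn, pred_bad, gold))  # False
```

Comparing result multisets rather than SQL strings is what lets the reordered, aliased prediction pass; the flip side is that a wrong query can occasionally match by coincidence on a small database.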
Submitted 19 May, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Automatic Data Visualization Generation from Chinese Natural Language Questions
Authors:
Yan Ge,
Victor Junqiu Wei,
Yuanfeng Song,
Jason Chen Zhang,
Raymond Chi-Wing Wong
Abstract:
Data visualization has emerged as an effective tool for gaining insights from massive datasets. Because the programming languages used for data visualization are hard to master, automatic data visualization generation from natural language (Text-to-Vis) is becoming increasingly popular. Despite extensive research on English Text-to-Vis, data visualization generation from Chinese questions has not yet been studied. Motivated by this, we propose a Chinese Text-to-Vis dataset in this paper and present our first attempt to tackle the problem. Our model uses multilingual BERT as the encoder to boost cross-lingual ability and infuses $n$-gram information into word representation learning. Our experimental results show that the dataset is challenging and warrants further research.
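The abstract states that $n$-gram information is infused into word representation learning but does not say how. One plausible first step for Chinese input, used by some $n$-gram-enhanced encoders, is to match character spans against an $n$-gram lexicon and record which $n$-grams cover each character; the sketch below shows only that matching step. The lexicon, the function names, and the example question ("show each city's sales amount") are hypothetical, and the paper's actual procedure may differ.

```python
def match_ngrams(sentence, lexicon, min_n=2, max_n=4):
    """Return (start, end, ngram) for every character n-gram of `sentence` found in `lexicon`."""
    spans = []
    for n in range(min_n, max_n + 1):
        for start in range(len(sentence) - n + 1):
            gram = sentence[start:start + n]
            if gram in lexicon:
                spans.append((start, start + n, gram))
    return spans


def ngrams_per_character(sentence, spans):
    """Map each character position to the matched n-grams that cover it."""
    covering = {i: [] for i in range(len(sentence))}
    for start, end, gram in spans:
        for i in range(start, end):
            covering[i].append(gram)
    return covering


# Hypothetical n-gram lexicon and Chinese question.
lexicon = {"显示", "城市", "销售", "销售额"}
question = "显示每个城市的销售额"
spans = match_ngrams(question, lexicon)
covering = ngrams_per_character(question, spans)
print(spans)
print(covering[7])  # ['销售', '销售额']
```

An encoder could then pool embeddings of the covering $n$-grams into each character's representation; that fusion step is deliberately left out here because the abstract gives no details.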
Submitted 14 September, 2023;
originally announced September 2023.
-
Finding the Dynamics of an Integrable Quantum Many-Body System via Machine Learning
Authors:
Victor Wei,
Alev Orfi,
Felix Fehse,
W. A. Coish
Abstract:
We study the dynamics of the Gaudin magnet ("central-spin model") using machine-learning methods. This model is of practical importance, e.g., for studying non-Markovian decoherence dynamics of a central spin interacting with a large bath of environmental spins and for studies of nonequilibrium superconductivity. The Gaudin magnet is also integrable, admitting many conserved quantities: For $N$ spins, the model Hamiltonian can be written as the sum of $N$ independent commuting operators. Despite this high degree of symmetry, a general closed-form analytic solution for the dynamics of this many-body problem remains elusive. Machine-learning methods may be well suited to exploiting the high degree of symmetry in integrable problems, even when an explicit analytic solution is not obvious. Motivated in part by this intuition, we use a neural-network representation (restricted Boltzmann machine) for each variational eigenstate of the model Hamiltonian. We then obtain accurate representations of the ground state and of the low-lying excited states of the Gaudin-magnet Hamiltonian through a variational Monte Carlo calculation. From the low-lying eigenstates, we find the non-perturbative dynamic transverse spin susceptibility, describing the linear response of a central spin to a time-varying transverse magnetic field in the presence of a spin bath. Having an efficient description of this susceptibility opens the door to improved characterization and quantum control procedures for qubits interacting with an environment of quantum two-level systems. These systems include electron-spin and hole-spin qubits interacting with environmental nuclear spins via hyperfine interactions or qubits with charge or flux degrees of freedom interacting with coherent charge or paramagnetic impurities.
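The integrability claim can be checked numerically for a few spins. The sketch below assumes the standard rational Gaudin conserved quantities in a magnetic field, $R_i = B S_i^z + \sum_{j\neq i} \frac{\mathbf{S}_i\cdot\mathbf{S}_j}{\epsilon_i-\epsilon_j}$ (this parametrization is our assumption; the paper's conventions may differ). $R_0$ has the central-spin form $B S_0^z + \sum_k A_k \mathbf{S}_0\cdot\mathbf{S}_k$ with $A_k = 1/(\epsilon_0-\epsilon_k)$. The code builds the operators with dense Kronecker products (not the restricted-Boltzmann-machine, variational Monte Carlo approach of the paper) and checks that every pair commutes; the parameter values are hypothetical.

```python
import numpy as np

# Spin-1/2 operators S^a = sigma^a / 2.
SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2
SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
SZ = np.array([[1, 0], [0, -1]], dtype=complex) / 2


def site_op(op, site, n_spins):
    """Embed a single-spin operator acting on `site` into the 2**n_spins space."""
    out = np.eye(1, dtype=complex)
    for k in range(n_spins):
        out = np.kron(out, op if k == site else np.eye(2, dtype=complex))
    return out


def gaudin_integrals(eps, field):
    """R_i = field * S_i^z + sum_{j != i} (S_i . S_j) / (eps_i - eps_j)."""
    n = len(eps)
    spins = [[site_op(op, k, n) for op in (SX, SY, SZ)] for k in range(n)]
    integrals = []
    for i in range(n):
        r_i = field * spins[i][2]
        for j in range(n):
            if j != i:
                s_dot_s = sum(spins[i][a] @ spins[j][a] for a in range(3))
                r_i = r_i + s_dot_s / (eps[i] - eps[j])
        integrals.append(r_i)
    return integrals


# Hypothetical parameters: 4 spins with distinct eps_i and field B = 0.5.
R = gaudin_integrals([0.0, 0.7, 1.9, 3.1], field=0.5)
max_comm = max(np.abs(R[i] @ R[j] - R[j] @ R[i]).max()
               for i in range(len(R)) for j in range(i + 1, len(R)))
print(max_comm)  # at the level of floating-point round-off
```

The exchange terms cancel pairwise in $\sum_i R_i$, which therefore reduces to the total Zeeman term $B\sum_i S_i^z$; the test below checks this as well. The dense construction scales as $4^N$ in memory, which is exactly why variational representations are needed for large baths.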
Submitted 22 September, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Neural-Shadow Quantum State Tomography
Authors:
Victor Wei,
W. A. Coish,
Pooya Ronagh,
Christine A. Muschik
Abstract:
Quantum state tomography (QST) is the art of reconstructing an unknown quantum state through measurements. It is a key primitive for developing quantum technologies. Neural network quantum state tomography (NNQST), which aims to reconstruct the quantum state via a neural network ansatz, is often implemented via a basis-dependent cross-entropy loss function. State-of-the-art implementations of NNQST are often restricted to characterizing a particular subclass of states, to avoid an exponential growth in the number of required measurement settings. To provide a more broadly applicable method for efficient state reconstruction, we present "neural-shadow quantum state tomography" (NSQST), an alternative neural network-based QST protocol that uses infidelity as the loss function. The infidelity is estimated using the classical shadows of the target state. Infidelity is a natural choice for training loss, benefiting from the proven measurement sample efficiency of the classical shadow formalism. Furthermore, NSQST is robust against various types of noise without any error mitigation. We numerically demonstrate the advantage of NSQST over NNQST at learning the relative phases of three target quantum states of practical interest, as well as the advantage over direct shadow estimation. NSQST greatly extends the practical reach of NNQST and provides a novel route to effective quantum state tomography.
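For a pure model state $|\psi\rangle$ and target state $\rho$, the infidelity is $1 - \langle\psi|\rho|\psi\rangle$, and classical shadows give an unbiased estimate of the fidelity term from randomized Pauli measurements. The sketch below shows this estimator for a single qubit with simulated measurements, using the single-qubit inverse channel $\hat\rho = 3U^\dagger|b\rangle\langle b|U - I$. It is our illustration of the classical-shadow fidelity estimate, not the NSQST training code: the neural-network ansatz and the gradient step are omitted, and the function and variable names are hypothetical.

```python
import numpy as np

# Rotations U that map the X, Y, Z eigenbases onto the computational basis.
HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S_DAG = np.array([[1, 0], [0, -1j]], dtype=complex)
ROTATIONS = [HADAMARD, HADAMARD @ S_DAG, np.eye(2, dtype=complex)]  # X, Y, Z
I2 = np.eye(2, dtype=complex)


def shadow_fidelity(rho, psi, n_snapshots, rng):
    """Estimate <psi|rho|psi> for one qubit from random-Pauli classical shadows of rho."""
    total = 0.0
    for _ in range(n_snapshots):
        u = ROTATIONS[rng.integers(3)]
        # Simulate measuring rho in the chosen Pauli basis.
        p0 = float(np.real((u @ rho @ u.conj().T)[0, 0]))
        outcome = 0 if rng.random() < p0 else 1
        v = u.conj().T[:, outcome]  # U^dagger |b>
        # Inverse measurement channel: snapshot = 3 U^dagger|b><b|U - I.
        snapshot = 3 * np.outer(v, v.conj()) - I2
        total += float(np.real(psi.conj() @ snapshot @ psi))
    return total / n_snapshots


rng = np.random.default_rng(0)
target = np.array([[1, 0], [0, 0]], dtype=complex)          # rho = |0><0|
model_state = np.array([1, 1], dtype=complex) / np.sqrt(2)   # |+>
fidelity = shadow_fidelity(target, model_state, 10_000, rng)
infidelity_loss = 1.0 - fidelity  # exact value for this pair is 0.5
print(infidelity_loss)
```

Individual snapshots can give values outside $[0, 1]$ (here $-1$, $0.5$, or $2$); only their average is meaningful, which is why the sample count controls the loss estimate's noise.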
Submitted 15 June, 2024; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Simulating one-dimensional quantum chromodynamics on a quantum computer: Real-time evolutions of tetra- and pentaquarks
Authors:
Yasar Y. Atas,
Jan F. Haase,
Jinglei Zhang,
Victor Wei,
Sieglinde M. -L. Pfaendler,
Randy Lewis,
Christine A. Muschik
Abstract:
Quantum chromodynamics, the theory of quarks and gluons, has been known for decades, but it has yet to be fully understood. A recent example is the prediction and experimental discovery of tetraquarks, which opened a new research field. Crucially, numerous unsolved questions of the standard model can only be addressed by nonperturbative calculations. Quantum computers can solve problems for which well-established QCD methods are inapplicable, such as real-time evolution. We take a key step in exploring this possibility by performing a real-time evolution of tetraquark and pentaquark physics in one-dimensional SU(3) gauge theory on a superconducting quantum computer. Our experiment represents a first quantum computation involving quarks with three colour degrees of freedom, i.e. with the gauge group of QCD.
Submitted 13 February, 2023; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Symmetric Norm Estimation and Regression on Sliding Windows
Authors:
Vladimir Braverman,
Viska Wei,
Samson Zhou
Abstract:
The sliding window model generalizes the standard streaming model and often performs better in applications where recent data is more important or more accurate than data that arrived prior to a certain time. We study the problem of approximating symmetric norms (a norm on $\mathbb{R}^n$ that is invariant under sign-flips and coordinate-wise permutations) in the sliding window model, where only the $W$ most recent updates define the underlying frequency vector. Whereas standard norm estimation algorithms for sliding windows rely on the smooth histogram framework of Braverman and Ostrovsky (FOCS 2007), analyzing the smoothness of general symmetric norms seems to be a challenging obstacle. Instead, we observe that the symmetric norm streaming algorithm of Braverman et al. (STOC 2017) can be reduced to identifying and approximating the frequency of heavy-hitters in a number of substreams. We introduce a heavy-hitter algorithm that gives a $(1+ε)$-approximation to each of the reported frequencies in the sliding window model, thus obtaining the first algorithm for general symmetric norm estimation in the sliding window model. Our algorithm is a universal sketch that simultaneously approximates all symmetric norms in a parametrizable class and also improves upon the smooth histogram framework for estimating $L_p$ norms, for a range of large $p$. Finally, we consider the overconstrained linear regression problem in the case where the loss function is an Orlicz norm, a symmetric norm that can be interpreted as a scale-invariant version of $M$-estimators. We give the first sublinear space algorithms that produce $(1+ε)$-approximate solutions to the linear regression problem for loss functions that are Orlicz norms in both the streaming and sliding window models.
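To make the two definitions concrete: the frequency vector is determined only by the $W$ most recent updates, and a symmetric norm is unchanged when coordinates are permuted or their signs flipped. The sketch below is a naive exact baseline that stores the entire window ($O(W)$ space), not the paper's sublinear heavy-hitter algorithm; the class and function names are hypothetical. It uses the $L_p$ norm and the top-$k$ norm (sum of the $k$ largest magnitudes), both standard examples of symmetric norms.

```python
from collections import Counter, deque


class ExactSlidingWindow:
    """Frequency vector defined by the W most recent updates (exact, O(W) space)."""

    def __init__(self, window):
        self.window = window
        self.updates = deque()
        self.freq = Counter()

    def update(self, coord, delta=1):
        self.updates.append((coord, delta))
        self.freq[coord] += delta
        if len(self.updates) > self.window:
            # The oldest update expires and no longer contributes.
            old_coord, old_delta = self.updates.popleft()
            self.freq[old_coord] -= old_delta
            if self.freq[old_coord] == 0:
                del self.freq[old_coord]

    def vector(self, n):
        return [self.freq.get(i, 0) for i in range(n)]


def lp_norm(x, p):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def top_k_norm(x, k):
    """Sum of the k largest magnitudes: a symmetric norm for 1 <= k <= n."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])


sw = ExactSlidingWindow(window=5)
for coord in [0, 1, 1, 2, 3, 3, 3, 1]:
    sw.update(coord)
x = sw.vector(4)            # last 5 updates: 2, 3, 3, 3, 1 -> [0, 1, 1, 3]
flipped = [-3, 0, 1, -1]    # x permuted with some signs flipped
print(x, top_k_norm(x, 2), lp_norm(x, 2))
```

Both norms depend only on the multiset of magnitudes, so `flipped` gives the same values as `x`; a sublinear algorithm has to recover enough of that magnitude profile (e.g. the heavy hitters) without storing the window.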
Submitted 3 September, 2021;
originally announced September 2021.
-
Sketch and Scale: Geo-distributed tSNE and UMAP
Authors:
Viska Wei,
Nikita Ivkin,
Vladimir Braverman,
Alexander Szalay
Abstract:
Running machine learning analytics over geographically distributed datasets is a rapidly emerging problem, driven by data management policies that ensure privacy and data security. Visualizing high-dimensional data using tools such as t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP) has become common practice for data scientists. Both tools scale poorly in time and memory. While recent optimizations have successfully handled 10,000 data points, scaling beyond a million points is still challenging. We introduce a novel framework: Sketch and Scale (SnS). It leverages a Count Sketch data structure to compress the data on the edge nodes, aggregates the reduced-size sketches on the master node, and runs vanilla tSNE or UMAP on a summary, extracted from the aggregated sketch, that represents the densest areas. We show this technique to be fully parallel and to scale linearly in time and logarithmically in memory and communication, making it possible to analyze datasets with many millions, potentially billions, of data points spread across several data centers around the globe. We demonstrate the power of our method on two mid-size datasets: cancer data with 52 million 35-band pixels from multiple images of tumor biopsies, and astrophysics data of 100 million stars with multi-color photometry from the Sloan Digital Sky Survey (SDSS).
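The compress-on-edge, aggregate-on-master step relies on a property of Count Sketch: sketches built with the same hash functions can be merged by adding their counter tables, and the merged sketch equals the sketch of the combined data. The sketch below illustrates that property under our own assumptions: the grid quantization `to_cell`, the blake2b-based hashing, and all sizes are hypothetical choices, and the heavy-cell extraction and the tSNE/UMAP steps of SnS are omitted.

```python
import hashlib
import statistics


class CountSketch:
    """Count Sketch with `depth` rows of `width` signed counters.

    Sketches built with the same (depth, width, seed) share hash functions,
    so they can be merged by adding their counter tables.
    """

    def __init__(self, depth, width, seed=0):
        self.depth, self.width, self.seed = depth, width, seed
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, key, row):
        # Deterministic per-row hash, identical on every node for the same seed.
        salt = f"{self.seed}:{row}".encode()
        digest = hashlib.blake2b(repr(key).encode(), key=salt, digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        return h % self.width, 1 if (h >> 63) & 1 else -1

    def add(self, key, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            self.table[row][bucket] += sign * count

    def estimate(self, key):
        # Median over rows of the signed counters limits the effect of collisions.
        values = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            values.append(sign * self.table[row][bucket])
        return statistics.median(values)

    def merge(self, other):
        assert (self.depth, self.width, self.seed) == (other.depth, other.width, other.seed)
        merged = CountSketch(self.depth, self.width, self.seed)
        merged.table = [[a + b for a, b in zip(r1, r2)]
                        for r1, r2 in zip(self.table, other.table)]
        return merged


def to_cell(point, cell_size=1.0):
    """Hypothetical quantization of a feature vector into a grid-cell key."""
    return tuple(int(v // cell_size) for v in point)


# Two hypothetical edge nodes; cell (0, 0) is dense across both of them.
node_a, node_b = CountSketch(5, 256), CountSketch(5, 256)
for _ in range(600):
    node_a.add(to_cell((0.2, 0.4)))
for _ in range(400):
    node_b.add(to_cell((0.9, 0.1)))
for k in range(20):  # sparse background cells
    node_a.add(to_cell((k + 5.5, 3.0)))
    node_b.add(to_cell((-k - 5.5, 7.0)))

master = node_a.merge(node_b)
print(master.estimate((0, 0)))  # close to the true combined count of 1000
```

Only the fixed-size tables travel to the master node, so communication does not grow with the number of raw points; the price is that counts are approximate, with error bounded here by the total weight of the background cells.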
Submitted 11 November, 2020;
originally announced November 2020.
-
Bounding the Charm Yukawa
Authors:
Nina M. Coyle,
Carlos E. M. Wagner,
Viska Wei
Abstract:
The study of the properties of the observed Higgs boson is one of the main research activities in High Energy Physics. Although the couplings of the Higgs to the weak gauge bosons and to third-generation quarks and leptons have been studied in detail, little is known about the Higgs couplings to first- and second-generation fermions. In this article, we study the charm quark Higgs coupling in the so-called $κ$ framework. We emphasize the existence of specific correlations between the Higgs couplings that can keep the measured LHC Higgs production rates close to their SM values even in the presence of large deviations of the charm coupling from its SM value, $κ_c = 1$. Based on this knowledge, we update the indirect bounds on $κ_c$ through a fit to the precision Higgs measurements at the LHC. We also examine the limits on $κ_c$ arising from the radiative decay $H \to J/ψ+ γ$, charm quark-associated Higgs production, Higgs decays to charm quarks, and the charge asymmetry in $W^{\pm} + H$ production. Estimates of the future LHC sensitivity to $κ_c$ in the high-luminosity run are provided.
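The correlations mentioned above can be seen from the generic narrow-width $κ$-framework expression for signal strengths, assuming no decays beyond the SM and treating loop-induced couplings as independent parameters (this is the standard framework relation, not necessarily the exact parametrization used in the paper). For channels that do not involve charm, $κ_c$ enters only through the total-width factor $κ_H^2$, so a large $κ_c$ can be offset by a common rescaling $λ$ of the other couplings:

```latex
\mu_{i\to f} \equiv \frac{\sigma_i\,\mathrm{BR}_f}{\sigma_i^{\mathrm{SM}}\,\mathrm{BR}_f^{\mathrm{SM}}}
  = \frac{\kappa_i^2\,\kappa_f^2}{\kappa_H^2},
\qquad
\kappa_H^2 = \sum_j \kappa_j^2\,\mathrm{BR}_j^{\mathrm{SM}} .

% Set \kappa_j = \lambda for every j \neq c and write B_c \equiv \mathrm{BR}_{c\bar c}^{\mathrm{SM}}:
\kappa_H^2 = \lambda^2 (1 - B_c) + \kappa_c^2 B_c ,
\qquad
\mu_{i\to f} = \frac{\lambda^4}{\lambda^2 (1 - B_c) + \kappa_c^2 B_c}
\quad (i, f \neq c).

% Requiring \mu_{i\to f} = 1 gives a quadratic in \lambda^2 with positive root
\lambda^2 = \frac{(1 - B_c) + \sqrt{(1 - B_c)^2 + 4\,\kappa_c^2 B_c}}{2} .
```

At $κ_c = 1$ the square root equals $1 + B_c$ and the solution reduces to $λ = 1$, as it must. For any other $κ_c$ there is a $λ > 0$ that leaves all non-charm rates at their SM values, which is why direct probes of the charm coupling, such as those listed above, help break the degeneracy.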
Submitted 28 October, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.