-
Sirius: Contextual Sparsity with Correction for Efficient LLMs
Authors:
Yang Zhou,
Zhuoming Chen,
Zhaozhuo Xu,
Victoria Lin,
Beidi Chen
Abstract:
With the blossom of large language models (LLMs), inference efficiency becomes increasingly important. Various approximation methods are proposed to reduce the cost at inference time. Contextual Sparsity (CS) is appealing for its training-free nature and its ability to reach a higher compression ratio seemingly without quality degradation. However, after a comprehensive evaluation of contextual sp…
▽ More
With the blossom of large language models (LLMs), inference efficiency becomes increasingly important. Various approximation methods are proposed to reduce the cost at inference time. Contextual Sparsity (CS) is appealing for its training-free nature and its ability to reach a higher compression ratio seemingly without quality degradation. However, after a comprehensive evaluation of contextual sparsity methods on various complex generation tasks, we find that although CS succeeds in prompt-understanding tasks, CS significantly degrades the model performance for reasoning, deduction, and knowledge-based tasks. Despite the gap in end-to-end accuracy, we observed that sparse models often share general problem-solving logic and require only a few token corrections to recover the original model performance. This paper introduces Sirius, an efficient correction mechanism, which significantly recovers CS models quality on reasoning tasks while maintaining its efficiency gain. Sirius is evaluated on 6 models with 8 difficult generation tasks in reasoning, math, and coding and shows consistent effectiveness and efficiency. Also, we carefully develop a system implementation for Sirius and show that Sirius achieves roughly 20% reduction in latency for 8B model on-chip and 35% reduction for 70B model offloading. We open-source our implementation of Sirius at https://github.com/Infini-AI-Lab/Sirius.git.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Authors:
Xi Victoria Lin,
Akshat Shrivastava,
Liang Luo,
Srinivasan Iyer,
Mike Lewis,
Gargi Ghosh,
Luke Zettlemoyer,
Armen Aghajanyan
Abstract:
We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adap…
▽ More
We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adaptivity. Our empirical results reveal substantial pre-training efficiency gains through this modality-specific parameter allocation. Under a 1-trillion-token training budget, the MoMa 1.4B model, featuring 4 text experts and 4 image experts, achieves impressive FLOPs savings: 3.7x overall, with 2.6x for text and 5.2x for image processing compared to a compute-equivalent dense baseline, measured by pre-training loss. This outperforms the standard expert-choice MoE with 8 mixed-modal experts, which achieves 3x overall FLOPs savings (3x for text, 2.8x for image). Combining MoMa with mixture-of-depths (MoD) further improves pre-training FLOPs savings to 4.2x overall (text: 3.4x, image: 5.3x), although this combination hurts performance in causal inference due to increased sensitivity to router accuracy. These results demonstrate MoMa's potential to significantly advance the efficiency of mixed-modal, early-fusion language model pre-training, paving the way for more resource-efficient and capable multimodal AI systems.
△ Less
Submitted 12 August, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Authors:
Minghan Li,
Xilun Chen,
Ari Holtzman,
Beidi Chen,
Jimmy Lin,
Wen-tau Yih,
Xi Victoria Lin
Abstract:
Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent texts. In this paper, w…
▽ More
Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent texts. In this paper, we introduce Nearest Neighbor Speculative Decoding (NEST), a novel semi-parametric language modeling approach that is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources. NEST performs token-level retrieval at each inference step to compute a semi-parametric mixture distribution and identify promising span continuations in a corpus. It then uses an approximate speculative decoding procedure that accepts a prefix of the retrieved span or generates a new token. NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks, surpassing the conventional kNN-LM method and performing competitively with in-context retrieval augmentation. In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
△ Less
Submitted 30 May, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Authors:
Sainbayar Sukhbaatar,
Olga Golovneva,
Vasu Sharma,
Hu Xu,
Xi Victoria Lin,
Baptiste Rozière,
Jacob Kahn,
Daniel Li,
Wen-tau Yih,
Jason Weston,
Xian Li
Abstract:
We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts…
▽ More
We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts are asynchronously trained, BTX brings together their feedforward parameters as experts in Mixture-of-Expert (MoE) layers and averages the remaining parameters, followed by an MoE-finetuning stage to learn token-level routing. BTX generalizes two special cases, the Branch-Train-Merge method, which does not have the MoE finetuning stage to learn routing, and sparse upcycling, which omits the stage of training experts asynchronously. Compared to alternative approaches, BTX achieves the best accuracy-efficiency tradeoff.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Optimizing Language Models for Human Preferences is a Causal Inference Problem
Authors:
Victoria Lin,
Eli Ben-Michael,
Louis-Philippe Morency
Abstract:
As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from direct outcome datasets, where each sample consists of a text and an associated numerical o…
▽ More
As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from direct outcome datasets, where each sample consists of a text and an associated numerical outcome measuring the reader's response. We first propose that language model optimization should be viewed as a causal problem to ensure that the model correctly learns the relationship between the text and the outcome. We formalize this causal language optimization problem, and we develop a method--causal preference optimization (CPO)--that solves an unbiased surrogate objective for the problem. We further extend CPO with doubly robust CPO (DR-CPO), which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias. Finally, we empirically demonstrate the effectiveness of (DR-)CPO in optimizing state-of-the-art LLMs for human preferences on direct outcome data, and we validate the robustness of DR-CPO under difficult confounding conditions.
△ Less
Submitted 5 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Instruction-tuned Language Models are Better Knowledge Learners
Authors:
Zhengbao Jiang,
Zhiqing Sun,
Weijia Shi,
Pedro Rodriguez,
Chunting Zhou,
Graham Neubig,
Xi Victoria Lin,
Wen-tau Yih,
Srinivasan Iyer
Abstract:
In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe s…
▽ More
In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe struggle to answer questions, even though the perplexity of documents is minimized. We found that QA pairs are generally straightforward, while documents are more complex, weaving many factual statements together in an intricate manner. Therefore, we hypothesize that it is beneficial to expose LLMs to QA pairs before continued pre-training on documents so that the process of encoding knowledge from complex documents takes into account how this knowledge is accessed through questions. Based on this, we propose pre-instruction-tuning (PIT), a method that instruction-tunes on questions prior to training on documents. This contrasts with standard instruction-tuning, which learns how to extract knowledge after training on documents. Extensive experiments and ablation studies demonstrate that pre-instruction-tuning significantly enhances the ability of LLMs to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%.
△ Less
Submitted 25 May, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Mosaic number and Tile number of Corner Connection Tiles
Authors:
Vincent Lin
Abstract:
Lomonaco and Kauffman introduced knot mosaics in 2008 to model physical quantum states. These mosaics use a set of tiles to represent knots on $n x n$ grids. In 2023 Heap introduced a new set of tiles that can represent knots on a smaller board for small knots. Completing an exhaustive search of all knots or links, $K$, on different board sizes and types is the most common way to determine invaria…
▽ More
Lomonaco and Kauffman introduced knot mosaics in 2008 to model physical quantum states. These mosaics use a set of tiles to represent knots on $n x n$ grids. In 2023 Heap introduced a new set of tiles that can represent knots on a smaller board for small knots. Completing an exhaustive search of all knots or links, $K$, on different board sizes and types is the most common way to determine invariants for knots, such as the smallest board size needed to represent a knot, $m(K)$, and the least number of tiles needed to represent a knot, $t(K)$. In this paper, we propose a solution to an open question by providing a proof that all knots or links can be represented on corner connection mosaics using fewer tiles than traditional mosaics $t_c(K) < t(K)$, where $t_c(K)$ is the smallest number of corner connection tiles needed to represent knot \textit{K}. We also define bounds for corner connection mosaic size, $m_c(K)$, in terms of crossing number, $c(K)$, and simultaneously create a tool called the \textit{Corner Mosaic Complement} that we use to discover a relationship between traditional tiles and corner connection tiles. Finally, we construct an infinite family of links $L_n$ where the corner connection mosaic number $m_c(K)$ is known and provide a tool to analyze the efficiency of corner connection mosaic tiles.
△ Less
Submitted 27 November, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Text-Transport: Toward Learning Causal Effects of Natural Language
Authors:
Victoria Lin,
Louis-Philippe Morency,
Eli Ben-Michael
Abstract:
As language technologies gain prominence in real-world settings, it is important to understand how changes to language affect reader perceptions. This can be formalized as the causal effect of varying a linguistic attribute (e.g., sentiment) on a reader's response to the text. In this paper, we introduce Text-Transport, a method for estimation of causal effects from natural language under any text…
▽ More
As language technologies gain prominence in real-world settings, it is important to understand how changes to language affect reader perceptions. This can be formalized as the causal effect of varying a linguistic attribute (e.g., sentiment) on a reader's response to the text. In this paper, we introduce Text-Transport, a method for estimation of causal effects from natural language under any text distribution. Current approaches for valid causal effect estimation require strong assumptions about the data, meaning the data from which one can estimate valid causal effects often is not representative of the actual target domain of interest. To address this issue, we leverage the notion of distribution shift to describe an estimator that transports causal effects between domains, bypassing the need for strong assumptions in the target domain. We derive statistical guarantees on the uncertainty of this estimator, and we report empirical results and analyses that support the validity of Text-Transport across data settings. Finally, we use Text-Transport to study a realistic setting--hate speech on social media--in which causal effects do shift significantly between text domains, demonstrating the necessity of transport when conducting causal inference on natural language.
△ Less
Submitted 31 October, 2023;
originally announced October 2023.
-
In-context Pretraining: Language Modeling Beyond Document Boundaries
Authors:
Weijia Shi,
Sewon Min,
Maria Lomeli,
Chunting Zhou,
Margaret Li,
Gergely Szilvasy,
Rich James,
Xi Victoria Lin,
Noah A. Smith,
Luke Zettlemoyer,
Scott Yih,
Mike Lewis
Abstract:
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next d…
▽ More
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs'performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
△ Less
Submitted 24 June, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Authors:
Xi Victoria Lin,
Xilun Chen,
Mingda Chen,
Weijia Shi,
Maria Lomeli,
Rich James,
Pedro Rodriguez,
Jacob Kahn,
Gergely Szilvasy,
Mike Lewis,
Luke Zettlemoyer,
Scott Yih
Abstract:
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction…
▽ More
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option by retrofitting any LLM with retrieval capabilities. Our approach operates in two distinct fine-tuning steps: (1) one updates a pre-trained LM to better use retrieved information, while (2) the other updates the retriever to return more relevant results, as preferred by the LM. By fine-tuning over tasks that require both knowledge utilization and contextual awareness, we demonstrate that each stage yields significant performance improvements, and using both leads to additional gains. Our best model, RA-DIT 65B, achieves state-of-the-art performance across a range of knowledge-intensive zero- and few-shot learning benchmarks, significantly outperforming existing in-context RALM approaches by up to +8.9% in 0-shot setting and +1.4% in 5-shot setting on average.
△ Less
Submitted 6 May, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Distributionally Robust Statistical Verification with Imprecise Neural Networks
Authors:
Souradeep Dutta,
Michele Caprio,
Vivian Lin,
Matthew Cleaveland,
Kuk Jin Jang,
Ivan Ruchkin,
Oleg Sokolsky,
Insup Lee
Abstract:
A particularly challenging problem in AI safety is providing guarantees on the behavior of high-dimensional autonomous systems. Verification approaches centered around reachability analysis fail to scale, and purely statistical approaches are constrained by the distributional assumptions about the sampling process. Instead, we pose a distributionally robust version of the statistical verification…
▽ More
A particularly challenging problem in AI safety is providing guarantees on the behavior of high-dimensional autonomous systems. Verification approaches centered around reachability analysis fail to scale, and purely statistical approaches are constrained by the distributional assumptions about the sampling process. Instead, we pose a distributionally robust version of the statistical verification problem for black-box systems, where our performance guarantees hold over a large family of distributions. This paper proposes a novel approach based on a combination of active learning, uncertainty quantification, and neural network verification. A central piece of our approach is an ensemble technique called Imprecise Neural Networks, which provides the uncertainty to guide active learning. The active learning uses an exhaustive neural-network verification tool Sherlock to collect samples. An evaluation on multiple physical simulators in the openAI gym Mujoco environments with reinforcement-learned controllers demonstrates that our approach can provide useful and scalable guarantees for high-dimensional systems.
△ Less
Submitted 11 December, 2023; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Reimagining Retrieval Augmented Language Models for Answering Queries
Authors:
Wang-Chiew Tan,
Yuliang Li,
Pedro Rodriguez,
Richard James,
Xi Victoria Lin,
Alon Halevy,
Scott Yih
Abstract:
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-pa…
▽ More
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-parametric architectures can be enhanced with views, a query analyzer/planner, and provenance to make a significantly more powerful system for question answering in terms of accuracy and efficiency, and potentially for other NLP tasks
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations
Authors:
Victoria Lin,
Louis-Philippe Morency
Abstract:
Although deep language representations have become the dominant form of language featurization in recent years, in many settings it is important to understand a model's decision-making process. This necessitates not only an interpretable model but also interpretable features. In particular, language must be featurized in a way that is interpretable while still characterizing the original text well…
▽ More
Although deep language representations have become the dominant form of language featurization in recent years, in many settings it is important to understand a model's decision-making process. This necessitates not only an interpretable model but also interpretable features. In particular, language must be featurized in a way that is interpretable while still characterizing the original text well. We present SenteCon, a method for introducing human interpretability in deep language representations. Given a passage of text, SenteCon encodes the text as a layer of interpretable categories in which each dimension corresponds to the relevance of a specific category. Our empirical evaluations indicate that encoding language with SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks. Moreover, we find that SenteCon outperforms existing interpretable language representations with respect to both its downstream performance and its agreement with human characterizations of the text.
△ Less
Submitted 1 June, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Counterfactual Augmentation for Multimodal Learning Under Presentation Bias
Authors:
Victoria Lin,
Louis-Philippe Morency,
Dimitrios Dimitriadis,
Srinagesh Sharma
Abstract:
In real-world machine learning systems, labels are often derived from user behaviors that the system wishes to encourage. Over time, new models must be trained as new training examples and features become available. However, feedback loops between users and models can bias future user behavior, inducing a presentation bias in the labels that compromises the ability to train new models. In this pap…
▽ More
In real-world machine learning systems, labels are often derived from user behaviors that the system wishes to encourage. Over time, new models must be trained as new training examples and features become available. However, feedback loops between users and models can bias future user behavior, inducing a presentation bias in the labels that compromises the ability to train new models. In this paper, we propose counterfactual augmentation, a novel causal method for correcting presentation bias using generated counterfactual labels. Our empirical evaluations demonstrate that counterfactual augmentation yields better downstream performance compared to both uncorrected models and existing bias-correction methods. Model analyses further indicate that the generated counterfactuals align closely with true counterfactuals in an oracle setting.
△ Less
Submitted 30 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
Authors:
Zeyu Leo Liu,
Tim Dettmers,
Xi Victoria Lin,
Veselin Stoyanov,
Xian Li
Abstract:
Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE) have proven effective in scaling up Transformers model size for \textit{pretraining} large language models. By only activating part of the FFN parameters conditioning on input, S-FFN improves generalization performance while keeping training and inference costs (in FLOPs) fixed. In this work, we analyzed two major design…
▽ More
Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE) have proven effective in scaling up Transformers model size for \textit{pretraining} large language models. By only activating part of the FFN parameters conditioning on input, S-FFN improves generalization performance while keeping training and inference costs (in FLOPs) fixed. In this work, we analyzed two major design choices of S-FFN: the memory block (a.k.a. expert) size and the memory block selection method under a general conceptual framework of sparse neural memory. Using this unified framework, we compare several S-FFN architectures for language modeling and provide insights into their relative efficacy and efficiency. We found a simpler selection method -- \textbf{\texttt{Avg-K}} that selects blocks through their mean aggregated hidden states, achieving lower perplexity in language model pretraining compared to existing MoE architectures including Switch Transformer (Fedus et al., 2021) and HashLayer (Roller et al., 2021).
△ Less
Submitted 23 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Vision-based Estimation of Fatigue and Engagement in Cognitive Training Sessions
Authors:
Yanchen Wang,
Adam Turnbull,
Yunlong Xu,
Kathi Heffner,
Feng Vankee Lin,
Ehsan Adeli
Abstract:
Computerized cognitive training (CCT) is a scalable, well-tolerated intervention that has promise for slowing cognitive decline. Outcomes from CCT are limited by a lack of effective engagement, which is decreased by factors such as mental fatigue, particularly in older adults at risk for dementia. There is a need for scalable, automated measures that can monitor mental fatigue during CCT. Here, we…
▽ More
Computerized cognitive training (CCT) is a scalable, well-tolerated intervention that has promise for slowing cognitive decline. Outcomes from CCT are limited by a lack of effective engagement, which is decreased by factors such as mental fatigue, particularly in older adults at risk for dementia. There is a need for scalable, automated measures that can monitor mental fatigue during CCT. Here, we develop and validate a novel Recurrent Video Transformer (RVT) method for monitoring real-time mental fatigue in older adults with mild cognitive impairment from video-recorded facial gestures during CCT. The RVT model achieved the highest balanced accuracy(78%) and precision (0.82) compared to the prior state-of-the-art models for binary and multi-class classification of mental fatigue and was additionally validated via significant association (p=0.023) with CCT reaction time. By leveraging dynamic temporal information, the RVT model demonstrates the potential to accurately measure real-time mental fatigue, laying the foundation for future personalized CCT that increase effective engagement.
△ Less
Submitted 15 November, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
DC4L: Distribution Shift Recovery via Data-Driven Control for Deep Learning Models
Authors:
Vivian Lin,
Kuk Jin Jang,
Souradeep Dutta,
Michele Caprio,
Oleg Sokolsky,
Insup Lee
Abstract:
Deep neural networks have repeatedly been shown to be non-robust to the uncertainties of the real world, even to naturally occurring ones. A vast majority of current approaches have focused on data-augmentation methods to expand the range of perturbations that the classifier is exposed to while training. A relatively unexplored avenue that is equally promising involves sanitizing an image as a pre…
▽ More
Deep neural networks have repeatedly been shown to be non-robust to the uncertainties of the real world, even to naturally occurring ones. A vast majority of current approaches have focused on data-augmentation methods to expand the range of perturbations that the classifier is exposed to while training. A relatively unexplored avenue that is equally promising involves sanitizing an image as a preprocessing step, depending on the nature of perturbation. In this paper, we propose to use control for learned models to recover from distribution shifts online. Specifically, our method applies a sequence of semantic-preserving transformations to bring the shifted data closer in distribution to the training set, as measured by the Wasserstein distance. Our approach is to 1) formulate the problem of distribution shift recovery as a Markov decision process, which we solve using reinforcement learning, 2) identify a minimum condition on the data for our method to be applied, which we check online using a binary classifier, and 3) employ dimensionality reduction through orthonormal projection to aid in our estimates of the Wasserstein distance. We provide theoretical evidence that orthonormal projection preserves characteristics of the data at the distributional level. We apply our distribution shift recovery approach to the ImageNet-C benchmark for distribution shifts, demonstrating an improvement in average accuracy of up to 14.21% across a variety of state-of-the-art ImageNet classifiers. We further show that our method generalizes to composites of shifts from the ImageNet-C benchmark, achieving improvements in average accuracy of up to 9.81%. Finally, we test our method on CIFAR-100-C and report improvements of up to 8.25%.
△ Less
Submitted 15 May, 2024; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Credal Bayesian Deep Learning
Authors:
Michele Caprio,
Souradeep Dutta,
Kuk Jin Jang,
Vivian Lin,
Radoslav Ivanov,
Oleg Sokolsky,
Insup Lee
Abstract:
Uncertainty quantification and robustness to distribution shifts are important goals in machine learning and artificial intelligence. Although Bayesian Neural Networks (BNNs) allow for uncertainty in the predictions to be assessed, different sources of uncertainty are indistinguishable. We present Credal Bayesian Deep Learning (CBDL). Heuristically, CBDL allows to train an (uncountably) infinite e…
▽ More
Uncertainty quantification and robustness to distribution shifts are important goals in machine learning and artificial intelligence. Although Bayesian Neural Networks (BNNs) allow for uncertainty in the predictions to be assessed, different sources of uncertainty are indistinguishable. We present Credal Bayesian Deep Learning (CBDL). Heuristically, CBDL allows to train an (uncountably) infinite ensemble of BNNs, using only finitely many elements. This is possible thanks to prior and likelihood finitely generated credal sets (FGCSs), a concept from the imprecise probability literature. Intuitively, convex combinations of a finite collection of prior-likelihood pairs are able to represent infinitely many such pairs. After training, CBDL outputs a set of posteriors on the parameters of the neural network. At inference time, such posterior set is used to derive a set of predictive distributions that is in turn utilized to distinguish between aleatoric and epistemic uncertainties, and to quantify them. The predictive set also produces either (i) a collection of outputs enjoying desirable probabilistic guarantees, or (ii) the single output that is deemed the best, that is, the one having the highest predictive lower probability -- another imprecise-probabilistic concept. CBDL is more robust than single BNNs to prior and likelihood misspecification, and to distribution shift. We show that CBDL is better at quantifying and disentangling different types of uncertainties than single BNNs, ensemble of BNNs, and Bayesian Model Averaging. In addition, we apply CBDL to two case studies to demonstrate its downstream tasks capabilities: one, for motion prediction in autonomous driving scenarios, and two, to model blood glucose and insulin dynamics for artificial pancreas control. We show that CBDL performs better when compared to an ensemble of BNNs baseline.
△ Less
Submitted 22 February, 2024; v1 submitted 19 February, 2023;
originally announced February 2023.
-
LEVER: Learning to Verify Language-to-Code Generation with Execution
Authors:
Ansong Ni,
Srini Iyer,
Dragomir Radev,
Ves Stoyanov,
Wen-tau Yih,
Sida I. Wang,
Xi Victoria Lin
Abstract:
The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine LLM decoding with sample pruning and reranking using test cases or heuristics based on the execution results. However, it is challenging to obtain test cases for many real-world language-to-code applications, and heuristics…
▽ More
The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine LLM decoding with sample pruning and reranking using test cases or heuristics based on the execution results. However, it is challenging to obtain test cases for many real-world language-to-code applications, and heuristics cannot well capture the semantic features of the execution results, such as data type and value range, which often indicates the correctness of the program. In this work, we propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results. The sampled programs are reranked by combining the verification score with the LLM generation probability, and marginalizing over programs with the same execution results. On four datasets across the domains of table QA, math QA and basic Python programming, LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci-002) and achieves new state-of-the-art results on all of them.
△ Less
Submitted 1 September, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Authors:
Srinivasan Iyer,
Xi Victoria Lin,
Ramakanth Pasunuru,
Todor Mihaylov,
Daniel Simig,
Ping Yu,
Kurt Shuster,
Tianlu Wang,
Qing Liu,
Punit Singh Koura,
Xian Li,
Brian O'Horo,
Gabriel Pereyra,
Jeff Wang,
Christopher Dewan,
Asli Celikyilmaz,
Luke Zettlemoyer,
Ves Stoyanov
Abstract:
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diver…
▽ More
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks but is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
△ Less
Submitted 30 January, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Training Trajectories of Language Models Across Scales
Authors:
Mengzhou Xia,
Mikel Artetxe,
Chunting Zhou,
Xi Victoria Lin,
Ramakanth Pasunuru,
Danqi Chen,
Luke Zettlemoyer,
Ves Stoyanov
Abstract:
Scaling up language models has led to unprecedented performance gains, but little is understood about how the training dynamics change as models get larger. How do language models of different sizes learn during pre-training? Why do larger language models demonstrate more desirable behaviors? In this paper, we analyze the intermediate training checkpoints of differently sized OPT models (Zhang et…
▽ More
Scaling up language models has led to unprecedented performance gains, but little is understood about how the training dynamics change as models get larger. How do language models of different sizes learn during pre-training? Why do larger language models demonstrate more desirable behaviors? In this paper, we analyze the intermediate training checkpoints of differently sized OPT models (Zhang et al.,2022)--from 125M to 175B parameters--on next-token prediction, sequence-level generation, and downstream tasks. We find that 1) at a given perplexity and independent of model sizes, a similar subset of training tokens see the most significant reduction in loss, with the rest stagnating or showing double-descent behavior; 2) early in training, all models learn to reduce the perplexity of grammatical sequences that contain hallucinations, with small models halting at this suboptimal distribution and larger ones eventually learning to assign these sequences lower probabilities; 3) perplexity is a strong predictor of in-context learning performance on 74 multiple-choice tasks from BIG-Bench, and this holds independent of the model size. Together, these results show that perplexity is more predictive of model behaviors than model size or training computation.
△ Less
Submitted 29 May, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
SeedBERT: Recovering Annotator Rating Distributions from an Aggregated Label
Authors:
Aneesha Sampath,
Victoria Lin,
Louis-Philippe Morency
Abstract:
Many machine learning tasks -- particularly those in affective computing -- are inherently subjective. When asked to classify facial expressions or to rate an individual's attractiveness, humans may disagree with one another, and no single answer may be objectively correct. However, machine learning datasets commonly have just one "ground truth" label for each sample, so models trained on these la…
▽ More
Many machine learning tasks -- particularly those in affective computing -- are inherently subjective. When asked to classify facial expressions or to rate an individual's attractiveness, humans may disagree with one another, and no single answer may be objectively correct. However, machine learning datasets commonly have just one "ground truth" label for each sample, so models trained on these labels may not perform well on tasks that are subjective in nature. Though allowing models to learn from the individual annotators' ratings may help, most datasets do not provide annotator-specific labels for each sample. To address this issue, we propose SeedBERT, a method for recovering annotator rating distributions from a single label by inducing pre-trained models to attend to different portions of the input. Our human evaluations indicate that SeedBERT's attention mechanism is consistent with human sources of annotator disagreement. Moreover, in our empirical evaluations using large language models, SeedBERT demonstrates substantial gains in performance on downstream subjective tasks compared both to standard deep learning models and to other current models that account explicitly for annotator disagreement.
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
FOLIO: Natural Language Reasoning with First-Order Logic
Authors:
Simeng Han,
Hailey Schoelkopf,
Yilun Zhao,
Zhenting Qi,
Martin Riddell,
Wenfei Zhou,
James Coady,
David Peng,
Yujie Qiao,
Luke Benson,
Lucy Sun,
Alex Wardle-Solano,
Hannah Szabo,
Ekaterina Zubova,
Matthew Burtell,
Jonathan Fan,
Yixin Liu,
Brian Wong,
Malcolm Sailor,
Ansong Ni,
Linyong Nan,
Jungo Kasai,
Tao Yu,
Rui Zhang,
Alexander R. Fabbri
, et al. (10 additional authors not shown)
Abstract:
Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FO…
▽ More
Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason for the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models. For both NL reasoning and NL-FOL translation, we benchmark multiple state-of-the-art language models. Our results show that a subset of FOLIO presents a challenge for one of the most capable {Large Language Model (LLM)} publicly available, GPT-4.
△ Less
Submitted 17 May, 2024; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Authors:
Jonas Pfeiffer,
Naman Goyal,
Xi Victoria Lin,
Xian Li,
James Cross,
Sebastian Riedel,
Mikel Artetxe
Abstract:
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learn…
▽ More
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
On Continual Model Refinement in Out-of-Distribution Data Streams
Authors:
Bill Yuchen Lin,
Sida Wang,
Xi Victoria Lin,
Robin Jia,
Lin Xiao,
Xiang Ren,
Wen-tau Yih
Abstract:
Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting. However, existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario. In response to this, we propose a new CL problem formulation dubbed continual model refinement…
▽ More
Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting. However, existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario. In response to this, we propose a new CL problem formulation dubbed continual model refinement (CMR). Compared to prior CL settings, CMR is more practical and introduces unique challenges (boundary-agnostic and non-stationary distribution shift, diverse mixtures of multiple OOD data clusters, error-centric streams, etc.). We extend several existing CL approaches to the CMR setting and evaluate them extensively. For benchmarking and analysis, we propose a general sampling algorithm to obtain dynamic OOD data streams with controllable non-stationarity, as well as a suite of metrics measuring various aspects of online performance. Our experiments and detailed analysis reveal the promise and challenges of the CMR problem, supporting that studying CMR in dynamic OOD streams can benefit the longevity of deployed NLP models in production.
△ Less
Submitted 4 May, 2022;
originally announced May 2022.
-
OPT: Open Pre-trained Transformer Language Models
Authors:
Susan Zhang,
Stephen Roller,
Naman Goyal,
Mikel Artetxe,
Moya Chen,
Shuohui Chen,
Christopher Dewan,
Mona Diab,
Xian Li,
Xi Victoria Lin,
Todor Mihaylov,
Myle Ott,
Sam Shleifer,
Kurt Shuster,
Daniel Simig,
Punit Singh Koura,
Anjali Sridhar,
Tianlu Wang,
Luke Zettlemoyer
Abstract:
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open…
▽ More
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
△ Less
Submitted 21 June, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Pretty Princess vs. Successful Leader: Gender Roles in Greeting Card Messages
Authors:
Jiao Sun,
Tongshuang Wu,
Yue Jiang,
Ronil Awalegaonkar,
Xi Victoria Lin,
Diyi Yang
Abstract:
People write personalized greeting cards on various occasions. While prior work has studied gender roles in greeting card messages, systematic analysis at scale and tools for raising the awareness of gender stereotyping remain under-investigated. To this end, we collect a large greeting card message corpus covering three different occasions (birthday, Valentine's Day and wedding) from three source…
▽ More
People write personalized greeting cards on various occasions. While prior work has studied gender roles in greeting card messages, systematic analysis at scale and tools for raising the awareness of gender stereotyping remain under-investigated. To this end, we collect a large greeting card message corpus covering three different occasions (birthday, Valentine's Day and wedding) from three sources (exemplars from greeting message websites, real-life greetings from social media and language model generated ones). We uncover a wide range of gender stereotypes in this corpus via topic modeling, odds ratio and Word Embedding Association Test (WEAT). We further conduct a survey to understand people's perception of gender roles in messages from this corpus and if gender stereotyping is a concern. The results show that people want to be aware of gender roles in the messages, but remain unconcerned unless the perceived gender roles conflict with the recipient's true personality. In response, we developed GreetA, an interactive visualization and writing assistant tool to visualize fine-grained topics in greeting card messages drafted by the users and the associated gender perception scores, but without suggesting text changes as an intervention.
△ Less
Submitted 27 December, 2021;
originally announced December 2021.
-
Efficient Large Scale Language Modeling with Mixtures of Experts
Authors:
Mikel Artetxe,
Shruti Bhosale,
Naman Goyal,
Todor Mihaylov,
Myle Ott,
Sam Shleifer,
Xi Victoria Lin,
Jingfei Du,
Srinivasan Iyer,
Ramakanth Pasunuru,
Giri Anantharaman,
Xian Li,
Shuohui Chen,
Halil Akin,
Mandeep Baines,
Louis Martin,
Xing Zhou,
Punit Singh Koura,
Brian O'Horo,
Jeff Wang,
Luke Zettlemoyer,
Mona Diab,
Zornitsa Kozareva,
Ves Stoyanov
Abstract:
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we…
▽ More
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using $\sim$4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
△ Less
Submitted 26 October, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Few-shot Learning with Multilingual Language Models
Authors:
Xi Victoria Lin,
Todor Mihaylov,
Mikel Artetxe,
Tianlu Wang,
Shuohui Chen,
Daniel Simig,
Myle Ott,
Naman Goyal,
Shruti Bhosale,
Jingfei Du,
Ramakanth Pasunuru,
Sam Shleifer,
Punit Singh Koura,
Vishrav Chaudhary,
Brian O'Horo,
Jeff Wang,
Luke Zettlemoyer,
Zornitsa Kozareva,
Mona Diab,
Veselin Stoyanov,
Xian Li
Abstract:
Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study t…
▽ More
Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model with 7.5 billion parameters sets new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of 0-shot and 4-shot settings). On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 directions with 32 training examples, while surpassing the official supervised baseline in 45 directions. We conduct an in-depth analysis of different multilingual prompting approaches, showing in particular that strong few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples. Finally, we evaluate our models in social value tasks such as hate speech detection in five languages and find it has limitations similar to comparable sized GPT-3 models.
△ Less
Submitted 10 November, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Long and Short Periodic Billiard Trajectories in the Regular Pentagon
Authors:
Samuel Everett,
Vanessa Lin,
Aidan Mager
Abstract:
In any periodic direction on the regular pentagon billiard table, there exists two combinatorially different billiard paths, with one longer than the other. For each periodic direction, McMullen asked if one could determine whether the periodic trajectory through a given point is long, short, or a saddle connection. In this paper we present an algorithm resolving this question for trajectories ema…
▽ More
In any periodic direction on the regular pentagon billiard table, there exists two combinatorially different billiard paths, with one longer than the other. For each periodic direction, McMullen asked if one could determine whether the periodic trajectory through a given point is long, short, or a saddle connection. In this paper we present an algorithm resolving this question for trajectories emanating from the midpoints of the pentagon.
△ Less
Submitted 18 November, 2021;
originally announced November 2021.
-
Salt-based autopeering for DLT-networks
Authors:
Sebastian Müller,
Angelo Capossele,
Bartosz Kuśmierz,
Vivian Lin,
Hans Moog,
Andreas Penzkofer,
Olivia Saa,
William Sanders,
Wolfgang Welz
Abstract:
The security of any Distributed Ledger Technology (DLT) depends on the safety of the network layer. Much effort has been put into understanding the consensus layer of DLTs. However, many network layer designs seem ad-hoc and lack a careful analysis of the influence of the design decisions on the whole DLT system. We propose a salt-based automated neighbor selection protocol that shows the inherent…
▽ More
The security of any Distributed Ledger Technology (DLT) depends on the safety of the network layer. Much effort has been put into understanding the consensus layer of DLTs. However, many network layer designs seem ad-hoc and lack a careful analysis of the influence of the design decisions on the whole DLT system. We propose a salt-based automated neighbor selection protocol that shows the inherent tradeoffs of certain design decisions and allows a quantitative treatment of some network topology requirements. This example may serve as a design framework and facilitate future research. We provide a selection of results from simulations to highlight some tradeoffs in the design decisions.
△ Less
Submitted 3 November, 2021;
originally announced November 2021.
-
Screening the Coulomb interaction leads to a prethermal regime in two-dimensional bad conductors
Authors:
L. J. Stanley,
Ping V. Lin,
J. Jaroszyński,
Dragana Popović
Abstract:
The absence of thermalization in certain isolated many-body systems is of great fundamental interest. Many-body localization (MBL) is a widely studied mechanism for thermalization to fail in strongly disordered quantum systems, but it is still not understood precisely how the range of interactions affects the dynamical behavior and the existence of MBL, especially in dimensions $D>1$. By investiga…
▽ More
The absence of thermalization in certain isolated many-body systems is of great fundamental interest. Many-body localization (MBL) is a widely studied mechanism for thermalization to fail in strongly disordered quantum systems, but it is still not understood precisely how the range of interactions affects the dynamical behavior and the existence of MBL, especially in dimensions $D>1$. By investigating nonequilibrium dynamics in strongly disordered $D=2$ electron systems with power-law interactions $\propto 1/r^α$ and poor coupling to a thermal bath, here we observe MBL-like, prethermal dynamics for $α=3$. In contrast, for $α=1$, the system thermalizes, although the dynamics is glassy. Our results provide important insights for theory, especially since we obtained them on systems that are much closer to the thermodynamic limit than synthetic quantum systems employed in previous studies of MBL. Thus, our work is a key step towards further studies of ergodicity breaking and quantum entanglement in real materials.
△ Less
Submitted 3 November, 2023; v1 submitted 21 October, 2021;
originally announced October 2021.
-
A Multilayer Network Model of the Coevolution of the Spread of a Disease and Competing Opinions
Authors:
Kaiyan Peng,
Zheng Lu,
Vanessa Lin,
Michael R. Lindstrom,
Christian Parkinson,
Chuntian Wang,
Andrea L. Bertozzi,
Mason A. Porter
Abstract:
During the COVID-19 pandemic, conflicting opinions on physical distancing swept across social media, affecting both human behavior and the spread of COVID-19. Inspired by such phenomena, we construct a two-layer multiplex network for the coupled spread of a disease and conflicting opinions. We model each process as a contagion. On one layer, we consider the concurrent evolution of two opinions --…
▽ More
During the COVID-19 pandemic, conflicting opinions on physical distancing swept across social media, affecting both human behavior and the spread of COVID-19. Inspired by such phenomena, we construct a two-layer multiplex network for the coupled spread of a disease and conflicting opinions. We model each process as a contagion. On one layer, we consider the concurrent evolution of two opinions -- pro-physical-distancing and anti-physical-distancing -- that compete with each other and have mutual immunity to each other. The disease evolves on the other layer, and individuals are less likely (respectively, more likely) to become infected when they adopt the pro-physical-distancing (respectively, anti-physical-distancing) opinion. We develop approximations of mean-field type by generalizing monolayer pair approximations to multilayer networks; these approximations agree well with Monte Carlo simulations for a broad range of parameters and several network structures. Through numerical simulations, we illustrate the influence of opinion dynamics on the spread of the disease from complex interactions both between the two conflicting opinions and between the opinions and the disease. We find that lengthening the duration that individuals hold an opinion may help suppress disease transmission, and we demonstrate that increasing the cross-layer correlations or intra-layer correlations of node degrees may lead to fewer individuals becoming infected with the disease.
△ Less
Submitted 4 July, 2021;
originally announced July 2021.
-
Stage-wise Fine-tuning for Graph-to-Text Generation
Authors:
Qingyun Wang,
Semih Yavuz,
Victoria Lin,
Heng Ji,
Nazneen Rajani
Abstract:
Graph-to-text generation has benefited from pre-trained language models (PLMs) in achieving better performance than structured graph encoders. However, they fail to fully utilize the structure information of the input graph. In this paper, we aim to further improve the performance of the pre-trained language model by proposing a structured graph-to-text model with a two-step fine-tuning mechanism…
▽ More
Graph-to-text generation has benefited from pre-trained language models (PLMs) in achieving better performance than structured graph encoders. However, they fail to fully utilize the structure information of the input graph. In this paper, we aim to further improve the performance of the pre-trained language model by proposing a structured graph-to-text model with a two-step fine-tuning mechanism which first fine-tunes the model on Wikipedia before adapting to the graph-to-text generation. In addition to using the traditional token and position embeddings to encode the knowledge graph (KG), we propose a novel tree-level embedding method to capture the inter-dependency structures of the input graph. This new approach has significantly improved the performance of all text generation metrics for the English WebNLG 2017 dataset.
△ Less
Submitted 30 May, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Learning to Synthesize Data for Semantic Parsing
Authors:
Bailin Wang,
Wenpeng Yin,
Xi Victoria Lin,
Caiming Xiong
Abstract:
Synthesizing data for semantic parsing has gained increasing attention recently. However, most methods require handcrafted (high-precision) rules in their generative process, hindering the exploration of diverse unseen data. In this work, we propose a generative model which features a (non-neural) PCFG that models the composition of programs (e.g., SQL), and a BART-based translation model that map…
▽ More
Synthesizing data for semantic parsing has gained increasing attention recently. However, most methods require handcrafted (high-precision) rules in their generative process, hindering the exploration of diverse unseen data. In this work, we propose a generative model which features a (non-neural) PCFG that models the composition of programs (e.g., SQL), and a BART-based translation model that maps a program to an utterance. Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand. Moreover, explicitly modeling compositions using PCFG leads to a better exploration of unseen programs, thus generate more diverse data. We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider, respectively. Our empirical results show that the synthesized data generated from our model can substantially help a semantic parser achieve better compositional and domain generalization.
△ Less
Submitted 27 April, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
FeTaQA: Free-form Table Question Answering
Authors:
Linyong Nan,
Chiachun Hsieh,
Ziming Mao,
Xi Victoria Lin,
Neha Verma,
Rui Zhang,
Wojciech Kryściński,
Nick Schoelkopf,
Riley Kong,
Xiangru Tang,
Murori Mutuma,
Ben Rosand,
Isabel Trindade,
Renusree Bandaru,
Jacob Cunningham,
Caiming Xiong,
Dragomir Radev
Abstract:
Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and integration of information due to the constraint of the associated short-form answers. To address these issues and to demonstrate the full challenge of table question an…
▽ More
Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and integration of information due to the constraint of the associated short-form answers. To address these issues and to demonstrate the full challenge of table question answering, we introduce FeTaQA, a new dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} pairs. FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source. Unlike datasets of generative QA over text in which answers are prevalent with copies of short text spans from the source, answers in our dataset are human-generated explanations involving entities and their high-level relations. We provide two benchmark methods for the proposed task: a pipeline method based on semantic-parsing-based QA systems and an end-to-end method based on large pretrained text generation models, and show that FeTaQA poses a challenge for both methods.
△ Less
Submitted 1 April, 2021;
originally announced April 2021.
-
NeurIPS 2020 NLC2CMD Competition: Translating Natural Language to Bash Commands
Authors:
Mayank Agarwal,
Tathagata Chakraborti,
Quchen Fu,
David Gros,
Xi Victoria Lin,
Jaron Maene,
Kartik Talamadupula,
Zhongwei Teng,
Jules White
Abstract:
The NLC2CMD Competition hosted at NeurIPS 2020 aimed to bring the power of natural language processing to the command line. Participants were tasked with building models that can transform descriptions of command line tasks in English to their Bash syntax. This is a report on the competition with details of the task, metrics, data, attempted solutions, and lessons learned.
The NLC2CMD Competition hosted at NeurIPS 2020 aimed to bring the power of natural language processing to the command line. Participants were tasked with building models that can transform descriptions of command line tasks in English to their Bash syntax. This is a report on the competition with details of the task, metrics, data, attempted solutions, and lessons learned.
△ Less
Submitted 8 August, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing
Authors:
Xi Victoria Lin,
Richard Socher,
Caiming Xiong
Abstract:
We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. The hybrid sequence is encoded by BERT with minimal subsequent layers and the…
▽ More
We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. The hybrid sequence is encoded by BERT with minimal subsequent layers and the text-DB contextualization is realized via the fine-tuned deep attention in BERT. Combined with a pointer-generator decoder with schema-consistency driven search space pruning, BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks, Spider (71.1\% dev, 67.5\% test with ensemble model) and WikiSQL (92.6\% dev, 91.9\% test). Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks. Our implementation is available at \url{https://github.com/salesforce/TabularSemanticParsing}.
△ Less
Submitted 30 December, 2020; v1 submitted 23 December, 2020;
originally announced December 2020.
-
ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries
Authors:
Karthik Radhakrishnan,
Arvind Srikantan,
Xi Victoria Lin
Abstract:
Translating natural language utterances to executable queries is a helpful technique in making the vast amount of data stored in relational databases accessible to a wider range of non-tech-savvy end users. Prior work in this area has largely focused on textual input that is linguistically correct and semantically unambiguous. However, real-world user queries are often succinct, colloquial, and no…
▽ More
Translating natural language utterances to executable queries is a helpful technique in making the vast amount of data stored in relational databases accessible to a wider range of non-tech-savvy end users. Prior work in this area has largely focused on textual input that is linguistically correct and semantically unambiguous. However, real-world user queries are often succinct, colloquial, and noisy, resembling the input of a search engine. In this work, we introduce data augmentation techniques and a sampling-based content-aware BERT model (ColloQL) to achieve robust text-to-SQL modeling over natural language search (NLS) questions. Due to the lack of evaluation data, we curate a new dataset of NLS questions and demonstrate the efficacy of our approach. ColloQL's superior performance extends to well-formed text, achieving 84.9% (logical) and 90.7% (execution) accuracy on the WikiSQL dataset, making it, to the best of our knowledge, the highest performing model that does not use execution guided decoding.
△ Less
Submitted 19 October, 2020;
originally announced October 2020.
-
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
Authors:
Tao Yu,
Chien-Sheng Wu,
Xi Victoria Lin,
Bailin Wang,
Yi Chern Tan,
Xinyi Yang,
Dragomir Radev,
Richard Socher,
Caiming Xiong
Abstract:
We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel te…
▽ More
We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel text-schema linking objective that predicts the syntactic role of a table field in the SQL for each question-SQL pair. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) over several existing table-and-language datasets to regularize the pre-training process. On four popular fully supervised and weakly supervised table semantic parsing benchmarks, GraPPa significantly outperforms RoBERTa-large as the feature representation layers and establishes new state-of-the-art results on all of them.
△ Less
Submitted 28 May, 2021; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Toward Multimodal Modeling of Emotional Expressiveness
Authors:
Victoria Lin,
Jeffrey M. Girard,
Michael A. Sayette,
Louis-Philippe Morency
Abstract:
Emotional expressiveness captures the extent to which a person tends to outwardly display their emotions through behavior. Due to the close relationship between emotional expressiveness and behavioral health, as well as the crucial role that it plays in social interaction, the ability to automatically predict emotional expressiveness stands to spur advances in science, medicine, and industry. In t…
▽ More
Emotional expressiveness captures the extent to which a person tends to outwardly display their emotions through behavior. Due to the close relationship between emotional expressiveness and behavioral health, as well as the crucial role that it plays in social interaction, the ability to automatically predict emotional expressiveness stands to spur advances in science, medicine, and industry. In this paper, we explore three related research questions. First, how well can emotional expressiveness be predicted from visual, linguistic, and multimodal behavioral signals? Second, which behavioral modalities are uniquely important to the prediction of emotional expressiveness? Third, which behavioral signals are reliably related to emotional expressiveness? To answer these questions, we add highly reliable transcripts and human ratings of perceived emotional expressiveness to an existing video database and use this data to train, validate, and test predictive models. Our best model shows promising predictive performance on this dataset (RMSE=0.65, R^2=0.45, r=0.74). Multimodal models tend to perform best overall, and models trained on the linguistic modality tend to outperform models trained on the visual modality. Finally, examination of our interpretable models' coefficients reveals a number of visual and linguistic behavioral signals--such as facial action unit intensity, overall word count, and use of words related to social processes--that reliably predict emotional expressiveness.
△ Less
Submitted 31 August, 2020;
originally announced September 2020.
-
Photon: A Robust Cross-Domain Text-to-SQL System
Authors:
Jichuan Zeng,
Xi Victoria Lin,
Caiming Xiong,
Richard Socher,
Michael R. Lyu,
Irwin King,
Steven C. H. Hoi
Abstract:
Natural language interfaces to databases (NLIDB) democratize end user access to relational data. Due to fundamental differences between natural language communication and programming, it is common for end users to issue questions that are ambiguous to the system or fall outside the semantic scope of its underlying query language. We present Photon, a robust, modular, cross-domain NLIDB that can fl…
▽ More
Natural language interfaces to databases (NLIDB) democratize end user access to relational data. Due to fundamental differences between natural language communication and programming, it is common for end users to issue questions that are ambiguous to the system or fall outside the semantic scope of its underlying query language. We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a SQL mapping cannot be immediately determined. Photon consists of a strong neural semantic parser (63.2\% structure accuracy on the Spider dev benchmark), a human-in-the-loop question corrector, a SQL executor and a response generator. The question corrector is a discriminative neural sequence editor which detects confusion span(s) in the input question and suggests rephrasing until a translatable input is given by the user or a maximum number of iterations are conducted. Experiments on simulated data show that the proposed method effectively improves the robustness of text-to-SQL system against untranslatable user input. The live demo of our system is available at http://naturalsql.com.
△ Less
Submitted 3 August, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
-
DART: Open-Domain Structured Data Record to Text Generation
Authors:
Linyong Nan,
Dragomir Radev,
Rui Zhang,
Amrit Rau,
Abhinand Sivaprasad,
Chiachun Hsieh,
Xiangru Tang,
Aadit Vyas,
Neha Verma,
Pranav Krishna,
Yangxiaokang Liu,
Nadia Irwanto,
Jessica Pan,
Faiaz Rahman,
Ahmad Zaidi,
Mutethia Mutuma,
Yasin Tarabar,
Ankit Gupta,
Tao Yu,
Yi Chern Tan,
Xi Victoria Lin,
Caiming Xiong,
Richard Socher,
Nazneen Fatema Rajani
Abstract:
We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploi…
▽ More
We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks by utilizing techniques such as: tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimum post-editing. We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart.
△ Less
Submitted 12 April, 2021; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation
Authors:
Tianlu Wang,
Xi Victoria Lin,
Nazneen Fatema Rajani,
Bryan McCann,
Vicente Ordonez,
Caiming Xiong
Abstract:
Word embeddings derived from human-generated corpora inherit strong gender bias which can be further amplified by downstream models. Some commonly adopted debiasing approaches, including the seminal Hard Debias algorithm, apply post-processing procedures that project pre-trained word embeddings into a subspace orthogonal to an inferred gender subspace. We discover that semantic-agnostic corpus reg…
▽ More
Word embeddings derived from human-generated corpora inherit strong gender bias which can be further amplified by downstream models. Some commonly adopted debiasing approaches, including the seminal Hard Debias algorithm, apply post-processing procedures that project pre-trained word embeddings into a subspace orthogonal to an inferred gender subspace. We discover that semantic-agnostic corpus regularities such as word frequency captured by the word embeddings negatively impact the performance of these algorithms. We propose a simple but effective technique, Double Hard Debias, which purifies the word embeddings against such corpus regularities prior to inferring and removing the gender subspace. Experiments on three bias mitigation benchmarks show that our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
△ Less
Submitted 2 May, 2020;
originally announced May 2020.
-
Forecasting Sequential Data using Consistent Koopman Autoencoders
Authors:
Omri Azencot,
N. Benjamin Erichson,
Vanessa Lin,
Michael W. Mahoney
Abstract:
Recurrent neural networks are widely used on time series data, yet such models often ignore the underlying physical structures in such sequences. A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems. In this work, we propose a novel Consistent Koopman Autoencoder model which, unlike the majority of ex…
▽ More
Recurrent neural networks are widely used on time series data, yet such models often ignore the underlying physical structures in such sequences. A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems. In this work, we propose a novel Consistent Koopman Autoencoder model which, unlike the majority of existing work, leverages the forward and backward dynamics. Key to our approach is a new analysis which explores the interplay between consistent dynamics and their associated Koopman operators. Our network is directly related to the derived analysis, and its computational requirements are comparable to other baselines. We evaluate our method on a wide range of high-dimensional and short-term dependent problems, and it achieves accurate estimates for significant prediction horizons, while also being robust to noise.
△ Less
Submitted 30 June, 2020; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Context-Dependent Models for Predicting and Characterizing Facial Expressiveness
Authors:
Victoria Lin,
Jeffrey M. Girard,
Louis-Philippe Morency
Abstract:
In recent years, extensive research has emerged in affective computing on topics like automatic emotion recognition and determining the signals that characterize individual emotions. Much less studied, however, is expressiveness, or the extent to which someone shows any feeling or emotion. Expressiveness is related to personality and mental health and plays a crucial role in social interaction. As…
▽ More
In recent years, extensive research has emerged in affective computing on topics like automatic emotion recognition and determining the signals that characterize individual emotions. Much less studied, however, is expressiveness, or the extent to which someone shows any feeling or emotion. Expressiveness is related to personality and mental health and plays a crucial role in social interaction. As such, the ability to automatically detect or predict expressiveness can facilitate significant advancements in areas ranging from psychiatric care to artificial social intelligence. Motivated by these potential applications, we present an extension of the BP4D+ dataset with human ratings of expressiveness and develop methods for (1) automatically predicting expressiveness from visual data and (2) defining relationships between interpretable visual signals and expressiveness. In addition, we study the emotional context in which expressiveness occurs and hypothesize that different sets of signals are indicative of expressiveness in different contexts (e.g., in response to surprise or in response to pain). Analysis of our statistical models confirms our hypothesis. Consequently, by looking at expressiveness separately in distinct emotional contexts, our predictive models show significant improvements over baselines and achieve comparable results to human performance in terms of correlation with the ground truth.
△ Less
Submitted 10 December, 2019;
originally announced December 2019.
-
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Authors:
Tao Yu,
Rui Zhang,
He Yang Er,
Suyi Li,
Eric Xue,
Bo Pang,
Xi Victoria Lin,
Yi Chern Tan,
Tianze Shi,
Zihan Li,
Youxuan Jiang,
Michihiro Yasunaga,
Sungrok Shim,
Tao Chen,
Alexander Fabbri,
Zifan Li,
Luyao Chen,
Yuwen Zhang,
Shreya Dixit,
Vincent Zhang,
Caiming Xiong,
Richard Socher,
Walter S Lasecki,
Dragomir Radev
Abstract:
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert re…
▽ More
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets:(1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
△ Less
Submitted 11 September, 2019;
originally announced September 2019.
-
Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
Authors:
Rui Zhang,
Tao Yu,
He Yang Er,
Sungrok Shim,
Eric Xue,
Xi Victoria Lin,
Tianze Shi,
Caiming Xiong,
Richard Socher,
Dragomir Radev
Abstract:
We focus on the cross-domain context-dependent text-to-SQL generation task. Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previous predicted query to improve the generation quality. Our editing mechanism views SQL as sequences and reuses gene…
▽ More
We focus on the cross-domain context-dependent text-to-SQL generation task. Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previous predicted query to improve the generation quality. Our editing mechanism views SQL as sequences and reuses generation results at the token level in a simple manner. It is flexible to change individual tokens and robust to error propagation. Furthermore, to deal with complex table structures in different domains, we employ an utterance-table encoder and a table-aware decoder to incorporate the context of the user utterance and the table schema. We evaluate our approach on the SParC dataset and demonstrate the benefit of editing compared with the state-of-the-art baselines which generate SQL from scratch. Our code is available at https://github.com/ryanzhumich/sparc_atis_pytorch.
△ Less
Submitted 9 September, 2019; v1 submitted 2 September, 2019;
originally announced September 2019.
-
gfoRmula: An R package for estimating effects of general time-varying treatment interventions via the parametric g-formula
Authors:
Victoria Lin,
Sean McGrath,
Zilu Zhang,
Lucia C. Petito,
Roger W. Logan,
Miguel A. Hernán,
Jessica G. Young
Abstract:
Researchers are often interested in using longitudinal data to estimate the causal effects of hypothetical time-varying treatment interventions on the mean or risk of a future outcome. Standard regression/conditioning methods for confounding control generally fail to recover causal effects when time-varying confounders are themselves affected by past treatment. In such settings, estimators derived…
▽ More
Researchers are often interested in using longitudinal data to estimate the causal effects of hypothetical time-varying treatment interventions on the mean or risk of a future outcome. Standard regression/conditioning methods for confounding control generally fail to recover causal effects when time-varying confounders are themselves affected by past treatment. In such settings, estimators derived from Robins's g-formula may recover time-varying treatment effects provided sufficient covariates are measured to control confounding by unmeasured risk factors. The package gfoRmula implements in R one such estimator: the parametric g-formula. This estimator easily adapts to binary or continuous time-varying treatments as well as contrasts defined by static or dynamic, deterministic or random treatment interventions, as well as interventions that depend on the natural value of treatment. The package accommodates survival outcomes as well as binary or continuous end of follow-up outcomes. For survival outcomes, the package has different options for handling competing events. This paper describes the gfoRmula package, along with motivating background, features, and examples.
△ Less
Submitted 29 October, 2019; v1 submitted 19 August, 2019;
originally announced August 2019.
-
SParC: Cross-Domain Semantic Parsing in Context
Authors:
Tao Yu,
Rui Zhang,
Michihiro Yasunaga,
Yi Chern Tan,
Xi Victoria Lin,
Suyi Li,
Heyang Er,
Irene Li,
Bo Pang,
Tao Chen,
Emily Ji,
Shreya Dixit,
David Proctor,
Sungrok Shim,
Jonathan Kraft,
Vincent Zhang,
Caiming Xiong,
Richard Socher,
Dragomir Radev
Abstract:
We present SParC, a dataset for cross-domainSemanticParsing inContext that consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries). It is obtained from controlled user interactions with 200 complex databases over 138 domains. We provide an in-depth analysis of SParC and show that it introduces new challenges compared to existing datasets. SParC demonstr…
▽ More
We present SParC, a dataset for cross-domainSemanticParsing inContext that consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries). It is obtained from controlled user interactions with 200 complex databases over 138 domains. We provide an in-depth analysis of SParC and show that it introduces new challenges compared to existing datasets. SParC demonstrates complex contextual dependencies, (2) has greater semantic diversity, and (3) requires generalization to unseen domains due to its cross-domain nature and the unseen databases at test time. We experiment with two state-of-the-art text-to-SQL models adapted to the context-dependent, cross-domain setup. The best model obtains an exact match accuracy of 20.2% over all questions and less than10% over all interaction sequences, indicating that the cross-domain setting and the con-textual phenomena of the dataset present significant challenges for future research. The dataset, baselines, and leaderboard are released at https://yale-lily.github.io/sparc.
△ Less
Submitted 5 June, 2019;
originally announced June 2019.