-
Theory of Josephson scanning microscopy with $s$-wave tip on unconventional superconducting surface: application to Bi$_2$Sr$_2$CaCu$_2$O$_{8+δ}$
Authors:
Peayush Choubey,
P. J. Hirschfeld
Abstract:
Josephson scanning tunneling microscopy (JSTM) is a powerful probe of the local superconducting order parameter, but studies have been largely limited to cases where the superconducting sample and the superconducting tip have the same gap symmetry (either s-wave or d-wave). It has been generally assumed that in an ideal $s$-to-$d$ JSTM experiment the critical current would vanish everywhere, as expected for ideal $c$-axis planar junctions. We show here that this is not the case. Employing first-principles Wannier functions for Bi$_2$Sr$_2$CaCu$_2$O$_{8+δ}$, we develop a scheme to compute the Josephson critical current ($I_{c}$) and the quasiparticle tunneling current measured by JSTM with sub-angstrom resolution. We demonstrate that the critical current for tunneling between an s-wave tip and a superconducting cuprate sample has its largest magnitude above O sites and vanishes above Cu sites. $I_{c}$ changes sign under $π/2$-rotation and its average over a unit cell vanishes, as a direct consequence of the $d$-wave gap symmetry in cuprates. Further, we show that $I_{c}$ is strongly suppressed in the close vicinity of a Zn-like impurity owing to suppression of the superconducting order parameter. More interestingly, $I_{c}$ acquires non-vanishing values above the Cu sites near the impurity. The critical current modulations produced by the impurity occur at characteristic wavevectors distinct from those of the quasiparticle interference (QPI) analogue. Furthermore, the quasiparticle tunneling spectra in the JSTM set-up show coherence peaks and impurity-induced resonances shifted by the $s$-wave tip gap. We discuss similarities and differences between JSTM observables and conventional STM observables, making specific predictions that can be tested in future JSTM experiments.
Submitted 14 June, 2024;
originally announced June 2024.
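The symmetry statements in the JSTM abstract above ($I_c$ vanishes above Cu, is largest above O, flips sign under a $π/2$ rotation, and averages to zero over a unit cell) can be checked on a toy pattern. This is not the authors' Wannier-function calculation; it is a minimal sketch assuming a hypothetical d-wave-like map $I_c(x,y) \propto \cos 2\pi x - \cos 2\pi y$, with Cu at the origin and O at (1/2, 0) and (0, 1/2) in units of the lattice constant.

```python
import math


def toy_ic(x: float, y: float) -> float:
    """Hypothetical d-wave-like critical-current map (lattice units, Cu at origin).

    Not the Wannier-function result of the paper; it only mimics its symmetry.
    """
    return math.cos(2 * math.pi * x) - math.cos(2 * math.pi * y)


def rotate_90(x: float, y: float) -> tuple:
    """Rotate a point by pi/2 about the Cu site at the origin."""
    return -y, x


def unit_cell_average(f, n: int = 64) -> float:
    """Average f over an n x n grid of points spanning one unit cell."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f(i / n, j / n)
    return total / (n * n)


if __name__ == "__main__":
    print("Cu site (0, 0):  ", toy_ic(0.0, 0.0))
    print("O site (1/2, 0): ", toy_ic(0.5, 0.0))
    print("O site (0, 1/2): ", toy_ic(0.0, 0.5))
    # A pi/2 rotation flips the sign: I(R p) = -I(p).
    x, y = 0.3, 0.1
    print("rotated:", toy_ic(*rotate_90(x, y)), "negated:", -toy_ic(x, y))
    print("unit-cell average:", unit_cell_average(toy_ic))
```

In this toy map the two O sites carry equal and opposite values, which is why the unit-cell average cancels; the paper's point is that a local probe above an O site still sees a non-zero $I_c$.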
-
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System
Authors:
Zhiwei Liu,
Weiran Yao,
Jianguo Zhang,
Liangwei Yang,
Zuxin Liu,
Juntao Tan,
Prafulla K. Choubey,
Tian Lan,
Jason Wu,
Huan Wang,
Shelby Heinecke,
Caiming Xiong,
Silvio Savarese
Abstract:
The booming success of LLMs has spurred rapid development of LLM agents. Although the foundation of an LLM agent is the generative model, it is critical to devise optimal reasoning strategies and agent architectures. Accordingly, LLM agent research has advanced from simple chain-of-thought prompting to more complex reasoning strategies such as ReAct and Reflection; agent architecture has also evolved from single-agent generation to multi-agent conversation, as well as multi-LLM multi-agent group chat. However, with the existing intricate frameworks and libraries, creating and evaluating new reasoning strategies and agent architectures has become a complex challenge, which hinders research on LLM agents. Thus, we open-source a new AI agent library, AgentLite, which simplifies this process by offering a lightweight, user-friendly platform for innovating on LLM agent reasoning, architectures, and applications with ease. AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks and to facilitate the development of multi-agent systems. Furthermore, we introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility. Get started now at: https://github.com/SalesforceAIResearch/AgentLite
Submitted 23 February, 2024;
originally announced February 2024.
-
Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of Lexical Overlap in Train and Test Reference Summaries
Authors:
Prafulla Kumar Choubey,
Alexander R. Fabbri,
Caiming Xiong,
Chien-Sheng Wu
Abstract:
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote. However, a single average performance score on the entire test set is inadequate in determining such model competencies. We propose a fine-grained evaluation protocol by partitioning a test set based on the lexical similarity of reference test summaries with training summaries. We observe up to a 5x (1.2x) difference in ROUGE-2 (entity recall) scores between the subsets with the lowest and highest similarity. Next, we show that such training repetitions also make a model vulnerable to rote learning, reproducing data artifacts such as factual errors, especially when reference test summaries are lexically close to training summaries. Consequently, we propose to limit lexical repetitions in training summaries during both supervised fine-tuning and likelihood calibration stages to improve the performance on novel test cases while retaining average performance. Our automatic and human evaluations on novel test subsets and recent news articles show that limiting lexical repetitions in training summaries can prevent rote learning and improve generalization.
Submitted 15 November, 2023;
originally announced November 2023.
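The partitioning protocol in the lexical-repetition abstract above can be sketched as follows. This is a minimal sketch under stated assumptions: similarity is a ROUGE-2-style bigram recall between a test reference and its closest training reference, and the bucket edges are hypothetical placeholders; the abstract does not give the paper's exact similarity measure or thresholds.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Lower-cased whitespace-token bigram counts."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def bigram_recall(candidate: str, reference: str) -> float:
    """Fraction of the reference's bigrams also found in the candidate (clipped counts)."""
    ref = bigrams(reference)
    if not ref:
        return 0.0
    overlap = sum((ref & bigrams(candidate)).values())  # & keeps the min count per bigram
    return overlap / sum(ref.values())


def max_train_similarity(test_ref: str, train_refs: list) -> float:
    """Similarity of a test reference to its most similar training reference."""
    return max((bigram_recall(train, test_ref) for train in train_refs), default=0.0)


def partition_test_set(test_refs, train_refs, edges=(0.2, 0.5)):
    """Assign each test index to a 'low' / 'mid' / 'high' similarity bucket.

    The edges are hypothetical placeholders, not values from the paper.
    """
    buckets = {"low": [], "mid": [], "high": []}
    for i, ref in enumerate(test_refs):
        sim = max_train_similarity(ref, train_refs)
        if sim < edges[0]:
            buckets["low"].append(i)
        elif sim < edges[1]:
            buckets["mid"].append(i)
        else:
            buckets["high"].append(i)
    return buckets
```

Scores such as ROUGE-2 or entity recall would then be reported per bucket rather than as one test-set average, which is what exposes the gap between lexically novel and lexically repeated references.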
-
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles
Authors:
Kung-Hsiang Huang,
Philippe Laban,
Alexander R. Fabbri,
Prafulla Kumar Choubey,
Shafiq Joty,
Caiming Xiong,
Chien-Sheng Wu
Abstract:
Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference. Next, to enable consistent automatic evaluation, we conducted a comprehensive analysis to pinpoint the position and verbosity biases when utilizing Large Language Model (LLM)-based metrics for evaluating the coverage and faithfulness of summaries. Through correlation analyses, we outline the best practices for effectively using automatic LLM-based metrics on the DiverseSumm dataset. Finally, we study how LLMs summarize multiple news articles by analyzing which type of diverse information LLMs are capable of identifying. Our analyses suggest that despite the extraordinary capabilities of LLMs in single-document summarization, the proposed task remains a complex challenge for them mainly due to their limited coverage, with GPT-4 only able to cover under 40% of the diverse information on average.
Submitted 22 March, 2024; v1 submitted 17 September, 2023;
originally announced September 2023.
-
XGen-7B Technical Report
Authors:
Erik Nijkamp,
Tian Xie,
Hiroaki Hayashi,
Bo Pang,
Congying Xia,
Chen Xing,
Jesse Vig,
Semih Yavuz,
Philippe Laban,
Ben Krause,
Senthil Purushwalkam,
Tong Niu,
Wojciech Kryściński,
Lidiya Murakhovs'ka,
Prafulla Kumar Choubey,
Alex Fabbri,
Ye Liu,
Rui Meng,
Lifu Tu,
Meghana Bhat,
Chien-Sheng Wu,
Silvio Savarese,
Yingbo Zhou,
Shafiq Joty,
Caiming Xiong
Abstract:
Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.
Submitted 6 September, 2023;
originally announced September 2023.
-
Improving Factual Consistency in Summarization with Compression-Based Post-Editing
Authors:
Alexander R. Fabbri,
Prafulla Kumar Choubey,
Jesse Vig,
Chien-Sheng Wu,
Caiming Xiong
Abstract:
State-of-the-art summarization models still struggle to be factually consistent with the input text. A model-agnostic way to address this problem is post-editing the generated summaries. However, existing approaches typically fail to remove entity errors when no suitable replacement entity is available in the input, or they may insert erroneous content. In our work, we focus on removing extrinsic entity errors, or entities not in the source, to improve consistency while retaining the summary's essential information and form. We propose to use sentence-compression data to train the post-editing model to take a summary with extrinsic entity errors marked with special tokens and output a compressed, well-formed summary with those errors removed. We show that this model improves factual consistency while maintaining ROUGE, improving entity precision by up to 30% on XSum, and that this model can be applied on top of another post-editor, improving entity precision by up to a total of 38%. We perform an extensive comparison of post-editing approaches that demonstrates trade-offs between factual consistency, informativeness, and grammaticality, and we analyze settings where post-editors show the largest improvements.
Submitted 11 November, 2022;
originally announced November 2022.
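The post-editor's input format in the abstract above (a summary with extrinsic entity errors marked by special tokens) can be illustrated with a short sketch. Assumptions: entity mentions come from some external NER step, "extrinsic" is approximated by a case-insensitive substring check against the source, entity strings start and end with word characters, and the marker tokens `<err>`/`</err>` are hypothetical rather than the paper's actual special tokens.

```python
import re

OPEN, CLOSE = "<err>", "</err>"  # hypothetical marker tokens


def extrinsic_entities(source: str, entities: list) -> list:
    """Summary entities that do not appear anywhere in the source text."""
    src = source.lower()
    return [e for e in entities if e.lower() not in src]


def mark_extrinsic(summary: str, source: str, entities: list) -> str:
    """Wrap every extrinsic entity mention in the summary with marker tokens."""
    marked = summary
    for ent in sorted(set(extrinsic_entities(source, entities))):
        pattern = r"\b" + re.escape(ent) + r"\b"
        # A function replacement avoids backslash interpretation in the entity text.
        marked = re.sub(pattern, lambda m: OPEN + m.group(0) + CLOSE, marked)
    return marked
```

The marked summary would be the post-editor's input; the model is then trained (on sentence-compression data) to drop the marked spans while keeping the sentence well-formed.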
-
Conformal Predictor for Improving Zero-shot Text Classification Efficiency
Authors:
Prafulla Kumar Choubey,
Yu Bai,
Chien-Sheng Wu,
Wenhao Liu,
Nazneen Rajani
Abstract:
Pre-trained language models (PLMs) have been shown to be effective for zero-shot (0shot) text classification. 0shot models based on natural language inference (NLI) and next sentence prediction (NSP) employ a cross-encoder architecture and infer by making a separate forward pass through the model for each label-text pair. As a result, the computational cost of inference grows linearly with the number of labels. In this work, we improve the efficiency of such cross-encoder-based 0shot models by restricting the number of likely labels using a conformal predictor (CP) built on another, fast base classifier and calibrated on samples labeled by the 0shot model. Since a CP generates prediction sets with coverage guarantees, it reduces the number of target labels without excluding the most probable label based on the 0shot model. We experiment with three intent and two topic classification datasets. With a suitable CP for each dataset, we reduce the average inference time for NLI- and NSP-based models by 25.6% and 22.2%, respectively, without dropping performance below the predefined error rate of 1%.
Submitted 23 October, 2022;
originally announced October 2022.
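The label-pruning step in the conformal-predictor abstract above can be sketched with standard split conformal prediction. Assumptions: the fast base classifier returns a probability per label, the nonconformity score is one minus the probability of the label assigned by the 0shot model, and the function names are hypothetical; the paper's exact score function may differ.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha: float) -> float:
    """Split-conformal threshold from calibration data.

    cal_probs:  list of {label: probability} dicts from the fast base classifier
    cal_labels: labels assigned by the 0shot model (used as pseudo-ground-truth)
    alpha:      target error rate, e.g. 0.01
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points: keep every label
    return scores[k - 1]


def prediction_set(probs: dict, threshold: float) -> list:
    """Labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]
```

Only labels in `prediction_set(...)` would then be paired with the input text and passed to the expensive NLI or NSP cross-encoder, so the number of forward passes per input drops from the full label count to the size of the set. Under exchangeability, the set contains the 0shot model's label with probability at least `1 - alpha`.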
-
Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning
Authors:
Xiangyu Peng,
Chen Xing,
Prafulla Kumar Choubey,
Chien-Sheng Wu,
Caiming Xiong
Abstract:
Prompt tuning approaches, which learn task-specific soft prompts for a downstream task conditioned on frozen pre-trained models, have attracted growing interest due to their parameter efficiency. With large language models and sufficient training data, prompt tuning performs comparably to full-model tuning. However, with limited training samples in few-shot settings, prompt tuning fails to match the performance of full-model fine-tuning. In this work, we focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks. Recognizing the good generalization capabilities of ensemble methods in the low-data regime, we first experiment and show that a simple ensemble of model predictions based on different source prompts outperforms existing multi-prompt knowledge transfer approaches, such as source prompt fusion, in the few-shot setting. Motivated by this observation, we further investigate model ensembles and propose Sample-specific Ensemble of Source Models (SESoM). SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs. In this way, SESoM inherits the superior generalization of model ensemble approaches and simultaneously captures the sample-specific competence of each source prompt. We conduct experiments across a diverse set of eight NLP tasks using models of different scales (T5-{base, large, XL}) and find that SESoM consistently outperforms existing models of the same as well as larger parametric scale by a large margin.
Submitted 1 March, 2023; v1 submitted 22 October, 2022;
originally announced October 2022.
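The core idea of SESoM in the abstract above, per-sample weights over source-model outputs, can be sketched as a weighted average of output distributions. Assumptions: each source model (the frozen backbone with one source prompt) yields a probability vector for the target sample, and the weights come from a softmax over hypothetical per-source scores, here dot products between a sample representation and a key vector per source. In SESoM the weights are produced by a trained network; this scoring rule is only a stand-in.

```python
import math


def softmax(xs: list) -> list:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


def sample_specific_ensemble(sample_repr, source_keys, source_probs):
    """Combine source-model output distributions with weights that depend on the sample.

    sample_repr:  feature vector for this target sample (hypothetical)
    source_keys:  one key vector per source model (hypothetical)
    source_probs: one output probability vector per source model
    """
    weights = softmax([dot(sample_repr, k) for k in source_keys])
    n_classes = len(source_probs[0])
    combined = [0.0] * n_classes
    for w, probs in zip(weights, source_probs):
        for c in range(n_classes):
            combined[c] += w * probs[c]
    return weights, combined
```

A plain ensemble corresponds to fixed, equal weights for every sample; the sample-specific version lets a source prompt that suits one input dominate for that input only.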
-
Modeling Document-level Temporal Structures for Building Temporal Dependency Graphs
Authors:
Prafulla Kumar Choubey,
Ruihong Huang
Abstract:
We propose to leverage news discourse profiling to model document-level temporal structures for building temporal dependency graphs. Our key observation is that the functional roles of sentences used for profiling news discourse signify different time frames relevant to a news story and can, therefore, help to recover the global temporal structure of a document. Our analyses and experiments with the widely used knowledge distillation technique show that discourse profiling effectively identifies distant inter-sentence event and (or) time expression pairs that are temporally related and otherwise difficult to locate.
Submitted 21 October, 2022;
originally announced October 2022.
-
P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts
Authors:
Benjamin Newman,
Prafulla Kumar Choubey,
Nazneen Rajani
Abstract:
Recent work (e.g. LAMA (Petroni et al., 2019)) has found that the quality of the factual information extracted from Large Language Models (LLMs) depends on the prompts used to query them. This inconsistency is problematic because different users will query LLMs for the same information using different wording, but should receive the same, accurate responses regardless. In this work we aim to address this shortcoming by introducing P-Adapters: lightweight models that sit between the embedding layer and first attention layer of LLMs. They take LLM embeddings as input and output continuous prompts that are used to query the LLM. Additionally, we investigate Mixture of Experts (MoE) models that learn a set of continuous prompts ("experts") and select one to query the LLM. They require a separate classifier trained on human-annotated data to map natural language prompts to the continuous ones. P-Adapters perform comparably to the more complex MoE models in extracting factual information from BERT and RoBERTa while eliminating the need for additional annotations. P-Adapters show between 12-26% absolute improvement in precision and 36-50% absolute improvement in consistency over a baseline of only using natural language queries. Finally, we investigate what makes P-Adapters successful and conclude that a significant factor is access to the LLM's embeddings of the original natural language prompt, particularly the subject of the entity pair being queried.
Submitted 19 April, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
CaPE: Contrastive Parameter Ensembling for Reducing Hallucination in Abstractive Summarization
Authors:
Prafulla Kumar Choubey,
Alexander R. Fabbri,
Jesse Vig,
Chien-Sheng Wu,
Wenhao Liu,
Nazneen Fatema Rajani
Abstract:
Hallucination is a known issue for neural abstractive summarization models. Recent work suggests that the degree of hallucination may depend on errors in the training data. In this work, we propose a new method called Contrastive Parameter Ensembling (CaPE) to use training data more effectively, exploiting variations in noise across training samples to reduce hallucination. We first select clean and noisy subsets from the training data using different automatic factual metrics. Then, we fine-tune a base summarization model, which is trained on all training samples, on the clean (noisy) subset to obtain an expert (anti-expert) model. Finally, we adjust the parameters of the base model by the difference between the parameters of the expert and anti-expert models, steering the base model towards the expert model and away from the anti-expert model. Experimental results show that CaPE improves performance across different automatic factual metrics and human evaluation, with maximum improvements of 16.69% and 15.78% in summary-level dependency-arc entailment accuracy on the XSUM and CNN/DM datasets, respectively. The improvement in factual performance does not degrade performance on informativeness metrics such as ROUGE.
Submitted 20 May, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
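The CaPE parameter update described in the abstract above amounts to arithmetic on weights, $\theta = \theta_{\text{base}} + \alpha(\theta_{\text{expert}} - \theta_{\text{anti-expert}})$. A minimal sketch, assuming parameters are stored as name-to-list-of-floats dictionaries and that the difference is scaled by a mixing coefficient `alpha` (a hypothetical knob; the abstract does not specify one):

```python
def contrastive_parameter_ensemble(base, expert, anti_expert, alpha=1.0):
    """Return base + alpha * (expert - anti_expert), parameter by parameter.

    Each argument maps a parameter name to a flat list of floats; all three
    models must share the same architecture (same names and sizes).
    """
    if not (base.keys() == expert.keys() == anti_expert.keys()):
        raise ValueError("models must have identical parameter names")
    combined = {}
    for name, theta in base.items():
        e, a = expert[name], anti_expert[name]
        if not (len(theta) == len(e) == len(a)):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Move toward the expert and away from the anti-expert.
        combined[name] = [t + alpha * (ei - ai) for t, ei, ai in zip(theta, e, a)]
    return combined
```

With a deep-learning framework the same loop would run over the tensors of each model's state dict; the check on names and sizes matters because the expert and anti-expert are both fine-tuned from the same base model and must align element by element.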
-
Electronic Theory for Scanning Tunneling Microscopy Spectra in Infinite-Layer Nickelate Superconductors
Authors:
Peayush Choubey,
Ilya M. Eremin
Abstract:
Recent scanning tunneling microscopy (STM) observation of U-shaped and V-shaped spectra (and their mixture) in superconducting Nd$_{1-x}$Sr$_x$NiO$_2$ thin films has been interpreted as evidence for two distinct gap symmetries in this nickelate superconductor [Gu et al., Nat. Commun. 11, 6027 (2020)]. Here, using a two-band model of nickelates capturing the dominant contributions from Ni-$3d_{x^2-y^2}$ and rare-earth (R)-$5d_{3z^2 - r^2}$ orbitals, we show that the experimental observation can be simply explained within a pairing scenario characterized by a conventional $d_{x^2-y^2}$-wave gap structure with the lowest harmonic on the Ni band and a $d_{x^2-y^2}$-wave gap with higher harmonics on the R band. We perform realistic simulations of STM spectra employing first-principles Wannier functions to properly account for the tunneling processes and obtain V-shaped, U-shaped, and mixed spectral line shapes depending on the position of the STM tip within the unit cell. The V- and U-shaped spectra originate from the Ni and R bands, respectively, and the Wannier functions, in essence, provide position-dependent weighting factors that determine the spectral line shape at a given intra-unit-cell position. We propose a phase-sensitive experiment to distinguish between the proposed $d$-wave gap structure and a time-reversal-symmetry-breaking $d+is$ gap, which yields very similar intra-unit-cell spectra.
Submitted 9 August, 2021;
originally announced August 2021.
-
Direct Visualization of a Static Incommensurate Antiferromagnetic Order by Suppressing the Superconducting Phase Coherence in Fe-doped Bi2Sr2CaCu2O8+delta
Authors:
Siyuan Wan,
Huazhou Li,
Peayush Choubey,
Qiangqiang Gu,
Han Li,
Huan Yang,
Ilya M. Eremin,
G. D. Gu,
Hai-Hu Wen
Abstract:
In cuprate superconductors, due to strong electronic correlations, there are multiple intertwined orders which either coexist or compete with superconductivity. Among them, the antiferromagnetic (AF) order is the most prominent. In the region where superconductivity sets in, the long-range AF order is destroyed. Yet residual short-range AF fluctuations are present up to much higher doping, and their role in the emergence of the superconducting phase is still highly debated. Here, using a spin-polarized scanning tunneling microscope, we directly visualize, for the first time, an emergent incommensurate AF order in the vicinity of Fe impurities embedded in optimally doped Bi2Sr2CaCu2O8+δ (Bi2212). Remarkably, the Fe impurities suppress the superconducting coherence peaks while leaving the gapped feature intact, and they pin down the ubiquitous short-range incommensurate AF order. Our work shows an intimate relation between antiferromagnetism and superconductivity.
Submitted 21 July, 2021;
originally announced July 2021.
-
Scattering Interference Signature of a Pair Density Wave State in the Cuprate Pseudogap Phase
Authors:
Shuqiu Wang,
Peayush Choubey,
Yi Xue Chong,
Weijiong Chen,
Wangping Ren,
H. Eisaki,
Shin-ichi Uchida,
Peter J. Hirschfeld,
J. C. Séamus Davis
Abstract:
An unidentified quantum fluid designated as the pseudogap (PG) phase is produced by electron-density depletion in the CuO$_2$ antiferromagnetic insulator. Current theories suggest that the PG phase may be a pair density wave (PDW) state characterized by a spatially modulating density of electron pairs. Such a state should exhibit a periodically modulating energy gap $Δ_P(\pmb r)$ in real-space, and a characteristic quasiparticle scattering interference (QPI) signature $Λ_P(\pmb q)$ in wavevector space. By studying strongly underdoped Bi$_2$Sr$_2$CaDyCu$_2$O$_8$ at hole-density ~0.08 in the superconductive phase, we detect the $8a_0$-periodic $Δ_P(\pmb r)$ modulations signifying a PDW coexisting with superconductivity. Then, by visualizing the temperature dependence of this electronic structure from the superconducting into the pseudogap phase, we find evolution of the scattering interference signature $Λ(\pmb q)$ that is predicted specifically for the temperature dependence of an $8a_0$-periodic PDW. These observations are consistent with theory for the transition from a PDW state coexisting with d-wave superconductivity to a pure PDW state in the Bi$_2$Sr$_2$CaDyCu$_2$O$_8$ pseudogap phase.
Submitted 26 October, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Improving Gender Translation Accuracy with Filtered Self-Training
Authors:
Prafulla Kumar Choubey,
Anna Currey,
Prashant Mathur,
Georgiana Dinu
Abstract:
Targeted evaluations have found that machine translation systems often output incorrect gender, even when the gender is clear from context. Furthermore, these incorrectly gendered translations have the potential to reflect or amplify social biases. We propose a gender-filtered self-training technique to improve gender translation accuracy on unambiguously gendered inputs. This approach uses a source monolingual corpus and an initial model to generate gender-specific pseudo-parallel corpora which are then added to the training data. We filter the gender-specific corpora on the source and target sides to ensure that sentence pairs contain and correctly translate the specified gender. We evaluate our approach on translation from English into five languages, finding that our models improve gender translation accuracy without any cost to generic translation quality. In addition, we show the viability of our approach on several settings, including re-training from scratch, fine-tuning, controlling the balance of the training data, forward translation, and back-translation.
Submitted 15 April, 2021;
originally announced April 2021.
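The filtering step in the gender-translation abstract above can be sketched as a pair of checks on each pseudo-parallel sentence pair. Assumptions: gender is detected with tiny hypothetical word lists (English source, Spanish target, chosen only for illustration), and a pair is kept only if both the source and the target contain words of the specified gender and none of the opposite gender. The abstract does not describe the paper's actual filters, which are likely more sophisticated.

```python
# Hypothetical, tiny word lists for illustration only (English source, Spanish target).
SOURCE_WORDS = {
    "female": {"she", "her", "woman", "mother"},
    "male": {"he", "him", "his", "man", "father"},
}
TARGET_WORDS = {
    "female": {"ella", "mujer", "madre"},
    "male": {"él", "hombre", "padre"},
}


def tokens(sentence: str) -> set:
    """Lower-cased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'").lower() for w in sentence.split()}


def has_only_gender(words: set, lexicon: dict, gender: str) -> bool:
    """True if the words mention `gender` and do not mention the other gender."""
    other = "male" if gender == "female" else "female"
    return bool(words & lexicon[gender]) and not (words & lexicon[other])


def filter_pairs(pairs, gender):
    """Keep (source, target) pairs whose source and target both express only `gender`."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if has_only_gender(tokens(src), SOURCE_WORDS, gender)
        and has_only_gender(tokens(tgt), TARGET_WORDS, gender)
    ]
```

The source-side check ensures the pair specifies the intended gender; the target-side check rejects pseudo-translations from the initial model that got the gender wrong, so only correctly gendered pairs are added to the training data.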
-
BERT based Transformers lead the way in Extraction of Health Information from Social Media
Authors:
Sidharth R,
Abhiraj Tiwari,
Parthivi Choubey,
Saisha Kashyap,
Sahil Khose,
Kumud Lakara,
Nishesh Singh,
Ujjwal Verma
Abstract:
This paper describes our submissions for the Social Media Mining for Health (SMM4H) 2021 shared tasks. We participated in two tasks: (1) classification, extraction, and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1) and (2) classification of COVID-19 tweets containing symptoms (Task-6). Our approach for the first task uses the language representation model RoBERTa with a binary classification head. For the second task, we use BERTweet, which is based on RoBERTa. Both pre-trained models are fine-tuned for their respective tasks. The models are placed on top of a custom domain-specific processing pipeline. Our system ranked first among all the submissions for subtask-1(a) with an F1-score of 61%. For subtask-1(b), our system obtained an F1-score of 50%, with improvements of up to +8% F1 over the score averaged across all submissions. The BERTweet model achieved an F1-score of 94% on SMM4H 2021 Task-6.
Submitted 15 April, 2021;
originally announced April 2021.
-
Atomic-scale Electronic Structure of the Cuprate Pair Density Wave State Coexisting with Superconductivity
Authors:
Peayush Choubey,
Sang Hyun Joo,
K. Fujita,
Zengyi Du,
S. D. Edkins,
M. H. Hamidian,
H. Eisaki,
S. Uchida,
A. P. Mackenzie,
Jinho Lee,
J. C. Séamus Davis,
P. J. Hirschfeld
Abstract:
The defining characteristic of hole-doped cuprates is $d$-wave high-temperature superconductivity. However, intense theoretical interest is now focused on whether a pair density wave (PDW) state could coexist with cuprate superconductivity (D. F. Agterberg et al., Annual Review of Condensed Matter Physics 11, 231 (2020)). Here, we use a strong-coupling mean-field theory of cuprates to model the atomic-scale electronic structure of an eight-unit-cell periodic, $d$-symmetry form factor PDW state coexisting with $d$-wave superconductivity (DSC). From this PDW+DSC model, the atomically resolved density of Bogoliubov quasiparticle states N(r,E) is predicted at the terminal BiO surface of Bi$_2$Sr$_2$CaCu$_2$O$_8$ and compared with high-precision electronic visualization experiments using spectroscopic imaging STM. The PDW+DSC model predictions include the intra-unit-cell structure and periodic modulations of N(r,E), the modulations of the coherence peak energy $Δ_p(r)$, and the characteristics of Bogoliubov quasiparticle interference in scattering-wavevector space (q-space). Consistency between all these predictions and the corresponding experiments indicates that lightly hole-doped Bi$_2$Sr$_2$CaCu$_2$O$_8$ does contain a PDW+DSC state. Moreover, in the model the PDW+DSC state becomes unstable to a pure DSC state at a critical hole density p*, with empirically equivalent phenomena occurring in the experiments. All these results are consistent with a picture in which the cuprate translational-symmetry-breaking state is a PDW, the observed charge modulations are its consequence, the antinodal pseudogap is that of the PDW state, and the cuprate critical point at p* ~ 19% occurs due to the disappearance of this PDW.
Submitted 27 April, 2020; v1 submitted 26 February, 2020;
originally announced February 2020.
-
In Plain Sight: Media Bias Through the Lens of Factual Reporting
Authors:
Lisa Fan,
Marshall White,
Eva Sharma,
Ruisi Su,
Prafulla Kumar Choubey,
Ruihong Huang,
Lu Wang
Abstract:
The increasing prevalence of political bias in news media calls for greater public awareness of it, as well as robust methods for its detection. While prior work in NLP has primarily focused on the lexical bias captured by linguistic attributes such as word choice and syntax, other types of bias stem from the actual content selected for inclusion in the text. In this work, we investigate the effects of informational bias: factual content that can nevertheless be deployed to sway reader opinion. We first produce a new dataset, BASIL, of 300 news articles annotated with 1,727 bias spans and find evidence that informational bias appears in news articles more frequently than lexical bias. We further study our annotations to observe how informational bias surfaces in news articles by different media outlets. Lastly, a baseline model for informational bias prediction is presented by fine-tuning BERT on our labeled data, indicating the challenges of the task and future directions.
Submitted 5 September, 2019;
originally announced September 2019.
-
Improving Dialogue State Tracking by Discerning the Relevant Context
Authors:
Sanuj Sharma,
Prafulla Kumar Choubey,
Ruihong Huang
Abstract:
A typical conversation comprises multiple turns between participants, who go back and forth between different topics. At each user turn, dialogue state tracking (DST) aims to estimate the user's goal by processing the current utterance. However, in many turns, users implicitly refer to a previous goal, necessitating the use of relevant dialogue history. Nonetheless, distinguishing relevant history is challenging, and the popular approach of relying on dialogue recency for this purpose is inefficient. We therefore propose a novel framework for DST that identifies relevant historical context by referring to past utterances where a particular slot value changes, and uses them together with weighted system utterances to identify the relevant context. Specifically, we use the current user utterance and the most recent system utterance to determine the relevance of a system utterance. Empirical analyses show that our method improves joint goal accuracy by 2.75% and 2.36% on the WoZ 2.0 and MultiWoZ 2.0 restaurant domain datasets, respectively, over the previous state-of-the-art GLAD model.
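Joint goal accuracy, the metric quoted above, is commonly defined in the DST literature as the fraction of turns in which the entire predicted dialogue state matches the gold state. The sketch below assumes that definition and represents a state as a plain slot-to-value dictionary; the function name `joint_goal_accuracy` and the example states are hypothetical and not taken from the paper or any evaluation toolkit.

```python
from typing import Dict, List

# Hypothetical representation: a dialogue state maps each slot to its value.
State = Dict[str, str]


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly.

    A turn counts as correct only if every slot-value pair is right;
    a single wrong or missing slot makes the whole turn wrong.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


gold = [{"food": "thai"}, {"food": "thai", "area": "north"}]
pred = [{"food": "thai"}, {"food": "thai", "area": "south"}]
print(joint_goal_accuracy(pred, gold))  # 0.5
```

Because the metric is all-or-nothing per turn, one mistaken slot in the second turn halves the score even though three of the four slot predictions are correct.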
Submitted 4 April, 2019;
originally announced April 2019.
-
TAMU at KBP 2017: Event Nugget Detection and Coreference Resolution
Authors:
Prafulla Kumar Choubey,
Ruihong Huang
Abstract:
In this paper, we describe TAMU's system submitted to the TAC KBP 2017 event nugget detection and coreference resolution task. Our system builds on statistical and empirical observations made on the training and development data. We found that modifiers of event nuggets tend to have a distinctive syntactic distribution. Their part-of-speech tags and dependency relations provide essential characteristics that are useful for identifying their spans and for defining their types and realis status. We further found that jointly modeling event span detection and realis status identification performs better than individual models for both tasks. Our simple system, designed with minimal features, achieved micro-averaged F1 scores of 57.72, 44.27 and 42.47 for the event span detection, type identification and realis status classification tasks, respectively. Our system also achieved a CoNLL F1 score of 27.20 on the event coreference resolution task.
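Micro-averaging, as in the scores reported above, pools true positives, false positives and false negatives over all documents before computing precision, recall and F1. The sketch below is a simplified exact-match version: spans are hypothetical `(start, end)` offset pairs, the function name `micro_f1` is illustrative, and the official TAC KBP scorer applies its own span-mapping rules, which are not reproduced here.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) offsets of an event nugget


def micro_f1(pred_docs: List[Set[Span]], gold_docs: List[Set[Span]]) -> float:
    """Micro-averaged F1 over documents, counting only exact span matches."""
    if len(pred_docs) != len(gold_docs):
        raise ValueError("need one predicted span set per gold document")
    tp = fp = fn = 0
    for pred, gold in zip(pred_docs, gold_docs):
        tp += len(pred & gold)   # predicted spans that are in the gold set
        fp += len(pred - gold)   # spurious predictions
        fn += len(gold - pred)   # missed gold spans
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pred_docs = [{(0, 5), (10, 15)}, {(3, 7)}]
gold_docs = [{(0, 5)}, {(3, 7), (20, 25)}]
print(micro_f1(pred_docs, gold_docs))
```

Pooling the counts first means documents with many nuggets weigh more than short ones, which is the usual reason micro-averaging is chosen for corpus-level scores.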
Submitted 25 February, 2018; v1 submitted 6 November, 2017;
originally announced November 2017.
-
Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events
Authors:
Prafulla Kumar Choubey,
Ruihong Huang
Abstract:
We introduce a novel iterative approach for event coreference resolution that gradually builds event clusters by exploiting inter-dependencies among event mentions within the same chain as well as across event chains. Among event mentions in the same chain, we distinguish within-document (WD) and cross-document (CD) event coreference links by using two distinct pairwise classifiers, trained separately to capture differences in the feature distributions of within- and cross-document event clusters. Our event coreference approach alternates between WD and CD clustering and combines arguments from both event clusters after every merge, continuing until no further merges can be made. It then performs further merging between event chains that are both closely related to a set of other event chains. Experiments on the ECB+ corpus show that our model outperforms state-of-the-art methods on the joint task of WD and CD event coreference resolution.
Submitted 23 July, 2017;
originally announced July 2017.
-
A Sequential Model for Classifying Temporal Relations between Intra-Sentence Events
Authors:
Prafulla Kumar Choubey,
Ruihong Huang
Abstract:
We present a sequential model for temporal relation classification between intra-sentence events. The key observation is that the overall syntactic structure and compositional meanings of the multi-word context between events are important for distinguishing among fine-grained temporal relations. Specifically, our approach first extracts a sequence of context words that indicates the temporal relation between two events and aligns well with the dependency path between the two event mentions. The context word sequence, together with the corresponding part-of-speech tag sequence and dependency relation sequence, is then provided as input to bidirectional recurrent neural network (LSTM) models. The neural networks learn compositional syntactic and semantic representations of the contexts surrounding the two events and predict the temporal relation between them. Evaluation of the proposed approach on the TimeBank corpus shows that sequential modeling is capable of accurately recognizing temporal relations between events and outperforms a neural network model that takes various discrete features as input, imitating previous feature-based models.
Submitted 23 July, 2017;
originally announced July 2017.
-
Universality of scanning tunneling microscopy in cuprate superconductors
Authors:
Peayush Choubey,
Andreas Kreisel,
T. Berlijn,
Brian M. Andersen,
P. J. Hirschfeld
Abstract:
We consider the problem of local tunneling into cuprate superconductors, combining model-based calculations for the superconducting order parameter with wavefunction information obtained from first-principles electronic structure. For some time it has been proposed that scanning tunneling microscopy (STM) spectra do not reflect the properties of the superconducting layer in the CuO$_2$ plane directly beneath the STM tip, but rather a weighted sum of spatially proximate states determined by the details of the tunneling process. These "filter" ideas have been countered with the argument that similar conductance patterns have been seen around impurities and charge ordered states in systems with atomically quite different barrier layers. Here we use a recently developed Wannier-function-based method to calculate topographies, spectra, conductance maps and normalized conductance maps close to impurities. We find that it is the local planar Cu $d_{x^2-y^2}$ Wannier function, qualitatively similar for many systems, that controls the form of the tunneling spectrum and the spatial patterns near perturbations. We explain how, despite the fact that STM observables depend on the materials-specific details of the tunneling process and setup parameters, there is an overall universality in the qualitative features of conductance spectra. In particular, we discuss why STM results on Bi$_2$Sr$_2$CaCu$_2$O$_8$ and Ca$_{2-x}$Na$_x$CuO$_2$Cl$_2$ are essentially identical.
Submitted 29 June, 2017;
originally announced June 2017.
-
Incommensurate charge ordered states in the $\mathit{t-t^{\prime}-J}$ model
Authors:
Peayush Choubey,
Wei-Lin Tu,
Ting-Kuo Lee,
P. J. Hirschfeld
Abstract:
We study the incommensurate charge ordered states in the $\mathit{t-t^{\prime}-J}$ model using the Gutzwiller mean field theory on large systems. In particular, we explore the properties of incommensurate charge modulated states referred to in the literature as nodal pair density waves (nPDW). nPDW states intertwine site and bond charge order with modulated $d$-wave pair order and are characterized by a nonzero amplitude of uniform pairing; they also manifest a dominant intra-unit-cell $d$-density wave form factor. To compare with a recent scanning tunneling microscopy (STM) study [Hamidian et al., Nat. Phys. 3519 (2015)] of the cuprate superconductor BSCCO-2212, we compute the continuum local density of states (LDOS) at a typical STM tip height using the Wannier-function-based approach. By Fourier transforming the Cu and O sublattice LDOS, we also obtain bias-dependent intra-unit-cell form factors and the spatial phase difference. We find that in the nPDW state, the behavior of the form factors and spatial phase difference as a function of energy agrees remarkably well with experiment. This is in contrast to commensurate charge modulated states, which we show do not agree with experiment. We propose that nPDW states are good candidates for the charge density wave phase observed in the superconducting state of underdoped cuprates.
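The sublattice Fourier analysis mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each sublattice map holds one LDOS value per unit cell, so the half-lattice-constant offsets of the O sites relative to Cu are ignored; S, S' and D are taken as the Cu amplitude and the half-sum and half-difference of the O$_x$ and O$_y$ amplitudes, the convention common in STM form-factor analyses; "spatial phase difference" is interpreted here as the relative phase of the O$_x$ and O$_y$ modulations; and the function name `sublattice_form_factors` and the synthetic data are hypothetical. Published analyses typically work with real parts after fixing the origin at a Cu site, which is not reproduced here.

```python
import numpy as np


def sublattice_form_factors(cu, ox, oy, q_index):
    """Fourier amplitudes of sublattice LDOS maps at one wavevector.

    cu, ox, oy: 2D arrays holding one LDOS value per unit cell for the
    Cu, O_x and O_y sites (a simplified, hypothetical layout).
    q_index: (row, col) index into the 2D FFT grid.
    Returns S (Cu), S' (extended s), D (d form factor) and the Ox-Oy phase difference.
    """
    cu_q = np.fft.fft2(cu)[q_index]
    ox_q = np.fft.fft2(ox)[q_index]
    oy_q = np.fft.fft2(oy)[q_index]
    s = cu_q
    s_prime = 0.5 * (ox_q + oy_q)
    d = 0.5 * (ox_q - oy_q)
    # Relative phase of the Ox and Oy modulations, in [-pi, pi].
    phase_diff = np.angle(ox_q * np.conj(oy_q))
    return s, s_prime, d, phase_diff


# Synthetic data: a charge modulation along x with a period of 4 unit cells,
# in which Ox and Oy sites modulate in antiphase (a pure d form factor).
N = 32
X = np.tile(np.arange(N), (N, 1))  # X[row, col] = col, the x position in unit cells
Q = 2 * np.pi / 4
A = 0.1
cu = np.ones((N, N))
ox = 1.0 + A * np.cos(Q * X)
oy = 1.0 - A * np.cos(Q * X)
s, s_prime, d, dphi = sublattice_form_factors(cu, ox, oy, (0, N // 4))
print(abs(s), abs(s_prime), abs(d), dphi)
```

For a pure $d$ form factor, the S' amplitude vanishes at the modulation wavevector while D does not, and the O$_x$ and O$_y$ modulations differ in phase by π; an $s'$-dominated modulation would show the opposite pattern.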
Submitted 22 September, 2016;
originally announced September 2016.
-
Interpretation of scanning tunneling quasiparticle interference and impurity states in cuprates
Authors:
A. Kreisel,
Peayush Choubey,
T. Berlijn,
B. M. Andersen,
P. J. Hirschfeld
Abstract:
We apply a recently developed method combining first-principles-based Wannier functions with solutions to the Bogoliubov-de Gennes equations to the problem of interpreting STM data in cuprate superconductors. We show that the observed images of Zn on the surface of Bi$_2$Sr$_2$CaCu$_2$O$_8$ can only be understood by accounting for the tails of the Cu Wannier functions, which include significant weight on apical O sites in neighboring unit cells. This calculation thus puts earlier crude "filter" theories on a microscopic foundation and solves a long-standing puzzle. We then study quasiparticle interference phenomena induced by out-of-plane weak potential scatterers, and show how patterns long observed in cuprates can be understood in terms of the interference of Wannier functions above the surface. Our results show excellent agreement with experiment and enable a better understanding of novel phenomena in the cuprates via STM imaging.
Submitted 28 May, 2015; v1 submitted 7 July, 2014;
originally announced July 2014.
-
Visualization of atomic-scale phenomena in superconductors: application to FeSe
Authors:
Peayush Choubey,
T. Berlijn,
A. Kreisel,
C. Cao,
P. J. Hirschfeld
Abstract:
We propose a simple method of calculating inhomogeneous, atomic-scale phenomena in superconductors which makes use of the wave function information traditionally discarded in the construction of tight-binding models used in the Bogoliubov-de Gennes equations. The method uses symmetry-based first-principles Wannier functions to visualize the effects of superconducting pairing on the distribution of electronic states over atoms within a crystal unit cell. Local symmetries lower than the global lattice symmetry can thus be exhibited as well, rendering theoretical comparisons with scanning tunneling spectroscopy data much more useful. As a simple example, we discuss the geometric dimer states observed near defects in superconducting FeSe.
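How the discarded wave-function information can be reinstated is sketched below as a standard change of basis, written in our own notation rather than copied from the paper. Assume a lattice Green's function $G_{R\mu,R'\nu}(\omega)$ obtained from the Bogoliubov-de Gennes solution, with $R$ labeling unit cells, $\mu,\nu$ orbitals, spin indices suppressed and retarded boundary conditions, together with Wannier functions $w_{R\mu}(\mathbf{r})$. A continuum Green's function and local density of states at a real-space point $\mathbf{r}$, for instance at STM tip height, then follow as

```latex
G(\mathbf{r},\mathbf{r}';\omega) = \sum_{R\mu,\,R'\nu} G_{R\mu,R'\nu}(\omega)\, w_{R\mu}(\mathbf{r})\, w^{*}_{R'\nu}(\mathbf{r}'),
\qquad
\rho(\mathbf{r},\omega) = -\frac{1}{\pi}\,\mathrm{Im}\, G(\mathbf{r},\mathbf{r};\omega).
```

The off-diagonal terms with $R\mu \neq R'\nu$ encode interference between Wannier functions centered on different atoms, which is what allows intra-unit-cell patterns with symmetry lower than the lattice to appear in $\rho(\mathbf{r},\omega)$.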
Submitted 19 November, 2014; v1 submitted 29 January, 2014;
originally announced January 2014.