Search | arXiv e-print repository

arXiv:2006.12469

doi 10.1088/2632-2153/ac362b

Attention-based Quantum Tomography

Authors: Peter Cha, Paul Ginsparg, Felix Wu, Juan Carrasquilla, Peter L. McMahon, Eun-Ah Kim

Abstract: With rapid progress across platforms for quantum systems, the problem of many-body quantum state reconstruction for noisy quantum states has become an important challenge. Recent works found promise in recasting quantum state reconstruction as learning the probability distribution of quantum state measurement vectors using generative neural network models. Here we propose Attention-based Quantum Tomography (AQT), a quantum state reconstruction method using an attention-mechanism-based generative network that learns the mixed state density matrix of a noisy quantum state. AQT is based on the model proposed in "Attention is all you need" by Vaswani et al. (2017), which is designed to learn long-range correlations in natural language sentences and thereby outperform previous natural language processing models. We demonstrate not only that AQT outperforms earlier neural-network-based quantum state reconstruction on identical tasks but also that AQT can accurately reconstruct the density matrix associated with a noisy quantum state experimentally realized on an IBMQ quantum computer. We speculate that the success of AQT stems from its ability to model quantum entanglement across the entire quantum system, much as the attention model for natural language processing captures the correlations among words in a sentence.

Submitted 3 November, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Journal ref: Mach. Learn.: Sci. Technol. 3 01LT01

arXiv:1706.04188

doi 10.15252/embj.201695531

Preprint Déjà Vu: an FAQ

Authors: P. Ginsparg

Abstract: I give a brief overview of arXiv history, and describe the current state of arXiv practice, both technical and sociological. This commentary originally appeared in the EMBO Journal, 19 Oct 2016. It was intended as an update on comments from the late 1990s regarding use of preprints by biologists (or lack thereof), but may be of interest to practitioners of other disciplines. It is based largely on a keynote presentation I gave to the ASAPbio inaugural meeting in Feb 2016, and responds as well to some follow-up questions.

Submitted 27 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

Comments: 14 pages. v2: minor clarifications

arXiv:1705.10589

Jeffrey's prior sampling of deep sigmoidal networks

Authors: Lorien X. Hayden, Alexander A. Alemi, Paul H. Ginsparg, James P. Sethna

Abstract: Neural networks have been shown to have a remarkable ability to uncover low-dimensional structure in data: the space of possible reconstructed images forms a reduced model manifold in image space. We explore this idea directly by analyzing the manifold learned by Deep Belief Networks and Stacked Denoising Autoencoders using Monte Carlo sampling. The model manifold forms an only slightly elongated hyperball, with actual reconstructed data appearing predominantly on the boundaries of the manifold. In connection with the results we present, we discuss problems of sampling high-dimensional manifolds as well as recent work [M. Transtrum, G. Hart, and P. Qiu, Submitted (2014)] discussing the relation between high-dimensional geometry and model reduction.

Submitted 25 May, 2017; originally announced May 2017.

arXiv:1605.07228

doi 10.1002/asi.23753

A note concerning Primary Source Knowledge

Authors: HM Collins, P Ginsparg, L Reyes-Galindo

Abstract: We add a small increment to understanding the notion of Primary Source Knowledge, knowledge that the non-expert and the citizen can acquire by assiduously reading the primary scientific journal literature without being embedded in the cultural life of the corresponding technical specialty. This comes from exposing four papers to the automated computer filters used by the physics preprint server arXiv. These filters are used to flag papers in need of further review by human assessors before being promulgated on the server; papers not flagged by the algorithm are generally posted on arXiv without further review. After the filtering, human moderators decide whether papers should be posted based on a relatively low bar of whether they are of interest, relevance and value to the research communities that populate arXiv.

Submitted 23 May, 2016; originally announced May 2016.

Comments: 9 pages

Journal ref: JASIST Dec 2016

arXiv:1503.05543

Text Segmentation based on Semantic Word Embeddings

Authors: Alexander A Alemi, Paul Ginsparg

Abstract: We explore the use of semantic word embeddings in text segmentation algorithms, including the C99 segmentation algorithm and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iterative refinement technique for improving the performance of greedy strategies. We compare our results to known benchmarks, using known metrics. We demonstrate state-of-the-art performance for an untrained method with our Content Vector Segmentation (CVS) on the Choi test set. Finally, we apply the segmentation procedure to an in-the-wild dataset consisting of text extracted from scholarly articles in the arXiv.org database.

Submitted 18 March, 2015; originally announced March 2015.

Comments: 10 pages, 4 figures. KDD2015 submission

arXiv:1412.2716

doi 10.1073/pnas.1415135111

Patterns of Text Reuse in a Scientific Corpus

Authors: Daniel T. Citron, Paul Ginsparg

Abstract: We consider the incidence of text "reuse" by researchers, via a systematic pairwise comparison of the text content of all articles deposited to arXiv.org from 1991–2012. We measure the global frequencies of three classes of text reuse, and measure how chronic text reuse is distributed among authors in the dataset. We infer a baseline for accepted practice, perhaps surprisingly permissive compared with other societal contexts, and a clearly delineated set of aberrant authors. We find a negative correlation between the amount of reused text in an article and its influence, as measured by subsequent citations. Finally, we consider the distribution of countries of origin of articles containing large amounts of reused text.

Submitted 8 December, 2014; originally announced December 2014.

Comments: 6 pages, plus 10 pages of supplementary material. To appear in PNAS (online 8 Dec 2014)

arXiv:1108.2700

It was twenty years ago today ...

Authors: Paul Ginsparg

Abstract: To mark the 20th anniversary of the (14 Aug 1991) commencement of hep-th@xxx.lanl.gov (now arXiv.org), I've adapted this article from one that first appeared in Physics World (2008), was later reprinted (with permission) in Learned Publishing (2009), but never appeared in arXiv. I trace some historical context and early development of the resource, its later trajectory, and close with some thoughts about the future. This version is closer to my original draft, with some updates for this occasion, plus an astounding $2^5$ added footnotes.

Submitted 12 September, 2011; v1 submitted 14 August, 2011; originally announced August 2011.

Comments: 9 pages. v2: additional edifying comments interspersed throughout

arXiv:1010.2757

doi 10.1002/asi.21428

Last but not Least: Additional Positional Effects on Citation and Readership in arXiv

Authors: Asif-ul Haque, Paul Ginsparg

Abstract: We continue our investigation of the correlation between an article's position in the announcements of newly received articles, a single-day artifact, and the citations it receives over the ensuing years. Earlier work [arXiv:0907.4740, arXiv:0805.0307] focused on the "visibility" effect for positions near the beginnings of announcements, and on the "self-promotion" effect associated with authors intentionally aiming for these positions, with both found to correlate with a later enhanced citation rate. Here we consider a "reverse-visibility" effect for positions near the ends of announcements, and a "procrastination" effect associated with submissions made within the 20-minute period just before the daily deadline. For two large subcommunities of theoretical high energy physics, we find a clear "reverse-visibility" effect, in which articles near the ends of the lists receive a boost in both short-term readership and long-term citations, almost comparable in size to the "visibility" effect documented earlier. For one of those subcommunities, we find an additional "procrastination" effect, in which last-position articles submitted shortly before the deadline have an even higher citation rate than those that land more accidentally in that position. We consider and eliminate geographic effects as responsible for the above, and speculate on other possible causes, including "oblivious" and "nightowl" effects.

Submitted 13 October, 2010; originally announced October 2010.

Comments: 13p, appeared JASIST on-line first (12 Oct 2010)

Journal ref: JASIST 61, 2381-2388 (Dec 2010)

arXiv:0907.4740

doi 10.1002/asi.21166

Positional Effects on Citation and Readership in arXiv

Authors: Asif-ul Haque, Paul Ginsparg

Abstract: arXiv.org mediates contact with the literature for entire scholarly communities, both through provision of archival access and through daily email and web announcements of new materials, potentially many screen-lengths long. We confirm and extend a surprising correlation between article position in these initial announcements, ordered by submission time, and later citation impact, due primarily to intentional "self-promotion" on the part of authors. A pure "visibility" effect was also present: the subset of articles accidentally in early positions fared measurably better in the long-term citation record than those lower down. Astrophysics articles announced in position 1, for example, overall received a median number of citations 83% higher, while those there accidentally had a 44% visibility boost. For two large subcommunities of theoretical high energy physics, hep-th and hep-ph articles announced in position 1 had median numbers of citations 50% and 100% larger than for positions 5–15, and the subsets there accidentally had visibility boosts of 38% and 71%. We also consider the positional effects on early readership. The median numbers of early full text downloads for astro-ph, hep-th, and hep-ph articles announced in position 1 were 82%, 61%, and 58% higher than for lower positions, respectively, and those there accidentally had medians visibility-boosted by 53%, 44%, and 46%.
Finally, we correlate a variety of readership features with long-term citations, using machine learning methods, thereby extending previous results on the predictive power of early readership in a broader context. We conclude with some observations on impact metrics and dangers of recommender mechanisms.

Submitted 27 July, 2009; originally announced July 2009.

Comments: 28 pages, to appear in JASIST

Journal ref: JASIST 60, 2203-2218 (Nov 2009)

arXiv:cs/0702012

doi 10.1109/ICDM.2006.126

Plagiarism Detection in arXiv

Authors: Daria Sorokina, Johannes Gehrke, Simeon Warner, Paul Ginsparg

Abstract: We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.

Submitted 1 February, 2007; originally announced February 2007.

Comments: Sixth International Conference on Data Mining (ICDM'06), Dec 2006

arXiv:cs/0609126

doi 10.1087/095315107779490661

E-prints and Journal Articles in Astronomy: a Productive Co-existence

Authors: Edwin A. Henneken, Michael J. Kurtz, Simeon Warner, Paul Ginsparg, Guenther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Donna Thompson, Elizabeth Bohlen, Stephen S. Murray

Abstract: Are the e-prints (electronic preprints) from the arXiv repository being used instead of the journal articles? In this paper we show that the e-prints have not undermined the usage of journal papers in the astrophysics community. As soon as the journal article is published, the astronomical community prefers to read the journal article, and the use of e-prints through the NASA Astrophysics Data System drops to zero. This suggests that the majority of astronomers have access to institutional subscriptions and that they choose to read the journal article when given the choice. Within the NASA Astrophysics Data System they are given this choice, because the e-print and the journal article are treated equally, since both are just one click away. In other words, the e-prints have not undermined journal use in the astrophysics community and thus currently do not pose a financial threat to the publishers. We present readership data for the arXiv category "astro-ph" and the 4 core journals in astronomy (Astrophysical Journal, Astronomical Journal, Monthly Notices of the Royal Astronomical Society and Astronomy & Astrophysics). Furthermore, we show that the half-life (the point where the use of an article drops to half the use of a newly published article) for an e-print is shorter than for a journal paper. The ADS is funded by NASA Grant NNG06GG68G. arXiv receives funding from NSF award #0404553.

Submitted 22 September, 2006; originally announced September 2006.

Comments: 8 pages, 4 figures, submitted to Learned Publishing

Journal ref: Learn.Publ.20:16-22,2007

arXiv:cs/0312018

doi 10.1073/pnas.0308253100

Mapping Subsets of Scholarly Information

Authors: Paul Ginsparg, Paul Houle, Thorsten Joachims, Jae-Hoon Sul

Abstract: We illustrate the use of machine learning techniques to analyze, structure, maintain, and evolve a large online corpus of academic literature. An emerging field of research can be identified as part of an existing corpus, permitting the implementation of a more coherent community structure for its practitioners.

Submitted 11 December, 2003; originally announced December 2003.

Comments: 10 pages, 4 figures, presented at Arthur M. Sackler Colloquium on "Mapping Knowledge Domains", 9–11 May 2003, Beckman Center, Irvine, CA, proceedings to appear in PNAS

ACM Class: H.3.1; H.3.6; I.2.6

Showing 1–12 of 12 results for author: Ginsparg, P