-
Synthetic Pre-Training Tasks for Neural Machine Translation
Authors:
Zexue He,
Graeme Blackwood,
Rameswar Panda,
Julian McAuley,
Rogerio Feris
Abstract:
Pre-training models with large crawled corpora can lead to issues such as toxicity and bias, as well as copyright and privacy concerns. A promising way of alleviating such concerns is to conduct pre-training with synthetic tasks and data, since no real-world information is ingested by the model. Our goal in this paper is to understand the factors that contribute to the effectiveness of pre-training models when using synthetic resources, particularly in the context of neural machine translation. We propose several novel approaches to pre-training translation models that involve different levels of lexical and structural knowledge, including: 1) generating obfuscated data from a large parallel corpus, 2) concatenating phrase pairs extracted from a small word-aligned corpus, and 3) generating synthetic parallel data without real human language corpora. Our experiments on multiple language pairs reveal that pre-training benefits can be realized even with high levels of obfuscation or purely synthetic parallel data. We hope the findings from our comprehensive empirical analysis will shed light on understanding what matters for NMT pre-training, as well as pave the way for the development of more efficient and less toxic models.
Submitted 30 May, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Simulations for Planning Next-Generation Exoplanet Radial Velocity Surveys
Authors:
Patrick D. Newman,
Peter Plavchan,
Jennifer A. Burt,
Johanna Teske,
Eric E. Mamajek,
Stephanie Leifer,
B. Scott Gaudi,
Gary Blackwood,
Rhonda Morgan
Abstract:
Future direct imaging missions such as HabEx and LUVOIR aim to catalog and characterize Earth-mass analogs around nearby stars. The exoplanet yield of these missions will be dependent on the frequency of Earth-like planets, and potentially the a priori knowledge of which stars specifically host suitable planetary systems. Ground or space based radial velocity surveys can potentially perform the pre-selection of targets and assist in the optimization of observation times, as opposed to an uninformed direct imaging survey. In this paper, we present our framework for simulating future radial velocity surveys of nearby stars in support of direct imaging missions. We generate lists of exposure times, observation time-series, and radial velocity time-series given a direct imaging target list. We generate simulated surveys for a proposed set of telescopes and precise radial velocity spectrographs spanning a set of plausible global-network architectures that may be considered for next generation extremely precise radial velocity surveys. We also develop figures of merit for observation frequency and planet detection sensitivity, and compare these across architectures. From these, we draw conclusions, given our stated assumptions and caveats, to optimize the yield of future radial velocity surveys in support of direct imaging missions. We find that all of our considered surveys obtain sufficient numbers of precise observations to meet the minimum theoretical white noise detection sensitivity for Earth-mass habitable zone planets, with margin to explore systematic effects due to stellar activity and correlated noise.
Submitted 29 April, 2022;
originally announced April 2022.
-
Extreme Precision Radial Velocity Working Group Final Report
Authors:
Jonathan Crass,
B. Scott Gaudi,
Stephanie Leifer,
Charles Beichman,
Chad Bender,
Gary Blackwood,
Jennifer A. Burt,
John L. Callas,
Heather M. Cegla,
Scott A. Diddams,
Xavier Dumusque,
Jason D. Eastman,
Eric B. Ford,
Benjamin Fulton,
Rose Gibson,
Samuel Halverson,
Raphaëlle D. Haywood,
Fred Hearty,
Andrew W. Howard,
David W. Latham,
Johannes Löhner-Böttcher,
Eric E. Mamajek,
Annelies Mortier,
Patrick Newman,
Peter Plavchan
, et al. (11 additional authors not shown)
Abstract:
Precise mass measurements of exoplanets discovered by the direct imaging or transit technique are required to determine planet bulk properties and potential habitability. Furthermore, it is generally acknowledged that, for the foreseeable future, the Extreme Precision Radial Velocity (EPRV) measurement technique is the only method potentially capable of detecting and measuring the masses and orbits of habitable-zone Earths orbiting nearby F, G, and K spectral-type stars from the ground. In particular, EPRV measurements with a precision of better than approximately 10 cm/s (with a few cm/s stability over many years) are required. Unfortunately, for nearly a decade, PRV instruments and surveys have been unable to routinely reach RV accuracies of less than roughly 1 m/s. Making EPRV science and technology development a critical component of both NASA and NSF program plans is crucial for reaching the goal of detecting potentially habitable Earthlike planets and supporting potential future exoplanet direct imaging missions such as the Habitable Exoplanet Observatory (HabEx) or the Large Ultraviolet Optical Infrared Surveyor (LUVOIR). In recognition of these facts, the 2018 National Academy of Sciences (NAS) Exoplanet Science Strategy (ESS) report recommended the development of EPRV measurements as a critical step toward the detection and characterization of habitable, Earth-analog planets. In response to the NAS-ESS recommendation, NASA and NSF commissioned the EPRV Working Group to recommend a ground-based program architecture and implementation plan to achieve the goal intended by the NAS. This report documents the activities, findings, and recommendations of the EPRV Working Group.
Submitted 29 July, 2021;
originally announced July 2021.
-
Multilingual Neural Machine Translation with Task-Specific Attention
Authors:
Graeme Blackwood,
Miguel Ballesteros,
Todd Ward
Abstract:
Multilingual machine translation addresses the task of translating between multiple source and target languages. We propose task-specific attention models, a simple but effective technique for improving the quality of sequence-to-sequence neural multilingual translation. Our approach seeks to retain as much of the parameter sharing generalization of NMT models as possible, while still allowing for language-specific specialization of the attention model to a particular language-pair or task. Our experiments on four languages of the Europarl corpus show that using a target-specific model of attention provides consistent gains in translation quality for all possible translation directions, compared to a model in which all parameters are shared. We observe improved translation quality even in the (extreme) low-resource zero-shot translation directions for which the model never saw explicitly paired parallel data.
Submitted 8 June, 2018;
originally announced June 2018.
-
Enabling concepts for a dual spacecraft formation-flying optical interferometer for NASA's ST3 mission
Authors:
Peter W. Gorham,
William M. Folkner,
Gary H. Blackwood
Abstract:
We present the enabling concept and technology for a dual spacecraft formation-flying optical interferometer, to be launched into a deep space orbit as Space Technology 3. The combiner spacecraft makes use of a nested cat's eye delay line configuration that minimizes wavefront distortion and stores 20 m of optical pathlength in a package of approximately 1.5 m length. A parabolic trajectory for the secondary collector spacecraft enables baselines of up to 200 m for a fixed 20 m stored delay and spacecraft separations of up to 1 km.
Submitted 15 December, 1999;
originally announced December 1999.