Search | arXiv e-print repository

Showing 1–29 of 29 results for author: Stahlberg, F

Searching in archive cs.
  1. arXiv:2310.13678  [pdf, other]

    cs.CL cs.AI cs.LG

    Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

    Authors: Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu

    Abstract: One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs…

    Submitted 23 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: accepted to the Findings of EMNLP 2023. arXiv admin note: text overlap with arXiv:2212.09895

  2. arXiv:2308.11807  [pdf, other]

    cs.CL

    Towards an On-device Agent for Text Rewriting

    Authors: Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting. Nonetheless, the large sizes of these models make them impractical for on-device inference, which would otherwise allow for enhanced privacy and economical inference. Creating a smaller yet potent language model for text rewriting presents a formidable challenge because it requires balancing the need for a s…

    Submitted 22 August, 2023; originally announced August 2023.

  3. arXiv:2212.09895  [pdf, other]

    cs.CL

    Improved Long-Form Spoken Language Translation with Large Language Models

    Authors: Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng

    Abstract: A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmen…

    Submitted 19 December, 2022; originally announced December 2022.

  4. arXiv:2211.04126  [pdf, other]

    cs.CL

    Conciseness: An Overlooked Language Task

    Authors: Felix Stahlberg, Aashish Kumar, Chris Alberti, Shankar Kumar

    Abstract: We report on novel investigations into training models that make sentences concise. We define the task and show that it is different from related tasks such as summarization and simplification. For evaluation, we release two test sets, consisting of 2000 sentences each, that were annotated by two and five human annotators, respectively. We demonstrate that conciseness is a difficult task for which…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability (TSAR)

  5. arXiv:2206.07043  [pdf, other]

    cs.CL

    Text Generation with Text-Editing Models

    Authors: Eric Malmi, Yue Dong, Jonathan Mallinson, Aleksandr Chuklin, Jakub Adamek, Daniil Mirylenka, Felix Stahlberg, Sebastian Krause, Shankar Kumar, Aliaksei Severyn

    Abstract: Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait - they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the outpu…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted as a tutorial at NAACL 2022

  6. arXiv:2205.00704  [pdf, other]

    cs.CL

    Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES

    Authors: Felix Stahlberg, Shankar Kumar

    Abstract: The softmax layer in neural machine translation is designed to model the distribution over mutually exclusive tokens. Machine translation, however, is intrinsically uncertain: the same source sentence can have multiple semantically equivalent translations. Therefore, we propose to replace the softmax activation with a multi-label classification layer that can model ambiguity more effectively. We c…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: NAACL 2022 paper

  7. arXiv:2204.00471  [pdf, other]

    cs.CL

    Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models

    Authors: Felix Stahlberg, Ilia Kulikov, Shankar Kumar

    Abstract: In many natural language processing (NLP) tasks the same input (e.g. source sentence) can have multiple possible outputs (e.g. translations). To analyze how this ambiguity (also known as intrinsic uncertainty) shapes the distribution learned by neural sequence models we measure sentence-level uncertainty by computing the degree of overlap between references in multi-reference test sets from two di…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: ACL 2022 paper

  8. arXiv:2202.00153  [pdf, other]

    cs.LG

    Transformer-based Models of Text Normalization for Speech Applications

    Authors: Jae Hun Ro, Felix Stahlberg, Ke Wu, Shankar Kumar

    Abstract: Text normalization, or the process of transforming text into a consistent, canonical form, is crucial for speech applications such as text-to-speech synthesis (TTS). In TTS, the system must decide whether to verbalize "1995" as "nineteen ninety five" in "born in 1995" or as "one thousand nine hundred ninety five" in "page 1995". We present an experimental comparison of various Transformer-based se…

    Submitted 31 January, 2022; originally announced February 2022.

  9. arXiv:2105.13318  [pdf, other]

    cs.CL

    Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models

    Authors: Felix Stahlberg, Shankar Kumar

    Abstract: Synthetic data generation is widely known to boost the accuracy of neural grammatical error correction (GEC) systems, but existing methods often lack diversity or are too simplistic to generate the broad range of grammatical errors made by human writers. In this work, we use error type tags from automatic annotation tools such as ERRANT to guide synthetic data generation. We compare several models…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, 2021. https://github.com/google-research-datasets/C4_200M-synthetic-dataset-for-grammatical-error-correction

  10. arXiv:2009.11136  [pdf, other]

    cs.CL

    Seq2Edits: Sequence Transduction Using Span-level Edit Operations

    Authors: Felix Stahlberg, Shankar Kumar

    Abstract: We propose Seq2Edits, an open-vocabulary approach to sequence editing for natural language processing (NLP) tasks with a high degree of overlap between input and output texts. In this approach, each sequence-to-sequence transduction is represented as a sequence of edit operations, where each operation either replaces an entire source span with target tokens or keeps it unchanged. We evaluate our m…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: Accepted at EMNLP 2020

  11. arXiv:2005.01483  [pdf, other]

    cs.CL

    Using Context in Neural Machine Translation Training Objectives

    Authors: Danielle Saunders, Felix Stahlberg, Bill Byrne

    Abstract: We present Neural Machine Translation (NMT) training using document-level metrics with batch-level documents. Previous sequence-objective approaches to NMT training focus exclusively on sentence-level metrics like sentence BLEU which do not correspond to the desired evaluation metric, typically document BLEU. Meanwhile research into document-level NMT training focuses on data or model architecture…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  12. arXiv:1912.02047  [pdf, other]

    cs.CL

    Neural Machine Translation: A Review and Survey

    Authors: Felix Stahlberg

    Abstract: The field of machine translation (MT), the automatic translation of written text from one natural language into another, has experienced a major paradigm shift in recent years. Statistical MT, which mainly relies on various count-based models and which used to dominate MT research for decades, has largely been superseded by neural machine translation (NMT), which tackles translation with a single…

    Submitted 29 September, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Extended version of "Neural Machine Translation: A Review" accepted by the Journal of Artificial Intelligence Research (JAIR)

  13. arXiv:1908.10090  [pdf, other]

    cs.CL

    On NMT Search Errors and Model Errors: Cat Got Your Tongue?

    Authors: Felix Stahlberg, Bill Byrne

    Abstract: We report on search errors and model errors in neural machine translation (NMT). We present an exact inference procedure for neural sequence models based on a combination of beam search and depth-first search. We use our exact search to find the global best model scores under a Transformer base model for the entire WMT15 English-German test set. Surprisingly, beam search fails to find these global…

    Submitted 27 August, 2019; originally announced August 2019.

    Comments: EMNLP-2019

  14. arXiv:1907.00168  [pdf, other]

    cs.CL

    The CUED's Grammatical Error Correction Systems for BEA-2019

    Authors: Felix Stahlberg, Bill Byrne

    Abstract: We describe two entries from the Cambridge University Engineering Department to the BEA 2019 Shared Task on grammatical error correction. Our submission to the low-resource track is based on prior work on using finite state transducers together with strong neural language models. Our system for the restricted track is a purely neural system consisting of neural language models and neural machine t…

    Submitted 29 June, 2019; originally announced July 2019.

    Comments: BEA-2019 (ACL2019 workshop) shared task system description

  15. arXiv:1906.05786  [pdf, other]

    cs.CL

    UCAM Biomedical translation at WMT19: Transfer learning multi-domain ensembles

    Authors: Danielle Saunders, Felix Stahlberg, Bill Byrne

    Abstract: The 2019 WMT Biomedical translation task involved translating Medline abstracts. We approached this using transfer learning to obtain a series of strong neural models on distinct domains, and combining them into multi-domain ensembles. We further experiment with an adaptive language-model ensemble weighting scheme. Our submission achieved the best submitted results on both directions of English-Sp…

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: To appear at WMT19

  16. arXiv:1906.05447  [pdf, other]

    cs.CL

    CUED@WMT19:EWC&LMs

    Authors: Felix Stahlberg, Danielle Saunders, Adria de Gispert, Bill Byrne

    Abstract: Two techniques provide the fabric of the Cambridge University Engineering Department's (CUED) entry to the WMT19 evaluation campaign: elastic weight consolidation (EWC) and different forms of language modelling (LMs). We report substantial gains by fine-tuning very strong baselines on former WMT test sets using a combination of checkpoint averaging and EWC. A sentence-level Transformer LM and a do…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: WMT2019 system description (University of Cambridge)

  17. arXiv:1906.00408  [pdf, other]

    cs.CL

    Domain Adaptive Inference for Neural Machine Translation

    Authors: Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne

    Abstract: We investigate adaptive ensemble weighting for Neural Machine Translation, addressing the case of improving performance on a new and potentially unknown domain without sacrificing performance on the original domain. We adapt sequentially across two Spanish-English and three English-German tasks, comparing unregularized fine-tuning, L2 and Elastic Weight Consolidation. We then report a novel scheme…

    Submitted 2 June, 2019; originally announced June 2019.

    Comments: To appear at ACL 2019

  18. arXiv:1903.10625  [pdf, other]

    cs.CL

    Neural Grammatical Error Correction with Finite State Transducers

    Authors: Felix Stahlberg, Christopher Bryant, Bill Byrne

    Abstract: Grammatical error correction (GEC) is one of the areas in natural language processing in which purely neural models have not yet superseded more traditional symbolic models. Hybrid systems combining phrase-based statistical machine translation (SMT) and neural sequence models are currently among the most effective approaches to GEC. However, both SMT and neural sequence-to-sequence models require…

    Submitted 5 April, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

    Comments: NAACL 2019

  19. arXiv:1809.00125  [pdf, other]

    cs.CL

    Simple Fusion: Return of the Language Model

    Authors: Felix Stahlberg, James Cross, Veselin Stoyanov

    Abstract: Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation. We investigate an alternative simple method to use monolingual data for NMT training: We combine the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM) while the TM is trained from scratch. To achieve that, we train the translation model to predi…

    Submitted 24 January, 2019; v1 submitted 1 September, 2018; originally announced September 2018.

    Comments: WMT18 paper

  20. arXiv:1808.09688  [pdf, other]

    cs.CL

    An Operation Sequence Model for Explainable Neural Machine Translation

    Authors: Felix Stahlberg, Danielle Saunders, Bill Byrne

    Abstract: We propose to achieve explainable neural machine translation (NMT) by changing the output representation to explain itself. We present a novel approach to NMT which generates the target sentence by monotonically walking through the source sentence. Word reordering is modeled by operations which allow setting markers in the target sentence and move a target-side write head between those markers. In…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: BlackboxNLP workshop at EMNLP 2018

  21. arXiv:1808.09465  [pdf, other]

    cs.CL

    The University of Cambridge's Machine Translation Systems for WMT18

    Authors: Felix Stahlberg, Adria de Gispert, Bill Byrne

    Abstract: The University of Cambridge submission to the WMT18 news translation task focuses on the combination of diverse models of translation. We compare recurrent, convolutional, and self-attention-based neural models on German-English, English-German, and Chinese-English. Our final system combines all neural models together with a phrase-based SMT system in an MBR-based scheme. We report small but consi…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: WMT18 system description paper

  22. arXiv:1805.00456  [pdf, other]

    cs.CL

    Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT

    Authors: Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne

    Abstract: We explore strategies for incorporating target syntax into Neural Machine Translation. We specifically focus on syntax in ensembles containing multiple sentence representations. We formulate beam search over such ensembles using WFSTs, and describe a delayed SGD update training procedure that is especially effective for long representations like linearized syntax. Our approach gives state-of-the-a…

    Submitted 11 May, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

    Comments: to appear at ACL 2018

  23. arXiv:1803.07204  [pdf, other]

    cs.CL

    Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation

    Authors: Felix Stahlberg, Danielle Saunders, Gonzalo Iglesias, Bill Byrne

    Abstract: SGNMT is a decoding platform for machine translation which allows pairing various modern neural models of translation with different kinds of constraints and symbolic models. In this paper, we describe three use cases in which SGNMT is currently playing an active role: (1) teaching as SGNMT is being used for course work and student theses in the MPhil in Machine Learning, Speech and Language Techno…

    Submitted 19 March, 2018; originally announced March 2018.

    Comments: Presented at AMTA 2018

  24. arXiv:1708.01809  [pdf, other]

    cs.CL

    A Comparison of Neural Models for Word Ordering

    Authors: Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adrià de Gispert, Bill Byrne

    Abstract: We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly outperforms existing models. We also describe a novel search strategy for LM-based word ordering and report results on the English Penn Treebank. Our best model…

    Submitted 5 August, 2017; originally announced August 2017.

    Comments: Accepted for publication at INLG 2017

  25. arXiv:1707.06885  [pdf, other]

    cs.CL

    SGNMT -- A Flexible NMT Decoding Platform for Quick Prototyping of New Models and Search Strategies

    Authors: Felix Stahlberg, Eva Hasler, Danielle Saunders, Bill Byrne

    Abstract: This paper introduces SGNMT, our experimental platform for machine translation research. SGNMT provides a generic interface to neural and symbolic scoring modules (predictors) with left-to-right semantics such as translation models like NMT, language models, translation lattices, $n$-best lists or other kinds of scores and constraints. Predictors can be combined with other predictors to form comple…

    Submitted 21 July, 2017; originally announced July 2017.

    Comments: Accepted as EMNLP 2017 demo paper

  26. arXiv:1704.03279  [pdf, other]

    cs.CL

    Unfolding and Shrinking Neural Machine Translation Ensembles

    Authors: Felix Stahlberg, Bill Byrne

    Abstract: Ensembling is a well-known technique in neural machine translation (NMT) to improve system performance. Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for prod…

    Submitted 21 July, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

    Comments: Accepted at EMNLP 2017

  27. arXiv:1612.03791  [pdf, other]

    cs.CL

    Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices

    Authors: Felix Stahlberg, Adrià de Gispert, Eva Hasler, Bill Byrne

    Abstract: We present a novel scheme to combine neural machine translation (NMT) with traditional statistical machine translation (SMT). Our approach borrows ideas from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is combined with the Bayes-risk of the translation according to the SMT lattice. This makes our approach much more flexible than $n$-best list or lattice rescoring as the neur…

    Submitted 13 February, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

    Comments: EACL2017 short paper

  28. arXiv:1606.04963  [pdf, other]

    cs.CL

    The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16

    Authors: Felix Stahlberg, Eva Hasler, Bill Byrne

    Abstract: This paper presents the University of Cambridge submission to WMT16. Motivated by the complementary nature of syntactical machine translation and neural machine translation (NMT), we exploit the synergies of Hiero and NMT in different combination schemes. Starting out with a simple neural lattice rescoring approach, we show that the Hiero lattices are often too narrow for NMT ensembles. Therefore,…

    Submitted 15 June, 2016; originally announced June 2016.

  29. Syntactically Guided Neural Machine Translation

    Authors: Felix Stahlberg, Eva Hasler, Aurelien Waite, Bill Byrne

    Abstract: We investigate the use of hierarchical phrase-based SMT lattices in end-to-end neural machine translation (NMT). Weight pushing transforms the Hiero scores for complete translation hypotheses, with the full translation grammar score and full n-gram language model score, into posteriors compatible with NMT predictive probabilities. With a slightly modified NMT beam-search decoder we find gains over…

    Submitted 19 May, 2016; v1 submitted 15 May, 2016; originally announced May 2016.

    Comments: ACL 2016