Search | arXiv e-print repository

Showing 1–8 of 8 results for author: Merity, S

  1. arXiv:1911.11423  [pdf, other]

    cs.CL cs.AI cs.NE

    Single Headed Attention RNN: Stop Thinking With Your Head

    Authors: Stephen Merity

    Abstract: The leading approaches in language modeling are all obsessed with TV shows of my youth - namely Transformers and Sesame Street. Transformers this, Transformers that, and over here a bonfire worth of GPU-TPU-neuromorphic wafer scale silicon. We opt for the lazy path of old and proven techniques with a fancy crypto inspired acronym: the Single Headed Attention RNN (SHA-RNN). The author's lone goal i…

    Submitted 27 November, 2019; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: Addition of citations and contextual results (no attention head, single attention head, attention per layer), removal of wordpiece WikiText-103 numbers due to normalization issues, fix of SHA attention figure Q arrow, other minor fixes

  2. arXiv:1803.08240  [pdf, other]

    cs.CL cs.AI cs.NE

    An Analysis of Neural Language Modeling at Multiple Scales

    Authors: Stephen Merity, Nitish Shirish Keskar, Richard Socher

    Abstract: Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-…

    Submitted 22 March, 2018; originally announced March 2018.

  3. arXiv:1712.07316  [pdf, other]

    cs.CL cs.LG stat.ML

    A Flexible Approach to Automated RNN Architecture Generation

    Authors: Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher

    Abstract: The process of designing neural architectures requires expert knowledge and extensive trial and error. While automated architecture search may simplify these requirements, the recurrent neural network (RNN) architectures generated by existing methods are limited in both flexibility and components. We propose a domain-specific language (DSL) for use in automated architecture search which can produc…

    Submitted 19 December, 2017; originally announced December 2017.

  4. arXiv:1708.02182  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Regularizing and Optimizing LSTM Language Models

    Authors: Stephen Merity, Nitish Shirish Keskar, Richard Socher

    Abstract: Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose th…

    Submitted 7 August, 2017; originally announced August 2017.

  5. arXiv:1708.01009  [pdf, ps, other]

    cs.CL cs.NE

    Revisiting Activation Regularization for Language RNNs

    Authors: Stephen Merity, Bryan McCann, Richard Socher

    Abstract: Recurrent neural networks (RNNs) serve as a fundamental building block for many sequence tasks across natural language processing. Recent research has focused on recurrent dropout techniques or custom RNN cells in order to improve performance. Both of these can require substantial modifications to the machine learning model or to the underlying RNN configurations. We revisit traditional regulariza…

    Submitted 3 August, 2017; originally announced August 2017.

  6. arXiv:1611.01576  [pdf, other]

    cs.NE cs.AI cs.CL cs.LG

    Quasi-Recurrent Neural Networks

    Authors: James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher

    Abstract: Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps…

    Submitted 21 November, 2016; v1 submitted 4 November, 2016; originally announced November 2016.

    Comments: Submitted to conference track at ICLR 2017

  7. arXiv:1609.07843  [pdf, other]

    cs.CL cs.AI

    Pointer Sentinel Mixture Models

    Authors: Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

    Abstract: Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either…

    Submitted 26 September, 2016; originally announced September 2016.

  8. arXiv:1603.01417  [pdf, other]

    cs.NE cs.CL cs.CV

    Dynamic Memory Networks for Visual and Textual Question Answering

    Authors: Caiming Xiong, Stephen Merity, Richard Socher

    Abstract: Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training…

    Submitted 4 March, 2016; originally announced March 2016.