Search | arXiv e-print repository

Showing 1–50 of 93 results for author: Hwang, D

Searching in archive cs.
  1. arXiv:2407.07950  [pdf, other]

    cs.CL cs.AI cs.HC

    Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance

    Authors: Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Nouha Dziri, Dan Jurafsky, Maarten Sap

    Abstract: The reconfiguration of human-LM interactions from simple sentence completions to complex, multi-domain, humanlike engagements necessitates new methodologies to understand how humans choose to rely on LMs. In our work, we contend that reliance is influenced by numerous factors within the interactional context of a generation, a departure from prior work that used verbalized confidence (e.g., "I'm c…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Preprint

  2. arXiv:2407.04249  [pdf, other]

    cs.CV

    FeatureSORT: Essential Features for Effective Tracking

    Authors: Hamidreza Hashempoor, Rosemary Koikara, Yu Dong Hwang

    Abstract: In this work, we introduce a novel tracker designed for online multiple object tracking with a focus on being simple yet effective. We provide multiple feature modules, each of which represents a particular type of appearance information. By integrating distinct appearance features, including clothing color, style, and target direction, alongside a ReID network for robust embedding extraction, our…

    Submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2406.06072  [pdf, other]

    cs.CV cs.LG cs.RO

    Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

    Authors: Dongyoon Hwang, Byungkun Lee, Hojoon Lee, Hyunseung Kim, Jaegul Choo

    Abstract: Vision Transformers (ViT), when paired with large-scale pretraining, have shown remarkable performance across various computer vision tasks, primarily due to their weak inductive bias. However, while such weak inductive bias aids in pretraining scalability, this may hinder the effective adaptation of ViTs for visuo-motor control tasks as a result of the absence of control-centric inductive biases.…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  4. arXiv:2406.06037  [pdf, other]

    cs.LG cs.AI cs.CV

    Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

    Authors: Donghu Kim, Hojoon Lee, Kyungmin Lee, Dongyoon Hwang, Jaegul Choo

    Abstract: Recently, various pre-training methods have been introduced in vision-based Reinforcement Learning (RL). However, their generalization ability remains unclear due to evaluations being limited to in-distribution environments and non-unified experimental setups. To address this, we introduce the Atari Pre-training Benchmark (Atari-PB), which pre-trains a ResNet-50 model on 10 million transitions fro…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  5. arXiv:2406.00324  [pdf, other]

    cs.LG cs.AI

    Do's and Don'ts: Learning Desirable Skills with Instruction Videos

    Authors: Hyunseung Kim, Byungkun Lee, Hojoon Lee, Dongyoon Hwang, Donghu Kim, Jaegul Choo

    Abstract: Unsupervised skill discovery is a learning paradigm that aims to acquire diverse behaviors without explicit rewards. However, it faces challenges in learning complex behaviors and often leads to learning unsafe or undesirable behaviors. For instance, in various continuous control tasks, current unsupervised skill discovery methods succeed in learning basic locomotions like standing but struggle wi…

    Submitted 1 June, 2024; originally announced June 2024.

  6. arXiv:2405.19703  [pdf, other]

    cs.LG cs.CV stat.ML

    Towards a Better Evaluation of Out-of-Domain Generalization

    Authors: Duhun Hwang, Suhyun Kang, Moonjung Eo, Jimyeong Kim, Wonjong Rhee

    Abstract: The objective of Domain Generalization (DG) is to devise algorithms and models capable of achieving high performance on previously unseen test distributions. In the pursuit of this objective, the average measure has been employed as the prevalent measure for evaluating models and comparing algorithms in the existing DG studies. Despite its significance, a comprehensive exploration of the average measu…

    Submitted 2 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  7. arXiv:2405.12807  [pdf, other]

    cs.LG cs.AI cs.IT

    FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information

    Authors: Dongseong Hwang

    Abstract: This paper establishes a mathematical foundation for the Adam optimizer, elucidating its connection to natural gradient descent through Riemannian and information geometry. We rigorously analyze the diagonal empirical Fisher information matrix (FIM) in Adam, clarifying all detailed approximations and advocating for the use of log probability functions as loss, which should be based on discrete dis…

    Submitted 9 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: 21 pages, 4 figures, 6 tables

  8. arXiv:2404.09173  [pdf, other]

    cs.LG cs.AI cs.CL

    TransformerFAM: Feedback attention is working memory

    Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

    Abstract: While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, a…

    Submitted 7 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 26 pages, 12 figures, 14 tables

  9. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  10. arXiv:2403.14238  [pdf, other]

    cs.CL cs.AI

    Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection

    Authors: Kyungjae Lee, Dasol Hwang, Sunghyun Park, Youngsoo Jang, Moontae Lee

    Abstract: Despite the promise of RLHF in aligning LLMs with human preferences, it often leads to superficial alignment, prioritizing stylistic changes over improving downstream performance of LLMs. Underspecified preferences could obscure directions to align the models. A lack of exploration restricts the identification of desirable outputs to improve the models. To overcome these challenges, we propose a novel f…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 22 pages, 5 figures, Submitted to ACL 2024

  11. arXiv:2403.12821  [pdf, other]

    cs.LG cs.AI

    FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

    Authors: Dongyeong Hwang, Hyunju Kim, Sunwoo Kim, Kijung Shin

    Abstract: The success of a specific neural network architecture is closely tied to the dataset and task it tackles; there is no one-size-fits-all solution. Thus, considerable efforts have been made to quickly and accurately estimate the performances of neural architectures, without full training or evaluation, for given tasks and datasets. Neural architecture encoding has played a crucial role in the estima…

    Submitted 21 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 Camera-Ready

  12. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  13. arXiv:2402.17184  [pdf, other]

    cs.CL cs.SD eess.AS

    Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

    Authors: Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno

    Abstract: The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires computationally efficient strategies for decoding. In the present work, we study one such strategy: applying multiple frame reduction layers in the enc…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  14. arXiv:2402.01183  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    LINGO-Space: Language-Conditioned Incremental Grounding for Space

    Authors: Dohyun Kim, Nayoung Oh, Deokmin Hwang, Daehyung Park

    Abstract: We aim to solve the problem of spatially localizing composite instructions referring to space: space grounding. Compared to current instance grounding, space grounding is challenging due to the ill-posedness of identifying locations referred to by discrete expressions and the compositional ambiguity of referring expressions. Therefore, we propose a novel probabilistic space-grounding methodology (…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI 2024

  15. arXiv:2401.06730  [pdf, other]

    cs.CL cs.AI cs.HC

    Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

    Authors: Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Maarten Sap

    Abstract: As natural language becomes the default interface for human-AI interaction, there is a need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are…

    Submitted 9 July, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: ACL 2024 (Camera Ready)

  16. arXiv:2401.05730  [pdf, other]

    cs.CV cs.AI

    Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing

    Authors: Jaeill Kim, Duhun Hwang, Eunjung Lee, Jangwon Suh, Jimyeong Kim, Wonjong Rhee

    Abstract: In the past few years, contrastive learning has played a central role in the success of visual unsupervised representation learning. Around the same time, high-performance non-contrastive learning methods have been developed as well. While most of the works utilize only two views, we carefully review the existing multi-view methods and propose a general multi-view strategy that can improve learni…

    Submitted 11 January, 2024; originally announced January 2024.

  17. arXiv:2312.10087  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Revisiting the Entropy Semiring for Neural Speech Recognition

    Authors: Oscar Chang, Dongseong Hwang, Olivier Siohan

    Abstract: In streaming settings, speech recognition models have to map sub-sequences of speech to text before the full audio stream becomes available. However, since alignment information between speech and text is rarely available during training, models need to learn it in a completely self-supervised way. In practice, the exponential number of possible alignments makes this extremely challenging, with mo…

    Submitted 18 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

  18. arXiv:2312.07399  [pdf, other]

    cs.CL cs.AI

    Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

    Authors: Taeyoon Kwon, Kai Tzu-iunn Ong, Dongjin Kang, Seungjun Moon, Jeong Ryong Lee, Dosik Hwang, Yongsik Sim, Beomseok Sohn, Dongha Lee, Jinyoung Yeo

    Abstract: Machine reasoning has made great progress in recent years owing to large language models (LLMs). In the clinical domain, however, most NLP-driven projects mainly focus on clinical classification or reading comprehension, and under-explore clinical reasoning for disease diagnosis due to the high cost of rationale annotation by clinicians. In this work, we present a "reasoning-aware" diagnosis framew…

    Submitted 10 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  19. arXiv:2311.16586  [pdf, other]

    cs.IR

    SARDINE: A Simulator for Automated Recommendation in Dynamic and Interactive Environments

    Authors: Romain Deffayet, Thibaut Thonet, Dongyoon Hwang, Vassilissa Lehoux, Jean-Michel Renders, Maarten de Rijke

    Abstract: Simulators can provide valuable insights for researchers and practitioners who wish to improve recommender systems, because they allow one to easily tweak the experimental setup in which recommender systems operate, and as a result lower the cost of identifying general trends and uncovering novel findings about the candidate methods. A key requirement to enable this accelerated improvement cycle i…

    Submitted 8 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  20. arXiv:2311.11215  [pdf, other]

    cs.CL cs.AI

    SPLAIN: Augmenting Cybersecurity Warnings with Reasons and Data

    Authors: Vera A. Kazakova, Jena D. Hwang, Bonnie J. Dorr, Yorick Wilks, J. Blake Gage, Alex Memory, Mark A. Clark

    Abstract: Effective cyber threat recognition and prevention demand comprehensible forecasting systems, as prior approaches commonly offer limited and, ultimately, unconvincing information. We introduce Simplified Plaintext Language (SPLAIN), a natural language generator that converts warning data into user-friendly cyber threat explanations. SPLAIN is designed to generate clear, actionable outputs, incorpor…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: Presented at FLAIRS-2019 as poster (see ancillary files)

    ACM Class: I.2

    Journal ref: FLAIRS-2019

  21. arXiv:2311.08469  [pdf, other]

    cs.CL

    UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

    Authors: Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

    Abstract: Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common, everyday situations. To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning. Given a piece of context with an unexp…

    Submitted 1 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: accepted at NAACL'24

  22. arXiv:2311.00059  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    The Generative AI Paradox: "What It Can Create, It May Not Understand"

    Authors: Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi

    Abstract: The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-exp…

    Submitted 31 October, 2023; originally announced November 2023.

  23. arXiv:2310.20178  [pdf, other]

    cs.LG cs.AI

    Learning to Discover Skills through Guidance

    Authors: Hyunseung Kim, Byungkun Lee, Hojoon Lee, Dongyoon Hwang, Sejik Park, Kyushik Min, Jaegul Choo

    Abstract: In the field of unsupervised skill discovery (USD), a major challenge is limited exploration, primarily due to substantial penalties when skills deviate from their initial trajectories. To enhance exploration, recent methodologies employ auxiliary rewards to maximize the epistemic uncertainty or entropy of states. However, we have identified that the effectiveness of these rewards declines as the…

    Submitted 1 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 29 pages, 18 figures, published at NeurIPS 2023

  24. arXiv:2310.17793  [pdf, other]

    cs.CL cs.AI

    "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation

    Authors: Allyson Ettinger, Jena D. Hwang, Valentina Pyatkin, Chandra Bhagavatula, Yejin Choi

    Abstract: Large language models (LLMs) show amazing proficiency and fluency in the use of language. Does this mean that they have also acquired insightful linguistic knowledge about the language, to an extent that they can serve as an "expert linguistic annotator"? In this paper, we examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure, focus…

    Submitted 11 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings (short)

  25. arXiv:2310.14356  [pdf, other]

    cs.CV cs.CL cs.CY cs.HC

    Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception

    Authors: Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna

    Abstract: Computer vision often treats human perception as homogeneous: an implicit assumption that visual stimuli are perceived similarly by everyone. This assumption is reflected in the way researchers collect datasets and train vision models. By contrast, literature in cross-cultural psychology and linguistics has provided evidence that people from different cultural backgrounds observe vastly different…

    Submitted 9 March, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

  26. arXiv:2309.12963  [pdf, ps, other]

    eess.AS cs.SD

    Massive End-to-end Models for Short Search Queries

    Authors: Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

    Abstract: In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters. The encoders of our models use the neural architecture of Google's universal speech model (USM), with additional funnel pooling layers to signifi…

    Submitted 22 September, 2023; originally announced September 2023.

  27. arXiv:2309.09996  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Improving Speech Recognition for African American English With Audio Classification

    Authors: Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar

    Abstract: Automatic speech recognition (ASR) systems have been shown to have large quality disparities between the language varieties they are intended or expected to recognize. One way to mitigate this is to train or fine-tune models with more representative datasets. But this approach can be hindered by limited in-domain data for training and evaluation. We propose a new way to improve the robustness of a…

    Submitted 16 September, 2023; originally announced September 2023.

  28. Towards Validating Long-Term User Feedbacks in Interactive Recommendation Systems

    Authors: Hojoon Lee, Dongyoon Hwang, Kyushik Min, Jaegul Choo

    Abstract: Interactive Recommender Systems (IRSs) have attracted a lot of attention, due to their ability to model interactive processes between users and recommender systems. Numerous approaches have adopted Reinforcement Learning (RL) algorithms, as these can directly maximize users' cumulative rewards. In IRS, researchers commonly utilize publicly available review datasets to compare and evaluate algorith…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted to SIGIR'22

  29. arXiv:2306.05637  [pdf, other]

    cs.LG

    On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning

    Authors: Hojoon Lee, Koanho Lee, Dongyoon Hwang, Hyunho Lee, Byungkun Lee, Jaegul Choo

    Abstract: Recently, unsupervised representation learning (URL) has improved the sample efficiency of Reinforcement Learning (RL) by pretraining a model from a large unlabeled dataset. The underlying principle of these methods is to learn temporally predictive representations by predicting future states in the latent space. However, an important challenge of this approach is the representational collapse, wh…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML 2023

  30. arXiv:2306.01985  [pdf, other]

    cs.CL

    COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

    Authors: Xuhui Zhou, Hao Zhu, Akhila Yerukola, Thomas Davidson, Jena D. Hwang, Swabha Swayamdipta, Maarten Sap

    Abstract: Warning: This paper contains content that may be offensive or upsetting. Understanding the harms and offensiveness of statements requires reasoning about the social and situational context in which statements are made. For example, the utterance "your English is very good" may implicitly signal an insult when uttered by a white man to a non-white colleague, but uttered by an ESL teacher to their s…

    Submitted 8 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted to Findings of ACL 2023

  31. arXiv:2306.01789  [pdf, other]

    cs.SD cs.CL eess.AS

    Edit Distance based RL for RNNT decoding

    Authors: Dongseong Hwang, Changwan Ryu, Khe Chai Sim

    Abstract: RNN-T is currently considered the industry standard in ASR due to its exceptional WERs in various benchmark tests and its ability to support seamless streaming and longform transcription. However, its biggest drawback lies in the significant discrepancy between its training and inference objectives. During training, RNN-T maximizes all alignment probabilities by teacher forcing, while during infer…

    Submitted 14 July, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures

  32. arXiv:2305.19472  [pdf, other]

    cs.CL cs.AI cs.LG

    PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning

    Authors: Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi

    Abstract: Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using…

    Submitted 26 July, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: cited new paper, 27 pages

  33. arXiv:2305.18654  [pdf, other]

    cs.CL cs.AI cs.LG

    Faith and Fate: Limits of Transformers on Compositionality

    Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi

    Abstract: Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the li…

    Submitted 31 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: 10 pages + appendix (40 pages)

  34. arXiv:2305.13408  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Modular Domain Adaptation for Conformer-Based Streaming ASR

    Authors: Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar

    Abstract: Speech data from different domains has distinct acoustic and linguistic characteristics. It is common to train a single multidomain model such as a Conformer transducer for speech recognition on a mixture of data from all domains. However, changing data in one domain or adding a new domain would require the multidomain model to be retrained. To this end, we propose a framework called modular domai…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  35. arXiv:2305.11012  [pdf, other]

    cs.CV

    SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation

    Authors: Hyungseob Shin, Hyeongyu Kim, Sewon Kim, Yohan Jun, Taejoon Eo, Dosik Hwang

    Abstract: Recent advances in deep learning-based medical image segmentation studies achieve nearly human-level performance in a fully supervised manner. However, acquiring pixel-level expert annotations is extremely expensive and laborious in medical imaging fields. Unsupervised domain adaptation (UDA) can alleviate this problem, which makes it possible to use annotated data in one imaging modality to train a…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 10 pages, 7 figures, CVPR 2023

  36. arXiv:2304.01552  [pdf, other]

    cs.CV cs.AI cs.LG

    Meta-Learning with a Geometry-Adaptive Preconditioner

    Authors: Suhyun Kang, Duhun Hwang, Moonjung Eo, Taesup Kim, Wonjong Rhee

    Abstract: Model-agnostic meta-learning (MAML) is one of the most successful meta-learning algorithms. It has a bi-level optimization structure where the outer-loop process learns a shared initialization and the inner-loop process optimizes task-specific weights. Although MAML relies on the standard gradient descent in the inner-loop, recent studies have shown that controlling the inner-loop's gradient desce…

    Submitted 29 November, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted at CVPR 2023. Code is available at: https://github.com/Suhyun777/CVPR23-GAP; This is an extended version of our previous CVPR23 work

  37. arXiv:2304.01434  [pdf, other]

    cs.CV cs.AI cs.LG

    VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution

    Authors: Jaeill Kim, Suhyun Kang, Duhun Hwang, Jungwook Shin, Wonjong Rhee

    Abstract: Since the introduction of deep learning, a wide scope of representation properties, such as decorrelation, whitening, disentanglement, rank, isotropy, and mutual information, have been studied to improve the quality of representation. However, manipulating such properties can be challenging in terms of implementational effectiveness and general applicability. To address these limitations, we propo…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted at CVPR 2023. Code is available at: https://github.com/jaeill/CVPR23-VNE

  38. arXiv:2303.15381  [pdf, other]

    cs.CL

    Causal schema induction for knowledge discovery

    Authors: Michael Regan, Jena D. Hwang, Keisuke Sakaguchi, James Pustejovsky

    Abstract: Making sense of familiar yet new situations typically involves making generalizations about causal schemas, stories that help humans reason about event sequences. Reasoning about events includes identifying cause and effect relations shared across event instances, a process we refer to as causal schema induction. Statistical schema induction systems may leverage structural knowledge encoded in dis…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: 8 pages, appendix

  39. arXiv:2303.04143  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

    Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien

    Abstract: Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high quality ImageNet parameters of other neural networks. By using predicted parameters for i…

    Submitted 31 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: ICML 2023, camera ready (7 tables with extra results added), code and models are at https://github.com/SamsungSAILMontreal/ghn3

  40. arXiv:2303.01105  [pdf, other]

    eess.IV cs.CV cs.LG

    Evidence-empowered Transfer Learning for Alzheimer's Disease

    Authors: Kai Tzu-iunn Ong, Hana Kim, Minjin Kim, Jinseong Jang, Beomseok Sohn, Yoon Seong Choi, Dosik Hwang, Seong Jae Hwang, Jinyoung Yeo

    Abstract: Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we pres…

    Submitted 17 April, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2023. The authorship was changed from co-first authors to a single first author, which was authorized by the adviser/corresponding author Jinyoung Yeo (Apr 18th, 2023)

  41. arXiv:2302.01496  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Efficient Domain Adaptation for Speech Foundation Models

    Authors: Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

    Abstract: Foundation models (FMs), which are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have attracted great interest in the research community. Benefiting from diverse data sources such as different modalities, languages and application domains, foundation models have demonstrated strong generalization and knowledge transfer capabilities. In this paper, we presen…

    Submitted 2 February, 2023; originally announced February 2023.

  42. arXiv:2301.11578  [pdf, other]

    cs.LG

    Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers

    Authors: Sungmin Cha, Sungjun Cho, Dasol Hwang, Honglak Lee, Taesup Moon, Moontae Lee

    Abstract: Since the recent advent of regulations for data protection (e.g., the General Data Protection Regulation), there has been increasing demand for deleting information learned from sensitive data in pre-trained models without retraining from scratch. The inherent vulnerability of neural networks towards adversarial attacks and unfairness also calls for a robust method to remove or correct information…

    Submitted 15 January, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: AAAI 2024 camera ready version

  43. arXiv:2301.02903  [pdf, other]

    cs.LG cs.CV

    Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching

    Authors: Byoungjip Kim, Sungik Choi, Dasol Hwang, Moontae Lee, Honglak Lee

    Abstract: Despite surprising performance on zero-shot transfer, pre-training a large-scale multimodal model is often prohibitive as it requires a huge amount of data and computing resources. In this paper, we propose a method (BeamCLIP) that can effectively transfer the representations of a large pre-trained multimodal model (CLIP-ViT) into a small target model (e.g., ResNet-18). For unsupervised transfer,…

    Submitted 7 January, 2023; originally announced January 2023.

    Comments: 20 pages, 10 figures, NeurIPS 2022

  44. arXiv:2212.10409  [pdf, other]

    cs.CL

    ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations

    Authors: Valentina Pyatkin, Jena D. Hwang, Vivek Srikumar, Ximing Lu, Liwei Jiang, Yejin Choi, Chandra Bhagavatula

    Abstract: Context is everything, even in commonsense moral reasoning. Changing contexts can flip the moral judgment of an action; "Lying to a friend" is wrong in general, but may be morally acceptable if it is intended to protect their life. We present ClarifyDelphi, an interactive system that learns to ask clarification questions (e.g., why did you lie to your friend?) in order to elicit additional salie…

    Submitted 30 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023 main conference, 9 pages + bibliography + appendix

  45. arXiv:2212.09246  [pdf, other]

    cs.CL

    I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

    Authors: Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi

    Abstract: Commonsense capabilities of pre-trained language models dramatically improve with scale, leading many to believe that scale is the only winning recipe. But is it? Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation al…

    Submitted 26 May, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  46. arXiv:2211.06516  [pdf, other]

    cs.LG

    Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms

    Authors: Vashist Avadhanula, Omar Abdul Baki, Hamsa Bastani, Osbert Bastani, Caner Gocmen, Daniel Haimovich, Darren Hwang, Dima Karamshuk, Thomas Leeper, Jiayuan Ma, Gregory Macnamara, Jake Mullett, Christopher Palow, Sung Park, Varun S Rajagopal, Kevin Schaeffer, Parikshit Shah, Deeksha Sinha, Nicolas Stier-Moses, Peng Xu

    Abstract: We describe the current content moderation strategy employed by Meta to remove policy-violating content from its platforms. Meta relies on both handcrafted and learned risk models to flag potentially violating content for human review. Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize more reliable risk models. A key challenge is that violation t…

    Submitted 11 November, 2022; originally announced November 2022.

  47. arXiv:2211.02712  [pdf, other]

    cs.LG cs.SD eess.AS

    Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion

    Authors: Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

    Abstract: Self-supervised pre-training of a speech foundation model, followed by supervised fine-tuning, has shown impressive quality improvements on automatic speech recognition (ASR) tasks. Fine-tuning separate foundation models for many downstream tasks is expensive since the foundation model is usually very big. Parameter-efficient fine-tuning methods (e.g. adapter, sparse update methods) offer an alte…

    Submitted 4 November, 2022; originally announced November 2022.

  48. arXiv:2210.12678  [pdf, other]

    cs.CL

    ComFact: A Benchmark for Linking Contextual Commonsense Knowledge

    Authors: Silin Gao, Jena D. Hwang, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut

    Abstract: Understanding rich narratives, such as dialogues and stories, often requires natural language processing systems to access relevant knowledge from commonsense knowledge graphs. However, these systems typically retrieve facts from KGs using simple heuristics that disregard the complex challenges of identifying situationally-relevant commonsense knowledge (e.g., contextualization, implicitness, ambi…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022, long paper

  49. arXiv:2210.05793  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

    Authors: Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman

    Abstract: Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data. In this paper, we focus on knowledge distillation for the RNN-T model, which is widely used in state-of-the-art (SoTA) automatic speech recognition (ASR). Specifically, we compared using soft and hard target distillation to train l…

    Submitted 28 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 8 pages, 2 figures

  50. arXiv:2209.06293  [pdf, other]

    cs.CL cs.CV

    Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest

    Authors: Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi

    Abstract: Large neural networks can now generate jokes, but do they really "understand" humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny. These tasks encapsulate progressively more sophisticated aspects of "understanding" a cartoon; key elements are th…

    Submitted 6 July, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Journal ref: ACL 2023