Search | arXiv e-print repository

Showing 1–50 of 85 results for author: Park, T

Searching in archive cs.
  1. arXiv:2407.16447  [pdf, ps, other]

    eess.AS cs.SD

    The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization

    Authors: Samuele Cornell, Taejin Park, Steve Huang, Christoph Boeddeker, Xuankai Chang, Matthew Maciejewski, Matthew Wiesner, Paola Garcia, Shinji Watanabe

    Abstract: This paper presents the CHiME-8 DASR challenge which carries on from the previous edition CHiME-7 DASR (C7DASR) and the past CHiME-6 challenge. It focuses on joint multi-channel distant speech recognition (DASR) and diarization with one or more, possibly heterogeneous, devices. The main goal is to spur research towards meeting transcription approaches that can generalize across arbitrary number of…

    Submitted 23 July, 2024; originally announced July 2024.

  2. arXiv:2406.13342  [pdf, other]

    cs.CL cs.AI

    ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models

    Authors: Hwiyeol Jo, Hyunwoo Lee, Taiwoo Park

    Abstract: The recent advancements in large language models (LLMs) have brought significant progress in solving NLP tasks. Notably, in-context learning (ICL) is the key enabling mechanism for LLMs to understand specific tasks and grasping nuances. In this paper, we propose a simple yet effective method to contextualize a task toward a specific LLM, by (1) observing how a given LLM describes (all or a part of…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ARR Submitted

  3. arXiv:2406.11875  [pdf, other]

    cs.AI

    ChatPCG: Large Language Model-Driven Reward Design for Procedural Content Generation

    Authors: In-Chang Baek, Tae-Hwa Park, Jin-Ha Noh, Cheong-Mok Bae, Kyung-Joong Kim

    Abstract: Driven by the rapid growth of machine learning, recent advances in game artificial intelligence (AI) have significantly impacted productivity across various gaming genres. Reward design plays a pivotal role in training game AI models, wherein researchers implement concepts of specific reward functions. However, despite the presence of AI, the reward design process predominantly remains in the doma…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 figures, accepted at IEEE Conference on Games 2024

  4. arXiv:2406.06976  [pdf, other]

    cs.LG cs.AI

    Discrete Dictionary-based Decomposition Layer for Structured Representation Learning

    Authors: Taewon Park, Hyun-Chul Kim, Minho Lee

    Abstract: Neuro-symbolic neural networks have been extensively studied to integrate symbolic operations with neural networks, thereby improving systematic generalization. Specifically, Tensor Product Representation (TPR) framework enables neural networks to perform differentiable symbolic operations by encoding the symbolic structure of data within vector spaces. However, TPR-based neural networks often str…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.01012  [pdf, other]

    cs.LG cs.AI

    Attention-based Iterative Decomposition for Tensor Product Representation

    Authors: Taewon Park, Inchul Choi, Minho Lee

    Abstract: In recent research, Tensor Product Representation (TPR) is applied for the systematic generalization task of deep neural networks by learning the compositional structure of data. However, such prior works show limited performance in discovering and representing the symbolic structure from unseen test data because their decomposition to the structural representations was incomplete. In this work, w…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Published in ICLR 2024

  6. arXiv:2405.19795  [pdf, other]

    cs.CL cs.AI

    SLM as Guardian: Pioneering AI Safety with Small Language Models

    Authors: Ohjoon Kwon, Donghyeon Jeon, Nayoung Choi, Gyu-Hwung Cho, Changbong Kim, Hyunwoo Lee, Inho Kang, Sun Kim, Taiwoo Park

    Abstract: Most prior safety research of large language models (LLMs) has focused on enhancing the alignment of LLMs to better suit the safety requirements of humans. However, internalizing such safeguard features into larger models brought challenges of higher training cost and unintended degradation of helpfulness. To overcome such challenges, a modular approach employing a smaller LLM to detect harmful us…

    Submitted 30 May, 2024; originally announced May 2024.

  7. arXiv:2405.14867  [pdf, other]

    cs.CV

    Improved Distribution Matching Distillation for Fast Image Synthesis

    Authors: Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman

    Abstract: Recent approaches have shown promises distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Code, model, and dataset are available at https://tianweiy.github.io/dmd2

  8. arXiv:2405.06216  [pdf, other]

    cs.CV

    Event-based Structure-from-Orbit

    Authors: Ethan Elms, Yasir Latif, Tae Ha Park, Tat-Jun Chin

    Abstract: Event sensors offer high temporal resolution visual sensing, which makes them ideal for perceiving fast visual phenomena without suffering from motion blur. Certain applications in robotics and vision-based navigation require 3D perception of an object undergoing circular or spinning motion in front of a static camera, such as recovering the angular velocity and shape of the object. The setting is…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This work will be published in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 2024

  9. arXiv:2405.05967  [pdf, other]

    cs.CV cs.GR cs.LG

    Distilling Diffusion Models into Conditional GANs

    Authors: Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

    Abstract: We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose…

    Submitted 17 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Project page: https://mingukkang.github.io/Diffusion2GAN/ (ECCV2024)

  10. arXiv:2404.16029  [pdf, other]

    cs.CV

    Editable Image Elements for Controllable Synthesis

    Authors: Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park

    Abstract: Diffusion models have made significant advances in text-guided synthesis tasks. However, editing user-provided images remains challenging, as the high dimensional noise input space of diffusion models is not naturally suited for image inversion or spatial editing. In this work, we propose an image representation that promotes spatial editing of input images using a diffusion model. Concretely, we…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Project page: https://jitengmu.github.io/Editable_Image_Elements/

  11. arXiv:2404.12388  [pdf, other]

    cs.CV

    VideoGigaGAN: Towards Detail-rich Video Super-Resolution

    Authors: Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu

    Abstract: Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as they are limited in their generative capability. This raises a fundamental question: can we extend the success of a generative image upsampler to the VSR task while preserving the temporal consistency? W…

    Submitted 1 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: project page: https://videogigagan.github.io/

  12. arXiv:2404.12382  [pdf, other]

    cs.CV cs.AI cs.GR

    Lazy Diffusion Transformer for Interactive Image Editing

    Authors: Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, Michaël Gharbi

    Abstract: We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the curr…

    Submitted 18 April, 2024; originally announced April 2024.

  13. arXiv:2404.12333  [pdf, other]

    cs.CV

    Customizing Text-to-Image Diffusion with Camera Viewpoint Control

    Authors: Nupur Kumari, Grace Su, Richard Zhang, Taesung Park, Eli Shechtman, Jun-Yan Zhu

    Abstract: Model customization introduces new concepts to existing text-to-image models, enabling the generation of the new concept in novel contexts. However, such methods lack accurate camera view control w.r.t the object, and users must resort to prompt engineering (e.g., adding "top-view") to achieve coarse view control. In this work, we introduce a new task -- enabling explicit control of camera viewpoi…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: project page: https://customdiffusion360.github.io

  14. arXiv:2404.08672  [pdf, other]

    cs.IR cs.AI cs.CL cs.CY cs.LG

    Taxonomy and Analysis of Sensitive User Queries in Generative AI Search

    Authors: Hwiyeol Jo, Taiwoo Park, Nayoung Choi, Changbong Kim, Ohjoon Kwon, Donghyeon Jeon, Hyunwoo Lee, Eui-Hyeon Lee, Kyoungho Shin, Sun Suk Lim, Kyungmi Kim, Jihye Lee, Sun Kim

    Abstract: Although there has been a growing interest among industries to integrate generative LLMs into their services, limited experiences and scarcity of resources acts as a barrier in launching and servicing large-scale LLM-based conversational services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on…

    Submitted 5 April, 2024; originally announced April 2024.

  15. arXiv:2404.07217  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    Attention-aware Semantic Communications for Collaborative Inference

    Authors: Jiwoong Im, Nayoung Kwon, Taewoo Park, Jiheon Woo, Jaeho Lee, Yongjune Kim

    Abstract: We propose a communication-efficient collaborative inference framework in the domain of edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because of the inherent architecture of ViTs maintaining consistent layer dimensions across the entire transformer encoder. There…

    Submitted 31 May, 2024; v1 submitted 23 February, 2024; originally announced April 2024.

  16. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  17. arXiv:2403.12036  [pdf, other]

    cs.CV cs.GR cs.LG

    One-Step Image Translation with Text-to-Image Models

    Authors: Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu

    Abstract: In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate va…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Github: https://github.com/GaParmar/img2img-turbo

  18. arXiv:2402.08451  [pdf, other]

    cs.HC

    Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning

    Authors: Asaf Liberman, Oron Levy, Soroush Shahi, Cori Tymoszek Park, Mike Ralph, Richard Kang, Abdelkareem Bedri, Gierad Laput

    Abstract: Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms, depending solely on the authentication of connected devices. We present Moonwalk, a novel method for passive user recognition utilizing the built-in headphone accelerometer. Our approach centers on gait recognition; enabling users to es…

    Submitted 13 February, 2024; originally announced February 2024.

    ACM Class: H.5.2

  19. arXiv:2402.08420  [pdf, other]

    cs.HC

    Vision-Based Hand Gesture Customization from a Single Demonstration

    Authors: Soroush Shahi, Cori Tymoszek Park, Richard Kang, Asaf Liberman, Oron Levy, Jun Gong, Abdelkareem Bedri, Gierad Laput

    Abstract: Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization is often underexplored. Customization is crucial since it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization require…

    Submitted 13 February, 2024; originally announced February 2024.

    ACM Class: H.5.2; I.4

  20. arXiv:2402.04618  [pdf, other]

    cs.CV

    Multi-Scale Semantic Segmentation with Modified MBConv Blocks

    Authors: Xi Chen, Yang Cai, Yuan Wu, Bo Xiong, Taesung Park

    Abstract: Recently, MBConv blocks, initially designed for efficiency in resource-limited settings and later adapted for cutting-edge image classification performances, have demonstrated significant potential in image classification tasks. Despite their success, their application in semantic segmentation has remained relatively unexplored. This paper introduces a novel adaptation of MBConv blocks specificall…

    Submitted 7 February, 2024; originally announced February 2024.

  21. arXiv:2401.04718  [pdf, other]

    cs.CV

    Jump Cut Smoothing for Talking Heads

    Authors: Xiaojuan Wang, Taesung Park, Yang Zhou, Eli Shechtman, Richard Zhang

    Abstract: A jump cut offers an abrupt, sometimes unwanted change in the viewing experience. We present a novel framework for smoothing these jump cuts, in the context of talking head videos. We leverage the appearance of the subject from the other source frames in the video, fusing it with a mid-level representation driven by DensePose keypoints and face landmarks. To achieve motion, we interpolate the keyp…

    Submitted 10 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Correct typos in the caption of Figure 1; Change the project website address. Project page: https://jeanne-wang.github.io/jumpcutsmoothing/

  22. arXiv:2312.16427  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning to Embed Time Series Patches Independently

    Authors: Seunghan Lee, Taeyoung Park, Kibok Lee

    Abstract: Masked time series modeling has recently gained much attention as a self-supervised representation learning strategy for time series. Inspired by masked image modeling in computer vision, recent works first patchify and partially mask out time series, and then train Transformers to capture the dependencies between patches by predicting masked patches from unmasked patches. However, we argue that c…

    Submitted 2 May, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: ICLR 2024

  23. arXiv:2312.16424  [pdf, other]

    cs.LG cs.AI stat.ML

    Soft Contrastive Learning for Time Series

    Authors: Seunghan Lee, Taeyoung Park, Kibok Lee

    Abstract: Contrastive learning has shown to be effective to learn representations from time series in a self-supervised way. However, contrasting similar time series instances or values from adjacent timestamps within a time series leads to ignore their inherent correlations, which results in deteriorating the quality of learned representations. To address this issue, we propose SoftCLT, a simple yet effect…

    Submitted 22 March, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: ICLR 2024 Spotlight

  24. arXiv:2311.18828  [pdf, other]

    cs.CV

    One-step Diffusion with Distribution Matching Distillation

    Authors: Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, Taesung Park

    Abstract: Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce the one-step image generator match the diffusion model at distribution level, by minimizing an approximate KL divergence whose gradient c…

    Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://tianweiy.github.io/dmd/

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  25. arXiv:2311.04287  [pdf, other]

    cs.CV cs.LG

    Holistic Evaluation of Text-To-Image Models

    Authors: Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

    Abstract: The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. First three authors contributed equally

  26. arXiv:2310.12378  [pdf, other]

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises of the following integral modules: the Spea…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  27. arXiv:2310.12371  [pdf, other]

    eess.AS cs.SD

    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

    Authors: Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for developing neural models suited for speaker diarization…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  28. arXiv:2309.11645  [pdf, other]

    cs.RO

    Online Supervised Training of Spaceborne Vision during Proximity Operations using Adaptive Kalman Filtering

    Authors: Tae Ha Park, Simone D'Amico

    Abstract: This work presents an Online Supervised Training (OST) method to enable robust vision-based navigation about a non-cooperative spacecraft. Spaceborne Neural Networks (NN) are susceptible to domain gap as they are primarily trained with synthetic images due to the inaccessibility of space. OST aims to close this gap by training a pose estimation NN online using incoming flight images during Rendezv…

    Submitted 6 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted to ICRA'2024. Final revised version

  29. arXiv:2309.05248  [pdf, other]

    eess.AS cs.SD

    Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

    Authors: Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

    Abstract: Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. W…

    Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages 1 reference page, ICASSP format

  30. arXiv:2307.01676  [pdf, other]

    cs.AI

    RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games

    Authors: Hyeon-Chang Jeon, In-Chang Baek, Cheong-mok Bae, Taehwa Park, Wonsang You, Taegwan Ha, Hoyun Jung, Jinha Noh, Seungwon Oh, Kyung-Joong Kim

    Abstract: The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 14 pages, 6 figures, 6 tables, 2 algorithms

  31. arXiv:2305.06459  [pdf, other]

    eess.SP cs.GR cs.HC eess.IV q-bio.NC

    SlicerTMS: Real-Time Visualization of Transcranial Magnetic Stimulation for Mental Health Treatment

    Authors: Loraine Franke, Tae Young Park, Jie Luo, Yogesh Rathi, Steve Pieper, Lipeng Ning, Daniel Haehn

    Abstract: We present a real-time visualization system for Transcranial Magnetic Stimulation (TMS), a non-invasive neuromodulation technique for treating various brain disorders and mental health diseases. Our solution targets the current challenges of slow and labor-intensive practices in treatment planning. Integrating Deep Learning (DL), our system rapidly predicts electric field (E-field) distributions i…

    Submitted 12 March, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 11 pages, 4 figures, 2 tables, MICCAI

  32. arXiv:2304.06720  [pdf, other]

    cs.CV cs.GR cs.LG

    Expressive Text-to-Image Generation with Rich Text

    Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang

    Abstract: Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to wri…

    Submitted 28 May, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Project webpage: https://rich-text-to-image.github.io/

  33. arXiv:2303.05511  [pdf, other]

    cs.CV cs.GR cs.LG

    Scaling up GANs for Text-to-Image Synthesis

    Authors: Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park

    Abstract: The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-…

    Submitted 19 June, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN/

  34. arXiv:2303.00442  [pdf, other]

    cs.LG cs.AI cs.CY

    Re-weighting Based Group Fairness Regularization via Classwise Robust Optimization

    Authors: Sangwon Jung, Taeeon Park, Sanghyuk Chun, Taesup Moon

    Abstract: Many existing group fairness-aware training methods aim to achieve the group fairness by either re-weighting underrepresented groups based on certain rules or using weakly approximated surrogates for the fairness metrics in the objective as regularization terms. Although each of the learning schemes has its own strength in terms of applicability or performance, respectively, it is difficult for an…

    Submitted 1 March, 2023; originally announced March 2023.

  35. arXiv:2301.05225  [pdf, other]

    cs.CV cs.GR cs.LG

    Domain Expansion of Image Generators

    Authors: Yotam Nitzan, Michaël Gharbi, Richard Zhang, Taesung Park, Jun-Yan Zhu, Daniel Cohen-Or, Eli Shechtman

    Abstract: Can one inject new concepts into an already trained generative model, while respecting its existing structure and knowledge? We propose a new task - domain expansion - to address this. Given a pretrained generator and novel (but related) domains, we expand the generator to jointly model all domains, old and new, harmoniously. First, we note the generator contains a meaningful, pretrained latent sp…

    Submitted 17 April, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Project Page and code are available at https://yotamnitzan.github.io/domain-expansion/. CVPR 2023 Camera-Ready

  36. arXiv:2206.03796  [pdf, other]

    cs.RO eess.SP

    Adaptive Neural Network-based Unscented Kalman Filter for Robust Pose Tracking of Noncooperative Spacecraft

    Authors: Tae Ha Park, Simone D'Amico

    Abstract: This paper presents a neural network-based Unscented Kalman Filter (UKF) to estimate and track the pose (i.e., position and orientation) of a known, noncooperative, tumbling target spacecraft in a close-proximity rendezvous scenario. The UKF estimates the target's orbit and attitude relative to the servicer based on the pose information provided by a multi-task Convolutional Neural Network (CNN) f…

    Submitted 8 May, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted to AIAA Journal of Guidance, Control, and Dynamics. Updated derivation of Section IV.B and experiments

  37. arXiv:2205.12231  [pdf, other]

    cs.CV cs.GR

    ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions

    Authors: Difan Liu, Sandesh Shetty, Tobias Hinz, Matthew Fisher, Richard Zhang, Taesung Park, Evangelos Kalogerakis

    Abstract: We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. While previous…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: SIGGRAPH 2022 - Journal Track

  38. arXiv:2205.08512  [pdf, other]

    cs.ET cs.CR physics.optics

    Experimental evaluation of digitally-verifiable photonic computing for blockchain and cryptocurrency

    Authors: Sunil Pai, Taewon Park, Marshall Ball, Bogdan Penkovsky, Maziyar Milanizadeh, Michael Dubrovsky, Nathnael Abebe, Francesco Morichetti, Andrea Melloni, Shanhui Fan, Olav Solgaard, David A. B. Miller

    Abstract: As blockchain technology and cryptocurrency become increasingly mainstream, ever-increasing energy costs required to maintain the computational power running these decentralized platforms create a market for more energy-efficient hardware. Photonic cryptographic hash functions, which use photonic integrated circuits to accelerate computation, promise energy efficiency for verifying transactions an…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 17 pages, 7 figures

  39. arXiv:2205.08501  [pdf, other]

    cs.ET cs.LG physics.optics

    Experimentally realized in situ backpropagation for deep learning in nanophotonic neural networks

    Authors: Sunil Pai, Zhanghao Sun, Tyler W. Hughes, Taewon Park, Ben Bartlett, Ian A. D. Williamson, Momchil Minkov, Maziyar Milanizadeh, Nathnael Abebe, Francesco Morichetti, Andrea Melloni, Shanhui Fan, Olav Solgaard, David A. B. Miller

    Abstract: Neural networks are widely deployed models across many scientific disciplines and commercial endeavors ranging from edge computing and sensing to large-scale signal processing in data centers. The most efficient and well-entrenched method to train such networks is backpropagation, or reverse-mode automatic differentiation. To counter an exponentially increasing energy budget in the artificial inte…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 23 pages, 10 figures

  40. arXiv:2205.02837  [pdf, other]

    cs.CV

    BlobGAN: Spatially Disentangled Scene Representations

    Authors: Dave Epstein, Taesung Park, Richard Zhang, Eli Shechtman, Alexei A. Efros

    Abstract: We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial unifor…

    Submitted 29 July, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: ECCV 2022. Project webpage available at https://www.dave.ml/blobgan

  41. arXiv:2203.15974  [pdf, other]

    eess.AS cs.CL

    Multi-scale Speaker Diarization with Dynamic Scale Weighting

    Authors: Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

    Abstract: Speaker diarization systems are challenged by a trade-off between the temporal resolution and the fidelity of the speaker representation. By obtaining a superior temporal resolution with an enhanced accuracy, a multi-scale approach is a way to cope with such a trade-off. In this paper, we propose a more advanced multi-scale diarization system based on a multi-scale diarization decoder. There are t…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  42. Robust Multi-Task Learning and Online Refinement for Spacecraft Pose Estimation across Domain Gap

    Authors: Tae Ha Park, Simone D'Amico

    Abstract: This work presents Spacecraft Pose Network v2 (SPNv2), a Convolutional Neural Network (CNN) for pose estimation of noncooperative spacecraft across domain gap. SPNv2 is a multi-scale, multi-task CNN which consists of a shared multi-scale feature encoder and multiple prediction heads that perform different tasks on a shared feature output. These tasks are all related to detection and pose estimatio…

    Submitted 17 August, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted to Advances in Space Research; fixed error on reporting translation from heatmaps

  43. arXiv:2111.06934  [pdf, other]

    cs.CV cs.LG

    Contrastive Feature Loss for Image Prediction

    Authors: Alex Andonian, Taesung Park, Bryan Russell, Phillip Isola, Jun-Yan Zhu, Richard Zhang

    Abstract: Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result. Yet, this basic functionality remains an open problem. A popular line of approaches uses the L1 (mean absolute error) loss, either in the pixel or the feature space of pretrained deep networks. However, we observe that these losses tend to produce overly blurry and grey images, and o…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Appeared in Advances in Image Manipulation Workshop at ICCV 2021. GitHub: https://github.com/alexandonian/contrastive-feature-loss

  44. arXiv:2110.09695  [pdf, other]

    cs.LG

    Tackling Dynamics in Federated Incremental Learning with Variational Embedding Rehearsal

    Authors: Tae Jin Park, Kenichi Kumatani, Dimitrios Dimitriadis

    Abstract: Federated Learning is a fast growing area of ML where the training datasets are extremely distributed, all while dynamically changing over time. Models need to be trained on clients' devices without any guarantees for either homogeneity or stationarity of the local private data. The need for continual training has also risen, due to the ever-increasing production of in-task data. However, pursuing…

    Submitted 18 October, 2021; originally announced October 2021.

  45. arXiv:2110.04410  [pdf, other]

    eess.AS cs.SD

    TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

    Authors: Nithin Rao Koluguri, Taejin Park, Boris Ginsburg

    Abstract: In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker representations. We employ 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with global context followed by channel attention based statistics pooling layer to map variable-length utterances to a fixed-length embedding (t-vector). TitaNet is a scalable architecture and achieves…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: preprint. Submitted to ICASSP 2022

  46. SPEED+: Next-Generation Dataset for Spacecraft Pose Estimation across Domain Gap

    Authors: Tae Ha Park, Marcus Märtens, Gurvan Lecuyer, Dario Izzo, Simone D'Amico

    Abstract: Autonomous vision-based spaceborne navigation is an enabling technology for future on-orbit servicing and space logistics missions. While computer vision in general has benefited from Machine Learning (ML), training and validating spaceborne ML models are extremely challenging due to the impracticality of acquiring a large-scale labeled dataset of images of the intended target in the space environ…

    Submitted 9 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

    Journal ref: 2022 IEEE Aerospace Conference (AERO), 2022

  47. arXiv:2109.04650  [pdf, other]

    cs.CL

    What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

    Authors: Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jinseong Park , et al. (12 additional authors not shown)

    Abstract: GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a K…

    Submitted 28 November, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP2021 as a long paper. Fixed some typos

  48. arXiv:2108.05529  [pdf, other]

    cs.RO cs.CV cs.LG

    Robotic Testbed for Rendezvous and Optical Navigation: Multi-Source Calibration and Machine Learning Use Cases

    Authors: Tae Ha Park, Juergen Bosse, Simone D'Amico

    Abstract: This work presents the most recent advances of the Robotic Testbed for Rendezvous and Optical Navigation (TRON) at Stanford University - the first robotic testbed capable of validating machine learning algorithms for spaceborne optical navigation. The TRON facility consists of two 6 degrees-of-freedom KUKA robot arms and a set of Vicon motion track cameras to reconfigure an arbitrary relative pose…

    Submitted 9 December, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Presented at 2021 AAS/AIAA Astrodynamics Specialist Conference, Big Sky, Virtual, August 9-11 (2021)

    Report number: AAS 21-654

  49. arXiv:2108.02949  [pdf, other]

    cs.LG

    Auxiliary Class Based Multiple Choice Learning

    Authors: Sihwan Kim, Dae Yon Jung, Taejang Park

    Abstract: The merit of ensemble learning lies in having different outputs from many individual models on a single input, i.e., the diversity of the base models. The high quality of diversity can be achieved when each model is specialized to different subsets of the whole dataset. Moreover, when each model explicitly knows to which subsets it is specialized, more opportunities arise to improve diversity. In…

    Submitted 7 December, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

  50. arXiv:2106.04411  [pdf, other]

    cs.CV cs.AI

    Fair Feature Distillation for Visual Recognition

    Authors: Sangwon Jung, Donggyu Lee, Taeeon Park, Taesup Moon

    Abstract: Fairness is becoming an increasingly crucial issue for computer vision, especially in the human-related decision systems. However, achieving algorithmic fairness, which makes a model produce indiscriminative outcomes against protected groups, is still an unresolved problem. In this paper, we devise a systematic approach which reduces algorithmic biases via feature distillation for visual recogniti…

    Submitted 10 June, 2021; v1 submitted 27 May, 2021; originally announced June 2021.