Search | arXiv e-print repository
Showing 1–50 of 479 results for author: Chang, S

Searching in archive cs.
  1. arXiv:2407.08245  [pdf, other]

    cs.LG cs.CV

    Feature Diversification and Adaptation for Federated Domain Generalization

    Authors: Seunghan Yang, Seokeon Choi, Hyunsin Park, Sungha Choi, Simyung Chang, Sungrack Yun

    Abstract: Federated learning, a distributed learning paradigm, utilizes multiple clients to build a robust global model. In real-world applications, local clients often operate within their limited domains, leading to a `domain shift' across clients. Privacy concerns limit each client's learning to its own domain data, which increase the risk of overfitting. Moreover, the process of aggregating models train…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. Proceedings of the Thirteenth Workshop on Trends in Functional Programming in Education

    Authors: Stephen Chang

    Abstract: This volume of the Electronic Proceedings in Theoretical Computer Science (EPTCS) contains revised selected papers that were initially presented at the 13th International Workshop on Trends in Functional Programming in Education (TFPIE 2024). This workshop was held at Seton Hall University in South Orange, NJ, USA on January 9, 2024. It was co-located with the 25th International Symposium on Trend…

    Submitted 8 July, 2024; originally announced July 2024.

    Journal ref: EPTCS 405, 2024

  3. arXiv:2407.04822  [pdf, other]

    eess.AS cs.LG cs.SD

    YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation

    Authors: Sungkyun Chang, Emmanouil Benetos, Holger Kirchhoff, Simon Dixon

    Abstract: Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument. This task is challenging for modeling as it requires simultaneously identifying multiple instruments and transcribing their pitch and precise timing, and the lack of fully annotated data adds to the training difficulties. This paper introduces YourMT3+, a suite of model…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  4. arXiv:2407.03331  [pdf, other]

    cs.CV cs.AI cs.DC

    Anole: Adapting Diverse Compressed Models For Cross-Scene Prediction On Mobile Devices

    Authors: Yunzhe Li, Hongzi Zhu, Zhuohong Deng, Yunlong Cheng, Liang Zhang, Shan Chang, Minyi Guo

    Abstract: Emerging Artificial Intelligence of Things (AIoT) applications desire online prediction using deep neural network (DNN) models on mobile devices. However, due to the movement of devices, unfamiliar test samples constantly appear, significantly affecting the prediction accuracy of a pre-trained DNN. In addition, unstable network connection calls for local model inference. In this paper, we propose…

    Submitted 9 May, 2024; originally announced July 2024.

  5. arXiv:2407.01863  [pdf, other]

    cs.CL

    VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

    Authors: Qiucheng Wu, Handong Zhao, Michael Saxon, Trung Bui, William Yang Wang, Yang Zhang, Shiyu Chang

    Abstract: Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warrant direct investigation. One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of obje…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2406.19560  [pdf, other]

    cs.CV cs.LG eess.IV

    Cost-efficient Active Illumination Camera For Hyper-spectral Reconstruction

    Authors: Yuxuan Zhang, T. M. Sazzad, Yangyang Song, Spencer J. Chang, Ritesh Chowdhry, Tomas Mejia, Anna Hampton, Shelby Kucharski, Stefan Gerber, Barry Tillman, Marcio F. R. Resende, William M. Hammond, Chris H. Wilson, Alina Zare, Sanjeev J. Koppal

    Abstract: Hyper-spectral imaging has recently gained increasing attention for use in different applications, including agricultural investigation, ground tracking, remote sensing and many other. However, the high cost, large physical size and complicated operation process stop hyperspectral cameras from being employed for various applications and research fields. In this paper, we introduce a cost-efficient…

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.10481  [pdf, other]

    cs.LG math.OC stat.ME

    DCDILP: a distributed learning method for large-scale causal structure learning

    Authors: Shuyu Dong, Michèle Sebag, Kento Uemura, Akito Fujii, Shuang Chang, Yusuke Koyanagi, Koji Maruhashi

    Abstract: This paper presents a novel approach to causal discovery through a divide-and-conquer framework. By decomposing the problem into smaller subproblems defined on Markov blankets, the proposed DCDILP method first explores in parallel the local causal graphs of these subproblems. However, this local discovery phase encounters systematic challenges due to the presence of hidden confounders (variables w…

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.09923  [pdf, other]

    cs.CL cs.AI cs.LG

    CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

    Authors: Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S Chang, Wei Wang

    Abstract: The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care. While LLMs have shown some promise in the medical domain, their application in clinical diagnosis remains underexplored, especially in real-world clinical practice, where highly sophis…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://clibench.github.io

  9. arXiv:2406.08607  [pdf, other]

    cs.CL cs.AI

    Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

    Authors: Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Rao Kompella, Sijia Liu, Shiyu Chang

    Abstract: As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning becomes an increasingly important research area to address concerns of LLMs in terms of privacy, copyright, etc. A conventional LLM unlearning task typically involves two goals: (1) The target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the oth…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 21 pages, 11 figures

  10. arXiv:2406.07007  [pdf, other]

    cs.CL

    Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

    Authors: Jihwan Bang, Juntae Lee, Kyuhong Shim, Seunghan Yang, Simyung Chang

    Abstract: The customization of large language models (LLMs) for user-specified tasks gets important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also lead to privacy concerns. On-device LLMs can offer a promising solution by mitigating these issues. Yet, the performance of on-device LLMs is inherently constr…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Main

  11. arXiv:2406.06950  [pdf, other]

    cs.CL

    A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation

    Authors: Bairu Hou, Yang Zhang, Jacob Andreas, Shiyu Chang

    Abstract: This paper focuses on the task of hallucination detection, which aims to determine the truthfulness of LLM-generated statements. To address this problem, a popular class of methods utilize the LLM's self-consistencies in its beliefs in a set of logically related augmented statements generated by the LLM, which does not require external knowledge databases and can work with both white-box and black…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 26 pages, 18 figures

  12. arXiv:2405.18405  [pdf, other]

    cs.CV cs.AI

    WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

    Authors: Jiawei Ma, Yulei Niu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

    Abstract: Language has been useful in extending the vision encoder to data from diverse distributions without empirical discovery in training domains. However, as the image description is mostly at coarse-grained level and ignores visual details, the resulted embeddings are still ineffective in overcoming complexity of domains at inference time. We present a self-supervision framework WIDIn, Wording Images…

    Submitted 28 May, 2024; originally announced May 2024.

  13. arXiv:2405.11145  [pdf, other]

    cs.CV cs.AI cs.MM

    Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

    Authors: Junzhang Liu, Zhecan Wang, Hammad Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang

    Abstract: Despite the widespread adoption of Vision-Language Understanding (VLU) benchmarks such as VQA v2, OKVQA, A-OKVQA, GQA, VCR, SWAG, and VisualCOMET, our analysis reveals a pervasive issue affecting their integrity: these benchmarks contain samples where answers rely on assumptions unsupported by the provided context. Training models on such data foster biased learning and hallucinations as models te…

    Submitted 25 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  14. arXiv:2405.00736  [pdf, other]

    eess.SP cs.LG

    Joint Signal Detection and Automatic Modulation Classification via Deep Learning

    Authors: Huijun Xing, Xuhui Zhang, Shuo Chang, Jinke Ren, Zixun Zhang, Jie Xu, Shuguang Cui

    Abstract: Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different ca…

    Submitted 29 April, 2024; originally announced May 2024.

  15. arXiv:2404.16030  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MoDE: CLIP Data Experts via Clustering

    Authors: Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu

    Abstract: The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inferen…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: IEEE CVPR 2024 Camera Ready. Code Link: https://github.com/facebookresearch/MetaCLIP/tree/main/mode

  16. arXiv:2404.14852  [pdf, other]

    cs.CV

    Ultrasound Nodule Segmentation Using Asymmetric Learning with Simple Clinical Annotation

    Authors: Xingyue Zhao, Zhongyu Li, Xiangde Luo, Peiqi Li, Peng Huang, Jianwei Zhu, Yang Liu, Jihua Zhu, Meng Yang, Shi Chang, Jun Dong

    Abstract: Recent advances in deep learning have greatly facilitated the automated segmentation of ultrasound images, which is essential for nodule morphological analysis. Nevertheless, most existing methods depend on extensive and precise annotations by domain experts, which are labor-intensive and time-consuming. In this study, we suggest using simple aspect ratio annotations directly from ultrasound clini…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by TCSVT

  17. arXiv:2404.12274  [pdf, other]

    cs.CL cs.AI

    Advancing the Robustness of Large Language Models through Self-Denoised Smoothing

    Authors: Jiabao Ji, Bairu Hou, Zhen Zhang, Guanhua Zhang, Wenqi Fan, Qing Li, Yang Zhang, Gaowen Liu, Sijia Liu, Shiyu Chang

    Abstract: Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised considerable concerns. However, the increasing size of these models and their limited access make improving their robustness a challenging task. Among various defense strategies, randomized smoothing has shown great potential for…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by NAACL 2024. Jiabao, Bairu, Zhen, Guanhua contributed equally. This is an updated version of the paper: arXiv:2307.07171

  18. arXiv:2404.07973  [pdf, other]

    cs.CV

    Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

    Authors: Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

    Abstract: While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it poses certain limitations: constrained by the pre-trained fixed visual encoder and failed to perform well on broader tasks. In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any resolution grounding and…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Preprint. 14 pages, 4 figures

  19. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  20. arXiv:2403.18600  [pdf, other]

    cs.CV cs.AI cs.RO

    RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos

    Authors: Ali Zare, Yulei Niu, Hammad Ayyubi, Shih-fu Chang

    Abstract: Procedure Planning in instructional videos entails generating a sequence of action steps based on visual observations of the initial and target states. Despite the rapid progress in this task, there remain several critical challenges to be solved: (1) Adaptive procedures: Prior works hold an unrealistic assumption that the number of action steps is known and fixed, leading to non-generalizable mod…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 23 pages, 6 figures, 12 tables

  21. arXiv:2403.17706  [pdf, other]

    cs.CL cs.AI

    Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement

    Authors: Shuyu Chang, Rui Wang, Peng Ren, Haiping Huang

    Abstract: Crafting effective topic models for brief texts, like tweets and news headlines, is essential for capturing the swift shifts in social dynamics. Traditional topic models, however, often fall short in accurately representing the semantic intricacies of short texts due to their brevity and lack of contextual data. In our study, we harness the advanced capabilities of Large Language Models (LLMs) to…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 6 pages, 4 figures

  22. Multi-modal Heart Failure Risk Estimation based on Short ECG and Sampled Long-Term HRV

    Authors: Sergio González, Abel Ko-Chun Yi, Wan-Ting Hsieh, Wei-Chao Chen, Chun-Li Wang, Victor Chien-Chia Wu, Shang-Hung Chang

    Abstract: Cardiovascular diseases, including Heart Failure (HF), remain a leading global cause of mortality, often evading early detection. In this context, accessible and effective risk assessment is indispensable. Traditional approaches rely on resource-intensive diagnostic tests, typically administered after the onset of symptoms. The widespread availability of electrocardiogram (ECG) technology and the…

    Submitted 29 February, 2024; originally announced March 2024.

    Journal ref: S. González, A. K.-C. Yi, W.-T. Hsieh, W.-C. Chen, C.-L. Wang, V. C.-C. Wu, S.-H. Chang, Multi-modal heart failure risk estimation based on short ECG and sampled long-term HRV, Information Fusion 107 (2024) 102337

  23. arXiv:2403.13951  [pdf, other]

    cs.CV cs.AI

    ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On

    Authors: Jeffrey Zhang, Kedan Li, Shao-Yu Chang, David Forsyth

    Abstract: Virtual Try-on (VTON) involves generating images of a person wearing selected garments. Diffusion-based methods, in particular, can create high-quality images, but they struggle to maintain the identities of the input garments. We identified this problem stems from the specifics in the training formulation for diffusion. To address this, we propose a unique training scheme that limits the scope in…

    Submitted 20 March, 2024; originally announced March 2024.

  24. arXiv:2403.12027  [pdf, other]

    cs.CL cs.AI cs.CV

    From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

    Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

    Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increa…

    Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  25. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  26. arXiv:2403.03532  [pdf, other]

    cs.CV

    Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

    Authors: Quan Liu, Hongzi Zhu, Zhenxi Wang, Yunsong Zhou, Shan Chang, Minyi Guo

    Abstract: Registration of point clouds collected from a pair of distant vehicles provides a comprehensive and accurate 3D view of the driving scenario, which is vital for driving safety related applications, yet existing literature suffers from the expensive pose label acquisition and the deficiency to generalize to new data distributions. In this paper, we propose EYOC, an unsupervised distant point cloud…

    Submitted 27 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  27. arXiv:2403.01599  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

    Authors: Yulei Niu, Wenliang Guo, Long Chen, Xudong Lin, Shih-Fu Chang

    Abstract: We study the problem of procedure planning in instructional videos, which aims to make a goal-oriented sequence of action steps given partial visual state observations. The motivation of this problem is to learn a structured and plannable state and action space. Recent works succeeded in sequence modeling of steps with only sequence-level annotations accessible during training, which overlooked th…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024

  28. arXiv:2403.01335  [pdf, other]

    cs.PL cs.HC

    Making Hybrid Languages: A Recipe

    Authors: Leif Andersen, Cameron Moy, Stephen Chang, Matthias Felleisen

    Abstract: The dominant programming languages support only linear text to express ideas. Visual languages offer graphical representations for entire programs, when viewed with special tools. Hybrid languages, with support from existing tools, allow developers to express their ideas with a mix of textual and graphical syntax tailored to an application domain. This mix puts both kinds of syntax on equal footin…

    Submitted 2 March, 2024; originally announced March 2024.

  29. arXiv:2402.18697  [pdf, other]

    stat.ML cs.LG cs.SI math.OC math.ST

    Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting

    Authors: Serina Chang, Frederic Koehler, Zhaonan Qu, Jure Leskovec, Johan Ugander

    Abstract: A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, th…

    Submitted 28 February, 2024; originally announced February 2024.

  30. arXiv:2402.18362  [pdf, other]

    cs.CV cs.AI

    Objective and Interpretable Breast Cosmesis Evaluation with Attention Guided Denoising Diffusion Anomaly Detection Model

    Authors: Sangjoon Park, Yong Bae Kim, Jee Suk Chang, Seo Hee Choi, Hyungjin Chung, Ik Jae Lee, Hwa Kyung Byun

    Abstract: As advancements in the field of breast cancer treatment continue to progress, the assessment of post-surgical cosmetic outcomes has gained increasing significance due to its substantial impact on patients' quality of life. However, evaluating breast cosmesis presents challenges due to the inherently subjective nature of expert labeling. In this study, we present a novel automated approach, Attenti…

    Submitted 28 February, 2024; originally announced February 2024.

  31. arXiv:2402.17275  [pdf, other]

    cs.CV

    One-Shot Structure-Aware Stylized Image Synthesis

    Authors: Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong

    Abstract: While GAN-based models have been successful in image stylization tasks, they often struggle with structure preservation while stylizing a wide range of input images. Recently, diffusion models have been adopted for image stylization but still lack the capability to maintain the original quality of input images. Building on this, we propose OSASIS: a novel one-shot stylization method that is robust…

    Submitted 1 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  32. arXiv:2402.16827  [pdf, other]

    cs.CL cs.LG

    A Survey on Data Selection for Language Models

    Authors: Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang

    Abstract: A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality of available text data can vary. Filtering out data can also decrease the carbon footprint and financial costs of training models by reducing the am…

    Submitted 8 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Paper list available at https://github.com/alon-albalak/data-selection-survey

  33. arXiv:2402.16192  [pdf, other]

    cs.CL

    Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

    Authors: Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

    Abstract: Aligned large language models (LLMs) are vulnerable to jailbreaking attacks, which bypass the safeguards of targeted LLMs and fool them into generating objectionable content. While initial defenses show promise against token-based threat models, there do not exist defenses that provide robustness against semantic attacks and avoid unfavorable trade-offs between robustness and nominal performance.…

    Submitted 28 February, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: 37 pages

  34. arXiv:2402.15939  [pdf]

    eess.IV cs.LG

    Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

    Authors: Zi Wang, Min Xiao, Yirong Zhou, Chengyan Wang, Naiming Wu, Yi Li, Yiwen Gong, Shufu Chang, Yinyin Chen, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Di Guo, Guang Yang, Xiaobo Qu

    Abstract: Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge leads to necessitate extensive training data in many deep learning reconstruction methods. This work proposes a novel and efficient approach, levera…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures, 3 tables

  35. arXiv:2402.09812  [pdf, other]

    cs.CV

    DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

    Authors: Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang

    Abstract: The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the reference concepts using unique text embeddings often fail to accurately mimic the appearance of the reference. To address this, one solution may be explicitly con…

    Submitted 23 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Project page is available at https://ku-cvlab.github.io/DreamMatcher/

  36. ReviewFlow: Intelligent Scaffolding to Support Academic Peer Reviewing

    Authors: Lu Sun, Aaron Chan, Yun Seo Chang, Steven P. Dow

    Abstract: Peer review is a cornerstone of science. Research communities conduct peer reviews to assess contributions and to improve the overall quality of science work. Every year, new community members are recruited as peer reviewers for the first time. How could technology help novices adhere to their community's practices and standards for peer reviewing? To better understand peer review practices and ch…

    Submitted 26 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 19 pages, accepted at the 29th ACM Conference on Intelligent User Interfaces (IUI 2024)

  37. arXiv:2401.15555  [pdf, other]

    cs.CL

    Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion

    Authors: Yujian Liu, Jiabao Ji, Tong Yu, Ryan Rossi, Sungchul Kim, Handong Zhao, Ritwik Sinha, Yang Zhang, Shiyu Chang

    Abstract: Table question answering is a popular task that assesses a model's ability to understand and interact with structured data. However, the given table often does not contain sufficient information for answering the question, necessitating the integration of external knowledge. Existing methods either convert both the table and external knowledge into text, which neglects the structured nature of the…

    Submitted 27 January, 2024; originally announced January 2024.

  38. arXiv:2401.12789  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

    Authors: W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath

    Abstract: In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck. We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. Our approach combines the Universal Speech Model (USM) and the PaLM 2 language model in per-segment scoring mode, achieving an average…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  39. arXiv:2401.10293  [pdf, other]

    quant-ph cs.LG

    Symmetry breaking in geometric quantum machine learning in the presence of noise

    Authors: Cenk Tüysüz, Su Yeon Chang, Maria Demidik, Karl Jansen, Sofia Vallecorsa, Michele Grossi

    Abstract: Geometric quantum machine learning based on equivariant quantum neural networks (EQNN) recently appeared as a promising direction in quantum machine learning. Despite the encouraging progress, the studies are still limited to theory, and the role of hardware noise in EQNN training has never been explored. This work studies the behavior of EQNN models in the presence of noise. We show that certain…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 12 pages, 10 figures. supplementary material 7 pages, 6 figures

  40. arXiv:2401.04390  [pdf, other]

    cs.CV

    Learning with Noisy Labels: Interconnection of Two Expectation-Maximizations

    Authors: Heewon Kim, Hyun Sung Chang, Kiho Cho, Jaeyun Lee, Bohyung Han

    Abstract: Labor-intensive labeling becomes a bottleneck in developing computer vision algorithms based on deep learning. For this reason, dealing with imperfect labels has increasingly gained attention and has become an active field of study. We address learning with noisy labels (LNL) problem, which is formalized as a task of finding a structured manifold in the midst of noisy data. In this framework, we p…

    Submitted 9 January, 2024; originally announced January 2024.

  41. arXiv:2401.02097  [pdf, other]

    cs.CV

    Preserving Image Properties Through Initializations in Diffusion Models

    Authors: Jeffrey Zhang, Shao-Yu Chang, Kedan Li, David Forsyth

    Abstract: Retail photography imposes specific requirements on images. For instance, images may need uniform background colors, consistent model poses, centered products, and consistent lighting. Minor deviations from these standards impact a site's aesthetic appeal, making the images unsuitable for use. We show that Stable Diffusion methods, as currently applied, do not respect these requirements. The usual…

    Submitted 4 January, 2024; originally announced January 2024.

  42. arXiv:2312.17495  [pdf]

    cs.LG physics.bio-ph q-bio.BM

    Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

    Authors: Xiaohua Lu, Liangxu Xie, Lei Xu, Rongzhi Mao, Shan Chang, Xiaojun Xu

    Abstract: Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, the inherent limitation of mono-modal learning arises from relying solely on one modality of molecular representation, which restricts a comprehensive understanding of drug molecul…

    Submitted 29 December, 2023; originally announced December 2023.

  43. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  44. arXiv:2312.11476  [pdf]

    physics.geo-ph cs.LG

    The geometry of flow: Advancing predictions of river geometry with multi-model machine learning

    Authors: Shuyu Y Chang, Zahra Ghahremani, Laura Manuel, Mohammad Erfani, Chaopeng Shen, Sagy Cohen, Kimberly Van Meter, Jennifer L Pierce, Ehab A Meselhe, Erfan Goharian

    Abstract: Hydraulic geometry parameters describing river hydrogeomorphic is important for flood forecasting. Although well-established, power-law hydraulic geometry curves have been widely used to understand riverine systems and mapping flooding inundation worldwide for the past 70 years, we have become increasingly aware of the limitations of these approaches. In the present study, we have moved beyond the…

    Submitted 27 November, 2023; originally announced December 2023.

    Comments: 30 pages, 10 figures

  45. arXiv:2312.11123  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

    Authors: Guru Prakash Arumugam, Shuo-yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia

    Abstract: ASR models often suffer from a long-form deletion problem where the model predicts sequential blanks instead of words when transcribing a lengthy audio (in the order of minutes or hours). From the perspective of a user or downstream system consuming the ASR results, this behavior can be perceived as the model "being stuck", and potentially make the product hard to use. One of the culprits for long…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 8 pages, ASRU 2023

  46. All Attention U-NET for Semantic Segmentation of Intracranial Hemorrhages In Head CT Images

    Authors: Chia Shuo Chang, Tian Sheuan Chang, Jiun Lin Yan, Li Ko

    Abstract: Intracranial hemorrhages in head CT scans serve as a first line tool to help specialists diagnose different types. However, their types have diverse shapes in the same type but similar confusing shape, size and location between types. To solve this problem, this paper proposes an all attention U-Net. It uses channel attentions in the U-Net encoder side to enhance class specific feature extraction,…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 2022 IEEE Biomedical Circuits and Systems Conference (BioCAS)

  47. arXiv:2312.10160  [pdf, other]

    cs.CL

    Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

    Authors: Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji

    Abstract: Recent advancements in large vision-language models (LVLMs) have led to significant progress in generating natural language descriptions for visual content and thus enhancing various applications. One issue with these powerful models is that they sometimes produce texts that are factually inconsistent with the visual input. While there has been some effort to mitigate such inconsistencies in natur…

    Submitted 30 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: ACL 2024 Findings

  48. arXiv:2312.06038  [pdf, other]

    cs.CV cs.LG

    Correcting Diffusion Generation through Resampling

    Authors: Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang

    Abstract: Despite diffusion models' superior capabilities in modeling complex distributions, there are still non-trivial distributional discrepancies between generated and ground-truth images, which has resulted in several notable problems in image generation, including missing object errors in text-to-image generation and low image quality. Existing methods that attempt to address these problems mostly do…

    Submitted 10 December, 2023; originally announced December 2023.

  49. arXiv:2312.03772  [pdf, other]

    cs.CV

    DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing

    Authors: Shao-Yu Chang, Hwann-Tzong Chen, Tyng-Luh Liu

    Abstract: We present a diffusion-based video editing framework, namely DiffusionAtlas, which can achieve both frame consistency and high fidelity in editing video object appearance. Despite the success in image editing, diffusion models still encounter significant hindrances when it comes to video editing due to the challenge of maintaining spatiotemporal consistency in the object's appearance across frames…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Preprint

  50. arXiv:2312.02188  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM

    Video Summarization: Towards Entity-Aware Captions

    Authors: Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang

    Abstract: Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named entities for meaningful summarization. As such, we propose the task of summarizing news video directly to entity-aware captions. We also release a large-scale…

    Submitted 1 December, 2023; originally announced December 2023.