-
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Authors:
Xiaoyu Zhu,
Hao Zhou,
Pengfei Xing,
Long Zhao,
Hao Xu,
Junwei Liang,
Alexander Hauptmann,
Ting Liu,
Andrew Gallagher
Abstract:
In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding tasks. Diff2Scene gets rid of any labeled 3D data and effectively identifies objects, appearances, materials, locations and their compositions in 3D scenes. We show that it outperforms competitive baselines and achieves significant improvements over state-of-the-art methods. In particular, Diff2Scene improves the state-of-the-art method on ScanNet200 by 12%.
Submitted 18 July, 2024;
originally announced July 2024.
-
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Authors:
Han Guo,
William Brandon,
Radostin Cholakov,
Jonathan Ragan-Kelley,
Eric P. Xing,
Yoon Kim
Abstract:
The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the primary bottleneck is the cost of transferring model parameters from the GPU's global memory to its registers. When coupled with custom kernels that fuse the dequantization and matmul operations, weight-only quantization can thus enable faster inference by reducing the amount of memory movement. However, developing high-performance kernels for weight-quantized LLMs presents substantial challenges, especially when the weights are compressed to non-evenly-divisible bit widths (e.g., 3 bits) with non-uniform, lookup table (LUT) quantization. This paper describes FLUTE, a flexible lookup table engine for LUT-quantized LLMs, which uses offline restructuring of the quantized weight matrix to minimize bit manipulations associated with unpacking, and vectorization and duplication of the lookup table to mitigate shared memory bandwidth constraints. At batch sizes < 32 and quantization group size of 128 (typical in LLM inference), the FLUTE kernel can be 2-4x faster than existing GEMM kernels. As an application of FLUTE, we explore a simple extension to lookup table-based NormalFloat quantization and apply it to quantize LLaMA3 to various configurations, obtaining competitive quantization performance against strong baselines while obtaining an end-to-end throughput increase of 1.5 to 2 times.
Submitted 15 July, 2024;
originally announced July 2024.
-
EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction
Authors:
Jingheng Ye,
Shang Qin,
Yinghui Li,
Xuxin Cheng,
Libo Qin,
Hai-Tao Zheng,
Peng Xing,
Zishan Xu,
Guo Cheng,
Zhao Wei
Abstract:
Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for Chinese EXGEC consisting of 8,216 explanation-augmented samples featuring the design of hybrid edit-wise explanations. We benchmark several series of LLMs in multiple settings, covering post-explaining and pre-explaining. To promote the development of the task, we introduce a comprehensive suite of automatic metrics and conduct human evaluation experiments to demonstrate the human consistency of the automatic metrics for free-text explanations. All the codes and data will be released after the review.
Submitted 30 June, 2024;
originally announced July 2024.
-
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation
Authors:
Haofan Wang,
Peng Xing,
Renyuan Huang,
Hao Ai,
Qixun Wang,
Xu Bai
Abstract:
Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. Although diffusion models have demonstrated impressive generative power in personalized subject-driven or style-driven applications, existing state-of-the-art methods still encounter difficulties in achieving a seamless balance between content preservation and style enhancement. For example, amplifying the style's influence can often undermine the structural integrity of the content. To address these challenges, we deconstruct the style transfer task into three core elements: 1) Style, focusing on the image's aesthetic characteristics; 2) Spatial Structure, concerning the geometric arrangement and composition of visual elements; and 3) Semantic Content, which captures the conceptual meaning of the image. Guided by these principles, we introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style. Specifically, our method accomplishes style injection through an efficient, lightweight process, utilizing the cutting-edge InstantStyle framework. To reinforce the content preservation, we initiate the process with an inverted content latent noise and a versatile plug-and-play tile ControlNet for preserving the original image's intrinsic layout. We also incorporate a global semantic adapter to enhance the semantic content's fidelity. To safeguard against the dilution of style information, a style extractor is employed as discriminator for providing supplementary style guidance. Codes will be available at https://github.com/instantX-research/InstantStyle-Plus.
Submitted 30 June, 2024;
originally announced July 2024.
-
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Authors:
Sukmin Yun,
Haokun Lin,
Rusiru Thushara,
Mohammad Qazim Bhat,
Yongxin Wang,
Zutao Jiang,
Mingkai Deng,
Jinhong Wang,
Tianhua Tao,
Junbo Li,
Haonan Li,
Preslav Nakov,
Timothy Baldwin,
Zhengzhong Liu,
Eric P. Xing,
Xiaodan Liang,
Zhiqiang Shen
Abstract:
Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs. For dataset construction, we leverage pretrained LLMs to enhance existing webpage-to-code datasets as well as generate a diverse pool of new webpages rendered into images. Specifically, the inputs are webpage images and instructions, while the responses are the webpage's HTML code. We further include diverse natural language QA pairs about the webpage content in the responses to enable a more comprehensive understanding of the web content. To evaluate model performance in these tasks, we develop an evaluation framework for testing MLLMs' abilities in webpage understanding and web-to-code generation. Extensive experiments show that our proposed dataset is beneficial not only to our proposed tasks but also in the general visual domain, while previous datasets result in worse performance. We hope our work will contribute to the development of general MLLMs suitable for web-based content generation and task automation. Our data and code will be available at https://github.com/MBZUAI-LLM/web2code.
Submitted 28 June, 2024;
originally announced June 2024.
-
Pandora: Towards General World Model with Natural Language Actions and Video States
Authors:
Jiannan Xiang,
Guangyi Liu,
Yi Gu,
Qiyue Gao,
Yuting Ning,
Yuheng Zha,
Zeyu Feng,
Tianhua Tao,
Shibo Hao,
Yemin Shi,
Zhengzhong Liu,
Eric P. Xing,
Zhiting Hu
Abstract:
World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper makes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training-from-scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We illustrate extensive outputs by Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate great potential of building stronger general world models with larger-scale training.
Submitted 12 June, 2024;
originally announced June 2024.
-
A Recover-then-Discriminate Framework for Robust Anomaly Detection
Authors:
Peng Xing,
Dong Zhang,
Jinhui Tang,
Zechao Li
Abstract:
Anomaly detection (AD) has been extensively studied and applied in a wide range of scenarios in the recent past. However, there are still gaps between the achieved and desirable levels of recognition accuracy for applying AD in practice. In this paper, we start from an insightful analysis of two types of fundamental yet representative failure cases in the baseline model, and reveal reasons that hinder current AD methods from achieving a higher recognition accuracy. Specifically, by Case-1, we found that the main factor detrimental to current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, which leads to normal areas not being recovered, and abnormal areas being recovered, into their original state. By Case-2, we surprisingly found that the abnormal area that cannot be recognized in image-level representations can be easily recognized in the feature-level representation. Based on the above observations, we propose a novel Recover-then-Discriminate (ReDi) framework for AD. ReDi takes a self-generated feature map and a selected prompted image as explicit input information to solve the problems in Case-1. Concurrently, a feature-level discriminative network is proposed to enhance abnormal differences between the recovered representation and the input representation. Extensive experimental results on two popular yet challenging AD datasets validate that ReDi achieves the new state-of-the-art accuracy.
Submitted 6 June, 2024;
originally announced June 2024.
-
Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
Authors:
Peng Xing,
Ning Wang,
Jianbo Ouyang,
Zechao Li
Abstract:
The remarkable advancement in text-to-image generation models significantly boosts the research in ID customization generation. However, existing personalization methods cannot simultaneously satisfy high fidelity and high-efficiency requirements. Their main bottleneck lies in the prompt image encoder, which produces weak alignment signals with the text-to-image model and significantly increased model size. Towards this end, we propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images utilizing a pre-trained text-to-image model via DDIM image inversion, without additional image encoder. Benefiting from the high alignment of the extracted ID prompt features and the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model by carefully designing a lightweight attention adapter. We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
Submitted 6 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
3D transcranial Dynamic Ultrasound Localization Microscopy in the mouse brain using a Row-Column Array
Authors:
Alice Wu,
Jonathan Porée,
Gerardo Ramos-Palacios,
Chloé Bourquin,
Nin Ghigo,
Alexis Leconte,
Paul Xing,
Abbas F. Sadikot,
Michaël Chassé,
Jean Provost
Abstract:
The role of brain hemodynamics in neurodegenerative diseases cannot be fully assessed using existing imaging technologies. Recently, 2D Dynamic Ultrasound Localization Microscopy (DULM) has allowed for the quantitative mapping of the pulsatile flow at sub-wavelength resolution. However, to obtain accurate velocity estimates, 3D imaging is better suited, especially for complex vascularized organs like the brain. 3D+t DULM is achievable using matrix array probes, but suffers from limitations in terms of cost, device complexity associated with the high channel count, and operating frequencies. Alternatively, Row Column Arrays (RCA) can reduce the number of elements while maintaining a large field of view and high frame rate. Herein, we demonstrate the feasibility of performing 3D+t blood flow measurements in the mouse brain using an RCA and a DULM sequence with a high spatiotemporal resolution. Transcranial images of anesthetized mice (n=7) were acquired at a volume rate of 750 Hz using 42 tilted plane waves. After microbubble localization and tracking, super-resolved dynamic density and velocity maps of the 3D brain vascular network were obtained. Cortical vessels were segmented and pulsatility in the arteries was significantly higher than in veins for all mice, in accordance with the literature. Our results demonstrate the feasibility and reproducibility of achieving high spatiotemporal resolution volumes of the mouse brain vasculature with DULM using an RCA.
Submitted 3 June, 2024;
originally announced June 2024.
-
Learning Discrete Concepts in Latent Hierarchical Models
Authors:
Lingjing Kong,
Guangyi Chen,
Biwei Huang,
Eric P. Xing,
Yuejie Chi,
Kun Zhang
Abstract:
Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encodes different abstraction levels of concepts embedded in high-dimensional data (e.g., a dog breed and its eye shapes in natural images). We formulate conditions to facilitate the identification of the proposed causal model, which reveals when learning such concepts from unsupervised data is possible. Our conditions permit complex causal hierarchical structures beyond latent trees and multi-level directed acyclic graphs in prior work and can handle high-dimensional, continuous observed variables, which is well-suited for unstructured data modalities such as images. We substantiate our theoretical claims with synthetic data experiments. Further, we discuss our theory's implications for understanding the underlying mechanisms of latent diffusion models and provide corresponding empirical evidence for our theoretical insights.
Submitted 1 June, 2024;
originally announced June 2024.
-
Towards Transcranial 3D Ultrasound Localization Microscopy of the Nonhuman Primate Brain
Authors:
Paul Xing,
Vincent Perrot,
Adan Ulises Dominguez-Vargas,
Stephan Quessy,
Numa Dancause,
Jean Provost
Abstract:
Hemodynamic changes occur in stroke and neurodegenerative diseases. Developing imaging techniques allowing the in vivo visualization and quantification of cerebral blood flow would help better understand the underlying mechanism of those cerebrovascular diseases. 3D ultrasound localization microscopy (ULM) is a novel technology that can map the microvasculature of the brain at large depth and has been mainly used until now in rodents. Here, we demonstrated the feasibility of 3D ULM of the nonhuman primate (NHP) brain with a single 256-channel programmable ultrasound scanner. We achieved a highly resolved vascular map of the macaque brain at large depth in the presence of craniotomy and durectomy using an 8-MHz multiplexed matrix probe. We were able to distinguish vessels as small as 26.9 μm. We also demonstrated that transcranial imaging of the macaque brain at similar depth was feasible using a 3-MHz probe and achieved a resolution of 60.4 μm. This work paves the way to clinical application of 3D ULM.
Submitted 4 April, 2024;
originally announced April 2024.
-
Toward Inference-optimal Mixture-of-Expert Large Language Models
Authors:
Longfei Yun,
Yonghao Zhuang,
Yao Fu,
Eric P Xing,
Hao Zhang
Abstract:
Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth of training cost of dense transformers. Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation on the model size and number of tokens? We study the scaling law of MoE-based LLMs regarding the relations between the model performance, model size, dataset size, and the expert degree. Echoing previous research studying MoE in different contexts, we observe the diminishing return of increasing the number of experts, but this seems to suggest we should scale the number of experts until saturation, as the training cost would remain constant, which is problematic during inference time. We propose to amend the scaling law of MoE by introducing inference efficiency as another metric besides the validation loss. We find that MoEs with a few (4/8) experts are the most serving-efficient solution under the same performance, but cost 2.5-3.5x more in training. On the other hand, training a (16/32) expert MoE much smaller (70-85%) than the loss-optimal solution, but with a larger training dataset, is a promising setup under a training budget.
Submitted 3 April, 2024;
originally announced April 2024.
-
Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding
Authors:
Guangyi Liu,
Yu Wang,
Zeyu Feng,
Qiyu Wu,
Liping Tang,
Yuan Gao,
Zhen Li,
Shuguang Cui,
Julian McAuley,
Zichao Yang,
Eric P. Xing,
Zhiting Hu
Abstract:
The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images. Existing model families, like variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, and (latent) diffusion models, generally excel in specific capabilities and data types but fall short in others. We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs) which integrate the core capabilities for broad applicability and enhanced performance. EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding. Crucially, EDDPMs are compatible with the well-established diffusion model objective and training recipes, allowing effective learning of the encoder-decoder parameters jointly with diffusion. By choosing appropriate encoder/decoder (e.g., large language models), EDDPMs naturally apply to different data types. Extensive experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks and the strong improvement over various existing models.
Submitted 5 June, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Authors:
Omkar Thawakar,
Ashmal Vayani,
Salman Khan,
Hisham Cholakal,
Rao M. Anwer,
Michael Felsberg,
Tim Baldwin,
Eric P. Xing,
Fahad Shahbaz Khan
Abstract:
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, low memory footprint, and response efficiency. These requisites are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is the introduction of an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, catering to the specific needs of resource-constrained computing with an emphasis on enhanced performance with reduced resource demands. MobiLlama is an SLM design that initiates from a larger model and applies a careful parameter sharing scheme to reduce both the pre-training and the deployment cost. Our work strives not only to bridge the gap in open-source SLMs but also to ensure full transparency, where the complete training data pipeline, training code, model weights, and over 300 checkpoints along with evaluation codes are available at: https://github.com/mbzuai-oryx/MobiLlama.
Submitted 26 February, 2024;
originally announced February 2024.
-
Mitigating Catastrophic Forgetting in Multi-domain Chinese Spelling Correction by Multi-stage Knowledge Transfer Framework
Authors:
Peng Xing,
Yinghui Li,
Shirong Ma,
Xinnian Liang,
Haojing Huang,
Yangning Li,
Hai-Tao Zheng,
Wenhao Jiang,
Ying Shen
Abstract:
Chinese Spelling Correction (CSC) aims to detect and correct spelling errors in given sentences. Recently, multi-domain CSC has gradually attracted the attention of researchers because it is more practicable. In this paper, we focus on the key flaw of the CSC model when adapting to multi-domain scenarios: the tendency to forget previously acquired knowledge upon learning new domain-specific knowledge (i.e., catastrophic forgetting). To address this, we propose a novel model-agnostic Multi-stage Knowledge Transfer (MKT) framework, which utilizes a continuously evolving teacher model for knowledge transfer in each domain, rather than focusing solely on new domain knowledge. Notably, we are the first to apply continual learning methods to the multi-domain CSC task. Experiments prove the effectiveness of our proposed method, and further analyses demonstrate the importance of overcoming catastrophic forgetting for improving the model performance.
Submitted 17 February, 2024;
originally announced February 2024.
-
Pruning Sparse Tensor Neural Networks Enables Deep Learning for 3D Ultrasound Localization Microscopy
Authors:
Brice Rauby,
Paul Xing,
Jonathan Porée,
Maxime Gasse,
Jean Provost
Abstract:
Ultrasound Localization Microscopy (ULM) is a non-invasive technique that allows for the imaging of micro-vessels in vivo, at depth and with a resolution on the order of ten microns. ULM is based on the sub-resolution localization of individual microbubbles injected in the bloodstream. Mapping the whole angioarchitecture requires the accumulation of microbubble trajectories from thousands of frames, typically acquired over a few minutes. ULM acquisition times can be reduced by increasing the microbubble concentration, but this requires more advanced algorithms to detect the microbubbles individually. Several deep learning approaches have been proposed for this task, but they remain limited to 2D imaging, in part due to the associated large memory requirements. Herein, we propose to use sparse tensor neural networks to reduce memory usage in 2D and to improve the scaling of the memory requirement for the extension of deep learning architecture to 3D. We study several approaches to efficiently convert ultrasound data into a sparse format and study the impact of the associated loss of information. When applied in 2D, the sparse formulation reduces the memory requirements by a factor of 2 at the cost of a small reduction of performance when compared against dense networks. In 3D, the proposed approach reduces memory requirements by two orders of magnitude while largely outperforming conventional ULM in high concentration settings. We show that Sparse Tensor Neural Networks in 3D ULM allow for the same benefits as dense deep-learning-based methods in 2D ULM, i.e., the use of higher concentration in silico and reduced acquisition time.
Submitted 14 February, 2024;
originally announced February 2024.
-
Inverse Problem Approach to Aberration Correction for in vivo Transcranial Imaging Based on a Sparse Representation of Contrast-enhanced Ultrasound Data
Authors:
Paul Xing,
Antoine Malescot,
Eric Martineau,
Ravi Rungta,
Jean Provost
Abstract:
Transcranial ultrasound imaging is currently limited by attenuation and aberration induced by the skull. First used in contrast-enhanced ultrasound (CEUS), highly echoic microbubbles allowed for the development of novel imaging modalities such as ultrasound localization microscopy (ULM). Herein, we develop an inverse problem approach to aberration correction (IPAC) that leverages the sparsity of microbubble signals. We propose to use the a priori knowledge of the medium based upon microbubble localization and wave propagation to build a forward model to link the measured signals directly to the aberration function. A standard least-squares inversion is then used to retrieve the aberration function. We first validated IPAC on simulated data of a vascular network using plane wave as well as divergent wave emissions. We then evaluated the reproducibility of IPAC in vivo in 5 mouse brains. We showed that aberration correction improved the contrast of CEUS images by 4.6 dB. For ULM images, IPAC yielded sharper vessels, reduced vessel duplications, and improved the resolution from 21.1 μm to 18.3 μm. Aberration correction also improved hemodynamic quantification for velocity magnitude and flow direction.
Submitted 14 May, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
LLM360: Towards Fully Transparent Open-Source LLMs
Authors:
Zhengzhong Liu,
Aurick Qiao,
Willie Neiswanger,
Hongyi Wang,
Bowen Tan,
Tianhua Tao,
Junbo Li,
Yuqi Wang,
Suqi Sun,
Omkar Pangarkar,
Richard Fan,
Yi Gu,
Victor Miller,
Yonghao Zhuang,
Guowei He,
Haonan Li,
Fajri Koto,
Liping Tang,
Nikhil Ranjan,
Zhiqiang Shen,
Xuguang Ren,
Roberto Iriondo,
Cun Mu,
Zhiting Hu,
Mark Schulze,
et al. (3 additional authors not shown)
Abstract:
The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder progress in the field by degrading transparency into the training of LLMs and forcing teams to rediscover many details in the training process. We present LLM360, an initiative to fully open-source LLMs, which advocates for all training code and data, model checkpoints, and intermediate results to be made available to the community. The goal of LLM360 is to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible by everyone. As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses (at https://www.llm360.ai). We are committed to continually pushing the boundaries of LLMs through this open-source effort. More large-scale and stronger models are underway and will be released in the future.
Submitted 11 December, 2023;
originally announced December 2023.
-
A unified framework combining coherent compounding, harmonic imaging and angular coherence for simultaneous high-quality B-mode and tissue Doppler in ultrafast echocardiography
Authors:
Michael Mougharbel,
Jonathan Porée,
Stephen A. Lee,
Paul Xing,
Alice Wu,
Jean-Claude Tardif,
Jean Provost
Abstract:
Various methods have been proposed to enhance image quality in ultrafast ultrasound. Coherent compounding can improve image quality using multiple steered diverging transmits when motion occurring between transmits is corrected for. Harmonic imaging, a standard technique in conventional focused echocardiography, has been adapted for ultrafast imaging, reducing clutter. Coherence-based approaches have also been shown to increase contrast in clinical settings by enhancing signals from coherent echoes and reducing clutter. Herein, we introduce a simple, unified framework that combines motion correction, harmonic imaging, and angular coherence, showing for the first time that their benefits can be combined in real time. Validation was conducted through in vitro testing on a spinning disk model and in vivo on 4 volunteers. In vitro results confirmed the unified framework's capability to achieve high contrast in large-motion contexts up to 17 cm/s. In vivo testing highlighted proficiency in generating images of high quality during low and high tissue velocity phases of the cardiac cycle. Specifically, during ventricular filling, the unified framework increased the gCNR from 0.47 to 0.87 when compared against coherent compounding.
Submitted 4 December, 2023;
originally announced December 2023.
-
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
Authors:
Han Guo,
Philip Greengard,
Eric P. Xing,
Yoon Kim
Abstract:
We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component. During finetuning, the quantized component remains fixed and only the low-rank component is updated. We present an integer linear programming formulation of the quantization component which enables dynamic configuration of quantization parameters (e.g., bit-width, block size) for each matrix given an overall target memory budget. We further explore a data-aware version of the algorithm which uses an approximation of the Fisher information matrix to weight the reconstruction objective during matrix decomposition. Experiments on finetuning RoBERTa and LLaMA-2 (7B and 70B) demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines and enables aggressive quantization to sub-3 bits with only minor performance degradations. When finetuned on a language modeling calibration dataset, LQ-LoRA can also be used for model compression; in this setting our 2.75-bit LLaMA-2-70B model (which has 2.85 bits on average when including the low-rank components and requires 27GB of GPU memory) performs respectably compared to the 16-bit baseline.
Submitted 30 June, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Dynamic Imaging using any Ultrasound Localization Microscopy Dataset
Authors:
Nin Ghigo,
Gerardo Ramos-Palacios,
Chloé Bourquin,
Paul Xing,
Alice Wu,
Nelson Cortés,
Hugo Ladret,
Lamyae Ikan,
Christian Casanova,
Jonathan Porée,
Abbas Sadikot,
Jean Provost
Abstract:
Ultrasound Localization Microscopy (ULM) relies on the injection of microbubbles (MBs) to obtain highly resolved density maps of blood circulation in vivo, with a resolution that can reach 10 μm ~ λ/10 in the rodent brain. Static mean velocity maps can be extracted but are intrinsically biased by potentially significant changes in the number of MBs detected during the cardiac cycle. Dynamic ULM (DULM) is a technique developed for non-invasive pulsatility measurements in the brain of rodents, leading to temporally resolved velocity and density cine-loops. It was previously based on external triggers such as the electrocardiogram (ECG), limiting its use to datasets acquired specifically for DULM applications while also increasing the required acquisition time. This study presents a new motion matching method using tissue Doppler that eliminates the need for ECG-gating in DULM experiments. DULM can now be performed on any ULM dataset, recovering pertinent temporal information and improving the robustness of the mean velocity estimates.
Submitted 1 November, 2023;
originally announced November 2023.
-
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization
Authors:
Xinyuan Wang,
Chenxi Li,
Zhen Wang,
Fan Bai,
Haotian Luo,
Jiayou Zhang,
Nebojsa Jojic,
Eric P. Xing,
Zhiting Hu
Abstract:
Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both the instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. Addressing this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. Such a novel framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedback (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.
Submitted 7 December, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Contextualized Machine Learning
Authors:
Benjamin Lengerich,
Caleb N. Ellington,
Andrea Rubbi,
Manolis Kellis,
Eric P. Xing
Abstract:
We examine Contextualized Machine Learning (ML), a paradigm for learning heterogeneous and context-dependent effects. Contextualized ML estimates heterogeneous functions by applying deep learning to the meta-relationship between contextual information and context-specific parametric models. This is a form of varying-coefficient modeling that unifies existing frameworks including cluster analysis and cohort modeling by introducing two reusable concepts: a context encoder which translates sample context into model parameters, and a sample-specific model which operates on sample predictors. We review the process of developing contextualized models, nonparametric inference from contextualized models, and identifiability conditions of contextualized models. Finally, we present the open-source PyTorch package ContextualizedML.
Submitted 17 October, 2023;
originally announced October 2023.
-
Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning
Authors:
Jannik Deuschel,
Caleb N. Ellington,
Yingtao Luo,
Benjamin J. Lengerich,
Pascal Friederich,
Eric P. Xing
Abstract:
Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models force a tradeoff between accuracy and interpretability, limiting data-driven interpretations of human decision-making processes. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically under different contexts. Thus, we develop Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem, where each context poses a unique task and complex decision policies can be constructed piece-wise from many simple context-specific policies. CPR models each context-specific policy as a linear map, and generates new policy models on-demand as contexts are updated with new observations. We provide two flavors of the CPR framework: one focusing on exact local interpretability, and one retaining full global interpretability. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on predicting antibiotic prescription in intensive care units (+22% AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients (+7.7% AUROC vs. previous SOTA). With this improvement, CPR closes the accuracy gap between interpretable and black-box methods, allowing high-resolution exploration and analysis of context-specific decision models.
Submitted 7 May, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Authors:
Dacheng Li,
Rulin Shao,
Anze Xie,
Eric P. Xing,
Xuezhe Ma,
Ion Stoica,
Joseph E. Gonzalez,
Hao Zhang
Abstract:
FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU. In this paper, we introduce DISTFLASHATTN, a distributed memory-efficient attention mechanism optimized for long-context LLMs training. We propose three key techniques: token-level workload balancing, overlapping key-value communication, and a rematerialization-aware gradient checkpointing algorithm. We evaluate DISTFLASHATTN on Llama-7B and variants with sequence lengths from 32K to 512K. DISTFLASHATTN achieves 8x longer sequences and a 4.45 - 5.64x speedup compared to Ring Self-Attention, and 2 - 8x longer sequences and a 1.24 - 2.01x speedup compared to Megatron-LM with FlashAttention. It achieves 1.67x and 1.26 - 1.88x speedups compared to recent Ring Attention and DeepSpeed-Ulysses, respectively. Code is available at https://github.com/RulinShao/LightSeq.
Submitted 31 March, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
FedNAR: Federated Optimization with Normalized Annealing Regularization
Authors:
Junbo Li,
Ang Li,
Chong Tian,
Qirong Ho,
Eric P. Xing,
Hongyi Wang
Abstract:
Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution. To address this challenge, we develop Federated optimization with Normalized Annealing Regularization (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithm. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our code is released at https://github.com/ljb121002/fednar.
Submitted 4 October, 2023;
originally announced October 2023.
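The co-clipping idea mentioned in the FedNAR abstract can be illustrated with a minimal Python sketch. It assumes that the gradient and the weight-decay term are summed and that the combined vector is rescaled whenever its L2 norm exceeds a threshold. The function name `co_clipped_step`, the `max_norm` parameter, and this exact clipping rule are hypothetical; the abstract does not specify them, and the released code may differ.

```python
import math


def co_clipped_step(weights, grads, lr, weight_decay, max_norm):
    """One local SGD step with joint clipping (illustrative assumption).

    The gradient and the weight-decay term are combined first, and the
    combined update is rescaled if its L2 norm exceeds max_norm.
    """
    # Combined update direction: gradient plus weight-decay term.
    update = [g + weight_decay * w for w, g in zip(weights, grads)]
    norm = math.sqrt(sum(u * u for u in update))
    # Rescale only when the combined norm exceeds the threshold.
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [w - lr * scale * u for w, u in zip(weights, update)]


# The combined update (3, 4) has norm 5 and is rescaled to norm 1.
new_weights = co_clipped_step([1.0, 0.0], [3.0, 4.0],
                              lr=1.0, weight_decay=0.0, max_norm=1.0)
```

Clipping the sum, rather than the gradient alone, bounds the total step size, so a poorly chosen weight-decay value cannot dominate the update in this sketch.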
-
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Authors:
Lianmin Zheng,
Wei-Lin Chiang,
Ying Sheng,
Tianle Li,
Siyuan Zhuang,
Zhanghao Wu,
Yonghao Zhuang,
Zhuohan Li,
Zi Lin,
Eric P. Xing,
Joseph E. Gonzalez,
Ion Stoica,
Hao Zhang
Abstract:
Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. This dataset is collected from 210K unique IP addresses in the wild on our Vicuna demo and Chatbot Arena website. We offer an overview of the dataset's content, including its curation process, basic statistics, and topic distribution, highlighting its diversity, originality, and scale. We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions. We believe that this dataset will serve as a valuable resource for understanding and advancing LLM capabilities. The dataset is publicly available at https://huggingface.co/datasets/lmsys/lmsys-chat-1m.
Submitted 10 March, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Federated Neuro-Symbolic Learning
Authors:
Pengwei Xing,
Songtao Lu,
Han Yu
Abstract:
Neuro-symbolic learning (NSL) models complex symbolic rule patterns into latent variable distributions by neural networks, which reduces rule search space and generates unseen rules to improve downstream task performance. Centralized NSL learning involves directly acquiring data from downstream tasks, which is not feasible for federated learning (FL). To address this limitation, we shift the focus from such a one-to-one interactive neuro-symbolic paradigm to a one-to-many Federated Neuro-Symbolic Learning framework (FedNSL) with latent variables as the FL communication medium. Built on the basis of our novel reformulation of the NSL theory, FedNSL is capable of identifying and addressing rule distribution heterogeneity through a simple and effective Kullback-Leibler (KL) divergence constraint on rule distribution applicable under the FL setting. It further theoretically adjusts variational expectation maximization (V-EM) to reduce the rule search space across domains. This is the first incorporation of distribution-coupled bilevel optimization into FL. Extensive experiments based on both synthetic and real-world data demonstrate significant advantages of FedNSL compared to five state-of-the-art methods. It outperforms the best baseline by 17% and 29% in terms of unbalanced average training accuracy and unseen average testing accuracy, respectively.
Submitted 27 May, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
A Tracking prior to Localization workflow for Ultrasound Localization Microscopy
Authors:
Alexis Leconte,
Jonathan Porée,
Brice Rauby,
Alice Wu,
Nin Ghigo,
Paul Xing,
Chloé Bourquin,
Gerardo Ramos-Palacios,
Abbas F. Sadikot,
Jean Provost
Abstract:
Ultrasound Localization Microscopy (ULM) has proven effective in resolving microvascular structures and local mean velocities at sub-diffraction-limited scales, offering high-resolution imaging capabilities. Dynamic ULM (DULM) enables the creation of angiography or velocity movies throughout cardiac cycles. Currently, these techniques rely on a Localization-and-Tracking (LAT) workflow consisting of detecting microbubbles (MB) in the frames before pairing them to generate tracks. While conventional LAT methods perform well at low concentrations, they suffer from longer acquisition times and degraded localization and tracking accuracy at higher concentrations, leading to biased angiogram reconstruction and velocity estimation. In this study, we propose a novel approach to address these challenges by reversing the current workflow. The proposed method, Tracking-and-Localization (TAL), relies on first tracking the MB and then performing localization. Through comprehensive benchmarking using both in silico and in vivo experiments and employing various metrics to quantify ULM angiography and velocity maps, we demonstrate that the TAL method consistently outperforms the reference LAT workflow. Moreover, when applied to DULM, TAL successfully extracts velocity variations along the cardiac cycle with improved repeatability. The findings of this work highlight the effectiveness of the TAL approach in overcoming the limitations of conventional LAT methods, providing enhanced ULM angiography and velocity imaging.
Submitted 4 August, 2023;
originally announced August 2023.
-
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Authors:
Lianmin Zheng,
Wei-Lin Chiang,
Ying Sheng,
Siyuan Zhuang,
Zhanghao Wu,
Yonghao Zhuang,
Zi Lin,
Zhuohan Li,
Dacheng Li,
Eric P. Xing,
Hao Zhang,
Joseph E. Gonzalez,
Ion Stoica
Abstract:
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them. We then verify the agreement between LLM judges and human preferences by introducing two benchmarks: MT-bench, a multi-turn question set; and Chatbot Arena, a crowdsourced battle platform. Our results reveal that strong LLM judges like GPT-4 can match both controlled and crowdsourced human preferences well, achieving over 80% agreement, the same level of agreement between humans. Hence, LLM-as-a-judge is a scalable and explainable way to approximate human preferences, which are otherwise very expensive to obtain. Additionally, we show our benchmark and traditional benchmarks complement each other by evaluating several variants of LLaMA and Vicuna. The MT-bench questions, 3K expert votes, and 30K conversations with human preferences are publicly available at https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge.
Submitted 23 December, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Understanding Masked Autoencoders via Hierarchical Latent Variable Models
Authors:
Lingjing Kong,
Martin Q. Ma,
Guangyi Chen,
Eric P. Xing,
Yuejie Chi,
Louis-Philippe Morency,
Kun Zhang
Abstract:
Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. We formulate the underlying data-generating process as a hierarchical latent variable model and show that under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model, explaining why MAE can extract high-level information from pixels. Further, we show how key hyperparameters in MAE (the masking ratio and the patch size) determine which true latent variables are recovered, therefore influencing the level of semantic information in the representation. Specifically, extremely large or small masking ratios inevitably lead to low-level representations. Our theory offers coherent explanations of existing empirical observations and provides insights for potential empirical improvements and fundamental limitations of the masking-reconstruction paradigm. We conduct extensive experiments to validate our theoretical insights.
Submitted 7 June, 2023;
originally announced June 2023.
-
Cuttlefish: Low-Rank Model Training without All the Tuning
Authors:
Hongyi Wang,
Saurabh Agarwal,
Pongsakorn U-chupala,
Yoshiki Tanaka,
Eric P. Xing,
Dimitris Papailiopoulos
Abstract:
Recent research has shown that training low-rank neural networks can effectively reduce the total number of trainable parameters without sacrificing predictive accuracy, resulting in end-to-end speedups. However, low-rank model training necessitates adjusting several additional factorization hyperparameters, such as the rank of the factorization at each layer. In this paper, we tackle this challenge by introducing Cuttlefish, an automated low-rank training approach that eliminates the need for tuning factorization hyperparameters. Cuttlefish leverages the observation that after a few epochs of full-rank training, the stable rank (i.e., an approximation of the true rank) of each layer stabilizes at a constant value. Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged, setting the dimension of each factorization to its corresponding stable rank. Our results show that Cuttlefish generates models up to 5.6 times smaller than full-rank models, and attains up to a 1.2 times faster end-to-end training process while preserving comparable accuracy. Moreover, Cuttlefish outperforms state-of-the-art low-rank model training methods and other prominent baselines. The source code for our implementation can be found at: https://github.com/hwang595/Cuttlefish.
Submitted 5 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
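The stable-rank criterion described in the Cuttlefish abstract can be sketched in plain Python. The sketch below uses the textbook definition of stable rank, the squared Frobenius norm divided by the squared spectral norm, which may differ from the exact variant used by the authors. The helper names (`stable_rank`, `ranks_converged`), the power-iteration estimate, and the 5% relative tolerance are all assumptions made for illustration.

```python
import math
import random


def frobenius_sq(a):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in a for x in row)


def spectral_norm_sq(a, iters=200, seed=0):
    """Estimate the largest eigenvalue of A^T A by power iteration."""
    rng = random.Random(seed)
    n_rows, n_cols = len(a), len(a[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # A v
        w = [sum(a[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # A^T (A v)
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def stable_rank(a):
    """Textbook stable rank: ||A||_F^2 / ||A||_2^2 (0.0 for a zero matrix)."""
    s = spectral_norm_sq(a)
    return frobenius_sq(a) / s if s > 0.0 else 0.0


def ranks_converged(prev, curr, tol=0.05):
    """Hypothetical switch test: every layer's stable rank changed by at most tol (relative)."""
    return all(abs(c - p) <= tol * max(p, 1e-12) for p, c in zip(prev, curr))


# diag(3, 4): Frobenius norm squared is 25, spectral norm squared is 16.
example = stable_rank([[3.0, 0.0], [0.0, 4.0]])
```

Under these assumptions, a training loop would record `stable_rank` for each layer after every epoch and switch to low-rank factors once `ranks_converged` holds between consecutive epochs.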
-
Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach
Authors:
Han Guo,
Philip Greengard,
Hongyi Wang,
Andrew Gelman,
Yoon Kim,
Eric P. Xing
Abstract:
The canonical formulation of federated learning treats it as a distributed optimization problem where the model parameters are optimized against a global loss function that decomposes across client loss functions. A recent alternative formulation instead treats federated learning as a distributed inference problem, where the goal is to infer a global posterior from partitioned client data (Al-Shedivat et al., 2021). This paper extends the inference view and describes a variational inference formulation of federated learning where the goal is to find a global variational posterior that well-approximates the true posterior. This naturally motivates an expectation propagation approach to federated learning (FedEP), where approximations to the global posterior are iteratively refined through probabilistic message-passing between the central server and the clients. We conduct an extensive empirical study across various algorithmic considerations and describe practical strategies for scaling up expectation propagation to the modern federated setting. We apply FedEP on standard federated learning benchmarks and find that it outperforms strong baselines in terms of both convergence speed and accuracy.
Submitted 8 February, 2023;
originally announced February 2023.
-
Does compressing activations help model parallel training?
Authors:
Song Bian,
Dacheng Li,
Hongyi Wang,
Eric P. Xing,
Shivaram Venkataraman
Abstract:
Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression in a model-parallel setting is an understudied area. We have discovered that model parallelism has fundamentally different characteristics than data parallelism. In this work, we present the first empirical study on the effectiveness of compression methods for model parallelism. We implement and evaluate three common classes of compression algorithms - pruning-based, learning-based, and quantization-based - using a popular Transformer training framework. We evaluate these methods across more than 160 settings and 8 popular datasets, taking into account different hyperparameters, hardware, and both fine-tuning and pre-training stages. We also provide analysis when the model is scaled up. Finally, we provide insights for future development of model parallelism compression algorithms.
Submitted 6 January, 2023;
originally announced January 2023.
-
A novel method for searching the $Ξ_c^{0/+}$-$Ξ_c^{\prime 0/+}$ mixing effect in the angular distribution analysis of a four-body $Ξ_c^{0/+}$ decay
Authors:
Zhi Peng Xing,
Yu ji Shi
Abstract:
In this work, we propose a novel method for searching for the $Ξ^{0+}_c$-$Ξ_c^{0+\prime}$ mixing effect in an angular distribution analysis of the $Ξ_c\toΞ^{(\prime)}(Λπ)\ell^+ν$ decay, where the mixing effect can be observed through the appearance of the $Ξ^{\prime}$ resonance. Armed with this angular distribution, we predict the decay branching fraction and the forward-backward asymmetry. We point out that the forward-backward asymmetry, as a function of the invariant mass squared of $Ξ^{(\prime)}$ and the $Ξ^{0+}_c$-$Ξ_c^{0+\prime}$ mixing angle $θ_c$, can be used to distinguish the two resonances $Ξ^{(\prime)}$ and may even make it possible to determine the exact mixing angle.
Submitted 17 December, 2022;
originally announced December 2022.
-
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding
Authors:
Minh-Long Luu,
Zeyi Huang,
Eric P. Xing,
Yong Jae Lee,
Haohan Wang
Abstract:
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks. Over the years, the research community has expanded mix-up methods in two directions, with extensive efforts to improve saliency-guided procedures but minimal focus on the arbitrary path, leaving the randomization domain unexplored. In this paper, inspired by the complementary strengths of the two directions, we introduce a novel method that lies at the junction of the two routes. By combining the best elements of randomness and saliency utilization, our method balances speed, simplicity, and accuracy. We name our method R-Mix following the concept of "Random Mix-up". We demonstrate its effectiveness in generalization, weakly supervised object localization, calibration, and robustness to adversarial attacks. Finally, to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies based on the classifier's performance, reducing dependency on human-designed objectives and hyperparameter tuning. Extensive experiments further show that the agent is capable of performing at the cutting-edge level, laying the foundation for a fully automatic mix-up. Our code is released at https://github.com/minhlong94/Random-Mixup.
Submitted 10 August, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Puffy: A Step-by-step Guide to Craft Bio-inspired Artifacts with Interactive Materiality
Authors:
Sark Pangrui Xing,
Bart van Dijk,
Pengcheng An,
Miguel Bruns,
Yaliang Chuang,
Stephen Jia Wang
Abstract:
A rising number of HCI scholars have begun to use materiality as a starting point for exploring a design's potential and restrictions. Despite this theoretical flourishing, practical design processes and instructions for beginner practitioners remain scarce. We leveraged the pictorial format to illustrate our crafting process of Puffy, a bio-inspired artifact that features a cilia-mimetic surface expressing anthropomorphic qualities through shape changes. Our approach consists of three key activities (i.e., analysis, synthesis, and detailing) interlaced recursively throughout the journey. Using this approach, we analyzed different input sources, synthesized peers' critiques and self-reflection, and detailed the designed experience with iterative prototypes. Building on a reflective analysis of our approach, we concluded with a set of practical implications and design recommendations to help other practitioners initiate their own investigations in interactive materiality.
Submitted 27 November, 2022;
originally announced November 2022.
-
On Optimizing the Communication of Model Parallelism
Authors:
Yonghao Zhuang,
Hexu Zhao,
Lianmin Zheng,
Zhuohan Li,
Eric P. Xing,
Qirong Ho,
Joseph E. Gonzalez,
Ion Stoica,
Hao Zhang
Abstract:
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters. In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh, on which the tensor may be distributed with the same or different layouts. We formalize this as a many-to-many multicast communication problem, and show that existing approaches either are sub-optimal or do not generalize to different network topologies or tensor layouts, which result from different model architectures and parallelism strategies. We then propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule. On microbenchmarks, our overall system outperforms existing ones by up to 10x across various tensor and mesh layouts. On end-to-end training of two large models, GPT-3 and U-Transformer, we improve throughput by 10% and 50%, respectively.
Submitted 9 November, 2022;
originally announced November 2022.
-
MPCFormer: fast, performant and private Transformer inference with MPC
Authors:
Dacheng Li,
Rulin Shao,
Hongyi Wang,
Han Guo,
Eric P. Xing,
Hao Zhang
Abstract:
Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions can increase the inference latency by more than 60x or significantly compromise the inference quality. In this paper, we design the framework MPCFormer as a practical solution, using Secure Multi-Party Computation (MPC) and Knowledge Distillation (KD). Through extensive evaluations, we show that MPCFormer significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model. On the IMDb dataset, it achieves similar performance to BERT$_{\text{BASE}}$, while being 5.3x faster. On the GLUE benchmark, it achieves 97% of the performance of BERT$_{\text{BASE}}$ with a 2.2x speedup. MPCFormer remains effective with different trained Transformer weights such as RoBERTa$_{\text{BASE}}$ and larger models including BERT$_{\text{LARGE}}$. Code is available at https://github.com/MccRee177/MPCFormer.
Submitted 16 March, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
ADPS: Asymmetric Distillation Post-Segmentation for Image Anomaly Detection
Authors:
Peng Xing,
Hao Tang,
Jinhui Tang,
Zechao Li
Abstract:
Knowledge Distillation-based Anomaly Detection (KDAD) methods rely on the teacher-student paradigm to detect and segment anomalous regions by contrasting the unique features extracted by both networks. However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations, and 2) the features of the teacher network serve solely as a "reference standard" and are not fully leveraged. To this end, we depart from the established paradigm and instead propose an innovative approach called Asymmetric Distillation Post-Segmentation (ADPS). Our ADPS employs an asymmetric distillation paradigm that takes distinct forms of the same image as the input of the teacher-student networks, driving the student network to learn discriminating representations for anomalous regions. Meanwhile, a customized Weight Mask Block (WMB) is proposed to generate a coarse anomaly localization mask that transfers the distilled knowledge acquired from the asymmetric paradigm to the teacher network. Equipped with WMB, the proposed Post-Segmentation Module (PSM) is able to effectively detect and segment abnormal regions with fine structures and clear boundaries. Experimental results demonstrate that the proposed ADPS outperforms the state-of-the-art methods in detecting and segmenting anomalies. Surprisingly, ADPS significantly improves the Average Precision (AP) metric by 9% and 20% on the MVTec AD and KolektorSDD2 datasets, respectively.
Submitted 24 July, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models
Authors:
Jiannan Xiang,
Zhengzhong Liu,
Yucheng Zhou,
Eric P. Xing,
Zhiting Hu
Abstract:
Data-to-text generation is challenging due to the great variety of the input data in terms of domains (e.g., finance vs. sports) or schemata (e.g., diverse predicates). Recent end-to-end neural methods thus require substantial training examples to learn to disambiguate and describe the data. Yet, real-world data-to-text problems often suffer from various data-scarce issues: one may have access to only a handful of or no training examples, and/or have to rely on examples in a different domain or schema. To fill this gap, we propose Any-Shot Data-to-Text (ASDOT), a new approach flexibly applicable to diverse settings by making efficient use of any given (or no) examples. ASDOT consists of two steps, data disambiguation and sentence fusion, both of which are amenable to being solved with off-the-shelf pretrained language models (LMs) with optional finetuning. In the data disambiguation stage, we employ the prompted GPT-3 model to understand possibly ambiguous triples from the input data and convert each into a short sentence with reduced ambiguity. The sentence fusion stage then uses an LM like T5 to fuse all the resulting sentences into a coherent paragraph as the final description. We evaluate extensively on various datasets in different scenarios, including the zero-/few-/full-shot settings, and generalization to unseen predicates and out-of-domain data. Experimental results show that ASDOT consistently achieves significant improvement over baselines, e.g., a 30.81 BLEU gain on the DART dataset under the zero-shot setting.
Submitted 22 October, 2022; v1 submitted 9 October, 2022;
originally announced October 2022.
-
Visual Anomaly Detection Via Partition Memory Bank Module and Error Estimation
Authors:
Peng Xing,
Zechao Li
Abstract:
Reconstruction methods based on a memory module for visual anomaly detection attempt to narrow the reconstruction error for normal samples while enlarging it for anomalous samples. Unfortunately, the existing memory module is not fully applicable to the anomaly detection task, and the reconstruction error of anomalous samples remains small. To this end, this work proposes a new unsupervised visual anomaly detection method to jointly learn effective normal features and eliminate unfavorable reconstruction errors. Specifically, a novel Partition Memory Bank (PMB) module is proposed to effectively learn and store detailed features with semantic integrity of normal samples. It develops a new partition mechanism and a unique query generation method to preserve the context information, which in turn improves the learning ability of the memory module. The proposed PMB and the skip connection are alternatively explored to make the reconstruction of abnormal samples worse. To obtain more precise anomaly localization results and solve the problem of cumulative reconstruction error, a novel Histogram Error Estimation module is proposed to adaptively eliminate the unfavorable errors using the histogram of the difference image. It improves the anomaly localization performance without increasing the cost. To evaluate the effectiveness of the proposed method for anomaly detection and localization, extensive experiments are conducted on three widely used anomaly detection datasets. The encouraging performance of the proposed method compared to recent approaches based on the memory module demonstrates its superiority.
Submitted 26 September, 2022;
originally announced September 2022.
-
Self-Supervised Guided Segmentation Framework for Unsupervised Anomaly Detection
Authors:
Peng Xing,
Yanpeng Sun,
Zechao Li
Abstract:
Unsupervised anomaly detection is a challenging task in industrial applications since it is impracticable to collect sufficient anomalous samples. In this paper, a novel Self-Supervised Guided Segmentation Framework (SGSF) is proposed by jointly exploring an effective generation method for forged anomalous samples and the normal sample features as the guidance information of segmentation for anomaly detection. Specifically, to ensure that the generated forged anomalous samples are conducive to model training, the Saliency Augmentation Module (SAM) is proposed. SAM introduces a saliency map to generate a saliency Perlin noise map, and develops an adaptive segmentation strategy to generate irregular masks in the saliency region. Then, the masks are utilized to generate forged anomalous samples as negative samples for training. Unfortunately, the distribution gap between forged and real anomalous samples makes it difficult for models trained on forged samples to effectively locate real anomalies. To this end, the Self-supervised Guidance Network (SGN) is proposed. It leverages the self-supervised module to extract features that are noise-free and contain normal semantic information as the prior knowledge of the segmentation module. The segmentation module, with its knowledge of normal patterns, segments out the abnormal regions that differ from the guidance features. To evaluate the effectiveness of SGSF for anomaly detection, extensive experiments are conducted on three anomaly detection datasets. The experimental results show that SGSF achieves state-of-the-art anomaly detection results.
Submitted 26 September, 2022;
originally announced September 2022.
-
Phase Aberration Correction for in vivo Ultrasound Localization Microscopy Using a Spatiotemporal Complex-Valued Neural Network
Authors:
Paul Xing,
Jonathan Porée,
Brice Rauby,
Antoine Malescot,
Éric Martineau,
Vincent Perrot,
Ravi L. Rungta,
Jean Provost
Abstract:
Ultrasound Localization Microscopy (ULM) can map microvessels at a resolution of a few micrometers (μm). Transcranial ULM remains challenging in the presence of aberrations caused by the skull, which lead to localization errors. Herein, we propose a deep learning approach based on complex-valued convolutional neural networks (CV-CNNs) to retrieve the aberration function, which can then be used to form enhanced images using standard delay-and-sum beamforming. CV-CNNs were selected as they can apply time delays through multiplication with in-phase/quadrature (IQ) input data. Predicting the aberration function rather than corrected images also confers enhanced explainability to the network. In addition, 3D spatiotemporal convolutions were used for the network to leverage entire microbubble tracks. For training and validation, we used an anatomically and hemodynamically realistic mouse brain microvascular network model to simulate the flow of microbubbles in the presence of aberration. The proposed CV-CNN performance was compared to the coherence-based method by using microbubble tracks. We then confirmed the capability of the proposed network to generalize to transcranial in vivo data in the mouse brain (n=3). Vascular reconstructions using a locally predicted aberration function included additional and sharper vessels. The CV-CNN was more robust than the coherence-based method and could perform aberration correction in a 6-month-old mouse. After correction, we measured a resolution of 15.6 μm for younger mice, representing an improvement of 25.8%, while the resolution was improved by 13.9% for the 6-month-old mouse. This work leads to different applications for complex-valued convolutions in biomedical imaging and strategies to perform transcranial ULM.
Submitted 17 July, 2023; v1 submitted 21 September, 2022;
originally announced September 2022.
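
The remark in the abstract above that time delays can be applied through multiplication with IQ data can be made concrete with a small sketch. It assumes the usual narrowband approximation, in which delaying a demodulated signal by τ is approximated by multiplying each complex IQ sample by exp(−j2πf₀τ), with f₀ the demodulation frequency. The sign convention, the function name, and the numerical values are illustrative assumptions and are not taken from the paper.

```python
import cmath
import math


def apply_delay(iq_samples, center_freq_hz, delay_s):
    """Approximate a time delay on IQ samples by a complex phase rotation.

    Assumes the RF signal is Re{IQ(t) * exp(+j 2 pi f0 t)} and that the
    envelope changes slowly over the delay (narrowband approximation).
    """
    phase = cmath.exp(-2j * math.pi * center_freq_hz * delay_s)
    return [sample * phase for sample in iq_samples]


# Hypothetical values: a 1 MHz demodulation frequency and a quarter-period delay.
rotated = apply_delay([1 + 0j, 0 + 1j], center_freq_hz=1e6, delay_s=0.25e-6)
```

Because the operation is a single complex multiplication per sample, a complex-valued convolution kernel can in principle represent it directly, which is consistent with the motivation stated in the abstract.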
-
Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation
Authors:
Gongjie Zhang,
Zhipeng Luo,
Kaiwen Cui,
Shijian Lu,
Eric P. Xing
Abstract:
Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, this paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) neglect of the inter-class correlation among different classes. Such limitations hinder the generalization of base-class knowledge for the detection of novel-class objects. In this work, we design Meta-DETR, which (i) is the first image-level few-shot detector, and (ii) introduces a novel inter-class correlational meta-learning strategy to capture and leverage the correlation among different classes for robust and accurate few-shot object detection. Meta-DETR works entirely at image level without any region proposals, which circumvents the constraint of inaccurate proposals in prevalent few-shot detection frameworks. In addition, the introduced correlational meta-learning enables Meta-DETR to simultaneously attend to multiple support classes within a single feedforward pass, which allows it to capture the inter-class correlation among different classes, thus significantly reducing the misclassification of similar classes and enhancing knowledge generalization to novel classes. Experiments over multiple few-shot object detection benchmarks show that the proposed Meta-DETR outperforms state-of-the-art methods by large margins. The implementation code is available at https://github.com/ZhangGongjie/Meta-DETR.
Submitted 30 July, 2022;
originally announced August 2022.
-
Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion
Authors:
Gongjie Zhang,
Zhipeng Luo,
Jiaxing Huang,
Shijian Lu,
Eric P. Xing
Abstract:
The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributed to the difficulty in matching object queries to relevant regions due to the unaligned semantics between object queries and encoded image features. With this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where each object query can be easily matched to relevant regions with similar semantics. Besides, SAM-DETR++ searches for multiple representative keypoints and exploits their features for semantic-aligned matching with enhanced representation capacity. Furthermore, SAM-DETR++ can effectively fuse multi-scale features in a coarse-to-fine manner on the basis of the designed semantic-aligned matching. Extensive experiments show that the proposed SAM-DETR++ achieves superior convergence speed and competitive detection accuracy. Additionally, as a plug-and-play method, SAM-DETR++ can complement existing DETR convergence solutions with even better performance, achieving 44.8% AP with merely 12 training epochs and 49.1% AP with 50 training epochs on COCO val2017 with ResNet-50. Code is available at https://github.com/ZhangGongjie/SAM-DETR.
Submitted 6 February, 2023; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Robustar: Interactive Toolbox Supporting Precise Data Annotation for Robust Vision Learning
Authors:
Chonghan Chen,
Haohan Wang,
Leyang Hu,
Yuhao Zhang,
Shuguang Lyu,
Jingcheng Wu,
Xinnuo Li,
Linjing Sun,
Eric P. Xing
Abstract:
We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models through a data-driven perspective. Building upon the recent understanding that a machine learning model's lack of robustness stems from its tendency to learn spurious features, we aim to solve this problem at its root, from the data perspective, by removing the spurious features from the data before training. In particular, we introduce software that helps users better prepare data for training image classification models by allowing them to annotate spurious features at the pixel level of images. To facilitate this process, our software also leverages recent advances to help identify potential images and pixels worthy of attention and to continue the training with newly annotated data. Our software is hosted in the GitHub repository https://github.com/HaohanWang/Robustar.
Submitted 18 July, 2022;
originally announced July 2022.
-
MRCLens: an MRC Dataset Bias Detection Toolkit
Authors:
Yifan Zhong,
Haohan Wang,
Eric P. Xing
Abstract:
Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension (MRC), but evidence suggests that the models sometimes exploit dataset biases to make predictions and fail to generalize to out-of-sample data. While many other approaches have been proposed to address this issue from the computation perspective, such as new architectures or training procedures, we believe a method that allows researchers to discover biases and adjust the data or the models at an earlier stage will be beneficial. Thus, we introduce MRCLens, a toolkit that detects whether biases exist before users train the full model. For the convenience of introducing the toolkit, we also provide a categorization of common biases in MRC.
Submitted 18 July, 2022;
originally announced July 2022.
-
BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models
Authors:
Shibo Hao,
Bowen Tan,
Kaiwen Tang,
Bin Ni,
Xiyan Shao,
Hengzhe Zhang,
Eric P. Xing,
Zhiting Hu
Abstract:
It is crucial to automatically construct knowledge graphs (KGs) of diverse new relations to support knowledge discovery and broad applications. Previous KG construction methods, based on either crowdsourcing or text mining, are often limited to a small predefined set of relations due to manual cost or restrictions of the text corpus. Recent research proposed using pretrained language models (LMs) as implicit knowledge bases that accept knowledge queries with prompts. Yet, the implicit knowledge lacks many desirable properties of a full-scale symbolic KG, such as easy access, navigation, editing, and quality assurance. In this paper, we propose a new approach for harvesting massive KGs of arbitrary relations from pretrained LMs. With minimal input of a relation definition (a prompt and a few example entity pairs), the approach efficiently searches the vast entity pair space to extract diverse, accurate knowledge of the desired relation. We develop an effective search-and-rescore mechanism for improved efficiency and accuracy. We deploy the approach to harvest KGs of over 400 new relations from different LMs. Extensive human and automatic evaluations show that our approach manages to extract diverse, accurate knowledge, including tuples of complex relations (e.g., "A is capable of but not good at B"). The resulting KGs, as a symbolic interpretation of the source LMs, also reveal new insights into the LMs' knowledge capacities.
Submitted 2 June, 2023; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Toward Learning Robust and Invariant Representations with Alignment Regularization and Data Augmentation
Authors:
Haohan Wang,
Zeyi Huang,
Xindi Wu,
Eric P. Xing
Abstract:
Data augmentation has been proven to be an effective technique for developing machine learning models that are robust to known classes of distributional shifts (e.g., rotations of images), and alignment regularization is a technique often used together with data augmentation to further help the model learn representations invariant to the shifts used to augment the data. In this paper, motivated by a proliferation of options for alignment regularization, we seek to evaluate the performance of several popular design choices along the dimensions of robustness and invariance, for which we introduce a new test procedure. Our synthetic experiment results speak to the benefits of squared l2 norm regularization. Further, we also formally analyze the behavior of alignment regularization to complement our empirical study under assumptions we consider realistic. Finally, we test the simple technique we identify (worst-case data augmentation with squared l2 norm alignment regularization) and show that the benefits of this method exceed those of the specially designed methods. We also release a software package in both TensorFlow and PyTorch that lets users apply the method in a couple of lines of code at https://github.com/jyanln/AlignReg.
Submitted 4 June, 2022;
originally announced June 2022.
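
The squared l2 norm alignment regularization named in the abstract above can be sketched as a penalty on the distance between a model's outputs for an original sample and for its augmented copy. The sketch below is a minimal illustration under stated assumptions: the penalty is applied to per-sample output vectors, averaged over the batch, and added to a task loss with a weight. These choices and all function names are hypothetical; they do not reproduce the API of the released AlignReg package.

```python
def squared_l2_alignment(clean_outputs, augmented_outputs):
    """Batch mean of ||z_clean - z_aug||_2^2 over paired output vectors."""
    if len(clean_outputs) != len(augmented_outputs):
        raise ValueError("clean and augmented batches must have the same size")
    total = 0.0
    for z_clean, z_aug in zip(clean_outputs, augmented_outputs):
        total += sum((a - b) ** 2 for a, b in zip(z_clean, z_aug))
    return total / len(clean_outputs)


def regularized_loss(task_loss, clean_outputs, augmented_outputs, weight=1.0):
    """Task loss plus a weighted alignment penalty (weight is a free hyperparameter)."""
    return task_loss + weight * squared_l2_alignment(clean_outputs, augmented_outputs)
```

The penalty is zero when the model produces identical outputs for both views and grows quadratically with their disagreement, which is what pushes the representation toward invariance under the augmentation.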