-
Phase-field modeling of dendritic growth with gas bubbles in the solidification of binary alloys
Authors:
Chengjie Zhan,
Zhenhua Chai,
Dongke Sun,
Baochang Shi,
Shaoning Geng,
Ping Jiang
Abstract:
In this work, a phase-field model is developed for the dendritic growth with gas bubbles in the solidification of binary alloys. In this model, a total free energy for the complex gas-liquid-dendrite system is proposed through considering the interactions of gas bubbles, liquid melt and solid dendrites, and it can reduce to the energy for gas-liquid flows in the region far from the solid phase, wh…
▽ More
In this work, a phase-field model is developed for the dendritic growth with gas bubbles in the solidification of binary alloys. In this model, a total free energy for the complex gas-liquid-dendrite system is proposed through considering the interactions of gas bubbles, liquid melt and solid dendrites, and it can reduce to the energy for gas-liquid flows in the region far from the solid phase, while degenerate to the energy for thermosolutal dendritic growth when the gas bubble disappears. The governing equations are usually obtained by minimizing the total free energy, but here some modifications are made to improve the capacity of the conservative phase-field equation for gas bubbles and convection-diffusion equation for solute transfer. Additionally, through the asymptotic analysis of the thin-interface limit, the present general phase-field model for alloy solidification can match the corresponding free boundary problem, and it is identical to the commonly used models under a specific choice of model parameters. Furthermore, to describe the fluid flow, the incompressible Navier-Stokes equations are adopted in the entire domain including gas, liquid, and solid regions, where the fluid-structure interaction is considered by a simple diffuse-interface method. To test the present phase-field model, the lattice Boltzmann method is used to study several problems of gas-liquid flows, dendritic growth as well as the solidification in presence of gas bubbles, and a good performance of the present model for such complex problems is observed.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…
▽ More
In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes of astrophysical $γ$-ray background while large amount of dark matter. By analyzing more than 700 days observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Authors:
Scott Geng,
Cheng-Yu Hsieh,
Vivek Ramanujan,
Matthew Wallingford,
Chun-Liang Li,
Pang Wei Koh,
Ranjay Krishna
Abstract:
Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. What additional value does the intermediate generator provide over directly training on relevant parts of the up…
▽ More
Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. What additional value does the intermediate generator provide over directly training on relevant parts of the upstream data? Grounding this question in the setting of image classification,a we compare finetuning on task-relevant, targeted synthetic data generated by Stable Diffusion -- a generative model trained on the LAION-2B dataset -- against finetuning on targeted real images retrieved directly from LAION-2B. We show that while synthetic data can benefit some downstream tasks, it is universally matched or outperformed by real data from our simple retrieval baseline. Our analysis suggests that this underperformance is partially due to generator artifacts and inaccurate task-relevant visual details in the synthetic images. Overall, we argue that retrieval is a critical baseline to consider when training with synthetic data -- a baseline that current methods do not yet surpass. We release code, data, and models at https://github.com/scottgeng00/unmet-promise.
△ Less
Submitted 3 July, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To…
▽ More
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
△ Less
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source have been carried. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) i…
▽ More
The first source catalog of Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source have been carried. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of the variability at a few months level in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during active period has a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Authors:
Peng Gao,
Le Zhuo,
Dongyang Liu,
Ruoyi Du,
Xu Luo,
Longtian Qiu,
Yuhang Zhang,
Chen Lin,
Rongjie Huang,
Shijie Geng,
Renrui Zhang,
Junlin Xi,
Wenqi Shao,
Zhengkai Jiang,
Tianshuo Yang,
Weicai Ye,
He Tong,
Jingwen He,
Yu Qiao,
Hongsheng Li
Abstract:
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…
▽ More
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions. By tokenizing the latent spatial-temporal space and incorporating learnable placeholders such as [nextline] and [nextframe] tokens, Lumina-T2X seamlessly unifies the representations of different modalities across various spatial-temporal resolutions. This unified approach enables training within a single framework for different modalities and allows for flexible generation of multimodal data at any resolution, aspect ratio, and length during inference. Advanced techniques like RoPE, RMSNorm, and flow matching enhance the stability, flexibility, and scalability of Flag-DiT, enabling models of Lumina-T2X to scale up to 7 billion parameters and extend the context window to 128K tokens. This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model. Remarkably, Lumina-T2I, powered by a 5-billion-parameter Flag-DiT, requires only 35% of the training computational costs of a 600-million-parameter naive DiT. Our further comprehensive analysis underscores Lumina-T2X's preliminary capability in resolution extrapolation, high-resolution editing, generating consistent 3D views, and synthesizing videos with seamless transitions. We expect that the open-sourcing of Lumina-T2X will further foster creativity, transparency, and diversity in the generative AI community.
△ Less
Submitted 13 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series: A UK Biobank Study
Authors:
Shuhao Mei,
Yuxi Zhou,
Jiahao Xu,
Yuxuan Wan,
Shan Cao,
Qinghao Zhao,
Shijia Geng,
Junqing Xie,
Shenda Hong
Abstract:
Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. The existing methods can only detect patients who already have COPD based on obvious features shown in the spirogram (In this article, the spirogram specifically involves measuring Volume-Flow curve time series). Early prediction of COPD risk is vital for monitoring COPD disease p…
▽ More
Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. The existing methods can only detect patients who already have COPD based on obvious features shown in the spirogram (In this article, the spirogram specifically involves measuring Volume-Flow curve time series). Early prediction of COPD risk is vital for monitoring COPD disease progression, slowing it down, or even preventing its onset. However, these methods fail to early predict an individual's probability of COPD in the future based on subtle features in the spirogram. To address this gap, for the first time, we propose DeepSpiro, a method based on deep learning for early prediction of future COPD risk. DeepSpiro consists of four parts. First, we construct Volume-Flow curves guided by Time-Volume instability smoothing (SpiroSmoother) to enhance the stability of the original Volume-Flow curves precisely. Second, we extract critical features from the evolution of varied-length key patches (SpiroEncoder) to capture the key temporal evolution from original high-dimensional dynamic sequences to a unified low-dimensional temporal representation. Third, we explain the model based on temporal attention and heterogeneous feature fusion (SpiroExplainer), which integrates information from heterogeneous data such as spirogram and demographic information. Fourth, we predict the risk of COPD based on the evolution of key patch concavity (SpiroPredictor), enabling accurate prediction of the risk of disease in high-risk patients who are not yet diagnosed, for up to 1, 2, 3, 4, 5 years, and beyond. We conduct experiments on the UK Biobank dataset. Results show that DeepSpiro achieves an AUC value of 0.8328 in the task of detecting COPD. In early prediction tasks, high-risk and low-risk groups show significant differences in the future, with a p-value of <0.001.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Characterizing Kirkwood-Dirac positive states based on discrete Fourier transform
Authors:
Ying-Hui Yang,
Shuang Yao,
Shi-Jiao Geng,
Xiao-Li Wang,
Pei-Ying Chen
Abstract:
Kirkwood-Dirac (KD) distribution is helpful to describe nonclassical phenomena and quantum advantages, which have been linked with nonpositive entries of KD distribution. Suppose that $\mathcal{A}$ and $\mathcal{B}$ are the eigenprojectors of the two eigenbases of two observables and the discrete Fourier transform (DFT) matrix is the transition matrix between the two eigenbases. In a system with p…
▽ More
Kirkwood-Dirac (KD) distribution is helpful to describe nonclassical phenomena and quantum advantages, which have been linked with nonpositive entries of KD distribution. Suppose that $\mathcal{A}$ and $\mathcal{B}$ are the eigenprojectors of the two eigenbases of two observables and the discrete Fourier transform (DFT) matrix is the transition matrix between the two eigenbases. In a system with prime dimension, the set $\mathcal{E}_{KD+}$ of KD positive states based on the DFT matrix is convex combinations of $\mathcal{A}$ and $\mathcal{B}$. That is, $\mathcal{E}_{KD+}={\rm conv}(\mathcal{A}\cup\mathcal{B})$ [arXiv:2306.00086]. In this paper, we generalize the result. That is, in a $d$-dimensional system, $\mathcal{E}_{KD+}={\rm conv}(Ω)$ for $d=p^{2}$ and $d=pq$, where $p, q$ are prime and $Ω$ is the set of projectors of all the pure KD positive states.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
Authors:
Linyi Li,
Shijie Geng,
Zhenwen Li,
Yibo He,
Hao Yu,
Ziyue Hua,
Guanghan Ning,
Siwei Wang,
Tao Xie,
Hongxia Yang
Abstract:
Large Language Models for code (code LLMs) have witnessed tremendous progress in recent years. With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs with a particular focus on code generation tasks. However, they are insufficient to cover the full range of expected capabilities of code…
▽ More
Large Language Models for code (code LLMs) have witnessed tremendous progress in recent years. With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs with a particular focus on code generation tasks. However, they are insufficient to cover the full range of expected capabilities of code LLMs, which span beyond code generation to answering diverse coding-related questions. To fill this gap, we propose InfiBench, the first large-scale freeform question-answering (QA) benchmark for code to our knowledge, comprising 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages. InfiBench uses four types of model-free automatic metrics to evaluate response correctness where domain experts carefully concretize the criterion for each question. We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings. Our detailed analyses showcase potential directions for further advancement of code LLMs. InfiBench is fully open source and continuously expanding to foster more scientific and systematic practices for code LLM evaluation.
△ Less
Submitted 27 June, 2024; v1 submitted 10 March, 2024;
originally announced April 2024.
-
LHAASO-KM2A detector simulation using Geant4
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (254 additional authors not shown)
Abstract:
KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with…
▽ More
KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overffow. Some simpliffcations are used to signiffcantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
A Deep Learning Method for Beat-Level Risk Analysis and Interpretation of Atrial Fibrillation Patients during Sinus Rhythm
Authors:
Jun Lei,
Yuxi Zhou,
Xue Tian,
Qinghao Zhao,
Qi Zhang,
Shijia Geng,
Qingbo Wu,
Shenda Hong
Abstract:
Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues. Early detection of AF is crucial. Existing algorithms can only distinguish ``AF rhythm in AF patients'' from ``sinus rhythm in normal individuals'' . However, AF patients do not always exhibit AF rhythm, posing a challenge for diagnosis when the AF rhyt…
▽ More
Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues. Early detection of AF is crucial. Existing algorithms can only distinguish ``AF rhythm in AF patients'' from ``sinus rhythm in normal individuals'' . However, AF patients do not always exhibit AF rhythm, posing a challenge for diagnosis when the AF rhythm is absent. To address this, this paper proposes a novel artificial intelligence (AI) algorithm to distinguish ``sinus rhythm in AF patients'' and ``sinus rhythm in normal individuals'' in beat-level. We introduce beat-level risk interpreters, trend risk interpreters, addressing the interpretability issues of deep learning models and the difficulty in explaining AF risk trends. Additionally, the beat-level information fusion decision is presented to enhance model accuracy. The experimental results demonstrate that the average AUC for single beats used as testing data from CPSC 2021 dataset is 0.7314. By employing 150 beats for information fusion decision algorithm, the average AUC can reach 0.7591. Compared to previous segment-level algorithms, we utilized beats as input, reducing data dimensionality and making the model more lightweight, facilitating deployment on portable medical devices. Furthermore, we draw new and interesting findings through average beat analysis and subgroup analysis, considering varying risk levels.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Decoding Continuous Character-based Language from Non-invasive Brain Recordings
Authors:
Cenyuan Zhang,
Xiaoqing Zheng,
Ruicheng Yin,
Shujie Geng,
Jianhan Xu,
Xuan Gao,
Changze Lv,
Zixuan Ling,
Xuanjing Huang,
Miao Cao,
Jianfeng Feng
Abstract:
Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to…
▽ More
Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to decoding continuous language from single-trial non-invasive fMRI recordings, in which a three-dimensional convolutional network augmented with information bottleneck is developed to automatically identify responsive voxels to stimuli, and a character-based decoder is designed for the semantic reconstruction of continuous language characterized by inherent character structures. The resulting decoder can produce intelligible textual sequences that faithfully capture the meaning of perceived speech both within and across subjects, while existing decoders exhibit significantly inferior performance in cross-subject contexts. The ability to decode continuous language from single trials across subjects demonstrates the promising applications of non-invasive language brain-computer interfaces in both healthcare and neuroscience.
△ Less
Submitted 19 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at…
▽ More
We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is reversal to the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
△ Less
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Reinforcement Learning Paycheck Optimization for Multivariate Financial Goals
Authors:
Melda Alaluf,
Giulia Crippa,
Sinong Geng,
Zijian Jing,
Nikhil Krishnan,
Sanjeev Kulkarni,
Wyatt Navarro,
Ronnie Sircar,
Jonathan Tang
Abstract:
We study paycheck optimization, which examines how to allocate income in order to achieve several competing financial goals. For paycheck optimization, a quantitative methodology is missing, due to a lack of a suitable problem formulation. To deal with this issue, we formulate the problem as a utility maximization problem. The proposed formulation is able to (i) unify different financial goals; (i…
▽ More
We study paycheck optimization, which examines how to allocate income in order to achieve several competing financial goals. For paycheck optimization, a quantitative methodology is missing, due to a lack of a suitable problem formulation. To deal with this issue, we formulate the problem as a utility maximization problem. The proposed formulation is able to (i) unify different financial goals; (ii) incorporate user preferences regarding the goals; (iii) handle stochastic interest rates. The proposed formulation also facilitates an end-to-end reinforcement learning solution, which is implemented on a variety of problem settings.
△ Less
Submitted 9 March, 2024;
originally announced March 2024.
-
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Authors:
Dongyang Liu,
Renrui Zhang,
Longtian Qiu,
Siyuan Huang,
Weifeng Lin,
Shitian Zhao,
Shijie Geng,
Ziyi Lin,
Peng Jin,
Kaipeng Zhang,
Wenqi Shao,
Chao Xu,
Conghui He,
Junjun He,
Hao Shao,
Pan Lu,
Hongsheng Li,
Yu Qiao,
Peng Gao
Abstract:
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we…
▽ More
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between the multi-modal performance with the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory
△ Less
Submitted 26 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Authors:
Sheng Luo,
Wei Chen,
Wanxin Tian,
Rui Liu,
Luanxuan Hou,
Xiubao Zhang,
Haifeng Shen,
Ruiqi Wu,
Shuyi Geng,
Yi Zhou,
Ling Shao,
Yi Yang,
Bojun Gao,
Qun Li,
Guobin Wu
Abstract:
Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilitie…
▽ More
Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilities, multi-modal multi-task visual understanding foundation models (MM-VUFMs) effectively process and fuse data from diverse modalities and simultaneously handle various driving-related tasks with powerful adaptability, contributing to a more holistic understanding of the surrounding scene. In this survey, we present a systematic analysis of MM-VUFMs specifically designed for road scenes. Our objective is not only to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques, but also to highlight their advanced capabilities in diverse learning paradigms. These paradigms include open-world understanding, efficient transfer for road scenes, continual learning, interactive and generative capability. Moreover, we provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models. To facilitate researchers in staying abreast of the latest developments in MM-VUFMs for road scenes, we have established a continuously updated repository at https://github.com/rolsheng/MM-VUFM4DS
△ Less
Submitted 26 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
A Review of Deep Learning Methods for Photoplethysmography Data
Authors:
Guangkun Nie,
Jiabao Zhu,
Gongzheng Tang,
Deyun Zhang,
Shijia Geng,
Qinghao Zhao,
Shenda Hong
Abstract:
Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this rev…
▽ More
Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this review, we systematically reviewed papers that applied deep learning models to process PPG data between January 1st of 2017 and July 31st of 2023 from Google Scholar, PubMed and Dimensions. Each paper is analyzed from three key perspectives: tasks, models, and data. We finally extracted 193 papers where different deep learning frameworks were used to process PPG signals. Based on the tasks addressed in these papers, we categorized them into two major groups: medical-related, and non-medical-related. The medical-related tasks were further divided into seven subgroups, including blood pressure analysis, cardiovascular monitoring and diagnosis, sleep health, mental health, respiratory monitoring and analysis, blood glucose analysis, as well as others. The non-medical-related tasks were divided into four subgroups, which encompass signal processing, biometric identification, electrocardiogram reconstruction, and human activity recognition. In conclusion, significant progress has been made in the field of using deep learning methods to process PPG data recently. This allows for a more thorough exploration and utilization of the information contained in PPG signals. However, challenges remain, such as limited quantity and quality of publicly available databases, a lack of effective validation in real-world scenarios, and concerns about the interpretability, scalability, and complexity of deep learning models. Moreover, there are still emerging research areas that require further investigation.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access
Authors:
Saibo Geng,
Berkay Döner,
Chris Wendler,
Martin Josifoski,
Robert West
Abstract:
Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models that give users access to next-token distributions (usually via softmax logits), which poses a limitation with blackbox large language models (LLMs). This paper i…
▽ More
Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models that give users access to next-token distributions (usually via softmax logits), which poses a limitation with blackbox large language models (LLMs). This paper introduces sketch-guided constrained decoding (SGCD), a novel approach to constrained decoding for blackbox LLMs, which operates without access to the logits of the blackbox LLM. SGCD utilizes a locally hosted auxiliary model to refine the output of an unconstrained blackbox LLM, effectively treating this initial output as a "sketch" for further elaboration. This approach is complementary to traditional logit-based techniques and enables the application of constrained decoding in settings where full model transparency is unavailable. We demonstrate the efficacy of SGCD through experiments in closed information extraction and constituency parsing, showing how it enhances the utility and flexibility of blackbox LLMs for complex NLP tasks.
△ Less
Submitted 2 July, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
$τ$-tilting graphs and quotients
Authors:
Changjian Fu,
Shengfei Geng,
Pin Liu
Abstract:
We investigate $τ$-tilting graphs of algebras and their quotient algebras, and obtain a sufficient condition for the connectivity of $τ$-tilting graphs to be maintained in quotient algebras. It is worth pointing out that $g$-tame algebras satisfy this condition. As a consequence, we newly obtain a large class of algebras whose $τ$-tilting graphs are connected, including in particular the quotient…
▽ More
We investigate $τ$-tilting graphs of algebras and their quotient algebras, and obtain a sufficient condition for the connectivity of $τ$-tilting graphs to be maintained in quotient algebras. It is worth pointing out that $g$-tame algebras satisfy this condition. As a consequence, we newly obtain a large class of algebras whose $τ$-tilting graphs are connected, including in particular the quotient algebras of skew-gentle algebras and quasi-tilted algebras of tame type.
△ Less
Submitted 10 January, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Does or did the supernova remnant Cassiopeia A operate as a PeVatron?
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
For decades, supernova remnants (SNRs) have been considered the prime sources of Galactic Cosmic rays (CRs). But whether SNRs can accelerate CR protons to PeV energies and thus dominate CR flux up to the knee is currently under intensive theoretical and phenomenological debate. The direct test of the ability of SNRs to operate as CR PeVatrons can be provided by ultrahigh-energy (UHE;…
▽ More
For decades, supernova remnants (SNRs) have been considered the prime sources of Galactic Cosmic rays (CRs). But whether SNRs can accelerate CR protons to PeV energies and thus dominate CR flux up to the knee is currently under intensive theoretical and phenomenological debate. The direct test of the ability of SNRs to operate as CR PeVatrons can be provided by ultrahigh-energy (UHE; $E_γ\geq 100$~TeV) $γ$-rays. In this context, the historical SNR Cassiopeia A (Cas A) is considered one of the most promising target for UHE observations. This paper presents the observation of Cas A and its vicinity by the LHAASO KM2A detector. The exceptional sensitivity of LHAASO KM2A in the UHE band, combined with the young age of Cas A, enabled us to derive stringent model-independent limits on the energy budget of UHE protons and nuclei accelerated by Cas A at any epoch after the explosion. The results challenge the prevailing paradigm that Cas A-type SNRs are major suppliers of PeV CRs in the Milky Way.
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
Isospin-dependence of the charge-changing cross-section shaped by the charged-particle evaporation process
Authors:
J. W. Zhao,
B. -H. Sun,
I. Tanihata,
S. Terashima,
A. Prochazka,
J. Y. Xu,
L. H. Zhu,
J. Meng,
J. Su,
K. Y. Zhang,
L. S. Geng,
L. C. He,
C. Y. Liu,
G. S. Li,
C. G. Lu,
W. J. Lin,
W. P. Lin,
Z. Liu,
P. P Ren,
Z. Y. Sun,
F. Wang,
J. Wang,
M. Wang,
S. T. Wang,
X. L. Wei
, et al. (4 additional authors not shown)
Abstract:
We present the charge-changing cross sections (CCCS) of $^{11-15}$C, $^{13-17}$N, and $^{15,17-18}$O at around 300 MeV/nucleon on a carbon target, which extends to $p$-shell isotopes with $N < Z$ for the first time. The Glauber model, which considers only the proton distribution of projectile nuclei, underestimates the cross sections by more than 10\%. We show that this discrepancy can be resolved…
▽ More
We present the charge-changing cross sections (CCCS) of $^{11-15}$C, $^{13-17}$N, and $^{15,17-18}$O at around 300 MeV/nucleon on a carbon target, which extends to $p$-shell isotopes with $N < Z$ for the first time. The Glauber model, which considers only the proton distribution of projectile nuclei, underestimates the cross sections by more than 10\%. We show that this discrepancy can be resolved by considering the contribution from the charged-particle evaporation process (CPEP) following projectile neutron removal. Using nucleon densities from the deformed relativistic Hartree-Bogoliubov theory in continuum, we investigate the isospin-dependent CPEP contribution to the CCCS for a wide range of neutron-to-proton separation energy asymmetry. Our calculations, which include the CPEP contribution, agree well with existing systematic data and reveal an ``evaporation peak" at the isospin symmetric region where the neutron-to-proton separation energy is close to zero. These results suggest that analysis beyond the Glauber model is crucial for accurately determining nuclear charge radii from CCCSs.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
Very high energy gamma-ray emission beyond 10 TeV from GRB 221009A
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The highest energy gamma-rays from gamma-ray bursts (GRBs) have important implications for their radiation mechanism. Here we report for the first time the detection of gamma-rays up to 13 TeV from the brightest GRB 221009A by the Large High Altitude Air-shower Observatory (LHAASO). The LHAASO-KM2A detector registered more than 140 gamma-rays with energies above 3 TeV during 230$-$900s after the t…
▽ More
The highest energy gamma-rays from gamma-ray bursts (GRBs) have important implications for their radiation mechanism. Here we report for the first time the detection of gamma-rays up to 13 TeV from the brightest GRB 221009A by the Large High Altitude Air-shower Observatory (LHAASO). The LHAASO-KM2A detector registered more than 140 gamma-rays with energies above 3 TeV during 230$-$900s after the trigger. The intrinsic energy spectrum of gamma-rays can be described by a power-law after correcting for extragalactic background light (EBL) absorption. Such a hard spectrum challenges the synchrotron self-Compton (SSC) scenario of relativistic electrons for the afterglow emission above several TeV. Observations of gamma-rays up to 13 TeV from a source with a measured redshift of z=0.151 hints more transparency in intergalactic space than previously expected. Alternatively, one may invoke new physics such as Lorentz Invariance Violation (LIV) or an axion origin of very high energy (VHE) signals.
△ Less
Submitted 22 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
An Integer Clustering Approach for Modeling Large-Scale EV Fleets with Guaranteed Performance
Authors:
Sijia Geng,
Thomas Lee,
Dharik Mallapragada,
Audun Botterud
Abstract:
Large-scale integration of electric vehicles (EVs) leads to a tighter integration between transportation and electric energy systems. In this paper, we develop a novel integer-clustering approach to model a large number of EVs that manages vehicle charging and energy at the fleet level yet maintain individual trip dispatch. The model is then used to develop a spatially and temporally-resolved deci…
▽ More
Large-scale integration of electric vehicles (EVs) leads to a tighter integration between transportation and electric energy systems. In this paper, we develop a novel integer-clustering approach to model a large number of EVs that manages vehicle charging and energy at the fleet level yet maintain individual trip dispatch. The model is then used to develop a spatially and temporally-resolved decision-making tool for optimally planning and/or operating EV fleets and charging infrastructure. The tool comprises a two-stage framework where a tractable disaggregation step follows the integer-clustering problem to recover an individually feasible solution. Mathematical relationships between the integer clustering, disaggregation, and individual formulations are analyzed. We establish theoretical lower and upper bounds on the true individual formulation which underpins a guaranteed performance of the proposed method. The optimality accuracy and computational efficiency of the integer-clustering formulation are also numerically validated on a real-world case study of Boston's public transit network under extensive test instances. Substantial speedups with minimal loss in solution quality are demonstrated.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Approximating Voltage Stability Boundary Under High Variability of Renewables Using Differential Geometry
Authors:
Dan Wu,
Franz-Erich Wolter,
Sijia Geng
Abstract:
This paper proposes a novel method rooted in differential geometry to approximate the voltage stability boundary of power systems under high variability of renewable generation. We extract intrinsic geometric information of the power flow solution manifold at a given operating point. Specifically, coefficients of the Levi-Civita connection are constructed to approximate the geodesics of the manifo…
▽ More
This paper proposes a novel method rooted in differential geometry to approximate the voltage stability boundary of power systems under high variability of renewable generation. We extract intrinsic geometric information of the power flow solution manifold at a given operating point. Specifically, coefficients of the Levi-Civita connection are constructed to approximate the geodesics of the manifold starting at an operating point along any interested directions that represent possible fluctuations in generation and load. Then, based on the geodesic approximation, we further predict the voltage collapse point by solving a few univariate quadratic equations. Conventional methods mostly rely on either expensive numerical continuation at specified directions or numerical optimization. Instead, the proposed approach constructs the Christoffel symbols of the second kind from the Riemannian metric tensors to characterize the complete local geometry which is then extended to the proximity of the stability boundary with efficient computations. As a result, this approach is suitable to handle high-dimensional variability in operating points due to the large-scale integration of renewable resources. Using various case studies, we demonstrate the advantages of the proposed method and provide additional insights and discussions on voltage stability in renewable-rich power systems.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Authors:
Haonan Chang,
Kowndinya Boyalakuntla,
Shiyang Lu,
Siwei Cai,
Eric Jing,
Shreesh Keskar,
Shijie Geng,
Adeeb Abbas,
Lifeng Zhou,
Kostas Bekris,
Abdeslam Boularias
Abstract:
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as ``pick up a cup on a kitchen table" or ``navigate to a…
▽ More
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as ``pick up a cup on a kitchen table" or ``navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Exploiting Code Symmetries for Learning Program Semantics
Authors:
Kexin Pei,
Weichen Li,
Qirui Jin,
Shuyang Liu,
Scott Geng,
Lorenzo Cavallaro,
Junfeng Yang,
Suman Jana
Abstract:
This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. Our solution, SymC,…
▽ More
This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models without any pre-training. Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster.
△ Less
Submitted 6 June, 2024; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Flows: Building Blocks of Reasoning and Collaborating AI
Authors:
Martin Josifoski,
Lars Klein,
Maxime Peyrard,
Nicolas Baldwin,
Yifei Li,
Saibo Geng,
Julian Paul Schnitzler,
Yuxing Yao,
Jiheng Wei,
Debjit Paul,
Robert West
Abstract:
Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the…
▽ More
Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the conceptual framework Flows. Flows are self-contained building blocks of computation, with an isolated state, communicating through a standardized message-based interface. This modular design simplifies the process of creating Flows by allowing them to be recursively composed into arbitrarily nested interactions and is inherently concurrency-friendly. Crucially, any interaction can be implemented using this framework, including prior work on AI-AI and human-AI interactions, prompt engineering schemes, and tool augmentation. We demonstrate the potential of Flows on competitive coding, a challenging task on which even GPT-4 struggles. Our results suggest that structured reasoning and collaboration substantially improve generalization, with AI-only Flows adding +21 and human-AI Flows adding +54 absolute points in terms of solve rate. To support rapid and rigorous research, we introduce the aiFlows library embodying Flows. The aiFlows library is available at https://github.com/epfl-dlab/aiflows. Data and Flows for reproducing our experiments are available at https://github.com/epfl-dlab/cc_flows.
△ Less
Submitted 7 February, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Robo-centric ESDF: A Fast and Accurate Whole-body Collision Evaluation Tool for Any-shape Robotic Planning
Authors:
Shuang Geng,
Qianhao Wang,
Lei Xie,
Chao Xu,
Yanjun Cao,
Fei Gao
Abstract:
For letting mobile robots travel flexibly through complicated environments, increasing attention has been paid to the whole-body collision evaluation. Most existing works either opt for the conservative corridor-based methods that impose strict requirements on the corridor generation, or ESDF-based methods that suffer from high computational overhead. It is still a great challenge to achieve fast…
▽ More
For letting mobile robots travel flexibly through complicated environments, increasing attention has been paid to the whole-body collision evaluation. Most existing works either opt for the conservative corridor-based methods that impose strict requirements on the corridor generation, or ESDF-based methods that suffer from high computational overhead. It is still a great challenge to achieve fast and accurate whole-body collision evaluation. In this paper, we propose a Robo-centric ESDF (RC-ESDF) that is pre-built in the robot body frame and is capable of seamlessly applied to any-shape mobile robots, even for those with non-convex shapes. RC-ESDF enjoys lazy collision evaluation, which retains only the minimum information sufficient for whole-body safety constraint and significantly speeds up trajectory optimization. Based on the analytical gradients provided by RC-ESDF, we optimize the position and rotation of robot jointly, with whole-body safety, smoothness, and dynamical feasibility taken into account. Extensive simulation and real-world experiments verified the reliability and generalizability of our method.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
Self-Enhancement Improves Text-Image Retrieval in Foundation Visual-Language Models
Authors:
Yuguang Yang,
Yiming Wang,
Shupeng Geng,
Runqi Wang,
Yimi Wang,
Sheng Wu,
Baochang Zhang
Abstract:
The emergence of cross-modal foundation models has introduced numerous approaches grounded in text-image retrieval. However, on some domain-specific retrieval tasks, these models fail to focus on the key attributes required. To address this issue, we propose a self-enhancement framework, A^{3}R, based on the CLIP-ViT/G-14, one of the largest cross-modal models. First, we perform an Attribute Augme…
▽ More
The emergence of cross-modal foundation models has introduced numerous approaches grounded in text-image retrieval. However, on some domain-specific retrieval tasks, these models fail to focus on the key attributes required. To address this issue, we propose a self-enhancement framework, A^{3}R, based on the CLIP-ViT/G-14, one of the largest cross-modal models. First, we perform an Attribute Augmentation strategy to enrich the textual description for fine-grained representation before model learning. Then, we propose an Adaption Re-ranking method to unify the representation space of textual query and candidate images and re-rank candidate images relying on the adapted query after model learning. The proposed framework is validated to achieve a salient improvement over the baseline and other teams' solutions in the cross-modal image retrieval track of the 1st foundation model challenge without introducing any additional samples. The code is available at \url{https://github.com/CapricornGuang/A3R}.
△ Less
Submitted 11 June, 2023;
originally announced June 2023.
-
Predicting Consultation Success in Online Health Platforms Using Dynamic Knowledge Networks and Multimodal Data Fusion
Authors:
Shuang Geng,
Wenli Zhang,
Jiaheng Xie,
Gemin Liang,
Ben Niu,
Sudha Ram
Abstract:
Online healthcare consultation in virtual health is an emerging industry marked by innovation and fierce competition. Accurate and timely prediction of healthcare consultation success can proactively help online platforms address patient concerns and improve retention rates. However, predicting online consultation success is challenging due to the partial role of virtual consultations in patients'…
▽ More
Online healthcare consultation in virtual health is an emerging industry marked by innovation and fierce competition. Accurate and timely prediction of healthcare consultation success can proactively help online platforms address patient concerns and improve retention rates. However, predicting online consultation success is challenging due to the partial role of virtual consultations in patients' overall healthcare journey and the disconnect between online and in-person healthcare IT systems. Patient data in online consultations is often sparse and incomplete, presenting significant technical challenges and a research gap. To address these issues, we propose the Dynamic Knowledge Network and Multimodal Data Fusion (DyKoNeM) framework, which enhances the predictive power of online healthcare consultations. Our work has important implications for new business models where specific and detailed online communication processes are stored in the IT database, and at the same time, latent information with predictive power is embedded in the network formed by stakeholders' digital traces. It can be extended to diverse industries and domains, where the virtual or hybrid model (e.g., integration of online and offline services) is emerging as a prevailing trend.
△ Less
Submitted 14 June, 2024; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Improving Offline RL by Blending Heuristics
Authors:
Sinong Geng,
Aldo Pacchiano,
Andrey Kolobov,
Ching-An Cheng
Abstract:
We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping. HUBL modifies the Bellman operators used in these algorithms, partially replacing the bootstrapped values with heuristic ones that are estimated with Monte-Carlo returns. For trajectories with higher returns, HUBL relies more on the heuristic value…
▽ More
We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping. HUBL modifies the Bellman operators used in these algorithms, partially replacing the bootstrapped values with heuristic ones that are estimated with Monte-Carlo returns. For trajectories with higher returns, HUBL relies more on the heuristic values and less on bootstrapping; otherwise, it leans more heavily on bootstrapping. HUBL is very easy to combine with many existing offline RL implementations by relabeling the offline datasets with adjusted rewards and discount factors. We derive a theory that explains HUBL's effect on offline RL as reducing offline RL's complexity and thus increasing its finite-sample performance. Furthermore, we empirically demonstrate that HUBL consistently improves the policy quality of four state-of-the-art bootstrapping-based offline RL algorithms (ATAC, CQL, TD3+BC, and IQL), by 9% on average over 27 datasets of the D4RL and Meta-World benchmarks.
△ Less
Submitted 15 March, 2024; v1 submitted 31 May, 2023;
originally announced June 2023.
-
The First LHAASO Catalog of Gamma-Ray Sources
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
We present the first catalog of very-high energy and ultra-high energy gamma-ray sources detected by the Large High Altitude Air Shower Observatory (LHAASO). The catalog was compiled using 508 days of data collected by the Water Cherenkov Detector Array (WCDA) from March 2021 to September 2022 and 933 days of data recorded by the Kilometer Squared Array (KM2A) from January 2020 to September 2022.…
▽ More
We present the first catalog of very-high energy and ultra-high energy gamma-ray sources detected by the Large High Altitude Air Shower Observatory (LHAASO). The catalog was compiled using 508 days of data collected by the Water Cherenkov Detector Array (WCDA) from March 2021 to September 2022 and 933 days of data recorded by the Kilometer Squared Array (KM2A) from January 2020 to September 2022. This catalog represents the main result from the most sensitive large coverage gamma-ray survey of the sky above 1 TeV, covering declination from $-$20$^{\circ}$ to 80$^{\circ}$. In total, the catalog contains 90 sources with an extended size smaller than $2^\circ$ and a significance of detection at $> 5σ$. Based on our source association criteria, 32 new TeV sources are proposed in this study. Among the 90 sources, 43 sources are detected with ultra-high energy ($E > 100$ TeV) emission at $> 4σ$ significance level. We provide the position, extension, and spectral characteristics of all the sources in this catalog.
△ Less
Submitted 27 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
VIP5: Towards Multimodal Foundation Models for Recommendation
Authors:
Shijie Geng,
Juntao Tan,
Shuchang Liu,
Zuohui Fu,
Yongfeng Zhang
Abstract:
Computer Vision (CV), Natural Language Processing (NLP), and Recommender Systems (RecSys) are three prominent AI applications that have traditionally developed independently, resulting in disparate modeling and engineering methodologies. This has impeded the ability for these fields to directly benefit from each other's advancements. With the recent development of foundation models, large language…
▽ More
Computer Vision (CV), Natural Language Processing (NLP), and Recommender Systems (RecSys) are three prominent AI applications that have traditionally developed independently, resulting in disparate modeling and engineering methodologies. This has impeded the ability for these fields to directly benefit from each other's advancements. With the recent development of foundation models, large language models have emerged as a potential general-purpose interface for unifying different modalities and problem formulations. In light of this, we propose the development of a multimodal foundation model (MFM) considering visual, textual, and personalization modalities under the P5 recommendation paradigm, thus named VIP5 (Visual P5), to unify various modalities and recommendation tasks. This will enable the processing of multiple modalities in a shared architecture for improved recommendations. To achieve this, we introduce multimodal personalized prompts to accommodate multiple modalities under a shared format. Additionally, we propose a parameter-efficient training method for foundation models, which involves freezing the P5 backbone and fine-tuning lightweight adapters, resulting in improved recommendation performance and increased efficiency in terms of training time and memory usage. Code and data of VIP5 are available at https://github.com/jeykigung/VIP5.
△ Less
Submitted 14 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning
Authors:
Saibo Geng,
Martin Josifoski,
Maxime Peyrard,
Robert West
Abstract:
Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, lim…
▽ More
Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, limited to specific tasks, such as parsing or code generation. In this work, we demonstrate that formal grammars can describe the output space for a much wider range of tasks and argue that GCD can serve as a unified framework for structured NLP tasks in general. For increased flexibility, we introduce input-dependent grammars, which allow the grammar to depend on the input and thus enable the generation of different output structures for different inputs. We then empirically demonstrate the power and flexibility of GCD-enhanced LMs on (1) information extraction, (2) entity disambiguation, and (3) constituency parsing. Our results indicate that grammar-constrained LMs substantially outperform unconstrained LMs or even beat task-specific finetuned models. Grammar constraints thus hold great promise for harnessing off-the-shelf LMs for a wide range of structured NLP tasks, especially where training data is scarce or finetuning is expensive. Code and data: https://github.com/epfl-dlab/GCD.
△ Less
Submitted 18 January, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Measurement of ultra-high-energy diffuse gamma-ray emission of the Galactic plane from 10 TeV to 1 PeV with LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The diffuse Galactic $γ$-ray emission, mainly produced via interactions between cosmic rays and the interstellar medium and/or radiation field, is a very important probe of the distribution, propagation, and interaction of cosmic rays in the Milky Way. In this work we report the measurements of diffuse $γ$-rays from the Galactic plane between 10 TeV and 1 PeV energies, with the square kilometer ar…
▽ More
The diffuse Galactic $γ$-ray emission, mainly produced via interactions between cosmic rays and the interstellar medium and/or radiation field, is a very important probe of the distribution, propagation, and interaction of cosmic rays in the Milky Way. In this work we report the measurements of diffuse $γ$-rays from the Galactic plane between 10 TeV and 1 PeV energies, with the square kilometer array of the Large High Altitude Air Shower Observatory (LHAASO). Diffuse emissions from the inner ($15^{\circ}<l<125^{\circ}$, $|b|<5^{\circ}$) and outer ($125^{\circ}<l<235^{\circ}$, $|b|<5^{\circ}$) Galactic plane are detected with $29.1σ$ and $12.7σ$ significance, respectively. The outer Galactic plane diffuse emission is detected for the first time in the very- to ultra-high-energy domain ($E>10$~TeV). The energy spectrum in the inner Galaxy regions can be described by a power-law function with an index of $-2.99\pm0.04$, which is different from the curved spectrum as expected from hadronic interactions between locally measured cosmic rays and the line-of-sight integrated gas content. Furthermore, the measured flux is higher by a factor of $\sim3$ than the prediction. A similar spectrum with an index of $-2.99\pm0.07$ is found in the outer Galaxy region, and the absolute flux for $10\lesssim E\lesssim60$ TeV is again higher than the prediction for hadronic cosmic ray interactions. The latitude distributions of the diffuse emission are consistent with the gas distribution, while the longitude distributions show clear deviation from the gas distribution. The LHAASO measurements imply that either additional emission sources exist or cosmic ray intensities have spatial variations.
△ Less
Submitted 19 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Authors:
Peng Gao,
Jiaming Han,
Renrui Zhang,
Ziyi Lin,
Shijie Geng,
Aojun Zhou,
Wei Zhang,
Pan Lu,
Conghui He,
Xiangyu Yue,
Hongsheng Li,
Yu Qiao
Abstract:
How to efficiently transform large language models (LLMs) into instruction followers is recently a popular research direction, while training LLM for multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter demonstrates the potential to handle visual inputs with LLMs, it still cannot generalize well to open-ended visual instructions and lags behind GPT-4. In this paper, we pr…
▽ More
How to efficiently transform large language models (LLMs) into instruction followers is recently a popular research direction, while training LLM for multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter demonstrates the potential to handle visual inputs with LLMs, it still cannot generalize well to open-ended visual instructions and lags behind GPT-4. In this paper, we present LLaMA-Adapter V2, a parameter-efficient visual instruction model. Specifically, we first augment LLaMA-Adapter by unlocking more learnable parameters (e.g., norm, bias and scale), which distribute the instruction-following ability across the entire LLaMA model besides adapters. Secondly, we propose an early fusion strategy to feed visual tokens only into the early LLM layers, contributing to better visual knowledge incorporation. Thirdly, a joint training paradigm of image-text pairs and instruction-following data is introduced by optimizing disjoint groups of learnable parameters. This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset. During inference, we incorporate additional expert models (e.g. captioning/OCR systems) into LLaMA-Adapter to further enhance its image understanding capability without incurring training costs. Compared to the original LLaMA-Adapter, our LLaMA-Adapter V2 can perform open-ended multi-modal instructions by merely introducing 14M parameters over LLaMA. The newly designed framework also exhibits stronger language-only instruction-following capabilities and even excels in chat interactions. Our code and models are available at https://github.com/ZrrSkywalker/LLaMA-Adapter.
△ Less
Submitted 28 April, 2023;
originally announced April 2023.
-
A Data-Driven State Aggregation Approach for Dynamic Discrete Choice Models
Authors:
Sinong Geng,
Houssam Nassif,
Carlos A. Manzanares
Abstract:
We study dynamic discrete choice models, where a commonly studied problem involves estimating parameters of agent reward functions (also known as "structural" parameters), using agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven met…
▽ More
We study dynamic discrete choice models, where a commonly studied problem involves estimating parameters of agent reward functions (also known as "structural" parameters), using agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We use these estimated Q-functions, along with a clustering algorithm, to select a subset of states that are the most pivotal for driving changes in Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation using a commonly used nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off of computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
△ Less
Submitted 31 May, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Characterizing Kirkwood-Dirac nonclassicality and uncertainty diagram based on discrete Fourier transform
Authors:
Ying-Hui Yang,
Bing-Bing Zhang,
Xiao-Li Wang,
Shi-Jiao Geng,
Pei-Ying Chen
Abstract:
In this paper, we investigate the Kirkwood-Dirac nonclassicality and uncertainty diagram based on discrete Fourier transform (DFT) in a $d$ dimensional system. The uncertainty diagram of complete incompatibility bases $\mathcal {A},\mathcal {B}$ are characterized by De Bièvre [arXiv: 2207.07451]. We show that for the uncertainty diagram of the DFT matrix which is a transition matrix from basis…
▽ More
In this paper, we investigate the Kirkwood-Dirac nonclassicality and uncertainty diagram based on discrete Fourier transform (DFT) in a $d$ dimensional system. The uncertainty diagram of complete incompatibility bases $\mathcal {A},\mathcal {B}$ are characterized by De Bièvre [arXiv: 2207.07451]. We show that for the uncertainty diagram of the DFT matrix which is a transition matrix from basis $\mathcal {A}$ to basis $\mathcal {B}$, there is no ``hole" in the region of the $(n_{\mathcal {A}}, n_{\mathcal {B}})$-plane above and on the line $n_{\mathcal {A}}+n_{\mathcal {B}}\geq d+1$, whether the bases $\mathcal {A},\mathcal {B}$ are not complete incompatible bases or not. Then we present that the KD nonclassicality of a state based on the DFT matrix can be completely characterized by using the support uncertainty relation $n_{\mathcal {A}}(ψ)n_{\mathcal {B}}(ψ)\geq d$, where $n_{\mathcal {A}}(ψ)$ and $n_{\mathcal {B}}(ψ)$ count the number of nonvanishing coefficients in the basis $\mathcal {A}$ and $\mathcal {B}$ representations, respectively. That is, a state $|ψ\rangle$ is KD nonclassical if and only if $n_{\mathcal {A}}(ψ)n_{\mathcal {B}}(ψ)> d$, whenever $d$ is prime or not. That gives a positive answer to the conjecture in [Phys. Rev. Lett. \textbf{127}, 190404 (2021)].
△ Less
Submitted 30 March, 2023;
originally announced March 2023.
-
STCF Conceptual Design Report: Volume 1 -- Physics & Detector
Authors:
M. Achasov,
X. C. Ai,
R. Aliberti,
L. P. An,
Q. An,
X. Z. Bai,
Y. Bai,
O. Bakina,
A. Barnyakov,
V. Blinov,
V. Bobrovnikov,
D. Bodrov,
A. Bogomyagkov,
A. Bondar,
I. Boyko,
Z. H. Bu,
F. M. Cai,
H. Cai,
J. J. Cao,
Q. H. Cao,
Z. Cao,
Q. Chang,
K. T. Chao,
D. Y. Chen,
H. Chen
, et al. (413 additional authors not shown)
Abstract:
The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII,…
▽ More
The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R\&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R\&D and physics case studies.
△ Less
Submitted 5 October, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Authors:
Yuxiao Chen,
Jianbo Yuan,
Yu Tian,
Shijie Geng,
Xinyu Li,
Ding Zhou,
Dimitris N. Metaxas,
Hongxia Yang
Abstract:
Contrastive learning-based vision-language pre-training approaches, such as CLIP, have demonstrated great success in many vision-language tasks. These methods achieve cross-modal alignment by encoding a matched image-text pair with similar feature embeddings, which are generated by aggregating information from visual patches and language tokens. However, direct aligning cross-modal information usi…
▽ More
Contrastive learning-based vision-language pre-training approaches, such as CLIP, have demonstrated great success in many vision-language tasks. These methods achieve cross-modal alignment by encoding a matched image-text pair with similar feature embeddings, which are generated by aggregating information from visual patches and language tokens. However, direct aligning cross-modal information using such representations is challenging, as visual patches and text tokens differ in semantic levels and granularities. To alleviate this issue, we propose a Finite Discrete Tokens (FDT) based multimodal representation. FDT is a set of learnable tokens representing certain visual-semantic concepts. Both images and texts are embedded using shared FDT by first grounding multimodal inputs to FDT space and then aggregating the activated FDT representations. The matched visual and semantic concepts are enforced to be represented by the same set of discrete tokens by a sparse activation constraint. As a result, the granularity gap between the two modalities is reduced. Through both quantitative and qualitative analyses, we demonstrate that using FDT representations in CLIP-style models improves cross-modal alignment and performance in visual recognition and vision-language downstream tasks. Furthermore, we show that our method can learn more comprehensive representations, and the learned FDT capture meaningful cross-modal correspondence, ranging from objects to actions and attributes.
△ Less
Submitted 26 March, 2023;
originally announced March 2023.
-
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Authors:
Shijie Geng,
Jianbo Yuan,
Yu Tian,
Yuxiao Chen,
Yongfeng Zhang
Abstract:
The success of large-scale contrastive vision-language pretraining (CLIP) has benefited both visual recognition and multimodal content understanding. The concise design brings CLIP the advantage in inference efficiency against other vision-language models with heavier cross-attention fusion layers, making it a popular choice for a wide spectrum of downstream tasks. However, CLIP does not explicitl…
▽ More
The success of large-scale contrastive vision-language pretraining (CLIP) has benefited both visual recognition and multimodal content understanding. The concise design brings CLIP the advantage in inference efficiency against other vision-language models with heavier cross-attention fusion layers, making it a popular choice for a wide spectrum of downstream tasks. However, CLIP does not explicitly capture the hierarchical nature of high-level and fine-grained semantics conveyed in images and texts, which is arguably critical to vision-language understanding and reasoning. To this end, we equip both the visual and language branches in CLIP with hierarchy-aware attentions, namely Hierarchy-aware CLIP (HiCLIP), to progressively discover semantic hierarchies layer-by-layer from both images and texts in an unsupervised manner. As a result, such hierarchical aggregation significantly improves the cross-modal alignment. To demonstrate the advantages of HiCLIP, we conduct qualitative analysis on its unsupervised hierarchy induction during inference, as well as extensive quantitative experiments on both visual recognition and vision-language downstream tasks.
△ Less
Submitted 6 March, 2023;
originally announced March 2023.
-
Artificial Intelligence System for Detection and Screening of Cardiac Abnormalities using Electrocardiogram Images
Authors:
Deyun Zhang,
Shijia Geng,
Yang Zhou,
Weilun Xu,
Guodong Wei,
Kai Wang,
Jie Yu,
Qiang Zhu,
Yongkui Li,
Yonghong Zhao,
Xingyue Chen,
Rui Zhang,
Zhaoji Fu,
Rongbo Zhou,
Yanqi E,
Sumei Fan,
Qinghao Zhao,
Chuandong Cheng,
Nan Peng,
Liang Zhang,
Linlin Zheng,
Jianjun Chu,
Hongbin Xu,
Chen Tan,
Jian Liu
, et al. (6 additional authors not shown)
Abstract:
The artificial intelligence (AI) system has achieved expert-level performance in electrocardiogram (ECG) signal analysis. However, in underdeveloped countries or regions where the healthcare information system is imperfect, only paper ECGs can be provided. Analysis of real-world ECG images (photos or scans of paper ECGs) remains challenging due to complex environments or interference. In this stud…
▽ More
The artificial intelligence (AI) system has achieved expert-level performance in electrocardiogram (ECG) signal analysis. However, in underdeveloped countries or regions where the healthcare information system is imperfect, only paper ECGs can be provided. Analysis of real-world ECG images (photos or scans of paper ECGs) remains challenging due to complex environments or interference. In this study, we present an AI system developed to detect and screen cardiac abnormalities (CAs) from real-world ECG images. The system was evaluated on a large dataset of 52,357 patients from multiple regions and populations across the world. On the detection task, the AI system obtained area under the receiver operating curve (AUC) of 0.996 (hold-out test), 0.994 (external test 1), 0.984 (external test 2), and 0.979 (external test 3), respectively. Meanwhile, the detection results of AI system showed a strong correlation with the diagnosis of cardiologists (cardiologist 1 (R=0.794, p<1e-3), cardiologist 2 (R=0.812, p<1e-3)). On the screening task, the AI system achieved AUCs of 0.894 (hold-out test) and 0.850 (external test). The screening performance of the AI system was better than that of the cardiologists (AI system (0.846) vs. cardiologist 1 (0.520) vs. cardiologist 2 (0.480)). Our study demonstrates the feasibility of an accurate, objective, easy-to-use, fast, and low-cost AI system for CA detection and screening. The system has the potential to be used by healthcare professionals, caregivers, and general users to assess CAs based on real-world ECG images.
△ Less
Submitted 10 February, 2023;
originally announced February 2023.
-
Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction
Authors:
Haonan Chang,
Dhruv Metha Ramesh,
Shijie Geng,
Yuqiu Gan,
Abdeslam Boularias
Abstract:
We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework. The proposed system solves a new optimization problem incorporating optical-flow-based 2D constraints to deal with fast motion and a novel semantic-aware deformation graph (SAD-graph) f…
▽ More
We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework. The proposed system solves a new optimization problem incorporating optical-flow-based 2D constraints to deal with fast motion and a novel semantic-aware deformation graph (SAD-graph) for handling topology change. We test the proposed system under various challenging scenes and demonstrate that it significantly outperforms existing state-of-the-art methods.
△ Less
Submitted 30 January, 2023;
originally announced January 2023.
-
Affective Faces for Goal-Driven Dyadic Communication
Authors:
Scott Geng,
Revant Teotia,
Purva Tendulkar,
Sachit Menon,
Carl Vondrick
Abstract:
We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation. Given the input speech of a speaker, our approach retrieves a video of a listener, who has facial expressions that would be socially appropriate given the context. Our approach further allows the listener to be conditioned on their own goals, personalities, or backgro…
▽ More
We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation. Given the input speech of a speaker, our approach retrieves a video of a listener, who has facial expressions that would be socially appropriate given the context. Our approach further allows the listener to be conditioned on their own goals, personalities, or backgrounds. Our approach models conversations through a composition of large language models and vision-language models, creating internal representations that are interpretable and controllable. To study multimodal communication, we propose a new video dataset of unscripted conversations covering diverse topics and demographics. Experiments and visualizations show our approach is able to output listeners that are significantly more socially appropriate than baselines. However, many challenges remain, and we release our dataset publicly to spur further progress. See our website for video results, data, and code: https://realtalk.cs.columbia.edu.
△ Less
Submitted 26 January, 2023;
originally announced January 2023.
-
Repercussion of the $a_0(1710)$ [$a_0(1817)$] resonance and future developments
Authors:
E. Oset,
L. R. Dai,
L. S. Geng
Abstract:
In this paper, we discuss the significance and prospect for the newly discovered a0(1710)[a0(1817)] resonance state at BESIII experiment, in which they reported the observation of a scalar meson of spin-parity $J^P=0^+$ with isospin $I=1$, branded as $a_0(1817)$. This state may be the same particle as the $a_0(1710)$ observed by the BaBar experiment earlier. As early as 2008, we found that f0(1710…
▽ More
In this paper, we discuss the significance and prospect for the newly discovered a0(1710)[a0(1817)] resonance state at BESIII experiment, in which they reported the observation of a scalar meson of spin-parity $J^P=0^+$ with isospin $I=1$, branded as $a_0(1817)$. This state may be the same particle as the $a_0(1710)$ observed by the BaBar experiment earlier. As early as 2008, we found that f0(1710) can be regarded as a $K^* \bar{K}^*$ molecular state based on the chiral unitary theory, and there is a partner state $a_0(1710)$ with an isospin $I=1$. Our theoretical prediction based on this picture is in good agreement with the latest BESIII data, which further supports the molecular state picture of $a_0(1710)[a_0(1817)]$. If it is indeed the isospin partner state of $f_0(1710)$, this would rule out $f_0(1710)$ as a glueball candidate. This paper briefly reviews the relevant theoretical studies and suggests new experiments to further examine the nature of $a_0(1710)[a_0(1817)]$.
△ Less
Submitted 25 January, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Intersection vectors over tilings with applications to gentle algebras and cluster algebras
Authors:
Changjian Fu,
Shengfei Geng
Abstract:
It is proved that a multiset of permissible arcs over a tiling is uniquely determined by its intersection vector under a mild condition. This generalizes a classical result over marked surfaces with triangulations. We apply this result to study $τ$-tilting theory of gentle algebras and denominator conjecture in cluster algebras. In the case of gentle algebras, it is proved that different $τ$-rigid…
▽ More
It is proved that a multiset of permissible arcs over a tiling is uniquely determined by its intersection vector under a mild condition. This generalizes a classical result over marked surfaces with triangulations. We apply this result to study $τ$-tilting theory of gentle algebras and denominator conjecture in cluster algebras. In the case of gentle algebras, it is proved that different $τ$-rigid $A$-modules over a gentle algebra $A$ have different dimension vectors if and only if $A$ has no even oriented cycle with full relations. For cluster algebras, the denominator conjecture has been established for cluster algebras of type $\mathbb{A}\mathbb{B}\mathbb{C}$.
△ Less
Submitted 4 October, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
Authors:
Chengzhi Mao,
Scott Geng,
Junfeng Yang,
Xin Wang,
Carl Vondrick
Abstract:
Pretrained large-scale vision-language models like CLIP have exhibited strong generalization over unseen tasks. Yet imperceptible adversarial perturbations can significantly reduce CLIP's performance on new tasks. In this work, we identify and explore the problem of \emph{adapting large-scale models for zero-shot adversarial robustness}. We first identify two key factors during model adaption -- t…
▽ More
Pretrained large-scale vision-language models like CLIP have exhibited strong generalization over unseen tasks. Yet imperceptible adversarial perturbations can significantly reduce CLIP's performance on new tasks. In this work, we identify and explore the problem of \emph{adapting large-scale models for zero-shot adversarial robustness}. We first identify two key factors during model adaption -- training losses and adaptation methods -- that affect the model's zero-shot adversarial robustness. We then propose a text-guided contrastive adversarial training loss, which aligns the text embeddings and the adversarial visual features with contrastive learning on a small set of training data. We apply this training loss to two adaption methods, model finetuning and visual prompt tuning. We find that visual prompt tuning is more effective in the absence of texts, while finetuning wins in the existence of text guidance. Overall, our approach significantly improves the zero-shot adversarial robustness over CLIP, seeing an average improvement of over 31 points over ImageNet and 15 zero-shot datasets. We hope this work can shed light on understanding the zero-shot adversarial robustness of large-scale models.
△ Less
Submitted 21 April, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
NeuDep: Neural Binary Memory Dependence Analysis
Authors:
Kexin Pei,
Dongdong She,
Michael Wang,
Scott Geng,
Zhou Xuan,
Yaniv David,
Junfeng Yang,
Suman Jana,
Baishakhi Ray
Abstract:
Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging as statically computing precise alias information is undecidable in theory. The problem aggravates at the binary level due to the presence of compiler optimizations and the absence of symbols and types. Existing approaches either produce significant spurious depend…
▽ More
Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging as statically computing precise alias information is undecidable in theory. The problem aggravates at the binary level due to the presence of compiler optimizations and the absence of symbols and types. Existing approaches either produce significant spurious dependencies due to conservative analysis or scale poorly to complex binaries.
We present a new machine-learning-based approach to predict memory dependencies by exploiting the model's learned knowledge about how binary programs execute. Our approach features (i) a self-supervised procedure that pretrains a neural net to reason over binary code and its dynamic value flows through memory addresses, followed by (ii) supervised finetuning to infer the memory dependencies statically. To facilitate efficient learning, we develop dedicated neural architectures to encode the heterogeneous inputs (i.e., code, data values, and memory addresses from traces) with specific modules and fuse them with a composition learning strategy.
We implement our approach in NeuDep and evaluate it on 41 popular software projects compiled by 2 compilers, 4 optimizations, and 4 obfuscation passes. We demonstrate that NeuDep is more precise (1.5x) and faster (3.5x) than the current state-of-the-art. Extensive probing studies on security-critical reverse engineering tasks suggest that NeuDep understands memory access patterns, learns function signatures, and is able to match indirect calls. All these tasks either assist or benefit from inferring memory dependencies. Notably, NeuDep also outperforms the current state-of-the-art on these tasks.
△ Less
Submitted 4 October, 2022;
originally announced October 2022.
-
A diffuse-interface lattice Boltzmann method for the dendritic growth with thermosolutal convection
Authors:
Chengjie Zhan,
Zhenhua Chai,
Baochang Shi,
Ping Jiang,
Shaoning Geng,
Dongke Sun
Abstract:
In this work, we proposed a diffuse interface model for the dendritic growth with thermosolutal convection. In this model, the sharp boundary between the fluid and solid dendrite is replaced by a thin but nonzero thickness diffuse interface, which is described by the order parameter governed by the phase-field equation for the dendritic growth. The governing equations for solute and heat transfer…
▽ More
In this work, we proposed a diffuse interface model for the dendritic growth with thermosolutal convection. In this model, the sharp boundary between the fluid and solid dendrite is replaced by a thin but nonzero thickness diffuse interface, which is described by the order parameter governed by the phase-field equation for the dendritic growth. The governing equations for solute and heat transfer are modified such that the previous special treatments for source term can be avoided. To solve the model for the dendritic growth with thermosolutal convection, we also developed a diffuse-interface multi-relaxation-time lattice Boltzmann (LB) method. In this method, the order parameter in the phase-field equation is combined into the force caused by the fluid-solid interaction, and the treatment on the complex fluid-solid interface can be avoided. In addition, four LB models are developed for the phase-field equation, concentration equation, temperature equation and the Navier-Stokes equations in a unified framework. Finally, to test the present diffuse-interface LB method, we performed some simulations of the dendritic growth, and found that the numerical results are in good agreements with some previous works.
△ Less
Submitted 3 September, 2022;
originally announced September 2022.
-
Frozen CLIP Models are Efficient Video Learners
Authors:
Ziyi Lin,
Shijie Geng,
Renrui Zhang,
Peng Gao,
Gerard de Melo,
Xiaogang Wang,
Jifeng Dai,
Yu Qiao,
Hongsheng Li
Abstract:
Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos. This enables the video network to benefit from the pretrained image model. However, this requires substantial computation and memory resources for finetuning on videos and the alternative…
▽ More
Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos. This enables the video network to benefit from the pretrained image model. However, this requires substantial computation and memory resources for finetuning on videos and the alternative of directly using pretrained image features without finetuning the image backbone leads to subpar results. Fortunately, recent advances in Contrastive Vision-Language Pre-training (CLIP) pave the way for a new route for visual recognition tasks. Pretrained on large open-vocabulary image-text pair data, these models learn powerful visual representations with rich semantics. In this paper, we present Efficient Video Learning (EVL) -- an efficient framework for directly training high-quality video recognition models with frozen CLIP features. Specifically, we employ a lightweight Transformer decoder and learn a query token to dynamically collect frame-level spatial features from the CLIP image encoder. Furthermore, we adopt a local temporal module in each decoder layer to discover temporal clues from adjacent frames and their attention maps. We show that despite being efficient to train with a frozen backbone, our models learn high quality video representations on a variety of video recognition datasets. Code is available at https://github.com/OpenGVLab/efficient-video-recognition.
△ Less
Submitted 6 August, 2022;
originally announced August 2022.