-
BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection
Authors:
Wenjie Wang,
Yehao Lu,
Guangcong Zheng,
Shuigen Zhan,
Xiaoqing Ye,
Zichang Tan,
Jingdong Wang,
Gaoang Wang,
Xi Li
Abstract:
Vision-based roadside 3D object detection has attracted rising attention in autonomous driving domain, since it encompasses inherent advantages in reducing blind spots and expanding perception range. While previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping, ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we p…
▽ More
Vision-based roadside 3D object detection has attracted rising attention in autonomous driving domain, since it encompasses inherent advantages in reducing blind spots and expanding perception range. While previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping, ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we propose a novel voxel pooling strategy to reduce such error, dubbed BEVSpread. Specifically, instead of bringing the image features contained in a frustum point to a single BEV grid, BEVSpread considers each frustum point as a source and spreads the image features to the surrounding BEV grids with adaptive weights. To achieve superior propagation performance, a specific weight function is designed to dynamically control the decay speed of the weights according to distance and depth. Aided by customized CUDA parallel acceleration, BEVSpread achieves comparable inference time as the original voxel pooling. Extensive experiments on two large-scale roadside benchmarks demonstrate that, as a plug-in, BEVSpread can significantly improve the performance of existing frustum-based BEV methods by a large margin of (1.12, 5.26, 3.01) AP in vehicle, pedestrian and cyclist.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
C-Mamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting
Authors:
Chaolv Zeng,
Zhanyu Liu,
Guanjie Zheng,
Linghe Kong
Abstract:
In recent years, significant progress has been made in multivariate time series forecasting using Linear-based, Transformer-based, and Convolution-based models. However, these approaches face notable limitations: linear forecasters struggle with representation capacities, attention mechanisms suffer from quadratic complexity, and convolutional models have a restricted receptive field. These constr…
▽ More
In recent years, significant progress has been made in multivariate time series forecasting using Linear-based, Transformer-based, and Convolution-based models. However, these approaches face notable limitations: linear forecasters struggle with representation capacities, attention mechanisms suffer from quadratic complexity, and convolutional models have a restricted receptive field. These constraints impede their effectiveness in modeling complex time series, particularly those with numerous variables. Additionally, many models adopt the Channel-Independent (CI) strategy, treating multivariate time series as uncorrelated univariate series while ignoring their correlations. For models considering inter-channel relationships, whether through the self-attention mechanism, linear combination, or convolution, they all incur high computational costs and focus solely on weighted summation relationships, neglecting potential proportional relationships between channels. In this work, we address these issues by leveraging the newly introduced state space model and propose \textbf{C-Mamba}, a novel approach that captures cross-channel dependencies while maintaining linear complexity without losing the global receptive field. Our model consists of two key components: (i) channel mixup, where two channels are mixed to enhance the training sets; (ii) channel attention enhanced patch-wise Mamba encoder that leverages the ability of the state space models to capture cross-time dependencies and models correlations between channels by mining their weight relationships. Our model achieves state-of-the-art performance on seven real-world time series datasets. Moreover, the proposed mixup and attention strategy exhibits strong generalizability across other frameworks.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
MagiNet: Mask-Aware Graph Imputation Network for Incomplete Traffic Data
Authors:
Jianping Zhou,
Bin Lu,
Zhanyu Liu,
Siyu Pan,
Xuejun Feng,
Hua Wei,
Guanjie Zheng,
Xinbing Wang,
Chenghu Zhou
Abstract:
Due to detector malfunctions and communication failures, missing data is ubiquitous during the collection of traffic data. Therefore, it is of vital importance to impute the missing values to facilitate data analysis and decision-making for Intelligent Transportation System (ITS). However, existing imputation methods generally perform zero pre-filling techniques to initialize missing values, intro…
▽ More
Due to detector malfunctions and communication failures, missing data is ubiquitous during the collection of traffic data. Therefore, it is of vital importance to impute the missing values to facilitate data analysis and decision-making for Intelligent Transportation System (ITS). However, existing imputation methods generally perform zero pre-filling techniques to initialize missing values, introducing inevitable noises. Moreover, we observe prevalent over-smoothing interpolations, falling short in revealing the intrinsic spatio-temporal correlations of incomplete traffic data. To this end, we propose Mask-Aware Graph imputation Network: MagiNet. Our method designs an adaptive mask spatio-temporal encoder to learn the latent representations of incomplete data, eliminating the reliance on pre-filling missing values. Furthermore, we devise a spatio-temporal decoder that stacks multiple blocks to capture the inherent spatial and temporal dependencies within incomplete traffic data, alleviating over-smoothing imputation. Extensive experiments demonstrate that our method outperforms state-of-the-art imputation methods on five real-world traffic datasets, yielding an average improvement of 4.31% in RMSE and 3.72% in MAPE.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
A Data and Model-Driven Deep Learning Approach to Robust Downlink Beamforming Optimization
Authors:
Kai Liang,
Gan Zheng,
Zan Li,
Kai-Kit Wong,
Chan-Byoung Chae
Abstract:
This paper investigates the optimization of the long-standing probabilistically robust transmit beamforming problem with channel uncertainties in the multiuser multiple-input single-output (MISO) downlink transmission. This problem poses significant analytical and computational challenges. Currently, the state-of-the-art optimization method relies on convex restrictions as tractable approximations…
▽ More
This paper investigates the optimization of the long-standing probabilistically robust transmit beamforming problem with channel uncertainties in the multiuser multiple-input single-output (MISO) downlink transmission. This problem poses significant analytical and computational challenges. Currently, the state-of-the-art optimization method relies on convex restrictions as tractable approximations to ensure robustness against Gaussian channel uncertainties. However, this method not only exhibits high computational complexity and suffers from the rank relaxation issue but also yields conservative solutions. In this paper, we propose an unsupervised deep learning-based approach that incorporates the sampling of channel uncertainties in the training process to optimize the probabilistic system performance. We introduce a model-driven learning approach that defines a new beamforming structure with trainable parameters to account for channel uncertainties. Additionally, we employ a graph neural network to efficiently infer the key beamforming parameters. We successfully apply this approach to the minimum rate quantile maximization problem subject to outage and total power constraints. Furthermore, we propose a bisection search method to address the more challenging power minimization problem with probabilistic rate constraints by leveraging the aforementioned approach. Numerical results confirm that our approach achieves non-conservative robust performance, higher data rates, greater power efficiency, and faster execution compared to state-of-the-art optimization methods.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting
Authors:
Zhanyu Liu,
Jianrong Ding,
Guanjie Zheng
Abstract:
The field of Intelligent Transportation Systems (ITS) relies on accurate traffic forecasting to enable various downstream applications. However, developing cities often face challenges in collecting sufficient training traffic data due to limited resources and outdated infrastructure. Recognizing this obstacle, the concept of cross-city few-shot forecasting has emerged as a viable approach. While…
▽ More
The field of Intelligent Transportation Systems (ITS) relies on accurate traffic forecasting to enable various downstream applications. However, developing cities often face challenges in collecting sufficient training traffic data due to limited resources and outdated infrastructure. Recognizing this obstacle, the concept of cross-city few-shot forecasting has emerged as a viable approach. While previous cross-city few-shot forecasting methods ignore the frequency similarity between cities, we have made an observation that the traffic data is more similar in the frequency domain between cities. Based on this fact, we propose a \textbf{F}requency \textbf{E}nhanced \textbf{P}re-training Framework for \textbf{Cross}-city Few-shot Forecasting (\textbf{FEPCross}). FEPCross has a pre-training stage and a fine-tuning stage. In the pre-training stage, we propose a novel Cross-Domain Spatial-Temporal Encoder that incorporates the information of the time and frequency domain and trains it with self-supervised tasks encompassing reconstruction and contrastive objectives. In the fine-tuning stage, we design modules to enrich training samples and maintain a momentum-updated graph structure, thereby mitigating the risk of overfitting to the few-shot training data. Empirical evaluations performed on real-world traffic datasets validate the exceptional efficacy of FEPCross, outperforming existing approaches of diverse categories and demonstrating characteristics that foster the progress of cross-city few-shot forecasting.
△ Less
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting
Authors:
Jianrong Ding,
Zhanyu Liu,
Guanjie Zheng,
Haiming Jin,
Linghe Kong
Abstract:
Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing c…
▽ More
Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between predictions of the two models. The synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology compared to classification. To mitigate this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and propose a new one-line plugin of dataset condensation designated as Dataset Condensation for Time Series Forecasting (CondTSF) based on our analysis. Plugging CondTSF into previous dataset condensation methods facilitates a reduction in the distance between the predictions of the model trained with the full dataset and the model trained with the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.
△ Less
Submitted 11 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Instruction Tuning with Retrieval-based Examples Ranking for Aspect-based Sentiment Analysis
Authors:
Guangmin Zheng,
Jin Wang,
Liang-Chih Yu,
Xuejie Zhang
Abstract:
Aspect-based sentiment analysis (ABSA) identifies sentiment information related to specific aspects and provides deeper market insights to businesses and organizations. With the emergence of large language models (LMs), recent studies have proposed using fixed examples for instruction tuning to reformulate ABSA as a generation task. However, the performance is sensitive to the selection of in-cont…
▽ More
Aspect-based sentiment analysis (ABSA) identifies sentiment information related to specific aspects and provides deeper market insights to businesses and organizations. With the emergence of large language models (LMs), recent studies have proposed using fixed examples for instruction tuning to reformulate ABSA as a generation task. However, the performance is sensitive to the selection of in-context examples; several retrieval methods are based on surface similarity and are independent of the LM generative objective. This study proposes an instruction learning method with retrieval-based example ranking for ABSA tasks. For each target sample, an LM was applied as a scorer to estimate the likelihood of the output given the input and a candidate example as the prompt, and training examples were labeled as positive or negative by ranking the scores. An alternating training schema is proposed to train both the retriever and LM. Instructional prompts can be constructed using high-quality examples. The LM is used for both scoring and inference, improving the generation efficiency without incurring additional computational costs or training difficulties. Extensive experiments on three ABSA subtasks verified the effectiveness of the proposed method, demonstrating its superiority over various strong baseline models. Code and data are released at https://github.com/zgMin/IT-RER-ABSA.
△ Less
Submitted 29 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding
Authors:
Yuhang Liu,
Boyi Sun,
Guixu Zheng,
Yishuo Wang,
Jing Wang,
Fei-Yue Wang
Abstract:
LiDAR sensors play a crucial role in various applications, especially in autonomous driving. Current research primarily focuses on optimizing perceptual models with point cloud data as input, while the exploration of deeper cognitive intelligence remains relatively limited. To address this challenge, parallel LiDARs have emerged as a novel theoretical framework for the next-generation intelligent…
▽ More
LiDAR sensors play a crucial role in various applications, especially in autonomous driving. Current research primarily focuses on optimizing perceptual models with point cloud data as input, while the exploration of deeper cognitive intelligence remains relatively limited. To address this challenge, parallel LiDARs have emerged as a novel theoretical framework for the next-generation intelligent LiDAR systems, which tightly integrate physical, digital, and social systems. To endow LiDAR systems with cognitive capabilities, we introduce the 3D visual grounding task into parallel LiDARs and present a novel human-computer interaction paradigm for LiDAR systems. We propose Talk2LiDAR, a large-scale benchmark dataset tailored for 3D visual grounding in autonomous driving. Additionally, we present a two-stage baseline approach and an efficient one-stage method named BEVGrounding, which significantly improves grounding accuracy by fusing coarse-grained sentence and fine-grained word embeddings with visual features. Our experiments on Talk2Car-3D and Talk2LiDAR datasets demonstrate the superior performance of BEVGrounding, laying a foundation for further research in this domain.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling
Authors:
Guangmin Zheng,
Jin Wang,
Xiaobing Zhou,
Xuejie Zhang
Abstract:
Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improv…
▽ More
Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples that shared highly similar text but had different semantics from the original. Bidirectional margin loss (BML) was applied to introduce them into the traditional contrastive learning framework that involves only positive and negative samples. Extensive experiments on the ScienceQA dataset demonstrated the effectiveness of the proposed method. Code and data are released at https://github.com/zgMin/SNSE-CoT.
△ Less
Submitted 16 May, 2024;
originally announced May 2024.
-
GeoViz: A Multi-View Visualization Platform for Spatio-temporal Knowledge Graph
Authors:
Jianping Zhou,
Junhao Li,
Guanjie Zheng,
Yunqiang Zhu,
Xinbing Wang,
Chenghu Zhou
Abstract:
In this paper, we propose a multi-view visualization technology for spatio-temporal knowledge graph(STKG), which utilizes three distinct perspectives: knowledge tree, knowledge net, and knowledge map, to facilitate a comprehensive analysis of the STKG. The knowledge tree enables the visualization of hierarchical interrelation within the STKG, while the knowledge net elucidates semantic relationshi…
▽ More
In this paper, we propose a multi-view visualization technology for spatio-temporal knowledge graph(STKG), which utilizes three distinct perspectives: knowledge tree, knowledge net, and knowledge map, to facilitate a comprehensive analysis of the STKG. The knowledge tree enables the visualization of hierarchical interrelation within the STKG, while the knowledge net elucidates semantic relationships among knowledge entities. Additionally, the knowledge map displays spatial and temporal distributions via spatial maps and time axes, respectively. Our visualization technology addresses the limitations inherent in single-view approaches and the deficiency of interaction in spatio-temporal perspectives evident in existing visualization methods. Moreover, we have encapsulated this technology within an integrated, open-source platform named GeoViz. A demo video of GeoViz can be accessed at https://anonymous.4open.science/r/GeoViz.
△ Less
Submitted 29 April, 2024;
originally announced May 2024.
-
Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation
Authors:
Guangtao Zheng,
Wenqian Ye,
Aidong Zhang
Abstract:
Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and pr…
▽ More
Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and propose a self-guided spurious correlation mitigation framework. Our framework automatically constructs fine-grained training labels tailored for a classifier obtained with empirical risk minimization to improve its robustness against spurious correlations. The fine-grained training labels are formulated with different prediction behaviors of the classifier identified in a novel spuriousness embedding space. We construct the space with automatically detected conceptual attributes and a novel spuriousness metric which measures how likely a class-attribute correlation is exploited for predictions. We demonstrate that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori and outperforms prior methods on five real-world datasets.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method
Authors:
Zhixin Guo,
Tao Wang,
Chaoyang Wang,
Jianping Zhou,
Guanjie Zheng,
Xinbing Wang,
Chenghu Zhou
Abstract:
The rare earth elements Sm and Nd significantly address fundamental questions about crustal growth, such as its spatiotemporal evolution and the interplay between orogenesis and crustal accretion. Their relative immobility during high-grade metamorphism makes the Sm-Nd isotopic system crucial for inferring crustal formation times. Historically, data have been disseminated sporadically in the scien…
▽ More
The rare earth elements Sm and Nd significantly address fundamental questions about crustal growth, such as its spatiotemporal evolution and the interplay between orogenesis and crustal accretion. Their relative immobility during high-grade metamorphism makes the Sm-Nd isotopic system crucial for inferring crustal formation times. Historically, data have been disseminated sporadically in the scientific literature due to complicated and costly sampling procedures, resulting in a fragmented knowledge base. However, the scattering of critical geoscience data across multiple publications poses significant challenges regarding human capital and time. In response, we present an automated tabular extraction method for harvesting tabular geoscience data. We collect 10,624 Sm-Nd data entries from 9,138 tables in over 20,000 geoscience publications using this method. We manually selected 2,118 data points from it to supplement our previously constructed global Sm-Nd dataset, increasing its sample count by over 20\%. Our automatic data collection methodology enhances the efficiency of data acquisition processes spanning various scientific domains. Furthermore, the constructed Sm-Nd isotopic dataset should motivate the research of classifying global orogenic belts.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion
Authors:
Guiyong Zheng,
Jinqi Jiang,
Chen Feng,
Shaojie Shen,
Boyu Zhou
Abstract:
Surface prediction and completion have been widely studied in various applications. Recently, research in surface completion has evolved from small objects to complex large-scale scenes. As a result, researchers have begun increasing the volume of data and leveraging a greater variety of data modalities including rendered RGB images, descriptive texts, depth images, etc, to enhance algorithm perfo…
▽ More
Surface prediction and completion have been widely studied in various applications. Recently, research in surface completion has evolved from small objects to complex large-scale scenes. As a result, researchers have begun increasing the volume of data and leveraging a greater variety of data modalities including rendered RGB images, descriptive texts, depth images, etc, to enhance algorithm performance. However, existing datasets suffer from a deficiency in the amounts of scene-level models along with the corresponding multi-modal information. Therefore, a method to scale the datasets and generate multi-modal information in them efficiently is essential. To bridge this research gap, we propose MASSTAR: a Multi-modal lArge-scale Scene dataset with a verSatile Toolchain for surfAce pRediction and completion. We develop a versatile and efficient toolchain for processing the raw 3D data from the environments. It screens out a set of fine-grained scene models and generates the corresponding multi-modal data. Utilizing the toolchain, we then generate an example dataset composed of over a thousand scene-level models with partial real-world data added. We compare MASSTAR with the existing datasets, which validates its superiority: the ability to efficiently extract high-quality models from complex scenarios to expand the dataset. Additionally, several representative surface completion algorithms are benchmarked on MASSTAR, which reveals that existing algorithms can hardly deal with scene-level completion. We will release the source code of our toolchain and the dataset. For more details, please see our project page at https://sysu-star.github.io/MASSTAR.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Dual-Channel Multiplex Graph Neural Networks for Recommendation
Authors:
Xiang Li,
Chaofan Fu,
Zhongying Zhao,
Guanjie Zheng,
Chao Huang,
Junyu Dong,
Yanwei Yu
Abstract:
Efficient recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interaction relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping pla…
▽ More
Efficient recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interaction relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping platforms. Nevertheless, these approaches still grapple with two significant shortcomings: (1) Insufficient modeling and exploitation of the impact of various behavior patterns formed by multiplex relations between users and items on representation learning, and (2) ignoring the effect of different relations in the behavior patterns on the target relation in recommender system scenarios. In this study, we introduce a novel recommendation framework, Dual-Channel Multiplex Graph Neural Network (DCMGNN), which addresses the aforementioned challenges. It incorporates an explicit behavior pattern representation learner to capture the behavior patterns composed of multiplex user-item interaction relations, and includes a relation chain representation learning and a relation chain-aware encoder to discover the impact of various auxiliary relations on the target relation, the dependencies between different relations, and mine the appropriate order of relations in a behavior pattern. Extensive experiments on three real-world datasets demonstrate that our \model surpasses various state-of-the-art recommendation methods. It outperforms the best baselines by 10.06\% and 12.15\% on average across all datasets in terms of R@10 and N@10 respectively.
△ Less
Submitted 29 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
NetTrack: Tracking Highly Dynamic Objects with a Net
Authors:
Guangze Zheng,
Shijie Lin,
Haobo Zuo,
Changhong Fu,
Jia Pan
Abstract:
The complex dynamicity of open-world objects presents non-negligible challenges for multi-object tracking (MOT), often manifested as severe deformations, fast motion, and occlusions. Most methods that solely depend on coarse-grained object cues, such as boxes and the overall appearance of the object, are susceptible to degradation due to distorted internal relationships of dynamic objects. To addr…
▽ More
The complex dynamicity of open-world objects presents non-negligible challenges for multi-object tracking (MOT), often manifested as severe deformations, fast motion, and occlusions. Most methods that solely depend on coarse-grained object cues, such as boxes and the overall appearance of the object, are susceptible to degradation due to distorted internal relationships of dynamic objects. To address this problem, this work proposes NetTrack, an efficient, generic, and affordable tracking framework to introduce fine-grained learning that is robust to dynamicity. Specifically, NetTrack constructs a dynamicity-aware association with a fine-grained Net, leveraging point-level visual cues. Correspondingly, a fine-grained sampler and matching method have been incorporated. Furthermore, NetTrack learns object-text correspondence for fine-grained localization. To evaluate MOT in extremely dynamic open-world scenarios, a bird flock tracking (BFT) dataset is constructed, which exhibits high dynamicity with diverse species and open-world scenarios. Comprehensive evaluation on BFT validates the effectiveness of fine-grained learning on object dynamicity, and thorough transfer experiments on challenging open-world benchmarks, i.e., TAO, TAO-OW, AnimalTrack, and GMOT-40, validate the strong generalization ability of NetTrack even without finetuning. Project page: https://george-zhuang.github.io/nettrack/.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Graph Data Condensation via Self-expressive Graph Structure Reconstruction
Authors:
Zhanyu Liu,
Chaolv Zeng,
Guanjie Zheng
Abstract:
With the increasing demands of training graph neural networks (GNNs) on large-scale graphs, graph data condensation has emerged as a critical technique to relieve the storage and time costs during the training phase. It aims to condense the original large-scale graph to a much smaller synthetic graph while preserving the essential information necessary for efficiently training a downstream GNN. Ho…
▽ More
With the increasing demands of training graph neural networks (GNNs) on large-scale graphs, graph data condensation has emerged as a critical technique to relieve the storage and time costs during the training phase. It aims to condense the original large-scale graph to a much smaller synthetic graph while preserving the essential information necessary for efficiently training a downstream GNN. However, existing methods concentrate either on optimizing node features exclusively or endeavor to independently learn node features and the graph structure generator. They could not explicitly leverage the information of the original graph structure and failed to construct an interpretable graph structure for the synthetic dataset. To address these issues, we introduce a novel framework named \textbf{G}raph Data \textbf{C}ondensation via \textbf{S}elf-expressive Graph Structure \textbf{R}econstruction (\textbf{GCSR}). Our method stands out by (1) explicitly incorporating the original graph structure into the condensing process and (2) capturing the nuanced interdependencies between the condensed nodes by reconstructing an interpretable self-expressive graph structure. Extensive experiments and comprehensive analysis validate the efficacy of the proposed method across diverse GNN models and datasets. Our code is available at \url{https://github.com/zclzcl0223/GCSR}.
△ Less
Submitted 7 June, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Dataset Condensation for Time Series Classification via Dual Domain Matching
Authors:
Zhanyu Liu,
Ke Hao,
Guanjie Zheng,
Yanwei Yu
Abstract:
Time series data has been demonstrated to be crucial in various research fields. The management of large quantities of time series data presents challenges in terms of deep learning tasks, particularly for training a deep neural network. Recently, a technique named \textit{Dataset Condensation} has emerged as a solution to this problem. This technique generates a smaller synthetic dataset that has…
▽ More
Time series data has been demonstrated to be crucial in various research fields. The management of large quantities of time series data presents challenges in terms of deep learning tasks, particularly for training a deep neural network. Recently, a technique named \textit{Dataset Condensation} has emerged as a solution to this problem. This technique generates a smaller synthetic dataset that has comparable performance to the full real dataset in downstream tasks such as classification. However, previous methods are primarily designed for image and graph datasets, and directly adapting them to the time series dataset leads to suboptimal performance due to their inability to effectively leverage the rich information inherent in time series data, particularly in the frequency domain. In this paper, we propose a novel framework named Dataset \textit{\textbf{Cond}}ensation for \textit{\textbf{T}}ime \textit{\textbf{S}}eries \textit{\textbf{C}}lassification via Dual Domain Matching (\textbf{CondTSC}) which focuses on the time series classification dataset condensation task. Different from previous methods, our proposed framework aims to generate a condensed dataset that matches the surrogate objectives in both the time and frequency domains. Specifically, CondTSC incorporates multi-view data augmentation, dual domain training, and dual surrogate objectives to enhance the dataset condensation process in the time and frequency domains. Through extensive experiments, we demonstrate the effectiveness of our proposed framework, which outperforms other baselines and learns a condensed synthetic dataset that exhibits desirable characteristics such as conforming to the distribution of the original data.
△ Less
Submitted 10 June, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
AceMap: Knowledge Discovery through Academic Graph
Authors:
Xinbing Wang,
Luoyi Fu,
Xiaoying Gan,
Ying Wen,
Guanjie Zheng,
Jiaxin Ding,
Liyao Xiang,
Nanyang Ye,
Meng Jin,
Shiyu Liang,
Bin Lu,
Haiwen Wang,
Yi Xu,
Cheng Deng,
Shao Zhang,
Huquan Kang,
Xingli Wang,
Qi Li,
Zhixin Guo,
Jiexing Qi,
Pan Liu,
Yuyang Ren,
Lyuwen Wu,
Jungang Yang,
Jianping Zhou
, et al. (1 additional authors not shown)
Abstract:
The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publicatio…
▽ More
The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications. The representation of heterogeneous graphs and the effective measurement, analysis, and mining of such graphs pose significant challenges. To address these challenges, we present AceMap, an academic system designed for knowledge discovery through academic graph. We present advanced database construction techniques to build the comprehensive AceMap database with large-scale academic entities that contain rich visual, textual, and numerical information. AceMap also employs innovative visualization, quantification, and analysis methods to explore associations and logical relationships among academic entities. AceMap introduces large-scale academic network visualization techniques centered on nebular graphs, providing a comprehensive view of academic networks from multiple perspectives. In addition, AceMap proposes a unified metric based on structural entropy to quantitatively measure the knowledge content of different academic entities. Moreover, AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas through citation relationships and concept co-occurrence, and generating concise summaries informed by this evolutionary process. In addition, AceMap uses machine reading methods to generate potential new ideas at the intersection of different fields. Exploring the integration of large language models and knowledge graphs is a promising direction for future research in idea evolution. Please visit \url{https://www.acemap.info} for further exploration.
△ Less
Submitted 14 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Spurious Correlations in Machine Learning: A Survey
Authors:
Wenqian Ye,
Guangtao Zheng,
Xu Cao,
Yunsheng Ma,
Aidong Zhang
Abstract:
Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's genera…
▽ More
Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's generalization and robustness. In this paper, we provide a review of this issue, along with a taxonomy of current state-of-the-art methods for addressing spurious correlations in machine learning models. Additionally, we summarize existing datasets, benchmarks, and metrics to aid future research. The paper concludes with a discussion of the recent advancements and future challenges in this field, aiming to provide valuable insights for researchers in the related domains.
△ Less
Submitted 16 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
A Lattice-Reduction Aided Vector Perturbation Precoder Relying on Quantum Annealing
Authors:
Samuel Winter,
Yangyishi Zhang,
Gan Zheng,
Lajos Hanzo
Abstract:
Quantum annealing (QA) is proposed for vector perturbation precoding (VPP) in multiple input multiple output (MIMO) communications systems. The mathematical framework of VPP is presented, outlining the problem formulation and the benefits of lattice reduction algorithms. Lattice reduction aided quantum vector perturbation (LRAQVP) is designed by harnessing physical quantum hardware, and the optimi…
▽ More
Quantum annealing (QA) is proposed for vector perturbation precoding (VPP) in multiple input multiple output (MIMO) communications systems. The mathematical framework of VPP is presented, outlining the problem formulation and the benefits of lattice reduction algorithms. Lattice reduction aided quantum vector perturbation (LRAQVP) is designed by harnessing physical quantum hardware, and the optimization of hardware parameters is discussed. We observe a 5dB gain over lattice reduction zero forcing precoding (LRZFP), which behaves similarly to a quantum annealing algorithm operating without a lattice reduction stage. The proposed algorithm is also shown to approach the performance of a sphere encoder, which exhibits an exponentially escalating complexity.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
Authors:
Yixin Ou,
Ningyu Zhang,
Honghao Gui,
Ziwen Xu,
Shuofei Qiao,
Yida Xue,
Runnan Fang,
Kangwei Liu,
Lei Li,
Zhen Bi,
Guozhou Zheng,
Huajun Chen
Abstract:
In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist am…
▽ More
In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders practitioners from further developing and advancing. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at https://github.com/zjunlp/EasyInstruct, along with an online demo app and a demo video for quick-start, calling for broader research centered on instruction data and synthetic data.
△ Less
Submitted 21 March, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Multi-scale Traffic Pattern Bank for Cross-city Few-shot Traffic Forecasting
Authors:
Zhanyu Liu,
Guanjie Zheng,
Yanwei Yu
Abstract:
Traffic forecasting is crucial for intelligent transportation systems (ITS), aiding in efficient resource allocation and effective traffic control. However, its effectiveness often relies heavily on abundant traffic data, while many cities lack sufficient data due to limited device support, posing a significant challenge for traffic forecasting. Recognizing this challenge, we have made a noteworth…
▽ More
Traffic forecasting is crucial for intelligent transportation systems (ITS), aiding in efficient resource allocation and effective traffic control. However, its effectiveness often relies heavily on abundant traffic data, while many cities lack sufficient data due to limited device support, posing a significant challenge for traffic forecasting. Recognizing this challenge, we have made a noteworthy observation: traffic patterns exhibit similarities across diverse cities. Building on this key insight, we propose a solution for the cross-city few-shot traffic forecasting problem called Multi-scale Traffic Pattern Bank (MTPB). Primarily, MTPB initiates its learning process by leveraging data-rich source cities, effectively acquiring comprehensive traffic knowledge through a spatial-temporal-aware pre-training process. Subsequently, the framework employs advanced clustering techniques to systematically generate a multi-scale traffic pattern bank derived from the learned knowledge. Next, the traffic data of the data-scarce target city could query the traffic pattern bank, facilitating the aggregation of meta-knowledge. This meta-knowledge, in turn, assumes a pivotal role as a robust guide in subsequent processes involving graph reconstruction and forecasting. Empirical assessments conducted on real-world traffic datasets affirm the superior performance of MTPB, surpassing existing methods across various categories and exhibiting numerous attributes conducive to the advancement of cross-city few-shot forecasting methodologies. The code is available in https://github.com/zhyliu00/MTPB.
△ Less
Submitted 26 February, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
MouSi: Poly-Visual-Expert Vision-Language Models
Authors:
Xiaoran Fan,
Tao Ji,
Changhao Jiang,
Shuo Li,
Senjie Jin,
Sirui Song,
Junke Wang,
Boyang Hong,
Lu Chen,
Guodong Zheng,
Ming Zhang,
Caishuang Huang,
Rui Zheng,
Zhiheng Xi,
Yuhao Zhou,
Shihan Dou,
Junjie Ye,
Hang Yan,
Tao Gui,
Qi Zhang,
Xipeng Qiu,
Xuanjing Huang,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability…
▽ More
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability of VLMs. This paper proposes the use of ensemble experts technique to synergizes the capabilities of individual visual encoders, including those skilled in image-text matching, OCR, image segmentation, etc. This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs. In addition, we explore different positional encoding schemes to alleviate the waste of positional encoding caused by lengthy image feature sequences, effectively addressing the issue of position overflow and length limitations. For instance, in our implementation, this technique significantly reduces the positional occupancy in models like SAM, from a substantial 4096 to a more efficient and manageable 64 or even down to 1. Experimental results demonstrate that VLMs with multiple experts exhibit consistently superior performance over isolated visual encoders and mark a significant performance boost as more experts are integrated. We have open-sourced the training code used in this report. All of these resources can be found on our project website.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Authors:
Chaochao Lu,
Chen Qian,
Guodong Zheng,
Hongxing Fan,
Hongzhi Gao,
Jie Zhang,
Jing Shao,
Jingyi Deng,
Jinlan Fu,
Kexin Huang,
Kunchang Li,
Lijun Li,
Limin Wang,
Lu Sheng,
Meiqi Chen,
Ming Zhang,
Qibing Ren,
Sirui Chen,
Tao Gui,
Wanli Ouyang,
Yali Wang,
Yan Teng,
Yaru Wang,
Yi Wang,
Yinan He
, et al. (11 additional authors not shown)
Abstract:
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful OpenAI's GPT-4 and Google's Gemini have been deployed. This paper strives to enhance unde…
▽ More
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful OpenAI's GPT-4 and Google's Gemini have been deployed. This paper strives to enhance understanding of the gap through the lens of a qualitative study on the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities: ie, text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are several representative factors that define the reliability of MLLMs, in supporting various downstream applications. To be specific, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall we evaluate 230 manually designed cases, where the qualitative results are then summarized into 12 scores (ie, 4 modalities times 3 properties). In total, we uncover 14 empirical findings that are useful to understand the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.
△ Less
Submitted 29 January, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection
Authors:
Yadong Guan,
Jiqing Han,
Hongwei Song,
Wenjie Song,
Guibin Zheng,
Tieran Zheng,
Yongjun He
Abstract:
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades the feature discrimination. To solve the problem, we propose a disentangled feature learning fram…
▽ More
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades the feature discrimination. To solve the problem, we propose a disentangled feature learning framework to learn a category-specific representation. Specifically, we employ different projectors to learn the frame-wise features for each category. To ensure that these feature does not contain information of other categories, we maximize the common information between frame-wise features within the same category and propose a frame-wise contrastive loss. In addition, considering that the labeled data used by the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. The experimental results demonstrate the effectiveness of our method.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
Task Planning for Multiple Item Insertion using ADMM
Authors:
Gavin Zheng
Abstract:
Mixed-integer nonlinear programmings (MINLPs) are powerful formulation tools for task planning. However, it suffers from long solving time especially for large scale problems. In this work, we first formulate the task planning problem for item stowing into a mixed-integer nonlinear programming problem, then solve it using Alternative Direction Method of Multipliers (ADMM). ADMM separates the compl…
▽ More
Mixed-integer nonlinear programmings (MINLPs) are powerful formulation tools for task planning. However, it suffers from long solving time especially for large scale problems. In this work, we first formulate the task planning problem for item stowing into a mixed-integer nonlinear programming problem, then solve it using Alternative Direction Method of Multipliers (ADMM). ADMM separates the complete formulation into a nonlinear programming problem and mixed-integer programming problem, then iterate between them to solve the original problem. We show that our ADMM converges better than non-warm-started nonlinear complementary formulation. Our proposed methods are demonstrated on hardware as a high level planner to insert books into the bookshelf.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
AdvST: Revisiting Data Augmentations for Single Domain Generalization
Authors:
Guangtao Zheng,
Mengdi Huai,
Aidong Zhang
Abstract:
Single domain generalization (SDG) aims to train a robust model against unknown target domain shifts using data from a single source domain. Data augmentation has been proven an effective approach to SDG. However, the utility of standard augmentations, such as translate, or invert, has not been fully exploited in SDG; practically, these augmentations are used as a part of a data preprocessing proc…
▽ More
Single domain generalization (SDG) aims to train a robust model against unknown target domain shifts using data from a single source domain. Data augmentation has been proven an effective approach to SDG. However, the utility of standard augmentations, such as translate, or invert, has not been fully exploited in SDG; practically, these augmentations are used as a part of a data preprocessing procedure. Although it is intuitive to use many such augmentations to boost the robustness of a model to out-of-distribution domain shifts, we lack a principled approach to harvest the benefit brought from multiple these augmentations. Here, we conceptualize standard data augmentations with learnable parameters as semantics transformations that can manipulate certain semantics of a sample, such as the geometry or color of an image. Then, we propose Adversarial learning with Semantics Transformations (AdvST) that augments the source domain data with semantics transformations and learns a robust model with the augmented data. We theoretically show that AdvST essentially optimizes a distributionally robust optimization objective defined on a set of semantics distributions induced by the parameters of semantics transformations. We demonstrate that AdvST can produce samples that expand the coverage on target domain data. Compared with the state-of-the-art methods, AdvST, despite being a simple method, is surprisingly competitive and achieves the best average SDG performance on the Digits, PACS, and DomainNet datasets. Our code is available at https://github.com/gtzheng/AdvST.
△ Less
Submitted 14 February, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Towards Controlled Table-to-Text Generation with Scientific Reasoning
Authors:
Zhixin Guo,
Jianping Zhou,
Jiexing Qi,
Mingxuan Yan,
Ziwei He,
Guanjie Zheng,
Zhouhan Lin,
Xinbing Wang,
Chenghu Zhou
Abstract:
The sheer volume of scientific experimental results and complex technical statements, often presented in tabular formats, presents a formidable barrier to individuals acquiring preferred information. The realms of scientific reasoning and content generation that adhere to user preferences encounter distinct challenges. In this work, we present a new task for generating fluent and logical descripti…
▽ More
The sheer volume of scientific experimental results and complex technical statements, often presented in tabular formats, presents a formidable barrier to individuals acquiring preferred information. The realms of scientific reasoning and content generation that adhere to user preferences encounter distinct challenges. In this work, we present a new task for generating fluent and logical descriptions that match user preferences over scientific tabular data, aiming to automate scientific document analysis. To facilitate research in this direction, we construct a new challenging dataset CTRLSciTab consisting of table-description pairs extracted from the scientific literature, with highlighted cells and corresponding domain-specific knowledge base. We evaluated popular pre-trained language models to establish a baseline and proposed a novel architecture outperforming competing approaches. The results showed that large models struggle to produce accurate content that aligns with user preferences. As the first of its kind, our work should motivate further research in scientific domains.
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
Axiomatic Preference Modeling for Longform Question Answering
Authors:
Corby Rosset,
Guoqing Zheng,
Victor Dibia,
Ahmed Awadallah,
Paul Bennett
Abstract:
The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model. However, these reward models (RMs) often lack direct knowledge of why, or under what principles, the preferences annotations were made. In this study, we identify principles that…
▽ More
The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model. However, these reward models (RMs) often lack direct knowledge of why, or under what principles, the preferences annotations were made. In this study, we identify principles that guide RMs to better align with human preferences, and then develop an axiomatic framework to generate a rich variety of preference signals to uphold them. We use these axiomatic signals to train a model for scoring answers to longform questions. Our approach yields a Preference Model with only about 220M parameters that agrees with gold human-annotated preference labels more often than GPT-4. The contributions of this work include: training a standalone preference model that can score human- and LLM-generated answers on the same scale; developing an axiomatic framework for generating training data pairs tailored to certain principles; and showing that a small amount of axiomatic signals can help small models outperform GPT-4 in preference scoring. We release our model on huggingface: https://huggingface.co/corbyrosset/axiomatic_preference_model
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
Orca 2: Teaching Small Language Models How to Reason
Authors:
Arindam Mitra,
Luciano Del Corro,
Shweti Mahajan,
Andres Codas,
Clarisse Simoes,
Sahaj Agarwal,
Xuxi Chen,
Anastasia Razdaibiedina,
Erik Jones,
Kriti Aggarwal,
Hamid Palangi,
Guoqing Zheng,
Corby Rosset,
Hamed Khanpour,
Ahmed Awadallah
Abstract:
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We…
▽ More
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar or better to those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. make Orca 2 weights publicly available at aka.ms/orca-lm to support research on the development, evaluation, and alignment of smaller LMs
△ Less
Submitted 21 November, 2023; v1 submitted 18 November, 2023;
originally announced November 2023.
-
A proof-of-concept online metadata catalogue service of Earth observation datasets for human health research in exposomics
Authors:
Keumseok Koh,
Maged N. Kamel Boulos,
Gang Zheng,
Hongsheng Zhang,
Muralikrishna V. Iyyanki,
Bosco Bwambale,
Ashraf Dewan
Abstract:
This article describes research carried out during 2023 under an International Society for Photogrammetry and Remote Sensing (ISPRS)-funded project to develop and disseminate a metadata catalogue of Earth observation data sources/products and types that are relevant to human health research in exposomics, as a free service to interested researchers worldwide. The proof-of-concept catalogue was inf…
▽ More
This article describes research carried out during 2023 under an International Society for Photogrammetry and Remote Sensing (ISPRS)-funded project to develop and disseminate a metadata catalogue of Earth observation data sources/products and types that are relevant to human health research in exposomics, as a free service to interested researchers worldwide. The proof-of-concept catalogue was informed by input from existing research literature on the subject (desk research), as well as online communications with, and relevant research publications collected from, a small panel (n = 5) of select experts from the academia in three countries (China, UK and USA). It has 90 metadata records of relevant Earth observation datasets (n = 40) and associated health-focused research publications (n = 50). The project's online portal offers a searchable version of the catalogue featuring a number of search modes and filtering options. It is hoped future, more comprehensive versions of this service will enable more researchers and studies to discover and use remote sensing data about population-level exposures to disease determinants (exposomic determinants of disease) in combination with other relevant data to reveal fresh insights that could improve our understanding of relevant diseases, and hence contribute to the development of better-optimized prevention and management plans to tackle them.
△ Less
Submitted 1 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
Authors:
Ge Zheng,
Bin Yang,
Jiajin Tang,
Hong-Yu Zhou,
Sibei Yang
Abstract:
A long-standing goal of AI systems is to perform complex multimodal reasoning like humans. Recently, large language models (LLMs) have made remarkable strides in such multi-step reasoning on the language modality solely by leveraging the chain of thought (CoT) to mimic human thinking. However, the transfer of these advancements to multimodal contexts introduces heightened challenges, including but…
▽ More
A long-standing goal of AI systems is to perform complex multimodal reasoning like humans. Recently, large language models (LLMs) have made remarkable strides in such multi-step reasoning on the language modality solely by leveraging the chain of thought (CoT) to mimic human thinking. However, the transfer of these advancements to multimodal contexts introduces heightened challenges, including but not limited to the impractical need for labor-intensive annotation and the limitations in terms of flexibility, generalizability, and explainability. To evoke CoT reasoning in multimodality, this work first conducts an in-depth analysis of these challenges posed by multimodality and presents two key insights: "keeping critical thinking" and "letting everyone do their jobs" in multimodal CoT reasoning. Furthermore, this study proposes a novel DDCoT prompting that maintains a critical attitude through negative-space prompting and incorporates multimodality into reasoning by first dividing the reasoning responsibility of LLMs into reasoning and recognition and then integrating the visual recognition capability of visual models into the joint reasoning process. The rationales generated by DDCoT not only improve the reasoning abilities of both large and small language models in zero-shot prompting and fine-tuning learning, significantly outperforming state-of-the-art methods but also exhibit impressive generalizability and explainability.
△ Less
Submitted 26 October, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
A global product of fine-scale urban building height based on spaceborne lidar
Authors:
Xiao Ma,
Guang Zheng,
Chi Xu,
L. Monika Moskal,
Peng Gong,
Qinghua Guo,
Huabing Huang,
Xuecao Li,
Yong Pang,
Cheng Wang,
Huan Xie,
Bailang Yu,
Bo Zhao,
Yuyu Zhou
Abstract:
Characterizing urban environments with broad coverages and high precision is more important than ever for achieving the UN's Sustainable Development Goals (SDGs) as half of the world's populations are living in cities. Urban building height as a fundamental 3D urban structural feature has far-reaching applications. However, so far, producing readily available datasets of recent urban building heig…
▽ More
Characterizing urban environments with broad coverages and high precision is more important than ever for achieving the UN's Sustainable Development Goals (SDGs) as half of the world's populations are living in cities. Urban building height as a fundamental 3D urban structural feature has far-reaching applications. However, so far, producing readily available datasets of recent urban building heights with fine spatial resolutions and global coverages remains a challenging task. Here, we provide an up-to-date global product of urban building heights based on a fine grid size of 150 m around 2020 by combining the spaceborne lidar instrument of GEDI and multi-sourced data including remotely sensed images (i.e., Landsat-8, Sentinel-2, and Sentinel-1) and topographic data. Our results revealed that the estimated method of building height samples based on the GEDI data was effective with 0.78 of Pearson's r and 3.67 m of RMSE in comparison to the reference data. The mapping product also demonstrated good performance as indicated by its strong correlation with the reference data (i.e., Pearson's r = 0.71, RMSE = 4.60 m). Compared with the currently existing products, our global urban building height map holds the ability to provide a higher spatial resolution (i.e., 150 m) with a great level of inherent details about the spatial heterogeneity and flexibility of updating using the GEDI samples as inputs. This work will boost future urban studies across many fields including climate, environmental, ecological, and social sciences.
△ Less
Submitted 22 October, 2023;
originally announced October 2023.
-
Cross-Edge Orchestration of Serverless Functions with Probabilistic Caching
Authors:
Chen Chen,
Manuel Herrera,
Ge Zheng,
Liqiao Xia,
Zhengyang Ling,
Jiangtao Wang
Abstract:
Serverless edge computing adopts an event-based paradigm that provides back-end services on an as-used basis, resulting in efficient resource utilization. To improve the end-to-end latency and revenue, service providers need to optimize the number and placement of serverless containers while considering the system cost incurred by the provisioning. The particular reason for this circumstance is th…
▽ More
Serverless edge computing adopts an event-based paradigm that provides back-end services on an as-used basis, resulting in efficient resource utilization. To improve the end-to-end latency and revenue, service providers need to optimize the number and placement of serverless containers while considering the system cost incurred by the provisioning. The particular reason for this circumstance is that frequently creating and destroying containers not only increases the system cost but also degrades the time responsiveness due to the cold-start process. Function caching is a common approach to mitigate the coldstart issue. However, function caching requires extra hardware resources and hence incurs extra system costs. Furthermore, the dynamic and bursty nature of serverless invocations remains an under-explored area. Hence, it is vitally important for service providers to conduct a context-aware request distribution and container caching policy for serverless edge computing. In this paper, we study the request distribution and container caching problem in serverless edge computing. We prove the proposed problem is NP-hard and hence difficult to find a global optimal solution. We jointly consider the distributed and resource constrained nature of edge computing and propose an optimized request distribution algorithm that adapts to the dynamics of serverless invocations with a theoretical performance guarantee. Also, we propose a context-aware probabilistic caching policy that incorporates a number of characteristics of serverless invocations. Via simulation and implementation results, we demonstrate the superiority of the proposed algorithm by outperforming existing caching policies in terms of the overall system cost and cold-start frequency by up to 62.1% and 69.1%, respectively.
△ Less
Submitted 6 October, 2023;
originally announced October 2023.
-
Health diagnosis and recuperation of aged Li-ion batteries with data analytics and equivalent circuit modeling
Authors:
Riko I Made,
Jing Lin,
Jintao Zhang,
Yu Zhang,
Lionel C. H. Moh,
Zhaolin Liu,
Ning Ding,
Sing Yang Chiam,
Edwin Khoo,
Xuesong Yin,
Guangyuan Wesley Zheng
Abstract:
Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries. However, due to ambiguous aging mechanisms and lack of correlations between the recovery effects and operational states, it is challenging to accurately estimate battery health and devise a clear strategy for cell rejuvenation. This paper presents aging and reconditioning experiments…
▽ More
Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries. However, due to ambiguous aging mechanisms and lack of correlations between the recovery effects and operational states, it is challenging to accurately estimate battery health and devise a clear strategy for cell rejuvenation. This paper presents aging and reconditioning experiments of 62 commercial high-energy type lithium iron phosphate (LFP) cells, which supplement existing datasets of high-power LFP cells. The relatively large-scale data allow us to use machine learning models to predict cycle life and identify important indicators of recoverable capacity. Considering cell-to-cell inconsistencies, an average test error of $16.84\% \pm 1.87\%$ (mean absolute percentage error) for cycle life prediction is achieved by gradient boosting regressor given information from the first 80 cycles. In addition, it is found that some of the recoverable lost capacity is attributed to the lateral lithium non-uniformity within the electrodes. An equivalent circuit model is built and experimentally validated to demonstrate how such non-uniformity can be accumulated, and how it can give rise to recoverable capacity loss. SHapley Additive exPlanations (SHAP) analysis also reveals that battery operation history significantly affects the capacity recovery.
△ Less
Submitted 21 September, 2023;
originally announced October 2023.
-
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
Authors:
Chen Dun,
Mirian Hipolito Garcia,
Guoqing Zheng,
Ahmed Hassan Awadallah,
Anastasios Kyrillidis,
Robert Sim
Abstract:
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how…
▽ More
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how one would expand prompt tuning to handle -- concomitantly -- heterogeneous tasks and data distributions is a widely open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, associated with smart gating functionality: the latter -- whose design is one of the contributions of this paper -- can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collection of prompts), based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied -- for efficiency reasons -- as well as instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications from model approximations. As a highlight, MoPs manage to decrease final perplexity from $\sim20\%$ up to $\sim70\%$, as compared to baselines, in the federated scenario, and from $\sim 3\%$ up to $\sim30\%$ in the centralized scenario.
△ Less
Submitted 5 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
OceanGPT: A Large Language Model for Ocean Science Tasks
Authors:
Zhen Bi,
Ningyu Zhang,
Yida Xue,
Yixin Ou,
Daxiong Ji,
Guozhou Zheng,
Huajun Chen
Abstract:
Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, an…
▽ More
Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reasons are the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever large language model in the ocean domain, which is expert in various ocean science tasks. We also propose OceanGPT, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Though comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for oceans science tasks but also gains preliminary embodied intelligence capabilities in ocean technology.
△ Less
Submitted 23 May, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Simultaneous Synchronization and Calibration for Wide-baseline Stereo Event Cameras
Authors:
Wanli Xing,
Shijie Lin,
Guangze Zheng,
Yanjun Du,
Jia Pan
Abstract:
Event-based cameras are increasingly utilized in various applications, owing to their high temporal resolution and low power consumption. However, a fundamental challenge arises when deploying multiple such cameras: they operate on independent time systems, leading to temporal misalignment. This misalignment can significantly degrade performance in downstream applications. Traditional solutions, w…
▽ More
Event-based cameras are increasingly utilized in various applications, owing to their high temporal resolution and low power consumption. However, a fundamental challenge arises when deploying multiple such cameras: they operate on independent time systems, leading to temporal misalignment. This misalignment can significantly degrade performance in downstream applications. Traditional solutions, which often rely on hardware-based synchronization, face limitations in compatibility and are impractical for long-distance setups. To address these challenges, we propose a novel algorithm that exploits the motion of objects in the shared field of view to achieve millisecond-level synchronization among multiple event-based cameras. Our method also concurrently estimates extrinsic parameters. We validate our approach in both simulated and real-world indoor/outdoor scenarios, demonstrating successful synchronization and accurate extrinsic parameters estimation.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
Sparsity-regularized coded ptychography for robust and efficient lensless microscopy on a chip
Authors:
Ninghe Liu,
Qianhao Zhao,
Guoan Zheng
Abstract:
In ptychographic imaging, the trade-off between the number of acquisitions and the resultant imaging quality presents a complex optimization problem. Increasing the number of acquisitions typically yields reconstructions with higher spatial resolution and finer details. Conversely, a reduction in measurement frequency often compromises the quality of the reconstructed images, manifesting as increa…
▽ More
In ptychographic imaging, the trade-off between the number of acquisitions and the resultant imaging quality presents a complex optimization problem. Increasing the number of acquisitions typically yields reconstructions with higher spatial resolution and finer details. Conversely, a reduction in measurement frequency often compromises the quality of the reconstructed images, manifesting as increased noise and coarser details. To address this challenge, we employ sparsity priors to reformulate the ptychographic reconstruction task as a total variation regularized optimization problem. We introduce a new computational framework, termed the ptychographic proximal total-variation (PPTV) solver, designed to integrate into existing ptychography settings without necessitating hardware modifications. Through comprehensive numerical simulations, we validate that PPTV-driven coded ptychography is capable of producing highly accurate reconstructions with a minimal set of eight intensity measurements. Convergence analysis further substantiates the robustness, stability, and computational feasibility of the proposed PPTV algorithm. Experimental results obtained from optical setups unequivocally demonstrate that the PPTV algorithm facilitates high-throughput, high-resolution imaging while significantly reducing the measurement burden. These findings indicate that the PPTV algorithm has the potential to substantially mitigate the resource-intensive requirements traditionally associated with high-quality ptychographic imaging, thereby offering a pathway toward the development of more compact and efficient ptychographic microscopy systems.
△ Less
Submitted 24 September, 2023;
originally announced September 2023.
-
Multi-objective Optimization of Space-Air-Ground Integrated Network Slicing Relying on a Pair of Central and Distributed Learning Algorithms
Authors:
Guorong Zhou,
Liqiang Zhao,
Gan Zheng,
Shenghui Song,
Jiankang Zhang,
Lajos Hanzo
Abstract:
As an attractive enabling technology for next-generation wireless communications, network slicing supports diverse customized services in the global space-air-ground integrated network (SAGIN) with diverse resource constraints. In this paper, we dynamically consider three typical classes of radio access network (RAN) slices, namely high-throughput slices, low-delay slices and wide-coverage slices,…
▽ More
As an attractive enabling technology for next-generation wireless communications, network slicing supports diverse customized services in the global space-air-ground integrated network (SAGIN) with diverse resource constraints. In this paper, we dynamically consider three typical classes of radio access network (RAN) slices, namely high-throughput slices, low-delay slices and wide-coverage slices, under the same underlying physical SAGIN. The throughput, the service delay and the coverage area of these three classes of RAN slices are jointly optimized in a non-scalar form by considering the distinct channel features and service advantages of the terrestrial, aerial and satellite components of SAGINs. A joint central and distributed multi-agent deep deterministic policy gradient (CDMADDPG) algorithm is proposed for solving the above problem to obtain the Pareto optimal solutions. The algorithm first determines the optimal virtual unmanned aerial vehicle (vUAV) positions and the inter-slice sub-channel and power sharing by relying on a centralized unit. Then it optimizes the intra-slice sub-channel and power allocation, and the virtual base station (vBS)/vUAV/virtual low earth orbit (vLEO) satellite deployment in support of three classes of slices by three separate distributed units. Simulation results verify that the proposed method approaches the Pareto-optimal exploitation of multiple RAN slices, and outperforms the benchmarkers.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
Uncertainty-aware Traffic Prediction under Missing Data
Authors:
Hao Mei,
Junxian Li,
Zhiming Liang,
Guanjie Zheng,
Bin Shi,
Hua Wei
Abstract:
Traffic prediction is a crucial topic because of its broad scope of applications in the transportation domain. Recently, various studies have achieved promising results. However, most studies assume the prediction locations have complete or at least partial historical records and cannot be extended to non-historical recorded locations. In real-life scenarios, the deployment of sensors could be lim…
▽ More
Traffic prediction is a crucial topic because of its broad scope of applications in the transportation domain. Recently, various studies have achieved promising results. However, most studies assume the prediction locations have complete or at least partial historical records and cannot be extended to non-historical recorded locations. In real-life scenarios, the deployment of sensors could be limited due to budget limitations and installation availability, which makes most current models not applicable. Though few pieces of literature tried to impute traffic states at the missing locations, these methods need the data simultaneously observed at the locations with sensors, making them not applicable to prediction tasks. Another drawback is the lack of measurement of uncertainty in prediction, making prior works unsuitable for risk-sensitive tasks or involving decision-making. To fill the gap, inspired by the previous inductive graph neural network, this work proposed an uncertainty-aware framework with the ability to 1) extend prediction to missing locations with no historical records and significantly extend spatial coverage of prediction locations while reducing deployment of sensors and 2) generate probabilistic prediction with uncertainty quantification to help the management of risk and decision making in the down-stream tasks. Through extensive experiments on real-life datasets, the result shows our method achieved promising results on prediction tasks, and the uncertainty quantification gives consistent results which highly correlated with the locations with and without historical data. We also show that our model could help support sensor deployment tasks in the transportation field to achieve higher accuracy with a limited sensor deployment budget.
△ Less
Submitted 29 November, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Temporal Collection and Distribution for Referring Video Object Segmentation
Authors:
Jiajin Tang,
Ge Zheng,
Sibei Yang
Abstract:
Referring video object segmentation aims to segment a referent throughout a video sequence according to a natural language expression. It requires aligning the natural language expression with the objects' motions and their dynamic associations at the global video level but segmenting objects at the frame level. To achieve this goal, we propose to simultaneously maintain a global referent token an…
▽ More
Referring video object segmentation aims to segment a referent throughout a video sequence according to a natural language expression. It requires aligning the natural language expression with the objects' motions and their dynamic associations at the global video level but segmenting objects at the frame level. To achieve this goal, we propose to simultaneously maintain a global referent token and a sequence of object queries, where the former is responsible for capturing video-level referent according to the language expression, while the latter serves to better locate and segment objects with each frame. Furthermore, to explicitly capture object motions and spatial-temporal cross-modal reasoning over objects, we propose a novel temporal collection-distribution mechanism for interacting between the global referent token and object queries. Specifically, the temporal collection mechanism collects global information for the referent token from object queries to the temporal motions to the language expression. In turn, the temporal distribution first distributes the referent token to the referent sequence across all frames and then performs efficient cross-frame reasoning between the referent sequence and object queries in every frame. Experimental results show that our method outperforms state-of-the-art methods on all benchmarks consistently and significantly.
△ Less
Submitted 7 September, 2023;
originally announced September 2023.
-
CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
Authors:
Jiajin Tang,
Ge Zheng,
Jingyi Yu,
Sibei Yang
Abstract:
Task driven object detection aims to detect object instances suitable for affording a task in an image. Its challenge lies in object categories available for the task being too diverse to be limited to a closed set of object vocabulary for traditional object detection. Simply mapping categories and visual features of common objects to the task cannot address the challenge. In this paper, we propos…
▽ More
Task driven object detection aims to detect object instances suitable for affording a task in an image. Its challenge lies in object categories available for the task being too diverse to be limited to a closed set of object vocabulary for traditional object detection. Simply mapping categories and visual features of common objects to the task cannot address the challenge. In this paper, we propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task. Moreover, we propose a novel multi-level chain-of-thought prompting (MLCoT) to extract the affordance knowledge from large language models, which contains multi-level reasoning steps from task to object examples to essential visual attributes with rationales. Furthermore, to fully exploit knowledge to benefit object recognition and localization, we propose a knowledge-conditional detection framework, namely CoTDet. It conditions the detector from the knowledge to generate object queries and regress boxes. Experimental results demonstrate that our CoTDet outperforms state-of-the-art methods consistently and significantly (+15.6 box AP and +14.8 mask AP) and can generate rationales for why objects are detected to afford the task.
△ Less
Submitted 3 September, 2023;
originally announced September 2023.
-
Contrastive Grouping with Transformer for Referring Image Segmentation
Authors:
Jiajin Tang,
Ge Zheng,
Cheng Shi,
Sibei Yang
Abstract:
Referring image segmentation aims to segment the target referent in an image conditioning on a natural language expression. Existing one-stage methods employ per-pixel classification frameworks, which attempt straightforwardly to align vision and language at the pixel level, thus failing to capture critical object-level information. In this paper, we propose a mask classification framework, Contra…
▽ More
Referring image segmentation aims to segment the target referent in an image conditioning on a natural language expression. Existing one-stage methods employ per-pixel classification frameworks, which attempt straightforwardly to align vision and language at the pixel level, thus failing to capture critical object-level information. In this paper, we propose a mask classification framework, Contrastive Grouping with Transformer network (CGFormer), which explicitly captures object-level information via token-based querying and grouping strategy. Specifically, CGFormer first introduces learnable query tokens to represent objects and then alternately queries linguistic features and groups visual features into the query tokens for object-aware cross-modal reasoning. In addition, CGFormer achieves cross-level interaction by jointly updating the query tokens and decoding masks in every two consecutive layers. Finally, CGFormer cooperates contrastive learning to the grouping strategy to identify the token and its mask corresponding to the referent. Experimental results demonstrate that CGFormer outperforms state-of-the-art methods in both segmentation and generalization settings consistently and significantly.
△ Less
Submitted 2 September, 2023;
originally announced September 2023.
-
When Do Program-of-Thoughts Work for Reasoning?
Authors:
Zhen Bi,
Ningyu Zhang,
Yinuo Jiang,
Shumin Deng,
Guozhou Zheng,
Huajun Chen
Abstract:
In the realm of embodied artificial intelligence, the reasoning capabilities of Large Language Models (LLMs) play a pivotal role. Although there are effective methods like program-of-thought prompting for LLMs which uses programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap,…
▽ More
In the realm of embodied artificial intelligence, the reasoning capabilities of Large Language Models (LLMs) play a pivotal role. Although there are effective methods like program-of-thought prompting for LLMs which uses programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose complexity-impacted reasoning score (CIRS), which combines structural and logical attributes, to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity by considering the difficulty and the cyclomatic complexity. Through an empirical analysis, we find not all code data of complexity can be learned or understood by LLMs. Optimal level of complexity is critical to the improvement of reasoning abilities by program-aided prompting. Then we design an auto-synthesizing and stratifying algorithm, and apply it to instruction generation for mathematical reasoning and code data filtering for code generation tasks. Extensive results demonstrates the effectiveness of our proposed approach. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
△ Less
Submitted 18 December, 2023; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Cross-city Few-Shot Traffic Forecasting via Traffic Pattern Bank
Authors:
Zhanyu Liu,
Guanjie Zheng,
Yanwei Yu
Abstract:
Traffic forecasting is a critical service in Intelligent Transportation Systems (ITS). Utilizing deep models to tackle this task relies heavily on data from traffic sensors or vehicle devices, while some cities might lack device support and thus have few available data. So, it is necessary to learn from data-rich cities and transfer the knowledge to data-scarce cities in order to improve the perfo…
▽ More
Traffic forecasting is a critical service in Intelligent Transportation Systems (ITS). Utilizing deep models to tackle this task relies heavily on data from traffic sensors or vehicle devices, while some cities might lack device support and thus have few available data. So, it is necessary to learn from data-rich cities and transfer the knowledge to data-scarce cities in order to improve the performance of traffic forecasting. To address this problem, we propose a cross-city few-shot traffic forecasting framework via Traffic Pattern Bank (TPB) due to that the traffic patterns are similar across cities. TPB utilizes a pre-trained traffic patch encoder to project raw traffic data from data-rich cities into high-dimensional space, from which a traffic pattern bank is generated through clustering. Then, the traffic data of the data-scarce city could query the traffic pattern bank and explicit relations between them are constructed. The metaknowledge is aggregated based on these relations and an adjacency matrix is constructed to guide a downstream spatial-temporal model in forecasting future traffic. The frequently used meta-training framework Reptile is adapted to find a better initial parameter for the learnable modules. Experiments on real-world traffic datasets show that TPB outperforms existing methods and demonstrates the effectiveness of our approach in cross-city few-shot traffic forecasting.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Authors:
Peng Wang,
Ningyu Zhang,
Bozhong Tian,
Zekun Xi,
Yunzhi Yao,
Ziwen Xu,
Mengru Wang,
Shengyu Mao,
Xiaohan Wang,
Siyuan Cheng,
Kangwei Liu,
Yuansheng Ni,
Guozhou Zheng,
Huajun Chen
Abstract:
Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data. To this end, many knowledge editing approaches for LLMs have emerged -- aiming to subtly inject/edit updated knowledge or adjust undesired behavior while minimizing the impact on unrelated inputs. Neve…
▽ More
Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data. To this end, many knowledge editing approaches for LLMs have emerged -- aiming to subtly inject/edit updated knowledge or adjust undesired behavior while minimizing the impact on unrelated inputs. Nevertheless, due to significant differences among various knowledge editing methods and the variations in task setups, there is no standard implementation framework available for the community, which hinders practitioners from applying knowledge editing to applications. To address these issues, we propose EasyEdit, an easy-to-use knowledge editing framework for LLMs. It supports various cutting-edge knowledge editing approaches and can be readily applied to many well-known LLMs such as T5, GPT-J, LlaMA, etc. Empirically, we report the knowledge editing results on LlaMA-2 with EasyEdit, demonstrating that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization. We have released the source code on GitHub, along with Google Colab tutorials and comprehensive documentation for beginners to get started. Besides, we present an online system for real-time knowledge editing, and a demo video.
△ Less
Submitted 19 March, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance
Authors:
Menglin Xia,
Xuchao Zhang,
Camille Couturier,
Guoqing Zheng,
Saravan Rajmohan,
Victor Ruhle
Abstract:
Retrieval augmentation enhances performance of traditional language models by incorporating additional context. However, the computational demands for retrieval augmented large language models (LLMs) pose a challenge when applying them to real-time tasks, such as composition assistance. To address this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework, a novel…
▽ More
Retrieval augmentation enhances performance of traditional language models by incorporating additional context. However, the computational demands for retrieval augmented large language models (LLMs) pose a challenge when applying them to real-time tasks, such as composition assistance. To address this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework, a novel approach that efficiently combines a cloud-based LLM with a smaller, client-side, language model through retrieval augmented memory. This integration enables the client model to generate effective responses, benefiting from the LLM's capabilities and contextual information. Additionally, through an asynchronous memory update mechanism, the client model can deliver real-time completions swiftly to user inputs without the need to wait for responses from the cloud. Our experiments on five benchmark datasets demonstrate that HybridRAG significantly improves utility over client-only models while maintaining low latency.
△ Less
Submitted 5 February, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Control-Oriented Deep Space Communications For Unmanned Space Exploration
Authors:
Xinran Fang,
Wei Feng,
Yunfei Chen,
Ning Ge,
Gan Zheng
Abstract:
In unmanned space exploration, the cooperation among space robots requires advanced communication techniques. In this paper, we propose a communication optimization scheme for a specific cooperation system named the "mother-daughter system". In this setup, the mother spacecraft orbits the planet, while daughter probes are distributed across the planetary surface. During each control cycle, the mot…
▽ More
In unmanned space exploration, the cooperation among space robots requires advanced communication techniques. In this paper, we propose a communication optimization scheme for a specific cooperation system named the "mother-daughter system". In this setup, the mother spacecraft orbits the planet, while daughter probes are distributed across the planetary surface. During each control cycle, the mother spacecraft senses the environment, computes control commands and distributes them to daughter probes for actions. They synergistically form sensing-communication-computing-control ($\mathbf{SC^3}$) loops. Given the indivisibility of the $\mathbf{SC^3}$ loop, we optimize the mother-daughter downlink for closed-loop control. The optimization objective is the linear quadratic regulator (LQR) cost, and the optimization parameters are the block length and transmit power. To solve the nonlinear mixed-integer problem, we first identify the optimal block length and then transform the power allocation problem into a tractable convex problem. We further derive the approximate closed-form solutions for the proposed scheme and two communication-oriented schemes: the max-sum rate scheme and the max-min rate scheme. On this basis, we analyze their power allocation principles. In particular, for time-insensitive control tasks, we find that the proposed scheme demonstrates equivalence to the max-min rate scheme. These findings are verified through simulations.
△ Less
Submitted 11 December, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Relay-Enabled Backscatter Communications: Linear Mapping and Resource Allocation
Authors:
Rui Xu,
Liqin Shi,
Yinghui Ye,
Haijian Sun,
Gan Zheng
Abstract:
Relay-enabled backscatter communication (BC) is an intriguing paradigm to alleviate energy shortage and improve throughput of Internet-of-Things (IoT) devices. Most of the existing works focus on the resource allocation that considered the unequal and continuous time allocation for both source-relay and relay-destination links. However, the continuous time allocation may be infeasible since in pra…
▽ More
Relay-enabled backscatter communication (BC) is an intriguing paradigm to alleviate energy shortage and improve throughput of Internet-of-Things (IoT) devices. Most of the existing works focus on the resource allocation that considered the unequal and continuous time allocation for both source-relay and relay-destination links. However, the continuous time allocation may be infeasible since in practice, the time allocation shall be carried out in integral multiple of the subframe duration unit. In this article, we study a discrete time scheme from the perspective of frame structure, where one transmission block is divided into two phases and the linear mapping is employed as a re-encoding method to determine the number of subframes for both phases and the power allocation for each subframe in a relay-enabled BC system. Based on this, we derive an accurate system-throughput expression and formulate a mixed-integral non-convex optimization problem to maximize the system throughput by jointly optimizing the power reflection coefficient (PRC) of the IoT node, the power allocation of the hybrid access point (HAP) and the linear mapping matrix, and solve it via a three-step approach. Accordingly, we propose a low complexity iterative algorithm to obtain the throughput maximization-based resource allocation solution. Numerical results analyze the performance of our proposed algorithm, verify the superiority of our proposed scheme, and evaluate the impacts of network parameters on the system throughput.
△ Less
Submitted 26 July, 2023;
originally announced July 2023.