-
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Authors:
Jinliang Lu,
Ziliang Pang,
Min Xiao,
Yaochen Zhu,
Rui Xia,
Jiajun Zhang
Abstract:
The remarkable success of Large Language Models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained on different corpora exhibit varying strengths and weaknesses, leading to challenges in maximizing their overall efficiency and versatility. To address these challenges, recent studies have explored collaborative strategies for LLMs. This paper provides a comprehensive overview of this emerging research area, highlighting the motivation behind such collaborations. Specifically, we categorize collaborative strategies into three primary approaches: Merging, Ensemble, and Cooperation. Merging involves integrating multiple LLMs in the parameter space. Ensemble combines the outputs of various LLMs. Cooperation leverages different LLMs to allow full play to their diverse capabilities for specific tasks. We provide in-depth introductions to these methods from different perspectives and discuss their potential applications. Additionally, we outline future research directions, hoping this work will catalyze further studies on LLM collaborations and pave the way for advanced NLP applications.
Submitted 8 July, 2024;
originally announced July 2024.
-
One-step synthesis of Cu-doped Pb$_{10}$(PO$_{4}$)$_{6}$Cl$_{2}$ apatite: A wide-gap semiconductor
Authors:
W. Z. Yang,
Z. H. Pang,
Z. Ren
Abstract:
The recent claim of potential room-temperature superconductivity in Pb$_{10-x}$Cu$_{x}$(PO$_{4}$)$_{6}$O has attracted widespread attention. However, the signature of superconductivity was later attributed to the Cu$_{2}$S impurity formed during the multiple-step synthesis procedure. Here we report a simple one-step approach to synthesize the single-phase chloride analogue, Cu-doped Pb$_{10}$(PO$_{4}$)$_{6}$Cl$_{2}$, using PbO, PbCl$_{2}$, CuCl$_{2}$, and NH$_{4}$H$_{2}$PO$_{4}$ as starting materials. Irrespective of the initial stoichiometry, the Cu doping always leads to a lattice expansion in Pb$_{10}$(PO$_{4}$)$_{6}$Cl$_{2}$. This indicates that Cu prefers to reside in the hexagonal channels rather than substitute at the Pb site, and the chemical formula is expressed as Pb$_{10}$(PO$_{4}$)$_{6}$Cu$_{x}$Cl$_{2}$. All the Pb$_{10}$(PO$_{4}$)$_{6}$Cu$_{x}$Cl$_{2}$ (0 $\leq$ $x$ $\leq$ 1.0) samples are found to be semiconductors with wide band gaps of 4.46-4.59 eV, and the Cu-doped ones ($x$ = 0.5 and 1.0) exhibit paramagnetic behavior without any phase transition between 400 and 1.8 K. Our study calls for a reinvestigation of the Cu location in Pb$_{10-x}$Cu$_{x}$(PO$_{4}$)$_{6}$O, and supports the absence of superconductivity in this oxyapatite.
Submitted 24 June, 2024;
originally announced June 2024.
-
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
Authors:
Junwei Luo,
Zhen Pang,
Yongjun Zhang,
Tingzhu Wang,
Linlin Wang,
Bo Dang,
Jiangwei Lao,
Jian Wang,
Jingdong Chen,
Yihua Tan,
Yansheng Li
Abstract:
Remote Sensing Large Multi-Modal Models (RSLMMs) are developing rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-scale instruction tuning dataset FIT-RS, containing 1,800,851 instruction samples. FIT-RS covers common interpretation tasks and innovatively introduces several complex comprehension tasks of escalating difficulty, ranging from relation reasoning to image-level scene graph generation. Based on FIT-RS, we build the FIT-RSFG benchmark. Furthermore, we establish a new benchmark to evaluate the fine-grained relation comprehension capabilities of LMMs, named FIT-RSRC. Based on combined instruction data, we propose SkySenseGPT, which achieves outstanding performance on both public datasets and FIT-RSFG, surpassing existing RSLMMs. We hope the FIT-RS dataset can enhance the relation comprehension capability of RSLMMs and provide a large-scale fine-grained data source for the remote sensing community. The dataset will be available at https://github.com/Luo-Z13/SkySenseGPT
Submitted 8 July, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
RMem: Restricted Memory Banks Improve Video Object Segmentation
Authors:
Junbao Zhou,
Ziqi Pang,
Yu-Xiong Wang
Abstract:
With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks. This diverges from the prevalent practice of expanding memory banks to accommodate extensive historical information. Our specially designed "memory deciphering" study offers a pivotal insight underpinning such a strategy: expanding memory banks, while seemingly beneficial, actually increases the difficulty for VOS modules to decode relevant features due to the confusion from redundant information. By restricting memory banks to a limited number of essential frames, we achieve a notable improvement in VOS accuracy. This process balances the importance and freshness of frames to maintain an informative memory bank within a bounded capacity. Additionally, restricted memory banks reduce the training-inference discrepancy in memory lengths compared with continuous expansion. This fosters new opportunities in temporal reasoning and enables us to introduce the previously overlooked "temporal positional embedding." Finally, our insights are embodied in "RMem" ("R" for restricted), a simple yet effective VOS modification that excels at challenging VOS scenarios and establishes a new state of the art for object state changes (on the VOST dataset) and long videos (on the Long Videos dataset). Our code and demo are available at https://restricted-memory.github.io/.
Submitted 12 June, 2024;
originally announced June 2024.
-
Optimal Real-time Bidding Strategy For EV Aggregators in Wholesale Electricity Markets
Authors:
Shihan Huang,
Dongkun Han,
John Zhen Fu Pang,
Yue Chen
Abstract:
With the rapid growth of electric vehicles (EVs), EV aggregators have been playing an increasingly vital role in power systems by not merely providing charging management but also participating in wholesale electricity markets. This work studies the optimal real-time bidding strategy for an EV aggregator. Since the charging process of EVs is time-coupled, it is necessary for EV aggregators to consider future operational conditions (e.g., future EV arrivals) when deciding the current bidding strategy. However, accurately forecasting future operational conditions is challenging under the inherent uncertainties. Hence, a real-time bidding strategy based solely on up-to-date information is needed, and developing one is the main goal of this work. We start by developing an online optimal EV charging management algorithm for the EV aggregator via Lyapunov optimization. Based on this, an optimal real-time bidding strategy (bidding cost curve and bounds) for the aggregator is derived. Then, an efficient yet practical algorithm is proposed to obtain the bidding strategy. We show that with the proposed bidding strategy, the aggregator's profit is nearly offline optimal. Moreover, the wholesale electricity market clearing result aligns with the individual aggregator's optimal charging strategy given the prices. Case studies against several benchmarks are conducted to evaluate the performance of the proposed method.
Submitted 15 April, 2024;
originally announced April 2024.
-
Total Gluon Helicity from Lattice without Effective Theory Matching
Authors:
Zhuoyi Pang,
Fei Yao,
Jian-Hui Zhang
Abstract:
We propose two approaches for extracting the total gluon helicity contribution to proton spin from lattice QCD, one from local operator matrix elements in a fixed gauge accessible on lattice with feasible renormalization, and the other from gauge-invariant nonlocal gluon correlators. Neither of these approaches requires a matching procedure when converted to the MS scheme. Our proposal resolves a long-standing inconsistency in the literature regarding lattice calculations of the total gluon helicity, and has the potential to greatly facilitate these calculations.
Submitted 29 June, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
TRG-Net: An Interpretable and Controllable Rain Generator
Authors:
Zhiqiang Pang,
Hong Wang,
Qi Xie,
Deyu Meng,
Zongben Xu
Abstract:
Exploring and modeling the rain generation mechanism is critical for augmenting paired data to ease the training of rainy image processing models. For this task, this study proposes a novel deep learning based rain generator, which fully takes the physical generation mechanism underlying rain into consideration and explicitly encodes the learning of the fundamental rain factors (i.e., shape, orientation, length, width and sparsity) into the deep network. Its significance lies in that the generator not only elaborately designs the essential elements of rain to simulate expected rain, like conventional artificial strategies, but also finely adapts to complicated and diverse practical rainy images, like deep learning methods. By rationally adopting the filter parameterization technique, we achieve, for the first time, a deep network that is finely controllable with respect to rain factors and able to learn the distribution of these factors purely from data. Our unpaired generation experiments demonstrate that the rain generated by the proposed rain generator is not only of higher quality, but also more effective for deraining and downstream tasks compared to current state-of-the-art rain generation methods. In addition, the paired data augmentation experiments, including both in-distribution and out-of-distribution (OOD) settings, further validate the diversity of samples generated by our model for in-distribution deraining and OOD generalization tasks.
Submitted 29 April, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
A Tampering Risk of Fiber-Based Frequency Synchronization Networks and Its Countermeasures
Authors:
Hongfei Dai,
Yufeng Chen,
Wenlin Li,
Fangmin Wang,
Guan Wang,
Zhongwang Pang,
Bo Wang
Abstract:
Fiber optic networks are used worldwide and have been regarded as excellent media for transmitting time-frequency (TF) signals. In the past decades, fiber-based TF synchronization techniques have been extensively studied. Instruments based on these techniques have been successfully applied. With the increasing application of TF synchronization instruments, their security has become an important issue. Unfortunately, the security risks of fiber-based frequency synchronization (FbFS) instruments have been overlooked. This paper proposes a frequency tampering method called "frequency lens". On a 200 km fiber link, we demonstrate a frequency tampering scenario using a frequency lens-enabled frequency tampering module (FTM). On the user side, the frequency value of the recovered 100 MHz signal can be stealthily altered within a range of 100 MHz-100 Hz to 100 MHz+100 Hz, while the frequency dissemination stability of the system remains normal. Related to this tampering risk, potential hazards in three different application scenarios, which rely on precise frequency references, are analyzed. Two countermeasures are also proposed to solve this tampering risk.
Submitted 28 February, 2024;
originally announced February 2024.
-
MGE: A Training-Free and Efficient Model Generation and Enhancement Scheme
Authors:
Xuan Wang,
Zeshan Pang,
Yuliang Lu,
Xuehu Yan
Abstract:
To provide a foundation for the research of deep learning models, the construction of a model pool is an essential step. This paper proposes a Training-Free and Efficient Model Generation and Enhancement Scheme (MGE). This scheme primarily considers two aspects during the model generation process: the distribution of model parameters and model performance. Experimental results show that generated models are comparable to models obtained through normal training, and even superior in some cases. Moreover, the time consumed in generating models accounts for only 1% of the time required for normal model training. More importantly, with the enhancement of Evolution-MGE, generated models exhibit competitive generalization ability in few-shot tasks. In addition, the behavioral dissimilarity of generated models has potential for adversarial defense.
Submitted 27 February, 2024;
originally announced February 2024.
-
Acknowledgment of Emotional States: Generating Validating Responses for Empathetic Dialogue
Authors:
Zi Haur Pang,
Yahui Fu,
Divesh Lala,
Keiko Ochi,
Koji Inoue,
Tatsuya Kawahara
Abstract:
In the realm of human-AI dialogue, the facilitation of empathetic responses is important. Validation is one of the key communication techniques in psychology, which entails recognizing, understanding, and acknowledging others' emotional states, thoughts, and actions. This study introduces the first framework designed to engender empathetic dialogue with validating responses. Our approach incorporates a tripartite module system: 1) validation timing detection, 2) users' emotional state identification, and 3) validating response generation. Utilizing the Japanese EmpatheticDialogues dataset - a text-based dialogue dataset consisting of 8 emotional categories from Plutchik's wheel of emotions - the Task Adaptive Pre-Training (TAPT) BERT-based model outperforms both the random baseline and ChatGPT, in terms of F1-score, in all modules. Our model's efficacy is further confirmed in its application to the TUT Emotional Storytelling Corpus (TESC), a speech-based dialogue dataset, where it surpasses both the random baseline and ChatGPT. This consistent performance across both text-based and speech-based dialogues underscores the effectiveness of our framework in fostering empathetic human-AI communication.
Submitted 20 February, 2024;
originally announced February 2024.
-
On Chernoff Lower-Bound of Outage Threshold for Non-Central $χ^2$-Distributed Beamforming Gain in URLLC Systems
Authors:
Jinfei Wang,
Yi Ma,
Rahim Tafazolli,
Zhibo Pang
Abstract:
The cumulative distribution function (CDF) of a non-central $χ^2$-distributed random variable (RV) is often used when measuring the outage probability of communication systems. For ultra-reliable low-latency communication (URLLC), it is important but mathematically challenging to determine the outage threshold for an extremely small outage target. This motivates us to investigate lower bounds of the outage threshold, and it is found that the one derived from the Chernoff inequality (named Cher-LB) is the most effective lower bound. This finding is associated with three rigorously established properties of the Cher-LB with respect to the mean, variance, reliability requirement, and degrees of freedom of the non-central $χ^2$-distributed RV. The Cher-LB is then employed to predict the beamforming gain in URLLC for both conventional multi-antenna systems (i.e., MIMO) under a first-order Markov time-varying channel and reconfigurable intelligent surface (RIS) systems. It is shown that, with the proposed Cher-LB, the pessimistic prediction of the beamforming gain is made sufficiently accurate for guaranteed reliability as well as transmit-energy efficiency.
Submitted 27 December, 2023;
originally announced December 2023.
-
Few-shot Learning using Data Augmentation and Time-Frequency Transformation for Time Series Classification
Authors:
Hao Zhang,
Zhendong Pang,
Jiangpeng Wang,
Teng Li
Abstract:
Deep neural networks (DNNs) that tackle the time series classification (TSC) task have provided a promising framework in signal processing. In real-world applications, as data-driven models, DNNs suffer from insufficient data. Few-shot learning has been studied to deal with this limitation. In this paper, we propose a novel few-shot learning framework through data augmentation, which involves transformation through the time-frequency domain and the generation of synthetic images through random erasing. Additionally, we develop a sequence-spectrogram neural network (SSNN). This neural network model is composed of two sub-networks: one utilizes 1D residual blocks to extract features from the input sequence, while the other employs 2D residual blocks to extract features from the spectrogram representation. In the experiments, comparison studies of different existing DNN models with and without data augmentation are conducted on an amyotrophic lateral sclerosis (ALS) dataset and a wind turbine fault (WTF) dataset. The experimental results show that our proposed method achieves a 93.75% F1 score and 93.33% accuracy on the ALS dataset, and a 95.48% F1 score and 95.59% accuracy on the WTF dataset. Our methodology demonstrates its applicability to addressing few-shot problems in time series classification.
Submitted 6 November, 2023;
originally announced November 2023.
-
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Authors:
Ziqi Pang,
Ziyang Xie,
Yunze Man,
Yu-Xiong Wang
Abstract:
This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet previously overlooked strategy -- employing a frozen transformer block from pre-trained LLMs as a constituent encoder layer to directly process visual tokens. Our work pushes the boundaries of leveraging LLMs for computer vision tasks, significantly departing from conventional practices that typically necessitate a multi-modal vision-language setup with associated language prompts, inputs, or outputs. We demonstrate that our approach consistently enhances performance across a diverse range of tasks, encompassing pure 2D and 3D visual recognition tasks (e.g., image and point cloud classification), temporal modeling tasks (e.g., action recognition), non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g., 2D/3D visual question answering and image-text retrieval). Such improvements are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and OPT) and different LLM transformer blocks. We additionally propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding -- the pre-trained LLM transformer blocks discern informative visual tokens and further amplify their effect. This hypothesis is empirically supported by the observation that the feature activation, after training with LLM transformer blocks, exhibits a stronger focus on relevant regions. We hope that our work inspires new perspectives on utilizing LLMs and deepening our understanding of their underlying mechanisms. Code is available at https://github.com/ziqipang/LM4VisualEncoding.
Submitted 6 May, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
IRS Assisted Federated Learning: A Broadband Over-the-Air Aggregation Approach
Authors:
Deyou Zhang,
Ming Xiao,
Zhibo Pang,
Lihui Wang,
H. Vincent Poor
Abstract:
We consider a broadband over-the-air computation empowered model aggregation approach for wireless federated learning (FL) systems and propose to leverage an intelligent reflecting surface (IRS) to combat wireless fading and noise. We first investigate the conventional node-selection based framework, where a few edge nodes are dropped in model aggregation to control the aggregation error. We analyze the performance of this node-selection based framework and derive an upper bound on its performance loss, which is shown to be related to the selected edge nodes. Then, we seek to minimize the mean-squared error (MSE) between the desired global gradient parameters and the actually received ones by optimizing the selected edge nodes, their transmit equalization coefficients, the IRS phase shifts, and the receive factors of the cloud server. By resorting to the matrix lifting technique and difference-of-convex programming, we successfully transform the formulated optimization problem into a convex one and solve it using off-the-shelf solvers. To improve learning performance, we further propose a weight-selection based FL framework. In such a framework, we assign each edge node a proper weight coefficient in model aggregation instead of discarding any of them to reduce the aggregation error, i.e., amplitude alignment of the received local gradient parameters from different edge nodes is not required. We also analyze the performance of this weight-selection based framework and derive an upper bound on its performance loss, followed by minimizing the MSE via optimizing the weight coefficients of the edge nodes, their transmit equalization coefficients, the IRS phase shifts, and the receive factors of the cloud server. Furthermore, we use the MNIST dataset for simulations to evaluate the performance of both node-selection and weight-selection based FL frameworks.
Submitted 11 October, 2023;
originally announced October 2023.
-
Streaming Motion Forecasting for Autonomous Driving
Authors:
Ziqi Pang,
Deva Ramanan,
Mengtian Li,
Yu-Xiong Wang
Abstract:
Trajectory forecasting is a widely-studied problem for autonomous navigation. However, existing benchmarks evaluate forecasting based on independent snapshots of trajectories, which are not representative of real-world applications that operate on a continuous stream of data. To bridge this gap, we introduce a benchmark that continuously queries future trajectories on streaming data and we refer to it as "streaming forecasting." Our benchmark inherently captures the disappearance and re-appearance of agents, presenting the emergent challenge of forecasting for occluded agents, which is a safety-critical problem yet overlooked by snapshot-based benchmarks. Moreover, forecasting in the context of continuous timestamps naturally asks for temporal coherence between predictions from adjacent timestamps. Based on this benchmark, we further provide solutions and analysis for streaming forecasting. We propose a plug-and-play meta-algorithm called "Predictive Streamer" that can adapt any snapshot-based forecaster into a streaming forecaster. Our algorithm estimates the states of occluded agents by propagating their positions with multi-modal trajectories, and leverages differentiable filters to ensure temporal consistency. Both occlusion reasoning and temporal coherence strategies significantly improve forecasting quality, resulting in 25% smaller endpoint errors for occluded agents and 10-20% smaller fluctuations of trajectories. Our work is intended to generate interest within the community by highlighting the importance of addressing motion forecasting in its intrinsic streaming setting. Code is available at https://github.com/ziqipang/StreamingForecasting.
Submitted 2 October, 2023;
originally announced October 2023.
-
OriWheelBot: An origami-wheeled robot
Authors:
Jie Liu,
Zufeng Pang,
Zhiyong Li,
Guilin Wen,
Zhoucheng Su,
Junfeng He,
Kaiyue Liu,
Dezheng Jiang,
Zenan Li,
Shouyan Chen,
Yang Tian,
Yi Min Xie,
Zhenpei Wang,
Zhuangjian Liu
Abstract:
Origami-inspired robots with multiple advantages, such as being lightweight, requiring less assembly, and exhibiting exceptional deformability, have received substantial and sustained attention. However, existing origami-inspired robots usually have limited functionality, and developing feature-rich robots is very challenging. Here, we report an origami-wheeled robot (OriWheelBot) with variable width and outstanding sand walking versatility. The OriWheelBot's ability to adjust wheel width over obstacles is achieved by origami wheels made of Miura origami. An improved version, called iOriWheelBot, is also developed to automatically judge the width of the obstacles. Three actions, namely direct pass, variable width pass, and direct return, will be carried out depending on the width of the channel between the obstacles. We have identified two motion mechanisms, i.e., sand-digging and sand-pushing, with the latter being more conducive to walking on the sand. We have systematically examined numerous sand walking characteristics, including carrying loads, climbing a slope, walking on a slope, and navigating sand pits, small rocks, and sand traps. The OriWheelBot can change its width by 40%, has a load-carrying ratio of 66.7% on flat sand, and can climb a 17-degree sand incline. The OriWheelBot can be useful for planetary subsurface exploration and disaster area rescue.
Submitted 29 September, 2023;
originally announced October 2023.
-
An underground fiber cable discrimination method based on laser interferometer
Authors:
Dongqi Song,
Guan Wang,
Zhongwang Pang,
Bo Wang
Abstract:
The maintenance and repair of optical fiber networks often require the discrimination of an underground cable among a group of buried ones. Presently, the methods commonly used are either inefficient or harmful to the fiber cable. In this paper, we evaluate the effectiveness of fiber optic vibration sensing methods for underground fiber cable discrimination. We find that the typical vibration sensing method, distributed acoustic sensing (DAS), is not suitable for fiber cable discrimination. Especially in long-distance scenarios, due to the "frequency grafting" effect, the DAS method gives a false response in the spectrum of knock events during the cable discrimination process. In response, we propose an underground fiber cable discrimination method based on a laser interferometer, and demonstrate it on the 40-km urban fiber cable connecting Tsinghua University and Yongding Road. It gives an obvious and real response to the knock event, and can be used in practical applications.
Submitted 25 August, 2023;
originally announced August 2023.
-
VATP360: Viewport Adaptive 360-Degree Video Streaming based on Tile Priority
Authors:
Zhiyu Pang
Abstract:
360-degree video is becoming increasingly popular among users. Under current network bandwidth, serving high-resolution 360-degree video to users is quite difficult. Most existing work has been devoted to the prediction of user viewports or to tile-based adaptive algorithms. However, it is difficult to predict user viewports more accurately using only information such as the user's historical viewports or video saliency maps. In this paper, we propose a viewport adaptive 360-degree video streaming method based on tile priority (VATP360), which tries to balance performance and overhead. The proposed VATP360 consists of three main modules: viewport prediction, tile priority classification, and bitrate allocation. In the viewport prediction module, the object motion trajectory and the predicted user's region-of-interest (ROI) are used to achieve accurate prediction of the user's future viewport. Then, the predicted viewport, along with the object motion trajectory, is fed into the proposed tile priority classification algorithm to assign different priorities to tiles, which reduces the computational complexity of the bitrate allocation module. Finally, in the bitrate allocation stage, we adaptively assign bitrates to tiles of different priority by reinforcement learning. Experimental results on publicly available datasets demonstrate the effectiveness of the proposed method.
Submitted 27 August, 2023; v1 submitted 29 July, 2023;
originally announced July 2023.
-
virtCCA: Virtualized Arm Confidential Compute Architecture with TrustZone
Authors:
Xiangyi Xu,
Wenhao Wang,
Yongzheng Wu,
Chenyu Wang,
Huifeng Zhu,
Haocheng Ma,
Zhennan Min,
Zixuan Pang,
Rui Hou,
Yier Jin
Abstract:
ARM recently introduced the Confidential Compute Architecture (CCA) as part of the upcoming ARMv9-A architecture. CCA enables the support of confidential virtual machines (cVMs) within a separate world called the Realm world, providing protection from the untrusted normal world. While CCA offers a promising future for confidential computing, the widespread availability of CCA hardware is not expected in the near future, according to ARM's roadmap. To address this gap, we present virtCCA, an architecture that facilitates virtualized CCA using TrustZone, a mature hardware feature available on existing ARM platforms. Notably, virtCCA can be implemented on platforms equipped with the Secure EL2 (S-EL2) extension available from ARMv8.4 onwards, as well as on earlier platforms that lack S-EL2 support. virtCCA is fully compatible with the CCA specifications at the API level. We have developed the entire CCA software and firmware stack on top of virtCCA, including the enhancements to the normal world's KVM to support cVMs, and the TrustZone Management Monitor (TMM) that enforces isolation among cVMs and provides cVM life-cycle management. We have implemented virtCCA on real ARM servers, with and without S-EL2 support. Our evaluation, conducted on micro-benchmarks and macro-benchmarks, demonstrates that the overhead of running cVMs is acceptable compared to running normal-world VMs. Specifically, in a set of real-world workloads, the overhead of virtCCA-SEL2 is less than 29.5% for I/O intensive workloads, while virtCCA-EL3 outperforms the baseline in most cases.
Submitted 17 February, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
MV-Map: Offboard HD-Map Generation with Multi-view Consistency
Authors:
Ziyang Xie,
Ziqi Pang,
Yu-Xiong Wang
Abstract:
While bird's-eye-view (BEV) perception models can be useful for building high-definition maps (HD-Maps) with less human labor, their results are often unreliable and demonstrate noticeable inconsistencies in the predicted HD-Maps from different viewpoints. This is because BEV perception is typically set up in an 'onboard' manner, which restricts the computation and consequently prevents algorithms from reasoning over multiple views simultaneously. This paper overcomes these limitations and advocates a more practical 'offboard' HD-Map generation setup that removes the computation constraints, based on the fact that HD-Maps are commonly reusable infrastructures built offline in data centers. To this end, we propose a novel offboard pipeline called MV-Map that capitalizes on multi-view consistency and can handle an arbitrary number of frames with the key design of a 'region-centric' framework. In MV-Map, the target HD-Maps are created by aggregating all the frames of onboard predictions, weighted by the confidence scores assigned by an 'uncertainty network'. To further enhance multi-view consistency, we augment the uncertainty network with the global 3D structure optimized by a voxelized neural radiance field (Voxel-NeRF). Extensive experiments on nuScenes show that our MV-Map significantly improves the quality of HD-Maps, further highlighting the importance of offboard methods for HD-Map generation.
Submitted 8 October, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
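The confidence-weighted aggregation mentioned in the abstract above can be illustrated, for a single map region, as a normalized weighted average. This is a minimal sketch under stated assumptions, not the MV-Map implementation: the function name `aggregate_region`, the use of scalar per-frame predictions, and the plain normalized average are all hypothetical simplifications, and the confidences here are supplied by hand rather than by an uncertainty network.

```python
from typing import Sequence


def aggregate_region(predictions: Sequence[float], confidences: Sequence[float]) -> float:
    """Fuse per-frame predictions for one map region.

    Assumption (not from the paper): each frame contributes a scalar
    prediction (e.g. a class probability) and a non-negative confidence,
    and the fused value is their confidence-weighted mean.
    """
    if len(predictions) != len(confidences):
        raise ValueError("predictions and confidences must have equal length")
    if any(c < 0 for c in confidences):
        raise ValueError("confidences must be non-negative")
    total = sum(confidences)
    if total == 0:
        raise ValueError("at least one confidence must be positive")
    return sum(p * c for p, c in zip(predictions, confidences)) / total
```

In this sketch, a frame with higher confidence pulls the fused value toward its own prediction, which is the intuition behind letting well-observed views dominate poorly observed ones.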
-
Synthetic non-Abelian gauge fields for non-Hermitian systems
Authors:
Zehai Pang,
Jinbing Hu,
Yi Yang
Abstract:
Non-Abelian gauge fields are versatile tools for synthesizing topological phenomena but have so far been mostly studied in Hermitian systems, where gauge flux has to be defined from a closed loop in order for gauge fields, whether Abelian or non-Abelian, to become physically meaningful. We show that this condition can be relaxed in non-Hermitian systems by proposing and studying a generalized Hatano--Nelson model with imbalanced non-Abelian hopping. Despite lacking gauge flux in one dimension, non-Abelian gauge fields create rich non-Hermitian topological consequences. Under only nearest-neighbor coupling, non-Abelian gauge fields enable Hopf-link bulk braiding topology, whose phase transition accompanies the emergence of exceptional points (EPs). At both ends of an open chain, non-Abelian gauge fields lead to the simultaneous presence of non-Hermitian skin modes, whose population can be effectively tuned. Asymptotic analysis shows that this tuning mechanism stems from the interplay between the Abelian Hatano--Nelson coupling and effective high-order hopping, which becomes substantial near the EP phase transition condition. The predicted non-Hermitian phenomena, enabled by non-Abelian gauge fields, could be realized in synthetic dimensional optical platforms such as time-multiplexed photonic mesh lattices and driven ring resonators.
Submitted 4 April, 2023;
originally announced April 2023.
-
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
Authors:
Ziqi Pang,
Jie Li,
Pavel Tokmakov,
Dian Chen,
Sergey Zagoruyko,
Yu-Xiong Wang
Abstract:
This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adapts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our "Past Reasoning" module learns to refine the tracks and enhance the object features by cross-attending to queries from previous frames and other objects. The "Future Reasoning" module digests historical information and predicts robust future trajectories. In the case of long-term occlusions, our method maintains the object positions and enables re-association by integrating motion predictions. On the nuScenes dataset, our method improves AMOTA by a large margin and remarkably reduces ID-Switches by 90% compared to prior approaches, which is an order of magnitude less. The code and models are made available at https://github.com/TRI-ML/PF-Track.
Submitted 3 April, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
QCD Factorization of Quasi Generalized Gluon Distributions
Authors:
J. P. Ma,
Z. Y. Pang,
C. P. Zhang,
G. P. Zhang
Abstract:
We study the factorization relations between quasi gluon GPDs and twist-2 GPDs. The perturbative coefficient functions are obtained at one-loop level. They are free from any collinear- or I.R. divergences. Unlike the case of the factorization of quasi quark GPDs at one-loop, we have to add ghost contributions for the factorization of quasi gluon GPDs in order to obtain gauge-invariant results. In general, operators will be mixed beyond tree-level. Our work shows that the mixing pattern of the nonlocal operators in quasi gluon GPDs is the same as local operators, i.e., the nonlocal operators considered are mixed with gauge-invariant operators, BRST-variation operators and operators involving EOM operator. The factorization relations are obtained for all quasi gluon GPDs. Taking the forward limit, we also obtain the relations between quasi gluon PDFs and twist-2 PDFs.
Submitted 19 April, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Security Defense of Large Scale Networks Under False Data Injection Attacks: An Attack Detection Scheduling Approach
Authors:
Yuhan Suo,
Senchun Chai,
Runqi Chai,
Zhong-Hua Pang,
Yuanqing Xia,
Guo-Ping Liu
Abstract:
In large-scale networks, communication links between nodes are easily injected with false data by adversaries. This paper proposes a novel security defense strategy from the perspective of attack detection scheduling to ensure the security of the network. Based on the proposed strategy, each sensor can directly exclude suspicious sensors from its neighboring set. First, the problem of selecting suspicious sensors is formulated as a combinatorial optimization problem, which is non-deterministic polynomial-time hard (NP-hard). To solve this problem, the original function is transformed into a submodular function. Then, we propose an attack detection scheduling algorithm based on sequential submodular optimization theory, which incorporates the expert problem to better utilize historical information to guide the sensor selection task at the current moment. For different attack strategies, theoretical results show that the average optimization rate of the proposed algorithm has a lower bound, and the error expectation is bounded. In addition, under two kinds of insecurity conditions, the proposed algorithm can guarantee the security of the entire network from the perspective of the augmented estimation error. Finally, the effectiveness of the developed method is verified by numerical simulations and practical experiments.
Submitted 17 December, 2023; v1 submitted 11 December, 2022;
originally announced December 2022.
-
Credit Assignment for Trained Neural Networks Based on Koopman Operator Theory
Authors:
Zhen Liang,
Changyuan Zhao,
Wanwei Liu,
Bai Xue,
Wenjing Yang,
Zhengbin Pang
Abstract:
The credit assignment problem of neural networks refers to evaluating the credit of each network component to the final outputs. For an untrained neural network, approaches to tackling it have made great contributions to parameter update and model revolution during the training phase. On trained neural networks, this problem has received little attention; nevertheless, it plays an increasingly important role in neural network patching, specification, and verification. Based on Koopman operator theory, this paper presents an alternative perspective of linear dynamics on dealing with the credit assignment problem for trained neural networks. Regarding a neural network as the composition of a series of sub-dynamics, we utilize step-delay embedding to capture snapshots of each component, characterizing the established mapping as exactly as possible. To circumvent the dimension-difference problem encountered during the embedding, a composition and decomposition of an auxiliary linear layer, termed minimal linear dimension alignment, is carefully designed with a rigorous formal guarantee. Afterwards, each component is approximated by a Koopman operator and we derive the Jacobian matrix and its corresponding determinant, similar to backward propagation. Then, we can define a metric with algebraic interpretability for the credit assignment of each network component. Moreover, experiments conducted on typical neural networks demonstrate the effectiveness of the proposed method.
Submitted 2 December, 2022;
originally announced December 2022.
-
Online Optimization in Power Systems with High Penetration of Renewable Generation: Advances and Prospects
Authors:
Zhaojian Wang,
Wei Wei,
John Zhen Fu Pang,
Feng Liu,
Bo Yang,
Xinping Guan,
Shengwei Mei
Abstract:
Traditionally, offline optimization of power systems is acceptable due to the largely predictable loads and reliable generation. The increasing penetration of fluctuating renewable generation and Internet-of-Things devices allowing for fine-grained controllability of loads have led to the diminishing applicability of offline optimization in the power systems domain, and have redirected attention to online optimization methods. However, online optimization is a broad topic that can be applied in and motivated by different settings, operated on different time scales, and built on different theoretical foundations. This paper reviews the various types of online optimization techniques used in the power systems domain and aims to make clear the distinction between the most common techniques used. In particular, we introduce and compare four distinct techniques used covering the breadth of online optimization techniques used in the power systems domain, i.e., optimization-guided dynamic control, feedback optimization for single-period problems, Lyapunov-based optimization, and online convex optimization techniques for multi-period problems. Lastly, we recommend some potential future directions for online optimization in the power systems domain.
Submitted 26 November, 2022;
originally announced November 2022.
-
Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
Authors:
Zongshang Pang,
Yuta Nakashima,
Mayu Otani,
Hajime Nagahara
Abstract:
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Unsupervised methods usually rely on heuristic training objectives such as diversity and representativeness. However, such methods need to bootstrap the online-generated summaries to compute the objectives for importance score regression. We consider such a pipeline inefficient and seek to directly quantify the frame-level importance with the help of contrastive losses in the representation learning literature. Leveraging the contrastive losses, we propose three metrics featuring a desirable key frame: local dissimilarity, global consistency, and uniqueness. With features pre-trained on the image classification task, the metrics can already yield high-quality importance scores, demonstrating competitive or better performance than past heavily-trained methods. We show that by refining the pre-trained features with a lightweight contrastively learned projection module, the frame-level importance scores can be further improved, and the model can also leverage a large number of random videos and generalize to test videos with decent performance. Code available at https://github.com/pangzss/pytorch-CTVSUM.
Submitted 18 November, 2022;
originally announced November 2022.
-
Distributed Online Generalized Nash Equilibrium Tracking for Prosumer Energy Trading Games
Authors:
Yongkai Xie,
Zhaojian Wang,
John Z. F. Pang,
Bo Yang,
Xinping Guan
Abstract:
With the proliferation of distributed generation, traditional passive consumers in distribution networks are evolving into "prosumers", which can both produce and consume energy. Energy trading with the main grid or between prosumers is inevitable when energy surpluses and shortages exist. To this end, this paper investigates the peer-to-peer (P2P) energy trading market, which is formulated as a generalized Nash game. We first prove the existence and uniqueness of the generalized Nash equilibrium (GNE). Then, a distributed online algorithm is proposed to track the GNE in the time-varying environment. Its regret is proved to be bounded by a sublinear function of learning time, which indicates that the online algorithm has an acceptable accuracy in practice. Finally, numerical results with six microgrids validate the performance of the algorithm.
Submitted 5 October, 2022;
originally announced October 2022.
-
On Efficient Reinforcement Learning for Full-length Game of StarCraft II
Authors:
Ruo-Ze Liu,
Zhen-Jia Pang,
Zhou-Yu Meng,
Wenhai Wang,
Yang Yu,
Tong Lu
Abstract:
StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL), of which the main difficulties include huge state space, varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks. We investigate a curriculum transfer training procedure and train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating level built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the cheating level AIs and achieve win rates of 96%, 97%, and 94% against the level-8, level-9, and level-10 AIs, respectively. Our code is available at https://github.com/liuruoze/HierNet-SC2. To provide an AlphaStar-based baseline for our work as well as for the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07, which can be trained on the raw action space of 564 actions. It is designed to run training on a single common machine by making the hyper-parameters adjustable. We then compare our work with mAS using the same resources and show that our method is more effective. The code of mini-AlphaStar is available at https://github.com/liuruoze/mini-AlphaStar. We hope our study can shed some light on future research on efficient reinforcement learning for SC2 and other large-scale games.
Submitted 23 September, 2022;
originally announced September 2022.
-
Hardware-in-the-Loop Simulation for Evaluating Communication Impacts on the Wireless-Network-Controlled Robots
Authors:
Honghao Lv,
Zhibo Pang,
Ming Xiao,
Geng Yang
Abstract:
More and more robot automation applications have changed to wireless communication, and network performance has a growing impact on robotic systems. This study proposes a hardware-in-the-loop (HiL) simulation methodology for connecting the simulated robot platform to real network devices. This project seeks to provide robotic engineers and researchers with the capability to experiment without heavily modifying the original controller and get more realistic test results that correlate with actual network conditions. We deployed this HiL simulation system in two common cases for wireless-network-controlled robotic applications: (1) safe multi-robot coordination for mobile robots, and (2) human-motion-based teleoperation for manipulators. The HiL simulation system is deployed and tested under various network conditions in all circumstances. The experiment results are analyzed and compared with the previous simulation methods, demonstrating that the proposed HiL simulation methodology can identify a more reliable communication impact on robot systems.
Submitted 28 September, 2022; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Scalable Bayesian Inference for Detection and Deblending in Astronomical Images
Authors:
Derek Hansen,
Ismael Mendoza,
Runjing Liu,
Ziteng Pang,
Zhe Zhao,
Camille Avestruz,
Jeffrey Regier
Abstract:
We present a new probabilistic method for detecting, deblending, and cataloging astronomical sources called the Bayesian Light Source Separator (BLISS). BLISS is based on deep generative models, which embed neural networks within a Bayesian model. For posterior inference, BLISS uses a new form of variational inference known as Forward Amortized Variational Inference. The BLISS inference routine is fast, requiring a single forward pass of the encoder networks on a GPU once the encoder networks are trained. BLISS can perform fully Bayesian inference on megapixel images in seconds, and produces highly accurate catalogs. BLISS is highly extensible, and has the potential to directly answer downstream scientific questions in addition to producing probabilistic catalogs.
Submitted 12 July, 2022;
originally announced July 2022.
-
Indoor optical fiber eavesdropping approach and its avoidance
Authors:
Haiqing Hao,
Zhongwang Pang,
Guan Wang,
Bo Wang
Abstract:
The optical fiber network has become a worldwide infrastructure. In addition to the basic functions in telecommunication, its sensing ability has attracted more and more attention. In this paper, we discuss the risk of household fiber being used for eavesdropping and demonstrate its performance in the lab. Using a 3-meter tail fiber in front of the household optical modem, voices of normal human speech can be eavesdropped by a laser interferometer and recovered 1.1 km away. The detection distance limit and system noise are analyzed quantitatively. We also give some practical ways to prevent eavesdropping through household fiber.
Submitted 3 August, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
A Unified Description for Polarization-Transfer Mechanisms in Magnetic Resonance in Static Solids: Cross Polarization and DNP
Authors:
Zhenfeng Pang,
Sheetal Jain,
Chen Yang,
Xueqian Kong,
Kong Ooi Tan
Abstract:
Polarization transfers are crucial building blocks in magnetic resonance experiments, i.e., they can be used to polarize insensitive nuclei and correlate nuclear spins in multidimensional NMR spectroscopy. The polarization can be transferred either across different nuclear spin species or from electron spins to the relatively low-polarized nuclear spins. The former route occurring in solid-state NMR (ssNMR) can be performed via cross polarization (CP), while the latter route is known as dynamic nuclear polarization (DNP). Despite their different operating conditions, we argue that both mechanisms are theoretically similar processes in ideal conditions, i.e., the electron is merely another spin-1/2 particle with a much higher gyromagnetic ratio. Here, we show that the CP and DNP processes can be described using a unified theory based on average Hamiltonian theory (AHT) combined with fictitious operators. The intuitive and unified approach has allowed new insights into the cross effect (CE) DNP mechanism, leading to better design of DNP polarizing agents and extending the applications beyond just hyperpolarization. We explore the possibility of exploiting theoretically predicted DNP transients for electron-nucleus distance measurements, like routine dipolar-recoupling experiments in solid-state NMR.
Submitted 21 March, 2022;
originally announced March 2022.
-
Short-Packet Interleaver against Impulse Interference in Practical Industrial Environments
Authors:
Ming Zhan,
Zhibo Pang,
Dacfey Dzung,
Kan Yu,
Ming Xiao
Abstract:
The most common cause of transmission failure in Wireless High Performance (WirelessHP) target industry environments is impulse interference. As interleavers are commonly used to improve the reliability on the Orthogonal Frequency Division Multiplexing (OFDM) symbol level for long packet transmission, this paper considers the feasibility of applying short-packet bit interleaving to enhance the impulse/burst interference resisting capability on both the OFDM symbol and frame level. Using the Universal Software Radio Peripherals (USRP) and PC hardware platform, the Packet Error Rate (PER) performance of interleaved coded short-packet transmission with Convolutional Codes (CC), Reed-Solomon codes (RS), and RS+CC concatenated codes is tested and analyzed. Applying the IEEE 1613 standard for impulse interference generation, extensive PER tests of CC(1/2) and RS(31, 21)+CC(1/2) concatenated codes are performed. With practical experiments, we prove the effectiveness of bit-interleaved coded short-packet transmission in real factory environments. We also investigate how PER performance depends on the interleavers, codes, and impulse interference power and frequency.
Submitted 1 March, 2022;
originally announced March 2022.
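The idea of bit interleaving against burst interference can be illustrated with a minimal sketch. It assumes a plain block interleaver (write row by row, read column by column); the abstract does not state which interleaver structure the authors use, and the function names and the 4x6 size are hypothetical choices for illustration only.

```python
def interleave(bits, rows, cols):
    """Block interleaver: write row by row, read column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave: restores the original row-by-row order."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]


rows, cols = 4, 6
# Use the original bit indices as stand-ins for coded bits (illustrative only).
packet = list(range(rows * cols))
sent = interleave(packet, rows, cols)

# A burst corrupts 4 consecutive positions of the transmitted stream.
burst_positions = range(8, 12)
# Original indices of the bits that the burst hit.
hit_original = sorted(sent[i] for i in burst_positions)

restored = deinterleave(sent, rows, cols)
```

In this sketch the four consecutive corrupted positions map back to original indices that are `cols` apart, so after deinterleaving the decoder sees isolated errors rather than one contiguous burst.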
-
QCD Factorization of Quasi Generalized Quark Distributions
Authors:
J. P. Ma,
Z. Y. Pang,
G. P. Zhang
Abstract:
We study the factorization of quasi generalized quark distributions with twist-2 generalized parton distributions. We use an approach which is different from that used in the literature. Using this approach, we derive the factorization relations of all quasi generalized quark distributions at one-loop. The contributions from twist-2 generalized gluon distributions are included. Our results apply not only to the quasi distributions of a spin-1/2 hadron but also to those of a hadron with any spin.
Submitted 30 July, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Epicenter localization using forward-transmission laser interferometry
Authors:
Bohan Zhang,
Guan Wang,
Zhongwang Pang,
Bo Wang
Abstract:
Widely distributed optical fibers, together with phase-sensitive laser interferometry, can expand seismic detection methods and have great potential for epicenter localization. In this paper, we propose an integral response method based on a forward transmission scheme. It uses spectrum analysis and parameter fitting to localize the epicenter. With the given shape of the fiber ring, the integral phase changes of light propagating in the forward and reverse directions can be used to determine the direction, depth, distance of the epicenter, and seismic wave speed. For the noisy case with SNR=20 dB, the simulation results show ultrahigh precision when epicenter distance is 200 km: the error of the orientation angle is ~0.003°, the error of the P-wave speed is ~0.9 m/s, the error of the epicenter depth is ~9.5 m, and the error of the epicenter distance is ~200 m.
Submitted 20 June, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Embracing Single Stride 3D Object Detector with Sparse Transformer
Authors:
Lue Fan,
Ziqi Pang,
Tianyuan Zhang,
Yu-Xiong Wang,
Hang Zhao,
Feng Wang,
Naiyan Wang,
Zhaoxiang Zhang
Abstract:
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. In this paper, we start by rethinking how such multi-stride stereotype affects the LiDAR-based 3D object detectors. Our experiments point out that the downsampling operations bring few advantages, and lead to inevitable information loss. To remedy this issue, we propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network. Armed with transformers, our method addresses the problem of insufficient receptive field in single-stride architectures. It also cooperates well with the sparsity of point clouds and naturally avoids expensive computation. Eventually, our SST achieves state-of-the-art results on the large scale Waymo Open Dataset. It is worth mentioning that our method can achieve exciting performance (83.8 LEVEL 1 AP on validation split) on small object (pedestrian) detection due to the characteristic of single stride. Codes will be released at https://github.com/TuSimple/SST
Submitted 12 December, 2021;
originally announced December 2021.
-
Immortal Tracker: Tracklet Never Dies
Authors:
Qitai Wang,
Yuntao Chen,
Ziqi Pang,
Naiyan Wang,
Zhaoxiang Zhang
Abstract:
Previous online 3D Multi-Object Tracking (3DMOT) methods terminate a tracklet when it is not associated with new detections for a few frames. But if an object just goes dark, like being temporarily occluded by other objects or simply getting out of FOV, terminating a tracklet prematurely will result in an identity switch. We reveal that premature tracklet termination is the main cause of identity switches in modern 3DMOT systems. To address this, we propose Immortal Tracker, a simple tracking system that utilizes trajectory prediction to maintain tracklets for objects gone dark. We employ a simple Kalman filter for trajectory prediction and preserve the tracklet by prediction when the target is not visible. With this method, we can avoid 96% of vehicle identity switches resulting from premature tracklet termination. Without any learned parameters, our method achieves a mismatch ratio at the 0.0001 level and competitive MOTA for the vehicle class on the Waymo Open Dataset test set. Our mismatch ratio is tens of times lower than any previously published method. Similar results are reported on nuScenes. We believe the proposed Immortal Tracker can offer a simple yet powerful solution for pushing the limit of 3DMOT. Our code is available at https://github.com/ImmortalTracker/ImmortalTracker.
Submitted 26 November, 2021;
originally announced November 2021.
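Preserving a tracklet by prediction while the target is not visible can be sketched as follows. This is not the authors' implementation (see their repository for that). It assumes a 1D constant-velocity Kalman filter with a position-only measurement, and the class name, noise values `q` and `r`, and the test trajectory are all hypothetical.

```python
class CoastingTrack:
    """1D constant-velocity Kalman filter that keeps predicting without detections."""

    def __init__(self, position, q=0.01, r=0.1):
        self.p, self.v = position, 0.0         # state: position and velocity
        self.P = [[100.0, 0.0], [0.0, 100.0]]  # large initial uncertainty (assumed)
        self.q, self.r = q, r                  # assumed process / measurement noise
        self.missed = 0                        # consecutive frames without a detection

    def predict(self, dt=1.0):
        self.p += self.v * dt
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r          # innovation variance
        k0, k1 = a / s, c / s   # Kalman gain for a position-only measurement
        y = z - self.p          # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

    def step(self, detection):
        self.predict()
        if detection is None:
            self.missed += 1    # no match: keep the predicted state, do not terminate
        else:
            self.update(detection)
            self.missed = 0


track = CoastingTrack(position=0.0)
for t in range(1, 11):          # object observed moving 2 units per frame
    track.step(2.0 * t)
for t in range(11, 14):         # object occluded for 3 frames
    track.step(None)
coasted_error = abs(track.p - 2.0 * 13)
```

Because the track is never deleted during the gap, a detection arriving at frame 14 can be associated with the same identity, provided the predicted position is still close to the true one.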
-
Satellite Based Computing Networks with Federated Learning
Authors:
Hao Chen,
Ming Xiao,
Zhibo Pang
Abstract:
Driven by the ever-increasing penetration and proliferation of data-driven applications, a new generation of wireless communication, the sixth-generation (6G) mobile system enhanced by artificial intelligence (AI), has attracted substantial research interests. Among various candidate technologies of 6G, low earth orbit (LEO) satellites have appealing characteristics of ubiquitous wireless access. However, the costs of satellite communication (SatCom) are still high, relative to counterparts of ground mobile networks. To support massively interconnected devices with intelligent adaptive learning and reduce expensive traffic in SatCom, we propose federated learning (FL) in LEO-based satellite communication networks. We first review the state-of-the-art LEO-based SatCom and related machine learning (ML) techniques, and then analyze four possible ways of combining ML with satellite networks. The learning performance of the proposed strategies is evaluated by simulation and results reveal that FL-based computing networks improve the performance of communication overheads and latency. Finally, we discuss future research topics along this research direction.
Submitted 20 November, 2021;
originally announced November 2021.
-
SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking
Authors:
Ziqi Pang,
Zhichao Li,
Naiyan Wang
Abstract:
3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years, especially those under the "tracking-by-detection" paradigm. Despite their progress and usefulness, an in-depth analysis of their strengths and weaknesses is not yet available. In this paper, we summarize current 3D MOT methods into a unified framework by decomposing them into four constituent parts: pre-processing of detection, association, motion model, and life cycle management. We then ascribe the failure cases of existing algorithms to each component and investigate them in detail. Based on the analyses, we propose corresponding improvements which lead to a strong yet simple baseline: SimpleTrack. Comprehensive experimental results on Waymo Open Dataset and nuScenes demonstrate that our final method could achieve new state-of-the-art results with minor modifications.
Furthermore, we take additional steps and rethink whether current benchmarks authentically reflect the ability of algorithms for real-world challenges. We delve into the details of existing benchmarks and find some intriguing facts. Finally, we analyze the distribution and causes of remaining failures in SimpleTrack and propose future directions for 3D MOT. Our code is available at https://github.com/TuSimple/SimpleTrack.
Submitted 18 November, 2021;
originally announced November 2021.
-
Comparison between Time Shifting Deviation and Cross-correlation Methods
Authors:
Zhongwang Pang,
Guan Wang,
Bo Wang,
Lijun Wang
Abstract:
Time delay estimation (TDE) is an important step in identifying and locating a vibration source. The TDE result can be obtained with the cross-correlation method by seeking the maximum correlation peak of two signals. However, the cross-correlation method induces random error when dealing with nonstationary signals. We propose a novel time shifting deviation (TSDEV) method to solve this problem, which has been proved to achieve ultrahigh-precision localization in a fiber vibration monitoring system. This paper compares the TSDEV method with cross-correlation in detail by simulating the TDE process under different conditions, such as signals with arbitrary intercepted length, nonstationary drift and correlated noise. In addition, an experimental demonstration has been carried out on a 60 km fiber to localize a wide-band vibration signal. The typical localization error is 2 m with a standard deviation of 21.4 m using the TSDEV method. This stands in clear contrast to the result of the cross-correlation method, whose localization error is 70 m with a standard deviation of 208.4 m. Compared with the cross-correlation method, TSDEV has the same resistance to white noise, but has fewer boundary conditions and better suppression of linear drift or common noise, which leads to more precise TDE results.
Submitted 15 November, 2021;
originally announced November 2021.
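The baseline that this abstract compares against, cross-correlation TDE by locating the maximum correlation peak of two signals, can be sketched as below. It illustrates only the conventional cross-correlation method, not TSDEV, whose details the abstract does not give. The function name, the pulse shape, and the 7-sample delay are hypothetical.

```python
import math


def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation of x and y.

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate x[i] with y[i + lag] over the overlapping indices only.
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical test signal: a short pulse and a copy delayed by 7 samples.
pulse = [0.0] * 64
for k, value in enumerate([1.0, 3.0, -2.0, 4.0, 1.0]):
    pulse[10 + k] = value
delayed = [0.0] * 7 + pulse[:-7]

lag = estimate_delay(pulse, delayed, max_lag=20)
```

With a clean, stationary pulse the peak sits exactly at the true delay. The abstract's point is that this peak search becomes unreliable for nonstationary signals, which is the case the TSDEV method targets.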
-
Noise Error Pattern Generation Based on Successive Addition-Subtraction for Guessing Decoding
Authors:
Ming Zhan,
Zhibo Pang,
Kan Yu,
Jing Xu,
Fang Wu
Abstract:
Guessing random additive noise decoding (GRAND) algorithm has emerged as an excellent decoding strategy that can meet both the high reliability and low latency constraints. This paper proposes a successive addition-subtraction algorithm to generate noise error permutations. A noise error patterns generation scheme is presented by embedding the "1" and "0" bursts alternately. Then detailed procedures of the proposed algorithm are presented, and its correctness is also demonstrated through theoretical derivations. The aim of this work is to provide a preliminary paradigm and reference for future research on GRAND algorithm and hardware implementation.
Submitted 1 November, 2021;
originally announced November 2021.
-
Reconfigurable Intelligent Surface-induced Randomness for mmWave Key Generation
Authors:
Shubo Yang,
Han Han,
Yihong Liu,
Weisi Guo,
Zhibo Pang,
Lei Zhang
Abstract:
Secret key generation in physical layer security exploits the unpredictable random nature of wireless channels. The millimeter-wave (mmWave) channels have limited multipath and channel randomness in static environments. In this paper, for mmWave secret key generation of physical layer security, we use a reconfigurable intelligent surface (RIS) to induce randomness directly in wireless environments, without adding complexity to transceivers. We consider RIS to have continuous individual phase shifts (CIPS) and derive the RIS-assisted reflection channel distribution with its parameters. Then, we propose continuous group phase shifts (CGPS) to increase the randomness specifically at legal parties. Since the continuous phase shifts are expensive to implement, we analyze discrete individual phase shifts (DIPS) and derive the corresponding channel distribution, which is dependent on the quantization bit. We then derive the secret key rate (SKR) to evaluate the randomness performance. With the simulation results verifying the analytical results, this work explains the mathematical principles and lays a foundation for future mmWave evaluation and optimization of artificial channel randomness.
Submitted 8 August, 2022; v1 submitted 31 October, 2021;
originally announced November 2021.
-
Time shifting deviation method enhanced laser interferometry: ultrahigh precision localizing of traffic vibration using urban fiber link
Authors:
Guan Wang,
Zhongwang Pang,
Bohan Zhang,
Fangmin Wang,
Yufeng Chen,
Hongfei Dai,
Bo Wang,
Lijun Wang
Abstract:
Using fiber network as a huge sensing system will enrich monitoring methods of public infrastructures and geological disasters. With traditional cross-correlation method, laser interferometer has been used to detect and localize the vibration event. However, the random error induced by cross-correlation method limits the localization accuracy, and makes it not suitable for ultrahigh precision localizing applications. We propose a novel time shifting deviation (TSDEV) method, which has advantages over cross-correlation method in practicability and localization accuracy. Three experiments are carried out to demonstrate the novelty of the TSDEV method. In lab test, vibration localization accuracy of ~2.5 m is realized. In field tests, TSDEV method enhanced interferometry is applied to monitor the urban fiber link. Traffic vibration events on the campus road and Beijing ring road have been precisely localized and analyzed, respectively. The proposed technique will extend the function of existing urban fiber network, and better serve the future smart city.
Submitted 25 January, 2022; v1 submitted 9 October, 2021;
originally announced October 2021.
-
Massive-MIMO MF Beamforming with or without Grouped STBC for Ultra-Reliable Single-Shot Transmission Using Aged CSIT
Authors:
Jinfei Wang,
Yi Ma,
Na Yi,
Rahim Tafazolli,
Zhibo Pang
Abstract:
The technology of using massive transmit-antennas to enable ultra-reliable single-shot transmission (URSST) is challenged by the transmitter-side channel knowledge (i.e., CSIT) imperfection. When the imperfectness mainly comes from the channel time-variation, the outage probability of the matched filter (MF) transmitter beamforming is investigated based on the first-order Markov model of the aged CSIT. With a fixed transmit-power, the transmitter-side uncertainty of the instantaneous signal-to-noise ratio (iSNR) is mathematically characterized. In order to guarantee the outage probability for every single shot, a transmit-power adaptation approach is proposed to satisfy a pessimistic iSNR requirement, which is predicted using the Chernoff lower bound of the beamforming gain. Our numerical results demonstrate a remarkable transmit-power efficiency when comparing with power control approaches using other lower bounds. In addition, a combinatorial approach of the MF beamforming and grouped space-time block code (G-STBC) is proposed to further mitigate the detrimental impact of the CSIT uncertainty. It is shown, through both theoretical analysis and computer simulations, that the combinatorial approach can further improve the transmit-power efficiency with a good tradeoff between the outage probability and the latency.
Submitted 6 October, 2021; v1 submitted 3 October, 2021;
originally announced October 2021.
-
Nonlocal Patch-Based Fully-Connected Tensor Network Decomposition for Remote Sensing Image Inpainting
Authors:
Wen-Jie Zheng,
Xi-Le Zhao,
Yu-Bang Zheng,
Zhi-Feng Pang
Abstract:
Remote sensing image (RSI) inpainting plays an important role in real applications. Recently, fully-connected tensor network (FCTN) decomposition has shown a remarkable ability to fully characterize the global correlation. Considering the global correlation and the nonlocal self-similarity (NSS) of RSIs, this paper introduces the FCTN decomposition to the whole RSI and its NSS groups, and proposes a novel nonlocal patch-based FCTN (NL-FCTN) decomposition for RSI inpainting. Different from other nonlocal patch-based methods, the NL-FCTN decomposition-based method, which increases tensor order by stacking similar small-sized patches into NSS groups, cleverly leverages the remarkable ability of FCTN decomposition to deal with higher-order tensors. In addition, we propose an efficient proximal alternating minimization-based algorithm to solve the proposed NL-FCTN decomposition-based model with a theoretical convergence guarantee. Extensive experiments on RSIs demonstrate that the proposed method achieves state-of-the-art inpainting performance among all compared methods.
Submitted 13 September, 2021;
originally announced September 2021.
-
Twist-3 Double Spin Asymmetries in Drell-Yan Processes
Authors:
M. C. Hu,
J. P. Ma,
Z. Y. Pang,
G. P. Zhang
Abstract:
We study double spin asymmetries in Drell-Yan processes in which one initial hadron is transversely polarized and another one is longitudinally polarized. The complete part of the hadronic tensor relevant to asymmetries is derived. This part consists of twist-2 and twist-3 parton distributions and is gauge invariant. We construct some observables which can be used to extract these parton distributions from experimental measurements.
Submitted 13 January, 2022; v1 submitted 3 August, 2021;
originally announced August 2021.
-
TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Authors:
Tao Gui,
Xiao Wang,
Qi Zhang,
Qin Liu,
Yicheng Zou,
Xin Zhou,
Rui Zheng,
Chong Zhang,
Qinzhuo Wu,
Jiacheng Ye,
Zexiong Pang,
Yongxin Zhang,
Zhengyan Li,
Ruotian Ma,
Zichu Fei,
Ruijian Cai,
Jun Zhao,
Xingwu Hu,
Zhiheng Yan,
Yiding Tan,
Yuan Hu,
Qiyuan Bian,
Zhihua Liu,
Bolin Zhu,
Shan Qin
, et al. (9 additional authors not shown)
Abstract:
Various robustness evaluation methodologies from different perspectives have been proposed for different natural language processing (NLP) tasks. These methods have often focused on either universal or task-specific generalization capabilities. In this work, we propose a multilingual robustness evaluation platform for NLP tasks (TextFlint) that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis. TextFlint enables practitioners to automatically evaluate their models from all aspects or to customize their evaluations as desired with just a few lines of code. To guarantee user acceptability, all the text transformations are linguistically based, and we provide a human evaluation for each one. TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness. To validate TextFlint's utility, we performed large-scale empirical evaluations (over 67,000 evaluations) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. Almost all models showed significant performance degradation, including a decline of more than 50% of BERT's prediction accuracy on tasks such as aspect-level sentiment classification, named entity recognition, and natural language inference. Therefore, we call for the robustness to be included in the model evaluation, so as to promote the healthy development of NLP technology.
Submitted 5 May, 2021; v1 submitted 21 March, 2021;
originally announced March 2021.
-
Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences
Authors:
Ziqi Pang,
Zhichao Li,
Naiyan Wang
Abstract:
Estimating the states of surrounding traffic participants stays at the core of autonomous driving. In this paper, we study a novel setting of this problem: model-free single-object tracking (SOT), which takes the object state in the first frame as input, and jointly solves state estimation and tracking in subsequent frames. The main purpose for this new setting is to break the strong limitation of the popular "detection and tracking" scheme in multi-object tracking. Moreover, we notice that shape completion by overlaying the point clouds, which is a by-product of our proposed task, not only improves the performance of state estimation but also has numerous applications. As no benchmark for this task is available so far, we construct a new dataset LiDAR-SOT and corresponding evaluation protocols based on the Waymo Open dataset. We then propose an optimization-based algorithm called SOTracker involving point cloud registration, vehicle shapes, correspondence, and motion priors. Our quantitative and qualitative results prove the effectiveness of our SOTracker and reveal the challenging cases for SOT in point clouds, including the sparsity of LiDAR data, abrupt motion variation, etc. Finally, we also explore how the proposed task and algorithm may benefit other autonomous driving applications, including simulating LiDAR scans, generating motion data, and annotating optical flow. The code and protocols for our benchmark and algorithm are available at https://github.com/TuSimple/LiDAR_SOT/. A video demonstration is at https://www.youtube.com/watch?v=BpHixKs91i8.
Submitted 5 August, 2021; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Predictive coding feedback results in perceived illusory contours in a recurrent neural network
Authors:
Zhaoyang Pang,
Callum Biggs O'May,
Bhavin Choksi,
Rufin VanRullen
Abstract:
Modern feedforward convolutional neural networks (CNNs) can now solve some computer vision tasks at super-human levels. However, these networks only roughly mimic human visual perception. One difference from human vision is that they do not appear to perceive illusory contours (e.g. Kanizsa squares) in the same way humans do. Physiological evidence from visual cortex suggests that the perception of illusory contours could involve feedback connections. Would recurrent feedback neural networks perceive illusory contours like humans? In this work we equip a deep feedforward convolutional network with brain-inspired recurrent dynamics. The network was first pretrained with an unsupervised reconstruction objective on a natural image dataset, to expose it to natural object contour statistics. Then, a classification decision layer was added and the model was finetuned on a form discrimination task: squares vs. randomly oriented inducer shapes (no illusory contour). Finally, the model was tested with the unfamiliar "illusory contour" configuration: inducer shapes oriented to form an illusory square. Compared with feedforward baselines, the iterative "predictive coding" feedback resulted in more illusory contours being classified as physical squares. The perception of the illusory contour was measurable in the luminance profile of the image reconstructions produced by the model, demonstrating that the model really "sees" the illusion. Ablation studies revealed that natural image pretraining and feedback error correction are both critical to the perception of the illusion. Finally we validated our conclusions in a deeper network (VGG): adding the same predictive coding feedback dynamics again leads to the perception of illusory contours.
Submitted 16 June, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.