-
Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking
Authors:
Zhi-Cun Lyu,
Xin-Ye Li,
Zheng Xie,
Ming Li
Abstract:
Code generation has been greatly enhanced by the profound advancements in Large Language Models (LLMs) recently. Nevertheless, such LLM-based code generation approaches still struggle to generate error-free code in a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs in the hope that any one of them works. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of code candidates; otherwise, the system would be unhelpful. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions from a large number of candidates. Top Pass directly optimizes the pass@k loss function, enhancing the quality at the top of the candidate list. This enables the user to find the correct solution within as few tries as possible. Experimental results on four benchmarks indicate that our Top Pass method enhances the usability of code generation models by producing better ranking results, particularly achieving a 32.9% relative improvement in pass@1 on CodeContests when compared to the state-of-the-art ranking method.
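To make the target metric concrete, the Python sketch below computes pass@k two ways: for a ranked candidate list (1 if any of the top-k programs passes) and with the widely used unbiased estimator 1 - C(n-c, k)/C(n, k) for k programs drawn at random from n samples with c correct. It only illustrates the metric a ranker is judged by; it is not the Top Pass loss, and the function names are illustrative.

```python
from math import comb
from typing import Sequence


def ranked_pass_at_k(correct_in_rank_order: Sequence[bool], k: int) -> float:
    """Pass@k for one problem after ranking: 1.0 if any of the top-k candidates passes."""
    return 1.0 if any(correct_in_rank_order[:k]) else 0.0


def unbiased_pass_at_k(n: int, c: int, k: int) -> float:
    """Expected pass@k when k of n sampled programs (c of them correct) are picked without ranking."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one correct program
    return 1.0 - comb(n - c, k) / comb(n, k)


# One correct program among ten samples: random picking gives pass@1 of about 0.1,
# while a ranker that puts the correct program first gives pass@1 of 1.0.
print(unbiased_pass_at_k(n=10, c=1, k=1))
print(ranked_pass_at_k([True] + [False] * 9, k=1))
```

The gap between the two numbers is exactly what a good ranker is meant to close when only a few candidates can be reviewed.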
Submitted 11 August, 2024;
originally announced August 2024.
-
You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search
Authors:
Yanlin Wang,
Lianghong Guo,
Ensheng Shic,
Wenqing Chen,
Jiachi Chen,
Wanjun Zhong,
Menghan Wang,
Hui Li,
Hongyu Zhang,
Ziyu Lyu,
Zibin Zheng
Abstract:
Code search plays a crucial role in software development, enabling developers to retrieve and reuse code using natural language queries. While the performance of code search models improves with an increase in high-quality data, obtaining such data can be challenging and expensive. Recently, large language models (LLMs) such as ChatGPT have made remarkable progress in both natural and programming language understanding and generation, offering user-friendly interaction via simple prompts. Inspired by these advancements, we propose a novel approach, ChatDANCE, which utilizes high-quality and diverse augmented data generated by a large language model and leverages a filtering mechanism to eliminate low-quality augmentations. Specifically, we first propose a set of ChatGPT prompting rules that are specifically designed for source code and queries. Then, we leverage ChatGPT to rewrite code and queries based on the corresponding prompts, and propose a filtering mechanism that trains a cross-encoder from the backbone model UniXcoder to filter out code-query pairs with low matching scores. Finally, we re-train the backbone model using the obtained high-quality augmented data. Experimental results show that ChatDANCE achieves state-of-the-art performance, improving the best baseline by 13.2% (R@1) and 7% (MRR). Surprisingly, we find that this augment-filter-retrain strategy enables the backbone model (UniXcoder) to self-grow. Moreover, extensive experiments show the effectiveness of each component and that ChatDANCE has stable performance under different hyperparameter settings. In addition, we conduct qualitative and quantitative analyses to investigate why ChatDANCE works well and find that it learns a more uniform distribution of representations and effectively aligns the code and query spaces.
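The gains above are reported in R@1 and MRR. As a reminder of what those numbers measure, here is a minimal Python sketch that computes both from the 1-based rank of the ground-truth code snippet for each query; the input format is an assumption made for illustration, not taken from the ChatDANCE code.

```python
from typing import Sequence


def recall_at_k(gold_ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose ground-truth code is ranked within the top k (ranks are 1-based)."""
    return sum(r <= k for r in gold_ranks) / len(gold_ranks)


def mean_reciprocal_rank(gold_ranks: Sequence[int]) -> float:
    """Average of 1/rank of the ground-truth code over all queries."""
    return sum(1.0 / r for r in gold_ranks) / len(gold_ranks)


ranks = [1, 3, 1, 2]  # hypothetical ranks of the correct snippet for four queries
print(recall_at_k(ranks, 1))  # 0.5
print(mean_reciprocal_rank(ranks))
```

R@1 only rewards exact top hits, while MRR also gives partial credit to near misses, which is why the two improvements can differ in size.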
Submitted 10 August, 2024;
originally announced August 2024.
-
Knowledge Probing for Graph Representation Learning
Authors:
Mingyu Zhao,
Xingyu Huang,
Ziyu Lyu,
Yanlin Wang,
Lixin Cui,
Lu Bai
Abstract:
Graph learning methods have been extensively applied in diverse application areas. However, what kinds of inherent graph properties (e.g., graph proximity and graph structural information) are encoded by graph representation learning for downstream tasks remains under-explored. In this paper, we propose a novel graph probing framework (GraphProbe) to investigate and interpret whether the family of graph learning methods has encoded different levels of knowledge in graph representation learning. Based on the intrinsic properties of graphs, we design three probes to systematically investigate the graph representation learning process from different perspectives, namely the node-wise level, the path-wise level, and the structural level. We construct a thorough evaluation benchmark with nine representative graph learning methods from random-walk-based approaches, basic graph neural networks, and self-supervised graph methods, and probe them on six benchmark datasets for node classification, link prediction, and graph classification. The experimental evaluation verifies that GraphProbe can estimate the capability of graph representation learning. Notably, we find that GCN and WeightedGCN are relatively versatile methods, achieving better results across different tasks.
Submitted 7 August, 2024;
originally announced August 2024.
-
FACTTRACK: Time-Aware World State Tracking in Story Outlines
Authors:
Zhiheng Lyu,
Kevin Yang,
Lingpeng Kong,
Daniel Klein
Abstract:
While accurately detecting and correcting factual contradictions in language model outputs has become increasingly important as their capabilities improve, doing so is highly challenging. We propose a novel method, FACTTRACK, for tracking atomic facts and addressing factual contradictions. Crucially, FACTTRACK also maintains time-aware validity intervals for each fact, allowing for change over time. At a high level, FACTTRACK consists of a four-step pipeline to update a world state data structure for each new event: (1) decompose the event into directional atomic facts; (2) determine the validity interval of each atomic fact using the world state; (3) detect contradictions with existing facts in the world state; and finally (4) add new facts to the world state and update existing atomic facts. When we apply FACTTRACK to contradiction detection on structured story outlines, we find that FACTTRACK using LLaMA2-7B-Chat substantially outperforms a fair baseline using LLaMA2-7B-Chat, and achieves performance comparable to a GPT4 baseline. Moreover, when using GPT4, FACTTRACK significantly outperforms the GPT4 baseline.
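Steps (2) to (4) of the pipeline revolve around a world state whose facts carry validity intervals. The Python sketch below shows one way such a structure could detect time-aware contradictions. It is a simplification under stated assumptions: facts arrive already decomposed as hypothetical (subject, attribute, value) triples with explicit half-open intervals, and a contradiction is an exact-match rule (same subject and attribute, different value, overlapping time), whereas FACTTRACK uses an LLM for decomposition, interval estimation, and contradiction detection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    start: float  # time at which the fact becomes valid
    end: float    # time at which it stops being valid (use float("inf") if open-ended)


def overlaps(a: Fact, b: Fact) -> bool:
    """True if the half-open validity intervals [start, end) of a and b intersect."""
    return a.start < b.end and b.start < a.end


@dataclass
class WorldState:
    facts: List[Fact] = field(default_factory=list)

    def contradictions(self, new: Fact) -> List[Fact]:
        """Stored facts giving a different value for the same subject/attribute at an overlapping time."""
        return [
            f for f in self.facts
            if f.subject == new.subject
            and f.attribute == new.attribute
            and f.value != new.value
            and overlaps(f, new)
        ]

    def add(self, new: Fact) -> List[Fact]:
        """Store `new` only if it is consistent; return any conflicting facts found."""
        conflicts = self.contradictions(new)
        if not conflicts:
            self.facts.append(new)
        return conflicts


ws = WorldState()
ws.add(Fact("Alice", "location", "Paris", 0, 5))
print(ws.add(Fact("Alice", "location", "Rome", 5, 10)))  # [] since the intervals do not overlap
print(len(ws.add(Fact("Alice", "location", "Berlin", 3, 4))))  # 1 conflict with the Paris fact
```

The intervals are what let a character move from Paris to Rome without being flagged, while a simultaneous claim about Berlin is.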
Submitted 23 July, 2024;
originally announced July 2024.
-
Tell Me Where You Are: Multimodal LLMs Meet Place Recognition
Authors:
Zonglin Lyu,
Juexiao Zhang,
Mingxuan Lu,
Yiming Li,
Chen Feng
Abstract:
Large language models (LLMs) exhibit a variety of promising capabilities in robotics, including long-horizon planning and commonsense reasoning. However, their performance in place recognition is still underexplored. In this work, we introduce multimodal LLMs (MLLMs) to visual place recognition (VPR), where a robot must localize itself using visual observations. Our key design is to use vision-based retrieval to propose several candidates and then leverage language-based reasoning to carefully inspect each candidate for a final decision. Specifically, we leverage the robust visual features produced by off-the-shelf vision foundation models (VFMs) to obtain several candidate locations. We then prompt an MLLM to describe the differences between the current observation and each candidate in a pairwise manner, and reason about the best candidate based on these descriptions. Our results on three datasets demonstrate that integrating the general-purpose visual features from VFMs with the reasoning capabilities of MLLMs already provides an effective place recognition solution, without any VPR-specific supervised training. We believe our work can inspire new possibilities for applying and designing foundation models, i.e., VFMs, LLMs, and MLLMs, to enhance the localization and navigation of mobile robots.
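The first, vision-based stage amounts to nearest-neighbour retrieval over global image descriptors. A minimal Python sketch of that step is shown below, assuming descriptors have already been extracted by some vision foundation model and that cosine similarity is the retrieval score (an assumption; the abstract does not name the similarity). The second, language-based stage, where an MLLM compares the query with each candidate pairwise, is not sketched.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_candidates(query: Sequence[float], database: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k database descriptors most similar to the query, best first."""
    scores = [cosine(query, d) for d in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)[:k]


db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-D descriptors standing in for VFM features
print(top_k_candidates([1.0, 0.1], db, k=2))  # [0, 2]
```

Keeping k small bounds how many pairwise MLLM comparisons the reasoning stage has to make.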
Submitted 25 June, 2024;
originally announced June 2024.
-
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Authors:
Xuan He,
Dongfu Jiang,
Ge Zhang,
Max Ku,
Achint Soni,
Sherman Siu,
Haonan Chen,
Abhranil Chandra,
Ziyan Jiang,
Aaran Arulraj,
Kai Wang,
Quy Duc Do,
Yuansheng Ni,
Bohan Lyu,
Yaswanth Narsupalli,
Rongqi Fan,
Zhiheng Lyu,
Yuchen Lin,
Wenhu Chen
Abstract:
Recent years have witnessed great advances in video generation. However, the development of automatic video metrics lags significantly behind. None of the existing metrics is able to provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores over 37.6K synthesized videos from 11 existing video generative models. We train VideoScore (initialized from Mantis) based on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between VideoScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench show that VideoScore has consistently much higher correlation with human judges than other metrics. Given these results, we believe VideoScore can serve as a great proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning with Human Feedback (RLHF) to improve current video generation models.
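The headline result is a Spearman correlation between metric scores and human scores (presumably reported as ρ scaled by 100, since ρ itself lies in [-1, 1]). A minimal Python sketch of Spearman's ρ, computed as the Pearson correlation of rank vectors with tied values sharing their average rank, is below; it is a generic implementation, not the VideoScore evaluation script.

```python
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation between the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks matter, a metric can score on any scale and still correlate perfectly with human ratings if it orders the videos the same way.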
Submitted 24 June, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
Authors:
Chaojie Wang,
Yanchen Deng,
Zhiyi Lyu,
Liang Zeng,
Jujie He,
Shuicheng Yan,
Bo An
Abstract:
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. However, the auto-regressive generation process makes LLMs prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning. In this paper, by casting multi-step reasoning of LLMs as a heuristic search problem, we aim to alleviate this pathology by introducing Q*, a general, versatile, and agile framework for guiding the decoding process of LLMs with deliberative planning. By learning a plug-and-play Q-value model as a heuristic function for estimating expected future rewards, Q* can effectively guide LLMs to select the most promising next reasoning step without fine-tuning the LLMs for the current task, which avoids significant computational overhead and the potential risk of performance degradation on other tasks. Extensive experiments on GSM8K, MATH, and MBPP demonstrate the superiority of our method, improving the reasoning performance of existing open-source LLMs.
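Casting reasoning as heuristic search means partial reasoning traces become search states and a learned value ranks which one to expand next. A minimal best-first search of that shape is sketched in Python below. `propose_steps`, `q_value`, and `is_terminal` are hypothetical callables standing in for the LLM, the Q-value model, and an answer check, and ranking traces purely by the Q-value of their latest step is a simplification, not necessarily how Q* combines accumulated and estimated rewards.

```python
import heapq
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]  # a partial reasoning trace: the steps taken so far


def best_first_search(
    question: str,
    propose_steps: Callable[[str, State], List[str]],  # hypothetical: LLM proposes next steps
    q_value: Callable[[str, State, str], float],       # hypothetical: learned Q-value model
    is_terminal: Callable[[State], bool],              # hypothetical: does the trace end in an answer?
    max_expansions: int = 100,
) -> Optional[State]:
    frontier: List[Tuple[float, int, State]] = [(0.0, 0, ())]
    counter = 1  # tie-breaker so heapq never has to compare states
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_terminal(state):
            return state
        for step in propose_steps(question, state):
            score = q_value(question, state, step)
            # heapq is a min-heap, so negate the score to expand the most promising step first
            heapq.heappush(frontier, (-score, counter, state + (step,)))
            counter += 1
    return None
```

Only the small Q-value model is consulted to rank steps, which matches the abstract's point that the LLM itself need not be fine-tuned.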
Submitted 22 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Authors:
Yiming Li,
Zhiheng Li,
Nuo Chen,
Moonjun Gong,
Zonglin Lyu,
Zehong Wang,
Peili Jiang,
Chen Feng
Abstract:
Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capabilities. To bridge this gap, in collaboration with the self-driving company May Mobility, we present the MARS dataset which unifies scenarios that enable MultiAgent, multitraveRSal, and multimodal autonomous vehicle research. More specifically, MARS is collected with a fleet of autonomous vehicles driving within a certain geographical area. Each vehicle has its own route and different vehicles may appear at nearby locations. Each vehicle is equipped with a LiDAR and surround-view RGB cameras. We curate two subsets in MARS: one facilitates collaborative driving with multiple vehicles simultaneously present at the same location, and the other enables memory retrospection through asynchronous traversals of the same location by multiple vehicles. We conduct experiments in place recognition and neural reconstruction. More importantly, MARS introduces new research opportunities and challenges such as multitraversal 3D reconstruction, multiagent perception, and unsupervised object discovery. Our data and code can be found at https://ai4ce.github.io/MARS/.
Submitted 13 June, 2024;
originally announced June 2024.
-
Embracing Radiance Field Rendering in 6G: Over-the-Air Training and Inference with 3D Contents
Authors:
Guanlin Wu,
Zhonghao Lyu,
Juyong Zhang,
Jie Xu
Abstract:
The efficient representation, transmission, and reconstruction of three-dimensional (3D) contents are becoming increasingly important for sixth-generation (6G) networks that aim to merge virtual and physical worlds for offering immersive communication experiences. Neural radiance field (NeRF) and 3D Gaussian splatting (3D-GS) have recently emerged as two promising 3D representation techniques based on radiance field rendering, which are able to provide photorealistic rendering results for complex scenes. Therefore, embracing NeRF and 3D-GS in 6G networks is envisioned to be a prominent solution to support emerging 3D applications with enhanced quality of experience. This paper provides a comprehensive overview of the integration of NeRF and 3D-GS in 6G. First, we review the basics of the radiance field rendering techniques, and highlight their applications and implementation challenges over wireless networks. Next, we consider the over-the-air training of NeRF and 3D-GS models over wireless networks by presenting various learning techniques. We particularly focus on the federated learning design over a hierarchical device-edge-cloud architecture, which is suitable for exploiting distributed data and computing resources over 6G networks to train large models representing large-scale scenes. Then, we consider the over-the-air rendering of NeRF and 3D-GS models at the wireless network edge. We present three practical rendering architectures, namely local, remote, and co-rendering, respectively, and provide model compression approaches to facilitate the transmission of radiance field models for rendering. We also present rendering acceleration approaches and joint computation and communication designs to enhance the rendering efficiency. In a case study, we propose a new semantic communication enabled 3D content transmission design.
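For the hierarchical device-edge-cloud federated learning mentioned above, the basic aggregation pattern can be sketched as two levels of sample-weighted averaging: each edge server averages its devices' parameters, then the cloud averages the edge models. The Python below is plain hierarchical FedAvg under that assumption, with parameters as flat lists of floats; it ignores the wireless channel, over-the-air aggregation, and resource allocation that the paper's design considers.

```python
from typing import List, Sequence, Tuple

Params = Sequence[float]


def weighted_average(models: Sequence[Params], weights: Sequence[float]) -> List[float]:
    """Coordinate-wise average of parameter vectors, weighted by `weights`."""
    total = sum(weights)
    return [sum(w * m[i] for m, w in zip(models, weights)) / total for i in range(len(models[0]))]


def hierarchical_fedavg(edge_groups: Sequence[Sequence[Tuple[Params, int]]]) -> List[float]:
    """edge_groups[e] lists (device_params, num_samples) for the devices attached to edge server e."""
    edge_models, edge_sizes = [], []
    for devices in edge_groups:
        params = [p for p, _ in devices]
        sizes = [n for _, n in devices]
        edge_models.append(weighted_average(params, sizes))  # device-to-edge aggregation
        edge_sizes.append(sum(sizes))
    return weighted_average(edge_models, edge_sizes)  # edge-to-cloud aggregation
```

Weighting each edge model by its total sample count makes the result identical to flat FedAvg over all devices, while only edge models, not every device update, travel to the cloud.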
Submitted 18 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Frame Interpolation with Consecutive Brownian Bridge Diffusion
Authors:
Zonglin Lyu,
Ming Li,
Jianbo Jiao,
Chen Chen
Abstract:
Recent work in Video Frame Interpolation (VFI) tries to formulate VFI as a diffusion-based conditional image generation problem, synthesizing the intermediate frame given random noise and the neighboring frames. Due to the relatively high resolution of videos, Latent Diffusion Models (LDMs) are employed as the conditional generation model, where the autoencoder compresses images into latent representations for diffusion and then reconstructs images from these latent representations. Such a formulation poses a crucial challenge: VFI expects that the output is deterministically equal to the ground truth intermediate frame, but LDMs randomly generate a diverse set of different images when the model runs multiple times. The reason for the diverse generation is that the cumulative variance (variance accumulated at each step of generation) of generated latent representations in LDMs is large. This makes the sampling trajectory random, resulting in diverse rather than deterministic generations. To address this problem, we propose our unique solution: Frame Interpolation with Consecutive Brownian Bridge Diffusion. Specifically, we propose consecutive Brownian Bridge diffusion that takes a deterministic initial value as input, resulting in a much smaller cumulative variance of generated latent representations. Our experiments suggest that our method improves as the autoencoder improves and achieves state-of-the-art performance in VFI, leaving strong potential for further enhancement.
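Why a bridge keeps variance small can be seen from the marginal of a standard Brownian bridge pinned at two known values x0 and x1: x_t = (1 - t) x0 + t x1 + σ √(t(1 - t)) ε with ε ~ N(0, 1). The variance σ² t(1 - t) is zero at both ends and at most σ²/4 in the middle, unlike a standard diffusion whose starting point is pure noise. The scalar Python sketch below shows only this textbook bridge; the paper's consecutive construction in latent space and its noise schedule are not reproduced.

```python
import math
import random
from typing import Optional


def bridge_variance(t: float, sigma: float = 1.0) -> float:
    """Variance of a standard Brownian bridge at time t in [0, 1]: sigma^2 * t * (1 - t)."""
    return sigma ** 2 * t * (1.0 - t)


def sample_bridge_marginal(x0: float, x1: float, t: float, sigma: float = 1.0,
                           rng: Optional[random.Random] = None) -> float:
    """Draw x_t from the bridge marginal: linear interpolation plus noise that vanishes at both ends."""
    rng = rng or random.Random()
    mean = (1.0 - t) * x0 + t * x1
    return mean + math.sqrt(bridge_variance(t, sigma)) * rng.gauss(0.0, 1.0)


rng = random.Random(0)
print(sample_bridge_marginal(2.0, 5.0, 0.0, rng=rng))  # 2.0: pinned to the start value
print(sample_bridge_marginal(2.0, 5.0, 0.5, rng=rng))  # random, centered on 3.5
```

Because both endpoints are deterministic, repeated sampling cannot drift far from the interpolation path, which is the property the abstract relies on for reproducible frames.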
Submitted 6 August, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Intelligent Reflecting Surface Aided AirComp: Multi-Timescale Design and Performance Analysis
Authors:
Guangji Chen,
Jun Li,
Qingqing Wu,
Meng Hua,
Kaitao Meng,
Zhonghao Lyu
Abstract:
The integration of intelligent reflecting surface (IRS) into over-the-air computation (AirComp) is an effective solution for reducing the computational mean squared error (MSE) via its high passive beamforming gain. Prior works on IRS aided AirComp generally rely on the full instantaneous channel state information (I-CSI), which is not applicable to large-scale systems due to its heavy signalling overhead. To address this issue, we propose a novel multi-timescale transmission protocol. In particular, the receive beamforming at the access point (AP) is pre-determined based on the static angle information and the IRS phase-shifts are optimized relying on the long-term statistical CSI. With the obtained AP receive beamforming and IRS phase-shifts, the effective low-dimensional I-CSI is exploited to determine devices' transmit power in each coherence block, thus substantially reducing the signalling overhead. Theoretical analysis unveils that the achievable MSE scales on the order of $\mathcal{O}\left(K/(N^2 M)\right)$, where $M$, $N$, and $K$ are the number of AP antennas, IRS elements, and devices, respectively. We also prove that the channel-inversion power control is asymptotically optimal for large $N$, which reveals that the full power transmission policy is not needed for lowering the power consumption of energy-limited devices.
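Two quick readings of the result. First, by the stated scaling, doubling the number of IRS elements N with M and K fixed divides the MSE order by 2² = 4. Second, channel-inversion power control can be sketched for a simplified single-antenna model in which each device k sees an effective complex channel h_k (AP beamforming and IRS phase shifts folded in, an assumption made here for illustration): device k transmits with coefficient b_k = √η · conj(h_k)/|h_k|², so every h_k b_k equals √η and the signals add coherently; η = P_max · min_k |h_k|² is the largest common value meeting the power limit, and scaling the received sum by 1/√η leaves an MSE of σ²/η. This is the textbook AirComp construction, not the paper's multi-timescale protocol.

```python
from typing import List, Sequence, Tuple


def channel_inversion(h: Sequence[complex], p_max: float) -> Tuple[List[complex], float]:
    """Transmit coefficients b_k = sqrt(eta) * conj(h_k) / |h_k|^2, so that h_k * b_k = sqrt(eta).

    eta = p_max * min_k |h_k|^2 is the largest common value keeping every |b_k|^2 <= p_max;
    the weakest effective channel limits all devices.
    """
    eta = p_max * min(abs(x) ** 2 for x in h)
    b = [eta ** 0.5 * x.conjugate() / abs(x) ** 2 for x in h]
    return b, eta


def aircomp_mse(eta: float, noise_var: float) -> float:
    """MSE of the estimated sum when the receiver scales its signal by 1/sqrt(eta)."""
    return noise_var / eta


h = [1 + 1j, 2j]  # hypothetical effective channels of two devices
b, eta = channel_inversion(h, p_max=1.0)
print(eta, aircomp_mse(eta, noise_var=0.5))
```

Only devices with strong channels transmit below full power, which echoes the abstract's remark that full-power transmission is not required.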
Submitted 9 May, 2024;
originally announced May 2024.
-
On the Causal Nature of Sentiment Analysis
Authors:
Zhiheng Lyu,
Zhijing Jin,
Fernando Gonzalez,
Rada Mihalcea,
Bernhard Schoelkopf,
Mrinmaya Sachan
Abstract:
Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this paper formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the traditional prediction task to model the sentiment using the review as input. Using the peak-end rule in psychology, we classify a sample as C1 if its overall sentiment score approximates an average of all the sentence-level sentiments in the review, and C2 if the overall sentiment score approximates an average of the peak and end sentiments. For the prediction task, we use the discovered causal mechanisms behind the samples to improve the performance of LLMs by proposing causal prompts that give the models an inductive bias of the underlying causal graph, leading to substantial improvements by up to 32.13 F1 points on zero-shot five-class SA. Our code is at https://github.com/cogito233/causal-sa
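The peak-end labelling rule can be sketched in a few lines of Python. Assumptions not fixed by the abstract: sentence and overall scores share one numeric scale (1 to 5 here, with 3 as neutral), the "peak" is the sentence farthest from neutral, and "approximates" is resolved by assigning whichever average lies closer to the overall score (ties go to C1). The function name is hypothetical.

```python
from typing import Sequence


def classify_causal_direction(sentence_scores: Sequence[float], overall: float,
                              neutral: float = 3.0) -> str:
    """'C1' if the overall score is closer to the mean of all sentence sentiments,
    'C2' if it is closer to the peak-end average."""
    mean_all = sum(sentence_scores) / len(sentence_scores)
    peak = max(sentence_scores, key=lambda s: abs(s - neutral))  # most intense sentence (assumption)
    peak_end = (peak + sentence_scores[-1]) / 2
    return "C1" if abs(overall - mean_all) <= abs(overall - peak_end) else "C2"


scores = [4, 4, 4, 1, 2]  # hypothetical sentence-level sentiments of one review
print(classify_causal_direction(scores, overall=3.0))  # C1: matches the all-sentence mean (3.0)
print(classify_causal_direction(scores, overall=2.0))  # C2: closer to the peak-end mean (1.5)
```

The same review text can thus fall under either hypothesis depending on the rating its author gave, which is what lets the label serve as a per-sample causal signal.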
Submitted 17 April, 2024;
originally announced April 2024.
-
Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm
Authors:
Zhonghao Lyu,
Yuchen Li,
Guangxu Zhu,
Jie Xu,
H. Vincent Poor,
Shuguang Cui
Abstract:
In some applications, edge learning is shifting from conventional learning from scratch to a new two-stage paradigm that unifies pre-training and task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on local pre-stored general data, and then task-specific fine-tuning is performed at edge devices based on the pre-trained model via federated edge learning. For the two-stage learning model, we first analyze the convergence behavior (in terms of the average squared gradient norm bound), which characterizes the impacts of various system parameters such as the number of learning rounds and batch sizes in the two stages on the convergence rate. Based on our analytical results, we then propose a joint communication and computation resource management design to minimize the average squared gradient norm bound, subject to constraints on the transmit power, overall system energy consumption, and training delay. The decision variables include the number of learning rounds, batch sizes, clock frequencies, and transmit power control for both pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of our proposed design. It is shown that the proposed joint resource management over the pre-training and fine-tuning stages effectively balances the system performance trade-off among the training accuracy, delay, and energy consumption. The proposed design is also shown to effectively leverage the inherent trade-off between pre-training and fine-tuning, which arises from the differences in data distribution between pre-stored general data versus real-time task-specific data, thus efficiently optimizing overall system performance.
Submitted 31 March, 2024;
originally announced April 2024.
-
SSHPool: The Separated Subgraph-based Hierarchical Pooling
Authors:
Zhuo Xu,
Lixin Cui,
Ming Li,
Yue Wang,
Ziyu Lyu,
Hangyuan Du,
Lu Bai,
Philip S. Yu,
Edwin R. Hancock
Abstract:
In this paper, we develop a novel local graph pooling method, namely the Separated Subgraph-based Hierarchical Pooling (SSHPool), for graph classification. We commence by assigning the nodes of a sample graph into different clusters, resulting in a family of separated subgraphs. We individually employ the local graph convolution units as the local structure to further compress each subgraph into a coarsened node, transforming the original graph into a coarsened graph. Since these subgraphs are separated by different clusters and the structural information cannot be propagated between them, the local convolution operation largely avoids the over-smoothing problem caused by message passing through edges in most existing Graph Neural Networks (GNNs). By hierarchically performing the proposed procedures on the resulting coarsened graph, the proposed SSHPool can effectively extract the hierarchical global features of the original graph structure, encapsulating rich intrinsic structural characteristics. Furthermore, we develop an end-to-end GNN framework associated with the SSHPool module for graph classification. Experimental results demonstrate the superior performance of the proposed model on real-world datasets.
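One coarsening level of the separated-subgraph idea can be sketched as: drop inter-cluster edges, propagate features only inside each subgraph, pool each subgraph into one node, and connect two coarsened nodes if any original edge linked their clusters. In the Python below, the cluster assignment is given as input (clusters numbered 0..k-1, none empty), and a parameter-free mean over each node and its intra-cluster neighbours stands in for the learned local graph convolution units; both are assumptions, not SSHPool's actual components.

```python
from typing import List, Sequence, Set, Tuple


def separated_pool(features: Sequence[Sequence[float]], edges: Sequence[Tuple[int, int]],
                   cluster: Sequence[int]) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    n, dim = len(features), len(features[0])
    # Keep only intra-cluster edges so information cannot flow between subgraphs.
    nbrs: List[Set[int]] = [set() for _ in range(n)]
    for i, j in edges:
        if cluster[i] == cluster[j]:
            nbrs[i].add(j)
            nbrs[j].add(i)
    # Local propagation inside each subgraph (mean over the node and its intra-cluster neighbours).
    h = []
    for i in range(n):
        group = [i, *nbrs[i]]
        h.append([sum(features[v][d] for v in group) / len(group) for d in range(dim)])
    # Compress each subgraph into a single coarsened node by mean pooling.
    k = max(cluster) + 1
    pooled = []
    for c in range(k):
        members = [i for i in range(n) if cluster[i] == c]
        pooled.append([sum(h[i][d] for i in members) / len(members) for d in range(dim)])
    # Coarsened edges: clusters joined by at least one original inter-cluster edge.
    coarse = {(min(cluster[i], cluster[j]), max(cluster[i], cluster[j]))
              for i, j in edges if cluster[i] != cluster[j]}
    return pooled, sorted(coarse)


# Path graph 0-1-2-3 split into clusters {0, 1} and {2, 3}.
print(separated_pool([[0.0], [2.0], [4.0], [6.0]], [(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1]))
```

In the example, node 1 never mixes with node 2 even though they are adjacent, so the two coarsened nodes keep distinct features ([1.0] and [5.0]) instead of being smoothed together.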
Submitted 13 August, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
GetMesh: A Controllable Model for High-quality Mesh Generation and Manipulation
Authors:
Zhaoyang Lyu,
Ben Fei,
Jinyi Wang,
Xudong Xu,
Ya Zhang,
Weidong Yang,
Bo Dai
Abstract:
Mesh is a fundamental representation of 3D assets in various industrial applications, and is widely supported by professional software. However, due to its irregular structure, mesh creation and manipulation are often time-consuming and labor-intensive. In this paper, we propose a highly controllable generative model, GetMesh, for mesh generation and manipulation across different categories. By taking a varying number of points as the latent representation, and re-organizing them as triplane representation, GetMesh generates meshes with rich and sharp details, outperforming both single-category and multi-category counterparts. Moreover, it also enables fine-grained control over the generation process that previous mesh generative models cannot achieve, where changing global/local mesh topologies, adding/removing mesh parts, and combining mesh parts across categories can be intuitively, efficiently, and robustly accomplished by adjusting the number, positions or features of latent points. Project page is https://getmesh.github.io.
Submitted 18 March, 2024;
originally announced March 2024.
-
Minimally Supervised Topological Projections of Self-Organizing Maps for Phase of Flight Identification
Authors:
Zimeng Lyu,
Pujan Thapa,
Travis Desell
Abstract:
Identifying phases of flight is important in the field of general aviation, as knowing during which phase of flight the data from aircraft flight data recorders were collected can aid in the more effective detection of safety or hazardous events. General aviation flight data for phase of flight identification are usually recorded per second, come on a large scale, and are class-imbalanced. Manually labeling the data is expensive, and training classification models usually faces class imbalance problems. This work investigates the use of a novel method for minimally supervised self-organizing maps (MS-SOMs), which utilize nearest neighbor majority votes in the SOM U-matrix for class estimation. Results show that the proposed method can match or exceed a naive SOM approach that used a full data file of labeled data, with only 30 labeled datapoints per class. Additionally, the minimally supervised SOM is significantly more robust to the class imbalance of the phase of flight data. These results highlight how little data is required for effective phase of flight identification.
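A rough sketch of the class-estimation idea: after SOM training, the few labeled datapoints are mapped to their best-matching nodes, giving a sparse set of labeled nodes, and every other node takes the majority label of its nearest labeled nodes. The abstract does not specify how the U-matrix neighbourhood is searched, so the Python below swaps in plain Euclidean distance between codebook vectors (the quantity the U-matrix aggregates between adjacent nodes) and a hypothetical `k`; the grid, weights, and phase names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Node = Tuple[int, int]  # (row, column) position on the SOM grid


def estimate_node_class(node: Node, weights: Dict[Node, Sequence[float]],
                        node_labels: Dict[Node, str], k: int = 3) -> str:
    """Majority class among the k labeled nodes whose codebook vectors are closest to `node`'s."""
    ranked = sorted(node_labels, key=lambda m: math.dist(weights[node], weights[m]))
    votes = Counter(node_labels[m] for m in ranked[:k])
    return votes.most_common(1)[0][0]


weights = {(0, 0): [0.0, 0.0], (0, 1): [0.1, 0.0], (0, 2): [0.2, 0.0],
           (1, 0): [5.0, 5.0], (1, 1): [5.1, 5.0]}  # toy trained codebook vectors
labels = {(0, 0): "taxi", (0, 2): "taxi", (1, 1): "cruise"}  # nodes hit by the few labeled points
print(estimate_node_class((0, 1), weights, labels, k=1))  # taxi
print(estimate_node_class((1, 0), weights, labels, k=1))  # cruise
```

Because labels are attached to map nodes rather than to every record, a handful of labeled points per class can cover the whole map, which is consistent with the 30-points-per-class result above.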
Submitted 16 February, 2024;
originally announced February 2024.
-
Spherical Density-Equalizing Map for Genus-0 Closed Surfaces
Authors:
Zhiyuan Lyu,
Lok Ming Lui,
Gary P. T. Choi
Abstract:
Density-equalizing maps are a class of mapping methods in which the shape deformation is driven by prescribed density information. In recent years, they have been widely used for data visualization on planar domains and planar parameterization of open surfaces. However, the theory and computation of density-equalizing maps for closed surfaces are much less explored. In this work, we develop a novel method for computing spherical density-equalizing maps for genus-0 closed surfaces. Specifically, we first compute a conformal parameterization of the given genus-0 closed surface onto the unit sphere. Then, we perform density equalization on the spherical domain based on the given density information to achieve a spherical density-equalizing map. The bijectivity of the mapping is guaranteed using quasi-conformal theory. We further propose a method for incorporating the harmonic energy and landmark constraints into our formulation to achieve landmark-aligned spherical density-equalizing maps balancing different distortion measures. Using the proposed methods, a large variety of spherical parameterizations can be achieved. Applications to surface registration, remeshing, and data visualization are presented to demonstrate the effectiveness of our methods.
Submitted 22 January, 2024;
originally announced January 2024.
-
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps
Authors:
Zimeng Lyu,
Alexander Ororbia,
Rui Li,
Travis Desell
Abstract:
Parameter prediction is essential for many applications, facilitating insightful interpretation and decision-making. However, in many real-life domains, such as power systems, medicine, and engineering, it can be very expensive to acquire ground truth labels for certain datasets, as they may require extensive and costly laboratory testing. In this work, we introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs), which significantly reduces the number of labeled data points required to perform parameter prediction, effectively exploiting information contained in large unlabeled datasets. Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs). The values estimated for newly encountered data points are computed using the average of the $n$ closest labeled data points in the SOM's U-matrix in tandem with a topological shortest path distance calculation scheme. Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques, including linear and polynomial regression, Gaussian process regression, and K-nearest neighbors, as well as deep neural network models and related clustering schemes.
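The prediction step described above (averaging the $n$ closest labeled points under a topological shortest-path distance on the U-matrix) can be illustrated with a small sketch. It assumes a trained SOM whose U-matrix is given as a 2-D grid of floats, that each labeled point has already been mapped to its BMU, and that moving between 4-connected neighbouring units costs the mean of their U-matrix values. That edge weighting and the helper names `umatrix_shortest_paths` and `predict_from_labeled_bmus` are hypothetical choices for illustration, not the authors' code.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]  # (row, col) position of a unit on the SOM grid


def umatrix_shortest_paths(umatrix: Sequence[Sequence[float]], source: Node) -> Dict[Node, float]:
    """Dijkstra over the 4-connected SOM grid.

    Assumption: moving between neighbouring units costs the mean of their
    U-matrix values, so paths that cross high-U "ridges" (cluster
    boundaries) are long even when the units are adjacent on the grid.
    """
    rows, cols = len(umatrix), len(umatrix[0])
    dist: Dict[Node, float] = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale entry: a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 0.5 * (umatrix[r][c] + umatrix[nr][nc])
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def predict_from_labeled_bmus(
    umatrix: Sequence[Sequence[float]],
    query_bmu: Node,
    labeled: List[Tuple[Node, float]],
    n: int = 3,
) -> float:
    """Average the target values of the n labeled BMUs that are
    topologically closest to the query's BMU (hypothetical helper)."""
    if not labeled:
        raise ValueError("need at least one labeled BMU")
    dist = umatrix_shortest_paths(umatrix, query_bmu)
    ranked = sorted(labeled, key=lambda item: dist.get(item[0], float("inf")))
    nearest = ranked[:n]
    return sum(value for _, value in nearest) / len(nearest)


# A 3x3 U-matrix whose middle column is a high "ridge" separating two clusters.
u = [[0.1, 5.0, 0.1],
     [0.1, 5.0, 0.1],
     [0.1, 5.0, 0.1]]
labeled = [((0, 2), 10.0), ((2, 0), 1.0), ((1, 0), 2.0)]
print(predict_from_labeled_bmus(u, (0, 0), labeled, n=2))  # 1.5
```

In this toy example the labeled unit at (0, 2) is only two grid steps from the query, but crossing the ridge makes it topologically far, so the prediction averages only the two labeled units on the query's side of the boundary.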
Submitted 15 February, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
Inconsistency of cross-validation for structure learning in Gaussian graphical models
Authors:
Zhao Lyu,
Wai Ming Tai,
Mladen Kolar,
Bryon Aragam
Abstract:
Despite numerous years of research into the merits and trade-offs of various model selection criteria, obtaining robust results that elucidate the behavior of cross-validation remains a challenging endeavor. In this paper, we highlight the inherent limitations of cross-validation when employed to discern the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator for the neighborhood of a node within a Gaussian graphical model, optimized using a prediction oracle, misidentifies the neighborhood. Our results pertain to both undirected and directed acyclic graphs, encompassing general, sparse covariance structures. To support our theoretical findings, we conduct an empirical investigation of this inconsistency by contrasting our outcomes with other commonly used information criteria through an extensive simulation study. Given that many algorithms designed to learn the structure of graphical models require hyperparameter selection, the precise calibration of this hyperparameter is paramount for accurately estimating the inherent structure. Consequently, our observations shed light on this widely recognized practical challenge.
Submitted 28 December, 2023;
originally announced December 2023.
-
Toward Spatial Temporal Consistency of Joint Visual Tactile Perception in VR Applications
Authors:
Fuqiang Zhao,
Kehan Zhang,
Qian Liu,
Zhuoyi Lyu
Abstract:
With the development of VR technology, especially the emergence of the metaverse concept, the integration of visual and tactile perception has become an expected experience in human-machine interaction. Therefore, achieving spatial-temporal consistency of visual and tactile information in VR applications has become a necessary factor for realizing this experience. State-of-the-art vibrotactile datasets generally contain temporal-level vibrotactile information collected by randomly sliding on the surface of an object, along with the corresponding image of the material/texture. However, they lack the position/spatial information corresponding to the signal acquisition, making it difficult to achieve spatiotemporal alignment of visual-tactile data. Therefore, in this paper we develop a new data acquisition system that can collect visual and vibrotactile signals of different textures/materials with spatial and temporal consistency. In addition, we develop a VR-based application called "V-Touching" by leveraging the dataset generated by the new acquisition system, which can provide pixel-to-taxel joint visual-tactile perception when sliding over the surface of objects in the virtual environment, with distinct vibrotactile feedback for different textures/materials.
Submitted 28 December, 2023; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Revisiting Knowledge Distillation under Distribution Shift
Authors:
Songming Zhang,
Ziyu Lyu,
Xiaofeng Chen
Abstract:
Knowledge distillation transfers knowledge from large models into small models and has recently made remarkable achievements. However, few studies have investigated the mechanism of knowledge distillation under distribution shift. Distribution shift refers to drift in the data distribution between the training and testing phases. In this paper, we reconsider the paradigm of knowledge distillation by reformulating the objective function in shift situations. Considering real scenarios, we propose a unified and systematic framework to benchmark knowledge distillation against two general distributional shifts: diversity shift and correlation shift. The evaluation benchmark covers more than 30 methods from algorithmic, data-driven, and optimization perspectives on five benchmark datasets. Overall, we conduct extensive experiments on the student model. We reveal intriguing observations of poor teaching performance under distribution shifts; in particular, complex algorithms and data augmentation offer limited gains in many cases.
Submitted 7 January, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
CLadder: Assessing Causal Reasoning in Language Models
Authors:
Zhijing Jin,
Yuen Chen,
Felix Leeb,
Luigi Gresele,
Ojasv Kamal,
Zhiheng Lyu,
Kevin Blin,
Fernando Gonzalez Adauto,
Max Kleiman-Weiner,
Mrinmaya Sachan,
Bernhard Schölkopf
Abstract:
The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordance with a set of well-defined formal rules. To address this, we propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al. We compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), we obtain symbolic questions and ground-truth answers through an oracle causal inference engine. These are then translated into natural language. We evaluate multiple LLMs on our dataset, and we introduce and evaluate a bespoke chain-of-thought prompting strategy, CausalCoT. We show that our task is highly challenging for LLMs, and we conduct an in-depth analysis to gain deeper insights into the causal reasoning abilities of LLMs. Our data is open-sourced at https://huggingface.co/datasets/causalNLP/cladder, and our code can be found at https://github.com/causalNLP/cladder.
Submitted 17 January, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks
Authors:
Zhongyuan Lyu,
Ting Li,
Dong Xia
Abstract:
In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method consisting of a tensor-based initialization algorithm involving both node and sample splitting, followed by a refinement procedure based on a likelihood-based Lloyd algorithm. Network clustering must be accompanied by node community detection. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under MMSBM. Numerical simulations and real data experiments both validate that our method outperforms existing methods. Oftentimes, the edges of networks carry count-type weights. We then extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixtures of discrete distributions, including Binomial, Poisson, and multi-layer Poisson networks. The minimax optimal clustering error rates in these discrete mixtures all take the same exponential form characterized by the Renyi divergences. These optimal clustering error rates in discrete mixtures can also be achieved by our proposed two-stage clustering algorithm.
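The abstract names a likelihood-based Lloyd algorithm as the refinement stage. As a rough illustration only, the sketch below runs generic Lloyd-style iterations for a plain mixture of independent Poisson count vectors, assuming the initial labels are supplied by some earlier step. It does not reproduce the authors' tensor-based initialization, node/sample splitting, or multi-layer network model, and the function names `poisson_loglik` and `lloyd_refine` are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_loglik(x: Sequence[int], rates: Sequence[float]) -> float:
    """Poisson log-likelihood of a count vector under per-coordinate rates.

    The log(x_i!) term is dropped because it is the same for every cluster.
    """
    eps = 1e-12  # avoid log(0) when a cluster's estimated rate is zero
    return sum(xi * math.log(max(r, eps)) - r for xi, r in zip(x, rates))


def lloyd_refine(data: List[List[int]], labels: List[int], k: int, max_iter: int = 20) -> List[int]:
    """Likelihood-based Lloyd iterations for a Poisson mixture.

    Alternates between (1) estimating each cluster's rate vector as the mean
    of its members (the Poisson MLE) and (2) reassigning every sample to the
    cluster with the highest log-likelihood, until labels stop changing.
    """
    labels = list(labels)
    dim = len(data[0])
    for _ in range(max_iter):
        rates = []
        for j in range(k):
            members = [x for x, z in zip(data, labels) if z == j]
            if members:
                rates.append([sum(col) / len(members) for col in zip(*members)])
            else:
                rates.append([0.0] * dim)  # empty cluster: assumed zero rates
        new_labels = [
            max(range(k), key=lambda j: poisson_loglik(x, rates[j])) for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Two well-separated groups of count vectors; the third sample starts mislabeled.
counts = [[0, 1], [1, 0], [1, 1], [9, 10], [10, 9], [11, 10]]
print(lloyd_refine(counts, [0, 0, 1, 1, 1, 1], k=2))  # [0, 0, 0, 1, 1, 1]
```

Each pass is a coordinate-ascent step on the classification log-likelihood (rates given labels, then labels given rates), so the objective never decreases. This is why a reasonable initialization followed by a few refinement passes is a common two-stage pattern.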
Submitted 27 November, 2023;
originally announced November 2023.
-
Point Cloud Pre-training with Diffusion Models
Authors:
Xiao Zheng,
Xiaoshui Huang,
Guofeng Mei,
Yuenan Hou,
Zhaoyang Lyu,
Bo Dai,
Wanli Ouyang,
Yongshun Gong
Abstract:
Pre-training a model and then fine-tuning it on downstream tasks has demonstrated significant success in the 2D image and NLP domains. However, due to the unordered and non-uniform density characteristics of point clouds, it is non-trivial to explore the prior knowledge of point clouds and pre-train a point cloud backbone. In this paper, we propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif). We consider the point cloud pre-training task as a conditional point-to-point generation problem and introduce a conditional point generator. This generator aggregates the features extracted by the backbone and employs them as the condition to guide the point-to-point recovery from the noisy point cloud, thereby assisting the backbone in capturing both local and global geometric priors as well as the global point density distribution of the object. We also present a recurrent uniform sampling optimization strategy, which enables the model to uniformly recover from various noise levels and learn from balanced supervision. Our PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection. Specifically, PointDif attains 70.0% mIoU on S3DIS Area 5 for the segmentation task and achieves an average improvement of 2.4% on ScanObjectNN for the classification task compared to TAP. Furthermore, our pre-training framework can be flexibly applied to diverse point cloud backbones and bring considerable gains.
Submitted 25 November, 2023;
originally announced November 2023.
-
An Overview on IEEE 802.11bf: WLAN Sensing
Authors:
Rui Du,
Haocheng Hua,
Hailiang Xie,
Xianxin Song,
Zhonghao Lyu,
Mengshi Hu,
Narengerile,
Yan Xin,
Stephen McCann,
Michael Montemurro,
Tony Xiao Han,
Jie Xu
Abstract:
With recent advancements, wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, WLAN standards are developed mainly for the purpose of communication and thus may not be able to meet the stringent requirements of emerging sensing applications. To resolve this issue, a new Task Group (TG), namely IEEE 802.11bf, has been established by the IEEE 802.11 working group, with the objective of creating a new amendment to the WLAN standard to meet advanced sensing requirements while minimizing the effect on communications. This paper provides a comprehensive overview of the up-to-date efforts in the IEEE 802.11bf TG. First, we introduce the definition of the 802.11bf amendment and its formation and standardization timeline. Next, we discuss the WLAN sensing use cases with the corresponding key performance indicator (KPI) requirements. After reviewing previous WLAN sensing research based on communication-oriented WLAN standards, we identify their limitations and underscore the practical need for the new sensing-oriented amendment in 802.11bf. Furthermore, we discuss the WLAN sensing framework and procedure used for measurement acquisition, considering both sensing at sub-7 GHz and directional multi-gigabit (DMG) sensing at 60 GHz, and address their shared features, similarities, and differences. In addition, we present various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, and quantization and compression techniques. We also describe the methodologies and the channel modeling used by the IEEE 802.11bf TG for evaluation. Finally, we discuss the challenges and future research directions in detail to motivate more research endeavors in this field.
Submitted 20 October, 2023;
originally announced October 2023.
-
Collaborative Visual Place Recognition
Authors:
Yiming Li,
Zonglin Lyu,
Mingxuan Lu,
Chao Chen,
Michael Milford,
Chen Feng
Abstract:
Visual place recognition (VPR) capabilities enable autonomous robots to navigate complex environments by discovering the environment's topology based on visual input. Most research efforts focus on enhancing the accuracy and robustness of single-robot VPR but often encounter issues such as occlusion due to individual viewpoints. Despite a number of studies on multi-robot metric-based localization, there is a notable gap in research concerning more robust and efficient place-based localization with a multi-robot system. This work proposes collaborative VPR, where multiple robots share abstracted visual features to enhance place recognition capabilities. We also introduce a novel collaborative VPR framework based on similarity-regularized information fusion, reducing irrelevant noise while harnessing valuable data from collaborators. This framework seamlessly integrates with well-established single-robot VPR techniques and supports end-to-end training with a weakly-supervised contrastive loss. We conduct experiments in urban, rural, and indoor scenes, achieving a notable improvement over single-agent VPR in urban environments (~12%), along with consistent enhancements in rural (~3%) and indoor (~1%) scenarios. Our work presents a promising solution to the pressing challenges of VPR, representing a substantial step towards safe and robust autonomous systems.
Submitted 9 October, 2023;
originally announced October 2023.
-
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
Authors:
Xinqi Lin,
Jingwen He,
Ziyan Chen,
Zhaoyang Lyu,
Bo Dai,
Fanghua Yu,
Wanli Ouyang,
Yu Qiao,
Chao Dong
Abstract:
We present DiffBIR, a general restoration pipeline that can handle different blind image restoration tasks in a unified framework. DiffBIR decouples the blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. The two stages are developed independently but work seamlessly in a cascaded manner. In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results. For the second stage, we propose IRControlNet, which leverages the generative ability of latent diffusion models to generate realistic details. Specifically, IRControlNet is trained on specially produced condition images without distracting noisy content for stable generation performance. Moreover, we design a region-adaptive restoration guidance that can modify the denoising process during inference without model re-training, allowing users to balance realness and fidelity through a tunable guidance scale. Extensive experiments have demonstrated DiffBIR's superiority over state-of-the-art approaches for blind image super-resolution, blind face restoration, and blind image denoising tasks on both synthetic and real-world datasets. The code is available at https://github.com/XPixelGroup/DiffBIR.
Submitted 12 April, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
TFDNet: Time-Frequency Enhanced Decomposed Network for Long-term Time Series Forecasting
Authors:
Yuxiao Luo,
Ziyu Lyu,
Xingyu Huang
Abstract:
Long-term time series forecasting is a vital task with a wide range of real-world applications. Recent methods focus on capturing the underlying patterns from a single domain (e.g., the time domain or the frequency domain) and have not taken a holistic view to process long-term time series from the time-frequency domain. In this paper, we propose a Time-Frequency Enhanced Decomposed Network (TFDNet) to capture both the long-term underlying patterns and temporal periodicity from the time-frequency domain. In TFDNet, we devise a multi-scale time-frequency enhanced encoder backbone and develop two separate trend and seasonal time-frequency blocks to capture the distinct patterns within the decomposed trend and seasonal components at multiple resolutions. We explore diverse kernel learning strategies for the kernel operations in the time-frequency blocks by investigating and incorporating the potentially different channel-wise correlation patterns of multivariate time series. Experimental evaluation on eight datasets from five benchmark domains demonstrates that TFDNet is superior to state-of-the-art approaches in both effectiveness and efficiency.
Submitted 25 August, 2023;
originally announced August 2023.
-
MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR
Authors:
Xudong Xu,
Zhaoyang Lyu,
Xingang Pan,
Bo Dai
Abstract:
Based on powerful text-to-image diffusion models, text-to-3D generation has made significant progress in generating compelling geometry and appearance. However, existing methods still struggle to recover high-fidelity object materials, either only considering Lambertian reflectance or failing to disentangle BRDF materials from the environment lights. In this work, we propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER), which leverages a novel latent BRDF auto-encoder for material generation. We train this auto-encoder with large-scale real-world BRDF collections and ensure the smoothness of its latent space, which implicitly acts as a natural distribution of materials. During appearance modeling in text-to-3D generation, the latent BRDF embeddings, rather than BRDF parameters, are predicted via a material network. Through exhaustive experiments, our approach demonstrates superiority over existing methods in generating realistic and coherent object materials. Moreover, high-quality materials naturally enable multiple downstream tasks such as relighting and material editing. Code and model will be publicly available at https://sheldontsui.github.io/projects/Matlaber.
Submitted 17 August, 2023;
originally announced August 2023.
-
Bijective Density-Equalizing Quasiconformal Map for Multiply-Connected Open Surfaces
Authors:
Zhiyuan Lyu,
Gary P. T. Choi,
Lok Ming Lui
Abstract:
This paper proposes a novel method for computing bijective density-equalizing quasiconformal (DEQ) flattening maps for multiply-connected open surfaces. In conventional density-equalizing maps, shape deformations are solely driven by prescribed constraints on the density distribution, defined as the population per unit area, while the bijectivity and local geometric distortions of the mappings are uncontrolled. Also, prior methods have primarily focused on simply-connected open surfaces rather than surfaces with more complicated topologies. Our proposed method overcomes these issues by formulating the density diffusion process as a quasiconformal flow, which allows us to effectively control the local geometric distortion and guarantee the bijectivity of the mapping by solving an energy minimization problem involving the Beltrami coefficient of the mapping. To achieve an optimal parameterization of multiply-connected surfaces, we develop an iterative scheme that optimizes both the shape of the target planar circular domain and the density-equalizing quasiconformal map onto it. In addition, landmark constraints can be incorporated into our proposed method for consistent feature alignment. The method can also be naturally applied to simply-connected open surfaces. By changing the prescribed population, a large variety of surface flattening maps with different desired properties can be achieved. The method is tested on both synthetic and real examples, demonstrating its efficacy in various applications in computer graphics and medical imaging.
Submitted 15 August, 2023; v1 submitted 10 August, 2023;
originally announced August 2023.
-
Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community
Authors:
Qingyao Ai,
Ting Bai,
Zhao Cao,
Yi Chang,
Jiawei Chen,
Zhumin Chen,
Zhiyong Cheng,
Shoubin Dong,
Zhicheng Dou,
Fuli Feng,
Shen Gao,
Jiafeng Guo,
Xiangnan He,
Yanyan Lan,
Chenliang Li,
Yiqun Liu,
Ziyu Lyu,
Weizhi Ma,
Jun Ma,
Zhaochun Ren,
Pengjie Ren,
Zhiqiang Wang,
Mingwen Wang,
Ji-Rong Wen,
Le Wu
, et al. (8 additional authors not shown)
Abstract:
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play the central role of demanders and evaluators of the reliability of information services. Nevertheless, significant challenges remain, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community held a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.
Submitted 26 July, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Free dcpo-algebras via directed spaces
Authors:
Yuxu Chen,
Hui Kou,
Zhenchao Lyu
Abstract:
Directed spaces are natural topological extensions of dcpos in domain theory and form a cartesian closed category. We show that the D-completions of free algebras over a Scott space $\Sigma L$, in the context of directed spaces, are exactly the free dcpo-algebras over the dcpo $L$, which reveals the close connection between directed powerspaces and powerdomains. Using this result, we provide a uniform topological representation of the upper, lower, and convex powerdomains of dcpos.
Submitted 29 June, 2023;
originally announced June 2023.
-
Can Large Language Models Infer Causation from Correlation?
Authors:
Zhijing Jin,
Jiarui Liu,
Zhiheng Lyu,
Spencer Poff,
Mrinmaya Sachan,
Rada Mihalcea,
Mona Diab,
Bernhard Schölkopf
Abstract:
Causal inference is one of the hallmarks of human intelligence. While the field of CausalNLP has attracted much interest in recent years, existing causal inference datasets in NLP primarily rely on discovering causality from empirical knowledge (e.g., commonsense knowledge). In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs). Specifically, we formulate a novel task, Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables. We curate a large-scale dataset of more than 200K samples, on which we evaluate seventeen existing LLMs. Through our experiments, we identify a key shortcoming of LLMs in terms of their causal inference skills, and show that these models achieve close to random performance on the task. This shortcoming is somewhat mitigated when we try to re-purpose LLMs for this skill via finetuning, but we find that these models still fail to generalize: they can only perform causal inference in in-distribution settings, when variable names and textual expressions used in the queries are similar to those in the training set, but fail in out-of-distribution settings generated by perturbing these queries. Corr2Cause is a challenging task for LLMs, and can help guide future research on improving LLMs' pure reasoning skills and generalizability. Our data is at https://huggingface.co/datasets/causalnlp/corr2cause. Our code is at https://github.com/causalNLP/corr2cause.
Submitted 17 April, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Voices of Her: Analyzing Gender Differences in the AI Publication World
Authors:
Yiwen Ding,
Jiarui Liu,
Zhiheng Lyu,
Kun Zhang,
Bernhard Schoelkopf,
Zhijing Jin,
Rada Mihalcea
Abstract:
While several previous studies have analyzed gender bias in research, a comprehensive analysis of gender differences in the AI community, covering diverse topics and different development trends, is still missing. Using the AI Scholar dataset of 78K researchers in the field of AI, we identify several gender differences: (1) Although female researchers tend to have fewer overall citations than males, this citation difference does not hold for all academic-age groups; (2) There exists large gender homophily in co-authorship on AI papers; (3) Female first-authored papers show distinct linguistic styles, such as longer text, more positive emotion words, and more catchy titles than male first-authored papers. Our analysis provides a window into the current demographic trends in our AI community, and encourages more gender equality and diversity in the future. Our code and data are at https://github.com/causalNLP/ai-scholar-gender.
Submitted 23 May, 2023;
originally announced May 2023.
-
Backpropagation-Free 4D Continuous Ant-Based Neural Topology Search
Authors:
AbdElRahman ElSaid,
Karl Ricanek,
Zeming Lyu,
Alexander Ororbia,
Travis Desell
Abstract:
Continuous Ant-based Topology Search (CANTS) is a previously introduced nature-inspired neural architecture search (NAS) algorithm based on ant colony optimization (ACO). CANTS uses a continuous search space to indirectly encode a neural architecture search space. Synthetic ant agents explore CANTS' continuous search space based on the density and distribution of pheromones, strongly inspired by how ants move in the real world. This continuous search space allows CANTS to automate the design of artificial neural networks (ANNs) of any size, removing a key limitation inherent to many current NAS algorithms, which must operate within structures of a size predetermined by the user. This work expands CANTS by adding a fourth dimension to its search space, representing potential neural synaptic weights. Adding this extra dimension allows CANTS agents to optimize both the architecture and the weights of an ANN without applying backpropagation (BP), which significantly reduces the time consumed by the optimization process: an average reduction of at least 96%, with very competitive, if not better, optimization performance. Experiments on real-world data demonstrate that the BP-Free CANTS algorithm exhibits highly competitive performance compared to both CANTS and ANTS while requiring significantly less operation time.
Submitted 30 January, 2024; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Psychologically-Inspired Causal Prompts
Authors:
Zhiheng Lyu,
Zhijing Jin,
Justus Mattern,
Rada Mihalcea,
Mrinmaya Sachan,
Bernhard Schoelkopf
Abstract:
NLP datasets are richer than just input-output pairs; rather, they carry causal relations between the input and output variables. In this work, we take sentiment classification as an example and look into the causal relations between the review (X) and sentiment (Y). As psychology studies show that language can affect emotion, different psychological processes are evoked when a person first makes a rating and then self-rationalizes their feeling in a review (where the sentiment causes the review, i.e., Y -> X), versus when they first describe their experience and weigh the pros and cons to give a final rating (where the review causes the sentiment, i.e., X -> Y). Furthermore, a completely different psychological process occurs if an annotator infers the original rating of the user by theory of mind (ToM) (where the review causes the rating, i.e., X -ToM-> Y). In this paper, we verbalize these three causal mechanisms of human psychological processes in sentiment classification into three different causal prompts, and study (1) how differently they perform, and (2) what nature of sentiment classification data leads to agreement or diversity in the model responses elicited by the prompts. We suggest that future work raise awareness of different causal structures in NLP tasks. Our code and data are at https://github.com/cogito233/psych-causal-prompt
Submitted 2 May, 2023;
originally announced May 2023.
-
Semantic Communications for Image Recovery and Classification via Deep Joint Source and Channel Coding
Authors:
Zhonghao Lyu,
Guangxu Zhu,
Jie Xu,
Bo Ai,
Shuguang Cui
Abstract:
With the recent advancements in edge artificial intelligence (AI), future sixth-generation (6G) networks need to support new AI tasks such as classification and clustering in addition to data recovery. Motivated by the success of deep learning, semantic-aware and task-oriented communications with deep joint source and channel coding (JSCC) have emerged as a paradigm shift in 6G away from conventional data-oriented communications with separate source and channel coding (SSCC). However, most existing works design deep JSCC for a single task, either data recovery or AI task execution, and the resulting designs cannot be transferred to other unintended tasks. In contrast, this paper investigates JSCC semantic communications that support multi-task services by performing image data recovery and classification simultaneously. First, we propose a new end-to-end deep JSCC framework that unifies coding rate reduction maximization and mean square error (MSE) minimization in the loss function. Here, coding rate reduction maximization facilitates learning discriminative features so that classification can be performed directly in the feature space, and MSE minimization helps learn informative features for high-quality image data recovery. Next, to further improve robustness against time-varying wireless channels, we propose a new gated deep JSCC design, in which a gated net adaptively prunes the output features to adjust their dimensions based on channel conditions. Finally, we present extensive numerical experiments to validate the performance of our proposed deep JSCC designs compared to various benchmark schemes.
Submitted 5 April, 2023;
originally announced April 2023.
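The JSCC abstract above combines coding rate reduction maximization (for classification) with MSE minimization (for image recovery) in one loss. A minimal NumPy sketch of such a loss follows, assuming the coding-rate-reduction form used in the maximal coding rate reduction (MCR²) line of work; the helper names, the precision `eps`, and the trade-off weight `lam` are hypothetical and not taken from the paper.

```python
import numpy as np


def coding_rate(Z: np.ndarray, eps: float) -> float:
    """R(Z, eps) = 1/2 * logdet(I + d / (m * eps^2) * Z Z^T) for features Z of shape (d, m)."""
    d, m = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * (Z @ Z.T))
    return 0.5 * logdet


def coding_rate_reduction(Z: np.ndarray, labels: np.ndarray, eps: float = 0.5) -> float:
    """Delta R = R(Z) - sum_j (m_j / m) * R(Z_j): large when the whole feature set
    spans a large volume while each class is compact."""
    m = Z.shape[1]
    compressed = 0.0
    for c in np.unique(labels):
        Zj = Z[:, labels == c]                        # features of class c
        compressed += (Zj.shape[1] / m) * coding_rate(Zj, eps)
    return coding_rate(Z, eps) - compressed


def joint_loss(Z, labels, x, x_hat, lam=1.0, eps=0.5) -> float:
    """Hypothetical multi-task loss: maximize Delta R, minimize reconstruction MSE."""
    mse = float(np.mean((x - x_hat) ** 2))
    return -coding_rate_reduction(Z, labels, eps) + lam * mse


# Toy usage: 16-dim features for 60 samples from 3 classes (random stand-ins).
rng = np.random.default_rng(0)
Z = rng.normal(size=(16, 60))
Z /= np.linalg.norm(Z, axis=0, keepdims=True)       # unit-norm features, as is common for MCR^2
labels = np.repeat([0, 1, 2], 20)
x = rng.uniform(size=(60, 32))
print(joint_loss(Z, labels, x, x + 0.01))
```

By concavity of log det, Delta R is never negative, and it is exactly zero when every sample carries the same label, which is why maximizing it pushes classes apart in feature space.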
-
Generative Diffusion Prior for Unified Image Restoration and Enhancement
Authors:
Ben Fei,
Zhaoyang Lyu,
Liang Pan,
Junzhe Zhang,
Weidong Yang,
Tianyue Luo,
Bo Zhang,
Bo Dai
Abstract:
Existing image restoration methods mostly leverage the posterior distribution of natural images. However, they often assume known degradation and also require supervised training, which restricts their adaptation to complex real applications. In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. GDP utilizes a pre-trained denoising diffusion probabilistic model (DDPM) for solving linear inverse, non-linear, or blind problems. Specifically, GDP systematically explores a protocol of conditional guidance, which is verified to be more practical than the commonly used guidance approach. Furthermore, GDP excels at optimizing the parameters of the degradation model during the denoising process, achieving blind image restoration. Besides, we devise hierarchical guidance and patch-based methods, enabling GDP to generate images of arbitrary resolutions. Experimentally, we demonstrate GDP's versatility on several image datasets for linear problems, such as super-resolution, deblurring, inpainting, and colorization, as well as non-linear and blind problems, such as low-light enhancement and HDR image recovery. GDP outperforms the current leading unsupervised methods on the diverse benchmarks in reconstruction quality and perceptual quality. Moreover, GDP also generalizes well to natural images or synthesized images with arbitrary sizes from various tasks outside the distribution of the ImageNet training set.
Submitted 3 April, 2023;
originally announced April 2023.
-
Clo(o)k: A Clock That Looks
Authors:
Zhuoyue Lyu
Abstract:
What if a clock could do more than just tell time - what if it could actually see? This paper delves into the conceptualization, design, and construction of a timepiece with visual perception capabilities, featuring three applications that expand the possibilities of human-time interaction. Insights from an Open House showcase are also shared, highlighting the unique user experiences of this device.
Submitted 25 March, 2023;
originally announced March 2023.
-
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
Authors:
Zhaoyang Lyu,
Jinyi Wang,
Yuwei An,
Ya Zhang,
Dahua Lin,
Bo Dai
Abstract:
Mesh generation is of great value in various applications involving computer graphics and virtual content, yet designing generative models for meshes is challenging due to their irregular data structure and the inconsistent topology of meshes in the same category. In this work, we design a novel sparse latent point diffusion model for mesh generation. Our key insight is to regard point clouds as an intermediate representation of meshes, and model the distribution of point clouds instead. Since meshes can be generated from point clouds via techniques like Shape as Points (SAP), the challenges of directly generating meshes can be effectively avoided. To boost the efficiency and controllability of our mesh generation method, we propose to further encode point clouds to a set of sparse latent points with point-wise semantically meaningful features, where two DDPMs are trained in the space of sparse latent points to model, respectively, the distribution of the latent point positions and the features at these latent points. We find that sampling in this latent space is faster than directly sampling dense point clouds. Moreover, the sparse latent points also enable us to explicitly control both the overall structures and local details of the generated meshes. Extensive experiments are conducted on the ShapeNet dataset, where our proposed sparse latent point diffusion model achieves superior performance in terms of generation quality and controllability when compared to existing methods.
Submitted 14 March, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
-
Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting
Authors:
Zimeng Lyu,
Alexander Ororbia,
Travis Desell
Abstract:
Time series forecasting (TSF) is one of the most important tasks in data science, given that accurate time series (TS) predictive models play a major role across a wide variety of domains including finance, transportation, health care, and power systems. Real-world utilization of machine learning (ML) typically involves (pre-)training models on collected, historical data and then applying them to unseen data points. However, in real-world applications, time series data streams are usually non-stationary, and trained ML models eventually face the problem of data or concept drift.
To address this issue, models must be periodically retrained or redesigned, which takes significant human and computational resources. Additionally, historical data for retraining or redesigning the model may not even exist. As a result, it is highly desirable that models are designed and trained in an online fashion. This work presents the Online NeuroEvolution-based Neural Architecture Search (ONE-NAS) algorithm, a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks. Without any pre-training, ONE-NAS utilizes populations of RNNs that are continuously updated with new network structures and weights in response to new multivariate input data. ONE-NAS is tested on real-world, large-scale multivariate wind turbine data as well as the univariate Dow Jones Industrial Average (DJIA) dataset. Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods, including online linear regression, fixed long short-term memory (LSTM) and gated recurrent unit (GRU) models trained online, as well as state-of-the-art online ARIMA strategies.
Submitted 20 February, 2023;
originally announced February 2023.
-
rMultiNet: An R Package For Multilayer Networks Analysis
Authors:
Ting Li,
Zhongyuan Lyu,
Chenyu Ren,
Dong Xia
Abstract:
This paper develops an R package, rMultiNet, to analyze multilayer network data. We provide two general frameworks from recent literature, i.e., the mixture multilayer stochastic block model (MMSBM) and the mixture multilayer latent space model (MMLSM), to generate multilayer networks. We also provide several methods to reveal the embeddings of both nodes and layers, followed by further data analysis methods such as clustering. Three real data examples are processed in the package. The source code of rMultiNet is available at https://github.com/ChenyuzZZ73/rMultiNet.
Submitted 8 February, 2023;
originally announced February 2023.
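The rMultiNet abstract above mentions generating multilayer networks from a mixture multilayer stochastic block model (MMSBM). The Python sketch below illustrates that generative idea under stated assumptions: each layer has a hidden type, each type has its own node community assignment and community-level edge-probability matrix, and edges are undirected Bernoulli draws without self-loops. It is not the R package's API; `sample_mmsbm` and its arguments are hypothetical.

```python
import numpy as np


def sample_mmsbm(n, layer_types, memberships, B_list, rng=None):
    """Sample an undirected multilayer network from a mixture multilayer SBM (sketch).

    n           -- number of nodes shared by all layers
    layer_types -- layer_types[l] is the (hidden) type of layer l
    memberships -- memberships[t][i] is node i's community in layers of type t
    B_list      -- B_list[t][a, b] is the edge probability between communities a and b
    Returns an (L, n, n) 0/1 adjacency tensor: symmetric, no self-loops.
    """
    rng = np.random.default_rng(rng)
    A = np.zeros((len(layer_types), n, n), dtype=np.int8)
    for l, t in enumerate(layer_types):
        z = np.asarray(memberships[t])
        P = np.asarray(B_list[t])[np.ix_(z, z)]           # n x n edge probabilities
        upper = np.triu(rng.random((n, n)) < P, k=1)      # draw each node pair once
        A[l] = (upper | upper.T).astype(np.int8)          # mirror to make it undirected
    return A


# Two layer types with different community structures (toy parameters).
z0 = np.repeat([0, 1], 30)                 # type 0: two communities
z1 = np.repeat([0, 1, 2], 20)              # type 1: three communities
B0 = np.array([[0.5, 0.05], [0.05, 0.5]])
B1 = np.full((3, 3), 0.05) + 0.45 * np.eye(3)
A = sample_mmsbm(60, [0, 0, 1, 1, 1], [z0, z1], [B0, B1], rng=0)
print(A.shape)  # (5, 60, 60)
```

Clustering such a tensor then means recovering both the layer types and the per-type node communities, which is the kind of embedding-plus-clustering analysis the abstract describes.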
-
A note on the category of c-spaces
Authors:
Z. Lyu,
X. Xie,
H. Kou
Abstract:
We prove that the category of c-spaces with continuous maps is not cartesian closed. As a corollary the category of locally finitary compact spaces with continuous maps is also not cartesian closed.
Submitted 18 March, 2023; v1 submitted 23 November, 2022;
originally announced November 2022.
-
MetaMax: Improved Open-Set Deep Neural Networks via Weibull Calibration
Authors:
Zongyao Lyu,
Nolan B. Gutierrez,
William J. Beksi
Abstract:
Open-set recognition refers to the problem in which classes that were not seen during training appear at inference time. This requires the ability to identify instances of novel classes while maintaining discriminative capability for closed-set classification. OpenMax was the first deep neural network-based approach to address open-set recognition by calibrating the predictive scores of a standard closed-set classification network. In this paper we present MetaMax, a more effective post-processing technique that improves upon contemporary methods by directly modeling class activation vectors. MetaMax removes the need for computing class mean activation vectors (MAVs) and distances between a query image and a class MAV as required in OpenMax. Experimental results show that MetaMax outperforms OpenMax and is comparable in performance to other state-of-the-art approaches.
Submitted 20 November, 2022;
originally announced November 2022.
-
Pushing AI to Wireless Network Edge: An Overview on Integrated Sensing, Communication, and Computation towards 6G
Authors:
Guangxu Zhu,
Zhonghao Lyu,
Xiang Jiao,
Peixi Liu,
Mingzhe Chen,
Jie Xu,
Shuguang Cui,
Ping Zhang
Abstract:
Pushing artificial intelligence (AI) from the central cloud to the network edge has reached broad consensus in both industry and academia for materializing the vision of the artificial intelligence of things (AIoT) in the sixth-generation (6G) era. This gives rise to an emerging research area known as edge intelligence, which concerns the distillation of human-like intelligence from the huge amount of data scattered at the wireless network edge. In general, realizing edge intelligence corresponds to the process of sensing, communication, and computation, which are coupled ingredients for data generation, exchange, and processing, respectively. However, conventional wireless networks design sensing, communication, and computation separately in a task-agnostic manner, which makes it difficult to accommodate the stringent demands of ultra-low latency, ultra-high reliability, and high capacity in emerging AI applications such as autonomous driving. This prompts a new design paradigm of seamless integrated sensing, communication, and computation (ISCC) in a task-oriented manner, which comprehensively accounts for the use of the data in the downstream AI applications. In view of the growing interest in ISCC, this article provides a timely overview of ISCC for edge intelligence by introducing its basic concept, design challenges, and enabling techniques, surveying the state-of-the-art development, and shedding light on the road ahead.
Submitted 4 November, 2022;
originally announced November 2022.
-
Analysis of Estimating the Bayes Rule for Gaussian Mixture Models with a Specified Missing-Data Mechanism
Authors:
Ziyang Lyu
Abstract:
Semi-supervised learning (SSL) approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations, as introduced by Ahfock and McLachlan (2020). We show that in a partially classified sample, a classifier using the Bayes rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model, especially with moderate to low overlap and proportion of missing class labels, or with large overlap but few missing labels. It also outperforms a classifier with no missing-data mechanism regardless of the overlap region or the proportion of missing class labels. Our exploration of two- and three-component normal mixture models with unequal covariances through simulations further corroborates our findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
Submitted 29 December, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
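The abstract above evaluates classifiers built on the Bayes rule of allocation in a two-class normal homoscedastic model. The sketch below shows only that rule, with known class means, common covariance, and prior; estimating those parameters from a partially classified sample under a missingness model, which is the paper's actual subject, is not shown. The function name is hypothetical.

```python
import numpy as np


def bayes_rule_two_class(x, mu1, mu2, Sigma, pi1=0.5):
    """Bayes rule of allocation for two normal classes with common covariance.

    Allocates to class 1 when the log posterior odds
        log(pi1 / pi2) + (x - (mu1 + mu2) / 2)^T Sigma^{-1} (mu1 - mu2)
    is positive, else to class 2. Returns an array of labels in {1, 2}.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    beta = np.linalg.solve(Sigma, mu1 - mu2)             # Sigma^{-1} (mu1 - mu2)
    beta0 = np.log(pi1 / (1.0 - pi1)) - 0.5 * (mu1 + mu2) @ beta
    scores = np.atleast_2d(x) @ beta + beta0             # log posterior odds, class 1 vs 2
    return np.where(scores > 0, 1, 2)


mu1, mu2, Sigma = [1.0, 0.0], [-1.0, 0.0], np.eye(2)
print(bayes_rule_two_class([[0.5, 0.0], [-0.5, 0.0]], mu1, mu2, Sigma))  # [1 2]
```

Because the covariance is shared, the quadratic terms cancel and the rule is linear in `x`; a larger prior `pi1` shifts the boundary toward class 2's mean.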
-
FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks
Authors:
Syed Asad Rizvi,
Nazreen Pallikkavaliyaveetil,
David Zhang,
Zhuoyang Lyu,
Nhi Nguyen,
Haoran Lyu,
Benjamin Christensen,
Josue Ortega Caro,
Antonio H. O. Fonseca,
Emanuele Zappala,
Maryam Bagherian,
Christopher Averill,
Chadi G. Abdallah,
Amin Karbasi,
Rex Ying,
Maria Brbic,
Rahul Madhav Dhodapkar,
David van Dijk
Abstract:
Foundation models have achieved remarkable success across many domains, relying on pretraining over vast amounts of data. Graph-structured data often lacks the same scale as unstructured data, making the development of graph foundation models challenging. In this work, we propose Foundation-Informed Message Passing (FIMP), a Graph Neural Network (GNN) message-passing framework that leverages pretrained non-textual foundation models in graph-based tasks. We show that the self-attention layers of foundation models can effectively be repurposed on graphs to perform cross-node attention-based message-passing. Our model is evaluated on a real-world image network dataset and two biological applications (single-cell RNA sequencing data and fMRI brain activity recordings) in both finetuned and zero-shot settings. FIMP outperforms strong baselines, demonstrating that it can effectively leverage state-of-the-art foundation models in graph tasks.
Submitted 1 July, 2024; v1 submitted 17 October, 2022;
originally announced October 2022.
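The FIMP abstract above repurposes foundation-model self-attention layers for cross-node, attention-based message passing. A minimal single-head sketch of that idea follows, under several assumptions not stated in the abstract: each node is a set of token embeddings, the receiving node's tokens act as queries over a neighbor's tokens, messages are mean-aggregated, and a residual update is applied. Random matrices stand in for pretrained attention weights; all names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)          # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_node_attention(H_i, H_j, Wq, Wk, Wv):
    """Tokens of receiving node i (queries) attend to tokens of neighbor j (keys/values)."""
    Q, K, V = H_i @ Wq, H_j @ Wk, H_j @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (T_i, T_j) scaled dot products
    return softmax(scores, axis=-1) @ V                # (T_i, d) message from j to i


def message_passing_step(H, edges, Wq, Wk, Wv):
    """H: list of per-node token matrices (T_i, d); edges: (src, dst) pairs."""
    incoming = {i: [] for i in range(len(H))}
    for src, dst in edges:
        incoming[dst].append(cross_node_attention(H[dst], H[src], Wq, Wk, Wv))
    out = [h.copy() for h in H]
    for i, msgs in incoming.items():
        if msgs:                                        # nodes without neighbors stay unchanged
            out[i] = H[i] + np.mean(msgs, axis=0)      # residual update, mean aggregation
    return out


rng = np.random.default_rng(0)
d = 8
H = [rng.normal(size=(t, d)) for t in (4, 6, 5)]      # nodes with different token counts
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
H_new = message_passing_step(H, [(0, 1), (2, 1), (1, 0)], Wq, Wk, Wv)
print([h.shape for h in H_new])  # [(4, 8), (6, 8), (5, 8)]
```

Attending token-to-token across nodes, rather than pooling each node to one vector first, is what lets a pretrained self-attention layer be reused unchanged as the message function in this sketch.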
-
An Overview on Over-the-Air Federated Edge Learning
Authors:
Xiaowen Cao,
Zhonghao Lyu,
Guangxu Zhu,
Jie Xu,
Lexi Xu,
Shuguang Cui
Abstract:
Over-the-air federated edge learning (Air-FEEL) has emerged as a promising solution to support edge artificial intelligence (AI) in future beyond 5G (B5G) and 6G networks. In Air-FEEL, distributed edge devices use their local data to collaboratively train AI models while preserving data privacy, in which over-the-air model/gradient aggregation is exploited to enhance learning efficiency. This article provides an overview of the state of the art in Air-FEEL. First, we present the basic principle of Air-FEEL, and introduce the technical challenges for Air-FEEL design due to over-the-air aggregation errors, as well as the resource and data heterogeneities at edge devices. Next, we present the fundamental performance metrics for Air-FEEL, and review resource management solutions and design considerations for enhancing Air-FEEL performance. Finally, several interesting research directions are pointed out to motivate future work.
Submitted 11 August, 2022;
originally announced August 2022.
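The Air-FEEL abstract above relies on over-the-air gradient aggregation, where simultaneous analog transmissions superpose on the channel. The toy sketch below illustrates one common scheme, channel-inversion power control, with a real-valued gradient and complex flat-fading channels; the scaling factor `eta`, the noise model, and the function name are assumptions for illustration, not the article's design.

```python
import numpy as np


def over_the_air_aggregate(grads, h, eta, noise_std, rng=None):
    """Toy over-the-air averaging with channel inversion (sketch).

    grads     -- (K, D) real local gradients, one row per device
    h         -- (K,) complex channel coefficients
    eta       -- common scaling factor chosen by the server
    noise_std -- std of complex Gaussian receiver noise
    Each device pre-scales by eta / h_k, so the received sum is eta * sum_k g_k + noise.
    """
    rng = np.random.default_rng(rng)
    K, D = grads.shape
    tx = (eta / h)[:, None] * grads                      # per-device transmit signals
    noise = noise_std * (rng.normal(size=D) + 1j * rng.normal(size=D)) / np.sqrt(2)
    y = np.sum(h[:, None] * tx, axis=0) + noise          # signals superpose "in the air"
    return np.real(y) / (K * eta)                        # estimate of the average gradient


rng = np.random.default_rng(0)
K, D = 10, 5
grads = rng.normal(size=(K, D))
h = (rng.normal(size=K) + 1j * rng.normal(size=K)) / np.sqrt(2)   # Rayleigh-fading stand-ins
est = over_the_air_aggregate(grads, h, eta=1.0, noise_std=0.1, rng=1)
print(np.abs(est - grads.mean(axis=0)).max())
```

The aggregation error is the receiver noise divided by `K * eta`, so a larger `eta` helps; however, device k's transmit power grows like (eta / |h_k|)^2, so the weakest channel limits `eta`. This is one source of the aggregation-error and resource-management trade-offs the article surveys.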
-
Optimal Clustering by Lloyd Algorithm for Low-Rank Mixture Model
Authors:
Zhongyuan Lyu,
Dong Xia
Abstract:
This paper investigates the computational and statistical limits in clustering matrix-valued observations. We propose a low-rank mixture model (LrMM), adapted from the classical Gaussian mixture model (GMM) to treat matrix-valued observations, which assumes low-rankness for population center matrices. A computationally efficient clustering method is designed by integrating Lloyd's algorithm and low-rank approximation. Once well-initialized, the algorithm converges quickly and achieves an exponential-type clustering error rate that is minimax optimal. Meanwhile, we show that a tensor-based spectral method delivers a good initial clustering. As in GMM, the minimax optimal clustering error rate is determined by the separation strength, i.e., the minimal distance between population center matrices. By exploiting low-rankness, the proposed algorithm is blessed with a weaker requirement on the separation strength. Unlike GMM, however, the computational difficulty of LrMM is characterized by the signal strength, i.e., the smallest non-zero singular value of the population center matrices. Evidence is provided showing that no polynomial-time algorithm is consistent if the signal strength is not strong enough, even when the separation strength is strong. Intriguing differences between estimation and clustering under LrMM are discussed. The merits of the low-rank Lloyd's algorithm are confirmed by comprehensive simulation experiments. Finally, our method outperforms others in the literature on real-world datasets.
Submitted 6 June, 2023; v1 submitted 10 July, 2022;
originally announced July 2022.
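The abstract above integrates Lloyd's algorithm with low-rank approximation for clustering matrix-valued observations. A minimal sketch of that combination follows: each cluster center is the member average truncated to rank `r` by SVD, and each observation is reassigned to the nearest center in Frobenius norm. The sketch assumes the initial labels are given; the paper's tensor-based spectral initialization is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def low_rank_project(M, r):
    """Best rank-r approximation of M (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def low_rank_lloyd(X, labels, r, n_iter=20):
    """X: (n, p, q) matrix observations; labels: initial cluster labels in {0..K-1}."""
    labels = np.asarray(labels).copy()
    K = labels.max() + 1
    centers = np.zeros((K,) + X.shape[1:])
    for _ in range(n_iter):
        for k in range(K):
            members = X[labels == k]
            if len(members):                            # keep the old center if a cluster empties
                centers[k] = low_rank_project(members.mean(axis=0), r)
        dists = ((X[:, None] - centers[None]) ** 2).sum(axis=(2, 3))   # squared Frobenius
        new = dists.argmin(axis=1)
        if np.array_equal(new, labels):                 # assignments stable: converged
            break
        labels = new
    return labels, centers


# Toy data: two clusters with rank-1 centers +C0 and -C0 plus Gaussian noise.
rng = np.random.default_rng(1)
p = q = 10
u = rng.normal(size=p); u /= np.linalg.norm(u)
v = rng.normal(size=q); v /= np.linalg.norm(v)
C0 = 8 * np.outer(u, v)
truth = np.repeat([0, 1], 20)
X = np.where(truth[:, None, None] == 0, C0, -C0) + rng.normal(size=(40, p, q))
init = truth.copy(); init[:5] = 1; init[20:25] = 0     # 25% of initial labels wrong
labels, centers = low_rank_lloyd(X, init, r=1)
print((labels == truth).mean())
```

The rank truncation denoises each center estimate, which is the intuition behind the weaker separation requirement that the abstract attributes to exploiting low-rankness.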
-
Guided Diffusion Model for Adversarial Purification
Authors:
Jinyi Wang,
Zhaoyang Lyu,
Dahua Lin,
Bo Dai,
Hongfei Fu
Abstract:
With the wider application of deep neural networks (DNNs) in various algorithms and frameworks, security threats have become a major concern. Adversarial attacks disturb DNN-based image classifiers: attackers can intentionally add imperceptible adversarial perturbations to input images to fool the classifiers. In this paper, we propose a novel purification approach, referred to as guided diffusion model for purification (GDMP), to help protect classifiers from adversarial attacks. The core of our approach is to embed purification into the diffusion denoising process of a Denoising Diffusion Probabilistic Model (DDPM), so that its diffusion process can submerge the adversarial perturbations under gradually added Gaussian noise, and both kinds of noise can be removed simultaneously by a guided denoising process. In comprehensive experiments across various datasets, GDMP is shown to reduce the perturbations introduced by adversarial attacks to a small range, thereby significantly improving classification accuracy. GDMP improves the robust accuracy by 5%, obtaining 90.1% under PGD attack on the CIFAR10 dataset. Moreover, GDMP achieves 70.94% robustness on the challenging ImageNet dataset.
Submitted 28 June, 2022; v1 submitted 30 May, 2022;
originally announced May 2022.
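The GDMP abstract above relies on the DDPM forward process to "submerge" adversarial perturbations under added Gaussian noise. The sketch below shows only that forward step, x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, under assumptions: the linear beta schedule from 1e-4 to 0.02 over 1000 steps (the standard DDPM choice), a random stand-in image, a toy sign perturbation, and a hypothetical diffusion depth `t`. The guided reverse denoising requires a trained model and is not shown.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar[t] = prod_{s<=t} (1 - beta_s) for a linear beta schedule (0-indexed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng=None):
    """Forward diffusion q(x_t | x_0): sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(np.shape(x0))
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps


alpha_bar = linear_alpha_bar()
rng = np.random.default_rng(0)
x_clean = rng.uniform(-1, 1, size=(3, 32, 32))                      # stand-in image in [-1, 1]
delta = (16 / 255) * np.sign(rng.standard_normal(x_clean.shape))   # toy L-inf perturbation
x_adv = x_clean + delta
t = 100                                                             # hypothetical depth
x_t = q_sample(x_adv, t, alpha_bar, rng=1)
# Scaled perturbation amplitude vs. per-pixel std of the injected noise:
print(np.sqrt(alpha_bar[t]) * np.abs(delta).max(), np.sqrt(1 - alpha_bar[t]))
```

Since the forward step is linear in x_0, the perturbation survives only as sqrt(alpha_bar_t) * delta, which at this depth is several times smaller than the injected noise's per-pixel standard deviation; the denoiser then removes both together.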