
Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy

Authors: Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang

Abstract: Diplomacy is one of the most sophisticated activities in human society. The complex interactions among multiple parties/agents involve various abilities like social reasoning, negotiation arts, and long-term strategy planning. Previous AI agents have proved their capability of handling multi-step games and larger action spaces on tasks involving multiple agents. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. Recently, LLM agents have shown their potential for extending the boundary of previous agents on a couple of applications; however, this is still not enough to handle a very long planning period in a complex multi-agent environment. Empowered with cutting-edge LLM technology, we make the first stab to explore AI's upper bound towards a human-like agent for such a highly comprehensive multi-agent mission by combining three core and essential capabilities for stronger LLM-based societal agents: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; 3) augmenting memory by self-play games to self-evolve without any human in the loop.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.06432 [pdf, other]

SYM3D: Learning Symmetric Triplanes for Better 3D-Awareness of GANs

Authors: Jing Yang, Kyle Fogarty, Fangcheng Zhong, Cengiz Oztireli

Abstract: Despite the growing success of 3D-aware GANs, which can be trained on 2D images to generate high-quality 3D assets, they still rely on multi-view images with camera annotations to synthesize sufficient details from all viewing directions. However, the scarce availability of calibrated multi-view image datasets, especially in comparison to single-view images, has limited the potential of 3D GANs. Moreover, while bypassing camera pose annotations with a camera distribution constraint reduces dependence on exact camera parameters, it still struggles to generate a consistent orientation of 3D assets. To this end, we propose SYM3D, a novel 3D-aware GAN designed to leverage the prevalent reflectional symmetry structure found in natural and man-made objects, alongside a proposed view-aware spatial attention mechanism in learning the 3D representation. We evaluate SYM3D on both synthetic (ShapeNet Chairs, Cars, and Airplanes) and real-world datasets (ABO-Chair), demonstrating its superior performance in capturing detailed geometry and texture, even when trained on only single-view images. Finally, we demonstrate the effectiveness of incorporating symmetry regularization in helping reduce artifacts in the modeling of 3D assets in the text-to-3D task.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 11

arXiv:2405.12069 [pdf, other]

Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping

Authors: Tianhao Wu, Jing Yang, Zhilin Guo, Jingyi Wan, Fangcheng Zhong, Cengiz Oztireli

Abstract: By equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of the fundamental limitation of Gaussians and point clouds -- each Gaussian or point can only have a single directional radiance without spatial variance, therefore an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. To properly render the body texture for each view and pose without accurate geometry or UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show our method achieves superior reconstruction quality and robustness in both self and cross reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method of our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subject.

Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: Project Page: https://gaussian-head-shoulders.netlify.app/

arXiv:2405.11231 [pdf, ps, other]

Nucleation and growth manifest universal scaling, surely

Authors: Fan Zhong

Abstract: When a system is brought to a metastable state, nuclei of the equilibrium phase form and grow. This is the well-known nucleation and growth of first-order phase transitions. Near a critical point of a continuous phase transition, critical phenomena such as critical opalescence characterized by universal scaling emerge. These two sets of behavior are so completely different that it might appear absurd to ask whether nucleation and growth exhibit even a slight universal scaling. Here we show that universal scaling is indeed manifested in standard nucleation and growth by introducing a non-universal initial time to a recently proposed method. This generalizes the method and provides a different perspective for studying universal scaling behavior and hence universality classes of nucleation and growth.

Submitted 25 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: 5 pages, 2 figures

arXiv:2405.02890 [pdf]

Vacancies making jerky flow in complex alloys

Authors: Zhida Liang, Fengxian Liu, Li Wang, Zihan You, Fanqi Zhong, Alan Cocks, Florian Pyczak

Abstract: Longevity of materials, especially alloys, is crucial for enhancing the sustainability and efficiency of various applications, including gas turbines. Jerky flow, also known as the dynamic strain aging effect, can have a significant impact on the fatigue life of high-temperature components in gas turbines. In general, three jerky flow types, i.e., A, B and C, exist in superalloys. Types A and B, occurring at low temperature, were proved to be caused by interstitial elements, such as carbon. However, the origin of Type C serration at high temperature, which is generally agreed to be caused by the interaction between solute elements and dislocations, has not been verified directly and remains unresolved. In this study, our new discovery challenges this mainstream axiom. We propose that vacancies, instead of solute atoms, play a dominant role in inducing jerky flow. By transmission electron microscopy, we observed directly that dislocations are pinned by vacancy-type dislocation loops (PDLs) or dipoles. Our findings suggest that vacancies located at dislocation jogs are primarily responsible for pinning and unpinning of superlattice dislocations. As dislocations move and interact with vacancies, PDLs or dipoles are left behind in their path. This pinning and unpinning process is repeated as successive dislocations encounter the newly formed PDLs or dipoles, leading to the recurring fluctuations in the stress, i.e., serrated flow. Additionally, antisite defects created by Co, Cr and Ti sitting on Ni positions are postulated to facilitate vacancy formation and migration towards dislocations via nearest neighbor jumps. This study provides a new avenue to show how future alloy design can improve the fatigue life of superalloys in extreme environments.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.01839 [pdf, other]

SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

Authors: Qian Long, Fangwei Zhong, Mingdong Wu, Yizhou Wang, Song-Chun Zhu

Abstract: Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most of the multi-agent systems cannot easily handle them, due to the complexity of the state and task space. The social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's intrinsic motivation, referring to the social force. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To non-trivially model the social forces, we further introduce a data-driven method, where we employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into the widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they demonstrate transferability across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they are scalable with the increasing number of agents.

Submitted 3 May, 2024; originally announced May 2024.

Comments: AAAI 2024 Cooperative Multi-Agent Systems Decision-Making and Learning (CMASDL) Workshop

arXiv:2404.09857 [pdf, other]

Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL

Authors: Fangwei Zhong, Kui Wu, Hai Ci, Churan Wang, Hao Chen

Abstract: Embodied visual tracking is to follow a target object in dynamic 3D environments using an agent's egocentric vision. This is a vital and challenging skill for embodied agents. However, existing methods suffer from inefficient training and poor generalization. In this paper, we propose a novel framework that combines visual foundation models (VFM) and offline reinforcement learning (offline RL) to empower embodied visual tracking. We use a pre-trained VFM, such as "Tracking Anything", to extract semantic segmentation masks with text prompts. We then train a recurrent policy network with offline RL, e.g., Conservative Q-Learning, to learn from the collected demonstrations without online agent-environment interactions. To further improve the robustness and generalization of the policy network, we also introduce a mask re-targeting mechanism and a multi-level data collection strategy. In this way, we can train a robust tracker within an hour on a consumer-level GPU, e.g., Nvidia RTX 3090. Such efficiency is unprecedented for RL-based visual tracking methods. We evaluate our tracker on several high-fidelity environments with challenging situations, such as distraction and occlusion. The results show that our agent outperforms state-of-the-art methods in terms of sample efficiency, robustness to distractors, and generalization to unseen scenarios and targets. We also demonstrate the transferability of the learned tracker from the virtual world to real-world scenarios.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.00219 [pdf, ps, other]

Complete universal scaling in first-order phase transitions

Authors: Fan Zhong

Abstract: Phase transitions and critical phenomena are among the most intriguing phenomena in nature and society. They are classified as first-order phase transitions (FOPTs) and continuous ones. While the latter show marvelous phenomena of scaling and universality, whether the former behaves similarly is a long-standing controversial issue. Here we definitely demonstrate complete universal scaling in field driven FOPTs for Langevin equations in both zero and two spatial dimensions by rescaling all parameters and subtracting extra contributions with singular dimensions from an effective temperature and a special field according to an effective theory that possesses a fixed point of a seemingly un-physical imaginary value. This offers a perspective different from the usual nucleation and growth but conforming to continuous phase transitions to study FOPTs.

Submitted 29 March, 2024; originally announced April 2024.

Comments: 6 pages, 1 figure

arXiv:2402.02468 [pdf, other]

Fast Peer Adaptation with Context-aware Exploration

Authors: Long Ma, Yuanfei Wang, Fangwei Zhong, Song-Chun Zhu, Yizhou Wang

Abstract: Fast adapting to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to efficiently probe and identify the peer's strategy, as this is the prerequisite for carrying out the best response in adaptation. However, it is difficult to explore the strategies of unknown peers, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer over the historical context, such as the observation over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.12737 [pdf]

doi 10.1515/nanoph-2023-0754

Controlling thermal emission with metasurfaces and its applications

Authors: Qiongqiong Chu, Fan Zhong, Xiaohe Shang, Ye Zhang, Shining Zhu, Hui Liu

Abstract: Thermal emission caused by the thermal motion of the charged particles is commonly broadband, un-polarized, and incoherent, like a melting pot of electromagnetic waves, which makes it unsuitable for infrared applications in many cases requiring specific thermal emission properties. Metasurfaces, characterized by two-dimensional subwavelength artificial nanostructures, have been extensively investigated for their flexibility in tuning optical properties, which provide an ideal platform for shaping thermal emission. Recently, remarkable progress was achieved not only in tuning thermal emission in multiple degrees of freedom, such as wavelength, polarization, radiation angle, coherence, and so on but also in applications of compact and integrated optical devices. Here, we review the recent advances in the regulation of thermal emission through metasurfaces and corresponding infrared applications, such as infrared sensing, radiative cooling, and thermophotovoltaic devices.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 28 pages, 10 figures

Journal ref: Nanophotonics

arXiv:2312.04574 [pdf, other]

Differentiable Visual Computing for Inverse Problems and Machine Learning

Authors: Andrew Spielberg, Fangcheng Zhong, Konstantinos Rematas, Krishna Murthy Jatavallabhula, Cengiz Oztireli, Tzu-Mao Li, Derek Nowrouzezahrai

Abstract: Originally designed for applications in computer graphics, visual computing (VC) methods synthesize information about physical and virtual worlds, using prescribed algorithms optimized for spatial computing. VC is used to analyze geometry, physically simulate solids, fluids, and other media, and render the world via optical techniques. These fine-tuned computations that operate explicitly on a given input solve so-called forward problems, at which VC excels. By contrast, deep learning (DL) allows for the construction of general algorithmic models, sidestepping the need for a purely first principles-based approach to problem solving. DL is powered by highly parameterized neural network architectures -- universal function approximators -- and gradient-based search algorithms which can efficiently search that large parameter space for optimal models. This approach is predicated on neural network differentiability, the requirement that analytic derivatives of a given problem's task metric can be computed with respect to the neural network's parameters. Neural networks excel when an explicit model is not known, and neural network training solves an inverse problem in which a model is computed from data.

Submitted 21 November, 2023; originally announced December 2023.

arXiv:2311.15783 [pdf, other]

Hypernetworks for Generalizable BRDF Representation

Authors: Fazilet Gokbudak, Alejandro Sztrajman, Chenliang Zhou, Fangcheng Zhong, Rafal Mantiuk, Cengiz Oztireli

Abstract: In this paper, we introduce a technique to estimate measured BRDFs from a sparse set of samples. Our approach offers accurate BRDF reconstructions that are generalizable to new materials. This opens the door to BRDF reconstructions from a variety of data sources. The success of our approach relies on the ability of hypernetworks to generate a robust representation of BRDFs and a set encoder that allows us to feed inputs of different sizes to the architecture. The set encoder and the hypernetwork also enable the compression of densely sampled BRDFs. We evaluate our technique both qualitatively and quantitatively on the well-known MERL dataset of 100 isotropic materials. Our approach accurately 1) estimates the BRDFs of unseen materials even for an extremely sparse sampling, 2) compresses the measured BRDFs into very small embeddings, e.g., 7D.

Submitted 7 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.12090 [pdf, other]

FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation

Authors: Chenliang Zhou, Fangcheng Zhong, Param Hanji, Zhilin Guo, Kyle Fogarty, Alejandro Sztrajman, Hongyun Gao, Cengiz Oztireli

Abstract: We propose FrePolad: frequency-rectified point latent diffusion, a point cloud generation pipeline integrating a variational autoencoder (VAE) with a denoising diffusion probabilistic model (DDPM) for the latent distribution. FrePolad simultaneously achieves high quality, diversity, and flexibility in point cloud cardinality for generation tasks while maintaining high computational efficiency. The improvement in generation quality and diversity is achieved through (1) a novel frequency rectification module via spherical harmonics designed to retain high-frequency content while learning the point cloud distribution; and (2) a latent DDPM to learn the regularized yet complex latent distribution. In addition, FrePolad supports variable point cloud cardinality by formulating the sampling of points as conditional distributions over a latent shape distribution. Finally, the low-dimensional latent space encoded by the VAE contributes to FrePolad's fast and scalable sampling. Our quantitative and qualitative results demonstrate the state-of-the-art performance of FrePolad in terms of quality, diversity, and computational efficiency.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.12040 [pdf]

TransCDR: a deep learning model for enhancing the generalizability of cancer drug response prediction through transfer learning and multimodal data fusion for drug representation

Authors: Xiaoqiong Xia, Chaoyu Zhu, Yuqi Shan, Fan Zhong, Lei Liu

Abstract: Accurate and robust drug response prediction is of utmost importance in precision medicine. Although many models have been developed to utilize the representations of drugs and cancer cell lines for predicting cancer drug responses (CDR), their performances can be improved by addressing issues such as insufficient data modality, suboptimal fusion algorithms, and poor generalizability for novel drugs or cell lines. We introduce TransCDR, which uses transfer learning to learn drug representations and fuses multi-modality features of drugs and cell lines by a self-attention mechanism, to predict the IC50 values or sensitive states of drugs on cell lines. We are the first to systematically evaluate the generalization of the CDR prediction model to novel (i.e., never-before-seen) compound scaffolds and cell line clusters. TransCDR shows better generalizability than 8 state-of-the-art models. TransCDR outperforms its 5 variants that train drug encoders (i.e., RNN and AttentiveFP) from scratch under various scenarios. The most critical contributors among multiple drug notations and omics profiles are Extended Connectivity Fingerprint and genetic mutation. Additionally, the attention-based fusion module further enhances the predictive performance of TransCDR. TransCDR, trained on the GDSC dataset, demonstrates strong predictive performance on the external testing set CCLE. It is also utilized to predict missing CDRs on GDSC. Moreover, we investigate the biological mechanisms underlying drug response by classifying 7,675 patients from TCGA into drug-sensitive or drug-resistant groups, followed by a Gene Set Enrichment Analysis. TransCDR emerges as a potent tool with significant potential in drug response prediction. The source code and data can be accessed at https://github.com/XiaoqiongXia/TransCDR.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 8 figures

arXiv:2311.04146 [pdf, other]

Galaxy Spectra neural Network (GaSNet). II. Using Deep Learning for Spectral Classification and Redshift Predictions

Authors: Fucheng Zhong, Nicola R. Napolitano, Caroline Heneka, Rui Li, Franz Erik Bauer, Nicolas Bouche, Johan Comparat, Young-Lo Kim, Jens-Kristian Krogager, Marcella Longhetti, Jonathan Loveday, Boudewijn F. Roukema, Benedict L. Rouse, Mara Salvato, Crescenzo Tortora, Roberto J. Assef, Letizia P. Cassarà, Luca Costantin, Scott Croom, Luke J M Davies, Alexander Fritz, Guillaume Guiglion, Andrew Humphrey, Emanuela Pompei, Claudio Ricci, et al. (3 additional authors not shown)

Abstract: Large sky spectroscopic surveys have reached the scale of photometric surveys in terms of sample sizes and data complexity. These huge datasets require efficient, accurate, and flexible automated tools for data analysis and science exploitation. We present the Galaxy Spectra Network/GaSNet-II, a supervised multi-network deep learning tool for spectra classification and redshift prediction. GaSNet-II can be trained to identify a customized number of classes and optimize the redshift predictions for classified objects in each of them. It also provides redshift errors, using a network-of-networks that reproduces a Monte Carlo test on each spectrum, by randomizing their weight initialization. As a demonstration of the capability of the deep learning pipeline, we use 260k Sloan Digital Sky Survey spectra from Data Release 16, separated into 13 classes including 140k galactic, and 120k extragalactic objects. GaSNet-II achieves 92.4% average classification accuracy over the 13 classes (larger than 90% for the majority of them), and an average redshift error of approximately 0.23% for galaxies and 2.1% for quasars. We further train/test the same pipeline to classify spectra and predict redshifts for a sample of 200k 4MOST mock spectra and 21k publicly released DESI spectra. On 4MOST mock data, we reach 93.4% accuracy in 10-class classification and an average redshift error of 0.55% for galaxies and 0.3% for active galactic nuclei. On DESI data, we reach 96% accuracy in (star/galaxy/quasar only) classification and an average redshift error of 2.8% for galaxies and 4.8% for quasars, despite the small sample size available. GaSNet-II can process ~40k spectra in less than one minute, on a normal Desktop GPU. This makes the pipeline particularly suitable for real-time analyses of Stage-IV survey observations and an ideal tool for feedback loops aimed at night-by-night survey strategy optimization.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 23 pages and 31 figures. The draft has been submitted to MNRAS

arXiv:2309.07796 [pdf, other]

For A More Comprehensive Evaluation of 6DoF Object Pose Tracking

Authors: Yang Li, Fan Zhong, Xin Wang, Shuangbing Song, Jiachen Li, Xueying Qin, Changhe Tu

Abstract: Previous evaluations on 6DoF object pose tracking have presented obvious limitations along with the development of this area. In particular, the evaluation protocols are not unified for different methods, the widely-used YCBV dataset contains significant annotation error, and the error metrics also may be biased. As a result, it is hard to fairly compare the methods, which has become a big obstacle for developing new algorithms. In this paper we contribute a unified benchmark to address the above problems. For more accurate annotation of YCBV, we propose a multi-view multi-object global pose refinement method, which can jointly refine the poses of all objects and view cameras, resulting in sub-pixel sub-millimeter alignment errors. The limitations of previous scoring methods and error metrics are analyzed, based on which we introduce our improved evaluation methods. The unified benchmark takes both YCBV and BCOT as base datasets, which are shown to be complementary in scene categories. In experiments, we validate the precision and reliability of the proposed global pose refinement method with a realistic semi-synthesized dataset particularly for YCBV, and then present the benchmark results unifying learning & non-learning and RGB & RGBD methods, with some findings not discovered in previous studies.

Submitted 14 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.12452 [pdf, other]

ARF-Plus: Controlling Perceptual Factors in Artistic Radiance Fields for 3D Scene Stylization

Authors: Wenzhao Li, Tianhao Wu, Fangcheng Zhong, Cengiz Oztireli

Abstract: The radiance fields style transfer is an emerging field that has recently gained popularity as a means of 3D scene stylization, thanks to the outstanding performance of neural radiance fields in 3D reconstruction and view synthesis. We highlight a research gap in radiance fields style transfer, the lack of sufficient perceptual controllability, motivated by the existing concept in the 2D image style transfer. In this paper, we present ARF-Plus, a 3D neural style transfer framework offering manageable control over perceptual factors, to systematically explore the perceptual controllability in 3D scene stylization. Four distinct types of controls - color preservation control, (style pattern) scale control, spatial (selective stylization area) control, and depth enhancement control - are proposed and integrated into this framework. Results from real-world datasets, both quantitative and qualitative, show that the four types of controls in our ARF-Plus framework successfully accomplish their corresponding perceptual controls when stylizing 3D scenes. These techniques work well for individual style inputs as well as for the simultaneous application of multiple styles within a scene. This unlocks a realm of limitless possibilities, allowing customized modifications of stylization effects and flexible merging of the strengths of different styles, ultimately enabling the creation of novel and eye-catching stylistic effects on 3D scenes.

Submitted 6 September, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2307.09582 [pdf, other]

Guided Linear Upsampling

Authors: Shuangbing Song, Fan Zhong, Tianju Wang, Xueying Qin, Changhe Tu

Abstract: Guided upsampling is an effective approach for accelerating high-resolution image processing. In this paper, we propose a simple yet effective guided upsampling method. Each pixel in the high-resolution image is represented as a linear interpolation of two low-resolution pixels, whose indices and weights are optimized to minimize the upsampling error. The downsampling can be jointly optimized in order to prevent missing small isolated regions. Our method can be derived from the color line model and local color transformations. Compared to previous methods, our method can better preserve detail effects while suppressing artifacts such as bleeding and blurring. It is efficient, easy to implement, and free of sensitive parameters. We evaluate the proposed method with a wide range of image operators, and show its advantages through quantitative and qualitative analysis. We demonstrate the advantages of our method for both interactive image editing and real-time high-resolution video processing. In particular, for interactive editing, the joint optimization can be precomputed, thus allowing for instant feedback without hardware acceleration.

Submitted 13 July, 2023; originally announced July 2023.

Comments: ACM SIGGRAPH
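
The representation described in the abstract above (each high-resolution pixel as a linear interpolation of two low-resolution pixels) can be illustrated with a minimal Python sketch. It covers only the reconstruction step, not the optimization of indices and weights, and it is not the authors' implementation. The function name `guided_linear_upsample`, the flat list layout of the low-resolution pixels, and the per-pixel index and weight maps are all assumptions made for illustration.

```python
def guided_linear_upsample(low_res, idx_a, idx_b, weight):
    """Rebuild a high-resolution image from low-resolution pixels.

    Hypothetical data layout (an assumption, not taken from the paper):
      low_res : flat list of colour tuples, one per low-resolution pixel
      idx_a   : H x W nested list, index of the first low-res pixel
      idx_b   : H x W nested list, index of the second low-res pixel
      weight  : H x W nested list, weight in [0, 1] given to idx_a
    """
    out = []
    for row_a, row_b, row_w in zip(idx_a, idx_b, weight):
        out_row = []
        for a, b, w in zip(row_a, row_b, row_w):
            pa, pb = low_res[a], low_res[b]
            # Linear interpolation of the two selected low-res pixels,
            # applied channel by channel.
            out_row.append(tuple(w * x + (1.0 - w) * y for x, y in zip(pa, pb)))
        out.append(out_row)
    return out


# Two single-channel low-res pixels upsampled to a 2 x 2 image.
low_res = [(0.0,), (10.0,)]
idx_a = [[0, 0], [0, 0]]
idx_b = [[1, 1], [1, 1]]
weight = [[1.0, 0.5], [0.25, 0.0]]
high_res = guided_linear_upsample(low_res, idx_a, idx_b, weight)
```

In this sketch the index and weight maps are given by hand. According to the abstract, the method optimizes them to minimize the upsampling error, and for interactive editing that optimization can be precomputed, which would leave only a per-pixel blend like the one above to run at edit time.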

arXiv:2306.08943 [pdf, other]

Neural Fields with Hard Constraints of Arbitrary Differential Order

Authors: Fangcheng Zhong, Kyle Fogarty, Param Hanji, Tianhao Wu, Alejandro Sztrajman, Andrew Spielberg, Andrea Tagliasacchi, Petra Bosilj, Cengiz Oztireli

Abstract: While deep learning techniques have become extremely popular for solving a broad range of optimization problems, methods to enforce hard constraints during optimization, particularly on deep neural networks, remain underdeveloped. Inspired by the rich literature on meshless interpolation and its extension to spectral collocation methods in scientific computing, we develop a series of approaches for enforcing hard constraints on neural fields, which we refer to as Constrained Neural Fields (CNF). The constraints can be specified as a linear operator applied to the neural field and its derivatives. We also design specific model representations and training strategies for problems where standard models may encounter difficulties, such as conditioning of the system, memory consumption, and capacity of the network when being constrained. Our approaches are demonstrated in a wide range of real-world applications. Additionally, we develop a framework that enables highly efficient model and constraint specification, which can be readily applied to any downstream task where hard constraints need to be explicitly satisfied during optimization.

Submitted 29 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2305.09470 [pdf, ps, other]

Integrated Planning and Control of Robotic Surgical Instruments for Tasks Autonomy

Authors: Fangxun Zhong, Yun-Hui Liu

Abstract: Agile maneuvers are essential for robot-enabled complex tasks such as surgical procedures. Prior explorations on surgery autonomy are limited to feasibility study of completing a single task without systematically addressing generic manipulation safety across different tasks. We present an integrated planning and control framework for 6-DoF robotic instruments for pipeline automation of surgical tasks. We leverage the geometry of a robotic instrument and propose the nodal state space (NSS) to represent the robot state in SE(3) space. Each elementary robot motion could be encoded by regulation of the state parameters via a dynamical system. This theoretically ensures that every in-process trajectory is globally feasible and stably reached to an admissible target, and the controller is of closed-form without computing 6-DoF inverse kinematics. Then, to plan the motion steps reliably, we propose an interactive (instant) goal state of the robot that transforms manipulation planning through desired path constraints into a goal-varying manipulation (GVM) problem. We detail how GVM could adaptively and smoothly plan the procedure (could proceed or rewind the process as needed) based on on-the-fly situations under dynamic or disturbed environment. Finally, we extend the above policy to characterize complete pipelines of various surgical tasks. Simulations show that our framework could smoothly solve twisted maneuvers while avoiding collisions. Physical experiments using the da Vinci Research Kit (dVRK) validate the capability of automating individual tasks including tissue debridement, dissection, and wound suturing. The results confirm good task-level consistency and reliability compared to state-of-the-art automation algorithms.

Submitted 13 May, 2023; originally announced May 2023.

Comments: 28 pages

arXiv:2304.10773 [pdf, other]

doi 10.1109/LRA.2023.3272518

Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

Authors: Hongcheng Wang, Yuxuan Wang, Fangwei Zhong, Mingdong Wu, Jianwei Zhang, Yizhou Wang, Hao Dong

Abstract: Visual-audio navigation (VAN) is attracting more and more attention from the robotic community due to its broad applications, e.g., household robots and rescue robots. In this task, an embodied agent must search for and navigate to the sound source with egocentric visual and audio observations. However, the existing methods are limited in two aspects: 1) poor generalization to unheard sound categories; 2) sample inefficiency in training. Focusing on these two problems, we propose a brain-inspired plug-and-play method to learn a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We meticulously design two auxiliary tasks for respectively accelerating learning representations with the above-desired characteristics. With these two auxiliary tasks, the agent learns a spatially-correlated representation of visual and audio inputs that can be applied to work on environments with novel sounds and maps. Experiment results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when zero-shot transferred to scenes with unseen maps and unheard sound categories.

Submitted 21 June, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

Journal ref: The IEEE Robotics and Automation Letters 2023

arXiv:2304.09142 [pdf, other]

doi 10.1051/0004-6361/202346683

Cosmology with Galaxy Cluster Properties using Machine Learning

Authors: Lanlan Qiu, Nicola R. Napolitano, Stefano Borgani, Fucheng Zhong, Xiaodong Li, Mario Radovich, Weipeng Lin, Klaus Dolag, Crescenzo Tortora, Yang Wang, Rhea-Silvia Remus, Sirui Wu, Giuseppe Longo

Abstract: [Abridged] Galaxy clusters are the most massive gravitationally-bound systems in the universe and are widely considered to be an effective cosmological probe. We propose the first Machine Learning method using galaxy cluster properties to derive unbiased constraints on a set of cosmological parameters, including Omega_m, sigma_8, Omega_b, and h_0. We train the machine learning model with mock catalogs including "measured" quantities from Magneticum multi-cosmology hydrodynamical simulations, like gas mass, gas bolometric luminosity, gas temperature, stellar mass, cluster radius, total mass, velocity dispersion, and redshift, and correctly predict all parameters with uncertainties of the order of ~14% for Omega_m, ~8% for sigma_8, ~6% for Omega_b, and ~3% for h_0. This first test is exceptionally promising, as it shows that machine learning can efficiently map the correlations in the multi-dimensional space of the observed quantities to the cosmological parameter space and narrow down the probability that a given sample belongs to a given cosmological parameter combination. In the future, these ML tools can be applied to cluster samples with multi-wavelength observations from surveys like LSST, CSST, Euclid, Roman in optical and near-infrared bands, and eROSITA in X-rays, to constrain both the cosmology and the effect of the baryonic feedback.

Submitted 12 November, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: 19 pages, 20 figures. Revised version after the referee report. Resubmitted to A&A

Journal ref: A&A 687, A1 (2024)

arXiv:2304.08822 [pdf, other]

doi 10.1177/02783649231198900

Modal-graph 3D shape servoing of deformable objects with raw point clouds

Authors: Bohan Yang, Congying Sui, Fangxun Zhong, Yun-Hui Liu

Abstract: Deformable object manipulation (DOM) with point clouds has great potential as non-rigid 3D shapes can be measured without detecting and tracking image features. However, robotic shape control of deformable objects with point clouds is challenging due to: the unknown point-wise correspondences and the noisy partial observability of raw point clouds; the modeling difficulties of the relationship between point clouds and robot motions. To tackle these challenges, this paper introduces a novel modal-graph framework for the model-free shape servoing of deformable objects with raw point clouds. Unlike the existing works studying the object's geometry structure, our method builds a low-frequency deformation structure for the DOM system, which is robust to the measurement irregularities. The built modal representation and graph structure enable us to directly extract low-dimensional deformation features from raw point clouds. Such extraction requires no extra point processing of registrations, refinements, and occlusion removal. Moreover, to shape the object using the extracted features, we design an adaptive robust controller which is proved to be input-to-state stable (ISS) without offline learning or identifying both the physical and geometric object models. Extensive simulations and experiments are conducted to validate the effectiveness of our method for linear, planar, tubular, and solid objects under different settings.

Submitted 28 November, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: This paper has been accepted by The International Journal of Robotics Research. SAGE copyright

arXiv:2304.06002 [pdf]

Fast vehicle detection algorithm based on lightweight YOLO7-tiny

Authors: Bo Li, YiHua Chen, Hao Xu, Fei Zhong

Abstract: The swift and precise detection of vehicles plays a significant role in intelligent transportation systems. Current vehicle detection algorithms encounter challenges of high computational complexity, low detection rate, and limited feasibility on mobile devices. To address these issues, this paper proposes a lightweight vehicle detection algorithm based on YOLOv7-tiny (You Only Look Once version seven) called Ghost-YOLOv7. The width of the model is scaled to 0.5 and the standard convolution of the backbone network is replaced with Ghost convolution to achieve a lighter network and improve the detection speed; then a self-designed Ghost bi-directional feature pyramid network (Ghost-BiFPN) is embedded into the neck network to enhance the feature extraction capability of the algorithm and enrich semantic information; a Ghost Decoupled Head (GDH) is employed for accurate prediction of vehicle location and type; finally, a coordinate attention mechanism is introduced into the output layer to suppress environmental interference. The WIoU loss function is employed to further enhance the detection accuracy. Ablation experiment results on the PASCAL VOC dataset demonstrate that Ghost-YOLOv7 outperforms the original YOLOv7-tiny model. It achieves a 29.8% reduction in computation, a 37.3% reduction in the number of parameters, a 35.1% reduction in model weights, a 1.1% higher mean average precision (mAP), and a detection speed 27 FPS higher than the original algorithm. Ghost-YOLOv7 was also compared on the KITTI and BIT-vehicle datasets, and the results show that this algorithm has the overall best performance.

Submitted 17 April, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.03623 [pdf, other]

RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking

Authors: Fangwei Zhong, Xiao Bi, Yudi Zhang, Wei Zhang, Yizhou Wang

Abstract: Active Object Tracking (AOT) aims to maintain a specific relation between the tracker and object(s) by autonomously controlling the motion system of a tracker given observations. AOT has wide-ranging applications, such as in mobile robots and autonomous driving. However, building a generalizable active tracker that works robustly across different scenarios remains a challenge, especially in unstructured environments with cluttered obstacles and diverse layouts. We argue that constructing a state representation capable of modeling the geometry structure of the surroundings and the dynamics of the target is crucial for achieving this goal. To address this challenge, we present RSPT, a framework that forms a structure-aware motion representation by Reconstructing the Surroundings and Predicting the target Trajectory. Additionally, we enhance the generalization of the policy network by training in an asymmetric dueling mechanism. We evaluate RSPT on various simulated scenarios and show that it outperforms existing methods in unseen environments, particularly those with complex obstacles and layouts. We also demonstrate the successful transfer of RSPT to real-world settings. Project Website: https://sites.google.com/view/aot-rspt.

Submitted 7 April, 2023; originally announced April 2023.

Comments: AAAI 2023 (Oral)

arXiv:2303.10083 [pdf, other]

$αあるふぁ$Surf: Implicit Surface Reconstruction for Semi-Transparent and Thin Objects with Decoupled Geometry and Opacity

Authors: Tianhao Wu, Hanxue Liang, Fangcheng Zhong, Gernot Riegler, Shimon Vainer, Cengiz Oztireli

Abstract: Implicit surface representations such as the signed distance function (SDF) have emerged as a promising approach for image-based surface reconstruction. However, existing optimization methods assume solid surfaces and are therefore unable to properly reconstruct semi-transparent surfaces and thin structures, which also exhibit low opacity due to the blending effect with the background. While neural radiance field (NeRF) based methods can model semi-transparency and achieve photo-realistic quality in synthesized novel views, their volumetric geometry representation tightly couples geometry and opacity, and therefore cannot be easily converted into surfaces without introducing artifacts. We present $αあるふぁ$Surf, a novel surface representation with decoupled geometry and opacity for the reconstruction of semi-transparent and thin surfaces where the colors mix. Ray-surface intersections on our representation can be found in closed-form via analytical solutions of cubic polynomials, which avoids Monte-Carlo sampling and is fully differentiable by construction. Our qualitative and quantitative evaluations show that our approach can accurately reconstruct surfaces with semi-transparent and thin parts with fewer artifacts, achieving better reconstruction quality than state-of-the-art SDF and NeRF methods. Website: https://alphasurf.netlify.app/

Submitted 17 March, 2023; originally announced March 2023.

arXiv:2303.03964 [pdf, other]

doi 10.1109/TVCG.2023.3238821

Force-Directed Graph Layouts Revisited: A New Force Based on the T-Distribution

Authors: Fahai Zhong, Mingliang Xue, Jian Zhang, Fan Zhang, Rui Ban, Oliver Deussen, Yunhai Wang

Abstract: In this paper, we propose the t-FDP model, a force-directed placement method based on a novel bounded short-range force (t-force) defined by Student's t-distribution. Our formulation is flexible, exerts limited repulsive forces for nearby nodes and can be adapted separately in its short- and long-range effects. Using such forces in force-directed graph layouts yields better neighborhood preservation than current methods, while maintaining low stress errors. Our efficient implementation using a Fast Fourier Transform is one order of magnitude faster than state-of-the-art methods and two orders faster on the GPU, enabling us to perform parameter tuning by globally and locally adjusting the t-force in real-time for complex graphs. We demonstrate the quality of our approach by numerical evaluation against state-of-the-art approaches and extensions for interactive exploration.

Submitted 4 March, 2023; originally announced March 2023.

Comments: To appear in IEEE Transactions on Visualization and Computer Graphics

arXiv:2303.03767 [pdf, other]

Proactive Multi-Camera Collaboration For 3D Human Pose Estimation

Authors: Hai Ci, Mickel Liu, Xuehai Pan, Fangwei Zhong, Yizhou Wang

Abstract: This paper presents a multi-agent reinforcement learning (MARL) scheme for proactive Multi-Camera Collaboration in 3D Human Pose Estimation in dynamic human crowds. Traditional fixed-viewpoint multi-camera solutions for human motion capture (MoCap) are limited in capture space and susceptible to dynamic occlusions. Active camera approaches proactively control camera poses to find optimal viewpoints for 3D reconstruction. However, current methods still face challenges with credit assignment and environment dynamics. To address these issues, our proposed method introduces a novel Collaborative Triangulation Contribution Reward (CTCR) that improves convergence and alleviates multi-agent credit assignment issues resulting from using 3D reconstruction accuracy as the shared reward. Additionally, we jointly train our model with multiple world dynamics learning tasks to better capture environment dynamics and encourage anticipatory behaviors for occlusion avoidance. We evaluate our proposed method in four photo-realistic UE4 environments to ensure validity and generalizability. Empirical results show that our method outperforms fixed and active baselines in various scenarios with different numbers of cameras and humans.

Submitted 7 March, 2023; originally announced March 2023.

Comments: ICLR 2023 poster

arXiv:2212.11076 [pdf, ps, other]

doi 10.1088/1402-4896/acdcc0

Theory of Critical Phenomena with Long-Range Temporal Interaction

Authors: Shaolong Zeng, Fan Zhong

Abstract: We develop a systematic theory for the critical phenomena with memory in all spatial dimensions, including $d < d_c$, $d = d_c$, and $d > d_c$, the upper critical dimension. We show that the Hamiltonian plays a unique role in dynamics and the dimensional constant $\mathfrak{d}_t$ that embodies the intimate relationship between space and time is the fundamental ingredient of the theory. However, its value varies with the space dimension continuously and vanishes exactly at $d=4$, reflecting reasonably the variation of the amount of the temporal dimension that is transferred to the spatial one with the strength of fluctuations. Such variations of the temporal dimension save all scaling laws though the fluctuation-dissipation theorem is violated. Various new universality classes emerge.

Submitted 8 May, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

Comments: 32 pages, 1 figure. Version 2: 41 pages, 1 figure. Clarify the origin of the memory

Journal ref: Physica Scripta 98, 075017 (2023)

arXiv:2212.08641 [pdf, other]

GFPose: Learning 3D Human Pose Prior with Gradient Fields

Authors: Hai Ci, Mingdong Wu, Wentao Zhu, Xiaoxuan Ma, Hao Dong, Fangwei Zhong, Yizhou Wang

Abstract: Learning 3D human pose prior is essential to human-centered AI. Here, we present GFPose, a versatile framework to model plausible 3D human poses for various applications. At the core of GFPose is a time-dependent score network, which estimates the gradient on each body joint and progressively denoises the perturbed 3D human pose to match a given task specification. During the denoising process, GFPose implicitly incorporates pose priors in gradients and unifies various discriminative and generative tasks in an elegant framework. Despite the simplicity, GFPose demonstrates great potential in several downstream tasks. Our experiments empirically show that 1) as a multi-hypothesis pose estimator, GFPose outperforms existing SOTAs by 20% on the Human3.6M dataset; 2) as a single-hypothesis pose estimator, GFPose achieves comparable results to deterministic SOTAs, even with a vanilla backbone; 3) GFPose is able to produce diverse and realistic samples in pose denoising, completion and generation tasks. Project page: https://sites.google.com/view/gfpose/

Submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.14738 [pdf, other]

Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction

Authors: Ruofeng Wei, Bin Li, Hangjie Mo, Fangxun Zhong, Yonghao Long, Qi Dou, Yun-Hui Liu, Dong Sun

Abstract: Estimating precise metric depth and scene reconstruction from monocular endoscopy is a fundamental task for surgical navigation in robotic surgery. However, traditional stereo matching adopts binocular images to perceive the depth information, which is difficult to transfer to the soft robotics-based surgical systems due to the use of monocular endoscopy. In this paper, we present a novel framework that combines robot kinematics and monocular endoscope images with deep unsupervised learning into a single network for metric depth estimation and then achieves 3D reconstruction of complex anatomy. Specifically, we first obtain the relative depth maps of surgical scenes by leveraging a brightness-aware monocular depth estimation method. Then, the corresponding endoscope poses are computed based on non-linear optimization of geometric and photometric reprojection residuals. Afterwards, we develop a Depth-driven Sliding Optimization (DDSO) algorithm to extract the scaling coefficient from kinematics and calculated poses offline. By coupling the metric scale and relative depth data, we form a robust ensemble that represents the metric and consistent depth. Next, we treat the ensemble as supervisory labels to train a metric depth estimation network for surgeries (i.e., MetricDepthS-Net) that distills the embeddings from the robot kinematics, endoscopic videos, and poses. With accurate metric depth estimation, we utilize a dense visual reconstruction method to recover the 3D structure of the whole surgical site. We have extensively evaluated the proposed framework on public SCARED and achieved comparable performance with stereo-based depth estimation methods. Our results demonstrate the feasibility of the proposed approach to recover the metric depth and 3D structure with monocular inputs.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2210.12505 [pdf, other]

Sommerfeld effect in freeze-in dark matter

Authors: Fucheng Zhong, Xinyu Wang

Abstract: If two annihilation products of dark matter (DM) particles are non-relativistic and coupled to a light force mediator, their plane wave functions are modified due to multiple exchanges of the force mediators. This gives rise to the Sommerfeld effect (SE). We consider the attractive and repulsive force SE on the relic density in different phases of freeze-in DM. We find that in the pure freeze-in region, the attractive/repulsive force SE slightly increases/decreases DM relic density by less than $20\%$ for TeV-scale DM. In the reannihilation region, if the portal coupling $κかっぱ$ is sufficiently large (by comparing the portal reaction rate to the Hubble rate), DM density will reach its equilibrium, and subsequently freeze out. Compared to the case without the SE, the presence of the attractive SE leads to an enlarged cross-section. As a result, a higher equilibrium value of DM density is reached, and a lower relic density is obtained after the subsequent freeze-out. However, the repulsive SE has the opposite influence. In the dark sector (DS) freeze-out region, also known as the middle flat plateau or "mesa" in the phase diagram, the SE has a significant impact on DM relic abundance. In this region, the attractive SE suppresses DM relic density by simultaneously enlarging the cross-section of the portal and DS internal interaction. In contrast, the repulsive SE will have the opposite effect. Finally, in the usual freeze-out region, DM relic density is suppressed or enhanced by an enlarged or reduced cross-section of the portal, respectively, due to the presence of the attractive or repulsive SE. In summary, when considering the constraint of producing correct DM relic abundance, the inclusion of SE in the portal reaction or DS internal reaction will modify the model parameters, resulting in a band-like possible parameter space.

Submitted 8 October, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: 28 pages, 11 figures. Comments/suggestions are welcome

arXiv:2210.03919 [pdf, other]

CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation

Authors: Chenliang Zhou, Fangcheng Zhong, Cengiz Oztireli

Abstract: Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily computed and adapted, and smoothly incorporated into any CLIP-based image manipulation algorithm. To demonstrate the effectiveness of our method, we conduct several theoretical and empirical studies. As a case study, we utilize the method for text-guided semantic face editing. We quantitatively and qualitatively demonstrate that PAE facilitates a more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy.

Submitted 7 May, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

arXiv:2209.00853 [pdf, other]

TarGF: Learning Target Gradient Field to Rearrange Objects without Explicit Goal Specification

Authors: Mingdong Wu, Fangwei Zhong, Yulong Xia, Hao Dong

Abstract: Object Rearrangement is to move objects from an initial state to a goal state. Here, we focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution without explicit goal specification. However, it remains challenging for AI agents, as it is hard to describe the target distribution (goal specification) for reward engineering or collect expert trajectories as demonstrations. Hence, it is infeasible to directly employ reinforcement learning or imitation learning algorithms to address the task. This paper aims to search for a policy only with a set of examples from a target distribution instead of a handcrafted reward function. We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction on each object to increase the likelihood of the target distribution. For object rearrangement, the TarGF can be used in two ways: 1) For model-based planning, we can cast the target gradient into a reference control and output actions with a distributed path planner; 2) For model-free reinforcement learning, the TarGF is not only used for estimating the likelihood-change as a reward but also provides suggested actions in residual policy learning. Experimental results in ball and room rearrangement demonstrate that our method significantly outperforms the state-of-the-art methods in the quality of the terminal state, the efficiency of the control process, and scalability.

Submitted 16 January, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.13422 [pdf]

Light-YOLOv5: A Lightweight Algorithm for Improved YOLOv5 in Complex Fire Scenarios

Authors: Hao Xu, Bo Li, Fei Zhong

Abstract: Fire-detection technology is of great importance for successful fire-prevention measures. Image-based fire detection is one effective method. At present, object-detection algorithms fall short in both detection speed and accuracy when they are applied in complex fire scenarios. In this study, a lightweight fire-detection algorithm, Light-YOLOv5 (You Only Look Once version five), is presented. First, a separable vision transformer (SepViT) block is used to replace several C3 modules in the final layer of a backbone network to enhance both the connection of the backbone network to global information and the extraction of flame and smoke features; second, a light bidirectional feature pyramid network (Light-BiFPN) is designed to lighten the model while improving the feature extraction and balancing speed and accuracy during a fire-detection procedure; third, a global attention mechanism (GAM) is fused into the network to cause the model to focus more on the global dimensional features and further improve the detection accuracy of the model; and finally, the Mish activation function and SIoU loss are utilized to simultaneously increase the convergence speed and enhance the accuracy. The experimental results show that compared to the original algorithm, the mean average precision (mAP) of Light-YOLOv5 increases by 3.3%, the number of parameters decreases by 27.1%, and the floating point operations (FLOPs) decrease by 19.1%. The detection speed reaches 91.1 FPS, which allows targets in complex fire scenarios to be detected in real time.

Submitted 1 December, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

arXiv:2208.02049 [pdf, other]

AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy

Authors: Ziyi Wang, Bo Lu, Yonghao Long, Fangxun Zhong, Tak-Hong Cheung, Qi Dou, Yunhui Liu

Abstract: Computer-assisted minimally invasive surgery has great potential in benefiting modern operating theatres. The video data streamed from the endoscope provides rich information to support context-awareness for next-generation intelligent surgical systems. To achieve accurate perception and automatic manipulation during the procedure, learning-based techniques are a promising approach, having enabled advanced image analysis and scene understanding in recent years. However, learning such models relies heavily on large-scale, high-quality, multi-task labelled data. This is currently a bottleneck for the topic, as publicly available datasets are still extremely limited in the field of CAI. In this paper, we present and release the first integrated dataset (named AutoLaparo) with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery. Our AutoLaparo dataset is developed based on full-length videos of entire hysterectomy procedures. Specifically, three different yet highly correlated tasks are formulated in the dataset, including surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation. In addition, we provide experimental results with state-of-the-art models as reference benchmarks for further model developments and evaluations on this dataset. The dataset is available at https://autolaparo.github.io.

Submitted 3 August, 2022; originally announced August 2022.

Comments: Accepted at MICCAI 2022

arXiv:2207.12620 [pdf, other]

Large-displacement 3D Object Tracking with Hybrid Non-local Optimization

Authors: Xuhui Tian, Xinran Lin, Fan Zhong, Xueying Qin

Abstract: Optimization-based 3D object tracking is known to be precise and fast, but sensitive to large inter-frame displacements. In this paper we propose a fast and effective non-local 3D tracking method. Based on the observation that erroneous local minima are mostly due to out-of-plane rotation, we propose a hybrid approach combining non-local and local optimizations for different parameters, resulting in efficient non-local search in the 6D pose space. In addition, a precomputed robust contour-based tracking method is proposed for the pose optimization. By using long search lines with multiple candidate correspondences, it can adapt to different frame displacements without the need for coarse-to-fine search. After the pre-computation, pose updates can be conducted very fast, enabling the non-local optimization to run in real time. Our method outperforms all previous methods for both small and large displacements. For large displacements, the accuracy is greatly improved ($81.7\%$ vs. $19.4\%$). At the same time, real-time speed (>50 fps) can be achieved using only a CPU. The source code is available at https://github.com/cvbubbles/nonlocal-3dtracking.

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2207.01249 [pdf, other]

Model-Free 3D Shape Control of Deformable Objects Using Novel Features Based on Modal Analysis

Authors: Bohan Yang, Bo Lu, Wei Chen, Fangxun Zhong, Yun-Hui Liu

Abstract: Shape control of deformable objects is a challenging and important robotic problem. This paper proposes a model-free controller using novel 3D global deformation features based on modal analysis. Unlike most existing controllers using geometric features, our controller employs a physically-based deformation feature by decoupling 3D global deformation into low-frequency mode shapes. Although modal analysis is widely adopted in computer vision and simulation, it has not been used in robotic deformation control. We develop a new model-free framework for modal-based deformation control under robot manipulation. Physical interpretation of mode shapes enables us to formulate an analytical deformation Jacobian matrix mapping the robot manipulation onto changes of the modal features. In the Jacobian matrix, unknown geometry and physical properties of the object are treated as low-dimensional modal parameters which can be used to linearly parameterize the closed-loop system. Thus, an adaptive controller with proven stability can be designed to deform the object while online estimating the modal parameters. Simulations and experiments are conducted using linear, planar, and solid objects under different settings. The results not only confirm the superior performance of our controller but also demonstrate its advantages over the baseline method.

Submitted 18 April, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted by and to appear in the IEEE Transactions on Robotics. IEEE copyright

arXiv:2205.15838 [pdf, other]

D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video

Authors: Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, Cengiz Oztireli

Abstract: Given a monocular video, segmenting and decoupling dynamic objects while recovering the static environment is a widely studied problem in machine intelligence. Existing solutions usually approach this problem in the image domain, limiting their performance and understanding of the environment. We introduce Decoupled Dynamic Neural Radiance Field (D$^2$NeRF), a self-supervised approach that takes a monocular video and learns a 3D scene representation which decouples moving objects, including their shadows, from the static background. Our method represents the moving objects and the static background by two separate neural radiance fields with only one allowing for temporal changes. A naive implementation of this approach leads to the dynamic component taking over the static one as the representation of the former is inherently more general and prone to overfitting. To this end, we propose a novel loss to promote correct separation of phenomena. We further propose a shadow field network to detect and decouple dynamically moving shadows. We introduce a new dataset containing various dynamic objects and shadows and demonstrate that our method can achieve better performance than state-of-the-art approaches in decoupling dynamic and static 3D objects, occlusion and shadow removal, and image segmentation for moving objects.

Submitted 5 November, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

arXiv:2204.01091 [pdf, other]

doi 10.1088/1674-1137/ac7200

Final bound-state formation effect on dark matter annihilation

Authors: Xinyu Wang, Fucheng Zhong, Feng Luo

Abstract: If the annihilation products of dark matter (DM) are non-relativistic and couple directly to a light force mediator, non-perturbative effects such as final-state bound state (FBS) formation and the final-state Sommerfeld (FSS) effect must be considered. The final-state particles become non-relativistic when the mass splitting between the DM and its annihilation products is small, so we study these effects in the mass-degenerate region (including the kinematically forbidden case) using two specific models. We demonstrate that the FBS effect significantly modifies the DM relic abundance compared to the standard perturbative calculation in some regions of mass splitting. We emphasize that the FBS effect is comparable to the FSS effect for those mass splittings. Conservation of angular momentum is subtle when considering FBS formation: in some cases there may be no $s$-wave contribution, so we use two models to exhibit the FBS contributions of different partial waves. We also show that FBS formation with vector boson emission contributes to the DM relic abundance, and we calculate the $p$-wave FSS effect in the specific model for the first time.

Submitted 21 May, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

Comments: Version to be published in Chinese Physics C. 23 pages, 13 figures

arXiv:2203.16245 [pdf, ps, other]

doi 10.1088/1402-4896/ac9ca3

Effective-Dimension Theory of Critical Phenomena above Upper Critical Dimensions

Authors: Shaolong Zeng, Sue Ping Szeto, Fan Zhong

Abstract: Phase transitions and critical phenomena are among the most intriguing phenomena in nature and their renormalization-group theory is one of the greatest achievements of theoretical physics. However, the predictions of the theory above an upper critical dimension $d_c$ seriously disagree with reality. In addition to its fundamental significance, the problem is also of practical importance because both complex systems with spatial or temporal long-range interactions and quantum phase transitions can substantially lower $d_c$. The extant scenarios built on a dangerous irrelevant variable (DIV) to resolve the problem introduce two sets of critical exponents and even two sets of scaling laws whose origin is obscure. Here, we consider the DIV from a different perspective and clearly unveil the origin of the two sets of exponents and hence the intrinsic inconsistency in those scenarios. We then develop an effective-dimension theory in which critical fluctuations and system volume are fixed at an effective dimension by the DIV. This enables us to account for all the extant results consistently. A novel asymptotic finite-size scaling behavior for a correlation function together with a new anomalous dimension and its associated scaling law is also derived.

Submitted 30 March, 2022; originally announced March 2022.

Comments: 6 pages, no figure

Journal ref: A revised version was published in Physica Scripta 97, 125002 (2022)

arXiv:2203.16243 [pdf, ps, other]

doi 10.1088/0256-307X/39/12/120501

Theory of Critical Phenomena with Memory

Authors: Shaolong Zeng, Sue ping Szeto, Fan Zhong

Abstract: Memory is a ubiquitous characteristic of complex systems, and critical phenomena are among the most intriguing phenomena in nature. Here, we propose an Ising model with memory, develop a corresponding theory of critical phenomena with memory for complex systems, and discover a series of surprising novel results. We show that a naive theory of a usual Hamiltonian with a direct inclusion of a power-law decaying long-range temporal interaction radically violates a hyperscaling law for all spatial dimensions even at and below the upper critical dimension. This entails both indispensable consideration of the Hamiltonian for dynamics, rather than the usual practice of just focusing on the corresponding dynamic Lagrangian alone, and transformations that result in a correct theory in which space and time are inextricably interwoven, leading to an effective spatial dimension that repairs the hyperscaling law. The theory gives rise to a set of novel mean-field critical exponents, which are different from the usual Landau ones, as well as new universality classes. These exponents are verified by numerical simulations of the Ising model with memory in two and three spatial dimensions.

Submitted 12 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: 6 pages, 2 figures. The new version has 8 pages, more references, more discussion of the model, and uses the renormalization-group technique instead of the power-counting method

Journal ref: Chin. Phys. Lett. 39, 120501 (2022)

arXiv:2203.13572 [pdf, other]

A Visual Navigation Perspective for Category-Level Object Pose Estimation

Authors: Jiaxin Guo, Fangxun Zhong, Rong Xiong, Yunhui Liu, Yue Wang, Yiyi Liao

Abstract: This paper studies category-level object pose estimation based on a single monocular image. Recent advances in pose-aware generative models have paved the way for addressing this challenging task using analysis-by-synthesis. The idea is to sequentially update a set of latent variables, e.g., pose, shape, and appearance, of the generative model until the generated image best agrees with the observation. However, convergence and efficiency are two challenges of this inference procedure. In this paper, we take a deeper look at the inference of analysis-by-synthesis from the perspective of visual navigation, and investigate what is a good navigation policy for this specific task. We evaluate three different strategies, including gradient descent, reinforcement learning and imitation learning, via thorough comparisons in terms of convergence, robustness and efficiency. Moreover, we show that a simple hybrid approach leads to an effective and efficient solution. We further compare these strategies to state-of-the-art methods, and demonstrate superior performance on synthetic and real-world datasets leveraging off-the-shelf pose-aware generative models.

Submitted 23 July, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2203.13437 [pdf]

BCOT: A Markerless High-Precision 3D Object Tracking Benchmark

Authors: Jiachen Li, Bin Wang, Shiqiang Zhu, Xin Cao, Fan Zhong, Wenxuan Chen, Te Li, Jason Gu, Xueying Qin

Abstract: Template-based 3D object tracking still lacks a high-precision benchmark of real scenes due to the difficulty of annotating the accurate 3D poses of real moving video objects without using markers. In this paper, we present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking. The proposed method requires no markers, and the cameras only need to be synchronous, relatively fixed as cross-view and calibrated. Based on our object-centered model, we jointly optimize the object pose by minimizing shape re-projection constraints in all views, which greatly improves the accuracy compared with the single-view approach, and is even more accurate than the depth-based method. Our new benchmark dataset contains 20 textureless objects, 22 scenes, 404 video sequences and 126K images captured in real scenes. The annotation error is guaranteed to be less than 2mm, according to both theoretical analysis and validation experiments. We re-evaluate the state-of-the-art 3D object tracking methods with our dataset, reporting their performance ranking in real scenes. Our BCOT benchmark and code can be found at https://ar3dv.github.io/BCOT-Benchmark/.

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2203.04647 [pdf, other]

Normal and Visibility Estimation of Human Face from a Single Image

Authors: Fuzhi Zhong, Rui Wang, Yuchi Huo, Hujun Bao

Abstract: Recent work on the intrinsic image of humans starts to consider the visibility of incident illumination and encodes the light transfer function by spherical harmonics. In this paper, we show that such a light transfer function can be further decomposed into visibility and cosine terms related to surface normal. Such decomposition allows us to recover the surface normal in addition to visibility. We propose a deep learning-based approach with a reconstruction loss for training on real-world images. Results show that compared with previous works, the reconstruction of human face from our method better reveals the surface normal and shading details especially around regions where visibility effect is strong.

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2203.03570 [pdf, other]

Kubric: A scalable dataset generator

Authors: Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, et al. (10 additional authors not shown)

Abstract: Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap 2) supports rich ground-truth annotations 3) offers full control over data and 4) can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines, and generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the used assets, all of the generation code, as well as the rendered datasets for reuse and modification.

Submitted 7 March, 2022; originally announced March 2022.

Comments: 21 pages, CVPR2022

arXiv:2203.02119 [pdf, other]

GraspARL: Dynamic Grasping via Adversarial Reinforcement Learning

Authors: Tianhao Wu, Fangwei Zhong, Yiran Geng, Hongchen Wang, Yongjian Zhu, Yizhou Wang, Hao Dong

Abstract: Grasping moving objects, such as goods on a belt or living animals, is an important but challenging task in robotics. Conventional approaches rely on a set of manually defined object motion patterns for training, resulting in poor generalization to unseen object trajectories. In this work, we introduce an adversarial reinforcement learning framework for dynamic grasping, namely GraspARL. To be specific, we formulate the dynamic grasping problem as a 'move-and-grasp' game, where the robot is to pick up the object on the mover and the adversarial mover is to find a path to escape it. Hence, the two agents play a min-max game and are trained by reinforcement learning. In this way, the mover can auto-generate diverse moving trajectories while training, and the robot trained with the adversarial trajectories can generalize to various motion patterns. Empirical results on the simulator and real-world scenario demonstrate the effectiveness of each and good generalization of our method.

Submitted 14 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

arXiv:2202.08439 [pdf, other]

doi 10.1088/1674-4527/ac68c4

Galaxy Spectra neural Networks (GaSNets). I. Searching for strong lens candidates in eBOSS spectra using Deep Learning

Authors: Fucheng Zhong, Rui Li, Nicola R. Napolitano

Abstract: With the advent of new spectroscopic surveys from ground and space, observing up to hundreds of millions of galaxies, spectra classification will become overwhelming for standard analysis techniques. To prepare for this challenge, we introduce a family of deep learning tools to classify features in one-dimensional spectra. As the first application of these Galaxy Spectra neural Networks (GaSNets), we focus on tools specialized at identifying emission lines from strongly lensed star-forming galaxies in the eBOSS spectra. We first discuss the training and testing of these networks and define a threshold probability, PL, of 95% for the high quality event detection. Then, using a previous set of spectroscopically selected strong lenses from eBOSS, confirmed with HST, we estimate a completeness of ~80% as the fraction of lenses recovered above the adopted PL. We finally apply the GaSNets to ~1.3M spectra to collect a first list of ~430 new high quality candidates identified with deep learning applied to spectroscopy and visually graded as highly probable real events. A preliminary check against ground-based observations tentatively shows that this sample has a confirmation rate of 38%, in line with previous samples selected with standard (no deep learning) classification tools and follow-up by Hubble Space Telescope. This first test shows that machine learning can be efficiently extended to feature recognition in the wavelength space, which will be crucial for future surveys like 4MOST, DESI, Euclid, and the Chinese Space Station Telescope (CSST).

Submitted 17 April, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Submitted to RAA, 33 pages, 21 figures. Comments/suggestions are welcome. Accepted with minor revisions by RAA.

arXiv:2201.02374 [pdf, other]

As-Continuous-As-Possible Extrusion Fabrication of Surface Models

Authors: Fanchao Zhong, Yonglai Xu, Haisen Zhao, Lin Lu

Abstract: We propose a novel computational framework for optimizing the toolpath continuity in fabricating surface models on an extrusion-based 3D printer. Toolpath continuity has been a critical issue for extrusion-based fabrications that affects both quality and efficiency. Transfer moves cause non-smooth or bumpy surfaces and get worse for materials with large inertia like clay. For surface models, the effects of continuity are even more severe, in terms of surface quality and model stability. In this paper, we introduce an original criterion, "one-path-patch" (OPP), for representing a shell surface patch that can be traversed in one path considering fabrication constraints. We study the properties of an OPP and the merging operations for OPPs, and propose a bottom-up OPP merging procedure for decomposing the given shell surface into a minimal number of OPPs and generating the "as-continuous-as-possible" (ACAP) toolpath. Furthermore, we customize the path planning algorithm with a curved layer printing scheme, which reduces the staircase defect and improves the toolpath continuity via possibly connecting multiple segments. We evaluate the ACAP algorithm for both ceramic and thermoplastic materials, and results demonstrate that it improves the fabrication of surface models in both surface quality and efficiency.

Submitted 28 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: 16 pages, 23 figures

ACM Class: I.3.5

arXiv:2112.05098 [pdf, other]

doi 10.7554/eLife.93115.1

Intraspecific predator interference promotes biodiversity in ecosystems

Authors: Ju Kang, Shijie Zhang, Yiyuan Niu, Fan Zhong, Xin Wang

Abstract: Explaining biodiversity is a fundamental issue in ecology. A long-standing puzzle lies in the paradox of the plankton: many species of plankton feeding on a limited variety of resources coexist, apparently flouting the competitive exclusion principle (CEP), which holds that the number of predator (consumer) species cannot exceed that of the resources at a steady state. Here, we present a mechanistic model and demonstrate that intraspecific interference among the consumers enables a plethora of consumer species to coexist at constant population densities with only one or a handful of resource species. This facilitated biodiversity is resistant to stochasticity, either with the stochastic simulation algorithm or individual-based modeling. Our model naturally explains the classical experiments that invalidate the CEP, quantitatively illustrates the universal S-shaped pattern of the rank-abundance curves across a wide range of ecological communities, and can be broadly used to resolve the mystery of biodiversity in many natural ecosystems.

Submitted 30 April, 2024; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: Main text 14 pages, 3 figures. Appendices 34 pages, 15 Appendix-figures

Showing 1–50 of 111 results for author: Zhong, F