Search | arXiv e-print repository

General and Task-Oriented Video Segmentation

Authors: Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang

Abstract: We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies deployment. However, such a highly homogenized framework in current design, where each element maintains uniformity, could overlook the inherent diversity among different tasks and lead to suboptimal performance. To tackle this, GvSeg: i) provides a holistic disentanglement and modeling for segment targets, thoroughly examining them from the perspective of appearance, position, and shape, and on this basis, ii) reformulates the query initialization, matching and sampling strategies in alignment with the task-specific requirement. These architecture-agnostic innovations empower GvSeg to effectively address each unique task by accommodating the specific properties that characterize them. Extensive experiments on seven gold-standard benchmark datasets demonstrate that GvSeg surpasses all existing specialized/general solutions by a significant margin on four different video segmentation tasks.

Submitted 9 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://github.com/kagawa588/GvSeg

arXiv:2407.02787 [pdf]

A versatile quantum microwave photonic signal processing platform based on coincidence window selection technique

Authors: Xinghua Li, Yifan Guo, Xiao Xiang, Runai Quan, Mingtao Cao, Ruifang Dong, Tao Liu, Ming Li, Shougang Zhang

Abstract: Quantum microwave photonics (QMWP) is an innovative approach that combines energy-time entangled biphoton sources as the optical carrier with time-correlated single-photon detection for high-speed RF signal recovery. This groundbreaking method offers unique advantages such as nonlocal RF signal encoding and robust resistance to dispersion-induced frequency fading. This paper explores the versatility of processing the quantum microwave photonic signal by utilizing coincidence window selection on the biphoton coincidence distribution. The demonstration includes finely-tunable RF phase shifting, flexible multi-tap transversal filtering (with up to 15 taps), and photonically implemented RF mixing, leveraging the nonlocal RF mapping characteristic of QMWP. These accomplishments significantly enhance the capability of microwave photonic systems in processing ultra-weak signals, opening up new possibilities for various applications.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02774 [pdf]

Quantum microwave photonic mixer with a large spurious-free dynamic range

Authors: Xinghua Li, Yifan Guo, Xiao Xiang, Runai Quan, Mingtao Cao, Ruifang Dong, Tao Liu, Ming Li, Shougang Zhang

Abstract: As one of the most fundamental functionalities of microwave photonics, microwave frequency mixing plays an essential role in modern radars and wireless communication systems. However, the commonly utilized intensity modulation in the systems often leads to inadequate spurious-free dynamic range (SFDR) for many sought-after applications. Quantum microwave photonics technique offers a promising solution for improving SFDR in terms of higher-order harmonic distortion. In this paper, we demonstrate two types of quantum microwave photonic mixers based on the configuration of the intensity modulators: cascade-type and parallel-type. Leveraging the nonlocal RF signal encoding capability, both types of quantum microwave photonic mixers not only exhibit the advantage of dual-channel output but also present significant improvement in SFDR. Specifically, the parallel-type quantum microwave photonic mixer achieves a remarkable SFDR value of 113.6 dB·Hz^(1/2), which is 30 dB better than that of the cascade-type quantum microwave photonic mixer. When compared to the classical microwave photonic mixer, this enhancement reaches a notable 53.6 dB at the expense of 8 dB conversion loss. These results highlight the superiority of quantum microwave photonic mixers in the fields of microwave and millimeter-wave systems. Further applying multi-photon frequency entangled sources as optical carriers, the dual-channel microwave frequency conversion capability endowed by the quantum microwave photonic mixer can be extended to enhance the performance of multiple-paths microwave mixing which is essential for radar net systems.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2405.15265 [pdf, other]

Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation

Authors: Jiayi Chen, Rong Quan, Jie Qin

Abstract: Cross-Domain Few-shot Semantic Segmentation (CD-FSS) aims to train generalized models that can segment classes from different domains with a few labeled images. Previous works have proven the effectiveness of feature transformation in addressing CD-FSS. However, they completely rely on support images for feature transformation, and repeatedly utilizing a few support images for each class may easily lead to overfitting and overlooking intra-class appearance differences. In this paper, we propose a Doubly Matching Transformation-based Network (DMTNet) to solve the above issue. Instead of completely relying on support images, we propose Self-Matching Transformation (SMT) to construct query-specific transformation matrices based on query images themselves to transform domain-specific query features into domain-agnostic ones. Calculating query-specific transformation matrices can prevent overfitting, especially for the meta-testing stage where only one or several images are used as support images to segment hundreds or thousands of images. After obtaining domain-agnostic features, we exploit a Dual Hypercorrelation Construction (DHC) module to explore the hypercorrelations between the query image with the foreground and background of the support image, based on which foreground and background prediction maps are generated and supervised, respectively, to enhance the segmentation result. In addition, we propose a Test-time Self-Finetuning (TSF) strategy to more accurately self-tune the query prediction in unseen domains. Extensive experiments on four popular datasets show that DMTNet achieves superior performance over state-of-the-art approaches. Code is available at https://github.com/ChenJiayi68/DMTNet.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.10759 [pdf, other]

Dynamic Three-dimensional Simulation of Surface Charging on Rotating Asteroids

Authors: Ronghui Quan, Zhiying Song, Zhigui Liu

Abstract: The surface charging of asteroids, which results mainly from solar wind plasma and solar radiation, has been studied extensively. However, the influence of an asteroid's rotation on surface charging has yet to be fully understood. Here, a neural network is established to replace numerical integration, improving the efficiency of dynamic three-dimensional simulation. We simulate rotating asteroids and the surrounding plasma environment under different conditions, including quiet solar wind and solar storms; various minerals on the asteroid's surface are also considered. Results show that under typical solar wind, the maximum and minimum potentials of asteroids gradually decrease as their rotation periods increase, especially when the solar wind is obliquely incident. For asteroids with periods longer than one week, this decreasing trend becomes extremely slow. During the passage of a solar storm, the solar wind plasma changes sharply and the susceptibility of the asteroid's surface potential to rotation becomes much more pronounced. Surface minerals also matter: plagioclase is the most sensitive mineral among those we explored, while ilmenite seems indifferent to changes in rotation period. Understanding the surface charging of asteroids under various rotation periods or angles is crucial for further research into solar wind plasma and the motion of dust on asteroid surfaces, and provides a reference for safe landing exploration of asteroids.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10744 [pdf, other]

The Effect of Work Function on Dust Charging and Dynamics on the Airless Celestial Body

Authors: Ronghui Quan, Zhigui Liu, Zhiying Song

Abstract: The charged dust on the surface of airless celestial bodies, such as the moon and asteroids, is a threat to space missions. Further research on the charged dust will contribute to the success of space missions. In this paper, we study the charging and dynamics of dust particles with different work functions. By integrating the photoelectron energy distribution function over four illuminated areas with different work functions, we evaluated the photoelectron concentration in these four areas. At each area, using the photoelectron concentration, we solve the dust charging and dynamics equations with two different gravitational acceleration values. The results reveal that the dust with a larger work function can reach higher equilibrium states. These states include dominant photoelectron-related charging currents, charge numbers, and levitation heights. We suggest that the equilibrium states all hold a clear inverse relationship with the work functions of dust particles when the solar zenith angle varies from 0 to 90 degrees, displaying consistent trends under different gravitational accelerations. We also find that dust particles seem unable to stably levitate at a critical solar zenith angle. The value of this critical SZA follows the same rule subjected to the work function.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.08748 [pdf, other]

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, et al. (20 additional authors not shown)

Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT

Submitted 14 May, 2024; originally announced May 2024.

Comments: Project Page: https://dit.hunyuan.tencent.com/

arXiv:2404.18686 [pdf]

Dynamic temperature compensation for wavelength-stable entangled biphoton generation

Authors: Yuting Liu, Huibo Hong, Xiao Xiang, Runai Quan, Tao Liu, Mingtao Cao, Shougang Zhang, Ruifang Dong

Abstract: A dynamic temperature compensation method is presented to stabilize the wavelength of the entangled biphoton source, which is generated via the spontaneous parametric down-conversion based on a MgO:PPLN waveguide. Utilizing the dispersive Fourier transformation technique combined with a digital proportional-integral-differential algorithm, the small amount of wavelength variation can be instantly identified and then compensated with active temperature correction. The long-term wavelength stability, assessed through Allan deviation, shows nearly a hundredfold enhancement, reaching 2.00*10^(-7) at the averaging time of 10000 s. It offers a simple, ready-to-use solution for precise wavelength control in quantum information processing.
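
The compensation loop in this abstract pairs a wavelength measurement with a digital proportional-integral-differential (PID) algorithm. A minimal sketch of a textbook discrete PID update follows. The gains, sample interval, target wavelength, and the toy plant model are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Textbook positional-form discrete PID controller (hypothetical gains)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measured: float) -> float:
        """Return the correction for one sample of the measured value."""
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 200) -> float:
    """Drive a toy plant toward the setpoint; return the final absolute error."""
    pid = DiscretePID(kp=0.5, ki=0.2, kd=0.0, dt=1.0)
    setpoint = 1550.0    # assumed target wavelength in nm, purely illustrative
    wavelength = 1549.0  # assumed starting wavelength in nm
    for _ in range(steps):
        correction = pid.update(setpoint, wavelength)
        # Toy plant: the wavelength moves by half of the applied correction.
        wavelength += 0.5 * correction
    return abs(setpoint - wavelength)
```

In a real setup the correction would adjust the waveguide temperature rather than the wavelength directly, and the gains would need tuning against the thermal response of the device.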

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16581 [pdf, other]

AudioScenic: Audio-Driven Video Scene Editing

Authors: Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

Abstract: Audio-driven visual scene editing endeavors to manipulate the visual background while leaving the foreground content unchanged, according to the given audio signals. Unlike current efforts focusing primarily on image editing, audio-driven video scene editing has not been extensively addressed. In this paper, we introduce AudioScenic, an audio-driven framework designed for video scene editing. AudioScenic integrates audio semantics into the visual scene through a temporal-aware audio semantic injection process. As our focus is on background editing, we further introduce a SceneMasker module, which maintains the integrity of the foreground content during the editing process. AudioScenic exploits the inherent properties of audio, namely, audio magnitude and frequency, to guide the editing process, aiming to control the temporal dynamics and enhance the temporal consistency. First, we present an audio Magnitude Modulator module that adjusts the temporal dynamics of the scene in response to changes in audio magnitude, enhancing the visual dynamics. Second, the audio Frequency Fuser module is designed to ensure temporal consistency by aligning the frequency of the audio with the dynamics of the video scenes, thus improving the overall temporal coherence of the edited videos. These integrated features enable AudioScenic to not only enhance visual diversity but also maintain temporal consistency throughout the video. We present a new metric named temporal score for more comprehensive validation of temporal consistency. We demonstrate substantial advancements of AudioScenic over competing methods on DAVIS and Audioset datasets.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16579 [pdf, other]

Neural Interaction Energy for Multi-Agent Trajectory Prediction

Authors: Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

Abstract: Maintaining temporal stability is crucial in multi-agent trajectory prediction. Insufficient regularization to uphold this stability often results in fluctuations in kinematic states, leading to inconsistent predictions and the amplification of errors. In this study, we introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE). This framework assesses the interactive motion of agents by employing neural interaction energy, which captures the dynamics of interactions and illustrates their influence on the future trajectories of agents. To bolster temporal stability, we introduce two constraints: inter-agent interaction constraint and intra-agent motion constraint. These constraints work together to ensure temporal stability at both the system and agent levels, effectively mitigating prediction fluctuations inherent in multi-agent systems. Comparative evaluations against previous methods on four diverse datasets highlight the superior prediction accuracy and generalization capabilities of our model.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.00254 [pdf, other]

Clustering for Protein Representation Learning

Authors: Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang

Abstract: Protein representation learning is a challenging task that aims to capture the structure and function of proteins from their amino acid sequences. Previous methods largely ignored the fact that not all amino acids are equally important for protein folding and activity. In this article, we propose a neural clustering framework that can automatically discover the critical components of a protein by considering both its primary and tertiary structure information. Our framework treats a protein as a graph, where each node represents an amino acid and each edge represents a spatial or sequential connection between amino acids. We then apply an iterative clustering strategy to group the nodes into clusters based on their 1D and 3D positions and assign scores to each cluster. We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein. We evaluate on four protein-related tasks: protein fold classification, enzyme reaction classification, gene ontology term prediction, and enzyme commission number prediction. Experimental results demonstrate that our method achieves state-of-the-art performance.

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.20022 [pdf, other]

Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

Authors: Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang

Abstract: Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface. The inherent variability in brain function between individuals leads existing literature to focus on acquiring separate models for each individual using their respective brain signal data, ignoring commonalities between these data. In this article, we devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects. Psychometry incorporates an omni mixture-of-experts (Omni MoE) module where all the experts work together to capture the inter-subject commonalities, while each expert associated with subject-specific parameters copes with the individual differences. Moreover, Psychometry is equipped with a retrieval-enhanced inference strategy, termed Ecphory, which aims to enhance the learned fMRI representation via retrieving from prestored subject-specific memories. These designs collectively render Psychometry omnifit and efficient, enabling it to capture both inter-subject commonality and individual specificity across subjects. As a result, the enhanced fMRI representations serve as conditional signals to guide a generation model to reconstruct high-quality and realistic images, establishing Psychometry as state-of-the-art in terms of both high-level and low-level metrics.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.15740 [pdf, other]

Ghost Sentence: A Tool for Everyday Users to Copyright Data from Large Language Models

Authors: Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang

Abstract: Web user data plays a central role in the ecosystem of pre-trained large language models (LLMs) and their fine-tuned variants. Billions of data are crawled from the web and fed to LLMs. How can everyday web users confirm if LLMs misuse their data without permission? In this work, we suggest that users repeatedly insert personal passphrases into their documents, enabling LLMs to memorize them. Once these concealed passphrases, referred to as ghost sentences, are identified in the generated content of LLMs, users can be sure that their data was used for training. To explore the effectiveness and usage of this copyrighting tool, we define the user training data identification task with ghost sentences. Multiple datasets from various sources at different scales are created and tested with LLMs of different sizes. For evaluation, we introduce a last-k-words verification scheme along with two metrics: document and user identification accuracy. In the specific case of instruction tuning of a 3B LLaMA model, 11 out of 16 users with ghost sentences identify their data within the generation content. These 16 users contribute 383 examples to ~1.8M training documents. For continuing pre-training of a 1.1B TinyLlama model, 61 out of 64 users with ghost sentences identify their data within the LLM output. These 64 users contribute 1156 examples to ~10M training documents.
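
One plausible reading of the verification step in this abstract is that a document counts as identified when the model's generation reproduces the final k words of the user's ghost sentence. The sketch below assumes that reading. The function names, the whitespace tokenization, and the rule that a user is identified when any one of their documents is identified are illustrative assumptions, not the paper's exact protocol.

```python
from typing import Dict, List, Tuple


def last_k_match(ghost_sentence: str, generation: str, k: int) -> bool:
    """True if the last k words of the ghost sentence appear, in order and
    contiguously, in the generated text (whitespace tokenization assumed)."""
    if k <= 0:
        raise ValueError("k must be positive")
    ghost_words = ghost_sentence.split()
    if len(ghost_words) < k:
        return False
    target = ghost_words[-k:]
    words = generation.split()
    return any(words[i:i + k] == target for i in range(len(words) - k + 1))


def identification_accuracy(
    records: List[Tuple[str, str, str]], k: int
) -> Tuple[float, float]:
    """Compute (document accuracy, user accuracy) over records of
    (user_id, ghost_sentence, generation)."""
    if not records:
        raise ValueError("records must not be empty")
    doc_hits = 0
    user_hit: Dict[str, bool] = {}
    for user_id, ghost, generation in records:
        hit = last_k_match(ghost, generation, k)
        doc_hits += int(hit)
        # Assumed rule: one identified document is enough to identify the user.
        user_hit[user_id] = user_hit.get(user_id, False) or hit
    doc_acc = doc_hits / len(records)
    user_acc = sum(user_hit.values()) / len(user_hit)
    return doc_acc, user_acc
```

Under this sketch a larger k makes a chance match less likely, at the cost of requiring the model to have memorized a longer span.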

Submitted 23 March, 2024; originally announced March 2024.

Comments: Preprint, work in progress

arXiv:2402.09649 [pdf, other]

ProtChatGPT: Towards Understanding Proteins with Large Language Models

Authors: Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang

Abstract: Protein research is crucial in various fundamental disciplines, but understanding their intricate structure-function relationships remains challenging. Recent Large Language Models (LLMs) have made significant strides in comprehending task-specific knowledge, suggesting the potential for ChatGPT-like systems specialized in protein to facilitate basic research. In this work, we introduce ProtChatGPT, which aims at learning and understanding protein structures via natural languages. ProtChatGPT enables users to upload proteins, ask questions, and engage in interactive conversations to produce comprehensive answers. The system comprises protein encoders, a Protein-Language Pertaining Transformer (PLP-former), a projection adapter, and an LLM. The protein first undergoes protein encoders and PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM. The LLM finally combines user questions with projected embeddings to generate informative answers. Experiments show that ProtChatGPT can produce promising responses to proteins and their corresponding questions. We hope that ProtChatGPT could form the basis for further exploration and application in protein research. Code and our pre-trained model will be publicly available.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2306.09172 [pdf, other]

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023

Authors: Jiayi Shao, Xiaohan Wang, Ruijie Quan, Yi Yang

Abstract: This report presents ReLER submission to two tracks in the Ego4D Episodic Memory Benchmark in CVPR 2023, including Natural Language Queries and Moment Queries. This solution inherits from our proposed Action Sensitivity Learning framework (ASL) to better capture discrepant information of frames. Further, we incorporate a series of stronger video features and fusion strategies. Our method achieves an average mAP of 29.34, ranking 1st in Moment Queries Challenge, and garners 19.79 mean R1, ranking 2nd in Natural Language Queries Challenge. Our code will be released.

Submitted 25 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted to CVPR 2023 Ego4D Workshop; 1st in Ego4D Moment Queries Challenge; 2nd in Ego4D Natural Language Queries Challenge

arXiv:2305.15701 [pdf, other]

Action Sensitivity Learning for Temporal Action Localization

Authors: Jiayi Shao, Xiaohan Wang, Ruijie Quan, Junjun Zheng, Jiang Yang, Yi Yang

Abstract: Temporal action localization (TAL), which involves recognizing and locating action instances, is a challenging task in video understanding. Most existing approaches directly predict action classes and regress offsets to boundaries, while overlooking the discrepant importance of each frame. In this paper, we propose an Action Sensitivity Learning framework (ASL) to tackle this task, which aims to assess the value of each frame and then leverage the generated action sensitivity to recalibrate the training procedure. We first introduce a lightweight Action Sensitivity Evaluator to learn the action sensitivity at the class level and instance level, respectively. The outputs of the two branches are combined to reweight the gradient of the two sub-tasks. Moreover, based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where the action-aware frames are sampled as positive pairs to push away the action-irrelevant frames. The extensive studies on various action localization benchmarks (i.e., MultiThumos, Charades, Ego4D-Moment Queries v1.0, Epic-Kitchens 100, Thumos14 and ActivityNet1.3) show that ASL surpasses the state-of-the-art in terms of average-mAP under multiple types of scenarios, e.g., single-labeled, densely-labeled and egocentric.

Submitted 13 September, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Accepted to ICCV 2023

arXiv:2305.14014 [pdf, other]

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

Authors: Shuai Zhao, Ruijie Quan, Linchao Zhu, Yi Yang

Abstract: Pre-trained vision-language models (VLMs) are the de-facto foundation models for various downstream tasks. However, scene text recognition methods still prefer backbones pre-trained on a single modality, namely, the visual modality, despite the potential of VLMs to serve as powerful scene text readers. For example, CLIP can robustly identify regular (horizontal) and irregular (rotated, curved, blurred, or occluded) text in images. With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP. It has two encoder-decoder branches: a visual branch and a cross-modal branch. The visual branch provides an initial prediction based on the visual feature, and the cross-modal branch refines this prediction by addressing the discrepancy between the visual feature and text semantics. To fully leverage the capabilities of both branches, we design a dual predict-and-refine decoding scheme for inference. We scale CLIP4STR in terms of the model size, pre-training data, and training data, achieving state-of-the-art performance on 11 STR benchmarks. Additionally, a comprehensive empirical study is provided to enhance the understanding of the adaptation of CLIP to STR. We believe our method establishes a simple yet strong baseline for future STR research with VLMs.

Submitted 2 May, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Preprint. A PyTorch re-implementation is at https://github.com/VamosC/CLIP4STR

arXiv:2305.04212 [pdf]

Numerical simulation of a rotating magnetic sail for space applications

Authors: Mingwei Xu, Ronghui Quan, Yunjia Yao

Abstract: The Magnetic Sail is a space propulsion system that utilizes the interaction between solar wind particles and an artificial dipole magnetic field generated by a spacecraft's coil to produce thrust without the need for additional plasma or propellant. To reduce the size of the sail while improving the efficiency of capturing solar wind, a new type of rotating magnetic sail with an initial rotation speed is proposed. This study evaluates the thrust characteristics, attitude, and size design factors of a rotating magnetic sail using a 3-D single-component particle numerical simulation. The results show that an increase in rotational speed significantly increases the thrust of the rotating magnetic sail. The thrust is most significant when the magnetic moment of the sail is parallel to the direction of particle velocity. The study also found that the potential for the application of the rotating magnetic sail is greatest in orbits with high-density and low-speed space plasma environments. It suggests that a rotating magnetic sail with a magnetic moment (Mm) of 10^3-10^4 Am^2 operating at an altitude of 400 km in Low Earth Orbit (LEO) can achieve a similar thrust level to that of a rotating magnetic sail operating at 1 AU (astronomical unit) of 10^7-10^8 Am^2.

Submitted 7 May, 2023; originally announced May 2023.

Comments: 15 pages, 14 figures, under review

arXiv:2305.01897 [pdf]

Quantum two-way time transfer over a 103 km urban fiber

Authors: Huibo Hong, Runai Quan, Xiao Xiang, Yuting Liu, Tao Liu, Mingtao Cao, Ruifang Dong, Shougang Zhang

Abstract: As a new approach to realizing high-precision time synchronization between remote time scales, quantum two-way time transfer via laboratory fiber link has shown significant enhancement of the transfer stability to several tens of femtoseconds. To verify its great potential in practical systems, the field test in long-haul installed fiber optic infrastructure is required to be demonstrated. In this paper, we implement the two-way quantum time transfer over a 103 km urban fiber link. A time transfer stability of 3.67 ps at 10 s and 0.28 ps at 40000 s has been achieved, despite the large attenuation of 38 dB leading to fewer than 40 correlated events per second. This achievement marks the first successful step of quantum two-way time transfer in the task of high-precision long-distance field transfer systems.

Submitted 10 August, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

arXiv:2304.06306 [pdf, other]

Efficient Multimodal Fusion via Interactive Prompting

Authors: Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang

Abstract: Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multi-modal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. In addition, we disentangle vanilla prompts into three types in order to learn different optimizing objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experiment results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.

Submitted 15 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: Camera-ready version for CVPR2023

arXiv:2303.08525 [pdf, other]

MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360 Degree Image Saliency Prediction

Authors: Pan Gao, Xinlang Chen, Rong Quan, Wei Xiang

Abstract: Thanks to the ability of providing an immersive and interactive experience, the uptake of 360 degree image content has been rapidly growing in consumer and industrial applications. Compared to planar 2D images, saliency prediction for 360 degree images is more challenging due to their high resolutions and spherical viewing ranges. Currently, most high-performance saliency prediction models for omnidirectional images (ODIs) rely on deeper or broader convolutional neural networks (CNNs), which benefit from CNNs' superior feature representation capabilities while suffering from their high computational costs. In this paper, inspired by the human visual cognitive process, i.e., human being's perception of a visual scene is always accomplished by multiple stages of analysis, we propose a novel multi-stage recurrent generative adversarial networks for ODIs dubbed MRGAN360, to predict the saliency maps stage by stage. At each stage, the prediction model takes as input the original image and the output of the previous stage and outputs a more accurate saliency map. We employ a recurrent neural network among adjacent prediction stages to model their correlations, and exploit a discriminator at the end of each stage to supervise the output saliency map. In addition, we share the weights among all the stages to obtain a lightweight architecture that is computationally cheap. Extensive experiments are conducted to demonstrate that our proposed model outperforms the state-of-the-art model in terms of both prediction accuracy and model size.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2212.04700 [pdf, other]

Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

Authors: Jie Jiang, Zhimin Li, Jiangfeng Xiong, Rongwei Quan, Qinglin Lu, Wei Liu

Abstract: Temporal video segmentation and classification have been advanced greatly by public benchmarks in recent years. However, such research still mainly focuses on human actions, failing to describe videos in a holistic view. In addition, previous research tends to pay much attention to visual information yet ignores the multi-modal nature of videos. To fill this gap, we construct the Tencent 'Ads Video Segmentation' (TAVS) dataset in the ads domain to escalate multi-modal video analysis to a new level. TAVS describes videos from three independent perspectives as 'presentation form', 'place', and 'style', and contains rich multi-modal information such as video, audio, and text. TAVS is organized hierarchically in semantic aspects for comprehensive temporal video segmentation with three levels of categories for multi-label classification, e.g., 'place' - 'working place' - 'office'. Therefore, TAVS is distinguished from previous temporal segmentation datasets due to its multi-modal information, holistic view of categories, and hierarchical granularities. It includes 12,000 videos, 82 classes, 33,900 segments, 121,100 shots, and 168,500 labels. Accompanied with TAVS, we also present a strong multi-modal video segmentation baseline coupled with multi-label class prediction. Extensive experiments are conducted to evaluate our proposed method as well as existing representative methods to reveal key challenges of our dataset TAVS.

Submitted 9 December, 2022; originally announced December 2022.

arXiv:2212.01741 [pdf]

Quantum two-way time transfer over a hybrid free-space and fiber link

Authors: Xiao Xiang, Bingke Shi, Runai Quan, Yuting Liu, Zhiguang Xia, Huibo Hong, Tao Liu, Jincai Wu, Jia Qiang, Jianjun Jia, Shougang Zhang, Ruifang Dong

Abstract: As the superiority of quantum two-way time transfer (Q-TWTT) has been proved convincingly over fiber links, its implementation on free-space links becomes an urgent need for remote time transfer expanding to the transcontinental distance. In this paper, the first Q-TWTT experimental demonstration over a hybrid link of 2 km-long turbulent free space and 7 km-long field fiber is reported. Despite the significant loss of more than 25 dB and atmospheric turbulence, reliable time transfer performance lasting for overnights has been realized with time stability in terms of time deviation far below 1 picosecond. This achievement shows the good feasibility of quantum-enhanced time transfer in the space-ground integrated optical links and nicely certifies the capability of Q-TWTT in comparing and synchronizing the state-of-the-art space microwave atomic clocks.

Submitted 3 December, 2022; originally announced December 2022.

Comments: 13 pages, 7 figures

arXiv:2211.03110 [pdf]

Surpassing the classical limit of microwave photonic frequency fading effect by quantum microwave photonics

Authors: Yaqing Jin, Ye Yang, Huibo Hong, Xiao Xiang, Runai Quan, Tao Liu, Ninghua Zhu, Ming Li, Ruifang Dong, Shougang Zhang

Abstract: With energy-time entangled biphoton sources as the optical carrier and time-correlated single-photon detection for high-speed radio frequency (RF) signal recovery, the method of quantum microwave photonics (QMWP) has presented the unprecedented potential of nonlocal RF signal encoding and efficient RF signal distilling from the dispersion interference associated with ultrashort pulse carriers. In this letter, its capability in microwave signal processing and prospective superiority is further demonstrated. Both the QMWP RF phase shifting and transversal filtering functionality, which are the fundamental building blocks of microwave signal processing, are realized. Besides the perfect immunity to the dispersion-induced frequency fading effect associated with the broadband carrier in classical microwave photonics, a native two-dimensional parallel microwave signal processor is provided. These demonstrations fully prove the superiority of QMWP over classical MWP and open the door to new application fields of MWP involving encrypted processing.

Submitted 6 November, 2022; originally announced November 2022.

arXiv:2205.11135 [pdf, other]

doi 10.1016/j.optlastec.2022.109039

Spectrally resolved two-photon interference in a modified Hong-Ou-Mandel interferometer

Authors: Baihong Li, Boxin Yuan, Changhua Chen, Xiao Xiang, Runai Quan, Ruifang Dong, Shougang Zhang, Rui-Bo Jin

Abstract: A modified Hong-Ou-Mandel (HOM) interference reveals that the two-photon interference phenomenon can be explained only by the concept of a two-photon wave packet rather than a single-photon one. However, the temporal interferogram in the modified HOM interferometer becomes flat in some cases so that no useful information can be extracted from time-domain measurement. Here, we theoretically explore such temporal interferogram from the frequency domain and obtain the spectrally resolved interference with high visibility. The result represents a modulation of the joint spectral intensity along both the frequency sum and the frequency difference directions. This is quite different from the cases of the spectrally resolved HOM interference and N00N state interference where the modulations happened only in one direction. Moreover, we have shown that such modulations have a potential application in the generation and characterization of high-dimensional frequency entanglement.

Submitted 14 December, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 10 pages, 4 figures

Journal ref: Optics and Laser Technology, 159, 109039 (2023)

arXiv:2201.12106 [pdf]

A proof-of-principle demonstration of quantum microwave photonics

Authors: Yaqing Jin, Ye Yang, Huibo Hong, Xiao Xiang, Runai Quan, Tao Liu, Shougang Zhang, Ninghua Zhu, Ming Li, Ruifang Dong

Abstract: With the rapid development of microwave photonics, which has expanded to numerous applications of commercial importance, eliminating the emerging bottlenecks becomes of vital importance. For example, as the main branch of microwave photonics, radio-over-fiber technology provides high bandwidth, low-loss, and long-distance propagation capability, facilitating wide applications ranging from telecommunication to wireless networks. With ultrashort pulses as the optical carrier, huge capacity is further endowed. However, the wide bandwidth of ultrashort pulses results in the severe vulnerability of high-frequency RF signals to fiber dispersion. With a time-energy entangled biphoton source as the optical carrier and combined with the single-photon detection technique, a quantum microwave photonics method is proposed and demonstrated experimentally. The results show that it not only realizes unprecedented nonlocal RF signal modulation with strong resistance to the dispersion associated with ultrashort pulse carriers but provides an alternative mechanism to effectively distill the RF signal out from the dispersion. Furthermore, the spurious-free dynamic range of both the nonlocally modulated and distilled RF signals has been significantly improved. With the ultra-weak detection and high-speed processing advantages endowed by the low-timing-jitter single-photon detection, the quantum microwave photonics method opens up new possibilities in modern communication and networks.

Submitted 28 January, 2022; originally announced January 2022.

arXiv:2112.02500 [pdf, other]

MovieNet-PS: A Large-Scale Person Search Dataset in the Wild

Authors: Jie Qin, Peng Zheng, Yichao Yan, Rong Quan, Xiaogang Cheng, Bingbing Ni

Abstract: Person search aims to jointly localize and identify a query person from natural, uncropped images, which has been actively studied over the past few years. In this paper, we delve into the rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively. Unlike previous works that treat the two types of context individually, we exploit them in a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement. Specifically, re-ID embeddings and context features are simultaneously learned in a multi-stage fashion, ultimately leading to enhanced, discriminative features for person search. We conduct the experiments on two person search benchmarks (i.e., CUHK-SYSU and PRW) as well as extend our approach to a more challenging setting (i.e., character search on MovieNet). Extensive experimental results demonstrate the consistent improvement of the proposed GLCNet over the state-of-the-art methods on all three datasets. Our source codes, pre-trained models, and the new dataset are publicly available at: https://github.com/ZhengPeng7/GLCNet.

Submitted 28 February, 2023; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: ICASSP 2023

arXiv:2111.00380 [pdf]

Demonstration of 50 km Fiber-optic two-way quantum time transfer at femtosecond-scale precision

Authors: Huibo Hong, Runai Quan, Xiao Xiang, Wenxiang Xue, Honglei Quan, Wenyu Zhao, Yuting Liu, Mingtao Cao, Tao Liu, Shougang Zhang, Ruifang Dong

Abstract: The two-way quantum time transfer method has been proposed and experimentally demonstrated for its potential enhancements in precision and better guarantee of security. To further testify its advantage in practical applications, the applicable direct transmission distance as well as the achievable synchronization precision between independent time scales is of great interest. In this paper, an experiment on two-way quantum time transfer has been carried out over a 50 km long fiber link. With the common clock reference, a short-term stability of 2.6 ps at an averaging time of 7 s and a long-term stability of 54.6 fs at 57300 s were obtained. With independent clock references, assisted by microwave frequency transfer technology, the achieved synchronization showed almost equal performance and reached a stability of 89.5 fs at 57300 s. Furthermore, the spectral consistency of the utilized entangled photon pair sources has been studied concerning its effect on the transfer accuracy and long-term stability. The results obtained have promised a bright future of the two-way quantum time transfer for realizing high-precision time synchronization on metropolitan area fiber links.

Submitted 30 October, 2021; originally announced November 2021.

arXiv:2109.00784 [pdf]

doi 10.1364/OE.451172

Implementation of field two-way quantum synchronization of distant clocks across a 7 km deployed fiber link

Authors: Runai Quan, Huibo Hong, Wenxiang Xue, Honglei Quan, Wenyu Zhao, Xiao Xiang, Yuting Liu, Mingtao Cao, Tao Liu, Shougang Zhang, Ruifang Dong

Abstract: The two-way quantum clock synchronization has been shown to provide not only femtosecond-level synchronization capability but also security against symmetric delay attacks, thus becoming a prospective method to compare and synchronize distant clocks with both enhanced precision and security. In this letter, a field test of two-way quantum synchronization between an H-maser and a Rb clock linked by a 7 km-long deployed fiber was implemented. Limited by the frequency stability of the Rb clock, the achieved time stability at 30 s was measured as 32 ps. By applying a fiber-optic microwave frequency transfer technology, the stability was improved by more than one order of magnitude to 1.9 ps, even though the number of acquired photon pairs was only 1440 in 30 s due to the low sampling rate of the utilized coincidence measurement system. Such implementation demonstrates the high practicability of the two-way quantum clock synchronization method for promoting field applications.

Submitted 24 December, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

arXiv:2106.13986 [pdf, other]

doi 10.1063/5.0061478

Implementation of quantum synchronization over a 20-km fiber distance based on frequency-correlated photon pairs and HOM interference

Authors: Yuting Liu, Runai Quan, Xiao Xiang, Huibo Hong, Tao Liu, Ruifang Dong, Shougang Zhang

Abstract: The quantum synchronization based on frequency-correlated photon pairs and HOM interference has shown femtosecond-level precision and great application prospect in numerous fields depending on high-precision time-frequency signals. Due to the difficulty of achieving stable HOM interference fringe after long-distance fiber transmission, this quantum synchronization is hampered from long-haul field application. Utilizing segmented fibers instead of a single long-length fiber, we successfully achieved the stable observation of the two-photon interference of the lab-developed broadband frequency-correlated photon pairs after 20 km-long fiber transmission, without employing auxiliary phase stabilization method. Referenced to this interference fringe, the balance of the two fiber arms is successfully achieved with a long-term stability of 20 fs. The HOM-interference-based synchronization over a 20-km fiber link is thus demonstrated and a minimum stability of 74 fs has been reached at 48,000 s. This result not only provides a simple way to stabilize the fiber-optic two-photon interferometer for long-distance quantum communication systems, but also makes a great stride forward in extending the quantum-interference-based synchronization scheme to the long-haul field applications.

Submitted 26 June, 2021; originally announced June 2021.

Comments: 5 pages, 4 figures

arXiv:2103.12996 [pdf, ps, other]

doi 10.1038/s41598-021-99373-y

Resonant Scanning Design and Control for Fast Spatial Sampling

Authors: Zhanghao Sun, Ronald Quan, Olav Solgaard

Abstract: Two-dimensional, resonant scanners have been utilized in a large variety of imaging modules due to their compact form, low power consumption, large angular range, and high speed. However, resonant scanners have problems with non-optimal and inflexible scanning patterns and inherent phase uncertainty, which limit practical applications. Here we propose methods for optimized design and control of the scanning trajectory of two-dimensional resonant scanners under various physical constraints, including high frame-rate and limited actuation amplitude. First, we propose an analytical design rule for uniform spatial sampling. We demonstrate theoretically and experimentally that by including non-repeating scanning patterns, the proposed designs outperform previous designs in terms of scanning range and fill factor. Second, we show that we can create flexible scanning patterns that allow focusing on user-defined Regions-of-Interest (RoI) by modulation of the scanning parameters. The scanning parameters are found by an optimization algorithm. In simulations, we demonstrate the benefits of these designs with standard metrics and higher-level computer vision tasks (LiDAR odometry and 3D object detection). Finally, we experimentally implement and verify both unmodulated and modulated scanning modes using a two-dimensional, resonant MEMS scanner. Central to the implementations is high bandwidth monitoring of the phase of the angular scans in both dimensions. This task is carried out with a position-sensitive photodetector combined with high-bandwidth electronics, enabling fast spatial sampling at ~100 Hz frame-rate.

Submitted 6 August, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 16 pages, 11 figures

Journal ref: Sci Rep 11, 20011 (2021)

arXiv:1907.08925 [pdf]

doi 10.1063/5.0031166

High-precision nonlocal temporal correlation identification of entangled photon pairs for quantum clock synchronization

Authors: Runai Quan, Ruifang Dong, Xiao Xiang, Baihong Li, Tao Liu, Shougang Zhang

Abstract: High-precision nonlocal temporal correlation identification in the entangled photon pairs is critical to measure the time offset between remote independent time scales for many quantum information applications. The first nonlocal correlation identification was reported in 2009, which extracts the time offset via the algorithm of iterative fast Fourier transformations (FFTs) and their inverse. The least identification resolution is restricted by the peak identification threshold of the algorithm, and thus the time offset calculation precision is limited. In this paper, an improvement for the identification is presented both in the resolution and precision via a modified algorithm of direct cross correlation extraction. A flexible resolution down to 1 ps is realized, which is only dependent on the Least Significant Bit (LSB) resolution of the time-tagging device. The attainable precision is shown to be mainly determined by the inherent timing jitter of the single photon detectors, the acquired pair rate and acquisition time, and a sub-picosecond precision (0.72 ps) has been achieved at an acquisition time of 4.5 s. This high-precision nonlocal measurement realization provides a solid foundation for the field applications of entanglement-based quantum clock synchronization, ranging and communications.

Submitted 15 December, 2020; v1 submitted 21 July, 2019; originally announced July 2019.

Comments: 13 pages, 5 figures

arXiv:1906.03769 [pdf]

doi 10.1103/PhysRevA.100.053803

Nonlocality test of energy-time entanglement via nonlocal dispersion cancellation with nonlocal detection

Authors: Baihong Li, Feiyan Hou, Runai Quan, Ruifang Dong, Lixing You, Hao Li, Xiao Xiang, Tao Liu, Shougang Zhang

Abstract: Energy-time entangled biphoton source plays a great role in quantum communication, quantum metrology and quantum cryptography due to its strong temporal correlation and capability of nonlocal dispersion cancellation. As a quantum effect, nonlocal dispersion cancellation is further proposed as an alternative way for nonlocality test of continuous variable entanglement via the violation of Bell-like inequality proposed by Wasak et al. [Phys. Rev. A, 82, 052120 (2010)]. However, to date there is no experimental report either on the inequality violation or on a nonlocal detection with single-photon detectors at long-distance transmission channel, which is key for a true nonlocality test. In this paper, we report an experimental realization of a violation of the inequality after 62 km optical fiber transmission at telecom wavelength with a nonlocal detection based on event timers and cross-correlation algorithm, which indicates a successful nonlocal test of energy-time entanglement. This work provides a new feasibility for the strict test of the nonlocality for continuous variables in both long-distance communication fiber channel and free space.

Submitted 5 October, 2019; v1 submitted 9 June, 2019; originally announced June 2019.

Comments: 6 pages, 3 figures

Journal ref: Phys. Rev. A 100, 053803 (2019)

arXiv:1903.09776 [pdf, other]

Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification

Authors: Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang

Abstract: Prevailing deep convolutional neural networks (CNNs) for person re-IDentification (reID) are usually built upon ResNet or VGG backbones, which were originally designed for classification. Because reID is different from classification, the architecture should be modified accordingly. We propose to automatically search for a CNN architecture that is specifically suitable for the reID task. There are three aspects to be tackled. First, body structural information plays an important role in reID but it is not encoded in backbones. Second, Neural Architecture Search (NAS) automates the process of architecture design without human effort, but no existing NAS methods incorporate the structure information of input images. Third, reID is essentially a retrieval task but current NAS algorithms are merely designed for classification. To solve these problems, we propose a retrieval-based search algorithm over a specifically designed reID search space, named Auto-ReID. Our Auto-ReID enables the automated approach to find an efficient and effective CNN architecture for reID. Extensive experiments demonstrate that the searched architecture achieves state-of-the-art performance while reducing 50% parameters and 53% FLOPs compared to others.

Submitted 20 August, 2019; v1 submitted 23 March, 2019; originally announced March 2019.

Comments: Accepted to ICCV 2019

arXiv:1812.10077 [pdf, ps, other]

doi 10.1103/PhysRevA.100.023849

Fiber-Optic quantum two-way time transfer with frequency entangled pulses

Authors: Feiyan Hou, Runai Quan, Ruifang Dong, Xiao Xiang, Baihong Li, Tao Liu, Xiaoyan Yang, Hao Li, Lixing You, Zhen Wang, Shougang Zhang

Abstract: High-precision time transfer is of fundamental interest in physics and metrology. Quantum time transfer technologies that use frequency-entangled pulses and their coincidence detection have been proposed, offering potential enhancements in precision and better guarantees of security. In this paper, we describe a fiber-optic two-way quantum time transfer experiment. Using quantum nonlocal dispersion cancellation, time transfer over a 20-km fiber link achieves a time deviation of 922 fs over 5 s and 45 fs over 40960 s. The time transfer accuracy as a function of fiber lengths from 15 m to 20 km is also investigated, and an uncertainty of 2.46 ps in standard deviation is observed. In comparison with its classical counterparts, the fiber-optic two-way quantum time transfer setup shows appreciable improvement, and further enhancements could be obtained by using new event timers with sub-picosecond precision and single-photon detectors with lower timing jitter for optimized coincidence detection. Combined with its security advantages, the femtosecond-scale two-way quantum time transfer is expected to have numerous applications in high-precision middle-haul synchronization systems.

Submitted 6 August, 2019; v1 submitted 25 December, 2018; originally announced December 2018.

Comments: 10 pages, 7 figures

Journal ref: Phys. Rev. A 100, 023849 (2019)

arXiv:1605.01286 [pdf]

doi 10.1007/s00340-016-6402-3

An efficient source of frequency anti-correlated entanglement at telecom wavelength

Authors: Feiyan Hou, Xiao Xiang, Runai Quan, Mengmeng Wang, Yiwei Zhai, Shaofeng Wang, Tao Liu, Shougang Zhang, Ruifang Dong

Abstract: We demonstrate an efficient generation of frequency anti-correlated entangled photon pairs at telecom wavelength. The fundamental laser is a continuous-wave high-power fiber laser at 1560 nm; through an extracavity frequency doubling system, a 780-nm pump with a power as high as 742 mW is realized. After a single pass through a periodically poled KTiOPO4 (PPKTP) crystal, degenerate down-converted photon pairs are generated. With an overall detection efficiency of 14.8%, the count rates of the single photons and of the photon-pair coincidences are measured to be 370 kHz and 22 kHz, respectively. The spectra of the signal and idler photons are centered at 1560.23 and 1560.04 nm, with 3-dB bandwidths of 3.22 nm each. The joint spectrum of the photon pair is observed to be frequency anti-correlated and to have a spectral bandwidth of 0.52 nm. According to the ratio of the single-photon spectral bandwidth to the joint spectral bandwidth of the photon pairs, the degree of frequency entanglement is quantified to be 6.19. Based on a Hong-Ou-Mandel interferometric coincidence measurement, a frequency indistinguishability of 95% is demonstrated. The good agreement with the theoretical estimations shows that the inherent extra intensity noise in fiber lasers has little influence on the frequency entanglement of the generated photon pairs.

Submitted 4 May, 2016; originally announced May 2016.

Comments: 8 pages, 7 figures

Journal ref: Applied Physics B (2016)

arXiv:1602.06371 [pdf, ps, other]

doi 10.1038/srep30453

Demonstration of quantum synchronization based on second-order quantum coherence of entangled photons

Authors: Runai Quan, Yiwei Zhai, Mengmeng Wang, Feiyan Hou, Shaofeng Wang, Xiao Xiang, Tao Liu, Shougang Zhang, Ruifang Dong

Abstract: Based on the second-order quantum interference between frequency entangled photons that are generated by parametric down conversion, a quantum strategic algorithm for synchronizing two spatially separated clocks has been recently presented. In the reference frame of a Hong-Ou-Mandel (HOM) interferometer, photon correlations are used to define simultaneous events. Once the HOM interferometer is balanced by use of an adjustable optical delay in one arm, arrival times of simultaneously generated photons are recorded by each clock. The clock offset is determined by correlation measurement of the recorded arrival times. Utilizing this algorithm, we demonstrate a proof-of-principle experiment for synchronizing two clocks separated by a 4 km fiber link. A minimum timing stability of 0.4 ps at an averaging time of 16000 s is achieved with an absolute time accuracy of 59.4 ps. The timing stability is verified to be limited by the correlation measurement device and ideally can be better than 10 fs. Such results shed light on the application of quantum clock synchronization in real high-accuracy timing systems.

Submitted 20 February, 2016; originally announced February 2016.

Journal ref: Scientific Reports, 6: 30453 (2016)

Showing 1–37 of 37 results for author: Quan, R