Search | arXiv e-print repository

TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

Authors: Renjie Liang, Li Li, Chongzhi Zhang, Jing Wang, Xizhou Zhu, Aixin Sun

Abstract: In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the $NDCG@K, IoU \geq \mu$ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models and we believe this new dataset contributes to the research on multi-modality search. The dataset is available at https://github.com/Ranking-VMR/TVR-Ranking

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.04579 [pdf, other]

GOALPlace: Begin with the End in Mind

Authors: Anthony Agnesina, Rongjian Liang, Geraldo Pradipta, Anand Rajaram, Haoxing Ren

Abstract: Co-optimizing placement with congestion is integral to achieving high-quality designs. This paper presents GOALPlace, a new learning-based general approach to improving placement congestion by controlling cell density. Our method efficiently learns from an EDA tool's post-route optimized results and uses an empirical Bayes technique to adapt this goal/target to a specific placer's solutions, effectively beginning with the end in mind. It enhances correlation with the long-running heuristics of the tool's router and timing-opt engine -- while solving placement globally without expensive incremental congestion estimation and mitigation methods. A statistical analysis with a new hierarchical netlist clustering establishes the importance of density and the potential for an adequate cell density target across placements. Our experiments show that our method, integrated as a demonstration inside an academic GPU-accelerated global placer, consistently produces macro and standard cell placements of superior or comparable quality to commercial tools. Our empirical Bayes methodology also allows a substantial quality improvement over state-of-the-art academic mixed-size placers, achieving up to 10x fewer design rule check (DRC) violations, a 5% decrease in wirelength, and a 30% and 60% reduction in worst and total negative slack (WNS/TNS).

Submitted 5 July, 2024; originally announced July 2024.

Comments: 10 pages, 7 figures, preprint

arXiv:2407.02037 [pdf, other]

Saving Private WAN: Using Internet Paths to Offload WAN Traffic in Conferencing Services

Authors: Bhaskar Kataria, Palak LNU, Rahul Bothra, Rohan Gandhi, Debopam Bhattacherjee, Venkata N. Padmanabhan, Irena Atov, Sriraam Ramakrishnan, Somesh Chaturmohta, Chakri Kotipalli, Rui Liang, Ken Sueda, Xin He, Kevin Hinton

Abstract: Large-scale video conferencing services incur significant network cost while serving surging global demands. Our work systematically explores the opportunity to offload a fraction of this traffic to the Internet, a cheaper routing option offered already by cloud providers, from WAN without drop in application performance. First, with a large-scale latency measurement study with 3.5 million data points per day spanning 241K source cities and 21 data centers across the globe, we demonstrate that Internet paths perform comparable to or better than the private WAN for parts of the world (e.g., Europe and North America). Next, we present Titan, a live (12+ months) production system that carefully moves a fraction of the conferencing traffic to the Internet using the above observation. Finally, we propose Titan-Next, a research prototype that jointly assigns the conferencing server and routing option (Internet or WAN) for individual calls. With 5 weeks of production data, we show Titan-Next reduces the sum of peak bandwidth on WAN links that defines the operational network cost by up to 61% compared to state-of-the-art baselines. We will open-source parts of the measurement data.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.06356 [pdf]

doi 10.1145/3656156.3663691

Re.Dis.Cover Place with Generative AI: Exploring the Experience and Design of City Wandering with Image-to-Image AI

Authors: Peng-Kai Hung, Janet Yi-Ching Huang, Stephan Wensveen, Rung-Huei Liang

Abstract: The HCI field has demonstrated a growing interest in leveraging emerging technologies to enrich urban experiences. However, insufficient studies investigate the experience and design space of AI image technology (AIGT) applications for playful urban interaction, despite its widespread adoption. To explore this gap, we conducted an exploratory study involving four participants who wandered and photographed within Eindhoven Centre and interacted with an image-to-image AI. Preliminary findings present their observations, the effect of their familiarity with places, and how AIGT becomes an explorer's tool or co-speculator. We then highlight AIGT's capability of supporting playfulness, reimaginations, and rediscoveries of places through defamiliarizing and familiarizing cityscapes. Additionally, we propose the metaphor AIGT as a 'tourist' to discuss its opportunities for engaging explorations and risks of stereotyping places. Collectively, our research provides initial empirical insights and design considerations, inspiring future HCI endeavors for creating urban play with generative AI.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06192 [pdf, other]

doi 10.1145/3656156.3663692

AI Cat Narrator: Designing an AI Tool for Exploring the Shared World and Social Connection with a Cat

Authors: Zhenchi Lai, Janet Yi-Ching Huang, Rung-Huei Liang

Abstract: As technology continues to advance, the interaction between humans and cats is becoming more diverse. Our research introduces a new tool called the AI Cat Narrator, which offers a unique perspective on the shared lives of humans and cats. We combined the method of ethnography with fictional storytelling, using a defamiliarization strategy to merge real-world data seen through the eyes of cats with excerpts from cat literature. This combination serves as the foundation for a database to instruct the AI Cat Narrator in crafting alternative narrative. Our findings indicate that using defamiliarized data for training purposes significantly contributes to the development of characters that are both more empathetic and individualized. The contributions of our study are twofold: 1) proposing an innovative approach to prompting a reevaluation of living alongside cats; 2) establishing a collaborative, exploratory tool developed by humans, cats, and AI together.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 5 pages

arXiv:2406.00921 [pdf, other]

Towards Effective Detection of Ponzi schemes on Ethereum with Contract Runtime Behavior Graph

Authors: Ruichao Liang, Jing Chen, Cong Wu, Kun He, Yueming Wu, Weisong Sun, Ruiying Du, Qingchuan Zhao, Yang Liu

Abstract: Ponzi schemes, a form of scam, have been discovered in Ethereum smart contracts in recent years, causing massive financial losses. Existing detection methods primarily focus on rule-based approaches and machine learning techniques that utilize static information as features. However, these methods have significant limitations. Rule-based approaches rely on pre-defined rules with limited capabilities and domain knowledge dependency. Using static information like opcodes for machine learning fails to effectively characterize Ponzi contracts, resulting in poor reliability and interpretability. Moreover, relying on static information like transactions for machine learning requires a certain number of transactions to achieve detection, which limits the scalability of detection and hinders the identification of 0-day Ponzi schemes. In this paper, we propose PonziGuard, an efficient Ponzi scheme detection approach based on contract runtime behavior. Inspired by the observation that a contract's runtime behavior is more effective in disguising Ponzi contracts from the innocent contracts, PonziGuard establishes a comprehensive graph representation called contract runtime behavior graph (CRBG), to accurately depict the behavior of Ponzi contracts. Furthermore, it formulates the detection process as a graph classification task on CRBG, enhancing its overall effectiveness. The experiment results show that PonziGuard surpasses the current state-of-the-art approaches in the ground-truth dataset.
We applied PonziGuard to Ethereum Mainnet and demonstrated its effectiveness in real-world scenarios. Using PonziGuard, we identified 805 Ponzi contracts on Ethereum Mainnet, which have resulted in an estimated economic loss of 281,700 Ether or approximately $500 million USD. We also found 0-day Ponzi schemes in the recently deployed 10,000 smart contracts.

Submitted 2 June, 2024; originally announced June 2024.

Comments: Submitted to ACM Transactions on Software Engineering and Methodology

arXiv:2406.00703 [pdf, other]

A Partition-insensitive Parallel Framework for Distributed Model Fitting

Authors: Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

Abstract: Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often organized in a cluster or network. Most of the existing methods for distributed model fitting are to formulate it in a consensus optimization problem, and then build up algorithms based on the alternating direction method of multipliers (ADMM). This paper introduces a novel parallel framework for achieving a distributed model fitting. In contrast to previous consensus frameworks, the introduced parallel framework offers two notable advantages. Firstly, it exhibits insensitivity to sample partitioning, meaning that the solution of the algorithm remains unaffected by variations in the number of slave nodes or/and the amount of data each node carries. Secondly, fewer variables are required to be updated at each iteration, so that the proposed parallel framework performs in a more succinct and efficient way, and adapts to high-dimensional data. In addition, we prove that the algorithms under the new parallel framework have a worst-case linear convergence rate in theory. Numerical experiments confirm the generality, robustness, and accuracy of our proposed parallel framework.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.15451 [pdf, other]

Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval

Authors: Yiming Wu, Hangfei Li, Fangfang Wang, Yilong Zhang, Ronghua Liang

Abstract: In the domain of language-based fashion image retrieval, pinpointing the desired fashion item using both a reference image and its accompanying textual description is an intriguing challenge. Existing approaches lean heavily on static fusion techniques, intertwining image and text. Despite their commendable advancements, these approaches are still limited by a deficiency in flexibility. In response, we propose a Self-distilled Dynamic Fusion Network to compose the multi-granularity features dynamically by considering the consistency of routing path and modality-specific information simultaneously. Two new modules are included in our proposed method: (1) Dynamic Fusion Network with Modality Specific Routers. The dynamic network enables a flexible determination of the routing for each reference image and modification text, taking into account their distinct semantics and distributions. (2) Self Path Distillation Loss. A stable path decision for queries benefits the optimization of feature extraction as well as routing, and we approach this by progressively refine the path decision with previous path information. Extensive experiments demonstrate the effectiveness of our proposed model compared to existing methods.

Submitted 24 May, 2024; originally announced May 2024.

Comments: ICASSP 2024

arXiv:2405.09114 [pdf, other]

SOEDiff: Efficient Distillation for Small Object Editing

Authors: Qihe Pan, Zicheng Wang, Zhen Zhao, Yiming Wu, Sifan Long, Haoran Liang, Ronghua Liang

Abstract: In this paper, we delve into a new task known as small object editing (SOE), which focuses on text-based image inpainting within a constrained, small-sized area. Despite the remarkable success have been achieved by current image inpainting approaches, their application to the SOE task generally results in failure cases such as Object Missing, Text-Image Mismatch, and Distortion. These failures stem from the limited use of small-sized objects in training datasets and the downsampling operations employed by U-Net models, which hinders accurate generation. To overcome these challenges, we introduce a novel training-based approach, SOEDiff, aimed at enhancing the capability of baseline models like StableDiffusion in editing small-sized objects while minimizing training costs. Specifically, our method involves two key components: SO-LoRA, which efficiently fine-tunes low-rank matrices, and Cross-Scale Score Distillation loss, which leverages high-resolution predictions from the pre-trained teacher diffusion model. Our method presents significant improvements on the test dataset collected from MSCOCO and OpenImage, validating the effectiveness of our proposed method in small object editing. In particular, when comparing SOEDiff with SD-I model on the OpenImage-f dataset, we observe a 0.99 improvement in CLIP-Score and a reduction of 2.87 in FID. Our project page can be found in https://soediff.github.io/.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2403.14205 [pdf]

Solvent-Free Silsesquioxane Self-Welding for 3D Printing Multi-Refractive Index Glass Objects

Authors: Piaoran Ye, Zhihan Hong, Douglas A. Loy, Rongguang Liang

Abstract: The growing interest in 3D printing of silica glass has spurred substantial research efforts. Our prior work utilizing a liquid silica resin (LSR) demonstrated high printing accuracy and resolution. However, the resin's sensitivity to moisture posed limitations, restricting the printing environment. On the other hand, polyhedral oligomeric silsesquioxane (POSS)-based materials offer excellent water stability and sinterless features. Yet, they suffer from relatively high shrinkage due to the presence of additional organic monomers. In this study, we present a polymeric silsesquioxane (PSQ) resin with reduced shrinkage, enhanced moisture stability, and the retention of sinterless features, providing a promising solution for achieving high-resolution 3D printing of glass objects. Leveraging the two-photon polymerization (2PP) method, we realized nanostructures with feature sizes below 80 nm. Moreover, we demonstrate the tunability of the refractive index by incorporating zirconium moieties into the resin, facilitating the fabrication of glass micro-optics with varying refractive indices. Importantly, the self-welding capability observed between two individual components provides a flexible approach for producing micro-optics with multiple components, each possessing distinct refractive indices. This research represents a significant advancement in the field of advanced glass manufacturing, paving the way for future applications in micro- and nano-scale glass objects.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.00228 [pdf, other]

DISORF: A Distributed Online NeRF Training and Rendering Framework for Mobile Robots

Authors: Chunlin Li, Ruofan Liang, Hanrui Fan, Zhengen Zhang, Sankeerth Durvasula, Nandita Vijaykumar

Abstract: We present a framework, DISORF, to enable online 3D reconstruction and visualization of scenes captured by resource-constrained mobile robots and edge devices. To address the limited compute capabilities of edge devices and potentially limited network availability, we design a framework that efficiently distributes computation between the edge device and remote server. We leverage on-device SLAM systems to generate posed keyframes and transmit them to remote servers that can perform high quality 3D reconstruction and visualization at runtime by leveraging NeRF models. We identify a key challenge with online NeRF training where naive image sampling strategies can lead to significant degradation in rendering quality. We propose a novel shifted exponential frame sampling method that addresses this challenge for online NeRF training. We demonstrate the effectiveness of our framework in enabling high-quality real-time reconstruction and visualization of unknown scenes as they are captured and streamed from cameras in mobile robots and edge devices.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.12999 [pdf, other]

Robust single divacancy defects near stacking faults in 4H-SiC under resonant excitation

Authors: Zhen-Xuan He, Ji-Yang Zhou, Wu-Xi Lin, Qiang Li, Rui-Jian Liang, Jun-Feng Wang, Xiao-Lei Wen, Zhi-He Hao, Wei Liu, Shuo Ren, Hao Li, Li-Xing You, Jian-Shun Tang, Jin-Shi Xu, Chuan-Feng Li, Guang-Can Guo

Abstract: Color centers in silicon carbide (SiC) have demonstrated significant promise for quantum information processing. However, the undesirable ionization process that occurs during optical manipulation frequently causes fluctuations in the charge state and performance of these defects, thereby restricting the effectiveness of spin-photon interfaces. Recent predictions indicate that divacancy defects near stacking faults possess the capability to stabilize their neutral charge states, thereby providing robustness against photoionization effects. In this work, we present a comprehensive protocol for the scalable and targeted fabrication of single divacancy arrays in 4H-SiC using a high-resolution focused helium ion beam. Through photoluminescence emission (PLE) experiments, we demonstrate long-term emission stability with minimal linewidth shift ($\sim$ 50 MHz over 3 hours) for the single c-axis divacancies within stacking faults. By measuring the ionization rate for different polytypes of divacancies, we found that the divacancies within stacking faults are more robust against resonant excitation. Additionally, angle-resolved PLE spectra reveal their two resonant-transition lines with mutually orthogonal polarizations. Notably, the PLE linewidths are approximately 7 times narrower and the spin-coherent times are 6 times longer compared to divacancies generated via carbon-ion implantation.
These findings highlight the immense potential of SiC divacancies for on-chip quantum photonics and the construction of efficient spin-to-photon interfaces, indicating a significant step forward in the development of quantum technologies.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 11 pages, 4 figures

arXiv:2402.10259 [pdf, other]

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

Authors: Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

Abstract: Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods.

Submitted 20 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: Project page: https://gaussianobject.github.io/

arXiv:2402.10045 [pdf]

Short-Form Videos and Mental Health: A Knowledge-Guided Neural Topic Model

Authors: Jiaheng Xie, Ruicheng Liang, Yidong Chai, Yang Liu, Daniel Zeng

Abstract: While short-form videos head to reshape the entire social media landscape, experts are exceedingly worried about their depressive impacts on viewers, as evidenced by medical studies. To prevent widespread consequences, platforms are eager to predict these videos' impact on viewers' mental health. Subsequently, they can take intervention measures, such as revising recommendation algorithms and displaying viewer discretion. Nevertheless, applicable predictive methods lack relevance to well-established medical knowledge, which outlines clinically proven external and environmental factors of depression. To account for such medical knowledge, we resort to an emergent methodological discipline, seeded Neural Topic Models (NTMs). However, existing seeded NTMs suffer from the limitations of single-origin topics, unknown topic sources, unclear seed supervision, and suboptimal convergence. To address those challenges, we develop a novel Knowledge-guided Multimodal NTM to predict a short-form video's depressive impact on viewers. Extensive empirical analyses using TikTok and Douyin datasets prove that our method outperforms state-of-the-art benchmarks. Our method also discovers medically relevant topics from videos that are linked to depressive impact. We contribute to IS with a novel video analytics method that is generalizable to other video classification problems. Practically, our method can help platforms understand videos' mental impacts, thus adjusting recommendations and video topic disclosure.

Submitted 21 March, 2024; v1 submitted 10 January, 2024; originally announced February 2024.

arXiv:2402.02508 [pdf]

Deconstructing the spin susceptibility of a cuprate superconductor

Authors: R. Zhou, I. Vinograd, M. Hirata, T. Wu, H. Mayaffre, S. Krämer, W. N. Hardy, R. Liang, D. A. Bonn, T. Loew, J. Porras, B. Keimer, M. -H. Julien

Abstract: A major obstacle to understanding high-Tc cuprates is that superconductivity precludes observing normal-state properties at low temperatures. One prime example is the normal-state spin susceptibility: although its decrease upon cooling far above Tc typifies pseudogap behavior, its behavior at low temperatures is generally unknown. Here, our measurements in high magnetic fields expose the spin susceptibility of YBa2Cu3Oy down to low temperatures. Even though superconductivity is suppressed by the field, we uncover two thermally-activated contributions alongside a residual susceptibility at T=0 due to gapless excitations. We relate these two distinct gaps to short-range charge-density waves and to the formation of spin singlets similar to those found in certain quantum spin systems. These phenomena thus collectively contribute to the pseudogap in the spin susceptibility at low temperature, supplementing short-lived antiferromagnetism known to initiate pseudogap behavior at high temperatures. We therefore propose that the pseudogap should be regarded as a composite property.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.15239 [pdf, other]

MEA-Defender: A Robust Watermark against Model Extraction Attack

Authors: Peizhuo Lv, Hualong Ma, Kai Chen, Jiachen Zhou, Shengzhi Zhang, Ruigang Liang, Shenchen Zhu, Pan Li, Yingjun Zhang

Abstract: Recently, numerous highly-valuable Deep Neural Networks (DNNs) have been trained using deep learning algorithms. To protect the Intellectual Property (IP) of the original owners over such DNN models, backdoor-based watermarks have been extensively studied. However, most of such watermarks fail upon model extraction attack, which utilizes input samples to query the target model and obtains the corresponding outputs, thus training a substitute model using such input-output pairs. In this paper, we propose a novel watermark to protect IP of DNN models against model extraction, named MEA-Defender. In particular, we obtain the watermark by combining two samples from two source classes in the input domain and design a watermark loss function that makes the output domain of the watermark within that of the main task samples. Since both the input domain and the output domain of our watermark are indispensable parts of those of the main task samples, the watermark will be extracted into the stolen model along with the main task during model extraction. We conduct extensive experiments on four model extraction attacks, using five datasets and six models trained based on supervised learning and self-supervised learning algorithms. The experimental results demonstrate that MEA-Defender is highly robust against different model extraction attacks, and various watermark removal/detection approaches.

Submitted 26 January, 2024; originally announced January 2024.

Comments: To Appear in IEEE Symposium on Security and Privacy 2024 (IEEE S&P 2024), MAY 20-23, 2024, SAN FRANCISCO, CA, USA

arXiv:2401.05345 [pdf, other]

DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines

Authors: Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Nandita Vijaykumar

Abstract: Differentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (e.g. 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high quality photo-realistic imagery at high speeds from these learned 3D models. These methods have been demonstrated to be very promising, providing state-of-art quality for many important tasks. However, training a model to represent a scene is still a time-consuming task even when using powerful GPUs. In this work, we observe that the gradient computation phase during training is a significant bottleneck on GPUs due to the large number of atomic operations that need to be processed. These atomic operations overwhelm atomic units in the L2 partitions causing stalls. To address this challenge, we leverage the observations that during the gradient computation: (1) for most warps, all threads atomically update the same memory locations; and (2) warps generate varying amounts of atomic traffic (since some threads may be inactive). We propose DISTWAR, a software-approach to accelerate atomic operations based on two key ideas: First, we enable warp-level reduction of threads at the SM sub-cores using registers to leverage the locality in intra-warp atomic updates. Second, we distribute the atomic computation between the warp-level reduction at the SM and the L2 atomic units to increase the throughput of atomic computation. Warps with many threads performing atomic updates to the same memory locations are scheduled at the SM, and the rest using L2 atomic units. We implement DISTWAR using existing warp-level primitives. We evaluate DISTWAR on widely used raster-based differentiable rendering workloads. We demonstrate significant speedups of 2.44x on average (up to 5.7x).

Submitted 1 December, 2023; originally announced January 2024.
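
Illustrative sketch (not from the paper): the DISTWAR abstract above describes reducing a warp's atomic updates locally when many threads target the same addresses, and sending the remaining warps' updates straight to the memory-side atomic units. The Python toy below only counts how many atomic operations each path would issue. The names `process_warp` and `REDUCE_THRESHOLD`, and the cutoff value of 16, are assumptions made for illustration; the actual method runs on GPU hardware using warp-level primitives.

```python
from collections import defaultdict

WARP_SIZE = 32
# Hypothetical cutoff, not from the paper: warps with at least this many
# active threads are reduced locally before touching memory.
REDUCE_THRESHOLD = 16


def process_warp(updates, memory, threshold=REDUCE_THRESHOLD):
    """Apply one warp's gradient updates to `memory`; return atomics issued.

    `updates` is a list of (address, value) pairs, one per active thread.
    """
    assert len(updates) <= WARP_SIZE
    if len(updates) >= threshold:
        # Warp-level reduction: sum values per address first (standing in
        # for register-level reduction), then issue one atomic per address.
        partial = defaultdict(float)
        for addr, value in updates:
            partial[addr] += value
        for addr, total in partial.items():
            memory[addr] += total
        return len(partial)
    # Sparse warp: every active thread issues its own atomic add.
    for addr, value in updates:
        memory[addr] += value
    return len(updates)


memory = defaultdict(float)
dense = [(7, 0.5)] * 32          # all 32 threads update address 7
sparse = [(7, 1.0), (9, 2.0)]    # only two active threads
dense_atomics = process_warp(dense, memory)
sparse_atomics = process_warp(sparse, memory)
```

In this toy, the dense warp collapses 32 updates into a single atomic, while the sparse warp issues one atomic per active thread; the accumulated values are the same on either path.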

arXiv:2401.03522 [pdf, other]

Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos

Authors: Rongqin Liang, Yuanman Li, Jiantao Zhou, Xia Li

Abstract: Traffic anomaly detection (TAD) in driving videos is critical for ensuring the safety of autonomous driving and advanced driver assistance systems. Previous single-stage TAD methods primarily rely on frame prediction, making them vulnerable to interference from dynamic backgrounds induced by the rapid movement of the dashboard camera. While two-stage TAD methods appear to be a natural solution to mitigate such interference by pre-extracting background-independent features (such as bounding boxes and optical flow) using perceptual algorithms, they are susceptible to the performance of first-stage perceptual algorithms and may result in error propagation. In this paper, we introduce TTHF, a novel single-stage method aligning video clips with text prompts, offering a new perspective on traffic anomaly detection. Unlike previous approaches, the supervised signal of our method is derived from languages rather than orthogonal one-hot vectors, providing a more comprehensive representation. Further, concerning visual representation, we propose to model the high frequency of driving videos in the temporal domain. This modeling captures the dynamic changes of driving scenes, enhances the perception of driving behavior, and significantly improves the detection of traffic anomalies. In addition, to better perceive various types of traffic anomalies, we carefully design an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest, thereby facilitating the detection of traffic anomalies. It is shown that our proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset and achieving high generalization on the DADA dataset.

Submitted 15 April, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: 14 pages, 7 figures

arXiv:2312.12169 [pdf, other]

doi 10.3390/universe10010010

Kilonova-Targeting Lightcurve Classification for Wide Field Survey Telescope

Authors: Runduo Liang, Zhengyan Liu, Lei Lei, Wen Zhao

Abstract: With the enhancement of sensitivity of Gravitational Wave (GW) detectors and capabilities of large survey facilities, such as Vera Rubin Observatory Legacy Survey of Space and Time (LSST) and 2.5-m Wide Field Survey Telescope (WFST), we now have the potential to detect an increasing number of distant kilonova (KN). However, distinguishing KN from the plethora of detected transients in ongoing and future follow-up surveys presents a significant challenge. In this study, our objective is to establish an efficient classification mechanism tailored for the follow-up survey conducted by WFST, with a specific focus on identifying KN associated with GW. We employ a novel temporal convolutional neural network architecture, trained using simulated multi-band photometry lasting for 3 days by WFST, accompanied by contextual information, i.e. luminosity distance information by GW. By comparison of the choices of contextual information, we can reach 95\% precision, and 94\% recall for our best model. It also performs good validation on photometry data on AT2017gfo and AT2019npv. Furthermore, we investigate the ability of the model to distinguish KN in a GW follow-up survey. We conclude that there is over 80\% probability that we can capture true KN in selected 20 candidates among $\sim 250$ detected astrophysical transients that have passed real-bogus filter and cross-matching.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 19 pages, 9 figures, 2 tables. Accepted for publication in Universe

Journal ref: Universe 2024, 10(1), 10

arXiv:2312.01439 [pdf, other]

Absence of Fermi surface reconstruction in pressure-driven overdoped YBCO

Authors: Stanley W. Tozer, William A. Coniglio, Tobias Förster, Doug A. Bonn, Walter N. Hardy, Ruixing Liang, Erik Kampert, Audrey D. Grockowiak

Abstract: The evolution of the critical superconducting temperature and field, quantum oscillation frequencies and effective mass $m^{*}$ in underdoped YBa$_2$Cu$_3$O$_{7-δ}$ (YBCO) crystals ($p$ = 0.11, with $p$ the hole concentration per Cu atom) points to a partial suppression of the charge orders with increasing pressure up to 7 GPa, mimicking doping. Application of pressures up to 25 GPa pushes the sample to the overdoped side of the superconducting dome. Contrary to other cuprates, or to doping studies on YBCO, the frequency of the quantum oscillations measured in that pressure range does not support the picture of a Fermi-surface reconstruction in the overdoped regime, but possibly points to the existence of a new charge order.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 13 pages, 17 figures

arXiv:2311.14883 [pdf]

Predicting Potential School Shooters from Social Media Posts

Authors: Alana Cedeno, Rachel Liang, Sheikh Rabiul Islam

Abstract: The rate of terror attacks has surged over the past decade, resulting in the tragic and senseless loss or alteration of numerous lives. Offenders behind mass shootings, bombings, or other domestic terrorism incidents have historically exhibited warning signs on social media before carrying out actual incidents. However, due to inadequate and comprehensive police procedures, authorities and social media platforms are often unable to detect these early indicators of intent. To tackle this issue, we aim to create a multimodal model capable of predicting sentiments simultaneously from both images (i.e., social media photos) and text (i.e., social media posts), generating a unified prediction. The proposed method involves segregating the image and text components of an online post and utilizing a captioning model to generate sentences summarizing the image's contents. Subsequently, a sentiment analyzer evaluates this caption, or description, along with the original post's text to determine whether the post is positive (i.e., concerning) or negative (i.e., benign). This undertaking represents a significant step toward implementing the developed system in real-world scenarios.

Submitted 24 November, 2023; originally announced November 2023.

Journal ref: IEEE Big Data 2023

arXiv:2311.11068 [pdf, other]

Multi-block linearized alternating direction method for sparse fused Lasso modeling problems

Authors: Xiaofei Wu, Rongmei Liang, Zhimin Zhang, Zhenyu Cui

Abstract: In many statistical modeling problems, such as classification and regression, it is common to encounter sparse and blocky coefficients. Sparse fused Lasso is specifically designed to recover these sparse and blocky structured features, especially in cases where the design matrix has ultrahigh dimensions, meaning that the number of features significantly surpasses the number of samples. Quantile loss is a well-known robust loss function that is widely used in statistical modeling. In this paper, we propose a new sparse fused lasso classification model, and develop a unified multi-block linearized alternating direction method of multipliers algorithm that effectively selects sparse and blocky features for regression and classification. Our algorithm has been proven to converge with a derived linear convergence rate. Additionally, our algorithm has a significant advantage over existing methods for solving ultrahigh dimensional sparse fused Lasso regression and classification models due to its lower time complexity. Note that the algorithm can be easily extended to solve various existing fused Lasso models. Finally, we present numerical results for several synthetic and real-world examples, which demonstrate the robustness, scalability, and accuracy of the proposed classification model and algorithm.

Submitted 29 May, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.04295 [pdf, other]

Algorithmic stability implies training-conditional coverage for distribution-free prediction methods

Authors: Ruiting Liang, Rina Foygel Barber

Abstract: In a supervised learning problem, given a predicted value that is the output of some trained model, how can we quantify our uncertainty around this prediction? Distribution-free predictive inference aims to construct prediction intervals around this output, with valid coverage that does not rely on assumptions on the distribution of the data or the nature of the model training algorithm. Existing methods in this area, including conformal prediction and jackknife+, offer theoretical guarantees that hold marginally (i.e., on average over a draw of training and test data). In contrast, training-conditional coverage is a stronger notion of validity that ensures predictive coverage of the test point for most draws of the training data, and is thus a more desirable property in practice. Training-conditional coverage was shown by Vovk [2012] to hold for the split conformal method, but recent work by Bian and Barber [2023] proves that such validity guarantees are not possible for the full conformal and jackknife+ methods without further assumptions. In this paper, we show that an assumption of algorithmic stability ensures that the training-conditional coverage property holds for the full conformal and jackknife+ methods.

Submitted 25 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.
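
Illustrative sketch (not from the paper): the abstract above refers to the split conformal method, which fits a model on one part of the data and calibrates an interval width on a held-out part. The standard-library Python below is a minimal version under stated assumptions: the toy model (a least-squares line through the origin), the synthetic data, and the name `split_conformal_interval` are hypothetical and chosen only to keep the example self-contained. It does not implement full conformal or jackknife+.

```python
import math
import random


def split_conformal_interval(train, calib, x_new, alpha=0.1):
    """Return a (lo, hi) prediction interval for x_new via split conformal."""
    # Fit a toy model on the training split only:
    # least-squares line through the origin.
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    # Nonconformity scores: absolute residuals on the calibration split.
    scores = sorted(abs(y - slope * x) for x, y in calib)
    n = len(scores)
    # Use the ceil((n + 1) * (1 - alpha))-th smallest score (1-based).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    pred = slope * x_new
    return (pred - q, pred + q)


rng = random.Random(0)
xs = [rng.uniform(1, 10) for _ in range(400)]
data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]
train, calib = data[:200], data[200:]
lo, hi = split_conformal_interval(train, calib, x_new=5.0)
```

The interval is centered on the model's prediction and its half-width is a quantile of held-out residuals, so the construction makes no assumption about the noise distribution beyond exchangeability of the calibration and test points.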

arXiv:2311.00176 [pdf, other]

ChipNeMo: Domain-Adapted LLMs for Chip Design

Authors: Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Ankit Jindal, Brucek Khailany, George Kokai , et al. (17 additional authors not shown)

Abstract: ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our evaluations demonstrate that domain-adaptive pretraining of language models can lead to superior performance in domain-related downstream tasks compared to their base LLaMA2 counterparts, without degradations in generic capabilities. In particular, our largest model, ChipNeMo-70B, outperforms the highly capable GPT-4 on two of our use cases, namely engineering assistant chatbot and EDA script generation, while exhibiting competitive performance on bug summarization and analysis. These results underscore the potential of domain-specific customization for enhancing the effectiveness of large language models in specialized applications.

Submitted 4 April, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Updated results for ChipNeMo-70B model

arXiv:2310.18302 [pdf, other]

Searching for the signature of a pair density wave in YBa$_2$Cu$_3$O$_{6.67}$ using high energy X-ray diffraction

Authors: Elizabeth Blackburn, Oleh Ivashko, Emma Campillo, Martin von Zimmermann, Ruixing Liang, Douglas A. Bonn, Walter N. Hardy, Johan Chang, Edward M. Forgan, Stephen M. Hayden

Abstract: We have carried out a search for a pair density wave signature using high-energy X-ray diffraction in fields up to 16 T. We do not see evidence for a signal at the predicted wavevector. This is a report on the details of our experiment, with information on where in reciprocal space we looked.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 5 pages, report on experimental results

arXiv:2310.13929 [pdf, other]

doi 10.1093/mnras/stad3429

The Intensity of Diffuse Galactic Emission Reflected by Meteor Trails

Authors: Feiyu Zhao, Ruxi Liang, Zepei Yang, Huanyuan Shan, Qian Zheng, Qiqian Zhang, Quan Guo

Abstract: We calculate the reflection of diffuse galactic emission by meteor trails and investigate its potential relationship to Meteor Radio Afterglow (MRA). The formula to calculate the reflection of diffuse galactic emission is derived from a simplified case, assuming that the signals are mirrored by the cylindrical over-dense ionization trail of meteors. The overall observed reflection is simulated through a ray tracing algorithm together with the diffuse galactic emission modelled by the GSM sky model. We demonstrate that the spectrum of the reflected signal is broadband and follows a power law with a negative spectral index of around -1.3. The intensity of the reflected signal varies with local sidereal time and the brightness of the meteor and can reach 2000 Jy. These results agree with some previous observations of MRAs. Therefore, we think that the reflection of galactic emission by meteor trails can be a possible mechanism causing MRAs, which is worthy of further research.

Submitted 15 November, 2023; v1 submitted 21 October, 2023; originally announced October 2023.

Comments: 15 pages, 10 figures, 2 tables, accepted for publication in MNRAS, 10.1093/mnras/stad3429

arXiv:2310.08783 [pdf, ps, other]

Optimal divergence rate of the focusing Gibbs measure

Authors: Guopeng Li, Rui Liang, Yuzhao Wang

Abstract: We study the focusing Gibbs measure with critical/supercritical potentials. In particular, we prove asymptotic formulae for the frequency approximation of the partition function, which captures the optimal divergence rate of the partition function as the frequency truncation is removed.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 15 pages

MSC Class: 60H30; 81T08; 35Q55; 60H40

arXiv:2310.07696 [pdf, other]

Planar thermal Hall effect from phonons in cuprates

Authors: Lu Chen, Léna Le Roux, Gaël Grissonnanche, Marie-Eve Boulanger, Steven Thériault, Ruixing Liang, D. A. Bonn, W. N. Hardy, S. Pyon, T. Takayama, H. Takagi, Kejun Xu, Zhi-Xun Shen, Louis Taillefer

Abstract: A surprising "planar" thermal Hall effect, whereby the field is parallel to the current, has recently been observed in a few magnetic insulators, and this has been attributed to exotic excitations such as Majorana fermions or chiral magnons. Here we investigate the possibility of a planar thermal Hall effect in three different cuprate materials, in which the conventional thermal Hall conductivity $κ_{\rm {xy}}$ (with an out-of-plane field perpendicular to the current) is dominated by either electrons or phonons. Our measurements show that the planar $κ_{\rm {xy}}$ from electrons in cuprates is zero, as expected from the absence of a Lorentz force in the planar configuration. By contrast, we observe a sizable planar $κ_{\rm {xy}}$ in those samples where the thermal Hall response is due to phonons, even though it should in principle be forbidden by the high crystal symmetry. Our findings call for a careful re-examination of the mechanisms responsible for the phonon thermal Hall effect in insulators.

Submitted 1 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.15133 [pdf, other]

From Asset Flow to Status, Action and Intention Discovery: Early Malice Detection in Cryptocurrency

Authors: Ling Cheng, Feida Zhu, Yong Wang, Ruicheng Liang, Huiwen Liu

Abstract: Cryptocurrency has been subject to illicit activities probably more often than traditional financial assets due to the pseudo-anonymous nature of its transacting entities. An ideal detection model is expected to achieve all three critical properties of (I) early detection, (II) good interpretability, and (III) versatility for various illicit activities. However, existing solutions cannot meet all these requirements, as most of them heavily rely on deep learning without interpretability and are only available for retrospective analysis of a specific illicit type. To tackle all these challenges, we propose Intention-Monitor for early malice detection in Bitcoin (BTC), where the on-chain record data for a certain address are much scarcer than other cryptocurrency platforms. We first define asset transfer paths with the Decision-Tree based feature Selection and Complement (DT-SC) to build different feature sets for different malice types. Then, the Status/Action Proposal Module (S/A-PM) and the Intention-VAE module generate the status, action, intent-snippet, and hidden intent-snippet embedding. With all these modules, our model is highly interpretable and can detect various illegal activities. Moreover, well-designed loss functions further enhance the prediction speed and model's interpretability. Extensive experiments on three real-world datasets demonstrate that our proposed algorithm outperforms the state-of-the-art methods. Furthermore, additional case studies justify our model can not only explain existing illicit patterns but can also find new suspicious characters.

Submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted by TKDD. arXiv admin note: substantial text overlap with arXiv:2209.12001

arXiv:2309.15018 [pdf, other]

Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

Authors: Ruixing Liang, Xiangyu Zhang, Qiong Li, Lai Wei, Hexin Liu, Avisha Kumar, Kelley M. Kempski Leadingham, Joshua Punnoose, Leibny Paola Garcia, Amir Manbachi

Abstract: While significant advancements in artificial intelligence (AI) have catalyzed progress across various domains, its full potential in understanding visual perception remains underexplored. We propose an artificial neural network dubbed VISION, an acronym for "Visual Interface System for Imaging Output of Neural activity," to mimic the human brain and show how it can foster neuroscientific inquiries. Using visual and contextual inputs, this multimodal model predicts the brain's functional magnetic resonance imaging (fMRI) scan response to natural images. VISION successfully predicts human hemodynamic responses as fMRI voxel values to visual inputs with an accuracy exceeding state-of-the-art performance by 45%. We further probe the trained networks to reveal representational biases in different visual areas, generate experimentally testable hypotheses, and formulate an interpretable metric to associate these hypotheses with cortical functions. With both a model and evaluation metric, the cost and time burdens associated with designing and implementing functional analysis on the visual cortex could be reduced. Our work suggests that the evolution of computational models may shed light on our fundamental understanding of the visual cortex and provide a viable approach toward reliable brain-machine interfaces.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.00907 [pdf, other]

A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading

Authors: Ruihuai Liang, Bo Yang, Zhiwen Yu, Xuelin Cao, Derrick Wing Kwan Ng, Chau Yuen

Abstract: Computation offloading has become a popular solution to support computationally intensive and latency-sensitive applications by transferring computing tasks to mobile edge servers (MESs) for execution, which is known as mobile/multi-access edge computing (MEC). To improve the MEC performance, it is required to design an optimal offloading strategy that includes offloading decision (i.e., whether offloading or not) and computational resource allocation of MEC. The design can be formulated as a mixed-integer nonlinear programming (MINLP) problem, which is generally NP-hard and its effective solution can be obtained by performing online inference through a well-trained deep neural network (DNN) model. However, when the system environments change dynamically, the DNN model may lose efficacy due to the drift of input parameters, thereby decreasing the generalization ability of the DNN model. To address this unique challenge, in this paper, we propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs). Specifically, the shared backbone will be invariant during the PHs training and the inferred results will be ensembled, thereby significantly reducing the required training overhead and improving the inference performance. As a result, the joint optimization problem for offloading decision and resource allocation can be efficiently solved even in a time-varying wireless environment. Experimental results show that the proposed MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.

Submitted 2 September, 2023; originally announced September 2023.

arXiv:2308.10545

Constraints Based on Non-detection of Kilonova Optical Searching

Authors: Runduo Liang, Zhengyan Liu, Lei Lei, Wen Zhao

Abstract: Mergers of binary neutron stars are multimessenger sources of gravitational waves that have an optical luminous counterpart, commonly referred to as 'kilonova'. Inspired by the detection of GW170817, intensive searches have been conducted during the LIGO/Virgo O3 run. However, despite these efforts, no verified kilonova was detected. In this work, we present a parameter constraint method based on non-detection of optical searching considering both GW skymap, limited sky coverage, cadence, limiting magnitudes and the probability of astrophysical origin. We use our method to place constraints on EoS of neutron star based on follow-up during O3 run and obtain $M_{\rm TOV} = 2.170^{+0.120}_{-0.108}\ M_{\odot}$ at 90\% confidence level with the combination of other observations. And we also take outlook for WFST targeting kilonova throughout the LIGO/Virgo O4 run. With more events handled, we will obtain more stringent constraints on EoS and kilonova populations.

Submitted 22 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: The paper contains some errors in Section 3 about data used in this work and acknowledgement description that should be corrected now

arXiv:2308.05567 [pdf, other]

C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT

Authors: Pan Liang, Danwei Ye, Zihao Zhu, Yunchao Wang, Wang Xia, Ronghua Liang, Guodao Sun

Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated outstanding performance in various fields, particularly in natural language understanding and generation tasks. In complex application scenarios, users tend to engage in multi-turn conversations with ChatGPT to keep contextual information and obtain comprehensive responses. However, human forgetting and model contextual forgetting remain prominent issues in multi-turn conversation scenarios, which challenge the users' conversation comprehension and contextual continuity for ChatGPT. To address these challenges, we propose an interactive conversation visualization system called C5, which includes Global View, Topic View, and Context-associated Q\&A View. The Global View uses the GitLog diagram metaphor to represent the conversation structure, presenting the trend of conversation evolution and supporting the exploration of locally salient features. The Topic View is designed to display all the question and answer nodes and their relationships within a topic using the structure of a knowledge graph, thereby displaying the relevance and evolution of conversations. The Context-associated Q\&A View consists of three linked views, which allow users to explore individual conversations deeply while providing specific contextual information when posing questions. The usefulness and effectiveness of C5 were evaluated through a case study and a user study.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.03725 [pdf, other]

Efficient Temporal Sentence Grounding in Videos with Multi-Teacher Knowledge Distillation

Authors: Renjie Liang, Yiming Yang, Hui Lu, Li Li

Abstract: Temporal Sentence Grounding in Videos (TSGV) aims to detect the event timestamps described by the natural language query from untrimmed videos. This paper discusses the challenge of achieving efficient computation in TSGV models while maintaining high performance. Most existing approaches exquisitely design complex architectures to improve accuracy with extra layers and loss, suffering from inefficiency and heaviness. Although some works have noticed this, they address only the feature fusion layers, which can hardly deliver a speed benefit across the whole heavy network. To tackle this problem, we propose a novel efficient multi-teacher model (EMTM) based on knowledge distillation to transfer diverse knowledge from both heterogeneous and isomorphic networks. Specifically, we first unify different outputs of the heterogeneous models into one single form. Next, a Knowledge Aggregation Unit (KAU) is built to acquire high-quality integrated soft labels from multiple teachers. After that, the KAU module leverages the multi-scale video and global query information to adaptively determine the weights of different teachers. A Shared Encoder strategy is then proposed to solve the problem that the student shallow layers hardly benefit from teachers, in which an isomorphic teacher is collaboratively trained with the student to align their hidden states. Extensive experimental results on three popular TSGV benchmarks demonstrate that our method is both effective and efficient without bells and whistles.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.14575 [pdf, other]

A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos

Authors: Rongqin Liang, Yuanman Li, Yingxin Yi, Jiantao Zhou, Xia Li

Abstract: Identifying traffic accidents in driving videos is crucial to ensuring the safety of autonomous driving and driver assistance systems. To address the potential danger caused by the long-tailed distribution of driving events, existing traffic accident detection (TAD) methods mainly rely on unsupervised learning. However, TAD is still challenging due to the rapid movement of cameras and dynamic scenes in driving scenarios. Existing unsupervised TAD methods mainly rely on a single pretext task, i.e., an appearance-based or future object localization task, to detect accidents. However, appearance-based approaches are easily disturbed by the rapid movement of the camera and changes in illumination, which significantly reduce the performance of traffic accident detection. Methods based on future object localization may fail to capture appearance changes in video frames, making it difficult to detect ego-involved accidents (e.g., out of control of the ego-vehicle). In this paper, we propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos. Different from previous approaches, our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames through the collaboration of optical flow reconstruction and future object localization tasks. Further, we introduce a memory-augmented motion representation mechanism to fully explore the interrelation between different types of motion representations and exploit the high-level features of normal traffic patterns stored in memory to augment motion representations, thus enlarging the difference from anomalies. Experimental results on a recently published large-scale dataset demonstrate that our method achieves better performance compared to previous state-of-the-art approaches.

Submitted 26 July, 2023; originally announced July 2023.

Comments: 12 pages, 5 figures

arXiv:2306.15988 [pdf, other]

AFPN: Asymptotic Feature Pyramid Network for Object Detection

Authors: Guoyu Yang, Jie Lei, Zhikuan Zhu, Siyu Cheng, Zunlei Feng, Ronghua Liang

Abstract: Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an asymptotic feature pyramid network (AFPN) to support direct interaction at non-adjacent levels. AFPN is initiated by fusing two adjacent low-level features and asymptotically incorporates higher-level features into the fusion process. In this way, the larger semantic gap between non-adjacent levels can be avoided. Given the potential for multi-object information conflicts to arise during feature fusion at each spatial location, adaptive spatial fusion operation is further utilized to mitigate these inconsistencies. We incorporate the proposed AFPN into both two-stage and one-stage object detection frameworks and evaluate with the MS-COCO 2017 validation and test datasets. Experimental evaluation shows that our method achieves more competitive results than other state-of-the-art feature pyramid networks. The code is available at \href{https://github.com/gyyang23/AFPN}{https://github.com/gyyang23/AFPN}.

Submitted 24 September, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.07645 [pdf, ps, other]

Gibbs Dynamics for the Weakly Dispersive Nonlinear Schrödinger Equations

Authors: Rui Liang, Yuzhao Wang

Abstract: We consider the Cauchy problem for the one-dimensional periodic cubic nonlinear fractional Schrödinger equation (FNLS) with initial data distributed via Gibbs measure. We construct global strong solutions with the flow property to FNLS on the support of the Gibbs measure in the full dispersive range, thus resolving a question proposed by Sun-Tzvetkov (2021). As a byproduct, we prove the invariance of the Gibbs measure and almost sure global well-posedness for FNLS.

Submitted 13 June, 2023; originally announced June 2023.

MSC Class: 35Q55; 37K99

arXiv:2306.07590 [pdf, other]

doi 10.1007/s11433-023-2197-5

Sciences with the 2.5-meter Wide Field Survey Telescope (WFST)

Authors: WFST Collaboration, Tinggui Wang, Guilin Liu, Zhenyi Cai, Jinjun Geng, Min Fang, Haoning He, Ji-an Jiang, Ning Jiang, Xu Kong, Bin Li, Ye Li, Wentao Luo, Zhizheng Pan, Xuefeng Wu, Ji Yang, Jiming Yu, Xianzhong Zheng, Qingfeng Zhu, Yi-Fu Cai, Yuanyuan Chen, Zhiwei Chen, Zigao Dai, Lulu Fan, Yizhong Fan, et al. (38 additional authors not shown)

Abstract: The Wide Field Survey Telescope (WFST) is a dedicated photometric surveying facility being built jointly by the University of Science and Technology of China and the Purple Mountain Observatory. It is equipped with a 2.5-meter diameter primary mirror, an active optics system, and a mosaic CCD camera with 0.73 gigapixels on the primary focal plane for high-quality image capture over an FOV of 6.5 square degrees. It is anticipated that WFST will be set up at the Lenghu site in the summer of 2023 and begin to observe the northern sky in four optical bands (u, g, r, and i) with a range of cadences, from hourly/daily in the Deep High-Cadence Survey (DHS) program to semiweekly in the Wide-Field Survey (WFS) program, three months later. During a photometric night, a nominal 30 s exposure in the WFS program will reach a depth of 22.27, 23.32, 22.84, and 22.31 (AB magnitudes) in these four bands, respectively, allowing for the detection of a tremendous amount of transients in the low-z universe and a systematic investigation of the variability of Galactic and extragalactic objects. In the DHS program, intranight 90 s exposures as deep as 23 (u) and 24 mag (g), in combination with target of opportunity follow-ups, will provide a unique opportunity to explore energetic transients in demand for high sensitivities, including the electromagnetic counterparts of gravitational wave events, supernovae within a few hours of their explosions, tidal disruption events and fast, luminous optical transients even beyond a redshift of unity. In addition, the final 6-year co-added images, anticipated to reach g=25.8 mag in WFS or 1.5 mags deeper in DHS, will be of fundamental importance to general Galactic and extragalactic science. The highly uniform legacy surveys of WFST will serve as an indispensable complement to those of LSST that monitor the southern sky.

Submitted 14 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: 48 pages

Journal ref: SCPMA-Vol. 66 No. 10: 109512 (2023)

arXiv:2305.03244 [pdf, other]

doi 10.1021/acs.nanolett.3c00568

Plasmonic-enhanced bright single spin defects in silicon carbide membranes

Authors: Ji-Yang Zhou, Qiang Li, Zhi-He Hao, Wu-Xi Lin, Zhen-Xuan He, Rui-Jian Liang, Liping Guo, Hao Li, Lixing You, Jian-Shun Tang, Jin-Shi Xu, Chuan-Feng Li, Guang-Can Guo

Abstract: Optically addressable spin defects in silicon carbide (SiC) have emerged as attractive platforms for various quantum technologies. However, the low photon count rate significantly limits their applications. We strongly enhanced the brightness by 7 times and spin-control strength by 14 times of single divacancy defects in 4H-SiC membranes using surface plasmon generated by gold film coplanar waveguides. The mechanism of the plasmonic-enhanced effect is further studied by tuning the distance between single defects and the surface of the gold film. A three-energy-level model is used to determine the corresponding transition rates consistent with the enhanced brightness of single defects. Lifetime measurements also verified the coupling between defects and surface plasmons. Our scheme is low-cost, without complicated microfabrication and delicate structures, which is applicable for other spin defects in different materials. This work would promote developing spin defect-based quantum applications in mature SiC materials.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2304.13108 [pdf, other]

Detecting HI Galaxies with Deep Neural Networks in the Presence of Radio Frequency Interference

Authors: Ruxi Liang, Furen Deng, Zepei Yang, Chunming Li, Feiyu Zhao, Botao Yang, Shuanghao Shu, Wenxiu Yang, Shifan Zuo, Yichao Li, Yougang Wang, Xuelei Chen

Abstract: In neutral hydrogen (HI) galaxy surveys, a significant challenge is to identify and extract the HI galaxy signal from observational data contaminated by radio frequency interference (RFI). For a drift-scan survey, or more generally a survey of a spatially continuous region, in the time-ordered spectral data, the HI galaxies and RFI all appear as regions that extend over an area in the time-frequency waterfall plot, so the extraction of the HI galaxies and RFI from such data can be regarded as an image segmentation problem, and machine learning methods can be applied to solve such problems. In this study, we develop a method to effectively detect and extract signals of HI galaxies based on a Mask R-CNN network combined with the PointRend method. By simulating FAST-observed galaxy signals and potential RFI impacts, we created a realistic data set for the training and testing of our neural network. We compared five different architectures and selected the best-performing one. This architecture successfully performs instance segmentation of HI galaxy signals in the RFI-contaminated time-ordered data (TOD), achieving a precision of 98.64% and a recall of 93.59%.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 17 pages, 9 figures, 1 table. Accepted for publication in RAA

arXiv:2303.13022 [pdf, other]

ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting

Authors: Ruofan Liang, Huiting Chen, Chunlin Li, Fan Chen, Selvakumar Panneer, Nandita Vijaykumar

Abstract: Recent advances in neural rendering have shown great potential for reconstructing scenes from multiview images. However, accurately representing objects with glossy surfaces remains a challenge for existing methods. In this work, we introduce ENVIDR, a rendering and modeling framework for high-quality rendering and reconstruction of surfaces with challenging specular reflections. To achieve this, we first propose a novel neural renderer with decomposed rendering components to learn the interaction between surface and environment lighting. This renderer is trained using existing physically based renderers and is decoupled from actual scene representations. We then propose an SDF-based neural surface model that leverages this learned neural renderer to represent general scenes. Our model additionally synthesizes indirect illuminations caused by inter-reflections from shiny surfaces by marching surface-reflected rays. We demonstrate that our method outperforms state-of-the-art methods on challenging shiny scenes, providing high-quality rendering of specular reflections while also enabling material editing and scene relighting.

Submitted 23 March, 2023; originally announced March 2023.

Comments: Project page: https://nexuslrf.github.io/ENVIDR/

arXiv:2303.11034 [pdf]

Internal Structure Attention Network for Fingerprint Presentation Attack Detection from Optical Coherence Tomography

Authors: Haohao Sun, Yilong Zhang, Peng Chen, Haixia Wang, Ronghua Liang

Abstract: As a non-invasive optical imaging technique, optical coherence tomography (OCT) has proven promising for automatic fingerprint recognition system (AFRS) applications. Diverse approaches have been proposed for OCT-based fingerprint presentation attack detection (PAD). However, considering the complexity and variety of PA samples, it is extremely challenging to increase the generalization ability with the limited PA dataset. To solve the challenge, this paper presents a novel supervised learning-based PAD method, denoted as ISAPAD, which applies prior knowledge to guide network training and enhance the generalization ability. The proposed dual-branch architecture not only learns global features from the OCT image but also concentrates on layered structure features, which come from the internal structure attention module (ISAM). The simple yet effective ISAM enables the proposed network to obtain layered segmentation features belonging only to Bonafide from noisy OCT volume data directly. Combined with effective training strategies and PAD score generation rules, ISAPAD obtains optimal PAD performance with limited training data. Domain generalization experiments and visualization analysis validate the effectiveness of the proposed method for OCT PAD.

Submitted 20 March, 2023; originally announced March 2023.

Comments: 12 pages, 14 figures

arXiv:2303.06829 [pdf, ps, other]

Unbounded Chaotic Weighted Pseudo-shift

Authors: Ruxi Liang, Pengyu Qin, Yonglu Shu

Abstract: In this paper, based on the work of Vijay K. Srivastava and Harish Chandra, we give a characterization of the unbounded hypercyclic weighted pseudo-shift operator $wC_{\varphi}$ on $\ell^p$ or $c_0$. Moreover we use the hypercyclicity criterion of Bès, Chan, and Seubert to give a necessary and sufficient condition in order that $wC_{\varphi}$ be a chaotic operator.

Submitted 10 May, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: 20 pages

arXiv:2302.10697 [pdf, other]

doi 10.1109/TCSVT.2023.3284076

A Visual Representation-guided Framework with Global Affinity for Weakly Supervised Salient Object Detection

Authors: Binwei Xu, Haoran Liang, Weihua Gong, Ronghua Liang, Peng Chen

Abstract: Fully supervised salient object detection (SOD) methods have made considerable progress in performance, yet these models rely heavily on expensive pixel-wise labels. Recently, to achieve a trade-off between labeling burden and performance, scribble-based SOD methods have attracted increasing attention. Previous scribble-based models directly implement the SOD task only based on SOD training data with limited information, so it is extremely difficult for them to understand the image and achieve superior SOD performance. In this paper, we propose a simple yet effective framework guided by general visual representations with rich contextual semantic knowledge for scribble-based SOD. These general visual representations are generated by self-supervised learning based on large-scale unlabeled datasets. Our framework consists of a task-related encoder, a general visual module, and an information integration module to efficiently combine the general visual representations with task-related features to perform the SOD task based on understanding the contextual connections of images. Meanwhile, we propose a novel global semantic affinity loss to guide the model to perceive the global structure of the salient objects. Experimental results on five public benchmark datasets demonstrate that our method, which only utilizes scribble annotations without introducing any extra label, outperforms the state-of-the-art weakly supervised SOD methods. Specifically, it outperforms the previous best scribble-based method on all datasets with an average gain of 5.5% for max f-measure, 5.8% for mean f-measure, 24% for MAE, and 3.1% for E-measure. Moreover, our method achieves comparable or even superior performance to the state-of-the-art fully supervised models.

Submitted 8 June, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2302.09907 [pdf, other]

General Rotation Invariance Learning for Point Clouds via Weight-Feature Alignment

Authors: Liang Xie, Yibo Yang, Wenxiao Wang, Binbin Lin, Deng Cai, Xiaofei He, Ronghua Liang

Abstract: Compared to 2D images, 3D point clouds are much more sensitive to rotations. We expect the point features describing certain patterns to remain invariant to the rotation transformation. There are many recent SOTA works dedicated to rotation-invariant learning for 3D point clouds. However, current rotation-invariant methods lack generalizability on the point clouds in the open scenes due to the reliance on the global distribution, i.e., the global scene and backgrounds. Considering that the output activation is a function of the pattern and its orientation, we need to eliminate the effect of the orientation. In this paper, inspired by the idea that the network weights can be considered a set of points distributed in the same 3D space as the input points, we propose Weight-Feature Alignment (WFA) to construct a local Invariant Reference Frame (IRF) via aligning the features with the principal axes of the network weights. Our WFA algorithm provides a general solution for the point clouds of all scenes. WFA ensures the model achieves the target that the response activity is a necessary and sufficient condition of the pattern matching degree. Practically, we perform experiments on the point clouds of both single objects and open large-range scenes. The results suggest that our method almost bridges the gap between rotation invariance learning and normal methods.

Submitted 9 October, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: 4 figures

arXiv:2301.13062 [pdf, other]

Operator Fusion in XLA: Analysis and Evaluation

Authors: Daniel Snider, Ruofan Liang

Abstract: Machine learning (ML) compilers are an active area of research because they offer the potential to automatically speed up tensor programs. Kernel fusion is often cited as an important optimization performed by ML compilers. However, there exists a knowledge gap about how XLA, the most common ML compiler, applies this nuanced optimization, what kind of speedup it can afford, and what low-level effects it has on hardware. Our paper aims to bridge this knowledge gap by studying key compiler passes of XLA's source code. Our evaluation on a reinforcement learning environment Cartpole shows how different fusion decisions in XLA are made in practice. Furthermore, we implement several XLA kernel fusion strategies that can achieve up to 10.56x speedup compared to our baseline implementation.

Submitted 30 January, 2023; originally announced January 2023.

Comments: Course Paper, University of Toronto

arXiv:2301.05412 [pdf, other]

doi 10.1145/3580305.3599817

Evolve Path Tracer: Early Detection of Malicious Addresses in Cryptocurrency

Authors: Ling Cheng, Feida Zhu, Yong Wang, Ruicheng Liang, Huiwen Liu

Abstract: With the ever-increasing boom of Cryptocurrency, detecting fraudulent behaviors and associated malicious addresses draws significant research effort. However, most existing studies still rely on the full history features or full-fledged address transaction networks, and thus cannot meet the requirements of early malicious address detection, which is urgent but seldom discussed by existing studies. To detect fraud behaviors of malicious addresses in the early stage, we present Evolve Path Tracer, which consists of Evolve Path Encoder LSTM, Evolve Path Graph GCN, and Hierarchical Survival Predictor. Specifically, in addition to the general address features, we propose asset transfer paths and corresponding path graphs to characterize early transaction patterns. Further, since the transaction patterns are changing rapidly during the early stage, we propose Evolve Path Encoder LSTM and Evolve Path Graph GCN to encode asset transfer path and path graph under an evolving structure setting. Hierarchical Survival Predictor then predicts addresses' labels with nice scalability and faster prediction speed. We investigate the effectiveness and versatility of Evolve Path Tracer on three real-world illicit bitcoin datasets. Our experimental results demonstrate that Evolve Path Tracer outperforms the state-of-the-art methods. Extensive scalability experiments demonstrate the model's adaptivity under a dynamic prediction setting.

Submitted 3 June, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD23)

arXiv:2301.00969 [pdf, other]

doi 10.1145/3564625.3567998

Boosting Neural Networks to Decompile Optimized Binaries

Authors: Ying Cao, Ruigang Liang, Kai Chen, Peiwei Hu

Abstract: Decompilation aims to transform a low-level program language (LPL) (e.g., binary file) into its functionally-equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP, that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2212.14131 [pdf, other]

TAToo: Vision-based Joint Tracking of Anatomy and Tool for Skull-base Surgery

Authors: Zhaoshuo Li, Hongchao Shu, Ruixing Liang, Anna Goodridge, Manish Sahu, Francis X. Creighton, Russell H. Taylor, Mathias Unberath

Abstract: Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of patient skull and surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo on both simulation data, where ground truth motion is available, as well as on anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1°. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need of any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning of our depth network to reach a 1 mm clinical accuracy goal desired for surgical applications in the skull base.

Submitted 16 May, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: IPCAI/IJCARS 2023, code available at: https://github.com/mli0603/TAToo

arXiv:2212.08278 [pdf, other]

doi 10.1145/3577012

Seeing through Things: Exploring the Design Space of Privacy-Aware Data-Enabled Objects

Authors: Yu-Ting Cheng, Mathias Funk, Rung-Huei Liang, Lin-Lin Chen

Abstract: Increasing amounts of sensor-augmented research objects have been used in design research. We call these objects Data-Enabled Objects, which can be integrated into daily activities capturing data about people's detailed whereabouts, behaviours and routines. These objects provide data perspectives on everyday life for contextual design research. However, data-enabled objects are still computational devices with limited privacy awareness and nuanced data sharing. To better design data-enabled objects, we explore privacy design spaces by inviting 18 teams of undergraduate design students to re-design the same type of sensor-enabled home research camera. We developed the Connected Peekaboo Toolkit (CPT) to support the design teams in designing, building, and directly deploying their prototypes in real home studies. We conducted Thematic Analysis to analyse their outcomes which led us to interpret that privacy is not just an obstacle but can be a driver by unfolding an exploration of possible design spaces for data-enabled objects.

Submitted 15 December, 2022; originally announced December 2022.

Comments: Data space exploration, privacy design, data-enabled objects, design ethnography, research projects, field study, 44 pages

Journal ref: ACM Trans. Comput.-Hum. Interact. 1, 1, Article 1 (January 2022)

Showing 1–50 of 309 results for author: Liang, R