Search | arXiv e-print repository

Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework

Authors: Hao Jing, Anhong Wang, Lijun Zhao, Yakun Yang, Donghan Bu, Jing Zhang, Yifan Zhang, Junhui Hou

Abstract: In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information. However, traditional sampling methods used in preprocessing often ignore semantic features, leading to detail loss and ground point interference in 3D object detection. To address this, we propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-branch Sampling (SMS) module and multi-view consistency constraints. The SMS module includes random sampling, Density Equalization Sampling (DES) for enhancing distant objects, and Ground Abandonment Sampling (GAS) to focus on non-ground points. The sampled multi-view points are processed through a Consistent KeyPoint Selection (CKPS) module to generate consistent keypoint masks for efficient proposal sampling. The first-stage detector uses multi-branch parallel learning with multi-view consistency loss for feature aggregation, while the second-stage detector fuses multi-view data through a Multi-View Fusion Pooling (MVFP) module to precisely predict 3D objects. Experimental results on the KITTI 3D object detection benchmark dataset show that our method improves detection performance across a variety of backbones, especially low-performance backbones with simple network structures.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05681 [pdf]

Bulk high-temperature superconductivity in the high-pressure tetragonal phase of bilayer La2PrNi2O7

Authors: Ningning Wang, Gang Wang, Xiaoling Shen, Jun Hou, Jun Luo, Xiaoping Ma, Huaixin Yang, Lifen Shi, Jie Dou, Jie Feng, Jie Yang, Yunqing Shi, Zhian Ren, Hanming Ma, Pengtao Yang, Ziyi Liu, Yue Liu, Hua Zhang, Xiaoli Dong, Yuxin Wang, Kun Jiang, Jiangping Hu, Stuart Calder, Jiaqiang Yan, Jianping Sun , et al. (4 additional authors not shown)

Abstract: The Ruddlesden-Popper (R-P) bilayer nickelate, La3Ni2O7, was recently found to show signatures of high-temperature superconductivity (HTSC) at pressures above 14 GPa. Subsequent investigations achieved zero resistance in single- and poly-crystalline samples under hydrostatic pressure conditions. Yet, obvious diamagnetic signals, the other hallmark of superconductors, are still lacking owing to the filamentary nature with low superconducting volume fraction. The presence of a novel "1313" polymorph and competing R-P phases obscured proper identification of the phase for HTSC. Thus, achieving bulk HTSC and identifying the phase at play are the most prominent tasks at present. Here, we address these issues in the praseodymium (Pr)-doped La2PrNi2O7 polycrystalline samples. We find that the substitution of Pr for La effectively inhibits the intergrowth of different R-P phases, resulting in a nearly pure bilayer structure. For La2PrNi2O7, a pressure-induced orthorhombic-to-tetragonal structural transition takes place at Pc ~ 11 GPa, above which HTSC emerges gradually upon further compression. The superconducting transition temperatures at 18-20 GPa reach Tconset = 82.5 K and Tczero = 60 K, which are the highest values among known nickelate superconductors. More importantly, bulk HTSC was evidenced by detecting clear diamagnetic signals below ~75 K corresponding to an estimated superconducting volume fraction ~ 57(5)% at 20 GPa. Our results not only resolve the existing controversies but also illuminate directions for exploring bulk HTSC in the bilayer nickelates.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05053 [pdf, other]

Adaptive Stiffness: A Biomimetic Robotic System with Tensegrity-Based Compliant Mechanism

Authors: Po-Yu Hsieh, June-Hao Hou

Abstract: Biomimicry has played a pivotal role in robotics. In contrast to rigid robots, bio-inspired robots exhibit an inherent compliance, facilitating versatile movements and operations in constrained spaces. The robot implementation in fabrication, however, has posed technical challenges and mechanical complexity, thereby underscoring a noticeable gap between research and practice. To address the limitation, the research draws inspiration from the unique musculoskeletal feature of vertebrate physiology, which displays significant capabilities for sophisticated locomotion. The research converts the biological paradigm into a tensegrity-based robotic system, which is formed by the design of rigid-flex coupling and a compliant mechanism. This integrated technique enables the robot to achieve a wide range of motions with variable stiffness and adaptability, holding great potential for advanced performance in ill-defined environments. In summation, the research aims to provide a robust foundation for tensegrity-based biomimetic robots in practice, enhancing the feasibility of undertaking intricate robotic constructions.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 14 pages, 21 figures

arXiv:2407.03594 [pdf, other]

UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos

Authors: Yuzhong Huang, Chen Liu, Ji Hou, Ke Huo, Shiyu Dong, Fred Morstatter

Abstract: We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos. Unlike existing methods that detect planes from local observations and associate them across the video for the final reconstruction, UniPlane unifies both the detection and the reconstruction tasks in a single network, which allows us to directly optimize final reconstruction quality and fully leverage temporal information. Specifically, we build a Transformers-based deep neural network that jointly constructs a 3D feature volume for the environment and estimates a set of per-plane embeddings as queries. UniPlane directly reconstructs the 3D planes by taking dot products between voxel embeddings and the plane embeddings followed by binary thresholding. Extensive experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks, achieving +4.6 in F-score in geometry as well as consistent improvements in other geometry and segmentation metrics.

Submitted 3 July, 2024; originally announced July 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2206.07710 by other authors
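
The reconstruction step described in the UniPlane abstract above (dot products between voxel embeddings and plane embeddings, followed by binary thresholding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `plane_occupancy`, the plain-list data layout, the toy embeddings, and the threshold value of 0.0 are all assumptions made for illustration.

```python
def plane_occupancy(voxel_embeddings, plane_embeddings, threshold=0.0):
    """Return masks[p][v] == True when voxel v is assigned to plane p.

    Hypothetical helper: each embedding is a plain list of floats, and
    the threshold is an assumed value, not one taken from the paper.
    """
    masks = []
    for plane in plane_embeddings:
        row = []
        for voxel in voxel_embeddings:
            # Dot product between one voxel embedding and one plane embedding.
            score = sum(a * b for a, b in zip(voxel, plane))
            # Binary thresholding turns the score into an occupancy decision.
            row.append(score > threshold)
        masks.append(row)
    return masks


# Toy data: three voxels and two plane queries in a 2-D embedding space.
voxels = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
planes = [[1.0, -1.0], [-1.0, 1.0]]
masks = plane_occupancy(voxels, planes)
```

In this toy setup each plane query claims only the voxel whose embedding points the same way; the third voxel scores exactly zero against both queries and is left unassigned.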

arXiv:2407.02741 [pdf]

18 GHz Solidly Mounted Resonator in Scandium Aluminum Nitride on SiO2/Ta2O5 Bragg Reflector

Authors: Omar Barrera, Nishanth Ravi, Kapil Saha, Supratik Dasgupta, Joshua Campbell, Jack Kramer, Eugene Kwon, Tzu-Hsuan Hsu, Sinwoo Cho, Ian Anderson, Pietro Simeoni, Jue Hou, Matteo Rinaldi, Mark S. Goorsky, Ruochen Lu

Abstract: This work reports an acoustic solidly mounted resonator (SMR) at 18.64 GHz, among the highest operating frequencies reported. The device is built in scandium aluminum nitride (ScAlN) on top of silicon dioxide (SiO2) and tantalum pentoxide (Ta2O5) Bragg reflectors on a silicon (Si) wafer. The stack is analyzed with X-ray reflectivity (XRR) and high-resolution X-ray diffraction (HRXRD). The resonator shows a coupling coefficient (k2) of 2.0%, high series quality factor (Qs) of 156, shunt quality factor (Qp) of 142, and maximum Bode quality factor (Qmax) of 210. The third-order harmonic at 59.64 GHz is also observed, with k2 around 0.6% and Q around 40. Upon further development, the reported acoustic resonator platform can enable various front-end signal-processing functions, e.g., filters and oscillators, at future frequency range 3 (FR3) bands.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 5 pages, 9 figures, 5 tables

arXiv:2407.02428 [pdf, other]

Comparative Evaluation of Learning Models for Bionic Robots: Non-Linear Transfer Function Identifications

Authors: Po-Yu Hsieh, June-Hao Hou

Abstract: The control and modeling of bionic robot dynamics have increasingly adopted model-free control strategies using machine learning methods. Given the non-linear elastic nature of bionic robotic systems, learning-based methods provide reliable alternatives by utilizing numerical data to establish a direct mapping from actuation inputs to robot trajectories without complex kinematics models. However, for developers, the method of identifying an appropriate learning model for their specific bionic robots and further constructing the transfer function has not been thoroughly discussed. Thus, this research trains four types of models, including ensemble learning models, regularization-based models, kernel-based models, and neural network models, suitable for multi-input multi-output (MIMO) data and non-linear transfer function identification, in order to evaluate their (1) accuracy, (2) computational complexity, and (3) performance in capturing biological movements. This research encompasses data collection methods for control inputs and action outputs, selection of machine learning models, comparative analysis of training results, and transfer function identifications. The main objective is to provide a comprehensive evaluation strategy and framework for the application of model-free control.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 16 pages, 20 figures

arXiv:2407.01625 [pdf, other]

Balanced clique subdivisions and cycles lengths in $K_{s, t}$-free graphs

Authors: Jianfeng Hou, Yindong Jin, Donglei Yang, Fan Yang

Abstract: Let $t\ge s\ge2$ be integers. Confirming a conjecture of Mader, Liu and Montgomery [J. Lond. Math. Soc., 2017] showed that every $K_{s, t}$-free graph with average degree $d$ contains a subdivision of a clique with at least $\Omega(d^{\frac{s}{2(s-1)}})$ vertices. We give an improvement by showing that such a graph contains a balanced subdivision of a clique with the same order, where a balanced subdivision is a subdivision in which each edge is subdivided the same number of times. In 1975, Erdős asked whether the sum of the reciprocals of the cycle lengths in a graph with infinite average degree $d$ is necessarily infinite. Recently, Liu and Montgomery [J. Amer. Math. Soc., 2023] confirmed the asymptotically correct lower bound on the reciprocals of the cycle lengths, and provided a lower bound of at least $(\frac{1}{2} -o_d(1)) \log d$. In this paper, we improve this lower bound to $\left(\frac{s}{2(s-1)} -o_d(1)\right) \log d$ for $K_{s, t}$-free graphs. Both proofs of our results use the graph sublinear expansion property as well as some novel structural techniques.

Submitted 29 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2010.15802 by other authors

arXiv:2407.01330 [pdf, other]

Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

Authors: Jiangbei Hu, Yanggeng Li, Fei Hou, Junhui Hou, Zhebin Zhang, Shengfa Wang, Na Lei, Ying He

Abstract: Unsigned distance fields (UDFs) provide a versatile framework for representing a diverse array of 3D shapes, encompassing both watertight and non-watertight geometries. Traditional UDF learning methods typically require extensive training on large datasets of 3D shapes, which is costly and often necessitates hyperparameter adjustments for new datasets. This paper presents a novel neural framework, LoSF-UDF, for reconstructing surfaces from 3D point clouds by leveraging local shape functions to learn UDFs. We observe that 3D shapes manifest simple patterns within localized areas, prompting us to create a training dataset of point cloud patches characterized by mathematical functions that represent a continuum from smooth surfaces to sharp edges and corners. Our approach learns features within a specific radius around each query point and utilizes an attention mechanism to focus on the crucial features for UDF estimation. This method enables efficient and robust surface reconstruction from point clouds without the need for shape-specific training. Additionally, our method exhibits enhanced resilience to noise and outliers in point clouds compared to existing methods. We present comprehensive experiments and comparisons across various datasets, including synthetic and real-scanned point clouds, to validate our method's efficacy.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 14 pages, 11 figures

ACM Class: I.3.5

arXiv:2407.01306 [pdf, other]

Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability

Authors: Chenxi Li, Abhinav Kumar, Zhen Guo, Jie Hou, Reza Tourani

Abstract: The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root causes of attacks based on raw data features. In this paper, we aim to address these knowledge gaps by first exploring statistical approaches to identify the most informative neurons and quantifying the significance of the hidden activations from the selected neurons on attack accuracy, in isolation and combination. Additionally, we propose an attack-driven explainable framework by integrating the target and attack models to identify the most influential features of raw data that lead to successful membership inference attacks. Our proposed MIA shows an improvement of up to 26% on state-of-the-art MIA.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 20 pages, 10 figures, 4 tables

arXiv:2407.00866 [pdf, other]

Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning

Authors: Nexhi Sula, Abhinav Kumar, Jie Hou, Han Wang, Reza Tourani

Abstract: With the continued advancement and widespread adoption of machine learning (ML) models across various domains, ensuring user privacy and data security has become a paramount concern. In compliance with data privacy regulations, such as GDPR, a secure machine learning framework should not only grant users the right to request the removal of their contributed data used for model training but also facilitate the elimination of sensitive data fingerprints within machine learning models to mitigate potential attacks, a process referred to as machine unlearning. In this study, we present a novel unlearning mechanism designed to effectively remove the impact of specific data samples from a neural network while considering the performance of the unlearned model on the primary task. In achieving this goal, we crafted a novel loss function tailored to eliminate privacy-sensitive information from weights and activation values of the target model by combining target classification loss and membership inference loss. Our adaptable framework can easily incorporate various privacy leakage approximation mechanisms to guide the unlearning process. We provide empirical evidence of the effectiveness of our unlearning approach with a theoretical upper-bound analysis through a membership inference mechanism as a proof of concept. Our results showcase the superior performance of our approach in terms of unlearning efficacy and latency as well as the fidelity of the primary task, across four datasets and four deep learning architectures.

Submitted 5 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

Comments: 17 pages, 14 figures, 6 tables

arXiv:2407.00427 [pdf, ps, other]

On the boundedness of degenerate hypergraphs

Authors: Jianfeng Hou, Caiyun Hu, Heng Li, Xizhi Liu, Caihong Yang, Yixiao Zhang

Abstract: We investigate the impact of a high-degree vertex in Turán problems for degenerate hypergraphs (including graphs). We say an $r$-graph $F$ is bounded if there exist constants $\alpha, \beta>0$ such that for large $n$, every $n$-vertex $F$-free $r$-graph with a vertex of degree at least $\alpha\binom{n-1}{r-1}$ has fewer than $(1-\beta) \cdot \mathrm{ex}(n,F)$ edges. The boundedness property is crucial for recent works~\cite{HHLLYZ23a,DHLY24} that aim to extend the classical Hajnal--Szemerédi Theorem and the anti-Ramsey theorems of Erdős--Simonovits--Sós. We show that many well-studied degenerate hypergraphs, such as all even cycles, most complete bipartite graphs, and the expansion of most complete bipartite graphs, are bounded. In addition, to prove the boundedness of the expansion of complete bipartite graphs, we introduce and solve a Zarankiewicz-type problem for $3$-graphs, strengthening a theorem by Kostochka--Mubayi--Verstraëte~\cite{KMV15}.

Submitted 29 June, 2024; originally announced July 2024.

Comments: comments are welcome

arXiv:2406.15683 [pdf, other]

Parity-Odd Power Spectra: Concise Statistics for Cosmological Parity Violation

Authors: Drew Jamieson, Angelo Caravano, Jiamin Hou, Zachary Slepian, Eiichiro Komatsu

Abstract: We introduce the Parity-Odd Power (POP) spectra, a novel set of observables for probing parity violation in cosmological $N$-point statistics. POP spectra are derived from composite fields obtained by applying nonlinear transformations, involving also gradients, curls, and filtering functions, to a scalar field. This compresses the parity-odd trispectrum into a power spectrum. These new statistics offer several advantages: they are computationally fast to construct, estimating their covariance is less demanding compared to estimating that of the full parity-odd trispectrum, and they are simple to model theoretically. We measure the POP spectra on simulations of a scalar field with a specific parity-odd trispectrum shape. We compare these measurements to semi-analytic theoretical calculations and find agreement. We also explore extensions and generalizations of these parity-odd observables.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 18 pages, 5 figures

arXiv:2406.15385 [pdf, ps, other]

On a Generating Function for the Isotropic Basis Functions and Other Connected Results

Authors: Zachary Slepian, Jessica Chellino, Jiamin Hou

Abstract: Recently isotropic basis functions of $N$ unit vector arguments were presented; these are of significant use in measuring the N-Point Correlation Functions (NPCFs) of galaxy clustering. Here we develop the generating function for these basis functions -- i.e. that function which, expanded in a power series, has as its angular part the isotropic functions. We show that this can be developed using basic properties of the plane wave. A main use of the generating function is as an efficient route to obtaining the Cartesian basis expressions for the isotropic functions. We show that the methods here enable computing difficult overlap integrals of multiple spherical Bessel functions, and we also give related expansions of the Dirac Delta function into the isotropic basis. Finally, we outline how the Cartesian expressions for the isotropic basis functions might be used to enable a faster NPCF algorithm on the CPU.

Submitted 20 April, 2024; originally announced June 2024.

Comments: 28 pages, no figures, comments welcome

arXiv:2406.14083 [pdf, ps, other]

Tight bounds for rainbow partial $F$-tiling in edge-colored complete hypergraphs

Authors: Jinghua Deng, Jianfeng Hou, Xizhi Liu, Caihong Yang

Abstract: For an $r$-graph $F$ and integers $n,t$ satisfying $t \le n/v(F)$, let $\mathrm{ar}(n,tF)$ denote the minimum integer $N$ such that every edge-coloring of $K_{n}^{r}$ using $N$ colors contains a rainbow copy of $tF$, where $tF$ is the $r$-graph consisting of $t$ vertex-disjoint copies of $F$. The case $t=1$ is the classical anti-Ramsey problem proposed by Erdős--Simonovits--Sós~\cite{ESS75}. When $F$ is a single edge, this becomes the rainbow matching problem introduced by Schiermeyer~\cite{Sch04} and Özkahya--Young~\cite{OY13}. We conduct a systematic study of $\mathrm{ar}(n,tF)$ for the case where $t$ is much smaller than $\mathrm{ex}(n,F)/n^{r-1}$. Our first main result provides a reduction of $\mathrm{ar}(n,tF)$ to $\mathrm{ar}(n,2F)$ when $F$ is bounded and smooth, two properties satisfied by most previously studied hypergraphs. Complementing the first result, the second main result, which utilizes gaps between Turán numbers, determines $\mathrm{ar}(n,tF)$ for relatively smaller $t$. Together, these two results determine $\mathrm{ar}(n,tF)$ for a large class of hypergraphs. Additionally, the latter result has the advantage of being applicable to hypergraphs with unknown Turán densities, such as the famous tetrahedron $K_{4}^{3}$.

Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 19 pages, 1 figure, comments are welcome

arXiv:2406.10175 [pdf, other]

Enhancing Incomplete Multi-modal Brain Tumor Segmentation with Intra-modal Asymmetry and Inter-modal Dependency

Authors: Weide Liu, Jingwen Hou, Xiaoyang Zhong, Huijing Zhan, Jun Cheng, Yuming Fang, Guanghui Yue

Abstract: Deep learning-based brain tumor segmentation (BTS) models for multi-modal MRI images have seen significant advancements in recent years. However, a common problem in practice is the unavailability of some modalities due to varying scanning protocols and patient conditions, making segmentation from incomplete MRI modalities a challenging issue. Previous methods have attempted to address this by fusing accessible multi-modal features, leveraging attention mechanisms, and synthesizing missing modalities using generative models. However, these methods ignore the intrinsic problems of medical image segmentation, such as the limited availability of training samples, particularly for cases with tumors. Furthermore, these methods require training and deploying a specific model for each subset of missing modalities. To address these issues, we propose a novel approach that enhances the BTS model from two perspectives. Firstly, we introduce a pre-training stage that generates a diverse pre-training dataset covering a wide range of different combinations of tumor shapes and brain anatomy. Secondly, we propose a post-training stage that enables the model to reconstruct missing modalities in the prediction results when only partial modalities are available. To achieve the pre-training stage, we conceptually decouple the MRI image into two parts: `anatomy' and `tumor'. We pre-train the BTS model using synthesized data generated from the anatomy and tumor parts across different training samples. ...
Extensive experiments demonstrate that our proposed method significantly improves the performance over the baseline and achieves new state-of-the-art results on three brain tumor segmentation datasets: BRATS2020, BRATS2018, and BRATS2015.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08374 [pdf, other]

2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

Authors: Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

Abstract: Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods. However, due to the high computation cost and memory burden, it is largely limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation with application on NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for axial, coronal, and sagittal views, whose outputs are averaged in each sampling step to ensure the 3D generation quality from multiple views. To accelerate the 3D sampling process, we also proposed a strategy to use the CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggested that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and Diffusion-based baseline methods.

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures
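
The multi-view averaging idea in the MADM abstract above (three per-view models whose outputs are averaged at every sampling step) can be sketched as a toy loop. This is only an illustration under stated assumptions, not the published method: the volume is flattened to a list of floats, the three "view models" are stand-in functions rather than diffusion networks, and the names `average_views` and `sample` are hypothetical.

```python
def average_views(axial, coronal, sagittal):
    """Element-wise mean of the three per-view predictions."""
    return [(a + c + s) / 3.0 for a, c, s in zip(axial, coronal, sagittal)]


def sample(volume, view_models, steps):
    """Run `steps` sampling steps, averaging the view outputs at each one.

    `view_models` is assumed to hold exactly three callables, each mapping
    (volume, step) to a new volume of the same length.
    """
    for step in reversed(range(steps)):
        predictions = [model(volume, step) for model in view_models]
        volume = average_views(*predictions)
    return volume


# Stand-in "models": each simply shifts the volume by a fixed offset.
def axial_model(volume, step):
    return [x + 1.0 for x in volume]


def coronal_model(volume, step):
    return [x + 2.0 for x in volume]


def sagittal_model(volume, step):
    return [x + 3.0 for x in volume]


result = sample([0.0, 1.0], [axial_model, coronal_model, sagittal_model], steps=2)
```

Averaging inside the loop, rather than once at the end, means every view model sees the same fused volume at the next step, which is the property the abstract attributes to the per-step averaging.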

arXiv:2406.06329 [pdf, other]

A Parameter-efficient Language Extension Framework for Multilingual ASR

Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

Abstract: Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framework for language extension that can fundamentally solve catastrophic forgetting, dubbed PELE. PELE is designed to be parameter-efficient, incrementally incorporating an add-on module to adapt to a new language. Specifically, different parameter-efficient fine-tuning (PEFT) modules and their variants are explored as potential candidates to perform XLA. Experiments are carried out on 5 new languages with a wide range of low-resourced data sizes. The best-performing PEFT candidate can achieve satisfactory performance across all languages and demonstrates superiority in three of five languages over the continual joint learning setting. Notably, PEFT methods focusing on weight parameters or input features are revealed to be limited in performance, showing significantly inferior extension capabilities compared to inserting a lightweight module in between layers such as an Adapter.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.05985 [pdf, other]

LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding

Authors: Jiawei Hou, Wenhao Guan, Xiangyang Xue, Taiping Zeng

Abstract: Spatial cognition empowers animals with remarkably efficient navigation abilities, largely depending on the scene-level understanding of spatial environments. Recently, it has been found that a neural population in the postrhinal cortex of rat brains is more strongly tuned to the spatial layout than to objects in a scene. Inspired by the representations of spatial layout in local scenes to encode different regions separately, we propose LOP-Field, which realizes the Layout-Object-Position (LOP) association to model the hierarchical representations for robotic scene understanding. Powered by foundation models and implicit scene representation, a neural field is implemented as a scene memory for robots, storing a queryable representation of scenes with position-wise, object-wise, and layout-wise information. To validate the built LOP association, the model is tested to infer region information from 3D positions with quantitative metrics, achieving an average accuracy of more than 88%. It is also shown that the proposed method using region information can achieve improved object and view localization results with text and RGB input compared to state-of-the-art localization methods.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.00692 [pdf]

Harvesting room-temperature plasticity in ceramics by mechanically seeded dislocations

Authors: Xufei Fang, Wenjun Lu, Jiawen Zhang, Christian Minnert, Junhua Hou, Sebastian Bruns, Ulrike Kunz, Atsutomo Nakamura, Karsten Durst, Jürgen Rödel

Abstract: The quest for room-temperature ductile ceramics has been repeatedly fueled by hopes for large-scale applications but so far has not been successful. Recent demonstrations of enhanced functional properties in ceramics through judicious dislocation imprint, however, have been sparking renewed interest in dislocation plasticity in brittle ceramics. Here, we propose a facile approach using room-temperature mechanically seeded mobile dislocations with a density of ~10^14/m^2 to significantly improve the room-temperature plasticity of ceramics with a large plastic strain beyond ~30%. The seeded mobile dislocations trigger profuse dislocation multiplication via cross slip and motion. Hence, they offer an avenue to suppress brittle fracture and harvest plasticity in ceramics without any additional high-temperature process. We employ both in situ nano-/micromechanical deformation and ex situ bulk deformation to bridge the length scales. This finding tackles the pressing bottleneck of dislocation engineering in ceramics for achieving ductile ceramics and harvesting both versatile mechanical and functional properties.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00434 [pdf, other]

MoDGS: Dynamic Gaussian Splatting from Causually-captured Monocular Videos

Authors: Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lv, Peng Wang, Wenping Wang, Junhui Hou

Abstract: In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms baseline methods by a significant margin.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.00037 [pdf, other]

Aligning LLMs through Multi-perspective User Preference Ranking-based Feedback for Programming Question Answering

Authors: Hongyu Yang, Liyang He, Min Hou, Shuanghong Shen, Rui Li, Jiahui Hou, Jianhui Ma, Junda Zhao

Abstract: Code Community Question Answering (CCQA) seeks to tackle programming-related issues, thereby boosting productivity in both software engineering and academic research. Recent advancements in Reinforcement Learning from Human Feedback (RLHF) have transformed the fine-tuning process of Large Language Models (LLMs) to produce responses that closely mimic human behavior. Leveraging LLMs with RLHF for practical CCQA applications has thus emerged as a promising area of study. Unlike standard code question-answering tasks, CCQA involves multiple possible answers, with varying user preferences for each response. Additionally, code communities often show a preference for new APIs. These challenges prevent LLMs from generating responses that cater to the diverse preferences of users in CCQA tasks. To address these issues, we propose a novel framework called Aligning LLMs through Multi-perspective User Preference Ranking-based Feedback for Programming Question Answering (ALMupQA) to create user-focused responses. Our approach starts with Multi-perspective Preference Ranking Alignment (MPRA), which synthesizes varied user preferences based on the characteristics of answers from code communities. We then introduce a Retrieval-augmented In-context Learning (RIL) module to mitigate the problem of outdated answers by retrieving responses to similar questions from a question bank. Due to the limited availability of high-quality, multi-answer CCQA datasets, we also developed a dataset named StaCCQA from real code communities. Extensive experiments demonstrated the effectiveness of the ALMupQA framework in terms of accuracy and user preference. Compared to the base model, ALMupQA showed nearly an 11% improvement in BLEU, with increases of 20% and 17.5% in BERTScore and CodeBERTScore, respectively.

Submitted 27 May, 2024; originally announced June 2024.

arXiv:2405.20188 [pdf, other]

SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid Registration

Authors: Yuxin Yao, Bailin Deng, Junhui Hou, Juyong Zhang

Abstract: Existing optimization-based methods for non-rigid registration typically minimize an alignment error metric based on the point-to-point or point-to-plane distance between corresponding point pairs on the source surface and target surface. However, these metrics can result in slow convergence or a loss of detail. In this paper, we propose SPARE, a novel formulation that utilizes a symmetrized point-to-plane distance for robust non-rigid registration. The symmetrized point-to-plane distance relies on both the positions and normals of the corresponding points, yielding a more accurate approximation of the underlying geometry and higher accuracy than existing methods. To solve this optimization problem efficiently, we propose an alternating minimization solver using a majorization-minimization strategy. Moreover, for effective initialization of the solver, we incorporate a deformation graph-based coarse alignment that improves registration quality and efficiency. Extensive experiments show that the proposed method greatly improves the accuracy of non-rigid registration problems and maintains relatively high solution efficiency. The code is publicly available at https://github.com/yaoyx689/spare.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19684 [pdf, other]

A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning

Authors: Xiaofeng Cong, Yu Zhao, Jie Gui, Junming Hou, Dacheng Tao

Abstract: Underwater image enhancement (UIE) presents a significant challenge within computer vision research. Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent. To foster future advancements, we provide a detailed overview of the UIE task from several perspectives. Firstly, we introduce the physical models, data construction processes, evaluation metrics, and loss functions. Secondly, we categorize and discuss recent algorithms based on their contributions, considering six aspects: network architecture, learning strategy, learning stage, auxiliary tasks, domain perspective, and disentanglement fusion. Thirdly, due to the varying experimental setups in the existing literature, a comprehensive and unbiased comparison is currently unavailable. To address this, we perform both quantitative and qualitative evaluations of state-of-the-art algorithms across multiple benchmark datasets. Lastly, we identify key areas for future research in UIE. A collection of resources for UIE can be found at https://github.com/YuZhao1999/UIE.

Submitted 25 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: A survey on the underwater image enhancement task

arXiv:2405.17208 [pdf, other]

Impact and mitigation of spectroscopic systematics on DESI DR1 clustering measurements

Authors: A. Krolewski, J. Yu, A. J. Ross, S. Penmetsa, W. J. Percival, R. Zhou, J. Hou, J. Aguilar, S. Ahlen, D. Brooks, E. Chaussidon, T. Claybaugh, A. de la Macorra, Biprateep Dey, J. E. Forero-Romero, S. Gontcho A Gontcho, J. Guy, K. Honscheid, S. Juneau, D. Kirkby, T. Kisner, A. Kremin, A. Lambert, L. Le-Guillou, M. E. Levi, et al. (18 additional authors not shown)

Abstract: The large scale structure catalogs within DESI Data Release 1 (DR1) use nearly 6 million galaxies and quasars as tracers of the large-scale structure of the universe to measure the expansion history with baryon acoustic oscillations and the growth of structure with redshift-space distortions. In order to take advantage of DESI's unprecedented statistical power, we must ensure that the galaxy clustering measurements are unaffected by non-cosmological density fluctuations. One source of spurious fluctuations comes from variation in galaxy density with spectroscopic observing conditions, lowering the redshift efficiency (and thus galaxy density) in certain areas of the sky. We measure the uniformity of the redshift success rate for DESI luminous red galaxies (LRG), bright galaxies (BGS) and quasars (QSO), complementing the detailed discussion of emission line galaxy (ELG) systematics in a companion paper (Yu et al., 2024). We find small but significant fluctuations of up to 3% in redshift success rate with the effective spectroscopic signal-to-noise, and create and describe weights that remove these fluctuations. We also describe the process to identify and remove data from certain poorly performing fibers from DESI DR1, and measure the stability of the redshift success rate with time. Finally, we find small but significant correlations of redshift success rate with position on the focal plane, survey speed, and number of exposures required, and show the impact of weights correcting these trends on the power spectrum multipoles and on cosmological parameters from BAO and RSD fits. These corrections change the best-fit parameters by <15% of their statistical errors, and thus contribute negligibly to the overall DESI error budget.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 53 pages, 41 figures. Supporting paper for DESI DR1 cosmological measurements

arXiv:2405.15364 [pdf, other]

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

Authors: Meng You, Zhiyu Zhu, Hui Liu, Junhui Hou

Abstract: By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training. NVS-Solver adaptively modulates the diffusion sampling process with the given views to enable the creation of remarkable visual experiences from single or multiple views of static scenes or monocular videos of dynamic scenes. Specifically, built upon our theoretical modeling, we iteratively modulate the score function with the given scene priors represented with warped input views to control the video diffusion process. Moreover, by theoretically exploring the boundary of the estimation error, we achieve the modulation in an adaptive fashion according to the view pose and the number of diffusion steps. Extensive evaluations on both static and dynamic scenes substantiate the significant superiority of our NVS-Solver over state-of-the-art methods both quantitatively and qualitatively. Source code is available at https://github.com/ZHU-Zhiyu/NVS_Solver.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Technical Report

arXiv:2405.15034 [pdf, other]

NeCGS: Neural Compression for 3D Geometry Sets

Authors: Siyu Ren, Junhui Hou, Wenping Wang

Abstract: This paper explores the problem of effectively compressing 3D geometry sets containing diverse categories. We make the first attempt to tackle this fundamental and challenging problem and propose NeCGS, a neural compression paradigm, which can compress hundreds of detailed and diverse 3D mesh models (~684 MB) by about 900 times (0.76 MB) with high accuracy and preservation of fine geometric details. Specifically, we first represent each irregular mesh model/shape in a regular representation that implicitly describes the geometry structure of the model using a 4D regular volume, called TSDF-Def volume. Such a regular representation can not only capture local surfaces more effectively but also facilitate the subsequent process. Then we construct a quantization-aware auto-decoder network architecture to regress these 4D volumes, which can summarize the similarity of local geometric structures within a model and across different models for redundancy elimination, resulting in more compact representations, including an embedded feature of a smaller size associated with each model and a network parameter set shared by all models. We finally encode the resulting features and network parameters into bitstreams through entropy coding. After decompressing the features and network parameters, we can reconstruct the TSDF-Def volumes, where the 3D surfaces can be extracted through the deformable marching cubes. Extensive experiments and ablation studies demonstrate the significant advantages of our NeCGS over state-of-the-art methods both quantitatively and qualitatively.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14633 [pdf, other]

Flatten Anything: Unsupervised Neural Surface Parameterization

Authors: Qijian Zhang, Junhui Hou, Wenping Wang, Ying He

Abstract: Surface parameterization plays an essential role in numerous computer graphics and geometry processing applications. Traditional parameterization approaches are designed for high-quality meshes laboriously created by specialized 3D modelers, thus unable to meet the processing demand for the current explosion of ordinary 3D data. Moreover, their working mechanisms are typically restricted to certain simple topologies, thus relying on cumbersome manual efforts (e.g., surface cutting, part segmentation) for pre-processing. In this paper, we introduce the Flatten Anything Model (FAM), an unsupervised neural architecture to achieve global free-boundary surface parameterization via learning point-wise mappings between 3D points on the target geometric surface and adaptively-deformed UV coordinates within the 2D parameter domain. To mimic the actual physical procedures, we ingeniously construct geometrically-interpretable sub-networks with specific functionalities of surface cutting, UV deforming, unwrapping, and wrapping, which are assembled into a bi-directional cycle mapping framework. Compared with previous methods, our FAM directly operates on discrete surface points without utilizing connectivity information, thus significantly reducing the strict requirements for mesh quality and even applicable to unstructured point cloud data. More importantly, our FAM is fully-automated without the need for pre-cutting and can deal with highly-complex topologies, since its learning process adaptively finds reasonable cutting seams and UV boundaries. Extensive experiments demonstrate the universality, superiority, and inspiring potential of our proposed neural surface parameterization paradigm. The code will be publicly available.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14271 [pdf, other]

Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models

Authors: Yifan Zhang, Junhui Hou

Abstract: Contrastive image-to-LiDAR knowledge transfer, commonly used for learning 3D representations with synchronized images and point clouds, often faces a self-conflict dilemma. This issue arises as contrastive losses unintentionally dissociate features of unmatched points and pixels that share semantic labels, compromising the integrity of learned representations. To overcome this, we harness Visual Foundation Models (VFMs), which have revolutionized the acquisition of pixel-level semantics, to enhance 3D representation learning. Specifically, we utilize off-the-shelf VFMs to generate semantic labels for weakly-supervised pixel-to-point contrastive distillation. Additionally, we employ von Mises-Fisher distributions to structure the feature space, ensuring semantic embeddings within the same class remain consistent across varying inputs. Furthermore, we adapt sampling probabilities of points to address imbalances in spatial distribution and category frequency, promoting comprehensive and balanced learning. Extensive experiments demonstrate that our approach mitigates the challenges posed by traditional methods and consistently surpasses existing image-to-LiDAR contrastive distillation methods in downstream tasks. The source code is available at https://github.com/Eaphan/OLIVINE.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Under review

arXiv:2405.14160 [pdf, other]

Embedded Majorana Islands

Authors: Jin-Xing Hou, Alex Westström, Rui Wang, Wen-Li Yang, Jian Li

Abstract: Mesoscopic superconducting islands hosting Majorana zero modes (MZMs), or Majorana islands for short, offer a prototype of topological qubits. In this work we investigate theoretically the model of a generic Majorana island tunneling-coupled to a single-piece metallic substrate, hence an embedded Majorana island. We show the crucial consequences of an interplay between the topological ground states nonlocally addressed by the MZMs and the metallic bath with coherent electron propagation: on the one hand, the topological degeneracy on the Majorana island can be preserved, by virtue of the particle-hole symmetry, despite the apparent bath-induced coupling between MZMs; on the other hand, the electronic interference in the metallic bath may lead to profound alterations to the renormalization group behavior of the hybrid system towards low energy/temperature compared with conventional Kondo physics. This work serves to establish the model of embedded Majorana islands as an experimentally relevant and theoretically intriguing problem particularly in the direction of topological quantum computation.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 11 pages, 3 figures

arXiv:2405.12223 [pdf, other]

Cascaded Multi-path Shortcut Diffusion Model for Medical Image Translation

Authors: Yinchi Zhou, Tianqi Chen, Jun Hou, Huidong Xie, Nicha C. Dvornek, S. Kevin Zhou, David L. Wilson, James S. Duncan, Chi Liu, Bo Zhou

Abstract: Image-to-image translation is a vital component in medical imaging processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their capability in medical image translation tasks, the potential of combining a GAN and DM to further improve translation performance and to enable uncertainty estimation remains largely unexplored. In this work, we address these challenges by proposing a Cascade Multi-path Shortcut Diffusion Model (CMDM) for high-quality medical image translation and uncertainty estimation. To reduce the required number of iterations and ensure robust performance, our method first obtains a conditional GAN-generated prior image that will be used for the efficient reverse translation with a DM in the subsequent step. Additionally, a multi-path shortcut diffusion strategy is employed to refine translation results and estimate uncertainty. A cascaded pipeline further enhances translation quality, incorporating residual averaging between cascades. We collected three different medical image datasets with two sub-tasks for each dataset to test the generalizability of our approach. Our experimental results show that CMDM can produce high-quality translations comparable to state-of-the-art methods while providing reasonable uncertainty estimations that correlate well with the translation error.

Submitted 5 April, 2024; originally announced May 2024.

Comments: 15 pages, 5 figures

arXiv:2405.11853 [pdf, other]

On the Determination of Stellar Mass and Binary Fraction of Open Clusters within 500 pc from the Sun

Authors: Yueyue Jiang, Jing Zhong, Songmei Qin, Tong Tang, Li Chen, Jinliang Hou

Abstract: We investigated the stellar mass function and the binary fraction of 114 nearby open clusters (OCs) using the high-precision photometric data from Gaia Data Release 3 (Gaia DR3). We estimated the mass of member stars by using a ridge line (RL) that is better in line with the observed color-magnitude diagram (CMD), thus obtaining more accurate stellar mass and binary mass ratio ($q$) at the low-mass region. By analyzing the present-day mass function (PDMF) of star clusters, we found that 108 OCs follow a two-stage power-law distribution, whereas 6 OCs present a single power-law PDMF. Subsequently, we fitted the high(low)-mass index of PDMF ($dN/dm \propto m^{-\alpha}$), denoted as $\alpha_{\rm h}$($\alpha_{\rm l}$), and segmentation point $m_{\rm c}$. For our cluster sample, the median values of $\alpha_{\rm h}$ and $\alpha_{\rm l}$ are 2.65 and 0.95, respectively, which are approximately consistent with the initial mass function (IMF) results provided by Kroupa (2001). We utilized the cumulative radial number distribution of stars with different masses to quantify the degree of mass segregation. We found a significant positive correlation between the state of dynamical evolution and mass segregation in OCs. We also estimated the fraction of binary stars with $q \geq 0.5$, ranging from 6% to 34% with a median of 17%. Finally, we provided a catalog of 114 nearby cluster properties, including the total mass, the binary fraction, the PDMF, and the dynamical state.

Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 20 pages, 15 figures, accepted for publication in ApJ

arXiv:2405.11153 [pdf]

Dual-color Coherent Perfect Absorber

Authors: Boyi Xue, Jintian Lin, Jiankun Hou, Yicheng Zhu, Ruixin Ma, Xianfeng Chen, Ya Cheng, Li Ge, Wenjie Wan

Abstract: Perfect absorption of light critically affects light-matter interaction for various applications. Coherent perfect absorbers (CPA) gain the unique capability of controlling light with light in a linear fashion. Multi-color CPAs [Phys. Rev. Lett. 107, 033901] are highly desirable for broadband and nonlinear light-to-light coherent control; however, an experimental demonstration has remained elusive. Here we experimentally observe a dual-color version of CPA (DC-CPA) through second harmonic generation in a single whispering-gallery-mode microcavity. The DC-CPA enables simultaneous perfect absorption of both the incoming fundamental wave and its second harmonic. Similar to its linear counterpart, coherent control in the DC-CPA can also be realized by tuning the relative phase and intensity between the two-colored waves through nonlinear interference instead of the linear one. This scheme breaks the linear boundary of the traditional CPA into a multi-frequency domain and paves the way toward all-optical signal processing and quantum information.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.06863 [pdf, other]

Ultraprecise time-difference measurement via enhanced dual pointers with multiple weak interactions

Authors: Yanqiang Guo, Jianchao Zhang, Jiahui Hou, Xiaomin Guo, Liantuan Xiao

Abstract: Standard weak measurement with an assistant pointer and single weak interaction constrains measurement precision and quantity of interaction parameters, and a compelling characterization of quantum effect featuring weak-value amplification (WVA) remains elusive. Here, we theoretically and experimentally demonstrate an enhanced dual-pointer WVA scheme based on multiple weak interactions and variable spectrum sources. Developing triple weak interactions, momentum P pointer reaches an optimal time-difference precision of $3.34 \times {10^{-5}}$ as at 6 nm spectral width, and intensity I pointer achieves a displacement resolution of 148.8 fm within 400 kHz linewidth. A quantum effect associated with an anomalous weak value is revealed by an observable violation of a Leggett-Garg inequality. The I-pointer weak value is measured to be 1478 using multiple weak interactions and high signal-to-noise detection, achieving a two-order-of-magnitude WVA enhancement compared to standard weak measurement. Our work opens up a practical avenue for minuscule quantumness measurements in challenging environments.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Main text: 6 pages, 6 figures; Supplementary material: 5 pages, 4 figures

arXiv:2405.05273 [pdf, ps, other]

Lovasz' Conjecture and Other Applications of Topological Methods in Discrete Mathematics

Authors: Jingsi Hou, Guangyan Huang, Sammy Suliman, Haoran Yan

Abstract: In 20th-century mathematics, the field of topology, which concerns the properties of geometric objects under continuous transformation, has proved surprisingly useful in application to the study of discrete mathematics, such as combinatorics, graph theory, and theoretical computer science. In this paper, we seek to provide an introduction to the relevant topological concepts for non-specialists, as well as a selection of existing applications to theorems in discrete mathematics.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 17 pages, 9 figures

arXiv:2405.04105 [pdf, ps, other]

On self-dual Carrollian conformal nonlinear electrodynamics

Authors: Bin Chen, Jue Hou, Haowei Sun

Abstract: In this work, we study the duality symmetry group of Carrollian (nonlinear) electrodynamics and propose a family of Carrollian ModMax theories, which are invariant under Carrollian $\text{SO}(2)$ electromagnetic (EM) duality transformations and conformal transformations. We define the Carrollian $\text{SO}(2)$ EM transformations with the help of Hodge duality in Carrollian geometry, and then rederive the Gaillard-Zumino consistency condition for EM duality of Carrollian nonlinear electrodynamics. Together with the traceless condition for the energy-momentum tensor, we are able to determine the Lagrangian of the Carrollian ModMax theories among pure electrodynamics. We furthermore study their behaviors under the $\sqrt{T\bar{T}}$ deformation flow, and show that these theories deform to each other and may reach two endpoints under the flow, with one of the endpoints being the Carrollian Maxwell theory. As a byproduct, we construct a family of two-dimensional Carrollian ModMax-like multiple scalar theories, which are closed under the $\sqrt{T\bar{T}}$ flow and may flow to a BMS free multi-scalar model.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 37 pages

arXiv:2405.02474 [pdf, other]

Nonlinear magnetic sensing with hybrid nitrogen-vacancy/magnon systems

Authors: Zhongqiang Hu, Zhiping He, Qiuyuan Wang, Chung-Tao Chou, Justin T. Hou, Luqiao Liu

Abstract: Magnetic sensing beyond the linear regime could broaden the frequency range of detectable magnetic fields, which is crucial to various microwave and quantum applications. Recently, nonlinear interactions in diamond nitrogen-vacancy (NV) centers, one of the most extensively studied quantum magnetic sensors, have been proposed to realize magnetic sensing across arbitrary frequencies. In this work, we enhance these capabilities by exploiting the nonlinear spin dynamics in hybrid systems of NV centers and ferri- or ferro-magnetic (FM) thin films. We study the frequency mixing effect in the hybrid NV/magnon systems, and demonstrate that the introduction of FM not only amplifies the intensity of nonlinear resonance signals that are intrinsic to NV spins, but also enables novel frequency mixings through parametric pumping and nonlinear magnon scattering effects. The discovery and understanding of the magnetic nonlinearities in hybrid NV/magnon systems position them as a prime candidate for magnetic sensing with a broad frequency range and high tunability, particularly meaningful for nanoscale, dynamical, and non-invasive materials characterization.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.15802 [pdf, other]

Raformer: Redundancy-Aware Transformer for Video Wire Inpainting

Authors: Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han

Abstract: Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal. However, wire removal poses greater challenges because the wires are longer and slimmer than objects typically targeted in general video inpainting tasks, and often intersect with people and background objects irregularly, which adds complexity to the inpainting process. Recognizing the limitations posed by existing video wire datasets, which are characterized by their small size, poor quality, and limited variety of scenes, we introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video Dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks. The WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and efficacy of inpainting models. Building upon this, our research proposes the Redundancy-Aware Transformer (Raformer) method that addresses the unique challenges of wire removal in video inpainting. Unlike conventional approaches that indiscriminately process all frame patches, Raformer employs a novel strategy to selectively bypass redundant parts, such as static background segments devoid of valuable information for inpainting. At the core of Raformer is the Redundancy-Aware Attention (RAA) module, which isolates and accentuates essential content through a coarse-grained, window-based attention mechanism. This is complemented by a Soft Feature Alignment (SFA) module, which refines these features and achieves end-to-end feature alignment. Extensive experiments on both the traditional video inpainting datasets and our proposed WRV2 dataset demonstrate that Raformer outperforms other state-of-the-art methods.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15603 [pdf]

Development of Pattern Recognition Validation for Boson Sampling

Authors: Yang Ji, Yongzheng Wu, Shi Wang, Jie Hou, Meiling Chen, Ming Ni

Abstract: Boson sampling is one of the most attractive quantum computation models to demonstrate the quantum computational advantage. However, this aim may be hard to realize considering noise sources such as photon distinguishability. Inspired by the Bayesian validation developed to evaluate whether photon distinguishability is too high to demonstrate the quantum computational advantage, we develop the pattern recognition validation for boson sampling. Based on clusters constructed with the k-means++ method, the distribution of test values changes nearly monotonically with the photon indistinguishability, especially when photons are close to being indistinguishable. We analyze the intrinsic data structure by calculating probability distributions and mean 2-norm distances of the sorted outputs. Approximation algorithms are also used to show how the data structure changes with photon distinguishability.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14270 [pdf, other]

What do Transformers Know about Government?

Authors: Jue Hou, Anisia Katinskaia, Lari Kotilainen, Sathianpong Trangcasanchai, Anh-Duc Vu, Roman Yangarber

Abstract: This paper investigates what insights about linguistic features and what knowledge about the structure of natural language can be obtained from the encodings in transformer language models. In particular, we explore how BERT encodes the government relation between constituents in a sentence. We use several probing classifiers, and data from two morphologically rich languages. Our experiments show that information about government is encoded across all transformer layers, but predominantly in the early layers of the model. We find that, for both languages, a small number of attention heads encode enough information about the government relations to enable us to train a classifier capable of discovering new, previously unknown types of government, never seen in the training data. Currently, data is lacking for the research community working on grammatical constructions, and government in particular. We release the Government Bank, a dataset defining the government relations for thousands of lemmas in the languages in our experiments.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.12804 [pdf, other]

Linearly-evolved Transformer for Pan-sharpening

Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

Abstract: The vision transformer family has dominated the satellite pan-sharpening field, driven by the global-wise spatial information modeling mechanism of the core self-attention ingredient. The standard modeling rule within these promising pan-sharpening methods is to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may come at a huge cost in model parameters and FLOPs, thus preventing their application on low-resource satellites. To address this tension between favorable performance and expensive computation, we tailor an efficient linearly-evolved transformer variant and employ it to construct a lightweight pan-sharpening framework. In detail, we delve into the popular cascaded transformer modeling used by cutting-edge methods and develop an alternative 1-order linearly-evolved transformer variant with a 1-dimensional linear convolution chain to achieve the same function. In this way, our proposed method is capable of benefiting from the cascaded modeling rule while achieving favorable performance in an efficient manner. Extensive experiments over multiple satellite datasets suggest that our proposed method achieves competitive performance against other state-of-the-art methods with fewer computational resources. Further, the consistently favorable performance has been verified on the hyper-spectral image fusion task. Our main focus is to provide an alternative global modeling framework with an efficient structure. The code will be publicly available.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.11401 [pdf, other]

RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering

Authors: Xianqiang Lyu, Hui Liu, Junhui Hou

Abstract: We propose RainyScape, an unsupervised framework for reconstructing clean scenes from a collection of multi-view rainy images. RainyScape consists of two main modules: a neural rendering module and a rain-prediction module that incorporates a predictor network and a learnable latent embedding that captures the rain characteristics of the scene. Specifically, based on the spectral bias property of neural networks, we first optimize the neural rendering pipeline to obtain a low-frequency scene representation. Subsequently, we jointly optimize the two modules, driven by the proposed adaptive direction-sensitive gradient-based reconstruction loss, which encourages the network to distinguish between scene details and rain streaks, facilitating the propagation of gradients to the relevant components. Extensive experiments on both the classic neural radiance field and the recently proposed 3D Gaussian splatting demonstrate the superiority of our method in effectively eliminating rain streaks and rendering clean images, achieving state-of-the-art performance. The constructed high-quality dataset and source code will be publicly available.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.08772 [pdf, other]

Nonlinear Wave-Spin Interactions in Nitrogen-Vacancy Centers

Authors: Zhongqiang Hu, Qiuyuan Wang, Chung-Tao Chou, Justin T. Hou, Zhiping He, Luqiao Liu

Abstract: Nonlinear phenomena represent one of the central topics in the study of wave-matter interactions and constitute the key building blocks for various applications in optical communication, computing, sensing, and imaging. In this work, we show that by employing the interactions between microwave photons and electron spins of nitrogen-vacancy (NV) centers, one can realize a variety of nonlinear effects, ranging from the resonance at the sum or difference frequency of two or more waves to electromagnetically induced transparency from the interference between spin transitions. We further verify the phase coherence through two-photon Rabi-oscillation measurements. The highly sensitive, optically detected NV-center dynamics not only provide a platform for studying magnetically induced nonlinearities but also promise novel functionalities in quantum control and quantum sensing.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 15 pages and 10 figures

arXiv:2404.07269 [pdf, other]

Comparing Compressed and Full-modeling Analyses with FOLPS: Implications for DESI 2024 and beyond

Authors: H. E. Noriega, A. Aviles, H. Gil-Marín, S. Ramirez-Solano, S. Fromenteau, M. Vargas-Magaña, J. Aguilar, S. Ahlen, O. Alves, S. Brieden, D. Brooks, J. L. Cervantes-Cota, S. Chen, T. Claybaugh, S. Cole, K. Dawson, A. de la Macorra, A. de Mattia, P. Doel, N. Findlay, J. E. Forero-Romero, E. Gaztañaga, S. Gontcho A Gontcho, K. Honscheid, J. Hou, et al. (29 additional authors not shown)

Abstract: The Dark Energy Spectroscopic Instrument (DESI) will provide unprecedented information about the large-scale structure of our Universe. In this work, we study the robustness of the theoretical modelling of the power spectrum of FOLPS, a novel effective field theory-based package for evaluating the redshift space power spectrum in the presence of massive neutrinos. We perform this validation by fitting the AbacusSummit high-accuracy $N$-body simulations for Luminous Red Galaxies, Emission Line Galaxies and Quasar tracers, calibrated to describe DESI observations. We quantify the potential systematic error budget of FOLPS, finding that the modelling errors are fully sub-dominant for the DESI statistical precision within the studied range of scales. Additionally, we study two complementary approaches to fit and analyse the power spectrum data, one based on direct Full-Modelling fits and the other on the ShapeFit compression variables, both resulting in very good agreement in precision and accuracy. In each of these approaches, we study a set of potential systematic errors induced by several assumptions, such as the choice of template cosmology, the effect of prior choice in the nuisance parameters of the model, or the range of scales used in the analysis. Furthermore, we show how opening up the parameter space beyond the vanilla $\Lambda$CDM model affects the DESI observables. These studies include the addition of massive neutrinos, spatial curvature, and dark energy equation of state. We also examine how relaxing the usual Cosmic Microwave Background and Big Bang Nucleosynthesis priors on the primordial spectral index and the baryonic matter abundance, respectively, impacts the inference on the rest of the parameters of interest. This paper paves the way towards performing a robust and reliable analysis of the shape of the power spectrum of DESI galaxy and quasar clustering using FOLPS.

Submitted 13 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: Supporting publication of DESI 2024 VII: Cosmological constraints from full-shape analyses of the two-point clustering statistics measurements, in preparation (2024)

arXiv:2404.05997 [pdf, other]

Concept-Attention Whitening for Interpretable Skin Lesion Diagnosis

Authors: Junlin Hou, Jilan Xu, Hao Chen

Abstract: The black-box nature of deep learning models has raised concerns about their interpretability for successful deployment in real-world clinical applications. To address these concerns, eXplainable Artificial Intelligence (XAI) aims to provide clear and understandable explanations of the decision-making process. In the medical domain, concepts such as attributes of lesions or abnormalities serve as key evidence for deriving diagnostic results. However, existing concept-based models mainly depend on concepts that appear independently and require fine-grained concept annotations such as bounding boxes. A medical image usually contains multiple concepts, and fine-grained concept annotations are difficult to acquire. In this paper, we propose a novel Concept-Attention Whitening (CAW) framework for interpretable skin lesion diagnosis. CAW comprises a disease diagnosis branch and a concept alignment branch. In the former branch, we train the CNN with a CAW layer inserted to perform skin lesion diagnosis. The CAW layer decorrelates features and aligns image features to conceptual meanings via an orthogonal matrix. In the latter branch, we calculate the orthogonal matrix under the guidance of the concept attention mask. In particular, we introduce a weakly-supervised concept mask generator that leverages only coarse concept labels to filter local regions that are relevant to certain concepts, improving the optimization of the orthogonal matrix. Extensive experiments on two public skin lesion diagnosis datasets demonstrated that CAW not only enhanced interpretability but also maintained state-of-the-art diagnostic performance.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05169 [pdf, other]

QMix: Quality-aware Learning with Mixed Noise for Robust Retinal Disease Diagnosis

Authors: Junlin Hou, Jilan Xu, Rui Feng, Hao Chen

Abstract: Due to the complexity of medical image acquisition and the difficulty of annotation, medical image datasets inevitably contain noise. Noisy data with wrong labels affects the robustness and generalization ability of deep neural networks. Previous noise learning methods mainly considered noise arising from images being mislabeled, i.e. label noise, assuming that all mislabeled images are of high image quality. However, medical images are prone to suffering extreme quality issues, i.e. data noise, where discriminative visual features are missing for disease diagnosis. In this paper, we propose a noise learning framework, termed QMix, that learns a robust disease diagnosis model under mixed noise. QMix alternates between sample separation and quality-aware semi-supervised training in each training epoch. In the sample separation phase, we design a joint uncertainty-loss criterion to effectively separate (1) correctly labeled images; (2) mislabeled images with high quality and (3) mislabeled images with low quality. In the semi-supervised training phase, we train a disease diagnosis model to learn robust feature representation from the separated samples. Specifically, we devise a sample-reweighing loss to mitigate the effect of mislabeled images with low quality during training. Meanwhile, a contrastive enhancement loss is proposed to further distinguish mislabeled images with low quality from correctly labeled images. QMix achieved state-of-the-art disease diagnosis performance on five public retinal image datasets and exhibited substantial improvement in robustness against mixed noise.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04228 [pdf, other]

SimBIG: Cosmological Constraints using Simulation-Based Inference of Galaxy Clustering with Marked Power Spectra

Authors: Elena Massara, ChangHoon Hahn, Michael Eickenberg, Shirley Ho, Jiamin Hou, Pablo Lemos, Chirag Modi, Azadeh Moradinezhad Dizgah, Liam Parker, Bruno Régaldo-Saint Blancard

Abstract: We present the first $\Lambda$CDM cosmological analysis performed on a galaxy survey using marked power spectra. The marked power spectrum is the two-point function of a marked field, where galaxies are weighted by a function that depends on their local density. The presence of the mark leads these statistics to contain higher-order information of the original galaxy field, making them a good candidate to exploit the non-Gaussian information of a galaxy catalog. In this work we make use of SimBIG, a forward modeling framework for galaxy clustering analyses, and perform simulation-based inference using normalizing flows to infer the posterior distribution of the $\Lambda$CDM cosmological parameters. We consider different mark configurations (ways to weight the galaxy field) and deploy them in the SimBIG pipeline to analyze the corresponding marked power spectra measured from a subset of the BOSS galaxy sample. We analyze the redshift-space marked power spectra decomposed in $\ell = 0, 2, 4$ multipoles and include scales up to the non-linear regime. Among the various mark configurations considered, the ones that give the most stringent cosmological constraints produce posterior median and $68\%$ confidence limits on the growth of structure parameters equal to $\Omega_m=0.273^{+0.040}_{-0.030}$ and $\sigma_8=0.777^{+0.077}_{-0.071}$. Compared to a perturbation theory analysis using the power spectrum of the same dataset, the SimBIG marked power spectra constraints on $\sigma_8$ are up to $1.2\times$ tighter, while no improvement is seen for the other cosmological parameters.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 15 pages, 6 figures

arXiv:2404.00548 [pdf, other]

Modeling State Shifting via Local-Global Distillation for Event-Frame Gaze Tracking

Authors: Jiading Li, Zhiyu Zhu, Jinhui Hou, Junhui Hou, Jinjian Wu

Abstract: This paper tackles the problem of passive gaze estimation using both event and frame data. Considering the inherently different physiological structures, it is intractable to accurately estimate gaze purely based on a given state. Thus, we reformulate gaze estimation as the quantification of the state shifting from the current state to several prior registered anchor states. Specifically, we propose a two-stage learning-based gaze estimation framework that divides the whole gaze estimation process into a coarse-to-fine approach involving anchor state selection and final gaze location. Moreover, to improve the generalization ability, instead of learning a large gaze estimation network directly, we align a group of local experts with a student network, where a novel denoising distillation algorithm is introduced that uses denoising diffusion techniques to iteratively remove inherent noise in event data. Extensive experiments demonstrate the effectiveness of the proposed method, which surpasses state-of-the-art methods by a large margin of 15%. The code will be publicly available at https://github.com/jdjdli/Denoise_distill_EF_gazetracker.

Submitted 28 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.18548 [pdf, other]

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

Authors: Xiaofeng Cong, Jie Gui, Jing Zhang, Junming Hou, Hao Shen

Abstract: Existing research based on deep learning has extensively explored the problem of daytime image dehazing. However, few studies have considered the characteristics of nighttime hazy scenes. There are two distinctions between nighttime and daytime haze. First, there may be multiple active colored light sources with lower illumination intensity in nighttime scenes, which may cause haze, glow and noise with localized, coupled and frequency-inconsistent characteristics. Second, due to the domain discrepancy between simulated and real-world data, unrealistic brightness may occur when applying a dehazing model trained on simulated data to real-world data. To address the above two issues, we propose a semi-supervised model for real-world nighttime dehazing. First, spatial attention and frequency spectrum filtering are implemented as a spatial-frequency domain information interaction module to handle the first issue. Second, a pseudo-label-based retraining strategy and a local window-based brightness loss are designed for the semi-supervised training process to suppress haze and glow while achieving realistic brightness. Experiments on public benchmarks validate the effectiveness of the proposed method and its superiority over state-of-the-art methods. The source code and Supplementary Materials are available at https://github.com/Xiaofeng-life/SFSNiD.

Submitted 27 March, 2024; originally announced March 2024.

Comments: This paper is accepted by CVPR2024

arXiv:2403.16649 [pdf, other]

CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment

Authors: Feiteng Fang, Liang Zhu, Min Yang, Xi Feng, Jinchang Hou, Qixuan Zhao, Chengming Li, Xiping Hu, Ruifeng Xu

Abstract: Reinforcement learning from human feedback (RLHF) is a crucial technique in aligning large language models (LLMs) with human preferences, ensuring these LLMs behave in beneficial and comprehensible ways to users. However, a longstanding challenge in human alignment techniques based on reinforcement learning lies in their inherent complexity and difficulty in training. To address this challenge, we present a simple yet effective Contrastive Learning Framework for Human Alignment (CLHA) to align LLMs with human preferences directly. CLHA employs a novel rescoring strategy to evaluate the noise within the data by considering its inherent quality and dynamically adjusting the training process. Simultaneously, CLHA utilizes pairwise contrastive loss and adaptive supervised fine-tuning loss to adaptively modify the likelihood of generating responses, ensuring enhanced alignment with human preferences. Using advanced methods, CLHA surpasses other algorithms, showcasing superior performance in terms of reward model scores, automatic evaluations, and human assessments on the widely used "Helpful and Harmless" dataset.

Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15698

SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models

Authors: Mengqi Zhou, Jun Hou, Chuanchen Luo, Yuxi Wang, Zhaoxiang Zhang, Junran Peng

Abstract: Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g. point cloud or radiance field) incompatible with the industrial pipeline, which leads to a substantial gap between academic research and industrial deployment. Procedural Controllable Generation (PCG) is an efficient technique for creating scalable and high-quality assets, but it is unfriendly for ordinary users as it demands profound domain expertise. To address these issues, we resort to using the large language model (LLM) to drive the procedural modeling. In this paper, we introduce a large-scale scene generation framework, SceneX, which can automatically produce high-quality procedural models according to designers' textual descriptions. Specifically, the proposed method comprises two components, PCGBench and PCGPlanner. The former encompasses an extensive collection of accessible procedural assets and thousands of hand-crafted API documents. The latter aims to generate executable actions for Blender to produce controllable and precise 3D assets guided by the user's instructions. Our SceneX can generate a city spanning 2.5 km × 2.5 km with delicate layout and geometric structures, drastically reducing the time cost from several weeks for professional PCG engineers to just a few hours for an ordinary user. Extensive experiments demonstrated the capability of our method in controllable large-scale scene generation and editing, including asset placement and season translation.

Submitted 22 March, 2024; originally announced March 2024.

Showing 1–50 of 756 results for author: Hou, J