-
Introducing Diminutive Causal Structure into Graph Representation Learning
Authors:
Hang Gao,
Peng Qiao,
Yifan Jin,
Fengge Wu,
Jiangmeng Li,
Changwen Zheng
Abstract:
When engaging in end-to-end graph representation learning with Graph Neural Networks (GNNs), the intricate causal relationships and rules inherent in graph data make it difficult for the model to capture authentic data relationships accurately. One proposed mitigation is to integrate the rules or relationships corresponding to the graph data directly into the model. However, in graph representation learning, the inherent complexity of graph data prevents the derivation of a comprehensive causal structure that encapsulates universal rules or relationships governing the entire dataset. Instead, only specialized diminutive causal structures, which delineate specific causal relationships within constrained subsets of graph data, are discernible. Empirically, we observe that GNN models tend to converge toward such specialized causal structures during training. Consequently, we posit that introducing these specific causal structures is advantageous for training GNN models. Building on this proposition, we introduce a novel method that enables GNN models to learn from these specialized diminutive causal structures, thereby enhancing overall performance. Specifically, our method extracts causal knowledge from the model representation of these diminutive causal structures and incorporates interchange intervention to optimize the learning process. Theoretical analysis corroborates the efficacy of the proposed method. Furthermore, empirical experiments consistently demonstrate significant performance improvements across diverse datasets.
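Interchange intervention, which the abstract says the method incorporates, is generally described as running a model on a base input while overwriting part of an internal representation with the value it takes on a source input. The toy sketch below shows only that generic operation; the two-stage model, the `intervene_dims` argument, and every function name are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def encoder(x: Sequence[float]) -> Vector:
    """Hypothetical first stage: maps an input to a 2-d intermediate representation."""
    return [x[0] + x[1], x[1] * x[2]]


def head(h: Sequence[float]) -> float:
    """Hypothetical second stage: reads the prediction off the representation."""
    return 2.0 * h[0] - h[1]


def interchange_intervention(base: Sequence[float], source: Sequence[float],
                             intervene_dims: Sequence[int]) -> float:
    """Run `base` through the toy model, but copy the listed representation
    dimensions from the run on `source` before applying the head."""
    h_base = encoder(base)
    h_source = encoder(source)
    h_mixed = list(h_base)
    for d in intervene_dims:
        h_mixed[d] = h_source[d]
    return head(h_mixed)


base = [1.0, 2.0, 3.0]
source = [0.0, 1.0, 5.0]
print(head(encoder(base)))                          # 0.0: plain forward pass on base
print(interchange_intervention(base, source, [1]))  # 1.0: dim 1 taken from source
```

If a hypothesized causal variable is faithfully encoded in the swapped dimensions, the intervened output should match what that causal structure predicts for the counterfactual input; comparing the two is what makes such interventions useful as a training signal.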
Submitted 12 June, 2024;
originally announced June 2024.
-
VoxNeuS: Enhancing Voxel-Based Neural Surface Reconstruction via Gradient Interpolation
Authors:
Sidun Liu,
Peng Qiao,
Zongxin Ye,
Wenyu Li,
Yong Dou
Abstract:
Neural Surface Reconstruction learns a Signed Distance Field (SDF) to reconstruct the 3D model from multi-view images. Previous works adopt voxel-based explicit representations to improve efficiency. However, they overlook the gradient instability of interpolation in the voxel grid, which degrades convergence and smoothness. In addition, previous works entangle the optimization of geometry and radiance, so the geometry deforms to explain radiance, causing artifacts when reconstructing textured planes.
In this work, we reveal that the gradient instability comes from its discontinuity during trilinear interpolation, and we propose to use the interpolated gradient instead of the original analytical gradient to eliminate the discontinuity. Based on gradient interpolation, we propose VoxNeuS, a lightweight method for computationally and memory-efficient neural surface reconstruction. Thanks to the explicit representation, the gradients of the regularization terms, i.e., the Eikonal and curvature losses, are solved directly, avoiding computation and memory-access overhead.
Further, VoxNeuS adopts a geometry-radiance disentangled architecture to handle the geometry deformation from radiance optimization.
The experimental results show that VoxNeuS achieves better reconstruction quality than previous works. The entire training process takes 15 minutes and less than 3 GB of memory on a single 2080 Ti GPU.
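A one-dimensional analogue may help with the discontinuity argument. With linear interpolation (the 1-D counterpart of trilinear interpolation), the analytical derivative of the interpolant is constant inside each cell and jumps at grid nodes, whereas interpolating per-node gradients yields a continuous gradient field. This sketch assumes unit grid spacing and finite-difference node gradients; it illustrates the idea and is not VoxNeuS's implementation.

```python
import math
from typing import List


def interp(values: List[float], x: float) -> float:
    """Linear interpolation of grid values at position x (unit spacing assumed)."""
    i = min(int(math.floor(x)), len(values) - 2)
    t = x - i
    return (1 - t) * values[i] + t * values[i + 1]


def analytical_grad(values: List[float], x: float) -> float:
    """Derivative of the interpolant itself: piecewise constant per cell."""
    i = min(int(math.floor(x)), len(values) - 2)
    return values[i + 1] - values[i]


def node_grads(values: List[float]) -> List[float]:
    """Finite-difference gradient at each node (central, one-sided at the ends)."""
    n = len(values)
    g = [0.0] * n
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        g[i] = (values[hi] - values[lo]) / (hi - lo)
    return g


def interpolated_grad(values: List[float], x: float) -> float:
    """Gradient obtained by interpolating node gradients: continuous in x."""
    return interp(node_grads(values), x)


sdf = [0.0, 1.0, 4.0, 9.0]  # toy samples of x**2 on nodes 0..3
eps = 1e-6
# The analytical gradient jumps across node x = 1 (from 1.0 to 3.0) ...
print(analytical_grad(sdf, 1 - eps), analytical_grad(sdf, 1 + eps))
# ... while the interpolated gradient is continuous there (both close to 2.0).
print(interpolated_grad(sdf, 1 - eps), interpolated_grad(sdf, 1 + eps))
```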
Submitted 11 June, 2024;
originally announced June 2024.
-
DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid
Authors:
Sidun Liu,
Peng Qiao,
Zongxin Ye,
Wenyu Li,
Yong Dou
Abstract:
Neural Radiance Field (NeRF) achieves extremely high quality in object-scale and indoor scene reconstruction. However, reconstructing large-scale scenes remains challenging. MLP-based NeRFs suffer from limited network capacity, while volume-based NeRFs consume large amounts of memory as the scene resolution increases. Recent approaches geographically partition the scene and learn each sub-region with an individual NeRF. Such partitioning strategies help volume-based NeRFs exceed the single-GPU memory limit and scale to larger scenes. However, this approach requires multiple background NeRFs to handle out-of-partition rays, which leads to redundant learning. Inspired by the fact that the background of the current partition is the foreground of an adjacent partition, we propose a scalable scene reconstruction method based on joint Multi-resolution Hash Grids, named DistGrid. In this method, the scene is divided into multiple closely tiled yet non-overlapping Axis-Aligned Bounding Boxes, and a novel segmented volume rendering method is proposed to handle cross-boundary rays, thereby eliminating the need for background NeRFs. The experiments demonstrate that our method outperforms existing methods on all evaluated large-scale scenes and provides visually plausible scene reconstruction. The scalability of our method with respect to reconstruction quality is further evaluated qualitatively and quantitatively.
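Why splitting a ray across partitions need not lose anything can be checked on the standard discrete volume-rendering sum: if a ray's samples are cut into consecutive segments (for example, one per bounding box), rendering each segment separately and then compositing the segment results front to back with accumulated transmittance reproduces the full-ray color. The sketch below uses scalar colors and made-up sample values; it demonstrates this identity only and does not claim to be DistGrid's exact segmented renderer.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (alpha, color) for one sample along the ray


def render(samples: List[Sample]) -> Tuple[float, float]:
    """Front-to-back alpha compositing; returns (color, remaining transmittance)."""
    color, trans = 0.0, 1.0
    for alpha, c in samples:
        color += trans * alpha * c
        trans *= 1.0 - alpha
    return color, trans


def render_segmented(segments: List[List[Sample]]) -> float:
    """Render each segment independently, then composite the segment results."""
    color, trans = 0.0, 1.0
    for seg in segments:
        seg_color, seg_trans = render(seg)
        color += trans * seg_color  # this segment's light, attenuated by earlier segments
        trans *= seg_trans
    return color


ray = [(0.2, 1.0), (0.5, 0.3), (0.1, 0.8), (0.7, 0.6), (0.4, 0.9)]  # made-up samples
full, _ = render(ray)
split = render_segmented([ray[:2], ray[2:4], ray[4:]])  # e.g., three partitions
print(abs(full - split) < 1e-12)  # True
```

Because each segment only needs its own color and transmittance, partitions can in principle be rendered independently and combined afterwards, which is the property that makes a distributed layout attractive.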
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
GraCo: Granularity-Controllable Interactive Segmentation
Authors:
Yian Zhao,
Kehan Li,
Zesen Cheng,
Pengchong Qiao,
Xiawu Zheng,
Rongrong Ji,
Chang Liu,
Li Yuan,
Jie Chen
Abstract:
Interactive Segmentation (IS) segments specific objects or parts in the image according to user input. Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter aims to alleviate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant results. In this work, we introduce Granularity-Controllable Interactive Segmentation (GraCo), a novel approach that allows precise control of prediction granularity by introducing additional parameters to the input. This enhances the customization of the interactive system and eliminates redundancy while resolving ambiguity. Nevertheless, the exorbitant cost of annotating multi-granularity masks and the lack of available datasets with granularity annotations make it difficult for models to acquire the guidance needed to control output granularity. To address this problem, we design an any-granularity mask generator that exploits the semantic property of the pre-trained IS model to automatically generate abundant mask-granularity pairs without requiring additional manual annotation. Based on these pairs, we propose a granularity-controllable learning strategy that efficiently imparts granularity controllability to the IS model. Extensive experiments on intricate scenarios at the object and part levels demonstrate that GraCo has significant advantages over previous methods. This highlights the potential of GraCo to be a flexible annotation tool, capable of adapting to diverse segmentation scenarios. Project page: https://zhao-yian.github.io/GraCo.
Submitted 16 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
AbsGS: Recovering Fine Details for 3D Gaussian Splatting
Authors:
Zongxin Ye,
Wenyu Li,
Sidun Liu,
Peng Qiao,
Yong Dou
Abstract:
The 3D Gaussian Splatting (3D-GS) technique couples 3D Gaussian primitives with differentiable rasterization to achieve high-quality novel view synthesis while providing advanced real-time rendering performance. However, due to a flaw in its adaptive density control strategy, 3D-GS frequently suffers from over-reconstruction in intricate scenes containing high-frequency details, leading to blurry rendered images. The underlying reason for this flaw remains under-explored. In this work, we present a comprehensive analysis of the cause of the aforementioned artifacts, namely gradient collision, which prevents large Gaussians in over-reconstructed regions from splitting. To address this issue, we propose the novel homodirectional view-space positional gradient as the criterion for densification. Our strategy efficiently identifies large Gaussians in over-reconstructed regions and recovers fine details by splitting them. We evaluate our proposed method on various challenging datasets. The experimental results indicate that our approach achieves the best rendering quality with reduced or similar memory consumption. Our method is easy to implement and can be incorporated into a wide variety of recent Gaussian Splatting-based methods. We will open-source our code upon formal publication. Our project page is available at: https://ty424.github.io/AbsGS.github.io/
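Gradient collision can be illustrated with a handful of per-pixel gradients on one Gaussian's projected center. If the densification criterion sums signed per-pixel gradients before taking the norm, opposing gradients cancel and a large Gaussian is never flagged. Summing absolute components keeps them from cancelling. This sketch assumes that "homodirectional" means summing the absolute value of each gradient component, and the threshold is a hypothetical value; treat it as an illustration, not the AbsGS code.

```python
import math
from typing import List, Tuple

Grad2D = Tuple[float, float]


def standard_grad_norm(pixel_grads: List[Grad2D]) -> float:
    """Norm of the summed per-pixel gradients (signed components can cancel)."""
    gx = sum(g[0] for g in pixel_grads)
    gy = sum(g[1] for g in pixel_grads)
    return math.hypot(gx, gy)


def homodirectional_grad_norm(pixel_grads: List[Grad2D]) -> float:
    """Assumed homodirectional variant: sum absolute components, so opposing
    gradients accumulate instead of cancelling."""
    gx = sum(abs(g[0]) for g in pixel_grads)
    gy = sum(abs(g[1]) for g in pixel_grads)
    return math.hypot(gx, gy)


# A large Gaussian covering pixels that pull its center in opposite directions.
grads = [(0.3, 0.0), (-0.3, 0.0), (0.0, 0.2), (0.0, -0.2)]
threshold = 0.1  # hypothetical densification threshold
print(standard_grad_norm(grads) > threshold)         # False: gradients collide
print(homodirectional_grad_norm(grads) > threshold)  # True: flagged for splitting
```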
Submitted 16 April, 2024;
originally announced April 2024.
-
Graph Partial Label Learning with Potential Cause Discovering
Authors:
Hang Gao,
Jiaguo Yuan,
Jiangmeng Li,
Peng Qiao,
Fengge Wu,
Changwen Zheng,
Huaping Liu
Abstract:
Graph Neural Networks (GNNs) have garnered widespread attention for their potential to address the challenges of graph representation learning, which involves complex graph-structured data across various domains. However, due to the inherent complexity and interconnectedness of graphs, accurately annotating graph data for training GNNs is extremely challenging. To address this issue, we introduce Partial Label Learning (PLL) into graph representation learning. PLL is a critical weakly supervised learning problem in which each training instance is associated with a set of candidate labels, including the ground-truth label and additional interfering labels. PLL allows annotators to make errors, which reduces the difficulty of data labeling. We then propose a novel graph representation learning method that enables GNN models to effectively learn discriminative information in the PLL setting. Our approach uses potential cause extraction to obtain graph data that hold causal relationships with the labels. By conducting auxiliary training on the extracted graph data, our model can effectively eliminate the interfering information in the PLL scenario. We support the rationale behind our method with a series of theoretical analyses. Moreover, we conduct extensive evaluations and ablation studies on multiple datasets, demonstrating the superiority of our proposed method.
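To make the PLL setting concrete, here is one widely used candidate-set loss: the negative log of the total probability the model assigns to the candidate set. It is low whenever some candidate receives high probability, so the annotator never has to single out the true label. This is a generic PLL baseline written for illustration, not the potential-cause-extraction method proposed in the paper, and the logits are made up.

```python
import math
from typing import List, Set


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_set_loss(logits: List[float], candidates: Set[int]) -> float:
    """Negative log of the probability mass assigned to the candidate set."""
    probs = softmax(logits)
    return -math.log(sum(probs[k] for k in candidates))


logits = [2.0, 0.5, -1.0, 0.0]  # scores for 4 classes from a hypothetical GNN
candidates = {0, 3}             # ground-truth label plus one interfering label
loss = candidate_set_loss(logits, candidates)
print(loss)
```

With the set containing every class, the loss is zero, which is why PLL methods must add further structure (here, according to the abstract, causal auxiliary training) to disambiguate candidates.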
Submitted 21 May, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Authors:
Pengchong Qiao,
Lei Shang,
Chang Liu,
Baigui Sun,
Xiangyang Ji,
Jie Chen
Abstract:
Subject-driven generation has garnered significant interest recently due to its ability to personalize text-to-image generation. Typical works focus on learning the new subject's private attributes. However, an important fact has been overlooked: a subject is not an isolated new concept but should be a specialization of a certain category in the pre-trained model. As a result, the subject fails to comprehensively inherit the attributes of its category, causing poor attribute-related generations. In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category. This modeling enables the subject to inherit public attributes from its category while learning its private attributes from the user-provided example. Specifically, we propose a plug-and-play method, Subject-Derived regularization (SuDe). It constructs the base-derived class modeling by constraining the subject-driven generated images to semantically belong to the subject's category. Extensive experiments under three baselines and two backbones on various subjects show that SuDe enables imaginative attribute-related generations while maintaining subject fidelity. Code will be open-sourced soon at FaceChain (https://github.com/modelscope/facechain).
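The object-oriented analogy can be made concrete with a toy Python class hierarchy. The class and attribute names are purely illustrative. In SuDe itself, per the abstract, the "inheritance" is enforced by a regularizer that constrains generated images of the subject to belong semantically to its category, not by code of this form.

```python
class Dog:
    """Base class: the semantic category already known to the pre-trained model."""

    def __init__(self) -> None:
        self.can_run = True   # public attributes shared by the whole category
        self.can_bark = True


class MyDog(Dog):
    """Derived class: the user's subject, learned from a single example image."""

    def __init__(self) -> None:
        super().__init__()            # inherit the category's public attributes
        self.fur_pattern = "spotted"  # private attribute learned from the example


subject = MyDog()
print(subject.can_bark, subject.fur_pattern)  # True spotted
print(isinstance(subject, Dog))               # True
```

The failure mode the paper targets corresponds to a `MyDog` that stores `fur_pattern` but never calls `super().__init__()`: the subject looks right yet cannot be rendered doing what every dog can do.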
Submitted 11 March, 2024;
originally announced March 2024.
-
TFDMNet: A Novel Network Structure Combines the Time Domain and Frequency Domain Features
Authors:
Hengyue Pan,
Yixin Chen,
Zhiliang Tian,
Peng Qiao,
Linbo Qiao,
Dongsheng Li
Abstract:
Convolutional neural networks (CNNs) have achieved impressive success in computer vision during the past few decades. The image convolution operation helps CNNs achieve good performance on image-related tasks. However, it also has high computational complexity and is hard to parallelize. This paper proposes a novel Element-wise Multiplication Layer (EML) to replace convolution layers, which can be trained in the frequency domain. Theoretical analyses show that EMLs lower the computational complexity and are easier to parallelize. Moreover, we introduce a Weight Fixation mechanism to alleviate over-fitting, and analyze the working behavior of Batch Normalization and Dropout in the frequency domain. To balance computational complexity and memory usage, we propose a new network structure, namely the Time-Frequency Domain Mixture Network (TFDMNet), which combines the advantages of both convolution layers and EMLs. Experimental results imply that TFDMNet achieves good performance on the MNIST, CIFAR-10 and ImageNet datasets with fewer operations than the corresponding CNNs.
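The reason an element-wise multiplication can stand in for convolution is the convolution theorem: the DFT of a circular convolution equals the element-wise product of the DFTs. The sketch below checks this on a tiny 1-D signal in plain Python. A naive O(n²) DFT is used only for readability (an FFT would be used in practice), and how EMLs parameterize and train the frequency-domain weights is not shown.

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: List[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_conv(x: List[float], w: List[float]) -> List[float]:
    """Direct time-domain circular convolution."""
    n = len(x)
    return [sum(x[(t - s) % n] * w[s] for s in range(n)) for t in range(n)]


def elementwise_freq_conv(x: List[float], w: List[float]) -> List[float]:
    """Same result via an element-wise product in the frequency domain."""
    X, W = dft(x), dft(w)
    return [v.real for v in idft([a * b for a, b in zip(X, W)])]


x = [1.0, 2.0, 0.0, -1.0]
w = [0.5, 0.25, 0.0, 0.0]  # a short kernel zero-padded to the signal length
print(circular_conv(x, w))  # [0.25, 1.25, 0.5, -0.5]
print([round(v, 10) for v in elementwise_freq_conv(x, w)])
```

Note that the frequency-domain product implements circular convolution; matching a standard zero-padded image convolution requires padding the inputs accordingly.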
Submitted 29 January, 2024;
originally announced January 2024.
-
Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning
Authors:
Zixuan Xie,
Rengan Xie,
Rong Li,
Kai Huang,
Pengju Qiao,
Jingsen Zhu,
Xu Yin,
Qi Ye,
Wei Hua,
Yuchi Huo,
Hujun Bao
Abstract:
In this work, we use multi-view aerial images to reconstruct the geometry, lighting, and material of facades using neural signed distance fields (SDFs). Without requiring complex equipment, our method takes only simple RGB images captured by a drone as input to enable physically based and photorealistic novel-view rendering, relighting, and editing. However, a real-world facade usually has complex appearances ranging from diffuse rocks with subtle details to large-area glass windows with specular reflections, making it hard to attend to everything. As a result, previous methods can preserve geometric details but fail to reconstruct smooth glass windows, or vice versa. To address this challenge, we introduce three spatially and semantically adaptive optimization strategies: a semantic regularization approach based on zero-shot segmentation techniques to improve material consistency, a frequency-aware geometry regularization to balance surface smoothness and details across different surfaces, and a visibility probe-based scheme to enable efficient modeling of local lighting in large-scale outdoor environments. In addition, we capture a real-world facade aerial 3D scanning image set and corresponding point clouds for training and benchmarking. Experiments demonstrate the superior quality of our method on facade holistic inverse rendering, novel view synthesis, and scene editing compared to state-of-the-art baselines.
Submitted 8 April, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz
Authors:
Tao Sun,
Congliang Chen,
Peng Qiao,
Li Shen,
Xinwang Liu,
Dongsheng Li
Abstract:
Sign-based stochastic methods have gained attention due to their ability to achieve robust performance despite using only the sign information for parameter updates. However, the current convergence analysis of sign-based methods relies on the strong assumptions of first-order gradient Lipschitz and second-order gradient Lipschitz continuity, which may not hold in practical tasks, such as deep neural network training, that are highly non-smooth. In this paper, we revisit sign-based methods and analyze their convergence under more realistic assumptions of first- and second-order smoothness. We first establish the convergence of the sign-based method under a weak first-order Lipschitz condition. Motivated by this weak first-order Lipschitz condition, we propose a relaxed second-order condition that still allows for nonconvex acceleration in sign-based methods. Based on our theoretical results, we gain insights into the computational advantages of the recently developed LION algorithm. In distributed settings, we prove that this nonconvex acceleration persists with linear speedup in the number of nodes when fast communication-compression gossip protocols are used. The novelty of our theoretical results lies in their being derived under much weaker assumptions, thereby expanding the provable applicability of sign-based algorithms to a wider range of problems.
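For reference, the LION update mentioned above can be written in a few lines: the step direction is the sign of an interpolation between momentum and the current gradient, and momentum is then updated with a second coefficient. The sketch follows the update rule as originally published for Lion to the best of our understanding; the hyperparameter defaults are illustrative and nothing here reflects the convergence analysis in this paper. Plain signSGD is the special case `theta -= lr * sign(grad)`.

```python
from typing import List


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def lion_step(theta: List[float], m: List[float], grad: List[float],
              lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.99,
              weight_decay: float = 0.0) -> None:
    """One Lion update in place; only the sign of the interpolated momentum is used."""
    for i, g in enumerate(grad):
        c = beta1 * m[i] + (1 - beta1) * g              # interpolation used for the step
        theta[i] -= lr * (sign(c) + weight_decay * theta[i])
        m[i] = beta2 * m[i] + (1 - beta2) * g            # momentum tracked with beta2


theta, m = [1.0, -2.0, 0.5], [0.0, 0.0, 0.0]
# Gradients of very different magnitudes still produce steps of identical size.
lion_step(theta, m, [0.3, -40.0, 1e-6], lr=0.1)
print(theta)  # approximately [0.9, -1.9, 0.4]
```

The magnitude invariance visible here is exactly why standard smoothness-based analyses fit sign methods poorly and why weaker assumptions are of interest.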
Submitted 23 October, 2023;
originally announced October 2023.
-
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
Authors:
Yu Wang,
Pengchong Qiao,
Chang Liu,
Guoli Song,
Xiawu Zheng,
Jie Chen
Abstract:
Recent advances in robust semi-supervised learning (SSL) typically filter out-of-distribution (OOD) information at the sample level. We argue that an overlooked problem of robust SSL is corrupted information at the semantic level, which practically limits the development of the field. In this paper, we take an initial step toward exploring this problem and propose a unified framework termed OOD Semantic Pruning (OSP), which aims to prune OOD semantics out of in-distribution (ID) features. Specifically, (i) we propose an aliasing OOD matching module to pair each ID sample with an OOD sample with semantic overlap. (ii) We design a soft orthogonality regularization, which first transforms each ID feature by suppressing its semantic component that is collinear with the paired OOD sample. It then forces the predictions before and after soft orthogonality decomposition to be consistent. Although simple in practice, our method shows strong performance in OOD detection and ID classification on challenging benchmarks. In particular, OSP surpasses the previous state of the art by 13.7% in accuracy for ID classification and 5.9% in AUROC for OOD detection on the TinyImageNet dataset. The source code is publicly available at https://github.com/rain305f/OSP.
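The "suppress the component collinear with the paired OOD sample" step is, at its core, a vector projection. The sketch below removes a fraction of the ID feature's projection onto the OOD feature; the `strength` parameter standing in for the "soft" part is our assumption, and the paper's consistency constraint between predictions before and after the decomposition is not shown.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def suppress_collinear(f_id: List[float], f_ood: List[float],
                       strength: float = 1.0) -> List[float]:
    """Remove a fraction `strength` (an assumed knob) of the component of
    f_id that lies along f_ood."""
    coef = strength * dot(f_id, f_ood) / dot(f_ood, f_ood)
    return [a - coef * b for a, b in zip(f_id, f_ood)]


f_id = [3.0, 1.0, 2.0]
f_ood = [1.0, 1.0, 0.0]  # paired OOD feature with overlapping semantics
hard = suppress_collinear(f_id, f_ood)       # exactly orthogonal to f_ood
soft = suppress_collinear(f_id, f_ood, 0.5)  # only partially suppressed
print(dot(hard, f_ood), dot(soft, f_ood))    # 0.0 2.0
```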
Submitted 29 May, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
PVP: Pre-trained Visual Parameter-Efficient Tuning
Authors:
Zhao Song,
Ke Yang,
Naiyang Guan,
Junjie Zhu,
Peng Qiao,
Qingyong Hu
Abstract:
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks. However, it is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs. Recently, Parameter-Efficient Tuning (PETuning) techniques, e.g., Visual Prompt Tuning (VPT) and Low-Rank Adaptation (LoRA), have significantly reduced the computation and storage cost by inserting lightweight prompt modules into the pre-trained models and tuning these prompt modules with a small number of trainable parameters, while keeping the transformer backbone frozen. Although only a few parameters need to be adjusted, most PETuning methods still require a significant amount of downstream task training data to achieve good results. Their performance is inadequate in low-data regimes, especially when there are only one or two examples per class. To this end, we first empirically identify that the poor performance is mainly due to the inappropriate initialization of prompt modules, which has also been verified in pre-trained language models. Next, we propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules along with the pre-trained transformer backbone to perform parameter-efficient tuning on downstream tasks. Experimental results on five Fine-Grained Visual Classification (FGVC) and VTAB-1k datasets demonstrate that our proposed method significantly outperforms state-of-the-art PETuning methods.
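Since PVP attributes poor few-shot performance to how tuning modules are initialized, it may help to see how one of the cited modules, LoRA, is commonly set up: a frozen weight plus a trainable low-rank update `B @ A`, with `B` initialized to zero so that training starts from the frozen model's output. The plain-Python sketch below follows that common recipe; the shapes, rank, 0.02 standard deviation, and `scale` factor are illustrative assumptions. PVP's proposal, per the abstract, is to start from pre-trained modules instead of such from-scratch values.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A of rank `rank`."""

    def __init__(self, weight: Matrix, rank: int, scale: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                                   # frozen backbone weight
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)]  # trainable, random init
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]          # trainable, zero init
        self.scale = scale

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(W, rank=1)
x = [1.0, 2.0, 3.0]
print(layer.forward(x) == matvec(W, x))  # True: zero-initialized B leaves the output unchanged
```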
Submitted 26 April, 2023;
originally announced April 2023.
-
Towards Vision Transformer Unrolling Fixed-Point Algorithm: a Case Study on Image Restoration
Authors:
Peng Qiao,
Sidun Liu,
Tao Sun,
Ke Yang,
Yong Dou
Abstract:
The great success of Deep Neural Networks (DNNs) has inspired the algorithmic development of DNN-based Fixed-Point (DNN-FP) methods for computer vision tasks. DNN-FP methods, trained by Back-Propagation Through Time or by computing an inaccurate inversion of the Jacobian, suffer from inferior representation ability. Motivated by the representation power of the Transformer, we propose a framework, called FPformer, that unrolls the FP and approximates each unrolled process via Transformer blocks. To reduce the high memory and computation consumption, we propose FPRformer, which shares parameters between successive blocks. We further design a module that adapts Anderson acceleration to FPRformer to enlarge the unrolled iterations and improve performance, yielding FPAformer. To fully exploit the capability of the Transformer, we apply the proposed models to image restoration, using self-supervised pre-training and supervised fine-tuning. In the pre-training phase, 161 tasks from 4 categories of image restoration problems are used. Thereafter, the pre-trained FPformer, FPRformer, and FPAformer are further fine-tuned for the comparison scenarios. Using self-supervised pre-training and supervised fine-tuning, the proposed FPformer, FPRformer, and FPAformer achieve performance competitive with state-of-the-art image restoration methods, with better training efficiency. FPAformer employs only 29.82% of the parameters used in SwinIR models and provides superior performance after fine-tuning. Training these comparison models takes only 26.9% of the time used to train SwinIR models. This provides a promising way to introduce the Transformer into low-level vision tasks.
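For readers unfamiliar with Anderson acceleration, which FPAformer adapts to its unrolled iterations, here is the classical numerical version with memory 1 on the scalar fixed-point problem x = cos(x). It mixes the two most recent map evaluations using a least-squares weight on the residuals; in one dimension this reduces to the secant method. It illustrates the technique only and is unrelated to the learned Transformer module in the paper.

```python
import math
from typing import Callable


def plain_iteration(g: Callable[[float], float], x0: float, steps: int) -> float:
    x = x0
    for _ in range(steps):
        x = g(x)
    return x


def anderson1(g: Callable[[float], float], x0: float, steps: int) -> float:
    """Anderson acceleration with memory 1 on a scalar fixed-point map x = g(x)."""
    x_prev, gx_prev = x0, g(x0)
    x = gx_prev                          # the first step is a plain iteration
    for _ in range(steps - 1):
        gx = g(x)
        f, f_prev = gx - x, gx_prev - x_prev   # residuals of the last two iterates
        df = f - f_prev
        if abs(df) < 1e-15:              # converged or degenerate: stop mixing
            return gx
        gamma = f / df                   # least-squares mixing weight for memory 1
        x_new = gx - gamma * (gx - gx_prev)
        x_prev, gx_prev, x = x, gx, x_new
    return x


root = 0.7390851332151607  # fixed point of cos
print(abs(anderson1(math.cos, 1.0, 10) - root))        # essentially zero
print(abs(plain_iteration(math.cos, 1.0, 10) - root))  # still around 5e-3
```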
Submitted 28 January, 2023;
originally announced January 2023.
-
Nanoparticles Passive Targeting Allows Optical Imaging of Bone Diseases
Authors:
Chao Mi,
Xun Zhang,
Chengyu Yang,
Jianqun Wu,
Xinxin Chen,
Chenguang Ma,
Sitong Wu,
Zhichao Yang,
Pengzhen Qiao,
Yang Liu,
Weijie Wu,
Zhiyong Guo,
Jiayan Liao,
Jiajia Zhou,
Ming Guan,
Chao Liang,
Chao Liu,
Dayong Jin
Abstract:
Skeletal disorders related to bone health are commonly diagnosed by X-ray imaging, but radiation limits its use. Light excitation and optical imaging through the near-infrared-II window (NIR-II, 1000-1700 nm) can penetrate deep tissues without radiation risk, but the targeting of contrast agents is non-specific. Here, we report that lanthanide-doped nanocrystals can be passively transported by endothelial cells and macrophages from the blood vessels into the bone marrow microenvironment. We found that this passive targeting scheme can remain effective for longer than two months. We therefore developed intravital 3D and high-resolution planar imaging instrumentation for bone disease diagnosis. We demonstrated regular monitoring of 1 mm bone defects for over 10 days, with resolution similar to that of X-ray imaging but more flexible use in prognosis. Moreover, the passive targeting can reveal early-onset inflammation at the joints, such as synovitis in the early stage of rheumatoid arthritis. Furthermore, the proposed method is comparable to μCT in recognizing symptoms of osteoarthritis, including mild hyperostosis in the femur (~100 μm thicker than normal) and the growth of millimeter-scale osteophytes in the knee joint, which further demonstrates the power and universality of our approach in diagnosing bone diseases.
Submitted 4 January, 2023;
originally announced January 2023.
-
Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
Authors:
Zesen Cheng,
Pengchong Qiao,
Kehan Li,
Siheng Li,
Pengxu Wei,
Xiangyang Ji,
Li Yuan,
Chang Liu,
Jie Chen
Abstract:
Weakly supervised semantic segmentation is typically inspired by class activation maps, which serve as pseudo masks with class-discriminative regions highlighted. Although tremendous efforts have been made to recall precise and complete locations for each class, existing methods still commonly suffer from unsolicited Out-of-Candidate (OC) error predictions that do not belong to the label candidates. Such errors should be avoidable, since their contradiction with image-level class tags is easy to detect. In this paper, we develop a group ranking-based Out-of-Candidate Rectification (OCR) mechanism in a plug-and-play fashion. First, we adaptively split the semantic categories into In-Candidate (IC) and OC groups for each OC pixel according to their prior annotation correlation and posterior prediction correlation. Then, we derive a differentiable rectification loss to force OC pixels to shift to the IC group. By incorporating OCR into seminal baselines (e.g., AffinityNet, SEAM, MCTformer), we achieve remarkable performance gains on both the Pascal VOC (+3.2%, +3.3%, +0.8% mIoU) and MS COCO (+1.0%, +1.3%, +0.5% mIoU) datasets with negligible extra training overhead, which justifies the effectiveness and generality of our OCR.
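Detecting OC errors needs only the image-level tags: a pixel is OC when its top prediction is not one of them. The sketch below implements that check plus a simple margin-ranking penalty that pushes each OC pixel's best IC score above its best OC score. Two simplifications are assumptions on our part: the IC group is taken to be exactly the tag set (the paper splits groups adaptively per pixel using annotation and prediction correlations), and the hinge form of the loss and its margin are hypothetical.

```python
from typing import List, Set


def argmax(v: List[float]) -> int:
    return max(range(len(v)), key=lambda k: v[k])


def oc_pixels(scores: List[List[float]], tags: Set[int]) -> List[int]:
    """Indices of pixels whose top prediction is not among the image-level tags."""
    return [p for p, s in enumerate(scores) if argmax(s) not in tags]


def rectification_loss(scores: List[List[float]], tags: Set[int],
                       margin: float = 0.1) -> float:
    """Hypothetical group-ranking hinge, averaged over OC pixels: the best
    In-Candidate score should exceed the best Out-of-Candidate score by `margin`."""
    bad = oc_pixels(scores, tags)
    if not bad:
        return 0.0
    total = 0.0
    for p in bad:
        s = scores[p]
        best_ic = max(s[k] for k in tags)
        best_oc = max(s[k] for k in range(len(s)) if k not in tags)
        total += max(0.0, margin + best_oc - best_ic)
    return total / len(bad)


# 3 pixels x 4 classes; the image is tagged with classes {0, 2}.
scores = [[0.7, 0.1, 0.1, 0.1],  # in-candidate prediction
          [0.2, 0.5, 0.2, 0.1],  # OC: class 1 is not a tag
          [0.1, 0.1, 0.3, 0.5]]  # OC: class 3 is not a tag
tags = {0, 2}
print(oc_pixels(scores, tags))  # [1, 2]
print(rectification_loss(scores, tags))
```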
Submitted 14 March, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Fuzzy Positive Learning for Semi-supervised Semantic Segmentation
Authors:
Pengchong Qiao,
Zhidan Wei,
Yu Wang,
Zhennan Wang,
Guoli Song,
Fan Xu,
Xiangyang Ji,
Chang Liu,
Jie Chen
Abstract:
Semi-supervised learning (SSL) essentially pursues class boundary exploration with less dependence on human annotations. Although typical attempts focus on ameliorating the inevitable error-prone pseudo-labeling, we think differently and resort to exhausting informative semantics from multiple probably correct candidate labels. In this paper, we introduce Fuzzy Positive Learning (FPL) for accurate SSL semantic segmentation in a plug-and-play fashion, targeting adaptively encouraging fuzzy positive predictions and suppressing highly-probable negatives. Being conceptually simple yet practically effective, FPL can remarkably alleviate interference from wrong pseudo labels and progressively achieve clear pixel-level semantic discrimination. Concretely, our FPL approach consists of two main components, including fuzzy positive assignment (FPA) to provide an adaptive number of labels for each pixel and fuzzy positive regularization (FPR) to restrict the predictions of fuzzy positive categories to be larger than the rest under different perturbations. Theoretical analysis and extensive experiments on Cityscapes and VOC 2012 with consistent performance gain justify the superiority of our approach.
Submitted 19 November, 2022; v1 submitted 16 October, 2022;
originally announced October 2022.
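Below is a rough Python sketch of the two FPL components for a single pixel, under stated assumptions.

- **Fuzzy positive assignment.** This is modeled as taking the smallest set of top classes whose probability mass reaches a threshold `tau`. This is a hypothetical rule that yields an adaptive number of labels.
- **Fuzzy positive regularization.** This is stood in for by the negative log of the mass on that set.

The paper's actual regularizer, which compares fuzzy positives with the remaining classes under different perturbations, is not reproduced.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuzzy_positive_set(probs, tau=0.9):
    """Hypothetical FPA: smallest set of top classes whose probability mass reaches tau."""
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    chosen, mass = [], 0.0
    for c in order:
        chosen.append(c)
        mass += probs[c]
        if mass >= tau:
            break
    return chosen


def fuzzy_positive_loss(probs, positives):
    """Hypothetical FPR stand-in: -log of the total probability on the fuzzy positives."""
    return -math.log(sum(probs[c] for c in positives))


probs = softmax([2.0, 1.5, -1.0])
positives = fuzzy_positive_set(probs, tau=0.9)
print(positives, fuzzy_positive_loss(probs, positives))
```

A confident pixel gets a single positive label. An ambiguous one keeps several plausible labels instead of being forced onto one possibly wrong pseudo label.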
-
Multi-Outputs Is All You Need For Deblur
Authors:
Sidun Liu,
Peng Qiao,
Yong Dou
Abstract:
Image deblurring is an ill-posed task: a blurry image admits infinitely many feasible solutions. Modern deep learning approaches usually discard the learning of blur kernels and directly employ end-to-end supervised learning. Popular deblurring datasets define the label as one of the feasible solutions. However, we argue that it is not reasonable to specify a label directly, especially when the label is sampled from a random distribution. Therefore, we propose to make the network learn the distribution of feasible solutions, and based on this consideration we design a novel multi-head output architecture and a corresponding loss function for distribution learning. Our approach enables the model to output multiple feasible solutions to approximate the target distribution. We further propose a novel parameter multiplexing method that reduces the number of parameters and the computational effort while improving performance. We evaluated our approach on multiple image-deblurring models, including the current state-of-the-art NAFNet. In terms of best-overall PSNR (picking the highest score among the heads for each validation image), our approach outperforms the compared baselines by up to 0.11~0.18 dB. In terms of best-single-head PSNR (picking the best-performing head on the validation set), it outperforms the compared baselines by up to 0.04~0.08 dB. The code is available at https://github.com/Liu-SD/multi-output-deblur.
Submitted 27 August, 2022;
originally announced August 2022.
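The two evaluation protocols described above differ only in where the maximum is taken. The sketch below illustrates this, assuming a matrix `scores[image][head]` of per-image PSNR values. The function names and the numbers are hypothetical.

```python
def best_overall(scores):
    """Mean over images of the best head's PSNR for that image (per-image oracle)."""
    return sum(max(row) for row in scores) / len(scores)


def best_single_head(scores):
    """Index and mean PSNR of the single head that is best on the whole validation set."""
    n_images, n_heads = len(scores), len(scores[0])
    means = [sum(row[h] for row in scores) / n_images for h in range(n_heads)]
    best = max(range(n_heads), key=lambda h: means[h])
    return best, means[best]


# Hypothetical PSNR values (dB): rows are validation images, columns are heads.
psnr = [[30.0, 31.0], [32.0, 29.0]]
print(best_overall(psnr), best_single_head(psnr))
```

Because the per-image maximum is always at least any single head's score on that image, best-overall PSNR can never be lower than best-single-head PSNR.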
-
$L_2$BN: Enhancing Batch Normalization by Equalizing the $L_2$ Norms of Features
Authors:
Zhennan Wang,
Kehan Li,
Runyi Yu,
Yian Zhao,
Pengchong Qiao,
Chang Liu,
Fan Xu,
Xiangyang Ji,
Guoli Song,
Jie Chen
Abstract:
In this paper, we analyze batch normalization from the perspective of discriminability and find the disadvantages ignored by previous studies: the difference in $l_2$ norms of sample features can hinder batch normalization from obtaining more distinguished inter-class features and more compact intra-class features. To address this issue, we propose a simple yet effective method to equalize the $l_2$ norms of sample features. Concretely, we $l_2$-normalize each sample feature before feeding them into batch normalization, and therefore the features are of the same magnitude. Since the proposed method combines the $l_2$ normalization and batch normalization, we name our method $L_2$BN. The $L_2$BN can strengthen the compactness of intra-class features and enlarge the discrepancy of inter-class features. The $L_2$BN is easy to implement and can exert its effect without any additional parameters or hyper-parameters. We evaluate the effectiveness of $L_2$BN through extensive experiments with various models on image classification and acoustic scene classification tasks. The results demonstrate that the $L_2$BN can boost the generalization ability of various neural network models and achieve considerable performance improvements.
Submitted 21 March, 2023; v1 submitted 6 July, 2022;
originally announced July 2022.
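The core operation, L2-normalizing each sample's feature vector before batch normalization, can be sketched in plain Python. This sketch omits batch normalization's learnable scale and shift and its running statistics for brevity. The function names are illustrative, not the authors' code.

```python
import math


def l2_normalize(x, eps=1e-12):
    """Scale one sample's feature vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def batch_norm(batch, eps=1e-5):
    """Per-feature standardization over the batch (learnable scale/shift omitted)."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out


def l2bn(batch):
    """L2-normalize every sample, then batch-normalize."""
    return batch_norm([l2_normalize(x) for x in batch])


# [3, 4] and [6, 8] differ only in magnitude, so after L2 normalization they coincide.
print(l2bn([[3.0, 4.0], [6.0, 8.0]]))  # → [[0.0, 0.0], [0.0, 0.0]]
```

The example shows the effect the abstract describes. Samples that differ only in feature magnitude become identical before batch normalization, so magnitude no longer influences the normalized features.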
-
Automated Segmentation of Brain Gray Matter Nuclei on Quantitative Susceptibility Mapping Using Deep Convolutional Neural Network
Authors:
Chao Chai,
Pengchong Qiao,
Bin Zhao,
Huiying Wang,
Guohua Liu,
Hong Wu,
E Mark Haacke,
Wen Shen,
Chen Cao,
Xinchen Ye,
Zhiyang Liu,
Shuang Xia
Abstract:
Abnormal iron accumulation in the brain subcortical nuclei, which can be measured through the magnetic susceptibility from quantitative susceptibility mapping (QSM), has been reported to be correlated with various neurodegenerative diseases. To quantitatively measure the magnetic susceptibility, the nuclei should be accurately segmented, which is a tedious task for clinicians. In this paper, we propose a double-branch residual-structured U-Net (DB-ResUNet) based on a 3D convolutional neural network (CNN) to automatically segment such brain gray matter nuclei. To better trade off segmentation accuracy against memory efficiency, the proposed DB-ResUNet feeds high-resolution image patches into the local branch and low-resolution patches with a larger field of view into the global branch. Experimental results revealed that by jointly using QSM and T$_\text{1}$ weighted imaging (T$_\text{1}$WI) as inputs, the proposed method achieves better segmentation accuracy than its single-branch counterpart, as well as the conventional atlas-based method and the classical 3D-UNet structure. The susceptibility values and the volumes were also measured, which indicated that the measurements from the proposed DB-ResUNet correlate highly with the values from the manually annotated regions of interest.
Submitted 3 August, 2020;
originally announced August 2020.
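Below is a sketch of how paired inputs for the two branches might be sampled around the same voxel. The local patch is taken at full resolution. The global patch has the same array size but twice the field of view. This sketch makes three assumptions:

- The global patch is produced by simple strided subsampling, not averaging.
- Out-of-volume voxels are zero-padded.
- The patch size is a free parameter, since the abstract does not give the paper's values.

```python
def crop3d(volume, center, size, step=1):
    """Sample a size^3 patch centred on `center`, taking every `step`-th voxel.

    Voxels outside the volume are zero-padded.
    """
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    starts = [c - (size * step) // 2 for c in center]

    def value(z, y, x):
        if all(0 <= p < d for p, d in zip((z, y, x), dims)):
            return volume[z][y][x]
        return 0.0

    return [[[value(starts[0] + i * step, starts[1] + j * step, starts[2] + k * step)
              for k in range(size)]
             for j in range(size)]
            for i in range(size)]


def dual_branch_patches(volume, center, size):
    """Local patch at full resolution; global patch with 2x field of view at half resolution."""
    local = crop3d(volume, center, size, step=1)
    global_ = crop3d(volume, center, size, step=2)
    return local, global_


# Toy 8x8x8 volume whose voxel value encodes its (z, y, x) position.
vol = [[[float(100 * z + 10 * y + x) for x in range(8)] for y in range(8)] for z in range(8)]
local, glob = dual_branch_patches(vol, center=(4, 4, 4), size=4)
print(local[0][0][0], glob[0][0][0])
```

Both patches have the same shape, so memory cost per branch is equal, while the global branch sees twice as much context along each axis.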
-
On a problem of Erdős about graphs whose size is the Turán number plus one
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
We consider finite simple graphs. Given a graph $H$ and a positive integer $n,$ the Turán number of $H$ for the order $n,$ denoted ${\rm ex}(n,H),$ is the maximum size of a graph of order $n$ not containing $H$ as a subgraph. Erdős posed the following problem in 1990:
"For which graphs $H$ is it true that every graph on $n$ vertices and ${\rm ex}(n,H)+1$ edges contains at least two $H$s? Perhaps this is always true."
We solve the second part of this problem in the negative by proving that for every integer $k\ge 4,$ there exists a graph $H$ of order $k$ and at least two orders $n$ such that there exists a graph of order $n$ and size ${\rm ex}(n,H)+1$ which contains exactly one copy of $H.$ Denote by $C_4$ the $4$-cycle. We also prove that for every integer $n$ with $6\le n\le 11,$ there exists a graph of order $n$ and size ${\rm ex}(n,C_4)+1$ which contains exactly one copy of $C_4,$ but for $n=12$ or $n=13,$ the minimum number of copies of $C_4$ in a graph of order $n$ and size ${\rm ex}(n,C_4)+1$ is $2.$
Submitted 31 January, 2020;
originally announced January 2020.
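For very small orders, the quantities in this problem can be checked by brute force over all labelled graphs. The sketch below counts copies of $C_4$ via common neighbourhoods, computes ${\rm ex}(n,C_4)$, and finds the minimum number of $C_4$ copies at size ${\rm ex}(n,C_4)+1$. It is only an illustrative check under the assumption that exhaustive search is acceptable (it enumerates $2^{\binom{n}{2}}$ graphs). It is not the method used in the paper, which handles orders up to 13.

```python
from itertools import combinations


def count_c4(n, edges):
    """Number of 4-cycles: each C4 is counted once per diagonal, hence the // 2."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for u, v in combinations(range(n), 2):
        k = len(adj[u] & adj[v])  # common neighbours of u and v
        total += k * (k - 1) // 2
    return total // 2


def all_graphs(n):
    """Every labelled simple graph on n vertices, as an edge list."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield [pairs[i] for i in range(len(pairs)) if mask >> i & 1]


def turan_c4(n):
    """ex(n, C4): maximum size of a C4-free graph of order n."""
    return max(len(e) for e in all_graphs(n) if count_c4(n, e) == 0)


def min_c4_above_turan(n):
    """Minimum number of C4 copies over graphs of order n and size ex(n, C4) + 1."""
    m = turan_c4(n) + 1
    return min(count_c4(n, e) for e in all_graphs(n) if len(e) == m)


print(turan_c4(5), min_c4_above_turan(5))
```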
-
The diameter and radius of radially maximal graphs
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
A graph is called radially maximal if it is not complete and the addition of any new edge decreases its radius. In 1976 Harary and Thomassen proved that the radius $r$ and diameter $d$ of any radially maximal graph satisfy $r\le d\le 2r-2.$ Dutton, Medidi and Brigham rediscovered this result with a different proof in 1995 and they posed the conjecture that the converse is true, that is, if $r$ and $d$ are positive integers satisfying $r\le d\le 2r-2,$ then there exists a radially maximal graph with radius $r$ and diameter $d.$ We prove this conjecture and a little more.
Submitted 16 October, 2019;
originally announced October 2019.
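The definition of a radially maximal graph can be checked directly on small connected graphs. The sketch below computes all eccentricities by breadth-first search, then tests whether every missing edge lowers the radius. The function names are illustrative, and the input graph is assumed to be connected.

```python
from collections import deque
from itertools import combinations


def eccentricities(n, adj):
    """BFS eccentricity of every vertex of a connected graph."""
    ecc = []
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ecc.append(max(dist))
    return ecc


def radius_diameter(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    ecc = eccentricities(n, adj)
    return min(ecc), max(ecc)


def is_radially_maximal(n, edges):
    """Not complete, and adding any missing edge lowers the radius."""
    present = {frozenset(e) for e in edges}
    missing = [e for e in combinations(range(n), 2) if frozenset(e) not in present]
    if not missing:
        return False  # complete graphs are excluded by definition
    r, _ = radius_diameter(n, edges)
    return all(radius_diameter(n, list(edges) + [e])[0] < r for e in missing)


c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
p4 = [(0, 1), (1, 2), (2, 3)]
print(radius_diameter(4, c4), is_radially_maximal(4, c4))
print(radius_diameter(4, p4), is_radially_maximal(4, p4))
```

The 4-cycle ($r=d=2$) is radially maximal and satisfies $r\le d\le 2r-2$. The path $P_4$ has $r=2$ and $d=3>2r-2$, and, consistent with the Harary and Thomassen bound, it is not radially maximal.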
-
Visual Tree Convolutional Neural Network in Image Classification
Authors:
Yuntao Liu,
Yong Dou,
Ruochun Jin,
Peng Qiao
Abstract:
In image classification, Convolutional Neural Network (CNN) models have achieved high performance with the rapid development of deep learning. However, some categories in image datasets are more difficult to distinguish than others. Improving the classification accuracy on these confused categories benefits the overall performance. In this paper, we build a Confusion Visual Tree (CVT) based on confused semantic-level information to identify the confused categories. With the information provided by the CVT, we can lead the CNN training procedure to pay more attention to these confused categories. Therefore, we propose Visual Tree Convolutional Neural Networks (VT-CNN) based on the original deep CNN embedded with our CVT. We evaluate our VT-CNN model on the benchmark datasets CIFAR-10 and CIFAR-100. In our experiments, we build 3 different VT-CNN models, which improve over their base CNN models by 1.36%, 0.89%, and 0.64%, respectively.
Submitted 4 June, 2019;
originally announced June 2019.
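One plausible first step toward identifying confused categories is to read them off a validation confusion matrix. The sketch below scores each class pair by its mutual misclassification rate and keeps the pairs above a threshold. This helper and its threshold are hypothetical. It is not the paper's CVT construction, which the abstract only describes as using confused semantic-level information.

```python
def confused_pairs(confusion, threshold):
    """Hypothetical helper: class pairs whose mutual confusion rate exceeds `threshold`.

    confusion[i][j] counts samples of true class i predicted as class j.
    Pairs are returned from most to least confused.
    """
    k = len(confusion)
    rates = [[confusion[i][j] / sum(confusion[i]) for j in range(k)] for i in range(k)]
    pairs = []
    for i in range(k):
        for j in range(i + 1, k):
            score = rates[i][j] + rates[j][i]  # symmetric confusion between i and j
            if score > threshold:
                pairs.append((score, i, j))
    return [(i, j) for score, i, j in sorted(pairs, reverse=True)]


conf = [[80, 15, 5], [20, 75, 5], [2, 3, 95]]
print(confused_pairs(conf, threshold=0.1))
```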
-
Relation between the number of leaves of a tree and its diameter
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
Let $L(n,d)$ denote the minimum possible number of leaves in a tree of order $n$ and diameter $d.$ In 1975 Lesniak gave the lower bound
$B(n,d)=\lceil 2(n-1)/d\rceil$ for $L(n,d).$ When $d$ is even, $B(n,d)=L(n,d).$ But when $d$ is odd, $B(n,d)$ is smaller than $L(n,d)$ in general.
For example, $B(21,3)=14$ while $L(21,3)=19.$ We prove that for $d\ge 2,$
$ L(n,d)=\left\lceil \frac{2(n-1)}{d}\right\rceil$ if $d$ is even and $L(n,d)=\left\lceil \frac{2(n-2)}{d-1}\right\rceil$ if $d$ is odd.
The converse problem is also considered. Let $D(n,f)$ be the minimum possible diameter of a tree of order $n$ with exactly $f$ leaves.
We prove that $D(n,f)=2$ if $n=f+1,$ $D(n,f)=2k+1$ if $n=kf+2,$ and $D(n,f)=2k+2$ if $kf+3\le n\le (k+1)f+1.$
Submitted 27 April, 2019;
originally announced April 2019.
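The closed forms stated above translate directly into integer arithmetic. Below is a small Python transcription, using integer ceiling division to avoid floating point. The function names are illustrative.

```python
def ceil_div(a, b):
    return -(-a // b)


def lesniak_bound(n, d):
    """B(n, d) = ceil(2(n-1)/d)."""
    return ceil_div(2 * (n - 1), d)


def min_leaves(n, d):
    """L(n, d) for a tree of order n and diameter d >= 2."""
    if d % 2 == 0:
        return ceil_div(2 * (n - 1), d)
    return ceil_div(2 * (n - 2), d - 1)


def min_diameter(n, f):
    """D(n, f): minimum diameter of a tree of order n with exactly f >= 2 leaves."""
    if n == f + 1:
        return 2
    k, r = divmod(n - 2, f)  # n = kf + 2 + r
    return 2 * k + 1 if r == 0 else 2 * k + 2


print(lesniak_bound(21, 3), min_leaves(21, 3))  # → 14 19
```

Writing $n-2=kf+r$ with $0\le r<f$ selects the case directly. The case $r=0$ is $n=kf+2$, and the cases $1\le r\le f-1$ cover exactly $kf+3\le n\le (k+1)f+1$.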
-
IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition
Authors:
Ke Yang,
Peng Qiao,
Dongsheng Li,
Yong Dou
Abstract:
Effective spatiotemporal feature representation is crucial to the video-based action recognition task. Focusing on discriminative spatiotemporal feature learning, we propose the Information Fused Temporal Transformation Network (IF-TTN) for action recognition on top of the popular Temporal Segment Network (TSN) framework. In the network, an Information Fusion Module (IFM) is designed to fuse the appearance and motion features at multiple ConvNet levels for each video snippet, forming a short-term video descriptor. With the fused features as inputs, Temporal Transformation Networks (TTN) are employed to model the middle-term temporal transformation between neighboring snippets following a sequential order. As TSN itself depicts long-term temporal structure by segmental consensus, the proposed network comprehensively considers temporal features at multiple granularities. Our IF-TTN achieves state-of-the-art results on the two most popular action recognition datasets: UCF101 and HMDB51. Empirical investigation reveals that our architecture is robust to the quality of the input motion maps. When optical flow is replaced with motion vectors from the compressed video stream, the performance remains comparable to flow-based methods while testing is 10x faster.
Submitted 11 April, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
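The long-term part mentioned above comes from TSN's segmental consensus. Below is a minimal sketch of that idea: split the video into equal segments, take one snippet per segment, and average the per-snippet class scores. It assumes a deterministic centre-frame choice for illustration (TSN samples randomly within segments during training). The IFM and TTN modules are not modeled.

```python
def segment_snippet_indices(num_frames, num_segments):
    """Hypothetical TSN-style sampling: the centre frame of each equal-length segment."""
    seg_len = num_frames / num_segments
    return [int(seg_len * k + seg_len / 2) for k in range(num_segments)]


def segmental_consensus(snippet_scores):
    """Average class scores over snippets (the long-term consensus)."""
    n = len(snippet_scores)
    return [sum(col) / n for col in zip(*snippet_scores)]


print(segment_snippet_indices(30, 3))  # → [5, 15, 25]
print(segmental_consensus([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]))  # → [1.0, 1.0]
```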
-
Exploring Frame Segmentation Networks for Temporal Action Localization
Authors:
Ke Yang,
Xiaolong Shen,
Peng Qiao,
Shijie Li,
Dongsheng Li,
Yong Dou
Abstract:
Temporal action localization is an important task in computer vision. Although many methods have been proposed, how to predict the temporal location of action segments precisely remains an open question. Most state-of-the-art works train action classifiers on video segments pre-determined by action proposals. However, recent work found that a desirable model should move beyond the segment level and make dense predictions at a fine temporal granularity to determine precise temporal boundaries. In this paper, we propose a Frame Segmentation Network (FSN) that places a temporal CNN on top of 2D spatial CNNs. The spatial CNNs are responsible for abstracting semantics in the spatial dimension, while the temporal CNN is responsible for introducing temporal context information and performing dense predictions. The proposed FSN can make dense frame-level predictions for a video clip using both spatial and temporal context information. FSN is trained in an end-to-end manner, so the model can be optimized jointly in the spatial and temporal domains. We also adapt FSN to the weakly supervised scenario (WFSN), where only video-level labels are provided during training. Experimental results on a public dataset show that FSN achieves superior performance in both frame-level action localization and temporal action localization.
Submitted 14 February, 2019;
originally announced February 2019.
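Dense frame-level predictions still have to be turned into action segments for temporal localization. A simple way to do this is to group runs of identical non-background labels, as sketched below. This post-processing step is an assumption for illustration; the abstract does not specify FSN's exact procedure.

```python
def frames_to_segments(frame_labels, background=0):
    """Group runs of identical non-background frame labels into (start, end, label).

    `end` is exclusive.
    """
    labels = list(frame_labels)
    segments = []
    start = None
    for t, label in enumerate(labels + [background]):  # sentinel closes a trailing run
        if start is not None and label != labels[start]:
            segments.append((start, t, labels[start]))
            start = None
        if start is None and label != background:
            start = t
    return segments


print(frames_to_segments([0, 0, 3, 3, 3, 0, 5, 5]))  # → [(2, 5, 3), (6, 8, 5)]
```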
-
Pairs of a tree and a nontree graph with the same status sequence
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
The status of a vertex $x$ in a graph is the sum of the distances between $x$ and all other vertices. Let $G$ be a connected graph. The status sequence of $G$ is the list of the statuses of all vertices arranged in nondecreasing order. $G$ is called status injective if all the statuses of its vertices are distinct. Let $G$ be a member of a family of graphs $\mathscr{F}$ and let the status sequence of $G$ be $s.$ $G$ is said to be status unique in $\mathscr{F}$ if $G$ is the unique graph in $\mathscr{F}$ whose status sequence is $s.$ In 2011, J.L. Shang and C. Lin posed the following two conjectures. Conjecture 1: A tree and a nontree graph cannot have the same status sequence. Conjecture 2: Any status injective tree is status unique in all connected graphs. We settle these two conjectures negatively. For every integer $n\ge 10,$ we construct a tree $T_n$ and a unicyclic graph $U_n,$ both of order $n,$ with the following two properties: (1) $T_n$ and $U_n$ have the same status sequence; (2) for $n\ge 15,$ if $n$ is congruent to $3$ modulo $4$ then $T_n$ is status injective and among any four consecutive even orders, there is at least one order $n$ such that $T_n$ is status injective.
Submitted 28 January, 2019;
originally announced January 2019.
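The definitions of status, status sequence, and status injectivity can be computed directly with one breadth-first search per vertex, as in the sketch below. The function names are illustrative, and the graph is assumed to be connected. The paper's tree/unicyclic construction for $n\ge 10$ is not reproduced here.

```python
from collections import deque


def statuses(n, edges):
    """Status of each vertex: sum of BFS distances to all other vertices."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    result = []
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result.append(sum(dist))
    return result


def status_sequence(n, edges):
    """Statuses arranged in nondecreasing order."""
    return sorted(statuses(n, edges))


def is_status_injective(n, edges):
    s = statuses(n, edges)
    return len(set(s)) == len(s)


# A path 0-1-2-3-4-5 with a pendant vertex 6 attached to vertex 2.
tree7 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
print(status_sequence(7, tree7), is_status_injective(7, tree7))
```

Symmetric trees such as paths are never status injective, because symmetric vertices share a status. The asymmetric tree above has seven distinct statuses.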
-
The largest graphs with given order and diameter: A simple proof
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
A consequence of Ore's classic theorem characterizing the maximal graphs with given order and diameter is a determination of the largest such graphs. We give a very short and simple proof of this smaller result, based on a well-known elementary observation.
Submitted 9 November, 2018;
originally announced November 2018.
-
Learning Generic Diffusion Processes for Image Restoration
Authors:
Peng Qiao,
Yong Dou,
Yunjin Chen,
Wensen Feng
Abstract:
Image restoration problems are typical ill-posed problems in which the regularization term plays an important role. A regularization term learned via generative approaches is easy to transfer to various image restoration problems, but offers inferior restoration quality compared with one learned via discriminative approaches. Conversely, a regularization term learned via discriminative approaches is usually trained for a specific image restoration problem and fails on problems for which it was not trained. To address this issue, we propose a generic diffusion process (genericDP) to handle multiple Gaussian denoising problems based on the Trainable Non-linear Reaction Diffusion (TNRD) models. Whereas TNRD trains one model, consisting of a diffusion term and a reaction term, for each Gaussian denoising problem, we enforce multiple TNRD models to share one diffusion term. The trained genericDP model provides both promising denoising performance and high training efficiency compared with the original TNRD models. We also transfer the trained diffusion term to non-blind deconvolution, which is unseen in the training phase. Experimental results show that the diffusion term trained for multiple Gaussian denoising problems can be transferred to image non-blind deconvolution as an image prior and provides competitive performance.
Submitted 17 July, 2018;
originally announced July 2018.
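The split between a shared diffusion term and a problem-specific reaction term can be illustrated with a 1D toy version of one explicit reaction-diffusion step:

$$u \leftarrow u - \text{step}\,\big(K^\top \phi(Ku) + \lambda\,(u-f)\big)$$

The sketch rests on several assumptions:

- It uses a single fixed filter $K$ (forward difference $[-1, 1]$ with periodic boundary) rather than learned filters.
- It uses a fixed linear influence function $\phi$ rather than learned ones.
- It uses an explicit step size.
- The per-noise-level reaction weights $\lambda$ are hypothetical.

It is meant only to show which parts would be shared; it is not the trained genericDP model.

```python
def forward_diff(u):
    """Apply the filter k = [-1, 1] with periodic boundary: (k*u)[i] = u[i+1] - u[i]."""
    n = len(u)
    return [u[(i + 1) % n] - u[i] for i in range(n)]


def forward_diff_adjoint(y):
    """Adjoint of forward_diff under the same periodic boundary: y[i-1] - y[i]."""
    n = len(y)
    return [y[(i - 1) % n] - y[i] for i in range(n)]


def diffusion_step(u, f, phi, lam, step):
    """One explicit step: u - step * (K^T phi(K u) + lam * (u - f))."""
    response = [phi(r) for r in forward_diff(u)]
    diffusion = forward_diff_adjoint(response)  # shared diffusion term
    return [ui - step * (di + lam * (ui - fi)) for ui, di, fi in zip(u, diffusion, f)]


noisy = [0.0, 0.0, 1.0, 0.0, 0.0]
# Same diffusion term (filter and phi) for every noise level; only lam changes.
for lam in (0.5, 0.1):
    u = noisy
    for _ in range(3):
        u = diffusion_step(u, noisy, phi=lambda x: x, lam=lam, step=0.25)
    print(lam, [round(v, 3) for v in u])
```

With a linear $\phi$ this reduces to gradient descent on a quadratic smoothness penalty. TNRD instead learns nonlinear influence functions, which is what allows edges to be preserved.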
-
Detour-saturated graphs of small girths
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
A detour of a graph G is a longest path in G. The detour order of G is the number of vertices in a detour of G. A graph is said to be detour-saturated if the addition of any edge increases strictly the detour order. L.W. Beineke, J.E. Dunbar and M. Frick asked the following three questions in 2005. (1) What is the smallest order of a detour-saturated graph of girth 4? (2) Let Pr be the graph obtained from the Petersen graph by splitting one of its vertices into three leaves. Is Pr the smallest triangle-free detour-saturated graph? (3) Does there exist a detour-saturated graph with finite girth bigger than 5? We answer these questions.
Submitted 18 June, 2018;
originally announced June 2018.
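For small graphs, the detour order and the detour-saturated property can be checked by exhaustive depth-first search over simple paths. The sketch below does this. It is exponential in general and intended only as an illustration of the definitions, not as the method used in the paper.

```python
from itertools import combinations


def detour_order(n, edges):
    """Number of vertices in a longest path (brute-force DFS; small graphs only)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = 0

    def extend(v, visited):
        nonlocal best
        best = max(best, len(visited))
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                extend(w, visited)
                visited.remove(w)

    for s in range(n):
        extend(s, {s})
    return best


def is_detour_saturated(n, edges):
    """Adding any missing edge strictly increases the detour order."""
    present = {frozenset(e) for e in edges}
    base = detour_order(n, edges)
    return all(
        detour_order(n, list(edges) + [e]) > base
        for e in combinations(range(n), 2)
        if frozenset(e) not in present
    )


star = [(0, 1), (0, 2), (0, 3)]
p4 = [(0, 1), (1, 2), (2, 3)]
print(is_detour_saturated(4, star), is_detour_saturated(4, p4))  # → True False
```

The star $K_{1,3}$ is detour-saturated, since joining two leaves creates a path through all four vertices. $P_4$ already has a Hamilton path, so no added edge can increase its detour order.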
-
The minimum number of Hamilton cycles in a hamiltonian threshold graph of a prescribed order
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
We prove that the minimum number of Hamilton cycles in a hamiltonian threshold graph of order $n$ is $2^{\lfloor (n-3)/2\rfloor}$ and this minimum number is attained uniquely by the graph with degree sequence $n-1,n-1,n-2,\ldots,\lceil n/2\rceil,\lceil n/2\rceil,\ldots,3,2$ of $n-2$ distinct degrees. This graph is also the unique graph of minimum size among all hamiltonian threshold graphs of order $n.$
Submitted 26 February, 2018;
originally announced February 2018.
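The extremal graph can be rebuilt from its degree sequence and its Hamilton cycles counted by brute force for small $n$, as a sanity check of the stated count $2^{\lfloor (n-3)/2\rfloor}$. The reconstruction repeatedly removes an isolated or dominating vertex, which works because every threshold graph has one. The function names are illustrative. The permutation search is only practical for roughly $n\le 9$.

```python
from itertools import permutations


def extremal_degree_sequence(n):
    """n-1, n-1, n-2, ..., ceil(n/2), ceil(n/2), ..., 3, 2 for n >= 4."""
    seq = [n - 1, n - 1] + list(range(n - 2, 1, -1)) + [(n + 1) // 2]
    return sorted(seq, reverse=True)


def threshold_graph(degrees):
    """Rebuild a threshold graph from its degree sequence by repeatedly removing
    an isolated vertex or a dominating vertex."""
    remaining = dict(enumerate(degrees))
    edges = set()
    while remaining:
        isolated = [v for v, d in remaining.items() if d == 0]
        if isolated:
            del remaining[isolated[0]]
            continue
        m = len(remaining)
        dominating = [v for v, d in remaining.items() if d == m - 1]
        if not dominating:
            raise ValueError("not a threshold degree sequence")
        v = dominating[0]
        del remaining[v]
        for u in remaining:
            edges.add((min(u, v), max(u, v)))
            remaining[u] -= 1
    return sorted(edges)


def count_hamilton_cycles(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for perm in permutations(range(1, n)):  # fix vertex 0 as the start
        cycle = (0,) + perm
        if all(cycle[(i + 1) % n] in adj[cycle[i]] for i in range(n)):
            count += 1
    return count // 2  # each undirected cycle is found in both directions


for n in range(4, 9):
    edges = threshold_graph(extremal_degree_sequence(n))
    print(n, count_hamilton_cycles(n, edges), 2 ** ((n - 3) // 2))
```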
-
Exploring Temporal Preservation Networks for Precise Temporal Action Localization
Authors:
Ke Yang,
Peng Qiao,
Dongsheng Li,
Shaohe Lv,
Yong Dou
Abstract:
Temporal action localization is an important task in computer vision. Although a variety of methods have been proposed, how to predict the temporal boundaries of action segments precisely remains an open question. Most works use segment-level classifiers to select video segments pre-determined by action proposals or dense sliding windows. However, to achieve more precise action boundaries, a temporal localization system should make dense predictions at a fine granularity. A recently proposed work exploits Convolutional-Deconvolutional-Convolutional (CDC) filters to upsample the predictions of 3D ConvNets, making it possible to perform per-frame action predictions and achieving promising performance in temporal action localization. However, the CDC network loses temporal information, partially due to the temporal downsampling operation. In this paper, we propose an elegant and powerful Temporal Preservation Convolutional (TPC) Network that equips 3D ConvNets with TPC filters. The TPC network can fully preserve the temporal resolution while downsampling the spatial resolution, enabling frame-level granularity action localization. The TPC network can be trained in an end-to-end manner. Experimental results on public datasets show that the TPC network achieves significant improvement in per-frame action prediction and competitive results in segment-level temporal action localization.
Submitted 11 September, 2017; v1 submitted 10 August, 2017;
originally announced August 2017.
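Preserving temporal resolution while downsampling space comes down to choosing per-axis strides. The sketch below applies the standard convolution output-size formula to each $(T, H, W)$ axis. The clip size and layer hyperparameters are hypothetical, and the actual TPC filter design is not reproduced.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_out_shape(shape, kernel, stride, padding):
    """Apply conv_out to each (T, H, W) axis."""
    return tuple(conv_out(s, k, st, p) for s, k, st, p in zip(shape, kernel, stride, padding))


clip = (16, 112, 112)  # hypothetical input: 16 frames of 112x112
# Stride 1 in time, 2 in space: temporal length is preserved.
print(conv3d_out_shape(clip, (3, 3, 3), (1, 2, 2), (1, 1, 1)))  # → (16, 56, 56)
# Striding in time as well halves the temporal length.
print(conv3d_out_shape(clip, (3, 3, 3), (2, 2, 2), (1, 1, 1)))  # → (8, 56, 56)
```

Stacking layers of the first kind keeps one output per input frame, which is what enables frame-level predictions without a later temporal upsampling stage.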
-
Recent advances in high-contrast metastructures, metasurfaces and photonic crystals
Authors:
Pengfei Qiao,
Weijian Yang,
Connie J. Chang-Hasnain
Abstract:
In the recent decade, the research field using arrays of high-index-contrast near-wavelength dielectric structures on flat surfaces, known as high-contrast metastructures (HCMs) or metasurfaces, has emerged and expanded rapidly. Although HCMs and metasurfaces share great similarities in physical structure with photonic crystals (PhCs), i.e., periodic nanostructures, many differences exist in their design, analysis, operating conditions, and applications. In this paper, we provide a generalized theoretical understanding of the two subjects and show their intrinsic connections. We further discuss the simulation and design approaches, categorized by their functionalities and applications. The similarities and differences between HCMs, metasurfaces, and PhCs are also discussed. New findings are presented regarding the physical connection between the PhC band structures and the 1D and 2D HCM scattering spectra under transverse and longitudinal tilt incidence. Novel designs using HCMs as holograms, spatial light modulators, and surface plasmonic couplers are discussed. Recent advances in HCMs, metasurfaces, and PhCs are reviewed and compared for applications such as broadband mirrors, waveguides, couplers, resonators, and reconfigurable optics.
Submitted 24 July, 2017;
originally announced July 2017.
-
On vertex types of graphs
Authors:
Pu Qiao,
Xingzhi Zhan
Abstract:
The vertices of a graph are classified into seven types by J.T. Hedetniemi, S.M. Hedetniemi, S.T. Hedetniemi and T.M. Lewis and they ask the following questions: 1) What is the smallest order $n$ of a graph having $n-2$ very typical vertices or $n-2$ typical vertices? 2) What is the smallest order of a pantypical graph? We answer these two questions in this paper.
Submitted 26 May, 2017;
originally announced May 2017.
-
Learning Non-local Image Diffusion for Image Denoising
Authors:
Peng Qiao,
Yong Dou,
Wensen Feng,
Yunjin Chen
Abstract:
Image diffusion plays a fundamental role in image denoising. The recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. However, because TNRD is a local model whose diffusion behavior is purely controlled by the information of local patches, it is prone to creating artifacts in homogeneous regions and over-smoothing highly textured regions, especially at strong noise levels. Meanwhile, it is widely known that the non-local self-similarity (NSS) prior is an effective image prior for image denoising and has been widely exploited in many non-local methods. In this work, we are motivated to embed the NSS prior into the TNRD model to tackle its weaknesses. To preserve end-to-end trainability, we exploit the NSS prior through a set of non-local filters, and derive our proposed trainable non-local reaction diffusion (TNLRD) model for image denoising. Together with the local filters and influence functions, the non-local filters are learned by employing loss-specific training. The experimental results show that the trained TNLRD model produces visually plausible recovered images with more texture and fewer artifacts than its local versions. Moreover, the trained TNLRD model achieves highly competitive performance compared with recent state-of-the-art image denoising methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
Submitted 24 February, 2017;
originally announced February 2017.
-
A spatio-temporal model and inference tools for longitudinal count data on multicolor cell growth
Authors:
Puxue Qiao,
Christina Mølck,
Davide Ferrari,
Frédéric Hollande
Abstract:
Multicolor cell spatio-temporal image data have become important to investigate organ development and regeneration, malignant growth or immune responses by tracking different cell types both in vivo and in vitro. Statistical modeling of image data from common longitudinal cell experiments poses significant challenges due to the presence of complex spatio-temporal interactions between different cell types and difficulties related to measurement of single cell trajectories. Current analysis methods focus mainly on univariate cases, often not considering the spatio-temporal effects affecting cell growth between different cell populations. In this paper, we propose a conditional spatial autoregressive model to describe multivariate count cell data on the lattice, and develop inference tools. The proposed methodology is computationally tractable and enables researchers to estimate a complete statistical model of multicolor cell growth. Our methodology is applied on real experimental data where we investigate how interactions between cells affect their growth. We include two case studies; the first evaluates interactions between cancer cells and fibroblasts, which are normally present in the tumor microenvironment, whilst the second evaluates interactions between cloned cancer cells when grown as different combinations.
Submitted 9 April, 2018; v1 submitted 12 October, 2016;
originally announced October 2016.
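The abstract does not state the model's equations. As a hedged illustration of what a conditional autoregressive model for multicolor counts on a lattice can look like, the sketch below uses a hypothetical log-linear (auto-Poisson-style) conditional mean: the expected count of color c at site s is exp(alpha_c + sum_k beta_ck * log(1 + local mean of color k around s at the previous time point)). The rook neighbourhood, the log(1 + count) transform, and the names `rook_neighbourhood`, `conditional_means`, `alpha` and `beta` are all assumptions, not the authors' specification.

```python
import math


def rook_neighbourhood(i, j, rows, cols):
    """Site (i, j) together with its up/down/left/right neighbours inside the grid."""
    sites = [(i, j)]
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < rows and 0 <= b < cols:
            sites.append((a, b))
    return sites


def conditional_means(prev, alpha, beta):
    """Hypothetical conditional means mu[c][i][j] for each color c at each lattice site.

    prev[k][i][j] is the count of color k at site (i, j) at the previous time point;
    beta[c][k] is the (assumed) effect of nearby color-k cells on the growth of color c.
    """
    n_colors = len(prev)
    rows, cols = len(prev[0]), len(prev[0][0])
    mu = [[[0.0] * cols for _ in range(rows)] for _ in range(n_colors)]
    for i in range(rows):
        for j in range(cols):
            hood = rook_neighbourhood(i, j, rows, cols)
            # Average count of each color over the site and its neighbours.
            local = [sum(prev[k][a][b] for a, b in hood) / len(hood) for k in range(n_colors)]
            for c in range(n_colors):
                eta = alpha[c] + sum(beta[c][k] * math.log1p(local[k]) for k in range(n_colors))
                mu[c][i][j] = math.exp(eta)
    return mu
```

In such a parameterization, a negative `beta[c][k]` would encode that nearby color-k cells suppress the growth of color c, and a Poisson likelihood over the observed counts could then be maximized to estimate `alpha` and `beta`; that estimation step is likewise an assumption here.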
-
Image Denoising via Multi-scale Nonlinear Diffusion Models
Authors:
Wensen Feng,
Peng Qiao,
Xuanyang Xi,
Yunjin Chen
Abstract:
Image denoising is a fundamental operation in image processing and holds considerable practical importance for various real-world applications. Arguably, several thousand papers are dedicated to image denoising. In the past decade, state-of-the-art denoising algorithms have been clearly dominated by non-local patch-based methods, which explicitly exploit patch self-similarity within an image. However, in the past two years, discriminatively trained local approaches have started to outperform previous non-local models and have been attracting increasing attention due to the additional advantage of computational efficiency. Successful approaches include the cascade of shrinkage fields (CSF) and trainable nonlinear reaction diffusion (TNRD). These two methods are built on the filter responses of small linear filters using feed-forward architectures. Due to the locality inherent in local approaches, the CSF and TNRD models become less effective when the noise level is high and consequently introduce some noise artifacts. To overcome this problem, in this paper we introduce a multi-scale strategy. Specifically, we build on our newly developed TNRD model, adopting the multi-scale pyramid image representation to devise a multi-scale nonlinear diffusion process. As expected, all the parameters in the proposed multi-scale diffusion model, including the filters and the influence functions across scales, are learned from training data through a loss-based approach. Numerical results on Gaussian and Poisson denoising substantiate that the multi-scale strategy can successfully boost the performance of the original single-scale TNRD model. As a consequence, the resulting multi-scale diffusion models can significantly suppress the typical incorrect features in images with heavy noise.
Submitted 21 September, 2016;
originally announced September 2016.
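As a hedged sketch of the multi-scale pyramid image representation mentioned above, the code below builds a pyramid by averaging non-overlapping 2x2 blocks and brings coarse-scale maps back to the finest grid by nearest-neighbour replication. The abstract does not specify the down- and up-sampling operators; both choices, the even-dimension requirement, and the function names are assumptions for illustration only.

```python
def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks (assumes even dimensions)."""
    return [
        [(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
         for j in range(0, len(img[0]), 2)]
        for i in range(0, len(img), 2)
    ]


def upsample(img):
    """Double the resolution by nearest-neighbour replication."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img, levels):
    """Return [finest, ..., coarsest], halving the resolution at each of the levels - 1 steps."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def to_finest(img, level):
    """Bring a map computed at pyramid level `level` back to the finest grid."""
    for _ in range(level):
        img = upsample(img)
    return img
```

In a multi-scale diffusion step, a filter response or update computed at each pyramid level could be mapped back with `to_finest` and combined at full resolution. How the scales are combined is not stated in the abstract beyond the fact that all filters and influence functions across scales are learned, so any fixed combination rule would be an assumption.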
-
Turán problems for digraphs avoiding distinct walks of a given length with the same endpoints
Authors:
Zejun Huang,
Zhenhua Lyu,
Pu Qiao
Abstract:
Let $n \ge 5$ and $k \ge 4$ be positive integers. We determine the maximum size of digraphs of order $n$ that avoid distinct walks of length $k$ with the same endpoints. We also characterize the extremal digraphs attaining this maximum number when $k \ge 5$.
Submitted 22 August, 2016;
originally announced August 2016.
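The avoidance condition in this abstract can be checked mechanically with a standard fact: if A is the 0-1 adjacency matrix of a digraph, then the entry (i, j) of A^k counts the walks of length k from vertex i to vertex j. A digraph therefore avoids distinct walks of length k with the same endpoints exactly when every entry of A^k is at most 1, and its size is the number of arcs (the sum of the entries of A). The sketch below encodes a loop as a 1 on the diagonal; whether loops are permitted in the paper's setting is not stated in the abstract, and the helper names are hypothetical.

```python
def mat_mult(a, b):
    """Plain matrix product of two integer matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def walk_counts(adj, k):
    """Return A^k (k >= 1); entry (i, j) is the number of walks of length k from i to j."""
    result = adj
    for _ in range(k - 1):
        result = mat_mult(result, adj)
    return result


def avoids_distinct_walks(adj, k):
    """True if no ordered pair of vertices is joined by two or more walks of length k."""
    return all(entry <= 1 for row in walk_counts(adj, k) for entry in row)


def size(adj):
    """Number of arcs of the digraph."""
    return sum(map(sum, adj))
```

For instance, a directed 5-cycle has A^k equal to a permutation matrix for every k, so it avoids distinct walks of every length, while the digraph with arcs 0→1, 0→2, 1→3 and 2→3 has two walks of length 2 from 0 to 3.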
-
Effect of Ti doping on the electrical transport and magnetic properties of layered compound Na0.8CoO2
Authors:
W. Y. Zhang,
Y. G. Zhao,
Z. P. Guo,
P. T. Qiao,
L. Cui,
L. B. Luo,
X. P. Zhang,
H. C. Yu,
Y. G. Shi,
S. Y. Zhang,
T. Y. Zhao,
J. Q. Li
Abstract:
The effect of Ti doping on the electrical transport and magnetic properties of layered Na0.8Co1-xTixO2 compounds has been investigated. The lattice parameters a and c increase with x. A small amount of Ti doping results in a metal-insulator transition at low temperatures. For samples with x > 0.03, variable-range hopping dominates the transport behavior above a certain temperature. The temperature dependence of the magnetization of all the samples is found to obey the Curie-Weiss law. The mechanism of the doping effect is discussed.
Submitted 12 June, 2005;
originally announced June 2005.
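The Curie-Weiss law mentioned above states that the susceptibility follows chi(T) = C / (T - theta), so 1/chi is linear in T with slope 1/C and intercept -theta/C. The sketch below recovers C and theta by an ordinary least-squares line fit to 1/chi; it assumes the magnetization per unit field can be treated as chi and omits the temperature-independent term that practical fits sometimes include. The data in the example are synthetic values generated from the law itself, not measurements from this paper, and `fit_curie_weiss` is a hypothetical helper.

```python
def fit_curie_weiss(temps, chis):
    """Least-squares fit of 1/chi = (T - theta) / C; returns (C, theta)."""
    inv = [1.0 / x for x in chis]
    n = len(temps)
    mean_t = sum(temps) / n
    mean_y = sum(inv) / n
    sxx = sum((t - mean_t) ** 2 for t in temps)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, inv))
    slope = sxy / sxx                      # equals 1 / C
    intercept = mean_y - slope * mean_t    # equals -theta / C
    return 1.0 / slope, -intercept / slope


# Synthetic example: chi = 0.5 / (T + 20), i.e. C = 0.5 and theta = -20 (arbitrary units).
temps = [100.0 + 20.0 * i for i in range(11)]
chis = [0.5 / (t + 20.0) for t in temps]
C, theta = fit_curie_weiss(temps, chis)
```

Fitting 1/chi rather than chi keeps the problem linear; a negative fitted theta is conventionally read as antiferromagnetic-type interactions, and a positive one as ferromagnetic-type.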
-
Influence of the starting composition on the structural and superconducting properties of MgB2 phase
Authors:
Y. G. Zhao,
X. P. Zhang,
P. T. Qiao,
H. T. Zhang,
S. L. Jia,
B. S. Cao,
M. H. Zhu,
Z. H. Han,
X. L. Wang,
B. L. Gu
Abstract:
We report the preparation of Mg$_{1-x}$B$_{2}$ ($0 \le x \le 0.5$) compounds with the nominal compositions. Single-phase MgB$_{2}$ was obtained for the $x = 0$ sample. For $0 < x \le 0.5$, MgB$_{4}$ coexists with "MgB$_{2}$", and the amount of MgB$_{4}$ increases with $x$. With increasing $x$, the lattice parameter $c$ of "MgB$_{2}$" increases and the lattice parameter $a$ decreases; correspondingly, the T$_{c}$ of Mg$_{1-x}$B$_{2}$ decreases. The results are discussed in terms of the presence of Mg vacancies or B interstitials in the MgB$_{2}$ structure. This work is helpful for understanding MgB$_{2}$ films with different T$_{c}$, as well as the effect of Mg-site doping in MgB$_{2}$.
Submitted 3 May, 2001;
originally announced May 2001.
-
Effect of Li doping on structure and superconducting transition temperature of Mg1-xLixB2
Authors:
Y. G. Zhao,
X. P. Zhang,
P. T. Qiao,
H. T. Zhang,
S. L. Jia,
B. S. Cao,
M. H. Zhu,
Z. H. Han,
X. L. Wang,
B. L. Gu
Abstract:
We report the preparation of Mg1-xLixB2 compounds. Nearly single-phase samples were obtained for x < 0.3. The in-plane lattice parameter a decreases with Li doping, while the lattice parameter c shows no obvious change. The superconducting transition temperature of Mg1-xLixB2 decreases with Li doping, and superconductivity is lost for the x = 0.5 sample. Our results are consistent with the prediction of the hole superconductivity mechanism.
Submitted 2 March, 2001;
originally announced March 2001.