-
Scalable and Robust Mobile Activity Fingerprinting via Over-the-Air Control Channel in 5G Networks
Authors:
Gunwoo Yoon,
Byeongdo Hong
Abstract:
5G has undergone significant changes in its over-the-air control channel architecture compared to legacy networks, aimed at enhancing performance. These changes have unintentionally strengthened the security of control channels, reducing vulnerabilities in radio channels for attackers. However, based on our experimental results, less than 10% of Physical Downlink Control Channel (PDCCH) messages c…
▽ More
5G has undergone significant changes in its over-the-air control channel architecture compared to legacy networks, aimed at enhancing performance. These changes have unintentionally strengthened the security of control channels, reducing vulnerabilities in radio channels for attackers. However, based on our experimental results, less than 10% of Physical Downlink Control Channel (PDCCH) messages could be decoded using sniffers. We demonstrate that even with this limited data, cell scanning and targeted user mobile activity tracking are feasible. This privacy attack exposes the number of active communication channels and reveals the mobile applications and their usage time. We propose an efficient deep learning-based mobile traffic classification method that eliminates the need for manual feature extraction, enabling scalability across various applications while maintaining high performance even in scenarios with data loss. We evaluated the effectiveness of our approach using both an open-source testbed and a commercial 5G testbed, demonstrating the feasibility of mobile activity fingerprinting and targeted attacks. To the best of our knowledge, this is the first study to track mobile activity over-the-air using PDCCH messages.
△ Less
Submitted 19 September, 2024;
originally announced September 2024.
-
Neural Dynamics Model of Visual Decision-Making: Learning from Human Experts
Authors:
Jie Su,
Fang Cai,
Shu-Kuo Zhao,
Xin-Yi Wang,
Tian-Yi Qian,
Da-Hui Wang,
Bo Hong
Abstract:
Uncovering the fundamental neural correlates of biological intelligence, developing mathematical models, and conducting computational simulations are critical for advancing new paradigms in artificial intelligence (AI). In this study, we implemented a comprehensive visual decision-making model that spans from visual input to behavioral output, using a neural dynamics modeling approach. Drawing ins…
▽ More
Uncovering the fundamental neural correlates of biological intelligence, developing mathematical models, and conducting computational simulations are critical for advancing new paradigms in artificial intelligence (AI). In this study, we implemented a comprehensive visual decision-making model that spans from visual input to behavioral output, using a neural dynamics modeling approach. Drawing inspiration from the key components of the dorsal visual pathway in primates, our model not only aligns closely with human behavior but also reflects neural activities in primates, and achieving accuracy comparable to convolutional neural networks (CNNs). Moreover, magnetic resonance imaging (MRI) identified key neuroimaging features such as structural connections and functional connectivity that are associated with performance in perceptual decision-making tasks. A neuroimaging-informed fine-tuning approach was introduced and applied to the model, leading to performance improvements that paralleled the behavioral variations observed among subjects. Compared to classical deep learning models, our model more accurately replicates the behavioral performance of biological intelligence, relying on the structural characteristics of biological neural networks rather than extensive training data, and demonstrating enhanced resilience to perturbation.
△ Less
Submitted 3 September, 2024;
originally announced September 2024.
-
SafeEmbodAI: a Safety Framework for Mobile Robots in Embodied AI Systems
Authors:
Wenxiao Zhang,
Xiangrui Kong,
Thomas Braunl,
Jin B. Hong
Abstract:
Embodied AI systems, including AI-powered robots that autonomously interact with the physical world, stand to be significantly advanced by Large Language Models (LLMs), which enable robots to better understand complex language commands and perform advanced tasks with enhanced comprehension and adaptability, highlighting their potential to improve embodied AI capabilities. However, this advancement…
▽ More
Embodied AI systems, including AI-powered robots that autonomously interact with the physical world, stand to be significantly advanced by Large Language Models (LLMs), which enable robots to better understand complex language commands and perform advanced tasks with enhanced comprehension and adaptability, highlighting their potential to improve embodied AI capabilities. However, this advancement also introduces safety challenges, particularly in robotic navigation tasks. Improper safety management can lead to failures in complex environments and make the system vulnerable to malicious command injections, resulting in unsafe behaviours such as detours or collisions. To address these issues, we propose \textit{SafeEmbodAI}, a safety framework for integrating mobile robots into embodied AI systems. \textit{SafeEmbodAI} incorporates secure prompting, state management, and safety validation mechanisms to secure and assist LLMs in reasoning through multi-modal data and validating responses. We designed a metric to evaluate mission-oriented exploration, and evaluations in simulated environments demonstrate that our framework effectively mitigates threats from malicious commands and improves performance in various environment settings, ensuring the safety of embodied AI systems. Notably, In complex environments with mixed obstacles, our method demonstrates a significant performance increase of 267\% compared to the baseline in attack scenarios, highlighting its robustness in challenging conditions.
△ Less
Submitted 3 September, 2024;
originally announced September 2024.
-
Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning
Authors:
Keqin Li,
Jin Wang,
Xubo Wu,
Xirui Peng,
Runmian Chang,
Xiaoyu Deng,
Yiwen Kang,
Yue Yang,
Fanghao Ni,
Bo Hong
Abstract:
With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep learning and reinforcement learning technologies to enhance picking efficiency and accuracy while reducing system failure rates. Through empirical analysis, we demonstrate the effectiveness of these technologies…
▽ More
With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep learning and reinforcement learning technologies to enhance picking efficiency and accuracy while reducing system failure rates. Through empirical analysis, we demonstrate the effectiveness of these technologies in improving robot picking performance and adaptability to complex environments. The results show that the integrated machine learning model significantly outperforms traditional methods, effectively addressing the challenges of peak order processing, reducing operational errors, and improving overall logistics efficiency. Additionally, by analyzing environmental factors, this study further optimizes system design to ensure efficient and stable operation under variable conditions. This research not only provides innovative solutions for logistics automation but also offers a theoretical and empirical foundation for future technological development and application.
△ Less
Submitted 29 August, 2024;
originally announced August 2024.
-
A Study on Prompt Injection Attack Against LLM-Integrated Mobile Robotic Systems
Authors:
Wenxiao Zhang,
Xiangrui Kong,
Conan Dewitt,
Thomas Braunl,
Jin B. Hong
Abstract:
The integration of Large Language Models (LLMs) like GPT-4o into robotic systems represents a significant advancement in embodied artificial intelligence. These models can process multi-modal prompts, enabling them to generate more context-aware responses. However, this integration is not without challenges. One of the primary concerns is the potential security risks associated with using LLMs in…
▽ More
The integration of Large Language Models (LLMs) like GPT-4o into robotic systems represents a significant advancement in embodied artificial intelligence. These models can process multi-modal prompts, enabling them to generate more context-aware responses. However, this integration is not without challenges. One of the primary concerns is the potential security risks associated with using LLMs in robotic navigation tasks. These tasks require precise and reliable responses to ensure safe and effective operation. Multi-modal prompts, while enhancing the robot's understanding, also introduce complexities that can be exploited maliciously. For instance, adversarial inputs designed to mislead the model can lead to incorrect or dangerous navigational decisions. This study investigates the impact of prompt injections on mobile robot performance in LLM-integrated systems and explores secure prompt strategies to mitigate these risks. Our findings demonstrate a substantial overall improvement of approximately 30.8% in both attack detection and system performance with the implementation of robust defence mechanisms, highlighting their critical role in enhancing security and reliability in mission-oriented tasks.
△ Less
Submitted 8 September, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Exploiting Diffusion Prior for Out-of-Distribution Detection
Authors:
Armando Zhu,
Jiabei Liu,
Keqin Li,
Shuying Dai,
Bo Hong,
Peng Zhao,
Changsong Wei
Abstract:
Out-of-distribution (OOD) detection is crucial for deploying robust machine learning models, especially in areas where security is critical. However, traditional OOD detection methods often fail to capture complex data distributions from large scale date. In this paper, we present a novel approach for OOD detection that leverages the generative ability of diffusion models and the powerful feature…
▽ More
Out-of-distribution (OOD) detection is crucial for deploying robust machine learning models, especially in areas where security is critical. However, traditional OOD detection methods often fail to capture complex data distributions from large scale date. In this paper, we present a novel approach for OOD detection that leverages the generative ability of diffusion models and the powerful feature extraction capabilities of CLIP. By using these features as conditional inputs to a diffusion model, we can reconstruct the images after encoding them with CLIP. The difference between the original and reconstructed images is used as a signal for OOD identification. The practicality and scalability of our method is increased by the fact that it does not require class-specific labeled ID data, as is the case with many other methods. Extensive experiments on several benchmark datasets demonstrates the robustness and effectiveness of our method, which have significantly improved the detection accuracy.
△ Less
Submitted 21 August, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Authors:
Zhiheng Xi,
Yiwen Ding,
Wenxiang Chen,
Boyang Hong,
Honglin Guo,
Junzhe Wang,
Dingwen Yang,
Chenyang Liao,
Xin Guo,
Wei He,
Songyang Gao,
Lu Chen,
Rui Zheng,
Yicheng Zou,
Tao Gui,
Qi Zhang,
Xipeng Qiu,
Xuanjing Huang,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervis…
▽ More
Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervision, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, resulting in specialist agents with limited generalization. In this paper, we take the first step towards building generally-capable LLM-based agents with self-evolution ability. We identify a trinity of ingredients: 1) diverse environments for agent exploration and learning, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, a benchmark suite, and high-quality trajectories across environments. Next, we propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models. We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations. The AgentGym suite is available on https://github.com/WooooDyy/AgentGym.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification
Authors:
Armando Zhu,
Keqin Li,
Tong Wu,
Peng Zhao,
Bo Hong
Abstract:
With wearing masks becoming a new cultural norm, facial expression recognition (FER) while taking masks into account has become a significant challenge. In this paper, we propose a unified multi-branch vision transformer for facial expression recognition and mask wearing classification tasks. Our approach extracts shared features for both tasks using a dual-branch architecture that obtains multi-s…
▽ More
With wearing masks becoming a new cultural norm, facial expression recognition (FER) while taking masks into account has become a significant challenge. In this paper, we propose a unified multi-branch vision transformer for facial expression recognition and mask wearing classification tasks. Our approach extracts shared features for both tasks using a dual-branch architecture that obtains multi-scale feature representations. Furthermore, we propose a cross-task fusion phase that processes tokens for each task with separate branches, while exchanging information using a cross attention module. Our proposed framework reduces the overall complexity compared with using separate networks for both tasks by the simple yet effective cross-task fusion phase. Extensive experiments demonstrate that our proposed model performs better than or on par with different state-of-the-art methods on both facial expression recognition and facial mask wearing classification task.
△ Less
Submitted 30 April, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
The application of Augmented Reality (AR) in Remote Work and Education
Authors:
Keqin Li,
Peng Xirui,
Jintong Song,
Bo Hong,
Jin Wang
Abstract:
With the rapid advancement of technology, Augmented Reality (AR) technology, known for its ability to deeply integrate virtual information with the real world, is gradually transforming traditional work modes and teaching methods. Particularly in the realms of remote work and online education, AR technology demonstrates a broad spectrum of application prospects. This paper delves into the applicat…
▽ More
With the rapid advancement of technology, Augmented Reality (AR) technology, known for its ability to deeply integrate virtual information with the real world, is gradually transforming traditional work modes and teaching methods. Particularly in the realms of remote work and online education, AR technology demonstrates a broad spectrum of application prospects. This paper delves into the application potential and actual effects of AR technology in remote work and education. Through a systematic literature review, this study outlines the key features, advantages, and challenges of AR technology. Based on theoretical analysis, it discusses the scientific basis and technical support that AR technology provides for enhancing remote work efficiency and promoting innovation in educational teaching models. Additionally, by designing an empirical research plan and analyzing experimental data, this article reveals the specific performance and influencing factors of AR technology in practical applications. Finally, based on the results of the experiments, this research summarizes the application value of AR technology in remote work and education, looks forward to its future development trends, and proposes forward-looking research directions and strategic suggestions, offering empirical foundation and theoretical guidance for further promoting the in-depth application of AR technology in related fields.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
WorDepth: Variational Language Prior for Monocular Depth Estimation
Authors:
Ziyao Zeng,
Daniel Wang,
Fengyu Yang,
Hyoungseob Park,
Yangchao Wu,
Stefano Soatto,
Byung-Woo Hong,
Dong Lao,
Alex Wong
Abstract:
Three-dimensional (3D) reconstruction from a single image is an ill-posed problem with inherent ambiguities, i.e. scale. Predicting a 3D scene from text description(s) is similarly ill-posed, i.e. spatial arrangements of objects described. We investigate the question of whether two inherently ambiguous modalities can be used in conjunction to produce metric-scaled reconstructions. To test this, we…
▽ More
Three-dimensional (3D) reconstruction from a single image is an ill-posed problem with inherent ambiguities, i.e. scale. Predicting a 3D scene from text description(s) is similarly ill-posed, i.e. spatial arrangements of objects described. We investigate the question of whether two inherently ambiguous modalities can be used in conjunction to produce metric-scaled reconstructions. To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene. To this end, we begin by encoding the text caption as a mean and standard deviation; using a variational framework, we learn the distribution of the plausible metric reconstructions of 3D scenes corresponding to the text captions as a prior. To "select" a specific reconstruction or depth map, we encode the given image through a conditional sampler that samples from the latent space of the variational text encoder, which is then decoded to the output depth map. Our approach is trained alternatingly between the text and image branches: in one optimization step, we predict the mean and standard deviation from the text description and sample from a standard Gaussian, and in the other, we sample using a (image) conditional sampler. Once trained, we directly predict depth from the encoded text using the conditional sampler. We demonstrate our approach on indoor (NYUv2) and outdoor (KITTI) scenarios, where we show that language can consistently improve performance in both.
△ Less
Submitted 2 June, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…
▽ More
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
△ Less
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice
Authors:
Jake Hesford,
Daniel Cheng,
Alan Wan,
Larry Huynh,
Seungho Kim,
Hyoungshick Kim,
Jin B. Hong
Abstract:
Our paper provides empirical comparisons between recent IDSs to provide an objective comparison between them to help users choose the most appropriate solution based on their requirements. Our results show that no one solution is the best, but is dependent on external variables such as the types of attacks, complexity, and network environment in the dataset. For example, BoT_IoT and Stratosphere I…
▽ More
Our paper provides empirical comparisons between recent IDSs to provide an objective comparison between them to help users choose the most appropriate solution based on their requirements. Our results show that no one solution is the best, but is dependent on external variables such as the types of attacks, complexity, and network environment in the dataset. For example, BoT_IoT and Stratosphere IoT datasets both capture IoT-related attacks, but the deep neural network performed the best when tested using the BoT_IoT dataset while HELAD performed the best when tested using the Stratosphere IoT dataset. So although we found that a deep neural network solution had the highest average F1 scores on tested datasets, it is not always the best-performing one. We further discuss difficulties in using IDS from literature and project repositories, which complicated drawing definitive conclusions regarding IDS selection.
△ Less
Submitted 28 March, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Image Captioning in news report scenario
Authors:
Tianrui Liu,
Qi Cai,
Changxin Xu,
Bo Hong,
Jize Xiong,
Yuxin Qiao,
Tsungwei Yang
Abstract:
Image captioning strives to generate pertinent captions for specified images, situating itself at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This endeavor is of paramount importance with far-reaching applications in recommendation systems, news outlets, social media, and beyond. Particularly within the realm of news reporting, captions are expected to encompass d…
▽ More
Image captioning strives to generate pertinent captions for specified images, situating itself at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This endeavor is of paramount importance with far-reaching applications in recommendation systems, news outlets, social media, and beyond. Particularly within the realm of news reporting, captions are expected to encompass detailed information, such as the identities of celebrities captured in the images. However, much of the existing body of work primarily centers around understanding scenes and actions. In this paper, we explore the realm of image captioning specifically tailored for celebrity photographs, illustrating its broad potential for enhancing news industry practices. This exploration aims to augment automated news content generation, thereby facilitating a more nuanced dissemination of information. Our endeavor shows a broader horizon, enriching the narrative in news reporting through a more intuitive image captioning framework.
△ Less
Submitted 1 April, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Rumor Detection with a novel graph neural network approach
Authors:
Tianrui Liu,
Qi Cai,
Changxin Xu,
Bo Hong,
Fanghao Ni,
Yuxin Qiao,
Tsungwei Yang
Abstract:
The wide spread of rumors on social media has caused a negative impact on people's daily life, leading to potential panic, fear, and mental health problems for the public. How to debunk rumors as early as possible remains a challenging problem. Existing studies mainly leverage information propagation structure to detect rumors, while very few works focus on correlation among users that they may co…
▽ More
The wide spread of rumors on social media has caused a negative impact on people's daily life, leading to potential panic, fear, and mental health problems for the public. How to debunk rumors as early as possible remains a challenging problem. Existing studies mainly leverage information propagation structure to detect rumors, while very few works focus on correlation among users that they may coordinate to spread rumors in order to gain large popularity. In this paper, we propose a new detection model, that jointly learns both the representations of user correlation and information propagation to detect rumors on social media. Specifically, we leverage graph neural networks to learn the representations of user correlation from a bipartite graph that describes the correlations between users and source tweets, and the representations of information propagation with a tree structure. Then we combine the learned representations from these two modules to classify the rumors. Since malicious users intend to subvert our model after deployment, we further develop a greedy attack scheme to analyze the cost of three adversarial attacks: graph attack, comment attack, and joint attack. Evaluation results on two public datasets illustrate that the proposed MODEL outperforms the state-of-the-art rumor detection models. We also demonstrate our method performs well for early rumor detection. Moreover, the proposed detection method is more robust to adversarial attacks compared to the best existing method. Importantly, we show that it requires a high cost for attackers to subvert user correlation pattern, demonstrating the importance of considering user correlation for rumor detection.
△ Less
Submitted 1 April, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Retrosynthesis Prediction via Search in (Hyper) Graph
Authors:
Zixun Lan,
Binjie Hong,
Jiajun Zhu,
Zuo Zeng,
Zhenfu Liu,
Limin Yu,
Fei Ma
Abstract:
Predicting reactants from a specified core product stands as a fundamental challenge within organic synthesis, termed retrosynthesis prediction. Recently, semi-template-based methods and graph-edits-based methods have achieved good performance in terms of both interpretability and accuracy. However, due to their mechanisms these methods cannot predict complex reactions, e.g., reactions with multip…
▽ More
Predicting reactants from a specified core product stands as a fundamental challenge within organic synthesis, termed retrosynthesis prediction. Recently, semi-template-based methods and graph-edits-based methods have achieved good performance in terms of both interpretability and accuracy. However, due to their mechanisms these methods cannot predict complex reactions, e.g., reactions with multiple reaction center or attaching the same leaving group to more than one atom. In this study we propose a semi-template-based method, the \textbf{Retro}synthesis via \textbf{S}earch \textbf{i}n (Hyper) \textbf{G}raph (RetroSiG) framework to alleviate these limitations. In the proposed method, we turn the reaction center identification and the leaving group completion tasks as tasks of searching in the product molecular graph and leaving group hypergraph respectively. As a semi-template-based method RetroSiG has several advantages. First, RetroSiG is able to handle the complex reactions mentioned above by its novel search mechanism. Second, RetroSiG naturally exploits the hypergraph to model the implicit dependencies between leaving groups. Third, RetroSiG makes full use of the prior, i.e., one-hop constraint. It reduces the search space and enhances overall performance. Comprehensive experiments demonstrated that RetroSiG achieved competitive results. Furthermore, we conducted experiments to show the capability of RetroSiG in predicting complex reactions. Ablation experiments verified the efficacy of specific elements, such as the one-hop constraint and the leaving group hypergraph.
△ Less
Submitted 9 February, 2024;
originally announced February 2024.
-
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Authors:
Zhiheng Xi,
Wenxiang Chen,
Boyang Hong,
Senjie Jin,
Rui Zheng,
Wei He,
Yiwen Ding,
Shichun Liu,
Xin Guo,
Junzhe Wang,
Honglin Guo,
Wei Shen,
Xiaoran Fan,
Yuhao Zhou,
Shihan Dou,
Xiao Wang,
Xinbo Zhang,
Peng Sun,
Tao Gui,
Qi Zhang,
Xuanjing Huang
Abstract:
In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for o…
▽ More
In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for optimization. Outcome supervision provides sparse rewards for final results without identifying error locations, whereas process supervision offers step-wise rewards but requires extensive manual annotation. R$^3$ overcomes these limitations by learning from correct demonstrations. Specifically, R$^3$ progressively slides the start state of reasoning from a demonstration's end to its beginning, facilitating easier model exploration at all stages. Thus, R$^3$ establishes a step-wise curriculum, allowing outcome supervision to offer step-level signals and precisely pinpoint errors. Using Llama2-7B, our method surpasses RL baseline on eight reasoning tasks by $4.1$ points on average. Notebaly, in program-based reasoning on GSM8K, it exceeds the baseline by $4.2$ points across three backbone models, and without any extra data, Codellama-7B + R$^3$ performs comparable to larger models or closed-source models.
△ Less
Submitted 17 March, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
MouSi: Poly-Visual-Expert Vision-Language Models
Authors:
Xiaoran Fan,
Tao Ji,
Changhao Jiang,
Shuo Li,
Senjie Jin,
Sirui Song,
Junke Wang,
Boyang Hong,
Lu Chen,
Guodong Zheng,
Ming Zhang,
Caishuang Huang,
Rui Zheng,
Zhiheng Xi,
Yuhao Zhou,
Shihan Dou,
Junjie Ye,
Hang Yan,
Tao Gui,
Qi Zhang,
Xipeng Qiu,
Xuanjing Huang,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability…
▽ More
Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability of VLMs. This paper proposes the use of ensemble experts technique to synergizes the capabilities of individual visual encoders, including those skilled in image-text matching, OCR, image segmentation, etc. This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs. In addition, we explore different positional encoding schemes to alleviate the waste of positional encoding caused by lengthy image feature sequences, effectively addressing the issue of position overflow and length limitations. For instance, in our implementation, this technique significantly reduces the positional occupancy in models like SAM, from a substantial 4096 to a more efficient and manageable 64 or even down to 1. Experimental results demonstrate that VLMs with multiple experts exhibit consistently superior performance over isolated visual encoders and mark a significant performance boost as more experts are integrated. We have open-sourced the training code used in this report. All of these resources can be found on our project website.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Applying Bayesian Ridge Regression AI Modeling in Virus Severity Prediction
Authors:
Jai Pal,
Bryan Hong
Abstract:
Artificial intelligence (AI) is a powerful tool for reshaping healthcare systems. In healthcare, AI is invaluable for its capacity to manage vast amounts of data, which can lead to more accurate and speedy diagnoses, ultimately easing the workload on healthcare professionals. As a result, AI has proven itself to be a power tool across various industries, simplifying complex tasks and pattern recog…
▽ More
Artificial intelligence (AI) is a powerful tool for reshaping healthcare systems. In healthcare, AI is invaluable for its capacity to manage vast amounts of data, which can lead to more accurate and speedy diagnoses, ultimately easing the workload on healthcare professionals. As a result, AI has proven itself to be a power tool across various industries, simplifying complex tasks and pattern recognition that would otherwise be overwhelming for humans or traditional computer algorithms. In this paper, we review the strengths and weaknesses of Bayesian Ridge Regression, an AI model that can be used to bring cutting edge virus analysis to healthcare professionals around the world. The model's accuracy assessment revealed promising results, with room for improvement primarily related to data organization. In addition, the severity index serves as a valuable tool to gain a broad overview of patient care needs, aligning with healthcare professionals' preference for broader categorizations.
△ Less
Submitted 4 December, 2023; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Generalized Gain and Impedance Expressions for Single-Transistor Amplifiers
Authors:
Brian Hong
Abstract:
This expository manuscript presents generalized expressions for the low-frequency voltage gain and terminal impedances of each of the three fundamental bipolar-amplifier topologies (i.e., common emitter, common base, and common collector). Unlike the formulas that students typically learn and designers typically use, the equations presented in this tutorial assume the most general set of condition…
▽ More
This expository manuscript presents generalized expressions for the low-frequency voltage gain and terminal impedances of each of the three fundamental bipolar-amplifier topologies (i.e., common emitter, common base, and common collector). Unlike the formulas that students typically learn and designers typically use, the equations presented in this tutorial assume the most general set of conditions: finite output resistance and base-collector current gain, a load resistor at each non-input terminal of the transistor, and a "feedback" resistor between the base and collector terminals. Although perhaps algebraically complex at first glance, emphasis is placed on mathematical elegance and ease of use -- expressions are formulated in terms of sub-terms that capture important aspects of the circuit's behavior. Similarities in the mathematical structure of the results reveal a deeper conceptual connection between different amplifier topologies and, ultimately, a reciprocity relationship between the base and emitter terminals. Familiar approximate expressions are subsumed as special cases. Tables consolidating the expressions in an organized fashion are provided. Companion results for metal-oxide-semiconductor (MOS) single-transistor amplifiers are also included.
△ Less
Submitted 17 September, 2023;
originally announced September 2023.
-
The Rise and Potential of Large Language Model Based Agents: A Survey
Authors:
Zhiheng Xi,
Wenxiang Chen,
Xin Guo,
Wei He,
Yiwen Ding,
Boyang Hong,
Ming Zhang,
Junzhe Wang,
Senjie Jin,
Enyu Zhou,
Rui Zheng,
Xiaoran Fan,
Xiao Wang,
Limao Xiong,
Yuhao Zhou,
Weiran Wang,
Changhao Jiang,
Yicheng Zou,
Xiangyang Liu,
Zhangyue Yin,
Shihan Dou,
Rongxiang Weng,
Wensen Cheng,
Qi Zhang,
Wenjuan Qin
, et al. (4 additional authors not shown)
Abstract:
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training stra…
▽ More
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository for the related papers at https://github.com/WooooDyy/LLM-Agent-Paper-List.
△ Less
Submitted 19 September, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Use neural networks to recognize students' handwritten letters and incorrect symbols
Authors:
JiaJun Zhu,
Zichuan Yang,
Binjie Hong,
Jiacheng Song,
Jiwei Wang,
Tianhao Chen,
Shuilan Yang,
Zixun Lan,
Fei Ma
Abstract:
Correcting students' multiple-choice answers is a repetitive and mechanical task that can be considered an image multi-classification task. Assuming possible options are 'abcd' and the correct option is one of the four, some students may write incorrect symbols or options that do not exist. In this paper, five classifications were set up - four for possible correct options and one for other incorr…
▽ More
Correcting students' multiple-choice answers is a repetitive and mechanical task that can be considered an image multi-classification task. Assuming possible options are 'abcd' and the correct option is one of the four, some students may write incorrect symbols or options that do not exist. In this paper, five classifications were set up - four for possible correct options and one for other incorrect writing. This approach takes into account the possibility of non-standard writing options.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
RCsearcher: Reaction Center Identification in Retrosynthesis via Deep Q-Learning
Authors:
Zixun Lan,
Zuo Zeng,
Binjie Hong,
Zhenfu Liu,
Fei Ma
Abstract:
The reaction center consists of atoms in the product whose local properties are not identical to the corresponding atoms in the reactants. Prior studies on reaction center identification are mainly on semi-templated retrosynthesis methods. Moreover, they are limited to single reaction center identification. However, many reaction centers are comprised of multiple bonds or atoms in reality. We refe…
▽ More
The reaction center consists of atoms in the product whose local properties are not identical to the corresponding atoms in the reactants. Prior studies on reaction center identification are mainly on semi-templated retrosynthesis methods. Moreover, they are limited to single reaction center identification. However, many reaction centers are comprised of multiple bonds or atoms in reality. We refer to it as the multiple reaction center. This paper presents RCsearcher, a unified framework for single and multiple reaction center identification that combines the advantages of the graph neural network and deep reinforcement learning. The critical insight in this framework is that the single or multiple reaction center must be a node-induced subgraph of the molecular product graph. At each step, it considers choosing one node in the molecular product graph and adding it to the explored node-induced subgraph as an action. Comprehensive experiments demonstrate that RCsearcher consistently outperforms other baselines and can extrapolate the reaction center patterns that have not appeared in the training set. Ablation experiments verify the effectiveness of individual components, including the beam search and one-hop constraint of action space.
△ Less
Submitted 27 January, 2023;
originally announced January 2023.
-
Unsupervised Cardiac Segmentation Utilizing Synthesized Images from Anatomical Labels
Authors:
Sihan Wang,
Fuping Wu,
Lei Li,
Zheyao Gao,
Byung-Woo Hong,
Xiahai Zhuang
Abstract:
Cardiac segmentation is in great demand for clinical practice. Due to the enormous labor of manual delineation, unsupervised segmentation is desired. The ill-posed optimization problem of this task is inherently challenging, requiring well-designed constraints. In this work, we propose an unsupervised framework for multi-class segmentation with both intensity and shape constraints. Firstly, we ext…
▽ More
Cardiac segmentation is in great demand for clinical practice. Due to the enormous labor of manual delineation, unsupervised segmentation is desired. The ill-posed optimization problem of this task is inherently challenging, requiring well-designed constraints. In this work, we propose an unsupervised framework for multi-class segmentation with both intensity and shape constraints. Firstly, we extend a conventional non-convex energy function as an intensity constraint and implement it with U-Net. For shape constraint, synthetic images are generated from anatomical labels via image-to-image translation, as shape supervision for the segmentation network. Moreover, augmentation invariance is applied to facilitate the segmentation network to learn the latent features in terms of shape. We evaluated the proposed framework using the public datasets from MICCAI2019 MSCMR Challenge and achieved promising results on cardiac MRIs with Dice scores of 0.5737, 0.7796, and 0.6287 in Myo, LV, and RV, respectively.
△ Less
Submitted 15 January, 2023;
originally announced January 2023.
-
More Interpretable Graph Similarity Computation via Maximum Common Subgraph Inference
Authors:
Zixun Lan,
Binjie Hong,
Ye Ma,
Fei Ma
Abstract:
Graph similarity measurement, which computes the distance/similarity between two graphs, arises in various graph-related tasks. Recent learning-based methods lack interpretability, as they directly transform interaction information between two graphs into one hidden vector and then map it to similarity. To cope with this problem, this study proposes a more interpretable end-to-end paradigm for gra…
▽ More
Graph similarity measurement, which computes the distance/similarity between two graphs, arises in various graph-related tasks. Recent learning-based methods lack interpretability, as they directly transform interaction information between two graphs into one hidden vector and then map it to similarity. To cope with this problem, this study proposes a more interpretable end-to-end paradigm for graph similarity learning, named Similarity Computation via Maximum Common Subgraph Inference (INFMCS). Our critical insight into INFMCS is the strong correlation between similarity score and Maximum Common Subgraph (MCS). We implicitly infer MCS to obtain the normalized MCS size, with the supervision information being only the similarity score during training. To capture more global information, we also stack some vanilla transformer encoder layers with graph convolution layers and propose a novel permutation-invariant node Positional Encoding. The entire model is quite simple yet effective. Comprehensive experiments demonstrate that INFMCS consistently outperforms state-of-the-art baselines for graph-graph classification and regression tasks. Ablation experiments verify the effectiveness of the proposed computation paradigm and other components. Also, visualization and statistics of results reveal the interpretability of INFMCS.
△ Less
Submitted 16 September, 2022; v1 submitted 9 August, 2022;
originally announced August 2022.
-
U-Net with ResNet Backbone for Garment Landmarking Purpose
Authors:
Khay Boon Hong
Abstract:
We build a heatmap-based landmark detection model to locate important landmarks on 2D RGB garment images. The main goal is to detect edges, corners and suitable interior region of the garments. This let us re-create 3D garments in modern 3D editing software by incorporate landmark detection model and texture unwrapping. We use a U-net architecture with ResNet backbone to build the model. With an a…
▽ More
We build a heatmap-based landmark detection model to locate important landmarks on 2D RGB garment images. The main goal is to detect edges, corners and suitable interior region of the garments. This let us re-create 3D garments in modern 3D editing software by incorporate landmark detection model and texture unwrapping. We use a U-net architecture with ResNet backbone to build the model. With an appropriate loss function, we are able to train a moderately robust model.
△ Less
Submitted 26 April, 2022;
originally announced April 2022.
-
Monitored Distillation for Positive Congruent Depth Completion
Authors:
Tian Yu Liu,
Parth Agrawal,
Allison Chen,
Byung-Woo Hong,
Alex Wong
Abstract:
We propose a method to infer a dense depth map from a single image, its calibration, and the associated sparse point cloud. In order to leverage existing models (teachers) that produce putative depth maps, we propose an adaptive knowledge distillation approach that yields a positive congruent training process, wherein a student model avoids learning the error modes of the teachers. In the absence…
▽ More
We propose a method to infer a dense depth map from a single image, its calibration, and the associated sparse point cloud. In order to leverage existing models (teachers) that produce putative depth maps, we propose an adaptive knowledge distillation approach that yields a positive congruent training process, wherein a student model avoids learning the error modes of the teachers. In the absence of ground truth for model selection and training, our method, termed Monitored Distillation, allows a student to exploit a blind ensemble of teachers by selectively learning from predictions that best minimize the reconstruction error for a given image. Monitored Distillation yields a distilled depth map and a confidence map, or ``monitor'', for how well a prediction from a particular teacher fits the observed image. The monitor adaptively weights the distilled depth where if all of the teachers exhibit high residuals, the standard unsupervised image reconstruction loss takes over as the supervisory signal. On indoor scenes (VOID), we outperform blind ensembling baselines by 17.53% and unsupervised methods by 24.25%; we boast a 79% model size reduction while maintaining comparable performance to the best supervised method. For outdoors (KITTI), we tie for 5th overall on the benchmark despite not using ground truth. Code available at: https://github.com/alexklwong/mondi-python.
△ Less
Submitted 25 October, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Small Lesion Segmentation in Brain MRIs with Subpixel Embedding
Authors:
Alex Wong,
Allison Chen,
Yangchao Wu,
Safa Cicek,
Alexandre Tiard,
Byung-Woo Hong,
Stefano Soatto
Abstract:
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues. We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network. Our embedding network learns features that can resolve detailed structures in the brain without the need for high-resolution training imag…
▽ More
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues. We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network. Our embedding network learns features that can resolve detailed structures in the brain without the need for high-resolution training images, which are often unavailable and expensive to acquire. Alternatively, the encoder-decoder learns global structures by means of striding and max pooling. Our embedding network complements the encoder-decoder architecture by guiding the decoder with fine-grained details lost to spatial downsampling during the encoder stage. Unlike previous works, our decoder outputs at 2 times the input resolution, where a single pixel in the input resolution is predicted by four neighboring subpixels in our output. To obtain the output at the original scale, we propose a learnable downsampler (as opposed to hand-crafted ones e.g. bilinear) that combines subpixel predictions. Our approach improves the baseline architecture by approximately 11.7% and achieves the state of the art on the ATLAS public benchmark dataset with a smaller memory footprint and faster runtime than the best competing method. Our source code has been made available at: https://github.com/alexklwong/subpixel-embedding-segmentation.
△ Less
Submitted 17 September, 2021;
originally announced September 2021.
-
Optimal Approximation with Sparse Neural Networks and Applications
Authors:
Khay Boon Hong
Abstract:
We use deep sparsely connected neural networks to measure the complexity of a function class in $L^2(\mathbb R^d)$ by restricting connectivity and memory requirement for storing the neural networks. We also introduce representation system - a countable collection of functions to guide neural networks, since approximation theory with representation system has been well developed in Mathematics. We…
▽ More
We use deep sparsely connected neural networks to measure the complexity of a function class in $L^2(\mathbb R^d)$ by restricting connectivity and memory requirement for storing the neural networks. We also introduce representation system - a countable collection of functions to guide neural networks, since approximation theory with representation system has been well developed in Mathematics. We then prove the fundamental bound theorem, implying a quantity intrinsic to the function class itself can give information about the approximation ability of neural networks and representation system. We also provides a method for transferring existing theories about approximation by representation systems to that of neural networks, greatly amplifying the practical values of neural networks. Finally, we use neural networks to approximate B-spline functions, which are used to generate the B-spline curves. Then, we analyse the complexity of a class called $β$ cartoon-like functions using rate-distortion theory and wedgelets construction.
△ Less
Submitted 14 August, 2021;
originally announced August 2021.
-
Instance-based Vision Transformer for Subtyping of Papillary Renal Cell Carcinoma in Histopathological Image
Authors:
Zeyu Gao,
Bangyang Hong,
Xianli Zhang,
Yang Li,
Chang Jia,
Jialun Wu,
Chunbao Wang,
Deyu Meng,
Chen Li
Abstract:
Histological subtype of papillary (p) renal cell carcinoma (RCC), type 1 vs. type 2, is an essential prognostic factor. The two subtypes of pRCC have a similar pattern, i.e., the papillary architecture, yet some subtle differences, including cellular and cell-layer level patterns. However, the cellular and cell-layer level patterns almost cannot be captured by existing CNN-based models in large-si…
▽ More
Histological subtype of papillary (p) renal cell carcinoma (RCC), type 1 vs. type 2, is an essential prognostic factor. The two subtypes of pRCC have a similar pattern, i.e., the papillary architecture, yet some subtle differences, including cellular and cell-layer level patterns. However, the cellular and cell-layer level patterns almost cannot be captured by existing CNN-based models in large-size histopathological images, which brings obstacles to directly applying these models to such a fine-grained classification task. This paper proposes a novel instance-based Vision Transformer (i-ViT) to learn robust representations of histopathological images for the pRCC subtyping task by extracting finer features from instance patches (by cropping around segmented nuclei and assigning predicted grades). The proposed i-ViT takes top-K instances as input and aggregates them for capturing both the cellular and cell-layer level patterns by a position-embedding layer, a grade-embedding layer, and a multi-head multi-layer self-attention module. To evaluate the performance of the proposed framework, experienced pathologists are invited to selected 1162 regions of interest from 171 whole slide images of type 1 and type 2 pRCC. Experimental results show that the proposed method achieves better performance than existing CNN-based models with a significant margin.
△ Less
Submitted 23 June, 2021;
originally announced June 2021.
-
An Adaptive Framework for Learning Unsupervised Depth Completion
Authors:
Alex Wong,
Xiaohan Fei,
Byung-Woo Hong,
Stefano Soatto
Abstract:
We present a method to infer a dense depth map from a color image and associated sparse depth measurements. Our main contribution lies in the design of an annealing process for determining co-visibility (occlusions, disocclusions) and the degree of regularization to impose on the model. We show that regularization and co-visibility are related via the fitness (residual) of model to data and both c…
▽ More
We present a method to infer a dense depth map from a color image and associated sparse depth measurements. Our main contribution lies in the design of an annealing process for determining co-visibility (occlusions, disocclusions) and the degree of regularization to impose on the model. We show that regularization and co-visibility are related via the fitness (residual) of model to data and both can be unified into a single framework to improve the learning process. Our method is an adaptive weighting scheme that guides optimization by measuring the residual at each pixel location over each training step for (i) estimating a soft visibility mask and (ii) determining the amount of regularization. We demonstrate the effectiveness our method by applying it to several recent unsupervised depth completion methods and improving their performance on public benchmark datasets, without incurring additional trainable parameters or increase in inference time. Code available at: https://github.com/alexklwong/adaframe-depth-completion.
△ Less
Submitted 24 August, 2021; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Model-based Cybersecurity Analysis: Past Work and Future Directions
Authors:
Simon Yusuf Enoch,
Mengmeng Ge,
Jin B. Hong,
Dong Seong Kim
Abstract:
Model-based evaluation in cybersecurity has a long history. Attack Graphs (AGs) and Attack Trees (ATs) were the earlier developed graphical security models for cybersecurity analysis. However, they have limitations (e.g., scalability problem, state-space explosion problem, etc.) and lack the ability to capture other security features (e.g., countermeasures). To address the limitations and to cope…
▽ More
Model-based evaluation in cybersecurity has a long history. Attack Graphs (AGs) and Attack Trees (ATs) were the earlier developed graphical security models for cybersecurity analysis. However, they have limitations (e.g., scalability problem, state-space explosion problem, etc.) and lack the ability to capture other security features (e.g., countermeasures). To address the limitations and to cope with various security features, a graphical security model named attack countermeasure tree (ACT) was developed to perform security analysis by taking into account both attacks and countermeasures. In our research, we have developed different variants of a hierarchical graphical security model to solve the complexity, dynamicity, and scalability issues involved with security models in the security analysis of systems. In this paper, we summarize and classify security models into the following; graph-based, tree-based, and hybrid security models. We discuss the development of a hierarchical attack representation model (HARM) and different variants of the HARM, its applications, and usability in a variety of domains including the Internet of Things (IoT), Cloud, Software-Defined Networking, and Moving Target Defenses. We provide the classification of the security metrics, including their discussions. Finally, we highlight existing problems and suggest future research directions in the area of graphical security models and applications. As a result of this work, a decision-maker can understand which type of HARM will suit their network or security analysis requirements.
△ Less
Submitted 24 May, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Generative Adversarial Networks via a Composite Annealing of Noise and Diffusion
Authors:
Kensuke Nakamura,
Simon Korman,
Byung-Woo Hong
Abstract:
Generative adversarial network (GAN) is a framework for generating fake data using a set of real examples. However, GAN is unstable in the training stage. In order to stabilize GANs, the noise injection has been used to enlarge the overlap of the real and fake distributions at the cost of increasing variance. The diffusion (or smoothing) may reduce the intrinsic underlying dimensionality of data b…
▽ More
Generative adversarial network (GAN) is a framework for generating fake data using a set of real examples. However, GAN is unstable in the training stage. In order to stabilize GANs, the noise injection has been used to enlarge the overlap of the real and fake distributions at the cost of increasing variance. The diffusion (or smoothing) may reduce the intrinsic underlying dimensionality of data but it suppresses the capability of GANs to learn high-frequency information in the training procedure. Based on these observations, we propose a data representation for the GAN training, called noisy scale-space (NSS), that recursively applies the smoothing with a balanced noise to data in order to replace the high-frequency information by random data, leading to a coarse-to-fine training of GANs. We experiment with NSS using DCGAN and StyleGAN2 based on benchmark datasets in which the NSS-based GANs outperforms the state-of-the-arts in most cases.
△ Less
Submitted 31 July, 2022; v1 submitted 1 May, 2021;
originally announced May 2021.
-
Regularization in network optimization via trimmed stochastic gradient descent with noisy label
Authors:
Kensuke Nakamura,
Bong-Soo Sohn,
Kyoung-Jae Won,
Byung-Woo Hong
Abstract:
Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks. The label noise provides a strong implicit regularization by replacing the target ground truth labels of training examples by uniform random labels. However, it can cause undesirable misleading gradients due to the large loss associated with inco…
▽ More
Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks. The label noise provides a strong implicit regularization by replacing the target ground truth labels of training examples by uniform random labels. However, it can cause undesirable misleading gradients due to the large loss associated with incorrect labels. We propose a first-order optimization method (Label-Noised Trim-SGD) that uses the label noise with the example trimming in order to remove the outliers based on the loss. The proposed algorithm is simple yet enables us to impose a large label-noise and obtain a better regularization effect than the original methods. The quantitative analysis is performed by comparing the behavior of the label noise, the example trimming, and the proposed algorithm. We also present empirical results that demonstrate the effectiveness of our algorithm using the major benchmarks and the fundamental networks, where our method has successfully outperformed the state-of-the-art optimization methods.
△ Less
Submitted 2 May, 2022; v1 submitted 20 December, 2020;
originally announced December 2020.
-
An Empirical Evaluation of Bluetooth-based Decentralized Contact Tracing in Crowds
Authors:
Hsu-Chun Hsiao,
Chun-Ying Huang,
Shin-Ming Cheng,
Bing-Kai Hong,
Hsin-Yuan Hu,
Chia-Chien Wu,
Jian-Sin Lee,
Shih-Hong Wang,
Wei Jeng
Abstract:
Digital contact tracing is being used by many countries to help contain COVID-19's spread in a post-lockdown world. Among the various available techniques, decentralized contact tracing that uses Bluetooth received signal strength indication (RSSI) to detect proximity is considered less of a privacy risk than approaches that rely on collecting absolute locations via GPS, cellular-tower history, or…
▽ More
Digital contact tracing is being used by many countries to help contain COVID-19's spread in a post-lockdown world. Among the various available techniques, decentralized contact tracing that uses Bluetooth received signal strength indication (RSSI) to detect proximity is considered less of a privacy risk than approaches that rely on collecting absolute locations via GPS, cellular-tower history, or QR-code scanning. As of October 2020, there have been millions of downloads of such Bluetooth-based contract-tracing apps, as more and more countries officially adopt them. However, the effectiveness of these apps in the real world remains unclear due to a lack of empirical research that includes realistic crowd sizes and densities. This study aims to fill that gap, by empirically investigating the effectiveness of Bluetooth-based contact tracing in crowd environments with a total of 80 participants, emulating classrooms, moving lines, and other types of real-world gatherings. The results confirm that Bluetooth RSSI is unreliable for detecting proximity, and that this inaccuracy worsens in environments that are especially crowded. In other words, this technique may be least useful when it is most in need, and that it is fragile when confronted by low-cost jamming. Moreover, technical problems such as high energy consumption and phone overheating caused by the contact-tracing app were found to negatively influence users' willingness to adopt it. On the bright side, however, Bluetooth RSSI may still be useful for detecting coarse-grained contact events, for example, proximity of up to 20m lasting for an hour. Based on our findings, we recommend that existing contact-tracing apps can be re-purposed to focus on coarse-grained proximity detection, and that future ones calibrate distance estimates and adjust broadcast frequencies based on auxiliary information.
△ Less
Submitted 4 November, 2021; v1 submitted 9 November, 2020;
originally announced November 2020.
-
Apparel-invariant Feature Learning for Apparel-changed Person Re-identification
Authors:
Zhengxu Yu,
Yilun Zhao,
Bin Hong,
Zhongming Jin,
Jianqiang Huang,
Deng Cai,
Xiaofei He,
Xian-Sheng Hua
Abstract:
With the rise of deep learning methods, person Re-Identification (ReID) performance has been improved tremendously in many public datasets. However, most public ReID datasets are collected in a short time window in which persons' appearance rarely changes. In real-world applications such as in a shopping mall, the same person's clothing may change, and different persons may wearing similar clothes…
▽ More
With the rise of deep learning methods, person Re-Identification (ReID) performance has been improved tremendously in many public datasets. However, most public ReID datasets are collected in a short time window in which persons' appearance rarely changes. In real-world applications such as in a shopping mall, the same person's clothing may change, and different persons may wearing similar clothes. All these cases can result in an inconsistent ReID performance, revealing a critical problem that current ReID models heavily rely on person's apparels. Therefore, it is critical to learn an apparel-invariant person representation under cases like cloth changing or several persons wearing similar clothes. In this work, we tackle this problem from the viewpoint of invariant feature representation learning. The main contributions of this work are as follows. (1) We propose the semi-supervised Apparel-invariant Feature Learning (AIFL) framework to learn an apparel-invariant pedestrian representation using images of the same person wearing different clothes. (2) To obtain images of the same person wearing different clothes, we propose an unsupervised apparel-simulation GAN (AS-GAN) to synthesize cloth changing images according to the target cloth embedding. It's worth noting that the images used in ReID tasks were cropped from real-world low-quality CCTV videos, making it more challenging to synthesize cloth changing images. We conduct extensive experiments on several datasets comparing with several baselines. Experimental results demonstrate that our proposal can improve the ReID performance of the baseline models.
△ Less
Submitted 16 August, 2020; v1 submitted 13 August, 2020;
originally announced August 2020.
-
Exposure Density and Neighborhood Disparities in COVID-19 Infection Risk: Using Large-scale Geolocation Data to Understand Burdens on Vulnerable Communities
Authors:
Boyeong Hong,
Bartosz Bonczak,
Arpit Gupta,
Lorna Thorpe,
Constantine E. Kontokosta
Abstract:
This study develops a new method to quantify neighborhood activity levels at high spatial and temporal resolutions and test whether, and to what extent, behavioral responses to social distancing policies vary with socioeconomic and demographic characteristics. We define exposure density as a measure of both the localized volume of activity in a defined area and the proportion of activity occurring…
▽ More
This study develops a new method to quantify neighborhood activity levels at high spatial and temporal resolutions and test whether, and to what extent, behavioral responses to social distancing policies vary with socioeconomic and demographic characteristics. We define exposure density as a measure of both the localized volume of activity in a defined area and the proportion of activity occurring in non-residential and outdoor land uses. We utilize this approach to capture inflows/outflows of people as a result of the pandemic and changes in mobility behavior for those that remain. First, we develop a generalizable method for assessing neighborhood activity levels by land use type using smartphone geolocation data over a three-month period covering more than 12 million unique users within the Greater New York area. Second, we measure and analyze disparities in community social distancing by identifying patterns in neighborhood activity levels and characteristics before and after the stay-at-home order. Finally, we evaluate the effect of social distancing in neighborhoods on COVID-19 infection rates and outcomes associated with localized demographic, socioeconomic, and infrastructure characteristics in order to identify disparities in health outcomes related to exposure risk. Our findings provide insight into the timely evaluation of the effectiveness of social distancing for individual neighborhoods and support a more equitable allocation of resources to support vulnerable and at-risk communities. Our findings demonstrate distinct patterns of activity pre- and post-COVID across neighborhoods. The variation in exposure density has a direct and measurable impact on the risk of infection.
△ Less
Submitted 4 August, 2020;
originally announced August 2020.
-
Composite Metrics for Network Security Analysis
Authors:
Simon Yusuf Enoch,
Jin B. Hong,
Mengmeng Ge,
Dong Seong Kim
Abstract:
Security metrics present the security level of a system or a network in both qualitative and quantitative ways. In general, security metrics are used to assess the security level of a system and to achieve security goals. There are a lot of security metrics for security analysis, but there is no systematic classification of security metrics that are based on network reachability information. To ad…
▽ More
Security metrics present the security level of a system or a network in both qualitative and quantitative ways. In general, security metrics are used to assess the security level of a system and to achieve security goals. There are a lot of security metrics for security analysis, but there is no systematic classification of security metrics that are based on network reachability information. To address this, we propose a systematic classification of existing security metrics based on network reachability information. Mainly, we classify the security metrics into host-based and network-based metrics. The host-based metrics are classified into metrics ``without probability" and "with probability", while the network-based metrics are classified into "path-based" and "non-path based". Finally, we present and describe an approach to develop composite security metrics and it's calculations using a Hierarchical Attack Representation Model (HARM) via an example network. Our novel classification of security metrics provides a new methodology to assess the security of a system.
△ Less
Submitted 17 July, 2020; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Stochastic batch size for adaptive regularization in deep network optimization
Authors:
Kensuke Nakamura,
Stefano Soatto,
Byung-Woo Hong
Abstract:
We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining batch size for each model parameter at each optimization iteration. The stochastic batch size is determined by the update probability of each parameter followi…
▽ More
We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining batch size for each model parameter at each optimization iteration. The stochastic batch size is determined by the update probability of each parameter following a distribution of gradient norms in consideration of their local and global properties in the neural network architecture where the range of gradient norms may vary within and across layers. We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets. The quantitative evaluation indicates that our algorithm outperforms the state-of-the-art optimization algorithms in generalization while providing less sensitivity to the selection of batch size which often plays a critical role in optimization, thus achieving more robustness to the selection of regularity.
△ Less
Submitted 14 April, 2020;
originally announced April 2020.
-
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
Authors:
Junghee Cho,
Junseok Kwon,
Byung-Woo Hong
Abstract:
We present an adaptive regularization algorithm that can be effectively applied to the optimization problem in deep learning framework. Our regularization algorithm aims to take into account the fitness of data to the current state of model in the determination of regularity to achieve better generalization. The degree of regularization at each element in the target space of the neural network arc…
▽ More
We present an adaptive regularization algorithm that can be effectively applied to the optimization problem in deep learning framework. Our regularization algorithm aims to take into account the fitness of data to the current state of model in the determination of regularity to achieve better generalization. The degree of regularization at each element in the target space of the neural network architecture is determined based on the residual at each optimization iteration in an adaptive way. Our adaptive regularization algorithm is designed to apply a diffusion process driven by the heat equation with spatially varying diffusivity depending on the probability density function following a certain distribution of residual. Our data-driven regularity is imposed by adaptively smoothing a simplified objective function in which the explicit regularization term is omitted in an alternating manner between the evaluation of residual and the determination of the degree of its regularity. The effectiveness of our algorithm is empirically demonstrated by the numerical experiments in the application of image classification problems, indicating that our algorithm outperforms other commonly used optimization algorithms in terms of generalization using popular deep learning models and benchmark datasets.
△ Less
Submitted 30 August, 2019; v1 submitted 23 July, 2019;
originally announced July 2019.
-
Adaptive Weight Decay for Deep Neural Networks
Authors:
Kensuke Nakamura,
Byung-Woo Hong
Abstract:
Regularization in the optimization of deep neural networks is often critical to avoid undesirable over-fitting leading to better generalization of model. One of the most popular regularization algorithms is to impose L-2 penalty on the model parameters resulting in the decay of parameters, called weight-decay, and the decay rate is generally constant to all the model parameters in the course of op…
▽ More
Regularization in the optimization of deep neural networks is often critical to avoid undesirable over-fitting leading to better generalization of model. One of the most popular regularization algorithms is to impose L-2 penalty on the model parameters resulting in the decay of parameters, called weight-decay, and the decay rate is generally constant to all the model parameters in the course of optimization. In contrast to the previous approach based on the constant rate of weight-decay, we propose to consider the residual that measures dissimilarity between the current state of model and observations in the determination of the weight-decay for each parameter in an adaptive way, called adaptive weight-decay (AdaDecay) where the gradient norms are normalized within each layer and the degree of regularization for each parameter is determined in proportional to the magnitude of its gradient using the sigmoid function. We empirically demonstrate the effectiveness of AdaDecay in comparison to the state-of-the-art optimization algorithms using popular benchmark datasets: MNIST, Fashion-MNIST, and CIFAR-10 with conventional neural network models ranging from shallow to deep. The quantitative evaluation of our proposed algorithm indicates that AdaDecay improves generalization leading to better accuracy across all the datasets and models.
△ Less
Submitted 7 August, 2019; v1 submitted 21 July, 2019;
originally announced July 2019.
-
Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction
Authors:
Alex Wong,
Byung-Woo Hong,
Stefano Soatto
Abstract:
Supervised learning methods to infer (hypothesize) depth of a scene from a single image require costly per-pixel ground-truth. We follow a geometric approach that exploits abundant stereo imagery to learn a model to hypothesize scene structure without direct supervision. Although we train a network with stereo pairs, we only require a single image at test time to hypothesize disparity or depth. We…
▽ More
Supervised learning methods to infer (hypothesize) depth of a scene from a single image require costly per-pixel ground-truth. We follow a geometric approach that exploits abundant stereo imagery to learn a model to hypothesize scene structure without direct supervision. Although we train a network with stereo pairs, we only require a single image at test time to hypothesize disparity or depth. We propose a novel objective function that exploits the bilateral cyclic relationship between the left and right disparities and we introduce an adaptive regularization scheme that allows the network to handle both the co-visible and occluded regions in a stereo pair. This process ultimately produces a model to generate hypotheses for the 3-dimensional structure of the scene as viewed in a single image. When used to generate a single (most probable) estimate of depth, our method outperforms state-of-the-art unsupervised monocular depth prediction methods on the KITTI benchmarks. We show that our method generalizes well by applying our models trained on KITTI to the Make3d dataset.
△ Less
Submitted 19 June, 2019; v1 submitted 18 March, 2019;
originally announced March 2019.
-
CloudSafe: A Tool for an Automated Security Analysis for Cloud Computing
Authors:
Seoungmo An,
Taehoon Eom,
Jong Sou Park,
Jin B. Hong,
Armstrong Nhlabatsi,
Noora Fetais,
Khaled M. Khan,
Dong Seong Kim
Abstract:
Cloud computing has been adopted widely, providing on-demand computing resources to improve perfornance and reduce the operational costs. However, these new functionalities also bring new ways to exploit the cloud computing environment. To assess the security of the cloud, graphical security models can be used, such as Attack Graphs and Attack Trees. However, existing models do not consider all ty…
▽ More
Cloud computing has been adopted widely, providing on-demand computing resources to improve perfornance and reduce the operational costs. However, these new functionalities also bring new ways to exploit the cloud computing environment. To assess the security of the cloud, graphical security models can be used, such as Attack Graphs and Attack Trees. However, existing models do not consider all types of threats, and also automating the security assessment functions are difficult. In this paper, we propose a new security assessment tool for the cloud named CloudSafe, an automated security assessment for the cloud. The CloudSafe tool collates various tools and frameworks to automate the security assessment process. To demonstrate the applicability of the CloudSafe, we conducted security assessment in Amazon AWS, where our experimental results showed that we can effectively gather security information of the cloud and carry out security assessment to produce security reports. Users and cloud service providers can use the security report generated by the CloudSafe to understand the security posture of the cloud being used/provided.
△ Less
Submitted 11 March, 2019;
originally announced March 2019.
-
Safe Element Screening for Submodular Function Minimization
Authors:
Weizhong Zhang,
Bin Hong,
Lin Ma,
Wei Liu,
Tong Zhang
Abstract:
Submodular functions are discrete analogs of convex functions, which have applications in various fields, including machine learning and computer vision. However, in large-scale applications, solving Submodular Function Minimization (SFM) problems remains challenging. In this paper, we make the first attempt to extend the emerging technique named screening in large-scale sparse learning to SFM for…
▽ More
Submodular functions are discrete analogs of convex functions, which have applications in various fields, including machine learning and computer vision. However, in large-scale applications, solving Submodular Function Minimization (SFM) problems remains challenging. In this paper, we make the first attempt to extend the emerging technique named screening in large-scale sparse learning to SFM for accelerating its optimization process. We first conduct a careful studying of the relationships between SFM and the corresponding convex proximal problems, as well as the accurate primal optimum estimation of the proximal problems. Relying on this study, we subsequently propose a novel safe screening method to quickly identify the elements guaranteed to be included (we refer to them as active) or excluded (inactive) in the final optimal solution of SFM during the optimization process. By removing the inactive elements and fixing the active ones, the problem size can be dramatically reduced, leading to great savings in the computational cost without sacrificing any accuracy. To the best of our knowledge, the proposed method is the first screening method in the fields of SFM and even combinatorial optimization, thus pointing out a new direction for accelerating SFM algorithms. Experiment results on both synthetic and real datasets demonstrate the significant speedups gained by our approach.
△ Less
Submitted 6 June, 2018; v1 submitted 22 May, 2018;
originally announced May 2018.
-
Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks
Authors:
Kensuke Nakamura,
Stefano Soatto,
Byung-Woo Hong
Abstract:
We present a stochastic first-order optimization algorithm, named BCSC, that adds a cyclic constraint to stochastic block-coordinate descent. It uses different subsets of the data to update different subsets of the parameters, thus limiting the detrimental effect of outliers in the training set. Empirical tests in benchmark datasets show that our algorithm outperforms state-of-the-art optimization…
▽ More
We present a stochastic first-order optimization algorithm, named BCSC, that adds a cyclic constraint to stochastic block-coordinate descent. It uses different subsets of the data to update different subsets of the parameters, thus limiting the detrimental effect of outliers in the training set. Empirical tests in benchmark datasets show that our algorithm outperforms state-of-the-art optimization methods in both accuracy as well as convergence speed. The improvements are consistent across different architectures, and can be combined with other training techniques and regularization methods.
△ Less
Submitted 20 November, 2017;
originally announced November 2017.
-
Predictors of Re-admission for Homeless Families in New York City: The Case of the Win Shelter Network
Authors:
Constantine Kontokosta,
Boyeong Hong,
Awais Malik,
Ira M. Bellach,
Xueqi Huang,
Kristi Korsberg,
Dara Perl,
Avikal Somvanshi
Abstract:
New York City faces the challenge of an ever-increasing homeless population with almost 60,000 people currently living in city shelters. In 2015, approximately 25% of families stayed longer than 9 months in a shelter, and 17% of families with children that exited a homeless shelter returned to the shelter system within 30 days of leaving. This suggests that "long-term" shelter residents and those…
▽ More
New York City faces the challenge of an ever-increasing homeless population with almost 60,000 people currently living in city shelters. In 2015, approximately 25% of families stayed longer than 9 months in a shelter, and 17% of families with children that exited a homeless shelter returned to the shelter system within 30 days of leaving. This suggests that "long-term" shelter residents and those that re-enter shelters contribute significantly to the rise of the homeless population living in city shelters and indicate systemic challenges to finding adequate permanent housing. Women in Need (Win) is a non-profit agency that provides shelter to almost 10,000 homeless women and children (10% of all homeless families of NYC), and is the largest homeless shelter provider in the City. This paper focuses on our preliminary work with Win to understand the factors that affect the rate of readmission of homeless families at Win shelters, and to predict the likelihood of re-entry into the shelter system on exit. These insights will enable improved service delivery and operational efficiencies at these shelters. This paper describes our recent efforts to integrate Win datasets with city records to create a unified, comprehensive database of the homeless population being served by Win shelters. A preliminary classification model is developed to predict the odds of readmission and length of shelter stay based on the demographic and socioeconomic characteristics of the homeless population served by Win. This work is intended to form the basis for establishing a network of "smart shelters" through the use of data science and data technologies.
△ Less
Submitted 18 October, 2017;
originally announced October 2017.
-
Equity in 311 Reporting: Understanding Socio-Spatial Differentials in the Propensity to Complain
Authors:
Constantine Kontokosta,
Boyeong Hong,
Kristi Korsberg
Abstract:
Cities across the United States are implementing information communication technologies in an effort to improve government services. One such innovation in e-government is the creation of 311 systems, offering a centralized platform where citizens can request services, report non-emergency concerns, and obtain information about the city via hotline, mobile, or web-based applications. The NYC 311 s…
▽ More
Cities across the United States are implementing information communication technologies in an effort to improve government services. One such innovation in e-government is the creation of 311 systems, offering a centralized platform where citizens can request services, report non-emergency concerns, and obtain information about the city via hotline, mobile, or web-based applications. The NYC 311 service request system represents one of the most significant links between citizens and city government, accounting for more than 8,000,000 requests annually. These systems are generating massive amounts of data that, when properly managed, cleaned, and mined, can yield significant insights into the real-time condition of the city. Increasingly, these data are being used to develop predictive models of citizen concerns and problem conditions within the city. However, predictive models trained on these data can suffer from biases in the propensity to make a request that can vary based on socio-economic and demographic characteristics of an area, cultural differences that can affect citizens' willingness to interact with their government, and differential access to Internet connectivity. Using more than 20,000,000 311 requests - together with building violation data from the NYC Department of Buildings and the NYC Department of Housing Preservation and Development; property data from NYC Department of City Planning; and demographic and socioeconomic data from the U.S. Census American Community Survey - we develop a two-step methodology to evaluate the propensity to complain: (1) we predict, using a gradient boosting regression model, the likelihood of heating and hot water violations for a given building, and (2) we then compare the actual complaint volume for buildings with predicted violations to quantify discrepancies across the City.
△ Less
Submitted 6 October, 2017;
originally announced October 2017.
-
Adaptive Regularization of Some Inverse Problems in Image Analysis
Authors:
Byung-Woo Hong,
Ja-Keoung Koo,
Martin Burger,
Stefano Soatto
Abstract:
We present an adaptive regularization scheme for optimizing composite energy functionals arising in image analysis problems. The scheme automatically trades off data fidelity and regularization depending on the current data fit during the iterative optimization, so that regularization is strongest initially, and wanes as data fidelity improves, with the weight of the regularizer being minimized at…
▽ More
We present an adaptive regularization scheme for optimizing composite energy functionals arising in image analysis problems. The scheme automatically trades off data fidelity and regularization depending on the current data fit during the iterative optimization, so that regularization is strongest initially, and wanes as data fidelity improves, with the weight of the regularizer being minimized at convergence. We also introduce the use of a Huber loss function in both data fidelity and regularization terms, and present an efficient convex optimization algorithm based on the alternating direction method of multipliers (ADMM) using the equivalent relation between the Huber function and the proximal operator of the one-norm. We illustrate and validate our adaptive Huber-Huber model on synthetic and real images in segmentation, motion estimation, and denoising problems.
△ Less
Submitted 9 May, 2017;
originally announced May 2017.
-
Multi-Label Segmentation via Residual-Driven Adaptive Regularization
Authors:
Byung-Woo Hong,
Ja-Keoung Koo,
Stefano Soatto
Abstract:
We present a variational multi-label segmentation algorithm based on a robust Huber loss for both the data and the regularizer, minimized within a convex optimization framework. We introduce a novel constraint on the common areas, to bias the solution towards mutually exclusive regions. We also propose a regularization scheme that is adapted to the spatial statistics of the residual at each iterat…
▽ More
We present a variational multi-label segmentation algorithm based on a robust Huber loss for both the data and the regularizer, minimized within a convex optimization framework. We introduce a novel constraint on the common areas, to bias the solution towards mutually exclusive regions. We also propose a regularization scheme that is adapted to the spatial statistics of the residual at each iteration, resulting in a varying degree of regularization being applied as the algorithm proceeds: the effect of the regularizer is strongest at initialization, and wanes as the solution increasingly fits the data. This minimizes the bias induced by the regularizer at convergence. We design an efficient convex optimization algorithm based on the alternating direction method of multipliers using the equivalent relation between the Huber function and the proximal operator of the one-norm. We empirically validate our proposed algorithm on synthetic and real images and offer an information-theoretic derivation of the cost-function that highlights the modeling choices made.
△ Less
Submitted 27 February, 2017;
originally announced February 2017.
-
Adaptive Regularization in Convex Composite Optimization for Variational Imaging Problems
Authors:
Byung-Woo Hong,
Ja-Keoung Koo,
Hendrik Dirks,
Martin Burger
Abstract:
We propose an adaptive regularization scheme in a variational framework where a convex composite energy functional is optimized. We consider a number of imaging problems including denoising, segmentation and motion estimation, which are considered as optimal solutions of the energy functionals that mainly consist of data fidelity, regularization and a control parameter for their trade-off. We pres…
▽ More
We propose an adaptive regularization scheme in a variational framework where a convex composite energy functional is optimized. We consider a number of imaging problems including denoising, segmentation and motion estimation, which are considered as optimal solutions of the energy functionals that mainly consist of data fidelity, regularization and a control parameter for their trade-off. We presents an algorithm to determine the relative weight between data fidelity and regularization based on the residual that measures how well the observation fits the model. Our adaptive regularization scheme is designed to locally control the regularization at each pixel based on the assumption that the diversity of the residual of a given imaging model spatially varies. The energy optimization is presented in the alternating direction method of multipliers (ADMM) framework where the adaptive regularization is iteratively applied along with mathematical analysis of the proposed algorithm. We demonstrate the robustness and effectiveness of our adaptive regularization through experimental results presenting that the qualitative and quantitative evaluation results of each imaging task are superior to the results with a constant regularization scheme. The desired properties, robustness and effectiveness, of the regularization parameter selection in a variational framework for imaging problems are achieved by merely replacing the static regularization parameter with our adaptive one.
△ Less
Submitted 28 February, 2017; v1 submitted 8 September, 2016;
originally announced September 2016.
-
Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction
Authors:
Weizhong Zhang,
Bin Hong,
Wei Liu,
Jieping Ye,
Deng Cai,
Xiaofei He,
Jie Wang
Abstract:
Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By…
▽ More
Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified inactive samples and features from the training phase, leading to substantial savings in the computational cost without sacrificing the accuracy. Moreover, we show that our method can be extended to multi-class sparse support vector machines. To the best of our knowledge, the proposed method is the \emph{first} \emph{static} feature and sample reduction method for sparse SVMs and multi-class sparse SVMs. Experiments on both synthetic and real data sets demonstrate that our approach significantly outperforms state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.
△ Less
Submitted 18 July, 2019; v1 submitted 24 July, 2016;
originally announced July 2016.