-
X-Recon: Learning-based Patient-specific High-Resolution CT Reconstruction from Orthogonal X-Ray Images
Authors:
Yunpeng Wang,
Kang Wang,
Yaoyao Zhuo,
Weiya Shi,
Fei Shan,
Lei Liu
Abstract:
Rapid and accurate diagnosis of pneumothorax, utilizing chest X-ray and computed tomography (CT), is crucial for assisted diagnosis. Chest X-ray is commonly used for initial localization of pneumothorax, while CT ensures accurate quantification. However, CT scans involve high radiation doses and can be costly. To achieve precise quantitative diagnosis while minimizing radiation exposure, we proposed X-Recon, a CT ultra-sparse reconstruction network based on ortho-lateral chest X-ray images. X-Recon integrates generative adversarial networks (GANs), including a generator with a multi-scale fusion rendering module and a discriminator enhanced by 3D coordinate convolutional layers, designed to facilitate CT reconstruction. To improve precision, a projective spatial transformer is utilized to incorporate multi-angle projection loss. Additionally, we proposed PTX-Seg, a zero-shot pneumothorax segmentation algorithm, combining image processing techniques with deep-learning models for the segmentation of air-accumulated regions and lung structures. Experiments on a large-scale dataset demonstrate its superiority over existing approaches. X-Recon achieved a significantly higher reconstruction resolution with a higher average spatial resolution and a lower average slice thickness. The reconstruction metrics achieved state-of-the-art performance in terms of several metrics including peak signal-to-noise ratio. The zero-shot segmentation algorithm, PTX-Seg, also demonstrated high segmentation precision for the air-accumulated region, the left lung, and the right lung. Moreover, the consistency analysis for the pneumothorax chest occupancy ratio between reconstructed CT and original CT obtained a high correlation coefficient. Code will be available at: https://github.com/wangyunpengbio/X-Recon
Submitted 21 July, 2024;
originally announced July 2024.
-
DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks
Authors:
Guang Yang,
Yu Zhou,
Xiang Chen,
Xiangyu Zhang,
Terry Yue Zhuo,
David Lo,
Taolue Chen
Abstract:
Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality: they work well in some models and scenarios but fail in others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of "early learning" as a general occurrence during the training of CLMs. This phenomenon refers to a model initially focusing on the main features of the training data but potentially becoming more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. Our analysis then indicates that overfitting to backdoor triggers results from the use of the cross-entropy loss function, where the unboundedness of cross-entropy leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose a general and effective loss function, DeCE (Deceptive Cross-Entropy), which blends deceptive distributions and applies label smoothing to keep the gradient bounded; this prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
Submitted 20 August, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Authors:
Terry Yue Zhuo,
Minh Chien Vu,
Jenny Chim,
Han Hu,
Wenhao Yu,
Ratnadira Widyasari,
Imam Nur Bani Yusuf,
Haolan Zhan,
Junda He,
Indraneil Paul,
Simon Brunner,
Chen Gong,
Thong Hoang,
Armel Randy Zebaze,
Xiaoheng Hong,
Wen-Ding Li,
Jean Kaddour,
Ming Xu,
Zhihan Zhang,
Prateek Yadav,
Naman Jain,
Alex Gu,
Zhoujun Cheng,
Jiawei Liu,
Qian Liu
, et al. (8 additional authors not shown)
Abstract:
Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task encompasses 5.6 test cases on average, with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.
Submitted 26 June, 2024; v1 submitted 22 June, 2024;
originally announced June 2024.
-
IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution
Authors:
Yue Zhuo,
Zhiqiang Ge
Abstract:
Feature attribution explains Artificial Intelligence (AI) at the instance level by providing importance scores of input features' contributions to model prediction. Integrated Gradients (IG) is a prominent path attribution method for deep neural networks, involving the integration of gradients along a path from the explained input (explicand) to a counterfactual instance (baseline). Current IG variants primarily focus on the gradient of explicand's output. However, our research indicates that the gradient of the counterfactual output significantly affects feature attribution as well. To achieve this, we propose Iterative Gradient path Integrated Gradients (IG2), considering both gradients. IG2 incorporates the counterfactual gradient iteratively into the integration path, generating a novel path (GradPath) and a novel baseline (GradCF). These two novel IG components effectively address the issues of attribution noise and arbitrary baseline choice in earlier IG methods. IG2, as a path method, satisfies many desirable axioms, which are theoretically justified in the paper. Experimental results on XAI benchmark, ImageNet, MNIST, TREC questions answering, wafer-map failure patterns, and CelebA face attributes validate that IG2 delivers superior feature attributions compared to the state-of-the-art techniques. The code is released at: https://github.com/JoeZhuo-ZY/IG2.
Submitted 16 June, 2024;
originally announced June 2024.
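As background for the IG2 abstract above, the following is a minimal sketch of standard Integrated Gradients on a straight-line path from baseline to explicand. It is not IG2 itself: the GradPath and GradCF components are not reproduced here. The function names, the toy model, and the midpoint Riemann-sum approximation are all illustrative assumptions.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Approximate straight-line Integrated Gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output with respect to its input.
    """
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * len(diff)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        grad = grad_fn(point)
        grad_sum = [s + g for s, g in zip(grad_sum, grad)]
    # attribution_i = (x_i - baseline_i) * average gradient along the path
    return [di * s / steps for di, s in zip(diff, grad_sum)]


def toy_grad(p: Sequence[float]) -> List[float]:
    """Analytic gradient of the hypothetical model f(p) = p[0]**2 + 3*p[1]."""
    return [2.0 * p[0], 3.0]


attributions = integrated_gradients(toy_grad, x=[2.0, 1.0], baseline=[0.0, 0.0])
print(attributions)
```

For this toy model the attributions sum to f(x) - f(baseline) = 7, which illustrates the completeness axiom that path methods satisfy.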
-
Multi-Beam Integrated Sensing and Communication: State-of-the-Art, Challenges and Opportunities
Authors:
Yinxiao Zhuo,
Tianqi Mao,
Haojin Li,
Chen Sun,
Zhaocheng Wang,
Zhu Han,
Sheng Chen
Abstract:
Integrated sensing and communication (ISAC) has been envisioned as a critical enabling technology for the next-generation wireless communication, which can realize location/motion detection of surroundings with communication devices. This additional sensing capability leads to a substantial network quality gain and expansion of the service scenarios. As the system evolves to millimeter wave (mmWave) and above, ISAC can realize simultaneous communications and sensing of the ultra-high throughput level and radar resolution with compact design, which relies on directional beamforming against the path loss. With the multi-beam technology, the dual functions of ISAC can be seamlessly incorporated at the beamspace level by unleashing the potential of joint beamforming. To this end, this article investigates the key technologies for multi-beam ISAC system. We begin with an overview of the current state-of-the-art solutions in multi-beam ISAC. Subsequently, a detailed analysis of the advantages associated with the multi-beam ISAC is provided. Additionally, the key technologies for transmitter, channel and receiver of the multi-beam ISAC are introduced. Finally, we explore the challenges and opportunities presented by multi-beam ISAC, offering valuable insights into this emerging field.
Submitted 30 May, 2024;
originally announced May 2024.
-
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Authors:
Yifeng Ding,
Jiawei Liu,
Yuxiang Wei,
Terry Yue Zhuo,
Lingming Zhang
Abstract:
We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to improve instruction tuning, XFT introduces a shared expert mechanism with a novel routing weight normalization strategy into sparse upcycling, which significantly boosts instruction tuning. After fine-tuning the upcycled MoE model, XFT introduces a learnable model merging mechanism to compile the upcycled MoE model back to a dense model, achieving upcycled MoE-level performance with only dense-model compute. By applying XFT to a 1.3B model, we create a new state-of-the-art tiny code LLM (<3B) with 67.1 and 64.6 pass@1 on HumanEval and HumanEval+ respectively. With the same data and model architecture, XFT improves supervised fine-tuning (SFT) by 13% on HumanEval+, along with consistent improvements from 2% to 13% on MBPP+, MultiPL-E, and DS-1000, demonstrating its generalizability. XFT is fully orthogonal to existing techniques such as Evol-Instruct and OSS-Instruct, opening a new dimension for improving code instruction tuning. Codes are available at https://github.com/ise-uiuc/xft.
Submitted 6 June, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
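The XFT abstract above reports pass@1 scores on HumanEval and HumanEval+. As a hedged illustration, the sketch below implements the commonly used unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), where n samples are drawn per problem and c of them pass the tests. The abstract does not state how its scores were computed, so treat this as an assumption about the metric rather than the authors' evaluation code; the function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


# Two hypothetical problems with 10 samples each: 3 and 7 samples pass.
print(mean_pass_at_k([(10, 3), (10, 7)], k=1))
```

With k = 1 the estimator reduces to the fraction of passing samples, c / n, averaged over problems.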
-
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Authors:
Taishi Nakamura,
Mayank Mishra,
Simone Tedeschi,
Yekun Chai,
Jason T Stillerman,
Felix Friedrich,
Prateek Yadav,
Tanmay Laud,
Vu Minh Chien,
Terry Yue Zhuo,
Diganta Misra,
Ben Bogin,
Xuan-Son Vu,
Marzena Karpinska,
Arnav Varma Dantuluri,
Wojciech Kusa,
Tommaso Furlanello,
Rio Yokota,
Niklas Muennighoff,
Suhas Pai,
Tosin Adewumi,
Veronika Laippala,
Xiaozhe Yao,
Adalberto Junior,
Alpay Ariyak
, et al. (20 additional authors not shown)
Abstract:
Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, whereas pretraining from scratch is computationally expensive, and compliance with AI safety and development laws. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, Aurora-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. To promote responsible open-source LLM development, Aurora-M and its variants are released at https://huggingface.co/collections/aurora-m/aurora-m-models-65fdfdff62471e09812f5407 .
Submitted 23 April, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
StarCoder 2 and The Stack v2: The Next Generation
Authors:
Anton Lozhkov,
Raymond Li,
Loubna Ben Allal,
Federico Cassano,
Joel Lamy-Poirier,
Nouamane Tazi,
Ao Tang,
Dmytro Pykhtar,
Jiawei Liu,
Yuxiang Wei,
Tianyang Liu,
Max Tian,
Denis Kocetkov,
Arthur Zucker,
Younes Belkada,
Zijian Wang,
Qian Liu,
Dmitry Abulkhanov,
Indraneil Paul,
Zhuang Li,
Wen-Ding Li,
Megan Risdal,
Jia Li,
Jian Zhu,
Terry Yue Zhuo
, et al. (41 additional authors not shown)
Abstract:
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
Submitted 29 February, 2024;
originally announced February 2024.
-
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Authors:
Terry Yue Zhuo,
Armel Zebaze,
Nitchakarn Suppattarachai,
Leandro von Werra,
Harm de Vries,
Qian Liu,
Niklas Muennighoff
Abstract:
The high cost of full-parameter fine-tuning (FFT) of Large Language Models (LLMs) has led to a series of parameter-efficient fine-tuning (PEFT) methods. However, it remains unclear which methods provide the best cost-performance trade-off at different model scales. We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters. Through investigations across 5 tasks and 8 different datasets encompassing both code comprehension and code generation tasks, we find that FFT generally leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale. LoRA usually offers the most favorable trade-off between cost and performance. Further investigation into the effects of these methods on both model robustness and code security reveals that larger models tend to demonstrate reduced robustness and less security. At last, we explore the relationships among updated parameters, cross-entropy loss, and task performance. We find that the tuning effectiveness observed in small models generalizes well to larger models, and the validation loss in instruction tuning can be a reliable indicator of overall downstream performance.
Submitted 1 January, 2024;
originally announced January 2024.
-
Transformer-based Selective Super-Resolution for Efficient Image Refinement
Authors:
Tianyi Zhang,
Kishore Kasichainula,
Yaoxin Zhuo,
Baoxin Li,
Jae-sun Seo,
Yu Cao
Abstract:
Conventional super-resolution methods suffer from two drawbacks: substantial computational cost in upscaling an entire large image, and the introduction of extraneous or potentially detrimental information for downstream computer vision tasks during the refinement of the background. To solve these issues, we propose a novel transformer-based algorithm, Selective Super-Resolution (SSR), which partitions images into non-overlapping tiles, selects tiles of interest at various scales with a pyramid architecture, and exclusively reconstructs these selected tiles with deep features. Experimental results on three datasets demonstrate the efficiency and robust performance of our approach for super-resolution. Compared to the state-of-the-art methods, the FID score is reduced from 26.78 to 10.41 with 40% reduction in computation cost for the BDD100K dataset. The source code is available at https://github.com/destiny301/SSR.
Submitted 10 December, 2023;
originally announced December 2023.
-
Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models
Authors:
Guang Yang,
Yu Zhou,
Xiang Chen,
Xiangyu Zhang,
Terry Yue Zhuo,
Taolue Chen
Abstract:
Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (lLMs), which are defined to have fewer than 10 billion parameters. Empirically, we find that most lLMs cannot generate high-quality CoTs when prompted by the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach COTTON which can leverage lLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by COTTON boost various lLMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by gpt-3.5-turbo (175B). Our study also showcases the potential of lLMs in software engineering applications.
Submitted 4 August, 2024; v1 submitted 9 December, 2023;
originally announced December 2023.
-
ABIGX: A Unified Framework for eXplainable Fault Detection and Classification
Authors:
Yue Zhuo,
Jinchuan Qian,
Zhihuan Song,
Zhiqiang Ge
Abstract:
For explainable fault detection and classification (FDC), this paper proposes a unified framework, ABIGX (Adversarial fault reconstruction-Based Integrated Gradient eXplanation). ABIGX is derived from the essentials of previous successful fault diagnosis methods, contribution plots (CP) and reconstruction-based contribution (RBC). It is the first explanation framework that provides variable contributions for the general FDC models. The core part of ABIGX is the adversarial fault reconstruction (AFR) method, which rethinks the FR from the perspective of adversarial attack and generalizes to fault classification models with a new fault index. For fault classification, we put forward a new problem of fault class smearing, which intrinsically hinders the correct explanation. We prove that ABIGX effectively mitigates this problem and outperforms the existing gradient-based explanation methods. For fault detection, we theoretically bridge ABIGX with conventional fault diagnosis methods by proving that CP and RBC are the linear specifications of ABIGX. The experiments evaluate the explanations of FDC by quantitative metrics and intuitive illustrations, the results of which show the general superiority of ABIGX to other advanced explanation methods.
Submitted 9 November, 2023;
originally announced November 2023.
-
Patch-based Selection and Refinement for Early Object Detection
Authors:
Tianyi Zhang,
Kishore Kasichainula,
Yaoxin Zhuo,
Baoxin Li,
Jae-Sun Seo,
Yu Cao
Abstract:
Early object detection (OD) is a crucial task for the safety of many dynamic systems. Current OD algorithms have limited success for small objects at a long distance. To improve the accuracy and efficiency of such a task, we propose a novel set of algorithms that divide the image into patches, select patches with objects at various scales, elaborate the details of a small object, and detect it as early as possible. Our approach is built upon a transformer-based network and integrates the diffusion model to improve the detection accuracy. As demonstrated on BDD100K, our algorithms enhance the mAP for small objects from 1.03 to 8.93, and reduce the data volume in computation by more than 77%. The source code is available at https://github.com/destiny301/dpr
Submitted 3 November, 2023;
originally announced November 2023.
-
Can ChatGPT Perform Reasoning Using the IRAC Method in Analyzing Legal Scenarios Like a Lawyer?
Authors:
Xiaoxi Kang,
Lizhen Qu,
Lay-Ki Soon,
Adnan Trakic,
Terry Yue Zhuo,
Patrick Charles Emerton,
Genevieve Grant
Abstract:
Large Language Models (LLMs), such as ChatGPT, have recently drawn a lot of attention in the legal domain due to their emergent ability to tackle a variety of legal tasks. However, it is still unknown whether LLMs are able to analyze a legal case and perform reasoning in the same manner as lawyers. Therefore, we constructed a novel corpus consisting of scenarios pertaining to Contract Acts Malaysia and Australian Social Act for Dependent Child. ChatGPT is applied to perform analysis on the corpus using the IRAC method, which is a framework widely used by legal professionals for organizing legal analysis. Each scenario in the corpus is annotated with a complete IRAC analysis in a semi-structured format so that both machines and legal professionals are able to interpret and understand the annotations. In addition, we conducted the first empirical assessment of ChatGPT for IRAC analysis in order to understand how well it aligns with the analysis of legal professionals. Our experimental results shed light on possible future research directions to improve alignment between LLMs and legal experts in terms of legal reasoning.
Submitted 2 November, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
A Longitudinal Analysis about the Effect of Air Pollution on Astigmatism for Children and Young Adults
Authors:
Lin An,
Qiuyue Hu,
Jieying Guan,
Yingting Zhu,
Chenyao Jiang,
Xiaoyun Zhong,
Shuyue Ma,
Dongmei Yu,
Canyang Zhang,
Yehong Zhuo,
Peiwu Qin
Abstract:
Purpose: This study aimed to investigate the correlation between air pollution and astigmatism, considering the detrimental effects of air pollution on respiratory, cardiovascular, and eye health. Methods: A longitudinal study was conducted with 127,709 individuals aged 4-27 years from 9 cities in Guangdong Province, China, spanning from 2019 to 2021. Astigmatism was measured using cylinder values. Multiple measurements were taken at intervals of at least 1 year. Various exposure windows were used to assess the lagged impacts of air pollution on astigmatism. A panel data model with random effects was constructed to analyze the relationship between pollutant exposure and astigmatism. Results: The study revealed significant associations between astigmatism and exposure to carbon monoxide (CO), nitrogen dioxide (NO2), and particulate matter (PM2.5) over time. A 10 μg/m3 increase in a 3-year exposure window of NO2 and PM2.5 was associated with a decrease in cylinder value of -0.045 diopters and -0.017 diopters, respectively. A 0.1 mg/m3 increase in CO concentration within a 2-year exposure window correlated with a decrease in cylinder value of -0.009 diopters. No significant relationships were found between PM10 exposure and astigmatism. Conclusion: This study concluded that greater exposure to NO2 and PM2.5 over longer periods aggravates astigmatism. The negative effect of CO on astigmatism peaks in the exposure window of 2 years prior to examination and diminishes afterward. No significant association was found between PM10 exposure and astigmatism, suggesting that gaseous and smaller particulate pollutants have easier access to human eyes, causing heterogeneous morphological changes to the eyeball.
Submitted 13 October, 2023;
originally announced October 2023.
-
PatchProto Networks for Few-shot Visual Anomaly Classification
Authors:
Jian Wang,
Yue Zhuo
Abstract:
Visual anomaly diagnosis automatically analyzes defective products and has been widely applied in industrial quality inspection. Anomaly classification sorts the defective products into different categories. However, anomaly samples are hard to obtain in practice, which impedes the training of canonical machine learning models. This paper studies the practical setting in which anomaly samples for training are extremely scarce, i.e., few-shot learning (FSL). Utilizing the abundant normal samples, we propose PatchProto networks for few-shot anomaly classification. Unlike classical FSL methods, PatchProto networks extract CNN features only from defective regions of interest, which serve as the prototypes for few-shot learning. Compared with a basic few-shot classifier, experimental results on the MVTec-AD dataset show that PatchProto networks significantly improve few-shot anomaly classification accuracy.
Submitted 7 October, 2023;
originally announced October 2023.
-
Fake News Detectors are Biased against Texts Generated by Large Language Models
Authors:
Jinyan Su,
Terry Yue Zhuo,
Jonibek Mansurov,
Di Wang,
Preslav Nakov
Abstract:
The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly, our findings reveal a significant bias in many existing detectors: they are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine. This unexpected bias appears to arise from distinct linguistic patterns inherent to LLM outputs. To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yielded marked improvements in detection accuracy for both human and LLM-generated news. To further catalyze research in this domain, we release two comprehensive datasets, GossipCop++ and PolitiFact++, thus amalgamating human-validated articles with LLM-generated fake and real news.
Submitted 15 September, 2023;
originally announced September 2023.
-
Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names?
Authors:
Terry Yue Zhuo,
Xiaoning Du,
Zhenchang Xing,
Jiamou Sun,
Haowei Quan,
Li Li,
Liming Zhu
Abstract:
Recent breakthroughs in pre-trained code models, such as CodeBERT and Codex, have shown their superior performance in various downstream tasks. The correctness and unambiguity of API usage among these code models are crucial for achieving desirable program functionalities, requiring them to learn various API fully qualified names structurally and semantically. Recent studies reveal that even state-of-the-art pre-trained code models struggle with suggesting the correct APIs during code generation. However, the reasons for such poor API usage performance are barely investigated. To address this challenge, we propose using knowledge probing as a means of interpreting code models, which uses cloze-style tests to measure the knowledge stored in models. Our comprehensive study examines a code model's capability of understanding API fully qualified names from two different perspectives: API call and API import. Specifically, we reveal that current code models struggle with understanding API names, with pre-training strategies significantly affecting the quality of API name learning. We demonstrate that natural language context can assist code models in locating Python API names and generalize Python API name knowledge to unseen data. Our findings provide insights into the limitations and capabilities of current pre-trained code models, and suggest that incorporating API structure into the pre-training process can improve automated API usage and code representations. This work provides significance for advancing code intelligence practices and direction for future studies. All experiment results, data and source code used in this work are available at https://doi.org/10.5281/zenodo.7902072.
Submitted 14 September, 2023;
originally announced September 2023.
-
OctoPack: Instruction Tuning Code Large Language Models
Authors:
Niklas Muennighoff,
Qian Liu,
Armel Zebaze,
Qinkai Zheng,
Binyuan Hui,
Terry Yue Zhuo,
Swayam Singh,
Xiangru Tang,
Leandro von Werra,
Shayne Longpre
Abstract:
Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.
Submitted 18 February, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
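The pass@1 figure quoted in the abstract above is commonly computed with the unbiased pass@k estimator that accompanies the HumanEval benchmark, 1 - C(n-c, k) / C(n, k). The following is a minimal Python sketch of that estimator, assuming n generated samples per problem of which c pass the unit tests. The function name `pass_at_k` and the example counts are illustrative assumptions, not taken from the OctoPack repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass the tests
    k: evaluation budget (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, so a benchmark-level pass@1 is simply the mean fraction of passing samples per problem.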
-
A First Look at On-device Models in iOS Apps
Authors:
Han Hu,
Yujin Huang,
Qiuyuan Chen,
Terry Yue Zhuo,
Chunyang Chen
Abstract:
Powered by the rising popularity of deep learning techniques on smartphones, on-device deep learning models are being used in vital fields like finance, social media, and driving assistance.
Because of the transparency of the Android platform and the on-device models inside, on-device models on Android smartphones have been proven to be extremely vulnerable.
However, due to the challenge in accessing and analysing iOS app files, despite iOS being a mobile platform as popular as Android, there are no relevant works on on-device models in iOS apps.
Since the functionalities of the same app on Android and iOS platforms are similar, the same vulnerabilities may exist on both platforms.
In this paper, we present the first empirical study about on-device models in iOS apps, including their adoption of deep learning frameworks, structure, functionality, and potential security issues.
We study why current developers use different on-device models for one app between iOS and Android.
We propose a more general attack against white-box models that does not rely on pre-trained models and a new adversarial attack approach based on our findings to target iOS's gray-box on-device models.
Our results show the effectiveness of our approaches.
Finally, we successfully exploit the vulnerabilities of on-device models to attack real-world iOS apps.
Submitted 27 July, 2023; v1 submitted 23 July, 2023;
originally announced July 2023.
-
DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text
Authors:
Jinyan Su,
Terry Yue Zhuo,
Di Wang,
Preslav Nakov
Abstract:
With the rapid progress of large language models (LLMs) and the huge amount of text they generate, it becomes more and more impractical to manually distinguish whether a text is machine-generated. The growing use of LLMs in social media and education prompts us to develop methods to detect machine-generated text, preventing malicious usage such as plagiarism, misinformation, and propaganda. Previous work has studied several zero-shot methods, which require no training data. These methods achieve good performance, but there is still a lot of room for improvement. In this paper, we introduce two novel zero-shot methods for detecting machine-generated text by leveraging the log rank information. One is called DetectLLM-LRR, which is fast and efficient, and the other is called DetectLLM-NPR, which is more accurate, but slower due to the need for perturbations. Our experiments on three datasets and seven language models show that our proposed methods improve over the state of the art by 3.9 and 1.75 AUROC points absolute. Moreover, DetectLLM-NPR needs fewer perturbations than previous work to achieve the same level of performance, which makes it more practical for real-world use. We also investigate the efficiency-performance trade-off based on users' preferences on these two measures and we provide intuition for using them in practice effectively. We release the data and the code of both methods at https://github.com/mbzuai-nlp/DetectLLM
Submitted 23 May, 2023;
originally announced June 2023.
-
Source Code Data Augmentation for Deep Learning: A Survey
Authors:
Terry Yue Zhuo,
Zhou Yang,
Zhensu Sun,
Yufei Wang,
Li Li,
Xiaoning Du,
Zhenchang Xing,
David Lo
Abstract:
The increasingly popular adoption of deep learning models in many critical source code tasks motivates the development of data augmentation (DA) techniques to enhance training data and improve various capabilities (e.g., robustness and generalizability) of these models. Although a series of DA methods have been proposed and tailored for source code models, a comprehensive survey and examination to understand their effectiveness and implications is lacking. This paper fills this gap by conducting a comprehensive and integrative survey of data augmentation for source code, wherein we systematically compile and encapsulate existing literature to provide a comprehensive overview of the field. We start with an introduction of data augmentation in source code and then provide a discussion on major representative approaches. Next, we highlight the general strategies and techniques to optimize the DA quality. Subsequently, we underscore techniques useful in real-world source code scenarios and downstream tasks. Finally, we outline the prevailing challenges and potential opportunities for future research. In essence, we aim to demystify the corpus of existing literature on source code DA for deep learning, and foster further exploration in this sphere. Complementing this, we present a continually updated GitHub repository that hosts a list of up-to-date papers on DA for source code modeling, accessible at https://github.com/terryyz/DataAug4Code.
Submitted 13 November, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing
Authors:
Zhuang Li,
Yuyang Chai,
Terry Yue Zhuo,
Lizhen Qu,
Gholamreza Haffari,
Fei Li,
Donghong Ji,
Quan Hung Tran
Abstract:
Textual scene graph parsing has become increasingly important in various vision-language applications, including image caption evaluation and image retrieval. However, existing scene graph parsers that convert image captions into scene graphs often suffer from two types of errors. First, the generated scene graphs fail to capture the true semantics of the captions or the corresponding images, resulting in a lack of faithfulness. Second, the generated scene graphs have high inconsistency, with the same semantics represented by different annotations.
To address these challenges, we propose a novel dataset, which involves re-annotating the captions in Visual Genome (VG) using a new intermediate representation called FACTUAL-MR. FACTUAL-MR can be directly converted into faithful and consistent scene graph annotations. Our experimental results clearly demonstrate that the parser trained on our dataset outperforms existing approaches in terms of faithfulness and consistency. This improvement leads to a significant performance boost in both image caption evaluation and zero-shot image retrieval tasks. Furthermore, we introduce a novel metric for measuring scene graph similarity, which, when combined with the improved scene graph parser, achieves state-of-the-art (SOTA) results on multiple benchmark datasets for the aforementioned tasks. The code and dataset are available at https://github.com/zhuang-li/FACTUAL.
Submitted 1 June, 2023; v1 submitted 27 May, 2023;
originally announced May 2023.
-
StarCoder: may the source be with you!
Authors:
Raymond Li,
Loubna Ben Allal,
Yangtian Zi,
Niklas Muennighoff,
Denis Kocetkov,
Chenghao Mou,
Marc Marone,
Christopher Akiki,
Jia Li,
Jenny Chim,
Qian Liu,
Evgenii Zheltonozhskii,
Terry Yue Zhuo,
Thomas Wang,
Olivier Dehaene,
Mishig Davaadorj,
Joel Lamy-Poirier,
João Monteiro,
Oleh Shliazhko,
Nicolas Gontier,
Nicholas Meade,
Armel Zebaze,
Ming-Ho Yee,
Logesh Kumar Umapathi,
Jian Zhu
, et al. (42 additional authors not shown)
Abstract:
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
Submitted 13 December, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
ICE-Score: Instructing Large Language Models to Evaluate Code
Authors:
Terry Yue Zhuo
Abstract:
Recent advancements in the field of natural language generation have facilitated the use of large language models to assess the quality of generated text. Although these models have shown promising results in tasks such as machine translation and summarization, their applicability in code intelligence tasks remains limited without human involvement. The complexity of programming concepts required for such tasks makes it difficult to develop evaluation metrics that align with human judgment. Token-matching-based metrics, such as BLEU, have demonstrated weak correlations with human practitioners in code intelligence tasks. Moreover, utilizing human-written test suites to evaluate functional correctness can be challenging in domains with low resources. To overcome these obstacles, we propose ICE-Score, a new evaluation metric via instructing large language models (LLMs) for code assessments. Our metric addresses the limitations of existing approaches by achieving superior correlations with functional correctness and human preferences, without the need for test oracles or references. We evaluate the efficacy of our metric on two different aspects (human preference and execution success) and four programming languages. Our results demonstrate that our metric surpasses state-of-the-art metrics for code generation, delivering high levels of accuracy and consistency across various programming languages and tasks. We also make our evaluation metric and datasets available to the public (https://github.com/terryyz/ice-score), encouraging further research in evaluating code intelligence tasks.
Submitted 22 January, 2024; v1 submitted 27 April, 2023;
originally announced April 2023.
-
Displacement field calculation of large-scale structures using computer vision with physical constraints
Authors:
Yapeng Guo,
Peng Zhong,
Yi Zhuo,
Fanzeng Meng,
Hao Di,
Shunlong Li
Abstract:
Because of the advantages of easy deployment, low cost and non-contact, computer vision-based structural displacement acquisition technique has received wide attention and research in recent years. However, the displacement field acquisition of large-scale structures is a challenging topic due to the contradiction of camera field of view and resolution. This paper presents a large-scale structural displacement field calculation framework with integrated computer vision and physical constraints using only one camera. Firstly, the full-field image of the large-scale structure is obtained by processing the multi-view image using image stitching technique; secondly, the full-field image is meshed and the node displacements are calculated using an improved template matching method; and finally, the non-node displacements are described using shape functions considering physical constraints. The developed framework was validated using a scaled bridge model and evaluated by the proposed evaluation index for displacement field calculation accuracy. This paper can provide an effective way to obtain displacement fields of large-scale structures efficiently and cost-effectively.
Submitted 31 March, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Chloride Ion Erosion of Pre-Stressed Concrete Bridges in Cold Regions
Authors:
Hongtao Cui,
Yi Zhuo,
Dongyuan Ke,
Zhonglong Li,
Shunlong Li
Abstract:
The erosion of chloride ions in concrete bridges will accelerate the corrosion of reinforcement, which is an important reason for the decline of bridge durability. The erosion process of chloride ion, especially deicing salt solution in cold regions, is complex and has many influencing factors. It is very important to use accurate and effective methods to analyze the chloride ion erosion process in concrete. In this study, the pre-stressed concrete bridge retired in the cold region was taken as the research object, and the specimens from the whole bridge are obtained by the method of core drilling sampling. The concentration of chloride ion was measured at different depths of the specimens. The process of chloride ion erosion was simulated in two-dimensional space through COMSOL multi-physical field simulation, and compared with the measured results. The simulation method proposed in this paper has good reliability and accuracy.
Submitted 28 March, 2023;
originally announced March 2023.
-
Local track irregularity identification based on multi-sensor time-frequency features of high-speed railway bridge accelerations
Authors:
Ye Mo,
Yi Zhuo,
Shunlong Li
Abstract:
Shortwave track diseases are generally reflected in the form of local track irregularity. Such diseases will greatly impact the train-track-bridge interaction (TTBI) dynamic system, seriously affecting train safety. Therefore, a method is proposed to detect and localize local track irregularities based on multi-sensor time-frequency features of high-speed railway bridge accelerations. Continuous wavelet transform (CWT) is used to analyze the multi-sensor accelerations of railway bridges. Moreover, time-frequency features based on the sum of wavelet coefficients are proposed, considering the influence of the distance from the measurement points to the local irregularity on the recognition accuracy. Then, the multi-domain features are utilized to recognize deteriorated railway locations. A simply-supported high-speed railway bridge traversed by a railway train is adopted as a numerical simulation. Comparative studies are conducted to investigate the influence of vehicle speeds and the location of local track irregularity on the algorithm. Numerical simulation results show that the proposed algorithm can detect and locate local track irregularity accurately and is robust to vehicle speeds.
Submitted 27 March, 2023;
originally announced March 2023.
-
Damage detection of high-speed railway box girder using train-induced dynamic responses
Authors:
Xin Wang,
Yi Zhuo,
Shunlong Li
Abstract:
This paper proposes a damage detection method based on the train-induced responses of high-speed railway box girder. Under the coupling effects of bending and torsion, the traditional damage detection method based on the Euler beam theory cannot be applied. In this research, the box girder section is divided into different components based on the plate element analysis method. The strain responses were preprocessed using Principal Component Analysis (PCA) to remove the influence of train operation variation. The residual error of an autoregressive (AR) model was used as a potential index of damage features. The optimal order of the model was determined based on the Bayesian information criterion (BIC). Finally, the confidence boundary (CB) of damage features (DF) constituting outliers can be estimated by the Gaussian inverse cumulative distribution function (ICDF). The numerical simulation results show that the proposed method can effectively identify, locate and quantify the damage, which verifies its accuracy. The proposed method effectively identifies the early damage of all components on the key section by using four strain sensors, and it is helpful for developing effective maintenance strategies for high-speed railway box girder.
Submitted 23 March, 2023;
originally announced March 2023.
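The final step described in the abstract above, estimating a confidence boundary for damage features with the Gaussian inverse cumulative distribution function, can be illustrated with a short Python sketch. It assumes the damage features from the healthy (baseline) state are roughly normally distributed and that larger values indicate damage. The function names, the 0.99 confidence level, and the sample values are hypothetical and not taken from the paper.

```python
from statistics import NormalDist, fmean, stdev


def confidence_boundary(baseline: list[float], confidence: float = 0.99) -> float:
    """Upper boundary below which `confidence` of baseline features should fall.

    Fits a Gaussian to the baseline damage features and evaluates its
    inverse CDF at the chosen confidence level (an assumed one-sided test).
    The baseline must contain at least two distinct values.
    """
    mu = fmean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    return NormalDist(mu, sigma).inv_cdf(confidence)


def flag_outliers(features: list[float], boundary: float) -> list[int]:
    """Indices of damage features that exceed the confidence boundary."""
    return [i for i, value in enumerate(features) if value > boundary]


# Illustrative values only: residual-based features from a healthy structure.
baseline_features = [1.0, 1.2, 0.9, 1.1, 0.8]
boundary = confidence_boundary(baseline_features)
suspect = flag_outliers([1.0, 2.0, 1.3], boundary)
```

A one-sided boundary is used here because the features are assumed to grow with damage; a two-sided test would instead evaluate the inverse CDF at both tails.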
-
Training-free Lexical Backdoor Attacks on Language Models
Authors:
Yujin Huang,
Terry Yue Zhuo,
Qiongkai Xu,
Han Hu,
Xingliang Yuan,
Chunyang Chen
Abstract:
Large-scale language models have achieved tremendous success across various natural language processing (NLP) applications. Nevertheless, language models are vulnerable to backdoor attacks, which inject stealthy triggers into models for steering them to undesirable behaviors. Most existing backdoor attacks, such as data poisoning, require further (re)training or fine-tuning language models to learn the intended backdoor patterns. The additional training process, however, diminishes the stealthiness of the attacks, as training a language model usually requires long optimization time, a massive amount of data, and considerable modifications to the model parameters. In this work, we propose Training-Free Lexical Backdoor Attack (TFLexAttack) as the first training-free backdoor attack on language models. Our attack is achieved by injecting lexical triggers into the tokenizer of a language model via manipulating its embedding dictionary using carefully designed rules. These rules are explainable to human developers, which inspires attacks from a wider range of hackers. The sparse manipulation of the dictionary also enables the stealthiness of our attack. We conduct extensive experiments on three dominant NLP tasks based on nine language models to demonstrate the effectiveness and universality of our attack. The code of this work is available at https://github.com/Jinxhy/TFLexAttack.
Submitted 8 February, 2023;
originally announced February 2023.
-
Electrically controllable thermal transport in Josephson junctions based on buckled two-dimensional materials
Authors:
Yu-Hao Zhuo,
Biao Wu,
Gang Ouyang,
Hai Li
Abstract:
We investigate the thermal transport properties in superconductor-antiferromagnet-superconductor and superconductor-ferromagnet-superconductor junctions based on buckled two-dimensional materials (BTDMs). Owing to the unique buckled sublattice structures of BTDMs, in both junctions the phase dependence of the thermal conductance can be effectively controlled by perpendicular electric fields. The underlying mechanism for the electrical tunability of thermal conductance is elucidated resorting to the band structures of the magnetic regions. We also reveal the distinct manifestations of antiferromagnetic and ferromagnetic exchange fields in the thermal conductance. These results demonstrate that the perpendicular electric field can serve as a knob to externally manipulate the phase-coherent thermal transport in BTDMs-based Josephson junctions.
Submitted 5 February, 2023;
originally announced February 2023.
-
On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
Authors:
Terry Yue Zhuo,
Zhuang Li,
Yujin Huang,
Fatemeh Shiri,
Weiqing Wang,
Gholamreza Haffari,
Yuan-Fang Li
Abstract:
Semantic parsing is a technique aimed at constructing a structured representation of the meaning of a natural-language question. Recent advancements in few-shot language models trained on code have demonstrated superior performance in generating these representations compared to traditional unimodal language models, which are trained on downstream tasks. Despite these advancements, existing fine-tuned neural semantic parsers are susceptible to adversarial attacks on natural-language inputs. While it has been established that the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios, as it requires both substantial computational resources and expensive human annotation on in-domain semantic parsing data. This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex. Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples. To address this challenge, we propose methods for improving robustness without the need for significant amounts of labeled data or heavy computational resources.
△ Less
Submitted 9 March, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Authors:
Terry Yue Zhuo,
Yujin Huang,
Chunyang Chen,
Zhenchang Xing
Abstract:
Recent breakthroughs in natural language processing (NLP) have permitted the synthesis and comprehension of coherent text in an open-ended way, therefore translating the theoretical algorithms into practical applications. Large language models (LLMs) have significantly impacted businesses such as report summarization software and copywriters. Observations indicate, however, that LLMs may exhibit social prejudice and toxicity, posing ethical and societal dangers of consequences resulting from irresponsibility. Large-scale benchmarks for accountable LLMs should consequently be developed. Although several empirical investigations reveal the existence of a few ethical difficulties in advanced LLMs, there is little systematic examination and user study of the risks and harmful behaviors of current LLM usage. To further inform future efforts on constructing ethical LLMs responsibly, we perform a qualitative research method called "red teaming" on OpenAI's ChatGPT (in this paper, ChatGPT refers to the version released on Dec 15th) to better understand the practical features of ethical dangers in recent LLMs. We analyze ChatGPT comprehensively from four perspectives: 1) Bias, 2) Reliability, 3) Robustness, and 4) Toxicity. In accordance with our stated viewpoints, we empirically benchmark ChatGPT on multiple sample datasets. We find that a significant number of ethical risks cannot be addressed by existing benchmarks, and hence illustrate them via additional case studies. In addition, we examine the implications of our findings on AI ethics and harmful behaviors of ChatGPT, as well as future problems and practical design considerations for responsible LLMs. We believe that our findings may shed light on future efforts to determine and mitigate the ethical hazards posed by machines in LLM applications.
Submitted 29 May, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
SantaCoder: don't reach for the stars!
Authors:
Loubna Ben Allal,
Raymond Li,
Denis Kocetkov,
Chenghao Mou,
Christopher Akiki,
Carlos Munoz Ferrandis,
Niklas Muennighoff,
Mayank Mishra,
Alex Gu,
Manan Dey,
Logesh Kumar Umapathi,
Carolyn Jane Anderson,
Yangtian Zi,
Joel Lamy Poirier,
Hailey Schoelkopf,
Sergey Troshin,
Dmitry Abulkhanov,
Manuel Romero,
Michael Lappert,
Francesco De Toni,
Bernardo García del Río,
Qian Liu,
Shamik Bose,
Urvashi Bhattacharyya,
Terry Yue Zhuo
, et al. (16 additional authors not shown)
Abstract:
The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.
Submitted 24 February, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure Text
Authors:
Xianjun Yang,
Ya Zhuo,
Julia Zuo,
Xinlu Zhang,
Stephen Wilson,
Linda Petzold
Abstract:
Scientific action graphs extraction from materials synthesis procedures is important for reproducible research, machine automation, and material prediction. But the lack of annotated data has hindered progress in this field. We demonstrate an effort to annotate Polycrystalline Materials Synthesis Procedures (PcMSP) from 305 open access scientific articles for the construction of synthesis action graphs. This is a new dataset for material science information extraction that simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations. A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus. We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of entities and relations. Comprehensive experiments validate the effectiveness of several state-of-the-art models for these challenges while leaving large space for improvement. We also perform the error analysis and point out some unique challenges that require further investigation. We will release our annotation scheme, the corpus, and codes to the research community to alleviate the scarcity of labeled data in this domain.
Submitted 22 October, 2022;
originally announced October 2022.
-
ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities
Authors:
Terry Yue Zhuo,
Yaqing Liao,
Yuecheng Lei,
Lizhen Qu,
Gerard de Melo,
Xiaojun Chang,
Yazhou Ren,
Zenglin Xu
Abstract:
We introduce ViLPAct, a novel vision-language benchmark for human activity planning. It is designed for a task where embodied AI agents can reason and forecast future actions of humans based on video clips about their initial activities and intents in text. The dataset consists of 2.9k videos from Charades extended with intents via crowdsourcing, a multi-choice question test set, and four strong baselines. One of the baselines implements a neurosymbolic approach based on a multi-modal knowledge base (MKB), while the other ones are deep generative models adapted from recent state-of-the-art (SOTA) methods. According to our extensive experiments, the key challenges are compositional generalization and effective use of information from both modalities.
Submitted 9 March, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Rethinking Round-Trip Translation for Machine Translation Evaluation
Authors:
Terry Yue Zhuo,
Qiongkai Xu,
Xuanli He,
Trevor Cohn
Abstract:
Automatic evaluation of low-resource language translation suffers from a deficiency of parallel corpora. Round-trip translation can serve as a clever and straightforward technique to alleviate the requirement of a parallel evaluation corpus. However, obscure correlations were observed between the evaluation scores of forward and round-trip translations in the era of statistical machine translation (SMT). In this paper, we report the surprising finding that round-trip translation can be used for automatic evaluation without references. Firstly, our revisit of round-trip translation in SMT evaluation unveils that its long-standing misunderstanding is essentially caused by the copying mechanism. After removing the copying mechanism in SMT, round-trip translation scores can appropriately reflect the forward translation performance. Then, we demonstrate that the rectification is overdue, as round-trip translation could benefit multiple machine translation evaluation tasks. To be more specific, round-trip translation could be used i) to predict corresponding forward translation scores; ii) to improve the performance of the recently advanced quality estimation model; and iii) to identify adversarial competitors in shared tasks via cross-system verification.
Submitted 15 May, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Paraphrasing Techniques for Maritime QA system
Authors:
Fatemeh Shiri,
Terry Yue Zhuo,
Zhuang Li,
Van Nguyen,
Shirui Pan,
Weiqing Wang,
Reza Haffari,
Yuan-Fang Li
Abstract:
There has been an increasing interest in incorporating Artificial Intelligence (AI) into Defence and military systems to complement and augment human intelligence and capabilities. However, much work still needs to be done toward achieving an effective human-machine partnership. This work is aimed at enhancing human-machine communications by developing a capability for automatically translating human natural language into a machine-understandable language (e.g., SQL queries). Techniques toward achieving this goal typically involve building a semantic parser trained on a very large amount of high-quality manually-annotated data. However, in many real-world Defence scenarios, it is not feasible to obtain such a large amount of training data. To the best of our knowledge, there are few works trying to explore the possibility of training a semantic parser with limited manually-paraphrased data, in other words, zero-shot. In this paper, we investigate how to exploit paraphrasing methods for the automated generation of large-scale training datasets (in the form of paraphrased utterances and their corresponding logical forms in SQL format) and present our experimental results using real-world data in the maritime domain.
Submitted 9 March, 2023; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Free-breathing 3D Cardiac T1 Mapping with Transmit B1 Correction at 3T
Authors:
Paul Kyu Han,
Thibault Marin,
Yanis Djebra,
Vanessa Landes,
Yue Zhuo,
Georges El Fakhri,
Chao Ma
Abstract:
Purpose: To develop a cardiac T1 mapping method for free-breathing 3D T1 mapping of the whole heart at 3T with transmit B1 (B1+) correction. Methods: A free-breathing, ECG-gated inversion recovery sequence with spoiled gradient-echo readout was developed and optimized for cardiac T1 mapping at 3T. High-frame rate dynamic images were reconstructed from sparse (k,t)-space data acquired along a stack-of-stars trajectory using a subspace-based method for accelerated imaging. Joint T1 and flip-angle (FA) estimation was performed in T1 mapping to improve its robustness to B1+ inhomogeneity. Subject-specific timing of data acquisition was utilized in the estimation to account for natural heart-rate variations during the imaging experiment. Results: Simulations showed that accuracy and precision of T1 mapping can be improved with joint T1 and FA estimation and an optimized ECG-gated SPGR-based IR acquisition scheme. The phantom study showed good agreement between the T1 maps from the proposed method and the reference method. 3D cardiac T1 maps (40 slices) were obtained at a 1.9 mm in-plane and 4.5 mm through-plane spatial resolution from healthy subjects (n=6) with an average imaging time of 14.2 ± 1.6 min (heartbeat rate: 64.2 ± 7.1 bpm), showing myocardial T1 values comparable to those obtained from MOLLI. The proposed method generated B1+ maps with spatially smooth variation showing 21-32% and 11-15% variations across the septal-lateral and inferior-anterior regions of the myocardium in the left ventricle. Conclusion: The proposed method allows free-breathing 3D T1 mapping of the whole heart with transmit B1 correction in a practical imaging time.
Submitted 15 November, 2021;
originally announced November 2021.
-
Deep learning-based GTV contouring modeling inter- and intra- observer variability in sarcomas
Authors:
Thibault Marin,
Yue Zhuo,
Rita Maria Lahoud,
Fei Tian,
Xiaoyue Ma,
Fangxu Xing,
Maryam Moteabbed,
Xiaofeng Liu,
Kira Grogg,
Nadya Shusharina,
Jonghye Woo,
Chao Ma,
Yen-Lin E. Chen,
Georges El Fakhri
Abstract:
Background and purpose: The delineation of the gross tumor volume (GTV) is a critical step for radiation therapy treatment planning. The delineation procedure is typically performed manually, which exposes two major issues: cost and reproducibility. Delineation is a time-consuming process that is subject to inter- and intra-observer variability. While methods have been proposed to predict GTV contours, typical approaches ignore variability and therefore fail to utilize the valuable confidence information offered by multiple contours. Materials and methods: In this work we propose an automatic GTV contouring method for soft-tissue sarcomas from X-ray computed tomography (CT) images, using deep learning by integrating inter- and intra-observer variability in the learned model. Sixty-eight patients with soft tissue and bone sarcomas were considered in this evaluation, all of whom underwent pre-operative CT imaging used to perform GTV delineation. Four radiation oncologists and radiologists performed three contouring trials each for all patients. We quantify variability by defining confidence levels based on the frequency of inclusion of a given voxel into the GTV and use a deep convolutional neural network to learn GTV confidence maps. Results: Results were compared to confidence maps from the four readers as well as ground-truth consensus contours established jointly by all readers. The resulting continuous Dice score between predicted and true confidence maps was 87% and the Hausdorff distance was 14 mm. Conclusion: Results demonstrate the ability of the proposed method to predict accurate contours while utilizing variability, and as such it can be used to improve clinical workflow.
Submitted 10 October, 2021;
originally announced October 2021.
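The voxel-wise confidence level described in the abstract above (frequency of inclusion of a voxel across repeated contours) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `confidence_map` and the flattened 0/1 mask representation are assumptions made for illustration only.

```python
def confidence_map(masks):
    """Return, per voxel, the fraction of contours that include it.

    `masks` is a list of equally long 0/1 sequences, one per contouring
    trial (a flattened binary GTV mask). This layout is hypothetical.
    """
    n_contours = len(masks)
    # zip(*masks) groups the votes of all contours for each voxel.
    return [sum(votes) / n_contours for votes in zip(*masks)]


# Four hypothetical contours over four voxels.
example_masks = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
example_confidence = confidence_map(example_masks)
```

A value of 1.0 means every contour included the voxel, and 0.0 means none did; intermediate values reflect disagreement between trials.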
-
Optical spectroscopy and ultrafast pump-probe study of a quasi-one-dimensional charge density wave in CuTe
Authors:
R. S. Li,
L. Yue,
Q. Wu,
S. X. Xu,
Q. M. Liu,
Z. X. Wang,
T. C. Hu,
X. Y. Zhuo,
L. Y. Shi,
S. J. Zhang,
D. Wu,
T. Dong,
N. L. Wang
Abstract:
CuTe is a two-dimensional (2D) layered material, yet forming a quasi-one-dimensional (quasi-1D) charge-density-wave (CDW) along the a-axis in the ab-plane at high transition temperature $T_{CDW}=335$ K. However, the anisotropic properties of CuTe remain to be explored. Here we performed combined transport, polarized infrared reflectivity, and ultrafast pump-probe spectroscopy to investigate the underlying CDW physics of CuTe. Polarized optical measurement clearly revealed that an energy gap gradually forms along the a-axis upon cooling, while optical evidence of gap signature is absent along the b-axis, suggesting pronounced electronic anisotropy in this quasi-2D material. Time-resolved optical reflectivity measurement revealed that the amplitude and relaxation time of photo-excited quasiparticles change dramatically across the CDW phase transition. Taking fast Fourier transformation of the oscillation signals arising from collective excitations, we identify the 1.65-THz mode as the CDW amplitude mode, whose energy softens gradually at elevated temperatures. Consequently, we provide further evidence for the formation of completely anisotropic CDW order in CuTe, which is quite rare in quasi-2D materials.
Submitted 1 March, 2022; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Smartphone Data Reveal Neighborhood-Level Racial Disparities in Police Presence
Authors:
M. Keith Chen,
Katherine L. Christensen,
Elicia John,
Emily Owens,
Yilin Zhuo
Abstract:
While extensive, research on policing in America has focused on documented actions such as stops and arrests -- less is known about patrolling and presence. We map the movements of over ten thousand police officers across twenty-one of America's largest cities by combining anonymized smartphone data with station and precinct boundaries. Police spend considerably more time in Black neighborhoods, a disparity which persists after controlling for density, socioeconomics, and crime-driven demand for policing. Our results suggest that roughly half of observed racial disparities in arrests are associated with this exposure disparity, which is lower in cities with more supervisor (but not officer) diversity.
Submitted 9 March, 2022; v1 submitted 26 September, 2021;
originally announced September 2021.
-
Neural-Symbolic Commonsense Reasoner with Relation Predictors
Authors:
Farhad Moghimifar,
Lizhen Qu,
Yue Zhuo,
Gholamreza Haffari,
Mahsa Baktashmotlagh
Abstract:
Commonsense reasoning aims to incorporate sets of commonsense facts, retrieved from Commonsense Knowledge Graphs (CKG), to draw conclusions about ordinary situations. The dynamic nature of commonsense knowledge postulates models capable of performing multi-hop reasoning over new situations. This feature also results in having large-scale sparse Knowledge Graphs, where such a reasoning process is needed to predict relations between new events. However, existing approaches in this area are limited by considering CKGs as a limited set of facts, thus rendering them unfit for reasoning over new unseen situations and events. In this paper, we present a neural-symbolic reasoner, which is capable of reasoning over large-scale dynamic CKGs. The logic rules for reasoning over CKGs are learned during training by our model. In addition to providing interpretable explanations, the learned logic rules help to generalise prediction to newly introduced events. Experimental results on the task of link prediction on CKGs prove the effectiveness of our model by outperforming the state-of-the-art models.
Submitted 14 May, 2021;
originally announced May 2021.
-
Comparing interpretability and explainability for feature selection
Authors:
Jack Dunn,
Luca Mingardi,
Ying Daisy Zhuo
Abstract:
A common approach for feature selection is to examine the variable importance scores for a machine learning model, as a way to understand which features are the most relevant for making predictions. Given the significance of feature selection, it is crucial for the calculated importance scores to reflect reality. Falsely overestimating the importance of irrelevant features can lead to false discoveries, while underestimating the importance of relevant features may lead us to discard important features, resulting in poor model performance. Additionally, black-box models like XGBoost provide state-of-the-art predictive performance, but cannot be easily understood by humans, and thus we rely on variable importance scores or methods for explainability like SHAP to offer insight into their behavior.
In this paper, we investigate the performance of variable importance as a feature selection method across various black-box and interpretable machine learning methods. We compare the ability of CART, Optimal Trees, XGBoost and SHAP to correctly identify the relevant subset of variables across a number of experiments. The results show that regardless of whether we use the native variable importance method or SHAP, XGBoost fails to clearly distinguish between relevant and irrelevant features. On the other hand, the interpretable methods are able to correctly and efficiently identify irrelevant features, and thus offer significantly better performance for feature selection.
Submitted 11 May, 2021;
originally announced May 2021.
-
PyArmadillo: a streamlined linear algebra library for Python
Authors:
Jason Rumengan,
Terry Yue Zhuo,
Conrad Sanderson
Abstract:
PyArmadillo is a linear algebra library for the Python language, with the aim of closely mirroring the programming interface of the widely used Armadillo C++ library, which in turn is deliberately similar to Matlab. PyArmadillo hence facilitates algorithm prototyping with Matlab-like syntax directly in Python, and relatively straightforward conversion of PyArmadillo-based Python code into performant Armadillo-based C++ code. The converted code can be used for purposes such as speeding up Python-based programs in conjunction with pybind11, or the integration of algorithms originally prototyped in Python into larger C++ codebases. PyArmadillo provides objects for matrices and cubes, as well as over 200 associated functions for manipulating data stored in the objects. Integer, floating point and complex numbers are supported. Various matrix factorisations are provided through integration with LAPACK, or one of its high performance drop-in replacements such as Intel MKL or OpenBLAS. PyArmadillo is open-source software, distributed under the Apache 2.0 license; it can be obtained at https://pyarma.sourceforge.io or via the Python Package Index in precompiled form.
Submitted 20 October, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Detecting Racial Bias in Jury Selection
Authors:
Jack Dunn,
Ying Daisy Zhuo
Abstract:
To support the 2019 U.S. Supreme Court case "Flowers v. Mississippi", APM Reports collated historical court records to assess whether the State exhibited a racial bias in striking potential jurors. This analysis used backward stepwise logistic regression to conclude that race was a significant factor; however, this method for selecting relevant features is only a heuristic, and additionally cannot consider interactions between features. We apply Optimal Feature Selection to identify the globally-optimal subset of features and affirm that there is significant evidence of racial bias in the strike decisions. We also use Optimal Classification Trees to segment the juror population into subgroups with similar characteristics and probability of being struck, and find that three of these subgroups exhibit significant racial disparity in strike rate, pinpointing specific areas of bias in the dataset.
Submitted 22 March, 2021;
originally announced March 2021.
-
Interpretable Predictive Maintenance for Hard Drives
Authors:
Maxime Amram,
Jack Dunn,
Jeremy J. Toledano,
Ying Daisy Zhuo
Abstract:
Existing machine learning approaches for data-driven predictive maintenance are usually black boxes that claim high predictive power yet cannot be understood by humans. This limits the ability of humans to use these models to derive insights and understanding of the underlying failure mechanisms, and also limits the degree of confidence that can be placed in such a system to perform well on future data. We consider the task of predicting hard drive failure in a data center using recent algorithms for interpretable machine learning. We demonstrate that these methods provide meaningful insights about short- and long-term drive health, while also maintaining high predictive performance. We also show that these analyses still deliver useful insights even when limited historical data is available, enabling their use in situations where data collection has only recently begun.
Submitted 12 February, 2021;
originally announced February 2021.
-
FedNS: Improving Federated Learning for collaborative image classification on mobile clients
Authors:
Yaoxin Zhuo,
Baoxin Li
Abstract:
Federated Learning (FL) is a paradigm that aims to support loosely connected clients in learning a global model collaboratively with the help of a centralized server. The most popular FL algorithm is Federated Averaging (FedAvg), which is based on taking a weighted average of the client models, with the weights determined largely based on dataset sizes at the clients. In this paper, we propose a new approach, termed Federated Node Selection (FedNS), for the server's global model aggregation in the FL setting. FedNS filters and re-weights the clients' models at the node/kernel level, hence leading to a potentially better global model by fusing the best components of the clients. Using collaborative image classification as an example, we show with experiments from multiple datasets and networks that FedNS can consistently achieve improved performance over FedAvg.
Submitted 20 January, 2021;
originally announced January 2021.
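The FedAvg baseline mentioned in the abstract above, a weighted average of client models with weights based on client dataset sizes, can be sketched as follows. This is only an illustration of that baseline, not of FedNS, whose node-level filtering and re-weighting are not specified in the abstract. The function name `fedavg` and the flat-list parameter layout are assumptions.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    `client_params` is a list of equally long lists of floats (one flat
    parameter vector per client); `client_sizes` holds the number of
    training samples at each client. Both layouts are hypothetical.
    """
    total = float(sum(client_sizes))
    n_params = len(client_params[0])
    return [
        sum(size / total * params[i]
            for params, size in zip(client_params, client_sizes))
        for i in range(n_params)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_params = fedavg([[1.0, 1.0], [3.0, 3.0]], [1, 3])
```

With these inputs the larger client dominates the average, which is the behavior the abstract attributes to dataset-size weighting.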
-
Optimal Policy Trees
Authors:
Maxime Amram,
Jack Dunn,
Ying Daisy Zhuo
Abstract:
We propose an approach for learning optimal tree-based prescription policies directly from data, combining methods for counterfactual estimation from the causal inference literature with recent advances in training globally-optimal decision trees. The resulting method, Optimal Policy Trees, yields interpretable prescription policies, is highly scalable, and handles both discrete and continuous treatments. We conduct extensive experiments on both synthetic and real-world datasets and demonstrate that these trees offer best-in-class performance across a wide variety of problems.
Submitted 3 December, 2020;
originally announced December 2020.
-
COSMO: Conditional SEQ2SEQ-based Mixture Model for Zero-Shot Commonsense Question Answering
Authors:
Farhad Moghimifar,
Lizhen Qu,
Yue Zhuo,
Mahsa Baktashmotlagh,
Gholamreza Haffari
Abstract:
Commonsense reasoning refers to the ability to evaluate a social situation and act accordingly. Identification of the implicit causes and effects of a social context is the driving capability which can enable machines to perform commonsense reasoning. The dynamic world of social interactions requires context-dependent on-demand systems to infer such underlying information. However, current approaches in this realm lack the ability to perform commonsense reasoning upon facing an unseen situation, mostly due to an inability to identify a diverse range of implicit social relations. Hence they fail to estimate the correct reasoning path. In this paper, we present Conditional SEQ2SEQ-based Mixture model (COSMO), which provides us with the capabilities of dynamic and diverse content generation. We use COSMO to generate context-dependent clauses, which form a dynamic Knowledge Graph (KG) on-the-fly for commonsense reasoning. To show the adaptability of our model to context-dependent knowledge generation, we address the task of zero-shot commonsense question answering. The empirical results indicate an improvement of up to +5.2% over the state-of-the-art models.
Submitted 2 November, 2020;
originally announced November 2020.