Cryptography and Security
See recent articles
- [1] arXiv:2409.09143 [pdf, html, other]
-
Title: DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and ClassificationAbdelkader El Mahdaouy, Salima Lamsiyah, Meryem Janati Idrissi, Hamza Alami, Zakaria Yartaoui, Ismail BerradaSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Detecting and classifying suspicious or malicious domain names and URLs is fundamental task in cybersecurity. To leverage such indicators of compromise, cybersecurity vendors and practitioners often maintain and update blacklists of known malicious domains and URLs. However, blacklists frequently fail to identify emerging and obfuscated threats. Over the past few decades, there has been significant interest in developing machine learning models that automatically detect malicious domains and URLs, addressing the limitations of blacklists maintenance and updates. In this paper, we introduce DomURLs_BERT, a pre-trained BERT-based encoder adapted for detecting and classifying suspicious/malicious domains and URLs. DomURLs_BERT is pre-trained using the Masked Language Modeling (MLM) objective on a large multilingual corpus of URLs, domain names, and Domain Generation Algorithms (DGA) dataset. In order to assess the performance of DomURLs_BERT, we have conducted experiments on several binary and multi-class classification tasks involving domain names and URLs, covering phishing, malware, DGA, and DNS tunneling. The evaluations results show that the proposed encoder outperforms state-of-the-art character-based deep learning models and cybersecurity-focused BERT models across multiple tasks and datasets. The pre-training dataset, the pre-trained DomURLs_BERT encoder, and the experiments source code are publicly available.
- [2] arXiv:2409.09174 [pdf, other]
-
Title: Incorporation of Verifier Functionality in the Software for Operations and Network Attack Results Review and the Autonomous Penetration Testing SystemComments: The U.S. federal sponsor has requested that we not include funding acknowledgement for this publicationSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
The software for operations and network attack results review (SONARR) and the autonomous penetration testing system (APTS) use facts and common properties in digital twin networks to represent real-world entities. However, in some cases fact values will change regularly, making it difficult for objects in SONARR and APTS to consistently and accurately represent their real-world counterparts. This paper proposes and evaluates the addition of verifiers, which check real-world conditions and update network facts, to SONARR. This inclusion allows SONARR to retrieve fact values from its executing environment and update its network, providing a consistent method of ensuring that the operations and, therefore, the results align with the real-world systems being assessed. Verifiers allow arbitrary scripts and dynamic arguments to be added to normal SONARR operations. This provides a layer of flexibility and consistency that results in more reliable output from the software.
- [3] arXiv:2409.09175 [pdf, other]
-
Title: Cybersecurity Software Tool Evaluation Using a 'Perfect' Network ModelComments: The U.S. federal sponsor has requested that we not include funding acknowledgement for this publicationSubjects: Cryptography and Security (cs.CR)
Cybersecurity software tool evaluation is difficult due to the inherently adversarial nature of the field. A penetration testing (or offensive) tool must be tested against a viable defensive adversary and a defensive tool must, similarly, be tested against a viable offensive adversary. Characterizing the tool's performance inherently depends on the quality of the adversary, which can vary from test to test. This paper proposes the use of a 'perfect' network, representing computing systems, a network and the attack pathways through it as a methodology to use for testing cybersecurity decision-making tools. This facilitates testing by providing a known and consistent standard for comparison. It also allows testing to include researcher-selected levels of error, noise and uncertainty to evaluate cybersecurity tools under these experimental conditions.
- [4] arXiv:2409.09272 [pdf, html, other]
-
Title: SafeEar: Content Privacy-Preserving Audio Deepfake DetectionComments: Accepted by ACM CCS 2024. Please cite this paper as "Xinfeng Li, Kai Li, Yifan Zheng, Chen Yan, Xiaoyu Ji, Wenyuan Xu. SafeEar: Content Privacy-Preserving Audio Deepfake Detection. In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2024."Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable performance in generating realistic and natural audio. However, their dark side, audio deepfake poses a significant threat to both society and individuals. Existing countermeasures largely focus on determining the genuineness of speech based on complete original audio recordings, which however often contain private content. This oversight may refrain deepfake detection from many applications, particularly in scenarios involving sensitive information like business secrets. In this paper, we propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within. Our key idea is to devise a neural audio codec into a novel decoupling model that well separates the semantic and acoustic information from audio samples, and only use the acoustic information (e.g., prosody and timbre) for deepfake detection. In this way, no semantic content will be exposed to the detector. To overcome the challenge of identifying diverse deepfake audio without semantic clues, we enhance our deepfake detector with real-world codec augmentation. Extensive experiments conducted on four benchmark datasets demonstrate SafeEar's effectiveness in detecting various deepfake techniques with an equal error rate (EER) down to 2.02%. Simultaneously, it shields five-language speech content from being deciphered by both machine and human auditory analysis, demonstrated by word error rates (WERs) all above 93.93% and our user study. Furthermore, our benchmark constructed for anti-deepfake and anti-content recovery evaluation helps provide a basis for future research in the realms of audio privacy preservation and deepfake detection.
- [5] arXiv:2409.09288 [pdf, html, other]
-
Title: Generating API Parameter Security Rules with LLM for API Misuse DetectionSubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
In this paper, we present a new framework, named GPTAid, for automatic APSRs generation by analyzing API source code with LLM and detecting API misuse caused by incorrect parameter use. To validate the correctness of the LLM-generated APSRs, we propose an execution feedback-checking approach based on the observation that security-critical API misuse is often caused by APSRs violations, and most of them result in runtime errors. Specifically, GPTAid first uses LLM to generate raw APSRs and the Right calling code, and then generates Violation code for each raw APSR by modifying the Right calling code using LLM. Subsequently, GPTAid performs dynamic execution on each piece of Violation code and further filters out the incorrect APSRs based on runtime errors. To further generate concrete APSRs, GPTAid employs a code differential analysis to refine the filtered ones. Particularly, as the programming language is more precise than natural language, GPTAid identifies the key operations within Violation code by differential analysis, and then generates the corresponding concrete APSR based on the aforementioned operations. These concrete APSRs could be precisely interpreted into applicable detection code, which proven to be effective in API misuse detection. Implementing on the dataset containing 200 randomly selected APIs from eight popular libraries, GPTAid achieves a precision of 92.3%. Moreover, it generates 6 times more APSRs than state-of-the-art detectors on a comparison dataset of previously reported bugs and APSRs. We further evaluated GPTAid on 47 applications, 210 unknown security bugs were found potentially resulting in severe security issues (e.g., system crashes), 150 of which have been confirmed by developers after our reports.
- [6] arXiv:2409.09356 [pdf, html, other]
-
Title: Towards Robust Detection of Open Source Software Supply Chain Poisoning Attacks in Industry EnvironmentsXinyi Zheng, Chen Wei, Shenao Wang, Yanjie Zhao, Peiming Gao, Yuanchao Zhang, Kailong Wang, Haoyu WangComments: To appear in the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE'24 Industry Showcase), October 27-November 1, 2024, Sacramento, CA, USASubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
The exponential growth of open-source package ecosystems, particularly NPM and PyPI, has led to an alarming increase in software supply chain poisoning attacks. Existing static analysis methods struggle with high false positive rates and are easily thwarted by obfuscation and dynamic code execution techniques. While dynamic analysis approaches offer improvements, they often suffer from capturing non-package behaviors and employing simplistic testing strategies that fail to trigger sophisticated malicious behaviors. To address these challenges, we present OSCAR, a robust dynamic code poisoning detection pipeline for NPM and PyPI ecosystems. OSCAR fully executes packages in a sandbox environment, employs fuzz testing on exported functions and classes, and implements aspect-based behavior monitoring with tailored API hook points. We evaluate OSCAR against six existing tools using a comprehensive benchmark dataset of real-world malicious and benign packages. OSCAR achieves an F1 score of 0.95 in NPM and 0.91 in PyPI, confirming that OSCAR is as effective as the current state-of-the-art technologies. Furthermore, for benign packages exhibiting characteristics typical of malicious packages, OSCAR reduces the false positive rate by an average of 32.06% in NPM (from 34.63% to 2.57%) and 39.87% in PyPI (from 41.10% to 1.23%), compared to other tools, significantly reducing the workload of manual reviews in real-world deployments. In cooperation with Ant Group, a leading financial technology company, we have deployed OSCAR on its NPM and PyPI mirrors since January 2023, identifying 10,404 malicious NPM packages and 1,235 malicious PyPI packages over 18 months. This work not only bridges the gap between academic research and industrial application in code poisoning detection but also provides a robust and practical solution that has been thoroughly tested in a real-world industrial setting.
- [7] arXiv:2409.09368 [pdf, html, other]
-
Title: Models Are Codes: Towards Measuring Malicious Code Poisoning Attacks on Pre-trained Model HubsJian Zhao, Shenao Wang, Yanjie Zhao, Xinyi Hou, Kailong Wang, Peiming Gao, Yuanchao Zhang, Chen Wei, Haoyu WangComments: To appear in the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE'24), October 27-November 1, 2024, Sacramento, CA, USASubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
The proliferation of pre-trained models (PTMs) and datasets has led to the emergence of centralized model hubs like Hugging Face, which facilitate collaborative development and reuse. However, recent security reports have uncovered vulnerabilities and instances of malicious attacks within these platforms, highlighting growing security concerns. This paper presents the first systematic study of malicious code poisoning attacks on pre-trained model hubs, focusing on the Hugging Face platform. We conduct a comprehensive threat analysis, develop a taxonomy of model formats, and perform root cause analysis of vulnerable formats. While existing tools like Fickling and ModelScan offer some protection, they face limitations in semantic-level analysis and comprehensive threat detection. To address these challenges, we propose MalHug, an end-to-end pipeline tailored for Hugging Face that combines dataset loading script extraction, model deserialization, in-depth taint analysis, and heuristic pattern matching to detect and classify malicious code poisoning attacks in datasets and models. In collaboration with Ant Group, a leading financial technology company, we have implemented and deployed MalHug on a mirrored Hugging Face instance within their infrastructure, where it has been operational for over three months. During this period, MalHug has monitored more than 705K models and 176K datasets, uncovering 91 malicious models and 9 malicious dataset loading scripts. These findings reveal a range of security threats, including reverse shell, browser credential theft, and system reconnaissance. This work not only bridges a critical gap in understanding the security of the PTM supply chain but also provides a practical, industry-tested solution for enhancing the security of pre-trained model hubs.
- [8] arXiv:2409.09380 [pdf, html, other]
-
Title: The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse DetectionSubjects: Cryptography and Security (cs.CR)
In this paper, we propose an LLM-empowered RM-API misuse detection solution, ChatDetector, which fully automates LLMs for documentation understanding which helps RM-API constraints retrieval and RM-API misuse detection. To correctly retrieve the RM-API constraints, ChatDetector is inspired by the ReAct framework which is optimized based on Chain-of-Thought (CoT) to decompose the complex task into allocation APIs identification, RM-object (allocated/released by RM APIs) extraction and RM-APIs pairing (RM APIs usually exist in pairs). It first verifies the semantics of allocation APIs based on the retrieved RM sentences from API documentation through LLMs. Inspired by the LLMs' performance on various prompting methods,ChatDetector adopts a two-dimensional prompting approach for cross-validation. At the same time, an inconsistency-checking approach between the LLMs' output and the reasoning process is adopted for the allocation APIs confirmation with an off-the-shelf Natural Language Processing (NLP) tool. To accurately pair the RM-APIs, ChatDetector decomposes the task again and identifies the RM-object type first, with which it can then accurately pair the releasing APIs and further construct the RM-API constraints for misuse detection. With the diminished hallucinations, ChatDetector identifies 165 pairs of RM-APIs with a precision of 98.21% compared with the state-of-the-art API detectors. By employing a static detector CodeQL, we ethically report 115 security bugs on the applications integrating on six popular libraries to the developers, which may result in severe issues, such as Denial-of-Services (DoS) and memory corruption. Compared with the end-to-end benchmark method, the result shows that ChatDetector can retrieve at least 47% more RM sentences and 80.85% more RM-API constraints.
- [9] arXiv:2409.09428 [pdf, html, other]
-
Title: Harnessing Lightweight Ciphers for PDF EncryptionSubjects: Cryptography and Security (cs.CR)
Portable Document Format (PDF) is a file format which is used worldwide as de-facto standard for exchanging documents. In fact this document that you are currently reading has been uploaded as a PDF. Confidential information is also exchanged through PDFs. According to PDF standard ISO 3000-2:2020, PDF supports encryption to provide confidentiality of the information contained in it along with digital signatures to ensure authenticity. At present, PDF encryption only supports Advanced Encryption Standard (AES) to encrypt and decrypt information. However, Lightweight Cryptography, which is referred to as crypto for resource constrained environments has gained lot of popularity specially due to the NIST Lightweight Cryptography (LWC) competition announced in 2018 for which ASCON was announced as the winner in February 2023. The current work constitutes the first attempt to benchmark Java implementations of NIST LWC winner ASCON and finalist XOODYAK against the current PDF encryption standard AES. Our research reveals that ASCON emerges as a clear winner with regards to throughput when profiled using two state-of-the-art benchmarking tools YourKit and JMH.
- [10] arXiv:2409.09481 [pdf, html, other]
-
Title: Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation MechanismsSuparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid VerbauwhedeSubjects: Cryptography and Security (cs.CR)
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers to construct different cryptographic schemes.
In this work, we explore various design choices of lattice-based cryptography and their impact on performance in the real world. In particular, we propose a suite of key-encapsulation mechanisms based on the learning with rounding problem with a focus on improving different performance aspects of lattice-based cryptography. Our suite consists of three schemes. Our first scheme is Florete, which is designed for efficiency. The second scheme is Espada, which is aimed at improving parallelization, flexibility, and memory footprint. The last scheme is Sable, which can be considered an improved version in terms of key sizes and parameters of the Saber key-encapsulation mechanism, one of the finalists in the National Institute of Standards and Technology's post-quantum standardization procedure. In this work, we have described our design rationale behind each scheme. Further, to demonstrate the justification of our design decisions, we have provided software and hardware implementations. Our results show Florete is faster than most state-of-the-art KEMs on software and hardware platforms. The scheme Espada requires less memory and area than the implementation of most state-of-the-art schemes. The implementations of Sable maintain a trade-off between Florete and Espada regarding performance and memory requirements on the hardware and software platform. - [11] arXiv:2409.09493 [pdf, other]
-
Title: Hacking, The Lazy Way: LLM Augmented PentestingComments: 9 pages, 7 figuresSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Security researchers are continually challenged by the need to stay current with rapidly evolving cybersecurity research, tools, and techniques. This constant cycle of learning, unlearning, and relearning, combined with the repetitive tasks of sifting through documentation and analyzing data, often hinders productivity and innovation. This has led to a disparity where only organizations with substantial resources can access top-tier security experts, while others rely on firms with less skilled researchers who focus primarily on compliance rather than actual security.
We introduce "LLM Augmented Pentesting," demonstrated through a tool named "Pentest Copilot," to address this gap. This approach integrates Large Language Models into penetration testing workflows. Our research includes a "chain of thought" mechanism to streamline token usage and boost performance, as well as unique Retrieval Augmented Generation implementation to minimize hallucinations and keep models aligned with the latest techniques. Additionally, we propose a novel file analysis approach, enabling LLMs to understand files. Furthermore, we highlight a unique infrastructure system that supports if implemented, can support in-browser assisted penetration testing, offering a robust platform for cybersecurity professionals, These advancements mark a significant step toward bridging the gap between automated tools and human expertise, offering a powerful solution to the challenges faced by modern cybersecurity teams. - [12] arXiv:2409.09495 [pdf, html, other]
-
Title: Protecting Vehicle Location Privacy with Contextually-Driven Synthetic Location GenerationComments: SIGSPATIAL 2024Subjects: Cryptography and Security (cs.CR)
Geo-obfuscation is a Location Privacy Protection Mechanism used in location-based services that allows users to report obfuscated locations instead of exact ones. A formal privacy criterion, geoindistinguishability (Geo-Ind), requires real locations to be hard to distinguish from nearby locations (by attackers) based on their obfuscated representations. However, Geo-Ind often fails to consider context, such as road networks and vehicle traffic conditions, making it less effective in protecting the location privacy of vehicles, of which the mobility are heavily influenced by these factors.
In this paper, we introduce VehiTrack, a new threat model to demonstrate the vulnerability of Geo-Ind in protecting vehicle location privacy from context-aware inference attacks. Our experiments demonstrate that VehiTrack can accurately determine exact vehicle locations from obfuscated data, reducing average inference errors by 61.20% with Laplacian noise and 47.35% with linear programming (LP) compared to traditional Bayesian attacks. By using contextual data like road networks and traffic flow, VehiTrack effectively eliminates a significant number of seemingly "impossible" locations during its search for the actual location of the vehicles. Based on these insights, we propose TransProtect, a new geo-obfuscation approach that limits obfuscation to realistic vehicle movement patterns, complicating attackers' ability to differentiate obfuscated from actual locations. Our results show that TransProtect increases VehiTrack's inference error by 57.75% with Laplacian noise and 27.21% with LP, significantly enhancing protection against these attacks. - [13] arXiv:2409.09517 [pdf, html, other]
-
Title: Deep Learning Under Siege: Identifying Security Vulnerabilities and Risk Mitigation StrategiesComments: 10 pages, 1 table, 6 equations/metricsJournal-ref: Proceedings of the 20th International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP) 2024, Matsue JapanSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
With the rise in the wholesale adoption of Deep Learning (DL) models in nearly all aspects of society, a unique set of challenges is imposed. Primarily centered around the architectures of these models, these risks pose a significant challenge, and addressing these challenges is key to their successful implementation and usage in the future. In this research, we present the security challenges associated with the current DL models deployed into production, as well as anticipate the challenges of future DL technologies based on the advancements in computing, AI, and hardware technologies. In addition, we propose risk mitigation techniques to inhibit these challenges and provide metrical evaluations to measure the effectiveness of these metrics.
- [14] arXiv:2409.09558 [pdf, html, other]
-
Title: A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation and Blackwell's TheoremComments: To appear in Annual Review of Statistics and Its ApplicationSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Differential privacy is widely considered the formal privacy for privacy-preserving data analysis due to its robust and rigorous guarantees, with increasingly broad adoption in public services, academia, and industry. Despite originating in the cryptographic context, in this review paper we argue that, fundamentally, differential privacy can be considered a \textit{pure} statistical concept. By leveraging a theorem due to David Blackwell, our focus is to demonstrate that the definition of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but also the right language for reasoning about differential privacy. This insight leads to the definition of $f$-differential privacy, which extends other differential privacy definitions through a representation theorem. We review techniques that render $f$-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this differential privacy definition to private deep learning, private convex optimization, shuffled mechanisms, and U.S.~Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared to existing alternatives.
- [15] arXiv:2409.09602 [pdf, html, other]
-
Title: Security Testbed for Preempting Attacks against Supercomputing InfrastructureComments: Accepted to the Third Annual Workshop on Cyber Security in High-Performance Computing (S-HPC 24)Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Preempting attacks targeting supercomputing systems before damage remains the top security priority. The main challenge is that noisy attack attempts and unreliable alerts often mask real attacks, causing permanent damages such as system integrity violations and data breaches. This paper describes a security testbed embedded in live traffic of a supercomputer at the National Center for Supercomputing Applications (NCSA). The objective is to demonstrate attack preemption, i.e., stopping system compromise and data breaches at petascale supercomputers. Deployment of our testbed at NCSA enables the following key contributions:
1) Insights from characterizing unique attack patterns found in real security logs of over 200 security incidents curated in the past two decades at NCSA.
2) Deployment of an attack visualization tool to illustrate the challenges of identifying real attacks in HPC environments and to support security operators in interactive attack analyses.
3) Demonstrate the testbed's utility by running novel models, such as Factor Graph-Based models, to preempt a real-world ransomware family. - [16] arXiv:2409.09606 [pdf, html, other]
-
Title: BULKHEAD: Secure, Scalable, and Efficient Kernel Compartmentalization with PKSComments: Accepted to appear in NDSS'25Subjects: Cryptography and Security (cs.CR); Operating Systems (cs.OS)
The endless stream of vulnerabilities urgently calls for principled mitigation to confine the effect of exploitation. However, the monolithic architecture of commodity OS kernels, like the Linux kernel, allows an attacker to compromise the entire system by exploiting a vulnerability in any kernel component. Kernel compartmentalization is a promising approach that follows the least-privilege principle. However, existing mechanisms struggle with the trade-off on security, scalability, and performance, given the challenges stemming from mutual untrustworthiness among numerous and complex components.
In this paper, we present BULKHEAD, a secure, scalable, and efficient kernel compartmentalization technique that offers bi-directional isolation for unlimited compartments. It leverages Intel's new hardware feature PKS to isolate data and code into mutually untrusted compartments and benefits from its fast compartment switching. With untrust in mind, BULKHEAD introduces a lightweight in-kernel monitor that enforces multiple important security invariants, including data integrity, execute-only memory, and compartment interface integrity. In addition, it provides a locality-aware two-level scheme that scales to unlimited compartments. We implement a prototype system on Linux v6.1 to compartmentalize loadable kernel modules (LKMs). Extensive evaluation confirms the effectiveness of our approach. As the system-wide impacts, BULKHEAD incurs an average performance overhead of 2.44% for real-world applications with 160 compartmentalized LKMs. While focusing on a specific compartment, ApacheBench tests on ipv6 show an overhead of less than 2%. Moreover, the performance is almost unaffected by the number of compartments, which makes it highly scalable. - [17] arXiv:2409.09676 [pdf, other]
-
Title: Nebula: Efficient, Private and Accurate Histogram EstimationSubjects: Cryptography and Security (cs.CR)
We present Nebula, a system for differential private histogram estimation of data distributed among clients. Nebula enables clients to locally subsample and encode their data such that an untrusted server learns only data values that meet an aggregation threshold to satisfy differential privacy guarantees. Compared with other private histogram estimation systems, Nebula uniquely achieves all of the following: \textit{i)} a strict upper bound on privacy leakage; \textit{ii)} client privacy under realistic trust assumptions; \textit{iii)} significantly better utility compared to standard local differential privacy systems; and \textit{iv)} avoiding trusted third-parties, multi-party computation, or trusted hardware. We provide both a formal evaluation of Nebula's privacy, utility and efficiency guarantees, along with an empirical evaluation on three real-world datasets. We demonstrate that clients can encode and upload their data efficiently (only 0.0058 seconds running time and 0.0027 MB data communication) and privately (strong differential privacy guarantees $\varepsilon=1$). On the United States Census dataset, the Nebula's untrusted aggregation server estimates histograms with above 88\% better utility than the existing local deployment of differential privacy. Additionally, we describe a variant that allows clients to submit multi-dimensional data, with similar privacy, utility, and performance. Finally, we provide an open source implementation of Nebula.
- [18] arXiv:2409.09739 [pdf, html, other]
-
Title: PersonaMark: Personalized LLM watermarking for model protection and user attributionYuehan Zhang, Peizhuo Lv, Yinpeng Liu, Yongqiang Ma, Wei Lu, Xiaofeng Wang, Xiaozhong Liu, Jiawei LiuComments: Under reviewSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
The rapid development of LLMs brings both convenience and potential threats. As costumed and private LLMs are widely applied, model copyright protection has become important. Text watermarking is emerging as a promising solution to AI-generated text detection and model protection issues. However, current text watermarks have largely ignored the critical need for injecting different watermarks for different users, which could help attribute the watermark to a specific individual. In this paper, we explore the personalized text watermarking scheme for LLM copyright protection and other scenarios, ensuring accountability and traceability in content generation. Specifically, we propose a novel text watermarking method PersonaMark that utilizes sentence structure as the hidden medium for the watermark information and optimizes the sentence-level generation algorithm to minimize disruption to the model's natural generation process. By employing a personalized hashing function to inject unique watermark signals for different users, personalized watermarked text can be obtained. Since our approach performs on sentence level instead of token probability, the text quality is highly preserved. The injection process of unique watermark signals for different users is time-efficient for a large number of users with the designed multi-user hashing function. As far as we know, we achieved personalized text watermarking for the first time through this. We conduct an extensive evaluation of four different LLMs in terms of perplexity, sentiment polarity, alignment, readability, etc. The results demonstrate that our method maintains performance with minimal perturbation to the model's behavior, allows for unbiased insertion of watermark information, and exhibits strong watermark recognition capabilities.
- [19] arXiv:2409.09744 [pdf, html, other]
-
Title: Taming the Ransomware Threats: Leveraging Prospect Theory for Rational Payment DecisionsSubjects: Cryptography and Security (cs.CR)
Day by day, the frequency of ransomware attacks on organizations is experiencing a significant surge. High-profile incidents involving major entities like Las Vegas giants MGM Resorts, Caesar Entertainment, and Boeing underscore the profound impact, posing substantial business barriers. When a sudden cyberattack occurs, organizations often find themselves at a loss, with a looming countdown to pay the ransom, leading to a cascade of impromptu and unfavourable decisions. This paper adopts a novel approach, leveraging Prospect Theory, to elucidate the tactics employed by cyber attackers to entice organizations into paying the ransom. Furthermore, it introduces an algorithm based on Prospect Theory and an Attack Recovery Plan, enabling organizations to make informed decisions on whether to consent to the ransom demands or resist. This algorithm Ransomware Risk Analysis and Decision Support (RADS) uses Prospect Theory to re-instantiate the shifted reference manipulated as perceived gains by attackers and adjusts for the framing effect created due to time urgency. Additionally, leveraging application criticality and incorporating Prospect Theory's insights into under/over weighing probabilities, RADS facilitates informed decision-making that transcends the simplistic framework of "consent" or "resistance," enabling organizations to achieve optimal decisions.
- [20] arXiv:2409.09794 [pdf, html, other]
-
Title: Federated Learning in Adversarial Environments: Testbed Design and Poisoning Resilience in CybersecurityComments: 7 pages, 4 figuresSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This paper presents the design and implementation of a Federated Learning (FL) testbed, focusing on its application in cybersecurity and evaluating its resilience against poisoning attacks. Federated Learning allows multiple clients to collaboratively train a global model while keeping their data decentralized, addressing critical needs for data privacy and security, particularly in sensitive fields like cybersecurity. Our testbed, built using the Flower framework, facilitates experimentation with various FL frameworks, assessing their performance, scalability, and ease of integration. Through a case study on federated intrusion detection systems, we demonstrate the testbed's capabilities in detecting anomalies and securing critical infrastructure without exposing sensitive network data. Comprehensive poisoning tests, targeting both model and data integrity, evaluate the system's robustness under adversarial conditions. Our results show that while federated learning enhances data privacy and distributed learning, it remains vulnerable to poisoning attacks, which must be mitigated to ensure its reliability in real-world applications.
- [21] arXiv:2409.09860 [pdf, html, other]
-
Title: Revisiting Physical-World Adversarial Attack on Traffic Sign Recognition: A Commercial Systems PerspectiveComments: Accepted by NDSS 2025Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Traffic Sign Recognition (TSR) is crucial for safe and correct driving automation. Recent works revealed a general vulnerability of TSR models to physical-world adversarial attacks, which can be low-cost, highly deployable, and capable of causing severe attack effects such as hiding a critical traffic sign or spoofing a fake one. However, so far existing works generally only considered evaluating the attack effects on academic TSR models, leaving the impacts of such attacks on real-world commercial TSR systems largely unclear. In this paper, we conduct the first large-scale measurement of physical-world adversarial attacks against commercial TSR systems. Our testing results reveal that it is possible for existing attack works from academia to have highly reliable (100\%) attack success against certain commercial TSR system functionality, but such attack capabilities are not generalizable, leading to much lower-than-expected attack success rates overall. We find that one potential major factor is a spatial memorization design that commonly exists in today's commercial TSR systems. We design new attack success metrics that can mathematically model the impacts of such design on the TSR system-level attack success, and use them to revisit existing attacks. Through these efforts, we uncover 7 novel observations, some of which directly challenge the observations or claims in prior works due to the introduction of the new metrics.
- [22] arXiv:2409.09928 [pdf, html, other]
-
Title: High-Security Hardware Module with PUF and Hybrid Cryptography for Data SecuritySubjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
This research highlights the rapid development of technology in the industry, particularly Industry 4.0, supported by fundamental technologies such as the Internet of Things (IoT), cloud computing, big data, and data analysis. Despite providing efficiency, these developments also bring negative impacts, such as increased cyber-attacks, especially in manufacturing. One standard attack in the industry is the man-in-the-middle (MITM) attack, which can have severe consequences for the physical data transfer, particularly on the integrity of sensor and actuator data in industrial machines. This research proposes a solution by developing a hardware security module (HSM) using a field-programmable gate array (FPGA) with physical unclonable function (PUF) authentication and a hybrid encryption data security system. Experimental results show that this research improves some criteria in industrial cybersecurity, ensuring critical data security from cyber-attacks in industrial machines.
- [23] arXiv:2409.09948 [pdf, html, other]
-
Title: Enhancing Industrial Cybersecurity: SoftHSM Implementation on SBCs for Mitigating MITM AttacksSubjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
The rapid growth of industrial technology, driven by automation, IoT, and cloud computing, has also increased the risk of cyberattacks, such as Man-in-the-Middle (MITM) attacks. A standard solution to protect data is using a Hardware Security Module (HSM), but its high implementation cost has led to the development of a more affordable alternative: SoftHSM. This software-based module manages encryption and decryption keys using cryptographic algorithms. This study simulates the use of SoftHSM on a single-board computer (SBC) to enhance industrial system security and cost-effectively mitigate MITM attacks. The security system integrates AES and RSA cryptographic algorithms, with SoftHSM handling RSA key storage. The results show that HSM protects RSA private keys from extraction attempts, ensuring data security. In terms of performance, the system achieved an average encryption time of 3.29 seconds, a slot access time of 0.018 seconds, and a decryption time of 2.558 seconds. It also demonstrated efficient memory usage, with 37.24% for encryption and 24.24% for decryption, while consuming 5.20 V and 0.72 A during processing.
- [24] arXiv:2409.09996 [pdf, html, other]
-
Title: FreeMark: A Non-Invasive White-Box Watermarking for Deep Neural NetworksSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Deep neural networks (DNNs) have achieved significant success in real-world applications. However, safeguarding their intellectual property (IP) remains extremely challenging. Existing DNN watermarking for IP protection often require modifying DNN models, which reduces model performance and limits their practicality.
This paper introduces FreeMark, a novel DNN watermarking framework that leverages cryptographic principles without altering the original host DNN model, thereby avoiding any reduction in model performance. Unlike traditional DNN watermarking methods, FreeMark innovatively generates secret keys from a pre-generated watermark vector and the host model using gradient descent. These secret keys, used to extract watermark from the model's activation values, are securely stored with a trusted third party, enabling reliable watermark extraction from suspect models. Extensive experiments demonstrate that FreeMark effectively resists various watermark removal attacks while maintaining high watermark capacity. - [25] arXiv:2409.10031 [pdf, html, other]
-
Title: Assessing the Impact of Sanctions in the Crypto Ecosystem: Effective Measures or Ineffective Deterrents?Comments: preprint version of paper presented at 8th International Workshop on Cryptocurrencies and Blockchain Technology - CBT 2024 and published in LNCS ProceedingsSubjects: Cryptography and Security (cs.CR); Computational Engineering, Finance, and Science (cs.CE)
Regulatory authorities aim to tackle illegal activities by targeting the economic incentives that drive such behaviour. This is typically achieved through the implementation of financial sanctions against the entities involved in the crimes. However, the rise of cryptocurrencies has presented new challenges, allowing entities to evade these sanctions and continue criminal operations. Consequently, enforcement measures have been expanded to include crypto assets information of sanctioned entities. Yet, due to the nature of the crypto ecosystem, blocking or freezing these digital assets is harder and, in some cases, such as with Bitcoin, unfeasible. Therefore, sanctions serve merely as deterrents. For this reason, in this study, we aim to assess the impact of these sanctions on entities' crypto activities, particularly those related to the Bitcoin ecosystem. Our objective is to shed light on the validity and effectiveness (or lack thereof) of such countermeasures. Specifically, we analyse the transactions and the amount of USD moved by punished entities that possess crypto addresses after being sanctioned by the authority agency. Results indicate that while sanctions have been effective for half of the examined entities, the others continue to move funds through sanctioned addresses. Furthermore, punished entities demonstrate a preference for utilising rapid exchange services to convert their funds, rather than employing dedicated money laundering services. To the best of our knowledge, this study offers valuable insights into how entities use crypto assets to circumvent sanctions.
- [26] arXiv:2409.10057 [pdf, html, other]
-
Title: A Response to: A Note on "Privacy Preserving n-Party Scalar Product Protocol"Subjects: Cryptography and Security (cs.CR)
We reply to the comments on our proposed privacy preserving n-party scalar product protocol made by Liu. In their comment Liu raised concerns regarding the security and scalability of the $n$-party scalar product protocol. In this reply, we show that their concerns are unfounded and that the $n$-party scalar product protocol is safe for its intended purposes. Their concerns regarding the security are based on a misunderstanding of the protocol. Additionally, while the scalability of the protocol puts limitations on its use, the protocol still has numerous practical applications when applied in the correct scenarios. Specifically within vertically partitioned scenarios, which often involve few parties, the protocol remains practical. In this reply we clarify Liu's misunderstanding. Additionally, we explain why the protocols scaling is not a practical problem in its intended application.
- [27] arXiv:2409.10109 [pdf, html, other]
-
Title: Analysing Attacks on Blockchain Systems in a Layer-based ApproachSubjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET)
Blockchain is a growing decentralized system built for transparency and immutability. There have been several major attacks on blockchain-based systems, leaving a gap in the trustability of this system. This article presents a comprehensive study of 23 attacks on blockchain systems and categorizes them using a layer-based approach. This approach provides an in-depth analysis of the feasibility and motivation of these attacks. In addition, a framework is proposed that enables a systematic analysis of the impact and interconnection of these attacks, thereby providing a means of identifying potential attack vectors and designing appropriate countermeasures to strengthen any blockchain system.
- [28] arXiv:2409.10192 [pdf, other]
-
Title: PrePaMS: Privacy-Preserving Participant Management System for Studies with Rewards and PrerequisitesSubjects: Cryptography and Security (cs.CR)
Taking part in surveys, experiments, and studies is often compensated by rewards to increase the number of participants and encourage attendance. While privacy requirements are usually considered for participation, privacy aspects of the reward procedure are mostly ignored. To this end, we introduce PrePaMS, an efficient participation management system that supports prerequisite checks and participation rewards in a privacy-preserving way. Our system organizes participations with potential (dis-)qualifying dependencies and enables secure reward payoffs. By leveraging a set of proven cryptographic primitives and mechanisms such as anonymous credentials and zero-knowledge proofs, participations are protected so that service providers and organizers cannot derive the identity of participants even within the reward process. In this paper, we have designed and implemented a prototype of PrePaMS to show its effectiveness and evaluated its performance under realistic workloads. PrePaMS covers the information whether subjects have participated in surveys, experiments, or studies. When combined with other secure solutions for the actual data collection within these events, PrePaMS can represent a cornerstone for more privacy-preserving empirical research.
- [29] arXiv:2409.10336 [pdf, other]
-
Title: Execution-time opacity control for timed automataComments: This is the author (and extended) version of the manuscript of the same name published in the proceedings of the 22nd International Conference on Software Engineering and Formal Methods ({SEFM} 2024)Subjects: Cryptography and Security (cs.CR)
Timing leaks in timed automata (TA) can occur whenever an attacker is able to deduce a secret by observing some timed behavior. In execution-time opacity, the attacker aims at deducing whether a private location was visited, by observing only the execution time. It can be decided whether a TA is opaque in this setting. In this work, we tackle control, and show that we are able to decide whether a TA can be controlled at runtime to ensure opacity. Our method is constructive, in the sense that we can exhibit such a controller. We also address the case when the attacker cannot have an infinite precision in its observations.
- [30] arXiv:2409.10337 [pdf, html, other]
-
Title: Security, Trust and Privacy challenges in AI-driven 6G NetworksComments: 19 pages, 6 tablesJournal-ref: CS & IT Conference Proceedings (CSCP 2024) pp.95-113Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
The advent of 6G networks promises unprecedented advancements in wireless communication, offering wider bandwidth and lower latency compared to its predecessors. This article explores the evolving infrastructure of 6G networks, emphasizing the transition towards a more disaggregated structure and the integration of artificial intelligence (AI) technologies. Furthermore, it explores the security, trust and privacy challenges and attacks in 6G networks, particularly those related to the use of AI. It presents a classification of network attacks stemming from its AI-centric architecture and explores technologies designed to detect or mitigate these emerging threats. The paper concludes by examining the implications and risks linked to the utilization of AI in ensuring a robust network.
- [31] arXiv:2409.10411 [pdf, html, other]
-
Title: A Large-Scale Privacy Assessment of Android Third-Party SDKsMark Huasong Meng, Chuan Yan, Yun Hao, Qing Zhang, Zeyu Wang, Kailong Wang, Sin Gee Teo, Guangdong Bai, Jin Song DongComments: 16 pagesSubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Third-party Software Development Kits (SDKs) are widely adopted in Android app development, to effortlessly accelerate development pipelines and enhance app functionality. However, this convenience raises substantial concerns about unauthorized access to users' privacy-sensitive information, which could be further abused for illegitimate purposes like user tracking or monetization. Our study offers a targeted analysis of user privacy protection among Android third-party SDKs, filling a critical gap in the Android software supply chain. It focuses on two aspects of their privacy practices, including data exfiltration and behavior-policy compliance (or privacy compliance), utilizing techniques of taint analysis and large language models. It covers 158 widely-used SDKs from two key SDK release platforms, the official one and a large alternative one. From them, we identified 338 instances of privacy data exfiltration. On the privacy compliance, our study reveals that more than 30% of the examined SDKs fail to provide a privacy policy to disclose their data handling practices. Among those that provide privacy policies, 37% of them over-collect user data, and 88% falsely claim access to sensitive data. We revisit the latest versions of the SDKs after 12 months. Our analysis demonstrates a persistent lack of improvement in these concerning trends. Based on our findings, we propose three actionable recommendations to mitigate the privacy leakage risks and enhance privacy protection for Android users. Our research not only serves as an urgent call for industry attention but also provides crucial insights for future regulatory interventions.
New submissions for Tuesday, 17 September 2024 (showing 31 of 31 entries )
- [32] arXiv:2409.09406 (cross-list from cs.CV) [pdf, html, other]
-
Title: Real-world Adversarial Defense against Patch Attacks based on Diffusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Adversarial patches present significant challenges to the robustness of deep learning models, making the development of effective defenses become critical for real-world applications. This paper introduces DIFFender, a novel DIFfusion-based DeFender framework that leverages the power of a text-guided diffusion model to counter adversarial patch attacks. At the core of our approach is the discovery of the Adversarial Anomaly Perception (AAP) phenomenon, which enables the diffusion model to accurately detect and locate adversarial patches by analyzing distributional anomalies. DIFFender seamlessly integrates the tasks of patch localization and restoration within a unified diffusion model framework, enhancing defense efficacy through their close interaction. Additionally, DIFFender employs an efficient few-shot prompt-tuning algorithm, facilitating the adaptation of the pre-trained diffusion model to defense tasks without the need for extensive retraining. Our comprehensive evaluation, covering image classification and face recognition tasks, as well as real-world scenarios, demonstrates DIFFender's robust performance against adversarial attacks. The framework's versatility and generalizability across various settings, classifiers, and attack methodologies mark a significant advancement in adversarial patch defense strategies. Except for the popular visible domain, we have identified another advantage of DIFFender: its capability to easily expand into the infrared domain. Consequently, we demonstrate the good flexibility of DIFFender, which can defend against both infrared and visible adversarial patch attacks alternatively using a universal defense framework.
- [33] arXiv:2409.09661 (cross-list from cs.SE) [pdf, html, other]
-
Title: ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart ContractsComments: 4 pages, and to be accepted in ASE2024Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Smart contracts are susceptible to being exploited by attackers, especially when facing real-world vulnerabilities. To mitigate this risk, developers often rely on third-party audit services to identify potential vulnerabilities before project deployment. Nevertheless, repairing the identified vulnerabilities is still complex and labor-intensive, particularly for developers lacking security expertise. Moreover, existing pattern-based repair tools mostly fail to address real-world vulnerabilities due to their lack of high-level semantic understanding. To fill this gap, we propose ContractTinker, a Large Language Models (LLMs)-empowered tool for real-world vulnerability repair. The key insight is our adoption of the Chain-of-Thought approach to break down the entire generation task into sub-tasks. Additionally, to reduce hallucination, we integrate program static analysis to guide the LLM. We evaluate ContractTinker on 48 high-risk vulnerabilities. The experimental results show that among the patches generated by ContractTinker, 23 (48%) are valid patches that fix the vulnerabilities, while 10 (21%) require only minor modifications. A video of ContractTinker is available at this https URL.
- [34] arXiv:2409.10020 (cross-list from cs.NI) [pdf, html, other]
-
Title: Li-MSD: A lightweight mitigation solution for DAO insider attack in RPL-based IoTJournal-ref: Future Generation Computer Systems, 159, 327-339 (2024)Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
Many IoT applications run on a wireless infrastructure supported by resource-constrained nodes which is popularly known as Low-Power and Lossy Networks (LLNs). Currently, LLNs play a vital role in digital transformation of industries. The resource limitations of LLNs restrict the usage of traditional routing protocols and therefore require an energy-efficient routing solution. IETF's Routing Protocol for Low-power Lossy Networks (RPL, pronounced 'ripple') is one of the most popular energy-efficient protocols for LLNs, specified in RFC 6550. In RPL, Destination Advertisement Object (DAO) control message is transmitted by a child node to pass on its reachability information to its immediate parent or root node. An attacker may exploit the insecure DAO sending mechanism of RPL to perform 'DAO insider attack' by transmitting DAO multiple times. This paper shows that an aggressive DAO insider attacker can drastically degrade network performance. We propose a Lightweight Mitigation Solution for DAO insider attack, which is termed as 'Li-MSD'. Li-MSD uses a blacklisting strategy to mitigate the attack and restore RPL performance, significantly. By using simulations, it is shown that Li-MSD outperforms the existing solution in the literature.
- [35] arXiv:2409.10226 (cross-list from cs.DC) [pdf, html, other]
-
Title: Privacy-Preserving Distributed Maximum Consensus Without Accuracy LossSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Information Theory (cs.IT); Signal Processing (eess.SP)
In distributed networks, calculating the maximum element is a fundamental task in data analysis, known as the distributed maximum consensus problem. However, the sensitive nature of the data involved makes privacy protection essential. Despite its importance, privacy in distributed maximum consensus has received limited attention in the literature. Traditional privacy-preserving methods typically add noise to updates, degrading the accuracy of the final result. To overcome these limitations, we propose a novel distributed optimization-based approach that preserves privacy without sacrificing accuracy. Our method introduces virtual nodes to form an augmented graph and leverages a carefully designed initialization process to ensure the privacy of honest participants, even when all their neighboring nodes are dishonest. Through a comprehensive information-theoretical analysis, we derive a sufficient condition to protect private data against both passive and eavesdropping adversaries. Extensive experiments validate the effectiveness of our approach, demonstrating that it not only preserves perfect privacy but also maintains accuracy, outperforming existing noise-based methods that typically suffer from accuracy loss.
Cross submissions for Tuesday, 17 September 2024 (showing 4 of 4 entries )
- [36] arXiv:1905.09093 (replaced) [pdf, html, other]
-
Title: Zero-Knowledge Proof-of-Identity: Sybil-Resistant, Anonymous Authentication on Permissionless Blockchains and Incentive Compatible, Strictly Dominant CryptocurrenciesComments: New subsubsections: 4.2.8 Substituting EPID for DCAP; 4.3.3 Remote attestation for smartphonesSubjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)
Zero-Knowledge Proof-of-Identity from trusted public certificates (e.g., national identity cards and/or ePassports; eSIM) is introduced here to permissionless blockchains in order to remove the inefficiencies of Sybil-resistant mechanisms such as Proof-of-Work (i.e., high energy and environmental costs) and Proof-of-Stake (i.e., capital hoarding and lower transaction volume). The proposed solution effectively limits the number of mining nodes a single individual would be able to run while keeping membership open to everyone, circumventing the impossibility of full decentralization and the blockchain scalability trilemma when instantiated on a blockchain with a consensus protocol based on the cryptographic random selection of nodes. Resistance to collusion is also considered.
Solving one of the most pressing problems in blockchains, a zk-PoI cryptocurrency is proved to have the following advantageous properties:
- an incentive-compatible protocol for the issuing of cryptocurrency rewards based on a unique Nash equilibrium
- strict domination of mining over all other PoW/PoS cryptocurrencies, thus the zk-PoI cryptocurrency becoming the preferred choice by miners is proved to be a Nash equilibrium and the Evolutionarily Stable Strategy
- PoW/PoS cryptocurrencies are condemned to pay the Price of Crypto-Anarchy, redeemed by the optimal efficiency of zk-PoI as it implements the social optimum
- the circulation of a zk-PoI cryptocurrency Pareto dominates other PoW/PoS cryptocurrencies
- the network effects arising from the social networks inherent to national identity cards and ePassports dominate PoW/PoS cryptocurrencies
- the lower costs of its infrastructure imply the existence of a unique equilibrium where it dominates other forms of payment - [37] arXiv:2109.03097 (replaced) [pdf, html, other]
-
Title: Quantum secure non-malleable-extractorsSubjects: Cryptography and Security (cs.CR); Quantum Physics (quant-ph)
We construct several explicit quantum secure non-malleable-extractors. All the quantum secure non-malleable-extractors we construct are based on the constructions by Chattopadhyay, Goyal and Li [2015] and Cohen [2015].
1) We construct the first explicit quantum secure non-malleable-extractor for (source) min-entropy $k \geq \textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$ ($n$ is the length of the source and $\epsilon$ is the error parameter). Previously Aggarwal, Chung, Lin, and Vidick [2019] have shown that the inner-product based non-malleable-extractor proposed by Li [2012] is quantum secure, however it required linear (in $n$) min-entropy and seed length.
Using the connection between non-malleable-extractors and privacy amplification (established first in the quantum setting by Cohen and Vidick [2017]), we get a $2$-round privacy amplification protocol that is secure against active quantum adversaries with communication $\textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$, exponentially improving upon the linear communication required by the protocol due to [2019].
2) We construct an explicit quantum secure $2$-source non-malleable-extractor for min-entropy $k \geq n- n^{\Omega(1)}$, with an output of size $n^{\Omega(1)}$ and error $2^{- n^{\Omega(1)}}$.
3) We also study their natural extensions when the tampering of the inputs is performed $t$-times. We construct explicit quantum secure $t$-non-malleable-extractors for both seeded ($t=d^{\Omega(1)}$) as well as $2$-source case ($t=n^{\Omega(1)}$). - [38] arXiv:2312.12131 (replaced) [pdf, html, other]
-
Title: Elliptic Curve Pairing Stealth Address ProtocolsSubjects: Cryptography and Security (cs.CR)
Protecting the privacy of blockchain transactions is extremely important for users. Stealth address protocols (SAP) allow users to receive assets via stealth addresses that they do not associate with their stealth meta-addresses. SAP can be generated using different cryptographic approaches. DKSAP uses an elliptic curve multiplication and hashing of the resulting shared secret. Another approach is to use a elliptic curve pairing. This paper presents four SA protocols that use elliptic curve pairing as a cryptographic solution. ECPDKSAPs are pairing-based protocols that include viewing key and spending key, while ECPSKSAP is a pairing-based protocol that uses a single key with which spending and the viewing key are derived. We find that ECPDKSAPs give significantly better results than DKSAP with the view tag. The best results are achieved with Protocol 3 (Elliptic Curve Pairing Dual Key Stealth Address Protocol), which is Ethereum-friendly. ECPSKSAP is significantly slower, but it provides an interesting theoretical result as it uses only one private key.
- [39] arXiv:2401.07261 (replaced) [pdf, html, other]
-
Title: LookAhead: Preventing DeFi Attacks via Unveiling Adversarial ContractsComments: 21 pages, 7 figuresSubjects: Cryptography and Security (cs.CR)
Decentralized Finance (DeFi) incidents stemming from the exploitation of smart contract vulnerabilities have culminated in financial damages exceeding 3 billion US dollars. Existing defense mechanisms typically focus on detecting and reacting to malicious transactions executed by attackers that target victim contracts. However, with the emergence of private transaction pools where transactions are sent directly to miners without first appearing in public mempools, current detection tools face significant challenges in identifying attack activities effectively.
Based on the fact that most attack logic rely on deploying one or more intermediate smart contracts as supporting components to the exploitation of victim contracts, in this paper, we propose a new direction for detecting DeFi attacks that focuses on identifying adversarial contracts instead of adversarial transactions. Our approach allows us to leverage common attack patterns, code semantics and intrinsic characteristics found in malicious smart contracts to build the LookAhead system based on Machine Learning (ML) classifiers and a transformer model that is able to effectively distinguish adversarial contracts from benign ones, and make just-in-time predictions of potential zero-day attacks. Our contributions are three-fold: First, we construct a comprehensive dataset consisting of features extracted and constructed from recent contracts deployed on the Ethereum and BSC blockchains. Secondly, we design a condensed representation of smart contract programs called Pruned Semantic-Control Flow Tokenization (PSCFT) and use it to train a combination of ML models that understand the behaviour of malicious codes based on function calls, control flows and other pattern-conforming features. Lastly, we provide the complete implementation of LookAhead and the evaluation of its performance metrics for detecting adversarial contracts. - [40] arXiv:2403.06281 (replaced) [pdf, html, other]
-
Title: ES-FUZZ: Improving the Coverage of Firmware Fuzzing with Stateful and Adaptable MMIO ModelsComments: 17 pages, 4 figuresSubjects: Cryptography and Security (cs.CR)
Grey-box fuzzing is widely used for testing embedded systems (ESes). The fuzzers often test the ES firmware in a fully emulated environment without real peripherals. To achieve decent code coverage, some state-of-the-art (SOTA) fuzzers infer the memory-mapped I/O (MMIO) behavior of peripherals from the firmware binary. We find the thus-generated MMIO models stateless, fixed, and poor at handling ES firmware's MMIO reads for retrieval of a data chunk. This leaves ample room for improving the code coverage.
We propose ES-Fuzz to enhance the coverage of firmware fuzz-testing with stateful MMIO models that adapt to the fuzzer's coverage bottleneck. ES-Fuzz runs concurrently with a given fuzzer and starts a new run whenever the fuzzer's coverage stagnates. It exploits the highest-coverage test case in each run and generates new stateful MMIO models that boost the fuzzer's coverage at that time. We have implemented ES-Fuzz upon Fuzzware and evaluated it with 24 popular ES firmware. ES-Fuzz is shown to improve Fuzzware's coverage by up to $47\%$ and find new bugs in these firmware. - [41] arXiv:2403.09562 (replaced) [pdf, html, other]
-
Title: PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy TrapsComments: 15 pagesSubjects: Cryptography and Security (cs.CR)
The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish without strict validation processes. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose PreCurious framework to reveal the new attack surface where the attacker releases the pre-trained model and gets a black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and guide fine-tuning with a seemingly legitimate configuration. While empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques can defend against privacy attacks on a fine-tuned model, PreCurious demonstrates the possibility of breaking up this invulnerability in a stealthy manner compared to fine-tuning on a benign pre-trained model. While DP provides some mitigation for membership inference attack, by further leveraging a sanitized dataset, PreCurious demonstrates potential vulnerabilities for targeted data extraction even under differentially private tuning with a strict privacy budget e.g. $\epsilon=0.05$. Thus, PreCurious raises warnings for users on the potential risks of downloading pre-trained models from unknown sources, relying solely on tutorials or common-sense defenses, and releasing sanitized datasets even after perfect scrubbing.
- [42] arXiv:2405.11247 (replaced) [pdf, html, other]
-
Title: A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection AttacksComments: 15 pages, 10 figures, 5 tablesSubjects: Cryptography and Security (cs.CR)
Application Programming Interface (API) Injection attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there remains a significant challenge when dealing with attackers who use novel approaches that don't match the well-known payloads commonly seen in attacks. Also, attackers may exploit standard functionalities unconventionally and with objectives surpassing their intended boundaries. Thus, API security needs to be more sophisticated and dynamic than ever, with advanced computational intelligence methods, such as machine learning models that can quickly identify and respond to abnormal behavior. In response to these challenges, we propose a novel unsupervised few-shot anomaly detection framework composed of two main parts: First, we train a dedicated generic language model for API based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach. Our framework allows for training a fast, lightweight classification model using only a few examples of normal API requests. We evaluated the performance of our framework using the CSIC 2010 and ATRDF 2023 datasets. The results demonstrate that our framework improves API attack detection accuracy compared to the state-of-the-art (SOTA) unsupervised anomaly detection baselines.
- [43] arXiv:2406.04755 (replaced) [pdf, html, other]
-
Title: LLM Whisperer: An Inconspicuous Attack to Bias LLM ResponsesSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Writing effective prompts for large language models (LLM) can be unintuitive and burdensome. In response, services that optimize or suggest prompts have emerged. While such services can reduce user effort, they also introduce a risk: the prompt provider can subtly manipulate prompts to produce heavily biased LLM responses. In this work, we show that subtle synonym replacements in prompts can increase the likelihood (by a difference up to 78%) that LLMs mention a target concept (e.g., a brand, political party, nation). We substantiate our observations through a user study, showing our adversarially perturbed prompts 1) are indistinguishable from unaltered prompts by humans, 2) push LLMs to recommend target concepts more often, and 3) make users more likely to notice target concepts, all without arousing suspicion. The practicality of this attack has the potential to undermine user autonomy. Among other measures, we recommend implementing warnings against using prompts from untrusted parties.
- [44] arXiv:2406.05870 (replaced) [pdf, html, other]
-
Title: Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker DocumentsSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database, then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with untrusted content are vulnerable to a new class of denial-of-service attacks we call jamming. An adversary can add a single ``blocker'' document to the database that will be retrieved in response to a specific query and result in the RAG system not answering this query - ostensibly because it lacks the information or because the answer is unsafe.
We describe and measure the efficacy of several methods for generating blocker documents, including a new method based on black-box optimization. This method (1) does not rely on instruction injection, (2) does not require the adversary to know the embedding or LLM used by the target RAG system, and (3) does not use an auxiliary LLM to generate blocker documents.
We evaluate jamming attacks on several LLMs and embeddings and demonstrate that the existing safety metrics for LLMs do not capture their vulnerability to jamming. We then discuss defenses against blocker documents. - [45] arXiv:2406.10719 (replaced) [pdf, html, other]
-
Title: Trading Devil: Robust backdoor attack via Stochastic investment models and Bayesian approachComments: (Last update!, a constructive comment from arxiv led to this latest update ) Stochastic investment models and a Bayesian approach to better modeling of uncertainty : adversarial machine learning or Stochastic market. arXiv admin note: substantial text overlap with arXiv:2402.05967 (see this link to the paper by : Orson Mengara)Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Computational Finance (q-fin.CP); Statistical Finance (q-fin.ST); Machine Learning (stat.ML)
With the growing use of voice-activated systems and speech recognition technologies, the danger of backdoor attacks on audio data has grown significantly. This research looks at a specific type of attack, known as a Stochastic investment-based backdoor attack (MarketBack), in which adversaries strategically manipulate the stylistic properties of audio to fool speech recognition systems. The security and integrity of machine learning models are seriously threatened by backdoor attacks, in order to maintain the reliability of audio applications and systems, the identification of such attacks becomes crucial in the context of audio data. Experimental results demonstrated that MarketBack is feasible to achieve an average attack success rate close to 100% in seven victim models when poisoning less than 1% of the training data.
- [46] arXiv:2407.14938 (replaced) [pdf, html, other]
-
Title: From Ad Identifiers to Global Privacy Control: The Status Quo and Future of Opting Out of Ad Tracking on AndroidSubjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Apps and their integrated third-party libraries often collect personal information from Android users for personalizing ads. This practice can be privacy-invasive. Users can limit ad tracking on Android via the AdID setting; further, the California Consumer Privacy Act (CCPA) gives user an opt-out right via Global Privacy Control (GPC). However, neither of these two privacy controls have been studied before as to whether they help Android users exercise their legally mandated opt-out right. In response, we evaluate how many Android apps are subject to the CCPA opt-out right and find it applicable to approximately 70% of apps on the top free app lists of the US Google Play Store. Our dynamic analysis of 1,811 apps from these lists shows that neither the AdID setting nor GPC effectively prevents the selling and sharing of personal information in California. For example, when disabling the AdID and sending GPC signals to the most common ad tracking domain in our dataset that implements the US Privacy String, only 4% of apps connecting to the domain indicate the opt-out status. To mitigate this shortcoming, Android's AdID setting should be evolved towards a universal GPC setting as part of Google's Privacy Sandbox.
- [47] arXiv:2407.18353 (replaced) [pdf, html, other]
-
Title: Privacy-Preserving Hierarchical Model-Distributed InferenceSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
This paper focuses on designing a privacy-preserving Machine Learning (ML) inference protocol for a hierarchical setup, where clients own/generate data, model owners (cloud servers) have a pre-trained ML model, and edge servers perform ML inference on clients' data using the cloud server's ML model. Our goal is to speed up ML inference while providing privacy to both data and the ML model. Our approach (i) uses model-distributed inference (model parallelization) at the edge servers and (ii) reduces the amount of communication to/from the cloud server. Our privacy-preserving hierarchical model-distributed inference, privateMDI design uses additive secret sharing and linearly homomorphic encryption to handle linear calculations in the ML inference, and garbled circuit and a novel three-party oblivious transfer are used to handle non-linear functions. privateMDI consists of offline and online phases. We designed these phases in a way that most of the data exchange is done in the offline phase while the communication overhead of the online phase is reduced. In particular, there is no communication to/from the cloud server in the online phase, and the amount of communication between the client and edge servers is minimized. The experimental results demonstrate that privateMDI significantly reduces the ML inference time as compared to the baselines.
- [48] arXiv:2408.06296 (replaced) [pdf, html, other]
-
Title: Hound: Locating Cryptographic Primitives in Desynchronized Side-Channel Traces Using Deep-LearningComments: Accepted for lecture presentation at 2024 42nd IEEE International Conference on Computer Design (ICCD 2024), Milan, Italy, Nov. 18-20, 2024Subjects: Cryptography and Security (cs.CR)
Side-channel attacks allow to extract sensitive information from cryptographic primitives by correlating the partially known computed data and the measured side-channel signal. Starting from the raw side-channel trace, the preprocessing of the side-channel trace to pinpoint the time at which each cryptographic primitive is executed, and, then, to re-align all the collected data to this specific time represent a critical step to setup a successful side-channel attack. The use of hiding techniques has been widely adopted as a low-cost solution to hinder the preprocessing of side-channel traces thus limiting side-channel attacks in real scenarios. This work introduces Hound, a novel deep learning-based pipeline to locate the execution of cryptographic primitives within the side-channel trace even in the presence of trace deformations introduced by the use of dynamic frequency scaling actuators. Hound has been validated through successful attacks on various cryptographic primitives executed on an FPGA-based system-on-chip incorporating a RISC-V CPU, while dynamic frequency scaling is active. Experimental results demonstrate the possibility of identifying the cryptographic primitives in DFS-deformed side-channel traces.
- [49] arXiv:2408.11727 (replaced) [pdf, other]
-
Title: Efficient Detection of Toxic Prompts in Large Language ModelsComments: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
Large language models (LLMs) like ChatGPT and Gemini have significantly advanced natural language processing, enabling various applications such as chatbots and automated content generation. However, these models can be exploited by malicious individuals who craft toxic prompts to elicit harmful or unethical responses. These individuals often employ jailbreaking techniques to bypass safety mechanisms, highlighting the need for robust toxic prompt detection methods. Existing detection techniques, both blackbox and whitebox, face challenges related to the diversity of toxic prompts, scalability, and computational efficiency. In response, we propose ToxicDetector, a lightweight greybox method designed to efficiently detect toxic prompts in LLMs. ToxicDetector leverages LLMs to create toxic concept prompts, uses embedding vectors to form feature vectors, and employs a Multi-Layer Perceptron (MLP) classifier for prompt classification. Our evaluation on various versions of the LLama models, Gemma-2, and multiple datasets demonstrates that ToxicDetector achieves a high accuracy of 96.39\% and a low false positive rate of 2.00\%, outperforming state-of-the-art methods. Additionally, ToxicDetector's processing time of 0.0780 seconds per prompt makes it highly suitable for real-time applications. ToxicDetector achieves high accuracy, efficiency, and scalability, making it a practical method for toxic prompt detection in LLMs.
- [50] arXiv:2408.16841 (replaced) [pdf, other]
-
Title: Cyber Risk Assessment for Cyber-Physical Systems: A Review of Methodologies and Recommendations for Improved Assessment EffectivenessJournal-ref: CS & IT - CSCP 2024, Vol. 14, No. 16, pp. 77-94, 2024. ISBN: 978-1-923107-35-9, ISSN: 2231-5403Subjects: Cryptography and Security (cs.CR)
Cyber-Physical Systems (CPS) integrate physical and embedded systems with information and communication technology systems, monitoring and controlling physical processes with minimal human intervention. The connection to information and communication technology exposes CPS to cyber risks. It is crucial to assess these risks to manage them effectively. This paper reviews scholarly contributions to cyber risk assessment for CPS, analyzing how the assessment approaches were evaluated and investigating to what extent they meet the requirements of effective risk assessment. We identify gaps limiting the effectiveness of the assessment and recommend real-time learning from cybersecurity incidents. Our review covers twenty-eight papers published between 2014 and 2023, selected based on a three-step search. Our findings show that the reviewed cyber risk assessment methodologies revealed limited effectiveness due to multiple factors. These findings provide a foundation for further research to explore and address other factors impacting the quality of cyber risk assessment in CPS.
- [51] arXiv:2409.01881 (replaced) [pdf, html, other]
-
Title: The Impact of Run-Time Variability on Side-Channel Attacks Targeting FPGAsComments: Accepted for lecture presentation at 2024 31st IEEE International Conference on Electronics, Circuits and Systems (ICECS), Nancy, France, Nov. 18-20, 2024Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
To defeat side-channel attacks, many recent countermeasures work by enforcing random run-time variability to the target computing platform in terms of clock jitters, frequency and voltage scaling, and phase shift, also combining the contributions from different actuators to maximize the side-channel resistance of the target. However, the robustness of such solutions seems strongly influenced by several hyper-parameters for which an in-depth analysis is still missing. This work proposes a fine-grained dynamic voltage and frequency scaling actuator to investigate the effectiveness of recent desynchronization countermeasures with the goal of highlighting the link between the enforced run-time variability and the vulnerability to side-channel attacks of cryptographic implementations targeting FPGAs. The analysis of the results collected from real hardware allowed for a comprehensive understanding of the protection offered by run-time variability countermeasures against side-channel attacks.
- [52] arXiv:2409.07368 (replaced) [pdf, html, other]
-
Title: Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of CodeKhiem Ton, Nhi Nguyen, Mahmoud Nazzal, Abdallah Khreishah, Cristian Borcea, NhatHai Phan, Ruoming Jin, Issa Khalil, Yelong ShenSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
This paper introduces SGCode, a flexible prompt-optimizing system to generate secure code with large language models (LLMs). SGCode integrates recent prompt-optimization approaches with LLMs in a unified system accessible through front-end and back-end APIs, enabling users to 1) generate secure code, which is free of vulnerabilities, 2) review and share security analysis, and 3) easily switch from one prompt optimization approach to another, while providing insights on model and system performance. We populated SGCode on an AWS server with PromSec, an approach that optimizes prompts by combining an LLM and security tools with a lightweight generative adversarial graph neural network to detect and fix security vulnerabilities in the generated code. Extensive experiments show that SGCode is practical as a public tool to gain insights into the trade-offs between model utility, secure code generation, and system cost. SGCode has only a marginal cost compared with prompting LLMs. SGCode is available at: this http URL.
- [53] arXiv:2409.07926 (replaced) [pdf, html, other]
-
Title: Mobile App Security Trends and Topics: An Examination of Questions From Stack OverflowComments: This paper was accepted for publication at the 58th Hawaii International Conference on System Sciences (HICSS) - Software Technology TrackSubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
The widespread use of smartphones and tablets has made society heavily reliant on mobile applications (apps) for accessing various resources and services. These apps often handle sensitive personal, financial, and health data, making app security a critical concern for developers. While there is extensive research on software security topics like malware and vulnerabilities, less is known about the practical security challenges mobile app developers face and the guidance they seek. In this study, we mine Stack Overflow for questions on mobile app security, which we analyze using quantitative and qualitative techniques. The findings reveal that Stack Overflow is a major resource for developers seeking help with mobile app security, especially for Android apps, and identifies seven main categories of security questions: Secured Communications, Database, App Distribution Service, Encryption, Permissions, File-Specific, and General Security. Insights from this research can inform the development of tools, techniques, and resources by the research and vendor community to better support developers in securing their mobile apps.
- [54] arXiv:2409.08234 (replaced) [pdf, html, other]
-
Title: LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot SystemsComments: 6 pages, 5 figuresSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
The rapid evolution of cyber threats necessitates innovative solutions for detecting and analyzing malicious activity. Honeypots, which are decoy systems designed to lure and interact with attackers, have emerged as a critical component in cybersecurity. In this paper, we present a novel approach to creating realistic and interactive honeypot systems using Large Language Models (LLMs). By fine-tuning a pre-trained open-source language model on a diverse dataset of attacker-generated commands and responses, we developed a honeypot capable of sophisticated engagement with attackers. Our methodology involved several key steps: data collection and processing, prompt engineering, model selection, and supervised fine-tuning to optimize the model's performance. Evaluation through similarity metrics and live deployment demonstrated that our approach effectively generates accurate and informative responses. The results highlight the potential of LLMs to revolutionize honeypot technology, providing cybersecurity professionals with a powerful tool to detect and analyze malicious activity, thereby enhancing overall security infrastructure.
- [55] arXiv:2402.10527 (replaced) [pdf, html, other]
-
Title: Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacksComments: 28 pages incl. appendix, updated versionSubjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Applications (stat.AP)
The increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. Understanding model vulnerabilities in high-stakes and knowledge-intensive tasks is essential for quantifying the trustworthiness of model predictions and regulating their use. The recent discovery of named entities as adversarial examples (i.e. adversarial entities) in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs in high-stakes and specialized domains. We examined the use of type-consistent entity substitution as a template for collecting adversarial entities for billion-parameter LLMs with biomedical knowledge. To this end, we developed an embedding-space attack based on powerscaled distance-weighted sampling to assess the robustness of their biomedical knowledge with a low query budget and controllable coverage. Our method has favorable query efficiency and scaling over alternative approaches based on random sampling and blackbox gradient-guided search, which we demonstrated for adversarial distractor generation in biomedical question answering. Subsequent failure mode analysis uncovered two regimes of adversarial entities on the attack surface with distinct characteristics and we showed that entity substitution attacks can manipulate token-wise Shapley value explanations, which become deceptive in this setting. Our approach complements standard evaluations for high-capacity models and the results highlight the brittleness of domain knowledge in LLMs.
- [56] arXiv:2403.07865 (replaced) [pdf, html, other]
-
Title: CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code CompletionComments: ACL Findings 2024, Code is available at this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)
The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs, presenting a novel environment for testing the safety generalization of LLMs. Our comprehensive studies on state-of-the-art LLMs including GPT-4, Claude-2, and Llama-2 series reveal a new and universal safety vulnerability of these models against code input: CodeAttack bypasses the safety guardrails of all models more than 80\% of the time. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization, such as encoding natural language input with data structures. Furthermore, we give our hypotheses about the success of CodeAttack: the misaligned bias acquired by LLMs during code training, prioritizing code completion over avoiding the potential safety risk. Finally, we analyze potential mitigation measures. These findings highlight new safety risks in the code domain and the need for more robust safety alignment algorithms to match the code capabilities of LLMs.
- [57] arXiv:2404.06666 (replaced) [pdf, html, other]
-
Title: SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image ModelsComments: Accepted by ACM CCS 2024. Please cite this paper as "Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu. SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models. In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2024."Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexually explicit scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing improper text embeddings, which can block sexually explicit content (e.g., naked) but may still be vulnerable to adversarial prompts -- inputs that appear innocent but are ill-intended. In this paper, we present SafeGen, a framework to mitigate sexual content generation by text-to-image models in a text-agnostic manner. The key idea is to eliminate explicit visual representations from the model regardless of the text input. In this way, the text-to-image model is resistant to adversarial prompts since such unsafe visual representations are obstructed from within. Extensive experiments conducted on four datasets and large-scale user studies demonstrate SafeGen's effectiveness in mitigating sexually explicit content generation while preserving the high-fidelity of benign images. SafeGen outperforms eight state-of-the-art baseline methods and achieves 99.4% sexual content removal performance. Furthermore, our constructed benchmark of adversarial prompts provides a basis for future development and evaluation of anti-NSFW-generation methods.
- [58] arXiv:2405.00663 (replaced) [pdf, html, other]
-
Title: Quantum cryptographic protocols with dual messaging system via 2D alternate quantum walk of a genuine single-photon entangled stateComments: 13 pages (including appendix), two figures and one tableSubjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Cryptography and Security (cs.CR); Quantum Algebra (math.QA); Optics (physics.optics)
A single-photon entangled state (or single-particle entangled state (SPES) in general) can offer a more secure way of encoding and processing quantum information than their multi-photon (or multi-particle) counterparts. The SPES generated via a 2D alternate quantum-walk setup from initially separable states can be either 3-way or 2-way entangled. This letter shows that the generated genuine three-way and nonlocal two-way SPES can be used as cryptographic keys to securely encode two distinct messages simultaneously. We detail the message encryption-decryption steps and show the resilience of the 3-way and 2-way SPES-based cryptographic protocols against eavesdropper attacks like intercept-and-resend and man-in-the-middle. We also detail the experimental realization of these protocols using a single photon, with the three degrees of freedom being OAM, path, and polarization. We have proved that the protocols have unconditional security for quantum communication tasks. The ability to simultaneously encode two distinct messages using the generated SPES showcases the versatility and efficiency of the proposed cryptographic protocol. This capability could significantly improve the throughput of quantum communication systems.
- [59] arXiv:2406.05904 (replaced) [pdf, html, other]
-
Title: Aegis: A Decentralized Expansion BlockchainSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
Blockchains implement monetary systems operated by committees of nodes. The robustness of established blockchains presents an opportunity to leverage their infrastructure for creating expansion chains. Expansion chains can provide additional functionality to the primary chain they leverage or implement separate functionalities, while benefiting from the primary chain's security and the stability of its tokens. Indeed, tools like Ethereum's EigenLayer enable nodes to stake (deposit collateral) on a primary chain to form a committee responsible for operating an expansion chain.
But here is the rub. Classical protocols assume correct, well-behaved nodes stay correct indefinitely. Yet in our case, the stake incentivizes correctness--it will be slashed (revoked) if its owner deviates. Once a node withdraws its stake, there is no basis to assume its correctness.
To address the new challenge, we present Aegis, an expansion chain based on primary-chain stake, assuming a bounded primary-chain write time. Aegis uses references from Aegis blocks to primary blocks to define committees, checkpoints on the primary chain to perpetuate decisions, and resets on the primary chain to establish a new committee if the previous one becomes obsolete. It ensures safety at all times and rapid progress when latency among Aegis nodes is low. - [60] arXiv:2406.15842 (replaced) [pdf, html, other]
-
Title: Privacy Requirements and Realities of Digital Public GoodsJournal-ref: Twentieth Symposium on Usable Privacy and Security (SOUPS 2024), pp. 159-177. 2024Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Computers and Society (cs.CY)
In the international development community, the term "digital public goods" is used to describe open-source digital products (e.g., software, datasets) that aim to address the United Nations (UN) Sustainable Development Goals. DPGs are increasingly being used to deliver government services around the world (e.g., ID management, healthcare registration). Because DPGs may handle sensitive data, the UN has established user privacy as a first-order requirement for DPGs. The privacy risks of DPGs are currently managed in part by the DPG standard, which includes a prerequisite questionnaire with questions designed to evaluate a DPG's privacy posture.
This study examines the effectiveness of the current DPG standard for ensuring adequate privacy protections. We present a systematic assessment of responses from DPGs regarding their protections of users' privacy. We also present in-depth case studies from three widely-used DPGs to identify privacy threats and compare this to their responses to the DPG standard. Our findings reveal limitations in the current DPG standard's evaluation approach. We conclude by presenting preliminary recommendations and suggestions for strengthening the DPG standard as it relates to privacy. Additionally, we hope this study encourages more usable privacy research on communicating privacy, not only to end users but also third-party adopters of user-facing technologies. - [61] arXiv:2407.05639 (replaced) [pdf, other]
-
Title: Deep Learning-based Anomaly Detection and Log Analysis for Computer NetworksComments: 38 pagesSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Computer network anomaly detection and log analysis, as an important topic in the field of network security, has been a key task to ensure network security and system reliability. First, existing network anomaly detection and log analysis methods are often challenged by high-dimensional data and complex network topologies, resulting in unstable performance and high false-positive rates. In addition, traditional methods are usually difficult to handle time-series data, which is crucial for anomaly detection and log analysis. Therefore, we need a more efficient and accurate method to cope with these problems. To compensate for the shortcomings of current methods, we propose an innovative fusion model that integrates Isolation Forest, GAN (Generative Adversarial Network), and Transformer with each other, and each of them plays a unique role. Isolation Forest is used to quickly identify anomalous data points, and GAN is used to generate synthetic data with the real data distribution characteristics to augment the training dataset, while the Transformer is used for modeling and context extraction on time series data. The synergy of these three components makes our model more accurate and robust in anomaly detection and log analysis tasks. We validate the effectiveness of this fusion model in an extensive experimental evaluation. Experimental results show that our model significantly improves the accuracy of anomaly detection while reducing the false alarm rate, which helps to detect potential network problems in advance. The model also performs well in the log analysis task and is able to quickly identify anomalous behaviors, which helps to improve the stability of the system. The significance of this study is that it introduces advanced deep learning techniques, which work anomaly detection and log analysis.
- [62] arXiv:2408.11006 (replaced) [pdf, html, other]
-
Title: Security Attacks on LLM-based Code Completion ToolsSubjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools possess unique workflows, integrating multiple information sources as input and prioritizing code suggestions over natural language interaction, which introduces distinct security challenges. Additionally, LCCTs often rely on proprietary code datasets for training, raising concerns about the potential exposure of sensitive data. This paper exploits these distinct characteristics of LCCTs to develop targeted attack methodologies on two critical security risks: jailbreaking and training data extraction attacks. Our experimental results expose significant vulnerabilities within LCCTs, including a 99.4% success rate in jailbreaking attacks on GitHub Copilot and a 46.3% success rate on Amazon Q. Furthermore, We successfully extracted sensitive user data from GitHub Copilot, including 54 real email addresses and 314 physical addresses associated with GitHub usernames. Our study also demonstrates that these code-based attack methods are effective against general-purpose LLMs, such as the GPT series, highlighting a broader security misalignment in the handling of code by modern LLMs. These findings underscore critical security challenges associated with LCCTs and suggest essential directions for strengthening their security frameworks. The example code and attack samples from our research are provided at this https URL.
- [63] arXiv:2409.00276 (replaced) [pdf, html, other]
-
Title: Exact Recovery Guarantees for Parameterized Non-linear System Identification Problem under Adversarial AttacksComments: 33 pagesSubjects: Optimization and Control (math.OC); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Systems and Control (eess.SY)
In this work, we study the system identification problem for parameterized non-linear systems using basis functions under adversarial attacks. Motivated by the LASSO-type estimators, we analyze the exact recovery property of a non-smooth estimator, which is generated by solving an embedded $\ell_1$-loss minimization problem. First, we derive necessary and sufficient conditions for the well-specifiedness of the estimator and the uniqueness of global solutions to the underlying optimization problem. Next, we provide exact recovery guarantees for the estimator under two different scenarios of boundedness and Lipschitz continuity of the basis functions. The non-asymptotic exact recovery is guaranteed with high probability, even when there are more severely corrupted data than clean data. Finally, we numerically illustrate the validity of our theory. This is the first study on the sample complexity analysis of a non-smooth estimator for the non-linear system identification problem.