• ## Recent Fuzzing Papers

2019-03-13 15:19:10
Recent Papers Related To Fuzzing
The original list is maintained and updated on GitHub: https://github.com/wcventure/FuzzingPaper
All Papers

- Interesting Fuzzing
  - DifFuzz: Differential Fuzzing for Side-Channel Analysis (ICSE 2019)
  - REST-ler: Stateful REST API Fuzzing (ICSE 2019)
  - Life after Speech Recognition: Fuzzing Semantic Misinterpretation for Voice Assistant Applications (NDSS 2019)
  - ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection (ASE 2018)
  - IoTFuzzer: Discovering Memory Corruptions in IoT Through App-based Fuzzing (NDSS 2018)
  - What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices (NDSS 2018)
  - MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation (USENIX Security 2018)
  - Singularity: Pattern Fuzzing for Worst Case Complexity (FSE 2018)
  - NEZHA: Efficient Domain-Independent Differential Testing (S&P 2017)
- Evaluate Fuzzing
  - Evaluating Fuzz Testing (CCS 2018)
- Kernel Fuzzing
  - PeriScope: An Effective Probing and Fuzzing Framework for the Hardware-OS Boundary (NDSS 2019)
  - Fuzzing File Systems via Two-Dimensional Input Space Exploration (S&P 2019)
  - Razzer: Finding Kernel Race Bugs through Fuzzing (S&P 2019)
  - kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels (USENIX Security 2017)
- Hybrid Fuzzing
  - Send Hardest Problems My Way: Probabilistic Path Prioritization for Hybrid Fuzzing (NDSS 2019)
  - QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing (USENIX Security 2018)
  - Angora: Efficient Fuzzing by Principled Search (S&P 2018)
  - Driller: Augmenting Fuzzing Through Selective Symbolic Execution (NDSS 2016)
- Addressing Magic Bytes / Checksums
  - REDQUEEN: Fuzzing with Input-to-State Correspondence (NDSS 2019)
  - T-Fuzz: Fuzzing by Program Transformation (S&P 2018)
  - FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage (ASE 2018)
  - VUzzer: Application-aware Evolutionary Fuzzing (NDSS 2017)
- Input-aware Fuzzing
  - SLF: Fuzzing without Valid Seed Inputs (ICSE 2019)
  - Superion: Grammar-Aware Greybox Fuzzing (ICSE 2019)
  - ProFuzzer: On-the-fly Input Type Probing for Better Zero-day Vulnerability Discovery (S&P 2019)
- Directed Fuzzing
  - Directed Greybox Fuzzing (CCS 2017)
  - Hawkeye: Towards a Desired Directed Grey-box Fuzzer (CCS 2018)
- Addressing Collision
  - CollAFL: Path Sensitive Fuzzing (S&P 2018)
- Fuzzing Overhead & Performance
  - Full-speed Fuzzing: Reducing Fuzzing Overhead through Coverage-guided Tracing (S&P 2019)
  - Designing New Operating Primitives to Improve Fuzzing Performance (CCS 2017)
- Enhancing Memory Error Detection
  - Enhancing Memory Error Detection for Large-Scale Applications and Fuzz Testing (NDSS 2018)
- Power Schedule
  - Coverage-based Greybox Fuzzing as Markov Chain (CCS 2016)
- Learning-based Fuzzing
  - NEUZZ: Efficient Fuzzing with Neural Program Smoothing (S&P 2019)
- Fuzzing Machine Learning Models
  - TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing (2018)
  - Coverage-Guided Fuzzing for Deep Neural Networks (2018)
- DifFuzz (side-channel analysis)
- ProFuzzer (infers input structure)
- FairFuzz (targets rare branches)
- FairFuzz & ProFuzzer
- Enhancing Memory Error Detection
- NEZHA (differential testing)
- REDQUEEN
Interesting Fuzzing
DifFuzz: Differential Fuzzing for Side-Channel Analysis (ICSE 2019)
Abstract: Side-channel attacks allow an adversary to uncover secret program data by observing the behavior of a program with respect to a resource, such as execution time, consumed memory or response size. Side-channel vulnerabilities are difficult to reason about as they involve analyzing the correlations between resource usage over multiple program paths. We present DifFuzz, a fuzzing-based approach for detecting side-channel vulnerabilities related to time and space. DifFuzz automatically detects these vulnerabilities by analyzing two versions of the program and using resource-guided heuristics to find inputs that maximize the difference in resource consumption between secret-dependent paths. The methodology of DifFuzz is general and can be applied to programs written in any language. For this paper, we present an implementation that targets analysis of Java programs, and uses and extends the Kelinci and AFL fuzzers. We evaluate DifFuzz on a large number of Java programs and demonstrate that it can reveal unknown side-channel vulnerabilities in popular applications. We also show that DifFuzz compares favorably against Blazer and Themis, two state-of-the-art analysis tools for finding side-channels in Java programs.
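The resource-guided heuristic the abstract describes can be sketched as a fitness function: run the same public input against two secret values and reward inputs that widen the gap in resource use. All names below (`compare_ops`, `side_channel_fitness`) are illustrative, not DifFuzz's actual API, and loop iterations stand in for real cost measurements.

```python
def compare_ops(secret: str, guess: str) -> int:
    """Instrumented early-exit string compare: returns the number of loop
    iterations executed, a proxy for the secret-dependent running time."""
    ops = 0
    for a, b in zip(secret, guess):
        ops += 1
        if a != b:
            break
    return ops

def side_channel_fitness(public_input: str, secret_a: str, secret_b: str) -> int:
    """The differential signal: same public input, two secret values; the
    fuzzer mutates public_input to maximize the cost gap between them."""
    return abs(compare_ops(secret_a, public_input)
               - compare_ops(secret_b, public_input))

# A guess matching one secret's prefix but not the other's widens the gap,
# exposing the timing channel in the early-exit compare (4 vs 1 iterations).
assert side_channel_fitness("aaaa", "aaaa", "zzzz") == 3
```

A fitness of zero on all inputs suggests the two secret-dependent paths consume indistinguishable resources; a large maximum flags a candidate side channel.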
REST-ler: Stateful REST API Fuzzing (ICSE 2019)
Paper
Abstract: This paper introduces REST-ler, the first stateful REST API fuzzer. REST-ler analyzes the API specification of a cloud service and generates sequences of requests that automatically test the service through its API. REST-ler generates test sequences by (1) inferring producer-consumer dependencies among request types declared in the specification (e.g., inferring that “a request B should be executed after request A” because B takes as an input a resource-id x produced by A) and by (2) analyzing dynamic feedback from responses observed during prior test executions in order to generate new tests (e.g., learning that “a request C after a request sequence A;B is refused by the service” and therefore avoiding this combination in the future). We present experimental results showing that these two techniques are necessary to thoroughly exercise a service under test while pruning the large search space of possible request sequences. We used REST-ler to test GitLab, a large open-source self-hosted Git service, as well as several Microsoft Azure and Office365 cloud services. REST-ler found 28 bugs in GitLab and several bugs in each of the Azure and Office365 cloud services tested so far. These bugs have been confirmed by the service owners, and are either in the process of being fixed or have already been fixed.
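Technique (1) — producer-consumer inference — can be sketched over a toy spec: a request "produces" the resource ids named in its response and "consumes" those named in its parameters, and B must follow A whenever B consumes something A produces. The spec layout and endpoint names below are hypothetical, not REST-ler's real input format.

```python
def infer_dependencies(spec):
    """spec: {request_name: {"produces": set, "consumes": set}}.
    Returns (producer, consumer) edges that constrain request ordering."""
    edges = []
    for a, a_info in spec.items():
        for b, b_info in spec.items():
            # B depends on A if B consumes a resource id that A produces.
            if a != b and a_info["produces"] & b_info["consumes"]:
                edges.append((a, b))
    return edges

spec = {
    "POST /users":           {"produces": {"user_id"}, "consumes": set()},
    "POST /users/{id}/keys": {"produces": {"key_id"},  "consumes": {"user_id"}},
    "DELETE /keys/{id}":     {"produces": set(),       "consumes": {"key_id"}},
}
assert ("POST /users", "POST /users/{id}/keys") in infer_dependencies(spec)
```

A test-sequence generator would then emit only orderings consistent with these edges, pruning the request-sequence search space the abstract mentions.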
Life after Speech Recognition: Fuzzing Semantic Misinterpretation for Voice Assistant Applications (NDSS 2019)
Paper
Abstract: Popular Voice Assistant (VA) services such as Amazon Alexa and Google Assistant are now rapidly appifying their platforms to allow more flexible and diverse voice-controlled service experience. However, the ubiquitous deployment of VA devices and the increasing number of third-party applications have raised security and privacy concerns. While previous works such as hidden voice attacks mostly examine the problems of VA services’ default Automatic Speech Recognition (ASR) component, our work analyzes and evaluates the security of the succeeding component after ASR, i.e., Natural Language Understanding (NLU), which performs semantic interpretation (i.e., text-to-intent) after ASR’s acoustic-to-text processing. In particular, we focus on NLU’s Intent Classifier which is used in customizing machine understanding for third-party VA Applications (or vApps). We find that the semantic inconsistency caused by the improper semantic interpretation of an Intent Classifier can create the opportunity of breaching the integrity of vApp processing when attackers delicately leverage some common spoken errors. In this paper, we design the first linguistic-model-guided fuzzing tool, named LipFuzzer, to assess the security of Intent Classifier and systematically discover potential misinterpretation-prone spoken errors based on vApps’ voice command templates. To guide the fuzzing, we construct adversarial linguistic models with the help of Statistical Relational Learning (SRL) and emerging Natural Language Processing (NLP) techniques. In evaluation, we have successfully verified the effectiveness and accuracy of LipFuzzer. We also use LipFuzzer to evaluate both Amazon Alexa and Google Assistant vApp platforms. We have identified that a large portion of real-world vApps are vulnerable based on our fuzzing result.
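The template-fuzzing idea can be sketched without any real NLU stack: substitute commonly misheard words into a voice-command template and flag variants that an intent classifier maps to a different intent. The confusion table and the toy exact-match classifier below are illustrative stand-ins for LipFuzzer's linguistic models.

```python
# Hypothetical table of misinterpretation-prone spoken variants.
CONFUSIONS = {"four": ["for", "fore"], "capital": ["capitol"]}

def classify_intent(utterance):
    """Toy intent classifier keyed on one exact trigger phrase."""
    return "OpenDoorFour" if utterance == "open door four" else "Unknown"

def fuzz_template(template, classify):
    """Return template variants whose intent diverges from the original's,
    i.e. candidate semantic-misinterpretation attacks."""
    base = classify(template)
    risky = []
    for word, variants in CONFUSIONS.items():
        if word in template.split():
            for v in variants:
                variant = template.replace(word, v)
                if classify(variant) != base:
                    risky.append(variant)
    return risky

assert fuzz_template("open door four", classify_intent) == [
    "open door for", "open door fore"]
```

Against a real platform, each risky variant would be spoken (or submitted as text) to check whether the vApp's intent classifier silently routes it elsewhere.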
ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection (ASE 2018)
Abstract: Decentralized cryptocurrencies feature the use of blockchain to transfer values among peers on networks without central agency. Smart contracts are programs running on top of the blockchain consensus protocol to enable people make agreements while minimizing trusts. Millions of smart contracts have been deployed in various decentralized applications. The security vulnerabilities within those smart contracts pose significant threats to their applications. Indeed, many critical security vulnerabilities within smart contracts on Ethereum platform have caused huge financial losses to their users. In this work, we present ContractFuzzer, a novel fuzzer to test Ethereum smart contracts for security vulnerabilities. ContractFuzzer generates fuzzing inputs based on the ABI specifications of smart contracts, defines test oracles to detect security vulnerabilities, instruments the EVM to log smart contracts runtime behaviors, and analyzes these logs to report security vulnerabilities. Our fuzzing of 6991 smart contracts has flagged more than 459 vulnerabilities with high precision. In particular, our fuzzing tool successfully detects the vulnerability of the DAO contract that leads to USD 60 million loss and the vulnerabilities of Parity Wallet that have led to the loss of USD 30 million and the freezing of USD 150 million worth of Ether.
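The ABI-driven input generation step can be sketched as drawing boundary-heavy values per declared Solidity parameter type. The value pools and `generate_call` helper are illustrative assumptions, not ContractFuzzer's implementation.

```python
import random

# Boundary-heavy value pools per Solidity ABI type (illustrative).
POOLS = {
    "uint256": [0, 1, 2**256 - 1, 2**255],
    "address": ["0x" + "00" * 20, "0x" + "ff" * 20],
    "bool": [True, False],
}

def generate_call(abi_entry, rng):
    """Pick one fuzzing value for each parameter type the ABI declares."""
    args = [rng.choice(POOLS[inp["type"]]) for inp in abi_entry["inputs"]]
    return abi_entry["name"], args

abi_entry = {"name": "transfer",
             "inputs": [{"type": "address"}, {"type": "uint256"}]}
name, args = generate_call(abi_entry, random.Random(0))
assert name == "transfer" and len(args) == 2
```

Each generated call would then be executed on the instrumented EVM, with the test oracles checking the logged runtime behavior for vulnerability patterns.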
IoTFuzzer: Discovering Memory Corruptions in IoT Through App-based Fuzzing (NDSS 2018)
Abstract: With more IoT devices entering the consumer market, it becomes imperative to detect their security vulnerabilities before an attacker does. Existing binary analysis based approaches only work on firmware, which is less accessible except for those equipped with special tools for extracting the code from the device. To address this challenge in IoT security analysis, we present in this paper a novel automatic fuzzing framework, called IOTFUZZER, which aims at finding memory corruption vulnerabilities in IoT devices without access to their firmware images. The key idea is based upon the observation that most IoT devices are controlled through their official mobile apps, and such an app often contains rich information about the protocol it uses to communicate with its device. Therefore, by identifying and reusing program-specific logic (e.g., encryption) to mutate the test case (particularly message fields), we are able to effectively probe IoT targets without relying on any knowledge about its protocol specifications. In our research, we implemented IOTFUZZER and evaluated 17 real-world IoT devices running on different protocols, and our approach successfully identified 15 memory corruption vulnerabilities (including 8 previously unknown ones).
What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices (NDSS 2018)
Paper  Slides
Abstract: As networked embedded systems are becoming more ubiquitous, their security is becoming critical to our daily life. While manual or automated large scale analysis of those systems regularly uncover new vulnerabilities, the way those systems are analyzed follows often the same approaches used on desktop systems. More specifically, traditional testing approaches relies on observable crashes of a program, and binary instrumentation techniques are used to improve the detection of those faulty states. In this paper, we demonstrate that memory corruptions, a common class of security vulnerabilities, often result in different behavior on embedded devices than on desktop systems. In particular, on embedded devices, effects of memory corruption are often less visible. This reduces significantly the effectiveness of traditional dynamic testing techniques in general, and fuzzing in particular. Additionally, we analyze those differences in several categories of embedded devices and show the resulting impact on firmware analysis. We further describe and evaluate relatively simple heuristics which can be applied at run time (on an execution trace or in an emulator), during the analysis of an embedded device to detect previously undetected memory corruptions.
MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation (USENIX Security 2018)
Paper
Abstract: OS fuzzers primarily test the system call interface between the OS kernel and user-level applications for security vulnerabilities. The effectiveness of evolutionary OS fuzzers depends heavily on the quality and diversity of their seed system call sequences. However, generating good seeds for OS fuzzing is a hard problem as the behavior of each system call depends heavily on the OS kernel state created by the previously executed system calls. Therefore, popular evolutionary OS fuzzers often rely on hand-coded rules for generating valid seed sequences of system calls that can bootstrap the fuzzing process. Unfortunately, this approach severely restricts the diversity of the seed system call sequences and therefore limits the effectiveness of the fuzzers. In this paper, we develop MoonShine, a novel strategy for distilling seeds for OS fuzzers from system call traces of real-world programs while still maintaining the dependencies across the system calls. MoonShine leverages light-weight static analysis for efficiently detecting dependencies across different system calls. We designed and implemented MoonShine as an extension to Syzkaller, a state-of-the-art evolutionary fuzzer for the Linux kernel. Starting from traces containing 2.8 million system calls gathered from 3,220 real-world programs, MoonShine distilled down to just over 14,000 calls while preserving 86% of the original code coverage. Using these distilled seed system call sequences, MoonShine was able to improve Syzkaller’s achieved code coverage for the Linux kernel by 13% on average. MoonShine also found 14 new vulnerabilities in the Linux kernel that were not found by Syzkaller.
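The distillation idea — keep each call that contributes new coverage plus the earlier calls it depends on — can be sketched over a toy trace. The trace format, coverage map, and dependency map below are illustrative, not MoonShine's real data structures.

```python
def distill(trace, coverage_of, deps_of):
    """trace: call ids in execution order; coverage_of: id -> set of blocks;
    deps_of: id -> ids it depends on (e.g. the call that produced its fd).
    Returns the distilled trace, dependencies preserved."""
    covered, keep = set(), set()
    for call in trace:
        gain = coverage_of[call] - covered
        if gain:
            covered |= gain
            # Pull in the call's dependency closure so it still runs validly.
            stack = [call]
            while stack:
                c = stack.pop()
                if c not in keep:
                    keep.add(c)
                    stack.extend(deps_of.get(c, ()))
    return [c for c in trace if c in keep]

trace = ["open", "dup", "read", "read2", "close"]
coverage_of = {"open": {1}, "dup": {1}, "read": {2}, "read2": {2}, "close": {3}}
deps_of = {"read": {"open"}, "read2": {"open"}, "close": {"open"}}
assert distill(trace, coverage_of, deps_of) == ["open", "read", "close"]
```

`dup` and `read2` are dropped as redundant, but `open` survives because the kept calls depend on it — the dependency preservation the abstract emphasizes.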
Singularity: Pattern Fuzzing for Worst Case Complexity (FSE 2018)
Paper
Abstract: We describe a new blackbox complexity testing technique for determining the worst-case asymptotic complexity of a given application. The key idea is to look for an input pattern —rather than a concrete input— that maximizes the asymptotic resource usage of the program. Because input patterns can be described concisely as programs in a restricted language, our method transforms the complexity testing problem to optimal program synthesis. In particular, we express these input patterns using a new model of computation called Recurrent Computation Graph (RCG) and solve the optimal synthesis problem by developing a genetic programming algorithm that operates on RCGs. We have implemented the proposed ideas in a tool called Singularity and evaluate it on a diverse set of benchmarks. Our evaluation shows that Singularity can effectively discover the worst-case complexity of various algorithms and that it is more scalable compared to existing state-of-the-art techniques. Furthermore, our experiments also corroborate that Singularity can discover previously unknown performance bugs and availability vulnerabilities in real-world applications such as Google Guava and JGraphT.
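The shift from concrete inputs to input *patterns* can be illustrated with a toy search: each pattern is a generator program producing inputs of any size, and a pattern is scored by the target's cost at a probe size. The insertion-sort target and two fixed patterns below are illustrative stand-ins for Singularity's RCGs and genetic search.

```python
def insertion_sort_ops(xs):
    """Count comparisons made by insertion sort (the resource to maximize)."""
    xs, ops = list(xs), 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            ops += 1
            if xs[j - 1] > xs[j]:
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
            else:
                break
    return ops

# Input patterns: programs that generate an input for any requested size n.
PATTERNS = {
    "ascending":  lambda n: list(range(n)),
    "descending": lambda n: list(range(n, 0, -1)),
}

def best_pattern(patterns, cost, n=64):
    """Pick the generator whose output maximizes the target's cost at size n."""
    return max(patterns, key=lambda name: cost(patterns[name](n)))

assert best_pattern(PATTERNS, insertion_sort_ops) == "descending"
```

Because the winner is a generator rather than one input, it describes worst-case behavior at every size — the basis for reporting asymptotic complexity.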
NEZHA: Efficient Domain-Independent Differential Testing (S&P 2017)
Differential testing uses similar programs as cross-referencing oracles to find semantic bugs that do not exhibit explicit erroneous behaviors like crashes or assertion failures. Unfortunately, existing differential testing tools are domain-specific and inefficient, requiring large numbers of test inputs to find a single bug. In this paper, we address these issues by designing and implementing NEZHA, an efficient input-format-agnostic differential testing framework. The key insight behind NEZHA’s design is that current tools generate inputs by simply borrowing techniques designed for finding crash or memory corruption bugs in individual programs (e.g., maximizing code coverage). By contrast, NEZHA exploits the behavioral asymmetries between multiple test programs to focus on inputs that are more likely to trigger semantic bugs. We introduce the notion of δ-diversity, which summarizes the observed asymmetries between the behaviors of multiple test applications. Based on δ-diversity, we design two efficient domain-independent input generation mechanisms for differential testing, one gray-box and one black-box. We demonstrate that both of these input generation schemes are significantly more efficient than existing tools at finding semantic bugs in real-world, complex software.
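The δ-diversity feedback can be sketched concretely: the unit of feedback is the tuple of per-program behaviors on one input, an input is interesting when its tuple is new, and asymmetric tuples (programs disagreeing) are semantic-bug candidates. The two toy "parsers" below stand in for the multiple test programs.

```python
def parser_a(s: str) -> int:
    """Toy program A: accepts plain digit strings (0 = accept, 1 = reject)."""
    return 0 if s.isdigit() else 1

def parser_b(s: str) -> int:
    """Toy program B: also tolerates a leading '+' sign."""
    return 0 if s.lstrip("+").isdigit() else 1

def delta_diversity_fuzz(inputs):
    """Keep inputs whose behavior tuple across all programs is new; report
    those where the programs disagree (candidate semantic bugs)."""
    seen, disagreements = set(), []
    for inp in inputs:
        tup = (parser_a(inp), parser_b(inp))
        if tup not in seen:
            seen.add(tup)
            if len(set(tup)) > 1:      # asymmetric: programs disagree
                disagreements.append(inp)
    return disagreements

assert delta_diversity_fuzz(["123", "abc", "+7", "456"]) == ["+7"]
```

Note that "+7" produces no crash in either program; only the *asymmetry* between their verdicts surfaces the semantic divergence, which coverage-only feedback would miss.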
Evaluate Fuzzing
Evaluating Fuzz Testing (CCS 2018)
Abstract: Fuzz testing has enjoyed great success at discovering security critical bugs in real software. Recently, researchers have devoted significant effort to devising new fuzzing techniques, strategies, and algorithms. Such new ideas are primarily evaluated experimentally so an important question is: What experimental setup is needed to produce trustworthy results? We surveyed the recent research literature and assessed the experimental evaluations carried out by 32 fuzzing papers. We found problems in every evaluation we considered. We then performed our own extensive experimental evaluation using an existing fuzzer. Our results showed that the general problems we found in existing experimental evaluations can indeed translate to actual wrong or misleading assessments. We conclude with some guidelines that we hope will help improve experimental evaluations of fuzz testing algorithms, making reported results more robust.
Kernel Fuzzing
PeriScope: An Effective Probing and Fuzzing Framework for the Hardware-OS Boundary (NDSS 2019)
Paper
Abstract: The OS kernel is an attractive target for remote attackers. If compromised, the kernel gives adversaries full system access, including the ability to install rootkits, extract sensitive information, and perform other malicious actions, all while evading detection. Most of the kernel’s attack surface is situated along the system call boundary. Ongoing kernel protection efforts have focused primarily on securing this boundary; several capable analysis and fuzzing frameworks have been developed for this purpose. However, there are additional paths to kernel compromise that do not involve system calls, as demonstrated by several recent exploits. For example, by compromising the firmware of a peripheral device such as a Wi-Fi chipset and subsequently sending malicious inputs from the Wi-Fi chipset to the Wi-Fi driver, adversaries have been able to gain control over the kernel without invoking a single system call. Unfortunately, there are currently no practical probing and fuzzing frameworks that can help developers find and fix such vulnerabilities occurring along the hardware-OS boundary. We present PeriScope, a Linux kernel based probing framework that enables fine-grained analysis of device-driver interactions. PeriScope hooks into the kernel’s page fault handling mechanism to either passively monitor and log traffic between device drivers and their corresponding hardware, or mutate the data stream on-the-fly using a fuzzing component, PeriFuzz, thus mimicking an active adversarial attack. PeriFuzz accurately models the capabilities of an attacker on peripheral devices, to expose different classes of bugs including, but not limited to, memory corruption bugs and double-fetch bugs. To demonstrate the risk that peripheral devices pose, as well as the value of our framework, we have evaluated PeriFuzz on the Wi-Fi drivers of two popular chipset vendors, where we discovered 15 unique vulnerabilities, 9 of which were previously unknown.
Fuzzing File Systems via Two-Dimensional Input Space Exploration (S&P 2019)
Paper
Abstract: File systems, a basic building block of an OS, are too big and too complex to be bug free. Nevertheless, file systems rely on regular stress-testing tools and formal checkers to find bugs, which are limited due to the ever-increasing complexity of both file systems and OSes. Thus, fuzzing, proven to be an effective and a practical approach, becomes a preferable choice, as it does not need much knowledge about a target. However, three main challenges exist in fuzzing file systems: mutating a large image blob that degrades overall performance, generating image-dependent file operations, and reproducing found bugs, which is difficult for existing OS fuzzers. Hence, we present JANUS, the first feedback-driven fuzzer that explores the two-dimensional input space of a file system, i.e., mutating metadata on a large image, while emitting image-directed file operations. In addition, JANUS relies on a library OS rather than on traditional VMs for fuzzing, which enables JANUS to load a fresh copy of the OS, thereby leading to better reproducibility of bugs. We evaluate JANUS on eight file systems and found 90 bugs in the upstream Linux kernel, 62 of which have been acknowledged. Forty-three bugs have been fixed with 32 CVEs assigned. In addition, JANUS achieves higher code coverage on all the file systems after fuzzing 12 hours, when compared with the state-of-the-art fuzzer Syzkaller for fuzzing file systems. JANUS visits 4.19x and 2.01x more code paths in Btrfs and ext4, respectively. Moreover, JANUS is able to reproduce 88-100% of the crashes, while Syzkaller fails on all of them.
Razzer: Finding Kernel Race Bugs through Fuzzing (S&P 2019)
Paper
Abstract: A data race in a kernel is an important class of bugs, critically impacting the reliability and security of the associated system. As a result of a race, the kernel may become unresponsive. Even worse, an attacker may launch a privilege escalation attack to acquire root privileges. In this paper, we propose Razzer, a tool to find race bugs in kernels. The core of Razzer is in guiding fuzz testing towards potential data race spots in the kernel. Razzer employs two techniques to find races efficiently: a static analysis and a deterministic thread interleaving technique. Using a static analysis, Razzer identifies over-approximated potential data race spots, guiding the fuzzer to search for data races in the kernel more efficiently. Using the deterministic thread interleaving technique implemented at the hypervisor, Razzer tames the non-deterministic behavior of the kernel such that it can deterministically trigger a race. We implemented a prototype of Razzer and ran the latest Linux kernel (from v4.16-rc3 to v4.18-rc3) using Razzer. As a result, Razzer discovered 30 new races in the kernel, with 16 subsequently confirmed and accordingly patched by kernel developers after they were reported.
kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels (USENIX Security 2017)
Paper  Slides  Code
Abstract: Many kinds of memory safety vulnerabilities have been endangering software systems for decades. Amongst other approaches, fuzzing is a promising technique to unveil various software faults. Recently, feedback-guided fuzzing demonstrated its power, producing a steady stream of security-critical software bugs. Most fuzzing efforts—especially feedback fuzzing—are limited to user space components of an operating system (OS), although bugs in kernel components are more severe, because they allow an attacker to gain access to a system with full privileges. Unfortunately, kernel components are difficult to fuzz as feedback mechanisms (i.e., guided code coverage) cannot be easily applied. Additionally, non-determinism due to interrupts, kernel threads, statefulness, and similar mechanisms poses problems. Furthermore, if a process fuzzes its own kernel, a kernel crash highly impacts the performance of the fuzzer as the OS needs to reboot.
In this paper, we approach the problem of coverage-guided kernel fuzzing in an OS-independent and hardware-assisted way: We utilize a hypervisor and Intel’s Processor Trace (PT) technology. This allows us to remain independent of the target OS as we just require a small user space component that interacts with the targeted OS. As a result, our approach introduces almost no performance overhead, even in cases where the OS crashes, and performs up to 17,000 executions per second on an off-the-shelf laptop. We developed a framework called kernel-AFL (kAFL) to assess the security of Linux, macOS, and Windows kernel components. Among many crashes, we uncovered several flaws in the ext4 driver for Linux, the HFS and APFS file system of macOS, and the NTFS driver of Windows.
Hybrid Fuzzing
Send Hardest Problems My Way: Probabilistic Path Prioritization for Hybrid Fuzzing (NDSS 2019)
Paper
Abstract: Hybrid fuzzing which combines fuzzing and concolic execution has become an advanced technique for software vulnerability detection. Based on the observation that fuzzing and concolic execution are complementary in nature, the state-of-the-art hybrid fuzzing systems deploy “demand launch” and “optimal switch” strategies. Although these ideas sound intriguing, we point out several fundamental limitations in them, due to oversimplified assumptions. We then propose a novel “discriminative dispatch” strategy to better utilize the capability of concolic execution. We design a novel Monte Carlo based probabilistic path prioritization model to quantify each path’s difficulty and prioritize them for concolic execution. This model treats fuzzing as a random sampling process. It calculates each path’s probability based on the sampling information. Finally, our model prioritizes and assigns the most difficult paths to concolic execution. We implement a prototype system DigFuzz and evaluate our system with two representative datasets. Results show that the concolic execution in DigFuzz outperforms that in a state-of-the-art hybrid fuzzing system Driller in every major aspect. In particular, the concolic execution in DigFuzz contributes to discovering more vulnerabilities (12 vs. 5) and producing more code coverage (18.9% vs. 3.8%) on the CQE dataset than the concolic execution in Driller.
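The Monte Carlo model can be sketched as follows: estimate each branch's probability from fuzzing hit counts, score a path as the product of its branch probabilities, and send the least probable (hardest) paths to concolic execution first. The data layout and smoothing below are illustrative assumptions, not DigFuzz's exact formulation.

```python
from math import prod

def branch_prob(hits_taken, hits_total):
    """Estimate P(branch taken) from fuzzing samples, with Laplace-style
    smoothing so unexplored branches keep a small nonzero probability."""
    return (hits_taken + 1) / (hits_total + 2)

def path_probability(path, branch_hits):
    """path: list of (branch_id, taken); branch_hits: id -> (taken, total).
    The path's probability is the product over its branch decisions."""
    probs = []
    for branch_id, taken in path:
        t, total = branch_hits[branch_id]
        p = branch_prob(t, total)
        probs.append(p if taken else 1 - p)
    return prod(probs)

branch_hits = {"b1": (990, 1000), "b2": (1, 1000)}
easy = [("b1", True)]                  # almost always reached by fuzzing
hard = [("b1", True), ("b2", True)]    # guarded by a rarely-taken branch
assert path_probability(hard, branch_hits) < path_probability(easy, branch_hits)
```

Sorting frontier paths by ascending probability yields the "discriminative dispatch" queue: fuzzing keeps the easy paths, the solver gets the hard ones.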
QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing (USENIX Security 2018)
Abstract: Recently, hybrid fuzzing has been proposed to address the limitations of fuzzing and concolic execution by combining both approaches. The hybrid approach has shown its effectiveness in various synthetic benchmarks such as DARPA Cyber Grand Challenge (CGC) binaries, but it still suffers from scaling to find bugs in complex, real-world software. We observed that the performance bottleneck of the existing concolic executor is the main limiting factor for its adoption beyond a small-scale study. To overcome this problem, we design a fast concolic execution engine, called QSYM, to support hybrid fuzzing. The key idea is to tightly integrate the symbolic emulation with the native execution using dynamic binary translation, making it possible to implement more fine-grained, so faster, instruction-level symbolic emulation. Additionally, QSYM loosens the strict soundness requirements of conventional concolic executors for better performance, yet takes advantage of a faster fuzzer for validation, providing unprecedented opportunities for performance optimizations, e.g., optimistically solving constraints and pruning uninteresting basic blocks. Our evaluation shows that QSYM does not just outperform state-of-the-art fuzzers (i.e., found 14× more bugs than VUzzer in the LAVA-M dataset, and outperformed Driller in 104 binaries out of 126), but also found 13 previously unknown security bugs in eight real-world programs like Dropbox Lepton, ffmpeg, and OpenJPEG, which have already been intensively tested by the state-of-the-art fuzzers, AFL and OSS-Fuzz.
Angora: Efficient Fuzzing by Principled Search (S&P 2018)
Abstract: Fuzzing is a popular technique for finding software bugs. However, the performance of the state-of-the-art fuzzers leaves a lot to be desired. Fuzzers based on symbolic execution produce quality inputs but run slow, while fuzzers based on random mutation run fast but have difficulty producing quality inputs. We propose Angora, a new mutation-based fuzzer that outperforms the state-of-the-art fuzzers by a wide margin. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution. To solve path constraints efficiently, we introduce several key techniques: scalable byte-level taint tracking, context-sensitive branch count, search based on gradient descent, and input length exploration. On the LAVA-M data set, Angora found almost all the injected bugs, found more bugs than any other fuzzer that we compared with, and found eight times as many bugs as the second-best fuzzer in the program `who`. Angora also found 103 bugs that the LAVA authors injected but could not trigger. We also tested Angora on eight popular, mature open source programs. Angora found 6, 52, 29, 40 and 48 new bugs in file, jhead, nm, objdump and size, respectively. We measured the coverage of Angora and evaluated how its key techniques contribute to its impressive performance.
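Angora's gradient-descent search can be sketched by treating a branch predicate as a black-box distance function over input bytes, estimating a gradient by finite differences, and stepping each byte against it. The hidden predicate and unit step size below are illustrative, not the paper's setup.

```python
def branch_distance(x):
    """Distance to satisfying a hidden branch condition, here
    x[0]*2 + x[1] == 300 (zero means the branch flips)."""
    return abs(x[0] * 2 + x[1] - 300)

def gradient_descent_solve(f, x, iters=200):
    """Minimize black-box f over a byte vector by finite differences."""
    for _ in range(iters):
        if f(x) == 0:
            return x
        # Estimate the partial derivative for each byte by perturbing it.
        grad = [f(x[:i] + [x[i] + 1] + x[i + 1:]) - f(x) for i in range(len(x))]
        # Step each byte against its gradient sign, clamped to the byte range.
        x = [max(0, min(255, xi - (1 if g > 0 else -1 if g < 0 else 0)))
             for xi, g in zip(x, grad)]
    return x

solution = gradient_descent_solve(branch_distance, [0, 0])
assert branch_distance(solution) == 0
```

No symbolic reasoning about the predicate is needed: only the ability to execute it and read off the distance, which is what makes the search cheap compared to a constraint solver.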
Driller: Augmenting Fuzzing Through Selective Symbolic Execution (NDSS 2016)
Paper
Abstract: Memory corruption vulnerabilities are an everpresent risk in software, which attackers can exploit to obtain unauthorized access to confidential information. As products with access to sensitive data are becoming more prevalent, the number of potentially exploitable systems is also increasing, resulting in a greater need for automated software vetting tools. DARPA recently funded a competition, with millions of dollars in prize money, to further research focusing on automated vulnerability finding and patching, showing the importance of research in this area. Current techniques for finding potential bugs include static, dynamic, and concolic analysis systems, which each having their own advantages and disadvantages. A common limitation of systems designed to create inputs which trigger vulnerabilities is that they only find shallow bugs and struggle to exercise deeper paths in executables. We present Driller, a hybrid vulnerability excavation tool which leverages fuzzing and selective concolic execution in a complementary manner, to find deeper bugs. Inexpensive fuzzing is used to exercise compartments of an application, while concolic execution is used to generate inputs which satisfy the complex checks separating the compartments. By combining the strengths of the two techniques, we mitigate their weaknesses, avoiding the path explosion inherent in concolic analysis and the incompleteness of fuzzing. Driller uses selective concolic execution to explore only the paths deemed interesting by the fuzzer and to generate inputs for conditions that the fuzzer cannot satisfy. We evaluate Driller on 126 applications released in the qualifying event of the DARPA Cyber Grand Challenge and show its efficacy by identifying the same number of vulnerabilities, in the same time, as the top-scoring team of the qualifying event.
Note: As we all know, fuzzing can easily satisfy loose constraints (such as x > 0) by mutating inputs, while symbolic execution excels at solving magic values (such as x == 0xdeadbeef). This is a classic paper on combining concolic execution with fuzzing. The main idea is to first use a fuzzer such as AFL to mutate the seeds and test the program. When the generated inputs keep exercising the same paths and no new path is discovered, the fuzzer is "stuck". At that point, concolic execution is used to produce inputs that are guaranteed to reach new branches. In this way, concolic execution assists the fuzzing.
REDQUEEN: Fuzzing with Input-to-State Correspondence (NDSS 2019)
Abstract: Automated software testing based on fuzzing has experienced a revival in recent years. Especially feedback-driven fuzzing has become well-known for its ability to efficiently perform randomized testing with limited input corpora. Despite a lot of progress, two common problems are magic numbers and (nested) checksums. Computationally expensive methods such as taint tracking and symbolic execution are typically used to overcome such roadblocks. Unfortunately, such methods often require access to source code, a rather precise description of the environment (e.g., behavior of library calls or the underlying OS), or the exact semantics of the platform’s instruction set. In this paper, we introduce a lightweight, yet very effective alternative to taint tracking and symbolic execution to facilitate and optimize state-of-the-art feedback fuzzing that easily scales to large binary applications and unknown environments. We observe that during the execution of a given program, parts of the input often end up directly (i.e., nearly unmodified) in the program state. This input-to-state correspondence can be exploited to create a robust method to overcome common fuzzing roadblocks in a highly effective and efficient manner. Our prototype implementation, called REDQUEEN, is able to solve magic bytes and (nested) checksum tests automatically for a given binary executable. Additionally, we show that our techniques outperform various state-of-the-art tools on a wide variety of targets across different privilege levels (kernel-space and userland) with no platform-specific code. REDQUEEN is the first method to find more than 100% of the bugs planted in LAVA-M across all targets. Furthermore, we were able to discover 65 new bugs and obtained 16 CVEs in multiple programs and OS kernel drivers. Finally, our evaluation demonstrates that REDQUEEN is fast, widely applicable and outperforms concurrent approaches by up to three orders of magnitude.
T-Fuzz: fuzzing by program transformation (S&P 2018)
Abstract: Fuzzing is a simple yet effective approach to discover software bugs utilizing randomly generated inputs. However, it is limited by coverage and cannot find bugs hidden in deep execution paths of the program because the randomly generated inputs fail complex sanity checks, e.g., checks on magic values, checksums, or hashes. To improve coverage, existing approaches rely on imprecise heuristics or complex input mutation techniques (e.g., symbolic execution or taint analysis) to bypass sanity checks. Our novel method tackles coverage from a different angle: by removing sanity checks in the target program. T-Fuzz leverages a coverage-guided fuzzer to generate inputs. Whenever the fuzzer can no longer trigger new code paths, a light-weight, dynamic tracing based technique detects the input checks that the fuzzer-generated inputs fail. These checks are then removed from the target program. Fuzzing then continues on the transformed program, allowing the code protected by the removed checks to be triggered and potential bugs discovered. Fuzzing transformed programs to find bugs poses two challenges: (1) removal of checks leads to over-approximation and false positives, and (2) even for true bugs, the crashing input on the transformed program may not trigger the bug in the original program. As an auxiliary post-processing step, T-Fuzz leverages a symbolic execution-based approach to filter out false positives and reproduce true bugs in the original program. By transforming the program as well as mutating the input, T-Fuzz covers more code and finds more true bugs than any existing technique. We have evaluated T-Fuzz on the DARPA Cyber Grand Challenge dataset, LAVA-M dataset and 4 real-world programs (pngfix, tiffinfo, magick and pdftohtml). For the CGC dataset, T-Fuzz finds bugs in 166 binaries, Driller in 121, and AFL in 105. In addition, T-Fuzz found 3 new bugs in previously-fuzzed programs and libraries.
FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage (ASE 2018)
Abstract: In recent years, fuzz testing has proven itself to be one of the most effective techniques for finding correctness bugs and security vulnerabilities in practice. One particular fuzz testing tool, American Fuzzy Lop (AFL), has become popular thanks to its ease-of-use and bug-finding power. However, AFL remains limited in the bugs it can find since it simply does not cover large regions of code. If it does not cover parts of the code, it will not find bugs there. We propose a two-pronged approach to increase the coverage achieved by AFL. First, the approach automatically identifies branches exercised by few AFL-produced inputs (rare branches), which often guard code that is empirically hard to cover by naively mutating inputs. The second part of the approach is a novel mutation mask creation algorithm, which allows mutations to be biased towards producing inputs hitting a given rare branch. This mask is dynamically computed during fuzz testing and can be adapted to other testing targets. We implement this approach on top of AFL in a tool named FairFuzz. We conduct evaluation on real-world programs against state-of-the-art versions of AFL. We find that on these programs FairFuzz achieves high branch coverage at a faster rate than state-of-the-art versions of AFL. In addition, on programs with nested conditional structure, it achieves sustained increases in branch coverage after 24 hours (average 10.6% increase). In qualitative analysis, we find that FairFuzz has an increased capacity to automatically discover keywords.
VUzzer: Application-aware Evolutionary Fuzzing (NDSS 2017)
Abstract: Fuzzing is an effective software testing technique to find bugs. Given the size and complexity of real-world applications, modern fuzzers tend to be either scalable, but not effective in exploring bugs that lie deeper in the execution, or capable of penetrating deeper in the application, but not scalable. In this paper, we present an application-aware evolutionary fuzzing strategy that does not require any prior knowledge of the application or input format. In order to maximize coverage and explore deeper paths, we leverage control- and data-flow features based on static and dynamic analysis to infer fundamental properties of the application. This enables much faster generation of interesting inputs compared to an application-agnostic approach. We implement our fuzzing strategy in VUzzer and evaluate it on three different datasets: DARPA Grand Challenge binaries (CGC), a set of real-world applications (binary input parsers), and the recently released LAVA dataset. On all of these datasets, VUzzer yields significantly better results than state-of-the-art fuzzers, by quickly finding several existing and new bugs.
Inputs-aware Fuzzing
SLF: Fuzzing without Valid Seed Inputs (ICSE2019)
Paper
Abstract: Fuzzing is an important technique to detect software bugs and vulnerabilities. It works by mutating a small set of seed inputs to generate a large number of new inputs. Fuzzers’ performance often substantially degrades when valid seed inputs are not available. Although existing techniques such as symbolic execution can generate seed inputs from scratch, they have various limitations hindering their applications in real-world complex software without source code. In this paper, we propose a novel fuzzing technique that features the capability of generating valid seed inputs. It piggy-backs on AFL to identify input validity checks and the input fields that have impact on such checks. It further classifies these checks according to their relations to the input. Such classes include arithmetic relation, object offset, data structure length and so on. A multi-goal search algorithm is developed to apply class specific mutations in order to satisfy inter-dependent checks all together. We evaluate our technique on 20 popular benchmark programs collected from other fuzzing projects and the Google fuzzer test suite, and compare it with existing fuzzers AFL and AFLFast, symbolic execution engines KLEE and S2E, and a hybrid tool Driller that combines fuzzing with symbolic execution. The results show that our technique is highly effective and efficient, out-performing the other tools.
Superion: Grammar-Aware Greybox Fuzzing (ICSE 2019)
Paper
Abstract: In recent years, coverage-based greybox fuzzing has proven itself to be one of the most effective techniques for finding security bugs in practice. Particularly, American Fuzzy Lop (AFL for short) is deemed to be a great success in fuzzing relatively simple test inputs. Unfortunately, when it meets structured test inputs such as XML and JavaScript, those grammar-blind trimming and mutation strategies in AFL hinder the effectiveness and efficiency. To this end, we propose a grammar-aware coverage-based grey-box fuzzing approach to fuzz programs that process structured inputs. Given the grammar (which is often publicly available) of test inputs, we introduce a grammar-aware trimming strategy to trim test inputs at the tree level using the abstract syntax trees (ASTs) of parsed test inputs. Further, we introduce two grammar-aware mutation strategies (i.e., enhanced dictionary-based mutation and tree-based mutation). Specifically, tree-based mutation works via replacing subtrees using the ASTs of parsed test inputs. Equipped with grammar-awareness, our approach can carry the fuzzing exploration into width and depth. We implemented our approach as an extension to AFL, named Superion; and evaluated the effectiveness of Superion on real-life large-scale programs (a XML engine libplist and three JavaScript engines WebKit, Jerryscript and ChakraCore). Our results have demonstrated that Superion can improve the code coverage (i.e., 16.7% and 8.8% in line and function coverage) and bug-finding capability (i.e., 30 new bugs, among which we discovered 21 new vulnerabilities with 16 CVEs assigned and 3.2K USD bug bounty rewards received) over AFL and jsfunfuzz.
ProFuzzer: On-the-fly Input Type Probing for Better Zero-day Vulnerability Discovery (S&P 2019)
Abstract: Existing mutation based fuzzers tend to randomly mutate the input of a program without understanding its underlying syntax and semantics. In this paper, we propose a novel on-the-fly probing technique (called ProFuzzer) that automatically recovers and understands input fields of critical importance to vulnerability discovery during a fuzzing process and intelligently adapts the mutation strategy to enhance the chance of hitting zero-day targets. Since such probing is transparently piggybacked to the regular fuzzing, no prior knowledge of the input specification is needed. During fuzzing, individual bytes are first mutated and their fuzzing results are automatically analyzed to link those related together and identify the type for the field connecting them; these bytes are further mutated together following type-specific strategies, which substantially prunes the search space. We define the probe types generally across all applications, thereby making our technique application agnostic. Our experiments on standard benchmarks and real-world applications show that ProFuzzer substantially outperforms AFL and its optimized version AFLFast, as well as other state-of-art fuzzers including VUzzer, Driller and QSYM. Within two months, it exposed 42 zero-days in 10 intensively tested programs, generating 30 CVEs.
Directed Fuzzing
Directed Greybox Fuzzing (CCS 2017)
Paper  Code
Abstract: Existing Greybox Fuzzers (GF) cannot be effectively directed, for instance, towards problematic changes or patches, towards critical system calls or dangerous locations, or towards functions in the stack-trace of a reported vulnerability that we wish to reproduce. In this paper, we introduce Directed Greybox Fuzzing (DGF) which generates inputs with the objective of reaching a given set of target program locations efficiently. We develop and evaluate a simulated annealing-based power schedule that gradually assigns more energy to seeds that are closer to the target locations while reducing energy for seeds that are further away. Experiments with our implementation AFLGo demonstrate that DGF outperforms both directed symbolic-execution-based whitebox fuzzing and undirected greybox fuzzing. We show applications of DGF to patch testing and crash reproduction, and discuss the integration of AFLGo into Google’s continuous fuzzing platform OSS-Fuzz. Due to its directedness, AFLGo could find 39 bugs in several well-fuzzed, security-critical projects like LibXML2. 17 CVEs were assigned.
Hawkeye: Towards a Desired Directed Grey-box Fuzzer (CCS 2018)
Abstract: Grey-box fuzzing is a practically effective approach to test real-world programs. However, most existing grey-box fuzzers lack directedness, i.e. the capability of executing towards user-specified target sites in the program. To emphasize existing challenges in directed fuzzing, we propose Hawkeye to feature four desired properties of directed grey-box fuzzers. Owing to a novel static analysis on the program under test and the target sites, Hawkeye precisely collects the information such as the call graph, function and basic block level distances to the targets. During fuzzing, Hawkeye evaluates exercised seeds based on both static information and the execution traces to generate the dynamic metrics, which are then used for seed prioritization, power scheduling and adaptive mutating. These strategies help Hawkeye to achieve better directedness and gravitate towards the target sites. We implemented Hawkeye as a fuzzing framework and evaluated it on various real-world programs under different scenarios. The experimental results showed that Hawkeye can reach the target sites and reproduce the crashes much faster than state-of-the-art grey-box fuzzers such as AFL and AFLGo. Specially, Hawkeye can reduce the time to exposure for certain vulnerabilities from about 3.5 hours to 0.5 hour. By now, Hawkeye has detected more than 41 previously unknown crashes in projects such as Oniguruma, MJS with the target sites provided by vulnerability prediction tools; all these crashes are confirmed and 15 of them have been assigned CVE IDs.
CollAFL: Path Sensitive Fuzzing (S&P 2018)
Abstract: Coverage-guided fuzzing is a widely used and ef- fective solution to find software vulnerabilities. Tracking code coverage and utilizing it to guide fuzzing are crucial to coverage- guided fuzzers. However, tracking full and accurate path coverage is infeasible in practice due to the high instrumentation overhead. Popular fuzzers (e.g., AFL) often use coarse coverage information, e.g., edge hit counts stored in a compact bitmap, to achieve highly efficient greybox testing. Such inaccuracy and incompleteness in coverage introduce serious limitations to fuzzers. First, it causes path collisions, which prevent fuzzers from discovering potential paths that lead to new crashes. More importantly, it prevents fuzzers from making wise decisions on fuzzing strategies. In this paper, we propose a coverage sensitive fuzzing solution CollAFL. It mitigates path collisions by providing more accurate coverage information, while still preserving low instrumentation overhead. It also utilizes the coverage information to apply three new fuzzing strategies, promoting the speed of discovering new paths and vulnerabilities. We implemented a prototype of CollAFL based on the popular fuzzer AFL and evaluated it on 24 popular applications. The results showed that path collisions are common, i.e., up to 75% of edges could collide with others in some applications, and CollAFL could reduce the edge collision ratio to nearly zero. Moreover, armed with the three fuzzing strategies, CollAFL outperforms AFL in terms of both code coverage and vulnerability discovery. On average, CollAFL covered 20% more program paths, found 320% more unique crashes and 260% more bugs than AFL in 200 hours. In total, CollAFL found 157 new security bugs with 95 new CVEs assigned.
Full-speed Fuzzing: Reducing Fuzzing Overhead through Coverage-guided Tracing (S&P 2019)
Paper
Abstract: Coverage-guided fuzzing is one of the most successful approaches for discovering software bugs and security vulnerabilities. Of its three main components: (1) test case generation, (2) code coverage tracing, and (3) crash triage, code coverage tracing is a dominant source of overhead. Coverage-guided fuzzers trace every test case’s code coverage through either static or dynamic binary instrumentation, or more recently, using hardware support. Unfortunately, tracing all test cases incurs significant performance penalties–even when the overwhelming majority of test cases and their coverage information are discarded because they do not increase code coverage. To eliminate needless tracing by coverage-guided fuzzers, we introduce the notion of coverage-guided tracing. Coverage-guided tracing leverages two observations: (1) only a fraction of generated test cases increase coverage, and thus require tracing; and (2) coverage-increasing test cases become less frequent over time. Coverage-guided tracing encodes the current frontier of coverage in the target binary so that it self-reports when a test case produces new coverage–without tracing. This acts as a filter for tracing; restricting the expense of tracing to only coverage-increasing test cases. Thus, coverage-guided tracing trades increased time handling coverage-increasing test cases for decreased time handling non-coverage-increasing test cases. To show the potential of coverage-guided tracing, we create an implementation based on the static binary instrumentor Dyninst called UnTracer. We evaluate UnTracer using eight real-world binaries commonly used by the fuzzing community. Experiments show that after only an hour of fuzzing, UnTracer’s average overhead is below 1%, and after 24-hours of fuzzing, UnTracer approaches 0% overhead, while tracing every test case with popular white- and black-box-binary tracers AFL-Clang, AFL-QEMU, and AFL-Dyninst incurs overheads of 36%, 612%, and 518%, respectively. 
We further integrate UnTracer with the state-of-the-art hybrid fuzzer QSYM and show that in 24-hours of fuzzing, QSYM-UnTracer executes 79% and 616% more test cases than QSYM-Clang and QSYM-QEMU, respectively.
Designing New Operating Primitives to Improve Fuzzing Performance (CCS 2017)
Paper  Code
Abstract: Fuzzing is a software testing technique that finds bugs by repeatedly injecting mutated inputs to a target program. Known to be a highly practical approach, fuzzing is gaining more popularity than ever before. Current research on fuzzing has focused on producing an input that is more likely to trigger a vulnerability. In this paper, we tackle another way to improve the performance of fuzzing, which is to shorten the execution time of each iteration. We observe that AFL, a state-of-the-art fuzzer, slows down by 24x because of file system contention and the scalability of fork() system call when it runs on 120 cores in parallel. Other fuzzers are expected to suffer from the same scalability bottlenecks in that they follow a similar design pattern. To improve the fuzzing performance, we design and implement three new operating primitives specialized for fuzzing that solve these performance bottlenecks and achieve scalable performance on multi-core machines. Our experiment shows that the proposed primitives speed up AFL and LibFuzzer by 6.1 to 28.9x and 1.1 to 735.7x, respectively, on the overall number of executions per second when targeting Google’s fuzzer test suite with 120 cores. In addition, the primitives improve AFL’s throughput up to 7.7x with 30 cores, which is a more common setting in data centers. Our fuzzer-agnostic primitives can be easily applied to any fuzzer with fundamental performance improvement and directly benefit large-scale fuzzing and cloud-based fuzzing services.
Enhancing Memory Error Detection
Enhancing Memory Error Detection for Large-Scale Applications and Fuzz Testing (NDSS 2018)
Abstract: Memory errors are one of the most common vulnerabilities for the popularity of memory unsafe languages including C and C++. Once exploited, it can easily lead to system crash (i.e., denial-of-service attacks) or allow adversaries to fully compromise the victim system. This paper proposes MEDS, a practical memory error detector. MEDS significantly enhances its detection capability by approximating two ideal properties, called an infinite gap and an infinite heap. The approximated infinite gap of MEDS setups large inaccessible memory region between objects (i.e., 4 MB), and the approximated infinite heap allows MEDS to fully utilize virtual address space (i.e., 45-bits memory space). The key idea of MEDS in achieving these properties is a novel user-space memory allocation mechanism, MEDSALLOC. MEDSALLOC leverages a page aliasing mechanism, which allows MEDS to maximize the virtual memory space utilization but minimize the physical memory uses. To highlight the detection capability and practical impacts of MEDS, we evaluated and then compared to Google’s state-of-the-art detection tool, AddressSanitizer. MEDS showed three times better detection rates on four real-world vulnerabilities in Chrome and Firefox. More importantly, when used for a fuzz testing, MEDS was able to identify 68.3% more memory errors than AddressSanitizer for the same amount of a testing time, highlighting its practical aspects in the software testing area. In terms of performance overhead, MEDS slowed down 108% and 86% compared to native execution and AddressSanitizer, respectively, on real-world applications including Chrome, Firefox, Apache, Nginx, and OpenSSL.
Power Schedule
Coverage-based Greybox Fuzzing as Markov Chain (CCS 2016)
Paper  Code
Abstract: Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few “high-frequency” paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability that fuzzing the seed that exercises path i generates an input that exercises path j. Each state (i.e., seed) has an energy that specifies the number of inputs to be generated from that seed. We show that CGF is considerably more efficient if energy is inversely proportional to the density of the stationary distribution and increases monotonically every time that seed is chosen. Energy is controlled with a power schedule. We implemented the exponential schedule by extending AFL. In 24 hours, AFLFAST exposes 3 previously unreported CVEs that are not exposed by AFL and exposes 6 previously unreported CVEs 7x faster than AFL. AFLFAST produces at least an order of magnitude more unique crashes than AFL.
Learning-based Fuzzing
NEUZZ: Efficient Fuzzing with Neural Program Smoothing (S&P 2019)
Paper
Abstract: Fuzzing has become the de facto standard technique for finding software vulnerabilities. However, even state-of-the-art fuzzers are not very efficient at finding hard-to-trigger software bugs. Most popular fuzzers use evolutionary guidance to generate inputs that can trigger different bugs. Such evolutionary algorithms, while fast and simple to implement, often get stuck in fruitless sequences of random mutations. Gradient-guided optimization presents a promising alternative to evolutionary guidance. Gradient-guided techniques have been shown to significantly outperform evolutionary algorithms at solving high-dimensional structured optimization problems in domains like machine learning by efficiently utilizing gradients or higher-order derivatives of the underlying function. However, gradient-guided approaches are not directly applicable to fuzzing as real-world program behaviors contain many discontinuities, plateaus, and ridges where the gradient-based methods often get stuck. We observe that this problem can be addressed by creating a smooth surrogate function approximating the target program’s discrete branching behavior. In this paper, we propose a novel program smoothing technique using surrogate neural network models that can incrementally learn smooth approximations of a complex, real-world program’s branching behaviors. We further demonstrate that such neural network models can be used together with gradient-guided input generation schemes to significantly increase the efficiency of the fuzzing process. Our extensive evaluations demonstrate that NEUZZ significantly outperforms 10 state-of-the-art graybox fuzzers on 10 popular real-world programs both at finding new bugs and achieving higher edge coverage. NEUZZ found 31 previously unknown bugs (including two CVEs) that other fuzzers failed to find in 10 real-world programs and achieved 3X more edge coverage than all of the tested graybox fuzzers over 24 hour runs. 
Furthermore, NEUZZ also outperformed existing fuzzers on both LAVA-M and DARPA CGC bug datasets.
Fuzzing Machine Learning Model
TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing (2018)
Paper  Code
Abstract: Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs to a neural network are guided by a coverage metric toward the goal of satisfying user-specified constraints. We describe how fast approximate nearest neighbor algorithms can provide this coverage metric. We then discuss the application of CGF to the following goals: finding numerical errors in trained neural networks, generating disagreements between neural networks and quantized versions of those networks, and surfacing undesirable behavior in character level language models. Finally, we release an open source library called TensorFuzz that implements the described techniques.
Coverage-Guided Fuzzing for Deep Neural Networks (2018)
Paper
Abstract: In company with the data explosion over the past decade, deep neural network (DNN) based software has experienced unprecedented leap and is becoming the key driving force of many novel industrial applications, including many safety-critical scenarios such as autonomous driving. Despite great success achieved in various human intelligence tasks, similar to traditional software, DNNs could also exhibit incorrect behaviors caused by hidden defects causing severe accidents and losses. In this paper, we propose DeepHunter, an automated fuzz testing framework for hunting potential defects of general-purpose DNNs. DeepHunter performs metamorphic mutation to generate new semantically preserved tests, and leverages multiple plugable coverage criteria as feedback to guide the test generation from different perspectives. To be scalable towards practical-sized DNNs, DeepHunter maintains multiple tests in a batch, and prioritizes the tests selection based on active feedback. The effectiveness of DeepHunter is extensively investigated on 3 popular datasets (MNIST, CIFAR-10, ImageNet) and 7 DNNs with diverse complexities, under large set of 6 coverage criteria as feedback. The large-scale experiments demonstrate that DeepHunter can (1) significantly boost the coverage with guidance; (2) generate useful tests to detect erroneous behaviors and facilitate the DNN model quality evaluation; (3) accurately capture potential defects during DNN quantization for platform migration.

Introduction
This post documents my walkthrough of Exercise 1 from Fuzzing101; the tools and the target program used are introduced below. The post covers the following points:

- Instrumenting the target application
- Using afl-fuzz
- Verifying the fuzz results with GDB

The target is Xpdf 3.02, which contains an uncontrolled-recursion vulnerability, CVE-2019-13288. A crafted file makes Parser::getObj() in Parser.cc call itself over and over. Since every call allocates a new stack frame, recursing deeply enough exhausts the stack and crashes the program, so a remote attacker can use this bug to mount a denial-of-service attack.

Earlier posts in this fuzzing series: AFL Source Code Analysis: afl-fuzz.c Annotated (Part 2): The Fuzzing Execution Flow; AFL Source Code Analysis: afl-fuzz.c Annotated (Part 1): Initial Configuration; AFL Source Code Analysis: afl-gcc.c Annotated; AFL Source Code Analysis: afl-as.c Annotated; free download: AFL-2.57b.zip (the AFL version used in the source-analysis posts)

Project Overview
This post involves three projects: Fuzzing101, AFL, and Xpdf. This section introduces each briefly.
Fuzzing 101
Project page: https://github.com/antonio-morales/Fuzzing101
Fuzzing101 is an open-source project on GitHub that collects ten real-world vulnerable targets (eight published so far). It is aimed at people who want to learn fuzzing and try it out against real-world programs.

I chose Fuzzing 101 for this fuzzing series because its exercises are approachable and progressively demonstrate a variety of fuzzing techniques.
AFL
AFL is a classic security-oriented fuzzer that uses a novel compile-time instrumentation scheme together with genetic algorithms to automatically discover test cases that trigger new internal states in the target binary.

For details on AFL, see my earlier source-annotation posts. I use AFL here rather than the AFL++ recommended by Fuzzing101 because I wanted to put my recent source-code reading to the test.
Xpdf
Xpdf is a free, open-source PDF viewer and toolkit that includes a text extractor, an image converter, and an HTML converter.

Xpdf is the target program of this exercise; it is covered in more detail below.
Environment Setup
sudo apt-get install gcc git make wget build-essential

Now let's go straight to installing the targets.
Installing AFL
hollk@ubuntu:~$ cd $HOME
hollk@ubuntu:~$ git clone https://github.com/google/AFL.git && cd AFL/

Note that part of afl-clang-fast.c must be removed first, or the build will fail:

hollk@ubuntu:~/AFL$ vim ./llvm_mode/afl-clang-fast.c
--------------- lines 131-134 -------------------
#ifndef __ANDROID__
cc_params[cc_par_cnt++] = "-mllvm";
cc_params[cc_par_cnt++] = "-sanitizer-coverage-block-threshold=0";
#endif
-------------------------------------------
Delete this block, then save and quit.

Next, build the AFL source:
hollk@ubuntu:~/AFL$ make AFL_TRACE_PC=1
hollk@ubuntu:~/AFL$ make install

Check that AFL built successfully:
Building Xpdf with Instrumentation
Next, install and instrument the target. First create a new directory for this fuzzing project:
hollk@ubuntu:~/AFL$ cd $HOME
hollk@ubuntu:~$ mkdir fuzzing_xpdf && cd fuzzing_xpdf

Download Xpdf 3.02:

hollk@ubuntu:~/fuzzing_xpdf$ wget https://dl.xpdfreader.com/old/xpdf-3.02.tar.gz
hollk@ubuntu:~/fuzzing_xpdf$ tar -xvzf xpdf-3.02.tar.gz
hollk@ubuntu:~/fuzzing_xpdf$ cd xpdf-3.02/

In the xpdf-3.02/ directory you will find a file named configure; opening it in an editor shows that it is a script that generates the Makefile.
The build defaults to gcc. Since we want the compiler to instrument the code, point it at afl-clang-fast instead:
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ export CC=/home/hollk/AFL/afl-clang-fast
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ export CXX=/home/hollk/AFL/afl-clang-fast++

Build the programs, installing them into $HOME/fuzzing_xpdf/install/:

hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ ./configure --prefix="$HOME/fuzzing_xpdf/install/"
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ make
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ make install

After the build, confirm that the instrumentation took effect. Go into the install directory; since we instrumented with afl-clang-fast, grep the binary for the sanitizer-coverage symbol __sanitizer_cov:

hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ cd $HOME/fuzzing_xpdf/install/
hollk@ubuntu:~/fuzzing_xpdf/install/bin$ strings ./pdftotext | grep __sanitizer_cov

If the instrumentation succeeded, you will see output like this:
The Fuzzing Stage
With the environment in place, it is time to fuzz.
Preparing to Fuzz
Before fuzzing proper, we need some seeds, i.e. the initial test-case inputs, and a directory to hold them:
hollk@ubuntu:~/fuzzing_xpdf/install/bin$ cd $HOME/fuzzing_xpdf
hollk@ubuntu:~/fuzzing_xpdf$ mkdir pdf_examples && cd pdf_examples
hollk@ubuntu:~/fuzzing_xpdf/pdf_examples$ wget https://github.com/mozilla/pdf.js-sample-files/raw/master/helloworld.pdf
hollk@ubuntu:~/fuzzing_xpdf/pdf_examples$ wget http://www.africau.edu/images/default/sample.pdf
hollk@ubuntu:~/fuzzing_xpdf/pdf_examples$ wget https://www.melbpc.org.au/wp-content/uploads/2017/10/small-example-pdf-file.pdf

Note that the project which originally hosted helloworld.pdf no longer exists, so fetch it from https://github.com/mozilla/pdf.js-sample-files and copy it into the pdf_examples directory.
Check that the downloaded files are usable by running pdfinfo from fuzzing_xpdf/install/bin against helloworld.pdf:
$HOME/fuzzing_xpdf/install/bin/pdfinfo -box -meta$HOME/fuzzing_xpdf/pdf_examples/helloworld.pdf

On success it looks like this:

At the same time, point the kernel's core_pattern at a plain file, so that crashes during fuzzing are not piped to an external core handler and do not stall the run (AFL refuses to start otherwise):
sudo su
echo core >/proc/sys/kernel/core_pattern
exit

Starting the fuzz run
Everything is in place, so start fuzzing:
$HOME/AFL/afl-fuzz -i $HOME/fuzzing_xpdf/pdf_examples/ -o $HOME/fuzzing_xpdf/out/ -M fuzzer1 -- $HOME/fuzzing_xpdf/install/bin/pdftotext @@ $HOME/fuzzing_xpdf/output

The options used:
-i: input directory holding the prepared seeds
-o: output directory for the queue, crashes and hangs produced while fuzzing
-M: run this instance as the master of a multi-fuzzer setup (start the other instances with -S; note that all instances must share the same output directory)
--: separator, followed by the target command line
@@: placeholder for the input file; without @@ the input is passed on stdin

With -S you can run several fuzzers at the same time. The VM here was given only 4 cores, so only 4 fuzzers were started; the number depends on how many cores you allocated, and htop shows the current resource usage: all four cores are busy.

Fuzzing results
First, the fields on the AFL status screen:

process timing:
run time: total running time
last new path: time since the last new path was found
last uniq crash: time since the last unique crash was found
last uniq hang: time since the last unique hang

overall results:
cycles done: number of completed cycles
total paths: total number of paths
uniq crashes: number of unique crashes
uniq hangs: number of unique hangs

cycle progress:
now processing: ID of the test case currently being processed
paths timed out: number of inputs abandoned because of timeouts

map coverage:
map density: ratio of branch tuples already hit to the number the bitmap can hold (current input / whole corpus)
count coverage: variability of tuple hit counts in the binary

stage progress:
now trying: mutation strategy currently being executed
stage execs: progress within the current stage
total execs: total number of executions
exec speed: execution speed

findings in depth:
favored paths: paths favored by the minimization algorithm
new edges on: number of newly discovered edges
total crashes: total number of crashes
total tmouts: total number of timeouts

fuzzing strategy yields: tracks, per mutation strategy, the ratio of paths found to executions attempted, to validate its effectiveness

path geometry:
levels: path depth reached during fuzzing; user-supplied test cases are level 1, and the level grows with successive mutation stages
pending: how many inputs have not been fuzzed at all yet
pend fav: the entries in the queue the fuzzer really wants to reach
own finds: number of new paths found by this instance
imported: number of paths imported from other fuzzers
stability: consistency of the observed traces

With the fields introduced, the ones to watch most closely are:
last new path: if no new path has been found for a long time, the seeds may be problematic, and it can be worth pausing to further distill the seed corpus.
uniq crashes: easy to understand; a crash means some input made the program fall over, quite possibly because of an overflow.

In this run fuzzer2 hit a crash (uniq crashes shows 1), so pause fuzzing and inspect the crash output under the out directory. Because 4 fuzzers were running, there are 4 subdirectories under ~/fuzzing_xpdf/out. The crash came from fuzzer2, so go straight into ~/fuzzing_xpdf/out/fuzzer2/crashes, where you will find a file named roughly as below (the name varies between runs; use whatever yours produced):

Verifying the crash with GDB
Delete the afl-clang-fast-instrumented build under the install directory and recompile with gcc:
hollk@ubuntu:~$ rm -r $HOME/fuzzing_xpdf/install
hollk@ubuntu:~$ cd $HOME/fuzzing_xpdf/xpdf-3.02/
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ make clean
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ CFLAGS="-g -O0" CXXFLAGS="-g -O0" ./configure --prefix="$HOME/fuzzing_xpdf/install/"
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ make
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ make install

Now run the pdftotext binary under gdb with the crashing file as its argument:
hollk@ubuntu:~/fuzzing_xpdf/xpdf-3.02$ cd $HOME/fuzzing_xpdf/install/bin
hollk@ubuntu:~/fuzzing_xpdf/install/bin$ gdb --args ./pdftotext $HOME/fuzzing_xpdf/out/fuzzer2/crashes/id:000000,sig:06,src:001043,op:havoc,rep:32 /home/hollk/fuzzing_xpdf/output

Inside gdb, issue the run command; after a stream of errors the program stops in __vfprintf_internal:

Use the bt command to backtrace the call stack:

Parser::getObj is being called recursively over and over, which matches the description of CVE-2019-13288.

狩猎者网络安全旗下——知柯信息安全团队（知柯信安）
漏洞挖掘是否是真正的安全呢？
"The best alternative to defense mechanisms is to find and fix the bugs." grsecurity's counterpoint is that mitigations matter more: "ACTUAL effective improvements to security come from building mitigations to kill entire classes of vulns, not bug hunting." Maybe we need secure languages, libraries and tools: security by design.
什么是Fuzzing(模糊测试）？
模糊测试（Fuzzing），是一种通过向目标系统提供非预期的输入并监视异常结果来发现软件漏洞的方法。[百度百科]
模糊测试 （fuzz testing, fuzzing）是一种软件测试技术。其核心思想是将自动或半自动生成的随机数据输入到一个程序中，并监视程序异常，如崩溃，断言（assertion）失败，以发现可能的程序错误，比如内存泄漏。模糊测试常常用于检测软件或计算机系统的安全漏洞。
模糊测试最早由威斯康星大学的Barton Miller于1988年提出。他们的工作不仅使用随机无结构的测试数据，还系统的利用了一系列的工具去分析不同平台上的各种软件，并对测试发现的错误进行了系统的分析。此外，他们还公开了源代码，测试流程以及原始结果数据。
模糊测试工具主要分为两类，变异测试（mutation-based）以及生成测试（generation-based）。模糊测试可以被用作白盒，灰盒或黑盒测试。[3]文件格式与网络协议是最常见的测试目标，但任何程序输入都可以作为测试对象。常见的输入有环境变量，鼠标和键盘事件以及API调用序列。甚至一些通常不被考虑成输入的对象也可以被测试，比如数据库中的数据或共享内存。 对于安全相关的测试，那些跨越可信边界的数据是最令人感兴趣的。比如，模糊测试那些处理任意用户上传的文件的代码比测试解析服务器配置文件的代码更重要。因为服务器配置文件往往只能被有一定权限的用户修改。 【维基百科】
Fuzzing的历史：
“Generates a stream of random characters to be consumed by a target program” – Miller et al.
1988年，威斯康星大学的Barton Miller教授率先在他的课程实验提出模糊测试。实验内容是开发一个基本的命令行模糊器以测试Unix程序。这个模糊器可以用随机数据来“轰炸”这些测试程序直至其崩溃。类似的实验于1995年被重复，并且包括了图形界面程序，网络协议和系统API库。一些后续工作可以测试Mac和Windows系统上的命令行程序与图形界面程序。
技术：
模糊测试工具通常可以被分为两类。变异测试通过改变已有的数据样本去生成测试数据。生成测试则通过对程序输入的建模来生成新的测试数据。
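The two families can be illustrated with a toy sketch (hypothetical code, not taken from any real fuzzer): a mutation-based fuzzer derives inputs by flipping bytes of an existing sample, while a generation-based fuzzer builds inputs from a model of the format, here a trivial arithmetic-expression grammar.

```python
import random

def mutate(sample: bytes, n_flips: int = 4, rng=None) -> bytes:
    """Mutation-based: derive a new input by XOR-flipping random bytes of a seed."""
    rng = rng or random.Random()
    data = bytearray(sample)
    for _ in range(n_flips):
        pos = rng.randrange(len(data))
        data[pos] ^= rng.randrange(1, 256)  # non-zero XOR, length unchanged
    return bytes(data)

def generate(rng=None) -> bytes:
    """Generation-based: build an input from a (trivial) model of the format."""
    rng = rng or random.Random()
    expr = str(rng.randrange(100))
    for _ in range(rng.randrange(1, 4)):
        expr += rng.choice([" + ", " * ", " : "]) + str(rng.randrange(100))
    return expr.encode()
```

A mutated input keeps the overall shape of its seed (useful for binary formats like PDF), while a generated input is always well-formed with respect to the model (useful for parsers of structured text).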
定义Definition：
我们在设计程序时，除了考虑到程序功能之外，是否会出现其他程序员无法考虑到的情况？比如安全上的问题。 Fuzzing is the execution of the PUT using input(s) sampled from an input space (the “fuzz input space”) that protrudes the expected input space of the PUT. Fuzz testing is the use of fuzzing to test if a PUT violates a security policy.
优化问题Optimization problem
Fuzzing模型是一个优化问题
过程Process：
The returned B is the set of bugs found, and C is the configuration of the program under test. The most central part of the whole fuzzing loop is the Schedule step, which generates the input set.
Components组件：
1.Corpus语料集 2.Generator 3.Mutator 4.Input 5.Stage 6.Executor 7.Observer 8.Feedback
The Generator and Mutator produce the inputs: the Generator creates them from scratch while the Mutator derives them by changing existing corpus entries. Input is a generated test case; Stage and Executor drive the execution of the program under test; the Observer collects execution information, and Feedback decides which inputs enter the corpus.
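How these components fit together can be sketched in a few lines (toy code: the target program, its "edges", and the magic crash condition are all made up for illustration; this is not AFL's implementation):

```python
import random

def target(data: bytes) -> frozenset:
    """Toy PUT: returns the 'edges' an input exercises; crashes on inputs like b'A?!?'."""
    edges = {0}
    if data[:1] == b"A":
        edges.add(1)
        if len(data) > 2 and data[2:3] == b"!":
            raise RuntimeError("crash")  # the bug we want to find
    return frozenset(edges)

def fuzz_loop(rounds: int = 30000, rng=None):
    rng = rng or random.Random()
    corpus = [b"seed"]          # Corpus: user-provided seeds
    covered = set()             # Observer/Feedback state: edges seen so far
    crashes = []
    for _ in range(rounds):
        parent = rng.choice(corpus)                      # Schedule: pick a seed
        data = bytearray(parent)                         # Mutator: one random byte
        data[rng.randrange(len(data))] = rng.randrange(33, 127)
        data = bytes(data)                               # Input
        try:
            edges = target(data)                         # Stage + Executor
        except RuntimeError:
            crashes.append(data)                         # Observer records the crash
            continue
        if edges - covered:                              # Feedback: new coverage?
            covered |= edges
            corpus.append(data)                          # promote into the Corpus
    return corpus, crashes
```

The Feedback step is what makes this grey-box rather than black-box: inputs that reach new edges are promoted into the corpus, so later mutations start from the waypoint instead of from scratch.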
Fuzzing入门教程An entry-level tutorial：
1. Pick a target 2. Analyse the code 3. Write the harness 4. Prepare the prelude 5. Dynamic tuning 6. When to stop 7. Triage
选择目标

Untrusted input: device modules vs. migration modules. Non-interactive, stateless: libpng vs. smtpd. Unsafe language: C vs. Rust. Old, with few explored paths: GNU coreutils vs. OpenSSL. Single process: libxml2 vs. ftpd.
分析代码 Analyse the code
What is our input type? argv, stdin, env, shm, etc. Where is it consumed? main, routine, lib, etc. Where is the entry point? In-memory snapshot & copy. Can we reset the state (manually)? Persistent mode with fewer forks. Should we patch and hook? Socket, checksum, timer, random.
Write the harness (partial)
Below is part of the harness source code written for the AFL fuzzer:
int main(int argc, char **argv) {
    const char program_name[] = "program_name";
    const size_t program_name_size = sizeof(program_name);
    static char stdin_read[1024 * 16] = {'\0'};
    static char *ret[1024 * 4] = {(char *)NULL};

    for (size_t i = 0; i < program_name_size; i++)
        ; /* copy program_name into the argv buffer (loop body elided on the slide) */
    argc = 1;
    argv = &ret[0];

    /* API init */
#ifdef __AFL_HAVE_MANUAL_CONTROL
    __AFL_INIT();
#endif

#ifdef __AFL_HAVE_MANUAL_CONTROL
    /* must be after __AFL_INIT and before __AFL_LOOP */
    unsigned char *fuzz_buf = __AFL_FUZZ_TESTCASE_BUF;
#endif

#ifdef __AFL_HAVE_MANUAL_CONTROL
    while (__AFL_LOOP(10000)) {
#endif
        memset(&ret[1], 0, sizeof(ret) - sizeof(ret[0]));
        /* don't use the macro directly in a call! */
        /* ... feed fuzz_buf / stdin_read to the target routine here ... */
#ifdef __AFL_HAVE_MANUAL_CONTROL
    }
#endif
    return 1;
}
准备前奏Prepare the prelude
Dict: man 1 expr → “\x20”, “|”, “&”, “!”, “=”, “>”, “<”, “/”, “\”, “:”, “*”, etc.
Seeds: “1 + 3”, “length 1x3”, “10 / 2”, “1234 : 23”, “sad % 3”, etc.
需要Fuzz程序源代码：
Scripts:
$ afl-system-config
$ CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar ./configure
$ AFL_USE_UBSAN=1/AFL_USE_ASAN=1/AFL_HARDEN=1 make -j$(nproc)
$ afl-fuzz -i seed -o out -x expr.dict -m none -M main0 ./expr_asan
$ AFL_IMPORT_FIRST=1 AFL_NO_UI=1 afl-fuzz -i- -o out -L 0 -x expr.dict -S slaveX ./expr
$ # etc.

Dynamic tune:
Coverage: gcov, llvm-cov, etc. Performance: Linux perf, gperftools, etc.

When to stop?

Triage: we want the PoC to be as small as possible.
Minimization: afl-tmin, afl-extras, afl-ddmin-mod, abstract, etc.
Deduplication: afl-cmin, Stack Backtrace Hashing, semantics-aware, etc.
Exploitation: GDB extension 'exploitable', etc.
Understandability: afl-analyze, etc.

What did we get?

CWE-125, Out-of-bounds Read:
  $ expr 0 : "\$$0*\$$*0*\\1"
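The Stack Backtrace Hashing deduplication mentioned above can be sketched in a few lines (the frame names below are hypothetical; real tools hash the top N frames of the symbolized backtrace):

```python
import hashlib

def bucket(backtrace, top_n=5):
    """Deduplicate crashes by hashing the top N stack frames."""
    key = "\n".join(backtrace[:top_n])
    return hashlib.sha1(key.encode()).hexdigest()[:12]

# hypothetical backtraces; a and b differ only below the top 5 frames
crash_a = ["Parser::getObj", "Parser::getObj", "XRef::fetch", "Catalog::Catalog", "main"]
crash_b = ["Parser::getObj", "Parser::getObj", "XRef::fetch", "Catalog::Catalog", "main", "_start"]
crash_c = ["Lexer::getChar", "Parser::shift", "Parser::getObj", "XRef::fetch", "main"]
```

Crashes falling into the same bucket are treated as duplicates of one underlying bug, so only one PoC per bucket needs to be minimized and analyzed.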

CWE-787, Out-of-bounds Write:
  $ expr 0 : "$$'$$*"

Fuzzing devices in QEMU

QEMU Device Fuzzer: concretely, we fuzz the I/O devices inside QEMU. QTest is QEMU's testing framework. (CCS '17: Designing New Operating Primitives to Improve Fuzzing Performance.) Fuzzing the E1000 network interface card took about 4 hours.

We observed two problems. 1. The ratio of valid opcodes is too low, which hurts both mutation and execution speed. We found that write operations (MMIO write and PCI config write) have the largest impact. We assign different weights to different opcodes; negative weights drop the invalid opcodes and operations that need to be suppressed. Before fork(2), we compute the total weight of the input to decide whether it is worth forking or we should just return to the fuzzer.

Struct aware: libprotobuf-mutator, AFLSmart, NAUTILUS (sweat and blood). USENIX Security '19: MOPT: Optimized Mutation Scheduling for Fuzzers. Can we generate inputs directly from a grammar description? Google tests programs on fuzzing clusters and finally notifies the developers.

Vulnerabilities and bugs found:
[Patch] SPICE/libspice-server: Fix nullptr dereference in red-parse-qxl.cpp
[Vulnerability Disclosure] FFmpeg/libavcodec: Double free hevc context
[Vulnerability Disclosure] GNOME/libgxps: Mishandle NULL pointer in the converter
[Bug Report] qemu-system virtio-mouse: Assertion in address_space_lduw_le_cached failed
[Bug Report] GNU Coreutils: Heap underflow when expr mishandles unmatched (…) in regex
[Vulnerability Disclosure] QEMU/Slirp: OOB access while processing ARP/NCSI packets

The future of fuzzing: the biggest problem today is that fuzzing finds a great many bugs. Companies like Google have fairly complete fuzzing pipelines and can find vulnerabilities in their own products, but developers often ignore them; a few days later a white hat reports the same vulnerability and, by policy, should receive a certificate or reward. If every bug goes through this route, the cost to SRC platforms and companies becomes substantial. The real question is how to make developers take security testing seriously. How should we design a fuzzing campaign? Can we pick an approach via static analysis? Can humans take part in the coverage loop and add corpus entries more conveniently?

References:
TSE 2019: The Art, Science, and Engineering of Fuzzing: A Survey
https://media.ccc.de/v/rc3-699526-fuzzers_like_lego
https://zh.wikipedia.org/wiki/%E6%A8%A1%E7%B3%8A%E6%B5%8B%E8%AF%95
Bilibili: Fuzzing introduction, principles and practice

• A survey of fuzzing techniques for industrial control system network protocols
• Hackfest advanced fuzzing workshop. Start here -> Previous editions: EkoParty => Requirements: a Telegram account (you will need it to send me your questions/solutions) and a running Linux system with an Internet connection ...
• Current detection of buffer-overflow vulnerabilities is limited to manual analysis, binary patch diffing, and fuzzing, which either depend heavily on manual work or are too blind, making bug hunting very inefficient. Combining fuzzing, dynamic data-flow analysis, and automatic exception ...
• Fuzzing has proven very effective at finding web-browser vulnerabilities. With the bug bounty programs run by browser vendors and the growth of the 0-day market, more researchers are joining browser bug hunting. The way to beat these bug-hunting heavyweights is to use smarter ...
• Discovering Vulnerabilities in COTS IoT Devices through Blackbox Fuzzing Web Management Interface
• Efficient SIP fuzzing: the source code of the paper "An Efficient Fuzzing Test Method For SIP Server"
• Ari Takanen, Jared D. DeMott, Charles Miller - Fuzzing for Software Security Testing and Quality Assurance (2018, Artech House).pdf

## Reinforcement Learning-based Hierarchical Seed Scheduling for Greybox Fuzzing

Paper: Reinforcement Learning-based Hierarchical Seed Scheduling for Greybox Fuzzing
Tools: AFL-HIER, AFL++-HIER
Venue: NDSS 2021
First author: Jianhan Wang (University of California, Riverside)
Link: https://www.ndss-symposium.org/ndss-paper/reinforcement-learning-based-hierarchical-seed-scheduling-for-greybox-fuzzing/

Overview

The grey-box fuzzing loop can be described as follows: the fuzzer generates new test cases by mutation and splicing, then uses a fitness mechanism to pick, from the newly generated inputs, the fit ones to put into the seed pool for later mutation. Unlike natural evolution, only part of the seed pool is ever selected for mutation and related operations.

AFL uses edge coverage to measure seed fitness, i.e. whether a seed covers a new branch, so as to cover more branches. An important property of a fitness function is its ability to preserve intermediate waypoints. My understanding: while exploring uncovered paths, the covered key paths along the way should be retained. The paper's example: suppose there is a check $a = \texttt{0xdeadbeef}$. Considering only edge coverage, the probability of mutating $a$ into this value is about $2^{-32}$. But if important waypoints are preserved, splitting the 32 bits into four bytes and mutating through $\texttt{0xef}$, $\texttt{0xbeef}$, $\texttt{0xadbeef}$, $\texttt{0xdeadbeef}$ makes reaching the correct value far more likely.

"Be Sensitive and Collaborative: Analyzing Impact of Coverage Metrics in Greybox Fuzzing" preserves waypoints by measuring code coverage at a finer granularity. This keeps more valuable seeds and increases the seed count, but burdens seed scheduling, and some seeds may never be selected. A more reasonable scheduling policy is therefore needed.

This paper uses a hierarchical scheduler to solve the seed-explosion problem, in two parts:

1. Cluster the seeds using coverage metrics of different sensitivity levels.
2. Design the seed-selection policy with the UCB1 algorithm.

Fuzzing is treated as a multi-armed bandit problem, balancing exploitation and exploration. The observation is: when a coverage metric $c_j$ is more sensitive than $c_i$, $c_j$ can be used to preserve waypoints while $c_i$ clusters the seeds into a node. The nodes are organized as a tree; the closer a seed is to a leaf, the more sensitive its coverage metric.
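The clustering described above can be sketched as a trie keyed by coverage at increasing sensitivity, from function level down to edge level (toy code; the coverage sets are made up):

```python
def insert_seed(tree: dict, seed: str, coverage_by_level) -> None:
    """Walk down from the root; at each level, descend into the child whose
    coverage matches the new seed, creating a node when none matches.
    Leaves hold the clustered seeds."""
    node = tree
    for cov in coverage_by_level:        # e.g. (functions, blocks, edges)
        key = frozenset(cov)
        node = node.setdefault(key, {})
    node.setdefault("seeds", []).append(seed)

tree = {}
# two seeds with identical function- and block-level coverage but different edges
insert_seed(tree, "s1", [{"f1"}, {"b1", "b2"}, {"e1"}])
insert_seed(tree, "s2", [{"f1"}, {"b1", "b2"}, {"e1", "e2"}])
insert_seed(tree, "s3", [{"f1"}, {"b1", "b2"}, {"e1"}])  # clusters with s1
```

Seeds identical under the coarse metrics share a subtree, and only the most sensitive metric separates them at the leaves, which is exactly what lets the scheduler treat a whole cluster as one arm.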
Seed scheduling starts at the root and, following the UCB1 algorithm, computes a score for each node and descends until a leaf is selected; one of the many seeds in that leaf is then picked. After each fuzzing round, the values in the tree are updated.

Paper details

Waypoints: if a test case triggers a bug, view it as the end of a chain whose start is the original seed. The intermediate test cases produced by mutation on the way from the initial seed to the bug-triggering test case progressively shrink the search space for the bug; these seeds are called waypoints.

Multi-level coverage metrics (for seed clustering)

Using a more sensitive coverage metric improves the fuzzer's ability to discover more program states, but the number of seeds, and hence the scheduling load, grows as well. The similarity and difference between seeds drive the exploration/exploitation trade-off, so similar seeds are clustered together: seeds in the same cluster are identical at a given coverage granularity, and the clustering uses multi-level coverage metrics.

As the figure shows, seeds are clustered with coverage metrics of different granularity. When a test case covers a new function, basic block, or edge, it is kept as a seed. Clustering then starts from the root: at each level, a coverage metric of a given sensitivity checks whether some child node has the same coverage as the new seed; if so, descend, otherwise create a new node. Three metrics are used: function, basic block, and edge; the smaller the depth in the tree, the less sensitive the metric.

Hierarchical seed scheduling

Picking a seed is a search from the root down to a leaf, i.e. exploration vs. exploitation. On one hand, newly generated, barely fuzzed seeds may bring new code coverage; on the other hand, seeds that have already brought new coverage are more likely to be selected. The paper casts seed scheduling as a multi-armed bandit problem and uses the existing MAB algorithm UCB1 to balance exploitation and exploration. From the root, the child with the highest score under the coverage metric is chosen until a leaf is reached; at the end of each round, every node along the selection path receives a reward. A seed's score takes three aspects into account:

1. how rare the seed is;
2. how easy it is to mutate the seed into new interesting test cases;
3. uncertainty.

The hit count of a feature $F \in \tau_l$ at level $l$ is the number of test cases covering that feature. With $P$ the program under test and $\mathcal{I}$ all inputs generated so far:

$$num\_hits[F] = |\{I \in \mathcal{I} : F \in C_l(P, I)\}|$$

As many papers note, the rarer a feature, the higher its selection probability should be, so:

$$rareness[F] = \frac{1}{num\_hits[F]}$$

Suppose seed $s$ is selected for fuzzing in round $t$, and $\mathcal{I}_{s,t}$ is the set of test cases generated in that round. Under coverage level $C_l$, $l \in \{1,\dots,n\}$:

$$fcov[s,l,t] = \{F : F \in C_l(P, I), I \in \mathcal{I}_{s,t}\}$$

Next, how to compute a seed's reward after a round of fuzzing. If the reward were simply the number of newly covered features, then as fuzzing proceeds the chance of covering new features drops, so a seed's average reward would quickly decay to zero; with many seeds and near-zero variance, the UCB algorithm cannot prioritize seeds well. The paper therefore takes the rareness of the rarest covered feature as the seed's reward. In plain terms: say this level uses function-level coverage; count how many times each covered function has been hit overall, take the smallest hit count, and its reciprocal is $SeedReward(s,l,t)$:

$$SeedReward(s,l,t) = \max_{F \in fcov[s,l,t]} (rareness[F])$$

During back-propagation, how is the reward of the nodes at different levels computed? The nodes along a path form a sequence $\langle a^1, \dots, a^n, a^{n+1} \rangle$. Since $a^l$ influences later seed scheduling, and its feedback is composed of the coverage levels below it, the reward is computed as a geometric mean:

$$Reward(a^l,t) = \sqrt[n-l+1]{\prod_{l \le k \le n} SeedReward(s,k,t)}$$

The reward defines how scores propagate bottom-up; next comes how seeds are chosen top-down. Based on UCB1, the paper computes the expected performance of fuzzing a node:

$$FuzzPerf(a) = Q(a) + U(a)$$

$Q(a)$ is the average reward node $a$ has received so far, and $U(a)$ is the upper confidence radius. $Q$ is a weighted average: the more often a seed has been mutated, the lower its rareness.
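The geometric-mean reward can be checked numerically (the SeedReward values below are assumed for illustration):

```python
import math

def reward(seed_rewards, l):
    """Reward(a^l, t): geometric mean of SeedReward over levels l..n (1-indexed)."""
    vals = seed_rewards[l - 1:]
    return math.prod(vals) ** (1.0 / len(vals))

# assumed SeedReward at levels 1..3 (function, block, edge)
sr = [0.5, 0.25, 0.125]
```

A leaf's reward is just its own SeedReward, while an inner node averages the levels below it, so one very rare feature deep in the tree lifts the whole path.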

$$Q(a,t) = \frac{Reward(a,t) + w \cdot Q(a,t') \cdot \displaystyle\sum_{p=0}^{N[a,t]-1} w^p}{1 + w \cdot \displaystyle\sum_{p=0}^{N[a,t]-1} w^p}$$

$N[a,t]$ is the number of times node $a$ has been selected by the end of round $t$, and $t'$ is the previous round in which $a$ was selected. The value of $w$ is examined in the experiments section and set to 0.5.
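The weighted moving average can be written out directly from the formula (toy values; $w = 0.5$ as in the paper):

```python
def q_update(reward_now, q_prev, n_sel, w=0.5):
    """Q(a,t) per the formula above; n_sel is N[a,t], the number of times
    node a has been selected, and q_prev is Q(a,t')."""
    s = sum(w ** p for p in range(n_sel))   # sum_{p=0}^{N[a,t]-1} w^p
    return (reward_now + w * q_prev * s) / (1 + w * s)
```

Note that if a node keeps earning the same reward, its Q stays fixed at that reward, while a newer reward is weighted more heavily than the decayed history.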

$U(a)$ is the upper confidence radius: a node that contains more seeds should be given a higher chance of being selected.

$$U(a) = C \times \sqrt{\frac{Y[a]}{Y[a']}} \times \sqrt{\frac{\log N[a']}{N[a]}}$$

$Y[a]$ is the number of seeds in node $a$, and $a'$ is its parent. $C$ is a parameter set to 1.4 (MCTS usually sets $C$ to $\sqrt{2}$, i.e. about 1.4).
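The confidence radius is a one-liner (toy values; `y_*` and `n_*` stand for $Y[a]$, $Y[a']$, $N[a]$, $N[a']$):

```python
import math

def ucb_radius(y_node, y_parent, n_node, n_parent, c=1.4):
    """U(a) = C * sqrt(Y[a]/Y[a']) * sqrt(log N[a'] / N[a])."""
    return c * math.sqrt(y_node / y_parent) * math.sqrt(math.log(n_parent) / n_node)
```

As intended, a node holding a larger share of its parent's seeds, or one selected less often relative to its parent, gets a larger exploration bonus.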

The formulas above can only be computed from existing statistics. A seed that has never been selected has no selection count and is the only seed in its node, so those values cannot be computed. Such a seed still covers a set of features, and it is scored from the features it covers:

$$SeedRareness(s,l) = \sqrt{\frac{\sum_{F \in C_l(P,s)} rareness^2[F]}{|\{F : F \in C_l(P,s)\}|}}$$
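This is a root-mean-square over the rareness of the covered features, which is easy to sanity-check (toy rareness values, i.e. reciprocals of hit counts):

```python
import math

def seed_rareness(rarenesses):
    """SeedRareness(s,l): RMS of rareness over the features the seed covers."""
    return math.sqrt(sum(r * r for r in rarenesses) / len(rarenesses))
```

Squaring before averaging lets a single very rare feature dominate, so a never-fuzzed seed covering one rare feature still scores high.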

The square root keeps this value smaller and preserves more of the differences between seeds. A node's rareness is then:

$$Rareness(a^l) = SeedRareness(s,l)$$

That is, a node at the level above has the same rareness as its child on the selected path. Each round, rareness is back-propagated along with the reward.

Combining all of the formulas above, the score for selecting a node is:

$$Score(a) = Rareness(a) \times FuzzPerf(a)$$
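Node selection then just walks down from the root taking the max-score child at each level; a toy sketch with assumed per-node values:

```python
def pick_child(children):
    """Choose the child maximizing Score(a) = Rareness(a) * (Q(a) + U(a))."""
    return max(children, key=lambda a: a["rareness"] * (a["q"] + a["u"]))

# assumed statistics for three sibling nodes
children = [
    {"name": "a1", "rareness": 0.2, "q": 0.5, "u": 0.3},  # score 0.16
    {"name": "a2", "rareness": 0.6, "q": 0.1, "u": 0.4},  # score 0.30
    {"name": "a3", "rareness": 0.4, "q": 0.2, "u": 0.2},  # score 0.16
]
```

Here a2 wins despite its low average reward: its rareness and uncertainty bonus outweigh a1's higher Q, which is exactly the exploration pressure UCB1 is meant to provide.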

实验
Two prototypes were implemented on top of AFL and AFL++: AFL-HIER and AFL++-HIER.

CGC: the average, minimum and maximum number of CGC programs crashed, and the time to the first crash. Code coverage: the number of programs on which AFL-HIER beats the other fuzzers, how long AFL-HIER needs to reach the coverage the others reach in 2 hours, and how long AFL-HIER takes to overtake the other fuzzers. FuzzBench: AFL++-HIER is compared with AFL++ and AFL++-FLAT on average code coverage over 6 hours and on the number of unique edges.

The paper discusses these results:

Compared with the CGC benchmark results, on most FuzzBench benchmarks the performance is not clearly better than AFL++. The suspected reason is that the UCB1-based scheduler and the hyperparameters used in the evaluation favor exploitation over exploration. So when the programs under test are relatively small (e.g., the CGC benchmarks), the scheduler finds more bugs without sacrificing much overall coverage; but on the FuzzBench programs, cracking a few hard unique edges (Table II) can be outweighed by not exploring other edges that are easier to cover.

Throughput: the time the tools spend on seed scheduling, measured on CGC. Effectiveness of the hierarchical scheduling policy: AFL-HIER compared with AFL, and AFL-FLAT compared with AFLFast; the number of seeds each fuzzer generates, and the number of AFL++-HIER nodes at each level. The influence of the parameters in the formulas. Support for other coverage metrics.

• ## Fuzzing技术简介

2018-12-25 11:37:42
一、什么是Fuzzing？ Fuzz本意是“羽毛、细小的毛发、使模糊、变得模糊”，后来用在软件测试领域，中文一般指“模糊测试”，英文有的叫“Fuzzing”，有的叫“Fuzz Testing”。本文用fuzzing表示模糊测试。 Fuzzing...
• ## Fuzzing

2014-10-29 16:49:06
一些或者某位杰出的黑客在研究漏洞发掘技术的时候发明了Fuzzing技术。可以说这是一种非常快速而有效的发掘技术。Fuzzing技术的思想就是利用“暴力”来实现对目标程序的自动化测试，然后监视检查其最后的结果，如果...
• SmartSeed: Smart Seed Generation for Efficient Fuzzing 摘要 模糊测试是一种自动检测应用程序漏洞的方法。对于基于遗传算法的fuzzing，它可以改变用户提供的种子文件，以获得大量输入，然后使用这些输入测试客观...
• Slides and demos from my talk on fuzzing native code in the web browser with WASM. This repo contains the samples and demos I used to fuzz C/C++ programs in the browser with libFuzzer, plus some tools to help users ...
• Hardware Fuzzing(Fuzzer), 硬件模糊测试的代码。都打包在里面了
winafl, a fork of AFL for fuzzing Windows binaries WinAFL Original AFL code written by Michal Zalewski <lcamtuf@google.com> Windows fork written and maintai

...