SN4KE: Practical Mutation Testing at Binary Level

Mohsen Ahmadi (Arizona State University), Pantea Kiaei (Worcester Polytechnic Institute), Navid Emamdoost (University of Minnesota)

Mutation analysis is an effective technique for evaluating a test suite's adequacy in terms of revealing unforeseen bugs in software. Traditional source- or IR-level mutation analysis is not applicable to software that is available only in binary format. This paper proposes a practical binary mutation analysis via binary rewriting, along with a rich set of mutation operators to represent more realistic bugs. We implemented our approach using two state-of-the-art binary rewriting tools and evaluated its effectiveness and scalability by applying the two implementations to SPEC CPU benchmarks. Our analysis revealed that the richer mutation operators contribute to generating more diverse mutants, which, compared to previous work, leads to a higher mutation score for the test harness. We also conclude that reassembleable disassembly rewriting yields better scalability than lifting to an intermediate representation and performing a full translation.
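To make the notion of a mutation score concrete, the following is a minimal sketch in Python. It is not SN4KE's implementation and operates on Python functions rather than binaries; the toy program `in_range`, the four operator-swap mutants, and both test suites are assumptions made up for illustration. The score is the fraction of mutants that at least one test distinguishes from the original ("kills").

```python
import operator

def make_in_range(lower_cmp=operator.le, upper_cmp=operator.lt):
    """Build the toy program lo <= x < hi with swappable comparison operators."""
    def in_range(x, lo, hi):
        return lower_cmp(lo, x) and upper_cmp(x, hi)
    return in_range

# Hypothetical original program and four mutants, each swapping one comparison.
ORIGINAL = make_in_range()
MUTANTS = {
    "lower: <= -> <": make_in_range(lower_cmp=operator.lt),
    "lower: <= -> >=": make_in_range(lower_cmp=operator.ge),
    "upper: < -> <=": make_in_range(upper_cmp=operator.le),
    "upper: < -> >": make_in_range(upper_cmp=operator.gt),
}

def suite_passes(fn, tests):
    """Return True if fn produces the expected result for every test case."""
    return all(fn(*args) == expected for args, expected in tests)

def mutation_score(mutants, tests):
    """Return (score, killed), where a mutant is killed if any test fails on it."""
    killed = [name for name, fn in mutants.items() if not suite_passes(fn, tests)]
    return len(killed) / len(mutants), killed

# A weak suite with one mid-range input, and a stronger one adding both boundaries.
WEAK_TESTS = [((5, 0, 10), True)]
STRONG_TESTS = WEAK_TESTS + [((0, 0, 10), True), ((10, 0, 10), False)]

if __name__ == "__main__":
    for label, tests in (("weak", WEAK_TESTS), ("strong", STRONG_TESTS)):
        score, killed = mutation_score(MUTANTS, tests)
        print(f"{label}: score={score:.2f}, killed={killed}")
```

In this toy setting, the boundary-value tests are what kill the off-by-one mutants, which mirrors the general point that a more diverse mutant set exercises a test suite more thoroughly.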

Is Your Firmware Real or Re-Hosted? A case study in re-hosting VxWorks control system firmware

Abraham Clements (Sandia National Labs), Logan Carpenter (Sandia National Labs), William A. Moeglein (Sandia National Labs), Christopher Wright (Purdue University)

Emulating firmware is increasingly popular for systems research, particularly vulnerability research. In this paper, we describe how we extend HALucinator to work with real-world systems that use the popular VxWorks RTOS. We describe the Re-hosting Support Layer (its definition and implementation) with the functions necessary to get a Schneider Electric SCADAPack 350 remote terminal unit, a Schneider Electric Modicon 340 programmable logic controller, and a Hughes 9201 BGAN Inmarsat terminal up and re-hosted (at least partially). We share the process and our path of performing this work over the last year, and give a retrospective approach for re-hosting other RTOSes. We provide a case study with three real devices, and show that we can re-host portions of the firmware and perform analyses to show the success of our approach.

PyPANDA: Taming the PANDAmonium of Whole System Dynamic Analysis

Luke Craig (MIT Lincoln Laboratory), Andrew Fasano (Northeastern University), Tiemoko Ballo (MIT Lincoln Laboratory), Tim Leek (MIT Lincoln Laboratory), Brendan Dolan-Gavitt (NYU), William Robertson (Northeastern University)

When working with real-world programs, dynamic analyses often must be run on a whole system instead of just a single binary. Existing whole-system dynamic analysis platforms generally require analyses to be written in compiled languages, a suboptimal choice for many iterative analysis tasks. Furthermore, these platforms leave analysts with a split view between the behavior of the system under analysis and the analysis itself; in particular, the system being analyzed must commonly be controlled manually while analysis scripts are run. To improve this process, we designed and implemented PyPANDA, a Python interface to the PANDA dynamic analysis platform. PyPANDA bridges the gap between guest virtual machine behavior and analysis tasks; enables painless integration with other program analysis tools; and greatly lowers the barrier to entry for whole-system dynamic analysis. The capabilities of PyPANDA are demonstrated by using it to dynamically evaluate the accuracy of three binary analysis frameworks, track heap allocations across multiple processes, and synchronize state between PANDA and a binary analysis platform. Significant challenges were overcome to integrate a scripting language into PANDA with minimal performance impact.

Polypyus – The Firmware Historian

Jan Friebertshäuser (TU Darmstadt, SEEMOO), Florian Kosterhon (TU Darmstadt, SEEMOO), Jiska Classen (TU Darmstadt, SEEMOO), Matthias Hollick (TU Darmstadt, SEEMOO)

Embedded systems, IoT devices, and systems on a chip such as wireless network cards often run raw firmware binaries. Raw binaries lack metadata such as the target architecture and an entry point, which makes their analysis challenging. Nonetheless, chip firmware analysis is vital to the security of modern devices. We find that state-of-the-art disassemblers fail to identify function starts and signatures in raw binaries. In our case, these issues originate from the dense, variable-length ARM Thumb2 instruction set. Binary differs such as BinDiff and Diaphora perform poorly on raw ARM binaries, since they depend on correctly identified functions. Moreover, binary patchers like NexMon require function signatures to pass arguments.

As a solution for fast diffing and function identification, we design and implement Polypyus. This firmware historian learns from binaries with known functions, generalizes this knowledge, and applies it to raw binaries. Polypyus is independent of architecture and disassembler. However, its results can be imported as disassembler entry points, thereby improving function identification and the follow-up results of other binary differs. Additionally, we partially reconstruct function signatures and custom types from Eclipse PDOM files. Each Eclipse project contains a PDOM file, which caches selected project information for compiler optimization. We showcase the capabilities of Polypyus on a set of 20 firmware binaries.

icLibFuzzer: Isolated-context libFuzzer for Improving Fuzzer Comparability

Yu-Chuan Liang (National Taiwan University), Hsu-Chun Hsiao (National Taiwan University)

libFuzzer is a powerful fuzzer that has helped find thousands of bugs in real-world programs. However, fuzzers that seek to compare with libFuzzer and its variants face two significant limitations. First, they are restricted to using the time-to-first-crash metric rather than the code-coverage metric because libFuzzer will abort whenever the fuzzing target crashes. Second, even if libFuzzer in the ignore-crash mode can continue after finding a crash, it may produce wrong results for programs expecting a clean global context. Thus, fuzzers wishing to compare with libFuzzer are restricted to using carefully modified programs or programs without global-context dependency. To solve this context pollution problem and enhance comparability between libFuzzer-based and AFL-based fuzzers, we present a new libFuzzer mode called isolated-context mode (icLibFuzzer) that isolates the contexts of each fuzzer instance and fuzzing target, allowing the fuzzing target's context to be reinitialized efficiently after each execution. To implement icLibFuzzer, we modify libFuzzer's in-process infrastructure into a lightweight forkserver infrastructure inspired by AFL's design and propose structure packing, which increases fuzzing speed by about 2x. We compare icLibFuzzer with four state-of-the-art fuzzers (AFL, Angora, QSYM, and Honggfuzz) using several real-world programs. The experimental results show that icLibFuzzer outperforms these four fuzzers in most target programs after 24 hours of fuzzing and maintains the lead from 24 to 72 hours. To demonstrate that we can easily keep up with libFuzzer's updates, we upgrade icLibFuzzer to use the latest libFuzzer (from LLVM9 to LLVM11) with no change to the code base. Our preliminary evaluation hints at a promising improvement of icLibFuzzer-LLVM11 over icLibFuzzer-LLVM9 and AFL++, the latest fuzzer in the AFL family we can find. We hope icLibFuzzer can serve as another baseline for fuzzing research. Our source code is available on GitHub.
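The context pollution problem and the fork-based remedy can be illustrated with a small POSIX-only sketch in Python. It is not icLibFuzzer's code (which modifies libFuzzer itself); the `target` function, its global state, and the `run_isolated` helper are hypothetical names invented here. Each input runs in a forked child, so global state touched by one execution, or a crash, never leaks into the next.

```python
import os

# Hypothetical global context that the fuzz target expects to be clean on entry.
GLOBAL_STATE = {"initialized": False}

def target(data: bytes) -> str:
    """Toy fuzz target: reports 'polluted' if a previous run left state behind."""
    if GLOBAL_STATE["initialized"]:
        return "polluted"
    GLOBAL_STATE["initialized"] = True
    if data.startswith(b"BUG"):
        raise RuntimeError("simulated crash")
    return "clean"

def run_isolated(data: bytes):
    """Run target(data) in a forked child and return (output, crashed).

    The parent must not call target() itself before forking, otherwise every
    child would inherit the already-polluted global state.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run one input, report the result through the pipe, then exit
        # without returning into the parent's code.
        exit_code = 1
        try:
            os.close(read_fd)
            os.write(write_fd, target(data).encode())
            exit_code = 0
        except Exception:
            pass
        finally:
            os._exit(exit_code)
    # Parent: collect the child's output and exit status.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read().decode()
    _, status = os.waitpid(pid, 0)
    crashed = not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0)
    return output, crashed

if __name__ == "__main__":
    for data in (b"a", b"BUG", b"b"):
        print(data, run_isolated(data))
```

By contrast, calling `target` twice directly in one process would leave the global state set after the first call, which is the in-process behavior the abstract describes as producing wrong results.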

Effects of Precise and Imprecise Value-Set Analysis (VSA) Information on Manual Code Analysis

Laura Matzen (Sandia National Laboratories), Michelle Leger (Sandia National Laboratories), Geoffrey Reedy (Sandia National Laboratories)

Binary reverse engineers rely on a combination of automated and manual techniques to answer questions about software. However, when evaluating automated analysis results, they rarely have additional information to help them contextualize these results in the binary. We expect that humans could more readily understand the binary program and these analysis results if they had access to information usually kept internal to the analysis, like value-set analysis (VSA) information. However, these automated analyses often give up precision for scalability, and imprecise information has the potential to hinder human decision making.

To assess how precision of VSA information affects human analysts, we designed a human study in which reverse engineers answered short information flow problems, determining whether code snippets would print sensitive information. We hypothesized that precise VSA information would help our participants analyze code faster and more accurately, and that imprecise VSA information would lead to slower, less accurate performance than having no VSA information at all. Our hand-crafted code snippets were presented paired with precise, imprecise, or no VSA information in a blocked design. We recorded participants' eye movements, response times, and accuracy while they answered the problems. Our experiment showed that precise VSA information changed participants' problem-solving strategies and supported faster, more accurate analyses. However, contrary to our predictions, having imprecise VSA information also led to increased accuracy relative to having no VSA information; this seemed to be because participants spent more time working through the code.

JMPscare: Introspection for Binary-Only Fuzzing

Dominik Maier (TU Berlin), Lukas Seidel (TU Berlin)

Researchers spend hours, or even days, understanding a target well enough to harness it and get a feedback-guided fuzzer running. Once this is achieved, they rely on their fuzzer to find the right paths, maybe sampling the collected queue entries to see how well it performs. Their knowledge is of little help to the fuzzer, and the fuzzer’s behavior is largely a black box to the researcher. Enter JMPscare, which provides deep insight into fuzzing queues. By highlighting unreached basic blocks across all queue items during fuzzing, JMPscare allows security researchers to understand the shortcomings of their fuzzer and helps to overcome them. JMPscare can analyze thousands of queue entries efficiently and highlight interesting roadblocks, so-called frontiers. Using this information, the human-in-the-loop is able to improve the fuzzer, mutator, and harness. Even complex bugs that are hard to reach for a generalized fuzzer and hidden deep in the control flow of the target can be covered in this way. Apart from a purely analytical view, its convenient built-in binary patching facilitates forced execution for subsequent fuzz runs. We demonstrate the benefit of JMPscare on the ARM-based MediaTek Baseband.

With JMPscare we gain in-depth insight into larger parts of the firmware and find new targets in this RTOS. JMPscare simplifies further mutator, fuzzer, and instrumentation development.
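The idea of a frontier can be sketched in a few lines of Python. This is an illustrative assumption about how such an analysis might look, not JMPscare's actual implementation or trace format: `branches` maps a hypothetical conditional-branch address to its two possible successors, and each trace is the ordered list of addresses one queue entry executed. A branch counts as a frontier here if it was reached and only some of its outgoing edges were ever taken across all traces.

```python
def find_frontiers(branches, traces):
    """Return {branch_addr: [successors never taken]} for partially covered branches."""
    # Collect every (source, destination) edge observed in any trace.
    seen_edges = set()
    for trace in traces:
        seen_edges.update(zip(trace, trace[1:]))

    frontiers = {}
    for src, targets in branches.items():
        taken = [t for t in targets if (src, t) in seen_edges]
        missed = [t for t in targets if (src, t) not in seen_edges]
        # Only branches that were reached but never went one way are frontiers.
        if taken and missed:
            frontiers[src] = missed
    return frontiers

# Hypothetical control flow: three conditional branches with two successors each.
BRANCHES = {
    0x1000: (0x1010, 0x1020),
    0x1020: (0x1030, 0x1040),
    0x1040: (0x1050, 0x1060),
}

# Hypothetical traces from three queue entries.
TRACES = [
    [0x1000, 0x1010],
    [0x1000, 0x1020, 0x1030],
    [0x1000, 0x1020, 0x1030],
]

if __name__ == "__main__":
    for addr, missed in find_frontiers(BRANCHES, TRACES).items():
        print(hex(addr), "never reaches", [hex(t) for t in missed])
```

In this example the branch at `0x1020` is the only frontier: every trace that reaches it continues to `0x1030`, so the path through `0x1040` and everything behind it stays unexplored.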

Dinosaur Resurrection: PowerPC Binary Patching for Base Station Analysis

Uwe Müller (TU Darmstadt, SEEMOO), Eicke Hauck (TU Darmstadt, SEEMOO), Timm Welz (TU Darmstadt, SEEMOO), Jiska Classen (TU Darmstadt, SEEMOO), Matthias Hollick (TU Darmstadt, SEEMOO)

Dated computing architectures such as PowerPC continue to live in systems with multi-decade lifespans. This particularly includes embedded systems with real-time requirements that are deeply integrated into critical infrastructures as well as control systems in power plants, trains, airplanes, etc. One example is Terrestrial Trunked Radio (TETRA), a digital radio system used in the public safety domain and deployed in more than 120 countries worldwide: base stations of one of the main vendors are still based on PowerPC. Despite the criticality of the aforementioned systems, many follow a security-by-obscurity approach and there are no openly available analysis tools. While analyzing a TETRA base station, we design and develop a set of analysis tools centered around a PowerPC binary patcher. We further create various dynamic tooling on top, including a fast memory dumper, function tracer, flexible patching capabilities at runtime, and a fuzzer. We describe the genesis of these tools and detail the binary patcher, which is general in nature and not limited to our base station under test.

Short Paper: Declarative Demand-Driven Reverse Engineering

Yihao Sun (Syracuse University), Jeffrey Ching (Syracuse University), Kristopher Micinski (Syracuse University)

Binary reverse engineering is a challenging task because it often necessitates reasoning using both domain-specific knowledge (e.g., understanding entrypoint idioms common to an ABI) and logical inference (e.g., reconstructing interprocedural control flow). To help perform these tasks, reverse engineers often use toolkits (such as IDA Pro or Ghidra) that allow them to interactively explicate properties of binaries. We argue that deductive databases serve as a natural abstraction for interfacing between visualization-based binary analysis tools and high-performance logical inference engines that compute facts about binaries. In this paper, we present a vision for the future in which reverse engineers use a visualization-based tool to understand binaries while simultaneously querying a logical-inference engine to perform arbitrarily complex deductive inference tasks. We call our vision declarative demand-driven reverse engineering (D^3RE for short), and sketch a formal semantics whose goal is to mediate interaction between a logical-inference engine (such as Souffle) and a reverse engineering tool. We describe a prototype tool, d3re, which we are using to explore the D^3RE vision. While still a prototype, we have used d3re to reimplement several common querying tasks on binaries. Our evaluation demonstrates that d3re enables both better performance and more succinct implementation of these common RE tasks.
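As a rough illustration of the kind of deductive inference a Datalog engine performs, the sketch below computes call-graph reachability with a naive fixpoint in Python. It does not use d3re or Souffle, and the function names in the call graph are hypothetical. It corresponds to the two Datalog-style rules `reach(a, b) :- call(a, b).` and `reach(a, c) :- reach(a, b), call(b, c).`

```python
def reachable(call_edges):
    """Derive all (caller, callee) pairs connected by one or more call edges."""
    reach = set(call_edges)  # rule 1: every direct call is reachable
    while True:
        # Rule 2: extend each known reach fact by one more direct call.
        derived = {
            (a, c)
            for (a, b) in reach
            for (b2, c) in call_edges
            if b == b2
        }
        if derived <= reach:
            return reach  # fixpoint: no new facts can be derived
        reach |= derived

# Hypothetical base facts, as a reverse engineering tool might export them.
CALLS = {
    ("_start", "main"),
    ("main", "parse"),
    ("parse", "memcpy"),
    ("helper", "memcpy"),
}

if __name__ == "__main__":
    facts = reachable(CALLS)
    # Example query: which functions can transitively reach memcpy?
    print(sorted(a for (a, b) in facts if b == "memcpy"))
```

A production engine would typically evaluate such rules semi-naively and on demand rather than recomputing the whole relation in each iteration; the naive loop is used here only because it is short.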