“The only true governing principles are speed, reliability, and ease of use.”
American Fuzzy Lop is a relatively new fuzzer, written and designed primarily by the famous lcamtuf, that has been making the rounds in the infosec world due to its overwhelming success. A large portion of this success is due to its compile-time instrumentation and a basic genetic algorithm, which together allow it to exercise a surprising amount of code in the targeted program. Most of the information here is summarised or stolen from lcamtuf’s own whitepaper. My own interest in AFL, aside from the bug finding, was in how this instrumentation works to increase code coverage.
The ‘Genetic’ algorithm is relatively simple and won’t be discussed much further than this simple outline:
1. Load user-supplied initial test cases into the queue,
2. Take the next input file from the queue,
3. Repeatedly mutate the file using a balanced variety of traditional fuzzing strategies,
4. If any of the generated mutations resulted in a new tuple being recorded by the instrumentation, add the mutated output as a new entry in the queue,
5. Go to 2.
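The loop above can be sketched in C. Everything here — the fixed-size queue, the toy mutation, and `has_new_tuple()` — is a hypothetical stand-in for AFL's real machinery, not its actual code:

```c
#include <string.h>

#define MAX_QUEUE 64
#define MAX_LEN   32

static char queue[MAX_QUEUE][MAX_LEN];
static int  queue_len = 0;

/* Step 1: seed the queue with user-supplied test cases. */
static void enqueue(const char *input) {
    if (queue_len < MAX_QUEUE) {
        strcpy(queue[queue_len], input);
        queue_len++;
    }
}

/* Stand-in for the instrumentation feedback: pretend an input records a
   new tuple only when its first byte is 'B'. */
static int has_new_tuple(const char *input) {
    return input[0] == 'B';
}

static void fuzz_round(void) {
    for (int i = 0; i < queue_len; i++) {   /* step 2: take next input  */
        char mutated[MAX_LEN];
        strcpy(mutated, queue[i]);
        mutated[0] ^= 0x03;                 /* step 3: mutate the file  */
        if (has_new_tuple(mutated))         /* step 4: new tuple seen?  */
            enqueue(mutated);               /*   -> add to the queue    */
    }                                       /* step 5: go to 2          */
}
```

Seeding with `"AAA"` and running one round adds the mutated `"BAA"` to the queue, since the toy feedback function flags it as new coverage.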
As mentioned earlier, one of AFL’s main features is that it injects lightweight instrumentation during the compilation of a program to give feedback to the fuzzer. This lets it know when a change made to the input has affected the program’s execution flow in such a way that it triggers a new state transition. Previously, most coverage-recording tools used in fuzzing just focused on recording when a basic block had been hit, but AFL records the transition between states, e.g. A -> B instead of just A and B.
From his docs:
This form of coverage provides considerably more insight into the execution path of the program than simple block coverage. In particular, it trivially distinguishes between the following execution traces:
A -> B -> C -> D -> E (tuples: AB, BC, CD, DE)
A -> B -> D -> C -> E (tuples: AB, BD, DC, CE)
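To make that distinction concrete, here is a small helper — my own illustration, using single characters as block IDs — that derives edge tuples from a trace of basic blocks. Both traces above cover exactly the same blocks, yet produce different tuple sets:

```c
#include <stddef.h>

/* Turn a trace of basic-block IDs (one char each) into (src, dst) edge
   tuples, written as two-char strings into `out`. Returns tuple count. */
static int edge_tuples(const char *trace, char out[][3]) {
    int n = 0;
    for (size_t i = 0; trace[i] && trace[i + 1]; i++) {
        out[n][0] = trace[i];      /* branch source      */
        out[n][1] = trace[i + 1];  /* branch destination */
        out[n][2] = '\0';
        n++;
    }
    return n;
}
```

Simple block coverage sees `"ABCDE"` and `"ABDCE"` as identical (same five blocks); tuple coverage does not, because the second tuple differs (`BC` vs `BD`).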
When an input generates an execution flow with a new tuple, it is saved and added to the queue of input files for future mutation.
He further adds pseudocode for this feature:
The fuzzer maintains a global map of tuples seen in previous executions; this data can be rapidly compared with individual traces and updated in just a couple of dword- or qword-wide instructions and a simple loop.
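The whitepaper's per-branch update is tiny. Rendered as runnable C — `MAP_SIZE` and the 16-bit IDs match AFL's defaults, though the function name here is mine:

```c
#include <stdint.h>

#define MAP_SIZE (1 << 16)          /* AFL's default 64 KB bitmap */

static uint8_t  shared_mem[MAP_SIZE];
static uint16_t prev_location;

/* Called at every instrumented block; cur_location is the block's
   compile-time random ID. The byte indexed by cur XOR prev is bumped,
   then prev becomes the current ID shifted right by one. */
static void afl_log(uint16_t cur_location) {
    shared_mem[cur_location ^ prev_location]++;
    prev_location = cur_location >> 1;
}
```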
One of the most amazing things about AFL is how fast this shared memory bitmap can be processed and compared to previous runs. Years ago, when I was fuzzing Flash, I wrote a naive algorithm to compare execution traces I’d generated using DynamoRIO and an internet-gathered corpus, and it was several orders of magnitude less impressive than this one.
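The comparison can be sketched as a word-wide scan against a global “virgin” map of bits not yet seen in any run — a rough guess at the shape of AFL's loop, not its actual code:

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE (1 << 16)

static uint64_t virgin[MAP_SIZE / 8];   /* bits not yet seen in any run */

static void init_virgin(void) {
    memset(virgin, 0xff, sizeof virgin);
}

/* Scan the trace bitmap eight bytes at a time; any bit still "virgin"
   means this run produced new coverage. */
static int has_new_bits(const uint8_t *trace_bits) {
    const uint64_t *cur = (const uint64_t *)trace_bits;
    int ret = 0;
    for (size_t i = 0; i < MAP_SIZE / 8; i++) {
        if (cur[i] & virgin[i]) {
            ret = 1;
            virgin[i] &= ~cur[i];       /* mark these bits as seen */
        }
    }
    return ret;
}
```

A 64 KB map is covered in 8192 eight-byte comparisons, and traces that reproduce only old coverage bail out with almost no work — which is the speed the post is marvelling at.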
To demonstrate what this instrumentation actually looks like as concrete code, below is part of a binary compiled first without, and then with, AFL’s instrumentation.
A call to __afl_maybe_log is added to the beginning of every instrumented basic block, along with a number of instructions that save register values so they won’t be clobbered by the injected code. Notice that ecx is loaded with a random number before each __afl_maybe_log call. This is the ‘cur_location’ variable.
Inside __afl_maybe_log is mostly code that sets up access to a shared memory region which stores the map of tuple hits.
Every byte set in the output map can be thought of as a hit for a particular (branch_src, branch_dst) tuple in the instrumented code.
The workhorse of this function is the block that actually does the recording, and it looks very similar to the pseudocode above :>
The shift right of ecx is very important as it maintains the directionality of the tuples. Without the shift, A -> B would be the same as B -> A or even A -> A.
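A quick check of that property, with the tuple index computed the same way as the instrumentation above (the helper name is mine):

```c
#include <stdint.h>

/* The index the instrumentation writes to: cur XOR (prev >> 1). */
static uint16_t tuple_index(uint16_t prev, uint16_t cur) {
    return cur ^ (prev >> 1);
}
```

Without the shift the index would be a plain `prev ^ cur`, which is symmetric (A ^ B == B ^ A) and zero for self-loops (A ^ A == 0); the shift breaks both degeneracies.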
The other coverage metric used by AFL is the hit count of each recorded tuple. The tuples are sorted into buckets based on the number of times they’re executed: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, and 128+ hits. If a particular execution trace causes a tuple to shift buckets, either up or down, it is considered interesting enough to feed back into AFL’s ‘genetic’ engine. These bucket transitions account for the majority of ‘new’ traces; genuinely new tuples usually make up only 10-30%.
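The bucketing can be sketched as a simple lookup. Real AFL maps raw counts through a precomputed table to bucket bitmasks; this simplified version just returns a bucket number, which is enough to show when a transition happens:

```c
#include <stdint.h>

/* Map a raw tuple hit count to one of the buckets listed above. */
static uint8_t hit_bucket(unsigned count) {
    if (count == 0)   return 0;
    if (count == 1)   return 1;
    if (count == 2)   return 2;
    if (count == 3)   return 3;
    if (count <= 7)   return 4;   /* 4-7    */
    if (count <= 15)  return 5;   /* 8-15   */
    if (count <= 31)  return 6;   /* 16-31  */
    if (count <= 127) return 7;   /* 32-127 */
    return 8;                     /* 128+   */
}
```

Going from 5 to 6 hits changes nothing (same bucket, not interesting); going from 3 to 4, or 127 to 128, crosses a boundary and counts as new behaviour.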
Interestingly, AFL aggressively culls execution paths that slow the overall fuzzing speed down by a large amount (beyond 5x the initial testing speed), as lcamtuf doesn’t consider the trade-off for such a path worth it. Herr Nagy echoed a similar sentiment during his Kiwicon presentation when he praised the god of speed:
There are a number of reasons why this is interesting to my research on computer-assisted vulnerability finding, but primarily we could leverage intelligent fuzzing to help us test hypotheses as to whether a particular state can be reached. This state doesn’t have to be an immediately obvious security vulnerability, e.g. a strcpy(); it could be something more subtle that would first have to be noticed by a human auditor. A large portion of this could potentially be automated by drawing on the corpus of interesting files/inputs AFL generates and saves while running, giving it a close starting point to our target state.