At Hackito Ergo Sum 2012, I gave a talk on the exploitation of the RenderArena allocator in WebKit (PDF), with a focus on the Android mobile platform. Because one of the techniques for hijacking a vtable (and subsequently achieving code execution) requires careful heap massaging, we developed an internal tool that hooks the various heap allocation functions inline and logs all allocations and frees with as little overhead as possible. The gist of the talk was the reliable exploitation of this specific bug class, so I did not go too deep into how we built this tool. Since some people asked about its internals, the basic ideas are presented here.
The general idea was to log all heap allocations and deallocations while maintaining allocation order in a multi-threaded environment (such as the Android Browser), introducing as little per-allocation overhead as possible. Since using a debugger to set breakpoints on the respective heap functions incurs too much overhead, I decided on a different approach:
To log every allocation in a timely manner, each such function is hot-patched at runtime with a non-intrusive call to a helper function that logs information about the caller to a memory buffer. Only when this buffer is full does the helper hand control to the analysis software, which flushes the buffer over the network to an analyst's computer. This method adds so little overhead to memory allocation that the program under analysis remains interactively usable, i.e. we can still use the Browser normally. Of course, this approach extends to hot-patching functions other than the normal system heap allocator; in fact, for the talk we also instrumented a special WebKit sub-allocator (please refer to the slides for more information).
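The buffering scheme described above can be sketched in C roughly as follows. This is a minimal illustration, not the original tool's code: the record layout, the capacity, the names `log_sample`, `log_set_flush`, `log_pending` and `count_flush`, and the spinlock used to keep records in one global order across threads are all assumptions made for the example. `count_flush` only counts records and stands in for the code that would send the buffer over the network.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 1024u /* hypothetical buffer size, in records */

/* One sample: where the probe fired and what the sampled register held. */
typedef struct {
    uint32_t probe_addr; /* address of the hooked location */
    uint32_t value;      /* content of the sampled register */
} log_record_t;

typedef void (*flush_fn)(const log_record_t *records, size_t count);

static log_record_t g_log[LOG_CAPACITY];
static size_t g_count;                        /* records currently buffered */
static flush_fn g_flush;                      /* called only when the buffer is full */
static atomic_flag g_lock = ATOMIC_FLAG_INIT; /* assumed: spinlock keeps global order */

size_t g_flushed_records; /* bookkeeping for the stand-in flush routine */

/* Stand-in for the network sender: it merely counts what it was given. */
void count_flush(const log_record_t *records, size_t count)
{
    (void)records;
    g_flushed_records += count;
}

void log_set_flush(flush_fn fn) { g_flush = fn; }

size_t log_pending(void) { return g_count; }

/* Fast path: take the lock, store one record, leave. The slow path
 * (flushing) is only entered when the last free slot has been used. */
void log_sample(uint32_t probe_addr, uint32_t value)
{
    while (atomic_flag_test_and_set_explicit(&g_lock, memory_order_acquire))
        ; /* spin: the critical section is only a few instructions long */

    assert(g_count < LOG_CAPACITY);
    g_log[g_count].probe_addr = probe_addr;
    g_log[g_count].value = value;
    g_count++;

    if (g_count == LOG_CAPACITY) {
        if (g_flush)
            g_flush(g_log, g_count);
        g_count = 0;
    }

    atomic_flag_clear_explicit(&g_lock, memory_order_release);
}
```

Calling `log_set_flush(count_flush)` once and then `log_sample(addr, value)` from each probe leaves the common case at a lock, two stores and an increment. The expensive network transfer happens only once per `LOG_CAPACITY` samples.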
The logging code is generic in the sense that we can sample one arbitrary register at any point to be hooked. This is usually sufficient for capturing function parameters and return values at the beginning or end of certain functions, or at arbitrary points in between. If you know a little more about the function you are looking at, you can even sample arbitrary values (because ARM is a RISC architecture, any value to be processed has to be loaded into a register at some point). The native code to log a single collected register sample looks like this:
|Native Code to Log a Single Sample|
|Decompilation of Log Code|
Note that the original code was hand-written assembly, but with sufficient manually added type and pseudo-calling-convention information, IDA does a good job of reconstructing equivalent C code.
Unfortunately for us, real-world code typically has no slack space that would allow us to simply insert branches to our logging functions. For inline hooking, the code at the location to be probed is therefore typically overwritten with a branch to a trampoline. This trampoline then needs to compensate for the overwritten instructions, so it usually consists of:
- A call to the desired hook (the logging function in our case) with potentially required state saving and restoring to preserve the expected state of the original code
- Code semantically equivalent to the overwritten code, often just a copy of the original instructions, fixed up where necessary
- A branch to the code following the overwritten instructions to continue the original code
An example of two such generated trampolines is depicted below. All of this code is generated at runtime by disassembling the original code, determining the necessary fix-up steps and, of course, generating code that samples the desired register by copying it into R0 for the log function. The instructions in cyan are the original instructions copied over.
|Example JIT’ed Probe Trampolines|
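To make the "overwrite with a branch" step concrete, here is a hedged sketch of how the branch itself could be encoded. It assumes the probe site is Thumb code and that a 32-bit Thumb-2 `B.W` (encoding T4) is used to reach the trampoline. The post does not say which branch form the tool emits, and targets further than about 16 MiB away would need a different sequence. The function name `thumb2_encode_b_w` is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a Thumb-2 B.W (encoding T4) located at address `from` that jumps
 * to address `to`. Both addresses are plain code addresses without the
 * Thumb bit set. The two halfwords are written to out[0] and out[1] in the
 * order they must appear in memory. Returns false if the target is out of
 * range or not halfword aligned. */
bool thumb2_encode_b_w(uint32_t from, uint32_t to, uint16_t out[2])
{
    assert(out != NULL);

    /* In Thumb state the PC reads as the instruction address plus 4. */
    int32_t offset = (int32_t)(to - (from + 4u));

    /* The immediate is a 25-bit signed value with an implicit zero LSB. */
    if (offset < -(1 << 24) || offset >= (1 << 24) || (offset & 1))
        return false;

    uint32_t imm   = (uint32_t)offset;
    uint32_t s     = (imm >> 24) & 1u;
    uint32_t i1    = (imm >> 23) & 1u;
    uint32_t i2    = (imm >> 22) & 1u;
    uint32_t imm10 = (imm >> 12) & 0x3FFu;
    uint32_t imm11 = (imm >> 1) & 0x7FFu;

    /* The architecture defines I1 = NOT(J1 EOR S) and I2 = NOT(J2 EOR S),
     * so J1 and J2 are recovered by inverting that relation. */
    uint32_t j1 = (i1 ^ 1u) ^ s;
    uint32_t j2 = (i2 ^ 1u) ^ s;

    out[0] = (uint16_t)(0xF000u | (s << 10) | imm10);               /* 11110 S imm10 */
    out[1] = (uint16_t)(0x9000u | (j1 << 13) | (j2 << 11) | imm11); /* 10 J1 1 J2 imm11 */
    return true;
}
```

A branch from a location to itself encodes as the halfwords `0xF7FF 0xBFFE`, which is a handy sanity check against a disassembler. The same routine can produce the branch at the end of a trampoline that returns to the code following the overwritten instructions.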
The code for generating these trampolines grew a little more complex than anticipated in order to cover the corner cases encountered. It now consists of three passes:
- Disassembly of the original to-be-overwritten code, looking for special instructions (thankfully the Thumb-2 instruction set is very regular)
- Length reassembly of the original code to calculate relative addresses and offsets
- Actual reassembly of the original code, fixing relative references and adding branches to the original code
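As an illustration of what the first pass has to deal with, the sketch below finds how many bytes must be relocated so that a 4-byte branch never splits an instruction, and flags a few 16-bit encodings that reference the PC and therefore cannot be copied verbatim. The function names and the 4-byte minimum are assumptions for this example. The list of PC-relative encodings is deliberately not exhaustive: it ignores 32-bit encodings, IT blocks, and instructions that use the PC as an ordinary register operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A halfword starts a 32-bit Thumb-2 instruction when its top five bits
 * are 0b11101, 0b11110 or 0b11111; everything else is a 16-bit encoding. */
int thumb2_is_32bit(uint16_t hw)
{
    return (hw >> 11) >= 0x1Du;
}

/* Walk whole instructions until at least `min_bytes` are covered.
 * Returns the number of bytes to relocate, or 0 if the supplied code
 * ends before enough complete instructions were seen. */
size_t thumb2_bytes_to_relocate(const uint16_t *code, size_t halfwords,
                                size_t min_bytes)
{
    assert(code != NULL);

    size_t i = 0;
    while (i * 2 < min_bytes) {
        if (i >= halfwords)
            return 0;
        size_t len = thumb2_is_32bit(code[i]) ? 2 : 1;
        if (i + len > halfwords)
            return 0; /* truncated 32-bit instruction */
        i += len;
    }
    return i * 2;
}

/* Non-exhaustive check for 16-bit encodings whose result depends on the
 * address they execute at. Only call this for 16-bit instructions. */
int thumb16_is_pc_relative(uint16_t hw)
{
    if ((hw & 0xF800u) == 0x4800u) return 1; /* LDR Rt, [PC, #imm] */
    if ((hw & 0xF800u) == 0xA000u) return 1; /* ADR Rd, label */
    if ((hw & 0xF800u) == 0xE000u) return 1; /* B label (unconditional) */
    if ((hw & 0xF500u) == 0xB100u) return 1; /* CBZ / CBNZ */
    if ((hw & 0xF000u) == 0xD000u)           /* B<cond>; 0xE and 0xF are UDF/SVC */
        return ((hw >> 8) & 0xFu) < 0xEu;
    return 0;
}
```

For a function that starts with `push {r4, lr}` (`0xB510`) followed by a 32-bit `bl`, `thumb2_bytes_to_relocate` reports 6 bytes rather than 4, because cutting after 4 bytes would leave half of the `bl` behind. Instructions flagged by `thumb16_is_pc_relative` are the ones the later passes would have to rewrite with adjusted offsets.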
|Decompilation of Example Trampoline|
This tool has proven extremely valuable for prototyping attacks as whitehat researchers, since we could use existing visualization tools to render a view of the heap at any given point in time. The same technique can, however, be leveraged for benign purposes as well; for example, some Windows malware analysis sandboxes are based on the same inline hooking approach on the x86 architecture. Although it looked simple when designed on paper, this project was difficult to implement because of all the corner cases encountered.