Next: General Techniques
Up: Static Disassembly of Obfuscated
Previous: Related Work and Background
Disassembling Obfuscated Binaries
Our disassembler performs static analysis on Intel x86 binaries. When
analyzing an obfuscated binary, one cannot assume that the code was
generated by a well-behaved compiler. In fact, the obfuscation
techniques introduced by Linn and Debray [13]
precisely exploit the fact that standard disassemblers assume certain
properties of compiler-generated code that can be violated without
changing the program's functionality. By transforming the binary into
functionally equivalent code that does not possess all the assumed
properties, standard disassemblers are confused and fail to correctly
translate binary code into its corresponding assembly representation.
In general, certain properties are easier to change than others and it
is not straightforward to transform (i.e., obfuscate) a binary into a
functionally equivalent representation in which all the
compiler-related properties of the original code are lost. When
disassembling obfuscated binaries, we require that certain assumptions
are valid. These assumptions (some of which constitute limiting
factors for our ability to disassemble obfuscated binaries) are
described in the following subsections.
- Valid instructions must not overlap. An instruction is
denoted as valid if it belongs to the program, that is, it is
reached (and executed) at run-time as part of some legal program
execution trace. Two instructions overlap if one or more bytes
in the executable are shared by both instruction. In other words,
the start of one instruction is located at an address that is
already used by another instruction. Overlapping instructions have
been suggested to complicate disassembly in [7].
However, suitable candidate instructions for this type of
transformation are difficult to find in real executables and the
reported obfuscation effects were minimal [13].
- Conditional jumps can be either taken or not taken. This
means that control flow can continue at the branch target or at the
instruction after the conditional branch. In particular, it is not
possible to insert junk bytes at the branch target or at the address
following the branch instruction. Linn and
Debray [13] discuss the possibility to
transform unconditional jumps into conditional branches using opaque
predicates. Opaque predicates are predicates that always evaluate to
either true or false, independent of the input. This would allow the
obfuscator to insert junk bytes either at the jump target or in
place of the fall-through instruction. However, it is not obvious
how to generate opaque predicates that are not easily recognizable
for the disassembler. Also, the obfuscator presented
in [13] does not implement this transformation.
- An arbitrary amount of junk bytes can be inserted at
unreachable locations. Unreachable locations denotes locations
that are not reachable at run-time. These locations can be found
after instructions that change the normal control flow. For example,
most compilers arrange code such that the address following an
unconditional jump contains a valid instruction. However, we assume
that an arbitrary number of junk bytes can be inserted there.
- The control flow does not have to continue immediately
after a call instruction. Thus, an arbitrary number of padding
bytes can be added after each call. This is different from the
standard behavior where it is expected that the callee returns to
the instruction following a call using the corresponding return
instruction. More specifically, in the x86 instruction set
architecture, the call operation performs a jump to the call
target and, in addition, pushes the address following the call
instruction on the stack. This address is then used by the
corresponding ret instruction, which performs a jump to the
address currently on top of the stack. However, by redirecting calls
to a branch function, it is trivial to change the return address.
Our disassembly techniques can be divided into two classes: general
techniques and tool-specific techniques.
General techniques are techniques that do not rely upon any knowledge
on how a particular obfuscator transforms the binary. It is only
required that the transformations respect our assumptions. Our general
techniques are based on the program's control flow, similar to a
recursive traversal disassembler. However, we use a different approach
to construct the control flow graph, which is more resilient to
obfuscation attempts. Program regions that are not covered by the
control flow graph are analyzed using statistical techniques. The
general techniques are described in more detail in Section 4.
An instance of an obfuscator that respects our assumptions is
presented by Linn and Debray in [13]. By
tailoring the static analysis process against a particular tool, it is
often possible to reverse some of the performed transformations and
improve the analysis results. Section 5 discusses potential
modifications to our general techniques to take advantage of
tool-specific knowledge when disassembling binaries transformed with
Linn and Debray's obfuscator.
In Section 6, we show that the general techniques
presented in the next section offer a significant improvement over
previous approaches. When combined with tool-specific knowledge, the
obfuscated binary is almost completely disassembled.
Next: General Techniques
Up: Static Disassembly of Obfuscated
Previous: Related Work and Background
2004-05-18