Most techniques used in reverse engineering are pretty obvious. Still, I think it's worth listing a few that are slightly less obvious.
-
Interpreting lists of symbols generated by `nm -pam` (or some variant thereof)
Some of these are "non-external" -- accessible to methods in the same module. Others are "external" -- accessible to (importable by) other modules. And still others are "(undefined) external" -- imported from other modules.
"External" symbols (of both kinds) will usually be `extern "C"`. But non-external symbols may be mangled C++ symbols, which contain both name and type information (and, for methods, their calling parameters). Mangled symbols can be demangled using the `c++filt` utility.
Lists of "(undefined) external" symbols can be useful if you're only interested in (say) the CG... methods (in the CoreGraphics framework) called from the AppKit framework. In that case you only need to (say) hook the CG... methods in AppKit's list of "(undefined) external" symbols.
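As a rough illustration of sorting symbols into these three groups, here is a minimal Python sketch that parses text in the general shape of `nm -pam` output and picks out the imported CG... symbols. The sample lines, the symbol names in them, and the helper names `classify` and `imported_with_prefix` are made up for illustration. The regular expression only assumes the layout shown in the sample; real output may carry extra attributes that it doesn't anticipate.

```python
import re

# Illustrative lines in the general shape of `nm -pam` output. The addresses
# and symbol names are invented for this example.
SAMPLE = """\
0000000000001f40 (__TEXT,__text) non-external __ZN6Widget4drawEdd
0000000000002a10 (__TEXT,__text) external _WidgetDraw
                 (undefined) external _CGContextFillRect (from CoreGraphics)
                 (undefined) external _CGContextSaveGState (from CoreGraphics)
                 (undefined) external _malloc (from libSystem)
"""

# Optional address, "(segment,section)" or "(undefined)", optional attributes,
# "external" or "non-external", the symbol name, optional "(from library)".
LINE_RE = re.compile(
    r"^(?:[0-9a-fA-F]+)?\s*\((?P<where>[^)]*)\)\s+"
    r"(?P<attrs>.*?)(?P<scope>non-external|external)\s+"
    r"(?P<name>.+?)(?:\s+\(from (?P<lib>[^)]+)\))?$"
)


def classify(nm_output):
    """Group (name, library) pairs by the three kinds of symbol."""
    groups = {"non-external": [], "external": [], "undefined external": []}
    for line in nm_output.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # a line this simple pattern doesn't understand
        if match.group("where") == "undefined":
            kind = "undefined external"
        else:
            kind = match.group("scope")
        groups[kind].append((match.group("name"), match.group("lib")))
    return groups


def imported_with_prefix(nm_output, prefix):
    """Names of imported ("(undefined) external") symbols starting with prefix."""
    imported = classify(nm_output)["undefined external"]
    return [name for name, _lib in imported if name.startswith(prefix)]


if __name__ == "__main__":
    for name in imported_with_prefix(SAMPLE, "_CG"):
        print(name)
```

In real use the text would come from running `nm -pam` on a binary rather than from a hard-coded string; the filtering step stays the same.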
Mangled "non-external" method names can sometimes be useful for reconstructing parameter lists for non-mangled "external" methods -- sometimes an "external" method is a thin wrapper around a "non-external" method with a mangled name.
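To show how a mangled name gives up its parameter types, here is a small Python sketch that hands a symbol to `c++filt`. It assumes a `c++filt` on the `PATH` that accepts the `-n` (no-strip-underscore) flag; the helper name `demangle` and the symbol `__ZN6Widget4drawEdd` are hypothetical.

```python
import shutil
import subprocess


def demangle(symbol, tool="c++filt"):
    """Return the demangled form of a symbol, or None if the tool is missing."""
    path = shutil.which(tool)
    if path is None:
        return None
    # Symbols listed by nm on OS X carry one extra leading underscore, so the
    # "_Z" mangling prefix shows up as "__Z". Drop that underscore here and
    # pass -n so the tool does not strip a second one.
    if symbol.startswith("__Z"):
        symbol = symbol[1:]
    result = subprocess.run(
        [path, "-n", symbol], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(demangle("__ZN6Widget4drawEdd"))
```

If `c++filt` is available, this hypothetical symbol should come back as something like `Widget::draw(double, double)`, which is the kind of parameter list you'd then try against the "external" wrapper.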
-
Grepping directories to find which binary displayed a particular error message, or imports or implements a particular exported method.
Almost all OS X system libraries can be found under one of the two following directories:
/System/Library/
/usr/lib/
Applications can be found under `/Applications/`. Plugins can be found under `/Library/Internet Plug-Ins/`.
When searching on an error message, you generally choose a distinctive part of it, including at least one space. When searching on a method you choose its name.
In each directory where you want to search:
grep -r -s "[string]" *
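If you'd rather collect the matches in a script, the same recursive search can be sketched in Python. This is only an approximation of what `grep -r -s` does: it looks for the raw bytes of the string in every regular file and silently skips files it can't read. The function names are made up for the example.

```python
import os


def _contains(path, pattern, chunk_size):
    """True if the file at path contains the byte string pattern."""
    overlap = len(pattern) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            # Keep the end of the previous window so a match that straddles
            # two chunks is still found.
            window = tail + chunk
            if pattern in window:
                return True
            tail = window[-overlap:] if overlap else b""


def find_files_containing(root, needle, chunk_size=1 << 20):
    """Yield paths of regular files under root whose bytes contain needle."""
    pattern = needle.encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks and the like
            try:
                if _contains(path, pattern, chunk_size):
                    yield path
            except OSError:
                continue  # unreadable file: stay quiet, like grep -s


if __name__ == "__main__":
    for hit in find_files_containing("/usr/lib/", "invalid context"):
        print(hit)
```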
When searching on an error string, each match should contain the string somewhere in the `__cstring` section of its `__TEXT` segment. When searching on a method name, each match should contain the string either in the `__cstring` section or the `__stubs` section of its `__TEXT` segment.
Once you've found some matches, you'll need to look more closely at each of them (using tools like `nm -pam`, `class-dump` and `strings`).
-
Learning more about objects by discovering their lifetimes
A lot can be learned about a class and its methods by finding out what code "owns" and uses it. "Ownership" is particularly easy to determine -- just find out what code creates and destroys it.
In Objective-C code this is very straightforward -- just hook the appropriate `init...` and `dealloc` or `dispose` methods, and print a stack trace on each.
In C/C++ code you'll need to look for methods with "Create" and "Destroy" (or something similar) in their names.
An example of this is the interpose library I posted at "Showing menubar covers firefox window in fullscreen mode".
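For the C/C++ case, a symbol list can be scanned for likely creation and destruction functions before deciding what to hook. The Python sketch below pairs them by the part of the name that precedes the keyword. The symbol names are hypothetical, and treating "Release", "Dispose" and "Free" as equivalents of "Destroy" is my assumption about what "something similar" might cover.

```python
import re
from collections import defaultdict

CREATE_WORDS = ("Create",)
# Assumed equivalents of "Destroy"; adjust to the library being examined.
DESTROY_WORDS = ("Destroy", "Release", "Dispose", "Free")
WORD_RE = re.compile("|".join(CREATE_WORDS + DESTROY_WORDS))


def lifetime_candidates(symbols):
    """Map each name stem to its creation and destruction functions.

    Only stems that have at least one function of each kind are returned.
    """
    groups = defaultdict(lambda: {"create": [], "destroy": []})
    for symbol in symbols:
        match = WORD_RE.search(symbol)
        if match is None:
            continue
        # The stem is whatever precedes the keyword, minus leading underscores.
        stem = symbol[: match.start()].lstrip("_")
        role = "create" if match.group(0) in CREATE_WORDS else "destroy"
        groups[stem][role].append(symbol)
    return {
        stem: roles
        for stem, roles in groups.items()
        if roles["create"] and roles["destroy"]
    }


if __name__ == "__main__":
    names = [
        "_FooSessionCreate",
        "_FooSessionCreateWithOptions",
        "_FooSessionDestroy",
        "_FooSessionGetName",
    ]
    print(lifetime_candidates(names))
```

Each pair it reports is only a candidate; hooking both ends and printing stack traces is still what tells you who owns the object.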
-
Discovering undocumented C methods' parameters and return values
This pretty much always requires the use of a disassembler -- a good one, like [Hopper Disassembler](http://www.hopperapp.com/), that can follow cross-references (for example from a method's implementation to the code that calls it).
You need to understand assembly code and calling conventions -- which can be quite obscure. But there are some tricks you can play to make your life easier.
* Look at messages displayed when a particular parameter is invalid.
They will often tell you what that parameter is supposed to be -- e.g. "invalid window" or "invalid context".
* Look at the assembly code for both the method and what calls it.
This often provides additional context -- especially if the caller is a documented method.
* Where are `CGFloat` parameters in the parameter list?
`CGFloat` parameters are `double`s in 64-bit mode and `float`s in 32-bit mode.
In 64-bit mode, the first six integer and pointer parameters are passed in registers `$rdi`, `$rsi`, `$rdx`, `$rcx`, `$r8` and `$r9`. `CGFloat` parameters (the first two) are passed in `$xmm0` and `$xmm1`. Because integer and floating-point parameters use separate sets of registers, the register usage alone can't tell you where the `CGFloat` parameters fit in the parameter list (or in the method's declaration).
In 32-bit mode, all such parameters are passed on the stack, in declaration order. So by looking at the 32-bit implementation of a method, you can tell where `CGFloat` parameters go in the parameter list.
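The difference between the two modes can be made concrete with a toy model in Python. It is a deliberately simplified sketch: it only handles scalar integer/pointer and `CGFloat` parameters, it assumes the usual System V conventions (`xmm0` through `xmm7` for floating-point arguments in 64-bit mode, 4-byte stack slots in 32-bit mode), and the function names are invented.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
FLOAT_REGS = ["xmm%d" % i for i in range(8)]


def locations_64bit(params):
    """Where each parameter travels in 64-bit mode.

    params is a list of "int" (integer or pointer) or "float" (CGFloat).
    """
    int_regs, float_regs = iter(INT_REGS), iter(FLOAT_REGS)
    locations = []
    for kind in params:
        pool = float_regs if kind == "float" else int_regs
        locations.append(next(pool, "stack"))  # out of registers: spill
    return locations


def stack_layout_32bit(params):
    """(offset, kind) for each parameter in 32-bit mode.

    Offsets are relative to the stack pointer on entry to the function,
    where offset 0 holds the return address. Pointers, ints and the
    32-bit CGFloat (a float) each take one 4-byte slot.
    """
    layout = []
    offset = 4
    for kind in params:
        layout.append((offset, kind))
        offset += 4
    return layout


if __name__ == "__main__":
    a = ["int", "float", "float", "int"]  # e.g. f(ptr, CGFloat, CGFloat, ptr)
    b = ["int", "int", "float", "float"]  # e.g. f(ptr, ptr, CGFloat, CGFloat)
    print(sorted(locations_64bit(a)) == sorted(locations_64bit(b)))
    print(stack_layout_32bit(a) == stack_layout_32bit(b))
```

Both declarations use the same set of registers in 64-bit mode, so a disassembly of the callee can't distinguish them. In 32-bit mode the second stack slot is read as a float in one case and as a pointer in the other.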
-
Learning more about the codepath on which a crash happens
Often there's not much you can do about a non-reproducible crash that happens in system code (or any code for which you don't have the source).
Still, if all the crashes have the same stack, you can sometimes glean more information about them by finding out what the code that crashes "normally" does (when it doesn't crash).
The method names in crash stacks, plus the names of the modules they belong to, provide clues. You can also use `gdb` or an interpose library to find out when the methods below the top of the stack normally run -- all the time, or only under special conditions. A successful case of this is my fourth example of reverse engineering.
Sometimes you can learn how to "emulate" a crash -- to trigger exactly the same crash stack by using an interpose library to alter system code in subtle ways. This can sometimes give you clues about how to reproduce the crash.
For example, at "startup crash at libclh", I found I can "emulate" those crashes by making `IOConnectMapMemory()` or `IOConnectCallMethod()` return an error.
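In practice this is done with a C interpose library against the real system function. The Python sketch below is only an analogy for the idea of forcing an error return under chosen conditions: `map_memory` is a hypothetical stand-in, not the IOKit call, and the status codes are made up.

```python
import functools

SUCCESS = 0   # made-up status codes for the illustration
FAILURE = -1


def fail_when(predicate, error_result):
    """Wrap a function so it returns error_result whenever predicate is true.

    The predicate receives the same arguments as the wrapped function; when
    it is false the original function runs unchanged.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if predicate(*args, **kwargs):
                return error_result
            return func(*args, **kwargs)
        return wrapper
    return decorator


def map_memory(connection, memory_type):
    """Hypothetical stand-in for a call that normally succeeds."""
    return SUCCESS


# Force a failure only for one memory type and leave other calls alone.
hooked_map_memory = fail_when(
    lambda connection, memory_type: memory_type == 2, FAILURE
)(map_memory)


if __name__ == "__main__":
    print(hooked_map_memory(1, 0))
    print(hooked_map_memory(1, 2))
```

Narrowing the condition under which the error is returned is what helps here: the narrower the condition that still produces the same crash stack, the better the clue about what goes wrong in the wild.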