Reversing Objective-C Binaries With the REobjc Module for IDA Pro
Recently I took a look at a product that manages Apple Inc.’s macOS and iOS devices in an enterprise environment. As part of this work, I performed an analysis of Objective-C binaries running on managed macOS endpoints. I used Hex-Rays’ Interactive Disassembler (IDA) Pro to perform disassembly and decompilation of these binaries.
If you’ve never programmed on macOS or iOS, you might be unfamiliar with the Objective-C language. It’s a superset of the C programming language that adds an object model. Programs developed in this language are linked against the Objective-C runtime shared library, which implements the entire object model supporting Objective-C.
One of the goals of the Objective-C runtime is to be as dynamic as possible. One consequence of this design goal is the way function calls are performed on objects. Objective-C nomenclature refers to these function calls as message passing. Objective-C objects receive these messages, which typically results in one of the object’s methods being called. The runtime resolves each method call dynamically, while the program is executing. To support this, the compiler converts method calls in Objective-C source code into calls to the runtime function objc_msgSend().
Here we’ll take a closer look at an IDA Pro module, REobjc, that adds proper cross references from calls to objc_msgSend() to the actual function being called.
IDA Pro Cross References and Objective-C
Objective-C calls from one method to another are compiled as calls to objc_msgSend(). One effect of this is that IDA Pro cross references point at objc_msgSend() rather than at the functions actually called at runtime. objc_msgSend() is defined with the following function signature:
id objc_msgSend(id self, SEL op, ...)
This implies that for any Objective-C method call you make, the first two arguments are the object’s self pointer, and the selector, which is a string representation of the method being called on self. Objective-C methods that take arguments pass those arguments in order after the selector.
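To make this calling convention concrete, here is a small Python model of message passing. It is only an illustration: the ObjcClass type and the msg_send helper are hypothetical names invented for this sketch, not part of the Objective-C runtime or of REobjc, and the receiver is simplified to a class object.

```python
class ObjcClass:
    """Hypothetical stand-in for an Objective-C class with a method table."""

    def __init__(self, name, methods, superclass=None):
        self.name = name
        self.methods = methods          # selector string -> Python callable
        self.superclass = superclass

    def lookup(self, selector):
        # Walk up the class hierarchy until an implementation is found.
        cls = self
        while cls is not None:
            if selector in cls.methods:
                return cls.methods[selector]
            cls = cls.superclass
        raise AttributeError("unrecognized selector: " + selector)


def msg_send(receiver, selector, *args):
    # Mirrors objc_msgSend(self, op, ...): receiver first, selector second,
    # then the method's own arguments in order.
    implementation = receiver.lookup(selector)
    return implementation(receiver, selector, *args)


# Made-up classes: Child inherits "describe" from Base and adds "add:to:".
base = ObjcClass("Base", {"describe": lambda self, op: "I am " + self.name})
child = ObjcClass("Child", {"add:to:": lambda self, op, a, b: a + b}, base)
```

The point of the model is that the call site only ever names msg_send; which implementation runs is decided by the lookup, which is exactly why a static cross reference to objc_msgSend() says so little.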
Compiling Objective-C Programs
To better demonstrate how Objective-C source is compiled and assembled, the following code example introduces source code using common Objective-C patterns. This init method includes four Objective-C method calls.
Conceptually, the compiler takes the Objective-C method calls above and compiles them into C code that resembles the following. This example is actually decompiler output from IDA Pro, but it illustrates how Objective-C calls are converted into C by the compiler. Each of the four Objective-C calls above corresponds to the function calls indicated in the following excerpt.
As shown, the [super init] call is translated into a call to objc_msgSendSuper2(). This is a common pattern used to initialize subclasses. The call to [NSString string] is translated to an objc_msgSend() call which is sent to the object representing the NSString class. The call to [NSMutableData dataWithLength: _length] is translated into a call to objc_msgSend(), in an example of a class method call with additional parameters.
The last Objective-C call in the example, [[BTGattLookup alloc] init], shows a common object allocate-then-initialize pattern. The alloc message is sent to the BTGattLookup class, which returns an instance of that class. That instance is then the self for a second objc_msgSend() call, which invokes the init method.
Objective-C and the Intel X64 Architecture
In the resulting binary on Intel X64 architecture, the calls work according to the Intel X64 ABI (the System V AMD64 calling convention used on macOS). Function arguments are passed in registers in the order RDI, RSI, RDX, RCX, R8, R9. This implies the RDI register holds the self pointer, and the RSI register holds the selector pointer. Arguments to the Objective-C method begin in the RDX register if necessary.
To properly add cross references from one Objective-C function to another, the values in the RDI and RSI registers must be known. For most calls to objc_msgSend(), discovering the values in these two registers is straightforward.
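The register assignment can be sketched in a few lines of Python. The helper name registers_for_selector is hypothetical, and the sketch assumes every method argument is an integer or pointer; floating-point arguments use other registers and are not modeled here.

```python
# Integer/pointer argument registers, in the order given in the text.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def registers_for_selector(selector):
    """Map the arguments of one objc_msgSend() call to registers.

    Assumption: all arguments are integers or pointers. Anything past the
    sixth register is reported as "stack".
    """
    # Each ':' in a selector introduces one method argument.
    method_arg_count = selector.count(":")
    names = ["self", "selector"]
    names += ["arg%d" % i for i in range(method_arg_count)]

    layout = {}
    for index, name in enumerate(names):
        if index < len(ARG_REGISTERS):
            layout[name] = ARG_REGISTERS[index]
        else:
            layout[name] = "stack"
    return layout
```

For example, registers_for_selector("dataWithLength:") places self in rdi, the selector in rsi, and the length argument in rdx.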
Other aspects of Objective-C analysis to keep in mind are the different ways compilers might decide to generate function calls. On X64, the compiler typically generates function calls using CALL and JMP instructions.
It’s possible for a compiler to use conditional jump instructions or direct assignment to the instruction pointer instead. The current heuristics in the module don’t address those cases. During development I did not observe binaries that used conditional branches to call Objective-C runtime functions.
The compiler can also encode the function calls as indirect calls or direct calls. In the case of an indirect call, the instruction argument is a register. In the case of a direct call, the instruction argument is some reference to a location in memory. In either case, we must be able to determine if the CALL or JMP references objc_msgSend().
Additionally, to properly track function call cross references, the analysis must track the return values of functions as they are called. In X64, the return value from a function call is stored in the RAX register. If the Objective-C source code first allocates an object and then performs method calls on the resulting object instance, tracking the type of object pointer stored in RAX is necessary to properly understand what object is being passed in calls to objc_msgSend().
REobjc IDAPython Module
The primary purpose of the REobjc IDAPython module is the creation of cross references between Objective-C methods. The module is intended to be easy to use. To use REobjc, open the IDA Pro command window and execute the following lines of Python:
idaapi.require("reobjc")
r = reobjc.REobjc(autorun=True)
REobjc Under the Hood
My intent with the REobjc module is for it to work as simply as possible. However, it might be useful to explain the module’s functionality. This will hopefully help people see how the code works and suggest ways it can work better or be more accurate. Pull requests and discussion are greatly appreciated on this module.
To locate the Objective-C runtime calls we care about, it’s important to understand there are multiple ways compilers may encode the calls in a binary. As we mentioned, calls to functions can either be direct or indirect, and there are a couple of ways the target of the call instruction might be encoded. The Objective-C runtime is linked into all Objective-C programs, and for this reason, all calls eventually land in the imported libobjc.dylib library.
Typically, programs will contain a stub function that merely performs an unconditional jump into the objc_msgSend() function. This allows the library to be loaded at any address, with the loader then performing the proper fixup to let the target program call into the library properly.
In the REobjc module, this is handled by making sure all instances of calls to objc_msgSend() are properly identified. Sometimes the target of the call will be _objc_msgSend, sometimes it will be an imported pointer of the form __imp__objc_msgSend. Because calls may be encoded using either form of these targets, the module locates all forms in the current database.
Hopefully this approach is flexible enough to work with any binary you find. The module retrieves a list of all names in the IDA database using the idautils.Names() API, then matches the target functions via a regular expression and stores the matches in a list. During analysis, every candidate call or jump instruction is compared against the list of Objective-C runtime functions, and those that are found to call any form of objc_msgSend() are candidates for having an added cross reference.
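A minimal sketch of the name-matching step might look like the following. The regular expression and the find_msgsend_targets helper are assumptions made for illustration and are not taken from the REobjc source; the sample list is made up, and only mimics the (address, name) pairs that idautils.Names() yields so the sketch can run outside IDA.

```python
import re

# Hypothetical pattern: an optional "__imp_" import prefix, then the symbol.
MSGSEND_PATTERN = re.compile(r"^(?:__imp_)?_objc_msgSend$")


def find_msgsend_targets(names):
    """Return {address: name} for every name that refers to objc_msgSend()."""
    return {ea: name for ea, name in names if MSGSEND_PATTERN.match(name)}


# Made-up sample data in the (address, name) shape of idautils.Names().
sample_names = [
    (0x100001F40, "_objc_msgSend"),
    (0x100003010, "__imp__objc_msgSend"),
    (0x100001F46, "_objc_msgSendSuper2"),
    (0x100001000, "_main"),
]

targets = find_msgsend_targets(sample_names)
```

Inside IDA Pro, the same helper could be fed idautils.Names() directly. Widening the pattern to cover other runtime entry points, such as objc_msgSendSuper2(), would be a natural extension.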
The module iterates over all the functions in the target binary, and for each function, iterates over all the instructions in that function. When a target CALL or JMP instruction is identified, the module determines the target of the instruction. If the target is a register, the module walks backward from the CALL or JMP to determine the value of the register. Direct calls are simpler, in that the target of the CALL or JMP is immediately known. In either case, if the target is objc_msgSend(), the CALL or JMP is a candidate for adding a cross reference.
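The scanning loop described above can be approximated as follows. This is a sketch under stated assumptions: the Insn record, the register set, and the find_candidates function are hypothetical simplifications, and the register resolver is passed in as a callback rather than implemented here.

```python
from collections import namedtuple

# Hypothetical, simplified instruction record; real IDA instructions carry more.
Insn = namedtuple("Insn", "ea mnem operand")

MSGSEND_NAMES = {"_objc_msgSend", "__imp__objc_msgSend"}
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi",
             "r8", "r9", "r12", "r13", "r14", "r15"}


def find_candidates(functions, resolve_register):
    """Yield the address of every CALL/JMP that reaches objc_msgSend().

    functions: {function name: [Insn, ...]}
    resolve_register: callback(insns, index, register) -> name or None
    """
    for insns in functions.values():
        for index, insn in enumerate(insns):
            if insn.mnem not in ("call", "jmp"):
                continue
            target = insn.operand
            if target in REGISTERS:
                # Indirect call: ask the supplied resolver for the value.
                target = resolve_register(insns, index, target)
            if target in MSGSEND_NAMES:
                yield insn.ea


# Made-up function body with one direct and one indirect call.
sample = {
    "-[Foo bar]": [
        Insn(0x1000, "mov", "rdi"),
        Insn(0x1003, "call", "_objc_msgSend"),
        Insn(0x1008, "call", "r12"),
        Insn(0x100B, "jmp", "_free"),
    ]
}
```

With a resolver that reports r12 as holding __imp__objc_msgSend, both calls at 0x1003 and 0x1008 are reported; with a resolver that returns None, only the direct call is.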
When a call to objc_msgSend() is identified, the first two function arguments must be identified. To reiterate, the first argument is a pointer to the object that receives the Objective-C message, which is called self. The second argument is a pointer to the selector, or message, being passed to the object. Resolving register values is handled in the module by the resolve_register_backwalk_ea() method.
This method is useful even outside of Objective-C reverse engineering. It takes a program location and a string representation of a register name. Starting at the given program location, it walks backward one instruction at a time, looking for the value assigned to the target register. It does this by checking for the common X64 instruction mnemonics MOV and LEA. Some programs copy values between registers by way of variables, and the code tracks these kinds of assignments until the value being copied into the target register is known.
As we mentioned, there are two registers that are handled in a special way here. The RAX register will contain the return value from a previous CALL instruction, so REobjc.resolve_register_backwalk_ea() will track RAX by considering CALLs. Also, because the RDI register is used as the self pointer in Objective-C, there are cases where RDI is not explicitly set inside a function. This is because the target function being examined is calling methods on its own self pointer. For this reason, when the function walks backward, if RDI is the target register and it’s not explicitly set, the code will perform a lookup to determine the self pointer from the current class.
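The backward walk, including the special handling of RAX and RDI, can be illustrated with a toy model. This is not the code of resolve_register_backwalk_ea(); the backwalk function, the tuple-based instruction format, and the operand names are hypothetical, and the sketch ignores details such as registers being clobbered across calls.

```python
REGISTERS = {"rax", "rcx", "rdx", "rsi", "rdi",
             "r8", "r9", "r12", "r13", "r14", "r15"}


def backwalk(insns, start, register, self_class=None):
    """Walk backward from insns[start] to find what `register` holds.

    insns is a list of (mnemonic, destination, source) tuples, a hypothetical
    simplification of real disassembly. Returns a string, or None if unknown.
    """
    wanted = register
    for mnem, dst, src in reversed(insns[:start]):
        if mnem == "call":
            if wanted == "rax":
                # RAX holds the return value of the most recent call.
                return "result of " + dst
            continue
        if mnem in ("mov", "lea") and dst == wanted:
            if src in REGISTERS or src.startswith("[rbp"):
                # Copied from another register or a stack variable:
                # keep walking, now tracking the source instead.
                wanted = src
            else:
                return src
    # Reached the top of the function without an explicit assignment.
    if wanted == "rdi":
        return self_class
    return None


# Made-up listing for the [[BTGattLookup alloc] init] pattern.
insns = [
    ("mov", "rdi", "classRef_BTGattLookup"),
    ("mov", "rsi", "selRef_alloc"),
    ("call", "_objc_msgSend", None),
    ("mov", "[rbp-8]", "rax"),
    ("mov", "rdi", "[rbp-8]"),
    ("mov", "rsi", "selRef_init"),
    ("call", "_objc_msgSend", None),
]
```

Resolving rdi for the second call (index 6) follows the copy through the stack variable back to rax, and from there to the first call, which is the chain a real analysis would need to follow to learn that the receiver is a BTGattLookup instance.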
Once the self and selector pointers are resolved for the RDI and RSI registers, the module will attempt to create a cross reference to the appropriate method. This is done by leveraging the existing Objective-C support in IDA Pro. The module method REobjc.objc_msgsend_xref() handles the creation of cross references. It takes the program location of a CALL and the program locations where RDI and RSI are set, and attempts to add the appropriate cross reference.
It’s important to understand that cross references are only added when the Objective-C method being called is located in the current binary. As I work more in this area, I will consider how best to handle method calls that are located in an imported library.
REobjc: The Future
As with any code, REobjc has a couple of bugs — corner cases where things don’t work quite right. One glaring error occurs when multiple classes have a method with a common name. This happens frequently in Objective-C code where a common parent class is subclassed multiple times. If a parent class P has a method called execute, and there are child classes A and B that reimplement the execute method, REobjc might not (and likely will not) create the proper cross reference to the appropriate subclass reimplementation of the execute method.
That’s a significant bug for a tool with the sole purpose of creating valid cross references. Essentially, REobjc iterates over all the matching execute methods and creates a cross reference to the method with the largest address (in other words, the last method named execute in the target binary). During testing around this bug, I found the IDA Pro disassembler suffers from a similar bug, and the pseudocode decompiler seems to as well: decompiled code I examined referenced the incorrect class in several cases. I will bring this to the attention of the developers at Hex-Rays and provide them with the logic I use to resolve the similar bug in REobjc.
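The difference between the flawed behavior and a class-aware lookup can be shown with a small model. Both functions below are hypothetical and written only for illustration; the second is one possible approach to the problem, not the fix used in REobjc, and the addresses are made up.

```python
def xref_by_selector_only(methods, selector):
    """Pick the matching method with the largest address (the flawed case)."""
    matches = [ea for (cls, sel), ea in methods.items() if sel == selector]
    return max(matches) if matches else None


def xref_by_class_and_selector(methods, superclasses, cls, selector):
    """Resolve against the receiver's class first, then its ancestors."""
    while cls is not None:
        if (cls, selector) in methods:
            return methods[(cls, selector)]
        cls = superclasses.get(cls)
    return None


# Parent class P and subclasses A and B all implement "execute".
methods = {
    ("P", "execute"): 0x1000,
    ("A", "execute"): 0x2000,
    ("B", "execute"): 0x3000,
}
superclasses = {"A": "P", "B": "P", "P": None}
```

With this data, the selector-only lookup always lands on B's implementation at 0x3000, while the class-aware lookup for a receiver of class A returns 0x2000. The class-aware version depends on knowing the receiver's class, which is why accurate tracking of the self pointer matters.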
This next issue is not so much a bug as a feature I am working on. The current implementation of REobjc is X64 only. This was by design initially, as I was looking at target programs running on that architecture. In addition, I wanted to focus on making my code work, and architecture portability would only have muddied my development efforts. Future development will continue with the goal of making the code work on the ARM AArch64 architecture. This will let reverse engineers use REobjc on Objective-C binaries using ARM, like those found on Apple’s iPhone, iPad, Apple Watch and Mac Pro.
You can find REobjc added to the existing Duo Labs GitHub idapython repository.