Tuesday, May 26, 2009 at 4:11 PM
(Note: this is one of our occasional extra-geeky technical posts. If this isn't your thing, don't worry; our usual non-technical stuff will be back soon.)
If you're a programmer, there are many reasons why you might want to go exploring the inner workings of Mac OS X. You might want to learn how Apple achieves interesting effects. Or perhaps you're just curious about how things work. (We're all adults here, so I won't lecture you about the dangers of using private or undocumented interfaces in your apps.)
In any case, though, you need to know how to read assembly, either PowerPC (if you have an older Mac) or x86 (if you have anything recent). While there are good resources available to learn about reading PowerPC assembly for exploration, there are fewer about x86. Despite the present and future of the Mac being x86, it seems like people have lots of anxiety about having to work with it.
I think the problem is not a lack of documentation on x86 assembly, but a surfeit of it. Most of it is Windows- or DOS-centric, usually with syntax that doesn't apply (Intel syntax vs the AT&T syntax that GCC uses), and with the aim of teaching how to write it. But reading x86 assembly really isn't that hard. If all you want to do is learn how to read the code generated by GCC, it's probably just as easy as PowerPC.
The other day I was investigating how window minimization and window titles work. While exploring, I took notes of my discoveries. Let's touch on two functions, in both PowerPC and x86 flavors.
Before we begin, I'm going to assume that you're comfortable with assembly in general (though not necessarily with any particular architecture's). If you have the latest developer tools, launch Shark (in /Developer/Applications/Performance Tools) and in the Help menu you can access various ISA references. In addition, Apple has ABI documentation for both the PowerPC and x86. I'm going to go over each function twice (once for PowerPC and once for x86); feel free to skim the PowerPC version if you're accustomed to it. And finally, this is only for the 32-bit version of each platform; things change even more with 64 bits.
SetWindowTitleWithCFString
The trail always begins with a public call that uses the SPI that you want to figure out. In this case, I chose SetWindowTitleWithCFString because it has to somehow set the title of a window even if it's minimized. I went with Carbon because sometimes the dynamic nature of Objective-C with Cocoa makes tracing code harder.
PowerPC
<+0>: mflr r0 // save linkage
<+4>: stmw r30,-8(r1) // stash r30, r31
<+8>: mr r30,r4 // save r4 (new title)
<+12>: stw r0,8(r1) // save old LR in caller's frame
<+16>: stwu r1,-80(r1) // make stack frame
This is the prologue of the function. The PowerPC doesn't have a dedicated stack pointer (convention is to use r1 for that), so the common way of implementing calls by pushing the PC onto the stack doesn't work. Instead, the PowerPC has a link register and an instruction, bl, that branches and puts the return address into the link register. Thus, almost every function starts with mflr r0, to pull the return address into a usable register. Then in <+4> we save off some registers that we're going to smash. Every function needs scratch registers to hold local variables, and usually the high-numbered registers are used. The stmw (store multiple words) instruction is useful for ditching many high registers on the stack. Then in <+12> we drop the return address onto the stack, and in <+16> we allocate 80 bytes of stack.
A note on parameter passing. Integer-sized parameters (the only kind we'll be dealing with today) are passed into a function starting with r3 and going up through the registers. Return values come back in r3. So we see that in <+8> we tuck the pointer to the new title away in r30 (whose previous value was stored on the stack earlier).
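To make the prologue concrete, here is a minimal sketch in Python that replays the five instructions against a toy register file and memory. The function name, the dictionary-based machine state, and the starting values are all made up for illustration; only the instruction sequence and the offsets come from the listing.

```python
def run_ppc_prologue(regs, lr, mem):
    """Apply the five prologue instructions to a toy machine state.

    regs maps register names to 32-bit values, lr is the link register,
    and mem maps byte addresses to stored words. This is a simplified
    model for following the listing, not a real emulator.
    """
    regs, mem = dict(regs), dict(mem)
    regs["r0"] = lr                       # <+0>  mflr r0
    mem[regs["r1"] - 8] = regs["r30"]     # <+4>  stmw r30,-8(r1) stores r30 ...
    mem[regs["r1"] - 4] = regs["r31"]     #       ... and r31 in the next word
    regs["r30"] = regs["r4"]              # <+8>  mr r30,r4
    mem[regs["r1"] + 8] = regs["r0"]      # <+12> stw r0,8(r1)
    old_r1 = regs["r1"]
    regs["r1"] = old_r1 - 80              # <+16> stwu r1,-80(r1) moves r1 down ...
    mem[regs["r1"]] = old_r1              #       ... and stores the old r1 there
    return regs, mem


# Made-up starting values, chosen only to be easy to spot.
start = {"r1": 0x1000, "r3": 0xAAAA, "r4": 0xBBBB, "r30": 30, "r31": 31}
regs, mem = run_ppc_prologue(start, lr=0x2000, mem={})
for address in sorted(mem):
    print(hex(address), hex(mem[address]))
```

Note how the saved r30 and r31 land just below the old r1, the return address lands 8 bytes above it, and the word at the new r1 points back at the old one.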
<+20>: bl 0x92881384 <_Z13GetWindowDataP15OpaqueWindowPtr>
<+24>: li r0,-5600 // errInvalidWindowRef
<+28>: cmpwi cr7,r3,0 // if no window data, bail
<+32>: beq- cr7,0x928d2ae0 <+60>
<+36>: cmpwi cr7,r30,0 // if no string to set, bail
<+40>: li r0,-50 // paramErr
<+44>: beq- cr7,0x928d2ae0 <+60>
<+48>: mr r4,r30
This is where we must start making inferences as to what the code is doing. Fortunately, we have the symbols so it's not too hard. We see that we use the WindowRef as a parameter to a C++ function GetWindowData(OpaqueWindowPtr*), as the WindowRef was passed in as r3 and r3 wasn't altered before the call. In addition, note that the function return value, being in r3, will overwrite the WindowRef value, which wasn't saved in a high register. That's fine, as the WindowRef was just an index into a table and won't be needed further.
At this point we run some checks. We compare both r3 and r30 to zero, and if either is zero we jump to the end with r0 set to the appropriate error code. (The end of the function will move r0 into r3 for return.)
The PowerPC condition register has eight condition fields, cr0 through cr7. Why are we using cr7 here? Probably because cr7 is volatile and we can get away with not saving/restoring it.
<+52>: bl 0x928d2af8 <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+56>: li r0,0 // return noErr
<+60>: addi r1,r1,80 // tear down stack frame and return
<+64>: mr r3,r0
<+68>: lwz r0,8(r1)
<+72>: lmw r30,-8(r1)
<+76>: mtlr r0
<+80>: blr
The rest is pretty simple. We call a member function WindowData::SetTitleCommon(const __CFString*) (r3 still holds the WindowData, which serves as the this pointer, and <+48> put the title in r4), and then do the common teardown. We restore the stack pointer, put the return value into r3, restore the registers, move the return address back into the link register, and branch to the link register (blr), returning us to our caller.
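Stripped of the register shuffling, the control flow of the listing can be sketched in a few lines of Python. Only the two error values and the order of the checks come from the disassembly; the function name, the dictionary standing in for the window table, and the "title" key are hypothetical stand-ins for GetWindowData and SetTitleCommon, whose real behavior we haven't looked at.

```python
NO_ERR = 0
PARAM_ERR = -50                  # li r0,-50
ERR_INVALID_WINDOW_REF = -5600   # li r0,-5600


def set_window_title(window_table, window_ref, title):
    """Model of the control flow in the listing (names are hypothetical)."""
    data = window_table.get(window_ref)   # <+20> stands in for GetWindowData
    if data is None:                      # <+28> cmpwi cr7,r3,0
        return ERR_INVALID_WINDOW_REF
    if title is None:                     # <+36> cmpwi cr7,r30,0
        return PARAM_ERR
    data["title"] = title                 # <+52> stands in for SetTitleCommon
    return NO_ERR                         # <+56> li r0,0


windows = {1: {"title": "Untitled"}}
print(set_window_title(windows, 1, "Notes"), windows[1]["title"])
print(set_window_title(windows, 2, "Notes"))
print(set_window_title(windows, 1, None))
```

The window check comes first, so a bad window with a missing title reports the window error rather than the parameter error.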
x86
The PowerPC register file is really easy: r0, r1, r2 ... r31. x86 has fewer registers and they've historically had different roles (accumulator, base, source index, destination index, and so on). Seriously, forget about that. There are eight registers you care about. eax, ebx, ecx, edx, esi, and edi are all general-purpose registers. esp is the stack pointer. ebp is the frame pointer. That's it.
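PowerPC assembly reads right-to-left (except for stores): the destination comes first and the sources follow. x86 AT&T syntax in general reads left-to-right: the source comes first and the destination last.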
<+0>: push %ebp // make stack frame
<+1>: mov %esp,%ebp // make stack frame
<+3>: push %esi // stash %esi
<+4>: sub $0x14,%esp // make stack frame
The 32-bit x86 calling convention is stack-based. Parameters to a function are put at the top of the stack, and the rightmost parameters have the highest addresses. To execute the function, the caller uses the call instruction. This instruction pushes the return address onto the stack, so even before we hit <+0> the parameters are four bytes above the stack pointer. In <+0> we save off the old frame pointer and in <+1> we establish our stack frame. At this point ebp is fixed for the entire function. In <+3> we save the old values of registers we're going to use, and in <+4> we allocate space on the stack.
This is a perfect example of an ideal stack frame. ebp is the frame pointer. It points at the stack slot holding the old frame pointer. ebp+4 is the return address into the function that called us. ebp+8 is the first parameter passed in, ebp+12 is the second, etc. Immediately below ebp are the values saved from the registers, which will be restored before the return. And below that is a bunch of stack space used for either register spillage or calling subsequent functions. One interesting note is that parameters are rarely pushed onto the stack for a call. The stack pointer doesn't move once we make it past the prologue. We just set the memory at and just above esp (the stack pointer) and make the call.
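The frame layout is easy to check by replaying the prologue on a toy stack. The sketch below is a simplified model in Python, not an emulator: the function name is invented, and the addresses and values are made up so they are easy to recognize. Only the four instructions and their operands come from the listing.

```python
def run_x86_prologue(esp, ebp, esi, mem):
    """Apply push/mov/push/sub from the listing to a toy 32-bit stack.

    mem maps byte addresses to stored words; esp, ebp and esi are the
    register values on entry, i.e. right after the caller's call.
    """
    mem = dict(mem)
    esp -= 4
    mem[esp] = ebp          # <+0> push %ebp
    ebp = esp               # <+1> mov %esp,%ebp
    esp -= 4
    mem[esp] = esi          # <+3> push %esi
    esp -= 0x14             # <+4> sub $0x14,%esp
    return esp, ebp, mem


entry_esp = 0x1000
stack = {
    entry_esp: 0x92E00000,      # return address pushed by call (made-up value)
    entry_esp + 4: 0x1111,      # first parameter, the WindowRef (made-up value)
    entry_esp + 8: 0x2222,      # second parameter, the new title (made-up value)
}
esp, ebp, stack = run_x86_prologue(entry_esp, ebp=0xBFFF0000, esi=0x5555, mem=stack)
print("first parameter at ebp+8:", hex(stack[ebp + 8]))
print("second parameter at ebp+12:", hex(stack[ebp + 12]))
print("bytes between esp and ebp:", ebp - esp)
```

After the prologue, esp sits 24 bytes below ebp: 4 for the saved esi plus the 0x14 bytes of scratch space.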
<+7>: mov 0x8(%ebp),%eax // get WindowRef in %eax
<+10>: mov 0xc(%ebp),%esi // get new title in %esi
The parameters are passed on the stack. Since fiddling in memory is slow, we pull the values into registers. It's actually pretty analogous to how things go in PowerPC. There, lower registers like r3 are reused for parameter passing so important values are kept in the high registers. On x86 the parameters go on the stack and values are kept in registers when possible. Why eax and esi? Why not? (One plausible reason: esi is preserved across calls, so the title survives the call to GetWindowData, while the WindowRef is only needed once.)
<+13>: mov %eax,(%esp) // put WindowRef on the stack
<+16>: call 0x92dfb8f6 <_Z13GetWindowDataP15OpaqueWindowPtr>
With the PowerPC, you can tell how many parameters a function has by seeing how many registers starting with r3 are loaded. Here, we look at the stores made through esp just before the call: one store to (%esp) means one parameter.
<+21>: mov %eax,%edx // stick WindowData into %edx
<+23>: mov $0xffffea20,%eax // errInvalidWindowRef
<+28>: test %edx,%edx // if no window data, bail
<+30>: je 0x92e4bb04 <+54>
<+32>: test %esi,%esi // if no string to set, bail
<+34>: mov $0xffce,%ax // paramErr
<+38>: je 0x92e4bb04 <+54>
Return values come back from functions in eax, but otherwise this is pretty much the same. (The mov in <+34> sits between the test and the je, which is fine because mov doesn't touch the flags.) The only thing of interest to note is the clever use of the peculiar register structure. In <+23> the constant 0xffffea20 is loaded into eax. But in <+34> the constant 0xffce is loaded into ax. Since ax is just an alias for the lower 16 bits of eax, the upper half of the word is left as 0xffff and we get the full constant 0xffffffce in eax. Why do this? Because loading a 32-bit constant takes 5 bytes while loading a 16-bit constant only takes 4.
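The aliasing trick is easy to verify with a little bit-twiddling. This is a small sketch in Python; the two helper names are invented for the example, and the constants are the ones from the listing.

```python
def mov_imm16_to_ax(eax, imm16):
    """Model `mov $imm16,%ax`: replace the low 16 bits of eax only."""
    return (eax & 0xFFFF0000) | (imm16 & 0xFFFF)


def as_signed32(value):
    """Interpret a 32-bit register value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


eax = 0xFFFFEA20                       # <+23> mov $0xffffea20,%eax
print(hex(eax), as_signed32(eax))      # errInvalidWindowRef
eax = mov_imm16_to_ax(eax, 0xFFCE)     # <+34> mov $0xffce,%ax
print(hex(eax), as_signed32(eax))      # paramErr
```

The trick only works because both error codes are small negative numbers, so they already share the same upper 16 bits.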
<+40>: mov %esi,0x4(%esp) // load new title as param 2
<+44>: mov %edx,(%esp) // load WindowData as param 1
<+47>: call 0x92e4bb0c <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+52>: xor %eax,%eax // return noErr
Same stuff as before. The one note is the zeroing of eax with an xor. It's just a fancy trick, as the generated code is faster and smaller than the equivalent mov $0x0,%eax.
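The size claims for both tricks (the xor here and the 16-bit load in <+34>) can be checked by counting bytes. The byte sequences in this Python sketch are the usual IA-32 encodings for these instructions; they are an assumption on my part rather than something copied from the original dump, but the sizes agree with the offsets in the listing (<+23> to <+28>, <+34> to <+38>, and <+52> to <+54>).

```python
# Assumed encodings, keyed by the AT&T spelling used in the listing.
ENCODINGS = {
    "mov $0xffffea20,%eax": bytes.fromhex("b820eaffff"),  # opcode + imm32
    "mov $0xffce,%ax": bytes.fromhex("66b8ceff"),         # 0x66 prefix + opcode + imm16
    "mov $0x0,%eax": bytes.fromhex("b800000000"),         # opcode + imm32
    "xor %eax,%eax": bytes.fromhex("31c0"),               # opcode + ModRM
}


def encoded_size(instruction):
    """Return the length in bytes of the assumed encoding."""
    return len(ENCODINGS[instruction])


for text in ENCODINGS:
    print(f"{encoded_size(text)} bytes  {text}")
```

So the 16-bit load saves one byte over a full 32-bit load, and the xor saves three bytes over loading a literal zero.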
<+54>: add $0x14,%esp // tear down stack frame and return
<+57>: pop %esi
<+58>: leave
<+59>: ret
<+60>: nop
<+61>: nop
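This is the mirror image of the stack frame creation: the add undoes the sub, the pop restores esi, leave restores esp and ebp, and ret pops the return address. The two trailing nops are padding, most likely there to align whatever follows.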
That's one function down and one left to go. Next time, we'll take a look at a function that behaves a little differently than this one did.