Mac OS X Spelunking in PowerPC and x86 Assembly, part 2

Thursday, June 04, 2009 at 11:53 AM



(Note: this is another of our occasional extra-geeky technical posts. If this isn't your thing, don't worry; our usual non-technical stuff will be back soon.)

Welcome back. In our last post we went through a simple function that called other functions, and touched on stack frames and parameter passing. This time let's look at a different function. We'll spend less time on what we've already seen and more on a few of the more advanced things this function does.

UpdateDockTitle

PowerPC

<+0>:   mflr    r0               // save linkage
<+4>:   stmw    r28,-16(r1)      // stash r28, r29, r30, r31
<+8>:   mr      r30,r3           // save r3 (WindowData)
<+12>:  bcl-    20,4*cr7+so,0x928d2bd4 <+16>
<+16>:  mflr    r31              // get ip in r31

Whoa... what?

Short story: <+12> is an unconditional branch-and-link to the very next instruction.

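Long story: On the PowerPC, conditional branches such as beq, bne, and bge are just aliases (“extended mnemonics”) for a more primitive branch instruction, bc (branch conditional); bcl is the same instruction with the link bit set. Its first operand, BO, is 20 (0b10100), which means “branch always”. Since it's always going to branch, the second operand, BI (which condition-register bit to test), doesn't matter, so it was set to all 1 bits: 31, which the disassembler prints as 4*cr7+so (bit 3, the summary-overflow bit, of condition-register field 7).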
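To see how those operands pack into a single instruction word, here's a small C sketch that decodes the fields of a bc-family (B-form) instruction. The word 0x429F0005 is my own hand assembly of bcl 20,31 with a displacement of +4, built from the field layout rather than copied from the binary, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC B-form branch (bc, bcl, bca, bcla). */
typedef struct {
    unsigned opcode;  /* primary opcode (16 for the bc family) */
    unsigned bo;      /* branch options; 20 = 0b10100 = branch always */
    unsigned bi;      /* which condition-register bit to test (0-31) */
    int32_t  disp;    /* signed displacement in bytes */
    unsigned aa;      /* 1 = absolute target, 0 = relative to this instruction */
    unsigned lk;      /* 1 = also store the next instruction's address in LR */
} BForm;

BForm decode_bform(uint32_t insn) {
    BForm f;
    f.opcode = insn >> 26;
    f.bo     = (insn >> 21) & 0x1F;
    f.bi     = (insn >> 16) & 0x1F;
    /* BD occupies bits 2-15 (counting from the least significant bit), so
       masking with 0xFFFC yields a byte offset; sign-extend it from 16 bits. */
    f.disp = (int32_t)(insn & 0xFFFC);
    if (f.disp & 0x8000)
        f.disp -= 0x10000;
    f.aa = (insn >> 1) & 1;
    f.lk = insn & 1;
    return f;
}

/* BI names a bit as 4 * field + bit, where bits 0-3 are lt, gt, eq, so. */
unsigned bi_cr_field(unsigned bi) { return bi / 4; }

const char *bi_bit_name(unsigned bi) {
    static const char *const names[4] = { "lt", "gt", "eq", "so" };
    return names[bi % 4];
}
```

Decoding 0x429F0005 gives opcode 16, BO 20, BI 31 (cr7, so), a displacement of +4, and LK set: a branch-and-link to the very next instruction.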

Why do this? Because we're going to need to access some PC-relative data, and the PowerPC has no PC-relative addressing mode for loads and stores. Nor can the register move instructions read the PC. So we cheat: we take an unconditional branch-and-link to the next address. Because it's a branch and link, the link register receives the address of the following instruction (which here is also the branch target), and mflr can copy that into a normal register.

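Why branch-conditional with a condition of “branch always” instead of a plain bl? It isn't for lack of relative addressing: b and bl are PC-relative too (ba and bla are the absolute forms). The likely reason is branch prediction. Processors such as the G5 predict blr targets with an internal stack of return addresses, and an ordinary bl would push an entry that no matching blr will consume, throwing later return predictions off. The PowerPC architecture documentation names bcl 20,31 to the next instruction as the preferred way to read the current address, so processors can recognize that form and leave the return-address stack alone.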

<+20>:  stw     r0,8(r1)         // save return address in caller's frame
<+24>:  stwu    r1,-80(r1)       // make stack frame
<+28>:  addis   r28,r31,3533     // r28 = r31 + (3533 << 16)
<+32>:  bl      0x928d2c50 <_Z15GetTitleForDockP10WindowData>
<+36>:  lbz     r0,-3364(r28)    // haul initialization boolean into r0

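This is where intuition comes in. We're hauling in an arbitrary byte from a PC-relative address. (r28 still holds the address computed at <+28> because r28 is a nonvolatile register, which callees like GetTitleForDock must preserve; that's also why this function saved the caller's copy with the stmw at <+4>.) lbz is “load byte and zero”: it loads one byte from memory and clears the rest of the register. What's byte-sized? A Boolean (the Carbon type; on PowerPC, GCC makes C++ bools 4 bytes). Why a Boolean? Probably a flag. And since the value of that byte gates the call to RegisterAsDockClientPriv, it's a safe bet that it's an initialization flag.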
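Here's the address arithmetic for that load as a C sketch (the helper name is hypothetical). r31 holds the address of <+16>, which the bcl line shows as 0x928d2bd4; addis adds 3533 << 16, and lbz adds the signed displacement -3364.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of the PowerPC pair
 *     addis rB,rPC,hi      ; rB = rPC + (hi << 16)
 *     lbz   rD,lo(rB)      ; load the byte at rB + lo
 * Both immediates are signed 16-bit values; the sums wrap at 32 bits. */
uint32_t ppc_hi_lo_address(uint32_t pc, int16_t hi, int16_t lo) {
    uint32_t base = pc + ((uint32_t)(int32_t)hi << 16);
    return base + (uint32_t)(int32_t)lo;
}
```

With the values from this listing, ppc_hi_lo_address(0x928d2bd4, 3533, -3364) comes out to 0xA05A1EB0, which is where the flag byte sat when this disassembly was taken. Splitting a 32-bit offset into a high half for addis and a signed low half for the load is the standard way PowerPC code reaches an address it can't encode in one instruction.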

<+40>:  mr      r29,r3           // stash new title into r29
<+44>:  cmpwi   cr7,r0,0         // was initialized?
<+48>:  bne-    cr7,0x928d2c04 <+64>  // if so, skip
<+52>:  bl      0x9287f864 <_Z24RegisterAsDockClientPrivv>  // else initialize
<+56>:  li      r0,1             // and set flag
<+60>:  stb     r0,-3364(r28)    // as being initialized
<+64>:  mr      r3,r30
<+68>:  mr      r4,r29
<+72>:  bl      0x928d2c68 <SyncPlatformWindowTitle>  // call with (WindowData, new title)
<+76>:  lwz     r0,344(r30)      // pull (WindowData + 344)
<+80>:  andis.  r2,r0,64         // and pull a flag bit out of it (minimized?)

More intuition here. r30 contains a pointer to the WindowData class instance, and we're loading the word 344 bytes in. andis. ANDs that word with its immediate shifted left 16 bits (64 << 16 = 0x00400000). We don't care about the destination register (r2 isn't touched again in this function), but don't miss the name of the opcode: “andis.” Remember that the period means the instruction also updates cr0, which the beq- that follows tests.

Once again, this is obviously a flag (bit-sized this time). But what does it mean? Context tells us that we only call CoreDockSetItemTitle when it's set. Thus, it's a safe guess that this is the is-minimized flag.
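The mask arithmetic is easy to get wrong, so here's a C sketch of it (the function names are mine). andis. shifts its 16-bit immediate left by 16 before ANDing, so 64 becomes 0x00400000, and on big-endian PowerPC that bit lives in the second byte of the word.

```c
#include <assert.h>
#include <stdint.h>

/* andis. rA,rS,UIMM computes rS & (UIMM << 16) and sets cr0 from the result. */
uint32_t andis_mask(uint16_t uimm) {
    return (uint32_t)uimm << 16;
}

/* Position of the lowest set bit; the mask must be nonzero. */
static unsigned lowest_set_bit(uint32_t mask) {
    unsigned bit = 0;
    assert(mask != 0);
    while (!(mask & ((uint32_t)1 << bit)))
        bit++;
    return bit;
}

/* In a big-endian word, byte 0 (the lowest address) holds bits 31-24. */
unsigned be_byte_of_bit(uint32_t mask) {
    return 3 - lowest_set_bit(mask) / 8;
}

/* Mask for the same bit within that single byte. */
uint8_t be_mask_within_byte(uint32_t mask) {
    return (uint8_t)(1u << (lowest_set_bit(mask) % 8));
}
```

So the same test could be written as a byte load from WindowData+345 masked with 0x40.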

<+84>:  beq-    0x928d2c38 <+116>  // if not minimized, skip this step
<+88>:  addi    r1,r1,80
<+92>:  lwz     r3,196(r30)      // load WID

How do I know that WindowData+196 is the CoreGraphics WID (CGWindowID; see CGWindow.h)? I used Quartz Debug to look at the window list for a sample app. The app only had one window, and the listed WID matched.
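One way to write down what we've learned so far is a partial struct with explicit padding. This is a hypothetical C sketch: only the two offsets come from the disassembly, the field and type names are invented, and it assumes the 32-bit PowerPC layout we're reading. CGWindowID itself is declared as uint32_t in CGWindow.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CGWindowID;  /* matches the typedef in CGWindow.h */

/* Hypothetical partial view of WindowData. Only the offsets of windowID and
 * flags come from the disassembly; everything else is unknown padding. */
typedef struct {
    uint8_t    unknown0[196];
    CGWindowID windowID;              /* +196: matched Quartz Debug's WID */
    uint8_t    unknown200[344 - 200];
    uint32_t   flags;                 /* +344: tested with andis. r2,r0,64 */
} WindowDataView;

_Static_assert(offsetof(WindowDataView, windowID) == 196, "WID offset");
_Static_assert(offsetof(WindowDataView, flags) == 344, "flags offset");

/* 64 << 16: the bit that appears to mean "minimized" (a guess from context). */
enum { kWindowDataMinimizedMask = 0x00400000 };

int window_view_is_minimized(const WindowDataView *w) {
    return (w->flags & kWindowDataMinimizedMask) != 0;
}
```

The static assertions make the compiler check the guessed offsets, so a typo in the padding shows up at build time rather than as a wrong window ID.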

<+96>:  mr      r4,r29           // load new title
<+100>: lwz     r0,8(r1)
<+104>: lmw     r28,-16(r1)      // tear down stack frame
<+108>: mtlr    r0
<+112>: b       0x92b58ce4 <dyld_stub_CoreDockSetItemTitle>

Note that the stack-frame teardown appears twice, once on each exit path. On this path we're tail calling CoreDockSetItemTitle: we restore the saved registers and the link register, then jump to it with a plain b instead of bl, so when it returns, it returns straight to our caller, as if our caller had called it directly. This is equivalent to the code return CoreDockSetItemTitle(wid, newTitle). From the setup of r3 and r4 we can deduce the parameter types. Can we figure out the return type, though? Not really. The calling code ignores it, so we can ignore it too.

<+116>: addi    r1,r1,80
<+120>: li      r3,0             // return 0
<+124>: lwz     r0,8(r1)
<+128>: lmw     r28,-16(r1)
<+132>: mtlr    r0
<+136>: blr
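Putting the PowerPC listing together, here's my guess at what the original source looked like, written as compilable C. Only the call sequence comes from the disassembly. Everything else is an assumption: WindowData is reduced to a stand-in struct, the callees are stubs so the sketch builds, CFStringRef is faked, and OSStatus is just a guess at the return type (the code returns 0 on one path and passes through CoreDockSetItemTitle's result on the other).

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned char Boolean;     /* Carbon's one-byte Boolean */
typedef int32_t OSStatus;          /* assumption: the real return type is unknown */
typedef uint32_t CGWindowID;       /* as in CGWindow.h */
typedef const char *CFStringRef;   /* stand-in; the real type is a CF object */

/* Stand-in for the real WindowData class. */
typedef struct WindowData {
    CGWindowID windowID;    /* really at +196 */
    Boolean    minimized;   /* really a bit in the word at +344 */
} WindowData;

/* Stubs for the private callees so the sketch compiles and can be exercised. */
static int gRegisterCalls, gDockCalls;
static CFStringRef GetTitleForDock(WindowData *w) { (void)w; return "Untitled"; }
static void RegisterAsDockClientPriv(void) { gRegisterCalls++; }
static void SyncPlatformWindowTitle(WindowData *w, CFStringRef title) { (void)w; (void)title; }
static OSStatus CoreDockSetItemTitle(CGWindowID wid, CFStringRef title) {
    (void)wid; (void)title;
    gDockCalls++;
    return 0;
}

OSStatus UpdateDockTitle(WindowData *window) {
    static Boolean sRegisteredWithDock = 0;   /* the PC-relative byte at -3364(r28) */

    CFStringRef title = GetTitleForDock(window);
    if (!sRegisteredWithDock) {               /* lbz / cmpwi / bne- */
        RegisterAsDockClientPriv();
        sRegisteredWithDock = 1;              /* li r0,1 / stb */
    }
    SyncPlatformWindowTitle(window, title);
    if (window->minimized)                    /* andis. r2,r0,64 / beq- */
        return CoreDockSetItemTitle(window->windowID, title);  /* tail call via b */
    return 0;                                 /* li r3,0 / blr */
}
```

If the flag really is a static variable like this, that explains the PC-relative addressing: it lives in the library's own data, at a fixed distance from the code, rather than on the stack or inside the window object.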

x86

<+0>:   push   %ebp                  // make stack frame
<+1>:   mov    %esp,%ebp
<+3>:   sub    $0x28,%esp
<+6>:   mov    %ebx,-0xc(%ebp)       // save %ebx
<+9>:   call   0x92e4bbe4 <+14>
<+14>:  pop    %ebx                  // IP > %ebx

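We're doing the same trick here to get the PC into a register. That might seem odd, since x86 branches are PC-relative, but 32-bit x86 has no PC-relative addressing mode for data (RIP-relative addressing only arrived with x86-64), and no instruction copies EIP directly into a general-purpose register. So the code calls the very next instruction, which pushes that instruction's address as the return address, and then pops it into ebx.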
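Here's the arithmetic for the x86 version, as a C sketch with made-up helper names. A call with a 32-bit displacement (opcode E8) is 5 bytes long, so the return address it pushes, which pop then loads into ebx, is the call's own address plus 5; that's why the listing jumps from <+9> to <+14>. Adding the cmpb displacement gives the flag's address.

```c
#include <assert.h>
#include <stdint.h>

enum { kCallRel32Length = 5 };  /* E8 opcode byte + 32-bit displacement */

/* The address "call next; pop %ebx" leaves in %ebx. */
uint32_t pc_after_call(uint32_t call_address) {
    return call_address + kCallRel32Length;
}

/* Effective address of disp(%ebx), wrapping at 32 bits like the CPU. */
uint32_t ebx_relative(uint32_t ebx, uint32_t disp) {
    return ebx + disp;
}
```

Working backward from the call target, the function starts at 0x92e4bbe4 - 14 = 0x92e4bbd6, and the flag tested at <+31> sits at 0x92e4bbe4 + 0xd51a36c = 0xA0365F50 in this listing. The PowerPC build has its own, unrelated layout.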

<+15>:  mov    %esi,-0x8(%ebp)       // save %esi
<+18>:  mov    0x8(%ebp),%esi        // WindowData > %esi
<+21>:  mov    %edi,-0x4(%ebp)       // save %edi

This almost looks like it was compiled by a different compiler. In the previous function, edi and esi were pushed, and then the stack pointer was lowered. Here, we reserve the stack space first and then use mov to store three registers (ebx, esi, and edi) into it. I suspect the change has to do with also needing to save ebx, though I don't know why.
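One constraint either strategy has to honor: the Mac OS X IA-32 ABI requires the stack pointer to be 16-byte aligned at every call. Here's that bookkeeping as a small C sketch (the names are mine). The caller's call pushes a 4-byte return address, push %ebp adds 4 more, and sub $0x28 reserves 40, for 48 bytes in total, a multiple of 16.

```c
#include <assert.h>

/* Bytes between the caller's 16-byte-aligned %esp (just before its call)
 * and our %esp after the prologue "push %ebp; mov %esp,%ebp; sub $N,%esp". */
unsigned x86_frame_bytes(unsigned sub_amount) {
    const unsigned return_address = 4, saved_ebp = 4;
    return return_address + saved_ebp + sub_amount;
}

/* True if calls made from this frame see a 16-byte-aligned stack. */
int x86_frame_is_aligned(unsigned sub_amount) {
    return x86_frame_bytes(sub_amount) % 16 == 0;
}
```

In this frame, the three saves at -0x4 through -0xc(%ebp) take 12 of those 40 bytes and the two outgoing arguments at (%esp) and 0x4(%esp) take 8 more, leaving 20 bytes unused in this listing.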

<+24>:  mov    %esi,%eax             // %esi (WindowData) > %eax
<+26>:  call   0x92e4bc40 <_Z15GetTitleForDockP10WindowData>

Whoa. The standard 32-bit ABI passes parameters on the stack, so we'd expect the WindowData pointer to be stored relative to esp before the call. Instead it goes into eax. What's going on here?

The point of an ABI is that it's a documented way for functions to call each other. But if a function, say GetTitleForDock(WindowData*), is short, not public, and used only under controlled circumstances, why bother passing its argument on the stack? In this particular case, GetTitleForDock happens to be a nine-instruction routine. A round trip through memory isn't worth it, so it's reasonable to pass the one parameter in eax.
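If you wanted to ask GCC for this kind of convention explicitly, it has an i386-only function attribute, regparm(n), that passes the first n integer or pointer arguments in eax, edx, and ecx. Here's a hedged C sketch using it; the struct and function are hypothetical. I can't tell from the listing whether Apple's source asked for it or the compiler chose the convention on its own; compilers may pick such a convention themselves for functions whose every caller they can see.

```c
#include <assert.h>

/* regparm exists only on 32-bit x86; elsewhere this macro expands to nothing
 * and the function uses the normal calling convention. */
#if defined(__i386__)
#define PASS_IN_EAX __attribute__((regparm(1)))
#else
#define PASS_IN_EAX
#endif

typedef struct {
    const char *dockTitle;   /* hypothetical field */
} HypotheticalWindow;

/* With regparm(1) on i386, w arrives in %eax; the result comes back in %eax as usual. */
PASS_IN_EAX const char *HypotheticalGetTitle(const HypotheticalWindow *w) {
    return w->dockTitle;
}
```

The call looks the same in C either way; only the generated code changes, which is why this is only safe when every caller is compiled against the same declaration.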

<+31>:  cmpb   $0x0,0xd51a36c(%ebx)  // test initialization boolean
<+38>:  mov    %eax,%edi             // window title > %edi
<+40>:  jne    0x92e4bc0c <+54>      // if initialized, skip
<+42>:  call   0x92df9fe0 <_Z24RegisterAsDockClientPrivv>  // else initialize
<+47>:  movb   $0x1,0xd51a36c(%ebx)  // and set flag as being initialized
<+54>:  mov    %edi,0x4(%esp)        // new title (param 2)
<+58>:  mov    %esi,(%esp)           // WindowData (param 1)
<+61>:  call   0x92e4bc52 <SyncPlatformWindowTitle>
<+66>:  xor    %eax,%eax             // clear %eax (noErr?)
<+68>:  testb  $0x2,0x159(%esi)      // test flag (WindowData + 0x159) (minimized?)
<+75>:  je     0x92e4bc35 <+95>      // if not minimized, skip this step
<+77>:  mov    %edi,0x4(%esp)        // new title (param 2)
<+81>:  mov    0xc4(%esi),%eax       // (WindowData + 0xC4) WID
<+87>:  mov    %eax,(%esp)           // (param 1)
<+90>:  call   0xa0a52ad1 <dyld_stub_CoreDockSetItemTitle>
<+95>:  mov    -0xc(%ebp),%ebx
<+98>:  mov    -0x8(%ebp),%esi
<+101>: mov    -0x4(%ebp),%edi
<+104>: leave
<+105>: ret
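Comparing the two listings is a good way to check our guesses. The WID offset agrees outright (0xc4 is 196). The flag test looks different: PowerPC checks mask 0x00400000 in the word at +344, while x86 checks 0x02 in the byte at 0x159, which is 345. But both name the bit 9 places into the word, counting from the most significant end on big-endian PowerPC and from the least significant end on little-endian x86. That's the pattern you'd expect from a one-bit C bit-field, since GCC allocates bit-fields starting at the high end of the storage unit on big-endian targets and at the low end on little-endian ones. The helpers below are a C sketch of that check; treating the flag as a bit-field is my inference, not something the listings prove.

```c
#include <assert.h>
#include <stdint.h>

/* Bit number counted from the most significant end of a 32-bit mask
 * (where a big-endian GCC target starts laying out bit-fields). */
unsigned bit_from_msb(uint32_t mask) {
    unsigned n = 0;
    assert(mask != 0);
    while (!(mask & (UINT32_C(0x80000000) >> n)))
        n++;
    return n;
}

/* Bit number counted from the least significant end of a little-endian word,
 * given which byte of the word was tested and the mask within that byte
 * (where a little-endian GCC target starts laying out bit-fields). */
unsigned le_bit_from_lsb(unsigned byte_in_word, uint8_t byte_mask) {
    unsigned n = 0;
    assert(byte_mask != 0);
    while (!(byte_mask & (1u << n)))
        n++;
    return byte_in_word * 8 + n;
}
```

For the PowerPC test, bit_from_msb(64 << 16) is 9; for the x86 test, le_bit_from_lsb(0x159 - 344, 0x02) is also 9.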

Conclusion

Poking around in assembly isn't something most of us do every day. But whether you need it for debugging your own code or exploring someone else's, it's a skill that is definitely worth learning. PowerPC and x86 processors have very different histories, but the code generated for either is not as intractable as some suggest.

Where to go from here? Look around some more. Use otool -tV to disassemble a binary's text section and see what it does. Use nm to see which symbols a framework exports (nm -g limits the list to external symbols), then watch how they work.

Go exploring, and have fun.

(Thanks to my editor, Scott Knaster, and to David Shayer, whose introductory session on PowerPC assembly at the legendary MacHack conference started me on this path.)