After reading about shellcode in Chapter 5 of Hacking: The Art of Exploitation, I wanted to go back through some of the examples and try them out. The first example was a simple Hello World program in Intel assembly. I followed along in the book and had no problems reproducing results on a 32 bit Linux VM using nasm with elf file format and ld for linking.
Then I decided I wanted to try something similar but with a little bit of a challenge: write a Mac OS X 64 bit “hello world” program using the new fast ‘syscall’ instruction instead of the software interrupt based (int 0×80) system call, this is where things got interesting.
First and foremost, the version of Nasm that comes with Mac OS X is a really old version. If you want to assemble macho64 code, you’ll need to download the lastest version.
nobody@nobody:~$ nasm -v NASM version 2.09.03 compiled on Oct 27 2010
I figured I could replace the extended registers with the 64 bit registers and the int 0×80 call with a syscall instruction so my first attempt was something like this
After assembling and linking, I got this
nobody@nobody:~$ nasm -f macho64 helloworld.s nobody@nobody:~$ ld helloworld.o ld: could not find entry point "start" (perhaps missing crt1.o) for inferred architecture x86_64
Apparently Mac OS X doesn’t use ‘_start’ for linking, instead it just uses ‘start’. After removing the underscore prefix from start, I was able to link but after running, I got this
nobody@nobody:~$ ./a.out Bus error
I was pretty stumped at this point so I headed off to Google to figure out how I was supposed to use the ‘syscall’ instruction. After a bunch of confusion, I stumbled upon the documentation and realized that x86_64 uses entirely different registers for passing arguments. From the documentation:
The number of the syscall has to be passed in register rax. rdi - used to pass 1st argument to functions rsi - used to pass 2nd argument to functions rdx - used to pass 3rd argument to functions rcx - used to pass 4th argument to functions r8 - used to pass 5th argument to functions r9 - used to pass 6th argument to functions A system-call is done via the syscall instruction. The kernel destroys registers rcx and r11.
So I tweaked the code with this new information
... mov rax, 4 ; System call write = 4 mov rdi, 1 ; Write to standard out = 1 mov rsi, hello_world ; The address of hello_world string mov rdx, 14 ; The size to write syscall ; Invoke the kernel mov rax, 1 ; System call number for exit = 1 mov rdi, 0 ; Exit success = 0 syscall ; Invoke the kernel ...
And with high hopes that I’d see “Hello World!” on the console, I still got the exact same ‘Bus error’ after assembling and linking. Back to Google to see if others had tried a write syscall on Mac OS X. I found a few posts of people having success with the syscall number 0×2000004 so I thought I’d give it a try. Similarly, the exit syscall number was 0×2000001. I tweaked the code and BINGO! I was now able to see “Hello World” output on my console but I was seriously confused at this point; what was this magic number 0×200000 that is being added to the standard syscall numbers?
I looked in syscall.h to see if this was some sort of padding (for security?) I greped all of /usr/include for 0×2000000 with no hints what-so-ever. I looked into the Mach-o file format to see if it was related to that with no luck.
After about an hour and a half of looking, I spotted what I was looking for in ‘syscall_sw.h’
Mac OS X or likely BSD has split up the system call numbers into several different “classes.” The upper order bits of the syscall number represent the class of the system call, in the case of write and exit, it’s SYSCALL_CLASS_UNIX and hence the upper order bits are 2! Thus, every Unix system call will be (0×2000000 + unix syscall #).
Armed with this information, here’s the final x86_64 Mach-o “Hello World”
nobody@nobody:~$ nasm -f macho64 helloworld.s nobody@nobody:~$ ld helloworld.o nobody@nobody:~$ ./a.out Hello World!