Zero cost stack overflow protection for ARM Cortex-M devices

February 17, 2018 by Jorge Aparicio

One of the core features of Rust is memory safety. Whenever possible the compiler enforces memory safety at compile. One example of this is the borrow checker which prevents data races, iterator invalidation, pointer invalidation and other issues at compile time. Other memory problems like buffer overflows can’t be prevented at compile time. In those cases the compiler inserts runtime checks, bounds checks in this case, to enforce memory safety at runtime.

What about stack overflows? For quite a long time Rust didn’t have stack overflow checking but that wasn’t much of a problem on tier 1 platforms since these platforms have an OS and a MMU (Memory Management Unit) that prevents stack overflows from wreaking havoc.

Consider this (silly) program that calls a recursive function that allocates a 1 MB array on the stack.

fn main() {
    println!("{}", fib(10));
}

#[inline(never)]
fn fib(n: u64) -> u64 {
    let _use_stack = [0u8; 1024 * 1024];

    if n < 2 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

If you run this safe program using last year nightly you get a segmentation fault.

$ # last year nightly
$ cargo run +nightly-2017-02-16
[1]    15156 segmentation fault (core dumped)  cargo run +nightly-2017-02-16

But if you run it with a recent nightly you’ll get an abort and a meaningful error message.

$ cargo run +nightly-2018-02-16
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1]    16042 abort (core dumped)  cargo run +nightly-2018-02-16

The difference in behavior is due to stack probe support landing in rustc / LLVM last year. Like bounds checks, stack probes are also a runtime memory safety mechanism but for catching stack overflows. At the time of writing only x86 / x86_64 has stack probe support in rustc / LLVM.

MMU-less devices

But what’s the effect of a stack overflow on bare metal devices that have no OS or a MMU like the ARM Cortex-M?

Let’s find out with this (silly) program:

#![no_std]

extern crate cortex_m;
extern crate stm32f103xx;

use cortex_m::asm;

const PATTERN: u32 = 0xdeadbeef;

// initialize some RAM to a known bit pattern
static mut DATA: [u32; 1024] = [PATTERN; 1024];

fn main() {
    asm::bkpt();

    let _x = fib(100);
}

#[inline(never)]
fn fib(n: u32) -> u32 {
    if unsafe { DATA.last() } != Some(&PATTERN) {
        // `DATA` never changes so this should be unreachable, right?
        asm::bkpt();
    }

    // allocate and zero a 1KB of stack memory
    let _use_stack = [0u8; 1024];

    if n < 2 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

You can probably guess how this will go … If you debug this program and inspect the memory where DATA is located at the first breakpoint, before fib is called, you’ll see something like this:

> # GDB
> continue
overflow::main () at src/main.rs:14
14          asm::bkpt();

> # breakpoint in `main`

> x/1028x 0x20000000 # inspect the DATA variable
0x20000000: 0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef  # start of DATA
(..)
0x20000ff0: 0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef  # end of DATA
0x20001000: 0xc260b0e9  0xda79849d  0x517bb7fa  0xa84886ba  # uninitialized RAM

That matches the expected bit pattern. So far so good.

If you resume the program until it hits the second breakpoint, the one inside the fib function, you’ll see this:

> continue
overflow::fib (n=86) at src/main.rs:22
22              asm::bkpt();

> # breakpoint in `fib`

> x/1028x 0x20000000
0x20000000: 0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef  # start of DATA
(..)
0x20000fb0: 0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef
0x20000fc0: 0x20000ffc  0x08001070  0x20000ffc  0x08001070
0x20000fd0: 0xdeadbeef  0x00000001  0x2000107c  0x08001074
0x20000fe0: 0x2000107c  0x08001074  0x20001048  0x0800036b
0x20000ff0: 0x20000000  0x00000001  0x00000000  0x00000001  # end of DATA
0x20001000: 0x00000000  0x00000001  0x2000107c  0x08001074

The DATA variable has been silently corrupted! Although this program has some unsafe code the memory corruption is not caused by the unsafe code; it is caused by calling the fib function, which is safe to call.

This means that ARM Cortex-M programs which only contain safe code can run into memory corruption issues and that goes against Rust core feature of being memory safe. Let’s fix it!

Fixing it

Stack probes seems like the right way to fix this, but unfortunately stack probe support is only available on x86 and here we are talking about the ARM Cortex-M architecture. There’s another problem as well: the x86 implementation of stack probes assumes there’s some paging (virtual memory) mechanism being used so that implementation can’t be directly translated to bare metal ARM. Finally, stack probes impose a runtime overhead on function calls so it’s not a zero cost solution.

Thankfully, there’s another way to fix this and that’s truly zero cost. Before I explain it let me first show you how stack overflows cause memory corruption.

This is the memory layout of a bare metal Cortex-M program like the one I showed before.

Static variables, like the DATA variable from the previous program, are stored at the bottom (start) of RAM, in the .bss and .data sections, which are fixed in size. The stack is located at the top (end) of RAM and it grows downwards. If the stack grows too large it can crash into the .bss+.data section, overwriting it; this corrupts static variables.

The way to prevent stack overflows from corrupting memory is simple: you place the .bss+.data section at the top of RAM and put the stack below it. Like this:

In this scenario when the stack grows too large it ends up crashing into the boundary of the RAM region and that triggers a hard fault exception. With this layout the static variables remain safe during a stack overflow condition. Nice!

cortex-m-rt-ld

Now all we need to do is change the memory layout of the program. The cortex-m-rt crate decides the memory layout by providing a linker script to the linker. This linker script describes the memory layout of the program in a declarative manner (details here, if you are interested).

The problem is that linker scripts don’t support arranging memory as we want: they only let you specify the start address of sections like .bss+.data but in this case we want to specify the end address of .bss+.data. We can’t specify the start address of .bss+.data to be 0x2000_4000 or some other fixed number because the correct number depends on the size of the .bss+.data section and linker scripts don’t provide support to get the size of an output section – simply because the size is not known at link time; the size of a section will only be known after the linking process.

The workaround for this missing linker script functionality is … to link the program twice – this technique is also used in the C world. Linking is done the first time to figure out the size of the .bss+.data section; after linking you can run arm-none-eabi-size over the output binary and find out the size. In the second linking step we feed the size of the section to the linker script, as a hardcoded number, and use that to select the right start address of the .bss+.data section.

In C this two step linking is done using Makefiles. We can’t replicate that approach in Rust because it requires having the user explicitly write down the linker invocations and in Rust land linking is done transparently by rustc / Cargo.

So what we’ll do instead is to use a linker wrapper. Instead of linking the program using arm-none-eabi-ld we’ll use a linker wrapper called cortex-m-rt-ld. This wrapper is a Rust program that will call the linker twice.

The only thing a user needs to do, apart from installing cortex-m-rt-ld, is to change the linker in Cargo’s configuration file:

$ # this file comes from the cortex-m-quickstart template v0.2.4
$ cat .cargo/config
[target.thumbv7m-none-eabi]
runner = 'arm-none-eabi-gdb'
rustflags = [
  "-C", "link-arg=-Tlink.x",
  "-C", "linker=cortex-m-rt-ld", # <- CHANGED!
  "-Z", "linker-flavor=ld",
  "-Z", "thinlto=no",
]

[build]
target = "thumbv7m-none-eabi"

This will make rustc invoke cortex-m-rt-ld with all the arguments it would normally pass to arm-none-eabi-ld.

In practice

Let’s put this technique in practice by relinking the Cortex M program I showed before. But before we do that let’s look at the linker sections of the binary we debugged.

$ arm-none-eabi-size -Ax target/thumbv7m-none-eabi/debug/overflow
section                  size         addr
.vector_table           0x130    0x8000000
.text                   0xeb2    0x8000130
.rodata                 0x294    0x8000ff0
.stack                 0x5000   0x20000000
.bss                      0x0   0x20000000
.data                  0x1000   0x20000000

This output shows the start addresses and the sizes of the .stack, .bss and .data sections. From the output you can see that they overlap: .stack starts at address 0x2000_5000 and ends at address 0x2000_0000 (remember that it grows downwards); .data starts at address 0x2000_0000 and ends at address 0x2000_1000.

Now let’s relink the program using cortex-m-rt-ld and look at the linker sections again.

$ arm-none-eabi-size -Ax target/thumbv7m-none-eabi/debug/overflow
section                  size         addr
.vector_table           0x130    0x8000000
.text                   0xeb2    0x8000130
.rodata                 0x294    0x8000ff0
.stack                 0x4000   0x20000000
.bss                      0x0   0x20004000
.data                  0x1000   0x20004000

Now the sections don’t overlap! .stack starts at address 0x2000_4000 and ends at address 0x2000_0000; .data starts at address 0x2000_4000 and ends at address 0x2000_5000.

I mentioned that on stack overflow a hard fault exception would be triggered. Turns out we can define how that is handled using the exception! macro so we can choose how the program should behave on a stack overflow condition.

#![no_std]

extern crate cortex_m;
#[macro_use(exception)] // NEW!
extern crate stm32f103xx;

// same program as before

// NEW!
exception!(HARD_FAULT, on_stack_overflow);

#[inline(always)]
fn on_stack_overflow() {
    asm::bkpt();
}

Now let’s run this program.

> # GDB
> continue
overflow::main () at src/main.rs:15
15          asm::bkpt();

> # breakpoint in `main`

> continue
HARD_FAULT () at <exception macros>:14
14      <exception macros>: No such file or directory.

> # breakpoint in `on_stack_overflow`

> x/1028x 0x20003ff0
0x20003ff0: 0x00000000  0x00000000  0x00000014  0xffffffff
0x20004000: 0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef # start of DATA
(..)
0x20004ff0: 0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef # end of DATA

This time we hit the HARD_FAULT exception handler during the stack overflow and the DATA variable remained intact.

What if I have a heap?

When you have a heap and you use the standard memory layout you can run into two different problems: a stack overflow can overwrite the .heap; and memory allocations can make the .heap grow too large and crash into the .stack, overwriting it.

Again, tweaking the memory layout can prevent the problem. If you place the .heap at the top of the RAM, place .bss+.data below it and the .stack below that then you avoid memory corruption in both scenarios.

cortex-m-rt-ld supports this memory layout but it requires you to specify the size of the .heap in a linker script. You can do that by adding a _heap_size symbol to memory.x, if you are providing that file; or by passing a new linker script that provides that symbol to the linker.

The former will look like this:

$ tail -n1 memory.x
_heap_size = 0x400; /* 1 KB */

And the latter will look like this:

$ echo '_heap_size = 0x400;' > heap.x

$ cat .cargo/config
[target.thumbv7m-none-eabi]
runner = 'arm-none-eabi-gdb'
rustflags = [
  "-C", "link-arg=-Tlink.x",
  "-C", "link-arg=-Theap.x", # NEW!
  "-C", "linker=cortex-m-rt-ld",
  "-Z", "linker-flavor=ld",
  "-Z", "thinlto=no",
]

[build]
target = "thumbv7m-none-eabi"

Here are the linker sections of our running example after adding a 1 KB heap and linking it using cortex-m-rt-ld.

$ arm-none-eabi-size -Ax target/thumbv7m-none-eabi/debug/overflow
section                  size         addr
.vector_table           0x130    0x8000000
.text                   0xe8e    0x8000130
.rodata                 0x294    0x8000fc0
.stack                 0x3c00   0x20000000
.bss                      0x0   0x20003c00
.data                  0x1000   0x20003c00
.heap                   0x400   0x20004c00

Note how .bss, .data and .stack have been pushed down (towards a lower address) by the .heap.

Other configurations?

Currently cortex-m-rt-ld doesn’t support memory layouts that involve more than one RAM region but we don’t have great support for that in cortex-m-rt either so there’s no much point in supporting that in cortex-m-rt-ld at the moment.

The approach described here doesn’t help if you are using threads, where each one has its own stack. In that scenario the thread stacks are laid out contiguously in memory and no amount of shuffling around will prevent one from overflowing into the other. There pretty much your only choice is to use a MPU (Memory Protection Unit) – assuming your microcontroller has one – to create stack boundaries on demand. Using the MPU is not zero cost as there’s some setup involved on each context switch.

Conclusion

That’s it. Protect your ARM Cortex-M program from stack overflows and make it truly memory safe by just swapping out the linker!


Thank you patrons! ❤️

I want to wholeheartedly thank:

Iban Eguia, Aaron Turon, Geoff Cant, Harrison Chin, Brandon Edens, whitequark, James Munns, Fredrik Lundström, Kjetil Kjeka, Kor Nielsen, Alexander Payne, Dietrich Ayala, Kenneth Keiter, Hadrien Grasland, vitiral and 48 more people for supporting my work on Patreon.


Let’s discuss on reddit.

Contents

Creative Commons License
Jorge Aparicio