UPDATE: Given the comments I’ve received so far, I should mention more explicitly that the context here is systems that lack an MMU and where a memory allocator may or may not be available or desirable, e.g. Cortex-M microcontrollers.
In this post I’ll describe an approach to building memory safe DMA based APIs.
DMA?
DMA stands for Direct Memory Access and it’s a peripheral used for transferring data between two memory locations in parallel to the operation of the core processor. I like to think of the DMA as providing asynchronous `memcpy` functionality.
Let me show you the awesomeness of the DMA with an example:
Let’s say we want to send the string "Hello, world"
through the serial interface. As you probably
know by now, using the serial interface involves writing to registers. In particular, sending a byte
through the interface requires writing that byte to a register – let’s call that register the DR
register.
The serial interface operates at a slower frequency than the processor, so to avoid a buffer overrun it is necessary to wait until the byte has been shifted out of the DR register before writing a new byte to it. In other words, if you write bytes to the DR register too fast you’ll end up overwriting the previous byte before it has a chance to be sent through the serial interface – that condition is known as a buffer overrun.
The straightforward approach to performing this task is to do several blocking “write a single byte” operations:
```rust
for byte in b"Hello, world!".iter() {
    block!(serial.write(*byte));
}
```
Here `block!` will busy wait until the previous byte gets sent through the serial interface, and `serial.write` will write `*byte` into the DR register.
This gets the job done but it uses precious CPU time: the processor will be completely busy executing the `for` loop.
If we use the DMA the task can be performed with almost 0% CPU usage:
```rust
static MSG: &'static [u8] = b"Hello, world!";

// this block is executed in a few instructions
unsafe {
    // address of the DR register in the USART1 register block
    const USART1_DR: u32 = 0x4001_3804;

    // (some configuration has been omitted)

    // transfer this number of bytes
    dma1_channel4.set_transfer_size(MSG.len()); // in bytes
    // from here
    dma1_channel4.set_src_address(MSG.as_ptr() as usize as u32);
    // to here
    dma1_channel4.set_dst_address(USART1_DR);
    // go!
    dma1_channel4.start_transfer();
}

// now the processor is free to perform other tasks
// while the DMA sends out the "Hello, world!" string
```
This code performs the same task but now the processor is free to do other tasks while the serial operation is performed in the background.
Although not shown above, the processor can check if the DMA transfer has finished by reading some register.
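As a rough sketch — using the same hypothetical `transfer_is_in_progress` method that appears later in this post — that check could look like this:

```rust
// do other work while the DMA shifts the string out ...

// then, before reusing the buffer or the serial interface, poll the
// transfer-complete flag (hypothetical method wrapping a register read)
while dma1_channel4.transfer_is_in_progress() {}
```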
When DMA transfers go wrong
DMA transfers are pretty useful because they can free up a lot of CPU time but they can be very dangerous when misused.
Let’s look at an example where a DMA transfer goes wrong:
```rust
fn start() {
    let mut buf = [0u8; 256];

    // starts a DMA transfer to fill `buf` with data from the serial interface
    unsafe {
        // ..
        dma1_channel5.set_transfer_size(buf.len());
        dma1_channel5.set_src_address(USART1_DR);
        dma1_channel5.set_dst_address(buf.as_mut_ptr() as usize as u32);
        dma1_channel5.start_transfer();
    }

    // `buf` deallocated here
}
```
```rust
fn corrupted() {
    let mut x = 0;
    let y = 0;

    // do stuff with `x` and `y`
}

start();
corrupted();
```
Here the problem is that a transfer is started on a stack allocated buffer but then the buffer is immediately deallocated. The call to `corrupted` reuses the stack memory that the DMA is operating on for the stack variables `x` and `y`; this lets the DMA overwrite the values of `x` and `y`, wreaking havoc. If you add optimization into the mix it becomes impossible to predict what will happen at runtime.
In this case it’s a bit obvious that there’s a programmer error, as `buf` is never used. The problem becomes less obvious if you return `buf` from the `start` function; in that case you can still get undefined behavior depending on how the compiler decides to optimize the code.
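As a hypothetical illustration of that subtler case (same raw register API as above; whether anything visibly goes wrong depends entirely on how the optimizer lays things out):

```rust
fn start() -> [u8; 256] {
    let mut buf = [0u8; 256];

    unsafe {
        // .. same DMA configuration as before ..
        dma1_channel5.set_transfer_size(buf.len());
        dma1_channel5.set_src_address(USART1_DR);
        dma1_channel5.set_dst_address(buf.as_mut_ptr() as usize as u32);
        dma1_channel5.start_transfer();
    }

    // `buf` is moved out of this stack frame, but the DMA keeps writing
    // to the address that was handed to it above
    buf
}
```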
Trying to make it safe
Using the DMA like that is `unsafe` because a lot of things can go wrong. In this section we’ll try to wrap all that `unsafe` code into a safe abstraction.
We start with a newtype over the buffer on which the DMA is operating:
```rust
/// Ongoing DMA transfer
struct Transfer<'a> {
    buf: &'a mut [u8],
    ongoing: bool,
}
```
We can use this to freeze the original buffer while the DMA operation is in progress. That prevents the buffer from being modified (that would be mutable aliasing – the DMA is already mutating the buffer) and from being deallocated (that would let the DMA corrupt memory if the allocation is reused).
Dropping the `Transfer` instance would let us modify, and also destroy, the original buffer, so that operation should stop the transfer to prevent mutable aliasing and memory unsafety:
```rust
impl<'a> Drop for Transfer<'a> {
    fn drop(&mut self) {
        // NOTE For now I'm not going to explain where this
        // `dma1_channel5` value comes from. I'll come back to it later

        // on drop we stop the ongoing transfer
        if self.ongoing {
            dma1_channel5.stop_transfer();
        }
    }
}
```
We want to be able to get the buffer back when the transfer is over, so we add a `wait` method that waits until the transfer is over and returns the buffer:
```rust
impl<'a> Transfer<'a> {
    pub fn wait(mut self) -> &'a mut [u8] {
        // wait until the transfer is over
        while dma1_channel5.transfer_is_in_progress() {}

        // defuse the `drop` method
        self.ongoing = false;

        self.buf
    }
}
```
Now we can pair this `Transfer` API with a `Serial` interface abstraction to provide a safe API for the asynchronous read operation we had before:
```rust
impl Serial {
    /// Starts a DMA transfer to fill `buf` with data from the serial interface
    fn read_exact<'a>(&mut self, buf: &'a mut [u8]) -> Transfer<'a> {
        unsafe {
            dma1_channel5.set_src_address(USART1_DR);
            dma1_channel5.set_dst_address(buf.as_mut_ptr() as usize as u32);
            dma1_channel5.set_transfer_size(buf.len());
            dma1_channel5.start_transfer();
        }

        Transfer { buf, ongoing: true }
    }
}
```
Usage looks like this:
```rust
let mut buf = [0; 16];

let transfer = serial.read_exact(&mut buf);

// do other stuff

let buf = transfer.wait();

// do stuff with the now filled `buf`fer
```
Now let’s see if the API can prevent us from shooting ourselves in the foot:
```rust
fn start(serial: &mut Serial) -> Transfer {
    let mut buf = [0; 16];

    serial.read_exact(&mut buf)
    //~^ error: borrowed value does not live long enough
} // `buf` dropped / deallocated here
```
Good. This won’t compile because `buf` is both allocated and deallocated in `start`, thus the `Transfer` can’t outlive the scope of `start`.
Let’s try the stack corruption example from before:
```rust
fn start(serial: &mut Serial) {
    let mut buf = [0; 16];

    // (the `Transfer` value will get `drop`ped here even if I don't call `drop`)
    drop(serial.read_exact(&mut buf));
}

fn corrupted() {
    let mut x = 0;
    let y = 0;

    // do stuff with `x` and `y`
}

start(&mut serial);
corrupted();
```
There won’t be stack corruption this time because when `Transfer` is dropped in `start` the DMA transfer is stopped. Great!
Leakpocalypse
Seems like a pretty solid abstraction, right? Unfortunately, it’s not completely safe because it relies on destructors for safety and destructors are not guaranteed to run in Rust.
Here’s how to break the abstraction:
```rust
fn start(serial: &mut Serial) {
    let mut buf = [0; 16];

    // not `unsafe`!
    mem::forget(serial.read_exact(&mut buf));
}

fn corrupted() {
    let mut x = 0;
    let y = 0;

    // do stuff with `x` and `y`
}

start(&mut serial);
corrupted();
```
This produces stack corruption in safe Rust. `mem::forget`-ing `Transfer` prevents its destructor from running, which means the DMA transfer is never stopped. Furthermore, this also breaks Rust’s aliasing rules because it lets the processor mutate `buf`, which is already being mutated by the DMA.
“But nobody writes code like that!”. Not on purpose, no; but we are talking about Rust here: memory unsafety is banned in safe Rust and that property must hold regardless of how contorted the code is.
`&'static mut` to the rescue
The good news is that we can fix all the issues by simply tweaking the lifetime of `Transfer`:
```rust
/// Ongoing DMA transfer
struct Transfer {
    buf: &'static mut [u8], // <- lifetime changed
    // ongoing: bool, // no longer required
}

// impl Drop for Transfer { .. } // no longer required

impl Transfer {
    pub fn wait(self) -> &'static mut [u8] {
        // wait until the transfer is over
        while dma1_channel5.transfer_is_in_progress() {}

        // self.ongoing = false; // no longer required

        self.buf
    }
}

impl Serial {
    /// Starts a DMA transfer to fill `buf` with data from the serial interface
    fn read_exact(&mut self, buf: &'static mut [u8]) -> Transfer {
        // same implementation as before
    }
}
```
Now you may be wondering “But where can I get a `&'static mut` reference from? Stack allocated arrays don’t have `'static` lifetime”. I’ve got you covered: my last blog post explains how to safely create `&'static mut` references both inside and outside of RTFM. Let’s use the `singleton!` approach to test out this API:
```rust
let buf: &'static mut [u8] = singleton!(_: [u8; 16] = [0; 16]).unwrap();

let transfer = serial.read_exact(buf);

// do stuff

let buf: &'static mut [u8] = transfer.wait();

// do stuff with `buf`
```
Seems to work. What about the issues that plagued the previous API?
```rust
fn start(serial: &mut Serial) {
    let buf: &'static mut [u8] = singleton!(_: [u8; 16] = [0; 16]).unwrap();

    mem::forget(serial.read_exact(buf));
}

fn corrupted() {
    let mut x = 0;
    let y = 0;

    // do stuff with `x` and `y`
}

start(&mut serial);
corrupted();
```
`buf` is statically allocated in the `.bss` region, not on the stack, so, in the first place, it’s impossible to deallocate `buf`’s memory. Secondly, `Transfer` has no destructor this time so it doesn’t matter whether `mem::forget` is used on the value or not. In either case the DMA transfer will continue, but since it’s operating on statically allocated memory and not on the stack there won’t be a stack corruption problem in this case. Nice!
What about mutable aliasing? `&'static mut T` has move semantics so calling `serial.read_exact` hands over ownership of `buf` to the `Transfer` value. Even if the `Transfer` value is `mem::forget`-ten, the buffer memory can’t be accessed through `buf` anymore:
```rust
let buf: &'static mut [u8] = singleton!(_: [u8; 16] = [0; 16]).unwrap();

mem::forget(serial.read_exact(buf));

buf[0] = 1;
//~^ error: cannot assign to `buf[..]` because it is borrowed
```
There’s one more consequence of using `&'static mut` references in the DMA based API: now `Transfer` owns the buffer and has `'static` lifetime (more precisely: it satisfies the `Transfer: 'static` bound). This means that `Transfer` values can be stored in RTFM resources (`static` variables), which can be used to move data from one task to another.
So, we can start a DMA transfer in task A, send the `Transfer` value to task B and complete (wait for) the transfer there. The send operation is also cheap because the `Transfer` value is only 2 words in size (and it could be just 1 word in size if `&'static mut [T; N]` was used internally).
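As a rough sketch of that hand-off — with the RTFM plumbing elided and the shared resource spelled out as a plain `Option` argument (the names here are illustrative, not real RTFM syntax):

```rust
// `transfer_slot` stands in for an RTFM resource shared by tasks A and B

fn task_a(serial: &mut Serial, buf: &'static mut [u8], transfer_slot: &mut Option<Transfer>) {
    // start the transfer in task A and stash the `Transfer` value in the resource
    *transfer_slot = Some(serial.read_exact(buf));
}

fn task_b(transfer_slot: &mut Option<Transfer>) {
    if let Some(transfer) = transfer_slot.take() {
        // complete the transfer in task B and recover the buffer
        let buf = transfer.wait();

        // do stuff with the filled `buf`
    }
}
```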
An alternative API
While working on this blog post @nagisa pointed out to me another way to make a memory safe DMA based API:
```rust
impl Serial {
    fn read_exact<R, F>(&mut self, buf: &mut [u8], f: F) -> R
    where
        F: FnOnce() -> R,
    {
        // start transfer
        unsafe {
            // ..
            dma1_channel5.set_src_address(USART1_DR);
            dma1_channel5.set_dst_address(buf.as_mut_ptr() as usize as u32);
            dma1_channel5.set_transfer_size(buf.len());
            dma1_channel5.start_transfer();
        }

        // run closure
        let r = f();

        // wait until the transfer is over
        while dma1_channel5.transfer_is_in_progress() {}

        r
    }
}
```
This closure-based API encodes the “start transfer, do stuff and wait for the transfer to finish” pattern that we have seen before into a single method call. This method is safe even when used with stack allocated buffers as there’s no way to deallocate the buffer while the transfer is in progress (*).
```rust
let mut buf = [0; 16];

serial.read_exact(&mut buf, || {
    // do stuff
});

// do stuff with `buf`
```
The disadvantage of this API is that you can’t send an ongoing DMA transfer to another task (execution context) because the transfer will always be completed during the execution of `read_exact`.
(*) A digression
This alternative API made me stop and think about exception safety. For example, what happens if `f` panics and the panicking behavior is to unwind¹? That would deallocate the array `buf` but wouldn’t stop the DMA transfer, and that might cause problems. That’s not hard to fix though: you create a drop guard that stops the DMA transfer in its destructor before calling `f`, and then you `mem::forget` it after `f` returns. The fix will cost a bit of extra binary size but the increase should be negligible. Finally, I don’t think the `&'static mut`-based API has to concern itself with exception safety because `singleton!` and RTFM allocate the memory in `.bss` / `.data` and that memory will never be deallocated.
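Here’s a minimal sketch of that drop guard fix (the `Guard` type and its placement are my own illustration of the idea, not code from an actual API):

```rust
impl Serial {
    fn read_exact<R, F>(&mut self, buf: &mut [u8], f: F) -> R
    where
        F: FnOnce() -> R,
    {
        // stops the DMA transfer when dropped
        struct Guard;

        impl Drop for Guard {
            fn drop(&mut self) {
                dma1_channel5.stop_transfer();
            }
        }

        unsafe {
            // .. start the transfer, as before ..
        }

        let guard = Guard;

        // if `f` panics and unwinds, `guard` is dropped and the transfer is
        // stopped before `buf` gets deallocated
        let r = f();

        // normal path: defuse the guard, then wait for the transfer to finish
        mem::forget(guard);
        while dma1_channel5.transfer_is_in_progress() {}

        r
    }
}
```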
Improving the guarantees
Up to this point the `&'static mut`-based API is memory safe but it’s not foolproof. For instance, nothing stops you from starting another DMA transfer on the same serial interface, but that’s not allowed by the hardware. Let’s see how we can improve the API to prevent that.
First, let’s demystify this `dma1_channel5` value. This value actually has type `dma1::Channel5` and, semantically, has ownership over one of the DMA channels (some vendors call them streams, not channels). The DMA subsystem usually can handle several concurrent, independent data transfers; a channel is the part of the subsystem that handles one of those concurrent data transfers. The number of DMA channels is device specific: for example, the STM32F103 has two DMA peripherals, DMA1 and DMA2; DMA1 has seven channels and DMA2 has five.
We can start there and provide an API to split DMA peripherals into independent channels:
```rust
let p = stm32f103xx::Peripherals::take().unwrap();

// consumes `p.DMA1`
let channels: dma1::Channels = p.DMA1.split();

let c4: dma1::Channel4 = channels.4;
```
This is pretty similar to what we did with the GPIO peripheral, which controls the configuration of I/O pins, in the Brave new I/O blog post.
Next, usage constraints:
Some channels can be used with some peripherals but not with others. Also, a single channel can’t be used with more than one peripheral at the same time, and a single channel can’t handle more than one memory transfer at the same time. We can encode all these properties in the API by having `Transfer` take ownership of the channel:
```rust
/// Ongoing DMA transfer
struct Transfer<CHANNEL> {
    buf: &'static mut [u8],
    chan: CHANNEL, // NEW!
}

impl Transfer<dma1::Channel4> {
    /// Waits until the DMA transfer is done
    pub fn wait(self) -> (&'static mut [u8], dma1::Channel4) {
        // wait until the transfer is over
        while self.chan.ifcr().tcif4().bit_is_clear() {}

        (self.buf, self.chan)
    }
}

impl Serial {
    /// Starts a DMA transfer to fill `buf` with data from the serial interface
    pub fn read_exact(
        &mut self,
        chan: dma1::Channel4, // NEW!
        buf: &'static mut [u8],
    ) -> Transfer<dma1::Channel4> {
        // ..

        // `chan` grants access to the registers of DMA1_CHANNEL4

        // set destination address
        chan.cmar().write(|w| w.ma().bits(buf.as_ptr() as usize as u32));
        //   ~~~~ CMAR4 register

        // set transfer size
        chan.cndtr().write(|w| w.ndt().bits(buf.len()));
        //   ~~~~~ CNDTR4 register

        // ..
    }
}
```
Example of hardware constraints being enforced at compile time:
```rust
let a = singleton!(_: [u8; 16] = [0; 16]).unwrap();
let b = singleton!(_: [u8; 16] = [0; 16]).unwrap();

// wrong channel
// serial.read_exact(channels.1, a);
//~^ error: expected `dma1::Channel4`, found `dma1::Channel1`

// OK
let t = serial.read_exact(channels.4, a);

// can't start a new DMA transfer on the same peripheral
// let t = serial.read_exact(channels.4, b);
//~^ error: use of moved value `channels.4`

// can't start a DMA transfer on another peripheral that also uses dma1::Channel4
// let t = i2c2.write_all(channels.4, ADDRESS, b);
//~^ error: use of moved value `channels.4`
```
This would have also worked if `Transfer` stored a mutable (`&mut`) reference to `dma1::Channel4` instead of storing it by value, but with that approach `Transfer` would have lost its `: 'static` bound and you would no longer be able to store `Transfer` in an RTFM resource.
There’s one more change to make here. `Transfer` doesn’t freeze the `Serial` instance; this means that after calling `serial.write_all(c5, "Hello, world!")` you are still able to call `serial.write(b'X')` to write a byte to the interface. That’s not a good / useful thing to do because the processor will race against the DMA transfer. Let’s forbid that by having `Transfer` take ownership of the serial interface as well:
```rust
/// Ongoing DMA transfer
struct Transfer<CHANNEL, P> {
    buf: &'static mut [u8],
    chan: CHANNEL,
    payload: P, // NEW!
}

impl<P> Transfer<dma1::Channel4, P> {
    /// Waits until the DMA transfer is done
    pub fn wait(self) -> (&'static mut [u8], dma1::Channel4, P) {
        // wait until the transfer is over
        while self.chan.ifcr().tcif4().bit_is_clear() {}

        (self.buf, self.chan, self.payload)
    }
}

impl Serial {
    /// Starts a DMA transfer that fills the `buf`fer with serial data
    pub fn read_exact(
        self, // <- main change (was `&mut self`)
        chan: dma1::Channel4,
        buf: &'static mut [u8],
    ) -> Transfer<dma1::Channel4, Serial> {
        // ..

        Transfer { buf, chan, payload: self }
    }
}
```
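With that change, trying to use the serial interface while a transfer is in flight is rejected by the compiler. Roughly (a sketch):

```rust
let buf = singleton!(_: [u8; 16] = [0; 16]).unwrap();

let t = serial.read_exact(channels.4, buf); // `serial` is moved into the transfer

// serial.write(b'X');
//~^ error: use of moved value: `serial`

// the interface, the channel and the buffer all come back once the transfer is done
let (buf, c4, serial) = t.wait();
```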
Preventing misoptimization
To us, programmers, using the DMA based API looks like:
```rust
let buf = singleton!(_: [u8; 45] = [0; 45]).unwrap();

buf.copy_from_slice(b"The quick brown fox jumps over the lazy dog.\n");

let transfer = serial.write_all(channels.5, buf);

// ..

let (buf, c5, serial) = transfer.wait();
```
To the compiler that code looks like this, after inlining some function calls:
```rust
let buf = singleton!(_: [u8; 45] = [0; 45]).unwrap();

buf.copy_from_slice(b"The quick brown fox jumps over the lazy dog.\n");

// ..

// set destination address
channels.5.cmar().write(|w| w.ma().bits(buf.as_ptr() as u32));
// set transfer size
channels.5.cndtr().write(|w| w.ndt().bits(buf.len()));

// ..

// start transfer
channels.5.ccr().modify(|w| w.cen().set_bit());

let transfer = Transfer { buf, chan: channels.5, payload: serial };

// ..

// wait until the transfer is over
while transfer.chan.ifcr().tcif4().bit_is_clear() {}

let (buf, c5, serial) = (transfer.buf, transfer.chan, transfer.payload);
```
Now, the operations on registers (e.g. the `write`s) are volatile so we are sure the compiler won’t reorder those with respect to other volatile operations. But the compiler is free to move non-volatile operations like `buf.copy_from_slice` to, say, after `// start transfer`, as that reordering doesn’t change the outcome of the preceding `buf.as_ptr()` and `buf.len()` operations. Of course, such reordering would change the semantics of the program (it creates a data race between the DMA and the processor) because `buf` will be read by the DMA after `// start transfer`, but the compiler doesn’t know that.
To prevent those problematic reorderings we can add `compiler_fence`s to both `Serial.write_all` and `Transfer.wait` such that the inlined code looks like this:
```rust
let buf = singleton!(_: [u8; 45] = [0; 45]).unwrap();

buf.copy_from_slice(b"The quick brown fox jumps over the lazy dog.\n");

// ..

// set destination address
channels.5.cmar().write(|w| w.ma().bits(buf.as_ptr() as u32));
// set transfer size
channels.5.cndtr().write(|w| w.ndt().bits(buf.len()));

// ..

atomic::compiler_fence(Ordering::SeqCst); // <- NEW!

// start transfer
channels.5.ccr().modify(|w| w.cen().set_bit());

let transfer = Transfer { buf, chan: channels.5, payload: serial };

// ..

// wait until the transfer is over
while transfer.chan.ifcr().tcif4().bit_is_clear() {}

atomic::compiler_fence(Ordering::SeqCst); // <- NEW!

let (buf, c5, serial) = (transfer.buf, transfer.chan, transfer.payload);
```
`compiler_fence(Ordering::SeqCst)` prevents the compiler² from reordering any memory operation across it. With this change `buf.copy_from_slice` can’t be moved to after `// start transfer`.
`compiler_fence` is a bit of a hammer³ in this case because it prevents reordering of any memory operation across it, which could hinder some optimizations, but here we only want to prevent memory operations on `buf` from being reordered across the fence. I don’t know if it’s possible to give a more precise hint to the compiler, though. If you know the answer, let me know!
Making it generic
DMA based APIs would be a great addition to the `embedded-hal` but they need to be free of device specific details like the channel types and the `Transfer` type. We can rework `Serial.read_exact` and `Transfer` into device agnostic traits like these:
```rust
/// Ongoing DMA transfer
pub trait Transfer {
    type Payload;

    fn is_done(&self) -> bool;

    fn wait(self) -> Self::Payload;
}

/// Read bytes from a serial interface
pub trait ReadExact {
    type T: Transfer<Payload = (Self, &'static mut [u8])>;

    fn read_exact(self, buf: &'static mut [u8]) -> Self::T;
}
```
An implementation of those traits could look like this:
```rust
pub struct DmaSerialTransfer {
    // `Transfer` is the implementation from before
    transfer: Transfer<dma1::Channel4, Serial>,
}

impl hal::Transfer for DmaSerialTransfer {
    type Payload = (DmaSerial, &'static mut [u8]);

    fn is_done(&self) -> bool {
        self.transfer.is_done()
    }

    fn wait(self) -> (DmaSerial, &'static mut [u8]) {
        let (buf, chan, serial) = self.transfer.wait();

        (DmaSerial { serial, chan }, buf)
    }
}

/// DMA enabled serial interface
pub struct DmaSerial { serial: Serial, chan: dma1::Channel4 }

impl hal::ReadExact for DmaSerial {
    type T = DmaSerialTransfer;

    fn read_exact(self, buf: &'static mut [u8]) -> DmaSerialTransfer {
        // `_read_exact` is the implementation from before
        let transfer = self.serial._read_exact(self.chan, buf);

        DmaSerialTransfer { transfer }
    }
}

impl Serial {
    /// Enable DMA functionality
    pub fn with_dma(self, chan: dma1::Channel4) -> DmaSerial {
        DmaSerial { serial: self, chan }
    }
}
```
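Putting it together, usage of the device agnostic API could look something like this (a sketch based on the traits and types defined above):

```rust
// turn the plain serial interface into a DMA enabled one
let serial = serial.with_dma(channels.4);

let buf = singleton!(_: [u8; 16] = [0; 16]).unwrap();

// start the transfer through the generic `ReadExact` trait
let transfer = serial.read_exact(buf);

// do other stuff

// `wait` returns the payload: the DMA enabled interface plus the filled buffer
let (serial, buf) = transfer.wait();
```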
Futures?
Some of you have probably noticed that the `Transfer` trait is similar to the `Future` trait. Why not use the `Future` trait instead? Well, I’m not a fan of the panicky `poll` interface so I’d rather not force the caller to use it, since you can easily write an adapter to turn a `Transfer` implementer into a `Future`. See below:
```rust
struct FutureTransfer<T>
where
    T: Transfer,
{
    transfer: Option<T>,
}

// omitted: constructor

impl<T> Future for FutureTransfer<T>
where
    T: Transfer,
{
    type Item = T::Payload;
    // (at this point you probably have noticed that, for simplicity, I've
    // omitted error handling in the `Transfer` API)
    type Error = !;

    fn poll(&mut self) -> Poll<T::Payload, !> {
        if self.transfer
            .as_ref()
            .expect("FutureTransfer polled beyond completion") // may `panic!`
            .is_done()
        {
            let payload = self.transfer.take().unwrap().wait();

            Ok(Async::Ready(payload))
        } else {
            Ok(Async::NotReady)
        }
    }
}
```
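The omitted constructor is the obvious one — for completeness, a sketch:

```rust
impl<T> FutureTransfer<T>
where
    T: Transfer,
{
    pub fn new(transfer: T) -> Self {
        FutureTransfer { transfer: Some(transfer) }
    }
}
```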
That’s my take on memory safe DMA based APIs. If you have come up with a different solution let me know!
I have proposed exploring this approach to DMA based APIs in the `embedded-hal` repo. If you implement or run into problems trying to implement these APIs leave a comment over there! You can use my implementation of these APIs in the `stm32f103xx-hal` crate as a reference. Unfortunately, the APIs in that crate are pretty much undocumented, but at least there are some (also undocumented) examples.
I’ve also sketched an API for circular DMA transfers, which I have not included in this blog post, but I’m going to revisit the API to accommodate a use case raised by a user. I might do a small blog post about that once that API is more fleshed out.
Until next time.
Thank you patrons! ❤️
I want to wholeheartedly thank:
Iban Eguia, Aaron Turon, Geoff Cant, Harrison Chin, Brandon Edens, whitequark, James Munns, Fredrik Lundström, Kjetil Kjeka, Kor Nielsen, Alexander Payne, Dietrich Ayala, Kenneth Keiter, Hadrien Grasland, vitiral and 45 more people for supporting my work on Patreon.
Let’s discuss on reddit.
1. Bare metal applications don’t usually implement unwinding due to the cost / complexity, but it’s not impossible to find an application that does. ↩︎

2. Some of you may be wondering if something stronger, like a memory synchronization instruction, is required here. This implementation is for a single core Cortex-M microcontroller. That architecture doesn’t reorder memory transactions so a compiler barrier is enough; a compiler barrier might not be enough in multi-core Cortex-M systems, though. ↩︎

3. I’ve seen worse, though. I’ve seen C programs mark whole statically allocated buffers that will be used with the DMA as `volatile`. That de-optimizes all operations on the buffer; that approach can even prevent the compiler from optimizing `for` loops over the buffer into `memcpy` / `memset`. ↩︎