We have used the RTFM framework in the previous posts, but so far we have put most of the
application logic in tasks and always sent the processor to sleep in the idle
function. In this post we’ll put the idle function to better use and build a
CPU usage monitor there.
Idle
The main logic of the CPU usage monitor will be in the idle
function. Let’s
see how it works:
// RESOURCES
peripherals!(stm32f103xx, {
    DWT: Peripheral {
        register_block: Dwt,
        ceiling: C0,
    },
    (..)
});

// Total sleep time (in clock cycles)
static SLEEP_TIME: Resource<Cell<u32>, C1> = Resource::new(Cell::new(0));
// IDLE LOOP
fn idle(ref prio: P0, _thr: T0) -> ! {
    loop {
        // For the span of this critical section the processor will not service
        // interrupts (tasks)
        rtfm::atomic(|thr| {
            let dwt = DWT.access(prio, thr);
            let sleep_time = SLEEP_TIME.access(prio, thr);

            // Sleep
            let before = dwt.cyccnt.read();
            rtfm::wfi();
            let after = dwt.cyccnt.read();
            let elapsed = after.wrapping_sub(before);

            // Accumulate sleep time
            sleep_time.set(sleep_time.get() + elapsed);
        });

        // Tasks are serviced at this point
    }
}
We will still put the processor to sleep in the idle function – you can still
see the WFI (Wait For Interrupt) instruction. However, this time the instruction
will be executed within a global critical section (rtfm::atomic). So what
will happen when an interrupt event arrives? The processor will wake up from
the WFI instruction but it will not service the interrupt because of the
critical section; instead it will continue executing the idle function.
Before and after the WFI instruction the CYCCNT register is read; the
difference between these two snapshots, elapsed, is the time, in clock cycles,
that the processor spent sleeping, waiting for an interrupt. This sleep time
is then accumulated in the SLEEP_TIME resource; this resource tracks the
total time spent sleeping.
Once the resource has been updated the critical section ends and idle gets
immediately preempted by the pending interrupts. The processor then starts
executing the tasks that need to be serviced. Once the processor has finished
executing all the pending tasks it returns to idle.
The loop
restarts: a new critical section starts and the processor goes back
to sleep. The whole cycle repeats.
So the logic in the idle function is actually not measuring the CPU use;
it’s measuring the opposite: the total time the CPU is not being used. To turn
that number into CPU usage we have to subtract the time spent sleeping from the
total elapsed time; that gives us the time the CPU was active. To get a
percentage we have to divide the active time by the total elapsed time and
multiply by 100. The formula would be:
CPU_USE = (TOTAL - SLEEP) / TOTAL * 100.
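In Rust that conversion could look something like the sketch below. This is just an illustration of the formula; the helper name cpu_use and the floating point types are mine, and in this post the conversion is actually done offline rather than on the microcontroller, which only reports the raw sleep time:

fn cpu_use(total: u32, sleep: u32) -> f32 {
    // Time the CPU was active, in clock cycles
    let active = total - sleep;

    // Active time as a percentage of the total elapsed time
    active as f32 / total as f32 * 100.
}

Here total and sleep are both measured in clock cycles over the same observation period.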
Now let’s build an example.
Blinky
We’ll add the CPU usage monitor to the blinky example we used in the post that introduced the RTFM framework.
Starting from that example we’ll replace the idle
function with the version
shown in the previous section and then tweak the blinky
task like this:
fn blinky(ref mut task: Tim1UpTim10, ref prio: P1, ref thr: T1) {
    static STATE: Local<bool, Tim1UpTim10> = Local::new(false);

    let tim1 = &TIM1.access(prio, thr);
    let itm = ITM.access(prio, thr);
    let sleep_time = SLEEP_TIME.access(prio, thr);
    let state = STATE.borrow_mut(task);

    let timer = Timer(tim1);

    if timer.clear_update_flag().is_ok() {
        *state = !*state;

        if *state {
            Green.on();
        } else {
            Green.off();
        }

        // NEW!
        // Report clock cycles spent sleeping
        iprintln!(&itm.stim[0], "{}", sleep_time.get());

        // Reset sleep time back to zero
        sleep_time.set(0);
    } else {
        // Only reachable via `rtfm::request(blinky)`
        unreachable!()
    }
}
(You can find the full source code of this program in the appendix)
The new part here is that, after we toggle the state of the LED, we print the
sleep time to the console using the ITM, and then we reset the SLEEP_TIME
counter back to zero. As the blinky
task is periodic this will print the
number of clock cycles the processor spent sleeping in a period of one second.
Let’s see the numbers reported by this program when compiled in debug mode (without optimizations):
$ cat /dev/ttyUSB0
(..)
7993609
7993608
7993609
7993608
The sleep time is reported every second. As the processor is operating at 8 MHz, one second is equivalent to 8 million processor clock cycles. Subtracting the values above from that number yields the active CPU time per second:
6391
6392
6391
6392
Those values can be converted to a CPU usage percentage using the formula we
presented before (TOTAL is 8_000_000):
0.0798875%
0.0799%
0.0798875%
0.0799%
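(As a quick check, the first of these values comes from the formula: (8_000_000 - 7_993_609) / 8_000_000 * 100 = 6_391 / 8_000_000 * 100 = 0.0798875%.)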
Averaging those values gives 0.0799%.
Note that this CPU usage includes both blinking the LED and reporting the sleep time.
We can repeat the measurement but with the program compiled in release mode (with optimizations):
$ cat /dev/ttyUSB0
(..)
7999488
7999488
7999488
7999488
Now the average CPU usage is down to 0.0064%.
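That figure follows from the same formula: (8_000_000 - 7_999_488) / 8_000_000 * 100 = 512 / 8_000_000 * 100 = 0.0064%.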
For extra enlightenment we can do a threshold vs time diagram, like we did in the fearless concurrency post, for this last program:
The difference is that the preemption threshold of the idle loop is now maxed
out, but once an update event arrives the threshold of idle is quickly dropped
to allow preemption and service the blinky task. Scheduling-wise nothing has
changed.
Loopback
Let’s do one more example. Let’s add a CPU usage monitor to the concurrency
example we did in the fearless concurrency post. Starting from this post’s blinky
example we only have to add the loopback task and some initialization code:
tasks!(stm32f103xx, {
    // ..
    loopback: Task {
        interrupt: Usart1,
        priority: P1,
        enabled: true,
    },
});

fn loopback(_task: Usart1, ref prio: P1, ref thr: T1) {
    let usart1 = USART1.access(prio, thr);

    let serial = Serial(&usart1);

    if let Ok(byte) = serial.read() {
        if serial.write(byte).is_err() {
            // NOTE(unreachable!) unlikely to overrun the TX buffer because we
            // are sending _one_ byte per byte received
            unreachable!()
        }
    } else {
        // NOTE(unreachable!) only reachable through `rtfm::request(loopback)`
        unreachable!()
    }
}
This loopback
task will send back the data that comes through the serial
interface byte by byte.
In the original example we tested this program by manually sending data through a serial terminal. The data throughput was rather low as the input came from a keyboard. This time we’ll maximize the data throughput using the following command:
$ # Send the pangram over the serial interface 1000 times and as fast as possible
$ for i in `seq 1 1000`; do
echo "The quick brown fox jumps over the lazy dog." > /dev/rfcomm0
done
Let’s see if the microcontroller can keep up and what the CPU usage is under these conditions:
In this take the program running on the microcontroller was compiled in release mode. The top right terminal shows the data that the microcontroller echoes back. The bottom right terminal shows the sleep time periodically reported by the microcontroller. Here’s a transcript of the sleep times:
$ cat /dev/ttyUSB0
(..)
7999488
7660188
7636788
7636788
7639843
7639612
7648033
7646047
7636788
7926363
7999488
The average CPU usage under these conditions is 4.4624% (the first and last two samples were not taken into account in the computation).
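If you want to double check that figure, here’s a rough sketch of the arithmetic as a standalone, host side Rust program. The sample values are copied from the transcript above; the program itself is not part of the firmware:

fn main() {
    // Sleep samples from the transcript, excluding the first and the last two
    let samples = [
        7_660_188u32, 7_636_788, 7_636_788, 7_639_843,
        7_639_612, 7_648_033, 7_646_047, 7_636_788,
    ];

    // One second at 8 MHz, in clock cycles
    const TOTAL: f64 = 8_000_000.;

    // Average sleep time per one second period
    let sleep = samples.iter().map(|&s| f64::from(s)).sum::<f64>() / samples.len() as f64;

    // CPU_USE = (TOTAL - SLEEP) / TOTAL * 100
    let cpu_use = (TOTAL - sleep) / TOTAL * 100.;

    println!("{:.4}%", cpu_use); // prints "4.4624%"
}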
Delayed interrupt handling
Using this CPU monitor comes at a cost: servicing of interrupts (tasks) will be
delayed by a constant number of clock cycles. You can think of this as an
increased context switching cost when switching from idle
to some task.
I think this delay is better visualized by looking at the source code of idle.
Note the comments:
fn idle(ref prio: P0, _thr: T0) -> ! {
    loop {
        rtfm::atomic(|thr| {
            let dwt = DWT.access(prio, thr);
            let sleep_time = SLEEP_TIME.access(prio, thr);

            let before = dwt.cyccnt.read();
            // Sleep
            rtfm::wfi(); // <- event A arrives and wakes the processor up
            let after = dwt.cyccnt.read();
            let elapsed = after.wrapping_sub(before);

            sleep_time.set(sleep_time.get() + elapsed);
        });
        // task A starts
    }
}
Starting task A is delayed by the execution of the code that increases the
SLEEP_TIME counter:
            let after = dwt.cyccnt.read();
            let elapsed = after.wrapping_sub(before);
            sleep_time.set(sleep_time.get() + elapsed);
        }); // end of critical section
The delay is constant and less than 10 clock cycles – I’ll show you the
disassembly in a bit – which is not that bad. However, there’s one particular
scenario where a task can be delayed indefinitely. Again, let’s look at the
source code of idle:
UPDATE: /u/idubrov pointed out that WFI behaves like a NOP (No OPeration) if an interrupt is pending, so the scenario described below is actually impossible. Yay!
fn idle(ref prio: P0, _thr: T0) -> ! {
    loop {
        rtfm::atomic(|thr| {
            let dwt = DWT.access(prio, thr);
            let sleep_time = SLEEP_TIME.access(prio, thr);

            // <- event A arrives. Task A is NOT executed
            let before = dwt.cyccnt.read();
            // Sleep
            rtfm::wfi(); // <- event B arrives and wakes the processor up
            let after = dwt.cyccnt.read();
            let elapsed = after.wrapping_sub(before);

            sleep_time.set(sleep_time.get() + elapsed);
        });
        // tasks A and B start
    }
}
Here event A arrives after the critical section has started and before WFI is executed. Because of this the execution of task A will be delayed until some other event arrives and wakes up the processor. That other event could arrive a few cycles later, much later, or never. In any case it’s bad because the delay is non-deterministic in the general case.
The condition to arrive at this scenario is that some interrupt event must
arrive after the critical section starts (interrupts get disabled) and
before the processor executes the WFI instruction. The likelihood of this
scenario will thus depend on how big that window is, so let’s look at the
disassembly of the idle function:
08000462 <blinky::idle>:
8000462: f241 0c04 movw ip, #4100 ; 0x1004
8000466: f240 0100 movw r1, #0
800046a: f2ce 0c00 movt ip, #57344 ; 0xe000
800046e: f2c2 0100 movt r1, #8192 ; 0x2000
8000472: e000 b.n 8000476 <blinky::idle+0x14>
8000474: b662 cpsie i ; ENABLE interrupts
8000476: f3ef 8210 mrs r2, PRIMASK
800047a: b672 cpsid i ; DISABLE interrupts
800047c: f8dc 3000 ldr.w r3, [ip] ; read CYCCNT
8000480: bf30 wfi ; SLEEP
8000482: f8dc 0000 ldr.w r0, [ip] ; read CYCCNT
8000486: f012 0f01 tst.w r2, #1
800048a: eba0 0003 sub.w r0, r0, r3
800048e: 680b ldr r3, [r1, #0]
8000490: 4418 add r0, r3
8000492: 6008 str r0, [r1, #0]
8000494: d1ef bne.n 8000476 <blinky::idle+0x14>
8000496: e7ed b.n 8000474 <blinky::idle+0x12>
The window is very small; it’s a single instruction: ldr.w r3, [ip]. That
makes hitting this unbounded delay scenario extremely unlikely.
Outro
All right. There you go: a noninvasive (no need for external monitoring hardware), low-overhead CPU usage monitor. Until next time!
Thank you patrons! ❤️
I want to wholeheartedly thank:
Iban Eguia, Aaron Turon, Geoff Cant, Harrison Chin, Brandon Edens, whitequark, J. Ryan Stinnett, James Munns and 20 more people for supporting my work on Patreon.
Let’s discuss on reddit.
Appendix
#![feature(const_fn)]
#![feature(used)]
#![no_std]
// version = "0.2.9"
#[macro_use]
extern crate cortex_m;
// version = "0.2.4"
extern crate cortex_m_rt;
// version = "0.1.0"
#[macro_use]
extern crate cortex_m_rtfm as rtfm;
// git = "https://github.com/japaric/blue-pill"
// rev = "63f2e6195546669f685606096db78ec73c5525b8"
extern crate blue_pill;
use core::cell::Cell;
use blue_pill::led::{Green, self};
use blue_pill::stm32f103xx;
use blue_pill::timer::Timer;
use rtfm::{Local, C1, P0, P1, Resource, T0, T1, TMax};
use stm32f103xx::interrupt::Tim1UpTim10;
// CONFIGURATION
const FREQUENCY: u32 = 1; // Hz
// RESOURCES
peripherals!(stm32f103xx, {
    DWT: Peripheral {
        register_block: Dwt,
        ceiling: C0,
    },
    GPIOC: Peripheral {
        register_block: Gpioc,
        ceiling: C0,
    },
    ITM: Peripheral {
        register_block: Itm,
        ceiling: C1,
    },
    RCC: Peripheral {
        register_block: Rcc,
        ceiling: C0,
    },
    TIM1: Peripheral {
        register_block: Tim1,
        ceiling: C1,
    },
});
// Total sleep time (in clock cycles)
static SLEEP_TIME: Resource<Cell<u32>, C1> = Resource::new(Cell::new(0));
// INITIALIZATION PHASE
fn init(ref prio: P0, thr: &TMax) {
    let dwt = &DWT.access(prio, thr);
    let gpioc = &GPIOC.access(prio, thr);
    let rcc = &RCC.access(prio, thr);
    let tim1 = &TIM1.access(prio, thr);

    let timer = Timer(tim1);

    dwt.enable_cycle_counter();

    led::init(gpioc, rcc);

    timer.init(FREQUENCY, rcc);
    timer.resume();
}
// IDLE LOOP
fn idle(ref prio: P0, _thr: T0) -> ! {
    loop {
        // For the span of this critical section the processor will not service
        // interrupts (tasks)
        rtfm::atomic(|thr| {
            let dwt = DWT.access(prio, thr);
            let sleep_time = SLEEP_TIME.access(prio, thr);

            // Sleep
            let before = dwt.cyccnt.read();
            rtfm::wfi();
            let after = dwt.cyccnt.read();
            let elapsed = after.wrapping_sub(before);

            // Accumulate sleep time
            sleep_time.set(sleep_time.get() + elapsed);
        });

        // Tasks are serviced at this point
    }
}
// TASKS
tasks!(stm32f103xx, {
    blinky: Task {
        interrupt: Tim1UpTim10,
        priority: P1,
        enabled: true,
    },
});
fn blinky(ref mut task: Tim1UpTim10, ref prio: P1, ref thr: T1) {
    static STATE: Local<bool, Tim1UpTim10> = Local::new(false);

    let tim1 = &TIM1.access(prio, thr);
    let itm = ITM.access(prio, thr);
    let sleep_time = SLEEP_TIME.access(prio, thr);
    let state = STATE.borrow_mut(task);

    let timer = Timer(tim1);

    if timer.clear_update_flag().is_ok() {
        *state = !*state;

        if *state {
            Green.on();
        } else {
            Green.off();
        }

        // NEW!
        // Report clock cycles spent sleeping
        iprintln!(&itm.stim[0], "{}", sleep_time.get());

        // Reset sleep time back to zero
        sleep_time.set(0);
    } else {
        // Only reachable via `rtfm::request(blinky)`
        unreachable!()
    }
}