instructor: Martin Karsten

MC4063 TTh 1-2:20

https://student.cs.uwaterloo.ca/~cs452/W23/

ex 4 1 32 5

Week 1. Jan 10

     +-----+                                linux.student.cs
     |train|                      network     +-------+   network
     |track|                   +--------------+ TFTP  +---------+
     +-----+                   |              +-------+         |
                               |                                |
                               |                                |
+----+------+-----+         +--+---+                        +---+--+
|PWR | 6021 | 6051+---------+ ARM  +---------+              | DEV  |
+----+------+-----+  COM1   +------+  COM2   |              +------+
      Marklin               grey box         |
                                             |
                                             |
                                             |
                                             | /dev/ttySx
                                          +--+--+
                                          | OPS |
                                          +-----+
                                          track PC

the Raspberry Pi runs without the MMU enabled; unaligned accesses to memory will not work.
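
a small sketch of the kind of access that then faults (variable names are just for illustration):

// with the MMU off, data accesses are treated as Device memory and alignment
// is enforced, so this unaligned load faults
char buf[8];
unsigned int *p = (unsigned int *)(buf + 1);  // address is not 4-byte aligned
unsigned int v = *p;                          // unaligned access -> alignment fault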

memory layout of running program

-------- 0xffff
 stack v
--------
 heap  ^
--------
 data
--------
 txt
--------
-------- 0

elf sections

freestanding environment (in contrast to hosted)

memory setup:

system timer (ch10) (vs arm timer):

big polling loop (a0):

for (;;) {
    if (c1) a1();
    if (c2) a2();
}

serial communication

                GPIO
                +--+         +--+
                |  +-------> +--+ SPI master
                |  |
+-----+         |  |         +--+
|ser. +------>  | 18+------->---+
|hat  |         | ..
+-----+------>  | 21 +------>+--+
                |  |         +--+
                |  |
                |  +-------> +--+
                +--+         +--+

SPI (serial peripheral interface) (section 2.3 and ch9)

                31             0
                +--+---+---+---+
                |  | 1 | 2 | 3 |  +----> send end
                +^-+---+---+---+
               count
               bits

                +--+---+---+---+
recv end  +---> |  | 1 | 2 | 3 |
                +--+---+---+---+
                         1   2
                             1

UART:

UART configuration:

train and track commands:

eg. this code is bad; do not use it anywhere other than in the SPI code

for (;;) {
    while (!c1) a1();
    while (!c2) a2();
}

Week 2. Jan 17

problems:

bounties:

multitasking

what is problem in the polling loop?

for (;;) {
    if (c1) a1();
    if (c2) a2();
}

even if only c1 is active, we still need to check everything.

improvement?

for (;;) {
    if (c1) a1();
    if (c2) a2();
    if (c1) a1();  // c1 is important, let's check again
    ...
}

this does not solve the problem.

what if we split big task:

for (;;) {
    if (c1) a1();
    if (c2) {a2_1(); a2_half_done = 1;}
    if (c1) a1();
    if (a2_half_done) {a2_2(); a2_half_done = 0;}  // reset the flag once a2 finishes
}

this is too complex, not reusable.

concurrency:

key concern of a task: it needs its own stack, because it must handle nested calls/recursion (an unknown number of pushes/pops).

canonical approach of kernel code:

void kmain() {
  initialize();  // includes starting the first user task
  for (;;) {
    currtask = schedule();           // pick the next ready task
    request = activate(currtask);    // switch to the task; returns with its next request (syscall)
    handle(request);                 // carry out the request in the kernel
  }
}

kernel:

why not shared memory for communication? it requires locks, which are OS-level machinery.

entry/exit kernel-user-level tasks

task stack has:

memory:

software development principle: KISS

context switch

ARMv8:

register file:

ARM instructions:

what is context switch? changing stack (pointer), register file and pc.

questions:

kernel design: 01N

application binary interface (ABI)

subroutine

stack switch: a simple form of context switch

// abi rules apply (x0-x18 already saved)
void StackSwitch(char* newSP, char** oldSP);
  1. save x19-x30 (push to stack)
  2. save SP to memory location *x1 (the address passed in x1)
  3. load SP from x0
  4. restore x19-x30 (pop)
  5. return
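
a minimal AArch64 sketch of these steps (the save order and stp/ldp pairing are one possible choice, not necessarily the reference solution):

.text
.align 2
.global StackSwitch
.type StackSwitch, %function
// void StackSwitch(char* newSP /* x0 */, char** oldSP /* x1 */);
StackSwitch:
  stp x19, x20, [sp, #-16]!   // 1. save callee-saved registers x19-x30
  stp x21, x22, [sp, #-16]!
  stp x23, x24, [sp, #-16]!
  stp x25, x26, [sp, #-16]!
  stp x27, x28, [sp, #-16]!
  stp x29, x30, [sp, #-16]!
  mov x2, sp
  str x2, [x1]                // 2. save current SP to *oldSP
  mov sp, x0                  // 3. load new SP from x0
  ldp x29, x30, [sp], #16     // 4. restore x19-x30 from the new stack
  ldp x27, x28, [sp], #16
  ldp x25, x26, [sp], #16
  ldp x23, x24, [sp], #16
  ldp x21, x22, [sp], #16
  ldp x19, x20, [sp], #16
  ret                         // 5. return on the new stack (to the x30 just restored)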

mode switch:

how to implement handler() for syscall:

Week 3. Jan 24

useful registers in SVC mode switch:

eg. how to mix C code and assembler?

main.c:

extern int adder(int, int);

int main(void) {...}

adder.S:

.text
.align 2  // align everything to 2^2=4 bytes (instr size)
          // or .balign 4
.global	adder
.type adder, %function
adder:
  add x2, x1, x0
  mov x0, x2
  ret  // AArch64 return; (bx lr is the AArch32 form, not valid here)

eg. combine assembler with C code.

/* https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html */
unsigned long val;
// read a system register via inline asm (DAIF here; "cpsr" exists only in AArch32)
asm volatile("mrs %x0, daif" : "=r"(val));
printf("daif: 0x%lx\n\r", val);
// write it back via inline asm
asm volatile("msr daif, %x0" :: "r"(val));

pi initialization:

kernel start:

task exit: what if user program forgets to call Exit()? wrap the user function:

void task_start(void (*task_function)()) {
    task_function();
    Exit();
}

scheduling:

message passing

sender / receiver states:
    sender blocks in SEND_WAIT, receiver blocks in RECEIVE_WAIT
    mailbox: queue of waiting senders; a sender that is ready puts itself on the receiver's queue
    rendezvous: the message is copied from sender to receiver; the sender moves to REPLY_WAIT
                (this decouples sender and receiver in time/space)
    2nd rendezvous: the response is copied back to the sender (Reply)

message:

synchronization:

we can build an async producer/consumer pattern on top of the sync message passing mechanism.

single-producer-single-consumer

multiple-producer-single-consumer

single-producer-multiple-consumer

multiple-producer-multiple-consumer
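
eg. a minimal sketch of the single-producer-single-consumer case built directly on Send/Receive/Reply; the kernel prototypes and the produce_next/consume helpers below are assumptions, not the actual course API:

// assumed kernel primitives (exact signatures may differ):
int Send(int tid, const char* msg, int msglen, char* reply, int rplen);
int Receive(int* tid, char* msg, int msglen);
int Reply(int tid, const char* reply, int rplen);

char produce_next(void);   // hypothetical helper: produce the next item
void consume(char item);   // hypothetical helper: consume one item

void producer(int consumerTid) {
    char item, ack;
    for (;;) {
        item = produce_next();
        Send(consumerTid, &item, 1, &ack, 1);  // synchronous: blocks until the consumer replies
    }
}

void consumer(void) {
    int senderTid;
    char item;
    for (;;) {
        Receive(&senderTid, &item, 1);
        Reply(senderTid, "", 0);   // reply immediately so the producer can run ahead
        consume(item);
    }
}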

server pattern:

name resolution:

Week 4. Jan 31

compiler optimization:

C/C++ volatile:

unsigned long addr = 0xfe000000 + 0x3000 + 0x04;   // a device register (system timer CLO on the Pi 4)
volatile unsigned *d = (volatile unsigned *)addr;
unsigned val = *d;   // volatile forces an actual load from the device on every access

timing/measurements:

eg. what is problem with this measurement?

start = clock();
...
stop = clock();
duration = stop - start;

improvement: perform many operations, measure the total time, and take the average. this has problems too.

periodic action with period X:

eg. what is problem with this?

for (;;) {
    sleep(X);     // drift: each period is X plus however long action() takes
    action();
}

an improvement:

target = clock();                // initialize once before the loop
for (;;) {
    target += X;                 // absolute target time: error never accumulates
    sleep(target - clock());     // sleep only for the remaining time
    action();
}

CPU caches:

cache modes (section b2.7)

cache setup:

cache maintenance:

interrupts

interrupts vs polling

interrupt handling/scheduling:

stack-switch:

interrupt hardware

edge interrupt vs level interrupt

GIC registers

how to get interrupt number (IRQ#):

handler routine:

Week 5. Feb 7

timer interrupt:

AwaitEvent():

interrupt handling:

clock server:

// same standard server loop
for (;;) {
    req = Receive();     // from clients and from the clock notifier
    process(req);
    if (...) {           // reply only when the request can be satisfied now
        Reply(req);      // otherwise the client stays blocked (e.g. until its delay expires)
    }
}

the blocking on AwaitEvent is delegated to the clock notifier:

for (;;) {
    AwaitEvent();          // block until the timer interrupt (tick) fires
    Send(clock_server);    // report the tick to the clock server
}

idle:

software design

problem complexity vs. solution complexity

objectives:

design patterns:

debugging:

model                            implementation
-----                            --------------
program design/logic             code
environment interface & docs     internals

debugging strategy

timing problems

race condition/data races:

Week 6. Feb 14

uart

cpu --- spi --- uart (tx/rx) --- marklin
    gpio

sources of uart interrupts:

uart IRQ control:

uart fifo:

operation:

eg. when writing to TX fifo:

  1. write TX
  2. check TXLVL
  3. if not ready, enable TX interrupt, otherwise go back to step 1
  4. [interrupt]
  5. disable TX interrupt
  6. go to step 1
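
a sketch of these steps as a loop that drains a software output buffer into the TX FIFO; all helper names and the event id are placeholders, not the real driver API:

// placeholders: outbuf_*, uart_*, EVENT_UART_TX
void drain_tx(void) {
    while (!outbuf_empty()) {
        uart_write_tx(outbuf_pop());    // 1. write TX
        if (uart_tx_ready()) continue;  // 2./3. check TXLVL; if ready, go back to step 1
        uart_enable_tx_irq();           // 3. not ready: enable the TX interrupt
        AwaitEvent(EVENT_UART_TX);      // 4. [interrupt] (block until the FIFO has drained)
        uart_disable_tx_irq();          // 5. disable the TX interrupt
    }                                   // 6. go back to step 1
}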

task structure:

SPI:

GPIO:

flow control:

train

train velocity differs at each speed level and differs in each environment => the state of the track cannot be predicted a priori; the speed has to be measured.

data representation of velocity:

server types:

suggestion:

Week 9. Mar 7

train modelling

experiments and data:

measurement:

uncertainty:

eg. if we have dist=100, t1=10, t2=20: then tavg=15, so v=100/15=6.67.

dynamic calibration: use exponentially weighted moving average (EWMA)
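
a minimal sketch of the EWMA update in integer arithmetic; the weight 1/8 and the fixed-point unit are assumptions:

// v_new = (1 - a) * v_old + a * v_sample, with a = 1/8
// velocities kept as scaled integers (e.g. um per tick) to avoid floating point
int ewma_update(int v_old, int v_sample) {
    return v_old + (v_sample - v_old) / 8;
}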

acceleration:

eg. acceleration completes before 2nd sensor hit (d/t > v_avg)

   v
   ^
v2 |   +---------+-
   |  /          |
va | /           |
   |/            |
v1 +-------------+--> t
       t1        t2

eg. acceleration does not complete before 2nd sensor hit (d/t < v_avg)

   v
vr ^   +
   |  /|
vs | / |
   |/  |
v1 +---+-----------> t
       t

measuring stop distance:

measuring stop time:

start:

short moves:

latency

latency analysis:

average latency: lower latencies do not balance out higher ones (an early response does not compensate for a missed deadline)

worst-case latency: WCET

automatic analysis:

train control:

task priorities:

options for train control command loop:

Week 11. Mar 21

TC:

context switch stress testing:

train control

scenarios:

sensor attributions => error margin for prediction

what if unexpected errors (timeouts) occur:

deadlock: wait for a random time and restart?

path-finding:

design approach

decomposing problems:

priority/interrupts

interrupts:

priority inversion

Week 12. Mar 28

real-time cpu scheduling

special-purpose vs general-purpose systems:

types of realtime:

efficiency vs latency tradeoff:

cyclic executive (polling loop):

for (;;) {
    t = current_time();
    if (c1) a1();                        // poll each condition once per cycle
    ...
    sleep(t + target - current_time());  // sleep out the rest of the cycle (target = cycle length)
}

real-time task scheduling:

liu/layland (1973):
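
presumably the classic rate-monotonic utilization bound: n periodic tasks with total utilization U = sum(C_i/T_i) are schedulable under static rate-monotonic priorities if U <= n(2^(1/n) - 1), which tends to ln 2 ≈ 0.69 as n grows; under EDF (dynamic priorities), U <= 1 suffices.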

multiprocessor scheduling: np-hard

linux objectives:

Week 13. Apr 4

multi-core programming

parallelism:

without atomic ops, use dekker's algo:

// shared state
int turn = 0;
int wantIn[2] = {false, false};  // bool

// thread init
int me = 0;
int you = 1;

// thread critical section (entry protocol)
wantIn[me] = true;                    // announce intent
while (wantIn[you]) {                 // contention: the other thread also wants in
    if (turn != me) {                 // not my turn: back off
        wantIn[me] = false;
        while (turn != me) { /* wait */ }
        wantIn[me] = true;            // my turn now: announce intent again
    }
}
// critical code ...
// exit protocol: hand over the turn
turn = you;
wantIn[me] = false;

with atomic RMW operations (atomic increment, test-set etc.), use ticket lock:

// shared state
int ticket = 0;
int serving = 0;

// thread critical section
int myTicket = ticket++;  // must be an atomic fetch-and-increment (RMW)
while (myTicket != serving) { /* wait */ }
// critical code
serving++;

memory ordering

eg.

// shared state
int a = 0, b = 0;

// thread 1
a = 1;
print(b);

// thread 2
b = 1;
print(a);

outcome of 00 is possible because of delayed writes.

TSO (total store order):

memory barriers (fences): special instructions to enforce memory ordering

// thread critical section
wantIn[me] = true;
fence;
while (wantIn[you]) {
    if (turn != me) {
        wantIn[me] = false;
        while (turn != me) { /* wait */ }
        wantIn[me] = true;
        fence;
    }
}

sequential consistency:

ticket lock:

// thread critical section
int myTicket = ticket++;
while (myTicket != serving) { /* wait */ }
fence;
// critical code
fence;
serving++;

x86 atomics have fence semantics.

volatile:

we use atomic instructions for atomicity and fences for memory ordering.
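
for reference, the same ticket lock expressed with C11 atomics; mapping "atomic RMW + fence" onto <stdatomic.h> like this is an illustration, not something covered verbatim above:

#include <stdatomic.h>

atomic_int ticket  = 0;   // shared
atomic_int serving = 0;   // shared

void lock(void) {
    int myTicket = atomic_fetch_add_explicit(&ticket, 1, memory_order_relaxed);  // atomic RMW
    while (atomic_load_explicit(&serving, memory_order_relaxed) != myTicket) { /* wait */ }
    atomic_thread_fence(memory_order_acquire);   // the fence before the critical code
}

void unlock(void) {
    atomic_thread_fence(memory_order_release);   // the fence after the critical code
    atomic_fetch_add_explicit(&serving, 1, memory_order_relaxed);
}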