Embedded Systems/Threading and Synchronization

One of the most useful developments in the history of computing is multitasking and multithreading. This technique isn't always available to an embedded system engineer, but some embedded systems and RTOS have multithreading (MT) capability. The chapters in this section will talk about some of the uses of MT, and will discuss some of the common pitfalls associated with MT programming. This page is only going to serve as a brief reference to multi-threaded programming.

Preemptive Multithreading
When the first multi-tasking systems were established, they did not have a central controller. Multi-tasking was established by having programs voluntarily give up control to the system, and the system would then give control to another process. This system worked reasonably well, except that any program that was misbehaving would slow down the entire system. For instance, if a program got stuck in an infinite loop, it would never give up control, and the system would freeze.

The solution to this problem is preemptive multithreading. In a preemptive environment, control could be moved from one process to another process at any given time. The process that was "preempted" would not even know that anything had happened, except maybe there would be a larger than average delay between 2 instructions. Preemptive multithreading allows for programs that do not voluntarily give up control, and it also allows a computer to continue functioning when a single process hangs.

There are a number of problems associated with preemptive multithreading that all stem from the fact that control is taken away from one process when it is not necessarily prepared to give up control. For instance, if one process were writing to a memory location, and was preempted, the next process would see half-written data, or even corrupted data in that memory location. Or, if a task was reading in data from an input port, and it was preempted, the timing would be wrong, and data would be missed from the line. Clearly, this is unacceptable.

The solution to this new problem then is the idea of synchronization. Synchronization is a series of tools provided by the preemptive multithreaded operating system to ensure that these problems are avoided. Synchronization tools can include timers, "critical sections," and locks. Timers can ensure that a given process may be preempted, but only for a certain time. Critical sections are commands in the code that prevent the system from switching control for a certain time. Locks are commands that prevent interference in atomic operations. These topics will all be discussed in the following chapters.

Mutexes
The term Mutex is short for "Mutual Exclusion", and is a type of mechanism used in a preemptive environment that can prevent unauthorized access to resources that are currently in use. Mutexes follow several rules:


 * 1) Mutexes are system wide objects, that are maintained by the kernel.
 * 2) Mutexes can only be owned by one process at a time
 * 3) Mutexes can be acquired by asking the kernel to allocate that mutex to the current task
 * 4) If a Mutex is already allocated, the request function will block until the mutex is available.

In general, it is considered good programming practice to release mutexes as quickly as possible. Some problems with mutexes will be discussed in the chapter on deadlocks.

Spin Locks
Spin locks is a quick form of synchronization methods. It is named after its behavior - spin in the loop while the condition is false. To implement spin lock system should support test-and-set idiom or give exclusive access to a locking thread by any means (masking interrupts, locking bus).

An advantage of spin locks is that they are very simple. A disadvantage is that they waste CPU cycles in loop waiting. Most common use of spin locks is to synchronize quick access to objects. It is not advisable to do a long computations while spin locked a section.

Critical Sections
A critical section is a sequence of computer instructions that may malfunction if interrupted. An atomic operation is a sequence of computer instructions that cannot be interrupted and function correctly. In practice, these two subtly different definitions are often combined. Operating systems provide synchronization objects to meet these requirements, and some actually call these objects as "critical sections," "atomic operations" or "monitors."

An example of a critical section is code that removes data from a queue that is filled by an interrupt. If the critical section is not protected, the interrupt can occur while the dequeuing function is changing pointers, and corrupt the queue pointers. An example of an atomic operation is an I/O read where the process must read all the data at a particular rate, and cannot be preempted while reading.

A generally good programming practice is to have programs exit their critical sections as quickly as possible, because holding a critical section for too long will cause other processes on the system not to get any time, and will cause a major performance decrease. Critical sections should be used with care.

Priority Scheduling
Many RTOS have a mechanism to distinguish the relative priorities of different tasks. High-priority tasks are executed more frequently than the low priority tasks. Each implementation of priority scheduling will be a little different, however.

Deadlock
A deadlock occurs when a series of synchronization objects are held in a preemptive MT system in such a way that no process can move forward. Let us take a look at an example:

Let's say that we have 2 threads: T1 and T2. We also have 2 mutexes, M1 and M2.
 * 1) T1 asks for and acquires mutex M1.
 * 2) T2 acquires M2
 * 3) T1 asks for M2, and the system transfers control to T2 until T2 releases M2.
 * 4) T2 asks for M1, and the system is in deadlock (neither thread can continue until the other releases it's mutex).

This is a very difficult problem to diagnose, and an even more difficult problem to fix. This chapter will provide some general guide-lines for preventing deadlocks.

Watchdog Timer
See ../Watchdog Timer.

read twice and compare
Perhaps the most common concurrency algorithm used in embedded systems is the "read the counter twice and compare" optimistic concurrency control pattern.

Many hardware timers and hardware counters are connected to the CPU over an 8-bit bus. If the low byte of the timer happens to be 0xFF when we start to read the timer, if the timer increments between two byte reads, a simple read of each byte separately will get a corrupted value. We get a slightly different corrupted value if we read the low byte first vs. the low byte last, but it's corrupted either way.

One simple solution is to read the counter twice and compare:

or

Because no locks are used, the double-read algorithm avoids deadlocks, avoids priority inversion, and avoids other problems with locks.

There are many other algorithms based on this read-twice-and-compare algorithm.

For example, the seqlocks used in Linux use this read-twice-and-compare algorithm.

For example, single-reader single-writer ring buffers are also common in embedded systems, and both the reader and the writer can be implemented with a similar lock-free algorithm.

incrementing counter
When the read-twice-and-compare algorithm is used to synchronize several processes (rather than, as above, a hardware counter and a process that reads the counter), the process that changes the counter needs to make that change in a way that appears atomic to the other process.

Many architectures have a single instruction that can atomically increment a variable. Alas, the exact details vary from one architecture to another, and from one C compiler to another. Some relatively portable ways to tell the C compiler to atomically increment a variable involve the std::atomic template class from the standard Atomic operations library, part of the C11 standard for the C programming language; or the tbb::atomic template class; or the boost::atomic template class; etc.

The main problem with the double-read algorithm is the ABA problem. To avoid the ABA problem, we use counters that only count up, and we give the counter enough bits that, for any reasonable amount of time a thread could possibly be delayed between reading the first byte of the first read and the last byte of the second read (say, maximum 1 second), the counter is long enough that at the worst-case (highest-frequency) count rate it takes much longer than that—say, the minimum time to overflow is once every 10 seconds.

For example, up-down counters can be synthesized from 2 counters, one that only counts up, and the other that only counts down.