Ruby Hacking Guide/Threads

Ruby Interface
Come to think of it, I haven’t yet shown an example of using Ruby threads in practice. It’s not much but here’s an introduction for now:
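A minimal sketch of such a program follows. It assumes Thread.new to start the second thread and bounds both loops so that the program terminates. The helper name run_demo and the Queue used to record the output are illustrative additions, not part of the original listing.

```ruby
# Hypothetical helper: prints from two threads and returns what was printed.
def run_demo(count = 3)
  log = Queue.new                 # thread-safe, so both threads may append to it

  worker = Thread.new do
    count.times do
      puts "from thread"
      log << "from thread"
    end
  end

  count.times do
    puts "from main"
    log << "from main"
  end

  worker.join                     # wait for the spawned thread before returning
  Array.new(log.size) { log.pop } # drain the queue into a plain array
end

run_demo
```

With so few iterations the two messages may not interleave much; a larger count usually makes the mixing easier to see.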

If you execute this program, you should see “from thread” and “from main” mixed together in the output.

Of course, beyond simply creating threads, there are also a number of ways to control them. There isn’t a synchronized keyword as in Java, but common primitives like Mutex, Queue, and Monitor are provided, and the API below can be used for operations on the threads themselves.
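As one illustration of such a primitive, the sketch below protects a shared counter with Mutex from Ruby's core library. The helper name locked_count and its parameters are hypothetical, chosen only for this example.

```ruby
# Hypothetical helper: several threads increment one shared counter.
def locked_count(threads: 4, increments: 1_000)
  counter = 0
  lock = Mutex.new

  workers = Array.new(threads) do
    Thread.new do
      increments.times do
        lock.synchronize { counter += 1 }   # only one thread at a time gets in
      end
    end
  end

  workers.each(&:join)                      # wait for every worker to finish
  counter
end

puts locked_count
```

Because every increment happens inside synchronize, the result is always threads times increments, regardless of how the threads are scheduled.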

Thread API
Thread.pass      Pass execution to another thread
Thread.kill(th)  End the thread th
Thread.exit      End this thread
Thread.stop      Pause this thread temporarily
Thread#join      Wait for the receiving thread to end
Thread#wakeup    Resume a thread that was previously paused
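A short sketch of how these operations fit together, assuming the descriptions refer to Thread.pass, Thread.kill, Thread.exit, Thread.stop, Thread#join and Thread#wakeup. The helper name thread_api_demo is hypothetical.

```ruby
# Hypothetical helper exercising the thread-control calls named in the lead-in.
def thread_api_demo
  # Thread.stop pauses the calling thread; Thread#wakeup makes it runnable again.
  sleeper = Thread.new do
    Thread.stop
    :woken
  end
  Thread.pass until sleeper.status == "sleep"   # yield until it has paused
  sleeper.wakeup
  sleeper.join                                  # wait for it to finish

  # Thread.exit ends the calling thread, so the last line is never reached.
  quitter = Thread.new do
    Thread.exit
    :unreachable
  end

  # Thread.kill ends another thread from the outside.
  looper = Thread.new { loop { Thread.pass } }
  Thread.kill(looper)
  looper.join

  { sleeper: sleeper.value, quitter: quitter.value, looper_alive: looper.alive? }
end

p thread_api_demo
```

Thread#value waits for the thread like join does and then returns the block's result, which is nil for a thread that was ended early.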

Ruby Threads
At a glance, threads may seem to all run at once, but they are actually executed in turns, each for a short slice of time. Strictly speaking, a multi-CPU machine can run several threads concurrently, but even so, if there are more threads than CPUs, the threads must take turns.

Ruby still has a GIL (Global Interpreter Lock). Because of this lock, the Ruby interpreter can, strictly speaking, run only one thread at a time. However, when a thread is blocked (e.g. waiting for data to arrive over the network), the interpreter can switch to another thread while the blocked thread is waiting. At the moment, if you want to truly run multiple threads concurrently with Ruby, you will have to run multiple interpreters. This technique is often used by web servers like Unicorn. Much work has been done to lessen the impact of the GIL, and in the future it might disappear altogether. For most purposes though, the current situation is sufficient.
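The effect of switching away from a blocked thread can be observed with a small timing sketch. Here sleep stands in for a blocking wait such as network I/O, which is an assumption of this example; the helper name overlapped_wait is hypothetical.

```ruby
# Hypothetical helper: lets several threads wait at once and returns elapsed seconds.
def overlapped_wait(threads: 4, seconds: 0.2)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  # Each thread blocks in sleep; while it waits, the others are free to run.
  Array.new(threads) { Thread.new { sleep seconds } }.each(&:join)

  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = overlapped_wait
puts format("4 threads x 0.2s of waiting took %.2fs", elapsed)
```

If the waits were serialized, the total would be close to the sum of the individual waits; since they overlap, it stays close to a single wait.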

Preemptive?
Now we’ll talk about the characteristics of Ruby threads in a little more detail. When talking about threads, one can talk about whether or not they are preemptive.

In a preemptive threading system, even if the user of the threads doesn’t explicitly switch threads, the threads will be switched on their own. Looked at in reverse, the timing of thread switching cannot be controlled by the user.

On the other hand, in a non-preemptive threading system, as long as the user of the threads doesn’t explicitly say “you can pass control to the next thread now,” the threads won’t switch. Looked at in reverse again, it is clear that the user of the threads can control where it is possible for threads to be switched.
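Non-preemptive switching can be sketched with Ruby's Fiber, which is not a thread but hands over control only when asked to, so it serves here as a stand-in for a cooperative threading system. The helper name cooperative_demo is hypothetical.

```ruby
# Hypothetical helper: records the order in which two flows of control run.
def cooperative_demo
  log = []

  worker = Fiber.new do
    log << "worker: step 1"
    Fiber.yield                 # explicit hand-off back to the caller
    log << "worker: step 2"
  end

  log << "main: step 1"
  worker.resume                 # runs the worker until its Fiber.yield
  log << "main: step 2"
  worker.resume                 # runs the worker to the end
  log
end

puts cooperative_demo
```

The order of the log entries is fully determined by where resume and Fiber.yield appear, which is exactly the control that a preemptive system takes away from the user.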

This distinction can also be made for processes. In this case, preemptive is seen as the “superior” approach. If, for example, a program had a bug which caused it to fall into an infinite loop, processes would not be able to switch. In other words, one user program could lock up the entire system; this is no good. Windows 3.1 has MS-DOS as its foundation, so its process switching is non-preemptive, but Windows 95’s is preemptive. Therefore, Windows 95 is more robust and it can be said that Windows 95 is “superior” to Windows 3.1.

So which is it for Ruby threads? At the Ruby level, threads are preemptive and at the C level, threads are non-preemptive. In other words, when writing C code, you can almost exactly specify the timing of thread switches.
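Preemption at the Ruby level can be observed with a thread that never yields on its own. In this sketch, the helper name preemption_demo and the 0.1 second pause are arbitrary choices made for illustration.

```ruby
# Hypothetical helper: shows main regaining control from a thread that never yields.
def preemption_demo(pause = 0.1)
  spins = 0

  spinner = Thread.new do
    loop { spins += 1 }         # busy loop: no Thread.pass, no blocking call
  end

  sleep pause                   # main blocks, so the spinner gets to run
  spinner.kill                  # reached only because main was scheduled again
  spinner.join
  spins
end

puts "spinner iterations before it was stopped: #{preemption_demo}"
```

The spinner makes progress and yet the main thread still gets to stop it, even though nothing in the loop asks for a switch.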

Why is Ruby threading like this? Threads are indeed convenient, but using them comes with certain obligations. Namely, the code must accommodate threads (it must be thread-safe). That is, if thread switching were preemptive at the C level, all of the C libraries that we use would have to be thread-safe.

However, there are actually many C libraries that are not yet thread-safe. If we decreased the number of libraries you can use by making thread safety a requirement, all of the effort taken to make extension libraries easy to write would be meaningless. Thus, for Ruby, making threading non-preemptive at the C level is the rational choice.

Management Structure
We learned that at the C level, Ruby threads are non-preemptive. That is, after a thread has run for a while, it voluntarily gives up control to another thread. So let’s consider an executing thread that is just about to stop running. Which thread should it pass control to? Before answering that, we first need to know how Ruby threads are represented internally. Let’s take a look at the variables and data structure for managing threads.

▼ Thread management structure

For various reasons, struct thread has become very large, so we focus on the important parts here. Looking at just the two members, next and prev, which are both rb_thread_t pointers, you might think that the rb_thread_t structures form a doubly-linked list. But actually, it is not just a doubly-linked list; its ends meet. In other words, it is a circular doubly-linked list. This is an important point. When you add the static variables main_thread and curr_thread, the whole data structure looks like Figure 1.

Figure 1 : Data structure for managing threads

main_thread is the thread that exists when the program starts up. In other words, it is the “first” thread. curr_thread is, of course, the current thread; that is, the thread that is currently running. The value of main_thread doesn’t change throughout the life of the process, but the value of curr_thread changes rapidly.

With the threads forming a cycle in this manner, choosing the next thread is simple: just follow the next link and choose that thread. With just that, you can run all threads evenly, to an extent.
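The ring and the "follow the link" rule can be modelled in a few lines of Ruby. This is only a sketch of the idea, not the interpreter's C code: the classes Node and ThreadRing and the methods add and switch are hypothetical names, with main and curr playing the roles of the two static variables.

```ruby
# Hypothetical model of one entry in the ring; only the two links matter here.
class Node
  attr_accessor :name, :next, :prev

  def initialize(name)
    @name = name
    @next = self                # a lone node is its own neighbour,
    @prev = self                # which makes the list circular from the start
  end
end

# Hypothetical model of the ring plus its "main" and "current" pointers.
class ThreadRing
  attr_reader :main, :curr

  def initialize(name)
    @main = Node.new(name)
    @curr = @main
  end

  # Link a new node in just before main, i.e. at the "end" of the ring.
  def add(name)
    node = Node.new(name)
    node.next = @main
    node.prev = @main.prev
    @main.prev.next = node
    @main.prev = node
    node
  end

  # Choosing the next thread: follow the next link from the current one.
  def switch
    @curr = @curr.next
  end
end

ring = ThreadRing.new("main")
ring.add("a")
ring.add("b")
puts Array.new(6) { ring.switch.name }.join(" -> ")
# prints: a -> b -> main -> a -> b -> main
```

Because the ends of the list meet, switch never needs a special case for the last node, and every entry is visited once per lap.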