Parrot Virtual Machine/Run Core and Opcodes

Run Core
We've discussed run cores earlier, but this chapter covers them in much more depth. We will talk about opcodes and the special opcode compiler that converts them into standard C code. We will also look at the different forms the opcode compiler produces, and at the different runcores that execute those opcodes.

Opcodes
Opcodes are written in a special syntax that mixes C with a handful of op-specific keywords. The opcode compiler converts them into the formats required by the different run cores.

The core opcodes for Parrot are all defined in, in files with a   extension. Opcodes are divided into different files, depending on their purpose:

Writing Opcodes
Ops are defined with the op keyword, and their bodies work similarly to C source code. Here is an example:

op my_op { }

Alternatively, we can add the inline keyword as well:

inline op my_op { }

We define the input and output parameters using the keywords in and out, each followed by the type of the parameter. A separate qualifier is available for an input parameter that is used but not altered. The types can be PMC, STR (strings), NUM (floating-point values), or INT (integers). Here is an example function prototype:

op my_op(out NUM, in STR, in PMC, in INT) { }

That function takes a string, a PMC, and an int, and returns a num. Notice how the parameters do not have names. Instead, they correspond to numbers:

op my_op(out NUM, in STR, in PMC, in INT)
              ^       ^       ^       ^
              |       |       |       |
             $1      $2      $3      $4

Here's another example, an operation that takes three integer inputs, adds them together, and returns an integer sum:

op sum(out INT, in INT, in INT, in INT) { $1 = $2 + $3 + $4; }

Nums are converted into ordinary floating point values, so they can be passed directly to functions that require floats or doubles. Likewise, INTs are just basic integer values, and can be treated as such. PMCs and STRINGs, however, are complex values. You can't pass a Parrot STRING to a library function that requires a null-terminated C string. The following is bad:

#include <string.h>

op my_str_length(out INT, in STR) {
    $1 = strlen($2); // WRONG!
}
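To see why this fails, here is a minimal sketch in plain C. It assumes a hypothetical counted string type, SketchString, that stores a length instead of relying on a terminating NUL. This is not Parrot's actual STRING structure, and the helper names are invented for illustration. The point is only that the length must come from the stored count, and that a NUL-terminated copy has to be made before calling a C library function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a VM string: counted, not NUL-terminated. */
typedef struct {
    const char *bytes;   /* raw buffer; no '\0' is guaranteed at the end */
    size_t      length;  /* number of valid bytes in the buffer */
} SketchString;

/* Correct length: read the stored count instead of scanning for '\0'. */
size_t sketch_str_length(const SketchString *s) {
    assert(s != NULL);
    return s->length;
}

/* Make a NUL-terminated copy for C library calls; the caller must free() it. */
char *sketch_str_to_cstring(const SketchString *s) {
    assert(s != NULL);
    char *copy = malloc(s->length + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s->bytes, s->length);
    copy[s->length] = '\0';
    return copy;
}
```

With this shape, calling strlen on the raw buffer would read past the five valid bytes of a string such as { "hello world", 5 }, while the converted copy is safe to hand to any function that expects a C string.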

Advanced Parameters
When we talked about the types of parameters above, we weren't entirely complete. Here is a list of direction qualifiers that you can have in your op:

The type of the argument can also be one of several options:

OP naming and function signatures
You can have many ops with the same name, so long as they have different parameters. The two following declarations are okay:

op my_op (out INT, in INT) { }

op my_op (out NUM, in INT) { }

The ops compiler converts these op declarations into something similar to the following C function declarations:

INTVAL op_my_op_i_i(INTVAL param1) { }

NUMBER op_my_op_n_i(INTVAL param1) { }

Notice the "_i_i" and "_n_i" suffixes at the end of the function names? This is how Parrot ensures that function names are unique in the system to prevent compiler problems. This is also an easy way to look at a function signature and see what kinds of operands it takes.
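The naming scheme can be sketched in a few lines of C. This is only an illustration of the idea, not the ops compiler's real code: the function mangle_op_name and the helper type_code are hypothetical, and the letters used for STR and PMC are assumed by analogy with the "_i" and "_n" suffixes shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a type keyword to a one-letter code (STR and PMC letters are assumed). */
static char type_code(const char *type) {
    if (strcmp(type, "INT") == 0) return 'i';
    if (strcmp(type, "NUM") == 0) return 'n';
    if (strcmp(type, "STR") == 0) return 's';
    if (strcmp(type, "PMC") == 0) return 'p';
    return '?';
}

/* Write "op_<name>_<c1>_<c2>..." into buf.
 * Returns 0 on success, -1 if buf is too small. */
int mangle_op_name(char *buf, size_t size, const char *name,
                   const char *const *types, size_t ntypes) {
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, size, "op_%s", name);
    if (n < 0 || (size_t)n >= size)
        return -1;
    size_t used = (size_t)n;
    for (size_t i = 0; i < ntypes; i++) {
        if (used + 3 > size)          /* room for '_', the code, and '\0' */
            return -1;
        buf[used++] = '_';
        buf[used++] = type_code(types[i]);
        buf[used] = '\0';
    }
    return 0;
}
```

Passing the name "my_op" with the types INT, INT yields "op_my_op_i_i", and with NUM, INT yields "op_my_op_n_i", so the two overloads end up as two distinct C identifiers.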

Control Flow
An opcode can determine where control flow moves to after it has completed executing. For most opcodes, the default behavior is to move to the next instruction in memory. However, there are many ways to alter control flow, some of which are very new and exotic. There are several keywords that can be used to obtain the address of an operation. We can then jump to that instruction directly, or we can store the address and jump to it later.
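In plain C terms, each of these choices amounts to computing a different "next instruction" pointer. The sketch below is an assumption-laden illustration, not the ops syntax itself: cell_t and the three flow_ helpers are hypothetical names, standing in for "fall through to the next op", "jump relative to the current op", and "jump to an absolute position in the code".

```c
#include <assert.h>
#include <stddef.h>

typedef long cell_t;  /* hypothetical bytecode word */

/* Default behavior: continue with the instruction that follows in memory.
 * op_size counts the opcode word plus its operands. */
cell_t *flow_next(cell_t *pc, size_t op_size) {
    return pc + op_size;
}

/* Relative jump: the target is measured from the current instruction. */
cell_t *flow_offset(cell_t *pc, cell_t offset) {
    return pc + offset;
}

/* Absolute jump: the target is an index from the start of the code. */
cell_t *flow_address(cell_t *code_start, cell_t index) {
    assert(index >= 0);
    return code_start + index;
}
```

Because the result is an ordinary pointer, it can be used immediately as the new program counter or saved in a variable and used later, which is how a stored return address would work.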

The Opcode Compiler
The opcode compiler is located at, although most of its functionality is located in a variety of included libs, such as. and.

We'll look at the different runcores in the section below. Suffice it to say, however, that different runcores require that the opcodes be compiled into a different format for execution. Therefore the job of the opcode compiler is relatively complex: it must read in the opcode description files and output syntactically correct C code in several different output formats.

Dynops: Dynamic Opcode Libraries
The ops we've been talking about so far are all standard built-in ops. These aren't the only ops available, however. Parrot also allows dynamic op libraries to be loaded at runtime.

Dynops are dynamically loadable op libraries. They are written almost exactly like the regular built-in ops, but they are compiled separately into a library and loaded into Parrot at runtime using a directive.

Run Cores
Runcores are the things that decode and execute the stream of opcodes in a PBC file. In the simplest case, a runcore is a loop that takes each bytecode value, gathers the parameter data from the PBC stream, and passes control to the opcode routine for execution.
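The simplest case described above can be sketched as a table of function pointers and a loop. Everything here is a toy model under stated assumptions: the Interp structure, the cell_t word type, and the three opcodes (end, set, add) are hypothetical and far smaller than Parrot's real equivalents. Each op function receives a pointer to the current opcode and the interpreter state, and returns the address of the next opcode to run.

```c
#include <assert.h>
#include <stddef.h>

typedef long cell_t;                        /* hypothetical bytecode word */
typedef struct { cell_t regs[4]; } Interp;  /* hypothetical interpreter state */
typedef cell_t *(*op_func_t)(cell_t *pc, Interp *interp);

enum { OP_END, OP_SET, OP_ADD, OP_COUNT };

/* end: returning NULL tells the loop to stop. */
static cell_t *op_end(cell_t *pc, Interp *interp) {
    (void)pc; (void)interp;
    return NULL;
}

/* set REG, CONST */
static cell_t *op_set(cell_t *pc, Interp *interp) {
    interp->regs[pc[1]] = pc[2];
    return pc + 3;
}

/* add DEST, SRC1, SRC2 */
static cell_t *op_add(cell_t *pc, Interp *interp) {
    interp->regs[pc[1]] = interp->regs[pc[2]] + interp->regs[pc[3]];
    return pc + 4;
}

/* Indexed by opcode number, in enum order. */
static const op_func_t op_table[OP_COUNT] = { op_end, op_set, op_add };

/* The run loop: read the opcode number, call its function, repeat. */
void run_function_core(cell_t *code, Interp *interp) {
    cell_t *pc = code;
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);  /* reject unknown opcodes */
        pc = op_table[*pc](pc, interp);
    }
}
```

Running the stream { OP_SET, 0, 40, OP_SET, 1, 2, OP_ADD, 2, 0, 1, OP_END } through this loop leaves the sum of registers 0 and 1 in register 2. The cost of this design is one indirect function call per opcode.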

There are several different runcores. Some are very practical and simple, while others use special tricks and compiler features to optimize for speed. Some runcores perform useful ancillary tasks such as debugging and profiling. Others serve no useful purpose except to satisfy some basic academic interest.

Basic Cores

 * Slow Core: In the slow core, each opcode is compiled into a separate function. Each opcode function takes two arguments: a pointer to the current opcode, and the Parrot interpreter structure. All arguments to the opcodes are parsed and stored in the interpreter structure for retrieval. This core is, as its name implies, very slow. However, it's conceptually very simple and it's very stable. For this reason, the slow core is used as the base for some of the specialty cores we'll discuss later.
 * Fast Core: The fast core is exactly like the slow core, except it doesn't do the bounds checking and explicit context updating that the slow core does.
 * Switched Core: The switched core uses a gigantic C switch statement to handle opcode dispatching, instead of using individual functions. The benefit is that no function needs to be called for each opcode, which reduces the number of machine code instructions needed to dispatch an opcode.
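As a rough illustration of the switched approach, here is a toy dispatcher in C. The opcode set (SW_END, SW_SET, SW_ADD), the cell_t type, and the register array are assumptions made for this sketch and are not taken from Parrot. Every op body lives inside one function as a case of a single switch, so no per-opcode function call is needed.

```c
#include <assert.h>
#include <stddef.h>

typedef long cell_t;  /* hypothetical bytecode word */

enum { SW_END, SW_SET, SW_ADD };

/* Runs code until SW_END; returns the number of ops executed. */
size_t run_switch_core(const cell_t *code, cell_t *regs) {
    const cell_t *pc = code;
    size_t executed = 0;
    for (;;) {
        executed++;
        switch (*pc) {
        case SW_SET:                 /* set REG, CONST */
            regs[pc[1]] = pc[2];
            pc += 3;
            break;
        case SW_ADD:                 /* add DEST, SRC1, SRC2 */
            regs[pc[1]] = regs[pc[2]] + regs[pc[3]];
            pc += 4;
            break;
        case SW_END:                 /* stop and report the op count */
            return executed;
        default:                     /* unknown opcode: abort in debug builds */
            assert(!"unknown opcode");
            return executed;
        }
    }
}
```

Each pass through the loop still goes back to the top of the switch before the next opcode is selected, and that remaining overhead is what the computed goto cores try to remove.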

Native Code Cores

 * JIT Core:
 * Exec Core:

Advanced Cores
The two cores that we're going to discuss next rely on a specialty feature of some compilers called computed goto. In normal ANSI C, labels are only targets for control flow statements and are not treated like first-class data items. However, compilers that support computed goto allow the addresses of labels to be treated like pointers, stored in variables, and jumped to indirectly.

void *my_label = &&THE_LABEL;
goto *my_label;

The computed goto cores compile all the opcodes into a single large function, and each opcode corresponds to a label in the function. These labels are all stored in a large array:

void *opcode_labels[] = { &&opcode1, &&opcode2, &&opcode3, ... };

Each opcode value can then be taken as an offset to this array as follows:

goto *opcode_labels[current_opcode];
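Putting these fragments together, a complete toy dispatcher might look like the following. It relies on the labels-as-values extension, so it builds with GCC or Clang but is not standard C. The opcode set, the cell_t type, and the DISPATCH macro are all hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef long cell_t;  /* hypothetical bytecode word */

enum { CG_END, CG_SET, CG_ADD, CG_COUNT };

void run_cgoto_core(const cell_t *code, cell_t *regs) {
    /* One label address per opcode number, in enum order. */
    static void *const labels[CG_COUNT] = { &&do_end, &&do_set, &&do_add };
    const cell_t *pc = code;

/* Look up the label for the opcode at pc and jump straight to it. */
#define DISPATCH() \
    do { assert(*pc >= 0 && *pc < CG_COUNT); goto *labels[*pc]; } while (0)

    DISPATCH();

do_set:                              /* set REG, CONST */
    regs[pc[1]] = pc[2];
    pc += 3;
    DISPATCH();

do_add:                              /* add DEST, SRC1, SRC2 */
    regs[pc[1]] = regs[pc[2]] + regs[pc[3]];
    pc += 4;
    DISPATCH();

do_end:
    return;
#undef DISPATCH
}
```

Note that the dispatch code is repeated at the end of every op body, so control goes from one op directly to the next without returning to a central loop.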


 * Computed Goto Core: The computed goto core uses the mechanism described above to dispatch the various opcodes. After each opcode is executed, the next opcode in the incoming bytecode stream is looked up in the table and dispatched from there.
 * Predereferenced Computed Goto Core: In the predereferenced computed goto core, the bytecode stream is preprocessed to convert opcode numbers into their respective label addresses. This means they don't need to be looked up each time; each opcode can be jumped to directly, as if it were a label. Keep in mind that the dispatch mechanism must be used after every opcode, and a large program may execute millions of opcodes. Even small savings in the number of machine code instructions between opcodes can make big differences in speed.
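The preprocessing step can be sketched as a two-pass function, again using the GCC and Clang labels-as-values extension. This is a simplified model under several assumptions: the opcode set, the pd_operands table, and the caller-supplied scratch buffer are hypothetical, and operands are simply stored as pointer-sized integers. The first pass rewrites the stream once, and the second pass dispatches with a single indirect jump and no table lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PD_END, PD_SET, PD_ADD, PD_COUNT };

/* Operand count per opcode, used to locate opcode slots in the stream. */
static const size_t pd_operands[PD_COUNT] = { 0, 2, 3 };

/* code: n words of opcode numbers and operands.
 * scratch: n slots of work space for the rewritten stream. */
void run_predereferenced_core(const intptr_t *code, size_t n,
                              void **scratch, intptr_t *regs) {
    static void *const labels[PD_COUNT] = { &&do_end, &&do_set, &&do_add };

    /* Pass 1: replace each opcode number with its label address. */
    for (size_t i = 0; i < n; ) {
        intptr_t op = code[i];
        assert(op >= 0 && op < PD_COUNT);
        assert(i + pd_operands[op] < n);   /* operands must fit in the stream */
        scratch[i] = labels[op];
        for (size_t k = 1; k <= pd_operands[op]; k++)
            scratch[i + k] = (void *)code[i + k];  /* operands copied as-is */
        i += 1 + pd_operands[op];
    }

    /* Pass 2: execute; each slot at pc already holds a jump target. */
    void **pc = scratch;
    goto *(*pc);

do_set:                              /* set REG, CONST */
    regs[(intptr_t)pc[1]] = (intptr_t)pc[2];
    pc += 3;
    goto *(*pc);

do_add:                              /* add DEST, SRC1, SRC2 */
    regs[(intptr_t)pc[1]] = regs[(intptr_t)pc[2]] + regs[(intptr_t)pc[3]];
    pc += 4;
    goto *(*pc);

do_end:
    return;
}
```

Compared with the plain computed goto sketch, the per-opcode work drops from "load opcode number, index the label table, jump" to "load address, jump". The price is the one-time rewrite pass and a second copy of the code stream.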

Specialty Cores

 * GC Debug Core:
 * Debugger Core:
 * Profiling Core:
 * Tracing Core: