Creating a Virtual Machine/Register VM in C

Design
The first example will be one of the simplest possible architectures, and it will consist of the following components:
 * 1) A set of registers (I will arbitrarily choose to have 4 of them, numbered 0 to 3). The registers serve as a group of read/write memory cells.
 * 2) A program, which is a read-only sequence of VM instructions
 * 3) An execution unit to run the program. The execution unit will read each instruction in order and execute it.

Addressing modes
In any assembly language numbers serve multiple uses. Aside from representing scalar values, they also represent register numbers and memory locations. In this example I will follow the convention of using prefix characters to denote the role a number plays.
 * Register numbers begin with the letter r, like r0, r1, r2.
 * Immediate (scalar) values begin with the hash mark #, like #100, #200.
 * Memory addresses begin with the at sign @, like @1000, @1004.

Instruction set
After designing the VM, it is necessary to design the VM's instruction set. The instruction set is simply a reflection of the kinds of things we would like to do (or allow others to do) with this VM. Here are some things it should be possible to do with this VM: Let's keep it just that simple for now. After we have a working implementation we can go back and add more instructions.
 * Load an immediate number (a constant) into a register.
 * Perform an arithmetic sum on two registers (i.e., add two numbers).
 * Halt the machine.

First let's write a short program using these three instructions to see what they might look like: If the VM were to run the program, this is a description of what would happen on each line: Note that in keeping with common convention I have chosen to place the destination registers first in the operand list.
 * 1) Load the immediate value 100 into the register r0. Note that placing a value into a register is commonly called a load operation.
 * 2) Load the immediate value 200 into register r1.
 * 3) Add the contents of registers r0 and r1 and place the sum into register r2.
 * 4) End the program and halt the VM.

Instruction codes
After the assembly language is created it is necessary to determine how to represent each instruction as a number. This establishes a one-to-one correspondence between each instruction in the assembly language and each instruction code in the set of instruction codes. Converting a program from assembly language to instruction codes is called assembling, and conversion from instruction codes back into assembly language is called disassembling.

Several choices we must make at this point are: First, to answer the last question, since there are only small numbers of instructions and registers in this VM it should not be very difficult to encode all operands in a single instruction word, even if (for the sake of simplicity) I were to use a 16-bit instruction word. Thus, a 16-bit number written in hexadecimal has 4 digits, giving us easy access to 4 information fields, each containing 16 variations (0-9 and A-F). Having made these decisions, let us now establish the encoding. Recall that we have 16 instruction numbers available.
 * What number is used to represent each assembly language instruction?
 * How are instruction operands encoded?
 * Are operands part of the instruction word (remember, by word I mean number), or are they separate words (numbers)?
 * 1) The first digit of a machine word will be the instruction number. This gives our VM the potential for having up to 16 different instructions. This is a small amount by contemporary standards, but it is plenty for our example virtual machine.
 * 2) The next three digits will be used for the operands. These can be used as three 1-digit operands, two operands of 1 and 2 digits, or a single 3-digit operand.

The halt instruction will be instruction 0, and there is an important reason for choosing 0 for this instruction. Since empty space in the computer's memory will most likely be filled with 0s, any run-away program will eventually encounter a 0 and attempt to execute this instruction, immediately halting the program.

The remaining two instructions can be assigned arbitrarily: the loadi instruction can be instruction 1, and the add instruction can be instruction 2. This is our current instruction encoding list: 0 = halt 1 = loadi 2 = add Examining the first program instruction, we see that we must now encode the register and the immediate value: There are three hexadecimal digits available for operands, so we will use the first of those three as the register number, and the second and third together as the immediate value. Now we have just determined the complete encoding of the loadi instruction: bits 15-12 = 1 bits 11- 8 = register number bits 7- 0 = immediate value The register number for this instruction is 0, and the immediate value is 100, which in hexadecimal is 64 (6 x 16 + 4). Thus, the complete 16-bit hexadecimal instruction code for the first instruction is 1064.

The second instruction is assembled in a similar way. The immediate value is 200, which in hexadecimal is C8. The second instruction assembles to 11C8.

The third instruction is 2201.

The last instruction is 0.

Putting the instructions together, we get these 4 16-bit hexadecimal numbers as the complete program:

Implementation
It is now time to implement this design. I have chosen to write this program in C for a number of reasons: The primary function of the VM will be a run function that does the following steps in order inside a loop. The loop repeats until the halt instruction is executed. Let's begin writing the program.
 * It's a very simple language that allows the writing of concise programs.
 * It's a low level language with some high-level features. This makes it a good choice for creating a VM.
 * 1) Fetch the next instruction from the program.
 * 2) Decode the instruction into its constituent parts.
 * 3) Execute the decoded instruction.

Registers
The first and simplest part to implement is the set of 4 registers. The registers in this VM are signed integers that are, depending on your computer and operating system, either 16, 32, or 64 bits wide. The exact size is irrelevant for this example.

Program instructions
The fully assembled program can easily be stored in an array of integers.

Fetch
The fetch function retrieves the next word from the program. In order to do this we must introduce a variable called the program counter (also sometimes called the instruction pointer). Before the instruction word is returned from the function the program counter must be incremented so that it refers to the next instruction in the program array.

Decode
The decode function will decode each instruction completely. It will determine the three operand registers and immediate value for each instruction, even if an instruction doesn't use that part. This actually makes the function simpler. The instruction number and operands are stored in variables.

Execute
The execute function actually performs the instruction. First there needs to be a running flag that the halt instruction can set to 0. Now the eval function. It's just a switch ladder. This function uses the instruction code and operand variables that the decode function stored values into. Notice that in each instruction case I display the instruction. This makes it easy to follow the program as it runs.

Run
The run function is rather simple: fetch, then decode, then execute.

Main
The main function simply calls the run function:

Complete program source
For your convenience I have included the complete program source in a single block. Note that I have added a showRegs function in order to see the values of all the registers between instructions. This allowed me to verify that both the VM and the assembled program are working correctly.

Sample run
Here is the output from the program: