Parrot Virtual Machine/Squaak Tutorial/PAST Nodes and More Statements

The previous episode introduced the full grammar specification of Squaak, and we finally started working on the implementation. If you're doing the exercises, you currently have basic assignments working; strings and integers can be assigned to (global) variables. This episode will focus on implementation of some statement types and explain a few bits about the different PAST node types.

Parrot Abstract Syntax Tree
A Parrot Abstract Syntax Tree (PAST) represents a program written in Squaak (or any other Parrot-ported language), and consists of nodes. In the previous episode, we already saw nodes to represent string and integer literals, identifiers and "operator" nodes, in our case assignment.

Other operators represent other high-level language constructs such as conditional statements, loops, and subroutine invocation. Depending on the node type, a PAST node can take child nodes. For instance, a PAST node to represent an if-statement can have up to three child nodes. The first child node represents the condition; if true, the second child node is evaluated. If the condition evaluates to false, and there's a third child node, this third child node is evaluated (the else part).

If the PAST represents a subroutine invocation, the child nodes are evaluated in a different way. In that case, the first child node represents the subroutine that is to be invoked (unless the  attribute was set on this node, but more on that in a later episode), and all other child nodes are passed to that subroutine as arguments.

It generally doesn't matter of which PAST node type the children are. For instance, consider a language in which a simple expression is a statement:

42

You might wonder what kind of code is generated for this. Well, it's really very simple: a new  node is created (of a certain type, for this example that would be ' '), and the value is assigned to this node. It might seem a bit confusing to write something like this, as it doesn't really do anything (note that this is not valid Squaak input):

if 42 then "hi" else "bye" end

But again, this works out correctly; the "then" and "else" blocks are compiled to instructions that load that particular literal into a  node and leave it there. That's fine, if your language allows such statements.

The point I'm trying to make is, that all PAST nodes are equal. You don't need to think about the node types if you set a node as a child of some other parent node. Each PAST node is compiled into a number of PIR instructions.

Go with the control-flow
Now you know a bit more on PAST nodes, let's get our hands dirty and implement some more statement types. In the rest of this episode, we'll handle if-statements and throw-statements.

If-then-else
The first statement we're going to implement now is the if-statement. An if-statement has typically three parts (but this of course depends on the programming language): a conditional expression, a "then" part and an "else" part. Implementing this in Perl 6 rules and PAST is almost trivial:

Note that the optional  block is stored in the match object's " " field. If we hadn't written this  part, then   would have been an array, with   the " " part, and   the optional   part. Assigning the optional  block to a different field, makes the action method slightly easier to read.

Also note that the statement rule has been updated; a statement is now either an assignment or an if-statement. As a result, the action method statement now takes a key argument. The relevant action methods are shown below:

That's, easy, huh? First, we get the result objects for the conditional expression and the then part. Then, a new  node is created, and the   is set to 'if', meaning this node represents an if-statement. Then, if there is an "else" block, this block's result object is retrieved and added as the third child of the PAST node. Finally, the result object is set with the make function.

Result objects
At this point it's wise to spend a few words on the   function, the parse actions and how the whole PAST is created by the individual parse actions. Have another look at the action method if_statement. In the first two lines, we request the result objects for the conditional expression and the "then" block. When were these result objects created? How can we be sure they're there? The answer lies in the order in which the parse actions are executed. The special   symbol that triggers a parse action invocation, is usually placed at the end of the. For this input string: " " this implies the following order:


 * 1) parse TOP
 * 2) parse statement
 * 3) parse if_statement
 * 4) parse expression
 * 5) parse integer
 * 6) create
 * 7) parse block
 * 8) parse statement
 * 9) parse assignment
 * 10) parse identifier
 * 11) create
 * 12) parse integer
 * 13) create
 * 14) create
 * 15) create   (in action method block)
 * 16) create
 * 17) create   (in action method TOP)

As you can see, PAST nodes are created in the leafs of the parse tree first, so that later, action methods higher in the parse tree can retrieve them.

Throwing Exceptions
The grammar rule for the "throw" statement is really quite easy, but it's useful to discuss the parse action, as it shows the use of generating custom PIR instructions. First the grammar rule:

rule throw_statement { 'throw' {*}   }

I assume you know how to update the "statement" rule by now. The throw statement will compile down to Parrot's "throw" instruction, which takes one argument. In order to generate a custom Parrot instruction, the instruction can be specified in the :pirop attribute when creating a PAST::Op node. Any child nodes are passed as arguments to this instruction, so we need to pass the result object of the expression being thrown as a child of the PAST::Op node representing the "throw" instruction.

method throw_statement($/) { make PAST::Op.new( $( $ ),                          :pirop('throw'),                           :node($/) ); }

What's Next?
In this episode we implemented two more statement types of Squaak. You should get a general idea of how and when PAST nodes are created, and how they can be retrieved sub (parse) trees. In the next episode we'll take a closer look at variable scope and subroutines.

In the mean time, I can imagine some things are not too clear. In case you're lost, don't hesitate to leave comment, and I'll try to answer (as far as my knowledge goes).