Parrot Virtual Machine/Optables and Expressions

Optables and Expressions
Top-down parsing has some problems. First, it's slow because it may need to try many alternatives before it finds a match. Each time it fails on a match, it needs to return to the previous rule and try again. Another problem is that parse trees can become very large for statements that have lots of little tokens, operator precedence, and many subrules. Consider a simple mathematical expression "1 * 2 + 5" and the following grammar:

EXPRESSION := TERM ADD_OP TERM TERM := (NUMBER MUL_OP NUMBER) | NUMBER ADD_OP := '+' | '-' MUL_OP := '*' | '/'

Our equation above creates a very complex parse tree:

EXPRESSION TERM NUMBER 2 MUL_OP "*" NUMBER 5 ADD_OP "+" TERM NUMBER 5

For things like a mathematical expression, we would like to use a bottom-up parser instead of a top-down one. A special type of bottom-up parser, called an operator precidence table parser is particularly efficient and well-suited for these tasks. To create an operator precedence table, PGE has a special tool called an.

Creating Optables
An optable is created by creating a rule that. Here is an example:

rule expression is optable { ... }

The  is not an omission, it's part of the rule. If you don't have the  three dots, your rule will not work. This is called the yada yada yada operator, and is used to define a function prototype. PGE will instantiate the function for you, using the options you specify. The keyword  tells the parser to convert to bottom-up mode to parse expressions

Operator Prototypes
In an optable, tokens are specified using the  keyword instead of the   or   keywords. A  definition must define several things: The name and type of the operator, the precidence of the operator, and the PIR code that is used to perform the particular operation. Here are some examples:

proto 'infix:+' is precidence('1') is pirop('add') { ... } proto 'prefix:-' is tighter('infix:+') is pirop('neg') { ... } proto 'postfix:++' is equal('prefix:-') { ... }

These examples show how to parse three common operators: A "+" between two values, a negative value for a negative number (such as "-123") and a post-increment operator ("x++"). There are lots of important keywords here, and this isn't even all the keywords you will need to produce a full expression parser.

Operator Types
There are four types of operator: prefix, postfix, infix, and circumfix. Here are some examples:

Operator Precedence
Bottom-up parsers, as we mentioned before, suffer from a particular problem called shift-reduce conflicts. A shift-reduce conflict occurs when a parser cannot determine whether to shift in a new input token, or reduce the tokens it already has. Here is an example from the C programming language:

-x + 2 * y

We know, because we've been taught about operator precedence since we were children, that the expression is grouped like this:

(-x) + (2 * y)

However, a computer doesn't know this unless we tell it what precedence the operators follow. For instance, without knowing the order in which operators act, a parser might group the expression on a first-come, first-served basis like this:

((-x) + 2) * y

Or maybe it would shift in all tokens and start reducing from the right:

-(x + (2 * y)

The point is, we need to create rules to tell the operator precedence parser how to parse these conflicts. Here are the precedence rules that we can specify:

We didn't show this in the example, but it's also possible to determine whether operators act to the left or to the right. We define this with the  keyword. The following table summarizes these rules:

Operator Actions
There are three ways to specify how an operator behaves. The first is to associate the operator with a PIR opcode. The second is to associate the operator with a PAST node type. The third and final way to specify the actions of an operator is to create a function. The first two methods can be performed with the  and   specifiers, respectively.

Operators
For simple operations, it doesn't make any sense to create and call a function. This is especially true for simple arithmetic operations which can be performed very quickly using PASM opcodes that Parrot already defines. Use the code  to implement an operator as a single opcode.

Operators
Some operators, such as logical operators, can be best implemented using types of nodes that PAST already provides. Use the  form to specify the type of PAST node to use.

Operators as Functions
By default, unless you specify an operator as  or , the operator will default to being a function. Most operators are therefore defaulted to be functions, which is very good for languages which intend to allow operator overloading.

Proto Regexes
We will use the term "Proto", "Proto Regex" or "Protoregex" on occasion in this book, and it is used often in the documentation for Perl 6 as well. Protos are like rules except they offer something similar to multiple dispatch: You can have multiple rules of the same name that depend on the operands passed to it. By declaring an operator as a proto, you are giving it a name and you are also allowing future additions to be made to the language. This is an imporant feature for languages which utilize operator overloading.

Protos are defined with the  keyword. They can be used in either the top-down or bottom-up parser, although because of their most common use as a support mechanism for operator overloading, they are most often used in the bottom-up operator precidence parser. To define a new proto, use the form:

proto NAME OPTIONS { ... }

In this declaration, the  is a literal piece of code: you actually must type the three dots (ellipsis) between the curly brackets. What this does is alerts the parser that this is not an implementation of the rule itself, it is actually just a signature or a forward-declaration for other functions. For instance, we can define a proto MyProto:

proto MyProto { ... }

And in another file we can define functions to implement this:

.sub MyProto # Insert the parameter list and function logic here .sub

.sub MyProto      # Another MyProto! # Insert a different parameter list and different function logic here .end

Parsing Terms
Terms are the values between operators. In the following expression:

1 + 2 * 3

The numbers 1, 2, and 3 are all terms. Terms are parsed using the ordinary rules, and are loaded into the operator precedence table using the following syntax:

proto 'term:' is precedence('1') is parsed(&term) { ... }

Of course, you can set the precedence however you see fit, but it's typically best to make terms parse as tightly as possible. The  attribute specifies that terms are to be parsed using a rule named "term". You can define this term using ordinary rule syntax. The  symbol is the sigil that NQP uses to refer to a subroutine. In this case, the subroutine in question is the grammar method that is intended to parse terms (but it can conceivably be anything else). We will learn more about NQP and sigils in the following chapter.

Terms are the way the parser switches back from bottom-up to top-down parsing.

Expression and Term Methods
Once we've created our parser, we need to create methods associated with them to construct the necessary PAST nodes for the parse tree. Lucky for us, these methods are easy to create, and typically the best option is to use a basic template from another language implementation.

Terms are defined like any other rule, and don't require any special syntax. Expressions, however, do require a little bit of special attention.