Irony - Language Implementation Kit/Grammar/Terminals

Terminals are the tokens identified by a scanner and passed to the parser. Irony provides a handful of key terminals that are found in almost every programming language (comments, identifiers, string literals, etc.).

Standard terminals
These terminals are already defined in the Grammar base class:

Empty

Used to identify an optional element in a non-terminal.

Eof

Identifies the end of file. Using Eof in grammar rules is optional; the parser automatically adds this symbol as a lookahead to the Root non-terminal.

LineStartTerminal

Used as a line-start indicator.

SyntaxError

Used for error tokens

The following terminals are used in indentation-sensitive languages such as Python. They are not produced by the scanner; the CodeOutlineFilter produces them after scanning and before parsing:

NewLine

Indent

Indicates an increase in indentation (the start of an indented block).

Dedent

Indicates a decrease in indentation (the end of an indented block).

Eos

End-of-statement terminal, used in indentation-sensitive languages to signal the end of a statement. It is not always synchronized with CR/LF characters; the CodeOutlineFilter produces Eos tokens (as well as Indent and Dedent) from the line and column information of the incoming content tokens.
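To make the Indent/Dedent/Eos idea concrete, here is a simplified Python sketch of what an outline filter does. It is not Irony's CodeOutlineFilter: the function name outline_tokens and the (kind, text) token tuples are hypothetical. It assumes space-only indentation and emits one EOS per non-blank line, so unlike the real filter it ignores line continuations (one reason Eos is not always synced with line breaks).

```python
def outline_tokens(lines):
    """Turn source lines into (kind, text) tokens with INDENT/DEDENT/EOS markers."""
    stack = [0]  # indentation widths of the currently open blocks
    out = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no outline information
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append(("INDENT", ""))
        else:
            while width < stack[-1]:
                stack.pop()
                out.append(("DEDENT", ""))
            if width != stack[-1]:
                raise SyntaxError(f"inconsistent dedent to column {width}")
        out.append(("LINE", line.strip()))
        out.append(("EOS", ""))  # simplified: one end-of-statement per line
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        out.append(("DEDENT", ""))
    return out
```

For the input `["if x:", "    y", "z"]` the token kinds come out as LINE, EOS, INDENT, LINE, EOS, DEDENT, LINE, EOS. Note that the outline tokens are derived purely from layout, which is why they can be produced after scanning rather than by the scanner itself.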

CommentTerminal
The comment terminal lets you declare what counts as a comment in your language. Most languages provide at least a line comment, and many also support block comments.

To set up either kind of comment, declare a new CommentTerminal instance and set its start and end symbols.

Example Line Comment:

Example Block Comment:

If you want comments to be recognized but otherwise ignored, so that they are not passed to the parser and don't show up in your parse tree, add them to the grammar's NonGrammarTerminals list.
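The start/end symbol idea, and what being a non-grammar terminal means, can be sketched in Python. CommentSpec and scan are hypothetical names, not Irony's CommentTerminal API. Passing comments_are_non_grammar=True plays the role of adding the comment terminals to NonGrammarTerminals: comments are still recognized, but no tokens for them are handed on.

```python
from dataclasses import dataclass


@dataclass
class CommentSpec:
    """Hypothetical stand-in for a comment terminal: a start symbol plus end symbols."""
    name: str
    start: str
    ends: tuple
    is_line: bool = False  # line comments stop before the line break

    def match(self, text, pos):
        """Return the comment lexeme starting at pos, or None if none starts there."""
        if not text.startswith(self.start, pos):
            return None
        body = pos + len(self.start)
        hits = [(text.find(e, body), e) for e in self.ends]
        hits = [(i, e) for i, e in hits if i != -1]
        if not hits:
            if self.is_line:
                return text[pos:]  # a line comment may run to end of input
            raise SyntaxError(f"unclosed {self.name} at {pos}")
        i, e = min(hits)  # the earliest end symbol wins
        return text[pos:i] if self.is_line else text[pos:i + len(e)]


def scan(text, comments, comments_are_non_grammar=True):
    """Split text into whitespace-separated words, recognizing comments on the way."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for spec in comments:
            lexeme = spec.match(text, pos)
            if lexeme is not None:
                if not comments_are_non_grammar:
                    tokens.append((spec.name, lexeme))
                pos += len(lexeme)
                break
        else:
            # ordinary word: stops at whitespace or where a comment begins
            end = pos
            while (end < len(text) and not text[end].isspace()
                   and not any(text.startswith(c.start, end) for c in comments)):
                end += 1
            tokens.append(("word", text[pos:end]))
            pos = end
    return tokens
```

With a line comment declared as `CommentSpec("line_comment", "//", ("\n", "\r"), is_line=True)` and a block comment as `CommentSpec("block_comment", "/*", ("*/",))`, scanning `"x = 1 // note\ny /* skip\nme */ z"` yields only the words x, =, 1, y and z.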

ConstantTerminal
This terminal lets you declare a set of constants in the input language.

Use it when constant symbols do not look like normal identifiers. For example, in Scheme #t and #f are the true/false constants, and they don't fit Scheme's identifier pattern.
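Conceptually, a constant terminal is a lookup table from lexemes to values. The Python sketch below uses a hypothetical ConstantTable class, not Irony's ConstantTerminal. It tries the longest lexeme first so that a longer constant such as #true is not cut short by #t. A real terminal would also check that a delimiter follows the constant; the sketch skips that check.

```python
class ConstantTable:
    """Hypothetical constant terminal: maps source lexemes to runtime values."""

    def __init__(self):
        self._table = {}

    def add(self, lexeme, value):
        self._table[lexeme] = value

    def match(self, text, pos):
        """Return (lexeme, value) for the longest constant starting at pos, or None."""
        for lexeme in sorted(self._table, key=len, reverse=True):
            if text.startswith(lexeme, pos):
                return lexeme, self._table[lexeme]
        return None
```

For example, after adding "#t" → True and "#f" → False, matching at the `#f` in `(if #f 1 2)` returns `("#f", False)`.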

IdentifierTerminal
The identifier terminal recognizes names (such as variables) written in the usual way, starting with a letter or underscore and containing only letters, digits, and underscores. It can also be configured to recognize other, non-standard naming conventions.
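The default rule, and the idea of configuring it, can be expressed as a regular expression. The Python sketch below is not Irony's IdentifierTerminal. make_identifier_pattern and its extra_start/extra_rest parameters are hypothetical knobs for admitting extra characters (here, Lisp-style names such as $list-empty?). The sketch also sticks to ASCII letters, whereas a real terminal may accept Unicode letters.

```python
import re


def make_identifier_pattern(extra_start="", extra_rest=""):
    """Build a regex: letter/underscore first, then letters, digits, underscores.

    extra_start / extra_rest are hypothetical options that add characters
    allowed in the first position / in the remaining positions.
    """
    start = "A-Za-z_" + re.escape(extra_start)
    rest = "A-Za-z0-9_" + re.escape(extra_rest)
    return re.compile(f"[{start}][{rest}]*")
```

With the defaults, `_count1` matches and `1count` does not. Matching `total+1` stops after `total`, which is what lets the scanner hand `+` to the next terminal.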

NumberLiteral
The built-in number literal terminal recognizes many numeric formats, from simple integers (e.g. 1) and decimals (e.g. 1.0) to numbers in scientific notation (e.g. 1.1e2).
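The three formats mentioned above can be captured by one regular expression plus a choice of value type. The Python sketch below uses a hypothetical scan_number function and is not Irony's NumberLiteral, which is configurable in ways this sketch ignores. For example, the sketch handles no signs, hexadecimal prefixes, or type suffixes.

```python
import re

# Integer part, optional fraction, optional exponent: covers 1, 1.0 and 1.1e2.
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def scan_number(text, pos=0):
    """Return (lexeme, value) for a number starting at pos, or None."""
    m = NUMBER.match(text, pos)
    if m is None:
        return None
    lexeme = m.group()
    # no fraction or exponent -> integer value, otherwise floating point
    is_int = not any(c in lexeme for c in ".eE")
    return lexeme, (int(lexeme) if is_int else float(lexeme))
```

Usage: `scan_number("1.1e2")` returns `("1.1e2", 110.0)`, while `scan_number("x = 42", 4)` returns `("42", 42)` with an int value.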

StringLiteral
Use this terminal to identify string literals; just set the start/end character(s).

One of the most useful features of the StringLiteral terminal is its ability to treat a string as a template and resolve expressions embedded in it, as in Ruby. Set the IsTemplate option, then provide a StringTemplateSettings object that tells the terminal how to find those expressions. Your expression root (the non-terminal used to resolve the embedded expressions) must also be added to the SnippetRoots list.

In this example, a new StringTemplateSettings is created where any expression surrounded by curly braces ({ and }) is treated as an expression ("expression" being the non terminal acting as the root expression):
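Outside Irony, the same mechanism can be sketched in Python. The hypothetical render_template function below treats text between `{` and `}` as an embedded expression and passes it to an evaluate callback, which stands in for the expression root non-terminal. This is not Irony's StringTemplateSettings API, and it does not handle nested braces or escaped tags.

```python
def render_template(template, evaluate, start_tag="{", end_tag="}"):
    """Replace each start_tag...end_tag span with str(evaluate(expression_text))."""
    out, pos = [], 0
    while True:
        i = template.find(start_tag, pos)
        if i == -1:
            out.append(template[pos:])  # trailing literal text
            return "".join(out)
        j = template.find(end_tag, i + len(start_tag))
        if j == -1:
            raise SyntaxError(f"unclosed {start_tag!r} at index {i}")
        out.append(template[pos:i])  # literal text before the expression
        expr = template[i + len(start_tag):j]
        out.append(str(evaluate(expr)))  # evaluate stands in for the expression root
        pos = j + len(end_tag)
```

With `start_tag="#{"` the same function handles Ruby's own `#{...}` syntax. For example, `"#{n} items"` becomes `"3 items"` when `n` evaluates to 3.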

Keywords
Keyword terminals can be declared in two ways: explicitly, by calling the ToTerm method in a variable declaration, or implicitly, within production rules.
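Both styles can coexist if asking for the same keyword text twice returns the same terminal. The Python sketch below models that with a hypothetical GrammarSketch.to_term method. It is not Irony's ToTerm. The case-insensitive default is an assumption, chosen because SQL keywords are case-insensitive.

```python
class KeyTerm:
    """Hypothetical keyword/operator terminal identified by its text."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"KeyTerm({self.text!r})"


class GrammarSketch:
    """Hypothetical keyword registry illustrating a ToTerm-style helper."""

    def __init__(self, case_sensitive=False):
        self.case_sensitive = case_sensitive
        self.key_terms = {}

    def _key(self, text):
        return text if self.case_sensitive else text.upper()

    def to_term(self, text):
        # Reuse an existing terminal so that explicit and implicit declarations
        # of the same keyword refer to one and the same object.
        key = self._key(text)
        if key not in self.key_terms:
            self.key_terms[key] = KeyTerm(text)
        return self.key_terms[key]

    def classify(self, word):
        """A scanned word is a keyword if registered, otherwise an identifier."""
        return self.key_terms.get(self._key(word), ("identifier", word))
```

Here `SELECT = g.to_term("SELECT")` (explicit) and a rule that calls `g.to_term("select")` inline (implicit) end up holding the same KeyTerm object.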

Explicit declaration of the keyword SELECT in SQL and then its use in the SELECT statement production:

Implicit declaration inside a SELECT statement production in SQL:

Operators
You define operators as terminals in the same way you would with keywords. You define the associativity and precedence of those operators using the RegisterOperators method in the base Grammar class.
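To show what precedence and associativity mean, here is a Python sketch that evaluates a flat token list using precedence climbing. The OPS table is hypothetical and only mirrors the kind of information RegisterOperators records; in this table a higher number binds tighter. Irony applies such settings inside its own LALR parser. Precedence climbing is used here only because it is short.

```python
import operator

# Hypothetical operator table: symbol -> (precedence, associativity, function).
OPS = {
    "+": (1, "left", operator.add),
    "-": (1, "left", operator.sub),
    "*": (2, "left", operator.mul),
    "/": (2, "left", operator.truediv),
    "**": (3, "right", operator.pow),
}


def evaluate(tokens):
    """Evaluate a flat list like [2, '+', 3, '*', 4] using precedence climbing."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]  # operands are plain numbers in this sketch
        pos += 1
        while pos < len(tokens) and OPS[tokens[pos]][0] >= min_prec:
            prec, assoc, fn = OPS[tokens[pos]]
            pos += 1
            # Left-associative: the right operand must bind strictly tighter.
            # Right-associative: an operator of the same level may nest to the right.
            right = parse(prec + 1 if assoc == "left" else prec)
            left = fn(left, right)
        return left

    return parse(1)
```

With this table, `10 - 4 - 3` groups as `(10 - 4) - 3 = 3`, while `2 ** 3 ** 2` groups as `2 ** (3 ** 2) = 512`.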

Example indicating associativity and precedence of simple binary operators:

Punctuation
You can tell the scanner and parser which terminals serve as punctuation in your language by using the MarkPunctuation method of the base Grammar class. Typically these are terminals such as parentheses or curly braces.
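The Python sketch below assumes that marked punctuation is left out of the parse tree. Once the tree's shape captures the grouping, the parentheses and braces themselves add nothing. The tuple-based tree and prune_punctuation are hypothetical and are not Irony's ParseTreeNode API.

```python
# Terminal texts treated as punctuation (analogous to the terms you would mark).
PUNCTUATION = {"(", ")", "{", "}"}


def prune_punctuation(node):
    """Copy a (label, children) tree, leaving out punctuation leaves."""
    if isinstance(node, str):  # leaf: a token's text
        return node
    label, children = node
    kept = [prune_punctuation(c) for c in children
            if not (isinstance(c, str) and c in PUNCTUATION)]
    return (label, kept)
```

For example, `("block", ["{", ("call", ["f", "(", "x", ")"]), "}"])` becomes `("block", [("call", ["f", "x"])])`.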

Example indicating what terminals act as punctuation (LPAREN, RPAREN, LBRACE, and RBRACE assumed to be KeyTerm objects defined beforehand):

Custom Terminals
You can create your own terminals if the built-in ones don't fit your needs: extend Irony.Parsing.Terminal and go from there. You can also extend a built-in terminal when it needs small adjustments for your language that you can't make just by setting its existing properties.
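The two approaches, writing a terminal from scratch and adjusting an existing one, can be sketched in Python. Terminal, try_match, HexColorTerminal, IdentifierSketch and NoDunderIdentifier are all hypothetical names. Irony's Terminal base class has its own matching method and signatures, so treat this only as an outline of the pattern.

```python
import re
from abc import ABC, abstractmethod


class Terminal(ABC):
    """Hypothetical base class: a named terminal that tries to match at a position."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def try_match(self, text, pos):
        """Return the matched lexeme, or None if this terminal does not apply."""


class HexColorTerminal(Terminal):
    """From scratch: CSS-style colors such as #1a2B3c (exactly six hex digits)."""
    HEX = set("0123456789abcdefABCDEF")

    def try_match(self, text, pos):
        candidate = text[pos:pos + 7]
        if len(candidate) == 7 and candidate[0] == "#" and all(c in self.HEX for c in candidate[1:]):
            return candidate
        return None


class IdentifierSketch(Terminal):
    """Stand-in for a built-in identifier terminal."""
    PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

    def try_match(self, text, pos):
        m = self.PATTERN.match(text, pos)
        return m.group() if m else None


class NoDunderIdentifier(IdentifierSketch):
    """Small adjustment by subclassing: reject names starting with '__'."""

    def try_match(self, text, pos):
        lexeme = super().try_match(text, pos)
        return None if lexeme and lexeme.startswith("__") else lexeme
```

Subclassing keeps the inherited matching logic and changes only the part that differs, which is usually less work than writing a terminal from scratch.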