Compiler Construction/Glossary

Glossary
The glossary is intended to provide definitions of words or phrases which relate particularly to compiling, not to provide definitions of general computing jargon, for which a reference to Wikipedia may be more appropriate.

If the word or phrase you want is not here:
 * 1) Try looking it up on Wikipedia.
 * 2) Try searching the World Wide Web using Google or similar.

If you find the word or phrase, consider adding it to this glossary with explanation and/or references.

If you don't find the word or phrase, or can't understand what you did find, add the word or phrase itself to this glossary without an explanation, and leave it for someone else to explain.

When adding an item
When adding a word or phrase, please keep things more or less in alphabetical order to make it easier for others to find things.

The following guidelines will keep this glossary consistent (just follow the same style as some existing entry):
 * The key word or phrase should be preceded by a blank line, in bold, and on a line by itself.
 * The actual definition/description should be indented.
 * Any cross-references to other items in this glossary should be in italic.
 * The alphabetical splits can be refined as required to keep the number of entries in each section below about 20.

Wikipedia

 * Addressing mode
 * Algol 60
 * Assembly/Assembler language
 * Tiny BASIC language
 * BASIC
 * BNF: Backus-Naur Form
 * Bootstrapping
 * C
 * C++
 * Chomsky
 * Chomsky hierarchy
 * Compilers
 * Computer architecture
 * CPU design
 * EBNF: Extended BNF
 * FORTRAN
 * High-level language
 * Low-level language
 * Interpreters
 * Pascal
 * PL/0: Wirth's example language and compiler from Algorithms + Data Structures = Programs
 * PL/I
 * Recursion
 * Side effect
 * Ada

A to I
checking for meaning
 * This is an alternative phrase for semantic analysis.

comment
 * Part of a source program which is intended for a human reader and which is ignored by the compiler. As a minimum, all variables and all routines should have explanatory comments.  A comment may attempt to explain nearby code, but problems ensue if the code is updated and the comment is not. Ideally, comments should tell WHAT code does, not HOW; the code should be self-explanatory.

compilation
 * The act of translation performed by a compiler.

compiler
 * A computer program that translates a computer program written in one computer language (called the source language) into an equivalent program written in another computer language (called the target language).
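 The translation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source language is a tiny subset of arithmetic expressions (parsed here with Python's standard `ast` module rather than a hand-written parser), and the target language is a made-up stack-machine notation whose mnemonics (`PUSH`, `LOAD`, `ADD`, ...) are hypothetical.

```python
import ast

# Hypothetical stack-machine mnemonics; not a real instruction set.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}

def compile_expression(source):
    """Translate an arithmetic expression into a list of stack-machine instructions."""
    def emit(node):
        if isinstance(node, ast.Constant):
            return [f"PUSH {node.value}"]          # literal operand
        if isinstance(node, ast.Name):
            return [f"LOAD {node.id}"]             # variable operand
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            # Operands first, then the operator (postfix order).
            return emit(node.left) + emit(node.right) + [OPCODES[type(node.op)]]
        raise ValueError("unsupported construct")
    return emit(ast.parse(source, mode="eval").body)

print("\n".join(compile_expression("a + 2 * b")))
```

 Note that nothing is calculated here: the output is an equivalent program in the target notation, which would still have to be run by something else.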

compile-time
 * The period during which a program is being compiled, as distinct from run-time when a program is actually running. Certain errors can be detected at compile-time (e.g. misspelt identifier) while other errors may not be detectable until run-time (e.g. division by 0).
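 The distinction can be illustrated in Python, whose built-in `compile()` function performs the translation step separately from execution. This is a sketch rather than a general rule: Python detects fewer errors at compile-time than a statically checked language would (a misspelt identifier, for instance, is only reported at run-time in Python), so a syntax error stands in for the compile-time error here. The function name `classify` is an assumption made for this example.

```python
def classify(source):
    """Report whether `source` fails at compile-time, at run-time, or not at all."""
    try:
        code = compile(source, "<example>", "exec")  # translation only, nothing runs
    except SyntaxError:
        return "compile-time error"
    try:
        exec(code, {})  # now the program actually runs
    except ZeroDivisionError:
        return "run-time error"
    return "ok"

print(classify("x = = 1"))    # malformed statement
print(classify("x = 1 / 0"))  # well-formed, but fails when executed
print(classify("x = 1 / 2"))  # well-formed and runs normally
```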

GNU
 * A recursive acronym for "GNU's Not Unix". GNU is an independently written, Unix-like collection of free software which can be combined with the Linux kernel to produce GNU/Linux, a useful free operating system.

GPL
 * GNU General Public License. Much copyrighted free software, including GNU/Linux, is available under this licence. The general intention is to ensure that such software remains freely available for anyone to use and/or modify.

identifier
 * A name used in a program. Normally some (possibly unpronounceable) combination of letters and digits which starts with a letter.  Some programming languages may allow other characters such as underscore '_' or hash '#'.  Identifiers may be defined by the programmer or they may be pre-defined by the language (e.g. 'sqrt' is often a pre-defined identifier for referring to a 'square-root' function).  Some older programming languages limited identifiers to a maximum of 6 characters; the resulting abbreviations tended to make programs more difficult to understand.  Many programming languages allow identifiers of almost any length, though some may only take the first 32 characters into account.  There are two main styles of writing long identifiers: either  now_for_some_long_name  or  NowForSomeLongName.

implementation language
 * This is the programming language in which the compiler or interpreter is written. It might be the same as either the source language or the target language.

intermediate language
 * Some moderately low-level language used as an interface between the front-end and the back-end of a compiler.

interpreter
 * A computer program which examines a computer program written in some source language and carries out the actions required by that program more or less directly, without translating it into some other language. An interpreted program may well run more slowly than a compiled one, especially if there is a lot of calculation.
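 A minimal sketch of this idea in Python, assuming a source language restricted to simple arithmetic expressions (parsed with Python's standard `ast` module for brevity). The function name `interpret` and the `variables` dictionary are choices made for this example. The point to notice is that no program in another language is produced; the result is worked out directly while the expression is examined.

```python
import ast
import operator

# Maps each supported operator node to the function that carries it out.
OPERATIONS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def interpret(source, variables):
    """Evaluate an arithmetic expression directly, using `variables` for names."""
    def evaluate(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATIONS:
            return OPERATIONS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported construct")
    return evaluate(ast.parse(source, mode="eval").body)

print(interpret("a + 2 * b", {"a": 1, "b": 3}))  # prints 7
```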

J to R
just-in-time compilation
 * A cross between a compiler and an interpreter. The source program is parsed and translated to machine language at run-time, and the translated code is run immediately. The underlying machine then executes the generated code directly, whereas an interpreter carries out the program's actions itself.

keyword
 * Many programming languages reserve some identifiers as keywords for use when indicating the structure of a program, e.g. 'if' is often used to introduce conditional code. The number varies widely: Pascal, C and C++ each reserve roughly 30 to 100 keywords depending on the language and standard, COBOL reserves around 300, and Fortran has keywords but reserves none of them, so they can also be used as ordinary identifiers.

lexical analysis
 * This is an alternative name for scanning or tokenisation. The function of lexical analysis is to scan the source program (a sequence of characters arranged on lines) and convert it to a sequence of valid tokens.  Any comments are usually removed at this stage as well.  Syntax analysis is responsible for checking that it is a valid sequence.
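 As a rough sketch, a scanner for a toy language can be built in Python from regular expressions. The token classes (`IDENT`, `NUMBER`, `SYMBOL`), the `#` comment syntax and the function name `tokenise` are assumptions made for this example; a real language would need many more cases.

```python
import re

# Hypothetical token classes for a toy language, tried in this order.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9_]*"),
    ("SYMBOL", r"<=|>=|[-+*/=<>()]"),   # compound symbols before single characters
    ("SPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenise(source):
    """Convert a string of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"invalid character {source[pos]!r} at offset {pos}")
        # Comments and white space are recognised but not passed on.
        if match.lastgroup not in ("COMMENT", "SPACE"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenise("Result <= x1 + 42  # compare"))
```

 The scanner only checks that each token is individually valid; it would happily accept a meaningless sequence such as `+ + 42`.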

machine language
 * This is the lowest-level language: the binary-encoded instructions that a computer executes directly. Programs were written by hand in machine language only in the early days of computing, for example to create the first assemblers and compilers.

parsing
 * This is an alternative name for syntax analysis.

pass
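 * A single scan of the source text, or of some intermediate form of it, made while compiling a program. The number of passes is the number of times that the text has to be scanned or rescanned in order to compile the program. Due to limited main memory, some early compilers had a large number of passes (about 60).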

pragma
 * This is sometimes referred to as a significant comment. The intention is that of passing directives to the compiler or the preprocessor at various points in the source program.  Such directives might indicate whether speed or space is more important, or if certain checks should be suppressed, or if a particular routine is to be timed, etc.

preprocessor
 * A program that takes text and performs lexical conversions on it. The conversions may include macro substitution, conditional inclusion, and inclusion of other files. This can be used to write platform-independent code: source text that is not needed on a particular platform is excluded without changing the build instructions.

reserved word
 * This is an alternative phrase for keyword.

run-time
 * The period when a program is running/being executed/doing some hopefully useful work, as distinct from compile-time when the program is being translated.

S to Z
scanning
 * This is an alternative word for lexical analysis or tokenisation.

semantic analysis
 * The function of semantic analysis is to check that the source program is meaningful. Note that a program can have a valid meaning and still be incorrect if it doesn't do what was really intended.
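 Two typical checks, sketched in Python under simplifying assumptions: every name must be declared before it is used, and a value may only be assigned to a variable of the same type. The statement format (tuples of `"declare"` or `"assign"`, a name and a type name) and the function name `check_semantics` are invented for this example.

```python
def check_semantics(statements):
    """Return a list of error messages for a toy program.

    Each statement is a tuple in a hypothetical format chosen for this sketch:
      ("declare", name, type_name)  or  ("assign", name, type_name_of_value)
    """
    declared = {}   # symbol table: name -> declared type
    errors = []
    for kind, name, type_name in statements:
        if kind == "declare":
            if name in declared:
                errors.append(f"{name} declared twice")
            declared[name] = type_name
        elif kind == "assign":
            if name not in declared:
                errors.append(f"{name} used before declaration")
            elif declared[name] != type_name:
                errors.append(
                    f"cannot assign {type_name} to {declared[name]} variable {name}"
                )
    return errors

program = [
    ("declare", "count", "integer"),
    ("assign", "count", "integer"),   # fine
    ("assign", "total", "integer"),   # 'total' was never declared
    ("assign", "count", "string"),    # type mismatch
]
for message in check_semantics(program):
    print(message)
```

 Every statement in `program` is grammatically well formed, so these errors would not be caught by checking the structure of the program alone.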

source language
 * The language accepted as input by a compiler, and translated/compiled into a target language. It is normally a high-level language, written using some mixture of English words and mathematical notation.

syntax analysis
 * This is an alternative name for parsing. The function of syntax analysis is to check that the source program is grammatically correct, i.e. that we have a valid sequence of tokens.  Checking that the source program actually means something is the job of semantic analysis.
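 One common technique for this check is recursive descent, sketched here in Python for a small, assumed grammar of arithmetic expressions: an expression is terms joined by '+' or '-', a term is factors joined by '*' or '/', and a factor is a name, a number, or a parenthesised expression. Tokens are plain strings for brevity, and the function name `is_valid_expression` is a choice made for this example.

```python
def is_valid_expression(tokens):
    """Return True if `tokens` (a list of strings) forms a valid expression."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif tok is not None and tok.isalnum():   # a name or a number
            advance()
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() in ("*", "/"):
            advance()
            factor()

    def expr():
        term()
        while peek() in ("+", "-"):
            advance()
            term()

    try:
        expr()
        if peek() is not None:                    # leftover tokens
            raise SyntaxError(f"unexpected token {peek()!r}")
    except SyntaxError:
        return False
    return True

print(is_valid_expression(["a", "+", "2", "*", "(", "b", "-", "1", ")"]))  # True
print(is_valid_expression(["a", "+", "*", "2"]))                           # False
```

 This sketch only accepts or rejects the token sequence; a parser in a real compiler would normally also build a tree or emit code as it goes.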

target language
 * The language in which the output of a compiler is written. It is normally a low-level language such as assembler, written with somewhat cryptic abbreviations for machine instructions, but may instead be machine code for some actual or virtual computer.

token
 * A fundamental symbol as processed by syntax analysis. A token may be an identifier 'Result', a reserved keyword 'if', a compound symbol '<=', or a single character '+'.

tokenisation
 * This is an alternative word for lexical analysis or scanning. It means 'conversion to tokens'.