Compiler Construction/About the Book

Preface
The purpose of this book is to provide practical advice on writing a compiler, together with some practical examples of both compilers and interpreters, in order to break away from the concept that building compilers and interpreters are impossible tasks.

For basic information on the subject, look up the References or External Links Sections of the book and the Glossary for the relevant Wikipedia articles.

Some theory is unavoidable, but has been kept to a minimum. If you search the Web for "compiler construction" you will find lots of information and many different approaches. All this book can do is demonstrate a few of the many possible ways of constructing a compiler. After going through this book you should be better able to evaluate other methods.

''This is a Wiki. Readers, please contribute: fix a spelling mistake, rephrase a sentence, expand a paragraph, add a section, write a complete chapter.''

This book has been written in accordance to several goals. The book must be:
 * useful and practical - there are several good compiler books available to us, in particular the Dragon Book. However, while technologies for scanners or syntax analyzers have been well-developed and known, we believe that there is a need for a book that covers, for example, script languages or sophisticated dynamic compilation with byte-code.
 * non-trivial - this is related to the first goal.
 * provides a good coverage of object-oriented languages -
 * have a lot of references -
 * updated - one advantage of an online textbook is that it can be updated to cover new languages or implementation technologies.

We wish the book helps you create a compiler or any tool employing compiler technology.

If you are not familiar with terminology or basic concepts in compilers, you should read the Introduction Chapter since the rest of the book will assume it as background information.

Assumptions
To get the most out of this book, your background/experience/capabilities should include most of the following:
 * Appropriate Background
 * Be familiar with the basic computer architecture, components and execution model.
 * Have a few years experience in one or (preferably) more high-level programming languages.
 * Be able to cope with programs containing several thousand lines of source code.
 * You don't need any specific mathematical background, though some acquaintance with simple algebra might be helpful.
 * Programming requires a logical mind capable of precise attention to detail.
 * Be prepared to learn about various computer science concepts. In particular, you should be able to understand EBNF as described in Section 6 of this chapter.  Other concepts will be introduced as required, e.g. stacks, recursion.
 * Be prepared to learn about simple assembly languages and about various aspects of computer architecture and addressing modes.

Even if you do not have the background given above you should still be able to pick up a few ideas by reading the text.

You may be able to implement some or all of the programs supplied in this book. If you have some high-level programming experience you should also be able to play around at extending the capabilities of the interpreter.

Since this is a freely available book, it seems appropriate to use some freely available implementation language.
 * Implementation Software

I have chosen to use a subset of C++ in conjunction with the GNU GCC compiler. The subset corresponds roughly to using C++ as a better C, including use of standard C++ libraries, but not getting into the object-oriented side of C++. If you use GNU/Linux then this compiler will/should already be available to you. Otherwise you will have to search the World Wide Web for a free version which will run on your system.

If you use Microsoft Windows then you might like to consider Code::Blocks, a free IDE (Integrated Development Environment) with the MinGW compiler(a port of GCC), which is available from www.codeblocks.org.

To reduce the complexity of this book, we will make some simplifying assumptions about the source language which we write a compiler for: In later chapters we will give more consideration to the effects of relaxing these assumptions.
 * Simplify the Source Language
 * 1) The source language does not allow subroutine declarations to be nested.
 * 2) All identifiers must be declared before use.
 * 3) Syntax analysis of the source language can be done in one pass.
 * 4) During syntax analysis, we need only look at the next source token to decide what to do next.

In practical terms these assumptions mean that the source language is easy to translate using simple techniques, and that some run-time complexity has been eliminated.
 * The first assumption rules out Algol 60 and Pascal.
 * The second assumption allows one-pass compilation.
 * The third assumption rules out Algol 68; according to Hunter, up to 4 passes may be needed for syntax analysis.
 * The second and fourth assumptions rule out Fortran; the following are all valid Fortran statements, but you can't even decide what sort of statement each is until you have looked near/at the end of the statement (in Fortran, spaces are not significant, variables need not be declared, and there are no reserved keywords).

The difference between the last two lines is reported to have caused the loss of a space probe costing millions of dollars.

next - Introducing Compilers and Interpreters

Copyright and Licenses
This page shall be included in any copy of the book.

Any source code included if not bearing a different statement shall be considered under the public domain. Images used have their own copyright status, specified in their respective repositories (en.wikibooks.org or at commons.wikimedia.org).


 * Special contributions
 * Murray Langton Complete restructure of apparently abandoned book.
 * TakuyaMurata; I occasionally add some content. I believe that while it likely takes years to complete the book, we can still produce pieces of work useful for those interested in compilers.
 * psanand Like to contribute something to this whenever I get time