Talk:Compiler Construction

organization
I'm not sure this is the best organization for a book on compilers. It's missing a lot of important detail in areas (regular vs. context-free languages, SSA intermediate representation, etc). Plus, there's a lot of room devoted to stuff that is not really compiler related (garbage collection and IDEs for instance).

I think a better organization would be:


 * Introduction
 * * What is a compiler
 * * History of compilers
 * * Brief background in language theory (Chomsky hierarchy)
 * Frontend
 * * Parsing regular languages
 * * * Regular expressions
 * * * Lex
 * * Parsing context-free languages
 * * * shift-reduce
 * * * recursive-descent
 * * * packrat?
 * * * Yacc
 * * Putting it together
 * * * Simple lex/yacc toy language
 * Intermediate Representations
 * * Parse Trees
 * * Semantic Analysis
 * * Abstract Syntax Trees
 * * Advanced Representations
 * * * RTL
 * * * Static Single Assignment (SSA) Form
 * * * Hashed SSA Form
 * Optimizers
 * * Dataflow Analysis
 * * In-depth on optimizations (dead-code, common subexpression, etc.)
 * * Alias Analysis
 * Backends (Code Generators)
 * * Various BURGs
 * * Hand Generation

I'll leave this here for a bit to see what others think and then start refactoring.

--AnthonyLiguori 03:36, 27 May 2005 (UTC)

what language to use
It may be more informative for an intro text to use a language like Scheme, which can be easily presented with and without lex and yacc.

--Matt 15:26, 9 May 2004 (UTC)


 * We do not want to be beholden to just one programming language. Dysprosia 23:15, 9 May 2004 (UTC)

Oh, I wasn't suggesting the book should be "Constructing a Compiler for Scheme." Typically, a text on compilers has alternating theory/practice sections. I was merely suggesting that Scheme be used in the practice sections.

What language would you use?

--Matt 16:45, 17 May 2004 (UTC)

I have limited experience in compiler writing (interactive Basic compiler written in assembler, Pascal-like high level language written in itself, plus a slight acquaintance with flex and bison, cf. lex and yacc). Fluent in Fortran, C, Algol 60, Basic, Pascal, some experience with C++, PL/1, fluent in assembler for 6 different computer architectures (not x86 unfortunately), expect to learn Java this year. I'm willing to have a stab at this wikibook over the next year, but would probably need to redo the existing outline, and would prefer to use C or simple C++ as the implementation language for any examples.

--Murray Langton 22:00 14 Nov 2004 (UTC)


 * Can you elaborate on why the old outline needs to be redone? I think we should start with a small book, so that it is manageable for us. The current outline seems to be too much. Also, I disagree with using C++. We expect that the textbook will be used mostly in college, and nowadays students learn to write code in Java, and many don't know C/C++. Also, the fact that Java is generally machine-independent is quite useful for us. -- TakuyaMurata 23:39, 15 Apr 2005 (UTC)

I don't think there's any need for "implementation examples." Languages should be described in the appropriate notation (regular expressions or EBNF), optimizations should be described via dataflow equations, and so on. Compilers have a rich set of domain specific languages and this shouldn't be hidden.

--AnthonyLiguori 03:39, 27 May 2005 (UTC)
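
For readers unfamiliar with the notation mentioned above, here is a minimal sketch of what "describing an analysis via dataflow equations" can look like, using live-variable analysis as the example. The function name `liveness`, the node numbering, and the three-statement program are all made up for illustration; this is not taken from the book.

```python
# Live-variable analysis stated as two dataflow equations and solved by
# iterating until nothing changes (a fixed point):
#   out[n] = union of in[s] for every successor s of n
#   in[n]  = use[n] | (out[n] - def[n])

def liveness(nodes, succ, use, defs):
    """Return (live_in, live_out) dictionaries keyed by node."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        # Liveness flows backwards, so visiting nodes in reverse converges faster.
        for n in reversed(nodes):
            new_out = set()
            for s in succ.get(n, []):
                new_out |= live_in[s]
            new_in = use[n] | (new_out - defs[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out


# Hypothetical straight-line program:
#   1: a = 1
#   2: b = a + 1
#   3: return b
nodes = [1, 2, 3]
succ = {1: [2], 2: [3], 3: []}
use = {1: set(), 2: {"a"}, 3: {"b"}}
defs = {1: {"a"}, 2: {"b"}, 3: set()}

live_in, live_out = liveness(nodes, succ, use, defs)
```

The two equations in the comment are the whole specification; the loop is only one possible way of solving them, which is arguably the point being made about notation versus implementation.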

3000 lines?
In the first section, we have the following:

"Note that a compiler is a non-trivial computer program; when written completely by hand a non-optimising compiler for a simple source language is likely to be upwards of 3000 lines long. Some compiler-writing tools are available which can reduce this size."

That 3000 is somewhat misleading, especially for a simple source language.

For instance:

Wirth's famous PL/0 (Pascal subset) has been implemented in many different languages - C and Pascal versions usually run around 1000 lines or less.

A Pascal subset, powerful enough to compile itself, has been implemented in less than 1500 lines of Pascal.

Another Pascal subset, Tiny-Pascal, is around 1100 lines of code, and can also compile itself.

At least two C compilers, with enough of the language implemented to self-compile, have been implemented in less than 1000 lines of code each.

Various Basics (with strings and arrays) can be implemented in 1000 lines or less.

All of these are hand-written.

While it can and does take many thousands of lines of code to implement a compiler for a more complicated language (full ANSI-C, C++, Java, etc.), compilers for simple (but useful) languages can easily be implemented in less than 1500 lines of hand-written code.

I realize this is being very picky, but there is no need to make compilers appear to be even more complicated than they already are :-)

--Sam 16:30, 6 July 2005 (UTC)

donation
I have had a small book/large paper on writing compilers on my web site for ten years or so. [http://pages.prodigy.net/j_alan/hitech/compiler/compmain.html]

Feel free to make any use of it, including directly copying and pasting (everybody does it anyway!)

I would especially encourage you to make use of the sections on theory, including using Chomsky Normal Form and Extended Backus-Naur Form to design languages, and converting such forms to recursive-descent parsers.

If you use the code examples, they should probably be converted to a more modern language as I'm not sure how much Pascal is used anymore.

j_alan@prodigy.net

--JamesAlanFarrell 11:55, 28 July 2005


 * Has anything been done about this? Any volunteers? -- Jonmmorgan 00:28, 9 March 2006 (UTC)
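
As a rough illustration of the EBNF-to-recursive-descent conversion mentioned above, here is a minimal sketch in Python. The grammar, the class name `Parser`, and the helper names are invented for this example and are not taken from the donated paper. Each EBNF rule becomes one method, and each `{ ... }` repetition becomes a `while` loop.

```python
import re

# Hypothetical grammar used only for this sketch:
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

def tokenize(text):
    # Simplification: characters that are not digits or operators are ignored.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):      # the { ... } repetition
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            # Integer division, to keep the sketch in whole numbers.
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return value
```

For instance, `evaluate("2+3*(4-1)")` parses the multiplication inside `term` before `expr` applies the addition, so operator precedence falls out of the grammar's layering rather than from any special-case code.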

theory
The opening paragraph of this wikibook claims that theory is kept to a minimum. I don't think this is a good tactic. Theory should be included and sufficiently explained. It is reasonable, however, to separate the "non-essential" theory into optional subsections. For instance, each section could have "Introduction", "main ideas", "theory", and "examples" subsections. --Whiteknight 05:27, 24 August 2005 (UTC)

implementation language
The subject of which language to use is an important one, and I think it deserves its own section. I think that it would be short-sighted to limit this book to any single existing language, because languages come in and out of favor relatively quickly. A book written 10 years ago in C would probably be rewritten today in VB or Java. Unfortunately, a book written exclusively in language X excludes people who don't use language X. I have 2 recommendations: 1) all code examples be written in at least 2 different languages, possibly more for important examples, or 2) all code examples be written in pseudocode (or even wikicode). --Whiteknight 05:21, 24 August 2005 (UTC)


 * My understanding is that most compiler books have a bunch of "toy" languages as examples.
 * But then they have a long chapter on a language powerful enough to write a Self-hosting compiler.
 * There seems to be a tradeoff between a "powerful" language that can describe any particular toy language in a few lines (but the compiler for that language is long and complex), vs. a "simple" language that takes many more lines to describe that toy language (but the compiler for that language is much shorter and easier to understand).


 * Will one of our code examples be one of these mind-bending "self-hosted" compilers? If so, an alternate implementation in some other language will help avoid the ChickenAndEggProblem.
 * --DavidCary 19:00, 24 February 2007 (UTC)

output language

 * What output language should the compilers in this wikibook spit out? --68.0.124.33 (talk) 00:00, 6 January 2009 (UTC)


 * Some options are:
 * * C-- is a "portable machine language" specifically designed for high-level compilers to generate (C--).
 * * MMIX is a fictional machine language specifically designed to be easy to learn (MMIX).
 * * ARM is the machine language used in most 32 bit CPUs (ARM).

so much for organization
I don't think it is time for us to discuss the organization of the book. The book, as of this writing, has almost no content yet, just an outline. I think it works just fine if we simply write more by adding examples, explaining theories, constructing exercise problems etc. We can easily reorganize those materials later on, once we have a better sense of what the book should be. In other words, the book should be what it turns out to be and not necessarily what was planned initially. People have different views about what the book should be, and it would be best if people simply add what they know. -- 219.168.92.42 11:52, 15 November 2005 (UTC)

Very Tiny Basic Lex/Yacc interpreter
I am currently writing an interpreter in C, using Lex/Yacc and Make. The language is that defined in the current Case Study 1. I feel we need an example of a Lex/Yacc project, since it is a common approach. Also, since interpreting Very Tiny Basic is pretty straightforward, we could write interpreters for it in many languages/ways to demonstrate the differences in technique (and try to underline the similarities). Anyway, I plan to make it Compiler Construction:Case Study 1B, although it belongs in a much later place than Case Study 1. I hope to write a useful article with it. Any suggestions would be appreciated. Matt73 03:01, 6 February 2006 (UTC)


 * I think that is a good idea. Maybe it's not necessary to do it in many languages so much as many techniques. A few that come to mind are domain specific languages (e.g. Yacc/Lex), recursive-descent parsers and regexp driven lexers (which is what Yacc and Lex compile to), parser combinators (though I think you might need a declarative language to do those properly) and DCGs (Definite Clause Grammars, used in languages like Prolog and Mercury).


 * I also notice that, while VTB is defined in Backus-Naur form, I don't see anywhere that BNF is actually defined. I also think that a few sample VTB programs are needed, as that shows a language better than BNF.


 * Jonmmorgan 00:26, 9 March 2006 (UTC)


 * I've written a couple of sample programs, but I haven't done any BASIC programming for a long time (which is a good thing). Could someone please review them?


 * Jonmmorgan 23:39, 23 March 2006 (UTC)
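
To make one of the techniques listed above concrete, here is a minimal sketch of a regexp-driven lexer in Python. The token set is a guess at what a small BASIC might need and is NOT the Very Tiny Basic definition from Case Study 1; the names `TOKEN_SPEC`, `MASTER` and `lex` are invented for this example.

```python
import re

# Hypothetical token set for a small BASIC-like language (an assumption,
# not the VTB grammar). Order matters: earlier entries win on a tie.
TOKEN_SPEC = [
    ("NUMBER",   r"\d+"),
    ("KEYWORD",  r"\b(?:LET|PRINT|IF|THEN|GOTO|END)\b"),
    ("IDENT",    r"[A-Z][A-Z0-9]*"),
    ("STRING",   r'"[^"\n]*"'),
    ("OP",       r"<=|>=|<>|[-+*/=<>()]"),
    ("SKIP",     r"[ \t]+"),
    ("MISMATCH", r"."),
]

# One combined pattern with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(line):
    """Split one source line into (kind, text) pairs."""
    tokens = []
    for match in MASTER.finditer(line):
        kind = match.lastgroup
        if kind == "SKIP":
            continue                      # whitespace carries no meaning here
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens
```

Calling `lex('10 LET A = A + 1')` yields a flat list of tagged tokens that a hand-written or generated parser could then consume. The same table of patterns is roughly what a Lex specification expresses, which may make the two approaches easier to compare side by side.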

About the size of hand-written and table-driven compilers: both types may have fewer than 700-1000 lines of code and still compile themselves. To do that, only a limited language subset must be implemented (one-dimensional arrays, functions without arguments, if, while, call and assign statements) and some checks may be omitted.


 * avhohlov, 20 March 2006


 * Avhohlov, I'm not sure who/what you are replying to? Regardless, your note is very interesting. Do you have an example? Thanks --


 * 72.145.148.76 13:34, 20 March 2006 (UTC)

Here [] you can see a small compiler (less than 1000 lines, DOS) and a modified version of Wirth's Pascal-S (less than 2000 lines but much more expressive, P-CODE), both self-compiled. It would not be difficult to translate the small compiler into any widespread language (for example C or Pascal). DOS is obsolete now, and P-CODE may have some benefits, but it requires a virtual machine compiled by another compiler or assembler. Generating a Linux/ELF file may be relatively simple (see next link).

Also you may see OTCC (Obfuscated Tiny C Compiler) at []. Very small but it's a hacker's code.


 * avhohlov, 20 March 2006

Lex/Yacc
I am writing an outline for a proposed book on Lex/Yacc (and Flex/Bison). These two programs are certainly worth writing about, and they can be used for more than just compilers. Here is my proposed outline. If we have a book dedicated to these programs, we won't need to waste time trying to explain them here in the compiler book. Also, I think that some of the contributors here would make good contributors to this new book. Any feedback would be appreciated. --Whiteknight (talk) (current) 22:16, 25 April 2006 (UTC)

It's a very good idea, and would certainly help this book. I feel that this should be a more general book than "How to make a compiler using X technique&tools" and it would be good to be able to give theory and simple examples then refer to the Flex/Yacc book (and hopefully others). Of course, the eventual direction of this book will be decided by whoever ends up writing it (not me, I don't know enough theory). I intend to finish my VTB interpreter, it just needs the interpreting code, finishing and some article writing. The reason I chose to write it was so that it flows on from the 1st case study. Hopefully when my exams are over I'll be able to finish it. Good luck with Flex/Yacc, and maybe I'll be able to help out. Matt73 13:49, 10 June 2006 (UTC)


 * If we had a good book here on wikibooks about automata theory and context-free languages, writing a general book about compilers and theory would be a cinch. Maybe I will get to work on an outline for a book like that as well. --Whiteknight (talk) (projects) 15:15, 11 June 2006 (UTC)

Automata Theory
I have decided to create a new book on the subject of Automata Theory. This way, we can have a wikibooks-based resource for the background information, and then this book can use that resource to cover the ground-work behind compilers, computer languages, etc. I have posted a preliminary outline for this new book HERE on my user page. Any comments or suggestions are welcome. --Whiteknight (talk) (projects) 00:27, 12 June 2006 (UTC)

Safety optimization
What are "safety [optimizations] reducing the possibility of data structures becoming corrupted" ?

While I've certainly seen many optimizations that are "not re-entrant", "less safe", "non-volatile-safe", or otherwise "unsafe", I am unfamiliar with optimizations that make the code *more* safe than non-optimized code.

Please enlighten me. --DavidCary 20:25, 25 October 2006 (UTC)

Probably things dealing with stuff like this ( http://lambda-the-ultimate.org/node/1579 )--Panic 23:05, 25 October 2006 (UTC)