Introduction to Programming Languages/Parsing

Parsing
Parsing is the problem of transforming a linear sequence of characters into a syntax tree. Nowadays we are very good at parsing: we have many tools, such as lex and yacc, that help us in this task. However, in the early days of computer science, parsing was a very difficult problem. It was one of the first and most fundamental challenges that the first compiler writers had to face. This section illustrates the problem with a few examples.

If the program text describes a syntactically valid program, then it is possible to convert this text into a syntax tree. As an example, the figure below contains parse trees for three different programs written in our grammar of arithmetic expressions:
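
To make the idea of a syntax tree concrete, here is a minimal sketch in Python of a recursive-descent parser that turns an arithmetic expression into a tree of nested tuples. The grammar it accepts (numbers, identifiers, the operators + - * /, and parentheses) is only an assumption made for illustration, since the grammar of arithmetic expressions is not reproduced in this section. The names tokenize and parse are hypothetical as well.

```python
import re

def tokenize(text):
    """Split the input into numbers, identifiers, and single-character symbols."""
    return re.findall(r"\s*(\d+|[A-Za-z_]\w*|\S)", text)

def parse(text):
    """Return a syntax tree as nested tuples, or raise SyntaxError.

    Assumed grammar (hypothetical, for illustration only):
        expr   ::= term   { ("+" | "-") term }
        term   ::= factor { ("*" | "/") factor }
        factor ::= number | identifier | "(" expr ")"
    """
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())   # left-associative
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        # A leaf must be a number or an identifier.
        if not re.fullmatch(r"\d+|[A-Za-z_]\w*", tok):
            raise SyntaxError(f"unexpected {tok!r}")
        return advance()

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

For instance, parse("a + b * c") yields ("+", "a", ("*", "b", "c")): the multiplication sits deeper in the tree, so it is evaluated first. Text that is not syntactically valid, such as "a + ", raises SyntaxError instead of producing a tree.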



There are many algorithms to build a parse tree from a sequence of characters. Some are more powerful, others are more practical. Basically, these algorithms try to find a sequence of applications of the production rules that ends up generating the target string. For instance, let's consider the grammar below, which specifies a very small subset of the English grammar:

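<sentence> ::= <noun phrase> <verb phrase> .
<noun phrase> ::= <determiner> <noun>
<verb phrase> ::= <verb> <noun phrase> <prepositional phrase>
<prepositional phrase> ::= <preposition> <noun phrase>
<noun> ::= student | professor | programming language
<determiner> ::= a | the
<verb> ::= taught | learned | read | studied | saw
<preposition> ::= by | with | about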

Below we have a sequence of derivation steps showing that "the student learned the programming language with the professor" is a valid sentence in this language:

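<sentence> ⇒ <noun phrase> <verb phrase>.
           ⇒ <determiner> <noun> <verb phrase>.
           ⇒ the <noun> <verb phrase>.
           ⇒ the student <verb phrase>.
           ⇒ the student <verb> <noun phrase> <prepositional phrase>.
           ⇒ the student learned <noun phrase> <prepositional phrase>.
           ⇒ the student learned <determiner> <noun> <prepositional phrase>.
           ⇒ the student learned the <noun> <prepositional phrase>.
           ⇒ the student learned the programming language <prepositional phrase>.
           ⇒ the student learned the programming language <preposition> <noun phrase>.
           ⇒ the student learned the programming language with <noun phrase>.
           ⇒ the student learned the programming language with <determiner> <noun>.
           ⇒ the student learned the programming language with the <noun>.
           ⇒ the student learned the programming language with the professor.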
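
The search for such a sequence of rule applications can be sketched in a few lines of Python. The code below is a naive top-down search with backtracking, not one of the practical parsing algorithms: it always expands the leftmost nonterminal and gives up on a branch as soon as the words produced so far disagree with the target sentence. The GRAMMAR table, the nonterminal names in angle brackets, and the function names are assumptions reconstructed from the derivation above.

```python
# Assumed grammar: keys are nonterminals, every other symbol is a terminal.
# "programming language" is treated as a single two-word terminal.
GRAMMAR = {
    "<sentence>": [["<noun phrase>", "<verb phrase>", "."]],
    "<noun phrase>": [["<determiner>", "<noun>"]],
    "<verb phrase>": [["<verb>", "<noun phrase>", "<prepositional phrase>"]],
    "<prepositional phrase>": [["<preposition>", "<noun phrase>"]],
    "<noun>": [["student"], ["professor"], ["programming language"]],
    "<determiner>": [["a"], ["the"]],
    "<verb>": [["taught"], ["learned"], ["read"], ["studied"], ["saw"]],
    "<preposition>": [["by"], ["with"], ["about"]],
}

def words_of(symbols):
    """Flatten a list of terminal symbols into single words."""
    return [word for symbol in symbols for word in symbol.split()]

def derive(form, target):
    """Search for a leftmost derivation of `target` starting from `form`.

    Returns the sentential forms that follow `form`, or None on failure.
    The search terminates because the assumed grammar is not recursive.
    """
    # Locate the leftmost nonterminal, if any.
    i = next((k for k, s in enumerate(form) if s in GRAMMAR), None)
    if i is None:
        return [] if words_of(form) == target else None
    # Prune: the terminals already produced must be a prefix of the target.
    prefix = words_of(form[:i])
    if prefix != target[:len(prefix)]:
        return None
    # Try each production for that nonterminal, backtracking on failure.
    for alternative in GRAMMAR[form[i]]:
        new_form = form[:i] + alternative + form[i + 1:]
        rest = derive(new_form, target)
        if rest is not None:
            return [new_form] + rest
    return None

def leftmost_derivation(sentence):
    """Return every sentential form of a derivation, or None if none exists."""
    target = sentence.replace(".", " .").split()
    start = ["<sentence>"]
    steps = derive(start, target)
    return None if steps is None else [start] + steps

if __name__ == "__main__":
    text = "the student learned the programming language with the professor."
    for form in leftmost_derivation(text):
        print("⇒", " ".join(form))
```

Running the script prints the start symbol followed by the fourteen sentential forms of the derivation. For a string that the grammar cannot generate, such as "the student saw.", leftmost_derivation returns None.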