
= Context Free Languages =

Context-free languages are the second most restricted class of languages in the Chomsky Hierarchy. A language in this class can be described by a set of generation rules over 'non-terminal' symbols and 'terminal' symbols, where the terminal symbols are the alphabet of the language. Each rule replaces a single non-terminal symbol with a string of terminals and non-terminals, or with the empty string. Because the left-hand side of a rule is a lone non-terminal, the rule can be applied wherever that non-terminal appears, regardless of the symbols around it, which is why such grammars are called context-free. It is common practice to use capital letters for the non-terminal symbols and lower-case letters for the terminal symbols. A set of replacement rules of this kind is called a context-free grammar. A language is context-free if there is a context-free grammar which generates every string in the language (and no other strings) from a starting non-terminal symbol.

As an example, consider the following grammar:
 * S -> ε (ε here stands for the empty string)
 * S -> A
 * A -> aAb
 * A -> ab

From the starting symbol S, this grammar can generate either the empty string or a word composed of one or more a's followed by an equal number of b's. You may recall from the section on ../Regular Languages/ that this language, which can also be written as $$\{a^n b^n|n \ge 0\}$$, is not part of the class of regular languages, but we now see that it is context-free.

Note that, in the example, the grammar offers a choice of which rule to apply to each of the non-terminals S and A. It is these choices that allow the grammar to generate all the strings in a language, and not just a single string.
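
The effect of these choices can be illustrated by enumerating the strings the example grammar derives. The following Python sketch is only an illustration: the names `RULES` and `generate` are hypothetical, and the length cut-off used to stop the search is an assumption that works for this grammar but is not a general technique for arbitrary context-free grammars.

```python
from collections import deque

# The example grammar: each non-terminal maps to its possible replacements.
# The empty string "" plays the role of ε.
RULES = {
    "S": ["", "A"],      # S -> ε, S -> A
    "A": ["aAb", "ab"],  # A -> aAb, A -> ab
}


def generate(max_len, start="S"):
    """Return the terminal strings with at most max_len symbols derivable from start."""
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is all terminals.
        idx = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if idx is None:
            results.add(form)
            continue
        # Try every rule for that non-terminal: this is the "choice" in the text.
        for rhs in RULES[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new_form if ch not in RULES)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=len)


print(generate(6))
```

With a limit of 6 symbols, the sketch yields the empty string, ab, aabb, and aaabbb, one string for each way of choosing how many times A -> aAb is applied before finishing with A -> ab.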

== Pushdown Automata ==
The class of context-free languages is the same as the class of languages recognized by machines called pushdown automata. A pushdown automaton (PDA) is a non-deterministic machine consisting of a finite number of states with transitions between them, much like an NFA (see ../Regular Languages/), but with the addition of a stack of unlimited size. As part of a transition, the machine can pop the top item off the stack and use its value to decide which transition to take, and can also push a new item onto the stack. Its transitions can be written as {A,x,y} -> {B,z}, where A and B are the from and to states, x is the next input character, y is what is popped off the stack, and z is what is placed on the stack. Any of x, y, or z can be ε, meaning that nothing is consumed, popped, or pushed, respectively, as part of that transition.
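
To make the transition notation concrete, here is a minimal Python sketch of a non-deterministic PDA simulator. It is not the emulator referred to elsewhere on this page. Several details are assumptions made for illustration: the names `accepts`, `EPS`, and `ANBN`, the state names q0 to q3, the stack symbols `$` and `X`, acceptance by reaching an accepting state once all input is consumed, and a cap on stack depth that guarantees the search terminates (a cap that could wrongly reject for machines that need a deeper stack).

```python
from collections import deque

EPS = ""  # stands for ε: nothing consumed, popped, or pushed


def accepts(transitions, start, accepting, text, max_stack=None):
    """Search all configurations (state, input position, stack) breadth-first.

    transitions maps (A, x, y) to a list of (B, z) pairs, mirroring
    {A,x,y} -> {B,z}; a list is used because the machine is non-deterministic.
    """
    if max_stack is None:
        max_stack = len(text) + 2  # assumed bound so the search is finite
    start_config = (start, 0, ())
    seen = {start_config}
    queue = deque([start_config])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(text) and state in accepting:
            return True
        for (src, x, y), targets in transitions.items():
            if src != state:
                continue
            if x != EPS and (pos >= len(text) or text[pos] != x):
                continue  # next input character does not match
            if y != EPS and (not stack or stack[-1] != y):
                continue  # top of stack does not match
            new_pos = pos + (1 if x != EPS else 0)
            rest = stack[:-1] if y != EPS else stack
            for dst, z in targets:
                new_stack = rest + ((z,) if z != EPS else ())
                if len(new_stack) > max_stack:
                    continue
                config = (dst, new_pos, new_stack)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False


# A hypothetical machine for {a^n b^n | n >= 0}; "$" marks the stack bottom.
ANBN = {
    ("q0", EPS, EPS): [("q1", "$")],
    ("q1", "a", EPS): [("q1", "X")],
    ("q1", "b", "X"): [("q2", EPS)],
    ("q2", "b", "X"): [("q2", EPS)],
    ("q1", EPS, "$"): [("q3", EPS)],
    ("q2", EPS, "$"): [("q3", EPS)],
}

for word in ["", "ab", "aabb", "aab", "abb", "ba"]:
    print(repr(word), accepts(ANBN, "q0", {"q3"}, word))
```

The bottom-of-stack marker `$` lets the machine detect an empty stack using only the pop operation, since a transition can inspect the stack only by popping its top item.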

=== Abilities ===
The addition of the stack provides a functionally limited but arbitrarily large amount of memory to the machine, enabling it to recognize more complex languages than finite automata. As shown above, the language $$\{a^nb^n|n \ge 0\}$$ is a context-free language, and so there is a PDA which recognizes it. It can do this by adding an item to the stack for each a in the string, thus storing a count of them, and removing an item from the stack for each b. If the number of a's and the number of b's are the same, the stack will become empty just as the last b is consumed, and so the two numbers can be effectively compared. For a more detailed example of a machine to recognize this language, see the sample machine below.
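
The counting idea on its own, stripped of states and transitions, can be sketched in a few lines of Python. The function name `is_anbn` and the stack symbol `X` are hypothetical, and the function is a direct illustration of the push-per-a, pop-per-b strategy rather than a full PDA.

```python
def is_anbn(text):
    """Check membership in {a^n b^n | n >= 0} using only a stack."""
    stack = []
    i = 0
    # Push one item for each leading a, recording how many were seen.
    while i < len(text) and text[i] == "a":
        stack.append("X")
        i += 1
    # Pop one item for each b that follows.
    while i < len(text) and text[i] == "b":
        if not stack:
            return False  # more b's than a's
        stack.pop()
        i += 1
    # Accept only if all input was consumed and the counts matched.
    return i == len(text) and not stack


for word in ["", "ab", "aabb", "aab", "abb", "aba"]:
    print(repr(word), is_anbn(word))
```

Note that the stack is only ever pushed to, popped from, or tested for emptiness, which is all the access a PDA has to its memory.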

If a PDA simply doesn't use the stack, it is the same as an NFA and thus equivalent to a DFA, so any languages recognized by a finite automaton can be recognized by a PDA as well. Because of this, and because there are non-regular languages which are context-free, the class of regular languages is a proper subset of the class of context-free languages.

=== Limitations ===
As with the regular languages, there are many languages which are not context-free. The stack on the PDA, while it provides unbounded storage capacity, is still a stack, and so only the last element placed on it can be accessed at any given time. Accessing earlier elements requires removing, and thus losing, the later elements, since there is no other stack on which to place them. The PDA is also limited in that it must consume the input characters in the order in which they are received, and cannot access them again except by having placed them on the stack.

These limitations make it impossible for a PDA to recognize the language $$\{a^nb^nc^n|n \ge 0\}$$. By using the stack, a PDA can count the a's and compare that count to the number of b's, but in doing so it must consume its record of the a's, so the c's cannot be compared in the same way. The machine cannot create a record of the b's to compare with the c's without obscuring the count of the a's, so that method is equally unsuccessful. In short, the lack of any sort of random or direct access to the memory of the PDA prevents it from recognizing this and many other languages, and since no PDA can recognize them, they cannot be context-free languages.

== Examples ==
The code below is a sample PDA emulator in Perl. Given the description of a machine and an input string, it simulates the machine processing the input string, and shows whether the machine accepts or not.

The syntax is: progname.pl PDAFile inputFile, where PDAFile is a text file containing the PDA instructions, and inputFile is a text file containing the input string. Some sample inputs, including a set of PDA instructions for a machine to recognize $$\{a^nb^n|n \ge 0\}$$, can be found under ../sample PDA inputs/.
