= Regular Languages =

Regular languages form the most restricted, and the simplest, class of languages in the Chomsky Hierarchy. Languages in this class are usually written like mathematical sets, with a description in curly brackets. Any word that matches the description is part of the language, and any word that does not match it is not.

Informally, the regular languages are those languages which can be described using characters from their alphabet together with concatenation, parentheses, the ∪ (or) operator, and the * (repetition) operator. An example would be $$\{(b \cup (ab))*\}$$, the language of all strings of as and bs in which every a is immediately followed by a b.
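The example language can be checked mechanically. The sketch below uses Python's `re` module, which writes the ∪ operator as `|`; the helper name `in_language` is an illustrative choice, not part of any standard. Python's pattern syntax also offers features that go beyond regular languages, but this pattern uses only concatenation, union, and the * operator.

```python
import re

# (b ∪ (ab))* written in Python's syntax, where "|" plays the role of ∪.
PATTERN = re.compile(r"(b|ab)*")

def in_language(word: str) -> bool:
    """Return True if the whole word matches (b ∪ (ab))*."""
    # fullmatch requires the entire word to match, not just a prefix.
    return PATTERN.fullmatch(word) is not None

for word in ["", "b", "ab", "babb", "a", "aab", "ba"]:
    print(repr(word), in_language(word))
```

The first four words are in the language; the last three each contain an a that is not immediately followed by a b.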

Any finite language, that is, a language consisting of a finite set of words $$w_1, w_2, \ldots, w_n$$, can be written as $$\{w_1 \cup w_2 \cup \ldots \cup w_n\}$$. As a result, every finite language is regular. The converse, however, is not true: the example language given earlier, $$\{(b \cup (ab))*\}$$, contains an infinite number of words, and is regular.
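As a small illustration of the union construction, the following sketch builds a pattern for a finite language by joining its words with `|`. The function name and the sample word list are hypothetical, and the sketch assumes the list is non-empty.

```python
import re

def finite_language_pattern(words):
    """Build the union w_1 | w_2 | ... | w_n as a compiled pattern."""
    if not words:
        raise ValueError("this sketch assumes at least one word")
    # re.escape keeps characters such as "*" or "(" literal.
    return re.compile("|".join(re.escape(w) for w in words))

words = ["ab", "ba", "abba"]  # hypothetical finite language
pattern = finite_language_pattern(words)

for candidate in ["ab", "abba", "abb", ""]:
    print(repr(candidate), pattern.fullmatch(candidate) is not None)
```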

An alternative, though equivalent, definition of the regular languages is those languages which can be generated by a regular grammar. A regular grammar is a set of replacement rules which, starting from an initial symbol, rewrite "non-terminal symbols" (symbols which are not in the alphabet of the language) into "terminal symbols" (symbols which are in the alphabet of the language). Each rule replaces a single non-terminal symbol with either a terminal symbol, a terminal symbol followed by a non-terminal symbol, or ε (the empty string). As an example, the language $$\{(b \cup (ab))*\}$$ above can be described by the following grammar (let S be the starting symbol):
 * S -> ε
 * S -> aB
 * S -> b
 * S -> bA
 * A -> aB
 * A -> b
 * A -> bA
 * B -> bA
 * B -> b

Note that by making different choices about which rules to apply when, this grammar can produce, from a starting string of "S", the empty string or any string of as and bs in which every a is followed by a b. These are exactly the words matching the set notation above, so the languages produced by the two methods are identical.
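The claim that the grammar and the set notation describe the same language can be spot-checked by brute force. The sketch below enumerates every word of bounded length derivable from S using the rules above and compares the result with the words matching `(b|ab)*`. The function names and the length bound of 6 are arbitrary choices for illustration, and a check up to a fixed length is evidence, not a proof.

```python
import re
from collections import deque
from itertools import product

# The grammar rules from the text; "" stands for ε.
RULES = {
    "S": ["", "aB", "b", "bA"],
    "A": ["aB", "b", "bA"],
    "B": ["bA", "b"],
}

def generate(max_len):
    """Return all terminal strings of length <= max_len derivable from S."""
    words = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # In a regular grammar the only non-terminal, if any, is the last symbol.
        if form and form[-1] in RULES:
            prefix, nonterminal = form[:-1], form[-1]
            for body in RULES[nonterminal]:
                new_form = prefix + body
                terminals = sum(1 for c in new_form if c not in RULES)
                if terminals <= max_len:
                    queue.append(new_form)
        else:
            words.add(form)
    return words

def matching(max_len):
    """Return all strings over {a, b} of length <= max_len in (b ∪ (ab))*."""
    pattern = re.compile(r"(b|ab)*")
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product("ab", repeat=n)
        if pattern.fullmatch("".join(chars))
    }

print(generate(6) == matching(6))
```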

== Finite Automata ==
Finite automata are machines that consist of a finite number of states, and transition between those states based only on the current state and a character of input. Once all of the input has been consumed, the machine either "accepts" or "rejects" the input string, based on the state in which it finishes. These machines correspond exactly to the regular languages: the set of strings accepted by any finite automaton is a regular language, and every regular language has a machine that accepts all and only the strings of that language. For this reason we say that the class of regular languages is equivalent to the class of languages recognized by finite automata.

=== DFAs & NFAs ===
There are two types of finite automata: deterministic and non-deterministic. A deterministic finite automaton, or DFA, makes exactly one transition per character of the input, and contains exactly one transition from each state for each character in the alphabet.
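To make the DFA definition concrete, here is a minimal sketch of a DFA for the earlier example language $$\{(b \cup (ab))*\}$$. The state names `q0`, `q1`, and `dead`, and the function name `run_dfa`, are illustrative assumptions rather than anything fixed by the text. The transition table has exactly one entry per state and character, as a DFA requires.

```python
# States: q0 = every a so far has been followed by a b (start, accepting),
#         q1 = just read an a and still need its b,
#         dead = an a was not followed by a b; no way to recover.
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q0",
    ("q1", "a"): "dead",
    ("q1", "b"): "q0",
    ("dead", "a"): "dead",
    ("dead", "b"): "dead",
}
START = "q0"
ACCEPTING = {"q0"}

def run_dfa(transitions, start, accepting, word):
    """Consume the word one character at a time and report acceptance."""
    state = start
    for ch in word:
        # Exactly one transition per (state, character) pair.
        state = transitions[(state, ch)]
    return state in accepting

for word in ["", "b", "abb", "bab", "a", "aa", "ba"]:
    print(repr(word), run_dfa(TRANSITIONS, START, ACCEPTING, word))
```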

A non-deterministic finite automaton, or NFA, can consume either one or no characters per transition, does not need a transition for each character from each state, and can have more than one possible transition for a given input and a given state. These properties allow the NFA to follow several computation branches at once. If any computation branch generated by a string accepts, the machine accepts that string; it rejects only if no branch accepts. Every DFA is also an NFA, since a DFA is simply an NFA that obeys extra restrictions. Conversely, any NFA can be converted into a DFA that accepts the same strings; the standard method is the subset construction, in which each DFA state stands for a set of NFA states. The set of languages recognized by NFAs is therefore the same as the set recognized by DFAs, and the two kinds of machine are equivalent.
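The conversion from an NFA to a DFA can be sketched with the subset construction, in which each DFA state is the set of NFA states the machine could currently be in. The NFA below is one possible machine for $$\{(b \cup (ab))*\}$$; its state names, the use of `""` to mark transitions that consume no character, and all function names are assumptions made for this illustration.

```python
EPS = ""  # marks a transition that consumes no character

# NFA: from "loop", branch without input into reading "b" or reading "ab".
NFA = {
    ("loop", EPS): {"b_branch", "ab_branch"},
    ("b_branch", "b"): {"loop"},
    ("ab_branch", "a"): {"mid"},
    ("mid", "b"): {"loop"},
}
NFA_START = "loop"
NFA_ACCEPTING = {"loop"}

def epsilon_closure(transitions, states):
    """All states reachable from `states` without consuming input."""
    stack = list(states)
    closure = set(states)
    while stack:
        s = stack.pop()
        for nxt in transitions.get((s, EPS), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def nfa_to_dfa(transitions, alphabet, start, accepting):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure(transitions, {start})
    dfa_transitions = {}
    seen = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        for ch in alphabet:
            moved = set()
            for s in current:
                moved |= transitions.get((s, ch), set())
            target = epsilon_closure(transitions, moved)
            dfa_transitions[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset accepts if it contains at least one accepting NFA state.
    dfa_accepting = {subset for subset in seen if subset & accepting}
    return dfa_transitions, start_set, dfa_accepting

def run_dfa(transitions, start, accepting, word):
    state = start
    for ch in word:
        state = transitions[(state, ch)]
    return state in accepting

DFA, DFA_START, DFA_ACCEPTING = nfa_to_dfa(NFA, "ab", NFA_START, NFA_ACCEPTING)
for word in ["", "abb", "bab", "a", "ba", "aab"]:
    print(repr(word), run_dfa(DFA, DFA_START, DFA_ACCEPTING, word))
```

The empty set of NFA states acts as the DFA's rejecting trap state: once every branch has died, no further input can revive one.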

For either type, the automaton can be specified completely by its set of states, its alphabet, the set of transitions, the start state, and the accepting states.

== Limitations ==
Not all languages are regular. Take the language $$\{a^n b^n|n \ge 0\}$$, whose words consist of some number of as followed by an equal number of bs. The only memory that a finite automaton has is its current state, so a deterministic machine with k states can tell apart at most k different counts of as. The language, however, contains words with arbitrarily many as. Consider the k+1 inputs $$a^0, a^1, \ldots, a^k$$: at least two of them, say $$a^i$$ and $$a^j$$ with $$i \ne j$$, must leave the machine in the same state. From that state the machine behaves identically on whatever input follows, so it either accepts both $$a^i b^i$$ and $$a^j b^i$$ or rejects both. Only the first of these is in the language, so the machine gives the wrong answer on one of them. As a result, no DFA recognizes this language, and since every finite automaton can be converted into a DFA, no finite automaton does either, so it cannot be a regular language.
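The counting argument can be turned into a procedure that, given any DFA over {a, b}, finds a word on which it disagrees with $$\{a^n b^n|n \ge 0\}$$. The sketch below is an illustration under stated assumptions: the sample machine, which counts as correctly only up to two, and every name in the code are hypothetical.

```python
def final_state(transitions, start, word):
    state = start
    for ch in word:
        state = transitions[(state, ch)]
    return state

def in_anbn(word):
    """True if word is a^n b^n for some n >= 0."""
    n = len(word) // 2
    return word == "a" * n + "b" * n

def find_misclassified(transitions, start, accepting, num_states):
    """Return a word the DFA gets wrong for the language a^n b^n."""
    first_seen = {}  # state reached after a^i -> i
    for j in range(num_states + 1):
        state = final_state(transitions, start, "a" * j)
        if state in first_seen:
            i = first_seen[state]
            # a^i and a^j reach the same state, so a^i b^i and a^j b^i
            # finish in the same state too; only a^i b^i is in the language.
            good = "a" * i + "b" * i
            bad = "a" * j + "b" * i
            accepts = final_state(transitions, start, good) in accepting
            return bad if accepts else good
        first_seen[state] = j
    raise ValueError("num_states is smaller than the real number of states")

# Hypothetical 6-state DFA that handles n <= 2 and then loses count.
T = {
    ("c0", "a"): "c1", ("c0", "b"): "dead",
    ("c1", "a"): "c2", ("c1", "b"): "ok",
    ("c2", "a"): "c2", ("c2", "b"): "d1",   # c2 cannot count past two as
    ("d1", "a"): "dead", ("d1", "b"): "ok",
    ("ok", "a"): "dead", ("ok", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
ACCEPTING = {"c0", "ok"}

word = find_misclassified(T, "c0", ACCEPTING, num_states=6)
print(repr(word), "accepted:", final_state(T, "c0", word) in ACCEPTING,
      "in language:", in_anbn(word))
```

Adding more states to the sample machine only pushes the failure further out; the search still finds two counts that collide.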

== Examples ==
The code below is a sample DFA emulator in Perl. Given the description of a machine and an input string, it simulates the machine processing the input string, and shows whether the machine accepts or not.

The syntax is: progname.pl DFAFile inputFile, where DFAFile is a text file containing the DFA instructions, and inputFile is a text file containing the input string. Some sample inputs, including a set of DFA instructions, can be found under ../sample DFA inputs/
