Theory of Formal Languages, Automata, and Computation/Automata and the Chomsky Hierarchy

Automata (or machines) specify how the strings in a language can be recognized. Automata are another form of finite representation of formal, typically infinite, languages. We describe four broad categories of automata and the corresponding categories of languages that they represent. The close correspondence between generators and recognizers should not surprise us, since we have already seen, in the form of programming language parsers, how a generator (grammar) can be adapted into a recognizer. But interestingly, and I think reassuringly, the language classes defined by the major types of automata correspond exactly to the language classes of the Chomsky Hierarchy as defined by grammar classes! In contrast to our treatment of grammars, which proceeded from the broadest class (Type 0, or unrestricted) to the most constrained class (Type 3, or regular), when presenting classes of automata for language classes we work in the opposite direction, from the simplest and most constrained automata (Type 3, regular), known as finite automata (FAs), to the computationally most powerful theoretical automata, which are Turing machines for Type 0, unrestricted languages. The most important reason for this change in direction is that FAs can be regarded as standalone computational devices, but they are also integral to all the other automata types that we will be discussing.

Finite Automata: Recognizers of Regular Languages
As we will see soon, finite state automata, or simply finite automata, are recognizers of regular languages. A finite automaton (FA) M over an alphabet Σ is a system (Q, Σ, $$\delta$$, q0, A), where Q is a finite, nonempty set of states; Σ is a finite input alphabet; $$\delta$$ is a mapping of Q × Σ into Q (that is, $$\delta$$ is a set of transitions between states on particular Σ symbols); q0 is in Q and is the start state of M; and A ⊆ Q is the set of accepting states. FAs are nicely summarized visually as state transition diagrams (see Figure FAexample1), where states (Q) are shown as circles, transitions ($$\delta$$) from one state to another are labeled by an alphabet (Σ) symbol, an unlabelled arc that emanates from outside the diagram indicates the start state (q0), and concentric circles identify the accepting states. In other sources you may see "final" states used instead of or in addition to "accepting" states. This book uses "accepting", however, rather than "final", since a so-called "final" state need not be final at all -- transitions from "final" states to other states are possible and common. To determine whether a string w of symbols from Σ is in the language recognized by an FA M, the string is processed from left to right, one symbol at a time. Before processing of the input string begins, the FA is in its start state. As each symbol in w is read, an appropriate transition is taken from the current state to the next state. In addition to a state transition diagram, the transition function $$\delta$$ is often represented as a list of transitions or as a table of transitions. Both are shown in Figure FAexample1, which represents/accepts the strings over 0s and 1s that start with '00'.
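As a concrete illustration, here is a minimal DFA simulator in Python (a sketch; the state names and transition table are my rendering of an FA for strings over 0s and 1s that start with '00', with the trap state made explicit):

```python
# A minimal DFA simulator (a sketch; state names are illustrative, not
# necessarily those used in Figure FAexample1).
def run_dfa(delta, start, accepting, w):
    """Return True iff the DFA accepts string w."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]   # exactly one transition per pair
    return state in accepting

# DFA for strings over {0, 1} that start with '00'.
delta = {
    ('q0', '0'): 'q1',   ('q0', '1'): 'trap',
    ('q1', '0'): 'q2',   ('q1', '1'): 'trap',
    ('q2', '0'): 'q2',   ('q2', '1'): 'q2',
    ('trap', '0'): 'trap', ('trap', '1'): 'trap',
}

print(run_dfa(delta, 'q0', {'q2'}, '00101'))  # True: starts with 00
print(run_dfa(delta, 'q0', {'q2'}, '0100'))   # False
```

Because the DFA is deterministic, each string traces exactly one path, so the simulator is a single loop over the input.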

The language accepted by an FA is the set of strings that place the FA in an accepting state when the string has been completely processed (i.e., read). In the case of the example FA, there will be exactly one path through the FA from the start state that corresponds to a given string. The string either ends up in an accepting state (i.e., it is in the language defined by the FA) or it ends up in a non-accepting state (i.e., it is not in the language defined by the FA).

When presenting an FA transition diagram (or list or table) it is a common convention, for purposes of comprehensibility, to omit explicit inclusion of trap states, and any transition to a trap state is omitted as well. If in processing a string there is no transition shown for a (state, input) pair, we might call the transition undefined, but literally it signifies a transition to a non-accepting trap state, so we know the string being processed is not in the language. A "shorthand" version of the FA in Figure FAexample1 is shown in Figure FAexample2. It should be clear that from the standpoint of acceptance and non-acceptance, an FA never needs more than one trap state, if any at all.

Deterministic and NonDeterministic FAs
If there is exactly one transition defined for every (state, input) pair in an FA, then the FA is deterministic. The FA in Figure FAexample1 is a deterministic FA (DFA), as is the FA in Figure FAexample2, remembering that the latter is but shorthand, with implied (but real) transitions to a trap state. The FA of Figure FAExample3 is also a deterministic automaton. If more than one transition can be taken for the same (state, input) pair, then the FA is a nondeterministic finite automaton (NFA). Notice that by definition every DFA is an NFA, but not vice versa. Figure NFAExample1 illustrates an NFA which is not also a DFA. In this example there are two transitions defined for (q0, 0).

In the case of an NFA, there may be more than one path from the start state to accepting states after processing a string. You can think of each path being enumerated using a breadth first enumeration strategy. An NFA accepts a string if at least one path ends with an accepting state after the string is processed. This is illustrated in Figure NFASearch1 for the input string 00101.
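The breadth-first enumeration of paths can be sketched in Python as a set-valued frontier (the NFA below is my reconstruction of the running example, recognizing strings that contain two 0s separated by one other symbol; state names are illustrative):

```python
# NFA acceptance by breadth-first frontier expansion (a sketch).
def run_nfa(delta, start, accepting, w):
    frontier = {start}
    for symbol in w:
        frontier = {nxt for state in frontier
                        for nxt in delta.get((state, symbol), set())}
    return bool(frontier & accepting)

delta = {
    ('p', '0'): {'p', 'q'}, ('p', '1'): {'p'},  # guess: first of the paired 0s?
    ('q', '0'): {'r'},      ('q', '1'): {'r'},  # the separating symbol
    ('r', '0'): {'s'},                          # the second 0
    ('s', '0'): {'s'},      ('s', '1'): {'s'},  # accept everything after
}

print(run_nfa(delta, 'p', {'s'}, '00101'))  # True: 0 1 0 at positions 1..3
print(run_nfa(delta, 'p', {'s'}, '0110'))   # False
```

Missing (state, symbol) entries play the role of the implicit trap state: those paths simply drop out of the frontier.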

Equivalence of NFAs and DFAs
The languages that are recognized by NFAs and DFAs are the same. That is, for any NFA there is a corresponding DFA that recognizes the same language as the NFA, and as we have already noted, every DFA is by definition an NFA. In the former case, we can translate an arbitrary NFA to a DFA, where the DFA is constructed by considering combinations of states in the NFA to be single states in the DFA. Intuitively, each state in the DFA will correspond to a possible frontier in the breadth-first enumeration of strings/paths recognized/expanded by the NFA. Figure NFAfrontiers shows some of the frontiers, in ellipses, in the search for a particular string, but that process is generalizable to enumeration of all strings, not just 00101. In the worst case, if there are |K| states in an NFA, there will be $$O(2^{|K|})$$ states in a corresponding DFA. While the NFA description of a language is often simpler and more elegant than the DFA equivalent, the NFA will require enumeration in practice when actually processing strings, and thus the NFA has higher runtime costs than a corresponding DFA.

Liberal Expansion then Pruning
One approach to the translation from NFA to DFA uses liberal expansion then pruning. We won't ultimately use this approach, but it's a good example of algorithm development, and I'll try to demonstrate the process of developing an algorithm. (1) Given an NFA, let's take the power set of the NFA states (i.e., the set of all possible combinations of NFA states), where each member of the power set will correspond to a state of the new, equivalent DFA. (2) The start state of the DFA will correspond to the start state of the NFA. Consider the top NFA in Figure NFAExample1. The start state of the DFA will be {p}, where p is the start state of the NFA and {p} is a member of the power set of NFA states. (3) To determine whether there is a transition between any two DFA states, let's look at the NFA states that are used in the DFA state composition, and create a transition between the DFA states, call them X and Y, for which there is a corresponding transition between an NFA state in X and an NFA state in Y. Two DFA states would correspond to X={q, r} and Y={r, t}, since each is a combination of NFA states. To determine if there is a transition on a '1' from X to Y, we note that in the NFA there is a transition from q (one member of X) to r (one member of Y). This observation isn't sufficient for adding a transition from X (i.e., {q,r}) to Y (i.e., {r,t}) on '1' in the DFA, however, since that same reasoning would also result in a transition from {q, r} to {r} in the DFA, and from {q, r} to {r, t, s} too, and to every Y for which r was a member. This is just a bad way of creating another NFA, rather than translating an NFA to a DFA. Rather, let's assume that a transition is made between X and Y on a symbol x only if every member of X participates in an NFA transition to a member of Y on x, and if every member of Y is the result of a transition on x from a member of X in the NFA.
So, returning to our example, we would create a transition from {q, r} to {r, t} on '1' in the DFA because $$q \xrightarrow[]{1} r$$ and $$r \xrightarrow[]{1} t$$ in the NFA, but {q, r} would not transition to any other state on '1' in the DFA. (4) After adding transitions using this strategy, we prune any states and subgraphs that are not reachable from the start state. For example, {r, t} is reachable from {q,r}, but is {q,r} reachable from {p}? No, since p is part of every DFA state because of the initial loop on 0,1. This is seen in the top-most path in Figure NFAfrontiers for '00101' but confirm your understanding that p is part of any frontier of a breadth first enumeration of all reachable states. Having found that {q,r} is not reachable from {p}, we prune {q,r}, then eliminate any states for which there are no incoming links, and repeat the process until all unreachable states are eliminated.

Liberal expansion and pruning, as we have sketched it, works for NFA to DFA translation, but it initially creates many more DFA states than necessary in most cases (i.e., $$O(2^{|K|})$$). Nonetheless, variations on liberal expansion and pruning are useful in many circumstances, which we address further when discussing AI.

Another strategy for NFA to DFA translation is to grow the number of DFA states more judiciously than naive creation and pruning of the power set of NFA states.

Systematic and Judicious Growth
The preferred process for NFA to DFA translation essentially does as we started with -- a breadth-first expansion of possible paths taken through the NFA. We can use a table representation of this expansion, where each cell in the table is a DFA state represented as a possible NFA frontier. Again, the worst-case table size will be exponential in the number of NFA states, but in most cases it is far smaller. Using our running example (but with the 'shorthand' FA), we see in Figure NFAtoDFAExample1a that {p} is the DFA start state. p and q are on the frontier of the breadth-first enumeration after processing a 0, so {p,q} is added as a DFA state. Only p is on the frontier after processing the first 1, and so {p} transitions to itself in the DFA.

Continuing, {p, q} is a DFA state that must be investigated. From {p, q} on a 0 we have to identify states that can be next from either p or q (inclusive or). From p on a 0 we can go to p or q in the NFA. From q on a 0 we can go to r. So {p,q,r} becomes a new state in the growing DFA. Note that {p,q,r} corresponds to a frontier in Figure NFAfrontiers.

Still on Figure NFAtoDFAExample1a, from {p,q} on a 1 we can stay in p, or from q we can go to r in the NFA, so {p,r} is a new DFA state. We continue to process those DFA states that were created in the previous step. In Figure NFAtoDFAExample1b we take {p, r} next and then {p,q,r}, though either order would yield the same results. As we know, there is an implicit trap state that can be reached from r too, but we can keep the trap state implicit for simplicity's sake (and the resulting DFA may be a 'shorthand' too). Continuing further we arrive at the table of Figure NFAtoDFAExample1c, with all DFA states generated from all NFA frontiers that are reachable from {p}. The final DFA, expressed as a transition diagram, is shown in Figure NFAtoDFAExample1d. Note that the accepting states in the final DFA correspond to all NFA frontiers that include an accepting state in the NFA, since each of these includes at least one path that accepts a string. At this point we could rename the DFA states to simple atoms if we wished, but we can leave the names as is, which is certainly helpful for illustration here; remember that the names, while appearing to be 'composite', are actually atomic in the final DFA.
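The systematic growth just described amounts to the subset construction, which can be rendered as a short worklist algorithm (a sketch; the transition function maps (state, symbol) pairs to sets of NFA states, and the example NFA is my reconstruction of the running example):

```python
def nfa_to_dfa(delta, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset (frontier) of NFA states."""
    start_state = frozenset({start})
    dfa_delta, seen, worklist = {}, {start_state}, [start_state]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            nxt = frozenset(n for s in current
                              for n in delta.get((s, symbol), set()))
            if not nxt:
                continue                       # implicit trap state stays implicit
            dfa_delta[(current, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    # A DFA frontier accepts iff it contains an accepting NFA state.
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_delta, start_state, dfa_accepting

# Reconstructed running-example NFA (two 0s separated by one other symbol).
nfa_delta = {
    ('p', '0'): {'p', 'q'}, ('p', '1'): {'p'},
    ('q', '0'): {'r'},      ('q', '1'): {'r'},
    ('r', '0'): {'s'},
    ('s', '0'): {'s'},      ('s', '1'): {'s'},
}
dfa_delta, dfa_start, accepting = nfa_to_dfa(nfa_delta, 'p', {'s'}, '01')
print(sorted(dfa_delta[(frozenset({'p'}), '0')]))  # ['p', 'q']
```

Only frontiers reachable from {p} are ever generated, which is what keeps the state count far below the power-set worst case in practice.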

Connection to Machine Learning
If we compare the DFA of NFAtoDFAExample1d with the NFA that we began with, we are reminded of an observation that we began with: that the "NFA description of a language is often simpler and more elegant than the DFA equivalent" but the "NFA will require enumeration in practice when actually processing strings, and thus the NFA has higher runtime costs than a corresponding DFA." So a development strategy that suggests itself is that human software developers start by specifying a nondeterministic software solution to a problem, and an AI translates this nondeterministic solution to a deterministic solution. In real-world problems, however, it is probably not possible to translate to a completely deterministic solution because of uncertainties in an environment, but in real-life settings it is often possible to eliminate or otherwise mitigate the costs of nondeterministic choices with the help of probabilities of transitions that are seen in practice. We will delve further into this in later chapters.

Retrospection
Systematic and judicious growth is another general strategy for problem solving. We generate only those states that we need to, again by using an enumeration strategy. "When we need to" is open to interpretation, however, and yet another strategy, sometimes called a lazy enumeration strategy, will only enumerate the space necessary to identify paths for a given string, one at a time (e.g., much like Figure NFAfrontiers). Such lazy strategies are used in AI systems and algorithms more generally, and can be particularly important when strings (and derivations and recognition paths) vary in the probability with which they occur in some environment. Again, this will be covered more when we discuss AI and machine learning, as well as probabilistic grammars and probabilistic automata, in more depth.

Finally, while we used Figure NFAExample1 to illustrate the equivalence of NFAs and DFAs, we didn't actually say what language the NFA and DFA of Figure NFAtoDFAExample1d represent. You could reflect on this before reading on, but the answer could help comprehension of the example. Spoiler: the language recognized is the set of strings of 0s and 1s in which there are two 0s separated by one other symbol. In state p of the NFA there are two possible actions when a 0 is encountered. One action is to guess that the 0 is the first of the two paired 0s by transitioning from p to q, and the other action is to guess that the 0 is not the first of two suitably paired 0s and stay in state p.

Equivalence of Regular Grammars and FAs
We defined the regular languages as those that could be generated by regular grammars. But the regular languages are exactly those that are recognized by FAs as well. We show this by showing how any regular grammar can be converted to an NFA (which can then be translated into a DFA), and by showing how a DFA (which is also an NFA) can be translated to a regular grammar. Both processes are straightforward.

To translate a regular grammar to an NFA, ensure first that ε appears, if at all, only on the righthand side of the start symbol, S, and that if ε does so appear (because ε is in the language generated by the grammar), then S does not appear on the righthand side of any production. Let G = (VG, ΣG, PG, SG) be a regular (type 3) grammar. Then there exists a finite automaton M = (QM, ΣM, $$\delta$$M, SM, AM) with L(M) = L(G) (i.e., the language recognized by M and the language generated by G are the same), and thus ΣG = ΣM.

(1) Define the states of the NFA, QM, to correspond to the variables of the grammar, VG, plus an additional state H, so the states of the NFA are QM = VG $$\cup$$ {H}.

(2) The initial state of the NFA corresponds to the start symbol, SM = SG, of the regular grammar.

(3) If there is a production SG $$\rightarrow$$ ε, then the accepting states of the NFA are AM = {SM, H}, otherwise AM = {H}.

(4) $$\delta$$M(X, d) contains the accepting state H if X $$\rightarrow$$ d is in PG. Recognize that there is always a single variable in each sentential form of a regular grammar derivation, except the last one which removes the remaining variable.

(5) $$\delta$$M(X, d) contains Y for each Y that appears in a production X $$\rightarrow$$ dY.

In short, the NFA will follow very closely from the grammar, with states corresponding to non-terminal symbols and transitions in the NFA corresponding to productions in the grammar. Figure FA-RegularExample1 shows a regular grammar and an NFA representing the same language. Nondeterminism occurs at state A on an 'a' and at state B on a 'b'.
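Steps (1) through (5) can be sketched in Python (a sketch assuming single-character terminals and variables; the example grammar is hypothetical, generating a^n b for n >= 1):

```python
# Regular grammar to NFA, following steps (1)-(5) (a sketch; productions are
# (lhs, rhs) pairs with rhs a terminal 'd', a string 'dY', or '' for epsilon).
def grammar_to_nfa(variables, productions, start):
    H = 'H'                                    # the extra accepting state
    delta = {}
    accepting = {H}
    for lhs, rhs in productions:
        if rhs == '':                          # S -> epsilon: start also accepts
            accepting.add(start)
        elif len(rhs) == 1:                    # X -> d: transition to H
            delta.setdefault((lhs, rhs), set()).add(H)
        else:                                  # X -> dY: transition to Y
            d, Y = rhs[0], rhs[1:]
            delta.setdefault((lhs, d), set()).add(Y)
    return set(variables) | {H}, delta, start, accepting

# Hypothetical grammar: S -> aS | aA, A -> b   (strings a^n b, n >= 1)
states, delta, start, accepting = grammar_to_nfa(
    {'S', 'A'}, [('S', 'aS'), ('S', 'aA'), ('A', 'b')], 'S')
print(len(delta[('S', 'a')]))  # 2 successors: nondeterminism at S on 'a'
```

Note how the two S productions on 'a' surface directly as nondeterminism, just as in Figure FA-RegularExample1.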

Now we wish to show that an FA can be translated to a grammar that represents the same language as the FA. In this direction we assume the FA is a DFA, M, knowing that we can always translate an NFA to a DFA if needed. Again, the correspondence between FA and grammar is very close. Define a regular grammar G = (VG, ΣG, PG, SG), where PG includes X $$\rightarrow$$ dY if $$\delta$$M(X, d) = Y and X $$\rightarrow$$ d if $$\delta$$M(X, d) = H and H is in AM. Again, see Figure FA-RegularExample1 to confirm your understanding, remembering that there is something you need to do before applying the translation procedure from FA to regular grammar.

NFAs with epsilon Transitions
An important source of nondeterminism in FAs is when multiple transitions are defined for a (state, input) pair; we have already discussed that source of nondeterminism. A second source of nondeterminism is epsilon ($$\varepsilon$$) transitions. An $$\varepsilon$$ transition from one state to another can be taken without reading an input symbol. Consider the NFA in Figure NFAExample2. In the top, shorthand NFA, from q0 we can take an $$\varepsilon$$ transition to q1 before reading an input symbol. This choice of taking $$\varepsilon$$ transitions can be applied repeatedly. If we are just starting to read a string from left to right using NFAExample2, we can either (1) read the first symbol in q0 and stay in q0 if it's a '0', or go to a trap state otherwise; or (2) take an $$\varepsilon$$ transition to q1 and read the first symbol in q1 and stay in q1 if it's a '1', or go to a trap state otherwise; or (3) take an $$\varepsilon$$ transition to q1 and then another $$\varepsilon$$ transition to q2 and read the first symbol in q2 and stay in q2 if it's a '2', or go to a trap state otherwise. In general, $$\varepsilon$$ transitions can be a source of considerable nondeterminism. Any of these three options will be the first steps of different recognition trajectories. In each of these three options, we chose to take the $$\varepsilon$$ transitions BEFORE reading an input symbol, but it's also allowed to take an $$\varepsilon$$ transition AFTER reading a symbol. So, for example, expanding option (2): from q0 we could take an $$\varepsilon$$ transition to q1, read a '1', and take an $$\varepsilon$$ transition to q2, all before reading the second input symbol.

Epsilon-closure
An important construct is the epsilon-closure or $$\varepsilon$$-closure of a state, say p. $$\varepsilon$$-closure(p) is the set of all states that are reachable from p using only $$\varepsilon$$ transitions. $$\varepsilon$$-closure(p) includes p itself. In Figure NFAExample2 $$\varepsilon$$-closure(q0) = {q0, q1, q2}, $$\varepsilon$$-closure(q1) = {q1, q2}, and $$\varepsilon$$-closure(q2) = {q2}.
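$$\varepsilon$$-closure is a simple reachability computation, sketched here in Python (the eps mapping reflects the chain q0, q1, q2 of Figure NFAExample2):

```python
# epsilon-closure via reachability search (a sketch; eps maps each state to
# the set of states reachable by a single epsilon transition).
def eps_closure(eps, state):
    closure, worklist = {state}, [state]
    while worklist:
        s = worklist.pop()
        for t in eps.get(s, set()):
            if t not in closure:
                closure.add(t)
                worklist.append(t)
    return closure

# The epsilon chain of Figure NFAExample2: q0 -eps-> q1 -eps-> q2.
eps = {'q0': {'q1'}, 'q1': {'q2'}}
print(sorted(eps_closure(eps, 'q0')))  # ['q0', 'q1', 'q2']
print(sorted(eps_closure(eps, 'q2')))  # ['q2']
```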

Updating frontiers
In general, when processing a string from left to right, the process traces out multiple trajectories, and if at least one trajectory or path ends in an accepting state then the string is accepted. As each symbol in the string is processed, the trajectories so far are expanded, with as many $$\varepsilon$$-transitions as are allowed extending paths, as well as the current input symbol. Figures NFAProcess2a - NFAProcess2c illustrate the process for the first two symbols of '0112'. The captions explain the conventions used, notably on what constitutes a frontier at each step. Processing the last two symbols of '0112' is left as an exercise.
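The frontier-update process can be sketched as follows: close the frontier under $$\varepsilon$$ transitions, consume a symbol, and close again (a Python sketch; the transition and $$\varepsilon$$ tables are my reconstruction of Figure NFAExample2, which recognizes 0*1*2*):

```python
# epsilon-NFA acceptance: close the frontier under epsilon transitions after
# every step, starting from the closure of the start state (a sketch).
def close(eps, states):
    closure, worklist = set(states), list(states)
    while worklist:
        s = worklist.pop()
        for t in eps.get(s, set()):
            if t not in closure:
                closure.add(t)
                worklist.append(t)
    return closure

def run_enfa(delta, eps, start, accepting, w):
    frontier = close(eps, {start})
    for symbol in w:
        moved = {n for s in frontier for n in delta.get((s, symbol), set())}
        frontier = close(eps, moved)
    return bool(frontier & accepting)

# Reconstruction of Figure NFAExample2: q0 loops on 0, q1 on 1, q2 on 2,
# with epsilon transitions q0 -> q1 -> q2 and q2 accepting.
delta = {('q0', '0'): {'q0'}, ('q1', '1'): {'q1'}, ('q2', '2'): {'q2'}}
eps = {'q0': {'q1'}, 'q1': {'q2'}}

print(run_enfa(delta, eps, 'q0', {'q2'}, ''))      # True: empty string accepted
print(run_enfa(delta, eps, 'q0', {'q2'}, '0112'))  # True
print(run_enfa(delta, eps, 'q0', {'q2'}, '10'))    # False
```

Closing before the first symbol is what makes the empty string accepted here, anticipating the translation step discussed next.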

Translation to DFAs
The process of translating NFAs with $$\varepsilon$$ transitions (ε-NFAs) to DFAs is much like the previously presented procedure of translating NFAs without $$\varepsilon$$ transitions to DFAs, with one important additional step. In the previous translation procedure, the start state of the NFA is used as the start state of the DFA being constructed. But when epsilon transitions are present, the start state of the DFA will correspond to the epsilon closure of the start state of the NFA. In the example of Figure NFAExample2, $$\varepsilon$$-closure(q0) = {q0, q1, q2}. So (q0, q1, q2) is the start state of the DFA, and because q2 is an accepting state in the NFA, (q0, q1, q2) will be an accepting state in the DFA too. Figure NFAtoDFAExample2 shows the DFA that is equivalent to the NFA.

As a thought experiment consider what would have resulted without taking the first step. q0 would have been the DFA start state, but it would not have been an accepting state, so the DFA would not have accepted the empty string, though NFAExample2 does accept the empty string. Thus, the two representations would not have been equivalent in terms of the language that they represented. In general, always do a sanity check when doing translations between representations, testing on selected example strings, but of course placing your highest reliance on understanding and using equivalence-preserving operations for translation.

Equivalent Representations of Regular Languages
To recap, there are two possible sources of nondeterminism or choice in an NFA: ε transitions and multiple possible transitions on the same (state, input) pair. In general, both kinds of nondeterminism can appear in the same NFA, as in Figure NFAExample3, and both can be removed by the procedure we have covered for translation to a DFA. Given the discussion so far, there is the class of languages defined by NFAs with no ε transitions and the class of languages defined by NFAs in which ε transitions are allowed. By definition, the latter (ε transitions allowed) is a superset of the former (no ε transitions). But is it a proper superset? Does the addition of ε transitions add any computational power at all beyond the elegance of expression? No, because all NFAs, ε transitions or not, are translatable to DFAs. We have described a number of ways of representing regular languages, which are summarized in Figure RegularRepresentations1. Equivalence between two representations is demonstrated if there is a path from one representation to another, and vice versa. Thus, the class of languages of regular grammars is equivalent to the class of languages of DFAs since there is a path from one to the other in each direction. The DFAs are trivially NFAs, and the NFAs with no $$\varepsilon$$ transitions are trivially NFAs with $$\varepsilon$$ transitions allowed. FYI, we could eliminate the central two-way arc between NFAs (no $$\varepsilon$$ transitions) and DFAs (i.e., we could have chosen not to demonstrate those translations), and the ablated graph that resulted would still be sufficient to indicate the equivalence of all four representations.

Regular Expressions
Regular expressions (REs) are yet another way of representing regular languages. An RE is a declarative representation, which expresses a pattern that strings in a language adhere to, whereas an FA specifies a process for recognizing strings in a language and a grammar specifies a process to generate strings in a language.

An RE is an expression over an alphabet Σ, using binary operators for concatenation (×) and for choice (+), and the unary operator of repetition, often known as Kleene Closure or simply closure.

All symbols in Σ are REs, which are atomic. If Σ = {a, b, c} then 'a' is an RE representing the finite language {a}, 'b' represents {b}, and 'c' represents {c}.

If r is an RE and s is an RE then r + s is an RE representing the language represented by r ORed (union) with the language represented by s. Strings in the language specified by r + s would be strings that matched r or strings that matched s. If r = a and s = b, then L(r+s) = {a, b}.

If r is an RE and s is an RE then r × s (or simply rs) is an RE representing the language represented by r concatenated with the language represented by s. Concatenating two languages results in another language with strings that are a concatenation of a string in the first language with a string in the second language. If r = a + b and s = c + b then L((a+b) × (c + b)), alternatively L((a+b)(c+b)), equals {ac, ab, bc, bb}.

As another example, suppose r = (a+b)(b+c) and s = c × b, then L(rs) = {abcb, accb, bbcb, bccb}. (I use c × b instead of cb to make it clear that I am using concatenation, so that cb is not confused with an atomic identifier).

And another example: r = (a × b × c) + (c × a) and s = (b + a). L(rs) = {abcb, abca, cab, caa}.

Note that when using '+' the order of the arguments does not matter. In contrast, when using '×' the order does matter.
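Union (+) and concatenation (×) of finite languages translate directly into set operations, as in this Python sketch of the worked example:

```python
# Union (+) and concatenation (x) of finite languages as set operations
# (a sketch verifying the worked example L((a+b)(c+b)) = {ac, ab, bc, bb}).
def union(L1, L2):
    return L1 | L2

def concat(L1, L2):
    return {u + v for u in L1 for v in L2}

a_plus_b = union({'a'}, {'b'})              # L(a+b) = {a, b}
c_plus_b = union({'c'}, {'b'})              # L(c+b) = {c, b}
print(sorted(concat(a_plus_b, c_plus_b)))   # ['ab', 'ac', 'bb', 'bc']
```

Swapping the arguments of union changes nothing, while swapping the arguments of concat generally yields a different language.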

The repetition operator is unary and signified by *. If r is an RE then r* is an RE that represents the infinite language in which strings from L(r) are repeated 0 or more times. So if r = a, then L(r*) = L(a*) = {$$\varepsilon$$, a, aa, aaa, aaaa, ...}. Notice that the empty string is a member of L(r*), for the case where r is "repeated" 0 times.

Assume r = (a+b)(b+c); then L(r*) = L(((a+b)(b+c))*) = {$$\varepsilon$$, ab, ac, bb, bc, abab, abac, abbb, abbc, acab, acac, acbb, acbc, bbab, bbac, bbbb, bbbc, bcab, bcac, bcbb, bcbc, ...}. Occasionally, you might see notation that fixes the number of repetitions to some constant. For example, L($$r^0$$) = {$$\varepsilon$$}, L($$r^1$$) = {ab, ac, bb, bc}, L($$r^2$$) = {abab, abac, abbb, abbc, acab, acac, acbb, acbc, bbab, bbac, bbbb, bbbc, bcab, bcac, bcbb, bcbc}. This notation could be helpful to some students in understanding the repetition operator itself. For example, L(r*) = L($$r^0$$) $$\cup$$ L($$r^1$$) $$\cup$$ L($$r^2$$) $$\cup$$ L($$r^3$$) $$\cup$$ ... .
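The fixed-power notation can be computed directly (a Python sketch building $$L(r^k)$$ by repeated concatenation; L(r*) is then the union of all the powers):

```python
# Fixed powers of a language: L(r^0) = {epsilon}, L(r^k) = L(r^(k-1)) L(r).
def power(L, k):
    result = {''}                        # L(r^0) = {epsilon}
    for _ in range(k):
        result = {u + v for u in result for v in L}
    return result

r = {u + v for u in {'a', 'b'} for v in {'b', 'c'}}   # L((a+b)(b+c))
print(power(r, 0))           # {''}
print(sorted(power(r, 1)))   # ['ab', 'ac', 'bb', 'bc']
print(len(power(r, 2)))      # 16 strings of length 4
```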

If we wish to express repetition 1 or more times we can write rr*, or equivalently r(r*) since * has higher precedence than concatenation. But it is also common to write repetition 1 or more times with a superscript +, so if r = a+b then L($$r^+$$) = {a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa, ....} = L(rr*), with $$\varepsilon$$ excluded.

Consider the set of all strings over {0, 1} with at most one pair of consecutive 0s. $$0^n$$, where n is greater than 0, would count as n-1 consecutive pairs. A first stab at a solution might be (01)* 00 (10)*, BUT it doesn't allow a string that starts with 1, it requires exactly one 00, doesn't allow a string that ends with 1, and it doesn't allow for any repeating 1s. (1+ε) (01)* 00 (10)*(1+ε) is a second attempt, but it still requires exactly one 00, and it doesn't allow for any repeating 1s. How about (1+ε) (01+1)* (00+ε) (10+1)*(1+ε)? What about a single 0? So, how about (1+ε) (01+1)* (00+0+ε) (10+1)*(1+ε)? Consider whether this last example represents the language that's requested. If so, can you simplify the RE so that the revised version still represents the same language? In general, finding a correct (and elegant) RE will require some search.
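One way to build confidence in the final candidate is to test it against a brute-force version of the specification (a Python sketch; I have rewritten (1+ε)(01+1)*(00+0+ε)(10+1)*(1+ε) in Python's re syntax, with '|' playing the role of '+' and '?' the role of the '+ε' options):

```python
import re
from itertools import product

# The candidate RE, transliterated into Python regex syntax.
candidate = re.compile(r'1?(01|1)*(00|0)?(10|1)*1?')

def at_most_one_00_pair(w):
    """Specification: count overlapping '00' pairs, so '000' has two."""
    return sum(1 for i in range(len(w) - 1) if w[i:i+2] == '00') <= 1

# Brute-force check on all strings up to length 7.
for n in range(8):
    for tup in product('01', repeat=n):
        w = ''.join(tup)
        assert (candidate.fullmatch(w) is not None) == at_most_one_00_pair(w), w
print('candidate RE agrees with the specification on all strings up to length 7')
```

Such an exhaustive check on short strings is not a proof, but it is a cheap sanity check of exactly the kind recommended when translating between representations.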

REs represent the same class of languages as those of Figure RegularRepresentations1. Since the four previously studied schemes for regular languages are known to be equivalent, we can show the equivalence of REs to these other representations by showing how any of the previous schemes can be translated to REs, and how REs can be translated to any of the other representations.

Converting DFAs to Regular Expressions
Converting DFAs to Regular Expressions (REs) uses a process of state elimination. At the beginning, single terminal symbols label the arcs of the DFA. Of course, each terminal symbol is a primitive RE. The state elimination algorithm eliminates one state at a time, and as it does so, REs of increasing complexity will label the arcs of the shrinking DFA. This calls for a generalization of what we mean by a DFA, but it is a natural generalization, in which arbitrary REs can label arcs instead of simply primitive, single-symbol REs.

For example, in the simplest case, suppose that states P, Q, and R are arranged such that there is a transition from P to Q on an input 'a' and there is a transition from Q to R on an input 'b'. Then if we want to eliminate Q next, we will replace the arcs from P to Q and from Q to R with a single transition directly from P to R, and the transition will be labeled by the RE 'ab'. If that is the only involvement of Q in the DFA, then Q is eliminated. But typically a state to be eliminated like Q will be involved with many other states, and for each pair of states Pi and Rk with transitions into and out of Q respectively, the algorithm will create new direct transitions between Pi and Rk with suitable REs labeling the transitions. A Pi and Rk may actually be the same state, so a loop is introduced from Pi (Rk) back to itself when eliminating Q. Figure DFAtoRE1 illustrates the various steps in translation from a DFA to an RE. An interesting anomaly occurs in this example when eliminating state A. Since A originates two paths, there is no path in which A is intermediate. We have shown the result as an RE that has no source, but transitions to E, the start state after eliminating A. The meaning of such a sourceless transition should be straightforward, as external context that is relevant to all strings recognized by the remaining generalized FA. We are reminded of giving external context to a large language model, for example, in the form of a prompt. Nonetheless, a practice that remedies this anomaly, particularly if we were to automate the translation procedure from an FA M, is to introduce two "dummy" states, Sstart and Send, to M, yielding M', where Sstart is the start state of M' and has an $$\varepsilon$$ transition to the start state of M, and Send is the single accepting state of M', and all accepting states of M have an $$\varepsilon$$ transition to Send. Figure DFAtoRE1a illustrates the effect of this policy on the previous example.
It's true that the addition of $$\varepsilon$$ transitions makes M' an NFA, particularly because of these transitions from all of the accepting states of M to the single accepting state of M', but if the FA is otherwise deterministic then you would be right to call the process of Figure DFAtoRE1a a DFA to RE translation.
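The rerouting rule at the heart of state elimination can be sketched over a generalized FA whose transitions carry RE strings (a sketch; the three-state example with a self-loop on Q is hypothetical):

```python
# Generalized-FA state elimination (a sketch): eliminating q reroutes every
# path p -> q -> r through a direct transition labeled R_pq (R_qq)* R_qr,
# unioned (+) with any existing p -> r label.
def eliminate(trans, q):
    loop = trans.pop((q, q), None)                               # self-loop on q
    ins  = {(p, lbl) for (p, t), lbl in list(trans.items()) if t == q}
    outs = {(r, lbl) for (s, r), lbl in list(trans.items()) if s == q}
    for (p, t) in list(trans):                                   # drop q's arcs
        if p == q or t == q:
            del trans[(p, t)]
    mid = f'({loop})*' if loop else ''
    for p, in_lbl in ins:
        for r, out_lbl in outs:
            new = f'{in_lbl}{mid}{out_lbl}'
            trans[(p, r)] = f'{trans[(p, r)]}+{new}' if (p, r) in trans else new
    return trans

# Hypothetical example: P -a-> Q, Q -c-> Q (self-loop), Q -b-> R.
trans = {('P', 'Q'): 'a', ('Q', 'Q'): 'c', ('Q', 'R'): 'b'}
print(eliminate(trans, 'Q'))  # {('P', 'R'): 'a(c)*b'}
```

Repeating this until only dummy start and end states remain leaves a single transition whose label is the RE for the whole language.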

Figure DFAtoRE2 illustrates another translation. In principle it does not matter what the order of state elimination is, but I find it helpful to start by eliminating states that are less involved with other states. One of the exercises asks you to translate Figure DFAtoRE2 using a different order of state elimination.

Converting Regular Expressions to NFAs with ε transitions
Regular expressions are most easily translated into NFAs with ε transitions, which in turn can be translated to DFAs if desired, but that latter step is not necessary for demonstrating the equivalence of REs and regular languages. The basic steps of the algorithm are intuitive: a concatenation in the RE translates to a strict sequence of states and transitions; a '+' leads to branching into two or more transitions from a state; and repetition (*) in the RE leads to a loop in the NFA. The very liberal introduction of ε transitions into a growing NFA makes the algorithm more easily implementable, but makes the construction very cumbersome for illustration, and so you will see me simplifying the growing NFA at intermediate steps. Let's start with an example from the previous section by translating (0 + 1(1+01)* 00)* to an NFA with ε transitions, simplifying along the way. I will illustrate by working from the inside out, translating simpler subexpressions first. Figure REtoNFA1a illustrates the translation of (1 + 01)*, which is a subexpression of (0 + 1(1 + 01)* 00)*. The initial NFA of the figure shows liberal introduction of $$\varepsilon$$ transitions before nearly all subexpressions. In an automated algorithm an $$\varepsilon$$ transition would be introduced between the 0 and 1 transitions in the lower path between states Q and R as well, but I used shorthand and jumped ahead on the simplification step for reasons of space. Following simplification we are left with essentially a DFA (excepting the $$\varepsilon$$ transitions from the artificial start state and to the artificial end state).

Figure REtoNFA1b continues to build the NFA to include additional subcomponents of (0 + 1(1 + 01)* 00)*, now 1(1 + 01)* 00. In this image, b, an additional '1' has been placed in front of the earlier NFA, with added $$\varepsilon$$ transitions. It gets ridiculous, particularly the consecutive $$\varepsilon$$ transitions, but again, the rationale is that the liberal addition eases the algorithm implementation, since taking 'shortcuts' can lead to mistakes, and since the very last step in such an algorithm would be to translate to a DFA, getting rid of all the $$\varepsilon$$ transitions, if that were important. Nonetheless, I ran out of room on the page and used shorthand when adding '00' at the end. Figure REtoNFA1c shows the final step in the RE to NFA translation at the top, and this is followed by an NFA to DFA translation. Remember that we started with an RE that was the result of a translation from a DFA, as shown in Figure DFAtoRE2. We might expect the translation of that RE back to a DFA, as shown, to end up as the same DFA that we started with in Figure DFAtoRE2. Obviously, as Figure DFAsComparison makes clear, the two DFAs are not the same. But more importantly, do these two DFAs represent the same language? The answer, thankfully, is yes. As we will see in the next section on DFA minimization, the DFA on the right can be minimized so that it is the same as the DFA on the left, except for the naming of states. In this example, the minimization of the DFA on the right results in states L and M,L being combined into a single accepting state.

Having demonstrated that REs are translatable to and from other regular language representations, we give the updated representational equivalencies in Figure RegularRepresentations2.
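To make the RE-to-ε-NFA construction concrete, here is a minimal sketch in Python of fragment combinators for the cases just described (one for a single symbol, and one each for concatenation, '+', and '*'). The function names (sym, concat, union, star) and the fragment representation are my own illustrative choices, not a standard library; the sketch builds the running example (0 + 1(1 + 01)* 00)* and checks strings against it by ε-closure search.

```python
from collections import deque
from itertools import count

# An NFA fragment is (start, accept, moves), where moves maps
# state -> list of (label, next_state); a label of None is an epsilon move.
_ids = count()

def _new():
    return next(_ids)

def sym(a):
    s, f = _new(), _new()
    return (s, f, {s: [(a, f)]})

def _merge(m1, m2):
    m = {q: list(ts) for q, ts in m1.items()}
    for q, ts in m2.items():
        m.setdefault(q, []).extend(ts)
    return m

def concat(f1, f2):
    s1, a1, m1 = f1; s2, a2, m2 = f2
    m = _merge(m1, m2)
    m.setdefault(a1, []).append((None, s2))      # epsilon glue between parts
    return (s1, a2, m)

def union(f1, f2):
    s1, a1, m1 = f1; s2, a2, m2 = f2
    s, a = _new(), _new()
    m = _merge(m1, m2)
    m[s] = [(None, s1), (None, s2)]              # branch on epsilon
    m.setdefault(a1, []).append((None, a))
    m.setdefault(a2, []).append((None, a))
    return (s, a, m)

def star(f):
    s1, a1, m1 = f
    s, a = _new(), _new()
    m = {q: list(ts) for q, ts in m1.items()}
    m[s] = [(None, s1), (None, a)]               # zero iterations allowed
    m.setdefault(a1, []).extend([(None, s1), (None, a)])   # loop back
    return (s, a, m)

def eclose(states, moves):
    seen = set(states); work = deque(states)
    while work:
        q = work.popleft()
        for lbl, r in moves.get(q, []):
            if lbl is None and r not in seen:
                seen.add(r); work.append(r)
    return seen

def accepts(frag, w):
    s, a, moves = frag
    cur = eclose({s}, moves)
    for ch in w:
        nxt = {r for q in cur for lbl, r in moves.get(q, []) if lbl == ch}
        cur = eclose(nxt, moves)
    return a in cur

# (0 + 1(1 + 01)* 00)* from the running example
re_nfa = star(union(sym('0'),
                    concat(sym('1'),
                           concat(star(union(sym('1'),
                                             concat(sym('0'), sym('1')))),
                                  concat(sym('0'), sym('0'))))))
```

Simplification of the resulting NFA, as done in the figures, is a separate step; this sketch leaves all the ε transitions in place and lets the ε-closure search absorb them.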

DFA Minimization
DFAs can be minimized into a unique minimal-state DFA, differing only in the naming of states between two “versions” of the minimal DFA. The minimization algorithm operates by identifying which pairs of states are ‘distinguishable’ or not. After looking at all pairs, if two states are not distinguishable, then they are merged in the final, minimal-state DFA. It seems to me that a remarkable property of the minimization algorithm is that it looks only at the transitions out of two states to determine whether those states are ‘distinguishable’ or not. That is, transitions INTO the states being compared are not an overt part of the algorithm. Moreover, it's worth noting that the minimum state DFAs are identical, except in the naming of states – the transitions between states are identical in structure too, though we don’t get into the proof of this. It is conventional to create a table like that on the right of Figure MinimizingDFA1a that can be used to compare pairs of states. The DFA on the left of the figure is the same as the one we left off with in the NFA to DFA translation in Figures REtoNFA1c and DFAsComparison, but with states renamed. The first step in the minimization process is to mark every pair of states in which one is an accepting state and one is not an accepting state as "distinguishable". This is the state of the table in Figure MinimizingDFA1a. Only one pair of states remains, L and P. Are L and P distinguishable or are they equivalent? They both transition to the same states on '0' and on '1', respectively, and we conclude that they are indistinguishable, aka equivalent. If we combine L and P into a single accepting state, LP, and update transitions into and out of LP (i.e., so state N transitions to LP on a '0', LP transitions to LP on a '0', LP transitions to M on a '1'), then we have the DFA at the left of Figure DFAsComparison.

The general algorithm for filling in the table is below. The basic idea is to find states that cannot be equivalent, and then states that are equivalent, and merge equivalent states.

Algorithm for marking pairs of inequivalent states (“Table-filling” algorithm)

FOR p in A and q in Q – A DO mark(p, q) // accepting states cannot be equivalent to non-accepting states

FOR each pair of unmarked, distinct states (p, q) in A x A or (Q – A) x (Q – A) DO

IF for some input symbol ‘a’, ($$\delta$$(p,a), $$\delta$$(q,a)) is marked

THEN  mark(p, q)

ELSE /* no pair ($$\delta$$(p,a), $$\delta$$(q,a)) is marked */

FOR all input symbols ‘a’ DO

put (p, q) on followup list for ($$\delta$$(p,a), $$\delta$$(q,a)) UNLESS $$\delta$$(p,a) = $$\delta$$(q,a)

Repeat the second FOR loop until no changes in what is marked (this amounts to processing the follow-up lists).

The DFA minimization algorithm can be used to determine whether two DFAs accept the same language or not – minimize each of the DFAs being compared and then examine the structure of the resulting minimum state DFAs; if the minimal state DFAs are identical then the two original DFAs accept the same language. The DFA minimization algorithm can be used more broadly to judge whether any two descriptions using any of the representations that we studied (RGs, REs, DFAs, NFAs with and without ε transitions) denote the same language – translate each to a DFA, minimize, and compare the minimal DFAs.

The algorithm for answering whether two regular language representations are the same by translating each representation to a minimal DFA and comparing the two DFAs to see if they are isomorphic directed graphs is an example of a decision algorithm for regular languages.
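A minimal sketch of the table-filling algorithm in Python, using a fixpoint iteration in place of explicit follow-up lists. The small DFA below is a hypothetical example of my own (it accepts strings over {0,1} ending in '0', with a redundant state C duplicating B); it is not the DFA from the figures.

```python
from itertools import combinations

# Hypothetical DFA: accepts strings ending in '0'; state C duplicates B.
states = {'A', 'B', 'C'}
accepting = {'B', 'C'}
alphabet = {'0', '1'}
delta = {('A', '0'): 'B', ('A', '1'): 'A',
         ('B', '0'): 'C', ('B', '1'): 'A',
         ('C', '0'): 'C', ('C', '1'): 'A'}

def distinguishable_pairs(states, accepting, delta, alphabet):
    marked = set()
    # Base step: an accepting state can never be equivalent to a non-accepting one.
    for p, q in combinations(sorted(states), 2):
        if (p in accepting) != (q in accepting):
            marked.add((p, q))
    # Inductive step, iterated to a fixpoint instead of keeping follow-up lists:
    # mark (p, q) if some input symbol leads them to an already-marked pair.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(sorted(states), 2):
            if (p, q) in marked:
                continue
            for a in alphabet:
                r, s = sorted((delta[(p, a)], delta[(q, a)]))
                if r != s and (r, s) in marked:
                    marked.add((p, q))
                    changed = True
                    break
    return marked

marked = distinguishable_pairs(states, accepting, delta, alphabet)
equivalent = [(p, q) for p, q in combinations(sorted(states), 2)
              if (p, q) not in marked]   # pairs to merge in the minimal DFA
```

Here the single unmarked pair (B, C) is merged, yielding the familiar two-state minimal DFA for strings ending in '0'.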

Pumping Lemma for Regular Languages Revisited
Though we have described the Pumping Lemma for regular languages previously in terms of grammars, it's helpful to also consider the typical way of introducing the PL using DFAs. This is equivalent to our earlier presentation and serves to further highlight the equivalence of FAs and regular (or right linear) grammars.

The Pumping Lemma says that there must be at least one loop in a DFA accepting an infinite regular language, and this loop can be repeated indefinitely or removed from an accepted string that is long enough to have been forced into taking a loop. We can show a language is not regular by assuming that it is regular, carefully selecting a string that is long enough, and pumping a suitable substring 0 or more times (recall that 0 times is the same as deleting the substring) until the resulting string is known not to be in the language, thus violating the Pumping Lemma property of regular languages.
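The DFA-based view of the lemma can be sketched in a few lines of Python: run an accepted string through a DFA, note the first repeated state, split the string as w = xyz around that loop, and verify that the pumped strings xy^i z are all accepted. The DFA here is the '00'-prefix example from Figure FAexample1, coded with hypothetical state names (including an explicit dead state).

```python
# Hypothetical encoding of the Figure FAexample1 DFA: strings over {0,1}
# that begin with '00'.  'dead' is an explicit trap state.
delta = {('q0', '0'): 'q1', ('q0', '1'): 'dead',
         ('q1', '0'): 'q2', ('q1', '1'): 'dead',
         ('q2', '0'): 'q2', ('q2', '1'): 'q2',
         ('dead', '0'): 'dead', ('dead', '1'): 'dead'}
start, accepting = 'q0', {'q2'}
n_states = 4   # any accepted string of length >= 4 must revisit some state

def run(w):
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in accepting

def split_for_pumping(w):
    """Split w = xyz where y labels the first loop found in w's run."""
    q, seen = start, {start: 0}          # state -> position first entered
    for i, ch in enumerate(w):
        q = delta[(q, ch)]
        if q in seen:                    # state revisited: a loop was taken
            return w[:seen[q]], w[seen[q]:i + 1], w[i + 1:]
        seen[q] = i + 1
    return None

w = '00101'
x, y, z = split_for_pumping(w)
pumped = [x + y * i + z for i in range(4)]   # i = 0 deletes the loop
```

Every pumped variant stays in the language because the loop returns the DFA to the same state, which is exactly the property the proof of the lemma exploits.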

Pushdown Automata: Recognizers of Context Free Languages
PDAs are basically FAs with an additional form of memory – a stack. A pushdown automaton M = (Q, Σ, Γ, $$\delta$$, q0, X0, A), where Q, Σ, $$\delta$$, q0, and A have roles similar to those in FAs, and Γ and X0 are new constructs associated with the PDA's stack memory.


 * Q is a finite set of states,
 * Σ is the finite input alphabet,
 * Γ is the finite stack alphabet,
 * q0 is in Q and is the start state,
 * X0 is in Γ and is the symbol that is initially on the stack,
 * A ⊆ Q is the set of accepting states.

$$\delta$$ is a mapping from Q × (Σ ∪ ε) × Γ to finite subsets of Q × Γ*. That is, $$\delta$$ is a finite set of transitions of the form (qi, a, X) $$\rightarrow$$ {...(qk, α)...} where the PDA transitions from state qi to qk when the current input symbol is a (or ε) and the top of stack symbol is X. Upon transition the top of stack symbol X is replaced with a string of stack symbols, α. That is, if in state qi, 'a' is the input, and X is the top of stack symbol, then the transition is made to state qk and X is replaced by popping X and pushing a sequence of stack symbols denoted by α, where 'X' might be among the stack symbols pushed back on. Note that if α = ‘WYZ’, then after popping X, then Z is pushed first, then Y, then W, and W (i.e., the leftmost stack symbol) is the new top of stack. The definition of a PDA allows for nondeterminism in that each $$\delta$$(qi, a, X) is a set of one or more possible moves, and ε transitions are allowed. Similar to representations of NFA simulations as breadth-first enumerations of recognition paths or trajectories (e.g., Figure NFASearch1), PDA simulations can be represented as a breadth-first enumeration too, but because of the increased complexity of PDAs relative to FAs, each node in a PDA enumeration is called an instantaneous description (ID) or configuration. Each PDA ID is made up of a state, qk, the top of stack, and for convenience we will typically include the remaining input string as well. Figure PDARecognitionPath shows one path only in what would be a larger enumeration of IDs in search of recognition paths for a string 'abc', using a PDA with transitions that include those at the bottom (e.g., $$\delta$$(q0, a, X0) $$\rightarrow$$ (q1, X1)). Figures PDAExample1a and PDAExample1b give an example of a single PDA definition, along with in-line comments and illustrations of the function of transitions. In-line comments are generally the minimum that would be desirable in presenting a PDA definition.
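A breadth-first enumeration of PDA IDs of the kind just described can be sketched as follows. The PDA below is a hypothetical example of my own (it accepts {0^n 1^n | n ≥ 1} by accepting state), and the ID representation (state, remaining input, stack string with the top leftmost) follows the conventions above.

```python
from collections import deque

# Hypothetical PDA for {0^n 1^n | n >= 1}, accepting by accepting state.
# 'Z' is the initial stack symbol; stack strings keep the top leftmost.
delta = {
    ('q0', '0', 'Z'): [('q0', '0Z')],   # push a 0 for each leading 0
    ('q0', '0', '0'): [('q0', '00')],
    ('q0', '1', '0'): [('q1', '')],     # switch to popping on the first 1
    ('q1', '1', '0'): [('q1', '')],
    ('q1', None, 'Z'): [('q2', 'Z')],   # epsilon move into the accepting state
}
start, start_stack, accepting = 'q0', 'Z', {'q2'}

def pda_accepts(w, max_steps=10000):
    # Breadth-first enumeration of IDs (state, remaining input, stack).
    frontier = deque([(start, w, start_stack)])
    steps = 0
    while frontier and steps < max_steps:
        q, rest, stack = frontier.popleft()
        steps += 1
        if not rest and q in accepting:
            return True
        if stack:
            top = stack[0]
            if rest:  # input-consuming moves on the current symbol
                for q2, push in delta.get((q, rest[0], top), []):
                    frontier.append((q2, rest[1:], push + stack[1:]))
            for q2, push in delta.get((q, None, top), []):  # epsilon moves
                frontier.append((q2, rest, push + stack[1:]))
    return False
```

For a deterministic PDA the frontier never holds more than one ID; the breadth-first queue is what lets the same simulator follow the multiple recognition paths of a nondeterministic PDA.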

Acceptance by Empty Stack and by Accepting State
In addition to accepting by accepting state, PDAs can also be designed to accept by empty stack. In particular, if the input is exhausted and the PDA's stack is empty then the PDA accepts. When accepting by empty stack the accepting states, A, are irrelevant and A = { } is the indication that a PDA accepts by empty stack. We may sometimes use $$\xrightarrow[]{*}$$ notation to denote the language accepted by a PDA M = (Q, Σ, Γ, $$\delta$$, q0, X0, { }) on the empty stack: L(M) = {w $$\mid$$ (q0, w, X0) $$\xrightarrow[]{*}$$ (qi, ε, ε)} indicates that there is a sequence of 0 or more transitions from (q0, w, X0) to (qi, ε, ε), where the first ε indicates that the input is exhausted and the second ε indicates that the stack is empty. The $$\xrightarrow[]{*}$$ notation can also be used to denote a language accepted by accepting state: for PDA M = (Q, Σ, Γ, $$\delta$$, q0, X0, A$$\neq$${ }), L(M) = {w $$\mid$$ (q0, w, X0) $$\xrightarrow[]{*}$$ (p, ε, γ)}, where p is an accepting state and γ indicates 0 or more stack symbols.

The PDA of Figures PDAExample1a and PDAExample1b accepts by accepting state (i.e., state q3). One of the exercises asks you to revise this PDA to accept by empty stack. In general, any PDA that accepts by empty stack can be translated to a PDA that accepts by accepting state, and vice versa.

Equivalence of PDAs and CFGs
PDAs and CFGs are equivalent finite representations of the CFLs. Since CFGs were used to delimit the CFLs, we would want to show that PDAs and CFGs are equivalent. We can do this by showing that any PDA can be translated to a CFG, and any CFG can be translated to a PDA. We only briefly sketch the translation strategies here.

Thm: If L is a CFL with alphabet Σ, then there exists a PDA M, such that L = L(M).

Assume that G = (V, Σ, P, S) is a context free grammar in Greibach Normal Form (i.e., all productions are of the form A $$\rightarrow$$ aβ, where 'a' is a terminal symbol, 'A' is a variable, and β is a possibly empty string of variables). We assume that ε is not in L(G). The translation strategy is to construct a PDA M = ({q1}, Σ, V, $$\delta$$, q1, S, {}), accepting by empty stack, where the initial stack symbol of M is the start symbol of G (i.e., S), and where $$\delta$$(q1, a, A) contains (q1, γ) whenever A $$\rightarrow$$ aγ is a production of G. Can you confirm that (q1, w, S) $$\xrightarrow[M]{*}$$ (q1, ε, ε) for any string w iff S $$\xrightarrow[G]{*}$$ w?
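A sketch of this construction in Python, using a hypothetical GNF grammar for {a^n b^n | n ≥ 1} (S → aSB | aB, B → b): each production A → aβ becomes a PDA move that consumes 'a', pops A, and pushes β, and acceptance is by empty stack.

```python
from collections import deque

# Hypothetical GNF grammar for {a^n b^n | n >= 1}:
#   S -> aSB | aB,   B -> b
# Each body is (terminal, string of variables).
productions = {'S': [('a', 'SB'), ('a', 'B')], 'B': [('b', '')]}
start_symbol = 'S'

# Build the single-state PDA: delta(q1, a, A) contains (q1, beta)
# whenever A -> a beta is a production of the grammar.
delta = {}
for A, bodies in productions.items():
    for a, beta in bodies:
        delta.setdefault((a, A), []).append(beta)

def accepts_by_empty_stack(w, max_steps=10000):
    # With one state, an ID reduces to (remaining input, stack);
    # the leftmost stack symbol is the top.
    frontier = deque([(w, start_symbol)])
    steps = 0
    while frontier and steps < max_steps:
        rest, stack = frontier.popleft()
        steps += 1
        if not rest and not stack:
            return True                 # input exhausted and stack empty
        if rest and stack:
            for beta in delta.get((rest[0], stack[0]), []):
                frontier.append((rest[1:], beta + stack[1:]))
    return False
```

Notice that the stack contents at any point are exactly the variables of a sentential form in a leftmost derivation, which is the heart of the equivalence argument.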

Thm: If L is L(M) for a PDA M,  then L is a CFL.

Construct a CFG G from PDA M, such that the leftmost derivation of any string w using G simulates the PDA on w (i.e., the variables in a sentential form correspond to the stack symbols). Can you flesh this argument out?

Deterministic PDAs
As with FAs, deterministic PDAs (DPDAs) are a special case of (nondeterminism-allowed) PDAs, where the righthand side of every transition is a singleton set in the DPDA, and an ε transition out of a configuration is allowed only if it is the only transition available from that configuration (so the machine never has a choice of moves). Unlike the case of FAs, however, the deterministic and nondeterminism-allowed PDAs are not equivalent. Whenever you see 'PDA' with no qualifier, assume the default, nondeterminism-allowed PDA definition.

How can we determine whether a CFL cannot be recognized by a DPDA? We will be satisfied by intuition. Consider that you are playing the role of a PDA, and are scanning from left to right a string in the language {wwR | w and its reverse are defined over (0+1)*}. How do you know when you are at the boundary between w and wR and that you should therefore switch from remembering (pushing) symbols of w to checking off (popping) symbols from wR? There is no way other than to guess that you are at the boundary, or not, and then follow the implications of that guess along different recognition paths. The language {wwR | w and its reverse are defined over (0+1)*} cannot be recognized by a DPDA. In contrast, consider the language {wcwR | w and its reverse are defined over (0+1)*, and symbol c marks the boundary between them}. There is no need to guess, and the language wcwR can be recognized by a DPDA. As another example, consider the language {wwR | w and its reverse are defined over (0+1)^10}, that is, w is exactly 10 symbols. Again, there is no need for the PDA to guess, and this language can be recognized by a DPDA. In fact, the language is regular and it can be recognized by a DFA, a huge DFA to be sure, but a DFA nonetheless.

A language for which there exists a DPDA that recognizes it is a DPDAL or simply a DCFL.

Deterministic and Inherently Ambiguous (IA) Languages
A question that may occur to some is on the relationship between inherently ambiguous CFLs, as defined when discussing grammars, and DCFLs, as defined just now by DPDAs. One might suspect that DCFLs and IACFLs are complements, but there was no mention of this rather obvious question in reference texts, most notably the 1969 and 1979 editions of Hopcroft and Ullman, where I would have expected at least something like "you might wonder about the relationship between DCFLs and IACFLs, but no relationships are proven as yet". After discussing the issue with chatGPT, with several followups to correct contradictions by the AI, it pointed out that Hopcroft, Motwani, and Ullman (p. 255-256, Section 6.4.4) addressed the issue. All deterministic languages (recognized by DPDAs) have unambiguous grammars, but DCFLs and IACFLs are not perfect complements since counterexamples can be found. For instance, S $$\rightarrow$$ 0S0 | 1S1 | ε is an unambiguous grammar, G, but L(G) = {wwR | w and its reverse are defined over (0+1)*} is not a DCFL. Thus, it might be ok to colloquially refer to the relationship as "somewhat complementary", but not to formally call the classes complements.

An interesting additional point is that in the realm of CFLs generally, the languages recognized by PDAs accepting by empty stack and the languages recognized by PDAs accepting by accepting state are the same sets, but in the realm of DCFLs only, the languages defined by DPDAs accepting by empty stack are a proper subset of the DCFLs recognized by accepting state. All DCFLs are recognized by a DPDA accepting by accepting state, but only those DCFLs with the prefix property (no string in the language is a proper prefix of another string in the language) are recognized by a DPDA accepting by empty stack. For example, the regular language {a, aa} is a DCFL, but no DPDA accepting by empty stack can recognize it: after accepting 'a' the stack is empty, and a deterministic machine has no move left with which to also accept 'aa'.

Here is a convenient summary of relationships involving deterministic languages.

• If there is a deterministic PDA that accepts L then L is a deterministic (CF) language or DCFL; we will also call a DCFL a DPDA language.

• All regular languages are also deterministic languages (DCFLs or DPDALs), but not vice versa. That is, regular languages are a proper subset of the DCFLs, which in turn are a proper subset of the CFLs.

• All deterministic languages have unambiguous grammars that generate them, but not vice versa. That is, deterministic languages are a proper subset of the CFLs that are not inherently ambiguous.

• Languages accepted by PDA accepting/final state and languages accepted by PDA empty stack are the same, and both are exactly the CFLs.

• This is not the case with DCFLs – within the class of DCFLs, the languages accepted by empty stack are a proper subset of the languages accepted by accepting/final state (the former must have the prefix property).

The subclass relationships present in the CFLs are illustrated in Figure CFLsubclasses.

Turing Machines: Recognizers of Unrestricted Languages
Turing machines (TMs) are recognizers of unrestricted languages, which include all the other language classes that we have described (i.e., CSLs, CFLs, RLs). TMs are representationally more expressive than FAs or PDAs. TMs can also enumerate languages and compute functions, and can simulate digital computers and other automata. In fact, TMs are taken to delimit what is computable. TMs are specified by M = (Q, Σ, $$\Gamma$$, $$\delta$$, q0, B, A), as summarized in Figure TMDefinition. The set of states in the finite control (Q), a start state, q0, a set of accepting states A $$\subsetneq$$ Q, an input alphabet, Σ, and a transition function, $$\delta$$, have similar roles as in other automata. The tape alphabet, $$\Gamma$$, is the set of all symbols that can appear on the infinite tape memory. A special symbol, B $$\in$$ $$\Gamma$$, represents a 'blank' on the tape. An input string, w, over Σ, is the initial contents of the tape, with Bs on both sides of w. Thus, Σ $$\subsetneq$$ $$\Gamma$$. Note that this convention of having the tape be an explicit part of the TM notation is different from the conventions for FAs and PDAs, in which the input string's presence on a READ-ONLY tape is often implicit. The TM tape is a read-AND-WRITE memory, similar to a PDA stack, but different from a PDA stack, which does not initially contain the input string. Finally, the TM can move its read-and-write head one tape cell in either direction (denoted Left or Right) or keep it unchanged (U), rather than simply in one direction as in the case of an FA's and PDA's (implicit) read-only tape.

Textbooks typically define the transition function, $$\delta$$, to be deterministic by default in TMs. That is, for each (state, tape symbol) pair, the TM makes exactly one move (aka transition), going to another state, rewriting the current tape symbol (perhaps to the same value), and moving the read-write tape head left, right, or keeping it unchanged. Succinctly, a transition is $$\delta$$(qi, X) = (qk, Y, D), where qi, qk $$\in$$ Q; X, Y $$\in$$ $$\Gamma$$; D $$\in$$ {L, R, U}.

TMs as recognizers
In keeping with our treatment of other kinds of automata, we first consider some examples of TMs as recognizers of languages. Because a TM can move to the right and left an indeterminate number of times, there is no telling when its input is exhausted. Rather, we will assume that the transitions are defined so that as soon as a TM enters an accepting state, it accepts, and if a TM ever encounters a (state, tape symbol) configuration that is not the lefthand side of any transition then the TM immediately rejects. An example of the TM as a recognizer is shown in Figure TMRecognizer1. TM programming can be quite tedious, even more tedious than programming a computer in assembly or machine language, so one typically gives high-level pseudocode descriptions of procedures first, then translates them to TM code, as illustrated in bold red in the Figure. Figure TMRecognizer1a traces the TM procedure on input '000111', shown at the top of the leftmost column.

As with PDAs we define instantaneous descriptions (IDs) or configurations for TMs. An ID for a TM includes (a) the current state, (b) the location of the read/write head on the tape, (c) the tape symbol under the read/write head, and (d) for convenience of illustration we will often include all the current tape symbols as part of the ID, though this is not strictly required. Figure TMRecognizer1a shows the IDs in a recognition path for '000111'.
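A minimal Python sketch of a deterministic TM recognizer and its ID-by-ID simulation, assuming the classic machine for {0^n 1^n | n ≥ 1} (repeatedly mark the leftmost 0 with X and its matching 1 with Y); the state names and transition table are illustrative assumptions, not a transcription of Figure TMRecognizer1.

```python
# Hypothetical deterministic TM for {0^n 1^n | n >= 1}.  The tape is a dict
# from cell index to symbol, with 'B' (blank) everywhere else; a transition
# maps (state, symbol) to (new state, symbol written, head direction).
delta = {
    ('q0', '0'): ('q1', 'X', +1),  # mark a 0, go find its matching 1
    ('q0', 'Y'): ('q3', 'Y', +1),  # no 0s left: check only Ys remain
    ('q1', '0'): ('q1', '0', +1),
    ('q1', 'Y'): ('q1', 'Y', +1),
    ('q2', '0'): ('q2', '0', -1),
    ('q1', '1'): ('q2', 'Y', -1),  # mark the matching 1, head back left
    ('q2', 'Y'): ('q2', 'Y', -1),
    ('q2', 'X'): ('q0', 'X', +1),  # back at the last marked 0: repeat
    ('q3', 'Y'): ('q3', 'Y', +1),
    ('q3', 'B'): ('q4', 'B', +1),  # everything matched: accept
}
accepting = {'q4'}

def tm_accepts(w, max_steps=10000):
    tape = dict(enumerate(w))
    q, head = 'q0', 0
    for _ in range(max_steps):     # each iteration is one ID-to-ID move
        if q in accepting:
            return True
        move = delta.get((q, tape.get(head, 'B')))
        if move is None:
            return False           # undefined configuration: reject
        q, sym, d = move
        tape[head] = sym
        head += d
    return False                   # step budget exhausted (loop suspected)
```

Printing (q, head, tape) inside the loop reproduces exactly the kind of ID trace shown in Figure TMRecognizer1a.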

Equivalent Variants of Turing Machines
There are variations on the Turing machine definition that neither add nor diminish the computational power of TMs. The variations that are considered here are (a) multi-track, single tape machines, (b) multi-tape TMs, (c) one-way infinite tape TMs, and (d) nondeterministic TMs.

Multi-track TMs
We can divide up a single tape into multiple tracks, say n of them, as illustrated in Figure TMmultitracks. Each track contains cells that are coincident with the cells of the other tracks. Each track j has its own finite tape alphabet, $$\Gamma$$j, and different tracks can have the same tape alphabets or different tape alphabets. There is, however, only one input alphabet, $$\Sigma$$, and one set of states, Q. There is only one tape head, and at each step the tape head will point at a single tape location with tuple (X1, X2, ..., Xn) for tracks 1 through n, where each Xj $$\in$$ $$\Gamma$$j. The transitions of a multi-track TM are defined as $$\delta$$(qi, (X1, X2, ..., Xn)) = (qk, (Y1, Y2, ..., Yn), D), where qi, qk $$\in$$ Q; Xj, Yj $$\in$$ $$\Gamma$$j; D $$\in$$ {L, R, U}. In sum, the definition of a multi-track TM with n tracks is M = (Q, Σ, ($$\Gamma$$1, $$\Gamma$$2, ..., $$\Gamma$$n), $$\delta$$, q0, B, A), where $$\Sigma$$ is a subset of each $$\Gamma$$j.

A multi-track TM adds no computational power to the basic TM model, which has a single-track tape. To see the equivalence in computational power, note that the standard single-tape model is a special case of the multi-track model in which n = 1. To go the other way, that is, to show that any multi-track TM with n > 1 has an equivalent single-track TM that accepts the same language, recognize that we can create a single tape alphabet, $$\Gamma$$, whose symbols are the tuples of the cross-product of the individual track alphabets, $$\Gamma$$1 $$\times$$ $$\Gamma$$2 $$\times$$ ... $$\times$$ $$\Gamma$$n. Transitions are redefined to use this one alphabet: $$\delta$$(qi, $$\langle$$X1, X2, ..., Xn$$\rangle$$) = (qk, $$\langle$$Y1, Y2, ..., Yn$$\rangle$$, D), where $$\langle$$X1, X2, ..., Xn$$\rangle$$ and $$\langle$$Y1, Y2, ..., Yn$$\rangle$$ are single symbols in $$\Gamma$$.
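The cross-product alphabet idea can be illustrated in a couple of lines: each tuple of track symbols becomes one composite single-track symbol, and the transition table is simply re-keyed. The two-track transition fragment below is hypothetical, just to show the mechanics.

```python
# Hypothetical fragment of a 2-track transition table: keys and values use
# tuples of track symbols; '#' on track 2 marks a cell of track 1.
multi_delta = {
    ('q0', ('0', 'B')): ('q1', ('0', '#'), +1),
    ('q1', ('1', 'B')): ('q0', ('1', '#'), -1),
}

# Collapse to a single track: each tuple of track symbols becomes one
# composite symbol (here rendered as a '|'-joined string), and the
# transitions are re-keyed on the composite symbols.
single_delta = {(q, '|'.join(syms)): (q2, '|'.join(outs), d)
                for (q, syms), (q2, outs, d) in multi_delta.items()}
```

The single-track machine reads and writes symbols like '0|#' exactly where the multi-track machine read and wrote the tuple ('0', '#'), so the two machines make the same moves step for step.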

The advantage of the multi-track formalism is that it can make many procedures easier to conceptualize and write, and it also acts as an intermediate formalism in showing that still other formalisms are equivalent to the basic single-tape, single-track TM model.

Multi-tape TMs
The multi-tape TM formalism, summarized in Figure TMmultitapes, is different from the multi-track formalism, in that a multi-tape TM has multiple read/write heads, one for each of n tapes, as well as multiple tape alphabets, $$\Gamma$$j. Each transition is defined as $$\delta$$(qi, (X1, X2, ..., Xn)) = (qk, (Y1, Y2, ..., Yn), (D1, D2, ..., Dn)), where Xj and Yj are members of their respective tape alphabets $$\Gamma$$j, and each Dj is the direction (L, R, U) of the read/write head for the jth tape. Thus, at any given time in processing, the n read/write heads need not be aligned at the same cells on their respective tapes, relative to where they all began, which is assumed to be in alignment at the start of the input string on the first tape at the very start of processing.

The multi-tape model adds no computational power to the standard single tape model. To see this, note that the standard single tape model is a special case of the multi-tape model where n = 1. Moreover, any multi-tape TM has an equivalent single tape, multi-track TM, which in turn can be converted into an equivalent single-tape, single-track TM. We just show a sketch of the first step here, and a sketch of the second step is above.

Each tape of the multi-tape TM will correspond to a track in a single tape, multi-track TM. In addition to n tracks for the n tapes, there will be n additional tracks, which give the relative location of the read/write head of the corresponding tape in the original multi-tape TM. So, if the head for tape j had moved three steps L (-3) and one step right (+1) and two steps unchanged (+0), then after the six steps the track representing its location would be -2 relative to its start location.

TMs with one-way infinite tapes
It's intuitive that the standard two-way infinite tape TM can be simulated by a TM with a one-way (semi-)infinite tape: associate one track with the side of the tape to the right of the standard TM's initial head position, and another track (or a second stack) with the other side of the head. Thus, these seeming restrictions to a semi-infinite tape, or to two stacks, don't reduce the set of languages that can be recognized relative to the standard TM model.

Can we reformulate the multi-tape conversion using two tracks per tape?

Nondeterministic TMs
A nondeterministic TM includes at least one configuration in which more than one transition is possible. So, we extend the transition function so that the value of $$\delta$$(qi, X) is a set, such as $$\delta$$(qi, X) = { ... (qk, Y, D) ...}. In many cases the set for a configuration might still be a singleton, and if we choose we could still consider a deterministic TM as a special case of this more general definition.

Nondeterministic TMs add no computational power beyond that of deterministic TMs: a deterministic TM can simulate a nondeterministic TM by a breadth-first enumeration of the nondeterministic machine's configurations, accepting if any enumerated configuration is accepting.
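A deterministic, breadth-first simulation of a nondeterministic TM can be sketched much as we did for NFAs and PDAs. The NTM below is a hypothetical example that nondeterministically guesses the position of a '00' substring; the names and conventions (tape as a string, blank 'B' off the ends) are illustrative.

```python
from collections import deque

# Hypothetical NTM over {0,1}: accepts strings containing '00' by guessing,
# at some 0, that the '00' substring starts there.
ndelta = {
    ('q0', '0'): [('q0', '0', +1), ('q1', '0', +1)],  # keep scanning, or guess
    ('q0', '1'): [('q0', '1', +1)],
    ('q1', '0'): [('q2', '0', +1)],                   # guess confirmed: '00'
}
accepting = {'q2'}

def ntm_accepts(w, max_steps=10000):
    # Deterministic breadth-first enumeration of configurations
    # (state, tape, head position).
    frontier = deque([('q0', w, 0)])
    steps = 0
    while frontier and steps < max_steps:
        q, tape, head = frontier.popleft()
        steps += 1
        if q in accepting:
            return True
        sym = tape[head] if 0 <= head < len(tape) else 'B'
        for q2, out, d in ndelta.get((q, sym), []):
            new_tape = (tape[:head] + out + tape[head + 1:]
                        if 0 <= head < len(tape) else tape)
            frontier.append((q2, new_tape, head + d))
    return False
```

The simulating machine is deterministic even though the simulated one is not; all the nondeterminism is absorbed into the queue of pending configurations.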

Compare TMs to the definition of PDAs, which were defined initially as inherently nondeterministic, and deterministic PDAs were defined as a special case of the default definition, rather than nondeterministic versions defined as extensions to the deterministic machines, as is the case with TMs and FAs. Why?

Presumably, the reason for this switch in convention is that for FAs and TMs the deterministic versions accept the same set of languages as the nondeterministic versions, so why not introduce the deterministic versions first and use them as a jumping off point for the nondeterministic versions. In contrast, deterministic PDAs are not as powerful as nondeterministic PDAs – the former recognize a proper subset of the languages of the latter. So make the nondeterministic PDAs the default, thus conveying an unambiguous equivalence to the CFLs.

Why are the nondeterministic and deterministic machines of equivalent power for FAs and TMs, but not PDAs? Offhand I don’t have a definitive answer, but it seems intuitive that it is related to a ceiling effect in the case of TMs (i.e., the most powerful machines) and a floor effect for FAs (the least powerful machines). PDAs are intermediate between the extremes, and it seems to be the case in many settings, far and wide, not just ALC, that much of the most interesting, or at least most variable, behaviors happens between the extremes.

TMs to compute functions
Let's start with the multiplication function.

TMs as subroutines
Subroutines are an important part of TM programming, where ‘calling’ a subroutine amounts to the TM transitioning to a network of states that have been specially set aside as implementing the subroutine, and the subroutine returns by leaving the machine in a configuration that can be picked up by the ‘calling’ routine. The use of the multiplication subroutine as a means of implementing the squaring function (i.e., n^2) is a good example of this.

Turing Equivalency of Digital Computers
Finally, there is an equivalence between TMs and standard digital computers. In one direction it seems clear that we could all simulate a TM in our favorite programming language. In the other direction, we illustrate how the basic components of interpreting a low level machine language program could be simulated on a multi-tape machine (which is equivalent to some very complicated single tape TM).

The basic components of a digital computer include (a) a central processing unit, (b) a memory, and we single out

We have been showing that TMs are equivalent to digital computers in terms of what can be computed in principle. Examples have been given that show TMs recognizing and enumerating various languages and computing different functions. But something important is still missing from a compelling argument for equivalence. Just as there are “universal” computer programs that can translate/compile/interpret other computer programs, there is a universal TM, call it MU, that can interpret or simulate another TM, Mk, on a given input string to Mk. In order for a TM, Mk, to be input to MU, we need to encode Mk as a string (because any TM, MU included, requires a string as input that is used to initialize MU’s tape). The language accepted by MU, call it LU, is of the form {⟨Mk⟩$w $$\mid$$ ⟨Mk⟩ is a string encoding of a TM, Mk, w is a string that is input to Mk, Mk accepts w, and $ is a delimiter separating the two substrings -- the encoding of Mk and its input}.

Turing Machines and Language Classes
As already noted, TMs are recognizers of unrestricted languages. Given an unrestricted grammar, G, we can design a TM, M, such that L(G) = L(M). The easiest strategy for constructing such a recognizer, though not the most efficient, is to design a TM recognizer that calls a subroutine TM that is an enumerator of L(G), as already discussed, such that given a string w, if the TM enumerator eventually generates w, then the TM recognizer accepts w as a member of L(G), else the TM recognizer rejects w as a member of L(G).

Recursively enumerable (RE) languages
Because TMs can enumerate the members of any language defined by an unrestricted grammar using, for example, a breadth-first enumeration strategy, the languages recognized by TMs are also referred to as the recursively enumerable languages. The recursively enumerable (RE) languages are synonymous with the unrestricted languages. But what happens when a TM recognizer of an RE (aka unrestricted) language is given a string that is not in the language? Ideally, and in most cases with a properly designed TM, the TM will reject w and halt. But in the case of some strings that are not in the language L(G) = L(M), the recognizer may not halt, but may run forever. To intuit this possibility, recall that an unrestricted grammar need not be a noncontracting grammar, and if the grammar is contracting, the breadth-first enumeration will contain paths along which sentential forms can shrink in size. This complicates the test of whether a string w of length $$\mid$$w$$\mid$$ can still appear as a future sentential form on an enumeration path, and in some cases there will be no such test.

To repeat, for some RE languages there is no TM that will halt in all cases where an input string is not in the TM's language.

Recursive languages
A TM (or more generally, a procedure) that halts on all inputs, be the input string a member of the TM's language or not a member, is called an algorithm. The language of a TM that always halts is called a recursive language. The recursive languages are a proper subset of the RE languages. The recursive languages are said to be decidable or solvable, since TMs to recognize them always halt with a decision of accept or reject.

RE languages that are not recursive are said to be undecidable or unsolvable, since there are some cases in which their TM recognizers will not halt with a reject decision. Non-RE languages are undecidable (aka unsolvable) in a second, even stronger sense -- there is no TM at all for recognizing all strings in the language in both the accept and reject cases.

We return to issues of undecidability/unsolvability and decidability/solvability when we discuss the properties of the RE languages and properties of the recursive languages, respectively.

Languages that are not RE
There are languages that are not recognized by any Turing machine; that is, there are languages that are not recursively enumerable (not RE, and thus not recursive either).

One might ask what languages could possibly be non-RE. Intuitively, examples of non-RE languages will be those for which there is no basis by which a recognizer (or enumerator) can be constructed. A language in which the members are selected randomly would be an intuitive illustrative example: LR = {w | w in Σ* and random(w) == true} could be formed by enumerating the strings in Σ* in canonical order and randomly deciding in each case whether a string, w, is in the language or not. The reason that a recognizer can't be built for this language assumes that random(w) is applied in the recognizer with no memory of its application in the creation of the language. And we can't simply remember the members, since there are an infinite number. In contrast, if random(w) were a pseudorandom process then such procedures are repeatable and we could implement a recognizer in that case, but if we have a truly random process then no TM recognizer can be built and the language is not RE. In addition to "random languages" there are demonstrably other non-RE languages. A demonstration of non-RE languages uses a diagonalization argument. Consider Figure TableofTMLanguages and recall that we can encode TMs as strings. Each row corresponds to the language represented by one TM encoding -- the actual encoding is not shown but is not important to the argument. We only show a label for the TM. If a row does not represent a valid TM encoding then its assumed to be an encoding of a "dummy" machine that represents the empty language (i.e., a row of all 0s for j $$\geq$$ 1). The '1' values in cells of the table, (i, j), indicate that TMi accepts string j. The diagonal represents the set of encodings of TMs that accept their own encodings. That is, a '1' in (4, 4) says that TM4 accepts TM4's encoding. Its possible that the diagonal corresponds to a row in the table, which would therefore represent the TM that represents the language of TMs (encodings) that accept "themselves". 
But if we complement the diagonal, as shown in Figure DiagonalLanguage, then that complemented diagonal, $$\bar{L}$$d, differs from every language in the table in at least one cell -- and so $$\bar{L}$$d cannot correspond to a row in the table, and therefore there is no TM that represents $$\bar{L}$$d, the language of TM encodings that do not accept themselves.

$$\bar{L}$$d = {<Mk> | <Mk> is not accepted by Mk} is not RE.
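The shape of the diagonal argument can be illustrated with a small, finite toy (the real table has infinitely many rows and columns; the four rows below are invented solely for illustration):

```python
# Toy version of Figure TableofTMLanguages: row i answers whether "TM i"
# accepts string j.  These rows are made up for illustration; any rows
# would do.
rows = [
    lambda j: j % 2 == 0,   # "TM 0" accepts even-numbered strings
    lambda j: j < 3,        # "TM 1" accepts the first three strings
    lambda j: True,         # "TM 2" accepts every string
    lambda j: False,        # "TM 3" is a "dummy": the empty language
]

def complemented_diagonal(i):
    """String i is in the complemented diagonal iff row i rejects it."""
    return not rows[i](i)

# The complemented diagonal disagrees with row i at column i, so it
# cannot equal any row of the table.
for i in range(len(rows)):
    assert complemented_diagonal(i) != rows[i](i)
```

In the infinite table the same one-cell disagreement rules out every row at once, which is why no TM can recognize $$\bar{L}$$d.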

There are other languages that are not RE. The language of string encodings of TMs for recursive languages is not RE, and thus the language of string encodings of TMs for languages that are not recursive is also not RE (p. 187, second edition).

Languages that are not RE are said to be undecidable or unsolvable, but in an even stronger sense than RE languages that are not recursive. For an RE language that is not recursive, there is a TM recognizer, but no recognizer that halts in all cases with either accept or reject decisions. For a non-RE language, there is no TM recognizer at all: there might be TMs that are approximate representations of a non-RE language, but there is no TM that precisely represents it.

Ld and Lu are RE but not recursive
The language $$L$$u is recursively enumerable, but is not recursive. This means that the universal TM halts on accept, but does not halt on reject in all cases. Intuitively, if the TM passed as an argument to the universal TM, Mu, doesn't halt on its input, then Mu doesn't halt either. Any programmer who has ever coded an infinite loop will recognize this. But to be convincing, can we identify a TM that doesn't halt in all cases, thus representing an RE language (because there is a TM) that is not recursive (because no recognizer halts in all cases)?

Recall from the section above that $$\bar{L}$$d = {<Mk> | <Mk> is NOT accepted by Mk} is not RE. The complement of this language is $$L$$d = {<Mk> | <Mk> is accepted by Mk} (i.e., the diagonal in Figure TableofTMLanguages and the left table of Figure DiagonalLanguage). In the previous section we noted that $$L$$d could correspond to a row in Figure TableofTMLanguages, and indeed we can build a TM, Md, that recognizes $$L$$d by providing the universal TM with the input <Mk>$<Mk>; if the universal TM accepts, then <Mk> is in $$L$$d = {<Mk> | <Mk> is accepted by Mk}, else <Mk> is not in $$L$$d.

The existence of Md indicates that $$L$$d is RE. Is $$L$$d recursive, however? The answer must be no, because if $$L$$d were recursive then $$\bar{L}$$d would necessarily be recursive as well. To preview a discussion later in "Properties of Recursive Languages", if L is a recursive language, then there is a TM, call it ML, that recognizes L and that is guaranteed to halt in all accept cases and in all reject cases. We can create a new TM, ML', that calls ML as a subroutine. If ML returns accept for an input string, then ML' returns reject. If ML returns reject for an input string, then ML' returns accept. ML' accepts the complement of L, and always halts, so the complement of L is recursive.
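The ML' construction can be sketched in a few lines. Here a decider is modeled as a total Python function returning True (accept) or False (reject), and the even-length language is an invented stand-in for some recursive language L:

```python
def complement_decider(m_l):
    """Given a decider m_l that halts on every input, build a decider
    for the complement language by flipping accept and reject.  The
    wrapper halts whenever m_l does, i.e., always."""
    def m_l_prime(w):
        return not m_l(w)
    return m_l_prime

# Hypothetical recursive language, for illustration: strings of even length.
even_length = lambda w: len(w) % 2 == 0
odd_length = complement_decider(even_length)
```

The construction works only because ML is total; flipping the answers of a recognizer that sometimes loops would not produce a recognizer for the complement.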

So $$L$$d is a language that is RE but not recursive, since if $$L$$d were recursive then $$\bar{L}$$d would be recursive too, and we know $$\bar{L}$$d isn't recursive (it's not even RE). We can say therefore that $$L$$u isn't recursive either, since a TM that decided $$L$$u could be used, via the construction of Md above, to decide $$L$$d.

Reductions
A reduction is a kind of translation. When we have talked about translations previously, as in the case of inductive proofs or ..., we generally used steps in the translation that were symmetric, and each step yielded an equivalent form, so the direction of a demonstration was arbitrary, at least formally. But a reduction doesn't necessarily require equivalence, and direction is important. Rather, you can think of a reduction as showing a one-way implication. If you hear “Reduce problem P1 to problem P2” in computational theory it means to “find a means of determining a solution for problem P1 using a means of determining a solution for P2.” In the previous section on showing $$L$$d and $$L$$u were not recursive, we reduced the problem of recognizing $$L$$d to the problem of recognizing $$L$$u, as illustrated in Figure ReductionExample1. In this example, the input to TM Md was an encoded TM <Mk>. This input was preprocessed into <Mk>$<Mk>, which was passed to Mu. Whatever means for determining solutions you devise, the reduction should have the property that for every instance of the P1 problem (e.g., membership in $$L$$d) that should return a ‘yes/accept’, the corresponding instance of the P2 problem (e.g., membership in $$L$$u) returns a ‘yes/accept’, and for all instances of P1 that should return a ‘no/reject’, so does P2 for the corresponding instances.

In a reduction from P1 to P2, you can think about the P2 solution procedure (e.g., Mu) as a subroutine of the P1 solution procedure (e.g., Md), albeit a subroutine that does the bulk of the work. Again, we are not showing P1 and P2 are equivalent, but only that one problem solution procedure can be found using another problem solution procedure. In a reduction from P1 to P2, we can also say colloquially that P2 (e.g., membership in Lu) is at least as hard as P1 (e.g., membership in Ld).
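The Md-to-Mu reduction amounts to a preprocessing step followed by a subroutine call. The sketch below uses a toy lookup table as a stand-in for the universal TM; the machine "encodings" and their accepted sets are invented purely for illustration:

```python
# Toy stand-in for the universal TM: an "encoding" is just a key into a
# table of accepted strings.  Invented for illustration only.
TOY_MACHINES = {
    "M1": {"M1", "abc"},   # "M1" accepts its own encoding
    "M2": {"xyz"},         # "M2" does not accept its own encoding
}

def toy_universal_tm(machine_encoding, w):
    """Mimics Mu: accept iff the encoded machine accepts w."""
    return w in TOY_MACHINES[machine_encoding]

def in_ld(mk):
    """The reduction: preprocess <Mk> into <Mk>$<Mk> (here, the pair
    (mk, mk)) and let the Lu solver do the real work."""
    return toy_universal_tm(mk, mk)
```

Note that `in_ld` does almost nothing itself: it only prepares the input, exactly as Md only builds <Mk>$<Mk> before handing control to Mu.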

Figure ReductionInProof also gives us an alternative way to prove, by contradiction, that Lu is not recursive (i.e., undecidable) given that $$L$$d is not recursive. Of the four logically equivalent ways for thinking about how reduction can be used in a proof by contradiction, the first (i.e., P2 $$\in$$ R $$\rightarrow$$ P1 $$\in$$ R) follows most intuitively from the reduction in Figure ReductionExample1 where we reduce the problem of membership in $$L$$d to the problem of membership in $$L$$u. In that reduction we assume that $$L$$u is recursive, and thus Mu halts in all cases. But if that were so then the reduction indicates that Md would also halt in all cases, which we know cannot be the case. We are left with a contradiction, and so our assumption that Lu is recursive must be wrong.

Linear bounded TMs and the CSLs
Linear bounded Turing machines are a form of restricted TM that exactly recognizes the class of context-sensitive languages (CSLs). A linear bounded TM adds two special symbols to Σ: one delimits the left end of the input and the other delimits the right end. These special symbols are used only in that way and cannot be overwritten. A linear bounded TM is a nondeterministic single-tape TM that never leaves the tape cells on which its input was placed. The input is {w | w is a string over Σ, excluding the two special end markers}.

Thm: If L is a CSL, then L is accepted by a linear bounded TM.

Intuitively, we know that the strings of L can be enumerated by a CSG in canonical order, and the derivation of a string cannot include a sentential form that is longer than the string itself; thus a linear bounded TM has enough tape to search for a derivation of its input.
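This intuition also yields a decision procedure for CSL membership: because a non-contracting derivation of w never passes through a sentential form longer than w, a breadth-first search over length-bounded sentential forms must terminate. A minimal sketch, using a standard CSG for {a^n b^n c^n | n ≥ 1} as the example grammar:

```python
from collections import deque

def csg_accepts(productions, start, w):
    """Decide whether a non-contracting grammar generates w.

    Each production's right side is at least as long as its left side,
    so every sentential form in a derivation of w has length at most
    len(w).  There are finitely many such forms, so this breadth-first
    search always halts."""
    limit = len(w)
    if limit == 0:
        return False  # a non-contracting grammar cannot derive epsilon
    seen = {start}
    frontier = deque([start])
    while frontier:
        form = frontier.popleft()
        if form == w:
            return True
        for lhs, rhs in productions:      # apply each rule at each position
            i = form.find(lhs)
            while i != -1:
                successor = form[:i] + rhs + form[i + len(lhs):]
                if len(successor) <= limit and successor not in seen:
                    seen.add(successor)
                    frontier.append(successor)
                i = form.find(lhs, i + 1)
    return False

# A standard CSG for {a^n b^n c^n | n >= 1}; uppercase symbols are
# variables, lowercase symbols are terminals.
ANBNCN = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
          ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
```

The pruning step `len(successor) <= limit` is the whole point: it is the same linear bound that the LBA's end markers enforce on the tape.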

Thm: If L = L(M) where M is a linear bounded TM, then L (excluding ε, if necessary) is a CSL.

The CSLs are a proper subset of the recursive languages. A language that is recursive but not context sensitive, thus showing the proper subset relationship, is described next, following Hopcroft and Ullman.

A Recursive Language that is not a CSL
Develop a binary encoding scheme for the Type 0 grammars (the grunt work is not important to understanding the argument); these can be numbered from 1 to infinity (the numbering will include “dummy” grammars, just as an earlier argument acknowledged “dummy” TMs). Of the Type 0 grammars, it's easy to recognize whether the ith grammar in the ordering is a CSG (its productions’ right-hand sides will never be shorter than the corresponding left-hand sides).

Define language L = {wi | wi is NOT in L(Gi)}. That is, L is the language of binary strings wi that encode CSGs Gi where wi is not generated by Gi (i.e., the language of grammars that do not generate “themselves”, so to speak). Since Gi is a CSG, there is an algorithm (always halts) that determines whether wi is in L(Gi). Thus, L is recursive, since given a string wi the test for membership of wi in L will always halt -- if wi does not encode a CSG then reject wi as a member of L; if wi encodes a CSG Gi, then if Gi generates wi, reject wi (and halt), else accept wi as a member of L (and halt). But L itself is not generated by any CSG.
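The always-halting membership test for L can be sketched as follows. Here decode_csg and csg_membership are assumed helpers (decoding an encoding is mechanical, and CSG membership is decidable as discussed above); the toy versions at the bottom are invented purely to exercise the sketch:

```python
def in_L(w, decode_csg, csg_membership):
    """Membership test for L = {wi | wi is NOT in L(Gi)}.  It halts on
    every input because both helper procedures halt on every input."""
    grammar = decode_csg(w)       # None if w does not encode a CSG
    if grammar is None:
        return False              # not a CSG encoding: reject
    # w is in L exactly when its own grammar does NOT generate it
    return not csg_membership(grammar, w)

# Toy helpers, invented for illustration: encodings starting with "G"
# count as CSGs, and only "Gself" "generates itself".
toy_decode = lambda w: ("toy grammar",) if w.startswith("G") else None
toy_member = lambda grammar, w: (w == "Gself")
```

Since every branch either rejects immediately or calls a halting subroutine, the test is an algorithm, which is exactly what makes L recursive.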

A proof by contradiction shows that L cannot be generated by a CSG. Suppose there were a CSG Gk that generated L, so L = L(Gk). If wk were in L(Gk) = L we get a contradiction, because L contains only encodings of CSGs that do not generate “themselves”, so wk could not be in L. If instead wk is not in L(Gk) = L, we again get a contradiction, because L is defined so as to include wk in that case.

Thus, L is a recursive language that is not a CSL.

If the reasoning sounds like an Escher painting initially, I understand, but reflect on it. Many proofs by contradiction at this level have the feel of paradoxes, though they are not.

Rather than simply becoming comfortable with the logic of paradoxical-sounding proofs by contradiction, however, it's often instructive to dig deeper. To say that L is not a CSL is to say that there is no non-contracting grammar that generates L -- none, nil, nada! Remember that CSGs are a normal form of non-contracting grammars. Is there something about L = {wi | wi is NOT in L(Gi) and binary encoded Gi = wi} that precludes a non-contracting grammar? Carrying the implications further, for every Type 0 (unrestricted) grammar that does generate L, there must exist a wj in L whose derivation passes through a sentential form longer than wj itself; that is, some contracting production must be used.

How do we connect this reasoning to the fact that the CSLs are exactly the languages of non-contracting grammars? Any grammar for LR~CS must be contracting. Does this suggest other recursive but not CSL languages, such as planning problems in which you must sometimes undo subgoals?

Exercises, Projects, and Discussions
FA Exercise 1: Consider the FA of Figure FAexample3. Rewrite this FA to explicitly show all states and transitions.

FA Exercise 2: Describe the language that the FA of Figure FAExample3 represents.

FA Exercise 3: (adapted from Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 53) Define an FA that accepts strings of 0s and 1s that contain three consecutive 1s.

FA Exercise 4: (due to Hopcroft and Ullman, 1969, p. 44) Define an FA that accepts strings such that every 0 is followed immediately by a 1.

FA Exercise 5: Have a discussion with an AI large language model (LLM) on formalizing and generalizing the discussion under "Equivalence of NFAs and DFAs/Systematic and Judicious Growth". Your goal is to obtain a concise algorithm, or a runnable program if you wish, for translating an NFA to a DFA using the strategy that is discussed. You should not accept whatever description you get from the LLM out of hand, but check it and iterate until you feel confident that you can explain the algorithm and its inputs/outputs to the instructor, a TA, or another student who is familiar with the area but needs help with this particular algorithm.

FA Exercise 6: (adapted from Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 54) Construct an NFA that accepts the set of all strings of 0s and 1s such that the 6th symbol from the right end is 1. Hint: Because the 6th from the right end will be read by the NFA BEFORE it reaches the right end, the NFA cannot know that the current '1' is 6th from the right or not. So use the guessing strategy described in "Equivalence of NFAs and DFAs/Retrospection".

FA Exercise 7: (adapted from Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 54, pp. 64-65) Translate the NFA that you develop in FA Exercise 6 to a DFA. Choose an unambiguous way of representing the DFA.

FA Exercise 8: Translate the DFA you construct in FA Exercise 7 to a regular grammar.

FA Exercise 9: (due to Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 54) Give an FA that accepts all strings beginning with a 1 that, when interpreted as binary integers, are multiples of 5. For example, 101, 1010, and 1111 are accepted and 0, 100, and 111 are not.

FA Exercise 10: (due to Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 53) Construct an FA over {0, 1} that accepts all strings such that each block of 5 consecutive symbols contains at least two 0s. Use a sliding-window interpretation of the problem, so that there are two blocks of 5, for example, in a 6-symbol string: symbols 1-5 and symbols 2-6.

FA Exercise 11: Continue processing the input of '0112' in Figure NFAProcess2c, and confirm '0112' is accepted by the NFA. Also confirm that '122' is accepted, and that '120' is not accepted.

FA Exercise 12: (a) (due to Hopcroft and Ullman, 1979, p. 48) Give an NFA (which is not also a DFA) for the language described in Figure NFAExample3, except for i $$\geq$$ 0, instead of i $$\geq$$ 1; (b) Give a DFA for the language of (a).

FA Exercise 13: Give an NFA (that is not also a DFA) that accepts the same language as Figure NFAExample2, but with no $$\varepsilon$$ transitions. Can you write a general procedure for translating an NFA with $$\varepsilon$$ transitions to an NFA without $$\varepsilon$$ transitions, and that is not necessarily a DFA?

RE Exercise 1: There is a pedagogical case, as well as a practical case, to be made for starting this chapter with regular expressions, showing their equivalence to regular grammars, then presenting the generalized version of FAs with transitions labeled by arbitrary REs, showing that these are equivalent to FAs with only primitive REs (i.e., alphabet symbols) labeling the transitions, and continuing on. Carry out a thought experiment in which you redesign the chapter with REs labeling FA transitions. How would NFAs be defined by presenting the generalized FAs first? How might demonstrations of equivalence be affected? How might design and development of FAs be impacted by an encouraged allowance to start with generalized FAs?

RE Exercise 2: Translate Figure DFAtoRE2 by eliminating state C first.

RE Exercise 3: (due to Hopcroft and Ullman, 1979, p. 50; Hopcroft, Motwani, and Ullman, 2007, p. 92) Construct an RE for the set of all strings over {0,1} such that the number of 0s and number of 1s are equal and no prefix has two more 0s than 1s and no prefix has two more 1s than 0s  (i.e., for any prefix the number of 0s and number of 1s differ by no more than one).

RE Exercise 4: Using the textual descriptions and examples as a guide, write an algorithm in pseudocode or a programming language of your choice to translate DFAs to REs. Demonstrate the algorithm on a few test cases. If you use a large language model, ensure that you understand the LLM's result, and revise it as necessary. Show the prompts you used.

PDA Exercise 1: Give a pushdown automaton for {w | w is a string in (0+1)* and w consists of an equal number of 0s and 1s}. Hint: you can do this with a deterministic PDA.

PDA Exercise 2: Give a pushdown automaton for {wcwR | c is a character and w is a string in (a+b)*}.

PDA Exercise 3: Give a pushdown automaton for {cwwR | c is a character and w is a string in (a+b)*} that accepts by empty stack. Hint: you can revise the PDA defined by Figures PDAExample1a and PDAExample1b if you wish.

PDA Exercise 4: Construct a PDA for palindromes over {a, b}.

PDA Exercise 5: Just as we gave an FA explanation of the Pumping Lemma for RLs, give a PDA explanation of the PL for CFLs.

TM Exercise 1: Define a formalism for a PDA+ that has two stacks instead of only one stack. Show that the two-stack PDA+ is equivalent in computational power to a TM.