Theory of Formal Languages, Automata, and Computation/Applications of Language Classes

Recursive Descent Parsing and Translation
CFGs have been used for some time in defining the syntax of programming languages, starting with Algol. Ideally, we can directly translate a grammar into mutually recursive procedures for parsing a computer program, and ultimately for compiling it. Consider the grammars of Figure ExpressionGrammars. The first of these grammars for arithmetic expressions is simple, yet ambiguous, since id + id * id (and other strings) can be generated by two (or more) distinct leftmost derivations, or equivalently distinct parse trees, making it unsuitable as the basis for automated parsing. The second grammar is not ambiguous; it enforces operator precedence rules to ensure a single, desirable parse tree for every string. This is as far as we got when introducing this CFL of arithmetic expressions. As it turns out, grammar (b) presents another problem for the recursive procedures we have in mind: it is left recursive. When this grammar is translated into mutually recursive procedures, a function such as Expression (i.e., E) can call itself, as indicated in E $$\rightarrow$$ E + T, with no change in the argument that is passed, which can lead to infinite recursion. Figure AlgolGrammar shows a simplified CFG for the Algol-60 programming language. Algol-60, as in the year 1960, was the first programming language with syntax defined by a context-free grammar. The variables in the grammar translate rather directly into mutually recursive functions in a recursive descent parser. Figure AlgolTranslator shows the parsing routines corresponding to the grammar variables of expression, term, and factor. From parse_expression, a term is what is first expected, and so parse_term is called to identify a portion of code that was generated by the term part of the grammar. Parse_term then calls parse_factor to identify the Algol code that was derived from the factor variable of the grammar.
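The translation from grammar to code can be made concrete with a small sketch. The routines below are not the textbook's Figure AlgolTranslator; they are a minimal, invented evaluator showing how left-recursive rules such as E $$\rightarrow$$ E + T are replaced by iteration (E $$\rightarrow$$ T { + T }), so that each routine always consumes input before recursing.

```python
# A minimal, invented evaluator (not Figure AlgolTranslator): the left
# recursion in E -> E + T and T -> T * F is replaced by iteration, so no
# routine can call itself without first consuming a token.
def parse(tokens):
    pos = 0  # index of the next unread token

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def parse_expression():            # E -> T { + T }
        value = parse_term()
        while peek() == '+':
            match('+')
            value += parse_term()
        return value

    def parse_term():                  # T -> F { * F }
        value = parse_factor()
        while peek() == '*':
            match('*')
            value *= parse_factor()
        return value

    def parse_factor():                # F -> ( E ) | number
        tok = peek()
        if tok == '(':
            match('(')
            value = parse_expression()
            match(')')
            return value
        match(tok)                     # consume the number token
        return float(tok)

    result = parse_expression()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return result
```

Because multiplication is recognized deeper in the call chain than addition, the grammar's operator precedence falls out of the structure of the mutually recursive routines for free.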

In parse_factor, actual symbols in the program are scanned by the part of the parser called the lexical analyzer, which is concerned with identifying basic tokens, such as constants, variables, operators, and punctuation in the Algol code. In the sample translator code, the match function is highlighted in bold blue, and this function contains a call to the next_token function, which is essentially the lexical analyzer and performs the recognition procedures associated with the parts of the grammar in Figure AlgolGrammar that are highlighted in bold blue. Those productions, in bold blue, correspond (almost) to a regular grammar, and you should confirm that productions not conforming exactly to the regular grammar constraints of A $$\rightarrow$$ aB or A $$\rightarrow$$ a can be easily translated to productions that do adhere to the constraints (though it is a bit unwieldy to do so because of the space it would require). Though the next_token code is not shown in this illustration, it is an important part of a parser implementation since the lexical analyzer is what actually scans the input string/program.
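Although next_token is not shown in the text, a plausible minimal sketch of such a lexical analyzer, using Python's re module, might look as follows; the token classes here are my assumptions about what the bold-blue regular productions recognize, not the textbook's actual code.

```python
import re

# A hypothetical next_token (the textbook does not show its code). The token
# classes below are a guess at what the regular, bold-blue productions
# recognize: numbers, identifiers, operators, and punctuation.
TOKEN_PATTERN = re.compile(r"""
      (?P<NUMBER> \d+(?:\.\d+)? )
    | (?P<IDENT>  [A-Za-z][A-Za-z0-9]* )
    | (?P<OP>     [+\-*/] )
    | (?P<PUNCT>  [();:=] )
    | (?P<SKIP>   \s+ )
""", re.VERBOSE)

def next_token(text, pos=0):
    """Return (kind, lexeme, new_pos), or (None, None, pos) at end of input."""
    while pos < len(text):
        m = TOKEN_PATTERN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        if m.lastgroup != 'SKIP':
            return m.lastgroup, m.group(), m.end()
        pos = m.end()      # skip whitespace and try again
    return None, None, pos
```

Called repeatedly on '1.2 + Var21 * 3', it yields NUMBER, OP, IDENT, OP, and NUMBER tokens in turn, which is exactly the stream the recursive routines consume.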

In Figure AlgolTranslator, the generation of machine code is shown in bold purple. The machine-code language assumes that arguments are pushed (loaded) onto a stack, and binary operations such as addition and multiplication pop the top two items off the stack, apply the operation, and push the result back onto the stack. So in the illustration of Figure AlgolTranslator, the machine-code translation of the Algol expression '1.2 + Var21 * 3' reads as follows and would execute as indicated in the comments when run.


 * 1) LOAD_CONST 1.2
 * 2) LOAD_VAR Var21    // (i.e., push the value of Var21)
 * 3) LOAD_CONST 3
 * 4) MULTIPLY               // pop 3 and pop value of Var21, compute 3 * value of Var21, and push the result, P, onto the runtime stack
 * 5) ADD                        // pop P and pop 1.2, compute P + 1.2, and push the result, R, onto the runtime stack
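The execution just described can be sketched as a tiny stack-machine interpreter. The instruction names follow the listing above, while the run function, the env mapping, and the tuple encoding of instructions are invented for illustration.

```python
# A tiny interpreter for the stack-machine code above. Instruction names
# follow the listing; run, env, and the tuple encoding are invented.
def run(program, env):
    stack = []
    for op, *args in program:
        if op == 'LOAD_CONST':
            stack.append(args[0])          # push a literal constant
        elif op == 'LOAD_VAR':
            stack.append(env[args[0]])     # push the value of a variable
        elif op == 'MULTIPLY':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == 'ADD':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown instruction {op}")
    return stack[-1]   # the expression's result is left on top of the stack

# The five-instruction listing for '1.2 + Var21 * 3':
code = [('LOAD_CONST', 1.2), ('LOAD_VAR', 'Var21'), ('LOAD_CONST', 3),
        ('MULTIPLY',), ('ADD',)]
```

With Var21 bound to some value in env, running the listing leaves the value of 1.2 + Var21 * 3 on top of the stack, exactly as the comments in the listing describe.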

If I had asked chatGPT to show more code for the parser, it would have included parser code for other grammar variables such as 'assignment statement', as well as additional code-generation commands such as STORE_VAR for storing the top of the runtime stack in a relative memory address associated with an Algol program variable name. There are other important parts of a translator, such as the symbol table for associating Algol program identifiers with (relative) computer memory addresses, that are not tied to any underlying grammar.

In sum, a recursive descent parser and translator follows elegantly from the specification of a programming language grammar. The recursive routines, as with any function calls, will push records onto the parser's runtime stack (not to be confused with the runtime stack that is assumed by the assembly code above), and this reliance on a stack makes the parser comparable to a PDA.

Relationships to Artificial Intelligence
There are many informal and formal connections between AI and formal languages, automata, and computation. This chapter delves into these connections, as well as the requisite and optional AI material. Notes embedded in the main text indicate how I often present the material over a semester. While this material is at the end of the textbook, it is typically scattered in my lectures throughout the semester.

Probabilistic Grammars and Automata
In AI, non-determinism is prevalent, but it cannot always be removed entirely while guaranteeing the same results because (a) operations/productions/transitions are uncertain, and (b) the world models/IDs are not typically complete or correct (i.e., the AI doesn't know some things, or what it knows is wrong).

Measures of uncertainty are used in AI, most notably probabilities or fuzzy membership.

Probabilistic grammars are a type of formal grammar where each production rule has a probability associated with it.

These probabilities represent the likelihood that a particular rule will be used to generate a string.
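As a small illustration (my own toy example, not one from the text), a probabilistic grammar can be represented as a table of weighted productions and sampled by repeatedly rewriting variables:

```python
import random

# A toy probabilistic grammar: each variable maps to (probability, RHS)
# pairs. Sampling rewrites variables top-down, choosing productions
# according to their probabilities.
GRAMMAR = {
    'S': [(0.7, ['a', 'S', 'b']),   # S -> aSb with probability 0.7
          (0.3, [])],               # S -> epsilon with probability 0.3
}

def sample(symbol='S', rng=random):
    if symbol not in GRAMMAR:                 # terminal: emit it as-is
        return symbol
    r, total = rng.random(), 0.0
    for prob, rhs in GRAMMAR[symbol]:
        total += prob
        if r <= total:
            return ''.join(sample(s, rng) for s in rhs)
    return ''   # guard against floating-point rounding in the probabilities
```

Every sampled string has the form a^n b^n, and the grammar assigns a^n b^n the probability 0.7^n * 0.3, so shorter strings are more likely: the grammar defines a probability distribution over the language, not just membership in it.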

Natural Language Processing: Probabilistic grammars can be used to model the probability distribution of sentences in a natural language, making them useful for tasks such as speech recognition and machine translation.

Information Retrieval: Probabilistic grammars can be used to generate queries that match documents in a database, allowing for more accurate search results.

DNA Sequencing: Probabilistic grammars can be used to model the probability distribution of DNA sequences, making them useful for tasks such as genome assembly and alignment.

NFA-to-DFA Translation and Speedup Learning
The first point of inspiration is in the elimination of nondeterminism when translating an NFA to a DFA (or a nondeterministic PDA to a deterministic PDA, if the latter exists). I’ve said that NFAs and nondeterminism generally can lead to simpler and more elegant solutions, and can be much easier for a human to specify, but these automata require enumeration (depth first, breadth first, or some heuristic approach) when they are actually used (to see whether a string reaches an accepting configuration along at least one path of the machine), and that is computationally costly. So ideally, we can imagine that in the case of a complex problem, someone can specify an NFA solution, but then use automated translation to acquire a more efficient-to-execute deterministic recognizer. The same can be said about AI and machine learning generally. I regard it as definitional of AI that an AI system explore alternatives, requiring enumeration or search – that is nondeterminism. Machine learning can then be used to eliminate nondeterminism, and thus “speed up” problem solving, planning, and other processes by the AI.
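The subset construction that performs this NFA-to-DFA translation can be sketched briefly. The nfa_to_dfa function and the toy NFA below (which accepts strings over {0,1} ending in 01) are invented for illustration, and epsilon-moves are omitted for brevity.

```python
from collections import deque

# An invented sketch of the subset construction (no epsilon-moves): the NFA
# is a dict from (state, symbol) to a set of states, and each DFA state is
# a frozenset of NFA states.
def nfa_to_dfa(nfa_delta, start, accepts, alphabet):
    start_set = frozenset([start])
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            # Union of everywhere the NFA could go from any state in 'current'
            nxt = frozenset(q for s in current
                            for q in nfa_delta.get((s, a), set()))
            dfa_delta[(current, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    dfa_accepts = {S for S in seen if S & accepts}
    return dfa_delta, start_set, dfa_accepts

# Toy NFA accepting strings over {0,1} that end in "01":
delta = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'}, ('q1', '1'): {'q2'}}
dfa, start_set, dfa_accepts = nfa_to_dfa(delta, 'q0', {'q2'}, ['0', '1'])
```

Running the resulting DFA is a single left-to-right pass with no backtracking, which is precisely the efficiency gained by eliminating the nondeterminism up front, at translation time.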

It’s rarely, if ever, the case that all sources of nondeterminism can be eliminated in an AI program, however. The reasons are several. One, operators or “transitions” in AI often do not have perfectly predictable outcomes. A robot may try to pick up a cup, for example, and the cup slips from the robot’s grasp – there may always be at least two outcomes, I’ve got the cup or I don’t, and probably more outcomes if there can be liquid in the cup or not, hot versus cold liquid, etc. While some outcomes may be rare, we still want the robot to be able to respond to them – this possibility of multiple outcomes is a clear manifestation of nondeterminism. Another source of nondeterminism, related to the first, is that the AI may not know with certainty about all relevant aspects of the environment. For example, an intelligent vehicle may not know that a pedestrian is skateboarding behind a building and is about to abruptly enter traffic, and the AI had better anticipate that possibility, as well as mundane circumstances. Nondeterminism again.

Even though it’s unlikely that machine learning can get rid of all nondeterminism, it can “reduce” the nondeterminism. In a real AI application we might measure the extent to which the nondeterminism is reduced by the expected number of states that an AI system has to enumerate, before, during, and after learning. The expected number of states generated during an enumeration or search is the sum of the states generated across all possible problems, each weighted by the probability of that problem. So an expected number is a weighted average.
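This weighted average is easy to state in code. The problem distribution and state counts below are invented numbers, purely to illustrate the before/after comparison.

```python
# A sketch of the weighted-average measure described above. The problem
# distribution and state counts are invented, purely illustrative numbers.
def expected_states(problems):
    """problems: (probability, states_enumerated) pairs; probabilities sum to 1."""
    assert abs(sum(p for p, _ in problems) - 1.0) < 1e-9
    return sum(p * n for p, n in problems)

# Hypothetical measurements over the same three problem classes:
before_learning = [(0.5, 120), (0.3, 400), (0.2, 1000)]
after_learning  = [(0.5, 30),  (0.3, 90),  (0.2, 250)]
```

Here learning would shrink the expected enumeration from 0.5*120 + 0.3*400 + 0.2*1000 = 380 states to 0.5*30 + 0.3*90 + 0.2*250 = 92 states, even though some nondeterminism remains in every problem class.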

DFA-to-RE Translation and Macro Learning
The second learning task that I want to relate is called macro learning, which is used in AI to learn to solve problems more effectively. In particular, if an AI observes that transitions are repeatedly sequenced together, or it discovers that certain transitions sequenced together lead to solutions, then the AI might form a macro operator (or macro transition) that concatenates the transitions so that they can be applied as a single packet, or macro. Macro formation reduces the nondeterminism found in enumeration by taking large steps along individual paths or derivations, but it also adds nondeterminism by creating additional choice points in an enumeration or search for a string’s derivation, since the macro is added to, rather than replacing, the operators (or transitions) from which it is composed. The management of the tradeoffs in macro learning led to a good deal of research on learning choice preferences or biases, often involving probabilities placed on choices. I don’t get into probabilities here, but the slides do touch on their use in probabilistic grammars, which are related to AI approaches to solving problems or, analogously, to deriving strings.
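One ingredient of macro formation, finding the pair of operators that most often occur in sequence in observed solutions and packaging it as a single step, can be sketched as follows. The function names and blocks-world sequences are invented, and a real macro learner would also weigh the extra choice points the macro introduces.

```python
from collections import Counter

# An invented sketch of one step of macro learning: find the operator pair
# that most often occurs in sequence in observed solutions, and package it
# as a single macro step.
def most_common_pair(solutions):
    pairs = Counter()
    for seq in solutions:
        pairs.update(zip(seq, seq[1:]))     # count adjacent operator pairs
    return pairs.most_common(1)[0][0]

def apply_macro(seq, macro):
    """Rewrite a sequence so occurrences of the pair become one macro step."""
    out, i = [], 0
    while i < len(seq):
        if tuple(seq[i:i + 2]) == macro:
            out.append('+'.join(macro))     # the macro, applied as one packet
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

# Blocks-world operator sequences (invented observations):
solutions = [['Unstack', 'Putdown', 'Pickup', 'Stack'],
             ['Unstack', 'Putdown', 'Unstack', 'Putdown']]
macro = most_common_pair(solutions)
```

Applying the macro shortens derivations (fewer steps per path), while keeping the original operators available, which is exactly the tradeoff described above.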

AI Planning
While the implication operator in logic theorem proving looks the same as the production symbol, $$\rightarrow$$, theorem proving is additive. When making an inference, nothing in the current world model is overwritten.

In contrast, in AI planning a “mental” model of the current world state is maintained and changed through the application of operators (or productions). A common representation of operators shows PREconditions, which have to be true before the operator can be applied, and EFFects, which are the conditions that are true after the operator is applied. For our purposes we can think of an operator as a production, as in a grammar, PRE $$\rightarrow$$ EFF, where effects rewrite preconditions. There is more nuance to it than that, but the details are left to an AI course.
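A minimal sketch of this PRE $$\rightarrow$$ EFF view, in the STRIPS style where effects are split into facts to add and facts to delete, might look as follows; the fact strings and the Pickup(A) operator instance are illustrative assumptions, not a full planner.

```python
# A minimal STRIPS-style sketch of PRE -> EFF (invented, not a full planner):
# a state is a set of facts; an operator's effects are split into facts to
# add and facts to delete, which is how effects "rewrite" preconditions.
def applicable(state, pre):
    return pre <= state            # all preconditions hold in the state

def apply_operator(state, pre, add, delete):
    if not applicable(state, pre):
        raise ValueError("preconditions not satisfied")
    return (state - delete) | add  # delete old facts, then add new ones

# Pickup(?X) instantiated for block A:
state = {'clear(A)', 'ontable(A)', 'handempty'}
new_state = apply_operator(
    state,
    pre={'clear(A)', 'ontable(A)', 'handempty'},
    add={'holding(A)'},
    delete={'ontable(A)', 'clear(A)', 'handempty'},
)
```

Unlike logical inference, the delete set overwrites part of the world model: after Pickup(A), the facts ontable(A) and handempty are gone, not merely supplemented.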

The slides show two examples of the derivations of single plans, analogous to a single derivation of a string, but as with languages, there is a search for the derivation (plan), which requires an enumeration of many paths, most of which are unsuccessful derivations (and that’s ok – we are looking for just one plan/derivation).

Generating a plan can be modeled as generating a string with a grammar. The operators can be regarded as skeletal productions, that is, productions with variables, as illustrated in the blocks-world operators of the slides – Pickup(?X), Putdown(?X), Unstack(?X, ?Y), Stack(?X, ?Y). The last slides of Friday’s lecture give a brief glimpse at this equivalence, though AI planning will generally operate with additional pattern-matching capabilities beyond those we have seen with respect to strings. The equivalence also makes reference to ridiculous computational (storage and runtime) requirements in the case where we are interpreting AI states as strings and AI operators as productions, but computational cost is not an issue we are concerned with at this point, and similar equivalence arguments that are not concerned with costs are made by Hopcroft, Motwani, and Ullman, 3rd Edition (2007), when comparing Turing machines and computers (e.g., breakout boxes on pp. 322, 346, 364).

The grammars that are equivalent to AI planning problems would typically be context sensitive, but a general or universal AI planning system is unrestricted. I don’t show this, but have a chat with chatGPT to learn more. Here are some handy references too, not just to AI planning but to other material at the intersection of CS 3252 and AI, notably forms of machine learning that have fallen out of favor but are still very interesting – these references are optional material.

Solomonoff, R. J. (1964). “A Formal Theory of Inductive Inference, Part 1”, Information and Control, Vol. 7.

Biermann, A.W. (1972). “On the Inference of Turing machines from sample computations”, Artificial Intelligence, Vol 3.

Angluin, D. (1980). “Inductive Inference of Formal Languages from Positive Data”, Information and Control, Vol. 45.

Moore, R.C. (1984). “A Formal Theory of Knowledge and Action” Technical Report 320, SRI.

Tate, A. AI Planning MOOC.

Also, plenty of other references online (search for ‘AI Planning’) or take the AI course.

Exercises, Projects, and Discussions
Project 1: Build a compiler for a non-trivial subset of a programming language of choice.