Artificial Intelligence/Search/Exhaustive search/Finite state automata

Searching with Finite State Automata
A common way of searching for a pattern in a string of symbols is by using Finite State Automata, also known as finite state machines or FSMs. FSMs are abstract structures which can be implemented in a number of different ways, but all of them share some common properties. Each FSM has :


 * a finite set of states
 * a way of receiving inputs
 * transitions, which are paths from one state to another triggered by inputs
 * optional actions, which come in a few different flavours depending on what kind of event triggers them:
 * entry actions occur when entering a particular state
 * exit actions occur when exiting a particular state
 * input actions only occur when a particular class of input is received while the machine is in a particular state
 * transition actions occur during specified transitions

Acceptors
When used for pattern matches within strings, an FSM is given each successive character of the string as its input. For simple substring matching, the FSM would have one state for each number of characters in the substring matched so far.

Finite state automata which simply accept or reject an input string with a yes/no response are called acceptors. Here is a state diagram of an acceptor that returns a "yes" response when a string contains the substring "aba":



The machine starts in the "empty" state, representing an empty substring buffer. The "other" transitions represent inputs of any character besides "a" or "b", so the only input that will get the machine out of the empty state is the first character of the substring to be matched, "a". From then on, inputs are converted into transitions from state to state.

The double circle around "aba" indicates that it is an accept state of this FSM, meaning that the machine will give a "yes" result if it is in this state when it has no more inputs to process. All other states are non-accpting states and would result in a "no" result at the end of processing.

Transducers
An FSM with actions that provide output is instead called a finite state transducer. Here is an example of a transducer written in Ruby which strips C++/Java-style comments out of an input stream:


 * 1) !/usr/bin/env ruby


 * 1) comment_stripper.rb
 * 2) Requires Ruby 1.8.7
 * 3) Strips C++/Java style comments from an input stream using a finite state
 * 4) transducer. Single-line comments begin with two forward slashes ("//") and
 * 5) extend to the end of the line. Block comments begin with a slash followed by an
 * 6) asterisk ("/*") and end with an asterisk followed by a slash ("*/").
 * 7) When run from the command line, performs the transformation on standard input.
 * 8) Example:
 * 9) cat hello_world.cpp | ruby comment_stripper.rb
 * 1) When run from the command line, performs the transformation on standard input.
 * 2) Example:
 * 3) cat hello_world.cpp | ruby comment_stripper.rb
 * 1) cat hello_world.cpp | ruby comment_stripper.rb
 * 1) cat hello_world.cpp | ruby comment_stripper.rb

class CommentStripper def initialize @current_state = :code end def input(c) output = INPUT_ACTIONS[c][@current_state] || '' @current_state = TRANSITIONS[c][@current_state] || @current_state return output end TRANSITIONS = Hash.new do |hash,input| # triggered by any characters other than the ones explicitly defined below {     :slash  => :code, :star  => :block } end.merge(    # overrides the default transitions defined above    "/" => {      :code   => :slash,      :slash  => :line,      :star   => :code    },    "*" => {      :slash  => :block,      :block  => :star    },    "\n" => {      :line   => :code,      :slash  => :code,      :star   => :block    }  ) INPUT_ACTIONS = Hash.new do |hash,input| {     :code   => input, :slash => ("/" + input) } end.merge(    "/" => {},    "*" => {      :code   => "*"    },    "\n" => {      :code   => "\n",      :slash  => "/\n",      :line   => "\n"    }  ) end

if $0 == __FILE__ cs = CommentStripper.new $stdin.each_char do |c| print cs.input(c) end end

The transitions and input actions are both defined in nested hash tables keyed by input and state. The input method takes in a single character of the input string, performs any necessary transition to a new state, and returns the transduced character based on the original state. This diagram shows the transitions and states defined in this machine: