Browse Prior Art Database

Method for converting a finite state automaton into a regular expression

IP.com Disclosure Number: IPCOM000200417D
Publication Date: 2010-Oct-11
Document File: 2 page(s) / 48K

Publishing Venue

The IP.com Prior Art Database

Abstract

An algorithm is presented which can be used to turn any finite-state automaton into a regular expression.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Page 01 of 2

Method for converting a finite state automaton into a regular expression

A regular expression is a concise string which defines a regular language. A regular language is a (possibly infinite) list of finite strings which obeys certain rules. The intersection of two regular languages A and B is simply the list C of all strings appearing in both of the original lists A and B. It can be proven that C must also form a regular language and hence have a regular expression associated with it. However, computing the regular expression for C is troublesome.

The basic process for computing this intersection would be as follows:
1) convert the regexes A and B into "finite state automata" (e.g see http://en.wikipedia.org/wiki/Finite-state

_

machine)
2) compute the intersection of the two automata - this results in a third automaton, C, which accepts only the strings which both A and B would accept
3) convert C back into a regex.

Step 1 is quite simple and has been accomplished many times. A web-based example is http://osteele.com/tools/reanimator/. Step 2 is also relatively simple and this functionality is available in any well-equipped FSA library: http://osteele.com/software/python/fsa/ is just such an example. Step 3 is much more complex. At the moment the available approaches seem to consist of these: http://neumannhaus.com/christoph/papers/2005-03-16.DFA

_to

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.5838&rep=rep1&type =pdf

    A simple output is highly desirable. Regexes have no standard form, many regexes are equivalent, and many algorithms for step 3 result in unreadably large and complex (though valid) regular expressions.

    The best method seems to be the "Brzozowski Algebraic Method" listed in the first document. Notice how this involves creating a set of simultaneous equations - one for each state in C - and using back-substitution to solve the equation for the initial state.

    In the algorithm presented here, instead of constructing one equation for each state in C, one equation is constructed for each state-set in C. Below is a worked example.

Suppose C has states {1, 2, 3, 4, 5}, and a simple two-character alphabet {"a", "b"}, and a transition function f where:
f(1, "a") = 2. f(1, "b") = 4. State 1 is the initial state.
f(2, "a") = 3. f(2, "b") = 5.
f(3, "a") = 3. f(3, "b") = 5. State 3 is final.
f(4, "a") = 2. f(4, "b") = 4.
f(5, "a") = 2. f(5, "b") = 4. State 5 is final.

A finite-state automaton can have multiple final states, so a starting state-set consists of all the final states of C. This is {3, 5},
An "inverse transition function" or "inbound map" is constructed, f', for {3, 5}. For each symbol in C's alphabet, all the states in C where f(

,

) falls in {3, 5} are found. Thus:

    For symbol...