Tutorial

2.2.10. Finite State Filters

Finite State Machines

In this section I show how to construct Dual-Use Filters from finite state machines. For purposes of this section, a finite state machine consists of a collection of states, represented as ints, a distinguished initial state, a transition table, a collection of event handlers and a stack of characters. These finite state machines were inspired by the finite state machine examples that accompany the Boost Metaprogamming library. See, e.g., <libs/mpl/example/player1.cpp>

For us, an event is simply a character. A transition table is an MPL Forward Sequence of rows. Each row is a 4-tuple consisting of a current state, a character class, a next state and an event handler. During the operation of a finite state machine, its transition table is consulted to determine how the machine's state should be updated and which event handler to call. When an event e occurs, the transition table is searched for the first row whose current state is equal to the current state of the machine, and whose character class matches e. The machine's state is then updated to the row's next state, and the event handler is called with e as an argument.

A character class is a class type with a static member function test taking a character and a std::locale as arguments. E.g.,

    struct is_any {
        static bool test(char, const std::locale&);
    };

There are several built in character classes. The character class is_any matches any character. For any character c, the character class is<c> only matches c. The character classes alnum, is_alpha, is_cntrl, is_digit, is_graph, is_lower, is_print, is_punct, is_space, is_upper, is_xdigit implement locale-sensitive character classification.

Event handlers are member functions of a finite state machine which take a single character as argument and return void. The implementation of an event handler typically examines the given character and pushes zero or more characters of filtered data onto the stack. A special event handler on_eof takes no arguments and is called automatically after the last character in a sequence is processed.

There are three built-in event handlers, on_any, push and skip. The handler on_any is called automatically when an event occurs which is not covered by any row in the transition table. The default behavior of on_any is to do nothing. The handler push pushes the given character onto the stack. The handler skip ignores the current character. The handlers push and skip are not defined by default; in order to make them available, the macro BOOST_IOSTREAM_FSM should be invoked within the class definition of a finite state machine, passing the state machine's name as the macro argument.

Finite state machines must derive from the class template boost::iostreams::finite_state_machine, defined in the header <libs/iostreams/example/finite_state_filter.hpp>. The first template argument to finite_state_machine should be the derived class itself; the second template argument should be the character type.

A finite state machine's transition table should be declared as a member type named transition_table. Its event handlers should be implemented as member functions. A finite state machine may define a static integral constant named initial_state, which will be treated as the machine's initial state. Alternatively, the definition of the initial state can be omitted, in which case it will default to 0.

Given a finite state machine my_fsm, you can define a Dual-Use Filter as follows:

namespace io = boost::iostreams;

typedef io::finite_state_filter<my_fsm> my_finite_state_filter;

dos2unix_fsm

The following state machine can be used to convert line endings from DOS to UNIX. The constant initial_state, the class template row and the character classes is and is_any are members of the base class boost::iostreams::finite_state_machine.

#include <boost/mpl/vector>
#include <libs/iostreams/example/finite_state_filter.hpp>

namespace io = boost::iostreams;

struct dos2unix_fsm : io::finite_state_machine<dos2unix_fsm> {
    BOOST_IOSTREAMS_FSM(dos2unix_fsm) // Define skip and push.
    typedef dos2unix_fsm self;
    typedef boost::mpl::vector<
                row<initial_state, is<'\r'>, initial_state, &self::skip>,
                row<initial_state, is_any,   initial_state, &self::push>
            > transition_table;
};

This machine has just a single state. Its transition table can be understood as follows: if the current character is '\r', ignore it; otherwise, forward it unchanged.

unix2dos_fsm

The following state machine can be used to convert line endings from UNIX to DOS:

#include <boost/mpl/vector>
#include <libs/iostreams/example/finite_state_filter.hpp>

namespace io = boost::iostreams;

struct unix2dos_fsm : io::finite_state_machine<unix2dos_fsm> {
    BOOST_IOSTREAMS_FSM(unix2dos_fsm) // Define skip and push.
    typedef unix2dos_fsm self;

    void on_lf(char) { push('\r'); push('\n'); }

    typedef boost::mpl::vector<
                row<initial_state, is<'\n'>, initial_state, &self::on_lf>,
                row<initial_state, is_any,   initial_state, &self::push>
            > transition_table;
};

This machine also has just a single state. The event handler on_lf pushes a DOS line-ending sequence onto the stack. The transition table can be understood as follows: if the current character is '\n', write a DOS line-ending sequence; otherwise, forward it unchanged.

uncommenting_fsm

The following state machine removes C-style comments. Although it's not quite sophisticated enough for processing source code, it's a good illustration of a multi-state machine.

#include <boost/mpl/vector>
#include <libs/iostreams/example/finite_state_filter.hpp>


namespace io = boost::iostreams;

struct uncommenting_fsm : io::finite_state_machine<uncommenting_fsm> {
    BOOST_IOSTREAMS_FSM(uncommenting_fsm) // Define skip and push.
    typedef uncommenting_fsm self;

    static const int no_comment   = initial_state;
    static const int pre_comment  = no_comment + 1;
    static const int comment      = pre_comment + 1;
    static const int post_comment = comment + 1;

    void push_slash(char c) { push('/'); push(c); }

    typedef boost::mpl::vector<
                row<no_comment,   is<'/'>, pre_comment,  &self::skip>,
                row<no_comment,   is_any,  no_comment,   &self::push>,
                row<pre_comment,  is<'*'>, comment,      &self::skip>,
                row<pre_comment,  is<'/'>, pre_comment,  &self::push>,
                row<pre_comment,  is_any,  no_comment,   &self::push_slash>,
                row<comment,      is<'*'>, post_comment, &self::skip>,
                row<comment,      is_any,  comment,      &self::skip>,
                row<post_comment, is<'/'>, no_comment,   &self::skip>,
                row<post_comment, is<'*'>, post_comment, &self::skip>,
                row<post_comment, is_any,  comment,      &self::skip>
            > transition_table;
};

This machine has four states and one user-defined event handler. I'll leave it as an exercise to discover how it works.