Tutorial

2.2.3. Shell Comments Filters

Suppose you want to write a filter to remove shell-style comments. The basic algorithm is as follows: you examine characters one at a time, forwarding them unchanged, until you encounter a comment character, typically '#'. When you find a comment character, you examine and ignore characters until you encounter a newline character, at which point the algorithm begins again. Note that this algorithm consists of two subalgorithms: one algorithm for reading ordinary text, and one for reading comments.
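Before turning to the library, the two subalgorithms can be sketched in plain standard C++. This is only an illustration: the function name strip_shell_comments is hypothetical and not part of Boost.Iostreams, and the sketch assumes that the newline ending a comment is forwarded so that line structure is preserved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Iostreams: returns a copy of
// `text` with shell-style comments removed.
std::string strip_shell_comments(const std::string& text,
                                 char comment_char = '#')
{
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        // Subalgorithm 1: forward ordinary text unchanged.
        while (i < text.size() && text[i] != comment_char)
            out += text[i++];
        if (i == text.size())
            break;
        ++i; // step over the comment character itself
        // Subalgorithm 2: ignore the rest of the comment, stopping at
        // (but not consuming) the newline. Assumption: the newline is kept.
        while (i < text.size() && text[i] != '\n')
            ++i;
    }
    return out;
}
```

Called as strip_shell_comments("a # b\nc\n"), this sketch yields "a \nc\n". Unlike the Filters in the following sections, it needs the whole input in memory at once, which is the limitation the Filter concepts are designed to avoid.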

In the next three sections, I'll express this algorithm as a stdio_filter, an InputFilter and an OutputFilter. The source code can be found in the header <libs/iostreams/example/shell_comments_filter.hpp>. These examples were inspired by James Kanze's UncommentExtractor.hh (see [Kanze]).

shell_comments_stdio_filter

You can express a shell comments Filter as a stdio_filter as follows:

#include <cstdio>    // EOF
#include <iostream>  // cin, cout
#include <boost/iostreams/filter/stdio.hpp>

namespace boost { namespace iostreams { namespace example {

class shell_comments_stdio_filter : public stdio_filter {
public:
    explicit shell_comments_stdio_filter(char comment_char = '#')
        : comment_char_(comment_char) 
        { }
private:
    void do_filter()
    {
        bool  skip = false;
        int   c;
        while ((c = std::cin.get()) != EOF) {
            skip = c == comment_char_ ?
                true :
                c == '\n' ?
                    false :
                    skip;
            if (!skip)
                std::cout.put(c);
        }
    }
    char comment_char_;
};

} } } // End namespace boost::iostreams::example

The implementation of the virtual function do_filter is straightforward: The local variable skip keeps track of whether you are currently processing a comment; the while loop reads a character c from std::cin, updates skip and writes c to std::cout unless skip is true.

Filters which derive from stdio_filter are DualUseFilters, which means they can be used either for input or for output, but not both simultaneously. Therefore shell_comments_stdio_filter can be used in place of shell_comments_input_filter and shell_comments_output_filter, below.

shell_comments_input_filter

Next you will express a shell comments Filter as an InputFilter. A typical narrow-character InputFilter looks like this:

#include <boost/iostreams/categories.hpp>  // input_filter_tag
#include <boost/iostreams/char_traits.hpp> // EOF, WOULD_BLOCK
#include <boost/iostreams/operations.hpp>  // get, read, putback

namespace io = boost::iostreams;

class my_input_filter {
public:
    typedef char              char_type;
    typedef io::input_filter_tag  category;

    template<typename Source>
    int get(Source& src)
    {
        // Attempt to produce one character of filtered
        // data, reading from src as necessary. If successful,
        // return the character; otherwise return EOF to
        // indicate end-of-stream, or WOULD_BLOCK
    }

    /* Other members */
};

The function get attempts to produce a single character of filtered output. It accesses the unfiltered character sequence through the provided Source src, using the fundamental i/o operations get, read and putback. If a character is produced, get returns it. Otherwise get returns one of the status codes EOF or WOULD_BLOCK. EOF, which indicates end-of-stream, is a macro defined in the standard header <cstdio>. WOULD_BLOCK, which indicates that input is temporarily unavailable, is a constant defined in the namespace boost::iostreams, in the header <boost/iostreams/char_traits.hpp>.

You could also write the above example as follows:

#include <boost/iostreams/concepts.hpp>  // input_filter

class my_input_filter : public input_filter {
public:
    template<typename Source>
    int get(Source& src);

    /* Other members */
};

Here input_filter is a convenience base class which provides the member types char_type and category, as well as no-op implementations of member functions close and imbue. I will discuss close shortly.

You're now ready to express a shell comments Filter as an InputFilter:

#include <boost/iostreams/char_traits.hpp> // EOF, WOULD_BLOCK
#include <boost/iostreams/concepts.hpp>    // input_filter
#include <boost/iostreams/operations.hpp>  // get

namespace boost { namespace iostreams { namespace example {

class shell_comments_input_filter : public input_filter {
public:
    explicit shell_comments_input_filter(char comment_char = '#')
        : comment_char_(comment_char), skip_(false)
        { }

    template<typename Source>
    int get(Source& src)
    {
        int c;
        while (true) {
            if ((c = boost::iostreams::get(src)) == EOF || c == WOULD_BLOCK)
                break;
            skip_ = c == comment_char_ ?
                true :
                c == '\n' ?
                    false :
                    skip_;
            if (!skip_)
                break;
        }
        return c;
    }

private:
    char comment_char_;
    bool skip_;
};

} } } // End namespace boost::iostreams::example

Here the member variable skip_ plays the same role as the local variable skip in shell_comments_stdio_filter::do_filter. The implementation of get is very similar to that of shell_comments_stdio_filter::do_filter: the while loop reads a character c, updates skip_ and returns c unless skip_ is true. The main difference is that you have to handle the special value WOULD_BLOCK, which indicates that no input is currently available.

So you see that implementing an InputFilter from scratch is a bit more involved than deriving from stdio_filter. When writing an InputFilter you must be prepared to be interrupted at any point in the middle of the algorithm; when this happens, you must record enough information about the current state of the algorithm to allow you to pick up later exactly where you left off. The same is true for OutputFilters. In fact, many InputFilters and OutputFilters can be seen as finite state machines; I will formalize this idea later. See Finite State Filters.
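To see why the state must live in a member variable, here is a minimal sketch that interrupts the algorithm in the middle of a comment. It does not use the library: kEof and kWouldBlock are assumed stand-ins for EOF and WOULD_BLOCK, and scripted_source, resumable_comment_filter and drain are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed stand-ins for EOF and boost::iostreams::WOULD_BLOCK; the values
// are chosen only to be distinct from every character.
const int kEof        = -1;
const int kWouldBlock = -2;

// Hypothetical source: yields the scripted values one per call, then kEof.
struct scripted_source {
    std::vector<int> script;
    std::size_t      pos;
    explicit scripted_source(const std::vector<int>& s)
        : script(s), pos(0) { }
    int get() { return pos < script.size() ? script[pos++] : kEof; }
};

// Same logic as shell_comments_input_filter::get, with the flag kept as a
// member so that it survives an interruption.
class resumable_comment_filter {
public:
    explicit resumable_comment_filter(char comment_char = '#')
        : comment_char_(comment_char), skip_(false) { }

    template<typename Source>
    int get(Source& src)
    {
        int c;
        while (true) {
            if ((c = src.get()) == kEof || c == kWouldBlock)
                break;          // pass the status code on to the caller
            skip_ = c == comment_char_ ? true : c == '\n' ? false : skip_;
            if (!skip_)
                break;          // c is a character of filtered output
        }
        return c;
    }
private:
    char comment_char_;
    bool skip_;
};

// Drains src through the filter, simply retrying after each kWouldBlock.
template<typename Source>
std::string drain(resumable_comment_filter& f, Source& src)
{
    std::string out;
    for (int c; (c = f.get(src)) != kEof; )
        if (c != kWouldBlock)
            out += static_cast<char>(c);
    return out;
}
```

If the source delivers 'a', '#', 'b', then kWouldBlock, then 'c', '\n', 'd', the sketch produces "a\nd": the 'c' that arrives after the interruption is still recognized as part of the comment because skip_ was remembered between calls. A real caller would wait for input rather than retry immediately as drain does.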

There's still one problem with shell_comments_input_filter: its instances can only be used once. That's because someone might close a stream while the skip_ flag is set. If the stream were later reopened, with a fresh sequence of unfiltered data, the first line of text would be filtered out, regardless of whether it was commented out.

The way to fix this is to make your Filter Closable. To do this, you must implement a member function close. You must also give your filter a category tag convertible to closable_tag, to tell the Iostreams library that your filter implements close.

The improved Filter looks like this:

namespace boost { namespace iostreams { namespace example {

class shell_comments_input_filter : public input_filter {
public:
    shell_comments_input_filter();

    template<typename Source>
    int get(Source& src);

    template<typename Source>
    void close(Source&) { skip_ = false; }
private:
    bool skip_;
};

} } } // End namespace boost::iostreams::example

Here I've derived from the helper class input_filter, which provides a member type char_type equal to char and a category tag convertible to input_filter_tag and to closable_tag. The implementation of close simply clears the skip_ flag so that the Filter will be ready to be used again.
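The effect of clearing the flag can be reproduced without the library. In the following sketch the class reusable_comment_filter and its members filter and close are hypothetical stand-ins, not the Closable interface itself; filter processes one complete character sequence and close plays the role of the member function described above.

```cpp
#include <string>

// Hypothetical stand-in for a Closable filter; not the library's interface.
class reusable_comment_filter {
public:
    explicit reusable_comment_filter(char comment_char = '#')
        : comment_char_(comment_char), skip_(false) { }

    // Filters one character sequence, carrying skip_ over from earlier calls.
    std::string filter(const std::string& text)
    {
        std::string out;
        for (std::string::size_type i = 0; i < text.size(); ++i) {
            char c = text[i];
            skip_ = c == comment_char_ ? true : c == '\n' ? false : skip_;
            if (!skip_)
                out += c;
        }
        return out;
    }

    // Plays the role of close: forget any comment left open.
    void close() { skip_ = false; }
private:
    char comment_char_;
    bool skip_;
};
```

If a first sequence ends inside a comment, for example "x # open", and the same object then filters "first\nsecond\n" without close being called, the result is "\nsecond\n": the first line is lost. Calling close between the two sequences gives "first\nsecond\n" as expected.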

shell_comments_output_filter

Next, let's express a shell comments Filter as an OutputFilter. A typical narrow-character OutputFilter looks like this:

#include <boost/iostreams/categories.hpp>  // output_filter_tag
#include <boost/iostreams/operations.hpp>  // put, write

namespace io = boost::iostreams;

class my_output_filter {
public:
    typedef char               char_type;
    typedef io::output_filter_tag  category;

    template<typename Sink>
    bool put(Sink& dest, int c)
    {
        // Attempt to consume the given character of unfiltered
        // data, writing filtered data to dest as appropriate. 
        // Return true if the character was successfully consumed.
    }

    /* Other members */
};

The function put attempts to filter the single character c, writing filtered output to the Sink dest. It accesses dest using the fundamental i/o operations put and write. Both of these functions may fail: iostreams::put can return false, and iostreams::write can consume fewer characters than requested. If this occurs, the member function put is allowed to return false, indicating that c could not be consumed. Otherwise, it must consume c and return true.

You could also write the above example as follows:

#include <boost/iostreams/concepts.hpp>  // output_filter

class my_output_filter : public output_filter {
public:
    template<typename Sink>
    bool put(Sink& dest, int c);

    /* Other members */
};

Here output_filter is a convenience base class which provides the member types char_type and category, as well as no-op implementations of member functions close and imbue.

You're now ready to express a shell comments Filter as an OutputFilter:

#include <boost/iostreams/concepts.hpp>    // output_filter
#include <boost/iostreams/operations.hpp>  // put

namespace boost { namespace iostreams { namespace example {

class shell_comments_output_filter : public output_filter {
public:
    explicit shell_comments_output_filter(char comment_char = '#')
        : comment_char_(comment_char), skip_(false)
        { }

    template<typename Sink>
    bool put(Sink& dest, int c)
    {
        skip_ = c == comment_char_ ?
            true :
            c == '\n' ?
                false :
                skip_;

        if (skip_)
            return true;

        return iostreams::put(dest, c);
    }

    template<typename Sink>
    void close(Source&) { skip_ = false; }
private:
    char comment_char_;
    bool skip_;
};

} } } // End namespace boost::iostreams::example

The member function put first examines the given character c and updates the member variable skip_; next, unless skip_ is true, it attempts to write c. The member function close simply clears the skip_ flag so that the Filter will be ready to be used again.
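The failure case of put can be exercised with a sink that refuses characters once it is full. The sketch below is library-free and rests on assumptions: bounded_sink, sketch_output_filter and write_all are hypothetical names, and the sink's put member stands in for iostreams::put.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sink that accepts at most `capacity` characters and then
// reports failure, imitating a sink that cannot currently accept data.
struct bounded_sink {
    std::string data;
    std::size_t capacity;
    explicit bounded_sink(std::size_t n) : capacity(n) { }
    bool put(char c)
    {
        if (data.size() >= capacity)
            return false;
        data += c;
        return true;
    }
};

// Same logic as shell_comments_output_filter::put.
class sketch_output_filter {
public:
    explicit sketch_output_filter(char comment_char = '#')
        : comment_char_(comment_char), skip_(false) { }

    template<typename Sink>
    bool put(Sink& dest, int c)
    {
        skip_ = c == comment_char_ ? true : c == '\n' ? false : skip_;
        if (skip_)
            return true;                        // consumed by discarding it
        return dest.put(static_cast<char>(c));  // false: c was not consumed
    }
private:
    char comment_char_;
    bool skip_;
};

// Offers text to the filter one character at a time and returns how many
// characters were consumed before put first returned false.
template<typename Sink>
std::size_t write_all(sketch_output_filter& f, Sink& dest,
                      const std::string& text)
{
    std::size_t n = 0;
    while (n < text.size() && f.put(dest, text[n]))
        ++n;
    return n;
}
```

With a sink of capacity 2 and the text "ab#c\nd", write_all consumes four characters and stops at the newline, which the full sink rejects. After the capacity is raised, offering the remaining "\nd" succeeds and the sink holds "ab\nd". Offering the rejected character a second time is harmless in this sketch, because updating skip_ twice with the same character leaves it in the same state.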