Tutorial

2.2.5. Tab-Expanding Filters

Suppose you want to write a filter which replaces each tab character with one or more space characters in such a way that the document appears unchanged when displayed. The basic algorithm is as follows: You examine characters one at a time, forwarding them as-is and keeping track of the current column number. When you encounter a tab character, you replace it with a sequence of space characters whose length depends on the current column count. When you encounter a newline character, you forward it and reset the column count.
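
Before turning to the Filter versions, the algorithm above can be sketched as an ordinary function that processes a whole string at once. This is only an illustration using the standard library: the name expand_tabs is hypothetical and is not part of Boost.Iostreams or of the example header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Iostreams: expands every tab in
// `in` using the column-tracking algorithm described above.
std::string expand_tabs(const std::string& in, int tab_size = 8)
{
    assert(tab_size > 0);
    std::string out;
    int col_no = 0;  // current column, counted from 0
    for (char c : in) {
        if (c == '\t') {
            // Pad with spaces up to the next multiple of tab_size.
            int spaces = tab_size - (col_no % tab_size);
            out.append(static_cast<std::size_t>(spaces), ' ');
            col_no += spaces;
        } else {
            out.push_back(c);
            // A newline resets the column; anything else advances it.
            col_no = (c == '\n') ? 0 : col_no + 1;
        }
    }
    return out;
}
```

The Filters in the following sections perform the same computation, but one character at a time, so the column count has to be kept in a member variable instead of a local one.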

In the next three sections, I'll express this algorithm as a stdio_filter, an InputFilter and an OutputFilter. The source code can be found in the header <libs/iostreams/example/tab_expanding_filter.hpp>. These examples were inspired by James Kanze's ExpandTabsInserter.hh (see [Kanze]).

tab_expanding_stdio_filter

You can express a tab-expanding Filter as a stdio_filter as follows:

#include <cassert>   // assert
#include <cstdio>    // EOF
#include <iostream>  // cin, cout
#include <boost/iostreams/filter/stdio.hpp>

namespace boost { namespace iostreams { namespace example {

class tab_expanding_stdio_filter : public stdio_filter {
public:
    explicit tab_expanding_stdio_filter(int tab_size = 8)
        : tab_size_(tab_size), col_no_(0)
    {
        assert(tab_size > 0);
    }
private:
    void do_filter();
    void do_close();
    void put_char(int c);
    int  tab_size_;
    int  col_no_;
};

} } } // End namespace boost::iostreams::example

The helper function put_char is identical to line_wrapping_stdio_filter::put_char. It writes a character to std::cout and updates the column count:

    void put_char(int c)
    {
        std::cout.put(c);
        if (c == '\n') {
            col_no_ = 0;
        } else {
            ++col_no_;
        }
    }

Using put_char you can implement do_filter as follows:

    void do_filter()
    {
        int c;
        while ((c = std::cin.get()) != EOF) {
            if (c == '\t') {
                int spaces = tab_size_ - (col_no_ % tab_size_);
                for (; spaces > 0; --spaces)
                    put_char(' ');
            } else {
                put_char(c);
            }
        }
    }

The while loop reads a character from std::cin and writes it to std::cout, unless it is a tab character, in which case it writes an appropriate number of space characters to std::cout.
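
To make "an appropriate number" concrete, the arithmetic used in do_filter can be isolated in a small function. The name spaces_for_tab is hypothetical and appears nowhere in the library; it simply restates the expression tab_size_ - (col_no_ % tab_size_).

```cpp
#include <cassert>

// Hypothetical helper: number of spaces that replace a tab found at the
// 0-based column col_no, given a tab stop every tab_size columns.
int spaces_for_tab(int col_no, int tab_size)
{
    assert(tab_size > 0 && col_no >= 0);
    return tab_size - (col_no % tab_size);
}
```

With the default tab size of 8, a tab at column 3 becomes five spaces and a tab at column 8 becomes eight. The result is never zero, so a tab always produces at least one space.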

As with line_wrapping_stdio_filter, the virtual function do_close resets the Filter's state:

    void do_close() { col_no_ = 0; }

tab_expanding_input_filter

You can express a tab-expanding Filter as an InputFilter as follows:

#include <cassert>                         // assert
#include <boost/iostreams/char_traits.hpp> // EOF, WOULD_BLOCK
#include <boost/iostreams/concepts.hpp>    // input_filter
#include <boost/iostreams/operations.hpp>  // get

namespace boost { namespace iostreams { namespace example {

class tab_expanding_input_filter : public input_filter {
public:
    explicit tab_expanding_input_filter(int tab_size = 8)
        : tab_size_(tab_size), col_no_(0), spaces_(0)
    { 
        assert(tab_size > 0); 
    }

    template<typename Source>
    int get(Source& src);

    template<typename Source>
    void close(Source&);
private:
    int get_char(int c);
    int   tab_size_;
    int   col_no_;
    int   spaces_;
};

} } } // End namespace boost::iostreams::example

Let's look first at the helper function get_char:

    int get_char(int c)
    {
        if (c == '\n') {
            col_no_ = 0;
        } else {
            ++col_no_;
        }
        return c;
    }

This function updates the column count based on the given character c and returns c. Using get_char you can implement get as follows:

    template<typename Source>
    int get(Source& src)
    {
        if (spaces_ > 0) {
            --spaces_;
            return get_char(' ');
        }

        int c;
        if ((c = iostreams::get(src)) == EOF || c == WOULD_BLOCK)
            return c;

        if (c != '\t')
            return get_char(c);

        // Found a tab. Call this filter recursively.
        spaces_ = tab_size_ - (col_no_ % tab_size_);
        return this->get(src);
    }

The implementation is similar to that of line_wrapping_input_filter::get. Since get can return only a single character at a time, whenever a tab character must be replaced by a sequence of space characters, only the first space character can be returned. The rest must be returned by subsequent invocations of get. The member variable spaces_ stores the number of space characters still to be returned.

The implementation begins by checking whether any space characters remain to be returned. If so, it decrements spaces_ and returns a space. Otherwise, a character is read from src. Ordinary characters, as well as the special values EOF and WOULD_BLOCK, are returned as-is. When a tab character is encountered, the number of spaces which must be returned by future invocations of get is recorded, and a space character is returned.
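
The pull-style behaviour described above can be tried out without Boost using a simplified model. Everything in this sketch is an assumption made for illustration: string_source stands in for a real Source, -1 stands in for EOF, WOULD_BLOCK is left out, and the names input_model and read_all are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for a Source (an assumption of this sketch): yields the
// characters of a string one at a time, then -1 at end of input.
struct string_source {
    std::string data;
    std::size_t pos;
    explicit string_source(const std::string& s) : data(s), pos(0) { }
    int get()
    {
        return pos < data.size()
            ? static_cast<unsigned char>(data[pos++]) : -1;
    }
};

// Simplified model of tab_expanding_input_filter, without WOULD_BLOCK.
class input_model {
public:
    explicit input_model(int tab_size = 8)
        : tab_size_(tab_size), col_no_(0), spaces_(0)
    {
        assert(tab_size > 0);
    }

    int get(string_source& src)
    {
        if (spaces_ > 0) {        // spaces still owed from an earlier tab
            --spaces_;
            return get_char(' ');
        }
        int c = src.get();
        if (c == -1)              // end of input is passed through
            return c;
        if (c != '\t')
            return get_char(c);
        // Tab: record the padding owed, then deliver the first space.
        spaces_ = tab_size_ - (col_no_ % tab_size_);
        return get(src);
    }
private:
    int get_char(int c)
    {
        if (c == '\n') col_no_ = 0; else ++col_no_;
        return c;
    }
    int tab_size_, col_no_, spaces_;
};

// Drains the model into a string, one get() call per output character.
std::string read_all(const std::string& input, int tab_size)
{
    string_source src(input);
    input_model filter(tab_size);
    std::string out;
    for (int c; (c = filter.get(src)) != -1; )
        out.push_back(static_cast<char>(c));
    return out;
}
```

Each call to get yields exactly one character, so a single tab in the input is spread over several consecutive calls.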

As usual, the function close resets the Filter's state:

    template<typename Source>
    void close(Source&)
    {
        col_no_ = 0;
        spaces_ = 0;
    }

tab_expanding_output_filter

You can express a tab-expanding Filter as an OutputFilter as follows:

#include <cassert>                         // assert
#include <boost/iostreams/concepts.hpp>    // output_filter
#include <boost/iostreams/operations.hpp>  // put

namespace boost { namespace iostreams { namespace example {

class tab_expanding_output_filter : public output_filter {
public:
    explicit tab_expanding_output_filter(int tab_size = 8)
        : tab_size_(tab_size), col_no_(0), spaces_(0)
    { 
        assert(tab_size > 0); 
    }

    template<typename Sink>
    bool put(Sink& dest, int c);

    template<typename Sink>
    void close(Sink&);
private:
    template<typename Sink>
    bool put_char(Sink& dest, int c);
    int  tab_size_;
    int  col_no_;
    int  spaces_;
};

} } } // End namespace boost::iostreams::example

The implementation helper function put_char is the same as line_wrapping_output_filter::put_char: it attempts to write the given character to dest and returns false if the write fails. On success it increments the column number, unless the character is a newline, in which case the column number is reset.

    template<typename Sink>
    bool put_char(Sink& dest, int c)
    {
        if (!iostreams::put(dest, c))
            return false;
        if (c != '\n')
            ++col_no_;
        else
            col_no_ = 0;
        return true;
    }

Using put_char you can implement put as follows:

    template<typename Sink>
    bool put(Sink& dest, int c) 
    {
        for (; spaces_ > 0; --spaces_)
            if (!put_char(dest, ' '))
                return false;

        if (c == '\t') {
            spaces_ = tab_size_ - (col_no_ % tab_size_) - 1;
            return this->put(dest, ' ');
        } 

        return put_char(dest, c);
    }

The implementation begins by attempting to write any space characters left over from previously encountered tabs. If successful, it examines the given character c. If c is not a tab character, it attempts to write it to dest. Otherwise, it calculates the number of spaces which must be inserted and calls itself recursively. Using recursion here saves us from having to decrement the member variable spaces_ at two different points in the code.

Note that after a tab character is encountered, put will return false until all the associated space characters have been written.
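
The retry behaviour can be observed with a simplified model that does not depend on Boost. The sketch below is an illustration only: limited_sink is a made-up stand-in for a non-blocking Sink that refuses characters once its budget is used up, and output_model is a hypothetical non-template copy of the put logic shown above.

```cpp
#include <cassert>
#include <string>

// Stand-in for a non-blocking Sink (an assumption of this sketch): accepts
// at most `budget` characters, then refuses until the budget is raised.
struct limited_sink {
    std::string data;
    int budget;
    explicit limited_sink(int b) : budget(b) { }
    bool put(int c)
    {
        if (budget == 0)
            return false;
        --budget;
        data.push_back(static_cast<char>(c));
        return true;
    }
};

// Simplified, non-template model of tab_expanding_output_filter.
class output_model {
public:
    explicit output_model(int tab_size = 8)
        : tab_size_(tab_size), col_no_(0), spaces_(0)
    {
        assert(tab_size > 0);
    }

    bool put(limited_sink& dest, int c)
    {
        // First flush spaces left over from an earlier, interrupted tab.
        for (; spaces_ > 0; --spaces_)
            if (!put_char(dest, ' '))
                return false;
        if (c == '\t') {
            // All spaces but one are queued; the recursive call writes
            // the queued spaces and then the final one.
            spaces_ = tab_size_ - (col_no_ % tab_size_) - 1;
            return put(dest, ' ');
        }
        return put_char(dest, c);
    }
private:
    bool put_char(limited_sink& dest, int c)
    {
        if (!dest.put(c))
            return false;
        if (c != '\n') ++col_no_; else col_no_ = 0;
        return true;
    }
    int tab_size_, col_no_, spaces_;
};
```

In this model, a caller that receives false and later passes the same tab again still ends up at the correct tab stop: col_no_ already counts the spaces that were written, so the recomputed padding covers only what is missing.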

As usual, the function close resets the Filter's state:

    template<typename Sink>
    void close(Sink&)
    {
        col_no_ = 0;
        spaces_ = 0;
    }