Tutorial

2.2.6. Dictionary Filters

A dictionary filter is a Filter which performs text substitution in the following manner. It maintains a collection of pairs of strings whose first components are words and whose second components represent replacement text — I'll call such a collection a dictionary, and refer to the pairs it contains as definitions. When a dictionary filter encounters a word which appears as the first component of a definition, it forwards the replacement text instead of the original word. Other words, whitespace and punctuation are forwarded unchanged.

The basic algorithm is as follows: You examine characters one at a time, appending them to a string which I'll call the current word. When you encounter a non-alphabetic character, you consult the dictionary to determine whether the current word appears as the first component of a definition. If it does, you forward the replacement text followed by the non-alphabetic character. Otherwise, you forward the current word followed by the non-alphabetic character. When the end-of-stream is reached, you consult the dictionary again and forward either the curent word or its replacement, as appropriate.

In the following sections, I'll express this algorithm as a stdio_filter, an InputFilter and an OutputFilter. The source code can be found in the header <libs/iostreams/example/dictionary_filter.hpp>.

dictionary

You can represent a dictionary using the following class:

#include <map>
#include <string>

namespace boost { namespace iostreams { namespace example {

class dictionary {
public:
    void add(std::string key, const std::string& value);
    void replace(std::string& key);

    /* ... */
};

} } } // End namespace boost::iostreams:example

The member function add converts key to lower case and adds the pair key, value to the dictionary. The member function replace searches for a definition whose first component is equal to the result of converting key to lower case. If it finds such a definition, it assigns the replacement text to key, adjusting the case of the first character to match the case of the first character of key. Otherwise, it does nothing.

dictionary_stdio_filter

You can express a dictionary filter as a stdio_filter as follows:

#include <cstdio>    // EOF
#include <iostream>  // cin, cout
#include <boost/iostreams/filter/stdio.hpp>

namespace boost { namespace iostreams { namespace example {

class dictionary_stdio_filter : public stdio_filter {
public:
    dictionary_stdio_filter(dictionary& d) : dictionary_(d) { }
private:
    void do_filter()
    {
        using namespace std;
        while (true) {
            int c = std::cin.get();
            if (c == EOF || !std::isalpha((unsigned char) c)) {
                dictionary_.replace(current_word_);
                cout.write( current_word_.data(),
                            static_cast<streamsize>(current_word_.size()) );
                current_word_.erase();
                if (c == EOF)
                    break;
                cout.put(c);
            } else {
                current_word_ += c;
            }
        }
    }
    dictionary&  dictionary_;
    std::string  current_word_;
};

} } } // End namespace boost::iostreams:example

The implementation of do_filter simply loops, reading characters from std::cin and appending them to the member variable current_word_ until a non-alphabetic character or end-of-stream indication is encountered. When this occurs it uses its dictionary, stored in the member variable dictionary_, to replace the current word if necessary. Finally, it writes the current word, followed by the non-alphabetic character, if any, to std::cout.

dictionary_input_filter

You can express a dictionary filter as an InputFilter as follows:

#include <boost/iostreams/char_traits.hpp> // EOF, WOULD_BLOCK
#include <boost/iostreams/concepts.hpp>    // input_filter
#include <boost/iostreams/operations.hpp>  // get

namespace boost { namespace iostreams { namespace example {

class dictionary_input_filter : public input_filter {
public:
    dictionary_input_filter(dictionary& d)
        : dictionary_(d), off_(std::string::npos), eof_(false)
        { }

    template<typename Source>
    int get(Source& src);

    template<typename Source>
    void close(Source&);
private:
    dictionary&             dictionary_;
    std::string             current_word_;
    std::string::size_type  off_;
    bool                    eof_;
};

} } } // End namespace boost::iostreams:example

The function get is implemented as follows:

    template<typename Source>
    int get(Source& src)
    {
        // Handle unfinished business.
        if (off_ != std::string::npos && off_ < current_word_.size())
            return current_word_[off_++];
        if (off_ == current_word_.size()) {
            current_word_.erase();
            off_ = std::string::npos;
        }
        if (eof_)
            return EOF;

        // Compute curent word.
        while (true) {
            int c;
            if ((c = iostreams::get(src)) == WOULD_BLOCK)
                return WOULD_BLOCK;

            if (c == EOF || !std::isalpha((unsigned char) c)) {
                dictionary_.replace(current_word_);
                off_ = 0;
                if (c == EOF)
                    eof_ = true;
                else
                    current_word_ += c;
                break;
            } else {
                current_word_ += c;
            }
        }

        return this->get(src); // Note: current_word_ is not empty.
    }

You first check to see whether there are any characters which remain from a previous invocation of get. If so, you update some book keeping information and return the first such character.

The while loop is very similar to that of dictionary_stdio_filter::do_filter: it reads characters from the Source src, appending them to current_word_ until a non-alphabetic character, EOF or WOULD_BLOCK is encountered. The value WOULD_BLOCK is passed on to the caller. In the remaining cases, the dictionary is consulted to determine the appropriate replacement text.

Finally, get is called recursively to return the first character of the current word.

As usual, the function close resets the Filter's state:

    template<typename Source>
    void close(Source&)
    {
        current_word_.erase();
        off_ = std::string::npos;
        eof_ = false;
    }

dictionary_output_filter

You can express a dictionary filter as an OutputFilter as follows:

#include <algorithm>                      // swap
#include <boost/iostreams/concepts.hpp>   // output_filter
#include <boost/iostreams/operations.hpp> // write

namespace boost { namespace iostreams { namespace example {

class dictionary_output_filter : public output_filter {
public:
    typedef std::map<std::string, std::string> map_type;
    dictionary_output_filter(dictionary& d)
        : dictionary_(d), off_(std::string::npos)
        { }

    template<typename Sink>
    bool put(Sink& dest, int c);

    template<typename Sink>
    void close(Sink& dest);
private:
    template<typename Sink>
    bool write_current_word(Sink& dest);
    dictionary&             dictionary_;
    std::string             current_word_;
    std::string::size_type  off_;
};

} } } // End namespace boost::iostreams:example

Let's look first at the helper function write_current_word:

    template<typename Sink>
    bool write_current_word(Sink& dest)
    {
        using namespace std;
        streamsize amt = static_cast<streamsize>(current_word_.size() - off_);
        streamsize result =
            iostreams::write(dest, current_word_.data() + off_, amt);
        if (result == amt) {
            current_word_.erase();
            off_ = string::npos;
            return true;
        } else {
            off_ += static_cast<string::size_type>(result);
            return false;
        }
    }

This function attempts to write current_word_, beginning at the offset off_, to the provided Sink. If the entire sequence is successfully written, current_word_ is cleared and the function returns true. Otherwise the member variable off_ is updated to point to the first unwritten character and the function fails.

Using write_current_word you can implement put as follows:

    template<typename Sink>
    bool put(Sink& dest, int c)
    {
        if (off_ != std::string::npos && !write_current_word(dest))
            return false;
        if (!std::isalpha((unsigned char) c)) {
            dictionary_.replace(current_word_);
            off_ = 0;
        }

        current_word_ += c;
        return true;
    }

As in the implementation of dictionary_input_filter::get, you first check to see whether there are any characters from a previous invocation of put which remain to be written. If so, you attempt to write these characters using write_current_word. If successful, you next examine the given character c. If it is a non-alphabetic character, you consult the dictionary to determine the appropriate replacement text. In any case, you append c to current_word_ and return true.

The function close has more work to do in this case than simply reseting the Filter's state. Unless the last character of the unfiltered sequence happened to be a non-alphabetic character, the contents of current_word_ will not yet have been written:

    template<typename Sink>
    void close(Sink& dest)
    {
        // Reset current_word_ and off_, saving old values.
        std::string             current_word;
        std::string::size_type  off = 0;
        current_word.swap(current_word_);
        std::swap(off, off_);

        // Write remaining characters to dest.
        if (off == std::string::npos) {
            dictionary_.replace(current_word);
            off = 0;
        }
        if (!current_word.empty())
            iostreams::write( 
                dest,
                current_word.data() + off, 
                static_cast<std::streamsize>(current_word.size() - off) 
            );
    }

Note that you may assume that the template argument is a Blocking Sink, and that you must reset the values of current_word_ and off_ before calling write, in case write throws an exception.