...one of the most highly regarded and expertly designed C++ library projects in the world.
-- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards
The Boost Tokenizer package provides a flexible and easy-to-use way to break a string or other character sequence into a series of tokens. Below is a simple example that breaks a phrase into words.
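    // simple_example_1.cpp
    #include <iostream>
    #include <boost/tokenizer.hpp>
    #include <string>

    int main(){
       using namespace std;
       using namespace boost;
       string s = "This is, a test";
       // With no template arguments, the default TokenizerFunction is used.
       tokenizer<> tok(s);
       // Print each token on its own line.
       for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
           cout << *beg << "\n";
       }
    }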
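You can choose how the string gets broken up by specifying the TokenizerFunction. If you do not specify anything, the default TokenizerFunction is char_delimiters_separator<char>, which by default breaks up a string on whitespace and punctuation. Another TokenizerFunction, escaped_list_separator, parses comma-separated fields in which a quoted field may itself contain commas. A line in that format looks like this: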
Field 1,"putting quotes around fields, allows commas",Field 3
The example below breaks the line above into its three fields.
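    // simple_example_2.cpp
    #include <iostream>
    #include <boost/tokenizer.hpp>
    #include <string>

    int main(){
       using namespace std;
       using namespace boost;
       string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3";
       // escaped_list_separator keeps commas that appear inside quotes.
       tokenizer<escaped_list_separator<char> > tok(s);
       for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
           cout << *beg << "\n";
       }
    }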
Finally, some TokenizerFunctions need arguments passed to their constructor in order to do anything interesting. An example is offset_separator, which breaks a string into tokens based on offsets. For example, 12252001 parsed using offsets of 2, 2, 4 becomes 12 25 2001. The example below parses this string.
    // simple_example_3.cpp
    #include <iostream>
    #include <boost/tokenizer.hpp>
    #include <string>

    int main(){
       using namespace std;
       using namespace boost;
       string s = "12252001";
       int offsets[] = {2,2,4};
       // The separator is built from the range of offsets.
       offset_separator f(offsets, offsets+3);
       tokenizer<offset_separator> tok(s,f);
       for(tokenizer<offset_separator>::iterator beg=tok.begin(); beg!=tok.end();++beg){
           cout << *beg << "\n";
       }
    }
Revised 25 December, 2006
Copyright © 2001 John R. Bandela
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)