Using the algorithms is straightforward. Let us have a look at the first example:
    #include <boost/algorithm/string.hpp>
    using namespace std;
    using namespace boost;

    // ...

    string str1(" hello world! ");
    to_upper(str1);  // str1 == " HELLO WORLD! "
    trim(str1);      // str1 == "HELLO WORLD!"

    string str2=
        to_lower_copy(
            ireplace_first_copy(
                str1,"hello","goodbye")); // str2 == "goodbye world!"
This example converts str1 to upper case and trims spaces from the start and the end of the string. str2 is then created as a lower-case copy of str1 in which "hello", matched case-insensitively, is replaced with "goodbye". This example demonstrates several important concepts used in the library:
Container parameters:
Unlike the STL algorithms, parameters are not specified only in the form
of iterators. The STL convention allows for great flexibility,
but it has several limitations. Algorithms cannot be stacked together,
because a container is passed as two parameters, so the return value of one
algorithm cannot be used directly as the input of another. It is also considerably
easier to write to_lower(str1) than to_lower(str1.begin(), str1.end()).
The magic of Boost.Range provides a uniform way of handling different string types.
If there is a need to pass a pair of iterators, boost::iterator_range
can be used to package the iterators into a structure with a compatible interface.
Copy vs. Mutable: Many algorithms in the library perform a transformation of the input. The transformation can be done in-place, mutating the input sequence, or a copy of the transformed input can be created, leaving the input intact. Neither possibility is superior to the other; each has its own advantages and disadvantages. For this reason, both are provided by the library.
Algorithm stacking:
Copy versions return the transformed input as a result, and thus allow simple chaining of
transformations within one expression (e.g. one can write trim_copy(to_upper_copy(s))).
Mutable versions return void, to avoid misuse.
Naming:
Naming follows the conventions from the Standard C++ Library. If there is a
copy and a mutable version of the same algorithm, the mutable version has no suffix
and the copy version has the suffix _copy.
Some algorithms have the prefix i (e.g. ifind_first()).
This prefix identifies that the algorithm works in a case-insensitive manner.
To use the library, include the boost/algorithm/string.hpp header.
If the regex related functions are needed, include the
boost/algorithm/string_regex.hpp header.
STL has a nice way of converting character case. Unfortunately, it works only for a single character, and we want to convert a whole string:
    string str1("HeLlO WoRld!");
    to_upper(str1); // str1 == "HELLO WORLD!"
to_upper() and to_lower() convert the case of
characters in a string using a specified locale.
For more information see the reference for boost/algorithm/string/case_conv.hpp.
A part of the library deals with string related predicates. Consider this example:
    bool is_executable( const string& filename )
    {
        return
            iends_with(filename, ".exe") ||
            iends_with(filename, ".com");
    }

    // ...
    string str1("command.com");
    cout
        << str1
        << (is_executable(str1)? " is ": " is not ")
        << "an executable"
        << endl; // prints "command.com is an executable"

    // ...
    char text1[]="hello";
    cout
        << text1
        << (all( text1, is_lower() )? " is": " is not")
        << " written in the lower case"
        << endl; // prints "hello is written in the lower case"
The predicates determine whether a substring is contained in the input string
under various conditions: the string starts with the substring,
ends with the substring, simply contains the substring, or both strings are equal.
See the reference for boost/algorithm/string/predicate.hpp for more details.
In addition, the algorithm all() checks whether all elements of a container
satisfy a condition specified by a predicate.
This predicate can be any unary predicate, but the library provides a set of
useful string-related predicates and combinators ready for use.
These are located in the boost/algorithm/string/classification.hpp header.
Classification predicates can be combined using logical combinators to form
more complex expressions. For example: is_from_range('a','z') || is_digit()
When parsing the input from a user, strings usually have unwanted leading or trailing characters. To get rid of them, we need trim functions:
    string str1=" hello world! ";
    string str2=trim_left_copy(str1);  // str2 == "hello world! "
    string str3=trim_right_copy(str1); // str3 == " hello world!"
    trim(str1);                        // str1 == "hello world!"

    string phone="00423333444";
    // remove leading 0 from the phone number
    trim_left_if(phone,is_any_of("0")); // phone == "423333444"
It is possible to trim the spaces on the right, on the left or on both sides of a string.
For those cases when there is a need to remove something other than blank space, there
are _if variants. Using these, a user can specify a functor which
selects the characters to be removed. It is possible to use classification
predicates like is_digit() mentioned in the previous paragraph.
See the reference for boost/algorithm/string/trim.hpp.
The library contains a set of find algorithms. Here is an example:
    char text[]="hello dolly!";
    iterator_range<char*> result=find_last(text,"ll");

    // std::bind2nd was removed from the standard in C++17;
    // newer code would use a lambda here.
    transform( result.begin(), result.end(), result.begin(), bind2nd(plus<char>(), 1) );
    // text == "hello dommy!"

    to_upper(result); // text == "hello doMMy!"

    // iterator_range is convertible to bool
    if(find_first(text, "dolly"))
    {
        cout << "Dolly is there" << endl;
    }
We have used find_last() to search the text for "ll".
The result is given as a boost::iterator_range.
This range delimits the part of the input which satisfies the find criteria.
In our example it is the last occurrence of "ll".
As we can see, the input of the find_last() algorithm can also be
char[], because this type is supported by Boost.Range.
The following lines transform the result. Notice that boost::iterator_range
has the familiar begin() and end() methods, so it can be used like any other STL container.
It is also convertible to bool, so it is easy to use find algorithms for simple containment checking.
Find algorithms are located in boost/algorithm/string/find.hpp.
Find algorithms can be used for searching for a specific part of a string. Replace goes one step further. After a matching part is found, it is substituted with something else. The substitution is computed from the original, using some transformation.
    string str1="Hello Dolly, Hello World!";
    replace_first(str1, "Dolly", "Jane");    // str1 == "Hello Jane, Hello World!"
    replace_last(str1, "Hello", "Goodbye");  // str1 == "Hello Jane, Goodbye World!"
    erase_all(str1, " ");                    // str1 == "HelloJane,GoodbyeWorld!"
    erase_head(str1, 5);                     // str1 == "Jane,GoodbyeWorld!"
For the complete list of replace and erase functions see the
reference.
There are many predefined functions for common usage; however, the library also allows you to
define a custom replace() that suits a specific need. There is a generic find_format()
function which, besides the input string, takes two parameters.
The first one is a Finder object, the second one is a Formatter object.
The Finder object is a functor which performs the search for the part to be replaced. The Formatter object
takes the result of the Finder (usually a reference to the found substring) and creates a
substitute for it. The replace algorithm puts these two together and makes the desired substitution.
Check boost/algorithm/string/replace.hpp, boost/algorithm/string/erase.hpp and
boost/algorithm/string/find_format.hpp for reference.
An extension to the find algorithms is the Find Iterator. Instead of searching for just one part of a string,
the find iterator allows us to iterate over the substrings matching the specified criteria.
This facility uses the Finder to incrementally search the string.
Dereferencing a find iterator yields a boost::iterator_range
object that delimits the current match.
There are two iterators provided: find_iterator and split_iterator.
The former iterates over substrings that are found using the specified
Finder. The latter iterates over the gaps between these substrings.
    string str1("abc-*-ABC-*-aBc");

    // Find all 'abc' substrings (ignoring the case)
    // Create a find_iterator
    typedef find_iterator<string::iterator> string_find_iterator;
    for(string_find_iterator It=
            make_find_iterator(str1, first_finder("abc", is_iequal()));
        It!=string_find_iterator();
        ++It)
    {
        cout << copy_range<std::string>(*It) << endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc

    // Iterate over the parts separated by "-*-"
    typedef split_iterator<string::iterator> string_split_iterator;
    for(string_split_iterator It=
            make_split_iterator(str1, first_finder("-*-", is_iequal()));
        It!=string_split_iterator();
        ++It)
    {
        cout << copy_range<std::string>(*It) << endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc
Note that the find iterators have only one template parameter. It is the base iterator type. The Finder is specified at runtime. This allows us to typedef a find iterator for common string types and reuse it. Additionally make_*_iterator functions help to construct a find iterator for a particular range.
See the reference in boost/algorithm/string/find_iterator.hpp.
Split algorithms are an extension to the find iterator for one common usage scenario.
These algorithms use a find iterator and store all matches into the provided
container. This container must be able to hold copies (e.g. std::string) or
references (e.g. iterator_range) of the extracted substrings.
Two algorithms are provided. find_all() finds all occurrences
of a string in the input. split() splits the input into parts.
    string str1("hello abc-*-ABC-*-aBc goodbye");

    typedef vector< iterator_range<string::iterator> > find_vector_type;

    find_vector_type FindVec; // #1: Search for tokens
    ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }

    typedef vector< string > split_vector_type;

    split_vector_type SplitVec; // #2: Search for separators
    split( SplitVec, str1, is_any_of("-*"), token_compress_on );
    // SplitVec == { "hello abc","ABC","aBc goodbye" }
In the comments, a bracketed string such as [hello] designates an iterator_range delimiting that substring.
The first example shows how to construct a container to hold references to all extracted
substrings. The algorithm ifind_all() puts into FindVec references
to all substrings that are equal to "abc" in a case-insensitive manner.
The second example uses split() to split the string str1 into parts
separated by the characters '-' or '*'. These parts are then put into SplitVec.
It is possible to specify whether adjacent separators are merged (token_compress_on) or not (token_compress_off).
More information can be found in the reference: boost/algorithm/string/split.hpp.
Last revised: October 30, 2010 at 18:34:45 +0100