Design Topics

String Representation

As the name suggest, this library works mainly with strings. However, in the context of this library, a string is not restricted to any particular implementation (like std::basic_string), rather it is a concept. This allows the algorithms in this library to be reused for any string type, that satisfies the given requirements.

Definition: A string is a collection of characters accessible in sequential ordered fashion. Character is any value type with "cheap" copying and assignment.

First requirement of string-type is that it must accessible using collection traits. This facility allows to access the elements inside the string in a uniform iterator-based fashion. This requirement is actually less stringent than that of collection concept. It implements an external collection interface. This is sufficient for our library

Second requirement defines the way in which the characters are stored in the string. Algorithms in this library work with an assumption that copying a character is cheaper then allocating extra storage to cache results. This is a natural assumption for common character types. Algorithms will work even if this requirement is not satisfied, however at the cost of performance degradation.

In addition some algorithms have additional requirements on the string-type. Particularly, it is required that an algorithm can create a new string of the given type. In this case, it is required that the type satisfies the sequence (Std §23.1.1) requirements.

In the reference and also in the code, requirement on the string type is designated by the name of template argument. CollectionT means that the basic collection requirements must hold. SequenceT designates extended sequence requirements.

`iterator_range` class

An iterator_range is an encapsulation of a pair of iterators that delimit a sequence (or, a range). This concept is widely used by sequence manipulating algorithms. Although being so useful, there no direct support for it in the standard library (The closest thing is that some algorithms return a pair of iterators). Instead all STL algorithms have two distinct parameters for beginning and end of a range. This design is natural for implementation of generic algorithms, but it forbids to work with a range as a single value.

It is possible to encapsulate a range in std::pair<>, but std::pair<> is an overly generic encapsulation, so it is not best match for a range. For instance, it does not enforce that begin and end iterators be of the same type.

Naturally the range concept is heavily used also in this library. During the development of the library, it was discovered, that there is a need for a reasonable encapsulation for it, since core part of the library deals with substring searching algorithms and any such algorithm returns a range delimiting the result of the search. std::pair<> was deemed as unsuitable. Therefore the iterator_range was defined.

The intention of the iterator_range class is to manage a range as a single value and provide a basic interface for common operations. Its interface is similar to that of a collection. In addition to begin() and end() accessors, it has member functions for checking whether the range is empty, or to determine the size of the range. It also has a set of member typedefs that extract type information from the encapsulated iterators. As such, the interface is compatible with the collection traits requirements so it is possible to use this class as a parameter to many algorithms in this library.

iterator_range will be moved to Boost.Range library in the future releases. The internal version will be deprecated then.

Collection Traits

Collection traits provide uniform access to different types of collections . This functionality allows to write generic algorithms which work with several different kinds of collections. For this library it means, that, for instance, many algorithms work with std::string as well as with char[]. This facility implements the external collection concept.

The following collection types are supported:

Standard containers
Built-in arrays (like int[])
Null terminated strings (this includes char[],wchar_t[],char*, and wchar_t*)
std::pair<iterator,iterator>

Collection traits support a subset of the container concept (Std §23.1). This subset can be described as an input container concept, e.g. a container with immutable content. Its definition can be found in the header boost/algorithm/string/collection_traits.hpp.

In the table C denotes a container and c is an object of C.

Table 9.11. Collection Traits

Name	Standard collection equivalent	Description
value_type_of<C>::type	`C::value_type`	Type of contained values
difference_type_of<C>::type	`C::difference_type`	difference type of the collection
iterator_of<C>::type	`C::iterator`	iterator type of the collection
const_iterator_of<C>::type	`C::const_iterator`	const_iterator type of the collection
result_iterator_of<C>::type		result_iterator type of the collection. This type maps to `C::iterator` for mutable collection and `C::const_iterator` for const collection.
begin(c)	`c.begin()`	Gets the iterator pointing to the start of the collection.
end(c)	`c.end()`	Gets the iterator pointing to the end of the collection.
size(c)	`c.size()`	Gets the size of the collection.
empty(c)	`c.empty()`	Checks if the collection is empty.

The collection traits are only a temporary part of this library. They will be replaced in the future releases by Boost.Range library. Use of the internal implementation will be deprecated then.

Sequence Traits

The major difference between std::list and std::vector is not in the interfaces they provide, but rather in the inner details of the class and the way how it performs various operations. The problem is that it is not possible to infer this difference from the definitions of classes without some special mechanism. However, some algorithms can run significantly faster with the knowledge of the properties of a particular container.

Sequence traits allow one to specify additional properties of a sequence container (see Std.§32.2). These properties are then used by algorithms to select optimized handling for some operations. The sequence traits are declared in the header boost/algorithm/string/sequence_traits.hpp.

In the table C denotes a container and c is an object of C.

Table 9.12. Sequence Traits

Trait	Description
has_native_replace<C>::value	Specifies that the sequence has std::string like replace method
has_stable_iterators<C>::value	Specifies that the sequence has stable iterators. It means, that operations like `insert`/`erase`/`replace` do not invalidate iterators.
has_const_time_insert<C>::value	Specifies that the insert method of the sequence has constant time complexity.
has_const_time_erase<C>::value	Specifies that the erase method of the sequence has constant time complexity

Current implementation contains specializations for std::list<T> and std::basic_string<T> from the standard library and SGI's std::rope<T> and std::slist<T>.

Find Algorithms

Find algorithms have similar functionality to std::search() algorithm. They provide a different interface which is more suitable for common string operations. Instead of returning just the start of matching subsequence they return a range which is necessary when the length of the matching subsequence is not known beforehand. This feature also allows a partitioning of the input sequence into three parts: a prefix, a substring and a suffix.

Another difference is an addition of various searching methods besides find_first, including find_regex.

It the library, find algorithms are implemented in terms of Finders. Finders are used also by other facilities (replace,split). For convenience, there are also function wrappers for these finders to simplify find operations.

Currently the library contains only naive implementation of find algorithms with complexity O(n * m) where n is the size of the input sequence and m is the size of the search sequence. There are algorithms with complexity O(n), but for smaller sequence a constant overhead is rather big. For small m << n (m by magnitude smaller than n) the current implementation provides acceptable efficiency. Even the C++ standard defines the required complexity for search algorithm as O(n * m). It is possible that a future version of library will also contain algorithms with linear complexity as an option

Replace Algorithms

The implementation of replace algorithms follows the layered structure of the library. The lower layer implements generic substitution of a range in the input sequence. This layer takes a Finder object and a Formatter object as an input. These two functors define what to replace and what to replace it with. The upper layer functions are just wrapping calls to the lower layer. Finders are shared with the find and split facility.

As usual, the implementation of the lower layer is designed to work with a generic sequence while taking advantage of specific features if possible (by using Sequence traits)

Find Iterators & Split Algorithms

Find iterators are a logical extension of the find facility. Instead of searching for one match, the whole input can be iteratively searched for multiple matches. The result of the search is then used to partition the input. It depends on the algorithms which parts are returned as the result. They can be the matching parts (find_iterator) of the parts in between (split_iterator).

In addition the split algorithms like find_all() and split() can simplify the common operations. They use a find iterator to search the whole input and copy the matches they found into the supplied container.

Exception Safety

The library requires that all operations on types used as template or function arguments provide the basic exception-safety guarantee. In turn, all functions and algorithms in this library, except where stated otherwise, will provide the basic exception-safety guarantee. In other words: The library maintains its invariants and does not leak resources in the face of exceptions. Some library operations give stronger guarantees, which are documented on an individual basis.

Some functions can provide the strong exception-safety guarantee. That means that following statements are true:

If an exception is thrown, there are no effects other than those of the function
If an exception is thrown other than by the function, there are no effects

This guarantee can be provided under the condition that the operations on the types used for arguments for these functions either provide the strong exception guarantee or do not alter the global state .

In the reference, under the term strong exception-safety guarantee, we mean the guarantee as defined above.

For more information about the exception safety topics, follow this link