Appendices

Appendix 1: Not Yet Implemented

The following features of xpressive have not yet been implemented, but are planned for the near future:

match_flag_type::format_sed
match_flag_type::format_perl

The following features are planned for xpressive 2.0:

syntax_option_type::collate
POSIX collation sequences such as [.a.] and [=a=]
Semantic actions
Fine-grained control over the dynamic regex syntax
Improved localization support, possibly as a custom facet for std::locale

Appendix 2: Differences from Boost.Regex

Since many of xpressive's users are likely to be familiar with the Boost.Regex library, I would be remiss if I failed to point out some important differences between xpressive and Boost.Regex. In particular:

xpressive::basic_regex<> is a template on the iterator type, not the character type.
xpressive::basic_regex<> cannot be constructed directly from a string; rather, you must use basic_regex::compile() or regex_compiler<> to build a regex object from a string.
xpressive::basic_regex<> does not have an imbue() member function; rather, the imbue() member function is in the xpressive::regex_compiler<> factory.
boost::basic_regex<> has a subset of std::basic_string<>'s members. xpressive::basic_regex<> does not. The members lacking are: assign(), operator[](), max_size(), begin(), end(), size(), compare(), and operator=(std::basic_string<>).
Other member functions that exist in boost::basic_regex<> but do not exist in xpressive::basic_regex<> are: set_expression(), get_allocator(), imbue(), getloc(), getflags(), and str().
xpressive::basic_regex<> does not have a RegexTraits template parameter. Customization of regex syntax and localization behavior will be controlled by regex_compiler<> and a custom regex facet for std::locale.
xpressive::basic_regex<> and xpressive::match_results<> do not have an Allocator template parameter. This is by design.
match_not_dot_null and match_not_dot_newline have moved from the match_flag_type enum to the syntax_option_type enum, and they have changed names to not_dot_null and not_dot_newline.
The following syntax_option_type enumeration values are not supported: escape_in_lists, char_classes, intervals, limited_ops, newline_alt, bk_plus_qm, bk_braces, bk_parens, bk_refs, bk_vbar, use_except, failbit, literal, perlex, basic, extended, emacs, awk, grep ,egrep, sed, JavaScript, JScript.
The following match_flag_type enumeration values are not supported: match_not_bob, match_not_eob, match_perl, match_posix, and match_extra.

Also, in the current implementation, the regex algorithms in xpressive will not detect pathological behavior and abort by throwing an exception. It is up to you to write efficient patterns that do not behave pathologically.

Appendix 3: Performance Comparison

xpressive vs. Boost.Regex with GCC (Cygwin)
xpressive vs. Boost.Regex with Visual C++

The performance of xpressive is competitive with Boost.Regex. I have run performance benchmarks comparing static xpressive, dynamic xpressive and Boost.Regex on two platforms: gcc (Cygwin) and Visual C++. The tests include short matches and long searches. For both platforms, xpressive comes off well on short matches and roughly on par with Boost.Regex on long searches.

<disclaimer> As with all benchmarks, the true test is how xpressive performs with your patterns, your input, and your platform, so if performance matters in your application, it's best to run your own tests. </disclaimer>

xpressive vs. Boost.Regex with GCC (Cygwin)

Below are the results of a performance comparison between:

static xpressive
dynamic xpressive
Boost.Regex

Test Specifications

Hardware:: hyper-threaded 3GHz Xeon with 1Gb RAM
Operating System:: Windows XP Pro + Cygwin
Compiler:: GNU C++ version 3.4.4 (Cygwin special)
C++ Standard Library:: GNU libstdc++ version 3.4.4
Boost.Regex Version:: 1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE
xpressive Version:: 0.9.6a

Comparison 1: Short Matches

The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.

Short Matches

static xpressive	dynamic xpressive	Boost	Text	Expression
1 (8.79e‑07s)	1.08 (9.54e‑07s)	2.51 (2.2e‑06s)	100- this is a line of ftp response which contains a message string	`^([0-9]+)(\-\| \|$)(.*)$`
1.06 (1.07e‑06s)	1 (1.01e‑06s)	4.01 (4.06e‑06s)	1234-5678-1234-456	`([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}`
1 (1.4e‑06s)	1.13 (1.58e‑06s)	2.89 (4.05e‑06s)	john_maddock@compuserve.com	`^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)\|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}\|[0-9]{1,3})(\]?)$`
1 (1.28e‑06s)	1.16 (1.49e‑06s)	3.07 (3.94e‑06s)	foo12@foo.edu	`^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)\|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}\|[0-9]{1,3})(\]?)$`
1 (1.22e‑06s)	1.2 (1.46e‑06s)	3.22 (3.93e‑06s)	bob.smith@foo.tv	`^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)\|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}\|[0-9]{1,3})(\]?)$`
1.04 (8.64e‑07s)	1 (8.34e‑07s)	2.5 (2.09e‑06s)	EH10 2QQ	`^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$`
1.11 (9.09e‑07s)	1 (8.19e‑07s)	2.47 (2.03e‑06s)	G1 1AA	`^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$`
1.12 (9.38e‑07s)	1 (8.34e‑07s)	2.5 (2.08e‑06s)	SW1 1ZZ	`^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$`
1 (7.9e‑07s)	1.06 (8.34e‑07s)	2.49 (1.96e‑06s)	4/1/2001	`^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$`
1 (8.19e‑07s)	1.04 (8.49e‑07s)	2.4 (1.97e‑06s)	12/12/2001	`^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$`
1.09 (8.95e‑07s)	1 (8.19e‑07s)	2.4 (1.96e‑06s)	123	`^[-+]?[[:digit:]]\.?[[:digit:]]$`
1.11 (8.79e‑07s)	1 (7.9e‑07s)	2.57 (2.03e‑06s)	+3.14159	`^[-+]?[[:digit:]]\.?[[:digit:]]$`
1.09 (8.94e‑07s)	1 (8.19e‑07s)	2.47 (2.03e‑06s)	-3.14159	`^[-+]?[[:digit:]]\.?[[:digit:]]$`

Comparison 2: Long Searches

The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.

Long Searches

static xpressive	dynamic xpressive	Boost	Expression
1 (0.0263s)	1 (0.0263s)	1.78 (0.0469s)	`Twain`
1 (0.0234s)	1 (0.0234s)	1.79 (0.042s)	`Huck[[:alpha:]]+`
1.84 (1.26s)	2.21 (1.51s)	1 (0.687s)	`[[:alpha:]]+ing`
1.09 (0.192s)	2 (0.351s)	1 (0.176s)	`^[^ ]*?Twain`
1.41 (0.08s)	1.21 (0.0684s)	1 (0.0566s)	`Tom\|Sawyer\|Huckleberry\|Finn`
1.56 (0.195s)	1.12 (0.141s)	1 (0.125s)	`(Tom\|Sawyer\|Huckleberry\|Finn).{0,30}river\|river.{0,30}(Tom\|Sawyer\|Huckleberry\|Finn)`

xpressive vs. Boost.Regex with Visual C++

Below are the results of a performance comparison between:

static xpressive
dynamic xpressive
Boost.Regex

Test Specifications

Hardware:: hyper-threaded 3GHz Xeon with 1Gb RAM
Operating System:: Windows XP Pro
Compiler:: Visual C++ .NET 2003 (7.1)
C++ Standard Library:: Dinkumware, version 313
Boost.Regex Version:: 1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE
xpressive Version:: 0.9.6a

Comparison 1: Short Matches

The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.

Short Matches

static xpressive	dynamic xpressive	Boost	Text	Expression
1 (3.2e‑007s)	1.37 (4.4e‑007s)	2.38 (7.6e‑007s)	100- this is a line of ftp response which contains a message string	`^([0-9]+)(\-\| \|$)(.*)$`
1 (6.4e‑007s)	1.12 (7.15e‑007s)	1.72 (1.1e‑006s)	1234-5678-1234-456	`([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}`
1 (9.82e‑007s)	1.3 (1.28e‑006s)	1.61 (1.58e‑006s)	john_maddock@compuserve.com	`^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)\|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}\|[0-9]{1,3})(\]?)$`
1 (8.94e‑007s)	1.3 (1.16e‑006s)	1.7 (1.52e‑006s)	foo12@foo.edu	`^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)\|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}\|[0-9]{1,3})(\]?)$`
1 (9.09e‑007s)	1.28 (1.16e‑006s)	1.67 (1.52e‑006s)	bob.smith@foo.tv	`^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)\|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}\|[0-9]{1,3})(\]?)$`
1 (3.06e‑007s)	1.07 (3.28e‑007s)	1.95 (5.96e‑007s)	EH10 2QQ	`^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$`
1 (3.13e‑007s)	1.09 (3.42e‑007s)	1.86 (5.81e‑007s)	G1 1AA	`^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$`
1 (3.2e‑007s)	1.09 (3.5e‑007s)	1.86 (5.96e‑007s)	SW1 1ZZ	`^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$`
1 (2.68e‑007s)	1.22 (3.28e‑007s)	2 (5.36e‑007s)	4/1/2001	`^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$`
1 (2.76e‑007s)	1.16 (3.2e‑007s)	1.94 (5.36e‑007s)	12/12/2001	`^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$`
1 (2.98e‑007s)	1.03 (3.06e‑007s)	1.85 (5.51e‑007s)	123	`^[-+]?[[:digit:]]\.?[[:digit:]]$`
1 (3.2e‑007s)	1.12 (3.58e‑007s)	1.81 (5.81e‑007s)	+3.14159	`^[-+]?[[:digit:]]\.?[[:digit:]]$`
1 (3.28e‑007s)	1.11 (3.65e‑007s)	1.77 (5.81e‑007s)	-3.14159	`^[-+]?[[:digit:]]\.?[[:digit:]]$`

Comparison 2: Long Searches

The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.

Long Searches

static xpressive	dynamic xpressive	Boost	Expression
1 (0.019s)	1 (0.019s)	2.98 (0.0566s)	`Twain`
1 (0.0176s)	1 (0.0176s)	3.17 (0.0556s)	`Huck[[:alpha:]]+`
3.62 (1.78s)	3.97 (1.95s)	1 (0.492s)	`[[:alpha:]]+ing`
2.32 (0.344s)	3.06 (0.453s)	1 (0.148s)	`^[^ ]*?Twain`
1 (0.0576s)	1.05 (0.0606s)	1.15 (0.0664s)	`Tom\|Sawyer\|Huckleberry\|Finn`
1.24 (0.164s)	1.44 (0.191s)	1 (0.133s)	`(Tom\|Sawyer\|Huckleberry\|Finn).{0,30}river\|river.{0,30}(Tom\|Sawyer\|Huckleberry\|Finn)`

Boost C++ Libraries

Appendix 1: Not Yet Implemented

Appendix 2: Differences from Boost.Regex

Appendix 3: Performance Comparison

xpressive vs. Boost.Regex with GCC (Cygwin)

Comparison 1: Short Matches

Short Matches

Comparison 2: Long Searches

Long Searches

xpressive vs. Boost.Regex with Visual C++

Comparison 1: Short Matches

Short Matches

Comparison 2: Long Searches

Long Searches