The following features of xpressive have not yet been implemented, but are
planned for the near future:
- match_flag_type::format_sed
-
match_flag_type::format_perl
The following features are planned for xpressive 2.0:
- syntax_option_type::collate
-
POSIX collation sequences such as
[.a.]and[=a=] - Semantic actions
- Fine-grained control over the dynamic regex syntax
-
Improved localization support, possibly as a custom facet for
std::locale
Since many of xpressive's users are likely to be familiar with the Boost.Regex library, I would be remiss if
I failed to point out some important differences between xpressive and Boost.Regex. In particular:
-
xpressive::basic_regex<>is a template on the iterator type, not the character type. -
xpressive::basic_regex<>cannot be constructed directly from a string; rather, you must usebasic_regex::compile()orregex_compiler<>to build a regex object from a string. -
xpressive::basic_regex<>does not have animbue()member function; rather, theimbue()member function is in thexpressive::regex_compiler<>factory. -
boost::basic_regex<>has a subset ofstd::basic_string<>'s members.xpressive::basic_regex<>does not. The members lacking are:assign(),operator[](),max_size(),begin(),end(),size(),compare(), andoperator=(std::basic_string<>). -
Other member functions that exist in
boost::basic_regex<>but do not exist inxpressive::basic_regex<>are:set_expression(),get_allocator(),imbue(),getloc(),getflags(), andstr(). -
xpressive::basic_regex<>does not have a RegexTraits template parameter. Customization of regex syntax and localization behavior will be controlled byregex_compiler<>and a custom regex facet forstd::locale. -
xpressive::basic_regex<>andxpressive::match_results<>do not have an Allocator template parameter. This is by design. -
match_not_dot_nullandmatch_not_dot_newlinehave moved from thematch_flag_typeenum to thesyntax_option_typeenum, and they have changed names tonot_dot_nullandnot_dot_newline. -
The following
syntax_option_typeenumeration values are not supported:escape_in_lists,char_classes,intervals,limited_ops,newline_alt,bk_plus_qm,bk_braces,bk_parens,bk_refs,bk_vbar,use_except,failbit,literal,perlex,basic,extended,emacs,awk,grep,egrep,sed,JavaScript,JScript. -
The following
match_flag_typeenumeration values are not supported:match_not_bob,match_not_eob,match_perl,match_posix, andmatch_extra.
Also, in the current implementation, the regex algorithms in xpressive will not detect pathological behavior and abort by throwing an exception. It is up to you to write efficient patterns that do not behave pathologically.
The performance of xpressive is competitive with Boost.Regex. I have run performance benchmarks comparing static xpressive, dynamic xpressive and Boost.Regex on two platforms: gcc (Cygwin) and Visual C++. The tests include short matches and long searches. For both platforms, xpressive comes off well on short matches and roughly on par with Boost.Regex on long searches.
<disclaimer> As with all benchmarks, the true test is how xpressive performs with your patterns, your input, and your platform, so if performance matters in your application, it's best to run your own tests. </disclaimer>
Below are the results of a performance comparison between:
- static xpressive
- dynamic xpressive
- Boost.Regex
Test Specifications
- Hardware:
- hyper-threaded 3GHz Xeon with 1Gb RAM
- Operating System:
- Windows XP Pro + Cygwin
- Compiler:
- GNU C++ version 3.4.4 (Cygwin special)
- C++ Standard Library:
- GNU libstdc++ version 3.4.4
- Boost.Regex Version:
- 1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE
- xpressive Version:
- 0.9.6a
Comparison 1: Short Matches
The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.
Short Matches
| static xpressive | dynamic xpressive | Boost | Text | Expression |
|---|---|---|---|---|
| 1(8.79e‑07s) | 1.08(9.54e‑07s) | 2.51(2.2e‑06s) | 100- this is a line of ftp response which contains a message string | ^([0-9]+)(\-| |$)(.*)$ |
| 1.06(1.07e‑06s) | 1(1.01e‑06s) | 4.01(4.06e‑06s) | 1234-5678-1234-456 | ([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
| 1(1.4e‑06s) | 1.13(1.58e‑06s) | 2.89(4.05e‑06s) | john_maddock@compuserve.com | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
| 1(1.28e‑06s) | 1.16(1.49e‑06s) | 3.07(3.94e‑06s) | foo12@foo.edu | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
| 1(1.22e‑06s) | 1.2(1.46e‑06s) | 3.22(3.93e‑06s) | bob.smith@foo.tv | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
| 1.04(8.64e‑07s) | 1(8.34e‑07s) | 2.5(2.09e‑06s) | EH10 2QQ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
| 1.11(9.09e‑07s) | 1(8.19e‑07s) | 2.47(2.03e‑06s) | G1 1AA | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
| 1.12(9.38e‑07s) | 1(8.34e‑07s) | 2.5(2.08e‑06s) | SW1 1ZZ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
| 1(7.9e‑07s) | 1.06(8.34e‑07s) | 2.49(1.96e‑06s) | 4/1/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
| 1(8.19e‑07s) | 1.04(8.49e‑07s) | 2.4(1.97e‑06s) | 12/12/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
| 1.09(8.95e‑07s) | 1(8.19e‑07s) | 2.4(1.96e‑06s) | 123 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
| 1.11(8.79e‑07s) | 1(7.9e‑07s) | 2.57(2.03e‑06s) | +3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
| 1.09(8.94e‑07s) | 1(8.19e‑07s) | 2.47(2.03e‑06s) | -3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
Comparison 2: Long Searches
The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.
Long Searches
| static xpressive | dynamic xpressive | Boost | Expression |
|---|---|---|---|
| 1(0.0263s) | 1(0.0263s) | 1.78(0.0469s) | Twain |
| 1(0.0234s) | 1(0.0234s) | 1.79(0.042s) | Huck[[:alpha:]]+ |
| 1.84(1.26s) | 2.21(1.51s) | 1(0.687s) | [[:alpha:]]+ing |
| 1.09(0.192s) | 2(0.351s) | 1(0.176s) | ^[^
]*?Twain |
| 1.41(0.08s) | 1.21(0.0684s) | 1(0.0566s) | Tom|Sawyer|Huckleberry|Finn |
| 1.56(0.195s) | 1.12(0.141s) | 1(0.125s) | (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |
Below are the results of a performance comparison between:
- static xpressive
- dynamic xpressive
- Boost.Regex
Test Specifications
- Hardware:
- hyper-threaded 3GHz Xeon with 1Gb RAM
- Operating System:
- Windows XP Pro
- Compiler:
- Visual C++ .NET 2003 (7.1)
- C++ Standard Library:
- Dinkumware, version 313
- Boost.Regex Version:
- 1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE
- xpressive Version:
- 0.9.6a
Comparison 1: Short Matches
The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.
Short Matches
| static xpressive | dynamic xpressive | Boost | Text | Expression |
|---|---|---|---|---|
| 1(3.2e‑007s) | 1.37(4.4e‑007s) | 2.38(7.6e‑007s) | 100- this is a line of ftp response which contains a message string | ^([0-9]+)(\-| |$)(.*)$ |
| 1(6.4e‑007s) | 1.12(7.15e‑007s) | 1.72(1.1e‑006s) | 1234-5678-1234-456 | ([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
| 1(9.82e‑007s) | 1.3(1.28e‑006s) | 1.61(1.58e‑006s) | john_maddock@compuserve.com | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
| 1(8.94e‑007s) | 1.3(1.16e‑006s) | 1.7(1.52e‑006s) | foo12@foo.edu | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
| 1(9.09e‑007s) | 1.28(1.16e‑006s) | 1.67(1.52e‑006s) | bob.smith@foo.tv | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
| 1(3.06e‑007s) | 1.07(3.28e‑007s) | 1.95(5.96e‑007s) | EH10 2QQ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
| 1(3.13e‑007s) | 1.09(3.42e‑007s) | 1.86(5.81e‑007s) | G1 1AA | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
| 1(3.2e‑007s) | 1.09(3.5e‑007s) | 1.86(5.96e‑007s) | SW1 1ZZ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
| 1(2.68e‑007s) | 1.22(3.28e‑007s) | 2(5.36e‑007s) | 4/1/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
| 1(2.76e‑007s) | 1.16(3.2e‑007s) | 1.94(5.36e‑007s) | 12/12/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
| 1(2.98e‑007s) | 1.03(3.06e‑007s) | 1.85(5.51e‑007s) | 123 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
| 1(3.2e‑007s) | 1.12(3.58e‑007s) | 1.81(5.81e‑007s) | +3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
| 1(3.28e‑007s) | 1.11(3.65e‑007s) | 1.77(5.81e‑007s) | -3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
Comparison 2: Long Searches
The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.
Long Searches
| static xpressive | dynamic xpressive | Boost | Expression |
|---|---|---|---|
| 1(0.019s) | 1(0.019s) | 2.98(0.0566s) | Twain |
| 1(0.0176s) | 1(0.0176s) | 3.17(0.0556s) | Huck[[:alpha:]]+ |
| 3.62(1.78s) | 3.97(1.95s) | 1(0.492s) | [[:alpha:]]+ing |
| 2.32(0.344s) | 3.06(0.453s) | 1(0.148s) | ^[^
]*?Twain |
| 1(0.0576s) | 1.05(0.0606s) | 1.15(0.0664s) | Tom|Sawyer|Huckleberry|Finn |
| 1.24(0.164s) | 1.44(0.191s) | 1(0.133s) | (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |
| Copyright © 2003, 2004 Eric Niebler |
