...one of the most highly regarded and expertly designed C++ library projects in the world.
-- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards
“Examples of designs that meet most of the criteria for ‘goodness’ (easy to understand, flexible, efficient) are a recursive-descent parser, which is traditional procedural code. Another example is the STL, which is a generic library of containers and algorithms depending crucially on both traditional procedural code and on parametric polymorphism.” -- Bjarne Stroustrup
In the mid-80s, Joel wrote his first calculator in Pascal. It was an unforgettable coding experience: he was amazed at how a set of mutually recursive functions could model a grammar specification. In time, the skills he acquired from that academic exercise became very practical when he was tasked with some parsing work. For instance, whenever he needed to perform any form of binary or text I/O, he tried to approach each task somewhat formally by writing a grammar using Pascal-like syntax diagrams and then a corresponding recursive-descent parser. This process worked very well.
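The core idea, one function per grammar rule with the rules calling each other recursively, can be sketched in C++ as a tiny integer calculator. This is a minimal illustration under an assumed four-operator grammar; it is not Joel's original Pascal code, and the `Calculator` class and its grammar are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical integer calculator: one member function per grammar rule.
//   expr   ::= term   { ('+' | '-') term }
//   term   ::= factor { ('*' | '/') factor }
//   factor ::= integer | '(' expr ')' | '-' factor
// expr calls term, term calls factor, and factor calls expr again for
// parenthesized input: that cycle is the mutual recursion.
class Calculator {
public:
    explicit Calculator(std::string input) : src_(std::move(input)) {}

    // Evaluates the whole input; throws std::runtime_error on a syntax error
    // or division by zero. Integer overflow is not checked in this sketch.
    long evaluate() {
        pos_ = 0;
        const long value = expr();
        skip_spaces();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skip_spaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Consumes ch (after optional whitespace) if it is the next character.
    bool match(char ch) {
        skip_spaces();
        if (pos_ < src_.size() && src_[pos_] == ch) { ++pos_; return true; }
        return false;
    }

    long expr() {
        long value = term();
        for (;;) {
            if (match('+')) value += term();
            else if (match('-')) value -= term();
            else return value;
        }
    }

    long term() {
        long value = factor();
        for (;;) {
            if (match('*')) {
                value *= factor();
            } else if (match('/')) {
                const long divisor = factor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }

    long factor() {
        if (match('(')) {
            const long value = expr();  // recursion back into the top rule
            if (!match(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        if (match('-')) return -factor();
        skip_spaces();
        if (pos_ >= src_.size() || !std::isdigit(static_cast<unsigned char>(src_[pos_])))
            throw std::runtime_error("expected a number");
        long value = 0;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            value = value * 10 + (src_[pos_++] - '0');
        return value;
    }
};
```

For example, `Calculator("(1 + 2) * 3").evaluate()` walks `expr` → `term` → `factor` → `expr` for the parenthesized part and yields 9. Operator precedence falls out of the rule nesting rather than from any explicit precedence table.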
The arrival of the Internet and the World Wide Web magnified the need for parsing a thousandfold. At one point Joel had to write an HTML parser for a Web browser project. Using the W3C formal specifications, he easily wrote a recursive-descent HTML parser. With the influence of the Internet, RFC specifications were abundant. SGML, HTML, XML, email addresses and even those seemingly trivial URLs were all formally specified using small EBNF-style grammar specifications. Joel had more parsing to do, and he wished for a tool similar to larger parser generators such as YACC and ANTLR, where a parser is built automatically from a grammar specification.
This ideal tool would be able to parse anything from email addresses and command lines to XML and scripting languages. Scalability was a primary goal. The tool would be able to do this without incurring a heavy development load, which was not possible with the above-mentioned parser generators. The result was Spirit.
Spirit was a personal project that was conceived when Joel was involved in R&D in Japan. Inspired by the GoF's Composite and Interpreter patterns, he realized that he could model a recursive-descent parser as a hierarchical object composition of primitives (terminals) and composites (productions). The original version was implemented with run-time polymorphic classes. A parser was generated at run time by feeding in production rule strings such as:
"prod ::= {'A' | 'B'} 'C';"
A compile function compiled the parser, dynamically creating a hierarchy of objects and linking semantic actions on the fly. A very early text can be found here: pre-Spirit.
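The Composite-pattern idea behind the original dynamic Spirit can be sketched with run-time polymorphic classes: terminals are leaves, while sequence, alternative and repetition are composites that own child parsers and delegate to them through virtual calls. This sketch assumes hand-built objects in place of the string-driven compile function, and every name in it (`Parser`, `Char`, `Sequence`, `Alternative`, `Kleene`, `make_prod`, `full_match`) is hypothetical rather than taken from pre-Spirit.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical component interface (Composite pattern): parse() returns true on a
// match and advances pos; on failure it leaves pos where it was.
struct Parser {
    virtual ~Parser() = default;
    virtual bool parse(const std::string& in, std::size_t& pos) const = 0;
};

using ParserPtr = std::shared_ptr<const Parser>;

// Terminal (leaf): matches a single literal character.
struct Char : Parser {
    explicit Char(char c) : ch(c) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        if (pos < in.size() && in[pos] == ch) { ++pos; return true; }
        return false;
    }
    char ch;
};

// Composite: left followed by right; backtracks to the start if either fails.
struct Sequence : Parser {
    Sequence(ParserPtr l, ParserPtr r) : left(std::move(l)), right(std::move(r)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        const std::size_t save = pos;
        if (left->parse(in, pos) && right->parse(in, pos)) return true;
        pos = save;
        return false;
    }
    ParserPtr left, right;
};

// Composite: try left first, then right ('A' | 'B').
struct Alternative : Parser {
    Alternative(ParserPtr l, ParserPtr r) : left(std::move(l)), right(std::move(r)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        return left->parse(in, pos) || right->parse(in, pos);
    }
    ParserPtr left, right;
};

// Composite: zero or more repetitions of subject ({...} in the production).
struct Kleene : Parser {
    explicit Kleene(ParserPtr s) : subject(std::move(s)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        std::size_t save = pos;
        // Stop when subject fails or matches without consuming input.
        while (subject->parse(in, pos) && pos != save) save = pos;
        pos = save;
        return true;  // zero repetitions is still a match
    }
    ParserPtr subject;
};

// prod ::= {'A' | 'B'} 'C';  assembled by hand instead of by a string compiler.
ParserPtr make_prod() {
    auto a_or_b = std::make_shared<Alternative>(std::make_shared<Char>('A'),
                                                std::make_shared<Char>('B'));
    return std::make_shared<Sequence>(std::make_shared<Kleene>(a_or_b),
                                      std::make_shared<Char>('C'));
}

// True when p consumes the entire input.
bool full_match(const Parser& p, const std::string& in) {
    std::size_t pos = 0;
    return p.parse(in, pos) && pos == in.size();
}
```

Because the hierarchy is ordinary heap-allocated data, it can be created and rewired at run time, which is what made it possible to build parsers from production strings. The price is a virtual call at every node.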
Versions 1.0 to 1.8 were a complete rewrite of the original Spirit parser using expression templates and static polymorphism, inspired by the work of Todd Veldhuizen (Expression Templates, C++ Report, June 1995). Initially, the static-Spirit version was meant only to replace the core of the original dynamic-Spirit: dynamic-Spirit needed a parser to implement itself anyway, and the original employed a hand-coded recursive-descent parser to parse the input grammar specification strings. It was at this time that Hartmut Kaiser joined the Spirit development.
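The static-polymorphism approach can be illustrated by rebuilding the production `prod ::= {'A' | 'B'} 'C';` at compile time: overloaded operators return nested template types, so the grammar becomes an ordinary C++ expression and no virtual calls are involved. This is a toy sketch built on an assumed CRTP base class; the names (`parser`, `chlit`, `sequence`, `alternative`, `kleene_star`) are loosely modeled on Classic Spirit's component names, but the code is a simplified stand-in, not Spirit's implementation.

```cpp
#include <cstddef>
#include <string>

// CRTP base: tags a type as a parser and gives static access to the derived type,
// so the operators below only accept parsers and dispatch without virtual calls.
template <typename Derived>
struct parser {
    const Derived& derived() const { return static_cast<const Derived&>(*this); }
};

// Terminal: a single literal character.
struct chlit : parser<chlit> {
    explicit chlit(char c) : ch(c) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        if (pos < in.size() && in[pos] == ch) { ++pos; return true; }
        return false;
    }
    char ch;
};

// a >> b: both in order; backtracks on failure.
template <typename L, typename R>
struct sequence : parser<sequence<L, R>> {
    sequence(const L& l, const R& r) : left(l), right(r) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        const std::size_t save = pos;
        if (left.parse(in, pos) && right.parse(in, pos)) return true;
        pos = save;
        return false;
    }
    L left;
    R right;
};

// a | b: first alternative that matches.
template <typename L, typename R>
struct alternative : parser<alternative<L, R>> {
    alternative(const L& l, const R& r) : left(l), right(r) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        return left.parse(in, pos) || right.parse(in, pos);
    }
    L left;
    R right;
};

// *a: zero or more repetitions.
template <typename S>
struct kleene_star : parser<kleene_star<S>> {
    explicit kleene_star(const S& s) : subject(s) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        std::size_t save = pos;
        while (subject.parse(in, pos) && pos != save) save = pos;
        pos = save;
        return true;
    }
    S subject;
};

// The operators only build types; no parsing happens here.
template <typename L, typename R>
sequence<L, R> operator>>(const parser<L>& l, const parser<R>& r) {
    return sequence<L, R>(l.derived(), r.derived());
}

template <typename L, typename R>
alternative<L, R> operator|(const parser<L>& l, const parser<R>& r) {
    return alternative<L, R>(l.derived(), r.derived());
}

template <typename S>
kleene_star<S> operator*(const parser<S>& s) {
    return kleene_star<S>(s.derived());
}

// True when p consumes the entire input.
template <typename P>
bool full_match(const parser<P>& p, const std::string& in) {
    std::size_t pos = 0;
    return p.derived().parse(in, pos) && pos == in.size();
}
```

With these definitions, `*(chlit('A') | chlit('B')) >> chlit('C')` has the type `sequence<kleene_star<alternative<chlit, chlit>>, chlit>`. The grammar's structure lives entirely in that type, which typically lets the compiler inline the whole parser, at the cost of fixing the grammar at compile time.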
After its initial open-source debut in May 2001, static-Spirit became a success. Around November 2001, the Spirit website had an activity percentile of 98%, making it the number one parser tool at SourceForge at the time. Not bad for a niche project like a parser library. The "static" qualifier was eventually dropped, and static-Spirit simply became Spirit. The library soon evolved to acquire more dynamic features.
Spirit was formally accepted into Boost in October 2002. Boost is a peer-reviewed, open collaborative development effort around a collection of free Open Source C++ libraries covering a wide range of domains. The Boost Libraries have become widely known as an industry standard for design and implementation quality, robustness, and reusability.
Over the years, especially after it was accepted into Boost, Spirit has served its purpose quite admirably. Classic-Spirit (versions prior to 2.0) focused on transduction parsing, where the input string is merely translated to an output string. Many parsers fall into the transduction type. When the time came to add attributes to the parser library, it was done in a rather ad-hoc manner, with the goal of being 100% backward compatible with Classic-Spirit. As a result, some parsers have attributes and some don't.
Spirit V2 is another major rewrite. Spirit V2 grammars are fully attributed (see Attribute Grammar), which means that all parser components have attributes. To do this efficiently and elegantly, we had to use a couple of infrastructure libraries. Some did not exist, some were quite new when Spirit debuted, and some needed work. Boost.MPL is an important infrastructure library, yet it is not sufficient to implement Spirit V2. Another library had to be written: Boost.Fusion. Fusion sits between MPL and STL, between compile time and runtime, mapping types to values. Fusion is a direct descendant of both MPL and Boost.Tuples. Fusion is now a full-fledged Boost library. Phoenix also had to be beefed up to support Spirit V2. The result is Boost.Phoenix. Last but not least, Spirit V2 uses an Expression Templates library called Boost.Proto.
Even though it has evolved and matured to become a multi-module library, Spirit is still used for micro-parsing tasks as well as for parsing scripting languages. As with C++, you only pay for the features you need. The power of Spirit comes from its modularity and extensibility. Instead of giving you a sledgehammer, it gives you the right ingredients to easily create a sledgehammer.
Just before the development of Spirit V2 began, Hartmut came across the StringTemplate library, which is part of the ANTLR parser framework. [1] The concepts presented in that library led Hartmut to the next step in the evolution of Spirit. Parsing and generation are tightly connected to a formal notation, or a grammar. The grammar describes both input and output, and therefore a parser library should have grammar-driven output. This duality is expressed in Spirit by the parser library Spirit.Qi and the generator library Spirit.Karma, which use the same component infrastructure.
The idea of creating a lexer library well integrated with the Spirit parsers is not new. It has been discussed almost since Classic-Spirit (pre-V2) initially debuted. Several attempts to integrate existing lexer libraries and frameworks with Spirit have been made and served as a proof of concept and usability (for example, see Wave: The Boost C/C++ Preprocessor Library, and SLex: a fully dynamic C++ lexer implemented with Spirit). Based on these experiences we added Spirit.Lex, a fully integrated lexer library, to the mix, allowing the user to take advantage of the power of regular expressions for token matching, taking pressure off the parser components and simplifying parser grammars. Again, Spirit's modular structure allowed us to reuse the same underlying component library as for the parser and generator libraries.
Each major section (there are 3: Qi, Karma, and Lex) is roughly divided into 3 parts:
Some icons are used to mark certain topics indicative of their relevance. These icons precede some text to indicate:
Table 1. Icons

| Name | Meaning |
|---|---|
| Note | Generally useful information (an aside that doesn't fit in the flow of the text) |
| Tip | Suggestion on how to do something (especially something that is not obvious) |
| Important | Important note on something to take particular notice of |
| Caution | Take special care with this; it may not be what you expect and may cause bad results |
| Danger | This is likely to cause serious trouble if ignored |
This documentation is automatically generated by the Boost QuickBook documentation tool. QuickBook can be found in the Boost Tools.
Please direct all questions to Spirit's mailing list. You can subscribe to the Spirit General List. The mailing list has a searchable archive; a search link to this archive is provided on Spirit's home page. You may also read and post messages to the mailing list through the Spirit General NNTP news portal (thanks to Gmane). The newsgroup mirrors the mailing list. Here is a link to the archives: http://news.gmane.org/gmane.comp.parsers.spirit.general.
[1] Quote from http://www.stringtemplate.org/: It is a Java template engine (with ports for C# and Python) for generating source code, web pages, emails, or any other formatted text output.