Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

Click here to view the latest version of this page.
Next

Spirit 2.0

Joel de Guzman

Hartmut Kaiser

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)


Table of Contents

Preface
What's New
Introduction
Qi and Karma
Tutorials
Quick Start
Warming up
Semantic Actions
Complex - Our first complex parser
Sum - adding numbers
Number List - stuffing numbers into a std::vector
Number List Redux - list syntax
Number List Attribute - one more, with style
Roman Numerals
Employee - Parsing into structs
Mini XML - ASTs!
Mini XML - Error Handling
Abstracts
Parsing Expression Grammar
Parsing
Parsing and Generating
Primitives
Operators
Attributes
Semantic Actions
Directives
Rules
Grammars
Debugging
Mini XML - Error Handling
Parse Trees and ASTs
Quick Reference
Reference
Concepts
Char
String
Numeric
Binary
Directive
Action
Nonterminal
Operators
Stream
Auxiliary
Debug
Spirit.Lex
Introduction to Spirit.Lex
Spirit.Lex Tutorials
Spirit.Lex Tutorials Overview
Quickstart 1 - A word counter using Spirit.Lex
Quickstart 2 - A better word counter using Spirit.Lex
Quickstart 3 - Counting Words Using a Parser
Abstracts
Lexer Primitives
Tokenizing Input Data
Lexer Semantic Actions
The Static Lexer Model
Parsing using a Lexer
Lexer Attributes
Lexer States
Quick Reference
Reference
Concepts
Lexer Class
Token Class
TokenDef Class
TokenSet Class
FAQ
Notes
Portability
Porting from Spirit 1.8.x
Style Guide
Techniques
Rationale
Acknowledgments
References

Examples of designs that meet most of the criteria for "goodness" (easy to understand, flexible, efficient) are a recursive- descent parser, which is traditional procedural code. Another example is the STL, which is a generic library of containers and algorithms depending crucially on both traditional procedural code and on parametric polymorphism. --Bjarne Stroustrup

History

80s

In the Mid 80s, Joel wrote his first calculator in Pascal. It has been an unforgettable coding experience. He was amazed how a mutually recursive set of functions can model a grammar specification. In time, the skills he acquired from that academic experience became very practical. Periodically Joel was tasked to do some parsing. For instance, whenever he needs to perform any form of I/O, even in binary, he tries to approach the task somewhat formally by writing a grammar using Pascal- like syntax diagrams and then write a corresponding recursive-descent parser. This worked very well.

90s

The arrival of the Internet and the World Wide Web magnified this thousand-fold. At one point Joel had to write an HTML parser for a Web browser project. He got a recursive-descent HTML parser working based on the W3C formal specifications easily. He was certainly glad that HTML had a formal grammar specification. Because of the influence of the Internet, Joel then had to do more parsing. RFC specifications were everywhere. SGML, HTML, XML, even email addresses and those seemingly trivial URLs were all formally specified using small EBNF- style grammar specifications. This made him wish for a tool similar to big- time parser generators such as YACC and ANTLR, where a parser is built automatically from a grammar specification. Yet, he wants it to be extremely small; small enough to fit in my pocket, yet scalable.

It must be able to practically parse simple grammars such as email addresses to moderately complex grammars such as XML and perhaps some small to medium-sized scripting languages. Scalability is a prime goal. You should be able to use it for small tasks such as parsing command lines without incurring a heavy payload, as you do when you are using YACC or PCCTS. Even now that it has evolved and matured to become a multi-module library, true to its original intent, Spirit can still be used for extreme micro-parsing tasks. You only pay for features that you need. The power of Spirit comes from its modularity and extensibility. Instead of giving you a sledgehammer, it gives you the right ingredients to create a sledgehammer easily.

The result was Spirit. Spirit was a personal project that was conceived when Joel was doing R&D in Japan. Inspired by the GoF's composite and interpreter patterns, he realized that he can model a recursive-descent parser with hierarchical-object composition of primitives (terminals) and composites (productions). The original version was implemented with run-time polymorphic classes. A parser is generated at run time by feeding in production rule strings such as:

"prod ::= {'A' | 'B'} 'C';"

A compile function compiled the parser, dynamically creating a hierarchy of objects and linking semantic actions on the fly. A very early text can be found here: pre-Spirit.

2001 to 2006

Version 1.0 to 1.8 was a complete rewrite of the original Spirit parser using expression templates and static polymorphism, inspired by the works of Todd Veldhuizen (Expression Templates, C++ Report, June 1995). Initially, the static-Spirit version was meant only to replace the core of the original dynamic-Spirit. Dynamic-spirit needed a parser to implement itself anyway. The original employed a hand-coded recursive-descent parser to parse the input grammar specification strings. Incidentially it was the time, when Hartmut joined the Spirit development.

After its initial "open-source" debut in May 2001, static-Spirit became a success. At around November 2001, the Spirit website had an activity percentile of 98%, making it the number one parser tool at Source Forge at the time. Not bad for such a niche project such as a parser library. The "static" portion of Spirit was forgotten and static-Spirit simply became Spirit. The library soon evolved to acquire more dynamic features.

Spirit was formally accepted into Boost in October 2002. Boost is a peer-reviewed, open collaborative development effort that is a collection of free Open Source C++ libraries covering a wide range of domains. The Boost Libraries have become widely known as an industry standard for design and implementation quality, robustness, and reusability.

2007

Over the years, especially after Spirit was accepted into Boost, Spirit has served its purpose quite admirably. The focus of what we'll now call Classic-Spirit (versions prior to 2.0) was on transduction parsing where the input string is merely translated to an output string. A lot of parsers are of the transduction type. When the time came to add attributes to the parser library, it was done rather in an ad-hoc manner, with the goal being 100% backward compatible with classic Spirit. Some parsers have attributes, some don't.

Spirit V2 is another major rewrite. Spirit V2 grammars are fully attributed (see Attribute Grammar). All parser components have attributes. To do this efficiently and ellegantly, we had to use a couple of infrastructure libraries. Some of which haven't been written yet at the time, some were quite new when Spirit debuted, and some needed work. MPL is an important infrastructure library, yet is not sufficient to implement Spirit V2. Another library had to be written: Fusion. Fusion sits between MPL and STL --between compile time and runtime -- mapping types to values. Fusion is a direct descendant of both MPL and Boost.Tuples (Fusion is now a full fledged Boost library). Phoenix also had to be beefed up to support Spirit V2. The result is Phoenix2. Last but not least, Spirit V2 uses an Expression Templates library called -Boost.Proto-.

New Ideas: Spirit V2

Just before the development of Spirit V2 began, Hartmut came across the StringTemplate library which is a part of the ANTLR parser framework. It is a Java template engine (with ports for C# and Python) for generating source code, web pages, emails, or any other formatted text output. With it, he got the the idea of using a formal notation (a grammar) to describe the expected structure of an input character sequence. The same grammar may be used to formalize the structure of a corresponding output character sequence. This is possible because parsing, most of the time, is implemented by comparing the input with the patterns defined by the grammar. If we use the same patterns to format a matching output, the generated sequence will follow the rules of the grammar as well.

This insight lead to the implementation of a grammar driven output generation library compatibile with the Spirit parser library. As it turned out, parsing and generation are tightly connected and have very similar concepts. The duality of these two sides of the same medal is ubiquitous, which allowed us to build the parser library Spirit.Qi and the generator library Spirit.Karma using the same component infastructure.

The idea of creating a lexer library well integrated with the Spirit parsers is not new. This has been discussed almost for the whole time of the existence of Classic-Spirit (pre V2) now. Several attempts to integrate existing lexer libraries and frameworks with Spirit have been made and served as a proof of concept and usability (for example see Wave: The Boost C/C++ Preprocessor Library, and SLex: a fully dynamic C++ lexer implemented with Spirit). Based on these experiences we added Spirit.Lex: a fully integrated lexer library to the mix, allowing to take advantage of the power of regular expressions for token matching, removing pressure from the parser components, simplifying parser grammars. Again, Spirit's modular structure allowed us to reuse the same underlying component library as for the parser and generator libraries.

How to use this manual

Each major section (there are two: Qi and Karma, and Lex) is roughly divided into 3 parts:

  1. Tutorials: A step by step guide with heavily annotated code. These are meant to get the user acquainted with the library as quickly as possible. The objective is to build the confidence of the user in using the library using abundant examples and detailed instructions. Examples speak volumes.
  2. Abstracts: A high level summary of key topics. The objective is to give the user a high level view of the library, the key concepts, background and theories.
  3. Reference: Detailed formal technical reference. We start with a quick reference -- an easy to use table that maps into the reference proper. The reference proper starts with C++ Concepts followed by models of the concepts.

Some icons are used to mark certain topics indicative of their relevance. These icons precede some text to indicate:

Table 1. Icons

Icon

Name

Meaning

note

Note

Generally useful information (an aside that doesn't fit in the flow of the text)

tip

Tip

Suggestion on how to do something (especially something that not be obvious)

important

Important

Important note on something to take particular notice of

caution

Caution

Take special care with this - it may not be what you expect and may cause bad results

alert

Danger

This is likely to cause serious trouble if ignored

This documentation is automatically generated by Boost QuickBook documentation tool. QuickBook can be found in the Boost Tools.

Support

Please direct all questions to Spirit's mailing list. You can subscribe to the Spirit General List. The mailing list has a searchable archive. A search link to this archive is provided in Spirit's home page. You may also read and post messages to the mailing list through Spirit General NNTP news portal (thanks to Gmane). The news group mirrors the mailing list. Here is a link to the archives: http://news.gmane.org/gmane.comp.parsers.spirit.general.

Last revised: December 23, 2008 at 09:37:55 GMT


Next