Boost C++ Libraries



Lexer Semantic Actions

The main task of a lexer is normally to recognize tokens in the input. Traditionally, this has been complemented by the ability to execute arbitrary code whenever a certain token has been detected. Spirit.Lex has been designed to support this mode of operation as well. We borrow the concept of semantic actions from parsers (Spirit.Qi) and generators (Spirit.Karma). Lexer semantic actions may be attached to any token definition. They are C++ functions or function objects that are called whenever a token definition successfully recognizes a portion of the input. Given a token definition D and a C++ function f, you can make the lexer call f whenever D matches the input by attaching f:

D[f]

The expression above links f to the token definition, D. The required prototype of f is:

void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id, Context& ctx);

where:

Iterator& start

This is the iterator pointing to the beginning of the matched range in the underlying input sequence. The type of the iterator is the same as specified while defining the type of the lexertl::actor_lexer<...> (its first template parameter). The semantic action is allowed to change the value of this iterator, influencing the matched input sequence.

Iterator& end

This is the iterator pointing past the end of the matched range in the underlying input sequence. The type of the iterator is the same as specified while defining the type of the lexertl::actor_lexer<...> (its first template parameter). The semantic action is allowed to change the value of this iterator, influencing the matched input sequence.

pass_flag& matched

This value is pre-initialized to pass_normal. If the semantic action sets it to pass_fail, the lexer behaves as if the token had not been matched in the first place. If the semantic action sets it to pass_ignore, the lexer ignores the current token and tries to match the next token from the input.

Idtype& id

This is the token id of the matched token. Its type is Idtype (most of the time this will be std::size_t). The semantic action is allowed to change the value of this token id, influencing the id of the created token.

Context& ctx

This is a reference to a lexer-specific, unspecified type that provides the context for the current lexer state. It can be used to access different internal data items and is needed for lexer state control from inside a semantic action.

When using a C++ function as the semantic action the following prototypes are allowed as well:

void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id);
void f (Iterator& start, Iterator& end, pass_flag& matched);
void f (Iterator& start, Iterator& end);
void f ();
Important

In order to use lexer semantic actions you need to use the type lexertl::actor_lexer<> as your lexer class (instead of the type lexertl::lexer<> as described in earlier examples).
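
The effect of the parameters described above can be illustrated with a small stand-alone model. The sketch below does not use Spirit.Lex: pass_flag, token, scan and sample_action are hypothetical stand-ins that only mirror the names used in this section, and the "token definition" is simply a run of non-space characters. It is meant to show how an action that changes end, id, or matched influences the tokens a lexer produces, not how Spirit.Lex is implemented.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the Spirit.Lex types.
enum class pass_flag { pass_fail, pass_normal, pass_ignore };

using Iterator = std::string::const_iterator;
using Action =
    std::function<void(Iterator&, Iterator&, pass_flag&, std::size_t&)>;

struct token {
    std::size_t id;
    std::string text;
};

// Toy lexer: the only "token definition" is a run of non-space characters.
std::vector<token> scan(const std::string& input, const Action& action) {
    std::vector<token> out;
    Iterator it = input.begin();
    const Iterator eoi = input.end();
    while (it != eoi) {
        if (std::isspace(static_cast<unsigned char>(*it))) { ++it; continue; }
        Iterator start = it;
        Iterator end = it;
        while (end != eoi && !std::isspace(static_cast<unsigned char>(*end)))
            ++end;

        pass_flag matched = pass_flag::pass_normal;  // pre-initialized
        std::size_t id = 1;                          // default token id
        action(start, end, matched, id);             // may modify all four

        if (matched == pass_flag::pass_fail) break;  // model: no other rule, so stop
        if (end <= it) break;                        // guard against no progress
        if (matched == pass_flag::pass_normal)
            out.push_back({id, std::string(start, end)});
        it = end;  // scanning resumes at the (possibly adjusted) end
    }
    return out;
}

// Sample action: drops "#", and truncates long words to three characters.
void sample_action(Iterator& start, Iterator& end, pass_flag& matched,
                   std::size_t& id) {
    const std::string text(start, end);
    if (text == "#") {
        matched = pass_flag::pass_ignore;  // token is skipped
        return;
    }
    if (text.size() > 3) {
        end = start + 3;  // the rest goes back to the input
        id = 2;           // the created token gets a different id
    }
}
```

In this model, scanning "ab # hello" with sample_action yields three tokens: "ab" with id 1, "hel" with id 2, and "lo" with id 1. The "#" is ignored, and the characters cut off from "hello" are matched again as a separate token.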

The context of a lexer semantic action

The last parameter passed to any lexer semantic action is a reference to an unspecified type (see the description of the Context parameter above). This type is unspecified because it depends on the token type returned by the lexer. It is implemented in the internals of the iterator type exposed by the lexer. Nevertheless, any context type is expected to expose a few functions that allow a semantic action to influence the behavior of the lexer. The following table gives an overview and a short description of the available functionality.

Table 8. Functions exposed by any context passed to a lexer semantic action

Name

Description

Iterator const& get_eoi() const

The function get_eoi() may be used to access the end iterator of the input stream the lexer has been initialized with.

void more()

The function more() tells the lexer that the next time it matches a rule, the corresponding token should be appended onto the current token value rather than replacing it.

Iterator const& less(Iterator const& it, int n)

The function less() returns an iterator positioned to the nth input character beyond the current token start iterator (i.e. by passing the return value to the parameter end it is possible to return all but the first n characters of the current token back to the input stream).

bool lookahead(std::size_t id)

The function lookahead() can be used to implement lookahead for lexer engines not supporting constructs like flex' a/b (match a, but only when followed by b). It invokes the lexer on the input following the current token without actually moving forward in the input stream. The function returns whether the lexer was able to match a token with the given token-id id.

std::size_t get_state() const and void set_state(std::size_t state)

The functions get_state() and set_state() may be used to introspect and change the current lexer state.

token_value_type get_value() const and void set_value(Value const&)

The functions get_value() and set_value() may be used to introspect and change the current token value.
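
The behavior of more() and less() described in the table can be sketched with a stand-alone model. The class toy_context below is hypothetical and is not the Spirit.Lex context type: its less() takes only n, and on_match() stands in for whatever the real lexer does internally when a rule matches. It only mirrors the documented semantics: more() makes the next match extend the current value, and less(n) yields the position n characters past the start of the current token.

```cpp
#include <iterator>
#include <string>

using Iterator = std::string::const_iterator;

// Hypothetical model of a lexer context; NOT the Spirit.Lex type.
class toy_context {
public:
    toy_context(Iterator first, Iterator eoi) : first_(first), eoi_(eoi) {}

    // End iterator of the overall input.
    Iterator const& get_eoi() const { return eoi_; }

    // Request that the next match is appended to the current value.
    void more() { append_next_ = true; }

    // Iterator n characters past the start of the current token.
    Iterator less(int n) const { return std::next(first_, n); }

    // Current token value.
    std::string const& get_value() const { return value_; }

    // Called by the (toy) lexer for every match [first, last).
    void on_match(Iterator first, Iterator last) {
        first_ = first;
        if (append_next_)
            value_.append(first, last);  // more() was requested
        else
            value_.assign(first, last);  // normal case: replace the value
        append_next_ = false;            // the request applies to one match only
    }

private:
    Iterator first_;
    Iterator eoi_;
    std::string value_;
    bool append_next_ = false;
};
```

For the input "foobar", matching "foo", calling more(), and then matching "bar" leaves get_value() at "foobar" in this model. A further match without a preceding more() replaces the value again.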


Lexer Semantic Actions Using Phoenix

Although it is possible to write your own function object implementations (e.g. using Boost.Lambda or Boost.Bind), the preferred way of defining lexer semantic actions is to use Boost.Phoenix. In this case you can access the parameters described above by using the predefined Spirit placeholders:

Table 9. Predefined Phoenix placeholders for lexer semantic actions

Placeholder

Description

_start

Refers to the iterator pointing to the beginning of the matched input sequence. Any modifications to this iterator value will be reflected in the generated token.

_end

Refers to the iterator pointing past the end of the matched input sequence. Any modifications to this iterator value will be reflected in the generated token.

_pass

References the value signaling the outcome of the semantic action. This is pre-initialized to lex::pass_flags::pass_normal. If it is set to lex::pass_flags::pass_fail, the lexer will behave as if no token has been matched. If it is set to lex::pass_flags::pass_ignore, the lexer will ignore the current match and proceed trying to match tokens from the input.

_tokenid

Refers to the token id of the token to be generated. Any modifications to this value will be reflected in the generated token.

_val

Refers to the value the next token will be initialized from. Any modifications to this value will be reflected in the generated token.

_state

Refers to the lexer state the input has been matched in. Any modifications to this value will be reflected in the lexer itself (the next match will start in the new state). The currently generated token is not affected by changes to this variable.

_eoi

References the end iterator of the overall lexer input. This value cannot be changed.


The context object passed as the last parameter to any lexer semantic action is not directly accessible while using Boost.Phoenix expressions. Instead, we provide predefined Phoenix functions that invoke the support functions mentioned above. The following table lists the available support functions and describes their functionality:

Table 10. Support functions usable from Phoenix expressions inside lexer semantic actions

Plain function

Phoenix function

Description

ctx.more()

more()

The function more() tells the lexer that the next time it matches a rule, the corresponding token should be appended onto the current token value rather than replacing it.

ctx.less()

less(n)

The function less() takes a single integer parameter n and returns an iterator positioned to the nth input character beyond the current token start iterator (i.e. by assigning the return value to the placeholder _end it is possible to return all but the first n characters of the current token back to the input stream).

ctx.lookahead()

lookahead(std::size_t) or lookahead(token_def)

The function lookahead() takes a single parameter specifying the token to match in the input. The function can be used for instance to implement lookahead for lexer engines not supporting constructs like flex' a/b (match a, but only when followed by b). It invokes the lexer on the input following the current token without actually moving forward in the input stream. The function returns whether the lexer was able to match the specified token.


