Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

Click here to view the latest version of this page.
PrevUpHomeNext

How To

Non-conventional Syntax
Response Files
Winmain Command Line
Option Groups and Hidden Options
Custom Validators
Unicode Support
Allowing Unknown Options

This section describes how the library can be used in specific situations.

Non-conventional Syntax

Sometimes, standard command line syntaxes are not enough. For example, the gcc compiler has "-frtti" and -fno-rtti" options, and this syntax is not directly supported.

For such cases, the library allows the user to provide an additional parser -- a function which will be called on each command line element, before any processing by the library. If the additional parser recognises the syntax, it returns the option name and value, which are used directly. The above example can be handled by the following code:

pair<string, string> reg_foo(const string& s)
{
    if (s.find("-f") == 0) {
        if (s.substr(2, 3) == "no-")
            return make_pair(s.substr(5), string("false"));
        else
            return make_pair(s.substr(2), string("true"));
    } else {
        return make_pair(string(), string());
    }
}

Here's the definition of the additional parser. When parsing the command line, we pass the additional parser:

store(command_line_parser(ac, av).options(desc).extra_parser(reg_foo)
        .run(), vm);

The complete example can be found in the "example/custom_syntax.cpp" file.

Response Files

Some operating system have very low limits of the command line length. The common way to work around those limitations is using response files. A response file is just a configuration file which uses the same syntax as the command line. If the command line specifies a name of response file to use, it's loaded and parsed in addition to the command line. The library does not provide direct support for response files, so you'll need to write some extra code.

First, you need to define an option for the response file:

("response-file", value<string>(), 
     "can be specified with '@name', too")

Second, you'll need an additional parser to support the standard syntax for specifying response files: "@file":

pair<string, string> at_option_parser(string const&s)
{
    if ('@' == s[0])
        return std::make_pair(string("response-file"), s.substr(1));
    else
        return pair<string, string>();
}

Finally, when the "response-file" option is found, you'll have to load that file and pass it to the command line parser. This part is the hardest. We'll use the Boost.Tokenizer library, which works but has some limitations. You might also consider Boost.StringAlgo. The code is:

if (vm.count("response-file")) {
     // Load the file and tokenize it
     ifstream ifs(vm["response-file"].as<string>().c_str());
     if (!ifs) {
         cout << "Could no open the response file\n";
         return 1;
     }
     // Read the whole file into a string
     stringstream ss;
     ss << ifs.rdbuf();
     // Split the file content
     char_separator<char> sep(" \n\r");
     tokenizer<char_separator<char> > tok(ss.str(), sep);
     vector<string> args;
     copy(tok.begin(), tok.end(), back_inserter(args));
     // Parse the file and store the options
     store(command_line_parser(args).options(desc).run(), vm);     
}

The complete example can be found in the "example/response_file.cpp" file.

Winmain Command Line

On the Windows operating system, GUI applications receive the command line as a single string, not split into elements. For that reason, the command line parser cannot be used directly. At least on some compilers, it is possible to obtain the split command line, but it's not clear if all compilers support the same mechanism on all versions of the operating system. The split_winmain function is a portable mechanism provided by the library.

Here's an example of use:

vector<string> args = split_winmain(lpCmdLine);
store(command_line_parser(args).options(desc).run(), vm);

The function is an overload for wchar_t strings, so can also be used in Unicode applications.

Option Groups and Hidden Options

Having a single instance of the options_description class with all the program's options can be problematic:

  • Some options make sense only for specific source, for example, configuration files.

  • The user would prefer some structure in the generated help message.

  • Some options shouldn't appear in the generated help message at all.

To solve the above issues, the library allows a programmer to create several instances of the options_description class, which can be merged in different combinations. The following example will define three groups of options: command line specific, and two options group for specific program modules, only one of which is shown in the generated help message.

Each group is defined using standard syntax. However, you should use reasonable names for each options_description instance:

options_description general("General options");
general.add_options()
    ("help", "produce a help message")
    ("help-module", value<string>()->implicit(),
        "produce a help for a given module")
    ("version", "output the version number")
    ;

options_description gui("GUI options");
gui.add_options()
    ("display", value<string>(), "display to use")
    ;

options_description backend("Backend options");
backend.add_options()
    ("num-threads", value<int>(), "the initial number of threads")
    ;

After declaring options groups, we merge them in two combinations. The first will include all options and be used for parsing. The second will be used for the "--help" option.

// Declare an options description instance which will include
// all the options
options_description all("Allowed options");
all.add(general).add(gui).add(backend);

// Declare an options description instance which will be shown
// to the user
options_description visible("Allowed options");
visible.add(general).add(gui);

What is left is to parse and handle the options:

variables_map vm;
store(parse_command_line(ac, av, all), vm);

if (vm.count("help")) 
{
    cout << visible;
    return 0;
}
if (vm.count("help-module")) {
    const string& s = vm["help-module"].as<string>();
    if (s == "gui") {
        cout << gui;
    } else if (s == "backend") {
        cout << backend;
    } else {
        cout << "Unknown module '" 
             << s << "' in the --help-module option\n";
        return 1;
    }
    return 0;
}
if (vm.count("num-threads")) {
    cout << "The 'num-threads' options was set to "
         << vm["num-threads"].as<int>() << "\n";            
}                           

When parsing the command line, all options are allowed. The "--help" message, however, does not include the "Backend options" group -- the options in that group are hidden. The user can explicitly force the display of that options group by passing "--help-module backend" option. The complete example can be found in the "example/option_groups.cpp" file.

Custom Validators

By default, the conversion of option's value from string into C++ type is done using iostreams, which sometimes is not convenient. The library allows the user to customize the conversion for specific classes. In order to do so, the user should provide suitable overload of the validate function.

Let's first define a simple class:

struct magic_number {
public:
    magic_number(int n) : n(n) {}
    int n;
};

and then overload the validate function:

void validate(boost::any& v, 
              const std::vector<std::string>& values,
              magic_number* target_type, int)
{
    static regex r("\\d\\d\\d-(\\d\\d\\d)");

    using namespace boost::program_options;

    // Make sure no previous assignment to 'a' was made.
    validators::check_first_occurence(v);
    // Extract the first string from 'values'. If there is more than
    // one string, it's an error, and exception will be thrown.
    const string& s = validators::get_single_string(values);

    // Do regex match and convert the interesting part to 
    // int.
    smatch match;
    if (regex_match(s, match, r)) {
        v = any(magic_number(lexical_cast<int>(match[1])));
    } else {
        throw validation_error("invalid value");
    }        
}
        

The function takes four parameters. The first is the storage for the value, and in this case is either empty or contains an instance of the magic_number class. The second is the list of strings found in the next occurrence of the option. The remaining two parameters are needed to workaround the lack of partial template specialization and partial function template ordering on some compilers.

The function first checks that we don't try to assign to the same option twice. Then it checks that only a single string was passed in. Next the string is verified with the help of the Boost.Regex library. If that test is passed, the parsed value is stored into the v variable.

The complete example can be found in the "example/regex.cpp" file.

Unicode Support

To use the library with Unicode, you'd need to:

  • Use Unicode-aware parsers for Unicode input

  • Require Unicode support for options which need it

Most of the parsers have Unicode versions. For example, the parse_command_line function has an overload which takes wchar_t strings, instead of ordinary char.

Even if some of the parsers are Unicode-aware, it does not mean you need to change definition of all the options. In fact, for many options, like integer ones, it makes no sense. To make use of Unicode you'll need some Unicode-aware options. They are different from ordinary options in that they accept wstring input, and process it using wide character streams. Creating an Unicode-aware option is easy: just use the the wvalue function instead of the regular value.

When an ascii parser passes data to an ascii option, or a Unicode parser passes data to a Unicode option, the data are not changed at all. So, the ascii option will see a string in local 8-bit encoding, and the Unicode option will see whatever string was passed as the Unicode input.

What happens when Unicode data is passed to an ascii option, and vice versa? The library automatically performs the conversion from Unicode to local 8-bit encoding. For example, if command line is in ascii, but you use wstring options, then the ascii input will be converted into Unicode.

To perform the conversion, the library uses the codecvt<wchar_t, char> locale facet from the global locale. If you want to work with strings that use local 8-bit encoding (as opposed to 7-bit ascii subset), your application should start with:

locale::global(locale(""));
      

which would set up the conversion facet according to the user's selected locale.

It's wise to check the status of the C++ locale support on your implementation, though. The quick test involves three steps:

  1. Go the the "test" directory and build the "test_convert" binary.

  2. Set some non-ascii locale in the environmemt. On Linux, one can run, for example:

    $ export LC_CTYPE=ru_RU.KOI8-R
    
  3. Run the "test_convert" binary with any non-ascii string in the selected encoding as its parameter. If you see a list of Unicode codepoints, everything's OK. Otherwise, locale support on your system might be broken.

Allowing Unknown Options

Usually, the library throws an exception on unknown option names. This behaviour can be changed. For example, only some part of your application uses Program_options, and you wish to pass unrecognized options to another part of the program, or even to another application.

To allow unregistered options on the command line, you need to use the basic_command_line_parser class for parsing (not parse_command_line) and call the allow_unregistered method of that class:

parsed_options parsed = 
    command_line_parser(argv, argc).options(desc).allow_unregistered().run();      
      

For each token that looks like an option, but does not have a known name, an instance of basic_option will be added to the result. The string_key and value fields of the instance will contain results of syntactic parsing of the token, the unregistered field will be set to true, and the original_tokens field will contain the token as it appeared on the command line.

If you want to pass the unrecognized options further, the collect_unrecognized function can be used. The function will collect original tokens for all unrecognized values, and optionally, all found positional options. Say, if your code handles a few options, but does not handles positional options at all, you can use the function like this:

vector<string> to_pass_further = collect_arguments(parsed.option, include_positional);
      
Copyright 2002-2004 Vladimir Prus

PrevUpHomeNext