Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

PrevUpHomeNext

(Experimental) Advanced client-side SQL query formatting

Extending format_sql

You can specialize formatter to add formatting support to your types:

// We want to add formatting support for employee
struct employee
{
    std::string first_name;
    std::string last_name;
    std::string company_id;
};

namespace boost {
namespace mysql {

template <>
struct formatter<employee>
{
    // formatter<T> should define the following functions:
    //    const char* parse(const char* first, const char*);
    //    void format(const T&, format_context_base&) const;

    const char* parse(const char* begin, const char* /* end */)
    {
        // Parse any format specifiers for this type.
        // [begin, end) points to the range of characters holding the format specifier string
        // We should return a pointer to the first unparsed character.
        // We don't support any specifiers for this type, so we return the begin pointer.
        return begin;
    }

    void format(const employee& emp, format_context_base& ctx) const
    {
        // Perform the actual formatting by appending characters to ctx.
        // We usually use format_sql_to to achieve this.
        // We will make this suitable for INSERT statements
        format_sql_to(ctx, "{}, {}, {}", emp.first_name, emp.last_name, emp.company_id);
    }
};

}  // namespace mysql
}  // namespace boost

The type can now be used in format_sql and format_sql_to:

// We can now use employee as a built-in value
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "INSERT INTO employee (first_name, last_name, company_id) VALUES ({}), ({})",
    employee{"John", "Doe", "HGS"},
    employee{"Rick", "Johnson", "AWC"}
);

BOOST_TEST(
    query ==
    "INSERT INTO employee (first_name, last_name, company_id) VALUES "
    "('John', 'Doe', 'HGS'), ('Rick', 'Johnson', 'AWC')"
);

You can add support for format specifiers for your type by modifying the parse function in formatter. For example, an employee can be formatted differently depending on whether we're using it in an INSERT or an UPDATE:

template <>
struct formatter<employee>
{
    // Should we format our employee as an INSERT or an UPDATE?
    // This flag is set by parse and used by format
    bool format_as_update{false};

    const char* parse(const char* it, const char* end)
    {
        // Parse any format specifiers for this type.
        // We recognize the 'u' specifier, which means that we should
        // format the type for an UPDATE statement, instead of an INSERT
        // If no specifier is found, default to INSERT
        // Note that the range may be empty, so we must check for this condition
        if (it != end && *it == 'u')
        {
            // The 'u' specifier is present, record it
            format_as_update = true;

            // Mark the current character as parsed
            ++it;
        }

        // Return a pointer to the first unparsed character.
        // If we didn't manage to parse the entire character range, an error will be emitted.
        // The library already takes care of this.
        return it;
    }

    void format(const employee& emp, format_context_base& ctx) const
    {
        if (format_as_update)
        {
            // This should be suitable to be placed in an UPDATE ... SET statement
            format_sql_to(
                ctx,
                "first_name={}, last_name={}, company_id={}",
                emp.first_name,
                emp.last_name,
                emp.company_id
            );
        }
        else
        {
            // Format only the values, as in INSERT statements
            format_sql_to(ctx, "{}, {}, {}", emp.first_name, emp.last_name, emp.company_id);
        }
    }
};

We can now use it like this:

// We can now use the 'u' specifier with employee
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "UPDATE employee SET {:u} WHERE id = {}",
    employee{"John", "Doe", "HGS"},
    42
);

BOOST_TEST(
    query == "UPDATE employee SET first_name='John', last_name='Doe', company_id='HGS' WHERE id = 42"
);

// If no format specifier is provided, we get the old behavior
query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "INSERT INTO employee (first_name, last_name, company_id) VALUES ({}), ({})",
    employee{"John", "Doe", "HGS"},
    employee{"Rick", "Johnson", "AWC"}
);

BOOST_TEST(
    query ==
    "INSERT INTO employee (first_name, last_name, company_id) VALUES "
    "('John', 'Doe', 'HGS'), ('Rick', 'Johnson', 'AWC')"
);

See the formatter reference docs for more info.

Format string syntax

This section extends on the supported syntax for format strings. The syntax is similar to the one in fmtlib.

A format string is composed of regular text and replacement fields. Regular text is output verbatim, while replacement fields are substituted by formatted arguments. For instance, in "SELECT {} FROM employee", "SELECT " and " FROM EMPLOYEE" is regular text, and "{}" is a replacement field.

A {} is called an automatic indexing replacement field. Arguments are replaced in the order they were provided to the format function. For instance:

BOOST_TEST(format_sql(opts, "SELECT {}, {}, {}", 42, "abc", nullptr) == "SELECT 42, 'abc', NULL");

A field index can be included within the braces. This is called manual indexing. Indices can appear in any order, and can be repeated:

// Recall that you need to set connect_params::multi_queries to true when connecting
// before running semicolon-separated queries.
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "UPDATE employee SET first_name = {1} WHERE id = {0}; SELECT * FROM employee WHERE id = {0}",
    42,
    "John"
);

BOOST_TEST(
    query ==
    "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42"
);

Finally, you can use named arguments by using the initializer-list overloads, which creates format_arg values:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "UPDATE employee SET first_name = {name} WHERE id = {id}; SELECT * FROM employee WHERE id = {id}",
    {
        {"id",   42    },
        {"name", "John"}
    }
);

BOOST_TEST(
    query ==
    "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42"
);

Argument names can only contain ASCII letters (lowercase and uppercase), digits and the underscore character (_). Names can't start with a digit.

Format strings can use either manual or automatic indexing, but can't mix them:

try
{
    // Mixing manual and auto indexing is illegal. This will throw an exception.
    format_sql(opts, "SELECT {0}, {}", 42);
}
catch (const boost::system::system_error& err)
{
    BOOST_TEST(err.code() == boost::mysql::client_errc::format_string_manual_auto_mix);
}

Named arguments can be mixed with either manual or automatic indexing.

Unreferenced format arguments are ignored. It's not an error to supply more format arguments than required:

// This is OK
std::string query = format_sql(opts, "SELECT {}", 42, "abc");

You can output a brace literal by doubling it:

BOOST_TEST(
    format_sql(opts, "SELECT 'Brace literals: {{ and }}'") == "SELECT 'Brace literals: { and }'"
);

Format specifiers (e.g. {:i}) are supported for some types, but are far less common than in fmtlib, since most types have a single, canonical representation.

Specifiers can appear when doing automatic indexing (e.g. {:i}), manual indexing (e.g. {0:i}) and named arguments (e.g. {name:i}).

Types specializing formatters can define custom specifiers. Only printable ASCII characters that are not { or } can be used as specifiers.

Format strings must be encoded according to format_options::charset. Otherwise, an error will be generated.

Error handling model

Some values can't be securely formatted. For instance, C++ double can be NaN and infinity, which is not supported by MySQL. Strings can contain byte sequences that don't represent valid characters, which makes them impossible to escape securely.

format_sql reports these errors by throwing boost::system::system_error exceptions, which contain an error code with details about what happened. For instance:

try
{
    // We're trying to format a double infinity value, which is not
    // supported by MySQL. This will throw an exception.
    std::string formatted_query = format_sql(opts, "SELECT {}", HUGE_VAL);
}
catch (const boost::system::system_error& err)
{
    BOOST_TEST(err.code() == boost::mysql::client_errc::unformattable_value);
}

You don't have to use exceptions, though. basic_format_context and format_sql_to use boost::system::result, instead.

basic_format_context contains an error code that is set when formatting a value fails. This is called the error state, and can be queried using basic_format_context::error_state. When basic_format_context::get is called (after all individual values have been formatted), the error state is checked. The system::result returned by get will contain the error state if it was set, or the generated query if it was not:

// ctx contains an error code that tracks whether any error happened
boost::mysql::format_context ctx(opts);

// We're trying to format a infinity, which is an error. This
// will set the error state, but won't throw.
format_sql_to(ctx, "SELECT {}, {}", HUGE_VAL, 42);

// The error state gets checked at this point. Since it is set,
// res will contain an error.
boost::system::result<std::string> res = std::move(ctx).get();
BOOST_TEST(!res.has_value());
BOOST_TEST(res.has_error());
BOOST_TEST(res.error() == boost::mysql::client_errc::unformattable_value);
// res.value() would throw an error, like format_sql would

Rationale: the error state mechanism makes composing formatters easier, as the error state is checked only once.

Errors caused by invalid format strings are also reported using this mechanism.

Format options and character set tracking

MySQL has many configuration options that affect its syntax. There are two options that formatting functions need to know in order to work:

any_connection::format_opts is a convenience function that returns a boost::system::result<format_options>. If the connection could not determine the current character set, the result will contain an error. For a reference on how character set tracking works, please read this section.

[Warning] Warning

Passing an incorrect format_options value to formatting functions may cause escaping to generate incorrect values, which may generate vulnerabilities. Stay safe and always use any_connection::format_opts instead of hand-crafting format_options values. Doing this, if the character set can't be safely determined, you will get a client_errc::unknown_character_set error instead of a vulnerability.

Custom string types

format_sql_to can be used with string types that are not std::string, as long as they satisfy the OutputString concept. This includes strings with custom allocators (like std::pmr::string) and boost::static_string. You need to use basic_format_context, specifying the string type:

// Create a format context that uses std::pmr::string
boost::mysql::basic_format_context<std::pmr::string> ctx(conn.format_opts().value());

// Compose your query as usual
boost::mysql::format_sql_to(ctx, "SELECT * FROM employee WHERE id = {}", 42);

// Retrieve the query as usual
std::pmr::string query = std::move(ctx).get().value();

Re-using string memory

You can pass a string value to the context's constructor, to re-use memory:

// we want to re-use memory held by storage
std::string storage;

// storage is moved into ctx by the constructor. If any memory
// had been allocated by the string, it will be re-used.
boost::mysql::format_context ctx(conn.format_opts().value(), std::move(storage));

// Use ctx as you normally would
boost::mysql::format_sql_to(ctx, "SELECT {}", 42);

// When calling get(), the string is moved out of the context
std::string query = std::move(ctx).get().value();

Raw string escaping

If you're building a SQL framework, or otherwise performing very low-level tasks, you may need to just escape a string, without quoting or formatting. You can use escape_string, which mimics mysql_real_escape_string.

[Note] Note

Don't use this unless you know what you're doing.


PrevUpHomeNext