Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

PrevUpHomeNext

Advanced client-side SQL query formatting

with_params is handy, but may fall short in some cases involving queries with complex logic. For these cases, you can use format_sql and format_sql_to to expand a query without executing it. These APIs don't involve communication with the server.

format_sql is the simplest, and is akin to std::format:

// Compose the SQL query without executing it.
// format_opts returns a system::result<format_options>,
// contains settings like the current character set.
// If the connection is using an unknown character set, this will throw an error.
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, salary FROM employee WHERE last_name = {}",
    "Doe"
);

BOOST_TEST(query == "SELECT id, salary FROM employee WHERE last_name = 'Doe'");

format_sql requires a format_options instance describing connection configuration, like the character set currently in use. any_connection::format_opts provides an easy way to retrieve these. This section contains more info about format_opts.

Some use cases, usually involving conditionals, may not be expressible in terms of a single format string. In such cases, you can use format_context and format_sql_to to build query strings incrementally:

// Compose a query that retrieves all employees in a company,
// with an optional limit
std::string compose_select_query(
    boost::mysql::format_options opts,
    string_view company_id,
    std::optional<long> limit
)
{
    // format_context will accumulate the query as we compose it
    boost::mysql::format_context ctx(opts);

    // format_sql_to expands a format string and appends the result
    // to a format context. This way, we can build our query in smaller pieces
    // Add all the query except for the LIMIT clause
    boost::mysql::format_sql_to(ctx, "SELECT * FROM employee WHERE company_id = {}", company_id);

    if (limit)
    {
        // Add the LIMIT clause
        boost::mysql::format_sql_to(ctx, " LIMIT {}", *limit);
    }

    // Retrieve the generated query string.
    // get() returns a boost::system::result<std::string> that
    // contains an error if any of the format operations failed.
    // Calling value() will throw on error, like format_sql does
    return std::move(ctx).get().value();
}
std::string query = compose_select_query(conn.format_opts().value(), "HGS", {});
BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS'");

query = compose_select_query(conn.format_opts().value(), "HGS", 50);
BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS' LIMIT 50");

sequence uses this feature to make formatting ranges easier.

Any type that works with with_params also does with format_sql and format_sql_to. These types are said to satisfy the Formattable concept. This table summarizes such types.

The sequence function can be used when the default range formatting isn't sufficient. If the elements in your range are not formattable, you can pass a user-defined function to sequence describing how to format each element:

// Employee is a plain struct, not formattable by default
std::vector<employee> employees{
    {"John", "Doe",   "HGS"},
    {"Kate", "Smith", "AWC"},
};
std::string query = format_sql(
    conn.format_opts().value(),
    "INSERT INTO employee (first_name, last_name, company_id) VALUES {}",
    sequence(
        employees,
        [](const employee& e, format_context_base& ctx) {
            // This function will be called for each element in employees,
            // and should format a single element into the passed ctx.
            // Commas will be inserted separating elements.
            format_sql_to(ctx, "({}, {}, {})", e.first_name, e.last_name, e.company_id);
        }
    )
);
BOOST_TEST(
    query ==
    "INSERT INTO employee (first_name, last_name, company_id) VALUES "
    "('John', 'Doe', 'HGS'), ('Kate', 'Smith', 'AWC')"
);

By default, elements are separated by commas, but this is configurable:

// A collection of filters to apply to a query
std::vector<std::pair<string_view, string_view>> filters{
    {"company_id", "HGS" },
    {"first_name", "John"},
};

std::string query = format_sql(
    conn.format_opts().value(),
    "SELECT * FROM employee WHERE {}",
    sequence(
        filters,
        [](const std::pair<string_view, string_view>& f, format_context_base& ctx) {
            // Compose a single filter
            format_sql_to(ctx, "{:i} = {}", f.first, f.second);
        },
        " AND "  // glue string: separate each element with AND clauses
    )
);

BOOST_TEST(query == "SELECT * FROM employee WHERE `company_id` = 'HGS' AND `first_name` = 'John'");

You can use sequence and with_params together.

By default, sequence copies the range you pass as parameter, making it safer for async code. You can use std::reference_wrapper or std::span to avoid such copies.

Some types, like strings, can be formatted in multiple ways. As with std::format, you can select how to format them using format specifiers.

As we've seen, strings are formatted as single-quoted values by default. If you use the {:i} specifier, you can obtain dynamic SQL identifiers, instead:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, last_name FROM employee ORDER BY {:i} DESC",
    "company_id"
);

BOOST_TEST(query == "SELECT id, last_name FROM employee ORDER BY `company_id` DESC");

Specifiers are compatible with explicit indices and named arguments, too. This is equivalent to the previous snippet:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, last_name FROM employee ORDER BY {0:i} DESC",
    "company_id"
);

You can specialize formatter to add formatting support to your types:

// We want to add formatting support for employee
struct employee
{
    std::string first_name;
    std::string last_name;
    std::string company_id;
};

namespace boost {
namespace mysql {

template <>
struct formatter<employee>
{
    // formatter<T> should define the following functions:
    //    const char* parse(const char* first, const char*);
    //    void format(const T&, format_context_base&) const;

    const char* parse(const char* begin, const char* /* end */)
    {
        // Parse any format specifiers for this type.
        // [begin, end) points to the range of characters holding the format specifier string
        // We should return a pointer to the first unparsed character.
        // We don't support any specifiers for this type, so we return the begin pointer.
        return begin;
    }

    void format(const employee& emp, format_context_base& ctx) const
    {
        // Perform the actual formatting by appending characters to ctx.
        // We usually use format_sql_to to achieve this.
        // We will make this suitable for INSERT statements
        format_sql_to(ctx, "{}, {}, {}", emp.first_name, emp.last_name, emp.company_id);
    }
};

}  // namespace mysql
}  // namespace boost

The type can now be used in format_sql, format_sql_to and with_params:

// We can now use employee as a built-in value
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "INSERT INTO employee (first_name, last_name, company_id) VALUES ({}), ({})",
    employee{"John", "Doe", "HGS"},
    employee{"Rick", "Johnson", "AWC"}
);

BOOST_TEST(
    query ==
    "INSERT INTO employee (first_name, last_name, company_id) VALUES "
    "('John', 'Doe', 'HGS'), ('Rick', 'Johnson', 'AWC')"
);

You can add support for format specifiers for your type by modifying the parse function in formatter. For example, an employee can be formatted differently depending on whether we're using it in an INSERT or an UPDATE:

template <>
struct formatter<employee>
{
    // Should we format our employee as an INSERT or an UPDATE?
    // This flag is set by parse and used by format
    bool format_as_update{false};

    const char* parse(const char* it, const char* end)
    {
        // Parse any format specifiers for this type.
        // We recognize the 'u' specifier, which means that we should
        // format the type for an UPDATE statement, instead of an INSERT
        // If no specifier is found, default to INSERT
        // Note that the range may be empty, so we must check for this condition
        if (it != end && *it == 'u')
        {
            // The 'u' specifier is present, record it
            format_as_update = true;

            // Mark the current character as parsed
            ++it;
        }

        // Return a pointer to the first unparsed character.
        // If we didn't manage to parse the entire character range, an error will be emitted.
        // The library already takes care of this.
        return it;
    }

    void format(const employee& emp, format_context_base& ctx) const
    {
        if (format_as_update)
        {
            // This should be suitable to be placed in an UPDATE ... SET statement
            format_sql_to(
                ctx,
                "first_name={}, last_name={}, company_id={}",
                emp.first_name,
                emp.last_name,
                emp.company_id
            );
        }
        else
        {
            // Format only the values, as in INSERT statements
            format_sql_to(ctx, "{}, {}, {}", emp.first_name, emp.last_name, emp.company_id);
        }
    }
};

We can now use it like this:

// We can now use the 'u' specifier with employee
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "UPDATE employee SET {:u} WHERE id = {}",
    employee{"John", "Doe", "HGS"},
    42
);

BOOST_TEST(
    query == "UPDATE employee SET first_name='John', last_name='Doe', company_id='HGS' WHERE id = 42"
);

// If no format specifier is provided, we get the old behavior
query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "INSERT INTO employee (first_name, last_name, company_id) VALUES ({}), ({})",
    employee{"John", "Doe", "HGS"},
    employee{"Rick", "Johnson", "AWC"}
);

BOOST_TEST(
    query ==
    "INSERT INTO employee (first_name, last_name, company_id) VALUES "
    "('John', 'Doe', 'HGS'), ('Rick', 'Johnson', 'AWC')"
);

See the formatter reference docs for more info.

Format string syntax

This section extends on the supported syntax for format strings. The syntax is similar to the one in fmtlib.

A format string is composed of regular text and replacement fields. Regular text is output verbatim, while replacement fields are substituted by formatted arguments. For instance, in "SELECT {} FROM employee", "SELECT " and " FROM EMPLOYEE" is regular text, and "{}" is a replacement field.

A {} is called an automatic indexing replacement field. Arguments are replaced in the order they were provided to the format function. For instance:

BOOST_TEST(format_sql(opts, "SELECT {}, {}, {}", 42, "abc", nullptr) == "SELECT 42, 'abc', NULL");

A field index can be included within the braces. This is called manual indexing. Indices can appear in any order, and can be repeated:

// Recall that you need to set connect_params::multi_queries to true when connecting
// before running semicolon-separated queries. Executes:
// "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42"
results result;
conn.execute(
    with_params(
        "UPDATE employee SET first_name = {1} WHERE id = {0}; SELECT * FROM employee WHERE id = {0}",
        42,
        "John"
    ),
    result
);

Format strings can use either manual or automatic indexing, but can't mix them:

try
{
    // Mixing manual and auto indexing is illegal. This will throw an exception.
    format_sql(opts, "SELECT {0}, {}", 42);
}
catch (const boost::system::system_error& err)
{
    BOOST_TEST(err.code() == boost::mysql::client_errc::format_string_manual_auto_mix);
}

Unreferenced format arguments are ignored. It's not an error to supply more format arguments than required:

// This is OK
std::string query = format_sql(opts, "SELECT {}", 42, "abc");

You can output a brace literal by doubling it:

BOOST_TEST(
    format_sql(opts, "SELECT 'Brace literals: {{ and }}'") == "SELECT 'Brace literals: { and }'"
);

Format specifiers (e.g. {:i}) are supported for some types, but are far less common than in fmtlib, since most types have a single, canonical representation.

Specifiers can appear when doing automatic indexing (e.g. {:i}) or manual indexing (e.g. {0:i}).

Types specializing formatters can define custom specifiers. Only printable ASCII characters that are not { or } can be used as specifiers.

Format strings must be encoded according to format_options::charset. Otherwise, an error will be generated.

Error handling model

Some values can't be securely formatted. For instance, C++ double can be NaN and infinity, which is not supported by MySQL. Strings can contain byte sequences that don't represent valid characters, which makes them impossible to escape securely.

When using with_params and any of these errors is encountered, the execute operation fails, as if a server error had been encountered. This is transparent to the user, so no action is required.

format_sql reports these errors by throwing boost::system::system_error exceptions, which contain an error code with details about what happened. For instance:

try
{
    // We're trying to format a double infinity value, which is not
    // supported by MySQL. This will throw an exception.
    std::string formatted_query = format_sql(opts, "SELECT {}", HUGE_VAL);
}
catch (const boost::system::system_error& err)
{
    BOOST_TEST(err.code() == boost::mysql::client_errc::unformattable_value);
}

You don't have to use exceptions, though. basic_format_context and format_sql_to use boost::system::result, instead.

basic_format_context contains an error code that is set when formatting a value fails. This is called the error state, and can be queried using format_context_base::error_state. When basic_format_context::get is called (after all individual values have been formatted), the error state is checked. The system::result returned by get will contain the error state if it was set, or the generated query if it was not:

// ctx contains an error code that tracks whether any error happened
boost::mysql::format_context ctx(opts);

// We're trying to format a infinity, which is an error. This
// will set the error state, but won't throw.
format_sql_to(ctx, "SELECT {}, {}", HUGE_VAL, 42);

// The error state gets checked at this point. Since it is set,
// res will contain an error.
boost::system::result<std::string> res = std::move(ctx).get();
BOOST_TEST(!res.has_value());
BOOST_TEST(res.has_error());
BOOST_TEST(res.error() == boost::mysql::client_errc::unformattable_value);
// res.value() would throw an error, like format_sql would

Rationale: the error state mechanism makes composing formatters easier, as the error state is checked only once.

Errors caused by invalid format strings are also reported using this mechanism.

Format options and character set tracking

MySQL has many configuration options that affect its syntax. There are two options that formatting functions need to know in order to work:

any_connection::format_opts is a convenience function that returns a boost::system::result<format_options>. If the connection could not determine the current character set, the result will contain an error. For a reference on how character set tracking works, please read this section.

[Note] Note

Prior to connection establishment, the connection's character set is always unknown. Connect your connection before calling format_opts.

[Warning] Warning

Passing an incorrect format_options value to formatting functions may cause escaping to generate incorrect values, which may generate vulnerabilities. Stay safe and always use any_connection::format_opts instead of hand-crafting format_options values. Doing this, if the character set can't be safely determined, you will get a client_errc::unknown_character_set error instead of a vulnerability.

Custom string types

format_sql_to can be used with string types that are not std::string, as long as they satisfy the OutputString concept. This includes strings with custom allocators (like std::pmr::string) and boost::static_string. You need to use basic_format_context, specifying the string type:

// Create a format context that uses std::pmr::string
boost::mysql::basic_format_context<std::pmr::string> ctx(conn.format_opts().value());

// Compose your query as usual
boost::mysql::format_sql_to(ctx, "SELECT * FROM employee WHERE id = {}", 42);

// Retrieve the query as usual
std::pmr::string query = std::move(ctx).get().value();

Re-using string memory

You can pass a string value to the context's constructor, to re-use memory:

// we want to re-use memory held by storage
std::string storage;

// storage is moved into ctx by the constructor. If any memory
// had been allocated by the string, it will be re-used.
boost::mysql::format_context ctx(conn.format_opts().value(), std::move(storage));

// Use ctx as you normally would
boost::mysql::format_sql_to(ctx, "SELECT {}", 42);

// When calling get(), the string is moved out of the context
std::string query = std::move(ctx).get().value();

Raw string escaping

If you're building a SQL framework, or otherwise performing very low-level tasks, you may need to just escape a string, without quoting or formatting. You can use escape_string, which mimics mysql_real_escape_string.

[Note] Note

Don't use this unless you know what you're doing.

C++ type

Formatted as...

Example

signed char, short, int, long, long long

Integral literal
No format specifiers allowed

format_sql(opts, "SELECT {}", 42) == "SELECT 42"
format_sql(opts, "SELECT {}", -1) == "SELECT -1"

unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long

Integral literal
No format specifiers allowed

format_sql(opts, "SELECT {}", 42u) == "SELECT 42"

bool

Integral literal 1 if true, 0 if false
No format specifiers allowed

format_sql(opts, "SELECT {}", false) == "SELECT 0"
format_sql(opts, "SELECT {}", true) == "SELECT 1"

String types (convertible to string_view), including:

std::string

string_view

std::string_view

const char*

Without format specifiers: single-quoted escaped string literal. Note that LIKE special characters (% and _) are not escaped.

i format specifier: backtick-quoted, escaped SQL identifier.

r format specifier: raw, unescaped SQL. Warning: use this specifier with caution.

// Without format specifier: escaped, quoted string value
format_sql(opts, "SELECT {}", "Hello world") == "SELECT 'Hello world'"
format_sql(opts, "SELECT {}", "Hello 'world'") == R"(SELECT 'Hello \'world\'')"

// {:i}: escaped, quoted dynamic identifier
format_sql(opts, "SELECT {:i} FROM t", "salary") == "SELECT `salary` FROM t"
format_sql(opts, "SELECT {:i} FROM t", "sal`ary") == "SELECT `sal``ary` FROM t"

// {:r}: raw, unescaped SQL. WARNING: incorrect use can cause vulnerabilities
format_sql(opts, "SELECT * FROM t WHERE id = 42 {:r} salary > 20000", "OR") ==
    "SELECT * FROM t WHERE id = 42 OR salary > 20000"

Blob types (convertible to span<const unsigned char>), including:

blob (std::vector<unsigned char>)

blob_view (span<const unsigned char>)

std::array<unsigned char, N>

Hex string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", blob{0x00, 0x48, 0xff}) == R"(SELECT x'0048ff')"

float, except NaN and inf

Floating-point literal, after casting to double.
MySQL does not support NaNs and infinities. Attempting to format these cause a client_errc::unformattable_value error.
No format specifiers allowed.

// Equivalent to format_sql(opts, "SELECT {}", static_cast<double>(4.2f))
// Note that MySQL uses doubles for all floating point literals
format_sql(opts, "SELECT {}", 4.2f) == "SELECT 4.199999809265137e+00"

double, except NaN and inf

Floating-point literal.
MySQL does not support NaNs and infinities. Attempting to format these cause a client_errc::unformattable_value error.
No format specifiers allowed.

format_sql(opts, "SELECT {}", 4.2) == "SELECT 4.2e+00"

date

Single quoted, DATE-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", date(2021, 1, 2)) == "SELECT '2021-01-02'"

datetime

Single quoted DATETIME-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", datetime(2021, 1, 2, 23, 51, 14)) == "SELECT '2021-01-02 23:51:14.000000'"

time and std::chrono::duration types convertible to time

Single quoted TIME-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", std::chrono::seconds(121)) == "SELECT '00:02:01.000000'"

std::nullptr_t

NULL
No format specifiers allowed

format_sql(opts, "SELECT {}", nullptr) == "SELECT NULL"

boost::optional<T> and std::optional<T>, T being one of the fundamental types above.
Not applicable to custom types or ranges.
No format specifiers allowed

Formats the underlying value if there is any.
NULL otherwise.

format_sql(opts, "SELECT {}", optional<int>(42)) == "SELECT 42"
format_sql(opts, "SELECT {}", optional<int>()) == "SELECT NULL"

field and field_view

Formats the underlying value.
No format specifiers allowed

format_sql(opts, "SELECT {}", field(42)) == "SELECT 42"
format_sql(opts, "SELECT {}", field("abc")) == "SELECT 'abc'"
format_sql(opts, "SELECT {}", field()) == "SELECT NULL"

Range of formattable elements. Informally, such ranges support std::begin() and std::end(), and its iterator operator* must yield one of the following:

  • A WritableField (i.e. one of the fundamental types above).
  • A type with a custom formatter.

Ranges of ranges are not supported. Note that vector<unsigned char> and similar types are formatted as blobs, not as sequences.

See the Formattable concept reference for a formal definition.

Formats each element in the range, separating elements with commas.
Specifiers can be applied to individual elements by prefixing them with a colon (:)

// long is a WritableField
format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20}) == "SELECT 1, 5, 20"

// C++20 ranges and other custom ranges accepted
format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20} | std::ranges::views::take(2)) ==
"SELECT 1, 5"

// Apply the 'i' specifier to each element in the sequence
format_sql(
    opts,
    "SELECT {::i} FROM employee",
    std::vector<string_view>{"first_name", "last_name"}
) == "SELECT `first_name`, `last_name` FROM employee"

format_sequence (as returned by sequence)

Formats each element in a range by calling a user-supplied function, separating elements by a glue string (a comma by default).
No format specifiers allowed

format_sql(
    opts,
    "SELECT {}",
    sequence(
        std::vector<int>{1, 5, 20},
        [](int val, format_context_base& ctx) { format_sql_to(ctx, "{}+1", val); }
    )
) == "SELECT 1+1, 5+1, 20+1"

Custom type that specializes formatter

Calls formatter::parse and formatter::format
May accept user-defined format specifiers.

formattable_ref

Formats the underlying value. Can represent any of the types above.
Accepts the same format specifiers as the underlying type.

format_sql(opts, "SELECT {}", formattable_ref(42)) == "SELECT 42"
format_sql(opts, "SELECT {:i} FROM t", formattable_ref("salary")) == "SELECT `salary` FROM t"

PrevUpHomeNext