Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

PrevUpHomeNext

(Experimental) Client-side SQL query formatting

When issuing queries that contain untrusted input, prepared statement are usually the way to go. However, some use cases like dynamic filters or batch inserts are not attainable using them.

The functions described in this section can be used to compose SQL query strings dynamically client-side while keeping your application secure.

[Note] Note

This feature is experimental. Its API may change in subsequent releases.

Common use cases

This feature can also be used to improve efficiency, as text queries perform less round-trips to the server. See this section for more info.

Security considerations

Client-side SQL formatting does protect against SQL injection by appropriately quoting and escaping string values. However, this feature does not understand your queries. You need to make sure that your formatting operations result in valid SQL. If you try to build very complex queries and make mistakes, attackers may exploit logic errors in your formatting logic to overtake your system.

Formatting simple queries

You can use format_sql to generate a SQL query from a format string and a set of parameters, using a notation similar to std::format:

std::string employee_name = get_name();  // employee_name is an untrusted string

// Compose the SQL query in the client
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, salary FROM employee WHERE last_name = {}",
    employee_name
);

// If employee_name is "John", query now contains:
// "SELECT id, salary FROM employee WHERE last_name = 'John'"
// If employee_name contains quotes, they will be escaped as required

// Execute the generated query as usual
results result;
conn.execute(query, result);

format_sql doesn't involve communication with the server. In order to work, it requires a format_options instance describing connection configuration, like the character set currently in use. any_connection::format_opts provides an easy way to retrieve these.

[Note] Note

Getting an unknown_character_set error? Have a look at this section.

All fundamental types can be used with query formatting. This includes integers, floating point types, strings, blobs, dates and times:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id FROM employee WHERE salary > {}",
    42000
);

BOOST_TEST(query == "SELECT id FROM employee WHERE salary > 42000");

std::optional<T> and boost::optional<T> can also be used:

std::optional<std::int64_t> salary;  // get salary from a possibly untrusted source

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "UPDATE employee SET salary = {} WHERE id = {}",
    salary,
    1
);

// Depending on whether salary has a value or not, generates:
// UPDATE employee SET salary = 42000 WHERE id = 1
// UPDATE employee SET salary = NULL WHERE id = 1

Collections and ranges are supported, as long as its elements can be formatted:

std::vector<long> ids{1, 5, 20};
std::string query = format_sql(
    conn.format_opts().value(),
    "SELECT * FROM employee WHERE id IN ({})",
    ids
);
BOOST_TEST(query == "SELECT * FROM employee WHERE id IN (1, 5, 20)");

See this section for more on formatting ranges, and this table for a reference of types that have built-in support for SQL formatting.

[Note] Note

Like with std::format, the format string passed to format_sql must be known at compile-time. You can skip this check using the runtime function.

Like std::format, you can use arguments with explicit indices:

// Recall that you need to set connect_params::multi_queries to true when connecting
// before running semicolon-separated queries.
std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "UPDATE employee SET first_name = {1} WHERE id = {0}; SELECT * FROM employee WHERE id = {0}",
    42,
    "John"
);

BOOST_TEST(
    query ==
    "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42"
);

You can also use named arguments, using the initializer list overload:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "UPDATE employee SET first_name = {name} WHERE id = {id}; SELECT * FROM employee WHERE id = {id}",
    {
        {"id",   42    },
        {"name", "John"}
    }
);

BOOST_TEST(
    query ==
    "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42"
);

See this section for an in-depth explanation on format strings.

Format specifiers

Some types, like strings, can be formatted in multiple ways. As with std::format, you can select how to format them using format specifiers.

As we've seen, strings are formatted as single-quoted values by default. If you use the {:i} specifier, you can obtain dynamic SQL identifiers, instead:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, last_name FROM employee ORDER BY {:i} DESC",
    "company_id"
);

BOOST_TEST(query == "SELECT id, last_name FROM employee ORDER BY `company_id` DESC");

Specifiers are compatible with explicit indices and named arguments, too. This is equivalent to the previous snippet:

std::string query = boost::mysql::format_sql(
    conn.format_opts().value(),
    "SELECT id, last_name FROM employee ORDER BY {0:i} DESC",
    "company_id"
);

Building SQL strings incrementally using format_sql_to

Some use cases, usually involving conditionals, may not be expressible in terms of a single format string. In such cases, you can use format_context and format_sql_to to build query strings incrementally:

// Compose a query that retrieves all employees in a company,
// with an optional limit
std::string compose_select_query(
    boost::mysql::format_options opts,
    string_view company_id,
    std::optional<long> limit
)
{
    // format_context will accumulate the query as we compose it
    boost::mysql::format_context ctx(opts);

    // format_sql_to expands a format string and appends the result
    // to a format context. This way, we can build our query in smaller pieces
    // Add all the query except for the LIMIT clause
    boost::mysql::format_sql_to(ctx, "SELECT * FROM employee WHERE company_id = {}", company_id);

    if (limit)
    {
        // Add the LIMIT clause
        boost::mysql::format_sql_to(ctx, " LIMIT {}", *limit);
    }

    // Retrieve the generated query string
    return std::move(ctx).get().value();
}
std::string query = compose_select_query(conn.format_opts().value(), "HGS", {});
BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS'");

query = compose_select_query(conn.format_opts().value(), "HGS", 50);
BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS' LIMIT 50");

Formatting ranges with sequence

The sequence function can be used when the default range formatting isn't sufficient. If the elements in your range are not formattable, you can pass a user-defined function to sequence describing how to format each element:

// Employee is a plain struct, not formattable by default
std::vector<employee> employees{
    {"John", "Doe",   "HGS"},
    {"Kate", "Smith", "AWC"},
};
std::string query = format_sql(
    conn.format_opts().value(),
    "INSERT INTO employee (first_name, last_name, company_id) VALUES {}",
    sequence(
        employees,
        [](const employee& e, format_context_base& ctx) {
            // This function will be called for each element in employees,
            // and should format a single element into the passed ctx.
            // Commas will be inserted separating elements.
            format_sql_to(ctx, "({}, {}, {})", e.first_name, e.last_name, e.company_id);
        }
    )
);
BOOST_TEST(
    query ==
    "INSERT INTO employee (first_name, last_name, company_id) VALUES "
    "('John', 'Doe', 'HGS'), ('Kate', 'Smith', 'AWC')"
);

By default, elements are separated by commas, but this is configurable:

// A collection of filters to apply to a query
std::vector<std::pair<string_view, string_view>> filters{
    {"company_id", "HGS" },
    {"first_name", "John"},
};

std::string query = format_sql(
    conn.format_opts().value(),
    "SELECT * FROM employee WHERE {}",
    sequence(
        filters,
        [](const std::pair<string_view, string_view>& f, format_context_base& ctx) {
            // Compose a single filter
            format_sql_to(ctx, "{:i} = {}", f.first, f.second);
        },
        " AND "  // glue string: separate each element with AND clauses
    )
);

BOOST_TEST(query == "SELECT * FROM employee WHERE `company_id` = 'HGS' AND `first_name` = 'John'");

Solving the unknown_character_set error

If you are getting a boost::system::system_error with a client_errc::unknown_character_set error code (or getting this error code by other means), your connection is currently unaware of the character set it's using, which is required by format operations. Try the following:

[Warning] Warning

Security considerations: don't craft format_options values manually. Always use any_connection::format_opts.

For an explanation on why format_options is necessary and how character set tracking works, please read this section.

Solving the invalid_encoding error

SQL formatting can fail if you provide values that can't be securely formatted. The most common cause is passing string values that are not valid according to the passed character set. This triggers a client_errc::invalid_encoding error:

// If the connection is using UTF-8 (the default), this will throw an error,
// because the string to be formatted contains invalid UTF8.
format_sql(conn.format_opts().value(), "SELECT {}", "bad\xff UTF-8");

You can validate your strings beforehand or handle the error once it happened and reject the input. Other types may also produce format errors.

[Tip] Tip

If you prefer handling errors with error codes, instead of exceptions, use format_sql_to. Please read this section for details.

Efficiency considerations

Both client-side SQL formatting and prepared statements have pros and cons efficiency-wise:

Types with built-in support for SQL formatting

C++ type

Formatted as...

Example

signed char, short, int, long, long long

Integral literal
No format specifiers allowed

format_sql(opts, "SELECT {}", 42) == "SELECT 42"
format_sql(opts, "SELECT {}", -1) == "SELECT -1"

unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long

Integral literal
No format specifiers allowed

format_sql(opts, "SELECT {}", 42u) == "SELECT 42"

bool

Integral literal 1 if true, 0 if false
No format specifiers allowed

format_sql(opts, "SELECT {}", false) == "SELECT 0"
format_sql(opts, "SELECT {}", true) == "SELECT 1"

String types (convertible to string_view), including:

std::string

string_view

std::string_view

const char*

Without format specifiers: single-quoted escaped string literal. Note that LIKE special characters (% and _) are not escaped.

i format specifier: backtick-quoted, escaped SQL identifier.

r format specifier: raw, unescaped SQL. Warning: use this specifier with caution.

// Without format specifier: escaped, quoted string value
format_sql(opts, "SELECT {}", "Hello world") == "SELECT 'Hello world'"
format_sql(opts, "SELECT {}", "Hello 'world'") == R"(SELECT 'Hello \'world\'')"

// {:i}: escaped, quoted dynamic identifier
format_sql(opts, "SELECT {:i} FROM t", "salary") == "SELECT `salary` FROM t"
format_sql(opts, "SELECT {:i} FROM t", "sal`ary") == "SELECT `sal``ary` FROM t"

// {:r}: raw, unescaped SQL. WARNING: incorrect use can cause vulnerabilities
format_sql(opts, "SELECT * FROM t WHERE id = 42 {:r} salary > 20000", "OR") ==
    "SELECT * FROM t WHERE id = 42 OR salary > 20000"

Blob types (convertible to span<const unsigned char>), including:

blob (std::vector<unsigned char>)

blob_view (span<const unsigned char>)

std::array<unsigned char, N>

Hex string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", blob{0x00, 0x48, 0xff}) == R"(SELECT x'0048ff')"

float, except NaN and inf

Floating-point literal, after casting to double.
MySQL does not support NaNs and infinities. Attempting to format these cause a client_errc::unformattable_value error.
No format specifiers allowed.

// Equivalent to format_sql(opts, "SELECT {}", static_cast<double>(4.2f))
// Note that MySQL uses doubles for all floating point literals
format_sql(opts, "SELECT {}", 4.2f) == "SELECT 4.199999809265137e+00"

double, except NaN and inf

Floating-point literal.
MySQL does not support NaNs and infinities. Attempting to format these cause a client_errc::unformattable_value error.
No format specifiers allowed.

format_sql(opts, "SELECT {}", 4.2) == "SELECT 4.2e+00"

date

Single quoted, DATE-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", date(2021, 1, 2)) == "SELECT '2021-01-02'"

datetime

Single quoted DATETIME-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", datetime(2021, 1, 2, 23, 51, 14)) == "SELECT '2021-01-02 23:51:14.000000'"

time and std::chrono::duration types convertible to time

Single quoted TIME-compatible string literal
No format specifiers allowed

format_sql(opts, "SELECT {}", std::chrono::seconds(121)) == "SELECT '00:02:01.000000'"

std::nullptr_t

NULL
No format specifiers allowed

format_sql(opts, "SELECT {}", nullptr) == "SELECT NULL"

boost::optional<T> and std::optional<T>, T being one of the fundamental types above.
Not applicable to custom types or ranges.
No format specifiers allowed

Formats the underlying value if there is any.
NULL otherwise.

format_sql(opts, "SELECT {}", optional<int>(42)) == "SELECT 42"
format_sql(opts, "SELECT {}", optional<int>()) == "SELECT NULL"

field and field_view

Formats the underlying value.
No format specifiers allowed

format_sql(opts, "SELECT {}", field(42)) == "SELECT 42"
format_sql(opts, "SELECT {}", field("abc")) == "SELECT 'abc'"
format_sql(opts, "SELECT {}", field()) == "SELECT NULL"

Range of formattable elements. Informally, such ranges support std::begin() and std::end(), and its iterator operator* must yield one of the following:

  • A WritableField (i.e. one of the fundamental types above).
  • A type with a custom formatter.

Ranges of ranges are not supported. Note that vector<unsigned char> and similar types are formatted as blobs, not as sequences.

See the Formattable concept reference for a formal definition.

Formats each element in the range, separating elements with commas.
Specifiers can be applied to individual elements by prefixing them with a colon (:)

// long is a WritableField
format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20}) == "SELECT 1, 5, 20"

// C++20 ranges and other custom ranges accepted
format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20} | std::ranges::views::take(2)) ==
"SELECT 1, 5"

// Apply the 'i' specifier to each element in the sequence
format_sql(
    opts,
    "SELECT {::i} FROM employee",
    std::vector<string_view>{"first_name", "last_name"}
) == "SELECT `first_name`, `last_name` FROM employee"

format_sequence_view (as returned by sequence)

Formats each element in a range by calling a user-supplied function, separating elements by a glue string (a comma by default).
No format specifiers allowed

format_sql(
    opts,
    "SELECT {}",
    sequence(
        std::vector<int>{1, 5, 20},
        [](int val, format_context_base& ctx) { format_sql_to(ctx, "{}+1", val); }
    )
) == "SELECT 1+1, 5+1, 20+1"

Custom type that specializes formatter

Calls formatter::parse and formatter::format
May accept user-defined format specifiers.

formattable_ref

Formats the underlying value. Can represent any of the types above.
Accepts the same format specifiers as the underlying type.

format_sql(opts, "SELECT {}", formattable_ref(42)) == "SELECT 42"
format_sql(opts, "SELECT {:i} FROM t", formattable_ref("salary")) == "SELECT `salary` FROM t"

PrevUpHomeNext