Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

PrevUpHomeNext

Text queries and client-side SQL formatting

Text queries are those that use MySQL text protocol for execution. Plain strings and with_params use this protocol. This contrasts with prepared statements, which are first prepared and then executed separately, and use a binary protocol.

[Warning] Warning

Never compose SQL queries using raw string concatenation. This is insecure and can lead to SQL injection vulnerabilities. Use the client-side SQL formatting facilities explained in this section to avoid vulnerabilities.

Using text queries you can run multiple semicolon-separated queries, which can improve efficiency.

with_params is the easiest way to use client-side SQL formatting. It can be used as a simpler and more flexible alternative to prepared statements. While prepared statements expand queries server-side, SQL formatting does it client-side. Please read the comparison with prepared statements and the security considerations sections for more info.

with_params takes a SQL query string with placeholders and a set of parameters. When passed to execute or async_execute, the query is expanded in the client with the supplied parameters and sent to the server for execution:

std::string employee_name = get_name();  // employee_name is an untrusted string
results result;

// Expand the query and execute it. The expansion happens client-side.
// If employee_name is "John", the executed query would be:
// "SELECT id, salary FROM employee WHERE last_name = 'John'"
conn.execute(
    with_params("SELECT id, salary FROM employee WHERE last_name = {}", employee_name),
    result
);

Curly braces ({}) represent placeholders (technically called replacement fields). The notation and semantics are similar to std::format.

All fundamental types can be used as query parameters. This includes integers, floating point types, strings, blobs, dates and times:

// Will execute "SELECT id FROM employee WHERE salary > 42000"
results result;
conn.execute(with_params("SELECT id FROM employee WHERE salary > {}", 42000), result);

std::optional<T> and boost::optional<T> can also be used:

std::optional<std::int64_t> salary;  // get salary from a possibly untrusted source
results result;

// Depending on whether salary has a value or not, executes:
// "UPDATE employee SET salary = 42000 WHERE id = 1"
// "UPDATE employee SET salary = NULL WHERE id = 1"
conn.execute(with_params("UPDATE employee SET salary = {} WHERE id = {}", salary, 1), result);

Collections and ranges are supported, as long as its elements can be formatted:

results result;
std::vector<long> ids{1, 5, 20};

// Executes "SELECT * FROM employee WHERE id IN (1, 5, 20)"
conn.execute(with_params("SELECT * FROM employee WHERE id IN ({})", ids), result);

See this section for more on formatting ranges, and this table for a reference of types that have built-in support for SQL formatting.

[Note] Note

Like with std::format, the query string passed to with_params must be known at compile-time. You can skip this check using the runtime function.

Like std::format, you can use arguments with explicit indices:

// Recall that you need to set connect_params::multi_queries to true when connecting
// before running semicolon-separated queries. Executes:
// "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42"
results result;
conn.execute(
    with_params(
        "UPDATE employee SET first_name = {1} WHERE id = {0}; SELECT * FROM employee WHERE id = {0}",
        42,
        "John"
    ),
    result
);

See this section for a reference on the format string syntax.

Not all values can be formatted. If the library finds that formatting a certain value can cause an ambiguity that could lead to a security problem, an error will be issued and the query won't be sent to the server. Here are the most common errors:

  • client_errc::invalid_encoding
    • Cause: one of your string parameters contains invalid code points. With the default character set, this means that it contains invalid UTF-8.
    • Solution: all string values must be encoded according to the connection's character set (usually UTF-8). Sanitize or reject such values. Use the blob and blob_view types for values that don't represent character strings, but arbitrary binary values.
  • client_errc::unformattable_value
    • Cause: one of your parameters contains an invalid value. For instance, a double contains a NaN or an Inf, unsupported by MySQL.
    • Solution: reject such values, or replace them by NULL before passing them to client-side SQL formatting.
  • client_errc::unknown_character_set

For example:

try
{
    // If the connection is using UTF-8 (the default), this will throw an error,
    // because the string to be formatted is not valid UTF-8.
    // The query never reaches the server.
    results result;
    conn.execute(with_params("SELECT {}", "bad\xff UTF-8"), result);
}
catch (const boost::system::system_error& err)
{
    BOOST_TEST(err.code() == boost::mysql::client_errc::invalid_encoding);
}

Although both serve a similar purpose, they are fundamentally different. Prepared statements are parsed and expanded by the server. Client-side SQL expands the query in the client and sends it to the server as a string.

This means that client-side SQL does not understand your queries. It just knows about how to format MySQL types into a string without creating vulnerabilities, but otherwise treats your queries as opaque strings. Client-side SQL yields greater flexibility (you can dynamically compose any query), while statements have more limitations. This also means that you need to pay more attention to compose valid queries, specially when dealing with complex conditionals. Logic errors may lead to exploits. Please read the security considerations section for more info.

Client-side SQL entails less round-trips to the server than statements, and is usually more efficient for lightweight queries. However, it uses the less compact text protocol, which may be slower for queries retrieving a lot of data. See the efficiency considerations section for more info.

In general, use client-side SQL formatting for the following cases:

  • Simple queries that don't retrieve a lot of data. Default to with_params and only switch to statements if your performance measurements says so.
  • Queries involving dynamic SQL that can't be achieved by statements. Typical cases include:
    • Dynamic filters (example).
    • Batch inserts. Inserting rows one by one can lead to poor efficiency. You can use client-side SQL formatting to compose a single INSERT that inserts several rows at once (see example 1 and example 2).
    • PATCH-like updates, where the field list in an UPDATE must be dynamic (example).
    • Queries involving dynamic identifiers, like table and field names.
    • Conditional sorting.
    • Pipelines consisting of several semicolon-separated queries with dynamic fields.

On the other hand, prefer prepared statements if:

  • You are executing the same query over and over. You can prepare the statement once and execute it several times.
  • Your query is retrieving a lot of data, and you have performed the relevant performance measurements.
Efficiency considerations

Both client-side SQL formatting and prepared statements have pros and cons efficiency-wise:

  • Client-formatted SQL entails less round-trips to the server. For prepared statements, you usually need a call to prepare the statement, another one to execute it, and possibly a final one to close it. Client-formatted SQL only requires the execution round-trip. This performance gain increases with network latency and if you are using TLS.
  • Prepared statements always entail a mutation of session state, while client-formatted SQL may not. If you're using a connection_pool with prepared statements, you can't use pooled_connection::return_without_reset, as this will leak the statement. With client-formatted queries, reset may not be required if your SQL doesn't mutate session state.
  • Client-formatted SQL queries use a usually less efficient text-based protocol, while prepared statements use a more compact binary protocol. This is relevant if you're retrieving lots of data that is slow to convert to and from text (like doubles).
  • Prepared statements can be re-used. If you need to execute a query several times, prepared statements will only be parsed once.
  • Client-formatted SQL allows more efficient patterns than prepared statements, like batch inserts and semicolon-separated queries.
Security considerations

Both client-side SQL formatting and prepared statements protect against SQL injection. Statements do so by parsing the query with placeholders server-side, before performing parameter substitution. Client-side SQL quotes and escapes your values to avoid injection, but does not understand your queries.

This means that you need to ensure that your queries always expand to valid SQL. This is trivial for simple queries, but may be an issue with more complex ones, involving ranges or dynamic identifiers. For instance, the following query may expand to invalid SQL if the provided range is empty:

// If ids.empty(), generates "SELECT * FROM employee WHERE id IN ()", which is a syntax error.
// This is not a security issue for this query, but may be exploitable in more involved scenarios.
// Queries involving only scalar values (as opposed to ranges) are not affected by this.
// It is your responsibility to check for conditions like ids.empty(), as client-side SQL
// formatting does not understand your queries.
std::vector<int> ids;
auto q = format_sql(conn.format_opts().value(), "SELECT * FROM employee WHERE id IN ({})", ids);

The risk is higher if you're building your query by pieces using format_sql_to.

To sum up:

  • Client-side SQL protects against SQL injection.
  • Client-side SQL does not protect against logic errors. The risk is only present in complex queries. We suggest the following advice:
    • Avoid complex query generation logic as much as possible. Use a single format string instead of format_sql_to, unless you have no other option.
    • When using ranges, consider if the empty range would lead to valid SQL or not.
    • Thoroughly test complex query generation logic.
  • Client-side SQL requires knowing the connection's current character set. This usually happens out of the box, and will lead to a controlled error otherwise. Some recommendations:

PrevUpHomeNext