...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
When issuing queries that contain untrusted input, prepared statement are usually the way to go. However, some use cases like dynamic filters or batch inserts are not attainable using them.
The functions described in this section can be used to compose SQL query strings dynamically client-side while keeping your application secure.
Note | |
---|---|
This feature is experimental. Its API may change in subsequent releases. |
INSERT
that performs several insertions at once. See this
example for simple batch inserts and this
other for a generic utility to implement this case.
This feature can also be used to improve efficiency, as text queries perform less round-trips to the server. See this section for more info.
Client-side SQL formatting does protect against SQL injection by appropriately quoting and escaping string values. However, this feature does not understand your queries. You need to make sure that your formatting operations result in valid SQL. If you try to build very complex queries and make mistakes, attackers may exploit logic errors in your formatting logic to overtake your system.
You can use format_sql
to generate a SQL query from a format string and a set of parameters, using
a notation similar to std::format
:
std::string employee_name = get_name(); // employee_name is an untrusted string // Compose the SQL query in the client std::string query = boost::mysql::format_sql( conn.format_opts().value(), "SELECT id, salary FROM employee WHERE last_name = {}", employee_name ); // If employee_name is "John", query now contains: // "SELECT id, salary FROM employee WHERE last_name = 'John'" // If employee_name contains quotes, they will be escaped as required // Execute the generated query as usual results result; conn.execute(query, result);
format_sql
doesn't involve communication with the server. In order to work, it requires
a format_options
instance describing connection configuration, like the character set currently
in use. any_connection::format_opts
provides an easy way to retrieve these.
Note | |
---|---|
Getting an |
All fundamental types can be used with query formatting. This includes integers, floating point types, strings, blobs, dates and times:
std::string query = boost::mysql::format_sql( conn.format_opts().value(), "SELECT id FROM employee WHERE salary > {}", 42000 ); BOOST_TEST(query == "SELECT id FROM employee WHERE salary > 42000");
std::optional<T>
and
boost::optional<T>
can
also be used:
std::optional<std::int64_t> salary; // get salary from a possibly untrusted source std::string query = boost::mysql::format_sql( conn.format_opts().value(), "UPDATE employee SET salary = {} WHERE id = {}", salary, 1 ); // Depending on whether salary has a value or not, generates: // UPDATE employee SET salary = 42000 WHERE id = 1 // UPDATE employee SET salary = NULL WHERE id = 1
Collections and ranges are supported, as long as its elements can be formatted:
std::vector<long> ids{1, 5, 20}; std::string query = format_sql( conn.format_opts().value(), "SELECT * FROM employee WHERE id IN ({})", ids ); BOOST_TEST(query == "SELECT * FROM employee WHERE id IN (1, 5, 20)");
See this section for more on formatting ranges, and this table for a reference of types that have built-in support for SQL formatting.
Note | |
---|---|
Like with |
Like std::format
, you can use arguments with explicit
indices:
// Recall that you need to set connect_params::multi_queries to true when connecting // before running semicolon-separated queries. std::string query = boost::mysql::format_sql( conn.format_opts().value(), "UPDATE employee SET first_name = {1} WHERE id = {0}; SELECT * FROM employee WHERE id = {0}", 42, "John" ); BOOST_TEST( query == "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42" );
You can also use named arguments, using the initializer list overload:
std::string query = boost::mysql::format_sql( conn.format_opts().value(), "UPDATE employee SET first_name = {name} WHERE id = {id}; SELECT * FROM employee WHERE id = {id}", { {"id", 42 }, {"name", "John"} } ); BOOST_TEST( query == "UPDATE employee SET first_name = 'John' WHERE id = 42; SELECT * FROM employee WHERE id = 42" );
See this section for an in-depth explanation on format strings.
Some types, like strings, can be formatted in multiple ways. As with std::format
,
you can select how to format them using format specifiers.
As we've seen, strings are formatted as single-quoted values by default. If
you use the {:i}
specifier, you can obtain dynamic SQL identifiers,
instead:
std::string query = boost::mysql::format_sql( conn.format_opts().value(), "SELECT id, last_name FROM employee ORDER BY {:i} DESC", "company_id" ); BOOST_TEST(query == "SELECT id, last_name FROM employee ORDER BY `company_id` DESC");
Specifiers are compatible with explicit indices and named arguments, too. This is equivalent to the previous snippet:
std::string query = boost::mysql::format_sql( conn.format_opts().value(), "SELECT id, last_name FROM employee ORDER BY {0:i} DESC", "company_id" );
Some use cases, usually involving conditionals, may not be expressible in terms
of a single format string. In such cases, you can use format_context
and format_sql_to
to build query strings incrementally:
// Compose a query that retrieves all employees in a company, // with an optional limit std::string compose_select_query( boost::mysql::format_options opts, string_view company_id, std::optional<long> limit ) { // format_context will accumulate the query as we compose it boost::mysql::format_context ctx(opts); // format_sql_to expands a format string and appends the result // to a format context. This way, we can build our query in smaller pieces // Add all the query except for the LIMIT clause boost::mysql::format_sql_to(ctx, "SELECT * FROM employee WHERE company_id = {}", company_id); if (limit) { // Add the LIMIT clause boost::mysql::format_sql_to(ctx, " LIMIT {}", *limit); } // Retrieve the generated query string return std::move(ctx).get().value(); }
std::string query = compose_select_query(conn.format_opts().value(), "HGS", {}); BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS'"); query = compose_select_query(conn.format_opts().value(), "HGS", 50); BOOST_TEST(query == "SELECT * FROM employee WHERE company_id = 'HGS' LIMIT 50");
The sequence
function can be used when the default range formatting isn't sufficient. If
the elements in your range are not formattable, you can pass a user-defined
function to sequence
describing
how to format each element:
// Employee is a plain struct, not formattable by default std::vector<employee> employees{ {"John", "Doe", "HGS"}, {"Kate", "Smith", "AWC"}, }; std::string query = format_sql( conn.format_opts().value(), "INSERT INTO employee (first_name, last_name, company_id) VALUES {}", sequence( employees, [](const employee& e, format_context_base& ctx) { // This function will be called for each element in employees, // and should format a single element into the passed ctx. // Commas will be inserted separating elements. format_sql_to(ctx, "({}, {}, {})", e.first_name, e.last_name, e.company_id); } ) ); BOOST_TEST( query == "INSERT INTO employee (first_name, last_name, company_id) VALUES " "('John', 'Doe', 'HGS'), ('Kate', 'Smith', 'AWC')" );
By default, elements are separated by commas, but this is configurable:
// A collection of filters to apply to a query std::vector<std::pair<string_view, string_view>> filters{ {"company_id", "HGS" }, {"first_name", "John"}, }; std::string query = format_sql( conn.format_opts().value(), "SELECT * FROM employee WHERE {}", sequence( filters, [](const std::pair<string_view, string_view>& f, format_context_base& ctx) { // Compose a single filter format_sql_to(ctx, "{:i} = {}", f.first, f.second); }, " AND " // glue string: separate each element with AND clauses ) ); BOOST_TEST(query == "SELECT * FROM employee WHERE `company_id` = 'HGS' AND `first_name` = 'John'");
If you are getting a boost::system::system_error
with a client_errc::unknown_character_set
error code (or getting
this error code by other means), your connection is currently unaware of the
character set it's using, which is required by format operations. Try the following:
any_connection::connect
or async_connect
)
before calling format_opts
.
any_connection::set_character_set
or async_set_character_set
to set your connection's character set instead of using raw SQL.
connect_params::connection_collation
values are not supported by all servers and often trigger fallback behavior.
If you are using a non-UTF8 character set, prefer setting it explicitly
using set_character_set
or async_set_character_set
.
Don't rely on connect_params::connection_collation
.
any_connection::reset_connection
and async_reset_connection
wipe character set information. Call set_character_set
or async_set_character_set
after resetting your connection.
Warning | |
---|---|
Security considerations: don't craft |
For an explanation on why format_options
is necessary and how character set tracking works, please read this
section.
SQL formatting can fail if you provide values that can't be securely formatted.
The most common cause is passing string values that are not valid according
to the passed character set. This triggers a client_errc::invalid_encoding
error:
// If the connection is using UTF-8 (the default), this will throw an error, // because the string to be formatted contains invalid UTF8. format_sql(conn.format_opts().value(), "SELECT {}", "bad\xff UTF-8");
You can validate your strings beforehand or handle the error once it happened and reject the input. Other types may also produce format errors.
Tip | |
---|---|
If you prefer handling errors with error codes, instead of exceptions, use
|
Both client-side SQL formatting and prepared statements have pros and cons efficiency-wise:
connection_pool
with prepared statements, you can't use pooled_connection::return_without_reset
,
as this will leak the statement. With client-formatted queries, reset may
not be required if your SQL doesn't mutate session state.
C++ type |
Formatted as... |
Example |
---|---|---|
|
Integral literal |
format_sql(opts, "SELECT {}", 42) == "SELECT 42" format_sql(opts, "SELECT {}", -1) == "SELECT -1" |
|
Integral literal |
format_sql(opts, "SELECT {}", 42u) == "SELECT 42" |
|
Integral literal |
format_sql(opts, "SELECT {}", false) == "SELECT 0" format_sql(opts, "SELECT {}", true) == "SELECT 1" |
String types (convertible to |
Without format specifiers: single-quoted escaped string literal.
Note that |
// Without format specifier: escaped, quoted string value format_sql(opts, "SELECT {}", "Hello world") == "SELECT 'Hello world'" format_sql(opts, "SELECT {}", "Hello 'world'") == R"(SELECT 'Hello \'world\'')" // {:i}: escaped, quoted dynamic identifier format_sql(opts, "SELECT {:i} FROM t", "salary") == "SELECT `salary` FROM t" format_sql(opts, "SELECT {:i} FROM t", "sal`ary") == "SELECT `sal``ary` FROM t" // {:r}: raw, unescaped SQL. WARNING: incorrect use can cause vulnerabilities format_sql(opts, "SELECT * FROM t WHERE id = 42 {:r} salary > 20000", "OR") == "SELECT * FROM t WHERE id = 42 OR salary > 20000" |
Blob types (convertible to |
Hex string literal |
format_sql(opts, "SELECT {}", blob{0x00, 0x48, 0xff}) == R"(SELECT x'0048ff')" |
|
Floating-point literal, after casting to |
// Equivalent to format_sql(opts, "SELECT {}", static_cast<double>(4.2f)) // Note that MySQL uses doubles for all floating point literals format_sql(opts, "SELECT {}", 4.2f) == "SELECT 4.199999809265137e+00" |
|
Floating-point literal. |
format_sql(opts, "SELECT {}", 4.2) == "SELECT 4.2e+00" |
Single quoted, |
format_sql(opts, "SELECT {}", date(2021, 1, 2)) == "SELECT '2021-01-02'" |
|
Single quoted |
format_sql(opts, "SELECT {}", datetime(2021, 1, 2, 23, 51, 14)) == "SELECT '2021-01-02 23:51:14.000000'" |
|
Single quoted |
format_sql(opts, "SELECT {}", std::chrono::seconds(121)) == "SELECT '00:02:01.000000'" |
|
|
|
format_sql(opts, "SELECT {}", nullptr) == "SELECT NULL" |
|
Formats the underlying value if there is any. |
format_sql(opts, "SELECT {}", optional<int>(42)) == "SELECT 42" format_sql(opts, "SELECT {}", optional<int>()) == "SELECT NULL" |
|
Formats the underlying value. |
format_sql(opts, "SELECT {}", field(42)) == "SELECT 42" format_sql(opts, "SELECT {}", field("abc")) == "SELECT 'abc'" format_sql(opts, "SELECT {}", field()) == "SELECT NULL" |
Range of formattable elements. Informally, such ranges support
Ranges of ranges are not supported. Note that
See |
Formats each element in the range, separating elements with commas. |
// long is a WritableField format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20}) == "SELECT 1, 5, 20" // C++20 ranges and other custom ranges accepted format_sql(opts, "SELECT {}", std::vector<long>{1, 5, 20} | std::ranges::views::take(2)) == "SELECT 1, 5" // Apply the 'i' specifier to each element in the sequence format_sql( opts, "SELECT {::i} FROM employee", std::vector<string_view>{"first_name", "last_name"} ) == "SELECT `first_name`, `last_name` FROM employee" |
|
Formats each element in a range by calling a user-supplied function,
separating elements by a glue string (a comma by default). |
format_sql( opts, "SELECT {}", sequence( std::vector<int>{1, 5, 20}, [](int val, format_context_base& ctx) { format_sql_to(ctx, "{}+1", val); } ) ) == "SELECT 1+1, 5+1, 20+1" |
Custom type that specializes |
Calls |
|
Formats the underlying value. Can represent any of the types above. |
format_sql(opts, "SELECT {}", formattable_ref(42)) == "SELECT 42" format_sql(opts, "SELECT {:i} FROM t", formattable_ref("salary")) == "SELECT `salary` FROM t" |