Multi-function operations allow running operations as a set of separate steps, which gives you better control over execution. They work by splitting some of the reads and writes into several function calls.
You can use multi-function operations to execute text queries and prepared statements, using either the dynamic or the static interface.
To make good use of multi-function operations, you should have a basic understanding of the underlying protocol.
The protocol uses messages to communicate. These are delimited by headers containing the message length. All operations are initiated by the client, by sending a single request message, to which the server responds with a set of response messages.
The diagram below shows the message exchange between client and server for text queries and statement executions. Each arrow represents a message.
There are two separate cases, depending on whether the operation returns rows or not.
connection::execute handles the full message exchange. In contrast, connection::start_execution
will not read the rows, if any.
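For comparison, the following is a minimal sketch of the single-call counterpart. It assumes a connected connection object named conn and the posts table defined below; the query string is only illustrative:

// connection::execute performs the whole message exchange in a single call
// and stores all the rows in an owning results object
results result;
conn.execute("SELECT title, body FROM posts", result);

// All rows are available at once
for (row_view post : result.rows())
{
    std::cout << "Title:" << post.at(0) << std::endl;
}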
Some takeaways:
execution_state
is the main class for the dynamic interface in multi-function operations. An
execution state holds information required to progress the execution operation,
like metadata (required to parse the rows) and protocol state. Unlike
results, it doesn't contain the rows.
Given the following table definition:
const char* table_definition = R"%(
    CREATE TEMPORARY TABLE posts (
        id INT PRIMARY KEY AUTO_INCREMENT,
        title VARCHAR(256) NOT NULL,
        body TEXT NOT NULL
    )
)%";
You can start a multi-function operation using connection::start_execution
:
// st will hold information about the operation being executed.
// It must be passed to any successive operations for this execution
execution_state st;

// Sends the query and reads response and meta, but not the rows
conn.start_execution("SELECT title, body FROM posts", st);
We now must read all the generated rows by
calling connection::read_some_rows
,
which will return a batch of an unspecified size:
// st.complete() returns true once the OK packet is received
while (!st.complete())
{
    // row_batch will be valid until conn performs the next network operation
    rows_view row_batch = conn.read_some_rows(st);

    for (row_view post : row_batch)
    {
        // Process post as required
        std::cout << "Title:" << post.at(0) << std::endl;
    }
}
Some remarks:
read_some_rows
will return at least one row if there are rows to read, but may return more.
execution_state::complete
returns true
after we've read
the final OK packet for this operation.
row_batch
may
or may not be empty, depending on the number of rows and their size.
read_some_rows
after reading the final OK packet returns an empty batch.
read_some_rows
returns a rows_view
object pointing into the connection's internal buffer. This view is valid until
the connection performs any other operation involving a network transfer.
Note that there is no need to distinguish between case 1 and case 2 in the diagram above in our code, as reading rows for a complete operation is well defined.
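Because the returned views point into the connection's internal buffer, you must copy the rows if you need them after the connection performs further I/O. The following is a minimal sketch (using the same query as above) that copies each batch into an owning rows object:

execution_state st;
conn.start_execution("SELECT title, body FROM posts", st);

std::vector<rows> saved_batches;
while (!st.complete())
{
    // The view is only valid until the next network operation on conn
    rows_view batch = conn.read_some_rows(st);

    // rows makes an owning, deep copy of the batch
    saved_batches.emplace_back(batch);
}
// The copies in saved_batches remain valid after conn performs further I/O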
The mechanics are similar to what's been described above. The static interface
uses static_execution_state
to carry state. As with static_results
,
we must define and pass a type describing our rows:
// We can use a plain struct with ints and strings to describe our rows.
// This must be placed at the namespace level
struct post
{
    int id;
    std::string title;
    std::string body;
};

// We use BOOST_DESCRIBE_STRUCT to add reflection capabilities to post.
// We must list all the fields that should be populated by Boost.MySQL
BOOST_DESCRIBE_STRUCT(post, (), (id, title, body))
We can now start our operation using the same connection::start_execution
:
// st will hold information about the operation being executed.
// It must be passed to any successive operations for this execution
static_execution_state<post> st;

// Sends the query and reads response and meta, but not the rows.
// If there is any schema mismatch between the declared row type and
// what the server returned, start_execution will detect it and fail
conn.start_execution("SELECT id, title, body FROM posts", st);
We now must read all the generated rows by
calling connection::read_some_rows
:
// storage will be filled with the read rows. You can use any other contiguous range.
std::array<post, 20> posts;

// st.complete() returns true once the OK packet is received
while (!st.complete())
{
    std::size_t read_rows = conn.read_some_rows(st, boost::span<post>(posts));
    for (const post& p : boost::span<post>(posts.data(), read_rows))
    {
        // Process post as required
        std::cout << "Title " << p.title << std::endl;
    }
}
Some remarks:
static_execution_state
doesn't store rows. It uses the row types passed as template parameters
to validate the metadata returned by the server, and ensure it is compatible
with the C++ data structures that will be used with read_some_rows
.
read_some_rows
takes a boost::span
of the appropriate row type. We've
used std::array
to place rows on the stack, but
you can use any other contiguous range.
read_some_rows
returns
the number of read rows. At maximum, this will be the size of the span,
but there may be fewer, depending on row and network buffer sizes.
read_some_rows
will return at least one row if there are rows to read, but may return more.
static_execution_state::complete
returns true
after we've read
the final OK packet for this operation.
read_some_rows
after reading the final OK packet always reads zero rows.
You can access metadata at any point, using execution_state::meta
or static_execution_state::meta
.
These functions return a collection of metadata
objects. For more information, please refer to this
section.
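As an illustration, the following sketch prints the name of each column of the resultset being read. It assumes the posts table above; note that, unless the connection's metadata mode has been set to metadata_mode::full, string fields like the column name may be empty:

// Ask the connection to retain string metadata, like column names.
// This must be done before starting the execution
conn.set_meta_mode(metadata_mode::full);

execution_state st;
conn.start_execution("SELECT title, body FROM posts", st);

// meta() can be accessed as soon as start_execution has completed
for (const metadata& col : st.meta())
{
    std::cout << "Column: " << col.column_name() << std::endl;
}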
You can access OK packet data using functions like last_insert_id
and affected_rows
in both execution_state
and
static_execution_state
. As
this information is contained in the OK packet, it can
only be accessed once the complete
function returns true
.
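For instance, the following sketch runs an INSERT and then inspects the OK packet data. The statement returns no rows, so the OK packet should already have been read by start_execution; the INSERT itself is only illustrative:

execution_state st;
conn.start_execution("INSERT INTO posts (title, body) VALUES ('My title', 'My body')", st);

// No rows are generated, so the operation should already be complete
assert(st.complete());
std::cout << "ID of the new post: " << st.last_insert_id()
          << ", affected rows: " << st.affected_rows() << std::endl;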
When using operations that return more than one resultset (e.g. when calling stored procedures), the protocol is slightly more complex. The message exchange is as follows:
For example, given the following stored procedure:
CREATE PROCEDURE get_employees(IN pin_company_id CHAR(10))
BEGIN
    START TRANSACTION READ ONLY;
    SELECT id, name, tax_id FROM company WHERE id = pin_company_id;
    SELECT first_name, last_name, salary FROM employee WHERE company_id = pin_company_id;
    COMMIT;
END
We can write:
Dynamic interface:
// Get the company ID to retrieve, possibly from the user
std::string company_id = get_company_id();

// Call the procedure
execution_state st;
statement stmt = conn.prepare_statement("CALL get_employees(?)");
conn.start_execution(stmt.bind(company_id), st);

// The above code will generate 3 resultsets
// Read the 1st one, which contains the matched companies
while (st.should_read_rows())
{
    rows_view company_batch = conn.read_some_rows(st);

    // Use the retrieved companies as required
    for (row_view company : company_batch)
    {
        std::cout << "Company: " << company.at(1).as_string() << "\n";
    }
}

// Move on to the 2nd one, containing the employees for these companies
conn.read_resultset_head(st);
while (st.should_read_rows())
{
    rows_view employee_batch = conn.read_some_rows(st);

    // Use the retrieved employees as required
    for (row_view employee : employee_batch)
    {
        std::cout << "Employee " << employee.at(0).as_string() << " "
                  << employee.at(1).as_string() << "\n";
    }
}

// The last one is an empty resultset containing information about the
// CALL statement itself. We're not interested in this
conn.read_resultset_head(st);
assert(st.complete());
Static interface:

// Get the company ID to retrieve, possibly from the user
std::string company_id = get_company_id();

// Our procedure generates three resultsets. We must pass each row type
// to static_execution_state as template parameters
using empty = std::tuple<>;
static_execution_state<company, employee, empty> st;

// Call the procedure
statement stmt = conn.prepare_statement("CALL get_employees(?)");
conn.start_execution(stmt.bind(company_id), st);

// Read the 1st one, which contains the matched companies
std::array<company, 5> companies;
while (st.should_read_rows())
{
    std::size_t read_rows = conn.read_some_rows(st, boost::span<company>(companies));

    // Use the retrieved companies as required
    for (const company& c : boost::span<company>(companies.data(), read_rows))
    {
        std::cout << "Company: " << c.name << "\n";
    }
}

// Move on to the 2nd one, containing the employees for these companies
conn.read_resultset_head(st);
std::array<employee, 20> employees;
while (st.should_read_rows())
{
    std::size_t read_rows = conn.read_some_rows(st, boost::span<employee>(employees));

    // Use the retrieved employees as required
    for (const employee& emp : boost::span<employee>(employees.data(), read_rows))
    {
        std::cout << "Employee " << emp.first_name << " " << emp.last_name << "\n";
    }
}

// The last one is an empty resultset containing information about the
// CALL statement itself. We're not interested in this
conn.read_resultset_head(st);
assert(st.complete());
Note that we're using should_read_rows
instead of complete
as our loop termination condition. complete()
returns true when all the resultsets have
been read, while should_read_rows()
will return false once an individual resultset
has been fully read.
When using the static interface with multi-function operations, not all schema
mismatches can be found by the start_execution
function, since not all the information is available at this point. Errors
may be reported by read_some_rows
and read_resultset_head
, too.
Overall, the same checks are performed as when using connection::execute
,
but at different points in time.
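For example, if the row type declared for the second resultset doesn't match what the server sends, the error will typically surface when reading that resultset's metadata. A minimal sketch of handling such an error, assuming the static_execution_state st from the example above (the throwing overloads raise error_with_diagnostics, which carries extra information about the failure):

try
{
    // May fail with a schema mismatch error for the resultset being read
    conn.read_resultset_head(st);
}
catch (const boost::mysql::error_with_diagnostics& err)
{
    std::cerr << "Error: " << err.what() << "\n"
              << "Diagnostics: " << err.get_diagnostics().client_message() << std::endl;
}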
execution_state
and static_execution_state
can be seen as state
machines with four states. Each state describes which reading function should
be invoked next (a generic loop driving them is sketched after the list):
should_start_op: the initial state, after you default-construct an execution_state. You should call connection::start_execution or connection::async_start_execution to start the operation.
should_read_rows: the next operation should be connection::read_some_rows, to read the generated rows.
should_read_head: the next operation should be connection::read_resultset_head, to read the next resultset metadata. Only operations that generate multiple resultsets go into this state.
complete: no more operations are required.
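Putting these states together, the following is a minimal sketch of a loop driving the dynamic interface. It assumes start_execution has already been called on st, and works regardless of how many resultsets the operation generates:

while (!st.complete())
{
    if (st.should_read_head())
    {
        // Read the metadata describing the next resultset
        conn.read_resultset_head(st);
    }
    else if (st.should_read_rows())
    {
        // Read a batch of rows for the current resultset
        for (row_view r : conn.read_some_rows(st))
        {
            // Process r as required
        }
    }
}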
For multi-function operations, you may also access OK packet data every time
a resultset has been completely read, i.e. when should_read_head
returns true
.
This example shows this general case. It uses multi-queries to run an arbitrary SQL script file.
To properly understand read_some_rows
,
we need to know that every connection
owns an internal buffer, where packets sent
by the server are stored. It is a single, flat buffer, and you can configure
its initial size when creating a connection
,
passing a buffer_params
object as the first argument to connection
's
constructor. The read buffer may be grown under certain circumstances to accommodate
large messages.
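As a minimal sketch of customizing the initial buffer size (it assumes a tcp_ssl_connection; the 64 KiB figure is only illustrative):

boost::asio::io_context ctx;
boost::asio::ssl::context ssl_ctx(boost::asio::ssl::context::tls_client);

// buffer_params goes first; the remaining arguments are forwarded to the stream
boost::mysql::tcp_ssl_connection conn(
    boost::mysql::buffer_params(64 * 1024),  // initial read buffer size, in bytes
    ctx.get_executor(),
    ssl_ctx
);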
read_some_rows
gets the maximum
number of rows that fit in the internal buffer (without growing it) by performing
a single read_some
operation
on the stream (or using cached data). If there are rows to read, read_some_rows
guarantees to read at least
one. This means that, if doing what we described yields no rows (e.g. because
of a large row that doesn't fit into the buffer), read_some_rows
will grow the buffer or perform more reads until at least one row has been
read. If you're using the static interface, the number of read rows is also limited
by the size of the span you passed.
If you want to get the most out of read_some_rows
,
customize the initial buffer size to maximize the number of rows that each
batch retrieves.