MySQL defines a character set as "a set of symbols and their respective
utf16 are character sets
supported by MySQL.
A collation is a set of rules for comparing characters in a character set.
For example, a case-insensitive collation will make strings that only differ
in case compare equal. All collations are associated to a single character
set. For example,
is a case-insensitive collation associated to the
character set. Every character set has a default collation, which will be used
if a character set without a collation is specified. For example,
latin1_swedish_ci is the default collation
latin1 character set.
You can find more information about these concepts in the official MySQL docs on character sets.
Every connection has an associated character set and collation. The connection's character set determines the encoding for character strings sent to and retrieved from the server. This includes SQL query strings, string fields and column names in metadata. The connection's collation is used for string literal comparison.
Every session you establish can have its own different character set and collation. You can specify this in two ways:
You specify a numeric ID that identifies the collation to use, and your
connection will use the character set associated to this collation. You
can find collation IDs in the
The problem with this approach is that if you specify a collation ID that
is unknown to the server (e.g.
for an old MySQL 5.7 server), the handshake operation will succeed but
the connection will sillently fall back to the server's
default character set, (usually
which is not Unicode).
SQL statement. For example,
utf8mb4" will set the current connection's character
utf8mb4 and the
connection's collation to utf8mb4's default collation. If the character
set is unknown, the
SET NAMES statement will fail. You can use
to issue the statement:
results result; conn.execute("SET NAMES utf8mb4", result); // Further operations can assume utf8mb4 as conn's charset
Both of the above methods are shortcuts to set several session-level variables. The ones that impact this library's behavior are:
determines the encoding that SQL statements sent to the server should have.
This includes the SQL strings passed to
and string parameters passed to
Not all character sets are permissible in
The server will accept setting this variable to any UTF-8 character set,
but won't accept UTF-16.
determines the encoding that the server will use to send any kind of result,
including string fields retrieved by
and error messages.
Note that, when you define a string column with a character set (e.g.
"CREATE TABLE t1 (col1 VARCHAR(5) CHARACTER
SET latin1)"), the column's character set will be
used for storage and comparisons, but not for client communication. If
utf16, any field obtained
col1 will be UTF16-encoded, and not latin1-encoded.
Note also that
reflects the charset and collation the server has converted the column
to before sending it to the client. In the above example,
metadata::column_collation will be the default
collation for UTF16, rather than
The table below summarizes the encoding used by each piece of functionality in this library:
Encoding given by...
String values passed as parameters to
Server-generated error messages:
ASCII. These can only contain ASCII characters and are always ASCII encoded. More info in this section.