Author: Qin Guangfei
A member of the DBA team at Acsen, responsible for handling day-to-day project issues and troubleshooting the company's platform; interested in databases and full of ideas about technology. Once you enter IT, it is as deep as the sea, and from then on, integrity becomes a stranger.
Source of this article: original contribution
Original content is produced by the Aikesheng open source community and may not be used without authorization. For reprinting, please contact the editor and cite the source.
Problem background
While checking a batch of newly installed MySQL databases, one of our customers recently noticed the following:
- The MySQL client character set of every instance in this batch is latin1;
- However, on instances deployed earlier with the same parameter template, the client character set is utf8;
The MySQL version is 5.7.32 and the servers run Red Hat 7. So why do the two batches of installations end up with different MySQL character sets?
Character set introduction
First, let us briefly review how MySQL character sets work. They can be roughly divided into the following two levels:
Server-level character set
- The server-level character set is the character set used when data is stored in the database; it can be further subdivided into database level, table level, and column level;
- If no character set is specified when creating a database or table, the server-level character set is used;
- The server-level character set is specified with the character_set_server parameter;
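The fallback rule in the list above can be written out as a tiny chain: a column without an explicit character set takes its table's, a table takes its database's, and a database takes character_set_server. The shell function below is a toy model of that chain; the name effective_charset is hypothetical and not a MySQL API. Keep in mind that MySQL applies these defaults when each object is created, so changing character_set_server later does not change databases that already exist.

```shell
# Toy model (hypothetical helper, not a MySQL API) of character-set defaults:
# the first level that names a character set wins; otherwise the value is
# inherited from the enclosing level, ending at character_set_server.
effective_charset() {
  # $1 column, $2 table, $3 database, $4 character_set_server
  for cs in "$1" "$2" "$3" "$4"; do
    if [ -n "$cs" ]; then
      printf '%s\n' "$cs"
      return 0
    fi
  done
  return 1  # nothing set at any level
}

# Neither the column nor the table names a character set, but the database
# was created with latin1, so the column is latin1, not the server's utf8mb4.
effective_charset "" "" latin1 utf8mb4
```

This prints latin1, which is why a database created with an explicit character set keeps it even when the server default differs.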
Client-level character set
The client-level character set, that is, the character set a client uses when connecting to the database, is controlled by the following parameters:
- character_set_client: the server assumes that statements sent by the client are encoded in this character set, and uses it to decode them on receipt;
- character_set_connection: while processing a statement internally, the server converts it from character_set_client to character_set_connection, so the two should be the same;
- character_set_results: the server encodes query results in this character set before returning them to the client, so it should also match character_set_client;
- In other words, the three parameters that control the client-level character set should be consistent. They can be set with either of the following methods:
-- Change it directly in the database (affects the current session only)
mysql> set names utf8;
-- Change it in the option file, then reconnect the client
shell> vi /etc/my.cnf
[mysql]
default-character-set = utf8
shell>
-- The my.cnf files read by the mysql client can be confirmed with mysql --help | grep "my.cnf"
shell> mysql --help | grep "my.cnf"
order of preference, my.cnf, $MYSQL_TCP_PORT,
/etc/my.cnf /etc/mysql/my.cnf /usr/local/mysql/etc/my.cnf ~/.my.cnf
shell>
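To check whether a session actually ended up with three matching values, the variables can be read in batch mode and compared. The sketch below assumes the output of `mysql -N -B -e "SHOW VARIABLES LIKE 'character_set_%'"`, which prints one tab-separated name/value pair per line; the function name check_client_charsets is hypothetical.

```shell
# Hypothetical helper: reads "name<TAB>value" lines, as printed by
#   mysql -N -B -e "SHOW VARIABLES LIKE 'character_set_%'"
# prints the three client-level values, and exits 0 only if they agree.
check_client_charsets() {
  awk -F '\t' '
    $1 == "character_set_client"     { client = $2 }
    $1 == "character_set_connection" { conn = $2 }
    $1 == "character_set_results"    { results = $2 }
    END {
      printf "client=%s connection=%s results=%s\n", client, conn, results
      ok = (client != "" && client == conn && conn == results)
      exit (ok ? 0 : 1)
    }'
}

# Against a live server (needs working credentials):
#   mysql -N -B -e "SHOW VARIABLES LIKE 'character_set_%'" | check_client_charsets
# Offline, with sample data resembling the problematic hosts:
printf 'character_set_client\tlatin1\ncharacter_set_connection\tlatin1\ncharacter_set_results\tlatin1\ncharacter_set_server\tutf8mb4\n' \
  | check_client_charsets && echo consistent
```

Note that the sample passes the check: the three values agree, they are simply all latin1. Consistency alone therefore does not catch the symptom in this article; the actual values still have to be compared with what you expect.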
Troubleshooting process
Returning to the problem, we first checked the client character set of the previously installed MySQL instances: it is indeed utf8.
Next, we checked every my.cnf file on the servers. On both the earlier installations and the new batch, the only character-set-related setting is character_set_server=utf8mb4, in the my.cnf file that MySQL is started with.
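Checking "all the my.cnf files" by hand is easy to get wrong, because a setting only applies to the option group it sits in (for example [mysqld] versus [mysql]). The sketch below lists character-set lines together with their file and group. The function name scan_option_files is hypothetical; with no arguments it uses the search order printed by `mysql --help | grep "my.cnf"` earlier, and the file MySQL is actually started with can be passed explicitly. It skips commented-out lines and, as a simplification, does not follow !include or !includedir directives.

```shell
# Hypothetical helper: list character-set settings in MySQL option files,
# tagged with the file and the [group] they appear in.
scan_option_files() {
  if [ "$#" -eq 0 ]; then
    # Search order as printed by: mysql --help | grep "my.cnf"
    set -- /etc/my.cnf /etc/mysql/my.cnf /usr/local/mysql/etc/my.cnf "$HOME/.my.cnf"
  fi
  for f in "$@"; do
    [ -r "$f" ] || continue  # skip missing or unreadable files
    awk -v file="$f" '
      /^[ \t]*\[/ { group = $0; gsub(/[ \t]/, "", group); next }  # remember the group
      /^[ \t]*[#;]/ { next }                                       # skip comment lines
      tolower($0) ~ /character[-_]set/ { print file ": " group " " $0 }
    ' "$f"
  done
}

# Usage:
#   scan_option_files                          # default search order
#   scan_option_files /path/to/instance/my.cnf # the file mysqld is started with
```

On the servers described here, this would report only a [mysqld] character_set_server line and nothing under [mysql] or [client], which is what points the investigation away from the option files.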
Finally, we checked the operating system character set (locale) on the servers: the problematic ones were set to en_US, while the original ones used en_US.UTF-8.
This looked like the cause, and a test environment confirmed it: when the server's locale is set to en_US, the MySQL client character set becomes latin1.
So why does this happen? Let's see what the official documentation says: https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html#charset-connection-client-configuration
In short, it makes two points:
- The client programs mysql, mysqladmin, mysqlcheck, mysqlimport, and mysqlshow each have a default character set: latin1 in MySQL 5.7 and utf8mb4 in MySQL 8.0;
- Each client program also detects the operating system's character set, for example from the value of the LANG environment variable on Linux. If MySQL supports the OS character set, the client uses it (when there is no exact match, the OS character set is mapped to the closest MySQL character set); if it is not supported, the client falls back to its default character set;
On glibc-based systems such as Red Hat 7, the bare en_US locale uses the ISO-8859-1 codeset, whose closest MySQL character set is latin1. So when the server's locale is en_US, a latin1 client character set is exactly what the documented behavior predicts.
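The decision described in the documentation can be sketched as a small function: an explicit default-character-set wins, otherwise the OS codeset is mapped to a MySQL character set, otherwise the compiled-in default is used. This is a simplified model, not MySQL's code. The names os_codeset_to_mysql and client_charset are hypothetical, and only a few entries of MySQL's OS-to-MySQL mapping table are reproduced, as we understand them for 5.7. We also assume that what the client inspects is the locale's codeset (what `locale charmap` prints) rather than the literal LANG string.

```shell
# Simplified model of how a MySQL 5.7 client picks its character set.
# Only a few entries of MySQL's OS-codeset mapping are reproduced here.
os_codeset_to_mysql() {
  codeset=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')  # match case-insensitively
  case "$codeset" in
    UTF-8|UTF8)     echo utf8 ;;    # exact match (MySQL 5.7 naming)
    ISO-8859-1)     echo latin1 ;;  # exact match: codeset of the bare en_US locale
    ANSI_X3.4-1968) echo latin1 ;;  # plain ASCII (C/POSIX locale): closest match
    *)              return 1 ;;     # not covered by this sketch
  esac
}

# $1: default-character-set from option files or the command line (may be empty)
# $2: OS codeset, e.g. the output of `locale charmap`
client_charset() {
  if [ -n "$1" ]; then
    echo "$1"                        # explicit setting wins
  elif cs=$(os_codeset_to_mysql "$2"); then
    echo "$cs"                       # autodetected from the OS locale
  else
    echo latin1                      # compiled-in default of 5.7 clients
  fi
}

client_charset "" ISO-8859-1    # latin1: the problematic en_US hosts
client_charset "" UTF-8         # utf8: the en_US.UTF-8 hosts
client_charset utf8 ISO-8859-1  # utf8: [mysql] default-character-set = utf8
```

On a real host the second argument would come from the locale itself, for example `client_charset "" "$(LANG=en_US locale charmap)"`; the result then depends on which locales are installed.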
A test with a MySQL 8.0 client further confirms that, when no exact match is possible, the client uses the closest MySQL character set.
To change the locale of the server (Red Hat 7 or CentOS 7), use the following methods:
-- Takes effect immediately in the current shell, but is lost after logging out or rebooting
shell> export LANG="en_US.UTF-8"
-- Persists across reboots
shell> vim /etc/locale.conf
LANG="en_US.UTF-8"
shell> source /etc/locale.conf
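Editing /etc/locale.conf by hand works, but it is easy to leave a stale LANG line behind when the same fix has to be rolled out to a whole batch of virtual machines. The sketch below replaces an existing LANG= line or appends one, leaving other settings untouched. The function name set_locale_lang is hypothetical, and it takes the file path as an argument so it can be tried on a scratch copy before pointing it at /etc/locale.conf as root. On systemd-based Red Hat 7 and CentOS 7, `localectl set-locale LANG=en_US.UTF-8` is the standard tool for the same job.

```shell
# Hypothetical helper: set LANG in a locale.conf-style file.
# Replaces any existing LANG= line, or appends one if none exists.
set_locale_lang() {
  target=$1 value=$2
  if [ ! -f "$target" ]; then
    printf 'LANG="%s"\n' "$value" > "$target"
    return
  fi
  scratch=$(mktemp) || return 1
  awk -v v="$value" '
    /^LANG=/ { print "LANG=\"" v "\""; found = 1; next }  # replace existing LANG
    { print }                                            # keep other settings
    END { if (!found) print "LANG=\"" v "\"" }           # append if missing
  ' "$target" > "$scratch" && cat "$scratch" > "$target"  # cat keeps owner and mode
  status=$?
  rm -f "$scratch"
  return "$status"
}

# Usage (as root): set_locale_lang /etc/locale.conf en_US.UTF-8
# Then, in a new login shell, `locale charmap` should print UTF-8.
```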
Summary
- The character sets differed from the earlier installations because the locale of this batch of virtual machine images was set to en_US; Red Hat 7 and CentOS 7 systems are normally set to en_US.UTF-8;
- The character set at the MySQL Server level can be specified with the character_set_server parameter;
- The three parameters that control the character set at the MySQL client level should be consistent, usually utf8 (utf8mb4 in MySQL 8.0); unless set explicitly, they depend on the locale of the server where the MySQL client runs;