Interviewer: How about you tell me how you'd tune MySQL?

Candidate: Wow, that's a big topic... I'd say that for developers, MySQL tuning generally focuses on "development conventions", "database indexes", and fixing slow queries in production.

Candidate: As for tuning MySQL's internal parameters, a professional DBA handles that.

Interviewer: After all that, you just want to say you don't know MySQL parameter tuning, right?

Candidate: Damn, busted.

Interviewer: Then tell me about the development conventions and indexes you usually work with. What do you normally do?

Candidate: Well, first of all, in production we create database tables through a work-order system (which naturally requires DBA approval). If a table is created without an index, it immediately raises a warning (:

Candidate: In theory, once a table holds a certain amount of data, it should have the corresponding indexes. There is still plenty to watch out for when querying the database, and much of it is accumulated day to day. For example:

Candidate: 1. Can a "covering index" be used to cut the cost of "going back to the table"? That means naming the columns you need in the SELECT instead of writing SELECT *.
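For illustration, a minimal sketch (the `orders` table and its composite index on `(user_id, status)` are hypothetical):

```sql
-- The index covers (user_id, status), so this query is answered
-- entirely from the index: no trip back to the clustered index.
SELECT user_id, status FROM orders WHERE user_id = 10086;

-- SELECT * forces MySQL to fetch the full row from the clustered
-- index for every match (the "back to the table" lookup).
SELECT * FROM orders WHERE user_id = 10086;
```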

Candidate: 2. Consider whether a "composite index" is needed. If you build one, try to put the most selective column on the far left, and keep the "leftmost-prefix matching rule" in mind.
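A sketch of the leftmost-prefix rule (index and column names are hypothetical):

```sql
-- Composite index, most selective column first:
CREATE INDEX idx_user_status_time ON orders (user_id, status, create_time);

-- Uses the index: the leftmost column user_id is present.
SELECT id FROM orders WHERE user_id = 10086 AND status = 1;

-- Generally cannot use the index: user_id, the leftmost column, is missing.
SELECT id FROM orders WHERE status = 1 AND create_time > '2021-01-01';
```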

Candidate: 3. Applying a function or expression to an indexed column makes the index unusable.
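For example (assuming an index on `create_time`):

```sql
-- The function call hides the indexed column, so the index is skipped:
SELECT id FROM orders WHERE DATE(create_time) = '2021-06-01';

-- Rewritten as a range on the bare column, the index works:
SELECT id FROM orders
WHERE create_time >= '2021-06-01 00:00:00'
  AND create_time <  '2021-06-02 00:00:00';
```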

Candidate: 4. Use a subquery to optimize deep pagination. With LIMIT offset, n, MySQL fetches offset + n rows and then returns only the last n. The subquery approach first locates the starting row's primary key through the index, then retrieves just the records needed by ID, which improves query efficiency.
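A sketch of that rewrite (row counts are made up):

```sql
-- Deep pagination: MySQL walks through 1,000,010 rows and throws
-- away the first 1,000,000.
SELECT * FROM orders ORDER BY id LIMIT 1000000, 10;

-- Subquery version: locate the starting primary key via the index,
-- then fetch only the 10 rows actually needed.
SELECT * FROM orders
WHERE id >= (SELECT id FROM orders ORDER BY id LIMIT 1000000, 1)
ORDER BY id
LIMIT 10;
```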

Interviewer: Well...

Candidate: 5. Use the EXPLAIN command to inspect the SQL execution plan: whether your SQL uses an index at all, and which indexes it uses. Use SHOW PROFILE to see how much system resource a statement consumes (though this is rarely used in practice).
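For example:

```sql
-- EXPLAIN shows the execution plan; the key column tells you which
-- index was chosen, and rows estimates how many rows will be examined.
EXPLAIN SELECT id FROM orders WHERE user_id = 10086;

-- SHOW PROFILE (deprecated in recent MySQL versions, but still works):
SET profiling = 1;
SELECT id FROM orders WHERE user_id = 10086;
SHOW PROFILES;               -- recent queries with their durations
SHOW PROFILE FOR QUERY 1;    -- per-stage breakdown for one query
```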

Candidate: 6. Once a transaction is open, do nothing but database work inside it, and consciously shorten how long locks are held. For example, if a transaction needs to both insert and update data, insert first and update afterwards: the update takes row locks, and if it ran first, concurrent requests could leave multiple transactions waiting for those row locks to be released.
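A sketch of that ordering idea (tables and statements are hypothetical):

```sql
BEGIN;
-- The INSERT creates a brand-new row, so it conflicts with little.
INSERT INTO order_log (order_id, action) VALUES (1, 'pay');
-- The UPDATE takes a row lock; doing it last keeps the lock window
-- as short as possible before COMMIT releases it.
UPDATE orders SET status = 2 WHERE id = 1;
COMMIT;
```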

Interviewer: Well, you mentioned transactions. You also talked about transaction isolation levels earlier. Which isolation level do you use in production?

Candidate: Well, we use Read Committed here, while MySQL defaults to Repeatable Read. Which level to choose depends on the application scenario, because a lower isolation level means better transaction concurrency.

Candidate: (Most Internet companies pick Read Committed as their main isolation level.)
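For reference, checking and switching the level looks like this (the variable is `transaction_isolation` on MySQL 8.0; older versions use `tx_isolation`):

```sql
-- Current isolation level of this session:
SELECT @@transaction_isolation;

-- Switch the session to Read Committed:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```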

Candidate: Under the Repeatable Read isolation level, for example, "gap locks" can lead to deadlock problems.

Candidate: But as you probably know, MySQL's default isolation level is Repeatable Read, largely because in the very early days MySQL's binlog had no row mode, and under Read Committed that caused "master-slave data inconsistency".

Candidate: The binlog records "changes" to table structure and table data, such as update/delete/insert/truncate/create. In MySQL, master-slave replication is in fact implemented on top of the binlog (:
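You can check which binlog format a server writes (ROW is the usual choice nowadays; STATEMENT is the historical format behind the problem described above):

```sql
SHOW VARIABLES LIKE 'binlog_format';   -- ROW / STATEMENT / MIXED
```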

Candidate: For that historical reason, MySQL set the default isolation level to Repeatable Read.

Interviewer: Well, let me ask in passing: have you run into this kind of problem, where the indexes are correct but a production query is still slow?

Candidate: Yes, of course I have.

Interviewer: And how did you handle it?

Candidate: If the indexes are right but the query is still slow, then generally speaking the table simply holds too much data.

Candidate: First, consider whether the "old data" can be "deleted". At our company we sync data to Hive, so an offline copy is already stored.

Candidate: If no business still queries the "old data", the simplest fix is to "delete" part of it. With less data in the table, retrieval naturally gets faster...
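If deletion is on the table, a common pattern is to archive first and then delete in small batches, so no single statement holds locks for long (the cut-off date and batch size below are made-up values):

```sql
-- Repeat until 0 rows are affected:
DELETE FROM orders
WHERE create_time < '2020-01-01'
LIMIT 1000;
```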

Interviewer: Mm, but usually it can't be deleted.

Candidate: Right, only a very small share of businesses can actually delete data (:

Candidate: Then consider another angle: can a cache layer (Redis) be placed in front of the query?

Candidate: Whether a cache works depends on whether the business can tolerate slightly stale reads (after all, consistency between Redis and MySQL has to be maintained). If the query conditions are complex and varied (all sorts of GROUP BY and SUM), a cache is a poor fit and painful to maintain...

Candidate: Next, check whether a "string" search scenario is what makes the query inefficient. If so, consider importing the table's data into a search engine like Elasticsearch and routing those online queries directly to Elasticsearch.

Candidate: MySQL -> Elasticsearch needs a dedicated sync program (typically one that listens to MySQL's binlog, parses it, and writes the changes into Elasticsearch).

Candidate: If that doesn't apply either, consider building an aggregate table along the dimensions of the query conditions, and have online requests query the aggregate table instead of the original one.

Candidate: For example, an order has order details, and the order-detail table is huge. But if the query feature exposed on the product (storefront) side only displays data by "day", then each user's records can be rolled up by day: the aggregate table holds just one aggregated row per user per day.

Candidate: Queries against the aggregate table are bound to be fast (it holds far less data than the original table).
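A minimal sketch of such an aggregate table (names, columns, and the nightly-job approach are all hypothetical):

```sql
CREATE TABLE order_detail_daily (
  user_id      BIGINT        NOT NULL,
  stat_date    DATE          NOT NULL,
  order_count  INT           NOT NULL,
  total_amount DECIMAL(12,2) NOT NULL,
  PRIMARY KEY (user_id, stat_date)   -- one row per user per day
);

-- Populated by a nightly job (or incrementally from the binlog):
INSERT INTO order_detail_daily (user_id, stat_date, order_count, total_amount)
SELECT user_id, DATE(create_time), COUNT(*), SUM(amount)
FROM order_detail
WHERE create_time >= CURDATE() - INTERVAL 1 DAY
  AND create_time <  CURDATE()
GROUP BY user_id, DATE(create_time);
```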

Candidate: The general idea is to "trade space for time": keep the same data in another place, shaped for the query, to speed up retrieval.

Interviewer: Then let me ask: besides reads, write performance can hit a bottleneck too. What do you do about that?

Candidate: Ah, now that you bring this up, I'm wide awake.

Candidate: If MySQL is hitting read and write bottlenecks, first look at what the current MySQL architecture is.

Candidate: If it's a single database, could it be upgraded to a master-slave architecture to separate reads from writes?

Candidate: Put simply: the master takes the write requests and the slaves take the read requests. A slave updates its data by replaying the binlog shipped from the master, which keeps master and slave consistent (in the common setup replication is asynchronous, so only eventual consistency is guaranteed).
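On a replica you can watch how far it lags behind the master (the statement was renamed SHOW REPLICA STATUS in MySQL 8.0.22; the field to check is Seconds_Behind_Master):

```sql
-- \G is the mysql client's vertical-output terminator:
SHOW SLAVE STATUS\G
```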

Interviewer: Well...

Candidate: If reads and writes still bottleneck under a master-slave architecture, then it's time to consider sharding (splitting databases and tables).

Candidate: At my previous company, at least, the databases were already split by business line: a traffic database for traffic, an ad database for ads, a product database for commodities. So what I mean by sharding here is splitting a single table inside one of those existing databases.

Candidate: For example, say there is a business order table in the ad database with 100 million rows, and now I want to shard it.

Candidate: Then that table's data gets spread across multiple ad databases and multiple tables (:

Candidate: Requests are then routed to the right shard. (Originally a single table in a single database held 100 million rows; split across 8 databases, each database holds about 12.5 million rows, and with each database further split into 8 tables, each table holds roughly 1.5 million.)

Interviewer: What do you use as the sharding key?

Candidate: In our experience, it's generally userId (most queries are along the user dimension). To query along other dimensions, you again apply the idea above (trade space for time).
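One hypothetical routing rule for the 8-databases-by-8-tables layout above: take the userId modulo 64, then derive the database and table index from that slot.

```sql
-- Worked example for userId = 10086:
SELECT 10086 % 64         AS slot,       -- 38: which of the 64 shards
       (10086 % 64) DIV 8 AS db_index,   -- 4:  which database (0..7)
       (10086 % 64) % 8   AS tbl_index;  -- 6:  which table (0..7)
```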

Interviewer: How is the primary key ID generated after sharding?

Candidate: That touches on distributed ID generation, and there are many approaches: auto-increment via MySQL, auto-increment via Redis, or IDs based on the Snowflake algorithm. Which one to use depends on the company's technology stack; the Redis-based and Snowflake-based implementations are the most common.

Candidate: As for why increasing IDs are emphasized: it comes back to index ordering (monotonically increasing primary keys append at the tail of the B+ tree and avoid page splits), which was covered earlier, as you'll remember.
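One classic sketch of the "MySQL auto-increment" option is a ticket table: REPLACE INTO keeps overwriting the single row, and every overwrite issues a fresh AUTO_INCREMENT value (table and column names here are made up; real deployments add a per-node step/offset for availability):

```sql
CREATE TABLE id_sequence (
  id   BIGINT  NOT NULL AUTO_INCREMENT PRIMARY KEY,
  stub CHAR(1) NOT NULL,
  UNIQUE KEY uk_stub (stub)
);

-- Each REPLACE deletes the old row and inserts a new one,
-- bumping the AUTO_INCREMENT counter:
REPLACE INTO id_sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();   -- the newly issued globally unique ID
```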

Interviewer: Well, if I want to shard an existing table, what does the migration process look like?

Candidate: We generally migrate by "double writing". The rough steps are:

Candidate: 1. Double-write incremental changes: every new write goes to both the new table and the old table.

Candidate: 2. Migrate the old table's historical data into the new database.

Candidate: 3. Sooner or later the new table catches up with the old one (at some point their data is fully in sync).

Candidate: 4. Verify that the data in the new table and the old table match up.

Candidate: 5. Turn on double reads (part of the traffic reads from the new table, part from the old), which is effectively a grayscale rollout.

Candidate: 6. Cut all read traffic over to the new table and stop writing to the old one.

Candidate: 7. Prepare a rollback plan in advance, so a failed cutover can be reverted to normal service, along with scripts to repair any diverged data.

Interviewer: Well...

Summary of this article:

  • Once a table holds a certain amount of data, it needs the corresponding indexes
  • When you hit a slow query, check whether the indexes are right, whether a better index would speed things up, and whether the query actually uses the index properly
  • When indexes alone can't fix a slow query, the cause is usually sheer table size; apply the trade-space-for-time idea
  • When read/write performance hits a bottleneck, first see whether upgrading the database architecture solves it; if not, consider sharding
  • Sharding relieves the read/write bottleneck but introduces problems of its own, so research the solutions and pitfalls in advance

Production is no place to show off your skills; stability is what counts. If something can be solved simply, don't solve it in a complicated way.

Welcome to follow my WeChat official account [Java3y] to talk about Java interviews. The online interviewer series is continuously updated!

[Online Interviewer-Mobile] The series is updated twice a week!

[Online Interviewer-Computer] The series is updated twice a week!

Creating original content isn't easy!! A like, comment, and share would be much appreciated!!

