1
头图

Hello everyone, this is Yuebai.

Writing this article is not to compare the pros and cons of mysql and elasticsearch (they are different by nature, there is nothing comparable), but to share a query problem that I encountered recently at work and the process of solving this problem. For elasticsearch, I'm still at the stage of understanding it. If it weren't for the needs of this work, I might not study it😂

Well, back to the topic, due to internal work adjustments, I received a company's edge project, the volume is not large, the number of hundreds of thousands of users. However, it was the number of hundreds of thousands of users in this mere that caused the problem of too many mysql in query parameters, and the operation of the management background to query the customer list was slow and even paralyzed for a while.

You may be thinking, is the code written too badly or the early design is not well considered?

e18d20c94006dfe0-3cdd6892e1ea4e8e-5fd0bb2a3b67d491916c6556801d6eae.jpeg
In fact, this can also be understood. After all, the demand for products is changeable, and it is common for ill-consideration. This list query originally only had a few simple field queries, and they were all field queries in the customer form table. As the product changes, there are as many as a dozen query conditions. Among them, the label query and the joint table can not solve the problem. What is the specific situation?

Don't worry, let me briefly introduce
开始你的表演.gif

Brief description of the problem

In order to focus on describing this problem, the table structure has been brutally simplified, and it’s good to understand this meaning😂
The front-end display is the following paging table (the picture screenshot is from the official ant design document)
企业微信20211021-191939.png

Table Structure

Customer table custmers

Fieldtype
idint
namestring
gendertinyint
ageint
remarkstring
created_attimestamp
updated_attimestamp

Tags table tags

Fieldtype
idint
namestring
created_attimestamp
updated_attimestamp

Association table customer_tag

Fieldtype
customer_idint
tag_idint

Query requirements

Now the product needs to be inquired by the customer's name and the label on the customer, and multiple labels are supported at the same time. The original SQL is probably like this

/*
 获取符合要求的 customer_id 列表 
 查出来一大堆 customer_id
 */
select customer_id from custmer_tag where tag_id in (传入的tag_id);

/*
 通过 customer_id 查询
 每翻一页都得经历这一大堆id的in查询,id过多还会导致代码直接崩溃
*/
select * from customers where id in (一大堆id) limit 10 offset 0; 

The problem is obvious, too many parameters in the in query, not only inefficient, but also in extreme cases can cause the SQL program to crash if it is too long.

At first glance, I thought it could be saved (slower, slower, first make the program not abnormal), so I changed to the following statement:

select * from customers where id in (select customer_id from custmer_tag where tag_id in (传入的 tag_id) group by customer_id));

But after a closer look at the business logic, I gave up. Multiple tag_id queries must support and and or query logic. select customer_id from custmer_tag where tag_id in (1,2,3) group by customer_id is an or relational query. Customer only needs to have any tag_id to satisfy the query conditions. But what if it is and logic? To find out customers with tags 1, 2, and 3 at the same time, then this statement is not applicable. Of course, if you must use sql to check, you may be able to find out. I didn't try again here. After all, even if the subquery works, the efficiency is very low and it is not a long-term solution.

solution

How to solve it? My first reaction was to first query the id of the qualified customer table through es, and then query the data through id, so that the entire business logic is changed minimally, and there is no problem with efficiency. After all, the original query and data assembly logic is very complicated. , Involving five or six tables and subsequent data processing, I really don’t want to make too many changes (regular workplace life-saving)
4e8872a8f8104b737b669bacc7903cc8.jpg

Start to optimize

Build elasticsearch service

This is the job of operation and maintenance, I just mention the needs.

Data write es

I have to do this 😅
Since the fields that need to be queried involve multiple tables, simply write all relevant fields and basic customer information to es to facilitate the subsequent business that also needs to be queried (there is really used later, and fields are added)

After the field is determined, how to update it in real time? Trigger write and update operations at each point of business modification and insertion?
Obviously not. It is easy to write incompletely in this way, and it also requires a lot of mechanized work. Obviously it is not what programmers living in dire straits should do. So how to do it?

At this time, if we can synchronize the data like mysql from the library, and then we update es, wouldn't it be perfect. Do whatever it takes. After some tossing (mainly monitoring binlog), the problem of data storage is finally solved.

rebuild code

The code logic now becomes like this:

  1. Query the customer id 100, 200, 300 from the es page
  2. Delete all the where conditions of the original sql, add where id in (100,200,300), and other logic does not need to be changed

At this point, the business reconstruction is completed, and the query speed is easily increased by a hundredfold. What are the disadvantages?

! ! ! It costs money! ! ! ! !
充钱能解决.gif

Afterword

Although the problem has been solved at present, from this incident, I realized the convenience of es and decided to learn more. The es and mysql monitoring and other related content involved in this article will be updated later!

Just the sauce, I wish you all the best in your work! I am Yuebai, an ape who survived on the Internet.


月白
316 声望5 粉丝

大家好,我是月白