This pain point of mysql can be easily solved with elasticsearch

Hello everyone, this is Yuebai.

Writing this article is not to compare the pros and cons of mysql and elasticsearch (they are different by nature, there is nothing comparable), but to share a query problem that I encountered recently at work and the process of solving this problem. For elasticsearch, I'm still at the stage of understanding it. If it weren't for the needs of this work, I might not study it😂

Well, back to the topic, due to internal work adjustments, I received a company's edge project, the volume is not large, the number of hundreds of thousands of users. However, it was the number of hundreds of thousands of users in this mere that caused the problem of too many mysql in query parameters, and the operation of the management background to query the customer list was slow and even paralyzed for a while.

You may be thinking, is the code written too badly or the early design is not well considered?

e18d20c94006dfe0-3cdd6892e1ea4e8e-5fd0bb2a3b67d491916c6556801d6eae.jpeg
In fact, this can also be understood. After all, the demand for products is changeable, and it is common for ill-consideration. This list query originally only had a few simple field queries, and they were all field queries in the customer form table. As the product changes, there are as many as a dozen query conditions. Among them, the label query and the joint table can not solve the problem. What is the specific situation?

Don't worry, let me briefly introduce
开始你的表演.gif

Brief description of the problem

In order to focus on describing this problem, the table structure has been brutally simplified, and it’s good to understand this meaning😂
The front-end display is the following paging table (the picture screenshot is from the official ant design document)
企业微信20211021-191939.png

Table Structure

Customer table custmers

Field	type
id	int
name	string
gender	tinyint
age	int
remark	string
created_at	timestamp
updated_at	timestamp

Tags table tags

Field	type
id	int
name	string
created_at	timestamp
updated_at	timestamp

Association table customer_tag

Field	type
customer_id	int
tag_id	int

Query requirements

Now the product needs to be inquired by the customer's name and the label on the customer, and multiple labels are supported at the same time. The original SQL is probably like this

/*
 获取符合要求的 customer_id 列表 
 查出来一大堆 customer_id
 */
select customer_id from custmer_tag where tag_id in (传入的tag_id);

/*
 通过 customer_id 查询
 每翻一页都得经历这一大堆id的in查询，id过多还会导致代码直接崩溃
*/
select * from customers where id in (一大堆id) limit 10 offset 0;

The problem is obvious, too many parameters in the in query, not only inefficient, but also in extreme cases can cause the SQL program to crash if it is too long.

At first glance, I thought it could be saved (slower, slower, first make the program not abnormal), so I changed to the following statement:

select * from customers where id in (select customer_id from custmer_tag where tag_id in (传入的 tag_id) group by customer_id));

But after a closer look at the business logic, I gave up. Multiple tag_id queries must support and and or query logic. select customer_id from custmer_tag where tag_id in (1,2,3) group by customer_id is an or relational query. Customer only needs to have any tag_id to satisfy the query conditions. But what if it is and logic? To find out customers with tags 1, 2, and 3 at the same time, then this statement is not applicable. Of course, if you must use sql to check, you may be able to find out. I didn't try again here. After all, even if the subquery works, the efficiency is very low and it is not a long-term solution.

solution

How to solve it? My first reaction was to first query the id of the qualified customer table through es, and then query the data through id, so that the entire business logic is changed minimally, and there is no problem with efficiency. After all, the original query and data assembly logic is very complicated. , Involving five or six tables and subsequent data processing, I really don’t want to make too many changes (regular workplace life-saving)

Start to optimize

Build elasticsearch service

This is the job of operation and maintenance, I just mention the needs.

Data write es

I have to do this 😅
Since the fields that need to be queried involve multiple tables, simply write all relevant fields and basic customer information to es to facilitate the subsequent business that also needs to be queried (there is really used later, and fields are added)

After the field is determined, how to update it in real time? Trigger write and update operations at each point of business modification and insertion?
Obviously not. It is easy to write incompletely in this way, and it also requires a lot of mechanized work. Obviously it is not what programmers living in dire straits should do. So how to do it?

At this time, if we can synchronize the data like mysql from the library, and then we update es, wouldn't it be perfect. Do whatever it takes. After some tossing (mainly monitoring binlog), the problem of data storage is finally solved.

rebuild code

The code logic now becomes like this:

Query the customer id 100, 200, 300 from the es page
Delete all the where conditions of the original sql, add where id in (100,200,300), and other logic does not need to be changed

At this point, the business reconstruction is completed, and the query speed is easily increased by a hundredfold. What are the disadvantages?

! ! ! It costs money! ! ! ! !
充钱能解决.gif

Afterword

Although the problem has been solved at present, from this incident, I realized the convenience of es and decided to learn more. The es and mysql monitoring and other related content involved in this article will be updated later!

Just the sauce, I wish you all the best in your work! I am Yuebai, an ape who survived on the Internet.

This pain point of mysql can be easily solved with elasticsearch

Brief description of the problem

Table Structure

Query requirements

solution

Start to optimize

Build elasticsearch service

Data write es

rebuild code

Afterword

月白

引用和评论

从 12s 到 200ms，MySQL 两千万数据 6 种深度分页优化

Open WebUI：开源AI交互平台的全面解析

被 Manus 带火的 MCP 是什么｜一文看懂

MySQL × 向量数据库：大模型时代的黄金组合实战指南

百万级群聊的设计实践

MySQL Binlog 深度解析

MySQL 备份 Shell 脚本：支持远程同步与阿里云 OSS 备份