Technology Sharing | Create Index on Long String

爱可生云数据库
中文

Author: Yao Yuan

MySQL ACE, Huawei Cloud MVP, focused on Oracle, MySQL database for many years, Oracle 10G and 12C OCM, MySQL 5.6, 5.7, 8.0 OCP. Now Dingjia Technology serves as a technical consultant, providing database training and technical support services for colleagues and customers.

Source of this article: original submission

* Produced by the Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.


When creating an index on a field of a very long string, the index will become very large and inefficient. One solution is to use the crc32 or md5 function to hash the long string and then create an index on the result of the calculation. In MySQL 5.7 or later, you can create an automatically generated field. For example, you can create the following table:

create table website(
id int unsigned not null,
web varchar(100) not null,
webcrc int unsigned generated always as (crc32(web)) not null,
primary key (id)
);

Insert records into this table:

mysql> insert into website(id,web) values(1,"https://www.scutech.com");
Query OK, 1 row affected (0.07 sec)

mysql> select * from website;
+----+-------------------------+-----------+
| id | web                     | webcrc    |
+----+-------------------------+-----------+
|  1 | https://www.scutech.com | 851176738 |
+----+-------------------------+-----------+
1 row in set (0.00 sec)

You can see that the cyclic redundancy check value of the web field is automatically generated in the field webcrc. Create an index on this field to get a small, efficient index.

In MySQL 8.0.13 or later, functional indexes can be created directly, for example:

create table website8(
id int unsigned not null,
web varchar(100) not null,
primary key (id),
index ((crc32(web)))
);

Query the index on this table:

mysql> show index from website8\G
*************************** 1. row ***************************
        Table: website8
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: NULL
*************************** 2. row ***************************
        Table: website8
   Non_unique: 1
     Key_name: functional_index
 Seq_in_index: 1
  Column_name: NULL
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: crc32(`web`)
2 rows in set (0.00 sec)

You can see that the first index is the primary key, and the second index is the functional index.

Another way to solve the length of the index field is to create a prefix index. The syntax for creating a prefix index is: col_name(length). The prefix index is to create an index on the first part of the string. The supported data types include: CHAR, VARCHAR , BINARY and VARBINARY. The key to creating a prefix index is to select the length of the string of the prefix. The longer the length, the higher the selectivity of the index, but the larger the storage space.

The c field in the sbtest2 table is a 120-length string. The following SQL statement queries the index selectivity at different lengths:

mysql> select
count(distinct(left(c,3)))/count(*) sel3,
count(distinct(left(c,7)))/count(*) sel7,
count(distinct(left(c,9)))/count(*) sel9,
count(distinct c)/count(*) selectivity
from sbtest1;
+--------+--------+--------+-------------+
| sel3   | sel7   | sel9   | selectivity |
+--------+--------+--------+-------------+
| 0.0120 | 0.9959 | 1.0000 |      1.0000 |
+--------+--------+--------+-------------+
1 row in set (1.66 sec)

You can see that creating an index in the first 9 bits of this field can achieve a selectivity of 1, and then increasing the prefix bit of this index will not improve the selectivity of the index. Here is the command to create an index:

mysql> alter table sbtest2 add index (c(9));
阅读 535

中国领先的企业数据处理技术整体解决方案提供商,开源数据库领域优秀企业。为大型行业用户的特定场景提...

278 声望
150 粉丝
0 条评论

中国领先的企业数据处理技术整体解决方案提供商,开源数据库领域优秀企业。为大型行业用户的特定场景提...

278 声望
150 粉丝
文章目录
宣传栏