Basic knowledge of database
Why use a database
Data is stored in memory
- Advantages: fast access speed
- Disadvantages: Data cannot be saved permanently
Data is saved in files
- Advantages: permanent storage of data
- Disadvantages: 1) Slower than in-memory access, with frequent IO operations. 2) Inconvenient to query data
Data is saved in the database
- Data is saved permanently
- Using SQL statements, queries are convenient and efficient.
- Convenient data management
What is SQL?
Structured Query Language, abbreviated as SQL, is a database query language.
Role: used to access data, query, update and manage relational database systems.
What is MySQL?
MySQL is a relational database management system developed by the Swedish company MySQL AB and now owned by Oracle. MySQL is one of the most popular relational database management systems, and for web applications it is among the best RDBMS (Relational Database Management System) software. It is very commonly used in Java enterprise development because MySQL is open source, free, and easy to scale.
What are the three normal forms of database design
- First normal form (1NF): every column is atomic and cannot be split further.
- Second normal form (2NF): on the basis of 1NF, non-primary-key columns must depend on the whole primary key, not just part of it.
- Third normal form (3NF): on the basis of 2NF, non-primary-key columns depend only on the primary key and not on other non-primary-key columns.
When designing a database schema, try to follow the three normal forms; deviate only for a good reason, such as performance. In practice we often compromise the design for performance.
What are the mysql tables related to permissions
The MySQL server controls user access to the database through the permission (grant) tables, which are stored in the mysql database and initialized by the mysql_install_db script. These permission tables are user, db, tables_priv, columns_priv and host. The following describes the structure and content of these tables:
User permission table: record the user account information allowed to connect to the server, the permissions inside are global.
db permission table: record the operation permissions of each account on each database.
tables_priv permission table: records operation permissions at the data table level.
columns_priv permission table: record data column-level operation permissions.
Host permission table: cooperate with the db permission table to make more detailed control of database-level operation permissions on a given host. This permission table is not affected by the GRANT and REVOKE statements.
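A quick way to see these grant tables in practice is to query them directly (a minimal sketch; the exact column sets vary between MySQL versions, and reading the mysql schema requires sufficient privileges):
-- Accounts allowed to connect, with a couple of their global privileges
SELECT User, Host, Select_priv, Insert_priv FROM mysql.user;
-- Per-database privileges
SELECT User, Host, Db, Select_priv FROM mysql.db;
-- Table-level and column-level privileges
SELECT * FROM mysql.tables_priv;
SELECT * FROM mysql.columns_priv;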
How many logging formats does the MySQL binlog have? What's the difference?
There are three formats: statement, row and mixed.
- In statement mode, every SQL statement that modifies data is recorded in the binlog. There is no need to record every row change, which reduces the binlog volume, saves IO and improves performance. However, because SQL execution depends on context, the relevant context must also be saved, and some statements (for example, those using certain functions) cannot be recorded and replicated correctly.
- At the row level, the SQL statement's context is not recorded; only which records were modified is saved. The unit of recording is each row change, so almost everything can be recorded. However, many operations cause a large number of row changes (such as ALTER TABLE), so this mode saves far more information and the log volume is much larger.
- Mixed is a compromise: statement format is used for ordinary operations, and row format is used when statement cannot be used.
In addition, newer versions of MySQL optimize the row format: when the table structure changes, the statement is recorded instead of the row-by-row changes.
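To check or change the format on a running server (a minimal sketch; changing binlog_format requires the binary log to be enabled and appropriate privileges):
-- Show the current binlog format (STATEMENT, ROW or MIXED)
SHOW VARIABLES LIKE 'binlog_format';
-- Change the format for the current session only
SET SESSION binlog_format = 'ROW';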
Data types
What data types does MySQL have?
Integer type
- Including TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, which respectively represent 1 byte, 2 bytes, 3 bytes, 4 bytes, and 8 bytes integers. Any integer type can be added with the UNSIGNED attribute, indicating that the data is unsigned, that is, a non-negative integer.
- Length: an integer type can be given a display width, for example INT(11) means an INT with display width 11. In most scenarios this length is meaningless: it does not limit the legal range of values, it only affects the number of characters displayed, and it is only meaningful when combined with the UNSIGNED ZEROFILL attribute.
- For example, suppose the type is set to INT(5) with the UNSIGNED ZEROFILL attribute. If the user inserts 12, the value the database displays is 00012.
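A minimal sketch of this behavior, using a hypothetical width_demo table (the display width only affects how the value is shown, not the stored value or its range; integer display widths are deprecated in recent MySQL versions):
CREATE TABLE width_demo (n INT(5) UNSIGNED ZEROFILL);
INSERT INTO width_demo (n) VALUES (12);
-- The stored value is 12; it is displayed padded to 5 digits: 00012
SELECT n FROM width_demo;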
Real number type
- Including FLOAT, DOUBLE, DECIMAL.
- DECIMAL can be used to store integers larger than BIGINT and can store exact decimals.
- FLOAT and DOUBLE have limited ranges of values and use standard floating-point arithmetic for approximate calculations.
- FLOAT and DOUBLE are more efficient than DECIMAL for calculations; DECIMAL can be thought of as being processed like a character string.
String type
- Including VARCHAR, CHAR, TEXT, BLOB
- VARCHAR is used to store variable-length character strings, which is more space-saving than fixed-length types.
- VARCHAR uses an extra 1 or 2 bytes to store the length of the string: 1 byte when the column's maximum length is 255 bytes or less, otherwise 2 bytes.
- When the content stored in VARCHAR exceeds the set length, the content will be truncated.
- CHAR is fixed-length, and sufficient space is allocated according to the defined string length.
- CHAR will be filled with spaces as needed to facilitate comparison.
- CHAR is suitable for storing very short strings, or all values are close to the same length.
- When the content stored in CHAR exceeds the set length, the content will also be truncated.
Use strategy:
- For data that changes frequently, CHAR is better than VARCHAR because CHAR is not prone to fragmentation.
- For very short columns, CHAR is more efficient in storage space than VARCHAR.
- Allocate only as much space as is actually needed; longer columns consume more memory when sorting.
- Try to avoid using TEXT/BLOB types. Temporary tables will be used when querying, causing serious performance overhead.
Enumerated type (ENUM): stores values drawn from a predefined set.
- Sometimes you can use ENUM instead of commonly used string types.
- ENUM storage is very compact and will compress the list value to one or two bytes.
- When ENUM is stored internally, it actually stores an integer.
- Try to avoid using numbers as constants in ENUM enumerations, because it is easy to get confused.
- Sorting is according to the internally stored integer
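A minimal ENUM sketch with a hypothetical enum_demo table; internally MySQL stores the 1-based position of the value in the list:
CREATE TABLE enum_demo (
  status ENUM('pending', 'active', 'closed')
);
INSERT INTO enum_demo (status) VALUES ('active'), ('pending');
-- ORDER BY follows the internal integer (list position), not alphabetical order
SELECT status FROM enum_demo ORDER BY status;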
Date and time types: prefer TIMESTAMP where possible, as it is more space-efficient than DATETIME.
- Storing timestamps as plain integers is usually inconvenient to work with.
- If you need to store microseconds, you can use a BIGINT column.
Storage engines
The difference between MySQL storage engine MyISAM and InnoDB
Storage engine: how MySQL stores data, indexes and other objects; it is the implementation of the underlying file organization.
The commonly used storage engines are as follows:
- Innodb engine: Innodb engine provides support for database ACID transactions. It also provides row-level locks and foreign key constraints. The goal of its design is to handle a database system with large data capacity.
- MyISAM engine (MySQL's original default engine): does not support transactions, row-level locks or foreign keys.
- MEMORY engine: all data is kept in memory, so data processing is fast, but data safety is low (the data is lost when the server restarts).
The difference between MyISAM and InnoDB
The difference between MyISAM index and InnoDB index?
- InnoDB index is a clustered index, MyISAM index is a non-clustered index.
- The leaf nodes of InnoDB's primary key index store row data, so the primary key index is very efficient.
- The leaf node of the MyISAM index stores the row data address, which needs to be addressed again to get the data.
- The leaf nodes of the InnoDB non-primary key index store the primary key and other indexed column data, so it is very efficient to cover the index when querying.
4 major features of InnoDB engine
- Insert buffer
- Double write
- Adaptive hash index (ahi)
- Read ahead
Storage engine selection
If there is no special requirement, just use the default Innodb.
MyISAM: applications dominated by reads and inserts, such as blog systems and news portals.
InnoDB: when updates and deletes are also frequent, or data integrity must be guaranteed, or concurrency is high and transactions and foreign keys are required, such as an OA (office automation) system.
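To see which engines a server supports and to pick one explicitly, a sketch with a hypothetical oa_task table:
SHOW ENGINES;
CREATE TABLE oa_task (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(100)
) ENGINE = InnoDB;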
Indexes
What is an index?
Indexes are a special type of file (on InnoDB tables the index is part of the table space); they contain reference pointers to all records in the data table.
An index is a data structure: a sorted structure maintained by the database management system to help query and update table data quickly. Indexes are usually implemented with B-trees and their variant, the B+ tree.
More informally, an index is like the table of contents of a book: to make it easy to find content, the contents are indexed into a catalog. An index is a file, and it occupies physical space.
What are the advantages and disadvantages of indexes?
Advantages of indexing
- Can greatly speed up the data retrieval speed, which is also the main reason for creating an index.
- By using an index, the query optimizer can be used during query processing to improve the performance of the system.
Disadvantages of indexing
- In terms of time: creating and maintaining indexes takes time. Specifically, when adding, deleting, and modifying data in the table, the index must also be dynamically maintained, which will reduce the execution efficiency of addition/change/deletion;
- Space: The index needs to take up physical space.
Index usage scenarios (emphasis)
- where
For example, when records are queried by id and the id column has only a primary key index, the only index available to this SQL statement is the primary key index. If several indexes were available, the better one would be chosen as the basis for retrieval.
-- add a column that has no index
alter table innodb1 add sex char(1);
-- when filtering by sex, the available index is NULL
EXPLAIN SELECT * from innodb1 where sex='男';
You can note the query efficiency while a field is not indexed, then create an index on it (ALTER TABLE table_name ADD INDEX (column_name)) and execute the same SQL again: you will find that the query efficiency improves significantly (the larger the data volume, the more obvious), as shown below.
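Continuing the innodb1 snippet above (a sketch; idx_sex is a name chosen here):
-- Create the index, then run the same query again
ALTER TABLE innodb1 ADD INDEX idx_sex (sex);
EXPLAIN SELECT * FROM innodb1 WHERE sex = '男';
-- possible_keys now lists idx_sex instead of NULL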
- order by
When we use ORDER BY to sort the query results by a field that is not indexed, the execution plan performs an external sort over all the data matched by the query (the data is read from disk in batches, sorted in memory, and the partial results are finally merged). This hurts performance badly, because all the data involved in the query must be read from disk into memory (efficiency drops further if single rows are large or the data volume is large), and the sort still has to happen after the data is read in.
But if we create an index on that field (ALTER TABLE table_name ADD INDEX (column_name)), then because the index itself is ordered, the data can be retrieved directly in index order through the index-to-row mapping. And when paging, only the rows corresponding to a given range of the index are fetched, instead of fetching all the data, sorting it and then returning a range as described above (fetching data from disk is what hurts performance most). A sketch follows below.
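A sketch of the difference, assuming a hypothetical created_at column on innodb1:
-- Without an index on created_at, EXPLAIN typically shows "Using filesort"
EXPLAIN SELECT * FROM innodb1 ORDER BY created_at LIMIT 10;
-- After adding the index, rows can be read in index order and the filesort can be avoided
ALTER TABLE innodb1 ADD INDEX idx_created_at (created_at);
EXPLAIN SELECT * FROM innodb1 ORDER BY created_at LIMIT 10;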
- join
Indexing the columns used in the join condition (ON) can improve efficiency.
- Index coverage
If the fields to be queried have been indexed, the engine will directly query in the index table without accessing the original data (otherwise, as long as one field is not indexed, it will do a full table scan). This is called index coverage. Therefore, we need to write only the necessary query fields after select as much as possible to increase the probability of index coverage.
Note that this does not mean you should create an index on every field, because part of an index's advantage lies in its small size.
What are the types of indexes?
- Primary key index: The data column is not allowed to be repeated, and it is not allowed to be NULL. A table can only have one primary key.
- Unique index: values in the column must not repeat, but NULL values are allowed. A table may have unique indexes on multiple columns.
A unique index can be created with ALTER TABLE table_name ADD UNIQUE (column); and a unique composite index with ALTER TABLE table_name ADD UNIQUE (column1, column2);
- Ordinary index: the basic index type, with no uniqueness restriction; NULL values are allowed.
An ordinary index can be created with ALTER TABLE table_name ADD INDEX index_name (column); and a composite index with ALTER TABLE table_name ADD INDEX index_name (column1, column2, column3);
- Full-text index: a key technology currently used by search engines.
A full-text index can be created with ALTER TABLE table_name ADD FULLTEXT (column);
Index data structure (b tree, hash)
The data structure of an index depends on the specific storage engine. The most commonly used index structures in MySQL are the hash index and the B+ tree index, and the default index implementation of the InnoDB storage engine is the B+ tree index. For a hash index, the underlying data structure is a hash table, so when most queries are single-record lookups, a hash index gives the fastest query performance; for most other scenarios, a BTree index is recommended.
- B-tree index
MySQL accesses data through the storage engine, and roughly 90% of users use InnoDB. By implementation, InnoDB currently offers only two index types: the BTREE (B-tree) index and the HASH index. The B-tree index is the most frequently used index type in MySQL, and basically all storage engines support it. Unless stated otherwise, the index we talk about refers to the B-tree index (actually implemented as a B+ tree; because MySQL always prints BTREE when showing a table's indexes, it is commonly called a B-tree index).
Query method:
- Primary key index area (PI): stores the address of the saved row data, so querying by primary key goes straight to the data.
- Secondary index area (SI): stores the associated primary key id, which is then used to reach the address above. This is why querying by primary key is the fastest.
B+tree properties :
- 1.) The nodes of n subtrees contain n keywords, which are not used to store data but to store the index of the data.
- 2.) All leaf nodes contain information about all keywords, and pointers to records containing these keywords, and the leaf nodes themselves are linked in order of the size of the keywords from small to large.
- 3.) All non-terminal nodes can be regarded as index parts, and the nodes only contain the largest (or smallest) keywords in their subtrees.
- 4.) In the B+ tree, the insertion and deletion of data objects are only performed on the leaf nodes.
- 5.) The B+ tree has 2 head pointers, one is the root node of the tree, and the other is the leaf node with the smallest key code.
- Hash index
Briefly, it is similar to a simple hash table in data structures. When MySQL uses a hash index, it mainly applies a hash algorithm (common hash algorithms include direct addressing, mid-square, folding, division-remainder and random-number methods) to convert the column value into a fixed-length hash value and stores the row pointer of that record at the corresponding slot of the hash table; if a hash collision occurs (two different keys produce the same hash value), the entries are stored as a linked list under that hash key. This is, of course, only a simplified description.
Basic principles of indexing
Indexes are used to quickly find records with specific values. If there is no index, generally the entire table is traversed when the query is executed.
The principle of an index is simple: turn unordered data into something that can be queried in order.
- Sort the contents of the indexed column
- Generate an inverted list of sort results
- Put the data address chain on the content of the inverted table
- When querying, first get the contents of the inverted table, and then take out the data address chain, so as to get the specific data
What are the indexing algorithms?
Index algorithm has BTree algorithm and Hash algorithm.
- BTree algorithm
BTree is the most commonly used mysql database index algorithm, and it is also the default algorithm of mysql. Because it can be used not only in the comparison operators =, >, >=, <, <= and between, but also in the like operator, as long as its query condition is a constant that does not start with a wildcard, for example :
-- as long as the query condition is a constant that does not start with a wildcard
select * from user where name like 'jack%';
-- if it starts with a wildcard, or no constant is used, the index will not be used, for example:
select * from user where name like '%jack';
- Hash algorithm
A hash index can only be used for equality comparisons, such as the = and <=> (NULL-safe equal) operators. Since the data is located in a single lookup, unlike a BTree index which needs multiple IO accesses from the root node through branch nodes and finally to the leaf page, its retrieval efficiency is much higher than that of a BTree index.
The principle of index design?
- The column suitable for indexing is the column that appears in the where clause, or the column specified in the join clause
- Columns with low cardinality benefit little from indexing; there is no need to create an index on such columns
- Use a short index, if you index a long string column, you should specify a prefix length, which can save a lot of index space
- Don't over-index. Indexing requires additional disk space and reduces the performance of write operations. When the table content is modified, the index will be updated or even reconstructed. The more index columns, the longer this time will be. So only maintain the required index to facilitate the query.
The principle of creating an index (the top priority)
Indexes are good, but they should not be used without limit; it is best to follow these principles:
- 1) The leftmost prefix matching principle, a very important principle for composite indexes: MySQL keeps matching to the right until it meets a range query (>, <, BETWEEN, LIKE) and then stops matching. For example, for a = 1 AND b = 2 AND c > 3 AND d = 4: if an index in the order (a, b, c, d) is created, d will not use the index; if an index on (a, b, d, c) is created, all columns can be used, and the order of a, b, d in the query can be adjusted arbitrarily.
- 2) Create indexes for fields that are frequently used as query conditions
- 3) Frequently updated fields are not suitable for index creation
- 4) Columns that cannot effectively distinguish the data are not suitable as index columns (such as gender: male, female, unknown, at most three values, so the discrimination is too low)
- 5) Expand the index as much as possible, do not create a new index. For example, there is already an index of a in the table, and now you want to add an index of (a, b), you only need to modify the original index.
- 6) Data columns with foreign keys must be indexed.
- 7) For columns that are rarely involved in the query, do not create indexes for columns with more duplicate values.
- 8) Do not create indexes for columns defined as text, image, and bit data types.
Three ways to create an index, delete an index
The first way: create an index when executing CREATE TABLE
CREATE TABLE user_index2 (
  id INT auto_increment PRIMARY KEY,
  first_name VARCHAR (16),
  last_name VARCHAR (16),
  id_card VARCHAR (18),
  information text,
  KEY name (first_name, last_name),
  FULLTEXT KEY (information),
  UNIQUE KEY (id_card)
);
The second way: use the ALTER TABLE command to add an index
ALTER TABLE table_name ADD INDEX index_name (column_list);
- ALTER TABLE is used to create ordinary indexes, UNIQUE indexes or PRIMARY KEY indexes.
- Among them, table_name is the name of the table whose index is to be added, and column_list indicates which columns to index. When there are multiple columns, each column is separated by a comma.
- The index name index_name can be chosen by yourself; by default, MySQL assigns a name based on the first index column. In addition, ALTER TABLE allows several alterations in a single statement, so multiple indexes can be created at the same time.
The third way: use the CREATE INDEX command to create
CREATE INDEX index_name ON table_name (column_list);
CREATE INDEX can add ordinary index or UNIQUE index to the table. (However, PRIMARY KEY index cannot be created)
Delete index
Delete ordinary indexes, unique indexes and full-text indexes by index name: ALTER TABLE table_name DROP KEY index_name
alter table user_index drop KEY name;
alter table user_index drop KEY id_card;
alter table user_index drop KEY information;
Delete the primary key index: ALTER TABLE table_name DROP PRIMARY KEY (no name is needed because there is only one primary key). Note that if the primary key column is auto-increment, this cannot be done directly (auto-increment depends on the primary key index):
You need to remove auto-increment first and then drop the primary key:
alter table user_index
-- redefine the column without auto_increment
MODIFY id int,
drop PRIMARY KEY;
But the primary key is usually not deleted, because the design of the primary key must have nothing to do with business logic.
What should I pay attention to when creating an index?
- Non-empty fields: You should specify the column as NOT NULL, unless you want to store NULL. In mysql, columns with null values are difficult to query optimization because they make indexes, index statistics, and comparison operations more complicated. You should replace the null value with 0, a special value, or an empty string;
- Columns with highly discrete values (many distinct values) should be placed at the front of a composite index. You can check a column's number of distinct values with COUNT(DISTINCT column): the larger the result, the more unique values the column has and the higher its discreteness;
- The smaller the index field, the better: the data storage of the database is based on the page as the unit. The more data stored in one page, the more data obtained by one IO operation, the higher the efficiency.
Can index query definitely improve query performance? why
Generally, querying data through an index is faster than a full table scan. But we must also pay attention to its cost.
- The index needs storage space and regular maintenance. Whenever a record is added to or removed from the table, or an indexed column is modified, the index itself must be updated as well; this means every INSERT, DELETE and UPDATE costs 4 or 5 extra disk I/Os. Because indexes need extra storage and processing, unnecessary indexes slow down query response time. Using an index does not necessarily improve query performance; an index range scan (INDEX RANGE SCAN) is suitable for two situations:
- Based on a range of retrieval, the general query returns a result set that is less than 30% of the number of records in the table
- Retrieval based on non-unique index
How to delete millions of data or more
Regarding indexes: an index has extra maintenance cost because index files are separate files, so adding, modifying or deleting data triggers extra operations on the index files; these need additional IO and lower the efficiency of inserts, updates and deletes. Therefore, when we delete millions of rows from the database, the MySQL manual shows that the time taken to delete data grows with the number of indexes created.
- So when we want to delete millions of data, we can delete the index first (it takes more than three minutes at this time)
- Then delete the useless data (this process takes less than two minutes)
- After the deletion is completed, the index is recreated (the data is less at this time). The index creation is also very fast, about ten minutes.
- Compared with deleting directly, this is definitely much faster; besides, if a direct delete is interrupted, everything gets rolled back, which is an even bigger trap.
Prefix index
Syntax: index(field(10)), use the first 10 characters of the field value to build the index, the default is to use the entire content of the field to build the index.
Prerequisite: The prefix has a high degree of identification. For example, passwords are suitable for establishing a prefix index, because passwords are almost different.
The practical difficulty lies in choosing the prefix length.
We can use SELECT COUNT(*)/COUNT(DISTINCT LEFT(password, prefixLen)) FROM table_name;
and, by adjusting the value of prefixLen (incrementing from 1), view the average number of rows matched by each distinct prefix of that length; when it approaches 1, the first prefixLen characters of a password almost uniquely determine a single record. A sketch follows below.
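A sketch under the assumption of a users table with a password column (the table and column names here are illustrative):
-- Average rows per prefix for a few candidate lengths; closer to 1 means more selective
SELECT
  COUNT(*) / COUNT(DISTINCT LEFT(password, 4))  AS avg_match_4,
  COUNT(*) / COUNT(DISTINCT LEFT(password, 8))  AS avg_match_8,
  COUNT(*) / COUNT(DISTINCT LEFT(password, 12)) AS avg_match_12
FROM users;
-- Once a suitable length is chosen, build the prefix index
ALTER TABLE users ADD INDEX idx_password (password(8));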
What is the leftmost prefix principle? What is the leftmost matching principle
- As the name implies, it is the leftmost first. When creating a multi-column index, according to business needs, the most frequently used column in the where clause is placed on the leftmost.
- The leftmost prefix matching principle is a very important principle: MySQL keeps matching to the right until it meets a range query (>, <, BETWEEN, LIKE) and then stops matching. For example, for a = 1 AND b = 2 AND c > 3 AND d = 4: if you create an index in the order (a, b, c, d), d will not use the index; if you create an index on (a, b, d, c), all columns can be used, and the order of a, b, d in the query can be adjusted arbitrarily.
- = and IN can be in any order; for example, a = 1 AND b = 2 AND c = 3 can use an (a, b, c) index created in any order, because MySQL's query optimizer rewrites the query into a form the index can recognize. A sketch follows below.
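A sketch of the leftmost-prefix rule with a hypothetical table t_abc and a composite index on (a, b, c):
CREATE TABLE t_abc (
  a INT, b INT, c INT, d INT,
  INDEX idx_abc (a, b, c)
);
-- Uses the index on all three columns (equality on a, b, c; order in the WHERE clause does not matter)
EXPLAIN SELECT * FROM t_abc WHERE c = 3 AND a = 1 AND b = 2;
-- The range on b stops the match: only the (a, b) parts of the index are used for lookup, c is filtered afterwards
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b > 2 AND c = 3;
-- Skipping the leftmost column a means the index cannot be used for lookups at all
EXPLAIN SELECT * FROM t_abc WHERE b = 2 AND c = 3;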
The difference between B tree and B+ tree
In a B-tree, you can store keys and values in internal nodes and leaf nodes; but in a B+ tree, the internal nodes are all keys without values, and the leaf nodes store both keys and values.
The leaf nodes of a B+ tree are linked together in a linked list, while the leaf nodes of a B-tree are independent.
Benefits of using B-tree
B-trees can store keys and values at the same time in internal nodes. Therefore, placing frequently accessed data close to the root node will greatly improve the query efficiency of hot data. This feature makes B-trees more efficient in scenarios where specific data is repeatedly queried multiple times.
Benefits of using B+ trees
Since the internal nodes of the B+ tree store only keys, not values, one read can bring more keys into a memory page, which helps narrow the search range faster. The leaf nodes of the B+ tree are linked together, so when a full data traversal is needed, the B+ tree only needs O(log N) time to find the smallest node and can then perform an O(N) sequential traversal through the linked list. A B-tree has to traverse every level of the tree, which requires more memory swapping and therefore takes more time.
What is the difference or pros and cons between Hash index and B+ tree?
First of all, we must know the underlying implementation principle of Hash index and B+ tree index:
The bottom layer of the hash index is the hash table. When searching, the corresponding key value can be obtained by calling the hash function once, and then back to the table to query to obtain the actual data. The bottom layer of the B+ tree is a multi-way balanced search tree. For each query, it starts from the root node, finds the leaf node to get the key value, and then judges whether it needs to return to the table to query the data according to the query.
Then it can be seen that they have the following differences:
- Hash index is faster to perform equivalent query (in general), but it cannot perform range query.
Because after the hash function is applied to build the index, the order of the index no longer matches the original order of the values, so range queries cannot be supported. In a B+ tree all nodes follow the ordering rule (left children are smaller than the parent, right children are larger, and the same generalizes to a multi-way tree), so ranges are naturally supported.
Hash index does not support the use of index for sorting, the principle is the same as above.
- Hash index does not support fuzzy query and leftmost prefix matching of multi-column index. The principle is also because the hash function is unpredictable. The indexes of AAAA and AAAAB are not related.
- A hash index can never avoid going back to the table for the data, while a B+ tree can complete the query with the index alone when certain conditions are met (clustered index, covering index, etc.).
- Although the hash index is faster in equivalent queries, it is not stable. The performance is unpredictable. When a key value has a large number of repetitions, a hash collision occurs, and the efficiency may be extremely poor. The query efficiency of the B+ tree is relatively stable, and all queries are from the root node to the leaf node, and the height of the tree is low.
Therefore, in most cases, direct selection of B+ tree index can obtain stable and better query speed. There is no need to use a hash index.
Why the database uses B+ tree instead of B tree
- B tree is only suitable for random retrieval, while B+ tree supports both random retrieval and sequential retrieval;
- The B+ tree space utilization is higher, which can reduce the number of I/Os, and the disk read and write costs are lower. Generally speaking, the index itself is also very large, and it is impossible to store all of it in the memory, so the index is often stored on the disk in the form of an index file. In this case, disk I/O consumption will occur during the index lookup process. The internal node of the B+ tree does not have a pointer to the specific information of the keyword. It is only used as an index. Its internal node is smaller than the B tree. The number of keywords in the node that can be accommodated by the disk block is more, and it is read into the memory at one time The more keywords can be searched, and the number of IO reads and writes is reduced. The number of IO reads and writes is the biggest factor affecting index retrieval efficiency;
- The query efficiency of the B+ tree is more stable. A B-tree search may end at a non-leaf node: the closer the key is to the root, the shorter the search, and once the keyword is found the existence of the record is determined; its performance is equivalent to a binary search over the full key set. In a B+ tree, the advantage for sequential search is more obvious, and for random search every keyword lookup takes a path from the root node to a leaf node, so the search path length is the same for all keywords, which makes the query efficiency of every keyword comparable.
- While B-tree improves disk IO performance, it does not solve the problem of low efficiency of element traversal. The leaf nodes of the B+ tree are connected together in the order of pointers, and the entire tree can be traversed as long as the leaf nodes are traversed. Moreover, range-based queries in the database are very frequent, and B-trees do not support such operations.
- It is more efficient when adding or deleting files (nodes). Because the leaf nodes of the B+ tree contain all keywords and are stored in an ordered linked list structure, this can improve the efficiency of addition and deletion.
B+ tree does not need to go back to the table to query data when it satisfies the clustered index and the covering index.
In the B+ tree index, the leaf node may store the current key value, or it may store the current key value and the entire row of data. This is the clustered index and the non-clustered index. In InnoDB, only the primary key index is a clustered index. If there is no primary key, a unique key is selected to build a clustered index. If there is no unique key, a key is implicitly generated to build a clustered index.
When the query uses a clustered index, the entire row of data can be obtained at the corresponding leaf node, so there is no need to perform back-to-table query again.
What is a clustered index? When to use clustered and non-clustered indexes
- Clustered index: put the data storage and index together, find the index and find the data
- Non-clustered index: the index is stored in a separate structure whose leaf nodes point to the corresponding data rows. MyISAM caches the index in memory via the key_buffer: when data is accessed through the index, the index is first searched in memory and the corresponding data is then read from disk through it, which is why queries are slow when the key buffer is missed.
To clarify a concept: in InnoDB, indexes created on top of the clustered index are called secondary (auxiliary) indexes. Accessing data through a secondary index always requires a second lookup. Non-clustered indexes are all secondary indexes, such as composite indexes, prefix indexes and unique indexes; a secondary index's leaf node no longer stores the physical location of the row but the primary key value.
When to use clustered and non-clustered indexes
Will non-clustered indexes always return to table queries?
Not necessarily. This relates to whether all the fields required by the query statement hit the index. If all the fields hit the index, then there is no need to perform a back-to-table query.
For a simple example, suppose we build an index on the age column of the employee table. For the query SELECT age FROM employee WHERE age < 20, the index leaf node already contains the age information, so no back-to-table query is performed; see the sketch below.
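A sketch of that example (the employee table here is hypothetical):
CREATE TABLE employee (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  age INT,
  INDEX idx_age (age)
);
-- Only age is selected: the secondary index alone answers the query (Extra shows "Using index")
EXPLAIN SELECT age FROM employee WHERE age < 20;
-- Selecting name as well requires a lookup back to the clustered index for each matching row
EXPLAIN SELECT name, age FROM employee WHERE age < 20;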
What is a joint index? Why do we need to pay attention to the order in the joint index?
MySQL can use multiple fields to create an index at the same time, called a joint index. In the joint index, if you want to hit the index, you need to use them one by one in the order of the fields when creating the index, otherwise the index cannot be hit.
The specific reasons are:
When MySQL uses an index, the index must be ordered. Suppose a composite index on (name, age, school) is created; the index is ordered first by name, then by age when names are equal, and then by school when ages are also equal.
When querying, the index as a whole is strictly ordered only by name, so the name field must be used first with an equality lookup; within the matching entries, the rows are strictly ordered by age, so the age field can then be used for an index search, and so on. Therefore, pay attention to the order of index columns when creating a composite index: generally put the columns that are queried most frequently or that have high selectivity first, and make individual adjustments for special cases of the query or table structure.
Transactions
What is a database transaction?
A transaction is an indivisible sequence of database operations and the basic unit of database concurrency control. The result of its execution must change the database from one consistency state to another consistency state. A transaction is a logical set of operations, either all of them are executed or none of them are executed.
The most classic transaction is often cited as an example of transfer.
If Xiao Ming wants to transfer 1000 yuan to Xiaohong, this transfer will involve two key operations: reduce Xiao Ming's balance by 1000 yuan, and increase Xiaohong's balance by 1000 yuan. In case there is a sudden error between these two operations, such as a banking system crash, which causes Xiaoming's balance to decrease but Xiaohong's balance does not increase, this would be wrong. The transaction is to ensure that these two key operations either succeed or both fail.
What are the four characteristics of transactions (ACID)?
Relational databases need to follow the ACID rules, the specific content is as follows:
- Atomicity: Transaction is the smallest unit of execution and division is not allowed. The atomicity of the transaction ensures that the actions are either all completed or completely ineffective;
- Consistency: Before and after the transaction is executed, the data remains consistent, and the results of multiple transactions reading the same data are the same;
- Isolation: When concurrently accessing the database, a user's transaction is not interfered by other transactions, and the database is independent among concurrent transactions;
- Durability (persistence): after a transaction is committed, its changes to the data in the database are permanent; even if the database fails, they should not be affected.
What is dirty read? Phantom reading? Not repeatable?
Dirty read: one transaction has updated some data and another transaction reads that same data; if the first transaction then rolls back for some reason, the data read by the second transaction is incorrect.
Non-repeatable read: the data read by two queries within one transaction is inconsistent. This can happen when another transaction updates the original data in between the two queries.
Phantom read: the number of rows returned by two queries within one transaction is inconsistent. For example, one transaction queries several rows of data while another transaction inserts new rows at the same time; in the next query, the first transaction finds rows that were not there before.
What is the isolation level of a transaction? What is the default isolation level of MySQL?
In order to support the four ACID properties, databases define four transaction isolation levels, from low to high: Read uncommitted, Read committed, Repeatable read and Serializable. These levels address problems such as dirty reads, non-repeatable reads and phantom reads.
The SQL standard defines four isolation levels:
- READ-UNCOMMITTED (read uncommitted): The lowest isolation level, allowing to read uncommitted data changes, which may cause dirty reads, phantom reads or non-repeatable reads.
- READ-COMMITTED (read committed): Allows to read data that has been committed by concurrent transactions, which can prevent dirty reads, but phantom reads or non-repeatable reads may still occur.
- REPEATABLE-READ (repeatable read): The results of multiple reads of the same field are consistent, unless the data is modified by the transaction itself, which can prevent dirty reads and non-repeatable reads, but phantom reads may still occur.
- SERIALIZABLE (serializable): The highest isolation level, fully compliant with the ACID isolation level. All transactions are executed one by one, so that there is no interference between transactions, that is, this level can prevent dirty reads, non-repeatable reads, and phantom reads.
Note: MySQL uses the REPEATABLE-READ isolation level by default, while Oracle uses READ-COMMITTED by default.
Transaction isolation is implemented with locking and concurrency scheduling. Concurrency scheduling uses MVCC (Multi-Version Concurrency Control), which supports features such as consistent non-locking reads and rollback by keeping the old versions of modified rows.
Because lower isolation levels require fewer locks, most database systems default to READ-COMMITTED; but note that the InnoDB storage engine defaults to REPEATABLE-READ without any noticeable performance loss.
The InnoDB storage engine generally uses the SERIALIZABLE (serializable) isolation level in the case of distributed transactions.
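To check or change the isolation level on a running server (a sketch; the variable is named transaction_isolation in MySQL 8.0 and tx_isolation in older versions):
-- Current session isolation level
SELECT @@transaction_isolation;
-- Switch the current session to READ COMMITTED
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;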
Locks
Do you know MySQL locks?
When the database has concurrent transactions, data inconsistencies may occur. At this time, some mechanism is needed to ensure the order of access, and the lock mechanism is such a mechanism.
Just like a hotel room: if everyone could enter and leave at will, several people would fight over the same room. So a lock is put on the door; the person who gets the key can check in and lock the room, and others must wait until he has finished using it before it can be used again.
The relationship between isolation level and lock
Under the Read Uncommitted level, read data does not need to add a shared lock, so that it will not conflict with the exclusive lock on the modified data
Under the Read Committed level, the read operation needs to add a shared lock, but the shared lock is released after the statement is executed;
Under the Repeatable Read level, read operations need to add a shared lock, but the shared lock is not released before the transaction is committed, that is, the shared lock must be released after the transaction is completed.
SERIALIZABLE is the most restrictive isolation level, because this level locks the entire range of keys and holds the lock until the transaction is completed.
According to the granularity of the lock, what are the database locks? Locking mechanism and InnoDB lock algorithm
In relational databases, database locks can be divided into row-level locks (INNODB engine), table-level locks (MYISAM engine), and page-level locks (BDB engine) according to the granularity of locks.
Locks used by MyISAM and InnoDB storage engines:
- MyISAM uses table-level locking.
- InnoDB supports row-level locking and table-level locking, the default is row-level locking
Row-level lock, table-level lock and page-level lock comparison
Row-level lock is the lock with the finest granularity in MySQL, meaning only the row being operated on is locked. Row-level locks can greatly reduce conflicts in database operations. The locking granularity is the smallest, but the locking overhead is also the largest. Row-level locks are divided into shared locks and exclusive locks.
Features: high overhead, slow locking; deadlocks can occur; the locking granularity is the smallest, the probability of lock conflicts is the lowest, and the concurrency is the highest.
Table-level lock is the lock with the largest granularity in MySQL, meaning the entire table being operated on is locked. It is simple to implement and consumes few resources, and it is supported by most MySQL engines; both MyISAM and InnoDB support table-level locking. Table-level locks are divided into table shared read locks (shared locks) and table exclusive write locks (exclusive locks).
Features: low overhead, fast locking; no deadlocks; large locking granularity, the highest probability of lock conflicts, and the lowest concurrency.
Page-level lock is a lock in MySQL whose granularity is between row-level and table-level locks. Table-level locks are fast but conflict a lot, while row-level locks conflict little but are slow, so a compromise page level locks a set of adjacent records at a time.
Features: overhead and locking time are between table locks and row locks; deadlocks can occur; the locking granularity is between table locks and row locks, and the concurrency is moderate.
What kind of locks does MySQL have in terms of lock categories? Isn't it a bit of a hindrance to concurrency efficiency to lock like the above?
In terms of lock categories, there are shared locks and exclusive locks.
Shared lock: also known as read lock. When the user wants to read data, add a shared lock to the data. Multiple shared locks can be added at the same time.
Exclusive lock: Also called write lock. When the user wants to write data, an exclusive lock is added to the data. Only one exclusive lock can be added, and it is mutually exclusive with other exclusive locks and shared locks.
Using the hotel example above, there are two kinds of user behavior. One is viewing the room: it is fine for several users to view it together. The other is actually staying the night: during that period, no one else can check in or even view the room.
The granularity of the lock depends on the specific storage engine. InnoDB implements row-level locks, page-level locks, and table-level locks.
Their locking overhead is from large to small, and their concurrency is also from large to small.
How is the row lock of the InnoDB engine implemented in MySQL?
Answer: InnoDB is based on the index to complete the row lock
Example: select * from tab_with_index where id = 1 for update;
FOR UPDATE acquires row locks based on the condition, and id is an indexed column. If id were not an indexed column, InnoDB would take a table lock instead, and concurrency would be out of the question.
There are three lock algorithms for the InnoDB storage engine
- Record lock: the lock on a single row record
- Gap lock: gap lock, lock a range, excluding the record itself
- Next-key lock: record+gap locks a range, including the record itself
Related knowledge points:
- Innodb uses next-key lock for row queries
- Next-key locking is used to solve the phantom problem
- When the queried index has the unique property, the next-key lock is downgraded to a record lock
- The purpose of Gap lock design is to prevent multiple transactions from inserting records into the same range, which will lead to phantom reading problems.
- There are two ways to explicitly close the gap lock: (except for foreign key constraints and uniqueness checks, only record lock is used in other cases) A. Set the transaction isolation level to RC B. Set the parameter innodb_locks_unsafe_for_binlog to 1
What is a deadlock? How to deal with it?
Deadlock refers to the situation in which two or more transactions each hold locks on resources the other needs and request locks on each other's resources, leading to a circular wait.
Common ways to solve deadlock
- 1. If different programs will access multiple tables concurrently, try to agree to access the tables in the same order, which can greatly reduce the chance of deadlock.
- 2. In the same transaction, try to lock all the resources needed at once to reduce the probability of deadlock;
- 3. For business parts that are very prone to deadlocks, you can try to upgrade the lock granularity and reduce the probability of deadlocks through table-level locking;
If the business is not handled well, you can use distributed transaction locks or use optimistic locks
What are the optimistic and pessimistic locking of the database? How did it happen?
The task of concurrency control in a database management system (DBMS) is to ensure that when multiple transactions access the same data in the database at the same time, the isolation and unity of the transaction and the unity of the database are not destroyed. Optimistic concurrency control (optimistic locking) and pessimistic concurrency control (pessimistic locking) are the main technical methods used for concurrency control.
Pessimistic lock: assumes that concurrency conflicts will occur and blocks every operation that might violate data integrity. The data is locked when it is read, and the lock is held until the transaction commits. Implementation: the locking mechanism in the database.
Optimistic lock: assumes that concurrency conflicts will not occur and only checks for violations of data integrity when the operation is committed. Conflicts are detected when the data is modified, typically via a version field. Implementation: usually a version-number mechanism or the CAS algorithm.
Use scenarios of two locks
From the introduction of the two kinds of locks above, we know that each has its advantages and disadvantages, and neither can be considered better than the other. For example, optimistic locks are suitable for write-light (read-heavy) scenarios, that is, when conflicts rarely actually occur; this saves the overhead of locking and increases the overall throughput of the system.
However, in write-heavy scenarios conflicts occur frequently, which causes the upper-layer application to keep retrying and reduces performance, so a pessimistic lock is more appropriate there. A sketch of both patterns follows below.
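A sketch of both patterns on a hypothetical account table (the version column is an application-level convention, not a built-in MySQL feature):
-- Optimistic: detect conflicts at update time via a version column
UPDATE account
SET balance = balance - 100,
    version = version + 1
WHERE id = 1
  AND version = 7;     -- the version read earlier; 0 rows affected means a conflict, so retry

-- Pessimistic: take the row lock up front inside a transaction
START TRANSACTION;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
UPDATE account SET balance = balance - 100 WHERE id = 1;
COMMIT;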
Views
Why use views? What is a view?
In order to improve the reusability of complex SQL statements and the security of table operations, the MySQL database management system provides a view feature. The so-called view is essentially a virtual table, which does not exist physically. Its content is similar to a real table and contains a series of named column and row data. However, the view does not exist in the database in the form of stored data values. Row and column data come from the basic table referenced by the query that defines the view, and are dynamically generated when the view is specifically referenced.
The view enables developers to only care about certain specific data of interest and specific tasks, and can only see the data defined in the view, not the data in the table referenced by the view, thereby improving the security of the data in the database .
What are the characteristics of the view?
The characteristics of the view are as follows:
- The columns of the view can come from different tables, which are abstractions of tables and new relationships established in a logical sense.
- A view is a table (virtual table) generated from a basic table (real table).
- The creation and deletion of views does not affect the basic table.
- Updates (additions, deletions and modifications) to the view content directly affect the basic table.
- When the view comes from multiple basic tables, adding and deleting data is not allowed.
View operations include creating a view, viewing a view, deleting a view, and modifying a view.
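A minimal sketch of those operations, reusing the hypothetical employee table from the index section:
-- Create a view over the base table
CREATE VIEW v_young_employee AS
SELECT id, name, age FROM employee WHERE age < 30;

-- Query it like a table, inspect its definition, then drop it
SELECT * FROM v_young_employee;
SHOW CREATE VIEW v_young_employee;
DROP VIEW v_young_employee;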
What are the usage scenarios for views?
The basic purpose of the view: simplify sql query and improve development efficiency. If there is another use, it is to be compatible with the old table structure.
The following are common usage scenarios for views:
- Reuse SQL statements;
- Simplify complex SQL operations. After writing a query, you can easily reuse it without knowing its basic query details;
- Use part of the table instead of the entire table;
- Protect data. You can grant users access to specific parts of the table instead of access to the entire table;
- Change the data format and presentation. The view can return data that is different from the representation and format of the underlying table.
Advantages of views
- The query is simplified. Views can simplify user operations
- Data security. View allows users to view the same data from multiple angles, and can provide security protection for confidential data
- Logical data independence. Views provide a certain degree of logical independence for reconstructing the database
Disadvantages of the view
Performance. The database must translate a query on the view into a query on the base tables. If the view is defined by a complex multi-table query, then even a simple query against the view is turned into a complex combined query by the database, which takes a certain amount of time.
Modification restrictions. When the user tries to modify certain rows of the view, the database must translate this into a modification of certain rows of the base tables; the same applies when inserting into or deleting from the view. For simple views this is convenient, but more complex views may not be updatable.
Such views have the following characteristics: 1. views with set operators such as UNION; 2. views with a GROUP BY clause; 3. views with aggregate functions such as AVG/SUM/MAX; 4. views using the DISTINCT keyword; 5. views over joined tables (with some exceptions).
What is a cursor?
A cursor is a data buffer created by the system for the user to hold the results of SQL statement execution. Each cursor area has a name. Through the cursor, the user can fetch records one by one, assign them to host variables, and let the host language process them further. A minimal sketch follows below.
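In MySQL a cursor can only be used inside stored programs; a minimal sketch, again assuming the hypothetical employee table:
DELIMITER //
CREATE PROCEDURE sum_ages(OUT total INT)
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE v_age INT;
  DECLARE cur CURSOR FOR SELECT age FROM employee;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  SET total = 0;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO v_age;
    IF done = 1 THEN
      LEAVE read_loop;
    END IF;
    SET total = total + v_age;
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;

CALL sum_ages(@total);
SELECT @total;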
Stored procedures and functions
What is a stored procedure? What are the advantages and disadvantages?
A stored procedure is a set of pre-compiled SQL statements. Its advantage is that it allows modular design: it only needs to be created once and can then be called many times from programs. If an operation needs to execute SQL many times, using a stored procedure is faster than sending plain SQL statements.
advantage
- 1) The stored procedure is pre-compiled, and the execution efficiency is high.
- 2) The code of the stored procedure is directly stored in the database, and the stored procedure name is directly called to reduce network communication.
- 3) High security, users with certain permissions are required to execute stored procedures.
- 4) Stored procedures can be reused, reducing the workload of database developers.
Disadvantage
- 1) Debugging is troublesome (although debugging with PL/SQL Developer is quite convenient and makes up for this shortcoming).
- 2) Portability: database-side code is, of course, tied to the database. For an in-house engineering project, however, migration is rarely an issue.
- 3) Recompilation: because the server-side code is compiled before it runs, if a referenced object changes, the affected stored procedures and packages need to be recompiled (although this can also be set to compile automatically at runtime).
- 4) If a system uses a large number of stored procedures, the data structures will change as user requirements grow after the system is delivered, which leads to follow-on problems; in the end, maintaining such a system becomes very difficult and costly.
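A minimal stored procedure sketch (using the hypothetical employee table again):
DELIMITER //
CREATE PROCEDURE get_employee_by_age(IN p_age INT)
BEGIN
  SELECT id, name FROM employee WHERE age = p_age;
END //
DELIMITER ;

CALL get_employee_by_age(25);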
Triggers
What is a trigger? What are the usage scenarios of triggers?
A trigger is a special event-driven stored procedure defined by the user on the relational table. A trigger is a piece of code that is automatically executed when an event is triggered.
Usage scenarios
- Cascading changes can be achieved through related tables in the database.
- Real-time monitoring of changes in a field in a table requires corresponding processing.
- For example, the serial number of certain services can be generated.
Be careful not to abuse, otherwise it will cause difficulties in maintaining the database and applications.
You need to keep in mind the above basic knowledge points, the focus is to understand the difference between the data types CHAR and VARCHAR, the difference between the table storage engine InnoDB and MyISAM.
What triggers are there in MySQL?
There are the following six triggers in the MySQL database:
- Before Insert
- After Insert
- Before Update
- After Update
- Before Delete
- After Delete
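A sketch of the audit/serial-number use case with hypothetical orders and order_log tables:
DELIMITER //
CREATE TRIGGER trg_order_after_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
  -- Record every newly inserted order in a log table
  INSERT INTO order_log (order_id, created_at) VALUES (NEW.id, NOW());
END //
DELIMITER ;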
Commonly used SQL statements
What are the main types of SQL statements
Data definition language DDL (Data Definition Language): CREATE, DROP, ALTER
These operate on logical structures, including table structures, views and indexes.
Data query language DQL (Data Query Language) SELECT
This is easier to understand, that is, the query operation, with the select keyword. Various simple queries, connection queries, etc. belong to DQL.
Data manipulation language DML (Data Manipulation Language) INSERT, UPDATE, DELETE
These operate on the data itself. Together with the query operations above, DQL and DML make up the insert, delete, update and select operations that most application programmers use daily; the query is special enough to be classified separately as DQL.
Data control function DCL (Data Control Language) GRANT, REVOKE, COMMIT, ROLLBACK
These operate on the security and integrity of the database and can be roughly understood as permission control, etc.
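One statement from each category, on a hypothetical demo table (the GRANT/REVOKE lines assume a reader account already exists):
-- DDL
CREATE TABLE demo (id INT PRIMARY KEY, name VARCHAR(50));
ALTER TABLE demo ADD COLUMN age INT;
-- DML
INSERT INTO demo (id, name, age) VALUES (1, 'jack', 20);
UPDATE demo SET age = 21 WHERE id = 1;
DELETE FROM demo WHERE id = 1;
-- DQL
SELECT id, name FROM demo WHERE age > 18;
-- DCL
GRANT SELECT ON demo TO 'reader'@'%';
REVOKE SELECT ON demo FROM 'reader'@'%';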
What are super keys, candidate keys, primary keys, and foreign keys?
- Super key: The set of attributes that can uniquely identify the tuple in the relationship is called the super key of the relationship mode. An attribute can be used as a super key, and a combination of multiple attributes can also be used as a super key. Super keys include candidate keys and primary keys.
- Candidate key: It is the smallest super key, that is, the super key without redundant elements.
- Primary key: A combination of data columns or attributes in a database table that uniquely and completely identify storage data objects. A data column can only have one primary key, and the value of the primary key cannot be missing, that is, it cannot be a null value (Null).
- Foreign key: The primary key of another table that exists in one table is called the foreign key of this table.
What kinds of SQL constraints are there?
- NOT NULL: The content used to control the field must not be empty (NULL).
- UNIQUE: The content of the control field cannot be repeated. A table allows multiple Unique constraints.
- PRIMARY KEY: also ensures that the column content cannot repeat, but a table allows only one primary key.
- FOREIGN KEY: It is used to prevent the action of destroying the connection between tables, and it can also prevent illegal data from being inserted into the foreign key column, because it must be one of the values in the table it points to.
- CHECK: Used to control the value range of the field.
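A sketch exercising each constraint on hypothetical department and staff tables (CHECK is enforced from MySQL 8.0.16; earlier versions parse it but ignore it):
CREATE TABLE department (
  id INT PRIMARY KEY
);
CREATE TABLE staff (
  id INT PRIMARY KEY,                                -- PRIMARY KEY: unique, not null, one per table
  name VARCHAR(50) NOT NULL,                         -- NOT NULL
  email VARCHAR(100) UNIQUE,                         -- UNIQUE: several allowed per table
  age INT CHECK (age >= 18),                         -- CHECK: restricts the value range
  dept_id INT,
  FOREIGN KEY (dept_id) REFERENCES department(id)   -- FOREIGN KEY: must match a department.id
);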
Six related queries
- Cross join (CROSS JOIN)
- Inner join (INNER JOIN)
- Outer join (LEFT JOIN/RIGHT JOIN)
- Union query (UNION and UNION ALL)
- Full join (FULL JOIN)
Cross join (CROSS JOIN)
SELECT * FROM A,B(,C) or SELECT * FROM A CROSS JOIN B (CROSS JOIN C)
-- There is no join condition, so the result is the Cartesian product; the result set is huge and meaningless, and this is rarely used.
Inner join (INNER JOIN)
SELECT * FROM A,B WHERE A.id=B.id or SELECT * FROM A INNER JOIN B ON A.id=B.id
-- The set of records from multiple tables that satisfy a given condition; INNER JOIN can be abbreviated to JOIN.
Inner joins are divided into three categories