This article was first published on the Nebula Graph WeChat public account NebulaGraphCommunity. Follow it to see how major tech companies put graph databases into practice.
1 Background
Nebula 2.0 already supports text queries backed by an external full-text search engine. Before introducing this feature, let's briefly review the architecture design and storage model of Nebula Graph, which will make the following chapters easier to follow.
1.1 Introduction to Nebula Graph Architecture
As shown in the figure, the Storage Service has three layers. The bottom layer is the Store Engine, a stand-alone local storage engine that provides `get`/`put`/`scan`/`delete` operations on local data. The related interfaces are declared in the KVStore.h and KVEngine.h header files, and users can develop their own local store plugins according to their needs. Currently, Nebula provides a Store Engine based on RocksDB.
On top of the local store engine is the Consensus layer, which implements Multi Group Raft: each Partition corresponds to one Raft Group, where a Partition is a shard of the data. At present, Nebula's sharding strategy is static hashing; the specific hashing method is mentioned in the schema chapter. The user must specify the number of Partitions when creating a SPACE, and once set, it cannot be changed. Generally speaking, the number of Partitions should be large enough to meet the future scaling needs of the business.
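As a rough sketch (the exact hash function is internal to Nebula's storage engine; the modulo scheme here is an assumption based on the static-hash description): a vertex is mapped to a partition along the lines of `PartID = hash(VertexID) % partition_num + 1`, so all data of a given vertex always lands in the same partition. This is also why the partition count cannot change after the SPACE is created: remapping would move every vertex's data.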
Above the Consensus layer, at the top of the Storage Service, are the Storage Interfaces. This layer defines a series of graph-related APIs; API requests are translated at this level into a set of KV operations on the corresponding Partition. It is this layer that makes the storage service a real graph store; without it, the Storage Service would be just a KV store. Nebula did not expose the KV layer as a separate service, mainly because graph queries involve a large amount of computation that requires the graph schema, while the KV layer has no concept of data schema; keeping them together makes it easier to implement computation pushdown.
1.2 Introduction to Nebula Graph Storage
In Nebula Graph 2.0, the storage structure has been improved, covering the storage of vertices, edges, and indexes. Next, we briefly review the 2.0 storage structure. Through this explanation, you can understand the basics of how Nebula Graph scans data and indexes.
1.2.1 Nebula data storage structure
Nebula stores both "points" (vertices) and "edges", and both are stored on the KV model. Here we mainly introduce the structure of the Key, which is as follows:
- Type: 1 byte, indicates the type of the key. Current types include vertex, edge, index, system, etc.
- PartID: 3 bytes, indicates the data partition. This field is mainly used for partition re-distribution (balance), making it easy to scan an entire partition's data by prefix.
- VertexID: n bytes, the ID of the source vertex for an outgoing edge, or of the destination vertex for an incoming edge.
- Edge Type: 4 bytes, the type of the edge; a value greater than 0 means an outgoing edge, less than 0 an incoming edge.
- Rank: 8 bytes, used to distinguish multiple edges of the same type. Users can set it as needed; this field may store transaction time, a transaction serial number, or a ranking weight.
- PlaceHolder: 1 byte, invisible to users, reserved for implementing distributed transactions in the future.
- TagID: 4 bytes, the type of the tag.
1.2.1.1 Point storage structure
`Type (1 byte) | PartID (3 bytes) | VertexID (n bytes) | TagID (4 bytes)`
1.2.1.2 Edge storage structure
`Type (1 byte) | PartID (3 bytes) | VertexID (n bytes) | EdgeType (4 bytes) | Rank (8 bytes) | VertexID (n bytes) | PlaceHolder (1 byte)`
1.2.2 Nebula index storage structure
- props binary (n bytes): the values of the props in the tag or edge. If a value is NULL, it is filled with 0xFF.
- nullable bitset (2 bytes): each bit marks whether the corresponding prop value is NULL. Since there are 2 bytes (16 bits), an index can contain at most 16 fields.
1.2.2.1 Tag index storage structure
`Type (1 byte) | PartID (3 bytes) | IndexID (4 bytes) | props binary (n bytes) | nullable bitset (2 bytes) | VertexID (n bytes)`
1.2.2.2 Edge index storage structure
`Type (1 byte) | PartID (3 bytes) | IndexID (4 bytes) | props binary (n bytes) | nullable bitset (2 bytes) | VertexID (n bytes) | Rank (8 bytes) | VertexID (n bytes)`
1.3 Why we use a third-party full-text search engine
From the storage structures above, it is easy to see that a fuzzy query on a prop field requires a full table scan or a full index scan followed by row-by-row filtering, which greatly degrades query performance. With a large amount of data, memory may even overflow before the scan completes. Moreover, redesigning the Nebula index storage model as an inverted-index model suited to text search would deviate from the original design intent of the Nebula index. After some research and discussion we concluded that, as the saying goes, every trade has its specialty: the work of text search is best handed over to an external third-party full-text search engine. This guarantees query performance while also reducing the development cost of the Nebula kernel.
2 Goals
2.1 Function
For version 2.0, text search is only supported in `LOOKUP`. That is, based on Nebula's internal index, `LOOKUP` completes text search with the help of a third-party full-text search engine. Of the third-party full-text engine, only basic data import and query features are currently used. For complex, pure full-text query computations, Nebula's current functionality still needs to be improved, and we look forward to valuable suggestions from community users. The currently supported text search expressions are as follows (a syntax sketch follows the list):
- Fuzzy query
- Prefix query
- Wildcard query
- Regular expression query
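As a syntax sketch (assuming the text search predicates take the form documented for Nebula 2.0, and borrowing the player tag from the demo in chapter 5):

```ngql
# Fuzzy query: tolerates small spelling differences
LOOKUP ON player WHERE FUZZY(player.name, "Tim Dunncan");
# Prefix query
LOOKUP ON player WHERE PREFIX(player.name, "B");
# Wildcard query: * matches any character sequence
LOOKUP ON player WHERE WILDCARD(player.name, "Tim*");
# Regular expression query
LOOKUP ON player WHERE REGEXP(player.name, "T.*");
```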
2.2 Performance
The performance mentioned here refers to data synchronization performance and query performance.
- Data synchronization performance: since we use a third-party full-text search engine, a copy of the data inevitably has to be stored in it. It has been proven that the import performance of third-party full-text search engines is lower than Nebula's own data import performance. In order not to affect Nebula's own data import, we adopted an asynchronous data synchronization scheme to import data into the third-party full-text search engine. The specific data synchronization logic is introduced in detail in the following chapters.
- Data query performance: as just mentioned, without a third-party full-text search engine, Nebula's text search would be a nightmare. At present, `LOOKUP` supports text search through a third-party full-text engine, so it is inevitably slower than Nebula's native index scan, and sometimes the query inside the third-party full-text engine itself is slow. In that case we need a timeliness mechanism to guarantee query performance, namely `LIMIT` and `TIMEOUT`, which are described in detail in the following chapters.
3 Glossary
Name | Description
---|---
Tag | The attribute structure attached to a vertex. A vertex can have multiple tags, identified by tagId.
Edge | Similar to a tag, an edge is the attribute structure attached to an edge, identified by edgetype.
Property | An attribute value on a tag or edge. Its data type is determined by the structure of the tag or edge.
Partition | The smallest logical storage unit of Nebula Graph. One Storage Engine can contain multiple partitions. A partition takes the role of leader or follower, and raftex guarantees data consistency between leaders and followers.
Graph space | Each graph space is an independent business graph unit, with its own set of tags and edges. A Nebula Graph cluster can contain multiple graph spaces.
Index | In what follows, "index" refers to the attribute index of Nebula Graph vertices and edges. Its data type depends on the tag or edge.
TagIndex | An index created on a tag. One tag can have multiple indexes. Since composite indexes are not supported yet, an index can only be based on one tag.
EdgeIndex | An index created on an edge. Similarly, one edge can have multiple indexes, but an index can only be based on one edge.
Scan Policy | The scanning strategy of an index. A query statement can often be executed with several index scanning methods; the scan policy decides which one is actually used.
Optimizer | Optimizes query conditions, e.g., sorting, splitting, and merging sub-expression nodes of the WHERE clause's expression tree, in order to obtain higher query efficiency.
4 Implementation logic
At present, the third-party full-text search engine we are compatible with is ElasticSearch. This chapter focuses on ElasticSearch.
4.1 Storage structure
4.1.1 DocID
`partId (10 bytes) | schemaId (10 bytes) | encoded_columnName (32 bytes) | encoded_val (max 344 bytes)`
- partId: corresponds to Nebula's partition ID. It is not used in the current 2.0 version; it is mainly reserved for future query pushdown and ES routing mechanisms.
- schemaId: corresponds to Nebula's tagId or edgetype.
- encoded_columnName: corresponds to the column name in the tag or edge. It is MD5-encoded here to avoid characters that are not supported in an ES DocID.
- encoded_val is at most 344 bytes because the prop value is base64-encoded to work around the DocID's restriction to visible characters; the actual value size is limited to 256 bytes. Why 256? At the beginning of the design, the main purpose was to complete the text search function of LOOKUP, which is based on Nebula's own index, whose length is also limited. As in traditional relational databases such as MySQL, where an index field is recommended to stay within 256 characters, the length on the third-party search engine side is also limited to 256. Full-text search on long text is not supported for now.
- The maximum DocID length in ES is 512 bytes; about 100 bytes are currently reserved.
4.1.2 Doc Fields
- schema_id: corresponds to Nebula's tagId or edgetype.
- column_id: the encoding of the column name in the Nebula tag or edge.
- value: corresponds to the property value in Nebula's native index.
4.2 Data synchronization logic
Leader & Listener
The chapters above briefly mentioned the asynchronous data synchronization logic; this chapter introduces it in detail. Before that, let's get to know Nebula's Leader and Listener.
- Leader: Nebula itself is a horizontally scalable distributed system whose distributed protocol is Raft. In the distributed system, a Partition can take multiple roles, such as Leader, Follower, and Learner. When new data is written, the Leader initiates a WAL synchronization event and synchronizes the WAL to the Followers and Learners. When anomalies such as network or disk failures occur, the partition role changes accordingly. This guarantees the data safety of the distributed database. Leaders, Followers, and Learners all run inside the nebula-storaged process, and their system parameters are determined by the configuration file `nebula-storaged.conf`.
- Listener: unlike Leader, Follower, and Learner, the Listener runs in a separate process, and its configuration parameters are determined by `nebula-storaged-listener.conf`. As a listener, the Listener passively receives the WAL from the Leader, parses the WAL periodically, and calls the data insertion API of the third-party full-text engine to synchronize the data to it. For ElasticSearch, Nebula supports the `PUT` and `BULK` interfaces.
Next we introduce the data synchronization logic:
1. Insert a vertex or edge through Client or Console.
2. The graph layer computes the relevant partition through the Vertex ID.
3. The graph layer sends the `INSERT` request to the leader of the relevant partition through storageClient.
4. The Leader parses the `INSERT` request and synchronizes the WAL to the Listener.
5. The Listener periodically processes the newly synchronized WAL, parsing out the values of string-typed properties in the tag or edge.
6. The tag or edge metadata and property values are assembled into an ElasticSearch-compatible data structure.
7. The data is written to ElasticSearch through its `PUT` or `BULK` interface.
8. If the write fails, go back to step 5 and retry the failed WAL until the write succeeds.
9. After a successful write, record the successful Log ID and Term ID as the starting point of the next WAL synchronization.
10. Go back to the timer in step 5 and process new WALs.
In the above steps, if the ElasticSearch cluster or the Listener process goes down, WAL synchronization stops. When the system recovers, data synchronization continues from the last successfully synced Log ID. We suggest that the DBA monitor the running state of ES in real time through an external monitoring tool: if ES stays unavailable for a long time, the Listener's logs will grow rapidly and normal query operations will become impossible.
4.3 Query logic
As can be seen from the figure above, the key steps of text search are "Send Fulltext Scan Request" → "Fulltext Cluster" → "Collect Constant Values" → "IndexScan Optimizer".
- Send Fulltext Scan Request: generates a full-text index query request based on the query condition, schema ID, and column ID (i.e., a CURL command encapsulated for ES).
- Fulltext Cluster: sends the query request to ES and obtains ES's query result.
- Collect Constant Values: takes the returned query result as constant values and generates a Nebula-internal query expression. For example, if the original request is to query attribute values starting with "A" on field C1, and the returned results are "A1" and "A2", then in this step the request is rewritten as the Nebula expression `C1 == "A1" OR C1 == "A2"` (see the sketch after this list).
- IndexScan Optimizer: based on the newly generated expression, finds the optimal Nebula internal index via RBO and generates the optimal execution plan.
- In the "Fulltext Cluster" step, queries may be slow or return massive amounts of data; here we provide the `LIMIT` and `TIMEOUT` mechanisms to interrupt the query on the ES side in real time.
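As a sketch of the rewrite described above (using the player schema from the demo in chapter 5; the rewritten statement is a conceptual equivalent, not literally what the optimizer emits):

```ngql
# Original text search, answered by the external full-text engine:
LOOKUP ON player WHERE PREFIX(player.name, "B");

# If ES returns "Ben Simmons" and "Boris Diaw", the request is conceptually
# rewritten into a native-index expression like:
LOOKUP ON player WHERE player.name == "Ben Simmons" OR player.name == "Boris Diaw";
```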
5 Demo
5.1 Deploy an external ES cluster
I won't go into detail about deploying an ES cluster here; I believe everyone is familiar with it. What needs to be explained is that once the ES cluster starts successfully, we need to create a general template for it (for example, by submitting the JSON below through ES's `_template` API). Its structure is as follows:
{
"template": "nebula*",
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 1
}
},
"mappings": {
"properties" : {
"tag_id" : { "type" : "long" },
"column_id" : { "type" : "text" },
"value" :{ "type" : "keyword"}
}
}
}
5.2 Deploy Nebula Listener
- According to your actual environment, modify the configuration parameters in `nebula-storaged-listener.conf`
- Start the Listener:
./bin/nebula-storaged --flagfile ${listener_config_path}/nebula-storaged-listener.conf
5.3 Registering the client connection information of ElasticSearch
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200);
nebula> SHOW TEXT SEARCH CLIENTS;
+-------------+------+
| Host | Port |
+-------------+------+
| "127.0.0.1" | 9200 |
+-------------+------+
| "127.0.0.1" | 9200 |
+-------------+------+
| "127.0.0.1" | 9200 |
+-------------+------+
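If you later need to replace the registered ES client information, you can sign the service out first (a standard nGQL statement in 2.0) and then sign in again:

```ngql
SIGN OUT TEXT SERVICE;
```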
5.4 Create Nebula Space
CREATE SPACE basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));
USE basketballplayer;
5.5 Add Listener
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.5:46780,192.168.8.6:46780;
nebula> SHOW LISTENER;
+--------+-----------------+-----------------------+----------+
| PartId | Type | Host | Status |
+--------+-----------------+-----------------------+----------+
| 1 | "ELASTICSEARCH" | "[192.168.8.5:46780]" | "ONLINE" |
+--------+-----------------+-----------------------+----------+
| 2 | "ELASTICSEARCH" | "[192.168.8.5:46780]" | "ONLINE" |
+--------+-----------------+-----------------------+----------+
| 3 | "ELASTICSEARCH" | "[192.168.8.5:46780]" | "ONLINE" |
+--------+-----------------+-----------------------+----------+
5.6 Create Tag, Edge, Nebula Index
At this point, it is recommended that the length of the field "name" be less than 256. If the business permits, it is recommended to define the name field of player as a fixed_string type with a length less than 256.
nebula> CREATE TAG player(name string, age int);
nebula> CREATE TAG INDEX name ON player(name(20));
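If the tag already contained data before the index was created, the native index must be rebuilt so that existing data gets indexed (standard nGQL; in this demo the index is created before inserting data, so this step is unnecessary):

```ngql
REBUILD TAG INDEX name;
# Check the indexes defined on tags:
SHOW TAG INDEXES;
```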
5.7 Insert data
nebula> INSERT VERTEX player(name, age) VALUES \
"Russell Westbrook": ("Russell Westbrook", 30), \
"Chris Paul": ("Chris Paul", 33),\
"Boris Diaw": ("Boris Diaw", 36),\
"David West": ("David West", 38),\
"Danny Green": ("Danny Green", 31),\
"Tim Duncan": ("Tim Duncan", 42),\
"James Harden": ("James Harden", 29),\
"Tony Parker": ("Tony Parker", 36),\
"Aron Baynes": ("Aron Baynes", 32),\
"Ben Simmons": ("Ben Simmons", 22),\
"Blake Griffin": ("Blake Griffin", 30);
5.8 Query
nebula> LOOKUP ON player WHERE PREFIX(player.name, "B");
+-----------------+
| _vid |
+-----------------+
| "Boris Diaw" |
+-----------------+
| "Ben Simmons" |
+-----------------+
| "Blake Griffin" |
+-----------------+
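Text search can also be combined with the usual `LOOKUP` clauses; for example, yielding properties of the matched vertices (a sketch based on the schema above; output omitted):

```ngql
LOOKUP ON player WHERE PREFIX(player.name, "B") YIELD player.name, player.age;
```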
6 Troubleshooting tips
While setting up the system environment, an error at a certain step may prevent the feature from working normally. From previous user feedback, I have summarized three classes of possible errors; the analysis and troubleshooting tips are as follows.
Listener cannot be started, or does not work normally after starting

- Check the Listener configuration file to make sure the Listener's `IP:Port` does not conflict with an existing nebula-storaged.
- Check the Listener configuration file to make sure the Meta's `IP:Port` is correct and consistent with nebula-storaged's.
- Check the Listener configuration file to make sure the pids directory and logs directory are independent and do not conflict with nebula-storaged's.
- If the Listener started successfully, then the configuration was modified because of a configuration error, and it still cannot work after restarting, the Listener-related metadata in Meta needs to be cleaned up. For this operation command, please refer to the Nebula help manual: document link.
Data cannot be synchronized to the ES cluster

- Check whether the Listener received the WAL from the Leader: look at whether there are files in the directory specified by `--listener_path` in `nebula-storaged-listener.conf`.
- Open vlog (`UPDATE CONFIGS storage:v=3`) and check whether the CURL commands in the log executed successfully. If there are errors, it may be an ES configuration issue or an ES version compatibility issue.
There is data in the ES cluster, but the correct result cannot be queried

- Similarly, open vlog (`UPDATE CONFIGS graph:v=3`), pay attention to the graph logs, and check why the CURL command failed.
- If only lowercase characters can be matched but uppercase cannot, it is probably an error in the ES template creation. Please refer to the Nebula help manual to create it: document link.
7 TODO
- Create a full-text index on a specific tag or edge
- Rebuild a full-text index (REBUILD)
Want to exchange graph database technology? Join the Nebula exchange group: fill in your card on the Nebula WeChat account, and the Nebula assistant will add you to the group~

Want to exchange graph database technology with engineers from big companies? The NUC 2021 conference is waiting for you: NUC 2021 registration portal