Abstract: The Semantic Web and other graph models follow the open world assumption: facts that are not included in the data are considered unknown rather than false.

This article is shared from the HUAWEI CLOUD community post "Graph database support for NULL attribute values"; original author: Hello_TT.

NULL (empty value) marks an unknown or missing attribute value in a database; it indicates that no value is stored for that attribute. When an attribute of a node or edge in a graph database is missing or undefined, its value is NULL.

So why does the graph database need to support NULL values?

Graph models such as the Semantic Web follow the open world assumption: facts that are not included in the data are considered unknown rather than false. For example, consider the following two queries against a graph database that contains several students:

  • Query 1: Find people whose university is Tsinghua University
  • Query 2: Find people whose university is not Tsinghua University

If Xiao Ming has not filled in a school in the graph database, does he belong to the result set of Query 1 or of Query 2? Under the open world assumption, data that is not recorded is unknown, not false. By this logic, Xiao Ming is an answer to neither Query 1 nor Query 2.

Graph databases implement this logic through NULL values.
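The semantics can be sketched in a few lines of Python. This is only an illustration, not how any particular graph database is implemented: `None` stands in for NULL, and the helper names (`eq`, `neq`, `where`) and the sample records are assumptions made for this example. A comparison that involves NULL yields unknown, and a filter keeps only the rows whose predicate is strictly true.

```python
# Illustrative sketch: None plays the role of NULL / "unknown".
# The helper names and the sample data are hypothetical.

def eq(a, b):
    """Equality under three-valued logic: unknown if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def neq(a, b):
    """Inequality under three-valued logic: unknown if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b

def where(rows, predicate):
    """Keep only the rows for which the predicate is strictly True."""
    return [row for row in rows if predicate(row) is True]

students = [
    {"name": "Zhang San", "school": "Tsinghua University"},
    {"name": "Li Si", "school": "Peking University"},
    {"name": "Xiao Ming", "school": None},  # school not filled in
]

query1 = where(students, lambda s: eq(s["school"], "Tsinghua University"))
query2 = where(students, lambda s: neq(s["school"], "Tsinghua University"))

print([s["name"] for s in query1])  # ['Zhang San']
print([s["name"] for s in query2])  # ['Li Si']
```

Xiao Ming appears in neither list, because both comparisons evaluate to unknown for him.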

Let's look at how various graph databases support NULL attribute values.

GDB

For the string data type, a zero-length empty string is supported and is written as "". A blank field without double quotation marks means that the value does not exist, that is, it is nullptr.

NebulaGraph

By default, an attribute value can be NULL when a vertex or edge is inserted. The user can also declare an attribute as NOT NULL, in which case a value must be supplied when the vertex or edge is inserted, unless a default value was set when the attribute was created.

HugeGraph

You can designate certain strings, such as "NULL", to represent a null value. If the vertex/edge attribute corresponding to the column is nullable, the attribute is left unset when the vertex/edge is constructed.

Amazon Neptune

Blank fields are allowed. A blank field is considered a NULL value.

Neo4j

In Cypher, NULL is used to represent missing or undefined values. Conceptually, NULL means a missing, unknown value, and it is treated somewhat differently from other values.

Gremlin

TinkerGraph can be configured to support NULL as an attribute value, but not all graph database products support it. Before relying on it, check the graph's supportsNullPropertyValues() feature or consult the product documentation.

TigerGraph

NULL and NOT NULL attributes are not supported, and NULL values cannot be stored in the graph database. If an attribute is not assigned a value when a vertex or edge instance is created, it is assigned the default value of its data type. The latest version has abolished this.

Huawei Cloud Graph Engine Service (GES)

GES supports NULL attribute values. A blank field in the imported data is treated as a NULL attribute value.

The following example illustrates this. Suppose the schema of the imported data is:

<label name="movie">
    <properties>
        <property name="ChineseName" cardinality="single" dataType="string"/>
        <property name="Year" cardinality="single" dataType="int"/> 
    </properties>
</label>
<label name="user">
    <properties>
        <property name="Gender" cardinality="single" dataType="enum" typeNameCount="2" typeName1="F" typeName2="M"/>
        <property name="School" cardinality="single" dataType="string"/>
        <property name="Age" cardinality="single" dataType="int"/>
    </properties>
</label>
<label name="rate">
    <properties> 
        <property name="Datetime" cardinality="single" dataType="date"/>
        <property name="Score" cardinality="single" dataType="double" />
    </properties>
</label>  

The imported vertex data is:

张三,user,M,清华大学
李四,user,,北京大学,20
小明,user,,,21
Titanic,movie,泰塔尼克号,1997
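The import rule "a blank field means NULL" can be sketched in Python as follows. This is not the GES importer, whose internals are not shown here: the `SCHEMA` mapping and the `parse_vertices` helper are hypothetical names, the property order is taken from the schema above, and treating a missing trailing field the same as a blank one is an assumption. Values are kept as strings; no type conversion is attempted.

```python
import csv
import io

# Property order per label, taken from the schema above.
SCHEMA = {
    "user": ["Gender", "School", "Age"],
    "movie": ["ChineseName", "Year"],
}

VERTEX_CSV = """张三,user,M,清华大学
李四,user,,北京大学,20
小明,user,,,21
Titanic,movie,泰塔尼克号,1997
"""

def parse_vertices(text):
    """Map each CSV row to (id, label, properties); blank or missing fields become None."""
    vertices = []
    for row in csv.reader(io.StringIO(text)):
        vertex_id, label, *values = row
        names = SCHEMA[label]
        # Pad short rows so that omitted trailing properties count as missing.
        values += [""] * (len(names) - len(values))
        props = {
            name: (value if value != "" else None)
            for name, value in zip(names, values)
        }
        vertices.append((vertex_id, label, props))
    return vertices

# For each vertex, list the properties that ended up as NULL.
for vertex_id, label, props in parse_vertices(VERTEX_CSV):
    print(label, [name for name, value in props.items() if value is None])
```

For the four rows above, the NULL properties are Age for 张三, Gender for 李四, Gender and School for 小明, and none for Titanic.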

The imported edge data is:

张三,Titanic,rate,,4

Call the native GES API to query the edge:

GET http://{SERVER_URL}/ges/v1.0/{project_id}/graphs/{graph_name}/edges/detail?source=张三&target=Titanic

The response is:

"edges": [
    {
        "index": "0",
        "source": "张三",
        "label": "rate",
        "properties": {
            "Score": [
                4.0
            ],
            "Datetime": [
                null
            ]
        },
        "target": "Titanic"
    }
]

As you can see, the Datetime attribute of the returned edge is null, because that field was left blank in the imported data.
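On the client side, such a response can be inspected for NULL attributes. The following is a minimal sketch that assumes the response fragment shown above is wrapped in braces to form a complete JSON document; the `null_properties` helper is a hypothetical name, not part of any GES SDK.

```python
import json

# The response fragment shown above, wrapped in braces so it parses as JSON.
RESPONSE = """
{
  "edges": [
    {
      "index": "0",
      "source": "张三",
      "label": "rate",
      "properties": {"Score": [4.0], "Datetime": [null]},
      "target": "Titanic"
    }
  ]
}
"""

def null_properties(edge):
    """Return the names of properties whose value list holds no non-null value."""
    return [
        name
        for name, values in edge["properties"].items()
        if all(value is None for value in values)
    ]

edges = json.loads(RESPONSE)["edges"]
for edge in edges:
    print(edge["label"], null_properties(edge))
```

JSON `null` is decoded to Python `None`, so the check reports Datetime as the only NULL attribute of the edge.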

In addition, GES supports Gremlin and Cypher, two mainstream graph query languages. Below, we use Cypher to check the question raised at the beginning of the article.

Run the following three queries separately:

match (n:user) where n.School='清华大学' return n
match (n:user) where n.School<>'清华大学' return n
match (n:user) where n.School is null return n

The three queries return the following results, in order:

"row": [
    {
        "School": "清华大学",
        "Gender": "M",
        "Age": null
    }
],
"meta": [
    {
        "id": "张三",
        "type": "node",
        "labels": [
            "user"
        ]
    }
]

"row": [
    {
        "School": "北京大学",
        "Gender": null,
        "Age": 20
    }
],
"meta": [
    {
        "id": "李四",
        "type": "node",
        "labels": [
            "user"
        ]
    }
]

"row": [
    {
        "School": null,
        "Gender": null,
        "Age": 21
    }
],
"meta": [
    {
        "id": "小明",
        "type": "node",
        "labels": [
            "user"
        ]
    }
]

When n.School is null, neither n.School<>'清华大学' nor n.School='清华大学' evaluates to true, so Xiao Ming is not in the result set of the first two queries. Behind this is the three-valued logic supported by GES Cypher, in which a comparison with NULL yields unknown rather than true or false. This logic answers the question raised at the beginning of the article and is consistent with the open world assumption of models such as the Semantic Web.
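Three-valued logic also defines how unknown combines with NOT, AND, and OR. The following Python sketch uses the standard Kleene truth tables, with `None` standing for unknown. The function names `not3`, `and3`, and `or3` are made up for this example, and the sketch says nothing about how GES implements the logic internally.

```python
# Kleene three-valued connectives; None stands for "unknown".
# Function names are hypothetical and chosen for this sketch only.

def not3(a):
    """NOT unknown is still unknown."""
    return None if a is None else (not a)

def and3(a, b):
    """False dominates; otherwise any unknown operand makes the result unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """True dominates; otherwise any unknown operand makes the result unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

# For Xiao Ming, comparing School with any string is unknown.
school_is_tsinghua = None

print(not3(school_is_tsinghua))         # None
print(and3(school_is_tsinghua, False))  # False
print(or3(school_is_tsinghua, True))    # True
```

Because negating an unknown comparison is still unknown, wrapping the first query's condition in NOT would not return Xiao Ming either; only an explicit `is null` test, as in the third query, selects him.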
