Graph Database Practice: Using Nebula Graph to Crack the Idiom Wordle Mystery

This article was first published on the Nebula Graph Community WeChat official account.


If you played Wordle, the word guessing game that became popular on social media, during the Spring Festival, you may also have heard of its Chinese idiom version, Handou (汉兜, named "Handle" in English). While playing Handou, I found that solving Antfu's Handou (the Chinese idiom version of Wordle 👉🏻 handle.antfu.me ) with Nebula Graph queries is great fun, and that it makes an excellent exercise for practicing graph database statements. In this article, you'll see how I "cheated" at Handou with a knowledge graph. 😁

What is Handou?

Handou ( https://handle.antfu.me ) is another very cool project by Antfu, a Vue/Vite core team member. It is a polished Chinese take on Wordle: a daily challenge in which you guess a Chinese idiom.

Every day, Handou publishes a new idiom to guess. You must guess it within ten attempts to win. After each attempt, you receive hints about how the characters, initials, finals, and tones of your guess match the answer: green means the element exists and is in the right position, and orange means the element exists but is in the wrong position. The detailed rules are shown on the game's web page.
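The green/orange rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Handou's actual code: the function name `wordle_hints` is made up, and the handling of repeated elements follows the usual Wordle convention, which is an assumption about Handou.

```python
from collections import Counter

def wordle_hints(guess, answer):
    """Return one hint per position: 'green', 'orange' or 'grey'."""
    hints = ["grey"] * len(guess)
    # Answer elements that were not consumed by an exact (green) match.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            hints[i] = "green"
    for i, g in enumerate(guess):
        # Orange: the element exists elsewhere and has not been used up yet.
        if hints[i] == "grey" and remaining[g] > 0:
            hints[i] = "orange"
            remaining[g] -= 1
    return hints

# Tones of a guess (4, 1, 1, 2) compared with the tones of an answer (1, 4, 4, 2).
print(wordle_hints([4, 1, 1, 2], [1, 4, 4, 2]))
```

The same function applies to characters, initials, or finals, since each of them is just a sequence with one element per position.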

The fun of Handou lies in searching your mind for possible answers within a limited number of attempts and closing in on the truth step by step, like gymnastics for the brain. Any trick that simply leaks the answer (such as digging it out of Handou's open source code) is boring and spoils the fun.

Speaking of idiom gymnastics for the brain, it suddenly occurred to me: why not build a knowledge graph of Chinese idioms outside the brain, and then use it to get hands-on with a graph database and do some graph query gymnastics?

Building an idiom knowledge graph for solving Handou

What is a knowledge graph?

To put it simply, a knowledge graph is a network that connects entities through their relationships. The concept was originally proposed by Google to answer search queries that require reasoning over knowledge rather than an inverted index of web pages, for example: "How old is Yao Ming's wife?" or "How many championships have the Rockets won?"

What matters in such questions is the conditions they contain. By 2022, knowledge graphs have been widely used in recommender systems, question answering systems, security and risk control, and many other fields beyond search.

Why do you need to use knowledge graphs to solve Handou?

~~reason is: because I can~~

In fact, the way our brain solves the puzzle is very similar to searching for information in a graph. The hints Handou gives after each attempt are naturally suited to being expressed in graph semantics. Later in this article, you will see how naturally the puzzle-solving conditions translate into graph patterns, which makes this problem a ready-made exercise for graphs. I believe this has a lot to do with how closely the structure of a knowledge graph resembles the way knowledge is organized in the human brain.

How to build a knowledge graph for Handou puzzle solving?

A knowledge graph is composed of entities (vertices) and relationships (edges). With a graph database management system (graph DBMS), this knowledge can easily be stored, modified, queried, and even explored visually.

In this article, I use the open source distributed graph database Nebula Graph to walk through this process. The setup of the actual graph system is described at the end of the article.

In this section, we only discuss graph modeling: how to design the "entities" and "relationships" for solving Handou puzzles.

Graph modeling

Initial thought

First, the entities that must exist are:

idiom
Chinese character

They are connected as (idiom)-[contains]->(Chinese character), and each (Chinese character)-[pronounced]->(pronunciation).

Second, solving the puzzle involves conditions on initials, finals, and tones. Considering that the graph itself is very small (thousands) and that a character can have more than one pronunciation, I also modeled the pronunciation and its parts (the initial and the final) as entities, and the relationships between them follow naturally:

Final version

However, when I later queried the graph, I found that with the initial model a traversal (idiom)-->(character)-->(pronunciation) loses the information about which pronunciation the character takes in that particular idiom. So my final model is:

In this model, a condition on the text alone only involves the hop (idiom)-->(character), while conditions on pronunciation, initials, finals, and tones follow a separate relationship path. This avoids the redundant conditions of the initial version, and a single path pattern can carry both kinds of conditions (this is shown in the examples below).
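To make the final model concrete, here is a minimal Python sketch that breaks one idiom into the three kinds of edges used by the queries and the schema in this article (with_character, with_pinyin, with_pinyin_part). It is not the author's build tool: the function name `idiom_edges` is hypothetical, the pinyin strings with a trailing tone digit are assumed to be the IDs of the pronunciation vertices, and the part_type value "initial" is assumed by analogy with the "final" value seen in the queries.

```python
def idiom_edges(idiom, syllables):
    """Decompose an idiom into edge tuples.

    syllables: one (pinyin, initial, final) tuple per character,
    for example ("hai4", "h", "ai").
    """
    with_character = []        # (idiom, character, position)
    with_pinyin = []           # (idiom, pinyin, position)
    with_pinyin_part = set()   # (pinyin, part, part_type)
    for position, (char, (pinyin, initial, final)) in enumerate(zip(idiom, syllables)):
        with_character.append((idiom, char, position))
        with_pinyin.append((idiom, pinyin, position))
        if initial:  # some syllables, such as "ai4", have no initial
            with_pinyin_part.add((pinyin, initial, "initial"))
        with_pinyin_part.add((pinyin, final, "final"))
    return with_character, with_pinyin, sorted(with_pinyin_part)

chars, pinyins, parts = idiom_edges(
    "惊世骇俗",
    [("jing1", "j", "ing"), ("shi4", "sh", "i"), ("hai4", "h", "ai"), ("su2", "s", "u")],
)
print(chars[2])    # the edge from the idiom to its third character
print(pinyins[2])  # the edge from the idiom to that character's pronunciation
```

Because both the character edge and the pronunciation edge start from the idiom and carry a position, the pronunciation used in this particular idiom is never ambiguous.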

`Build Idiom Knowledge Graph`

With the model in place, building such a simple graph comes down to collecting, cleaning, and loading the data.

For the idioms and their pronunciations, I extracted the data directly from the Handou code on the one hand, and on the other hand used the open source Python library PyPinyin to obtain pronunciations for idioms that have none in the Handou data. I also used several convenient functions in PyPinyin, such as getting the initial and the final of a pinyin syllable.
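To illustrate what "getting the initial and the final" means, here is a small pure-Python stand-in. It is not PyPinyin and not the code from the build tool: the function `split_pinyin` is hypothetical, it expects syllables written with a trailing tone digit as in the query results of this article, and it treats y and w as initials, which is a simplifying assumption.

```python
# Two-letter initials come first so that "zh" wins over "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_pinyin(syllable):
    """Split a syllable such as 'zeng1' into (initial, final, tone)."""
    tone = 0  # 0 stands for "no tone digit given"
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # syllable without an initial, e.g. "ai"

print([split_pinyin(s) for s in ["ai4", "zeng1", "fen1", "ming2"]])
```

A real library handles many more cases (neutral tones, ü, syllables such as "er"), which is why the build tool relies on PyPinyin rather than on a hand-written splitter like this one.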

The code for the build tool is here: https://github.com/wey-gu/chinese-graph

I also put more information in the appendix at the end of the article.

`Start knowledge graph query gymnastics`

At this point, I assume you already have the idiom knowledge graph I built, so let's start our graph query gymnastics!

First, open Handou: https://handle.antfu.me/

We need an idiom to start with. If you're out of ideas, try this:

# Match one idiom from the graph
MATCH (x:idiom) RETURN x LIMIT 1

# Result
("爱憎分明" :idiom{pinyin: "['ai4', 'zeng1', 'fen1', 'ming2']"})

Then we enter it into Handou and get the hints from the first attempt:

We were lucky and got conditions for three positions!

There is a character that is not in the first position, has the fourth tone and the final ai, but is not 爱
There is a first-tone character that is not in the second position (憎)
There is a character with the final ing that is not in the fourth position (明)
The fourth character has the second tone (明)

Next, let's start the graph database statement gymnastics!

# A character not in the first position has the fourth tone and the final ai, but is not 爱
MATCH (char0:character)<-[with_char_0:with_character]-(x:idiom)-[with_pinyin_0:with_pinyin]->(pinyin_0:character_pinyin)-[:with_pinyin_part]->(final_part_0:pinyin_part{part_type: "final"})
WHERE id(final_part_0) == "ai" AND pinyin_0.character_pinyin.tone == 4 AND with_pinyin_0.position != 0 AND with_char_0.position != 0 AND id(char0) != "爱"
# A first-tone character that is not in the second position
MATCH (x:idiom) -[with_pinyin_1:with_pinyin]->(pinyin_1:character_pinyin)
WHERE pinyin_1.character_pinyin.tone == 1 AND with_pinyin_1.position != 1
# A character with the final ing that is not in the fourth position
MATCH (x:idiom) -[with_pinyin_2:with_pinyin]->(:character_pinyin)-[:with_pinyin_part]->(final_part_2:pinyin_part{part_type: "final"})
WHERE id(final_part_2) == "ing" AND with_pinyin_2.position != 3
# The fourth character has the second tone
MATCH (x:idiom) -[with_pinyin_3:with_pinyin]->(pinyin_3:character_pinyin)
WHERE pinyin_3.character_pinyin.tone == 2 AND with_pinyin_3.position == 3

RETURN x, count(x) as c ORDER BY c DESC

Running in the graph database, I got 7 answers:

("惊愚骇俗" :idiom{pinyin: "['jing1', 'yu2', 'hai4', 'su2']"})
("惊世骇俗" :idiom{pinyin: "['jing1', 'shi4', 'hai4', 'su2']"})
("惊见骇闻" :idiom{pinyin: "['jing1', 'jian4', 'hai4', 'wen2']"})
("沽名卖直" :idiom{pinyin: "['gu1', 'ming2', 'mai4', 'zhi2']"})
("惊心骇神" :idiom{pinyin: "['jing1', 'xin1', 'hai4', 'shen2']"})
("荆棘载途" :idiom{pinyin: "['jing1', 'ji2', 'zai4', 'tu2']"})
("出卖灵魂" :idiom{pinyin: "['chu1', 'mai4', 'ling2', 'hun2']"})

"惊世骇俗" looks like the most mainstream candidate, so let's try it!

With the help of the idiom knowledge graph, we were lucky enough to hit the answer on the very next attempt. Of course, this owes a lot to how many restrictions the randomly chosen first idiom happened to yield, but in most cases the chances of reaching the answer within two attempts are still very good!

Note: the long 253 minutes in between is because, while querying, I found a bug in the graph built by my earlier code: the pronunciation data was wrong for the idiom meaning "shackled with chains". Fortunately, I fixed it afterwards.
Do you know the correct pronunciation of that idiom? 😭

Next, let me explain the process of cracking this idiom in detail.

`Meaning of the statements`

We start with the condition on the first character, which carries both sound and glyph information.

Sound information: some character has the final ai and the fourth tone, and it is not in the first position
Text information: this character with the final ai and the fourth tone is not 爱

For the sound condition, the translation into graph pattern matching is: (idiom)-[a character is pronounced]->(pinyin)-[contains part]->(final) WHERE the final is ai AND the tone is 4 AND the position is not the first.

Because I used English for the tag, edge, and property names when modeling (Chinese names are in fact supported too), the actual statement is:

# A character not in the first position has the fourth tone and the final ai
MATCH (x:idiom)-[with_pinyin_0:with_pinyin]->(pinyin_0:character_pinyin)-[:with_pinyin_part]->(final_part_0:pinyin_part{part_type: "final"})
WHERE id(final_part_0) == "ai" AND pinyin_0.character_pinyin.tone == 4 AND with_pinyin_0.position != 0
# ...
RETURN x

Similarly, the text condition (a character that is not in the first position and is not 爱) is expressed as:

# A character not in the first position has the fourth tone and the final ai, but is not 爱
MATCH (char0:character)<-[with_char_0:with_character]-(x:idiom)
WHERE with_char_0.position != 0 AND id(char0) != "爱"
# ...
RETURN x, count(x) as c ORDER BY c DESC

And because these two conditions ultimately describe the same word, they can be written under one path:

# A character not in the first position has the fourth tone and the final ai, but is not 爱
MATCH (char0:character)<-[with_char_0:with_character]-(x:idiom)-[with_pinyin_0:with_pinyin]->(pinyin_0:character_pinyin)-[:with_pinyin_part]->(final_part_0:pinyin_part{part_type: "final"})
WHERE id(final_part_0) == "ai" AND pinyin_0.character_pinyin.tone == 4 AND with_pinyin_0.position != 0 AND with_char_0.position != 0 AND id(char0) != "爱"
# ...
RETURN x
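The four hints can also be mirrored in plain Python, which may help when checking what a statement is supposed to return. This is a sketch, not part of the original workflow: the names `Syllable` and `satisfies_hints` are hypothetical, and the finals and tones below were typed in by hand from the pinyin shown in the query results of this article.

```python
from typing import NamedTuple

class Syllable(NamedTuple):
    char: str   # the Chinese character
    final: str  # final of its pinyin, without the tone
    tone: int   # tone number

def satisfies_hints(syllables):
    """Check the four hints obtained from the first guess."""
    positions = list(enumerate(syllables))
    # Sound and text condition checked on the same character.
    not_ai = any(pos != 0 and s.final == "ai" and s.tone == 4 and s.char != "爱"
                 for pos, s in positions)
    first_tone = any(pos != 1 and s.tone == 1 for pos, s in positions)
    ing_final = any(pos != 3 and s.final == "ing" for pos, s in positions)
    fourth_is_second_tone = syllables[3].tone == 2
    return not_ai and first_tone and ing_final and fourth_is_second_tone

IDIOMS = {
    "爱憎分明": [Syllable("爱", "ai", 4), Syllable("憎", "eng", 1),
                 Syllable("分", "en", 1), Syllable("明", "ing", 2)],
    "惊世骇俗": [Syllable("惊", "ing", 1), Syllable("世", "i", 4),
                 Syllable("骇", "ai", 4), Syllable("俗", "u", 2)],
    "一丁不识": [Syllable("一", "i", 1), Syllable("丁", "ing", 1),
                 Syllable("不", "u", 4), Syllable("识", "i", 2)],
}

matches = [idiom for idiom, syllables in IDIOMS.items() if satisfies_hints(syllables)]
print(matches)
```

One difference is worth noting: the sketch checks the sound and text conditions on the same position, whereas the nGQL statement filters the position properties of the two edges independently and does not compare them with each other.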

For more MATCH syntax and example details, please refer to the documentation:

MATCH: https://docs.nebula-graph.com.cn/3.0.1/3.ngql-guide/7.general-query-statements/2.match/
Graph Patterns: https://docs.nebula-graph.com.cn/3.0.1/3.ngql-guide/1.nGQL-overview/3.graph-patterns/
nGQL command: cheatsheet

`Visual display of clues`

We take the matching path of each condition as the output, and use the visualization ability of Nebula Graph to get:

# A character not in the first position has the fourth tone and the final ai, but is not 爱
MATCH p0=(char0:character)<-[with_char_0:with_character]-(x:idiom)-[with_pinyin_0:with_pinyin]->(pinyin_0:character_pinyin)-[:with_pinyin_part]->(final_part_0:pinyin_part{part_type: "final"})
WHERE id(final_part_0) == "ai" AND pinyin_0.character_pinyin.tone == 4 AND with_pinyin_0.position != 0 AND with_char_0.position != 0 AND id(char0) != "爱"
# A first-tone character that is not in the second position
MATCH p1=(x:idiom) -[with_pinyin_1:with_pinyin]->(pinyin_1:character_pinyin)
WHERE pinyin_1.character_pinyin.tone == 1 AND with_pinyin_1.position != 1
# A character with the final ing that is not in the fourth position
MATCH p2=(x:idiom) -[with_pinyin_2:with_pinyin]->(:character_pinyin)-[:with_pinyin_part]->(final_part_2:pinyin_part{part_type: "final"})
WHERE id(final_part_2) == "ing" AND with_pinyin_2.position != 3
# The fourth character has the second tone
MATCH p3=(x:idiom) -[with_pinyin_3:with_pinyin]->(pinyin_3:character_pinyin)
WHERE pinyin_3.character_pinyin.tone == 2 AND with_pinyin_3.position == 3

RETURN p0,p1,p2,p3

After running the statement above in the console of the visualization tool, select "Import Graph to explore" and you can see:

`Next step`

If this article is your first encounter with the Nebula Graph database, you can find more interesting introductory material on the Nebula Graph project and on the Nebula Graph community's official Bilibili channel 👉🏻 https://space.bilibili.com/472621355

In addition, Nebula Graph offers an official online trial environment; you can follow the documentation and use it to try things out.

Later, Nebula Graph will run daily Handou nGQL gymnastics activities, so stay tuned!

Happy Graphing!

`Appendix: Building Idiom Knowledge Graph`

`Collect and generate graph data`

$ python3 graph_data_generator.py

`Import data into the Nebula Graph database`

`Deploy the graph database`

With the help of Nebula-Up: https://github.com/wey-gu/nebula-up/ , one line will do.

$ curl -fsSL nebula-up.siwei.io/install.sh | bash -s -- v3.0.0

If the deployment is successful, you will see this result:

┌────────────────────────────────────────┐
│ 🌌 Nebula-Graph Playground is Up now!  │
├────────────────────────────────────────┤
│                                        │
│ 🎉 Congrats! Your Nebula is Up now!    │
│    $ cd ~/.nebula-up                   │
│                                        │
│ 🌏 You can access it from browser:     │
│      http://127.0.0.1:7001             │
│      http://<other_interface>:7001     │
│                                        │
│ 🔥 Or access via Nebula Console:       │
│    $ ~/.nebula-up/console.sh           │
│                                        │
│    To remove the playground:           │
│    $ ~/.nebula-up/uninstall.sh         │
│                                        │
│ 🚀 Have Fun!                           │
│                                        │
└────────────────────────────────────────┘

`Import the graph`

With the help of Nebula-Importer https://github.com/vesoft-inc/nebula-importer/1623aa959d73b9, one line will do.

$ docker run --rm -ti \
    --network=nebula-docker-compose_nebula-net \
    -v ${PWD}/importer_conf.yaml:/root/importer_conf.yaml \
    -v ${PWD}/output:/root \
    vesoft/nebula-importer:v3.0.0 \
    --config /root/importer_conf.yaml

The import takes a minute or two, after which the command exits normally.

`Connect the console to the graph database`

Get the address of this machine's first network interface; here it is 10.1.1.168:

$ ip address

2: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 2a:32:4c:06:04:c4 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.168/24 brd 10.1.1.255 scope global dynamic enp4s0

Enter the console container and execute the following command:

$ ~/.nebula-up/console.sh

# nebula-console -addr 10.1.1.168 -port 9669 -user root -p nebula

Check the imported data:

(root@nebula) [(none)]> show spaces
+--------------------+
| Name               |
+--------------------+
| "chinese_idiom"    |
+--------------------+

(root@nebula) [(none)]> use chinese_idiom
Execution succeeded (time spent 1510/2329 us)

Fri, 25 Feb 2022 08:53:11 UTC

(root@nebula) [chinese_idiom]> match p=(成语:idiom) return p limit 2
+------------------------------------------------------------------+
| p                                                                |
+------------------------------------------------------------------+
| <("一丁不识" :idiom{pinyin: "['yi1', 'ding1', 'bu4', 'shi2']"})> |
| <("一丝不挂" :idiom{pinyin: "['yi1', 'si1', 'bu4', 'gua4']"})>   |
+------------------------------------------------------------------+

(root@nebula) [chinese_idiom]> SUBMIT JOB STATS
+------------+
| New Job Id |
+------------+
| 11         |
+------------+
(root@nebula) [chinese_idiom]> SHOW STATS
+---------+--------------------+--------+
| Type    | Name               | Count  |
+---------+--------------------+--------+
| "Tag"   | "character"        | 4847   |
| "Tag"   | "character_pinyin" | 1336   |
| "Tag"   | "idiom"            | 29503  |
| "Tag"   | "pinyin_part"      | 57     |
| "Edge"  | "with_character"   | 116090 |
| "Edge"  | "with_pinyin"      | 5943   |
| "Edge"  | "with_pinyin_part" | 3290   |
| "Space" | "vertices"         | 35739  |
| "Space" | "edges"            | 125323 |
+---------+--------------------+--------+

`Appendix: Schema nGQL for Graph Modeling`

CREATE SPACE IF NOT EXISTS chinese_idiom(partition_num=5, replica_factor=1, vid_type=FIXED_STRING(24));
USE chinese_idiom;
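# Create the vertex types (tags)
CREATE TAG idiom(pinyin string); # idiom
CREATE TAG character(); # Chinese character
CREATE TAG character_pinyin(tone int); # pinyin of a single character
CREATE TAG pinyin_part(part_type string); # part of a pinyin (initial or final)
# Create the edge types
CREATE EDGE with_character(position int); # contains the character
CREATE EDGE with_pinyin(position int); # is pronounced
CREATE EDGE with_pinyin_part(part_type string); # contains the pinyin part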

