How does "People You May Know" find you on social apps?

Abstract: The ex you no longer see, the junior high school classmates whose faces you can't remember, former colleagues, even the person you least want to see (your boss): how did these people end up among the recommended users of your social apps? The key technology behind this list is link prediction over a knowledge base, also known as knowledge graph completion.

I searched for them among the crowd a thousand times; then, with a sudden glance back, there they were, on the recommended list.

One of the most striking capabilities of social apps is their deep mining of user relationships. You may have blocked someone's phone number, WeChat, and every social account, yet they still show up, without fail, in the "People You May Know" section. These people include the ex you no longer see, junior high school classmates whose faces you can't remember, former colleagues, and even the person you least want to see: your boss.

▲ Douyin's "Discover friends" page

So how did these people appear on your list?

The key technology is link prediction over a knowledge base, also known as knowledge graph completion.

One picture to understand what a knowledge graph is

A knowledge graph is a kind of multi-relational graph in which knowledge is written as structured triples; it contains entities, concepts, and relations.

Entities are things in the real world, such as people, places, and institutions. Concepts are collections of entities with shared characteristics, such as "athletes" or the "Golden Globes" in the figure below. Relations express a certain connection between different entities.

A knowledge graph weaves entities and relations into a graph, intuitively modeling all kinds of real-world scenarios. The process of building a knowledge graph is, in essence, the process of building cognition and understanding the world.
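As a minimal illustration, a knowledge graph can be stored as a plain list of (head, relation, tail) triples. The facts and names below are made up for the example:

```python
# A tiny knowledge graph stored as (head, relation, tail) triples.
# All facts here are illustrative, not from a real dataset.
triples = [
    ("Li Bai", "occupation", "poet"),
    ("Li Bai", "era", "Tang dynasty"),
    ("Du Fu", "occupation", "poet"),
]

# Query: all entities whose occupation is "poet".
poets = [h for (h, r, t) in triples if r == "occupation" and t == "poet"]
print(poets)  # ['Li Bai', 'Du Fu']
```

Even this trivial structure already supports simple queries; completion, discussed next, is about adding the triples that are missing.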

How is a knowledge graph completed?

Take Xiaoming as an example. Xiaoming works at Sina's office in Wudaokou, so the system can infer that Xiaoming works in Beijing, and then recommend Xiao Wang, who also works at Sina in Beijing, to him. In the figure below, blue arrows indicate relationships that already exist, and red arrows indicate relationships added after knowledge graph completion.
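The inference above can be sketched as a simple rule over triples: if (X, works_at, Y) and (Y, located_in, Z), then (X, works_in, Z). The entity and relation names below are illustrative, not from any real system:

```python
# Rule-based completion sketch for the Xiaoming example:
# (X, works_at, Y) and (Y, located_in, Z)  =>  (X, works_in, Z)
triples = {
    ("Xiaoming", "works_at", "Sina Wudaokou"),
    ("Sina Wudaokou", "located_in", "Beijing"),
    ("Xiao Wang", "works_at", "Sina Beijing"),
    ("Sina Beijing", "located_in", "Beijing"),
}

inferred = set()
for (x, r1, y) in triples:
    if r1 != "works_at":
        continue
    for (y2, r2, z) in triples:
        if r2 == "located_in" and y2 == y:
            inferred.add((x, "works_in", z))

# Both Xiaoming and Xiao Wang end up with (…, works_in, Beijing),
# so they can be recommended to each other.
print(inferred)
```

Real systems learn such missing links statistically rather than from hand-written rules, which is where representation learning comes in below.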

The relationship between knowledge graph and knowledge representation learning

A knowledge graph is composed of entities and relations, usually expressed as triples: head (head entity), relation (the relation between the entities), and tail (tail entity), abbreviated as (h, r, t). The task of knowledge representation learning is to learn distributed representations of h, r, and t (also known as the embedding representation of the knowledge graph). It is fair to say that embeddings are what make AI-style knowledge graph applications possible.

How to understand Embedding?

Simply put, an embedding describes an object (a word, character, sentence, article, ...) along multiple dimensions; it is equivalent to modeling the object with numbers.

For example, the RGB representation of colors in Photoshop, which we use all the time, is a bona fide embedding. A color is described along three feature dimensions: R (red intensity, range 0-255), G (green intensity, range 0-255), and B (blue intensity, range 0-255). RGB(0, 0, 0) is black; RGB(41, 36, 33) is ivory black. In this way, we can describe colors with numbers.
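The RGB analogy can be made concrete: once colors are 3-dimensional vectors, they can be compared numerically. A small sketch using ordinary Euclidean distance:

```python
import math

# Colors as 3-dimensional "embeddings": (R, G, B), each 0-255.
black = (0, 0, 0)
ivory_black = (41, 36, 33)
white = (255, 255, 255)

def euclidean(a, b):
    """Euclidean distance between two color vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Ivory black is numerically much closer to black than to white.
print(euclidean(black, ivory_black))  # ≈ 63.8
print(euclidean(black, white))        # ≈ 441.7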

What are the methods of knowledge representation learning?

The key to knowledge representation learning is designing a reasonable scoring function: given a fact triple that is true, we want its score to be maximized. By how the scoring function is realized, methods fall into the following two categories:

Structure-based approach

The basic idea of this type of model is to learn representations of the knowledge graph's entities and relations from the structure of the triples themselves. The most classic algorithm is the TransE model. Its basic idea is that the head entity vector h plus the relation vector r should be as close as possible to the tail entity vector t, that is, h + r ≈ t. Here "close" can be measured with the L1 or L2 norm. The schematic diagram is as follows:
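The TransE criterion fits in a few lines of code: score a triple by how far h + r lands from t. The toy embeddings below are hand-picked for illustration, not trained:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE distance ||h + r - t||: smaller means more plausible."""
    return np.linalg.norm(h + r - t, ord=norm)

# Toy 3-dimensional embeddings (illustrative values, not trained).
h = np.array([0.2, 0.5, 0.1])   # head entity
r = np.array([0.3, -0.1, 0.4])  # relation
t_true = np.array([0.5, 0.4, 0.5])   # tail that fits h + r
t_wrong = np.array([0.9, 0.9, 0.9])  # unrelated tail

print(transe_score(h, r, t_true))   # ~0: h + r ≈ t, plausible
print(transe_score(h, r, t_wrong))  # larger: implausible triple
```

Training pushes true triples toward distance 0 and corrupted triples away, usually with a margin-based loss.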

This family of knowledge representation learning models also includes TransH, TransR, TransD, TransA, and others.

Semantic-based approach

This type of model learns representations of the knowledge graph's entities and relations from the perspective of semantics. Such methods include LFM, DistMult, ComplEx, ANALOGY, ConvE, and others.

Application of knowledge representation learning

Since it is based on representation learning, the entities and relationships of the knowledge graph can be vectorized to facilitate the calculation of subsequent downstream tasks. Typical applications are as follows:

1) Similarity calculation: Using the distributed representation of entities, we can quickly calculate the semantic similarity between entities, which is of great significance for many tasks in natural language processing and information retrieval.

How is similarity calculated? Here is an example.

Suppose the embedding of the word "Li Bai" has 5 dimensions with the values [0.3, 0.5, 0.7, 0.03, 0.02], where each dimension represents the degree of correlation with some attribute; here, the five dimensions stand for [poet, writer, man of letters, freelancer, knight-errant].

Suppose also that "Wang Wei" = [0.3, 0.55, 0.7, 0.03, 0.02] and "Newton" = [0.01, 0.02, 0.06, 0.4, 0.01]. We can use cosine similarity (in geometry, the cosine of the angle between two vectors measures how much their directions differ; machine learning borrows this idea to measure the difference between sample vectors) to compare these words. Clearly, "Li Bai" is close to "Wang Wei" and far from "Newton", so we judge that "Li Bai" and "Wang Wei" are more similar.
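The comparison can be checked with a few lines of code implementing cosine similarity over the example vectors from the text:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

li_bai = [0.3, 0.5, 0.7, 0.03, 0.02]
wang_wei = [0.3, 0.55, 0.7, 0.03, 0.02]
newton = [0.01, 0.02, 0.06, 0.4, 0.01]

print(cosine_similarity(li_bai, wang_wei))  # close to 1
print(cosine_similarity(li_bai, newton))    # much smaller
```

The Li Bai and Wang Wei vectors point in nearly the same direction, so their cosine similarity is close to 1, while Newton's is far lower.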

2) Knowledge graph completion. Building a large-scale knowledge graph requires constantly supplementing the relations between entities. With a knowledge representation learning model, the relation between two entities can be predicted: given a known head entity and a relation, we predict the tail entity. This is generally called link prediction over the knowledge base, also known as knowledge graph completion. The "Xiaoming in Wudaokou" example above illustrates it well.
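With a TransE-style model, link prediction amounts to ranking every candidate tail entity by ||h + r - t|| and picking the closest. A toy sketch with made-up 2-dimensional embeddings:

```python
import numpy as np

# Toy embeddings (illustrative, not trained). Link prediction asks:
# given head "Xiaoming" and relation "works_in", which tail fits best?
entities = {
    "Xiaoming": np.array([0.1, 0.2]),
    "Beijing": np.array([0.4, 0.6]),
    "Shanghai": np.array([0.9, 0.1]),
}
relations = {"works_in": np.array([0.3, 0.4])}

h = entities["Xiaoming"]
r = relations["works_in"]

# Score each candidate tail by the TransE criterion ||h + r - t||.
candidates = ["Beijing", "Shanghai"]
ranked = sorted(candidates, key=lambda e: np.linalg.norm(h + r - entities[e]))
print(ranked[0])  # Beijing: h + r lands right on its embedding
```

In practice the ranking runs over all entities in the graph, and the top-scoring tails become candidate new triples.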

3) Other applications. Knowledge representation learning has been widely used in tasks such as relation extraction, automatic question answering, entity linking, etc., showing great application potential.

Automatic question answering is a major application deeply integrated with knowledge representation learning. For an intelligent question-answering product, the backend is generally designed in three layers: an input layer, a representation layer, and an output layer. In short, the input layer is a question bank gathering all the questions users might ask; after knowledge extraction in the representation layer, the result is finally returned.

Typical intelligent question-answering products include Apple's Siri, Microsoft XiaoIce, Baidu, and Alibaba's AliMe. A key feature of these products is that they make search results precise: instead of returning a pile of similar pages for you to sift through yourself, what you ask is exactly what gets answered. For example, if you search for "Wang Sicong's net worth", the result is the specific figure.

Summary

In short, social products rely on knowledge graph completion technology: they predict missing triples from the representations of entities and relations, predicting the tail entity from a known head entity and its relation. In other words, friend recommendations are generated from your user profile. If you don't want those "old acquaintances" to appear in your recommendation list, the best approach is to turn off location access in social products and minimize the disclosure of personal information.

Reference

1. Liu Zhiyuan, Sun Maosong, Lin Yankai, Xie Ruobing, "Research Progress in Knowledge Representation Learning"

Click to follow and learn about Huawei Cloud's fresh technology for the first time~

