Abstract: At the 7th Global Software Conference, Huawei software development engineer Yu Jibo talked with developers about the intelligent practice of the Huawei Cloud official website. The talk focused on the six processes of content operation (content production, content analysis, content quality inspection, content distribution, content consumption, and user feedback) and on the business pain points encountered along the way.
This article is shared from the HUAWEI CLOUD community "HUAWEI CLOUD Official Website Five Key Initiatives for Intelligent Practice [Global Software Conference Technology Sharing]", original author: Technical Torchbearer.
The Internet generates massive amounts of content every moment. According to a report from Rui Ya, within 60 seconds China's Internet generates 4.2 million voice messages, 8.3 million shared videos, 4.16 million search queries, and 1.65 million Weibo visits.
Faced with so much content, how should we do a good job in website content operation?
At the 7th Global Software Conference, Huawei software development engineer Yu Jibo talked with developers about the intelligent practice of the Huawei Cloud official website. The talk covered the six processes of content operation (content production, content analysis, content quality inspection, content distribution, content consumption, and user feedback) as well as the business pain points encountered in those processes.
It also focuses on how Huawei Cloud uses AI algorithms and models to provide automation capabilities, reduce labor costs, and improve content quality and content distribution efficiency.
How to judge content quality and what is the key to efficient content distribution?
In the digital age, traffic is the key to website content operation, and the good experience created by high-quality content and efficient content distribution is the basis for traffic growth. A negative example is Indian media misusing a photo of Putin in a report on sexual assault. Positive examples are the news, e-commerce, and video websites that distribute content through recommendation and search.
So how does the Huawei Cloud official website operate as a content website?
First, let us introduce the content life cycle and content operation process of HUAWEI CLOUD. Content operation on the HUAWEI CLOUD official website is divided into six stages: content production, content analysis, content quality inspection, content distribution, content consumption, and user feedback. The pages, documents, audio, video, and pictures of the official website are first analyzed and understood. After the content passes review, operators distribute it to the live website. After the content is consumed on the HUAWEI CLOUD official website, end users feed their opinions back through internal and external platforms.
In the content operation process, our pain points include the following:
- A large amount of multimedia content (audio, video, pictures, etc.) requires in-depth semantic analysis before its quality can be judged and it can be distributed effectively, which is time-consuming and labor-intensive;
- The volume of published content is large and updates are frequent, so content quality inspection consumes a lot of manpower and is inefficient;
- Traditional manually configured operation cannot meet the individual needs of complex customer groups, which easily reduces user interest and leads to user loss;
- The access experience of end users cannot be effectively collected, analyzed, and closed-looped, which hinders rapid improvement of the product experience.
In response to the above problems, we use intelligent solutions to address the business pain points at each stage:
- In the content analysis stage, we use OCR, ASR, NLP, and other technologies to automatically extract structured information from the content and reduce labor costs;
- In the content review stage, we use NLP technology and the Huawei Cloud Moderation service for machine review;
- In the content distribution stage, we use structured content information (TDK, tags, categories, etc.) together with smart recommendation, smart search, and related technologies to improve the efficiency and accuracy of content distribution and the user experience;
- In the user feedback stage, we use NLP-related technology for sentiment analysis and voice classification on user feedback, handle issues in a timely manner, close the loop, and continuously produce product improvement suggestions.
The following sections describe the Huawei Cloud intelligent operation practices in detail.
Key measures for the practice of intelligent operation of the official website
First, I will introduce the overall architecture of intelligent operation for the Huawei Cloud official website. The architecture is relatively simple and contains several key layers.
The bottom layer is the basic service layer. All our businesses are built on Huawei Cloud services, including the AI-related OCR, ASRC, NLP, RES, and ModelArts services, the big-data-related DLI and MRS services, and basic SQL and NoSQL storage services. Above the basic service layer is the core data layer, which holds user portraits, behavior data, item information, and other data. The middle layer is our feature engineering and algorithm model layer, where the algorithm models are concentrated in NLP, intelligent recommendation, and intelligent search. Above that, we have built service components that support different business scenarios, including portrait and label components, strategy management and ranking components, and AB testing and log collection components. The top layer holds the application scenarios, which mainly include five-faced, recommendation, search, public opinion, and intelligent question answering.
Next, I will focus on some key measures of the intelligent practice.
Key measure 1: Content analysis
In the content analysis stage, we use HUAWEI CLOUD's OCR and ASR technologies to extract text from pictures, audio, and video so that the next step, automated content review, can work on it. At the same time, we use NLP-related technologies to extract structured information from the text, such as keywords, abstracts, tags, categories, and topics. This structured information is used for search engine optimization and for model training in the content distribution stage.
Key measure 2: Content quality inspection
After text has been extracted from the content and semantically understood, we use automated means to inspect content quality, including text error correction, content review, and regulatory inspection. Text error correction provides pinyin-based error correction, N-Gram substring-based error correction, and language-model-based error correction. The business only needs to update the keywords and corpus regularly and retrain the model regularly.
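To make the N-Gram idea concrete, here is a minimal sketch of bigram-based correction. It is an illustration under assumptions, not the production implementation: the function names, the whitespace tokenization (Chinese text would need a word segmenter), the hand-written confusion set, and the add-one smoothing are all hypothetical choices.

```python
from collections import Counter


def train_bigrams(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def correct(sentence, confusions, unigrams, bigrams):
    """Replace each token with the confusion-set candidate that best follows
    the previous (already corrected) token under the bigram model."""
    vocab = len(unigrams)
    prev, out = "<s>", []
    for token in sentence.split():
        # The original token comes first, so it wins ties in max().
        candidates = [token] + confusions.get(token, [])
        # Add-one smoothed estimate of P(candidate | prev).
        best = max(
            candidates,
            key=lambda c: (bigrams[(prev, c)] + 1) / (unigrams[prev] + vocab),
        )
        out.append(best)
        prev = best
    return " ".join(out)
```

In this sketch, updating the keywords and corpus regularly corresponds to extending `confusions` and re-running `train_bigrams` on the refreshed corpus.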
Content review is connected to the Moderation service of HUAWEI CLOUD, which can review text, images, and video. The business only needs to update the sensitive vocabulary regularly. In addition, there are regulatory inspections that cover 404 dead links, TDK information, currency units, and so on. These are mainly implemented with crawler services and a rule engine.
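A rule-based TDK (title, description, keywords) inspection over crawled HTML could look roughly like the sketch below. The rule names, the 60-character title limit, and the `check_page` helper are assumptions made for illustration; they are not the actual rules used on the official website.

```python
from html.parser import HTMLParser


class TDKParser(HTMLParser):
    """Collects the title, description, and keywords (TDK) of a page."""

    def __init__(self):
        super().__init__()
        self.tdk = {"title": "", "description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") in ("description", "keywords"):
            self.tdk[attrs["name"]] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.tdk["title"] += data.strip()


# Each rule is (name, predicate); the predicate returns True when the page passes.
# The 60-character limit is an illustrative value, not an official threshold.
RULES = [
    ("title_present", lambda tdk: bool(tdk["title"])),
    ("title_max_60_chars", lambda tdk: len(tdk["title"]) <= 60),
    ("description_present", lambda tdk: bool(tdk["description"])),
    ("keywords_present", lambda tdk: bool(tdk["keywords"])),
]


def check_page(html):
    """Return the names of all rules that the page violates."""
    parser = TDKParser()
    parser.feed(html)
    return [name for name, passes in RULES if not passes(parser.tdk)]
```

Keeping the rules in a plain list means new checks (for example, a currency-unit pattern) can be added without touching the crawler or the parser.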
Key measure 3: Content distribution-smart recommendation
In the content distribution stage, we mainly introduce smart recommendation and smart search. Smart recommendation predicts user interests from user and item portraits and user behavior, so that content can be found and recommended accurately and the conversion rate increases.
The system architecture of Huawei Cloud smart recommendation is as follows. Based on offline data in OBS, DLI offline processing extracts user and item portraits and user behavior information, and is also used for feature engineering and for training the recall and ranking models. After training, the models are published to the ModelArts platform, which provides online inference.
We also support real-time recommendation. The business uploads user and item information through a DIS channel to update user and item portraits in real time. The DIS channel also streams in real-time behavior to update user interest tags and recall a real-time recommendation result set. Finally, when a user visits an official website page, the page calls the ModelArts interface, which returns the ranked recommended content.
Key measure 3: Content distribution-recommendation algorithm
Recommendation algorithms in the industry are relatively mature, and we have adopted commonly used recall and ranking algorithms. The recall stage uses collaborative filtering and interest matching, and the ranking stage mainly uses LR and DeepFM. The advantage of LR is that the model is simple and efficient with a small amount of computation, but its disadvantage is that it cannot capture interactions between features. The advantage of DeepFM is that it combines low-order and high-order feature interactions, and accuracy tends to improve as more features are added.
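As a rough illustration of the recall stage, the following sketch implements one common form of collaborative filtering: item-based similarity computed from click co-occurrence. The article does not say which variant is used in production, so the cosine-style similarity, the function names, and the data layout (a dict from user to a set of clicked items) are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(user_items):
    """Cosine-style similarity between items, based on users who clicked both."""
    item_count = defaultdict(int)
    co_count = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_count[a][b] += 1
            co_count[b][a] += 1
    return {
        a: {b: c / sqrt(item_count[a] * item_count[b]) for b, c in related.items()}
        for a, related in co_count.items()
    }


def recall(user, user_items, sim, top_n=3):
    """Recall the unseen items most similar to what the user already clicked."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Sort by descending score; break ties by item name for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

The recalled candidates would then be passed to a ranking model such as LR or DeepFM, which is not shown here.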
In the end, intelligent recommendation has brought clear improvements to the business: content distribution efficiency has improved from hour-level to minute-level, and the coverage rate of content push has increased to 90%+.
In addition, the click-through rates of product and activity recommendations on the official website, the registration and purchase conversion rates, and the click-through rate of blog recommendations on the community homepage have all improved.
On intelligent recommendation for content distribution, we have also summarized a few thoughts and lessons:
- For business scenarios with small data volumes, prefer to launch algorithms with simple models and strong interpretability, then optimize quickly and verify the effect of the algorithm quickly through AB tests;
- Make full use of the user's near-line and search behavior: near-line behavior represents the user's real-time interest and search generally represents the user's content needs, so both help improve business indicators;
- In recommendation scenarios, no algorithm is omnipotent. The appropriate algorithm has to be selected based on the scenario, the user and business characteristics, and the results of data analysis.
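The AB testing mentioned in the first point can be sketched as deterministic bucketing plus a per-bucket click-through rate. This is a generic pattern rather than a description of the official website's AB testing component; the function names, the SHA-256 hashing, and the bucket labels are assumptions.

```python
import hashlib


def assign_bucket(user_id, experiment, buckets=("control", "treatment")):
    """Deterministically map a user to a bucket for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


def click_through_rate(events):
    """events: iterable of (bucket, clicked) pairs -> CTR per bucket."""
    shown, clicked = {}, {}
    for bucket, was_clicked in events:
        shown[bucket] = shown.get(bucket, 0) + 1
        clicked[bucket] = clicked.get(bucket, 0) + int(was_clicked)
    return {bucket: clicked[bucket] / shown[bucket] for bucket in shown}
```

Hashing the experiment name together with the user ID keeps each user in the same bucket across visits while letting different experiments split users independently.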
Key measure 4: Content distribution-intelligent search
The other key measure of intelligent distribution is intelligent search. The data statistics and the heat map analysis on the right show that users pay the most attention to the structured card section and the top-ranked articles in the search results, and that attention drops the further down a result appears. Therefore, our search optimization focuses on the following aspects: 1. Smart card recall; 2. Search recall optimization; 3. Search ranking optimization.
Smart card recall
For smart card recall, we mainly use the FastText model to predict the card category that corresponds to the user's search term, which is a text classification task. The input layer takes the vectors of the words that make up the query, and the output layer is a softmax layer that outputs the predicted cards and their probabilities.
We have also optimized the structure of the hidden layer. The original structure averages the superimposed word vectors. Although this is fast to compute, it loses information. Therefore, we changed the hidden layer so that the word embeddings are concatenated and then passed through a fully connected layer.
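The difference between the two hidden-layer designs can be shown with a small sketch. It reflects one reading of the description above (zero-padded concatenation followed by a single dense layer); the function names, the padding scheme, and the `max_words` parameter are assumptions, and no activation function is applied.

```python
def average_hidden(word_vectors):
    """Original FastText-style hidden layer: element-wise mean of word vectors."""
    n = len(word_vectors)
    return [sum(dim) / n for dim in zip(*word_vectors)]


def concat_hidden(word_vectors, weights, bias, max_words):
    """Concatenate (truncated or zero-padded) word vectors, then apply one dense layer."""
    dim = len(word_vectors[0])
    padded = word_vectors[:max_words] + [[0.0] * dim] * (max_words - len(word_vectors))
    flat = [x for vec in padded for x in vec]
    # weights has one row per output unit; each row spans max_words * dim inputs.
    return [sum(w * x for w, x in zip(row, flat)) + b for row, b in zip(weights, bias)]
```

Averaging gives the same hidden vector for "cloud server" and "server cloud", which is one form of the information loss mentioned above, while the concatenated variant keeps word positions distinct at the cost of more parameters.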
Recall optimization based on deep semantic model RNN-Attention-DSSM
We use the RNN-Attention-DSSM model to optimize search recall. Traditional ES queries recall documents by keyword matching, so documents whose keywords do not match but whose meaning is the same cannot be recalled. The DSSM model uses large volumes of Query-Doc exposure and click logs, uses a DNN to represent Query and Doc as low-dimensional semantic vectors, measures the distance between the two vectors with cosine similarity, and in this way trains a semantic similarity model. RNN-Attention-DSSM is a further optimization of DSSM that takes the context of the sentence into account through an RNN and an Attention mechanism.
The RNN-Attention-DSSM model is structured as follows. The top layer is a typical DSSM layer, which calculates semantic similarity from the vector distance between the query and the positive and negative documents and then applies softmax. The training goal is to maximize the probability of the positive documents given the query. Below it, on the left is a typical GRU network and on the right is a typical Self-Attention model.
Our training data is as follows: positive samples are Docs clicked for a Query, negative samples are randomly selected from Docs not clicked for that Query, and the ratio of positive to negative samples is 1:4. The Query input is the user's query text, and the Doc input is the document title plus the book name.
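The DSSM training objective for one query with one positive and four negative documents can be sketched as follows. The encoders (GRU and Self-Attention) are omitted and the vectors are taken as given; the scaling factor `gamma` follows the original DSSM formulation and its value here is an assumption, as are the function names.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def dssm_loss(query_vec, pos_doc_vec, neg_doc_vecs, gamma=10.0):
    """Negative log-probability of the clicked doc among 1 positive + N negatives."""
    sims = [cosine(query_vec, pos_doc_vec)]
    sims += [cosine(query_vec, d) for d in neg_doc_vecs]
    # Softmax over scaled similarities; index 0 is the positive document.
    exps = [exp(gamma * s) for s in sims]
    return -log(exps[0] / sum(exps))
```

Minimizing this loss over the click logs is equivalent to maximizing the probability of the positive document given the query.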
Ranking optimization based on the learning-to-rank algorithm RankNet
We also use the RankNet model to optimize the ranking of search recall results, placing highly relevant docs at the top to improve the accuracy of search results and the user experience. RankNet is a pairwise method. It does not care about the specific relevance value between a given doc and the query. Instead, it turns the problem of ranking all docs into the problem of ordering any two docs. That is, for a pair doc_i and doc_j there are three cases: doc_i is more relevant than doc_j, doc_j is more relevant than doc_i, or the two are equally relevant, labeled 1, -1, and 0 respectively.
As shown in the figure above, the RankNet algorithm works as follows. On the left, features are extracted from the user's query and the recalled articles, then a DNN computes a score for each document, then the difference between the scores of two documents is calculated, and finally the sigmoid function constrains the value to the interval (0, 1).
On the far right is the label data. It currently uses the number of clicks on each document and compares the click counts of documents in pairs: the document with fewer clicks gives -1, equal clicks give 0, and more clicks give 1. The comparison value is then linearly rescaled to 0, 0.5, or 1. The goal of model training is to bring the comparison value predicted by the model as close as possible to the comparison value from the label data, and training uses the cross-entropy loss function.
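The label construction and loss described above correspond to the standard RankNet formulation, sketched below. The scoring DNN is omitted and the scores are taken as plain numbers; the function names are illustrative.

```python
from math import exp, log


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def pair_label(clicks_i, clicks_j):
    """Compare click counts as {1, 0, -1}, then rescale linearly to {1, 0.5, 0}."""
    s_ij = (clicks_i > clicks_j) - (clicks_i < clicks_j)
    return 0.5 * (1 + s_ij)


def ranknet_loss(score_i, score_j, target):
    """Cross-entropy between the predicted and target pair probabilities."""
    p = sigmoid(score_i - score_j)  # predicted P(doc_i ranks above doc_j)
    return -target * log(p) - (1 - target) * log(1 - p)
```

For a pair with equal clicks the target is 0.5, so the loss is smallest when the two scores are equal, which matches the intent of the 0 label.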
Our smart search has also brought good results. Both smart card recall and ranking optimization have increased the search click-through rate for the Top1000 and Top5000.
As next steps, we plan first to further improve the offline indicators of the ranking model by selecting a richer feature set based on business understanding and finding more relevance-related features. Second, we plan to distinguish long queries from short ones and train a separate model for short queries to improve the ranking accuracy of short queries. Finally, we plan to use NLU to mine the user's search intent further and address cases where that intent is unclear.
Key measure 5: Experience closed loop-sentiment analysis and voice classification
Analyzing and improving user experience issues is an important way to keep improving the product experience. We mainly use NLP technology to analyze user sentiment and to classify and dispatch experience problems. The logical view is as follows:
After internal and external user voices (feedback) are collected, they are deduplicated, cleaned, and stored in the database. NLP and other capabilities are then used for sentiment analysis and voice classification: negative voices trigger timely public opinion warnings, and product experience problems and requirements are tracked and closed through bug tickets and requirement tickets respectively. We also have a corresponding operation management platform for public opinion configuration, key public opinion tracking, sentiment feedback, and Kanban data presentation. The model used here is also relatively simple: the bottom layer is a BERT pre-trained model with a classification model connected downstream.
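The flow from collected feedback to alerts and tickets can be sketched as a small routing function. The classifier is injected as a parameter and stands in for the BERT-based model; the queue names, the sentiment and category labels, and the normalization used for deduplication are assumptions made for illustration.

```python
def route_feedback(messages, classify):
    """Deduplicate feedback, classify each item, and route it to a queue.

    classify(text) is assumed to return a (sentiment, category) pair.
    """
    queues = {"alert": [], "bug_ticket": [], "requirement_ticket": [], "archive": []}
    seen = set()
    for text in messages:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        if not key or key in seen:
            continue
        seen.add(key)
        sentiment, category = classify(text)
        if sentiment == "negative":
            queues["alert"].append(text)  # timely public opinion warning
        if category == "bug":
            queues["bug_ticket"].append(text)
        elif category == "requirement":
            queues["requirement_ticket"].append(text)
        elif sentiment != "negative":
            queues["archive"].append(text)
    return queues
```

Passing the classifier in as an argument keeps the routing logic testable with a simple stub and lets the underlying model be retrained or replaced without changing the pipeline.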
In the end, our results are as follows:
1. The accuracy of negative sentiment analysis reaches 95%+;
2. The workload of sentiment analysis is greatly reduced, and less manpower is required;
3. The handling of negative sentiment has improved from hour-level to minute-level;
4. Based on the classification of experience problems, 50+ effective improvement suggestions for cloud services have been driven to a closed loop.
Our lessons are: 1. Category definitions should be as clear as possible, easy to distinguish, and low in ambiguity; 2. The labeled corpus should be provided in small batches at high frequency, with sampling-based quality inspection against a 95% accuracy threshold.
Summary of engineering practice
Our engineering practice is relatively simple. Based on the Huawei Cloud ModelArts one-stop development platform, we have built capabilities for data processing, model training, model management, and deployment, and we use the scheduled jobs of DGC to train and release models continuously.
To make content operations more intelligent, we are currently also working on the following:
- Using the pre-training capability of the HUAWEI CLOUD Pangu NLP large model to improve the accuracy of text classification and information extraction;
- Using AI algorithms to generate article content intelligently from the keywords and new features of Huawei Cloud products;
- Using in-depth semantic mining and the structured information of the content to establish associations among HUAWEI CLOUD content, build unified life cycle management for the content, and construct a knowledge graph from those associations for intelligent recommendation and search;
- Scoring article quality with a multi-task model based on page vision, information content, and semantic depth to improve content quality.
Benefits
Now that you have read about the key measures for the intelligent practice of the HUAWEI CLOUD official website, have you learned something, or do you have questions you would like to discuss? You are welcome to leave your questions or thoughts in the comment area of the original article. We will pick 3 of them and invite experts to talk with you one-on-one (the original portal is here), and a developer gift package will be given.
This time, two Huawei experts also shared front-end low-code practice. They answered developers' questions on topics such as the best solution for guaranteeing website high availability and the selection of low-code platforms. You are welcome to scan the code to watch the video.
Finally, the slides from the talk by Huawei front-end R&D engineer Guo Xiao at this Global Software Conference are attached. Click [Five key measures for intelligent practice on Huawei Cloud official website] at the end of the article to download and view them.