Cloud Music's Dawning Event Tracking: Restoring the Ideal State of Data

Image credit: https://unsplash.com
Author: Fisherman

Background

As the mobile Internet enters its second half, algorithm-driven personalized recommendation and fine-grained manual operations, both powered by analysis of user behavior data, have become standard equipment for every product. Data has become a core competitive advantage, and companies have built their own data warehouses and data platforms. Event tracking (instrumentation points embedded in the product) is the main data source of Internet products and plays a crucial role here. As the figure below shows, it is the origin of the entire data link and determines the quality and capability of the whole data system. However, the production and consumption chain of tracking data is long, its effects are not directly visible to users, and information is hard to pass along and retain across the process. As a result, tracking problems are frequent, tend to be discovered late, and are costly to guard against. In addition, as a business moves through different stages, its requirements for tracking change, and the tracking system often struggles to keep up.

Cloud Music is a typical complex content product. It has many content media (songs, podcasts, videos, posts, comments, live streams, karaoke rooms, and so on), many organizational forms (playlists, albums, topics, Cloud Circles, and so on), multiple user identities (musicians, creators, ordinary users, hosts, and so on), and many content distribution scenarios (recommendation feeds, search, lists, zones, and so on). Like any content product, it is sensitive to distribution efficiency, so it has very complex needs for user consumption data and behavior analysis across domains such as overall traffic, content, and users. This makes end-to-end standardization and comprehensive tracking very challenging. Product decisions, BI analysis, data warehouse development, algorithmic recommendation, and many other functions all depend on a tracking design that is efficient, stable, and standardized as a whole. Historically, however, the solution was poorly built and carried a heavy legacy burden, so tracking had long been a prominent pain point for the business, with the problems shown in the figure below.

Against this background, the author, together with NetEase Hangzhou Research Institute and many departments within Cloud Music, launched the Dawning (Shuguang) tracking project, designing the solution and driving its joint construction, rollout, and optimization. The core construction is now largely complete, and core businesses such as search, podcasts, and homepage recommendation have finished their tracking and data migration. The remaining work, planned for 2022, covers the platform's regression audit capability, further build-out and integration, some usability improvements, and rollout and governance in more businesses.

Industry research

Improving the efficiency and quality of tracking requires two things. The first is good design of the slot positioning standard, the timing and definition of tracking events, and the way parameters are set and collected, so that information is expressed accurately and errors become less likely. The second is a set of tools and platforms that support the people involved at each stage of the process. I surveyed the solutions of companies across the industry, and they focus on the following aspects.

For slot positioning: since front ends (web and native clients alike) render layouts as a DOM-like tree, a natural idea is to describe a slot by the position of its node in that tree. Mixpanel, GrowingIO, Sensors Data (Shence), NetEase HubbleData, and others chose an x-path style description. They generate x-path based tracking automatically without code changes, take screenshots on the device automatically, and let users select slots visually and attach additional business parameters through an IDE or platform. They also optimize the description in various ways to make the ID less sensitive to changes in level, type, and position. Even so, this approach has not fundamentally solved two problems: keeping the positioning description stable and standardized, and keeping it consistent across platforms. It is therefore hard to use for core business reports or online algorithms.

Tencent, Meituan, ByteDance, Kuaishou, and others manage tracking precisely per slot, combining slot screenshots with parameter management. The positioning description is either generated randomly by the platform or entered by product planners following an agreed specification. All tracking parameters are set manually by developers for each slot, with no collection or aggregation of context across levels, and IDE plug-ins are used to make development and testing more efficient (see: ByteDance's "Best Practices for Large-Scale Tracking Data Governance"). Among them, Meituan found a good way to combine this with x-path in its internal dynamic layout: it connected visual slot selection (x-path) with the parameters configured in the data model and built visual automatic tracking on MTFlexbox, making tracking fully self-service for product planners and BI.

Alibaba uses four-segment SPM (site.page.block.position within the block) and SCM (delivery system ID.delivery algorithm ID.delivery algorithm version ID.delivery audience ID) to standardize the description of location and content. Developers likewise identify slots and set parameters slot by slot, without collecting context across levels. Tracking on each platform is implemented with AOP, and the platform supports configuring additional business parameters by annotating screenshots. NetEase also adopted a scheme similar to Alibaba's.

It can be seen that none of the major companies has adopted self-generated x-path-like positioning descriptions, or automatic collection and aggregation of context parameters, in their mainstream products. Cloud Music earlier used similar management schemes as well: slot location descriptions were either randomly generated by the platform or written as four-segment SPM. However, Cloud Music is a complex, community-style content product with many ways of distributing and organizing resources, and it has its own engineering history. As the figure below shows, a playlist or a post can appear in many scenarios and module sections. Under the industry's current management approaches, even when the playlist card component is unified and reused on the client, its tracking has to be extended again for every new combination of scenario and presentation. Workload, quality, and standardization of tracking development have therefore always been pain points.

Given the business and architectural characteristics of content products (a follow-up article will analyze them in detail), fully solving the problem and meeting business needs still requires an x-path-like solution in which positions are generated automatically and page-level context is aggregated automatically. What we need is a way to solve the problems of accuracy, stability, and consistency that the industry has not handled effectively.

On automatic aggregation and collection of context parameters: Xigua Video (Watermelon Video) uses the chain-of-responsibility pattern to collect parameters from parent and child levels in the business layer, but this is rather intrusive to business code and incomplete. The Datong tracking platform of Tencent PCG Data Center also uses the UI hierarchy of the DOM tree to collect, aggregate, and format parameters from every level automatically. However, in our assessment the Datong management platform itself is not well developed, and its SDK collection scheme is neither thorough nor general enough, which limits it and makes it hard to maintain. It has not gone all the way down the path of building a virtual object tree.

Dawning Solution

In Dawning tracking, we observed that an x-path location description contains many levels with no data meaning, and that these levels often change when client developers refactor or restyle the UI (new container levels are added, or types change). In actual tracking, only a few levels matter: resource cards, module and channel containers, interactive components, and the like. If only these levels are marked and aggregated automatically, the positioning description stays stable as long as the product design does not change, is easy to understand, and can be consistent across platforms (after all, the product design is the same on every platform). On this basis we introduce the concept of tracking objects, shown below. Across the whole DOM tree, the levels that matter are identified with an oid (the figure is schematic and the real tree has more levels than drawn; only the red page levels and blue element levels are marked with an oid). With the help of the DOM tree's structure, a sparse tree of tracking objects forms naturally.
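The sparse object tree can be illustrated with a short sketch. The Python below is not the SDK's code: the `View` and `TrackedNode` classes, the `build_object_tree` function, and the oid values are all assumptions made for illustration. It walks a view tree, keeps only the levels that carry an oid, and attaches each kept node to its nearest marked ancestor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class View:
    """A node of the real UI tree; oid is set only on levels with data meaning."""
    name: str
    oid: Optional[str] = None
    children: List["View"] = field(default_factory=list)


@dataclass
class TrackedNode:
    """A node of the sparse tracking-object tree."""
    oid: str
    children: List["TrackedNode"] = field(default_factory=list)


def build_object_tree(view: View) -> List[TrackedNode]:
    """Return the tracked nodes found at or below `view`, skipping unmarked levels."""
    found: List[TrackedNode] = []
    for child in view.children:
        found.extend(build_object_tree(child))
    if view.oid is None:
        # Unmarked container: lift its tracked descendants to the nearest marked ancestor.
        return found
    return [TrackedNode(view.oid, found)]


def paths(node: TrackedNode, prefix: str = "") -> List[str]:
    """List every root-to-node oid path, for inspection."""
    here = f"{prefix}/{node.oid}"
    result = [here]
    for child in node.children:
        result.extend(paths(child, here))
    return result


# Hypothetical page: wrapper layouts carry no oid and vanish from the object tree.
page = View("Activity", "page_playlist", [
    View("FrameLayout", None, [
        View("RecyclerView", None, [
            View("SongCard", "cell_song", [View("ImageButton", "btn_more")]),
            View("SongCard", "cell_song"),
        ]),
    ]),
])

roots = build_object_tree(page)
for line in paths(roots[0]):
    print(line)
```

In this sketch, wrapping the list in one more unmarked container leaves every oid path unchanged, which is the stability property the paragraph above relies on.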

The basic idea of Dawning tracking is to build the page's tracking object tree on the client side (native clients, cross-platform frameworks, and the web front end) and keep it in sync with the UI. This enables automatic tracking of page and element exposure, while hooks on the user interaction APIs enable automatic tracking of clicks.

As the figure above shows, Dawning provides an automatic tracking SDK that is transparent to business code. It generates and updates the page's object tree automatically and tracks exposure and click events automatically. When a tracking event is generated for a slot, the SDK walks from the corresponding node up to the root object node and collects the parameters set on every node along that branch, from which the slot's standardized location (SPM) and content (SCM) descriptions are generated automatically. On top of this, the SDK records tracking events in order and can identify the user's path of operations, so it generates and records refer parameters in an agreed format at the moment an event is generated, enabling efficient and accurate behavior attribution. The solution is also especially friendly to componentized content architectures: when a card is reused and recombined in different scenarios, the card itself needs no additional tracking work. The parameter standards and event structure conventions involved are as follows, listing the main key parameters and their meanings.
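The branch backtracking described above can be sketched as follows. This is an illustration rather than the SDK's implementation: the `Node` class, the `build_event` function, the parameter keys `id` and `type`, and in particular the string formats used for `spm` and `scm` are assumptions, since the article gives the real conventions only in a figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A tracking object; each level stores only its own parameters."""
    oid: str
    params: Dict[str, str] = field(default_factory=dict)
    position: Optional[int] = None  # index within a list, when applicable
    parent: Optional["Node"] = None


def branch(node: Node) -> List[Node]:
    """Return the nodes from `node` up to the root object, node first."""
    chain: List[Node] = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def build_event(event_code: str, node: Node) -> dict:
    """Assemble one event by collecting every level on the branch."""
    chain = branch(node)
    # Assumed SPM format: "oid" or "oid:position" per level, joined by "|".
    spm = "|".join(
        n.oid if n.position is None else f"{n.oid}:{n.position}" for n in chain
    )
    # Assumed SCM format: "id:type" per level, blank when a level has no content.
    scm = "|".join(
        f"{n.params.get('id', '')}:{n.params.get('type', '')}" for n in chain
    )
    return {
        "event": event_code,
        "spm": spm,
        "scm": scm,
        "plist": [{"oid": n.oid, **n.params} for n in chain],
    }


def song_card(parent: Node, position: int) -> Node:
    """A reusable card: it sets its own parameters and knows nothing about the page."""
    card = Node("cell_song", {"id": "42", "type": "song"}, position, parent)
    return Node("btn_play", parent=card)  # return the play button inside the card


playlist_page = Node("page_playlist", {"id": "100", "type": "playlist"})
search_page = Node("page_search")

click_in_playlist = build_event("click", song_card(playlist_page, 3))
click_in_search = build_event("click", song_card(search_page, 0))
print(click_in_playlist["spm"])
print(click_in_search["spm"])
```

The same `song_card` code produces a different location description under each page, which is why a reused card needs no extra tracking work in this model.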

On this basis, a companion tracking platform was built for tracking requirement management, management of parameters and other metadata, and tracking testing, inspection, and verification. The overall process is as follows:

Returning to the pain points analyzed above: with the SDK and the platform, plus the later UI automation built on Dawning tracking, our solutions to each pain point can be summarized as follows:

Dawning SDK

The Dawning SDK is the core of the whole solution. Its overall contents are shown below. Four platforms are currently supported: Android, iOS, web, and RN (React Native):

For better standardization, we limit standard events to two kinds, exposure and click, and use the object's spm and scm to distinguish the detailed business meaning of each event. For example, in the old tracking scheme (and in the way the industry generally defines events), a follow action on the client would be defined as an event with EventCode "focus". In Dawning it is defined directly as a click on the element that triggers the follow behavior (usually defined uniformly with the same oid, btn_focus), so click events are expressive enough overall. For exposure, we distinguish pages from elements. Exposure and end-of-exposure of a page trigger exposure detection for the elements beneath the corresponding node, and page nodes are classified as root pages or sub-pages depending on their level, but business developers need not care about this: they only set elementOID and pageOID through the API. The basic core function of the SDK is thus to track page and element exposure and click events automatically on each platform, and to record and correlate user behavior automatically. The current SDK processing flow on Android, iOS, and web is shown in the figure below.

The key steps in the figure are as follows. A tree update is triggered by AOP on system methods. On the main thread, the SDK traverses the page's actual DOM tree, filters out the nodes that have an OID, and generates the virtual tree. The tree update event is then pushed to the Dawning worker thread, which judges each view's visibility, occlusion, and intersection to generate exposure events automatically. Still on the worker thread, the SDK backtracks along the branch from the event node, following the tree structure, and collects the parameters of the nodes on the way into a structured payload. The worker thread also maintains the refer parameters of the user's behavior path (produced by page exposures, element clicks, and custom referrers inserted by the business). These refer parameters are carried directly in the tracking event for attribution on the data side.
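A simplified sketch of the worker-side bookkeeping may help here. It is single-threaded, takes the list of currently visible objects as input instead of computing visibility, and uses invented names throughout: the `Tracker` class, the `exposure_end` event code, and the `click:<spm>` refer format are assumptions, not the SDK's conventions.

```python
from typing import Dict, List, Optional, Set


class Tracker:
    """Stand-in for the SDK worker: diffs visible objects and keeps the latest refer."""

    def __init__(self) -> None:
        self.visible: Set[str] = set()
        self.last_refer: Optional[str] = None
        self.events: List[Dict[str, Optional[str]]] = []

    def _emit(self, code: str, spm: str) -> None:
        # Every event carries the refer that was current when it was generated.
        self.events.append({"event": code, "spm": spm, "refer": self.last_refer})

    def on_tree_update(self, visible_spms: List[str]) -> None:
        """Compare the new set of visible objects with the previous one."""
        current = set(visible_spms)
        for spm in visible_spms:  # input order keeps the output deterministic
            if spm not in self.visible:
                self._emit("exposure", spm)
        for spm in sorted(self.visible - current):
            self._emit("exposure_end", spm)
        self.visible = current

    def on_click(self, spm: str) -> None:
        self._emit("click", spm)
        # The click becomes the refer carried by whatever the user reaches next.
        self.last_refer = f"click:{spm}"


tracker = Tracker()
tracker.on_tree_update(["page_home", "cell_playlist:0|page_home"])
tracker.on_click("cell_playlist:0|page_home")
tracker.on_tree_update(["page_playlist"])

for event in tracker.events:
    print(event)
```

In this sketch the exposure of `page_playlist` carries the preceding click as its refer, so the data side can attribute the page view without joining event streams.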

The green circles in the figure are the APIs open to business developers. There are four groups: node parameter setting, actively triggering a refresh (for the rare special cases where a page update is not picked up by AOP), custom tracking events, and refer insertion for non-AOP events. Usage is lightweight overall. The most important API, node parameter setting, can sit alongside the code where each platform binds display data to a View, and each level only needs to care about its own parameters. Information about the parent module, page, and so on is collected automatically by the SDK when it backtracks.

Of course, this is the main flow in the ideal case. In practice, front ends and clients have special situations that need to be smoothed over to keep the platforms consistent and to meet business data needs. We support two mechanisms for this.

Logical mounting detaches a node from its own View parent-child relationship and gives it a new parent. As shown on the left of the figure above, the floating layer opened by tapping "more" on a song may be an independent Window with no parent-child relationship to the playlist page or its elements. In data analysis, however, you want each click inside the floating layer to carry the information of the playlist page and the corresponding song card. In that case the floating layer object can be logically mounted under the corresponding card or the playlist page object.

Virtual levels add an analyzable data level that does not exist in the View hierarchy. As shown on the right of the figure above, tracking events for each playlist and song card in a module, as well as for the "more" and play buttons in its header, need to carry the information of the module they belong to. But the module level (the red dotted box in the figure) may not exist on the client: the header and the horizontally scrolling list below it are simply placed next to each other in the overall list to give the appearance of a module, which is a common way for client developers to reduce UI levels. In that case, adding a virtual module node corresponding to the red dotted box achieves the goal quickly.
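Both mechanisms come down to overriding which parent is used during backtracking. The sketch below models that idea only; the `Node` fields, the `spm` helper, its `|`-joined output, and all oid values are assumptions for illustration and are not the SDK's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    oid: str
    view_parent: Optional["Node"] = None     # parent in the real View hierarchy
    logical_parent: Optional["Node"] = None  # override set by a logical mount
    virtual: bool = False                    # True when no real View backs the node

    def parent(self) -> Optional["Node"]:
        """Prefer the logical parent; fall back to the View parent."""
        if self.logical_parent is not None:
            return self.logical_parent
        return self.view_parent


def spm(node: Node) -> str:
    """Join the oids from the node up to the root (assumed format)."""
    parts: List[str] = []
    current: Optional[Node] = node
    while current is not None:
        parts.append(current.oid)
        current = current.parent()
    return "|".join(parts)


# Logical mount: the floating layer lives in its own window, so it has no View parent.
playlist_page = Node("page_playlist")
song = Node("cell_song", view_parent=playlist_page)
layer = Node("layer_more")
download = Node("btn_download", view_parent=layer)
before_mount = spm(download)
layer.logical_parent = song
after_mount = spm(download)

# Virtual level: the module has no View of its own, so cards are re-parented under it.
home = Node("page_home")
module = Node("mod_recommend", view_parent=home, virtual=True)
card = Node("cell_playlist", view_parent=home)
card.logical_parent = module
with_module = spm(card)

print(before_mount, after_mount, with_module, sep="\n")
```

After the mount, a click in the floating layer resolves all the way up to the playlist page, and the card's path gains the module level even though no View exists for it.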

The SDK also provides fairly complete extension support and judgment for View exposure visibility. Through the API, the business can adjust the Frame that the SDK treats as a View's actual display Rect (for example, to account for content that is actually visible behind a semi-transparent layer). During exposure detection, the SDK also performs a strict occlusion check based on each node's visible area and the tree structure. The overall principle and process are shown in the figure above.
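As a rough illustration of this kind of visibility check, the sketch below intersects a node's frame with the viewport and then tests whether any single occluder covers the visible part. The real SDK's rules are shown only in the figure, so the `Rect` type, the `is_exposed` function, the `min_ratio` threshold, and the single-occluder simplification are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        # Clamp to zero so an empty intersection has no area.
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)

    def intersect(self, other: "Rect") -> "Rect":
        return Rect(max(self.left, other.left), max(self.top, other.top),
                    min(self.right, other.right), min(self.bottom, other.bottom))

    def contains(self, other: "Rect") -> bool:
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def is_exposed(frame: Rect, viewport: Rect, occluders: List[Rect],
               min_ratio: float = 0.0) -> bool:
    """True if more than `min_ratio` of the frame is on screen and not fully covered."""
    visible = frame.intersect(viewport)
    if frame.area() == 0 or visible.area() / frame.area() <= min_ratio:
        return False
    # Simplification: only a single occluder covering the whole visible part counts.
    return not any(o.contains(visible) for o in occluders)


viewport = Rect(0, 0, 100, 200)
card = Rect(0, 150, 100, 250)        # bottom half is below the screen edge
floating_bar = Rect(0, 140, 100, 200)

print(is_exposed(card, viewport, []))
print(is_exposed(card, viewport, [floating_bar]))
```

A production check would also need to handle partial coverage by several overlapping views and derive occluders from the tree's drawing order, which this sketch leaves out.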

The above describes Android, iOS, and web. The web solution operates on the DOM rather than a VDOM, so it can adapt to various front-end technology stacks. For cross-platform frameworks that render natively, such as RN, the JS side keeps the same Component configuration API as the web side, and the client extends the view properties through the ViewManager so that the configuration is applied to the actual View nodes on the native side. The rest of the flow is the same as on native. For rendering frameworks such as Flutter, and Compose on Android, an entry point can likewise be found in the actual layout node tree to build the corresponding tracking object tree. More details will be shared in due course.

Extension in BaseViewManager on Android:
@ReactProp(name = "eventTracing")
public void setEventTracing(@Nonnull T view, @Nullable ReadableMap eventTracing) {
  // Call the node-setting API of the client-side Dawning tracking SDK
}

Extension in RCTViewManager on iOS:
RCT_CUSTOM_VIEW_PROPERTY(eventTracing, NSDictionary *, RCTView) {
  // Call the node-setting API of the client-side Dawning tracking SDK
}

Beyond the core SDK functions, the Dawning project also developed a tool for viewing objects and levels directly on each platform to assist development and testing. As the solution rolls out and the standard data warehouse is built, the tool can later be used to visualize data directly on the device. The overall effect is shown below; the details are not covered here.

EasyInsight

EasyInsight is the tracking management platform built for Dawning. It plays an important role before, during, and after tracking development. The platform's overall functions are shown in the figure below. Requirement management, parameter management, and the requirement testing part of testing and auditing have been completed. At the time of writing, regression auditing and dynamic parameter selection are under development, with more extensions to follow.

Before development, the platform handles tracking metadata management, submission of tracking requirements, tracking design, review, and task assignment.

During development, developers follow the parameter requirements on the object detail page and use the hierarchy (lineage) relationships to decide at which level and in which component each tracking parameter should be set. The platform maintains, at the object level, the public and private parameters required by each object and SPM, along with the configured values of those parameters. By maintaining each object's parent object, it builds the page's object tree, including reused objects, and it also provides a complete lineage tree view, which is more intuitive.

After development, the platform provides automated testing tools at the requirement level, plus integrated grayscale and production data inspection at the version level, so developers can quickly self-verify before release. In requirement testing, the developer scans a QR code to connect the device to the platform over a socket, which keeps the tracking data real-time, and the data is compared with the tracking specification generated from the object relationships and parameter configuration maintained on the platform. Regression auditing is more complicated, especially given the large volume of client-side requirements that are integrated and released together. We did a lot of work to connect internal R&D management tools with EasyInsight and to associate the code merge status of the tasks in each version, so that the platform can locate the right version, split out the tracking ODS data it produces, and check that data against the specification. Test and audit results are delivered as full reports and IM notifications.
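The comparison between reported events and the platform's specification can be pictured with the sketch below. EasyInsight's actual data model is not described in the article, so the shape of `SPECS`, the `validate` function, and the event fields (`event`, `spm`, `plist`) are hypothetical.

```python
from typing import Dict, List

# Hypothetical specification generated from object relationships and parameter config:
# for each spm, the allowed events and the parameters required at each level.
SPECS: Dict[str, dict] = {
    "btn_play|cell_song|page_playlist": {
        "events": {"click"},
        "required": {"cell_song": ["id", "type"], "page_playlist": ["id"]},
    },
}


def validate(event: dict) -> List[str]:
    """Compare one reported event with the spec for its spm; return the problems found."""
    spec = SPECS.get(event["spm"])
    if spec is None:
        return [f"unknown spm '{event['spm']}'"]
    problems: List[str] = []
    if event["event"] not in spec["events"]:
        problems.append(f"unexpected event '{event['event']}'")
    reported = {level["oid"]: level for level in event["plist"]}
    for oid, required in spec["required"].items():
        level = reported.get(oid)
        if level is None:
            problems.append(f"{oid}: level missing from event")
            continue
        for name in required:
            if name not in level:
                problems.append(f"{oid}: missing parameter '{name}'")
    return problems


reported_event = {
    "event": "click",
    "spm": "btn_play|cell_song|page_playlist",
    "plist": [
        {"oid": "btn_play"},
        {"oid": "cell_song", "id": "42"},          # "type" was forgotten
        {"oid": "page_playlist", "id": "100"},
    ],
}

print(validate(reported_event))
```

Because each level declares its own required parameters, a failure message can point at the exact object whose parameter is missing rather than at the event as a whole.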

Beyond managing the tracking development process, the metadata and object information accumulated on the platform after tracking goes live is also a primary input for subsequent data warehouse development, BI analysis, and algorithm use. Related visual, agile data analysis capabilities are also under construction: because the tracking data is highly standardized overall, common analysis needs such as multi-dimensional, funnel, retention, and path analysis can be configured and viewed with simple drag and drop.

Data governance

When construction of Dawning tracking began, Cloud Music already had many generations of tracking data, with different definitions and specifications coexisting, and overall data usage was confusing. However well a tracking system is built, if it only covers new requirements and ignores old tracking points, and business reports and business and algorithm jobs mix the two, accuracy cannot be guaranteed. Building the new solution, developing and rolling out new tracking, and sorting out and reconciling old tracking are therefore three indispensable legs, and all were included in the scope of the Dawning project. In the process, data development, product planning, BI, algorithm teams, and the server-side developers of each business took part in sorting out the old usage in detail, and we used the opportunity to enrich and standardize Cloud Music's data warehouse capabilities.

The overall compatibility solution for new and old data is shown in the figure below. Because the tracking migration is not done in one concentrated effort but proceeds alongside business iteration, the old tracking points need a second round of sorting: old non-standard tracking points are mapped to a pseudo-spm by a UDF according to various parameter conditions. New and old data are then made compatible in an intermediate table based on the mapping between new and old spm, and downstream consumers that used to read directly from the ODS switch to the intermediate table or the compatible traffic table.
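The pseudo-spm mapping can be pictured as a rule table applied per record. The sketch below is plain Python rather than a warehouse UDF, and every rule, event code, parameter name, and spm value in it is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (old event code, parameter conditions, pseudo-spm); every entry is hypothetical.
RULES: List[Tuple[str, Dict[str, str], str]] = [
    ("focus", {"page": "artist"}, "btn_focus|page_artist"),
    ("focus", {"page": "playlist"}, "btn_focus|page_playlist"),
    ("play", {"source": "search"}, "btn_play|cell_song|page_search"),
]


def pseudo_spm(event_code: str, params: Dict[str, str]) -> Optional[str]:
    """Return the pseudo-spm of the first rule whose conditions all match."""
    for code, conditions, spm in RULES:
        if code == event_code and all(params.get(k) == v for k, v in conditions.items()):
            return spm
    return None


def to_compat_row(record: dict) -> Optional[dict]:
    """Normalize a new-style or old-style record into one row shape keyed by spm."""
    if "spm" in record:  # new Dawning event, already standardized
        return {"spm": record["spm"], "source": "new"}
    spm = pseudo_spm(record["event"], record.get("params", {}))
    if spm is None:
        return None      # unmapped old event, left for manual sorting
    return {"spm": spm, "source": "old"}


records = [
    {"event": "click", "spm": "btn_focus|page_artist"},
    {"event": "focus", "params": {"page": "artist", "uid": "1"}},
    {"event": "share", "params": {}},
]
rows = [to_compat_row(r) for r in records]
print(rows)
```

Once old and new records share the same spm key, a downstream report can aggregate over both sources during the migration period without knowing which client version produced a row.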

EasyInsight also provides tooling support for traffic switching of compatible tables, as shown in the following figure:

Postscript

While completing the rollout of tracking and data governance, Cloud Music also used Dawning's clear, cross-platform-consistent oid and spm to address problems that UI automation solutions based on x-path and Accessibility had never fundamentally solved, such as poor test case stability and low execution efficiency. A new automation solution was built on this basis and achieved very good results (I will write a separate article summarizing the design ideas and pain points of the various automation solutions for the broader front end, as well as the Dawning-based solution; stay tuned). A more complete security risk control strategy, a new A/B testing system, and visual analysis capabilities based on Dawning are also being rolled out. Dawning started from tracking, but it means more than tracking, which is what we expected from the start of the project, as the open-ended mind map shows.

Governing tracking data is hard, thankless work, like changing the wheels of a high-speed train in motion, and it is not easy at any company. Fortunately, at Cloud Music I had the support of my manager and of a few excellent colleagues across teams, and we reached a high degree of completion in a short time with a relatively small investment. There were many difficulties along the way, and members of the project team lost confidence several times, but we pulled through in the end. As core businesses have adopted the solution and feedback from business migration has come in, all parties have paid it more attention and placed higher expectations on it. Looking back, I am full of gratitude. It is a blessing to be able to restore the ideal state of data that I had long imagined.

This article is published by the NetEase Cloud Music technical team. Reprinting in any form without authorization is prohibited. We recruit for various technical positions all year round. If you are ready to change jobs and happen to like Cloud Music, join us at grp.music-fe(at)corp.netease.com!
