The Turing platform is a one-stop algorithm platform built by the Meituan delivery technical team. Turing OS, the online serving framework within the platform, focuses on machine learning and deep learning online services, providing a unified platform solution for deploying and computing online models and algorithm strategies, which effectively improves algorithm iteration efficiency. This article discusses our thinking and optimization ideas in building and operating Turing OS, and we hope it will be helpful or inspiring.

0. Preface

AI is one of the hottest "stars" of the Internet industry; established giants and traffic upstarts alike are vigorously developing AI technology to empower their businesses. Meituan began exploring the application of different machine learning models in various business scenarios early on, from linear models and tree models to, in recent years, deep neural networks, BERT, DQN, and more. These have been successfully applied to search, recommendation, advertising, delivery, and other businesses, achieving good results.

Turing (hereinafter referred to as the Turing platform), the algorithm platform built by Meituan's Delivery Technology Department, aims to provide a one-stop service covering the whole process of data preprocessing, feature generation, model training, model evaluation, model deployment, online prediction, AB experiments, and algorithm effect evaluation. It lowers the barrier for algorithm engineers, freeing them from tedious engineering development so they can focus their limited energy on iterating business and algorithm logic. For specific practices, refer to the team's earlier technical blog post, "One-stop Machine Learning Platform Construction Practice".

As the machine learning platform, feature platform, AB platform, and other components matured, the delivery technical team found that online prediction had gradually become the bottleneck of algorithm development and iteration, so we started the overall development of the Turing online serving framework. This article discusses in detail the design and practice of Turing OS (Online Serving), the online serving framework within the Turing platform, and we hope it will be helpful or inspiring.

As the Turing platform has gradually matured, more than 18 business parties, including Meituan Delivery, have connected to it. The overall picture is roughly as follows: more than 10 BUs (business units) are connected, covering 100% of Meituan Delivery's core business scenarios, supporting 500+ online models, 2500+ features, and 180+ algorithm strategies, and serving tens of billions of online predictions every day. Empowered by the Turing platform, the algorithm iteration cycle has been reduced from days to hours, greatly improving the iteration efficiency of delivery algorithms.

1. Introduction to Turing Platform

The Turing platform is a one-stop algorithm platform whose overall architecture is shown in Figure 1 below. The bottom layer relies on Kubernetes and Docker to implement unified scheduling and management of CPU/GPU resources, and integrates machine learning/deep learning frameworks such as Spark ML, XGBoost, and TensorFlow. It provides one-stop platform functions including feature production, model training, model deployment, online inference, and AB experiments, and supports AI applications such as dispatching, time estimation, delivery range, search, and recommendation for Meituan Delivery, flash purchase, bike-sharing, grocery shopping, maps, and other business units. The Turing platform consists of four main parts: the machine learning platform, the feature platform, Turing online serving (Online Serving), and the AB experiment platform.

Figure 1: Overall architecture of the Turing platform

  • Machine learning platform: provides model training, task scheduling, model evaluation, and model tuning, and implements drag-and-drop visual model training based on DAGs.
  • Feature platform: provides online and offline feature production, feature extraction, and feature aggregation, and pushes features to the online feature store to provide high-performance feature retrieval services.
  • Turing online serving: Online Serving, hereinafter referred to as Turing OS, provides a unified platform solution for feature retrieval, data preprocessing, and the online deployment and high-performance computation of models and algorithm strategies.
  • AB experiment platform: provides pre-event AA grouping, in-event AB traffic splitting, and post-event effect evaluation, covering the complete life cycle of AB experiments.

Turing OS refers to the online serving module of the Turing platform, focusing on machine learning/deep learning online services. Its goal is to bring offline-trained models online quickly, effectively improving the algorithm iteration efficiency of business departments so that results land quickly and generate value for the business. The rest of this article focuses on Turing Online Serving.

2. Turing OS construction background

In the early stages of Meituan's delivery business, in order to support rapid business development, get algorithms online quickly, and enable fast trial and error, the engineers of each business line independently developed a series of online prediction functions, the so-called "chimney mode". This mode is very flexible and can quickly support personalized business needs. However, as the business scale gradually expanded, the shortcomings of the chimney mode became prominent, mainly in the following three aspects:

  • Reinventing the wheel: feature retrieval and preprocessing, feature version switching, model loading and switching, online prediction, and AB experiments were all developed independently, from scratch each time.
  • Lack of platform capabilities: no platform-level operation and maintenance, management, monitoring, or tracking of the complete life cycle of feature and model iterations, resulting in low R&D efficiency.
  • Serious coupling of algorithm and engineering: the boundary between algorithm and engineering is blurred, the coupling is severe, the two sides restrict each other, and algorithm iteration efficiency is low.

The "chimney model" made an indelible contribution in the early stage of business development, but with the growth of business volume, the marginal revenue of this method has gradually decreased to an intolerable level, and a unified online service framework is urgently needed to make changes. .

At present, most mainstream open-source machine learning online serving frameworks only provide model prediction and do not include preprocessing and postprocessing modules, as shown in Figure 2 below.

Figure 2: Machine learning online serving diagram

For example, Google's TensorFlow Serving is a high-performance open-source online serving framework for machine learning models. It provides gRPC/HTTP interfaces for external calls, supports model hot updates and automatic model version management, and solves pain points such as resource scheduling and service discovery, providing stable and reliable service. However, TensorFlow Serving does not include preprocessing and postprocessing modules: business engineers must preprocess the input into tensors, pass them to TensorFlow Serving for model computation, and then postprocess the results. Preprocessing and postprocessing logic is very important to algorithm strategies and iterates frequently; because this logic is closely tied to the model, it is better owned by algorithm engineers. If it is implemented on the engineering side instead, engineering students merely implement logic designed by the algorithm side, the coupling is too strong, iteration efficiency is low, and inconsistencies between design and implementation easily arise and cause online incidents.
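To make the division of labor concrete, here is a minimal sketch of the preprocessing and postprocessing steps that a bare model server leaves to its callers. All class and method names (`PredictPipeline`, `ModelStub`, etc.) are illustrative assumptions, not part of TensorFlow Serving or Turing OS:

```java
import java.util.List;
import java.util.Map;

// Minimal sketch of the pre/post-processing a model server leaves to callers.
public class PredictPipeline {

    // Stand-in for a remote model client (e.g. a TF Serving gRPC stub).
    interface ModelStub {
        float[] predict(float[] tensor);
    }

    private final ModelStub model;

    PredictPipeline(ModelStub model) {
        this.model = model;
    }

    // Preprocessing: turn raw features into the flat tensor the model expects.
    private float[] preprocess(Map<String, Double> rawFeatures, List<String> featureOrder) {
        float[] tensor = new float[featureOrder.size()];
        for (int i = 0; i < featureOrder.size(); i++) {
            // Missing features default to 0; real systems also normalize, bucket, etc.
            tensor[i] = rawFeatures.getOrDefault(featureOrder.get(i), 0.0).floatValue();
        }
        return tensor;
    }

    // Postprocessing: turn raw model output into a business decision.
    private boolean postprocess(float[] scores, float threshold) {
        return scores[0] > threshold;
    }

    public boolean predictAndDecide(Map<String, Double> rawFeatures,
                                    List<String> featureOrder, float threshold) {
        float[] tensor = preprocess(rawFeatures, featureOrder);
        float[] scores = model.predict(tensor);  // the only step the model server covers
        return postprocess(scores, threshold);
    }
}
```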

To solve these problems and provide users with a more convenient and easier-to-use algorithm platform, the Turing platform built a unified online serving framework that integrates model computation with the preprocessing/postprocessing modules and presents them together as an algorithm version for iteration, eliminating complicated interaction between algorithm and engineering.

Here we extend the definition of an algorithm. An algorithm (also called an algorithm strategy) in this article can be understood as a combined function: y = f1(x) + f2(x) + … + fn(x), where each fi(x) can be a rule computation, a model computation (machine learning or deep learning), or a non-model algorithm computation (such as a genetic algorithm or operations research optimization). Adjusting any factor of the combined function (such as a change in model inputs or outputs, a change of model type, or a rule adjustment) can be regarded as an iteration of the algorithm version. Algorithm iteration is a cyclic process of algorithm development, launch, effect evaluation, and improvement. The goal of Turing OS is to optimize the iteration efficiency of this cycle.

3. Turing OS 1.0

3.1 Introduction to Turing OS 1.0

To solve the problems of reinventing the wheel and lacking platform capabilities that arose under the chimney mode, we set out to build the Turing OS 1.0 framework. This framework integrates the model computation, preprocessing, and postprocessing modules, encapsulates the complicated logic of feature retrieval, preprocessing, model computation, and postprocessing inside the Turing online serving framework, and exposes it as an SDK. Algorithm engineers develop personalized preprocessing and postprocessing logic based on the Turing online serving SDK; business engineering integrates the SDK and the algorithm package and calls the interfaces provided by the SDK for model and algorithm computation.

Through Turing OS 1.0, we solved the problems of business parties developing independently, iterating independently, and reinventing the wheel, greatly simplifying the work of algorithm engineers and engineering R&D personnel. Business engineering invokes algorithm prediction, preprocessing, and model computation indirectly through the Turing online serving framework rather than interacting with the algorithm directly, which reduced the coupling between engineering and algorithm to a certain extent.

As shown in Figure 3, the Turing online serving framework at this stage integrated the following functions:

Figure 3: Turing OS 1.0

3.1.1 Feature acquisition

  1. Through feature aggregation, dynamic grouping, local caching, and business-line-level physical resource isolation, it provides high-availability, high-performance online feature retrieval for computation (see the sketch after this list).
  2. The feature retrieval process is configured through a custom MLDL (Machine Learning Definition Language), unifying the process and improving the ease of use of online serving features.
  3. DLBox (Deep Learning Box) supports placing vectorized raw features and models on the same node for local computation, solving the performance problem of large-scale data recall in deep learning scenarios and supporting the high concurrency and rapid algorithm iteration of the various delivery businesses.
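As an illustration of the local-caching idea in item 1 above, here is a minimal sketch of a feature retrieval layer; `FeatureFetcher` and `FeatureStore` are hypothetical names, not the Turing feature platform API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: a local cache in front of a remote online feature store.
public class FeatureFetcher {

    // Stand-in for the online feature store (e.g. a distributed KV service).
    interface FeatureStore {
        Map<String, Double> batchGet(String entityId);
    }

    private final FeatureStore remoteStore;
    // A real implementation would bound the cache and expire entries
    // (e.g. with a library such as Caffeine).
    private final Map<String, Map<String, Double>> localCache = new ConcurrentHashMap<>();

    public FeatureFetcher(FeatureStore remoteStore) {
        this.remoteStore = remoteStore;
    }

    public Map<String, Double> getFeatures(String entityId) {
        // Local cache first; fall through to the remote store on a miss.
        return localCache.computeIfAbsent(entityId, remoteStore::batchGet);
    }
}
```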

3.1.2 Model calculation

  1. Supports local (Local) and remote (Remote) model deployment modes, corresponding respectively to deploying models inside the business service and in dedicated model online serving clusters (see the sketch after this list). Multi-machine asynchronous parallel computing and heterogeneous CPU/GPU resources solve the performance problems of large-scale model computation; model sharding solves the problem of extra-large models that a single machine cannot load.
  2. For deep learning model computation, high-performance computing acceleration libraries such as MKL-DNN and compilation optimization technologies such as TVM further improve inference performance.
  3. Model-feature association relationships and preprocessing logic are configured through MLDL encapsulation, automating feature retrieval, feature processing, and assembly, and improving the efficiency of model development iteration.
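The Local/Remote distinction in item 1 can be illustrated with a minimal routing sketch; all names are assumptions rather than Turing OS internals:

```java
// Minimal sketch of Local vs. Remote model deployment routing.
public class ModelRouter {

    enum DeployMode { LOCAL, REMOTE }

    interface ModelEngine {
        float[] compute(float[] input);
    }

    private final ModelEngine localEngine;   // runs inside the business service
    private final ModelEngine remoteEngine;  // RPC to a dedicated model cluster

    public ModelRouter(ModelEngine localEngine, ModelEngine remoteEngine) {
        this.localEngine = localEngine;
        this.remoteEngine = remoteEngine;
    }

    public float[] compute(DeployMode mode, float[] input) {
        // Small models can run locally to avoid an RPC; large or GPU-bound
        // models go to the remote cluster.
        return (mode == DeployMode.LOCAL ? localEngine : remoteEngine).compute(input);
    }
}
```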

3.1.3 Algorithm calculation

  1. Supports algorithm version management and AB routing (see the sketch after this list), supports dynamic retrieval of the models, features, and parameters associated with an algorithm version, and supports hot updates of models and parameters.
  2. Supports AB experiments and flexible gray release, and evaluates AB experiment effects through unified event-tracking logs.
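As an illustration of the AB routing in item 1, here is a minimal bucket-hashing sketch; the class name and bucketing scheme are assumptions, not Turing's actual routing implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Minimal sketch: route traffic to an algorithm version by hashing a stable
// key into 100 buckets, so each user always sees the same version.
public class AbRouter {

    // Percentage of traffic (0-100) sent to the experimental version.
    private final int grayPercent;

    public AbRouter(int grayPercent) {
        this.grayPercent = grayPercent;
    }

    public String route(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        long bucket = crc.getValue() % 100;  // stable bucket per user
        return bucket < grayPercent ? "algo-v2" : "algo-v1";
    }
}
```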

3.2 Legacy issues of Turing OS 1.0

Turing OS 1.0 solved the problems of each business line reinventing the wheel, chaotic feature management, and lack of platform capabilities. By providing a one-stop platform service, it supported the large-scale online prediction scenarios and high-performance algorithm computation needs of Meituan Delivery's business lines, letting algorithm students pay more attention to the iterative optimization of the algorithm strategies themselves and improving algorithm iteration efficiency. However, the aforementioned coupling among engineering, algorithm, and platform was still not well solved, mainly reflected in the following:

  1. Business engineering statically depends on the algorithm package, which is deployed inside the business project, so updating and launching a new algorithm package requires a release of the business project.
  2. The algorithm package runs in the same JVM as the business project. Although this saves one RPC, the computational load of the algorithm package affects the performance of the business service, and the stability of the business service becomes uncontrollable; for example, TensorFlow model computation can consume too much CPU, and loading and switching large models consumes large amounts of memory.
  3. As the Turing platform's functions grow richer, the Turing online serving SDK becomes more and more bloated. Business projects must upgrade the SDK to use new Turing platform features, but upgrading the SDK carries higher risk and slows down business deployment.

Figure 4: Diagram of high coupling among the three parties

From the above points it can be seen that algorithm, engineering, and the Turing platform are highly coupled, each suffering many pain points, as shown in Figure 4. These problems seriously affected algorithm iteration efficiency: algorithm iterations had a long development-test-launch cycle and low efficiency:

  • Algorithm pain point: algorithm iteration depends on business project releases, and each release requires a complete R&D and testing cycle, so the process is long and efficiency is low.
  • Engineering pain point: the algorithm package and business engineering share one JVM, so algorithm computation performance affects the business service; at the same time, business engineering must release frequently just to follow algorithm package iterations, even when the only change is an algorithm package version upgrade.
  • Turing platform pain point: the Turing online serving SDK is deployed inside business projects, so SDK versions are hard to converge and compatibility is hard to maintain; it is also difficult to roll out new Turing features, because business projects must upgrade the SDK.

Therefore, it is necessary to better decouple algorithm, engineering, and the Turing platform, meeting both the algorithm side's need for fast iteration and the business engineering side's need for stability, achieving a win-win.

4. Turing OS 2.0

To address the pain points of the high coupling among algorithm, engineering, and the Turing platform in Turing OS 1.0, we developed the Turing OS 2.0 framework. Its goals are to solve this coupling so that algorithm iteration does not depend on business project releases and new Turing platform features do not require business projects to upgrade the SDK, further improving both algorithm iteration and engineering development efficiency.

Around the goal of decoupling algorithm, engineering, and the Turing platform, in Turing OS 2.0 we designed and developed the algorithm package plug-in hot deployment framework, the algorithm data channel, and the algorithm orchestration framework to support fast algorithm launches. We also designed and developed an algorithm verification platform integrating sandbox traffic diversion, real-time playback, performance stress testing, and debug testing, to guarantee the high performance, correctness, and stability of algorithm strategies. Turing OS 2.0 decouples algorithm, engineering, and the Turing platform and closes the loop of algorithm and engineering iteration. Most algorithm iterations need no participation from engineering R&D personnel or test engineers; algorithm engineers can complete the launch of an algorithm strategy iteration within hours. Empowered by Turing OS 2.0, algorithm R&D iteration efficiency has been greatly improved.

Figure 5: Turing OS framework 2.0

The specific features of Turing OS 2.0 are as follows:

  • Standardized lightweight SDK: business engineering only needs to depend on a lightweight Turing OS SDK and no longer upgrades frequently, reducing the access cost on the engineering side and decoupling business engineering from the Turing platform.
  • Algorithm plug-ins: the self-developed Turing algorithm plug-in framework supports hot deployment of algorithm packages as plug-ins inside the Turing OS service, decoupling algorithm and engineering; one Turing OS service can deploy multiple algorithm packages in multiple versions, each with independent thread pool resources.
  • Data channel: in some complex algorithm scenarios, parts of the algorithm strategy still had to be completed by business engineering: 1) data needed inside the algorithm could only be obtained by a business engineering interface call and then passed in; 2) one algorithm calling another could only be realized by business engineering calling algorithm A and algorithm B and relaying results between them. To solve these two problems we introduced the concept of a data channel, giving the algorithm itself the ability to obtain data autonomously instead of having all data fetched by business engineering and passed through.
  • Algorithm orchestration: multiple algorithms are combined, serially or in parallel, into a directed acyclic graph (DAG), which can be regarded as an algorithm orchestration. This composition and orchestration of algorithms, the new architecture's abstraction and distillation of business algorithms, further empowers business launches and algorithm iteration, further improving business algorithm iteration efficiency and further decoupling algorithm and engineering.
  • Sandbox traffic diversion: the Turing sandbox is a service physically isolated from Turing OS but with an identical runtime environment; traffic passing through the sandbox has no effect on online business. The sandbox verifies the correctness of algorithm logic while also evaluating algorithm computation performance, improving the efficiency of the R&D testing process.
  • Turing playback and unified event tracking: algorithm and model computation produce a large amount of important data (algorithm strategies, models, features, parameters, data channels, and other related data). This data not only helps to quickly troubleshoot and locate system problems, but also provides an important data basis for modules such as AB experiment reports, sandbox traffic diversion, and performance stress testing. To record, store, and use this data automatically, we designed a real-time playback platform and unified event tracking.
  • Performance stress testing: Turing OS integrates the capabilities of Quake, Meituan's full-link stress testing system, and reuses the traffic data collected by the unified playback platform to construct requests and stress-test sandboxes on which a new algorithm package version is deployed, guaranteeing the performance and stability of algorithm strategy iterations.

Figure 6: Overall architecture of Turing OS 2.0

The following sections introduce these functions and features, showing how Turing OS 2.0 solves the pain points of the three-way coupling among algorithm, engineering, and the Turing platform.

4.1 Standardized lightweight SDK

To solve the coupling pain point between business engineering and the Turing platform, namely the difficulty of converging SDK versions when the Turing online serving SDK is deployed inside business projects, we split and refactored the SDK with lightness, ease of access, stability and extensibility, and safety and reliability in mind:

  • Lightweight SDK: the original Turing OS SDK logic sinks into the Turing OS service, and the SDK only provides a simple, general batch prediction interface. The SDK no longer exposes algorithm-related details; algorithm version routing, real-time/offline feature retrieval, model computation, and so on are hidden inside Turing OS. The lightweight SDK integrates Turing OS's custom routing, so the business side does not need to know which Turing OS cluster an algorithm package is deployed in; this is completely transparent to the user.
  • Simple and easy to access: a unified, general Thrift interface is provided for algorithm computation, and Protobuf/Thrift is used to define algorithm inputs and outputs (see the sketch after this list). Compared with Java class-defined interfaces, this guarantees compatibility; once the Protobuf interface is defined, algorithm and engineering can develop independently.
  • Extensible: the lightweight SDK version is stable and does not require repeated upgrades on the engineering side; Protobuf naturally supports serialization, and subsequent traffic copying and playback event tracking are built on it.
  • High performance: for high-availability scenarios with large-scale batch algorithm computation, such as batch prediction for C-end users, we designed asynchronous batching and high parallelism to improve computing performance; for high-availability scenarios where a single task is compute-heavy and CPU-intensive, such as scheduling path planning in a delivery sub-area, we designed a fast-failure client with an optimal retry mechanism to guarantee availability while balancing Turing OS computing resources.
  • Safe and reliable: for scenarios where multiple algorithm packages are deployed in a single Turing OS, thread-pool-level resource isolation is provided; algorithm packages of different business lines are split vertically by business scenario to provide physical cluster resource isolation. A circuit-breaking and degradation mechanism guarantees the stability and reliability of the computation process.
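The sketch referenced above: a minimal illustration of what such a lightweight batch-prediction interface could look like in Java. All names are assumptions, not the actual Turing OS SDK; the real interface is defined in Thrift/Protobuf and is not shown in the article:

```java
import java.util.List;
import java.util.Map;

// Minimal sketch of a lightweight batch-prediction SDK interface.
public interface TuringPredictClient {

    // One request item: an opaque, serialized algorithm input (e.g. Protobuf
    // bytes), so the SDK stays stable while algorithm inputs evolve.
    record PredictRequest(String algorithmName, String abKey, byte[] payload) {}

    record PredictResponse(int code, byte[] payload) {}

    // The single general entry point: batch prediction. Version routing,
    // feature retrieval, and model computation all happen server-side.
    List<PredictResponse> batchPredict(List<PredictRequest> requests,
                                       Map<String, String> context);
}
```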

4.2 Algorithm plug-in

Through the standardization and lightweight refactoring of the Turing OS SDK, we solved the coupling pain point between business engineering and the Turing platform; through turning Turing OS into a service, we solved the coupling pain point between algorithm and business engineering. However, the coupling between the algorithm and the Turing platform remained and even grew: launching an algorithm iteration depended on a release of the Turing OS service, so the goal of three-way decoupling had not yet been achieved.

To solve the coupling pain point between the algorithm and the Turing platform and further improve algorithm strategy iteration efficiency, our next design idea was algorithm plug-ins and Turing OS containerization: algorithm packages are deployed into Turing OS as plug-ins, and releasing a new algorithm package version requires no Turing OS release, and not even a Turing OS restart, as shown in Figure 7.

  • Algorithm plug-ins: we developed the Turing OS algorithm plug-in framework, which supports deploying algorithm packages into the Turing OS service as plug-ins. The concrete implementation is a custom algorithm ClassLoader: different ClassLoaders load different algorithm package versions, and by loading multiple versions and swapping a reference, hot deployment of algorithm packages is achieved (see the sketch below).
  • Turing OS containerization: Turing OS acts as a plug-in container that loads algorithm packages of different algorithm versions and performs algorithm version routing and algorithm strategy computation. After containerization: 1) if the new algorithm version requires no new parameters, neither the engineering side nor Turing OS needs a release; 2) the main job of business engineering is passing parameters to the algorithm, the logic is simple, and if the input parameters are unchanged no release is needed, leaving the algorithm side in control of its own iteration pace.
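The sketch referenced above: a minimal illustration of ClassLoader-based hot deployment with reference swapping. The entry-point convention and all names are assumptions, not Turing OS internals:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of hot-deploying algorithm package versions via ClassLoaders.
public class AlgorithmPluginManager {

    // Hypothetical entry point every algorithm package is assumed to implement.
    public interface Algorithm {
        byte[] compute(byte[] input);
    }

    // version -> loaded algorithm instance; swapping the map entry is the
    // "reference replacement" that makes a new version live without a restart.
    private final Map<String, Algorithm> loaded = new ConcurrentHashMap<>();

    public void deploy(String version, Path jar, String entryClass) throws Exception {
        // Each version gets its own ClassLoader, so versions stay isolated and
        // old versions can be garbage-collected once unreferenced.
        URLClassLoader loader = new URLClassLoader(
                new URL[]{jar.toUri().toURL()}, getClass().getClassLoader());
        Algorithm algo = (Algorithm) loader.loadClass(entryClass)
                .getDeclaredConstructor().newInstance();
        loaded.put(version, algo);  // atomic swap: new traffic sees the new version
    }

    public byte[] compute(String version, byte[] input) {
        Algorithm algo = loaded.get(version);
        if (algo == null) throw new IllegalStateException("version not deployed: " + version);
        return algo.compute(input);
    }
}
```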

Figure 7: Turing OS containerization and algorithm plug-in diagram

4.3 Data channel

With the methods above, we solved the release-iteration coupling among algorithm, engineering, and the Turing platform. However, in some complex algorithm scenarios, algorithm and business engineering remain coupled, mainly in the following two kinds of data dependencies on business engineering:

  1. Obtaining data inside the algorithm: currently, business engineering calls an interface to obtain results and then passes them to the algorithm, for example service interface data or distributed KV cache data; algorithm and business engineering must therefore be developed and launched together.
  2. Calling an algorithm inside an algorithm: currently, business engineering calls algorithm A and algorithm B and writes relay logic, for example when algorithm A's input needs algorithm B's result, or when the results of algorithms A and B must be merged into the final output; such operations are generally handled by business engineering. An alternative is to merge algorithm A and algorithm B into one huge algorithm, but the downside is extra R&D cost when algorithm A and algorithm B need independent AB experiments and gray releases.

To solve these two problems, we introduced the concept of a data channel (Data Channel), giving the algorithm itself the ability to obtain data autonomously. Inside the algorithm, data channels are declared through annotations provided by Turing OS; the interface between the algorithm and business engineering then only needs to pass a few key parameters and context data, and the algorithm internally assembles the parameters the data channel requires. After this data channelization, the algorithm interface is further simplified and the coupling between algorithm and engineering further reduced; the problem of an algorithm calling another algorithm is solved by the algorithm orchestration described below.
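As a minimal illustration of annotation-declared data channels, here is a sketch assuming a hypothetical `@DataChannel` annotation; the actual Turing OS annotations and wiring are not shown in the article:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;

// Minimal sketch of an annotation-driven data channel.
public class DataChannelExample {

    // Declares that a field should be populated from a named data source
    // before the algorithm runs, instead of being passed in by business code.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface DataChannel {
        String source();      // e.g. "rider-profile-kv"
        String keyParam();    // request parameter used as the lookup key
    }

    public static class DemoAlgorithm {
        // The framework would reflectively find this annotation, fetch the data
        // (KV store, RPC, etc.) keyed by the request's riderId, and inject it.
        @DataChannel(source = "rider-profile-kv", keyParam = "riderId")
        private Map<String, Double> riderProfile;

        public double score(Map<String, String> requestParams) {
            // The algorithm uses injected data; business engineering no longer
            // fetches and relays it.
            return riderProfile.getOrDefault("speed", 0.0);
        }
    }
}
```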

4.4 Algorithm orchestration

A complete algorithm computation flow includes the algorithm computation itself as well as input preprocessing logic and result postprocessing logic. The computation can be N rule computations, N model computations (machine learning, deep learning, etc.), non-model algorithm computations (such as genetic algorithms or operations research optimization), or a combination of several types. We abstract such a computational logic unit with independent input and output as an operator, which can be orchestrated and reused. The two general operator types are as follows:

  1. Model computation operator: the model computation engine performs model computation. Both local and remote model computation modes are supported; in remote mode, a model may be deployed in different model clusters. The operator is a further encapsulation of model computation: Local/Remote selection, model cluster routing, and similar functions are transparent to users, algorithm engineers do not need to perceive them, and we adjust them dynamically according to overall computing performance.
  2. Algorithm computation operator: the algorithm computation engine in Turing OS performs algorithm strategy computation. Different algorithm plug-ins may be deployed in different Turing OS instances; Turing OS cluster routing is likewise encapsulated and transparent to the user.

Multiple operators are combined serially or in parallel into a directed acyclic graph (DAG), forming an operator orchestration. Currently we implement operator orchestration in two ways:

  1. Algorithm data channel: algorithm computation engines in different Turing OS instances call each other, or an algorithm computation engine calls a model computation engine; the algorithm data channel is the concrete means of realizing operator orchestration.
  2. Algorithm master-control logic: we extract a master-control logic layer above the algorithm calls to handle complex algorithm scenarios with multiple interdependent algorithms. The master-control logic is implemented by algorithm engineers inside the algorithm package; through it, algorithm engineers can orchestrate the relationships between algorithms at will, further decoupling algorithm and engineering.

From an algorithm engineer's perspective, Turing OS provides services like building blocks: independent sub-functions and operators are combined, connected serially and in parallel in a standard way, into an online system that meets various needs. The sketch below illustrates the idea.
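A minimal sketch of such serial/parallel operator composition using `CompletableFuture`; operator names and wiring are illustrative, since the article does not show the orchestration API:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Minimal sketch of a small operator DAG: one serial preprocessing step,
// two parallel branches, and a merge node.
public class OperatorDagExample {

    interface Operator<I, O> {
        O apply(I input);
    }

    public static CompletableFuture<Double> run(
            Map<String, String> request,
            Operator<Map<String, String>, float[]> preprocess,
            Operator<float[], Double> modelA,      // model computation operator
            Operator<float[], Double> algoB,       // algorithm computation operator
            Operator<double[], Double> merge) {    // postprocess / merge node

        // Serial step: preprocess once.
        CompletableFuture<float[]> features =
                CompletableFuture.supplyAsync(() -> preprocess.apply(request));

        // Parallel branches: modelA and algoB consume the same features.
        CompletableFuture<Double> a = features.thenApplyAsync(modelA::apply);
        CompletableFuture<Double> b = features.thenApplyAsync(algoB::apply);

        // Join: merge the two branch results into the final output.
        return a.thenCombine(b, (ra, rb) -> merge.apply(new double[]{ra, rb}));
    }
}
```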

Figure 8: Operator-orchestration-based online algorithm serving architecture

Under this architecture, the algorithm side's work consists mainly of three parts: 1) algorithm engineers abstract and model the business process; 2) algorithm engineers develop and test independent operators; 3) algorithm engineers arrange and combine operators according to the business process abstraction. Operator orchestration further empowers business launches and algorithm iteration, and business algorithm iteration efficiency improves further.

4.5 Multi-mode integration

As introduced above, Turing OS serves as a container that can deploy multiple algorithm packages in multiple versions and supports hot deployment. Through plug-in hot deployment and orchestration, Turing OS decouples business engineering, algorithm, and the Turing platform, greatly improving algorithm iteration efficiency. To further meet business requirements, we provide two Turing OS deployment and integration modes: Standalone mode and Embedded mode.

Standalone (standalone mode)

In Standalone mode, Turing OS is deployed separately from the business service. The business service calls algorithms through the lightweight SDK, which encapsulates Turing OS's custom routing and the Thrift-RPC call logic to the Turing OS service.

Embedded (embedded mode)

Some complex scenarios with high concurrency and high performance requirements place higher demands on Turing OS's integration mode and performance. In the standalone deployment mode, every algorithm computation from business engineering incurs an RPC, so we implemented a new integration mode: Embedded. In Embedded mode, we provide the Turing OS framework as a code package; the business side integrates it into its own engineering service, and the business service then also acts as a Turing OS container while still calling algorithms through the lightweight SDK, with algorithm computation performed locally inside the business service. The features of embedded Turing OS are as follows:

  1. Because business engineering integrates the Turing OS framework code, it inherits algorithm package plug-ins and hot deployment, and has the dual attributes of business service and Turing OS container.
  2. Business engineering does not depend on the algorithm package directly; the Turing OS framework manages it dynamically and hot-deploys algorithm packages as plug-ins, achieving the decoupling of algorithm and engineering.
  3. Business engineering performs algorithm computation locally, eliminating the RPC and serialization cost of algorithm calls while reusing the business service's server resources, further reducing cluster resource consumption and improving resource utilization.

When an algorithm package plug-in is deployed, a business project integrated in Embedded mode loads the corresponding algorithm package as a container and routes calls locally for algorithm computation, as shown in Figure 9 below. A sketch of this local-versus-RPC routing decision follows.
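A minimal sketch of that routing decision, assuming hypothetical names throughout:

```java
// Minimal sketch of the Embedded-mode routing decision: compute locally when
// the algorithm package is loaded in-process, otherwise fall back to RPC.
public class EmbeddedRouting {

    interface AlgorithmRegistry {
        boolean isLoadedLocally(String algorithmName, String version);
    }

    interface Invoker {
        byte[] invoke(String algorithmName, String version, byte[] payload);
    }

    private final AlgorithmRegistry registry;
    private final Invoker localInvoker;   // in-process plug-in call, no RPC
    private final Invoker remoteInvoker;  // Thrift-RPC to a Turing OS cluster

    EmbeddedRouting(AlgorithmRegistry registry, Invoker localInvoker, Invoker remoteInvoker) {
        this.registry = registry;
        this.localInvoker = localInvoker;
        this.remoteInvoker = remoteInvoker;
    }

    public byte[] call(String algorithmName, String version, byte[] payload) {
        Invoker invoker = registry.isLoadedLocally(algorithmName, version)
                ? localInvoker : remoteInvoker;
        return invoker.invoke(algorithmName, version, payload);
    }
}
```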

Figure 9: Turing OS integration modes (Embedded / RPC) diagram

Standalone and Embedded modes each have pros and cons; neither is absolutely superior, and the choice depends on the specific business scenario. The two modes compare as follows:

| Deployment mode | Advantages | Disadvantages | Applicable scenarios |
| --- | --- | --- | --- |
| Standalone | Lower coupling: the business side only depends on the Turing lightweight SDK | Requires a dedicated Turing OS cluster, occupying machine resources; incurs RPC call overhead | Business scenarios requiring large-scale calls and distributed, multi-machine asynchronous parallel computation |
| Embedded | Reuses business-side machines, high resource utilization; fewer RPC calls, high performance | Cannot exploit multi-machine asynchronous distributed parallelism; only single-machine parallelism | Small-batch calls; business scenarios with strict RT requirements for a single call |

4.6 Turing Sandbox

With hot deployment of algorithm plug-ins, Turing OS improved algorithm iteration efficiency considerably, and algorithm engineers gained much more freedom to launch, no longer going through business engineering's scheduled development and testing. However, new problems were also introduced:

  1. Before an algorithm iteration goes live, there is no way to divert online traffic to it in advance and evaluate the algorithm's effect ahead of launch; pre-launch verification is difficult and algorithm engineers' testing efficiency is low.
  2. Real-time online evaluation and verification are difficult; there are no automated process tools for evaluating the online performance and effect of algorithm strategies.
  3. Frequent iterative launches are also a big challenge to the stability of the Turing OS service and the business.

At the time, the available approach was to deploy the algorithm strategy first, gray it onto a small portion of traffic, and then evaluate the algorithm's effect by analyzing the unified event-tracking logs. The drawback is that the effect cannot be evaluated before launch, so problems are found too late: if the grayed version has a problem, it affects online business and produces bad cases. To address this pre-launch verification problem, we developed the Turing sandbox, which enables full-link simulation experiments for algorithms without disturbing the stability of online business.

The Turing sandbox is a service physically isolated from the Turing OS service but with an identical runtime environment; traffic passing through the sandbox has no effect on online business. As shown in Figure 10 below, online traffic is diverted to the sandbox in the online environment. The environment configuration and data of Turing OS and the Turing sandbox are identical (versions, parameters, features, models, etc.). A new algorithm version (version V3 of algorithm package 1 in Figure 10) is first deployed to the sandbox, and traffic is diverted to verify the algorithm's correctness; the diverted sandbox traffic can also be used for algorithm performance stress testing. As an automation tool in the algorithm verification process, the Turing sandbox improves algorithm testing efficiency and further improves the iteration efficiency of algorithm versions.

Figure 10: Turing sandbox traffic-diversion verification diagram

4.7 Unified playback platform

To facilitate analysis of algorithm effects and troubleshooting of anomalies, we need to record the inputs, outputs, features, and models used in algorithm computation so that the scene can be reconstructed. However, algorithm computation generates a large amount of data, which challenges storage and recording:

  1. Large data volume: one request may correspond to multiple algorithm and model computations and typically uses rich feature values, producing intermediate data several times the request volume.
  2. High concurrency: centrally collecting and storing the data generated by all Turing OS services requires the capacity to absorb the sum of their peak QPS.
  3. Customizability: Turing OS hosts dozens of different algorithms whose request and response formats differ greatly, and data such as features and data sources are even harder to unify.

To record and store this important data well, Turing OS designed and developed a unified playback platform, with the following solutions to the above problems, as shown in Figure 11 below:

  1. ES and HBase are combined to store playback data: ES stores the key index fields, and HBase stores the complete data records, exploiting the strengths of both to satisfy fast query/search and massive data storage simultaneously.
  2. Google Protobuf's DynamicMessage capability is used to extend the original Protobuf format, dynamically supporting playback data format definition and data assembly, and synchronizing with the ES index; this preserves the high performance of serialization and storage while giving each algorithm's data efficient access (see the sketch after this list).
  3. Since these data queries have loose timeliness requirements, a message queue decouples sending from storage, smoothing traffic peaks. Algorithms on the Turing OS platform are automatically connected to playback through the playback Client.
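As a minimal illustration of item 2, here is a sketch that uses protobuf's real `DynamicMessage` and descriptor APIs to define and assemble a record schema at runtime; the record's fields are illustrative:

```java
import com.google.protobuf.DescriptorProtos;
import com.google.protobuf.Descriptors;
import com.google.protobuf.DynamicMessage;

// Minimal sketch: build a protobuf message type at runtime and assemble a
// record with DynamicMessage, without any generated classes.
public class DynamicPlaybackRecord {

    public static byte[] buildRecord(String algoName, double score) throws Exception {
        // Runtime schema: message PlaybackRecord { string algo = 1; double score = 2; }
        DescriptorProtos.DescriptorProto msgType = DescriptorProtos.DescriptorProto.newBuilder()
                .setName("PlaybackRecord")
                .addField(DescriptorProtos.FieldDescriptorProto.newBuilder()
                        .setName("algo").setNumber(1)
                        .setLabel(DescriptorProtos.FieldDescriptorProto.Label.LABEL_OPTIONAL)
                        .setType(DescriptorProtos.FieldDescriptorProto.Type.TYPE_STRING))
                .addField(DescriptorProtos.FieldDescriptorProto.newBuilder()
                        .setName("score").setNumber(2)
                        .setLabel(DescriptorProtos.FieldDescriptorProto.Label.LABEL_OPTIONAL)
                        .setType(DescriptorProtos.FieldDescriptorProto.Type.TYPE_DOUBLE))
                .build();
        DescriptorProtos.FileDescriptorProto fileProto = DescriptorProtos.FileDescriptorProto
                .newBuilder().setName("playback.proto").addMessageType(msgType).build();
        Descriptors.Descriptor descriptor = Descriptors.FileDescriptor
                .buildFrom(fileProto, new Descriptors.FileDescriptor[]{})
                .findMessageTypeByName("PlaybackRecord");

        // Assemble and serialize one playback record dynamically.
        DynamicMessage record = DynamicMessage.newBuilder(descriptor)
                .setField(descriptor.findFieldByName("algo"), algoName)
                .setField(descriptor.findFieldByName("score"), score)
                .build();
        return record.toByteArray();
    }
}
```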

Figure 11: Turing playback platform diagram

4.8 Performance stress testing and tuning

With the Turing sandbox and unified playback, Turing OS can quickly verify the correctness of algorithm data, but automated tooling for algorithm computation performance analysis was still missing. Turing OS integrates the capabilities of Quake, the company's full-link stress testing system (see "Full-link Stress Testing Platform (Quake) Practice in Meituan" for details), and reuses the traffic data collected by the unified playback platform to construct requests and stress-test a Turing OS or Turing sandbox on which the new algorithm package version is deployed.

During a stress test, the performance of the algorithm under different QPS levels is recorded, including application metrics such as CPU and memory and response metrics such as TP latency and timeout rate. These are compared and analyzed against real online performance, historical stress test data, and the service's promised SLA, yielding a stress test report and optimization guide. If obvious performance problems are found, the launch of the algorithm package is blocked. Turing OS is also connected to Scalpel, Meituan's internal performance diagnosis and optimization platform, which generates thread stack and performance hotspot analysis reports during stress testing, helping users quickly locate performance bottlenecks and pointing toward specific optimizations.

Figure 12: Turing full-link stress testing and performance diagnosis diagram

5. Turing OS 2.0 construction results

5.1 Algorithm development process

Through Turing OS's algorithm plug-in refactoring and dynamic hot deployment capabilities, we decoupled algorithm, engineering, and the Turing platform, closed the loop of algorithm and engineering iteration, improved R&D efficiency, and greatly shortened the launch cycle of algorithm iterations:

  • For model iterations, feature changes, and algorithm strategy iterations, algorithm engineers can independently complete development and testing of the whole link without the intervention of engineering R&D personnel or test engineers, and the algorithm package can be deployed independently without any service release; the engineering and product sides only need to watch the relevant metrics after launch.
  • When new business scenarios and new algorithm strategies are connected, algorithm and engineering develop jointly: once the Protobuf interface is defined, algorithm engineers and engineering developers can develop their code independently and go live.

Using the automated tools provided by Turing OS, such as sandbox traffic-diversion verification and performance stress testing and diagnosis, algorithm strategy iteration efficiency improved further, and the launch cycle of algorithm iterations shrank from days to hours. Algorithm engineers develop independently, deploy to Turing OS for self-testing and debugging, deploy to the sandbox for traffic-diversion testing, evaluate effect and performance through the stress testing platform, and finally deploy and launch independently. The whole process requires no participation from engineering R&D personnel or Turing engineers, achieving the goal of automated operation; at the same time, multiple mechanisms guarantee the execution performance of algorithm strategies and the operational stability of Turing OS.

Figure 13: Turing algorithm R&D process

5.2 Summary of Turing OS 2.0 Usage

Turing OS (the Turing online serving framework 2.0) has now been under construction for more than half a year. The overall picture is roughly as follows: 20+ Turing OS clusters have been built, 25+ algorithm packages and 50+ algorithms have been connected, algorithm packages are deployed and launched 200+ times per month, and tens of billions of algorithm strategy computations are supported every day. Empowered by Turing OS, most of the algorithm iteration process requires no participation from engineering R&D personnel or test engineers, and algorithm engineers can complete the launch of an algorithm strategy iteration within hours.

Currently, one Turing OS cluster can carry multiple algorithm packages of a single business line, or the algorithm packages of multiple sub-business lines within a single department. Algorithm packages and Turing OS clusters can be dynamically associated and dynamically deployed, and Turing OS supports physical resource isolation at both the business-line and algorithm-package levels. To make the platform easy to adopt, we provide comprehensive access documentation and video courses; apart from clusters built by the Turing platform side, any business party can basically set up its own Turing OS service within an hour. We also provide best-practice documentation and performance tuning configurations, so business parties can solve most problems themselves without guidance. We are now building automated operation and maintenance tools to further reduce the access threshold and operation and maintenance cost of Turing OS.

6. Summary and future outlook

Of course, no algorithm platform or online serving framework is perfect, and Turing OS still has much room for improvement. As we continue to explore machine learning and deep learning online services, more and more application scenarios will need Turing OS support. In the future, we will continue to build in the following areas:

  1. Build Turing OS automated operation and maintenance tools and automated testing tools, support semi-automatic algorithm development, and further reduce platform access and operation and maintenance costs.
  2. Further improve the Turing OS framework and its algorithm support capabilities, and support running in a Spark environment, so that when an algorithm iterates, the correctness, performance, and effect of new algorithm functions can be verified against massive data.
  3. Advance the construction of the Turing OS full-graph engine: by abstracting general components of algorithm business, provide graphical process orchestration tools and a graph execution engine, further empowering business launches and algorithm iteration and further improving iteration efficiency.

7. About the author

Yongbo, Jishang, Yanwei, Feifan, and others are all from the Algorithm Platform Group of Meituan's Delivery Technology Department and are responsible for building the Turing platform and related work.

8. Recruitment Information

If you want to experience the charm of the Turing platform and Turing OS up close, welcome to join us. Meituan's delivery technical team sincerely recruits technical experts and architects in machine learning platforms and algorithm engineering to face the challenges of complex business and high-concurrency traffic and build the industry's largest instant delivery network and platform, ushering in an era of comprehensive intelligence for Meituan Delivery's business. Interested candidates can submit resumes to houyongbo@meituan.com (email subject: Meituan Delivery Technical Team).


This article was produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please credit "reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activity, please email tech@meituan.com to apply for authorization.

