Traffic and Data Isolation in Practice in the vivo Comment Middle Platform

1. Background

The vivo comment middle platform provides general capabilities such as comment publishing, liking, reporting, and custom comment sorting. It lets front-office businesses build comment features quickly and provides comment operation tools, which avoids duplicated development across businesses and prevents data silos. More than 10 businesses are currently connected, including vivo short video, vivo browser, the minus-one screen, and vivo mall. Their traffic volumes and fluctuation ranges differ. How do we keep every front-office business highly available, so that a traffic surge in one does not make the others unavailable? The comment data of all businesses is stored by the middle platform, and their data volumes and database load also differ. As a middle platform, how should we isolate each business's data and keep the whole system highly available?

This article shares the solution used by the vivo comment middle platform, in two parts: traffic isolation and data isolation.

2. Traffic isolation

2.1 Traffic grouping

The vivo browser business has hundreds of millions of daily active users, and real-time hot news is pushed to the entire user base. For important businesses like this, with many users and heavy traffic, we provide a dedicated cluster so that they are not affected by other businesses.

The vivo comment middle platform exposes its services through Dubbo interfaces. We divide the whole service cluster logically with Dubbo tag routing: a Dubbo call selects and invokes the service providers that match the tag carried in the request. As shown below:

1) Provider tagging: there are currently two ways to group instances, dynamic rule tagging and static rule tagging. Dynamic rules take priority over static rules: when both exist and conflict, the dynamic rule prevails. The company's internal operations system supports dynamic tagging well, by tagging machines with specified IPs (these are not Docker containers, so the machine IPs are fixed).

2) The front-office consumer specifies the service tag, which is set when the request is initiated, as follows:

The front-office service specifies the routing tag for the middle platform:

RpcContext.getContext().setAttachment(Constants.REQUEST_TAG_KEY,"browser");

The request tag is scoped to a single invocation. You only need to set it before calling the comment middle platform service. Calls that the front-office business makes to other providers are not affected by this routing tag.
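To make the routing behavior concrete, here is a minimal, self-contained sketch of tag-based provider selection. It is not Dubbo's implementation: the `Provider` and `TagRouterSketch` classes are hypothetical, and the fallback rules are an assumption based on how tag routing is commonly described (a tagged request prefers providers with the same tag and falls back to untagged providers when none match, while an untagged request only sees untagged providers).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a provider instance; not a Dubbo class.
final class Provider {
    final String address;
    final String tag; // null or empty means "untagged"

    Provider(String address, String tag) {
        this.address = address;
        this.tag = tag;
    }
}

final class TagRouterSketch {
    private TagRouterSketch() {}

    private static boolean isBlank(String s) {
        return s == null || s.isEmpty();
    }

    /**
     * Returns the providers a request may be routed to.
     * Assumed rules: a tagged request prefers providers carrying the same tag
     * and falls back to untagged providers when none match; an untagged
     * request is only routed to untagged providers.
     */
    static List<Provider> route(List<Provider> providers, String requestTag) {
        List<Provider> untagged = new ArrayList<>();
        List<Provider> matched = new ArrayList<>();
        for (Provider p : providers) {
            if (isBlank(p.tag)) {
                untagged.add(p);
            } else if (p.tag.equals(requestTag)) {
                matched.add(p);
            }
        }
        if (isBlank(requestTag) || matched.isEmpty()) {
            return untagged;
        }
        return matched;
    }
}
```

With this model, a request tagged "browser" only reaches the machines tagged "browser", so the browser business and the shared cluster do not compete for the same instances.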

2.2 Multi-tenant rate limiting

We isolate high-traffic businesses with dedicated clusters. However, deploying a cluster independently is expensive, and we cannot deploy one for every front-office business. In most cases, multiple businesses still share one cluster. So how do we handle a traffic burst from one business in a shared cluster? By rate limiting. However, many rate-limiting setups take a one-size-fits-all approach and cap the overall QPS of an interface. In that case, a traffic surge from one front-office business causes requests from all front-office businesses to be throttled.

This calls for multi-tenant rate limiting (a tenant here can be understood as a front-office business), which throttles the traffic of different tenants on the same interface separately. The effect is as follows:

Implementation process:

We use Sentinel's hotspot parameter rate limiting, take the business identity code as the hotspot parameter, and configure a different flow-control threshold for each business.

So what is hotspot parameter rate limiting? First, a hotspot is data that is accessed frequently. In many cases we want to find the top N most frequently accessed items in some hotspot data and restrict access to them. For example:

With the commodity ID as the parameter, count the most frequently purchased commodity IDs within a period of time and restrict them.
With the user ID as the parameter, restrict the user IDs that access most frequently within a period of time.

Hotspot parameter rate limiting counts the hotspot parameters among the incoming arguments and throttles resource calls that contain them, according to the configured threshold and mode. It can be regarded as a special kind of flow control that takes effect only for resource calls containing hotspot parameters. Sentinel uses an LRU strategy to track the most recently accessed hotspot parameters and combines it with the token bucket algorithm for parameter-level flow control. The following figure shows an example in the comment scenario:

Protecting resources with Sentinel takes three main steps: defining resources, defining rules, and handling triggered rules.

1) Define the resource:

A resource can be understood as the API interface path of each middle platform service.

2) Define the rule:

Sentinel supports many kinds of rules, such as QPS flow control, adaptive rate limiting, hotspot parameter rate limiting, and cluster rate limiting. Here we use single-machine hotspot parameter rate limiting.

Hotspot parameter rate-limiting configuration:

{
    "resource": "com.vivo.internet.comment.facade.comment.CommentFacade:comment(com.vivo.internet.comment.facade.comment.dto.CommentRequestDto)", // interface to rate-limit
    "grade": 1, // QPS rate-limiting mode
    "count": 3000, // default limit for the interface: 3000
    "clusterMode": false, // single-machine mode
    "paramFieldName": "clientCode", // name of the hotspot parameter, i.e. the business-code field; we added this property to our customized Sentinel so that a property of the parameter object can serve as the hotspot key
    "paramFlowItemList": [ // per-value hotspot rules
        {
            "object": "vivo-community", // applies when clientCode equals this value
            "count": 1000,   // limit: 1000
            "classType": "java.lang.String"
        },
        {
            "object": "vivo-shop", // applies when clientCode equals this value
            "count": 2000, // limit: 2000
            "classType": "java.lang.String"
        }
    ]
}
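To illustrate what a rule like this expresses, the following is a simplified, self-contained sketch of per-tenant rate limiting with one token bucket per hotspot value. It is not Sentinel's implementation (it has no LRU eviction, for instance); the class name `PerTenantLimiterSketch` and its methods are hypothetical, and the caller passes the current time explicitly so that the behavior is easy to reason about.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one token bucket per clientCode, with a default QPS
// for tenants that have no dedicated threshold. Not Sentinel code.
final class PerTenantLimiterSketch {
    private final long defaultQps;
    private final Map<String, Long> qpsByTenant;
    // Per tenant: [0] = tokens currently available, [1] = last refill time in ms
    private final Map<String, double[]> buckets = new HashMap<>();

    PerTenantLimiterSketch(long defaultQps, Map<String, Long> qpsByTenant) {
        this.defaultQps = defaultQps;
        this.qpsByTenant = new HashMap<>(qpsByTenant);
    }

    /** Returns true if the call is allowed, false if it should be throttled. */
    synchronized boolean tryAcquire(String clientCode, long nowMillis) {
        long qps = qpsByTenant.getOrDefault(clientCode, defaultQps);
        // A new bucket starts full, so a tenant may burst up to its QPS.
        double[] bucket = buckets.computeIfAbsent(clientCode, k -> new double[] {qps, nowMillis});
        // Refill in proportion to the elapsed time, capped at the bucket size.
        double elapsedSeconds = Math.max(0, nowMillis - bucket[1]) / 1000.0;
        bucket[0] = Math.min(qps, bucket[0] + elapsedSeconds * qps);
        bucket[1] = nowMillis;
        if (bucket[0] >= 1.0) {
            bucket[0] -= 1.0;
            return true;
        }
        return false;
    }
}
```

Because each tenant draws from its own bucket, exhausting the budget of "vivo-community" leaves the budget of "vivo-shop" and of the default tenants untouched, which is the property that an interface-wide QPS cap cannot provide.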

3) Handle triggered rules:

Sentinel throws ParamFlowException when a rate-limiting rule is triggered. Throwing this exception directly to the front-office business is not elegant. Sentinel provides a unified exception callback entry point, DubboAdapterGlobalConfig, which lets us convert the exception into a business-defined result and return it.

Custom rate-limiting result:

DubboAdapterGlobalConfig.setProviderFallback((invoker, invocation, ex) ->
AsyncRpcResult.newDefaultAsyncResult(FacadeResultUtils.returnWithFail(FacadeResultEnum.USER_FLOW_LIMIT), invocation));

Additional optimizations we made:

1) The company's internal rate-limiting console does not yet support hotspot parameter rules, so we added a new rate-limiting configuration controller that delivers rate-limiting configuration dynamically through the configuration center. The overall process is as follows:

Dynamic delivery of rate-limiting configuration:

public class VivoCfgDataSourceConfig implements InitializingBean {
    private static final String PARAM_FLOW_RULE_PREFIX = "sentinel.param.flow.rule";
 
    @Override
    public void afterPropertiesSet() {
        // Data source with a custom parser that turns config entries into ParamFlowRule objects
        VivoCfgDataSource<List<ParamFlowRule>> paramFlowRuleVivoDataSource = new VivoCfgDataSource<>(PARAM_FLOW_RULE_PREFIX, sources -> sources.stream().map(source -> JSON.parseObject(source, ParamFlowRule.class)).collect(Collectors.toList()));
        // Register the property so that rule changes take effect
        ParamFlowRuleManager.register2Property(paramFlowRuleVivoDataSource.getProperty());
        // Load the initial rate-limiting configuration
        paramFlowRuleVivoDataSource.init();

        // Listen for changes in the configuration center
        VivoConfigManager.addListener(((item, type) -> {
            if (item.getName().startsWith(PARAM_FLOW_RULE_PREFIX)) {
                paramFlowRuleVivoDataSource.updateValue(item, type);
            }
        }));
    }
}

2) Native Sentinel offers two ways to specify the hotspot parameter:

The first is to specify the index of the interface method parameter (the nth parameter).
The second is for the method parameter to implement ParamFlowArgument and its paramFlowKey() method, whose return value is used as the hotspot parameter value.

Neither way is flexible. The first does not support specifying a property of an object. The second requires code changes: if an interface parameter does not implement ParamFlowArgument when it goes live and you later want to configure hotspot parameter rate limiting for it, the only option is to change the code and release again. We therefore modified the hotspot parameter rate-limiting source code of the Sentinel component to allow "a property of the specified parameter object" as the hotspot parameter, with support for nested objects. The code change is small, but it makes hotspot parameters much easier to configure.

The modified hotspot parameter check logic:

public static boolean passCheck(ResourceWrapper resourceWrapper, /*@Valid*/ ParamFlowRule rule, /*@Valid*/ int count,
                                Object... args) {
 
    // Some code omitted
    // Get parameter value. If value is null, then pass.
    Object value = args[paramIdx];
    if (value == null) {
        return true;
    }
 
    // Assign value with the result of paramFlowKey method
    if (value instanceof ParamFlowArgument) {
        value = ((ParamFlowArgument) value).paramFlowKey();
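    } else {
        // Resolve the hotspot parameter value from the property named by classFieldName
        if (StringUtil.isNotBlank(rule.getClassFieldName())) {
            // Read the classFieldName property of the parameter object via reflection
            value = getParamFieldValue(value, rule.getClassFieldName());
        }
    }
    // Some code omitted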
}
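The article does not show getParamFieldValue itself. As an illustration only, the sketch below shows one way such a helper could resolve a possibly nested property by reflection. The dotted-path syntax (for example "header.clientCode"), the class `ParamFieldSketch`, and the demo DTOs are all assumptions made for this example, not the authors' code.

```java
import java.lang.reflect.Field;

// Hypothetical demo DTOs used only to exercise the sketch.
final class DemoHeader {
    private final String clientCode;

    DemoHeader(String clientCode) {
        this.clientCode = clientCode;
    }
}

final class DemoRequest {
    private final DemoHeader header;

    DemoRequest(DemoHeader header) {
        this.header = header;
    }
}

final class ParamFieldSketch {
    private ParamFieldSketch() {}

    /**
     * Resolves a dotted property path such as "header.clientCode" against
     * target by reading fields reflectively. Returns null if any step is
     * null, missing, or inaccessible.
     */
    static Object getParamFieldValue(Object target, String fieldPath) {
        Object current = target;
        for (String name : fieldPath.split("\\.")) {
            if (current == null) {
                return null;
            }
            Field field = findField(current.getClass(), name);
            if (field == null) {
                return null;
            }
            try {
                field.setAccessible(true);
                current = field.get(current);
            } catch (IllegalAccessException | RuntimeException e) {
                return null;
            }
        }
        return current;
    }

    // Walks up the class hierarchy so that inherited fields are found too.
    private static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // Not declared here; keep looking in the superclass.
            }
        }
        return null;
    }
}
```

Returning null for an unresolved path matches the surrounding check logic, where a null hotspot value lets the request pass. A production version would likely cache the resolved Field objects, since this lookup runs on every call.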

3. MongoDB data isolation

Why isolate data? There are two reasons. First, the middle platform stores the data of different front-office businesses. Queries for one business must not be affected by the data of another, and business A must not be able to query the data of business B. Second, businesses differ in data volume and in the load they put on the database. For example, in traffic isolation we provide a dedicated service cluster for the browser business, so the database used by the browser business also needs to be configured separately. Only then is it fully isolated from the load of other businesses.

The vivo comment middle platform uses MongoDB as its storage (readers interested in the database selection and MongoDB usage details can read our earlier article "MongoDB's practice in commenting on the middle platform"). To isolate the data of different business parties, the comment middle platform provides two data isolation solutions: physical isolation and logical isolation.

3.1 Physical isolation

The data of different business parties is stored in different database clusters, which requires our system to support multiple data sources of MongoDB. The implementation process is as follows:

3.1.1 Find a suitable entry point

Analyzing the source code of the spring-data-mongodb execution flow shows that getDb() is called before every statement is executed, to obtain the database connection instance, as follows.

spring-data-mongodb db operation source code:

private <T> T executeFindOneInternal(CollectionCallback<DBObject> collectionCallback,
        DbObjectCallback<T> objectCallback, String collectionName) {
    try {
        // Key call: getDb()
        T result = objectCallback
                .doWith(collectionCallback.doInCollection(getAndPrepareCollection(getDb(), collectionName)));
        return result;
    } catch (RuntimeException e) {
        throw potentiallyConvertRuntimeException(e, exceptionTranslator);
    }
}

getDb() calls the getDb() method of the org.springframework.data.mongodb.MongoDbFactory interface, whose default implementation is SimpleMongoDbFactory. This naturally suggests the proxy pattern: replace SimpleMongoDbFactory with a proxy object, and inside the proxy create one SimpleMongoDbFactory instance for each MongoDB cluster.

When a db operation runs, the proxy object's getDb() is executed, and it only needs to do two things:

Find the SimpleMongoDbFactory object of the corresponding cluster.
Call SimpleMongoDbFactory.getDb() on it.

The class diagram is as follows.

The overall execution process is as follows:

3.1.2 Core code implementation

The Dubbo filter extracts the business identity and stores it in the context:

private boolean setCustomerCode(Object argument) {
    // Get the business identity from a String argument
    if (argument instanceof String) {
        if (!Pattern.matches("client.*", (String) argument)) {
            return false;
        }
        // Store the business identity in the context
        CustomerThreadLocalUtil.setCustomerCode((String) argument);
        return true;
    } else {
        // For a List argument, use its first element as the parameter object
        if (argument instanceof List) {
            List<?> listArg = (List<?>) argument;
            if (CollectionUtils.isEmpty(listArg)) {
                return false;
            }
            argument = listArg.get(0);
        }
        // A null argument (or null first element) carries no business identity
        if (argument == null) {
            return false;
        }
        // Get the business identity from the parameter object via reflection
        try {
            Method method = argument.getClass().getMethod(GET_CLIENT_CODE_METHOD);
            Object object = method.invoke(argument);
            // Check that the business identity is valid
            ClientParamCheckService clientParamCheckService = ApplicationUtil.getBean(ClientParamCheckService.class);
            clientParamCheckService.checkClientValid(String.valueOf(object));
            // Store the business identity in the context
            CustomerThreadLocalUtil.setCustomerCode((String) object);
            return true;
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
            log.debug("Failed to get clientCode via reflection, argument type: {}", argument.getClass().getName(), e);
            return false;
        }
    }
}
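CustomerThreadLocalUtil is used throughout but not shown in the article. A minimal sketch of such a holder, assuming it is a thin wrapper around a ThreadLocal, might look like the following. The class name `CustomerContextSketch` and the `callWith` helper are hypothetical and exist only for illustration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the article's CustomerThreadLocalUtil.
final class CustomerContextSketch {
    private static final ThreadLocal<String> CUSTOMER_CODE = new ThreadLocal<>();

    private CustomerContextSketch() {}

    static void setCustomerCode(String code) {
        CUSTOMER_CODE.set(code);
    }

    static String getCustomerCode() {
        return CUSTOMER_CODE.get();
    }

    static void clear() {
        CUSTOMER_CODE.remove();
    }

    /**
     * Runs the action with the business code bound to the current thread and
     * always clears it afterwards, even if the action throws.
     */
    static <T> T callWith(String code, Supplier<T> action) {
        setCustomerCode(code);
        try {
            return action.get();
        } finally {
            clear();
        }
    }
}
```

The try/finally matters because RPC worker threads are normally pooled: without the cleanup, a request that carries no business code could inherit the value left behind by the previous request on the same thread and be routed to the wrong cluster or collection.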

The routing proxy class for the MongoDB clusters:

public class MultiMongoDbFactory extends SimpleMongoDbFactory {
 
    // Cache of factories per cluster: key is the MongoDB cluster config name, value is the factory for that business's MongoDB cluster
    private final Map<String, SimpleMongoDbFactory> mongoDbFactoryMap = new ConcurrentHashMap<>();

    // Register an already created MongoDB cluster factory
    public void addDb(String dbKey, SimpleMongoDbFactory mongoDbFactory) {
        mongoDbFactoryMap.put(dbKey, mongoDbFactory);
    }

    @Override
    public DB getDb() throws DataAccessException {
        // Get the front-office business code from the context
        String customerCode = CustomerThreadLocalUtil.getCustomerCode();
        // Get the MongoDB config name for this business
        String dbKey = VivoConfigManager.get(ConfigKeyConstants.USER_DB_KEY_PREFIX + customerCode);
        // Look up the matching SimpleMongoDbFactory in the cache
        if (dbKey != null && mongoDbFactoryMap.get(dbKey) != null) {
            // Delegate to SimpleMongoDbFactory.getDb()
            return mongoDbFactoryMap.get(dbKey).getDb();
        }
        return super.getDb();
    }
}

Custom MongoDB operation template:

@Bean
public MongoTemplate createIgnoreClass() {
    // Create the MultiMongoDbFactory proxy
    MultiMongoDbFactory multiMongoDbFactory = multiMongoDbFactory();
    if (multiMongoDbFactory == null) {
        return null;
    }
    MappingMongoConverter converter = new MappingMongoConverter(new DefaultDbRefResolver(multiMongoDbFactory), new MongoMappingContext());
    converter.setTypeMapper(new DefaultMongoTypeMapper(null));
    // Build the MongoDB operation template on top of the multiMongoDbFactory proxy
    return new MongoTemplate(multiMongoDbFactory, converter);
}

3.2 Logical isolation

Physical isolation is the most thorough data isolation, but it is impossible for us to build an independent MongoDB cluster for every business. When multiple businesses share a database, logical isolation of data is required.

Logical isolation generally comes in two forms:

Table isolation: the data of different business parties is stored in different tables of the same database, and each business operates on its own table.
Row isolation: the data of different business parties is stored in the same table, with the business code kept as a redundant column, and isolation is achieved by filtering on the business code when reading data.

Considering the implementation cost and the comment business scenario, we chose table isolation. The implementation process is as follows:

1) Initialize the data tables

Every time a new business is connected, we assign it a unique identity code. We use this code directly as the suffix of the business table names when initializing the tables, for example the mall comment table comment_info_vshop and the community comment table comment_info_community.

2) Automatic table lookup

The spring-data-mongodb @Document annotation supports SpEL. Combined with our business identity context, this gives us automatic table lookup.

Automatic table lookup:

@Document(collection = "comment_info_#{T(com.vivo.internet.comment.common.utils.CustomerThreadLocalUtil).getCustomerCode()}")
public class Comment {
    // Table fields omitted
}
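The naming rule behind the SpEL expression can also be written out in plain Java. The sketch below is an illustration under stated assumptions rather than the authors' code: `CollectionNameSketch` is a hypothetical helper, and the allowed character set for business codes is assumed. It builds the collection name from the business code and fails fast when the code is missing or malformed, instead of silently touching an unintended collection.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the "comment_info_" + business code naming rule.
final class CollectionNameSketch {
    // Assumed format for business codes: lowercase letters, digits, underscores.
    private static final Pattern SAFE_CODE = Pattern.compile("[a-z0-9_]+");

    private CollectionNameSketch() {}

    static String commentCollection(String customerCode) {
        if (customerCode == null || !SAFE_CODE.matcher(customerCode).matches()) {
            throw new IllegalArgumentException("invalid customer code: " + customerCode);
        }
        return "comment_info_" + customerCode;
    }
}
```

For the mall business code vshop this yields comment_info_vshop, the table name used as an example above.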

The overall effect of the combination of the two isolation methods:

4. Conclusion

Through the practices above, we support front-office businesses of different scales well, without intruding on business code, which better decouples technical complexity from business complexity. We have also isolated the Redis and ES clusters used in the project per business. The general idea is similar to the MongoDB isolation, in that all of them go through a proxy, so we will not cover them one by one here.

Author: vivo official website mall development team - Sun Daoming
