Building a Real-Time Risk Control System at Zhongyuan Bank

Abstract: This article is compiled from the talk given by Chen Yuqiang, a development engineer at the Zhongyuan Bank Data Platform Center, in the industry practice session of Flink Forward Asia 2021. The main contents are:
Construction system
Selection & Architecture
Application scenarios
Construction effectiveness

1. Construction system

Banks are in the business of managing risk, and the ability to identify, measure, price, and prevent risk is their core competitive strength. Zhongyuan Bank has built an end-to-end risk control system covering anti-fraud, credit risk, and operational risk.

Fraud can occur at the application, transaction, marketing, and other stages of banking business. As technology advances, fraud has become increasingly organized, concealed, specialized, and real-time, which makes it harder and harder to counter. At the same time, as the range of business types grows, the traditional expert-rule and scorecard models struggle to cope with complex risk control scenarios; building high-quality credit capabilities requires big data, real-time computing, machine learning, knowledge graphs, and related technologies. In addition, detecting and resolving operational risks in a timely manner, including process risks, abnormal employee behavior, and asset-liability liquidity risks, is also a major challenge.

Traditional technologies fall short against these challenges. It is hard to capture users' multi-channel operation behavior in real time, so comprehensive, real-time prevention and control is hard to achieve. Traditional risk control systems generally compute on expert rules alone: rule trigger thresholds are hard to tune, low-saturation and noisy data is hard to absorb, and simply piling up more rules does little to improve accuracy. In addition, traditional systems are relatively isolated from one another, so data is siloed and hard to circulate. This makes it difficult for experts to formulate rules and train models, and it undermines the overall risk control effect.

The new risk control system starts with being real-time: stream computing and in-memory computing improve the timeliness of data processing and allow risky behavior to be identified promptly across systems, while cloud-native technology and elastic resources improve the system's capacity for high concurrency. Beyond this raw capability, the system emphasizes intelligence: it combines probability-based machine learning models with expert rules to release the full value of big data and cover the blind spots of expert rules and experience. Finally, building the system as platform-based products gives it rapid support for different scenarios and makes for a complete risk control system.

Over the past three years, we have gone through exploration, experimentation, and platform construction for real-time computing, and have applied it to scenarios such as anti-fraud, event-driven processing, and real-time OLAP. By the end of 2021, both the number of tasks and the average daily volume of data processed had grown several-fold. On the risk control side, we moved from decision systems introduced from abroad to a self-developed decision platform. In 2021, the self-developed platform began to take on new requirements as well as rule models migrated from some of the foreign decision systems.

The intelligent risk control system capability model can be summarized as:

Real-time identification and calculation of risk features.
Integration of expert rules and machine learning models, providing intelligent decision-making through complex orchestration.
A platform that hides technical details and gives users a friendly experience.
Standardization within the risk control system: formulating specifications, building data standards, and opening up data capabilities.
A ModelOps management system that manages the whole life cycle of a risk model, from requirement to production.
Low-code and visual tooling that lower the barrier to use.

2. Selection & Architecture

In this architecture, Flink is mainly used for data cleaning, real-time dimension table processing and joins, and window computing. Through pre-computation, in-memory computation, and aggregation, it produces basic, derived, and composite indicators that serve as features for decision models. Model orchestration focuses on composing rich, easy-to-use rule models such as decision sets, scorecards, decision trees, and decision tables. Rules can also invoke indicator services and algorithm model services, so that these take part in the logical operations together.
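The layering of basic, derived, and composite indicators over a time window can be illustrated with a small sketch. This is plain Python rather than Flink code, and the class name, field names, window length, and the specific indicators are all assumptions made for illustration; the talk does not say which indicators the bank computes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Txn:
    account: str
    ts: float      # event time in seconds
    amount: float


class WindowIndicators:
    """Per-account sliding-window indicators (hypothetical example)."""

    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self._events: dict[str, deque] = {}

    def update(self, txn: Txn) -> dict:
        q = self._events.setdefault(txn.account, deque())
        q.append(txn)
        # Evict events older than the window (assumes in-order arrival).
        while q and q[0].ts <= txn.ts - self.window_s:
            q.popleft()
        count = len(q)                         # basic indicator
        total = sum(t.amount for t in q)       # basic indicator
        avg = total / count                    # derived indicator
        # Composite indicator: current amount relative to the window average.
        ratio = txn.amount / avg if avg else 0.0
        return {
            "txn_count": count,
            "txn_sum": total,
            "txn_avg": avg,
            "amount_to_avg": ratio,
        }
```

In a real deployment the same aggregation would more likely be written as a windowed query in Flink SQL, with state and out-of-order events handled by the engine.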

The risk control system is built on a cloud-native architecture and open source technology, and supports messages, interfaces, and various types of databases. Data sources, dimension tables, and parameters are configured through a UI, and users write business logic in Flink SQL, which greatly lowers the barrier to real-time computing. The computing power of the rule, model, and indicator engines is combined through visual orchestration (DAG) to support risk control decisions. There are also some special features, such as SQL scoring and gateway offloading.
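One way to picture DAG orchestration across engines is a tiny executor that runs nodes in dependency order and passes results along. The sketch below uses Python's standard `graphlib`; the node names, the stand-in engine functions, and the thresholds are hypothetical and are not taken from the platform itself.

```python
from graphlib import TopologicalSorter


def run_dag(nodes, deps, request):
    """Run each node after its predecessors and collect results in a context.

    nodes: name -> callable(ctx) returning that node's result
    deps:  name -> set of predecessor node names
    """
    ctx = dict(request)
    for name in TopologicalSorter(deps).static_order():
        ctx[name] = nodes[name](ctx)
    return ctx


# Hypothetical engines: an indicator lookup, a model score, a rule, a decision.
nodes = {
    "indicator": lambda c: {"txn_count_5m": 7},
    "model": lambda c: 0.9 if c["amount"] > 10000 else 0.1,
    "rule": lambda c: c["indicator"]["txn_count_5m"] > 5,
    "decision": lambda c: "verify" if c["rule"] and c["model"] > 0.5 else "pass",
}
deps = {
    "indicator": set(),
    "model": set(),
    "rule": {"indicator"},
    "decision": {"rule", "model"},
}
```

Calling `run_dag(nodes, deps, {"amount": 20000})` evaluates the indicator and model nodes first, then the rule, then the final decision.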

Real-time indicators feed expert rules, and real-time features can be used for online model training. The machine learning platform uses offline data for model training and inference, and combines it with the risk data screened out by rules to train supervised and unsupervised algorithms.

3. Application scenarios

Anti-fraud is an important part of transaction processing. Transaction data is usually widened (enriched) with internal and external data such as blacklists and whitelists, knowledge graphs, and judicial, tax, and industrial and commercial registration data. The widened data feeds expert rules and machine learning models. Depending on the decision returned by the Smart Policy platform, the system that initiated the transaction either releases the transaction or strengthens its verification. Risk outcome data can serve as samples for association mining or feature analysis on graph data, or for supervised learning.
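As a rough illustration of widening followed by a decision, the sketch below joins a transaction with a blacklist and a customer dimension record, then chooses between releasing the transaction and strengthening verification. The field names, the `customer_risk_level` attribute, and the rule itself are assumptions for illustration only.

```python
def widen(txn, blacklist, customer_dim):
    """Return a copy of the transaction enriched with lookup data."""
    wide = dict(txn)
    wide["payee_blacklisted"] = txn["payee"] in blacklist
    # Attach customer attributes if the dimension table has a matching row.
    wide.update(customer_dim.get(txn["customer_id"], {}))
    return wide


def decide(wide):
    """Hypothetical expert rule applied to the widened record."""
    if wide["payee_blacklisted"] or wide.get("customer_risk_level") == "high":
        return "strengthen_verification"
    return "release"
```

The widened record, rather than the raw transaction, is what rules and models would consume.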

In terms of technical implementation, for each transaction request the Smart Policy platform calls different computing engines according to the DAG orchestration logic and returns the result. Meanwhile, the real-time computing platform uses change data from the transaction system's database to compute real-time indicators on transactions and behavior. In addition, historical data is extracted into the offline data warehouse and data lake for use by the downstream machine learning platform.
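Computing an indicator from database change data means handling updates and deletes as well as inserts. A minimal sketch, assuming each change event carries the row image before and after the change (a common shape for change data capture records, though the actual format used here is not stated), maintains a running total per account by retracting the old value and adding the new one:

```python
from collections import defaultdict


def apply_change(totals, event):
    """Update per-account totals from one change event.

    event["before"] is the old row (None for an insert);
    event["after"] is the new row (None for a delete).
    """
    before, after = event.get("before"), event.get("after")
    if before is not None:
        # Update or delete: retract the previous contribution.
        totals[before["account"]] -= before["amount"]
    if after is not None:
        # Insert or update: add the new contribution.
        totals[after["account"]] += after["amount"]
    return totals


totals = defaultdict(float)
```

Applying an insert of 100, an update to 150, and then a delete leaves the account's total at 100, 150, and 0 in turn.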

In a narrow, simple sense, credit granting is the act of a financial institution providing funds to a customer. In the pre-loan stage, the Smart Policy platform supports more than 50 scenarios through scorecards, decision sets, and similar models, and receives about 30,000 credit requests per day on average. In the mid-loan and post-loan stages, which are mainly batch data processing, it processes about 13 million records per day on average.
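For readers unfamiliar with scorecards, the idea is that each attribute is binned, each bin carries a number of points, and the points are added to a base score. The sketch below shows that mechanism only; the attributes, bin boundaries, points, and base score are invented for illustration and have nothing to do with the bank's actual models.

```python
import bisect

# Hypothetical scorecard: for each attribute, ascending bin boundaries and
# the points for each bin (one more bin than there are boundaries).
SCORECARD = {
    "age": ([25, 35, 50], [10, 20, 30, 25]),
    "monthly_income": ([5000, 10000], [5, 15, 30]),
}
BASE_SCORE = 500


def score(applicant):
    """Sum the base score and the points of the bin each attribute falls in."""
    total = BASE_SCORE
    for attr, (bounds, points) in SCORECARD.items():
        # bisect_right puts a value equal to a boundary into the higher bin.
        total += points[bisect.bisect_right(bounds, applicant[attr])]
    return total
```

For example, an applicant aged 30 with a monthly income of 12000 falls into the second age bin and the third income bin.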

The obvious difference between the technical architecture of the credit granting scenario and that of the transaction anti-fraud scenario is that credit granting requires external data. The Smart Policy platform runs expert rule calculations and machine learning model inference on the transaction variables joined with internal and external data. For now, credit scenarios do not use real-time indicators.

Employee behavior, credit management, and public opinion analysis all fall within the scope of operational risk. Scenario data such as transaction reversals and equipment management is processed into offline operational risk indicators, and highly sensitive behavior data is processed into real-time indicators. Rules and models run on these indicators to produce early warning results, which then form risk verification events and lists. The result data is also used as risk feature samples to train algorithms and mine risks.

The technical architecture for operational risk is relatively straightforward. Historical business data is synchronized to the data warehouse every day, and risk indicators are processed there. Offline data is also used for model training. Every day the Smart Policy platform runs scheduled calculations on the offline indicators and pushes the risk warning results to the downstream operation system.
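The daily batch step can be pictured as applying threshold rules to the day's offline indicators and emitting a warning list. The following sketch assumes a simple "indicator exceeds limit" rule; the indicator names, limits, and output shape are hypothetical.

```python
def daily_warnings(indicators, thresholds):
    """Return one warning per employee whose indicators exceed any threshold.

    indicators: employee id -> {indicator name: value}
    thresholds: indicator name -> upper limit
    """
    warnings = []
    for employee, values in sorted(indicators.items()):
        triggered = [
            name for name, limit in thresholds.items()
            if values.get(name, 0) > limit
        ]
        if triggered:
            warnings.append({"employee": employee, "triggered": triggered})
    return warnings
```

The resulting list is the kind of output that would be pushed to a downstream system for verification.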

4. Construction effectiveness

In terms of business results, the anti-fraud system has connected 14 channels and 105 types of scenarios. By introducing stream computing into traditional anti-fraud technology, it achieved real-time anti-fraud, helped control more than 10,000 high-risk accounts, assisted in blocking fund transfers of more than 10 million yuan, and achieved zero fraud losses in online transactions for the whole year. In terms of credit granting, it supports full-cycle credit scenarios, more than 50 in all, including quota assessment, risk pricing, and post-loan warning, and handles more than 30,000 incoming applications every day. It also processes more than 300,000 operational indicator records in batch every day and about 30 million employee behavior records in real time through Flink, which gives it the ability to discover internal high-risk behavior in real time.

In terms of technical results, the Smart Policy platform receives more than 50,000 business transaction requests every day, with a response time of about 8 milliseconds (latest figure). In rule and model orchestration scenarios the response time is under 3 seconds, and about 18 million records are processed in batch every day. The real-time computing platform handles an average of 2.7 TB of data per day, a fivefold increase since the beginning of the year. Built as a platform, the system can flexibly orchestrate expert rules and machine learning models, and the machine learning model service is invoked more than 20,000 times a day.

Company profile: Zhongyuan Bank is the only provincial-level legal-person bank in Henan Province. As of June 30, 2021, its total assets stood at 753.002 billion yuan. While earning many honors, Zhongyuan Bank has kept improving, and its position in well-known rankings such as The Banker and Fortune has risen compared with previous years.
