Data Governance Helps Blockchain Enter the Era of Big Data | "Super Talk Blockchain" Episode 82 Recap
Background and current problems of blockchain data governance
Data governance applies specific mechanisms, covering quality management, efficiency management, and security management, to ensure the integrity and security of data. It is not a one-off state but an ongoing process.
As blockchain gradually enters people's daily lives, it is being applied in digital government affairs, financial services, social governance, public welfare and environmental protection, judicial arbitration, and other fields.
Once the data in these fields is on the chain, it has to be analyzed and processed to extract its value. After collection, both on-chain and off-chain data flow into a data lake, which then provides data support for upper-level applications such as business analysis, large-screen displays, regulatory auditing, and business reports. These functions in turn support blockchain applications. Through this cycle, data "flows" around the data lake.
In many traditional Internet companies, data governance issues arise throughout the data lifecycle: production and collection, processing and storage, application, and destruction. Risks can appear at every one of these stages.
Integrating blockchain technology with big data governance also brings new challenges:
- High node storage cost. As the amount of data on a node keeps growing, so does the cost of storing it;
- Long data synchronization time. When a node holds a very large volume of data, a new node needs a long time to synchronize and cannot join the network quickly;
- Low node query performance. Transaction execution efficiency gradually declines as the business and data volume on the node grow;
- No big data processing. Because of the blockchain's specific chained storage structure, big data processing and complex queries cannot be performed on the chain;
- High cost of data export and development. Business analysis has to interpret the data according to each smart contract, so development is costly and slow;
- No reuse and poor scalability. When the business changes, the analysis and export of on-chain data must be redeveloped.
As blockchain business keeps developing, enterprises run their operations in an increasingly fine-grained way, and blockchain data matters more and more to them. As companies use this data as an asset to create value, their requirements for data quality, efficiency, and security keep rising.
Technical architecture and advantages of the data governance components
The data governance components are built on top of the underlying blockchain platform and fall into two layers, operation and maintenance (O&M) components and development and business components, which together govern blockchain data.
The O&M layer contains a data warehouse component (Data-Stash), which is responsible for data expansion, backup, pruning, and synchronization. The development and business layer mainly contains data export (Data-Export) and business reconciliation (Data-Reconcile). Data-Export mainly addresses complex queries, analysis, and processing of blockchain big data; Data-Reconcile provides reconciliation solutions based on blockchain data.
The data governance solution has the following advantages:
First, efficient performance with real-time synchronization and querying. The solution provides full data backup while synchronizing node data efficiently. Data-Export offers efficient real-time query capabilities and supports multi-threading and multi-active deployment to improve processing performance;
Second, extensible support for different storage media. Data can be stored in different media such as MySQL and ES (Elasticsearch), and extensible protocol interfaces at the lower level keep the set of supported media open;
Third, secure, stable, and verifiably trustworthy data services. Data is backed up from multiple nodes to ensure its integrity;
Fourth, distributed storage that supports big data analysis and queries. Based on smart contracts, the components export on-chain data to storage media suited to big data analysis and querying, provide generalized query capabilities, and support database and table sharding as well as master-slave backup;
Fifth, low-code development at almost zero cost. The components are aimed mainly at developers; we keep required coding to a minimum, so basic use of the components can be set up through simple configuration;
Sixth, a general-purpose design that makes the solution reusable. The components are designed and developed with generality in mind wherever possible, so they do not need to be redeveloped for each scenario. At the same time, we also provide some personalized configuration options.
Application scenarios and introduction of the data governance components
The advantages of the data governance components are closely tied to the scenarios in which they are used.
In O&M management scenarios, the components act as front-end data services that provide full backup, data pruning, fast synchronization, and cold data queries. In business function scenarios, they mainly support data analysis, large-screen displays, regulatory audits, and business reports. In industry application scenarios, they mainly serve digital government affairs, financial services, social governance, judicial arbitration, and similar fields.
Each data governance component is introduced in detail below.
Data-Stash data warehouse component
Data-Stash is a data warehouse component based on FISCO BCOS that mainly provides blockchain data expansion, backup, and pruning. It generates node backups by parsing the node's Binlog, which lets the node separate hot and cold data and supports pruning and fast data synchronization.
By parsing the node's Binlog, Data-Stash provides full backup of the node ledger, multi-dimensional ledger verification, trusted storage of backup data, and resumable transfer from breakpoints.
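The resumable transfer mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical, simplified `BinlogRecord` (the real FISCO BCOS Binlog format is not modeled here), records that arrive in ascending block order, and a local JSON checkpoint file; `CheckpointedBackup` and its methods are illustrative names, not Data-Stash's API. The idea is to persist the last backed-up block number after each record, so an interrupted run continues from where it stopped instead of starting over.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class BinlogRecord:
    """Hypothetical, simplified stand-in for one block parsed from a node Binlog."""
    block_number: int
    block_hash: str
    payload: str


class CheckpointedBackup:
    """Copies parsed records into a backup store and persists a checkpoint
    (the last backed-up block number) so an interrupted run can resume."""

    def __init__(self, checkpoint_path: str) -> None:
        self.checkpoint_path = checkpoint_path
        # In-memory stand-in for the backup database; a real store would be persistent.
        self.store: Dict[int, BinlogRecord] = {}

    def last_synced(self) -> int:
        """Return the last backed-up block number, or -1 if nothing is backed up."""
        if not os.path.exists(self.checkpoint_path):
            return -1
        with open(self.checkpoint_path) as f:
            return json.load(f)["last_block"]

    def _save_checkpoint(self, block_number: int) -> None:
        with open(self.checkpoint_path, "w") as f:
            json.dump({"last_block": block_number}, f)

    def sync(self, records: Iterable[BinlogRecord], limit: Optional[int] = None) -> int:
        """Back up records after the checkpoint; `limit` simulates an interruption."""
        start = self.last_synced() + 1
        copied = 0
        for rec in records:
            if rec.block_number < start:
                continue  # already backed up by an earlier run
            if limit is not None and copied >= limit:
                break  # stop early, as if the transfer had been interrupted
            self.store[rec.block_number] = rec
            self._save_checkpoint(rec.block_number)  # checkpoint after each record
            copied += 1
        return copied


# Usage: back up five records, interrupting the first run after three blocks.
demo_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
records = [BinlogRecord(n, f"hash-{n}", f"block-{n}") for n in range(5)]
backup = CheckpointedBackup(demo_path)
first_run = backup.sync(records, limit=3)  # backs up blocks 0 to 2
second_run = backup.sync(records)          # resumes at block 3
```

Checkpointing after every record keeps the sketch simple; a production tool would more likely checkpoint in batches and commit the checkpoint in the same transaction as the backed-up data.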
Data-Stash mainly has the following features:
(1) Separate cold and hot data
Over time, nodes accumulate more and more ledger data. If a node's data volume grows without limit, it will eventually exhaust the server's resources and affect the node's operation.
Data separation can be achieved with the data warehouse service. Start the Data-Stash service and import the node's Binlog into the database to back up the data. Developers can then partition the on-chain data, deleting infrequently used data from the node and keeping recent data. To keep the node running unaffected, the user needs to ensure that the node is enabled.
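The hot/cold split described above can be sketched as follows, assuming the node ledger and the backup archive are both simple dicts keyed by block height (hypothetical stand-ins for the node's storage and the Data-Stash database). `prune_cold_blocks` and `read_block` are illustrative names. In this sketch a block is removed from the node only if the archive already holds an identical copy, and reads fall back to the archive, which is one way to serve cold data queries.

```python
from typing import Dict, List, Optional


def prune_cold_blocks(node_ledger: Dict[int, str], archive: Dict[int, str],
                      keep_recent: int) -> List[int]:
    """Delete blocks older than the newest `keep_recent` blocks from the node,
    but only those the archive already holds an identical copy of."""
    if not node_ledger or keep_recent < 1:
        return []
    cutoff = max(node_ledger) - keep_recent + 1  # heights below this are "cold"
    pruned = []
    for height in sorted(node_ledger):  # sorted() copies the keys, so deleting is safe
        if height >= cutoff:
            break
        if archive.get(height) != node_ledger[height]:
            continue  # not safely backed up yet: keep it on the node
        del node_ledger[height]
        pruned.append(height)
    return pruned


def read_block(height: int, node_ledger: Dict[int, str],
               archive: Dict[int, str]) -> Optional[str]:
    """Serve hot data from the node and cold data from the archive."""
    if height in node_ledger:
        return node_ledger[height]
    return archive.get(height)
```

For example, with blocks 0 to 9 on the node, blocks 0 to 5 archived, and `keep_recent=3`, blocks 0 to 5 are pruned, while block 6 stays on the node because it has not been archived yet.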
(2) Efficient node migration
While a blockchain business is running, nodes often need to be expanded or upgraded. For example, if a server has to be taken offline or have its disk replaced after a failure, Data-Stash can quickly synchronize the node's data.
(3) Supervision, audit and traceability
Regulators need the ledger data to be complete and queryable. Because the blockchain's own ledger database may not meet this need, the data warehouse component can make a complete backup, and a relational database can be used to query the data more conveniently. To better meet regulatory requirements, a multi-dimensional verification mechanism guards against malicious tampering by nodes.
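One possible reading of "multi-dimensional verification" is sketched below: each backed-up block is checked against its own hash, against its parent's hash, and against the hashes reported by other nodes. This is an assumption for illustration, not Data-Stash's actual mechanism. SHA-256 over a simple string encoding stands in for the real block encoding and hash algorithm, and `Block`, `make_chain`, and `verify_backup` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    number: int
    parent_hash: str
    data: str
    block_hash: str = ""


def compute_hash(block: Block) -> str:
    # Assumption: SHA-256 over a simple text encoding, not the FISCO BCOS format.
    raw = f"{block.number}|{block.parent_hash}|{block.data}".encode()
    return hashlib.sha256(raw).hexdigest()


def make_chain(payloads: List[str]) -> List[Block]:
    """Build a small linked chain for demonstration."""
    chain, parent = [], "0" * 64
    for number, data in enumerate(payloads):
        block = Block(number, parent, data)
        block.block_hash = compute_hash(block)
        chain.append(block)
        parent = block.block_hash
    return chain


def verify_backup(chain: List[Block], peer_hashes: List[List[str]]) -> List[str]:
    """Check a backup along three dimensions and return a list of problems."""
    problems = []
    for i, block in enumerate(chain):
        # 1. Content: the stored hash must match the block's contents.
        if compute_hash(block) != block.block_hash:
            problems.append(f"block {block.number}: content does not match hash")
        # 2. Linkage: each block must point to the previous block's hash.
        if i > 0 and block.parent_hash != chain[i - 1].block_hash:
            problems.append(f"block {block.number}: broken parent link")
        # 3. Cross-node: a strict majority of peers must report the same hash.
        if peer_hashes:
            votes = sum(1 for peer in peer_hashes
                        if i < len(peer) and peer[i] == block.block_hash)
            if votes * 2 <= len(peer_hashes):
                problems.append(f"block {block.number}: not confirmed by peer majority")
    return problems
```

Editing a block's data without recomputing its hash is caught by the content check; rewriting the hash as well would instead break the parent link of the next block and disagree with the peers.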
Data-Export data export component
Data-Export is a data export tool, also built on the FISCO BCOS platform. Users need to write almost no code: with simple configuration, structured data can be exported to relational databases or ES for subsequent business analysis and processing.
It also supports multi-active deployment, database and table sharding, visualization of exported data, application monitoring, and other functions, so it can adapt to a variety of complex business scenarios.
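Database and table sharding needs a rule that maps each exported row to one database and one table. The sketch below assumes a simple modulo rule on block number and hypothetical `db_N` / `<table>_N` naming; `route` is an illustrative function, and Data-Export's actual sharding configuration may differ.

```python
from typing import Tuple


def route(block_number: int, db_count: int, tables_per_db: int,
          base_table: str) -> Tuple[str, str]:
    """Map a block number to a (database, table) pair by modulo sharding."""
    if db_count < 1 or tables_per_db < 1:
        raise ValueError("db_count and tables_per_db must be positive")
    slot = block_number % (db_count * tables_per_db)  # one slot per physical table
    db_index, table_index = divmod(slot, tables_per_db)
    return f"db_{db_index}", f"{base_table}_{table_index}"
```

With 2 databases of 4 tables each, block 5 lands in `db_1` / `block_detail_1`, and any 8 consecutive blocks are spread over all 8 tables. Modulo routing keeps the load even, but changing the shard count remaps existing rows, so range-based rules are a common alternative when the layout has to grow.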
Data-Export mainly has the following features:
(1) Support the export of smart contract data
Data-Export can parse and export data on contract method calls and events. The exported data is more intuitive and can be used directly for display and analysis.
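To illustrate how decoded contract events become queryable rows, the sketch below writes events into a table with one column per event parameter. It assumes the events have already been decoded from transaction receipts into dicts, and it uses Python's built-in `sqlite3` as a stand-in for MySQL. The `export_events` function, the `transfer_event` table layout, and the `Transfer` event fields are hypothetical.

```python
import sqlite3
from typing import Dict, List


def export_events(conn: sqlite3.Connection, event_name: str,
                  fields: List[str], events: List[Dict]) -> int:
    """Create a table for one event type and insert the decoded events."""
    # Table and column names come from configuration, so apply a basic check first.
    for name in [event_name, *fields]:
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
    table = f"{event_name.lower()}_event"
    columns = ", ".join(f"{field} TEXT" for field in fields)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                 f"(block_number INTEGER, tx_hash TEXT, {columns})")
    placeholders = ", ".join("?" * (len(fields) + 2))  # e.g. "?, ?, ?, ?, ?"
    rows = [(e["block_number"], e["tx_hash"], *(str(e["args"][f]) for f in fields))
            for e in events]
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


# Usage: export two hypothetical Transfer events into an in-memory database.
conn = sqlite3.connect(":memory:")
decoded = [
    {"block_number": 10, "tx_hash": "0xaa",
     "args": {"from_addr": "A", "to_addr": "B", "amount": 5}},
    {"block_number": 11, "tx_hash": "0xbb",
     "args": {"from_addr": "B", "to_addr": "C", "amount": 2}},
]
inserted = export_events(conn, "Transfer", ["from_addr", "to_addr", "amount"], decoded)
```

Once the events are rows, ordinary SQL such as filtering by `to_addr` or summing `amount` replaces contract-specific parsing code.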
(2) Complex data query and analysis
For storage, Data-Export currently supports MySQL and ES and provides extension interfaces for other media. It also supports multiple export strategies. Once the data has been exported off the chain, complex queries and further analysis can be run on it.
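The extension interface for storage media can be pictured as a small plugin contract: every backend implements the same `write` method, and configuration selects which backends receive the exported rows. In the sketch below, `ExportSink`, `TableSink`, `DocumentSink`, `register_sink`, and the `SINKS` registry are hypothetical names. The two sinks are in-memory stand-ins for a relational database such as MySQL and a document index such as ES, not clients for those systems.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ExportSink(ABC):
    """Common interface that every storage backend implements."""

    @abstractmethod
    def write(self, rows: List[dict]) -> None:
        ...


class TableSink(ExportSink):
    """In-memory stand-in for a relational target: keeps rows in order."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def write(self, rows: List[dict]) -> None:
        self.rows.extend(rows)


class DocumentSink(ExportSink):
    """In-memory stand-in for a document index: stores JSON documents by id."""

    def __init__(self, id_field: str = "tx_hash") -> None:
        self.id_field = id_field
        self.docs: Dict[str, str] = {}

    def write(self, rows: List[dict]) -> None:
        for row in rows:
            self.docs[row[self.id_field]] = json.dumps(row, sort_keys=True)


SINKS: Dict[str, Type[ExportSink]] = {"table": TableSink, "document": DocumentSink}


def register_sink(name: str, sink_cls: Type[ExportSink]) -> None:
    """Plug in a new storage medium without touching the export logic."""
    SINKS[name] = sink_cls


def export(rows: List[dict], sink_names: List[str]) -> List[ExportSink]:
    """Write the same rows to every configured sink (one possible export strategy)."""
    sinks = [SINKS[name]() for name in sink_names]
    for sink in sinks:
        sink.write(rows)
    return sinks
```

Because the export loop only depends on `ExportSink`, supporting another medium means adding one class and one `register_sink` call.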
(3) Technical architecture supporting read-write separation
With Data-Export, on-chain write operations can be separated from read operations: reads are served from the data exported off the chain, which reduces the read load on blockchain nodes and realizes a read-write separation architecture.
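A minimal sketch of read-write separation: writes go to the chain as transactions, while reads are answered from the exported copy, so queries never reach the node. `FakeChain`, `ExportedReplica`, and `ReadWriteRouter` are hypothetical in-memory stand-ins for a blockchain node, the export database, and the application's data-access layer.

```python
from typing import Dict, List, Optional, Tuple


class FakeChain:
    """Stand-in for a blockchain node: records each write as a transaction."""

    def __init__(self) -> None:
        self.txs: List[Tuple[str, int]] = []

    def send_transaction(self, key: str, value: int) -> int:
        self.txs.append((key, value))
        return len(self.txs) - 1  # index of the new transaction


class ExportedReplica:
    """Stand-in for the export database, filled by replaying chain transactions."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}
        self.exported_upto = 0  # number of transactions already exported

    def catch_up(self, chain: FakeChain) -> int:
        new_txs = chain.txs[self.exported_upto:]
        for key, value in new_txs:
            self.table[key] = value
        self.exported_upto = len(chain.txs)
        return len(new_txs)


class ReadWriteRouter:
    """Sends writes to the chain and serves reads from the exported replica."""

    def __init__(self, chain: FakeChain, replica: ExportedReplica) -> None:
        self.chain = chain
        self.replica = replica

    def write(self, key: str, value: int) -> int:
        return self.chain.send_transaction(key, value)

    def read(self, key: str) -> Optional[int]:
        # May lag behind the chain until the exporter has caught up.
        return self.replica.table.get(key)
```

The trade-off is replication lag: a value written to the chain becomes visible to readers only after the exporter has processed it, so read-after-write paths may need to wait for the export or fall back to querying the node.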
(4) Provide visualization capabilities such as monitoring
On-chain data can be exported to database tables and presented through visualization, which shows the core processes and value of the data and enables monitoring of blockchain data.
Data-Reconcile data reconciliation component
Reconciliation between traditional enterprises relies mainly on the centralized ledgers kept by each party. The blockchain's own characteristics, such as transferability, immutability, and its driving characteristics, give both parties a credible, objective basis for reconciliation.
Data-Reconcile is a blockchain-based data reconciliation component that provides a generalized reconciliation solution built on the blockchain smart contract ledger.
Data-Reconcile mainly has the following features:
(1) Support dynamic, scalable and customized development
On the one hand, Data-Reconcile provides some general models; on the other hand, it supports further customized development for different business scenarios.
(2) Flexible and configurable data reconciliation rules
Reconciliation rules can be customized and configured, and scheduling management of reconciliation tasks is provided.
(3) Pluggable and expandable reconciliation process
The component provides extension interfaces, and its functions and processes are pluggable.
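The reconciliation flow described above, with configurable rules and pluggable processing steps, can be sketched as a keyed comparison between an enterprise's off-chain records and the records read from the on-chain ledger. `ReconcileRule`, `reconcile`, and `decimal_field` are hypothetical names, and the normalizer hooks only illustrate what a pluggable step might look like; Data-Reconcile's own models and interfaces may differ.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, Dict, List

Record = Dict[str, object]
Normalizer = Callable[[Record], Record]


@dataclass
class ReconcileRule:
    """Configurable rule: the key that identifies a record and the fields that must match."""
    key_field: str
    compare_fields: List[str]
    normalizers: List[Normalizer] = field(default_factory=list)


@dataclass
class ReconcileResult:
    matched: List[str] = field(default_factory=list)
    mismatched: List[str] = field(default_factory=list)
    only_off_chain: List[str] = field(default_factory=list)
    only_on_chain: List[str] = field(default_factory=list)


def decimal_field(name: str) -> Normalizer:
    """Pluggable step: compare amounts as Decimals so "100.00" equals 100."""
    def normalize(record: Record) -> Record:
        return {**record, name: Decimal(str(record[name]))}
    return normalize


def reconcile(off_chain: List[Record], on_chain: List[Record],
              rule: ReconcileRule) -> ReconcileResult:
    def index(records: List[Record]) -> Dict[str, Record]:
        indexed = {}
        for record in records:
            for normalize in rule.normalizers:  # run the pluggable steps in order
                record = normalize(record)
            indexed[str(record[rule.key_field])] = record
        return indexed

    off, on = index(off_chain), index(on_chain)
    result = ReconcileResult()
    for key in sorted(off.keys() | on.keys()):
        if key not in on:
            result.only_off_chain.append(key)
        elif key not in off:
            result.only_on_chain.append(key)
        elif all(off[key].get(f) == on[key].get(f) for f in rule.compare_fields):
            result.matched.append(key)
        else:
            result.mismatched.append(key)
    return result
```

Keeping the rule (key and compared fields) separate from the normalizer steps means a new business scenario can often be handled by configuration plus one extra step, rather than by rewriting the comparison.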
Operation demonstrations of the three main components, the Data-Stash data warehouse, Data-Export data export, and Data-Reconcile data reconciliation, are available for each component:
WeBankBlockchain-Data-Stash data warehouse component
WeBankBlockchain-Data-Export data export component
WeBankBlockchain-Data-Reconcile data reconciliation component
"Super Talk Blockchain" is a live broadcast series launched by the FISCO BCOS open source community. Every Thursday at 8 pm, the community invites a tech geek or application pioneer to share development practices or application experience in the live broadcast room. As a regular column of the community, "Super Talk Blockchain" has held nearly a hundred sessions, covering everything from technical deep dives to industrial applications. You are welcome to nominate yourself or a friend to share in the live broadcast room. Add the community assistant on WeChat to join the group and watch the live broadcasts.