This article is based on the talk "A spatio-temporal hyper-converged database solution for Internet of Vehicles panoramic monitoring data", given by Wang Chengming of the Shanghai New Energy Vehicle Data Platform at the Enterprise-level Cloud Native Database Best Practice Forum of the 2021 Apsara Conference (Yunqi Conference).
Wang Chengming, Technical Director of Shanghai New Energy Vehicle Public Data Collection and Monitoring Research Center
This article introduces the spatio-temporal hyper-converged database solution for Internet of Vehicles panoramic monitoring data in three parts:
1. Introduction to data center platform business
2. Platform technology architecture
3. Platform vision and goals
1. Introduction to data center platform business
First, let me briefly introduce our organization and our business. We are the new energy vehicle data center in Shanghai. Our job is to take in the data of all new energy vehicles in Shanghai, with full coverage of passenger cars, commercial vehicles, logistics vehicles, buses and other electric vehicles. At present, close to 600,000 vehicles are connected to our platform, and our data volume is close to 1.5 petabytes. The picture above shows our regulatory big data platform.
Our data collection follows the GB/T 32960 standard used for new energy vehicle safety supervision, which defines 38 static data items and more than 80 dynamic data items. The data is keyed by the vehicle VIN and falls into 8 categories, including vehicle, engine, drive motor, battery and alarm data. This is typical Internet of Things time series data, and we need to store, analyze and apply it.
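To make the shape of such time series data concrete, here is a minimal Python sketch. The class name, category labels and field names are hypothetical and chosen only for illustration; they are not taken from the 32960 standard or from the platform itself. The sketch only shows the general idea that each sample is identified by a VIN, a timestamp and a data category.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TelemetryPoint:
    """One sample reported by a vehicle (illustrative structure only)."""
    vin: str          # vehicle identification number, used as the series key
    ts: datetime      # time at which the sample was taken
    category: str     # hypothetical label, e.g. "battery" or "alarm"
    fields: dict = field(default_factory=dict)  # measured values by name

    def series_key(self) -> tuple:
        # A time series is identified here by vehicle and data category.
        return (self.vin, self.category)


def group_by_series(points):
    """Group samples into per-series lists ordered by timestamp."""
    series = {}
    for point in sorted(points, key=lambda p: p.ts):
        series.setdefault(point.series_key(), []).append(point)
    return series
```

Grouping by `(vin, category)` is one possible choice; a real system might key series differently depending on its query patterns.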
At the same time, our data comes from a diverse set of sources. We handle government-entrusted data collection and management as well as international cooperation projects. We operate a new energy vehicle data platform, an integrated hydrogen station platform, a battery traceability management platform, a renewable energy management platform, and an intelligent connected vehicle management platform. Our goal is to build the core data center into a world-class center for the automotive industry.
Based on this data, we provide data support for the formulation, implementation and post-evaluation of government policies, for example new energy vehicle promotion reports, site selection and deployment of public charging piles, evaluation of energy-saving and emission-reduction results, and retrospective analysis of safety incidents. We also provide value-added data products to the market and the industry, for example for used-car sales in the automotive aftermarket, battery recycling, and the design of insurance products. In addition, we open our data to universities and scientific research institutions, and we hope that our data can serve the industry to a greater extent.
2. Platform technology architecture
The overall data architecture is based on the open source Hadoop ecosystem, and we select different components for different scenarios along the data link. Our data is multi-source and heterogeneous, including structured data, semi-structured data, IoT time series data, file data, and so on, and we choose a storage engine according to the characteristics of each kind of data. We have built a multi-protocol data collection platform with a Netty-based data gateway, a message queue (Kafka), file log collection (Flume) and data interface APIs. Data storage includes static data (RDS), cache (Redis), hot storage (HBase), warm storage (HDFS), cold storage (OSS), and so on. On top of this, we provide data analysis and scenario-based applications based on the Spark engine.
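The mapping from data characteristics to storage engines can be sketched roughly as follows. This is only an illustration: the function name, the `kind` labels and the age thresholds are assumptions made for the example, and the talk does not state how the platform actually decides between hot, warm and cold storage.

```python
from datetime import datetime, timedelta

# Assumed thresholds, for illustration only.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def choose_store(kind: str, event_time: datetime, now: datetime) -> str:
    """Pick a storage engine name for a piece of data (hypothetical rules)."""
    if kind == "static":
        return "RDS"      # static, structured reference data
    if kind == "cache":
        return "Redis"    # short-lived lookups
    age = now - event_time
    if age <= HOT_WINDOW:
        return "HBase"    # hot storage for recent time series data
    if age <= WARM_WINDOW:
        return "HDFS"     # warm storage
    return "OSS"          # cold storage for old data
```

Even in this small form, the sketch hints at the cost of the approach: every engine added to the list is another system whose data must be synchronized, queried and expired separately.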
The picture above shows our original technical architecture, a data warehouse based on Hadoop. Much of the Internet industry currently uses a similar architecture, because it is difficult for a single engine to solve every data storage problem. In addition, this architecture cannot analyze time series data effectively. To analyze it, we have to convert the IoT time series data into structured data first, which simplifies the analysis and satisfies most of our business scenarios.
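Converting a time series record into a structured row can be pictured as a flattening step. The sketch below assumes a hypothetical nested record layout (`vin`, `ts`, and a `data` mapping of category to values); the actual record format and column naming used on the platform are not described in the talk.

```python
def flatten_record(record: dict) -> dict:
    """Flatten one nested telemetry record into a single flat row.

    The input layout is an assumption for illustration:
    {"vin": ..., "ts": ..., "data": {category: {name: value}}}
    """
    row = {"vin": record["vin"], "ts": record["ts"]}
    for category, values in record.get("data", {}).items():
        for name, value in values.items():
            # Column names are built as "<category>_<name>".
            row[f"{category}_{name}"] = value
    return row
```

Once every record has the same flat columns, the rows can be loaded into a table and queried with ordinary SQL-style analysis.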
We chose these components for different scenarios, but the resulting architecture ran into many problems and challenges, in roughly four areas. First, the technology stack is complex: integrating many components makes the stack highly complex and expensive to maintain. Second, storage is fragmented: we have to implement and maintain data synchronization mechanisms, data queries and data life cycle management across several stores, which leads to a high degree of data redundancy. Third, the development threshold is high: different technology stacks use different development languages and tools, which raises the barrier to entry and makes standardization difficult. Fourth, platform scalability is a challenge: capacity planning, resource utilization, expansion and migration are all difficult.
After comparing multiple products, we finally chose the Alibaba Cloud Lindorm platform. Our core data, which spans multiple data models (structured, semi-structured, time series, and so on), is now stored in Lindorm, and data analysis still uses Spark, through the Spark module provided by the Lindorm platform. For us, the advantages of Alibaba Cloud Lindorm are low cost, high availability, elasticity, automatic separation of hot and cold data, and low latency.
The Alibaba Cloud Lindorm system architecture provides end-to-end product integration, which greatly reduces development and maintenance costs and improves ease of use and stability. Incremental HBase data is automatically archived in Parquet format in real time, then regularly merged and cleaned for analysis by Spark, and the Spark analysis results are written back to HBase using BulkLoad. These capabilities are packaged as a product that supports general API calls and provides automatic fault tolerance, distributed scaling, monitoring and alerting, and high performance.
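The "merge and clean" step on archived increments can be illustrated with a small in-memory sketch. It is not Lindorm's implementation and uses no Lindorm, HBase or Parquet API; the function name and the tuple layout are assumptions. It only shows the general idea of keeping the newest version of each row and dropping deleted rows.

```python
def merge_increments(base: dict, increments: list) -> dict:
    """Merge incremental changes into a base snapshot (illustrative only).

    base:       row_key -> (version, value)
    increments: list of (row_key, version, value); value None marks a delete
    Returns a new snapshot holding the newest live version of each row.
    """
    merged = dict(base)
    for row_key, version, value in increments:
        current = merged.get(row_key)
        # Keep the change only if it is at least as new as what we have.
        if current is None or version >= current[0]:
            merged[row_key] = (version, value)
    # Cleaning step: drop rows whose newest version is a delete marker.
    return {key: item for key, item in merged.items() if item[1] is not None}
```

Because stale and deleted versions are removed before analysis, the analysis engine reads a compact, consistent snapshot instead of replaying every change.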
3. Platform vision and goals
Let's take a case to illustrate the value of vehicle data. We once reconstructed a traffic accident from data. We matched the vehicle's trajectory to the road network, and through data analysis we found that at a certain point in time the driver pressed both the accelerator and the brake, and then the accident occurred. Without analyzing this data, it would be difficult to determine whether the fault lay with the vehicle or with the driver's behavior. Similar situations come up often in accident identification, and this is where the value of the data shows.
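The kind of check described in this case, finding moments when both pedals are pressed, can be sketched as a scan over pedal samples. The sample layout `(ts, accelerator_pct, brake_pct)`, the function name and the threshold are hypothetical; the talk does not describe the actual analysis method or signal names.

```python
def find_pedal_conflicts(samples, threshold=0.0):
    """Return (start_ts, end_ts) intervals where both pedals are pressed.

    samples: iterable of (ts, accelerator_pct, brake_pct), sorted by ts.
    A pedal counts as pressed when its value exceeds `threshold`.
    """
    intervals = []
    start = last = None
    for ts, accel, brake in samples:
        if accel > threshold and brake > threshold:
            if start is None:
                start = ts          # a new conflict interval begins
            last = ts
        elif start is not None:
            intervals.append((start, last))  # the interval has ended
            start = None
    if start is not None:
        intervals.append((start, last))      # close an interval still open
    return intervals
```

For example, `find_pedal_conflicts([(0, 10, 0), (1, 20, 30), (2, 50, 40), (3, 0, 60), (4, 5, 5)])` returns `[(1, 2), (4, 4)]`.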
In addition, new energy vehicles still have a certain residual value when they are retired. But how do we judge that residual value? We can use vehicle data to analyze the user's driving behavior and the battery's performance. We and our partners are therefore jointly developing an app that evaluates the residual value of a car based on our data. The owner first authorizes access to the car's data as an individual, and then we can predict how many more kilometers the car can run and how much residual value the battery has, in order to serve the automotive aftermarket. This is a very typical application.
Next, let me talk about our vision and goals. We hope to build an open data ecosystem on top of an open platform for big data applications: a data center based on data from new energy vehicles and intelligent connected vehicles. We are committed to bringing valuable data and data with characteristic tags, together with our data algorithm packages, onto this platform. We hope that governments, research institutions and related industries, both upstream and downstream, can use our data and platform for mutual benefit.
As new energy vehicles become widespread, big data plays an important role in traffic safety, energy conservation and emission reduction. We hope to build this platform together with everyone, through co-construction, sharing and co-governance across all levels of traffic safety and energy, to empower the industry.