Existing data architectures struggle to support modern applications

With the rapid rise of the cloud computing industry, organizations in every sector are pursuing cloud-based business innovation and modernizing their IT architectures. The reliability, flexibility, and cost-effectiveness of cloud computing have led many companies to make the cloud part of their long-term development strategy. Modern applications include both upgraded versions of existing applications and new applications built on new technologies and models. They help companies compete in an increasingly complex market and lead through advanced models, data insights, and application innovation. Many companies therefore hope to change how they design, build, and manage applications by adopting modern development practices, improving agility and accelerating their own innovation. As cloud-native, container, microservice, and serverless development takes hold, every industry is turning its attention to data architecture. After all, if applications built on microservices and serverless are the engine, data is what really powers them.

The challenges of modern application development are also challenges for data architecture innovation

Modern applications place higher demands on scale, availability, and performance.


Modern applications must not only cope with a rapidly growing number of users but also support an ever-growing number and variety of workloads. This is the first challenge of modern application development: greater scalability.

Take gaming as an example. Today's top-ranked mass-market games already exceed 100 million daily active users, and in the future, applications with more than a million users will be the norm. Now imagine the metaverse, which has recently become a hot topic: a global metaverse application could have several times, or even dozens of times, as many users. Concurrency comparable to Amazon's Black Friday sales would become routine, and in a world where everyone interacts in the metaverse, the back-end systems face extremely high concurrency requirements. This is not speculation but a reasonable prediction based on current facts. The first problem to solve, therefore, is concurrency at a much larger scale.

The second challenge of modern application development is how to store massive amounts of data and then process it in real time and intelligently.

Today's data follows an 80/20 rule: structured data accounts for about 20% and unstructured data for about 80%. "Microsoft Flight Simulator", which simulates real mountains, roads, and clouds, generates more than 2.5 PB (2.5 × 10^6 GB) of structured data, and the amount of data a full-fledged metaverse would require is at least several orders of magnitude larger.

According to IDC's latest report, unstructured data accounts for more than 90% of existing data, and that share will keep rising as new software appears. Unstructured data, which comes in many formats and standards, is technically harder to store and analyze than structured data, and traditional data architectures will struggle to handle it at this volume.

Modern applications must also address performance and latency. New applications will increasingly serve users all over the world, which makes latency requirements extremely strict: in games, 10 ms of latency can already be unacceptable, and some games even require latency close to that of in-memory access. Delivering very high bandwidth and fast transmission requires communications infrastructure built out around the world.

While pursuing high concurrency and low latency, companies must also weigh overall quality and cost. The people and resources needed to build, operate, and maintain an application at this scale may be more than an ordinary enterprise can afford, so cost matters as much as quality.

In summary, modern applications must process at least terabytes to petabytes of structured data and several times as much unstructured data, support millions of users around the world, and handle millions of requests per second with extremely low latency.

For unstructured data, many companies are now building cloud data lakes on cloud storage that scales to exabytes, such as Amazon S3, and processing and analyzing that data with cloud-native analytics tools. For structured data, the following shortcomings still need to be addressed:

  • Enterprises locked into traditional commercial databases find it hard to innovate. Commercial databases are expensive and come with proprietary technology and licensing terms that involve frequent audits. Although more and more companies are turning to open-source databases such as MySQL and PostgreSQL, they still need commercial-grade performance.
  • Specific scenarios go unserved. As application scenarios multiply, each application has its own requirements. Developers increasingly build applications with microservice architectures and choose from a new generation of relational and non-relational databases. However, the tightly coupled data structures of relational databases make distributed scale-out difficult, while non-relational databases often lack transaction support and are weaker at complex queries.
  • The traditional database operations model still consumes effort and money. Operations and maintenance work is time-consuming and produces little business value, yet enterprises must keep investing in it.

What kind of data architecture do modern applications need?

Because existing data architectures struggle to support modern applications, a change in data architecture is imperative. The new architecture must solve the problems described above: it needs greater scalability, the ability to handle diverse forms of data, stronger data processing capability, and lower latency, and it must come with a practical path and tools for implementation.

Related technical solutions and innovation

At present, the strongest technology combination in the IT industry may be "cloud computing + artificial intelligence": cloud computing addresses scalability, data storage, and performance, while AI greatly improves the efficiency of data analysis and processing.

Cloud computing can supply near-unlimited capacity for the peak demands of modern applications and optimal resource efficiency during steady operation. Serverless, one of the cloud computing models, can in theory scale an application automatically from zero to any peak demand, which makes it especially well suited to solving scalability problems.

The advantage of serverless architecture is that resources are allocated on demand: an application does not hold resources continuously but is deployed and started only when a request arrives or an event occurs, which avoids wasted cost. Serverless applications also provide high availability natively and cope better with sudden traffic spikes. When the database is serverless as well, it can scale capacity automatically, be billed pay-as-you-go, reduce spending, and further free teams from database administration and operations. Amazon DynamoDB, launched by Amazon in 2012, is a serverless database.
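To make "deployed only when a request arrives" concrete, here is a minimal Python sketch of a scale-to-zero policy. The function names, the per-instance capacity, and the load profile are hypothetical; real serverless platforms use their own, more sophisticated scaling logic.

```python
import math


def instances_needed(requests_per_second: float, capacity_per_instance: float) -> int:
    """Instances a hypothetical on-demand platform would run for the current load.

    Scales to zero when idle; otherwise rounds up so capacity always covers load.
    """
    if requests_per_second <= 0:
        return 0  # no requests or events: nothing runs, nothing is billed
    return math.ceil(requests_per_second / capacity_per_instance)


def simulate(load_profile, capacity_per_instance):
    """Instance count chosen at each step of a load profile (requests per second)."""
    return [instances_needed(rps, capacity_per_instance) for rps in load_profile]


if __name__ == "__main__":
    # Hypothetical load: idle, ramp-up, sudden spike, back to idle.
    profile = [0, 50, 400, 5000, 120, 0]
    print(simulate(profile, capacity_per_instance=100))  # [0, 1, 4, 50, 2, 0]
```

Capacity follows the load in both directions, including down to zero, which is where the pay-as-you-go saving comes from.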

In 2007, Amazon published its landmark paper on key-value storage, "Dynamo: Amazon's Highly Available Key-value Store". The paper addressed the core requirement of an "always-on" user experience by improving the availability, scalability, and performance of the database. It is considered a pioneering work of NoSQL and inspired a whole series of distributed NoSQL databases. Amazon DynamoDB is the direct heir of the Dynamo concept and powers a new generation of high-performance, internet-scale applications that traditional databases cannot support.
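A central idea in the Dynamo paper is partitioning data with consistent hashing: nodes and keys are hashed onto a ring, and each key is stored on the next N distinct nodes clockwise (its "preference list"). The Python sketch below illustrates only that partitioning idea under simplifying assumptions (fixed virtual-node count, no failure handling); the class and method names are hypothetical, and it is not a description of how DynamoDB is implemented internally.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Hash a key onto the ring using 128-bit MD5, as the Dynamo paper describes."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Minimal consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node owns several positions ("virtual nodes") on the ring.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._node_count = len(set(nodes))

    def preference_list(self, key: str, n: int = 3):
        """First n distinct physical nodes found walking clockwise from the key."""
        n = min(n, self._node_count)
        i = bisect.bisect(self._positions, ring_position(key))
        result = []
        while len(result) < n:
            node = self._ring[i % len(self._ring)][1]  # wrap around the ring
            if node not in result:
                result.append(node)
            i += 1
        return result


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
    # Three distinct nodes that would hold replicas of this key.
    print(ring.preference_list("user#42"))
```

Adding a node only claims the ring ranges just before its own virtual nodes, so every existing key either stays where it was or moves to the new node. That property is what lets a Dynamo-style store scale out incrementally.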

Led by serverless databases, cloud databases are developing and maturing rapidly, offering better accessibility, high availability, high scalability, and portability. They also lower the difficulty and cost of deployment, so they do not impose a heavy burden on the enterprise.

When facing large-scale data, traditional database components still struggle to recognize different workload types and have weak automated operations capabilities. Machine learning algorithms can analyze large numbers of data records and flag outliers and abnormal patterns. They can also perform patching, tuning, backups, and upgrades automatically and continuously without manual intervention, minimizing human error and malicious behavior and keeping the database running safely and efficiently. Amazon DevOps Guru for RDS, recently announced by Amazon Cloud Technology at re:Invent, can help detect database problems, perform root-cause analysis, recommend changes, and even repair problems automatically.
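As a toy illustration of "flagging outliers" in database telemetry, the Python sketch below marks latency samples using a modified z-score (median and median absolute deviation). This simple statistical rule is a stand-in chosen for the example, not the machine-learning method used by Amazon DevOps Guru for RDS; the function name, the 3.5 threshold, and the sample data are assumptions.

```python
import statistics


def flag_anomalies(samples, threshold=3.5):
    """Indices of samples whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single large spike does not mask itself the way it can with
    a mean/standard-deviation z-score.
    """
    median = statistics.median(samples)
    mad = statistics.median([abs(x - median) for x in samples])
    if mad == 0:
        return []  # at least half the samples equal the median: no spread to compare
    return [
        i for i, x in enumerate(samples)
        if 0.6745 * abs(x - median) / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical query latencies in milliseconds, one sample per minute.
    latencies_ms = [5, 6, 5, 7, 6, 5, 6, 95, 6, 5]
    print(flag_anomalies(latencies_ms))  # [7]
```

A production system would learn baselines per metric and per time of day rather than use one fixed threshold.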

Modern applications ultimately serve the whole world, and many companies are expanding globally. In this process, globally distributed application systems have become the first choice. The nodes of a distributed system are interconnected over a communication network, which eases communication, enables resource sharing, and speeds up computation. However, this also increases the operational burden on enterprises and raises data-transmission security concerns, so automated and secure deployment is essential.

Every technology choice involves trade-offs, and it is hard for any one product to excel at performance, functionality, and usability all at once. Traditional database vendors' "one database for everything" approach can no longer meet customers' needs. Building different types of databases for different purposes and usage scenarios, that is, using purpose-built databases, is at the core of the new data architecture. Purpose-built databases can fit applications of different scales, prioritize the performance characteristics each application needs most, and greatly improve usability.

How can the architecture be modernized?

Put simply, architecture modernization means that enterprises use a modern data architecture to break free of the constraints of traditional databases and rely on purpose-built tools to modernize their infrastructure. This is not easy, and it depends to a large extent on the capabilities of the vendor.

According to Gartner's 2020 Magic Quadrant for Cloud Database Management Systems, Amazon Cloud Technology remains a Leader in innovation. Let's therefore take Amazon Cloud Technology as an example and look at how it contributes to the data-driven transformation of enterprises.

Three key features and two key supports

First, Amazon Cloud Technology has built serverless databases that scale elastically and further simplify how enterprises create, maintain, and scale databases.

Amazon Cloud Technology offers five serverless databases: Amazon Aurora, Amazon DynamoDB, Amazon Timestream (a time-series database service), Amazon Keyspaces (a managed database service compatible with Apache Cassandra), and Amazon QLDB (a fully managed ledger database). Amazon Aurora Serverless has now reached v2, which can scale database workloads from hundreds of transactions to hundreds of thousands of transactions in a fraction of a second and can save up to 90% compared with provisioning capacity for peak load.
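Where does a figure like "up to 90%" come from? The back-of-the-envelope Python sketch below compares provisioning for peak, where every hour is billed at peak size, with paying only for the capacity each hour actually uses. The load profile, the function name, and the assumption that both billing models share the same unit price (adjustable via `on_demand_premium`) are hypothetical, not Aurora Serverless v2 pricing.

```python
def peak_vs_on_demand(hourly_units, on_demand_premium=1.0):
    """Compare capacity-hours billed when provisioning for peak vs. paying per use.

    hourly_units: capacity units actually needed in each hour (hypothetical).
    on_demand_premium: price of an on-demand unit-hour relative to a provisioned one.
    Returns (provisioned_cost, on_demand_cost, savings_fraction).
    """
    provisioned = max(hourly_units) * len(hourly_units)  # peak size, every hour
    on_demand = sum(hourly_units) * on_demand_premium    # only what each hour used
    return provisioned, on_demand, 1 - on_demand / provisioned


if __name__ == "__main__":
    # Hypothetical day: 23 quiet hours at 5 units and one 100-unit spike.
    day = [5] * 23 + [100]
    provisioned, on_demand, savings = peak_vs_on_demand(day)
    print(provisioned, on_demand, f"{savings:.1%}")  # 2400 215.0 91.0%
```

Savings shrink as the load flattens or as the on-demand premium grows; a workload that runs at peak all day saves nothing.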

How do serverless databases perform in practice? Huami Technology's Health Cloud is a typical case. As of February 2, 2021, Huami's smart wearables had recorded a cumulative 151 trillion steps, 12.8 billion nights of sleep, and 120.8 billion hours of heart-rate data. Huami Health Cloud must collect and store terabytes of data every day, and beyond storing this volume it must guarantee very high data security, stability, and low-latency responses. To meet these requirements, Huami Health Cloud uses Amazon DynamoDB as its core database for users' health and exercise data. Amazon DynamoDB delivers consistent single-digit-millisecond response times at any scale and supports applications with virtually unlimited throughput and storage, meeting Huami Health Cloud's storage needs. In addition, because DynamoDB is serverless, users do not need to provision, patch, or manage any servers, or install, maintain, or run any software.

Huami Technology has now moved fully onto Amazon Cloud Technology. Zhang Ji, Vice President of Big Data and Cloud Platform at Huami Technology, said: "Huami Health Cloud's data storage and processing have clearly separated hot and cold data tiers and pronounced peaks and valleys in data access. This lets us choose different services for different needs to balance performance and cost." Huami now uses Amazon DynamoDB to store core data; Amazon Simple Storage Service (Amazon S3) to store cold data, logs, and backups; Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and Amazon Managed Streaming for Apache Kafka (Amazon MSK) for data synchronization; Amazon Lambda for data migration and dumping; and Amazon Kinesis and Amazon EMR for big data analysis. Compared with before the move to Amazon Cloud Technology, Huami Health Cloud has reduced P0 and P1 failures by about 20% and recovery time by about 30%, increased overall service availability by 0.25%, and reached 99.99% system availability.
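The hot/cold split Zhang Ji describes can be read as a simple tiering rule. The Python sketch below routes records by age and type; the 30-day window, the record kinds, the tier labels, and the function name are hypothetical illustrations, not Huami's actual policy.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cut-off: core records younger than this are treated as hot.
HOT_WINDOW = timedelta(days=30)


def choose_store(record_time: datetime, now: datetime, kind: str = "core") -> str:
    """Return a (hypothetical) storage tier label for a record."""
    if kind in ("log", "backup"):
        return "s3"        # logs and backups go straight to object storage
    if now - record_time <= HOT_WINDOW:
        return "dynamodb"  # recent core data: low-latency key-value store
    return "s3"            # older core data is archived as cold data


if __name__ == "__main__":
    now = datetime(2021, 12, 1, tzinfo=timezone.utc)
    print(choose_store(now - timedelta(days=2), now))               # dynamodb
    print(choose_store(now - timedelta(days=90), now))              # s3
    print(choose_store(now - timedelta(hours=1), now, kind="log"))  # s3
```

Keeping only recent, frequently read records in the low-latency store is what lets such a design balance performance against cost.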

In addition, Jointown replaced its traditional MySQL databases with Amazon Aurora, improving overall database performance fivefold and reducing TCO by 50%.

Second, to deliver purpose-built databases, Amazon Cloud Technology now offers more than a dozen purpose-built database services covering relational, key-value, document, in-memory, graph, time-series, wide-column, and ledger data. Each has its own strengths and suits different application scenarios.

Among them, Amazon MemoryDB for Redis is a Redis-compatible, durable in-memory database service. Built for modern applications with microservice architectures, it can serve as a high-performance primary database for microservices, so enterprises no longer need to manage a cache and a persistent database separately.

Amazon DocumentDB is a fast, scalable, highly available, fully managed document database service that supports MongoDB workloads. As a document database, it simplifies storing, querying, and indexing JSON data. Developers can keep using the MongoDB application code, drivers, and tools they use today to run, manage, and scale workloads on Amazon DocumentDB, gaining performance, scalability, and availability without having to manage the underlying infrastructure.

Amazon DynamoDB is a key-value database service for massive data volumes and large mixed workloads. According to Amazon, DynamoDB supports applications with virtually unlimited throughput and storage and provides consistent single-digit-millisecond response times at any scale. This makes it well suited to gaming, ad tech, mobile internet, and other applications that need low-latency data access at any scale. Huya used Amazon DynamoDB's automatic scaling to absorb sudden traffic surges of more than 10 times normal levels.

NoSQL databases often take a brute-force approach, achieving fast access through heavily redundant storage plus indexes, which can waste storage. Amazon DynamoDB Standard-Infrequent Access (DynamoDB Standard-IA), officially announced at the Amazon Cloud Technology re:Invent conference, can reduce storage costs by up to 60% while maintaining the same performance, durability, and scalability.
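Standard-IA trades a lower storage price for higher request prices, so whether it pays off depends on how often a table is accessed. The Python sketch below checks that trade-off with simple arithmetic. The price ratios (storage 60% cheaper, matching the "up to 60%" figure; requests 25% more expensive) and the class labels are assumptions for illustration, not published AWS prices.

```python
# Hypothetical relative prices, NOT actual AWS pricing: Standard-IA storage is set
# 60% cheaper than Standard, and its requests 25% more expensive.
PRICES = {
    "STANDARD": {"per_gb_month": 1.00, "per_million_requests": 1.00},
    "STANDARD_IA": {"per_gb_month": 0.40, "per_million_requests": 1.25},
}


def monthly_cost(table_class: str, stored_gb: float, million_requests: float) -> float:
    """Storage cost plus request cost for one month under the assumed prices."""
    p = PRICES[table_class]
    return stored_gb * p["per_gb_month"] + million_requests * p["per_million_requests"]


def cheaper_class(stored_gb: float, million_requests: float) -> str:
    """Pick the table class with the lower assumed monthly cost (ties go to STANDARD)."""
    return min(PRICES, key=lambda c: monthly_cost(c, stored_gb, million_requests))


if __name__ == "__main__":
    print(cheaper_class(1000, 100))  # large, rarely read table -> STANDARD_IA
    print(cheaper_class(10, 1000))   # small, heavily accessed table -> STANDARD
```

Under these assumed ratios, Standard-IA is cheaper whenever monthly requests (in millions) stay below 2.4 times the stored gigabytes, because 0.6 × GB of storage savings must outweigh 0.25 × millions of requests in extra request cost.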

Third, Amazon Cloud Technology's database services are deeply integrated with artificial intelligence. Services such as Amazon Aurora ML and Amazon Neptune ML let database developers run machine learning operations using familiar database query languages (such as SQL) without specialized machine learning expertise.

It is also worth discussing the value cloud databases offer beyond storing data for applications: unified analytics and machine learning that drive business innovation and help enterprises transform into data-driven businesses. The "smart lake house" architecture proposed by Amazon Cloud Technology is built from a set of services that let data flow seamlessly between databases, data warehouses, and analytics tools, and that let machine learning be launched directly from the database. DBAs and database engineers can therefore apply machine learning to business innovation quickly instead of spending their time learning the technology. This is the advantage of cloud databases. Qiyuan World, an AI platform company, uses the smart lake house to innovate on the cloud, achieving data integration and unified governance and accelerating the rollout and scaling of its full-lifecycle product matrix. Its streaming data processing system can now be deployed in minutes and easily handles streaming data at millions of QPS (queries per second), while batch processing run time has dropped by 80% and total operating costs by 50%.

Fourth, to support enterprises' globally distributed applications, Amazon Cloud Technology has launched Amazon Aurora Global Database, Amazon DynamoDB Global Tables, Amazon ElastiCache for Redis Global Datastore, and Amazon DocumentDB Global Clusters. With these features, enterprises can extend existing clusters globally with one click: data written locally becomes readable worldwide, with replication latency typically under one second.
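The "write locally, read globally" model can be pictured as independent regional replicas that exchange change records. The DynamoDB documentation describes a last-writer-wins rule for reconciling concurrent updates to the same item in global tables; the Python sketch below is a simplified model of that idea only. The class name is hypothetical, and using the region name to break timestamp ties is an assumption of this example, not the service's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One regional copy of a table in a simplified multi-region model."""

    region: str
    items: dict = field(default_factory=dict)  # key -> (version, value)

    def write(self, key, value, timestamp):
        """Apply a local write and return the change record to ship to other regions."""
        version = (timestamp, self.region)  # region name breaks timestamp ties
        self.apply(key, value, version)
        return key, value, version

    def apply(self, key, value, version):
        """Last-writer-wins: keep whichever version is newer."""
        current = self.items.get(key)
        if current is None or version > current[0]:
            self.items[key] = (version, value)

    def read(self, key):
        entry = self.items.get(key)
        return None if entry is None else entry[1]


if __name__ == "__main__":
    us, eu, ap = Replica("us-east-1"), Replica("eu-west-1"), Replica("ap-northeast-1")
    # Two regions update the same item concurrently, before either change replicates.
    c1 = us.write("player#1", "gold=10", timestamp=1)
    c2 = eu.write("player#1", "gold=15", timestamp=2)
    # Changes arrive in different orders in different regions.
    us.apply(*c2)
    eu.apply(*c1)
    ap.apply(*c2)
    ap.apply(*c1)
    print(us.read("player#1"), eu.read("player#1"), ap.read("player#1"))  # gold=15 gold=15 gold=15
```

Because every replica applies the same deterministic comparison, all regions converge on the same value regardless of delivery order; the cost is that the losing concurrent write is silently discarded.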

According to CAIDA statistics, Amazon Cloud Technology is one of the world's largest owners of internet bandwidth. All of its Regions, Availability Zones, and edge locations are connected by high-bandwidth, redundant fiber-optic cables that span continents and oceans, and the traffic is 100% encrypted. Its infrastructure reportedly covers 81 Availability Zones (AZs) in 25 geographic regions worldwide.

Finally, drawing up a migration plan can be a challenge for enterprises, so Amazon Cloud Technology has developed a range of migration tools. For example, Amazon Schema Conversion Tool converts database schemas, and Amazon Database Migration Service (Amazon DMS) migrates the data. Amazon DMS Fleet Advisor, newly released this year, collects and analyzes database schemas and objects, including feature metadata, schema objects, and usage metrics, and helps companies build customized migration plans by assessing how complex it is to migrate a source database to the target service on Amazon Cloud Technology. In addition, Babelfish for Amazon Aurora PostgreSQL, which has just launched globally, helps companies migrate SQL Server applications. More than 450,000 databases worldwide have reportedly been migrated to Amazon Cloud Technology.

Notably, Amazon Cloud Technology has become a strategic cloud provider for Meta. Meta will use more of Amazon Cloud Technology's compute, storage, database, and security services, run third-party partner applications on Amazon Cloud Technology, and use its compute services for artificial intelligence research and development.

In addition, almost all workloads of the hugely popular metaverse game "Fortnite", which has more than 350 million users worldwide, including 3D modeling and real-time rendering, run on Amazon Cloud Technology. Riot, the developer of League of Legends, has also deployed the game's infrastructure on Amazon Cloud Technology. 37 Interactive Entertainment has migrated part of the data for its global operations to Amazon Cloud Technology services, greatly reducing the pressure on its infrastructure, and has used Amazon Cloud Technology to quickly build a global cloud server architecture that gives players around the world a nearly uniform, smooth experience.

For these companies that are building modern applications, Amazon Cloud Technology has become an indispensable support platform.

Concluding remarks

The five concepts of serverless, AI enablement, purpose-built databases, global deployment, and smooth migration together make up what Amazon Cloud Technology calls "architecture modernization" in its modern end-to-end data strategy.

Amazon Cloud Technology's modern end-to-end data strategy is a strategic framework for future applications and a deliverable architecture designed to give enterprise development a steady source of momentum. It consists of three main elements:

  • The first is the data architecture modernization discussed above. Architecture modernization is the cornerstone of all innovation, and its most important principle is "the right tool for the job": use purpose-built tools for different scenarios, and run those tools on a professional, modern managed platform, which can save enterprises a great deal of time, money, and effort.


The other two elements, which this article does not cover in depth, are:

  • Unified data analytics. Unified analytics uses purpose-built cloud tools to integrate data, connecting all of it into a secure, well-governed, coherent system that gives enterprises flexible scalability and high performance. With real-time feedback and data in hand, enterprises can scale their services quickly.
  • Data-driven business innovation. "The key difference between a company that thrives and one that struggles to survive is whether building a data-driven organization is a top priority," said Swami Sivasubramanian, Vice President of Machine Learning at Amazon, at the Amazon Cloud Technology re:Invent global conference. Innovation is driven by needs rooted in an enterprise's own business, and model training and tuning, deployment, and management all involve innovation at the infrastructure level.

Enterprises today face serious problems of outdated infrastructure, low automation, and a lack of purpose-built tools, while heavy capital expenditure also slows their progress. Their determination to change is therefore strong: Gartner predicts that by 2024, enterprises will increase their investment in data and analytics by 40% in order to become data-driven and digital.

Going forward, Amazon Cloud Technology will expand its product portfolio further. Building on existing products, it plans to develop new offerings based on customer needs, including for specific industries such as finance, telecommunications, healthcare, and automotive. These will become important tools for enterprises' data-driven transformation and key infrastructure for building modern applications.


In the technology world, this year has seen many fields reach new heights as well as go through ups and downs. As 2021 draws to a close, we want to hear from developers in the cloud computing field, so our cloud computing developer survey, with prizes, is now open. We warmly invite everyone to take part; a range of gifts awaits!


SegmentFault思否

SegmentFault community manager - 思否小姐姐