Where is the next revolution in cloud-native databases?

Author | Wan Jia

Relational databases took the stage in the 1970s and became the protagonist of the database industry, which then entered a golden age. As one database expert wrote, "For a long time, relational databases were almost the whole world's choice. You could run your entire business on a single database, and you did not even need an engineer to maintain it."

"Old but stronger" relational database

For the database industry, the rise of the Internet brought great changes: data volumes grew dramatically, data types became more complex, and the demand for processing speed kept increasing. The era of big data had fully arrived.

As a result, NoSQL databases built for unstructured data emerged, including document, time series, graph, and search databases.

For a while, NoSQL databases were popular and favored by many enterprises. Everyone was eager to rebuild every system on NoSQL.

NoSQL became popular because it addressed several problems with relational databases. The first major problem is that data comes in many different schemas, and representing all of them in a relational database is clumsy, so specialized databases such as document, time series, and search databases are needed. The second is that the ACID guarantees of relational databases limit performance and scalability, so NoSQL systems relax those guarantees in order to handle large scale.

With the development of the mobile Internet and the wide adoption of big data technology, more and more new databases have appeared, yet relational databases still hold a dominant position. Why have they lasted so long? One of the main reasons is that relational databases adopt the SQL standard, a high-level, non-procedural language that connects computer science with a way of managing data that is easy for humans to understand. It remains hard to surpass.

Cloud Era: The "Evolution" of Relational Databases

With the emergence and development of cloud computing, more and more enterprises have begun to deploy databases on the cloud, and cloud databases, which provide database functions as cloud services, have emerged to meet that need. A cloud database not only reduces repetitive configuration of database parameters, but also offers rapid deployment, high scalability, high availability, portability, easy operation and maintenance, and resource isolation.

In particular, cloud-native databases are designed around concepts such as containerization, microservices, and serverless, and offer elastic scaling and global deployment. They can be accessed from multiple front ends anytime and anywhere, provide computing nodes for cloud services, and can mobilize resources flexibly and promptly to scale. Helping enterprises reduce costs and increase efficiency, they have become a new trend in the industry.

It is fair to say that the cloud era brought change to the database. On the one hand, databases are evolving toward in-memory and distributed designs, and the RDBMS itself is being challenged by NoSQL. On the other hand, relational databases have gradually exposed some problems in cloud-hosted environments.

To adapt to these changes, relational databases need to innovate and evolve. The trailblazer is Amazon Web Services, which Gartner has named "Global Cloud Computing Leader" for 11 consecutive years.

Amazon Cloud Technology launched the relational database Amazon Aurora in 2014. It is compatible with MySQL and PostgreSQL and adopts shared storage and read-write separation, which both improves database performance and solves the scalability problem. This allows traditional Internet companies to migrate to the cloud seamlessly and makes Aurora one of the most important representatives of the cloud computing era.

Combining the high performance and availability of traditional commercial databases with the simplicity and cost-effectiveness of open source databases, Amazon Aurora is the fastest-growing cloud service in the history of Amazon Cloud Technology, and it ranks eighth among the Amazon Cloud Technology global services most popular with startups.

Why is it favored by so many companies? The answer lies in Amazon Aurora's strong performance and advanced architectural design.

In terms of performance, Amazon Aurora is fully compatible with the open source engines, can deliver 5 times the throughput of standard MySQL and 3 times that of standard PostgreSQL, and supports parallel queries to accelerate OLAP workloads. In terms of high availability, it provides "AZ+1" availability (it tolerates the loss of an Availability Zone plus one additional node), and Global Database provides cross-region disaster recovery. In terms of scalability, it supports automatic scaling of up to 15 read replicas, and the storage of each database instance can grow automatically up to 128 TB. Finally, in terms of cost, it provides commercial-grade database performance at one-tenth of the cost, and storage does not need to be provisioned in advance because it is billed pay-as-you-go.

Architecturally, Amazon Aurora supports a serverless architecture. It separates computing from storage, which allows the storage layer to expand rapidly and improves data analysis capabilities. It also adopts the distinctive idea that "the log is the database", which reduces the amount of data transferred between the compute nodes and the storage layer and thereby improves performance.
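To make the "log is the database" idea more concrete, here is a minimal sketch in Python. It is not Aurora's actual implementation: the `StorageNode` class, its methods, and the record layout are all hypothetical, and the model ignores replication, ordering guarantees, and crash recovery. It only shows that the compute side can ship small log records while the storage side materializes pages from them when they are read.

```python
class StorageNode:
    """Toy storage node (hypothetical) that receives log records, never full pages."""

    def __init__(self):
        self.log = []      # append-only redo log of (page_id, key, value) records
        self.pages = {}    # pages materialized from the log
        self.applied = 0   # number of log records already applied to pages

    def append(self, page_id, key, value):
        # The compute node sends only this small record over the network.
        self.log.append((page_id, key, value))

    def read_page(self, page_id):
        # Replay any records not yet applied, then serve a copy of the page.
        for pid, key, value in self.log[self.applied:]:
            self.pages.setdefault(pid, {})[key] = value
        self.applied = len(self.log)
        return dict(self.pages.get(page_id, {}))


storage = StorageNode()
storage.append("page-1", "balance", 100)
storage.append("page-1", "balance", 80)
storage.append("page-2", "owner", "alice")
page = storage.read_page("page-1")
```

In this toy model the writer never transmits a full page; the storage node rebuilds the page state by replaying the log, which is the intuition behind the reduced data transfer described above.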

If the arrival of the cloud era drove the transformation of the database, then the combination with serverless has added fresh fuel to its development.

Amazon Cloud Technology launched Amazon Aurora Serverless v1 in 2018. It is an on-demand, auto-scaling configuration of Amazon Aurora that automatically starts up, shuts down, and scales capacity up or down according to the needs of the application, allowing developers to run databases in the cloud without managing any database instances.

Replacing Oracle with Amazon Aurora

Starting as an online bookstore, Amazon has grown over more than 20 years into a multinational e-commerce company and one of the largest online retailers in the world.

Amazon.com launched in July 1995 and began by selling books. Since then its catalog has diversified well beyond books to cover audio and video products, software, consumer electronics, household appliances, kitchenware, food, toys, maternal and child products, cosmetics, daily chemical products, sports equipment, and other categories.

As early as 2008, Amazon.com, the company's main domain, attracted at least 615 million visitors, twice the number of customers visiting Walmart's stores at the time. In addition to the main domain, Amazon runs localized websites in a number of countries, including China, Canada, the United Kingdom, France, Germany, Mexico, and Australia.

The steady increase in users and the expansion into global markets have made Amazon's e-commerce business thrive.

However, challenges came with that growth. On the one hand, Amazon has to handle extremely high traffic, especially at peak shopping periods such as the Christmas season. On the other hand, rapid business growth has led to exponential growth in data. In 2017, Amazon's e-commerce business reportedly had nearly 7,500 OLTP databases storing a total of 75 PB of data, serving more than 1,000 applications owned by more than 100 teams in the company.

Amazon had used Oracle databases for a long time, but it found that they were not only expensive but also scaled poorly as business needs grew.

Beyond poor scalability and rising costs, Amazon's e-commerce business also faced latency risks from growing data volumes and transaction rates, availability risks from legacy code and architecture, and operational risks from the time and resources spent on hardware configuration and management.

How could it overcome these challenges? This became a difficult problem for Amazon's e-commerce business.

To solve these problems, Amazon's e-commerce business decided to replace Oracle, migrating nearly 7,500 OLTP databases to Amazon RDS and Amazon Aurora. A major advantage of Amazon Aurora is that 85–90% of existing Oracle queries have a matching form in Aurora PostgreSQL, which means that converting queries to Amazon Aurora PostgreSQL is almost entirely automatic.

The completed migration not only saved 40% to 90% of operating costs but also greatly improved performance. By using the managed database services of Amazon Cloud Technology, Amazon's e-commerce business also cut its peak-scaling workload and management overhead by a factor of 10. The benefits are obvious.

While Amazon's e-commerce business valued Amazon Aurora for its performance, scalability, and low cost, Jointown valued how easily it delivers low-latency read-write separation and copes with peaks and valleys in business load.

Jointown (Jiuzhoutong) is a large enterprise group whose main business is western medicine, traditional Chinese medicine, and medical equipment. Its main customers are medical institutions, wholesale enterprises, and retail pharmacies, and it provides them with value-added services such as information and logistics. Its online B2B business grows by more than 30% every year.

Its B2B system is read-heavy: the read-write ratio is between 8:2 and 7:3, and there is often a large gap between peak and off-peak load. In addition, with self-built MySQL, the replication delay between the master database and the slave database exceeded 1 second, so read-write separation worked poorly and the load on the master database remained high.
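The read-write separation discussed here can be sketched as a small router that sends writes to one writer endpoint and spreads reads across read replicas. This is only an illustrative Python sketch: the `ReadWriteRouter` class and the endpoint names are hypothetical, and classifying statements by their first keyword is a simplifying assumption that a real system would need to refine (for example for transactions or `SELECT ... FOR UPDATE`).

```python
import itertools


class ReadWriteRouter:
    """Send writes to a single writer endpoint and spread reads over replicas."""

    def __init__(self, writer, readers):
        self.writer = writer
        # Fall back to the writer when no replicas are configured.
        self._readers = itertools.cycle(readers or [writer])

    def route(self, sql):
        # Treat only plain SELECT statements as reads; everything else
        # (INSERT, UPDATE, DELETE, DDL) goes to the writer.
        words = sql.lstrip().split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._readers)
        return self.writer


# Hypothetical endpoints and an 8:2 read-write workload like the one described.
router = ReadWriteRouter("writer", ["reader-1", "reader-2"])
workload = ["SELECT * FROM orders"] * 8 + ["INSERT INTO orders VALUES (1)"] * 2
counts = {}
for sql in workload:
    target = router.route(sql)
    counts[target] = counts.get(target, 0) + 1
```

With an 8:2 workload, only the two writes reach the writer in this sketch, which is why offloading reads matters for a read-heavy system. The approach only pays off when replicas lag the writer by very little, since stale replicas force reads back onto the master.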

To this end, Jointown uses Amazon Aurora to separate read and write traffic easily and to expand on demand. A single Amazon Aurora cluster supports up to 15 read-only nodes that can be added and removed online automatically. Overall database performance improved 5 times and TCO fell by 50%, and Jointown gained cross-Availability Zone deployment, load balancing with automatic failover, fine-grained monitoring, and on-demand automatic scaling. The load on the master database also dropped significantly. The result is an efficient balance between performance and cost: with Amazon Aurora Auto Scaling, read replicas scale on demand to meet business needs while saving server costs.
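The on-demand scaling of read replicas described above can be illustrated with a simple proportional rule. This is a hedged sketch rather than the algorithm Aurora Auto Scaling actually uses: the function name, the CPU-based metric, and the 60% target are assumptions made for illustration, and only the upper bound of 15 replicas comes from the article.

```python
import math

MAX_READ_REPLICAS = 15  # upper bound stated in the article


def desired_replicas(current, avg_cpu_percent, target_cpu_percent=60.0,
                     min_replicas=1):
    """Estimate how many replicas keep average CPU near the target.

    Hypothetical proportional rule: if the replicas run hotter than the
    target, add replicas in proportion; if cooler, remove them.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    wanted = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Clamp the result between the configured minimum and the cluster limit.
    return max(min_replicas, min(MAX_READ_REPLICAS, wanted))


peak = desired_replicas(current=2, avg_cpu_percent=90)     # scale out
capped = desired_replicas(current=10, avg_cpu_percent=120)  # hits the limit
valley = desired_replicas(current=4, avg_cpu_percent=15)    # scale in
```

Under these assumptions, a business peak grows the replica fleet and an off-peak valley shrinks it again, which is where the server cost savings come from.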

Huya Live also chose Amazon Aurora as it expanded globally. At the beginning of 2018, Huya Live launched its overseas product Nimo TV, and by the end of that year its monthly active users had reached 10 million. The product launched successfully in Southeast Asia and Latin America and entered the Spanish market in 2019.

On the back end, Huya Live uses DynamoDB to store users' dynamic information, including payments, status, and follows. Relatively static information, such as basic user information, is stored in Amazon Aurora. Amazon Aurora expands capacity automatically, and because computing and storage are separated, compute instances can be upgraded independently to maintain performance when the amount of data is large. When a fault occurs, automatic failover usually takes only about 10 seconds, with no impact on end users. Its global database capability also improves the local user experience: Huya Live deploys its databases in the Asia Pacific (Singapore) region of Amazon Cloud Technology and creates replicas in other regions to serve local users better.

In addition to rapid business growth, low-latency read-write separation, peaks and valleys in business load, global deployment, storage expansion, backup and recovery, and disaster recovery, Amazon Aurora is also suitable for scenarios that involve minimizing system downtime, recovering from accidental deletion or modification, graphical monitoring, and performance tuning.

Last year, Amazon Cloud Technology was named a cloud database leader in the Gartner report "2021 Gartner Magic Quadrant for Cloud Database Management System", the seventh consecutive year in which it has received this honor. The reason for this evaluation is the continuous innovation of Amazon Cloud Technology.

Image source: Gartner

For example, it created serverless databases that scale elastically, which further simplifies creating, maintaining, and expanding databases for customers and delivers high scalability with automatic capacity scaling. It also introduced Babelfish for Amazon Aurora PostgreSQL, which makes Amazon Aurora compatible with applications written for Microsoft SQL Server. A further example is Amazon DevOps Guru, a machine learning-powered feature that helps developers and DevOps engineers quickly detect, diagnose, and remediate a variety of database-related issues in Amazon RDS.

Recently, Gartner released its 2021 DBMS market revenue data, which shows cloud usage increasing year by year at an astonishing rate. On the one hand, growth in the DBMS market continues to accelerate: the market reached nearly 80 billion US dollars, an increase of 14.5 billion US dollars over 2020. On the other hand, the market structure has changed, and cloud platform providers now hold the leading positions. Amazon Cloud Technology's revenue in the DBMS market increased by 42.3%, almost twice the market growth rate, leaving it only 0.1% behind first place in the ranking. For the first time it surpassed the traditional database giant Oracle, which confirms the power of "cloud computing + database".

For Amazon Cloud Technology, innovation in databases goes far beyond that. It currently provides more than a dozen purpose-built database services that support eight data types: relational, key-value, document, in-memory, graph, time series, wide-column, and ledger.

Why does Amazon Cloud Technology provide so many database products? In my opinion, the answer is what Amazon CTO and VP Dr. Werner Vogels said: "Developers want their apps to be well built and to scale effectively, and for that, they need to be able to use multiple databases and data models within the same app. Rarely does one database meet the needs of multiple different application scenarios. The era of one-size-fits-all databases is over, and developers are using a large number of specialized databases to build highly distributed applications. Developers are doing what they do best: breaking complex applications into smaller pieces and then choosing the best tool to solve each problem."

Always on the road to database innovation: Amazon Aurora Serverless v2

In the database field, Amazon Cloud Technology has never stopped innovating.

Recently, Amazon Aurora Serverless v2 went live. It is said to scale instantly to support the most demanding applications, scaling database workloads to hundreds of thousands of transactions in a fraction of a second, and to deliver cost savings of up to 90% compared with provisioning for peak capacity. This arguably represents a whole new level of scalability and cost savings. Amazon Aurora Serverless v2 also provides the full Amazon Aurora feature set, including Multi-AZ support, global databases, and read replicas. It adjusts capacity in finer-grained increments and along multiple dimensions, providing the right amount of database resources for the application's needs. As a result, businesses do not need to manage database capacity and pay only for the resources their applications consume.
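The cost argument above, paying for consumed capacity instead of provisioning for the peak, can be worked through with a small calculation. The demand profile, the price, and the function names below are hypothetical and chosen only for illustration; the sketch also assumes the same price per capacity unit in both models, which may not hold for real pricing.

```python
def provisioned_cost(hourly_demand, price_per_unit_hour):
    """Cost when capacity is fixed at the peak for every hour."""
    return max(hourly_demand) * len(hourly_demand) * price_per_unit_hour


def pay_per_use_cost(hourly_demand, price_per_unit_hour, min_capacity=0.5):
    """Cost when capacity follows demand, never dropping below a floor."""
    billed = sum(max(d, min_capacity) for d in hourly_demand)
    return billed * price_per_unit_hour


# Hypothetical day: 22 quiet hours at 1 capacity unit, 2 peak hours at 16.
demand = [1] * 22 + [16] * 2
price = 1.0  # hypothetical price per capacity unit per hour

fixed = provisioned_cost(demand, price)    # 16 * 24 = 384.0
elastic = pay_per_use_cost(demand, price)  # 22 * 1 + 2 * 16 = 54.0
savings = 1 - elastic / fixed              # 0.859375
```

In this made-up profile the elastic model costs about 86% less than peak provisioning. The saving depends entirely on how spiky the workload is: a workload that runs near its peak all day would save almost nothing.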

It is also worth mentioning that Amazon Aurora Serverless v2 supports high-availability deployment and read scaling across AZs. By monitoring continuously and using the largest possible buffer pool, v2 can scale in place within seconds.

It also suits a wide variety of applications. For example, in scenarios of rapid business growth or massive multi-tenancy, an enterprise with hundreds of thousands of applications, or a software-as-a-service (SaaS) provider running a multi-tenant environment with hundreds or thousands of databases, can use Amazon Aurora Serverless v2 to manage database capacity across the whole fleet. It is also suitable for workloads with pronounced throughput fluctuations, such as games, e-commerce, and test environments, as well as for new business systems whose throughput cannot be estimated. For business systems that sit in a trough most of the time, Amazon Aurora Serverless v2 can save customers significant cost.

We are now in a period of rapid development in a new round of scientific and technological revolution: the scale of data is growing explosively, data types are becoming richer, and the application of data is deepening quickly. Data-driven approaches will bring a new wave of innovation. In the database field, serverless architecture will become one of the inevitable trends in the future development of cloud-native databases.

Together with Amazon Cloud Technology, witness the future of the database industry!
