Introduction: The goal of enterprise-level storage on the cloud is to combine the characteristics of enterprise-level storage with those of the cloud, opening up new dimensions of storage and helping users innovate while keeping their business sustainable. This article is the overview chapter of the ESSD technical interpretation series. It introduces the innovations of ESSD cloud disks, which integrate the characteristics of cloud and enterprise-level storage, are centered on services, and give users a more convenient and smarter storage experience.

Preface

When it comes to enterprise storage, the keywords that come to mind are "high stability", "high performance", and "rich enterprise features". When it comes to cloud computing, everyone thinks of distinctive features such as "large scale", "global deployment", "elasticity", "service-oriented", "intelligent", "instant activation", and "pay-as-you-go". If the two are combined, what new form of storage emerges? The goal of enterprise-level storage on the cloud is to integrate the characteristics of enterprise-level storage and the cloud, open up new dimensions of storage, and help users innovate while keeping their business sustainable.

image

Taking block storage as an example, a common enterprise-level solution is the Storage Area Network (SAN), which connects storage arrays and business hosts through a dedicated network, provides unified storage management and sharing, and delivers high-performance, low-latency data access. However, SAN has drawbacks such as high cost, complex operation and maintenance, and poor scalability. These are precisely the problems that cloud technology is best at solving. To this end, Alibaba Cloud has launched enterprise-level storage on the cloud based on ESSD cloud disks, to help users meet the current needs of digital transformation and innovation.

ESSD Enterprise Cloud Disk

ESSD cloud disks provide users with highly available, highly reliable, and high-performance block-level random access services, as well as rich enterprise features such as native snapshot data protection and cross-region disaster recovery. The project started in 2016. It is built on the Pangu 2.0 distributed storage foundation, uses RDMA and a fully user-space IO path on NVMe SSDs, and draws on more than 10 years of Alibaba's self-developed distributed storage technology. ESSD debuted during Alibaba's "Double 11" shopping festival in 2017, where it performed very well under the peak traffic of core workloads such as databases and middleware. In 2018 it was rolled out on a large scale inside Alibaba and opened to some external customers, who gave very positive feedback. In 2019, the large-scale commercialization of ESSD cloud disks brought cloud disks into the microsecond era. In 2020, the entry-level ESSD PL0 specification was launched so that small and medium-sized customers could also benefit from ESSD all-flash technology. By September 2021, ESSD cloud disks were on sale in 59 availability zones, and 95% of Alibaba Cloud's top customers had chosen ESSD, making it the most popular cloud disk product.

image

As a cloud service, ESSD cloud disk provides service-oriented, secure, and intelligent operation, maintenance, and management. It shields users from the complexity of the underlying hardware and system operations, and exposes declarative open APIs so that users can easily build their upper-level business systems. ESSD cloud disk services are also deployed globally along with the cloud infrastructure, whether in a central region, a local cloud, or an edge cloud, to meet users' diverse deployment needs.

ESSD cloud disks provide data services in three major areas: highly stable, high-performance, and highly elastic data access; lightweight, real-time, and flexible native snapshot data protection; and disaster recovery and multi-active services available anytime, anywhere.

For basic data access, ESSD cloud disk provides nine 9s of reliability and five 9s of availability, end-to-end data protection, latency at the 100-microsecond level, and IOPS at the million level. It supports encryption with custom keys, online capacity expansion, and performance changes that take effect within seconds. Recent releases add the ESSD Auto PL cloud disk, whose performance scales automatically with the business load; support for the standard NVMe protocol and shared access; and dedicated clusters that meet security compliance and physical isolation requirements.
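To make the "nines" concrete, here is a small back-of-the-envelope sketch. It assumes availability is measured over a 365-day year, and that nine 9s of reliability can be read as an annual data-loss probability of 10^-9 per disk. The article does not define either metric, so both readings, and the function names, are assumptions for illustration only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (5 -> 99.999%)."""
    return SECONDS_PER_YEAR * 10.0 ** (-nines)


def expected_losses_per_year(nines: int, disk_count: int) -> float:
    """Expected number of disks losing data per year, assuming `nines` is an
    annual per-disk figure (an assumption about how the metric is defined)."""
    return disk_count * 10.0 ** (-nines)


minutes = allowed_downtime_seconds(5) / 60
print(f"five 9s of availability: about {minutes:.2f} minutes of downtime per year")

losses = expected_losses_per_year(9, 1_000_000)
print(f"nine 9s of reliability across 1,000,000 disks: {losses:.3f} expected losses per year")
```

Under these assumptions, five 9s leaves a budget of a little over five minutes of downtime per year.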

In addition to basic data access, ESSD cloud disks provide a native snapshot service that makes it easier for users to protect their data. It offers flexible snapshot policies, does not affect front-end IO read and write performance while a snapshot is being taken, and can create, roll back, and clone snapshots in seconds. It supports consistency-group snapshots across multiple cloud disks and application-consistent snapshots, provides cross-region snapshot replication, and, for cloud-native and container scenarios, can create cloud disks in batches from a snapshot and make them available for access immediately.

Beyond snapshot data protection, and to better meet users' needs for multi-region disaster recovery, ESSD cloud disk has newly launched an asynchronous replication service. Users can start with a "zero" threshold and build a remote disaster recovery architecture on the infrastructure and dedicated network lines that Alibaba Cloud has already deployed. In the future, more disaster recovery services will be offered, such as synchronous replication and cross-region multi-active deployment.

ESSD cloud disks are service-centric and combine the characteristics of cloud and enterprise-level storage to build enterprise-level storage services on the cloud. Below we take a closer look at the most recently released ESSD cloud disk products and features.

image

ESSD Auto PL: highly elastic IO

The ESSD Auto PL cloud disk was launched in response to a problem many users face: business peaks cannot be predicted accurately, so it is hard to plan performance configuration precisely. If too much performance is reserved, resources sit idle and are wasted most of the time; if too little is reserved, the business suffers when traffic surges unexpectedly. The ESSD Auto PL cloud disk is designed to resolve this dilemma. In addition to supporting a specified performance configuration, it can scale automatically with the business load. The performance of a single disk can be raised automatically to a maximum of 1 million IOPS, which gives unexpected bursts of access a safe and convenient automatic performance configuration. With automatic performance scaling turned on, users pay only for the reads and writes that actually exceed the pre-configured performance, which keeps the business running stably while minimizing the cost of resource configuration.
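The billing idea described above, paying only for IO that exceeds the pre-configured performance, can be illustrated with a minimal sketch. The function name, the per-second sampling model, and the sample values are hypothetical; the article does not describe how Alibaba Cloud actually meters burst IO, and no prices are assumed here.

```python
MAX_SINGLE_DISK_IOPS = 1_000_000  # single-disk ceiling quoted in the article


def burst_ios(iops_per_second, provisioned_iops):
    """Count IOs served above the provisioned baseline.

    `iops_per_second` is a list of per-second demand samples (hypothetical
    metering granularity). Demand above the single-disk ceiling is not served.
    """
    total = 0
    for demand in iops_per_second:
        served = min(demand, MAX_SINGLE_DISK_IOPS)
        total += max(0, served - provisioned_iops)
    return total


# Four seconds of load against a disk provisioned at 50,000 IOPS.
samples = [20_000, 45_000, 120_000, 30_000]
print(burst_ios(samples, provisioned_iops=50_000))  # 70000
```

Only the third second exceeds the baseline, so only those 70,000 extra IOs would count as burst usage in this model.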

image

As the industry's first cloud disk that decouples performance from capacity and scales performance elastically with load, ESSD Auto PL had to solve many technical challenges: how to sense changes in business load quickly, how to request and release resources dynamically on demand to support performance scaling, how to rebalance and schedule load quickly, and so on. After repeated refinement, the ESSD Auto PL cloud disk can sense and predict business load at the 10-millisecond level and complete dynamic queue scheduling and concurrency adjustment within seconds. Fine-grained partitioning of a single cloud disk lets it use the resources of the entire back-end storage cluster in a balanced way and adjust quickly and dynamically. Beyond that, we also solved two other problems to address users' concerns:

1. When the load on a large number of a user's cloud disks rises at the same time, it may exceed the performance limit of a single cluster. We handle this through real-time monitoring and forecasting of cluster capacity and performance watermarks, together with minute-level cross-cluster scheduling and balancing.

2. In multi-tenant scenarios, elastic performance increases could cause tenants to interfere with one another. We avoid this through multi-level QoS isolation and priority management, including hardware-offloaded dynamic queue distribution, IO tagging, and reordering based on estimated execution cost.
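The article does not describe the QoS mechanism in detail. As a generic illustration of per-tenant isolation, the sketch below uses a token bucket per tenant, a common rate-limiting technique that is swapped in here purely for illustration and is not claimed to be what ESSD uses. The class name, rates, and tenant names are hypothetical.

```python
class TokenBucket:
    """Per-tenant admission control: each IO consumes tokens refilled at `rate`."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second
        self.burst = burst    # bucket capacity (largest allowed burst)
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # timestamp of the previous refill, in seconds

    def allow(self, now, cost=1):
        """Refill for the elapsed time, then admit the IO if tokens suffice."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Each tenant gets its own bucket, so one tenant's burst cannot drain another's.
buckets = {"tenant-a": TokenBucket(rate=100, burst=10),
           "tenant-b": TokenBucket(rate=100, burst=10)}

admitted_a = sum(buckets["tenant-a"].allow(now=0.0) for _ in range(15))
admitted_b = sum(buckets["tenant-b"].allow(now=0.0) for _ in range(5))
print(admitted_a, admitted_b)  # 10 5
```

Tenant A's burst of 15 requests is capped at its own bucket size, while tenant B's 5 requests are all admitted, which is the isolation property the paragraph above is concerned with.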

With these technologies, we hope the ESSD Auto PL cloud disk will simplify performance configuration and help users get through business peaks smoothly.

NVMe and shared access

With the rapid development and adoption of flash memory, the storage medium is no longer the bottleneck; the software stack on top of it has become the biggest one. NVMe is a newer data access protocol designed for high-performance devices. Compared with the traditional SCSI protocol, it is simpler and lighter while offering rich extension features. With this release, ESSD cloud disks let users access data more efficiently over the NVMe protocol, and shared access to cloud disks is implemented on top of the NVMe Persistent Reservation standard.

Many mainstream commercial databases, such as Oracle RAC and SAP HANA, rely on shared disk access to achieve high availability. NVMe Persistent Reservation provides secure and lightweight support for shared access and permission management, which greatly reduces failover time. ESSD cloud disk also uses hardware offloading to reduce NVMe virtualization latency by 30%, adopts the self-developed Solar-RDMA network protocol for efficient data transmission, and can complete network multipath failover in seconds.
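To show why reservations help with failover, here is a deliberately simplified toy model of a write-exclusive reservation: hosts register a key, one host holds the reservation, and a surviving host can preempt a failed holder by presenting the holder's key. This models the idea only; it is not the NVMe command set or any real driver API, and the class, method names, and keys are hypothetical.

```python
class SharedDisk:
    """Toy model of a shared disk with a single write-exclusive reservation."""

    def __init__(self):
        self.registrants = {}  # host name -> registered reservation key
        self.holder = None     # host currently holding the reservation

    def register(self, host, key):
        self.registrants[host] = key

    def acquire(self, host):
        """Take the reservation if it is free; only registered hosts may try."""
        if host not in self.registrants:
            raise PermissionError(f"{host} is not registered")
        if self.holder is None:
            self.holder = host
        return self.holder == host

    def preempt(self, host, holder_key):
        """Take over from a (possibly failed) holder by presenting its key."""
        if host not in self.registrants:
            raise PermissionError(f"{host} is not registered")
        if self.holder is None or self.holder == host:
            return False
        if self.registrants[self.holder] != holder_key:
            return False
        del self.registrants[self.holder]  # the old holder loses its registration
        self.holder = host
        return True

    def can_write(self, host):
        """Anyone may write while no reservation exists; otherwise only the holder."""
        return self.holder is None or self.holder == host


disk = SharedDisk()
disk.register("node-a", key=0xA1)
disk.register("node-b", key=0xB2)
disk.acquire("node-a")
print(disk.can_write("node-a"), disk.can_write("node-b"))  # True False

# node-a fails; node-b takes over without waiting for node-a to release.
disk.preempt("node-b", holder_key=0xA1)
print(disk.can_write("node-a"), disk.can_write("node-b"))  # False True
```

The takeover is a single operation on the disk itself, which is why a reservation-based design can keep failover short and fence off the old writer at the same time.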

image

Lightweight, real-time, and flexible native snapshot data protection

ESSD cloud disks offer native snapshots as a convenient data protection service. In addition to adding multi-disk consistent snapshot groups and application-consistent snapshots, this release thoroughly upgrades and optimizes the snapshot experience in three respects: "light", "fast", and "elastic".

"Light": creating a snapshot does not affect IO read and write performance. Many users worry that snapshots will hurt IO performance and therefore only take them during off-peak hours. We have made many optimizations to the distributed snapshot algorithm and its implementation so that users can set those concerns aside and protect their data at any time. The measured data in the figure below shows that when consistent snapshots are created for two ESSD cloud disks under heavy write load, foreground write latency stays unchanged. We also measured snapshot performance at two other vendors, where IO latency increases by roughly one to three times.

"Fast": ESSD cloud disk snapshots can be created, rolled back, and cloned in seconds, meeting users' needs for real-time data protection and rapid DevOps orchestration.

"Elastic": with the spread of cloud-native and container technology, users want to bring up a large number of container Pods in a short time. We have made many optimizations for cloning cloud disks in batches with immediate data access, so that users can bring up thousands of Pods within minutes and start running quickly.

image

Asynchronous replication, cross-region disaster recovery

Data is the core asset of an enterprise. In the real world, disasters beyond human control will always occur, leading to large-scale data center outages and even data loss. Remote disaster recovery is therefore a universal requirement of enterprise customers. Traditional disaster recovery solutions often require users to build their own disaster recovery centers, purchase dedicated lines, and invest a great deal of manpower in operation, maintenance, testing, and verification. The cost is high and the cycle is long. The global deployment of cloud computing services naturally gives users disaster recovery capabilities anytime, anywhere. With this release, ESSD cloud disks launch an asynchronous replication service that lets users start with a "zero" threshold and perform cross-region disaster recovery on demand at any time.

In designing and implementing asynchronous replication for ESSD cloud disks, we made many innovations and optimizations to the consistency-group replication algorithm. They ensure strict write-order consistency and multiple cross-checks between the primary and secondary cloud disk groups, with no impact on the front-end read and write performance of the primary disk. On the data transmission path, only the minimal incremental data is replicated, multi-channel concurrent scheduling shortens the replication cycle, and network health is monitored in real time with automatic switchover. Users can enable asynchronous replication at any time with a few clicks in the console and pay only for actual usage.
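The idea of replicating only the minimal incremental data can be sketched as a comparison between two point-in-time views of a disk. This is a minimal illustration under simplifying assumptions (a disk modeled as a dictionary of block index to contents, one replication cycle, no consistency groups); the function names are hypothetical and this is not the ESSD replication algorithm.

```python
def changed_blocks(previous, current):
    """Return only the blocks whose contents differ since the last replicated point."""
    return {idx: data for idx, data in current.items()
            if previous.get(idx) != data}


def apply_delta(replica, delta):
    """Apply one replication cycle's delta to the secondary copy."""
    replica.update(delta)


# Point-in-time views of the primary disk (block index -> contents).
last_replicated = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
primary_now = {0: b"boot", 1: b"data-v2", 2: b"logs-v1", 3: b"new"}

replica = dict(last_replicated)  # the secondary starts at the last replicated point
delta = changed_blocks(last_replicated, primary_now)
apply_delta(replica, delta)

print(sorted(delta))           # [1, 3]
print(replica == primary_now)  # True
```

Only two of the four blocks cross the network in this cycle, which is what keeps the bandwidth cost and the replication window small.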

image

ESSD dedicated cluster

Some cloud users want their data to be physically isolated in order to meet industry regulations. ESSD dedicated clusters let users enjoy the advantages of unified operation and maintenance on the cloud and the continuous iteration of software and hardware, while also having a cluster to themselves that meets their needs for physical resource isolation and customization.

image

A new generation of high-performance ESSD PL-X cloud disk

The high performance and rich enterprise features of ESSD have won over many users. We have learned a lot from our interactions with them and keep refining and iterating to deliver a better cloud disk experience. Many users have told us they hope ESSD can go further on performance and meet their most demanding scenarios. We have been working hard in this direction, and here is some good news in advance: a new generation of high-performance ESSD PL-X cloud disk will be released for invitation-only testing.

Compared with the previous top-tier ESSD PL-3 cloud disk, the ESSD PL-X cloud disk cuts the end-to-end latency of a 4K write by 70%, to only 30 us; raises IOPS threefold, to a maximum of 3 million; and increases throughput from 4 GB/s to 15 GB/s. Compared with high-performance cloud disks from other vendors, the performance advantage of ESSD PL-X is even more pronounced.
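The relative figures above can be turned into implied baselines with a little arithmetic. The sketch below assumes "increased by 3 times" means a 3x multiplier and that the 70% reduction is relative to the PL-3 latency; both readings are assumptions, and the variable names are illustrative.

```python
plx_write_latency_us = 30.0    # 4K write latency quoted for PL-X
latency_reduction = 0.70       # quoted reduction relative to PL-3
plx_max_iops = 3_000_000       # quoted PL-X maximum IOPS
iops_multiplier = 3            # assumption: "3 times" read as a 3x multiplier
pl3_throughput_gbps = 4.0      # GB/s, quoted
plx_throughput_gbps = 15.0     # GB/s, quoted

# A 70% reduction leaves 30% of the original latency.
implied_pl3_latency_us = plx_write_latency_us / (1 - latency_reduction)
implied_pl3_iops = plx_max_iops / iops_multiplier
throughput_gain = plx_throughput_gbps / pl3_throughput_gbps

print(f"implied PL-3 4K write latency: about {implied_pl3_latency_us:.0f} us")
print(f"implied PL-3 maximum IOPS: {implied_pl3_iops:,.0f}")
print(f"throughput gain: {throughput_gain:.2f}x")
```

Under these readings the implied PL-3 baseline is about 100 us and 1 million IOPS, and the throughput gain works out to 3.75x.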

image

The ESSD PL-X cloud disk uses the latest high-speed RDMA network and persistent memory technology to deeply optimize the data path, and an innovative high-concurrency read-write consistency protocol keeps protocol serialization overhead to a minimum. Because the unit cost of persistent memory is an order of magnitude higher than that of SSD, the ESSD PL-X cloud disk combines persistent memory with NVMe SSD storage media and manages data intelligently across the two tiers, giving users the best price-performance ratio.

According to our current FIO measurements, the end-to-end latency of a 4K single-channel write on the ESSD PL-X cloud disk is only 25.44 microseconds. This latency breaks down as follows: host-side virtualization takes 10.6 us, RDMA network transmission takes 13 us, and storage back-end processing takes only 1.8 us.
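The breakdown above can be checked with simple arithmetic, which also gives a feel for what a 25.44 us latency means for a single outstanding IO. The sketch assumes "single-channel" means one IO in flight at a time, so that IOPS is roughly the reciprocal of latency; that interpretation is an assumption, not something the article states.

```python
components_us = {
    "host-side virtualization": 10.6,
    "RDMA network transmission": 13.0,
    "storage back-end processing": 1.8,
}
measured_total_us = 25.44  # end-to-end figure quoted in the article

component_sum_us = sum(components_us.values())
unattributed_us = measured_total_us - component_sum_us  # rounding or unlisted overhead

for name, us in components_us.items():
    print(f"{name}: {us} us ({us / measured_total_us:.1%} of the total)")
print(f"sum of components: {component_sum_us:.2f} us, "
      f"unattributed: {unattributed_us:.2f} us")

# With one IO in flight at a time (assumed), throughput is bounded by 1 / latency.
single_queue_iops = 1_000_000 / measured_total_us
print(f"upper bound at queue depth 1: about {single_queue_iops:,.0f} IOPS")
```

The three listed components add up to 25.4 us, leaving about 0.04 us unattributed, and the storage back end accounts for only a small share of the total; most of the time is spent in virtualization and the network.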

image

We also tested ESSD PL-X in a database scenario. We deployed MySQL 8.0.18 Community Edition on a cloud server with 32 cores and 64 GB of memory and used sysbench to stress-test several local disks and cloud disks. As the figure shows, the ESSD PL-X cloud disk outperforms the other local disks and cloud disks in both the write-only and read-only scenarios. Because the ESSD cloud disk supports 16 KB atomic writes, MySQL can turn off doublewrite, which improves performance further. We also expect to improve performance further by continuing to optimize the elastic caching algorithm for persistent memory. As the lower-right figure shows, MySQL read performance keeps rising as the hit rate of persistent memory used as a read cache increases.

image

Summary

ESSD cloud disk innovation combines the characteristics of cloud and enterprise-level storage to give users a more convenient and smarter storage experience. We believe that in the future, storage will no longer be the cumbersome "iron box" that everyone pictures. Enterprise-level storage on the cloud is service-centric; it opens up more dimensions of storage and makes storage more elastic and intelligent. The newly released ESSD cloud disk features are a big step in this direction. "Stable, secure, high-performance, and affordable intelligent new storage": we are on the way!

Original work: Alibaba Cloud storage full bow

Copyright Notice: The content of this article was contributed voluntarily by real-name registered users of Alibaba Cloud. The copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.
