Curve block storage has been running in our production environment for nearly three years and has withstood a wide range of abnormal and extreme scenarios. Its performance and stability have exceeded the expectations of our core businesses.
NetEase Cloud Music Background
NetEase Cloud Music is one of the leading online music platforms in China, providing an interactive content community for music lovers. It has built a large, dynamic, robust, and fast-growing business, offering users community-centric online music services and social entertainment services. Its flagship products include "NetEase Cloud Music" and ancillary social entertainment products such as "LOOK Live", "Sound Wave", and "Sound Street", which enable music lovers to discover, enjoy, share, and independently create music and music-derived content through technology-driven tools, and to interact with one another.
Cloud Music Cloud Disk Business Background
The Cloud Music services that use cloud disks are mainly Java applications such as the main site, UGC, and the music library. The main site is the core business of Cloud Music and requires the highest level of SLA guarantee (annual availability >= 99.99%). Delivering a stable experience to hundreds of millions of users has always been a major challenge for us. Before 2019, Cloud Music mainly used Ceph cloud disks. As is well known, Ceph has performance defects in large-scale scenarios, and it is difficult to guarantee that cloud disk IO latency stays unaffected under various abnormal conditions (bad or slow disks, storage machine downtime, storage network congestion, and so on). We invested a great deal of manpower in optimizing the Ceph cloud disk IO jitter problem, but only alleviated it slightly and could not solve it completely; we likewise invested heavily in analyzing and optimizing the performance problems, but the results still fell short of expectations. We therefore started a project to evaluate the Curve block storage distributed storage system.
Introduction to Curve Block Storage
Curve block storage adapts well to mainstream cloud computing platforms and offers high performance, easy operation and maintenance, and stable, jitter-free operation. In practice, we integrate Curve block storage with Cinder as the cloud disk backend, with Nova as the system disk of cloud hosts, and with Glance as the image storage backend. When creating a cloud host, Nova clones a new volume through the Python SDK provided by Curve block storage and uses it as the system disk. When creating a cloud disk, Cinder uses the Python SDK to create an empty volume, or to clone a new volume from an existing volume snapshot; the volume can then be attached to a cloud host as a cloud disk. The cloud host uses Libvirt as the virtualization control service and QEMU/KVM as the virtualization engine. Curve block storage provides a driver library for Libvirt/QEMU; once it is compiled in, a Curve volume can be used directly as remote storage, with no need to mount the volume locally.
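As a rough illustration of the volume workflow described above, here is a minimal sketch in the style of Curve's Python bindings. The module, class, and method names (curvefs, CBDClient, Init, Create, and the Clone helper) and their signatures are assumptions for illustration, not the authoritative SDK API:

```python
# Illustrative sketch only: the names and signatures below are assumptions,
# not the authoritative Curve Python SDK API.
from curvefs import CBDClient  # assumed module name of the Python bindings

GiB = 1024 * 1024 * 1024

client = CBDClient()
client.Init("/etc/curve/client.conf")  # client config: MDS addresses, etc.

# Cinder-style flow: create an empty 50 GiB volume to attach as a cloud disk.
client.Create("/cloud-disks/vol-demo", "cinder", 50 * GiB)

# Nova-style flow: clone a system disk from an existing image volume/snapshot
# (hypothetical helper; in Curve, clone requests are served by the
# snapshot/clone service rather than the client alone).
client.Clone("/images/image-base", "/system-disks/vm-001-sys", "nova")

client.UnInit()
```

In production these calls are issued by the Cinder and Nova drivers rather than by hand; the sketch only shows the shape of the two flows.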
Why Choose Curve
- Business side
i. In our Cloud Music application scenarios, Ceph cloud disks have two major pain points:
Poor performance: single-volume performance is poor (mainly high IO latency, low IOPS, and susceptibility to other high-load volumes in the cluster), so Ceph cloud disks could only serve as system disks or as data disks for application logs; they could not support middleware services.
IO jitter: in our observation, once IO latency exceeds 2s, disk util can reach 100% (a sketch of how util is derived appears at the end of this section), triggering widespread business alarms and request backlogs, and in severe cases an avalanche effect. Over the previous two years, Ceph cloud disk IO jitter occurred very frequently (essentially every month) and each incident lasted minutes, so many core applications switched to local storage to avoid such problems.
ii. Advantages of Curve cloud disks:
Jitter: since switching to Curve cloud disks, disk IO util monitoring has never raised a 100% alarm caused by the distributed storage system. Business stability has improved greatly, and core businesses have gradually migrated back to Curve cloud disks (after all, the high space utilization, reliability, portability, and rapid recovery of cloud disks also matter greatly to the business).
Performance: on the same hardware, a single Curve volume delivers more than twice the performance of a Ceph volume, with much lower latency. For the detailed comparison, see the performance figure below:
- Operation and maintenance side
i. In our Cloud Music operation and maintenance scenarios, Ceph's main pain points are as follows:
Service upgrade: common scenarios that require client upgrades include bug fixes, new features, and version upgrades. We once hit a Ceph community bug in which the 32-bit sequence number of the messaging module overflowed, causing IO to hang; both the client and the server had to be updated to fix it. Updating the client offers two options: restart the QEMU process of each cloud host, or live-migrate the cloud hosts. Both are feasible for a small number of cloud hosts, but performing them on thousands of cloud hosts is clearly impractical and unacceptable to the business. In addition, server-side upgrades cause some IO jitter while OSD processes restart, so they must run during off-peak hours and the business has to temporarily silence disk util alarms.
Performance: operators care mainly about the overall performance of the storage cluster. If a cluster's total capacity does not match its total performance, performance can run out while capacity remains: either fewer volumes are created and capacity is wasted, or volumes keep being created at the cost of single-volume IO latency and throughput. Moreover, once a Ceph cluster holds a certain number of volumes, overall cluster performance gradually declines as more volumes are added, further hurting single-volume performance.
Algorithm: due to the limitations of the CRUSH algorithm, data distribution across Ceph OSDs is very uneven and space is wasted badly. In our observation, the gap between the highest and lowest OSD space usage can reach 50%, so data balancing must be run frequently; balancing in turn triggers large-scale data migration and IO jitter, and it still cannot fully resolve the imbalance in OSD capacity usage.
IO jitter: triggered by bad disk replacement, node downtime, high IO load, capacity expansion (within existing pools, since adding too many pools complicates OpenStack maintenance), data balancing, NIC packet loss, slow disks, and so on.
ii. In contrast, Curve has significant advantages in all of these areas:
Service upgrade: the client supports hot upgrade; the operation requires neither restarting nor migrating the QEMU process, and its millisecond-level impact is almost imperceptible to the business inside the cloud host. For the hot upgrade architecture design, see ①. When the Curve server side is upgraded, thanks to the quorum-based Raft consensus protocol, upgrading one replica domain at a time keeps the impact on business IO within seconds; IO latency does not exceed 2s and util never reaches 100%.
Performance: with the same capacity, a Curve cluster can host more volumes while maintaining stable performance.
Algorithm: data placement in Curve is handled by the centralized MDS service, which guarantees a very high degree of balance; the difference between the highest and lowest chunkserver space utilization stays within 10%, so no data balancing operations are needed.
IO jitter: in the failure scenarios where Ceph cloud disks are prone to IO jitter, Curve cloud disks remain far more stable. The figure below compares Curve and Ceph in these scenarios:
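Since "disk util 100%" alarms come up repeatedly above, here is a minimal sketch of how that metric can be derived on Linux, assuming the standard /proc/diskstats layout (field 13, 1-indexed, is the milliseconds the device spent doing IO). This mirrors what iostat reports as %util; it is illustrative monitoring code, not part of Curve or of our production tooling:

```python
import time

def io_ticks_ms(device: str) -> int:
    """Milliseconds spent doing IO (field 13 of /proc/diskstats, 1-indexed)."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

def disk_util_percent(device: str, interval_s: float = 1.0) -> float:
    """Approximate iostat-style %util: share of the interval spent doing IO."""
    before = io_ticks_ms(device)
    time.sleep(interval_s)
    after = io_ticks_ms(device)
    return min(100.0, (after - before) / (interval_s * 1000.0) * 100.0)

if __name__ == "__main__":
    # "vdb" is an example name for a virtio cloud disk inside a cloud host.
    print(f"vdb util: {disk_util_percent('vdb'):.1f}%")
```

When this value stays pinned at 100%, the device is saturated for the whole sampling interval, which is exactly the alarm condition described above.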
Results of Adopting Curve
Curve block storage has now run in our production environment for nearly three years and has withstood many abnormal and extreme scenarios, with performance and stability exceeding the expectations of our core businesses. Common failure scenarios cause no obvious IO jitter, and client version upgrades have not affected normal business operation, which fully proves that our choice back then was correct. We would also like to thank the Curve team for their help throughout our adoption. At present, Cloud Music uses Curve block storage for both the cloud disks and the system disks of cloud hosts. System disks usually have a fixed capacity of 40GB or 60GB; cloud disks range from a minimum of 50GB to a maximum of 4TB (a soft limit; Curve can actually create petabyte-scale volumes).
Follow-up Planning
On the Curve block storage side, our plans include:
- Explore cloud-native middleware scenarios based on Curve block storage, for example running suitably adapted Redis, Kafka, and other message queue services on Curve block storage volumes to shorten failover time.
- Launch a cloud-native database based on CurveBS + PolarFS + MySQL.
- Switch the remaining cloud hosts that still use Ceph cloud disks or local storage to Curve block storage volumes.
Meanwhile, the Curve team is working hard on a shared file storage service. NetEase's internal OpenStack-based private cloud 2.0 platform is gradually evolving into a Kubernetes-based 3.0 platform, making the demand for ReadWriteMany PVC volumes increasingly urgent. To meet it, the Curve team has developed the Curve distributed shared file system, which can store data on a Curve block storage backend or on an S3-compatible object storage service; we plan to adopt it as soon as it is available.
References:
① https://github.com/opencurve/curve/blob/master/docs/cn/nebd.md
GitHub: https://github.com/opencurve/curve
WeChat group: add the group assistant (WeChat ID: OpenCurve_bot) to join.