Redis Cloud-Native in Practice
Abstract
Taking Redis as an example, this article describes the practice of Youdao's infrastructure team on the road to infrastructure containerization. It covers declarative management, the Operator working principle, container orchestration, master-slave mode, cluster mode, high-availability strategy, and cluster scaling.
Contents
- Background
- The challenge
- Declarative management
- Operator working principle
- Container Orchestration
- Master-slave mode
• Master-slave topology diagram
• Principle of Reconciliation
- Cluster mode
• Cluster topology diagram
• Principle of Reconciliation
- High availability strategy
• High availability guaranteed by Kubernetes
• High availability of Redis cluster
- Monitoring observation
- Cluster expansion
- Summary and outlook
Background
Redis is a caching service commonly used in business systems, typically for absorbing traffic peaks, data analysis, and score-based ranking. As middleware, it can also decouple systems and improve system scalability.
Deploying middleware on traditional physical machines requires operations staff to build it by hand, which takes a long time, complicates later maintenance, and cannot keep up with rapid business development.
Compared with traditional IT, cloud native helps with smooth business migration, rapid development, and stable operations, and can greatly reduce technical costs and save hardware resources.
Cloud-native middleware refers to building scalable infrastructure on technologies such as containerization, service mesh, microservices, and serverless, and continuously delivering base software for production systems, improving application availability and stability without changing functionality.
Following this trend, Youdao's infrastructure team began its practice of cloud-native middleware. Besides the Redis described in this article, it also covers Elasticsearch, ZooKeeper, and others.
The challenge
Cloud-native technology can address the slow deployment and low resource utilization of current Redis deployments. At the same time, containerized Redis clusters face some challenges:
• How to deploy Redis, a stateful service, on Kubernetes;
• How to keep the service available after a container crashes;
• How to ensure that in-memory Redis data is not lost after a container restarts;
• How to ensure that slot migration does not affect the business when nodes are scaled out horizontally;
• How to handle the cluster state after a Pod IP changes.
Declarative management
For a Redis cluster, we expect it to serve requests 24/7 without interruption and to repair itself when a failure occurs. This matches the declarative nature of the Kubernetes API.
"Declarative" means that we only submit a defined API object to declare the state we expect. Resource objects in Kubernetes can then move from the current state to the desired state without external intervention; this process is called Reconcile. For example, when we create a Deployment from a YAML file, Kubernetes automatically creates Pods according to the configuration in the YAML, mounts the specified storage volumes, and handles a series of other complex requirements.
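The idea of converging from the current state to the desired state can be illustrated with a minimal sketch. It is not the Kubernetes implementation; `ClusterState`, `reconcile`, and the Pod naming are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Observed state: names of the Pods that currently exist (hypothetical model)."""
    pods: set = field(default_factory=set)


def reconcile(desired_replicas: int, state: ClusterState, prefix: str = "redis") -> list:
    """Move the observed state toward the desired state and return the actions taken."""
    wanted = {f"{prefix}-{i}" for i in range(desired_replicas)}
    actions = []
    for name in sorted(wanted - state.pods):   # Pods that are declared but missing
        state.pods.add(name)
        actions.append(("create", name))
    for name in sorted(state.pods - wanted):   # Pods that exist but are not declared
        state.pods.remove(name)
        actions.append(("delete", name))
    return actions


state = ClusterState(pods={"redis-0"})
print(reconcile(3, state))  # the two missing Pods are created
print(reconcile(3, state))  # nothing left to do once the state has converged
```

Running the same function again after convergence produces no actions, which is the property that makes a reconcile loop safe to repeat.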
Can our Redis cluster be managed through a similar mechanism? That is, we need to define such objects and define how the service is reconciled. The Kubernetes Operator meets exactly this requirement. Put simply, an Operator consists of resource definitions and resource controllers. After working through the relationship between the cluster and the Operator, we designed the overall architecture as follows.
The Operator itself is deployed as a Deployment, and leader election is done through etcd. On the upper layer it communicates with the Kubernetes API Server, Controller Manager, and other components; on the lower layer it continuously reconciles the state of the Redis clusters.
In sentinel mode, the Redis service uses a set of sentinels deployed as a StatefulSet with persistent configuration files. The Redis servers are also deployed as a StatefulSet, and a sentinel-mode instance has one master and multiple slaves.
In cluster mode, each shard is deployed as a StatefulSet and the proxy is deployed as a Deployment. Kubernetes itself is responsible for the native Pods, StatefulSets, Services, scheduling policies, and so on.
The Redis resource definition is stored in etcd; we only need to submit the YAML configuration of the custom resource. The following creates a Redis master-slave cluster with three replicas:
apiVersion: redis.io/v1beta1
kind: RedisCluster
metadata:
  name: my-release
spec:
  size: 3
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 1000m
      memory: 1Gi
  config:
    maxclients: "10000"
Here, kind specifies the custom resource type, size is the number of replicas, resources defines the resource quota, and config corresponds to the Redis Server configuration. The definition is stored in the etcd database of Kubernetes. The subsequent resource requests and usage are handled by the Operator's Controller.
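As a rough illustration of how a Controller might turn such a definition into native resources, the sketch below maps the fields of the custom resource above onto a StatefulSet manifest expressed as a plain dictionary. The function name, the `app` label key, and the container image are assumptions; the source does not say how the Youdao Operator builds its StatefulSets.

```python
def build_statefulset(cr: dict) -> dict:
    """Translate a RedisCluster-style custom resource into a StatefulSet manifest (a dict)."""
    name = cr["metadata"]["name"]
    spec = cr["spec"]
    labels = {"app": name}  # hypothetical label key
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": spec["size"],  # size in the CR becomes the replica count
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "redis",
                        "image": "redis:6.2",  # assumed image; the CR does not name one
                        "imagePullPolicy": spec.get("imagePullPolicy", "IfNotPresent"),
                        "resources": spec.get("resources", {}),
                    }],
                },
            },
        },
    }


cr = {
    "kind": "RedisCluster",
    "metadata": {"name": "my-release"},
    "spec": {
        "size": 3,
        "imagePullPolicy": "IfNotPresent",
        "resources": {
            "limits": {"cpu": "1000m", "memory": "1Gi"},
            "requests": {"cpu": "1000m", "memory": "1Gi"},
        },
        "config": {"maxclients": "10000"},
    },
}
print(build_statefulset(cr)["spec"]["replicas"])
```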
Operator working principle
An Operator is an extension pattern of Kubernetes, composed of a CRD and a Controller. It uses custom resources to manage a specific application and its components, and it follows the Kubernetes design philosophy.
Operators get many built-in automation features from the Kubernetes core without modifying it, such as using Kubernetes to automate the deployment and running of workloads, and even automating how Kubernetes does that.
The Operator pattern lets you extend the behavior of the cluster without modifying the code of Kubernetes itself, by linking controllers to one or more custom resources. An Operator is a client of the Kubernetes API whose core function is to act as a controller for a custom resource.
CRD: Custom Resource Definition. Everything in Kubernetes is a resource, and a CRD defines a user-defined resource type that sits alongside the built-in types such as Deployment, Pod, and Service.
CR: Custom Resource, a concrete instance of a CRD.
The user creates a CR, and the API Server forwards it to the webhook, which fills in defaults, validates the configuration, and modifies it where needed. The configuration processed by the webhook is stored in etcd, and the user is told whether the creation succeeded. The Controller watches the CR and processes it according to pre-written business logic, such as creating Pods and handling the relationship between new nodes and the old cluster, so that the running state stays consistent with the expected state.
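The defaulting and validation steps performed by the webhook can be sketched as two small functions. This is only an illustration under assumed rules: the default values and the checks below are hypothetical and are not taken from the Youdao Operator.

```python
def apply_defaults(spec: dict) -> dict:
    """Fill in fields the user left out (assumed defaults)."""
    out = dict(spec)
    out.setdefault("size", 3)
    out.setdefault("imagePullPolicy", "IfNotPresent")
    return out


def validate(spec: dict) -> list:
    """Return a list of human-readable errors; an empty list means the spec is accepted."""
    errors = []
    size = spec.get("size")
    if isinstance(size, bool) or not isinstance(size, int) or size < 1:
        errors.append("size must be a positive integer")
    if spec.get("imagePullPolicy") not in {"Always", "IfNotPresent", "Never"}:
        errors.append("imagePullPolicy must be Always, IfNotPresent or Never")
    return errors


def admit(spec: dict) -> tuple:
    """Default first, then validate, mirroring the order described above."""
    defaulted = apply_defaults(spec)
    errors = validate(defaulted)
    return (not errors, defaulted, errors)


print(admit({"size": 3}))
print(admit({"size": 0}))
```

Only a spec that passes `admit` would be persisted; a rejected one is returned to the user with the error list.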
Container Orchestration
The smallest deployment unit of a Redis cluster in Kubernetes is the Pod. Before designing the architecture, Redis characteristics, resource limits, deployment form, data storage, and state maintenance all need to be considered so that each type of Redis cluster gets a suitable deployment method.
Resource limits:
Kubernetes uses two settings, request and limit, to allocate resources.
• Request (resource requirement): the node must be able to satisfy this minimum amount before the Pod can be scheduled onto it and started.
• Limit (resource limit): memory usage may grow while the Pod is running; the limit is the maximum amount it can use.
Redis rarely saturates the CPU, so 1-2 cores are enough. Memory is allocated according to actual business usage, taking into account that forking needs extra memory in some scenarios, such as frequent AOF rewrites. During an AOF rewrite, the Redis main process can still accept writes, and memory is handled with copy-on-write. If the workload is write-heavy and read-light, a large amount of memory is copied during the rewrite, which can lead to OOM and a service restart.
An effective solution is to reduce the number of rewrites and move them to low-traffic periods at night. To reduce the number of rewrites, increase auto-aof-rewrite-min-size appropriately; it can be set to 5 times the used memory or more. In addition, rewrites can be triggered proactively, for example when usage reaches twice the memory quota. In actual deployments, 50% of memory is generally reserved to prevent OOM.
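The sizing rules of thumb above reduce to simple arithmetic. The sketch below computes a value for auto-aof-rewrite-min-size at 5 times the used memory and a container memory limit that leaves 50% free. The function names are hypothetical; the factors come from the text.

```python
import math

GIB = 1024 ** 3


def aof_rewrite_min_size(used_memory_bytes: int, factor: int = 5) -> int:
    """Value for auto-aof-rewrite-min-size: `factor` times the used memory, in bytes."""
    return used_memory_bytes * factor


def memory_limit_with_headroom(expected_data_bytes: int, reserved_fraction: float = 0.5) -> int:
    """Container memory limit such that `reserved_fraction` of it stays free."""
    return math.ceil(expected_data_bytes / (1 - reserved_fraction))


used = 2 * GIB
print(f"CONFIG SET auto-aof-rewrite-min-size {aof_rewrite_min_size(used)}")
print(f"memory limit: {memory_limit_with_headroom(used) // GIB}Gi")
```

With 2 GiB of data, this yields a 10 GiB rewrite threshold and a 4 GiB memory limit.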
Basic deployment form:
Depending on whether data must be persisted or instances must be uniquely identified, services are divided into stateless and stateful services. A Redis cluster needs to clearly identify masters, slaves, and shards, and most scenarios also require data persistence. Kubernetes provides StatefulSet for this kind of need. StatefulSet's ordered deployment and reverse-order rolling updates improve the availability of the Redis cluster.
Specifically:
• Redis Server is started as a StatefulSet. The Pod named {StatefulSetName}-0 is given the Master role, and the other Pods are set as slaves of that Master.
• The proxy does not store any data and is deployed as a Deployment, which makes dynamic scaling easy.
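Because StatefulSet Pods have stable names of the form {StatefulSetName}-{ordinal}, the initial role can be derived from the Pod name alone. A minimal sketch, with hypothetical function names:

```python
def pod_ordinal(pod_name: str, statefulset_name: str) -> int:
    """Extract the ordinal from a StatefulSet Pod name such as 'my-release-2'."""
    prefix = statefulset_name + "-"
    suffix = pod_name[len(prefix):]
    if not pod_name.startswith(prefix) or not suffix.isdigit():
        raise ValueError(f"{pod_name!r} does not belong to StatefulSet {statefulset_name!r}")
    return int(suffix)


def initial_role(pod_name: str, statefulset_name: str) -> str:
    """Ordinal 0 starts as the master; every other Pod starts as its slave."""
    return "master" if pod_ordinal(pod_name, statefulset_name) == 0 else "slave"


for pod in ("my-release-0", "my-release-1", "my-release-2"):
    print(pod, initial_role(pod, "my-release"))
```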
Configuration files:
Redis Server needs some configuration files at startup, and these involve usernames and passwords. We store them in a ConfigMap and a Secret. ConfigMap is a Kubernetes API object commonly used to store non-confidential key-value pairs smaller than 1 MiB. Secret is used to store sensitive information such as passwords, tokens, and keys.
Both resources can be mounted into the Pod through the Volume mechanism while the Pod is running.
Storage:
Storage uses PVC (PersistentVolumeClaim) plus PV (PersistentVolume). A PV is a resource in the Kubernetes cluster and can be dynamically provisioned through a StorageClass. PVs support several access modes: ReadWriteOnce, ReadOnlyMany, and ReadWriteMany. Storage resources are defined through PVs and requested through PVCs. In addition, different storage backends, such as CephFS, Ceph RBD, OpenEBS, and local storage, can be abstracted through the StorageClass field.
Master-slave mode
Master-slave topology
Each CR created after Redis containerization represents a complete Redis service. The service modes are sentinel mode and cluster mode. During containerization, besides reproducing the bare-metal deployment structure, the architecture was also optimized to some degree.
Native sentinel mode
In native sentinel mode, each instance has its own set of sentinels.
Shared sentinel mode
All instances share one set of sentinels, which further speeds up instance startup and, to some extent, improves hardware utilization. In our measurements, a single set of sentinels easily handled hundreds of master-slave clusters.
Principle of Reconciliation
Reconcile continuously monitors and repairs the master-slave cluster.
- Check whether all Pods have started as expected. For example, if 3 servers were requested, all three must be running before the subsequent steps continue.
- Check the number of masters to make sure the instance has exactly one master node (if there are 0, one is actively elected; if there is more than 1, manual repair is required).
- Check the sentinels:
(1) All sentinels monitor the correct master;
(2) All sentinels know the same slaves;
(3) Check the number of sentinels again to make sure they are all available.
- Check the Service and make its Endpoints point to the correct master.
- Check whether the Redis config has been modified, and if so, rewrite the config parameters on all nodes.
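The first two checks in this list amount to a small decision function. The sketch below is an illustration only: the function name and the return values are hypothetical, and the rule for choosing which Pod to promote (the smallest Pod name) is an assumption, since the source does not say how the new master is selected.

```python
def plan_master_slave_repair(expected_pods: int, roles: dict) -> tuple:
    """Decide the next reconcile action.

    roles maps each running Pod name to the role it reports ('master' or 'slave').
    """
    if len(roles) < expected_pods or not roles:
        return ("wait", None)              # not all Pods have started yet
    masters = sorted(pod for pod, role in roles.items() if role == "master")
    if len(masters) == 1:
        return ("ok", masters[0])          # exactly one master, nothing to repair
    if not masters:
        return ("promote", min(roles))     # assumed policy: promote the smallest Pod name
    return ("manual", None)                # more than one master needs manual repair


print(plan_master_slave_repair(3, {"r-0": "slave", "r-1": "slave", "r-2": "slave"}))
print(plan_master_slave_repair(3, {"r-0": "master", "r-1": "master", "r-2": "slave"}))
```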
Cluster mode
Cluster topology diagram
Redis Cluster + Proxy mode
By adding a proxy in front of the traditional Redis Cluster architecture, requests are routed and distributed dynamically. Combined with the native elastic scaling of Kubernetes, this makes it easier to absorb traffic bursts and allocate resources sensibly.
The proxy forwarding rules are as follows:
• For a command that operates on a single key, the proxy sends the request to the data shard that owns the slot the key belongs to.
• For a command that operates on multiple keys, if the keys are stored in different data shards, the proxy splits the command into several commands and sends each to the corresponding shard.
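Both rules depend on mapping a key to a slot. Redis Cluster defines this as CRC16 of the key modulo 16384, with hash tags (`{...}`) restricting which part of the key is hashed. The sketch below implements that mapping with Python's `binascii.crc_hqx`, which computes the same CRC16 variant when started from 0, and then groups the keys of a multi-key command by shard. The `split_by_shard` helper and the slot-to-shard function are hypothetical; the source does not describe the proxy's internals.

```python
import binascii
from collections import defaultdict

SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Hash slot of a key, honoring Redis Cluster hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:   # ignore an empty tag such as "{}"
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOTS


def split_by_shard(keys: list, slot_owner) -> dict:
    """Group keys by the shard that owns their slot.

    slot_owner is a callable mapping a slot number to a shard id.
    """
    groups = defaultdict(list)
    for key in keys:
        groups[slot_owner(key_slot(key))].append(key)
    return dict(groups)


def three_shards(slot: int) -> int:
    """Hypothetical layout: three shards, each owning a contiguous third of the slots."""
    return slot * 3 // SLOTS


print(key_slot(b"foo"))
print(split_by_shard([b"foo", b"bar", b"{user1000}.following", b"{user1000}.followers"], three_shards))
```

Keys that share a hash tag always land in the same group, so a multi-key command over them does not need to be split.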
Before the service went live, we also added some features to the proxy, such as removing unavailable nodes.
Principle of Reconciliation
Reconcile continuously monitors and repairs the Redis Cluster.
Ensure cluster health:
- After all Pods are Ready and every node recognizes the others, the Operator selects one Pod in each StatefulSet as the master, and the remaining Pods become slaves of that master.
- Get the IPs of all Pods in the instance and the cluster info of every Pod (including node IP, master-slave relationship, and so on).
- Enter the recovery process:
(1) Handle failed nodes: after some nodes restart, run forget on the stale IPs and on zombie nodes in the noaddr state;
(2) Handle untrusted nodes (all nodes in the handshake state). This happens when a node has been removed (by a forget) but keeps trying to rejoin the cluster; that is, the Pod exists from the Operator's point of view, but the cluster does not actually need the node. The fix is to delete the Pod and run forget again until the Pod is gone.
- Pick one node and use CLUSTER MEET to add all known nodes to it.
- Establish the master-slave relationship for the Pods in each StatefulSet and assign slots to it. If the current number of masters differs from the expected number, perform the corresponding scale-out or scale-in; see the horizontal scaling part of the "Cluster expansion" section.
- Check whether the Redis config has been modified, and if so, rewrite the config parameters on all nodes.
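The recovery step can be sketched as a classification over the node list that each Redis node reports. The sketch assumes the CLUSTER NODES output has already been parsed into dictionaries with `id`, `ip`, and `flags` fields; that structure and the function names are assumptions for illustration.

```python
def nodes_to_forget(cluster_nodes: list, pod_ips: set) -> list:
    """Node IDs that should be passed to CLUSTER FORGET.

    A node is stale if it carries the noaddr flag or its IP no longer belongs to any Pod.
    """
    return [
        node["id"]
        for node in cluster_nodes
        if "noaddr" in node["flags"] or node["ip"] not in pod_ips
    ]


def untrusted_nodes(cluster_nodes: list) -> list:
    """Node IDs still in the handshake state, the 'untrusted' nodes described above."""
    return [node["id"] for node in cluster_nodes if "handshake" in node["flags"]]


nodes = [
    {"id": "a1", "ip": "10.0.0.1", "flags": {"master"}},
    {"id": "b2", "ip": "10.0.0.9", "flags": {"slave"}},       # IP no longer in use
    {"id": "c3", "ip": "", "flags": {"master", "noaddr"}},    # zombie node
    {"id": "d4", "ip": "10.0.0.2", "flags": {"handshake"}},
]
print(nodes_to_forget(nodes, {"10.0.0.1", "10.0.0.2"}))
print(untrusted_nodes(nodes))
```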
Ensure proxy health:
- Get the Pod IPs of all proxies in the Running state.
- Get the Redis Server information from each proxy, synchronize the cluster information to all proxies, and remove server IPs that no longer exist from the proxy.
- If a proxy has no available Redis Server left, meaning all of them have been removed, add one back; the proxy can then automatically discover the other Redis nodes in the cluster.
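The proxy synchronization described above is essentially a set difference plus a fallback. A minimal sketch with hypothetical names; which server is chosen as the seed is an assumption:

```python
def plan_proxy_sync(proxy_servers: set, live_servers: set) -> dict:
    """Compute which Redis Server IPs to remove from a proxy and which to add.

    proxy_servers: server IPs the proxy currently knows.
    live_servers:  server IPs that actually exist in the cluster.
    """
    remove = proxy_servers - live_servers
    remaining = proxy_servers - remove
    add = set()
    if not remaining and live_servers:
        # The proxy would be left with no server: seed it with one live node and
        # let it discover the rest of the cluster by itself.
        add = {min(live_servers)}
    return {"remove": sorted(remove), "add": sorted(add)}


print(plan_proxy_sync({"10.0.0.1", "10.0.0.7"}, {"10.0.0.1", "10.0.0.2"}))
print(plan_proxy_sync({"10.0.0.7"}, {"10.0.0.1", "10.0.0.2"}))
```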
High availability strategy
High availability guaranteed by Kubernetes
(1) High availability through container deployment:
The smallest resource object for a Redis deployment is the Pod, which is the smallest and simplest unit that Kubernetes creates or deploys.
When a startup error such as "CrashLoopBackOff" occurs, Kubernetes automatically restarts the Pod on the same node. When a physical node fails, Kubernetes automatically starts a replacement Pod on another node.
When the Pod itself is fine but the program is unavailable, Kubernetes relies on the health check policy to restart the Redis node.
(2) Rolling upgrade:
When nodes are scaled vertically, StatefulSet's rolling update mechanism makes Kubernetes restart and update the Pods one by one in reverse order, which improves service availability.
(3) High availability of scheduling:
Kubernetes itself does not manage the placement relationship among the Redis Pods that make up a cluster, but it does provide scheduling strategies. To guarantee high availability in specific scenarios, such as a physical node failure taking down all Redis nodes, affinity and anti-affinity fields were added to the CRD design.
By default, podAntiAffinity is used to spread the Pods across nodes. As shown below, all Pods of instance1 are scheduled onto different nodes as far as possible.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              redis.io/name: instance1
          topologyKey: kubernetes.io/hostname
        weight: 1
High availability of Redis cluster
Various abnormal situations inevitably occur while a Redis service is running, such as node downtime and network jitter. Continuously monitoring and repairing such failures to keep the Redis cluster highly available is also a problem the Operator must solve. The following takes sentinel mode as an example to describe how the cluster recovers from failures.
Master node is down: If the Redis master goes down because of physical node eviction, a node restart, abnormal process termination, and so on, the sentinels perform a failover, and Kubernetes then brings up a new Pod on an available physical node.
Slave node is down: A Redis cluster in sentinel mode does not enable read-write splitting, so a slave going down has no impact on the service. Kubernetes then brings up a new Pod, and the Operator sets it as a slave of the current master.
All nodes are down: This is extremely unlikely, but persistence keeps the impact to a minimum, and the service resumes once the cluster is restored.
Node network failure: In master-slave mode, three sentinels are configured to handle master election. Each sentinel periodically sends heartbeat packets to all nodes in the Redis cluster to check whether they are healthy. If a node does not reply to a sentinel's heartbeat within down-after-milliseconds, that sentinel marks the Redis node as subjectively down.
Being marked subjectively down by one sentinel does not mean the node has actually failed. The other sentinels in the sentinel cluster must jointly confirm it.
The sentinel asks the other sentinels. If the number of sentinels that consider the Redis node subjectively down reaches the quorum, the node is marked objectively down.
If the objectively down node is a slave or a sentinel, the process ends here with no further action. If it is the master, a failover starts, and one of the slaves is elected and promoted to master.
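The decision chain from heartbeat timeout to failover can be sketched as follows. This is a simplified model, not Sentinel's implementation: the function names are hypothetical, and the sketch assumes, in line with Redis Sentinel's documented behavior, that the threshold is met when at least `quorum` sentinels agree.

```python
def subjectively_down(last_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """One sentinel's local view: no reply within down-after-milliseconds."""
    return now_ms - last_reply_ms > down_after_ms


def objectively_down(sdown_votes: int, quorum: int) -> bool:
    """Agreed view: at least `quorum` sentinels report the node as subjectively down."""
    return sdown_votes >= quorum


def next_action(role: str, sdown_votes: int, quorum: int) -> str:
    """Only an objectively down master triggers a failover."""
    if not objectively_down(sdown_votes, quorum):
        return "none"
    return "failover" if role == "master" else "none"


# Three sentinels with quorum 2: two of them have lost contact with the master.
replies = [1_000, 1_200, 31_000]  # last reply time seen by each sentinel, in ms
votes = sum(subjectively_down(t, now_ms=32_000, down_after_ms=30_000) for t in replies)
print(votes, next_action("master", votes, quorum=2))
```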
Failover in cluster mode is similar, but it does not need sentinels; it is driven by the PING/PONG messages exchanged between nodes.
Monitoring observation
Redis monitoring uses the classic Exporter + Prometheus solution. The Exporter collects metrics, the data is stored in Prometheus or another database, and Grafana visualizes the service status.
Cluster expansion
(1) Vertical scaling
Vertical scaling mainly means adjusting the CPU and memory resources of the Pods. With Kubernetes, you only need to modify the spec field of the instance; the Operator's reconciliation mechanism continuously watches for parameter changes and adjusts the instance accordingly. When parameters such as cpu and memory are modified, the Operator updates the limit and request of the StatefulSet, and Kubernetes updates the Pods in reverse order. If the master node is stopped during the rolling update, its preStop hook first notifies the sentinel or the cluster to save the data and then performs a master-slave switchover, minimizing the impact on the service. Re-establishing the master-slave relationship and having the sentinels monitor the new master are also handled by the Operator, and the client is unaware of the whole process. In this scenario, both the master-slave version and the cluster version limit the disruption to a brief interruption on the order of seconds.
(2) Horizontal scaling
Horizontal scaling mainly means adjusting the number of replicas or nodes. Thanks to the declarative API of Kubernetes, changing the declared resource size is enough to scale the cluster elastically without service loss.
When Redis Server is scaled out, the Operator in the master-slave version obtains the new node's IP, and the newly started node is made a slave of the master (slaveof) in the next round of reconciliation; the sentinels will not elect this node as master while it is synchronizing. In the cluster version, the Operator migrates shards after synchronizing the node information so that slots are distributed as evenly as possible across all nodes.
When Redis Server is scaled in, the Operator in the master-slave version destroys Pods in reverse order. Before a Pod is destroyed, it first asks the sentinel whether it is the master; if it is, it performs a failover before exiting. In the cluster version, the Operator migrates the shards first and then deletes the node.
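For the cluster version, distributing slots "as evenly as possible" after a scale-out can be sketched as computing a target count per master and then pairing masters that hold too many slots with masters that hold too few. The function below is an illustrative planner with hypothetical names; it only counts slots and does not perform the migration, and the source does not describe the Operator's actual algorithm.

```python
def migration_plan(owned: dict) -> list:
    """Plan slot moves so that every master ends up with an even share.

    owned maps a master name to the number of slots it currently owns.
    Returns a list of (source, target, slot_count) moves.
    """
    names = sorted(owned)
    base, extra = divmod(sum(owned.values()), len(names))
    target = {name: base + (1 if i < extra else 0) for i, name in enumerate(names)}
    surplus = [[n, owned[n] - target[n]] for n in names if owned[n] > target[n]]
    deficit = [[n, target[n] - owned[n]] for n in names if owned[n] < target[n]]
    moves = []
    while surplus and deficit:
        src, dst = surplus[0], deficit[0]
        count = min(src[1], dst[1])
        moves.append((src[0], dst[0], count))
        src[1] -= count
        dst[1] -= count
        if src[1] == 0:
            surplus.pop(0)
        if dst[1] == 0:
            deficit.pop(0)
    return moves


# A third shard joins a two-shard cluster and owns no slots yet.
print(migration_plan({"shard-0": 8192, "shard-1": 8192, "shard-2": 0}))
```

Scale-in would need the opposite plan, with the target of the departing master set to zero before it is deleted; that case is not covered by this sketch.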
Scaling the proxy is easier. Following the pattern of traffic peaks and troughs, you can scale the proxy out manually on a schedule before a peak and scale it back in afterwards, or let HPA scale it dynamically. HPA is part of Kubernetes; based on data from the Kubernetes Metrics API, it can scale resources dynamically according to CPU usage, memory usage, traffic, and so on.
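The core of the HPA calculation, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default) and clamped to the configured minimum and maximum. The sketch below reproduces only that formula; the function name and the sample numbers are illustrative.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA formula, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target, no scaling
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


# Four proxies averaging 90% CPU against a 60% target are scaled out to six.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```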
Summary and outlook
Taking Redis as an example, this article described the practice of Youdao's infrastructure team on the road to infrastructure containerization. Moving Redis to the cloud greatly shortens cluster deployment time: deployment takes seconds, startup takes minutes, and a started cluster can heal itself within seconds. The cluster relies on sentinels and proxies, so failover is transparent to users.
Youdao's architecture team ultimately provides middleware capabilities in the form of a cloud platform. Users do not need to care about resource scheduling or infrastructure operations and can focus on their business scenarios to support business growth. In the future, the team will further explore dynamic scaling of Redis instances, fault analysis and diagnosis, online migration, and hybrid deployment.
What are the advantages of Redis containerization?
Kubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications. Kubernetes provides the following basic capabilities:
Deployment: Deployment is faster, and no manual intervention is needed to set up a cluster. Once the containers are deployed, each Redis node's service is kept healthy. After the nodes start, the Operator continuously monitors and reconciles the Redis cluster state, including master-slave relationships, cluster relationships, sentinel monitoring, and failover.
Resource isolation: If all services share one Redis cluster and its configuration is modified, other services are likely to be affected. If each system uses its own Redis instance, they do not affect each other, and one application cannot accidentally bring down the cluster and trigger a chain reaction.
Failure recovery:
(1) Instance restart: health checks after containerization automatically restart the service;
(2) Network failure: when a host network failure causes high instance latency, the sentinels can perform a master-slave switchover. To keep the cluster healthy, the Operator is responsible for synchronizing the cluster information.
Scaling: Container deployment can limit the CPU and memory of an instance through limit and request, and also supports scaling operations. Failure recovery after scaling is handled by the Operator.
Node adjustment: Because the Operator continuously reconciles the custom resources, the Operator's Controller can maintain the state of each Redis instance. The master-slave relationships and the cluster slot migration that result from node adjustments are therefore completed automatically.
Data storage: Containerized instances can mount CephFS, local storage, and other storage volumes.
Monitoring and maintenance: Once instances are isolated, it is easier to locate problems with monitoring tools such as Exporter and Prometheus.
-- END --