Author: Xiao Qi | Senior Development Engineer, Alibaba Cloud Serverless

Have you ever faced a technology selection like this?

Xiao Wang is a programmer. His company's applications run on servers in a self-built data center, so all the underlying services and operations work fall on him. Every upgrade or machine expansion brings a heavy operations burden, and to be able to scale out in time the team keeps a pile of idle machines, so hardware costs stay high. The company has recently started building two new application systems, and Xiao Wang is doing the technology selection. He plans to embrace cloud computing: deploy the new applications on the cloud and design an architecture that is highly elastic, low cost, and easy to operate, and that can comfortably absorb sudden traffic spikes, so that the team can devote more energy to business development and carry less operations burden.

The two applications share several traits:

  • Both are online services with strict requirements on request latency and service stability.
  • Traffic varies greatly with the business and is hard to predict in advance, so the elasticity requirements are high.
  • There is a clear off-peak period, expected mainly at night, during which call volume is low.
  • Both take a long time to start: one is a Java Spring Boot order system, the other an AI image-recognition system built on a large container image, and startup takes nearly a minute.

Xiao Wang's needs boil down to three points:

  • First, minimal operations effort: after delivering a jar package or image, the application should run with simple configuration, with no special work needed for operations, monitoring, or alerting.
  • Second, good elasticity: capacity should scale out automatically and promptly when traffic rises, and scale in when it falls.
  • Third, use cloud computing to improve resource utilization and gain a cost advantage.

Let's walk through Xiao Wang's selection step by step.

Highly integrated services: operations-free and highly elastic

When making the selection, Xiao Wang considered three technical architectures: a traditional architecture of SLB + cloud servers + Auto Scaling, a K8s architecture, and a Function Compute (FC) architecture.

[Figure: comparison of the three candidate architectures]

With the traditional architecture you must set up SLB load balancing yourself; configure the Auto Scaling service and keep tuning to find a suitable scaling policy; and also collect logs and build alerts and monitoring dashboards. The operations and deployment cost of all this is not low. Is there a less troublesome option?

Xiao Wang then investigated the K8s architecture. K8s Services and Ingress rules manage access at the application layer, so you no longer need to build SLB load balancing yourself, and HPA scales horizontally according to application load. This looked very good, but in actual tests he found that HPA scaling operates at the minute level. Slow scale-in is not a big problem, but when traffic rises quickly, scale-out always lags by a few minutes, during which some requests see increased latency or fail, hurting service availability. Lowering the metric threshold that triggers scale-out solves this, but it also lowers resource utilization and raises cost considerably. On top of that, you still have to do log collection, alerting, and monitoring yourself, so operations cost remains quite high. And since Xiao Wang had never touched K8s before, learning its many concepts is itself a significant cost.

The FC-based architecture solves these problems well. First, FC supports a reservation mode and automatic scaling based on instance metrics; in this mode it can scale out and in more quickly and sensitively, and keep request latency stable while doing so. Second, FC highly integrates many out-of-the-box capabilities for a smooth, worry-free experience: HTTP triggers save the work of wiring up gateways and SLB, and the console provides complete observability, making it easy to view requests, instance status, and run logs. Finally, FC charges only for the resources actively used while serving calls, and incurs no cost when there are no calls, which fully improves resource utilization and reduces cost.

Next, let's look at the reservation mode in detail, and at how idle billing reduces the cost of reservations.

Reserved mode: a complete answer to cold starts

FC supports two usage modes: on-demand (pay-as-you-go) and reserved. In on-demand mode, requests automatically trigger instance creation, scale-out, and scale-in: instances are created as call volume rises and destroyed as requests fall. On-demand mode maximizes resource utilization, but for applications like Xiao Wang's that take a long time to start, creating an instance on demand causes an obvious cold start.

To address the cold-start problem, FC provides the reserved usage mode. After you configure a reservation, FC creates the specified number of reserved instances, which stay resident until you update the reservation configuration to release them. Incoming requests are scheduled to reserved instances first; once all reserved instances are busy, new requests trigger the creation of on-demand instances. To make the number of reserved instances track the business curve better, FC also provides scheduled scaling and metric-based scaling of reservations to improve reserved-instance utilization. See Elasticity management [1] in the appendix for details.
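The scheduling rule above (reserved instances first, overflow to on-demand) can be sketched as a toy model. The function names and numbers below are illustrative, not FC's actual scheduler:

```python
# Toy model of FC's dispatch rule: route each concurrent request to a
# reserved instance first; only the overflow triggers on-demand instances,
# which pay a cold-start penalty. Illustrative only -- not FC internals.

def dispatch(concurrent_requests: int, reserved: int):
    """Return (requests served on reserved instances, on-demand instances created)."""
    on_reserved = min(concurrent_requests, reserved)
    on_demand = concurrent_requests - on_reserved
    return on_reserved, on_demand

def worst_case_latency_ms(concurrent_requests, reserved,
                          exec_ms=50, cold_start_ms=60_000):
    """Worst-case request latency: overflow requests wait for a cold start."""
    _, on_demand = dispatch(concurrent_requests, reserved)
    return exec_ms if on_demand == 0 else exec_ms + cold_start_ms

# With 10 reserved instances, 8 concurrent requests all hit warm instances.
print(dispatch(8, 10))                 # (8, 0)
# A spike to 12 sends 2 requests to cold-started on-demand instances.
print(dispatch(12, 10))                # (10, 2)
print(worst_case_latency_ms(8, 10))    # 50
```

With an app whose cold start is nearly a minute, the model makes the trade-off concrete: any request that overflows the reservation eats the full startup time, which is exactly why the reservation count should track the business curve.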

This both solves the long application cold start and keeps reserved instances at a relatively high utilization level. Even on occasional large traffic spikes, the number of instances can temporarily expand to absorb requests, preserving service quality as best as possible when traffic rises quickly.


Idle billing: the cost-reduction killer feature

In real scenarios, to keep request latency low you must maintain a certain number of reserved instances even when there are no requests, which raises cost. Is there a way to get both low latency and low cost? To help users cut cost in this scenario, Function Compute has introduced idle billing for reserved instances. Let's take a closer look at this feature.

Idle billing

Depending on whether a reserved instance is processing requests, it is classified into one of two states, idle or active, each with its own billing unit price. The active unit price is the same as the original resource-usage price, while the idle unit price is 20% of the active price. Turning on idle billing can therefore save a lot of cost.


Idle billing is off by default. In that case, whether or not a reserved instance is processing requests, FC allocates CPU to it and keeps it active, so that it can still run background tasks normally when there are no requests. After idle billing is enabled, when a reserved instance has no requests, FC freezes its CPU and the instance goes idle.


With idle billing, you pay only for the CPU resources a reserved instance actually uses. While a reserved instance is idle, you pay only 20% of the price to keep cold starts at bay. This significantly reduces the cost of using reserved instances; users can worry less about reserved-instance utilization and use reservations with confidence.

[Figure: cost comparison at 60% reserved-instance utilization, with and without idle billing]

Take the figure above as an example: assume reserved-instance utilization is 60% and the original cost is 1. With idle billing, the cost becomes 60% × 1 + 40% × 20% × 1 = 0.68, a 32% cost reduction.
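The arithmetic generalizes to any utilization level. A minimal sketch (the 20% idle price and the 60% utilization figure come from this article; the function itself is illustrative):

```python
# Blended cost of a reserved instance under idle billing:
# active time is billed at the full unit price, idle time at 20% of it.

IDLE_PRICE_RATIO = 0.2  # idle unit price = 20% of the active unit price

def blended_cost(utilization: float, base_cost: float = 1.0) -> float:
    """Relative cost with idle billing, vs. always-active billing at base_cost."""
    active = utilization * base_cost
    idle = (1 - utilization) * IDLE_PRICE_RATIO * base_cost
    return active + idle

# The article's example: 60% utilization.
cost = blended_cost(0.60)
print(round(cost, 2))      # 0.68
print(round(1 - cost, 2))  # 0.32 -> a 32% cost reduction
```

Note the savings grow as utilization drops: at 0% utilization the bill falls to 20% of the original, while at 100% utilization idle billing changes nothing.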

Configuration method

You can configure reserved instances and idle billing through the console or through the SDK.

Log in to the Function Compute console, choose Create Rule on the Home -> Elasticity Management page, and configure Idle Billing there. You can also configure it through the SDK, with support for Java, Go, Node.js, and other languages; see API online debugging [2] for details.
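As a sketch of what an SDK call would send, the snippet below builds a provision-config request body. The field names (`target`, `alwaysAllocateCPU`, `scheduledActions`) follow my reading of FC's 2021-04-06 PutProvisionConfig API and may not match the current version; verify them on the API online debugging page [2] before use.

```python
# Hypothetical PutProvisionConfig request body. Field names are assumed
# from FC's 2021-04-06 API (target / alwaysAllocateCPU / scheduledActions)
# and should be verified against the live API reference.
import json

def provision_config(target: int, idle_billing: bool,
                     scheduled_actions=None) -> str:
    """Build the JSON body for a reserved (provisioned) instance config."""
    body = {
        "target": target,                      # number of reserved instances
        "alwaysAllocateCPU": not idle_billing  # False => CPU frozen when idle
    }
    if scheduled_actions:
        body["scheduledActions"] = scheduled_actions  # timed reservation scaling
    return json.dumps(body)

# Example: 10 reserved instances with idle billing enabled.
print(provision_config(10, idle_billing=True))
```

The key point is simply that idle billing is a per-reservation switch sitting next to the reservation target, so it can be combined with scheduled or metric-based reservation scaling.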


After idle billing is enabled, you can check the idle resource usage charges for elastic instances and performance instances under Billing Center -> Billing Details -> Detailed Billing (bills are generally generated with a delay of 3 to 6 hours).


Conclusion

Function Compute (FC) has always been committed to providing users with fully managed compute services that are highly elastic, operations-free, and low cost. The release of idle billing further reduces the cost of reserved instances, letting users pay only for the reserved resources actually used. Function Compute will keep releasing more of the technical dividends of Serverless, and continue to deliver more in performance, cost, and experience.

Reference documentation links:

[1] Elasticity management:
https://help.aliyun.com/document_detail/185038.html

[2] API online debugging:
https://next.api.aliyun.com/api/FC-Open/2021-04-06/PutProvisionConfig?

[3] Billing overview:
https://help.aliyun.com/document_detail/54301.html

