HTTP series: HTTP caching

Introduction

To make a website faster and more efficient, we add caches at several layers. A cache avoids unnecessary requests and data transfer, which shortens response times. HTTP defines its own caching mechanism for this purpose.

Today we will look at how HTTP caching works and how to use it.

Types of cache in HTTP

Caching means saving a copy of a requested resource so that the next request for it can be answered from the copy instead of downloading the resource from the server again. This reduces data transfer and improves efficiency.

HTTP caches fall into two categories. A shared cache stores responses that can be reused by multiple clients. A private cache is dedicated to a single user or client, and other users cannot access it.

Private caches are easy to understand. The cache in a typical browser is a private cache: it belongs to that browser and is not shared with other browsers.

Shared caches are mainly found in web proxies. A proxy server may serve the same resources to many users, so there is no need for every user to keep a separate copy of the resources they all access. Keeping a single copy on the proxy avoids redundant copies.

The status of the cached response in HTTP

HTTP caches usually store responses to GET requests, because a GET request is normally identified by its URI alone and simply retrieves a resource from the server.

GET responses can carry different status codes, and several of them are commonly cached:

200 (OK) when the resource is returned successfully.

301 (Moved Permanently) for a permanent redirect. 404 (Not Found) when the resource does not exist. 206 (Partial Content) when only part of the resource is returned.
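As a rough illustration of the status codes above, the following sketch checks whether a response is a candidate for caching. The status set reflects my reading of the codes that RFC 9110 treats as cacheable by default, and the helper name is hypothetical. A real cache also looks at Cache-Control and other headers before storing anything.

```python
# Status codes treated as cacheable by default (my reading of RFC 9110;
# a real cache also honours Cache-Control and related headers).
HEURISTICALLY_CACHEABLE = {200, 203, 204, 206, 300, 301, 308, 404, 405, 410, 414, 501}


def is_cacheable_by_default(method: str, status: int) -> bool:
    """Hypothetical helper: True for GET responses with a cacheable status."""
    return method.upper() == "GET" and status in HEURISTICALLY_CACHEABLE
```

For example, is_cacheable_by_default("GET", 404) is true, while a 500 response or a POST request is rejected.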

Cache control in HTTP

Cache control in HTTP is expressed through HTTP headers. The Cache-Control header was added in HTTP/1.1, and it controls caching for both requests and responses.

If caching is not needed, use:

Cache-Control: no-store

If the response may be stored but the cache must validate it with the server before every reuse, use:

Cache-Control: no-cache

If you want to force validation once the cached response has become stale, use:

Cache-Control: must-revalidate

In this case, a stale response must not be used until the server has validated it successfully.

The server can also use Cache-Control to state whether a response may be stored only in a private cache or in shared caches as well:

Cache-Control: private
Cache-Control: public
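The storage rules so far (no-store, private, public) can be sketched as a small decision function. This is a simplified illustration, not a complete implementation: may_store is a hypothetical name, and the directives are assumed to be already split into a set of lower-case names.

```python
def may_store(directives: set, shared_cache: bool) -> bool:
    """Return True if a cache of the given kind may store the response.

    `directives` is assumed to hold lower-case Cache-Control directive names.
    """
    if "no-store" in directives:
        return False  # nothing may store this response
    if shared_cache and "private" in directives:
        return False  # only the user's own cache may store it
    return True
```

A browser cache would call may_store(directives, shared_cache=False), while a proxy would pass shared_cache=True.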

Another very important cache control is the expiration time:

Cache-Control: max-age=31536000

max-age takes precedence over the Expires header. It gives the number of seconds, counted from the time the response was generated, during which the resource is considered fresh and does not need to be fetched from the server again.

Cache-Control is a header field defined in HTTP/1.1. HTTP/1.0 has a similar field called Pragma. Sending Pragma: no-cache in a request has an effect similar to Cache-Control: no-cache: it forces caches to validate the stored response with the server before using it.

Pragma is not specified for server responses, however, so it cannot fully replace Cache-Control.
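To work with these directives in code, the header value first has to be split into its parts. The sketch below is a minimal parser under a simplifying assumption: it splits on commas, so it does not handle quoted arguments that themselves contain commas. The function name is hypothetical.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}.

    Simplification: quoted arguments containing commas are not supported.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" (e.g. no-store) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For the header "public, max-age=31536000" this returns {"public": None, "max-age": "31536000"}.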

Cache refresh

Once a response is stored in the cache, it can be reused for later requests. To be safe, each cached response needs an expiration time. The cached copy is valid only until that time. After it, the copy has to be checked with the server or fetched again.

This mechanism keeps the resources seen by the client reasonably up to date and ensures that updates on the server reach the client in time.

If a cached resource has not yet reached its expiration time, it is fresh; otherwise it is stale.

A stale resource is not removed from the cache immediately. Instead, on the next request the cache adds an If-None-Match header so that the server can decide whether the cached copy is still valid. If the resource has not changed, the server returns 304 (Not Modified), and the cached copy can be used again.
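The client side of this exchange can be sketched as two small helpers. The names and the dictionary layout of the stored entry are assumptions made for illustration. The first builds the conditional request header from the stored ETag, and the second decides what to keep once the server has answered.

```python
def revalidation_request_headers(stored_headers: dict) -> dict:
    """Build the conditional header for revalidating a stale entry."""
    etag = stored_headers.get("etag")
    return {"If-None-Match": etag} if etag else {}


def handle_revalidation(stored: dict, status: int, headers: dict, body: bytes) -> dict:
    """Return the cache entry to keep after the server's answer.

    `stored` is assumed to look like {"headers": {...}, "body": b"..."}.
    """
    if status == 304:
        # Not Modified: keep the stored body, refresh the stored headers.
        return {"headers": {**stored["headers"], **headers}, "body": stored["body"]}
    # Any other status: replace the entry with the new response.
    return {"headers": headers, "body": body}
```

On a 304 the body is never transferred again; only the headers, such as a new max-age, are merged into the stored entry.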

The freshness lifetime is taken from "Cache-Control: max-age=N".

If the response has no such directive, the cache looks for an Expires header. If it exists, the freshness lifetime is the value of Expires minus the value of the Date header.

If the response has no Expires header either, how is the freshness lifetime determined?

In that case the cache looks for a Last-Modified header. If it exists, the freshness lifetime is estimated heuristically as (Date - Last-Modified) / 10.
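The three rules can be combined into one function that walks through them in order. This is a sketch under stated assumptions: the function name is hypothetical, the headers are given as a dictionary with lower-case names, and a Date header is present whenever max-age is missing.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict) -> Optional[timedelta]:
    """Estimate how long a response stays fresh, in the order described above.

    `headers` maps lower-case header names to values (assumption of this sketch).
    """
    # 1. Cache-Control: max-age=N wins if present.
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    if match:
        return timedelta(seconds=int(match.group(1)))
    date = parsedate_to_datetime(headers["date"])
    # 2. Otherwise use Expires minus Date.
    if "expires" in headers:
        return parsedate_to_datetime(headers["expires"]) - date
    # 3. Otherwise fall back to the heuristic (Date - Last-Modified) / 10.
    if "last-modified" in headers:
        return (date - parsedate_to_datetime(headers["last-modified"])) / 10
    return None  # no basis for a freshness estimate
```

For a response dated ten days after its Last-Modified value and carrying neither max-age nor Expires, the function returns a lifetime of one day.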

Revving

To make HTTP more efficient we would like cache lifetimes to be as long as possible, but as mentioned earlier, a long lifetime makes it hard to roll out updates to resources on the server. How can this be solved?

For files that are updated infrequently, the request URL can be built from the file name plus a version number. A given version number always refers to the same content, so it can be cached for a very long time.

When the content changes on the server, only the version number in the requested URL needs to change.

The drawback is that every place that references the resource has to be updated together with the resource itself, but modern front-end build tools automate this, so it is not a big problem.
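One common way to obtain such a version number is to derive it from the file content, which is roughly what build tools do. The sketch below uses a truncated SHA-256 digest in place of a hand-maintained version number. The function name, the digest length, and the naming scheme are assumptions for illustration only.

```python
import hashlib
from pathlib import PurePosixPath


def revved_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension.

    The same content always yields the same name, so the URL can be
    cached for a very long time; changed content yields a new URL.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

A file static/app.js would then be served under a name of the form static/app.<hash>.js, typically together with a long max-age.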

Cache verification

When a cached resource expires, there are two options: request the resource from the server again, or validate the cached copy with the server.

Validation requires the server's support. Setting the "Cache-Control: must-revalidate" response header additionally forces caches to validate a stale copy before reusing it.

So how does the client check that its copy is still valid? Sending the whole resource to the server for comparison would be too cumbersome and would waste bandwidth when the file is large.

An obvious alternative is to compute a hash of the resource and send only the hash for comparison.

HTTP provides the ETag header for this purpose. Its value is an opaque identifier for a specific version of the resource, which client and server use for validation. The client sends it back in an If-None-Match request header, and the server checks whether it matches the current version. An ETag acts as a strong validator.

There is also a weak validator. If the response contains Last-Modified, the client can send an If-Modified-Since request header to ask the server whether the file has changed since then.

The server may choose whether to perform the validation. If it does not, it returns 200 OK together with the full resource. If it does and the resource is unchanged, it returns 304 Not Modified with no body, telling the client to keep using the cached copy. The 304 response can also carry other header fields, for example to update the cache's expiration time.
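On the server side, the hash idea and the ETag check fit together as in the following sketch. It is a simplified illustration: make_etag and respond are hypothetical names, the ETag is derived from a truncated SHA-256 digest of the body, and request headers are assumed to use lower-case names.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    """Strip whitespace and an optional weak prefix (W/) for comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    candidates = {_opaque(t) for t in request_headers.get("if-none-match", "").split(",")}
    if "*" in candidates or etag in candidates:
        return 304, {"ETag": etag}, b""  # client copy is still valid
    return 200, {"ETag": etag}, body  # send the full resource
```

The first request gets 200 with the ETag; a later request that echoes the ETag in If-None-Match gets 304 with an empty body as long as the content is unchanged.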

Vary response

A server response can include the Vary header. Its value lists request header names, such as Accept-Encoding, whose values the cache must take into account in addition to the URL when deciding whether a stored response can be reused. In other words, the cache keeps a separate copy for each value of those request headers.

For example, the client first requests:

GET /resource HTTP/1.1
Accept-Encoding: *

The server returns:

HTTP/1.1 200 OK
Content-Encoding: gzip
Vary: Accept-Encoding

The response is cached together with the Accept-Encoding value of the request that produced it.

When the client requests again:

GET /resource HTTP/1.1
Accept-Encoding: br

The Accept-Encoding value of this request differs from the one stored with the cached response, so the cache cannot reuse it and the resource is fetched from the server again:

HTTP/1.1 200 OK
Content-Encoding: br
Vary: Accept-Encoding

The client now caches a second copy, encoded with br.

The next time the client sends the same request with Accept-Encoding: br, it hits the cache.

To sum up, Vary makes the cache distinguish and store responses by request headers such as the accepted encoding.
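One way to picture this is as an extended cache key. The sketch below assumes that Vary names request headers such as Accept-Encoding and that request headers are given with lower-case names. The function name and the key layout are hypothetical.

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # Missing request headers are recorded as empty strings.
    selected = tuple((n, request_headers.get(n, "")) for n in names)
    return (method.upper(), url, selected)
```

With Vary: Accept-Encoding, a request sending Accept-Encoding: * and one sending Accept-Encoding: br produce different keys, so each gets its own cached copy.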

This can, however, lead to the same resource being stored many times, once for each distinct Accept-Encoding value. To avoid this, requests should be normalized.

Normalization means inspecting the requested encodings before the cache lookup and reducing them to a single encoding, so that the resource is not cached multiple times.
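A minimal sketch of such a normalization step follows. It assumes the server can produce br and gzip and prefers them in that order; the function names and the preference order are illustrative choices, and q-values are only used to detect encodings the client refuses (q=0).

```python
def _is_refused(params: str) -> bool:
    """True if the parameter part is a q-value of zero."""
    params = params.strip().lower().replace(" ", "")
    if not params.startswith("q="):
        return False
    try:
        return float(params[2:]) == 0.0
    except ValueError:
        return False


def normalize_accept_encoding(value: str, preferred=("br", "gzip")) -> str:
    """Reduce an Accept-Encoding value to one encoding for the cache key."""
    accepted, refused = set(), set()
    for part in value.split(","):
        token, _, params = part.partition(";")
        token = token.strip().lower()
        if token:
            (refused if _is_refused(params) else accepted).add(token)
    for encoding in preferred:
        if encoding in accepted or ("*" in accepted and encoding not in refused):
            return encoding
    return "identity"  # fall back to the unencoded resource
```

Requests with "gzip, deflate, br", "br" and "*" then all map to br and share one cached copy.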

Summary

At this point, the introduction of HTTP caching is complete. You can deepen your understanding of HTTP caching in practical applications.

This article has been included in http://www.flydean.com/04-http-cache/

flydean
