Interviewer: HTTP Caching

Speed, speed, and more speed: for a good user experience, a website must render as fast as possible. When MySQL queries are slow, we add a Redis layer as a cache; when website resources load slowly, what do we do? We use the HTTP cache

HTTP caching has existed since HTTP/1.0. Its purpose is to reduce server load and speed up page responses

What can be cached

In practice, HTTP caches only store responses to GET requests; responses to other request methods are generally not cached

Cache history

HTTP/1.0 introduced caching with the strong cache header Expires and the negotiated cache header Last-Modified. HTTP/1.1 brought better solutions: the strong cache header Cache-Control and the negotiated cache header ETag

Why were Expires and Last-Modified replaced?

Expires is an absolute expiration time set by the server's clock, but the browser compares it against its own clock. If the client's clock differs from the server's, the result is inaccurate. Cache-Control's max-age instead specifies a relative lifetime in seconds, so there is no ambiguity

Last-Modified is the last modification time, and the smallest unit it can express is one second. If a file changes several times within one second, the content has changed but the cache still serves the earlier version. To handle such cases, ETag identifies the resource by its content to determine whether it has changed
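To make the one-second problem concrete, here is a small Node.js sketch (an illustration, not from the original article). It formats two edit times 300 ms apart the way a Last-Modified header would, and compares them with a content hash standing in for an ETag. The timestamps and the `toHttpDate`/`contentTag` helpers are hypothetical.

```javascript
const crypto = require('node:crypto');

// Last-Modified is sent as an HTTP-date, which only has one-second precision.
function toHttpDate(ms) {
  return new Date(ms).toUTCString();
}

// A content-based identifier, similar in spirit to a strong ETag.
function contentTag(body) {
  return crypto.createHash('sha256').update(body).digest('hex').slice(0, 16);
}

// Two edits 300 ms apart, inside the same second (illustrative timestamps).
const firstEdit = Date.UTC(2022, 3, 11, 6, 57, 18, 100);
const secondEdit = firstEdit + 300;

console.log(toHttpDate(firstEdit) === toHttpDate(secondEdit)); // true: Last-Modified cannot tell them apart
console.log(contentTag('v1') === contentTag('v2'));            // false: the content-based tag changes
```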

The table below summarizes the comparison:

Version	Strong cache	Negotiated cache
HTTP/1.0	Expires	Last-Modified
HTTP/1.1	Cache-Control	ETag

The two cache types

The sections above listed the cache headers of each HTTP version and mentioned the split between strong and negotiated caching without going into detail. Now let's look at both cache types

Strong cache

Cache-Control

HTTP/1.1
Controls caching through directives, most commonly max-age, which sets how long a response stays fresh
- For example, Cache-Control: max-age=3600 means the response can be cached for 3600 seconds, after which it expires

Cache request directives:

Cache-Control: max-age=<seconds>
Cache-Control: max-stale[=<seconds>]
Cache-Control: min-fresh=<seconds>
Cache-Control: no-cache
Cache-Control: no-store
Cache-Control: no-transform

Cache response directives:

Cache-Control: must-revalidate
Cache-Control: no-cache
Cache-Control: no-store
Cache-Control: no-transform
Cache-Control: public
Cache-Control: private
Cache-Control: proxy-revalidate
Cache-Control: max-age=<seconds>
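Since a Cache-Control header is just a comma-separated list of these directives, a minimal Node.js sketch of parsing one might look like this. `parseCacheControl` is a hypothetical helper, and it deliberately skips edge cases a real parser would handle (for example, quoted values containing commas). Values are kept as strings.

```javascript
// Minimal Cache-Control parser: splits on commas, lower-cases directive names,
// and strips optional quotes around values. Not a full RFC 9111 parser.
function parseCacheControl(header) {
  const directives = {};
  for (const part of String(header || '').split(',')) {
    const trimmed = part.trim();
    if (!trimmed) continue; // tolerate empty entries such as "a,,b"
    const eq = trimmed.indexOf('=');
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true; // flag directive, e.g. no-store
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      const value = trimmed.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1');
      directives[name] = value; // valued directive, e.g. max-age=3600
    }
  }
  return directives;
}

console.log(parseCacheControl('public, max-age=31536000'));
// { public: true, 'max-age': '31536000' }
```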

Key points:
- Cache-Control: no-cache
  - Skips the strong cache: the response may be stored, but the browser must send a request to the server before using it (if a validator such as ETag or Last-Modified is available, this becomes a negotiated cache request)
  - no-cache behaves roughly like max-age=0, must-revalidate: the strong cache is skipped and the resource is revalidated every time
- Cache-Control: no-store
  - Nothing is stored at all, so neither strong nor negotiated caching applies
- Cache-Control: public, max-age=31536000
  - Typically used for static resources (31536000 seconds is one year)
  - public: the response may also be cached by shared caches such as intermediate proxies and CDNs
  - private: the response is meant only for a single user's cache (the browser); intermediate proxies, CDNs, etc. must not cache it
  - max-age: the freshness lifetime, in seconds

Expires

HTTP/1.0
Syntax:
- Expires: <http-date>
It is an absolute expiration time, sent by the server in the response header. For example:
- Expires: Mon, 11 Apr 2022 06:57:18 GMT
- This means the resource expires at 06:57:18 GMT on 11 April 2022; after that, the browser sends a new request to the server
If the Cache-Control response header contains a max-age or s-maxage directive, Expires is ignored
Disadvantage: the server's clock and the browser's clock may disagree
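A server that wants to serve both HTTP/1.1 and HTTP/1.0 clients can send Cache-Control together with a matching Expires. The Node.js sketch below (hypothetical helper `strongCacheHeaders`) derives the HTTP-date from a max-age; `Date.prototype.toUTCString()` produces exactly the `Mon, 11 Apr 2022 06:57:18 GMT` format shown above. The `now` parameter is only there so the example is reproducible.

```javascript
// Build a strong-cache header pair: Cache-Control for HTTP/1.1 clients and
// Expires as a fallback for HTTP/1.0 clients.
function strongCacheHeaders(maxAgeSeconds, now = new Date()) {
  const expires = new Date(now.getTime() + maxAgeSeconds * 1000);
  return {
    'Cache-Control': `max-age=${maxAgeSeconds}`,
    'Expires': expires.toUTCString(), // HTTP-date format
  };
}

const headers = strongCacheHeaders(3600, new Date(Date.UTC(2022, 3, 11, 5, 57, 18)));
console.log(headers.Expires); // Mon, 11 Apr 2022 06:57:18 GMT
```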

Cache-Control vs Expires

Cache-Control is more accurate than Expires, because max-age is a relative duration rather than an absolute point in time
When both are present, Cache-Control takes precedence over Expires
Expires comes from HTTP/1.0, so it is supported by older clients; Cache-Control comes from HTTP/1.1. The two can be sent together, and a client that does not support Cache-Control falls back to Expires
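The precedence rules can be expressed as a freshness check. This is a simplified Node.js sketch loosely following RFC 9111, assuming lower-case header names and a `receivedAt` timestamp recorded by the client; `isFresh` is a hypothetical helper, and it ignores the Age header, s-maxage and heuristic freshness. Note that it measures the Expires lifetime against the response's own Date header, which is how caches sidestep most of the clock-skew problem described earlier.

```javascript
// Simplified freshness check for a stored response:
// - no-cache / no-store: never serve without contacting the server;
// - max-age in Cache-Control wins over Expires;
// - Expires lifetime = Expires - Date (both server clock), not Expires - client clock.
function isFresh(headers, receivedAt, now) {
  const cc = String(headers['cache-control'] || '').toLowerCase();
  if (/(^|,)\s*(no-cache|no-store)\s*(,|$)/.test(cc)) return false;

  const age = (now - receivedAt) / 1000; // seconds since the response was stored

  const maxAge = /(?:^|,)\s*max-age\s*=\s*"?(\d+)"?/.exec(cc);
  if (maxAge) return age < Number(maxAge[1]);

  if (headers['expires']) {
    const expires = Date.parse(headers['expires']);
    if (Number.isNaN(expires)) return false; // an invalid Expires counts as already expired
    const date = headers['date'] ? Date.parse(headers['date']) : receivedAt;
    return age < (expires - date) / 1000;
  }
  return false; // no explicit freshness information in this sketch
}

const receivedAt = Date.UTC(2022, 3, 11, 5, 57, 18);
const res = {
  'date': 'Mon, 11 Apr 2022 05:57:18 GMT',
  'expires': 'Mon, 11 Apr 2022 06:57:18 GMT',
};
console.log(isFresh(res, receivedAt, receivedAt + 60 * 1000)); // true: 60 s old, 3600 s lifetime
console.log(isFresh({ ...res, 'cache-control': 'max-age=10' }, receivedAt, receivedAt + 60 * 1000)); // false: max-age wins
```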

Negotiated cache

Negotiated caching works together with strong caching: the browser only negotiates with the server when the strong cache cannot be used, either because the cached copy has expired or because the response disables the strong cache with Cache-Control: no-cache, Cache-Control: max-age=0 or Pragma: no-cache

Pragma is an HTTP/1.0 header for disabling caching. Its value no-cache has the same effect as no-cache in Cache-Control.

ETag/If-None-Match

HTTP/1.1
An ETag is an identifier the server generates for a specific version of a resource (often derived from its content). It changes whenever the content changes, so it can be used to decide whether the cached copy is out of date
It works together with If-None-Match. On the first request, the server returns the resource along with its ETag, and the client (the browser) stores this identifier. On the next request, the browser sends the stored value in the If-None-Match request header. The server compares If-None-Match with the resource's current ETag: if they match, it returns 304 Not Modified and the browser uses its local copy; if they differ, it returns 200 with the latest resource and its new ETag
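The ETag round trip can be sketched on the server side as below (Node.js). Assumptions: the ETag is a SHA-1 of the body (servers may use any opaque value), and `makeETag`/`handleIfNoneMatch` are hypothetical helpers, not an Express API. If-None-Match may carry several tags or `*`, and it uses weak comparison, so a `W/` prefix is ignored; the comma split is naive but safe for base64 tags.

```javascript
const crypto = require('node:crypto');

// Strong ETag derived from the body (one possible scheme among many).
function makeETag(body) {
  return '"' + crypto.createHash('sha1').update(body).digest('base64') + '"';
}

// Weak comparison, as used for If-None-Match: ignore a W/ prefix.
function stripWeak(tag) {
  return tag.startsWith('W/') ? tag.slice(2) : tag;
}

// Decide between 304 and 200 for a GET that may carry If-None-Match.
function handleIfNoneMatch(ifNoneMatch, body) {
  const etag = makeETag(body);
  if (ifNoneMatch) {
    const candidates = ifNoneMatch.split(',').map((t) => stripWeak(t.trim()));
    if (candidates.includes('*') || candidates.includes(stripWeak(etag))) {
      return { status: 304, headers: { ETag: etag }, body: '' }; // client keeps its copy
    }
  }
  return { status: 200, headers: { ETag: etag }, body }; // full response with new ETag
}

const first = handleIfNoneMatch(undefined, 'hello');           // no validator yet
const second = handleIfNoneMatch(first.headers.ETag, 'hello');  // unchanged content
const third = handleIfNoneMatch(first.headers.ETag, 'hello!');  // changed content
console.log(first.status, second.status, third.status); // 200 304 200
```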

Last-Modified/If-Modified-Since

HTTP/1.0
The last modification time of the resource, used to decide whether the cached copy is out of date. The server adds this field to the response header the first time the browser requests the resource
It works together with If-Modified-Since. When the client requests a resource, the server puts Last-Modified, the resource's last modification time on the server, in the response header, and the client stores this value. On the next request for the same resource, the browser sends the stored value in an If-Modified-Since request header. If the resource has not been modified since that time, the server returns 304 Not Modified and the local cache is used; otherwise it returns 200 with the latest resource and a new Last-Modified
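A matching server-side sketch for If-Modified-Since, again in Node.js with a hypothetical helper name. Because the header only carries whole seconds, the modification time is truncated before comparison, and the example shows an edit within the same second slipping through, which is one of the shortcomings listed below.

```javascript
// Decide between 304 and 200 for a GET that may carry If-Modified-Since.
// lastModified is a Date (e.g. a file's mtime).
function handleIfModifiedSince(ifModifiedSince, lastModified) {
  const lastModifiedHeader = lastModified.toUTCString();
  const since = Date.parse(ifModifiedSince || ''); // NaN when absent or invalid
  const mtimeSeconds = Math.floor(lastModified.getTime() / 1000); // HTTP-dates have 1 s precision
  if (!Number.isNaN(since) && mtimeSeconds <= Math.floor(since / 1000)) {
    return { status: 304, headers: { 'Last-Modified': lastModifiedHeader } };
  }
  return { status: 200, headers: { 'Last-Modified': lastModifiedHeader } };
}

const mtime = new Date(Date.UTC(2022, 3, 11, 6, 57, 18, 500));
const initial = handleIfModifiedSince(undefined, mtime);
const again = handleIfModifiedSince(initial.headers['Last-Modified'], mtime);
// An edit 300 ms later falls in the same second and goes unnoticed:
const sameSecondEdit = new Date(mtime.getTime() + 300);
const missed = handleIfModifiedSince(initial.headers['Last-Modified'], sameSecondEdit);
console.log(initial.status, again.status, missed.status); // 200 304 304
```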
Shortcomings:
- If a file is saved without its content actually changing, its modification time is still updated, so the cache is invalidated needlessly
- Some files change several times within one second, which one-second granularity cannot capture
- Some servers cannot reliably determine a file's last modification time

ETag vs Last-Modified

Accuracy
- ETag > Last-Modified. ETag identifies the resource by its content, so it detects real changes, while Last-Modified fails in some scenarios: saving a file without changing its content still invalidates the cache, and multiple changes within one second go unnoticed because its granularity is one second
Performance
- Last-Modified > ETag. Last-Modified only records a point in time, whereas ETag requires the server to compute a hash over the file's content
When both are available, ETag takes precedence: if a request carries both If-None-Match and If-Modified-Since, the server evaluates If-None-Match and ignores If-Modified-Since
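Putting the two validators together, a server's decision can be sketched as follows (Node.js). The ordering follows RFC 9110: If-None-Match is evaluated first, and If-Modified-Since is ignored whenever If-None-Match is present. The `resource` record shape and `evaluateConditionalGet` name are assumptions for illustration.

```javascript
// reqHeaders: lower-case request headers; resource: { etag, lastModified: Date }.
// Returns the status code the server should send.
function evaluateConditionalGet(reqHeaders, resource) {
  const inm = reqHeaders['if-none-match'];
  if (inm !== undefined) {
    // ETag path: weak comparison, so drop any W/ prefix.
    const strip = (t) => t.trim().replace(/^W\//, '');
    const tags = inm.split(',').map(strip);
    const match = tags.includes('*') || tags.includes(strip(resource.etag));
    return match ? 304 : 200; // If-Modified-Since is not consulted at all
  }
  const ims = reqHeaders['if-modified-since'];
  if (ims !== undefined) {
    const since = Date.parse(ims);
    const mtimeSeconds = Math.floor(resource.lastModified.getTime() / 1000);
    if (!Number.isNaN(since) && mtimeSeconds <= Math.floor(since / 1000)) {
      return 304;
    }
  }
  return 200; // unconditional request or validator mismatch
}

const resource = { etag: '"abc"', lastModified: new Date(Date.UTC(2022, 3, 11, 6, 57, 18)) };
// If-Modified-Since alone would say "not modified", but If-None-Match takes precedence:
console.log(evaluateConditionalGet({
  'if-none-match': '"old"',
  'if-modified-since': 'Mon, 11 Apr 2022 06:57:18 GMT',
}, resource)); // 200
```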

Conditional requests in the negotiated cache

As mentioned earlier, the negotiated cache works by adding If-None-Match or If-Modified-Since to the request header. What are these headers, and why are they needed?

The strong cache controls caching through an expiration time or a lifetime. That raises a problem: if a file is modified before its cache expires, the browser keeps showing the old data. Resources that change therefore cannot rely on the strong cache alone. The negotiated cache solves this: before using its cached copy, the browser asks the server whether it is still the latest version.

Naively, this verification would take two consecutive requests:

First, a HEAD request fetches the resource's metadata, such as its modification time and hash, which is compared with the cached data; if nothing has changed, the cache is used
Otherwise, a second GET request fetches the latest version

Two round trips are too expensive, so HTTP defines a set of conditional request headers starting with If, dedicated to checking whether a resource has changed. They merge the two requests into one and hand the verification over to the server

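If-Modified-Since: compared with Last-Modified; has the resource been modified since this time?
If-None-Match: compared with ETag; does the current identifier differ from this one?
If-Unmodified-Since: compared with Last-Modified; proceed only if the resource has not been modified since this time
If-Match: compared with ETag; proceed only if the identifier matches
If-Range: used with a Range request; send only the requested range if the resource is unchanged, otherwise send the whole resource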

The most common are If-Modified-Since and If-None-Match, which correspond to Last-Modified and ETag respectively. The server must first provide Last-Modified or ETag in its initial response; the browser then sends the cached values back in the next request so the server can verify whether the resource is up to date.

If the resource has not changed, the server responds with 304 Not Modified, meaning the cache is still valid; the browser refreshes the cached copy's freshness lifetime and uses it

Caching flow

When to use strong cache and when to use negotiated cache?

First, the strong cache takes priority over the negotiated cache: while the strong cache is valid, the negotiated cache is not consulted at all. Second, the HTTP/1.1 header takes priority over the HTTP/1.0 one: if Cache-Control is present it is used, and only when it is absent is Expires checked. If the response sets Cache-Control: no-cache, Cache-Control: max-age=0 or Pragma: no-cache, the browser skips the strong cache.

Next, the browser checks whether the previous response carried an ETag. If it did, it sends a conditional request with If-None-Match in the request header. If not, it checks whether the previous response carried Last-Modified and, if so, sends a conditional request with If-Modified-Since. If neither exists, there is no negotiated cache and a plain HTTP request is sent. For either kind of conditional request, the server decides whether the resource has changed and returns a status: 304 means the cached resource is unchanged and the local cache is used; 200 means the resource has changed, the response carries the new content, and the browser stores the new ETag/Last-Modified from the response header for next time
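The client-side branch of this flow can be sketched as a small decision function (Node.js). The `entry` shape and both function names are hypothetical, and `fresh` stands in for the strong-cache check described above (Cache-Control first, then Expires), so the sketch focuses on which validator header the browser sends and how it reacts to the answer.

```javascript
// entry: hypothetical cached record { fresh: boolean, etag?: string, lastModified?: string }
function planRequest(entry) {
  if (!entry) {
    return { action: 'fetch', headers: {} }; // nothing cached: plain request
  }
  if (entry.fresh) {
    return { action: 'use-cache', headers: {} }; // strong cache hit: no request at all
  }
  if (entry.etag) {
    return { action: 'revalidate', headers: { 'If-None-Match': entry.etag } }; // ETag checked first
  }
  if (entry.lastModified) {
    return { action: 'revalidate', headers: { 'If-Modified-Since': entry.lastModified } };
  }
  return { action: 'fetch', headers: {} }; // no validator: no negotiated cache
}

// What the browser does with the server's answer to a revalidation request.
function onRevalidationResponse(status) {
  return status === 304
    ? 'use local copy'
    : 'use new body and store its ETag/Last-Modified';
}

const stale = { fresh: false, etag: '"abc"', lastModified: 'Mon, 11 Apr 2022 06:57:18 GMT' };
console.log(planRequest(stale).headers); // { 'If-None-Match': '"abc"' }
```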

The general flow chart is as follows:

[Figure: cache decision flowchart]

So which resources should use the strong cache, and which the negotiated cache?

Resources that rarely change, such as static assets, should use the strong cache; files that change often should use the negotiated cache. That way, if a resource has not changed, the user keeps using the cached copy on later visits, and if it has been modified, the user's next visit fetches the latest version

When visiting a website, you can observe this in the browser's DevTools (F12). For example, my site 五年前端三年面试 ("Five Years of Front-End, Three Years of Interviews") is hosted on GitHub; in the Network panel its response headers include Cache-Control, Expires, ETag and Last-Modified

[Figure: response headers of 五年前端三年面试 in DevTools]

Cache locations

As mentioned above, both the strong cache and the negotiated cache serve resources from the browser's local storage. So where is that storage, and what kinds are there?

By location, the browser cache is divided into four parts: Memory Cache, Disk Cache, Service Worker, and Push Cache

Memory Cache

Memory is limited, so not every resource is cached in memory. The memory cache mainly holds resources fetched for the current page, including those requested through preload hints such as <link rel="preload">. The preloader can discover and request upcoming resources from the network while the browser is still parsing JS/CSS files

Disk Cache

Cache stored on disk. Of all browser caches, the disk cache has the widest coverage. It uses fields in the HTTP headers to decide which resources to cache and which have expired and must be requested from the server again.

Service Worker

It runs on an independent thread, borrowing the idea of Web Workers: JS runs outside the main thread. Because it is detached from the browser window, it cannot access the DOM directly, but it can still do many things, such as:

- Offline caching (the Service Worker cache)
- Push messages
- Acting as a network proxy
It is a key building block of PWAs

Push Cache

That is, the push cache: the browser's last line of defense, which comes from HTTP/2 (server push)

Priority: Service Worker-->Memory Cache-->Disk Cache-->Push Cache.

Practice

After all this theory, you may still feel at a loss when it comes to a real project. How do you put it into practice?

Everything above is theory; only practice reveals the truth

Most front-end projects today are bundled with webpack or a similar tool. Configuring hashes in webpack's output file names takes care of most of the front-end side of caching

The effect we want to achieve:

HTML: negotiated cache
CSS, JS, images and other resources: strong cache, with a hash in the file name
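On the server, this split can be expressed as a per-file policy. The Node.js sketch below is a hypothetical helper, assuming the build emits hashed names like `main.3f2a9c1b.js` (a dot, 8 or more hex characters, then the extension); `immutable` (RFC 8246) additionally tells supporting browsers not to revalidate such files even on reload. Unhashed assets fall back to revalidation here, which is a conservative choice rather than a rule.

```javascript
// Hashed asset: "<name>.<8+ hex chars>.<ext>", e.g. main.3f2a9c1b.js
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(?:js|css|png|jpe?g|gif|svg|webp|woff2?)$/i;

function cacheControlFor(path) {
  if (path.endsWith('.html') || path.endsWith('/')) {
    return 'no-cache'; // HTML: always revalidate (negotiated cache)
  }
  if (HASHED_ASSET.test(path)) {
    // The name changes whenever the content changes, so a long strong cache is safe.
    return 'public, max-age=31536000, immutable';
  }
  return 'no-cache'; // unhashed assets: revalidate to avoid serving stale files
}

console.log(cacheControlFor('/static/main.3f2a9c1b.js')); // public, max-age=31536000, immutable
console.log(cacheControlFor('/index.html'));              // no-cache
```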

webpack has three kinds of hash: hash, chunkHash and contentHash

hash: tied to the build of the whole project; whenever any project file changes, the hash of the entire build changes
chunkHash: tied to the chunks webpack produces; different entries produce different chunkHash values
contentHash: derived from the content of the file; if the content is unchanged, the contentHash stays the same

Use contentHash for CSS and chunkHash for the other resources. If CSS extracted from a chunk used chunkHash, changing only the JS in that chunk would also change the CSS file name and needlessly invalidate its cache.
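The idea behind contentHash can be shown without webpack: derive the file name from the file's bytes. This Node.js sketch is an illustration only (`hashedFileName` is hypothetical, and SHA-256 truncated to 8 hex characters is an arbitrary choice; webpack's hash function and length are configurable). In a webpack config the equivalent is a placeholder such as `[name].[contenthash:8].js` in `output.filename`.

```javascript
const crypto = require('node:crypto');

// File name derived from content: identical bytes always give the identical name.
function hashedFileName(name, ext, content, length = 8) {
  const digest = crypto.createHash('sha256').update(content).digest('hex');
  return `${name}.${digest.slice(0, length)}.${ext}`;
}

const a = hashedFileName('main', 'css', 'body { color: red; }');
const b = hashedFileName('main', 'css', 'body { color: red; }');
const c = hashedFileName('main', 'css', 'body { color: blue; }');
console.log(a === b); // true: unchanged content keeps its name, so the cache stays valid
console.log(a === c); // false: changed content gets a new name, so the old cache is bypassed
```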

Projects without a front-end build pipeline

Traditional front-end pages are usually served from a static server, so modified files need manual version control as a caching strategy, for example adding a version number to the file name (index-v2.min.js) or a timestamp query parameter to the entry file (index.js?time=1626226)
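For hand-maintained pages, the timestamp approach amounts to rewriting asset URLs. A minimal Node.js sketch using the standard `URL` API; `withVersion` and the `time` parameter name are illustrative, and it assumes root-relative paths (the dummy base only exists so relative URLs can be parsed).

```javascript
// Add or replace a cache-busting query parameter on a root-relative URL.
function withVersion(url, key, value) {
  const u = new URL(url, 'http://placeholder.invalid'); // dummy base for parsing
  u.searchParams.set(key, String(value)); // replaces an existing value, else appends
  return u.pathname + u.search + u.hash;
}

console.log(withVersion('/js/index.js', 'time', 1626226)); // /js/index.js?time=1626226
```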

Backend cache practice

What actually enables caching is the caching policy set on the backend, which tells the browser whether and how it may cache. Below are two demos to experiment with, one for the strong cache and one for the negotiated cache.

Strong caching scheme

The code is as follows:

```javascript
const express = require('express');
const app = express();

const options = {
  etag: false,         // disable negotiated caching (no ETag header)
  lastModified: false, // disable negotiated caching (no Last-Modified header)
  setHeaders: (res, path, stat) => {
    res.set('Cache-Control', 'max-age=10'); // strong cache lifetime: 10 seconds
  },
};

app.use(express.static(__dirname + '/public', options));
app.listen(3008);
```

PS: The code comes from the article Graphical HTTP cache. Note when testing: under the strong cache, refreshing the page does not show the effect; it only shows when you navigate to the page again (for example, click a link and then go back).

[Figure: strong cache in effect]

Negotiated caching scheme

The code is as follows:

```javascript
const express = require('express');
const app = express();

const options = {
  etag: true,         // enable negotiated caching via ETag
  lastModified: true, // enable negotiated caching via Last-Modified
  setHeaders: (res, path, stat) => {
    res.set({
      'Cache-Control': 'max-age=0', // stale immediately, so the browser skips the strong cache
      'Pragma': 'no-cache',         // HTTP/1.0 fallback, as in the original demo
    });
  },
};

app.use(express.static(__dirname + '/public', options));
app.listen(3001);
```

The effect is as follows:

[Figure: negotiated cache in effect]

Summary

Why HTTP caching? To take load off the server and make pages load faster

How? Through HTTP's strong cache and negotiated cache. The strong cache suits resources that rarely change (imported libraries, JS, CSS, etc.), and the negotiated cache suits frequently updated files (such as HTML)

What is the strong cache? In HTTP/1.0 it is based on Expires, which is not accurate. HTTP/1.1 replaced it with Cache-Control; both can coexist, and Cache-Control takes precedence.

What is the negotiated cache? In HTTP/1.0 it is based on Last-Modified, the resource's last modification time, which is also imprecise. HTTP/1.1 replaced it with ETag; both can coexist, and ETag takes precedence.

Expires and Last-Modified are both based on points in time. That works in theory, but in practice clock differences and one-second granularity cause problems, hence the newer mechanisms.

While the strong cache is valid, the browser serves the resource from it; when the strong cache is disabled or has expired, the browser falls back to the negotiated cache

That is the author's understanding of HTTP caching.