[Vernacular Popular Science] What does 404 mean when I surf the Internet?
With the development and progress of the Internet age, our study, work and life have long been inseparable from the Internet. Smart homes, online shopping, and daily travel all require the support of the Internet. The Internet has brought a lot of convenience to life.
Have you ever run into this situation? You are browsing on a phone or computer, or you click a result in a search engine, and instead of the page you wanted, the browser shows a nearly blank page that says "404 Not Found".
Any long-time Internet user will be familiar with the number "404". This error code means that the server could not find the requested file, usually because the page has been moved or removed, or because the wrong address was entered.
So why is 404, rather than some other number, used to indicate that a resource does not exist? There is a "legend" on the Internet about the birth of 404. It is rumored that before the third technological revolution, the entire Internet was like one large central database housed in a room numbered 404. At that time, all access requests were handled by hand. If the requested file could not be found in Room 404, or the requester had written down the wrong file number, the staff would return the message "Room 404: File Not Found".
Of course, research shows that the legendary Room 404 never actually existed. The true origin of 404 has to start with the HTTP protocol, the foundation of the Web.
The origin of the status code
As we all know, the establishment of the Internet has broken geographical restrictions. Through the communication between the browser and the server, we can know the world without leaving the house. The communication between the browser and the server is through the HTTP protocol.
HTTP (Hypertext Transfer Protocol) is an application layer protocol. Because it is simple and fast, it is well suited to distributed, collaborative hypermedia information systems. It has been used by the World Wide Web (WWW) global information service since 1990.
When a user surfs the Internet, the browser sends a request to the server over HTTP and then displays the content returned by the server on the local device.
Underneath HTTP, the hard-working TCP/IP protocol suite is responsible for the actual data transmission. From this point of view, the "Hypertext Transfer Protocol" does not do the low-level transmission itself, so the name may seem like a bit of a misnomer. So why is HTTP still called a transfer protocol? Because what it transfers is the content of the message.
The HTTP specification defines the message format in detail, specifying its components, parsing rules, and processing strategies, so HTTP can implement richer and more flexible functions on top of the data transmission provided by the TCP/IP layer.
A TCP segment places a header of at least 20 bytes before the actual data to be transmitted. The header stores the additional information TCP needs, such as the sender's port number, the receiver's port number, the sequence number, flag bits, and so on. With this TCP header, the data can be delivered correctly. After the header is removed at the destination, the real data is obtained.
HTTP also adds header data before the actual transmitted data. However, unlike TCP, HTTP (in its classic HTTP/1.x form) is a "plain text" protocol: the header data is all ASCII text, which can be read with the naked eye and understood without resorting to a parsing program.
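As a rough illustration of the fixed 20-byte TCP header described above, the sketch below packs and unpacks one with Python's `struct` module. The field values (ports 54321 and 80, sequence number 1000, a zero checksum) are made up for the example, and TCP options are not interpreted.

```python
import struct

# Fixed part of a TCP header (20 bytes, network byte order):
# source port, destination port, sequence number, acknowledgment number,
# data offset + reserved bits, flags, window size, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def parse_tcp_header(segment: bytes) -> dict:
    """Split a TCP segment into a few header fields and the payload."""
    (src, dst, seq, ack, offset_byte, flags,
     window, checksum, urgent) = TCP_HEADER.unpack(segment[:20])
    header_len = (offset_byte >> 4) * 4  # data offset is counted in 32-bit words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "syn": bool(flags & 0x02),       # SYN flag bit
        "ack_flag": bool(flags & 0x10),  # ACK flag bit
        "payload": segment[header_len:],
    }


# Hypothetical segment: a header without options (offset = 5 words) plus payload.
header = TCP_HEADER.pack(54321, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
info = parse_tcp_header(header + b"hello")
print(info["src_port"], info["dst_port"], info["seq"], info["payload"])
```

Once the header is stripped, only the payload `b"hello"` remains, which is the "real data" the receiver is after.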
The request message and response message of HTTP have basically the same structure, made up of three main parts:
- Start line: describes the basic information of the message; in a request it is the request line, and in a response it is the status line, that is, the status of the server's response;
- Header field set (header): use the key-value form to describe the message in more detail;
- Message body (entity): The actual response data, which is not necessarily plain text, but can be binary data such as pictures and videos.
The status line and the header fields are often collectively referred to as the "response header". The message body is also called the "entity" and, as the counterpart of the "header", is in many cases simply called the "body".
The HTTP protocol stipulates that the message must have a header, but it can have no body, and there must be a "blank line" after the header, that is, "CRLF", hexadecimal "0D0A".
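The header, blank line, and body layout can be sketched in a few lines of Python. This is a minimal illustration for HTTP/1.x text messages only; the function name `split_http_message` and the sample response are assumptions made for the example, not part of any library.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, header fields, and body."""
    # The first blank line (CRLF CRLF, hex 0D0A 0D0A) separates header from body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical response used only for illustration.
raw = (b"HTTP/1.1 404 Not Found\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 9\r\n"
       b"\r\n"
       b"Not Found")
start_line, headers, body = split_http_message(raw)
print(start_line)
print(headers)
print(body)
```

A message with no body simply ends after the blank line, in which case `body` comes back as empty bytes.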
Take the response header returned by a cloud storage file upload interface as an example. The first line, "HTTP/2 200 OK", is the status line, which consists of three parts:
- Version number: indicates the HTTP protocol version used by the message, which is HTTP/2 in this example;
- Status code: a three-digit number that expresses the result of the processing in the form of a code, for example, 200 means success, 404 means the resource does not exist;
- Reason phrase: As a supplement to the digital status code, it is a short text description of the status code, such as "OK" and "Not Found".
The lines that follow, such as "Content-Type" and "Connection", belong to the header. The message ends with a blank line and has no body.
Many HTTP messages have only a header and no body. Although the HTTP protocol does not limit the size of the header, an overly large header can take up a lot of server resources and hurt efficiency, so web servers generally do not allow excessively large request headers. Even so, plenty of "big head" messages are still running around on the Internet.
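The three parts of a status line can be pulled apart with a simple split. This is a sketch with a hypothetical helper name, `parse_status_line`; it treats the reason phrase as optional, since some tools display a status line without one.

```python
def parse_status_line(line: str):
    """Return (version, status code, reason phrase) from an HTTP status line."""
    parts = line.split(" ", 2)  # the reason phrase itself may contain spaces
    if len(parts) < 2:
        raise ValueError(f"not a status line: {line!r}")
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) == 3 else ""
    return version, code, reason


print(parse_status_line("HTTP/2 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```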
In order to reduce the resources occupied by the "big head" as much as possible, and to reduce the time for detecting wrong address access, websites generally choose status codes to bear this responsibility, because numbers can better reduce the size of HTTP message headers than words.
Through the status code in the response message, the client can quickly tell whether its request was processed correctly. The server picks the most appropriate status code for the outcome of the request, which clearly informs the client of the result and lets the client decide what to do next.
Currently, the RFC standard defines 41 status codes, and the set can be extended. Web servers such as Apache and Nginx define some proprietary status codes of their own, and when developing web applications we can also define our own proprietary status codes, as long as they do not conflict with existing ones.
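For a quick look at standard versus proprietary codes, Python's standard library exposes the registered status codes as the `http.HTTPStatus` enum. The sketch below looks up the reason phrase and falls back to a generic label for codes the enum does not know; 499 is used as the example because Nginx uses it as a non-standard code. The `describe` helper is hypothetical.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the code with its standard reason phrase, if it has one."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Not part of Python's registry of standard codes.
        return f"{code} (non-standard, possibly a proprietary code)"


print(describe(200))
print(describe(404))
print(describe(499))
```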
Common status codes
Next, let's look in detail at what each common status code means.
The purpose of a status code is to express the "status" of HTTP data processing, so that the client can switch its handling in time according to the code. The status code specified in the RFC standard is a three-digit decimal number, with a possible range of 000 to 999. Status codes follow a fixed design and are divided into five classes, with the first digit indicating the class; 0~99 are not used. As a result, the range actually used is much smaller than 000~999: it is 100~599.
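Because the first digit carries the class, classifying a code takes only an integer division. A minimal sketch follows; the class labels are informal descriptions chosen for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code in 100..599 to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"status code out of range: {code}")
    return STATUS_CLASSES[code // 100]


for code in (101, 200, 301, 404, 503):
    print(code, status_class(code))
```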
The 1×× status code belongs to the prompt information, which is the intermediate state of the protocol processing, and it is rarely used in practice.
What we can occasionally see is "101 Switching Protocols". It means that the client uses the Upgrade header field and requires that the HTTP protocol be changed to another protocol to continue communication, such as WebSocket. If the server also agrees to change the protocol, it will send a status code of 101, but the data transmission after that will no longer use HTTP.
"100 Continue" is another one. It indicates that everything is normal so far and the client should continue the request, or ignore the response if the request has already been completed. It usually appears in file uploads.
The 2×× status code indicates that the server has received and successfully processed the client's request. It is the status code the client most wants to see.
"200 OK" is the most common success status code, indicating that everything is normal and the server returned the processing result as expected by the client.
"204 No Content" is another very common success status code. Its meaning is basically the same as "200 OK", but there is no body data after the response header.
"206 Partial Content" is generally the basis for chunked downloads or resumable transfers. It appears when the client sends a "range request" for part of a resource. Like 200, it means the server processed the request successfully, but the body contains only part of the resource rather than all of it. Status code 206 is usually accompanied by the header field "Content-Range", which tells the client the specific range of the body data in the response. For example, "Content-Range: bytes 0-66/888" means that this response carries bytes 0 through 66 (the first 67 bytes) of a resource that is 888 bytes in total.
The 3×× status code indicates that the resource requested by the client has changed, and the client must send a new request with a new URI to obtain it. This is commonly referred to as "redirection" and includes the famous 301 and 302 jumps.
"301 Moved Permanently" is commonly known as "permanent redirection". It means that the requested resource no longer exists at this address and must be accessed again with a new URI. Similar to it is "302 Found", whose reason phrase used to be "Moved Temporarily". It is commonly known as "temporary redirection" and means that the requested resource is still there, but for the time being it needs to be accessed with another URI.
"304 Not Modified" is an interesting status code. It is used for conditional requests such as If-Modified-Since and indicates that the resource has not been modified, so it serves cache control. It does not have the usual meaning of redirection, but can be understood as "redirecting to the file that has already been cached" (i.e., "cache redirection").
The 4×× status code indicates that the request message sent by the client is wrong and the server cannot process it. It is an "error code" in the true sense.
"400 Bad Request" is a general error code. It indicates that there is an error in the request message, but does not say whether the data format is wrong, a request header is missing, or something else went wrong. Therefore, during web development we generally try to avoid returning 400 to the client and use other status codes with a clearer meaning instead.
"403 Forbidden" is not actually an error in the client's request, but it means that the server forbids access to resources. The reasons may vary, such as sensitive information, legal prohibitions, etc.
"404 Not Found" is probably the most common status code we see. It generally means that the resource is not found on this server, so it cannot be provided to the client.
Some of the remaining codes in 4×× clearly explain the cause of the error, and they are all well understood. Commonly used in development are:
- 405 Method Not Allowed: Some methods are not allowed to operate resources, for example, POST is not allowed but only GET;
- 406 Not Acceptable: The resource cannot meet the conditions requested by the client, for example, the client asks for Chinese but only English is available;
- 408 Request Timeout: The request timed out, and the server waited too long;
- 409 Conflict: multiple requests conflict, which can be understood as a race condition when multiple threads are concurrent;
- 413 Request Entity Too Large: The body in the request message is too large;
- 414 Request-URI Too Long: The URI in the request line is too long;
- 429 Too Many Requests: The client sent too many requests, which triggered the server's restriction;
- 431 Request Header Fields Too Large: A field or the whole of the request header is too large.
The 5×× status code indicates that the client's request message is correct, but an internal error occurred while the server was processing it, so the response data could not be returned. It is an "error code" on the server side.
"500 Internal Server Error" is similar to 400: it is also a general error code, and it does not tell us what actually went wrong on the server. Here, though, the vagueness is usually deliberate, because developers generally do not return the details of an internal server error to the client. Although this makes debugging harder, it prevents attackers from snooping on or analyzing the server.
"501 Not Implemented" means that the function requested by the client is not yet supported, which is similar to the meaning of "opening soon, so stay tuned".
"502 Bad Gateway" is usually the error code returned when the server is acting as a gateway or proxy, indicating that the server itself is working normally and an error occurred when accessing the back-end server, but the specific cause of the error is also unknown.
"503 Service Unavailable" means that the server is currently busy and cannot respond to the service temporarily. The prompt message "The network service is busy, please try again later" that we sometimes encounter when surfing the Internet is the status code 503.
How to deal with 404
Back to the 404 problem we mentioned at the beginning. In actual business, it is inevitable that users will enter a wrong link address and request non-existent resources, or that the server will become unreachable due to a sudden failure. However, the default error pages provided by web servers, whether Nginx, Apache or IIS, are not very attractive. They are plain, dull, and not user-friendly, and they fail to give users intuitive and clear information, which degrades the user experience.
Therefore, many developers use custom error pages to enhance user experience and avoid user loss. Take 404 as an example. The common practice for custom 404 pages is to place website quick navigation links, search boxes, and special services provided by the website on the page, which can effectively help users access the site and obtain the information they need.
For example, many developers use the "Baby Go Home-Charity 404 Project" provided by Tencent Charity. Developers reference a piece of code in their custom 404 page. When a user requests a resource that does not exist, the page says so and, at the same time, loads information about missing children. This information can then spread quickly across the Internet, increasing the chance that missing children are found. This kind of practice gives technology warmth and embodies humanistic care, which is exactly where the romance of technology lies.
If you don't know how to customize the error response page but would really like to have one, you can look at the custom page function of a cloud CDN or cloud storage service. It can help you quickly configure 4XX and 5XX error response pages. Just open the console and configure the error responses and error pages according to your own needs, which is convenient and easy to use.
In addition, you can also use edge rules to map different error codes to different URL redirects, URL rewrites, and other page guidance operations.
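To make the idea of a custom 404 page concrete, here is a minimal sketch using Python's built-in `http.server`. Everything site-specific in it is hypothetical: the `PAGES` table, the HTML of the error page, the `/search` form target, and port 8000. A production site would normally configure this in its web server or CDN rather than hand-write a handler.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: path -> HTML body.
PAGES = {"/": b"<h1>Home</h1>"}

# Custom 404 page with a navigation link and a search box.
NOT_FOUND_PAGE = (
    b"<h1>404 - Page not found</h1>"
    b"<p>The page may have been moved or removed.</p>"
    b'<p><a href="/">Back to the home page</a></p>'
    b'<form action="/search"><input name="q" placeholder="Search the site"></form>'
)


def build_response(path: str):
    """Return (status code, HTML body) for a request path."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, NOT_FOUND_PAGE


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        self.send_response(status)  # still a real 404, only the page is nicer
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the demo server until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and visiting any unknown path on `http://127.0.0.1:8000/` should show the custom page. Note that the status code stays 404, so search engines and other clients still learn that the resource does not exist.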