
3. "Graphic HTTP" - HTTP Information in Messages

Key points

  1. HTTP request message structure.
  2. The difference between message and entity (payload), with some background concepts for both.
  3. Content negotiation: what it is, and the different ways content can be negotiated.

3.1 HTTP request message structure

The structure of the request and response messages is as follows:

[Figure: HTTP request message structure]

The following is an example of a request message and the corresponding response.

[Figure: example request and response messages]
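To make the structure in the figure concrete, here is a minimal Python sketch that assembles a raw HTTP/1.1 request message by hand (request line, header fields, an empty line, then the body) and splits it back into those parts. The path `/form/entry`, the host `example.com`, and the form data are made-up values for illustration only.

```python
CRLF = "\r\n"

def build_request(method: str, target: str, headers: dict, body: bytes = b"") -> bytes:
    """Return the raw bytes of an HTTP/1.1 request message."""
    request_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    if body:
        # Content-Length tells the receiver where the body ends.
        header_lines.append(f"Content-Length: {len(body)}")
    # Header section ends with an empty line (CRLF CRLF); the body follows.
    head = CRLF.join([request_line, *header_lines]) + CRLF + CRLF
    return head.encode("ascii") + body

def split_message(raw: bytes) -> tuple:
    """Split a raw message into (start line, header dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("ascii").split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body

raw = build_request(
    "POST", "/form/entry",
    {"Host": "example.com", "Content-Type": "application/x-www-form-urlencoded"},
    b"name=ueno&age=37",
)
print(split_message(raw))
```

A response message has the same layout; only the first line changes from a request line to a status line such as `HTTP/1.1 200 OK`.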

3.2 Differences between message and entity

HTTP can transmit data exactly as it is, but to improve transmission efficiency it can also "encode" the original data during transmission, by processing the entity carried in the request or response message.

Before going into details, we need to distinguish two terms: message and entity.

Message: the basic unit of HTTP communication. It consists of an octet sequence (an octet is 8 bits) and is transmitted through HTTP communication.

Entity: the payload data (supplementary items) transmitted in a request or response. Its content consists of entity headers and an entity body.

To understand the concept of an entity, we first need to understand what a payload is:

Payload: the actual data that needs to be transmitted, which is why it is also called the data entity. Everything else, such as headers and metadata (sometimes called overhead data), is sent only to assist the transmission of the payload.

Header: data prepended to a piece of data when it is stored or transmitted. This information describes the data area.

Metadata: … data that describes other data.

The word "entity" above is struck through because the term entity has been replaced by payload. Many explanations from RFC 2616, the version the book is based on, are now obsolete: RFC 2616 has been replaced by RFC 7230 through RFC 7235.

The following ticket discusses the difference between entity and payload, and why entity was replaced by payload:

#109 (Clarify entity / representation / variant terminology) – Hypertext Transfer Protocol Wiki (ietf.org)

Explanation of the change

Original text:

Replaced entity with payload and variant with representation. Cleaned up description of 204 status code (related to ticket #22 ) Rewrote section on Content-Location and refer to def in RFC2557 .

In addition, Wikipedia explains the everyday meaning of the term "payload". That description helps explain, indirectly, why the specification reinterpreted the concept of entity.

From Wikipedia "Payload":
A payload is an object carried by an aircraft or a launch vehicle. Sometimes payload also refers to the weight that an aircraft or launch vehicle can carry. Depending on the nature of the mission, the payload of the vehicle may be cargo, passengers, crew, munitions, scientific instruments or experiments, or other equipment. Extra fuel, when optionally carried, is also considered part of the payload, as in aerial refueling missions.

Personally, I find the explanation of payload (also translated as "load") easier to understand than that of entity, which is somewhat abstract, and nothing of the original meaning of entity is lost.

Comparing Chrome and Edge, we find that the current versions of both use the concept of payload. In earlier versions, this content was shown as the request entity of the message; at that time it was called the entity, which was clearly not rigorous.

This part has been quietly adjusted over the past two years. Clearly these browsers have followed the concept changes made during the subsequent RFC revisions. I wonder how many people noticed; it is another small detail.

[Figure: the "Payload" (负载) panel in Edge]

[Figure: the Payload panel in Chrome]

So the purpose of replacing the entity concept with the payload concept is to prevent confusion (the two really are easy to confuse). The entity itself is divided into headers and other information: the entity headers describe the payload, and the payload together with other information (the request line or status line, various header fields, and so on) is organized into a message for transmission.

The book contains a diagram that helps illustrate the difference between entity and message. It also explains why many explanations compare the message and the entity (payload) to an order and the goods it covers.

[Figure: request and response message structure]

More confusing concepts

In fact, there are two even more confusing terms: message body and payload body.

According to RFC 7230:

The message body (if any) of an HTTP message is used to carry the payload body of that request or response. The message body is identical to the payload body unless a transfer coding has been applied.

In other words, the message body differs from the payload body only when a transfer coding has been applied. The transfer coding mainly used today is Transfer-Encoding: chunked, so the distinction matters when you look at chunked transfer; otherwise, the payload body can simply be regarded as the body of the message.

The body of an HTTP message carries the entity body of the request or response. To optimize how the body is handled, later versions of HTTP implemented the following features:
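To make the RFC 7230 wording concrete, here is a minimal Python sketch that sends the same payload body with and without chunked transfer coding. The payload text and the 8-byte chunk size are arbitrary choices for illustration, and trailers are not used. Only in the chunked case does the message body differ from the payload body.

```python
payload_body = b"Hello, HTTP payload!"

def chunked_encode(data: bytes, chunk_size: int = 8) -> bytes:
    """Apply Transfer-Encoding: chunked to a payload body (no trailers)."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # Each chunk: size in hexadecimal, CRLF, chunk data, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    # The last chunk has size 0, followed by the final CRLF.
    out += b"0\r\n\r\n"
    return bytes(out)

# Without a transfer coding, the message body *is* the payload body.
plain_message_body = payload_body
# With chunked transfer coding, the message body wraps the payload body.
chunked_message_body = chunked_encode(payload_body)

print(plain_message_body == payload_body)    # True
print(chunked_message_body == payload_body)  # False
print(chunked_message_body)
```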

  1. Compressed transmission
  2. Chunked Transfer Coding
  3. Multipart object collections

Compressed transmission

The first thing to be clear about is that compression is applied to the payload, and it must be lossless so that no information is lost; otherwise the incomplete data would cause errors.

Common compression methods (content codings) are the following; gzip is the most widely used:

  • gzip (GNU zip)
  • compress (the standard UNIX compression program)
  • deflate (zlib)
  • identity (no encoding)

Compressed transmission comes at a cost: compressing and decompressing take computing time, so it increases the server's workload, but this overhead is generally acceptable.
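The lossless requirement can be checked with Python's standard library. This sketch maps the codings above to encode/decode functions and verifies that decoding restores the exact bytes; the sample text is made up, and "compress" is left out because the Python standard library has no LZW implementation.

```python
import gzip
import zlib

# Sample text payload; repetitive text compresses well.
original = ("<p>" + "HTTP content coding example. " * 50 + "</p>").encode("utf-8")

# Content-Encoding token -> (encode, decode). Every coding must be lossless.
CODINGS = {
    "gzip": (gzip.compress, gzip.decompress),
    "deflate": (zlib.compress, zlib.decompress),  # HTTP "deflate" = zlib-wrapped data
    "identity": (lambda data: data, lambda data: data),  # no encoding
}

for name, (encode, decode) in CODINGS.items():
    encoded = encode(original)
    assert decode(encoded) == original  # the receiver must get the exact bytes back
    print(f"{name}: {len(original)} -> {len(encoded)} bytes")
```

The printed sizes also show the trade-off mentioned below: fewer bytes on the wire in exchange for CPU time on both ends.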

Chunked Transfer Coding

Splitting the entity body into blocks is called chunked transfer coding (Chunked Transfer Coding). With chunked transfer coding, the entity body is split into multiple chunks; this is the Transfer-Encoding: chunked mentioned earlier.

Note that each chunk is prefixed with its size in hexadecimal, and the last chunk of the payload body is marked with "0(CR+LF)".
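On the receiving side, the hexadecimal sizes and the final "0(CR+LF)" are what let the client reassemble the payload. A minimal Python sketch of a decoder, assuming a complete, well-formed body with no trailer fields (chunk extensions after `;` are simply ignored):

```python
def chunked_decode(body: bytes) -> bytes:
    """Decode a Transfer-Encoding: chunked message body (trailers ignored)."""
    payload = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Chunk size is hexadecimal; drop any chunk extension after ';'.
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            # "0(CR+LF)" marks the last chunk: the payload is complete.
            return bytes(payload)
        payload += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

wire = b"8\r\nHello, H\r\n8\r\nTTP payl\r\n4\r\noad!\r\n0\r\n\r\n"
print(chunked_decode(wire))  # b'Hello, HTTP payload!'
```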

Multipart object collections

Multipart object collections mainly include the following types:

  • multipart/form-data: used when uploading files through web forms;
  • multipart/byteranges: used with status code 206 (Partial Content) when the response message contains multiple ranges of content.

To use a multipart object collection, you need to specify it in the Content-Type header field, together with a boundary parameter that separates the parts.

enctype attribute

The enctype attribute of an HTML form is the most representative attribute related to multipart object collections. It specifies how the form data is encoded on submission, which tells the server what type of data it will receive.

The most common practical use of multipart object collections is sending files through HTML forms. Files are binary data (or are treated as binary data), while all other fields are text. Since HTTP is a text-based protocol, binary data needs special handling.
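As an illustration of what a form with enctype="multipart/form-data" actually sends, here is a minimal Python sketch that builds such a body by hand. The field name `title`, the file field `upload`, and the file contents are hypothetical; a random boundary is used on the assumption that it does not occur inside any part.

```python
import uuid

def build_multipart(fields: dict, files: dict) -> tuple:
    """Return (Content-Type value, body bytes) for multipart/form-data.

    fields: name -> text value; files: name -> (filename, content_type, data).
    """
    boundary = uuid.uuid4().hex  # must not appear inside any part
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
            + value.encode("utf-8") + b"\r\n")
    for name, (filename, ctype, data) in files.items():
        parts.append(
            (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
             f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n').encode()
            + data + b"\r\n")  # binary file data goes in unchanged
    # The closing delimiter is the boundary followed by "--".
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", body

ctype, body = build_multipart(
    {"title": "photo"},
    {"upload": ("a.bin", "application/octet-stream", b"\x00\x01\x02")},
)
print(ctype)
print(body)
```

The text field and the binary file travel in the same body; the boundary, not the content, marks where each part starts and ends.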

3.3 Content Negotiation

A typical use case of content negotiation is internationalization. Content negotiation is somewhat like translation: the server and client negotiate the most suitable "intermediate" language for communication, and then exchange content using the agreed character set and encoding.

The decision is based on the following header fields:

  • Accept
  • Accept-Charset
  • Accept-Encoding
  • Accept-Language
  • Content-Language
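On the client side, negotiation starts with these headers. A minimal Python sketch that builds an Accept-Language value from a preference list, using decreasing quality values (q); the specific languages and the Accept/Accept-Encoding values are illustrative, not copied from any particular browser, and Accept-Charset is omitted for brevity.

```python
def accept_language(preferred: list) -> str:
    """Build an Accept-Language value from languages in preference order.

    The first language gets the implicit q=1; later ones get decreasing
    q-values, never below 0.1 (q=0 would mean "not acceptable").
    """
    parts = []
    for i, lang in enumerate(preferred):
        q = max(round(1 - i * 0.1, 1), 0.1)
        parts.append(lang if i == 0 else f"{lang};q={q}")
    return ", ".join(parts)

request_headers = {
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": accept_language(["zh-CN", "zh", "en"]),
}
print(request_headers["Accept-Language"])  # zh-CN, zh;q=0.9, en;q=0.8
```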

For example, the wiki's response headers carry this kind of information:

content-encoding: gzip
content-language: zh
content-length: 17396
accept-ch: Sec-CH-UA-Arch,Sec-CH-UA-Bitness,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-UA-Platform-Version

3.3.1 Content Negotiation Types

Content negotiation broadly works in the following ways:

  1. The client sets HTTP request headers and the server decides based on them (server-driven, or proactive, negotiation); this is the most standard form of content negotiation.
  2. The server returns 300 (Multiple Choices) or 406 (Not Acceptable) and the choice is left to the client (agent-driven, or reactive, negotiation).

Server-driven Negotiation

Content negotiation is performed on the server side. The client sends headers along with the URL to indicate its preferences, and the server selects the appropriate resource to return based on those preferences.

The advantage of server-driven negotiation is that it makes full use of the HTTP specification and needs no extra round trips: the client states its preferences once, and the decision is made entirely on the server side.

The cost of this advantage is increased server complexity, because the server has to "guess" what the client wants; clients may also end up sending more, and more complex, headers.
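A minimal Python sketch of the server's side of this negotiation: parse Accept-Language with its q-values, pick the best language the server actually has, and label the response. The available languages and the default are hypothetical, and the matching rule (exact tag, a prefix such as `zh` for `zh-CN`, or `*`) is a simplification of real language-range matching.

```python
def parse_quality_list(header: str) -> list:
    """Parse e.g. 'zh-CN, zh;q=0.9, en;q=0.8' into [(tag, q), ...], best first."""
    items = []
    for entry in header.split(","):
        tag, *params = [p.strip() for p in entry.split(";")]
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                q = float(value)
        if tag:
            items.append((tag.lower(), q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(items, key=lambda item: item[1], reverse=True)

def negotiate_language(accept_language: str, available: list, default: str) -> str:
    """Pick the best available language; fall back to a default."""
    for tag, q in parse_quality_list(accept_language):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if tag == "*" and available:
            return available[0]  # wildcard: any language will do
        for lang in available:
            # Exact match, or a prefix match such as 'zh' for 'zh-cn'.
            if lang.lower() == tag or tag.startswith(lang.lower() + "-"):
                return lang
    return default

chosen = negotiate_language("fr-CH, fr;q=0.9, zh-CN;q=0.8", ["en", "zh"], "en")
response_headers = {"Content-Language": chosen, "Vary": "Accept-Language"}
print(response_headers)
```

The server answers in a single round trip, which is the advantage described above; the parsing and matching logic is the extra server-side complexity.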

Agent-driven Negotiation

Here the negotiation is performed by the client. For example, the user chooses from a list of options displayed by the browser, or a script on the page switches automatically based on the browser type.

Note that if agent-driven negotiation cannot satisfy the client's request, it falls back to server-driven negotiation. The client needs to send a second request to obtain the content it wants (the first to get the list, the second to get the resource), so agent-driven negotiation is not a common approach.
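The two round trips can be simulated in a few lines of Python. Everything here is hypothetical: the in-memory "server", the paths, and the plain-text list of alternatives (HTTP does not standardize the format of the list in a 300 response). The point is only the request count.

```python
# Hypothetical in-memory "server" offering two variants of /doc.
VARIANTS = {"/doc.en.html": "Hello", "/doc.zh.html": "你好"}

def handle(path: str) -> tuple:
    """Return (status, headers, body) for a request path."""
    if path == "/doc":
        # First request: 300 Multiple Choices with a list of alternatives.
        return 300, {"Content-Type": "text/plain"}, "\n".join(VARIANTS)
    if path in VARIANTS:
        return 200, {"Content-Type": "text/html; charset=utf-8"}, VARIANTS[path]
    return 404, {}, ""

def agent_driven_fetch(path: str, choose) -> tuple:
    """Client side: fetch, and if offered choices, pick one and request again."""
    status, headers, body = handle(path)        # round trip 1
    requests_made = 1
    if status == 300:
        choice = choose(body.splitlines())      # user or page script picks
        status, headers, body = handle(choice)  # round trip 2
        requests_made += 1
    return status, body, requests_made

print(agent_driven_fetch("/doc", lambda options: options[1]))  # (200, '你好', 2)
```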

Proxy-driven content negotiation mechanism

As an improvement on the transparent proxy, proxy-driven negotiation mainly addresses the most significant pain point of server-side negotiation: the problem of scale.

The scale problem is that when a client requests a large number of resources and has to attach negotiation headers to each request, the request volume grows, and sending detailed preference information can also leak information about the user.

Note that proxy-driven negotiation differs from a transparent proxy. Proxy-driven negotiation relies on what HTTP supports as the response (reactive) negotiation mechanism, which creates a dependency: like client-driven negotiation, it returns a list of resources for the user to choose from, and a second request is then needed to obtain the required resource. A transparent proxy instead borrows the Vary header to achieve protocol compatibility, which is a bit of a "hack".

Therefore, although proxy-driven negotiation eases the problem by placing a "middleman" between the server and the client, it cannot avoid the second request.

Transparent Negotiation

Transparent proxies were later replaced by the proxy-driven content negotiation mechanism.

The transparent negotiation mechanism tries to take the work of server-driven negotiation off the server, using an intermediate proxy that acts on behalf of the client to minimize message exchanges with the client.

It combines server-driven and client-driven negotiation as a way for the server and client to negotiate content, but it was abandoned because it never gained acceptance.

HTTP itself does not specify transparent negotiation: the HTTP/1.1 specification defines no transparent negotiation mechanism, but it does define the Vary header, so transparent proxies mainly rely on this additional field for protocol compatibility.

What is the Vary response header? Introduced in HTTP/1.1, it is returned by the server in a negotiated response and lists the request headers the server used to make its choice. The biggest beneficiary is not the client but cache servers: when a cache server finds the Vary field, it can apply the transparent negotiation mechanism and serve the matching response on the server's behalf.
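Why caches benefit can be shown with a toy shared cache in Python: it stores one response per URL plus the values of the request headers named in Vary, so different languages of the same URL do not overwrite each other. This is a simplified sketch, not how any real cache server works; it does not handle `Vary: *` or normalize header values (it matches them byte for byte).

```python
class VaryCache:
    """Toy shared cache keyed by URL plus the request headers listed in Vary."""

    def __init__(self):
        self.vary_by_url = {}  # url -> tuple of lowercase header names from Vary
        self.store = {}        # (url, header values) -> cached response body

    def _key(self, url: str, request_headers: dict) -> tuple:
        lowered = {k.lower(): v for k, v in request_headers.items()}
        names = self.vary_by_url.get(url, ())
        return (url, tuple(lowered.get(name, "") for name in names))

    def put(self, url: str, request_headers: dict, response_headers: dict, body: str):
        vary = response_headers.get("Vary", "")
        self.vary_by_url[url] = tuple(
            n.strip().lower() for n in vary.split(",") if n.strip())
        self.store[self._key(url, request_headers)] = body

    def get(self, url: str, request_headers: dict):
        return self.store.get(self._key(url, request_headers))  # None = cache miss

cache = VaryCache()
cache.put("/", {"Accept-Language": "zh"}, {"Vary": "Accept-Language"}, "中文页面")
cache.put("/", {"Accept-Language": "en"}, {"Vary": "Accept-Language"}, "English page")
print(cache.get("/", {"accept-language": "en"}))  # English page
print(cache.get("/", {"Accept-Language": "fr"}))  # None: must ask the origin server
```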

What is a cache server? Please see the introduction to the concept of load balancing in this article [["How the Network is Connected" Reading Notes - Summary]].

Alternates header

It likewise never gained acceptance and was abandoned. I could not find much information about it online, so it can be ignored.

3.4 Summary

Several content negotiation methods were introduced above. If you look closely at current websites, you will find that server-driven negotiation and the proxy-driven content negotiation mechanism are the dominant ones.

Server-driven negotiation lets the web service provider push preferred content based on the user's request without a second request, which saves bandwidth; it suits the vast majority of web users. Of course, the user experience depends on the skill of the server-side developers.

The proxy-driven content negotiation mechanism is mostly used by websites that support internationalization, such as large online stores and encyclopedias. Apple and Wikipedia, for example, offer a "suggestion" prompt asking users which language they want to browse in.

With client-driven (agent-driven) negotiation, the initiative lies with the user. The server cannot control it, which is bad for commercial promotion, so most websites "block" this approach. Proxy-driven negotiation, on the other hand, reduces the pressure on the server while remaining compatible with clients, so given the characteristics of client-driven negotiation, it is natural that it was replaced by proxy-driven negotiation.

Finally, there is the transparent proxy. The custom protocol it relies on never became common, so it is natural that it was eliminated and replaced by proxy-driven negotiation.


Xander