Introduction

WebSocket is a network protocol built on TCP for real-time communication between a client and a server. It is easy to use: the simplest approach is to communicate with the server directly through the browser's WebSocket API.

This article analyzes the WebSocket message exchange format in depth, so that you can understand how WebSocket works.

WebSocket handshake process

To stay compatible with HTTP, a WebSocket connection starts as an HTTP connection and is then upgraded. After the HTTP connection between the client and the server is established, the client sends the server a request to upgrade the protocol to WebSocket, as shown below:

GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Note that the HTTP version here must be 1.1 or higher, and the HTTP request method must be GET.

Setting the Upgrade and Connection headers indicates that the client wants to upgrade to WebSocket.

Besides the headers listed here, other HTTP headers may also be included.

Two headers here are special: Sec-WebSocket-Version and Sec-WebSocket-Key.

First look at Sec-WebSocket-Version, which is the version of the WebSocket protocol requested by the client. If the server does not understand the request sent by the client, it returns a 400 ("Bad Request") response containing a failure message.

If the server does not understand the Sec-WebSocket-Version sent by the client, it also returns a Sec-WebSocket-Version header that tells the client which versions it supports.
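
The checks described above can be sketched as a small function. This is only an illustration under assumptions: the name checkUpgradeRequest, its parameters, and the shape of the returned object are hypothetical, header names are assumed to be lower-cased already, and the 426 status used for a version mismatch is the one RFC 6455 suggests rather than something this article specifies.

```javascript
// Hypothetical helper: applies the handshake rules described above.
// `headers` is assumed to be a plain object with lower-cased header names.
const SUPPORTED_VERSION = "13";

function checkUpgradeRequest(method, httpVersion, headers) {
  const upgrade = (headers["upgrade"] || "").toLowerCase();
  // Connection may hold several comma-separated tokens, e.g. "keep-alive, Upgrade".
  const connection = (headers["connection"] || "")
    .toLowerCase()
    .split(",")
    .map((token) => token.trim());

  const malformed =
    method !== "GET" ||
    !(parseFloat(httpVersion) >= 1.1) ||
    upgrade !== "websocket" ||
    !connection.includes("upgrade") ||
    !headers["sec-websocket-key"];
  if (malformed) {
    return { status: 400, headers: {} };
  }

  if (headers["sec-websocket-version"] !== SUPPORTED_VERSION) {
    // Assumption: 426 Upgrade Required, plus the version this server understands.
    return { status: 426, headers: { "Sec-WebSocket-Version": SUPPORTED_VERSION } };
  }
  return { status: 101, headers: {} };
}
```

A real server would go on to add the Upgrade, Connection, and Sec-WebSocket-Accept headers to the 101 response.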

The other header that deserves special attention is Sec-WebSocket-Key. Let's look at what this field is used for.

When the server receives the client's request, it will return a response to the client, telling the client that the protocol has been upgraded from HTTP to WebSocket.

The response returned might look like this:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The Sec-WebSocket-Accept value here is derived from the Sec-WebSocket-Key in the client request. Specifically, the server concatenates the Sec-WebSocket-Key sent by the client with the string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", then computes the SHA-1 hash of the result.

Finally, it base64-encodes the hash value.

After the server returns Sec-WebSocket-Accept, the client verifies it, and the handshake is complete.
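
As a minimal sketch of the derivation just described, assuming a Node.js environment (the function name computeAccept is hypothetical), the key from the sample request above produces the value shown in the sample response:

```javascript
const { createHash } = require("crypto");

// Fixed string that is appended to the client's key.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Hypothetical helper: derives Sec-WebSocket-Accept from Sec-WebSocket-Key.
function computeAccept(secWebSocketKey) {
  return createHash("sha1")           // SHA-1 of key + GUID
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");                // base64-encode the raw hash bytes
}

console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client can run the same computation on the key it sent and compare the result with the header it received.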

WebSocket message format

The reason to use WebSocket is that the client and the server can each send messages at any time. This is the magic of WebSocket. So what format are the messages sent in? Let's look at it in detail.

Messages between the client and the server are transmitted as a sequence of frames. The frame format is as follows:


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+

MASK indicates whether the payload is masked. For frames sent by the client, MASK must be 1. If the server receives a client frame whose MASK bit is not 1, it must close the connection. Frames sent from the server to the client are not masked, so the MASK bit does not need to be set.

RSV1-3 are reserved bits for extensions and can be ignored when no extension is in use.

opcode indicates how to interpret the payload, which is the actual message being delivered. 0x0 means a continuation frame, 0x1 means text, and 0x2 means binary. 0x8 to 0xF are control frames (0x8 is close, 0x9 is ping, 0xA is pong), and the remaining values are reserved.

FIN indicates whether it is the last frame of the message. If it is 0, it means that the message has more frames. If it is 1, it means that the frame is the last part of the message, and the message can be processed.
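
To make the bit layout of these fields concrete, here is a small sketch that splits the first two bytes of a frame into the fields described above. The function name parseFrameHead and the property names of the returned object are hypothetical.

```javascript
// Hypothetical helper: splits the first two bytes of a frame into its fields.
function parseFrameHead(byte0, byte1) {
  return {
    fin: (byte0 & 0x80) !== 0,    // first bit of the frame
    rsv: (byte0 & 0x70) >> 4,     // RSV1-3 packed into one 3-bit number
    opcode: byte0 & 0x0f,         // low 4 bits of the first byte
    masked: (byte1 & 0x80) !== 0, // MASK bit
    payloadLen: byte1 & 0x7f,     // 7-bit length, or the marker 126 / 127
  };
}

// The bytes 0x81 0x85 start a masked, single-frame text message of 5 bytes:
// fin is true, rsv is 0, opcode is 1, masked is true, payloadLen is 5.
console.log(parseFrameHead(0x81, 0x85));
```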

Why is the Payload len field needed? Because the receiver needs to know when to stop reading a frame, so the frame must carry a field that gives the length of the payload.

How is the payload length parsed? This is a little more complicated.

  1. First read bits 9-15 and parse them as an unsigned integer. If the value is 125 or less, it is the length of the payload, and you are done. If it is 126, go to step 2. If it is 127, go to step 3.
  2. Read the next 16 bits, parse them as an unsigned integer, and you are done.
  3. Read the next 64 bits and parse them as an unsigned integer (the most significant bit must be 0). You are done.
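
The three steps above can be sketched as follows. This is an illustration only: the function name readPayloadLength and its return shape are hypothetical, the frame is assumed to be a Uint8Array that already contains the whole header, and lengths beyond Number.MAX_SAFE_INTEGER are rejected to keep the sketch simple.

```javascript
// Hypothetical helper: reads the payload length from a frame held in a Uint8Array.
// Returns the length and the offset of the first byte after the length field.
function readPayloadLength(frame) {
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const len7 = frame[1] & 0x7f; // bits 9-15 of the frame

  if (len7 <= 125) {
    return { length: len7, next: 2 }; // step 1: the value is the length
  }
  if (len7 === 126) {
    // step 2: 16-bit unsigned integer in big-endian (network) byte order
    return { length: view.getUint16(2), next: 4 };
  }
  // step 3: 64-bit unsigned integer in big-endian byte order
  const big = view.getBigUint64(2);
  if (big > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError("payload too large for this sketch");
  }
  return { length: Number(big), next: 10 };
}

console.log(readPayloadLength(Uint8Array.of(0x82, 0x7e, 0x01, 0x00)));
// length is 256, next is 4
```

The next offset is where the masking key starts when MASK is set, or where the payload starts otherwise.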

If MASK is set, read the next 4 bytes (32 bits). This is the masking key. Once the payload has been read, we have the encoded payload, ENCODED, and the masking key, MASK. The decoding logic is as follows:

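// ENCODED holds the masked payload bytes; MASK holds the 4-byte masking key.
var DECODED = new Uint8Array(ENCODED.length);
for (var i = 0; i < ENCODED.length; i++) {
    // XOR each byte with the key byte at position i mod 4.
    DECODED[i] = ENCODED[i] ^ MASK[i % 4];
}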
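
As a worked example of this decoding, the sketch below wraps the XOR loop in a function and applies it to the masked "Hello" sample from RFC 6455. The function name unmask and the variable names are hypothetical.

```javascript
// Hypothetical helper: XORs each payload byte with the 4-byte masking key.
function unmask(encoded, maskKey) {
  const decoded = new Uint8Array(encoded.length);
  for (let i = 0; i < encoded.length; i++) {
    decoded[i] = encoded[i] ^ maskKey[i % 4];
  }
  return decoded;
}

// Masked "Hello" sample: key 37 fa 21 3d, payload 7f 9f 4d 51 58.
const sampleKey = Uint8Array.of(0x37, 0xfa, 0x21, 0x3d);
const sampleMasked = Uint8Array.of(0x7f, 0x9f, 0x4d, 0x51, 0x58);
console.log(new TextDecoder().decode(unmask(sampleMasked, sampleKey)));
// prints Hello
```

Because XOR is its own inverse, applying the same function to the decoded bytes with the same key yields the masked bytes again, so a client could use it for masking as well.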

FIN can be used in conjunction with opcode to send long messages.

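FIN=1 means this is the last frame of the message. Opcode 0x1 means a text message, 0x2 means a binary message, and 0x0 means a continuation of the previous frame. A message split into three frames is therefore sent as a first frame with FIN=0 and opcode 0x1 (or 0x2), a middle frame with FIN=0 and opcode 0x0, and a final frame with FIN=1 and opcode 0x0.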
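
One way a receiver might reassemble a long message is sketched below. It is a simplification under assumptions: the name createAssembler and the frame shape { fin, opcode, payload } are hypothetical, payloads are assumed to be unmasked Uint8Array values, and only data frames (opcodes 0x0 to 0x2) are passed in, so control frames are not handled.

```javascript
// Hypothetical sketch: collects data frames until one arrives with FIN set.
function createAssembler() {
  let type = null; // opcode of the first frame: 0x1 text or 0x2 binary
  let chunks = [];

  return function push(frame) {
    if (frame.opcode === 0x0) {
      if (type === null) throw new Error("continuation frame without a start");
    } else {
      if (type !== null) throw new Error("new message started before FIN");
      type = frame.opcode;
    }
    chunks.push(frame.payload);
    if (!frame.fin) return null; // more frames to come

    // FIN is set: join all chunks into one buffer.
    const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
    const data = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
      data.set(chunk, offset);
      offset += chunk.length;
    }
    const message = { type, data };
    type = null;
    chunks = [];
    return message;
  };
}
```

Each call to push returns null until the final frame arrives, at which point it returns the complete message and resets itself for the next one.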

Extensions and Subprotocols

During the handshake, on top of the standard WebSocket protocol, the client can also request Extensions or Subprotocols. What is the difference between the two?

Both are set through HTTP headers, but they differ considerably. Extensions can control the WebSocket frame and modify the payload, while subprotocols only define the structure of the payload and never modify it.

Extensions are optional and general-purpose, while a subprotocol, once negotiated, is mandatory for that connection and specific to the application.

You can think of extensions as something like data compression: they compress or optimize data on top of WebSocket, which makes the messages sent shorter.

Subprotocols define the format of the messages, such as soap or wamp.

A subprotocol is a protocol built on top of the WebSocket protocol. It mainly serves specific scenarios and acts as a stricter specification layered on WebSocket.

For example, when the client sends the handshake request, it lists the subprotocols it wants in the Sec-WebSocket-Protocol header:

GET /socket HTTP/1.1
...
Sec-WebSocket-Protocol: soap, wamp

The server responds with one of the subprotocols it supports, such as:

Sec-WebSocket-Protocol: soap
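
A server's choice could be sketched like this. The function name selectSubprotocol and the policy of taking the first client-offered name that the server supports are assumptions made for illustration, not a rule stated in this article.

```javascript
// Hypothetical helper: picks the first client-offered subprotocol the server supports.
// `headerValue` is the raw Sec-WebSocket-Protocol request header; `supported` is an array.
function selectSubprotocol(headerValue, supported) {
  const offered = (headerValue || "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  const match = offered.find((name) => supported.includes(name));
  return match === undefined ? null : match;
}

console.log(selectSubprotocol("soap, wamp", ["soap"]));
// prints soap
```

When the function returns null, no offered subprotocol is supported, and a server using this sketch would leave Sec-WebSocket-Protocol out of its response.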

Summary

This article explained the format of WebSocket messages. As you can see, even a powerful protocol is built from very basic structures.

This article is also published at http://www.flydean.com/07-websocket-message/


flydean