This article is shared by the author "Abao Brother". It is based on the original article "WebSocket You Don't Know", with revisions and changes.
1. Introduction
This article explores WebSocket from all angles: basic concepts, technical principles, commonly misunderstood points, and hands-on practice.
After reading this article, you will:
1) Understand the background of the birth of WebSocket, what WebSocket is and its advantages;
2) Understand what APIs WebSocket contains and how to use WebSocket API to send ordinary text and binary data;
3) Understand the WebSocket handshake protocol, data frame format, mask algorithm and other related knowledge;
4) Understand the relationship between WebSocket and http, long polling, sockets, etc., and clarify common-sense misunderstandings;
5) Know how to implement a WebSocket server that supports sending ordinary text.
Learning and discussion:
- Group 5 for instant messaging/push technology development and discussion: 215477170 [recommended]
- Introduction to Mobile IM Development: "One entry is enough for novices: Develop mobile IM from scratch"
- Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK
(This article was published synchronously at: http://www.52im.net/thread-3713-1-1.html)
2. About the author
Author's screen name: A Baoge
Personal blog: http://www.semlinker.com/
Author Github: https://github.com/semlinker/
3. What is WebSocket
3.1 Background of the Birth of WebSocket
In the early days, many websites used polling (also called short polling) in order to implement push technology. Polling means that the browser sends HTTP requests to the server at regular intervals, and then the server returns the latest data to the client.
Common polling methods are divided into polling and long polling. The difference between them is shown in the following figure:
In order to more intuitively feel the difference between polling and long polling, let's take a look at the specific code:
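A minimal sketch of both approaches might look like this. It is only an illustration: `fetchFn` stands in for the browser's fetch(), the "/updates" endpoint is an assumed name, and the function names `startPolling` and `longPoll` are hypothetical.

```javascript
// Hypothetical sketch: `fetchFn` stands in for the browser's fetch(), and
// "/updates" is an assumed endpoint used only for illustration.

// Short polling: ask on a fixed interval, whether or not there is new data.
function startPolling(fetchFn, onData, intervalMs) {
  const timer = setInterval(async () => {
    const response = await fetchFn("/updates");
    onData(await response.text());
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop
}

// Long polling: the server holds each request open until it has data,
// and the client re-issues the request as soon as a response arrives.
async function longPoll(fetchFn, onData, shouldContinue) {
  while (shouldContinue()) {
    const response = await fetchFn("/updates");
    onData(await response.text());
  }
}
```

In the browser, these would be called as, for example, `startPolling(fetch, console.log, 5000)`. In both cases every round trip is a full HTTP request with its own headers.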
This traditional model has an obvious disadvantage: the browser must continuously send requests to the server. HTTP requests and responses may carry long headers while the actual payload is only a small part of the message, so a lot of bandwidth is wasted.
PS: For the history of short polling and long polling, see these two articles: "Beginner's Post: The most comprehensive Web-side instant messaging technology principle in history" and "Web-side instant messaging technology inventory: short polling, Comet, WebSocket, SSE".
A relatively newer polling technique is Comet. Although it can achieve two-way communication, it still requires repeated requests, and the long-lived HTTP connections commonly used in Comet also consume server resources.
Against this background, HTML5 defined the WebSocket protocol, which saves server resources and bandwidth and enables communication that is closer to real time.
WebSocket uses the ws and wss uniform resource identifier (URI) schemes, where wss means WebSocket over TLS.
For example:
ws://echo.websocket.org
wss://echo.websocket.org
WebSocket uses the same TCP ports as HTTP and HTTPS, so it can pass through most firewall restrictions.
By default:
- 1) The WebSocket protocol uses port 80;
- 2) If running on TLS, port 443 is used by default.
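As a small illustration of these defaults, the sketch below works out which port a ws or wss URL resolves to. The helper name `effectivePort` is hypothetical; it relies on the standard URL class, which reports an empty port when the scheme's default port is used.

```javascript
// Hypothetical helper: returns the TCP port a WebSocket URL resolves to.
function effectivePort(urlString) {
  const url = new URL(urlString);
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    throw new Error(`Not a WebSocket URL: ${urlString}`);
  }
  // URL.port is "" when the URL uses the default port of its scheme.
  if (url.port !== "") return Number(url.port);
  return url.protocol === "wss:" ? 443 : 80;
}

console.log(effectivePort("ws://echo.websocket.org"));  // 80
console.log(effectivePort("wss://echo.websocket.org")); // 443
console.log(effectivePort("ws://localhost:8888"));      // 8888
```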
3.2 Introduction to WebSocket
WebSocket is a network transmission protocol that provides full-duplex communication over a single TCP connection and sits at the application layer of the OSI model. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011, and was later supplemented by RFC 7936.
WebSocket makes the data exchange between the client and the server easier, allowing the server to actively push data to the client. In the WebSocket API, the browser and the server only need to complete a handshake, and a persistent connection can be created between the two, and two-way data transmission can be carried out.
After introducing the related content of polling and WebSocket, let's take a picture to see the difference between XHR Polling (short polling) and WebSocket.
The difference between XHR Polling and WebSocket is shown in the following figure:
3.3 Advantages of WebSocket
It is generally believed that the advantages of WebSocket are as follows:
- 1) Less control overhead: After the connection is created, when data is exchanged between the server and the client, the data packet header used for protocol control is relatively small;
- 2) Stronger real-time performance: Since the protocol is full-duplex, the server can actively send data to the client at any time. Compared with HTTP, where the server must wait for the client to initiate a request before it can respond, the latency is significantly lower;
- 3) Keep the connection state: Unlike HTTP, WebSocket needs to create a connection first, which makes it a stateful protocol, and then part of the state information can be omitted when communicating;
- 4) Better binary support: WebSocket defines binary frames, which can handle binary content more easily than HTTP;
- 5) Can support extensions: WebSocket defines extensions, and users can extend the protocol and implement some custom sub-protocols.
Because WebSocket has the above advantages, it is widely used in instant messaging/IM, real-time audio and video, online education, games and other fields.
For front-end developers who want to use the capabilities that WebSocket provides, the first step is to master the WebSocket API. Let's go through it together.
PS: If you want a more simple introduction to WebSocket, you can read this "Quick Start for Newbies: A Concise Tutorial for WebSocket", and then come back to continue learning.
4. WebSocket API learning
4.1 Basic situation
Before introducing the WebSocket API, let's first understand its compatibility:
(Picture quoted from: https://caniuse.com/#search=WebSocket)
As can be seen from the above figure: the current mainstream web browsers all support WebSocket, so we can use it with confidence in most projects.
To use the capabilities provided by WebSocket in the browser, we must first create a WebSocket object, which provides an API for creating and managing a WebSocket connection, as well as sending and receiving data through the connection.
Using the WebSocket constructor, we can easily construct a WebSocket object.
Next, we will introduce the WebSocket API from the following four aspects:
1) WebSocket constructor;
2) The attributes of the WebSocket object;
3) WebSocket method;
4) WebSocket event.
Next, we start to learn from the constructor of WebSocket.
4.2 Constructor
The syntax of the WebSocket constructor is:
const myWebSocket = new WebSocket(url [, protocols]);
The related parameters are as follows:
1) url: indicates the URL of the connection, which is the URL that the WebSocket server will respond to;
2) protocols (optional): a protocol string or an array containing protocol strings.
For point 2): These strings are used to specify sub-protocols, so that a single server can implement multiple WebSocket sub-protocols.
For example, you may want one server to handle different types of interactions depending on the specified protocol. If no protocol string is specified, an empty string is assumed.
When using the WebSocket constructor, a SECURITY_ERR exception will be thrown when the port you are trying to connect to is blocked.
PS: For a more detailed description of the WebSocket constructor, please refer to the official API documentation.
4.3 Properties
The WebSocket object contains the following properties:
The specific meaning of each attribute is as follows:
1) binaryType: the type of binary data received over the connection ("blob" or "arraybuffer");
2) bufferedAmount (read only): the number of bytes queued by send() but not yet sent to the server;
3) extensions (read only): the extensions selected by the server;
4) onclose: used to specify the callback function called after the connection is closed;
5) onerror: used to specify the callback function called after the connection fails;
6) onmessage: used to specify the callback function called when a message is received from the server;
7) onopen: used to specify the callback function called after the connection is opened successfully;
8) protocol (read only): the name of the sub-protocol selected by the server;
9) readyState (read only): the current state of the WebSocket connection, one of four values:
- CONNECTING (0): the connection is being established;
- OPEN (1): the connection is open and ready to communicate;
- CLOSING (2): the connection is in the process of closing;
- CLOSED (3): the connection is closed or could not be opened.
10) url (read only): the absolute URL that was resolved when the constructor created the WebSocket instance.
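When logging or debugging, it can help to turn the numeric readyState into its name. The sketch below is one way to do that; `describeReadyState` is a hypothetical helper, not part of the WebSocket API.

```javascript
// Names of the four readyState values, indexed by their numeric value.
const READY_STATES = ["CONNECTING", "OPEN", "CLOSING", "CLOSED"];

// Hypothetical helper: maps a readyState number to a readable name.
function describeReadyState(readyState) {
  return READY_STATES[readyState] || "UNKNOWN";
}

console.log(describeReadyState(0)); // CONNECTING
console.log(describeReadyState(1)); // OPEN
console.log(describeReadyState(3)); // CLOSED
```

In a page, this would typically be called as `describeReadyState(socket.readyState)`.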
4.4 Methods
WebSocket has two main methods:
1) close([code[, reason]]): closes the WebSocket connection. If the connection is already closed, this method does nothing;
2) send(data): queues the data to be transmitted to the server over the WebSocket connection and increases bufferedAmount by the size of the data. If the data cannot be sent (for example, because it needs to be buffered and the buffer is full), the socket closes itself.
4.5 Events
Use addEventListener(), or assign a listener to the corresponding on<eventname> property of the WebSocket object, to listen for the following events:
1) close: fired when a WebSocket connection is closed; can also be set through the onclose property;
2) error: fired when a WebSocket connection is closed due to an error; can also be set through the onerror property;
3) message: fired when data is received via WebSocket; can also be set through the onmessage property;
4) open: fired when a WebSocket connection is opened successfully; can also be set through the onopen property.
After introducing the WebSocket API, let's take an example of using WebSocket to send ordinary text.
4.6 Code Practice: Sending Plain Text
In this example, we create two textareas on the page: one for the data to be sent and one for the data returned by the server. When the user enters some text and clicks the send button, the text is sent to the server; after the server receives the message, it returns it to the client unchanged.
// const socket = new WebSocket("ws://echo.websocket.org");
// const sendMsgContainer = document.querySelector("#sendMessage");
function send() {
  const message = sendMsgContainer.value;
  if (socket.readyState !== WebSocket.OPEN) {
    console.log("The connection is not open yet; cannot send messages");
    return;
  }
  if (message) socket.send(message);
}
When the client receives the message returned by the server, it writes the text into the textarea for received data.
// const socket = new WebSocket("ws://echo.websocket.org");
// const receivedMsgContainer = document.querySelector("#receivedMessage");
socket.addEventListener("message", function (event) {
  console.log("Message from server ", event.data);
  receivedMsgContainer.value = event.data;
});
In order to understand the above data interaction process more intuitively, we use the developer tools of the Chrome browser to take a look at the corresponding process.
As shown below:
The complete code corresponding to the above example is as follows:
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>WebSocket: sending plain text</title>
    <style>
      .block {
        flex: 1;
      }
    </style>
  </head>
  <body>
    <h3>WebSocket: sending plain text</h3>
    <div style="display: flex;">
      <div class="block">
        <p>Data to send: <button onclick="send()">Send</button></p>
        <textarea id="sendMessage" rows="5" cols="15"></textarea>
      </div>
      <div class="block">
        <p>Received data:</p>
        <textarea id="receivedMessage" rows="5" cols="15"></textarea>
      </div>
    </div>
    <script>
      const sendMsgContainer = document.querySelector("#sendMessage");
      const receivedMsgContainer = document.querySelector("#receivedMessage");
      const socket = new WebSocket("ws://echo.websocket.org");

      // Listen for the connection being opened
      socket.addEventListener("open", function (event) {
        console.log("Connected; ready to communicate");
      });

      // Listen for messages
      socket.addEventListener("message", function (event) {
        console.log("Message from server ", event.data);
        receivedMsgContainer.value = event.data;
      });

      function send() {
        const message = sendMsgContainer.value;
        if (socket.readyState !== WebSocket.OPEN) {
          console.log("The connection is not open yet; cannot send messages");
          return;
        }
        if (message) socket.send(message);
      }
    </script>
  </body>
</html>
In fact, in addition to sending ordinary text, WebSocket also supports sending binary data, such as ArrayBuffer objects, Blob objects, or ArrayBufferView objects.
The code example is as follows:
const socket = new WebSocket("ws://echo.websocket.org");
socket.onopen = function () {
  // Send UTF-8 encoded text
  socket.send("Hello Echo Server!");
  // Send UTF-8 encoded JSON data
  socket.send(JSON.stringify({ msg: "I am Brother Abao" }));
  // Send a binary ArrayBuffer
  const buffer = new ArrayBuffer(128);
  socket.send(buffer);
  // Send a binary ArrayBufferView
  const intview = new Uint32Array(buffer);
  socket.send(intview);
  // Send a binary Blob
  const blob = new Blob([buffer]);
  socket.send(blob);
};
After the above code runs successfully, we can see the corresponding data interaction process through Chrome Developer Tools.
As shown below:
Let's take sending a Blob object as an example to introduce how to send binary data.
Blob (Binary Large Object) represents a large object of binary type. In a database management system, a BLOB stores binary data as a single entity. Blobs are usually images, sounds, or multimedia files. In JavaScript, a Blob object represents immutable, file-like raw data.
Friends who are interested in Blob can read the article "Blob You Don't Know".
4.7 Code Practice: Sending Binary Data
In this example, we again create two textareas on the page: one for the data to be sent and one for the data returned by the server.
When the user enters some text and clicks the send button, we first read the text, wrap it in a Blob object, and then send it to the server. After the server receives the message, it returns it to the client unchanged.
When the browser receives a new message, text data is automatically converted into a DOMString. Binary data is passed directly to the application as a Blob (or ArrayBuffer), and the application handles it according to the type of the returned data.
Data sending code:
// const socket = new WebSocket("ws://echo.websocket.org");
// const sendMsgContainer = document.querySelector("#sendMessage");
function send() {
  const message = sendMsgContainer.value;
  if (socket.readyState !== WebSocket.OPEN) {
    console.log("The connection is not open yet; cannot send messages");
    return;
  }
  // Wrap the text in a Blob so that it is sent as a binary message
  const blob = new Blob([message], { type: "text/plain" });
  if (message) socket.send(blob);
  console.log(`Bytes not yet sent to the server: ${socket.bufferedAmount}`);
}
When the client receives the message returned by the server, it checks the type of the returned data. If it is a Blob, the client calls the Blob object's text() method to read its content as UTF-8 text, and then writes that text into the textarea for received data.
Data receiving code:
// const socket = new WebSocket("ws://echo.websocket.org");
// const receivedMsgContainer = document.querySelector("#receivedMessage");
socket.addEventListener("message", async function (event) {
  console.log("Message from server ", event.data);
  const receivedData = event.data;
  if (receivedData instanceof Blob) {
    receivedMsgContainer.value = await receivedData.text();
  } else {
    receivedMsgContainer.value = receivedData;
  }
});
Similarly, we use the developer tools of the Chrome browser to take a look at the corresponding process:
From the above figure, we can clearly see that when a Blob object is sent, the Data field shows "Binary Message", whereas when ordinary text is sent, the Data field shows the text itself.
The complete code corresponding to the above example is as follows:
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>WebSocket: sending binary data</title>
    <style>
      .block {
        flex: 1;
      }
    </style>
  </head>
  <body>
    <h3>WebSocket: sending binary data</h3>
    <div style="display: flex;">
      <div class="block">
        <p>Data to send: <button onclick="send()">Send</button></p>
        <textarea id="sendMessage" rows="5" cols="15"></textarea>
      </div>
      <div class="block">
        <p>Received data:</p>
        <textarea id="receivedMessage" rows="5" cols="15"></textarea>
      </div>
    </div>
    <script>
      const sendMsgContainer = document.querySelector("#sendMessage");
      const receivedMsgContainer = document.querySelector("#receivedMessage");
      const socket = new WebSocket("ws://echo.websocket.org");

      // Listen for the connection being opened
      socket.addEventListener("open", function (event) {
        console.log("Connected; ready to communicate");
      });

      // Listen for messages
      socket.addEventListener("message", async function (event) {
        console.log("Message from server ", event.data);
        const receivedData = event.data;
        if (receivedData instanceof Blob) {
          receivedMsgContainer.value = await receivedData.text();
        } else {
          receivedMsgContainer.value = receivedData;
        }
      });

      function send() {
        const message = sendMsgContainer.value;
        if (socket.readyState !== WebSocket.OPEN) {
          console.log("The connection is not open yet; cannot send messages");
          return;
        }
        const blob = new Blob([message], { type: "text/plain" });
        if (message) socket.send(blob);
        console.log(`Bytes not yet sent to the server: ${socket.bufferedAmount}`);
      }
    </script>
  </body>
</html>
Some readers may feel that just learning the WebSocket API is not enough fun, so next we will implement a WebSocket server that supports sending ordinary text.
5. Writing a WebSocket server by hand
5.1 Before we begin
Before introducing how to write a WebSocket server by hand, we need to understand the life cycle of a WebSocket connection.
As can be seen from the above figure: before using WebSocket to achieve full-duplex communication, a handshake (Handshake) is required between the client and the server, and the two-way data communication can only be started after the handshake is completed.
The handshake takes place after the communication channel has been established and before information transfer begins.
The handshake is used to agree on parameters such as:
1) Information transfer rate;
2) Character encoding (alphabet);
3) Parity;
4) Interrupt procedure;
5) Other protocol features.
Handshaking allows systems or devices of different kinds to connect over a communication channel without manual parameter setup.
Since the handshake is the first link in the life cycle of a WebSocket connection, let's analyze the handshake protocol of WebSocket first.
5.2 Handshake protocol
WebSocket is an application-layer protocol that relies on TCP at the transport layer. The WebSocket handshake is performed over HTTP/1.1 and is confirmed with the 101 status code. To create a WebSocket connection, the browser sends a request and the server responds; this process is usually called the "handshake".
There are several advantages to using HTTP to complete the handshake:
1) First: Make WebSocket compatible with the existing HTTP infrastructure—make the WebSocket server run on ports 80 and 443, which are usually the only ports open to clients;
2) Second: It lets us reuse and extend the HTTP Upgrade flow, adding custom WebSocket headers to it to complete the negotiation.
Let's take the example of sending ordinary text that has been demonstrated before as an example to analyze the handshake process in detail.
5.2.1) Client request:
GET ws://echo.websocket.org/ HTTP/1.1
Host: echo.websocket.org
Origin: file://
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: Zx8rNEkBE4xnwifpuh8DHQ==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Note: Some HTTP request headers have been omitted.
The fields in the above request are described as follows:
1) Connection: must be set to Upgrade, indicating that the client wants to upgrade the connection;
2) Upgrade: must be set to websocket, indicating that the client wants to upgrade to the WebSocket protocol;
3) Sec-WebSocket-Version: indicates the supported WebSocket version. The version required by RFC 6455 is 13; all earlier draft versions should no longer be used;
4) Sec-WebSocket-Key: a random Base64-encoded value; the server uses it to construct a SHA-1 message digest;
5) Sec-WebSocket-Extensions: used to negotiate the WebSocket extensions for this connection. The client sends the extensions it supports, and the server confirms the ones it will use by returning the same header;
6) Origin: optional; usually indicates the page that initiated the WebSocket connection in the browser, similar to Referer. Unlike Referer, however, Origin contains only the scheme and host (plus the port, if any).
For point 4) above: the server appends the special string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to the value of Sec-WebSocket-Key, computes the SHA-1 digest, Base64-encodes it, and returns the result to the client as the value of the Sec-WebSocket-Accept header. This helps prevent ordinary HTTP requests from being mistaken for WebSocket handshakes.
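The request-side requirements in the field list above can be sketched as a single check on the server. This is only an illustration: `isWebSocketUpgrade` is a hypothetical helper, and it assumes lower-case header names, which is how Node.js exposes them in req.headers.

```javascript
// Hypothetical helper: checks the headers that the handshake requires.
// `headers` is assumed to use lower-case names, as Node.js's req.headers does.
function isWebSocketUpgrade(headers) {
  const upgrade = (headers["upgrade"] || "").toLowerCase();
  // Connection may list several tokens, e.g. "keep-alive, Upgrade".
  const connection = (headers["connection"] || "")
    .toLowerCase()
    .split(",")
    .map((token) => token.trim());
  const key = headers["sec-websocket-key"] || "";
  return (
    upgrade === "websocket" &&
    connection.includes("upgrade") &&
    headers["sec-websocket-version"] === "13" &&
    // The key must decode to a 16-byte random value.
    Buffer.from(key, "base64").length === 16
  );
}

console.log(
  isWebSocketUpgrade({
    host: "echo.websocket.org",
    connection: "Upgrade",
    upgrade: "websocket",
    "sec-websocket-version": "13",
    "sec-websocket-key": "Zx8rNEkBE4xnwifpuh8DHQ==",
  })
); // true
```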
5.2.2) Server response:
HTTP/1.1 101 Web Socket Protocol Handshake ①
Connection: Upgrade ②
Upgrade: websocket ③
Sec-WebSocket-Accept: 52Rg3vW4JQ1yWpkvFlsTsiezlqw= ④
Note: Some HTTP response headers have been omitted.
The fields in the above response are described as follows:
① The 101 status code confirms the upgrade to the WebSocket protocol;
② The Connection header is set to "Upgrade" to indicate that the connection is being upgraded (HTTP provides a special mechanism that allows an established connection to be upgraded to a new, incompatible protocol);
③ The Upgrade header specifies one or more protocol names, in order of priority and separated by commas; here it indicates an upgrade to the WebSocket protocol;
④ The Sec-WebSocket-Accept value, derived from the client's Sec-WebSocket-Key, confirms that the server supports the WebSocket protocol.
After introducing the WebSocket handshake protocol, we will use Node.js to develop our WebSocket server.
5.3 Implementing the handshake
To develop a WebSocket server, we first need to implement the handshake function. Here I use the built-in http module of Node.js to create an HTTP server.
The specific code is as follows:
const http = require("http");
const port = 8888;
const { generateAcceptValue } = require("./util");

const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
  res.end(`Hello everyone, I am Brother Abao. Thank you for reading "WebSocket You Don't Know"`);
});

server.on("upgrade", function (req, socket) {
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
    return;
  }
  // Read the Sec-WebSocket-Key provided by the client
  const secWsKey = req.headers["sec-websocket-key"];
  // Use the SHA-1 algorithm to generate Sec-WebSocket-Accept
  const hash = generateAcceptValue(secWsKey);
  // Set the HTTP response headers
  const responseHeaders = [
    "HTTP/1.1 101 Web Socket Protocol Handshake",
    "Upgrade: WebSocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${hash}`,
  ];
  // Return the response to the handshake request
  socket.write(responseHeaders.join("\r\n") + "\r\n\r\n");
});

server.listen(port, () =>
  console.log(`Server running at http://localhost:${port}`)
);
In the above code, we first import the http module and create an HTTP server by calling its createServer() method. We then listen for the upgrade event, which is emitted each time a client requests an HTTP upgrade. Since our server only supports upgrading to the WebSocket protocol, we return "400 Bad Request" if the client asks to upgrade to any other protocol.
When the server receives a handshake request to upgrade to WebSocket, it reads the value of "Sec-WebSocket-Key" from the request headers, appends the special string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to it, computes the SHA-1 digest, Base64-encodes it, and returns the result to the client as the value of the "Sec-WebSocket-Accept" header.
This may sound cumbersome, but with Node.js's built-in crypto module it takes only a few lines of code.
The code is as follows:
// util.js
const crypto = require("crypto");

const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}

// Export the function so that the server file can require() it
module.exports = { generateAcceptValue };
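One way to sanity-check this calculation without a browser is to run it against the sample handshake given in RFC 6455 (section 1.3), where the key "dGhlIHNhbXBsZSBub25jZQ==" is expected to produce "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=". The function is repeated here only so that the snippet runs on its own.

```javascript
const crypto = require("crypto");

const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Same calculation as in util.js, repeated so the check is self-contained.
function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}

// Sample key and expected accept value taken from RFC 6455, section 1.3.
const sampleKey = "dGhlIHNhbXBsZSBub25jZQ==";
const expectedAccept = "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=";

console.log(generateAcceptValue(sampleKey) === expectedAccept); // true
```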
After developing the handshake function, we can use the previous example to test the function. After the server is started, we only need to make simple adjustments to the "send normal text" example, that is, replace the previous URL address with ws://localhost:8888, and then the function can be verified.
Interested guys can give it a try, the following is the result of my local operation:
From the above figure, we can see that the handshake function we have implemented can already work normally. So is it possible that the handshake will fail? The answer is yes. For example, network problems, server abnormalities or incorrect values of Sec-WebSocket-Accept.
Let's change the "Sec-WebSocket-Accept" generation rule, for example, modify the value of MAGIC_KEY, and then re-verify the handshake function.
At this time, the browser console will output the following exception information:
WebSocket connection to 'ws://localhost:8888/' failed: Error during WebSocket handshake: Incorrect 'Sec-WebSocket-Accept' header value
If your WebSocket server needs to support sub-protocols, you can refer to the following code to handle sub-protocols, and I won’t continue to introduce them here.
// Read the sub-protocols from the request header
const protocol = req.headers["sec-websocket-protocol"];
// If sub-protocols are present, parse them into a list
const protocols = !protocol ? [] : protocol.split(",").map((s) => s.trim());
// For simplicity, we only check whether the JSON sub-protocol was requested
if (protocols.includes("json")) {
  responseHeaders.push(`Sec-WebSocket-Protocol: json`);
}
Okay, the content related to the WebSocket handshake protocol has basically been introduced. In the next step, we will introduce some basic knowledge that needs to be understood to develop the message communication function.
5.4 Basics of message communication
In the WebSocket protocol, data is transmitted through a series of data frames.
To avoid confusing network intermediaries (such as intercepting proxies), and for security reasons, the client must mask every frame it sends to the server. If the server receives a frame that is not masked, it must close the connection immediately.
5.4.1) Data frame format:
To achieve message communication, we must understand the format of WebSocket data frames:
After seeing the above, some readers may feel a little confused.
Let's analyze it further using an actual data frame:
The figure above gives a brief analysis of the data frame for the "send ordinary text" example. Here we look more closely at the payload length, because it will be needed later when developing the data parsing function.
Payload length is the length of the "payload data" in bytes.
There are three cases:
1) If the value is 0-125, it is the length of the payload data;
2) If it is 126, the next 2 bytes are interpreted as a 16-bit unsigned integer that gives the length of the payload data;
3) If it is 127, the next 8 bytes are interpreted as a 64-bit unsigned integer (the most significant bit must be 0) that gives the length of the payload data.
Remarks: Multi-byte lengths are expressed in network byte order. The payload length is the length of the "extension data" plus the length of the "application data". The "extension data" may have length 0, in which case the payload length equals the length of the "application data".
In addition: Unless an extension has been negotiated, the length of the "extension data" is 0 bytes. Any extension must specify the length of the "extension data", or how that length is calculated, and how the use of the extension is negotiated during the handshake. If present, the "extension data" is included in the total payload length.
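The three cases above can be sketched as a small parsing function. This is a minimal illustration rather than a full frame parser: `readPayloadLength` is a hypothetical helper, and it assumes the RFC 6455 frame layout, in which the low 7 bits of the second byte carry the initial length value and the high bit of that byte is the MASK flag.

```javascript
// Hypothetical helper: reads the payload length of a frame held in a Buffer.
// Returns the length and the offset at which the next field starts.
function readPayloadLength(frame) {
  // Low 7 bits of the second byte: the initial payload length value.
  const initial = frame[1] & 0x7f;
  if (initial <= 125) {
    return { length: initial, offset: 2 };
  }
  if (initial === 126) {
    // 16-bit unsigned integer in network (big-endian) byte order.
    return { length: frame.readUInt16BE(2), offset: 4 };
  }
  // initial === 127: 64-bit unsigned integer in network byte order.
  const length = frame.readBigUInt64BE(2);
  if (length > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError("Payload too large for this sketch");
  }
  return { length: Number(length), offset: 10 };
}

// A masked frame header announcing a 5-byte payload (0x85 = MASK bit + 5).
console.log(readPayloadLength(Buffer.from([0x81, 0x85])));
// An unmasked frame header using the 16-bit form for a 256-byte payload.
console.log(readPayloadLength(Buffer.from([0x82, 0x7e, 0x01, 0x00])));
```

For a masked frame, the returned offset is where the 4-byte masking key begins; for an unmasked frame, it is where the payload data begins.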
PS: For a detailed explanation of the data frame format, you can read the following articles in depth:
"WebSocket from entry to proficiency, half an hour is enough!"
"Integrating Theory with Practice: Understanding the Communication Principle, Protocol Format, and Security of WebSocket from Zero"
5.4.2) Masking algorithm:
The masking key is a 32-bit value randomly chosen by the client. It must be unpredictable: the client must derive it from a strong source of entropy, and the key used for one frame must not make it easy for a server or proxy to predict the keys of subsequent frames. The unpredictability of the masking key is essential to prevent authors of malicious applications from choosing the bytes that appear on the wire.
Masking does not change the length of the payload, and the same steps are used to mask and to unmask data.
The following algorithms are used for masking and de-masking operations:
j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
Where:
1) original-octet-i: the i-th byte of the original data;
2) transformed-octet-i: is the i-th byte of the transformed data;
3) masking-key-octet-j: is the jth byte of the mask key.
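Because XOR is its own inverse, the formula above can be written as one function that both masks and unmasks. The sketch below illustrates this with a round trip; `applyMask` is a hypothetical name and the key bytes are arbitrary sample values.

```javascript
// Applies the masking transform: byte i is XORed with key byte (i MOD 4).
// The same function masks and unmasks, because XOR is its own inverse.
function applyMask(data, maskingKey) {
  const result = new Uint8Array(data.length);
  for (let i = 0; i < data.length; i++) {
    result[i] = data[i] ^ maskingKey[i % 4];
  }
  return result;
}

const key = new Uint8Array([0x08, 0xf6, 0xef, 0xb1]); // sample key bytes
const original = new TextEncoder().encode("hello");
const masked = applyMask(original, key);
const restored = applyMask(masked, key);

console.log(new TextDecoder().decode(restored)); // hello
```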
To make the mask calculation easier to follow, let's mask the data "我是阿宝哥" ("I am Brother Abao") from the example.
The UTF-8 encoding of "我是阿宝哥" is as follows:
E6 88 91 E6 98 AF E9 98 BF E5 AE 9D E5 93 A5
The corresponding Masking-Key is 0x08f6efb1.
According to the above algorithm, we can perform the mask operation like this:
let uint8 = new Uint8Array([0xE6, 0x88, 0x91, 0xE6, 0x98, 0xAF, 0xE9, 0x98, 0xBF, 0xE5, 0xAE, 0x9D, 0xE5, 0x93, 0xA5]);
let maskingKey = new Uint8Array([0x08, 0xf6, 0xef, 0xb1]);
let maskedUint8 = new Uint8Array(uint8.length);
for (let i = 0, j = 0; i < uint8.length; i++, j = i % 4) {
  maskedUint8[i] = uint8[i] ^ maskingKey[j];
}
console.log(Array.from(maskedUint8).map((num) => Number(num).toString(16)).join(" "));
After the above code runs successfully, the console will output the following results:
ee 7e 7e 57 90 59 6 29 b7 13 41 2c ed 65 4a
The above result is consistent with the value corresponding to the Masked payload in WireShark, as shown in the following figure:
In the WebSocket protocol, data masking exists to improve the security of the protocol. It is not meant to protect the data itself, because the algorithm is public and the computation is trivial to reverse.
So why introduce data masking? It was introduced to prevent problems such as the proxy cache poisoning attacks that earlier versions of the protocol were exposed to.
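As a small check of the two properties mentioned above (the key comes from a strong entropy source, and masking and unmasking are the same operation), here is a minimal sketch. The applyMask helper is a hypothetical name, and using crypto.randomBytes as the entropy source is an assumption of this sketch rather than something the protocol prescribes.

```javascript
const crypto = require("crypto");

// Apply the WebSocket masking transform: octet i is XORed with key octet (i mod 4).
// The same function both masks and unmasks, because XOR is its own inverse.
function applyMask(payload, maskingKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

// Assumption: crypto.randomBytes serves as the strong entropy source for the key.
const maskingKey = crypto.randomBytes(4);
const original = Buffer.from("我是阿宝哥", "utf8");
const masked = applyMask(original, maskingKey);
const unmasked = applyMask(masked, maskingKey);

console.log(masked.length === original.length); // masking keeps the payload length
console.log(unmasked.toString("utf8"));         // the original text again
```

Because the key changes on every run, the masked bytes differ each time, while the unmasked text stays the same.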
After understanding the WebSocket masking algorithm and the role of data masking, let's introduce the concept of data fragmentation.
5.4.3) Data fragmentation:
Each message of WebSocket may be divided into multiple data frames. When the WebSocket receiver receives a data frame, it will judge according to the value of FIN whether it has received the last data frame of the message.
Using FIN and Opcode, we can send messages across frames.
The opcode indicates how the payload of a frame should be interpreted:
1) If it is 0x1, the payload is text;
2) If it is 0x2, the payload is binary data;
3) If it is 0x0, the frame is a continuation frame (this means that the server should append the payload of this frame to the last frame received from the client).
In order for everyone to better understand the above content, let's look at an example from MDN:
Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
In the above example, the client sends two messages to the server: the first is sent in a single frame, and the second is sent across three frames.
The first message is complete in one frame (FIN=1 and opcode != 0x0), so the server can process or respond to it right away. The second message starts with a text frame (opcode=0x1) with FIN=0, indicating that the message is not yet complete and more data frames will follow. The remaining parts of the message are sent in continuation frames (opcode=0x0), and the final frame of the message is marked with FIN=1.
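The reassembly logic described above can be sketched as follows. The createReassembler helper and the { fin, opcode, payload } frame shape are hypothetical names chosen for this illustration, and the frames are assumed to be already parsed and unmasked. The payloads include explicit spaces so that the joined text reads naturally.

```javascript
// Hypothetical helper: collects frames until FIN=1, then returns the whole payload.
// Each frame is assumed to be already parsed and unmasked: { fin, opcode, payload }.
function createReassembler() {
  let fragments = null; // null means that no message is currently in progress

  return function push(frame) {
    if (frame.opcode === 0x1 || frame.opcode === 0x2) {
      // A text or binary frame starts a new message
      if (fragments !== null) throw new Error("previous message is not finished");
      fragments = [frame.payload];
    } else if (frame.opcode === 0x0) {
      // A continuation frame extends the message in progress
      if (fragments === null) throw new Error("continuation without a started message");
      fragments.push(frame.payload);
    } else {
      return null; // control frames are ignored in this sketch
    }
    if (!frame.fin) return null; // more frames will follow
    const message = Buffer.concat(fragments);
    fragments = null;
    return message;
  };
}

const push = createReassembler();
const frames = [
  { fin: true, opcode: 0x1, payload: Buffer.from("hello") },
  { fin: false, opcode: 0x1, payload: Buffer.from("and a ") },
  { fin: false, opcode: 0x0, payload: Buffer.from("happy new ") },
  { fin: true, opcode: 0x0, payload: Buffer.from("year!") },
];
for (const frame of frames) {
  const message = push(frame);
  if (message !== null) console.log(message.toString("utf8"));
}
// hello
// and a happy new year!
```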
That concludes our brief introduction to data fragmentation. Next, let's implement the message communication function.
5.5 Realize the message communication function
The author decomposes the realization of the message communication function into two sub-functions: message parsing and message response. Below we will respectively introduce how to implement these two sub-functions.
5.5.1) Message parsing:
Using the knowledge introduced in the previous section on the basics of message communication, I implemented a parseMessage function to parse the WebSocket data frames sent by the client.
For the sake of simplicity, only text frames are processed here. The specific code is as follows:
function parseMessage(buffer) {
  // The first byte contains the FIN bit, the RSV bits, and the opcode
  const firstByte = buffer.readUInt8(0);
  // [FIN, RSV, RSV, RSV, OPCODE, OPCODE, OPCODE, OPCODE];
  // Shift right by 7 bits to get the first bit (FIN), which indicates whether this is the final frame
  const isFinalFrame = Boolean((firstByte >>> 7) & 0x01);
  console.log("isFIN: ", isFinalFrame);
  // Extract the opcode from the low four bits
  /**
   * %x0: a continuation frame. An opcode of 0 means that the message is fragmented and the received frame is one of its fragments;
   * %x1: a text frame;
   * %x2: a binary frame;
   * %x3-7: reserved for non-control frames to be defined later;
   * %x8: connection close;
   * %x9: a heartbeat request (ping);
   * %xA: a heartbeat response (pong);
   * %xB-F: reserved for control frames to be defined later.
   */
  const opcode = firstByte & 0x0f;
  if (opcode === 0x08) {
    // Connection close: return null so that the caller can detect it
    return null;
  }
  if (opcode === 0x02) {
    // Binary frame (not handled here)
    return;
  }
  if (opcode === 0x01) {
    // Only text frames are handled for now
    let offset = 1;
    const secondByte = buffer.readUInt8(offset);
    // MASK: 1 bit indicating whether masking is used. Frames sent to the server must be masked; frames returned by the server are not
    const useMask = Boolean((secondByte >>> 7) & 0x01);
    console.log("use MASK: ", useMask);
    const payloadLen = secondByte & 0x7f; // the low 7 bits hold the payload length
    offset += 1;
    // The 4-byte masking key
    let MASK = [];
    // If the value is in the range 0-125, the next 4 bytes (32 bits) are the masking key
    if (payloadLen <= 0x7d) {
      // The payload length is at most 125
      MASK = buffer.slice(offset, 4 + offset);
      offset += 4;
      console.log("payload length: ", payloadLen);
    } else if (payloadLen === 0x7e) {
      // If the value is 126, the next 2 bytes (16 bits) are an unsigned 16-bit integer holding the payload length
      console.log("payload length: ", buffer.readUInt16BE(offset));
      // The 2 length bytes are followed by the 32-bit masking key
      MASK = buffer.slice(offset + 2, offset + 2 + 4);
      offset += 6;
    } else {
      // If the value is 127, the next 8 bytes (64 bits) are an unsigned 64-bit integer holding the payload length
      MASK = buffer.slice(offset + 8, offset + 8 + 4);
      offset += 12;
    }
    // Read the payload that follows and XOR it with the masking key to recover the original bytes
    const newBuffer = [];
    const dataBuffer = buffer.slice(offset);
    for (let i = 0, j = 0; i < dataBuffer.length; i++, j = i % 4) {
      const nextBuf = dataBuffer[i];
      newBuffer.push(nextBuf ^ MASK[j]);
    }
    return Buffer.from(newBuffer).toString();
  }
  return "";
}
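The three payload length encodings handled inside parseMessage can also be isolated in a small helper, which makes each case easier to check on its own. This is only a sketch: readPayloadLength and its return shape are hypothetical, and the buffer is assumed to contain a complete frame header.

```javascript
// Hypothetical helper: reads the payload length of a frame and reports the
// offset at which the masking key (or the payload, if unmasked) starts.
function readPayloadLength(buffer) {
  const lengthField = buffer.readUInt8(1) & 0x7f; // low 7 bits of the second byte
  if (lengthField <= 125) {
    // The field itself is the payload length
    return { payloadLength: lengthField, nextOffset: 2 };
  }
  if (lengthField === 126) {
    // The next 2 bytes are an unsigned 16-bit big-endian length
    return { payloadLength: buffer.readUInt16BE(2), nextOffset: 4 };
  }
  // lengthField === 127: the next 8 bytes are an unsigned 64-bit big-endian length.
  // readBigUInt64BE returns a BigInt; Number() is exact only up to 2^53 - 1.
  return { payloadLength: Number(buffer.readBigUInt64BE(2)), nextOffset: 10 };
}

console.log(readPayloadLength(Buffer.from([0x81, 0x8f])));
// { payloadLength: 15, nextOffset: 2 }
console.log(readPayloadLength(Buffer.from([0x81, 0xfe, 0x01, 0x00])));
// { payloadLength: 256, nextOffset: 4 }
```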
After creating the parseMessage function, let's update the WebSocket server created earlier:
server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client:" + message);
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request");
    return;
  }
  // Existing code omitted
});
After the update is complete, we restart the server and then use the earlier "send normal text" example to test the message parsing function.
The following is the output of the WebSocket server after the text message "我是阿宝哥" ("I am Brother Abao") is sent:
Server running at http://localhost:8888
isFIN: true
use MASK: true
payload length: 15
Message from client:我是阿宝哥
The output above shows that our WebSocket server can successfully parse a data frame containing normal text sent by the client. Next, we will implement the message response function.
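Before moving on, note that the parser can also be exercised without a browser by building a masked client frame by hand. The sketch below reuses the masking key from the earlier example. The buildMaskedTextFrame helper is a hypothetical name, and it only handles payloads of up to 125 bytes.

```javascript
// Hypothetical helper: builds a single-frame masked text message, as a client would.
// Only payloads of up to 125 bytes are handled in this sketch.
function buildMaskedTextFrame(text, maskingKey) {
  const payload = Buffer.from(text, "utf8");
  if (payload.length > 125) {
    throw new RangeError("this sketch only supports payloads of up to 125 bytes");
  }
  const frame = Buffer.alloc(2 + 4 + payload.length);
  frame.writeUInt8(0b10000001, 0);            // FIN=1, opcode=0x1 (text)
  frame.writeUInt8(0x80 | payload.length, 1); // MASK=1 plus the 7-bit length
  maskingKey.copy(frame, 2);                  // the 4-byte masking key
  for (let i = 0; i < payload.length; i++) {
    frame[6 + i] = payload[i] ^ maskingKey[i % 4];
  }
  return frame;
}

const frame = buildMaskedTextFrame("我是阿宝哥", Buffer.from([0x08, 0xf6, 0xef, 0xb1]));
console.log(frame.toString("hex"));
// 818f08f6efb1ee7e7e5790590629b713412ced654a
```

Passing this buffer to parseMessage should give back the original text, and the masked part of the frame matches the bytes computed in the masking example.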
5.5.2) Message response:
To return the data to the client, our WebSocket server must also encapsulate the data in the format of the WebSocket data frame.
As with the parseMessage function described earlier, I wrote a constructReply function that wraps the returned data in a data frame.
The specific code of this function is as follows:
function constructReply(data) {
  const json = JSON.stringify(data);
  const jsonByteLength = Buffer.byteLength(json);
  // Currently only payloads of up to 65535 bytes are supported
  const lengthByteCount = jsonByteLength < 126 ? 0 : 2;
  const payloadLength = lengthByteCount === 0 ? jsonByteLength : 126;
  const buffer = Buffer.alloc(2 + lengthByteCount + jsonByteLength);
  // Set the first byte of the data frame: FIN=1 and opcode=1, which means a text frame
  buffer.writeUInt8(0b10000001, 0);
  buffer.writeUInt8(payloadLength, 1);
  // If payloadLength is 126, the next 2 bytes (16 bits) are an unsigned 16-bit integer holding the payload length
  let payloadOffset = 2;
  if (lengthByteCount > 0) {
    buffer.writeUInt16BE(jsonByteLength, 2);
    payloadOffset += lengthByteCount;
  }
  // Write the JSON data into the buffer
  buffer.write(json, payloadOffset);
  return buffer;
}
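constructReply only covers payloads of up to 65535 bytes. As an illustration of how the third length encoding could be added, here is a sketch of a variant that picks the encoding by payload size. The encodeTextFrame name is hypothetical, and the function frames a plain string instead of JSON to keep the example short.

```javascript
// Hypothetical variant of constructReply: frames a UTF-8 string as an unmasked
// server-to-client text frame, choosing the length encoding by payload size.
function encodeTextFrame(text) {
  const payload = Buffer.from(text, "utf8");
  let header;
  if (payload.length <= 125) {
    header = Buffer.alloc(2);
    header.writeUInt8(payload.length, 1); // the length fits in the 7-bit field
  } else if (payload.length <= 0xffff) {
    header = Buffer.alloc(4);
    header.writeUInt8(126, 1); // 126 means: a 16-bit length follows
    header.writeUInt16BE(payload.length, 2);
  } else {
    header = Buffer.alloc(10);
    header.writeUInt8(127, 1); // 127 means: a 64-bit length follows
    header.writeBigUInt64BE(BigInt(payload.length), 2);
  }
  header.writeUInt8(0b10000001, 0); // FIN=1, opcode=0x1 (text)
  return Buffer.concat([header, payload]);
}

console.log(encodeTextFrame("hi").toString("hex"));     // 81026869
console.log(encodeTextFrame("a".repeat(200)).length);   // 204
console.log(encodeTextFrame("a".repeat(70000)).length); // 70010
```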
After creating the constructReply function, let's update the WebSocket server created earlier:
server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client:" + message);
      // Newly added code below 👇
      socket.write(constructReply({ message }));
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
});
At this point, our WebSocket server has been developed. Next, let's fully verify its function.
It can be seen from the above figure that the simple version of the WebSocket server developed above can already process ordinary text messages normally.
Finally, let's take a look at the complete code.
custom-websocket-server.js file:
const http = require("http");
const port = 8888;
const { generateAcceptValue, parseMessage, constructReply } = require("./util");

const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
  // "Hello everyone, I am Brother Abao. Thank you for reading 'WebSocket You Don't Know'"
  res.end("大家好,我是阿宝哥。感谢你阅读“你不知道的WebSocket”");
});

server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client:" + message);
      socket.write(constructReply({ message }));
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request");
    return;
  }
  // Read the Sec-WebSocket-Key provided by the client
  const secWsKey = req.headers["sec-websocket-key"];
  // Use the SHA-1 algorithm to generate the Sec-WebSocket-Accept value
  const hash = generateAcceptValue(secWsKey);
  // Set the HTTP response headers
  const responseHeaders = [
    "HTTP/1.1 101 Web Socket Protocol Handshake",
    "Upgrade: WebSocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${hash}`,
  ];
  // Return the response to the handshake request
  socket.write(responseHeaders.join("\r\n") + "\r\n\r\n");
});

server.listen(port, () =>
  console.log(`Server running at http://localhost:${port}`)
);
util.js file:
const crypto = require("crypto");

const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}

function parseMessage(buffer) {
  // The first byte contains the FIN bit, the RSV bits, and the opcode
  const firstByte = buffer.readUInt8(0);
  // [FIN, RSV, RSV, RSV, OPCODE, OPCODE, OPCODE, OPCODE];
  // Shift right by 7 bits to get the first bit (FIN), which indicates whether this is the final frame
  const isFinalFrame = Boolean((firstByte >>> 7) & 0x01);
  console.log("isFIN: ", isFinalFrame);
  // Extract the opcode from the low four bits
  /**
   * %x0: a continuation frame. An opcode of 0 means that the message is fragmented and the received frame is one of its fragments;
   * %x1: a text frame;
   * %x2: a binary frame;
   * %x3-7: reserved for non-control frames to be defined later;
   * %x8: connection close;
   * %x9: a heartbeat request (ping);
   * %xA: a heartbeat response (pong);
   * %xB-F: reserved for control frames to be defined later.
   */
  const opcode = firstByte & 0x0f;
  if (opcode === 0x08) {
    // Connection close: return null so that the caller can detect it
    return null;
  }
  if (opcode === 0x02) {
    // Binary frame (not handled here)
    return;
  }
  if (opcode === 0x01) {
    // Only text frames are handled for now
    let offset = 1;
    const secondByte = buffer.readUInt8(offset);
    // MASK: 1 bit indicating whether masking is used. Frames sent to the server must be masked; frames returned by the server are not
    const useMask = Boolean((secondByte >>> 7) & 0x01);
    console.log("use MASK: ", useMask);
    const payloadLen = secondByte & 0x7f; // the low 7 bits hold the payload length
    offset += 1;
    // The 4-byte masking key
    let MASK = [];
    // If the value is in the range 0-125, the next 4 bytes (32 bits) are the masking key
    if (payloadLen <= 0x7d) {
      // The payload length is at most 125
      MASK = buffer.slice(offset, 4 + offset);
      offset += 4;
      console.log("payload length: ", payloadLen);
    } else if (payloadLen === 0x7e) {
      // If the value is 126, the next 2 bytes (16 bits) are an unsigned 16-bit integer holding the payload length
      console.log("payload length: ", buffer.readUInt16BE(offset));
      // The 2 length bytes are followed by the 32-bit masking key
      MASK = buffer.slice(offset + 2, offset + 2 + 4);
      offset += 6;
    } else {
      // If the value is 127, the next 8 bytes (64 bits) are an unsigned 64-bit integer holding the payload length
      MASK = buffer.slice(offset + 8, offset + 8 + 4);
      offset += 12;
    }
    // Read the payload that follows and XOR it with the masking key to recover the original bytes
    const newBuffer = [];
    const dataBuffer = buffer.slice(offset);
    for (let i = 0, j = 0; i < dataBuffer.length; i++, j = i % 4) {
      const nextBuf = dataBuffer[i];
      newBuffer.push(nextBuf ^ MASK[j]);
    }
    return Buffer.from(newBuffer).toString();
  }
  return "";
}

function constructReply(data) {
  const json = JSON.stringify(data);
  const jsonByteLength = Buffer.byteLength(json);
  // Currently only payloads of up to 65535 bytes are supported
  const lengthByteCount = jsonByteLength < 126 ? 0 : 2;
  const payloadLength = lengthByteCount === 0 ? jsonByteLength : 126;
  const buffer = Buffer.alloc(2 + lengthByteCount + jsonByteLength);
  // Set the first byte of the data frame: FIN=1 and opcode=1, which means a text frame
  buffer.writeUInt8(0b10000001, 0);
  buffer.writeUInt8(payloadLength, 1);
  // If payloadLength is 126, the next 2 bytes (16 bits) are an unsigned 16-bit integer holding the payload length
  let payloadOffset = 2;
  if (lengthByteCount > 0) {
    buffer.writeUInt16BE(jsonByteLength, 2);
    payloadOffset += lengthByteCount;
  }
  // Write the JSON data into the buffer
  buffer.write(json, payloadOffset);
  return buffer;
}

module.exports = {
  generateAcceptValue,
  parseMessage,
  constructReply,
};
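As a quick sanity check of generateAcceptValue, the function can be run against the sample handshake in RFC 6455. The sketch below repeats the function so that it can be run on its own; the sample key and the expected value come from the RFC.

```javascript
const crypto = require("crypto");

const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Same logic as generateAcceptValue in util.js, repeated to keep the sketch standalone
function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}

// Sample Sec-WebSocket-Key taken from the handshake example in RFC 6455
console.log(generateAcceptValue("dGhlIHNhbXBsZSBub25jZQ=="));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```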
Besides WebSocket, the server can also push information to the browser using SSE (Server-Sent Events), which allows the server to stream text messages to the client, such as real-time messages generated on the server.
To achieve this, SSE defines two components: the EventSource API in the browser and a new "event stream" data format (text/event-stream). EventSource lets the client receive notifications pushed by the server in the form of DOM events, and the new data format is used to deliver each data update.
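To give a feel for the "event stream" data format, here is a minimal sketch of how a single event could be serialized on the server. The formatSseEvent helper is a hypothetical name; the field names (event, id, data) and the blank line that ends an event belong to the text/event-stream format.

```javascript
// Hypothetical helper: serializes one event in the text/event-stream format.
function formatSseEvent({ event, id, data }) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  // Multi-line data is sent as one "data:" field per line
  for (const line of String(data).split("\n")) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event
  return lines.join("\n") + "\n\n";
}

console.log(JSON.stringify(formatSseEvent({ event: "notice", data: "hello\nworld" })));
// "event: notice\ndata: hello\ndata: world\n\n"
```

On the server, such strings would be written to a response whose Content-Type is text/event-stream. In the browser, an EventSource instance receives them as events.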
In effect, SSE provides an efficient, cross-browser implementation of XHR streaming, and message delivery uses only a single long-lived HTTP connection. Unlike an XHR streaming implementation of our own, however, the browser manages the connection and parses the messages for us, so we can focus on the business logic. Due to limited space, SSE is not covered in more detail here. Readers who are interested in SSE can read the following articles:
"Inventory of Instant Messaging Technologies on the Web: Short Polling, Comet, Websocket, SSE"
"SSE Technology Explained: A New HTML5 Server Push Event Technology"
"Using WebSocket and SSE Technology to Realize Web-side Message Push"
"Explain the evolution of web-side communication: from Ajax, JSONP to SSE, Websocket"
"Quick Start of IM Communication Technology on the Web: Short Polling, Long Polling, SSE, WebSocket"
"Understanding modern web-side instant messaging technology is enough: WebSocket, socket.io, SSE"
6. Common misconceptions when learning WebSocket
6.1 What is the relationship between WebSocket and HTTP?
WebSocket is a different protocol from HTTP. Both are located in the application layer of the OSI model, and both rely on the TCP protocol of the transport layer.
Although they are different, RFC 6455 states that WebSocket is designed to work over HTTP ports 80 and 443 and to support HTTP proxies and intermediaries, which makes it compatible with HTTP. To achieve this compatibility, the WebSocket handshake uses the HTTP Upgrade header to switch from the HTTP protocol to the WebSocket protocol.
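The Upgrade-based handshake mentioned above can be illustrated with a small check of the request headers. This is a sketch under assumptions: isWebSocketUpgrade is a hypothetical helper, and the headers object is assumed to use lower-cased keys, as Node.js provides them in req.headers.

```javascript
// Hypothetical helper: checks whether the headers of an HTTP request
// ask for an upgrade to the WebSocket protocol.
function isWebSocketUpgrade(headers) {
  const upgrade = (headers["upgrade"] || "").toLowerCase();
  // The Connection header may list several comma-separated tokens
  const connection = (headers["connection"] || "")
    .toLowerCase()
    .split(",")
    .map((token) => token.trim());
  return (
    upgrade === "websocket" &&
    connection.includes("upgrade") &&
    typeof headers["sec-websocket-key"] === "string" &&
    headers["sec-websocket-version"] === "13"
  );
}

console.log(
  isWebSocketUpgrade({
    upgrade: "websocket",
    connection: "keep-alive, Upgrade",
    "sec-websocket-key": "dGhlIHNhbXBsZSBub25jZQ==",
    "sec-websocket-version": "13",
  })
); // true
console.log(isWebSocketUpgrade({ connection: "keep-alive" })); // false
```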
Since the OSI (Open Systems Interconnection) model has been mentioned, here is a vivid diagram describing it (as shown in the figure below).
(Picture quoted from: https://www.networkingsphere.com/2019/07/what-is-osi-model.html)
Of course, the relationship between WebSocket and HTTP cannot be fully explained in a few sentences. Interested readers can read the following two articles:
"Detailed Explanation of WebSocket (4): Questioning the relationship between HTTP and WebSocket (Part 1)"
"Detailed Explanation of WebSocket (5): Questioning the relationship between HTTP and WebSocket (Part 2)"
6.2 What is the difference between WebSocket and long polling?
Long polling means that the client initiates a request, and the server does not respond directly after receiving the request from the client. Instead, the server suspends the request first, and then determines whether the requested data is updated. If there is an update, it will respond. If there is no data, it will wait for a certain period of time before returning.
The essence of long polling is still based on the HTTP protocol, and it is still a question-and-answer (request-response) mode. After a successful handshake, WebSocket is a full-duplex TCP channel, and data can be actively sent from the server to the client.
To understand the difference between WebSocket and long polling, you need a solid understanding of how long polling works. The following articles on long polling are recommended for in-depth reading:
"Comet Technology Explained: Web-side Real-time Communication Technology Based on HTTP Long Connection"
"Beginner's Post: Detailed Explanation of the Principles of the Most Complete Web-side Instant Messaging Technology in History"
"Inventory of Instant Messaging Technologies on the Web: Short Polling, Comet, Websocket, SSE"
"Quick Start of IM Communication Technology on the Web: Short Polling, Long Polling, SSE, WebSocket"
6.3 What is WebSocket heartbeat?
Data is sent and received over the network through sockets. If a socket has already been disconnected, sending and receiving data will inevitably fail.
But how can we tell whether a socket is still usable? This is what a heartbeat mechanism is for.
A "heartbeat" is a custom structure (a heartbeat packet or heartbeat frame) sent at regular intervals to let the other party know that the sender is still "online", which ensures the validity of the link.
With heartbeat packets, the client periodically sends a simple message to the server to say "I am still here". In code, this means sending a fixed message to the server every few minutes, to which the server replies with a fixed message. If the server does not receive any message from a client within a few minutes, it treats that client as disconnected.
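The server-side bookkeeping described above can be sketched as follows. HeartbeatMonitor, touch, and expired are hypothetical names, and the timestamps are passed in explicitly so that the example is deterministic. A real server would call touch whenever a heartbeat message arrives and check expired on a timer.

```javascript
// Hypothetical server-side bookkeeping for application-level heartbeats.
class HeartbeatMonitor {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // clientId -> time of the last heartbeat
  }

  // Record that a heartbeat from the client arrived at time `now`
  touch(clientId, now = Date.now()) {
    this.lastSeen.set(clientId, now);
  }

  // Return the clients whose last heartbeat is older than the timeout
  expired(now = Date.now()) {
    return [...this.lastSeen]
      .filter(([, seenAt]) => now - seenAt > this.timeoutMs)
      .map(([clientId]) => clientId);
  }
}

const monitor = new HeartbeatMonitor(1000);
monitor.touch("client-a", 0);
monitor.touch("client-b", 500);
console.log(monitor.expired(1200)); // [ 'client-a' ]
```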
The WebSocket protocol defines control frames for heartbeat Ping and heartbeat Pong:
1) The heartbeat Ping frame has opcode 0x9: when an endpoint receives a Ping frame, it must send a Pong frame in response, unless it has already received a Close frame. It should respond with the Pong frame as soon as practical;
2) The heartbeat Pong frame has opcode 0xA: a Pong frame sent in response must carry exactly the same "application data" as the Ping frame it replies to.
Regarding point 2): if an endpoint receives a Ping frame and has not yet sent Pong frames in response to previous Ping frames, it may choose to send a Pong frame only for the most recently processed Ping frame. In addition, a Pong frame may be sent unsolicited, in which case it serves as a one-way heartbeat.
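To make the Ping/Pong rule concrete, here is a minimal sketch of how a server could build the Pong frame for a received Ping. The buildPongFrame helper is a hypothetical name, and the Ping payload is assumed to be already unmasked.

```javascript
// Hypothetical helper: builds an unmasked Pong frame (server to client)
// that echoes the application data of a received Ping frame.
function buildPongFrame(pingPayload) {
  // Control frames carry at most 125 bytes of payload
  if (pingPayload.length > 125) {
    throw new RangeError("control frame payload exceeds 125 bytes");
  }
  const header = Buffer.from([0b10001010, pingPayload.length]); // FIN=1, opcode=0xA
  return Buffer.concat([header, pingPayload]);
}

console.log(buildPongFrame(Buffer.from("ping-1")).toString("hex"));
// 8a0670696e672d31
```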
PS: Here is a practical IM article on WebSocket heartbeats. If you are interested, you can read "Web-side Instant Messaging Practice Dry Goods: How to make your WebSocket disconnect and reconnect faster?".
6.4 What is Socket?
Two programs on a network exchange data through a two-way communication connection, and each end of this connection is called a Socket. Establishing a network communication connection therefore requires at least a pair of port numbers.
The essence of Socket: it is an encapsulation of the TCP/IP protocol stack. It provides a programming interface for TCP or UDP and is not another protocol. Through Socket, you can use the TCP/IP protocols.
The description of Socket on Baidu Encyclopedia is as follows:
The original meaning of Socket in English is "hole" or "socket": as the process communication mechanism of BSD UNIX, it takes the latter meaning. Usually called "socket", it is used to describe the IP address and port. It is the handle of a communication chain and can be used to realize the communication between different virtual machines or different computers.