
Background

In day-to-day work we often run into scenarios where the client needs to get the latest data from the server in real time, such as chat systems (WeChat/Telegram), stock quote apps (Tonghuashun/Futu), and feed push systems (Twitter/Weibo). There are several technical solutions for this kind of requirement. This article introduces four common ways to get server-side data in real time: short polling, long polling, WebSocket, and Server-Sent Events (SSE). It explains the basic principle of each scheme and uses each one to implement the same requirement: a dynamic event list. The technology stack is React + native Node.js.

Requirements

First, the requirements for the dynamic event list: the server generates a new event every 5 seconds, and each event has an id field and a timestamp field, both set to the time at which the event was generated. The front end first displays every event the server has already generated as a list. After that, whenever the server generates a new event, the front end fetches it and appends it to the end of the list.
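
To make the data contract concrete, here is a minimal sketch of the event shape and of the one query every scheme below relies on: "give me the events newer than the timestamp I last saw". The helper names `createEvent` and `eventsSince` are hypothetical and do not appear in the project code, and treating a missing timestamp as 0 is a choice made for this sketch.

```javascript
// Hypothetical helpers, not part of the original project.

// An event carries an id and a timestamp; both are the creation time in ms.
function createEvent(now = Date.now()) {
  return { id: now, timestamp: now }
}

// Return the events a client has not seen yet.
// A missing or malformed timestamp is treated as 0, i.e. "send everything".
function eventsSince(events, timestamp) {
  const since = Number.isFinite(timestamp) ? timestamp : 0
  return events.filter(event => event.timestamp > since)
}

const events = [createEvent(1000), createEvent(6000), createEvent(11000)]
console.log(eventsSince(events, 6000)) // only the event created at 11000
console.log(eventsSince(events, NaN).length) // all three events
```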

Here is the project in action:
[Animation: 2022-09-03 17.57.08.gif]

Short polling

Concept explanation

Most programmers have used polling at some point to fetch server-side resources. Put simply, polling means the client keeps calling a server-side API to get the latest data. The following figure shows a simple polling flow:
[Figure: short polling flow]
In the figure, the server responds immediately after the client sends a request, but because the data on the server has not changed yet, it returns an empty result. After waiting for a while (perhaps a few seconds), the client asks the server again. This time the data has been updated, so the server returns the latest data. The client then waits for a while before sending the next request, and so on.

Code

Let's implement the dynamic event list with polling, starting with the Node code:

 // node/polling.js

const http = require('http')
const url = require('url')

// Event list
const events = []
// Timestamp of the most recently generated event
let latestTimestamp = 0

// Event producer
const EventProducer = () => {
  const event = {
    id: Date.now(),
    timestamp: Date.now()
  }
  events.push(event)
  latestTimestamp = event.timestamp
}

// Generate a new event every 5 seconds
setInterval(() => {
  EventProducer()
}, 5000)

const server = http.createServer((req, resp) => {
  const urlParsed = url.parse(req.url, true)

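  resp.setHeader('Access-Control-Allow-Origin', '*')

  if (urlParsed.pathname == '/events') {
    // The client sends the timestamp of the last event it received
    const timestamp = parseInt(urlParsed.query['timestamp'], 10)
    // Send every event this client has not received yet. The list is empty
    // when there is nothing new, so the body is always valid JSON.
    const newEvents = timestamp < latestTimestamp
      ? events.filter(event => event.timestamp > timestamp)
      : []
    resp.write(JSON.stringify(newEvents))
    resp.end()
  } else {
    // Unknown path: reply instead of leaving the request hanging
    resp.statusCode = 404
    resp.end()
  }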
})

server.listen(8080, () => {
  console.log('server is up')
})

The code above is simple: it implements an events API, and on every request the front end sends the timestamp of the last event it received in order to get the events generated after that point. Now the front-end implementation:

 // react/Polling.jsx

import { useEffect, useRef, useState } from 'react'

const fetchLatestEvents = async (timestamp) => {
  // Fetch the latest events
  const body = await fetch(`http://localhost:8080/events?timestamp=${timestamp}`)
  if (body.ok) {
    const json = await body.json()
    return json
  } else {
    console.error('failed to fetch')
  }
}

function App() {
  // Timestamp of the newest event this client has received
  const timestampRef = useRef(0)
  const [events, setEvents] = useState([])
  
  useEffect(() => {
    const timer = setInterval(async () => {
      const latestEvents = await fetchLatestEvents(timestampRef.current)
      if (latestEvents && latestEvents.length) {
        timestampRef.current = latestEvents[latestEvents.length - 1].timestamp
        setEvents(events => [...events, ...latestEvents])
      }
    }, 3000)

    return () => {
      clearInterval(timer)
    }
  }, [])

  return (
    <div className="App">
      <h2>event list</h2>
      <ol>
        {
          events.map(event => {
            return <li key={event.id}>{`${event.id}`}</li>
          })
        }
      </ol>
    </div>
  );
}

export default App

Opening Chrome DevTools shows that the front end sends a request to the back end every 3 seconds. That is quite frequent, and whenever the back end has produced no new data the request returns an empty result, which means much of the network traffic is wasted.

[Figure: many of the requests get empty responses]

Pros and Cons of Polling

As the code above shows, the biggest advantage of short polling is that it is simple to implement. Its shortcomings are just as obvious:

  • Many useless requests: the client does not know when the server's data changes, so it can only keep asking. If the data changes infrequently, most of these requests are useless. They increase the server's bandwidth usage and consume server resources, and on a mobile client they also drain the battery quickly.
  • Poor real-time behavior: to avoid consuming too many client or server resources, a polling client usually does not send the next request immediately after receiving the previous response. As a result, even after the data on the server has changed, it takes a while for the client to see it, which is fatal for applications with strict real-time requirements such as IM systems.
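
Both drawbacks can be put into numbers with a small simulation. The sketch below is a simplified model, not a measurement: it assumes events appear exactly every `eventIntervalMs`, polls happen exactly every `pollIntervalMs`, and requests take no time. The function name `simulateShortPolling` is hypothetical.

```javascript
// Simulate short polling against a server that produces one event every
// eventIntervalMs. Returns how many polls came back empty and the average
// delay between an event being produced and a poll picking it up.
function simulateShortPolling({ pollIntervalMs, eventIntervalMs, durationMs }) {
  let polls = 0
  let emptyPolls = 0
  let totalDelayMs = 0
  let delivered = 0
  let lastSeen = 0 // timestamp of the newest event the client has seen

  for (let now = pollIntervalMs; now <= durationMs; now += pollIntervalMs) {
    polls++
    // Events exist at eventIntervalMs, 2 * eventIntervalMs, ... up to now.
    const newest = Math.floor(now / eventIntervalMs) * eventIntervalMs
    if (newest <= lastSeen) {
      emptyPolls++
      continue
    }
    // Account for every event produced since the previous successful poll.
    for (let t = lastSeen + eventIntervalMs; t <= newest; t += eventIntervalMs) {
      totalDelayMs += now - t
      delivered++
    }
    lastSeen = newest
  }

  const averageDelayMs = delivered ? totalDelayMs / delivered : 0
  return { polls, emptyPolls, delivered, averageDelayMs }
}

const result = simulateShortPolling({ pollIntervalMs: 3000, eventIntervalMs: 5000, durationMs: 60000 })
console.log(result)
```

With the intervals used in this article (a poll every 3 seconds, an event every 5 seconds), this model gives 8 empty responses out of 20 polls in one minute, and each event waits 1 second on average before the client sees it. Slowing down the event producer makes the share of empty polls grow, and slowing down the polling makes the delay grow.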

Use cases

Production-grade applications generally do not use short polling, unless the system serves only a handful of users.

Long polling

As described above, short polling has two major flaws: too many useless requests and poor real-time behavior. To solve both, some clever programmers came up with another scheme: long polling. The following figure shows a simple long polling flow:
[Figure: long polling flow]
In the figure, after the client sends a request, the server finds that there is no new data. Instead of responding immediately, the server suspends the request. If there is still no new data after a waiting period (usually 30s or 60s), it returns an empty result. As soon as the client receives the response, it immediately sends a new request. This time the server again holds the request, but the data is updated while it waits, so the server returns the latest data. The client sends the next request as soon as it gets the result, and so on.
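
The core of the idea is a registry of suspended requests, each of which is answered exactly once: either with new data or, on timeout, with an empty result. The sketch below isolates that bookkeeping from HTTP so it can be read on its own. The class `PendingRequests` and its method names are hypothetical, and `respond` stands in for writing and ending an HTTP response.

```javascript
// Hypothetical sketch: a registry of suspended long-polling requests.
class PendingRequests {
  constructor() {
    this.pending = new Set()
  }

  // Suspend a request. `respond` is called exactly once, either with new
  // events (publish) or with an empty list (expire).
  suspend(timestamp, respond) {
    const entry = { timestamp, respond }
    this.pending.add(entry)
    return entry
  }

  // Timeout path: answer with an empty list so the client can re-subscribe.
  expire(entry) {
    if (this.pending.delete(entry)) entry.respond([])
  }

  // New data path: answer every suspended request with what it has missed.
  publish(events) {
    const waiting = [...this.pending]
    this.pending.clear()
    for (const entry of waiting) {
      entry.respond(events.filter(event => event.timestamp > entry.timestamp))
    }
  }
}

const registry = new PendingRequests()
const replies = []
const first = registry.suspend(0, body => replies.push(['first', body]))
registry.suspend(5000, body => replies.push(['second', body]))

registry.expire(first) // the first request timed out: empty reply
registry.publish([{ id: 5000, timestamp: 5000 }, { id: 10000, timestamp: 10000 }])
registry.expire(first) // no effect, it was already answered
console.log(JSON.stringify(replies))
```

Removing an entry from the set before responding is what guarantees that a request is never answered twice, for example once by a timeout and again by a later event.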

Code

Now let's implement the dynamic event list with long polling, starting with the back-end code:

// node/long-polling.js

const http = require('http')
const url = require('url')

const events = []

// Timeout timers of the suspended requests
let timers = new Set()
// Requests that are currently suspended
let subscribers = new Set()

const EventProducer = () => {
  const event = {
    id: Date.now(),
    timestamp: Date.now()
  }
  events.push(event)

  // Answer every suspended request
  subscribers.forEach(subscriber => {
    subscriber.resp.write(JSON.stringify(events.filter(event => event.timestamp > subscriber.timestamp)))
    subscriber.resp.end()
  })
  // Reset subscribers
  subscribers.clear()
  // Cancel the timeout callbacks of the answered requests
  timers.forEach(timer => clearTimeout(timer))
  // Reset timers
  timers.clear()
}

// Generate an event every 5 seconds
setInterval(() => {
  EventProducer()
}, 5000)

const server = http.createServer((req, resp) => {
  const urlParsed = url.parse(req.url, true)

  resp.setHeader('Access-Control-Allow-Origin', '*')

  if (urlParsed.pathname == '/list') {
    // Send the events that already exist on the server
    resp.write(JSON.stringify(events))
    resp.end()
  } else if (urlParsed.pathname == '/subscribe') {
    const timestamp = parseInt(urlParsed.query['timestamp'], 10)
    const subscriber = {
      timestamp,
      resp
    }
    // Suspend the new request
    subscribers.add(subscriber)

    // Time out after 30s: reply with an empty list and forget the request,
    // so the producer never writes to a response that has already ended
    const timer = setTimeout(() => {
      subscribers.delete(subscriber)
      timers.delete(timer)
      resp.end(JSON.stringify([]))
    }, 30000)
    timers.add(timer)

    // The connection was closed (for example, the client disconnected)
    req.on('close', () => {
      subscribers.delete(subscriber)
      timers.delete(timer)
      clearTimeout(timer)
    })
  } else {
    // Unknown path: reply instead of leaving the request hanging
    resp.statusCode = 404
    resp.end()
  }
})

server.listen(8080, () => {
  console.log('server is up')
})

In the code above, every new request is suspended (saved in a set), and when a new event is generated, each suspended request gets back all the events its client has not yet received. Now the front-end code:

// react/LongPolling.jsx

import { useEffect, useRef, useState } from 'react'

const fetchLatestEvents = async (timestamp) => {
  const body = await fetch(`http://localhost:8080/subscribe?timestamp=${timestamp}`)
  if (body.ok) {
    const json = await body.json()
    return json
  } else {
    console.error('failed to fetch')
  }
}

const listEvents = async () => {
  const body = await fetch(`http://localhost:8080/list`)
  if (body.ok) {
    const json = await body.json()
    return json
  } else {
    console.error('failed to fetch')
  }
}

function App() {
  const timestampRef = useRef(0)
  const eventsRef = useRef([])
  const [refresh, setRefresh] = useState(false)

  useEffect(() => {
    const fetchTask = async () => {
      if (timestampRef.current === 0) {
        // Initial load
        const currentEvents = await listEvents()
        // The server may not have generated any event yet
        if (currentEvents && currentEvents.length) {
          timestampRef.current = currentEvents[currentEvents.length - 1].timestamp
          eventsRef.current = [...eventsRef.current, ...currentEvents]
        }
      }

      const latestEvents = await fetchLatestEvents(timestampRef.current)
      if (latestEvents && latestEvents.length) {
        timestampRef.current = latestEvents[latestEvents.length - 1].timestamp
        eventsRef.current = [...eventsRef.current, ...latestEvents]
      }
    }

    fetchTask()
      .catch(err => {
        console.error(err)
      })
      .finally(() => {
        // Trigger the next request
        setRefresh(refresh => !refresh)
      })
  }, [refresh])

  return (
    <div className="App">
      <h2>event list</h2>
      <ol>
        {
          eventsRef.current.map(event => {
            return <li key={event.id}>{`${event.id}`}</li>
          })
        }
      </ol>
    </div>
  );
}

export default App

Note that the browser's debugging tools now show that a request does not get a reply right away. It stays pending for a while (about 5 seconds) and then completes with a response that contains data.

[Figure: long polling requests staying pending before they complete]

Pros and Cons of Long Polling

Long polling addresses both problems of short polling. First, the server does not respond while there is no new data, which avoids a large number of repeated requests from the client. Second, the client sends the next request immediately after receiving a response, which gives better real-time behavior. But long polling is not perfect either:

  • High server resource consumption: the server holds each client's request open, and every held request occupies server resources. In some languages each HTTP connection is served by its own thread, so too many open connections consume a lot of server memory.
  • Poor fit for frequently updated data: if the data changes frequently, connections are created and re-created constantly, which is expensive. HTTP keep-alive mitigates part of this, but the client still has to re-subscribe every time it receives data. Compared with WebSocket and SSE, there is an extra request on every update, which hurts both latency and performance.

Application scenarios

According to information found online, the earlier WebQQ and the web version of WeChat were built on long polling. I don't know whether that is still the case; interested readers can check for themselves.

WebSocket

Concept explanation

As mentioned above, long polling is a poor fit when server-side data changes frequently, and one solution to this is WebSocket. Put simply, WebSocket establishes a persistent, long-lived connection between the client and the server. The connection is duplex: both the client and the server can send messages to each other in real time. The following figure illustrates WebSocket:

[Figure: WebSocket handshake and communication]
In the figure, the client first sends an HTTP request whose headers tell the server that it wants to communicate over the WebSocket protocol. If the server supports the upgrade, it replies with a Switching Protocols response, and from then on both sides communicate over the WebSocket protocol.
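
For readers curious about what the Switching Protocols response contains, here is a minimal sketch of the server side of the handshake as specified in RFC 6455: the server appends a fixed GUID to the client's `Sec-WebSocket-Key` header, hashes the result with SHA-1, and returns it base64-encoded in `Sec-WebSocket-Accept`. The function names are hypothetical, and this is for illustration only; the `ws` library used below handles the handshake internally.

```javascript
const crypto = require('crypto')

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

// Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64')
}

// Build the raw "101 Switching Protocols" response for a given client key.
function buildUpgradeResponse(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    '',
    ''
  ].join('\r\n')
}

// Sample key taken from the handshake example in RFC 6455.
console.log(buildUpgradeResponse('dGhlIHNhbXBsZSBub25jZQ=='))
```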

Code

Let's see how to implement the dynamic event list with WebSocket, starting with the back-end code:

 // node/websocket.js

const WebSocket = require('ws')

const events = []
let latestTimestamp = Date.now()

const clients = new Set()

const EventProducer = () => {
  const event = {
    id: Date.now(),
    timestamp: Date.now()
  }
  events.push(event)
  latestTimestamp = event.timestamp
  
  // Push to every connected socket
  clients.forEach(client => {
    client.ws.send(JSON.stringify(events.filter(event => event.timestamp > client.timestamp)))
    client.timestamp = latestTimestamp
  })
}

// Generate a new event every 5 seconds
setInterval(() => {
  EventProducer()
}, 5000)

// Start the WebSocket server
const wss = new WebSocket.Server({ port: 8080 })

wss.on('connection', (ws, req) => {
  console.log('client connected')

  // On first connection, push the events that already exist
  ws.send(JSON.stringify(events))
  
  const client = {
    timestamp: latestTimestamp,
    ws,
  }
  clients.add(client)

  ws.on('close', _ => {
    clients.delete(client)
  })
})

In the code above, when a client connects, the server records a timestamp for it and, whenever new events are generated, pushes every event that client has not yet received. Here is the front-end code:

 // react/WebSocket.jsx

import { useEffect, useRef, useState } from 'react'

function App() {
  const timestampRef = useRef(0)
  const eventsRef = useRef([])
  const [_, setRefresh] = useState(false)
  
  useEffect(() => {
    const ws = new WebSocket(`ws://localhost:8080/ws?timestamp=${timestampRef.current}`)
    
    ws.addEventListener('open', e => {
      console.log('successfully connected')
    })
    
    ws.addEventListener('close', e => {
      console.log('socket close')
    })
    
    ws.addEventListener('message', (ev) => {
      const latestEvents = JSON.parse(ev.data)
      if (latestEvents && latestEvents.length) {
        timestampRef.current = latestEvents[latestEvents.length - 1].timestamp
        eventsRef.current = [...eventsRef.current, ...latestEvents]
        setRefresh(refresh => !refresh)
      }
    })
    
    return () => {
      ws.close()
    }
  }, [])

  return (
    <div className="App">
      <h2>event list</h2>
      <ol>
        {
          eventsRef.current.map(event => {
            return <li key={event.id}>{`${event.id}`}</li>
          })
        }
      </ol>
    </div>
  );
}

export default App

Open Chrome's network debugging tool and click the WS filter. You will see that there is only one WebSocket connection between the client and the server, and all their communication happens on it:
[Figure: a single WebSocket connection in the Network panel]

Advantages and disadvantages of WebSocket

In my view, WebSocket has the following advantages:

  • Few connections between client and server: ideally the client sends a single HTTP upgrade request to switch to WebSocket, all later messages travel over that channel, and no further connections need to be established.
  • Low latency: because the connection stays open, the server can push an update to the client as soon as the data changes.
  • Duplex communication: both the server and the client can send messages at any time, which is hard to achieve with the other three schemes in this article.
  • Suited to frequently updated server data: unlike long polling, the server can push new information at any time, and the client does not need to reconnect or send a new request after receiving it.

WebSocket is not perfect either. It has the following problems:

  • Hard to scale: WebSocket-based services are stateful, which makes scaling out troublesome and the system design more complex.
  • Proxy limits: some proxy software (such as Nginx) limits by default how long a long-lived connection may stay open, possibly to only tens of seconds, so the client has to reconnect automatically. Lifting the limit means changing the configuration of every proxy layer between client and server, which may not be feasible in practice.
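
Since automatic reconnection is left to the client, a common approach is to wait a little longer after each failed attempt. The sketch below shows one way to compute that wait (exponential backoff with a cap and some jitter). The function `reconnectDelay` and its default values are assumptions made for illustration and are not part of the project code.

```javascript
// Hypothetical helper: delay in ms before the n-th reconnect attempt (0-based).
// The delay doubles from baseMs up to maxMs; `random` adds up to 20% jitter so
// that many clients dropped at the same moment do not reconnect in lockstep.
function reconnectDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt)
  const jitter = exponential * 0.2 * random()
  return Math.round(exponential + jitter)
}

// With jitter disabled the schedule is deterministic.
const noJitter = { random: () => 0 }
const schedule = [0, 1, 2, 3, 4, 5, 6].map(n => reconnectDelay(n, noJitter))
console.log(schedule)
```

In the front-end code, the `close` listener could schedule a new connection with `setTimeout` using this delay and reset the attempt counter in the `open` listener.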

Application scenarios

WebSocket fits systems that have strict real-time requirements and need duplex communication, such as IM software.

Server-Sent Events

Concept explanation

Server-Sent Events, abbreviated as SSE, is an HTTP-based technology that lets the server push data to the client. Here is a simple SSE diagram:
[Figure: SSE flow]
In the figure, the client opens a persistent HTTP connection to the server. The server suspends the request after receiving it and, whenever there is a new message, pushes the data to the client over this connection. Note that, unlike a WebSocket connection, an SSE connection is one-way: the client cannot send messages to the server over it.
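
On the wire, each pushed message is a few lines of plain text in the form `field: value`, terminated by an empty line. The sketch below serializes one message with the `id`, `event` and `data` fields of the event stream format. The helper name `formatSseMessage` is hypothetical.

```javascript
// Hypothetical helper: serialize one SSE message.
function formatSseMessage({ id, event, data }) {
  const lines = []
  if (id !== undefined) lines.push(`id: ${id}`)
  if (event !== undefined) lines.push(`event: ${event}`)
  // A payload containing newlines must be sent as several "data:" lines.
  for (const line of String(data).split('\n')) {
    lines.push(`data: ${line}`)
  }
  // The empty line (two consecutive "\n") marks the end of the message.
  return lines.join('\n') + '\n\n'
}

const message = formatSseMessage({
  id: 1662198000000,
  data: JSON.stringify([{ id: 1, timestamp: 1 }])
})
process.stdout.write(message)
```

A server would write the returned string to the open response with `resp.write(...)`. The browser's `EventSource` joins multiple `data:` lines back together with newlines before delivering the message.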

Code

As before, let's implement the dynamic event list with SSE, starting with the back-end code:

 // node/sse.js

const http = require('http')
const url = require('url')

const events = []
const clients = new Set()
let latestTimestamp = Date.now()

const headers = {
  // Tell the client that this response is an event stream
  'Content-Type': 'text/event-stream',
  // Keep the HTTP connection open
  'Connection': 'keep-alive',
  'Cache-Control': 'no-cache',
  'Access-Control-Allow-Origin': '*'
}

const EventProducer = () => {
  const event = {
    id: Date.now(),
    timestamp: Date.now()
  }
  events.push(event)
  latestTimestamp = event.timestamp

  clients.forEach(client => {
    client.resp.write(`id: ${(new Date()).toLocaleTimeString()}\n`)
    // The trailing \n\n is required: an empty line marks the end of one SSE message
    client.resp.write(`data: ${JSON.stringify(events.filter(event => event.timestamp > client.timestamp))}\n\n`)
    client.timestamp = latestTimestamp
  })
}

// Generate a new event every 5 seconds
setInterval(() => {
  EventProducer()
}, 5000)

const server = http.createServer((req, resp) => {
  const urlParsed = url.parse(req.url, true)

  if (urlParsed.pathname == '/subscribe') {
    resp.writeHead(200, headers)
    
    // Send the events that already exist
    resp.write(`id: ${(new Date()).toLocaleTimeString()}\n`)
    resp.write(`data: ${JSON.stringify(events)}\n\n`)
   
    const client = {
      timestamp: latestTimestamp,
      resp
    }
    
    clients.add(client)
    req.on('close', () => {
      clients.delete(client)
    })
  }
})

server.listen(8080, () => {
  console.log('server is up')
})

In the code above, every time a client sends a request, the server first returns all existing events, then suspends the request and pushes every new event to the client as it is generated. Here is the front-end code:

 // react/SSE.jsx
import { useEffect, useRef, useState } from 'react'

function App() {
  const timestampRef = useRef(0)
  const eventsRef = useRef([])
  const [, setRefresh] = useState(false)
  
  useEffect(() => {
    const source = new EventSource(`http://localhost:8080/subscribe?timestamp=${timestampRef.current}`)
    
    source.onopen = () => {
      console.log('connected')
    }
    
    source.onmessage = event => {
      const latestEvents = JSON.parse(event.data)
      if (latestEvents.length) {
        timestampRef.current = latestEvents[latestEvents.length - 1].timestamp
        eventsRef.current = [...eventsRef.current, ...latestEvents]
        setRefresh(refresh => !refresh)
      }
    }
    
    source.addEventListener('error', (e) => {
      console.error('Error: ',  e);
    })

    return () => {
      source.close()
    }
  }, [])

  return (
    <div className="App">
      <h2>event list</h2>
      <ol>
        {
          eventsRef.current.map(event => {
            return <li key={event.id}>{`${event.id}`}</li>
          })
        }
      </ol>
    </div>
  );
}

export default App

Open Chrome's network debugging tool and you will see that the HTTP request now has the EventStream type, and every event pushed from the server to the client arrives over this one connection without a new connection being established.

[Figure: the EventStream request in the Network panel]

Advantages and disadvantages of SSE

In my view, SSE has the following advantages:

  • Few connections: there is only one persistent HTTP connection between the client and the server, so performance is good.
  • Low latency: SSE is closer to real time than long polling, because the connection between server and client is persistent and new messages can be pushed to the client directly.

The problems with SSE are also obvious:

  • One-way communication: an SSE connection is one-way and does not let the client push data to the server.
  • Proxy layer restrictions: as with WebSocket, you will run into proxy configuration problems. If the configuration is wrong, the client has to keep reconnecting to the server.
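
Reconnects are less painful with SSE than they sound, because the browser's `EventSource` reconnects on its own and reports the last `id:` value it received in a `Last-Event-ID` request header. The sketch below shows how a server could use that header to decide where to resume. It is a hypothetical helper and it assumes the server sends each event's timestamp as the SSE id, which the sample server in this article does not do (it sends a formatted time string).

```javascript
// Hypothetical helper: decide from which timestamp a (re)connecting SSE
// client should resume. Assumes the SSE "id:" field carries the timestamp
// of the newest event in each pushed message.
function resumeTimestamp(headers, query = {}) {
  // Node.js lower-cases incoming header names.
  const candidates = [headers['last-event-id'], query.timestamp]
  for (const value of candidates) {
    const parsed = Number.parseInt(value, 10)
    if (Number.isFinite(parsed)) return parsed
  }
  return 0 // nothing usable: send everything
}

console.log(resumeTimestamp({ 'last-event-id': '1662198000000' }, { timestamp: '0' }))
console.log(resumeTimestamp({}, { timestamp: '42' }))
console.log(resumeTimestamp({}, {}))
```

The header takes precedence over the query string here because the URL passed to `EventSource` stays the same across automatic reconnects, so its `timestamp` parameter would be stale.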

Use cases

SSE suits scenarios that only need one-way event pushes from the server to the client, such as stock quote apps.

Summary

This article introduced four ways to keep client data in sync with the server, using diagrams and working code. I hope that the next time you face a similar requirement you will have more options than short polling. At the same time, I want to stress one thing: no technology is a Swiss Army knife. Each has scenarios where it fits and scenarios where it does not, so weigh the trade-offs against your actual situation and choose the most suitable solution. Never use a technology just for the sake of using it!

Personal updates

Writing takes effort. If you learned something from this article, please give it a like or follow me. Your support is my biggest motivation to keep writing!

You are also welcome to follow the official account 进击的大葱 so we can learn and grow together.

