
Node streams are difficult to understand and difficult to use, but "stream" is an important concept that keeps becoming more common (fetch responses are streams, for example), so it is worth studying streams seriously.

Fortunately, after Node streams came the Web Streams API, which is much easier to use and understand. This article draws on Web Streams Everywhere (and Fetch for Node.js), 2016 - the year of web streams, ReadableStream, and WritableStream.

Node streams and web streams can be converted to each other: .fromWeb() converts a web stream into a Node stream, and .toWeb() converts a Node stream into a web stream.
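A minimal sketch of this conversion, assuming a recent Node.js version where the stream.Readable.fromWeb() and stream.Readable.toWeb() helpers are available:

import { Readable } from 'node:stream'

// Node stream -> web stream
const webReadable = Readable.toWeb(Readable.from(['hello', ' ', 'world']))

// Web stream -> Node stream
const nodeReadable = Readable.fromWeb(webReadable)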

Intensive reading

What is a stream?

Stream is an abstract API. We can draw an analogy with Promise: if Promise is the standard API for asynchronous operations, stream aims to become the standard API for I/O.

What is I/O? Input and output, that is, the reading and writing of information. Watching videos, loading images, browsing web pages, encoding and decoding: these are all I/O scenarios, and I/O does not necessarily involve a large amount of data. Reading a file from disk is I/O, and reading the string "hello world" also counts as I/O.

Stream is the current standard abstraction for I/O.

To better understand the API design of streams, let's first think for ourselves: how would we design a standard I/O API?

How should an I/O API be abstracted?

read() and write() are the first APIs that come to mind; we could go on to add open(), close(), and so on.

These APIs can indeed be called standard I/O APIs, and they are simple enough. But they have a shortcoming: they do not account for reading and writing large amounts of data. What counts as a large amount? For example, reading a video file of several GB, or loading a web page over a slow 2G network. In these cases, if we only have read and write, a single read call might take 2 hours to return and a single write call 3 hours, and in the meantime users watching a video or a web page cannot accept such a long blank screen.

But why don't we wait that long when watching videos or browsing web pages? Because a web page does not need all of its resources to load before we can browse and interact with it; many resources are loaded asynchronously after the first screen is rendered. This is even more true for videos: we do not wait for a 30 GB movie to finish loading before playing it, but can start playback once the first 300 KB have been downloaded.

Whether it is a video or a web page, responding quickly means resources keep loading while the content is already in use. If we design an API model that supports this, it covers both large and small resources, and is naturally more reasonable than the plain read/write design.

This behavior of continuously loading resources is a stream.

What is a stream

A stream can be thought of as describing the continuous flow of a resource. We need to treat I/O as a continuous process, just like diverting water from one river to another.

To make an analogy: sending HTTP requests, browsing web pages, and watching videos can all be regarded as a South-to-North Water Diversion project, in which the water of River A is continuously transferred to River B.

When sending an HTTP request, River A is the back-end server and River B is the client; when browsing the web, River A is someone else's website and River B is your phone; when watching a video, River A is the video resource on the Internet (of course it may also be local) and River B is your video player.

Therefore, streaming is a continuous process, and there may be multiple nodes along the way. Not only is the network request a stream; after the resource reaches the local disk it is read into memory, and video decoding is also a stream. There are many intermediate "reservoir" nodes along the way, just as there are in the South-to-North Water Diversion project.

Taking all of these considerations together, the web streams API took its final shape.

There are three kinds of streams: readable streams, writable streams, and transform streams. Their relationship is as follows:

<img width=400 src="https://z3.ax1x.com/2021/10/23/52vHne.png">

  • Readable streams represent River A, the source of the data. Because they are the data source, they can only be read, not written.
  • Writable streams represent River B, the destination of the data. Because they keep receiving water, they can only be written, not read.
  • Transform streams are the intermediate nodes that transform the data. For example, a dam between River A and River B can regulate the speed of water delivery by storing water, and can also install filters to purify the water. A transform stream's writable end takes in the water from River A, and its readable end lets River B read the result.

At first glance this seems like a complicated concept, but it maps very naturally to diverting water between rivers; the design of streams stays close to everyday experience.

To understand stream, you need to think about the following three questions:

  1. Where do readable streams come from?
  2. Do we need transform streams for intermediate processing?
  3. How does the writable stream consume the data?

Let me explain again why streams require these three extra considerations compared with read() and write(): since streams abstract I/O as a continuous flow, the resource to be read must itself be a readable stream, so we have to construct one (in the future, more and more function return values may be streams, i.e. we will already be working in a stream environment and will not need to think about constructing them). Reading a stream is a continuous process, so it is not as simple as calling a function once and getting everything back; for the same reason, writable streams also have their own API. It is precisely because the resource is abstracted that both producing and consuming are wrapped in a layer of stream API, whereas read() and write() are just plain function calls and carry no such extra mental burden.

Fortunately, the web streams API is relatively simple and easy to use, and as a standard specification it is worth mastering. Each kind of stream is explained below.

readable streams

A readable stream cannot be written to from the outside, so its data can only be supplied when it is constructed:

const readableStream = new ReadableStream({
  start(controller) {
    controller.enqueue('h')
    controller.enqueue('e')
    controller.enqueue('l')
    controller.enqueue('l')
    controller.enqueue('o')
    controller.close()
  }
})

controller.enqueue() can enqueue any value, i.e. add it to the stream's internal queue, and controller.close() closes the stream, after which nothing more can be enqueued. The moment close() is called here also determines when the close callback of a piped writable stream will respond.

The above is just a mock example. In real scenarios, readable streams are usually objects returned by some function call; the most common is fetch:

async function fetchStream() {
  const response = await fetch('https://example.com')
  const stream = response.body;
}

As you can see, the response.body returned by fetch is a readable stream.

We can consume the readable stream directly in the following way:

readableStream.getReader().read().then(({ value, done }) => {})

We can also pipe it to a transform stream with readableStream.pipeThrough(transformStream), or to a writable stream with readableStream.pipeTo(writableStream).
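For example, a minimal sketch that drains a readable stream with a reader loop (assuming its chunks are strings, as in the mock above):

async function readAll(stream) {
  const reader = stream.getReader()
  let result = ''
  while (true) {
    // read() resolves with the next chunk, or with done: true once the stream is closed
    const { done, value } = await reader.read()
    if (done) break
    result += value
  }
  return result
}

readAll(readableStream).then(text => console.log(text)) // hello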

Whether it is a manual mock or the return value of a function, we can guess that a readable stream is not necessarily full of data yet. For example, response.body may have to wait because we start reading it early, just like a connected water pipe whose source pool fills slowly. We can also manually simulate this slow-reading situation:

const readableStream = new ReadableStream({
  start(controller) {
    controller.enqueue('h')
    controller.enqueue('e')

    setTimeout(() => {
      controller.enqueue('l')
      controller.enqueue('l')
      controller.enqueue('o')
      controller.close()
    }, 1000)
  }
})

In the example above, if we connect the writable stream at the beginning, we have to wait 1s before the complete 'hello' data is available; but if we connect it after 1s, 'hello' can be read instantly. In addition, the writable stream itself may process data slowly: if it takes 1s to process each character, then the writable stream will be slower than the readable stream no matter when it starts.

Therefore, we can see that the design of streams is meant to maximize the efficiency of the whole data-processing pipeline: no matter how late the readable stream's data arrives, how late the writable stream is connected, or how slowly the writable stream processes data, every link stays as efficient as possible:

  • If the readableStream's data arrives slowly, we can connect to it later, so that by then it is ready and can be consumed quickly (see the sketch after this list).
  • If the writableStream is slow, only this consuming end is slow; the readableStream "water pipe" it is connected to may have been ready for a long time, so swapping in a more efficient writableStream can improve overall throughput.
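A rough sketch of the first point, reusing the delayed readableStream above and the readAll helper sketched earlier:

setTimeout(() => {
  // By now 'h', 'e', 'l', 'l', 'o' have all been buffered,
  // so readAll resolves almost instantly
  readAll(readableStream).then(text => console.log(text)) // hello
}, 2000)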

writable streams

A writable stream cannot be read; it is created as follows:

const writableStream = new WritableStream({
  write(chunk) {
    return new Promise(resolve => {
      // This is where the data is consumed; you can insert into the DOM here, etc.
      console.log(chunk)

      resolve()
    });
  },
  close() {
    // Called when the stream is closed (e.g. after the readable stream's controller.close())
  },
})

A writable stream does not need to care what the readable stream is; as long as you care about how the data is written, you only need to implement the write callback.

The write callback can return a Promise, so if we consume each chunk slowly, the execution of the writable stream slows down accordingly. We can think of it as River A diverting water to River B: even if River A's channel is very wide and pours all its water in, River B's channel is narrow and cannot handle such a large flow, so the overall flow speed is still limited by the width of River B's channel (of course, there is no risk of flooding here).

So how does a writableStream trigger writes? You can write to it directly via its writer's write():

writableStream.getWriter().write('h')
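A slightly fuller sketch of driving the writer by hand, assuming the writableStream created above (ready, write, and close are all standard writer members):

const writer = writableStream.getWriter()

async function writeHello() {
  for (const chunk of ['h', 'e', 'l', 'l', 'o']) {
    // Wait until the stream can accept more data (backpressure), then write
    await writer.ready
    await writer.write(chunk)
  }
  await writer.close()
}

writeHello()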

It is also possible to connect a readableStream to it via pipeTo(). Writing by hand is like pouring water bucket by bucket, while pipeTo() connects a water pipe directly, so we only need to handle the write side:

readableStream.pipeTo(writableStream)

pipeTo can also be assembled from the most primitive APIs. To understand it more deeply, let's simulate pipeTo with the raw reader and writer methods:

const reader = readableStream.getReader()
const writer = writableStream.getWriter()

function tryRead() {
  reader.read().then(({ done, value }) => {
    if (done) {
      return
    }

    // writer.ready is a Promise, not a function; wait for it before writing
    writer.ready.then(() => writer.write(value))

    tryRead()
  })
}

tryRead()
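The same simulation can also be written with async/await, waiting for each write to finish before reading the next chunk, which is closer to how the real pipeTo applies backpressure (still only a sketch, not the actual algorithm):

async function pipe(readableStream, writableStream) {
  const reader = readableStream.getReader()
  const writer = writableStream.getWriter()

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    await writer.ready
    await writer.write(value)
  }

  await writer.close()
}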

transform streams

Internally, a transform stream is a writable stream plus a readable stream. A transform stream is created as follows:

const decoder = new TextDecoder()
const decodeStream = new TransformStream({
  transform(chunk, controller) {
    controller.enqueue(decoder.decode(chunk, {stream: true}))
  }
})

chunk is the incoming data and controller.enqueue is the entry point of the internal readableStream, so under the hood a transform stream really is two streams stacked together, with the API simplified to a single transform callback: the data written to it is converted and re-exposed as a readable stream for a subsequent writable stream to consume.

Of course, many native transform streams are available out of the box, such as TextDecoderStream:

const textDecoderStream = new TextDecoderStream()
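For example, a common sketch is to pipe a fetch response body through TextDecoderStream so that later stages receive strings instead of raw bytes (the URL here is just a placeholder):

async function logPage() {
  const response = await fetch('https://example.com')
  await response.body
    .pipeThrough(new TextDecoderStream())
    .pipeTo(new WritableStream({
      write(chunk) {
        // chunk is now a string rather than a Uint8Array
        console.log(chunk)
      }
    }))
}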

readable to writable streams

The following is a complete example that includes both encoding and decoding:

// Create the readable stream
const readableStream = new ReadableStream({
  start(controller) {
    const textEncoder = new TextEncoder()
    const chunks = textEncoder.encode('hello') // a Uint8Array of 5 bytes
    chunks.forEach(chunk => controller.enqueue(chunk))
    controller.close()
  }
})

// Create the writable stream
const writableStream = new WritableStream({
  write(chunk) {
    const textDecoder = new TextDecoder()
    return new Promise(resolve => {
      // Each chunk is a single UTF-8 byte, so wrap it in a one-byte view before decoding
      const view = new Uint8Array([chunk]);
      const decoded = textDecoder.decode(view, { stream: true });
      console.log('decoded', decoded)

      setTimeout(() => {
        resolve()
      }, 1000)
    });
  },
  close() {
    console.log('writable stream close')
  },
})

readableStream.pipeTo(writableStream)

First, the readableStream uses TextEncoder to enqueue the five characters of 'hello' (as individual bytes) almost instantly and then calls controller.close(), which means this readableStream is fully initialized right away; it cannot be modified afterwards and can only be read.

In the write method of the writableStream, we use TextDecoder to decode each chunk, one letter at a time, print it to the console, and then resolve after 1s, so the writable stream prints one letter per second:

h
# 1s later
e
# 1s later
l
# 1s later
l
# 1s later
o
writable stream close

The encoding and decoding in this example is not elegant. We do not have to write the encoding and decoding inside the stream functions themselves; we can put them in transform streams instead, for example:

readableStream
  .pipeThrough(new TextEncoderStream())
  .pipeThrough(customStream)
  .pipeThrough(new TextDecoderStream())
  .pipeTo(writableStream)

In this way, readableStream and writableStream do not need to handle encoding or decoding themselves: the data is converted into Uint8Array in the middle, which is convenient for other transform streams to process, and finally the decoding transform stream converts it back into text before pipeTo hands it to the writable stream.
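customStream above is just a placeholder. As a hypothetical example, it could be a pass-through transform that counts bytes while forwarding each Uint8Array chunk unchanged:

let totalBytes = 0
const customStream = new TransformStream({
  transform(chunk, controller) {
    // chunk is a Uint8Array here because this stream sits after TextEncoderStream
    totalBytes += chunk.byteLength
    controller.enqueue(chunk)
  }
})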

But this is not always necessary. For example, when transmitting a video stream, the readableStream's original values may already be Uint8Array, so whether to insert transform streams depends on the actual situation.

Summary

Streams are a standard API that abstracts I/O. Their support for processing data continuously in small pieces is not accidental; it is the inevitable abstraction of I/O scenarios.

We used water flow as an analogy for streams. When I/O happens, the source produces data at some fixed speed x MB/s, and the target client (for example, video decoding) also processes at a fixed speed y MB/s. The network request has its own speed too and is a continuous process, which is why fetch is naturally a stream; call its speed z MB/s. The speed at which we finally see the video is min(x, y, z). Of course, if the server prepares the readableStream in advance, the speed x can be ignored, and the effective speed becomes min(y, z).

This is not only true for videos, but also for opening files, opening web pages, and so on. The browser's processing of HTML is also a streaming process:

new Response(stream, {
  headers: { 'Content-Type': 'text/html' },
})

If the controller.enqueue calls of this readableStream are deliberately slowed down, the web page can even be rendered gradually, word by word: see the Serving a string, slowly demo.
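The idea behind that demo, sketched roughly under the assumption that it runs inside a service worker's fetch handler:

// Enqueue the HTML one character at a time, with a small delay in between
function slowHtmlStream(html) {
  const encoder = new TextEncoder()
  let i = 0
  return new ReadableStream({
    pull(controller) {
      return new Promise(resolve => {
        setTimeout(() => {
          if (i >= html.length) {
            controller.close()
          } else {
            controller.enqueue(encoder.encode(html[i++]))
          }
          resolve()
        }, 100)
      })
    }
  })
}

self.addEventListener('fetch', event => {
  event.respondWith(
    new Response(slowHtmlStream('<h1>hello world</h1>'), {
      headers: { 'Content-Type': 'text/html' },
    })
  )
})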

Although streaming scenarios are so common, there is no need to convert all code to stream processing: code that runs in memory executes quickly, and a plain variable assignment does not need a stream. But if the value of a variable comes from an opened file or a network request, then processing it as a stream is the most efficient approach.

The discussion address is: Intensive Reading of "web streams" · Issue #363 · dt-fe/weekly

If you want to participate in the discussion, please click here. There is a new topic every week, released on the weekend or Monday. Front-end Intensive Reading - helping you filter reliable content.

Follow the Front-end Intensive Reading WeChat official account

<img width=200 src="https://img.alicdn.com/tfs/TB165W0MCzqK1RjSZFLXXcn2XXa-258-258.jpg">

Copyright statement: free to reproduce - non-commercial - no derivatives - keep attribution (Creative Commons 3.0 License)

黄子毅