
Network hierarchy

Consider the simplest case: communication between two hosts. All you need is a network cable connecting the two, plus an agreement on the hardware interface they share, such as the connector (for example USB), the voltage (for example 10 V), and the frequency (for example 2.4 GHz). This layer is the physical layer, and these rules are the physical layer protocol.

[Image]

Of course, connecting only two computers is not enough, so we can use a switch to connect several computers, as shown below:

[Image]

A network connected in this way is called a local area network (LAN). Ethernet is one kind of LAN. In this network we need to identify each machine so that we can specify which one to communicate with. This identifier is the hardware address, also called the MAC address.

The hardware address is fixed when the machine is manufactured, and it is permanent and unique. Within the LAN, to communicate with another machine we only need to know its hardware address, and the switch will deliver our message to that machine.
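The switch's job can be sketched as a lookup from hardware address to switch port. This is only a toy model: the `Switch` class, the port numbers, and the MAC addresses are made up for illustration, and the table is filled in by hand, whereas a real switch typically learns it from the traffic it sees.

```python
class Switch:
    """Toy switch: remembers which port each hardware address is plugged into."""

    def __init__(self):
        self.mac_to_port = {}

    def connect(self, mac, port):
        # Record that the machine with this MAC address sits on this port.
        self.mac_to_port[mac] = port

    def forward(self, dst_mac):
        # Return the port the frame should leave on, or None if unknown.
        return self.mac_to_port.get(dst_mac)


switch = Switch()
switch.connect("aa:bb:cc:00:00:01", port=1)  # hypothetical machine 1
switch.connect("aa:bb:cc:00:00:02", port=2)  # hypothetical machine 2

print(switch.forward("aa:bb:cc:00:00:02"))
```

The sender never needs to know which cable the target is on; it only names the target's hardware address.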

At this point we can abstract away the physical layer: without caring how the underlying cable interface sends the signal, we build a new layer on top of it. This is the data link layer.

We are still not satisfied with the scale of a single LAN; we want to connect LANs to one another. For this we use a router to connect two LANs:

[Image]

But if we still used the hardware address as the unique identifier of the communication target, it would be unrealistic to remember the hardware addresses of all machines as the network grows larger and larger.

At the same time, a network entity may change its equipment frequently, which makes a table of hardware addresses even harder to maintain. So a new address is used to identify a network entity: the IP address.

A simple mailing example helps to explain the IP address.

I live in Beijing, and my friend A lives in Shanghai. I want to write to friend A:

  • After writing the letter, I write friend A's address on it and hand it to the Beijing post office (add the target IP address to the message and send it to the router)
  • The post office transports the letter to the local post office in Shanghai (the message is routed to the router of the LAN that the target IP address belongs to)
  • The local post office in Shanghai delivers the letter to friend A (communication within the LAN)

So the IP address here is a network access address (friend A's address). I only need to know the target IP address, and the routers can deliver the message to the target. Within a local area network, a mapping between IP addresses and MAC addresses is maintained dynamically, so the MAC address of the target machine can be looked up from the destination IP address and the message sent to it.
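The IP-to-MAC mapping can be pictured as a small table that changes when a device is swapped. The sketch below is only an illustration: the addresses and the helper names `resolve_mac` and `replace_device` are invented here. In real IPv4 networks this mapping is usually maintained by ARP.

```python
# Hypothetical addresses, chosen only for illustration.
arp_table = {
    "192.168.1.10": "aa:bb:cc:00:00:10",
    "192.168.1.11": "aa:bb:cc:00:00:11",
}


def resolve_mac(ip, table):
    """Return the MAC address currently mapped to this IP, or None if unknown."""
    return table.get(ip)


def replace_device(ip, new_mac, table):
    """The host keeps its IP address but changes hardware: only the mapping changes."""
    table[ip] = new_mac


print(resolve_mac("192.168.1.10", arp_table))
replace_device("192.168.1.10", "aa:bb:cc:00:00:99", arp_table)
print(resolve_mac("192.168.1.10", arp_table))
```

Anyone who talks to 192.168.1.10 keeps using the same IP address; the change of hardware stays hidden inside the LAN.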

In this way, we do not need to care how the lower layers pick the machine. We only need to know the IP address to communicate with our target. This layer is the network layer. Its core role is to provide logical communication between hosts.

As a result, all hosts in the network are logically connected: the upper layer only needs to provide the target IP address and the data, and the network layer delivers the message to the corresponding host.

A host runs multiple processes, and each process carries out its own network communication. For example, I might be playing Honor of Kings with friends while chatting with someone on WeChat, so my mobile phone is communicating with two different machines at the same time.

So when my mobile phone receives data, how does it tell whether it is WeChat data or Honor of Kings data? For that we must add another layer above the network layer, the transport layer:

[Image]

The transport layer further subdivides network traffic through sockets, so that different application processes can make network requests independently without interfering with each other.

This is the most essential feature of the transport layer: it provides logical communication between processes. The processes may be on different hosts or on the same host, which is why socket communication is also one of the inter-process communication mechanisms on Android.
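How one host keeps the traffic of two apps apart can be sketched as grouping incoming data by destination port. This is a simulation rather than real socket code, and the port numbers and the `demultiplex` helper are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical port numbers for two apps running on the same phone.
CHAT_PORT = 5001
GAME_PORT = 5002


def demultiplex(packets):
    """Group payloads by destination port so each process sees only its own data."""
    inbox = defaultdict(list)
    for dst_port, payload in packets:
        inbox[dst_port].append(payload)
    return dict(inbox)


# Data for both apps arrives interleaved at the same IP address.
incoming = [
    (CHAT_PORT, b"hello"),
    (GAME_PORT, b"move:left"),
    (CHAT_PORT, b"are you there?"),
]
inbox = demultiplex(incoming)

print(inbox[CHAT_PORT])
print(inbox[GAME_PORT])
```

The network layer only gets the data to the phone; the port is what lets the transport layer hand each payload to the right process.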

Now that application processes on different machines can communicate independently, we can build real applications on top of the computer network, such as HTTP for web pages and FTP for file transfer. This layer is called the application layer.

The application layer can be further split into a presentation layer and a session layer, but their essential purpose is unchanged: to fulfil specific business requirements. Unlike the four layers below, they are not indispensable and can be folded into the application layer.

Finally, to summarize the layers of a computer network:

[Image]

  • The physical layer at the bottom is responsible for direct communication between two machines through hardware;
  • The data link layer uses hardware addresses to address machines within a local area network, enabling LAN communication;
  • The network layer implements logical communication between hosts through abstract IP addresses;
  • The transport layer subdivides the data on top of the network layer, giving each application process independent network communication;
  • On top of the transport layer, the application layer implements concrete functionality according to specific needs.

Note that this layering is logical, not physical. Each layer encapsulates the logic of the layers below it, so the upper layers can rely directly on their functions without worrying about how they are implemented, which simplifies development.
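One way to picture this encapsulation is each layer wrapping the data from the layer above with its own header and the receiving side peeling the headers off in reverse order. In the sketch below the headers are simplified text tags, not real TCP, IP, or Ethernet header formats, and the function names are made up for the example.

```python
def encapsulate(payload: bytes) -> bytes:
    """Wrap application data layer by layer on the way down (headers are toy tags)."""
    segment = b"[TCP]" + payload   # transport layer
    packet = b"[IP]" + segment     # network layer
    frame = b"[ETH]" + packet      # data link layer
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the toy headers in the opposite order on the way up."""
    data = frame
    for header in (b"[ETH]", b"[IP]", b"[TCP]"):
        if not data.startswith(header):
            raise ValueError(f"missing header {header!r}")
        data = data[len(header):]
    return data


frame = encapsulate(b"GET /index.html")
print(frame)
print(decapsulate(frame))
```

The application only ever sees its own payload; what each lower layer adds or removes is invisible to it.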

This layered approach resembles the chain-of-responsibility design pattern: layered encapsulation separates different responsibilities, which makes development and maintenance easier.

TCP byte stream-oriented features

TCP does not simply prepend a header to the data handed down by the application layer and send it to the target. Instead, it treats the data as a byte stream, labels the bytes with sequence numbers, and sends them in pieces. This is the byte-stream-oriented nature of TCP:

[Image]

  • TCP reads data from the application layer as a stream, stores it in its own send buffer, and labels the bytes with sequence numbers
  • TCP takes an appropriate number of bytes from the send buffer to form a TCP segment and sends it to the target through the network layer
  • The target reads the bytes into its receive buffer and delivers them to the application layer at an appropriate time
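The three steps above can be sketched in a few lines. This is a simplified model, assuming sequence numbers start at 0 and a made-up segment size of 4 bytes; the helper names `segment` and `reassemble` are illustrative only.

```python
import random


def segment(data: bytes, max_size: int, first_seq: int = 0):
    """Split a byte stream into (sequence_number, chunk) pairs.

    The sequence number is the number of the first byte in the chunk.
    """
    return [
        (first_seq + offset, data[offset:offset + max_size])
        for offset in range(0, len(data), max_size)
    ]


def reassemble(segments):
    """Rebuild the stream by ordering the chunks by their sequence numbers."""
    return b"".join(chunk for _, chunk in sorted(segments))


# Two separate application writes become one undifferentiated byte stream.
stream = b"audio-bytes" + b"text-bytes"
segments = segment(stream, max_size=4)

# Simulate segments arriving out of order.
shuffled = segments[:]
random.shuffle(shuffled)

print(segments)
print(reassemble(shuffled) == stream)
```

Notice that the segment boundaries have nothing to do with where the application's two writes began and ended.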

The advantage of being byte-stream oriented is that large data does not have to be held all at once, which would take up too much memory. The disadvantage is that TCP cannot know what the bytes mean. For example, if the application layer sends an audio file and a text file, to TCP they are just one stream of bytes with no meaning of their own. This leads to the sticky-packet and split-packet problems, which will be discussed later.

Principle of Reliable Transmission

As mentioned earlier, TCP is a reliable transmission protocol: if a piece of data is handed to it, it will deliver that data intact to the destination unless the network itself is down. The network model it implements is as follows:

[Image]

To the application layer, TCP is an underlying service that provides reliable transmission, while underneath the transport layer relies on the unreliable transmission of the network layer. Protocols could be used to guarantee reliability at the network layer or even at the data link layer, but that would make the network design more complicated and less efficient. It is more appropriate to place the reliability guarantee in the transport layer.

The key points of reliable transmission are: sliding window, timeout retransmission, cumulative acknowledgment, selective acknowledgment, and continuous ARQ.

Stop-and-wait protocol

The simplest way to achieve reliable transmission is this: I send you a data packet, you reply that you have received it, and only then do I send the next data packet. The transmission model is as follows:

[Image]

This back-and-forth way of ensuring reliable transmission is the stop-and-wait protocol. You may remember that the TCP header has an ACK flag. When it is set to 1, the segment carries an acknowledgment.

Now consider another situation: packet loss. The network is unreliable, so any data packet that is sent may be lost. If machine A sends a data packet and it is lost, machine B will never receive the data, and machine A will wait forever.

The solution is timeout retransmission. When machine A sends a data packet, it starts a timer. If the timer expires before the acknowledgment arrives, machine A assumes the packet was lost and sends it again.

But retransmission causes another problem. If the original data packet was not lost but was merely delayed in the network, machine B will receive two data packets. How does machine B tell whether these two packets carry the same piece of data or different data?

This requires the method mentioned earlier: numbering the data bytes. The receiver can then use the byte numbers to determine whether a packet carries the next piece of data or retransmitted data.

The TCP header has two fields for this: the sequence number and the acknowledgment number. They are, respectively, the number of the first byte of the data the sender is sending and the number of the first byte of the next data the receiver expects.
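Putting stop-and-wait, timeout retransmission, and byte numbering together, a rough simulation might look like the following. It is a sketch under simplifying assumptions: there is no real timer (a lost packet or lost acknowledgment is simply treated as a timeout), and the `lost_data` and `lost_acks` parameters are invented here to script which transmissions fail.

```python
def stop_and_wait(chunks, lost_data=(), lost_acks=()):
    """Simulate stop-and-wait with retransmission.

    lost_data / lost_acks hold 0-based indexes of send attempts whose data
    packet or acknowledgment is dropped by the network.
    Returns (chunks delivered to the application, total send attempts).
    """
    delivered = []
    expected_seq = 0   # receiver: number of the first byte it still needs
    seq = 0            # sender: number of the first byte of the current chunk
    attempt = 0
    for chunk in chunks:
        while True:
            this_attempt = attempt
            attempt += 1
            if this_attempt in lost_data:
                continue  # data never arrives; the timer expires, send again
            # Receiver: deliver only new data, but acknowledge either way.
            if seq == expected_seq:
                delivered.append(chunk)
                expected_seq += len(chunk)
            ack = expected_seq  # first byte the receiver expects next
            if this_attempt in lost_acks:
                continue  # ack never arrives; the timer expires, send again
            if ack == seq + len(chunk):
                break  # chunk confirmed, move on to the next one
        seq += len(chunk)
    return delivered, attempt


delivered, attempts = stop_and_wait(
    [b"abc", b"de"], lost_data={0}, lost_acks={2}
)
print(delivered, attempts)
```

In this run the first copy of `b"abc"` is lost and the first acknowledgment for `b"de"` is lost. The retransmitted `b"de"` carries a sequence number the receiver has already passed, so it is acknowledged again but not delivered twice.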

Continuous ARQ protocol

The stop-and-wait protocol is sufficient for reliable transmission, but it has a fatal flaw: it is too inefficient. After sending a data packet the sender just waits, doing nothing in the meantime and wasting resources. The solution is to send data packets continuously. The model is as follows:

[Image]

The biggest difference from stop-and-wait is that the sender keeps sending, and the receiver acknowledges the packets one by one as it receives them. This greatly improves efficiency, but it also brings some additional problems.

The first problem: can the sender keep sending until all the data in its buffer has gone out? No, because the receiver's buffer and its ability to read data must be taken into account. If the sender is too fast for the receiver to keep up, data is dropped and retransmitted over and over, which wastes network resources. The range of data the sender may send therefore has to take the state of the receiver's buffer into account. This is TCP flow control.

The solution is the sliding window. The basic model is as follows:

[Image]

  • The sender sets the size of its send window according to the receiver's buffer size. Data inside the window may be sent; data outside it may not.
  • When data in the window is acknowledged, the window slides forward, and this continues until all the data has been sent

The TCP header has a window size field that indicates how much buffer space the receiver has left, so that the sender can adjust its own send window. With the sliding window, TCP achieves flow control, so that data is not sent so fast that much of it is lost.
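The window bookkeeping on the sender side can be sketched as follows. This is a simplified model that counts individual bytes and ignores loss and timers; the class and attribute names are made up for the example.

```python
class SlidingWindowSender:
    """Toy sender that may only have `window` unacknowledged bytes in flight."""

    def __init__(self, total_bytes, window):
        self.total = total_bytes
        self.window = window   # size advertised by the receiver
        self.base = 0          # oldest byte not yet acknowledged
        self.next_seq = 0      # next byte that has not been sent yet

    def sendable(self):
        """Byte numbers inside the window that have not been sent yet."""
        limit = min(self.base + self.window, self.total)
        return range(self.next_seq, limit)

    def send(self):
        """Send everything the window currently allows and return those byte numbers."""
        allowed = self.sendable()
        self.next_seq = max(self.next_seq, allowed.stop)
        return list(allowed)

    def on_ack(self, ack, window):
        """ack: next byte the receiver expects; window: its remaining buffer space."""
        self.base = max(self.base, ack)
        self.window = window


sender = SlidingWindowSender(total_bytes=10, window=4)
first = sender.send()             # window covers bytes 0..3
blocked = sender.send()           # window is full, nothing may be sent
sender.on_ack(ack=2, window=4)    # bytes 0 and 1 confirmed, window slides
after_ack = sender.send()
sender.on_ack(ack=6, window=2)    # receiver's buffer is filling up
shrunk = sender.send()

print(first, blocked, after_ack, shrunk)
```

The last step shows flow control at work: when the receiver advertises a smaller window, the sender is allowed fewer bytes in flight.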

The second problem with continuous ARQ is that the network is flooded with as many acknowledgment messages as there are data packets, because every data packet must be acknowledged. The way to improve network efficiency is cumulative acknowledgment.

The receiver does not need to reply to packets one by one. Instead, after a certain number of data packets have accumulated, it tells the sender that everything up to a given packet has been received. For example, after receiving 1, 2, 3, 4, the receiver only needs to tell the sender "I received 4", and the sender knows that 1 through 4 have all arrived.
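A cumulative acknowledgment is simply the highest number up to which nothing is missing. A minimal sketch follows, using the article's convention of acknowledging the last item received; real TCP instead puts the number of the next expected byte in the acknowledgment field. The function name is made up for the example.

```python
def cumulative_ack(received, first=1):
    """Return the highest n such that first..n have all arrived.

    Returns first - 1 if even the first item is missing.
    """
    got = set(received)
    n = first - 1
    while n + 1 in got:
        n += 1
    return n


print(cumulative_ack([1, 2, 3, 4]))
print(cumulative_ack([2, 1, 4, 3]))  # arrival order does not matter
```

One acknowledgment now covers four packets instead of four separate replies.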

The third problem is how to deal with packet loss. In the stop-and-wait protocol this is simple: a timeout retransmission solves it. In continuous ARQ it is not the same.

For example, the receiver receives 1, 2, 3 and 5, 6, 7, six bytes in all, and byte number 4 is lost. Under cumulative acknowledgment the receiver can only acknowledge 3, and 5, 6, 7 must be discarded because the sender will retransmit them. This is the idea of GBN (go-back-N).
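The go-back-N behaviour in this example can be traced with a short simulation. It is a sketch that assumes the receiver accepts only the item it expects next and throws everything else away; the function name is made up for the example.

```python
def gbn_receive(arrivals, first=1):
    """Go-back-N receiver: accept only in-order items, discard the rest.

    Returns (accepted, discarded, last_acknowledged).
    """
    expected = first
    accepted, discarded = [], []
    for n in arrivals:
        if n == expected:
            accepted.append(n)
            expected += 1
        else:
            discarded.append(n)  # arrived out of order, thrown away
    return accepted, discarded, expected - 1


accepted, discarded, ack = gbn_receive([1, 2, 3, 5, 6, 7])
# The sender goes back to the first unacknowledged item and resends from there.
to_retransmit = list(range(ack + 1, 7 + 1))

print(accepted, discarded, ack, to_retransmit)
```

Only one byte was lost, yet four have to be sent again.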

But we only need 4 to be retransmitted, so resending 5, 6, 7 as well is a waste of resources. Hence selective acknowledgment (SACK). The options field of the TCP segment can list the blocks of data that have been received, each block described by two boundaries. The sender can then use this option to retransmit only the lost data.
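The two-boundary blocks can be sketched like this. The example assumes each block is written as (left, right) where left is the first number received and right is one past the last, which mirrors how TCP's SACK option describes its edges; the helper names are made up for the example.

```python
def sack_blocks(received, cumulative_ack):
    """Describe data received beyond the cumulative ack as (left, right) blocks.

    left is the first number in a contiguous run, right is one past its last.
    """
    above = sorted(n for n in set(received) if n > cumulative_ack)
    blocks = []
    for n in above:
        if blocks and blocks[-1][1] == n:
            blocks[-1][1] = n + 1      # extend the current run
        else:
            blocks.append([n, n + 1])  # start a new run
    return [tuple(block) for block in blocks]


def missing(cumulative_ack, blocks, highest_sent):
    """Numbers the sender still has to retransmit, given the receiver's report."""
    covered = set()
    for left, right in blocks:
        covered.update(range(left, right))
    return [
        n for n in range(cumulative_ack + 1, highest_sent + 1)
        if n not in covered
    ]


blocks = sack_blocks([1, 2, 3, 5, 6, 7], cumulative_ack=3)
print(blocks)
print(missing(3, blocks, highest_sent=7))
```

With the block (5, 8) in hand, the sender can tell that only 4 is missing and resend just that.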

Reliable transmission summary

That covers most of the reliable transmission principles of TCP. Finally, a summary:

  • The continuous ARQ protocol and the send-acknowledge pattern ensure that every data packet reaches the receiver
  • Numbering the bytes marks whether each piece of data is a retransmission or new data
  • Timeout retransmission solves the problem of data packets lost in the network
  • The sliding window provides flow control
  • Cumulative acknowledgment plus selective acknowledgment improve the efficiency of acknowledgments and retransmissions

Of course, this is only the tip of the iceberg of reliable transmission; if you are interested, you can study it in more depth.

Source: https://www.toutiao.com/i6954183620600906277/


民工哥