0. Overview of the whole book
This book follows a single HTTP access that a user makes in a browser and uses it as a thread to explain how the network operates. It traces the whole process, from the moment the user enters a URL to the moment the page appears in the browser. In the order in which control is handed over, it shows how the network's two main components divide the work and cooperate during a request: the information transmission mechanism (network protocols plus devices such as routers and switches) and application software (the browser and the web server).
1. Chapter 1
The first chapter mainly covers what the browser does from the moment it receives the URL entered by the user until it starts sending the HTTP request message:
- Parse the URL
- Generate the HTTP request message
- Look up the web server's IP address
- Delegate sending the message to the OS protocol stack
1.1 Parsing URLs
What is a URL
A URL (Uniform Resource Locator) describes where a resource is located. Entering a URL is similar to dialing the other party's phone number when making a call: the URL identifies both the protocol of the request and its destination.
Several types of URLs
The part of the URL from the beginning up to the first ":" identifies its protocol, such as http, ftp, or file. Each protocol has its own rules for how the rest of the URL is written; this book focuses on HTTP.
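As a rough illustration of this step, Python's standard `urllib.parse.urlsplit` can split a URL into the pieces a browser needs. The helper name `describe_url` and the sample URLs are hypothetical; a real browser's parser also handles many more cases (user info, query strings, fragments, relative URLs).

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts needed before building a request (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # text before the first ":" -> protocol type
        "host": parts.hostname,     # the web server's domain name
        "port": parts.port,         # None when the URL does not give one
        "path": parts.path or "/",  # the resource on that server
    }


print(describe_url("http://www.example.com/dir/file1.html"))
# {'scheme': 'http', 'host': 'www.example.com', 'port': None, 'path': '/dir/file1.html'}
```

The same call works for other schemes, e.g. `describe_url("ftp://ftp.example.com/pub/readme.txt")["scheme"]` gives `"ftp"`.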
1.2 Generate HTTP message
After parsing the URL, we know the target we want to access. The next step is to generate the request message to send to it. The format of this message is strictly specified, so let's look at the format of HTTP messages.
1.2.1 Format of HTTP request message
- request line
The first line of the request message consists of Method + URI + HTTP-Version, separated by spaces.
The method identifies the type of action requested, such as reading, adding, or deleting a resource. It does not appear in the URL; the browser chooses it according to the situation, for example GET when a link is clicked, or the method specified by a form (often POST) when a form is submitted.
- header
The header follows the request line directly, with no blank line between them. It consists of several fields, one per line, each a name-value pair. A blank line after the last field marks the end of the header and separates it from the message body.
field name: field value
...
- message body
The message body is used to carry the information to be submitted in this request, which is often used in PUT and POST requests.
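A minimal sketch of assembling a request message in Python, assuming HTTP/1.1 and a hypothetical helper `build_request`. HTTP ends each line with CR LF (`\r\n`); the caller is assumed to supply any `Content-Length` header needed when a body is sent.

```python
def build_request(method: str, uri: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request: request line, header fields, blank line, body."""
    lines = [f"{method} {uri} HTTP/1.1"]  # request line: Method URI HTTP-Version
    lines += [f"{name}: {value}" for name, value in headers.items()]  # one field per line
    head = "\r\n".join(lines) + "\r\n\r\n"  # the empty line ends the header
    return head.encode("ascii") + body


msg = build_request("GET", "/dir/file1.html", {"Host": "www.example.com"})
print(msg)  # b'GET /dir/file1.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
```

For a POST, the form data goes after the blank line, e.g. `build_request("POST", "/form", {"Host": "www.example.com", "Content-Length": "7"}, b"name=ok")`.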
1.2.2 Format of HTTP response message
- response line
The first line of the response message consists of HTTP-Version, Status Code, and Status Phrase. Apart from this first line, the response message has the same format as the request message.
- header
Same format as in the request message.
- message body
Same format as in the request message.
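Because the response differs from the request only in its first line, a simplified parser can reuse the same header/body split. This sketch (`parse_response` is a hypothetical name) assumes the complete response is already in memory and ignores details such as repeated header fields and case-insensitive field names.

```python
def parse_response(raw: bytes) -> tuple[str, int, str, dict[str, str], bytes]:
    """Split a complete HTTP response into status line parts, header fields, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line separates header and body
    status_line, *field_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)  # HTTP-Version, Status Code, Status Phrase
    version, code = parts[0], int(parts[1])
    phrase = parts[2] if len(parts) > 2 else ""
    headers: dict[str, str] = {}
    for line in field_lines:
        name, _, value = line.partition(":")  # "field name: field value"
        headers[name.strip()] = value.strip()
    return version, code, phrase, headers, body


raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
print(parse_response(raw))
# ('HTTP/1.1', 200, 'OK', {'Content-Type': 'text/html', 'Content-Length': '5'}, b'hello')
```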
1.3 Obtaining the IP address of the web server
Once the request message has been generated, the next step is to ask the protocol stack to send it to the web server. Before that, however, one problem remains: parsing the URL yields only the domain name of the web server, but the protocol stack must be given the web server's IP address, not its domain name. So what is an IP address, and how is a domain name converted into one?
1.3.1 Basic knowledge of IP addresses
- Format of an IP address: an (IPv4) address is a 32-bit number, written as four 8-bit (1-byte) groups in decimal, separated by "." (for example, 192.168.1.10).
- Internal structure of an IP address: the address consists of a network number and a host number, but how the 32 bits are split between them is not fixed; it is decided by whoever designs the network. An additional piece of information, the subnet mask, is therefore needed to describe the address's internal structure.
- Subnet mask: like the IP address, it is a 32-bit number. Its left part is all 1s and its right part all 0s; the bits covered by 1s are the network number, and the bits covered by 0s are the host number.
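To make the role of the subnet mask concrete, the network and host numbers can be extracted with bitwise AND operations. This sketch uses Python's standard `ipaddress` module only to convert between dotted notation and integers; `split_address` and the sample addresses are hypothetical.

```python
import ipaddress


def split_address(address: str, mask: str) -> tuple[str, str]:
    """Return (network number, host number) of an IPv4 address under a subnet mask."""
    ip = int(ipaddress.IPv4Address(address))
    m = int(ipaddress.IPv4Address(mask))
    network = ipaddress.IPv4Address(ip & m)             # keep the bits where the mask is 1
    host = ipaddress.IPv4Address(ip & ~m & 0xFFFFFFFF)  # keep the bits where the mask is 0
    return str(network), str(host)


print(split_address("10.11.12.13", "255.255.255.0"))  # ('10.11.12.0', '0.0.0.13')
```

Changing only the mask changes the split: under `255.255.0.0` the same kind of address would give a 16-bit network number and a 16-bit host number.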
1.3.2 From Domain to IP Address: DNS
With IP addresses covered, let's see how an IP address is obtained from a domain name. This is the job of DNS (Domain Name System), a system of servers that maintain the mapping between domain names and IP addresses. The Socket library contains a resolver function (gethostbyname in the classic API) that performs this lookup: the browser passes it the domain name, the resolver sends a query to the DNS server, and the IP address it gets back is then handed to the protocol stack.
Why have both domain names and IP addresses? Domain names are easy for people to remember, while IP addresses are short, fixed-length 32-bit numbers that are much easier for routers to process.
Where does the DNS server's own IP address come from? It is configured in advance as one of the OS's TCP/IP settings, so it does not need to be looked up.
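In Python, the Socket library's resolver is exposed as `socket.gethostbyname`, a wrapper over the classic function of the same name. The sketch below (with a hypothetical wrapper `resolve`) asks the OS resolver for an IPv4 address; whether the answer comes from a local hosts file or from a query to the configured DNS server depends on the OS configuration.

```python
import socket


def resolve(domain: str) -> str:
    """Ask the OS resolver for the IPv4 address of a domain name."""
    # gethostbyname uses the OS settings: hosts file and/or the configured DNS server.
    # A numeric address such as "127.0.0.1" is returned unchanged.
    return socket.gethostbyname(domain)


print(resolve("localhost"))  # typically answered from the local hosts file
```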
1.3.3 Global DNS Relay
There are an enormous number of domain names in the world, far more than hundreds or thousands, so it is impossible to keep every domain-name-to-IP-address mapping on a single DNS server. Instead, the mappings are stored in a distributed way across many DNS servers. How, then, do DNS servers around the world cooperate?
- Domain Name Hierarchy
Before describing how DNS servers cooperate, let's first look at the hierarchy of domain names. A domain name is a string of labels separated by ".", and each "." also marks a boundary between levels: the further right a label is, the higher its level. For example, in the domain name 'dubbo.apache.org' (from the URL 'https://dubbo.apache.org/'), the highest level is 'org', followed by 'apache', and then 'dubbo'.
- Locating the DNS server and getting the IP address
Since domain names are hierarchical, DNS servers are organized hierarchically as well.
- First, the root DNS servers (13 of them worldwide) store the IP addresses of the DNS servers for the first-level (top-level) domains;
- Second, the DNS server for a first-level domain stores the IP addresses of the DNS servers that manage the second-level domains under it;
- Likewise, a third-level DNS server stores the IP addresses of the fourth-level DNS servers, and so on down the hierarchy, until a server holding the IP address for the complete domain name is reached; that address is then returned to the DNS client that made the request.
- In addition, every DNS server stores the IP addresses of the root DNS servers. As a result, a query that reaches any DNS server can be passed up to the root and then down the hierarchy, so the IP address of any domain name can be obtained.
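The delegation chain can be modeled as a toy in-memory lookup that starts at the root and follows referrals downward. Everything here is hypothetical: the server table, the documentation-range IP addresses, and `resolve_iteratively`. Real DNS servers exchange resource records over the DNS protocol, and normally the local DNS server performs this walk on behalf of the client.

```python
# Hypothetical data: each "server" (keyed by its IP address) knows either the final
# answer for a name or which server handles the next, more specific zone.
SERVERS = {
    "198.51.100.1": {"org": "198.51.100.2"},               # root server
    "198.51.100.2": {"apache.org": "198.51.100.3"},        # server for "org"
    "198.51.100.3": {"dubbo.apache.org": "203.0.113.80"},  # server for "apache.org"
}
ROOT = "198.51.100.1"


def resolve_iteratively(domain: str) -> tuple[str, list[str]]:
    """Follow referrals from the root down to the server that knows the full name."""
    labels = domain.split(".")
    server, visited = ROOT, [ROOT]
    # Walk suffixes from the highest level down: org, apache.org, dubbo.apache.org
    for i in range(len(labels) - 1, -1, -1):
        zone = ".".join(labels[i:])
        next_hop = SERVERS[server].get(zone)
        if next_hop is None:
            raise LookupError(f"{server} has no record for {zone}")
        if i == 0:
            return next_hop, visited  # complete name matched: this is the answer
        server = next_hop             # referral to a lower-level DNS server
        visited.append(server)
    raise LookupError(domain)


print(resolve_iteratively("dubbo.apache.org"))
# ('203.0.113.80', ['198.51.100.1', '198.51.100.2', '198.51.100.3'])
```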
1.4 Delegate the protocol stack to send messages
Once the browser has obtained the web server's IP address from DNS, it has everything the protocol stack needs. Next, it delegates sending the message to the web server to the protocol stack.
First, the protocol stack models the exchange of messages between the browser and the web server as "pipes and jacks". During communication, the browser and the web server each have a "jack" (socket), and the two jacks are connected by a "pipe". The pipe is bidirectional: both sides can write data into it and read data from it.
- The server creates a socket ("jack") and waits to be connected.
- The browser calls the protocol stack to create a socket, which is assigned a port number automatically. Because the browser may communicate with several web servers at the same time, the protocol stack returns a descriptor that identifies the newly created socket.
- The browser initiates a connection, joining the two sockets with a pipe. For this it passes the protocol stack three parameters: the socket's descriptor, the web server's IP address, and the web server's port number. While the connection is being established, the web server is also told the IP address and port number of the browser's socket, so that it knows where to send the response.
- To send and receive messages, each side writes data into its own socket and then waits until the other side's message arrives at its socket.
- After sending the response, the web server either closes its socket immediately, or waits for the browser to close its socket first and then closes its own.
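These steps can be sketched with Python's `socket` module on the loopback interface, with both the "web server" and the "browser" running in one process. Assumptions: the server binds to port 0 so the OS picks a free port (a real web server listens on port 80), it answers a single connection with a fixed, hypothetical `hello` response, and `socket.create_connection` stands in for the browser's create-and-connect steps.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Web server side: accept one connection, read the request, reply, then close."""
    conn, _addr = listener.accept()  # block until a browser connects
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the blank line ending the header
            chunk = conn.recv(1024)
            if not chunk:  # the client closed early
                break
            request += chunk
        body = b"hello"
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)
    # leaving the with-block closes the server's socket for this connection


def fetch(host: str, port: int, path: str) -> bytes:
    """Browser side: create a socket and connect, send a request, read until closed."""
    with socket.create_connection((host, port)) as sock:  # create + connect in one call
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        response = b""
        while chunk := sock.recv(1024):  # b"" means the server closed its socket
            response += chunk
    return response


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # the server's "jack"
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen(1)  # wait to be connected
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = fetch("127.0.0.1", port, "/")
server.join()
listener.close()
print(reply)  # b'HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello'
```

Python's socket objects hide the descriptor the text mentions, but `sock.fileno()` exposes it if needed.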
This section describes the communication mechanism between the browser and the web server at a high level; the details are covered in the next chapter.