1. "Graphic HTTP" - Web and Network Fundamentals - Technical Reading Notes

Knowledge point

An overview of how HTTP was born. The text links to a Chinese translation of the specification for side-by-side reading.
Extension (off-topic discussion): HTTP/3 has already come out, so why has HTTP/2 adoption only reached about half?
An overview of the basic concepts of the TCP/IP protocol suite.
The difference between URLs and URIs.

1.1 Key points of this chapter

The opening part introduces the history of the Web and of networking, so there is not much to understand or memorize. The TCP/IP basics, on the other hand, are the core of the entire Internet, and it is worth reading that part several times until it sinks in.

The Web transfers data using HTTP (HyperText Transfer Protocol) as its specification. HTTP's job is to carry out the series of steps that take a request from the client to the server and bring the response back. Two different devices can only communicate reliably if both follow the same rules.

HTTP has now reached version 3.0, but 2.0 was still being drafted when this book was written. The book can therefore be regarded as "old", and many details need to be checked against current material.

1.2 The birth of HTTP

HTTP was born in March 1989 in the hands of a few people. Tim Berners-Lee of CERN (European Organization for Nuclear Research) proposed a way for researchers in distant locations to share knowledge over a network.

In November 1990, CERN successfully developed the world's first web server and web browser.

In 1990, the HTML 1.0 draft was discussed, but it was scrapped outright because several parts of it were ambiguous.

In January 1993, Mosaic, the ancestor of modern browsers, was released by NCSA (National Center for Supercomputing Applications). Versions for Windows and Macintosh followed in the fall of the same year.

In December 1994, Netscape Communications released Netscape Navigator 1.0. In 1995 Microsoft released the infamous Internet Explorer 1.0 and 2.0, and IE's decades-long history of tormenting developers began.

The browser war between Microsoft and Netscape is a fairly well-known piece of Internet history, and interested readers can look it up. Microsoft eventually won thanks to the user lock-in of the Windows platform and a free browser. Although Netscape won the lawsuit, its browser market share steadily declined under the Windows monopoly; after all, Internet Explorer cost nothing.

Today's Firefox descends from Netscape, but the browser engine market is now dominated by Google. Edge has also become a solid browser on top of the Chromium engine plus Microsoft's own optimizations, although Microsoft's habit of using the platform to tie users to its browser is still visible, and modern users can feel it.

Also worth mentioning: the book names three technologies on which the WWW is built, namely HTML (which is based on SGML), HTTP as the document transfer protocol, and the URL. The related terms are:

SGML (Standard Generalized Markup Language) is the earliest and highest-level standard for hypertext formats. It is a metalanguage for defining markup languages, and it even allows markup conventions that do not use < >. It proved too complex for general use. The later HTML and XML can be seen as extensions, subsets and simplifications of this standard.
HTML (HyperText Markup Language).
URL (Uniform Resource Locator).

1.3 Brief History of HTTP

HTTP has kept evolving, and HTTP/1.1 remains the baseline that essentially all websites support. RFC 2616, released in 1999, has since been revised; the revision referenced here is RFC 7231.

This part will be expanded and discussed again in the second chapter.

RFC 7231 online:

https://tools.ietf.org/html/rfc7231

Version history

If you are interested, you can follow the RFC numbers to read the original text.

HTTP/0.9: HTTP came out in 1990. This version can be seen as the prototype that preceded 1.0. Because it was never established as an official standard, it is called 0.9.
HTTP/1.0: HTTP was officially announced as a standard in May 1996. The standard number is RFC 1945.
HTTP/1.1: Published in January 1997 and still used by a large number of websites today. The original standard is RFC 2068; the revised version, RFC 2616, was released later, and RFC 2616 was in turn revised by later documents such as RFC 7231.
HTTP/2.0: HTTP/2 is the first update of the HTTP protocol since HTTP/1.1 (RFC 2616) in 1999. It is mainly based on the SPDY protocol.
It was developed by the Hypertext Transfer Protocol Bis (httpbis) working group of the Internet Engineering Task Force (IETF). The group submitted the HTTP/2 proposal to the IESG (Internet Engineering Steering Group) for discussion in December 2014, and it was approved on February 17, 2015.
Standard number of HTTP/2.0: RFC 7540.

Bilingual reading in Chinese and English

Finally, I found a translation project for the HTTP specification on the Internet. Its author seems to have abandoned it and it has not been updated for many years, but it can still serve as a reference for readers who are less comfortable with English:

rfc7540-translation-en_us

1.3.1 Features of HTTP/2.0

The goal of HTTP/2.0 is to improve how fast the Web feels to the user.

To achieve this, the discussion at the time of writing centered on three proposals:

SPDY: Proposed by Google to make HTTP access more efficient and to work around the pipelining defects of HTTP/1.x, with the aim of shortening the total request time.
HTTP Speed+Mobility: Drafted by Microsoft, a specification for improving the speed and performance of mobile communication. As the name suggests, it targets fast Internet access from mobile devices.
Network-Friendly HTTP Upgrade: Also a set of improvement ideas aimed mainly at mobile clients.

1.3.2 HTTP2.0 off-topic

This book was written around 2013 or 2014. HTTP/2.0 has been promoted for nearly ten years since then, but its adoption has been mediocre.

As a reader, you are probably curious about how widespread it is now. Here is a website for reference.

According to the published statistics, fewer than half of the surveyed websites currently use HTTP/2, which means that more than half of the servers are still on HTTP/1.1.

This leads to the next question: HTTP/3 is about to arrive, so why has 2.0 still not been fully adopted?

To answer it, we need to understand why rolling out 2.0 is so difficult.

First, requests in HTTP/1.0 are very simple. Each request uses its own connection, and the connection is closed as soon as the request completes.

HTTP/1.x then improved on this: with Keep-Alive, the cost of repeatedly establishing and tearing down TCP connections is avoided, and one connection can serve multiple requests.
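The idea of one connection serving several requests can be sketched with Python's standard library. This is only an illustration under stated assumptions: the tiny local server, the handler name `EchoHandler`, and the paths `/first` and `/second` are made up for the example and are not from the book. The local port number is recorded after each request; if it stays the same, both requests travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes http.server keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where the response ends,
        # so the connection does not have to be closed to mark the end.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection():
    """Send two GET requests over a single persistent connection."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    results = []
    try:
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            text = response.read().decode()
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, text, local_port))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for status, text, local_port in fetch_twice_on_one_connection():
        print(status, text, "client port:", local_port)
```

With HTTP/1.0 semantics the client would have to open a fresh connection, and therefore use a new local port, for the second request.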

Google, however, was not satisfied with that level of efficiency and pushed for the upgrade to HTTP/2.0. HTTP/2.0 deals with the problem that requests and responses in HTTP/1.x must queue on a connection: it uses multiplexing to carry many requests over one connection at the same time.
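The difference between queued and multiplexed requests can be made concrete with a toy timing model. It is a deliberate simplification and not a measurement: it assumes each request has a fixed server-side processing time, ignores bandwidth and latency, and the function names are invented for this note.

```python
from itertools import accumulate


def completion_times_queued(service_times):
    """HTTP/1.x style: on one connection each response must finish
    before the next request is answered, so waiting times add up."""
    return list(accumulate(service_times))


def completion_times_multiplexed(service_times):
    """HTTP/2 style (idealized): every request is its own stream,
    so each one finishes as soon as its own work is done."""
    return list(service_times)


if __name__ == "__main__":
    times = [3, 1, 2]  # hypothetical processing time per request
    print("queued:     ", completion_times_queued(times))
    print("multiplexed:", completion_times_multiplexed(times))
```

In the queued case the two fast requests are stuck behind the slow first one, which is exactly the queuing problem described above.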

This clearly improves performance, so why doesn't everyone use it?

If the protocol is progressing and the efficiency gain looks significant, why is the HTTP upgrade so hard to push forward?

Surface reasons

Real projects are basically built on frameworks. Some legacy systems and old framework versions cannot be upgraded even when the team wants to, and other teams simply do not bother, because the upgrade brings no obvious benefit. The cost may even outweigh the gain.

Deeper reasons

In practice HTTP/2.0 is deployed together with HTTPS, and that creates friction, because most real projects already sit behind an Nginx reverse proxy.

Nginx itself can enable HTTP/2.0 support. So why do teams keep using Nginx as a reverse proxy instead of running HTTP/2.0 end to end?

One likely reason is long-lived TCP connections. With Nginx deployed, a request does not have to travel a long chain of routes to reach the application; the client talks directly to Nginx.

HTTP/2.0 can carry many requests in parallel, and server-side capacity can be met through clustering and load balancing.

Inside the framework, however, once a request reaches a single local machine, the number of cores is limited. The effective concurrency is then close to that of HTTP/1.x, because the tasks still have to queue.

If HTTP/2.0 is enabled at Nginx, and Nginx is left to split modules and simplify functions, a cluster can be used without changing the development model.

In addition, one of the most critical reasons is that although HTTP/2.0 solves the queuing of concurrent requests at the HTTP level through multiplexing, the underlying problem in TCP remains: all streams share one TCP connection, so a lost packet still holds up everything behind it.

So, in order, the blame lies mostly with TCP, then with Nginx being good enough already, then with the high cost of framework upgrades, and finally with the hope that HTTP/3 will solve everything in one step.

Of course, a penetration rate of 50% should not be considered low. Looked at another way, the high-traffic websites people use every day have basically all adopted HTTP/2.0.

1.4 TCP/IP

Understanding HTTP necessarily requires understanding TCP/IP.

HTTP is an application layer protocol. If the network model is pictured as a pyramid, HTTP sits at the top, but the core of the pyramid is TCP/IP.

HTTP is built on that foundation. Modern network architecture is based on the TCP/IP model, and HTTP is only one part of it.

TCP/IP is a general term for the family of protocols related to the Internet. Another reading is that it refers only to the two protocols TCP and IP.

The TCP/IP protocol suite is divided into five layers: the application layer, the transport layer, the network layer and the data link layer, plus the physical layer, which is closely tied to hardware.

If you are interested in TCP/IP, you can read "Illustrated TCP/IP" and "TCP/IP Detailed Explanation".

A layered design is easy to modify: the often-quoted principle of high cohesion and low coupling is fully reflected in TCP/IP.

This design is not without shortcomings, though. Each layer can be upgraded on its own, but it cannot break out of the original framework. The limitations of TCP itself are an important reason why upgrades to HTTP are hard to roll out.

So what is the OSI model introduced at the beginning of "TCP/IP Detailed Explanation, Volume 1"?

It was the grand vision of the early designers of Internet protocols, who hoped to achieve a highly extensible Internet architecture through this model. Put less kindly, it was an attempt to monopolize the standard, so that every future architecture would have to follow their rules.

The ideal was grand, but reality was harsh. The OSI model has too many layers and is difficult to promote and maintain, so it was quickly displaced by the leaner and easier to understand TCP/IP.

The OSI model is thus a product of that historical struggle. It still serves as the reference model for network protocols, while the TCP/IP model is the one that survived in practice, with the help of UNIX.

Figure: OSI model

What does each layer do, according to the model?

Application layer: Determines the communication activities of the applications that provide services to users.
Transport layer: Handles data transmission between the two ends. It contains several protocols, such as TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).
Network layer: Processes the data packets that flow over the network. A packet is the smallest unit of data transmitted over the network.
Data link layer: Can be thought of as the visible hardware part, such as the network card. Hardware-related concerns fall within the scope of the link layer.
Physical layer: The devices that carry the transmission, such as network cables and hubs. Roughly speaking, it can be thought of as the network cable.

The figure below shows how a network data packet is encapsulated along the way. If you want to understand the whole process, see the second chapter of the book "How the Network Is Connected".

Figure: TCP/IP request model
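The encapsulation process can be sketched in a few lines of Python. This is a toy model, not real packet construction: the bracketed tags `[TCP]`, `[IP]` and `[ETH]` are placeholders invented here to stand in for the real binary headers, and the function names are hypothetical.

```python
# Placeholder headers, outermost layer first.
LAYER_HEADERS = (b"[ETH]", b"[IP]", b"[TCP]")


def encapsulate(http_message: bytes) -> bytes:
    """Sender side: each layer wraps the data from the layer above it."""
    segment = b"[TCP]" + http_message  # transport layer adds its header
    packet = b"[IP]" + segment         # network layer adds its header
    frame = b"[ETH]" + packet          # link layer adds its header
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer strips its own header, outermost first."""
    data = frame
    for header in LAYER_HEADERS:
        if not data.startswith(header):
            raise ValueError(f"expected header {header!r}")
        data = data[len(header):]
    return data


if __name__ == "__main__":
    request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    frame = encapsulate(request)
    print(frame)
    print(decapsulate(frame) == request)
```

The point of the layering is visible in the code: the HTTP message is never modified, and each layer only needs to know about its own header.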

1.4.1 IP, TCP and DNS

In terms of protocol layers, IP (Internet Protocol) sits at the network layer and TCP sits at the transport layer. So although "TCP/IP" names a protocol suite, TCP and IP themselves are separate protocols at different layers and should not be confused.

IP Protocol (Internet Protocol)

IP is located at the network layer.

The main job of IP is to deliver data packets to the other party. For delivery to succeed, two addresses matter: the IP address and the MAC address. The IP address is the address assigned to a node, and the MAC address is the fixed address of a network card.

An IP address may go through address translation when traffic crosses between internal and external networks, which relies on address translation devices. A MAC address, by contrast, is fixed when the network card is manufactured and is meant to be globally unique.

Communication along the way relies on MAC addresses, which are resolved with ARP. Much like a parcel passing through courier stations, each hop only needs to find the next stop, and the whole mechanism boils down to a table lookup.

ARP (Address Resolution Protocol): a protocol for resolving addresses. Given the IP address of the communicating party, it finds the corresponding MAC address.

Figure: request routing
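The "look-up table" idea behind ARP can be sketched as a dictionary with a fallback. This is a conceptual sketch only: the addresses are made up, and `ask_network` is a hypothetical callback standing in for the real ARP broadcast, which an operating system performs for you.

```python
from typing import Callable, Dict, Optional


def resolve_mac(
    ip: str,
    arp_table: Dict[str, str],
    ask_network: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the MAC address for an IP, consulting the cache first."""
    mac = arp_table.get(ip)
    if mac is None:
        # Cache miss: ask the local network "who has this IP?".
        mac = ask_network(ip)
        if mac is not None:
            arp_table[ip] = mac  # remember the answer for next time
    return mac


if __name__ == "__main__":
    table = {"192.168.0.1": "aa:bb:cc:00:00:01"}  # example cache entry

    def fake_broadcast(ip: str) -> Optional[str]:
        # Pretend exactly one other host answers the broadcast.
        return {"192.168.0.7": "aa:bb:cc:00:00:07"}.get(ip)

    print(resolve_mac("192.168.0.1", table, fake_broadcast))  # from cache
    print(resolve_mac("192.168.0.7", table, fake_broadcast))  # via broadcast
    print(table)
```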

TCP protocol

TCP is located at the transport layer and provides a reliable byte stream service. "Byte stream service" means that large blocks of data are split into message segments that are easier to transmit; "reliable" means that the data is delivered accurately to the other party. In short, TCP cuts large data into segments for transmission and confirms that they arrive.
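The splitting and reassembly described above can be illustrated with a small sketch. It only models the idea of segments tagged with a byte offset; real TCP also handles acknowledgements, retransmission and flow control, none of which appear here. The function names and the segment size are assumptions made for the example.

```python
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (byte offset in the stream, payload)


def split_into_segments(data: bytes, max_segment_size: int) -> List[Segment]:
    """Cut a byte stream into chunks, each tagged with its offset."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    return [
        (offset, data[offset:offset + max_segment_size])
        for offset in range(0, len(data), max_segment_size)
    ]


def reassemble(segments: List[Segment]) -> bytes:
    """Rebuild the stream even if segments arrived out of order."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return b"".join(payload for _, payload in ordered)


if __name__ == "__main__":
    message = b"HTTP data that is larger than one segment"
    segments = split_into_segments(message, max_segment_size=8)
    shuffled = list(reversed(segments))  # simulate out-of-order arrival
    print(len(segments), "segments")
    print(reassemble(shuffled) == message)
```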

To make sure data reaches its destination, TCP relies on a three-way handshake when establishing a connection. The handshake is not covered in detail here.

Figure: TCP protocol

DNS service

Users typically access other computers by hostname or domain name rather than directly by IP address.

DNS is the service responsible for translating between domain names and IP addresses. Before a request is sent to the target server, the IP address for the domain name usually has to be looked up through DNS.
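A name lookup of this kind can be tried with Python's standard `socket` module. The sketch below asks the system resolver, which may answer from a hosts file or a cache rather than contact a DNS server, so treat it as an illustration of the name-to-address step only. The helper name `resolve_ipv4` is made up for this note.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses the system resolver reports for a name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve_ipv4("localhost"))
```

An HTTP client performs this step before it can open a TCP connection to the server.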

The relationship between each protocol and HTTP

Note that the book's diagram omits the MAC header.

The diagram takes the most introductory view. If you dig deeper, you will find that things are not that simple, and the picture is quite coarse. Use it only to get a sense of the basic role of each protocol.

Figure: relationship between each protocol and HTTP

1.4.2 URLs and URIs

Differences and comparison

First, let us separate the two concepts:

URL: Stands for Uniform Resource Locator, that is, the string shown in the address bar at the top of the browser when we visit a web server.

URI: The full name is Uniform Resource Identifier, made up of three words. Section 1.1 of RFC 2396 defines each of them. Taken as a whole, a URI is an identifier that locates a resource under a given protocol scheme.

Uniform: A unified format makes it easy to handle many types of resources, in the spirit of "convention over configuration". For example, a URI that starts with ftp uses the FTP protocol, and one that starts with http uses the HTTP protocol.
Resource: A resource is defined abstractly as "anything that can be identified". Documents, pictures, network files and so on can all be regarded as resources.
Identifier: An object that can be used to identify something.

The quickest way to tell them apart is to think of URI as the standard and of URL as a "direct implementation" and concrete "representation" of that standard.

URI belongs to the Internet's top-level specifications, and only requests that conform to the URI specification can be identified. URI schemes are managed and published by IANA (Internet Assigned Numbers Authority).

Finally, two intuitive examples of how a URI is structured:

                    hierarchical part
        ┌───────────────────┴─────────────────────┐
                    authority               path
        ┌───────────────┴───────────────┐┌───┴────┐
  abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1
  └┬┘   └───────┬───────┘ └────┬────┘ └┬┘           └─────────┬─────────┘ └──┬──┘
scheme  user information     host     port                  query         fragment

  urn:example:mammal:monotreme:echidna
  └┬┘ └──────────────┬───────────────┘
scheme              path
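The components labelled in the two diagrams can be checked with `urllib.parse.urlsplit` from Python's standard library. The helper name `describe_uri` is made up for this note; the two URIs are the ones from the diagrams.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Break a URI into the components named in the diagrams above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    examples = (
        "abc://username:password@example.com:123/path/data"
        "?key=value&key2=value2#fragid1",
        "urn:example:mammal:monotreme:echidna",
    )
    for uri in examples:
        print(uri)
        for name, value in describe_uri(uri).items():
            print(f"  {name:9} {value!r}")
```

The second example has no authority part, so the user information, host and port come back as `None` and everything after the scheme is the path.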

The RFC itself also gives some examples:

 The following examples illustrate URI that are in common use.

   ftp://ftp.is.co.za/rfc/rfc1808.txt
      -- ftp scheme for File Transfer Protocol services

   gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
      -- gopher scheme for Gopher and Gopher+ Protocol services

   http://www.math.uio.no/faq/compression-faq/part1.html
      -- http scheme for Hypertext Transfer Protocol services

   mailto:mduerst@ifi.unizh.ch
      -- mailto scheme for electronic mail addresses

   news:comp.infosystems.www.servers.unix
      -- news scheme for USENET news groups and articles

   telnet://melvyl.ucop.edu/
      -- telnet scheme for interactive services via the TELNET Protocol

To sum up the comparison: URIs define the "category" that URLs fall into, so URLs can be seen as a subset of URIs.

URL format

Figure: URL format

The part of this section worth noting is the less frequently used fragment identifier, which refers to a sub-resource inside the resource that has been fetched.
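As a small illustration, `urllib.parse.urldefrag` from Python's standard library separates the fragment identifier from the rest of a URL. The URL below is a made-up example and `split_fragment` is a hypothetical helper name.

```python
from urllib.parse import urldefrag


def split_fragment(url: str) -> tuple:
    """Return (url_without_fragment, fragment) for a URL."""
    base, fragment = urldefrag(url)
    return base, fragment


if __name__ == "__main__":
    base, fragment = split_fragment("http://example.com/guide.html#section-2")
    print("resource to fetch:", base)
    print("sub-resource     :", fragment)
```

A browser requests only the part before the `#` from the server and then uses the fragment locally, for example to scroll to a section of the page.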

Note that not every request on the Internet conforms to the RFCs. RFC (Request for Comments) documents are where protocols such as HTTP are specified. Most applications comply with them, but there are always exceptions.

If you do not follow the RFCs, you need a protocol of your own to handle communication between clients. RPC is a typical example, a classic way of communicating without HTTP. Of course, this approach has its own problems and controversies.

1.5 Summary

As in most technical books, Chapter 1 is an overview that gives a general introduction to the basics of HTTP.

The book is indeed quite old, and many of the protocols and standards it mentions have long been superseded. On the other hand, IP, TCP and DNS have barely changed in all this time, so there is no need to worry about that material becoming outdated.

The discussion of HTTP/2.0 is an extension added in these notes, briefly covering why HTTP/2.0 has been hard to roll out.

In fact, HTTP/2.0 is already widespread among the major mainstream websites, and the large domestic vendors have basically all followed.


阿东
