Introduction
Most readers will be familiar with proxies; well-known examples include nginx, Apache HTTPD, and stunnel.
A proxy makes requests to the server on behalf of the client. While proxying, we often want to preserve the original TCP connection information, such as the source and destination IP addresses and ports, so that the server can act on it.
Some ready-made solutions exist for this. In HTTP, for example, the "X-Forwarded-For" header can carry the original source address, and the "X-Original-To" header can carry the destination address.
Similarly, in SMTP, the XCLIENT extension can be used to pass this information during mail exchange.
Alternatively, you can build a suitably patched kernel and set up your proxy as the default gateway for your servers.
These methods work, but each has restrictions: they are either tied to a specific protocol or require changes to the system architecture, so they do not scale well.
With several proxy servers chained together, they become almost impossible to apply.
What is needed is a unified proxy protocol. If every node supports it, proxies can be chained seamlessly. That protocol is the Proxy Protocol, proposed by haproxy in 2010.
The advantages of this proxy protocol are:
- It is protocol agnostic (can be used with any layer 7 protocol, even with encryption)
- It does not require any infrastructure changes
- It works through NAT firewalls
- It is extensible
haproxy itself is a very good open source load balancer and proxy that handles high loads with excellent performance, so it is widely used by many companies, such as GoDaddy, GitHub, Bitbucket, Stack Overflow, Reddit, Slack, Speedtest.net, Tumblr, and Twitter.
Today I want to walk through the underlying details of haproxy's Proxy Protocol.
Implementation details of the Proxy Protocol
As mentioned above, the purpose of the Proxy Protocol is to carry the fields that describe the original TCP connection, such as IP addresses and ports.
If the client and server are directly connected, the server can obtain the following information through getsockname and getpeername:
- address family: AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX
- socket protocol: SOCK_STREAM for TCP, SOCK_DGRAM for UDP
- Source and destination addresses at the network layer
- The source and destination port numbers of the transport layer
The Proxy Protocol therefore packs this information into a header placed in front of the request, so that the server can correctly read the client's information.
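For a directly connected socket, the information listed above could be collected roughly as follows. This is a minimal sketch: `endpoint_to_text` and `describe_connection` are hypothetical helper names, only IPv4 and IPv6 are handled, and error handling is reduced to a return code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: render one endpoint as "address port".
 * Returns 0 on success, -1 for unsupported families or errors. */
int endpoint_to_text(const struct sockaddr *sa, char *out, size_t outlen)
{
    char ip[INET6_ADDRSTRLEN];

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        if (inet_ntop(AF_INET, &in4->sin_addr, ip, sizeof ip) == NULL)
            return -1;
        snprintf(out, outlen, "%s %u", ip, (unsigned)ntohs(in4->sin_port));
        return 0;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        if (inet_ntop(AF_INET6, &in6->sin6_addr, ip, sizeof ip) == NULL)
            return -1;
        snprintf(out, outlen, "%s %u", ip, (unsigned)ntohs(in6->sin6_port));
        return 0;
    }
    return -1;
}

/* Hypothetical helper: describe both ends of a connected socket fd,
 * as seen from the server side. */
int describe_connection(int fd, char *peer, size_t peerlen,
                        char *local, size_t locallen)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;

    if (getpeername(fd, (struct sockaddr *)&ss, &len) != 0)  /* source */
        return -1;
    if (endpoint_to_text((struct sockaddr *)&ss, peer, peerlen) != 0)
        return -1;

    len = sizeof ss;
    if (getsockname(fd, (struct sockaddr *)&ss, &len) != 0)  /* destination */
        return -1;
    return endpoint_to_text((struct sockaddr *)&ss, local, locallen);
}
```

Behind a proxy, these calls return the proxy's address rather than the client's, which is exactly the gap the Proxy Protocol header fills.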
In the Proxy Protocol, two versions are defined.
In version 1, the header is plain text and therefore human-readable. This format was chosen to make debugging easier during the early adoption of the protocol, so that problems can be diagnosed quickly.
Version 2 adds a binary encoding of the header. With the functionality of version 1 largely settled, the binary format makes the header more efficient to transmit and to parse.
Because there are two versions, the receiver must implement support for whichever version the sender uses.
To keep things simple, the Proxy Protocol defines just one header. The connection initiator places this header at the very beginning of each connection. The protocol is stateless: the sender does not wait for the receiver before sending the header, and the receiver is not expected to send anything back.
Next, we specifically observe the implementation of the two versions of the protocol.
Version 1
In version 1, the proxy header is a single line of US-ASCII text. It is sent immediately after the connection is established, before any real data is sent.
Let's first look at an example of an HTTP request preceded by a proxy header:
PROXY TCP4 192.168.0.1 192.168.0.102 12345 443\r\n
GET / HTTP/1.1\r\n
Host: 192.168.0.102\r\n
\r\n
In the above example, \r\n means carriage return and line feed, which is the end-of-line marker. The request is an HTTP request to the host 192.168.0.102, and its first line is the proxy header.
What exactly does it mean?
The line starts with the string "PROXY", which identifies it as a version 1 Proxy Protocol header.
It is followed by a space separator.
Next comes the proxied INET protocol and family. Version 1 supports "TCP4" and "TCP6"; the example above uses TCP4.
For any other protocol, use "UNKNOWN". In that case the sender may omit the rest of the line, and the receiver must ignore everything up to the CRLF.
It is followed by a space separator.
Next comes the network-layer source address. Its representation depends on whether TCP4 or TCP6 was selected.
It is followed by a space separator.
Next comes the network-layer destination address. Its representation also depends on whether TCP4 or TCP6 was selected.
It is followed by a space separator.
Next comes the TCP source port, in the range 0 to 65535.
It is followed by a space separator.
Next comes the TCP destination port, in the range 0 to 65535.
Finally comes the CRLF terminator.
That is the whole version 1 header. Simple, isn't it?
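The field order described above maps directly onto a format string. The following is a minimal sketch rather than a reference implementation: `struct pp1_info`, `pp1_build`, and `pp1_parse` are hypothetical names, the addresses are not validated, and the `sscanf`-based parser is more lenient about whitespace than a strict receiver should be.

```c
#include <stdio.h>
#include <string.h>

struct pp1_info {
    char proto[8];          /* "TCP4", "TCP6" or "UNKNOWN" */
    char src[40];           /* textual source address */
    char dst[40];           /* textual destination address */
    unsigned src_port;
    unsigned dst_port;
};

/* Hypothetical helper: write a TCP4/TCP6 v1 header line into buf.
 * Returns its length, or -1 if it does not fit. */
int pp1_build(char *buf, size_t buflen, const struct pp1_info *in)
{
    int n = snprintf(buf, buflen, "PROXY %s %s %s %u %u\r\n",
                     in->proto, in->src, in->dst,
                     in->src_port, in->dst_port);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Hypothetical helper: parse a v1 header line. Returns 0 on success. */
int pp1_parse(const char *line, struct pp1_info *out)
{
    int used = 0;

    memset(out, 0, sizeof *out);
    if (strncmp(line, "PROXY UNKNOWN", 13) == 0 &&
        (line[13] == ' ' || line[13] == '\r')) {
        /* For UNKNOWN, everything up to the CRLF is ignored. */
        if (strstr(line, "\r\n") == NULL)
            return -1;
        strcpy(out->proto, "UNKNOWN");
        return 0;
    }
    if (sscanf(line, "PROXY %4s %39s %39s %5u %5u%n", out->proto, out->src,
               out->dst, &out->src_port, &out->dst_port, &used) != 5)
        return -1;
    if (strcmp(out->proto, "TCP4") != 0 && strcmp(out->proto, "TCP6") != 0)
        return -1;
    if (out->src_port > 65535 || out->dst_port > 65535)
        return -1;
    /* The line must end with CRLF right after the destination port. */
    return strncmp(line + used, "\r\n", 2) == 0 ? 0 : -1;
}
```

Building the header for the example request and feeding the result back into `pp1_parse` should round-trip the same addresses and ports.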
From this definition, it is easy to calculate the maximum length of the whole header line. For TCP4, the maximum length is:
- TCP/IPv4 :
"PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
=> 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars
For TCP6, the maximum length is expressed as:
- TCP/IPv6 :
"PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
=> 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars
For UNKNOWN, the following minimum and maximum lengths are possible:
- unknown connection (short form) :
"PROXY UNKNOWN\r\n"
=> 5 + 1 + 7 + 2 = 15 chars
- worst case (optional fields set to 0xff) :
"PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
=> 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars
So a 108-byte buffer is always enough for a version 1 header: at most 107 characters for the line, plus a trailing zero for string processing.
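One practical consequence of this bound is that a receiver only needs to look for the CRLF within the first 107 bytes. A minimal sketch of that check follows; `PP1_MAX_HEADER` and `pp1_line_length` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define PP1_MAX_HEADER 107  /* longest possible v1 line, CRLF included */

/* Hypothetical helper: return the length of the v1 header line
 * (CRLF included), or -1 if no CRLF is found within the first
 * PP1_MAX_HEADER bytes of the avail bytes in buf. */
int pp1_line_length(const char *buf, size_t avail)
{
    size_t limit = avail < PP1_MAX_HEADER ? avail : PP1_MAX_HEADER;
    size_t i;

    for (i = 0; i + 1 < limit; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return (int)(i + 2);
    }
    return -1;
}
```

A return value of -1 on a buffer that already holds 107 bytes means the data cannot be a valid version 1 header.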
Version 2
Version 2 uses a binary encoding, which is not human-readable but improves transmission and parsing efficiency.
The header of version 2 is a block starting with the following 12 bytes:
\x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A
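A receiver can tell version 2 apart from version 1 by comparing the first 12 bytes against this signature. A minimal sketch, assuming the hypothetical names `PP2_SIGNATURE` and `pp2_has_signature`:

```c
#include <stddef.h>
#include <string.h>

/* The 12-byte v2 signature: "\r\n\r\n\0\r\nQUIT\n". */
static const unsigned char PP2_SIGNATURE[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/* Hypothetical helper: 1 if buf starts with the v2 signature, else 0. */
int pp2_has_signature(const unsigned char *buf, size_t len)
{
    return len >= sizeof PP2_SIGNATURE &&
           memcmp(buf, PP2_SIGNATURE, sizeof PP2_SIGNATURE) == 0;
}
```

Bytes 8 to 11 of the signature spell "QUIT" in ASCII, and the sequence as a whole cannot be mistaken for the text "PROXY" that opens a version 1 header.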
The next byte (the 13th) holds the protocol version and the command. Neither needs a full byte, so the byte is split into two 4-bit halves.
The upper 4 bits store the version, which must be \x2.
The lower 4 bits store the command, which has the following values:
- LOCAL(\x0): Indicates that the connection was initiated by the proxy itself, typically when the proxy sends a health check to the server.
- PROXY(\x1): Indicates that the connection was relayed on behalf of another node. The receiver must then use the information provided in the protocol block to obtain the original address.
- Others: All other commands are unassigned. A receiver must drop connections that present them, because they cannot be interpreted.
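Splitting the 13th byte into its two halves takes a shift and a mask. A minimal sketch, with `pp2_decode_ver_cmd` and the enum names being illustrative choices rather than anything mandated by the protocol:

```c
#include <stdint.h>

enum pp2_cmd {
    PP2_CMD_LOCAL = 0x0,
    PP2_CMD_PROXY = 0x1
};

/* Hypothetical helper: return the command carried in ver_cmd,
 * or -1 if the version is not 2 or the command is unassigned. */
int pp2_decode_ver_cmd(uint8_t ver_cmd)
{
    uint8_t version = ver_cmd >> 4;     /* upper 4 bits */
    uint8_t command = ver_cmd & 0x0F;   /* lower 4 bits */

    if (version != 0x2)
        return -1;
    if (command != PP2_CMD_LOCAL && command != PP2_CMD_PROXY)
        return -1;
    return command;
}
```

With this encoding, a PROXY command in version 2 is the byte \x21 and a LOCAL command is \x20.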
The next byte (the 14th) holds the transport protocol and address family.
The upper 4 bits store the address family, and the lower 4 bits store the transport protocol.
The address family may have the following values:
- AF_UNSPEC(0x0): Indicates an unsupported or unspecified protocol. A sender can use this value when sending a LOCAL command or when dealing with unsupported protocol families.
- AF_INET(0x1): Indicates IPv4 addresses, each occupying 4 bytes.
- AF_INET6(0x2): Indicates IPv6 addresses, each occupying 16 bytes.
- AF_UNIX(0x3): Indicates UNIX socket addresses, each occupying 108 bytes.
The transport protocol may have the following values:
- UNSPEC(0x0): Unknown protocol type.
- STREAM(0x1): The SOCK_STREAM protocol is used, such as TCP or UNIX_STREAM.
- DGRAM(0x2): SOCK_DGRAM protocol is used, such as UDP or UNIX_DGRAM.
Combining the lower 4 bits and the upper 4 bits, the following values can be obtained:
- UNSPEC(\x00)
- TCP over IPv4(\x11)
- UDP over IPv4(\x12)
- TCP over IPv6(\x21)
- UDP over IPv6(\x22)
- UNIX stream(\x31)
- UNIX datagram(\x32)
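The 14th byte can be decoded the same way as the 13th. The sketch below maps each half to the names used in the lists above; `pp2_family_name` and `pp2_transport_name` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: name of the address family in the upper 4 bits,
 * or NULL if the value is not one of the four listed above. */
const char *pp2_family_name(uint8_t fam)
{
    switch (fam >> 4) {
    case 0x0: return "AF_UNSPEC";
    case 0x1: return "AF_INET";
    case 0x2: return "AF_INET6";
    case 0x3: return "AF_UNIX";
    default:  return NULL;
    }
}

/* Hypothetical helper: name of the transport protocol in the lower
 * 4 bits, or NULL if the value is not one of the three listed above. */
const char *pp2_transport_name(uint8_t fam)
{
    switch (fam & 0x0F) {
    case 0x0: return "UNSPEC";
    case 0x1: return "STREAM";
    case 0x2: return "DGRAM";
    default:  return NULL;
    }
}
```

For example, \x21 decodes to AF_INET6 and STREAM, which is TCP over IPv6.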
The 15th and 16th bytes hold the length, in network byte order, of the rest of the header that follows these first 16 bytes. To sum up, the 16-byte version 2 header can be represented by the following structure:
struct proxy_hdr_v2 {
    uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
    uint8_t ver_cmd;  /* protocol version and command */
    uint8_t fam;      /* protocol family and address */
    uint16_t len;     /* number of following bytes part of the header */
};
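Because `len` is transmitted in network byte order (high byte first), a receiver has to assemble it from bytes 15 and 16 before it knows how much more to read. A minimal sketch that reads the fields by offset instead of casting to the struct; `PP2_FIXED_LEN` and `pp2_header_size` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PP2_FIXED_LEN 16  /* signature + ver_cmd + fam + len */

/* Hypothetical helper: return the total header size (the 16 fixed
 * bytes plus len), or -1 if buf does not start with a complete and
 * valid version 2 fixed header. */
long pp2_header_size(const uint8_t *buf, size_t avail)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };

    if (avail < PP2_FIXED_LEN || memcmp(buf, sig, sizeof sig) != 0)
        return -1;
    if ((buf[12] >> 4) != 0x2)          /* version must be 2 */
        return -1;
    /* len is big-endian: byte 15 is the high byte, byte 16 the low. */
    return PP2_FIXED_LEN + (((long)buf[14] << 8) | buf[15]);
}
```

Reading by offset avoids any dependence on struct padding or on the host's byte order.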
Starting from the 17th byte come the addresses and port numbers. Their layout depends on the address family and can be represented by the following union:
union proxy_addr {
    struct { /* for TCP/UDP over IPv4, len = 12 */
        uint32_t src_addr;
        uint32_t dst_addr;
        uint16_t src_port;
        uint16_t dst_port;
    } ipv4_addr;
    struct { /* for TCP/UDP over IPv6, len = 36 */
        uint8_t src_addr[16];
        uint8_t dst_addr[16];
        uint16_t src_port;
        uint16_t dst_port;
    } ipv6_addr;
    struct { /* for AF_UNIX sockets, len = 216 */
        uint8_t src_addr[108];
        uint8_t dst_addr[108];
    } unix_addr;
};
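Putting the fixed header and the IPv4 address block together gives a 28-byte header (16 + 12). The sketch below builds one for a PROXY command over TCP/IPv4; `pp2_build_tcp4` is a hypothetical helper, and it carries no TLVs.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a version 2 PROXY header for TCP over
 * IPv4 into buf. Returns 28 on success, -1 on error. */
int pp2_build_tcp4(uint8_t *buf, size_t buflen,
                   const char *src_ip, const char *dst_ip,
                   uint16_t src_port, uint16_t dst_port)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    struct in_addr src, dst;

    if (buflen < 28)
        return -1;
    if (inet_pton(AF_INET, src_ip, &src) != 1 ||
        inet_pton(AF_INET, dst_ip, &dst) != 1)
        return -1;

    memcpy(buf, sig, sizeof sig);
    buf[12] = 0x21;                     /* version 2, command PROXY */
    buf[13] = 0x11;                     /* AF_INET, STREAM */
    buf[14] = 0x00;                     /* len = 12, big-endian */
    buf[15] = 0x0C;
    memcpy(buf + 16, &src, 4);          /* already in network order */
    memcpy(buf + 20, &dst, 4);
    buf[24] = (uint8_t)(src_port >> 8); /* ports, big-endian */
    buf[25] = (uint8_t)(src_port & 0xFF);
    buf[26] = (uint8_t)(dst_port >> 8);
    buf[27] = (uint8_t)(dst_port & 0xFF);
    return 28;
}
```

For the connection used in the version 1 example (192.168.0.1:12345 to 192.168.0.102:443), the binary header is 28 bytes, compared with 48 characters for the text form.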
In version 2, the header can also carry extension data after the address information. Each extension is a Type-Length-Value (TLV) vector with the following format:
struct pp2_tlv {
    uint8_t type;
    uint8_t length_hi;
    uint8_t length_lo;
    uint8_t value[0];
};
The meanings of the fields are type, length and value respectively.
The following types are currently defined:
#define PP2_TYPE_ALPN 0x01
#define PP2_TYPE_AUTHORITY 0x02
#define PP2_TYPE_CRC32C 0x03
#define PP2_TYPE_NOOP 0x04
#define PP2_TYPE_UNIQUE_ID 0x05
#define PP2_TYPE_SSL 0x20
#define PP2_SUBTYPE_SSL_VERSION 0x21
#define PP2_SUBTYPE_SSL_CN 0x22
#define PP2_SUBTYPE_SSL_CIPHER 0x23
#define PP2_SUBTYPE_SSL_SIG_ALG 0x24
#define PP2_SUBTYPE_SSL_KEY_ALG 0x25
#define PP2_TYPE_NETNS 0x30
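Since every TLV starts with a one-byte type and a two-byte length (`length_hi` first), the TLV area can be walked with simple offset arithmetic. A minimal sketch that looks up the first TLV of a given type; `pp2_find_tlv` is a hypothetical helper, and it does not descend into the sub-TLVs nested inside `PP2_TYPE_SSL`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the first TLV of the given type in the
 * len bytes at tlvs. On success, returns a pointer to the value and
 * stores its size in *value_len. Returns NULL if the type is absent
 * or a TLV is truncated. */
const uint8_t *pp2_find_tlv(const uint8_t *tlvs, size_t len,
                            uint8_t type, size_t *value_len)
{
    size_t off = 0;

    while (len - off >= 3) {
        /* The length is big-endian: length_hi, then length_lo. */
        size_t vlen = ((size_t)tlvs[off + 1] << 8) | tlvs[off + 2];

        if (vlen > len - off - 3)
            return NULL;                /* value runs past the buffer */
        if (tlvs[off] == type) {
            *value_len = vlen;
            return tlvs + off + 3;
        }
        off += 3 + vlen;                /* skip to the next TLV */
    }
    return NULL;
}
```

Here `tlvs` would point just past the address block, and `len` would be the header's `len` field minus the size of that address block.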
Proxy Protocol usage
The value of a protocol lies not only in its definition, but also in how much software uses it.
If mainstream proxy software does not adopt your protocol, its definition is useless. Conversely, if everyone uses your protocol, it becomes the mainstream protocol no matter how poorly it is defined.
Fortunately, the Proxy Protocol is already widely used in the proxy server world.
The specific software that uses this protocol is as follows:
- Elastic Load Balancing, AWS's load balancer, compatible since July 2013
- Dovecot, a POP/IMAP mail server compatible since version 2.2.19
- exaproxy, a forward and reverse proxy server, compatible since version 1.0.0
- gunicorn, a Python HTTP server, compatible since 0.15.0
- haproxy, a reverse proxy load balancer, compatible since 1.5-dev3
- nginx, a reverse proxy and HTTP server, compatible since 1.5.12
- Percona DB, a database server, compatible since 5.6.25-73.0
- stud, an SSL offloader, compatible since the first version
- stunnel, an SSL offloader, compatible since 4.45
- Apache HTTPD, a web server, supported through the extension module myfixip
- varnish, HTTP reverse proxy cache, compatible since version 4.1
Basically all mainstream servers are compatible with Proxy Protocol, so we can regard Proxy Protocol as the de facto standard.
Summary
In this article, we covered the underlying definition of the Proxy Protocol. How do you use it in practice, and can you implement your own Proxy Protocol server? Stay tuned.
This article has been included in http://www.flydean.com/20-haproxy-protocol/