This article is shared by the ELab technical team. The original title is "Exploring HTTPS", with revisions and changes.

1. Introduction

For IM developers, the most commonly used communication technologies are Socket long connections and HTTP short connections (a mainstream IM usually combines the two). From the perspective of communication security, Socket long connections are secured by running over SSL/TLS-encrypted TCP (for example WeChat's mmtls, see "WeChat's Next-Generation Communication Security Solution: Detailed Explanation of MMTLS Based on TLS1.3"), while HTTP short connections are secured by HTTPS.

What exactly is HTTPS? Why use HTTPS? Today, I will take this opportunity to learn more about HTTPS with you, including the development of HTTP, problems encountered by HTTP, symmetric and asymmetric encryption algorithms, digital signatures, third-party certificate authorities and other concepts.

(This article has been published simultaneously at: http://www.52im.net/thread-3897-1-1.html )

2. Series of articles

This article is the ninth in a series on IM communication security. The general catalogue of the series is as follows:

"Instant Messaging Security (1): Correctly Understand and Use Android-side Encryption Algorithms"
"Instant Messaging Security (2): Discussing the Application of Combined Encryption Algorithms in IM"
"Instant Messaging Security (3): Explanation of Common Encryption and Decryption Algorithms and Communication Security"
"Instant Messaging Security (4): Case Analysis of the Risk of Hard-coding Keys in Android"
"Instant Messaging Security (5): Application Practice of Symmetric Encryption Technology on Android Platform"
"Instant Messaging Security (6): Principles and Application Practices of Asymmetric Encryption Technology"
"Instant Messaging Security (7): If you understand the principle of HTTPS in this way, one article is enough"
"Instant Messaging Security (8): Do you know whether HTTPS uses symmetric encryption or asymmetric encryption?"
"Instant Messaging Security (9): Why Use HTTPS? Explain in simple language, explore the security of short connections" (* this article)

3. Foreword

When it comes to HTTPS, we have to go back to the HTTP protocol.

Everyone is surely familiar with the HTTP protocol. But do you know the difference between HTTPS and HTTP?

For this classic interview question, most people will answer this way:

1) HTTPS has one more S (Secure) than HTTP: that is to say, HTTPS is a secure version of HTTP;
2) The port numbers are different: HTTP uses port 80, HTTPS uses port 443;
3) Encryption algorithm: HTTPS uses an asymmetric encryption algorithm.

How accurate is the answer above? After reading this article, come back and judge it again.

So, how does HTTPS achieve secure short-connection data transmission? If you want to fully understand this problem, we still have to start from the development history of HTTP...

4. Review of HTTP protocol

4.1 Basic knowledge
HTTP is the abbreviation of Hypertext Transfer Protocol (see "Explanation in simple language, comprehensive understanding of HTTP protocol").

The simple explanation is:

1) Hypertext refers to resources that go beyond plain text, including but not limited to pictures, audio, and video;
2) The protocol is the data transmission format and communication rules agreed upon by both parties.

HTTP is the highest layer of the TCP/IP protocol suite - the application layer protocol:

▲ The above picture is quoted from "Explaining the profound things in simple language and fully understanding the HTTP protocol"

When the browser and the server use the HTTP protocol to transfer hypertext data to each other, they put the data into the message body and fill in the header (request header or response header) to form a complete HTTP message, which is handed to the transport layer below. Each lower layer in turn adds its own header (control information), and finally the physical layer sends the binary data in the form of electrical signals.

The HTTP request is shown in the following figure:

▲ The above picture is quoted from "Explaining the profound things in simple language and fully understanding the HTTP protocol"

The structure of the HTTP message is as follows:
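
As a rough illustration of that structure, the sketch below assembles a raw HTTP/1.1 request (start line, header lines, a blank line, then the body) and splits it back into its parts. The helper names build_request and parse_request, the /login path, and the example.com host are made up for this example; real clients should use an HTTP library rather than building messages by hand.

```python
CRLF = "\r\n"


def build_request(method, path, headers, body=""):
    """Return the raw text of an HTTP/1.1 request message."""
    headers = dict(headers)
    if body:
        # The header tells the receiver how many body bytes follow.
        headers["Content-Length"] = str(len(body.encode("utf-8")))
    start_line = f"{method} {path} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # A blank line (CRLF CRLF) separates the header from the message body.
    return CRLF.join([start_line, *header_lines]) + CRLF + CRLF + body


def parse_request(raw):
    """Split a raw request into (start_line, headers dict, body)."""
    head, _, body = raw.partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers, body


raw = build_request(
    "POST",
    "/login",
    {"Host": "example.com", "Content-Type": "application/x-www-form-urlencoded"},
    "user=alice&password=secret",
)
start_line, headers, body = parse_request(raw)

if __name__ == "__main__":
    print(raw)
```

Note that everything in this message, including the password in the body, travels as readable text.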

4.2 Development History
The evolution of HTTP is as follows:

Judging from this history, the initial version of HTTP (HTTP 1.0) could send only one HTTP request per TCP connection, and the TCP connection was released once the request completed.

We all know that the establishment of a TCP connection requires a three-way handshake process, and every time an HTTP request is sent, the TCP connection needs to be re-established, which is undoubtedly inefficient. So HTTP1.1 improves this, using the mechanism of long connection, that is, "one TCP connection, N HTTP requests".

The long and short connections of the HTTP protocol are essentially the long and short connections of the TCP protocol.

With a long connection, the TCP connection used to transmit HTTP data between the client and the server is not closed after a web page is opened. When the client accesses the server again, it continues to use the established connection. Keep-Alive does not keep the connection forever: it has a keep-alive time, which can be set in the server software (such as Apache). A persistent connection requires support from both the client and the server.

PS: For IM developers, in order to distinguish it from Socket long connection channels, HTTP is usually considered to be a "short connection" (although this "short connection" is not necessarily really "short").

To open a long connection in HTTP 1.0, you need to add the Connection: keep-alive request header. For the detailed development process of the HTTP protocol, you can read the article "Understanding the Historical Evolution and Design Ideas of the HTTP Protocol".
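
The "one TCP connection, N HTTP requests" behaviour can be observed locally. The sketch below is only an illustration using Python's standard library: it starts a throwaway HTTP/1.1 server on the loopback interface and sends several requests through one http.client.HTTPConnection. The handler replies with the client's source port, so identical ports mean the same TCP connection was reused. The names EchoPeerHandler and client_ports_seen are invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPeerHandler(BaseHTTPRequestHandler):
    # With HTTP/1.1 the connection stays open unless "Connection: close" is sent.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's TCP source port.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def client_ports_seen(n_requests=3):
    """Send n_requests over one connection; return the source ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoPeerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        ports = []
        for _ in range(n_requests):
            conn.request("GET", "/")
            # The response must be read fully before the connection is reused.
            ports.append(conn.getresponse().read().decode())
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(client_ports_seen())
```

If the server closed the connection after every response, each request would arrive from a different source port.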

4.3 Security Issues
As HTTP becomes more and more widely used, the security issues of HTTP are gradually exposed.

Recall the carrier hijacking that was everywhere years ago. When you visited a normal web page, advertising banners, redirect scripts, and deceptive "red envelope" buttons appeared on the page for no reason, and sometimes the file you wanted to download turned out to be something completely different. These are all cases of the carrier hijacking HTTP plaintext data.

The following picture is the effect of the operator hijacking that seems familiar:

PS: Regarding the problem of carrier hijacking, you can read "Comprehensive understanding of mobile DNS domain name hijacking and other miscellaneous diseases: principles, root causes, HttpDNS solutions, etc." in detail.

HTTP mainly has the following three security problems:

In a nutshell:

1) Data confidentiality problem: Because HTTP is transmitted in plain text, all data content travels across the network exposed, including user identity information, payment account numbers, and passwords. This sensitive information can easily be leaked and cause security risks;
2) Data integrity problem: HTTP data packets pass through many forwarding devices before reaching the destination host, any of which may tamper with the packet, and the receiver has no way to verify the integrity of the data;
3) Identity verification problem: A man-in-the-middle attack is possible, because we cannot verify that the other party of the communication is really our intended target.

Therefore, in order to ensure the security of data transmission, HTTP data must be encrypted.

5. Common encryption methods

5.1 Basic situation
Common encryption methods are divided into three types:

1) Symmetric encryption;
2) Asymmetric encryption;
3) Digital digests.

The first two are suitable for data transmission encryption, and the irreversible characteristics of digital digests are often used for digital signatures.

Next, let's briefly study these three common encryption methods one by one.

5.2 Symmetric encryption
Symmetric encryption, also known as secret-key or single-key encryption, uses the same key for encryption and decryption. The key can be understood as a secret parameter of the encryption algorithm: the algorithm itself is public, and only the key must be kept private.

The symmetric encryption diagram is as follows:

The widely used symmetric encryptions are:

Advantages, disadvantages and applicable scenarios of symmetric encryption algorithms:

1) Advantages: The algorithm is open and simple, encryption and decryption are easy, encryption speed is fast, and efficiency is high;
2) Disadvantage: Security depends entirely on keeping the single key secret. If the ciphertext is intercepted and the key is also stolen, the information can be easily deciphered;
3) Applicable scenarios: encryption and decryption is fast and efficient, so it is suitable for encryption scenarios with large amounts of data. Since how to transmit the key is a headache, it is suitable for scenarios that do not require key exchange, such as internal systems, where the key can be directly determined in advance.
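
To make "the same key encrypts and decrypts" concrete, here is a deliberately simplified sketch using only Python's standard library. It is a toy stream cipher (a SHA-256 based keystream XORed with the data), not a real algorithm such as AES, and it must not be used to protect real data. The function names keystream and xor_crypt are invented for this example.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = secrets.token_bytes(32)               # the single shared secret
plaintext = b"pay 100 to bob"
ciphertext = xor_crypt(key, plaintext)      # sender side
recovered = xor_crypt(key, ciphertext)      # receiver side, same key

# Someone holding a different key gets garbage instead of the plaintext.
wrong_guess = xor_crypt(secrets.token_bytes(32), ciphertext)
```

The sketch also shows the weak spot named above: whoever obtains `key` can run the same xor_crypt call and read everything.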

PS: You can experience the symmetric encryption algorithm online, the link is: http://www.jsons.cn/textencrypt/

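A common misconception: base64 is sometimes described as symmetric encryption, but it is only an encoding. It uses no key, so anyone can decode it, and it provides no confidentiality.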
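
A quick check of how Base64 behaves, using Python's standard base64 module: decoding needs no key at all, so anyone who sees the encoded text can reverse it.

```python
import base64

encoded = base64.b64encode(b"hello")   # b'aGVsbG8='
decoded = base64.b64decode(encoded)    # no key involved in either direction

if __name__ == "__main__":
    print(encoded, decoded)
```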

5.3 Asymmetric encryption
Asymmetric encryption uses a pair of keys (public and private keys) for encryption and decryption.

Asymmetric encryption can complete decryption without directly passing the key. The specific steps are as follows:

1) Party B generates two keys (public key and private key). The public key is public and can be obtained by anyone, and the private key is kept secret;
2) Party A obtains Party B's public key, and then uses it to encrypt the information;
3) Party B obtains the encrypted information and decrypts it with the private key.

Take RSA, the most typical asymmetric encryption algorithm, as an example:

Fully understanding RSA requires some number theory and the complete derivation of the algorithm. The idea in brief: two very large prime numbers and their product are the material for generating the public and private keys, and deriving the private key from the public key is extremely hard because it requires factoring that very large product back into its two primes. The largest RSA modulus publicly factored so far is 829 bits (RSA-250, in 2020); the 768-bit record dates from 2009. Keys well beyond that length have not been publicly broken, but 1024-bit keys are no longer considered a safe margin, and 2048 bits is the commonly recommended minimum today.
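
The arithmetic behind RSA can be followed with tiny numbers. The sketch below uses the classic textbook values p = 61 and q = 53; these primes are far too small to be secure and there is no padding, so it only illustrates the relationship between the public key (n, e) and the private key d. It needs Python 3.8+ for the modular inverse via pow.

```python
# Key generation (done by Party B)
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent: e * d = 1 (mod phi)

# Encryption (Party A, using only the public key n, e)
message = 65
cipher = pow(message, e, n)

# Decryption (Party B, using the private key d)
decrypted = pow(cipher, d, n)

if __name__ == "__main__":
    print("public key:", (n, e), "private key:", d)
    print("cipher:", cipher, "decrypted:", decrypted)
```

Recovering d from (n, e) requires phi, which in turn requires knowing p and q. With n = 3233 the factoring is trivial; with a 2048-bit n it is not.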

Advantages, disadvantages and applicable scenarios of asymmetric encryption algorithms:

1) Advantages: strong security; the private key never needs to be transmitted, which avoids the key-distribution risk of symmetric encryption;
2) Disadvantages: large amount of calculation and slow speed;
3) Applicable scenarios: It is suitable for scenarios that require key exchange, such as Internet applications, where the key cannot be agreed in advance.

In the process of practical application, it can actually be combined with the symmetric encryption algorithm:

1) Use the asymmetric encryption algorithm, with its safer key distribution, to transmit the key of the symmetric encryption algorithm;
2) Use the symmetric encryption algorithm, with its fast encryption and decryption, to encrypt the actual data in scenarios with relatively large content (such as HTTPS).
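
The two steps above can be sketched together. This is a toy model, not a real protocol: the RSA key pair is built from two well-known Mersenne primes purely so the example needs no prime generation, there is no padding, and the symmetric part is a SHA-256 keystream XOR rather than a real cipher. All names (stream_xor, session_key, and so on) are invented for illustration.

```python
import hashlib
import secrets

# Receiver's toy RSA key pair (public: N, E; private: D).
P, Q = 2**89 - 1, 2**107 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 derived keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


payload = b"a large HTTP response body " * 100

# Sender: asymmetric step protects the small key, symmetric step the bulk data.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)
ciphertext = stream_xor(session_key, payload)

# Receiver: unwrap the session key with the private key, then decrypt the data.
unwrapped_key = pow(wrapped_key, D, N).to_bytes(16, "big")
plaintext = stream_xor(unwrapped_key, ciphertext)
```

Only 16 bytes go through the slow asymmetric operation; the payload, however large, is handled by the fast symmetric one.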

PS: For IM developers, the article "Discussing the Application of Combined Encryption Algorithms in IM" is worth reading.

5.4 How to choose?
1) If you choose symmetric encryption:

The HTTP requester uses a symmetric algorithm to encrypt data, so in order for the receiver to decrypt it, the sender also needs to pass the key to the receiver. In the process of passing the key, it is still possible to be attacked by sniffing. After stealing the key, the attacker can still decrypt the data to obtain the sent data, so this scheme is not feasible.

2) If you choose asymmetric encryption:

The receiver keeps the private key and passes the public key to the sender. The sender uses the public key to encrypt the data, and the receiver uses the private key to decrypt the data. Although the attacker cannot directly obtain the data (because there is no private key), he can intercept the passed public key, then pass his own public key to the sender, and then use his own private key to decrypt the data sent by the sender.

In the whole process, both parties of the communication do not know the existence of the middleman, but the middleman can obtain complete data information.

3) A mix of two encryption methods:

First, the asymmetric encryption algorithm is used to encrypt and pass the symmetric encryption key, and then the two parties encrypt the data to be sent through the symmetric encryption method. It looks fine, but is it true?

The middleman can still intercept the transmission of the public key and replace it with his own, so this scheme treats the symptom but not the root cause.

If you want to solve the problem, you need to find a third-party notary to prove that the public key has not been replaced, so the concept of digital certificate is introduced, which is also the content that will be shared in the next section.
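
The public-key substitution attack described in this section can be simulated in a few lines. The sketch is a toy under stated assumptions: both key pairs use textbook RSA built from well-known Mersenne primes (no padding, not secure), and the "network" is just variable assignment. It shows that the sender and the receiver both see a normal exchange while the attacker reads the message.

```python
E = 65537  # public exponent shared by both toy key pairs


def make_keypair(p: int, q: int):
    """Return (public modulus n, private exponent d) for toy RSA."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return n, d


def encrypt(message: bytes, n: int) -> int:
    return pow(int.from_bytes(message, "big"), E, n)


def decrypt(cipher: int, n: int, d: int, size: int) -> bytes:
    return pow(cipher, d, n).to_bytes(size, "big")


receiver_n, receiver_d = make_keypair(2**89 - 1, 2**107 - 1)
attacker_n, attacker_d = make_keypair(2**61 - 1, 2**127 - 1)

secret = b"card=4111;pin=42"

# 1) The receiver sends its public key, but the attacker swaps it in transit.
key_seen_by_sender = attacker_n

# 2) The sender encrypts with the key it received, unaware of the swap.
intercepted = encrypt(secret, key_seen_by_sender)

# 3) The attacker decrypts, reads the data, re-encrypts with the real key.
stolen = decrypt(intercepted, attacker_n, attacker_d, len(secret))
forwarded = encrypt(stolen, receiver_n)

# 4) The receiver decrypts successfully and notices nothing.
received = decrypt(forwarded, receiver_n, receiver_d, len(secret))
```

Nothing in the exchange lets the sender tell receiver_n from attacker_n, which is exactly the gap a certificate signed by a trusted third party is meant to close.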

6. Digital certificate

6.1 CA organization
CA is Certificate Authority, that is, an organization that issues digital certificates.

As a trusted third party, the CA is responsible for verifying the validity of the public key in the public key system.

A certificate is a data file that an origin server applies for from a trusted third-party organization. In addition to indicating who the domain name belongs to, the date of issuance, and so on, the certificate contains the server's public key and the digital signature of the third-party CA. It never contains a private key.

The server puts the public key in the digital certificate, and as long as the certificate is trusted, the public key is trusted.

The following two pictures show part of the information in the certificate of the Feishu domain name:

6.2 Digital Signature
Digest algorithm: generally implemented with a hash function. It can be understood as a fixed-length, one-way compression: data of any length is reduced to a digest of fixed length, and the original data cannot be recovered from it. It acts like a fingerprint of the data, and any small change to the data makes the digest very different.
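
Both properties of a digest, fixed length and sensitivity to small changes, are easy to observe with Python's hashlib. SHA-256 is used here only as a representative hash function; the sample messages are made up.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes give digests of the same length.
short_digest = digest(b"a")
long_digest = digest(b"a" * 1_000_000)

# Changing a single character gives a different digest.
d1 = digest(b"transfer 100 to bob")
d2 = digest(b"transfer 900 to bob")
differing_positions = sum(x != y for x, y in zip(d1, d2))

if __name__ == "__main__":
    print(len(short_digest), len(long_digest))
    print(d1)
    print(d2)
    print("hex positions that differ:", differing_positions)
```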

Normally, the applicant (server) of a digital certificate generates a key pair consisting of a private key and a public key, plus a certificate request file (Certificate Signing Request, CSR). A CSR is an encoded text file that contains the public key and other information to be included in the certificate (for example domain name, organization, email address). The key pair and CSR are usually generated on the server where the certificate will be installed, and the type of information contained in the CSR depends on the certificate's validation level. Unlike the public key, the applicant's private key must be kept secret and should never be revealed to the CA (or anyone else).

After the CSR is generated: the applicant sends it to the CA, and the CA verifies that the information it contains is correct. If so, the CA digitally signs the certificate with its own private key, places the signature inside the certificate, and sends the certificate back to the applicant.

In the SSL/TLS handshake phase: after the browser receives the server's certificate, it takes out the certificate data, the digital signature, and the server's public key, and uses the CA's public key to decrypt the signature and recover the digest the CA computed. The browser then hashes the certificate data itself and compares the result with the recovered digest. If they match, the certificate really was issued by the CA and its content has not been tampered with, so the server's public key can be trusted.

Asymmetric encryption uses the public key to encrypt and the private key to decrypt, while a digital signature is just the opposite: the private key encrypts (signs) and the public key decrypts (verifies), as shown in the following figure.
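
The sign-with-private-key, verify-with-public-key flow can be sketched as follows. This is a toy "hash then textbook RSA" scheme under loud assumptions: the CA key pair is built from two well-known Mersenne primes, there is no signature padding, and the certificate is just a made-up byte string. Real CAs use standardized signature schemes and X.509 certificates.

```python
import hashlib

# Toy CA key pair (public: N, E; private: D).
P, Q = 2**89 - 1, 2**107 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest_as_int(data: bytes) -> int:
    """SHA-256 digest of data, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """CA side: 'encrypt' the digest with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Browser side: recover the digest with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(data)


# Hypothetical certificate content, for illustration only.
cert_data = b"domain=example.com;public_key=ABCD1234;valid_until=2030-01-01"
signature = sign(cert_data)

genuine_ok = verify(cert_data, signature)
tampered_ok = verify(cert_data.replace(b"ABCD1234", b"EVIL5678"), signature)
```

Replacing the public key inside the certificate changes the digest, so the original signature no longer verifies, and the attacker cannot produce a new one without the CA's private key.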

Due to space limitations, the content of digital certificates will not be repeated in this article. If you want to know more about it, you can read:

1) Understand the security principle of HTTPS, digital certificates, one-way authentication, two-way authentication, etc. in one article;
2) Do you know, HTTPS uses symmetric encryption or asymmetric encryption? ;
3) If you understand HTTPS in this way, one article is enough.

7. Why use HTTPS

The book "Illustrated HTTP" mentions that HTTPS is HTTP in an SSL shell.

7.1 SSL
SSL was renamed TLS in 1999.

So: HTTPS is not a new application layer protocol. Rather, part of the HTTP communication interface is replaced by SSL/TLS.

Specifically: plain HTTP communicates directly with TCP, while with HTTPS, HTTP first communicates with SSL, and SSL then communicates with TCP.

SSL is an independent protocol. It can be used not only by HTTP but also by other application layer protocols; for example, FTP and SMTP can be encrypted using SSL.

7.2 HTTPS request process
The whole process of HTTPS request is as follows:

As shown in the figure:

1) The user initiates an HTTPS request in the browser, and the connection is made using port 443 of the server by default;
2) HTTPS requires a digital certificate issued by a CA. The certificate carries the server's public key Pub, while the corresponding private key Private is kept on the server side and never disclosed;
3) The server receives the request and returns the configured certificate containing the public key Pub to the client;
4) The client receives the certificate and verifies its validity, mainly: whether it is within the validity period, whether the domain name of the certificate matches the requested domain name, and whether the upper-level certificate is valid (checked recursively until a root certificate built into the system or configured in the browser is reached). If verification fails, an HTTPS warning is displayed; if it passes, the process continues;
5) The client generates a random Key for symmetric encryption, encrypts it with the public key Pub in the certificate, and sends it to the server;
6) The server receives the ciphertext of the random Key, decrypts it with the private key Private paired with the public key Pub, and obtains the random Key that the client really wants to send;
7) The server uses the random Key sent by the client to symmetrically encrypt the HTTP data to be transmitted, and returns the ciphertext to the client;
8) The client uses a random Key to symmetrically decrypt the ciphertext to obtain the plaintext of the HTTP data;
9) Subsequent HTTPS requests use the previously exchanged random Key for symmetric encryption and decryption.
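
On the client side, steps 1) to 4) are normally handled by a TLS library rather than by application code. The sketch below shows roughly how this looks with Python's standard ssl module; the helper names make_client_context and fetch_peer_certificate are invented for this example, and the second function needs network access to a real HTTPS host. Note that recent TLS versions usually agree on the symmetric key through a Diffie-Hellman exchange instead of encrypting a client-generated key with Pub, but the certificate checks are the same.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Context that trusts the system root certificates and verifies the peer."""
    # By default this requires a valid certificate chain (CERT_REQUIRED)
    # and checks that the certificate matches the requested host name.
    return ssl.create_default_context()


def fetch_peer_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect, run the TLS handshake, and return what was negotiated."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The handshake happens here; a bad certificate raises ssl.SSLError.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return {
                "version": tls.version(),   # negotiated protocol version
                "cipher": tls.cipher(),     # negotiated symmetric cipher suite
                "cert": tls.getpeercert(),  # parsed, already-verified certificate
            }


if __name__ == "__main__":
    info = fetch_peer_certificate("www.python.org")
    print(info["version"], info["cipher"][0])
    print(info["cert"]["notAfter"])
```

Disabling these checks (for example by setting verify_mode to ssl.CERT_NONE) would bring back the man-in-the-middle risk that certificates are meant to remove.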

7.3 What Problems Does HTTPS Solve?
HTTPS does solve three security problems of HTTP:

1) Confidentiality: Combining asymmetric encryption and symmetric encryption to achieve confidentiality. Use asymmetric encryption to encrypt the symmetric encryption key, and then use symmetric encryption to encrypt the data;
2) Integrity: Solve the integrity problem through the digital signature of the third-party CA;
3) Identity verification: verify the identity of the server through the digital certificate of the third-party CA.

7.4 Advantages and Disadvantages of HTTPS
Finally, we summarize the advantages and disadvantages of HTTPS:

It can be seen: HTTPS is indeed the optimal solution for secure HTTP transmission today, but it is not perfect, and there are still loopholes.

8. References

[1] Explain the profound things in simple language and fully understand the HTTP protocol
[2] Some knowledge that must be known about the HTTP protocol
[3] Deep decrypt HTTP from the data transport layer
[4] One article to understand the historical evolution and design ideas of the HTTP protocol
[5] Do you know how many HTTP requests can be made on a TCP connection?
[6] If you understand HTTPS in this way, one article is enough
[7] One minute to understand what problem HTTPS solves
[8] Do you know, HTTPS uses symmetric encryption or asymmetric encryption?
[9] The HTTPS era has come, are you planning to update your HTTP service?
[10] An article to understand HTTPS: encryption principle, security logic, digital certificate, etc.
[11] Comprehensive understanding of mobile DNS domain name hijacking and other miscellaneous diseases: principle, root cause, HttpDNS solution, etc.


JackJiang