Monitoring and Alerting | Has the website been attacked? - Mini Program Cloud Development Technical Column

Some time ago, my website appeared to come under attack. Today I will take you to the scene of the incident and share how I analyzed it and how to prevent and control this kind of problem afterwards.

Evil spirits

Let’s take a look at how I discovered that the website was attacked.

Generally, to keep online websites and back-end services running stably, we add monitoring and alarms to the project. When something unexpected happens, the system notifies the administrator as soon as possible.

Since my project uses Tencent Cloud's Cloud Development, quota monitoring and alarms are provided by default, which helps prevent excessive resource consumption and is very convenient.

But an alarm alone is not enough. If something goes wrong, how can we analyze it? Some clues must be provided for troubleshooting.

Tencent Cloud's cloud functions, cloud hosting, and similar services provide monitoring and logging by default. Without writing a single line of code, you can see resource usage, running time, and request information such as the IP address and request headers, which is very convenient.

In addition, during development I added some logging and data reporting to the service, such as which user performed which operation at what time. The more detailed the records, the easier it is to troubleshoot problems. Of course, meaningless content should not be recorded, otherwise the logs become dense, hard on the eyes, and slow to read!
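As a rough illustration of "which user performed which operation at what time", here is a minimal sketch of a structured operation log. The function names, field names, and logger name are hypothetical and not taken from the author's project; any logging setup that records the same three facts would do.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("operation_log")


def build_operation_record(user_id, action, detail=None, now=None):
    """Build one log record: who did what, and when (UTC)."""
    now = now or datetime.now(timezone.utc)
    record = {"time": now.isoformat(), "user_id": user_id, "action": action}
    if detail:  # leave out empty details so the log stays readable
        record["detail"] = detail
    return record


def log_operation(user_id, action, detail=None):
    """Write the record as one JSON line and return it."""
    record = build_operation_record(user_id, action, detail)
    logger.info(json.dumps(record, ensure_ascii=False))
    return record
```

One JSON object per line keeps the log easy to filter later by user, action, or time range.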

I always think of the project as my own child (although I don't have a child yet), so I check the monitoring and logs every day to understand the physical condition of the "child".

The monitoring indicator I look at most often is the number of calls to the service, which largely reflects user traffic.

Under normal circumstances, the graph of call count over time should look like the one below. Hardly anyone visits at night, and traffic is fairly stable during the day, with occasional small peaks:

But one day, I suddenly saw the graph below. Let's take a look at the characteristics of this graph.

Yes, a single long hair sticking up from a bald patch! Around the 25-minute mark, the number of calls suddenly soared. We usually call this phenomenon a "traffic spike", and the point that sticks out on the monitoring chart a "burr" (or glitch).
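A spike like this can also be flagged automatically instead of by eye. The following is a minimal sketch, assuming the call counts are available as a list of per-minute totals; the function name, the factor of 5, and the floor of 10 calls are illustrative assumptions, not values used by any monitoring product.

```python
from statistics import median


def find_spikes(counts, factor=5.0, min_calls=10):
    """Return the indexes whose call count is far above the typical level.

    The typical level is the median of the series, which a single
    spike barely moves. `min_calls` avoids flagging tiny absolute values.
    """
    if not counts:
        return []
    limit = max(median(counts) * factor, min_calls)
    return [i for i, c in enumerate(counts) if c > limit]
```

For example, `find_spikes([20, 22, 19, 21, 20, 300, 23, 20])` flags only index 5, the minute with 300 calls.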

In most cases, glitches are not a good thing. Seeing this curve, my first reaction was not "Wow, has the project gone viral?" but "Damn, I'm under attack!"

Was it attacked? Who attacked me? Or did I really go viral (with a hint of fantasy)?

With these questions, let's analyze it quickly.

Analysis

Just looking at the graph above, it is impossible to analyze it. You must look for clues from the scene of the accident.

Fortunately, Cloud Development records access logs for us. Select the time period when the incident occurred (using minute 25 as the reference point, plus 5 minutes before and after), then filter out the corresponding logs.
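The same time-window filter can be applied in code once the logs are exported. This is a minimal sketch, assuming each record starts with a UTC timestamp such as `2021-04-29T04:22:05.937752445Z`; the function names and the `(timestamp_text, payload)` record shape are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Date, time, optional fractional seconds of any length, then "Z" for UTC.
_TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_timestamp(text):
    """Parse a UTC timestamp, truncating the fraction to microseconds."""
    m = _TS.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    micro = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micro, tzinfo=timezone.utc)


def filter_window(records, center, minutes=5):
    """Keep (timestamp_text, payload) records within +/- `minutes` of `center`."""
    delta = timedelta(minutes=minutes)
    return [r for r in records if abs(parse_timestamp(r[0]) - center) <= delta]
```

Here `center` would be the moment of the spike as a timezone-aware `datetime`, for example `datetime(2021, 4, 29, 4, 25, tzinfo=timezone.utc)` (an illustrative value).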

To analyze them more flexibly, we export the logs to a local file and open it with spreadsheet software such as Excel.

Next, let's analyze the logs, starting with the log generation time column, that is, the time of the incident:

Did you notice? The log generation times are very evenly spaced, about 3 to 4 per second.

From this it is fairly clear that, most likely, the service was not being accessed manually; a machine was automatically sending requests at a fixed frequency.
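The "evenly spaced" observation can be checked numerically. This is a minimal sketch, assuming the timestamps are ISO strings whose first 19 characters are `YYYY-MM-DDTHH:MM:SS`; the function names and the thresholds (`max_cv`, `min_seconds`) are illustrative assumptions. It only counts seconds that contain at least one request, so it is a heuristic rather than a proof of automation.

```python
from collections import Counter
from statistics import mean, pstdev


def requests_per_second(timestamps):
    """Count log lines per whole second."""
    return Counter(ts[:19] for ts in timestamps)


def looks_automated(timestamps, max_cv=0.3, min_seconds=5):
    """Return True when per-second request counts are suspiciously uniform.

    Uniformity is measured by the coefficient of variation
    (standard deviation divided by mean) of the per-second counts.
    """
    counts = list(requests_per_second(timestamps).values())
    if len(counts) < min_seconds:
        return False  # too little data to judge
    return pstdev(counts) / mean(counts) <= max_cv
```

Human traffic tends to arrive in irregular bursts, so its per-second counts vary much more than a scanner that sends 3 to 4 requests every second.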

Looking at the contents of the log again, the structure of each log is as follows:

// Request time
2021-04-29T04:22:05.937752445Z
// IP that sent the request
stdout F 169.254.128.20
// Request line
HEAD /webroot.bak HTTP/1.1\
// Response status code
200 0
// Requested URL
http://www.code-nav.cn/webroot.bak
// Browser identity of the request (User-Agent)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

Among them, the request time, request IP, and request address are key information. Time has just been analyzed, let's look at the request IP and address.

I searched the above IP directly in the table and found that all the IP addresses are the same!
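Instead of searching the spreadsheet by hand, the distinct IPs can be counted directly from the exported file. This is a minimal sketch, assuming the export is a CSV with a header row; the column names `time`, `ip`, and `url` are hypothetical and depend on how the logs were exported.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text, column):
    """Count how many rows share each value of `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Illustrative rows built from values that appear in this article.
sample = """time,ip,url
2021-04-29T04:22:05Z,169.254.128.20,http://www.code-nav.cn/webroot.bak
2021-04-29T04:22:05Z,169.254.128.20,http://www.code-nav.cn/111.gz
2021-04-29T04:22:06Z,169.254.128.20,http://www.code-nav.cn/111.tar.bz2
"""

ip_counts = count_by_column(sample, "ip")
```

If `ip_counts` contains a single key, every request in the window came from the same address.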

Now I'm relieved; it is probably just a small-time troublemaker.

Then I looked at the request addresses of a few consecutive logs, which looked like this:

http://www.code-nav.cn/111.gz
http://www.code-nav.cn/111.tar.bz2
http://www.code-nav.cn/111.dat
http://www.code-nav.cn/111.bz2
http://www.code-nav.cn/222.tgz
http://www.code-nav.cn/222.gz
http://www.code-nav.cn/333.zip
...

Seeing "111", "222", and "333", I roughly understood: the attacker was most likely scanning my website by dictionary enumeration, trying to discover the website's back-end address.
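Requests like these can be recognized automatically by their file extensions. This is a minimal sketch; the suffix list contains only the extensions that appear in the logs quoted in this article, and the function names are hypothetical. A real rule set would need to be tuned to the files a site legitimately serves.

```python
from urllib.parse import urlparse

# Extensions seen in the scan described above.
PROBE_SUFFIXES = (".gz", ".tar.bz2", ".dat", ".bz2", ".tgz", ".zip", ".bak")


def is_probe(url):
    """Return True if the URL path ends with an archive or backup extension."""
    path = urlparse(url).path.lower()
    return path.endswith(PROBE_SUFFIXES)


def probe_ratio(urls):
    """Fraction of requests that look like probes (0.0 for an empty list)."""
    if not urls:
        return 0.0
    return sum(is_probe(u) for u in urls) / len(urls)
```

A ratio close to 1.0 over a short window is a strong hint that the traffic is a scan rather than normal browsing.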

The principle of the attack is very simple. It is just like trying to crack other people's passwords when we were young: frantically trying candidates one by one. The difference is that an attacker usually uses a website scanning tool and feeds the candidate values to the machine as a dictionary instead of trying them manually. When the number and frequency of attempts are high, it is called "brute forcing" (literally "blasting").

It brought back the dread of my college cybersecurity class...

Based on the above analysis, this "attacker" was probably just using my website for practice. After all, the scanning frequency was not high and it did not last long. At least, I hope so.
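To make the idea of a dictionary concrete, here is a minimal sketch of how a candidate list can be built by combining names with extensions. It only produces strings and sends no requests; the function name is hypothetical, and the inputs are taken from the paths seen in the logs above.

```python
from itertools import product


def candidate_paths(names, extensions):
    """Combine every name with every extension, e.g. '111' + '.gz' -> '/111.gz'."""
    return [f"/{name}{ext}" for name, ext in product(names, extensions)]


paths = candidate_paths(["111", "222"], [".gz", ".zip"])
```

Because the list grows as names times extensions, even a small dictionary produces many requests, which is why such scans show up as a dense, regular burst in the logs.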

Prevention and control

Although this incident is not very harmful, it is extremely insulting! It made me fully aware that my website is lacking in security. At the very least, I should get an alert when traffic is abnormal, by text message or something!

If you deploy a website on a server you build yourself, you need to integrate or develop a business monitoring and alarm system on your own. Although there are many third-party systems available, such as Zabbix, Prometheus (AlertManager), and Grafana, they all have to be deployed and maintained by yourself, which takes a certain amount of manpower and resources.

With Tencent Cloud's Cloud Development, however, in addition to the basic resource quota alarm mentioned above, you can also flexibly customize various advanced alarm policies.

For example, to add a call-count alarm to the "like" function, first select "cloud function" as the alarm object:

Then configure the trigger condition. For example, trigger the alarm if the number of calls exceeds 100 within 5 minutes:
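The trigger condition itself is easy to reason about in code. Below is a minimal sketch of a sliding-window check for "more than 100 calls within 5 minutes". It illustrates the rule only and is not how Cloud Development implements its alarms; the class and method names are hypothetical.

```python
from collections import deque


class CallRateAlarm:
    """Fire when more than `threshold` calls fall inside the time window."""

    def __init__(self, threshold=100, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of recent calls, oldest first

    def record_call(self, timestamp):
        """Record one call at `timestamp` (in seconds); return True if the alarm fires."""
        self._calls.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop calls that have fallen out of the window.
        while self._calls and self._calls[0] <= cutoff:
            self._calls.popleft()
        return len(self._calls) > self.threshold
```

With the defaults, the 101st call inside any 5-minute span returns `True`, which is the moment a notification would be sent.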

Then configure the alarm recipients, notification method, time period, and so on. Email, SMS, WeChat, and other channels are supported, and you can select several at once:

And that's it. By repeating these steps, you can add an alarm to every individual function, the finest granularity available, and detect problems right away.

Product description

Cloud Development (Tencent CloudBase, TCB) is a cloud-native integrated development environment and tool platform provided by Tencent Cloud. It offers developers highly available, automatically and elastically scalable back-end cloud services, including serverless capabilities such as computing, storage, and hosting. It can be used to develop many kinds of client applications in a cloud-integrated way (mini programs, official accounts, web applications, Flutter clients, etc.), helping developers build and manage back-end services and cloud resources in a unified manner. This avoids the tedious work of setting up and operating servers during application development, so developers can focus on business logic, with a lower barrier to entry and higher efficiency.

Open Cloud Development: https://console.cloud.tencent.com/tcb?tdl_anchor=techsite
Product documentation: https://cloud.tencent.com/product/tcb?from=12763
Technical documentation: https://cloudbase.net?from=10004
For the technical exchange group and the latest news, follow the WeChat official account [Tencent Cloud Cloud Development]

CloudBase Cloud Development