1. Background
Tang Qingsong is a security engineer at Beijing Fun Plus Technology Co., Ltd. and the author of the book "PHP WEB Security Development Practice". He specializes in enterprise security construction and SDL security construction.
He presented "PHP Security Coding Specifications and Review" at PHPCon 2020 (the 8th PHP Developers Conference) and "PHP Deserialization Vulnerability Analysis Practice" at NSC 2019 (the 7th China Network Security Conference), and served as a web security training camp lecturer at the Kanxue 2018 Security Developer Summit.
Hello everyone, I am very happy to share the topic "Code Security System Construction" with you. My name is Tang Qingsong, and most of my current work is on SDL. Today's topic is closely related to SDL; what I am sharing is in fact one part of it. Many of you who work on an in-house security team (Party A) also do SDL work, so I hope this content will be helpful to you.
Although the topic is part of SDL, it is not exactly SDL, because I focus mostly on the idea of shifting security left. So today is mainly about managing code risk: what risks can exist in code security? The answer covers both technical and non-technical work, so I will also talk about management and about learning.
1.1 Summary of content
Today's topic is how to carry out security work in four aspects: awareness, technology, supervision and learning. I drew a mind map of this lightweight security system, and it shows four levels we can work at. The first is security training.
In security training, the first thing is to tell developers where the risk points are; the second is to teach them how to avoid the pitfalls. To do that, we can pull the code from their repository and analyze it ourselves beforehand. Then, during training, we can point to the risk points in their own code and explain each problem.
The third is supervision. Once we have told developers what they must not write and what they should write, and have turned that into a set of rules, we cannot rely on people watching over them, so there has to be a supervision mechanism. Here I will show how to combine semgrep with a GitLab hook event to detect risk points in code in real time.
The fourth is the security test that always happens before going live, and the difficulties that can come up during it. Today I will touch on each of these. Let's start with security training. Many technical people are strong in their own skills, but teaching others about the pitfalls and walking through cases is a different matter, and they are not necessarily good at it.
First, how I understand application security as a whole. I do not think it is measured by whether one dimension is done well enough; it is a combined result. It is also the work of several teams, and we, as security personnel, carry the main responsibility.
We want to do everything we can to secure the application together with development and testing. First we build security awareness in developers: we tell them that the Internet is full of vulnerabilities and what harm those vulnerabilities do. Then, while developing, they will already be thinking that the application must not have vulnerabilities.
Second, once they are aware, we teach them how to avoid the pitfalls. Otherwise they know the risks exist but not how to handle them, and they fall into the same pits anyway. This requires the security staff themselves to have solid technical ability.
Third is the supervisory level. Even if you tell developers about the vulnerabilities and teach them to avoid the pitfalls, many will not do what you asked unless someone checks, so you need a supervision mechanism.
Fourth, the work is driven by events. On an in-house (Party A) team you will certainly have security incidents that drive the work. Our company, for example, makes games, and from time to time we run into cheat programs and other third-party plug-ins. We write these incidents up as cases and have developers study them.
2. Security training
Here I will cover some non-technical topics, namely how we train developers. There are several aspects to training, and I can offer some suggestions. For example, what should the first training cover? If I say everything in one session, what is left for next quarter? There is some skill in deciding what to say the first time.
The second is how the training is run. One option is to call in the whole development team, dozens or even hundreds of people, and speak from a stage; the other is to speak to one group at a time. I recommend small groups, and I will explain why.
The third is that cases should be local. Before each training we must open the team's code and read it, and then use their own code in the training. From there we lead into the topic we want to cover that session.
2.1 The first basic training
What can we teach in the first basic training? I suggest starting with what you do as the in-house (Party A) security team: how you audit their code and how you do security work. This is what developers care about most, so let them understand your work. Tell them which points you usually look at and talk it through with them fully. That builds mutual trust.
Second, introduce the classification of vulnerabilities: common coding vulnerabilities such as SQL injection, XSS, CSRF, file upload, command execution and code injection. After that, introduce logic vulnerabilities such as payment flaws, privilege escalation (unauthorized access), verification code flaws and SMS flaws. One round of this general education is enough.
When you cover these vulnerabilities, tie them to the team's own business. What does the team do? If it works on a middle-platform service that sits away from the business front line, it may have no payment issues, no user-facing issues, and nothing to do with some of the other categories. Mention those briefly, but do not go into detail.
Third, teach some simple methods of code self-inspection. After writing code, how can developers audit it for security issues themselves? Is each parameter filtered by type? Is the type enforced? For example, in PHP an ID may be received without an integer conversion; if it accepts arbitrary characters and is then spliced into an SQL statement, the result is SQL injection. So tell them: filter the value when you receive it, and if you do not, at least use PDO parameter binding when building the SQL statement. Then show them how to check their SQL.
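To make the self-inspection advice concrete, here is a minimal sketch of the two habits mentioned: force the ID to an integer on receipt, and bind it as a parameter instead of splicing it into the SQL text. The talk is about PHP and PDO; this uses Python's built-in sqlite3 module as a stand-in, and the users table and the function names are assumptions for illustration only.

```python
import sqlite3


def parse_id(raw: str) -> int:
    """Force the incoming parameter to a non-negative integer, or reject it."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"id must be an integer, got {raw!r}")
    return int(raw)


def find_user(conn: sqlite3.Connection, raw_id: str):
    """Look up a user with a bound parameter instead of string splicing."""
    user_id = parse_id(raw_id)
    # The "?" placeholder keeps the value out of the SQL text entirely.
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()


# Hypothetical in-memory table so the sketch can be run as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
```

Either check alone would stop the classic injection string; doing both means a missed filter somewhere else still does not reach the database as SQL.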
How to check for SQL vulnerabilities is something technical people on a Party A team already have their own views on, so I will not expand on it here. A brief mention is enough for developers. They generally know a lot about this already, and once you give them a starting point they can work out much of it themselves.
2.2 Group training
For the second training, my suggestion is group training. Your company is probably divided into many groups. At the moment I mainly train back-end developers, so my sessions focus on the back end. But each back-end group deals with different things. Some groups do not use HTTP at all and work with TCP sockets instead. If you tell them about web vulnerabilities, they will not want to listen.
The purpose of group training is therefore to teach each audience according to its needs. Each group has its own characteristics, and you should talk about what is specific to it. Try to keep each session to 10 people or fewer, and keep it within 45 minutes, the length of a normal class.
Be conscious of time. If you keep talking, people stop listening, so you need some form of interaction. There are three points here.
First, stay close to the team's real code. Before every training, audit their code; the audit will usually turn up security issues or irregularities. Present that code. If you open directly with outside cases, many people will not be willing to listen.
Second, stay close to their business scenario, as I mentioned. If you talk about HTTP to a team that has no HTTP business, they will think: we do not have this at all, so why are you telling us?
Third, share stories to create an interactive atmosphere. Tell them about your own experience: a vulnerability you once found on some website and how you found it, or cases from others that you have seen online, and walk through the analysis steps.
For example: I noticed a URL with id=100 and wanted to try something. I changed 100 to 101 in the address bar and pressed Enter, and I saw someone else's information: their orders, their personal details and some of their permission data. I often communicate with developers through stories like this, and I find it works very well.
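The fix for the story above is an ownership check on the server side. The following is only a sketch under assumptions: the Order type, the in-memory ORDERS store and get_order are hypothetical names, standing in for whatever the real application uses.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    owner_id: int
    detail: str


# Hypothetical in-memory store standing in for the orders table.
ORDERS = {
    100: Order(100, owner_id=1, detail="order of user 1"),
    101: Order(101, owner_id=2, detail="order of user 2"),
}


def get_order(current_user_id: int, order_id: int) -> Order:
    """Return the order only if it belongs to the logged-in user."""
    order = ORDERS.get(order_id)
    if order is None or order.owner_id != current_user_id:
        # Same error for "missing" and "not yours", so IDs cannot be probed.
        raise PermissionError("order not found")
    return order
```

With this check, user 1 changing the ID from 100 to 101 gets an error instead of user 2's order.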
2.3 Case study on the spot
The third is local case studies. Often, when we share with a particular group, that group has no vulnerability case of its own. In that situation, stay as near as you can: use cases from their department, or from the company as a whole. The closer the case is to their team, the better. So where do these cases come from? Before 2016 there were many public vulnerability cases on the Internet, including cases from large companies. After June 1, 2016 they largely disappeared because of the Cybersecurity Law, so many of those intuitive cases are no longer available. You can instead draw on three channels.
First, when you do code audits, save the cases and take screenshots. Mask the sensitive information, but keep enough to explain the issue, and organize the cases. Second, security tests: every time a service goes online there is a round of security testing, and the problems found there can also be collected into a library of cases and vulnerabilities. Third, vulnerability incidents in general-purpose components. At the end of 2021, for example, there was the Log4j component: a few days before the new year a command execution vulnerability was disclosed with a very wide impact. We can write up such events too. Tell developers that the components they depend on need timely upgrades, and that before referencing a component they should check whether its version has known vulnerabilities; if it does, do not use it.
We generally use Composer for PHP, and Java and Python have their own package managers. Dependencies need to be updated in time. If we package once and then do not update for several years, security holes appear easily.
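As an illustration of checking component versions, here is a small sketch that reads a Composer-style lock file (a JSON document with a "packages" list of name and version entries) and reports packages older than a known fixed version. The package names and the FIXED_IN advisory table are made up for the example; a real check would need a proper advisory source and would have to handle non-numeric versions such as dev branches, which this sketch does not.

```python
import json


def parse_version(text: str) -> tuple:
    """Turn '2.14.1' or 'v2.14.1' into (2, 14, 1) for comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def outdated_packages(lock_text: str, fixed_in: dict) -> list:
    """List (name, installed, fixed) for packages older than the fixed version."""
    findings = []
    for pkg in json.loads(lock_text).get("packages", []):
        fixed = fixed_in.get(pkg["name"])
        if fixed and parse_version(pkg["version"]) < parse_version(fixed):
            findings.append((pkg["name"], pkg["version"], fixed))
    return findings


# Hypothetical lock file content and advisory table, for illustration only.
LOCK = json.dumps({"packages": [
    {"name": "acme/logger", "version": "2.14.1"},
    {"name": "acme/http", "version": "v3.0.0"},
]})
FIXED_IN = {"acme/logger": "2.17.0"}
```

Running outdated_packages(LOCK, FIXED_IN) on a schedule is one cheap way to notice a dependency that was packaged once and never updated.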
3. Risk reminder
Next, real-time risk alerts. After the first training, the quarterly trainings and the case explanations, we also have a supervisory role: after developers write code we have to remind them promptly, and we also run a full scan every quarter, and so on.
3.1 The role of risk reminders
First, the purpose of risk reminders: mainly to reinforce everyone's awareness. If you stop reminding developers after the training, then within a week or two they go back to writing code the way they used to. You will find the training was in vain and little has changed. So reminders are necessary.
For real-time reminders, we can set up a hook event in the git repository. On every push, we take the pushed code and extract the lines that were changed. Then we check whether those lines have problems such as calls to dangerous functions. If they do, we give feedback and tell the developer where the risk may be.
This reminder serves three purposes. The first is to strengthen security awareness and let developers know that someone is looking after security. The second is to block security problems at the source. But do not have the hook reject the push as soon as it sees a dangerous function. Instead, return a prompt message through git saying that there may be a risk at this location and asking the developer to pay attention. For example, if a variable is passed to a command execution call, the prompt tells them to make sure the variable is controlled and filtered.
The third is to speed up security feedback. Without real-time reminders, if you only scan the repository every two weeks, the code may already be online by then.
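The non-blocking reminder described above can be sketched roughly as follows. This is not the hook from the talk: the RULES table, the messages and the function names are assumptions, and the "contains a $" test is a deliberately crude way to guess that a PHP variable reaches the call.

```python
import re

# Hypothetical rule table: regex for a risky call -> advice shown to the developer.
RULES = [
    (re.compile(r"\b(exec|system|shell_exec|passthru)\s*\("),
     "possible command injection: make sure the argument is not user-controlled"),
    (re.compile(r"\beval\s*\("),
     "possible code injection: avoid evaluating dynamic strings"),
]


def review_lines(changed):
    """changed: iterable of (path, line_no, text). Returns advisory messages."""
    notes = []
    for path, line_no, text in changed:
        for pattern, advice in RULES:
            # Only warn when the line also contains a PHP variable ($name).
            if pattern.search(text) and "$" in text:
                notes.append(f"{path}:{line_no}: {advice}")
    return notes


def hook_exit_code(notes):
    """Print the reminders but always return 0, so the push is never rejected."""
    for note in notes:
        print("security reminder:", note)
    return 0
```

Returning 0 regardless of findings is the design choice from the talk: the developer sees the warning in the push output, but the push itself goes through.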
3.2 Risk function reminder
For risky functions, I have listed a few categories as examples: code injection, system command execution, and file download over plaintext FTP.
There are also some encryption libraries and regular expression libraries, and some reminders about information leakage.
You can put these into your own risk reminders. I suggest prioritizing: put the high-risk ones in first. FTP, for example, depends on your situation; you can include it or not. pprof, on the other hand, deserves attention: in Golang it exposes quite a lot of important information. Code that directly calls dangerous functions to build statements, read file contents or execute system commands should also trigger a reminder.
3.3 Hook usage
How is the hook event used? The principle is that a hook script is stored on the git server, and every time someone pushes, the server triggers that script.
When the script is triggered, you can run a few commands to find out which files changed and which line numbers changed. With that data in hand, you can use the dangerous functions I just listed as detection rules, extend them yourself, and write them into a rule file.
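One way to get "which files changed and which line numbers changed" is to parse the unified diff between the old and new revisions. A server-side receive hook is given the old and new commit IDs on standard input, and the diff text could then come from something like git diff between the two. The parser below is a sketch under that assumption; added_lines and HUNK are names chosen for the example, and unusual cases (for instance an added line whose own text starts with "++ ") are not handled.

```python
import re

# Hunk header, e.g. "@@ -10,2 +10,3 @@"; group 1 is the first new line number.
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text: str):
    """Yield (path, new_line_no, text) for every line added in a unified diff."""
    path, line_no = None, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/src/a.php" -> "src/a.php"; "/dev/null" means a deleted file.
            target = line[4:]
            path = target[2:] if target.startswith("b/") else None
        elif (match := HUNK.match(line)):
            line_no = int(match.group(1))
        elif path is None or line.startswith("---"):
            continue
        elif line.startswith("+"):
            yield path, line_no, line[1:]
            line_no += 1
        elif line.startswith(" "):
            # Context lines advance the new-file line counter; removed lines do not.
            line_no += 1
```

Only added lines are reported, so the developer is reminded about what this push introduced rather than about old code they did not touch.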
Second, semgrep is quite popular now and many teams use it, so I consider it fairly mature. You can add these rules to semgrep, have it check the code, and return the risks it finds. For the concrete implementation, I wrote a more detailed article earlier; you can open it and follow the steps.
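If semgrep is run with JSON output (for example with its --json flag and a rule file passed via --config), the report can be turned into the short messages a hook prints back to the developer. The sketch below only formats such a report; it assumes the usual layout with a top-level "results" list whose entries carry check_id, path, start.line and extra.message, and format_findings is a hypothetical helper, not part of semgrep.

```python
import json


def format_findings(semgrep_json: str) -> list:
    """Turn a semgrep JSON report into one-line reminders for the push output."""
    report = json.loads(semgrep_json)
    lines = []
    for result in report.get("results", []):
        message = result.get("extra", {}).get("message", "").strip()
        lines.append(
            f"{result['path']}:{result['start']['line']} "
            f"[{result['check_id']}] {message}"
        )
    return lines
```

Keeping the formatting separate from the scan makes it easy to refine the wording of the prompt later without touching the rules.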
Let me show what the hook looks like in practice. On the command line I run git commit to commit the code, and the hook fires when I push. In the push output I can see which file is affected and the line number, with a message saying that an exec call there executes variable A, which may cause command injection, so make sure the content cannot be controlled arbitrarily by the user. You can refine the wording of this prompt yourself.
3. Code Audit
For code audit, there are also four points to share.
First, the direction of the audit: how do we audit? This part is more technical. There is a saying that in letters there is no first place and in martial arts there is no second, and I believe technology is similar.
3.1 Code audit direction
The first direction is the general coding audit. We audit for SQL injection, XSS, command execution, file upload and so on, all of which a general coding audit can cover. The second is auditing in light of the business. As soon as a system has users, for example, it has password retrieval and can have permission vulnerabilities.
The third is components. This depends on the language of the repository: PHP uses Composer, and other languages use their own tools. List the components, determine the version of each, and then determine whether that version has known vulnerabilities. As for the method of the general coding audit, I touched on it earlier; it comes down to three approaches.
The first is to follow received parameters. Suppose an ID is received here and I know it should be an integer, but it was not cast to an integer on receipt. This is common in weakly typed languages such as PHP. You then track the variable: is it put into an SQL statement, returned to the front end, passed to command execution or to code execution? The method is to follow each unfiltered parameter all the way to the end of the program.
The second is to search for dangerous functions by keyword. Go to each call site and see whether an argument is a variable. If it is, trace the variable back to its source. If it comes from user input without filtering and goes into a dangerous function, there is certainly a security problem. The third is to parse the dependency files. PHP, Java, Python and Go all have dependency package management now, so detecting dependencies is relatively easy, and we can do it here. Some projects still work the traditional way, though. Many older PHP systems, for example those below PHP 7.0, do not use Composer and simply download source code into the project directory. In that case the analysis is more troublesome.
3.2 Tool selection
In that case you need third-party tools for the analysis. On tool selection for code auditing, I have used most of the options. The first is Fortify, which I know fairly well. The second is Checkmarx, which I have not used because we have not bought it. The third is Code Guard from Qi Anxin, which I used for a while.
The fourth is semgrep, which I use a lot at present, mainly for hook detection. Some of its functions are also used in our audit system, but for full code audits I still prefer Fortify. For the hook event, Fortify performs AST-based syntax analysis, so it uses more memory and responds slowly; the hook therefore relies mainly on semgrep. The fifth is CodeQL. I only learned about it some time ago and have not used it in production, so I cannot say much about it.
The sixth tool is one I have also used. It is completely open source, both the rules and the engine, but it only supports PHP and cannot analyze other back-end languages. So at present I mostly use Fortify and semgrep. Fortify is commercial, and so are Checkmarx and Code Guard. Fortify is relatively easy for me to use. I have never used Checkmarx, so I cannot comment on it.
Code Guard I used for a month last December. My main impression is that it detects about as many vulnerabilities as Fortify, but I find its interface uncomfortable to click through. A new version is said to be coming in March this year. You can also try semgrep, whose rules are open source; the engine, as I understand it, is shipped encrypted.
The fifth, CodeQL, is used on a large scale at GitHub, and you can try it too, but only for learning, not for commercial use. Next, batch code auditing. The current products support a single repository quite well. On a Party A team, however, I am responsible for the security of the whole company's code base, so I cannot limit myself to a few repositories; I have to audit in batches. Our company may have more than 600 repositories. Going through them one by one would drive me crazy: opening one large project in Fortify can take a day or two, so the analysis alone could take a year. The default approach is not realistic.
3.3 Batch code audit tool
So I wrote a tool for batch code auditing, QingScan. It has four main parts: information collection, black-box detection, code auditing, and special-purpose use. Here I will mainly talk about the white-box audit. It pulls down your project, calls Fortify, semgrep and some other tools to audit the code, and scans the repositories one after another. It can also be deployed in a distributed manner. I have been using it in our company so far, and I recommend trying it. The address is:
https://github.com/78778443/QingScan
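The batch idea (take each repository in turn, run several tools on it, move on) can be sketched in a few lines. This is not QingScan's implementation, only an illustration; scan_all and the scanner callables are hypothetical, and in practice each callable would wrap a real tool such as Fortify or semgrep.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_all(repos, scanners, workers=4):
    """Run every scanner against every repository and collect results per repo.

    repos: list of repository identifiers (for example clone URLs).
    scanners: dict of name -> callable(repo) returning a list of findings.
    """
    def scan_one(repo):
        report = {}
        for name, scan in scanners.items():
            try:
                report[name] = scan(repo)
            except Exception as exc:  # one failing tool must not stop the batch
                report[name] = [f"scanner failed: {exc}"]
        return repo, report

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order, so the result follows the repo list.
        return dict(pool.map(scan_one, repos))
```

With several hundred repositories, the important property is that a crash in one tool or one repository is recorded and skipped rather than ending the whole run.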
4. Security testing
Next, security testing. It covers web site testing, API interface testing, private protocol testing and case output.
4.1 web site testing
Web site testing is fairly routine, and at present it has no major technical difficulty.
We test for SQL injection and XSS, for example, but these days there are generally few SQL injection or XSS problems. What XSS we find is mostly reflected XSS, and I do not think the impact is large, because cookies are now generally protected with HttpOnly or similar measures. So there is not much to say here.
4.2 API interface test
How do we test API interfaces? They differ from web sites. With a web site, we can crawl to collect addresses and then scan them. When the scan produces results, we verify them and submit the confirmed ones.
The problem with API interfaces is that we cannot crawl them. So we usually start xray in proxy mode listening on a port. On the phone, we set the proxy to the xray address, use the app to generate requests, collect the addresses, and scan them. We also have a list of URL addresses provided by the development team. The functional testers have a copy too, and on these addresses we test logic problems such as unauthorized access and payment flaws.
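Once the address list exists, one of the simplest logic checks is to replay each request without credentials and see whether it still succeeds. The sketch below keeps the network call abstract: send is an assumed callable you would implement with your own HTTP client, and the token value and example URLs are placeholders.

```python
def find_unauthenticated(endpoints, send):
    """Return endpoints that still answer with 200 when no credentials are sent.

    endpoints: list of (method, url) pairs collected from the proxy or the
    development team. send: callable(method, url, headers) -> HTTP status code.
    """
    exposed = []
    for method, url in endpoints:
        with_auth = send(method, url, {"Authorization": "Bearer TEST_TOKEN"})
        without_auth = send(method, url, {})
        # An endpoint that works with a token and equally without one is suspect.
        if with_auth == 200 and without_auth == 200:
            exposed.append((method, url))
    return exposed
```

The output is a list of candidates, not confirmed findings: some endpoints are public on purpose, so each hit still needs manual verification.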
4.3 Difficulties in Private Protocol Testing
The third is private protocol testing, which is more troublesome. With a socket or raw TCP protocol we cannot parse the packets directly unless we have a client that simulates theirs. At present we do this only for a few key projects: we agree with the team on the data format between client and server and then run a simulated test. That is still a lot of work. So for private protocols it depends on whether you have enough manpower; there are not many good ways to simulate such a client.
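To show what "simulating a client" involves, here is a toy packet encoder and decoder. The format is entirely an assumption made up for the example (a 4-byte big-endian body length, a 2-byte message type, then the body); a real private protocol would use whatever layout is agreed with the development team.

```python
import struct

# Assumed header layout: 4-byte body length, 2-byte message type, big-endian.
HEADER = struct.Struct(">IH")


def encode(msg_type: int, payload: bytes) -> bytes:
    """Build one packet: header followed by the raw payload."""
    return HEADER.pack(len(payload), msg_type) + payload


def decode(packet: bytes):
    """Split one packet into (msg_type, payload), rejecting truncated data."""
    length, msg_type = HEADER.unpack_from(packet)
    body = packet[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated packet")
    return msg_type, body
```

With helpers like these, a tester can change one field programmatically (for example another user's ID in the payload) and send the result over a TCP socket, which is the part that cannot be done by editing traffic by hand.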
A traditional external web site is the easiest to test: collect the addresses, run the routine tests, then test the business functions, meaning things like payment, unauthorized access, password retrieval and verification codes. For API interfaces, the main thing is to get the addresses first; after that the test methods are similar. There are two ways to get them. The first is to get the API list directly from the development team and work out what each parameter does. Then we run scan tests and logic tests on these interfaces, which is not much different from the traditional approach.
The second is for when the list from the development team may be incomplete. We can open a proxy port with xray or Burp Suite, set the proxy on the mobile phone, route the traffic through it, and collect a batch of addresses. The third case, private protocols, is the troublesome one I just mentioned: the packet format is not easy to understand, and packets are not easy to craft. You have to build them with a program and cannot edit the data by hand, since some of the data is hexadecimal. Unless what you see is plaintext, you have to simulate a client to encapsulate the packets. It depends on whether you have enough manpower. If you do not, the test is not very meaningful.
4.4 Case output
Every time we detect a vulnerability or handle an emergency response event, we can record it in our security system, so that experience accumulates.
Here is a picture showing our team's overview of vulnerabilities in the company. It includes a quarterly report, per-department reports and statistics, and statistics by vulnerability category.
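The kind of statistics mentioned here can be produced from the recorded cases with very little code. This is a sketch only: the record shape (a date and a category) and the function name quarterly_stats are assumptions, not the layout of any real case system.

```python
from collections import Counter
from datetime import date


def quarterly_stats(cases):
    """Count recorded cases per (year, quarter, category).

    cases: iterable of (found_on, category) pairs, where found_on is a date.
    """
    stats = Counter()
    for found_on, category in cases:
        quarter = (found_on.month - 1) // 3 + 1
        stats[(found_on.year, quarter, category)] += 1
    return stats
```

Adding a department field to the key would give the per-department view in the same way.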
5. Summary
Today I mainly shared four points: training, building the hook, code audit, and finally security testing. That is all for this topic, and I hope it is helpful to everyone. Goodbye.
Author: Tang Qingsong
WeChat: songboy8888
Date: March 15, 2022