5 code detection tools commonly used at Alibaba, free to use!

An introduction to 5 code detection tools commonly used at Alibaba, all free to try in just 2 steps. Take home a Cherry keyboard or a plush doll: 100% chance to win a prize!

Author | Yu Yang

The problems we face

In day-to-day R&D, the code asset issues we face fall into two main categories: code quality issues and code security vulnerabilities.

1. Code quality issues

Code quality is a familiar topic. The problem is that everyone knows it matters, yet few know how to improve and maintain this shared asset of the team. On the one hand, developers may neglect quality control in order to ship features on time; on the other hand, developers differ in their coding habits and in how they understand programs.

Over time, declining code quality feeds on itself: heavy business pressure lowers quality, lower quality reduces development efficiency, and reduced efficiency increases business pressure further, creating a vicious circle.

2. Code security issues

Security issues often hide in coding logic written without security awareness and in open source dependencies that have not been vetted or maintained, and they are hard to spot during daily development and code review.

Code security issues can also be analyzed in two aspects:

Coding security issues, that is, security specification issues: by keeping non-compliant code out of the corporate code base, we reduce privacy data leaks, injection risks, and security policy vulnerabilities.

Dependency security issues, that is, security vulnerabilities introduced through third-party open source components. According to the Synopsys 2020 Open Source Security and Risk Analysis (OSSRA) report, more than 99% of organizations use open source technology. The benefits of open source, such as collaborating on the shoulders of giants, lowering development costs, speeding up iteration, and improving software quality, need no repeating. However, along with its conveniences, open source software also hides many security risks. According to the report's audits, 75% of codebases contained security vulnerabilities, 49% contained high-risk vulnerabilities, and 82% were still using components more than 4 years out of date.

For code security issues, on the one hand, an admission check is needed: configure security coding specification checks and quality gates according to the business scenario and specifications. On the other hand, regular maintenance is needed to detect and fix newly disclosed security vulnerabilities promptly.

5 recommended popular code detection tools

1. Code quality inspection

Java code specification detection

At Alibaba, because of historical legacy and differences in business styles, different organizations had very different project structures, code styles, and specifications, which led to high communication costs, low collaboration efficiency, and high maintenance costs. A group of Alibaba's current size needs a professional technical army that iterates and develops intensively, rather than one that repeatedly rebuilds the same things. A truly professional team must have a unified development specification, which stands for efficiency, shared understanding, craftsmanship, and sustainability.

Against this background, Alibaba formulated the "Alibaba Java Development Manual" as the development specification followed by its internal Java engineers, covering programming, unit testing, exception logging, MySQL, engineering, and security specifications. It distills the experience of nearly 10,000 of Alibaba's top Java engineers and has been tested and refined repeatedly in large-scale frontline practice.

On the surface, traffic laws restrict the freedom to drive, but in fact they protect public safety. Imagine roads with no speed limits, no traffic lights, and no rule to keep right: who would dare to drive on them? Likewise, a software development specification is by no means meant to stifle creativity and elegance in code, but to curb excessive individuality, promote reasonable standardization, and get things done together in a commonly accepted way.

Therefore, the code specification has three goals:

1. Efficient coding: unified standards improve communication and R&D efficiency.

2. Code quality: prevent problems before they occur, raise quality awareness and system maintainability, and reduce failure rates.

3. Code with passion: embrace the craftsman spirit, pursue excellence, and polish high-quality code.

Through tools such as IDE inspection plug-ins, pipeline integration checks, and code review integration, the code specification is deeply woven into Alibaba's development activities. Codeup, the code hosting platform of Cloud Effect, also has built-in Java code specification detection, giving developers convenient and fast checks at the code submission and code review stages.

Intelligent code patch recommendation

Defect detection and patch recommendation have been hard problems in software engineering for decades, and they are among the issues that researchers and frontline developers care about most. The defects discussed here are not network vulnerabilities or system defects, but defects hidden in the code itself. Helping developers identify and fix these defects can greatly improve software quality.

Building on the defect detection methods popular in industry and academia, and analyzing and working around their limitations, algorithm engineers on the Alibaba Codeup team proposed a new algorithm that analyzes code defects more accurately and efficiently and recommends fixes. The work has been accepted by the International Conference on Software Engineering (ICSE). The algorithm works in four steps:

1. Find fix commits by matching keywords in commit messages, keeping only commits that touch fewer than 5 files (commits touching too many files may dilute the fix behavior). This step depends heavily on good commit habits, so we encourage developers to make good use of commits and write clear messages.

2. From these fix commits, extract the deleted and added content at file level, forming Defect and Patch pairs (DP Pairs). This step inevitably produces a lot of noise.

3. Use an improved DBSCAN method to cluster the defect and patch sides of the DP pairs jointly, grouping similar defect and patch code together (clustering can also be done at the code-fragment level). Clustering similar defects and fixes removes much of the noise left by the previous step, and the common mistakes everyone has made in historical commits serve as a valuable reference.

4. Use a self-developed template extraction method to generalize the defect code and patch code, adapting the context to different variables.
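The first three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published algorithm: the keyword list, the `Commit` structure, and the `eps`/`min_samples` values are hypothetical, plain DBSCAN over a token-level Jaccard distance stands in for the paper's improved DBSCAN, and step 4 (template extraction) is not modeled.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword list; the one actually used by Codeup is not published.
FIX_KEYWORDS = re.compile(r"\b(fix|fixed|fixes|bug|defect|repair)\b", re.IGNORECASE)
MAX_FILES = 5  # keep only commits touching fewer than 5 files


@dataclass
class Commit:
    message: str
    # file path -> (deleted lines, added lines), i.e. a file-level diff
    changes: dict = field(default_factory=dict)


def is_fix_commit(commit: Commit) -> bool:
    """Step 1: keyword match on the message and fewer than MAX_FILES changed files."""
    return bool(FIX_KEYWORDS.search(commit.message)) and len(commit.changes) < MAX_FILES


def extract_dp_pairs(commits):
    """Step 2: one (defect, patch) pair per changed file of every fix commit."""
    pairs = []
    for commit in commits:
        if is_fix_commit(commit):
            for deleted, added in commit.changes.values():
                pairs.append(("\n".join(deleted), "\n".join(added)))
    return pairs


def pair_tokens(pair):
    """Tag identifiers by side so defect and patch code are compared jointly."""
    defect, patch = pair
    return ({("-", t) for t in re.findall(r"\w+", defect)}
            | {("+", t) for t in re.findall(r"\w+", patch)})


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dbscan(pairs, eps=0.5, min_samples=2):
    """Step 3: plain DBSCAN; returns one cluster label per pair, -1 means noise."""
    feats = [pair_tokens(p) for p in pairs]
    n = len(pairs)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if jaccard_distance(feats[i], feats[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_samples:  # only core points expand the cluster
                queue.extend(nb)
    return labels


if __name__ == "__main__":
    commits = [
        Commit("Fix null check", {"A.java": (["return obj.getName();"],
                                             ["return obj == null ? null : obj.getName();"])}),
        Commit("bug: fix NPE", {"B.java": (["return user.getName();"],
                                           ["return user == null ? null : user.getName();"])}),
        Commit("fix typo in log", {"C.java": (['log.info("strat");'], ['log.info("start");'])}),
        Commit("add feature", {"D.java": (["a();"], ["b();"])}),  # no fix keyword
    ]
    print(dbscan(extract_dp_pairs(commits)))  # [0, 0, -1]
```

In this toy run the two null-check fixes land in one cluster and the unrelated typo fix is labeled noise; on real repository history the distance function and thresholds would need tuning.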

The code patch recommendation service is currently used for automatic code scanning of merge requests. During code review, it detects code fragments that can be improved, offers optimization suggestions, and captures the manual experience from past reviews to keep improving the quality of corporate code.

2. Code security detection

Sensitive information detection

In recent years, the industry has seen many incidents in which sensitive information (API keys, database credentials, OAuth tokens, etc.) was inadvertently leaked through various sites, creating security risks and even direct financial losses for enterprises.

We faced similar problems in our own practice: hard-coded secrets appeared very frequently, with no effective mechanism to identify them. Developers and business managers therefore urgently needed a stable and reliable method and system for detecting sensitive information. Our research showed that most existing sensitive information detection tools rely simply on rule matching or information entropy, which makes it hard for their recall or precision to meet expectations. Building on rule matching and information entropy and adding contextual semantics, we therefore propose SecretRadar, a sensitive information detection tool that uses a multi-layer detection model.

SecretRadar's technical implementation has three layers. The first layer uses rule matching, the traditional technique for identifying sensitive information. Rule matching offers good precision and extensibility, but it relies heavily on relatively fixed lengths, prefixes, and variable names, so the varied coding styles of different developers easily lead to missed detections. For cases that fixed rules struggle to capture, the second layer uses an information entropy algorithm, which measures how random the content of a code line is and works well at recognizing randomly generated keys and random identity information. The entropy algorithm has its own limitation, however: as recall rises, so do false alarms. The third layer therefore filters and refines the results with template clustering and contextual semantic analysis: it extracts common keywords to aggregate the entropy results and combines contextual semantics with the current syntactic structure to improve the model's precision.

The sensitive information detection tool not only serves our internal developers but also supports more than 20,000 code bases and 3,000 companies on the Cloud Effect platform, helping developers fix more than 90,000 hard-coded secrets.
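A minimal Python sketch of the first two layers, plus a crude stand-in for the third, may make the idea concrete. Everything here is an assumption for illustration: the rule patterns, the 20-character minimum, the 4.0 bits-per-character threshold, and the keyword filter are hypothetical and are not SecretRadar's actual rules, and the keyword filter is far simpler than the template clustering and contextual semantic analysis described above.

```python
import math
import re
from collections import Counter

# Layer 1: illustrative rules only; SecretRadar's real rule set is not public.
RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assigned_secret": re.compile(
        r"""(?i)\b(password|passwd|secret|api_?key|token)\s*[:=]\s*["']([^"']{6,})["']"""),
}
# Layer 2 candidates: quoted strings of 20+ key-like characters (hypothetical limit).
STRING_LITERAL = re.compile(r"""["']([A-Za-z0-9+/=_\-]{20,})["']""")
ENTROPY_THRESHOLD = 4.0  # bits per character; hypothetical cut-off
# Crude layer-3 stand-in: names that usually hold random-looking but non-secret values.
BENIGN_CONTEXT = re.compile(r"(?i)\b(hash|checksum|sha\d*|md5|uuid|digest)\b")


def shannon_entropy(s: str) -> float:
    """H = -sum(p * log2(p)) over the character frequencies of s."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def scan_line(line: str):
    """Return (layer, detail) findings for one line of code."""
    findings = [("rule", name) for name, rule in RULES.items() if rule.search(line)]
    if findings:
        return findings
    # Only fall back to entropy when no fixed rule fired.
    for literal in STRING_LITERAL.findall(line):
        if shannon_entropy(literal) >= ENTROPY_THRESHOLD and not BENIGN_CONTEXT.search(line):
            findings.append(("entropy", literal))
    return findings


if __name__ == "__main__":
    for code in [
        'String key = "AKIAIOSFODNN7EXAMPLE";',
        'db.password = "hunter2hunter2"',
        'String cfg = "q8Zr3Lx7Vm2Tk9Pw4Hs6Jd1N";',
        'String sha256 = "q8Zr3Lx7Vm2Tk9Pw4Hs6Jd1N";',
    ]:
        print(code, "->", scan_line(code))
```

The third and fourth sample lines carry the same random-looking literal; only the context filter distinguishes them, which is the kind of false alarm the third layer is meant to suppress.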

Source code vulnerability detection

Alibaba uses the Sourcebrella Pinpoint detection engine for source code vulnerability detection, mainly covering injection risks and security policy risks.

The Sourcebrella detection engine is the result of a decade of research by the Prism research group at the Hong Kong University of Science and Technology. Drawing on the world's software verification research of the past ten years, with its own improvements and innovations, the group independently designed and implemented a leading software verification system. Its main verification approach translates programs into mathematical expressions such as first-order logic and linear algebra, and then infers the causes of defects using formal verification techniques. So far, four core papers on the technology have been published, one at PLDI and three at ICSE; readers interested in the research can follow the links in the Extended reading section at the end of the article.

The Sourcebrella detection engine can find defects that have stayed hidden for more than 10 years in large, highly active open source projects; take its MySQL findings [5] as an example. Other inspection tools on the market did not report these defects, and the engine can complete the inspection of a 2-million-line open source project in 1.5 hours while keeping the false alarm rate at about 15%. For complex, large-scale projects, its scanning efficiency and false alarm rate are also among the best in the industry.

"Source Code Vulnerability Detection" integrates the security analysis capabilities of the Sourcebrella detection engine and delivers better results in analysis precision, speed, and depth. Its core advantages are:

1. It supports bytecode analysis, so the code logic of second- and third-party packages is not missed;

2. It excels at logical analysis of long call chains across functions;

3. It handles indirect data modification through references, pointers, and the like;

4. It is highly accurate: compared with similar tools such as Clang and Infer, it performs better in precision and in identifying real issues;

5. It performs well: a single application currently takes about 5 minutes to analyze on average.

The Sourcebrella detection engine accurately tracks data flow through code and provides deep, high-precision analysis of function call chains, so it can find deep problems that span multiple layers of functions. Along with each defect, it reports the steps that trigger the problem and shows the related control flow and data flow in full. This helps developers quickly understand and fix problems, improves software quality at lower cost early in development, greatly reduces production costs, and improves R&D efficiency.
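The idea of following data across function boundaries and reporting the path that triggers a problem can be illustrated with a toy source-to-sink search over a hand-built value-flow graph. This is only a conceptual sketch, not how Pinpoint works internally: Pinpoint performs sparse, path-sensitive value-flow analysis with constraint solving, whereas the code below is a plain breadth-first search, and all node names (including the `*` node standing for data written through a reference) are hypothetical.

```python
from collections import deque


def find_taint_path(edges, source, sinks):
    """Breadth-first search over value-flow edges (src -> dst).

    Returns the list of nodes from `source` to the first reachable sink,
    or None if no sink is reachable.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    prev = {source: None}  # predecessor map, also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in sinks:
            path = []
            while node is not None:  # walk predecessors back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical value-flow graph of a small Java-like program.
EDGES = [
    ("request.getParameter", "Controller.handle:name"),  # untrusted input
    ("Controller.handle:name", "Dao.find:arg"),          # argument passing across a call
    ("Dao.find:arg", "*Dao.find:buf"),                   # store through a reference
    ("*Dao.find:buf", "Dao.find:sql"),                   # load via an alias
    ("Dao.find:sql", "Statement.executeQuery"),          # dangerous sink
    ("Controller.handle:id", "Dao.get:sql"),             # unrelated flow, no sink
]

if __name__ == "__main__":
    for step in find_taint_path(EDGES, "request.getParameter", {"Statement.executeQuery"}):
        print(step)
```

Returning the whole path rather than a yes/no answer mirrors the point above: the report shows the chain of assignments and calls that carries untrusted input to the dangerous call.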

Dependency package vulnerability detection

We want to give developers an effective mechanism for detecting and managing the security and trustworthiness of open source components, so we built a dependency vulnerability detection service and dependency security reports. In practice, developers commonly report that fixing dependency vulnerabilities usually costs more than fixing vulnerabilities in their own code, which makes them unwilling or unable to deal with such problems. The reasons are twofold: most vulnerabilities are not introduced directly but through other components that the third-party dependencies themselves depend on, and developers are unsure which version is clean, usable, and compatible.

To make fixes easier, we further identify and analyze dependency reference relationships, clearly mark direct and indirect dependencies, and locate the specific file that imports each dependency, so that developers can quickly find the key problem location. At the same time, by aggregating vulnerability data, we recommend version upgrades that fix the vulnerabilities, since one dependency may correspond to multiple vulnerabilities; developers can then decide whether to accept the recommendation. By analyzing API changes and code call chains between versions, we estimate the cost of an upgrade and automatically create fix reviews for developers, helping them maintain code security more efficiently.

For both code quality and code security detection, developers can try the 5 Alibaba automatic code detection tools above for free in Cloud Effect Codeup.
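Marking direct versus indirect dependencies and recommending a single upgrade that clears several vulnerabilities at once can be sketched as follows. The package names, advisory IDs, and `ADVISORIES` data are hypothetical, versions are assumed to be plain dotted integers, and the sketch does not attempt the API-change analysis used to estimate upgrade cost.

```python
from dataclasses import dataclass, field


def parse_version(v: str) -> tuple:
    """Naive version parsing: '2.3.1' -> (2, 3, 1); qualifiers are not supported."""
    return tuple(int(x) for x in v.split("."))


@dataclass
class Dep:
    name: str
    version: str
    children: list = field(default_factory=list)


# Hypothetical advisory data: package -> [(advisory id, first fixed version)].
ADVISORIES = {
    "example:parser": [("ADV-1", "2.3.1"), ("ADV-2", "2.4.0")],
    "example:logger": [("ADV-3", "1.1.0")],
}


def audit(roots, advisories):
    """Walk the dependency tree; roots are the project's direct dependencies."""
    report = []

    def walk(dep, path):
        chain = path + [f"{dep.name}@{dep.version}"]
        current = parse_version(dep.version)
        hits = [(adv, fixed) for adv, fixed in advisories.get(dep.name, [])
                if current < parse_version(fixed)]
        if hits:
            # Upgrading to the highest "first fixed" version clears every hit at once.
            target = max((fixed for _, fixed in hits), key=parse_version)
            report.append({
                "package": dep.name,
                "kind": "direct" if len(chain) == 1 else "indirect",
                "via": chain,  # the import chain that pulls the package in
                "advisories": [adv for adv, _ in hits],
                "upgrade_to": target,
            })
        for child in dep.children:
            walk(child, chain)

    for root in roots:
        walk(root, [])
    return report


if __name__ == "__main__":
    project = [
        Dep("example:web", "5.0.0", [Dep("example:parser", "2.2.0")]),
        Dep("example:logger", "1.0.5"),
    ]
    for item in audit(project, ADVISORIES):
        print(item)
```

Recording the `via` chain is what lets a report point developers at the direct dependency they actually control when the vulnerable package is only pulled in indirectly.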

Applications of the detection services

1. Code submission

The most direct application of the detection services is code submission, where companies can define and configure detection plans for different projects according to their business scenarios and specifications. When a developer pushes code changes to the server, the detection services configured for the current code base are triggered automatically, checking all problems in the current commit so the developer can find new problems early and confirm that existing ones are resolved. By integrating these detection services, checks on code specification, code quality, and code security can be shifted left, giving developers fast feedback right after they finish coding.

2. Code review

In enterprise project collaboration, developers usually merge feature branches into the main branch through merge requests. The merge request process requires the project lead or module owner to review and manually inspect the code. Manual review takes a lot of effort, and it is hard for it to cover potential problems in every dimension of the code. By configuring detection services sensibly, the workload of manual review can be greatly reduced and the review process accelerated. At the same time, by enriching, screening, and accumulating detection rule sets and manual experience, the detection services can be tailored to the enterprise's business scenarios, preventing non-compliant or risky code from entering the enterprise code base.

3. Code measurement

Besides helping developers find and solve problems early in the code submission and code review stages, the detection services can also help managers measure corporate code quality and visualize risk. With enterprise-level reporting and project task management, security and quality issues can be measured more intuitively as a project evolves.

Extended reading

1. Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code

http://t.tb.cn/0qxIpFV5sRD5uxOcgED7o

2. SMOKE: Scalable Path-Sensitive Memory Leak Detection for Millions of Lines of Code

http://t.tb.cn/2l96Jh2yqOGowsfs4oVk2m

3. Pipelining Bottom-up Data Flow Analysis

https://qingkaishi.github.io/public_pdfs/ICSE2020a.pdf

4. Conquering the Extensional Scalability Problem for Value-Flow Analysis Frameworks

https://qingkaishi.github.io/public_pdfs/ICSE2020b.pdf

Activity recommendation


It's already 2021: do you still think code detection = syntax/style scanning?

What scanning software do big companies spend millions on every year? And how can you use it without paying?

What is the second step to implement DevOps?

What is the quality and security improvement tool with the lowest onboarding cost?

**Cloud Effect DevOps Lab launches the [Automatically catch bugs in your code in 1 minute] activity**

**In 1-3 minutes, give your code a full health check.**

Once you finish, you can also enter a draw for a Cherry mechanical keyboard, an Alibaba Cloud custom Git command mouse pad, a building block planet set, and more, plus a 1,000-point gift: 100% chance to win!

Click the link below to join the activity now! _Note: this event is open only to new Cloud Effect users._

https://developer.aliyun.com/adc/series/activity/bugdetect

Copyright Statement: The content of this article is contributed voluntarily by Alibaba Cloud real-name registered users, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright and does not assume corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.