From the perspective of code services and code security, this article looks at how Yunxiao (cloud-effect) code encryption technology keeps cloud-hosted code secure.
When code data is stored in the cloud, how can its security be ensured?
Some corporate managers have doubts about cloud code hosting: "My code lives on a cloud server. Will it be leaked?" Below, we look at how Yunxiao code encryption technology addresses this concern from the perspective of code services and code security.
I. Introduction
1. Code hosting service
What is a code hosting service?
A code hosting service is a service that runs in a public environment and provides software version control management.
A code hosting service solves two core problems:
- Archiving: the ability to save a copy of the current work so that it can be copied, traced, and so on.
- Collaboration: different people can work on the same content, and all of their results are reflected in that content.
Since its inception, Git has been closely associated with the words "open source" and "sharing". It spread quickly and became the mainstream version control tool because it changed the collaboration model of traditional version control tools, making software contribution and collaboration more efficient and convenient.
2. Two forms of code hosting services
Code hosting services usually take one of two forms:
- Self-built: use an open source product or purchase a privately deployed product, and set it up in a deployment environment that the user controls.
- Cloud-hosted: use a cloud hosting product, where the user only has the right to use the service and does not need to handle its operation and maintenance.
2.1 Self-built code hosting service
As an important part of corporate assets, code has always been valued, and many companies and institutions operate their own code hosting services. From the perspective of full control or data security, a service on a private network seems more trustworthy, but keeping it stable and reliable afterwards often makes small and medium-sized enterprises suffer.
Early on, an enterprise may need only a single server to build a usable code hosting service that meets its daily R&D needs. As the enterprise grows, however, the problems multiply and dedicated staff are needed to manage the service. Once the R&D organization exceeds a thousand people, a small team may even be needed for the operation, maintenance, and custom development of the code hosting service. This is undoubtedly a substantial cost.
In addition, an immature self-built service easily leaves local operation and maintenance staff with excessive permissions, which creates security risks such as malicious "delete the database and run" incidents. A single IT practitioner can wipe hundreds of millions of dollars off a company's market value, which is a standing alarm for all of us.
2.2 Cloud code hosting service
The cloud code hosting service offers a wider range of capabilities through cloud sharing. Because it is usually backed by an experienced operation and maintenance team and product team, its reliability is far better than that of an enterprise's self-built hosting service.
In comparison, however, a cloud code hosting service gives the user only the right to use it: the user cannot log in to the storage server and cannot see the storage and replication mechanisms behind it. Because code data is sensitive, trust in the cloud code hosting service (for example, the belief that code data is visible to the service provider's operation and maintenance personnel) has always been the sticking point for some medium and large enterprises.
3. The evolution direction of code hosting services
With the continuous evolution of cloud computing, applying cloud-native concepts maximizes its benefits, including elastic scaling, pay-as-you-go billing, no vendor lock-in, and high SLAs.
Practicing cloud native inevitably requires:
- Adopting best practices from the DevOps field to manage R&D and operation and maintenance processes.
- Using a CI/CD toolchain to achieve rapid iteration and continuous delivery of applications.
GitOps and IaC (Infrastructure as Code) are being adopted by more and more companies, and code hosting services have gradually become a storage infrastructure with version control and collaboration capabilities. Their reliability and accessibility have become the core metrics of the major cloud code hosting services, and these are exactly what self-built code hosting services cannot provide.
So how can a cloud code hosting security system be built to strengthen users' trust?
4. Cloud code hosting security system
To better support the archiving capability of the code hosting service, a cloud code hosting security system is usually built around the following four aspects:
Access security:
Access security includes, but is not limited to, authentication, permission control, and data transmission. It identifies users and grants each designated user the minimum reasonable permissions, protecting code assets as far as possible before an incident occurs. Data is encrypted in transit (for example with the common HTTPS and OpenSSL encryption) to prevent intermediaries from intercepting or tampering with user data.
Data credibility:
On top of access security, additional measures are needed to ensure the credibility of code submissions and code attribution (such as accepting only code from a specific owner, or requiring GPG signatures on commits), further reducing the risks caused by account leaks and similar incidents.
Storage security:
As the core capability of a cloud code hosting service, storage security must ensure that the service is reliable and that data assets are never lost, and must also reduce the risks users face when using storage snapshots and backups. At the same time, how to keep data stored on physical devices from leaking is the issue users care about most.
Audit and risk control:
Beyond the security capabilities that act before and during an incident, intelligent risk identification and active defense after the fact further strengthen the overall security system.
Current cloud code hosting services already have, to varying degrees, capabilities in access security, data credibility, and audit and risk control, but I believe storage security is the core competitive advantage, and within it data encryption is the most challenging part.
5. Data encryption technology
Encrypting the data and leaving the key that opens the lock in the user's hands is the silver bullet many cloud services use to solve the trust problem. For this reason, cloud disk and object storage services with encryption capabilities are more readily accepted by users, and many database services (for MySQL, Oracle, PostgreSQL, and others) have also introduced TDE (Transparent Data Encryption).
So, can code hosting services, which also have storage characteristics, introduce data encryption as well?
In addition to giving users the ability to back up and delete their code assets, data encryption makes a user's data assets visible only to that user, so that code storage becomes almost completely controllable.
II. Industry code encryption schemes
1. Client-side encryption
Client-side encryption, represented by the open source tool git-crypt, is a good choice for storing sensitive information. The user generates a data key, encrypts the target files with it, and commits the encrypted content to the Git repository.
In this mode, the encrypted content is visible only to the author. To share it with someone else, the author obtains that person's PGP public key, uses it to encrypt the data key, and commits the result to the code repository. The other party then decrypts the data key ciphertext with their own PGP private key, obtains the data key, and can decrypt the data.
However, the disadvantages of this method are also obvious. Once someone has obtained the data key, they can keep using it and the authorization cannot be revoked. The original text files also become binary files once encrypted, so Git can no longer compute effective deltas for incremental changes; with heavy use and frequent changes, the repository grows rapidly.
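The effect of encryption on incremental storage can be illustrated with a small experiment. The sketch below is not git-crypt's implementation: toy_encrypt is a hypothetical stand-in cipher built from HMAC-SHA256 (not secure, for illustration only), and zlib with a preset dictionary is used as a rough approximation of Git's delta compression, not its actual delta format. It compares how many bytes are needed to store a second version of a file, given the first, for plaintext and for ciphertext.

```python
import hashlib
import hmac
import zlib


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Hypothetical stand-in cipher (NOT secure): XOR with an HMAC-SHA256 keystream."""
    # Derive the nonce from the content so equal files give equal ciphertext.
    nonce = hmac.new(key, plaintext, hashlib.sha256).digest()[:16]
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        stream += block.digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def incremental_cost(old: bytes, new: bytes) -> int:
    """Bytes needed to store `new` when `old` is available as a zlib dictionary.

    A rough approximation of delta storage, not Git's actual delta format.
    """
    compressor = zlib.compressobj(level=9, zdict=old)
    return len(compressor.compress(new) + compressor.flush())


# Two versions of a source file that differ in a single line.
lines = [f"def handler_{i}(request): return {i}\n" for i in range(200)]
version1 = "".join(lines).encode()
lines[100] = "def handler_100(request): return -1\n"
version2 = "".join(lines).encode()

key = b"k" * 32  # fixed demo key
plain_cost = incremental_cost(version1, version2)
cipher_cost = incremental_cost(toy_encrypt(key, version1), toy_encrypt(key, version2))

print(f"plaintext increment:  {plain_cost} bytes")
print(f"ciphertext increment: {cipher_cost} bytes (file is {len(version2)} bytes)")
```

Under these assumptions the plaintext increment should be a small fraction of the file, while the ciphertext increment is about as large as the whole file, because a one-line change produces an unrelated ciphertext.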
2. Disk encryption
Disk encryption technology is already very mature and seems to be a good option. However, it only solves data security at the physical device level; at the VM (virtual machine) level, data is still accessed in plaintext.
3. Server-side encryption while idle
For code repositories that are not accessed frequently, encrypting them while they are idle is also a workable solution. A Git repository is encrypted when it is not being accessed; when it needs to be accessed again, access opens only after decryption completes.
This scheme is very general, offers many encryption algorithms to choose from, and is not expensive to implement. Its shortcomings are also obvious: repository access requires a warm-up step, and the larger the repository, the longer the warm-up; frequently accessed repositories are almost always stored on disk unencrypted, and these hot repositories are often the ones users care about most and least want leaked.
4. Cloud storage encryption based on JGit and S3
CodeCommit, AWS's code hosting product, provides code encryption. It is a code hosting service implemented on JGit; it reuses the JGit storage model and relies on the storage encryption capabilities of S3.
JGit's community is much less active than Git's, and Git also far outperforms JGit, which is why the major mainstream code hosting services use Git.
III. Transparent encryption for cloud code hosting
Transparent encryption for cloud code hosting is a server-side encryption (SSE) technology: the service uses a key authorized by the user to encrypt the user's code data, and uses the same user-authorized key to decrypt the data when the user accesses it. The whole encryption and decryption process is completely transparent to the user, who can keep using a regular Git client to access the repository or browse the pages provided by the code service. The user nevertheless retains full control of the data: when necessary, revoking the key authorization freezes the code data.
1. Advantages of Cloud Transparent Encryption
Advantages of Git TDE (Transparent Data Encryption):
- Does not rely on the file system.
- The data read through the file system is ciphertext.
- You can choose to encrypt only some sensitive repositories to reduce the performance impact.
Git TDE protects against threats that bypass file system access control:
- A malicious user who steals a storage device and accesses repository files directly.
- A malicious user who restores data from a disk backup.
- Platform operation and maintenance personnel viewing data at rest (persistent data), which stays invisible to them.
2. Encryption method
Encryption and decryption use envelope encryption [2]:
- The key management service (KMS) generates the data key; the ciphertext of the data key is stored together with the encrypted content.
- When decryption is needed, KMS decrypts the ciphertext of the data key, and the recovered data key decrypts the encrypted content.
KMS encryption and decryption are normally used for small amounts of data, whereas a Git repository holds a large amount of data, so the KMS encryption and decryption service cannot be applied to it directly. Envelope encryption is a good fit: KMS only has to protect the small data key, and the data key encrypts the repository data.
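The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Codeup's or any real KMS's API: ToyKMS, seal, and unseal are hypothetical names, the KMS is an in-memory object, and toy_cipher is a stand-in built from HMAC-SHA256 that is not secure (a real system would use a vetted algorithm such as AES-GCM). The point is the key hierarchy: the master key never leaves the KMS, and only the wrapped data key is stored next to the ciphertext.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher (NOT secure): XOR with an HMAC-SHA256 keystream.

    XOR is symmetric, so the same call encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyKMS:
    """Hypothetical in-memory KMS; the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)
        self.authorized = True  # the user can flip this to revoke access

    def _require_authorization(self) -> None:
        if not self.authorized:
            raise PermissionError("key authorization has been revoked")

    def generate_data_key(self):
        """Return (plaintext data key, data key ciphertext)."""
        self._require_authorization()
        data_key = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return data_key, nonce + toy_cipher(self._master_key, nonce, data_key)

    def decrypt_data_key(self, wrapped: bytes) -> bytes:
        self._require_authorization()
        return toy_cipher(self._master_key, wrapped[:16], wrapped[16:])


def seal(kms: ToyKMS, content: bytes) -> Dict[str, bytes]:
    """Encrypt content with a data key; keep only the wrapped key next to it."""
    data_key, wrapped_key = kms.generate_data_key()
    nonce = secrets.token_bytes(16)
    return {"wrapped_key": wrapped_key, "nonce": nonce,
            "ciphertext": toy_cipher(data_key, nonce, content)}


def unseal(kms: ToyKMS, sealed: Dict[str, bytes]) -> bytes:
    """Ask the KMS to unwrap the data key, then decrypt the content with it."""
    data_key = kms.decrypt_data_key(sealed["wrapped_key"])
    return toy_cipher(data_key, sealed["nonce"], sealed["ciphertext"])


kms = ToyKMS()
sealed = seal(kms, b"print('hello from a private repository')\n")
print(unseal(kms, sealed))

kms.authorized = False  # revoking the authorization freezes the data
try:
    unseal(kms, sealed)
except PermissionError as error:
    print("frozen:", error)
```

For brevity the sketch generates a fresh data key on every seal call; a service could equally generate the data key once and reuse its wrapped form. The last lines show why revoking the key authorization freezes the data: without the KMS, the stored wrapped key cannot be turned back into a usable data key.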
3. Key granularity
One data key per repository: because each repository has its own key, a leaked data key exposes at most that one repository, which effectively limits data leakage.
4. Which files need to be encrypted
The data in a Git repository includes reference data (HEAD, branches, tags, and so on), index data (.idx, .bitmap), loose objects, and packed data (.pack).
Under normal circumstances, content received via git push is stored as loose objects, and each loose object file corresponds to one file entity on the user side. After garbage collection with git gc, the loose objects are cleaned up and packed into a single packfile.
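For reference, a loose object in stock Git is a zlib-compressed file whose content is a short header followed by the raw data, and its name is the SHA-1 of the uncompressed bytes. The sketch below reproduces that layout for a blob. The function name loose_blob is illustrative, and the code assumes the default SHA-1 object format (not SHA-256 repositories); it describes standard Git behavior, not anything specific to Codeup.

```python
import hashlib
import zlib
from typing import Tuple


def loose_blob(content: bytes) -> Tuple[str, str, bytes]:
    """Return (object id, path relative to the Git directory, on-disk bytes)."""
    # Git prefixes the content with "<type> <size>\0" before hashing.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex digits become the directory, the rest the file name.
    path = f"objects/{object_id[:2]}/{object_id[2:]}"
    return object_id, path, zlib.compress(store)


object_id, path, on_disk = loose_blob(b"hello world\n")
print(object_id)
print(path)
print(len(on_disk), "bytes on disk")
```

The printed id should match what git hash-object reports for a file with the same content.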
So, in this case, which files need to be encrypted?
We believe that reference information such as branches contains no sensitive user information, so it does not need encryption. Likewise, index files are used only to look up packed data and contain no user information, so they do not need encryption either. Sensitive information normally lives in smaller text files (such as source code). Taking performance into account as well, we decided to:
- Encrypt only loose objects smaller than 10MB (roughly equivalent to a Chinese novel of 5 million characters).
size | compression | encryption | loose objects | packfile |
< 10MB | √ | √ | √ | |
10MB ~ 512MB | √ | √ | | |
> 512MB | √ | √ | | |
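The size rule stated in the bullet above (encrypt only loose objects smaller than 10MB) can be written as a small predicate. This is a sketch of that stated rule only, not Codeup's implementation: the function name and the is_loose flag are hypothetical, and reading 10MB as 10 × 1024 × 1024 bytes is an assumption.

```python
ENCRYPTION_SIZE_LIMIT = 10 * 1024 * 1024  # assumed meaning of "10MB"


def should_encrypt(size_in_bytes: int, is_loose: bool) -> bool:
    """Apply the stated rule: encrypt only loose objects smaller than 10MB."""
    return is_loose and size_in_bytes < ENCRYPTION_SIZE_LIMIT


# A small loose object, a loose object at the limit, and a small packed object.
for size, loose in [(4_096, True), (10 * 1024 * 1024, True), (4_096, False)]:
    print(size, loose, should_encrypt(size, loose))
```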
The tag acs:rdc:git-encryption, created by Codeup, appears in the "Label" section of the key.
By temporarily modifying or deleting the tag value, you can restrict Codeup's access to the key and thereby temporarily freeze the repository.