From the perspective of code services and code security: how Yunxiao ("cloud-effect") code encryption technology addresses code leakage in the cloud

Code data is stored in the cloud. How can its security be ensured?

Some corporate managers worry about code hosting: my code lives on a cloud server, so will it be leaked?

Next, we look at how Yunxiao code encryption technology addresses this problem from the perspective of code services and code security.

I. Introduction

1. Code hosting service

What is a code hosting service?

A code hosting service is a service that runs in a public environment and provides version control management for software.

Code hosting services solve two core problems:

  • Archiving: the ability to save a copy of our current work for later copying, tracing, and so on.
  • Collaboration: different people can work on the same content, and their results are all reflected in that content.

Since its inception, Git has been closely associated with the words "open source" and "sharing". It spread quickly and became the mainstream version control tool because it changed the collaboration model of traditional version control tools, making software contribution and collaboration more efficient and convenient.

2. Two forms of code hosting services

Code hosting services usually have two forms:

  • Use an open source product, or purchase a privately deployed product, and run it in a deployment environment the user controls.
  • Use a cloud hosting product, where the user only has the right to use the service and does not need to handle its operation and maintenance.

2.1 Self-built code hosting service

Code is an important corporate asset and has always been valued, so many companies and institutions run their own code hosting services. In terms of full control and data security, a service on a private network seems more trustworthy and beyond reproach, but keeping it stable and reliable afterwards often causes small and medium-sized enterprises real trouble.

Early in a company's development, a single server may be enough to build a usable code hosting service that meets daily R&D needs. As the company grows, the problems multiply and dedicated staff are needed to manage the service. Once the R&D organization exceeds a thousand people, a small team may even be needed for the operation, maintenance, and custom development of the code hosting service. This is undoubtedly a significant cost.

In addition, because self-built services are often immature, local operation and maintenance permissions can easily be too broad, creating security risks. Malicious incidents such as "deleting the database and running away", in which a single IT employee can wipe hundreds of millions of dollars off a company's market value, are a standing warning to us.

2.2 Cloud code hosting service

Cloud code hosting services offer a wider range of capabilities through shared cloud resources. Because an experienced operations and product team usually stands behind them, their reliability is far better than that of self-built hosting services.

In comparison, however, a cloud code hosting service gives the user only the right to use it: users cannot log in to the storage servers and cannot see the storage and replication mechanisms behind the service. Because code is sensitive, trust in the cloud code hosting service (for example, the belief that code data is visible to the service provider's operation and maintenance personnel) remains a sticking point for some medium and large enterprises.

3. The evolution direction of code hosting services

As cloud computing continues to evolve, applying cloud-native concepts makes it possible to get the most out of cloud computing, including elastic scaling, pay-as-you-go pricing, no vendor lock-in, and high SLAs.

Practicing the cloud-native concept requires:

  • Adopting best practices from the DevOps field to manage R&D and operations processes.
  • Using a CI/CD toolchain to achieve rapid iteration and continuous delivery of applications.

GitOps and IaC (Infrastructure as Code) are being adopted by more and more companies, and code hosting services have gradually become a storage infrastructure with version control and collaboration capabilities. Reliability and accessibility have become the core metrics for the major cloud code hosting services, and this is exactly what self-built code hosting services cannot provide.

So, how to build a cloud code hosting security system to enhance users' sense of trust?

4. Cloud code hosting security system

To better support the archiving capability of code hosting, a cloud code hosting security system is usually built around the following four aspects:

Access security:

Access security includes, but is not limited to, authentication, permission control, and data transmission. It identifies users, grants designated users the minimum reasonable permissions, and protects code assets as far as possible before an incident. Data is encrypted in transit (for example with HTTPS or OpenSSL encryption) to prevent intermediaries from intercepting or tampering with user data.

Credible data:

On top of access security, additional measures are needed to ensure the credibility of code submissions and code attribution (such as accepting only code from a specific owner, or requiring GPG signatures on commits), further reducing the risks caused by account leaks and the like.

Storage security:

Storage security is the core capability of a cloud code hosting service. It must ensure that the service is reliable and that data assets are not lost, and also reduce the risks users face when using storage snapshots and backups. At the same time, how to protect data stored on physical devices from leaking is the issue users care about most.

Audit risk control:

Beyond the security capabilities that apply before and during an incident, intelligent risk identification and active defense after the incident further strengthen the whole security system.

Most current cloud code hosting services have some capabilities in access security, data credibility, and audit risk control, but I believe storage security is the core competitive advantage, and within it data encryption technology is the most challenging.

5. Data encryption technology

Encrypting the data and putting the key in the hands of users is the silver bullet many cloud services use to solve the user trust problem. This is why cloud disk encryption and object storage services with encryption capabilities are more readily accepted by users. Many database products (such as MySQL, Oracle, and PostgreSQL) have also launched TDE (Transparent Data Encryption) capabilities.

So, can code hosting services, which also have storage properties, introduce data encryption too?

Besides giving users the ability to back up and delete code assets, data encryption makes a user's data assets visible only to that user, so that code storage becomes almost completely controllable.

II. Industry code encryption schemes

1. Client-side encryption

Client-side encryption, represented by the open source software git-crypt, is a good choice for storing sensitive information. The user generates a data key, encrypts the target files with it, and commits the encrypted content to the Git repository.

In this mode, the encrypted content is visible only to the author. To share it with others, the author obtains their PGP public keys, encrypts the data key with those keys, and commits the result to the repository. The other party takes the ciphertext of the data key, decrypts it with their own PGP private key to recover the data key, and can then decrypt the data.

However, the disadvantages of this method are also obvious. Once someone has obtained the data key, they can keep using it and their access cannot be revoked. The original text file becomes a binary file once encrypted, so Git cannot create deltas for incremental changes, and frequent changes make the repository grow rapidly.
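The repository-growth problem can be illustrated with a minimal Python sketch. It is not git-crypt's code. Assumptions are labeled in the comments: SHA-256 stands in for the AES block function (the Python standard library has no AES), the nonce is derived from the file content so that encryption is deterministic, and compressing two versions together with zlib is only a rough proxy for Git's delta storage. All function names are hypothetical.

```python
import hashlib
import hmac
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """CTR-style stream cipher. ASSUMPTION: SHA-256 stands in for AES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def encrypt_file(key: bytes, plaintext: bytes) -> bytes:
    """Deterministic encryption: the nonce is derived from the file content."""
    nonce = hmac.new(key, plaintext, hashlib.sha1).digest()[:12]
    return nonce + keystream_xor(key, nonce, plaintext)


key = hashlib.sha256(b"demo data key").digest()  # fixed key, demo only
v1 = "".join(f"line {i}: value = {i * i}\n" for i in range(200)).encode()
v2 = v1.replace(b"line 100:", b"line 100 (edited):")  # a one-line change

# Compressing both versions together approximates what delta storage can save.
plain_pair = v1 + v2
cipher_pair = encrypt_file(key, v1) + encrypt_file(key, v2)
plain_packed = len(zlib.compress(plain_pair))
cipher_packed = len(zlib.compress(cipher_pair))

print("plaintext pair :", len(plain_pair), "->", plain_packed, "bytes")
print("ciphertext pair:", len(cipher_pair), "->", cipher_packed, "bytes")
```

The two plaintext versions share almost all of their bytes, so the second one costs very little extra space. The two ciphertexts share nothing that a compressor can exploit, so every revision of an encrypted file is stored at close to full size.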

2. Disk encryption

Disk encryption technology is already very mature and seems to be a good option. However, disk encryption only solves data security at the physical device level. At the VM (virtual machine) level, data is still accessed in plaintext.

3. Server-side encryption when idle

For repositories that are not used frequently, encryption at rest is also a good solution. A Git repository is encrypted while it is not being accessed; when it is needed again, access opens only after decryption has completed.

This scheme is very general, offers many encryption algorithms to choose from, and is not expensive to implement, but its shortcomings are also obvious. Repository access requires a warm-up, and the larger the repository, the longer the warm-up takes. Frequently accessed repositories sit on disk almost permanently in unencrypted form, and these hot repositories are often the ones users care about most and least want leaked.

4. Cloud storage encryption based on JGit and S3

CodeCommit, AWS's code hosting product, provides code encryption capabilities. It is a code hosting service built on JGit: it reuses the original JGit storage model and relies on the storage encryption capabilities of S3.

JGit's community is much less active than Git's, and Git also performs far better than JGit, which is why the major mainstream code hosting services all use Git.

III. Transparent encryption for cloud code hosting

Transparent encryption in a cloud code hosting service is a server-side encryption technology (Server-Side Encryption). It encrypts the user's code data with a key the user has authorized, and decrypts the data with that key when the user accesses it. The whole process is completely transparent to the user, who can access the repository with a regular Git client or browse the pages provided by the code service. The user still retains full control of the data: when necessary, revoking the key authorization freezes the code data.

1. Advantages of Cloud Transparent Encryption

Advantages of Git TDE (Transparent Data Encryption):

  • It does not rely on the file system.
  • The data the file system sees is ciphertext.
  • You can encrypt only selected sensitive repositories to reduce the performance impact.

Git TDE protects against threats that bypass file system access control:

  • A malicious user who steals storage devices and accesses repository files directly.
  • A malicious user who restores data from a disk backup.
  • Platform operation and maintenance personnel viewing data at rest (persistent data), which stays invisible to them.

2. Encryption method

Encryption and decryption use envelope encryption [2]:

  • The data key is generated by the key management service (KMS); the ciphertext of the data key is stored together with the encrypted content.
  • When decryption is needed, KMS decrypts the ciphertext of the data key, and the recovered data key decrypts the encrypted content.

KMS encryption and decryption are normally used for small amounts of data. A Git repository holds a large amount of data, so the KMS encryption and decryption service cannot be applied to it directly. Envelope encryption, in which KMS only protects the data key, is a good fit.
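The envelope flow can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual service: `ToyKMS`, `seal`, and `open_envelope` are hypothetical names, the KMS is an in-memory object, and SHA-256 stands in for a real block cipher because the Python standard library has no AES. It shows the key point only: bulk data is encrypted locally with the data key, while the KMS handles nothing larger than the 32-byte key itself.

```python
import hashlib
import secrets
from typing import Dict, Tuple


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """CTR-style stream cipher. ASSUMPTION: SHA-256 stands in for AES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """In-memory stand-in for a key management service (hypothetical API)."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)  # never leaves the "KMS"
        self.authorized = True  # the key owner may switch this off

    def _check(self) -> None:
        if not self.authorized:
            raise PermissionError("key authorization was revoked")

    def generate_data_key(self) -> Tuple[bytes, bytes]:
        """Return (plaintext data key, ciphertext of the data key)."""
        self._check()
        data_key = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return data_key, nonce + keystream_xor(self._master_key, nonce, data_key)

    def decrypt(self, wrapped_key: bytes) -> bytes:
        """Recover the plaintext data key from its ciphertext."""
        self._check()
        return keystream_xor(self._master_key, wrapped_key[:16], wrapped_key[16:])


def seal(kms: ToyKMS, plaintext: bytes) -> Dict[str, bytes]:
    """Encrypt locally with a fresh data key; keep only the wrapped key."""
    data_key, wrapped_key = kms.generate_data_key()
    nonce = secrets.token_bytes(16)
    return {
        "wrapped_key": wrapped_key,
        "nonce": nonce,
        "ciphertext": keystream_xor(data_key, nonce, plaintext),
    }


def open_envelope(kms: ToyKMS, envelope: Dict[str, bytes]) -> bytes:
    """Ask the KMS for the data key, then decrypt the content locally."""
    data_key = kms.decrypt(envelope["wrapped_key"])
    return keystream_xor(data_key, envelope["nonce"], envelope["ciphertext"])


kms = ToyKMS()
envelope = seal(kms, b"int main(void) { return 0; }\n")
print(open_envelope(kms, envelope))

kms.authorized = False  # revoking authorization freezes the data
try:
    open_envelope(kms, envelope)
except PermissionError as err:
    print("frozen:", err)
```

Because the stored envelope contains only the ciphertext of the data key, nothing on disk can be decrypted once the key owner withdraws authorization from the KMS.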

3. Key granularity

One data key for each code repository: with one key per repository, a compromised key exposes at most that repository's data.

4. Which files need to be encrypted

The data in a Git repository includes reference data (HEAD, branches, tags, etc.), index data (.idx, .bitmap), loose objects, and packed data (.pack).

Under normal circumstances, data from a git push is stored as loose objects, and each loose object file corresponds to one file entity on the user side. After garbage collection (git gc), loose objects are cleaned up and packed into a single packfile.
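To make the loose object format concrete, the following Python sketch builds one by hand. It assumes Git's default SHA-1 object format; the helper name `loose_object` is hypothetical. A loose object is a short header plus the file content, compressed with zlib and stored under a path derived from the hash of the uncompressed bytes.

```python
import hashlib
import zlib


def loose_object(content: bytes, kind: str = "blob"):
    """Return (object id, relative path, zlib-compressed bytes) for one object."""
    # Header is "<type> <size>" followed by a NUL byte, then the raw content.
    store = f"{kind} {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits name the directory, the rest name the file.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    return oid, path, zlib.compress(store)


oid, path, packed = loose_object(b"hello world\n")
print(oid)
print(path)
print(len(packed), "bytes on disk")
```

Each committed file version therefore maps to exactly one small file on the server, which is why loose objects are a natural unit for per-object handling.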

So, in this case, which files need to be encrypted?

We believe that reference information such as branches does not contain sensitive user information, so it does not need encryption. Likewise, index files are only used to look up packed data and contain no user information, so they do not need encryption either. Sensitive information is normally stored in smaller text files (such as source code), so, taking performance into account, we decided to:

  • Encrypt only loose objects <10MB (equivalent to a 5-million-character Chinese novel).
  • Leave a submitted single-file packfile unencrypted. (By default, when a single file larger than 500MB is pushed to the Git service, the server stores it as a single-file packfile instead of a loose object. A single file larger than 500MB can be assumed to be a binary resource or program and does not need encryption.)
  • Encrypt all other packfiles, apart from the single-file packfile mentioned above.

(Table columns: size, compression, encryption, loose objects, packfile; rows: < 10MB, 10MB ~ 512MB, > 512MB.)

5. When to encrypt

When the user pushes code, the submitted data is first encrypted with SSL/TLS in transit to the code hosting cloud service. The service then obtains the data key, encrypts the submitted data, and persists it to disk.

6. How to encrypt

AES (Advanced Encryption Standard) is our preferred algorithm; SM4 support is expected to be added in the future.

Encryption mode: CTR mode is used to encrypt data content.

Key length: a 256-bit data key is used. It is generated by KMS, and the platform stores only the ciphertext of the data key.

7. Encryption effect

(Figures: packfile effect; loose object effect.)

8. The impact of encryption on performance

Encryption has a small impact on repository performance. Taking the Git project as an example, performance in the packing stage drops by about 10-20% after encryption. However, network transmission accounts for most of the time during a download (usually more than 90%), so the slowdown caused by encryption is negligible in practice.

IV. Our code hosting products

Codeup code encryption

The Yunxiao encrypted cloud code repository is a product developed in-house by our team. It is currently the first code hosting service in China to support code encryption, and currently the first code hosting service in the world to use a real-time encryption scheme.
Code repositories hosted in Yunxiao Codeup in the cloud can effectively prevent anyone other than the data owner from accessing the user's plaintext data, avoiding data leakage in the cloud. At the same time, the encryption process is completely transparent to users, who can use any standard Git client (including but not limited to Git, JGit, libgit2, etc.) to access repositories on Codeup.

Turn on repository encryption

Users with administrator rights can see the "Repository Encryption" switch on the settings page of the repository they want to encrypt, and click it to turn encryption on.

Turn on encryption when creating a new repository

When creating a new repository, check the option to enable repository encryption.

Turn repository encryption on/off

Users with administrator rights can see the "Repository Encryption" switch on the settings page of the repository they want to encrypt, and click it to turn encryption on or off. When encryption is turned off, decryption of the repository is triggered.

KMS Key Management Service

In the Alibaba Cloud KMS service, you can view the service key automatically created by Codeup. The key cannot be deleted or disabled, but you can temporarily block Codeup's calls to KMS by modifying the tag of the KMS service key. Click ④ (Key details) to see a tag key acs:rdc:git-encryption created by Codeup in the "Tags" section. By temporarily modifying or deleting the tag value, you can restrict Codeup's access to the key and so temporarily freeze the repository.

> Copyright statement: the content of this article was contributed by a real-name registered Alibaba Cloud user, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibilities.

Author: Alibaba Cloud Developer (阿里云开发者)