EISS2021-Zero Trust Security Construction Practice of Office Network

1. Background

Hello everyone, I am very happy to share with you the topic of "Zero Trust and Security Construction of Office Network".

Before I begin, let me briefly introduce our company. FunPlus is a game company focused mainly on overseas markets, so many of you may not have heard of us. Those of you who like to play games, though, may have heard of a team called FPX, which is in fact our company's team.

2. What I will share

Today I mainly want to share three points.

First, why we are doing zero trust security construction. Many speakers have already covered the concept of zero trust and some of its application scenarios, so I will only briefly explain why we want to build it. Second, how we designed our zero trust architecture. Zero trust construction has to be realized in combination with the business, so I will use FunPlus as an example to share our thinking. Third, how we carried it out in practice: what we have done and some of the detailed problems involved.

3. Why we are doing it

First of all, why do we want to build zero trust? Companies that take security seriously usually see demand coming from two sources: external drivers and internal drivers. At FunPlus the drive is internal, and that is what leads us to do security work.

Why do I say that? First, we are a game company, and game companies attach great importance to security, so we have real security requirements. Second, our whole team takes security seriously, because the security of a game can determine its life cycle, and when promoting zero trust the degree of cooperation from the team matters a great deal. Third, after we reported the zero trust construction plan to leadership, they were very supportive of us doing it.

3.1 Security requirements

I just mentioned that we have security requirements, so why do we have such security requirements? Here I will give an example.

3.1.1 Network Architecture

Our company mainly divides the network into two networks, the internal office network and the public-facing network.

The security system built on these two networks treats the internal office network as trusted by default. In other words, once you are connected to the office network, your access to internal services is considered trusted, and you only need to pass some simple authentication to operate. But there are problems here. For example, during the epidemic last year, in 2020, many colleagues worked from home, and to reach the internal office systems they had to connect to the internal network through a VPN. That kind of network architecture has some weaknesses.

A VPN can only vouch for the identity of the user; it cannot guarantee the security of the device. In addition, once the VPN is connected, a lot of traffic goes through it that does not need to, for example WeChat or non-work web pages, which wastes VPN resources. So we asked ourselves how we could avoid relying on the VPN and still access services inside the office network securely. The concept of zero trust fits this need very well.

3.1.2 Zero Trust Concept

Here I have summarized four core ideas of zero trust.

The first point is distrust by default: users, devices, and networks are all untrusted by default. Our previous office network architecture trusted the network and every device on it by default, so this idea makes up for some of our shortcomings.

The second point is dynamic access permissions. On our previous office network, once you were connected and your identity had been verified once, you could carry on with whatever your permissions allowed. Dynamic permissions strengthen identity verification, because verification continues after login. For example, if I decide a user is illegitimate, I can remove that user at any time, and his subsequent requests will be blocked immediately at the front end.

The third point is to reduce the exposure of resources and shrink the attack surface. Previously, to access an internal service you only had to connect to the office network and could then reach the service directly, leaving each service to control its own permissions. Some services are actually quite weak in security, with weak passwords for example, so the attack surface was still relatively wide.

The fourth point is continuous evaluation and security response: using multiple dimensions to judge whether a request is secure enough.
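The second point, checking on every request instead of trusting an earlier login, can be sketched as follows. This is only a minimal illustration: the `SessionStore` class, its method names, and the in-memory set are assumptions, not the system described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Hypothetical in-memory store; a real deployment would use a shared service."""
    active_users: set = field(default_factory=set)

    def login(self, user: str) -> None:
        self.active_users.add(user)

    def revoke(self, user: str) -> None:
        # Removing the user takes effect on the very next request.
        self.active_users.discard(user)


def authorize(store: SessionStore, user: str) -> bool:
    """Check the user on every request instead of trusting an earlier login."""
    return user in store.active_users


store = SessionStore()
store.login("alice")
first = authorize(store, "alice")   # checked while the account is active
store.revoke("alice")
second = authorize(store, "alice")  # checked again after revocation
print(first, second)
```

Because the check runs per request, revoking a user needs no session expiry or reconnect to take effect.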

3.1.3 Construction goals

Combining the current state of the network with these zero trust ideas, we set three construction goals.

The first is to let employees access the company's internal services more securely and conveniently. The second is to make sure the visitor's identity and network environment are secure before access is allowed. The third is to solve the problem of scattered access logs, which makes user behavior impossible to trace. Achieving these three points was our original intention.

3.2 The emphasis on security

As mentioned earlier, as a game company we attach great importance to security. The main reason is that security can directly affect the company's revenue. Here I will give two examples from the game industry.

3.2.1 Source code disclosure scenario

Let me first talk about a source code leak case: what happens if the source code leaks?

Many of you may have heard of "Legend" private servers, or played Legend yourselves. In September 2002, the source code of Legend leaked by way of the Italian server, and it soon spread domestically. In just six months, more than 500 privately built Legend servers appeared, and many players began to move from the official servers to private ones. The income of Legend's operator was hit hard, so it stopped paying agency fees to the developer. This left the developer facing the risk of bankruptcy, and it was even acquired later. From this example we can see that the security of a game can determine its life cycle.

3.2.2 High-risk vulnerability scenarios

There is also the high-risk vulnerability scenario. At the DEF CON 2017 conference in the United States, a hacker told the media that he had been making money from online game vulnerabilities for the past two decades.

In a live demonstration he entered commands in a debugger and added a large amount of gold to his in-game account. Different games require different methods to inflate currency, but what they have in common is that the added gold or items are mainly turned into profit through third-party market transactions. From this example we can see that the security of a game can directly affect the company's revenue. That is why our team attaches great importance to security, and the boss's emphasis on security gives us strong support, so we can build zero trust security with peace of mind.

4. Architecture design

After deciding to build zero trust, we mainly did three things.

The first was to determine the ideal goal. The second, once the goal was set, was to get familiar with the existing network architecture. Zero trust is not a product that is finished once it has been developed; it has to fit the business closely, so you must know the current network structure well. The third was to combine the goal with the current situation to arrive at a plan that can actually be implemented.

4.1 The ideal goal

This diagram shows our ideal target state. As you can see, the hope is that all users access internal services through the security gateway proxy.

Before proxying a request, we verify whether the visitor's identity is legitimate, whether the device is an internal device, and whether the request parameters contain anything illegal. We also look for other abnormal behavior. For example, a user normally visits a service during working hours, and then one day he suddenly visits it at one or two in the morning. In that case we lower his security level, and he needs to pass a second verification to raise his security score.
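The off-hours example can be sketched in a few lines. This is a simplified illustration only: the working-hours window and the function names are assumptions, and a real system would learn each user's habits instead of using a fixed window.

```python
from datetime import datetime

# Assumed working hours (09:00-19:00); the talk does not give exact values.
WORK_START_HOUR = 9
WORK_END_HOUR = 19


def is_unusual_time(request_time: datetime) -> bool:
    """Return True when the request falls outside the assumed working hours."""
    return not (WORK_START_HOUR <= request_time.hour < WORK_END_HOUR)


def needs_second_verification(request_time: datetime, already_verified: bool) -> bool:
    """Ask for a second factor on unusual-time requests that have not passed one yet."""
    return is_unusual_time(request_time) and not already_verified


daytime = datetime(2021, 5, 14, 10, 30)
late_night = datetime(2021, 5, 14, 1, 30)
print(needs_second_verification(daytime, already_verified=False))
print(needs_second_verification(late_night, already_verified=False))
print(needs_second_verification(late_night, already_verified=True))
```

Only the late-night request without a prior second factor triggers the extra verification; once the user has passed it, the same request goes through.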

To reach these goals, we need to achieve the items on the left of the diagram, such as unified resource management and unified control of external access. For example, we need to manage users in a unified way, control devices in a unified way, and have users reach internal services through the security gateway, so those services need firewall restrictions that only allow access from the security gateway. The other two items are dynamically adjusting the access control policy and reducing the use of VPN.

4.2 Getting familiar with the existing architecture

After determining the ideal goal, you have to get familiar with the existing network architecture. Our current network relies mainly on two mechanisms, network trust and identity verification, so I focused on understanding these two.

4.2.1 Network access

Let me start with PC devices. Windows, Mac, and Linux machines have three ways to connect to the internal network. The first is VPN, used when working from home; connecting to the VPN requires logging in with an account. The second is Wi-Fi in the office; after connecting to Wi-Fi you have to log in with an account and password to confirm your identity. The last is wired access to the office network. Mobile devices such as phones connect in basically the same way as computers, while dumb devices such as printers and cameras mainly connect to the internal network by cable.

Here are some problems with the current network.

The first is that after a user connects to our internal network, access to any particular service depends entirely on that service's own permission control; there is no unified identity deciding whether access is allowed. From a security perspective, some systems also have weak passwords or weak accounts.

The second is that VPN stability cannot be guaranteed. For example, some employees need to access office network services from a high-speed train. They must first connect to the internal network through the VPN, which is a long-lived connection and may not be stable. Meanwhile, some traffic does not need the VPN at all, and sending everything through it obviously increases network consumption.

The third is that users, devices, and applications all sit in the same network, which is unreasonable. Different departments should be in separate isolated network zones. For example, the development department should have a development network and the administrative department an administrative network, with isolation between them.

4.2.2 Identity authentication

Identity authentication is mainly account verification, and accounts come in two kinds.

One is the company's unified account, and the other is self-built accounts. Regular employees all have a unified account, but when outsourced staff need access to a particular business system, a self-built account is opened for them on that system. Some systems are also built on open source software and are troublesome to modify, so they still rely on self-built accounts.

4.3 The adjusted plan

After getting familiar with the existing architecture, we combined it with the ideal goal to design an architecture that can actually be implemented, shown in this diagram. As you can see, every request from a terminal to an office network application goes through a security gateway.

This security gateway is mainly a proxy. Before proxying, it retrieves the evaluation data from the security policy center and uses it to judge whether the request is legitimate. If it is not, the request is discarded. The judgment itself is made mainly in the security policy center, whose decisions are based mainly on data from the device management center and the identity authentication center.

The device management center mainly stores the security baseline data of terminals and also issues certificates to devices. Terminals report security data, such as whether the running processes are safe, whether the network is safe, and whether screen lock is enabled. Certificate management is mainly used to verify whether a device is legitimate company equipment.

For example, we do not allow someone to use a private device to access the company's office network, so the device management center gives such a device a low score. The other input is the identity authentication center, which mainly verifies whether a user's identity information is legitimate. This information is combined into a security score. When the score is relatively low, we may raise it through face recognition or another form of multi-factor authentication, and finally decide whether the request may access the office network application.
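To make the scoring idea concrete, here is a minimal sketch of how device and identity signals might be combined into a decision. The weights, thresholds, field names, and the three outcomes are all assumptions chosen for illustration; the talk does not describe the actual formula. In this sketch a private device stays below the allow threshold even after multi-factor authentication, which is one possible way to express "we do not allow it".

```python
from dataclasses import dataclass

# Assumed thresholds and weights, chosen only for illustration.
ALLOW_THRESHOLD = 80
MFA_THRESHOLD = 50


@dataclass
class Signals:
    identity_verified: bool    # from the identity authentication center
    company_device: bool       # device holds a company-issued certificate
    baseline_ok: bool          # device passes the security baseline
    mfa_passed: bool = False   # face recognition or another second factor


def security_score(s: Signals) -> int:
    """Combine the signals into a single score between 0 and 100."""
    if not s.identity_verified:
        return 0
    score = 40
    score += 30 if s.company_device else 0
    score += 10 if s.baseline_ok else 0
    score += 20 if s.mfa_passed else 0
    return score


def decide(s: Signals) -> str:
    """Return 'allow', 'require_mfa', or 'deny' for a request."""
    score = security_score(s)
    if score >= ALLOW_THRESHOLD:
        return "allow"
    if score >= MFA_THRESHOLD and not s.mfa_passed:
        return "require_mfa"
    return "deny"


managed = Signals(identity_verified=True, company_device=True, baseline_ok=True)
private = Signals(identity_verified=True, company_device=False, baseline_ok=True)
print(decide(managed), decide(private))
```

A company device that fails the baseline lands in the middle band and can recover by passing a second factor, which matches the "raise the score through multi-factor authentication" step described above.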

4.4 Construction module

Just now we briefly went over the roles of the five modules using the architecture diagram. Now let's take a closer look at each of them.

4.4.1 Identity Authentication Center

The main role of the identity authentication center is to provide identity authentication. Besides regular authentication, it should offer enhanced secondary authentication methods so that identity can be trusted across multiple dimensions. Here we mainly use third-party identity services: Alibaba Cloud's IDaaS service and Google Authenticator.

4.4.2 Security Policy Center

As just mentioned, the security policy center mainly decides whether a request may be allowed. The decision is based on security policy generated dynamically from identity authentication and device risk, and the policy needs to be updated in real time.

In addition, some applications require a relatively high security level. The financial system, for example, may require an additional multi-factor authentication to raise the security score.

4.4.3 Security Gateway

The main function of the security gateway is to proxy traffic from the external network to the inside. Before forwarding, the gateway first checks whether the request is logged in. If not, the request is redirected to the IDaaS system so that the user logs in first.

On the next request, the gateway queries the security policy center for the security score. If the score is relatively low, the request is blocked. If the request is legitimate, the gateway can also hook it; for example, when a wiki system is accessed, a watermark can be added at this point.
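The gateway's decision flow can be sketched as a single function. This is a language-neutral illustration written in Python, not the actual gateway code: the login URL, the score threshold, the `Request` and `Response` types, and the HTML-comment watermark are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical login URL and score threshold; neither comes from the talk.
LOGIN_URL = "https://idaas.example.com/login"
MIN_SCORE = 60


@dataclass
class Request:
    path: str
    user: Optional[str]  # None when the request carries no valid login


@dataclass
class Response:
    status: int
    body: str = ""
    location: str = ""


def handle(req: Request,
           get_score: Callable[[str], int],
           upstream: Callable[[Request], str]) -> Response:
    # 1. Not logged in: send the user to the identity service first.
    if req.user is None:
        return Response(status=302, location=LOGIN_URL)
    # 2. Logged in: ask the policy center for the current score.
    if get_score(req.user) < MIN_SCORE:
        return Response(status=403, body="blocked by security policy")
    # 3. Legitimate: forward, then hook the response (here, a text watermark).
    body = upstream(req)
    return Response(status=200, body=f"{body}\n<!-- viewed by {req.user} -->")


scores = {"alice": 90, "bob": 30}


def get_score(user: str) -> int:
    return scores.get(user, 0)


def upstream(req: Request) -> str:
    return f"content of {req.path}"


for user in (None, "bob", "alice"):
    print(user, handle(Request("/wiki", user), get_score, upstream).status)
```

Passing the score lookup and the upstream call in as functions keeps the flow testable without a running policy center or backend.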

When traffic passes through the security gateway, logs can be stored and sent to the log analysis platform for statistical analysis, so that behavior can be traced and audited later.
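As a rough illustration of what a traceable access record might look like, the sketch below writes one JSON line per request and then runs a simple count over the lines. The field names and the "deny" label are assumptions; the talk does not describe the log format.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def access_record(user: str, app: str, path: str, decision: str, when: datetime) -> str:
    """Serialize one gateway access as a JSON line (field names are assumptions)."""
    return json.dumps({
        "time": when.isoformat(),
        "user": user,
        "app": app,
        "path": path,
        "decision": decision,
    }, sort_keys=True)


def denied_per_user(lines):
    """Count denied requests per user, a simple example of later analysis."""
    records = (json.loads(line) for line in lines)
    return Counter(r["user"] for r in records if r["decision"] == "deny")


now = datetime(2021, 5, 14, 10, 0, tzinfo=timezone.utc)
log_lines = [
    access_record("alice", "wiki", "/home", "allow", now),
    access_record("bob", "finance", "/reports", "deny", now),
    access_record("bob", "finance", "/export", "deny", now),
]
print(denied_per_user(log_lines))
```

Because every request goes through the gateway, one consistent record per request is enough to answer "who accessed what, and when" from a single place.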

4.4.4 Device agent

The device agent mainly collects security information from the terminal, such as system information, network connection information, and whether anti-virus software is installed, and reports it to the server of our device management center.
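A minimal sketch of such an agent's report, using only the Python standard library, might look like this. The field names are assumptions, and the anti-virus and screen-lock values are passed in as parameters because detecting them is platform specific; sending the report to the server is left out.

```python
import json
import platform
import socket


def collect_device_info(antivirus_installed: bool, screen_lock_enabled: bool) -> dict:
    """Gather basic terminal information using only the standard library.

    Anti-virus and screen-lock checks are platform specific, so they are
    passed in here instead of being detected.
    """
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_version": platform.release(),
        "antivirus_installed": antivirus_installed,
        "screen_lock_enabled": screen_lock_enabled,
    }


def build_report(info: dict) -> bytes:
    """Encode the report as JSON; sending it to the server is left out."""
    return json.dumps(info, sort_keys=True).encode("utf-8")


report = build_report(
    collect_device_info(antivirus_installed=True, screen_lock_enabled=True)
)
print(report.decode("utf-8"))
```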

4.4.5 Device Management Center

The device management center mainly stores the security information reported by devices and authenticates devices to determine, among other things, whether a device is company-internal equipment. It provides data support to the security policy center.

5. Construction experience

Zero trust construction cannot be completed in a short time. Google's zero trust construction took nearly ten years, and the previous speaker, Yi Yi, also spent four years building zero trust at Perfect World. That said, zero trust now has a fairly established basic structure, so we can build much faster, but it still takes a full cycle.

5.1 Phased implementation

Broken down, zero trust construction involves roughly six things that need to be done.

At FunPlus we are implementing zero trust in stages, four in total, with Q1 as the first stage. The first stage is to have a security gateway that supports identity verification, which is the most basic function, and then to connect some services to verify that the model is feasible. A lot of requirements are collected in this process; after improving the gateway, we continue to push business onboarding. Our first-stage work has already landed. We are currently in the second stage, which is mainly full onboarding of services plus the evaluation and purchase of third-party terminal software. The services have been connected, and the terminal software is still being evaluated.

5.1.1 The first stage

The first stage is mainly to build a security gateway that meets the most basic requirements, such as traffic forwarding, login, and identity authentication, and to connect some office applications to verify whether this model is feasible.

5.1.2 The second stage

The second stage is mainly to connect all applications. At the same time, we need to evaluate terminal products, which must support baseline detection, certificate issuance, and mutual TLS authentication.

5.1.3 The third stage

The third and fourth stages are still only planned and have not been implemented, so I will not go into them here.

5.1.4 Long-term implementation

Some things can be done at any stage and have no dependencies.

The first is integrating services and supporting finer-grained permissions to control access. The second is enriching the capabilities of the proxy gateway, including traffic interception, behavior analysis and statistics, and content injection such as watermarking. The third is enriching user behavior audit capabilities, combining business access behavior with terminal security logs to monitor security risks comprehensively.

5.2 Detailed construction

Here I will talk about some of our construction details, mainly by recommending a few open source tools.

5.2.1 OpenResty

Earlier we talked about the security gateway, which is mainly used for forwarding. Many of you will think of Nginx here, and I use Nginx too, but through OpenResty, a distribution that packages Nginx and makes it more convenient to execute Lua scripts.

5.2.2 NginxWebUi

If you use Nginx for forwarding, you will certainly be dealing with Nginx configuration files. Editing them by hand in vim inevitably leads to mistakes, so it is best to have an interface where you can make changes and generate the configuration files.

Here I recommend an open source project called NginxWebUi, which lets you complete a reverse proxy configuration in a graphical interface.

5.2.3 Configuration Distribution

We have more than 20 nodes around the world, each with a server. If we put the proxy gateway on only one node, proxying would be very slow, so a single gateway node cannot serve all of them. And once there are multiple gateway nodes, we need a scheme for synchronizing configuration files.

Our solution is as follows. NginxWebUi first generates the configuration. A program watches the configuration folder for changes and, when something changes, commits the configuration to a GitLab repository. At the same time, each node is notified and given the version number. The node pulls the latest configuration from the GitLab server and checks whether the configuration file reports errors. If there are none, Nginx is restarted and the version the node is using is reported back to the center.
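The node-side half of this flow can be sketched as below. This is an assumption-laden illustration, not the program described in the talk: the `apply_config` name, the configuration directory, and the rollback step are hypothetical, and the sketch uses `nginx -s reload` where the talk says Nginx is restarted. The command runner is injected so the logic can be exercised without git or Nginx installed.

```python
from typing import Callable, Sequence

# A runner executes a command and returns its exit code. It is injected so
# the flow can be exercised without git or nginx installed.
Runner = Callable[[Sequence[str]], int]


def apply_config(version: str, current: str, run: Runner,
                 config_dir: str = "/etc/nginx/conf.d") -> str:
    """Pull the notified version, validate it, and reload on success.

    Returns the version the node is running afterwards, which is the value
    the node would report back to the center.
    """
    if run(["git", "-C", config_dir, "fetch", "origin"]) != 0:
        return current
    if run(["git", "-C", config_dir, "checkout", version]) != 0:
        return current
    # `nginx -t` tests the configuration without applying it.
    if run(["nginx", "-t"]) != 0:
        # Assumed rollback step so a bad configuration is never loaded.
        run(["git", "-C", config_dir, "checkout", current])
        return current
    if run(["nginx", "-s", "reload"]) != 0:
        return current
    return version


def fake_runner(cmd: Sequence[str]) -> int:
    """Stand-in runner that pretends every command succeeds."""
    print("would run:", " ".join(cmd))
    return 0


print(apply_config("v2", "v1", fake_runner))
```

On a real node the runner could be `lambda cmd: subprocess.run(cmd).returncode`. Validating before reloading means a configuration error on one node leaves that node on its previous version, and the reported version tells the center which nodes did not move forward.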

Author: Tang Qingsong

Date: 2021-5-14