Abstract: As concepts such as multi-party computation (MPC) and privacy-preserving computation gain traction, many government agencies and financial companies have begun to consider joining multi-party computation scenarios to get more value from their data.

This article is shared from the Huawei Cloud Community post "Using PSI to Solve the Data Collision Problem", by breakDraw.

Federated Computing Scenario

With the growing adoption of MPC and privacy-preserving computation, many government agencies and financial companies have begun to consider joining multi-party computation scenarios to extend the value of their data.

Take the following scenario as an example. A bank wants to combine data from the water and electricity bureau with data on its own depositors to compute a comprehensive credit rating for each company.

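The bank might then want to run SQL like the following to obtain the credit score. The Chinese column names are, in order: subsidy amount (资助金额), interest subsidy amount (贴息金额), contract subject amount (标的金额), water bill payment (水费缴纳金额), gas bill payment (汽费缴纳金额), and electricity bill payment (电费缴纳金额).

select 0.5 * c.资助金额 * 0.3
     + 0.4 * a.贴息金额 * 0.3
     + 0.2 * a.标的金额 * 0.3
     + (0.05 * b.水费缴纳金额 + 0.05 * b.汽费缴纳金额 + 0.05 * b.电费缴纳金额) * 0.1
from partyA.tax a join partyB.amount b
on a.id = b.id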
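
To make the arithmetic concrete, here is a minimal Python sketch of the same weighted sum applied after a plain (non-private) inner join on id. The English field names, the sample records, and the choice to keep the subsidy amount in party A's record are assumptions made for illustration; only the weights come from the query.

```python
def credit_score(a: dict, b: dict) -> float:
    """Weighted sum with the same coefficients as the SQL query."""
    return (
        0.5 * a["subsidy"] * 0.3
        + 0.4 * a["interest_subsidy"] * 0.3
        + 0.2 * a["contract_amount"] * 0.3
        + (0.05 * b["water"] + 0.05 * b["gas"] + 0.05 * b["electricity"]) * 0.1
    )


def score_by_join(party_a: dict, party_b: dict) -> dict:
    """Inner join on id, then score every matched record."""
    return {
        key: credit_score(row, party_b[key])
        for key, row in party_a.items()
        if key in party_b
    }


# Hypothetical sample data: party A holds one company, party B holds two.
party_a = {
    "c1": {"subsidy": 100.0, "interest_subsidy": 100.0, "contract_amount": 100.0},
}
party_b = {
    "c1": {"water": 100.0, "gas": 100.0, "electricity": 100.0},
    "c2": {"water": 50.0, "gas": 60.0, "electricity": 70.0},
}

scores = score_by_join(party_a, party_b)
```

Only the ids present on both sides receive a score, which is why the join step matters for the rest of the discussion.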


Problem

In the federated computing scenario above, a join is required to associate the data of the water and electricity bureau with the data of the bank. In the traditional scheme, the collision (matching records on id) is performed inside a TEE (trusted execution environment), which produces the associated data and then runs the calculation.

However, the water and electricity bureau has a very large number of users, while the bank has relatively few depositors. The number of records that actually match is therefore bounded by the number of bank depositors.

If all of the water and electricity bureau's data is uploaded to the TEE, the cost of transferring it between software and hardware is very high, and the process also exposes the sensitive data in records that never match.

In addition, the identities of the bank's depositors may themselves be highly privacy-sensitive.

Solution

PSI (Private Set Intersection) can effectively solve both of the problems above.

PSI usually has the following three characteristics:

  • Semi-trusted scenario: neither party is willing to expose all of its data; each only wants to find the intersection of the two data sets
  • Data minimization: data outside the intersection must not be leaked to either party
  • Secure two-party computation: the two parties jointly run a secure computation protocol to keep the data safe
The specific flow is shown in the following diagram:
[Figure: flow diagram of the PSI process]

This process lets the ids of party A and party B be collided entirely in ciphertext, yielding the set of associated ids on which the subsequent output is based.
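
One well-known way to collide ids in ciphertext is a Diffie-Hellman-style PSI, sketched below as a toy illustration. The article does not say which PSI protocol is actually used, so this is only an assumed example of the general idea: each side blinds hashed ids with its own secret exponent, and ids match only after both exponents have been applied. The function names and the small prime modulus are assumptions; the parameters are not secure and both parties are simulated in one process.

```python
import hashlib
import secrets

# Toy modulus: 2**127 - 1 is prime. A real deployment would use a
# standardized group (for example an elliptic curve), not this value.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an id to a nonzero element modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def blind(values, key):
    """Raise every value to a secret exponent modulo P."""
    return [pow(v, key, P) for v in values]


def psi_intersection(ids_a, ids_b):
    """Return the ids of A that also appear in B (both parties simulated)."""
    key_a = secrets.randbelow(P - 3) + 2  # A's secret exponent
    key_b = secrets.randbelow(P - 3) + 2  # B's secret exponent

    # A -> B: A's ids, hashed and blinded with key_a.
    a_once = blind([hash_to_group(x) for x in ids_a], key_a)

    # B -> A: B's own ids blinded with key_b (a real run would shuffle them),
    # plus A's values blinded a second time, returned in the same order.
    b_once = blind([hash_to_group(y) for y in ids_b], key_b)
    a_twice = blind(a_once, key_b)

    # A applies key_a to B's values. Equal ids now give equal values,
    # because (H^a)^b == (H^b)^a modulo P.
    b_twice = set(blind(b_once, key_a))
    return [x for x, t in zip(ids_a, a_twice) if t in b_twice]


shared = psi_intersection(["u1", "u2", "u3"], ["u2", "u3", "u9"])
```

In this sketch neither side ever sees the other's raw ids, only blinded values, and party A learns which of its own ids are shared.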

Application

The federated computing service in TICS now supports PSI.

On the alliance management page, the administrator turns on "high-level privacy protection". Once it is on, if a SQL statement meets the PSI-JOIN conditions, TICS uses PSI to build the execution plan, performs the join collision, and then continues with the subsequent calculation.

Create a job that runs the corresponding SQL join.

Run the job. The DAG diagram in TICS shows the entire PSI process, and the output is consistent with the result of doing the join directly.
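
The consistency claim can be illustrated with a small Python sketch: intersect the ids first, keep only the matching rows on each side, and then join. This is not how TICS builds its execution plan internally; the function names and sample data are hypothetical, and a plain set intersection stands in for the ciphertext PSI step.

```python
def direct_join(left: dict, right: dict) -> dict:
    """Plain inner join on id."""
    return {k: (left[k], right[k]) for k in left if k in right}


def plain_intersection(ids_a, ids_b):
    """Stand-in for the PSI step; a real run would not reveal raw ids."""
    return set(ids_a) & set(ids_b)


def psi_join(left: dict, right: dict, intersect=plain_intersection):
    """Intersect ids first, then join only the rows that matched."""
    shared = set(intersect(list(left), list(right)))
    left_rows = {k: v for k, v in left.items() if k in shared}
    right_rows = {k: v for k, v in right.items() if k in shared}
    # len(right_rows) is the number of right-side rows that still need
    # to be handed to the computation after filtering.
    return direct_join(left_rows, right_rows), len(right_rows)


# Hypothetical data: a small bank table and a larger utility table.
bank = {"c1": 10, "c2": 20}
utility = {"c0": 1, "c1": 2, "c2": 3, "c3": 4}

joined, rows_sent = psi_join(bank, utility)
```

Here the filtered join returns the same rows as the direct join, while only the matching utility rows, rather than the whole table, take part in the computation.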
