
1. Description

This article uses the test samples that ship with FATE to train a hetero (vertical) logistic regression model and visualize the results on FATE Board.

The content of this article builds on the environment deployed in "Privacy Computing FATE - Concept and Single Machine Deployment Guide".

2. Enter the container

Execute the following command to enter the Fate container:

 docker exec -it $(docker ps -aqf "name=standalone_fate") bash


You can see that there is an examples directory, which contains test samples and test data for the various algorithms.

After entering examples, create a my_test directory and switch into it:

 cd examples

mkdir my_test

cd my_test
Note: all subsequent operations are performed in this directory by default.

3. Upload data

The first step is to prepare the training data. We can upload data to FATE as CSV files.

The built-in test data is in the container /data/projects/fate/examples/data directory:


As you can see, each algorithm comes with data for both the guest side and the host side.

3.1. Prepare the guest configuration

In the my_test directory, execute the following command:

 vi upload_hetero_guest.json

The content is as follows:

 {
  "file": "/data/projects/fate/examples/data/breast_hetero_guest.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "experiment",
  "table_name": "breast_hetero_guest"
}
  • file: path to the data file
  • head: whether the data file has a header row (1 = yes)
  • partition: number of partitions used to store the data
  • work_mode: work mode, 0 for standalone, 1 for cluster
  • namespace: namespace of the table
  • table_name: name of the data table
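Since the guest and host configs differ only in the file path and table name, you can also generate both files with a small script instead of editing them by hand. The following is just a convenience sketch, not part of FATE:

```python
import json

# Build the two upload configs programmatically; only the CSV path and
# table name differ between the guest and host versions. The paths and
# names below are the ones used in this article.
def make_upload_conf(role):
    return {
        "file": f"/data/projects/fate/examples/data/breast_hetero_{role}.csv",
        "head": 1,          # the sample CSVs ship with a header row
        "partition": 10,
        "work_mode": 0,     # 0 = standalone deployment
        "namespace": "experiment",
        "table_name": f"breast_hetero_{role}",
    }

for role in ("guest", "host"):
    with open(f"upload_hetero_{role}.json", "w") as f:
        json.dump(make_upload_conf(role), f, indent=2)
```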

3.2. Prepare host side configuration

In the my_test directory, execute the following command:

 vi upload_hetero_host.json

The content is as follows:

 {
  "file": "/data/projects/fate/examples/data/breast_hetero_host.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "experiment",
  "table_name": "breast_hetero_host"
}
Note that the file path and table name differ from the guest's.

3.3. Execute upload

Execute the following two commands to upload the data of the guest and host respectively:

 flow data upload -c upload_hetero_guest.json

flow data upload -c upload_hetero_host.json
Specify the configuration file with -c.

On success, the command returns information about the upload task:

 {
    "data": {
        "board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070640371260700&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/job_dsl.json",
        "job_id": "202205070640371260700",
        "logs_directory": "/data/projects/fate/fateflow/logs/202205070640371260700",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202205070640371260700"
        },
        "namespace": "experiment",
        "pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/job_runtime_conf.json",
        "table_name": "breast_hetero_guest",
        "train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/train_runtime_conf.json"
    },
    "jobId": "202205070640371260700",
    "retcode": 0,
    "retmsg": "success"
}
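When scripting these uploads, it helps to check the response before moving on. The helper below is illustrative only (not part of the flow CLI); it fails fast on a non-zero retcode and extracts the fields used later in this article:

```python
import json

# Pull the commonly needed fields out of a `flow` command's JSON response,
# raising if the command reported failure.
def check_response(raw):
    resp = json.loads(raw)
    if resp.get("retcode") != 0:
        raise RuntimeError(f"flow command failed: {resp.get('retmsg')}")
    return resp["jobId"], resp["data"].get("board_url")

# A trimmed-down version of the response shown above:
sample = (
    '{"jobId": "202205070640371260700", "retcode": 0, "retmsg": "success",'
    ' "data": {"board_url": "http://127.0.0.1:8080/index.html#/dashboard'
    '?job_id=202205070640371260700&role=local&party_id=0"}}'
)
job_id, board_url = check_response(sample)
```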

3.4. Checking the data

Execute the following command to view information about the table:

 flow table info -t breast_hetero_guest -n experiment

After execution, the following is returned:

 {
    "data": {
        "address": {
            "home": null,
            "name": "breast_hetero_guest",
            "namespace": "experiment",
            "storage_type": "LMDB"
        },
        "count": 569,
        "exist": 1,
        "namespace": "experiment",
        "partition": 10,
        "schema": {
            "header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
            "sid": "id"
        },
        "table_name": "breast_hetero_guest"
    },
    "retcode": 0,
    "retmsg": "success"
}
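A quick sanity check on this output catches upload problems early: the table should exist, the row count should match the source CSV (569 rows for this dataset), and the schema header should list the expected columns. A sketch of such a check:

```python
# Validate the `data` section of a `flow table info` response.
# Illustrative helper only, not part of FATE.
def check_table(info, expected_count):
    data = info["data"]
    assert data["exist"] == 1, "table not found"
    assert data["count"] == expected_count, "unexpected row count"
    # The header is a comma-separated string of column names.
    return data["schema"]["header"].split(",")

# Trimmed-down version of the response shown above:
info = {
    "data": {
        "exist": 1,
        "count": 569,
        "schema": {"header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9", "sid": "id"},
    },
}
# The guest table carries the label column `y` plus 10 feature columns.
columns = check_table(info, 569)
```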

4. Model training

Next, we start the modeling task. We need to prepare two configuration files: the DSL file, which defines the pipeline's components, and the conf file, which sets the runtime parameters.

4.1. Prepare the dsl file

Execute the following command:

 cp /data/projects/fate/examples/dsl/v2/hetero_logistic_regression/hetero_lr_normal_dsl.json /data/projects/fate/examples/my_test/
This copies the hetero (vertical) logistic regression example that ships with FATE directly into our my_test directory.

FATE packages its algorithms as components. The DSL file mainly configures which components make up the modeling pipeline:


For example, the first component, Reader, reads the training data we just uploaded; the next, DataTransform, converts that data into instance objects. Almost every modeling pipeline starts with these two components.

In general, configuring a component requires the following:

 - module: the algorithm component; FATE currently provides 37 of them
- input:
    - data: data input
    - model: model input
- output:
    - data: data output
    - model: model output

module defines the type of the component. FATE currently offers 37 components, and you can also develop new algorithm components yourself.

input and output set the component's inputs and outputs respectively; both support two kinds at once, data and model.

For detailed configuration instructions, please refer to the official documentation: https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/dsl_conf/dsl_conf_v2_setting_guide.zh.md
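To make the wiring concrete, here is a stripped-down sketch of a DSL v2 pipeline (Reader → DataTransform → HeteroLR), expressed as a Python dict. The component names follow the convention in the copied example; consult the official guide linked above for the full schema:

```python
# Each input reference has the form "<upstream_component>.<output_name>",
# which is how downstream components are wired to upstream outputs.
dsl = {
    "components": {
        "reader_0": {
            "module": "Reader",
            "output": {"data": ["data"]},
        },
        "data_transform_0": {
            "module": "DataTransform",
            "input": {"data": {"data": ["reader_0.data"]}},
            "output": {"data": ["data"], "model": ["model"]},
        },
        "hetero_lr_0": {
            "module": "HeteroLR",
            "input": {"data": {"train_data": ["data_transform_0.data"]}},
            "output": {"data": ["data"], "model": ["model"]},
        },
    }
}

# Sanity check: every data input must refer to a component defined in the DSL.
names = set(dsl["components"])
for comp in dsl["components"].values():
    for refs in comp.get("input", {}).get("data", {}).values():
        for ref in refs:
            assert ref.split(".")[0] in names, f"unknown upstream: {ref}"
```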

4.2. Prepare the conf file

Execute the following command:

 cp /data/projects/fate/examples/dsl/v2/hetero_logistic_regression/hetero_lr_normal_conf.json /data/projects/fate/examples/my_test/
This copies the hetero (vertical) logistic regression example that ships with FATE directly into our my_test directory.


Under the component_parameters element, the conf file configures the table name that the Reader component reads.

The conf file mainly configures the following:

  • DSL version
  • The roles and party_id of each party
  • Component operating parameters
For the component list and detailed configuration parameters of each component, please refer to the official documentation: https://fate.readthedocs.io/en/latest/zh/federatedml_component/
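The three items above map onto the conf file's top-level structure. Below is a trimmed sketch of that structure as a Python dict, with the party IDs and table names used in this article; see the component list linked above for each component's full parameter set:

```python
# Skeleton of a DSL v2 conf file: version, initiator, the parties playing
# each role, and per-party component parameters. The "0" key is the index
# of the party within its role list.
conf = {
    "dsl_version": 2,
    "initiator": {"role": "guest", "party_id": 9999},
    "role": {"guest": [9999], "host": [10000], "arbiter": [10000]},
    "component_parameters": {
        "role": {
            "guest": {"0": {"reader_0": {"table": {
                "name": "breast_hetero_guest", "namespace": "experiment"}}}},
            "host": {"0": {"reader_0": {"table": {
                "name": "breast_hetero_host", "namespace": "experiment"}}}},
        }
    },
}
```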

4.3. Submitting tasks

Execute the following command:

 flow job submit -d hetero_lr_normal_dsl.json -c hetero_lr_normal_conf.json
Specify the dsl and conf configuration files with -d and -c respectively.

On success, the command returns information about the training job:

 {
    "data": {
        "board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070226373055640&role=guest&party_id=9999",
        "code": 0,
        "dsl_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/job_dsl.json",
        "job_id": "202205070226373055640",
        "logs_directory": "/data/projects/fate/fateflow/logs/202205070226373055640",
        "message": "success",
        "model_info": {
            "model_id": "arbiter-10000#guest-9999#host-10000#model",
            "model_version": "202205070226373055640"
        },
        "pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/guest/9999/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/train_runtime_conf.json"
    },
    "jobId": "202205070226373055640",
    "retcode": 0,
    "retmsg": "success"
}

There are several properties to pay attention to:

  • board_url: the FATE Board address where the task status can be viewed.
  • job_id: the unique identifier of the job; you can look up the job's details on FATE Board with this ID.
  • logs_directory: the log path, where you can inspect the job's various logs.
  • model_info: contains model_id and model_version, which are needed for prediction tasks. Before predicting, you must specify which model to use, and these two fields uniquely identify it.
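The fields you typically carry forward are the Board URL (for monitoring) and the model identifier pair (for later prediction). Two small helpers, illustrative only and not part of the flow CLI, make that explicit:

```python
# Extract what the next steps need from a submit response.
def board_link(resp):
    return resp["data"]["board_url"]

def model_identifier(resp):
    info = resp["data"]["model_info"]
    return info["model_id"], info["model_version"]

# Trimmed-down version of the response shown above:
resp = {
    "data": {
        "board_url": "http://127.0.0.1:8080/index.html#/dashboard"
                     "?job_id=202205070226373055640&role=guest&party_id=9999",
        "model_info": {
            "model_id": "arbiter-10000#guest-9999#host-10000#model",
            "model_version": "202205070226373055640",
        },
    },
}
model_id, model_version = model_identifier(resp)
```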

5. Visualization

5.1. Task overview

Through the address of board_url in the returned information above, you can access the task overview page in the browser:

http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070226373055640&role=guest&party_id=9999

Note that because FATE runs inside a container, you may need to replace the IP address with your host's actual address.

The login username and password are both admin.


The Dataset info panel on the left shows each participant's information; the middle shows the job's running status, with a progress bar and elapsed time; the right shows the DAG of the pipeline's components; and the bottom shows the job log.

5.2. Component output

Click the view this job button in the middle to enter the details of the task:


Each component in the DAG diagram is clickable. Select the hetero_lr_0 component and click the view the outputs button in the lower right corner to open the logistic regression component's output page:


There are three tabs in the upper left corner:

  • model output: the algorithm component's training result.
  • data output: the component's processed data, which feeds downstream components.
  • log: the component's run log.



zlt2000

Years of front-line experience developing and designing distributed internet systems; focuses on sharing Java, SpringBoot, SpringCloud, distributed systems/microservices, middleware, and related topics.