
1. Description

This article uses the test samples that ship with FATE to train a hetero (vertical) logistic regression model and visualize the results on FATE Board.

The content of this article builds on the environment deployed in "Privacy Computing FATE - Concept and Single Machine Deployment Guide".

2. Enter the container

Execute the following command to enter the Fate container:

 docker exec -it $(docker ps -aqf "name=standalone_fate") bash


You can see that there is an examples directory, which contains test samples and test data for the various algorithms.

After entering examples, create a my_test directory and switch into it:

 cd examples

mkdir my_test

cd my_test
Note: all subsequent operations are performed in this directory by default.

3. Upload data

The first step is to prepare the training data. We can upload data to FATE as CSV files.

The built-in test data is in the container /data/projects/fate/examples/data directory:


As you can see, each algorithm comes with data for both the guest side and the host side.

3.1. Prepare the guest configuration

In the my_test directory, execute the following command:

 vi upload_hetero_guest.json

The content is as follows:

 {
  "file": "/data/projects/fate/examples/data/breast_hetero_guest.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "experiment",
  "table_name": "breast_hetero_guest"
}
  • file: path to the data file
  • head: whether the data file has a header row (1 = yes)
  • partition: number of partitions used to store the data
  • work_mode: work mode, 0 for standalone, 1 for cluster
  • namespace: namespace of the table
  • table_name: name of the data table
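Since the guest and host configs differ only in the file path and table name, you can also generate both files with a small script instead of editing them by hand. The following is just a convenience sketch, not part of FATE:

```python
import json

# Build the two upload configs programmatically; only the CSV path and
# table name differ between the guest and host versions. The paths and
# names below are the ones used in this article.
def make_upload_conf(role):
    return {
        "file": f"/data/projects/fate/examples/data/breast_hetero_{role}.csv",
        "head": 1,          # the sample CSVs ship with a header row
        "partition": 10,
        "work_mode": 0,     # 0 = standalone deployment
        "namespace": "experiment",
        "table_name": f"breast_hetero_{role}",
    }

for role in ("guest", "host"):
    with open(f"upload_hetero_{role}.json", "w") as f:
        json.dump(make_upload_conf(role), f, indent=2)
```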

3.2. Prepare host side configuration

In the my_test directory, execute the following command:

 vi upload_hetero_host.json

The content is as follows:

 {
  "file": "/data/projects/fate/examples/data/breast_hetero_host.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "experiment",
  "table_name": "breast_hetero_host"
}
Note that the file path and table name differ from the guest's.

3.3. Execute upload

Execute the following two commands to upload the data of the guest and host respectively:

 flow data upload -c upload_hetero_guest.json

flow data upload -c upload_hetero_host.json
Specify the configuration file with -c.

On success, the command returns information about the upload task:

 {
    "data": {
        "board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070640371260700&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/job_dsl.json",
        "job_id": "202205070640371260700",
        "logs_directory": "/data/projects/fate/fateflow/logs/202205070640371260700",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202205070640371260700"
        },
        "namespace": "experiment",
        "pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/job_runtime_conf.json",
        "table_name": "breast_hetero_guest",
        "train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/train_runtime_conf.json"
    },
    "jobId": "202205070640371260700",
    "retcode": 0,
    "retmsg": "success"
}
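When scripting these uploads, it helps to check the response before moving on. The helper below is illustrative only (not part of the flow CLI); it fails fast on a non-zero retcode and extracts the fields used later in this article:

```python
import json

# Pull the commonly needed fields out of a `flow` command's JSON response,
# raising if the command reported failure.
def check_response(raw):
    resp = json.loads(raw)
    if resp.get("retcode") != 0:
        raise RuntimeError(f"flow command failed: {resp.get('retmsg')}")
    return resp["jobId"], resp["data"].get("board_url")

# A trimmed-down version of the response shown above:
sample = (
    '{"jobId": "202205070640371260700", "retcode": 0, "retmsg": "success",'
    ' "data": {"board_url": "http://127.0.0.1:8080/index.html#/dashboard'
    '?job_id=202205070640371260700&role=local&party_id=0"}}'
)
job_id, board_url = check_response(sample)
```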

3.4. Checking the data

Execute the following command to view information about the table:

 flow table info -t breast_hetero_guest -n experiment

After execution, the following is returned:

 {
    "data": {
        "address": {
            "home": null,
            "name": "breast_hetero_guest",
            "namespace": "experiment",
            "storage_type": "LMDB"
        },
        "count": 569,
        "exist": 1,
        "namespace": "experiment",
        "partition": 10,
        "schema": {
            "header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
            "sid": "id"
        },
        "table_name": "breast_hetero_guest"
    },
    "retcode": 0,
    "retmsg": "success"
}
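A quick sanity check on this output catches upload problems early: the table should exist, the row count should match the source CSV (569 rows for this dataset), and the schema header should list the expected columns. A sketch of such a check:

```python
# Validate the `data` section of a `flow table info` response.
# Illustrative helper only, not part of FATE.
def check_table(info, expected_count):
    data = info["data"]
    assert data["exist"] == 1, "table not found"
    assert data["count"] == expected_count, "unexpected row count"
    # The header is a comma-separated string of column names.
    return data["schema"]["header"].split(",")

# Trimmed-down version of the response shown above:
info = {
    "data": {
        "exist": 1,
        "count": 569,
        "schema": {"header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9", "sid": "id"},
    },
}
# The guest table carries the label column `y` plus 10 feature columns.
columns = check_table(info, 569)
```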

4. Model training

Next, we start the modeling task. We need to prepare two configuration files: the DSL file, which defines the pipeline's components, and the conf file, which sets the runtime parameters.

4.1. Prepare the dsl file

Execute the following command:

 cp /data/projects/fate/examples/dsl/v2/hetero_logistic_regression/hetero_lr_normal_dsl.json /data/projects/fate/examples/my_test/
This copies the hetero (vertical) logistic regression example that ships with FATE directly into our my_test directory.

FATE packages its algorithms as components. The DSL file mainly configures which components make up the modeling pipeline:


For example, the first component, Reader, reads the training data we just uploaded; the next, DataTransform, converts that data into instance objects. Almost every modeling pipeline starts with these two components.

In general, configuring a component requires the following:

 - module: the algorithm component; FATE currently provides 37 of them
- input:
    - data: data input
    - model: model input
- output:
    - data: data output
    - model: model output

module defines the type of the component. FATE currently offers 37 components, and you can also develop new algorithm components yourself.

input and output set the component's inputs and outputs respectively; both support two kinds at once, data and model.

For detailed configuration instructions, please refer to the official documentation: https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/dsl_conf/dsl_conf_v2_setting_guide.zh.md
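To make the wiring concrete, here is a stripped-down sketch of a DSL v2 pipeline (Reader → DataTransform → HeteroLR), expressed as a Python dict. The component names follow the convention in the copied example; consult the official guide linked above for the full schema:

```python
# Each input reference has the form "<upstream_component>.<output_name>",
# which is how downstream components are wired to upstream outputs.
dsl = {
    "components": {
        "reader_0": {
            "module": "Reader",
            "output": {"data": ["data"]},
        },
        "data_transform_0": {
            "module": "DataTransform",
            "input": {"data": {"data": ["reader_0.data"]}},
            "output": {"data": ["data"], "model": ["model"]},
        },
        "hetero_lr_0": {
            "module": "HeteroLR",
            "input": {"data": {"train_data": ["data_transform_0.data"]}},
            "output": {"data": ["data"], "model": ["model"]},
        },
    }
}

# Sanity check: every data input must refer to a component defined in the DSL.
names = set(dsl["components"])
for comp in dsl["components"].values():
    for refs in comp.get("input", {}).get("data", {}).values():
        for ref in refs:
            assert ref.split(".")[0] in names, f"unknown upstream: {ref}"
```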

4.2. Prepare the conf file

Execute the following command:

 cp /data/projects/fate/examples/dsl/v2/hetero_logistic_regression/hetero_lr_normal_conf.json /data/projects/fate/examples/my_test/
This copies the hetero (vertical) logistic regression example that ships with FATE directly into our my_test directory.


Under the component_parameters element, the conf file configures the table name that the Reader component reads.

The conf file mainly configures the following:

  • DSL version
  • The roles and party_id of each party
  • Component operating parameters
For the component list and detailed configuration parameters of each component, please refer to the official documentation: https://fate.readthedocs.io/en/latest/zh/federatedml_component/
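The three items above map onto the conf file's top-level structure. Below is a trimmed sketch of that structure as a Python dict, with the party IDs and table names used in this article; see the component list linked above for each component's full parameter set:

```python
# Skeleton of a DSL v2 conf file: version, initiator, the parties playing
# each role, and per-party component parameters. The "0" key is the index
# of the party within its role list.
conf = {
    "dsl_version": 2,
    "initiator": {"role": "guest", "party_id": 9999},
    "role": {"guest": [9999], "host": [10000], "arbiter": [10000]},
    "component_parameters": {
        "role": {
            "guest": {"0": {"reader_0": {"table": {
                "name": "breast_hetero_guest", "namespace": "experiment"}}}},
            "host": {"0": {"reader_0": {"table": {
                "name": "breast_hetero_host", "namespace": "experiment"}}}},
        }
    },
}
```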

4.3. Submitting tasks

Execute the following command:

 flow job submit -d hetero_lr_normal_dsl.json -c hetero_lr_normal_conf.json
Specify the dsl and conf configuration files with -d and -c respectively.

On success, the command returns information about the training job:

 {
    "data": {
        "board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070226373055640&role=guest&party_id=9999",
        "code": 0,
        "dsl_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/job_dsl.json",
        "job_id": "202205070226373055640",
        "logs_directory": "/data/projects/fate/fateflow/logs/202205070226373055640",
        "message": "success",
        "model_info": {
            "model_id": "arbiter-10000#guest-9999#host-10000#model",
            "model_version": "202205070226373055640"
        },
        "pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/guest/9999/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/train_runtime_conf.json"
    },
    "jobId": "202205070226373055640",
    "retcode": 0,
    "retmsg": "success"
}

There are several properties to pay attention to:

  • board_url: the FATE Board address where the task status can be viewed.
  • job_id: the unique identifier of the job; you can look up the job's details on FATE Board with this ID.
  • logs_directory: the log path, where you can inspect the job's various logs.
  • model_info: contains model_id and model_version, which are needed for prediction tasks. Before predicting, you must specify which model to use, and these two fields uniquely identify it.
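The fields you typically carry forward are the Board URL (for monitoring) and the model identifier pair (for later prediction). Two small helpers, illustrative only and not part of the flow CLI, make that explicit:

```python
# Extract what the next steps need from a submit response.
def board_link(resp):
    return resp["data"]["board_url"]

def model_identifier(resp):
    info = resp["data"]["model_info"]
    return info["model_id"], info["model_version"]

# Trimmed-down version of the response shown above:
resp = {
    "data": {
        "board_url": "http://127.0.0.1:8080/index.html#/dashboard"
                     "?job_id=202205070226373055640&role=guest&party_id=9999",
        "model_info": {
            "model_id": "arbiter-10000#guest-9999#host-10000#model",
            "model_version": "202205070226373055640",
        },
    },
}
model_id, model_version = model_identifier(resp)
```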

5. Visualization

5.1. Task overview

Through the address of board_url in the returned information above, you can access the task overview page in the browser:

http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070226373055640&role=guest&party_id=9999

Note that because FATE runs inside a container, you may need to replace the IP address with your host's actual address.

The login username and password are both admin.


The Dataset info panel on the left shows each participant's information; the middle shows the job's running status, with a progress bar and elapsed time; the right shows the DAG of the pipeline's components; and the bottom shows the job log.

5.2. Component output

Click the view this job button in the middle to enter the details of the task:


Each component in the DAG diagram is clickable. Select the hetero_lr_0 component and click the view the outputs button in the lower right corner to open the logistic regression component's output page:


There are three tabs in the upper left corner:

  • model output: the algorithm component's training result.
  • data output: the component's processed data, which feeds downstream components.
  • log: the component's run log.



zlt2000

Years of front-line experience developing and designing distributed internet systems; focuses on sharing Java, SpringBoot, SpringCloud, distributed systems/microservices, middleware, and related topics.