1. Description
This article uses the test samples that come with Fate to train a model with the vertical (hetero) logistic regression algorithm, and visualizes the results through FATE Board.
The content of this article builds on the environment deployed in "Privacy Computing FATE - Concept and Single-Machine Deployment Guide".
2. Enter the container
Execute the following command to enter the Fate container:
docker exec -it $(docker ps -aqf "name=standalone_fate") bash
You can see that there is an examples directory, which contains test samples of the various algorithms along with test data.
Enter examples and create a my_test directory:
cd examples
mkdir my_test
Note: All subsequent operations are performed in this directory by default.
3. Upload data
The first step is to prepare the data for training. We can upload the data to Fate as a CSV file.
The built-in test data is in the /data/projects/fate/examples/data directory inside the container. Each algorithm provides data for both the guest and host sides.
3.1. Prepare the guest configuration
In the my_test directory, execute the following command:
vi upload_hetero_guest.json
The content is as follows:
{
"file": "/data/projects/fate/examples/data/breast_hetero_guest.csv",
"head": 1,
"partition": 10,
"work_mode": 0,
"namespace": "experiment",
"table_name": "breast_hetero_guest"
}
- file: the path to the data file
- head: whether the data file contains a header row (1 = yes, 0 = no)
- partition: the number of partitions used to store data
- work_mode: work mode, 0 is the stand-alone version, 1 is the cluster version
- namespace: namespace
- table_name: data table name
3.2. Prepare the host configuration
In the my_test directory, execute the following command:
vi upload_hetero_host.json
The content is as follows:
{
"file": "/data/projects/fate/examples/data/breast_hetero_host.csv",
"head": 1,
"partition": 10,
"work_mode": 0,
"namespace": "experiment",
"table_name": "breast_hetero_host"
}
Note that the file name and table name differ from those used for the guest.
3.3. Execute upload
Execute the following two commands to upload the data of the guest and host respectively:
flow data upload -c upload_hetero_guest.json
flow data upload -c upload_hetero_host.json
Specify the configuration file with -c.
On success, the command returns information about the upload task:
{
"data": {
"board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070640371260700&role=local&party_id=0",
"code": 0,
"dsl_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/job_dsl.json",
"job_id": "202205070640371260700",
"logs_directory": "/data/projects/fate/fateflow/logs/202205070640371260700",
"message": "success",
"model_info": {
"model_id": "local-0#model",
"model_version": "202205070640371260700"
},
"namespace": "experiment",
"pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/pipeline_dsl.json",
"runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/local/0/job_runtime_on_party_conf.json",
"runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/job_runtime_conf.json",
"table_name": "breast_hetero_guest",
"train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070640371260700/train_runtime_conf.json"
},
"jobId": "202205070640371260700",
"retcode": 0,
"retmsg": "success"
}
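If a table with the same name and namespace already exists, for example when re-running these steps, the upload may be rejected. Depending on your FATE version, flow data upload accepts a --drop flag to overwrite the existing table (verify with flow data upload --help):
flow data upload -c upload_hetero_guest.json --drop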
3.4. Checking the data
Execute the following command to view information about the table:
flow table info -t breast_hetero_guest -n experiment
The command returns:
{
"data": {
"address": {
"home": null,
"name": "breast_hetero_guest",
"namespace": "experiment",
"storage_type": "LMDB"
},
"count": 569,
"exist": 1,
"namespace": "experiment",
"partition": 10,
"schema": {
"header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
"sid": "id"
},
"table_name": "breast_hetero_guest"
},
"retcode": 0,
"retmsg": "success"
}
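The schema field shows how Fate parsed the file: sid is the sample ID column, and header lists the label y followed by the ten feature columns. The first line of breast_hetero_guest.csv therefore looks like this:
id,y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9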
4. Model training
Next, we start the modeling task. We need to prepare two configuration files: the process configuration file (dsl) and the parameter configuration file (conf).
4.1. Prepare the dsl file
Execute the following command:
cp /data/projects/fate/examples/dsl/v2/hetero_logistic_regression/hetero_lr_normal_dsl.json /data/projects/fate/examples/my_test/
This copies the vertical logistic regression example that comes with Fate straight into our my_test directory.
Fate packages its algorithms as components. The dsl file mainly configures which components the whole modeling process consists of.
For example, the first component, Reader, reads the training data we just uploaded; it is followed by the DataTransform component, which converts the training data into instance objects. Almost every modeling process needs these first two components.
In general, configuring a component requires the following:
- module: the model component; Fate currently supports 37 model components
- input:
  - data: data input
  - model: model input
- output:
  - data: data output
  - model: model output
module defines the type of the component; Fate currently ships 37 usable components, and of course we can also develop new algorithm components ourselves. input and output set the component's inputs and outputs respectively; both support two kinds at the same time, namely data and model.
For detailed configuration instructions, please refer to the official documentation: https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/dsl_conf/dsl_conf_v2_setting_guide.zh.md
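To give a sense of its shape, below is an abridged sketch of the pipeline this kind of DSL file describes: Reader feeds DataTransform, an Intersection step aligns the samples of the two parties, and HeteroLR trains on the intersected data, followed by Evaluation. Treat it as a sketch and refer to the copied file for the exact contents:
{
    "components": {
        "reader_0": {
            "module": "Reader",
            "output": { "data": ["data"] }
        },
        "data_transform_0": {
            "module": "DataTransform",
            "input": { "data": { "data": ["reader_0.data"] } },
            "output": { "data": ["data"], "model": ["model"] }
        },
        "intersection_0": {
            "module": "Intersection",
            "input": { "data": { "data": ["data_transform_0.data"] } },
            "output": { "data": ["data"] }
        },
        "hetero_lr_0": {
            "module": "HeteroLR",
            "input": { "data": { "train_data": ["intersection_0.data"] } },
            "output": { "data": ["data"], "model": ["model"] }
        },
        "evaluation_0": {
            "module": "Evaluation",
            "input": { "data": { "data": ["hetero_lr_0.data"] } },
            "output": { "data": ["data"] }
        }
    }
}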
4.2. Prepare the conf file
Execute the following command:
cp /data/projects/fate/examples/dsl/v2/hetero_logistic_regression/hetero_lr_normal_conf.json /data/projects/fate/examples/my_test/
This copies the conf file of the same vertical logistic regression example into our my_test directory.
Under the component_parameters element, you configure the table name that the Reader component reads.
The conf file mainly configures the following:
- DSL version
- The roles and party_id of each party
- Component operating parameters
For the component list and detailed configuration parameters of each component, please refer to the official documentation: https://fate.readthedocs.io/en/latest/zh/federatedml_component/
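As a rough sketch, the conf file has the following shape (abridged: the party IDs 9999 and 10000 match the model_id in the submission response below, the reader tables are the ones uploaded in section 3, and the copied file additionally carries each component's algorithm parameters):
{
    "dsl_version": 2,
    "initiator": {
        "role": "guest",
        "party_id": 9999
    },
    "role": {
        "guest": [9999],
        "host": [10000],
        "arbiter": [10000]
    },
    "component_parameters": {
        "role": {
            "guest": {
                "reader_0": {
                    "table": { "name": "breast_hetero_guest", "namespace": "experiment" }
                }
            },
            "host": {
                "reader_0": {
                    "table": { "name": "breast_hetero_host", "namespace": "experiment" }
                }
            }
        }
    }
}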
4.3. Submitting tasks
Execute the following command:
flow job submit -d hetero_lr_normal_dsl.json -c hetero_lr_normal_conf.json
Specify the dsl and conf configuration files with -d and -c respectively.
On success, the command returns information about the training job:
{
"data": {
"board_url": "http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070226373055640&role=guest&party_id=9999",
"code": 0,
"dsl_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/job_dsl.json",
"job_id": "202205070226373055640",
"logs_directory": "/data/projects/fate/fateflow/logs/202205070226373055640",
"message": "success",
"model_info": {
"model_id": "arbiter-10000#guest-9999#host-10000#model",
"model_version": "202205070226373055640"
},
"pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/pipeline_dsl.json",
"runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/guest/9999/job_runtime_on_party_conf.json",
"runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/job_runtime_conf.json",
"train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202205070226373055640/train_runtime_conf.json"
},
"jobId": "202205070226373055640",
"retcode": 0,
"retmsg": "success"
}
There are several properties to pay attention to:
- board_url: the FATE Board address, where the task status can be viewed.
- job_id: the unique identifier of the task; you can view the task's details on FATE Board through this ID, and query it from the command line as shown below.
- logs_directory: the log path, where you can view the task's various log files.
- model_info: contains model_id and model_version, which will be used when running prediction tasks. Before making predictions you must specify which model to use, and these two values uniquely identify it.
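Besides the browser, the job status can also be queried from the command line with the job_id, for example (FATE Flow CLI; the output details vary by version):
flow job query -j 202205070226373055640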
5. Visualization
5.1. Task overview
Through the board_url address in the information returned above, you can access the task overview page in a browser:
http://127.0.0.1:8080/index.html#/dashboard?job_id=202205070226373055640&role=guest&party_id=9999
Note that because the command runs inside the container, the IP address needs to be adjusted to match your actual environment.
The login username and password are both admin.
Dataset info on the left shows each participant's information; the middle shows the task's running status, with a progress bar and elapsed time; the right shows the DAG diagram of the components in the task pipeline; and the bottom shows the task log.
5.2. Component output
Click the view this job button in the middle to enter the details of the task:
Each component in the DAG diagram is clickable. Select the hetero_lr_0 component and click the view the outputs button in the lower-right corner to open the output page of the logistic regression component:
There are three tabs in the upper-left corner:
- model output: the training result of the algorithm component.
- data output: the data each component produces after processing, which feeds downstream components.
- log: the component's running log.
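The same outputs can also be fetched with the FATE Flow CLI instead of the Board. A hedged sketch, since flag spellings can differ slightly between versions:
# print the trained model parameters of the hetero_lr_0 component
flow component output-model -j 202205070226373055640 -r guest -p 9999 -cpn hetero_lr_0
# download the component's data output to a local directory
flow component output-data -j 202205070226373055640 -r guest -p 9999 -cpn hetero_lr_0 --output-path ./output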