ML Spring 2021 HW1: Regression

Course homepage: ML 2021 Spring (ntu.edu.tw)

Libraries

# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# For data preprocess
import numpy as np
import csv
import os

# For plotting
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

myseed = 42069  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

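Data processing

1. Downloading the data

Kaggle: ML2021Spring-hw1 | Kaggle

2. Inspecting the data

The dataset is split into two parts: training data and testing data.

Data overview:

3. Preprocessing

Three datasets:

train: training set
dev: validation set
test: test set (no target)

Preprocessing steps:

Read the CSV file
Extract the features
Split covid.train.csv into a training set and a validation set
Normalize the data

Reading the data

We use a simplified dataset for the demonstration.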

path = 'DataExample.csv'

with open(path, 'r') as fp:
    data = list(csv.reader(fp))
    data = np.array(data[1:])[:, 1:].astype(float)


DataExample

id	AL	cli	ili	hh_cmnty_cli	tested_positive
0	1	0.81461	0.7713562	25.6489069	19.586492
1	1	0.8389952	0.8077665	25.6791006	20.1518381
2	1	0.8978015	0.8878931	26.0605436	20.7049346
3	1	0.9728421	0.9654959	25.7540871	21.2929114
4	1	0.9553056	0.9630788	25.9470152	21.1666563

Store the data as a list

data = list(csv.reader(fp))
print(data)

out:

[['id', 'AL', 'AK', 'AZ', 'cli', 'ili', 'hh_cmnty_cli', 'tested_positive'], 
 ['0', '1', '0', '0', '0.81461', '0.7713562', '25.6489069', '19.586492'], 
 ['1', '1', '0', '0', '0.8389952', '0.8077665', '25.6791006', '20.1518381'], 
 ['2', '1', '0', '0', '0.8978015', '0.8878931', '26.0605436', '20.7049346'], 
 ['3', '1', '0', '0', '0.9728421', '0.9654959', '25.7540871', '21.2929114'], 
 ['4', '1', '0', '0', '0.9553056', '0.9630788', '25.9470152', '21.1666563']]

But we do not need the first row (the header) or the first column (the id).

data = np.array(data[1:]) # drop the first row (the header)
print(data)

out:

[['0' '1' '0' '0' '0.81461' '0.7713562' '25.6489069' '19.586492']
 ['1' '1' '0' '0' '0.8389952' '0.8077665' '25.6791006' '20.1518381']
 ['2' '1' '0' '0' '0.8978015' '0.8878931' '26.0605436' '20.7049346']
 ['3' '1' '0' '0' '0.9728421' '0.9654959' '25.7540871' '21.2929114']
 ['4' '1' '0' '0' '0.9553056' '0.9630788' '25.9470152' '21.1666563']]

data = data[:, 1:].astype(float) # drop the first column and convert the values to float
print(data)

out:

[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492 ]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563]]
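The three steps above (read the rows, drop the header, drop the id column) can be collected into one helper. This is a minimal sketch rather than the assignment's code: `load_features` is a hypothetical name, and the CSV text is an in-memory copy of the first two rows of the example so that the snippet runs without a file on disk.

```python
import csv
import io

import numpy as np

# In-memory stand-in for DataExample.csv (header plus the first two rows shown above).
CSV_TEXT = """id,AL,AK,AZ,cli,ili,hh_cmnty_cli,tested_positive
0,1,0,0,0.81461,0.7713562,25.6489069,19.586492
1,1,0,0,0.8389952,0.8077665,25.6791006,20.1518381
"""


def load_features(fp):
    """Read CSV rows, drop the header row and the id column, return floats."""
    rows = list(csv.reader(fp))          # list of lists of strings
    body = np.array(rows[1:])            # drop the header row
    return body[:, 1:].astype(float)     # drop the id column, convert to float


data = load_features(io.StringIO(CSV_TEXT))
print(data.shape)  # (2, 7)
```

Passing a file object instead of a path keeps the helper usable with both real files (`open(path)`) and in-memory text.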

Splitting the dataset

DataExample

1	0.81461	0.7713562	25.6489069	19.586492	0.8389952	0.8077665	25.6791006	20.1518381	0.8978015	0.8878931	26.0605436	20.7049346
1	0.8389952	0.8077665	25.6791006	20.1518381	0.8978015	0.8878931	26.0605436	20.7049346	0.9728421	0.9654959	25.7540871	21.2929114
1	0.8978015	0.8878931	26.0605436	20.7049346	0.9728421	0.9654959	25.7540871	21.2929114	0.9553056	0.9630788	25.9470152	21.1666563
1	0.9728421	0.9654959	25.7540871	21.2929114	0.9553056	0.9630788	25.9470152	21.1666563	0.9475134	0.9687637	26.3505008	19.8966066
1	0.9553056	0.9630788	25.9470152	21.1666563	0.9475134	0.9687637	26.3505008	19.8966066	0.8838331	0.8930201	26.4806235	20.1784284

For the training data

feats = list(range(14)) # 14 = 3 + 4 + 4 + 3

target = data[:, -1]
data = data[:, feats]

print(target)
print(data)

out:

[20.7049346 21.2929114 21.1666563 19.8966066 20.1784284] # targets

[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492   0.8389952  0.8077665 25.6791006 20.1518381  0.8978015
   0.8878931 26.0605436]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381  0.8978015  0.8878931 26.0605436 20.7049346  0.9728421
   0.9654959 25.7540871]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346  0.9728421  0.9654959 25.7540871 21.2929114  0.9553056
   0.9630788 25.9470152]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114  0.9553056  0.9630788 25.9470152 21.1666563  0.9475134
   0.9687637 26.3505008]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563  0.9475134  0.9687637 26.3505008 19.8966066  0.8838331
   0.8930201 26.4806235]]

We now have 5 samples. Next, we split the training data into a training set and a validation (dev) set.

# for train set
indices = [i for i in range(len(data)) if i % 3 != 0] 
print(indices)

out:

[1, 2, 4]

So the training set consists of the samples at indices 1, 2 and 4.

The remaining samples form the validation set.

# for dev set
indices_2 = [i for i in range(len(data)) if i %3 == 0]
print(indices_2)

out:

[0, 3]
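The two list comprehensions can be wrapped into one function that returns both index lists at once. This is only a sketch; `split_indices` is a hypothetical helper name, not part of the sample code.

```python
def split_indices(n_samples, k):
    """Every k-th sample (index 0, k, 2k, ...) goes to dev; the rest go to train."""
    train_idx = [i for i in range(n_samples) if i % k != 0]
    dev_idx = [i for i in range(n_samples) if i % k == 0]
    return train_idx, dev_idx


train_idx, dev_idx = split_indices(5, 3)
print(train_idx, dev_idx)  # [1, 2, 4] [0, 3]
```

With the 2700 rows of the real training file and k = 10, as in the full code, the same rule gives 2430 training samples and 270 validation samples.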

Next, we convert the data and target from above into tensors.

data = torch.FloatTensor(data[indices])
target = torch.FloatTensor(target[indices])

print(data)
print(target)

out:

tensor([[ 1.0000,  0.0000,  0.0000,  0.8390,  0.8078, 25.6791, 20.1518,  0.8978,
          0.8879, 26.0605, 20.7049,  0.9728,  0.9655, 25.7541],
        [ 1.0000,  0.0000,  0.0000,  0.8978,  0.8879, 26.0605, 20.7049,  0.9728,
          0.9655, 25.7541, 21.2929,  0.9553,  0.9631, 25.9470],
        [ 1.0000,  0.0000,  0.0000,  0.9553,  0.9631, 25.9470, 21.1667,  0.9475,
          0.9688, 26.3505, 19.8966,  0.8838,  0.8930, 26.4806]])

tensor([21.2929, 21.1667, 20.1784])

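Data normalization

Different features have very different magnitudes. To balance their influence on the model, the data should be normalized. The common methods are:

Normalization usually maps all values into [-1, 1] or [0, 1]. We take the latter as an example.

Min-Max scaling (linear normalization):

For a set of values with minimum m and maximum M, any value X in the set is normalized as:

$$ X_{norm} = \frac{X - m}{M - m} $$

Note: this method rescales the original data linearly.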
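A minimal NumPy sketch of Min-Max scaling applied column by column. The function name `min_max_scale` is made up for illustration, the sample values are taken from the example table, and the sketch assumes that no column is constant (otherwise M - m would be zero).

```python
import numpy as np


def min_max_scale(x):
    """Scale each column of x linearly into [0, 1]."""
    col_min = x.min(axis=0, keepdims=True)
    col_max = x.max(axis=0, keepdims=True)
    return (x - col_min) / (col_max - col_min)


# The cli and hh_cmnty_cli columns of three rows from the example above.
x = np.array([[0.8389952, 25.6791006],
              [0.8978015, 26.0605436],
              [0.9553056, 25.9470152]])
scaled = min_max_scale(x)
print(scaled.min(axis=0), scaled.max(axis=0))  # [0. 0.] [1. 1.]
```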

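Z-score standardization (zero-mean normalization):

Z-score standardization transforms the original data into data with mean 0 and variance 1:

$$ z = \frac{x-\mu}{\sigma} $$

Note: this method works best when the original data is approximately Gaussian; otherwise the result can be poor.

Here we use Z-score standardization.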

data[:, 3:] = (data[:, 3:] - data[:, 3:].mean(dim=0, keepdim=True)) \
              / data[:, 3:].std(dim=0, keepdim=True)  # std is the standard deviation
print(data)

out:

tensor([[ 1.0000,  0.0000,  0.0000, -1.0037, -1.0104, -1.1051, -1.0286, -1.0893,
         -1.1540,  0.0184,  0.1048,  0.7532,  0.6065, -0.8144],
        [ 1.0000,  0.0000,  0.0000,  0.0075,  0.0212,  0.8424,  0.0599,  0.8764,
          0.5413, -1.0091,  0.9435,  0.3813,  0.5477, -0.3017],
        [ 1.0000,  0.0000,  0.0000,  0.9962,  0.9892,  0.2628,  0.9687,  0.2129,
          0.6127,  0.9907, -1.0483, -1.1346, -1.1542,  1.1161]])

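I tried both methods in this assignment and found that Min-Max scaling converged noticeably more slowly than Z-score standardization, and its final accuracy was somewhat lower.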
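For comparison with the tensor output above, here is the same standardization written in NumPy. It is a sketch under one assumption worth stating: `torch.Tensor.std` uses the sample standard deviation (dividing by n - 1) by default, so the NumPy call needs `ddof=1` to match. `standardize` is a hypothetical helper name.

```python
import numpy as np


def standardize(x):
    """Z-score each column; ddof=1 mirrors the default of torch.Tensor.std."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, ddof=1, keepdims=True)
    return (x - mean) / std


# The cli column of the three training rows (indices 1, 2 and 4).
cli = np.array([[0.8389952], [0.8978015], [0.9553056]])
z = standardize(cli)
print(np.round(z.ravel(), 4))  # approximately [-1.0037  0.0075  0.9962]
```

These values agree with the fourth column of the standardized tensor printed above, up to float32 rounding.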

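Loading the data

A DataLoader loads data from a given Dataset into batches.

Look at how DataLoader and Dataset relate to each other, and at what a batch is.

Note: for validation and testing, set shuffle to False. Otherwise the sample order changes on every run and the predictions no longer line up with the original row order.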
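To make the relationship concrete, the sketch below wraps ten toy samples in a `TensorDataset` and lets a `DataLoader` cut them into batches of four. The toy tensors are invented for illustration; the real assignment uses the custom `COVID19Dataset` class instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten toy samples with two features each, plus one target per sample.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
targets = torch.arange(10, dtype=torch.float32)
dataset = TensorDataset(features, targets)

# shuffle=False keeps the original order, as needed for dev and test data.
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch_shapes = [tuple(x.shape) for x, y in loader]
print(batch_shapes)  # [(4, 2), (4, 2), (2, 2)]
```

The last batch is smaller because 10 is not a multiple of 4; with `drop_last=False` (the default, and the setting used in the full code) it is kept rather than discarded.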

4. Full code

class COVID19Dataset(Dataset):
    ''' Dataset for loading and preprocessing the COVID19 dataset '''
    def __init__(self,
                 path,
                 mode='train',
                 target_only=False):
        self.mode = mode

        # Read data into numpy arrays
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)
        
        if not target_only:
            # feats holds the column indices in data of the selected features
            feats = list(range(93)) # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            pass

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]
            
            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]
            
            # Convert data into PyTorch tensors
            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (you may remove this part to see what will happen)
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # Returns one sample at a time
        if self.mode in ['train', 'dev']:
            # For training
            return self.data[index], self.target[index]
        else:
            # For testing (no target)
            return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)
    
# DataLoader

def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    ''' Generates a dataset, then is put into a dataloader. '''
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)  # Construct dataset
    dataloader = DataLoader(
        dataset, batch_size,
        shuffle=(mode == 'train'), drop_last=False,
        num_workers=n_jobs, pin_memory=True)                            # Construct dataloader
    return dataloader

Building the network

NeuralNet is an nn.Module designed for regression. The DNN consists of 2 fully-connected layers with ReLU activation. This module also includes a function cal_loss for calculating loss.

1. ReLU


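Nonlinear activation functions

Without an activation function, the output of every layer is a linear function of its input. No matter how many layers the network has, the output is then still a linear function of the input, which is equivalent to having no hidden layer at all. This is the original perceptron.

We therefore introduce a nonlinear function as the activation function, which is what makes a deep network worthwhile: the output is no longer a linear combination of the input and can approximate arbitrary functions. The earliest choices were the sigmoid and tanh functions, whose outputs are bounded and therefore convenient as input to the next layer.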
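The claim that stacked linear layers collapse into a single linear layer can be checked numerically. The sketch below uses random NumPy matrices (shapes and seed chosen arbitrarily for illustration) and compares two layers applied in sequence with the equivalent single layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2
x = rng.normal(size=3)

# Two linear layers applied one after the other, no activation in between.
two_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping written as a single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a nonlinearity such as `np.maximum(W1 @ x + b1, 0.0)` between the two layers breaks this identity in general, which is the reason an activation function is needed.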

Why ReLU?

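With functions such as sigmoid, evaluating the activation involves an exponential, and the derivative needed in backpropagation involves a division, so the computation is relatively expensive. ReLU saves much of this cost.

In deep networks, sigmoid easily causes vanishing gradients during backpropagation: near the saturation regions the function changes very slowly and its derivative approaches 0, so information is lost and the deep network cannot be trained.

ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence between parameters, and mitigates overfitting.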
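The vanishing-gradient argument can be illustrated by evaluating both derivatives at a few points. This is a small NumPy sketch with hand-written helper functions (the names are made up for illustration), not library code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # derivative of the sigmoid


def relu_grad(z):
    return (z > 0).astype(float)  # 1 for positive inputs, 0 otherwise


z = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(z))  # peaks at 0.25 for z = 0, about 4.5e-05 at z = +/-10
print(relu_grad(z))     # [0. 0. 1.]
```

Since the sigmoid derivative never exceeds 0.25, multiplying many such factors through a deep network shrinks the gradient quickly, while the ReLU derivative stays at 1 for every active unit.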

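2. nn.MSELoss()

$$ loss(\hat{y},y)=\frac{1}{n}\sum_i(\hat{y}_i-y_i)^2 $$

MSELoss() in PyTorch takes three parameters that control how the element-wise losses are reduced. size_average and reduce are booleans and are deprecated; reduction is a string that replaces both:

Parameter	Meaning
size_average	whether to divide the sum by n (deprecated)
reduce	whether to reduce the output to a scalar (deprecated)
reduction	'mean', 'sum' or 'none'; replaces size_average and reduce

An example (the input is random, and the printed outputs below appear to come from separate runs, so the numbers are not consistent with one another):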

input = torch.randn(2,3,requires_grad=True) # prediction
target = torch.ones(2,3) # ground truth
print(f'input: {input}\n target: {target}')

Out:

input: tensor([[-0.0733, -2.2085, -0.6919],
        [-1.1417, -1.1327, -1.5466]], requires_grad=True)
target: tensor([[1., 1., 1.],
        [1., 1., 1.]])

default

By default size_average=True and reduce=True, so a scalar is returned.

loss_1 = nn.MSELoss()
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: 1.8783622980117798

size_average=False

The sum is not divided by n.

loss_1 = nn.MSELoss(size_average=False)
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: 11.819371223449707

reduce=False

Returns a tensor with one loss per element.

loss_1 = nn.MSELoss(reduce=False)
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: tensor([[0.0039, 0.2338, 3.5550],
        [0.1358, 2.1851, 0.1533]], grad_fn=<MseLossBackward0>)

As for reduction, it is simply the combination of size_average and reduce.

This is part of the legacy_get_string function:

    if size_average is None:
        size_average = True
    if reduce is None:
        reduce = True

    if size_average and reduce:
        ret = 'mean'
    elif reduce:
        ret = 'sum'
    else:
        ret = 'none'
    if emit_warning:
        warnings.warn(warning.format(ret))
    return ret
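The three values that this mapping produces can be tried directly through the `reduction` argument. The sketch below uses fixed tensors instead of random ones so that every number can be checked by hand; the tensor values are chosen only for illustration.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
# Squared errors per element: [[0, 1], [4, 9]]

loss_mean = nn.MSELoss(reduction='mean')(pred, target)  # (0 + 1 + 4 + 9) / 4
loss_sum = nn.MSELoss(reduction='sum')(pred, target)    # 0 + 1 + 4 + 9
loss_none = nn.MSELoss(reduction='none')(pred, target)  # one loss per element

print(loss_mean.item(), loss_sum.item())  # 3.5 14.0
print(loss_none.shape)                    # torch.Size([2, 2])
```

In new code, `reduction` is the argument to use; the two older booleans only exist for backward compatibility.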

3. Regularization

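The original linear model:

$$ f(\mathbf{x}) = b+\mathbf{w}^\mathrm{T}\mathbf{x} $$

That is:

$$ f(\mathbf{x})=b+w_1x_1+w_2x_2+\dots+w_nx_n $$

But how do we choose n, that is, how many features should x have? With too many dimensions the model tends to overfit; with too few it underfits. To weaken the influence of some dimensions and make the function smoother, we can regularize it:

$$ L(\mathbf{w},b)=\sum_i\left(y_i-(b+\mathbf{w}^\mathrm{T}\mathbf{x}_i)\right)^2+\lambda\sum_j w_j^2 $$

Note that the damage from overfitting shows up on the test set. On the training set, the higher the dimension, the better the fit, as the figure shows:

The higher the dimension, the larger the function space, so it naturally contains the function that fits the training set best.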
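The regularized loss above translates almost term by term into NumPy. This is a sketch with toy numbers; `ridge_loss` is a hypothetical name, and, as in the formula, the bias b is not penalized.

```python
import numpy as np


def ridge_loss(w, b, X, y, lam):
    """Sum of squared errors plus lam times the squared L2 norm of w."""
    residual = y - (b + X @ w)
    return np.sum(residual ** 2) + lam * np.sum(w ** 2)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 11.0])
w = np.array([1.0, 2.0])   # fits y exactly with b = 0
b = 0.0

print(ridge_loss(w, b, X, y, lam=0.0))  # 0.0
print(ridge_loss(w, b, X, y, lam=0.1))  # 0.5
```

Even a perfect fit pays a penalty of lambda times the squared weights (here 0.1 * 5), which is what pushes the optimizer toward smaller weights and a smoother function.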

4. Full code

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # Mean squared error loss
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        ''' Given input of size (batch_size x input_dim), compute output of the network '''
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss '''
        # TODO: you may implement L1/L2 regularization here
        return self.criterion(pred, target)

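Training

1. Basic functions

getattr()

Equivalent to the "." operator. Parameters:

object: the object
name: a string, the name of a method or attribute of the object
default: the value returned when the object has no such attribute
Exception: raises AttributeError when the attribute does not exist and no default is given

getattr(object, 'name') is equivalent to object.name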
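A short standard-library illustration of the three cases. The training code uses the same idea in `getattr(torch.optim, config['optimizer'])` to pick an optimizer class by its name; here the `math` module stands in so the snippet has no dependencies.

```python
import math

# getattr with a string name is the same as attribute access with a dot.
assert getattr(math, 'sqrt') is math.sqrt

# Picking a function by name at run time, which plain dot syntax cannot do.
func_name = 'floor'                 # could come from a config dict
result = getattr(math, func_name)(3.7)
print(result)  # 3

# The third argument is returned when the attribute does not exist.
fallback = getattr(math, 'no_such_attribute', None)
print(fallback)  # None
```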

model.train() v.s. model.eval()

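model.train()

Puts Batch Normalization and Dropout layers into training mode.

If the model contains BN (Batch Normalization) or Dropout layers, call model.train() before training. In this mode a BN layer uses the mean and variance of each batch, and Dropout randomly keeps only part of the connections when updating the parameters.

model.eval()

Puts Batch Normalization and Dropout layers into evaluation mode.

If the model contains BN or Dropout layers, call model.eval() before testing. In this mode a BN layer uses the running mean and variance accumulated during training, so its statistics stay fixed during testing, and Dropout keeps all connections, that is, no neurons are dropped at random.

After training, the model is used on test samples. Call model.eval() before model(test). Otherwise, merely feeding data through the model changes its behavior even without training: BN layers keep updating their running statistics and Dropout keeps dropping neurons at random. This is a property of models that contain BN and Dropout layers.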
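The difference between the two modes is easy to observe on a single Dropout layer. The following is a sketch with an arbitrary seed and input; the sample network in this post contains neither Dropout nor BN, so this only matters once such layers are added.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()              # training mode: units are dropped at random
out_train = layer(x)       # kept units are scaled by 1 / (1 - p) = 2

layer.eval()               # evaluation mode: dropout is a no-op
out_eval = layer(x)

print(sorted(set(out_train.tolist())))  # expected to be [0.0, 2.0]
print(torch.equal(out_eval, x))         # True
```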

detach().cpu()

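detach()

Purpose: cuts the tensor off from backpropagation
Returns: a tensor that stays on its current device (still on the GPU if it was there)

cpu()

Purpose: moves the data to the CPU
Returns: a tensor

item()

Purpose: gets the value of a tensor as a Python number (the tensor must contain exactly one element)

numpy()

Purpose: converts a tensor into a numpy array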
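The four calls are usually chained, as in `mse_loss.detach().cpu().item()` in the training loop. The sketch below runs each step separately on a tiny CPU tensor (the values are invented for illustration) so that the intermediate results can be inspected.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()            # scalar tensor attached to the graph

detached = loss.detach()         # same value, no gradient tracking
on_cpu = detached.cpu()          # no-op here if the tensor is already on the CPU
value = on_cpu.item()            # plain Python float; needs a one-element tensor

array = w.detach().cpu().numpy() # numpy() needs a CPU tensor that does not require grad

print(loss.requires_grad, detached.requires_grad)  # True False
print(value, type(value).__name__)                 # 5.0 float
```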

2. A basic pattern

When looping over the epochs during training, we usually call optimizer.zero_grad(), loss.backward() and optimizer.step(), in that order.

For example:

while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())
            ....

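In short, they do the following:

optimizer.zero_grad(): resets the gradients
- Training usually uses mini-batches. If the gradients are not cleared, they accumulate across batches, so the gradient of the current batch would include that of the previous one. Call this function before backpropagation and the parameter update.
loss.backward(): backpropagates to compute the gradient of every parameter
- If backward() has not been called, the gradients are None, so loss.backward() must come before optimizer.step().
optimizer.step(): updates the parameters by gradient descent
- step() performs one optimization step and updates the parameters using their gradients. Since the update depends on the gradients, loss.backward() must be called first to compute them.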
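The accumulation behaviour that makes `zero_grad()` necessary can be seen with a single parameter. This is a toy sketch (one weight, a linear loss, plain SGD without momentum), not the assignment's training loop.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without zero_grad(): gradients add up.
(3 * w).sum().backward()
(3 * w).sum().backward()
accumulated = w.grad.clone()      # 3 + 3 = 6

# The usual order: clear, backpropagate, then step.
optimizer.zero_grad()
(3 * w).sum().backward()          # gradient is 3 again
optimizer.step()                  # w <- 1 - 0.1 * 3 = 0.7

print(accumulated.item(), w.grad.item(), round(w.item(), 4))  # 6.0 3.0 0.7
```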

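Common variables inside the optimizer:

param_groups: when an Optimizer is instantiated, its constructor creates a list called param_groups. The list holds num_groups param_group dictionaries (num_groups depends on how many parameter groups were passed when the optimizer was defined). For SGD as described here, each param_group has 6 key-value pairs: ['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov']. Other optimizers and newer PyTorch versions may have different keys.
param_group['params']: the list of model parameters passed in for that group when the Optimizer was instantiated. If the parameters are not grouped, it holds all model parameters from model.parameters(). Each parameter is a torch.nn.parameter.Parameter object.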

3. Full code

def train(tr_set, dv_set, model, config, device):
    ''' DNN training '''

    n_epochs = config['n_epochs']  # Maximum number of epochs

    # Setup optimizer
    optimizer = getattr(torch.optim, config['optimizer'])(
        model.parameters(), **config['optim_hparas'])

    min_mse = 1000.
    loss_record = {'train': [], 'dev': []}      # for recording training loss
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())

        # After each epoch, test your model on the validation (development) set.
        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            # Save model if your model improved
            min_mse = dev_mse
            print('Saving model (epoch = {:4d}, loss = {:.4f})'
                .format(epoch + 1, min_mse))
            torch.save(model.state_dict(), config['save_path'])  # Save model to specified path
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # Stop training if your model stops improving for "config['early_stop']" epochs.
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record
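The early-stopping rule inside train() can be studied in isolation by replaying a list of validation losses through it. The sketch below is a simplified stand-in, with two labeled differences from the code above: it starts from infinity rather than from min_mse = 1000, and it takes the losses as a ready-made list instead of computing them. `run_early_stopping` and `patience` are hypothetical names.

```python
def run_early_stopping(dev_losses, patience):
    """Replay a list of per-epoch dev losses through the same stopping rule."""
    best = float('inf')
    bad_epochs = 0
    epochs_run = 0
    for loss in dev_losses:
        epochs_run += 1
        if loss < best:
            best = loss          # improvement: remember it and reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            break                # no improvement for more than `patience` epochs
    return best, epochs_run


best, epochs_run = run_early_stopping([3.0, 2.0, 2.5, 2.4, 2.2, 1.0], patience=2)
print(best, epochs_run)  # 2.0 5
```

Training stops after the fifth epoch, so the better loss of 1.0 in the sixth epoch is never reached; this is the trade-off a small early_stop value makes.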

Validation

dev() is very similar to train(), but note that the model is in eval() mode, so BN layers use their running statistics and Dropout is disabled.

Full code

def dev(dv_set, model, device):
    model.eval()                                # set model to evaluation mode
    total_loss = 0
    for x, y in dv_set:                         # iterate through the dataloader
        x, y = x.to(device), y.to(device)       # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
        total_loss += mse_loss.detach().cpu().item() * len(x)  # accumulate loss
    total_loss = total_loss / len(dv_set.dataset)              # compute averaged loss

    return total_loss

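Testing

1. torch.cat()

cat() concatenates several tensors.

Parameters:

inputs: the sequence of tensors to concatenate; any sequence of tensors of the same type
dim: the dimension along which to concatenate; it must be a valid dimension index of the input tensors, from 0 to inputs[0].dim() - 1

Notes

The input must be a sequence of tensors of the same type whose shapes match in every dimension except dim
dim cannot exceed the number of dimensions of any input tensor

For example: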

t1 = torch.Tensor([1,2,3])
t2 = torch.Tensor([4,5,6])
t3 = torch.Tensor([7,8,9])
tensors = [t1, t2, t3]  # avoid shadowing the built-in list
t = torch.cat(tensors, dim=0)
print(t)

Out:

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])

Changing dim:

t1 = torch.Tensor([[1,2,3],[3,2,1]])
t2 = torch.Tensor([[4,5,6],[6,5,4]])
t3 = torch.Tensor([[7,8,9],[9,8,7]])
tensors = [t1, t2, t3]  # avoid shadowing the built-in list
t = torch.cat(tensors, dim=1)
print(t)

Out:

tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.],
        [3., 2., 1., 6., 5., 4., 9., 8., 7.]])
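The tensors in the two examples above all have the same shape, but torch.cat only needs the shapes to agree outside the concatenation dimension. The sketch below (toy shapes chosen for illustration) shows one call that works and one that fails.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(4, 3)

stacked = torch.cat([a, b], dim=0)   # sizes along dim 0 may differ: 2 + 4 rows
print(stacked.shape)                 # torch.Size([6, 3])

try:
    torch.cat([a, b], dim=1)         # dim 0 sizes (2 vs 4) do not match
except RuntimeError as err:
    print('cat along dim=1 failed:', type(err).__name__)
```

This is why the test function can concatenate per-batch predictions along dim=0 even when the last batch is smaller than the others.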

2. Full code

def test(tt_set, model, device):
    model.eval()                                # set model to evaluation mode
    preds = []
    for x in tt_set:                            # iterate through the dataloader
        x = x.to(device)                        # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            preds.append(pred.detach().cpu())   # collect prediction
    preds = torch.cat(preds, dim=0).numpy()     # concatenate all predictions and convert to a numpy array
    return preds

Setting the hyperparameters

1. Full code

device = get_device()                 # get the current available device ('cpu' or 'cuda')
os.makedirs('models', exist_ok=True)  # The trained model will be saved to ./models/
target_only = False                   # TODO: Using 40 states & 2 tested_positive features

# TODO: How to tune these hyper-parameters to improve your model's performance?
config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9              # momentum for SGD
    },
    'early_stop': 200,               # early stopping epochs (the number epochs since your model's last improvement)
    'save_path': 'models/model.pth'  # your model will be saved here
}

Loading the data and the model

1. Full code

tr_set = prep_dataloader(tr_path, 'train', config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)

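Start!

model = NeuralNet(tr_set.dataset.dim).to(device)  # construct the model and move it to the device
model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)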

Data visualization

1. training

First, look at how the MSE loss changes as training proceeds.

plot_learning_curve(model_loss_record, title='deep model')

Then look at the predictions.

del model
model = NeuralNet(tr_set.dataset.dim).to(device) 
ckpt = torch.load(config['save_path'], map_location='cpu')  # Load your best model
model.load_state_dict(ckpt)
plot_pred(dv_set, model, device)  # Show prediction on the validation set

Points on the blue line are those whose predicted value equals the actual value.

2. testing

def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    with open(file, 'w', newline='') as fp:  # newline='' avoids blank rows from csv.writer on Windows
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

preds = test(tt_set, model, device)  # predict COVID-19 cases with your model
save_pred(preds, 'pred.csv')         # save prediction file to pred.csv

The resulting preds:

Improvements

We need to modify the sample code to make the model better!

Public leaderboard

simple baseline: 2.04826
medium baseline: 1.36937
strong baseline: 0.89266

Hints

Feature selection (what other features are useful?)
DNN architecture (layers? dimension? activation function?)
Training (mini-batch? optimizer? learning rate?)
L2 regularization
There are some mistakes in the sample code, can you find them?

TODO1: Changing the features used for training

In COVID19Dataset we can change which features are extracted.

        if not target_only:
            # feats holds the column indices in data of the selected features
            feats = list(range(93)) # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            # use only 42 features
            feats = list(range(40))
            feats.append(57)
            feats.append(75)

Let's look at the final result!

Convergence is much faster!

Score:

TODO2: Adding regularization

Add regularization when setting the hyperparameters.

config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9,             # momentum for SGD
        'weight_decay': 0.1          # L2 regularization strength
    },
    # remaining entries ('early_stop', 'save_path') unchanged
}

The result improves, but not by much, and convergence becomes slower (as expected).
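To see what weight_decay does inside SGD, one update step can be checked by hand. The sketch below uses a single weight, no momentum, and made-up values for lr and weight_decay; it relies on the documented behaviour of torch.optim.SGD, which adds weight_decay * w to the gradient before the step.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
lr, weight_decay = 0.1, 0.5
optimizer = torch.optim.SGD([w], lr=lr, weight_decay=weight_decay)

loss = (w ** 2).sum()     # d(loss)/dw = 2 * w = 4
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Without momentum, one SGD step is: w <- w - lr * (grad + weight_decay * w)
expected = 2.0 - lr * (4.0 + weight_decay * 2.0)   # 2 - 0.1 * 5 = 1.5
print(w.item(), expected)
```

The extra term is the gradient of (weight_decay / 2) * w^2, so this is L2 regularization applied to every parameter handed to the optimizer, biases included. That may be one reason a value as large as 0.1 slows convergence.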

TODO3: Changing the network architecture

Add more layers

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128,64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        ...

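This is trial-and-error tuning: all I did was add one more linear layer and one more ReLU layer.

It helps somewhat, but not dramatically.

PReLU

Replacing the ReLU layers with PReLU gave poor results.

TODO4: Optimizer

Adam

Hard to sum up: slightly worse than SGD.

The MSE loss also seems to fluctuate quite a bit.

TODO5: Fixing the mistake

The test set should be standardized with the mean and standard deviation of the training set, because the test set may be small and its own mean and variance may not reflect the statistics of the full data.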

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
            self.data[:, 40:] = \
                (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
                / self.data[:, 40:].std(dim=0, keepdim=True)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]
            indices_train = [i for i in range(len(data)) if i % 10 != 0]
            tr_data = self.data = torch.FloatTensor(data[indices_train])
            tr_mean = tr_data[:, 40:].mean(dim=0, keepdim=True)
            tr_std = tr_data[:, 40:].std(dim=0, keepdim=True)
            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = indices_train
                self.data = tr_data
                self.target = torch.FloatTensor(target[indices])
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]
                self.data = torch.FloatTensor(data[indices])
                self.target = torch.FloatTensor(target[indices])
            self.data[:, 40:] = \
                (self.data[:, 40:] - tr_mean) \
                / tr_std

But the result was strange: the score got much worse.
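One thing worth noting about the code above: the test branch still normalizes with the test set's own mean and standard deviation, so the train/dev sets and the test set end up on different scales. That mismatch is a possible explanation for the worse score, though this post does not verify it. The sketch below shows the idea of fitting the statistics once on the training split and reusing them everywhere; `fit_stats` and `apply_stats` are hypothetical helpers and the arrays are toy data.

```python
import numpy as np


def fit_stats(train_x):
    """Column-wise mean and standard deviation, computed on the training split only."""
    mean = train_x.mean(axis=0, keepdims=True)
    std = train_x.std(axis=0, ddof=1, keepdims=True)
    return mean, std


def apply_stats(x, mean, std):
    """Standardize any split with statistics that were fitted elsewhere."""
    return (x - mean) / std


train_x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test_x = np.array([[2.0, 40.0]])

mean, std = fit_stats(train_x)
train_z = apply_stats(train_x, mean, std)
test_z = apply_stats(test_x, mean, std)   # reuses the training mean and std
print(test_z)  # [[0. 2.]]
```

In the Dataset class this would mean passing the training statistics into the test-mode dataset (for example as extra constructor arguments), since the test file alone cannot provide them.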

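Reflections

I spent two afternoons tuning hyperparameters, and the final result was still worse than my first attempt. I feel that I do not know the model well enough and lack tuning experience. I will leave this assignment here for now; once I know more, maybe I can get past the strong baseline.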

Some useful links

Colab: ML2021Spring - HW1.ipynb - Colaboratory (google.com)

PyTorch Tutorial P1: ML2021 Pytorch tutorial part 1 - YouTube

Kaggle: ML2021Spring-hw1 | Kaggle

Study notes: 李宏毅2021春季机器学习课程视频笔记1：Introduction, Colab & PyTorch Tutorials, HW1_诸神缄默不语的博客-CSDN博客

Regularization: pytorch实现L2和L1正则化regularization的方法_pan_jinquan的博客-CSDN博客

Activation Function: 常用激活函数（激励函数）理解与总结_tyhj_sf的博客空间-CSDN博客_激活函数

HWExample: 2021李宏毅机器学习课程作业一 - 简书 (jianshu.com)

Cross baseline: 李宏毅ML2021Spring HW1 - lizhi334 - 博客园 (cnblogs.com)

Reference

ReLU: Relu的作用_KAMITA的博客-CSDN博客

train() and eval(): Pytorch：model.train()和model.eval()用法和区别，以及model.eval()和torch.no_grad()的区别_初识-CV的博客-CSDN博客

optimizer.step(): 理解optimizer.zero_grad(), loss.backward(), optimizer.step()的作用及原理_PanYHHH的博客-CSDN博客

Source: Heng-Jui Chang @ NTUEE (https://github.com/ga642381/M...)
