Course homepage: ML 2021 Spring (ntu.edu.tw)

# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# For data preprocess
import numpy as np
import csv
import os

# For plotting
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

myseed = 42069  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

Data Processing

1. Downloading the data

Kaggle: ML2021Spring-hw1 | Kaggle

2. Inspecting the data

The dataset is split into two parts: training data and testing data.

Overview of the data:

(figure: overview of the dataset)

3. Preprocessing

Three datasets:

  • train: training set
  • dev: validation set
  • test: test set (no target)

Preprocessing steps:

  • read the csv file
  • extract features
  • split covid.train.csv into training and validation (dev) sets
  • normalize the data

Reading the data

We use a simplified dataset for the demonstration.

path = 'DataExample.csv'

with open(path, 'r') as fp:
    data = list(csv.reader(fp))
    data = np.array(data[1:])[:, 1:].astype(float)


DataExample

| id | AL | AK | AZ | cli | ili | hh_cmnty_cli | tested_positive |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 0 | 0.81461 | 0.7713562 | 25.6489069 | 19.586492 |
| 1 | 1 | 0 | 0 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 |
| 2 | 1 | 0 | 0 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 |
| 3 | 1 | 0 | 0 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 |
| 4 | 1 | 0 | 0 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 |

Convert the data into a list:

data = list(csv.reader(fp))
print(data)

out:

[['id', 'AL', 'AK', 'AZ', 'cli', 'ili', 'hh_cmnty_cli', 'tested_positive'], 
 ['0', '1', '0', '0', '0.81461', '0.7713562', '25.6489069', '19.586492'], 
 ['1', '1', '0', '0', '0.8389952', '0.8077665', '25.6791006', '20.1518381'], 
 ['2', '1', '0', '0', '0.8978015', '0.8878931', '26.0605436', '20.7049346'], 
 ['3', '1', '0', '0', '0.9728421', '0.9654959', '25.7540871', '21.2929114'], 
 ['4', '1', '0', '0', '0.9553056', '0.9630788', '25.9470152', '21.1666563']]

But we do not need the first row (the header) or the first column (the id).

data = np.array(data[1:]) # drop the first row (header)
print(data)

out:

[['0' '1' '0' '0' '0.81461' '0.7713562' '25.6489069' '19.586492']
 ['1' '1' '0' '0' '0.8389952' '0.8077665' '25.6791006' '20.1518381']
 ['2' '1' '0' '0' '0.8978015' '0.8878931' '26.0605436' '20.7049346']
 ['3' '1' '0' '0' '0.9728421' '0.9654959' '25.7540871' '21.2929114']
 ['4' '1' '0' '0' '0.9553056' '0.9630788' '25.9470152' '21.1666563']]
data = data[:, 1:].astype(float) # drop the first column (id) and convert to float
print(data)

out:

[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492 ]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563]]

Splitting the dataset

DataExample (each row: AL, AK, AZ, then cli, ili, hh_cmnty_cli, tested_positive for three consecutive days; the day-3 tested_positive is the target):

| AL | AK | AZ | cli | ili | hh_cmnty_cli | tested_positive | cli | ili | hh_cmnty_cli | tested_positive | cli | ili | hh_cmnty_cli | tested_positive |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0.81461 | 0.7713562 | 25.6489069 | 19.586492 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 |
| 1 | 0 | 0 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 |
| 1 | 0 | 0 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 |
| 1 | 0 | 0 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 | 0.9475134 | 0.9687637 | 26.3505008 | 19.8966066 |
| 1 | 0 | 0 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 | 0.9475134 | 0.9687637 | 26.3505008 | 19.8966066 | 0.8838331 | 0.8930201 | 26.4806235 | 20.1784284 |

For the training data:

feats = list(range(14)) # 14 = 3 + 4 + 4 + 3

target = data[:, -1]
data = data[:, feats]

print(target)
print(data)

out:

[20.7049346 21.2929114 21.1666563 19.8966066 20.1784284] # targets

[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492   0.8389952  0.8077665 25.6791006 20.1518381  0.8978015
   0.8878931 26.0605436]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381  0.8978015  0.8878931 26.0605436 20.7049346  0.9728421
   0.9654959 25.7540871]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346  0.9728421  0.9654959 25.7540871 21.2929114  0.9553056
   0.9630788 25.9470152]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114  0.9553056  0.9630788 25.9470152 21.1666563  0.9475134
   0.9687637 26.3505008]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563  0.9475134  0.9687637 26.3505008 19.8966066  0.8838331
   0.8930201 26.4806235]]

We now have 5 samples. Next we split the training data into a training set and a validation (dev) set.

# for train set
indices = [i for i in range(len(data)) if i % 3 != 0] 
print(indices)

out:

[1, 2, 4]

That is, the indices of the training-set samples are 1, 2, and 4.

The remaining samples go to the validation (dev) set:

# for dev set
indices_2 = [i for i in range(len(data)) if i % 3 == 0]
print(indices_2)

out:

[0, 3]

Next we convert data and target (here for the training set) into tensors:

data = torch.FloatTensor(data[indices])
target = torch.FloatTensor(target[indices])

print(data)
print(target)

out:

tensor([[ 1.0000,  0.0000,  0.0000,  0.8390,  0.8078, 25.6791, 20.1518,  0.8978,
          0.8879, 26.0605, 20.7049,  0.9728,  0.9655, 25.7541],
        [ 1.0000,  0.0000,  0.0000,  0.8978,  0.8879, 26.0605, 20.7049,  0.9728,
          0.9655, 25.7541, 21.2929,  0.9553,  0.9631, 25.9470],
        [ 1.0000,  0.0000,  0.0000,  0.9553,  0.9631, 25.9470, 21.1667,  0.9475,
          0.9688, 26.3505, 19.8966,  0.8838,  0.8930, 26.4806]])

tensor([21.2929, 21.1667, 20.1784])

Data normalization

The scales of the different features differ greatly. To balance their influence on the model, the data should be normalized. Two common methods follow.

Usually all values are normalized to fall within [-1, 1] or [0, 1]; here we take the latter as an example.

Min-max scaling:

For a set of values with minimum m and maximum M, any value X among them is normalized as:

$$ X_{norm} = \frac{X - m}{M - m} $$

Note: this method rescales the original data proportionally.
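
A minimal sketch of column-wise min-max scaling on a toy tensor (the numbers are made up for illustration):

import torch

x = torch.tensor([[0.81, 25.6], [0.84, 25.7], [0.90, 26.1]])
m = x.min(dim=0, keepdim=True).values    # column-wise minimum m
M = x.max(dim=0, keepdim=True).values    # column-wise maximum M
x_norm = (x - m) / (M - m)               # every column now lies in [0, 1]
print(x_norm)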

Z-score standardization:

This method transforms the original data to have zero mean and unit variance:

$$ z = \frac{x-\mu}{\sigma} $$

Note: this works best when the original data are approximately Gaussian; otherwise the normalization can perform quite poorly!

Here we use z-score standardization.

data[:, 3:] = (data[:, 3:] - data[:, 3:].mean(dim=0, keepdim=True)) \
              / data[:, 3:].std(dim=0, keepdim=True)  # std = standard deviation
print(data)

out:

tensor([[ 1.0000,  0.0000,  0.0000, -1.0037, -1.0104, -1.1051, -1.0286, -1.0893,
         -1.1540,  0.0184,  0.1048,  0.7532,  0.6065, -0.8144],
        [ 1.0000,  0.0000,  0.0000,  0.0075,  0.0212,  0.8424,  0.0599,  0.8764,
          0.5413, -1.0091,  0.9435,  0.3813,  0.5477, -0.3017],
        [ 1.0000,  0.0000,  0.0000,  0.9962,  0.9892,  0.2628,  0.9687,  0.2129,
          0.6127,  0.9907, -1.0483, -1.1346, -1.1542,  1.1161]])

In the homework I tried both methods: min-max scaling converged noticeably more slowly than z-score standardization and ended up slightly less accurate.

Loading the data

A DataLoader loads data from a given Dataset into batches.

The figure below shows the relationship between DataLoader and Dataset and what a batch is.
(figure: relationship between Dataset, DataLoader, and batches)

Note: shuffle must be set to False when evaluating or testing; otherwise the order of the data would differ on every pass and the results would not be reproducible.
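
A minimal sketch of what a DataLoader does with a Dataset (a made-up TensorDataset here, not the COVID data): it groups samples into mini-batches and shuffles only when asked to.

import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.arange(30, dtype=torch.float32).reshape(10, 3),  # 10 samples, 3 features
                        torch.arange(10, dtype=torch.float32))                 # 10 targets
loader = DataLoader(dataset, batch_size=4, shuffle=False)  # shuffle=False for dev/test

for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([4, 3]) twice, then torch.Size([2, 3]) for the last batch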

4. Full code

class COVID19Dataset(Dataset):
    ''' Dataset for loading and preprocessing the COVID19 dataset '''
    def __init__(self,
                 path,
                 mode='train',
                 target_only=False):
        self.mode = mode

        # Read data into numpy arrays
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)
        
        if not target_only:
            # feats holds the indices of the selected feature columns in data
            feats = list(range(93)) # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            pass

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]
            
            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]
            
            # Convert data into PyTorch tensors
            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (you may remove this part to see what will happen)
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # Returns one sample at a time
        if self.mode in ['train', 'dev']:
            # For training
            return self.data[index], self.target[index]
        else:
            # For testing (no target)
            return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)
    
# DataLoader

def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    ''' Generates a dataset, then wraps it in a dataloader. '''
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)  # Construct dataset
    dataloader = DataLoader(
        dataset, batch_size,
        shuffle=(mode == 'train'), drop_last=False,
        num_workers=n_jobs, pin_memory=True)                            # Construct dataloader
    return dataloader

Building the network

NeuralNet is an nn.Module designed for regression. The DNN consists of 2 fully-connected layers with ReLU activation. This module also includes a function cal_loss for computing the loss.

1. ReLU

(figure: the ReLU activation function)

Non-linear activation functions

Without an activation function, each layer's output would be a linear function of the previous layer's input, so no matter how many layers the network had, the output would be a linear combination of the input, equivalent to having no hidden layer at all. That is the original perceptron.

We therefore introduce non-linear activation functions, which make deep networks meaningful: the output is no longer a linear combination of the inputs and can approximate arbitrary functions. The earliest choices were sigmoid and tanh, whose bounded outputs conveniently serve as inputs to the next layer.

Why ReLU?

  • With sigmoid-like activations, computing the activation itself (an exponential) is expensive, and backpropagating the error gradient involves division, so the overall cost is high; with ReLU the whole computation is much cheaper.
  • In deep networks, sigmoid easily causes vanishing gradients during backpropagation (near its saturated regions the function changes very slowly and the derivative approaches 0, so information is lost), which makes deep networks hard to train.
  • ReLU sets part of the neurons' outputs to 0, which makes the network sparse, reduces the interdependence between parameters, and mitigates overfitting (see the short example below).
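
A quick illustration of that last point: ReLU zeroes out every negative input, so part of the output is exactly 0.

import torch
import torch.nn as nn

relu = nn.ReLU()
z = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))  # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000]) -> a sparse output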

2. nn.MSELoss()

$$ \mathrm{loss}(\hat{y},y)=\frac{1}{n}\displaystyle\sum_i(\hat{y}_i-y_i)^2 $$

In PyTorch, MSELoss() takes two (now deprecated) boolean parameters plus a reduction parameter:

| Parameter | Effect |
| --- | --- |
| size_average | whether to divide by n after summing |
| reduce | whether to reduce the output to a scalar |
| reduction | a string ('none' / 'mean' / 'sum') that replaces the two parameters above |

Let's see some examples!

input = torch.randn(2, 3, requires_grad=True)  # prediction
target = torch.ones(2, 3)                      # ground truth
print(f'input: {input}\ntarget: {target}')

Out:

input: tensor([[-0.0733, -2.2085, -0.6919],
        [-1.1417, -1.1327, -1.5466]], requires_grad=True)
target: tensor([[1., 1., 1.],
        [1., 1., 1.]])

default

By default, size_average=True and reduce=True, and a scalar is returned.

loss_1 = nn.MSELoss()
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: 1.8783622980117798

size_average=False

That is, the sum is not divided by n!

loss_1 = nn.MSELoss(size_average=False)
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: 11.819371223449707

reduce=False

A tensor of element-wise losses is returned.

loss_1 = nn.MSELoss(reduce=False)
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: tensor([[0.0039, 0.2338, 3.5550],
        [0.1358, 2.1851, 0.1533]], grad_fn=<MseLossBackward0>)

As for reduction, it essentially combines size_average and reduce!

Here is part of the legacy_get_string function:

    if size_average is None:
        size_average = True
    if reduce is None:
        reduce = True

    if size_average and reduce:
        ret = 'mean'
    elif reduce:
        ret = 'sum'
    else:
        ret = 'none'
    if emit_warning:
        warnings.warn(warning.format(ret))
    return ret
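
In current PyTorch the same three behaviors are selected with the reduction string; a quick sketch with made-up numbers:

import torch
import torch.nn as nn

pred = torch.tensor([[0., 2.], [1., 3.]])
target = torch.ones(2, 2)

print(nn.MSELoss(reduction='mean')(pred, target))  # tensor(1.5000) -> like the default above
print(nn.MSELoss(reduction='sum')(pred, target))   # tensor(6.)     -> like size_average=False
print(nn.MSELoss(reduction='none')(pred, target))  # tensor([[1., 1.], [0., 4.]]) -> like reduce=False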

3. Regularization

The original linear model:

$$ f(\mathbf{x}) = b + \mathbf{w}^\mathrm{T}\mathbf{x} $$

That is:

$$ f(\mathbf{x})=b+w_1x_1+w_2x_2+...+w_nx_n $$

But how do we determine n? In other words, how many features should x have? If the dimensionality of x is too high the model easily overfits; if it is too low it underfits. To weaken the influence of some dimensions and make the function smoother, we can regularize it:

$$ L(\mathbf{w},b)=\displaystyle\sum_i\big(y_i-(b+\mathbf{w}^\mathrm{T}\mathbf{x}_i)\big)^2+\lambda\displaystyle\sum_i w_i^2 $$

Note that overfitting and underfitting are about performance on the test set. On the training set, the higher the dimensionality, the better the fit, as the figure shows:
(figure: fit on the training set improves as the model dimensionality grows)

The higher the dimensionality, the larger the space of functions, which of course can cover the best function on the training set.
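
As a rough sketch of how the λ∑w² term could be added to an MSE loss in code (mse_with_l2 and l2_lambda are hypothetical names; in practice the optimizer's weight_decay argument gives the same L2 effect):

import torch
import torch.nn as nn

def mse_with_l2(model: nn.Module, pred: torch.Tensor, target: torch.Tensor,
                l2_lambda: float = 0.01) -> torch.Tensor:
    ''' MSE loss plus an L2 penalty on all model parameters (hypothetical helper) '''
    mse = nn.functional.mse_loss(pred, target)
    l2_penalty = sum((p ** 2).sum() for p in model.parameters())  # sum of squared weights
    return mse + l2_lambda * l2_penalty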

4. Full code

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # Mean squared error loss
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        ''' Given input of size (batch_size x input_dim), compute output of the network '''
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss '''
        # TODO: you may implement L1/L2 regularization here
        return self.criterion(pred, target)

Training

1. Basic functions

getattr()

It is equivalent to the '.' (attribute access) operator. Its parameters:

  • object: an object instance
  • name: a string, the name of a member function or attribute of the object
  • default: the default value returned when the object does not have that attribute
  • Exception: if the attribute does not exist and no default is given, an AttributeError is raised

getattr(object, name) is equivalent to object.name
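
A small example of using getattr to look up an optimizer class by name, which is how the training code later turns config['optimizer'] into an optimizer:

import torch

opt_name = 'SGD'                            # e.g. config['optimizer']
opt_class = getattr(torch.optim, opt_name)  # the same object as torch.optim.SGD
print(opt_class is torch.optim.SGD)         # True

# with a default value, no AttributeError is raised for a missing attribute
print(getattr(torch.optim, 'NoSuchOptimizer', None))  # None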

model.train() vs. model.eval()

model.train()

Enables Batch Normalization and Dropout.

If the model contains BN (Batch Normalization) layers or Dropout, call model.train() during training. model.train() makes BN use the mean and variance of each batch, and makes Dropout randomly select a subset of connections to train and update.

model.eval()

Disables Batch Normalization and Dropout.

If the model contains BN layers or Dropout, call model.eval() at test time. model.eval() makes BN use the mean and variance estimated from the whole training data, i.e. they stay fixed during testing, and makes Dropout use all connections instead of randomly dropping neurons.

After training, the model is used on test samples. Call model.eval() before model(test); otherwise merely feeding in data would still change the batch-norm running statistics even though no training happens, a side effect of having BN and Dropout in the model.
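
A minimal sketch of the difference, using a toy network with a Dropout layer (not the homework model):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

net.train()   # Dropout active: roughly half of the outputs are randomly zeroed (the rest are rescaled)
print(net(x))
net.eval()    # Dropout disabled: the output is deterministic
print(net(x))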

detach().cpu()

detach()

  • Purpose: detaches the tensor from the computation graph (blocks backpropagation)
  • Return value: a tensor, still on the GPU if it was there

cpu()

  • Purpose: moves the data to the CPU
  • Return value: a tensor

item()

  • Purpose: gets the value of the tensor (the tensor must contain exactly one element!)

numpy()

  • Purpose: converts the tensor to a numpy array (see the chained example below)
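
A minimal sketch of the usual chaining, as used in the training loop below (loss here is just a made-up scalar tensor):

import torch

loss = (torch.randn(8, requires_grad=True) ** 2).mean()  # a scalar tensor attached to the graph

val = loss.detach().cpu().item()   # detach from the graph -> move to CPU -> Python float
arr = loss.detach().cpu().numpy()  # ... or convert to a (0-d) numpy array instead
print(type(val), type(arr))        # <class 'float'> <class 'numpy.ndarray'>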

2. A common pattern

While looping over the epochs during training, we typically call optimizer.zero_grad(), loss.backward(), and optimizer.step(), in that order.

For example:

while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())
            ....

In short, their roles are as follows:

  • optimizer.zero_grad(): zero out the gradients

    • Training usually uses mini-batches; if the gradients were not cleared, they would mix with those of the previous batch, so this call must come before backpropagation and the parameter update.
  • loss.backward(): backpropagate to compute the gradient of every parameter

    • Without tensor.backward() the gradients would be None, so loss.backward() must be called before optimizer.step().
  • optimizer.step(): update the parameters by gradient descent

    • step() performs one optimization step, updating the parameter values via gradient descent. Since the update is based on the gradients, loss.backward() must be executed before optimizer.step().

Common attributes involved here (see the short example after this list):

  • param_groups: when an Optimizer is instantiated, its constructor builds a param_groups list with num_groups param_group dictionaries of length 6 (num_groups depends on how many parameter groups were passed in); each param_group contains the six key/value pairs ['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'].
  • param_group['params']: the list of model parameters passed into that group when the Optimizer was instantiated; if the parameters were not split into groups, it is simply the whole model's model.parameters(), and each element is a torch.nn.parameter.Parameter.
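
A small sketch of inspecting and editing param_groups on a toy model (the exact set of keys may vary slightly across PyTorch versions):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

group = optimizer.param_groups[0]                 # a single group, since the parameters were not split
print(sorted(k for k in group if k != 'params'))  # 'dampening', 'lr', 'momentum', 'nesterov', 'weight_decay', ...
group['lr'] = 0.0005                              # e.g. manually lower the learning rate mid-training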

3. Full code

def train(tr_set, dv_set, model, config, device):
    ''' DNN training '''

    n_epochs = config['n_epochs']  # Maximum number of epochs

    # Setup optimizer
    optimizer = getattr(torch.optim, config['optimizer'])(
        model.parameters(), **config['optim_hparas'])

    min_mse = 1000.
    loss_record = {'train': [], 'dev': []}      # for recording training loss
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())

        # After each epoch, test your model on the validation (development) set.
        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            # Save model if your model improved
            min_mse = dev_mse
            print('Saving model (epoch = {:4d}, loss = {:.4f})'
                .format(epoch + 1, min_mse))
            torch.save(model.state_dict(), config['save_path'])  # Save model to specified path
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # Stop training if your model stops improving for "config['early_stop']" epochs.
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record

Validation

dev() is very similar to train(), but note that the model is put in eval() mode, i.e. BN statistics are frozen and Dropout is disabled.

Full code

def dev(dv_set, model, device):
    model.eval()                                # set model to evaluation mode
    total_loss = 0
    for x, y in dv_set:                         # iterate through the dataloader
        x, y = x.to(device), y.to(device)       # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
        total_loss += mse_loss.detach().cpu().item() * len(x)  # accumulate loss
    total_loss = total_loss / len(dv_set.dataset)              # compute averaged loss

    return total_loss

Testing

1. torch.cat()

cat() concatenates multiple tensors.

Parameters:

  • inputs: the sequence of tensors to concatenate; it can be any sequence of tensors of the same type
  • dim: the dimension along which to concatenate; it must be a valid dimension of the input tensors

Notes

  • The input must be a sequence, and its elements must be tensors of the same type and shape
  • dim must not exceed the number of dimensions of any input tensor

For example:

t1 = torch.Tensor([1, 2, 3])
t2 = torch.Tensor([4, 5, 6])
t3 = torch.Tensor([7, 8, 9])
tensors = [t1, t2, t3]          # avoid shadowing the built-in name list
t = torch.cat(tensors, dim=0)
print(t)

Out:

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])

Changing dim:

t1 = torch.Tensor([[1, 2, 3], [3, 2, 1]])
t2 = torch.Tensor([[4, 5, 6], [6, 5, 4]])
t3 = torch.Tensor([[7, 8, 9], [9, 8, 7]])
tensors = [t1, t2, t3]
t = torch.cat(tensors, dim=1)
print(t)

Out:

tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.],
        [3., 2., 1., 6., 5., 4., 9., 8., 7.]])

2. Full code

def test(tt_set, model, device):
    model.eval()                                # set model to evaluation mode
    preds = []
    for x in tt_set:                            # iterate through the dataloader
        x = x.to(device)                        # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            preds.append(pred.detach().cpu())   # collect prediction
    preds = torch.cat(preds, dim=0).numpy()     # concatenate all predictions and convert to a numpy array
    return preds

Setting the hyperparameters

1. Full code

device = get_device()                 # get the current available device ('cpu' or 'cuda')
os.makedirs('models', exist_ok=True)  # The trained model will be saved to ./models/
target_only = False                   # TODO: Using 40 states & 2 tested_positive features

# TODO: How to tune these hyper-parameters to improve your model's performance?
config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9              # momentum for SGD
    },
    'early_stop': 200,               # early stopping epochs (the number of epochs since your model's last improvement)
    'save_path': 'models/model.pth'  # your model will be saved here
}

Loading the data and model

1. Full code

tr_set = prep_dataloader(tr_path, 'train', config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)

model = NeuralNet(tr_set.dataset.dim).to(device)  # Construct the model and move it to the device

Start training!

model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)

Visualization

1. training

First, let's look at how the MSE loss changes as training progresses.

plot_learning_curve(model_loss_record, title='deep model')

(figure: learning curve of the deep model)

Then let's look at the prediction quality.

del model
model = NeuralNet(tr_set.dataset.dim).to(device) 
ckpt = torch.load(config['save_path'], map_location='cpu')  # Load your best model
model.load_state_dict(ckpt)
plot_pred(dv_set, model, device)  # Show prediction on the validation set

(figure: predicted vs. ground-truth values on the validation set)

Points on the blue line are where the predicted value equals the ground truth.

2. testing

def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    with open(file, 'w') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

preds = test(tt_set, model, device)  # predict COVID-19 cases with your model
save_pred(preds, 'pred.csv')         # save prediction file to pred.csv

The resulting preds:

(figure: the prediction output)

Improvements

We need to modify the sample code to make the model better!

Public leaderboard

  • simple baseline: 2.04826
  • medium baseline: 1.36937
  • strong baseline: 0.89266

Hints

  • Feature selection (what other features are useful?)
  • DNN architecture (layers? dimension? activation function?)
  • Training (mini-batch? optimizer? learning rate?)
  • L2 regularization
  • There are some mistakes in the sample code, can you find them?

TODO 1: Change the features used for training

In COVID19Dataset, we can change which features are extracted.

if not target_only:
    # feats holds the indices of the selected feature columns in data
    feats = list(range(93)) # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
else:
    # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
    # use only 42 features
    feats = list(range(40))
    feats.append(57)
    feats.append(75)

Let's look at the final results!

(figure: results with the reduced feature set)

Convergence is dramatically faster!


Score:

(figure: Kaggle score)

TODO 2: Add regularization

Add regularization (weight decay) when setting the hyperparameters.

config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9,             # momentum for SGD
        'weight_decay': 0.1          # L2 regularization
    },

The result improves, but not by much, and convergence becomes slower (which makes sense)...

(figure: results with weight decay)

TODO 3: Change the network architecture

Add more layers

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128,64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        ...

This was fairly blind tuning; all I did was add one more linear layer and another ReLU...

It helps somewhat, but not dramatically...

(figure: results with the deeper network)

PReLU

Replacing the ReLU layers with PReLU made things worse...

(figure: results with PReLU)

TODO 4: Optimizer

Adam

Hard to sum up: slightly worse than SGD.

And the MSE loss seems to fluctuate quite a lot...

(figure: results with Adam)

TODO 5: Fix the mistakes

When standardizing, the held-out (dev/test) data should use the mean and standard deviation of the training split, because a held-out set may be small and its own mean and variance may not reflect the statistics of the full data.

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
            # The training statistics are not available in this branch,
            # so the test data is still standardized with its own mean/std here
            self.data[:, 40:] = \
                (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
                / self.data[:, 40:].std(dim=0, keepdim=True)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]
            indices_train = [i for i in range(len(data)) if i % 10 != 0]
            tr_data = torch.FloatTensor(data[indices_train])
            tr_mean = tr_data[:, 40:].mean(dim=0, keepdim=True)  # statistics of the training split only
            tr_std = tr_data[:, 40:].std(dim=0, keepdim=True)
            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = indices_train
                self.data = tr_data
                self.target = torch.FloatTensor(target[indices])
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]
                self.data = torch.FloatTensor(data[indices])
                self.target = torch.FloatTensor(target[indices])
            # Standardize both the train and the dev split with the training-split statistics
            self.data[:, 40:] = (self.data[:, 40:] - tr_mean) / tr_std

But the result is odd: the score actually got much worse...
(figure: score after this change)

Reflections

I spent two afternoons tuning hyperparameters and in the end still didn't beat my first attempt... I'm clearly not familiar enough with the model yet and have little tuning experience. I'll wrap up this homework for now; once my knowledge has grown, maybe I'll be able to beat the strong baseline...

Some useful links

Colab: ML2021Spring - HW1.ipynb - Colaboratory (google.com)

PyTorch Tutorial P1: ML2021 Pytorch tutorial part 1 - YouTube

Kaggle: ML2021Spring-hw1 | Kaggle

Study notes: 李宏毅2021春季机器学习课程视频笔记1:Introduction, Colab & PyTorch Tutorials, HW1_诸神缄默不语的博客-CSDN博客

Regularization:pytorch实现L2和L1正则化regularization的方法_pan_jinquan的博客-CSDN博客

Activation Function: 常用激活函数(激励函数)理解与总结_tyhj_sf的博客空间-CSDN博客_激活函数

HWExample: 2021李宏毅机器学习课程作业一 - 简书 (jianshu.com)

Cross baseline: 李宏毅ML2021Spring HW1 - lizhi334 - 博客园 (cnblogs.com)

Reference

ReLU:Relu的作用_KAMITA的博客-CSDN博客

train() vs. eval(): Pytorch:model.train()和model.eval()用法和区别,以及model.eval()和torch.no_grad()的区别_初识-CV的博客-CSDN博客

optimizer.step(): 理解optimizer.zero_grad(), loss.backward(), optimizer.step()的作用及原理_PanYHHH的博客-CSDN博客

Source: Heng-Jui Chang @ NTUEE (https://github.com/ga642381/M...)

