Course homepage: ML 2021 Spring (ntu.edu.tw)
Libraries
```python
# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# For data preprocess
import numpy as np
import csv
import os

# For plotting
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

myseed = 42069  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)
```
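This quick sanity check shows what the seeding buys you: re-seeding with the same value reproduces the same random draws. It is a sketch. `sample_with_seed` is a hypothetical helper, and it covers only CPU randomness.

```python
import numpy as np
import torch

def sample_with_seed(seed):
    # Hypothetical helper: re-seed NumPy and PyTorch, then draw a few random numbers
    np.random.seed(seed)
    torch.manual_seed(seed)
    return np.random.rand(3), torch.randn(3)

a_np, a_t = sample_with_seed(42069)
b_np, b_t = sample_with_seed(42069)
print(np.array_equal(a_np, b_np), torch.equal(a_t, b_t))  # True True
```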
Data processing
1. Downloading the data
Kaggle: ML2021Spring-hw1 | Kaggle
2. Inspecting the data
The dataset comes in two parts: training data and testing data.
Data overview:
3. Preprocessing
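Three datasets:
- train: training set
- dev: validation set
- test: test set (no target)

Preprocessing:
- Read the CSV file
- Extract features
- Split covid.train.csv into a training set and a validation (dev) set
- Normalize the data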
Reading the data
We use a simplified dataset for the demonstration.
```python
import csv
import numpy as np

path = 'DataExample.csv'
with open(path, 'r') as fp:
    data = list(csv.reader(fp))
data = np.array(data[1:])[:, 1:].astype(float)
```
DataExample
id | AL | AK | AZ | cli | ili | hh_cmnty_cli | tested_positive |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0.81461 | 0.7713562 | 25.6489069 | 19.586492 |
1 | 1 | 0 | 0 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 |
2 | 1 | 0 | 0 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 |
3 | 1 | 0 | 0 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 |
4 | 1 | 0 | 0 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 |
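To see the reading code run without the file on disk, this sketch feeds the first three rows of the table above through `io.StringIO` in place of `open()`. The inlined string is only an illustration of `DataExample.csv`.

```python
import csv
import io
import numpy as np

# First three rows of DataExample.csv, inlined so the sketch runs without the file
csv_text = """id,AL,AK,AZ,cli,ili,hh_cmnty_cli,tested_positive
0,1,0,0,0.81461,0.7713562,25.6489069,19.586492
1,1,0,0,0.8389952,0.8077665,25.6791006,20.1518381
2,1,0,0,0.8978015,0.8878931,26.0605436,20.7049346
"""

rows = list(csv.reader(io.StringIO(csv_text)))   # list of lists of strings
data = np.array(rows[1:])[:, 1:].astype(float)   # drop the header row and the id column
print(data.shape)  # (3, 7)
```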
Read the CSV rows into a list:
```python
data = list(csv.reader(fp))
print(data)
```
out:
```
[['id', 'AL', 'AK', 'AZ', 'cli', 'ili', 'hh_cmnty_cli', 'tested_positive'],
['0', '1', '0', '0', '0.81461', '0.7713562', '25.6489069', '19.586492'],
['1', '1', '0', '0', '0.8389952', '0.8077665', '25.6791006', '20.1518381'],
['2', '1', '0', '0', '0.8978015', '0.8878931', '26.0605436', '20.7049346'],
['3', '1', '0', '0', '0.9728421', '0.9654959', '25.7540871', '21.2929114'],
['4', '1', '0', '0', '0.9553056', '0.9630788', '25.9470152', '21.1666563']]
```
We don't need the first row (the column names) or the first column (the id):
```python
data = np.array(data[1:])  # drop the first row
print(data)
```
out:
```
[['0' '1' '0' '0' '0.81461' '0.7713562' '25.6489069' '19.586492']
['1' '1' '0' '0' '0.8389952' '0.8077665' '25.6791006' '20.1518381']
['2' '1' '0' '0' '0.8978015' '0.8878931' '26.0605436' '20.7049346']
['3' '1' '0' '0' '0.9728421' '0.9654959' '25.7540871' '21.2929114']
['4' '1' '0' '0' '0.9553056' '0.9630788' '25.9470152' '21.1666563']]
```
```python
data = data[:, 1:].astype(float)  # drop the first column and convert to float
print(data)
```
out:
```
[[ 1. 0. 0. 0.81461 0.7713562 25.6489069
19.586492 ]
[ 1. 0. 0. 0.8389952 0.8077665 25.6791006
20.1518381]
[ 1. 0. 0. 0.8978015 0.8878931 26.0605436
20.7049346]
[ 1. 0. 0. 0.9728421 0.9654959 25.7540871
21.2929114]
[ 1. 0. 0. 0.9553056 0.9630788 25.9470152
21.1666563]]
```
Splitting the dataset
DataExample
AL | AK | AZ | cli (day 1) | ili (day 1) | hh_cmnty_cli (day 1) | tested_positive (day 1) | cli (day 2) | ili (day 2) | hh_cmnty_cli (day 2) | tested_positive (day 2) | cli (day 3) | ili (day 3) | hh_cmnty_cli (day 3) | tested_positive (day 3) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | 0.81461 | 0.7713562 | 25.6489069 | 19.586492 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 |
1 | 0 | 0 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 |
1 | 0 | 0 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 |
1 | 0 | 0 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 | 0.9475134 | 0.9687637 | 26.3505008 | 19.8966066 |
1 | 0 | 0 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 | 0.9475134 | 0.9687637 | 26.3505008 | 19.8966066 | 0.8838331 | 0.8930201 | 26.4806235 | 20.1784284 |
For the training data, the target is the last column (day 3's tested_positive), and the features are the 14 columns before it:
```python
feats = list(range(14))  # 14 = 3 + 4 + 4 + 3
target = data[:, -1]
data = data[:, feats]
print(target)
print(data)
```
out:
```
[20.7049346 21.2929114 21.1666563 19.8966066 20.1784284] # targets
[[ 1. 0. 0. 0.81461 0.7713562 25.6489069
19.586492 0.8389952 0.8077665 25.6791006 20.1518381 0.8978015
0.8878931 26.0605436]
[ 1. 0. 0. 0.8389952 0.8077665 25.6791006
20.1518381 0.8978015 0.8878931 26.0605436 20.7049346 0.9728421
0.9654959 25.7540871]
[ 1. 0. 0. 0.8978015 0.8878931 26.0605436
20.7049346 0.9728421 0.9654959 25.7540871 21.2929114 0.9553056
0.9630788 25.9470152]
[ 1. 0. 0. 0.9728421 0.9654959 25.7540871
21.2929114 0.9553056 0.9630788 25.9470152 21.1666563 0.9475134
0.9687637 26.3505008]
[ 1. 0. 0. 0.9553056 0.9630788 25.9470152
21.1666563 0.9475134 0.9687637 26.3505008 19.8966066 0.8838331
0.8930201 26.4806235]]
```
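We now have 5 samples. Next we split the training data into a training set and a dev (validation) set:
```python
# for train set
indices = [i for i in range(len(data)) if i % 3 != 0]
print(indices)
```
out:
```
[1, 2, 4]
```
So the training-set indices are 1, 2 and 4, and the remaining samples form the dev set:
```python
# for dev set
indices_2 = [i for i in range(len(data)) if i % 3 == 0]
print(indices_2)
```
out:
```
[0, 3]
```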
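The two list comprehensions are complementary, so together they partition the rows. The check below confirms this for the toy example. With `% 10` as in the full code later on, 1 row in 10 goes to the dev set.

```python
n_samples = 5  # the toy example has 5 rows

train_idx = [i for i in range(n_samples) if i % 3 != 0]
dev_idx = [i for i in range(n_samples) if i % 3 == 0]

# Every index lands in exactly one of the two sets
assert set(train_idx).isdisjoint(dev_idx)
assert sorted(train_idx + dev_idx) == list(range(n_samples))
print(train_idx, dev_idx)  # [1, 2, 4] [0, 3]
```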
Next we convert data and target to tensors, keeping only the training-set rows:
```python
data = torch.FloatTensor(data[indices])
target = torch.FloatTensor(target[indices])
print(data)
print(target)
```
out:
```
tensor([[ 1.0000, 0.0000, 0.0000, 0.8390, 0.8078, 25.6791, 20.1518, 0.8978,
0.8879, 26.0605, 20.7049, 0.9728, 0.9655, 25.7541],
[ 1.0000, 0.0000, 0.0000, 0.8978, 0.8879, 26.0605, 20.7049, 0.9728,
0.9655, 25.7541, 21.2929, 0.9553, 0.9631, 25.9470],
[ 1.0000, 0.0000, 0.0000, 0.9553, 0.9631, 25.9470, 21.1667, 0.9475,
0.9688, 26.3505, 19.8966, 0.8838, 0.8930, 26.4806]])
tensor([21.2929, 21.1667, 20.1784])
```
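Normalizing the data
The features differ greatly in magnitude, so we normalize them to balance their influence on the model. Two common methods are:
Min-Max scaling:
Min-max scaling maps the data into a fixed range, usually [-1, 1] or [0, 1]. The formula below targets [0, 1]. For a set of values with minimum m and maximum M, any value X is normalized as:
$$ X_{norm} = \frac{X - m}{M - m} $$
Note: this is a linear rescaling of the original data, so the relative spacing between values is preserved.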
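Here is a minimal sketch of the min-max formula applied column by column with PyTorch. The small tensor is made up for illustration. A constant column would give M - m = 0 and divide by zero, so real code would need to guard against that.

```python
import torch

x = torch.tensor([[0.81, 25.6],
                  [0.84, 25.7],
                  [0.90, 26.1]])

col_min = x.min(dim=0, keepdim=True).values  # m, per column
col_max = x.max(dim=0, keepdim=True).values  # M, per column
x_norm = (x - col_min) / (col_max - col_min) # every column now spans [0, 1]
print(x_norm)
```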
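Z-score standardization:
Z-score standardization rescales the data to mean 0 and variance 1:
$$ z = \frac{x-\mu}{\sigma} $$
where μ is the mean and σ the standard deviation of the data.
Note: unlike min-max scaling, the result is not confined to a fixed range. The standardized values are easiest to interpret when the data is roughly Gaussian, and strong outliers can distort μ and σ.
Here we use z-score standardization. It is applied column by column to every feature except the one-hot state columns: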
```python
data[:, 3:] = (data[:, 3:] - data[:, 3:].mean(dim=0, keepdim=True)) \
    / data[:, 3:].std(dim=0, keepdim=True)  # std is the standard deviation
print(data)
```
out:
```
tensor([[ 1.0000, 0.0000, 0.0000, -1.0037, -1.0104, -1.1051, -1.0286, -1.0893,
-1.1540, 0.0184, 0.1048, 0.7532, 0.6065, -0.8144],
[ 1.0000, 0.0000, 0.0000, 0.0075, 0.0212, 0.8424, 0.0599, 0.8764,
0.5413, -1.0091, 0.9435, 0.3813, 0.5477, -0.3017],
[ 1.0000, 0.0000, 0.0000, 0.9962, 0.9892, 0.2628, 0.9687, 0.2129,
0.6127, 0.9907, -1.0483, -1.1346, -1.1542, 1.1161]])
```
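I tried both methods in the homework. Min-max scaling converged noticeably more slowly than z-score standardization and reached somewhat lower accuracy in the end.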
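Loading the data
A `DataLoader` loads data from a given `Dataset` into batches.
Consider how `DataLoader` and `Dataset` relate to each other, and what a batch is.
Note: keep shuffle=False for the dev and test sets. The sample code shuffles only when mode == 'train'. The test predictions are written out by index, so shuffling the test set would put them out of line with the sample ids.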
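Here is a minimal sketch of the relationship. `Dataset` returns one sample per index through `__getitem__`. `DataLoader` calls it repeatedly and stacks the samples into batches. `ToyDataset` is a hypothetical stand-in for `COVID19Dataset`. With drop_last=False (the default), the last batch is simply smaller.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset: 10 samples, 3 features each, one target per sample."""
    def __init__(self):
        self.x = torch.arange(30, dtype=torch.float32).reshape(10, 3)
        self.y = torch.arange(10, dtype=torch.float32)

    def __getitem__(self, index):
        # Return one (features, target) pair
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False)
batch_shapes = [tuple(xb.shape) for xb, yb in loader]
print(batch_shapes)  # [(4, 3), (4, 3), (2, 3)]
```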
4. Full code
```python
class COVID19Dataset(Dataset):
    ''' Dataset for loading and preprocessing the COVID19 dataset '''
    def __init__(self,
                 path,
                 mode='train',
                 target_only=False):
        self.mode = mode

        # Read data into numpy arrays
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)

        if not target_only:
            # feats holds the column index in data of each feature that is used
            feats = list(range(93))  # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            pass

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]

            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]

            # Convert data into PyTorch tensors
            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (you may remove this part to see what will happen)
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # Returns one sample at a time
        if self.mode in ['train', 'dev']:
            # For training
            return self.data[index], self.target[index]
        else:
            # For testing (no target)
            return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)


# DataLoader
def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    ''' Generates a dataset, then is put into a dataloader. '''
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)  # Construct dataset
    dataloader = DataLoader(
        dataset, batch_size,
        shuffle=(mode == 'train'), drop_last=False,
        num_workers=n_jobs, pin_memory=True)  # Construct dataloader
    return dataloader
```
Building the network
`NeuralNet` is an `nn.Module` designed for regression. The DNN consists of 2 fully-connected layers with ReLU activation. The module also includes a function `cal_loss` for calculating the loss.
1. ReLU
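Nonlinear activation functions
Without an activation function, each layer's output is a linear function of its input. However many layers the network has, its output is then still a linear combination of the input, no more expressive than a network with no hidden layer at all. This is essentially the original perceptron, a linear model.
We therefore use a nonlinear function as the activation, and this is what makes deep networks worthwhile. The output is no longer a linear combination of the input, and with enough hidden units the network can approximate arbitrary continuous functions. The earliest choices were sigmoid and tanh. Their outputs are bounded, so they serve conveniently as input to the next layer.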
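The collapse described above can be checked numerically. Two stacked `nn.Linear` layers with nothing in between compute exactly the same function as a single linear layer with weight W2·W1 and bias W2·b1 + b2. The layer sizes here are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f1 = nn.Linear(4, 8)
f2 = nn.Linear(8, 1)
x = torch.randn(5, 4)

# Two stacked linear layers with no activation in between...
deep = f2(f1(x))

# ...equal one linear layer with W = W2 @ W1 and b = W2 @ b1 + b2
W = f2.weight @ f1.weight
b = f2.weight @ f1.bias + f2.bias
shallow = x @ W.T + b

print(torch.allclose(deep, shallow, atol=1e-6))  # True
```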
Why ReLU?
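- Sigmoid-like functions need an exponential to compute the activation, and backpropagating through them is also more costly. ReLU is just a comparison with 0, so the whole computation is much cheaper.
- In deep networks, sigmoid easily causes vanishing gradients during backpropagation. In its saturated regions the function changes very slowly and its derivative tends to 0, and even at its center the derivative is only 0.25. The gradient therefore shrinks layer after layer, information is lost, and deep networks fail to train.
- ReLU sets the output of some neurons to 0. This makes the network sparse and reduces the interdependence between parameters, which is often credited with reducing overfitting.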
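You can compare the two derivatives with autograd. The sigmoid derivative σ(x)(1 − σ(x)) peaks at 0.25 and is close to 0 for large |x|. ReLU's derivative is exactly 1 for positive inputs and 0 for negative ones. The input points are arbitrary.

```python
import torch

x = torch.tensor([-10.0, -2.0, 0.0, 2.0, 10.0], requires_grad=True)

# Derivative of sigmoid: sigma(x) * (1 - sigma(x)), at most 0.25 (at x = 0)
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad.clone()

x.grad.zero_()
# Derivative of ReLU: 0 for x < 0, 1 for x > 0
torch.relu(x).sum().backward()
relu_grad = x.grad.clone()

print(sigmoid_grad)  # tiny at |x| = 10, largest (0.25) at x = 0
print(relu_grad)
```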
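2. nn.MSELoss()
$$ \mathrm{loss}(\hat{y},y)=\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2 $$
In PyTorch, `MSELoss()` has two boolean parameters, both now deprecated, plus the string parameter `reduction` that replaces them:
Parameter | Effect |
---|---|
size_average | whether to average (rather than just sum) the element-wise losses; deprecated |
reduce | whether to reduce the output to a scalar at all; deprecated |
reduction | 'mean' (default), 'sum' or 'none'; combines the two above |
An example. Note that the printed outputs below come from separate runs with different random inputs, so the numbers do not agree with one another: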
```python
import torch
import torch.nn as nn

input = torch.randn(2, 3, requires_grad=True)  # prediction
target = torch.ones(2, 3)                      # ground truth
print(f'input: {input}\n target: {target}')
```
Out:
```
input: tensor([[-0.0733, -2.2085, -0.6919],
[-1.1417, -1.1327, -1.5466]], requires_grad=True)
target: tensor([[1., 1., 1.],
[1., 1., 1.]])
```
default
By default size_average=True and reduce=True (equivalent to reduction='mean'), so a scalar is returned.
```python
loss_1 = nn.MSELoss()
output_1 = loss_1(input, target)
print(f'loss_1: {output_1}')
```
Out:
```
loss_1: 1.8783622980117798
```
size_average=False
The sum is not divided by n!
```python
loss_1 = nn.MSELoss(size_average=False)  # deprecated; same as reduction='sum'
output_1 = loss_1(input, target)
print(f'loss_1: {output_1}')
```
Out:
```
loss_1: 11.819371223449707
```
reduce=False
Returns a tensor of element-wise losses.
```python
loss_1 = nn.MSELoss(reduce=False)  # deprecated; same as reduction='none'
output_1 = loss_1(input, target)
print(f'loss_1: {output_1}')
```
Out:
```
loss_1: tensor([[0.0039, 0.2338, 3.5550],
[0.1358, 2.1851, 0.1533]], grad_fn=<MseLossBackward0>)
```
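As for reduction, it simply combines size_average and reduce into a single argument!
Here is part of PyTorch's internal legacy_get_string function, which maps the two legacy flags to a reduction string: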
```python
if size_average is None:
    size_average = True
if reduce is None:
    reduce = True

if size_average and reduce:
    ret = 'mean'
elif reduce:
    ret = 'sum'
else:
    ret = 'none'
if emit_warning:
    warnings.warn(warning.format(ret))
return ret
```
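Here is the same mapping seen from the outside, using only the current `reduction` argument. 'none' keeps the element-wise squared errors. 'sum' adds them up, and 'mean' divides that sum by the number of elements. The random input is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(2, 3)
target = torch.ones(2, 3)

loss_mean = nn.MSELoss(reduction='mean')(pred, target)  # size_average=True,  reduce=True
loss_sum = nn.MSELoss(reduction='sum')(pred, target)    # size_average=False, reduce=True
loss_none = nn.MSELoss(reduction='none')(pred, target)  # reduce=False

print(loss_none.shape)                                      # torch.Size([2, 3])
print(torch.allclose(loss_sum, loss_none.sum()))            # True
print(torch.allclose(loss_mean, loss_sum / pred.numel()))   # True
```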
3. Regularization
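The original linear model:
$$ f(\mathbf{x}) = b+\mathbf{w}^\mathrm{T}\mathbf{x} $$
that is:
$$ f(\mathbf{x})=b+w_1x_1+w_2x_2+\dots+w_nx_n $$
But how do we choose n? That is, how many features should x have? With too many dimensions the model easily overfits, and with too few it underfits. To weaken the influence of some dimensions and make the function smoother, we can add regularization:
$$ L(\mathbf{w},b)=\sum_i\left(y_i-(b+\mathbf{w}^\mathrm{T}\mathbf{x}_i)\right)^2+\lambda\sum_j w_j^2 $$
Keep in mind that overfitting and underfitting are about performance on the test set. On the training set, a higher dimension always fits at least as well, as shown in the figure:
A higher-dimensional model has a larger function space, which contains the best training-set function of any lower-dimensional model.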
4. Full code
```python
class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # Mean squared error loss
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        ''' Given input of size (batch_size x input_dim), compute output of the network '''
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss '''
        # TODO: you may implement L1/L2 regularization here
        return self.criterion(pred, target)
```
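One way to fill in the L2 TODO in `cal_loss` follows the regularization formula above: add λ times the sum of squared weights to the loss. This is a sketch, not the sample code. `NeuralNetL2` and its `l2_lambda` argument are hypothetical names. The data term is the mean (not the sum) of squared errors, as in `nn.MSELoss(reduction='mean')`, and biases are not penalized.

```python
import torch
import torch.nn as nn

class NeuralNetL2(nn.Module):
    """Hypothetical variant of NeuralNet whose cal_loss adds an L2 penalty."""
    def __init__(self, input_dim, l2_lambda=1e-4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.criterion = nn.MSELoss(reduction='mean')
        self.l2_lambda = l2_lambda  # assumed penalty strength (lambda in the formula)

    def forward(self, x):
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        mse = self.criterion(pred, target)
        # lambda * sum of squared weights; bias terms are excluded, as in the formula
        l2 = sum((p ** 2).sum() for name, p in self.named_parameters()
                 if name.endswith('weight'))
        return mse + self.l2_lambda * l2
```

Because the penalty is part of the loss, `loss.backward()` in the training loop picks up its gradient automatically.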
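Training
1. Basic functions
getattr()
getattr() is equivalent to the "." (attribute access) operator. Its parameters:
- object: an object instance
- name: a string, the name of one of the object's methods or attributes
- default: the value returned when the object has no such attribute
- Exceptions: raises AttributeError when the attribute does not exist and no default is given

getattr(object, name) is equivalent to object.name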
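This is how `train()` below uses `getattr`: it looks up an optimizer class by its name string from the config. The toy model and config values are only illustrations. The last line shows the `default` argument at work. `'NoSuchOptimizer'` is a deliberately nonexistent name.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
config = {'optimizer': 'SGD', 'optim_hparas': {'lr': 0.001, 'momentum': 0.9}}

# getattr(torch.optim, 'SGD') is the same object as torch.optim.SGD
optim_cls = getattr(torch.optim, config['optimizer'])
optimizer = optim_cls(model.parameters(), **config['optim_hparas'])

# With a default, a missing attribute returns the default instead of raising AttributeError
missing = getattr(torch.optim, 'NoSuchOptimizer', None)
print(optim_cls is torch.optim.SGD, missing)  # True None
```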
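model.train() vs. model.eval()
model.train()
Enables the training behavior of Batch Normalization and Dropout.
If the model contains BN (Batch Normalization) layers or Dropout, call model.train() during training. With model.train(), the BN layers use the mean and variance of each batch, and Dropout randomly drops a subset of connections while the parameters are trained and updated.
model.eval()
Disables the training behavior of Batch Normalization and Dropout.
If the model contains BN layers or Dropout, call model.eval() when testing. With model.eval(), the BN layers use the running mean and variance accumulated during training and keep them fixed during testing. Dropout uses all the network connections, i.e. no neurons are randomly dropped.
Once training is done, the resulting model is used on the test samples, and model.eval() must be called before model(test). Otherwise, even without any training step, feeding in data would still update the BN layers' running mean and variance, and Dropout would still drop units at random. This behavior comes from the BN and Dropout layers in the model.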
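Here is a small sketch of both effects, using standalone `nn.Dropout` and `nn.BatchNorm1d` modules. Calling `.train()`/`.eval()` on a whole model sets the same mode on all of its submodules. In training mode, Dropout zeroes random entries and scales the rest by 1/(1 − p), and BN updates its running mean. In eval mode, Dropout is the identity and BN's running statistics stay unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3) + 5.0  # a batch whose mean is far from 0

# Training mode: Dropout zeroes random entries and scales the others by 1 / (1 - p);
# BatchNorm normalizes with batch statistics and updates its running mean/var
drop.train(); bn.train()
y_train = drop(x)
bn(x)
mean_after_train = bn.running_mean.clone()

# Eval mode: Dropout is the identity; BatchNorm uses, but does not update, running stats
drop.eval(); bn.eval()
y_eval = drop(x)
bn(x)

print(torch.equal(y_eval, x))                          # True
print(torch.equal(bn.running_mean, mean_after_train))  # True
```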
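detach().cpu()
detach()
- Purpose: detaches the tensor from the computation graph, so gradients do not propagate back through it
- Returns: a tensor, still on the GPU if the original was there

cpu()
- Purpose: moves the data to the CPU
- Returns: a tensor

item()
- Purpose: returns the value of a tensor as a Python number (the tensor must contain exactly one element!)

numpy()
- Purpose: converts the tensor to a NumPy array (the tensor must be on the CPU and must not require gradients)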
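Here is the chain as the training loop uses it: detach from the graph, move to the CPU, then read out a Python number or a NumPy array. The tensor values are made up. The sketch runs on the GPU if one is available and on the CPU otherwise.

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
w = torch.tensor([2.0, 3.0], requires_grad=True, device=device)
loss = (w ** 2).sum()  # scalar that is part of the autograd graph

value = loss.detach().cpu().item()  # plain Python float, e.g. for logging
array = w.detach().cpu().numpy()    # NumPy array; .numpy() fails on a tensor that requires grad
print(type(value).__name__, value, array)  # float 13.0 [2. 3.]
```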
2. A standard pattern
When iterating over the epochs during training, we usually call optimizer.zero_grad(), loss.backward() and optimizer.step() in turn.
For example:
```python
while epoch < n_epochs:
    model.train()                           # set model to training mode
    for x, y in tr_set:                     # iterate through the dataloader
        optimizer.zero_grad()               # set gradient to zero
        x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
        pred = model(x)                     # forward pass (compute output)
        mse_loss = model.cal_loss(pred, y)  # compute loss
        mse_loss.backward()                 # compute gradient (backpropagation)
        optimizer.step()                    # update model with optimizer
        loss_record['train'].append(mse_loss.detach().cpu().item())
    ...
```
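In summary, they do the following:
optimizer.zero_grad(): resets the gradients to zero
- PyTorch accumulates gradients across backward() calls, and training usually uses mini-batches. Without zeroing, the gradient would also include contributions from the previous batch, so this call must come before backpropagation and the parameter update.

loss.backward(): backpropagation, which computes the gradient of every parameter
- Without calling tensor.backward(), the parameters' gradients are None, so loss.backward() must come before optimizer.step().

optimizer.step(): updates the parameters by gradient descent
- step() performs a single optimization step, updating the parameter values by gradient descent. The update is based on the gradients, so loss.backward() must be called before optimizer.step().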
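Here is a one-parameter sketch of why `zero_grad()` matters. Two backward passes without clearing add their gradients together. After `zero_grad()`, only the current gradient remains, and `step()` then applies w ← w − lr · grad. The toy loss 3w is arbitrary.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes WITHOUT zero_grad: the gradients add up
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()  # 3 + 3 = 6

# With zero_grad before backward, only the current batch's gradient remains
optimizer.zero_grad()
(3 * w).backward()
fresh = w.grad.item()        # 3

optimizer.step()             # w <- w - lr * grad = 1.0 - 0.1 * 3
print(accumulated, fresh, w.item())
```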
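Common variables in these functions:
- param_groups: when an Optimizer is instantiated, its constructor builds the list param_groups. It holds num_groups param_group dicts, where num_groups is the number of parameter groups passed in when the optimizer was defined. For SGD, each param_group contains the keys 'params', 'lr', 'momentum', 'dampening', 'weight_decay' and 'nesterov'. Newer PyTorch versions add a few more, such as 'maximize'.
- param_group['params']: the list of model parameters passed in for that group when the Optimizer was instantiated. If the parameters are not grouped, it holds all of model.parameters(). Each entry is a torch.nn.parameter.Parameter object.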
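Here is a sketch of `param_groups` with two groups. The first layer gets its own learning rate, and the second layer falls back to the default passed to the constructor. The layer sizes and learning rates are arbitrary. The last loop shows a common use of `param_groups`: changing the learning rate by hand during training.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Two parameter groups: the first layer uses its own learning rate
optimizer = torch.optim.SGD(
    [
        {'params': model[0].parameters(), 'lr': 0.01},
        {'params': model[2].parameters()},  # falls back to the default lr below
    ],
    lr=0.001, momentum=0.9,
)

for group in optimizer.param_groups:
    # lr, momentum, and number of tensors (weight + bias) in the group
    print(group['lr'], group['momentum'], len(group['params']))

# Halve every group's learning rate in place
for group in optimizer.param_groups:
    group['lr'] *= 0.5
```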
3. Full code
```python
def train(tr_set, dv_set, model, config, device):
    ''' DNN training '''

    n_epochs = config['n_epochs']  # Maximum number of epochs

    # Setup optimizer
    optimizer = getattr(torch.optim, config['optimizer'])(
        model.parameters(), **config['optim_hparas'])

    min_mse = 1000.
    loss_record = {'train': [], 'dev': []}      # for recording training loss
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())

        # After each epoch, test your model on the validation (development) set.
        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            # Save model if your model improved
            min_mse = dev_mse
            print('Saving model (epoch = {:4d}, loss = {:.4f})'
                  .format(epoch + 1, min_mse))
            torch.save(model.state_dict(), config['save_path'])  # Save model to specified path
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # Stop training if your model stops improving for "config['early_stop']" epochs.
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record
```
Validation
dev() is very similar to train(), but note that the model is put in eval() mode, so BN and Dropout behave as they do at inference time, and gradient computation is switched off with torch.no_grad().
Full code
```python
def dev(dv_set, model, device):
    model.eval()                                # set model to evaluation mode
    total_loss = 0
    for x, y in dv_set:                         # iterate through the dataloader
        x, y = x.to(device), y.to(device)       # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
        total_loss += mse_loss.detach().cpu().item() * len(x)  # accumulate loss
    total_loss = total_loss / len(dv_set.dataset)              # compute averaged loss

    return total_loss
```
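Why `* len(x)`? Each batch loss is a mean, and the last batch may be smaller. Multiplying by the batch size recovers each batch's sum of squared errors, and dividing by the dataset size then gives the exact MSE over the whole dev set. The sketch checks this with random stand-in "predictions" in a `TensorDataset`, split into batches of 4, 4 and 2.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
pred_all = torch.randn(10)    # stand-in predictions
target_all = torch.randn(10)  # stand-in targets
loader = DataLoader(TensorDataset(pred_all, target_all), batch_size=4)  # batches of 4, 4, 2

criterion = nn.MSELoss(reduction='mean')
total_loss = 0.0
for p, y in loader:
    # Each batch loss is a mean, so multiply by the batch size to recover the batch sum
    total_loss += criterion(p, y).item() * len(p)
total_loss /= len(loader.dataset)

overall = criterion(pred_all, target_all).item()
print(abs(total_loss - overall) < 1e-5)  # True
```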
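Testing
1. torch.cat()
cat() concatenates a sequence of tensors.
Parameters:
- tensors: the sequence of tensors to concatenate, all of the same type
- dim: the dimension along which the tensors are concatenated. It must be a valid dimension of the input tensors, i.e. from 0 to (number of dimensions - 1); negative values count from the end

Notes
- The input must be a sequence of tensors of the same type, whose shapes match in every dimension except dim
- dim cannot exceed the number of dimensions of the input tensors

For example: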
```python
import torch

t1 = torch.Tensor([1, 2, 3])
t2 = torch.Tensor([4, 5, 6])
t3 = torch.Tensor([7, 8, 9])
tensors = [t1, t2, t3]  # avoid shadowing the built-in list
t = torch.cat(tensors, dim=0)
print(t)
```
Out:
```
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
```
Changing dim:
```python
t1 = torch.Tensor([[1, 2, 3], [3, 2, 1]])
t2 = torch.Tensor([[4, 5, 6], [6, 5, 4]])
t3 = torch.Tensor([[7, 8, 9], [9, 8, 7]])
tensors = [t1, t2, t3]
t = torch.cat(tensors, dim=1)
print(t)
```
Out:
```
tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.],
[3., 2., 1., 6., 5., 4., 9., 8., 7.]])
```
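In `test()` below, `torch.cat` joins per-batch predictions whose sizes differ. This works because only the concatenated dimension may differ. The batch sizes here assume the 893 test samples are split with batch_size=270, which gives 270 + 270 + 270 + 83.

```python
import torch

# Per-batch predictions: three full batches of 270 and a last batch of 83 (drop_last=False)
batch_preds = [torch.randn(270), torch.randn(270), torch.randn(270), torch.randn(83)]

preds = torch.cat(batch_preds, dim=0)  # sizes may differ only along dim 0
print(preds.shape)                     # torch.Size([893])

# For 2-D tensors, every dimension except the concatenated one must match
a = torch.zeros(2, 3)
b = torch.zeros(4, 3)
print(torch.cat([a, b], dim=0).shape)  # torch.Size([6, 3])
```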
2. Full code
```python
def test(tt_set, model, device):
    model.eval()                                # set model to evaluation mode
    preds = []
    for x in tt_set:                            # iterate through the dataloader
        x = x.to(device)                        # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            preds.append(pred.detach().cpu())   # collect prediction
    preds = torch.cat(preds, dim=0).numpy()     # concatenate all predictions and convert to a numpy array
    return preds
```
Setting hyperparameters
1. Full code
```python
device = get_device()                 # get the current available device ('cpu' or 'cuda')
os.makedirs('models', exist_ok=True)  # The trained model will be saved to ./models/
target_only = False                   # TODO: Using 40 states & 2 tested_positive features

# TODO: How to tune these hyper-parameters to improve your model's performance?
config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9              # momentum for SGD
    },
    'early_stop': 200,               # early stopping epochs (the number epochs since your model's last improvement)
    'save_path': 'models/model.pth'  # your model will be saved here
}
```
Loading the data and the model
1. Full code
```python
tr_set = prep_dataloader(tr_path, 'train', config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)
model = NeuralNet(tr_set.dataset.dim).to(device)  # Construct model and move to device
```
Start!
```python
model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)
```
Visualization
1. training
First, how the MSE loss changes as training goes on.
```python
plot_learning_curve(model_loss_record, title='deep model')
```
Next, how good the predictions are.
```python
del model
model = NeuralNet(tr_set.dataset.dim).to(device)
ckpt = torch.load(config['save_path'], map_location='cpu')  # Load your best model
model.load_state_dict(ckpt)
plot_pred(dv_set, model, device)  # Show prediction on the validation set
```
Points on the blue line are those where the prediction equals the actual value.
2. testing
```python
def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    with open(file, 'w', newline='') as fp:  # newline='' avoids blank rows on Windows
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

preds = test(tt_set, model, device)  # predict COVID-19 cases with your model
save_pred(preds, 'pred.csv')         # save prediction file to pred.csv
```
The resulting preds:
Improvements
We need to modify the sample code to make the model better!
Public leaderboard
- simple baseline: 2.04826
- medium baseline: 1.36937
- strong baseline: 0.89266
Hints
- Feature selection (what other features are useful?)
- DNN architecture (layers? dimension? activation function?)
- Training (mini-batch? optimizer? learning rate?)
- L2 regularization
- There are some mistakes in the sample code, can you find them?
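TODO1: Change the features used for training
In COVID19Dataset we can change which features are extracted. This branch only runs when target_only = True in the hyperparameter section.
```python
if not target_only:
    # feats holds the column index in data of each feature that is used
    feats = list(range(93))  # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
else:
    # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
    # Use only 42 features
    feats = list(range(40))
    feats.append(57)
    feats.append(75)
```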
Let's look at the final result!
Convergence is much faster!
Score:
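TODO2: Add regularization
Add regularization when setting the hyperparameters, through the optimizer's weight_decay:
```python
config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9,             # momentum for SGD
        'weight_decay': 0.1          # regularization
    },
    'early_stop': 200,               # early stopping epochs (the number epochs since your model's last improvement)
    'save_path': 'models/model.pth'  # your model will be saved here
}
```
The result improved, but not noticeably, and convergence became slower (which makes sense)...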
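Why does weight_decay count as regularization? For `torch.optim.SGD`, weight_decay=wd adds wd · w to each parameter's gradient. That is exactly the gradient of an extra loss term (wd / 2) · ‖w‖², i.e. an L2 penalty with λ = wd / 2. Note that weight_decay applies to every parameter it is given, biases included. The sketch checks the equivalence on a single SGD step, using arbitrary numbers and a linear "loss" w · x.

```python
import torch

lr, wd = 0.1, 0.01
w1 = torch.tensor([1.0, -2.0], requires_grad=True)
w2 = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([0.5, 0.3])

# (a) one SGD step on the data loss, with weight_decay
opt1 = torch.optim.SGD([w1], lr=lr, weight_decay=wd)
(w1 @ x).backward()
opt1.step()

# (b) the same step with an explicit penalty (wd / 2) * ||w||^2 added to the loss
opt2 = torch.optim.SGD([w2], lr=lr)
((w2 @ x) + wd / 2 * (w2 ** 2).sum()).backward()
opt2.step()

print(torch.allclose(w1, w2))  # True
```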
TODO3: Change the network architecture
Add more layers
```python
class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
    ...
```
This was tuning by trial and error: I just added one more linear layer and one more ReLU...
It helped somewhat, but not significantly...
PReLU
Replacing the ReLU layers with PReLU made the results much worse...
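For reference, this is what the swap amounts to. `nn.PReLU` multiplies negative inputs by a learnable slope `a`, which starts at 0.25, and passes positive inputs through unchanged. The slope is an extra parameter that the optimizer (and weight_decay, if set) also acts on. The layer sizes follow TODO3. `input_dim = 42` assumes the TODO1 feature set.

```python
import torch
import torch.nn as nn

prelu = nn.PReLU()  # one learnable slope a for negative inputs, initialized to 0.25
x = torch.tensor([-2.0, 0.0, 3.0])
print(prelu(x))     # negative inputs are multiplied by a: -2 -> -0.5
print(prelu.weight) # the slope is a Parameter, so the optimizer updates it

input_dim = 42  # assumption: the 42 features selected in TODO1
net = nn.Sequential(
    nn.Linear(input_dim, 128), nn.PReLU(),
    nn.Linear(128, 64), nn.PReLU(),
    nn.Linear(64, 1),
)
```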
TODO4: Optimizer
Adam
Hard to judge; slightly worse than SGD.
And the MSE loss seems to fluctuate quite a lot...
TODO5: Fix a mistake
The test set should be standardized with the mean and standard deviation of the training set. The model was trained on features scaled with the training statistics, so new inputs must be scaled the same way. A smaller set's own mean and variance may also fail to reflect the characteristics of the data as a whole.
```python
if mode == 'test':
    # Testing data
    # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
    data = data[:, feats]
    self.data = torch.FloatTensor(data)
    # NOTE: this branch still standardizes with the test set's own mean/std
    self.data[:, 40:] = \
        (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
        / self.data[:, 40:].std(dim=0, keepdim=True)
else:
    # Training data (train/dev sets)
    # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
    target = data[:, -1]
    data = data[:, feats]
    indices_train = [i for i in range(len(data)) if i % 10 != 0]
    tr_data = torch.FloatTensor(data[indices_train])
    tr_mean = tr_data[:, 40:].mean(dim=0, keepdim=True)
    tr_std = tr_data[:, 40:].std(dim=0, keepdim=True)
    # Splitting training data into train & dev sets
    if mode == 'train':
        indices = indices_train
        self.data = tr_data
        self.target = torch.FloatTensor(target[indices])
    elif mode == 'dev':
        indices = [i for i in range(len(data)) if i % 10 == 0]
        self.data = torch.FloatTensor(data[indices])
        self.target = torch.FloatTensor(target[indices])
    # Standardize train and dev with the training-set statistics
    self.data[:, 40:] = \
        (self.data[:, 40:] - tr_mean) \
        / tr_std
```
But the result was strange: the score actually got much worse...
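The test branch above still uses the test set's own statistics. One way to apply the training statistics to the test set as well is to compute them once from the training data and pass them in. This sketch assumes that approach: `standardize` is a hypothetical helper, random tensors stand in for the feature tensors, and the sample code's `COVID19Dataset` takes no such statistics argument, so it would need to be extended to accept them.

```python
import torch

def standardize(data, mean=None, std=None, start_col=40):
    """Hypothetical helper: z-score columns start_col: using the given statistics.
    If mean/std are None, they are computed from `data` itself (use this for the training set)."""
    if mean is None:
        mean = data[:, start_col:].mean(dim=0, keepdim=True)
        std = data[:, start_col:].std(dim=0, keepdim=True)
    out = data.clone()
    out[:, start_col:] = (out[:, start_col:] - mean) / std
    return out, mean, std

torch.manual_seed(0)
train = torch.randn(100, 45) * 3 + 7  # stand-in for the training features
test = torch.randn(20, 45) * 3 + 7    # stand-in for the test features

train_n, tr_mean, tr_std = standardize(train)               # fit statistics on training data
test_n, _, _ = standardize(test, mean=tr_mean, std=tr_std)  # reuse them for the test set
```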
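Reflections
I spent two afternoons tuning, and in the end it still wasn't as good as my first attempt... I feel I'm not yet familiar enough with the model and have no tuning experience. I'll leave this homework here for now; maybe someday, with more knowledge, I'll get past the strong baseline...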
Some useful links
Colab: ML2021Spring - HW1.ipynb - Colaboratory (google.com)
PyTorch Tutorial P1: ML2021 Pytorch tutorial part 1 - YouTube
Kaggle: ML2021Spring-hw1 | Kaggle
Study notes: Notes on Hung-yi Lee's Spring 2021 Machine Learning course videos, part 1: Introduction, Colab & PyTorch Tutorials, HW1 (诸神缄默不语's blog, CSDN)
Regularization: How to implement L2 and L1 regularization in PyTorch (pan_jinquan's blog, CSDN)
Activation Function: Understanding and summary of common activation functions (tyhj_sf's blog, CSDN)
HWExample: Hung-yi Lee 2021 Machine Learning course, Homework 1 (Jianshu, jianshu.com)
Cross baseline: Hung-yi Lee ML2021Spring HW1 - lizhi334 (cnblogs.com)
Reference
train() and eval(): PyTorch: usage of and difference between model.train() and model.eval(), and the difference between model.eval() and torch.no_grad() (初识-CV's blog, CSDN)
optimizer.step(): Understanding the role and principle of optimizer.zero_grad(), loss.backward() and optimizer.step() (PanYHHH's blog, CSDN)
Source: Heng-Jui Chang @ NTUEE (https://github.com/ga642381/M...)