In Learning Deep Learning from Real Projects (Part 1) we walked through the code of the pix2pixHD project. This is Part 2 of the series, in which we study the code of the DeepSC project.
DeepSC Project Overview
DeepSC (Deep Learning Enabled Semantic Communication Systems) is a classic work on semantic communication for text sources. Its theory is not restated here; see the paper for the details. For this walkthrough it is enough to know its network architecture, shown in the figure below:
The details of DeepSC will be matched to the code one by one as we go through the project. The GitHub links are given below; here we focus on the PyTorch version:
DeepSC (PyTorch version)
DeepSC (TensorFlow version)
Start by reading README.md to get an overall picture of the project. The DeepSC README is fairly terse and consists of the following parts:
Requirements: the Python packages DeepSC needs are listed in requirements.txt and can be installed by running
pip install -r requirements.txt
The packages are: torch, nltk, w3lib, tqdm, sklearn, bert4keras==0.4.2
Preprocess: since DeepSC works on text sources, the text needs some preprocessing before it can be fed into the network; the README gives the commands for this:
mkdir data
wget http://www.statmt.org/europarl/v7/europarl.tgz
tar zxvf europarl.tgz
python preprocess_text.py
Train: to train the model, simply run
python main.py
The README also warns that the λ weight on the mutual-information part of the loss should be set carefully; we will come back to this when we reach the code. Evaluation: to evaluate the model, simply run
python performance.py
Finally, the README notes that if you want to compute Sentence Similarity you need to download a BERT model.
Studying the DeepSC Project
The study of the DeepSC project can be split into three parts: text preprocessing, training, and evaluation. From the previous section we know that training runs main.py and evaluation runs performance.py, so these two Python files are the natural entry points.
Text Preprocessing
From the commands above, the text preprocessing is done by preprocess_text.py, so this part starts from that file.
Training
The training part of DeepSC starts from main.py. Don't worry about the modules imported at the top of the file for now; they will be covered when they are used. For convenience, main.py is split here into four parts by functionality (excluding the imports at the top), studied one at a time.
Part 1 of main.py: setting the project arguments
This part sits at the very top of main.py, as shown below:
parser = argparse.ArgumentParser()
#parser.add_argument('--data-dir', default='data/train_data.pkl', type=str)
parser.add_argument('--vocab-file', default='europarl/Europarl_vocabulary.json', type=str)
parser.add_argument('--checkpoint-path', default='checkpoints/deepsc-Rayleigh', type=str)
parser.add_argument('--channel', default='AWGN', type=str, help = 'Please choose AWGN, Rayleigh, and Rician')
parser.add_argument('--MAX-LENGTH', default=30, type=int)
parser.add_argument('--MIN-LENGTH', default=4, type=int)
parser.add_argument('--d-model', default=128, type=int)
parser.add_argument('--dff', default=512, type=int)
parser.add_argument('--num-layers', default=4, type=int)
parser.add_argument('--num-heads', default=8, type=int)
parser.add_argument('--batch-size', default=128, type=int)
parser.add_argument('--epochs', default=80, type=int)
argparse is Python's standard module for parsing command-line arguments and options, and it makes it easy to build user-friendly command-line interfaces. Using argparse boils down to the following five steps:
1. import argparse: import the module.
2. parser = argparse.ArgumentParser(description='xxxx'): create a parser object, which contains everything needed to parse the command line into Python data; description is the parser's description text, shown when the command is run with -h.
3. parser.add_argument('xxx', type=int, help='xxx'): register a command-line argument; the arguments are, in order, the argument name, its type (int here; the default type is str), and a help string.
4. args = parser.parse_args(): the ArgumentParser parses the command line and returns the parsed arguments.
5. Use the parsed arguments to do the actual work.
The code above covers the first three steps. Step 4, args = parser.parse_args(), is performed in main.py inside the block guarded by the built-in __name__ attribute (i.e. if __name__ == '__main__'); the parsed arguments are used later on. A minimal end-to-end sketch of the five steps is given below.
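For reference, here is a minimal, self-contained sketch (not from the repo) that runs the five steps end to end:
import argparse

# Step 2: create the parser
parser = argparse.ArgumentParser(description='toy example')
# Step 3: register arguments (default type is str unless type= is given)
parser.add_argument('--epochs', default=80, type=int, help='number of training epochs')
parser.add_argument('--channel', default='AWGN', type=str, help='AWGN, Rayleigh or Rician')

if __name__ == '__main__':
    # Step 4: parse the command line
    args = parser.parse_args()
    # Step 5: use the parsed values
    print(args.epochs, args.channel)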
Part 2 of main.py: loading the preprocessed data
The code for this part is shown below:
args.vocab_file = 'C/import/antennas/Datasets/hx301/' + args.vocab_file
""" preparing the dataset """
vocab = json.load(open(args.vocab_file, 'rb'))
token_to_idx = vocab['token_to_idx']
num_vocab = len(token_to_idx)
pad_idx = token_to_idx["<PAD>"]
start_idx = token_to_idx["<START>"]
end_idx = token_to_idx["<END>"]
This part loads the vocabulary saved earlier during text preprocessing: token_to_idx is the dictionary mapping tokens to indices, and the indices of the <PAD>, <START> and <END> tokens are extracted for later use.
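To make the structure concrete, the vocabulary file presumably looks something like this hypothetical miniature (the real Europarl_vocabulary.json produced by preprocess_text.py is of course much larger):
# Hypothetical miniature with the same structure as Europarl_vocabulary.json
vocab = {
    "token_to_idx": {"<PAD>": 0, "<START>": 1, "<END>": 2, "<UNK>": 3,
                     "the": 4, "european": 5, "parliament": 6}
}
token_to_idx = vocab["token_to_idx"]
num_vocab = len(token_to_idx)        # vocabulary size -> output size of the final dense layer
pad_idx, start_idx, end_idx = (token_to_idx[t] for t in ("<PAD>", "<START>", "<END>"))
print(num_vocab, pad_idx, start_idx, end_idx)   # 7 0 1 2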
Part 3 of main.py: defining the model and optimizers
The code for this part is shown below:
""" define optimizer and loss function """
deepsc = DeepSC(args.num_layers, num_vocab, num_vocab,
num_vocab, num_vocab, args.d_model, args.num_heads,
args.dff, 0.1).to(device)
mi_net = Mine().to(device)
criterion = nn.CrossEntropyLoss(reduction = 'none')
optimizer = torch.optim.Adam(deepsc.parameters(),
lr=1e-4, betas=(0.9, 0.98), eps=1e-8, weight_decay = 5e-4)
mi_opt = torch.optim.Adam(mi_net.parameters(), lr=1e-3)
initNetParams(deepsc)
This part defines two models. One is deepsc, the semantic communication system described in the DeepSC paper; the other is mi_net, the mutual information estimation model mentioned in the paper. I have not yet read the paper behind that network in detail, so I will not say much about it here and will keep the focus on the semantic communication system. The rest of this part defines the loss function and the optimizers: the loss is cross-entropy, and the deepsc model is optimized with Adam.
Note that the models are moved with a .to(device) call, where device is defined earlier as:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
so the models are loaded onto the GPU when one is available (and onto the CPU otherwise).
Two things in this part deserve a closer look: the DeepSC class, which defines the whole network model, and the initNetParams() function, which initializes the network weights. They are introduced in the following subsections.
The DeepSC class
The code of the DeepSC class is shown below:
class DeepSC(nn.Module):
def __init__(self, num_layers, src_vocab_size, trg_vocab_size, src_max_len,
trg_max_len, d_model, num_heads, dff, dropout = 0.1):
super(DeepSC, self).__init__()
self.encoder = Encoder(num_layers, src_vocab_size, src_max_len,
d_model, num_heads, dff, dropout)
self.channel_encoder = nn.Sequential(nn.Linear(d_model, 256),
#nn.ELU(inplace=True),
nn.ReLU(inplace=True),
nn.Linear(256, 16))
self.channel_decoder = ChannelDecoder(16, d_model, 512)
self.decoder = Decoder(num_layers, trg_vocab_size, trg_max_len,
d_model, num_heads, dff, dropout)
self.dense = nn.Linear(d_model, trg_vocab_size)
DeepSC inherits from nn.Module, and its constructor defines several sub-modules:
- encoder
- channel_encoder
- channel_decoder
- decoder
- dense
If you have read the DeepSC paper, the mapping is clear: encoder and decoder are the semantic encoder and semantic decoder, channel_encoder and channel_decoder are the channel encoder and channel decoder, and dense is the final output layer, a fully connected layer whose output size equals the vocabulary size.
Note that although DeepSC inherits from nn.Module, it does not define a forward() method; later code performs the forward pass by calling the instance's sub-modules directly, roughly as in the sketch below.
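To make this concrete, here is a minimal smoke-test sketch (not from the repo; the import path and hyper-parameter values are assumptions) that chains the sub-modules the same way train_step() will later, with the channel and the masks omitted:
import torch
from models.transceiver import DeepSC   # import path assumed; adjust to where DeepSC lives

# Dummy hyper-parameters, only for a shape check
num_layers, vocab_size, d_model, num_heads, dff = 4, 100, 128, 8, 512
model = DeepSC(num_layers, vocab_size, vocab_size, vocab_size, vocab_size,
               d_model, num_heads, dff, 0.1)

src = torch.randint(0, vocab_size, (2, 30))      # [batch_size, seq_len]
trg_inp = src[:, :-1]                            # decoder input (teacher forcing)

enc_output = model.encoder(src, None)            # semantic encoder (mask omitted)
tx = model.channel_encoder(enc_output)           # channel encoder -> [2, 30, 16]
dec_input = model.channel_decoder(tx)            # channel itself omitted in this sketch
dec_output = model.decoder(trg_inp, dec_input, None, None)
pred = model.dense(dec_output)                   # [2, 29, vocab_size]
print(pred.shape)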
The sub-modules are introduced next, in the order semantic encoder, channel encoder, channel decoder, semantic decoder.
Semantic encoder
The semantic encoder is defined by the Encoder class, whose code is shown below:
class Encoder(nn.Module):
"Core encoder is a stack of N layers"
def __init__(self, num_layers, src_vocab_size, max_len,
d_model, num_heads, dff, dropout = 0.1):
super(Encoder, self).__init__()
self.d_model = d_model
self.embedding = nn.Embedding(src_vocab_size, d_model)
self.pos_encoding = PositionalEncoding(d_model, dropout, max_len)
self.enc_layers = nn.ModuleList([EncoderLayer(d_model, num_heads, dff, dropout)
for _ in range(num_layers)])
def forward(self, x, src_mask):
"Pass the input (and mask) through each layer in turn."
# the input size of x is [batch_size, seq_len]
x = self.embedding(x) * math.sqrt(self.d_model)
x = self.pos_encoding(x)
for enc_layer in self.enc_layers:
x = enc_layer(x, src_mask)
return x
As the DeepSC paper states, the semantic encoder is simply the Transformer encoder. The forward method of Encoder shows the same thing: the input is embedded, positionally encoded, and then passed through a stack of encoder layers. For reference, the architecture diagram from the original Transformer paper is shown below:
Building the Transformer encoder thus comes down to three parts: the input embedding, the positional encoding, and the encoder layers.
- Input embedding: an instance of nn.Embedding, which is simply a lookup table holding the embedding of every entry in the vocabulary; the input is a vocabulary index and the output is the corresponding word embedding.
- Positional encoding: defined by the PositionalEncoding class, whose code is shown below:
class PositionalEncoding(nn.Module):
    "Implement the PE function."
    def __init__(self, d_model, dropout, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)

        # Compute the positional encodings once in log space.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1) # [max_len, 1]
        div_term = torch.exp(torch.arange(0, d_model, 2) *
                             -(math.log(10000.0) / d_model)) #math.log(math.exp(1)) = 1
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0) #[1, max_len, d_model]
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        x = self.dropout(x)
        return x
The constructor of PositionalEncoding first creates pe, an all-zero tensor of shape [max_len, d_model], to hold the positional embeddings computed next; torch.exp() and torch.sin()/torch.cos() then compute the Transformer positional embeddings; finally register_buffer() is called. This method registers a set of parameters that are not updated during training (calling optimizer.step() never changes them; they can only be changed manually), but that are still saved as an integral part of the model's parameters when the model is saved. For more details see this article:
PyTorch nn.Module中的self.register_buffer()解析
In the forward method the input is simply added to the positional embedding, which is sliced according to the input's sequence length. A small standalone demo of register_buffer() follows.
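A tiny standalone demo (not from the repo) makes the behaviour of register_buffer() visible: the buffer shows up in state_dict() but not among the trainable parameters:
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('pe', torch.zeros(3))  # buffer: saved, but never trained
        self.fc = nn.Linear(3, 1)                   # parameter: saved and trained

demo = Demo()
print([name for name, _ in demo.named_parameters()])  # ['fc.weight', 'fc.bias']
print(list(demo.state_dict().keys()))                 # ['pe', 'fc.weight', 'fc.bias']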
- Encoder layers: nn.ModuleList is a container that stores modules; here it holds num_layers encoder layers, as set by the input argument. A single encoder layer is defined by the EncoderLayer class, whose code is shown below:
class EncoderLayer(nn.Module):
    "Encoder is made up of self-attn and feed forward (defined below)"
    def __init__(self, d_model, num_heads, dff, dropout = 0.1):
        super(EncoderLayer, self).__init__()
        self.mha = MultiHeadedAttention(num_heads, d_model, dropout = 0.1)
        self.ffn = PositionwiseFeedForward(d_model, dff, dropout = 0.1)
        self.layernorm1 = nn.LayerNorm(d_model, eps=1e-6)
        self.layernorm2 = nn.LayerNorm(d_model, eps=1e-6)

    def forward(self, x, mask):
        "Follow Figure 1 (left) for connections."
        attn_output = self.mha(x, x, x, mask)
        x = self.layernorm1(x + attn_output)

        ffn_output = self.ffn(x)
        x = self.layernorm2(x + ffn_output)
        return x
EncoderLayer implements the block structure of the Transformer encoder: the mha attribute is the multi-head attention, ffn is the position-wise feed-forward network, and the forward method uses residual connections followed by Layer Normalization. mha and ffn are implemented by the MultiHeadedAttention and PositionwiseFeedForward classes respectively, so let's look at those next.
First, the code of the MultiHeadedAttention class:
class MultiHeadedAttention(nn.Module):
def __init__(self, num_heads, d_model, dropout=0.1):
"Take in model size and number of heads."
super(MultiHeadedAttention, self).__init__()
assert d_model % num_heads == 0
# We assume d_v always equals d_k
self.d_k = d_model // num_heads
self.num_heads = num_heads
self.wq = nn.Linear(d_model, d_model)
self.wk = nn.Linear(d_model, d_model)
self.wv = nn.Linear(d_model, d_model)
self.dense = nn.Linear(d_model, d_model)
#self.linears = clones(nn.Linear(d_model, d_model), 4)
self.attn = None
self.dropout = nn.Dropout(p=dropout)
def forward(self, query, key, value, mask=None):
"Implements Figure 2"
if mask is not None:
# Same mask applied to all h heads.
mask = mask.unsqueeze(1)
nbatches = query.size(0)
# 1) Do all the linear projections in batch from d_model => h x d_k
query = self.wq(query).view(nbatches, -1, self.num_heads, self.d_k)
query = query.transpose(1, 2)
key = self.wk(key).view(nbatches, -1, self.num_heads, self.d_k)
key = key.transpose(1, 2)
value = self.wv(value).view(nbatches, -1, self.num_heads, self.d_k)
value = value.transpose(1, 2)
# query, key, value = \
# [l(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
# for l, x in zip(self.linears, (query, key, value))]
# 2) Apply attention on all the projected vectors in batch.
x, self.attn = self.attention(query, key, value, mask=mask)
# 3) "Concat" using a view and apply a final linear.
x = x.transpose(1, 2).contiguous() \
.view(nbatches, -1, self.num_heads * self.d_k)
x = self.dense(x)
x = self.dropout(x)
return x
In this class the forward method is the main thing to look at: the input Q, K and V are first linearly projected, attention is then computed (by the attention() method), and finally a fully connected layer aggregates the information from all heads.
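The attention() method itself is not listed in this post. What it computes is the standard scaled dot-product attention, along the lines of the sketch below (the repo's own implementation may differ in details such as the masking convention):
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value, mask=None, dropout=None):
    """Textbook scaled dot-product attention.
    Shapes: query/key/value are [batch, heads, seq_len, d_k]."""
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)   # block masked positions
    p_attn = F.softmax(scores, dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn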
Next, the code of the PositionwiseFeedForward class:
class PositionwiseFeedForward(nn.Module):
"Implements FFN equation."
def __init__(self, d_model, d_ff, dropout=0.1):
super(PositionwiseFeedForward, self).__init__()
self.w_1 = nn.Linear(d_model, d_ff)
self.w_2 = nn.Linear(d_ff, d_model)
self.dropout = nn.Dropout(dropout)
def forward(self, x):
x = self.w_1(x)
x = F.relu(x)
x = self.w_2(x)
x = self.dropout(x)
return x
This class implements a simple two-layer fully connected network: the input dimension is d_model, the hidden dimension is d_ff, and the output dimension is d_model again.
Channel encoder
The channel encoder is defined directly with nn.Sequential and consists of two fully connected layers; a quick shape check is shown below.
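To make the dimensions concrete, here is a standalone shape check that mirrors the channel encoder (assuming the default d_model=128, a batch of 2 and a sequence length of 30):
import torch
import torch.nn as nn

# Same layer sizes as in DeepSC.channel_encoder
channel_encoder = nn.Sequential(nn.Linear(128, 256),
                                nn.ReLU(inplace=True),
                                nn.Linear(256, 16))
x = torch.rand(2, 30, 128)        # [batch, seq_len, d_model] coming from the semantic encoder
print(channel_encoder(x).shape)   # torch.Size([2, 30, 16]) -> 16 values per token go over the channel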
Channel decoder
The channel decoder is defined by the ChannelDecoder class, whose code is shown below:
class ChannelDecoder(nn.Module):
def __init__(self, in_features, size1, size2):
super(ChannelDecoder, self).__init__()
self.linear1 = nn.Linear(in_features, size1)
self.linear2 = nn.Linear(size1, size2)
self.linear3 = nn.Linear(size2, size1)
# self.linear4 = nn.Linear(size1, d_model)
self.layernorm = nn.LayerNorm(size1, eps=1e-6)
def forward(self, x):
x1 = self.linear1(x)
x2 = F.relu(x1)
x3 = self.linear2(x2)
x4 = F.relu(x3)
x5 = self.linear3(x4)
output = self.layernorm(x1 + x5)
return output
The channel decoder is a three-layer fully connected network; its output is added to the output of the first layer (x1 + x5) and passed through Layer Normalization to produce the final result.
Semantic decoder
The semantic decoder is defined by the Decoder class, whose code is shown below:
class Decoder(nn.Module):
def __init__(self, num_layers, trg_vocab_size, max_len,
d_model, num_heads, dff, dropout = 0.1):
super(Decoder, self).__init__()
self.d_model = d_model
self.embedding = nn.Embedding(trg_vocab_size, d_model)
self.pos_encoding = PositionalEncoding(d_model, dropout, max_len)
self.dec_layers = nn.ModuleList([DecoderLayer(d_model, num_heads, dff, dropout)
for _ in range(num_layers)])
def forward(self, x, memory, look_ahead_mask, trg_padding_mask):
x = self.embedding(x) * math.sqrt(self.d_model)
x = self.pos_encoding(x)
for dec_layer in self.dec_layers:
x = dec_layer(x, memory, look_ahead_mask, trg_padding_mask)
return x
As the DeepSC paper states, the semantic decoder is simply the Transformer decoder; for reference, the architecture diagram from the original Transformer paper is shown below:
Most of this mirrors the Encoder class; the only difference is that the decoder layers are defined by the DecoderLayer class, whose code is shown below:
class DecoderLayer(nn.Module):
"Decoder is made of self-attn, src-attn, and feed forward (defined below)"
def __init__(self, d_model, num_heads, dff, dropout):
super(DecoderLayer, self).__init__()
self.self_mha = MultiHeadedAttention(num_heads, d_model, dropout = 0.1)
self.src_mha = MultiHeadedAttention(num_heads, d_model, dropout = 0.1)
self.ffn = PositionwiseFeedForward(d_model, dff, dropout = 0.1)
self.layernorm1 = nn.LayerNorm(d_model, eps=1e-6)
self.layernorm2 = nn.LayerNorm(d_model, eps=1e-6)
self.layernorm3 = nn.LayerNorm(d_model, eps=1e-6)
#self.sublayer = clones(SublayerConnection(size, dropout), 3)
def forward(self, x, memory, look_ahead_mask, trg_padding_mask):
"Follow Figure 1 (right) for connections."
#m = memory
attn_output = self.self_mha(x, x, x, look_ahead_mask)
x = self.layernorm1(x + attn_output)
src_output = self.src_mha(x, memory, memory, trg_padding_mask) # q, k, v
x = self.layernorm2(x + src_output)
fnn_output = self.ffn(x)
x = self.layernorm3(x + fnn_output)
return x
A decoder layer contains two attention blocks, which matches the structure of the Transformer decoder: the first is self-attention (restricted by the look-ahead mask), and the second attention takes part of its input, the keys and values, from the encoder output.
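The look_ahead_mask used here is built later by create_masks() (in the repo's utils, not shown in this post). Conceptually it just marks the future positions that self-attention must not attend to; a minimal illustration, which may use a different 0/1 convention than the repo:
import torch

def subsequent_positions(size):
    """Upper-triangular matrix marking future positions (1 = a position the
    decoder must not attend to)."""
    return torch.triu(torch.ones(size, size), diagonal=1)

print(subsequent_positions(4))
# tensor([[0., 1., 1., 1.],
#         [0., 0., 1., 1.],
#         [0., 0., 0., 1.],
#         [0., 0., 0., 0.]])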
The initNetParams() function
The code of initNetParams() is shown below:
def initNetParams(model):
'''Init net parameters.'''
for p in model.parameters():
if p.dim() > 1:
nn.init.xavier_uniform_(p)
return model
The function initializes the model parameters, applying Xavier uniform initialization to every parameter tensor with more than one dimension.
Part 4 of main.py: running the training
The code for this part is shown below:
for epoch in range(args.epochs):
start = time.time()
record_acc = 10
train(epoch, args, deepsc)
avg_acc = validate(epoch, args, deepsc)
if avg_acc < record_acc:
if not os.path.exists(args.checkpoint_path):
os.makedirs(args.checkpoint_path)
with open(args.checkpoint_path + '/checkpoint_{}.pth'.format(str(epoch + 1).zfill(2)), 'wb') as f:
torch.save(deepsc.state_dict(), f)
record_acc = avg_acc
record_loss = []
This part loops over the number of epochs set in the project arguments, training and then validating the model in each epoch and saving a checkpoint based on the validation result. Two functions are involved here, train() and validate(), which are introduced in the following subsections.
The train() function
The code of train() is shown below:
def train(epoch, args, net, mi_net=None):
train_eur= EurDataset('train')
train_iterator = DataLoader(train_eur, batch_size=args.batch_size, num_workers=0,
pin_memory=True, collate_fn=collate_data)
pbar = tqdm(train_iterator)
noise_std = np.random.uniform(SNR_to_noise(5), SNR_to_noise(10), size=(1))
for sents in pbar:
sents = sents.to(device)
if mi_net is not None:
mi = train_mi(net, mi_net, sents, 0.1, pad_idx, mi_opt, args.channel)
loss = train_step(net, sents, sents, 0.1, pad_idx,
optimizer, criterion, args.channel, mi_net)
pbar.set_description(
'Epoch: {}; Type: Train; Loss: {:.5f}; MI {:.5f}'.format(
epoch + 1, loss, mi
)
)
else:
loss = train_step(net, sents, sents, noise_std[0], pad_idx,
optimizer, criterion, args.channel)
pbar.set_description(
'Epoch: {}; Type: Train; Loss: {:.5f}'.format(
epoch + 1, loss
)
)
train() first builds the dataset (EurDataset) and a DataLoader that samples batches from it, so within one epoch DeepSC iterates over the whole dataset once, batch by batch, and trains on each batch. Since we are ignoring DeepSC's mutual information net for now, only the else branch of the if statement matters here. (EurDataset and collate_data are not shown in this post; a rough sketch follows.)
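The sketch below captures roughly what the repo's dataset code does (the data directory and details are assumptions): EurDataset loads the pickled, already-indexed Europarl sentences of one split, and collate_data pads each batch with 0 (<PAD>) up to the longest sentence:
import pickle
import numpy as np
import torch
from torch.utils.data import Dataset

class EurDataset(Dataset):
    """Rough sketch: load the preprocessed sentences of one split."""
    def __init__(self, split='train', data_dir='data/'):   # data_dir assumed
        with open(data_dir + '{}_data.pkl'.format(split), 'rb') as f:
            self.data = pickle.load(f)                      # list of index sequences

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

def collate_data(batch):
    """Pad every sentence in the batch with 0 (<PAD>) up to the longest one."""
    max_len = max(len(sent) for sent in batch)
    sents = np.zeros((len(batch), max_len), dtype=np.int64)
    for i, sent in enumerate(sorted(batch, key=len, reverse=True)):
        sents[i, :len(sent)] = sent
    return torch.from_numpy(sents)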
The train_step() function performs one training step of the deepsc network; its code is shown below:
def train_step(model, src, trg, n_var, pad, opt, criterion, channel, mi_net=None):
model.train()
trg_inp = trg[:, :-1]
trg_real = trg[:, 1:]
channels = Channels()
opt.zero_grad()
src_mask, look_ahead_mask = create_masks(src, trg_inp, pad)
enc_output = model.encoder(src, src_mask)
channel_enc_output = model.channel_encoder(enc_output)
Tx_sig = PowerNormalize(channel_enc_output)
if channel == 'AWGN':
Rx_sig = channels.AWGN(Tx_sig, n_var)
elif channel == 'Rayleigh':
Rx_sig = channels.Rayleigh(Tx_sig, n_var)
elif channel == 'Rician':
Rx_sig = channels.Rician(Tx_sig, n_var)
else:
raise ValueError("Please choose from AWGN, Rayleigh, and Rician")
channel_dec_output = model.channel_decoder(Rx_sig)
dec_output = model.decoder(trg_inp, channel_dec_output, look_ahead_mask, src_mask)
pred = model.dense(dec_output)
# pred = model(src, trg_inp, src_mask, look_ahead_mask, n_var)
ntokens = pred.size(-1)
#y_est = x + torch.matmul(n, torch.inverse(H))
#loss1 = torch.mean(torch.pow((x_est - y_est.view(x_est.shape)), 2))
loss = loss_function(pred.contiguous().view(-1, ntokens),
trg_real.contiguous().view(-1),
pad, criterion)
if mi_net is not None:
mi_net.eval()
joint, marginal = sample_batch(Tx_sig, Rx_sig)
mi_lb, _, _ = mutual_information(joint, marginal, mi_net)
loss_mine = -mi_lb
loss = loss + 0.0009 * loss_mine
# loss = loss_function(pred, trg_real, pad)
loss.backward()
opt.step()
return loss.item()
train_step() supplies the forward-pass logic that was missing from the model definition. The overall flow is fairly clear; let's go through it piece by piece:
- First the target is split into trg_inp (the decoder input) and trg_real (the ground truth shifted by one position), the gradients are zeroed, and the padding and look-ahead masks are built with create_masks().
- Then channels is defined, an object of the Channels class, whose code is shown below:
class Channels():

    def AWGN(self, Tx_sig, n_var):
        Rx_sig = Tx_sig + torch.normal(0, n_var, size=Tx_sig.shape).to(device)
        return Rx_sig

    def Rayleigh(self, Tx_sig, n_var):
        shape = Tx_sig.shape
        H_real = torch.normal(0, math.sqrt(1/2), size=[1]).to(device)
        H_imag = torch.normal(0, math.sqrt(1/2), size=[1]).to(device)
        H = torch.Tensor([[H_real, -H_imag], [H_imag, H_real]]).to(device)
        Tx_sig = torch.matmul(Tx_sig.view(shape[0], -1, 2), H)
        Rx_sig = self.AWGN(Tx_sig, n_var)
        # Channel estimation
        Rx_sig = torch.matmul(Rx_sig, torch.inverse(H)).view(shape)
        return Rx_sig

    def Rician(self, Tx_sig, n_var, K=1):
        shape = Tx_sig.shape
        mean = math.sqrt(K / (K + 1))
        std = math.sqrt(1 / (K + 1))
        H_real = torch.normal(mean, std, size=[1]).to(device)
        H_imag = torch.normal(mean, std, size=[1]).to(device)
        H = torch.Tensor([[H_real, -H_imag], [H_imag, H_real]]).to(device)
        Tx_sig = torch.matmul(Tx_sig.view(shape[0], -1, 2), H)
        Rx_sig = self.AWGN(Tx_sig, n_var)
        # Channel estimation
        Rx_sig = torch.matmul(Rx_sig, torch.inverse(H)).view(shape)
        return Rx_sig
The methods of the Channels class model the signal passing through an AWGN channel, a Rayleigh channel and a Rician channel respectively; the code also shows that the fading channels DeepSC assumes are flat fading.
- Next comes the DeepSC transceiver chain itself: semantic encoding, channel encoding, power normalization (PowerNormalize, sketched after this list), the channel, channel decoding, semantic decoding, and finally a dense layer that outputs a prediction score for every word in the vocabulary.
- Finally the loss is computed using the loss_function() function, which masks out the <PAD> positions before averaging the per-token cross-entropy; its code is shown below:
def loss_function(x, trg, padding_idx, criterion):

    loss = criterion(x, trg)
    mask = (trg != padding_idx).type_as(loss.data)
    # a = mask.cpu().numpy()
    loss *= mask

    return loss.mean()
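PowerNormalize(), called before the channel, is also not reproduced in this post; a plausible sketch (the repo's implementation may differ slightly) scales the transmitted signal so that its average symbol power does not exceed 1:
import torch

def power_normalize(x):
    """Sketch of what PowerNormalize() presumably does: limit average power to 1."""
    power = torch.mean(x ** 2).sqrt()
    if power > 1:
        x = x / power
    return x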
The validate() function
The code of validate() is shown below:
def validate(epoch, args, net):
test_eur = EurDataset('test')
test_iterator = DataLoader(test_eur, batch_size=args.batch_size, num_workers=0,
pin_memory=True, collate_fn=collate_data)
net.eval()
pbar = tqdm(test_iterator)
total = 0
with torch.no_grad():
for sents in pbar:
sents = sents.to(device)
loss = val_step(net, sents, sents, 0.1, pad_idx,
criterion, args.channel)
total += loss
pbar.set_description(
'Epoch: {}; Type: VAL; Loss: {:.5f}'.format(
epoch + 1, loss
)
)
return total/len(test_iterator)
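validate() simply averages, over the whole test set, the loss returned by val_step(). val_step() is not reproduced in this post; it presumably mirrors the forward pass of train_step() without the backward pass and optimizer step, roughly as in this sketch built from the functions shown above:
def val_step_sketch(model, src, trg, n_var, pad, criterion, channel='AWGN'):
    """Rough sketch of val_step(): the same forward pass as train_step(), no backprop."""
    channels = Channels()
    trg_inp, trg_real = trg[:, :-1], trg[:, 1:]
    src_mask, look_ahead_mask = create_masks(src, trg_inp, pad)

    Tx_sig = PowerNormalize(model.channel_encoder(model.encoder(src, src_mask)))
    Rx_sig = channels.AWGN(Tx_sig, n_var)        # or Rayleigh/Rician, depending on `channel`
    dec_output = model.decoder(trg_inp, model.channel_decoder(Rx_sig),
                               look_ahead_mask, src_mask)
    pred = model.dense(dec_output)

    loss = loss_function(pred.contiguous().view(-1, pred.size(-1)),
                         trg_real.contiguous().view(-1), pad, criterion)
    return loss.item()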
Evaluation
Having worked through the training part of the project, the evaluation part becomes much easier to follow.
Performance metrics
Semantic similarity is computed with a BERT model, as in the code below:
from transformers import BertModel, BertTokenizer
import torch
from w3lib.html import remove_tags
import numpy as np
from sentence_transformers import SentenceTransformer
test_sentences = [["A plane is taking off", "An air plane is taking off"], # one-gram BLEU is 4/6
["A man is fishing", "A man is exercising"], # one-gram BLEU is 3/4
["A cat is playing a piano", "A man is playing a guitar"], # one-gram BLEU is 4/6
["Fish are swimming", "A fish is swimming"]] # one-gram BLEU is 1/4
class Semantic_Similarity_Score:
def __init__(self, model_used="BERT"):
self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
self.model_used = model_used
if self.model_used == "BERT":
# BERT model
self.bert_model = BertModel.from_pretrained('bert-base-uncased').to(self.device)
self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
self.CLS_token = ['[CLS]']
self.SEP_token = ['[SEP]']
elif self.model_used == "SBERT":
# Sentence-BERT model
self.sentence_bert_model = SentenceTransformer('all-MiniLM-L6-v2')
def semantic_similarity_score_compute(self, origin_sentences, predict_sentences):
semantic_similarity_score = []
for (sentence1, sentence2) in zip(origin_sentences, predict_sentences):
sentence1 = remove_tags(sentence1) # remove start(<START>) and end(<END>) word
sentence2 = remove_tags(sentence2) # remove start(<START>) and end(<END>) word
if len(sentence1.split()) != len(sentence2.split()):
sentence2 = " ".join(sentence2.split()[:-1])
if self.model_used == "BERT":
sentence1_bert_output = self.bert_hidden_state_output(sentence1).detach().cpu()
sentence2_bert_output = self.bert_hidden_state_output(sentence2).detach().cpu()
sentence1_vector = torch.sum(sentence1_bert_output, dim=0)
sentence2_vector = torch.sum(sentence2_bert_output, dim=0)
semantic_similarity_score.append(self.cosine_similarity_compute(sentence1_vector,
sentence2_vector))
elif self.model_used == "SBERT":
sentence1_embeddings = self.sentence_bert_model.encode([sentence1])
sentence2_embeddings = self.sentence_bert_model.encode([sentence2])
semantic_similarity_score.append(self.cosine_similarity_compute(sentence1_embeddings[0],
sentence2_embeddings[0]))
return semantic_similarity_score
def bert_hidden_state_output(self, sentence):
tokens = self.tokenizer.tokenize(sentence)
tokens = self.CLS_token + tokens + self.SEP_token
attention_mask = [1 if i != '<PAD>' else 0 for i in tokens]
token_ids = self.tokenizer.convert_tokens_to_ids(tokens)
token_ids = torch.tensor(token_ids).unsqueeze(0).to(self.device)
attention_mask = torch.tensor(attention_mask).unsqueeze(0).to(self.device)
bert_output = self.bert_model(token_ids, attention_mask=attention_mask)
return bert_output[0][0][1:-1]
def cosine_similarity_compute(self, sentence1_vector, sentence2_vector):
if self.model_used == "BERT":
sentence1_vector = sentence1_vector.detach().cpu().numpy()
sentence2_vector = sentence2_vector.detach().cpu().numpy()
if (np.linalg.norm(sentence1_vector) * np.linalg.norm(sentence2_vector)) < 1e-8:
print("numerator: ", sentence1_vector.dot(sentence2_vector))
print("denominator: ", (np.linalg.norm(sentence1_vector) * np.linalg.norm(sentence2_vector)))
cosine_similarity = 0
else:
cosine_similarity = sentence1_vector.dot(sentence2_vector) / \
(np.linalg.norm(sentence1_vector) * np.linalg.norm(sentence2_vector))
return cosine_similarity
if __name__ == "__main__":
model_used = "BERT"
Semantic_Similarity_calculator = Semantic_Similarity_Score(model_used=model_used)
for sentence_pairs in test_sentences:
sentence_vector = []
for sentence in sentence_pairs:
if model_used == "SBERT":
sentence_embeddings = Semantic_Similarity_calculator.sentence_bert_model.encode([sentence])
sentence_vector.append(sentence_embeddings[0])
elif model_used == "BERT":
sentence_tokens = ['[CLS]'] + Semantic_Similarity_calculator.tokenizer.tokenize(sentence) + ['[SEP]']
sentence_attention_mask = [1 if i != '<PAD>' else 0 for i in sentence_tokens]
sentence_token_ids = Semantic_Similarity_calculator.tokenizer.convert_tokens_to_ids(sentence_tokens)
sentence_token_ids = torch.tensor(sentence_token_ids).unsqueeze(0).to(Semantic_Similarity_calculator.device)
sentence_attention_mask = torch.tensor(sentence_attention_mask).unsqueeze(0).to(Semantic_Similarity_calculator.device)
bert_output = Semantic_Similarity_calculator.bert_model(sentence_token_ids,
attention_mask=sentence_attention_mask)[0][0][1:-1].detach().cpu()
sentence_vector.append(torch.mean(bert_output, dim=0))
cosine_similarity = Semantic_Similarity_calculator.cosine_similarity_compute(sentence_vector[0],
sentence_vector[1])
print(cosine_similarity)
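As a side note, the "one-gram BLEU" values quoted in the comments of test_sentences can be checked with nltk (already listed in requirements.txt); this snippet is purely illustrative and not part of the repo:
from nltk.translate.bleu_score import sentence_bleu

reference = "A plane is taking off".split()
hypothesis = "An air plane is taking off".split()
# weights=(1, 0, 0, 0) keeps only the 1-gram precision: 4 of the 6 hypothesis
# words appear in the reference, hence the 4/6 quoted in the comment above.
print(sentence_bleu([reference], hypothesis, weights=(1, 0, 0, 0)))  # ≈ 0.667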