# Technical Deep Dive | Better Understanding Focal Loss with MindSpore

• Use cases: two common ways to handle class imbalance

1. Design a sampling strategy, usually by resampling the classes with few samples.

2. Design the loss, usually by assigning different weights to samples of different classes.
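For the second approach, here is a minimal NumPy sketch of class-weighted cross-entropy; the function name and the example weights are illustrative, not from the original post:

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Per-class weighted cross-entropy: rare classes get larger weights."""
    # numerically stable softmax over the class axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # pick the probability of the true class for each sample
    p_t = probs[np.arange(len(labels)), labels]
    # weight each sample's loss by the weight of its class
    return float(np.mean(-class_weights[labels] * np.log(p_t)))

logits = np.array([[2.0, 0.5], [0.2, 1.5]])
labels = np.array([0, 1])
# class 1 is assumed rare, so it gets 3x the weight of class 0
print(weighted_cross_entropy(logits, labels, np.array([1.0, 3.0])))
```

Raising a class's weight scales every loss term for that class, which is the static (non-adaptive) version of what Focal Loss does dynamically per sample.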

## Paper Analysis

Why does extreme foreground-background class imbalance hurt training? The paper identifies two problems:

(1) training is inefficient as most locations are easy negatives that contribute no useful learning signal;
(2) en masse, the easy negatives can overwhelm training and lead to degenerate models.
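To fix both problems, the paper multiplies the cross-entropy loss by a modulating factor that shrinks the contribution of easy, well-classified examples. Writing p_t for the model's estimated probability of the ground-truth class:

```latex
% Standard cross-entropy on the ground-truth class probability p_t
\mathrm{CE}(p_t) = -\log(p_t)
% Focal loss: \gamma \ge 0 is the focusing parameter,
% \alpha_t an optional per-class weight
\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)
```

When p_t is large (an easy example), (1 - p_t)^γ is near 0 and the term contributes almost nothing; when p_t is small (a hard example), the factor is near 1 and the loss is essentially unchanged.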

## MindSpore Implementation

```python
import mindspore
from mindspore.common.tensor import Tensor
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.ops.primitive import constexpr
from mindspore import nn
# _Loss is the internal loss base class of older MindSpore releases
# (renamed LossBase in later versions)
from mindspore.nn.loss.loss import _Loss
from mindspore._checkparam import Validator as validator


# Simplified stand-ins for MindSpore's internal @constexpr input validators
@constexpr
def _check_ndim(predict_ndim, target_ndim):
    if predict_ndim != target_ndim:
        raise ValueError("predict and target must have the same number of dimensions.")

@constexpr
def _check_channel_and_shape(target_channel, predict_channel):
    if target_channel not in (1, predict_channel):
        raise ValueError("target channel must be 1 or equal to the predict channel.")

@constexpr
def _check_predict_channel(predict_channel):
    if predict_channel < 2:
        raise ValueError("predict must have at least 2 channels (classes).")


class FocalLoss(_Loss):

    def __init__(self, weight=None, gamma=2.0, reduction='mean'):
        super(FocalLoss, self).__init__(reduction=reduction)
        # Validate gamma, the focusing parameter (γ >= 0) of the modulating factor
        self.gamma = validator.check_value_type("gamma", gamma, [float])
        if weight is not None and not isinstance(weight, Tensor):
            raise TypeError("The type of weight should be Tensor, but got {}.".format(type(weight)))
        self.weight = weight
        # MindSpore operators used below
        self.expand_dims = P.ExpandDims()
        self.gather_d = P.GatherD()
        self.squeeze = P.Squeeze(axis=1)
        self.tile = P.Tile()
        self.cast = P.Cast()

    def construct(self, predict, target):
        targets = target
        # Validate the inputs
        _check_ndim(predict.ndim, targets.ndim)
        _check_channel_and_shape(targets.shape[1], predict.shape[1])
        _check_predict_channel(predict.shape[1])

        # Reshape logits and target to num_batch * num_class * num_voxels
        if predict.ndim > 2:
            predict = predict.view(predict.shape[0], predict.shape[1], -1)  # N,C,H,W => N,C,H*W
            targets = targets.view(targets.shape[0], targets.shape[1], -1)  # N,1,H,W => N,1,H*W or N,C,H*W
        else:
            predict = self.expand_dims(predict, 2)  # N,C => N,C,1
            targets = self.expand_dims(targets, 2)  # N,1 => N,1,1 or N,C,1

        # Compute the log-probabilities
        log_probability = nn.LogSoftmax(1)(predict)
        # Keep only the log-probability of the ground-truth class of each voxel
        if target.shape[1] == 1:
            log_probability = self.gather_d(log_probability, 1, self.cast(targets, mindspore.int32))
            log_probability = self.squeeze(log_probability)

        # Recover the probabilities p_t
        probability = F.exp(log_probability)

        if self.weight is not None:
            convert_weight = self.weight[None, :, None]  # C => 1,C,1
            convert_weight = self.tile(convert_weight, (targets.shape[0], 1, targets.shape[2]))  # 1,C,1 => N,C,H*W
            if target.shape[1] == 1:
                convert_weight = self.gather_d(convert_weight, 1, self.cast(targets, mindspore.int32))  # select per-voxel weights => N,1,H*W
                convert_weight = self.squeeze(convert_weight)  # N,1,H*W => N,H*W
            # Scale the log-probabilities by the class weights (the α_t term)
            log_probability = log_probability * convert_weight

        # Modulating factor (1 - p_t)^γ, then the per-sample mini-batch loss
        weight = F.pows(-probability + 1.0, self.gamma)
        if target.shape[1] == 1:
            loss = (-weight * log_probability).mean(axis=1)  # N
        else:
            loss = (-weight * targets * log_probability).mean(axis=-1)  # N,C

        return self.get_loss(loss)
```
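To make the reshaping and gathering above easier to follow, here is a NumPy sketch of the same forward pass for the simple 2-D case (inputs of shape N,C with integer class targets of shape N,1). The name `focal_loss_np` is illustrative, and the `weight` branch is deliberately omitted, so its result will not match a weighted MindSpore run:

```python
import numpy as np

def focal_loss_np(logits, target, gamma=2.0):
    """NumPy sketch of the unweighted focal loss forward pass for (N, C) logits."""
    # numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # keep only the log-probability of the ground-truth class (GatherD + Squeeze)
    log_pt = log_prob[np.arange(len(target)), target.ravel()]
    pt = np.exp(log_pt)
    # modulating factor (1 - pt)^gamma down-weights easy examples
    loss = -((1.0 - pt) ** gamma) * log_pt
    # 'mean' reduction
    return float(loss.mean())

logits = np.array([[0.8, 1.4], [0.5, 0.9], [1.2, 0.9]])
target = np.array([[1], [1], [0]])
print(focal_loss_np(logits, target))
```

Setting `gamma=0.0` makes the modulating factor 1 everywhere, recovering plain cross-entropy, which is a quick sanity check on the implementation.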

Usage example with the built-in `nn.FocalLoss`:

```python
from mindspore.common import dtype as mstype
from mindspore import nn
from mindspore import Tensor

predict = Tensor([[0.8, 1.4], [0.5, 0.9], [1.2, 0.9]], mstype.float32)
target = Tensor([[1], [1], [0]], mstype.int32)
focalloss = nn.FocalLoss(weight=Tensor([1, 2]), gamma=2.0, reduction='mean')
output = focalloss(predict, target)
print(output)
# 0.33365273
```

## Two Key Properties of Focal Loss

1. When a sample is misclassified, p_t is small, so the modulating factor (1 - p_t)^γ is close to 1 and the loss is almost unaffected. As p_t → 1 (the sample is correctly classified and easy), the factor approaches 0, so well-classified samples are down-weighted and contribute little to the total loss.
2. When γ = 0, focal loss reduces to the standard cross-entropy loss; as γ increases, so does the effect of the modulating factor. The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted: larger γ strengthens the modulating factor, and experiments found γ = 2 works best. Intuitively, the factor reduces the loss contribution of easy examples and widens the range of probabilities over which an example receives low loss. With γ = 2, an easy example with p_t = 0.9 has a loss 100+ times smaller than standard cross-entropy, and at p_t ≈ 0.968 roughly 1000 times smaller, while for a hard example (p_t < 0.5) the loss shrinks by at most a factor of 4. The relative weight of hard, misclassified examples therefore rises sharply.

These two properties are the core of Focal Loss: it uses a suitable function to measure how much hard versus easy samples contribute to the total loss.
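The ratios claimed above can be checked directly: dividing cross-entropy by focal loss, the -log(p_t) terms cancel, leaving 1 / (1 - p_t)^γ.

```python
def ce_over_fl(pt, gamma=2.0):
    # CE(pt) / FL(pt) = 1 / (1 - pt)^gamma, since the -log(pt) factor cancels
    return 1.0 / (1.0 - pt) ** gamma

print(ce_over_fl(0.9))    # easy example: loss ~100x smaller
print(ce_over_fl(0.968))  # very easy example: roughly 1000x smaller
print(ce_over_fl(0.5))    # hard example: only ~4x smaller
```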

Official MindSpore resources:
GitHub: https://github.com/mindspore-...
Gitee: https://gitee.com/mindspore/mindspore
