1. Multi-class cross-entropy with integer labels

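Softmax normalizes the given scores into a probability distribution;
LogSoftmax takes the log of the Softmax result;
CrossEntropyLoss takes the LogSoftmax result, picks the value at the correct class of each sample, negates it, and averages over the samples;
NLLLoss takes the given data (expected to be log-probabilities), picks the value at the correct class of each sample, negates it, and averages over the samples.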

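For integer labels, the loss can be written as:

\[ \text{loss} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp(x_i)}{\sum_{j=1}^{M} \exp(y_{i,j})} \]

where:

N is the number of samples and M is the number of labels (classes);
i is the index of a sample;
j is the index of a label within a sample;
\(x_i\) is the predicted score of the correct class for the i-th sample;
\(y_{i,j}\) is the predicted score of the j-th label for the i-th sample.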
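Using the symbols defined above, the loss can be checked by hand. The following is a minimal sketch, not PyTorch's internal implementation: the helper name `manual_cross_entropy` is hypothetical, and the scores and labels are illustrative values.

```python
import torch
import torch.nn as nn


def manual_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean over samples of -log(exp(x_i) / sum_j exp(y_ij))."""
    # x_i: score of the correct class for each sample, shape (N,)
    x = logits.gather(dim=1, index=labels.unsqueeze(1)).squeeze(1)
    # log(sum_j exp(y_ij)): log of the softmax denominator, shape (N,)
    log_denominator = torch.logsumexp(logits, dim=1)
    # -log(exp(x_i) / denominator) = log(denominator) - x_i, averaged over samples
    return (log_denominator - x).mean()


logits = torch.tensor([[0.5, 0.3], [0.4, 0.8]])
labels = torch.tensor([0, 1])

manual = manual_cross_entropy(logits, labels)
builtin = nn.CrossEntropyLoss()(logits, labels)
print(manual, builtin)
```

Both values should agree up to floating-point rounding, which suggests that the built-in loss follows the same definition.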

Example code:

import torch
import torch.nn as nn


# 1. nn.Softmax and nn.LogSoftmax
def test01():

    data = torch.tensor([[0.5, 0.3], [0.2, 0.3]])

    # 1.1 Softmax followed by an explicit log
    probs = nn.Softmax(dim=1)(data)
    log_probs = torch.log(probs)
    print(log_probs)

    # 1.2 LogSoftmax
    # nn.LogSoftmax is equivalent to log(Softmax(data)),
    # so it is applied to the raw scores, not to the result of 1.1
    log_probs = nn.LogSoftmax(dim=1)(data)
    print(log_probs)


# 2. CrossEntropyLoss, the multi-class loss function
def test02():

    # 2.1 CrossEntropyLoss
    y_pred = torch.tensor([[0.5, 0.3], [0.4, 0.8]])
    y_true = torch.tensor([0, 1])
    loss = nn.CrossEntropyLoss(label_smoothing=0.0)(y_pred, y_true)
    print(loss)

    # 2.2 CrossEntropyLoss = LogSoftmax + NLLLoss
    log_data = nn.LogSoftmax(dim=1)(y_pred)
    # NLLLoss picks the log-probability of the correct class for each sample,
    # negates it, and averages over the samples
    loss = nn.NLLLoss()(log_data, y_true)
    print(loss)


if __name__ == '__main__':
    test01()
    test02()

Program output:

tensor([[-0.5981, -0.7981],
        [-0.7444, -0.6444]])
tensor([[-0.5981, -0.7981],
        [-0.7444, -0.6444]])
tensor(0.5556)
tensor(0.5556)
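The two ways of computing log-probabilities agree on small scores like the ones above, but they can differ on extreme inputs. The sketch below uses deliberately large, made-up logits to illustrate why the fused LogSoftmax is usually preferred over an explicit log(Softmax(x)).

```python
import torch
import torch.nn.functional as F

# Illustrative values only: one very large score next to a small one.
logits = torch.tensor([[1000.0, 0.0]])

# Two-step version: the softmax probability of the second class underflows
# to exactly 0 in float32, so its logarithm becomes -inf.
two_step = torch.log(F.softmax(logits, dim=1))

# Fused version: evaluated as logits - logsumexp(logits), which stays finite.
fused = F.log_softmax(logits, dim=1)

print(two_step)
print(fused)
```

An infinite log-probability would turn the loss into `inf` whenever that class is the target, which is one reason to feed raw scores to CrossEntropyLoss rather than applying Softmax first.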

2. Multi-class cross-entropy with floating-point labels

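Before PyTorch 1.10, the target argument of CrossEntropyLoss only accepted torch.long tensors, i.e. class indices in the range 0 to C-1 for C classes. Sometimes we need the cross-entropy between two probability distributions, where the input is logits and the target is a probability for each class. This is supported from PyTorch 1.10 onwards, and the loss is computed as follows:

\[ \text{loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} p_{i,j}\,\log \frac{\exp(y_{i,j})}{\sum_{k=1}^{M}\exp(y_{i,k})} \]

Here \(y_{i,j}\) is the predicted score of the j-th class for the i-th sample, \(p_{i,j}\) is the target probability of that class, N is the number of samples, and M is the number of classes.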

Example code:

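import torch
import torch.nn.functional as F


def test():

    # Fix the random seed so the output is reproducible
    torch.manual_seed(0)

    # 1. The target is a probability distribution
    logits = torch.randn(2, 2)
    target = torch.randn(2, 2).softmax(dim=1)
    print(logits)
    print(target)
    loss = F.cross_entropy(logits, target)
    print(loss)

    # 2. The same calculation step by step
    # Negative log-probabilities of every class
    s1 = -F.log_softmax(logits, dim=1)
    # print(s1)

    # Weight each class by its target probability
    s2 = s1 * target
    # print(s2)

    # Sum over classes to get the loss of each sample
    s3 = s2.sum(dim=-1)
    # print(s3)

    # Sum over samples
    s4 = s3.sum()
    # print(s4)

    # Divide by the number of samples to get the mean
    s5 = s4 / len(target)
    print(s5)


if __name__ == '__main__':
    test()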

Program output:

tensor([[ 1.5410, -0.2934],
        [-2.1788,  0.5684]])
tensor([[0.5779, 0.4221],
        [0.3930, 0.6070]])
tensor(1.0322)
tensor(1.0322)
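As a cross-check, integer labels are a special case of probability targets: a one-hot row puts all of its probability on a single class. The sketch below, which uses illustrative scores and assumes PyTorch 1.10 or later, compares the two forms of target.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.5, 0.3], [0.4, 0.8]])
labels = torch.tensor([0, 1])

# Each one-hot row is a probability distribution concentrated on one class.
one_hot = F.one_hot(labels, num_classes=2).float()

loss_from_indices = F.cross_entropy(logits, labels)    # integer class indices
loss_from_probs = F.cross_entropy(logits, one_hot)     # class probabilities
print(loss_from_indices, loss_from_probs)
```

The two losses should match, since every term with a target probability of 0 drops out of the sum over classes.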

3. Label smoothing in multi-class cross-entropy

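In a classification problem, we would like the model to predict a probability of 1 for the correct label and 0 for every other label, as shown in the figure below: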

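That is, for a given sample we only consider the loss of the correct label and ignore the losses of the other labels. When training samples are scarce or unrepresentative, this can cause the model to overfit and generalize poorly. With label smoothing, the loss no longer considers only the correct label; it also takes the losses of the other labels into account, as shown in the figure below: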

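Looking at this figure, an objection comes to mind immediately: surely we do not want the model to predict 1 for the correct label and 1 for the incorrect labels as well? Clearly not. What we want is for the model to push the correct label only to about 0.7 and to leave about 0.3 for the incorrect labels, so a smoothing coefficient (0.3) is introduced, as shown in the figure below: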

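Now the model only needs to learn towards about 0.7 for the correct label rather than 1, and the incorrect labels share about 0.3 rather than being pushed to 0. In other words, the model no longer learns towards an extreme target but towards a relatively soft one. Strictly speaking, PyTorch spreads the 0.3 uniformly over all M labels, including the correct one, so the effective target is 0.7 + 0.3/M for the correct label and 0.3/M for each of the others.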
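One way to see what the softer target looks like numerically is to build it explicitly and pass it to the loss as a probability distribution. The sketch below is an illustration under the assumption that smoothing mass is spread uniformly over all classes, which is how the `label_smoothing` argument of PyTorch's cross-entropy behaves; the scores and variable names are made up for the example.

```python
import torch
import torch.nn.functional as F

smoothing = 0.3
logits = torch.tensor([[0.5, 0.3, 0.9], [0.4, 0.8, 0.1]])
labels = torch.tensor([0, 1])
num_classes = logits.shape[1]

# Soft target: keep (1 - smoothing) on the correct class, then spread
# `smoothing` uniformly over all classes (including the correct one).
one_hot = F.one_hot(labels, num_classes=num_classes).float()
soft_targets = (1 - smoothing) * one_hot + smoothing / num_classes
print(soft_targets)

# Cross-entropy against the soft targets vs. the built-in smoothing option.
loss_soft = F.cross_entropy(logits, soft_targets)
loss_builtin = F.cross_entropy(logits, labels, label_smoothing=smoothing)
print(loss_soft, loss_builtin)
```

With three classes, each row of `soft_targets` holds 0.8 for the correct class and 0.1 for each of the other two, and both losses should agree.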

Paper:
https://proceedings.neurips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf

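With a smoothing coefficient \(\epsilon\), the loss can be written as:

\[ \text{loss} = (1-\epsilon)\left(-\frac{1}{N}\sum_{i=1}^{N}\log q_{i,c_i}\right) + \epsilon\left(-\frac{1}{N}\sum_{i=1}^{N}\frac{1}{M}\sum_{j=1}^{M}\log q_{i,j}\right) \]

where \(q_{i,j}\) is the Softmax probability of the j-th label for the i-th sample and \(c_i\) is the correct label of the i-th sample. The first term in the formula is the standard cross-entropy loss, while the second term accounts for the loss of all labels. The following example code performs the calculation: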

import torch.nn as nn
import torch
import torch.nn.functional as F


def test():

    # 1. Label smoothing with the built-in loss
    y_pred = torch.tensor([[0.5, 0.3, 0.9], [0.4, 0.8, 0.1]])
    y_true = torch.tensor([0, 1])
    loss = nn.CrossEntropyLoss(label_smoothing=0.3)(y_pred, y_true)
    print(loss)

    # 2. The same calculation step by step
    log_preds = F.log_softmax(y_pred, dim=-1)

    # This term brings in the loss of all labels
    # torch.mean averages the loss over the samples in the batch
    # Dividing by 3 averages the loss over the 3 labels
    smooth_loss = torch.mean(-log_preds.sum(dim=-1)) / 3
    # Dividing by 3 before or after the mean means the same thing and gives the same result
    # smooth_loss = torch.mean(-log_preds.sum(dim=-1) / 3)

    # Standard cross-entropy loss
    nll = F.nll_loss(log_preds, y_true)

    label_smoothing = 0.3
    loss = (1 - label_smoothing) * nll + label_smoothing * smooth_loss
    print(loss)


if __name__ == '__main__':
    test()

Program output:

tensor(1.0302)
tensor(1.0302)
