The VGG model took second place in the 2014 ILSVRC competition. For extracting CNN features from images, VGG is often the first choice. Its drawback is its roughly 140M parameters, which demand considerable storage. VGG's main characteristics:
- Small convolution kernels. The authors replace all kernels with 3×3 (1×1 is used only rarely). Two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 kernel, and three cover 7×7, with fewer parameters and more nonlinearities.
- Small pooling kernels. Where AlexNet uses 3×3 pooling, VGG uses 2×2 pooling throughout.
- Deeper network, wider feature maps. Building on the first two points, the convolutions focus on increasing the channel count while the pooling focuses on shrinking the width and height, so the architecture grows deeper and wider while the increase in computation stays moderate (a minimal sketch of one such stage follows this list).
- Fully connected layers converted to convolutions. At test time the 3 fully connected layers used during training are replaced with 3 convolutional layers that reuse the trained parameters, so the resulting fully convolutional network is no longer constrained by fully connected layers and can accept inputs of arbitrary width and height.
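To make the first three points concrete, here is a minimal sketch of one VGG-style stage (the helper name `vgg_block` is ours for illustration, not part of VGG or torchvision): two 3×3 convolutions widen the channels, then a 2×2 max pool halves the spatial size.

```python
import torch
import torch.nn as nn

def vgg_block(in_channels, out_channels):
    # Two 3x3 convolutions (padding=1 preserves H and W) followed by a
    # 2x2 max pool that halves the spatial dimensions.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

block = vgg_block(64, 128)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 128, 28, 28]): channels doubled, H/W halved
```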
The authors replace the fully connected layers with convolutional layers at test time, which lets VGG accept input images of different sizes. The following example code contrasts the two:
```python
import torch
import torch.nn as nn


def test02():
    # Three fully connected layers, as used during training.
    net1 = nn.Sequential(
        nn.Linear(in_features=7 * 7 * 512, out_features=4096),
        nn.Linear(in_features=4096, out_features=4096),
        nn.Linear(in_features=4096, out_features=1000),
    )

    inputs1 = torch.randn(1, 512, 7, 7)
    inputs2 = torch.randn(1, 512, 12, 12)

    # 7x7 feature map as input, output: torch.Size([1, 1000])
    outputs = net1(inputs1.reshape(1, -1))
    print(outputs.shape)

    # 12x12 feature map as input: raises an error, because 12*12*512 does not
    # match the fixed in_features=7*7*512 of the first Linear layer.
    try:
        outputs = net1(inputs2.reshape(1, -1))
        print(outputs.shape)
    except RuntimeError as e:
        print('error:', e)


def test03():
    # The same three layers expressed as convolutions for testing.
    net2 = nn.Sequential(
        nn.Conv2d(in_channels=512, out_channels=4096, kernel_size=7),
        nn.Conv2d(in_channels=4096, out_channels=4096, kernel_size=1),
        nn.Conv2d(in_channels=4096, out_channels=1000, kernel_size=1),
    )

    inputs1 = torch.randn(1, 512, 7, 7)
    inputs2 = torch.randn(1, 512, 12, 12)

    # 7x7 feature map as input, output: torch.Size([1, 1000, 1, 1])
    outputs = net2(inputs1)
    print(outputs.shape)

    # 12x12 feature map as input, output: torch.Size([1, 1000, 6, 6])
    outputs = net2(inputs2)
    print(outputs.shape)


if __name__ == '__main__':
    test02()
    test03()
```
Although test03 replaces the last layers with convolutional layers and can therefore accept images of arbitrary size, the output has shape [1, 1000, 6, 6]. This can be read as class scores for 36 regions of the image; averaging these scores over the spatial positions gives the input image's score for each class.
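A minimal sketch of that averaging step (assuming outputs is the [1, 1000, 6, 6] tensor produced by net2 in test03 above):

```python
# Average the 6x6 grid of per-region scores into one score per class.
scores = outputs.mean(dim=(2, 3))  # torch.Size([1, 1000])
print(scores.shape)
```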
The VGG model implementations provided by PyTorch:
```python
from torchvision.models import vgg11
from torchvision.models import vgg13
from torchvision.models import vgg16
from torchvision.models import vgg19
from torchvision.models import vgg11_bn
from torchvision.models import vgg13_bn
from torchvision.models import vgg16_bn
from torchvision.models import vgg19_bn
```
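Any of these constructors is used the same way; here is a minimal sketch with vgg16 (pass pretrained=True, or a weights argument on newer torchvision versions, to load ImageNet weights instead of random ones):

```python
import torch
from torchvision.models import vgg16

model = vgg16()  # randomly initialized; see the note above for pretrained weights
model.eval()

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]) -- one score per ImageNet class
```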
The bn variants insert a BatchNorm layer between each convolutional layer and its ReLU activation. Below are the network structures of vgg11 and vgg11_bn:
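The structures shown below are simply what PyTorch prints for each model:

```python
from torchvision.models import vgg11, vgg11_bn

# Printing a torchvision model displays its layer-by-layer structure.
print(vgg11())
print(vgg11_bn())
```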
vgg11:
```
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (11): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): ReLU(inplace=True)
    (13): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): ReLU(inplace=True)
    (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (16): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
```
In the structure above, each Conv2d is paired with a ReLU; here we call that pair a convolutional layer. The network then contains 8 convolutional layers plus 3 linear layers, 11 weight layers in total, which is where the name vgg11 comes from.
vgg11_bn:
```
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU(inplace=True)
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): ReLU(inplace=True)
    (11): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (13): ReLU(inplace=True)
    (14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (15): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (16): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (17): ReLU(inplace=True)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (20): ReLU(inplace=True)
    (21): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (22): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (23): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (24): ReLU(inplace=True)
    (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (26): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (27): ReLU(inplace=True)
    (28): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
```
In this structure, Conv2d, BatchNorm2d, and ReLU form a group, and here we call that group of three a convolutional layer. The network again contains 8 convolutional layers and 3 linear layers, 11 weight layers in total.
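Finally, since the convolutional stack lives in the features submodule, it can be used on its own for the CNN feature extraction mentioned at the start of this section. A minimal sketch:

```python
import torch
from torchvision.models import vgg11

model = vgg11()
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feats = model.features(x)  # convolutional stack only, no classifier
print(feats.shape)  # torch.Size([1, 512, 7, 7]) -- 5 max pools: 224 / 2**5 = 7
```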