The VGG network was introduced at the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where it achieved excellent results in image classification. Its core idea is to improve performance by using small 3×3 convolution kernels and increasing network depth. Compared with the 11×11 and 5×5 kernels used in AlexNet, small kernels reduce the number of parameters, and stacking several of them increases the network's non-linear expressive power, which significantly improves accuracy.
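The parameter-count claim is easy to check with arithmetic (a sketch, assuming an illustrative channel count of 512 and counting weights only): three stacked 3×3 convolutions cover the same 7×7 receptive field as a single 7×7 convolution, with fewer parameters.

```python
C = 512  # assumed channel count (in = out), for illustration only

# Weights per conv layer (ignoring biases): kernel_h * kernel_w * C_in * C_out
stacked_3x3 = 3 * (3 * 3 * C * C)  # three stacked 3x3 conv layers
single_7x7 = 7 * 7 * C * C         # one 7x7 conv layer
print(stacked_3x3, single_7x7)     # 7077888 12845056

# Receptive field: each additional 3x3 conv (stride 1) grows it by 2
rf = 1
for _ in range(3):
    rf += 2
print(rf)  # 7
```

Three 3×3 layers also interleave three ReLU non-linearities where a single 7×7 layer applies only one.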
1. Network Architecture
The main VGG variants are VGG-11, VGG-13, VGG-16, and VGG-19, where the number denotes the total count of weight layers (convolutional plus fully connected). The most widely used are VGG-16 and VGG-19; below we take VGG-16 as an example and walk through its structure in detail.
| # | Layer | Activation | Pooling |
| --- | --- | --- | --- |
| 1 | Conv2d(3, 64, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 2 | Conv2d(64, 64, k=3, s=1, p=1) | ReLU(inplace=True) | MaxPool2d(k=2, s=2, p=0) |
| 3 | Conv2d(64, 128, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 4 | Conv2d(128, 128, k=3, s=1, p=1) | ReLU(inplace=True) | MaxPool2d(k=2, s=2, p=0) |
| 5 | Conv2d(128, 256, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 6 | Conv2d(256, 256, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 7 | Conv2d(256, 256, k=3, s=1, p=1) | ReLU(inplace=True) | MaxPool2d(k=2, s=2, p=0) |
| 8 | Conv2d(256, 512, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 9 | Conv2d(512, 512, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 10 | Conv2d(512, 512, k=3, s=1, p=1) | ReLU(inplace=True) | MaxPool2d(k=2, s=2, p=0) |
| 11 | Conv2d(512, 512, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 12 | Conv2d(512, 512, k=3, s=1, p=1) | ReLU(inplace=True) | |
| 13 | Conv2d(512, 512, k=3, s=1, p=1) | ReLU(inplace=True) | MaxPool2d(k=2, s=2, p=0) |
| | AdaptiveAvgPool2d(output_size=(7, 7)) | | |
| 14 | Linear(in=25088, out=4096, bias=True) | ReLU(inplace=True) | Dropout(p=0.5) |
| 15 | Linear(in=4096, out=4096, bias=True) | ReLU(inplace=True) | Dropout(p=0.5) |
| 16 | Linear(in=4096, out=1000, bias=True) | | |
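The shapes in the table can be verified with simple arithmetic (a sketch, assuming the standard 224×224 input): every 3×3, s=1, p=1 convolution preserves the spatial size, and each of the five MaxPool2d(k=2, s=2) stages halves it, so the classifier receives a 512×7×7 feature map.

```python
size = 224  # assumed standard input resolution
for _ in range(5):   # five MaxPool2d(k=2, s=2) stages
    size //= 2       # 3x3, s=1, p=1 convs leave the spatial size unchanged
print(size)          # 7

flattened = 512 * size * size  # matches Linear(in=25088, ...) in the table
print(flattened)               # 25088
```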
2. Model Fine-Tuning
When fine-tuning a pretrained model, it is recommended to train strictly at the 224×224 input size used during pretraining. When training from scratch (with randomly initialized weights), input images of any size can be used and 224×224 is not required. Also note that torchvision provides two variants, vgg16 and vgg16_bn; the latter adds BatchNorm2d layers to the network. In what follows, we fine-tune on top of the vgg16 pretrained model.
Since we fine-tune on the CIFAR-10 dataset while the pretrained model's output layer has 1000 classes, we need to replace the last layer manually.
num_features = estimator.classifier[6].in_features  # input dimension of the last layer
estimator.classifier[6] = nn.Linear(num_features, 10)  # CIFAR-10 has 10 classes
import torch
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn as nn
from tqdm import tqdm
def train():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    transform = transforms.Compose([transforms.Resize((224, 224)),
                                    transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (0.5,))])
    # Download the CIFAR-10 training set
    train_data = torchvision.datasets.CIFAR10(root='data', train=True, download=True, transform=transform)
    dataloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
    # Load the VGG16 pretrained model
    # IMAGENET1K_V1: includes the fully connected layers; for direct classification and full fine-tuning
    # IMAGENET1K_FEATURES: convolutional layers only; for transfer learning / feature extraction
    # Download the weights from: https://download.pytorch.org/models/vgg16-397923af.pth
    estimator = models.vgg16()
    estimator.load_state_dict(torch.load('vgg16-397923af.pth'))
    estimator.train()
    # Replace the final fully connected layer
    num_features = estimator.classifier[6].in_features  # input dimension of the last layer
    estimator.classifier[6] = nn.Linear(num_features, 10)  # CIFAR-10 has 10 classes
    estimator = estimator.to(device)
    # Loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(estimator.parameters(), lr=0.0001)
    num_epochs = 10
    for epoch in range(num_epochs):
        running_loss, running_size = 0.0, 0
        progress = tqdm(range(len(dataloader)), desc='Epoch: %2d Loss: %.3f' % (0, 0))
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = estimator(inputs)
            cur_loss = criterion(outputs, labels)
            cur_loss.backward()
            optimizer.step()
            running_loss += cur_loss.item() * len(labels)
            running_size += len(labels)
            progress.set_description('Epoch: %2d Loss: %.3f' % (epoch + 1, running_loss / running_size))
            progress.update()
        progress.close()
    torch.save(estimator.state_dict(), 'vgg16_cifar10.pth')

if __name__ == '__main__':
    train()
Epoch:  1 Loss: 0.437: 100%|██████████████████| 782/782 [09:31<00:00, 1.37it/s]
Epoch:  2 Loss: 0.197: 100%|██████████████████| 782/782 [09:32<00:00, 1.37it/s]
3. Model Evaluation
import torchvision
import torch
import torchvision.models as models
from tqdm import tqdm
from torchvision.transforms import transforms
from sklearn.metrics import accuracy_score
from torch.utils.data import DataLoader
def evaluate():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    transform = transforms.Compose([transforms.Resize((224, 224)),
                                    transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (0.5,))])
    test_data = torchvision.datasets.CIFAR10(root='data', train=False, download=True, transform=transform)
    dataloader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False)
    estimator = models.vgg16(num_classes=10)
    estimator.load_state_dict(torch.load('vgg16_cifar10.pth'))
    estimator = estimator.to(device)
    estimator.eval()  # switch to inference mode (disables Dropout)
    progress = tqdm(range(len(dataloader)), desc='Acc: %.2f' % 0)
    y_true, y_pred = [], []
    with torch.no_grad():  # no gradients needed during evaluation
        for inputs, batch_true in dataloader:
            inputs = inputs.to(device)
            outputs = estimator(inputs)
            batch_pred = torch.argmax(outputs, dim=-1)
            y_true.extend(batch_true.tolist())
            y_pred.extend(batch_pred.cpu().tolist())
            accuracy = accuracy_score(y_true, y_pred)
            progress.set_description('Acc: %.2f' % accuracy)
            progress.update()
    progress.close()

if __name__ == '__main__':
    evaluate()
Acc: 0.91: 100%|██████████████████████████████| 157/157 [00:39<00:00, 3.96it/s]
