From LeNet to MobileNet: A Hands-On Guide to Reproducing 6 Classic CNN Models in PyTorch (with Complete Code) - 程序员充电站

From LeNet to MobileNet: 6 Classic CNN Models in Practice with PyTorch

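1. Environment Setup and Basic Toolchain

Before reproducing the classic CNN models, set up a suitable development environment. Python 3.8+ with PyTorch 1.10+ is the recommended baseline; it is a widely supported and stable combination.

The basic setup steps are: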

conda create -n pytorch_cnn python=3.8
conda activate pytorch_cnn
pip install torch torchvision torchaudio
pip install jupyter matplotlib numpy tqdm

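For GPU acceleration you need an NVIDIA driver and a PyTorch build that targets a CUDA version the driver supports. The official PyTorch binaries bundle the CUDA runtime they need, so a separate CUDA toolkit is usually only required when compiling custom extensions. The PyTorch website lists the install command for each supported CUDA version; pick the one that matches your GPU and driver.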

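Key tools:

Jupyter Notebook: an interactive programming environment, well suited to model debugging and experiments
Matplotlib: a visualization library for showing model structure and the training process
tqdm: a progress-bar utility that makes training progress easier to follow

Tip: If you run into package conflicts, try installing the main dependencies with conda instead of pip. conda handles complex dependency graphs better.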
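Before moving on, it can help to confirm that the packages installed above are actually visible to the interpreter. The following is a minimal sketch using only the standard library; the names `check_environment` and `REQUIRED` are illustrative and not part of any library.

```python
import importlib.util
import sys

# Packages installed by the setup commands above (illustrative list).
REQUIRED = ["torch", "torchvision", "matplotlib", "numpy", "tqdm"]


def check_environment(packages=REQUIRED, min_python=(3, 8)):
    """Return a dict mapping each check to True/False without importing the packages."""
    report = {"python>=%d.%d" % min_python: sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when the package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:15s} {'OK' if ok else 'MISSING'}")
```

Once PyTorch itself imports cleanly, `torch.cuda.is_available()` reports whether a usable GPU was found.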

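2. LeNet-5: The Pioneering CNN

LeNet-5 is one of the earliest convolutional neural networks, published by Yann LeCun and colleagues in 1998 for handwritten digit recognition. Its structure is simple, yet it already contains the core components of modern CNNs: convolution, pooling, and fully connected layers.

Key PyTorch implementation: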

import torch
import torch.nn as nn


class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()
        # Feature extractor: two conv -> tanh -> average-pool stages
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),
            nn.Conv2d(6, 16, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),
        )
        # Classifier: expects 16x5x5 features, which requires a 32x32 input
        self.classifier = nn.Sequential(
            nn.Linear(16*5*5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        x = self.classifier(x)
        return x

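Model structure (for a 32×32 input; 28×28 MNIST images must be padded or resized to 32×32 first):

Layer	Description	Output size
Input	32×32 single-channel image	1×32×32
Conv1	6 kernels of size 5×5	6×28×28
Tanh	Activation function	6×28×28
Pool1	2×2 average pooling	6×14×14
Conv2	16 kernels of size 5×5	16×10×10
Tanh	Activation function	16×10×10
Pool2	2×2 average pooling	16×5×5
FC1	Fully connected layer	120
FC2	Fully connected layer	84
Output	Fully connected layer	10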
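The output sizes in the table follow from the usual formula floor((size + 2·padding − kernel) / stride) + 1. As a small sketch (the helper names `conv_out` and `trace_lenet5` are made up for illustration), the sizes can be traced without running the model:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride) for the conv/pool layers of the LeNet-5 above.
# AvgPool2d(kernel_size=2) uses a stride equal to the kernel size by default.
LENET5_LAYERS = [("Conv1", 5, 1), ("Pool1", 2, 2), ("Conv2", 5, 1), ("Pool2", 2, 2)]


def trace_lenet5(size=32):
    """Return the spatial size after each conv/pool layer."""
    sizes = {}
    for name, kernel, stride in LENET5_LAYERS:
        size = conv_out(size, kernel, stride)
        sizes[name] = size
    return sizes


if __name__ == "__main__":
    print(trace_lenet5(32))
    print(trace_lenet5(28))
```

With a 28×28 input the last pooling layer produces 4×4 maps, so the first linear layer (which expects 16×5×5 = 400 features) would reject the tensor. That is why the input has to be 32×32.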

Training tips:

Use cross-entropy loss with an SGD optimizer
Set the learning rate to 0.01 and momentum to 0.9
A batch size of 64 or 128 is a reasonable starting point
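To make the roles of the learning rate and momentum concrete, here is a plain-Python sketch of the update rule that SGD with momentum applies (velocity = momentum × velocity + gradient, then parameter −= lr × velocity). This matches the formulation used by `torch.optim.SGD` when dampening, Nesterov, and weight decay are left at their defaults. The function name and the toy objective are illustrative only.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update on plain Python lists (updated in place)."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g
        params[i] -= lr * velocity[i]
    return params, velocity


def minimise_toy_quadratic(steps=200):
    """Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w, v = [0.0], [0.0]
    for _ in range(steps):
        grad = [2.0 * (w[0] - 3.0)]
        sgd_momentum_step(w, grad, v)
    return w[0]


if __name__ == "__main__":
    print(minimise_toy_quadratic())  # approaches 3.0
```

In actual training code the equivalent configuration would be `torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)` together with `nn.CrossEntropyLoss()`.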

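3. AlexNet: A Milestone for Deep CNNs

AlexNet made its name by winning the 2012 ImageNet competition and ushered in a new era of deep learning. Compared with LeNet, it introduced several new techniques.

Key improvements:

ReLU activations instead of Tanh, which mitigates the vanishing gradient problem
Dropout to reduce overfitting
Overlapping pooling (3×3 windows with stride 2)
Local response normalization (LRN)
Data augmentation

Core PyTorch implementation (this follows the single-GPU variant shipped with torchvision, which omits LRN and uses 64/192/384/256/256 channels instead of the paper's 96/256/384/384/256):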

import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # overlapping pooling
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # Forces a 6x6 feature map so the classifier works for other input sizes too
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256*6*6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
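The `256*6*6` input size of the classifier comes from the spatial size left after the last pooling layer. The sketch below traces a 224×224 input through the kernel, stride, and padding settings of the `features` block above; the names `out_size`, `ALEXNET_LAYERS`, and `trace` are illustrative.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding), copied from the `features` block above.
ALEXNET_LAYERS = [
    ("conv1", 11, 4, 2), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace(size, layers):
    """Return a list of (layer name, spatial size after that layer)."""
    sizes = []
    for name, kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


if __name__ == "__main__":
    for name, size in trace(224, ALEXNET_LAYERS):
        print(f"{name}: {size}x{size}")
```

Each pooling layer uses a 3×3 window with stride 2, so neighbouring windows share one row or column; that overlap is what "overlapping pooling" refers to.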

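Training notes:

For large datasets such as ImageNet, multi-GPU training is recommended
Decay the learning rate in steps (a piecewise-constant schedule)
Use He (Kaiming) initialization for the weights, which suits ReLU layers
Adding batch normalization (BatchNorm), which the original AlexNet does not use, can noticeably improve performance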
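A step-decay schedule can be written in a few lines. This sketch multiplies the learning rate by `gamma` at each milestone epoch, which is also how `torch.optim.lr_scheduler.MultiStepLR` behaves; the base rate, milestones, and decay factor below are placeholder values chosen for illustration, not settings from the AlexNet paper.

```python
from bisect import bisect_right


def step_decay_lr(epoch, base_lr=0.01, milestones=(30, 60), gamma=0.1):
    """Piecewise-constant schedule: multiply the rate by `gamma` at each milestone."""
    # bisect_right counts how many milestones are <= epoch.
    return base_lr * gamma ** bisect_right(milestones, epoch)


if __name__ == "__main__":
    for epoch in (0, 29, 30, 59, 60, 89):
        print(epoch, step_decay_lr(epoch))
```

In a PyTorch training loop the same schedule would typically be created with `MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)` and advanced by calling `scheduler.step()` once per epoch.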

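4. VGGNet: Depth and Regularity

VGGNet is known for its minimalist design of stacked 3×3 convolutions, and it demonstrated how much network depth matters for performance.

Core characteristics of VGG:

Uses small 3×3 convolution kernels throughout
Network depth ranges from 11 to 19 weight layers
The channel count doubles after each pooling layer (64 → 128 → 256 → 512) until it reaches 512
Ends with three fully connected layers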
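The reason small kernels work well is that a stack of 3×3 convolutions covers the same receptive field as one larger kernel while using fewer weights. The following sketch compares the two, assuming stride 1, equal input and output channel counts, and no bias terms; the function names are illustrative.

```python
def stacked_3x3(num_layers, channels):
    """Weights and receptive field of `num_layers` stacked 3x3 convs."""
    weights = num_layers * 3 * 3 * channels * channels
    receptive_field = 1 + 2 * num_layers  # each 3x3 layer widens the field by 2 pixels
    return weights, receptive_field


def single_conv(kernel, channels):
    """Weights and receptive field of one kernel x kernel conv."""
    return kernel * kernel * channels * channels, kernel


if __name__ == "__main__":
    print("two 3x3  :", stacked_3x3(2, 64))
    print("one 5x5  :", single_conv(5, 64))
    print("three 3x3:", stacked_3x3(3, 64))
    print("one 7x7  :", single_conv(7, 64))
```

Two 3×3 layers need 18C² weights against 25C² for a single 5×5 layer, and three need 27C² against 49C² for a 7×7 layer, with an extra non-linearity between each pair of layers.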

VGG-16 implementation:

import torch
import torch.nn as nn


class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 2-5
            # ... the same pattern repeats; these blocks must be filled in before
            # forward() works, because the classifier expects 512 channels
        )
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
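The listing above leaves blocks 2 to 5 elided. One common way to describe the full layer sequence is a configuration list. The sketch below assumes the standard configuration "D" from the VGG paper (the one torchvision exposes as `vgg16`) and uses plain Python to count the weight layers, track the feature-map size, and total the parameters; `VGG16_CFG` and `summarize_vgg` are illustrative names, not part of the original article's code.

```python
# Assumed configuration "D" (VGG-16): numbers are conv output channels,
# "M" marks a 2x2 max-pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def summarize_vgg(cfg, in_channels=3, size=224, num_classes=1000):
    """Count weight layers and parameters, and track the feature-map shape."""
    conv_layers, params = 0, 0
    for item in cfg:
        if item == "M":
            size //= 2  # pooling halves height and width
        else:
            # 3x3 conv with padding=1 keeps the spatial size; weights + biases
            params += 3 * 3 * in_channels * item + item
            in_channels = item
            conv_layers += 1
    # Three fully connected layers, as in the classifier above.
    fc_dims = [in_channels * size * size, 4096, 4096, num_classes]
    for fan_in, fan_out in zip(fc_dims, fc_dims[1:]):
        params += fan_in * fan_out + fan_out
    return {
        "weight_layers": conv_layers + 3,
        "feature_shape": (in_channels, size, size),
        "params": params,
    }


if __name__ == "__main__":
    print(summarize_vgg(VGG16_CFG))
```

To turn such a list into modules, each number would become `nn.Conv2d(in_channels, n, kernel_size=3, padding=1)` followed by `nn.ReLU(inplace=True)`, and each "M" would become `nn.MaxPool2d(kernel_size=2, stride=2)`. The total of about 138M parameters agrees with the VGG-16 row in the comparison table.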

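Comparison of VGG variants:

Model	Weight layers	Parameters	Top-1 error (approx.)
VGG-11	11	133M	28.5%
VGG-13	13	133M	28.0%
VGG-16	16	138M	27.0%
VGG-19	19	144M	26.7%

Note: In practice VGG-16 is the most commonly used variant; it strikes a good balance between accuracy and complexity.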

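5. ResNet: The Residual Learning Breakthrough

By introducing residual (skip) connections, ResNet made it possible to train networks with more than 100 layers and addressed the degradation problem, in which making a plain network deeper increases even its training error.

Residual block implementation: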

import torch.nn as nn


class BasicBlock(nn.Module):
    expansion = 1  # output channels = planes * expansion

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # Optional module that reshapes the shortcut when stride or channels change
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity  # residual connection
        out = self.relu(out)
        return out

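Architectural features of ResNet:

Batch normalization (BatchNorm) speeds up training
Global average pooling followed by a single fully connected layer replaces the large fully connected stack of earlier networks
Residual connections give gradients a direct path during backpropagation
The bottleneck block (1×1, 3×3, 1×1 convolutions), used in ResNet-50 and deeper, reduces computation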
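How much the bottleneck saves can be checked with simple arithmetic. This sketch counts convolution weights only (no biases or BatchNorm parameters) for a block whose input and output both have `channels` channels, and assumes the usual 4× channel reduction inside the bottleneck; the function names are illustrative.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution without bias."""
    return kernel * kernel * c_in * c_out


def basic_block_weights(channels):
    """Two 3x3 convolutions at full width, as in BasicBlock."""
    return 2 * conv_weights(3, channels, channels)


def bottleneck_weights(channels, reduction=4):
    """1x1 reduce -> 3x3 at reduced width -> 1x1 expand."""
    mid = channels // reduction
    return (conv_weights(1, channels, mid)
            + conv_weights(3, mid, mid)
            + conv_weights(1, mid, channels))


if __name__ == "__main__":
    print("basic block at 256 channels:", basic_block_weights(256))
    print("bottleneck at 256 channels :", bottleneck_weights(256))
```

At 256 channels the bottleneck uses roughly one seventeenth of the weights of two full-width 3×3 convolutions, which is what lets ResNet-50 and deeper variants run at 256 to 2048 channels at an affordable cost.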

ResNet configurations at different depths:

Model	Layers	Parameters	Top-1 error (approx.)
ResNet-18	18	11.7M	27.9%
ResNet-34	34	21.8M	24.0%
ResNet-50	50	25.6M	22.9%
ResNet-101	101	44.5M	21.8%
ResNet-152	152	60.2M	21.4%

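6. MobileNet: The Model Lightweight CNN

The MobileNet family is designed for mobile and embedded devices; it cuts computation substantially by using depthwise separable convolutions.

Depthwise separable convolution implementation: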

import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(DepthwiseSeparableConv, self).__init__()
        # Depthwise: groups=in_channels gives one 3x3 filter per input channel
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                      padding=1, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU6(inplace=True)
        )
        # Pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True)
        )

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return x
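The saving from splitting a convolution into depthwise and pointwise steps can be estimated directly. The sketch below counts weights and multiply-adds for one layer, ignoring BatchNorm and bias terms; the function names and the 256-channel, 14×14 example are illustrative choices.

```python
def standard_conv_cost(c_in, c_out, out_hw, kernel=3):
    """(weights, multiply-adds) of a standard kernel x kernel convolution."""
    params = kernel * kernel * c_in * c_out
    return params, params * out_hw * out_hw


def separable_conv_cost(c_in, c_out, out_hw, kernel=3):
    """(weights, multiply-adds) of a depthwise + pointwise pair."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1x1 conv across channels
    params = depthwise + pointwise
    return params, params * out_hw * out_hw


if __name__ == "__main__":
    std = standard_conv_cost(256, 256, 14)
    sep = separable_conv_cost(256, 256, 14)
    print("standard :", std)
    print("separable:", sep)
    print("ratio    :", sep[0] / std[0])
```

The ratio works out to 1/c_out + 1/k², so with 3×3 kernels and a few hundred output channels the separable version needs roughly one eighth to one ninth of the computation.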

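MobileNetV1 vs. MobileNetV2:

Feature	MobileNetV1	MobileNetV2
Basic unit	Depthwise separable convolution	Inverted residual block
Activation	ReLU6	ReLU6 (expansion layer) / linear (output layer)
Channel flow	No expansion (width set by the pointwise convolution)	Expand first, then compress
Parameters	4.2M	3.4M
Compute (multiply-adds)	569M	300M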

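Practical recommendations:

For mobile applications, consider MobileNetV3 first
When higher accuracy is needed, EfficientNet is an option
Quantized MobileNet models run very efficiently on embedded devices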

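7. Model Training and Tuning Techniques

General training workflow:

Data preparation and augmentation
Model initialization
Choosing a loss function
Configuring the optimizer
Learning-rate scheduling
Monitoring training

PyTorch training loop skeleton: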

import copy
import time

import torch

# Run on the GPU when one is available; the model is expected to be on `device` already
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def train_model(model, dataloaders, criterion, optimizer, num_epochs=25):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        # Each epoch has a training phase and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()

                # Track gradients only during the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            print(f'Epoch {epoch + 1}/{num_epochs} {phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # Keep a copy of the weights that score best on the validation set
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed//60:.0f}m {time_elapsed%60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')

    model.load_state_dict(best_model_wts)
    return model

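Performance optimization techniques:

Mixed-precision training: use torch.cuda.amp to reduce GPU memory usage
Gradient accumulation: simulate a larger batch size by accumulating gradients over several mini-batches before each optimizer step
Learning-rate warmup: avoid instability early in training
Label smoothing: improve the model's generalization
Model pruning: remove unimportant connections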
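Two of the techniques in the list are easy to express as formulas. The sketch below shows a linear learning-rate warmup and the target distribution produced by label smoothing, (1 − ε) · one_hot + ε / K, which is the same formulation used by the `label_smoothing` argument of `nn.CrossEntropyLoss`. The function names and default values are illustrative assumptions.

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=500):
    """Linear warmup to `base_lr` over `warmup_steps` steps, then constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


def smooth_labels(label, num_classes, epsilon=0.1):
    """Smoothed target: (1 - epsilon) on the true class plus epsilon / K everywhere."""
    uniform = epsilon / num_classes
    return [uniform + (1.0 - epsilon if k == label else 0.0)
            for k in range(num_classes)]


if __name__ == "__main__":
    print([warmup_lr(s) for s in (0, 249, 499, 1000)])
    print(smooth_labels(2, num_classes=10))
```

In practice the warmup would be followed by a decay schedule, and label smoothing would usually be enabled with `nn.CrossEntropyLoss(label_smoothing=0.1)` rather than by building the targets by hand.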

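8. Model Deployment and Production Practice

Comparison of common deployment options:

Option	Pros	Cons	Typical use
Native PyTorch	Simple and direct	Requires a full PyTorch environment	Research prototypes
TorchScript	Cross-platform; runs without a Python interpreter	Requires an extra conversion step	Production environments
ONNX	Framework-neutral	Some features may be lost in conversion	Cross-framework workflows
TensorRT	Highly optimized inference	NVIDIA hardware only	High-performance inference
Core ML	Native iOS support	Apple ecosystem only	Mobile apps

ONNX export example: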

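import torch
import torch.onnx
from torchvision.models import resnet18

# `pretrained=True` is deprecated on torchvision >= 0.13;
# there, use resnet18(weights=ResNet18_Weights.DEFAULT) instead.
model = resnet18(pretrained=True)
model.eval()  # export in inference mode (affects BatchNorm and Dropout)

# Example input that fixes the traced shapes; the batch axis is made dynamic below
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)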

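Performance optimization recommendations:

Use half precision (FP16) to reduce memory usage
Apply TensorRT for graph-level optimization
Batch requests to increase throughput
Consider INT8 quantization for further speedups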
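The memory effect of FP16 and INT8 is easy to estimate from the parameter counts given earlier in this article. The sketch below covers weight storage only; it ignores activations, framework overhead, and any layers a real quantization scheme keeps in higher precision. The dictionary and function names are illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts in millions, taken from the tables in this article.
PARAMS_M = {"VGG-16": 138.0, "ResNet-50": 25.6, "MobileNetV1": 4.2}


def weight_memory_mb(params_millions, dtype):
    """Approximate weight storage in MB (10^6 bytes) for a given precision."""
    return params_millions * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name, params in PARAMS_M.items():
        sizes = ", ".join(f"{d}: {weight_memory_mb(params, d):.1f} MB"
                          for d in BYTES_PER_PARAM)
        print(f"{name:12s} {sizes}")
```

Halving or quartering the bytes per weight shrinks the model file and memory traffic proportionally; the actual speedup depends on whether the target hardware has fast FP16 or INT8 paths.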

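9. Comparing the Classic CNN Models and Choosing One

Overall comparison of the six models:

Model	Parameters	Compute (approx. multiply-adds)	Typical use	Strengths
LeNet	60K	0.3M	Simple image classification	Simple structure, easy to understand
AlexNet	60M	720M	Medium-complexity classification	Classic architecture, good for teaching
VGG (VGG-16)	138M	15.5G	Feature extraction	Regular structure, stable performance
ResNet (ResNet-50)	25.6M	4.1G	Complex vision tasks	Scales well with depth, strong accuracy
MobileNet (V1)	4.2M	569M	Mobile applications	Efficient computation, low latency
EfficientNet (B0)	5.3M	390M	Resource-constrained scenarios	High parameter efficiency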