Late-night rush job: a CNN for color image recognition, built to benchmark a sky-high-priced "nuke"

Going further and further down the image-recognition road ✌( •̀ ω •́ )y

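1. Some background

It's late and my head isn't very clear, so most of the code is adapted from GitHub...
This CNN image-recognition network will later be used to evaluate the performance of an NVIDIA DGX server, so I deliberately make the training run as long as possible.
The server carries 8 NVIDIA Tesla V100 cards, currently the top-end compute cards for deep learning. By the figures I have, a single card sells for 1.02 million RMB and the whole machine for close to 10 million: a sky-high-priced nuke. Must be nice to have money. According to information online, this server can finish in 8 hours what takes a Titan X 8 days, or a top consumer CPU several months.

The network is based on an image-recognition project on GitHub. It uses the DenseNet model, and I added ImageDataGenerator to augment the dataset. The plan is to run it on each platform later, changing only the constant epochs.

Since this was a late-night rush, I haven't finished configuring the GPU, so I set epochs to 1 and try it on the CPU first, then estimate from experience how long it would take on a GTX 1080.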

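2. About the dataset
Training uses the CIFAR-10 dataset: 60,000 color images of 32x32 pixels, divided into 10 classes, as shown in the figure:

For details, see the University of Toronto site: http://www.cs.toronto.edu/~kr...

The goal of the network is to classify each image into its own class as accurately as possible.

After downloading the dataset, rename it and put it straight into the .keras\datasets folder under your user directory:

After extracting it, you can see the dataset is split into 6 batches, 5 for training and 1 for testing: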
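
For anyone who wants to poke at the extracted files by hand, here is a minimal sketch of reading one batch file. It assumes the "CIFAR-10 python version" layout: a pickled dict whose `data` rows hold 3072 bytes each (1024 red, then 1024 green, then 1024 blue values) plus a `labels` list. `load_batch` is a hypothetical helper name of mine; in the rest of this post `cifar10.load_data()` does this work.

```python
import pickle
import numpy as np

def load_batch(path):
    """Read one CIFAR-10 batch file and return (images, labels)."""
    with open(path, 'rb') as f:
        # The files were pickled under Python 2, so keys come back as bytes.
        batch = pickle.load(f, encoding='bytes')
    data = np.asarray(batch[b'data'], dtype=np.uint8)
    # Each row is channel-major (R plane, G plane, B plane);
    # reshape to N x 3 x 32 x 32, then move channels last for Keras.
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.asarray(batch[b'labels'], dtype=np.int64)
    return images, labels
```

Each of the 6 files should give 10,000 images of shape 32x32x3 if the layout assumption holds.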

3. It's late, so straight to the code:

Import the libraries (numpy, keras, and the standard-library math):

import numpy as np
import keras
import math
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers import Conv2D, Dense, Input, add, Activation, AveragePooling2D, GlobalAveragePooling2D
from keras.layers import Lambda, concatenate
from keras.initializers import he_normal
from keras.layers.merge import Concatenate
from keras.callbacks import LearningRateScheduler, TensorBoard, ModelCheckpoint
from keras.models import Model
from keras import optimizers
from keras import regularizers
from keras.utils.vis_utils import plot_model as plot

设置常量：

growth_rate        = 12 
depth              = 100
compression        = 0.5

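img_rows, img_cols = 32, 32           # image size
img_channels       = 3                # number of color channels (RGB)
num_classes        = 10               # number of classes in the dataset
batch_size         = 64               # examples per training batch; I only use 64 or 32 (iterations below assumes 64)
epochs             = 1                # passes over the full dataset; just one here, on the CPU
                                      # change this to suit the GPU under test and your own needs
                                      # around 250 epochs gives good accuracy, but accuracy is not the point here

iterations         = 782              # steps per epoch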
weight_decay       = 0.0001

mean = [125.307, 122.95, 113.865]
std  = [62.9932, 62.0887, 66.7048]
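
Where does iterations = 782 come from? A quick sketch of the arithmetic, assuming the usual 50,000 CIFAR-10 training images (5 batches of 10,000) and the batch_size of 64 set above; `train_examples` and `steps_per_epoch` are names I made up for the illustration.

```python
import math

train_examples = 50000   # assumption: 5 training batches of 10,000 images
batch_size = 64

# One epoch must cover every training image once, so round up.
steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch)                    # 782

# Switching to batch_size = 32 would need a different constant.
print(math.ceil(train_examples / 32))     # 1563
```

So if you change batch_size, iterations should change with it.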

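Change the learning rate with the epoch number via scheduler: the later the epoch, the smaller the value, so the random jitter in the weight updates gradually shrinks as training goes on: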

def scheduler(epoch):
    if epoch <= 100:
        return 0.1
    if epoch <= 180:
        return 0.01
    return 0.0005
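
To see where the learning rate actually drops, here is a small standalone check. It repeats the function so it runs on its own, and the probe epochs are just values I picked around the thresholds. One thing to keep in mind: Keras hands the schedule function a 0-based epoch index.

```python
def scheduler(epoch):
    # Same thresholds as the function above.
    if epoch <= 100:
        return 0.1
    if epoch <= 180:
        return 0.01
    return 0.0005

# Probe the schedule on both sides of each threshold.
for epoch in (0, 100, 101, 180, 181, 249):
    print(epoch, scheduler(epoch))
```

With a 0-based index, the first 101 epochs run at 0.1, the next 80 at 0.01, and everything after that at 0.0005.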

Define a DenseNet model (GitHub code mover reporting for duty!):

def densenet(img_input,classes_num):

    def bn_relu(x):
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        return x

    def bottleneck(x):
        channels = growth_rate * 4
        x = bn_relu(x)
        x = Conv2D(channels,kernel_size=(1,1),strides=(1,1),padding='same',kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        x = bn_relu(x)
        x = Conv2D(growth_rate,kernel_size=(3,3),strides=(1,1),padding='same',kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        return x

    def single(x):
        x = bn_relu(x)
        x = Conv2D(growth_rate,kernel_size=(3,3),strides=(1,1),padding='same',kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        return x

    def transition(x, inchannels):
        x = bn_relu(x)
        x = Conv2D(int(inchannels * compression),kernel_size=(1,1),strides=(1,1),padding='same',kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(x)
        x = AveragePooling2D((2,2), strides=(2, 2))(x)
        return x

    def dense_block(x,blocks,nchannels):
        concat = x
        for i in range(blocks):
            x = bottleneck(concat)
            concat = concatenate([x,concat], axis=-1)
            nchannels += growth_rate
        return concat, nchannels

    def dense_layer(x):
        return Dense(classes_num,activation='softmax',kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay))(x)


    # nblocks = (depth - 4) // 3 
    nblocks = (depth - 4) // 6 
    nchannels = growth_rate * 2

    x = Conv2D(nchannels,kernel_size=(3,3),strides=(1,1),padding='same',kernel_initializer=he_normal(),kernel_regularizer=regularizers.l2(weight_decay),use_bias=False)(img_input)

    x, nchannels = dense_block(x,nblocks,nchannels)
    x = transition(x,nchannels)
    x, nchannels = dense_block(x,nblocks,nchannels)
    x = transition(x,nchannels)
    x, nchannels = dense_block(x,nblocks,nchannels)
    x = bn_relu(x)
    x = GlobalAveragePooling2D()(x)
    x = dense_layer(x)
    return x
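
The channel bookkeeping in densenet() is easy to lose track of, so here is a rough sketch that traces it with plain arithmetic, using the constants from earlier. The helper names `after_dense_block` and `after_transition` are mine, not part of the model code. The division by 6 is because each bottleneck layer holds 2 convolutions and there are 3 dense blocks: 1 initial convolution + 3 x 16 x 2 + 2 transition convolutions + 1 dense layer = 100 layers.

```python
growth_rate = 12
depth = 100
compression = 0.5

nblocks = (depth - 4) // 6     # bottleneck layers per dense block
nchannels = growth_rate * 2    # channels after the first convolution

def after_dense_block(channels):
    # Every bottleneck layer concatenates growth_rate new feature maps.
    return channels + nblocks * growth_rate

def after_transition(channels):
    # The 1x1 convolution in a transition keeps a `compression` share.
    return int(channels * compression)

c1 = after_dense_block(nchannels)
t1 = after_transition(c1)
c2 = after_dense_block(t1)
t2 = after_transition(c2)
c3 = after_dense_block(t2)
print(nblocks, [nchannels, c1, t1, c2, t2, c3])
# 16 [24, 216, 108, 300, 150, 342]
```

So the global average pooling at the end sees 342 feature maps.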

Load the dataset, one-hot encode the labels, and convert the image data to float32:

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test  = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype('float32')
x_test  = x_test.astype('float32')
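
If it isn't obvious what the label conversion does, here is a minimal NumPy sketch of the same idea. `one_hot` is a hypothetical stand-in I wrote for illustration, not the Keras implementation; the sample labels are made up, shaped (N, 1) the way cifar10.load_data() returns them.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1."""
    labels = np.asarray(labels).reshape(-1)          # (N, 1) -> (N,)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0    # one 1 per row
    return encoded

y = np.array([[3], [0], [9]])   # made-up labels
print(one_hot(y, 10))
```

Each row lines up with the 10 softmax outputs of the network, which is what the cross-entropy loss expects.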

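Standardize the dataset channel by channel (subtract the mean, divide by the standard deviation) to make training easier: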

for i in range(3):
    x_train[:,:,:,i] = (x_train[:,:,:,i] - mean[i]) / std[i]
    x_test[:,:,:,i] = (x_test[:,:,:,i] - mean[i]) / std[i]
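
The loop above can also be written with NumPy broadcasting. This is only a sketch under the same mean and std values; `standardize` is a name I made up, and the demo array is dummy data rather than real CIFAR-10 images.

```python
import numpy as np

mean = np.array([125.307, 122.95, 113.865], dtype='float32')
std = np.array([62.9932, 62.0887, 66.7048], dtype='float32')

def standardize(images):
    # mean and std have shape (3,), so they broadcast over the
    # last (channel) axis of an N x 32 x 32 x 3 array.
    return (images.astype('float32') - mean) / std

demo = np.full((2, 32, 32, 3), 128, dtype=np.uint8)   # dummy gray images
out = standardize(demo)
print(out.shape, out[0, 0, 0])
```

Both versions compute the same thing; the broadcast form just avoids indexing each channel by hand.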

Define the model and plot a diagram of it. The model summary printed in the shell is far too long to paste here; if you need it, just print the summary in the shell:

img_input = Input(shape=(img_rows,img_cols,img_channels))
output    = densenet(img_input,num_classes)
model     = Model(img_input, output)
# model.load_weights('ckpt.h5')
model.summary()  # summary() prints the table itself and returns None
plot(model, to_file='cnn_model.png',show_shapes=True)

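The model's parameter counts are shown in the figure below. This is the annoying part of image recognition: there are so many parameters, with gradients to compute for all of them. No wonder the sky-high-priced nuke sells so well despite the price:

At heart this is still a classification problem, so cross-entropy is used as the loss function to measure how good the output is: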

sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
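
To make "cross-entropy measures how good the output is" a bit more concrete, here is a tiny hand-rolled sketch for one sample. It is not the Keras implementation; the `eps` guard against log(0) and the example probabilities are assumptions of mine.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1]                  # the true class is the third one
confident = [0.05, 0.05, 0.90]      # made-up softmax outputs
unsure = [0.40, 0.40, 0.20]

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, unsure))
```

The confident, correct prediction gets the smaller loss, which is the behavior the optimizer pushes toward.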

Set up the callbacks:

tb_cb     = TensorBoard(log_dir='./densenet/', histogram_freq=0)
change_lr = LearningRateScheduler(scheduler)
ckpt      = ModelCheckpoint('./ckpt.h5', save_best_only=False, mode='auto', period=10)
cbks      = [change_lr,tb_cb,ckpt]

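Add data augmentation, which applies random transformations to the images; here that means horizontal flips plus horizontal and vertical shifts of up to 12.5% of the image size: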

print('Using real-time data augmentation.')
datagen   = ImageDataGenerator(horizontal_flip=True,width_shift_range=0.125,height_shift_range=0.125,fill_mode='constant',cval=0.)

datagen.fit(x_train)
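
For a feel of what these transformations do to a single image, here is a simplified NumPy sketch. It is my own approximation, not how ImageDataGenerator is implemented: it only shifts by whole pixels, and `shift_image` and `random_flip_and_shift` are hypothetical helper names.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift an HxWxC image by (dy, dx) pixels, filling the gap with zeros."""
    h, w, _ = image.shape
    out = np.zeros_like(image)

    def windows(shift, size):
        # Returns (source slice, destination slice) along one axis.
        if shift >= 0:
            return slice(0, size - shift), slice(shift, size)
        return slice(-shift, size), slice(0, size + shift)

    ys, yd = windows(dy, h)
    xs, xd = windows(dx, w)
    out[yd, xd, :] = image[ys, xs, :]
    return out

def random_flip_and_shift(image, rng, max_fraction=0.125):
    """Random left-right flip, then a random shift of up to max_fraction per side."""
    h, w, _ = image.shape
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(image, dy, dx)

rng = np.random.default_rng(0)
image = np.ones((32, 32, 3), dtype='float32')   # dummy image
augmented = random_flip_and_shift(image, rng)
print(augmented.shape)
```

On a 32x32 image, 0.125 works out to a shift of at most 4 pixels in each direction.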

Train the model:

model.fit_generator(datagen.flow(x_train, y_train,batch_size=batch_size), steps_per_epoch=iterations, epochs=epochs, callbacks=cbks,validation_data=(x_test, y_test))
model.save('densenet.h5')

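During training the CPU (i7-7820HK) is fully loaded:

One epoch of training on the CPU takes nearly 10,000 seconds:

Going by my earlier handwritten-digit recognition model (12 seconds on the CPU versus only 0.47 seconds on a GTX 1080, which I put at 25.72x the CPU's performance), one epoch here should take about 10,000 / 25.72 ≈ 389 seconds on a GTX 1080. Changing this program's epochs to 2500 would then need roughly 270 hours on the GTX 1080.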
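
Here is that back-of-the-envelope estimate written out as a sketch. It leans on a big assumption: that the speedup measured on the small handwritten-digit model carries over unchanged to this much larger DenseNet. The variable names are mine.

```python
cpu_seconds_per_epoch = 10000   # rounded CPU measurement from this post
gpu_speedup = 25.72             # GTX 1080 vs CPU, from the earlier experiment
epochs = 2500

gpu_seconds_per_epoch = cpu_seconds_per_epoch / gpu_speedup
total_hours = gpu_seconds_per_epoch * epochs / 3600
print(round(gpu_seconds_per_epoch), round(total_hours))   # 389 270
```

Treat the result as an order-of-magnitude guess rather than a prediction.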

So what will it look like on the V100 sky-high-priced nuke? I'll go try it tomorrow!


JackieFang
