Implementing Convolutional Neural Networks with TensorFlow

2018-01-28

TensorFlow in Practice (TensorFlow 实战)

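5.1 Introduction to Convolutional Neural Networks

A typical CNN is built from several convolutional layers, each of which usually performs the following operations:

The image is filtered by several different convolution kernels and a bias is added, extracting local features; each kernel maps the input to a new image
The filter output is passed through a nonlinear activation function, most commonly ReLU
The activated result is pooled (down-sampled), usually with max pooling, which keeps the most salient features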

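These steps make up the most common convolutional layer. An LRN (Local Response Normalization) layer can be added as well, and Batch Normalization is a popular trick.

Kernel weights are shared, so there is no need to worry about the number of hidden nodes or the image size; the parameter count depends only on the kernel size and the number of kernels
Connections are local
Increasing the number of kernels extracts more kinds of features; each kernel's filter output is one feature map. About 100 kernels in the first convolutional layer is already plenty
The number of parameters drops, but the number of hidden nodes does not; the latter depends on the stride.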

Key points of CNNs

Local connection: fewer parameters, less overfitting
Weight sharing: fewer parameters, less overfitting, and tolerance to translation
Down-sampling in the pooling layer: tolerance to mild deformation

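LeNet5 is one of the earliest deep convolutional networks. Its characteristics:

Each convolutional layer consists of convolution, pooling, and a nonlinear activation function
Convolution is used to extract spatial features
Subsampling is done by average pooling layers
The activation function is tanh or sigmoid
An MLP serves as the final classifier
Connections between layers are sparse

C1 has 6 kernels of size 5*5, for a total of (5*5+1)*6=156 parameters, where the 1 stands for one bias per kernel. It is followed by S2, a 2*2 average pooling layer that down-samples, and then a sigmoid activation. The second convolutional layer, C3, has 16 kernels of size 5*5. S4 is the same as S2. The third convolutional layer, C5, has 120 kernels of size 5*5; because its input is also 5*5, it effectively forms a fully connected layer. F6 is a fully connected layer with 84 hidden nodes and a sigmoid activation. The last layer consists of Euclidean RBF (radial basis function) units and outputs the classification result.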
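The parameter arithmetic for C1 can be checked in a few lines of plain Python. This is only a sketch: conv_param_count is a hypothetical helper, not part of any library, and it assumes every kernel sees every input channel and carries exactly one bias.

```python
def conv_param_count(kernel_h, kernel_w, in_channels, num_kernels):
    """Parameters of a dense convolutional layer: one weight per kernel
    cell per input channel, plus one bias per kernel."""
    return (kernel_h * kernel_w * in_channels + 1) * num_kernels


# LeNet5 C1: 6 kernels of size 5*5 over a single-channel input.
c1_params = conv_param_count(5, 5, 1, 6)
print(c1_params)  # 156
```

The image size never appears in the formula, which is the point of weight sharing: the count depends only on the kernel size, the channel count, and the number of kernels.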

5.2 A Simple Convolutional Network in TensorFlow

This section uses two convolutional layers and one fully connected hidden layer.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
sess = tf.InteractiveSession()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # ReLU is used, so start the bias at a small positive value to avoid dead neurons
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

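Next, define the convolution and pooling functions.

In tf.nn.conv2d, x is the input and W holds the convolution parameters, for example [5, 5, 1, 32]: the first two numbers are the kernel size, the third is the number of input channels (1 here because the images are grayscale; an RGB image would have 3), and the last is the number of kernels. strides gives the step along each dimension, with strides[0]=strides[3]=1. padding selects how borders are handled: SAME pads the border with zeros so that, at stride 1, the output has the same size as the input.

tf.nn.max_pool is the max pooling function. Here it performs 2*2 max pooling with a stride of 2 in both the horizontal and vertical directions.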

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                         padding='SAME')
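To make the pooling step concrete, here is a small pure-Python sketch of 2*2 max pooling with stride 2 on a single-channel image. max_pool_2x2_naive is a hypothetical helper written for illustration; it assumes even side lengths and ignores the batch and channel dimensions that tf.nn.max_pool handles.

```python
def max_pool_2x2_naive(image):
    """2*2 max pooling with stride 2 on a 2-D list whose sides are even."""
    rows, cols = len(image), len(image[0])
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


image = [[1, 3, 2, 0],
         [4, 2, 1, 5],
         [0, 1, 7, 2],
         [6, 2, 3, 4]]
pooled = max_pool_2x2_naive(image)
print(pooled)  # [[4, 5], [6, 7]]
```

Each output cell keeps only the largest value of its 2*2 window, so the side length is halved while the strongest response in each region survives.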

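Define the inputs and reshape each flat 784-element vector into a 2-D image. The target shape is [-1, 28, 28, 1], where -1 means the number of samples is not fixed and the final 1 is the number of channels.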

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])

Define the first convolutional layer.

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Define the second convolutional layer.

W_conv2 = weight_variable([5, 5, 32, 64])  # the first convolutional layer outputs 32 channels
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

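After two rounds of 2*2 max pooling each side is 1/4 of its original length, so the 28*28 image becomes 7*7. The second convolutional layer has 64 kernels, so its output tensor is 7*7*64. Next comes a fully connected layer with 1024 hidden nodes.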

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
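The 7*7*64 figure can be traced step by step. The sketch below assumes the usual SAME-padding rule that the output length along one axis is ceil(input / stride); same_output_size is a hypothetical helper name.

```python
def same_output_size(input_size, stride):
    """Output length along one axis under SAME padding: ceil(input / stride)."""
    return (input_size + stride - 1) // stride


side = 28
for _ in range(2):
    side = same_output_size(side, 1)  # 5*5 convolution, stride 1: size unchanged
    side = same_output_size(side, 2)  # 2*2 max pooling, stride 2: size halved
flat_size = side * side * 64          # 64 feature maps from the second conv layer
fc1_weights = flat_size * 1024        # weights in W_fc1, excluding the 1024 biases

print(side, flat_size, fc1_weights)  # 7 3136 3211264
```

Most of the model's parameters sit in this first fully connected layer, not in the convolutional layers.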

Use Dropout to reduce overfitting.

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
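As a rough illustration of what the dropout op does, the sketch below keeps each activation with probability keep_prob and scales the survivors by 1 / keep_prob, which is the behavior tf.nn.dropout documents. The dropout function here is a hypothetical pure-Python stand-in, not TensorFlow's implementation.

```python
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob and
    scale the kept ones by 1 / keep_prob; dropped values become 0."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, 0.5, rng)  # training: roughly half are zeroed
eval_out = dropout(activations, 1.0, rng)   # evaluation: nothing is dropped
print(train_out)
print(eval_out)
```

Because the kept values are scaled up during training, the expected activation stays the same, and feeding keep_prob = 1.0 at evaluation time needs no further correction.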

Finally, connect a Softmax layer to produce the class probabilities.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

Define the cross-entropy loss, and minimize it with the Adam optimizer at a learning rate of 1e-4.

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv),
                                             reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
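The loss expression is easier to read on concrete numbers. The sketch below mirrors it in plain Python for a two-sample batch; the labels and probabilities are made up for illustration, and cross_entropy is a hypothetical helper.

```python
import math


def cross_entropy(one_hot_labels, probabilities):
    """Mean over samples of -sum(y * log(p)), matching the expression above."""
    per_sample = [-sum(y * math.log(p) for y, p in zip(label, prob))
                  for label, prob in zip(one_hot_labels, probabilities)]
    return sum(per_sample) / len(per_sample)


labels = [[0, 1, 0], [1, 0, 0]]
probs = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]
batch_loss = cross_entropy(labels, probs)
print('%.4f' % batch_loss)  # 0.4581
```

Only the probability assigned to the true class contributes to each sample's loss, since every other term is multiplied by 0. Note that a predicted probability of exactly 0 would break the logarithm.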

Define the accuracy metric.

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

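Now start training, with keep_prob set to 0.5, a mini-batch size of 50, and 20000 iterations. Every 100 steps the accuracy on the current training batch is printed, evaluated with keep_prob set to 1.0.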

tf.global_variables_initializer().run()
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0],
                                                 y_: batch[1],
                                                 keep_prob: 1.0})
        print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

step 0, training accuracy 0.06
step 100, training accuracy 0.92
step 200, training accuracy 0.9
......
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1

print('test accuracy %g' % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

test accuracy 0.9919

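Next we implement a slightly more complex CNN, trained on the CIFAR-10 dataset.

5.3 A More Advanced Convolutional Network in TensorFlow

CIFAR-10 contains 60000 color images of size 32*32: 50000 for training and 10000 for testing. There are 10 classes with 6000 images each: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each image contains only one class of object. There is also a sibling dataset, CIFAR-100.

The state-of-the-art error rate at the time of writing is 3.5%. The CNN in this section is adapted from the cuda-convnet model described by Alex Krizhevsky. With 3000 batches of 128 samples each it reaches about 73% accuracy; with 100k batches combined with learning rate decay it reaches about 86%.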

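Techniques used in this model:

L2 regularization on the weights
Data augmentation such as flipping and random cropping, to create more samples
An LRN layer in each convolution and max pooling block, to improve generalization

Download the TensorFlow Models repository to get the code that fetches the CIFAR-10 data.

git clone https://github.com/tensorflow/models.git
cd models/tutorials/image/cifar10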

import cifar10, cifar10_input
import tensorflow as tf
import numpy as np
import time

/root/anaconda3/envs/tensorflow/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)

max_steps = 3000  # number of training steps
batch_size = 128
data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'  # where the downloaded data is stored

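Define a function that initializes weights. w1 controls the size of the L2 loss: tf.nn.l2_loss computes the L2 loss of the weight, and tf.multiply scales it by w1 to give the weight loss.
tf.add_to_collection then stores each weight loss in a collection named 'losses', which is used later when computing the network's total loss.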

def variable_with_loss(shape, stddev, w1):
    var = tf.Variable(tf.truncated_normal(shape, stddev=stddev))
    if w1 is not None:
        weight_loss = tf.multiply(tf.nn.l2_loss(var), w1, name='weight_loss')
        tf.add_to_collection('losses', weight_loss)
    return var
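The bookkeeping in variable_with_loss can be mimicked without TensorFlow. In the sketch below a plain list stands in for the 'losses' collection and l2_loss follows the documented definition of tf.nn.l2_loss, sum(t ** 2) / 2; the weights and the data loss are made-up numbers, and add_weight_loss is a hypothetical helper.

```python
def l2_loss(values):
    """Same definition as tf.nn.l2_loss: sum(v ** 2) / 2."""
    return sum(v * v for v in values) / 2.0


losses = []  # plays the role of the 'losses' collection


def add_weight_loss(weights, w1):
    """Append w1 * l2_loss(weights) to losses unless w1 is None."""
    if w1 is not None:
        losses.append(l2_loss(weights) * w1)


add_weight_loss([0.5, -1.0, 2.0], 0.004)  # regularized layer
add_weight_loss([3.0, 4.0], None)         # no regularization term recorded
cross_entropy_mean = 1.25                 # made-up data loss for illustration
total_loss = sum(losses) + cross_entropy_mean
print(losses, total_loss)
```

A larger w1 makes large weights more expensive, which pushes the optimizer toward smaller weights in that layer.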

Download and extract the dataset.

cifar10.maybe_download_and_extract()

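Use the distorted_inputs function in the cifar10_input module to produce the training data, both features and labels. Each evaluation yields one batch of batch_size samples, with data augmentation and standardization already applied.

images_train, labels_train = cifar10_input.distorted_inputs(
    data_dir=data_dir, batch_size=batch_size)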

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.

images_train

<tf.Tensor 'shuffle_batch:0' shape=(128, 24, 24, 3) dtype=float32>

Then use cifar10_input.inputs to generate the test data, which is cropped and standardized.

images_test, labels_test = cifar10_input.inputs(eval_data=True,
                                               data_dir=data_dir,
                                               batch_size=batch_size)

Create placeholders for the input data, both features and labels. The first dimension of each shape must be batch_size.

image_holder = tf.placeholder(tf.float32, [batch_size, 24, 24, 3])
label_holder = tf.placeholder(tf.int32, [batch_size])

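Next, create the first convolutional layer. It ends with an LRN layer, which imitates the 'lateral inhibition' mechanism of biological nervous systems: relatively large responses become relatively larger while neurons with weaker responses are suppressed, which improves generalization. AlexNet uses LRN layers.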

weight1 = variable_with_loss(shape=[5, 5, 3, 64], stddev=5e-2, w1=0.0)
kernel1 = tf.nn.conv2d(image_holder, weight1, [1, 1, 1, 1], padding='SAME')
bias1 = tf.Variable(tf.constant(0.0, shape=[64]))
conv1 = tf.nn.relu(tf.nn.bias_add(kernel1, bias1))
pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                      padding='SAME')
norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
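The normalization itself is a short formula. The sketch below applies it to the channel vector of a single pixel, following the definition given in the tf.nn.lrn documentation: each value is divided by (bias + alpha * sqr_sum) ** beta, where sqr_sum is the sum of squares over neighboring channels within depth_radius. The lrn function is a hypothetical pure-Python stand-in, and the input values are made up.

```python
def lrn(channels, depth_radius, bias, alpha, beta):
    """Local response normalization across the channel axis at one pixel:
    out[d] = in[d] / (bias + alpha * sum(in[j] ** 2)) ** beta,
    with j running over d - depth_radius .. d + depth_radius (clipped)."""
    out = []
    for d, value in enumerate(channels):
        lo = max(0, d - depth_radius)
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(c * c for c in channels[lo:hi])
        out.append(value / (bias + alpha * sqr_sum) ** beta)
    return out


pixel = [1.0, 2.0, 3.0]
simple = lrn(pixel, depth_radius=1, bias=1.0, alpha=1.0, beta=1.0)
as_in_post = lrn(pixel, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
print(simple)
print(as_in_post)
```

With the small alpha used in this post the outputs stay close to the inputs; the effect only becomes strong when neighboring channels respond with large values.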

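Now create the second convolutional layer. The previous layer has 64 kernels, so the third dimension of this layer's kernel shape is 64. The bias is initialized to 0.1, and the order of max pooling and LRN is swapped: LRN comes first, then max pooling.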

weight2 = variable_with_loss(shape=[5, 5, 64, 64], stddev=5e-2,
                            w1=0.0)
kernel2 = tf.nn.conv2d(norm1, weight2, [1, 1, 1, 1], padding='SAME')
bias2 = tf.Variable(tf.constant(0.1, shape=[64]))
conv2 = tf.nn.relu(tf.nn.bias_add(kernel2, bias2))
norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                      padding='SAME')

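Next comes a fully connected layer. tf.reshape flattens each sample into a 1-D vector, and get_shape gives the flattened length. The hidden layer has 384 nodes, and L2 regularization is applied to its weights.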

reshape = tf.reshape(pool2, [batch_size, -1])
dim = reshape.get_shape()[1].value
weight3 = variable_with_loss(shape=[dim, 384], stddev=0.04, w1=0.004)
bias3 = tf.Variable(tf.constant(0.1, shape=[384]))
local3 = tf.nn.relu(tf.matmul(reshape, weight3) + bias3)

Another fully connected layer follows, this time with 192 hidden nodes.

weight4 = variable_with_loss(shape=[384, 192], stddev=0.4, w1=0.004)
bias4 = tf.Variable(tf.constant(0.1, shape=[192]))
local4 = tf.nn.relu(tf.matmul(local3, weight4) + bias4)

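In the last layer, the standard deviation of the weights' normal distribution is the reciprocal of the previous hidden layer's node count. The softmax operation is moved into the loss computation, so comparing the raw inference outputs across classes is enough to get the classification result.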

weight5 = variable_with_loss(shape=[192, 10], stddev=1/192.0, w1=0.0)
bias5 = tf.Variable(tf.constant(0.0, shape=[10]))
logits = tf.add(tf.matmul(local4, weight5), bias5)

This completes the inference part.

Next, compute the CNN's loss. tf.nn.sparse_softmax_cross_entropy_with_logits combines the softmax and cross-entropy computations in a single op.

tf.add_n sums every loss in the 'losses' collection to give the final loss.

def loss(logits, labels):
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=labels, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy,
                                       name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)
    
    return tf.add_n(tf.get_collection('losses'), name='total_loss')
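To show what a fused softmax and cross-entropy computes, the sketch below works directly from logits and integer class ids. It uses the log-sum-exp trick as one numerically stable formulation; whether TensorFlow's kernel is implemented this exact way is not claimed here. All numbers, and the weight_losses list standing in for the 'losses' collection, are made up for illustration.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed with the log-sum-exp trick."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


batch_logits = [[2.0, 1.0, 0.1], [1000.0, 0.0, -1000.0]]
batch_labels = [0, 0]  # integer class ids, not one-hot vectors
per_example = [sparse_softmax_cross_entropy(z, y)
               for z, y in zip(batch_logits, batch_labels)]
cross_entropy_mean = sum(per_example) / len(per_example)
weight_losses = [0.25, 0.5]  # made-up stand-ins for the 'losses' collection
total_loss = sum(weight_losses) + cross_entropy_mean
print(per_example, total_loss)
```

The second row has logits far too large to exponentiate directly, yet subtracting the maximum first keeps every intermediate value finite. This is a practical reason to compute the loss from logits.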

Pass the logits node and label_holder to the loss function to get the final loss.

loss = loss(logits, label_holder)  # note: rebinds the name `loss` to the returned tensor

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

Use tf.nn.in_top_k to compute the top-k accuracy of the output; here k is 1.

top_k_op = tf.nn.in_top_k(logits, label_holder, 1)
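The top-k check can be written out by hand for a tiny batch. This in_top_k is a hypothetical pure-Python stand-in for the TensorFlow op; it counts a label as correct when fewer than k classes score strictly higher, so ties at the boundary count as hits. The logits and labels are made up.

```python
def in_top_k(predictions, targets, k):
    """For each row, True if the target class's score is among the k highest.
    Ties at the boundary count as inside the top k."""
    results = []
    for scores, target in zip(predictions, targets):
        higher = sum(1 for s in scores if s > scores[target])
        results.append(higher < k)
    return results


logits_batch = [[0.1, 2.5, 0.3],  # highest score: class 1
                [1.2, 0.4, 0.9],  # highest score: class 0
                [0.2, 0.1, 3.0]]  # highest score: class 2
labels_batch = [1, 2, 2]
hits = in_top_k(logits_batch, labels_batch, 1)
print(hits)  # [True, False, True]
```

Summing the boolean results over all batches and dividing by the number of samples gives precision at 1. Because only the ranking matters, raw logits work just as well as softmax probabilities here.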

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

Start the thread queues that perform image data augmentation; 16 threads are used in total to speed things up.

tf.train.start_queue_runners()

Now training begins.

for step in range(max_steps):
    start_time = time.time()
    image_batch, label_batch = sess.run([images_train, labels_train])
    _, loss_value = sess.run([train_op, loss],
                            feed_dict={image_holder: image_batch,
                                      label_holder: label_batch})
    duration = time.time() - start_time
    if step % 10 == 0:
        examples_per_sec = batch_size / duration
        sec_per_batch = float(duration)
        
        format_str = ('step %d, loss=%.2f (%.1f examples/sec; %.3f sec/batch)')
        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

step 0, loss=22.67 (18.1 examples/sec; 7.089 sec/batch)
step 10, loss=21.20 (2764.8 examples/sec; 0.046 sec/batch)
step 20, loss=19.86 (2382.6 examples/sec; 0.054 sec/batch)
step 30, loss=18.91 (2671.7 examples/sec; 0.048 sec/batch)
......
step 2940, loss=1.16 (2765.8 examples/sec; 0.046 sec/batch)
step 2950, loss=1.00 (2583.5 examples/sec; 0.050 sec/batch)
step 2960, loss=0.98 (2499.1 examples/sec; 0.051 sec/batch)
step 2970, loss=1.01 (2658.3 examples/sec; 0.048 sec/batch)
step 2980, loss=1.16 (2629.0 examples/sec; 0.049 sec/batch)
step 2990, loss=1.01 (2608.5 examples/sec; 0.049 sec/batch)

Now evaluate the model's accuracy on the test set.

import math

num_examples = 10000
num_iter = int(math.ceil(num_examples / batch_size))
true_count = 0
total_sample_count = num_iter * batch_size
step = 0

total_sample_count

num_iter
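The two values inspected above follow from simple arithmetic, reproduced here in plain Python. The variable extra is introduced only for this illustration.

```python
import math

num_examples = 10000
batch_size = 128
num_iter = int(math.ceil(num_examples / float(batch_size)))
total_sample_count = num_iter * batch_size
extra = total_sample_count - num_examples
print(num_iter, total_sample_count, extra)  # 79 10112 112
```

Since 10000 is not a multiple of 128, the last batch is rounded up and the evaluation scores 112 more samples than the test set holds. Presumably those are repeats served by the input queue, so the reported precision is a close approximation, not an exact figure over 10000 distinct images.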

while step < num_iter:
    image_batch, label_batch = sess.run([images_test, labels_test])
    predictions = sess.run([top_k_op], feed_dict={image_holder: image_batch,
                                                 label_holder: label_batch})
    true_count += np.sum(predictions)
    step += 1

precision = true_count / total_sample_count
print('precision @ 1 = %.3f' % precision)

precision @ 1 = 0.694