First Steps with TensorFlow

2018-01-27

3.1 Compiling and Installing TensorFlow

3.2 Recognizing Handwritten Digits with Softmax Regression in TensorFlow

First, load the MNIST data. The code in this section uses the TensorFlow 1.x API.

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

print(mnist.train.images.shape, mnist.train.labels.shape)
print(mnist.test.images.shape, mnist.test.labels.shape)
print(mnist.validation.images.shape, mnist.validation.labels.shape)

(55000, 784) (55000, 10)
(10000, 784) (10000, 10)
(5000, 784) (5000, 10)
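
As a rough illustration of where the 784 and 10 in these shapes come from: each MNIST image is 28×28 pixels, flattened into a 784-dimensional vector, and with one_hot=True each label becomes a 10-dimensional vector with a single 1. The helper names `flatten` and `one_hot` below are made up for this sketch and are not part of the MNIST loader.

```python
def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def flatten(image):
    """Turn a 2-D image (list of rows) into one flat list of pixels."""
    return [pixel for row in image for pixel in row]

# A blank 28x28 image, used only to show the resulting length.
image = [[0.0] * 28 for _ in range(28)]
vector = flatten(image)

print(len(vector))   # length of the flattened image
print(one_hot(3))    # one-hot vector for the digit 3
```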

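We train on the training set, check performance on the validation set to decide when to stop training, and finally evaluate on the test set (accuracy, recall, F1-score).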
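
For reference, here is a minimal sketch of how these metrics could be computed from lists of true and predicted class labels. For a multi-class problem, recall and F1 are computed per class here, treating that class as the positive one; this is one common convention and an assumption of this sketch. The function names and the labels are invented for illustration and are not MNIST results.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, cls):
    """Precision, recall and F1 for one class, treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up labels for illustration only.
y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]

print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred, cls=2))
```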

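We train the classifier with Softmax Regression.
How it works: for each class, add up the evidence from the features that point to that class, then convert those sums into the probability of each class.

$i$ denotes the $i$-th class and $j$ denotes the $j$-th pixel of an image.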

$$
\text{feature}_i = \sum_j W_{i,j} x_j + b_i
$$

Then normalize:

$$
\text{softmax}(x) = \text{normalize}(\exp(x))
$$

That is,

$$
\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}
$$

In a single line:

$$
y = \text{softmax}(Wx + b)
$$
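
To make the formulas concrete, here is a minimal pure-Python sketch of $y = \text{softmax}(Wx + b)$ on toy sizes (3 classes and 4 "pixels" instead of 10 and 784). The names `softmax` and `predict` and all the numbers are made up for illustration. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not part of the formulas above; it leaves the result unchanged.

```python
import math

def softmax(scores):
    """Exponentiate each score and normalize so the results sum to 1."""
    # Subtracting the max does not change the result but avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """y = softmax(Wx + b), with W indexed as W[i][j] (class i, pixel j)."""
    features = [sum(w_ij * x_j for w_ij, x_j in zip(W[i], x)) + b[i]
                for i in range(len(W))]
    return softmax(features)

# Toy parameters and input, chosen arbitrarily.
W = [[0.1, 0.2, 0.0, -0.1],
     [0.0, 0.0, 0.0, 0.0],
     [-0.2, 0.1, 0.3, 0.0]]
b = [0.0, 0.1, -0.1]
x = [1.0, 0.0, 0.5, 1.0]

y = predict(W, b, x)
print(y)        # one probability per class
print(sum(y))   # the probabilities sum to 1 (up to rounding)
```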

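Next, implement Softmax Regression in TensorFlow.

First import the TensorFlow library and create a new InteractiveSession. This registers the session as the default session, so subsequent operations run in it. Then create a placeholder to hold the input data: the shape [None, 784] means any number of inputs, each with 784 dimensions.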

import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 784])

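Next, create Variable objects for the weights and biases.
W has shape [784, 10]: 784 is the number of input features and 10 is the number of classes, matching the 10-dimensional one-hot labels.

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

Then implement the Softmax Regression formula:

y = tf.nn.softmax(tf.matmul(x, W) + b)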
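
Note that the formula indexes W as $W_{i,j}$ (class $i$, pixel $j$), whereas the code stores W with shape [784, 10] and computes tf.matmul(x, W) + b, so that a whole batch of row vectors is multiplied at once and b is added to every row. The pure-Python sketch below mimics that shape arithmetic on toy sizes; `matmul` and `add_bias` are hypothetical helpers written for this example, not TensorFlow functions.

```python
def matmul(a, b):
    """Multiply an [n, k] matrix by a [k, m] matrix given as nested lists."""
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]

def add_bias(rows, bias):
    """Add the same bias vector to every row (what broadcasting does for `+ b`)."""
    return [[v + bias[c] for c, v in enumerate(row)] for row in rows]

# Toy sizes: a batch of 2 inputs, 3 features, 2 classes (MNIST: None, 784, 10).
x = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.5, -0.5]

logits = add_bias(matmul(x, W), b)
print(logits)  # one row of class scores per input
```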

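Next, define the loss function. Multi-class classification usually uses cross-entropy.

Cross-entropy is defined as follows, where $y$ is the predicted probability distribution and $y'$ is the true distribution (the one-hot encoding of the label). It measures how well the model's prediction estimates the true distribution.

$$
H_{y'}(y) = -\sum_i y'_i \log(y_i)
$$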

y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), 
                                              reduction_indices=[1]))

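This first defines a placeholder y_ that receives the true labels used to compute the cross-entropy. tf.reduce_sum adds up the terms over the 10 classes of each example, and tf.reduce_mean then averages the result over the examples in the batch.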
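
The same computation can be sketched in plain Python on a made-up batch of two examples with three classes. The function names are invented for this example. One deliberate difference from the TensorFlow expression: terms whose label is 0 are skipped here, which avoids evaluating log(0) when a predicted probability is exactly zero.

```python
import math

def cross_entropy(y_true, y_pred):
    """H_{y'}(y) = -sum_i y'_i * log(y_i) for a single example."""
    # Skip classes with a zero label: they contribute nothing, and this
    # avoids calling log on a predicted probability of exactly 0.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def mean_cross_entropy(batch_true, batch_pred):
    """Average the per-example cross-entropy over a batch."""
    losses = [cross_entropy(t, p) for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)

# Made-up one-hot labels and predicted distributions.
labels = [[0, 1, 0], [1, 0, 0]]
preds = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]

loss = mean_cross_entropy(labels, preds)
print(loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1.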

Next, define the optimization algorithm. We use stochastic gradient descent (SGD) by calling tf.train.GradientDescentOptimizer directly with a learning rate of 0.5.

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
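
To give a feel for what one such optimization step does, here is a hand-written sketch of gradient descent for softmax regression on a single toy example. It relies on the standard result that, for softmax followed by cross-entropy, the gradient of the loss with respect to the score of class $i$ is $y_i - y'_i$. TensorFlow derives gradients automatically; the function `sgd_step` and all the numbers are assumptions made for this illustration only.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(W, b, x, y_true, lr):
    """One gradient-descent update on one example; returns the loss before the update."""
    logits = [sum(w * xj for w, xj in zip(W[i], x)) + b[i] for i in range(len(W))]
    y = softmax(logits)
    loss = -sum(t * math.log(p) for t, p in zip(y_true, y) if t > 0)
    for i in range(len(W)):
        delta = y[i] - y_true[i]  # d(loss)/d(logit_i) for softmax + cross-entropy
        for j in range(len(x)):
            W[i][j] -= lr * delta * x[j]
        b[i] -= lr * delta
    return loss

# Toy problem: 3 classes, 2 features, parameters initialized to zero.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
x = [1.0, 2.0]
y_true = [0, 0, 1]

losses = [sgd_step(W, b, x, y_true, lr=0.5) for _ in range(5)]
print(losses)  # the loss should shrink from step to step
```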

Next, run the global variable initializer, tf.global_variables_initializer:

tf.global_variables_initializer().run()

Finally, run the training operation train_step iteratively. Each iteration draws a random mini-batch of 100 samples from the training set and feeds it to the placeholders.

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run({x: batch_xs, y_: batch_ys})
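
As a rough picture of mini-batch sampling, the sketch below shuffles the example indices and yields fixed-size slices. It is a stand-in written for this example, not the implementation of mnist.train.next_batch; the name `iterate_minibatches` and the toy data are hypothetical.

```python
import random

def iterate_minibatches(xs, ys, batch_size, rng):
    """Yield (batch_xs, batch_ys) pairs covering the data once in random order."""
    order = list(range(len(xs)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [xs[k] for k in idx], [ys[k] for k in idx]

# Toy dataset: 10 one-feature inputs with a parity label.
xs = [[float(k)] for k in range(10)]
ys = [k % 2 for k in range(10)]

batches = list(iterate_minibatches(xs, ys, batch_size=4, rng=random.Random(0)))
for batch_xs, batch_ys in batches:
    print(len(batch_xs), batch_ys)
```

Inputs and labels are indexed with the same shuffled positions, so each input stays paired with its own label.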

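Next, measure the model's accuracy on the test set. tf.argmax(y, 1) returns the index of the largest value along axis 1, that is, within each row; axis 0 would return it within each column.

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))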
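
The same argmax, compare, and average logic can be sketched in plain Python on a few made-up predictions. The helper names and the numbers are invented for this example.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda k: row[k])

def accuracy(pred_rows, label_rows):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(pred_rows, label_rows)]
    # True counts as 1 and False as 0, like casting booleans to float32.
    return sum(correct) / len(correct)

# Made-up predicted distributions and one-hot labels.
preds = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.4, 0.5, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]

acc = accuracy(preds, labels)
print(acc)  # 3 of the 4 predictions match, so 0.75
```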

tf.Tensor.eval(feed_dict=None, session=None)

This evaluates the tensor's value in a session. If no session is specified, the default session is used.

print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))

0.92

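The whole workflow has four parts:

1. Define the model formula, i.e. the forward computation.
2. Define the loss, choose an optimizer, and have the optimizer minimize the loss.
3. Train on the data iteratively.
4. Evaluate accuracy on the test set or validation set.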

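Like Spark, TensorFlow evaluates lazily: the formulas defined above only build nodes in a computation graph, and nothing is computed until you call run and feed in data. Calling run on a node executes it and returns its result.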
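
The idea of deferred execution can be illustrated with a toy graph in plain Python. This is only a conceptual sketch; the `Node` class and the helper functions are hypothetical and bear no relation to how TensorFlow is actually implemented.

```python
class Node:
    """A deferred computation: stores how to compute a value, not the value itself."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self, feed):
        # Evaluate the input nodes first, then apply this node's function.
        args = [inp.run(feed) for inp in self.inputs]
        return self.fn(feed, *args)

def placeholder(name):
    # The value is looked up in the feed dictionary only when run is called.
    return Node(lambda feed: feed[name])

def constant(value):
    return Node(lambda feed: value)

def add(a, b):
    return Node(lambda feed, u, v: u + v, a, b)

def mul(a, b):
    return Node(lambda feed, u, v: u * v, a, b)

x = placeholder("x")
y = add(mul(constant(3.0), x), constant(1.0))  # builds the graph; nothing is computed yet

result = y.run({"x": 2.0})  # computation happens here: 3.0 * 2.0 + 1.0
print(result)
```

Building `y` succeeds without any value for `x`; a missing feed only becomes an error when `run` is called.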