TensorBoard Histogram Dashboard

2017-12-21

TensorFlow

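The TensorBoard Histogram Dashboard shows how the distribution of some Tensor in your TensorFlow graph changes over time. It does this by showing many histograms of your tensor at different points in time.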

A Basic Example

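Let's start with a simple case: a normally distributed variable whose mean shifts over time. TensorFlow has an op, tf.random_normal, that does exactly this. As usual with TensorBoard, we ingest the data with a summary op, in this case tf.summary.histogram.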

Here is a code snippet that generates histogram summaries of normally distributed data whose mean increases over time.

import tensorflow as tf
k = tf.placeholder(tf.float32)

# Make a normal distribution, with a shifting mean
mean_moving_normal = tf.random_normal(shape=[1000], mean=(5*k), stddev=1)
# Record that distribution into a histogram summary
tf.summary.histogram('normal/moving_mean', mean_moving_normal)

# Set up a session and summary writer
sess = tf.Session()
writer = tf.summary.FileWriter('./tmp/histogram_example')

summaries = tf.summary.merge_all()

# Set up a loop and write the summaries to disk
N = 400
for step in range(N):
    k_val = step/float(N)
    summ = sess.run(summaries, feed_dict={k: k_val})
    writer.add_summary(summ, global_step=step)

# Flush any pending events to disk and release the file handle
writer.close()

!tensorboard --logdir=./tmp/histogram_example

Here is what this looks like in TensorBoard:

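tf.summary.histogram takes a tensor of any size and shape and compresses it into a histogram data structure made up of many bins, each with a width and a count. For example, suppose we want to organize the numbers [0.5, 1.1, 1.3, 2.2, 2.9, 2.99] into bins. We could make three bins: a bin from 0 to 1, which would contain one element (0.5); a bin from 1 to 2, which would contain two elements (1.1 and 1.3); and a bin from 2 to 3, which would contain three elements (2.2, 2.9 and 2.99).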
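The three-bin example above can be reproduced in a few lines of plain Python. This is a minimal sketch for illustration only: the helper name `bin_counts` is hypothetical, it treats each bin as half-open ([low, high)), and it is not how TensorFlow computes its bins.

```python
def bin_counts(values, edges):
    """Count how many values fall into each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # each value belongs to exactly one bin
    return counts


data = [0.5, 1.1, 1.3, 2.2, 2.9, 2.99]
edges = [0, 1, 2, 3]  # three bins: [0, 1), [1, 2), [2, 3)
print(bin_counts(data, edges))  # → [1, 2, 3]
```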

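TensorFlow uses a similar approach to create bins, but unlike in our example it does not create integer bins: for large, sparse datasets that could produce thousands of bins. Instead, the bins are exponentially distributed, with many bins close to 0 and comparatively few bins for very large numbers. However, exponentially distributed bins are hard to visualize. If height encodes the count, wider bins take up more space even when they contain the same number of elements; conversely, encoding the count as area makes heights impossible to compare. Instead, the histograms resample the data into uniform bins, which can lead to unfortunate artifacts in some cases.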
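The two ideas in this paragraph, geometrically growing bin edges and resampling into uniform bins, can be sketched in plain Python. The constants (start 1e-12, growth factor 1.1, stop 1e20) are assumptions loosely modeled on TensorFlow's default bucket limits, and the proportional-overlap resampling is one simple approach that assumes values are spread evenly inside each source bin; it is not necessarily TensorBoard's exact algorithm. Both function names are hypothetical.

```python
def exponential_edges(start=1e-12, stop=1e20, factor=1.1):
    """Positive bin edges that grow geometrically: many narrow bins near 0, few wide ones far out."""
    edges = []
    v = start
    while v < stop:
        edges.append(v)
        v *= factor
    return edges


def resample(edges, counts, new_edges):
    """Spread each source bin's count over the target bins in proportion to overlap.

    Assumes values are spread evenly inside each source bin; when that is not
    true, the resampled histogram shows artifacts.
    """
    out = [0.0] * (len(new_edges) - 1)
    for i, count in enumerate(counts):
        lo, hi = edges[i], edges[i + 1]
        width = hi - lo
        for j in range(len(new_edges) - 1):
            overlap = min(hi, new_edges[j + 1]) - max(lo, new_edges[j])
            if overlap > 0:
                out[j] += count * overlap / width
    return out


edges = exponential_edges()
# Near 1e-12 neighboring edges are about 1e-13 apart; the last ones are roughly 1e19 apart.
print(len(edges), edges[1] - edges[0], edges[-1] - edges[-2])

# Three uneven source bins resampled into three uniform bins of width 1.5.
print(resample([0, 1, 2, 4], [2, 2, 4], [0, 1.5, 3, 4.5]))  # → [3.0, 3.0, 2.0]
```

The total count is preserved (2 + 2 + 4 = 3 + 3 + 2), but where each element lands inside the new bins is an estimate.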

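Each slice in the histogram visualizer displays a single histogram. The slices are organized by step: older slices (e.g. step 0) sit further back and are darker, while newer slices (e.g. step 400) sit in front and are lighter. The y-axis shows the step number.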

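You can hover the mouse over the histogram to see tooltips with more detailed information. For example, in the figure below we can see that the histogram at step 176 has a bin centered at 2.25 containing 177 elements.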

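You may also notice that the histogram slices are not always evenly spaced in step count or time. This is because TensorBoard uses reservoir sampling to keep only a subset of all the histograms in memory. Reservoir sampling guarantees that every sample has an equal likelihood of being kept, but because it is a randomized algorithm, the kept samples do not fall at evenly spaced steps.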
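Reservoir sampling itself fits in a few lines. The version below is the classic "Algorithm R", which keeps a uniform random sample of k items from a stream whose length is not known in advance. It is a generic illustration, not TensorBoard's actual implementation, and the function name `reservoir_sample` is hypothetical. Running it over 400 step numbers shows why the kept slices end up unevenly spaced.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: keep a uniform random sample of k items from a stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Keep 20 of 400 steps; the kept step numbers are not evenly spaced.
kept = sorted(reservoir_sample(range(400), 20, random.Random(0)))
print(kept)
```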

Overlay Mode

A control on the left of the dashboard lets you switch the histogram mode from "offset" to "overlay":

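In offset mode the visualization is tilted by 45 degrees so that the individual histogram slices are spread out over time; overlay mode instead plots all of them on the same y-axis.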

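Now each slice is a separate line on the chart, and the y-axis shows the number of items in each bucket. Darker lines are older (earlier steps) and lighter lines are more recent (later steps). As before, you can hover the mouse over the chart to see more information.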

In general, overlay mode is useful when you want to directly compare the counts of different histograms.

Multimodal Distributions

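The Histogram Dashboard is great for visualizing multimodal distributions. Let's construct a simple bimodal distribution by concatenating the outputs of two different normal distributions. The code looks like this: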

import tensorflow as tf

k = tf.placeholder(tf.float32)

# Make a normal distribution, with a shifting mean
mean_moving_normal = tf.random_normal(shape=[1000], mean=(5*k), stddev=1)
# Record that distribution into a histogram summary
tf.summary.histogram('normal/moving_mean', mean_moving_normal)

# Make a normal distribution with shrinking variance
variance_shrinking_normal = tf.random_normal(shape=[1000], mean=0, stddev=1-(k))
# Record that distribution too
tf.summary.histogram('normal/shrinking_variance', variance_shrinking_normal)

# Let's combine both of those distributions into one dataset
normal_combined = tf.concat([mean_moving_normal, variance_shrinking_normal], 0)
# We add another histogram summary to record the combined distribution
tf.summary.histogram('normal/bimodal', normal_combined)

summaries = tf.summary.merge_all()

# Set up a session and summary writer
sess = tf.Session()
writer = tf.summary.FileWriter('./tmp/histogram_example')

# Set up a loop and write the summaries to disk
N = 400
for step in range(N):
    k_val = step/float(N)
    summ = sess.run(summaries, feed_dict={k: k_val})
    writer.add_summary(summ, global_step=step)

# Flush any pending events to disk and release the file handle
writer.close()

!tensorboard --logdir=./tmp/histogram_example

You already know the moving-mean normal distribution from the example above. Now we also have a shrinking-variance distribution:

When we concatenate them, we get a chart that clearly reveals the divergent, bimodal structure:
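Why the combined histogram has two peaks can be checked without TensorFlow. The sketch below draws from the same two normal distributions as the example at its last step (k = 399/400), using Python's `random.gauss` as a stand-in for `tf.random_normal`, concatenates the samples, and counts them into unit-width bins. The seed and the bin layout are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)
k = 399 / 400  # value of k at the last step of the training loop

# Same parameters as the TensorFlow example, drawn with the standard library instead.
moving_mean = [rng.gauss(5 * k, 1) for _ in range(1000)]
shrinking_variance = [rng.gauss(0, 1 - k) for _ in range(1000)]
combined = moving_mean + shrinking_variance  # plays the role of tf.concat(..., 0)

# Count samples in unit-width bins [-2, -1), [-1, 0), ..., [8, 9).
lo, hi = -2, 9
counts = [0] * (hi - lo)
for x in combined:
    if lo <= x < hi:
        counts[int(x - lo)] += 1  # x - lo >= 0 here, so int() acts as floor

# Text histogram: one '#' per 20 samples; tall bars appear near 0 and near 5.
for left, c in zip(range(lo, hi), counts):
    print(f"[{left:2d}, {left + 1:2d}) {'#' * (c // 20)}")
```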

Some More Distributions

Just for fun, let's generate and visualize a few more distributions, and then combine them all into one chart. Here is the code:

import tensorflow as tf

k = tf.placeholder(tf.float32)

# Make a normal distribution, with a shifting mean
mean_moving_normal = tf.random_normal(shape=[1000], mean=(5*k), stddev=1)
# Record that distribution into a histogram summary
tf.summary.histogram('normal/moving_mean', mean_moving_normal)

# Make a normal distribution with shrinking variance
variance_shrinking_normal = tf.random_normal(shape=[1000], mean=0, stddev=1-(k))
# Record that distribution too
tf.summary.histogram('normal/shrinking_variance', variance_shrinking_normal)

# Let's combine both of those distributions into one dataset
normal_combined = tf.concat([mean_moving_normal, variance_shrinking_normal], 0)
# We add another histogram summary to record the combined distribution
tf.summary.histogram('normal/bimodal', normal_combined)

# Add a gamma distribution
gamma = tf.random_gamma(shape=[1000], alpha=k)
tf.summary.histogram('gamma', gamma)

# And a poisson distribution
poisson = tf.random_poisson(shape=[1000], lam=k)
tf.summary.histogram('poisson', poisson)

# And a uniform distribution
uniform = tf.random_uniform(shape=[1000], maxval=k*10)
tf.summary.histogram('uniform', uniform)

# Finally, combine everything together!
all_distributions = [mean_moving_normal, variance_shrinking_normal,
                     gamma, poisson, uniform]
all_combined = tf.concat(all_distributions, 0)
tf.summary.histogram("all_combined", all_combined)

summaries = tf.summary.merge_all()

# Set up a session and summary writer
sess = tf.Session()
writer = tf.summary.FileWriter('./tmp/histogram_example1')

# Set up a loop and write the summaries to disk
N = 400
for step in range(N):
    k_val = step/float(N)
    summ = sess.run(summaries, feed_dict={k: k_val})
    writer.add_summary(summ, global_step=step)

# Flush any pending events to disk and release the file handle
writer.close()

!tensorboard --logdir=./tmp/histogram_example1

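The Poisson distribution is defined over the integers, so every value it generates is an integer. The histogram compression moves the data into floating-point bins, so the visualization shows small bumps over the integer values rather than perfect spikes.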
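The effect is easy to reproduce with integer data and uniform bins whose width does not line up with the integers. In this sketch the bin width of 0.75 is an arbitrary assumption chosen to make the mismatch visible, and `count_into_bins` is a hypothetical helper: some bins happen to contain an integer and one contains none, so the counts come out bumpy instead of following the data smoothly.

```python
import bisect


def count_into_bins(values, edges):
    """Count values in half-open bins [edges[i], edges[i+1]) using binary search."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1  # index of the bin whose left edge is <= v
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


# Integer-valued data, as a Poisson sampler would produce.
data = [0, 1, 1, 2, 2, 2, 3, 3, 4]
# Uniform bins of width 0.75 that do not align with the integers: 0.0, 0.75, ..., 4.5.
edges = [i * 0.75 for i in range(7)]
print(count_into_bins(data, edges))  # → [1, 2, 3, 0, 2, 1]
```

The empty bin [2.25, 3.0) sits between two occupied ones, which is exactly the kind of gap that turns a peak into a bump.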