TensorFlow Tutorial (1): Using GPUs

2017-12-23

TensorFlow

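Supported devices

A typical system has several computing devices. In TensorFlow, the supported device types are CPU and GPU, and they are represented as strings. For example:

"/cpu:0": the CPU
"/device:GPU:0": the first GPU
"/device:GPU:1": the second GPU

If a TensorFlow operation has both CPU and GPU implementations, the GPU devices are given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 is selected to run matmul.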
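To make the naming scheme and the priority rule concrete, here is a minimal pure-Python sketch. The helpers parse_device and pick_default are hypothetical names made up for this illustration. They are not part of TensorFlow and only mimic the rule described in the text (GPU before CPU, lowest ID first).

```python
import re

# Hypothetical helpers for illustration only; not TensorFlow APIs.
# Accepts both the short form ("/cpu:0") and the long form ("/device:GPU:1").
_DEVICE_RE = re.compile(r"^/(?:device:)?(?P<type>[A-Za-z]+):(?P<index>\d+)$")


def parse_device(name):
    """Split a device string into (device type, device index)."""
    match = _DEVICE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized device string: %r" % name)
    return match.group("type").upper(), int(match.group("index"))


def pick_default(devices):
    """Mimic the rule in the text: prefer a GPU, then the lowest index."""
    parsed = [parse_device(d) for d in devices]
    # False sorts before True, so GPU entries come first.
    return min(parsed, key=lambda d: (d[0] != "GPU", d[1]))


print(parse_device("/device:GPU:1"))
print(pick_default(["/cpu:0", "/device:GPU:1", "/device:GPU:0"]))
```

The real placement logic lives inside the TensorFlow runtime and also considers which kernels are registered for each operation; this sketch ignores that.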

Logging device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

import tensorflow as tf

# Creates a graph
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op
print(sess.run(c))
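Whichever device ends up running the op, the value printed by sess.run(c) should be the same. As a sanity check that needs neither TensorFlow nor a GPU, the product can be reproduced in plain Python. The helpers reshape and matmul below are written just for this sketch and only imitate the row-major shape= argument and the matrix product used above.

```python
def reshape(values, rows, cols):
    """Row-major reshape of a flat list, like the shape= argument above."""
    assert len(values) == rows * cols
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(xv * yv for xv, yv in zip(row, col)) for col in zip(*y)]
            for row in x]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = reshape(values, 2, 3)  # [[1, 2, 3], [4, 5, 6]]
b = reshape(values, 3, 2)  # [[1, 2], [3, 4], [5, 6]]
c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```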

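Manual device placement

If you want a particular operation to run on a device of your choice instead of the one selected automatically, you can use with tf.device to create a device context, so that all operations created within that context have the same device assignment.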

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

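Note that the matmul operation is outside the with block. a and b are therefore assigned to cpu:0, while no device is specified for MatMul, so the TensorFlow runtime chooses one based on the operation and the available devices (gpu:0 on a system that has one) and copies tensors between devices if required.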

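Allowing GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of every GPU visible to the process. This makes more efficient use of the relatively precious GPU memory on the devices by reducing memory fragmentation.

In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two configuration options on the Session to control this.

The first is the allow_growth option, which allocates only as much GPU memory as the runtime needs: it starts out allocating very little, and extends the GPU memory region as more is required. Note that memory is not released, since that could lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto as follows: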

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
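The "grow when needed, never release" behaviour can be illustrated with a toy bookkeeping class. GrowOnlyPool is a hypothetical name for this sketch and says nothing about TensorFlow's real allocator; it only tracks two numbers to show that the reserved region never shrinks.

```python
class GrowOnlyPool:
    """Toy model of a memory region that grows on demand and never shrinks."""

    def __init__(self):
        self.reserved = 0  # bytes taken from the device so far
        self.in_use = 0    # bytes currently handed out to tensors

    def allocate(self, nbytes):
        self.in_use += nbytes
        # Extend the reserved region only when the current one is too small.
        if self.in_use > self.reserved:
            self.reserved = self.in_use

    def free(self, nbytes):
        # Freed bytes become reusable, but the reserved region is kept.
        self.in_use -= nbytes


pool = GrowOnlyPool()
pool.allocate(100)
pool.allocate(50)
pool.free(120)
pool.allocate(60)  # fits into the already reserved region
print(pool.in_use, pool.reserved)  # 90 150
```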

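The second is the per_process_gpu_memory_fraction option, which determines the fraction of the total memory that should be allocated on each visible GPU. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU: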

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

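This is useful if you want to put a hard upper bound on the amount of GPU memory available to the TensorFlow process.

Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you want to run on a different GPU, you need to specify it explicitly: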
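To get a feel for what a fraction means in bytes, here is a small arithmetic sketch. The 5 GiB card size is an assumed figure chosen for the example, and memory_cap_bytes is a hypothetical helper, not a TensorFlow function.

```python
GIB = 1024 ** 3


def memory_cap_bytes(total_bytes, fraction):
    """Upper bound in bytes implied by a per-process memory fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


total = 5 * GIB  # assumed size of one GPU's memory, for illustration only
cap = memory_cap_bytes(total, 0.4)
print("cap: %.2f GiB of %.2f GiB" % (cap / float(GIB), total / float(GIB)))
```

With these assumed numbers the cap comes out at roughly 2 GiB per GPU. Because the option applies to each GPU separately, a process that sees several GPUs may reserve up to that amount on every one of them.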

# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

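If the device you have specified does not exist, you will get an InvalidArgumentError.

If you want TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, set the allow_soft_placement configuration option to True when creating the session.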

# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))
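The difference between strict and soft placement can be sketched as a small decision function. This is a simplified model under stated assumptions: resolve_device is a hypothetical helper, it raises a plain ValueError where TensorFlow raises InvalidArgumentError, and it falls back to the first entry of the available list, which is an assumption of this sketch rather than TensorFlow's actual selection logic.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on in this simplified model."""
    if requested in available:
        return requested
    if allow_soft_placement and available:
        # Assumption: fall back to the caller's first preference.
        return available[0]
    raise ValueError("cannot assign a device: %s is not available" % requested)


available = ["/device:GPU:0", "/cpu:0"]

# Soft placement: the missing GPU:2 is replaced by an existing device.
chosen = resolve_device("/device:GPU:2", available, allow_soft_placement=True)
print(chosen)

# Strict placement: the same request is an error.
try:
    resolve_device("/device:GPU:2", available)
except ValueError as err:
    print("error:", err)
```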

Using multiple GPUs

If you want to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example:

# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

The output looks like this:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
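The placement lines in this log are easy to read by machine as well. The sketch below groups the ops by device, assuming the "op name: full device path" layout shown above; group_by_device is a hypothetical helper and the log lines are copied from the output rather than captured from a live session.

```python
from collections import defaultdict

# Placement lines copied from the log output above.
log_lines = [
    "Const_3: /job:localhost/replica:0/task:0/device:GPU:3",
    "Const_2: /job:localhost/replica:0/task:0/device:GPU:3",
    "MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3",
    "Const_1: /job:localhost/replica:0/task:0/device:GPU:2",
    "Const: /job:localhost/replica:0/task:0/device:GPU:2",
    "MatMul: /job:localhost/replica:0/task:0/device:GPU:2",
    "AddN: /job:localhost/replica:0/task:0/cpu:0",
]


def group_by_device(lines):
    """Map the last path component of each device to the ops placed on it."""
    groups = defaultdict(list)
    for line in lines:
        op_name, _, device_path = line.partition(": ")
        short_device = device_path.rsplit("/", 1)[-1]
        groups[short_device].append(op_name)
    return dict(groups)


for device_name, ops in sorted(group_by_device(log_lines).items()):
    print(device_name, ops)
```

Grouped this way, the log confirms the tower layout: each GPU holds two constants and one MatMul, and only the final AddN runs on the CPU. The printed matrix is twice the single product [[22, 28], [49, 64]], since both towers compute the same matmul and add_n sums them.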