Thursday, 26 July 2018
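The TensorFlow team announced that it would stop supporting the GPU version of TensorFlow on the Mac after release 1.2.
As a result there is no package to install directly, and the only option is to build from source.
TensorFlow 1.8 with CUDA on macOS High Sierra 10.13.6
Running TensorFlow on the CPU
Important things are worth saying three times: the drivers and the build toolchain must be matching versions, otherwise the build will fail!!!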
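- TensorFlow r1.8 source code; the latest 1.9 still seems to have problems
- macOS 10.13.6; this probably matters little
- GPU driver 387.10.10.10.40.105, which supports CUDA 9.1
- CUDA 9.2; this is the CUDA driver, which may be newer than the CUDA version the GPU driver above supports, i.e. CUDA Driver 9.2
- cuDNN 7.2, matching the CUDA above; just install the latest release
- Xcode 8.2.1; this is critical: downgrade to this version, otherwise the build fails or the result crashes at run time with a Segmentation Fault
- bazel 0.14.0; this is critical: downgrade to this version
- Python 3.6; this is critical: do not use the latest Python 3.7, which at the time of writing breaks the build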
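As a rough illustration of the matching-versions rule, the sketch below compares a set of detected versions against the pins from the list above. The `PINNED` table and the helpers `matches_pin` and `mismatches` are hypothetical names introduced here, and treating a pin as a version prefix is an assumption; only the version numbers come from the list.

```python
# Pinned versions copied from the list above; all names here are illustrative.
PINNED = {
    "tensorflow": "1.8",
    "cuda": "9.2",
    "cudnn": "7.2",
    "xcode": "8.2.1",
    "bazel": "0.14.0",
    "python": "3.6",
}


def matches_pin(found, pin):
    """True when `found` equals `pin` or extends it with further components."""
    return found == pin or found.startswith(pin + ".")


def mismatches(found_versions):
    """Return {tool: (found, pinned)} for every tool that is off its pin."""
    bad = {}
    for tool, pin in PINNED.items():
        found = found_versions.get(tool)
        if found is None or not matches_pin(found, pin):
            bad[tool] = (found, pin)
    return bad


if __name__ == "__main__":
    detected = {"tensorflow": "1.8.0", "cuda": "9.2.148", "cudnn": "7.2.1",
                "xcode": "8.2.1", "bazel": "0.14.0", "python": "3.7.0"}
    print(mismatches(detected))
```

With the sample values only the Python entry is reported, since 3.7.0 does not fall under the 3.6 pin.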
- Xcode 8.2.1: https://developer.apple.com/download/more/ (Xcode_8.2.1.xip)
- bazel-0.14.0: https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-installer-darwin-x86_64.sh
- CUDA Toolkit 9.2
- cuDNN v7.2.1
- Tensorflow source code, 333M
$ git clone https://github.com/tensorflow/tensorflow -b r1.8
$ brew unlink python
$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb
$ pip3 install --upgrade pip setuptools wheel
# $ brew switch python 3.6.5_1
Do not use Python 3.7.0; the build fails with it.
Build
$ brew switch python 3.7.0
Xcode must be downgraded to 8.2.1, placed at /Applications/Xcode.app, and selected as the active developer directory:
$ sudo xcode-select -s /Applications/Xcode.app
Verify:
$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
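Command Line Tools: cc is clang. This matters a great deal; with the wrong version the build may still succeed, but running anything moderately complex ends in a Segmentation Fault.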
The CUDA libraries are not in a system directory, so environment variables are needed to point at them.
On the Mac LD_LIBRARY_PATH has no effect; DYLD_LIBRARY_PATH is used instead.
In ~/.bash_profile or ~/.zshrc:
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib
export PATH=$CUDA_HOME/bin:$PATH
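To double-check that the exports above took effect, something like the following could be run from a fresh shell. It is only a sketch: `cuda_library_dirs` and `missing_from_env` are hypothetical helper names, and the default `/usr/local/cuda` is simply the `CUDA_HOME` value used above.

```python
import os
import posixpath


def cuda_library_dirs(cuda_home):
    """Directories the exports above place on DYLD_LIBRARY_PATH."""
    return [posixpath.join(cuda_home, "lib"),
            posixpath.join(cuda_home, "extras", "CUPTI", "lib")]


def missing_from_env(env, cuda_home="/usr/local/cuda"):
    """Return the CUDA library dirs that DYLD_LIBRARY_PATH does not list."""
    listed = [p for p in env.get("DYLD_LIBRARY_PATH", "").split(":") if p]
    return [d for d in cuda_library_dirs(cuda_home) if d not in listed]


if __name__ == "__main__":
    # An empty list means both directories are on DYLD_LIBRARY_PATH.
    print(missing_from_env(dict(os.environ)))
```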
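CUDA
is NVIDIA's parallel computing framework for its own GPUs. In other words, CUDA runs only on NVIDIA GPUs, and it pays off only when the problem at hand can be computed in a massively parallel way.
Look up your graphics card in NVIDIA's list of CUDA GPUs (https://developer.nvidia.com/cuda-gpus) to check whether it is supported: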
GPU | Compute Capability |
---|---|
GeForce GTX 750 Ti | 5.0 |
If an older CUDA release (9.1 here) is already installed, uninstall it first:
$ sudo /usr/local/bin/uninstall_cuda_drv.pl
$ sudo /usr/local/cuda/bin/uninstall_cuda_9.1.pl
$ sudo rm -rf /Developer/NVIDIA/CUDA-9.1/
$ sudo rm -rf /Library/Frameworks/CUDA.framework
$ sudo rm -rf /usr/local/cuda/
To be on the safe side, reboot afterwards.
- GPU Driver, i.e. the graphics card driver
  - My macOS is 10.13.6, and the matching driver is already installed at its latest version, 387.10.10.10.40.105
  - https://www.nvidia.com/download/driverResults.aspx/136062/en-us
  - Version: 387.10.10.10.40.105 Release Date: 2018.7.10 Operating System: macOS High Sierra 10.13.6 CUDA Toolkit: 9.1
- CUDA Driver
  - Install the CUDA Driver separately first. The latest version is fine; check which GPU driver it supports
  - cudadriver_396.148_macos.dmg
  - New Release 396.148 CUDA driver update to support CUDA Toolkit 9.2, macOS 10.13.6 and NVIDIA display driver 387.10.10.10.40.105 Recommended CUDA version(s): CUDA 9.2 Supported macOS 10.13
- CUDA Toolkit
  - The latest version is fine; 9.2 is used here
  - cuda_9.2.148_mac.dmg, cuda_9.2.148.1_mac.dmg
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
Confirm that the driver is loaded
$ kextstat | grep -i cuda.
149 0 0xffffff7f838d3000 0x2000 0x2000 com.nvidia.CUDA (1.1.0) E13478CB-B251-3C0A-86E9-A6B56F528FE8 <4 1>
Test CUDA
$ cd /usr/local/cuda/samples
$ sudo make -C 1_Utilities/deviceQuery
$ ./bin/x86_64/darwin/release/deviceQuery
./bin/x86_64/darwin/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 750 Ti"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2048 MBytes (2147155968 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1254 MHz (1.25 GHz)
Memory Clock rate: 2700 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS
If the last line reads Result = PASS, CUDA is working correctly.
If you see
The version ('9.1') of the host compiler ('Apple clang') is not supported
your Xcode is too new and has to be downgraded.
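The two checks described above (matching driver and runtime versions, and the final Result = PASS line) can also be scripted. The sketch below assumes the deviceQuery output format shown earlier; `parse_device_query` is a hypothetical helper, not part of the CUDA samples.

```python
import re


def parse_device_query(output):
    """Extract driver/runtime versions and the PASS flag from deviceQuery output."""
    versions = re.search(
        r"CUDA Driver Version / Runtime Version\s+([\d.]+)\s*/\s*([\d.]+)", output)
    return {
        "driver": versions.group(1) if versions else None,
        "runtime": versions.group(2) if versions else None,
        "passed": re.search(r"^Result = PASS\s*$", output, re.MULTILINE) is not None,
    }


if __name__ == "__main__":
    # A shortened copy of the output shown above.
    sample = (
        'Device 0: "GeForce GTX 750 Ti"\n'
        "CUDA Driver Version / Runtime Version 9.2 / 9.2\n"
        "Result = PASS\n"
    )
    print(parse_device_query(sample))
```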
cuDNN (CUDA Deep Neural Network library):
NVIDIA's GPU-accelerated library for deep neural networks. Training a model on a GPU does not strictly require cuDNN, but it is what is normally used.
cuDNN
- https://developer.nvidia.com/rdp/cudnn-download
- Download the latest release, cuDNN v7.2.1 for CUDA 9.2: cudnn-9.2-osx-x64-v7.2.1.38.tgz
$ tar -xzvf cudnn-9.2-osx-x64-v7.2.1.38.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
$ rm -rf cuda
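After copying the files it may be worth confirming which cuDNN version actually landed in /usr/local/cuda. The sketch below assumes that cudnn.h carries `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines, as cuDNN 7 headers do; `cudnn_version` is a hypothetical helper name.

```python
import re


def cudnn_version(header_text):
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from the text of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        found = re.search(r"^\s*#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if found is None:
            return None  # Header does not look like the assumed format.
        parts.append(int(found.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    try:
        with open("/usr/local/cuda/include/cudnn.h") as header:
            print(cudnn_version(header.read()))
    except OSError as err:
        print("cudnn.h not readable:", err)
```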
Used to monitor how CUDA is running
$ brew cask install cuda-z
If you already have a prebuilt package, skip this chapter and go straight to the "Install" section.
Please refer to the earlier sections.
Python
$ python3 --version
Python 3.6.5
Do not use Python 3.7.0; the build fails with it.
Python
$ pip3 install six numpy wheel
Coreutils, llvm, OpenMP
$ brew install coreutils llvm cliutils/apple/libomp
Bazel
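Note that this must be version 0.14.0; both newer and older versions can make the build fail. Download 0.14.0 from the bazel releases page (-L makes curl follow the redirect that GitHub release downloads return):
$ curl -LO https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-installer-darwin-x86_64.sh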
$ chmod +x bazel-0.14.0-installer-darwin-x86_64.sh
$ ./bazel-0.14.0-installer-darwin-x86_64.sh
$ bazel version
Build label: 0.14.0
A bazel version that is too old may lose the environment variables, which results in Library not loaded
Check the NVIDIA development environment
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
Check clang
$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
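Both checks above come down to reading a version number out of command output. A minimal sketch of that step, assuming the output formats shown above (`nvcc_release` and `apple_llvm_version` are hypothetical helper names):

```python
import re


def nvcc_release(nvcc_output):
    """Return the CUDA release reported by `nvcc -V`, e.g. '9.2'."""
    found = re.search(r"release (\d+\.\d+)", nvcc_output)
    return found.group(1) if found else None


def apple_llvm_version(cc_output):
    """Return the Apple LLVM version reported by `cc -v`, e.g. '8.0.0'."""
    found = re.search(r"Apple LLVM version (\d+(?:\.\d+)*)", cc_output)
    return found.group(1) if found else None


if __name__ == "__main__":
    print(nvcc_release("Cuda compilation tools, release 9.2, V9.2.148"))
    print(apple_llvm_version("Apple LLVM version 8.0.0 (clang-800.0.42.1)"))
```

The strings in the demo are copied from the outputs above; in practice they would come from running the two commands, for example through `subprocess`.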
A ready-made source archive is available here:
$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/tensorflow-macos-gpu-r1.8-src.tar.gz
$ git clone https://github.com/tensorflow/tensorflow -b r1.8
$ cd tensorflow
$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/tensorflow-macos-gpu-r1.8.patch
$ git apply tensorflow-macos-gpu-r1.8.patch
$ curl -o third_party/nccl/nccl.h https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/nccl.h
$ which python3
/usr/local/bin/python3
$ ./configure
Please specify the location of python. [Default is /usr/local/opt/python@2/bin/python2.7]: /usr/local/bin/python3
Found possible Python library paths:
/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages]
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.2
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.0,3.5,5.0,5.2,6.0,6.1
Do you want to use clang as CUDA compiler? [y/N]:n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
Be sure to enter the correct versions:
- /usr/local/bin/python3
- CUDA 9.2
- cuDNN 7.2
- compute capability 3.0,3.5,5.0,5.2,6.0,6.1; be sure to look up the value your own graphics card supports. Several values may be entered
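A quick way to make sure the list typed into configure really contains your card's compute capability is sketched below. `parse_capabilities` and `covers_device` are hypothetical helpers, and requiring an exact match is a deliberate simplification made for this example.

```python
def parse_capabilities(answer):
    """Split a configure answer such as '3.0,3.5,5.0' into a list of strings."""
    return [item.strip() for item in answer.split(",") if item.strip()]


def covers_device(answer, device_capability):
    """True when the device's exact capability is among the listed ones."""
    return device_capability in parse_capabilities(answer)


if __name__ == "__main__":
    entered = "3.0,3.5,5.0,5.2,6.0,6.1"   # the answer given to configure above
    print(covers_device(entered, "5.0"))  # GeForce GTX 750 Ti is 5.0
    print(covers_device("3.5,5.2", "5.0"))  # the configure default
```

For the GTX 750 Ti (5.0) the list entered above passes this check, while the configure default of 3.5,5.2 does not.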
The answers are written to .tf_configure.bazelrc
Start the build
$ bazel clean --expunge
$ bazel build --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --action_env PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
Downloads may fail during the build because of network problems; just retry a few times.
If the bazel version is wrong, DYLD_LIBRARY_PATH may not be passed through, which leads to Library not loaded
--config=opt
build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
-march=native
means compiling with the optimized instructions that the current CPU supports.
Check what the current CPU supports:
$ sysctl machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
$ gcc -march=native -dM -E -x c++ /dev/null | egrep "AVX|SSE"
#define __AVX2__ 1
#define __AVX__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
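The sysctl line above can be turned into a simple yes/no table for the instruction sets of interest. This is a sketch only; `parse_cpu_features` and `has_features` are hypothetical names. Note that AVX2 does not appear in the `machdep.cpu.features` line above even though gcc defines `__AVX2__`; to my knowledge macOS reports it under the separate key `machdep.cpu.leaf7_features`, so treat a missing AVX2 here with caution.

```python
def parse_cpu_features(sysctl_line):
    """Turn 'machdep.cpu.features: FPU VME ...' into a set of feature names."""
    _, _, features = sysctl_line.partition(":")
    return set(features.split())


def has_features(sysctl_line, wanted):
    """Map each wanted feature name to whether sysctl reported it."""
    present = parse_cpu_features(sysctl_line)
    return {name: name in present for name in wanted}


if __name__ == "__main__":
    # A shortened copy of the sysctl output shown above.
    line = "machdep.cpu.features: FPU SSE4.1 SSE4.2 FMA AVX1.0 F16C"
    print(has_features(line, ["SSE4.2", "FMA", "AVX1.0", "AVX2"]))
```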
ERROR: /Users/c/Downloads/tensorflow-macos-gpu-r1.8/src/tensorflow/python/BUILD:1590:1: Executing genrule //tensorflow/python:string_ops_pygenrule failed (Aborted): bash failed: error executing command /bin/bash bazel-out/host/genfiles/tensorflow/python/string_ops_pygenrule.genrule_script.sh
dyld: Library not loaded: @rpath/libcudart.9.2.dylib
Referenced from: /private/var/tmp/_bazel_c/ea0f1e868907c49391ddb6d2fb9d5630/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_string_ops_py_wrappers_cc
Reason: image not found
This is caused by a bazel bug that prevents the DYLD_LIBRARY_PATH environment variable from being passed through
external/protobuf_archive/python/google/protobuf/pyext/descriptor_pool.cc:169:7: error: assigning to 'char *' from incompatible type 'const char *'
if (PyString_AsStringAndSize(arg, &name, &name_size) < 0) {
This is because Python 3.7 has a bug with protobuf_python; switch to Python 3.6 and rebuild
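The two failures above have recognisable signatures, so a build log can be scanned for them. The sketch below is illustrative only: `KNOWN_ERRORS` and `diagnose` are hypothetical names, and the hints merely restate the explanations given above.

```python
# (signature, hint) pairs taken from the two build errors described above.
KNOWN_ERRORS = [
    ("dyld: Library not loaded",
     "DYLD_LIBRARY_PATH did not reach the build action; check the bazel version"),
    ("assigning to 'char *' from incompatible type 'const char *'",
     "protobuf_python fails under Python 3.7; rebuild with Python 3.6"),
]


def diagnose(build_log):
    """Return the hints whose error signature occurs in the build log."""
    return [hint for signature, hint in KNOWN_ERRORS if signature in build_log]


if __name__ == "__main__":
    log = "dyld: Library not loaded: @rpath/libcudart.9.2.dylib\nReason: image not found"
    for hint in diagnose(log):
        print(hint)
```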
The build takes as long as 1.5 (the unit is cut off here, presumably hours)
$ gcc -march=native -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so bazel-out/darwin-py3-opt/bin/tensorflow/contrib/nccl/python/ops
$ rm _nccl_ops.o
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/Downloads/
$ bazel clean --expunge
$ pip3 uninstall tensorflow
$ pip3 install ~/Downloads/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
Or install the published wheel directly:
$ pip3 install https://github.com/SixQuant/tensorflow-macos-gpu/releases/download/v1.8.0/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
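If you install the prebuilt wheel directly, make sure the versions of the related components are the same as, or newer than, the ones used for the build: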
- cudadriver_396.148_macos.dmg
- cuda_9.2.148_mac.dmg
- cuda_9.2.148.1_mac.dmg
- cudnn-9.2-osx-x64-v7.2.1.38.tgz
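The wheel filename itself records which Python it was built for (cp36 here), which is one more version to match. A minimal sketch of reading those tags, assuming a five-field wheel filename without an optional build tag; `parse_wheel_name` and `fits_interpreter` are hypothetical helpers.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 dash-separated fields, got %d" % len(parts))
    name, version, python_tag, abi_tag, platform_tag = parts
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def fits_interpreter(filename, major, minor):
    """True when the wheel's python tag is cp<major><minor>."""
    return parse_wheel_name(filename)["python"] == "cp%d%d" % (major, minor)


if __name__ == "__main__":
    import sys
    wheel = "tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl"
    print(parse_wheel_name(wheel))
    print(fits_interpreter(wheel, sys.version_info[0], sys.version_info[1]))
```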
Confirm that TensorFlow GPU works correctly
Confirm that Python code can read the correct DYLD_LIBRARY_PATH environment variable
$ nano tensorflow-gpu-01-env.py
#!/usr/bin/env python
import os
print(os.environ["DYLD_LIBRARY_PATH"])
$ python3 tensorflow-gpu-01-env.py
/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
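If a TensorFlow operation such as matmul can run on both cpu:0 and gpu:0, the device gpu:0 is chosen to run matmul. Setting log_device_placement to True logs the device each operation is assigned to.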
$ nano tensorflow-gpu-02-hello.py
#!/usr/bin/env python
import tensorflow as tf
config = tf.ConfigProto()
config.log_device_placement = True
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
with tf.Session(config=config) as sess:
    # Runs the op.
    print(sess.run(c))
$ python3 tensorflow-gpu-02-hello.py
2018-08-26 14:13:45.987276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 706.66MiB
2018-08-26 14:13:45.987303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:13:46.245132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 426 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
2018-08-26 14:13:46.253938: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254406: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254415: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254421: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]
Some useless but worrying-looking log messages I simply commented out in the source, for example:
OS X does not support NUMA - returning NUMA node zero
Not found: TF GPU device with id 0 was not registered
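The matrix printed by the hello script can be checked without a GPU. The sketch below redoes the same 2x3 by 3x2 product in plain Python; `matmul` here is a hand-written helper for illustration, not TensorFlow's op.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3], as in the script
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]
print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

For instance, the top-left entry is 1*1 + 2*3 + 3*5 = 22, matching the first value in the TensorFlow output.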
$ nano tensorflow-gpu-04-cnn-gpu.py
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
import os
import time
import numpy as np
import tflearn
import tensorflow as tf
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
from tensorflow.python.client import device_lib
def print_gpu_info():
    for device in device_lib.list_local_devices():
        print(device.name, 'memory_limit', str(round(device.memory_limit/1024/1024))+'M',
              device.physical_device_desc)
    print('=======================')
print_gpu_info()
DATA_PATH = "/Volumes/Cloud/DataSet"
mnist = tflearn.datasets.mnist.read_data_sets(DATA_PATH+"/mnist", one_hot=True)
config = tf.ConfigProto()
config.log_device_placement = True
config.allow_soft_placement = True
config.gpu_options.allocator_type = 'BFC'
config.gpu_options.allow_growth = True
#config.gpu_options.per_process_gpu_memory_fraction = 0.3
# Building convolutional network
net = tflearn.input_data(shape=[None, 28, 28, 1], name='input')
net = tflearn.conv_2d(net, 32, 5, weights_init='variance_scaling', activation='relu', regularizer="L2")
net = tflearn.conv_2d(net, 64, 5, weights_init='variance_scaling', activation='relu', regularizer="L2")
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net,
                         optimizer='adam',
                         learning_rate=0.01,
                         loss='categorical_crossentropy',
                         name='target')
# Training
model = tflearn.DNN(net, tensorboard_verbose=3)
start_time = time.time()
model.fit(mnist.train.images.reshape([-1, 28, 28, 1]),
          mnist.train.labels.astype(np.int32),
          validation_set=(
              mnist.test.images.reshape([-1, 28, 28, 1]),
              mnist.test.labels.astype(np.int32)
          ),
          n_epoch=1,
          batch_size=128,
          shuffle=True,
          show_metric=True,
          run_id='cnn_mnist_tflearn')
duration = time.time() - start_time
print('Training Duration %.3f sec' % (duration))
$ python3 tensorflow-gpu-04-cnn-gpu.py
2018-08-26 14:11:00.463212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 258.06MiB
2018-08-26 14:11:00.463235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:00.717963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
/device:CPU:0 memory_limit 256M
/device:GPU:0 memory_limit 204M device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
=======================
Extracting /Volumes/Cloud/DataSet/mnist/train-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/train-labels-idx1-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-labels-idx1-ubyte.gz
2018-08-26 14:11:01.158727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.158843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-08-26 14:11:01.487530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.487630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
---------------------------------
Run id: cnn_mnist_tflearn
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 55000
Validation samples: 10000
--
Training Step: 430 | total loss: 0.16522 | time: 45.764s
| Adam | epoch: 001 | loss: 0.16522 - acc: 0.9660 | val_loss: 0.06837 - val_acc: 0.9780 -- iter: 55000/55000
--
Training Duration 45.898 sec
The speedup is obvious:
- CPU build without AVX2 FMA, time: 168.151s
- CPU build with AVX2 FMA, time: 147.697s
- GPU build with AVX2 FMA, time: 45.898s
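To put numbers on the comparison, the ratios can be worked out directly from the three timings. The dictionary keys below are labels made up for this sketch; the values are the timings quoted above.

```python
# Timings for one epoch, copied from the comparison above (seconds).
TIMINGS = {
    "cpu": 168.151,
    "cpu_avx2_fma": 147.697,
    "gpu_avx2_fma": 45.898,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return TIMINGS[baseline] / TIMINGS[candidate]


for base in ("cpu", "cpu_avx2_fma"):
    print("%s -> gpu_avx2_fma: %.2fx" % (base, speedup(base, "gpu_avx2_fma")))
```

That works out to roughly 3.7x over the plain CPU build and roughly 3.2x over the CPU build with AVX2 FMA.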
cuda-smi
Used on the Mac as a stand-in for nvidia-smi
$ sudo scp cuda-smi /usr/local/bin/
$ sudo chmod 755 /usr/local/bin/cuda-smi
$ cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): 5.0234 of 2047.7 MB (i.e. 0.245%) Free
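If the free-memory figure is needed in a script, the cuda-smi line can be parsed. The pattern below is an assumption based solely on the single output line shown above, and `parse_cuda_smi` is a hypothetical helper name.

```python
import re


def parse_cuda_smi(line):
    """Extract device name and free/total memory (MB) from one cuda-smi line."""
    found = re.search(
        r"Device (\d+) \[[^\]]*\]: (.+?) \(CC ([\d.]+)\): ([\d.]+) of ([\d.]+) MB", line)
    if found is None:
        return None
    return {
        "index": int(found.group(1)),
        "name": found.group(2),
        "capability": found.group(3),
        "free_mb": float(found.group(4)),
        "total_mb": float(found.group(5)),
    }


if __name__ == "__main__":
    sample = ("Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): "
              "5.0234 of 2047.7 MB (i.e. 0.245%) Free")
    print(parse_cuda_smi(sample))
```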
Rebuild a _nccl_ops.so and copy it into place:
$ gcc -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so /usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/
$ rm _nccl_ops.o
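This is because the DYLD_LIBRARY_PATH environment variable is lost inside Jupyter.
More precisely, recent versions of macOS forbid arbitrary changes to security-sensitive settings such as DYLD_LIBRARY_PATH unless SIP is disabled.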
import os
os.environ['DYLD_LIBRARY_PATH']
The code above fails in Jupyter because, under SIP, the DYLD_LIBRARY_PATH environment variable cannot be modified
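The lookup above raises a KeyError when the variable is absent, which hides what is really going on. A slightly more forgiving sketch, with `dyld_library_path` as a hypothetical helper name, reports the missing variable instead of failing:

```python
import os


def dyld_library_path(env=None):
    """Return DYLD_LIBRARY_PATH, or None when the variable is not set."""
    env = os.environ if env is None else env
    return env.get("DYLD_LIBRARY_PATH")


if __name__ == "__main__":
    value = dyld_library_path()
    print(value if value is not None else "DYLD_LIBRARY_PATH is not set in this process")
```

This only makes the diagnosis clearer; it does not restore the variable.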
A segmentation fault means the program accessed memory outside the address space the system assigned to it
Simply ignore this warning