Tensorflow 1.8 macOS GPU Install

Lao Xu (老徐)

Thursday, 26 July 2018


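The TensorFlow team announced that it would stop supporting GPU builds of TensorFlow on macOS after version 1.2.

As a result, there is no package to install directly; the only option is to build it yourself from source.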

Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.6

Running TensorFlow on the CPU did not feel fast enough, so I wanted to try GPU acceleration. I happen to have a graphics card that supports CUDA.

Versions

Important things bear repeating: the drivers and the build toolchain must be matching versions, or the build will fail!

Versions:

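- TensorFlow r1.8 source code; the latest 1.9 still seems to have problems
- macOS 10.13.6; this probably does not matter much
- GPU driver 387.10.10.10.40.105, which supports CUDA 9.1
- CUDA 9.2; this is the CUDA driver, which may be newer than the CUDA version supported by the GPU driver above, i.e. CUDA Driver 9.2
- cuDNN 7.2, matching the CUDA version above; just install the latest
- Xcode 8.2.1; this is critical: downgrade to this version, otherwise the build fails or the result crashes at run time with a Segmentation Fault
- bazel 0.14.0; this is critical: downgrade to this version
- Python 3.6; this is critical: do not use the latest Python 3.7, which at the time of writing breaks the build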
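The pinning above can be sketched as a small check. This is only a minimal sketch and not part of the original guide: the `REQUIRED` table and the `version_matches` and `check_versions` helpers are hypothetical names, and the `found` dictionary is typed in by hand for illustration rather than queried from the system.

```python
# Hypothetical helper: compare installed versions against the pinned ones above.
REQUIRED = {
    "tensorflow": "r1.8",
    "cuda": "9.2",
    "cudnn": "7.2",
    "xcode": "8.2.1",
    "bazel": "0.14.0",
    "python": "3.6",
}


def version_matches(found, required):
    """True if `found` equals `required` or extends it by a dotted component."""
    return found == required or found.startswith(required + ".")


def check_versions(found):
    """Return a list of (tool, required, found) tuples for every mismatch."""
    problems = []
    for tool, required in REQUIRED.items():
        actual = found.get(tool)
        if actual is None or not version_matches(actual, required):
            problems.append((tool, required, actual))
    return problems


# Values typed in by hand for illustration; Python 3.7.0 is the known-bad case.
found = {
    "tensorflow": "r1.8",
    "cuda": "9.2.148",
    "cudnn": "7.2.1",
    "xcode": "8.2.1",
    "bazel": "0.14.0",
    "python": "3.7.0",
}
for tool, required, actual in check_versions(found):
    print("%s: need %s, found %s" % (tool, required, actual))
```

With the sample values, only the Python entry is reported, which mirrors the warning about 3.7 in the list.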

Preparation

Files to download (some of them are large, so start the downloads before reading on to save time):

- Xcode 8.2.1 (Xcode_8.2.1.xip): https://developer.apple.com/download/more/
- bazel 0.14.0: https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-installer-darwin-x86_64.sh
- CUDA Toolkit 9.2: https://developer.nvidia.com/cuda-toolkit-archive
- cuDNN v7.2.1: https://developer.nvidia.com/rdp/cudnn-download
- TensorFlow source code, 333 MB:

$ git clone https://github.com/tensorflow/tensorflow -b r1.8

Python 3.6.5_1

Python 3.7 is currently installed, so downgrade it:

$ brew unlink python
$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb
$ pip3 install --upgrade pip setuptools wheel
# $ brew switch python 3.6.5_1

Do not use Python 3.7.0, or the build will run into problems.

After the build you can switch back:

$ brew switch python 3.7.0

Xcode 8.2.1

Xcode must be downgraded to 8.2.1.

Download the package from the Apple developer site: https://developer.apple.com/download/more/

Extract it, copy it to /Applications/Xcode.app, and then select it as the active toolchain:

$ sudo xcode-select -s /Applications/Xcode.app

Verify that the installation is correct:

$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

These are the Command Line Tools; cc is clang.

This matters: otherwise the build succeeds, but running anything moderately complex ends in a Segmentation Fault.
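The compiler check can also be scripted. The following is a minimal sketch, assuming the banner format shown in the `cc -v` output above; `apple_llvm_version` is a hypothetical helper, and the sample text is copied from that output rather than captured live.

```python
import re

# Sample output copied from the `cc -v` run above.
CC_OUTPUT = """Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
"""


def apple_llvm_version(cc_output):
    """Extract the Apple LLVM version string, or None if it is not found."""
    match = re.search(r"Apple LLVM version (\d+(?:\.\d+)*)", cc_output)
    return match.group(1) if match else None


version = apple_llvm_version(CC_OUTPUT)
if version == "8.0.0":
    print("clang matches Xcode 8.2.1:", version)
else:
    print("unexpected compiler version:", version)
```

To run it against a live system, the text would have to be captured from the compiler's standard error stream, since that is where `cc -v` normally writes its banner.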

Environment variables

The CUDA libraries do not live in a system directory, so environment variables must point to them.

On the Mac, LD_LIBRARY_PATH has no effect; DYLD_LIBRARY_PATH is used instead.

To configure the variables, edit ~/.bash_profile or ~/.zshrc:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib
export PATH=$CUDA_HOME/bin:$PATH
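A quick way to catch a typo in these exports is to confirm that every directory listed in DYLD_LIBRARY_PATH actually exists. This is a minimal sketch; `missing_library_dirs` is a hypothetical helper name, and it only checks for directories, not for the CUDA libraries inside them.

```python
import os


def missing_library_dirs(dyld_value):
    """Return the entries of a DYLD_LIBRARY_PATH-style string that are not directories."""
    entries = [entry for entry in dyld_value.split(":") if entry]
    return [entry for entry in entries if not os.path.isdir(entry)]


# .get() avoids a KeyError when the variable is not set in this process.
dyld_value = os.environ.get("DYLD_LIBRARY_PATH", "")
if not dyld_value:
    print("DYLD_LIBRARY_PATH is not set in this process")
for entry in missing_library_dirs(dyld_value):
    print("missing directory:", entry)
```

Run it from a fresh terminal after editing the profile file, so that the new exports are in effect.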

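Installing CUDA

CUDA is the parallel computing framework that NVIDIA provides for its own GPUs. It runs only on NVIDIA GPUs, and it pays off only when the computation at hand can be parallelized on a large scale.

Step 1: Check whether the graphics card supports GPU computing

Look up your card model here to see whether it is supported: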

https://developer.nvidia.com/cuda-gpus

My card is an NVIDIA GeForce GTX 750 Ti:

GPU	Compute Capability
GeForce GTX 750 Ti	5.0

Step 2: Install CUDA

If another version of CUDA is installed and needs to be removed, run:

$ sudo /usr/local/bin/uninstall_cuda_drv.pl
$ sudo /usr/local/cuda/bin/uninstall_cuda_9.1.pl
$ sudo rm -rf /Developer/NVIDIA/CUDA-9.1/
$ sudo rm -rf /Library/Frameworks/CUDA.framework
$ sudo rm -rf /usr/local/cuda/

To be safe, reboot afterwards.

First, note that the CUDA Driver and the GPU Driver must be matching versions; otherwise CUDA cannot find the graphics card.

GPU Driver (the graphics card driver)
- http://www.macvidcards.com/drivers.html
- My macOS is 10.13.6, and the matching driver is already installed at its latest version, 387.10.10.10.40.105
  
  https://www.nvidia.com/download/driverResults.aspx/136062/en-us
```
Version:	387.10.10.10.40.105
Release Date:	2018.7.10
Operating System:	macOS High Sierra 10.13.6
CUDA Toolkit:	9.1
```
CUDA Driver
- http://www.nvidia.com/object/mac-driver-archive.html
- Install the CUDA Driver separately first; you can pick the latest version, but check which GPU driver it supports
- cudadriver_396.148_macos.dmg
```
New Release 396.148
CUDA driver update to support CUDA Toolkit 9.2, macOS 10.13.6 and NVIDIA display driver 387.10.10.10.40.105
Recommended CUDA version(s): CUDA 9.2
Supported macOS 10.13
```
CUDA Toolkit
- https://developer.nvidia.com/cuda-toolkit
- You can pick the latest version; 9.2 is used here
- cuda_9.2.148_mac.dmg, cuda_9.2.148.1_mac.dmg

After installation, check:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Confirm that the driver is loaded:

$ kextstat | grep -i cuda.
  149    0 0xffffff7f838d3000 0x2000     0x2000     com.nvidia.CUDA (1.1.0) E13478CB-B251-3C0A-86E9-A6B56F528FE8 <4 1>

Test whether CUDA runs properly:

$ cd /usr/local/cuda/samples
$ sudo make -C 1_Utilities/deviceQuery
$ ./bin/x86_64/darwin/release/deviceQuery
./bin/x86_64/darwin/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750 Ti"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147155968 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1254 MHz (1.25 GHz)
  Memory Clock rate:                             2700 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS

If the last line shows Result = PASS, CUDA is working properly.
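The pass/fail check can be automated by scanning the summary lines. This is a minimal sketch, assuming the summary format shown in the output above; `parse_device_query` is a hypothetical helper, and the sample text is copied from that output.

```python
import re

# Final summary lines copied from the deviceQuery run above.
SUMMARY = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, "
    "CUDA Runtime Version = 9.2, NumDevs = 1\n"
    "Result = PASS\n"
)


def parse_device_query(output):
    """Return (driver_version, runtime_version, passed) from deviceQuery output."""
    driver = re.search(r"CUDA Driver Version = (\d+\.\d+)", output)
    runtime = re.search(r"CUDA Runtime Version = (\d+\.\d+)", output)
    passed = re.search(r"^Result = PASS\s*$", output, re.MULTILINE) is not None
    return (
        driver.group(1) if driver else None,
        runtime.group(1) if runtime else None,
        passed,
    )


driver, runtime, passed = parse_device_query(SUMMARY)
print("driver", driver, "runtime", runtime, "passed", passed)
```

Both version fields read 9.2 in the sample, which is the pairing this guide expects.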

If the following error appears:

The version ('9.1') of the host compiler ('Apple clang') is not supported

the Xcode version is too new and Xcode must be downgraded.

Step 3: Install cuDNN

cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. It is not strictly required for training models on a GPU, but it is normally used.

cuDNN

https://developer.nvidia.com/rdp/cudnn-download
Download the latest version, cuDNN v7.2.1 for CUDA 9.2:
cudnn-9.2-osx-x64-v7.2.1.38.tgz

Once downloaded, extract the archive and merge its contents into the CUDA directory /usr/local/cuda/:

$ tar -xzvf cudnn-9.2-osx-x64-v7.2.1.38.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
$ rm -rf cuda

Step 4: Install CUDA-Z

CUDA-Z is used to inspect the CUDA status.

$ brew cask install cuda-z

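You can then launch CUDA-Z from Applications to inspect the CUDA status.

Building

If you already have a prebuilt package, skip this chapter and go straight to the "Installation" section.

The following builds the GPU version of TensorFlow from source.

CUDA preparation

See the preceding sections.

Build environment preparation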

Python

$ python3 --version
Python 3.6.5

Do not use Python 3.7.0, or the build will run into problems.

Python dependencies

$ pip3 install six numpy wheel

Coreutils, llvm, OpenMP

$ brew install coreutils llvm cliutils/apple/libomp

Bazel

Note that this must be version 0.14.0; anything newer or older can make the build fail. Download version 0.14.0 from the bazel releases page:

$ curl -LO https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-installer-darwin-x86_64.sh
$ chmod +x bazel-0.14.0-installer-darwin-x86_64.sh
$ ./bazel-0.14.0-installer-darwin-x86_64.sh
$ bazel version
Build label: 0.14.0

Too old a version may fail to pass the environment variables through, which results in Library not loaded.

Check the NVIDIA development environment

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Check the clang version

$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Source preparation

Check out the release 1.8 branch of the TensorFlow source and patch it to make it compatible with macOS.

You can download the already patched source directly:

$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/tensorflow-macos-gpu-r1.8-src.tar.gz

Or patch it by hand:

$ git clone https://github.com/tensorflow/tensorflow -b r1.8
$ cd tensorflow
$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/tensorflow-macos-gpu-r1.8.patch
$ git apply tensorflow-macos-gpu-r1.8.patch
$ curl -o third_party/nccl/nccl.h https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/nccl.h

Build

Configure

$ which python3
/usr/local/bin/python3

$ ./configure

Please specify the location of python. [Default is /usr/local/opt/python@2/bin/python2.7]: /usr/local/bin/python3

Found possible Python library paths:
  /usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2

Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.2

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.0,3.5,5.0,5.2,6.0,6.1

Do you want to use clang as CUDA compiler? [y/N]:n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
Configuration finished

Be sure to enter the correct versions:

/usr/local/bin/python3

CUDA 9.2

cuDNN 7.2

compute capability 3.0,3.5,5.0,5.2,6.0,6.1: be sure to look up the version your card supports; several values may be entered
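Whether the list typed at the prompt includes your card's value can be checked mechanically. This is a minimal sketch with hypothetical helper names (`parse_capabilities`, `covers`); it only tests list membership and says nothing about how TensorFlow behaves for a capability that is not listed.

```python
def parse_capabilities(text):
    """Split a comma-separated capability list such as '3.5,5.2' into a list."""
    return [item.strip() for item in text.split(",") if item.strip()]


def covers(card_capability, configured):
    """True if the card's compute capability is one of the configured values."""
    return card_capability in parse_capabilities(configured)


# The list entered at the ./configure prompt above and the GTX 750 Ti's value.
CONFIGURED = "3.0,3.5,5.0,5.2,6.0,6.1"
print(covers("5.0", CONFIGURED))
# The ./configure default shown in the prompt does not list 5.0 explicitly.
print(covers("5.0", "3.5,5.2"))
```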

The configure step above actually generates the build configuration file .tf_configure.bazelrc.

Start the build

$ bazel clean --expunge
$ bazel build --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --action_env PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

Downloads may fail during the build because of network problems; retry a few times if so.

With the wrong bazel version, DYLD_LIBRARY_PATH may not be passed through, which results in Library not loaded.

Build notes

--config=opt should mean:

build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true

-march=native tells the compiler to use the optimized instructions supported by the current CPU.

List the instruction sets supported by the current CPU:

$ sysctl machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C

$ gcc -march=native -dM -E -x c++ /dev/null | egrep "AVX|SSE"

#define __AVX2__ 1
#define __AVX__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
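The flag list printed by sysctl can be turned into a set for quick lookups. This is a minimal sketch: `cpu_features` is a hypothetical helper, and the sample line is an abbreviated copy of the output above rather than live data.

```python
# Abbreviated sample of the `sysctl machdep.cpu.features` output shown above.
SYSCTL_LINE = (
    "machdep.cpu.features: FPU VME DE PSE TSC MSR SSE SSE2 SSE3 SSSE3 FMA "
    "SSE4.1 SSE4.2 POPCNT AES AVX1.0 RDRAND F16C"
)


def cpu_features(sysctl_line):
    """Return the set of feature flags listed after the 'key:' prefix."""
    _, _, flags = sysctl_line.partition(":")
    return set(flags.split())


features = cpu_features(SYSCTL_LINE)
for flag in ("SSE4.2", "AVX1.0", "FMA", "AVX2"):
    print(flag, "yes" if flag in features else "not listed here")
```

AVX2 does not appear in this particular list even though the compiler defines `__AVX2__`; as far as I know, macOS reports it under a separate key, machdep.cpu.leaf7_features, so treat a missing flag here as inconclusive.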

Build error dyld: Library not loaded: @rpath/libcudart.9.2.dylib

ERROR: /Users/c/Downloads/tensorflow-macos-gpu-r1.8/src/tensorflow/python/BUILD:1590:1: Executing genrule //tensorflow/python:string_ops_pygenrule failed (Aborted): bash failed: error executing command /bin/bash bazel-out/host/genfiles/tensorflow/python/string_ops_pygenrule.genrule_script.sh
dyld: Library not loaded: @rpath/libcudart.9.2.dylib
  Referenced from: /private/var/tmp/_bazel_c/ea0f1e868907c49391ddb6d2fb9d5630/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_string_ops_py_wrappers_cc
  Reason: image not found

This is caused by a bazel bug that prevents the DYLD_LIBRARY_PATH environment variable from being passed through.

Fix: install the correct version of bazel.

Build error PyString_AsStringAndSize

external/protobuf_archive/python/google/protobuf/pyext/descriptor_pool.cc:169:7: error: assigning to 'char *' from incompatible type 'const char *'
  if (PyString_AsStringAndSize(arg, &name, &name_size) < 0) {

This happens because protobuf_python has a bug under Python 3.7; switch to Python 3.6 and rebuild.

protocolbuffers/protobuf#4086

The build takes up to 1.5 hours, so please be patient.

Generating the pip package

Recompile and replace _nccl_ops.so:

$ gcc -march=native -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so bazel-out/darwin-py3-opt/bin/tensorflow/contrib/nccl/python/ops
$ rm _nccl_ops.o

Package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/Downloads/

Clean up

$ bazel clean --expunge

Installation

$ pip3 uninstall tensorflow
$ pip3 install ~/Downloads/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

You can also install directly over HTTP:

$ pip3 install https://github.com/SixQuant/tensorflow-macos-gpu/releases/download/v1.8.0/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

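If you install the prebuilt package directly, make sure the related components on your machine are the same versions as those used for the build, or newer: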

cudadriver_396.148_macos.dmg

cuda_9.2.148_mac.dmg

cuda_9.2.148.1_mac.dmg

cudnn-9.2-osx-x64-v7.2.1.38.tgz
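The wheel file name itself records which interpreter and platform it was built for, so a mismatch can be spotted before running pip. This is a minimal sketch with hypothetical helper names (`parse_wheel_name`, `interpreter_tag`); it assumes the five-field name used above, without the optional build tag.

```python
import sys

WHEEL = "tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl"


def parse_wheel_name(filename):
    """Split a wheel filename into its five dash-separated fields (no build tag)."""
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp36'."""
    return "cp%d%d" % (sys.version_info[0], sys.version_info[1])


info = parse_wheel_name(WHEEL)
print(info)
if info["python"] != interpreter_tag():
    print("wheel targets %s, but this interpreter is %s"
          % (info["python"], interpreter_tag()))
```

A `cp36` tag means the wheel needs CPython 3.6, which matches the Python downgrade described earlier.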

Verification

Verify that TensorFlow GPU works properly.

Verify the environment variables

Confirm that Python code can read the correct DYLD_LIBRARY_PATH environment variable.

$ nano tensorflow-gpu-01-env.py

#!/usr/bin/env python

import os

print(os.environ["DYLD_LIBRARY_PATH"])

$ python3 tensorflow-gpu-01-env.py
/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib

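Verify that the GPU is enabled

If a TensorFlow operation has both CPU and GPU implementations, the GPU device takes priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels, so on a system with devices cpu:0 and gpu:0, gpu:0 is selected to run matmul. To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.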

$ nano tensorflow-gpu-02-hello.py

#!/usr/bin/env python

import tensorflow as tf

config = tf.ConfigProto()
config.log_device_placement = True

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
with tf.Session(config=config) as sess:
    # Runs the op.
    print(sess.run(c))

$ python3 tensorflow-gpu-02-hello.py
2018-08-26 14:13:45.987276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 706.66MiB
2018-08-26 14:13:45.987303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:13:46.245132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 426 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
2018-08-26 14:13:46.253938: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254406: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254415: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254421: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]
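The placement lines in this log can be summarized by a short script instead of being read by eye. This is a minimal sketch, assuming the line format shown in the output above; `placements` is a hypothetical helper, and the sample text is copied from that output.

```python
import re

# Placement lines copied from the log_device_placement output above.
LOG = """MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
"""

# Matches "<op name>: (<op type>): <...>/device:<TYPE>:<index>".
PLACEMENT = re.compile(r"^(\S+): \((\w+)\): \S*/device:(\w+):(\d+)$")


def placements(log_text):
    """Map each op name to its (device_type, device_index) from placement lines."""
    result = {}
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            name, _op_type, device_type, index = match.groups()
            result[name] = (device_type, int(index))
    return result


placed = placements(LOG)
on_cpu = [name for name, (kind, _) in placed.items() if kind != "GPU"]
print("ops placed:", len(placed), "not on GPU:", on_cpu)
```

The timestamped duplicates of each line use a slightly different layout and are deliberately ignored by the pattern.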

I commented out some useless but alarming-looking log output directly in the source, for example:

OS X does not support NUMA - returning NUMA node zero

Not found: TF GPU device with id 0 was not registered

Running something more complex

$ nano tensorflow-gpu-04-cnn-gpu.py

#!/usr/bin/env python

from __future__ import absolute_import, division, print_function
import os
import time
import numpy as np
import tflearn
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'

from tensorflow.python.client import device_lib
def print_gpu_info():
    for device in device_lib.list_local_devices():
        print(device.name, 'memory_limit', str(round(device.memory_limit/1024/1024))+'M', 
            device.physical_device_desc)
    print('=======================')

print_gpu_info()


DATA_PATH = "/Volumes/Cloud/DataSet"

mnist = tflearn.datasets.mnist.read_data_sets(DATA_PATH+"/mnist", one_hot=True)

config = tf.ConfigProto()
config.log_device_placement = True
config.allow_soft_placement = True

config.gpu_options.allocator_type = 'BFC'
config.gpu_options.allow_growth = True
#config.gpu_options.per_process_gpu_memory_fraction = 0.3

# Building convolutional network
net = tflearn.input_data(shape=[None, 28, 28, 1], name='input') 
net = tflearn.conv_2d(net, 32, 5, weights_init='variance_scaling', activation='relu', regularizer="L2") 
net = tflearn.conv_2d(net, 64, 5, weights_init='variance_scaling', activation='relu', regularizer="L2") 
net = tflearn.fully_connected(net, 10, activation='softmax') 
net = tflearn.regression(net,
                         optimizer='adam',                  
                         learning_rate=0.01,
                         loss='categorical_crossentropy', 
                         name='target')

# Training
model = tflearn.DNN(net, tensorboard_verbose=3)

start_time = time.time()
model.fit(mnist.train.images.reshape([-1, 28, 28, 1]),
          mnist.train.labels.astype(np.int32),
          validation_set=(
              mnist.test.images.reshape([-1, 28, 28, 1]),
              mnist.test.labels.astype(np.int32)
          ),
          n_epoch=1,
          batch_size=128,
          shuffle=True,
          show_metric=True,
          run_id='cnn_mnist_tflearn')

duration = time.time() - start_time
print('Training Duration %.3f sec' % (duration))

$ python3 tensorflow-gpu-04-cnn-gpu.py
2018-08-26 14:11:00.463212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 258.06MiB
2018-08-26 14:11:00.463235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:00.717963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
/device:CPU:0 memory_limit 256M
/device:GPU:0 memory_limit 204M device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
=======================
Extracting /Volumes/Cloud/DataSet/mnist/train-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/train-labels-idx1-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-labels-idx1-ubyte.gz
2018-08-26 14:11:01.158727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.158843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-08-26 14:11:01.487530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.487630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
---------------------------------
Run id: cnn_mnist_tflearn
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 55000
Validation samples: 10000
--
Training Step: 430  | total loss: 0.16522 | time: 45.764s
| Adam | epoch: 001 | loss: 0.16522 - acc: 0.9660 | val_loss: 0.06837 - val_acc: 0.9780 -- iter: 55000/55000
--
Training Duration 45.898 sec

The speedup is substantial:

CPU build without AVX2 FMA, time: 168.151s

CPU build with AVX2 FMA, time: 147.697s

GPU build with AVX2 FMA, time: 45.898s
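Expressed as ratios, the GPU build is roughly 3.7 times faster than the plain CPU build and roughly 3.2 times faster than the CPU build with AVX2 and FMA. The arithmetic, as a minimal sketch with a hypothetical `speedup` helper and the three timings reported here:

```python
# Wall-clock times reported above, in seconds.
TIMES = {
    "cpu_plain": 168.151,
    "cpu_avx2_fma": 147.697,
    "gpu_avx2_fma": 45.898,
}


def speedup(baseline, improved):
    """How many times faster `improved` is than `baseline`."""
    return baseline / improved


for name in ("cpu_plain", "cpu_avx2_fma"):
    ratio = speedup(TIMES[name], TIMES["gpu_avx2_fma"])
    print("%s -> gpu: %.2fx" % (name, ratio))
```

These are single runs of one epoch on one machine, so the ratios are indicative only.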

cuda-smi

cuda-smi stands in for nvidia-smi on the Mac.

nvidia-smi is used to inspect GPU memory usage.

After downloading it, put it in the /usr/local/bin/ directory:

$ sudo scp cuda-smi /usr/local/bin/
$ sudo chmod 755 /usr/local/bin/cuda-smi
$ cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): 5.0234 of 2047.7 MB (i.e. 0.245%) Free
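The one-line report is easy to parse if the free memory needs to be logged or compared over time. This is a minimal sketch, assuming the line format shown above; `parse_cuda_smi` is a hypothetical helper, and the sample line is copied from that output.

```python
import re

# Output line copied from the cuda-smi run above.
LINE = ("Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): "
        "5.0234 of 2047.7 MB (i.e. 0.245%) Free")

PATTERN = re.compile(
    r"^Device (\d+) .*?: (.+?) \(CC ([\d.]+)\): ([\d.]+) of ([\d.]+) MB"
)


def parse_cuda_smi(line):
    """Return device index, name, capability, and free/total MB from one line."""
    match = PATTERN.match(line)
    if match is None:
        return None
    index, name, capability, free_mb, total_mb = match.groups()
    return {
        "index": int(index),
        "name": name,
        "capability": capability,
        "free_mb": float(free_mb),
        "total_mb": float(total_mb),
    }


info = parse_cuda_smi(LINE)
print("%.3f%% free" % (100.0 * info["free_mb"] / info["total_mb"]))
```

The recomputed percentage agrees with the 0.245% that cuda-smi prints, which also shows how little memory was free on this card at the time.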

Problems

Error _ncclAllReduce

Recompile _nccl_ops.so and copy it into place:

$ gcc -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so /usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/
$ rm _nccl_ops.o

Library not loaded: @rpath/libcublas.9.2.dylib

This happens because the DYLD_LIBRARY_PATH environment variable is lost inside Jupyter.

Put differently, recent versions of macOS do not let you freely change unsafe settings such as DYLD_LIBRARY_PATH unless you disable SIP.

To reproduce:

import os
os.environ['DYLD_LIBRARY_PATH']

The code above fails in Jupyter: because of SIP, the DYLD_LIBRARY_PATH environment variable cannot be modified.

Fix: see the "Environment variables" section above.

Segmentation Fault

A segmentation fault means that the program accessed memory outside the address space the system gave it.

Fix: double-check that the correct versions and build flags were used, Xcode in particular.

Not found: TF GPU device with id 0 was not registered

Simply ignore this warning.

GPU memory leak???

I don't know how to fix this :(
