Setting up a deep learning environment (Caffe + cuDNN as the example, no root privileges required)

(Installed on Red Hat 6.3 without root privileges; everything is built from source.)
1. System environment and permissions: GTX 780, Red Hat 6.3, gcc 4.4.6, no root privileges
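2. Required dependencies:
(1) CUDA 6.5 (recommended), 6.0, 5.5, or 5.0, together with the driver for CUDA 6.0, or driver 319.* for CUDA 5 (not 331.*)
Note: installing the CUDA driver does require root privileges. I installed CUDA 6.5, which needs driver 340.* or newer (I did not test 6.0 or earlier).
(2) BLAS: one of ATLAS, MKL, or OpenBLAS. I used the Intel MKL that the system administrator had already installed.
(3) OpenCV (>= 2.4)
(4) Boost (>= 1.55) (in practice only 1.55 worked for me, as explained at the end)
(5) glog, gflags, protobuf, leveldb, snappy, hdf5, lmdb
(6) Python 2.7, numpy (>= 1.7)
(7) MATLAB
(Python and MATLAB appear to be optional; I only installed Python.)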
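3. Installing the dependencies
CUDA and Python are straightforward to install and Intel MKL was already present, so they are not covered here. In the commands below, install_dir stands for a directory you have write access to.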

#protobuf
tar zxvf protobuf-2.6.1.tar.gz
cd protobuf-2.6.1
chmod a+x autogen.sh
./autogen.sh
./configure --prefix=install_dir
make && make install
#leveldb
unzip leveldb-master.zip
cd leveldb-master
chmod a+x build_detect_platform
# build_detect_platform does not need to be run by hand: make invokes it with the arguments it expects
make
#snappy
unzip snappy-master.zip
cd snappy-master
chmod a+x autogen.sh
./autogen.sh
./configure --prefix=install_dir
make && make install
#hdf5
tar zxvf hdf5-1.8.14.tar.gz
cd hdf5-1.8.14
./configure --prefix=install_dir
make && make install
# glog
tar zxvf glog-0.3.3.tar.gz
cd glog-0.3.3
./configure --prefix=install_dir
make && make install
# gflags
tar zxvf gflags-2.1.1.tar.gz
cd gflags-2.1.1
mkdir build && cd build
export CXXFLAGS="-fPIC" && cmake -DCMAKE_INSTALL_PREFIX=install_dir ..
make VERBOSE=1 && make install
# lmdb
git clone git://gitorious.org/mdb/mdb.git
cd mdb/libraries/liblmdb
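# the lmdb Makefile installs to /usr/local by default; override prefix to stay in user space
# (if make install complains, create install_dir/bin, lib, include and man/man1 first)
make && make prefix=install_dir install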
#opencv
unzip opencv-2.4.10.zip
cd opencv-2.4.10
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=install_dir ..
make && make install
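
Because everything above lands in a user-writable prefix, the shell has to be told where the new binaries and shared libraries live. The following is only a sketch: it assumes each package was installed under a hypothetical $HOME/software/<name> directory (the layout used in the Makefile.config paths further down in this post), so adjust the names to your own install_dir.

```shell
# Hypothetical install root; replace with your own install_dir.
SOFT="$HOME/software"

# Prepend each package's bin/ and lib/ directory to the search paths.
for pkg in protobuf glog gflags hdf5 snappy lmdb opencv; do
    PATH="$SOFT/$pkg/bin:$PATH"
    LD_LIBRARY_PATH="$SOFT/$pkg/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
done

# leveldb is built in place, so its library sits in the source directory itself.
LD_LIBRARY_PATH="$SOFT/leveldb-master:$LD_LIBRARY_PATH"

export PATH LD_LIBRARY_PATH
```

Putting these lines in ~/.bashrc makes them apply to every new shell.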

4. Installing cuDNN
First download the cuDNN Deep Neural Network Library from https://developer.nvidia.com/cuDNN. (It is freely available to CUDA Registered Developers.)

tar -zxvf cudnn-6.5-linux-R1.tgz
cd cudnn-6.5-linux-R1
# copy cudnn.h to CUDA_DIR/include; CUDA_DIR is the directory where the CUDA toolkit is installed
cp cudnn.h CUDA_DIR/include
# copy the cuDNN libraries to CUDA_DIR/lib64
cp libcudnn* CUDA_DIR/lib64
# the cuDNN library also needs its symbolic links (ln -s TARGET LINK_NAME)
cd CUDA_DIR/lib64
ln -sf libcudnn.so.6.5.18 libcudnn.so.6.5
ln -sf libcudnn.so.6.5 libcudnn.so
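
The symlink chain the linker and loader expect is libcudnn.so -> libcudnn.so.6.5 -> libcudnn.so.6.5.18, and the argument order of ln -s is easy to get backwards. The sketch below reproduces the chain in a scratch directory with an empty stand-in file, so it can be tried without touching the real CUDA install; the variable names are only illustrative.

```shell
# Scratch directory with an empty stand-in for the real shared object.
demo_dir="$(mktemp -d)"
touch "$demo_dir/libcudnn.so.6.5.18"

# ln -s TARGET LINK_NAME: the existing file comes first, the new link second.
ln -s libcudnn.so.6.5.18 "$demo_dir/libcudnn.so.6.5"
ln -s libcudnn.so.6.5 "$demo_dir/libcudnn.so"

# Follow the whole chain and report the file it ends at.
cudnn_target="$(basename "$(readlink -f "$demo_dir/libcudnn.so")")"
echo "libcudnn.so resolves to $cudnn_target"
```

If the chain is correct, the resolved name is the fully versioned file, libcudnn.so.6.5.18.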

5. Caffe
Download the Caffe package via http://caffe.berkeleyvision.org/installation.html.
After unpacking, enter the caffe directory and first make a copy of Makefile.config:

cp Makefile.config.example Makefile.config

Then edit Makefile.config. Several places need changes.
(1) cuDNN
To use cuDNN acceleration, uncomment USE_CUDNN := 1. Also set the CUDA install path and the CUDA architecture (uncommenting CUDA_ARCH is enough).

# cuda_install_dir is the CUDA install path
CUDA_DIR := cuda_install_dir

If there is no GPU, use CPU_ONLY := 1 instead.
(2) BLAS library
I used the Intel MKL that was already installed.

BLAS := mkl

(3) Python
Set the Python directories.

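# to find Python.h and numpy/arrayobject.h
# (paths assume a source-built Python 2.7 under python_install_dir; adjust if yours differs)
PYTHON_INCLUDE := python_install_dir/include/python2.7 \
        python_install_dir/lib/python2.7/site-packages/numpy/core/include
# to find libpythonX.X.so or .dylib
PYTHON_LIB := python_install_dir/lib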
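
Rather than guessing these directories, the interpreter itself can be asked for them. The helper below is a hypothetical convenience function, not part of Caffe; it assumes the Python you want to build against is reachable as python (override with the PYTHON variable) and that numpy is importable from it.

```shell
# Print the directories needed for PYTHON_INCLUDE and PYTHON_LIB.
python_build_paths() {
    py="${PYTHON:-python}"
    # directory containing Python.h
    "$py" -c "import sysconfig; print(sysconfig.get_paths()['include'])"
    # directory containing libpythonX.Y.so
    "$py" -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))"
    # directory containing numpy/arrayobject.h (needs numpy to be importable)
    "$py" -c "import numpy; print(numpy.get_include())"
}
```

Calling python_build_paths prints three lines: the first and third go into PYTHON_INCLUDE, the second into PYTHON_LIB.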

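That should be enough, right? Not so fast! The dependencies above were built from source as an ordinary user, and even with PATH and LD_LIBRARY_PATH configured, the build still fails with a pile of errors like the following: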

/usr/bin/ld: cannot find -lgflags
collect2: ld returned 1 exit status
make: *** [.build_release/lib/libcaffe.so] Error 1

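The following additional changes are needed: define an include directory and a library directory for each dependency, then append them to INCLUDE_DIRS and LIBRARY_DIRS.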

#=======================================================
PROTO_INCLUDE := /home/usrname/software/protobuf/include
PROTO_LIB := /home/usrname/software/protobuf/lib
GLOG_INCLUDE := /home/usrname/software/glog/include
GLOG_LIB := /home/usrname/software/glog/lib
GFLAGS_INCLUDE := /home/usrname/software/gflags/include
GFLAGS_LIB := /home/usrname/software/gflags/lib
HDF5_INCLUDE := /home/usrname/software/hdf5/include
HDF5_LIB := /home/usrname/software/hdf5/lib
LEVELDB_INCLUDE := /home/usrname/software/leveldb-master/include
LEVELDB_LIB := /home/usrname/software/leveldb-master
LMDB_INCLUDE := /home/usrname/software/lmdb/include
LMDB_LIB := /home/usrname/software/lmdb/lib
OPENCV_INCLUDE := /home/usrname/software/opencv/include
OPENCV_LIB := /home/usrname/software/opencv/lib
SNAPPY_INCLUDE := /home/usrname/software/snappy/include
SNAPPY_LIB := /home/usrname/software/snappy/lib
BOOST_INCLUDE := /home/usrname/boost_1_57_0
BOOST_LIB := /home/usrname/boost_1_57_0/stage/lib
#======================================================================
# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) $(PROTO_INCLUDE) $(GLOG_INCLUDE) $(GFLAGS_INCLUDE) $(HDF5_INCLUDE) $(LEVELDB_INCLUDE) $(LMDB_INCLUDE) $(OPENCV_INCLUDE) $(SNAPPY_INCLUDE) $(BOOST_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) $(PROTO_LIB) $(GLOG_LIB) $(GFLAGS_LIB) $(HDF5_LIB) $(LEVELDB_LIB) $(LMDB_LIB) $(OPENCV_LIB) $(SNAPPY_LIB) $(BOOST_LIB) /usr/local/lib /usr/lib
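
An error such as "cannot find -lgflags" means the linker found no libgflags.so or libgflags.a in any of the directories it was given, so it is worth checking each *_LIB path before rebuilding. The function below is a small hypothetical helper for that check, not something Caffe ships.

```shell
# check_lib DIR NAME: succeed if DIR contains libNAME.so or libNAME.a.
check_lib() {
    dir="$1"
    name="$2"
    if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
        echo "ok:      -l$name found in $dir"
    else
        echo "missing: -l$name not found in $dir" >&2
        return 1
    fi
}
```

For example, check_lib /home/usrname/software/gflags/lib gflags should report ok once GFLAGS_LIB points at the right place.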

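(This made me aware of something I had never paid attention to: -L names the directories the linker searches for libraries (.so or .a files) at build time, when the executable is linked; LD_LIBRARY_PATH is an environment variable that lists where .so files are searched for when the program runs. See http://bbs.csdn.net/topics/330189724)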
At this point the build should go through without problems.

make all
# or make all -j12 to speed up the build; the number after -j is the number of parallel jobs, ideally the number of cores on the machine
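
The core count does not have to be typed by hand. A minimal sketch, assuming nproc (GNU coreutils) or getconf is available; the variable name JOBS is arbitrary:

```shell
# Number of online CPU cores; getconf is the fallback where nproc is missing.
JOBS="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
echo "would run: make all -j$JOBS"
# Uncomment inside the caffe directory to build and test:
# make all -j"$JOBS" && make test -j"$JOBS" && make runtest
```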

However, although make all and make test ran cleanly, make runtest reported failures:

[ PASSED ] 832 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[ FAILED ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU
[ FAILED ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU
[ FAILED ] PowerLayerTest/2.TestPowerGradientShiftZero, where TypeParam = caffe::FloatGPU
[ FAILED ] PowerLayerTest/3.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleGPU
[ FAILED ] PowerLayerTest/3.TestPowerGradient, where TypeParam = caffe::DoubleGPU

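The fix is to replace Boost 1.57 with Boost 1.55 and rebuild (1.56 does not work either; see http://blog.csdn.net/danieljianfeng/article/details/42836167). BOOST_INCLUDE and BOOST_LIB in Makefile.config must then point at the 1.55 tree.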
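
With several Boost trees on disk it is easy to keep compiling against the wrong one. Boost records its version in boost/version.hpp as the macro BOOST_LIB_VERSION, so a quick check is possible before rebuilding. The function name below is hypothetical, and the path in the usage note is only an example.

```shell
# boost_lib_version BOOST_ROOT: print the version string from boost/version.hpp,
# for example 1_55 for Boost 1.55.
boost_lib_version() {
    sed -n 's/^#define BOOST_LIB_VERSION "\([^"]*\)".*/\1/p' "$1/boost/version.hpp"
}
```

For example, boost_lib_version /home/usrname/boost_1_55_0 should print 1_55 if BOOST_INCLUDE points at the intended tree.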
