Cuda version issue while using Detectron2 in Google Colab - pytorch

I am trying to run the Detectron2 module on Colab with CUDA 10.0, but as of today I am running into issues with the version of the CUDA compiler.
The output I get after running !nvidia-smi is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P0 26W / 250W | 0MiB / 16280MiB | 0% Default |
| | | ERR! |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
And this is what I get after running !nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
I do not understand the reason for the mismatch. Also, the output from Detectron2 after running !python -m detectron2.utils.collect_env is:
---------------------- ----------------------------------------------------------------------------
sys.platform linux
Python 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
numpy 1.18.5
detectron2 0.1.3 #/content/gdrive/My Drive/Data/Table_Struct/detectron2_repo/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 10.1
detectron2 arch flags sm_60
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.4.0+cu100 #/usr/local/lib/python3.6/dist-packages/torch
PyTorch debug build False
GPU available True
GPU 0 Tesla K80
CUDA_HOME /usr/local/cuda
Pillow 7.0.0
torchvision 0.5.0+cu100 #/usr/local/lib/python3.6/dist-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
fvcore 0.1.1
cv2 4.1.2
---------------------- ----------------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
My guess is that the version of CUDA on Colab doesn't match the one the Detectron2 build I am using expects. If so, what can I change to make this work on Google Colab?

The problem was the CUDA version Detectron2 had been compiled with: it was built with CUDA 10.1, while PyTorch was built with CUDA 10.0. Once I recompiled Detectron2, the error went away.
Here is the output of the !python -m detectron2.utils.collect_env command after recompiling:
---------------------- ----------------------------------------------------------------------------
sys.platform linux
Python 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
numpy 1.18.5
detectron2 0.1.3 #/content/gdrive/My Drive/Data/Table_Struct/detectron2_repo/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 10.0
detectron2 arch flags sm_75
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.4.0+cu100 #/usr/local/lib/python3.6/dist-packages/torch
PyTorch debug build False
GPU available True
GPU 0 Tesla T4
CUDA_HOME /usr/local/cuda
Pillow 7.0.0
torchvision 0.5.0+cu100 #/usr/local/lib/python3.6/dist-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
fvcore 0.1.1
cv2 4.1.2
---------------------- ----------------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
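As a side note on the mismatch in the question: the "CUDA Version" that nvidia-smi prints is the highest CUDA version the installed driver supports, while nvcc --version reports the toolkit actually installed, so those two numbers can legitimately differ. What has to agree is the toolkit used to compile Detectron2 and the CUDA version PyTorch was built with. A minimal sketch of that comparison, assuming the usual "release X.Y" line in the nvcc output and a "+cuNNN" suffix in the PyTorch version string (the function names are mine, not part of Detectron2 or PyTorch):

```python
import re


def toolkit_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' part of `nvcc --version`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def torch_build_cuda(torch_version):
    """Return (major, minor) from a version tag such as '1.4.0+cu100', or None."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if not match:
        return None
    digits = match.group(1)
    # The tag drops the dot: 'cu100' -> 10.0, 'cu92' -> 9.2, 'cu117' -> 11.7.
    return (int(digits[:-1]), int(digits[-1]))


def versions_match(nvcc_output, torch_version):
    """True only if both versions could be parsed and they are equal."""
    toolkit = toolkit_release(nvcc_output)
    built_with = torch_build_cuda(torch_version)
    return toolkit is not None and toolkit == built_with


if __name__ == "__main__":
    nvcc_line = "Cuda compilation tools, release 10.0, V10.0.130"
    print(versions_match(nvcc_line, "1.4.0+cu100"))
```

In a notebook, the two strings would come from the output of !nvcc --version and from torch.__version__. If they disagree, rebuilding Detectron2 against the matching toolkit is what fixed it here.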

Related

Installing Apex. fatal error: cuda_profiler_api.h: No such file or directory

I am trying to install apex by following these steps:
git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./
cd ..
When I start the installation, I get the following error:
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
from /3tb/share/anaconda3/envs/ak_env/bin
running install
/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
running build_ext
/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
building 'scaled_upper_triang_masked_softmax_cuda' extension
gcc -pthread -B /3tb/share/anaconda3/envs/ak_env/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -O2 -isystem /3tb/share/anaconda3/envs/ak_env/include -fPIC -O2 -isystem /3tb/share/anaconda3/envs/ak_env/include -fPIC -I ~/seq2seq/apex/csrc -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/TH -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/THC -I/3tb/share/anaconda3/envs/ak_env/include -I/3tb/share/anaconda3/envs/ak_env/include/python3.10 -c csrc/megatron/scaled_upper_triang_masked_softmax.cpp -o build/temp.linux-x86_64-cpython-310/csrc/megatron/scaled_upper_triang_masked_softmax.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
/3tb/share/anaconda3/envs/ak_env/bin/nvcc -I ~/seq2seq/apex/csrc -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/TH -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/THC -I/3tb/share/anaconda3/envs/ak_env/include -I/3tb/share/anaconda3/envs/ak_env/include/python3.10 -c csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu -o build/temp.linux-x86_64-cpython-310/csrc/megatron/scaled_upper_triang_masked_softmax_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 -std=c++14
csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu:21:10: fatal error: cuda_profiler_api.h: No such file or directory
21 | #include <cuda_profiler_api.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu:21:10: fatal error: cuda_profiler_api.h: No such file or directory
21 | #include <cuda_profiler_api.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/3tb/share/anaconda3/envs/ak_env/bin/nvcc' failed with exit code 255
error: subprocess-exited-with-error
× Running setup.py install for apex did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /3tb/share/anaconda3/envs/ak_env/bin/python -u -c '
Here is the output of nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
Some solutions I found suggest doing the following:
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"
However, /usr/local/cuda-11.7 does not exist on my system.
How can I solve this issue?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:03:00.0 Off | N/A |
| 0% 46C P0 37W / 180W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
I was able to solve this issue by manually installing the CUDA 11.7 toolkit, even though I already had CUDA 11.7 installed through conda:
https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Linux
After installing it, I followed the instructions printed by the installer:
Please make sure that
PATH includes /usr/local/cuda-11.7/bin
LD_LIBRARY_PATH includes /usr/local/cuda-11.7/lib64, or, add /usr/local/cuda-11.7/lib64 to /etc/ld.so.conf and run ldconfig as root
I did this by running the following commands before compiling apex:
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"
Note: I had to compile as the root user because of issues with installing the toolkit, which you may not need to do. Afterwards I changed the ownership back to the regular user. Using root is not recommended.
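Before reinstalling anything, it may help to confirm which CUDA installation actually ships the missing header. The sketch below looks for cuda_profiler_api.h under the include/ directory of each candidate root. The helper name and the list of candidate roots are my own assumptions; adjust them to your system.

```python
import os


def find_cuda_header(cuda_roots, header="cuda_profiler_api.h"):
    """Return the roots whose include/ directory contains `header`."""
    found = []
    for root in cuda_roots:
        # Skip empty entries, e.g. when CONDA_PREFIX is not set.
        if root and os.path.isfile(os.path.join(root, "include", header)):
            found.append(root)
    return found


if __name__ == "__main__":
    # Hypothetical candidates: the system-wide toolkit and the active conda env.
    candidates = ["/usr/local/cuda-11.7", os.environ.get("CONDA_PREFIX", "")]
    print(find_cuda_header(candidates))
```

If only the /usr/local root is listed, the conda environment does not provide the header, which would be consistent with the build only succeeding once the full toolkit was installed and put first on PATH.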

lld-link cannot find libcmt.lib when cross-compiling

I use the command
clang -v -fuse-ld=lld-link -target x86_64-pc-win32 out.0.ll -o out.exe
to turn LLVM IR code into an executable file.
The command works flawlessly on my Windows machine, but when I run it on Linux to cross-compile for Windows, it gives the following error message:
gbr22@instance-1:~/llvmtest$ clang -v -fuse-ld=lld-link -target x86_64-pc-win32 out.0.ll -o out.exe
clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: /usr/bin
"/usr/lib/llvm-7/bin/clang" -cc1 -triple x86_64-pc-windows-msvc19.11.0 -emit-obj -mrelax-all -mincremental-linker-compatible -disable-free -disable-llvm-verifier -discard-value-names -main-file-name out.0.ll -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -resource-dir /usr/lib/llvm-7/lib/clang/7.0.1 -fdebug-compilation-dir /home/gbr22/llvmtest -ferror-limit 19 -fmessage-length 120 -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.11 -fdelayed-template-parsing -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/out-db8d92.o -x ir out.0.ll
clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-pc-linux-gnu
warning: overriding the module target triple with x86_64-pc-windows-msvc19.11.0 [-Woverride-module]
1 warning generated.
"/usr/lib/llvm-7/bin/lld-link" -out:out.exe -defaultlib:libcmt -libpath:lib/amd64 -nologo /tmp/out-db8d92.o
lld-link: error: could not open libcmt.lib: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
How could I resolve this issue?
Here is a link to the file if that helps.

unable to build cuda 7.0 samples on linux with clang

I'm trying to build the CUDA 7.0 samples on Linux (Red Hat 7) using clang. The CUDA 5.5, 6.0, and 6.5 samples build successfully with clang, but when I try to build the 7.0 samples, the following error appears:
/usr/local/cuda-7.0/bin/nvcc -ccbin /usr/local/bin/clang++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -o simplePrintf.o -c simplePrintf.cu
nvcc fatal : Host compiler targets unsupported OS.
make: *** [simplePrintf.o] Error 1
My parameters from the Makefile:
HOST_COMPILER=/usr/local/bin/clang++
TARGET_OS=linux
TARGET_ARCH=x86_64
HOST_ARCH=x86_64
Any help would be appreciated.
Thanks.
clang isn't a supported host compiler on Linux for CUDA 7.0.
You can discover the supported configurations here
As pointed out, clang is not supported. In my case, changing HOST_COMPILER to /usr/bin/g++ did the trick.

Develop a Cuda DLL with Visual Studio 2010 running on any pc [duplicate]

When I go to /usr/local/cuda/samples/1_Utilities/deviceQuery and execute
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make clean
rm -f deviceQuery deviceQuery.o
rm -rf ../../bin/x86_64/linux/release/deviceQuery
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make
"/usr/local/cuda-7.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-7.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ ./deviceQuery
I keep getting
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version Result = FAIL
I have no idea how to fix it.
My System
moose@pc09 ~ $ cat /etc/issue
Linux Mint 17 Qiana \n \l
moose@pc09 ~ $ uname -a
Linux pc09 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
moose@pc09 ~ $ lspci -v | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GK110B [GeForce GTX Titan Black] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 1066
Kernel driver in use: nvidia
01:00.1 Audio device: NVIDIA Corporation GK110 HDMI Audio (rev a1)
Subsystem: NVIDIA Corporation Device 1066
moose@pc09 ~ $ sudo lshw -c video
*-display
description: VGA compatible controller
product: GK110B [GeForce GTX Titan Black]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:96 memory:fa000000-faffffff memory:d0000000-d7ffffff memory:d8000000-d9ffffff ioport:e000(size=128) memory:fb000000-fb07ffff
moose@pc09 ~ $ nvidia-settings -q NvidiaDriverVersion
Attribute 'NvidiaDriverVersion' (pc09:0.0): 331.79
moose@pc09 ~ $ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 331.79 Sun May 18 03:55:59 PDT 2014
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
moose@pc09 ~ $ lsmod | grep -i nvidia
nvidia_uvm 34855 0
nvidia 10703828 40 nvidia_uvm
drm 303102 5 ttm,drm_kms_helper,nvidia,nouveau
moose@pc09 ~ $ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
moose@pc09 ~ $ nvidia-smi
Thu Nov 12 11:23:24 2015
+------------------------------------------------------+
| NVIDIA-SMI 331.79 Driver Version: 331.79 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:01:00.0 N/A | N/A |
| 26% 35C N/A N/A / N/A | 132MiB / 6143MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Update your NVIDIA driver. At the moment you have the driver which only supports CUDA 6 or lower, and you are trying to use the CUDA 7.0 toolkit with it.
I ran into this exact same error message with toolkit 8.0 on Ubuntu 16.04. I tried reinstalling the toolkit, cuDNN, and so on, and it didn't help. The solution turned out to be very simple: update to the latest NVIDIA driver. I installed NVIDIA-Linux-x86_64-367.57.run and the error went away.
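To check this before rebuilding anything, you can compare the installed driver version against the minimum your toolkit needs. The sketch below parses the driver version out of the /proc/driver/nvidia/version text and compares it with a minimum you supply. The function names are mine, and the 346.46 minimum for CUDA 7.0 on Linux is an assumption from memory of NVIDIA's release notes, so please verify it there.

```python
import re


def parse_driver_version(proc_text):
    """Extract the driver version, e.g. (331, 79), from /proc/driver/nvidia/version text."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_is_sufficient(driver, minimum):
    """Compare version tuples element by element, e.g. (331, 79) against (346, 46)."""
    return driver is not None and driver >= minimum


if __name__ == "__main__":
    # Line copied from the question above.
    line = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module 331.79 "
            "Sun May 18 03:55:59 PDT 2014")
    # Assumed minimum driver for CUDA 7.0 on Linux; check NVIDIA's release notes.
    cuda_7_0_minimum = (346, 46)
    print(driver_is_sufficient(parse_driver_version(line), cuda_7_0_minimum))
```

With the 331.79 driver from the question, the check fails under that assumed minimum, which matches the "driver version is insufficient" message from deviceQuery.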
My two cents:
this error may also be related to the selected GPU mode (Performance or Power Saving). If you select the integrated Intel GPU with the nvidia-settings utility and then run deviceQuery, you get this error:
-> CUDA driver version is insufficient for CUDA runtime version
The message is misleading, though: switching back to the NVIDIA GPU (Performance mode) with nvidia-settings makes the problem disappear.
In my scenario it was not a version problem.

Multithreading with LAPACK 3.3 & above on MacOS 10.6 and 10.7

I am trying to build and run a multithreaded program using OpenMP on Mac OS X 10.6 and 10.7.
The program calls zgelss and zgemm from multiple threads.
I have compiled LAPACK 3.4 and the reference BLAS, and I compile my program with the following command:
g++-4.2 main.cpp -o testProduct -L/Users/LAPACK/lapack-3.4.0/ -llapack3.4 \
-lrefblas -L/Users/opt/gcc4.2/lib -lgfortran -fopenmp
The results of this program are not correct, whereas when I compile it against the libraries provided by Apple, it works fine on 10.7 but not on 10.6.
(My guess is that Mac OS X 10.7 ships a modified LAPACK 3.2.1 and 10.6 a modified LAPACK 3.1.1.) The command in that case is:
g++-4.2 main.cpp -o testProduct -framework accelerate -fopenmp
Can anyone explain, if LAPACK 3.4 and its corresponding BLAS are thread safe, what the problem could be?

Resources