Cannot dlopen some GPU libraries. Skipping registering GPU devices - python-3.x

TensorFlow is only using the CPU and won't use the GPU. I assume it's because it expects CUDA 10.0 and finds 10.2.
I had installed 10.2, but I have purged it and installed 10.0.
I'm running Ubuntu 19.10 with an AMD Ryzen 2700 CPU and an RTX 2080 Super.
I have installed the 440 NVIDIA driver. Both nvidia-smi and nvcc --version report CUDA version 10.2.
From pip3:
tensorflow-gpu 1.14.0
tensorflow-datasets 2.0.0
tensorflow-estimator 1.14.0
tensorflow-metadata 0.21.1
From nvidia-smi:
| NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 GeForce RTX 208... Off | 00000000:08:00.0 On | N/A |
| 0% 48C P8 13W / 250W | 369MiB / 7979MiB | 3% Default |
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 1110 G /usr/lib/xorg/Xorg 18MiB |
| 0 1611 G /usr/lib/xorg/Xorg 73MiB |
| 0 1816 G /usr/bin/gnome-shell 108MiB |
| 0 2655 C python3 115MiB |
From nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
But when I check version.txt I get 10.0.130:
cat /usr/local/cuda/version.txt
CUDA Version 10.0.130
I check the devices with:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
incarnation: 4810338588393992961
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
incarnation: 7271419476897292826
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
incarnation: 4332706623198547606
physical_device_desc: "device: XLA_GPU device"
How do I register 10.0.130?
Is that the reason why it won't run on the GPU? It's super slow on the 8-core CPU. Any advice?
Here is the log:
2020-02-13 14:11:31.411277: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-13 14:11:31.440150: I tensorflow/core/platform/profile_utils/] CPU Frequency: 3193485000 Hz
2020-02-13 14:11:31.441076: I tensorflow/compiler/xla/service/] XLA service 0x5625b689c790 executing computations on platform Host. Devices:
2020-02-13 14:11:31.441123: I tensorflow/compiler/xla/service/] StreamExecutor device (0): <undefined>, <undefined>
2020-02-13 14:11:31.443001: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.472935: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:31.473407: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:08:00.0
2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.487124: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.496148: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.498873: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.514842: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.525992: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/] Could not dlopen library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
2020-02-13 14:11:31.526183: W tensorflow/core/common_runtime/gpu/] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 14:11:31.618627: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 14:11:31.618655: I tensorflow/core/common_runtime/gpu/] 0
2020-02-13 14:11:31.618662: I tensorflow/core/common_runtime/gpu/] 0: N
2020-02-13 14:11:31.620367: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:31.621395: I tensorflow/compiler/xla/service/] XLA service 0x5625b732d5f0 executing computations on platform CUDA. Devices:
2020-02-13 14:11:31.621407: I tensorflow/compiler/xla/service/] StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
incarnation: 13330791690361361129
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
incarnation: 11872341970779952422
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
incarnation: 15007819717683015571
physical_device_desc: "device: XLA_GPU device"
WARNING:tensorflow:From The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
2020-02-13 14:11:33.799163: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:33.799597: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:08:00.0
2020-02-13 14:11:33.799646: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:33.799658: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:33.799669: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:33.799684: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:33.799695: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:33.799706: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:33.799777: I tensorflow/stream_executor/platform/default/] Could not dlopen library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
2020-02-13 14:11:33.799786: W tensorflow/core/common_runtime/gpu/] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 14:11:33.800016: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 14:11:33.800028: I tensorflow/core/common_runtime/gpu/]
WARNING:tensorflow:From The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
2020-02-13 14:11:34.197990: W tensorflow/compiler/jit/] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/node/.local/lib/python3.7/site-packages/tensorflow/python/training/ checkpoint_exists (from is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From start_queue_runners (from is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `` module.
total training sample num:91
batch size: 64, batch num per epoch: 1, epoch num: 5000
start training...

Judging from your logs, it looks like TensorFlow finds the correct CUDA version, but the cuDNN library is missing:
2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/] Could not dlopen library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
Have you installed the correct version of cuDNN? TensorFlow 1.14 also requires cuDNN 7.4.
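One way to check which libraries the dynamic loader can actually find is to try opening them from Python with ctypes. This is only a minimal sketch, not something TensorFlow provides: the helper names can_dlopen and report are made up for illustration, and the sonames in CANDIDATES are assumptions for a CUDA 10.0 / cuDNN 7 install, so adjust them to whatever your own log names.

```python
import ctypes


def can_dlopen(library_name):
    """Return True if the dynamic loader can open library_name."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the shared object cannot be found or loaded.
        return False


# Assumed sonames for CUDA 10.0 and cuDNN 7; replace with the names from your log.
CANDIDATES = ["libcudart.so.10.0", "libcudnn.so.7"]


def report(names):
    """Map each library name to whether it could be opened."""
    return {name: can_dlopen(name) for name in names}


if __name__ == "__main__":
    for name, ok in report(CANDIDATES).items():
        print(name, "OK" if ok else "NOT FOUND")
```

If a library shows up as NOT FOUND, check that the directory containing it is on LD_LIBRARY_PATH. In the log above the path points at /usr/local/cuda-10.2/lib64, while version.txt reports 10.0.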

The only thing that worked for me to solve this issue was to completely remove CUDA and reinstall it again.


While installing tensorflow in ubuntu for keras backend should I install it in virtual environment or main python files

siddharth@siddharth-HP-EliteBook-8460p:~$ source ./venv/bin/activate
(venv) siddharth@siddharth-HP-EliteBook-8460p:~$ python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2020-06-07 13:40:24.858083: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2020-06-07 13:40:24.858216: E tensorflow/stream_executor/cuda/] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-07 13:40:24.858349: I tensorflow/stream_executor/cuda/] kernel driver does not appear to be running on this host (siddharth-HP-EliteBook-8460p): /proc/driver/nvidia/version does not exist
2020-06-07 13:40:25.024713: I tensorflow/core/platform/profile_utils/] CPU Frequency: 2593965000 Hz
2020-06-07 13:40:25.025838: I tensorflow/compiler/xla/service/] XLA service 0x7f0574000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-07 13:40:25.025876: I tensorflow/compiler/xla/service/] StreamExecutor device (0): Host, Default Version
tf.Tensor(-1066.3622, shape=(),

SLURM, using srun to print outputs

I am using srun to run my program, but it does not print the output.
me@home:~$ srun -p K80q --gres=gpu:1 -N 1 python3
2019-05-15 19:56:43.305156: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-15 19:56:43.543516: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-05-15 19:56:43.543567: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2019-05-15 19:56:43.900189: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-15 19:56:43.900248: I tensorflow/core/common_runtime/gpu/] 0
2019-05-15 19:56:43.900257: I tensorflow/core/common_runtime/gpu/] 0: N
2019-05-15 19:56:43.900619: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10761 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
I only got the above output; it does not print the information I expected. How can I fix it?
By the way, if I define a simple test script:
import tensorflow
if __name__ == '__main__':
    for i in range(10):
        print("Hello")
it prints Hello 10 times.
After 20 minutes, it outputs the information I expected. How can I make it output immediately?
Try the -u option of srun:
-u, --unbuffered
By default the connection between slurmstepd and the user launched application is over a pipe. The stdio output written by the application is buffered by the glibc until it is flushed or the output is set as unbuffered. See setbuf(3). If this option is specified the tasks are executed with a pseudo terminal so that the application output is unbuffered.
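If the program is a Python script, the buffering can also be avoided from inside the script. A minimal sketch, assuming Python 3: print accepts flush=True, and starting the interpreter as python3 -u makes stdout unbuffered for the whole process. The helper name log is hypothetical.

```python
import sys


def log(message, stream=None):
    """Print a message and flush right away so it is not held in a buffer."""
    if stream is None:
        stream = sys.stdout
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    for i in range(10):
        log("Hello")
```

With this, each line should reach the srun output as soon as it is written, even when stdout is a pipe rather than a terminal.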

Intel SPDK ioat example fails to run

I am new to Intel SPDK and ran into a problem when running the example code.
I set up the BIOS as this page says:
Intel® Hyper-Threading Technology off
Intel SpeedStep® technology enabled
Intel® Turbo Boost Technology disabled
Then I cloned the repository from this page and ran all the commands. The test command ./test/unit/ returns "All unit tests passed."
But when I run the example examples/ioat/verify/verify, it returns:
EAL: 24 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
Starting SPDK v18.10-pre / DPDK 18.05.0 initialization...
[ DPDK EAL parameters: verify --no-shconf -c 0x1 --legacy-mem --file-prefix=spdk_pid3170 ]
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/spdk_pid3170/mp_socket
EAL: 24 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
User configuration:
Run time: 10 seconds
Core mask: 0x1
Queue depth: 32
Not enough ioat channels found. Check that ioat channels are bound to uio_pci_generic or vfio-pci. scripts/ can help with this.
and scripts/ status shows:
node hugesize free / total
node0 1048576kB 24 / 24
node0 2048kB 0 / 800
node1 1048576kB 0 / 0
node1 2048kB 0 / 224
NVMe devices
BDF Numa Node Driver name Device name
BDF Numa Node Driver Name
BDF Numa Node Driver Name Device Name
My hardware is:
linux kernel version 4.15.7
with ioatdma compile as module
CPU intel Xeon E5-2695
chipset C612
It would be a great help if somebody could give me some advice or point me to some websites about SPDK!
Thank you!
Run ./scripts/ (with no parameters). If no ioat devices are listed under the I/OAT DMA section, you can't run this app. Also, there are no hugetlbfs mount points.
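To confirm the hugetlbfs point, you can look for hugetlbfs entries in /proc/mounts. A rough sketch in Python, not part of SPDK: the function name hugetlbfs_mounts is made up for illustration, and it only reports what is mounted, it does not mount anything.

```python
def hugetlbfs_mounts(mounts_text):
    """Return (mount_point, options) for every hugetlbfs entry in /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "hugetlbfs":
            found.append((fields[1], fields[3]))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        # /proc/mounts only exists on Linux.
        text = ""
    mounts = hugetlbfs_mounts(text)
    if not mounts:
        print("no hugetlbfs mount points found")
    for mount_point, options in mounts:
        print(mount_point, options)
```

An empty result would match the "no mounted hugetlbfs found for that size" message in the EAL output.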

CNTK on Azure Data Science VM

I have an N-Series Azure VM (the Data Science VM) with Tesla K80 GPU. According to the NVIDIA scanner my GPU driver is up to date.
When I run my CNTK Brainscript it says "No GPUs Found" and runs in CPU mode. What can I do to troubleshoot?
requestnodes [MPIWrapper]: using 1 out of 1 MPI nodes on a single host (1 requested); we (0) are in (participating)
Build info:
Built time: Dec 22 2016 01:43:24
Last modified date: Thu Dec 22 01:35:04 2016
Build type: Release
Build target: GPU
With 1bit-SGD: yes
With ASGD: yes
Math lib: mkl
CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8
CUB_PATH: c:\src\cub-1.4.1
CUDNN_PATH: C:\local\cudnn-8.0-windows10-x64-v5.1
Build Branch: HEAD
Build SHA1: 8e8b5ff92eff4647be5d41a5a515956907567126
Built by svcphil on DPHAIM-24
Build Path: C:\jenkins\workspace\CNTK-Build-Windows\Source\CNTK\
No GPUs found
Edit: here is the output from NVidia_smi.exe:
C:\Program Files\NVIDIA Corporation\NVSMI>.\nvidia-smi.exe
Fri Jan 13 19:00:43 2017
| NVIDIA-SMI 369.30 Driver Version: 369.30 |
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 Tesla K80 TCC | 0BD1:00:00.0 Off | Off |
| N/A 43C P8 27W / 149W | 0MiB / 12189MiB | 0% Default |
| 1 Tesla K80 TCC | 5871:00:00.0 Off | Off |
| N/A 35C P8 34W / 149W | 0MiB / 12189MiB | 0% Default |
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| No running processes found |
The Windows Data Science VM does not come with the GPU drivers, CUDA, etc. by default. We do have an extension called "Deep Learning toolkit for DSVM" that adds the drivers, CUDA, and the GPU editions of deep learning software such as CNTK, TensorFlow, and MXNet.
We also recently released an Ubuntu version of the DSVM with built-in CUDA, GPU drivers, and several more deep learning tools. It can be deployed on either GPU VMs or CPU-only VMs on Azure.
Would it be possible for you to run the Python notebooks and see whether they run with the device set to gpu(id)? Alternatively, from an activated CNTK Python environment, you could try setting a device:
import cntk as C
from cntk.device import set_default_device, gpu
set_default_device(gpu(0))  # 0 is an example device id
This might give you some clues as to whether it is a BrainScript-specific issue.
Well the python script and Brainscript work now, after installing CUDA (I installed it to run NVIDIA_SMI). I should not have assumed that the Azure Data Science image (that only works with an N Series VM) has the necessary NVIDIA libraries pre-installed. :-)

Goal is to pull the video driver version from the display information, then compare it to a list of versions supported

The text file has following information:
System Information
Time of this report: 5/22/2014, 14:20:52
Machine name: CONFERENCE13
Operating System: Windows 7 Professional 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.140303-2144)
Language: English (Regional Setting: English)
System Manufacturer: Mario, Inc.
System Model: Mario Virtual Platform
BIOS: PhoenixBIOS 4.0 Release 6.0
Processor: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (4 CPUs), ~2.7GHz
Memory: 2048MB RAM
Available OS Memory: 2048MB RAM
Page File: 1302MB used, 2792MB available
Windows Dir: C:\Windows
DirectX Version: DirectX 11
DX Setup Parameters: Not found
User DPI Setting: Using System DPI
System DPI Setting: 96 DPI (100 percent)
DWM DPI Scaling: Disabled
DxDiag Version: 6.01.7601.17514 32bit Unicode
DxDiag Notes
Display Tab 1: No problems found.
Sound Tab 1: No problems found.
Input Tab: No problems found.
DirectX Debug Levels
Direct3D: 0/4 (retail)
DirectDraw: 0/4 (retail)
DirectInput: 0/5 (retail)
DirectMusic: 0/5 (retail)
DirectPlay: 0/9 (retail)
DirectSound: 0/5 (retail)
DirectShow: 0/6 (retail)
Display Devices
Card name: Mario SVGA 3D
Manufacturer: Mario, Inc.
Chip type: Mario Virtual SVGA 3D Graphics Adapter
DAC type: n/a
Device Key: Enum\PCI\VEN_15AD&DEV_0405&SUBSYS_040515AD&REV_00
Display Memory: 223 MB
Dedicated Memory: 35 MB
Shared Memory: 188 MB
Current Mode: 1555 x 794 (32 bit) (60Hz)
Monitor Name: Generic Non-PnP Monitor
Monitor Model: unknown
Monitor Id:
Native Mode: unknown
Output Type: HD15
Driver Name: vm3dum64.dll,vm3dum,vm3dgl64.dll,vm3dgl
Driver File Version: 7.14.0001.2032 (English)
Driver Version:
DDI Version: unknown
Driver Model: WDDM 1.0
Driver Attributes: Final Retail
Driver Date/Size: 2/11/2014 03:15:04, 258264 bytes
WHQL Logo'd: n/a
WHQL Date Stamp: n/a
Device Identifier: {D7B71B4D-4745-11CF-ED71-0424A1C2CA35}
Vendor ID: 0x15AD
Device ID: 0x0405
SubSys ID: 0x040515AD
Revision ID: 0x0000
Driver Strong Name: oem13.inf:VMware.NTamd64.6.0:VM3D_AMD64:\ven_15ad&dev_0405&subsys_040515ad&rev_00
Rank Of Driver: 00F60000
Video Accel:
Deinterlace Caps: n/a
D3D9 Overlay: n/a
DXVA-HD: n/a
DDraw Status: Not Available
D3D Status: Not Available
AGP Status: Not Available
Sound Devices
Description: Speakers (Mario Virtual Audio (DevTap))
Default Sound Playback: Yes
Default Voice Playback: Yes
Hardware ID: PNPB009
Manufacturer ID: 1
Product ID: 100
Type: WDM
Driver Name: vmwvaudio.sys
Driver Version: 6.00.0000.3800 (English)
Driver Attributes: Final Retail
WHQL Logo'd: n/a
Date and Size: 11/13/2013 21:22:16, 46672 bytes
Other Files:
Driver Provider: VMware
HW Accel Level: Basic
Cap Flags: 0x0
Min/Max Sample Rate: 0, 0
Static/Strm HW Mix Bufs: 0, 0
Static/Strm HW 3D Bufs: 0, 0
HW Memory: 0
Voice Management: No
EAX(tm) 2.0 Listen/Src: No, No
I3DL2(tm) Listen/Src: No, No
Sensaura(tm) ZoomFX(tm): No
I am trying to pull the card name and driver file version from the Display Devices section and then compare them with a list like the following:
Windows XP Windows Vista Windows 7 Windows 8 Windows 8.1 Windows Server 2008 R2
View 3.1.3 build 252693
Dated: 4/21/2010
VMware SVGA 3D
Dated: 4/21/2010
Not Supported Not Supported Not Supported Not Supported
View 4.0.2 build 294291
Dated: 4/21/2010
I tried awk, but it gives me an error. I am new to awk and bash and need some help. Thank you.
awk 'BEGIN{
FS="="; OFS=" - "; DispalyDevices=""
}
function display(){
print displaydevices,cardname,driverfileversion
}
/DisplayDevices/{
if(cardname!="") display();
cardname=""; driverfileversion=""; display=$0;
gsub("Display.*PLAY"; "Display",display)
}
END{display}' dx_diag.txt | cat > dx_outputfile.txt
The error is:
awk: syntax error at source line 1
Within this context:
BEGIN{FS="="; OFS=" - "; DispalyDevices=""}function display(){print displaydevices,cardname,driverfileversion}/DisplayDevices/{if(cardname!="") display(); cardname=""; driverfileversion=""; display=$0; >>> gsub("Display.*PLAY"; <<<
awk: illegal statement at source line 1
awk: illegal statement at source line 1
The arguments to gsub must be separated by commas. Change
gsub("Display.*PLAY"; "Display",display)
to
gsub("Display.*PLAY", "Display",display)
#-------------------^--- comma, not semi-colon
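If awk is not a hard requirement, the same extraction can be sketched in Python. This assumes the report keeps the "Card name:" and "Driver File Version:" labels shown in the question and that each display device starts with a "Card name:" line. The function name display_devices is made up for illustration, and comparing against the supported-versions list is left out.

```python
import sys


def display_devices(dxdiag_text):
    """Collect card name and driver file version for each display device."""
    devices = []
    for line in dxdiag_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "label: value" line
        key, value = key.strip(), value.strip()
        if key == "Card name":
            # A "Card name" line starts a new display device entry.
            devices.append({"card_name": value, "driver_file_version": ""})
        elif key == "Driver File Version" and devices:
            devices[-1]["driver_file_version"] = value
    return devices


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of the DxDiag text file as the first argument.
    with open(sys.argv[1]) as f:
        for device in display_devices(f.read()):
            print(device["card_name"], "-", device["driver_file_version"])
```

Sound devices are skipped because their entries use "Driver Version:" rather than "Driver File Version:" and have no "Card name:" line.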
