nvcc fatal : Unsupported gpu architecture 'compute_20' while compiling matlab - linux

(CentOS Linux release 7.3; CUDA 9.1; GPU: Tesla P100-PCIE)
I've installed MATLAB R2018a on a server, but when I tried to run this:
vl_compilenn('enableGpu', true);
I encountered this:
vl_compilenn: CUDA: MEX config file:
'/data1/zhangdinghuai/gitrepo/explanatoryGraph/matconvnet-1.0-beta24/matlab/src/config/mex_CUDA_glnxa64.xml'
Building with 'nvcc'.
nvcc fatal : Unsupported gpu architecture 'compute_20'
and
Building with 'nvcc'.
Error using mex
nvcc fatal : Unsupported gpu architecture 'compute_20'
Error in vl_compilenn>mex_compile (line 529)
mex(mopts{:}) ;
Error in vl_compilenn (line 487)
mex_compile(opts, srcs{i}, objfile, flags.mexcu) ;
I have searched for similar questions, but none of the suggested fixes worked. Can anyone give me a hand?
PS: More information about the server is listed below:
[zhangdinghuai@gpu01 2018a]$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.3.1611 (Core)
Release: 7.3.1611
Codename: Core
[zhangdinghuai@gpu01 2018a]$ cat /etc/issue
\S
Kernel \r on an \m
[zhangdinghuai@gpu01 2018a]$ cat /proc/version
Linux version 3.10.0-514.26.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Thu Jun 29 16:05:25 UTC 2017

In a similar thread, "nvcc fatal : Unsupported gpu architecture 'compute_20' while cuda 9.1+caffe+openCV 3.4.0 is installed", and on Ask Ubuntu, the recommendation was to edit Makefile.config and comment out the -gencode arch=compute_20 entries.
Can you also share the exact kernel version you are using, the exact PCI device with its PCI ID, and the driver versions, if any? This would give better insight into your environment and could help answer further questions.

My solution was to modify the file matconvnet/matlab/src/config/mex_CUDA_glnxa64.xml and remove the compute_20 target, which CUDA 9 no longer supports.
Change the line
`NVCCFLAGS="-D_FORCE_INLINES -gencode=arch=compute_20,code=sm_20 -gencode=arch=compute_30,code=\"sm_30,compute_30\" $NVCC_FLAGS"`
into
`NVCCFLAGS="-D_FORCE_INLINES -gencode=arch=compute_30,code=\"sm_30,compute_30\" $NVCC_FLAGS"`
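If the same edit has to be repeated across several config files, the removal can be scripted. The following is only a sketch: `drop_gencode` is a hypothetical helper, not part of MatConvNet or the CUDA toolkit, and it assumes each `-gencode` option is a single whitespace-free token as in the line above.

```python
import re


def drop_gencode(nvcc_flags: str, arch: str) -> str:
    """Return nvcc_flags without any -gencode option that targets `arch`.

    Assumes every -gencode option is one whitespace-free token, e.g.
    -gencode=arch=compute_20,code=sm_20 (hypothetical helper for illustration).
    """
    # Match "-gencode=arch=<arch>,code=<anything without spaces>" plus
    # the whitespace that follows it, so no double space is left behind.
    pattern = re.compile(r"-gencode[= ]arch=" + re.escape(arch) + r",code=\S+\s*")
    return pattern.sub("", nvcc_flags).strip()


if __name__ == "__main__":
    original = (
        r'-D_FORCE_INLINES -gencode=arch=compute_20,code=sm_20 '
        r'-gencode=arch=compute_30,code=\"sm_30,compute_30\" $NVCC_FLAGS'
    )
    print(drop_gencode(original, "compute_20"))
```

Only the flag string is rewritten here; reading and writing the XML file is left out on purpose, so make a backup before applying anything like this to the real config.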

Related

Issues trying to debug a kernel vmcore

One of our clients called us saying they had had a kernel crash and asked us to investigate. They are running SLES12 SP2.
I copied the vmcore file under /var/crash (11 Mb) out of production, onto another machine also running SLES12 SP2. I also copied the kernel image, /boot/vmlinux-4.4.120-92.70-default.gz, and installed the kernel debuginfo package on this machine. However, I'm unable to run the crash utility on it:
$ strings vmcore |grep "4\.4\."
4.4.120-92.70-default
OSRELEASE=4.4.120-92.70-default
BOOT_IMAGE=/boot/vmlinuz-4.4.120-92.70-default root=[…]
$ strings ~/vmlinux-4.4.120-92.70-default |grep "4\.4\."
Linux version 4.4.120-92.70-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Wed Mar 14 15:59:43 UTC 2018 (52a83de)
$ crash /usr/lib/debug/boot/vmlinux-4.4.120-92.70-default.debug ~/vmlinux-4.4.120-92.70-default vmcore
crash 7.1.5
[…]
GNU gdb (GDB) 7.6
[…]
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: could not find MAGIC_START!
WARNING: cannot read linux_banner string
crash: /usr/lib/debug/boot/vmlinux-4.4.120-92.70-default.debug and vmcore do not match!
Usage: […]
I think the strings invocations above prove that the kernel and the core do match; however, I'm still getting that error. What can I do next?

Running 32-bit ARM binary on aarch64 not working despite CONFIG_COMPAT

I've got a 64-bit ARM machine that I'd like to run 32-bit ARM binaries on. As a test case I've built a small hello world for 32-bit ARM using the arm-linux-gnueabihf-gcc toolchain. file shows it as:
root@ubuntu:/home/ubuntu# file hello
hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=61ffe5e22117a6d4c2ae37a1f4c76617d3e5facc, not stripped
But trying to run it produces:
root@ubuntu:/home/ubuntu# ./hello
bash: ./hello: cannot execute binary file: Exec format error
Based on a prior question, I checked whether the kernel was built with the CONFIG_COMPAT option, and it was:
root@ubuntu:/home/ubuntu# grep CONFIG_COMPAT= /boot/config-$(uname -r)
CONFIG_COMPAT=y
I also verified that the armhf architecture has been added and that the armhf version of the loader is present. Note that the loader itself doesn't run either:
root@ubuntu:/home/ubuntu# dpkg --print-foreign-architectures
armhf
root@ubuntu:/home/ubuntu# dpkg -l libc6:armhf
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-=============-============-=================================
ii libc6:armhf 2.30-0ubuntu2 armhf GNU C Library: Shared libraries
root@ubuntu:/home/ubuntu# file /lib/arm-linux-gnueabihf/ld-2.30.so
/lib/arm-linux-gnueabihf/ld-2.30.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, BuildID[sha1]=dff2b536287d61ddca68f3e001e14b7c235bbf68, stripped
root@ubuntu:/home/ubuntu# /lib/arm-linux-gnueabihf/ld-2.30.so
bash: /lib/arm-linux-gnueabihf/ld-2.30.so: cannot execute binary file: Exec format error
Other relevant system info:
root@ubuntu:/home/ubuntu# cat /proc/cpuinfo
processor : 0
BogoMIPS : 400.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm
CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x0af
CPU revision : 2
root@ubuntu:/home/ubuntu# uname -a
Linux ubuntu 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:14:25 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
I'm running out of things to try at this point. Any idea how to get the kernel to recognize these binaries and run them?
Ok, it looks like the underlying problem here is not in software, unfortunately. The CPU we're using, the Cavium ThunderX2, is one of the few 64-bit ARM chips that does not have aarch32 support. Quoting from WikiChip:
Only the 64-bit AArch64 execution state is support. No 32-bit AArch32 support.
So, this explains why it's not able to run 32-bit ARM binaries. I'm still fairly sure that other 64-bit ARM chips, like the Cortex-A57, are able to do this.
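Note that the ELF header only says what a binary targets, not whether the CPU can execute it, which is why `file` looked fine while the kernel refused to run the program. As a rough illustration, the fields that `file` reports can be read directly. The function names here are made up for this sketch; the offsets and constants are the ones defined by the ELF specification.

```python
import struct

# e_ident[EI_CLASS] values and a few e_machine values from the ELF spec.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> str:
    """Summarise class and machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    # e_ident[EI_DATA]: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 (after 16 bytes of
    # e_ident and the 2-byte e_type).
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    machine_name = ELF_MACHINES.get(machine, "machine %d" % machine)
    return elf_class + " " + machine_name


def describe_elf_file(path: str) -> str:
    """Read just enough of `path` to describe it."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

For the `hello` binary above, `describe_elf_file("hello")` should report a 32-bit ARM target, which by itself says nothing about AArch32 support in the CPU.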
Update: older 32-bit ARM binaries do indeed work on aarch64 with a CPU that supports it, as shown below on an AWS ARM a1.metal instance:
ubuntu@ip-172-31-12-156:~$ cat /proc/cpuinfo | tail
processor : 15
BogoMIPS : 166.66
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
ubuntu@ip-172-31-12-156:~$ uname -a
Linux ip-172-31-12-156 4.15.0-1054-aws #56-Ubuntu SMP Thu Nov 7 16:18:50 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@ip-172-31-12-156:~$ dpkg --print-foreign-architectures
armhf
ubuntu@ip-172-31-12-156:~$ file hello_hf
hello_hf: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-, for GNU/Linux 3.2.0, BuildID[sha1]=c95f0c46dfab925db53506751d7677156e334e5c, not stripped
ubuntu@ip-172-31-12-156:~$ ./hello_hf
hello, world!
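The two /proc/cpuinfo dumps in this thread differ in `CPU implementer` and `CPU part`, which is enough to tell the machines apart. A minimal sketch follows. The lookup table is hypothetical and covers only these two CPUs, and naming 0x41/0xd08 as a Cortex-A72 is my own reading of the IDs rather than something stated above.

```python
# Hypothetical table: (implementer, part) -> (name, runs AArch32 user code).
# Only the two CPUs that appear in this thread are listed.
KNOWN_CPUS = {
    (0x43, 0x0AF): ("Cavium ThunderX2", False),
    (0x41, 0xD08): ("Arm Cortex-A72", True),
}


def parse_cpuinfo(text: str):
    """Return (implementer, part) for the first processor in cpuinfo text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # setdefault keeps the first occurrence, i.e. processor 0.
            fields.setdefault(key.strip(), value.strip())
    return int(fields["CPU implementer"], 16), int(fields["CPU part"], 16)


def lookup_cpu(text: str):
    """Return (name, aarch32) from KNOWN_CPUS, or None if the CPU is unlisted."""
    return KNOWN_CPUS.get(parse_cpuinfo(text))


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(lookup_cpu(f.read()))
```

A `None` result only means the CPU is missing from this small table; it does not say anything about AArch32 support.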
This question has become more relevant as 32-bit ARM chips are being phased out.
I like to see old 32-bit binaries supported on new 64-bit hardware. It's not essential, but it is convenient.
I took out my old Raspberry Pi Zero (32-bit) and Raspberry Pi 3B+ (64-bit), installed a 64-bit OS on the 3B+ for the first time (despite warnings of bugs and the like), and did some compiling.
No special make parameters, just plain ordinary 32-bit compiling.
The result runs fine on both 32-bit and 64-bit.
Controllers and VMs will be running 32-bit architectures for many more years.
The 4 GB memory limit is a pain, yes, but for small apps it's not a problem.

Fail to run tensorflow on GPU

I am unable to run the TF-CUDA tutorials_example_trainer as described in the installation guide (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#installing-from-sources).
I've had problems with the CUDA libs before, but that was with graphics-related demos.
All details are below.
Thank you in advance for any help.
Environment info
Operating System: Debian Stretch
Installed version of CUDA and cuDNN:
8.0, 5.0
If installed from source, provide
554ddd9ad2d4abad5a9a31f2d245f0b1012f0d10
Build label: 0.3.0
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jun 10 11:38:23 2016 (1465558703)
Steps to reproduce
Build from source with 367.35 driver
Run bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
Logs or other output that would be helpful
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
modprobe: ERROR: ../libkmod/libkmod-module.c:832 kmod_module_insert_module() could not find module by name='nvidia_367_uvm'
modprobe: ERROR: could not insert 'nvidia_367_uvm': Unknown symbol in module, or unknown parameter (see dmesg)
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: debian
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: debian
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.35.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.35 Mon Jul 11 23:14:21 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Debian 5.4.0-6)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.35.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 367.35.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
F tensorflow/cc/tutorials/example_trainer.cc:125] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'y': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
[[Node: y = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/gpu:0"](Const, x)]])
The error message indicates that your GPU driver is not set up correctly. You can run the following command to check whether the driver is installed correctly:
$ nvidia-smi
If it is not, follow the instructions on the official CUDA site and reinstall CUDA. As your OS is not officially supported, you may want to switch to a supported one.
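The same check can be scripted before launching a GPU job. This is only a sketch: `driver_responds` is a hypothetical helper, and it assumes that a working driver makes `nvidia-smi` exit with status 0.

```python
import shutil
import subprocess


def driver_responds(command: str = "nvidia-smi") -> bool:
    """Return True if `command` is on PATH and exits with status 0."""
    if shutil.which(command) is None:
        return False  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            [command],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if driver_responds():
        print("nvidia-smi ran successfully")
    else:
        print("nvidia-smi is missing or failed; check the driver installation")
```

A failure here points at the driver or kernel module rather than at TensorFlow, which matches the modprobe errors in the log above.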

Node: building from source vs binary distribution

I have downloaded the Node 0.10.31 source and built it on my Linux machine using the steps given in the wiki. The source is unmodified. The build succeeds, but when I compare the size of bin/node with the one from the downloaded binary distribution, there is a difference of around 900 KB (the one built from source is bigger).
What is the reason?
Did I miss an optimizer flag or some special config? I want to use a locally built node (after some changes) in production, and I don't want to miss any settings.
My environment:
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Python 2.6.6 (r266:84292, Nov 21 2013, 10:50:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Release: 6.5
Codename: Santiago
Note: Already posted in node.js groups, sorry for the cross post.
Thanks
You can try stripping out the debugging symbols, e.g.:
$ strip node
Debugging symbols are pieces of information embedded in an object file that are useful for debugging. They take up space in the file, so if you do not plan to debug the node interpreter itself, you can get rid of them.
See strip's manual page for the full list of options for discarding symbols from object files.

libstdc++.so.6: cannot handle TLS data

I have an application compiled on:
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
Linux debian 2.6.18-5-686 #1 SMP Fri Jun 1 00:47:00 UTC 2007 i686 GNU/Linux
and it runs well.
Now I want to run it on:
Linux 2.4.20_mvlcge31-tomas #7 Thu May 7 11:33:21 CEST 2009 i686 unknown
I got the following error:
libstdc++.so.6: cannot handle TLS data
On the web I saw a suggestion to do this: export LD_ASSUME_KERNEL=2.2.5
I tried it but got even more errors:
ls: error while loading shared libraries: librt.so.1: cannot open shared object file: No such file or directory
Can anyone help me with this? Thanks.
You compiled the application against a much newer libc and kernel version. You can't compile a program on 2.6 with a recent libc and expect it to run on an old kernel.
Also, where do you actually still use Linux 2.4?
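To make the version gap concrete, the two release strings from the question can be compared numerically. This is a small sketch with a hypothetical helper name; it only shows that the target kernel is older than the build kernel and does not detect TLS support.

```python
import re


def kernel_version(release: str):
    """Extract the leading x.y.z of a kernel release string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    build = kernel_version("2.6.18-5-686")            # build machine
    target = kernel_version("2.4.20_mvlcge31-tomas")  # target machine
    if target < build:
        print("target kernel %s is older than build kernel %s" % (target, build))
```

Tuples compare element by element, so the check remains correct for releases such as 2.10.x, where a plain string comparison would go wrong.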
