How to use CUDA_FORCE_PTX_JIT? (Linux)

According to the NVIDIA Programming Guide:
Any PTX code loaded by an application at runtime is compiled further
to binary code by the device driver. This is called just-in-time
compilation. Just-in-time compilation increases application load time,
but allows applications to benefit from latest compiler improvements.
...
Setting CUDA_FORCE_PTX_JIT to 1 forces the device driver to ignore any
binary code embedded in an application (see Section 3.1.4) and to
just-in-time compile embedded PTX code instead; if a kernel does not
have embedded PTX code, it will fail to load.
I've compiled my simple vectorAdd using the following flags:
nvcc -o vectorAdd -gencode arch=compute_20,code=sm_20 vectorAdd.cu
When the CUDA_FORCE_PTX_JIT environment variable is unset, I get correct results. But when I set CUDA_FORCE_PTX_JIT to 1, I get the following error from cudaGetErrorString:
invalid device function
How can I fix this issue and get CUDA_FORCE_PTX_JIT working? Perhaps the way I compile does not embed any PTX code.
Thanks in advance.
Further information:
CUDA Driver Version: 295.41
CUDA Toolkit version: 4.0
OS: Ubuntu 10.04
Hardware: GTX 480, or Tesla C2050

I found a workaround for the issue. At compile time, do not specify the target GPU at all (remove the -arch and -gencode flags). The driver then generates the device binary from the embedded PTX at runtime.
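A related option (a sketch on my part, not the original workaround) is to keep -gencode but target a virtual architecture in code=, which makes nvcc embed PTX in the fat binary, and then check with cuobjdump that the PTX is really there:

nvcc -o vectorAdd -gencode arch=compute_20,code=compute_20 vectorAdd.cu
cuobjdump -ptx vectorAdd

With code=sm_20 alone, only the SASS binary for sm_20 is embedded and there is no PTX to JIT-compile, which matches the "invalid device function" error above. With code=compute_20 (or with two -gencode options, one for sm_20 and one for compute_20) the PTX stays in the executable, so CUDA_FORCE_PTX_JIT=1 has something to compile.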

Related

Runtime detection of whether ARMv7 ELF binaries can be loaded on an ARMv6 host

Consider a host application that has been compiled for armv6 (think: Raspbian) and can dlopen() extensions/plugins.
Some of these plugins might be compiled for armv7.
If the host application is running on recent hardware, there shouldn't be any problem loading and running these plugins.
OTOH, if the application is running on legacy, ARMv6-only hardware (e.g. an RPi1), the dlopen() will probably fail.
Now I would like to determine, whether the host application is capable of loading armv7 binaries, without actually trying to do so.
My use case is: writing a package manager that queries an online resource for available plugins; only compatible packages should be displayed: if the host is running on armv6, only armv6 plugins will be shown, but if it is running on armv7 both armv7 and armv6 binaries are presented.
The package architecture (as can be queried via the online resource) is detected via something like readelf -A <binary> | grep Tag_CPU_arch.
My original attempt used the __ARM_ARCH macro on the host, but that is obviously a compile-time check rather than a runtime check.
On x86, there's __builtin_cpu_supports() (for gcc and friends), but that is not available on ARM.
Parsing /proc/cpuinfo might be an option, but honestly I have no clue what to check for...
Ideas?
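Not from the thread, but one possible sketch: ask the kernel what machine it reports via uname(2). On ARM Linux the machine field is a string such as "armv6l" or "armv7l", so even a host binary built for armv6 can discover at runtime that it is executing on an armv7 (or later) CPU:

#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Heuristic: trust the "machine" string reported by the kernel
 * ("armv6l", "armv7l", "armv8l", ...).  Returns 1 if it looks like
 * ARMv7 or newer, 0 otherwise (including on error or non-ARM). */
static int cpu_is_at_least_armv7(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return 0;
    if (strncmp(u.machine, "armv", 4) != 0)
        return 0;
    return (u.machine[4] - '0') >= 7;
}

int main(void)
{
    printf("armv7 plugins loadable: %s\n",
           cpu_is_at_least_armv7() ? "yes" : "no");
    return 0;
}

This is only a hint: it does not prove that the dynamic loader and installed libraries can actually cope with an armv7 plugin (and an "aarch64" kernel would need separate handling), so falling back to armv6 packages whenever the check is inconclusive seems safer.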

Can gcc produce a binary for ARM without cross-compilation?

Can we configure gcc running on an Intel x64 machine to produce a binary for an ARM chip just by passing some flags to gcc, without using a cross-compiler?
Short: Nope
Compiler:
gcc is not a native cross-compiler: the target architecture has to be chosen at the time gcc itself is built. (Some exceptions apply; for example, x86 and x86_64 can be supported by the same build.)
clang, on the other hand, is a native cross-compiler, and you can generate code for ARM by passing --target=arm-linux-gnueabi, but you still can't produce binaries, as you also need a linker and a C library. That means you can run clang --target=arm-linux-gnueabi -c <your file> and compile C/C++ code (you will likely need to point it at the ARM C/C++ include paths), but you can't build executables.
Rest of the toolchain:
You need a fitting linker and C library too; both are specific to the architecture and OS you want to run on.
Possible solutions:
Get a fitting cross-toolchain, or build your own. For ARM Linux there are, for example, the Debian CrossToolchains; for bare-metal targets you can get a cross-compiler from CodeSourcery.
Since the question is very vague, it's not possible to give a more specific answer.
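As a concrete illustration (assuming a Debian-style host with clang plus the gcc-arm-linux-gnueabihf cross-toolchain package installed; names vary by distribution):

clang --target=arm-linux-gnueabihf -c hello.c -o hello.o
file hello.o
arm-linux-gnueabihf-gcc hello.o -o hello

The first two steps work with clang alone (possibly plus --sysroot to point at the ARM headers), and file should report a 32-bit ARM ELF relocatable; the final link is exactly the part that needs the matching cross linker and C library.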

CUDA 5.5 & Intel C/C++ Compiler on Linux

For my current project, I need to use CUDA and the Intel C/C++ compiler in the same build. (I rely on the SSYEV implementation in Intel's MKL, which takes roughly 10 times as long with GCC+MKL as with ICC+MKL: ~3 ms from GCC vs. ~300 µs from ICC.)
icc -v
icc version 12.1.5
NVIDIA states that Intel ICC 12.1 is supported (http://docs.nvidia.com/cuda/cuda-samples/index.html#linux-platforms-supported), but even after downgrading to Intel ICC 12.1.5 (installed as part of Intel Composer XE 2011 SP1 Update 3), I am still running into this issue:
nvcc -ccbin=icc src/test.cu -o test
/usr/local/cuda-5.5/bin//..//include/host_config.h(72): catastrophic error: #error directive: -- unsupported ICC configuration! Only ICC 12.1 on Linux x86_64 is supported!
#error -- unsupported ICC configuration! Only ICC 12.1 on Linux x86_64 is supported!
Unfortunately, it seems as if NVIDIA merely tolerates the use of ICC; I would hardly call it "support", given the lack of information NVIDIA provides on using ICC together with CUDA.
I am running Ubuntu 12.10 x86_64 and CUDA 5.5. Telling icc to mimic the behavior of the stock GCC 4.7.2 using the -Xcompiler -gcc-version=470 option did not help either. Searching online, I was only able to find threads from the NVIDIA forums dealing with CUDA 3.x and Intel ICC 11.1, and I was unable to transfer that information to current CUDA releases.
I would be very grateful for any suggestion on how to solve this issue :-)
Looking at the file referenced in the error you received, it is specifically checking for an ICC compiler with a particular build date:
#if defined(__ICC)
#if !(__INTEL_COMPILER == 9999 && __INTEL_COMPILER_BUILD_DATE == 20110811) || !defined(__GNUC__) || !defined(__LP64__)
#error -- unsupported ICC configuration! Only ICC 12.1 on Linux x86_64 is supported!
#endif
The solution would be to use the Intel compiler that actually matches that build date. As indicated, that is ICC 12.1 proper, i.e. version 12.1.0.233, rather than ICC 12.1.5.
The narrow focus is at least partly a testing limitation: that particular ICC variant was tested with the CUDA toolkit before it was released, and so the host_config.h check encodes exactly that variant.
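To see which values your installed icc actually defines (a quick sketch, not part of the original answer), you can print the two macros that host_config.h tests:

/* icc_check.c -- print the predefined macros tested by host_config.h
   build: icc icc_check.c -o icc_check */
#include <stdio.h>

int main(void)
{
#if defined(__INTEL_COMPILER) && defined(__INTEL_COMPILER_BUILD_DATE)
    printf("__INTEL_COMPILER            = %d\n", __INTEL_COMPILER);
    printf("__INTEL_COMPILER_BUILD_DATE = %d\n", __INTEL_COMPILER_BUILD_DATE);
#else
    printf("not compiled with the Intel compiler\n");
#endif
    return 0;
}

If it does not print 9999 and 20110811, this CUDA 5.5 host_config.h will reject the compiler.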
I ran into the same problem when compiling madagascar-1.5 with ICC 2013 and ifort 2013. I resolved it by downloading ICC 2011 update 7: based on the __INTEL_COMPILER_BUILD_DATE of 20110811, I could pick the matching release. The ICC build dated 20110811 is the correct one.

Compiling Basic C-Language CUDA code in Linux (Ubuntu)

I've spent a lot of time setting up the CUDA toolchain on a machine running Ubuntu Linux (11.04). The rig has two NVIDIA Tesla GPUs, and I'm able to compile and run test programs from the NVIDIA GPU Computing SDK such as deviceQuery, deviceQueryDrv, and bandwidthTest.
My problems arise when I try to compile basic sample programs from books and online sources. I know you're supposed to compile with NVCC, but I get compile errors whenever I use it. Basically any sort of include statement involving CUDA libraries gives a missing file/library error. An example would be:
#include <cutil.h>
Do I need some sort of makefile to direct the compiler to these libraries or are there additional flags I need to set when compiling with NVCC?
I followed these guides:
http://hdfpga.blogspot.com/2011/05/install-cuda-40-on-ubuntu-1104.html
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Getting_Started_Linux.pdf
To fix the include problems, add the CUDA include directory to your compilation options (assuming it is /usr/local/cuda/include):
nvcc -I/usr/local/cuda/include -L/usr/local/cuda/lib test.cu -o test
cutil is not part of the CUDA toolkit; it's part of the CUDA SDK. So, assuming you have followed the instructions and added the PATH and LIB directories to your environment variables, you still need to point to the CUDA SDK include and library directories.
To use that library manually, you must pass its paths to the compiler:
nvcc -I/CUDA_SDK_PATH/C/common/inc -L/CUDA_SDK_PATH/C/lib ...
Although I personally prefer not to use the CUDA SDK libraries, you will probably find it easier to start a project from a CUDA SDK example.
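For reference, here is a minimal sketch of a self-contained vector add that uses only the CUDA runtime, so it builds with plain nvcc and no SDK include or library paths at all:

// minimal_add.cu -- build with: nvcc minimal_add.cu -o minimal_add
#include <stdio.h>

__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 256;
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // allocate device buffers and copy the inputs over
    float *da, *db, *dc;
    cudaMalloc((void**)&da, n * sizeof(float));
    cudaMalloc((void**)&db, n * sizeof(float));
    cudaMalloc((void**)&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    add<<<(n + 127) / 128, 128>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hc[0] = %f (expected 3.0)\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}

If this compiles and prints 3.0, the toolkit itself is set up correctly, and any remaining errors come from the SDK/cutil paths.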

Loading a 64-bit module into a 32-bit kernel using insmod

Is it possible to load a .ko file (kernel object file) that was compiled on a 64-bit processor system into a kernel running on a 32-bit processor system?
I am getting the following error when I issue the insmod command on my system:
insmod: error inserting 'be2net.ko': -1 Invalid module format
It is not possible to run 64-bit code on a 32-bit system. Depending on the requirements, the reverse can work (running 32-bit software or libraries on a 64-bit system), but a 32-bit architecture cannot execute 64-bit code. You will need to compile the module on your own system.
First download the kernel source from kernel.org. Then extract it and cd into
linux/drivers/net/benet
Once there, type (as your regular user)
make
and then
sudo insmod be2net.ko
That should work for you.
No, it is not possible to load 64-bit modules into a 32-bit kernel, and that is why you are getting an error. The reason is that 64-bit and 32-bit programs have incompatible ABIs (e.g. different calling conventions). That is also why 64-bit applications can't be linked against 32-bit libraries, for example.
Note that insmod generally gives a vague error message. For a more detailed explanation, look at the output of dmesg.
The processor the module was compiled on does not matter at all; the compiler and compiler options do. If it was compiled FOR a 64-bit processor, it cannot run on a 32-bit processor, because it uses a different instruction set.
However, a 64-bit processor can run a cross-compiler and create 32-bit binaries. It is unlikely that you've done this.
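A quick way to see what a given .ko was actually built for (the exact output strings vary by distribution) is to inspect it with file and modinfo:

file be2net.ko
modinfo be2net.ko | grep vermagic

file reports the ELF class and machine (e.g. "ELF 64-bit LSB relocatable, x86-64"), and the vermagic line has to match the running kernel's version and configuration for insmod to accept the module.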
