Compiling Basic C-Language CUDA code in Linux (Ubuntu)

I've spent a lot of time setting up the CUDA toolchain on a machine running Ubuntu Linux (11.04). The rig has two NVIDIA Tesla GPUs, and I'm able to compile and run test programs from the NVIDIA GPU Computing SDK such as deviceQuery, deviceQueryDrv, and bandwidthTest.
My problems arise when I try to compile basic sample programs from books and online sources. I know you're supposed to compile with NVCC, but I get compile errors whenever I use it. Basically any sort of include statement involving CUDA libraries gives a missing file/library error. An example would be:
#include <cutil.h>
Do I need some sort of makefile to direct the compiler to these libraries or are there additional flags I need to set when compiling with NVCC?
I followed these guides:
http://hdfpga.blogspot.com/2011/05/install-cuda-40-on-ubuntu-1104.html
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Getting_Started_Linux.pdf

To fix the include problems, add the CUDA include directory to your compilation options (assuming it is /usr/local/cuda/include):
nvcc -I/usr/local/cuda/include -L/usr/local/cuda/lib test.cu -o test
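For a quick sanity check, here is a minimal test.cu of my own (a sketch, not an SDK sample) that depends only on the toolkit headers, so it should build with exactly the command above:

#include <stdio.h>
#include <cuda_runtime.h>

// Each thread increments one element of the array.
__global__ void add_one(int *data) {
    data[threadIdx.x] += 1;
}

int main(void) {
    int host[4] = {0, 1, 2, 3};
    int *dev = NULL;

    cudaMalloc((void **)&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    add_one<<<1, 4>>>(dev); // one block of four threads
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < 4; ++i)
        printf("%d ", host[i]); // expected output: 1 2 3 4
    printf("\n");
    return 0;
}

If this compiles and prints 1 2 3 4, the toolkit is set up correctly and any remaining errors come from SDK-only headers such as cutil.h (see the next answer).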

cutil is not part of the CUDA toolkit; it's part of the CUDA SDK. So, assuming you have followed the instructions and added the PATH and LIB directories to your environment variables, you still need to point to the CUDA SDK include and library directories.
In order to include that lib manually you must pass the paths to the compiler:
nvcc -I/CUDA_SDK_PATH/C/common/inc -L/CUDA_SDK_PATH/C/lib ...
Although I personally prefer not to use the CUDA SDK libraries, you will probably find it easier to start a project from a CUDA SDK example.
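Putting the two answers together, a complete command might look like this (the cutil library name is an assumption and varies by platform; on 64-bit Linux SDKs it is typically cutil_x86_64):

nvcc -I/usr/local/cuda/include -I/CUDA_SDK_PATH/C/common/inc -L/usr/local/cuda/lib -L/CUDA_SDK_PATH/C/lib -lcutil_x86_64 test.cu -o test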

Related

Setting up Pythran for compiling on Windows with clang-cl.exe and OpenMP working: need a way to pass compiler arguments

I'm using Pythran to compile Python code into C/C++ with OpenMP support on Windows. Now the documentation isn't great for Windows - it states:
"Windows support is on going and only targets Python 3.5+ with either Visual Studio 2017 or, better, clang-cl. Note that using clang-cl.exe is the default setting. It can be changed through the CXX and CC environment variables."
From playing around I found you MUST use clang-cl.exe or the code won't compile (MSVC doesn't like it).
So the preferred compiler is clang-cl.exe, the "drop-in" replacement for cl.exe. Clang 12 was installed from the Visual Studio 2019 setup by selecting "C++ Clang tools for Windows," and now I have C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\bin\clang-cl.exe as well as the LLVM linker lld-link.exe. Since clang-cl.exe is the default, I don't need to change any setup files; I just run vcvarsall.bat before Pythran so the compiler directory is in the path. (I noticed later that getting lld-link.exe used requires some hacking of distutils' _msvccompiler.py: switch link.exe to lld-link.exe and also comment out the '/LTCG' flag, since Clang doesn't have that option. Then it works... but still no OpenMP.)
I compiled one of the examples in an Anaconda virtual environment that had pip-installed NumPy and SciPy (OpenBLAS backend), since MKL support is barely documented. It needed the pythran-openblas package, so I pip installed that as well; it compiled fine with clang-cl and I could import it no problem. I found that [Python]\Lib\site-packages\pythran\pythran-win32.cfg has an option to pass cflags, where I can type the correct compiler arguments, like -Xclang -fopenmp -march=ivybridge, and when running pythran [script.py] all those flags are passed the correct way (the defaults aren't correct). BUT... this example from the docs is still not running in parallel.
I found on Stack Exchange that clang-cl -cc1 --help outputs all the arguments Clang can handle. Under OpenMP it states: -fopenmp Parse OpenMP pragmas and generate parallel code. So my guess is that the example given in the Pythran documentation has no OpenMP pragmas to parallelize. Now why would they do that? No idea, as they show an example of it being made incredibly faster via OpenMP, but I can't reproduce it on Windows. And I have 6 physical cores / 12 logical ones, so I should see a speedup.
Anyone else have another OpenMP example I can try this out on??? Or have solved this mystery of using OpenMP another way?
Much appreciated!
The Pythran project maintainer got back to me after I emailed him directly. It seems that OpenMP is only supported via explicit #omp statements: when the docs were written, Pythran would infer parallel routines automatically, but it no longer does. So to convert the example to OpenMP, a few changes are required:
#pythran export arc_distance(float[], float[], float[], float[])
import numpy as np

def arc_distance(theta_1, phi_1, theta_2, phi_2):
    """
    Calculates the pairwise arc distance
    between all points in vector a and b.
    """
    size = theta_1.size
    distance_matrix = np.empty_like(theta_1)
    #omp parallel for
    for i in range(size):
        temp = (np.sin((theta_2[i] - theta_1[i]) / 2)**2
                + np.cos(theta_1[i]) * np.cos(theta_2[i])
                * np.sin((phi_2[i] - phi_1[i]) / 2)**2)
        distance_matrix[i] = 2 * np.arctan2(np.sqrt(temp), np.sqrt(1 - temp))
    return distance_matrix
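Once pythran has compiled this into an extension module, a quick way to exercise it (my own sketch; the array size is arbitrary) is:

import numpy as np
import arc_distance  # the extension module produced by pythran

n = 1_000_000
t1, p1, t2, p2 = (np.random.rand(n) for _ in range(4))
d = arc_distance.arc_distance(t1, p1, t2, p2)
print(d[:5])

With OpenMP actually enabled you should see all cores busy while the loop runs.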
BUT... there are other undocumented compiler arguments that need to be passed to get an OpenBLAS-backed OpenMP module working, which took me HOURS to figure out. Here they are:
Pythran OpenBLAS Windows 10 Settings:
Find the file [Python]\Lib\site-packages\pythran\pythran-win32.cfg
Add to library_dirs: 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib'
Add to cflags: -Xclang -fopenmp
Add to ldflags: \libiomp5md.lib
Set blas to: blas=pythran-openblas
Then it should compile fine with: pythran -v arc_distance.py. The -v flag (verbose compiler mode) is very helpful for finding issues, but not required.
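For clarity, here is a sketch of how the relevant [compiler] entries in pythran-win32.cfg end up after those edits (the '...' stands for whatever values your copy already contains; keep them and append):

[compiler]
library_dirs=... 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib'
cflags=... -Xclang -fopenmp
ldflags=... \libiomp5md.lib
blas=pythran-openblas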
Pythran Intel MKL Windows 10 Settings (Anaconda3 default libraries):
I also decided to try making this work on a default Anaconda3 install, where NumPy, SciPy, etc. are all compiled against MKL. My company uses Anaconda3, so everyone has Intel MKL already. And like the OpenBLAS settings, the MKL settings for Windows aren't documented either. So I figured them out:
Find the file [Python]\Lib\site-packages\pythran\pythran-win32.cfg (most likely [Python] is C:\Users\[username]\Anaconda3)
Add to include_dirs: 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib', '[Python]\Library\include'
Add to cflags: -Xclang -fopenmp
Add to ldflags: \libomp.lib
Set blas to: blas=mkl
Now you'll notice some strange things above compared to the OpenBLAS settings. The library path isn't populated; instead, the Llvm\x64\lib path has to go in the include path (don't ask why, I don't know). Also, the OpenMP library is different. Again, I don't know why the one that works with OpenBLAS refuses to work with Intel MKL. But anyhow, that will give you Pythran with OpenMP on an Intel MKL based system.
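Again as a sketch, the MKL variant of the same file ends up like this (same caveat about the '...'):

[compiler]
include_dirs=... 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib', '[Python]\Library\include'
cflags=... -Xclang -fopenmp
ldflags=... \libomp.lib
blas=mkl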

Clang huge compilation?

Good Morning.
I am compiling Clang, following the instructions here: Getting Started: Building and Running Clang.
I am on Linux and the compilation goes smoothly. But I think I am missing something...
I want to compile ONLY Clang, not all the related libraries. The option -DLLVM_ENABLE_PROJECTS=clang seems to do what I want (check LLVM_ENABLE_PROJECTS here).
If I use the instructions written there, I can compile, but I think I am compiling too much... a build directory of 70 GB seems too much to me...
I tried to download the official Debian source and compile the Debian package (same source code, just using the "Debian way" to create a package from the official source), just to compare. The compilation goes smoothly, is very fast, and the build directory is much, much smaller, as I expected...
I noticed in the first link I provided the phrase "This builds both LLVM and Clang for debug mode."...
So, does anyone know if my problem is due to the fact that I am compiling a "debug mode" version? If so, how can I compile the default version? And is there a way to compile ONLY Clang without LLVM?
Yes, debug mode binaries are typically much larger than release mode binaries.
CMake normally uses CMAKE_BUILD_TYPE to determine the build type. It can be set from the command line with -DCMAKE_BUILD_TYPE="Release" or -DCMAKE_BUILD_TYPE="Debug" (sometimes there are other build types as well).
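As a sketch (assuming a checkout of the llvm-project monorepo, CMake 3.13+ for the -S/-B options, and Ninja installed), a Release-only configure-and-build of just the clang target looks like:

cmake -S llvm -B build -G Ninja -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_BUILD_TYPE=Release
ninja -C build clang

Note that the clang target still builds the LLVM libraries Clang links against, so you cannot build Clang entirely without LLVM; but a Release build is dramatically smaller than the default Debug one.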

What is the toolchain to be used to compile Berkeley bootloader (bbl)?

I have to run riscv-tests and SPEC2006 on riscv-linux (booted on an FPGA). I would like to know which compilation toolchain to use for this flow.
I understand that riscv-linux has to be compiled with riscv64-linux-gcc. However, I'm unclear about riscv-tests. Can riscv-elf-gcc be used to compile riscv-tests so they run on riscv-linux? I read some of the posts mentioned on Stack Overflow about SPEC2006 and bbl (both compiled with riscv-linux-gcc). I want to run riscv-tests also. Should they also be compiled with riscv-linux-gcc?
Thanks!
To compile bbl or bare-metal applications like riscv-tests you should use riscv64-unknown-elf- or riscv32-unknown-elf- (with Newlib).
The riscv64-linux toolchain contains more libraries, which makes the compilation process more complicated; we mainly use riscv64-linux to compile applications that run on riscv-linux.
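In other words, the split looks something like this (the exact prefixes are an assumption and depend on how your toolchain was built):

riscv64-unknown-elf-gcc -o hello_baremetal hello.c   # bare-metal: bbl, riscv-tests (Newlib)
riscv64-linux-gcc -o hello_linux hello.c             # userland: SPEC2006 binaries run under riscv-linux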

Error running cross-compiled code with pthread

I'm using an ARM EABI cross-compiler to compile code that uses pthreads, to run on an ARM Cortex-A9 simulation.
While I'm able to compile it with no problems (just as I did with other, non-pthread applications, which ran fine in the simulation), I get an error message when trying to run my pthread application on the simulated ARM (which runs Linux as its OS). It's the following:
./pttest.exe: /lib/libpthread.so.0: no version information available (required by ./pttest.exe)
I did my research and found out that's because pthread is a dynamic lib, and I'm compiling the application against a higher version of it than the one available on my simulator.
My question is: how do I force my cross-compiler to build the application against the same pthread lib version as my simulator? Is there anywhere I can download different versions of pthreads, and how do I set them up?
Sorry, I'm quite a newbie in that area.
Try compiling your application statically, e.g.
gcc -static -o myapplication myapplication.c
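Since this is a cross build, the same idea applies with the cross toolchain (the arm-linux-gnueabi- prefix is an assumption; use whatever your ARM EABI compiler is actually called). The -pthread flag pulls in the pthread library:

arm-linux-gnueabi-gcc -static -pthread -o pttest.exe pttest.c

Linking libpthread statically into the binary sidesteps the version mismatch with the simulator's /lib/libpthread.so.0 entirely.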

Cross-compile glibc for ARM

Good day
Currently, I'm working on an embedded device based on arm-linux. I want to build GCC for my target architecture with Glibc. GCC builds successfully, but I have trouble with the Glibc build.
I use the latest version of Glibc (ftp.gnu.org/gnu/glibc/glibc-2.12.1.tar.gz) and port for them (ftp.gnu.org/gnu/glibc/glibc-ports-2.12.1.tar.gz)
My configuration line:
../../glibc-2.12.1/configure --host=arm-none-linux-gnueabi --prefix=/home/anatoly/Desktop/ARM/build/glibc-build --enable-add-ons --with-binutils=/home/anatoly/Desctop/ARM/toolchain/arm/bin/
The configuration script works fine, but I get a compile error:
...
/home/anatoly/Desktop/ARM/src/glibc-2.12.1/malloc/libmemusage_pic.a(memusage.os): In function `me':
/home/anatoly/Desktop/ARM/src/glibc-2.12.1/malloc/lmemusage.c:253: undefined reference to `__aeabi_read_tp'
...
I also tried using older versions (2.11, 2.10) but got the same error.
Does anybody know the solution for this problem?
Use a precompiled toolchain, like those provided by CodeSourcery.
If you want to make your own, optimised toolchain (premature optimization is the root of all evil), use crosstool-NG, which is a tool dedicated to building cross-compilation toolchains.
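The usual crosstool-NG flow looks something like this (the sample name is an assumption; run the list-samples step to see which target tuples your version ships):

ct-ng list-samples                 # list the preconfigured target samples
ct-ng arm-unknown-linux-gnueabi    # pick an ARM/glibc sample as a starting point
ct-ng menuconfig                   # optionally adjust gcc/glibc versions
ct-ng build                        # build the whole toolchain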
If you are not convinced, and want to do everything with your own hands, ask your question on the crosstool-NG mailing list.
Try substituting arm-linux-gnueabi for arm-none-linux-gnueabi. Check that a compiler, loader, etc. with the prefix you used for --host exist on your path.
