On which CPU core/processor is an OpenCL kernel running? - Linux

I want to determine exactly how AMD schedules its OpenCL kernels on the CPU, but I could not find any OpenCL function that reports the physical processor/core ID a kernel is running on.
I could only find the following links related to my problem:
Getting the machine serial number and CPU ID using C/C++ in Linux
How to know on which physical processor and on which physical core my code is running
NUMA Get Current Node/Core
I tried the above, but none of the solutions worked: OpenCL C kernels do not support standard C headers such as stddef.h (which sched.h depends on), and there is no fopen() either.
Is there any way I can see exactly how the OpenCL kernels have been assigned to each CPU core/processor?
Note: I am using Ubuntu 14.04, gcc version 4.8.2 and AMD APP SDK 3.0.
Thanks for your help!
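For reference, the linked host-side answers generally rely on calls like glibc's sched_getcpu() to report the core an ordinary host thread is running on; a minimal host-side sketch (plain Linux C++, and exactly the kind of call that is not available from inside an OpenCL C kernel) looks roughly like this:

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE   // needed for sched_getcpu() (g++ usually defines it already)
    #endif
    #include <sched.h>    // sched_getcpu(), glibc/Linux-specific
    #include <cstdio>

    int main() {
        // Reports the logical CPU the calling *host* thread is currently on.
        // OpenCL C kernels cannot include sched.h or call into glibc, so this
        // only works in ordinary host code, not inside a kernel.
        int cpu = sched_getcpu();
        if (cpu == -1) {
            std::perror("sched_getcpu");
            return 1;
        }
        std::printf("host thread is currently on logical CPU %d\n", cpu);
        return 0;
    }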

Related

OpenCL support for both Intel CPU and Nvidia GPU

When querying for supported OpenCL platforms/devices, my Nvidia GPU gets returned, thanks to the OpenCL runtime provided by the CUDA SDK.
However, the Intel Xeon CPU does not.
I see that Intel provides an OpenCL Runtime for its CPUs. However, it seems to require replacing libOpenCL.so.
With some linker hackery I was able to get the Intel runtime loaded into my process, but then only the Intel CPU, not the Nvidia device, was enumerated.
Is there a way to get them to co-exist so I can distribute tasks across both the GPU and CPU?
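For what it's worth, on Linux multiple vendor runtimes normally co-exist through the Khronos ICD loader: you keep linking against the stock libOpenCL.so, each vendor drops an .icd file into /etc/OpenCL/vendors, and the loader then exposes every runtime as its own platform. A minimal enumeration sketch using only standard OpenCL host API calls (build with -lOpenCL; the fixed array sizes are arbitrary) might look like this:

    #include <CL/cl.h>
    #include <cstdio>

    int main() {
        // Ask the ICD loader how many platforms (Intel, NVIDIA, ...) are installed.
        cl_uint num_platforms = 0;
        clGetPlatformIDs(0, nullptr, &num_platforms);
        if (num_platforms > 16) num_platforms = 16;

        cl_platform_id platforms[16];
        clGetPlatformIDs(num_platforms, platforms, nullptr);

        for (cl_uint p = 0; p < num_platforms; ++p) {
            char name[256] = {0};
            clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, nullptr);
            std::printf("platform %u: %s\n", p, name);

            // List every device (CPU and GPU alike) that this platform exposes.
            cl_uint num_devices = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, nullptr, &num_devices);
            if (num_devices > 16) num_devices = 16;
            cl_device_id devices[16];
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, num_devices, devices, nullptr);

            for (cl_uint d = 0; d < num_devices; ++d) {
                char dev_name[256] = {0};
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dev_name), dev_name, nullptr);
                std::printf("  device %u: %s\n", d, dev_name);
            }
        }
        return 0;
    }

If both the Nvidia and Intel .icd files are registered, both platforms should show up here, and you can create a separate context/queue per device to split the work.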

Is it possible to get OpenCL on Windows Linux Subsystem?

I've been trying for the past day to get TensorFlow built with OpenCL on the Linux Subsystem.
I followed this guide, but when I run clinfo it says
Number of platforms 0
Then typing /usr/local/computecpp/bin/computecpp_info gives me
OpenCL error -1001: Unable to retrieve number of platforms. Device Info:
Cannot find any devices on the system. Please refer to your OpenCL vendor documentation.
Note that OPENCL_VENDOR_PATH is not defined. Some vendors may require this environment variable to be set.
Am I doing anything wrong? Is it even possible to install OpenCL on Windows Linux Subsystem?
Note:
I'm using an AMD R9 390X from MSI on 64-bit Windows Home Edition.
With the launch of WSL2, CUDA programs are now supported in WSL (more information here); however, there is still no support for OpenCL as of this writing: https://github.com/microsoft/WSL/issues/6951.
According to a Microsoft representative in this forum post, Windows Subsystem for Linux does not support OpenCL or CUDA GPU programs, and support is not currently planned. To experiment with TensorFlow/OpenCL it would probably be easiest to install Linux in a dual-boot configuration.
You could use the Intel OpenCL SDK for the CPU, https://software.intel.com/en-us/articles/opencl-drivers.

Linux porting for RISCV multicore processor

We are developing a multi-core processor based on the RISCV architecture.
We have already ported Linux to the single-core RISCV processor, and it works on our own FPGA-based board with a BusyBox rootfs.
I now want to port Linux to the multi-core RISCV processor.
My questions are:
Does the gnu-riscv-gcc toolchain available now support multi-core?
Does the spike simulator available now support multi-core?
Should I make any changes to the bbl bootloader (Berkeley bootloader) to support multi-core?
What changes should I make to my single-core Linux kernel to support multi-core?
The current RISC-V ecosystem already supports SMP Linux.
No changes to the compiler are required for multicore.
Spike can simulate multicore when using the '-p' flag.
BBL supports multicore.
Before building Linux, configure it to support SMP (CONFIG_SMP=y).
Any hiccups are probably due to the toolchain being out of sync with the newest privileged-spec changes. Last fall, users successfully built and ran multicore Linux on RISC-V.
This is all expected to work out of the box. My standard testing flow for Linux and QEMU pull requests is to boot a Fedora root filesystem on QEMU via Linux+BBL. Instructions can be found on the QEMU Wiki Article about RISC-V. This will boot in our "virt" board, which uses VirtIO based devices. These devices have standard upstream Linux drivers that are very well supported, so there isn't really any platform-level work to be done.
In addition to the standard VirtIO-based devices, SiFive has devices that are part of the Freedom SOC platform. If your platform differs significantly from SiFive's Freedom platform then you'll need some additional drivers in both Linux and BBL.
We maintain an out-of-tree version of the drivers we haven't cleaned up for upstream yet in freedom-u-sdk, which should give you a rough idea of how much work it is. Running make qemu in that repository will boot Linux on QEMU via BBL, and running make will show you how to flash an SD card image for the HiFive Unleashed board.

CUDA performance penalty when running in Windows

I've noticed a big performance hit when I run my CUDA application in Windows 7 (versus Linux). I think I may know where the slowdown occurs: For whatever reason, the Windows Nvidia driver (version 331.65) does not immediately dispatch a CUDA kernel when invoked via the runtime API.
To illustrate the problem, I profiled the mergeSort application (from the examples that ship with CUDA 5.5) and compared the kernel launch time in Linux with the launch time in Windows (the profiler screenshots show a much larger launch latency on Windows).
This post suggests the problem might have something to do with the Windows driver batching the kernel launches. Is there any way I can disable this batching?
I am running with a GTX 690 GPU, Windows 7, and version 331.65 of the Nvidia driver.
There is a fair amount of overhead in sending GPU hardware commands through the WDDM stack.
As you've discovered, this means that under WDDM (only) GPU commands can get "batched" to amortize this overhead. The batching process may (probably will) introduce some latency, which can be variable, depending on what else is going on.
The best solution under Windows is to switch the operating mode of the GPU from WDDM to TCC, which can be done via the nvidia-smi command, but it is only supported on Tesla GPUs and certain members of the Quadro family of GPUs -- i.e. not GeForce. (It also has the side effect of preventing the device from being used as a Windows accelerated display adapter, which might be relevant for a Quadro device or a few specific older Fermi Tesla GPUs.)
AFAIK there is no officially documented method to circumvent or affect the WDDM batching process in the driver. Unofficially, according to Greg@NV in this link, the call to issue after the CUDA kernel launch is cudaEventQuery(0);, which may/should cause the WDDM batch queue to "flush" to the GPU.
As Greg points out, extensive use of this mechanism will wipe out the amortization benefit, and may do more harm than good.
EDIT: moving forward to 2016, a newer recommendation for a "low-impact" flush of the WDDM command queue would be cudaStreamQuery(stream);
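To make the suggestion concrete, here is a rough sketch of where the call sits (my_kernel, launch_and_flush and the grid sizing are placeholder choices, not from the original post):

    #include <cuda_runtime.h>

    // Placeholder kernel: doubles each element.
    __global__ void my_kernel(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= 2.0f;
    }

    void launch_and_flush(float *d_data, int n, cudaStream_t stream)
    {
        my_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);

        // Non-blocking query issued right after the launch; as described above,
        // it may/should nudge the WDDM driver into flushing its batched command
        // queue to the GPU. (cudaEventQuery(0) was the earlier suggestion.)
        cudaStreamQuery(stream);
    }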
EDIT2: Using recent drivers on Windows, you should be able to place Titan family GPUs in TCC mode, assuming you have some other GPU set up for the primary display. The nvidia-smi tool will allow you to switch modes (use nvidia-smi --help for more info).
Additional info about the TCC driver model can be found in the Windows installation guide, including the note that it may reduce the latency of kernel launches.
The statement about TCC support is a general one. Not all Quadro GPUs are supported. The final determinant of support for TCC (or not) on a particular GPU is the nvidia-smi tool. Nothing here should be construed as a guarantee of support for TCC on your particular GPU.
Even though it's been almost 3 years since this issue was active, I still consider it necessary to share my findings.
I've been in the same situation: the same CUDA program took 5 ms on Ubuntu with CUDA 8.0 but over 30 ms on Windows 10 with CUDA 10.1, both on a GTX 1080 Ti.
However, on Windows, when I switched from building in Visual Studio to compiling with nvcc on the command line, the program suddenly ran at the same speed as the Linux one.
This suggests that the problem may come from Visual Studio.

Can I run CUDA on Intel's integrated graphics processor?

I have a very simple Toshiba laptop with an i3 processor, and I do not have an expensive graphics card. In the display settings I see Intel(HD) Graphics as the display adapter. I am planning to learn some CUDA programming, but I am not sure I can do that on my laptop, as it does not have an NVIDIA CUDA-enabled GPU.
In fact, I doubt I even have a GPU o_o
So I would appreciate it if someone could tell me whether I can do CUDA programming with my current configuration, and if possible, also explain what Intel(HD) Graphics means.
At the present time, Intel graphics chips do not support CUDA. It is possible that, in the near future, these chips will support OpenCL (which is a standard very similar to CUDA), but this is not guaranteed, and their current drivers do not support OpenCL either. (There is an Intel OpenCL SDK available, but, at the present time, it does not give you access to the GPU.)
The newest Intel processors (Sandy Bridge) have a GPU integrated into the CPU die. Your processor may be a previous-generation part, in which case "Intel(HD) Graphics" is an independent chip.
The Portland Group has a commercial product called CUDA x86: a hybrid compiler that lets CUDA C/C++ code run either on the GPU or, using SIMD, on the CPU, fully automatically and without any intervention from the developer. Hope this helps.
Link: http://www.pgroup.com/products/pgiworkstation.htm
If you're interested in learning a language that supports massive parallelism, you'd better go for OpenCL since you don't have an NVIDIA GPU. You can run OpenCL on Intel CPUs, but at best you will learn to program SIMD units.
Optimization on a CPU and on a GPU are different things, and I really don't think you can use the Intel graphics for GPGPU.
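If you do go the OpenCL route on the CPU, the classic first exercise is a vector-add kernel. A minimal sketch of the device code (OpenCL C; the host builds it at runtime with clCreateProgramWithSource/clBuildProgram and launches it with clEnqueueNDRangeKernel) might look like this:

    // vector_add.cl -- illustrative file name; each work-item adds one element.
    __kernel void vector_add(__global const float *a,
                             __global const float *b,
                             __global float *c,
                             const unsigned int n)
    {
        size_t i = get_global_id(0);   // index of this work-item
        if (i < n)                     // guard against a padded global work size
            c[i] = a[i] + b[i];
    }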
Intel HD Graphics is usually the on-CPU graphics chip in newer Core i3/i5/i7 processors.
As far as I know it doesn't support CUDA (which is a proprietary NVIDIA technology), but OpenCL is supported by NVIDIA, ATI and Intel.
In 2020, ZLUDA was created, which provides a CUDA API for Intel GPUs. It is not production-ready yet, though.
