Intel CPU OpenCL in Mono killed by SIGXCPU (Ubuntu) - linux

Some time ago I wrote a simple boids simulation using OpenCL (it was a school assignment), using C#, Cloo for OpenCL and OpenTK for the OpenGL output. I tested it on Windows 7 with the AMD CPU implementation of OpenCL and on a friend's NVIDIA card.
Now I have tried it on Linux (Ubuntu 12.04). I installed the AMD APP SDK and the Intel SDK. It compiles fine, and the reference CPU implementation works with graphical output. But when I run the OpenCL version, it runs for about a second (showing what seems like valid output in OpenGL) and then gets killed by SIGXCPU. I tried to google for a known issue, but found nothing.
So I tried to catch and ignore that signal, but every time I try, the program hangs. When I set Mono to catch a different signal instead (e.g. SIGPIPE), it runs fine (apart from the SIGXCPU kill when OpenCL is used).
In Mono, I tried Mono.UnixSignal as stated in the FAQ.
I tried:
Mono.Unix.Native.Stdlib.SetSignalAction ( Mono.Unix.Native.Signum.SIGXCPU, Mono.Unix.Native.SignalAction.Ignore);
Then something that doesn't hang, but doesn't help either:
Mono.Unix.Native.Stdlib.SetSignalAction ( Mono.Unix.Native.Signum.SIGXCPU, Mono.Unix.Native.SignalAction.Error);
and even
ignore (0);
Mono.Unix.Native.Stdlib.signal(Mono.Unix.Native.Signum.SIGXCPU, new Mono.Unix.Native.SignalHandler (ignore));
with
static void ignore(int signal) {
}
Even when I remove everything else from Main, it still hangs some time after "touching" that signal.
One more weird thing:
Mono.Unix.Native.Stdlib.SetSignalAction ( Mono.Unix.Native.Signum.SIGXCPU, Mono.Unix.Native.SignalAction.Default);
kills the application with SIGXCPU somewhere after Application.EnableVisualStyles(); when I set it right before that call, this time without even touching OpenCL.
Is there something I missed in Mono? Does it use this signal internally somewhere, so that it gets in the way of OpenCL?

Mono uses SIGXCPU internally for its own purposes, so if you ignore it (or it's raised for some other reason), things will break.

SIGXCPU means that the process has exceeded its per-process CPU-time limit. Once the limit has been exceeded, it has been exceeded; catching or ignoring the signal doesn't change that, and the process is stuck. You need to use ulimit, or get help from an admin, to set a higher limit.
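For reference, in a bash shell you can check whether a CPU-time limit is actually in effect and, if the hard limit allows it, raise the soft limit before launching the program (the executable name below is just a placeholder):
ulimit -St        # soft CPU-time limit in seconds ("unlimited" if none)
ulimit -Ht        # hard CPU-time limit
ulimit -St unlimited && mono YourApp.exe
If both already report unlimited, the signal is being raised for some other reason (see the note about Mono's internal use above) rather than by the kernel enforcing a limit.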

Related

Code working on Windows but launch failures on Linux

First and foremost: I am completely unable to create an MCVE, as I can only reproduce this when running the full code; any attempt to measure or replicate the error in a simpler environment makes it disappear. TL;DR: I suspect it's not a code problem, but a configuration problem.
I have a piece of code that does some mathematics in CUDA kernels. I have a Windows machine (Win10 x64, GTX 1050, CUDA 9.2) and an Ubuntu 17.04 machine (2x GTX 1080 Ti, CUDA 9.1).
My code runs fine on the Windows machine. It is long-running (~700 ms per kernel call for big samples), so I needed to increase the TDR value in Windows. The code also (for now) forces everything to run on one GPU, the first one, selected with cudaSetDevice(0).
When I copy the same input data and code to the Linux machine (I am using git, so it is the same code), I get either
an illegal memory access was encountered
or
unspecified launch failure
in my error checking after the GPU call.
If I change the kernel so that, instead of doing the math, it just writes a number to the output, the kernel executes properly. Other CUDA code (different functions that I have) works fine too. All of this leads me to think that there is a problem outside the code: not with the code itself, nor with the general configuration of the drivers/environment variables.
I read that xorg.conf can have an effect on the timeout of the kernels. I generated an xorg.conf (I had none) and removed the devices from it, as suggested here. I am connecting to the server remotely and have no monitor plugged in. This changes nothing in the behavior; my kernels still error out.
My question is: where else should I look? What Linux-specific configuration should I look at to pinpoint the cause of the kernel halts?
The error ended up being, indeed, an illegal memory access.
It was caused by the fact that sizeof(unsigned long) is machine specific: my Linux machine returns 8, while my Windows machine returns 4. As this code is called from MATLAB, and MATLAB (like some other high-level languages such as Python) defines variable sizes in bits (e.g. uint32(1)), there was a mismatch on the Linux machine when doing memcpys. It turns out this happened in a variable used as an index, so the kernels were reading garbage (due to the bad memcpy) and then trying to access another array at that location, producing the illegal memory access.
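As an illustration (a hypothetical sketch, not the actual MATLAB/mex code; the function and variable names below are made up), the same size expression means different byte counts on the two machines, while a fixed-width type does not:
#include <cstdint>
#include <cuda_runtime.h>

// hostIdx points at N MATLAB uint32 values (always 4 bytes each).
// (Error checking and cudaFree are omitted for brevity.)
void upload_indices(const uint32_t *hostIdx, size_t N)
{
    // Broken on Linux: sizeof(unsigned long) is 4 on Win64 (LLP64) but 8 on
    // 64-bit Linux (LP64), so this copies twice as many bytes as intended
    // and the kernel later reads garbage indices.
    unsigned long *d_bad;
    cudaMalloc(&d_bad, N * sizeof(unsigned long));
    cudaMemcpy(d_bad, hostIdx, N * sizeof(unsigned long), cudaMemcpyHostToDevice);

    // Portable: uint32_t matches MATLAB's uint32 on both machines.
    uint32_t *d_idx;
    cudaMalloc(&d_idx, N * sizeof(uint32_t));
    cudaMemcpy(d_idx, hostIdx, N * sizeof(uint32_t), cudaMemcpyHostToDevice);
}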
Too specific? yeah.

"clocksource tsc unstable" shown when the linux kernel boots up

I am booting a Linux kernel in a full-system simulator, and I'd like to run my benchmark on the booted system. However, when it boots up, it shows me this message: "clocksource tsc unstable", and occasionally it hangs at the beginning. Sometimes it lets me run my benchmark, but then it probably hangs in the middle, since the application never finishes and seems to be stuck. Any idea how to fix this issue?
Thanks.
It suggests that the kernel didn't manage to calibrate the TSC (Time Stamp Counter) value properly, i.e. the value is stale. This usually happens with VMs. The way to avoid it is to pass a predefined lpj (loops per jiffy) value as a kernel parameter (lpj=). Try it; hopefully the issue will be fixed!
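For example, you can take the lpj value that the kernel calibrated on a boot that did work (it is printed in the "Calibrating delay loop" line) and pass it back explicitly on the next boot; the numbers below are only illustrative:
dmesg | grep -i "lpj="
# prints something like: Calibrating delay loop... 4799.77 BogoMIPS (lpj=2399888)
Then boot with lpj=2399888 appended to the kernel command line / the boot arguments of your simulator.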

CUDA device seems to be blocked

I'm running a small CUDA application: the QuickSort benchmark algorithm (see here). I have a dual-GPU system with an NVIDIA GTX 660 (device 0) and an 8600 GTS (device 1).
Under Windows 8 and Visual Studio, the application compiles and runs flawlessly on device 0. Under Linux (Ubuntu 12.04 LTS), the app compiles with nvcc and gcc but suddenly stops in its tracks, returning an (unspecified launch failure).
I have two issues:
After this error, my GPU cannot perform some other operations: e.g., running the SDK example bandwidthTest blocks at the first data transfer, while deviceQuery continues to work fine. How can I reset my GPU? I've already tried the cudaDeviceReset() method, but it doesn't help.
How can I find out what is going wrong under Linux? Does anyone have a clue, or has anyone seen this before?
Thanks in advance for your help!
Using the nvidia-smi utility you can reset the GPU, if the GPU supports it.
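For example (this requires root, no processes using the GPU, and a device that actually supports GPU reset; consumer GeForce boards often don't), something along the lines of:
nvidia-smi --gpu-reset -i 0
where -i selects the device index.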
To my knowledge and in my experience, (unspecified launch failure) usually refers to a segmentation fault. Have you specified the right GPU to use? Try cuda-memcheck to see if there is any out-of-bounds memory access.
From my experience, XID 31 was always caused by accessing a bad pointer (i.e. a memory access violation).
I'd pursue this trail first. Run your application under cuda-memcheck, like this: cuda-memcheck your_app args, and see if it finds any bad memory accesses.
Also try stepping through the code with cuda-gdb or Nsight Eclipse Edition.
I've found that using
cuda-memcheck -b ...
prevents the device from locking up.

OpenGL on Windows uses Tons of CPU when SwapBuffers is called on a RenderThread

OK, so I have been running into some threading issues with OpenGL on Windows. I'm using C#/.NET to wrap GL. I'm on Windows 7 x64.
So I've tried two different tests. In each test I'm rendering an untextured quad (i.e. two triangles). The CPU hit seems to be related to SwapBuffers, from what I can tell.
Single-threaded test (this works fine):
{
Draw stuff;
SwapBuffers;
Sleep(15);
}
Rendering-thread test (this eats all my CPU):
{
Draw stuff;
SwapBuffers;
//glFinish(); //<< If used this seems to make the CPU usage normal
Sleep(15);
}
I know this example is simplistic, but the real question is: why does OpenGL eat all my CPU when SwapBuffers is called on a thread other than the one the Windows GUI runs on? And why does glFinish() seem to fix it? Everybody says not to use glFinish, so I'm not sure what I'm doing wrong, or whether OpenGL just performs poorly on Windows.
I ran the same test on OS X; CPU usage seems normal. I ran the same test with D3D9 and D3D10 on Windows; CPU usage seems normal. I haven't tested on Linux, as my Linux box is down.
This issue is simply solved by doing:
glFlush();
glFinish();
Before calling:
wglSwapBuffers(dc); // Windows
glXSwapBuffers(dc, handle); // Linux
CGLFlushDrawable(ctx); // OS X
That said, drivers make a big difference with OpenGL on Windows, and Windows still performs far better with Direct3D.

How to test the kernel for kernel panics?

I am testing the Linux kernel on an embedded device and would like to find situations/scenarios in which the Linux kernel would panic.
Can you suggest some test steps (manual or automated) to create kernel panics?
There's a variety of tools that you can use to try to crash your machine:
crashme tries to execute random code; this is good for testing process lifecycle code.
fsx is a tool that tries to exercise the filesystem code extensively; it's good for testing drivers, block I/O and filesystem code.
The Linux Test Project aims to create a large repository of kernel test cases; it might not be designed for crashing systems in particular, but it may go a long way towards helping you and your team keep everything working as planned. (Note that the LTP isn't prescriptive -- the kernel community doesn't treat its tests as anything important -- but the LTP team tries very hard to be descriptive about what the kernel does and doesn't do.)
If your device is network-connected, you can run nmap against it, using a variety of scanning options: -sV --version-all will try to find the versions of all running services (this can be stressful), and -O --osscan-guess will try to determine the operating system by throwing strange network packets at the machine and guessing from the responses what it is running.
The nessus scanning tool also does version identification of running services; it may or may not offer any improvements over nmap, though.
You can also hand your device to users; they figure out the craziest things to do with software, they'll spot bugs you'd never even think to look for. :)
You can try the following key combination:
SysRq + c
or
echo c >/proc/sysrq-trigger
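If the key combination does nothing, the magic SysRq key may be disabled; on most distributions you can enable it (as root) with:
echo 1 > /proc/sys/kernel/sysrq
Writing to /proc/sysrq-trigger as root should work regardless of that setting. Either way, 'c' deliberately crashes the kernel, so expect an immediate panic.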
Crashme has been known to find unknown kernel panic situations, but it must be run in a potent way that creates a variety of signal exceptions handled within the process and a variety of process exit conditions.
The main purpose of the messages generated by Crashme is to determine if sufficiently interesting things are happening to indicate possible potency. For example, if the mprotect call is needed to allow memory allocated with malloc to be executed as instructions, and if you don't have the mprotect enabled in the source code crashme.c for your platform, then Crashme is impotent.
It seems that operating systems on x64 architectures tend to have execution turned off for data segments. Recently I updated the crashme.c at http://crashme.codeplex.com/ to use mprotect in the __APPLE__ case and tested it on a MacBook Pro running Mac OS X Lion. This is the first serious update to Crashme since 1994. Expect to see updated CentOS and FreeBSD support soon.
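For illustration only, a minimal sketch (not code from crashme.c) of the kind of mprotect call being described: making a page-aligned heap buffer executable and then jumping into it. Modern W^X policies may still refuse this on some systems.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    /* mprotect wants a page-aligned address, so plain malloc is not enough */
    if (posix_memalign(&buf, page, page) != 0)
        return 1;
    memset(buf, 0xC3, page); /* fill with x86 'ret' instructions */
    /* make the heap buffer executable as well as readable/writable */
    if (mprotect(buf, page, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        perror("mprotect");
        return 1;
    }
    ((void (*)(void)) buf)(); /* jump into the buffer; the first byte is 'ret' */
    puts("executed bytes from a heap buffer");
    return 0;
}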
