OpenCL: struct field initialization from inline function does not work

I have an OpenCL kernel that does not behave as expected. Similar C code compiled with gcc works fine.
struct data {
    short* a;
};
typedef struct data Data;
inline void foo(Data* d) {
    short b[1] = {99};
    d->a = b;
}
__kernel void bar(__global short* output) {
    Data d;
    foo(&d);
    short val = d.a[0];
    int id = get_global_id(0);
    output[id] = val;
}
Always outputs [0, 0, ..., 0].
If I initialize d.a in __kernel bar and only assign d->a[0] = 99 in foo it works as expected and outputs [99, 99, ..., 99]
Thanks in advance!
UPDATE:
I'm using Java and JOCL for the host code.
As ScottD suggested, I've changed d->a = b; in function foo to *d->a = *b; and it works great in the C version. But for OpenCL on MacOS it causes the following error:
Exception in thread "main" org.jocl.CLException:
CL_BUILD_PROGRAM_FAILURE Build log for device 0:
CVMS_ERROR_COMPILER_FAILURE: CVMS compiler has crashed or hung building an element.
at org.jocl.CL.clBuildProgram(CL.java:9368)
...
Or a JVM termination on Windows with AMD CPU:
# A fatal error has been detected by the Java Runtime Environment:
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007fedfeb007a, pid=3816, tid=4124
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C [amdocl64.dll+0x60007a]

I believe the problem is this: function foo sets a pointer used by the caller to the address of a local variable that goes out of scope when foo returns. When the caller accesses that pointer, the data in the out-of-scope variable may or may not still be 99. To demonstrate, make a gcc debug build of this code. It works. Now add a printf("hello\n") after foo(&d) and before val = d.a[0]. Now it fails. This is because the printf call overwrites the stack memory containing the out-of-scope 99 value.
Probably you intended:
*d->a = *b; in place of d->a = b;
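Note that *d->a = *b; only helps if d->a already points at valid storage, which matches the observation in the question that initializing d.a in the caller works. A minimal C sketch of that arrangement follows; the names foo_fixed and read_value are hypothetical and not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct data {
    short *a;
} Data;

/* The broken variant from the question did this inside foo:
 *     short b[1] = {99};
 *     d->a = b;        // b dies when foo returns, so d->a dangles
 * The fixed variant below never takes the address of a local. */
void foo_fixed(Data *d) {
    /* d->a must already point at storage owned by the caller */
    assert(d != NULL && d->a != NULL);
    d->a[0] = 99;
}

/* Mirrors the kernel body: the storage lives in the caller's scope,
 * so it is still alive when the value is read back. */
short read_value(void) {
    short storage[1] = {0};
    Data d = { storage };
    foo_fixed(&d);
    return d.a[0];
}
```

In the kernel, the equivalent would be to declare the short array in bar, store its address in d.a, and only then call foo.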

Related

Random segmentation fault in D lang on switch break

I was debugging a fairly simple program written in D, that seems to have a random chance to receive a SEGV signal.
Upon further inspection I observed that using different compilers and build modes yielded different results.
Results of my tests:
DMD Debug = works 99% of the time
DMD Release = 50/50
LDC Debug = 50/50
LDC Release = 50/50
Because the binary from the default compiler (DMD) crashed only once I couldn't really debug it, and release mode didn't help either due to lack of debug symbols.
Building the binary with LDC in debug mode let me test it with gdb and valgrind, to summarize what I gathered.
Relevant information from valgrind,
Invalid read of size 4 # ctor in file video.d line 46
Access not within mapped region at address 0x0 # ctor in file video.d line
Gdb doesn't give me any more insight: there are 3 stack frames, of which only frame 0 is of interest. The backtrace of frame 0 points to file video.d line 46, which is a break statement. So what now?
This is the snippet of code producing a seg fault
module video;
import ffmpeg.libavformat.avformat;
import ffmpeg.libavcodec.avcodec;
import ffmpeg.libavutil.avutil;
class Foo
{
    private
    {
        AVFormatContext* _format_ctx;
        AVStream* _stream_video;
        AVStream* _stream_audio;
    }
    ...
    public this(const(string) path)
    {
        import std.exception : enforce;
        import std.string : toStringz;
        _format_ctx = null;
        enforce(avformat_open_input(&_format_ctx, path.toStringz, null, null) == 0);
        scope (failure) avformat_close_input(&_format_ctx);
        enforce(avformat_find_stream_info(_format_ctx, null) == 0);
        debug av_dump_format(_format_ctx, 0, path.toStringz, 0);
        foreach (i; 0 .. _format_ctx.nb_streams)
        {
            AVStream* stream = _format_ctx.streams[i];
            if (stream == null)
                continue;
            enforce (stream.codecpar != null);
            switch (stream.codecpar.codec_type)
            {
                case AVMediaType.AVMEDIA_TYPE_VIDEO:
                    _stream_video = stream;
                    break;
                case AVMediaType.AVMEDIA_TYPE_AUDIO:
                    _stream_audio = stream;
                    break;
                default:
                    stream.discard = AVDiscard.AVDISCARD_ALL;
                    break; // Magic line 46
            }
        }
    }
}
// Might contain spelling errors, had to write it by hand.
So does anyone have an idea what causes this behaviour, or more precisely how to go about fixing it?
Try checking the validity of _stream_audio:
default:
    enforce(_stream_audio, new Exception("_stream_audio is null"))
        .discard = AVDiscard.AVDISCARD_ALL;
    break; // Magic line 46
You are not heeding the warning in the toStringz documentation:
“Important Note: When passing a char* to a C function, and the C function keeps it around for any reason, make sure that you keep a reference to it in your D code. Otherwise, it may become invalid during a garbage collection cycle and cause a nasty bug when the C code tries to use it.”
This may not be the cause of your problem, but the way you use toStringz is risky.

Cython: external struct definition throws compiler error

I am trying to use Collections-C in Cython.
I noticed that some structures are defined in the .c file, and an alias for them is in the .h file. When I try to define those structures in a .pxd file and use them in a .pyx file, gcc throws an error: storage size of ‘[...]’ isn’t known.
I was able to reduce my issue to a minimal setup that replicates the external library and my application:
testdef.c
/* Note: I can't change this */
struct bogus_s {
int x;
int y;
};
testdef.h
/* Note: I can't change this */
typedef struct bogus_s Bogus;
cytestdef.pxd
# This is my code
cdef extern from 'testdef.h':
    struct bogus_s:
        int x
        int y
    ctypedef bogus_s Bogus
cytestdef.pyx
# This is my code
def fn():
    cdef Bogus n
    n.x = 12
    n.y = 23
    print(n.x)
If I run cythonize, I get
In function ‘__pyx_pf_7sandbox_9cytestdef_fn’:
cytestdef.c:1106:9: error: storage size of ‘__pyx_v_n’ isn’t known
Bogus __pyx_v_n;
^~~~~~~~~
I also get the same error if I use ctypedef Bogus: [...] notation as indicated in the Cython manual.
What am I doing wrong?
Thanks.
Looking at the documentation for your Collections-C library, these are opaque structures that you're supposed to use purely through pointers (you don't need to know the size to have a pointer, but you do to allocate on the stack). Allocation of these structures is done in library functions.
To change your example to match this case:
// C file
#include <stdlib.h>

int bogus_s_new(struct bogus_s** v) {
    *v = malloc(sizeof(struct bogus_s));
    return (*v != NULL); // check the allocation result, not the out-parameter itself
}
void free_bogus_s(struct bogus_s* v) {
    free(v);
}
Your H file would contain the declarations for those and your pxd file would contain wrappers for the declarations. Then in Cython:
def fn():
    cdef Bogus* n
    if not bogus_s_new(&n):
        return
    try:
        # you CANNOT access x and y since the type is
        # designed to be opaque. Instead you should use
        # the accessor functions defined in the header
        # n.x = 12
        # n.y = 23
        pass
    finally:
        free_bogus_s(n)
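For reference, the opaque-type pattern this answer describes looks roughly like the following on the C side. This is a single-file sketch under assumptions: the names bogus_new, bogus_free, bogus_set_x and bogus_get_x are hypothetical and are not the Collections-C API.

```c
#include <assert.h>
#include <stdlib.h>

/* What a header would expose: an incomplete type plus functions.
 * Callers can hold a Bogus* but cannot declare a Bogus or touch fields. */
typedef struct bogus_s Bogus;
Bogus *bogus_new(void);
void bogus_free(Bogus *b);
void bogus_set_x(Bogus *b, int x);
int bogus_get_x(const Bogus *b);

/* What the .c file keeps private: the full definition. */
struct bogus_s {
    int x;
    int y;
};

/* Allocation happens inside the library, where the size is known. */
Bogus *bogus_new(void) {
    return calloc(1, sizeof(Bogus));
}

void bogus_free(Bogus *b) {
    free(b);
}

/* Accessors are the only way for callers to reach the fields. */
void bogus_set_x(Bogus *b, int x) {
    assert(b != NULL);
    b->x = x;
}

int bogus_get_x(const Bogus *b) {
    assert(b != NULL);
    return b->x;
}
```

A .pxd file would then declare only the typedef and these functions, and the Cython code would keep a Bogus* rather than a Bogus.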

Why is typedef throwing the error: (S) Initializer must be a valid constant expression

In the f1.h header I use a typedef for a structure. A sample code snippet is shown below:
typedef struct{
int a;
union u
{
int x;
char y;
}xyz;
}mystruct;
In the f2.h header I use the structure mystruct to get the offset. The code snippet is shown below:
static mystruct ktt;
//#define OFFSET_T(b, c) ((int*)((&((mystruct*)0)->b)) - (int*)((&((mystruct*)0)->c)))
#define OFFSET_T(b, c) ((char*) &ktt.b - (char *) &ktt.c)
static struct Mystruct1{
int n;
}mystruct1 = {OFFSET_T(xyz,a)};
When I compile on an AIX machine using the xlc compiler, it throws the error "1506-221(S) Initializer must be a valid constant expression".
I tried both macros, but both give the same error. Is there anything wrong in the f2.h macro that uses the structure to get the offset?
The expression in question needs to be an arithmetic constant expression in order to be portable. Neither macro qualifies, since operands of pointer type are involved and arithmetic constant expressions are restricted such that those operands are not allowed. In C11, this is found in subclause 6.6 paragraph 8.
That said, the code using the first macro (source reproduced below) does compile on multiple versions of the xlc compiler on AIX.
typedef struct{
int a;
union u
{
int x;
char y;
}xyz;
}mystruct;
static mystruct ktt;
#define OFFSET_T(b, c) ((int*)((&((mystruct*)0)->b)) - (int*)((&((mystruct*)0)->c)))
//#define OFFSET_T(b, c) ((char*) &ktt.b - (char *) &ktt.c)
static struct Mystruct1{
int n;
}mystruct1 = {OFFSET_T(xyz,a)};
The compiler invocation I used was:
xlc offsetcalc.c -c -o /dev/null
The version information for one of the older versions I tried is:
IBM XL C/C++ for AIX, V10.1
Version: 10.01.0000.0021
The version information for one of the newest versions I tried is:
IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)
Version: 13.01.0003.0004
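A portable alternative, sketched here as a suggestion rather than something tested on xlc, is to build the macro from the standard offsetof macro in <stddef.h>. offsetof expands to an integer constant expression, so the initializer no longer involves pointer operands. Note that this version counts in bytes, whereas the first macro in the question counts in units of int.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

typedef struct {
    int a;
    union u {
        int x;
        char y;
    } xyz;
} mystruct;

/* Byte distance between two members, built only from offsetof results,
 * so the whole expression is an integer constant expression. */
#define OFFSET_T(b, c) ((int)offsetof(mystruct, b) - (int)offsetof(mystruct, c))

/* Usable at compile time: xyz follows a, so the distance is at least one int. */
static_assert(OFFSET_T(xyz, a) >= (int)sizeof(int), "xyz must come after a");

static struct Mystruct1 {
    int n;
} mystruct1 = { OFFSET_T(xyz, a) };

/* Small accessor so the static object can be inspected from other code. */
int mystruct1_offset(void) {
    return mystruct1.n;
}
```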

How to use proc_pid_cmdline in kernel module

I am writing a kernel module to get the list of pids with their complete process names. proc_pid_cmdline() gives the complete process name; /proc/*/cmdline uses the same function to get it. (struct task_struct)->comm gives a hint of what the process is, but not the complete path.
I call the function by name, but I get an error because the function cannot be found.
How can I use proc_pid_cmdline() in a module?
You are not supposed to call proc_pid_cmdline().
It is a non-public function in fs/proc/base.c:
static int proc_pid_cmdline(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
However, what it does is simple:
get_cmdline(task, m->buf, PAGE_SIZE);
That is not likely to return the full path though, and it will not be possible to determine the full path in every case. The argv[0] value may be overwritten, the file could be deleted or moved, etc. A process may exec() in a way that obscures the original command line, among all kinds of other maladies.
A scan of my Fedora 20 system /proc/*/cmdline turns up all kinds of less-than-useful results:
-F
BUG:
WARNING: at
WARNING: CPU:
INFO: possible recursive locking detecte
ernel BUG at
list_del corruption
list_add corruption
do_IRQ: stack overflow:
ear stack overflow (cur:
eneral protection fault
nable to handle kernel
ouble fault:
RTNL: assertion failed
eek! page_mapcount(page) went negative!
adness at
NETDEV WATCHDOG
ysctl table check failed
: nobody cared
IRQ handler type mismatch
Machine Check Exception:
Machine check events logged
divide error:
bounds:
coprocessor segment overrun:
invalid TSS:
segment not present:
invalid opcode:
alignment check:
stack segment:
fpu exception:
simd exception:
iret exception:
/var/log/messages
--
/usr/bin/abrt-dump-oops
-xtD
I have managed to solve a version of this problem. I wanted to access the cmdline of all PIDs but within the kernel itself (as opposed to a kernel module as the question states), but perhaps these principles can be applied to kernel modules as well?
What I did was, I added the following function to fs/proc/base.c
int proc_get_cmdline(struct task_struct *task, char * buffer) {
    int i;
    int ret = proc_pid_cmdline(task, buffer);
    for(i = 0; i < ret - 1; i++) {
        if(buffer[i] == '\0')
            buffer[i] = ' ';
    }
    return 0;
}
I then added the declaration in include/linux/proc_fs.h
int proc_get_cmdline(struct task_struct *, char *);
At this point, I could access the cmdline of all processes within the kernel.
To access the task_struct, perhaps you could refer to kernel: efficient way to find task_struct by pid?.
Once you have the task_struct, you should be able to do something like:
char cmdline[256];
proc_get_cmdline(task, cmdline);
if(strlen(cmdline) > 0)
    printk(" cmdline :%s\n", cmdline);
else
    printk(" cmdline :%s\n", task->comm);
I was able to obtain the commandline of all processes this way.
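The loop in proc_get_cmdline exists because the command line is stored as a sequence of NUL-terminated arguments rather than one string. A standalone user-space C sketch of the same transformation follows; join_cmdline is a hypothetical helper name, and len is assumed to be the number of bytes obtained, including the final terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Replace the NUL separators between arguments with spaces, in place.
 * The last byte is left alone so the result stays a valid C string. */
void join_cmdline(char *buf, size_t len) {
    assert(buf != NULL);
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\0')
            buf[i] = ' ';
    }
}
```

For example, an 11-byte buffer holding the three arguments ls, -l and /tmp would become the single string "ls -l /tmp".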
To get the full path of the binary behind a process:
char *exepathp = NULL;
struct file *exe_file = NULL;
struct mm_struct *mm;
char exe_path[1000];
//straight up stolen from get_mm_exe_file
mm = get_task_mm(current); //NULL for kernel threads
if (mm) {
    down_read(&mm->mmap_sem); //lock read
    exe_file = mm->exe_file;
    if (exe_file) get_file(exe_file);
    up_read(&mm->mmap_sem); //unlock read
    mmput(mm); //drop the reference taken by get_task_mm
}
if (exe_file) {
    //reduce exe path to a string; d_path may return an ERR_PTR on failure
    exepathp = d_path(&(exe_file->f_path), exe_path, sizeof(exe_path));
    fput(exe_file); //drop the reference taken by get_file
}
Here current is the task struct for the process you are interested in. The variable exepathp gets the string of the full path. This is slightly different from the process cmd: it is the path of the binary that was loaded to start the process. Combining this path with the process cmd should give you the full path.

Memory layout mismatching between CPU and GPU code with CUDA

I'm experiencing a very weird situation. I have these template structures:
#ifdef __CUDACC__
#define __HOSTDEVICE __host__ __device__
#else
#define __HOSTDEVICE
#endif
template <typename T>
struct matrix
{
    T* ptr;
    int col_size, row_size;
    int stride;
    // some host & device methods
};
struct dummy1 {};
struct dummy2 : dummy1 {};
template <typename T>
struct a_functor : dummy2
{
    matrix<T> help_m;
    matrix<T> x, y;
    T *x_ptr, *y_ptr;
    int bsx, ind_thr;
    __HOSTDEVICE void operator()(T* __x, T* __y)
    {
        // functor code
    }
};
I've structured my code to separate cpp and cu files, so a_functor object is created in cpp file and used in a kernel function. The problem is that, executing operator() inside a kernel, I found some random behaviour I couldn't explain only looking at code. It was like my structs were sort of corrupted. So, calling a sizeof() on an a_functor object, I found:
CPU code (.cpp and .cu outside kernel): 64 bytes
GPU code (inside kernel): 68 bytes
There was obviously some kind of mismatching that ruined the whole stuff. Going further, I tracked the distance between struct parameter pointers and struct itself - to try to inspect the produced memory layout - and here's what I found:
a_functor foo;
// CPU
(char*)(&foo.help_m) - (char*)(&foo) = 0
(char*)(&foo.x) - (char*)(&foo) = 16
(char*)(&foo.y) - (char*)(&foo) = 32
(char*)(&foo.x_ptr) - (char*)(&foo) = 48
(char*)(&foo.y_ptr) - (char*)(&foo) = 52
(char*)(&foo.bsx) - (char*)(&foo) = 56
(char*)(&foo.ind_thr) - (char*)(&foo) = 60
// GPU - inside a_functor::operator(), in-kernel
(char*)(&this->help_m) - (char*)(this) = 4
(char*)(&this->x) - (char*)(this) = 20
(char*)(&this->y) - (char*)(this) = 36
(char*)(&this->x_ptr) - (char*)(this) = 52
(char*)(&this->y_ptr) - (char*)(this) = 56
(char*)(&this->bsx) - (char*)(this) = 60
(char*)(&this->ind_thr) - (char*)(this) = 64
I really can't understand why nvcc generated this memory layout for my struct (what are those 4 bytes supposed to be/do!?!). I thought it could be an alignment problem and I tried to explicitly align a_functor, but I can't because it is passed by value to the kernel
template <typename T, typename Str>
__global__ void mykernel(Str foo, T* src, T*dst);
and when I try compile I get
error: cannot pass a parameter with a too large explicit alignment to a global routine on win32 platforms
So, to solve this strange situation (...and I do think that's an nvcc bug), what should I do? The only thing I can think of is playing with alignment and passing my struct to kernel by pointer to avoid the aforementioned error. However, I'm really wondering: why that memory layout mismatching?! It really makes no sense...
Further information: I'm using Visual Studio 2008, compiling with MSVC on Windows XP 32bit platform. I installed the latest CUDA Toolkit 5.0.35. My card is a GeForce GTX 570 (compute capability 2.0).
From the comments it appears there may be differences between the code you're actually running and the code you've posted, so it's difficult to give more than vague answers without someone being able to reproduce the problem. That said, on Windows there are cases where the layout and size of a struct can differ between the CPU and the GPU; these are documented in the programming guide:
On Windows, the CUDA compiler may produce a different memory layout, compared to the host Microsoft compiler, for a C++ object of class type T that satisfies any of the following conditions:
T has virtual functions or derives from a direct or indirect base class that has virtual functions;
T has a direct or indirect virtual base class;
T has multiple inheritance with more than one direct or indirect empty base class.
The size for such an object may also be different in host and device code. As long as type T is used exclusively in host or device code, the program should work correctly. Do not pass objects of type T between host and device code (e.g., as arguments to global functions or through cudaMemcpy*() calls).
The third case may apply to you, since you have an empty base class. Do you have multiple inheritance in the real code?
