What is SEGV_MAPERR? - linux

What is SEGV_MAPERR, why does it always come up with SIGSEGV?

There are two common kinds of SEGV, which is an error that results from an invalid memory access:
A page was accessed which had the wrong permissions. E.g., it was read-only but your code tried to write to it. This will be reported as SEGV_ACCERR.
A page was accessed that is not even mapped into the address space of the application at all. This will often result from dereferencing a null pointer or a pointer that was corrupted with a small integer value. This is reported as SEGV_MAPERR.
Documentation of a sort (indexed Linux source code) for SEGV_MAPERR is here: https://elixir.bootlin.com/linux/latest/A/ident/SEGV_MAPERR.
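To see the two codes side by side, here is a minimal Linux sketch (not taken from the answer above). The helper name provoke_fault and the enum are made up for illustration. It forks a child, installs a SIGSEGV handler with SA_SIGINFO, triggers one of the two faults, and hands the si_code back to the parent through the exit status.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names, used only for this illustration. */
enum fault_kind { FAULT_UNMAPPED, FAULT_READONLY };

/* Handler: report si_code to the parent via the exit status. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    _exit(info->si_code);
}

/* Provoke a fault in a child process; return its si_code, or -1 on failure. */
int provoke_fault(enum fault_kind kind)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        volatile char *p;
        if (kind == FAULT_UNMAPPED) {
            p = NULL;                       /* page 0 is not mapped */
        } else {
            /* Mapped page, but without write permission. */
            p = mmap(NULL, 4096, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                _exit(127);
        }
        *p = 1;                             /* faults in both cases */
        _exit(126);                         /* not reached if the fault fires */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling provoke_fault(FAULT_UNMAPPED) should give SEGV_MAPERR, and provoke_fault(FAULT_READONLY) should give SEGV_ACCERR.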

It's a segmentation fault. Most probably a dangling pointer issue, or some sort of buffer overflow.
SIGSEGV is the signal delivered to the process when a segmentation fault occurs; by default it terminates the process.
Check for dangling pointers as well as the overflow issue.
Enabling core dumps will help you determine the problem.
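If you would rather enable core dumps from inside the program than with ulimit -c in the shell, something like the following should work on Linux. The function name enable_core_dumps is made up for this sketch. It only raises the soft limit up to the hard limit, so if the hard limit is 0 you still get no dump, and where the file ends up depends on the system's core_pattern setting.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    lim.rlim_cur = lim.rlim_max;   /* soft limit may always go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Call it early in main(), before the code that crashes.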

Related

How do you determine where a segfault occurred when ip is (null)?

segfault at 0 ip (null) sp bf9ed55c error 4 in appname[8048000+252000]
If I don't have the instruction pointer (IP) address, how do I determine where the crash occurred? Does it being (null) mean anything useful?
In appname[8048000+262000], the sum is 0x82AA000. Is that supposed to give a clue? Is 0x82AA000 the value I should try to use? Neither the nm output nor the map file gives much help on that.
Things that could set the instruction pointer to NULL:
branch to NULL
call NULL
return to NULL
In the first two cases, the stack is still in the state of the frame where the branch or call came from. In the last case, probably something has clobbered the return address on the stack of the previous function, and it may not be clear what the stack should have been, but usually it's still possible to find some earlier stack frame and try reconstructing what happened from there on.
Binaries may be loaded at different addresses. appname[8048000+252000] describes a segment of the file appname that was mapped into memory at addresses 8048000-82aa000; it doesn't pinpoint where the fault was.
You will have better luck debugging with a core dump, which would contain details about the state (such as the other registers) and what was mapped into memory where (including the contents of the stack). If you are using systemd, coredumps are stored in the journal and can be retrieved with coredumpctl (for example, start debugging the last crash with coredumpctl gdb).
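The bracketed part of the log line is just a hexadecimal base and size, so the end of the mapping is base + size. For the sum worked out in the question, 0x8048000 + 0x262000 = 0x82aa000. A small sketch of pulling those two fields out of a line; parse_segment is a hypothetical helper, not a kernel or libc function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract base and size from the trailing "name[base+size]" of a segfault
 * log line. Both fields are hexadecimal. Returns 0 on success, -1 otherwise. */
int parse_segment(const char *line, unsigned long *base, unsigned long *size)
{
    const char *open = strrchr(line, '[');   /* last '[' on the line */

    if (open == NULL)
        return -1;
    if (sscanf(open, "[%lx+%lx", base, size) != 2)
        return -1;
    return 0;
}
```

The mapping then covers base up to (but not including) base + size. A faulting address inside that range minus base gives an offset into the mapped segment, which is the number to compare against nm or map-file output for a position-independent binary.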

How can I check for a malloc() failure within a CUDA kernel? [closed]

This is a fairly self-explanatory question. Some background info is appended.
How can I check for a malloc() failure within a CUDA kernel? I googled this and found nothing on what malloc() returns in a CUDA implementation.
In addition, I have no idea how to signal back to the host that there was an error within a CUDA kernel. How can I do this?
I thought one way would be to send an array of chars, one element for each kernel thread, and have the kernel place a 0x01 to signal an error and 0x00 for no error. Then the host could copy this memory back and check for any non zero bytes?
But this seems like a waste of memory. Is there a better way? Something like cudaThrowError()? ... maybe? ...
Appended:
I am running into trouble with a CUDA error: GPUassert: the launch timed out and was terminated main.cu
If you google this, you will find info for Linux users (who have hybrid graphics solutions) - the fix is sometimes to run with optirun --no-xorg.
However in my case this isn't working.
If I run my program for a small enough data set, I get no errors. For a large enough data set, but not too large, I have to prevent time out errors by passing the --no-xorg flag. For an even larger dataset I get timeout errors regardless of the --no-xorg flag.
This hints to me that perhaps something else is going wrong?
Perhaps a malloc() failure within my kernel if I run out of memory?
I have checked my code and estimated memory usage - I don't think this is the problem, but I would like to check anyway.
How can I check for a malloc() failure within a CUDA kernel?
The behavior is the same as malloc on the host. If a malloc failure occurs, the returned pointer will be NULL.
So check for NULL after a malloc, and do something to address it:
#include <assert.h>
...
int *data;
data = (int *)malloc(dsize*sizeof(int));
assert(data != NULL);
...rest of your code...
Notes:
It's legal to use assert in-kernel this way. If the assert is hit, your kernel will halt and return an error to the host, which you can observe with proper CUDA error checking or cuda-memcheck. This isn't the only possible way to handle a malloc failure; it's just a suggestion.
This may or may not be the problem with your actual code. This is good practice, however.

Are 32 bit pointers valid in 64 bit process?

Here is my issue. I implemented a Win 7 x64 process talking to an x32 process by following this link
The x64 process successfully retrieves an x32 pointer (p_x32 below) to myClass:
myClass * POINTER_32 p = (myClass * POINTER_32)p_x32;
The trouble is that calling a method on 'p' crashes with a memory access violation. Indeed, under the VS debugger I can see that the members of 'p' are not in order, i.e. the values are bogus. Digging further, I found this link,
where the author says: "A handle or pointer cannot be serialized, it is only valid in the process that created it". As shown above, the pointer apparently can be serialized (I used INT_PTR), but I wonder whether the "it is only valid in the process that created it" part is correct.
Thanks in advance.
The documentation is correct - the pointer is only valid in the originating process because the pointers are interpreted relative to the process's memory space. Therefore you cannot pass a pointer between processes and dereference it. You must serialize (deep copy) the actual data and transfer it to the other process.
The exception is that you can set up specific "shared memory" regions between processes on Windows. Even then, though, the pointer values themselves are not guaranteed to be identical.
Each process has its own virtual memory address space. If you pass pointers across process boundaries, they have a completely different meaning in the target process.
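The "own address space" point is easy to demonstrate. The sketch below uses POSIX fork() rather than the Windows setup from the question, purely because it is the shortest way to get two processes holding the same pointer value; the function name is made up. The child writes through the very same address, yet the parent's data does not change.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's view of `value` after a child process has written
 * to the same virtual address, or -1 if fork() fails. */
int value_after_child_write(void)
{
    int value = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 2;          /* same address, but the child's private copy */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    return value;           /* the parent's copy is untouched */
}
```

An address only selects a location within one process's mapping. Between two unrelated processes, as in the question, the same number will usually not even point at the same kind of data.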

MOVDQU instruction + page boundary

I have a simple test program that loads an xmm register with the
movdqu instruction accessing data across a page boundary (OS = Linux).
If the following page is mapped, this works just fine. If it's not
mapped then I get a SIGSEGV, which is probably expected.
However this diminishes the usefulness of the unaligned loads quite
a bit. Additionally SSE4.2 instructions (like pcmpistri) which
allow for unaligned memory references appear to exhibit this behavior
as well.
That's all fine -- except there's many an implementation of strcmp
using pcmpistri that I've found that don't seem to address this issue
at all -- and I've been able to contrive trivial testcases that will
cause these implementations to fail, while the byte-at-a-time trivial
strcmp implementation will work just fine with the same data layout.
One more note -- it appears that the GNU C library implementation for
64-bit Linux has a __strcmp_sse42 variant that appears to use the
pcmpistri instruction in a safer manner. The implementation of
this strcmp is fairly complex, but it appears to be carefully trying
to avoid the page boundary issue. I'm not sure if that's due to the
issue I describe above, or whether it's just a side-effect of trying to
get better performance by aligning the data.
Anyway the question I have is primarily -- where can I find out more
about this issue? I've typed in "movdqu crossing page boundary" and
every variant of that I can think of to Google, but haven't come across
anything particularly useful. If anyone can point me to further info
on this it would be greatly appreciated.
First, any algorithm which tries to access an unmapped address will cause a SegFault. If a non-AVX code flow used a 4 byte load to access the last byte of a page and the first 3 bytes of "the next page" which happened to not be mapped then it would also cause a SegFault. No? I believe that the "issue" is that the AVX(1/2/3) registers are so much bigger than "typical" that algorithms which were unsafe (but got away with it) get caught if they are trivially extended to the larger registers.
Aligned loads (MOVDQA) can never have this problem since they don't cross any boundaries of their own size or greater. Unaligned loads CAN have this problem (as you've noted) and "often" do. The reason for this is that the instruction is defined to load the full size of the target register. You need to look at the operand types in the instruction definitions quite carefully. It doesn't matter how much of the data you are interested in. It matters what the instruction is defined to do.
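Whether a particular load can hit the problem is simple arithmetic on the address: take the offset within the page, add the access width, and see whether it runs past the end of the page. A rough sketch, with a made-up function name and the page size passed in as a parameter (4096 is typical on x86 Linux, but that is an assumption):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a load of `width` bytes starting at `p` touches two pages,
 * 0 otherwise. `page_size` must be a power of two. */
int load_crosses_page(const void *p, size_t width, size_t page_size)
{
    size_t offset = (size_t)((uintptr_t)p & (page_size - 1));

    return offset + width > page_size;
}
```

For a 16-byte load and 4 KiB pages, only the last 15 starting offsets in a page (4081 through 4095) cross into the next page. A 16-byte-aligned address never does.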
However...
AVX1 (Sandy Bridge) added a "masked move" capability which is slower than a movdqa or movdqu but will not (architecturally) access the unmapped page so long as the mask is not enabled for the portion of the access which would have fallen in that page. This is meant to address the issue. In general, moving forward, it appears that masked portions (see AVX-512) of loads/stores will not cause access violations on IA either.
(It is a bummer about PCMPxSTRx behavior. Perhaps you could add 15 bytes of padding to your "string" objects?)
Facing a similar problem with a library I was writing, I got some information from a very helpful contributor.
The core of the idea is to align the 16-byte reads to the end of the string, then handle the leftover bytes at the beginning. This works because the end of the string must live in an accessible page, and you are guaranteed that the 16-byte truncated starting address must also live in an accessible page.
Since we never read past the string we cannot potentially stray into a protected page.
To handle the initial set of bytes, I chose to use the PCMPxSTRM functions, which return the bitmask of matching bytes. Then it's simply a matter of shifting the result to ignore any mask bits that occur before the true beginning of the string.
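Here is one way to read that idea, written as a scalar strlen so it stays portable. This is my own sketch, not the contributor's code: zero_mask16 stands in for the bitmask a PCMPxSTRM-style compare would return, and all names are hypothetical. Every read is a whole 16-byte-aligned block, and the mask from the first block is shifted to drop the bytes that come before the string. Note that, like the SIMD version, it reads bytes outside the string itself (though never outside an aligned block the string occupies), so in strict C the string should sit inside a larger buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for a SIMD compare: bit i is set when block[i] == 0. */
static unsigned zero_mask16(const unsigned char *block)
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        if (block[i] == 0)
            mask |= 1u << i;
    return mask;
}

/* Index of the lowest set bit; mask must be nonzero. */
static size_t lowest_set_bit(unsigned mask)
{
    size_t i = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* strlen that only ever reads whole 16-byte-aligned blocks. */
size_t aligned_block_strlen(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    unsigned lead = (unsigned)(addr & 15);   /* bytes before s in its block */
    const unsigned char *block = (const unsigned char *)(addr - lead);

    /* First block: shift out the mask bits for bytes that precede s. */
    unsigned mask = zero_mask16(block) >> lead;
    if (mask != 0)
        return lowest_set_bit(mask);

    size_t len = 16 - lead;                  /* bytes of s seen so far */
    for (;;) {
        block += 16;
        mask = zero_mask16(block);
        if (mask != 0)
            return len + lowest_set_bit(mask);
        len += 16;
    }
}
```

Because an aligned 16-byte block can never straddle a page, and every block read contains at least one byte of the string (or its terminator), no read can land in a page the string does not already occupy.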

Kernel Panic -- Failed copy_from_user, kmalloc?

I am writing a rootkit for my OS class (the teacher is okay with me asking for help here). My rootkit hooks the sys_read system call to hide "magic" ports from the user. When I copy the user buffer *buf (one of the arguments of sys_read) to kernel space (into a buffer called kbuf), I get a kernel panic/core dump error. It is possible that this is just because breaking read brings the system to a halt, but I wonder if anyone has any perspective on this.
The code is available online. Look at line 207: https://github.com/joshimhoff/toykit/blob/master/toykit.c
I hooked getdents and used copy_from_user to bring the getdents structs into kernel space, and this worked well! I am not sure what is different about read.
Thanks for the help!
I figured it out. I called the actual sys_read function and didn't check the return value. Sometimes it is negative to indicate an error. Instead of failing early, I asked kmalloc for a negative number of bytes.
Imagine that. Allocating negative memory. That would be a crazy world.
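For anyone who hits the same thing: a negative return value converted to an unsigned size turns into an enormous request. Below is a user-space analog of the fix, using malloc in place of kmalloc so it can be run anywhere. The function name is made up, and this is not the code from the linked repository.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Allocate a buffer for the result of a read-style call.
 * `ret` is the call's return value: a byte count, or a negative error code.
 * Returns NULL when there is nothing valid to allocate for. */
void *alloc_for_read_result(ssize_t ret)
{
    if (ret <= 0)
        return NULL;        /* error code or empty read: not a usable length */

    return malloc((size_t)ret);
}
```

Without the check, a return of -14 (EFAULT) becomes (size_t)-14, which is just short of SIZE_MAX.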

Resources