Analyzing a crash from a rust-generated C library

Analyzing a crash from a rust-generated C library - rust

I needed to call Rust code from my Go code. Then I used C as my interface. I did this:
I've created a Rust library that takes a CStr as a parameter, and returns a new processed string back as CStr.
This code is statically compiled to a static C library my_lib.a.
Then, this library is statically linked with my Go code, that then calls the library API using CGo (Go's representation to C String, just like Rusts's Cstr).
The final Go binary is sitting inside a docker container in my kubernetes. Now, my problem is that is some cases where the library's API is called, my pod (container) is crashing. Due to the nature of using CStr and his friends, I must use unsafe scopes in some places, and I highly suspect a segfault that is caused by one of the ptrs used in my code, but I have no way of communicating the error back to the Go code that could be then printed OR alternatively get some sort of a core dump from Rust/C so I can point out the problematic code. The pod just crashes with no info whatsoever, at least to my knowledge..
So my question is, how can I:
Recover from panic/crashes that happen inside an unsafe code? or maybe wrap it with a recoverable safe scope?
Override the SIG handlers so I can at least "catch" the errors and not crash? So I can debug it.
Perhaps communicate a signal interruption that was caused in my c-lib that was generated off Rust back to the caller?
I realize that once Rust is compiled to a c-library, it is a matter of C, but I have no idea how to tackle this one.
Thanks!

I've created a Rust library that takes a CStr as a parameter, and returns a new processed string back as CStr.
Neither operation seems OK:
the CStr documentation specifically notes that CStr is not repr(C)
CStr is a borrowed string, a "new processed string" would have to be owned (so a CString, which also isn't repr(C)).
Due to the nature of using CStr and his friends, I must use unsafe scopes in some places, and I highly suspect a segfault that is caused by one of the ptrs used in my code, but I have no way of communicating the error back to the Go code that could be then printed OR alternatively get some sort of a core dump from Rust/C so I can point out the problematic code. [...] Recover from panic/crashes that happen inside an unsafe code? or maybe wrap it with a recoverable safe scope?
If you're segfaulting there's no panic or crash which Rust can catch or manipulate in any way. A segfault means the OS itself makes your program go away. However you should have a core dump the usual way, this might be a configuration issue with your container thing.
Override the SIG handlers so I can at least "catch" the errors and not crash? So I can debug it.
You can certainly try to handle SIGSEGV, but after a SIGSEGV I'd expect the program state to be completely hosed, this is not an innocuous signal.

Related

Are objects accessed indirectly in D?

As I've read all objects in D are fully location independent. How this requirement is achieved?
One thing that comes to my mind, is that all references are not pointers to the objects, but to some proxy, so when you move object (in memory) you just update that proxy, not all references used in program.
But this is just my guess. How it is done in D for real?

edit: bottom line up front, no proxy object, objects are referenced directly through regular pointers. /edit
structs aren't allowed to keep a pointer to themselves, so if they get copied, they should continue to just work. This isn't strictly enforced by the language though:
struct S {
S* lol;
void beBad() {
lol = &this; // this compiler will allow this....
}
}
S pain() {
S s;
s.beBad();
return s;
}
void main() {
S s;
s = pain();
assert(s.lol !is &s); // but it will also move the object without notice!
}
(EDIT: actually, I guess you could use a postblit to update internal pointers, so it isn't quite without notice. If you're careful enough, you could make it work, but then again, if you're careful enough, you can shoot between your toes without hitting your foot too. EDIT2: Actually no, the compiler/runtime is still allowed to move it without even calling the postblit. One example of where this happens is if it copies a stack frame to the heap to make a closure. The struct data is moved to a new address without being informed. So yeah. /edit)
And actually, that assert isn't guaranteed to pass, the compiler might choose to call pain straight on the local object declared in main, so the pointer would work (though I'm not able to force this optimization here for a demo, generally, when you return a struct from a function, it is actually done via a hidden pointer the caller passes - the caller says "put the return value right here" thus avoiding a copy/move in some cases).
But anyway, the point just is that the compiler is free to copy or not to copy a struct at its leisure, so if you do keep the address of this around in it, it may become invalid without notice; keeping that pointer is not a compile error, but it is undefined behavior.
The situation is different with classes. Classes are allowed to keep references to this internally since a class is (in theory, realized by the garbage collector implementation)) an independent object with an infinite lifetime. While it may be moved (such as be a moving GC (not implemented in D today)), if it is moved, all references to it, internal and external, would also be required to be updated.
So classes can't have the memory pulled out from under them like structs can (unless you the programmer take matters into your own hands and bypass the GC...)
The location independent thing I'm pretty sure is referring only to structs and only to the rule that they can't have pointers to themselves. There's no magic done with references or pointers - they indeed work with memory addresses, no proxy objects.

Assign string to zmq::message_t without copying

I need to do some high performance c++ stuff and that is why I need to avoid copying data whenever possible.
Therefore I want to directly assign a string buffer to a zmq::message_t object without copying it. But there seems to be some deallocation of the string which avoids successful sending.
Here is the piece of code:
for (pair<int, string> msg : l) {
comm_out.send_int(msg.first);
comm_out.send_int(t_id);
int size = msg.second.size();
zmq::message_t m((void *) std::move(msg.second).data(), size, NULL, NULL);
comm_out.send_frame_msg(m, false); // some zmq-wrapper class
}
How can I avoid that the string is deallocated before the message is send out? And when is the string deallocated exactly?
Regards

I think that zmq::message_t m((void *) std::move(msg.second).data()... is probably undefined behaviour, but is certainly the cause of your problem. In this instance, std::move isn't doing what I suspect you think it does.
The call to std::move is effectively creating an anonymous temporary of a string, moving the contents of msg.second into it, then passing a pointer to that temporary data into the message_t constructor. The 0MQ code assumes that pointer is valid, but the temporary object is destroyed after the constructor of message_t completes - i.e. before you call send_frame.
Zero-copy is a complicated matter in 0mq (see the 0MQ Guide) for more details, but you have to ensure that the data that hasn't been copied is valid until 0MQ tells you explicitly that it's finished with it.
Using C++ strings in this situation is hard, and requires a lot of thought. Your question about how to "avoid that the string is deallocated..." goes right to the heart of the issue. The only answer to that is "with great care".
In short, are you sure you need zero-copy at all?

Is it safe to call dlclose(NULL)?

I experience a crash when I pass a null pointer to dlclose.
Should I check for null before calling dlclose?
POSIX tells nothing about this:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/dlclose.html
Is it undefined behaviour or a bug in dlclose implementation?

This is tricky. POSIX states that
if handle does not refer to an open object, dlclose() returns a non-zero value
from which you could infer that it should detect, for an arbitrary pointer, whether that pointer refers to an open object. The Linux/Glibc version apparently does no such check, so you'll want to check for NULL yourself.
[Aside, the Linux manpage isn't very helpful either. It's quite implicit about libdl functions' behavior, deferring to POSIX without very clearly claiming conformance.]

It also says nothing about being accepting a NULL and not crashing. We can assume from your test that it doesn't do an explicit NULL check, and it does need to use the pointer somehow to perform the closing action … so there you have it.

Following the malloc/free convention (1), it is a bug. If it follows the fopen/fclose convention (2) it is not. So if there is a bug, it is in the standard because it lacks convention for dealing with zombies.
Convention (1) works well with C++11 move semantics
Convention (2) leaves more responsibility to the caller. In particular, a dtor must explicitly check for null if a move operation has been done.
I think this is something that should be revised for an upcoming POSIX revision in order to avoid confusion.
Update
I found from this answer https://stackoverflow.com/a/6277781/877329, and then reading man pthread_join that you can call pthread_join with an invalid tid, supporting the malloc/free convension. Another issue I have found with the dynamic loader interface is that it does not use the standard error handling system, but has its own dlerrorfunction.

Multithreading (pthreads)

I'm working on a project where I need to make a program run on multiple threads. However, I'm running into a bit of an issue.
In my program, I have an accessory function called 'func_call'.
If I use this in my code:
func_call((void*) &my_pixels);
The program runs fine.
However, if I try to create a thread, and then run the function on that, the program runs into a segmentation fault.
pthread_t thread;
pthread_create (&thread, NULL, (void*)&func_call, (void*) &my_pixels);
I've included pthread.h in my program. Any ideas what might be wrong?

You are not handling data in a thread safe manner:
the thread copies data from the thread argument, which is a pointer to the main thread's my_pixels variable; the main thread may exit, making my_pixles invalid.
the thread uses scene, main thread calls free_scene() on it, which I imagine makes it invalid
the thread calls printf(), the main thread closes stdout (kind of unusual itself)
the thread updates the picture array, the main thread accesses picture to output data from it
It looks like you should just wait for the thread to finish its work after creating it - call pthread_join() to do that.
For a single thread, that would seem to be pointless (you've just turned a multi-threaded program into a single threaded program). But on the basis of code that's commented out, it looks like you're planning to start up several threads that work on chunks of the data. So, when you get to the point of trying that again, make sure you join all the threads you start. As long as the threads don't modify the same data, it'll work. Note that you'll need to use separate my_pixels instances for each thread (make an array of them, just like you did with pthreads), or some threads will likely get parameters that are intended for a different thread.

Without knowing what func_call does, it is difficult to give you an answer. Nevertheless, here are few possibilities
Does func_call use some sort of a global state - check if that is initialized properly from within the thread. The order of execution of threads is not always the same for every execution
Not knowing your operating system (AIX /Linux/Solaris etc) it is difficult to answer this, but please check your compilation options
Please provide the signal trapped and atleast a few lines of the stack-trace - for all the threads. One thing you can check for yourself is to print the threads' stack-track (using threads/thread or pthread and thread current <x> based on the debugger) and and if there is a common data that is being accessed. It is most likely that the segfault occurred when two threads were trying to read off the other's (uncommitted) change
Hope that helps.
Edit:
After checking your code, I think the problem is the global picture array. You seem to be modifying that in the thread function without any guards. You loop using px and py and all the threads will have the same px and py and will try to write into the picture array at the same time. Please try to modify your code to prevent multiple threads from stepping on each other's data modifications.

Is func_call a function, or a function pointer? If it's a function pointer, there is your problem: you took the address of a function pointer and then cast it.
People are guessing because you've provided only a fraction of the program, which mentions names like func_call with no declaration in scope.
Your compiler must be giving you diagnostics about this program, because you're passing a (void *) expression to a function pointer parameter.
Define your thread function in a way that is compatible with pthread_create, and then just call it without any casts.

Is synchronization needed inside FinalConstruct()/FinalRelease()?

In my free-threaded in-proc COM object using ATL I want to add a member variable that will be set only in FinalConstruct() and read only in FinalRelease(). No other code will ever manipulate that member variable.
I doubt whether I need synchronization when accessing that member variable. I carefully read ATL sources and looks like those methods are always called no more than once and therefore from one thread only.
Is that correct assumption? Can I omit synchronization?

Yes, the assumption is correct. Think of it as an extension of the C++ constructor and destructor. In theory, you could call a method on a COM object from a different thread, while FinalRelease() is executing. Although, that is undefined behaviour, and not an expected occurrence. You shouldn't try to protect yourself from it, just as you wouldn't try to protect yourself from other threads in a destructor. If you have to protect yourself in the destructor, the design is generally broken (it would indicate that you do not have a proper termination protocol between your threads).
The only way FinalRelease() could be called from another thread, is when the client code does not have a valid reference count to your object, or if some other parts of your code is releasing twice. This is a hard error, and will probably end up in a crash anyway, totally unrelated to any synchronization errors you might have. The ATL code for managing object reference count is thread safe, and will not leave any race conditions open.
As for FinalConstruct(), a reference to the object is not returned to any client before FinalConstruct() has finished with a successful return code.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string