What happens when a panic hook panics? - rust

What happens if the function I passed to std::panic::set_hook panics?
I can imagine many ways of reacting to this: consider this UB, abort the program like C++ does, invoke the panic handler again for the new panic, simply abort the execution of the hook... What exactly does Rust promise here?
Context. I'm writing a web app with Rust/WASM backend and I would like to make a panic hook that sends any errors to the server for debugging. This involves a network operation, which can itself fail. So I'm trying to figure out how I can ensure some reasonable behavior in this double-failure scenario.

It's not documented outside of the source code.
The source code for the panic entry point in std has this comment:
// If this is the third nested call (e.g., panics == 2, this is 0-indexed),
// the panic hook probably triggered the last panic, otherwise the
// double-panic check would have aborted the process. In this case abort the
// process real quickly as we don't want to try calling it again as it'll
// probably just panic again.
So the answer to your question is either "invoke the panic handler again for the new panic" or "abort the program" depending on how many times the hook already panicked.
This all assumes you aren't using #![no_std]. If you are then you're either disabling panicking altogether or you are implementing your own panic handler with #[panic_handler], in which case you get to decide what happens yourself.

Related

Analyzing a crash from a rust-generated C library

I needed to call Rust code from my Go code. Then I used C as my interface. I did this:
I've created a Rust library that takes a CStr as a parameter, and returns a new processed string back as CStr.
This code is statically compiled to a static C library my_lib.a.
Then, this library is statically linked with my Go code, that then calls the library API using CGo (Go's representation to C String, just like Rusts's Cstr).
The final Go binary is sitting inside a docker container in my kubernetes. Now, my problem is that is some cases where the library's API is called, my pod (container) is crashing. Due to the nature of using CStr and his friends, I must use unsafe scopes in some places, and I highly suspect a segfault that is caused by one of the ptrs used in my code, but I have no way of communicating the error back to the Go code that could be then printed OR alternatively get some sort of a core dump from Rust/C so I can point out the problematic code. The pod just crashes with no info whatsoever, at least to my knowledge..
So my question is, how can I:
Recover from panic/crashes that happen inside an unsafe code? or maybe wrap it with a recoverable safe scope?
Override the SIG handlers so I can at least "catch" the errors and not crash? So I can debug it.
Perhaps communicate a signal interruption that was caused in my c-lib that was generated off Rust back to the caller?
I realize that once Rust is compiled to a c-library, it is a matter of C, but I have no idea how to tackle this one.
Thanks!
I've created a Rust library that takes a CStr as a parameter, and returns a new processed string back as CStr.
Neither operation seems OK:
the CStr documentation specifically notes that CStr is not repr(C)
CStr is a borrowed string, a "new processed string" would have to be owned (so a CString, which also isn't repr(C)).
Due to the nature of using CStr and his friends, I must use unsafe scopes in some places, and I highly suspect a segfault that is caused by one of the ptrs used in my code, but I have no way of communicating the error back to the Go code that could be then printed OR alternatively get some sort of a core dump from Rust/C so I can point out the problematic code. [...] Recover from panic/crashes that happen inside an unsafe code? or maybe wrap it with a recoverable safe scope?
If you're segfaulting there's no panic or crash which Rust can catch or manipulate in any way. A segfault means the OS itself makes your program go away. However you should have a core dump the usual way, this might be a configuration issue with your container thing.
Override the SIG handlers so I can at least "catch" the errors and not crash? So I can debug it.
You can certainly try to handle SIGSEGV, but after a SIGSEGV I'd expect the program state to be completely hosed, this is not an innocuous signal.

thread with a forever loop with one inherently asynch operation

I'm trying to understand the semantics of async/await in an infinitely looping worker thread started inside a windows service. I'm a newbie at this so give me some leeway here, I'm trying to understand the concept.
The worker thread will loop forever (until the service is stopped) and it processes an external queue resource (in this case a SQL Server Service Broker queue).
The worker thread uses config data which could be changed while the service is running by receiving commands on the main service thread via some kind of IPC. Ideally the worker thread should process those config changes while waiting for the external queue messages to be received. Reading from service broker is inherently asynchronous, you literally issue a "waitfor receive" TSQL statement with a receive timeout.
But I don't quite understand the flow of control I'd need to use to do that.
Let's say I used a concurrentQueue to pass config change messages from the main thread to the worker thread. Then, if I did something like...
void ProcessBrokerMessages() {
foreach (BrokerMessage m in ReadBrokerQueue()) {
ProcessMessage(m);
}
}
// ... inside the worker thread:
while (!serviceStopped) {
foreach (configChange in configChangeConcurrentQueue) {
processConfigChange(configChange);
}
ProcessBrokerMessages();
}
...then the foreach loop to process config changes and the broker processing function need to "take turns" to run. Specifically, the config-change-processing loop won't run while the potentially-long-running broker receive command is running.
My understanding is that simply turning the ProcessBrokerMessages() into an async method doesn't help me in this case (or I don't understand what will happen). To me, with my lack of understanding, the most intuitive interpretation seems to be that when I hit the async call it would go off and do its thing, and execution would continue with a restart of the outer while loop... but that would mean the loop would also execute the ProcessBrokerMessages() function over and over even though it's already running from the invocation in the previous loop, which I don't want.
As far as I know this is not what would happen, though I only "know" that because I've read something along those lines. I don't really understand it.
Arguably the existing flow of control (ie, without the async call) is OK... if config changes affect ProcessBrokerMessages() function (which they can) then the config can't be changed while the function is running anyway. But that seems like it's a point specific to this particular example. I can imagine a case where config changes are changing something else that the thread does, unrelated to the ProcessBrokerMessages() call.
Can someone improve my understanding here? What's the right way to have
a block of code which loops over multiple statements
where one (or some) but not all of those statements are asynchronous
and the async operation should only ever be executing once at a time
but execution should keep looping through the rest of the statements while the single instance of the async operation runs
and the async method should be called again in the loop if the previous invocation has completed
It seems like I could use a BackgroundWorker to run the receive statement, which flips a flag when its job is done, but it also seems weird to me to create a thread specifically for processing the external resource and then, within that thread, create a BackgroundWorker to actually do that job.
You could use a CancelationToken. Most async functions accept one as a parameter, and they cancel the call (the returned Task actually) if the token is signaled. SqlCommand.ExecuteReaderAsync (which you're likely using to issue the WAITFOR RECEIVE is no different. So:
Have a cancellation token passed to the 'execution' thread.
The settings monitor (the one responding to IPC) also has a reference to the token
When a config change occurs, the monitoring makes the config change and then signals the token
the execution thread aborts any pending WAITFOR (or any pending processing in the message processing loop actually, you should use the cancellation token everywhere). any transaction is aborted and rolled back
restart the execution thread, with new cancellation token. It will use the new config
So in this particular case I decided to go with a simpler shared state solution. This is of course a less sound solution in principle, but since there's not a lot of shared state involved, and since the overall application isn't very complicated, it seemed forgivable.
My implementation here is to use locking, but have writes to the config from the service main thread wrapped up in a Task.Run(). The reader doesn't bother with a Task since the reader is already in its own thread.

winapi apc function parameter passing - what is the best practice

Hi i using winapi's QueueUserAPC to invoke an apc function call in another thread.
my question is, what is the best practice for passing a parameter to it.
i refer to the object lifetime and allocation/deallocation responsibility.
DWORD WINAPI QueueUserAPC(PAPCFUNC pfnAPC, HANDLE hThread, ULONG_PTR dwData);
i am using the dwData to pass the parameter to pass a pointer to some data and i was wondering how i should handle it.
i need to make sure that it lives until the receiving thread finished using it.
should i use a smart pointer to make sure that data is deallocated when no longer used?
i guess that allocation in the calling thread and dealloc. in the receiving is possible but probably not such a good thing.
anything else that can be done?
i think i would like to avoid synchronization between the two only to notify that the receiving thread is done with the data...
thanks!
Alloc'ing in the sending thread and dealloc'ing in the receiving one is easy, but it has the main drawback that it may leak, even if you handle the sending failure, the receiving thread may finish before having a chance to execute the APC.
Probably your easiest way to avoid the leak is to create a queue for sent data -maybe a queue per thread- and when thread finishes, you traverse the thread queue and free all the pending data.
But as usual, the devil is in the details...

Is the first thread that gets to run inside a Win32 process the "primary thread"? Need to understand the semantics

I create a process using CreateProcess() with the CREATE_SUSPENDED and then go ahead to create a little patch of code inside the remote process to load a DLL and call a function (exported by that DLL), using VirtualAllocEx() (with ..., MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE), WriteProcessMemory(), then call FlushInstructionCache() on that patch of memory with the code.
After that I call CreateRemoteThread() to invoke that code, creating me a hRemoteThread. I have verified that the remote code works as intended. Note: this code simply returns, it does not call any APIs other than LoadLibrary() and GetProcAddress(), followed by calling the exported stub function that currently simply returns a value that will then get passed on as the exit status of the thread.
Now comes the peculiar observation: remember that the PROCESS_INFORMATION::hThread is still suspended. When I simply ignore hRemoteThread's exit code and also don't wait for it to exit, all goes "fine". The routine that calls CreateRemoteThread() returns and PROCESS_INFORMATION::hThread gets resumed and the (remote) program actually gets to run.
However, if I call WaitForSingleObject(hRemoteThread, INFINITE) or do the following (which has the same effect):
DWORD exitCode = STILL_ACTIVE;
while(STILL_ACTIVE == exitCode)
{
Sleep(500);
if(!GetExitCodeThread(hRemoteThread, &exitCode))
break;
}
followed by CloseHandle() this leads to hRemoteThread finishing before PROCESS_INFORMATION::hThread gets resumed and the process simply "disappears". It is enough to allow hRemoteThread to finish somehow without PROCESS_INFORMATION::hThread to cause the process to die.
This looks suspiciously like a race condition, since under certain circumstances hRemoteThread may still be faster and the process would likely still "disappear", even if I leave the code as is.
Does that imply that the first thread that gets to run within a process becomes automatically the primary thread and that there are special rules for that primary thread?
I was always under the impression that a process finishes when its last thread dies, not when a particular thread dies.
Also note: there is no call to ExitProcess() involved here in any way, because hRemoteThread simply returns and PROCESS_INFORMATION::hThread is still suspended when I wait for hRemoteThread to return.
This happens on Windows XP SP3, 32bit.
Edit: I have just tried Sysinternals Process Monitor to see what's happening and I could verify my observations from before. The injected code does not crash or anything, instead I get to see that if I don't wait for the thread it doesn't exit before I close the program where the code got injected. I'm thinking whether the call to CloseHandle(hRemoteThread) should be postponed or something ...
Edit+1: it's not CloseHandle(). If I leave that out just for a test, the behavior doesn't change when waiting for the thread to finish.
The first thread to run isn't special.
For example, create a console app which creates a suspended thread and terminates the original thread (by calling ExitThread). This process never terminates (on Windows 7 anyway).
Or make the new thread wait for five seconds then exit. As expected, the process will live for five seconds and exit when the secondary thread terminates.
I don't know what's happening with your example. The easiest way to avoid the race is to make the new thread resume the original thread.
Speculating now, I do wonder if what you're doing isn't likely to cause problems anyway. For example, what happens to all the DllMain calls for the implicitly loaded DLLs? Are they unexpectedly happening on the wrong thread, are they being skipped, or are they postponed until after your code has run and the main thread starts?
Odds are good that the thread with the main (or equivalent) function calls ExitProcess (either explicitly or in its runtime library). ExitProcess, well, exits the entire process, including killing all threads. Since the main thread doesn't know about your injected code, it doesn't wait for it to finish.
I don't know that there's a good way to make the main thread wait for yours to complete...

System::Threading::Mutex, called from unsynchronized block of code. Unexpected deadlock

In an attempt to rid my GUI of race conditions and deadlocks I have the following function which I call from the c'tor and whenever I need the service which shares my named mutex to provide its input:
void EnvCapt::FireServiceAndOrHold() {
try {
mutTimerSyncEx->ReleaseMutex();
Thread::Sleep(100); //Time enough for the service to complete.
if (!mutTimerSyncEx->WaitOne(3 * int_ms)) {//int_ms = the polling period
//Must've been doubly locked or worse.
mutTimerSyncEx->ReleaseMutex();
FireServiceAndOrHold();
}
} catch (Exception ^ ex) {
//Released unheld mutex. Retake control.
mutTimerSyncEx->WaitOne();
FireServiceAndOrHold();
}
}
This works relatively well but I am calling this before letting the service now I am ready to accept input so it never attempts to wait for me to release the mutex for it. Before I attempt to re-order things I would like to know what is going wrong with the above function. The error I get is:
Object synchronization method was called from an unsynchronized block of code.
Because calling release on a mutex that hasn't been WaitOne'd will throw I catch that, knowing I am free to take ownership of it and continue. But I am wrong. It hangs forever on the WaitOne() statement. I know what the other process is doing all this time because it is trapped in my second debugger window. It is not touching the mutex.
UPDATE
I've attempted the reordering I first suggested, this seemed good but now I find that the mutex is only sort of Global, despite having a Global\name.
It is shared because when my GUI c'tor's it firstInstance is false, hence I attempt to take control of it.
It is not shared because when the GUI calls WaitOne() on it the GUI blocks indefinitely. Whereas the service dances straight through its call to WaitOne() without a care in the world.
I just had an idea what might be going wrong for you there:
Hint: you cannot release a mutex on behalf of the other process! The other process will have to release the mutex if it holds it:
Process 1: Process 2:
============ =============
WaitOne (locks the mutex)
// do work WaitOne (awaits the mutex)
// do more work
// done
ReleaseMutex ------> WaitOne returns from the wait _with
the mutex locked_

Resources