Error report to external BMC inside interrupt handler - linux

We have a system that is monitored by an external Baseboard Management Controller (BMC). When a critical error occurs in the system, the error should be logged and sent to the external BMC. Sending the error message to the BMC may take a long time, because we need to compose the log entry and send the event out over the I2C bus. The error is captured inside the interrupt handler, which must process the event quickly and without blocking. On the other hand, if the error is non-recoverable, the system may reboot immediately.
Could you please recommend a good way to handle error reporting inside the interrupt handler, or is there a standard way to do this? Any suggestions are appreciated. Thanks in advance.

There is no good way.
If your BMC communications sleep, you cannot do them from inside the interrupt handler and must move them to a workqueue.
If your system reboots immediately after the interrupt handler, you cannot communicate with the BMC.
If your interrupt handler actually knows that the system will reboot, then you could change the I²C driver to add some method to send data from inside an interrupt handler, by busy-polling instead of sleeping.
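
As a rough illustration of the first point, here is a userspace C sketch of the usual split: the handler only copies a small fixed-size record into a ring buffer, and the slow part (composing the log entry and sending it over I2C) runs later from a context that is allowed to sleep. The names err_event, err_queue, evq_push, evq_pop and evq_drain are all made up for this example, and this is a model of the pattern rather than driver code. In a real kernel driver the drain step would run from a work item (for example one queued with schedule_work()), and you would probably use the kernel's own kfifo instead of a hand-rolled queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVQ_SIZE 8u /* number of slots; must be a power of two */
static_assert((EVQ_SIZE & (EVQ_SIZE - 1u)) == 0, "EVQ_SIZE must be a power of two");

/* Hypothetical error record; a real one would carry whatever the BMC log needs. */
struct err_event {
    uint32_t code;
    uint32_t detail;
};

/* Single-producer / single-consumer ring buffer. */
struct err_queue {
    struct err_event slots[EVQ_SIZE];
    atomic_uint head; /* advanced only by the producer (the handler) */
    atomic_uint tail; /* advanced only by the consumer (the worker) */
};

/* Handler side: constant time, never waits. Returns false if the queue is full. */
bool evq_push(struct err_queue *q, struct err_event ev)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == EVQ_SIZE)
        return false; /* full: drop the event rather than block */

    q->slots[head % EVQ_SIZE] = ev;
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Worker side: fetch the oldest event. Returns false if the queue is empty. */
bool evq_pop(struct err_queue *q, struct err_event *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail)
        return false;

    *out = q->slots[tail % EVQ_SIZE];
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}

/* Stand-in for the slow, possibly sleeping BMC transfer; returns 0 on success. */
typedef int (*send_fn)(const struct err_event *ev);

/* Worker side: send everything that is queued; returns how many sends succeeded.
 * An event whose send fails is dropped here, which a real driver may not want. */
size_t evq_drain(struct err_queue *q, send_fn send)
{
    struct err_event ev;
    size_t sent = 0;

    while (evq_pop(q, &ev)) {
        if (send(&ev) == 0)
            sent++;
    }
    return sent;
}
```

The handler would call evq_push() and then kick the worker; the worker calls evq_drain() with the function that actually talks to the BMC. Note that this does nothing for the reboot case: if the system goes down before the worker runs, the queued events are lost, which is why the busy-polling route is the only option there.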

Related

Is it possible to trigger service bus processors when the client object is disposed?

I am calling ServiceBusClient.DisposeAsync to dispose the client object. However, the processors created from this object then start throwing an exception saying the object is disposed and they cannot listen anymore. Is there any way to trigger automatic closure of the processors when dispose is called? Or should I keep track of all the processors created from this client and stop them from listening myself?
The short answer is no; there is intentionally no "stop processing on dispose" behavior for the processor. Your application is responsible for calling StopProcessingAsync.
More context:
The ServiceBusClient owns the connection shared by all child objects spawned from it. Closing/disposing the client will effectively dispose its children. If you attempt to invoke a service operation at that point, an error is triggered.
In the case of a processor, the application holds responsibility for calling start and stop. Because the processor is designed to be resilient, its goal is to continue to recover in the face of failures and keep trying to make forward progress until stop is called.
While it's true that the processor does understand that a disposed set of network resources is terminal, it has no way to understand your intent. Did your application close the ServiceBusClient with the intent that it would stop the associated processors? Did it close the client without realizing that there were processors still running?
Because the intent is ambiguous, we have to take the safer path. The processors will continue to run because they'll surface the exceptions to your application's error handler - which ensures that your application is made aware that there is an issue and allows you to respond in the way best for your application's needs.
On the other hand, if processing just stopped and you did not intend for that to happen, it would be much harder for your application to detect and remediate. You would just know that the processor stopped doing its job and there's a good chance that you wouldn't be able to understand why without a good deal of investigation.

When a Node.js process can exit() directly without triggering other signal events (uncaughtException, SIGINT, SIGTERM...)

I work on an application with a Redis data store. To maintain data integrity I would like to clean up the store on process shutdown.
I created a handler function and bound it (with process.on()) to these events: uncaughtException, SIGINT, SIGTERM, SIGQUIT, and it works well (CTRL+C, process killing, exception, ...).
But I've read that in certain conditions the process can exit directly without triggering the other signal events.
The problem in this particular case is that process.on('exit') handler can only process synchronous tasks.
I ran different tests to try to kill the process in different ways.
And (except with SIGTERM on Windows) I wasn't able to identify the case where process.on('exit') is triggered directly without SIGINT, SIGTERM or other event firing.
So my question is (on a Linux system): under what conditions can the process exit directly without firing one of these events: http://nodejs.org/api/all.html#all_signal_events ?
As of now, from reading the documentation and doing some research, it seems there are only four ways a Node.js app exits:
process.exit(), which is handled by process.on('exit')
Receiving a *nix signal, which can be handled by process.on('signal_name')
Having an exception propagate back to the event loop, handled by process.on('uncaughtException')
The computer being unplugged or destroyed, the Node.js binary blowing up, or a SIGKILL/kill -9; there is no way to handle any of these.
It often happens that someone doesn't understand the error message from an uncaughtException and mistakenly believes it was "something else" that killed Node.js.
Indeed. I just meant to point out that Node programs can exit as a result of a signal without having executed a handler (if none was registered). Rereading your post, I see that may be what you meant as well.

Trying to send a packet to a terminated process, by using netlink

I am writing a Linux kernel module which exchanges data with a user process.
The system may crash if the module tries to send data to a user process that has already terminated. To avoid a crash, we can use the kill() function to check whether the pid is still alive, but that will not work every time, because the user process may exit after kill() has returned successfully. Is there any signal or signal-handling mechanism that can handle this situation?
If yes, then please explain this.
Thanks..

How to handle ADO fatal network errors without interrupting threads' execution?

NOTE: A question similar to How to handle TIdHTTP fatal network errors without interrupting thread's execution? but handling TADOConnection fatal network errors.
I would greatly appreciate your suggestions on handling fatal network errors raised by TADOConnection inside TThread's Execute procedure.
My app runs a while..do loop inside the Execute procedure. Each loop makes a database update via TADOConnection.Execute(). All loop exceptions are handled at the loop level. There is also an upper level handler of on E: Exception do (level of Execute).
During active network operations a fatal network error occurs on the remote database server. Something goes wrong, and in the middle of the loop the threads report that they cannot find a path to the database server (Named Pipes Provider error, EOleException, etc.). The network comes back within 10 minutes.
I want to make sure that in case of a sudden network death, each thread somehow:
Detects that (well, that's the easiest).
Smokes a joint and waits until the network comes back.
Makes sure the network is up.
Re-establishes connection and restores all network hood.
Continues the loop.
The desired result is to keep the thread up and running and just survive the temporary network problem. The threads must periodically check whether the network is back. When it is, the exception handler must restore all ADO hood by calling the thread's RestoreNetworkConnection and then continue the loop.
What I definitely don't want is to stop the threads' execution.
Questions
Which events/exceptions shall I intercept to handle fatal network errors?
Where to place exception handlers within Execute?
What is the correct way of re-establishing the connection and getting back to a normal state?

why does the irqaction structure in linux kernel refer to a list of interrupt handlers

I was trying to understand the Linux interrupt handling mechanism, but I am currently stuck on a very basic question: why does the 'irqaction' structure point to a list of interrupt handlers? Isn't the do_IRQ function invoked for an interrupt from a particular source (device)? Then shouldn't only the interrupt handler registered by that device be invoked?
