While shutting down QNX Neutrino using phshutdown (either reboot or shutdown), the system hangs while killing message queues (mqueue). The message displayed on screen is:
Shutting down service providers(mqueue)
What could be the reason for this?
This happens from time to time when you issue shutdown from the command line as well.
Some of the reasons I've seen on the web are:
Hardware issue
Driver issue
Kernel told to shut down when it didn't want to
From what I've cobbled together (and this is by no means definitive, but seems to be plausible), basically, any program that is waiting for the hardware or OS to reply has a chance of hanging the shutdown if the thing it is waiting on gets killed before it does.
A possible mitigation is to slay all your apps/servers (especially those touching hardware devices or shared memory queues) prior to issuing a shutdown, wait for a second or two, then go ahead with your shutdown.
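The mitigation above can be sketched in C using only POSIX calls. This is a rough illustration, not a tested QNX recipe: the names terminate_all and quiesce_before_shutdown are hypothetical, the list of PIDs is assumed to come from your own bookkeeping, and the shutdown command itself is left to the caller.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Send SIGTERM to each PID in the list and return how many signals
 * were sent successfully. Non-positive PIDs are skipped on purpose:
 * kill(0, ...) and kill(-1, ...) would signal whole groups of processes. */
size_t terminate_all(const pid_t *pids, size_t count)
{
    size_t sent = 0;
    for (size_t i = 0; i < count; i++) {
        if (pids[i] > 0 && kill(pids[i], SIGTERM) == 0)
            sent++;
    }
    return sent;
}

/* Stop the given apps/servers, then wait a moment so they can release
 * hardware devices and queues before the caller issues the shutdown. */
void quiesce_before_shutdown(const pid_t *pids, size_t count, unsigned grace_s)
{
    terminate_all(pids, count);
    sleep(grace_s); /* "wait for a second or two" from the answer above */
    /* The caller then runs its usual shutdown command. */
}
```

A grace period of one or two seconds matches the suggestion above; processes that ignore SIGTERM would need a stronger signal, which this sketch does not attempt.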
Team, I need to know how the gcloud compute instances stop [instance name] command works internally.
Does it perform a graceful or a non-graceful shutdown?
Also, is there any way to do a non-graceful (VM) shutdown via the CLI?
That command can be considered a graceful shutdown if the VM responds appropriately. If it doesn't respond within a given time frame, it is forced to shut down. I don't think there is a way to do a non-graceful, forced shutdown without attempting an ACPI signal first.
Stopping a VM causes Compute Engine to send the ACPI shutdown signal to the VM. Modern guest operating systems (OS) are configured to perform a clean shutdown before powering off in response to the power off signal. Compute Engine waits a short time for the guest OS to finish shutting down and then transitions the VM to the TERMINATED state.
https://cloud.google.com/compute/docs/instances/stop-start-instance
https://cloud.google.com/sdk/gcloud/reference/compute/instances/stop
I am using the Linux watchdog driver /dev/watchdog on an embedded Linux system with BusyBox as user space tools. I want to trigger the watchdog from C/C++ code, which works fine for timeouts up to 60s:
watchdogFD = open( "/dev/watchdog", O_WRONLY );
int timeout = 60;
ioctl( watchdogFD, WDIOC_SETTIMEOUT, &timeout );
However, for larger intervals the timeout is accepted, but the watchdog still triggers after 60s.
The Linux watchdog daemon offers a --force parameter to set timeouts larger than 60s (see https://linux.die.net/man/8/watchdog). However, the BusyBox watchdog daemon does not offer this (see https://git.busybox.net/busybox/tree/miscutils/watchdog.c?id=1572f520ccfe9e45d4cb9b18bb7b728eb2bbb571).
Does anyone have a suggestion on how to get the effect of the --force option when controlling the watchdog using ioctl? Thanks :)
It seems the busybox watchdog daemon you link to is very simple compared to the usual Linux one from here:
https://sourceforge.net/p/watchdog/code/ci/master/tree/
The --force option for the Linux daemon (above) is to override the sanity checks on the polling interval versus the hardware time-out used. It will not change any limits a specific hardware driver/timer has to offer.
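To see what limit a particular driver applies, one option is to read the timeout back after setting it. The sketch below assumes a driver that implements the standard WDIOC_SETTIMEOUT and WDIOC_GETTIMEOUT ioctls from linux/watchdog.h; the function name set_and_verify_timeout is made up for illustration, and not every driver reports its clamped value.

```c
#include <linux/watchdog.h>
#include <sys/ioctl.h>

/* Request a timeout (in seconds) and report the value the driver says it
 * is using. Returns 0 on success, -1 if either ioctl fails; *applied is
 * only written on success. */
int set_and_verify_timeout(int fd, int requested, int *applied)
{
    int t = requested;

    /* Drivers may round or clamp the value and write it back into t. */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &t) < 0)
        return -1;

    /* Ask again explicitly, in case the driver did not update t above. */
    if (ioctl(fd, WDIOC_GETTIMEOUT, &t) < 0)
        return -1;

    *applied = t;
    return 0;
}
```

If the applied value comes back smaller than the requested one, the limit is in the driver or hardware timer, and no daemon option will raise it.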
Typically the choice of hardware time-out is in the 10-60 second range, depending on how long you can tolerate a major fault (like a kernel panic) persisting. The watchdog daemon that feeds the timer then has to poll at an interval that is at least a few seconds shorter, so nothing times out unexpectedly. Between polls it uses nanosleep(), so it gives up its CPU time, and the system load for the daemon is proportional to the polling rate and the type of tests that are run.
Without any tests all you protect against is a major fault killing either the daemon or kernel, so usually you should be checking for something else that is essential for normal operations (e.g. a specific process being alive, files being updated, test script can be run, etc) to get the most benefit.
My question is related to knowledge on embedded Linux.
I just observed a strange reboot on my embedded project, which is very easy to reproduce.
When a certain condition is triggered, the system appears to freeze, as if it had hit an infinite loop or a deadlock. After several seconds, the system quietly reboots, without even a core dump!
I don't have much of a clue about the cause. In general, can a lock-up or an infinite loop really trigger a Linux reboot? Or is there anything else that can freeze the system and cause a reboot with no core dump?
It is common on embedded systems to have a hardware watchdog; a timer implemented in hardware that resets the processor if it is allowed to expire.
Typically some software monitoring task continuously verifies the integrity of the system and restarts the hardware watchdog timer. If the monitoring task fails to run and the watchdog timer expires, the watchdog triggers a processor reset directly.
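One way to check whether the watchdog was behind a silent reboot is to ask the driver for its boot status. The sketch below assumes a Linux watchdog driver that supports WDIOC_GETBOOTSTATUS and sets WDIOF_CARDRESET after a watchdog reset; many drivers do not, and the helper names are made up for illustration. Note that opening /dev/watchdog normally arms the timer, so on a real system this query belongs in the process that also feeds the watchdog.

```c
#include <linux/watchdog.h>
#include <stdbool.h>
#include <sys/ioctl.h>

/* Fetch the boot status flags from an already opened watchdog device.
 * Returns 0 on success, -1 on failure (for example, unsupported ioctl). */
int read_boot_status(int fd, int *boot_status)
{
    return ioctl(fd, WDIOC_GETBOOTSTATUS, boot_status);
}

/* True if the flags say the previous reset was caused by the watchdog. */
bool flags_indicate_watchdog_reset(int boot_status)
{
    return (boot_status & WDIOF_CARDRESET) != 0;
}
```

Logging the result early at boot gives a quick hint on whether to look for a starved monitoring task rather than a crash.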
Your question is a bit hard to understand, but yes, an "infinite loop" (that is the proper term) in any application on any platform (including Linux) can crash a system. This can happen because an infinite loop may keep consuming memory and other resources until there are none left. You mentioned you are doing embedded development (which can mean many different things), but it usually means you are developing low-level applications built into Linux itself; these are more prone to crashing an OS than your average programming venture.
I'm working on an embedded Linux platform.
When I run echo mem > /sys/power/state, the system suspends.
I know that the kernel and drivers are told that a suspend operation is coming. But is it possible for a user space process or application to get a notification that the system is about to suspend? How?
For example, I have an application that continually writes 'A' into a buffer whose start address is given by a device driver. Could this application be notified that the system is about to suspend, so that it can fill the whole buffer with 'B', and all the driver sees on resume is 'B'?
Thanks a lot.
I've been searching for the same thing. Unfortunately, I didn't find any user space notification during suspend/resume. Applications are simply frozen and never know they are being suspended.
However, one possible approach would be to send a generic netlink message or a uevent from the suspend/resume function of any driver that you can modify. Even then, the application may not get enough time to process it before it is frozen, which can lead to race conditions. Say it received the suspend message and got frozen before it could process it: once resumed, it would only then be processing the suspend message.
IMO, it is better to handle the scenario in the driver. Leave the user space alone.
I'm not sure whether it's useful for you in particular, given the mention of "embedded", however systemd can notify you over DBus: https://www.freedesktop.org/wiki/Software/systemd/inhibit/
I'm trying to read a physical address using mmap in an application. For some reason, that physical address has a hardware fault, and the ack on the bus never comes back when trying to read it.
When reading this address, we found that the application hangs immediately without any message output, but the application can still be cancelled or suspended, which means the OS is still alive and not impacted at all.
1) I'm just curious what the application is doing and how the hang can happen.
My understanding is that the CPU should have timeout detection for when the ack does not come back within the specified time slot; the application should not stop at the read instruction, and some exception should be triggered to inform the kernel.
2) We are doing a lot of hardware testing, so we want the application or the kernel to output something when the hang happens. Is there a way of adding something to do this?
Thanks a lot in advance!
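For point 2, one possible approach is to run the risky access in a child process and let the parent report a timeout. This is only a sketch: it relies on the observation above that the hung process can still be killed, and the names run_probe_with_timeout, probe_ok and probe_spin are hypothetical stand-ins (a real probe would do the mmap and the read).

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* A probe returns 0 on success and nonzero on failure. */
typedef int (*probe_fn)(void *arg);

/* Stand-in probes for demonstration only. */
int probe_ok(void *arg)   { (void)arg; return 0; }
int probe_spin(void *arg) { (void)arg; for (;;) pause(); return 1; }

/* Run probe in a child process. Returns the child's exit status,
 * -1 on fork/wait errors or abnormal exit, -2 if it timed out. */
int run_probe_with_timeout(probe_fn probe, void *arg, int timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(probe(arg) & 0xff); /* child: perform the risky access */

    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        nanosleep(&step, NULL);
    }

    /* Child is still running: report it, then clean it up. */
    fprintf(stderr, "probe (pid %d) did not finish within %d ms\n",
            (int)pid, timeout_ms);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0); /* assumes the stuck child is still killable */
    return -2;
}
```

Printing the address under test before calling run_probe_with_timeout means the last log line identifies which access hung.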