Set up an alert for Nimbus on a Linux server

I have a server where Nimbus, Supervisor, and Zookeeper run continuously. I want to get an e-mail notification whenever any of them stops running or the server goes down for any reason. What script should I write? I know the mail and cron part; I just need a hint on the Nimbus-checking part. A very crude approach I used was to run
`ps -ef | grep Nimbus`
and check what it returns. But I believe this won't work when the server itself is down. I haven't tested that, because it is a live server and I don't want to disrupt it. So, do I have to use another application?

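One possible way might be to use wait, though I am not sure whether it will work for your app: the shell's wait takes a PID or job ID rather than a process name, and it can only wait for children of the current shell. Possible use might be like below:
wait <pid>
When wait returns, the process has exited.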
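
For the Nimbus question, here is a minimal sketch in Python of two checks that cron could run. It is an illustration under assumptions, not a tested monitoring setup: `process_running` and `port_open` are hypothetical helper names, and the `/proc` scan is Linux-only. The process check mirrors `ps -ef | grep ...` and must run on the server itself; the port check can run from a different machine, which is the only way to notice that the whole server is down.

```python
import os
import socket


def process_running(pattern: str) -> bool:
    """Return True if any other process's command line contains `pattern`.

    Linux-only: reads /proc/<pid>/cmdline, roughly what `ps -ef | grep` sees.
    """
    own_pid = str(os.getpid())
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or entry == own_pid:
            continue  # not a process directory, or this checker itself
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited while we were scanning
        if pattern in cmdline:
            return True
    return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out
```

On the server, something like `process_running("nimbus")` would replace the grep; the exact pattern depends on how your daemons are started. From a second machine, `port_open("storm-host", 6627)` would cover the server-down case. The host name is a placeholder, and 6627 (Nimbus) and 2181 (Zookeeper) are only the usual default ports, so check your own configuration.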

Related

Nodejs stuck on processing whenever the app is restarted

I have a Node.js application running on Linux. Whenever I restart the app it gets a new PID. Suppose that while the app is running, a client connects and starts a job whose status is "processing". If the Node.js app restarts on the server side at that point, how can we make sure the client reconnects to the previous processing state?
What happens now is that whenever the server restarts, the job stays stuck in "processing" forever.
Please point me to a sample of how this scenario is handled in real life.
Thank You.
If I'm understanding you correctly, then the answer is that you can't.
The reason is that when you restart the process, the event loop is restarted too, so any work that was running or waiting in the event loop is gone. You are essentially clearing out the event loop when you restart.
That said, if you know the job is crashing Node, you probably want to look into that job and see why it is crashing, and wrap it in a try/catch so it won't kill the server.
With that said (and without knowing what "processing state" really means), you could set a flag in your database for, say, 'job1', with a status column set to 'running' when the job is kicked off. When the Node server restarts, it can look up the jobs whose status is 'running', fire each of them off again, and update the table to 'completed' once they finish.
This is probably not the most efficient approach, since it is much better to figure out why the process is crashing, but it could work as a fallback. In a clustered environment it could cause problems, though: server 1 may restart and re-run a job that server 2 is still processing, because server 1 does not know what server 2 is doing. More details about the use case, environment, and so on would probably allow for a better answer.
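
The status-flag idea can be sketched roughly as follows. This is an illustration in Python with SQLite standing in for whatever database the Node app uses; the table layout and the names `init_db`, `start_job`, `finish_job`, and `resume_interrupted` are assumptions made up for the example.

```python
import sqlite3
from typing import Callable, List


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) jobs table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (name TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()


def start_job(conn: sqlite3.Connection, name: str) -> None:
    """Record that a job was kicked off."""
    conn.execute(
        "INSERT OR REPLACE INTO jobs (name, status) VALUES (?, 'running')", (name,)
    )
    conn.commit()


def finish_job(conn: sqlite3.Connection, name: str) -> None:
    """Mark a job as done so it is not re-run after the next restart."""
    conn.execute("UPDATE jobs SET status = 'completed' WHERE name = ?", (name,))
    conn.commit()


def resume_interrupted(
    conn: sqlite3.Connection, run_job: Callable[[str], None]
) -> List[str]:
    """On startup, re-run every job that was still 'running' at the last exit."""
    rows = conn.execute(
        "SELECT name FROM jobs WHERE status = 'running' ORDER BY name"
    ).fetchall()
    resumed = []
    for (name,) in rows:
        run_job(name)           # fire the job off again
        finish_job(conn, name)  # then mark it completed
        resumed.append(name)
    return resumed
```

The server would call `resume_interrupted` once at startup, before accepting new clients. This only makes sense if a job is safe to run twice, and it does nothing about the clustered case mentioned above.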

Systemd http health check

I have a service on Red Hat 7.1 that I control with systemctl start, stop, restart, and status. On one occasion systemctl status reported active, but the application "behind" the service responded with an HTTP code other than 200.
I know that I can use Monit or Nagios to check this and run systemctl restart, but I would like to know whether systemd offers something like this by default, so that I do not need to install other tools.
My preferred solution would be to have my service restarted fully automatically whenever the HTTP return code is not 200, without any tools other than systemd itself (and maybe with the option to notify a HipChat room or send an email).
I've tried googling the topic - without luck. Please help :-)
The Short Answer
systemd has a native (socket-based) healthcheck method, but it's not HTTP-based. You can write a shim that polls status over HTTP and forwards it to the native mechanism, however.
The Long Answer
The Right Thing in the systemd world is to use the sd_notify socket mechanism to inform the init system when your application is fully available. Use Type=notify for your service to enable this functionality.
You can write to this socket directly using the sd_notify() call, or you can inspect the NOTIFY_SOCKET environment variable to get the name and have your own code write READY=1 to that socket when the application is returning 200s.
If you want to put this off to a separate process that polls your process over HTTP and then writes to the socket, you can do that -- ensure that NotifyAccess is set appropriately (by default, only the main process of the service is allowed to write to the socket).
Inasmuch as you're interested in detecting cases where the application fails after it was fully initialized, and triggering a restart, the sd_notify socket is appropriate in this scenario as well:
Send WATCHDOG_USEC=... to set the amount of time which is permissible between successful tests, then WATCHDOG=1 whenever you have a successful self-test; whenever no successful test is seen for the configured period, your service will be restarted.
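
A shim of the kind described above might look roughly like this in Python. It is a sketch under assumptions rather than a drop-in tool: `sd_notify` here is a hand-written helper that writes to the socket named by NOTIFY_SOCKET (not the C library call), and `is_healthy` and `run_watchdog` are hypothetical names.

```python
import http.client
import os
import socket
import time
import urllib.request
from typing import Optional


def sd_notify(message: str, socket_path: Optional[str] = None) -> bool:
    """Send one datagram to systemd's notification socket.

    Returns False when no socket is configured (e.g. not started by systemd).
    """
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        path = "\0" + path[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False  # connection errors and non-2xx responses end up here


def run_watchdog(url: str, interval: float) -> None:
    """Poll forever: announce readiness once, then ping the watchdog on each success."""
    ready_sent = False
    while True:
        if is_healthy(url):
            if not ready_sent:
                sd_notify("READY=1")
                ready_sent = True
            sd_notify("WATCHDOG=1")
        time.sleep(interval)
```

Calling `run_watchdog("http://localhost:8080/health", 10)` would start the loop; the URL and interval are placeholders. For the restart to happen, the unit would also need a watchdog timeout (for example via WatchdogSec=) comfortably larger than the polling interval, a Restart= policy that covers watchdog failures, and a NotifyAccess= setting that lets the shim process write to the socket.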

In Gnome, what signal on dbus-monitor indicates the user is logging out?

I want to write a script that runs in the background and detects when a user logs out. I am having trouble finding documentation on dbus-monitor. The best I can do is that I see a flurry of EndSessionQuery, EndSession, and EndSessionResponse but these all come with booleans so they can't fully be trusted (maybe a program says it doesn't want the user to logout?) and on top of that, what if no programs are open? This is too unreliable.
What I want is to listen for a signal that will always happen when the user is logging out. Can someone provide that signal? Currently I am running this command:
dbus-monitor --session \
"type='signal',interface='org.gnome.ScreenSaver',member='ActiveChanged'" | \
myprog
which catches the ScreenSaver events. But I also want to catch logout. What I wish for is something like:
dbus-monitor --session \
"type='signal',interface='org.gnome.Session',member='LogoutSuccess'" \
"type='signal',interface='org.gnome.ScreenSaver',member='ActiveChanged'" | \
myprog
Look for the files called org.gnome.SessionManager.* here: http://git.gnome.org/browse/gnome-session/tree/gnome-session
There is a SessionOver signal in the interface org.gnome.SessionManager that may be what you need.
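
If SessionOver turns out to be the right signal, the pipeline from the question could be replaced by a small Python wrapper along these lines. This is a sketch only: `parse_signal` and `watch_session` are hypothetical names, and the regular expression assumes that dbus-monitor prints signal headers in its usual text form ending in `interface=...; member=...`.

```python
import re
import subprocess
from typing import Iterator, Optional, Tuple

# Assumed header form: "signal ... path=/x; interface=org.foo; member=Bar"
SIGNAL_RE = re.compile(
    r"^signal .*interface=(?P<interface>[^;\s]+); member=(?P<member>\S+)"
)

MATCH_RULES = [
    "type='signal',interface='org.gnome.SessionManager',member='SessionOver'",
    "type='signal',interface='org.gnome.ScreenSaver',member='ActiveChanged'",
]


def parse_signal(line: str) -> Optional[Tuple[str, str]]:
    """Return (interface, member) for a signal header line, else None."""
    match = SIGNAL_RE.match(line)
    if match is None:
        return None
    return match.group("interface"), match.group("member")


def watch_session() -> Iterator[Tuple[str, str]]:
    """Run dbus-monitor on the session bus and yield each matching signal."""
    proc = subprocess.Popen(
        ["dbus-monitor", "--session", *MATCH_RULES],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        parsed = parse_signal(line)
        if parsed is not None:
            yield parsed
```

A caller would loop over `watch_session()` and react when the member is `SessionOver`. Whether that signal is still delivered before the script itself is killed at logout is something to verify on the target system.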
Are you looking for the normal "session is ending, quit yourself or put up a prompt or something" request from the session manager, or a "session is really ending now, bye bye" signal?
This is an old thread, but I'm adding some info in case anyone else needs it.
I had the same needs, but ended up implementing a Session Manager DBus client as an easy to use script. It executes a user-defined script on logout. The ready to use application is shared on GitHub.
Gnome EndSession DBus client

Why does my node.js application occasionally hang when I don't have the terminal open?

I have a nodejs application that I run like this, over SSH:
$ tmux
$ node server.js
This starts my node application in a tmux session.
Obviously, I don't have the SSH session open all the time.
What I've been finding is that occasionally my application can get into a state where it won't serve up any pages. This might be related to the application itself, or perhaps just a poorly disconnected SSH session.
Either way, simply logging into SSH, running:
$ tmux attach
And giving focus to the pane makes everything responsive again.
I thought the entire point of node.js was that everything is non-blocking - then what's going on here?
When a pane is in copy mode, tmux does not read from its tty. If some program running in that tty continues to generate output, then the OS’s tty buffer will eventually fill and cause the writing process/thread to block. I do not know the internals of Node.js, but it may not expect writes to stdout/stderr to block: the console functions do not seem to have callbacks, so they may actually be blocking.
So, Node.js could very well end up blocked if the pane in which it was running was left in copy mode when your SSH connection was dropped.
If you need to assure non-blocking logging, then you might want to redirect (or tee) your stdout and stderr to a file and use something like less to view the prior logs (avoiding tmux’s copy mode since it might cause blocking).
Maybe something like this:
# Redirect stdout/stderr to a file, running Node.js in the background.
# Start a "less +F" on the log so that we immediately have a "tail" running.
node app.js >>app.log 2>&1 & less +F app.log
Or
# This pane will act as a 'tail -f', but do not use copy-mode here.
# Instead, run e.g. 'less app.log' in another pane to review prior logs.
node app.js 2>&1 | tee -a app.log
Or, if you are using a logging library, it might have something that you can use to automatically write to files.

Long running process remote termination?

I am developing an application that allows users to run AI algorithms on the server remotely. Some of these algorithms take a VERY long time. It is set up such that AJAX calls supply the algorithm parameters and launch a C++ algorithm on the server. The results and status of the computation are tracked via AJAX calls polling status files. This solution seems to work well for multiple users concurrently using the service, but I am now looking for a way to cancel the computation from the user's browser. I have a stop button that stops the AJAX updating service and ceases any communication between the browser and the running process on the server. The problem is that the process still runs, and I would like to free up the server resources when the user cancels the operation. Below are some more details.
The web service that the AJAX calls hit runs under the user 'tomcat', and its processes can be listed by ps -U tomcat. The algorithm executions are all child processes of 'java' and can be listed by ps --ppid ###.
The browser keeps a record of the time that the current computation began (user system time, not server system time).
Multiple computations may be going on at once from users connected from different locations, resulting in many processes under the same name and parent process.
The RESTful service executes terminal commands via Java's Runtime.exec().
I am not very knowledgeable about shell scripting, so any help would be greatly appreciated. Can anyone think of a way to use either a Java Process object or a shell script/awk to locate a process via timestamp (maybe the closest timestamp to the user's system time?), or some other way?
Thanks in advance.
--edit
Is there even a way in java to get a handle for a given process if you have the pid...? Doesn't seem like it.
--edit
I cannot change the source code of the long running process on the server. :(
Your AJAX call should manipulate some sort of resource (most conveniently a text file) that acts as a semaphore for the process, which checks in every iteration whether that semaphore file has been set to the stop status. If the AJAX call changes the semaphore file to stop, the process stops, because the application checks the file and responds accordingly. This in turn means that the functionality needs to be programmed into the AI application itself, rather than figuring out what the PID is and then killing it at the OS level. That, of course, assumes you have access to the source code of the app.
Of course, the semaphore does not have to be a file but can be a value in the DB etc., whichever suits your taste and configuration.
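
As an illustration of the semaphore idea, here is a minimal Python sketch of a work loop that polls a stop file between iterations. The name `run_until_stopped` and the convention that the file's mere existence means "stop" are assumptions for the example; the real algorithm would need an equivalent check built into its own loop.

```python
import os
from typing import Callable


def run_until_stopped(step: Callable[[], bool], stop_file: str) -> str:
    """Run step() repeatedly until it reports completion or stop_file appears.

    step() does one unit of work and returns True while more work remains.
    Returns "stopped" if the semaphore file was found, else "finished".
    """
    while True:
        if os.path.exists(stop_file):
            return "stopped"   # the cancel request created the file
        if not step():
            return "finished"  # the computation ran to completion
```

The cancel handler on the web side then only has to create the file, and the process notices it at the start of its next iteration.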
I have finally found a secure solution. From the RESTful Java service, Process p = Runtime.getRuntime().exec() gives you a handle on the running process. The only way to get the pid, however, is through a technique called reflection.
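// Needs java.lang.reflect.Field. On Unix JVMs the Process implementation
// keeps the pid in a private int field named "pid".
Field f = p.getClass().getDeclaredField("pid");
f.setAccessible(true);
String pid = Integer.toString(f.getInt(p));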
How unbelievably awkward...
Anyway, passing p from the server to the client is impossible, and allowing a remote call to kill an arbitrary server process by a pid passed as a parameter would be insecure. The only logical strategy I could come up with was to write the obtained pid to a process-unique file named after the initial client timestamp, and to delete this file when the RESTful service function returns. This unique file can then be used as a termination handle by yet another RESTful service, which reads the file and terminates the process whose pid equals the file's contents.
You could keep the Process instance returned by Runtime.exec and invoke Process.destroy to kill the subprocess. Not knowing much about your web service application, I would assume you can keep the Process instances in a global session map that maps users to process lists. Make sure access to this map is thread-safe. Also, this only works if you have a single web service process, so that the global session map can be shared across different requests.
Alternatively take a look at Get subprocess id in Java.
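
The session-map idea can be sketched as follows. The example is in Python rather than Java, with `subprocess.Popen.terminate()` playing the role of Process.destroy; the class name `ProcessRegistry` and its methods are hypothetical, and the same single-server-process caveat applies.

```python
import subprocess
import threading
from collections import defaultdict
from typing import Dict, List, Sequence


class ProcessRegistry:
    """Thread-safe map from a user key to the subprocesses launched for that user."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_user: Dict[str, List[subprocess.Popen]] = defaultdict(list)

    def launch(self, user: str, argv: Sequence[str]) -> subprocess.Popen:
        """Start a subprocess and remember its handle under `user`."""
        proc = subprocess.Popen(list(argv))
        with self._lock:
            self._by_user[user].append(proc)
        return proc

    def cancel(self, user: str) -> int:
        """Terminate all still-running processes of `user`; return how many."""
        with self._lock:
            procs = self._by_user.pop(user, [])
        killed = 0
        for proc in procs:
            if proc.poll() is None:  # still running
                proc.terminate()
                killed += 1
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()          # did not exit after SIGTERM
                proc.wait()
        return killed
```

Because the handles never leave the server, the browser only sends its user or session key, and no pid has to be passed around or trusted.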
