How to monitor various processes on Linux

My application is composed of 4 unique processes. For HA reasons, I am going to start 3 instances of each, such that two instances of each process run on one Linux host and the remaining set runs on a different Linux host.
I am trying to write a monitoring script (a bash script) that would periodically poll for these processes. My main challenge is that it sounds kludgy to write a script which is host-name and process-name dependent. For example, I don't want to write a script which monitors process-A-1, process-B-1, process-A-2, process-B-2 on the Linux host with IP address A and process-A-3 on the Linux host with IP address B.
One way to write a monitoring script which is independent of host and process name is that when each of these processes starts, it creates a named mutex. For example, process-A-1 will create a mutex called mutex.process-A-1 and process-A-2 will create a mutex called mutex.process-A-2. Then all the script has to do is look for mutexes on the system with names like mutex.process-A*. The script can then use a ps command to check whether those processes are running.
My question is: is coupling on the mutex name okay? Is there another way you have solved this problem on Linux?

I would personally write a bash script which runs all these processes in the background; then you can store their PIDs directly after launching them, e.g. process1_pid=$! right after you send each one to the background, and then trigger another script that monitors those PIDs.
Another way to get their PIDs is the jobs command, which lists all the jobs you have sent to the background; jobs -p lists the PIDs of everything you have in the background. You can also make use of jobs to know whether they're still running or not.
http://www.gnu.org/software/bash/manual/bashref.html#Job-Control
I'd start from there. If it's more complicated and your processes are being created from other places, you can always use a particular user to run them all and then use ps -u to filter them by user.
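For illustration, a minimal sketch of that PID-based approach (the process names below are placeholders, not the asker's real binaries):

#!/bin/bash
# Launch each process in the background and remember its PID via $!
./process-A-1 & pid_a=$!
./process-B-1 & pid_b=$!

# Periodically verify the stored PIDs are still alive (kill -0 only tests existence)
while sleep 30; do
    for pid in "$pid_a" "$pid_b"; do
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "$(date): process $pid is no longer running" >&2
            # restart or alert logic would go here
        fi
    done
done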

Related

How can I send a command to X number of EC2 instances via SSH

I have a lot of AWS EC2 instances and I need to execute a Python script on them at the same time.
I've been trying, from my PC, to execute the script by sending the required commands via SSH. For this, I've created another Python script that opens a cmd terminal and then executes some commands (the ones I need to run the Python script on each instance). Since I need all of these cmd terminals to be opened at the same time, I've used ThreadPoolExecutor, which (with my PC's characteristics) grants me 60 runs in parallel. This is the code:
import os
from concurrent.futures import ThreadPoolExecutor

# One IP address per line in hosts.txt
ipAddressesList = list(open("hosts.txt").read().splitlines())

def functionMain(threadID):
    # Opens a new cmd window and runs the remote script over SSH
    os.system(r'start cmd ssh -o StrictHostKeyChecking=no -i mysshkey.pem ec2-user@'
              + ipAddressesList[threadID] + ' "cd scripts && python3.7 script.py"')

functionMainList = list(range(0, len(ipAddressesList)))

with ThreadPoolExecutor() as executor:
    results = executor.map(functionMain, functionMainList)
The problem with this is that the command that executes script.py blocks the terminal until the end of the process, hence functionMain stays waiting for the result. I would like to find a way so that, after sending the python3.7 script.py command, the function ends but the script keeps executing on the instance, so the pool executor can continue with the remaining threads.
The AWS Systems Manager Run Command can be used to run scripts on multiple Amazon EC2 instances (and even on-premises computers if they have the Systems Manager agent installed).
The Run Command can also return the results of the commands run on each instance.
This is definitely preferable to connecting to the instances via SSH to run commands.
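As a sketch (the instance IDs are placeholders), the same remote command could be issued through the AWS CLI like this:

# Run the script on several instances at once via Systems Manager
aws ssm send-command \
    --document-name "AWS-RunShellScript" \
    --targets "Key=instanceids,Values=i-0123456789abcdef0,i-0abcdef1234567890" \
    --parameters 'commands=["cd scripts && python3.7 script.py"]'

# Fetch the output for one instance once the command has finished
aws ssm get-command-invocation --command-id <command-id> --instance-id i-0123456789abcdef0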
Forgive me for not providing a "code" answer, but I believe there are existing tools that already solve this problem. This sounds like an ideal use of ClusterShell:
ClusterShell provides a light and unified command execution Python framework to help administer GNU/Linux or BSD clusters. Some of the most important benefits of using ClusterShell are to:
provide an efficient, parallel and highly scalable command execution engine in Python,
Using clush you can execute commands in parallel across many nodes. It has options for grouping the output by hostname as well.
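For example (the node set below is an assumption; adapt it to your own inventory), the asker's command could be run across all hosts with:

# -b groups identical output by node set; the node range is a placeholder
clush -w node[01-60] -b 'cd scripts && python3.7 script.py'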
Another option would be to use Ansible, but you'll need to create a playbook in that case whereas with ClusterShell you are running a command the same way you would with SSH. With Ansible, you will create a target group for a playbook and it will connect up to each instance and tell it to run the playbook. To make it disconnect while the command is still running, look into asynchronous actions:
By default Ansible runs tasks synchronously, holding the connection to the remote node open until the action is completed. This means within a playbook, each task blocks the next task by default, meaning subsequent tasks will not run until the current task completes. This behavior can create challenges. For example, a task may take longer to complete than the SSH session allows for, causing a timeout. Or you may want a long-running process to execute in the background while you perform other tasks concurrently. Asynchronous mode lets you control how long-running tasks execute.
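As a rough sketch of what that looks like from the command line (hosts.txt is assumed to be a valid Ansible inventory of the instances), an ad-hoc asynchronous run would be:

# -B 3600 lets the task run asynchronously for up to an hour; -P 0 means fire-and-forget
ansible all -i hosts.txt -m shell -a 'cd scripts && python3.7 script.py' -B 3600 -P 0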
I've used both of these in HPC environments with more than 5,000 machines and they both will work well for your purpose.

Docker containers instead of multiprocessing

One of the main applications of Docker containers is load balancing. For example, in the case of a web application, instead of having only one instance handling all requests, we have many containers doing exactly the same thing, but the requests are split across all of these instances.
But can it be used to do the same service, but with different "parameters"?
For instance, let's suppose I want to create a platform storing crypto-currency data from different exchange platforms (Bitfinex, Bittrex, etc.).
A lot of these platforms are handling web sockets. So in order to create one socket per platform, I would do something at the "code layer" like (language agnostic):
foreach (platform in platforms)
    client = createClient(platform)
    socket = client.createSocket()
    socket.GetData()
Now of course, this loop would be stuck on the first iteration, because the websocket is waiting (although I could use asynchrony, anyway). To circumvent that, I could use multiprocessing, something like:
foreach (platform in platforms)
    client = createClient(platform)
    socket = client.createSocket()
    process = new ProcessWhichGetData(socket)
    process.Launch()
Is there any way to do that at a "Docker layer", I mean to use Docker to make the different containers handling different platforms? I would have one Docker container for Bittrex, one Docker container for Bitfinex, etc.
I know this would imply that either the different containers communicate with each other (who takes care of Bitfinex? who takes care of Bittrex?), or the container orchestrator (Docker Swarm / Kubernetes) would itself handle this "repartition".
Is it something we could do, and, on top of that, is it something we want?
Docker containerization simply adds various layers of isolation around regular user-land processes. It does not by itself introduce coordination among several processes, though it certainly can be exploited in building a multi-process system where each process performs some job, no matter whether these jobs are redundant or complementary.
If you can design your solution so that one process is launched for each "platform" (for example, passing the specific platform an instance should handle as a command line parameter), then indeed, this can technically be done in Docker.
I should however point out that it is not clear why you would want to run each process in a distinct container. Is isolation pertinent for security reasons? For resource accounting? To have each process dispatched to a distinct host in order to have access to more processing power? Also, is there coordination required among these processes, outside of having to initially determine which process handles which platform? If so, do they need access to shared storage, or to be able to send signals to each other? These questions will help you decide how to approach the dockerization of your solution.
In the simplest case, assuming that all you want is to have the whole thing isolated from the rest of the system, but with no requirement that these processes be isolated from each other, then the simplest strategy would simply be to have a single container that contains an entrypoint shell script, which launches one process per platform.
entrypoint.sh (inside your docker image):
#!/bin/bash
platforms="Bitfinex Bittrex"
for platform in ${platforms} ; do
    ./myprogram "${platform}" &
done
# keep the container alive as long as the background processes run
wait
If you really need a distinct container for each platform, then you would use a similar script, but this time, it would be run directly on the host machine (that is, outside of a container), and would encapsulate each process inside a docker container.
launch.sh (directly on the host):
#!/bin/bash
platforms="Bitfinex Bittrex"
for platform in ${platforms} ; do
    # -d detaches, so the loop does not block on the first container
    docker run -d --name "program_${platform}" my_program_docker \
        /usr/local/bin/myprogram "$platform"
done
Alternatively, you could use docker-compose to define the list of docker containers to be launched, but I will not discuss this option further at present (just ask if it seems pertinent to your case).
If you need containers to be distributed among several host machines, then that same loop could be used, but this time, processes would be launched using docker-machine. Alternatively, if using docker-compose, the processes could be distributed using Swarm.
Say you restructured this as a long-running program that handled only one platform at a time, and controlled which platform it was via a command-line option or an environment variable. Instead of having your "launch all the platforms" loop in code, you might write a shell script like
#!/bin/sh
for platform in $(cat platforms.txt); do
    ./run_platform $platform &
done
This setup is easy to transplant into Docker.
You should not plan on processes launching Docker containers dynamically. This is hard to set up and has significant security implications (by which I mean "a bug in your container launcher could easily root your host").
If the individual processing tasks can all run totally independently (maybe they use a shared database to store data) then you're basically done. You could replace that shell script with something like a Docker Compose YAML file that lists out all of the containers; if you want to run this on multiple hosts you can use tools like Ansible, or Docker Swarm, or Kubernetes to spread the containers out (with varying levels of infrastructure complexity).
You can bunch the different docker containers into a stack and also configure networking so that the docker containers remain isolated from the outside world but can communicate with each other.
More info here: Docker Stack

How to run an application on Linux in the background but leave the possibility to interact with it?

Requirements:
I want to run my application on Linux in the background (at startup, of course).
I want to be able to call start/stop/restart commands directly from the console (it has to be simple, just like for /etc/init.d - just call a simple command directly from the console).
I want to be able to call status - and I want that command to somehow return the actual status reported by the application itself. I thought that I could call some method which returns a String, or just use stdin to send a command, but when I do nohup .. &, or start-stop-daemon, then stdin is detached. Is there a simple way to attach stdin back to the application (I've seen that I can create a pipe, but this is pretty complicated)? Or what is the best way to communicate with the application after it is started as a daemon (I can make a socket and connect through telnet for example, but I am looking for a simpler solution and the possibility to do it directly from the console, without starting telnet first)? Ideally it would be great to have the possibility to send any command, but a simple status will be sufficient (but again - it has to communicate with the application to get that status somehow).
I have found many different answers. Some of them say to simply use nohup and &, and others say that nohup and & are old-fashioned. Some answers say to use start-stop-daemon or JSvc (for Java). But it seems that none of them satisfies these 3 requirements of mine.
So... What are the simplest possibilities for all 3 requirements to be met?
PS. I don't want to use screen. The application must run as a Linux daemon.
PPS. The application is written in Java, but I am looking for a generic solution which is not limited to Java.
You should create a command line tool to communicate with the daemon in the way you need. The tool itself can use TCP/IP or named pipes.
And then use cli-tool start|stop|restart|status from the console.
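A minimal named-pipe sketch of such a cli-tool (the FIFO paths and command set are assumptions, and the daemon side would need to read the command pipe in a loop and write its answer to the reply pipe):

#!/bin/bash
# cli-tool: send a command to the daemon through a FIFO and print its reply
PIPE=/var/run/myapp/control.fifo    # the daemon reads commands from here
REPLY=/var/run/myapp/reply.fifo     # the daemon writes its answer here

case "$1" in
    start|stop|restart|status)
        echo "$1" > "$PIPE"
        timeout 5 cat "$REPLY"      # give up after 5 seconds if the daemon is silent
        ;;
    *)
        echo "usage: $0 {start|stop|restart|status}" >&2
        exit 1
        ;;
esac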
If you need to start the daemon during the startup sequence (before user login), you have to deal with the init system (init.d, systemd, OpenRC, etc.).
Dragons be here:
Be sure that init doesn't restart your daemon after a manual stop via the CLI.
The command line tool itself runs with unprivileged user rights, so restarting may be hard if the startup script uses superuser rights or an application-specific user; especially in the case of deep init integration, you might have to use sudo cli-tool start.
To avoid this, one possible solution is to make a wrapper daemon that runs forever via init and controls (starts/stops) the underlying application with the proper rights.
Cons: you have to develop two additional tools for the daemon.
Pros: the wrapper daemon can operate as a circuit breaker between the superuser/specific user and userspace.

Start a process in a new session to be able to kill the whole tree at once

There is a Linux-oriented Python program that launches a Puppet subprocess. Puppet is configuration management software, and while executing it launches many subprocesses (yum, curl, custom scripts, etc.). The Python code has a watchdog that kills the Puppet subprocess if it runs too long. The current version uses os.kill to do that.
The problem is that when the puppet process is killed on timeout, its orphaned children are attached to init and keep running. Typically these children are the initial cause of the timeout.
The first attempt was killing the entire process group (os.killpg). But the kill call fails with OSError(3, 'No such process'). After studying the process management docs I understood that it does not work because puppet itself launches its ruby process in a separate process group. Also, the process group is not inherited by children, so os.killpg would not help anyway. Yes, POSIX allows setting the PGID of children with some limitations, but it requires iterative monitoring of new subprocesses and looks like a hack in my case.
The next attempt was running Puppet in a separate shell ("sh -c") / "su" environment / setsid (in various combinations). The desired result is to start this process (and its children) in a new session. I expect that this will let me emulate something like a remote SSH connection disconnect: sending SIGHUP to the session leader, e.g. the puppet process, will send SIGHUP to the entire tree of children. So I will be able to kill the entire tree. Experiments with running puppet through a remote SSH connection show that this approach seems to work: all processes die after the terminal disconnects. I have not achieved this behaviour from Python yet. Is this approach correct, am I missing something, or is it an ugly hack?
One more approach I see is to send SIGSTOP to every process in the tree (iterating while there is at least one running process in the tree, to avoid race conditions), and then kill every process individually. This approach would work, but does not look very elegant.
The problem is not related to the Python code; it also reproduces when running "puppet apply" from a console and sending signals using the "kill" command.
Yes, I know that Puppet has a "timeout" keyword for the described purpose, but I am looking for a more general solution, applicable not only to Puppet but to any subprocess that spawns children of its own.
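For what it's worth, a minimal shell-level sketch of the setsid idea described above (the puppet command line is a placeholder, and this assumes it is run from a non-interactive script so that setsid exec's the command directly and $! is the session leader):

# Start the command in its own session: its PGID then equals its PID,
# so the whole default process group can be signalled with a negative PID.
setsid puppet apply site.pp &
leader=$!

# ... later, on timeout, signal the entire process group:
kill -TERM -- -"$leader"

# Caveat: children that call setsid()/setpgid() themselves (as puppet's ruby
# process reportedly does) move to their own group and escape this signal.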

NodeJS: how to run three servers acting as one single application?

My application is built with three distinct servers: each one of them serves a different purpose and they must stay separated (at least, to use more than one core). As an example (this is not the real thing) you could think about this setup as one server managing user authentication, another one serving as the game engine, another one as a pubsub server. Logically the "application" is only one, and clients connect to one or another server depending on their specific need.
Now I'm trying to figure out the best way to run a setup like this in a production environment.
The simplest way could be to have a bash script that runs each server in the background, one after the other. One problem with this approach is that when I need to restart the "application", I would have to have saved each server's PID and kill each one.
Another way would be to use a Node process that runs each server as its own child (using child_process.spawn). Node spawning Nodes. Is that stupid for some reason? This way I'd have a single process to kill when I need to stop/restart the whole application.
What do you think?
If you're on Linux or another *nix OS, you might try writing an init script that starts/stops/restarts your application. Here's an example.
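The linked example isn't reproduced here, but a minimal sketch of such an init-style script (the application name, path and PID file below are assumptions) could look like:

#!/bin/sh
# Minimal start/stop/restart/status wrapper; adapt APP_CMD and PIDFILE to your setup
APP=myapp
APP_CMD=/usr/local/bin/myapp
PIDFILE=/var/run/$APP.pid

case "$1" in
    start)
        nohup "$APP_CMD" >/var/log/$APP.log 2>&1 &
        echo $! > "$PIDFILE"
        ;;
    stop)
        [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
        ;;
    restart)
        "$0" stop
        sleep 1
        "$0" start
        ;;
    status)
        if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
            echo "$APP is running (pid $(cat "$PIDFILE"))"
        else
            echo "$APP is not running"
        fi
        ;;
    *)
        echo "usage: $0 {start|stop|restart|status}" >&2
        exit 1
        ;;
esac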
Use dedicated tools for process monitoring. Monit, for example, can monitor your processes by their PID and restart them whenever they die, and you can manually restart each process with the monit command or with its web GUI.
So in your example you would create 3 independent processes and tell monit to monitor each of them.
I ended up creating a wrapper/supervisor script in Node that uses child_process.spawn to execute all three processes.
It pipes each process's stdout/stderr to its own stdout/stderr
It intercepts errors from each process, logs them, then exits (as if they were its own fault)
It then forks and daemonizes itself
I can stop the whole thing using the start/stop paradigm.
Now that I have a robust daemon, I can create a unix script to start/stop it on boot/shutdown as usual (as @Levi says)
See also my other (related) Q: NodeJS: will this code run multi-core or not?
