Monitor detached child processes with pm2 - node.js

I have a pretty data heavy Node application. Besides common things like file uploading, the app also spawns detached (long running) child processes.
For example, suppose a user uploads a file and a detached process runs a native tool installed on the system to do some heavy processing. This can take anywhere between a second and several minutes, so the process is detached and the user is notified on the website (when online) or via email.
I'm considering using pm2 as a monitoring tool. It seems great, but how would I monitor individual detached child processes with it? I've read most of the docs and checked the code examples, but I didn't find an example for my particular problem.
Concrete config examples would be welcome, since I'm new to pm2.

As of November 2020, there is an open issue for this feature:
https://github.com/Unitech/pm2/issues/1869
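Given the open issue above, one possible workaround is to track the detached children yourself. The following is only a minimal sketch and not a pm2 feature: `spawnDetached` and `isAlive` are hypothetical helper names, and the liveness check assumes the usual behavior of sending signal 0 with `process.kill`, which tests for existence without delivering a signal.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: start a long-running tool detached from the parent.
function spawnDetached(command, args) {
  const child = spawn(command, args, {
    detached: true,  // child leads its own process group and can outlive the parent
    stdio: 'ignore', // keep no pipes open between parent and child
  });
  // Without a listener, a failed spawn would raise an unhandled 'error' event.
  child.on('error', (err) => console.error('spawn failed:', err.message));
  child.unref();     // let the parent's event loop exit without waiting
  return child;
}

// Hypothetical helper: signal 0 only checks that the pid exists.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // process exists but belongs to another user
  }
}
```

The web app could store `child.pid` next to the upload record and poll `isAlive` to decide when to notify the user. A pid can be reused by the operating system after the process exits, so this is a rough check rather than a guarantee.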

Related

What are the use cases of identifying process id of a log message?

I'm researching Node.js logging tools. The top two seem to be Winston and Bunyan. A comparison article mentions how to get the process ID.
What is the use case of knowing this process ID? When is it useful?
I mainly develop web apps.
Normally you have one log file that the application writes to. A few use cases could be:
If you have multiple processes logging to the same file (child processes), you can tell them apart.
If a process crashes or is restarted, the pid will change, which indicates that this has happened.
If you are debugging something, it might be useful to see a chain of events, so you need the pid to group them together.
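The use cases above can be illustrated with a small sketch. It is not how Winston or Bunyan work internally; `formatLogLine` and `groupByPid` are hypothetical helpers, and the JSON line format is an assumption made for the example.

```javascript
// Hypothetical helper: build one JSON log line that carries the writer's pid.
function formatLogLine(level, message, pid = process.pid) {
  return JSON.stringify({
    time: new Date().toISOString(),
    pid,
    level,
    message,
  });
}

// Hypothetical helper: group messages by pid to rebuild each process's chain of events.
function groupByPid(lines) {
  const groups = new Map();
  for (const line of lines) {
    const entry = JSON.parse(line);
    if (!groups.has(entry.pid)) groups.set(entry.pid, []);
    groups.get(entry.pid).push(entry.message);
  }
  return groups;
}
```

With several processes appending to one file, grouping by pid separates their interleaved lines, and a pid that appears for the first time partway through the file suggests a restart.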

Monitor node.js scripts running on ubuntu instance

I have a Node.js script that runs once a day on an Ubuntu EC2 instance. This script pulls data from a few hundred thousand remote APIs and saves it to our local database. Is there any way to monitor this Node.js script on the remote server? There have been a few instances where the script crashed and we were unable to figure out why without SSHing into the instance and checking the logs. After the first few crashes I set up a small system that sends us an email whenever the script crashes due to an uncaught exception, and also when the script completes execution.
However, we need to develop a better system where we can monitor the progress of script via web interface of our admin application which is deployed over some other instance and also trigger start/stop of script via this interface. What are possible options for achieving this?
If you'd like to stay in Node.js, there are several process monitoring tools:
PM2 comes with lots of other features besides monitoring processes. You can monitor your processes via CLI or their official web interface: https://keymetrics.io/. A quick search on npm also gives a bunch of nice unofficial gui tools: https://www.npmjs.com/search?q=pm2+web
Forever is not as feature-rich as PM2, but it handles the basic process operations, and a couple of GUIs for it are also available on npm.
There are two problems here that you are trying to solve:
Scheduling work to be done
Monitoring a process for failure
At a simple level, this is easy: schedule a cron job and restart failed things so they keep trying.
However, when things don't go smoothly, it helps to have a lot more granularity over what you are scheduling, and how it is executed. This would also give you the visibility over each little piece of work.
Adding a little more complexity, you can end up with something like this:
Schedule the script that starts everything (via cron, if that's comfortable)
That script generates the jobs that need to be executed and puts them into a queue
A worker process (or n worker processes) consumes that queue and executes pending jobs
You can monitor both the progress of the jobs, as well as the state of each worker (# of crashes, failures, jobs completed, etc.). The other tools mentioned above are good candidates for this (forever, pm2, etc.)
When jobs fail, other workers can pick up the small piece of work that was in progress and restart it. This is much more efficient than restarting the entire process, and also lets you parallelize things across n workers based on how you can split up the workloads.
You could easily throw the status onto a web app so you can check in periodically rather than have to dig through server logs.
You can also get more intelligent with different types of failures. Network error? Retry 5 times. Rate limited? Gradual back-off. Crash? Don't retry and notify via email. And so on.
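The failure handling described above could be sketched roughly as follows. The failure type names, the `retryDecision` function, and the back-off values (1 s doubling up to a 60 s cap) are all assumptions made for illustration; a real worker would classify errors itself and pass the result in.

```javascript
// Assumed limit, taken from the "retry 5 times" suggestion for network errors.
const MAX_NETWORK_RETRIES = 5;

// Hypothetical policy function. `attempt` is 1 for a job's first failure,
// 2 for its second failure, and so on.
function retryDecision(failureType, attempt) {
  switch (failureType) {
    case 'network': {
      const retry = attempt <= MAX_NETWORK_RETRIES;
      // Give up and notify once the retries are used up.
      return { retry, delayMs: 0, notify: !retry };
    }
    case 'rate_limited': {
      // Gradual back-off: 1s, 2s, 4s, ... capped at 60s.
      const delayMs = Math.min(1000 * 2 ** (attempt - 1), 60000);
      return { retry: true, delayMs, notify: false };
    }
    default:
      // Crashes and unknown failures: do not retry, notify instead (e.g. by email).
      return { retry: false, delayMs: 0, notify: true };
  }
}
```

A worker would call this after each failed job and either put the job back on the queue after `delayMs` or mark it as failed and send the notification.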
I have tried this with pm2: you can get the info for the task, then cat or otherwise grab the log files. Alternatively, you could use a logging server; see also: https://github.com/papertrail/remote_syslog2

How to Use Powershell to Kill threads of a specific processID

This has been bugging me on and off for a couple of days. I am at a client’s site where they have a number of bespoke services, written in-house, running on a Windows 2008 R2 IIS server. The problem is that a couple of these services keep hanging: they get stuck in a “Stopping” state, and the only way to kill them off is to open Process Explorer and kill the threads. Before anyone says anything about using ‘runas’, or logging on as the local admin, or the service owner, etc., we’ve been through all of that.
The problem lies with the executable itself. The development team, in another country, is going to look at this, but it will take 4-5 months minimum, and we’re not certain they’ll get it right even then.
I have a PowerShell script that checks the services on a regular basis. It ensures the services are running and, if not, forces a stop and restart of the service, then sends an email to confirm the actions. However, with the specific services mentioned it can do nothing. They can’t be killed in Task Manager, taskkill, or Process Explorer (unless one kills the threads); it just says access denied. It is possible to change the permissions in Process Explorer and kill the process, but that takes longer than killing the threads.
To make things a little more difficult, I can’t use the process name, as on this server there are two other websites using an exe with the same name, just in a different folder.
What I’m after is a way to find and kill the threads of a process ID, which I’ve already obtained via the script I have, so the rest of the script can complete the task of restarting the service. At the moment this service dies at unpredictable times throughout the day and night, and the support guys have to RDP onto the server, open Process Explorer, find the offending process, kill the threads off, and then restart the services. That’s a bit too much hassle for these already overworked guys, especially if we can get PowerShell to do it automatically.
Hope someone can help on this. Thanks in advance.
Low level thread handling is likely to require native Win32 API usage. Powershell might help with P/Invoke, but the process is going to be complex. For starters, find out if the following tools can be used to identify the stuck thread. Maybe you can combine this info with some Sysinternals tools like handle.exe to find out what really blocks the thread.
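The .NET Framework offers some tools via the Process class in the System.Diagnostics namespace. A list of threads for a named process is available like so:
# Get all processes with the given name (GetProcessById can be used when the PID is already known)
$ps = [System.Diagnostics.Process]::GetProcessesByName("iexplore")
$p = $ps[0]
# Inspect the first thread of the first matching process
$p.Threads[0]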
Full documentation is on MSDN. There is no method for killing a thread, but this should be a starting point for identifying the stuck one.
Another way is to use WMI to get Win32_Thread data like so:
$threads = gwmi win32_thread
The output is quite different and some filtering is needed. Some examples are available. Another WMI-based attempt might use Win32_Process, which has a Terminate method.

Executing process on Linux from WSGI based web application

I have a dashboard and I want a process to run when the user clicks on a button. That process might take a long time to complete.
My options so far:
using popen or something similar to execute the process
having a daemon monitor a directory. When this directory is changed (a file created) the daemon will do the job and then delete the file before idling again.
using cron, running every 5 seconds and also monitoring some directory.
Which one is more Linux-friendly? Is there any I have not considered?
This is what task queueing systems like Celery and Redis Queue are for.
Another option is to have a daemon (as in your 2nd option) that listens on some socket. Then your WSGI application could just connect and send a command. There are many possibilities for how the communication over the socket would take place; choosing the right one depends a lot on the actual case.
This has the advantage that you can eventually run the two applications (WSGI and the daemon) on different computers or VMs.

Debugging utilities for Linux process hang issues?

I have a daemon process that does the configuration management. All the other processes have to interact with this daemon in order to function. But when I execute a large action, after a few hours the daemon process becomes unresponsive for 2 to 3 hours. After those 2 to 3 hours it works normally again.
Debugging utilities for Linux process hang issues?
How can I find out at what point the Linux process hangs?
strace can show the last system calls and their result
lsof can show open files
the system log can be very effective when log messages are written to track progress. It lets you box the problem into smaller areas. Also correlate log messages with messages from other systems; this often turns up interesting results
wireshark, if the apps use sockets, makes the wire chatter visible
ps ax and top can show whether your app is in a busy loop (i.e. running all the time), sleeping or blocked on I/O, and how much CPU and memory it is using
Each of these may give a little bit of information which together build up a picture of the issue.
When using gdb, it might be useful to trigger a core dump when the app is blocked. Then you have a static snapshot which you can analyze using post mortem debugging at your leisure. You can have these triggered by a script. Then you quickly build up a set of snapshots which can be used to test your theories.
One option is to use gdb and its attach command to attach to a running process. You will need to load a file containing the symbols of the executable in question (using the file command).
There are a number of different ways to do this:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked.
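The file-touching approach from the list above could be sketched like this. The answer does not name a language, so Node.js is used here purely for illustration; `touchHeartbeat`, `isStale`, and the choice of the modification time as the timestamp are assumptions of this sketch.

```javascript
const fs = require('fs');

// Hypothetical helper, called periodically by the monitored application:
// update the file's timestamps, creating the file on first use.
function touchHeartbeat(file) {
  const now = new Date();
  try {
    fs.utimesSync(file, now, now);
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
    fs.closeSync(fs.openSync(file, 'w')); // file did not exist yet
  }
}

// Hypothetical helper, called by the external watcher: a missing file or an
// old modification time is treated as "application dead or deadlocked".
function isStale(file, maxAgeMs, now = Date.now()) {
  let stats;
  try {
    stats = fs.statSync(file);
  } catch (err) {
    if (err.code === 'ENOENT') return true;
    throw err;
  }
  return now - stats.mtimeMs > maxAgeMs;
}
```

The application would call `touchHeartbeat` from its main loop, so that a deadlock stops the updates, and the watcher would call `isStale` with a maximum age comfortably larger than the touch interval.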
You can use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
