How to track down a process that's running too long? - linux

I have a VPS with firewall and security notices enabled. I keep getting emails like this:
Time: Wed Jun 19 19:01:54 2019 -0500
Account: user
Resource: Process Time
Exceeded: 7248 > 3600 (seconds)
Executable: /opt/cpanel/ea-php72/root/usr/sbin/php-fpm
Command Line: php-fpm: pool domain_com
PID: 16374 (Parent PID:9915)
Killed: No
So for some reason with this example I have a script that has apparently been running for 2+ hours non-stop. I don't have anything that should be doing that.
I'm getting notices like this quite often. How can I use this info to track down what specifically is causing this?
Any information would be greatly appreciated. Thanks!

You can inspect the exact process using the process ID mentioned in the alert:
lsof -p 16374
The alert you are getting comes from LFD (Login Failure Daemon), which is installed as part of CSF. I think it's normal for cPanel with PHP-FPM to have php-fpm processes run this long.
You can add php-fpm to the csf.pignore file to stop this warning.
You can also refer to the cPanel forum thread below.
https://forums.cpanel.net/threads/lfd-excessive-resource-usage-normal-for-php-fpm.592583/
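As a rough sketch of how the PID from the alert could be used, the helpers below print how long a process has been running and which directory it is working in. The function names describe_pid and pid_cwd are made up for illustration, and the etimes column assumes the procps-ng version of ps found on most Linux systems.

```shell
# Print PID, parent PID, elapsed seconds and full command line for a PID.
# Returns non-zero if the process does not exist.
describe_pid() {
    # The trailing "=" after each column name suppresses the header line.
    ps -o pid=,ppid=,etimes=,args= -p "$1" 2>/dev/null || {
        echo "no such process: $1" >&2
        return 1
    }
}

# Print the current working directory of a PID (Linux /proc only).
pid_cwd() {
    readlink "/proc/$1/cwd"
}

# Example (PID taken from the alert in the question):
#   describe_pid 16374
#   pid_cwd 16374
```

If the process turns out to be an ordinary PHP-FPM worker, the ignore entry in /etc/csf/csf.pignore would, as far as I know, take the form exe:/opt/cpanel/ea-php72/root/usr/sbin/php-fpm. Check the comments at the top of that file for the exact syntax on your CSF version.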

To get more information on processes, I would use the htop tool. This is a great article for learning how to manage processes using htop and ps.
lsof (list open files) will tell you which files the process is using.
On Debian or Ubuntu you can install htop and lsof with the command below (on RHEL-family systems, use yum install instead of apt install):
sudo apt install htop lsof -y
This article indicates that:
That message comes from the third-party CSF/LFD application and indicates a PHP-FPM process was running longer than the maximum time configured for the CSF/LFD detection period. It shows the process was not killed, thus you should not have traffic loss.
So you might want to check the PHP-FPM error log for the account in question to see if you notice any particular error messages. It's located at:
/home/$username/logs/domain_tld.php.error.log
It looks like your specific issue has not been resolved on that forum, so you might want to try strace. It watches the system calls made by a given process, including all read and write operations and other OS calls. You can start it on the command line in front of the program you want to trace, or attach it to a running process, either with strace -p followed by the PID or by pressing s on a process selected in htop.
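The two suggestions above could be sketched as follows. Both function names are hypothetical, and the log path simply follows the /home/$username/logs/domain_tld.php.error.log pattern quoted above, with the dots in the domain replaced by underscores; verify it against your own server.

```shell
# Build the per-account PHP-FPM error log path from a username and domain.
fpm_error_log_path() {
    user="$1"
    domain_tld=$(printf '%s' "$2" | tr '.' '_')
    printf '/home/%s/logs/%s.php.error.log\n' "$user" "$domain_tld"
}

# Attach strace to a running PID and write the trace to a file.
# -f follows child processes, -tt adds timestamps, -o sets the output file.
# Usually needs root; stop it with Ctrl-C.
trace_pid() {
    strace -f -tt -p "$1" -o "${2:-/tmp/strace.$1.log}"
}

# Example:
#   tail -n 50 "$(fpm_error_log_path user domain.com)"
#   sudo strace -f -tt -p 16374 -o /tmp/strace.16374.log
```

A trace full of repeated reads, writes or polls on one file or socket usually points at what the long-running request is stuck on; lsof -p on the same PID can then map the file descriptor numbers to paths.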

Related

Linux SIGINT passive capture

Is there a place where the Linux kernel passively logs SIGKILL (kill -9) shutdown requests?
I have a JVM that is being shut down arbitrarily and, based on the evidence available, I suspect that a stray process is somehow issuing a kill to the JVM process. I have robust logging in place, but to confirm my suspicion I'd have to turn the logging level up to overwhelming levels.
I've searched exhaustively through /var/log and can't find any place that captures and logs these SIGKILL events. Any ideas where I might find these events, if they exist?
Option 1:
If your kernel has ftrace support (very likely) try the killsnoop tool from Brendan Gregg's perf-tools:
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/killsnoop
chmod +x killsnoop
sudo ./killsnoop -s
More usage examples in the killsnoop_example.txt file.
Option 2: (passive capture)
If your kernel has no ftrace support you can use the kernel-siglog kernel module from https://github.com/nfedera/kernel-siglog :
git clone https://github.com/nfedera/kernel-siglog.git
cd kernel-siglog/
make
sudo insmod siglog.ko
Once inserted, the siglog kernel module records the last 10,000 signals in /proc/siglog.
I had a similar issue and found the culprit using this kernel module. I had it inserted on a customer's server for some weeks and when the service was killed I logged in, did a cat /proc/siglog and found that my service was killed by a customer's own buggy watchdog script.
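Independently of either tool, if the service is launched from a shell script, the script can at least confirm after the fact that the process died from a signal: the shell reports an exit status of 128 plus the signal number for a child killed that way. The helper name below is made up, and this only tells you which signal arrived, not who sent it, which is what killsnoop or siglog are for.

```shell
# Translate a shell exit status into the name of the signal that killed
# the process, or "none" if the process exited on its own.
signal_from_status() {
    code="$1"
    if [ "$code" -gt 128 ]; then
        kill -l "$((code - 128))"
    else
        echo "none"
    fi
}

# Example wrapper usage (the java command line is a placeholder):
#   java -jar service.jar
#   echo "service ended, signal: $(signal_from_status $?)" >> /var/log/service-exit.log
```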

Neo4j Server running problems on Linux

I am using Neo4j 2.0.3 Community server by installing it on my linux system (by unzipping the tar.gz). I got this error while I tried to start the server
WARNING! You are using an unsupported Java runtime.
process [50690]... waiting for server to be ready.neo4j-community-2.0.3/bin/neo4j: line 147: lsof : command not found
.neo4j-community-2.0.3/bin/neo4j: line 147: lsof : command not found
.neo4j-community-2.0.3/bin/neo4j: line 147: lsof : command not found
. Failed to start within 120 seconds.
Neo4j Server may have failed to start, please check the logs.
I checked for the solution for this and came to know that /usr/sbin had to be added to the path. On doing so and restarting the server, I got the following message
Another server-process is running with [40903], cannot start a new one. Exiting.
However, when I run the command neo4j status, it says
Neo4j Server is not running
Can anybody please help me get started with it?
This is very late, but might help others.
If it tells you this, and you check that process ID with, for example, ps aux | grep 40903, and it's not neo4j, the problem might be that the port is already in use.
By default neo4j uses port 7474, but you can change this in conf/neo4j-server.properties inside the neo4j folder. That was my problem: I had set the port to 22, which was already being used. So make sure it is set to a port that is open and available.
Hope this helps.
You might want to examine the startup script.
Another server-process is running with [40903], cannot start a new one. Exiting.
indicates (to me) that there might be a PID file, written by the script and checked before it attempts to start a new instance. This is the normal thing to do.
I think you need to kill the other process using kill
You can see this answer for how to kill the process:
https://unix.stackexchange.com/questions/8916/when-should-i-not-kill-9-a-process
Otherwise, restarting the operating system will also do the job. For me, I normally start neo4j in the console, as in ./neo4j console. This makes it easier to stop the process.
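The checks described in these answers could be sketched like this. The helper names are hypothetical, and port_in_use assumes the ss tool from iproute2 is installed.

```shell
# Print the command name a PID belongs to; prints nothing if the PID
# does not exist.
pid_command() {
    ps -o comm= -p "$1" 2>/dev/null
}

# Succeed if something is listening on the given TCP port.
# tail drops the header line that ss prints before its results.
port_in_use() {
    ss -ltn "sport = :$1" | tail -n +2 | grep -q .
}

# Example (PID and port taken from the question and answer above):
#   pid_command 40903      # is it really a java/neo4j process?
#   port_in_use 7474 && echo "port 7474 is already taken"
```

If pid_command prints nothing, the PID recorded by the startup script is stale, which would explain why neo4j status reports that the server is not running.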

jsvc.exec error: Still running according to PID file

I run my Java service using jsvc on Linux (Ubuntu 10.04.4 LTS). When I stop the service, requests hang; checking the log, I found the jsvc.exec errors below.
14/03/2014 12:49:48 19831 jsvc.exec error: Still running according to PID file /home/user/tmp/example.pid, PID is 19728
14/03/2014 12:49:48 19830 jsvc.exec error: Service exit with a return value of 122
Any ideas?
Thanks,
I'm having a similar problem that happens at a log rotation. It appears that the system is shutting down, rotating the logs, then trying to start the system. I believe error 122 is telling you that it hasn't completed shutting down yet and can't restart. I believe the -wait parameter is needed in the start script.
http://commons.apache.org/proper/commons-daemon/jsvc.html
Also see http://freddyandersen.wordpress.com/2009/09/02/running-tomcat-as-a-service-on-linux/ for an example.
If this is happening due to logrotate, use the copytruncate option instead of restarting the service.
http://www.vineetmanohar.com/2010/03/howto-rotate-tomcat-catalina-out/
Try running the command below, where 19728 is the process ID from the PID file. Note that kill takes a PID, whereas pkill matches process names.
kill -9 19728
Now start your process again; it should work.
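Before resorting to kill -9, it may be worth checking whether the PID in the file is still alive at all. The sketch below removes a PID file only when no process with that PID exists; clear_stale_pidfile is a made-up helper name, and it makes no attempt to confirm that a live PID really belongs to jsvc.

```shell
# Remove a PID file if the process it names is no longer running.
# Returns 1 (and leaves the file alone) if the process is still alive.
clear_stale_pidfile() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # ps -p succeeds for any existing PID, regardless of who owns it.
    if ps -p "$pid" >/dev/null 2>&1; then
        echo "process $pid is still running" >&2
        return 1
    fi
    rm -f "$pidfile"
}

# Example (path taken from the log in the question):
#   clear_stale_pidfile /home/user/tmp/example.pid && echo "safe to start"
```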

Disable logging in a node.js script running with forever

I am continually running a few server scripts (on different ports) with nodejs using forever.
There is a considerable amount of traffic on some of these servers. The console.log commands I have for tracking connection anomalies result in bloated log files that I don't need all of the time - only for debugging. I have been manually stopping the scripts late at night, truncating the files, and restarting them. This won't do for long term, so we decided to find a solution.
Someone else on my system deleted the log files I had set up for each of the servers without my knowledge. Calling forever list on the command line shows that the server scripts are still running but now I can't tail the log files to see how the nodes are doing.
Node downtime should be kept to a bare minimum, so I'm hesitant to stop the servers during daylight hours for longer than a few minutes. Initial testing from the client side seems to indicate that the scripts are doing fine, but I can't be 100% sure there are no errors due to failed attempts at logging to a nonexistent file.
I have a few questions actually:
Is it ok to keep forever running like this?
If not, is there a proper way to disable logging? The github repository seems to indicate that forever will still log to a default file, which I don't want. Otherwise I may just write a cronjob that periodically stops scripts, truncates logs, then restarts the scripts.
What happens if I just create the logfile again with something like touch logfile_name.log while the script is still running - will this make forever freak out or is this a plausible solution?
Thanks for your time.
According to https://github.com/foreverjs/forever, you can pass -s to silence all logging:
forever -s start YOURSCRIPT
Before doing this, update forever to the latest version:
sudo curl -L https://npmjs.com/install.sh | sudo sh
sudo npm update -g
1) Build in a periodic function or admin option to clear the forever logs. From the manual: forever cleanlogs
2) On Linux at least, send each log file to /dev/null. The log files are set with the options -l (forever's own log), -o (stdout of the script) and -e (stderr of the script). The -a (append) option stops forever from complaining that the log file already exists.
forever start -a -l /dev/null -o /dev/null -e /dev/null your-server.js
Perhaps employ your own logging system. I use log4js; it doesn't complain if I delete the log file while the node process is still running.
There's a nifty tool called logrotate that can help you. Its copytruncate option is especially useful in your case.
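To illustrate why copytruncate suits a process that cannot be restarted, here is a minimal shell sketch of what it does: copy the log, then empty the original in place. Because the file is truncated rather than deleted, the running process keeps writing to the same file. This is also why recreating a deleted log with touch does not help: the process still holds the old, deleted file open. The function name is made up, and this is only an approximation of logrotate's behavior.

```shell
# Copy a log file aside and truncate the original in place.
# Lines written between the copy and the truncate can be lost, which is
# the known trade-off of the copytruncate approach.
copy_truncate() {
    log="$1"
    cp -p "$log" "$log.1" && : > "$log"
}

# Example (log file name is a placeholder):
#   copy_truncate /var/log/myapp/server.log
```

For the truncation to behave well, the writing process should have the log open in append mode, which is the usual case for log files.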

Detect pending linux shutdown

Since I install pending updates for my Ubuntu server as soon as possible, I have to restart my Linux server quite often. I'm running a webapp on that server and would like to warn my users about the pending restart. Right now, I do this manually: I add an announcement before the restart, give users some time to finish their work, restart, and remove the announcement.
I hope shutdown -r +60 writes a file with all the information about the restart, which I can check on every access. Is there such a file? I would prefer a file in a virtual file system like /proc for performance reasons...
I'm running Ubuntu 10.04.2 LTS
If you are using systemd, the following command shows the scheduled shutdown info.
cat /run/systemd/shutdown/scheduled
Example of output:
USEC=1636410600000000
WARN_WALL=1
MODE=reboot
As remarked in a comment by @Björn, USEC is the timestamp in microseconds.
You can convert it to a human-friendly format by dropping the last 6 digits and using date like this:
$ date -d @1636410600
Mon Nov 8 23:30:00 CET 2021
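The conversion can also be scripted. The sketch below reads the USEC line from the file and prints the shutdown time as a Unix timestamp in seconds; the function name is made up, and the optional file argument exists only so the function can be tried against a copy of the file.

```shell
# Print the scheduled shutdown time as epoch seconds.
# Returns non-zero if no shutdown is scheduled (file missing or no USEC).
scheduled_shutdown_epoch() {
    file="${1:-/run/systemd/shutdown/scheduled}"
    [ -r "$file" ] || return 1
    usec=$(sed -n 's/^USEC=//p' "$file")
    [ -n "$usec" ] || return 1
    # Microseconds to seconds: integer division drops the last 6 digits.
    echo "$((usec / 1000000))"
}

# Example:
#   if when=$(scheduled_shutdown_epoch); then
#       echo "Restart scheduled for $(date -d "@$when")"
#   fi
```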
The easiest solution I can envisage is to write a script that wraps the shutdown command and, in that script, create a file that your web application can check for.
As far as I know, shutdown doesn't write a file to the underlying file system, although it does trigger broadcast messages warning of the shutdown, which I suppose you could write a program to intercept. The above solution seems the easiest, though.
Script example:
shutdown.bsh
#!/bin/bash
# Create the marker file the web app checks, then schedule the restart.
touch /somefolder/somefile
shutdown -r "$1"
Then check for 'somefile' in your web app.
You'd need to add a startup link that erased the 'somefile' otherwise it would still be there when the system comes up and the web app would always be telling your users it was about to shut down.
You can simply check for a running shutdown process:
if ps -C shutdown > /dev/null; then
    echo "Shutdown is pending"
else
    echo "Shutdown is not scheduled"
fi
For newer versions of Linux distributions you might need to do:
busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
The way shutdown works has changed.
Tried on:
- Debian Stretch 9.6
- Ubuntu 18.04.1 LTS
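A small parser for the busctl output might look like the sketch below. It assumes the command prints a single line of the form (st) "reboot" 1636410600000000, with an empty mode and a 0 timestamp when nothing is scheduled; that format is an assumption based on what systemd versions I have seen print, so check it on your system. The function name is made up.

```shell
# Read one line of busctl output on stdin and print "MODE EPOCH_SECONDS".
# Returns non-zero when no shutdown is scheduled.
parse_scheduled_shutdown() {
    read -r _sig mode usec || return 1
    # Strip the surrounding double quotes from the mode string.
    mode=${mode#\"}
    mode=${mode%\"}
    [ "${usec:-0}" -gt 0 ] || return 1
    echo "$mode $((usec / 1000000))"
}

# Example:
#   busctl get-property org.freedesktop.login1 /org/freedesktop/login1 \
#       org.freedesktop.login1.Manager ScheduledShutdown | parse_scheduled_shutdown
```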
References
Check if shutdown schedule is active and when it is
The shutdown program on a modern systemd-based Linux system
You could write a daemon that makes the announcement when it catches a termination signal such as SIGINT or SIGQUIT (at shutdown, init normally sends SIGTERM).
