Perl Cron Jobs = High Server Load? - linux

I have multiple cron jobs written in Perl and they appear to be causing a high server load.
When I run top through SSH it shows that Perl is using the most CPU and memory. However, since there are multiple cron jobs, I need to know specifically which one is using the most resources.
Is there any way to check which of the Perl files is using the most resources?

Note the PID of the process that top shows using the CPU. Then do:
ps -ef | grep perl
Match that PID to one in the list and you'll see the full command line of the perl process for the high-CPU job.
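If you'd rather not cross-reference the two outputs by hand, something along these lines lists the perl processes sorted by CPU, with the full command line of each (the [p]erl bracket trick just stops grep from matching itself):
# perl processes sorted by CPU usage, full command line included
ps -eo pid,pcpu,pmem,etime,args --sort=-pcpu | grep '[p]erl'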

Well, if you look at ps -ef you should see which script maps to that process ID. You could also use strace -fTt -p <pid> to attach to a specific process ID and see what it's doing.
Or you could modify the script to change $0 to something meaningful that tells you which script is which.
But it's hard to say without a bit more detail. Is there any chance the script is taking longer to run than your cron job takes to 'fire'? Because if you start backlogging a cron job, things will slowly get worse as more and more instances pile up behind it.
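If overlapping runs turn out to be the problem, a common fix is to wrap each cron entry in flock so a new run is skipped while the previous one is still going. A minimal sketch - the schedule, lock path and script name here are just placeholders:
# crontab entry sketch: flock -n makes this run a no-op if the previous run still holds the lock
*/5 * * * * flock -n /var/lock/report.cron.lock /usr/local/bin/report.pl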

Related

list of all executions in Linux

Is it possible to find out all the programs getting executed on Linux? Many scripts and other executables are launched and killed during the lifetime of a system, and I would like to get a list of these (or a printout when execution starts). I am looking for this to understand program flow on an embedded board.
Typing ps aux in the terminal will give information about the running processes (start time, run time); the ps man page has more on keeping track of processes.
There's a kernel interface which will notify a client program of every fork, exec, and exit. See this answer: https://stackoverflow.com/a/8255487/5844347
Take a look at ps -e, and possibly put it in a crontab if you want to collect that information periodically.
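If the kernel connector route is more than you need, a crude alternative is to poll ps and log anything that wasn't in the previous sample. A rough sketch (interval and log path are arbitrary, and anything shorter-lived than the interval will be missed):
#!/bin/bash
# crude process logger: record any pid/command line not seen in the previous sample
prev=""
while true
do
    curr=$(ps -eo pid,args --no-headers | sort)
    # comm -13 prints lines only present in the second input, i.e. newly appeared processes
    comm -13 <(printf '%s\n' "$prev") <(printf '%s\n' "$curr") >> /tmp/new_processes.log
    prev=$curr
    sleep 1
done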

using qsub (sge) with multi-threaded applications

I want to submit a multi-threaded job to the cluster I'm working with,
but the qsub man page is not clear about how this is done. By default I guess it just sends it as a normal job regardless of the multi-threading, but this might cause problems, i.e. sending many multi-threaded jobs to the same computer and slowing things down.
Does anyone know how to accomplish this? Thanks.
The batch server system is sge.
In SGE/UGE the configuration is set by the administrator, so you have to check what they've called the parallel environments:
qconf -spl
make
our_paraq
look for one with $pe_slots in the config
qconf -sp make
qconf -sp our_paraq
Then qsub with that environment and the number of cores you want to use:
qsub -pe our_paraq 8 -cwd ./myscript
If you're using mpi you have more choices for the config allocation rule ($pe_slots above) like $round_robin and $fill_up, but this should get you going.
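For reference, the same request can also live in the job script itself via #$ directives, so the qsub command line stays short. A sketch using the example environment name from above (the --threads flag is just a stand-in for however your program takes its thread count):
#!/bin/bash
#$ -pe our_paraq 8
#$ -cwd
# SGE sets NSLOTS to the number of slots actually granted to the job
./myscript --threads "$NSLOTS"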
If your job is multithreaded, you can still take advantage of multithreading in SGE. In SGE a single job can use one or many CPUs. If you submit a job that requests a single processor while your program creates more threads than a single processor can handle, problems occur. Verify how many processors your job is requesting and how many threads per CPU your program is creating.
In my case I have a Java program that uses one processor with two threads, and it works pretty efficiently. I submit the same Java program for execution on many CPUs, with 2 threads each, to make it parallel, since I am not using MPI.
The answer by the user "j_m" is very helpful, but in my case I needed to both request multiple cores AND submit my job to a specific node. After a copious amount of searching, I finally found a solution that worked for me, and I'm posting it here so that other people with a similar problem don't have to go through the same pain (please note that I'm leaving this as an answer instead of a comment because I don't have enough reputation to comment):
qsub -S /bin/sh -cwd -l h=$NODE_NAME -V -pe $ENV_NAME $N_OF_CORES $SCRIPT_NAME
I think the variables $NODE_NAME, $N_OF_CORES and $SCRIPT_NAME are pretty straightforward. You can easily find $ENV_NAME by following the answer by "j_m".

How to kill many instances of a process at one go?

I have several instances of a process (i.e. with a common command line). I would like to kill all of them at one go. How to achieve it?
Options:
killall
ps|awk|xargs kill
tag-and-kill in htop
Killall is super powerful, but I find it hazardous to use indiscriminately. Option 2 is awkward to use, but I often find myself in environments that don't have killall; also, leaving out the xargs bit on the first pass lets me review the condemned processes before I swing the blade. Ultimately, I usually favour htop, since it lets me pick and choose before hitting the big "k".
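For the record, a typical spelling of option 2 looks like the lines below, with myprog as a placeholder for the common command line; running just the first line is the "review before the blade swings" pass:
# dry run: list only the PIDs that would be killed
ps -eo pid,args | awk '/[m]yprog/ {print $1}'
# the real thing: -r stops xargs from running kill with an empty list
ps -eo pid,args | awk '/[m]yprog/ {print $1}' | xargs -r kill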
You are probably looking for the killall command. For example:
killall perl
Would kill off all perl processes that are running on your machine. See http://linux.die.net/man/1/killall for more details.
killall will do that for you. Use man killall for the options but I usually do:
killall myProgName
Just be very careful (eg, use ps first to make sure it will only kill what you want).
NOTE: killall is the answer... IF you're on Linux. SysV also has a killall command, but it does a very, very different thing: it is part of shutting down processes prior to a system halt. So yes, killall is the easiest here, but if you often shuttle between Linux and SysV systems, I might recommend writing up a quick script to do what you want instead.
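One more option worth mentioning: if the instances share a command line rather than an executable name (several different perl scripts, say), pgrep/pkill with -f match against the full command line. The pattern below is just a placeholder, and pgrep -a needs a reasonably recent procps (pgrep -fl is the older spelling):
# preview: -f matches the full command line, -a prints it alongside the PID
pgrep -af 'myscript.pl'
# kill every process whose command line matches the pattern
pkill -f 'myscript.pl'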

How to run processes piped with bash on multiple cores?

I have a simple bash script that pipes the output of one process to another. Namely:
dostuff | filterstuff
It happens that on my Linux system (openSUSE if it matters, kernel 2.6.27) both of these processes run on a single core. However, running different processes on different cores is the default policy; it just doesn't happen to trigger in this case.
What component of the system is responsible for that and what should I do to utilize multicore feature?
Note that there's no such problem on 2.6.30 kernel.
Clarification: Having followed Dennis Williamson's advice, I made sure with top that the piped processes are indeed always run on the same processor. The Linux scheduler, which usually does a really good job, doesn't manage it this time.
I figure that something about bash prevents the OS from doing it. The thing is that I need a portable solution for both multi-core and single-core machines. The taskset solution proposed by Dennis Williamson won't work on single-core machines. Currently I'm using:
dostuff | taskset -c 0 filterstuff
but this seems like a dirty hack. Could anyone provide a better solution?
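For what it's worth, a slightly less dirty variant of that hack is to apply the affinity only when there is actually more than one CPU, so the same script degrades gracefully on single-core machines. A sketch using the placeholder names from the question (nproc is from newer coreutils; getconf _NPROCESSORS_ONLN is an older alternative):
#!/bin/bash
# apply the affinity workaround only when a second CPU exists
if [ "$(nproc)" -gt 1 ]; then
    dostuff | taskset -c 0 filterstuff
else
    dostuff | filterstuff
fi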
Suppose dostuff is running on one CPU. It writes data into a pipe, and that data will be in cache on that CPU. Because filterstuff is reading from that pipe, the scheduler decides to run it on the same CPU, so that its input data is already in cache.
If your kernel is built with CONFIG_SCHED_DEBUG=y,
# echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features
should disable this class of heuristics. (See /usr/src/linux/kernel/sched_features.h and /proc/sys/kernel/sched_* for other scheduler tunables.)
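For example, a quick way to check that the interface is actually there before toggling (sched_features lives in debugfs, so this needs root and a mounted debugfs):
# mount debugfs first if the file isn't visible yet
[ -e /sys/kernel/debug/sched_features ] || mount -t debugfs none /sys/kernel/debug
if [ -w /sys/kernel/debug/sched_features ]; then
    echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features
    # disabled features are listed back with a NO_ prefix
    cat /sys/kernel/debug/sched_features
fi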
If that helps, and the problem still happens with a newer kernel, and it's really faster to run on separate CPUs than one CPU, please report the problem to the Linux Kernel Mailing List so that they can adjust their heuristics.
Give this a try to set the CPU (processor) affinity:
taskset -c 0 dostuff | taskset -c 1 filterstuff
Edit:
Try this experiment:
create a file called proctest and chmod +x proctest with this as the contents:
#!/bin/bash
while true
do
ps
sleep 2
done
start this running:
./proctest | grep bash
in another terminal, start top - make sure it's sorting by %CPU
let it settle for several seconds, then quit
issue the command ps u
start top -p with a list of the PIDs of the highest several processes, say 8 of them, from the list left on-screen by the exited top plus the ones for proctest and grep which were listed by ps - all separated by commas, like so (the order doesn't matter):
top -p 1234,1255,1211,1212,1270,1275,1261,1250,16521,16522
add the processor field - press f then j then Space
set the sort to PID - press Shift+F then a then Space
optional: press Shift+H to turn on thread view
optional: press d and type .09 and press Enter to set a short delay time
now watch as processes move from processor to processor, you should see proctest and grep bounce around, sometimes on the same processor, sometimes on different ones
The Linux scheduler is designed to give maximum throughput, not do what you imagine is best. If you're running processes which are connected with a pipe, in all likelihood, one of them is blocking the other, then they swap over. Running them on separate cores would achieve little or nothing, so it doesn't.
If you have two tasks which are both genuinely ready to run on the CPU, I'd expect to see them scheduled on different cores (at some point).
My guess is that what happens is this: dostuff runs until the pipe buffer becomes full, at which point it can't run any more, so the filterstuff process runs. But filterstuff runs for such a short time that dostuff doesn't get rescheduled until filterstuff has drained the entire pipe buffer, at which point dostuff gets scheduled again.

Determining the reason for a stalled process on Linux

I'm trying to determine the reason for a stalled process on Linux. It's a telecom application, running under fairly heavy load. There is a separate process for each of 8 T1 spans. Every so often, one of the processes will get very unresponsive - up to maybe 50 seconds before an event is noted in the normally very busy process's log.
It is likely that some system resource is running short. The obvious thing - CPU usage - looks to be OK.
Which Linux utilities might be best for catching and analyzing this sort of thing, while being as unobtrusive as possible, since this is a highly loaded system? It would need to be process- rather than system-oriented, it would seem. Maybe ongoing monitoring of /proc/pid/XX? top wouldn't seem to be too useful here.
If you are able to spot this "moment of unresponsiveness", then you might use strace to attach to the process in question during that time and try to figure out where it "sleeps":
strace -f -o LOG -p <pid>
More lightweight, but less reliable method:
When the process hangs, use top/ps/gdb/strace/ltrace to find out the state of the process (e.g. whether it is waiting in "select" or consuming 100% CPU in some library call).
Knowing the general nature of the call in question, tailor the invocation of strace to log specific syscalls or groups of syscalls. For example, to log only file-access-related syscalls, use:
strace -e file -f -o LOG ....
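For a telecom application the network class of syscalls may be just as relevant; the same filtering idea applies:
strace -e trace=network -f -o LOG -p <pid>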
If the strace is too heavy a tool for you, try monitoring:
Memory usage with "vmstat 1 > /some/log" - maybe the process is being swapped out (or back in) during that time
IO usage with vmstat/iotop - maybe some other process is thrashing the disks
/proc/interrupts - maybe driver for your T1 card is experiencing problems?
You can strace the program in question and see what system calls it's making.
Thanks - strace sounds useful. Catching the process at the right time will be part of the fun. I came up with a scheme to periodically write a time stamp into shared memory, then monitor with another process. Sending a SIGSTOP would then let me at least examine the application stack with gdb. I don't know if strace on a paused process will tell me much, but I could maybe then turn on strace and see what it will say. Or turn on strace and hit the process with a SIGCONT.
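A rough sketch of that watchdog idea, using a heartbeat file's modification time instead of shared memory for simplicity - the paths, the 10-second threshold and the gdb backtrace step are all assumptions, not something the application already does:
#!/bin/bash
# watchdog: dump a backtrace of all threads when the heartbeat goes stale
HEARTBEAT=/var/run/span3.heartbeat   # the monitored process must touch this regularly
PIDFILE=/var/run/span3.pid

while true
do
    pid=$(cat "$PIDFILE")
    age=$(( $(date +%s) - $(stat -c %Y "$HEARTBEAT") ))
    if [ "$age" -gt 10 ]; then
        # gdb attaches, prints every thread's stack, and detaches again
        gdb -batch -ex 'thread apply all bt' -p "$pid" >> /tmp/stall-backtraces.log 2>&1
    fi
    sleep 2
done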
