How to run processes piped with bash on multiple cores? - linux

I have a simple bash script that pipes the output of one process to another. Namely:
dostuff | filterstuff
It happens that on my Linux system (openSUSE if it matters, kernel 2.6.27) both of these processes run on a single core. However, spreading different processes across different cores is the default policy; it just doesn't happen to trigger in this case.
What component of the system is responsible for that and what should I do to utilize multicore feature?
Note that there's no such problem on 2.6.30 kernel.
Clarification: Following Dennis Williamson's advice, I verified with top that the piped processes are indeed always run on the same processor. The Linux scheduler, which usually does a really good job, isn't doing it this time.
I figure that something in bash prevents the OS from doing it. The thing is that I need a portable solution for both multi-core and single-core machines. The taskset solution proposed by Dennis Williamson won't work on single-core machines. Currently I'm using:
dostuff | taskset -c 0 filterstuff
but this seems like a dirty hack. Could anyone provide a better solution?

Suppose dostuff is running on one CPU. It writes data into a pipe, and that data will be in cache on that CPU. Because filterstuff is reading from that pipe, the scheduler decides to run it on the same CPU, so that its input data is already in cache.
If your kernel is built with CONFIG_SCHED_DEBUG=y,
# echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features
should disable this class of heuristics. (See /usr/src/linux/kernel/sched_features.h and /proc/sys/kernel/sched_* for other scheduler tunables.)
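For reference, here is one way to experiment with the flag and restore it afterwards (a sketch: it assumes debugfs is mounted at /sys/kernel/debug and that you are root):
# Show the current scheduler feature flags (SYNC_WAKEUPS should be among them).
cat /sys/kernel/debug/sched_features
# Disable the synchronous-wakeup heuristic, test the pipeline, then restore the default.
echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features
dostuff | filterstuff
echo SYNC_WAKEUPS > /sys/kernel/debug/sched_features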
If that helps, and the problem still happens with a newer kernel, and it's really faster to run on separate CPUs than one CPU, please report the problem to the Linux Kernel Mailing List so that they can adjust their heuristics.

Give this a try to set the CPU (processor) affinity:
taskset -c 0 dostuff | taskset -c 1 filterstuff
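If the same script also has to run on single-core machines, where taskset -c 1 would fail, one option is to apply the pinning only when more than one CPU is online. A rough sketch (it assumes getconf is available and uses the question's dostuff/filterstuff names):
# Pin the two halves to separate cores only when that is actually possible.
ncpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
if [ "$ncpus" -gt 1 ] && command -v taskset >/dev/null 2>&1; then
    taskset -c 0 dostuff | taskset -c 1 filterstuff
else
    dostuff | filterstuff
fi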
Edit:
Try this experiment:
create a file called proctest, make it executable with chmod +x proctest, and give it these contents:
#!/bin/bash
while true
do
ps
sleep 2
done
start this running:
./proctest | grep bash
in another terminal, start top - make sure it's sorting by %CPU
let it settle for several seconds, then quit
issue the command ps u
start top -p with a comma-separated list of PIDs: the highest several processes (say 8 of them) from the list left on-screen by the exited top, plus the PIDs for proctest and grep that were listed by ps; the order doesn't matter, like so:
top -p 1234,1255,1211,1212,1270,1275,1261,1250,16521,16522
add the processor field - press f then j then Space
set the sort to PID - press Shift+F then a then Space
optional: press Shift+H to turn on thread view
optional: press d and type .09 and press Enter to set a short delay time
now watch as processes move from processor to processor: you should see proctest and grep bounce around, sometimes on the same processor and sometimes on different ones
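A lighter-weight way to watch the same thing, if you would rather not drive top interactively, is to poll the PSR column (the processor each task last ran on); this assumes watch and the ps -C option are available:
watch -n 0.5 'ps -o pid,psr,pcpu,comm -C proctest,grep'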

The Linux scheduler is designed to give maximum throughput, not do what you imagine is best. If you're running processes which are connected with a pipe, in all likelihood, one of them is blocking the other, then they swap over. Running them on separate cores would achieve little or nothing, so it doesn't.
If you have two tasks which are both genuinely ready to run on the CPU, I'd expect to see them scheduled on different cores (at some point).
My guess is that dostuff runs until the pipe buffer becomes full, at which point it can't run any more, so the filterstuff process runs; but it runs for such a short time that dostuff doesn't get rescheduled until filterstuff has finished filtering the entire pipe buffer, at which point dostuff gets scheduled again.
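One rough way to test that theory is to build a pipeline where both halves genuinely keep the CPU busy and then check which processors they land on; yes and gzip are just stand-ins here:
# Both sides of this pipe do real CPU work, so both should be runnable at once.
yes "some test data" | gzip > /dev/null &
sleep 3
ps -o pid,psr,pcpu,comm -C yes,gzip   # PSR shows which processor each side is on
kill %1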

Related

list of all executions in Linux

Is it possible to find out all the programs getting executed in Linux? There will be many scripts and other executables launched and killed during the lifetime of a system, and I would like to get a list of these (or a printout when execution starts). I am looking for this to understand program flow on an embedded board.
Typing ps aux in a terminal will give information about the running processes (start time, runtime) and is a starting point for keeping track of processes.
There's a kernel interface which will notify a client program of every fork, exec, and exit. See this answer: https://stackoverflow.com/a/8255487/5844347
Take a look at ps -e, and possibly at crontab if you want to collect that information periodically.
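If the audit subsystem is available and you have root, another option is to log every execve system-wide and read the records back later. A sketch (the exec_log key name is arbitrary, and arch=b64 assumes a 64-bit system):
auditctl -a always,exit -F arch=b64 -S execve -k exec_log   # start logging exec calls
# ... let the system run for a while ...
ausearch -k exec_log -i                                     # show the recorded executions
auditctl -d always,exit -F arch=b64 -S execve -k exec_log   # remove the rule again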

Why does running multiple identical commands take a very long time?

This isn't so much of a programming question, but more of a problem which I've encountered lately, which I'm trying to understand.
For example, running an ls command in Linux takes maybe 1 second.
But when I spawn off a few thousand ls commands simultaneously, I notice that some of the processes are not running and take a very long time to finish.
Why is that so? And how can we work around that?
Thanks in advance.
UPDATE:
I did a ps and saw that a couple of the ls commands were in the D< state. I read up a bit and understand that it is an uninterruptible sleep. What is that? When does it happen? How can it be avoided?
The number of processes or threads that can be executing concurrently is limited by the number of cores in your machine.
If you spawn thousands of processes or threads simultaneously, the kernel can only run n of them (where n equals the number of available cores) at the same time; the rest will have to wait to be scheduled.
If you want to run more processes or threads truly concurrently, you need to increase the number of available cores in the system (i.e. by adding CPUs or enabling hyperthreading if available).
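If the goal is simply to get the work done rather than to launch everything at once, capping the concurrency usually helps, for example with xargs -P. A sketch (the find command is only a placeholder workload):
# Run at most one ls per core at a time instead of thousands at once.
find /var/log -type f | xargs -P "$(nproc)" -n 10 ls -l > /dev/null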

Will forking more workers allow me to balance CPU-heavy work?

I love node.js' evented model, but it only takes you so far - when you have a function (say, a request handler for HTTP connections) that does a lot of heavy work on the CPU, it's still "blocking" until the function returns. That's to be expected. But what if I want to balance this out a bit, so that a given request takes longer to process but the overall response time is shorter, using the operating system's ability to schedule the processes?
My production code uses node's wonderfully simple Cluster module to fork a number of workers equal to the number of cores the system's CPU has. Would it be bad to fork more than this - perhaps two or three workers per core? I know there'll be a memory overhead here, but memory is not my limitation. What reading I did mentioned that you want to avoid "oversubscribing", but surely on a modern system you're not going crazy by having two or three processes vying for time on the processor.
I think your idea sounds like a good one; especially because many processors support hyperthreading. Hyperthreading is not magical and won't suddenly double your application's speed or throughput but it can make sense to have another thread ready to execute in a core when the first thread needs to wait for a memory request to be filled.
Be careful when you start multiple workers: the Linux kernel really prefers to keep processes executing on the same processor for their entire lifetime to provide strong cache affinity, which makes sense. But I've seen several CPU-hungry processes vying for a single core (or, worse, a single hyperthread instance) rather than the system re-balancing the processes across all cores or all siblings. Check your processor affinities by running ps -eo pid,psr,comm (or whatever your favorite ps(1) invocation is; add the psr column).
To combat this you might want to start your workers with an explicitly limited CPU affinity:
taskset -c 0,1 node worker 1
taskset -c 2,3 node worker 2
taskset -c 4,5 node worker 3
taskset -c 6,7 node worker 4
Or perhaps start eight, one per HT sibling, or eight and confine each one to their own set of CPUs, or perhaps sixteen, confine four per core or two per sibling, etc. (You can go nuts trying to micromanage. I suggest keeping it simple if you can.) See the taskset(1) manpage for details.
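To confirm that the pinning actually took effect, you can query the affinity of a running worker (12345 is a hypothetical PID):
taskset -cp 12345
# pid 12345's current affinity list: 0,1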

Is it possible to "hang" a Linux box with a SCHED_FIFO process?

I want to have a real-time process take over my computer. :)
I've been playing a bit with this. I created a process which is essentially a while (1) (never blocks nor yields the processor) and used schedtool to run it with SCHED_FIFO policy (also tried chrt). However, the process was letting other processes run as well.
Then someone told me about sched_rt_runtime_us and sched_rt_period_us. So I set the runtime to -1 in order to make the real-time process take over the processor (and also tried making both values the same), but it didn't work either.
I'm on Linux 2.6.27-16-server, in a virtual machine with just one CPU. What am I doing wrong?
Thanks,
EDIT: I don't want a fork bomb. I just want one process to run forever, without letting other processes run.
There's another protection I didn't know about.
If you have just one processor and want a SCHED_FIFO process like this (one that never blocks nor yields the processor voluntarily) to monopolize it, besides giving it a high priority (not really necessary in most cases, but doesn't hurt) you have to:
Set sched_rt_runtime_us to -1 or to the value in sched_rt_period_us
If you have group scheduling configured, set /cgroup/cpu.rt_runtime_us to -1 (in case you mount the cgroup filesystem on /cgroup)
Apparently, I had group scheduling configured and wasn't bypassing that last protection.
If you have N processors, and want your N processes to monopolize the processor, you just do the same but launch all of them from your shell (the shell shouldn't get stuck until you launch the last one, since it will have processors to run on). If you want to be really sure each process will go to a different processor, set its CPU affinity accordingly.
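Put together, the steps look roughly like this (a sketch: it assumes the cgroup filesystem is mounted on /cgroup as above, and busyloop is a hypothetical non-blocking spinner):
echo -1 > /proc/sys/kernel/sched_rt_runtime_us   # lift the real-time throttling limit
echo -1 > /cgroup/cpu.rt_runtime_us              # only needed with group scheduling
chrt -f 99 ./busyloop                            # run the spinner as SCHED_FIFO, priority 99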
Thanks to everyone for the replies.
I'm not sure about schedtool, but if you successfully change the scheduler to SCHED_FIFO using sched_setscheduler and then run a task which does not block, one core will be entirely allocated to that task. If this is the only core, no SCHED_OTHER tasks will run at all (i.e. anything except a few kernel threads).
I've tried it myself.
So I speculate that either your "non blocking" task was blocking, or your schedtool program failed to change the scheduler (or changed it for the wrong task).
You can also make your process SCHED_FIFO; even the lowest real-time priority (1) outranks every SCHED_OTHER task, so the process will run forever and won't be pre-empted by ordinary processes.

Determining the reason for a stalled process on Linux

I'm trying to determine the reason for a stalled process on Linux. It's a telecom application, running under fairly heavy load. There is a separate process for each of 8 T1 spans. Every so often, one of the processes will get very unresponsive - up to maybe 50 seconds before an event is noted in the normally very busy process's log.
It is likely that some system resource runs short. The obvious thing - CPU usage - looks to be OK.
Which Linux utilities might be best for catching and analyzing this sort of thing, while being as unobtrusive as possible, since this is a highly loaded system? It would need to be process-oriented rather than system-oriented, it would seem. Maybe ongoing monitoring of /proc/pid/XX? top wouldn't seem to be too useful here.
If you are able to spot this "moment of unresponsiveness", then you might use strace to attach to the process in question during that time and try to figure out where it "sleeps":
strace -f -o LOG -p <pid>
A more lightweight, but less reliable, method:
When the process hangs, use top/ps/gdb/strace/ltrace to find out the state of the process (e.g. whether it is waiting in select or consuming 100% CPU in some library call); a few quick checks are sketched right after this procedure
Knowing the general nature of the call in question, tailor the invocation of strace to log specific syscalls or groups of syscalls. For example, to log only file access-related syscalls, use:
strace -e file -f -o LOG ....
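As mentioned in the first step above, a few quick ways to see what the process is doing at that moment (4321 is a hypothetical PID, and /proc/<pid>/stack usually requires root):
ps -o pid,stat,wchan:32,pcpu,comm -p 4321   # STAT and WCHAN show what it is waiting on
grep State /proc/4321/status                # R, S, D, T, Z
cat /proc/4321/stack 2>/dev/null            # kernel stack of a blocked task, if available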
If strace is too heavy a tool for you, try monitoring the following (a combined sketch follows this list):
Memory usage with "vmstat 1 > /some/log" - maybe the process is being swapped out (or in) during that time
IO usage with vmstat/iotop - maybe some other process is thrashing the disks
/proc/interrupts - maybe driver for your T1 card is experiencing problems?
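A combined sketch of those monitors, left running in the background and reviewed after a stall (iotop availability is assumed):
vmstat 1 > /var/tmp/vmstat.log &             # memory, swap and CPU activity over time
iotop -b -o -d 5 > /var/tmp/iotop.log &      # per-process disk I/O in batch mode
cat /proc/interrupts > /var/tmp/irq.before   # take a snapshot now, compare after the stall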
You can strace the program in question and see what system calls it's making.
Thanks - strace sounds useful. Catching the process at the right time will be part of the fun. I came up with a scheme to periodically write a time stamp into shared memory, then monitor with another process. Sending a SIGSTOP would then let me at least examine the application stack with gdb. I don't know if strace on a paused process will tell me much, but I could maybe then turn on strace and see what it will say. Or turn on strace and hit the process with a SIGCONT.
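For what it's worth, the heartbeat scheme might look something like this (all names are hypothetical; the application would touch the heartbeat file from its main loop):
APP_PID=$(pidof myapp)                 # hypothetical application name
HEARTBEAT=/dev/shm/myapp.heartbeat     # file the application touches periodically
while sleep 1; do
    age=$(( $(date +%s) - $(stat -c %Y "$HEARTBEAT") ))
    if [ "$age" -gt 10 ]; then
        echo "stalled for ${age}s, stopping $APP_PID for inspection"
        kill -STOP "$APP_PID"          # attach gdb or strace here, then kill -CONT
        break
    fi
done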
