I am trying to run several shell commands in parallel without making them background processes using "&".
Also, I want to assign one job to one CPU (in a fair way).
For example, if I have four cores, I want to assign cmd1 to cmd4 as follows:
CPU #1: cmd1
CPU #2: cmd2
CPU #3: cmd3
CPU #4: cmd4
Could you please let me know ways of doing that?
I've found the "parallel" command, but I could not figure out how to use it.
Also, I've tried the following command: ./cmd1 | ./cmd2 | ./cmd3 | ./cmd4
It seems like the four commands (cmd1 to cmd4) are running in parallel, but I am not sure the jobs are assigned to cores as described above.
Thank you!
Sorry. I am running the commands on Linux.
First, if you want processes to be executed in parallel, they have to be background jobs. What do you have against using &?
Second, you can use taskset to bind a process to a CPU core, or a set of cores. For example:
taskset -c 0 cmd1 &
taskset -c 1 cmd2 &
taskset -c 2 cmd3 &
taskset -c 3 cmd4 &
This might not be a good idea though; if one process is idle for long periods of time, the other three cannot use the core it's assigned to.
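If strict pinning is not required, a minimal sketch that just backgrounds the four commands and lets the kernel's scheduler balance them across the cores (cmd1 to cmd4 taken from the question):
cmd1 &
cmd2 &
cmd3 &
cmd4 &
wait  # block until all four commands have finished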
I am getting the number of CPUs I have available with this command:
cat /proc/cpuinfo | grep processor | wc -l
It says I have 4 cores available (actually 2 physical cores, the others logical).
Then I run my task, python3 mytask.py, from the command line. After running my program, I want to change the core it is pinned to, e.g. only core0, or core3, or only core0 and core2.
I know I can do it with the os.sched_setaffinity() function, but I want to do it using the taskset command.
I am trying this:
taskset -pc 2 <pid>
Can I run this command after only checking my available CPU count, or do I have to check which cores are eligible for my task before running the taskset command?
Will the Linux kernel guarantee to accept my new affinity list if it is between 0 and 3?
For example, I have 4 CPUs available, and when I wanted to change a kworker thread's affinity from core0 to core1, it failed. Then I checked the allowed CPUs for the kworker thread with this command:
cat /proc/6/status | grep "Cpus_allowed_list:"
It says the current affinity list is: 0
Do I need to check "Cpus_allowed_list" before running the taskset command to change the affinity list?
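For illustration, a minimal sketch of that check-then-pin sequence (looking up the PID with pgrep -f mytask.py is an assumption; any way of getting the PID works):
pid=$(pgrep -f mytask.py)                    # PID of the running task
grep "Cpus_allowed_list" /proc/$pid/status   # cores the task is currently allowed to use
taskset -pc 0,2 $pid                         # then pin it to cores 0 and 2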
I use a system that is started by a script similar to this:
#!/bin/bash
prog_a & # run continuously
prog_b & # run continuously
prog_c & # run continuously
sleep 2 # wait for some stuff to be done
prog_d # start 'main' program ...
killall -9 prog_a
killall -9 prog_b
killall -9 prog_c
It works well. If I do a Ctrl-C in the terminal (or if prog_d crashes), then prog_d dies and the first processes, prog_a, prog_b, and prog_c, are killed.
The problem I have is that sometimes prog_a, prog_b, or prog_c crashes while prog_d stays alive. What I would like, in fact, is: if one program dies, then the other ones are killed.
Is it possible to do that simply in bash? I have tried to create a kind of:
wait pid1 pid2 pid3 ... # wait until pid1, pid2, or pid3 dies
But without success (I need to be able to do a Ctrl-C to kill prog_d).
Thanks!
I would do that with GNU Parallel, which has nice handling for what to do when any job fails: whether one, or more, or a percentage fail, and whether the other jobs should be terminated immediately or merely no new jobs started.
In your specific case:
parallel -j 4 --halt now,fail=1 --line-buffer ::: progA progB progC 'sleep 2; progD'
That says: "run all four jobs in parallel, and halt immediately, killing all the others, if any job fails. Buffer the output by lines. The jobs to be run are specified after the ::: and they are just your jobs, but with a delay before the final one."
You may like the output tagged by job name, so you can see which output came from which process; if so, use parallel --tag ...
You may like to delay/stagger the starts of the jobs, in which case use parallel --delay 1 to start them at 1-second intervals and remove the sleep 2.
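If you would rather stay in plain bash instead of GNU Parallel, a minimal sketch using wait -n (bash 4.3 or later), which returns as soon as any background job exits (program names taken from the question):
#!/bin/bash
prog_a & prog_b & prog_c &
sleep 2
prog_d &
wait -n   # returns as soon as any one background job exits
kill 0    # then signal the whole process group, taking down the survivors (and this script)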
I am trying to use different schedulers to measure CPU usage among various programs. I am currently having trouble figuring out how to add a different scheduler to the script. I have tried using the chrt command, but I cannot reliably get the PID of the script.
PIDs are fickle and racy (only the parent process of a PID can be sure it hasn't died and been recycled).
I'd use the first form (chrt [options] prio command [arg]...) instead, relying on two scripts:
wrapper_script:
#!/bin/sh
exec chrt --fifo 99 wrapee  # wrapee must be in $PATH
wrapee:
#!/bin/sh
echo "I'm a hi-priority hello world"
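Note that setting a real-time priority such as --fifo 99 generally requires root (or the CAP_SYS_NICE capability), so you would typically launch it as:
sudo ./wrapper_script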
I found that my Linux workstation with 12 CPUs had almost stopped working after I executed a shell script (tcsh) with a for-loop in which more than a hundred iterations were launched simultaneously by adding '&' at the end of the command. Is there any way to control the number of background processes, or their execution time, in a tcsh for-loop?
GNU Parallel is made for this kind of situation.
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize would be to assign 8 jobs to each CPU up front. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
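A minimal sketch for the loop case above (the ./process command and the *.dat glob are hypothetical stand-ins for your own job):
# Instead of backgrounding every iteration at once, feed the work to
# GNU Parallel, which runs one job per CPU core by default:
ls *.dat | parallel ./process {}
# or cap the number of simultaneous jobs explicitly:
ls *.dat | parallel -j 4 ./process {}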
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
I have seven licenses of a particular software, so I want to start 7 jobs simultaneously. I can do that using '&', but the 'wait' command waits for all 7 processes to finish before the next 7 are spawned. I would like to write a shell script where, after I start the first seven, a new job is started as soon as any running one completes. This is because some of those 7 jobs might take very long while others finish really quickly, and I don't want to waste time waiting for all of them. Is there a way to do this on Linux? Could you please help me?
Thanks.
GNU Parallel is the way to go. It is designed for launching multiple instances of the same command, each with a different argument retrieved either from stdin or from an external file.
Let's say your licensed script is called myScript, each instance having the same options --arg1 --arg2 and taking a variable parameter --argVariable for each spawned instance, those parameters being stored in the file myParameters:
cat myParameters | parallel --halt now,fail=1 --jobs 7 ./myScript --arg1 --argVariable {} --arg2
Explanation:
--halt now,fail=1 tells parallel to halt all jobs if one fails
--jobs 7 will run at most 7 instances of myScript at a time
On a Debian-based Linux system, you can install parallel using:
sudo apt-get install parallel
As a bonus, if your licenses allow it, you can even tell parallel to launch these 7 instances amongst multiple computers.
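A hedged sketch of that multi-machine case (host1 and host2 are hypothetical machines you have ssh access to; myScript must be installed on each of them):
cat myParameters | parallel -S host1,host2 --jobs 7 ./myScript --arg1 --argVariable {} --arg2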
You could check how many are currently running and start more if you have fewer than 7:
while true; do
    if [ "$(ps ax -o comm | grep -c process-name)" -lt 7 ]; then
        process-name &
    fi
    sleep 1
done
Write two scripts: one which restarts a job every time it finishes, and one that starts 7 instances of that first script.
Like:
script1:
#!/bin/bash
./script2 job1 &
...
./script2 job7 &
wait
and
script2:
#!/bin/bash
while ...; do  # loop condition left open in the original, e.g. while jobs remain
    ./jobX
done
I found a fairly good solution using make, which is part of the standard distributions.
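A minimal sketch of that approach (the Makefile, its target names, and the myScript arguments are illustrative assumptions, not from the original): each job becomes a phony target, and make -j 7 keeps up to 7 recipes running at once, starting a new one as soon as any finishes.
# Makefile
.PHONY: all job1 job2 job7
all: job1 job2 job7
job1: ; ./myScript --argVariable param1
job2: ; ./myScript --argVariable param2
job7: ; ./myScript --argVariable param7
Then run: make -j 7 all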