Running multiple executables in parallel, limited to 5 at a time - Linux

I am running a large number (100) of calculations. Each calculation takes 5 hours to finish and can run only on one core, so to make the whole process more efficient I need to run 5 of them (or more, depending on the number of CPUs) at the same time.
I have 100 folders, each containing one executable. How can I run all 100 executables so that only 5 are running at any given time?
I have tried xargs, as suggested in Parallelize Bash script with maximum number of processes:
cpus=5
find . -name ./*/*.exe | xargs --max-args=1 --max-procs=$cpus echo
I can't find a way to actually run them; echo only prints the paths to the screen.

You can execute the input as the command by specifying a substitution pattern and using that as the command:
$ printf '%s\n' uname pwd | xargs -n 1 -I{} {}
Linux
/data/data/com.termux/files/home
or more easily but less directly by using env as the command, since that will execute its first argument:
printf '%s\n' uname pwd | xargs -n 1 env
In your case, your full command might be:
find . -name '*.exe' | xargs -n 1 -P 5 env
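If each program should be started from inside its own folder (the question mentions one executable per folder), a minimal sketch in the same spirit, assuming GNU xargs and bash, might be:
# run each .exe from its own directory, at most 5 at a time
find . -type f -name '*.exe' -print0 |
    xargs -0 -n 1 -P 5 bash -c 'cd "$(dirname "$1")" && "./$(basename "$1")"' _
The -print0/-0 pair just keeps paths with spaces intact; adjust the -P value to the number of cores available.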

Related

Why does xtrace show piped commands executing out of order?

Here's a simple reproducer:
cat >sample_pipeline.sh << EOF
set -x
head /dev/urandom | awk '{\$2=\$3=""; print \$0}' | column -t | grep 123 | wc -l >/dev/null
EOF
watch -g -d -n 0.1 'bash sample_pipeline.sh 2>&1 | tee -a /tmp/watchout'
wc -l /tmp/watchout
tail /tmp/watchout
As expected, usually the commands execute in the order they are written in:
+ head /dev/urandom
+ awk '{$2=$3=""; print $0}'
+ column -t
+ grep 123
+ wc -l
...but some of the time the order is different, e.g. awk before head:
+ awk '{$2=$3=""; print $0}'
+ head /dev/urandom
+ column -t
+ grep 123
+ wc -l
I can understand if the shell pre-spawns processes waiting for input, but why wouldn't it spawn them in order?
Replacing bash with dash (Ubuntu's default /bin/sh) seems to show the same behavior.
When you write a pipeline like that, the shell will fork off a bunch of child processes to run all of the subcommands. Each child process will then print the command it is executing just before it calls exec. Since each child is an independent process, the OS might schedule them in any order, and various things going on (cache misses/thrashing between CPU cores) might delay some children and not others. So the order the messages come out is unpredictable.
This happens because of how pipelines are implemented. The shell first forks N subshells, and each subshell (when it gets scheduled) prints its xtrace output and invokes its command. The order of the output is therefore the result of a race.
You can see a similar effect with
for i in {1..5}; do echo "$i" & done
Even though each echo command is spawned in order, the output may still come out as 3 4 5 2 1.
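To convince yourself that the forks really do happen in order even when the output does not, here is a small sketch (assuming bash, which provides $BASHPID):
for i in {1..5}; do
    # each child reports its own PID; PIDs are normally handed out in fork order
    ( echo "child $i has PID $BASHPID" ) &
done
wait
The PIDs will usually be consecutive and increasing even on runs where the lines are printed out of order.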

How to run shell script commands in an sh file in parallel?

I'm trying to take backups of the tables in my database server.
I have around 200 tables. I have a shell script that contains a command to back up each table, like:
backup.sh
psql -u username ..... table1 ... file1;
psql -u username ..... table2 ... file2;
psql -u username ..... table3 ... file3;
I can run the script and create backups on my machine. But as there are 200 tables, it runs the commands sequentially and takes a lot of time.
I want to run the backup commands in parallel. I have seen articles where they suggest using & after each command, or using nohup or wait.
But I don't want to edit the script and add that to around 200 commands.
Is there any way to run this list of shell commands in parallel, something like Node.js does? Is it possible, or am I looking at it wrong?
Sample command in the script:
psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.tablename TO "/home/username/Desktop/PostgresFiles/tablename.csv" with DELIMITER ","';
You can leverage xargs to run commands in parallel AND control the number of concurrent jobs. Running 200 backup jobs at once might overwhelm your database and result in less-than-optimal performance.
Assuming you have backup.sh with one backup command per line:
xargs -P5 -I{} bash -c "{}" < backup.sh
The commands in backup.sh have to be written so that they survive being re-quoted (use single quotes where possible, and escape double quotes):
psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.tablename TO \"/home/username/Desktop/PostgresFiles/tablename.csv\" with DELIMITER \",\"';
Here -P5 controls the number of concurrent jobs. This approach can only process command lines without unescaped double quotes; for the sample command above, you would change "\copy ..." to '\copy ...'.
A simpler alternative would be to use a helper script, backup-table.sh, that takes the table name as its parameter (the output file can be derived from it), and run
xargs -P5 -I{} backup-table.sh "{}" < tables.txt
with all the complex quoting put into backup-table.sh.
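A minimal sketch of what such a helper could look like, reusing the connection options and paths from the question (they are placeholders, not verified values):
#!/bin/bash
# backup-table.sh (sketch) -- $1 is the table name
table=$1
psql --host=somehost --port=5490 --username=user --dbname=db \
    -c '\copy dbo.'"$table"' TO "/home/username/Desktop/PostgresFiles/'"$table"'.csv" with DELIMITER ","'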
doit() {
    table=$1
    psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.'$table' TO "/home/username/Desktop/PostgresFiles/'$table'.csv" with DELIMITER ","'
}
export -f doit
sql --listtables -n postgresql://user:pass@host:5490/db | parallel -j0 doit
Is there any logic in the script other than the individual commands (e.g. any ifs or processing of output)?
If it's just a file with a list of commands, you could write a wrapper for the script (or run a loop from the CLI), e.g.:
$ cat help.txt
echo 1
echo 2
echo 3
$ while read -r i;do bash -c "$i" &done < help.txt
[1] 18772
[2] 18773
[3] 18774
1
2
3
[1] Done bash -c "$i"
[2]- Done bash -c "$i"
[3]+ Done bash -c "$i"
$ while read -r i;do bash -c "$i" &done < help.txt
[1] 18820
[2] 18821
[3] 18822
2
3
1
[1] Done bash -c "$i"
[2]- Done bash -c "$i"
[3]+ Done bash -c "$i"
Each line of help.txt contains a command, and I run a loop that takes each command and runs it in a subshell. (This is a simple example where I just background each job. You could get more sophisticated with something like xargs -P or parallel, but this is a starting point; a sketch of the xargs variant follows.)
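For instance, a rough sketch that keeps the same idea but caps the number of concurrent jobs (assuming GNU xargs):
# run the commands from help.txt, at most 5 at a time
xargs -P 5 -I{} bash -c '{}' < help.txt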

Collect stdout logs for a shell script that runs inside another shell script

I have a shell script called load_data.sh that runs inside another shell script called shell.sh.
The contents of shell.sh:
xargs --max-procs 10 -n 1 sh load_data.sh < tables.txt
This runs load_data.sh on 10 of the tables in tables.txt at the same time.
Now I want to collect the full logs of load_data.sh, so I did:
xargs --max-procs 10 -n 1 sh load_data.sh < tables.txt |& tee -a logs.txt
But I am getting a mix of all the logs. What I want is the 1st table's log, then the 2nd table's log, then the 3rd, and so on...
Is it possible to achieve that? If so, how?
You can solve this by creating a separate logfile for each run of your script. To get the logfiles created in sequence, you can use the nl utility to number each line of the input.
nl -n rz tables.txt | xargs -n 2 --max-procs 10 sh -c './load_data.sh "$1" > log-$0'
This will produce logfiles in sequence:
log-001
log-002
log-003
..
To turn them back into one file you can just use cat:
cat log* > result
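Putting it together in shell.sh, a sketch might look like this (xargs waits for all jobs to finish before returning, and the zero-padded numbers produced by nl -n rz keep the glob in order):
nl -n rz tables.txt | xargs -n 2 --max-procs 10 sh -c './load_data.sh "$1" > "log-$0"'
cat log-* > logs.txt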

Need to insert the value of process ID dynamically into a command

I want to monitor the number of file descriptors opened by a process running on my CentOS box. The command below works for me:
watch -n 1 "ls /proc/pid/fd | wc -l"
The problem comes when the above process is restarted: the pid changes and I can't get the stats.
The good thing is that the process name is constant, so I can extract the pid using pgrep pname.
So how can I use the command in the following way:
watch -n 1 "ls /proc/"pgrep <pname>"/fd | wc -l"
I want the pgrep pname value to be picked up dynamically.
Is there a way to define a variable which continuously gets the latest value of pgrep pname so that I can insert the variable here?
watch evaluates its command as a shell command each time, so we first need a shell command that produces the desired output. Since there may be multiple matching processes, we can use a loop:
for pid in $(pgrep myprocess); do ls "/proc/$pid/fd"; done | wc -l
Now we can quote that to pass it literally to watch:
watch -n 1 'for pid in $(pgrep myprocess); do ls "/proc/$pid/fd"; done | wc -l'
watch -n 1 "pgrep memcached | xargs -I{} ls /proc/{}/fd | wc -l"
Another way to do it.
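If you prefer to keep the logic out of the watch command line, a small sketch using a hypothetical helper script, fdcount.sh, re-resolves the PID on every refresh:
#!/bin/sh
# fdcount.sh (sketch) -- count the open fds of every process whose name matches $1
for pid in $(pgrep "$1"); do
    ls "/proc/$pid/fd"
done | wc -l
and then run it as watch -n 1 ./fdcount.sh myprocess.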

Processing file with xargs for concurrency

There is an input like:
folder1
folder2
folder3
...
foldern
I would like to iterate over it, taking multiple lines at once, and for each line remove the first / (and do more later, but for now this is enough) and echo the result. Iterating over it in bash in a single thread can be slow. The alternative would be to split the input file into N pieces, run the same script N times with different input and output, and merge the results at the end.
I was wondering if this is possible with xargs.
Update 1:
Input:
/a/b/c
/d/f/e
/h/i/j
Output:
mkdir a/b/c
mkdir d/f/e
mkdir h/i/j
Script:
for i in $(<test); do
    echo mkdir $(echo $i | sed 's/\///')
done
Doing it with xargs does not work as I would expect:
xargs -a test -I line --max-procs=2 echo mkdir $(echo $line | sed 's/\///')
Obviously I need a way to run sed on the input for each line, but using $() does not work here.
You probably want:
--max-procs=max-procs, -P max-procs
Run up to max-procs processes at a time; the default is 1. If
max-procs is 0, xargs will run as many processes as possible at
a time. Use the -n option with -P; otherwise chances are that
only one exec will be done.
http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
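Combined with a small bash -c wrapper so that the per-line editing happens inside each spawned shell (a sketch assuming GNU xargs and the input format from the question; the leading / is stripped with a parameter expansion instead of sed):
# two jobs at a time; _ becomes $0, the input line becomes $1
xargs -a test -P 2 -I{} bash -c 'echo mkdir "${1#/}"' _ {}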
With GNU Parallel you can do:
cat file | perl -pe s:/:: | parallel mkdir -p
or:
cat file | parallel mkdir -p {= s:/:: =}
