How to run shell script commands in an sh file in parallel? - linux

I'm trying to back up the tables on my database server.
I have around 200 tables. I have a shell script that contains a backup command for each table, like:
backup.sh
psql -u username ..... table1 ... file1;
psql -u username ..... table2 ... file2;
psql -u username ..... table3 ... file3;
I can run the script and create the backups on my machine. But as there are 200 tables, it runs the commands sequentially and takes a lot of time.
I want to run the backup commands in parallel. I have seen articles that suggest using && after each command, or using the nohup or wait commands.
But I don't want to edit the script and change around 200 such commands.
Is there any way to run this list of shell commands in parallel, something like Node.js does? Is it possible? Or am I looking at it wrong?
Sample command in the script:
psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.tablename TO "/home/username/Desktop/PostgresFiles/tablename.csv" with DELIMITER ","';

You can leverage xargs to run the commands in parallel AND control the number of concurrent jobs. Running 200 backup jobs at once might overwhelm your database and result in less than optimal performance.
Assuming you have backup.sh with one backup command per line:
xargs -P5 -I{} bash -c "{}" < backup.sh
The commands in backup.sh should be modified to survive the extra level of quoting (use single quotes where possible, and escape double quotes):
psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.tablename TO \"/home/username/Desktop/PostgresFiles/tablename.csv\" with DELIMITER \",\"';
Here -P5 controls the number of concurrent jobs. This approach can only process command lines WITHOUT unescaped double quotes. For the above script, you change "\copy ..." to '\copy ...'
A simpler alternative is to use a helper script, backup-table.sh, which takes two parameters (table, file), and run:
xargs -P5 -I{} backup-table.sh "{}" < tables.txt
And put all the complex quoting into the backup-table.sh
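The helper approach could look roughly like the following dry-run sketch. It makes several assumptions that are not in the answer: tables.txt holds one table name per line (so the helper takes only the table name and derives the file name from it), the connection details are copied from the question, and the helper prints each psql command instead of executing it. The temporary directory and the table names are made up for illustration.

```shell
# Dry-run sketch of the helper-script approach (assumed layout, not the answer's exact files).
workdir=$(mktemp -d)

# Hypothetical helper: $1 is the table name.
cat > "$workdir/backup-table.sh" <<'EOF'
#!/bin/sh
table=$1
# Build the command as positional parameters so that all quoting lives in one place.
set -- psql --host=somehost --port=5490 --username=user --dbname=db \
    -c "\\copy dbo.$table TO \"/home/username/Desktop/PostgresFiles/$table.csv\" with DELIMITER \",\""
# Dry run: print the command. Replace this line with "$@" to execute it.
printf '%s\n' "$*"
EOF

# Assumed input: one table name per line.
printf '%s\n' table1 table2 table3 > "$workdir/tables.txt"

# -n 1: one table per helper invocation; -P 5: at most five helpers at a time.
dry_run_output=$(xargs -P 5 -n 1 sh "$workdir/backup-table.sh" < "$workdir/tables.txt")
printf '%s\n' "$dry_run_output"

rm -rf "$workdir"
```

Because the jobs run concurrently, the order of the printed lines is not guaranteed.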

With GNU Parallel:
doit() {
    table=$1
    psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.'"$table"' TO "/home/username/Desktop/PostgresFiles/'"$table"'.csv" with DELIMITER ","'
}
export -f doit
sql --listtables -n postgresql://user:pass@host:5490/db | parallel -j0 doit

Is there any logic in the script other than individual commands (e.g. ifs, or processing of output)?
If it's just a file with a list of commands, you could write a wrapper for the script (or a loop from the CLI), e.g.:
$ cat help.txt
echo 1
echo 2
echo 3
$ while read -r i;do bash -c "$i" &done < help.txt
[1] 18772
[2] 18773
[3] 18774
1
2
3
[1] Done bash -c "$i"
[2]- Done bash -c "$i"
[3]+ Done bash -c "$i"
$ while read -r i;do bash -c "$i" &done < help.txt
[1] 18820
[2] 18821
[3] 18822
2
3
1
[1] Done bash -c "$i"
[2]- Done bash -c "$i"
[3]+ Done bash -c "$i"
Each line of help.txt contains a command, and I run a loop that takes each command and runs it in a subshell. (This is a simple example where I just background each job. You could get more sophisticated using something like xargs -P or parallel, but this is a starting point.)
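When the loop lives in a script rather than an interactive shell, a `wait` after the loop makes the script block until every background job has finished. A minimal sketch, using stand-in echo commands and a temporary directory that are assumptions for illustration:

```shell
out_dir=$(mktemp -d)
cmd_file="$out_dir/commands.txt"

# Stand-in commands; in the question these would be the psql lines.
cat > "$cmd_file" <<EOF
echo 1 > $out_dir/job1.txt
echo 2 > $out_dir/job2.txt
echo 3 > $out_dir/job3.txt
EOF

# Start every line as its own background job.
while IFS= read -r cmd; do
    sh -c "$cmd" &
done < "$cmd_file"

# Block until all background jobs have exited.
wait

finished=$(cat "$out_dir"/job*.txt | sort | tr '\n' ' ')
echo "all jobs finished: $finished"
rm -rf "$out_dir"
```

This starts all jobs at once, so it has no limit on concurrency.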

Related

Different file indirection at every watch execution

I'm trying to collect some data every second into a different file (preferably a file named after the time). I'm trying to use the watch command, but it's not behaving as expected.
watch -p -n 1 "curl -s http://127.0.0.1:9273/metrics > `date +'%H-%M-%S'`.txt"
Only 1 file is created and all the data is directed to it. I was expecting it to write to different files. I'm not looking for alternative methods. Can it be modified to achieve this?
Quote it with single quotes,
or wrap the command line passed to watch with bash -c.
Pay attention to the quotes I used; they cannot be swapped.
Both of the following commands write one file per second:
watch -p -n 1 'curl -s http://127.0.0.1:9273/metrics > `date +'%H-%M-%S'`.txt'
watch -p -n 1 'bash -c "curl -s http://127.0.0.1:9273/metrics > `date +'%H-%M-%S'`.txt"'
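The difference between the two quoting styles can be seen by looking at the string that watch would receive. The sketch below only builds and prints the strings; it does not run watch or curl, and the variable names are made up for illustration.

```shell
# Double quotes: the calling shell runs date once, before watch ever starts,
# so the file name is already fixed inside the string.
double_quoted="curl -s http://127.0.0.1:9273/metrics > `date +'%H-%M-%S'`.txt"

# Single quotes: the backticks stay in the string, so the shell that watch
# spawns on each tick evaluates date again.
single_quoted='curl -s http://127.0.0.1:9273/metrics > `date +%H-%M-%S`.txt'

printf 'double: %s\nsingle: %s\n' "$double_quoted" "$single_quoted"
```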

Collect stdout logs for a shell script that runs inside another shell script

I have a shell script called load_data.sh that runs inside a shell script called shell.sh.
The contents of shell.sh:
xargs --max-procs 10 -n 1 sh load_data.sh < tables.txt
This runs load_data.sh on 10 of the tables listed in tables.txt at the same time.
Now I want to collect the full logs of load_data.sh, so I did:
xargs --max-procs 10 -n 1 sh load_data.sh < tables.txt |& tee -a logs.txt
But I am getting a mix of all the logs. What I want is the log for the 1st table, then the log for the 2nd table, then the 3rd, and so on.
Is it possible to achieve that? If so, how?
You can solve your problem by creating a separate log file for each run of your script. To get the log files created in sequence, you can use the 'nl' utility to number each line of input.
nl -n rz tables.txt | xargs -n 2 --max-procs 10 sh -c './load_data.sh "$1" > log-$0'
This will produce log files in sequence (nl pads the number to six digits by default):
log-000001
log-000002
log-000003
..
To turn that back into one file you can just use 'cat'
cat log* > result
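The whole round trip can be tried out with a stand-in for load_data.sh. In the sketch below the temporary directory, the table names, and the stand-in script are assumptions for illustration; `-P` is the short form of `--max-procs`, and `2>&1` is added so that error output lands in the same per-table log.

```shell
LOG_DEMO_DIR=$(mktemp -d)
export LOG_DEMO_DIR

printf '%s\n' alpha beta gamma > "$LOG_DEMO_DIR/tables.txt"

# Stand-in for the real load_data.sh: just reports which table it was given.
cat > "$LOG_DEMO_DIR/load_data.sh" <<'EOF'
#!/bin/sh
echo "loading $1"
EOF

# nl prefixes each table with a zero-padded number; xargs hands the pair to
# sh -c as $0 (the number) and $1 (the table name).
nl -n rz "$LOG_DEMO_DIR/tables.txt" |
    xargs -n 2 -P 10 sh -c 'sh "$LOG_DEMO_DIR/load_data.sh" "$1" > "$LOG_DEMO_DIR/log-$0" 2>&1'

# The glob expands in sorted order, so the logs are concatenated by table order.
cat "$LOG_DEMO_DIR"/log-* > "$LOG_DEMO_DIR/result"
combined=$(cat "$LOG_DEMO_DIR/result")
printf '%s\n' "$combined"
rm -rf "$LOG_DEMO_DIR"
```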

Run all shell scripts in folder

I have many .sh scripts in a single folder and would like to run them one after another. A single script can be executed as:
bash wget-some_long_number.sh -H
Assume my directory is /dat/dat1/files
How can I run bash wget-some_long_number.sh -H one after another?
I understand something along these lines should work:
for i in *.sh;...do ....; done
Use this:
for f in *.sh; do
    bash "$f"
done
If you want to stop the whole execution when a script fails:
for f in *.sh; do
    bash "$f" || break # execute successfully or break
    # Or more explicitly: if this execution fails, then stop the `for`:
    # if ! bash "$f"; then break; fi
done
If you want to run, e.g., x1.sh, x2.sh, ..., x10.sh:
for i in `seq 1 10`; do
    bash "x$i.sh"
done
To preserve the exit code of the failed script (responding to @VespaQQ):
#!/bin/bash
set -e
for f in *.sh; do
    bash "$f"
done
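A variation on the loops above, as a sketch: keep running the remaining scripts after a failure, but remember the most recent non-zero exit code so that it can be reported at the end. The three sample scripts and the temporary directory are made up for illustration, and sh is used instead of bash only to keep the example portable.

```shell
scripts_dir=$(mktemp -d)

# Sample scripts: the second one fails with exit code 3.
printf '#!/bin/sh\nexit 0\n' > "$scripts_dir/a.sh"
printf '#!/bin/sh\nexit 3\n' > "$scripts_dir/b.sh"
printf '#!/bin/sh\nexit 0\n' > "$scripts_dir/c.sh"

status=0
ran=0
for f in "$scripts_dir"/*.sh; do
    ran=$((ran + 1))
    # On failure, record the exit code but continue with the next script.
    sh "$f" || status=$?
done

echo "ran $ran scripts, last failure status: $status"
rm -rf "$scripts_dir"
```

A wrapper script could end with `exit "$status"` to pass the recorded code on to its caller.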
There is a much simpler way: you can use the run-parts command, which executes all the scripts in a folder:
run-parts /path/to/folder
I ran into this problem where I couldn't use loops and run-parts works with cron.
Answer:
foo () {
    bash "$1" -H
    #echo "$1"
    #cat "$1"
}
cd /dat/dat1/files #change directory
export -f foo #export foo so that parallel can call it
parallel foo ::: *.sh #similar to putting a & after each script
This uses GNU parallel to execute everything in the directory, with the added bonus that it finishes a lot faster. It isn't limited to script execution either: you can put any command in the function and it will work.

Triple nested quotations in shell script

I'm trying to write a shell script that calls another script that then executes a rsync command.
The second script should run in its own terminal, so I use a gnome-terminal -e "..." command. One of the parameters of this script is a string containing the parameters that should be given to rsync. I put those into single quotes.
Up until here, everything worked fine until one of the rsync parameters was a directory path that contained a space. I tried numerous combinations of ',",\",\' but the script either doesn't run at all or only the first part of the path is taken.
Here's a slightly modified version of the code I'm using
gnome-terminal -t 'Rsync scheduled backup' -e "nice -10 /Scripts/BackupScript/Backup.sh 0 0 '/Scripts/BackupScript/Stamp' '/Scripts/BackupScript/test' '--dry-run -g -o -p -t -R -u --inplace --delete -r -l '\''/media/MyAndroid/Internal storage'\''' "
Within Backup.sh this command is run
rsync $5 "$path"
where the destination $path is calculated from text in Stamp.
How can I achieve these three levels of nested quotations?
These are some questions I looked at just now (I've tried other sources earlier as well):
https://unix.stackexchange.com/questions/23347/wrapping-a-command-that-includes-single-and-double-quotes-for-another-command
how to make nested double quotes survive the bash interpreter?
Using multiple layers of quotes in bash
Nested quotes bash
I was unsuccessful in applying the solutions to my problem.
Here is an example. caller.sh uses gnome-terminal to execute foo.sh, which in turn prints all the arguments and then calls rsync with the first argument.
caller.sh:
#!/bin/bash
gnome-terminal -t "TEST" -e "./foo.sh 'long path' arg2 arg3"
foo.sh:
#!/bin/bash
echo $# arguments
for i; do # same as: for i in "$@"; do
    echo "$i"
done
rsync "$1" "some other path"
Edit: If $1 contains several parameters to rsync, some of which are long paths, the above won't work, since bash either passes "$1" as one parameter, or $1 as multiple parameters, splitting it without regard to contained quotes.
There is (at least) one workaround, you can trick bash as follows:
caller2.sh:
#!/bin/bash
gnome-terminal -t "TEST" -e "./foo2.sh '--option1 --option2 \"long path\"' arg2 arg3"
foo2.sh:
#!/bin/bash
rsync_command="rsync $1"
eval "$rsync_command"
This will do the equivalent of typing rsync --option1 --option2 "long path" on the command line.
WARNING: This hack introduces a security vulnerability, $1 can be crafted to execute multiple commands if the user has any influence whatsoever over the string content (e.g. '--option1 --option2 \"long path\"; echo YOU HAVE BEEN OWNED' will run rsync and then execute the echo command).
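One way to avoid eval, assuming the called script can be changed, is to pass the rsync options as separate trailing arguments instead of packing them into one string, and forward them with "$@". The sketch below uses hypothetical functions (show_args stands in for rsync, backup for the called script) and an assumed layout of two fixed leading parameters; it only prints what would be passed on.

```shell
# Stand-in for rsync: prints each argument it receives on its own line.
show_args() {
    printf '%s arguments\n' "$#"
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# Stand-in for the called script: the first two parameters are fixed
# (hypothetical layout); everything after them is forwarded untouched.
backup() {
    shift 2
    show_args "$@"
}

forwarded=$(backup 0 0 --dry-run -r '/media/MyAndroid/Internal storage')
printf '%s\n' "$forwarded"
```

The path with the space arrives as a single argument because "$@" preserves argument boundaries. The command string given to gnome-terminal -e still needs its own level of quoting.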
Did you try escaping the space in the path with "\ " (no quotes)?
gnome-terminal -t 'Rsync scheduled backup' -e "nice -10 /Scripts/BackupScript/Backup.sh 0 0 '/Scripts/BackupScript/Stamp' '/Scripts/BackupScript/test' '--dry-run -g -o -p -t -R -u --inplace --delete -r -l ''/media/MyAndroid/Internal\ storage''' "

How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt would look like
Your job 123454321 ("echo") has been submitted
Similarly if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
I'm aware I could write small bash scripts that each contain the command with the pipe/redirect, and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution can start to feel very hackish. An example of such a script in Python:
import os

for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)
    # One job script per input set, numbered by the loop index
    script = "job" + str(i) + ".sh"
    with open(script, "w") as f:
        f.write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?
You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:
qsub bash -c "cmd <options> | cmd2 <options>"
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
echo "cmd <options> | cmd2 <options>" | qsub
as well.
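Applied to the bowtie/samtools loop from the question, the bash -c form could be scripted roughly as follows. This is a dry-run sketch: it prints the qsub command lines instead of submitting them, and the input list and file names are made up for illustration.

```shell
# Assumed input: one "infile1 infile2 outfile" triple per line.
jobs_file=$(mktemp)
printf '%s\n' 'index1 reads1.fq out1.bam' 'index2 reads2.fq out2.bam' > "$jobs_file"

submissions=$(
    while read -r infile1 infile2 outfile; do
        pipeline="bowtie -S $infile1 $infile2 | samtools view -bS - > $outfile"
        # Dry run: print the submission. To submit for real, replace this
        # line with: qsub bash -c "$pipeline"
        printf 'qsub bash -c "%s"\n' "$pipeline"
    done < "$jobs_file"
)
printf '%s\n' "$submissions"
rm -f "$jobs_file"
```

No per-job script files are created, so nothing has to be cleaned up afterwards.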
Although my answer is a bit late, I am adding it for any incoming viewers. To use a pipe/redirect and submit that as a qsub job you need to do a couple of things. But first, note that using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. your code will run serially rather than get parallelized).
First, run qsub with binary mode enabled, using the "-b y" flag, so that the command is treated as a command to run rather than as a job script. That way you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
Second, echo each call to qsub and then pipe that to the shell.
Suppose you have a file params-query.txt which hold several bowtie commands and piped calls to samtools of the following form:
bowtie -q query -1 param1 -2 param2 ... | samtools ...
To send each query as a separate job, first build the command lines from STDIN with xargs. Notice that the quotes around the braces are important if you are submitting a command made of piped parts: that way your entire query is treated as a single unit.
cat params-query.txt | xargs -i echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
If that didn't work as expected then you're probably better off generating an intermediate output between bowtie and samtools before calling samtools to accept that intermediate output. You won't need to change the qsub call through xargs but the code in params-query.txt should look like:
bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out
This page has interesting qsub tricks you might like
grep http *.job | awk -F: '{print $1}' | sort -u | xargs -I {} qsub {}

Resources