Collect stdout logs for a shell script that runs inside another shell script - linux

I have a shell script called load_data.sh that runs inside a shell script called shell.sh.
The contents of shell.sh:
xargs --max-procs 10 -n 1 sh load_data.sh < tables.txt
This runs load_data.sh on 10 of the tables in tables.txt at the same time.
Now I want to collect the full logs of load_data.sh, so I did:
xargs --max-procs 10 -n 1 sh load_data.sh < tables.txt |& tee -a logs.txt
But I am getting a mix of all the logs. What I want is the 1st table's log, then the 2nd table's log, then the 3rd table's log, and so on...
Is it possible to achieve that? If so, how?

You can solve your problem by creating a separate logfile for each run of your script. To get the logfiles created in sequence, you can use the nl utility to number each line of input:
nl -w 3 -n rz tables.txt | xargs -n 2 --max-procs 10 sh -c './load_data.sh "$1" > log-$0'
This will produce logfiles in sequence:
log-001
log-002
log-003
..
To turn that back into one file, you can just use cat:
cat log* > result
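For illustration, here is a minimal end-to-end sketch. The load_data.sh and table names below are hypothetical stand-ins; the point is that xargs hands sh two arguments per run, so the sequence number lands in $0 and the table name in $1:
# hypothetical stand-in for load_data.sh (the real one loads a table)
printf '#!/bin/sh\necho "loading table $1"\n' > load_data.sh
chmod +x load_data.sh
printf 'orders\ncustomers\nitems\n' > tables.txt
# nl prefixes each table with a zero-padded number; $0 = number, $1 = table
nl -w 3 -n rz tables.txt | xargs -n 2 --max-procs 10 sh -c './load_data.sh "$1" > log-$0'
# the shell glob sorts log-001, log-002, ... so the logs come back in table order
cat log-* > logs.txt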

Related

How to run shell script commands in an sh file in parallel?

I'm trying to take backups of the tables on my database server.
I have around 200 tables. I have a shell script that contains commands to back up each table, like:
backup.sh
psql -u username ..... table1 ... file1;
psql -u username ..... table2 ... file2;
psql -u username ..... table3 ... file3;
I can run the script and create backups on my machine. But as there are 200 tables, it runs the commands sequentially and takes a lot of time.
I want to run the backup commands in parallel. I have seen articles suggesting the use of & after each command, or the nohup or wait commands.
But I don't want to edit the script and change around 200 such commands.
Is there any way to run this list of shell commands in parallel, something like Node.js does? Is it possible, or am I looking at it wrong?
Sample command in the script:
psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.tablename TO "/home/username/Desktop/PostgresFiles/tablename.csv" with DELIMITER ","';
You can leverage xargs to run commands in parallel AND control the number of concurrent jobs. Running 200 backup jobs at once might overwhelm your database and result in less-than-optimal performance.
Assuming you have backup.sh with one backup command per line:
xargs -P5 -I{} bash -c "{}" < backup.sh
The commands in backup.sh need their quoting adjusted (use single quotes where possible, escape double quotes):
psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.tablename TO \"/home/username/Desktop/PostgresFiles/tablename.csv\" with DELIMITER \",\"';
Here -P5 controls the number of concurrent jobs. xargs can only process command lines WITHOUT unescaped double quotes, which is why, for the above script, you change "\copy ..." to '\copy ...'.
A simpler alternative is to use a helper script, backup-table.sh, which takes the table name as a parameter:
xargs -P5 -I{} ./backup-table.sh "{}" < tables.txt
and to put all the complex quoting into backup-table.sh.
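The helper could look something like this (a sketch reusing the asker's host, credentials, and paths; remember to chmod +x it so the xargs line can execute it):
#!/bin/bash
# backup-table.sh: back up one table to <table>.csv
table=$1
psql --host=somehost --port=5490 --username=user --dbname=db \
    -c "\copy dbo.${table} TO \"/home/username/Desktop/PostgresFiles/${table}.csv\" with DELIMITER \",\""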
# requires GNU parallel; the function is exported so parallel's subshells can see it
doit() {
    table=$1
    psql --host=somehost --port=5490 --username=user --dbname=db -c '\copy dbo.'"$table"' TO "/home/username/Desktop/PostgresFiles/'"$table"'.csv" with DELIMITER ","'
}
export -f doit
sql --listtables -n postgresql://user:pass@host:5490/db | parallel -j0 doit
Is there any logic in the script other than the individual commands (e.g. ifs, or processing of output)?
If it's just a file with a list of commands, you could write a wrapper for the script (or a loop from the CLI), e.g.:
$ cat help.txt
echo 1
echo 2
echo 3
$ while read -r i;do bash -c "$i" &done < help.txt
[1] 18772
[2] 18773
[3] 18774
1
2
3
[1] Done bash -c "$i"
[2]- Done bash -c "$i"
[3]+ Done bash -c "$i"
$ while read -r i;do bash -c "$i" &done < help.txt
[1] 18820
[2] 18821
[3] 18822
2
3
1
[1] Done bash -c "$i"
[2]- Done bash -c "$i"
[3]+ Done bash -c "$i"
Each line of help.txt contains a command, and I run a loop where I take each command and run it in a subshell. (This is a simple example where I just background each job. You could get more complex using something like xargs -P or parallel, but this is a starting point.)
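If the wrapper itself must not exit until every command has finished, a wait after the loop takes care of that (same loop as above):
while read -r i; do bash -c "$i" & done < help.txt
wait    # returns once all the backgrounded jobs have exited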

How to pass the output from one command as value of argument of the next command

I want to pass the result of a command as the value of an argument to the next command. I know about xargs and the pipe (|), but those don't really help here.
I want to run the command tail -f --pid=$REMOTE_PID logs, where REMOTE_PID is the PID of a program running on a remote server. That program writes the digits 1 to 30 to a log file, sleeping one second between writes, and I want to display the digits from the log file on the local machine as they appear. All of this is done in a script, not manually!
Here is what I have done so far, but I can't get the correct PID. In the first command I put the & to release the shell, so that I can run the next command:
ssh user@host 'nohup sh showdigits.sh' &
ssh user@host 'PID=`pgrep -f showdigits.sh` && tail --pid=$PID -f logs'
These commands work, but I get several PIDs before getting the right one:
tail: cannot open '8087' for reading: No such file or directory
tail: cannot open '8109' for reading: No such file or directory
==> logs <==
1
2
3
...
I tried another approach:
ssh user@host 'nohup sh showdigits.sh' &
ssh user@host "ps -ef | awk '/[s]howdigits.sh/{print $2}' > pid && tail --pid=`cat pid` -f logs"
I get this error:
cat: pid: No such file or directory
tail: : invalid PID
I want to get the single PID of the script showdigits.sh and pass it to tail. Maybe there is a simpler solution?
Thank you
Your strategy is:
1. Start a process
2. Disconnect and forget about it
3. Reconnect and try to find it
4. Follow it
You can simplify it by dropping steps 2 and 3 entirely:
ssh user@host '
nohup sh showdigits.sh &
tail --pid=$! -f logs
'
(NB: using sh to run a script is a worst practice)
Your pgrep is matching more than one result: the first match is assigned to --pid= and the others are interpreted as file names.
You have to pass it through head (or tail) first (depending on which PID you want):
PID=$(pgrep -f showdigits.sh | head -n1)
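Folded into the second ssh call from the question (same host and script names as the asker's), that gives:
ssh user@host 'PID=$(pgrep -f showdigits.sh | head -n1) && tail --pid=$PID -f logs'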

Multiple scripts making rest calls interfering

So I am running into a problem with Unix scripts that use curl to make REST calls. I have one script that runs two other scripts inside of it:
cat example.sh
FILE="file1.txt"
RECIP="wilfred#blamagam.com"
rm -f $FILE
./script1.sh > $FILE
mail -s "subject" $RECIP < $FILE
RECIP="bob#blamagam.com"
rm -f $FILE
./script2.sh > $FILE
mail -s "subject" $RECIP < $FILE
exit 0
Each script makes REST calls to the same service. It is my understanding that script1.sh should completely finish before script2.sh is run; however, that is not the case. In the logs for the REST service I see a call from the second script arrive in the middle of the first script's execution. The second script then fails because of this (it does not get any data returned).
I am modifying this process, so I am not the one who originally wrote it. I am not seeing any forked or background processes at all, and I have been banging my head against the wall.
I do know that script2.sh works. Whenever script1.sh takes under a minute, script2.sh works just fine, but more often than not script1.sh takes over a minute, causing the second script to fail.
This is run by a cron job, and the contents of the files are mailed out, so I can't just default to running them manually. Any suggestions for what to look into would be much appreciated!
EDIT: Here is high-level pseudocode:
script1.sh
ITEMS=`/usr/bin/curl -m 10 -k -u userName:passWord -L 'https://server/rest-service/rest?where=clause=value;clause2=value2&sel=field' 2>/dev/null | sed 's/<\/\?Attribute[^>]*>/\n/g' | grep -v '^<' | grep -v '^$' | sed 's/ //g'`
echo "\n Subject for these metrics"
echo "$ITEMS"
Both scripts have lots of entries like this. There are two or three for loops, but they are simple and I do not see any background processes being called. It's a large script, so I could only provide a snippet. Could piping the REST call be causing an issue?
Edit:
Just tested this on my system and it seems to work.
cat example.sh
FILE="file1.txt"
RECIP="wilfred#blamagam.com"
rm -f "$FILE"
(./script1.sh > "$FILE") &
procscript1=$!
wait "$procscript1"
mail -s "subject" "$RECIP" < "$FILE"
RECIP="bob#blamagam.com"
rm -f "$FILE"
(./script2.sh > "$FILE") &
procscript2=$!
wait "$procscript2"
mail -s "subject" "$RECIP" < "$FILE"
exit 0
Put the script executions in the background with &.
Get the process ID of each execution with $!.
Use the wait command to block until each execution is done.

reading command line arguments through pipe to sh

I am running a shell script by piping it to sh. For example:
curl commands.io/count-duplicate-lines-in-a-file | sh
The only way I could figure out how to pass in the filename was to use:
read file </dev/tty
You can check out the script here:
Count duplicate lines in a file
Is there another way to pass in the filename as an argument to the script, without first saving the script locally, setting permissions, and running it?
The idea is you can use Monitor to capture terminal input/output and then re-run it from the command line using curl piped to sh.
Use the -s option:
echo 'echo "$#"' | sh -s 1 2 3 4
Output:
1 2 3 4
Another way is to use process substitution, if your shell supports it:
bash <(echo 'echo "$#"') 1 2 3 4
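Applied to the original curl invocation, -s lets the filename ride along after the pipe (myfile.txt is a hypothetical input file; the -- guards against filenames that look like options):
curl commands.io/count-duplicate-lines-in-a-file | sh -s -- myfile.txt    # myfile.txt becomes $1 inside the script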

How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools.) If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt would look like
Your job 123454321 ("echo") has been submitted
Similarly if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
I'm aware I could write a small bash script for each command with the pipe/redirect and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time from a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution starts to feel very hackish. An example of such a script in Python:
for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)
    script = "job" + str(i) + ".sh"
    open(script, "w").write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long each job will sit in the queue, and the script has to still be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?
You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:
qsub bash -c "cmd <options> | cmd2 <options>"
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
echo "cmd <options> | cmd2 <options>" | qsub
as well.
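With a qsub that reads the job script from stdin like that, the original example would be submitted as:
echo "echo 'hello world' > hello.txt" | qsub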
Although my answer is a bit late, I am adding it for any incoming viewers. To use a pipe/redirect and submit that as a qsub job, you need to do a couple of things. But first: using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. your code will run serially rather than get parallelized).
1. Run qsub in binary mode, since the default qsub behavior expects a script file rather than a command. For that you use the -b y flag to qsub, and you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
2. Echo each call to qsub and then pipe that to a shell.
Suppose you have a file params-query.txt which holds several bowtie commands and piped calls to samtools of the following form:
bowtie -q query -1 param1 -2 param2 ... | samtools ...
To send each query as a separate job, first prepare your command-line units from STDIN through xargs. Notice the quotes around the braces are important if you are submitting a command made of piped parts; that way your entire query is treated as a single unit.
cat params-query.txt | xargs -i echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
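For each line of params-query.txt, the echo stage emits (and the trailing sh then runs) a command of this shape, with the whole pipeline quoted as a single argument to qsub:
qsub -b y -o output_log -e error_log -N job_name "bowtie -q query -1 param1 -2 param2 ... | samtools ..."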
If that didn't work as expected, then you're probably better off generating an intermediate output between bowtie and samtools, and calling samtools on that intermediate output afterwards. You won't need to change the qsub call through xargs, but the lines in params-query.txt should look like:
bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out
This page has interesting qsub tricks you might like:
grep http *.job | awk -F: '{print $1}' | sort -u | xargs -I {} qsub {}
