STDOUT of nested commands in Bash

STDOUT of nested commands in Bash - linux

Quite new to bash - I'm trying to store the output of my /usr/bin/time command into the TIME_INFO variable, which works with the below setup... however I would also like to be able to store the output of some of the other nested commands (such as /usr/local/bin/firejail or ./program) to other variables. Currently if there is a runtime exception in ./program it'll also go to the TIME_INFO variable.
TIME_INFO=$( /usr/bin/time --quiet -f "%e-%U-%S-%M-%x" 2>&1 \
timeout 5s \
/usr/local/bin/firejail --quiet --cgroup=/sys/fs/cgroup/memory/group1/tasks --profile=java.profile \
./program < test.in > test.out )
Is there any way to accomplish separating outputs of multiple nested commands?
Thanks in advance!

One way to do this is to inject a shell in the call chain and make it responsible for modifying stderr for its subprocesses:
time_info=$( /usr/bin/time --quiet -f "%e-%U-%S-%M-%x" 2>&1 \
sh -c '"$#" 2>"$0"' test.err \
timeout 5s \
/usr/local/bin/firejail \
--quiet --cgroup=/sys/fs/cgroup/memory/group1/tasks --profile=java.profile \
./program < test.in > test.out )
# read your content back into a shell variable
error_text=$(<test.err)
The pertinent change here is sh -c '"$#" 2>"$0", which runs its arguments as a command, with stderr redirected to the filename passed in $0 -- which is populated from the string immediately following code passed with sh -c.
Note that I modified the case of the TIME_INFO variable per POSIX guidance specifying all-caps names for variables with meaning to the shell or OS, and reserving names with at least one lower-case character for other purposes.

Related

Open new gnome-terminal from scripts and input vars from present script.

#!/bin/bash
Dpath=/home/$USER/Docker/
IP=`sed -n 1p /home/medma/.medmadoc`
DockerMachine=`sed -n 2p /home/$USER/.medmadoc`
DockerPort=`sed -n 5p /home/$USER/.medmadoc`
DockerUser=`sed -n 3p /home/$USER/.medmadoc`
DockerPass=`sed -n 4p /home/$USER/.medmadoc`
if [ ! -d $Dpath ] ; then
mkdir -p $Dpath
else
stat=`wget -O ".dockerid" http://$IP/DOCKER-STAT.txt`
for ids in `cat .dockerid`
do
if [ "$ids" == "$DockerMachine" ] ; then
gnome-terminal -x sh -c 'sshfs -p$DockerPort $DockerUser#$IP:/var/www/html $Dpath ; bash '
nautilus $Dpath
zenity --info --text "Mounted $DockerMachine"
exit
else
:
fi
done
zenity --info --text "No Such ID:$DockerMachine"
fi
gnome-terminal -x sh -c 'sshfs -p$DockerPort $DockerUser#$IP:/var/www/html $Dpath ; bash '
this command opens up a new terminal but the problem is that it does not load vars like $DockerPort $DockerUser $IP $Dpath from this script.
How do I input the values in these vars from this script to the newly opened terminal ?
Thanks !

As indicated before, you could try to use double quotes instead of single quotes around the sshfs invocation.
Single quotes in Bash are used to delimit verbatim text, in which variables are not expanded. Double quotes, in contrast, allow for variables expansion and command substitution ($(...)) to take place.
If you do use double quotes, beware of unintended side-effects (your username may contain a space, a dollar, a semicolon, or any other shell-special character). A cleaner approach would be to export the variables to the environment before calling gnome-terminal (and not forgetting to add double quotes around your variables inside the single-quotes), so that your code looks like :
export Docker{Port,User} IP Dpath
gnome-terminal -x sh -c 'sshfs -p"$DockerPort" "$DockerUser#$IP":/var/www/html "$Dpath" ; bash'
You may not want to pollute the environment with variables that will only be used once. If that is the case, instead of exporting them, you can use Bash's declare -p feature to serialize variables before loading them into a new environment (in my opinion, this is the cleanest approach). Here is what it looks like :
set_vars="$(declare -p Docker{Port,User} IP Dpath)"
gnome-terminal -x bash -c "$set_vars;"'sshfs ....'
Using this latest method, the variables are only visible to the shell process that runs the sshfs command, not gnome-terminal itself nor any sub-process run thereafter.
PS: you could read all your variables at once from the ~/.medmadoc file by using the following code instead of repeated sed invocations :
for var in IP Docker{Machine,User,Pass,Port}; do
read $var
done < ~/.medmadoc
This code makes use of the read builtin, that reads a line of input into a variable (in its simplest form).
PPS: That stat variable probably won't contain any useful information, since the output of wget was redirected by the -O flag. Perhaps you meant to store the result code of wget into stat, in which case what you meant was :
wget -O .dockerid ...
stat=$?

How to make the bash script work with one command after another?

I have a bash script like below. First it will take sorted.bam files as input and use "stringtie" tool give each sample gtf as output. Then path for each sample gtf will be given into mergelist.txt. and then use "stringtie merge" on them to get "stringtie_merged.gtf".
I totally have 40 sorted.bam files.
for sample in /path/*.sorted.bam
do
dir="/pathto/hisat2_output"
dir2="/pathto/folder"
base=`basename $sample '.sorted.bam'`
"stringtie -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/${base}/${base}_GRCh38.gtf -l ${dir2}/stringtie_output/${base}/${base} ${dir}/${base}.sorted.bam; ls ${dir2}/stringtie_output/*/*_GRCh38.gtf > mergelist.txt; stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/stringtie_merged.gtf mergelist.txt"
done
I separated the commands with ; After running the script on all sorted.bam files and after completing the job I see that mergelist.txt has paths only for 33 sample gtf's. Which means the path for other 7 sample gtfs is missing in merge list.txt.
Is Separating the commands with ; right one or is there any other way?
The script should use one command first and with the output the paths need to be given in the text file and then use the other command.

You haven't separated the commands with semi-colons; you've invoked a single command that has semi-colons embedded in it. Consider the simple script:
"ls; pwd"
This script does not call ls followed by pwd. Instead, the shell will search the PATH looking for a file named ls; pwd (that is, a file with a semi-colon and a space in its name), probably not find one and respond with an error message. You need to remove the double quotes.

What's wrong with multiple lines, as you already have more than one line:
dir="/pathto/hisat2_output"
dir2="/pathto/folder"
for sample in /path/*.sorted.bam ;do
base=$(basename ${sample} '.sorted.bam')
stringtie -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/${base}/${base}_GRCh38.gtf -l ${dir2}/stringtie_output/${base}/${base} ${dir}/${base}.sorted.bam
ls ${dir2}/stringtie_output/*/*_GRCh38.gtf > mergelist.txt
stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/stringtie_merged.gtf mergelist.txt
done
Anyway, I don't see the point in having the second stringtie command inside the loop, it should work fine just after.
If stringtie is able process STDIN you might get away without the mergelist.txt by using:
stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/stringtie_merged.gtf <<< $(echo ${dir2}/stringtie_output/*/*_GRCh38.gtf)

you should double quote your variables and use $( command ) instead backticks
base=$( basename $sample '.sorted.bam' ) :
you have a space in filenames??
prefer:
base=$( basename "$sample.sorted.bam" ) # with or without space
if you have spaces, you must double quote:
stringtie -p 8 \
-G gencode.v27.primary_assembly.annotation_nochr.gtf \
-o "$dir2/stringtie_output/$base/$base_GRCh38.gtf" \
-l "$dir2/stringtie_output/$base/$base" \
"$dir/$base.sorted.bam"
ls "$dir2"/stringtie_output/*/*_GRCh38.gtf > mergelist.txt
...

Execute a find command with expression from a shell script [duplicate]

This question already has answers here:
Why does shell ignore quoting characters in arguments passed to it through variables? [duplicate]
(3 answers)
Closed 6 years ago.
I'm trying to write a database call from within a bash script and I'm having problems with a sub-shell stripping my quotes away.
This is the bones of what I am doing.
#---------------------------------------------
#! /bin/bash
export COMMAND='psql ${DB_NAME} -F , -t --no-align -c "${SQL}" -o ${EXPORT_FILE} 2>&1'
PSQL_RETURN=`${COMMAND}`
#---------------------------------------------
If I use an 'echo' to print out the ${COMMAND} variable the output looks fine:
echo ${COMMAND}
screen output:-
#---------------
psql drupal7 -F , -t --no-align -c "SELECT DISTINCT hostname FROM accesslog;" -o /DRUPAL/INTERFACES/EXPORTS/ip_list.dat 2>&1
#---------------
Also if I cut and paste this screen output it executes just fine.
However, when I try to execute the command as a variable within a sub-shell call, it gives an error message.
The error is from the psql client to the effect that the quotes have been removed from around the ${SQL} string.
The error suggests psql is trying to interpret the terms in the sql string as parameters.
So it seems the string and quotes are composed correctly but the quotes around the ${SQL} variable/string are being interpreted by the sub-shell during the execution call from the main script.
I've tried to escape them using various methods: \", \\", \\\", "", \"" '"', \'"\', ... ...
As you can see from my 'try it all' approach I am no expert and it's driving me mad.
Any help would be greatly appreciated.
Charlie101

Instead of storing command in a string var better to use BASH array here:
cmd=(psql ${DB_NAME} -F , -t --no-align -c "${SQL}" -o "${EXPORT_FILE}")
PSQL_RETURN=$( "${cmd[#]}" 2>&1 )

Rather than evaluating the contents of a string, why not use a function?
call_psql() {
# optional, if variables are already defined in global scope
DB_NAME="$1"
SQL="$2"
EXPORT_FILE="$3"
psql "$DB_NAME" -F , -t --no-align -c "$SQL" -o "$EXPORT_FILE" 2>&1
}
then you can just call your function like:
PSQL_RETURN=$(call_psql "$DB_NAME" "$SQL" "$EXPORT_FILE")
It's entirely up to you how elaborate you make the function. You might like to check for the correct number of arguments (using something like (( $# == 3 ))) before calling the psql command.
Alternatively, perhaps you'd prefer just to make it as short as possible:
call_psql() { psql "$1" -F , -t --no-align -c "$2" -o "$3" 2>&1; }
In order to capture the command that is being executed for debugging purposes, you can use set -x in your script. This will the contents of the function including the expanded variables when the function (or any other command) is called. You can switch this behaviour off using set +x, or if you want it on for the whole duration of the script you can change the shebang to #!/bin/bash -x. This saves you explicitly echoing throughout your script to find out what commands are being run; you can just turn on set -x for a section.
A very simple example script using the shebang method:
#!/bin/bash -x
ec() {
echo "$1"
}
var=$(ec 2)
Running this script, either directly after making it executable or calling it with bash -x, gives:
++ ec 2
++ echo 2
+ var=2
Removing the -x from the shebang or the invocation results in the script running silently.

Processing data with inotify-tools as a daemon

I have a bash script that processes some data using inotify-tools to know when certain events took place on the filesystem. It works fine if run in the bash console, but when I try to run it as a daemon it fails. I think the reason is the fact that all the output from the inotifywait command call goes to a file, thus, the part after | while doesn't get called anymore. How can I fix that? Here is my script.
#!/bin/bash
inotifywait -d -r \
-o /dev/null \
-e close_write \
--exclude "^[\.+]|cgi-bin|recycle_bin" \
--format "%w:%&e:%f" \
$1|
while IFS=':' read directory event file
do
#doing my thing
done
So, -d tells inotifywait to run as daemon, -r to do it recursively and -o is the file in which to save the output. In my case the file is /dev/null because I don't really need the output except for processing the part after the command (| while...)

You don't want to run inotify-wait as a daemon in this case, because you want to continue process output from the command. You want to replace the -d command line option with -m, which tells inotifywait to keep monitoring the files and continue printing to stdout:
-m, --monitor
Instead of exiting after receiving a single event, execute
indefinitely. The default behaviour is to exit after the
first event occurs.
If you want things running in the background, you'll need to background the entire script.

Here's a solution using nohup: (Note in my testing, if I specified the -o the while loop didn't seem to be evaluated)
nohup inotifywait -m -r \
-e close_write \
--exclude "^[\.+]|cgi-bin|recycle_bin" \
--format "%w:%&e:%f" \
$1 |
while IFS=':' read directory event file
do
#doing my thing
done >> /some/path/to/log 2>&1 &

Parallel download using Curl command line utility

I want to download some pages from a website and I did it successfully using curl but I was wondering if somehow curl downloads multiple pages at a time just like most of the download managers do, it will speed up things a little bit. Is it possible to do it in curl command line utility?
The current command I am using is
curl 'http://www...../?page=[1-10]' 2>&1 > 1.html
Here I am downloading pages from 1 to 10 and storing them in a file named 1.html.
Also, is it possible for curl to write output of each URL to separate file say URL.html, where URL is the actual URL of the page under process.

My answer is a bit late, but I believe all of the existing answers fall just a little short. The way I do things like this is with xargs, which is capable of running a specified number of commands in subprocesses.
The one-liner I would use is, simply:
$ seq 1 10 | xargs -n1 -P2 bash -c 'i=$0; url="http://example.com/?page${i}.html"; curl -O -s $url'
This warrants some explanation. The use of -n 1 instructs xargs to process a single input argument at a time. In this example, the numbers 1 ... 10 are each processed separately. And -P 2 tells xargs to keep 2 subprocesses running all the time, each one handling a single argument, until all of the input arguments have been processed.
You can think of this as MapReduce in the shell. Or perhaps just the Map phase. Regardless, it's an effective way to get a lot of work done while ensuring that you don't fork bomb your machine. It's possible to do something similar in a for loop in a shell, but end up doing process management, which starts to seem pretty pointless once you realize how insanely great this use of xargs is.
Update: I suspect that my example with xargs could be improved (at least on Mac OS X and BSD with the -J flag). With GNU Parallel, the command is a bit less unwieldy as well:
parallel --jobs 2 curl -O -s http://example.com/?page{}.html ::: {1..10}

Well, curl is just a simple UNIX process. You can have as many of these curl processes running in parallel and sending their outputs to different files.
curl can use the filename part of the URL to generate the local file. Just use the -O option (man curl for details).
You could use something like the following
urls="http://example.com/?page1.html http://example.com?page2.html" # add more URLs here
for url in $urls; do
# run the curl job in the background so we can start another job
# and disable the progress bar (-s)
echo "fetching $url"
curl $url -O -s &
done
wait #wait for all background jobs to terminate

As of 7.66.0, the curl utility finally has built-in support for parallel downloads of multiple URLs within a single non-blocking process, which should be much faster and more resource-efficient compared to xargs and background spawning, in most cases:
curl -Z 'http://httpbin.org/anything/[1-9].{txt,html}' -o '#1.#2'
This will download 18 links in parallel and write them out to 18 different files, also in parallel. The official announcement of this feature from Daniel Stenberg is here: https://daniel.haxx.se/blog/2019/07/22/curl-goez-parallel/

For launching of parallel commands, why not use the venerable make command line utility.. It supports parallell execution and dependency tracking and whatnot.
How? In the directory where you are downloading the files, create a new file called Makefile with the following contents:
# which page numbers to fetch
numbers := $(shell seq 1 10)
# default target which depends on files 1.html .. 10.html
# (patsubst replaces % with %.html for each number)
all: $(patsubst %,%.html,$(numbers))
# the rule which tells how to generate a %.html dependency
# $# is the target filename e.g. 1.html
%.html:
curl -C - 'http://www...../?page='$(patsubst %.html,%,$#) -o $#.tmp
mv $#.tmp $#
NOTE The last two lines should start with a TAB character (instead of 8 spaces) or make will not accept the file.
Now you just run:
make -k -j 5
The curl command I used will store the output in 1.html.tmp and only if the curl command succeeds then it will be renamed to 1.html (by the mv command on the next line). Thus if some download should fail, you can just re-run the same make command and it will resume/retry downloading the files that failed to download during the first time. Once all files have been successfully downloaded, make will report that there is nothing more to be done, so there is no harm in running it one extra time to be "safe".
(The -k switch tells make to keep downloading the rest of the files even if one single download should fail.)

Curl can also accelerate a download of a file by splitting it into parts:
$ man curl |grep -A2 '\--range'
-r/--range <range>
(HTTP/FTP/SFTP/FILE) Retrieve a byte range (i.e a partial docu-
ment) from a HTTP/1.1, FTP or SFTP server or a local FILE.
Here is a script that will automatically launch curl with the desired number of concurrent processes: https://github.com/axelabs/splitcurl

Starting from 7.68.0 curl can fetch several urls in parallel. This example will fetch urls from urls.txt file with 3 parallel connections:
curl --parallel --parallel-immediate --parallel-max 3 --config urls.txt
urls.txt:
url = "example1.com"
output = "example1.html"
url = "example2.com"
output = "example2.html"
url = "example3.com"
output = "example3.html"
url = "example4.com"
output = "example4.html"
url = "example5.com"
output = "example5.html"

curl and wget cannot download a single file in parallel chunks, but there are alternatives:
aria2 (written in C++, available in Deb and Cygwin repo's)
aria2c -x 5 <url>
axel (written in C, available in Deb repo)
axel -n 5 <url>
wget2 (written in C, available in Deb repo)
wget2 --max-threads=5 <url>
lftp (written in C++, available in Deb repo)
lftp -n 5 <url>
hget (written in Go)
hget -n 5 <url>
pget (written in Go)
pget -p 5 <url>

Run a limited number of process is easy if your system have commands like pidof or pgrep which, given a process name, return the pids (the count of the pids tell how many are running).
Something like this:
#!/bin/sh
max=4
running_curl() {
set -- $(pidof curl)
echo $#
}
while [ $# -gt 0 ]; do
while [ $(running_curl) -ge $max ] ; do
sleep 1
done
curl "$1" --create-dirs -o "${1##*://}" &
shift
done
to call like this:
script.sh $(for i in `seq 1 10`; do printf "http://example/%s.html " "$i"; done)
The curl line of the script is untested.

I came up with a solution based on fmt and xargs. The idea is to specify multiple URLs inside braces http://example.com/page{1,2,3}.html and run them in parallel with xargs. Following would start downloading in 3 process:
seq 1 50 | fmt -w40 | tr ' ' ',' \
| awk -v url="http://example.com/" '{print url "page{" $1 "}.html"}' \
| xargs -P3 -n1 curl -o
so 4 downloadable lines of URLs are generated and sent to xargs
curl -o http://example.com/page{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}.html
curl -o http://example.com/page{17,18,19,20,21,22,23,24,25,26,27,28,29}.html
curl -o http://example.com/page{30,31,32,33,34,35,36,37,38,39,40,41,42}.html
curl -o http://example.com/page{43,44,45,46,47,48,49,50}.html

Bash 3 or above lets you populate an array with multiple values as it expands sequence expressions:
$ urls=( "" http://example.com?page={1..4} )
$ unset urls[0]
Note the [0] value, which was provided as shorthand to make the indices line up with page numbers, since bash arrays autonumber starting at zero. This strategy obviously might not always work. Anyway, you can unset it in this example.
Now you have a an array, and you can verify the contents with declare -p:
$ declare -p urls
declare -a urls=([1]="http://example.com?Page=1" [2]="http://example.com?Page=2" [3]="http://example.com?Page=3" [4]="http://example.com?Page=4")
Now that you have a list of URLs in an array, expand the array into a curl command line:
$ curl $(for i in ${!urls[#]}; do echo "-o $i.html ${urls[$i]}"; done)
The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1) to a common server, but it needs the -o option before each one in order to download and save each target. Note that characters within some URLs may need to be escaped to avoid interacting with your shell.

I am not sure about curl, but you can do that using wget.
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
www.website.org/tutorials/html/

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string