Loop for reading more than 1 query in a bash file - linux

I need a loop in a Bash script (analysis-run.sh) for running many queries. Since there are too many queries to run manually, I need a way to automate them. So far, I have created a file inputs.txt with all my queries, and at the end of the script I added the following:
while read f ; do
    ./analysis-run.sh $f
done < inputs.txt
With that loop, analysis-run is only running the first query of inputs.txt over and over again. I am really new to this, so any help would be appreciated.
The content of inputs.txt is:
bones
muscles
blood
saliva
and so on..
The content of analysis-run.sh is:
# Execute this script as ./analysis-run.sh [query] [group]
query=$1
group=$2
if [ $group = "clean" ]; then
    cluster=A
else
    cluster=B
fi
adamo-obtain_bundance.py -query $query -ref combined_$cluster.$group.align -splits 1 -group $group
adamo-obtain_structure.py -i $query.combined_$query$group.csv -o $query.$group -cutoff 0.5 -group $group

The problem (probably) is that you need to quote $f:
while read -r f ; do
    ./analysis-run.sh "$f"
done < inputs.txt
Without the quotes, the line read from inputs.txt would be subject to word splitting and glob expansion.
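To see what the quotes prevent, here is a small self-contained illustration (the sample line is made up for illustration, not taken from the original script):
line='bones *'            # imagine a line containing a space and a glob character
printf '<%s>\n' $line     # unquoted: split into two words, and * may expand to filenames in the current directory
printf '<%s>\n' "$line"   # quoted: passed through as one intact argument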
Read http://tldp.org/LDP/abs/html/quotingvar.html
And run your scripts through ShellCheck.

Using loops in Bash can work, but it is fraught with perils.
Using xargs is usually the cleanest, most robust approach...
<inputs.txt xargs --max-args=1 do_something
The command to execute could be provided as a Bash function...
function do_something
{
echo value=${1}
}
Although the call to xargs is somewhat more involved when taking that approach. See: Calling functions with xargs within a bash script
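For reference, a minimal sketch of that more involved form (assuming bash, since export -f is a bashism), reusing the do_something function above:
export -f do_something                               # make the function visible to child shells
<inputs.txt xargs --max-args=1 bash -c 'do_something "$1"' _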
About xargs
xargs takes a list of arguments (usually file names), which are provided as an input file or stream, and it places those arguments on the command-line for another specified command or function. If the command can handle multiple input arguments, you can drop the --max-args=1 option.

Related

How can I use xargs to run a function in a command substitution for each match?

While writing Bash functions for string replacements I have encountered some strange behaviour when using xargs. It is currently driving me mad, as I cannot get it to work.
Fortunately I have been able to nail it down to the following simple example:
Define a simple function which doubles every character of the given parameter:
function subs { echo $1 | sed -E "s/(.)/\1\1/g"; }
Call the function:
echo $(subs "ABC")
As expected the output is:
AABBCC
Now call the function using xargs:
echo "ABC" | xargs -I % echo $(subs "%")
Surprisingly the result now is:
ABCABC
It seems as if the sed command inside the function now treats the whole string as a single character.
Why does this happen and how can it be prevented?
You might ask why I use xargs at all. Of course, this is a simplified example and the actual use case is much more complex.
In the original use case, I have a program which produces lots of output. I pipe the output through several greps to get the lines of interest. Afterwards, I pipe the lines to sed to extract the data I need. Because some transformations I need to do on the data are too complex for regular expressions alone, I'd like to use a function for these. So my original idea was to simply pipe into the function, but I couldn't get that to work and ended up with the xargs solution. My original idea was something like this:
command | grep ... | grep ... | grep ... | sed ... | subs
BTW: I do not do this from the command line but from within a script. The function is defined in the very same script in which it is used.
I'm using Bash 3.2 (Mac OS X default), so fancy Bash 4.x stuff won't help me, sorry.
I'll be happy about anything which might shed some light on this topic.
If you really need to do this (and you probably don't, but we can't help without a more representative sample), a better-practice approach might look like:
subs() { sed -E "s/(.)/\1\1/g" <<<"$1"; }
export -f subs
echo "ABC" | xargs bash -c 'for arg; do subs "$arg"; done' _
The use of echo "$(subs "$arg")" instead of just subs "$arg" adds nothing but bugs: consider what happens if one of your arguments is -n (and that's assuming a relatively tame echo; implementations are allowed to consume backslashes even without a -e argument, and to do all manner of other surprising things). You could do it above, but it slows your program down and makes it more prone to surprising behaviour; there's no point.
Running export -f subs exports your function to the environment, so it can be run by other instances of bash invoked as child processes (all programs invoked by xargs run outside your shell, so they can't see shell-local variables or functions).
Without -I -- which is to say, in its default mode of operation -- xargs appends arguments to the end of the command it's given. This permits a much more efficient usage mode, where instead of invoking one command per line of input, it passes as many arguments as possible to the shortest possible number of subprocesses.
This also avoids major security bugs that can happen when using xargs -I in conjunction with bash -c '...' or sh -c '...'. (If you ever use -I% sh -c '...%...', then your filenames become part of your code and can be used in injection attacks on your system.)
That's because the construct $(subs "%") gets expanded by the shell when it parses the pipeline, so xargs actually runs echo %%; with -I %, each % is then replaced by ABC, producing ABCABC.
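You can watch this happen step by step (a sketch, using the original function):
subs() { echo $1 | sed -E "s/(.)/\1\1/g"; }
subs "%"                            # prints %%: subs runs on the literal % before xargs is ever invoked
echo "ABC" | xargs -I % echo %%     # xargs then replaces each % with ABC, printing ABCABC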

how to pass asterisk into ls command inside bash script

Hi… Need a little help here…
I tried to emulate the DOS' dir command in Linux using bash script. Basically it's just a wrapped ls command with some parameters plus summary info. Here's the script:
#!/bin/bash
# default to current folder
if [ -z "$1" ]; then var=.;
else var="$1"; fi
# check file existence
if [ -a "$var" ]; then
    # list contents with color, folder first
    CMD="ls -lgG $var --color --group-directories-first"; $CMD;
    # sum all files size
    size=$(ls -lgGp "$var" | grep -v / | awk '{ sum += $3 }; END { print sum }')
    if [ "$size" == "" ]; then size="0"; fi
    # create summary
    if [ -d "$var" ]; then
        folder=$(find $var/* -maxdepth 0 -type d | wc -l)
        file=$(find $var/* -maxdepth 0 -type f | wc -l)
        echo "Found: $folder folders "
        echo " $file files $size bytes"
    fi
# error message
else
    echo "dir: Error \"$var\": No such file or directory"
fi
The problem is that when the argument contains an asterisk (*), the ls within the script acts differently compared to the same ls command given directly at the prompt. Instead of returning the whole file list, the script returns only the first file. See the video below for the comparison in action. I don't know why it behaves like that.
Anyone knows how to fix it? Thank you.
Video: problem in action
UPDATE:
The problem has been solved. Thank you all for the answers. Now my script works as expected. See the video here: http://i.giphy.com/3o8dp1YLz4fIyCbOAU.gif
The asterisk * is expanded by the shell when it parses the command line. In other words, your script doesn't get a parameter containing an asterisk, it gets a list of files as arguments. Your script only works with $1, the first argument. It should work with "$@" instead.
This is because when you retrieve $1 you assume the shell does NOT expand *.
In fact, when * (or another glob) matches, it is expanded, broken into segments by $IFS, and then passed as $1, $2, etc.
You're lucky that you simply got the first file. If your first file's path had contained spaces, you would have got an error, because you would only receive the first segment before the space.
Seriously, read this and especially this. Really.
And please don't do things like
CMD=whatever you get from user input; $CMD;
You are begging for trouble. Don't execute arbitrary strings from the user.
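A minimal sketch of both fixes together: loop over "$@" so every expanded filename is handled, and call ls directly instead of expanding a command string (the flags are taken from the original script):
#!/bin/bash
[ $# -eq 0 ] && set -- .                 # default to the current folder
for var in "$@"; do                      # "$@" keeps each argument intact, spaces included
    ls -lgG --color --group-directories-first -- "$var"
done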
Both answers above have already answered your question, so I'm going to be a bit more verbose.
Your terminal is (probably) running the bash interpreter. This is the program which parses your input line(s) and does things based on that input.
When you enter a line, bash starts the following workflow:
parsing and lexical analysis
expansion
brace expansion
tilde expansion
variable expansion
arithmetic and other substitutions
command substitution
word splitting
filename generation (globbing)
quote removal
Only after all of the above will bash
execute some external command, like ls or dir.sh, etc.,
or carry out some "internal" action for the known keywords and builtins like echo, for, if, etc.
As you can see, the second-to-last step is filename generation (globbing). So, in your case, if test* matches some files, bash expands the wildcard characters (i.e. does the globbing).
So,
when you enter dir.sh test*,
and test* matches some files,
bash does the expansion first,
and only afterwards executes the command dir.sh with the already-expanded filenames,
e.g. the script gets executed (in your case) as: dir.sh test.pas test.swift
BTW, it works exactly the same way for your ls example:
bash expands ls test* to ls test.pas test.swift,
then executes ls with the above two arguments,
and ls prints the result for the two arguments it got.
In other words, ls doesn't even see the test* argument; whenever possible, bash expands the wildcard characters (* and ?).
Now back to your script: add the following line after the shebang:
echo "$0 got these arguments: $@"
and you will immediately see the real arguments your script was executed with.
Also, in such cases it is good practice to try executing the script in debug mode, e.g.
bash -x dir.sh test*
and you will see exactly what the script does.
You can also do the same in your current interpreter: just enter
set -x
into the terminal and try running dir.sh test*; you will see how bash executes the dir.sh command. (To stop the debug mode, just enter set +x.)
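For example, with set -x active and a directory containing test.pas and test.swift, entering dir.sh test* would produce a trace line (prefixed with +) showing the already-expanded command:
+ dir.sh test.pas test.swift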
Everybody is giving you valuable advice which you should definitely follow!
But here is the real answer to your question.
To pass unexpanded arguments to any executable you need to single quote them:
./your_script '*'
The best solution I have is to use the eval command, in this way:
#!/bin/bash
cmd="some command \"with_quetes_and_asterisk_in_it*\""
echo "$cmd"
eval "$cmd"
The eval command concatenates its arguments and evaluates the result as a command line, just as the shell itself would.
This solves my problem when I need to call a command with asterisk '*' in it from a script.

Bash: execute a multi-command line string in a script

A file contains a multi-command line like this:
cd /home/user; ls
In a bash script, I would like to execute these commands, adding some arguments to the last one. For example:
cd /home/user; ls -l *.png
I thought it would be enough to do something like this:
#!/bin/bash
commandLine="$(cat theFileWithCommandInside) -l *.png"
$commandLine
exit 0
But it says:
/home/user;: No such file or directory
In other words, the ";" character no longer means "end of the command": the shell is trying to find a directory called "user;" in the home folder...
I tried replacing ";" with "&&", but the result is the same.
The point of your question is executing a command stored in a string. There are thousands of ways to execute it indirectly, but eventually bash has to be involved.
So why not explicitly invoke bash to do the job?
bash -c "$commandLine"
From the documentation:
-c string
If the -c option is present, then commands are read from string. If there are arguments after the string, they are assigned to the positional parameters, starting with $0.
http://linux.die.net/man/1/bash
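As a small illustration of those positional parameters (the values here are hypothetical):
bash -c 'echo "script name: $0, first argument: $1"' myname hello
# prints: script name: myname, first argument: hello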
Why don't you execute the commands themselves in the script, instead of "importing" them?
#!/bin/bash
cd /home/user; ls -l *.png
exit 0
Wrap the command into a function:
function doLS() {
    cd /home/user; ls "$@"
}
"$@" expands to all arguments passed to the function. If you (or the snippet authors) add functions expecting a predefined number of arguments, you may find the positional parameters $1, $2, ... useful instead.
As the maintainer of the main script, you will have to make sure that everyone providing such a snippet provides that "interface" your code uses (i.e. their code defines the functions your program calls and their functions process the arguments your program passes).
Use source or . to import the function into your running shell:
#!/bin/bash
source theFileWithCommandInside
doLS -l *.png
exit 0
I'd like to add a few thoughts on the ; topic:
In other words, the ";" character no longer means "end of the command": the shell is trying to find a directory called "user;" in the home folder...
; is not used to terminate a statement as in C-style languages. Instead, it is used to separate commands that should be executed sequentially inside a list. Example: executing two commands in a subshell:
( command1 ; command2 )
If the list is part of a group, it must be terminated by a ;:
{ command1 ; command2 ; }
In your example, the expanded string is not tokenized again (as you may have expected), so the ; is not recognized as a command separator, and your code will not run successfully.
The key is: eval.
Here is the fixed script (look at the third line):
#!/bin/bash
commandLine="$(cat theFileWithCommandInside) -l *.png"
eval "$commandLine"
exit 0
Using the <(...) form:
sh <(sed 's/$/ *.png/' theFileWithCommandInside)
Here sed appends the extra arguments to every line of the file, and process substitution feeds the result to a fresh sh as a script.

Execute n lines of shell script

Is there a way to execute only a specified number of lines from a shell script? I will try copying them with head and putting them into a separate .sh, but I wonder if there's a shortcut...
Reorganize the shell script and create functions.
Seriously, put every line of code into a function.
Then (using ksh as an example), source the script with "." into an interactive shell.
You can now run any of the functions by name, and only the code within that function will run.
The following trivial example illustrates this. You can use this in two ways:
1) Link the script so you can call it by the name of one of the functions.
2) Source the script (with . script.sh) and you can then reuse the functions elsewhere.
function one {
    print one
}
function two {
    print two
}
(
    # run the function whose name matches the name the script was invoked by
    progname=${0##*/}
    case $progname in
    (one|two)
        $progname "$@"
    esac
)
Write your own script, /tmp/headexecute for example:
#!/bin/ksh
# copy the first $2 lines of $1 into a temporary script and run it
trap 'rm -f /tmp/somefile' 0
head -n "$2" "$1" > /tmp/somefile
chmod 755 /tmp/somefile
/tmp/somefile
Call it with the name of the file and the number of lines to execute:
/tmp/headexecute /tmp/originalscript 10
Most shells have no such facility. You will have to do it the hard way.
This might work for you (GNU sed):
sed -n '1{h;d};H;2{x;s/.*/&/ep;q}' script
This executes the first two lines of a script: line 1 is saved in the hold space, line 2 is appended to it, and the GNU-specific e flag then hands the two-line pattern space to the shell.
# x = starting line, y = number of lines to execute
eval "$(tail -n +$x script | head -$y)"

Accessing each line using a $ sign in linux

Whenever I execute a Linux command that outputs multiple lines, I want to perform some operation on each line of the output. Generally I do:
command something | while read a
do
    some operation on $a;
done
This works fine. But my question is: is there some way I can access each line via a predefined symbol (I don't know what to call it), something like $?, $!, or $_?
Is it possible to do
cat to_be_removed.txt | rm -f $LINE
i.e. is there a predefined $LINE in bash, or is the previous approach the shortest way:
cat to_be_removed.txt | while read line; do rm -f $line; done;
xargs is what you're looking for:
cat to_be_removed.txt | xargs rm -f
Watch out for spaces in your filenames if you use that one, though. Check out the xargs man page for more information.
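One common mitigation, if your xargs supports the GNU -d option, is to treat each input line as exactly one argument:
xargs -d '\n' rm -f -- < to_be_removed.txt   # spaces inside filenames survive; newlines in names still break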
You might be looking for the xargs command.
It takes control arguments, plus a command and optionally some arguments for the command. It then reads its standard input, normally splitting at white space, and then arranges to repeatedly execute the command with the given arguments and as many 'file names' read from the standard input as will fit on the command line.
rm -f $(<to_be_removed.txt)
This works because rm can take multiple files as input. It also makes it much more efficient because you only call rm once and you don't need to create a pipe to cat or xargs
On a separate note, rather than using pipes in a while loop, you can avoid a subshell by using process substitution:
while read line; do
    some operation on "$line";
done < <(command something)
The additional benefit you get by avoiding a subshell is that variables you change inside the loop maintain their altered values outside the loop as well. This is not the case when using the pipe form and it is a common gotcha.
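A quick demonstration of that gotcha (a self-contained sketch):
count=0
printf 'a\nb\n' | while read -r line; do count=$((count+1)); done
echo "$count"    # prints 0: the loop ran in a subshell, so the increment was lost
count=0
while read -r line; do count=$((count+1)); done < <(printf 'a\nb\n')
echo "$count"    # prints 2: the loop ran in the current shell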