IFS and command substitution - Linux

I am writing a shell script to read input CSV files and run a Java program accordingly.
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while read row
do
  $myScript
  IFS=$"|"
  for column in $row
  do
    $myScript
  done
done < $CSV_FILE
CSV file:
a|b|c
Interestingly, the $myScript outside the for loop works, but the $myScript inside the for loop fails with "/usr/bin/java -version: not found [No such file or directory]". I have come to know that it is because I am setting IFS. If I comment out the IFS assignment and change the CSV file to
a b c
It works! I imagine the shell uses the default IFS to separate the command /usr/bin/java and then applies the -version argument. Since I changed the IFS, it takes the entire string as a single command - or that is what I think is happening.
But this is my requirement: I have a CSV file with a custom delimiter, and the command has arguments in it, separated by spaces. How can I do this correctly?

IFS indicates how to split the values of variables in unquoted substitutions. It applies to both $row and $myScript.
If you want to use IFS to do the splitting, which is convenient in plain sh, then you either need to switch the value of IFS back and forth, or arrange for both splitting operations to use the same separator. In this particular case, you can easily arrange for the same separator by defining myScript as myScript="/usr/bin/java|-version". Alternatively, you can change the value of IFS just in time. In both cases, note that an unquoted substitution doesn't just split the value using IFS; it also interprets each part as a wildcard pattern and replaces it by the list of matching file names if there are any. This means that if your CSV file contains a line like
foo|*|bar
then the row won't be foo, *, bar but foo, each file name in the current directory, bar. To process the data literally, you need to turn off wildcard expansion with set -f. Also remember that read reads continuation lines when a line ends with a backslash, and strips leading and trailing IFS characters. Use IFS= read -r to turn off these two behaviors.
myScript="/usr/bin/java -version"
set -f
while IFS= read -r row
do
  $myScript
  IFS='|'
  for column in $row
  do
    IFS=' '
    $myScript
  done
done <"$CSV_FILE"
However, there are better ways that avoid IFS-splitting altogether. Don't store a command in a space-separated string: it fails in complex cases, such as commands that need an argument containing a space. There are three robust ways to store a command:
Store the command in a function. This is the most natural approach. Running a command is code; you define code in a function. You can refer to the function's arguments collectively as "$@".
myScript () {
  /usr/bin/java -version "$@"
}
…
myScript extra_argument_1 extra_argument_2
Store an executable command name and its arguments in an array.
myScript=(/usr/bin/java -version)
…
"${myScript[#]}" extra_argument_1 extra_argument_2
Store a shell command, i.e. something that is meant to be parsed by the shell. To evaluate the shell code in a string, use eval. Be sure to quote the argument, like any other variable expansion, to avoid premature wildcard expansion. This approach is more complex since it requires careful quoting. It's only really useful when you have to store the command in a string, for example because it comes in as a parameter to your script. Note that you can't sensibly pass extra arguments this way.
myScript='/usr/bin/java -version'
…
eval "$myScript"
Also, since you're using ksh and not plain sh, you don't need to use IFS to split the input line. Use read -A instead to directly split into an array.
#!/usr/bin/ksh
CSV_FILE=${1}
myScript=(/usr/bin/java -version)
while IFS='|' read -r -A columns
do
  "${myScript[@]}"
  for column in "${columns[@]}"
  do
    "${myScript[@]}"
  done
done <"$CSV_FILE"

The simplest solution is to avoid changing IFS and do the splitting with read -d <delimiter>, like this:
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while read -A -d '|' columns
do
  $myScript
  for column in "${columns[@]}"
  do
    echo next is "$column"
    $myScript
  done
done < "$CSV_FILE"

IFS tells the shell which characters separate "words" in the result of an expansion, that is, the different components of a command stored in a variable. So when you remove the space character from IFS and run $cmd where cmd='foo bar', the shell sees a single word "foo bar" rather than "foo" and "bar".
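A quick illustration (a minimal sketch):
cmd='ls -l'
IFS='|'
$cmd    # fails: the shell looks for a command literally named "ls -l"
IFS=' '
$cmd    # works: the expansion is split into "ls" and "-l"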

The IFS assignment should be placed on the same line as the read, right after while:
#!/usr/bin/ksh
CSV_FILE=${1}
myScript="/usr/bin/java -version"
while IFS="|" read row
do
  $myScript
  for column in $row
  do
    $myScript
  done
done < $CSV_FILE

Avoid using an array for wildcard expansion in bash

I wrote the following code:
join(){
  IFS="$1"
  shift
  echo "$*"
}
FILES=(/tmp/*)
SEPARATED_FILES=$(join , ${FILES[*]})
echo "$SEPARATED_FILES"
And it prints the comma-separated list of files in /tmp just fine. But I would like to refactor it and eliminate the temporary global variable FILES, which is an array. I tried the following:
SEPARATED_FILES=$(join , ${(/tmp/*)[*]})
echo "$SEPARATED_FILES"
But it prints the following error:
line 8: ${(/tmp/*)[*]}: bad substitution
Yes! You can avoid it by passing the glob directly as an argument to the function. Note that the glob is expanded by the shell before it is passed to the function. So pass the IFS you want to set as the first argument and the glob expression as the second.
join , /tmp/*
The glob is expanded to file names before the function is called:
join , /tmp/file1 /tmp/file2 /tmp/file3
A noteworthy addition to the above is to set the nullglob option before calling the function: when the glob does not match anything, the unexpanded pattern is then removed instead of being passed through literally.
shopt -s nullglob
join , /tmp/*
and in a command substitution syntax as
fileList=$(shopt -s nullglob; join , /tmp/*)
A couple of takeaways from your good effort:
Always apply shell quoting to variables/arrays unless you have a reason not to. Doing so preserves the literal value of the contents and prevents word splitting from happening.
Always use lower-case names for user-defined variable, function, and array names.
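Putting those takeaways together, a cleaned-up version of the join function might look like this (a sketch assuming bash; local confines the IFS change to the function):
join() {
  local IFS="$1"   # the IFS change is local to the function
  shift
  echo "$*"        # "$*" joins the remaining arguments with the first character of IFS
}

file_list=$(shopt -s nullglob; join , /tmp/*)
echo "$file_list"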

How to extract string in shell script

I have file names like Tarun_Verma_25_02_2016_10_00_10.csv. How can I extract the string like 25_02_2016_10_00_10 from it in shell script?
It is not guaranteed how many numeric parts there will be after "firstName"_"lastName".
A one-line solution would be preferred.
with sed
$ echo Tarun_Verma_25_02_2016_10_00_10.csv | sed -r 's/[^0-9]*([0-9][^.]*)\..*/\1/'
25_02_2016_10_00_10
This extracts everything from the first digit up to the first dot.
If you want some control over which parts you pick out (assuming the format is always like <firstname>_<lastname>_<day>_<month>_<year>_<hour>_<minute>_<second>.csv) awk would be pretty handy
echo "Tarun_Verma_25_02_2016_10_00_10.csv" | awk -F"[_.]" 'BEGIN{OFS="_"}{print $3,$4,$5,$6,$7,$8}'
Here awk splits by both underscore and period, sets the Output Field Separator to an underscore, and then prints the parts of the file name that you are interested in.
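For the sample file name, that prints:
25_02_2016_10_00_10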
ksh93 supports the syntax bash calls extglobs out-of-the-box. Thus, in ksh93, you can do the following:
f='Tarun_Verma_25_02_2016_10_00_10.csv'
f=${f##+([![:digit:]])} # trim everything before the first digit
f=${f%%+([![:digit:]])} # trim everything after the last digit
echo "$f"
To do the same in bash, you'll want to run the following command first
shopt -s extglob
Since this uses shell-native string manipulation, it runs much more quickly than invoking an external command (sed, awk, etc) when processing only a single line of input. (When using ksh93 rather than bash, it's quite speedy even for large inputs).
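For reference, a complete bash version might look like this (a sketch; note that shopt -s extglob has to take effect before the expansions are parsed):
#!/usr/bin/env bash
shopt -s extglob              # enable +(...) extended glob patterns

f='Tarun_Verma_25_02_2016_10_00_10.csv'
f=${f##+([![:digit:]])}       # trim everything before the first digit
f=${f%%+([![:digit:]])}       # trim everything after the last digit
echo "$f"                     # prints 25_02_2016_10_00_10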

Using a glob expression passed as a bash script argument

TL;DR:
Why isn't invoking ./myscript foo* when myscript has var=$1 the same as invoking ./myscript with var=foo* hardcoded?
Longer form
I've come across a weird issue in a bash script I'm writing. I am sure there is a simple explanation, but I can't figure it out.
I am trying to pass a command line argument to be assigned as a variable in the script.
I want the script to allow 2 command line arguments as follows:
$ bash my_bash_script.bash args1 args2
In my script, I assigned variables like this:
ARGS1=$1
ARGS2=$2
Args 1 is a string descriptor to add to the output file.
Args 2 is a group of directories: "dir1, dir2, dir3", which I am passing as dir*
When I assign dir* to ARGS2 in the script it works fine, but when I pass dir* as the second command line argument, it only includes dir1 in the wildcard expansion of dir*.
I assume this has something to do with how the shell handles wildcards (even when passed as args), but I don't really understand it.
Any help would be appreciated.
Environment / Usage
I have a group of directories:
dir_1_y_map, dir_1_x_map, dir_2_y_map, dir_2_x_map,
... dir_10_y_map, dir_10_x_map...
Inside these directories I am trying to access a file with extension ".status" via *.status, and ".report.txt" via *report.txt.
I want to pass dir_*_map as the second argument to the script and store it in the variable ARGS2, then use it to search within each of the directories for the ".status" and ".report" files.
The issue is that passing dir_*_map from the command line doesn't give the list of directories, but rather just the first item in the list. If I assign the variable ARGS2=dir_*_map within the script, it works as I intend.
Workaround: Quoting
It turns out that passing the second argument in quotes allowed the wildcard expansion to work appropriately for "dir_*_map"
#!/usr/bin/env bash
ARGS1=$1
ARGS2=$2
touch $ARGS1".extension"
for i in /$ARGS2/*.status
do
  grep -e "string" $i >> $ARGS1".extension"
done
Here is an example invocation of the script:
sh ~/path/to/script descriptor "dir_*_map"
I don't fully understand when/why some arguments must be passed in quotes, but I assume it has to do with the wildcard expansion in the for loop.
Addressing the "why"
Assignments, as in var=foo*, don't expand globs -- that is, when you run var=foo*, the literal string foo* is put into the variable var, not the list of files matching foo*.
By contrast, unquoted use of foo* on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.
Thus, running ./yourscript foo* doesn't pass foo* as $1 unless no files matching that glob expression exist; instead, it becomes something like ./yourscript foo01 foo02 foo03, with each argument in a different spot on the command line.
The reason running ./yourscript "foo*" functions as a workaround is that the unquoted expansion inside the script allows the glob to be expanded at that later time. However, this is bad practice: glob expansion happens concurrent with string-splitting (meaning that relying on this behavior removes your ability to pass filenames containing characters found in IFS, typically whitespace), and also means that you can't pass literal filenames when they could also be interpreted as globs (if you have a file named [1] and a file named 1, passing [1] would always be replaced with 1).
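To see that last hazard concretely (hypothetical files):
touch 1 '[1]'
echo [1]    # prints "1": the glob matches the file named 1, never the file named [1]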
Idiomatic Usage
The idiomatic way to build this would be to shift away the first argument, and then iterate over subsequent ones, like so:
#!/bin/bash
out_base=$1; shift
shopt -s nullglob # avoid generating an error if a directory has no .status
for dir; do                        # iterate over directories passed in $2, $3, etc
  for file in "$dir"/*.status; do  # iterate over files ending in .status within those
    grep -e "string" "$file"       # match a single file
  done
done >"${out_base}.extension"
If you have many .status files in a single directory, all this can be made more efficient by using find to invoke grep with as many arguments as possible, rather than calling grep individually on a per-file basis:
#!/bin/bash
out_base=$1; shift
find "$#" -maxdepth 1 -type f -name '*.status' \
-exec grep -h -- /dev/null '{}' + \
>"${out_base}.extension"
Both scripts above expect the globs passed not to be quoted on the invoking shell. Thus, usage is of the form:
# being unquoted, this expands the glob into a series of separate arguments
your_script descriptor dir_*_map
This is considerably better practice than passing globs to your script (which then is required to expand them to retrieve the actual files to use); it works correctly with filenames containing whitespace (which the other practice doesn't), and files whose names are themselves glob expressions.
Some other points of note:
Always put double quotes around expansions! Failing to do so results in the additional steps of string-splitting and glob expansion (in that order) being applied. If you want globbing, as in the case of "$dir"/*.status, then end the quotes before the glob expression starts.
for dir; do is precisely equivalent to for dir in "$@"; do, which iterates over arguments. Don't make the mistake of using for dir in $*; do or for dir in $@; do instead! These latter invocations combine each element of the list with the first character of IFS (which, by default, contains the space, the tab and the newline in that order), then split the resulting string on any IFS characters found within, then expand each component of the resulting list as a glob.
Passing /dev/null as an argument to grep is a safety measure: it ensures that you don't have different behavior between the single-argument and multi-argument cases (as an example, grep defaults to printing filenames within output only when passed multiple arguments), and ensures that you can't have grep hang trying to read from stdin if it's passed no additional filenames at all (which find won't do here, but xargs can). A short demonstration follows this list.
Using lower-case names for your own variables (as opposed to system- and shell-provided variables, which have all-uppercase names) is in accordance with POSIX-specified convention; see fourth paragraph of the POSIX specification regarding environment variables, keeping in mind that environment variables and shell variables share a namespace.
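To see the /dev/null point in action (the file name here is illustrative):
grep foo one.log             # a single file: matching lines are printed bare
grep foo /dev/null one.log   # two arguments: each match is prefixed with "one.log:"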

Make multiple copies of files with a shell script

I am trying to write a small shell script that makes multiple copies of a file. I am able to take the file name as input but not the number of copies. Here is what I wrote. But I am unable to pass the NUMBER variable to the for loop.
echo -n "Enter filename: "
read FILENAME
echo -n "Number of copies to be made: "
read NUMBER
for i in {2..$NUMBER}
do
  cp -f $FILENAME ${FILENAME%%.*}"_"$i.csv
done
Unfortunately, it doesn't work like that. Bash performs brace expansion before parameter expansion, so the braces are expanded before $NUMBER is evaluated. See also Bash Pitfall #33, which explains the issue.
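You can watch this happen: brace expansion runs first, sees the invalid range {2..$NUMBER}, and leaves it alone; parameter expansion then produces a literal string:
NUMBER=4
echo {2..$NUMBER}   # prints the literal string {2..4}, not 2 3 4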
One way to do this, using your code, would be:
for i in $(eval echo {2..$NUMBER})
do
  # ...
done
Or, even shorter:
for i in $(seq 2 $NUMBER)
# ...
(thanks, Glenn Jackman!)
Note that typically, variables should be quoted. This is especially important for file names. What if your file is called foo bar? Then your cp -f would copy foo and bar since the arguments are split by whitespace.
So, do something like this:
cp -f "$FILENAME" "${FILENAME%%.*}_${i}.csv"
While it might not matter if your files don't contain whitespace, quoting variables is something you should do automatically to prevent any surprises in the future.

How to pass the value of a variable to the standard input of a command?

I'm writing a shell script that should be somewhat secure, i.e., does not pass secure data through parameters of commands and preferably does not use temporary files. How can I pass a variable to the standard input of a command?
Or, if it's not possible, how can I correctly use temporary files for such a task?
Passing a value to standard input in Bash is as simple as:
your-command <<< "$your_variable"
Always make sure you put quotes around variable expressions!
Be cautious: this will probably work only in bash, and will not work in plain sh.
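For example (the value is made up):
secret='s3cret'
wc -c <<< "$secret"   # prints 7: six characters plus the newline that <<< appends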
Simple, but error-prone: using echo
Something as simple as this will do the trick:
echo "$blah" | my_cmd
Do note that this may not work correctly if $blah contains -n, -e, -E etc; or if it contains backslashes (bash's copy of echo preserves literal backslashes in absence of -e by default, but will treat them as escape sequences and replace them with corresponding characters even without -e if optional XSI extensions are enabled).
More sophisticated approach: using printf
printf '%s\n' "$blah" | my_cmd
This does not have the disadvantages listed above: all possible C strings (strings not containing NULs) are printed unchanged.
(cat <<END
$passwd
END
) | command
The cat is not really needed, but it helps to structure the code better and allows you to use more commands in parentheses as input to your command.
Note that the echo "$var" | command operations mean that standard input is limited to the line(s) echoed. If you also want the terminal to be connected, then you'll need to be fancier:
{ echo "$var"; cat - ; } | command
( echo "$var"; cat - ) | command
This means that the first line(s) will be the contents of $var but the rest will come from cat reading its standard input. If the command does not do anything too fancy (try to turn on command line editing, or run like vim does) then it will be fine. Otherwise, you need to get really fancy - I think expect or one of its derivatives is likely to be appropriate.
The command line notations are practically identical - but the second semi-colon is necessary with the braces whereas it is not with parentheses.
This robust and portable way has already appeared in comments. It should be a standalone answer.
printf '%s' "$var" | my_cmd
or
printf '%s\n' "$var" | my_cmd
Notes:
It's better than echo; the reasons are explained here: Why is printf better than echo?
printf "$var" is wrong. The first argument is format where various sequences like %s or \n are interpreted. To pass the variable right, it must not be interpreted as format.
Usually variables don't contain trailing newlines. The former command (with %s) passes the variable as it is. However, tools that work with text may ignore or complain about an incomplete line (see Why should text files end with a newline?). So you may want the latter command (with %s\n), which appends a newline character to the content of the variable.
Non-obvious facts:
Here string in Bash (<<<"$var" my_cmd) does append a newline.
Any method that appends a newline results in non-empty stdin of my_cmd, even if the variable is empty or undefined.
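Both facts are easy to verify:
unset var
wc -c <<< "$var"             # prints 1: the here string still supplies a trailing newline
printf '%s' "$var" | wc -c   # prints 0: stdin is genuinely empty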
I liked Martin's answer, but it has some problems depending on what is in the variable. This
your-command <<< """$your_variable"""
is better if your variable contains " or !.
As per Martin's answer, there is a Bash feature called Here Strings (which itself is a variant of the more widely supported Here Documents feature):
3.6.7 Here Strings
A variant of here documents, the format is:
<<< word
The word is expanded and supplied to the command on its standard
input.
Note that Here Strings would appear to be Bash-only, so, for improved portability, you'd probably be better off with the original Here Documents feature, as per PoltoS's answer:
( cat <<EOF
$variable
EOF
) | cmd
Or, a simpler variant of the above:
(cmd <<EOF
$variable
EOF
)
You can omit ( and ), unless you want to have this redirected further into other commands.
Try this:
echo "$variable" | command
If you came here from a duplicate, you are probably a beginner who tried to do something like
"$variable" >file
or
"$variable" | wc -l
where you obviously meant something like
echo "$variable" >file
echo "$variable" | wc -l
(Real beginners also forget the quotes; usually use quotes unless you have a specific reason to omit them, at least until you understand quoting.)
