Bash Issue: AWK - linux

I came back to work from a break to find that my Bash script wasn't working like it used to. The tidbit of code below grabs and filters what's in a file. Here are the contents of said file:
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
# $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third field indicates
# to the dbstart utility that the database should, "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N
*:/software/oracle/agent/agent11g:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
What the code used to produce and throw into an array:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
It would look at the file (oratab), find the database names (e.g. xtst160), and put them into an array. I then used this array for other tasks later in the script. Here's the relevant Bash script code:
# Collect the databases using a mixture of AWK and regex, and throw it into an array.
printf "\n2) Collecting databases on %s:\n" $HOSTNAME
declare -a arr_dbs=(`awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $ddma_input}' /etc/oratab`)
# Loop through and print the array of databases.
for i in ${arr_dbs[@]}
do
    printf "%s " $i
done
It doesn't seem that anyone has modified the code or that the oratab file format has changed, so I'm not 100% sure what's going on. Instead of grabbing just the few characters, it's grabbing the entire line:
dev068:/software/oracle/ora-10.02.00.04.11:Y
I'm trying to understand Bash and regex more, but I'm stumped; it's definitely not my forte. A broken-down explanation of the awk line would be greatly appreciated.

I found the error. We changed the number of arguments being passed in and the order in which they are received.

Printing $1 instead of $ddma_input resolved the issue as well:
declare -a arr_dbs=(`awk -F ":" -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab`)
# Loop through and print the array of databases.
for i in ${arr_dbs[@]}
do
    printf "%s " $i
done
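And since a breakdown was requested, here is what each piece of that corrected awk invocation does:

# -F ":"        use ':' as the field separator when splitting each line
# -v key='...'  define an awk variable 'key' holding the pattern to match
# $2 ~ key      test whether field 2 (the Oracle home path) matches key
# {print $1}    if it does, print field 1 (the SID, e.g. xtst160)
awk -F ":" -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab

Comment lines in /etc/oratab never have '/software/oracle/ora' in field 2, so they fall out naturally.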

You could easily implement this whole thing in native bash with no external tools at all:
arr_dbs=( )
while IFS= read -r line; do
    case $line in
        "#"*) continue ;;
        *:/software/oracle/ora*:*) arr_dbs+=( "${line%%:*}" ) ;;
    esac
done </etc/oratab
printf ' %s\n' "${arr_dbs[@]}"
This actually avoids some bugs you had in your original implementation. Let's say you had a line like the following:
*:/software/oracle/ora-default:Y
If you aren't careful with how you handle that *, it'll be replaced with a list of filenames in the current directory by the shell whenever expansion occurs.
What does "whenever expansion occurs" mean in this context? Well:
# this will expand a * into a list of filenames during the assignment to the array
arr=( $(echo "*") ) # vs the correct read -a arr < <(echo "*")
# this will expand a * into a list of filenames while generating items to iterate over
for i in ${arr[@]} # vs the correct for i in "${arr[@]}"
# this will expand a * into a list of filenames while building the argument list for echo
i="*"
echo $i # vs the correct printf '%s\n' "$i"
Note the use of printf over echo -- see the APPLICATION USAGE section of the POSIX specification of echo.
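A quick demonstration of the difference (a minimal sketch):

i='-n'
echo $i             # bash's echo takes -n as an option and prints nothing
printf '%s\n' "$i"  # reliably prints: -n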


Can I get the name of the file currently being read in a for loop?

I want to write a script that takes a word as an argument and searches the current directory's and subdirectories' files for the word. If it is found in any of the files, the script should echo a message containing the file name and the line the word is found on.
This is what I have so far, but I can't find a way to actually store the file name of the file being read, or the line number:
word=$1
for var in $(grep -R "$word *")
do
    filename=$(find . -type f -name "*")   # this doesn't work
    linenmbr=$(grep -n "$ord" file)        # this doesn't work
    echo found $word in $filename on line number $linenmbr
done
In bash, any time you are looping, you want to avoid calling utilities (e.g. grep and find) within the loop. That is horribly inefficient, because each utility call spawns a separate subshell on every iteration (for 10 iterations, that is 20 additional subshells; it adds up quickly). In your case, you call grep to feed the loop, and then within the loop you spawn a separate subshell calling grep again, as well as another one for find.
You should think of a way to call grep (or whatever utility provides the needed information) only once, and then parse the output.
If you did want to use grep, then calling grep -rn within a process substitution that feeds a while loop is probably as good as you are going to get. You can then use the bash builtin parameter expansions to isolate the filename and line numbers, which is about as efficient as bash gets, e.g.
#!/bin/bash

[ -z "$1" ] && {    ## validate at least 1 input given
    printf "error: insufficient input.\nusage: %s srch_term\n" "${0##*/}"
    exit 1
}

while read -r line; do          ## read each line of grep output
    fn="${line%%:*}"            ## isolate filename
    no="${line#*:}"             ## remove filename
    no="${no%%:*}"              ## isolate line number
    printf "found %s in %s on line number %d\n" "$1" "$fn" "$no"
done < <(grep -rn "$1")         ## grep in process substitution
Choosing A More Efficient Method
If you can accomplish what you are attempting with one of the stream editing tools, e.g. awk or sed, you are likely to be able to isolate the wanted information an order of magnitude faster. For example, using awk and setting globstar you could do something similar to the following:
#!/bin/bash

shopt -s globstar               ## enable recursive ** globbing

[ -z "$1" ] && {                ## validate at least 1 input given
    printf "error: insufficient input.\nusage: %s srch_term\n" "${0##*/}"
    exit 1
}

## find all matching files and line numbers
awk -v word="$1" '$0 ~ word {
    print "found", word, "in", FILENAME, "on line number", FNR; next
}' **/* 2>/dev/null
Give both a try and let me know if you have further questions.
If you want to compare and ensure both are producing the same output, you can use diff to confirm, e.g.
$ diff <(grepscript.sh | sort) <(awkscript.sh | sort)
(if no difference is reported, the output is the same)

Linux Scripting with Spaces in Filenames

I am currently working with vendor-provided software that sends attachment files to another script that will text-extract from the listed file. The process fails when we receive files from an outside source whose names contain spaces, because the vendor-supplied software does not surround the filename in quotes; when the text-extraction script is run, it receives a filename that splits apart on the spaces and causes an error in the extractor script. The vendor-provided software is not editable by us.
This whole process is designed to be an automated transfer, so having this wrench that could be randomly thrown into the gears is an issue.
What we're trying to do is handle the spaced name in our text-extractor script, since that is the piece we have some control over. After a quick Google, it seems like changing the IFS value for the script would be the quick solution, but unfortunately that change would take effect only after word splitting has already mutilated the incoming data.
The script I'm using takes in an -e value, an -i value, and an -o value. These values are sent from the vendor-supplied script, which I have no editing control over.
#!/bin/bash

usage() { echo "Usage: $0 -i input -o output -e encoding" 1>&2; exit 1; }

while getopts ":o:i:e:" o; do
    case "${o}" in
        i)
            inputfile=${OPTARG}
            ;;
        o)
            outputfile=${OPTARG}
            ;;
        e)
            encoding=${OPTARG}
            ;;
        *)
            usage
            ;;
    esac
done
shift $((OPTIND-1))
...
...
<Uses the inputfile, outputfile, and encoding variables>
I admit there may be pieces to this I don't fully understand, and it could be a simple fix, but my end goal is to extract the -o, -i, and -e values so that each holds its complete argument, regardless of the spaces within it. I can handle quoting in the rest of the script once I can extract the filename value.
The script fragment that you have posted does not have any issues with spaces in the arguments.
The following, for example, does not need quoting (since it's an assignment):
inputfile=${OPTARG}
All other uses of $inputfile in the script should be double quoted.
What matters is how this script is called.
This would fail and would assign only hello to the variable inputfile:
$ ./script.sh -i hello world.txt
The string world.txt would prompt the getopts function to stop processing the command line and the script would continue with the shift (world.txt would be left in $1 afterwards).
The following would correctly assign the string hello world.txt to inputfile:
$ ./script.sh -i "hello world.txt"
as would
$ ./script.sh -i hello\ world.txt
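You can watch getopts do this with a stripped-down version of the same loop (a minimal sketch; the printf lines are only for demonstration):

#!/bin/bash
# demo.sh -- shows where getopts stops parsing
while getopts ":o:i:e:" o; do
    case "${o}" in
        i) inputfile=${OPTARG} ;;
        o) outputfile=${OPTARG} ;;
        e) encoding=${OPTARG} ;;
    esac
done
shift $((OPTIND-1))
printf 'inputfile: [%s]\n' "$inputfile"
printf 'leftover : [%s]\n' "$@"

Running ./demo.sh -i hello world.txt prints inputfile: [hello] and leftover : [world.txt], while ./demo.sh -i "hello world.txt" assigns the whole name and leaves nothing over.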
The following script uses awk to split the arguments while including spaces in the file names. The arguments can be in any order. It does not handle multiple consecutive spaces in an argument; it collapses them to one.
#!/bin/bash

IFS=' '
str=$(printf "%s" "$*")

istr=$(echo "${str}" | awk 'BEGIN {FS="-i"} {print $2}' | awk 'BEGIN {FS="-o"} {print $1}' | awk 'BEGIN {FS="-e"} {print $1}')
estr=$(echo "${str}" | awk 'BEGIN {FS="-e"} {print $2}' | awk 'BEGIN {FS="-o"} {print $1}' | awk 'BEGIN {FS="-i"} {print $1}')
ostr=$(echo "${str}" | awk 'BEGIN {FS="-o"} {print $2}' | awk 'BEGIN {FS="-e"} {print $1}' | awk 'BEGIN {FS="-i"} {print $1}')

inputfile="${istr}"
outputfile="${ostr}"
encoding="${estr}"

# call the jar
There was an issue when calling the jar where Java threw a MalformedURLException on a filename with a space.
So after reading through the commentary, we decided that although it may not be the right answer for every scenario, the right answer for this specific scenario was to extract the pieces manually.
Because we are building this around a pre-built script that passes to it, and we aren't updating that script any time soon, we can accept with certainty that this script will always receive -i, -o, and -e flags, with spaces between them, which causes all the pieces passed in to be stored as separate words in $*.
And we can assume that the text after a flag belongs to that flag, until another flag is encountered. This leaves us three scenarios:
The variable contains one of the flags
The variable contains the first piece of a parameter immediately after the flag
The variable contains part 2+ of a parameter, and the space in the name was interpreted as a split, and needs to be reinserted.
One of the other issues I kept running into was getting string literals to compare equal to variables in my if statements. To resolve that, I pre-stored all the relevant data in array variables, so I could test $variable == $otherVariable.
Although I don't expect it to change, we also handled what to do if the three flags appear in a different order than we anticipate (our assumption is that they arrive as i, o, e, but we can't see exactly what is passed). The parameters are dumped into an array in the order they were read, and a parallel array tracks whether the items in slots 0, 1, 2 relate to i, o, e.
The final result still has one flaw: if there is more than one consecutive space in the filename, the whitespace is trimmed before processing, and I can only account for one space. But seeing as we processed over 4000 files before encountering one with a space, I find it unlikely, given the naming conventions, that we would encounter something with more than one space.
At that point, we would have to be stepping in for a rare intervention anyways.
Final code change is as follows:
#!/bin/bash

IFS='|'
position=-1
ioeArray=("" "" "")
previous=""
flagArr=("-i" "-o" "-e" " ")
ioePattern=(0 1 2)

for i in $*; do
    if [ "$i" == "${flagArr[0]}" ] || [ "$i" == "${flagArr[1]}" ] || [ "$i" == "${flagArr[2]}" ]; then
        ((position += 1))
        previous=$i
        case "$i" in
            "${flagArr[0]}")
                ioePattern[$position]=0
                ;;
            "${flagArr[1]}")
                ioePattern[$position]=1
                ;;
            "${flagArr[2]}")
                ioePattern[$position]=2
                ;;
        esac
        continue
    fi
    if [[ $previous == "-"* ]]; then
        ioeArray[$position]=${ioeArray[$position]}$i
    else
        ioeArray[$position]=${ioeArray[$position]}" "$i
    fi
    previous=$i
done

echo "extracting (${ioeArray[${ioePattern[0]}]}) to (${ioeArray[${ioePattern[1]}]}) with (${ioeArray[${ioePattern[2]}]}) encoding."

inputfile="${ioeArray[${ioePattern[0]}]}"
outputfile="${ioeArray[${ioePattern[1]}]}"
encoding="${ioeArray[${ioePattern[2]}]}"
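For example, with a hypothetical invocation (the script and file names here are illustrative):

./extractor.sh -i file with space.txt -o out.txt -e UTF-8

the loop sees "file", "with", and "space.txt" as separate words, glues them back together with single spaces, and prints: extracting (file with space.txt) to (out.txt) with (UTF-8) encoding.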

Extract substrings from a file and store them in shell variables

I am working on a script. I have a file called test.txt whose contents are as follows:
a. parent = 192.168.1.2
b. child1 = 192.168.1.21
c. child2 = 192.154.1.2
I need to store the values in three different variables called parent, child1, and child2, as follows, and then my script will use these values:
parent = 192.168.1.2
child1= 192.168.1.21
child2= 192.154.1.2
How can I do that using sed or awk? I know there is a way to extract substrings using the awk function substr, but my particular requirement is to store them in variables as mentioned above. Thanks.
Try this if you're using bash:
$ declare $(awk '{print $2"="$4}' file)
$ echo "$parent"
192.168.1.2
If the file contained white space in the values you want to init the variables with, then you'd just have to set IFS to a newline before invoking declare, e.g. (the input file is simplified to highlight the important part: white space to the right of the = signs):
$ cat file
parent=192.168.1.2 is first
child1=192.168.1.21 comes after it
child2=and then theres 192.154.1.2
$ IFS=$'\n'; declare $(awk -F'=' '{print $1"="$2}' file)
$ echo "$parent"
192.168.1.2 is first
$ echo "$child1"
192.168.1.21 comes after it
Ed Morton's answer is the way to go for the specific problem at hand - elegant and concise.
Update: Ed has since updated his answer to also provide a solution that correctly deals with variable values that have embedded spaces, the original lack of which prompted this answer.
His solution is superior to this one: more concise and more efficient (the only caveat is that you may have to restore the previous $IFS value afterward).
This solution may still be of interest if you need to process variable definitions one by one, e.g., in order to transform variable values based on other shell functions or variables before assigning them.
The following uses bash with process substitution on a simplified problem to process variable definitions one by one:
#!/usr/bin/env bash

while read -r name val; do # read a name-value pair
    # Assign the value after applying a transformation to it; e.g.:
    # 'value of' -> 'value:'
    declare $name="${val/ of /: }" # `declare "$name=${val/ of /: }"` would work too.
done < <(awk -F= '{print $1, $2}' <<<$'v1=value of v1\nv2= value of v2')

echo "v1=[$v1], v2=[$v2]" # -> 'v1=[value: v1], v2=[value: v2]'
awk's output lines are read line by line, split into name and value, and declared as shell variables individually.
Since read, which trims by whitespace, is only given 2 variable names to read into, the 2nd one receives everything from the 2nd token through the end of the line, thus preserving interior whitespace (though, as written, leading and trailing whitespace are trimmed in the process).
Note that declare normally does not require a variable reference on the RHS of the assignment (the value) to be double-quoted (e.g. a=$b; though it never hurts). In this particular case, however - seemingly because the LHS (the name) is also a variable reference - the double quotes are needed.
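A quick illustration of that quoting requirement (a minimal sketch):

val='two words'
name=var
declare $name=$val     # word-splits into `declare var=two words`:
                       # var gets 'two', and a stray variable 'words' is declared
echo "$var"            # -> two
declare "$name=$val"   # quoted: the whole assignment is one argument
echo "$var"            # -> two words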
I also got it done finally. Thanks everyone for helping.
counter=0
while read line
do
    declare $(echo $line | awk '{print $2"="$4}')
    #echo "$parent"
    if [ $counter = 0 ]
    then
        parent=$(echo $parent)
    fi
    if [ $counter = 1 ]
    then
        child1=$(echo $child)
    else
        child2=$(echo $child)
    fi
    counter=$((counter+1))
done < "/etc/cluster_info.txt"
eval "$( sed 's/..//;s/ *//g' YourFile )"
just a sed equivalent to Ed's solution, with an eval instead of declare.
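For the sample test.txt above, the first substitution strips the two-character "a." label and the second removes the spaces, so eval executes plain assignments:

$ sed 's/..//;s/ *//g' test.txt
parent=192.168.1.2
child1=192.168.1.21
child2=192.154.1.2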

Looping through the elements of a path variable in Bash

I want to loop through a path list that I have gotten from an echo $VARIABLE command.
For example:
echo $MANPATH will return
/usr/lib:/usr/sfw/lib:/usr/info
So that is three different paths, each separated by a colon. I want to loop through each of those paths. Is there a way to do that? Thanks.
Thanks for all the replies so far, it looks like I actually don't need a loop after all. I just need a way to take out the colon so I can run one ls command on those three paths.
You can set the Internal Field Separator:
( IFS=:
  for p in $MANPATH; do
      echo "$p"
  done
)
I used a subshell so the change in IFS is not reflected in my current shell.
The canonical way to do this, in Bash, is to use the read builtin appropriately:
IFS=: read -r -d '' -a path_array < <(printf '%s:\0' "$MANPATH")
This is the only robust solution: it does exactly what you want, splitting the string on the delimiter : while remaining safe with respect to spaces, newlines, and glob characters like *, [ ], etc. (unlike the other answers here, which break in those cases).
After this command, you'll have an array path_array, and you can loop on it:
for p in "${path_array[#]}"; do
printf '%s\n' "$p"
done
You can use Bash's pattern substitution parameter expansion to populate your loop variable. For example:
MANPATH=/usr/lib:/usr/sfw/lib:/usr/info
# Replace colons with spaces to create list.
for path in ${MANPATH//:/ }; do
    echo "$path"
done
Note: Don't enclose the substitution expansion in quotes. You want the expanded values from MANPATH to be interpreted by the for-loop as separate words, rather than as a single string.
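For contrast, a quoted expansion hands the loop a single word (a minimal sketch):

for path in "${MANPATH//:/ }"; do
    echo "$path"   # one iteration: '/usr/lib /usr/sfw/lib /usr/info'
done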
In this way you can safely go through $PATH with a single loop, while $IFS remains unchanged inside and outside the loop:
while IFS=: read -d: -r path; do   # `$IFS` is only set for the `read` command
    echo "$path"
done <<< "${PATH:+"${PATH}:"}"     # append an extra ':' if `$PATH` is set
You can check the value of $IFS,
IFS='xxxxxxxx'
while IFS=: read -d: -r path; do
    echo "${IFS}${path}"
done <<< "${PATH:+"${PATH}:"}"
and the output will be something like this.
xxxxxxxx/usr/local/bin
xxxxxxxx/usr/bin
xxxxxxxx/bin
for p in $(echo $MANPATH | tr ":" " "); do
    echo $p
done
IFS=:
arr=(${MANPATH})

for path in "${arr[@]}"; do  # <- quotes required
    echo "$path"
done
... it does take care of spaces :o) but it also adds empty elements if you have something like:
:/usr/bin::/usr/lib:
... then indexes 0 and 2 will be empty (''). Index 4 isn't set at all because bash's word splitting treats a trailing delimiter as terminating the last field rather than starting a new, empty one.
This can also be solved with Python, on the command line:
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" echo {}
Or as an alias:
alias foreachpath="python -c \"import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]\""
With example usage:
foreachpath echo {}
The advantage to this approach is that {} will be replaced by each path in succession. This can be used to construct all sorts of commands, for instance to list the size of all files and directories in the directories in $PATH. including directories with spaces in the name:
foreachpath 'for e in "{}"/*; do du -h "$e"; done'
Here is an example that shortens the length of the $PATH variable by creating symlinks to every file and directory in the $PATH in $HOME/.allbin. This is not useful for everyday usage, but may be useful if you get a "too many arguments" error message in a docker container, because bitbake uses the full $PATH as part of the command line...
mkdir -p "$HOME/.allbin"
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" 'for e in "{}"/*; do ln -sf "$e" "$HOME/.allbin/$(basename $e)"; done'
export PATH="$HOME/.allbin"
This should also, in theory, speed up regular shell usage and shell scripts, since there are fewer paths to search for every command that is executed. It is pretty hacky, though, so I don't recommend that anyone shorten their $PATH this way.
The foreachpath alias might come in handy, though.
Combining ideas from:
https://stackoverflow.com/a/29949759 - gniourf_gniourf
https://stackoverflow.com/a/31017384 - Yi H.
code:
PATHVAR='foo:bar baz:spam:eggs:' # demo path with space and empty
printf '%s:\0' "$PATHVAR" | while IFS=: read -d: -r p; do
    echo $p
done | cat -n
output:
1 foo
2 bar baz
3 spam
4 eggs
5
You can use Bash's for X in ${} notation to accomplish this:
for p in ${PATH//:/$'\n'}; do
    echo $p
done
OP's update wants to ls the resulting folders, and has pointed out that ls only requires a space-separated list.
ls $(echo $PATH | tr ':' ' ') is nice and simple and should fit the bill nicely.
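If any of those directories could themselves contain spaces, a variant of the subshell-IFS trick from above splits on colons only (a sketch; pathname expansion still applies to the unquoted expansion):

( IFS=:; ls -- $MANPATH )   # unquoted expansion splits on ':' alone,
                            # so spaces inside a path component survive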

Reading a file using a shell script

I have a text file named sqlfile, with the following content:
a.sql
b.sql
c.sql
d.sql
What I want is to store them in variables and then print them using a for loop.
But here I get only d.sql in the output of the script.
The script:
#!/bin/bash

while read line
do
    files=`echo $line`
done < /home/abdul_old/Desktop/My_Shell_Script/sqlfile

for file in $files
do
    echo $file
done
A variable can only hold one element; what you want is an array:
#!/bin/bash

while read line
do
    files+=( "$line" )
done < /home/abdul_old/Desktop/My_Shell_Script/sqlfile

for file in "${files[@]}"
do
    echo "$file"
done
while read line
do
    files="$files $line"
done < /home/abdul_old/Desktop/My_Shell_Script/sqlfile
or
files=$(</home/abdul_old/Desktop/My_Shell_Script/sqlfile)
or
files=$(cat /home/abdul_old/Desktop/My_Shell_Script/sqlfile)
You're doing way too much work in your loop.
The middle alternative works with bash; the other two work with most shells. Prefer $(...) to back-quotes.
This code assumes there are no spaces in the file names to mess things up. If you do use blanks in file names, you have to work marginally harder; see the array-based solution by SiegeX.
I think you need to make files an array; otherwise, as soon as the while loop finishes, files holds only the latest line.
try:
files=( "${files[@]}" "$line" )
That's right: you assign only the last value to files.
You must use, for instance, += instead of =:
#!/bin/bash

while read line
do
    files+=" $line"
done < /home/abdul_old/Desktop/My_Shell_Script/sqlfile

for file in $files
do
    echo $file
done
Using read is fine, but you have to set the IFS variable first, or else leading and trailing white space are removed from each line; see: Preserving leading white space while reading/writing a file line by line in bash.
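That is, something like the following (a minimal sketch; the empty IFS preserves leading and trailing blanks, and -r keeps backslashes literal):

while IFS= read -r line
do
    files+=( "$line" )
done < /home/abdul_old/Desktop/My_Shell_Script/sqlfile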
All you have to do is:
readarray myData < sqlfile
This will put file lines into an array called myData
Now you can access any of these lines like this:
printf "%s\n" "${myData[0]}" #outputs first line
printf "%s\n" "${myData[2]}" #outputs third line
And you can iterate over it:
for curLine in "${myData[@]}"; do
    echo "$curLine"
done
Note that these lines will contain the trailing \n character as well. To remove trailing newlines you can use the -t flag, like this:
readarray -t myData < sqlfile
readarray is a synonym for mapfile. You can read about both in man bash.
