Extract substrings from a file and store them in shell variables


I am working on a script. I have a file called test.txt whose contents are as follows:
a. parent = 192.168.1.2
b. child1 = 192.168.1.21
c. child2 = 192.154.1.2
I need to store the values in three different variables called parent, child1 and child2 as follows, and then my script will use these values:
parent = 192.168.1.2
child1= 192.168.1.21
child2= 192.154.1.2
How can I do that using sed or awk? I know there is a way to extract substrings using the awk function substr, but my particular requirement is to store them in variables as mentioned above. Thanks

Try this if you're using bash:
$ declare $(awk '{print $2"="$4}' file)
$ echo "$parent"
192.168.1.2
If the file contained white space in the values you want to init the variables with then you'd just have to set IFS to a newline before invoking declare, e.g. (simplified the input file to highlight the important part of white space on the right of the = signs):
$ cat file
parent=192.168.1.2 is first
child1=192.168.1.21 comes after it
child2=and then theres 192.154.1.2
$ IFS=$'\n'; declare $(awk -F'=' '{print $1"="$2}' file)
$ echo "$parent"
192.168.1.2 is first
$ echo "$child1"
192.168.1.21 comes after it
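As a minimal sketch of the IFS approach above, the following saves and restores IFS around the declare call so the rest of the script keeps its normal word splitting. The temporary file and the old_ifs name are assumptions made up for this illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical demo input; the real script would read its own file.
tmpfile=$(mktemp)
printf '%s\n' \
    'parent=192.168.1.2 is first' \
    'child1=192.168.1.21 comes after it' > "$tmpfile"

old_ifs=$IFS     # remember the current field separator
IFS=$'\n'        # split the awk output on newlines only
declare $(awk -F'=' '{print $1"="$2}' "$tmpfile")
IFS=$old_ifs     # restore it for the rest of the script

rm -f "$tmpfile"
printf '%s\n' "$parent" "$child1"
```

The unquoted command substitution is still subject to globbing, so this assumes the values contain no characters such as * or ?.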

Ed Morton's answer is the way to go for the specific problem at hand - elegant and concise.
Update: Ed has since updated his answer to also provide a solution that correctly deals with variable values with embedded spaces - the original lack of which prompted this answer.
His solution is superior to this one - more concise and more efficient (the only caveat is that you may have to restore the previous $IFS value afterward).
This solution may still be of interest if you need to process variable definitions one by one, e.g., in order to transform variable values based on other shell functions or variables before assigning them.
The following uses bash with process substitution on a simplified problem to process variable definitions one by one:
#!/usr/bin/env bash
while read -r name val; do # read a name-value pair
# Assign the value after applying a transformation to it; e.g.:
# 'value of' -> 'value:'
declare $name="${val/ of /: }" # `declare "$name=${val/ of /: }"` would work too.
done < <(awk -F= '{print $1, $2}' <<<$'v1=value of v1\nv2= value of v2')
echo "v1=[$v1], v2=[$v2]" # -> 'v1=[value: v1], v2=[value: v2]'
awk's output lines are read line by line, split into name and value, and declared as shell variables individually.
Since read, which splits on whitespace, is only given 2 variable names to read into, the 2nd one receives everything from the 2nd token through the end of the line, thus preserving interior whitespace (and, as written, will trim leading and trailing whitespace in the process).
Note that declare normally does not require a variable reference on the RHS of the assignment (the value) to be double-quoted (e.g. a=$b; though it never hurts). In this particular case, however - seemingly because the LHS (the name) is also a variable reference - the double quotes are needed.
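For the file layout in the original question, the same line-by-line idea might be sketched with read alone, with no awk involved. The temporary file and the throwaway names label and eq are assumptions for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical copy of test.txt from the question.
tmpfile=$(mktemp)
printf '%s\n' \
    'a. parent = 192.168.1.2' \
    'b. child1 = 192.168.1.21' \
    'c. child2 = 192.154.1.2' > "$tmpfile"

# Fields per line: label ("a."), name, "=", value (rest of the line).
while read -r label name eq value; do
    declare "$name=$value"
done < "$tmpfile"

rm -f "$tmpfile"
printf '%s\n' "$parent" "$child1" "$child2"
```

Because the loop reads from a redirection rather than a pipeline, the variables are set in the current shell.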

I also got it done finally. Thanks everyone for helping.
counter=0
while read -r line
do
declare $(echo "$line" | awk '{print $2"="$4}')
#echo "$parent"
if [ "$counter" = 0 ]
then
parent=$(echo "$parent")
elif [ "$counter" = 1 ]
then
child1=$(echo "$child1")
else
child2=$(echo "$child2")
fi
counter=$((counter+1))
done < "/etc/cluster_info.txt"

eval "$( sed 's/..//;s/ *//g' YourFile )"
This is just a sed equivalent of Ed's solution, with eval instead of declare.

Related

Increment a variable name in ksh [duplicate]

Seems that the recommended way of doing indirect variable setting in bash is to use eval:
var=x; val=foo
eval $var=$val
echo $x # --> foo
The problem is the usual one with eval:
var=x; val=1$'\n'pwd
eval $var=$val # bad output here
(and since it is recommended in many places, I wonder just how many scripts are vulnerable because of this...)
In any case, the obvious solution of using (escaped) quotes doesn't really work:
var=x; val=1\"$'\n'pwd\"
eval $var=\"$val\" # fail with the above
The thing is that bash has indirect variable reference baked in (with ${!foo}), but I don't see any such way to do indirect assignment -- is there any sane way to do this?
For the record, I did find a solution, but this is not something that I'd consider "sane"...:
eval "$var='"${val//\'/\'\"\'\"\'}"'"

A slightly better way, avoiding the possible security implications of using eval, is
declare "$var=$val"
Note that declare is a synonym for typeset in bash. The typeset command is more widely supported (ksh and zsh also use it):
typeset "$var=$val"
In modern versions of bash, one should use a nameref.
declare -n var=x
var=$val # assigns to x through the nameref
It's safer than eval, but still not perfect.
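A nameref is often easiest to use inside a small helper function. The sketch below assumes bash 4.3 or later; the function name assign_to and the local name ref are hypothetical, and the caller must not pass a variable that is itself called ref.

```shell
#!/usr/bin/env bash
# assign_to NAME VALUE: store VALUE in the variable called NAME.
assign_to() {
    local -n ref=$1   # ref is now an alias for the variable named in $1
    ref=$2            # the value is stored as-is, never re-parsed as code
}

val=$'1\npwd'         # the value that caused trouble with plain eval
assign_to x "$val"
printf '%q\n' "$x"
```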

Bash has an extension to printf that saves its result into a variable:
printf -v "${VARNAME}" '%s' "${VALUE}"
This prevents all possible escaping issues.
If you use an invalid identifier for $VARNAME, the command will fail and return status code 2:
$ printf -v ';;;' '%s' foobar; echo $?
bash: printf: `;;;': not a valid identifier
2
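Putting the two observations together, a minimal sketch might check the exit status of printf -v before relying on the variable. The names var, val and bad_status are only for illustration.

```shell
#!/usr/bin/env bash
var=x
val=$'1\npwd'     # the value that broke the unquoted eval

if printf -v "$var" '%s' "$val"; then
    printf 'x holds: %q\n' "$x"
else
    echo "refusing to assign: '$var' is not a valid identifier" >&2
fi

# An invalid name fails with a non-zero status instead of running anything.
bad_status=0
printf -v ';;;' '%s' foobar 2>/dev/null || bad_status=$?
echo "status for invalid name: $bad_status"
```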

eval "$var=\$val"
The argument to eval should always be a single string enclosed in either single or double quotes. All code that deviates from this pattern has some unintended behavior in edge cases, such as file names with special characters.
When the argument to eval is expanded by the shell, the $var is replaced with the variable name, and the \$ is replaced with a simple dollar. The string that is evaluated therefore becomes:
varname=$value
This is exactly what you want.
Generally, all expressions of the form $varname should be enclosed in double quotes, to prevent accidental expansion of filename patterns like *.c.
There are only two places where the quotes may be omitted since they are defined to not expand pathnames and split fields: variable assignments and case. POSIX 2018 says:
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
This list of expansions is missing pathname expansion and field splitting. Sure, that's hard to see from reading this sentence alone, but that's the official definition.
Since this is a variable assignment, the quotes are not needed here. They don't hurt, though, so you could also write the original code as:
eval "$var=\"the value is \$val\""
Note that the second dollar is escaped using a backslash, to prevent it from being expanded in the first run. What happens is:
eval "$var=\"the value is \$val\""
The argument to the command eval is sent through parameter expansion and unescaping, resulting in:
varname="the value is $val"
This string is then evaluated as a variable assignment, which assigns the following value to the variable varname:
the value is value
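The escaping only protects the value; the variable name is still pasted into the evaluated string. A cautious sketch therefore validates the name first. The regular expression below is an assumption about what counts as an acceptable identifier, not something taken from the answer above.

```shell
#!/usr/bin/env bash
var=x
val=$'1"\npwd'    # contains a double quote and a newline

# Only run eval when $var looks like a plain shell identifier.
if [[ $var =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    eval "$var=\$val"    # evaluated string is: x=$val
else
    echo "invalid variable name: $var" >&2
fi

printf '%q\n' "$x"
```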

The main point is that the recommended way to do this is:
eval "$var=\$val"
with the RHS done indirectly too. Since eval runs in the same environment, $val is still bound when the string is evaluated, so deferring its expansion works, and by then it's just an ordinary variable reference. Since the $val variable has a known name, there are no issues with quoting, and it could even have been written as:
eval $var=\$val
But since it's better to always add quotes, the former is better, or even this:
eval "$var=\"\$val\""
A better alternative in bash, which was mentioned above and avoids eval completely (and is not as subtle as declare etc.):
printf -v "$var" "%s" "$val"
Though this is not a direct answer to what I originally asked...

Newer versions of bash support something called "parameter transformation", documented in a section of the same name in bash(1).
"${value@Q}" expands to a shell-quoted version of "${value}" that you can re-use as input.
Which means the following is a safe solution:
eval "${varname}=${value@Q}"
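As a rough illustration, assuming a bash new enough to support the @Q transformation (4.4 or later), the quoted form survives a round trip through eval even for awkward values. The sample value is made up for the demonstration, and the variable name is still assumed to be trusted.

```shell
#!/usr/bin/env bash
varname=x
value=$'it\'s 1\npwd'     # single quote plus an embedded newline

quoted=${value@Q}         # shell-quoted form, safe to re-read as input
eval "${varname}=${quoted}"

printf 'quoted form: %s\n' "$quoted"
printf 'round trip : %q\n' "$x"
```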

Just for completeness, I also want to suggest the possible use of the bash builtin read. I've also made corrections regarding -d '' based on socowi's comments.
But much care needs to be exercised when using read to ensure the input is sanitized (-d '' reads until null termination and printf '...\0' terminates the value with a null), and that read itself is executed in the main shell where the variable is needed and not a sub-shell (hence the < <( ... ) syntax).
var=x; val=foo0shouldnotterminateearly
read -d '' -r "$var" < <(printf '%s\0' "$val")
echo $x # --> foo0shouldnotterminateearly
echo ${!var} # --> foo0shouldnotterminateearly
I tested this with \n \t \r spaces and 0, etc it worked as expected on my version of bash.
The -r will avoid escaping \, so if you had the characters "\" and "n" in your value and not an actual newline, x will contain the two characters "\" and "n" also.
This method may not be aesthetically as pleasing as the eval or printf solution, and would be more useful if the value is coming in from a file or other input file descriptor
read -d '' -r "$var" < <( cat "$file" )
And here are some alternative suggestions for the < <() syntax
read -d '' -r "$var" <<< "$val"$'\0'
read -d '' -r "$var" < <(printf '%s' "$val") # Apparently I didn't even need the \0; the printf process ending was enough to trigger the read to finish.
read -d '' -r "$var" <<< $(printf '%s' "$val")
read -d '' -r "$var" <<< "$val"

Yet another way to accomplish this, without eval, is to use "read":
INDIRECT=foo
read -d '' -r "${INDIRECT}" <<<"$(( 2 * 2 ))"
echo "${foo}" # outputs "4"

Bash Issue: AWK

I came back to work from a break to see that my Bash script wasn't working like it used to. The below tid-bit of code would grab and filter what's in a file. Here's the contents of said file:
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
# $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N
*:/software/oracle/agent/agent11g:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
What the code used to produce and throw into an array:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
It would look at the file (oratab), find the database names (e.g. xtst160), and put them into an array. I then used this array for other tasks later in the script. Here's the relevant Bash script code:
# Collect the databases using a mixture of AWK and regex, and throw it into an array.
printf "\n2) Collecting databases on %s:\n" $HOSTNAME
declare -a arr_dbs=(`awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $ddma_input}' /etc/oratab`)
# Loop through and print the array of databases.
for i in ${arr_dbs[@]}
do
printf "%s " $i
done
It doesn't seem anyone has modified the code or that the oratab file format has changed. So I'm not 100% sure what's going on now. Instead of grabbing the few characters, it's grabbing the entire line:
dev068:/software/oracle/ora-10.02.00.04.11:Y
I'm trying to understand Bash and regex more but I'm stumped. Definitely not my forte. A broken down explanation of the awk line would be greatly appreciated.

I found the error. We changed the amount of arguments being passed in and the order they are received.

Printing $1 instead of $ddma_input resolves the issue as well.
declare -a arr_dbs=(`awk -F ":" -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab`)
# Loop through and print the array of databases.
for i in ${arr_dbs[@]}
do
printf "%s " $i
done
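On the requested breakdown of the awk line: -F: splits each line on colons, -v key=... passes the path prefix in as an awk variable, and $2 ~ key selects lines whose second field matches it. Inside the single-quoted program, $ddma_input as quoted in the question is not a shell variable; awk sees an uninitialized awk variable that evaluates to 0, so it prints $0, the whole line. If the field number really has to come from the shell, one way is to pass it with -v, as in this sketch. The temporary file, the field variable and the awk variable col are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for /etc/oratab.
oratab=$(mktemp)
printf '%s\n' \
    '# Entries are of the form: SID:HOME:<N|Y>:' \
    'OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N' \
    'dev068:/software/oracle/ora-10.02.00.04.11:Y' \
    'xtst160:/software/oracle/ora-11.02.00.04.03:Y' > "$oratab"

field=1   # which colon-separated field to print
mapfile -t arr_dbs < <(
    awk -F: -v key='/software/oracle/ora' -v col="$field" \
        '$2 ~ key { print $col }' "$oratab"
)

rm -f "$oratab"
printf '%s\n' "${arr_dbs[@]}"
```

mapfile -t reads one array element per output line, which also avoids word splitting and globbing of the awk output.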

You could easily implement this whole thing in native bash with no external tools at all:
arr_dbs=( )
while IFS= read -r line; do
case $line in
"#"*) continue ;;
*:/software/oracle/ora*:*) arr_dbs+=( "${line%%:*}" ) ;;
esac
done </etc/oratab
printf ' %s\n' "${arr_dbs[@]}"
This actually avoids some bugs you had in your original implementation. Let's say you had a line like the following:
*:/software/oracle/ora-default:Y
If you aren't careful with how you handle that *, it'll be replaced with a list of filenames in the current directory by the shell whenever expansion occurs.
What does "whenever expansion occurs" mean in this context? Well:
# this will expand a * into a list of filenames during the assignment to the array
arr=( $(echo "*") ) # vs the correct read -a arr < <(echo "*")
# this will expand a * into a list of filenames while generating items to iterate over
for i in ${arr[@]} # vs the correct for i in "${arr[@]}"
# this will expand a * into a list of filenames while building the argument list for echo
i="*"
echo $i # vs the correct printf '%s\n' "$i"
Note the use of printf over echo -- see the APPLICATION USAGE section of the POSIX specification of echo.

Rename a variable in a for loop

Let's say I have a nested for loop:
for i in $test
do
name=something
for j in $test2
do
name2=something
jj=$j | sed s/'tRap\/tRapTrain'/'BEEML\/BEEMLTrain'/g
if [ name == name2 ]
then
qsub scrip.sh $i $j $jj
fi
done
done
Now the problem occurs when I try to turn the variable $j into the variable $jj. I only get empty values back when submitting the script within the if statement. Is there another way to rename variables so that I can pass them through to the if part of the code?
PS. I tried 3 for loops, but this makes the script awfully slow.

Your problem is piping the assignment into sed. Try something like
jj=$(echo $j | sed s/'tRap\/tRapTrain'/'BEEML\/BEEMLTrain'/g)
This uses command substitution to assign jj.

This is not correct:
jj=$j | sed s/'tRap\/tRapTrain'/'BEEML\/BEEMLTrain'/g
In order to assign the output of a command to a variable you need to use command substitution like this:
jj=$(sed s/'tRap\/tRapTrain'/'BEEML\/BEEMLTrain'/g <<< "$j")
You may not even have to use sed because bash has in-built string replacement. For example, the following will replace foo with bar in the j variable and assign it to jj:
jj=${j//foo/bar}
There is also a problem with your if-statement. It should be:
if [ "$name" == "$name2" ]
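Applied to the paths from the question, the built-in replacement might look like the sketch below. Holding the two path fragments in variables (from and to, names chosen for this example) avoids having to escape the slashes; the sample value of j is made up.

```shell
#!/usr/bin/env bash
from='tRap/tRapTrain'
to='BEEML/BEEMLTrain'
j='data/tRap/tRapTrain/sample.txt'    # hypothetical loop value

# Quoting "$from" makes bash treat it as literal text, not a glob pattern.
jj=${j//"$from"/"$to"}

name=something
name2=something
if [ "$name" = "$name2" ]; then
    printf '%s -> %s\n' "$j" "$jj"
fi
```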

A tiny little thing:
Sed treats the first character after the action selector as the field separator.
Knowing this, you can translate your expression:
sed s/'tRap\/tRapTrain'/'BEEML\/BEEMLTrain'/g
into:
sed s%'tRap/tRapTrain'%'BEEML/BEEMLTrain'%g
So you don't have to worry about escaping your slashes when substituting paths. I normally use '%', but feel free to use any other character. I think the optimal approach would be using a non-printable character:
SEP=$'\001' ; sed s${SEP}'tRap/tRapTrain'${SEP}'BEEML/BEEMLTrain'${SEP}g

Concating string with shell script with accumulator

I'd like to convert a list separated with '\n' in another one separated with space.
Ex:
Get a dictionary like ispell english dictionary. http://downloads.sourceforge.net/wordlist/ispell-enwl-3.1.20.zip
My initial idea was using a variable as accumulator:
a=""; cat american.0 | while read line; do a="$a $line"; done; echo $a
... but it results '\n' string!!!
Questions:
Why is it not working?
What is the correct way to do that?
Thanks.

The problem is that when you have a pipeline:
command_1 | command_2
each command is run in a separate subshell, with a separate copy of the parent environment. So any variables that the command creates, or any modifications it makes to existing variables, will not be perceived by the containing shell.
In your case, you don't really need the pipeline, because this:
cat filename | command
is equivalent, in every way that you need, to this:
command < filename
So you can write:
a=""; while read line; do a="$a $line"; done < american.0; echo $a
to avoid creating any subshells.
That said, according to this StackOverflow answer, you can't really rely on a shell variable being able to hold more than about 1–4KB of data, so you probably need to rethink your overall approach. Storing the entire word-list in a shell variable likely won't work, and even if it does, it likely won't work well.
Edited to add: To create a temporary file named /tmp/american.tmp that contains what the variable $a would have, you can write:
while IFS= read -r line; do
printf %s " $line"
done < american.0 > /tmp/american.tmp
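The difference between the two forms can be seen side by side in a small sketch. The three-word temporary file stands in for american.0, and the names piped and redirected are only for illustration; the behaviour shown assumes bash's default settings (lastpipe not enabled).

```shell
#!/usr/bin/env bash
# Hypothetical miniature word list.
words=$(mktemp)
printf '%s\n' apple banana cherry > "$words"

# Pipeline: the loop runs in a subshell, so changes to a are lost.
a=""
cat "$words" | while read -r line; do a="$a $line"; done
piped=$a

# Redirection: the loop runs in the current shell, so a keeps its value.
a=""
while read -r line; do a="$a $line"; done < "$words"
redirected=$a

rm -f "$words"
printf 'piped=[%s]\nredirected=[%s]\n' "$piped" "$redirected"
```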

If you want to replace '\n' with a space, you can simply use tr as follows:
a=$(tr '\n' ' ' < american.0)

Looping through the elements of a path variable in Bash

I want to loop through a path list that I have gotten from an echo $VARIABLE command.
For example:
echo $MANPATH will return
/usr/lib:/usr/sfw/lib:/usr/info
So that is three different paths, each separated by a colon. I want to loop through each of those paths. Is there a way to do that? Thanks.
Thanks for all the replies so far, it looks like I actually don't need a loop after all. I just need a way to take out the colon so I can run one ls command on those three paths.

You can set the Internal Field Separator:
( IFS=:
for p in $MANPATH; do
echo "$p"
done
)
I used a subshell so the change in IFS is not reflected in my current shell.

The canonical way to do this, in Bash, is to use the read builtin appropriately:
IFS=: read -r -d '' -a path_array < <(printf '%s:\0' "$MANPATH")
This is the only robust solution: it will do exactly what you want, i.e. split the string on the delimiter : and be safe with respect to spaces, newlines, and glob characters like *, [ ], etc. (unlike the other answers: they are all broken).
After this command, you'll have an array path_array, and you can loop on it:
for p in "${path_array[@]}"; do
printf '%s\n' "$p"
done
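To see why this matters, here is a small sketch with a made-up path list containing a space and a glob character. The variable demo_path is an assumption for the example; the read command is the one from the answer.

```shell
#!/usr/bin/env bash
demo_path='/usr/lib:/opt/my docs:/tmp/*'    # hypothetical path list

IFS=: read -r -d '' -a path_array < <(printf '%s:\0' "$demo_path")

echo "number of entries: ${#path_array[@]}"
for p in "${path_array[@]}"; do
    printf '[%s]\n' "$p"    # the space and the * come through untouched
done
```

For the follow-up in the question, the same array could be passed to a single command, for example ls -d -- "${path_array[@]}".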

You can use Bash's pattern substitution parameter expansion to populate your loop variable. For example:
MANPATH=/usr/lib:/usr/sfw/lib:/usr/info
# Replace colons with spaces to create list.
for path in ${MANPATH//:/ }; do
echo "$path"
done
Note: Don't enclose the substitution expansion in quotes. You want the expanded values from MANPATH to be interpreted by the for-loop as separate words, rather than as a single string.

In this way you can safely go through the $PATH with a single loop, while $IFS will remain the same inside or outside the loop.
while IFS=: read -d: -r path; do # `$IFS` is only set for the `read` command
echo $path
done <<< "${PATH:+"${PATH}:"}" # append an extra ':' if `$PATH` is set
You can check the value of $IFS,
IFS='xxxxxxxx'
while IFS=: read -d: -r path; do
echo "${IFS}${path}"
done <<< "${PATH:+"${PATH}:"}"
and the output will be something like this.
xxxxxxxx/usr/local/bin
xxxxxxxx/usr/bin
xxxxxxxx/bin
Reference to another question on StackExchange.

for p in $(echo $MANPATH | tr ":" " ") ;do
echo $p
done

IFS=:
arr=(${MANPATH})
for path in "${arr[@]}" ; do # <- quotes required
echo $path
done
... it does take care of spaces :o) but also adds empty elements if you have something like:
:/usr/bin::/usr/lib:
... then indexes 0 and 2 will be empty (''); I cannot say why index 4 isn't set at all

This can also be solved with Python, on the command line:
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" echo {}
Or as an alias:
alias foreachpath="python -c \"import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]\""
With example usage:
foreachpath echo {}
The advantage to this approach is that {} will be replaced by each path in succession. This can be used to construct all sorts of commands, for instance to list the size of all files and directories in the directories in $PATH. including directories with spaces in the name:
foreachpath 'for e in "{}"/*; do du -h "$e"; done'
Here is an example that shortens the length of the $PATH variable by creating symlinks to every file and directory in the $PATH in $HOME/.allbin. This is not useful for everyday usage, but may be useful if you get the too many arguments error message in a docker container, because bitbake uses the full $PATH as part of the command line...
mkdir -p "$HOME/.allbin"
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" 'for e in "{}"/*; do ln -sf "$e" "$HOME/.allbin/$(basename $e)"; done'
export PATH="$HOME/.allbin"
This should also, in theory, speed up regular shell usage and shell scripts, since there are fewer paths to search for every command that is executed. It is pretty hacky, though, so I don't recommend that anyone shorten their $PATH this way.
The foreachpath alias might come in handy, though.

Combining ideas from:
https://stackoverflow.com/a/29949759 - gniourf_gniourf
https://stackoverflow.com/a/31017384 - Yi H.
code:
PATHVAR='foo:bar baz:spam:eggs:' # demo path with space and empty
printf '%s:\0' "$PATHVAR" | while IFS=: read -d: -r p; do
echo $p
done | cat -n
output:
1 foo
2 bar baz
3 spam
4 eggs
5

You can use Bash's for X in ${} notation to accomplish this:
for p in ${PATH//:/$'\n'} ; do
echo $p;
done

OP's update wants to ls the resulting folders, and has pointed out that ls only requires a space-separated list.
ls $(echo $PATH | tr ':' ' ') is nice and simple and should fit the bill nicely.
