AWK output to array


I am learning AWK and need to capture the output of an awk command in a variable so that I can parse it.
The file has 130000 lines. I need to use AWK to put one column into an array so that I can use it in another part of the script. Sorry for my English; please ask if my objective is not clear.
The code:
awk '/file:/{ name=$3 ; print name }' ejemplo.txt
I tried:
list=$(awk '/file:/{ name=$3 ; print name }' ejemplo.txt)
but when I show the content of the variable $list, I only see one line.
I also tried declaring an array, but it only shows one result.
Does anyone understand what is happening? How can I build an array with all the output?
I also tried the following code to build an array with AWK. Maybe I am missing something obvious, but I cannot see how to solve my problem:
#!/bin/bash
#filename: script2.sh
conta=$(cat ejemplo_hdfs.txt | wc -l)
for i in `seq 0 $conta`;do
objeto=" "
owner=" "
group=" "
awk '{
if($1=="#" && $2=="file:") objeto=$3;
else if($1=="#" && $2=="owner:") owner=$3;
else if($1=="#" && $2=="group:") group=$3;
else
print $3
;}' ejemplo_hdfs.txt
echo $objeto+","+$owner+","+$group
done

To assign an array to a variable in bash, the whole expression that generates the elements needs to be in parentheses. Each whitespace-separated word produced by the evaluated expression becomes an element of the resulting array.
Example:
#!/bin/bash
foo=($(awk -F, '{print $2}' x.txt))
# Size of array
echo "There are ${#foo[@]} elements"
# Iterate over each element
for f in "${foo[@]}"; do
echo "$f"
done
# Use a specific element.
echo "The second element is ${foo[1]}"
$ cat x.txt
1,a dog
2,b
3,c
$ ./array_example.sh
There are 4 elements
a
dog
b
c
The second element is dog
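The output above shows "a dog" being split into two elements, because the array is filled word by word. If one element per output line is wanted instead, a minimal sketch using the bash builtin mapfile (bash 4 or later) could look like this. The temporary file is only a stand-in for x.txt and is an assumption for the sake of a self-contained example.

```shell
#!/bin/bash
# Hypothetical sample data standing in for x.txt from the example above.
tmp=$(mktemp)
printf '%s\n' '1,a dog' '2,b' '3,c' > "$tmp"

# mapfile stores each line of awk's output as one array element;
# -t strips the trailing newline from every line.
mapfile -t foo < <(awk -F, '{print $2}' "$tmp")

echo "There are ${#foo[@]} elements"
echo "The first element is ${foo[0]}"

rm -f "$tmp"
```

With this approach the element "a dog" stays intact, so the array has three elements rather than four.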

Related

Storing command values into a variable Bash Script

I am trying to loop through files in a directory to find an animal and its value. The command is supposed to only display the animal and total value. For example:
File1 has:
Monkey 11
Bear 4
File2 has:
Monkey 12
If I wanted the total value of monkeys then I would do:
for f in *; do
total=$(grep $animal $f | cut -d " " -f 2- | paste -sd+ | bc)
done
echo $animal $total
This would return the correct value of:
Monkey 23
However, if there is only one instance of an animal like for example Bear, the variable total doesn't return any value, I only get echoed:
Bear
Why is this the case and how do I fix it?
Note: I'm not allowed to use the find command.

You could use this short awk script instead of the for/grep/cut/paste/bc pipeline:
awk -v animal="Bear" '
$1 == animal { count += $2 }
END { print count + 0 }
' *

Comments on OP's question about why code behaves as it does:
total is reset on each pass through the loop so ...
upon leaving the loop total will have the count from the 'last' file processed
in the case of Bear the 'last' file processed is File2 and since File2 does not contain any entries for Bear we get total='', which is what's printed by the echo
if the Bear entry is moved from File1 to File2 then OP's code should print Bear 4
OP's current code effectively ignores all input files and prints whatever's in the 'last' file (File2 in this case)
OP's current code generates the following:
# Monkey
Monkey 12 # from File2
# Bear
Bear # no match in File2
I'd probably opt for replacing the whole grep/cut/paste/bc (4x subprocesses) with a single awk (1x subprocess) call (and assuming no matches we report 0):
for animal in Monkey Bear Hippo
do
total=$(awk -v a="${animal}" '$1==a {sum+=$2} END {print sum+0}' *)
echo "${animal} ${total}"
done
This generates:
Monkey 23
Bear 4
Hippo 0
NOTES:
I'm assuming OP's real code does more than echo the count to stdout hence the need of the total variable otherwise we could eliminate the total variable and have awk print the animal/sum pair directly to stdout
if OP's real code has a parent loop processing a list of animals it's likely possible a single awk call could process all of the animals at once; objective being to have awk generate the entire set of animal/sum pairs that could then be fed to the looping construct; if this is the case, and OP has some issues implementing a single awk solution, a new question should be asked
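As a rough sketch of the single-awk idea in the last note, one awk pass can produce every animal/sum pair, which a shell loop then consumes. The scratch directory and its File1/File2 copies are assumptions made so the example runs on its own; they mirror the sample data from the question.

```shell
#!/bin/bash
# Hypothetical copies of File1 and File2 from the question, in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'Monkey 11' 'Bear 4' > "$dir/File1"
printf '%s\n' 'Monkey 12' > "$dir/File2"

# One awk pass sums column 2 per animal across all files;
# sort is only there to make the output order predictable.
totals=$(awk '{ sum[$1] += $2 } END { for (a in sum) print a, sum[a] }' "$dir"/File* | sort)

# Feed the animal/sum pairs to a loop.
while read -r animal total; do
    echo "${animal} ${total}"
done <<< "$totals"

rm -rf "$dir"
```

Animals that never appear in any file produce no line at all here, so a caller that needs an explicit 0 would still have to handle that case.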

Why is this the case
grep outputs nothing, so nothing is propagated through the pipe and empty string is assigned to total.
Because total is reset every loop (total=anything without referencing previous value), it just has the value for the last file.
how do I fix it?
Do not try to do everything at once; do less in each step.
total=0
for f in *; do
count=$(grep "$animal" "$f" | cut -d " " -f 2-)
total=$((total + count)) # reuse total, reference previous value
done
echo "$animal" "$total"
A programmer fluent in shell will most probably jump to AWK for such problems. Remember to check your scripts with shellcheck.
With what you were trying to do, you could do all files at once:
total=$(
{
echo 0 # to have at least nice 0 if animal is not found
grep "$animal" * |
cut -d " " -f 2-
} |
paste -sd+ |
bc
)

With just bash:
declare -A animals=()
for f in *; do
while read -r animal value; do
(( animals[$animal] = ${animals[$animal]:-0} + value ))
done < "$f"
done
declare -p animals
outputs
declare -A animals=([Monkey]="23" [Bear]="4" )
With this approach, you have all the totals for all the animals by processing each file exactly once

$ head File*
==> File1 <==
Monkey 11
Bear 4
==> File2 <==
Monkey 12
==> File3 <==
Bear
Monkey
Using awk and bash array
#!/bin/bash
sumAnimals(){
awk '
{ NF == 1 ? a[$1]++ : a[$1]=a[$1]+$2 }
END{
for (i in a ) printf "[%s]=%d\n",i, a[i]
}
' File*
}
# storing all animals in bash array
declare -A animalsArr="( $(sumAnimals) )"
# show array content
declare -p animalsArr
# getting total from array
echo "Monkey: ${animalsArr[Monkey]}"
echo "Bear: ${animalsArr[Bear]}"
Output
declare -A animalsArr=([Bear]="5" [Monkey]="24" )
Monkey: 24
Bear: 5

How to search the full string in file which is passed as argument in shell script?

I am passing an argument to a script, and I have to match that argument against a file and extract the corresponding information. Could you please tell me how to do it?
Example:
I have the details below in the file:
iMedical_Refined_load_Procs_task_id=970113
HV_Rawlayer_Execution_Process=988835
iMedical_HV_Refined_Load=988836
DHS_RawLayer_Execution_Process=988833
iMedical_DHS_Refined_Load=988834
If I pass 'hv' as the argument, it should pick 'iMedical_HV_Refined_Load' and give the result '988836'.
If I pass 'dhs', it should pick 'iMedical_DHS_Refined_Load' and give the result '988834'.
I tried the logic below, but it does not give the correct result. What changes do I need to make?
echo $1 | tr a-z A-Z
g=${1^^}
echo $g
echo $1
val=$(awk -F= -v s="$g" '$g ~ s{print $2}' /medaff/Scripts/Aggrify/sltconfig.cfg)
echo "TASK ID is $val"

Assuming your matching criteria is the first string after delimiter _ and the output needed is the numbers after the = char, then you can try this sed
$ sed -n "/_$1/I{s/[^=]*=\(.*\)/\1/p}" input_file
$ read -r input
hv
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988836
$ read -r input
dhs
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988834

If I'm reading it right, 2 quick versions -
$: cat 1
awk -F= -v s="_${1^^}_" '$1~s{print $2}' file
$: cat 2
sed -En "/_${1^^}_/{s/^.*=//;p;}" file
Both basically the same logic.
In pure bash -
$: cat 3
while IFS='=' read key val; do [[ "$key" =~ "_${1^^}_" ]] && echo "$val"; done < file
That's a lot less efficient, though.
If you know for sure there will be only one hit, all these could be improved a bit by short-circuit exits, but on such a small sample it won't matter at all. If you have a larger dataset to read, then I strongly suggest you formalize your specs better than "in this set I should get...".
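The short-circuit exit mentioned above could be sketched as follows for the awk version: awk stops reading as soon as the first matching line has been printed. The temporary config file and the helper name lookup are assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical config file holding the sample lines from the question.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
iMedical_Refined_load_Procs_task_id=970113
HV_Rawlayer_Execution_Process=988835
iMedical_HV_Refined_Load=988836
DHS_RawLayer_Execution_Process=988833
iMedical_DHS_Refined_Load=988834
EOF

# lookup is an illustrative helper name, not part of the original answers.
lookup() {
    # Match the upper-cased key between underscores, print the value,
    # and exit so the rest of the file is never read.
    awk -F= -v s="_${1^^}_" '$1 ~ s { print $2; exit }' "$cfg"
}

hv_id=$(lookup hv)
dhs_id=$(lookup dhs)
echo "TASK ID is $hv_id"
echo "TASK ID is $dhs_id"

rm -f "$cfg"
```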

Bash - replace in file strings from first array with strings from second array

I have two arrays. The first one is filled with values greped from the file that I want to replace with the new ones downloaded.
Please note that I don't know exactly how the first array will look like, meaning some values will have _, other - and some won't have any of that and immediately after the name will be placed : (colon).
Example arrays:
array1:
[account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123]
array2:
[account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124]
These arrays are only an example; there are more than 50 values to replace.
I am doing a comparison of every item in the second array with every item in the first array as shown below:
file_name="<path_to_file>/file.txt"
for i in "${!array1[@]}"
do
for j in "${!array2[@]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" $file_name
fi
done
done
Here I substring only the first part for every item in the second array so I can compare it with the item in the first array.
e.g. account_custom_2:124 -> account or notification-dispatcher_custom_2:124 -> notification-dispatcher.
This works nicely, but I run into a problem because notification also matches notification-core_1:123, notification-dispatcher_core_1:123 and notification-dispatcher-smschannel_core_1:123.
Can you please advise how to fix this, or suggest another approach?

The point is that the base of one array2 element may contain the base of another
element as a substring, which causes an improper replacement depending on the order
of matching.
To avoid this, you can sort the array in descending order so that the
longer pattern comes first.
Assuming the strings in the arrays do not contain tab characters, would
you please try:
file_name="<path_to_file>/file.txt"
array1=(account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123)
array2=(account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124)
# insert the following block to sort array2 in descending order
array2=( $(for j in "${array2[@]}"; do
array_2_base=${j%%_*}
printf "%s\t%s\n" "$array_2_base" "$j"
done | sort -r | cut -f2-) )
# the following code will work "as is"
for i in "${!array1[@]}"
do
for j in "${!array2[@]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" "$file_name"
delete="${array1[$i]}"
array1=( "${array1[@]/$delete}" )
fi
done
done
The script above will be inefficient in execution time due to the repeated
invocation of the sed -i command.
The script below will run faster
by pre-generating the sed script and executing it just once.
file_name="<path_to_file>/file.txt"
array1=(
account:123
shoppingcart-1:123
notification-core_1:123
notification-dispatcher_core_1:123
notification-dispatcher-smschannel_core_1:123
)
array2=(
account_custom_2:124
shoppingcart_custom_2:124
notification_custom_2:124
notification-dispatcher_custom_2:124
notification-dispatcher-smschannel_custom_2:124
)
while IFS=$'\t' read -r base a2; do # read the sorted list line by line
for a1 in "${array1[@]}"; do
if [[ $a1 == *$base* ]]; then
scr+="s=$a1=$a2=g;" # generate sed script by appending the "s" command
continue 2
fi
done
done < <(for j in "${array2[@]}"; do
array_2_base=${j%%_*} # substring before the 1st "_"
printf "%s\t%s\n" "$array_2_base" "$j"
# print base and original element side by side
done | sort -r)
sed -i "$scr" "$file_name" # execute the replacement at once

If both arrays have the same number of items, in matching order, then you can process them in one loop:
for i in "${!array1[@]}"; {
value=${array1[$i]}
new_value=${array2[$i]}
sed -i "s/$value/$new_value/" file
}

I found a way to fix this.
I am deleting string from the first array once replaced.
file_name="<path_to_file>/file.txt"
for i in "${!array1[@]}"
do
for j in "${!array2[@]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" $file_name
delete="${array1[$i]}"
array1=( "${array1[@]/$delete}" )
fi
done
done

Get the index of a specific string from a dynamically generated output

I tried a lot of things, but now I am at my wit's end.
My problem is I need the index of a specific string from my dynamically generated output.
In example, I want the index from the string 'cookie' in this output:
1337 cat dog table cookie 42
So in this example I would need this result:
5
One problem is that I need that number for a later executed awk command. Another problem is that this generated output has a flexible length and you are not able to 'sed' something with . - or something else. There is no such pattern like this.
Cheers

Just create an array mapping each string value to its index and then print the entry:
$ cat file
1337 cat dog table cookie 42
$ awk -v v="cookie" '{v2i[v]=0; for (i=1;i<=NF;i++) v2i[$i]=i; print v2i[v]}' file
5
The above will print 0 if the string doesn't exist as a field on the given line.
By the way, you say you need that number output from the above "for a later executed awk command". Wild idea - why not do both steps in one awk command?
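A minimal sketch of doing both steps in one awk command follows. The question does not say what the later awk command does, so the second step here (printing the field right after the match) is purely a hypothetical placeholder.

```shell
#!/bin/bash
line='1337 cat dog table cookie 42'

result=$(awk -v v="cookie" '{
    idx = 0
    # Step 1: find the index of the field that equals v.
    for (i = 1; i <= NF; i++) if ($i == v) { idx = i; break }
    # Step 2 (hypothetical): use the index immediately, here to print
    # the index together with the field that follows it.
    if (idx) print idx, $(idx + 1)
}' <<< "$line")

echo "$result"
```

This avoids passing the index back to the shell and into a second awk process.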

Ugly, but possible:
echo '1337 cat dog table cookie 42' \
| tr ' ' '\n' \
| grep -Fn cookie \
| cut -f1 -d:

Here is a way to find position of word in a string using gnu awk (due to RS), and store it to a variable.
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{print NF+1;exit}' RS="$pat")
echo "$pos"
5
If you do not have gnu awk
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{for (i=1;i<=NF;i++) if ($i~p) print i}' p="$pat")
echo "$pos"
5

Here is pure bash way of doing it with arrays, no sed or awk or GNUs required ;-)
# Load up array, you would use your own command in place of echo
array=($(echo 1337 cat dog table cookie 42))
# Show what we have
echo ${array[*]}
1337 cat dog table cookie 42
# Find which element contains our pattern
for ((i=0;i<${#array[@]};i++)); do [ "${array[$i]}" == "cookie" ] && echo $(($i+1)); done
5
Of course, you could set a variable to use later instead of echoing $i+1. You may also want some error checking in case pattern isn't found, but you get the idea!

Here is another answer, not using arrays, or "sed" or "awk" or "tr", just based on the bash IFS separating the values for you:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0 # f will be your answer
i=0 # i counts the fields
for x in $output; do \
((i++)); [[ "$x" = "cookie" ]] && f=$i; \
done
echo $f
Result:
4
Or you can put it all on one line, if you remove the backslashes, like this:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0;i=0;for x in $output; do ((i++)); [[ "$x" = "cookie" ]] && f=$i; done
echo $f
Explanation:
The "[[ a = b ]] && c" part is just shorthand for
if [[ a = b ]]; then
c
fi
It relies on shortcut evaluation of logicals. Basically, we are asking shell to determine if the two statements "a equals b" AND the statement "c" are both true. If a is not equal to b, it already knows it doesn't need to evaluate c because they already can't both be true - so f doesn't get the value of i. If, on the other hand, a is equal to b, the shell must still evaluate statement "c" to see if it is also true - and when it does so, f will get the value of i.
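To make the equivalence concrete, here is a small sketch that runs both forms side by side. The variable names g and the value "cake" are made up for the illustration.

```shell
#!/bin/bash
x="cookie"
i=4
f=0
g=0

# Short-circuit form: the assignment runs only if the test succeeds.
[[ "$x" = "cookie" ]] && f=$i

# Equivalent explicit form.
if [[ "$x" = "cookie" ]]; then
    g=$i
fi

# When the test fails, the right-hand side is skipped and f keeps its value.
[[ "$x" = "cake" ]] && f=99

echo "f=$f g=$g"
```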

Pat="cookie"
YourInput | sed -n "/${Pat}/ {s/.*/ & /;s/ ${Pat} .*/ I/;s/[[:blank:]]\{1,\}[^[:blank:]]\{1,\}/I/g
s/I\{9\}/9/;s/I\{8\}/8/;s/I\{7\}/7/;s/IIIIII/6/;s/IIIII/5/;s/IIII/4/;s/III/3/;s/II/2/;s/I/1/
p;q;}
$ s/.*/0/p"
If there are more than 9 columns, a more complex sed script could be written, or the result could be passed through wc -c instead.

bash print first to nth column in a line iteratively

I am trying to get the column names of a file and print them iteratively. I guess the problem is with the print $i but I don't know how to correct it. The code I tried is:
#! /bin/bash
for i in {2..5}
do
set snp = head -n 1 smaller.txt | awk '{print $i}'
echo $snp
done
Example input file:
ID Name Age Sex State Ext
1 A 12 M UT 811
2 B 12 F UT 818
Desired output:
Name
Age
Sex
State
Ext
But the output I get is blank screen.

You'd better just read the first line of your file and store the result as an array:
read -a header < smaller.txt
and then printf the relevant fields:
printf "%s\n" "${header[@]:1}"
Moreover, this uses bash only, and involves no unnecessary loops.
Edit. To also answer your comment, you'll be able to loop through the header fields thus:
read -a header < smaller.txt
for snp in "${header[@]:1}"; do
echo "$snp"
done
Edit 2. Your original method had many many mistakes. Here's a corrected version of it (although what I wrote before is a much preferable way of solving your problem):
for i in {2..5}; do
snp=$(head -n 1 smaller.txt | awk "{print \$$i}")
echo "$snp"
done
set probably doesn't do what you think it does.
Because of the single quotes in awk '{print $i}', the $i never gets expanded by bash.
This algorithm is not good since you're calling head and awk 4 times, whereas you don't need a single external process.
Hope this helps!
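As an alternative to escaping the dollar sign inside double quotes, the loop counter can be handed to awk with -v, which keeps the awk program in single quotes. This is only a sketch: the temporary file stands in for smaller.txt, the awk variable name col is arbitrary, and the range is extended to 6 so that Ext is included as in the desired output.

```shell
#!/bin/bash
# Hypothetical stand-in for smaller.txt from the question.
tmp=$(mktemp)
printf '%s\n' 'ID Name Age Sex State Ext' '1 A 12 M UT 811' '2 B 12 F UT 818' > "$tmp"

cols=()
for i in {2..6}; do
    # -v copies the shell value into the awk variable col;
    # $col then selects that field of the header line.
    cols+=("$(awk -v col="$i" 'NR == 1 { print $col; exit }' "$tmp")")
done

printf '%s\n' "${cols[@]}"

rm -f "$tmp"
```

It still starts one awk process per column, so the read -a approach above remains the cheaper option.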

You can print it using awk itself:
awk 'NR==1{for (i=2; i<=5; i++) print $i}' smaller.txt

The main problem with your code is that your assignment syntax is wrong. Change this:
set snp = head -n 1 smaller.txt | awk '{print $i}'
to this:
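snp=$(head -n 1 smaller.txt | awk -v i="$i" '{print $i}')
That is:
Do not use set. set is for setting shell options, numbered parameters, and so on, not for assigning arbitrary variables.
Remove the spaces around =.
To run a command and capture its output as a string, use $(...) (or `...`, but $(...) is less error-prone).
Pass the loop counter to awk with -v i="$i". Inside single quotes bash does not expand $i, and awk would treat an unset i as 0 and print the whole line.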
That said, I agree with gniourf_gniourf's approach.

Here's another alternative; not necessarily better or worse than any of the others:
for n in $(head -n 1 smaller.txt)
do
echo ${n}
done

Something like:
for x1 in $(head -n1 smaller.txt );do
echo $x1
done
