Get the index of a specific string from a dynamically generated output - linux

I tried a lot of things, but now I am at my wit's end.
My problem is I need the index of a specific string from my dynamically generated output.
For example, I want the index of the string 'cookie' in this output:
1337 cat dog table cookie 42
So in this example I would need this result:
5
One problem is that I need that number for an awk command that runs later. Another is that the generated output varies in length, so there is no fixed pattern I could match with sed or a similar tool.
Cheers

Just create an array mapping the string value to its index and then print the entry:
$ cat file
1337 cat dog table cookie 42
$ awk -v v="cookie" '{v2i[v]=0; for (i=1;i<=NF;i++) v2i[$i]=i; print v2i[v]}' file
5
The above will print 0 if the string doesn't exist as a field on the given line.
By the way, you say you need that number output from the above "for a later executed awk command". Wild idea - why not do both steps in one awk command?
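One way to read that suggestion, as a rough sketch: locate the field and use its index inside the same awk program. The question never says what the later step does, so printing the field that follows the match is purely an assumption, and field_after is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: find the index of word $1 on each input line and,
# in the same awk program, use it (here: print the field right after it).
field_after() {
  awk -v v="$1" '{
    idx = 0
    for (i = 1; i <= NF; i++)
      if ($i == v) { idx = i; break }
    # Assumed "later step": print the field following the match, if any.
    if (idx && idx < NF) print $(idx + 1)
  }'
}

echo '1337 cat dog table cookie 42' | field_after cookie
```

With the sample line this prints 42, the field after cookie.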

Ugly, but possible:
echo '1337 cat dog table cookie 42' \
| tr ' ' '\n' \
| grep -Fn cookie \
| cut -f1 -d:
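Note that grep -F matches substrings, so a word such as cookies would also be counted. A slightly stricter sketch, assuming the words are separated by spaces and the line has no leading blank, adds -x to require whole-line matches and keeps only the first hit. The name field_index_grep is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the 1-based position of word $1 on stdin.
field_index_grep() {
  tr -s ' ' '\n' |          # one word per line, squeezing repeated spaces
    grep -Fxn -e "$1" |     # fixed string, whole line, with line number
    head -n 1 |             # first occurrence only
    cut -d: -f1             # keep the line number
}

echo '1337 cat dog table cookie 42' | field_index_grep cookie
```

Nothing is printed when the word is absent, so the caller can test for an empty result.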

Here is a way to find the position of a word in a string using GNU awk (it relies on a multi-character RS) and store it in a variable.
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{print NF+1;exit}' RS="$pat")
echo "$pos"
5
If you do not have GNU awk:
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{for (i=1;i<=NF;i++) if ($i~p) print i}' p="$pat")
echo "$pos"
5
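The $i~p test above is a regular-expression match, so it also hits substrings and prints every matching position. A minimal sketch of an exact-match variant follows, together with one possible way to hand the result to a later awk call via -v. What that later call does is an assumption; here it simply prints the field at that position.

```shell
#!/bin/bash
line="1337 cat dog table cookie 42"
pat="cookie"

# Exact string comparison; stop at the first matching field.
pos=$(printf '%s\n' "$line" |
  awk -v p="$pat" '{ for (i = 1; i <= NF; i++) if ($i == p) { print i; exit } }')
echo "$pos"

# Assumed later awk command: reuse the index through -v.
if [ -n "$pos" ]; then
  printf '%s\n' "$line" | awk -v n="$pos" '{ print $n }'
fi
```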

Here is pure bash way of doing it with arrays, no sed or awk or GNUs required ;-)
# Load up array, you would use your own command in place of echo
array=($(echo 1337 cat dog table cookie 42))
# Show what we have
echo ${array[*]}
1337 cat dog table cookie 42
# Find which element contains our pattern
for ((i=0;i<${#array[@]};i++)); do [ ${array[$i]} == "cookie" ] && echo $(($i+1)); done
5
Of course, you could set a variable to use later instead of echoing $i+1. You may also want some error checking in case pattern isn't found, but you get the idea!
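The error checking mentioned above could look roughly like this. It is only a sketch: find_index is a hypothetical function name, and signalling "not found" through a non-zero exit status is one convention among several.

```shell
#!/bin/bash
# Hypothetical helper: print the 1-based index of $1 among the remaining
# arguments; return 1 (and print nothing) when it is not present.
find_index() {
  local needle=$1 i=1 word
  shift
  for word in "$@"; do
    if [[ $word == "$needle" ]]; then
      printf '%s\n' "$i"
      return 0
    fi
    ((i++))
  done
  return 1
}

array=(1337 cat dog table cookie 42)
if pos=$(find_index cookie "${array[@]}"); then
  echo "found at position $pos"
else
  echo "pattern not found" >&2
fi
```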

Here is another answer, not using arrays, or "sed" or "awk" or "tr", just based on the bash IFS separating the values for you:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0 # f will be your answer
i=0 # i counts the fields
for x in $output; do \
((i++)); [[ "$x" = "cookie" ]] && f=$i; \
done
echo $f
Result:
4
Or you can put it all on one line, if you remove the backslashes, like this:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0;i=0;for x in $output; do ((i++)); [[ "$x" = "cookie" ]] && f=$i; done
echo $f
Explanation:
The "[[ a = b ]] && c" part is just shorthand for
if [[ a = b ]]; then
c
fi
It relies on shortcut evaluation of logicals. Basically, we are asking shell to determine if the two statements "a equals b" AND the statement "c" are both true. If a is not equal to b, it already knows it doesn't need to evaluate c because they already can't both be true - so f doesn't get the value of i. If, on the other hand, a is equal to b, the shell must still evaluate statement "c" to see if it is also true - and when it does so, f will get the value of i.
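A small sketch of that equivalence, with illustrative variable names (short, long, skipped) that are not part of the answer:

```shell
#!/bin/bash
x="cookie"
i=4

# Short form: the assignment runs only when the test succeeds.
[[ "$x" = "cookie" ]] && short=$i

# Long form with the same effect.
if [[ "$x" = "cookie" ]]; then
  long=$i
fi

# When the test fails, the right-hand side is never evaluated.
[[ "$x" = "cake" ]] && skipped=$i

echo "short=$short long=$long skipped=${skipped:-unset}"
```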

Pat="cookie"
YourInput | sed -n "/${Pat}/ {s/.*/ & /;s/ ${Pat} .*/ I/;s/[[:blank:]]\{1,\}[^[:blank:]]\{1,\}/I/g
s/I\{9\}/9/;s/I\{8\}/8/;s/I\{7\}/7/;s/IIIIII/6/;s/IIIII/5/;s/IIII/4/;s/III/3/;s/II/2/;s/I/1/
p;q;}
$ s/.*/0/p"
If there are more than 9 columns, a more complex sed script could be written, or the output could be piped through wc -c instead.

Related

Storing command values into a variable Bash Script

I am trying to loop through files in a directory to find an animal and its value. The command is supposed to only display the animal and total value. For example:
File1 has:
Monkey 11
Bear 4
File2 has:
Monkey 12
If I wanted the total value of monkeys then I would do:
for f in *; do
total=$(grep $animal $f | cut -d " " -f 2- | paste -sd+ | bc)
done
echo $animal $total
This would return the correct value of:
Monkey 23
However, if there is only one instance of an animal like for example Bear, the variable total doesn't return any value, I only get echoed:
Bear
Why is this the case and how do I fix it?
Note: I'm not allowed to use the find command.
You could use this little awk instead of the grep/cut/paste/bc pipeline:
awk -v animal="Bear" '
$1 == animal { count += $2 }
END { print count + 0 }
' *
Comments on OP's question about why code behaves as it does:
total is reset on each pass through the loop so ...
upon leaving the loop total will have the count from the 'last' file processed
in the case of Bear the 'last' file processed is File2 and since File2 does not contain any entries for Bear we get total='', which is what's printed by the echo
if the Bear entry is moved from File1 to File2 then OP's code should print Bear 4
OP's current code effectively ignores all input files and prints whatever's in the 'last' file (File2 in this case)
OP's current code generates the following:
# Monkey
Monkey 12 # from File2
# Bear
Bear # no match in File2
I'd probably opt for replacing the whole grep/cut/paste/bc (4x subprocesses) with a single awk (1x subprocess) call (and assuming no matches we report 0):
for animal in Monkey Bear Hippo
do
total=$(awk -v a="${animal}" '$1==a {sum+=$2} END {print sum+0}' *)
echo "${animal} ${total}"
done
This generates:
Monkey 23
Bear 4
Hippo 0
NOTES:
I'm assuming OP's real code does more than echo the count to stdout hence the need of the total variable otherwise we could eliminate the total variable and have awk print the animal/sum pair directly to stdout
if OP's real code has a parent loop processing a list of animals it's likely possible a single awk call could process all of the animals at once; objective being to have awk generate the entire set of animal/sum pairs that could then be fed to the looping construct; if this is the case, and OP has some issues implementing a single awk solution, a new question should be asked
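The single-awk idea from the last note might be sketched as follows. This is an assumption about what such a solution could look like, not the answerer's code: sum_all is a hypothetical helper, and the demo files are recreated in a scratch directory so the example runs on its own.

```shell
#!/bin/bash
# Hypothetical helper: one awk pass over all files given as arguments,
# printing "animal total" pairs (the order of the pairs is unspecified).
sum_all() {
  awk 'NF >= 2 { sum[$1] += $2 } END { for (a in sum) print a, sum[a] }' "$@"
}

# Demo data matching the question, written to a scratch directory.
tmpdir=$(mktemp -d)
printf 'Monkey 11\nBear 4\n' > "$tmpdir/File1"
printf 'Monkey 12\n' > "$tmpdir/File2"

# Feed the pairs to a loop; process substitution keeps the loop in the
# current shell, so variables set inside it remain visible afterwards.
while read -r animal total; do
  echo "${animal} ${total}"
done < <(sum_all "$tmpdir"/File*)
```

Each file is read once regardless of how many animals there are. Remove the scratch directory with rm -rf "$tmpdir" when done.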
Why is this the case
grep outputs nothing, so nothing is propagated through the pipe and empty string is assigned to total.
Because total is reset every loop (total=anything without referencing previous value), it just has the value for the last file.
how do I fix it?
Do not try to do everything at once; do less in each step.
total=0
for f in *; do
count=$(grep "$animal" "$f" | cut -d " " -f 2-)
total=$((total + count)) # reuse total, reference previous value
done
echo "$animal" "$total"
A programmer fluent in shell will most probably jump to AWK for such problems. Remember to check your scripts with shellcheck.
With what you were trying to do, you could do all files at once:
total=$(
{
echo 0 # to have at least nice 0 if animal is not found
grep "$animal" * |
cut -d " " -f 2-
} |
paste -sd+ |
bc
)
With just bash:
declare -A animals=()
for f in *; do
while read -r animal value; do
(( animals[$animal] = ${animals[$animal]:-0} + value ))
done < "$f"
done
declare -p animals
outputs
declare -A animals=([Monkey]="23" [Bear]="4" )
With this approach, you have all the totals for all the animals by processing each file exactly once
$ head File*
==> File1 <==
Monkey 11
Bear 4
==> File2 <==
Monkey 12
==> File3 <==
Bear
Monkey
Using awk and bash array
#!/bin/bash
sumAnimals(){
awk '
{ NF == 1 ? a[$1]++ : a[$1]=a[$1]+$2 }
END{
for (i in a ) printf "[%s]=%d\n",i, a[i]
}
' File*
}
# storing all animals in bash array
declare -A animalsArr="( $(sumAnimals) )"
# show array content
declare -p animalsArr
# getting total from array
echo "Monkey: ${animalsArr[Monkey]}"
echo "Bear: ${animalsArr[Bear]}"
Output
declare -A animalsArr=([Bear]="5" [Monkey]="24" )
Monkey: 24
Bear: 5

How to search the full string in file which is passed as argument in shell script?

I am passing an argument to a script and need to match it against a file and extract the corresponding information. How can I do that?
Example:
I have below details in file-
iMedical_Refined_load_Procs_task_id=970113
HV_Rawlayer_Execution_Process=988835
iMedical_HV_Refined_Load=988836
DHS_RawLayer_Execution_Process=988833
iMedical_DHS_Refined_Load=988834
If I pass 'hv' as the argument, it should pick 'iMedical_HV_Refined_Load' and give the result '988836'.
If I pass 'dhs', it should pick 'iMedical_DHS_Refined_Load' and give the result '988834'.
I tried the logic below, but it does not give the correct result. What changes do I need to make?
echo $1 | tr a-z A-Z
g=${1^^}
echo $g
echo $1
val=$(awk -F= -v s="$g" '$g ~ s{print $2}' /medaff/Scripts/Aggrify/sltconfig.cfg)
echo "TASK ID is $val"
Assuming your matching criteria is the first string after delimiter _ and the output needed is the numbers after the = char, then you can try this sed
$ sed -n "/_$1/I{s/[^=]*=\(.*\)/\1/p}" input_file
$ read -r input
hv
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988836
$ read -r input
dhs
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988834
If I'm reading it right, 2 quick versions -
$: cat 1
awk -F= -v s="_${1^^}_" '$1~s{print $2}' file
$: cat 2
sed -En "/_${1^^}_/{s/^.*=//;p;}" file
Both basically the same logic.
In pure bash -
$: cat 3
while IFS='=' read key val; do [[ "$key" =~ "_${1^^}_" ]] && echo "$val"; done < file
That's a lot less efficient, though.
If you know for sure there will be only one hit, all these could be improved a bit by short-circuit exits, but on such a small sample it won't matter at all. If you have a larger dataset to read, then I strongly suggest you formalize your specs better than "in this set I should get...".
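The short-circuit exit mentioned above could look like this in awk. It is a sketch under the same assumption that at most one line is wanted. lookup_task_id is a hypothetical name, the config lines are copied from the question into a temporary file, and index() is used so the key is matched as a literal string rather than a regex.

```shell
#!/bin/bash
# Hypothetical helper: $1 = key fragment (any case), $2 = config file.
lookup_task_id() {
  # ${1^^} upper-cases the argument (bash 4+); exit stops at the first hit.
  awk -F= -v s="_${1^^}_" 'index($1, s) { print $2; exit }' "$2"
}

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
iMedical_Refined_load_Procs_task_id=970113
HV_Rawlayer_Execution_Process=988835
iMedical_HV_Refined_Load=988836
DHS_RawLayer_Execution_Process=988833
iMedical_DHS_Refined_Load=988834
EOF

lookup_task_id hv "$cfg"
lookup_task_id dhs "$cfg"
```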

AWK output to array

I am learning AWK. I need to capture the output of an awk command in a variable so I can parse it.
The file has 130000 lines. I need to put one column, extracted with AWK, into an array so I can use it in another part of the script. Sorry for my English; ask me if you don't understand my objective.
The code:
awk '/file:/{ name=$3 ; print name ;}' ejemplo.txt
I tried:
list=$(awk '/file:/{ name=$3 ; print name;}' ejemplo.txt)
but when I try to show the content of the variable $list, it only shows me 1 line.
I tried declaring an array, but it only shows me 1 result.
Does anyone understand what is happening? How can I build an array with all the output?
I tried this code to build an array with AWK. Maybe I am a little dumb, but I don't see how to solve my problem:
#!/bin/bash
#filename: script2.sh
conta=$(cat ejemplo_hdfs.txt | wc -l)
for i in `seq 0 $conta`;do
objeto=" "
owner=" "
group=" "
awk '{
if($1=="#" && $2=="file:") objeto=$3;
else if($1=="#" && $2=="owner:") owner=$3;
else if($1=="#" && $2=="group:") group=$3;
else
print $3
;}' ejemplo_hdfs.txt
echo $objeto+","+$owner+","+$group
done
To assign an array to a variable in bash, the whole expression that generates the elements needs to be in parentheses. Each word produced by the evaluated expression becomes an element of the resulting array.
Example:
#!/bin/bash
foo=($(awk -F, '{print $2}' x.txt))
# Size of array
echo "There are ${#foo[@]} elements"
# Iterate over each element
for f in "${foo[@]}"; do
echo "$f"
done
# Use a specific element.
echo "The second element is ${foo[1]}"
$ cat x.txt
1,a dog
2,b
3,c
$ ./array_example.sh
There are 4 elements
a
dog
b
c
The second element is dog
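As the output shows, word splitting turns "a dog" into two elements. If one element per output line is wanted instead, a sketch using the bash builtin mapfile (bash 4 or later) is one option. The sample data is recreated in a temporary file here so the example is self-contained.

```shell
#!/bin/bash
tmpfile=$(mktemp)
printf '%s\n' '1,a dog' '2,b' '3,c' > "$tmpfile"

# -t strips the trailing newline; each awk output line becomes one element.
mapfile -t foo < <(awk -F, '{print $2}' "$tmpfile")

echo "There are ${#foo[@]} elements"
for f in "${foo[@]}"; do
  echo "$f"
done
```

With this input the array holds three elements, and the first one is "a dog" with the space preserved.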

grep -o: Keep input line format

$ echo "abca\ndeaf" | grep -o a
a
a
a
I am looking for the output:
aa
a
Or perhaps
a a
a
or even
a<TAB>a
a
(this is a very very simplified example)
I just want it not to throw away the line grouping.
You can do it with sed by removing any character that isn't a:
echo "abca\ndeaf" | sed 's/[^a]//g'
aa
a
It can't be done with grep alone.
@sudo_O's answer shows how to do this with single-character strings. The difficulty level is raised if you want to match longer strings.
One way to do it is by parsing the output of grep -n -o, like so:
$ cat mgrep
#!/bin/bash
# Print each match along with its line number.
grep -no "$@" | {
matches=() # An array of matches to be printed when the line number changes.
lastLine= # Keep track of the current and previous line numbers.
# Read the matches, with `:' as the separator.
while IFS=: read line match; do
# If this is the same line number as the previous match, add this one to
# the list.
if [[ $line = $lastLine ]]; then
matches+=("$match")
# Otherwise, print out the list of matches we've accumulated and start
# over.
else
(( ${#matches[@]} )) && echo "${matches[@]}"
matches=("$match")
fi
lastLine=$line
done
# Print any remaining matches.
(( ${#matches[@]} )) && echo "${matches[@]}"
}
Example usage:
$ echo $'abca\ndeaf' | ./mgrep a
a a
a
$ echo $'foo bar foo\nbaz\ni like food' | ./mgrep foo
foo foo
foo
Based off John Kugelman's solution, this one works with one input file and gawk
grep -on abc file.txt | awk -v RS='[[:digit:]]+:' 'NF{$1=$1; print}'
If you're willing to use perl:
$ echo $'abca\ndeaf' | perl -ne '@m = /a/g; print "@m\n"'
a a
a
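For longer patterns without GNU-specific features, a sketch in plain awk can walk each line with match() and collect the hits. matches_per_line is a hypothetical name. The pattern is treated as an awk extended regex, and anchors such as ^ will not behave as in grep because the line is consumed piece by piece.

```shell
#!/bin/bash
# Hypothetical helper: $1 = awk regex; prints the matches of each input
# line on one output line, separated by spaces. Lines without a match
# produce no output.
matches_per_line() {
  awk -v re="$1" '{
    out = ""
    s = $0
    while (match(s, re)) {
      if (RLENGTH == 0) break            # guard against empty matches
      out = out (out == "" ? "" : " ") substr(s, RSTART, RLENGTH)
      s = substr(s, RSTART + RLENGTH)    # continue after this match
    }
    if (out != "") print out
  }'
}

printf 'abca\ndeaf\n' | matches_per_line a
printf 'foo bar foo\nbaz\ni like food\n' | matches_per_line foo
```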

bash print first to nth column in a line iteratively

I am trying to get the column names of a file and print them iteratively. I guess the problem is with the print $i but I don't know how to correct it. The code I tried is:
#! /bin/bash
for i in {2..5}
do
set snp = head -n 1 smaller.txt | awk '{print $i}'
echo $snp
done
Example input file:
ID Name Age Sex State Ext
1 A 12 M UT 811
2 B 12 F UT 818
Desired output:
Name
Age
Sex
State
Ext
But the output I get is blank screen.
You'd better just read the first line of your file and store the result as an array:
read -a header < smaller.txt
and then printf the relevant fields:
printf "%s\n" "${header[@]:1}"
Moreover, this uses bash only, and involves no unnecessary loops.
Edit. To also answer your comment, you'll be able to loop through the header fields thus:
read -a header < smaller.txt
for snp in "${header[@]:1}"; do
echo "$snp"
done
Edit 2. Your original method had many many mistakes. Here's a corrected version of it (although what I wrote before is a much preferable way of solving your problem):
for i in {2..5}; do
snp=$(head -n 1 smaller.txt | awk "{print \$$i}")
echo "$snp"
done
set probably doesn't do what you think it does.
Because of the single quotes in awk '{print $i}', the $i never gets expanded by bash.
This algorithm is not good since you're calling head and awk 4 times, whereas you don't need a single external process.
Hope this helps!
You can print it using awk itself:
awk 'NR==1{for (i=2; i<=5; i++) print $i}' smaller.txt
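If the column range should not be hard-coded, the same idea can take the starting column through -v and run up to NF. This is only a sketch: header_fields is a hypothetical name, and the sample file from the question is recreated in a temporary file.

```shell
#!/bin/bash
# Hypothetical helper: $1 = file, $2 = first column to print (default 2).
header_fields() {
  awk -v first="${2:-2}" 'NR == 1 { for (i = first; i <= NF; i++) print $i; exit }' "$1"
}

sample=$(mktemp)
printf '%s\n' 'ID Name Age Sex State Ext' '1 A 12 M UT 811' '2 B 12 F UT 818' > "$sample"

header_fields "$sample"
```

The exit after the first record avoids reading the rest of a large file.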
The main problem with your code is that your assignment syntax is wrong. Change this:
set snp = head -n 1 smaller.txt | awk '{print $i}'
to this:
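snp=$(head -n 1 smaller.txt | awk -v i="$i" '{print $i}')
That is:
Do not use set. set is for setting shell options, numbered parameters, and so on, not for assigning arbitrary variables.
Remove the spaces around =.
To run a command and capture its output as a string, use $(...) (or `...`, but $(...) is less error-prone).
Pass the loop counter into awk with -v i="$i". Inside single quotes the shell never expands $i, and awk's own i would otherwise be unset, so $i would print the whole line.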
That said, I agree with gniourf_gniourf's approach.
Here's another alternative; not necessarily better or worse than any of the others:
for n in $(head -n 1 smaller.txt)
do
echo ${n}
done
Something like:
for x1 in $(head -n1 smaller.txt );do
echo $x1
done

Resources