Storing command values into a variable in a Bash script (Linux)

I am trying to loop through files in a directory to find an animal and its value. The command is supposed to only display the animal and total value. For example:
File1 has:
Monkey 11
Bear 4
File2 has:
Monkey 12
If I wanted the total value of monkeys then I would do:
for f in *; do
total=$(grep $animal $f | cut -d " " -f 2- | paste -sd+ | bc)
done
echo $animal $total
This would return the correct value of:
Monkey 23
However, if there is only one instance of an animal (for example, Bear), the total variable ends up with no value; all that gets echoed is:
Bear
Why is this the case and how do I fix it?
Note: I'm not allowed to use the find command.

You could use this little awk program instead of the for/grep/cut/paste/bc combination:
awk -v animal="Bear" '
$1 == animal { count += $2 }
END { print count + 0 }
' *
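A quick way to try it, with the file names and contents taken from the question (run in a scratch directory):

```shell
# Scratch-directory demo of the awk one-liner above
cd "$(mktemp -d)"
printf 'Monkey 11\nBear 4\n' > File1
printf 'Monkey 12\n'         > File2
awk -v animal="Bear" '$1 == animal { count += $2 } END { print count + 0 }' *
# prints 4; with animal="Hippo" it would print 0, thanks to the "count + 0"
```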

Comments on OP's question about why code behaves as it does:
total is reset on each pass through the loop so ...
upon leaving the loop total will have the count from the 'last' file processed
in the case of Bear the 'last' file processed is File2 and since File2 does not contain any entries for Bear we get total='', which is what's printed by the echo
if the Bear entry is moved from File1 to File2 then OP's code should print Bear 4
OP's current code effectively ignores all input files and prints whatever's in the 'last' file (File2 in this case)
OP's current code generates the following:
# Monkey
Monkey 12 # from File2
# Bear
Bear # no match in File2
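The reset behaviour is easy to reproduce; here is a minimal sketch (a bc-free variant of OP's pipeline, with the sample files assumed from the question):

```shell
# Reproduce the "total is reset each pass" bug in a scratch directory
cd "$(mktemp -d)"
printf 'Monkey 11\nBear 4\n' > File1
printf 'Monkey 12\n'         > File2
animal=Bear
for f in *; do
  total=$(grep "$animal" "$f" | cut -d' ' -f2)   # overwritten on every pass
done
echo $animal $total
# prints just "Bear": the last file (File2) has no Bear line, so total ends up empty
```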
I'd probably opt for replacing the whole grep/cut/paste/bc (4x subprocesses) with a single awk (1x subprocess) call (and assuming no matches we report 0):
for animal in Monkey Bear Hippo
do
total=$(awk -v a="${animal}" '$1==a {sum+=$2} END {print sum+0}' *)
echo "${animal} ${total}"
done
This generates:
Monkey 23
Bear 4
Hippo 0
NOTES:
I'm assuming OP's real code does more than echo the count to stdout, hence the need for the total variable; otherwise we could eliminate total and have awk print the animal/sum pairs directly to stdout
if OP's real code has a parent loop processing a list of animals, it's likely a single awk call could process all of the animals at once; the objective would be to have awk generate the entire set of animal/sum pairs, which could then be fed to the looping construct; if this is the case, and OP has issues implementing a single-awk solution, a new question should be asked

Why is this the case
grep outputs nothing, so nothing is propagated through the pipe and an empty string is assigned to total.
Because total is reset on every iteration (total=anything, without referencing the previous value), it only ever holds the value from the last file.
how do I fix it?
Do not try to do everything at once; do less in each step.
total=0
for f in *; do
count=$(grep "$animal" "$f" | cut -d " " -f 2-)
total=$((total + count)) # reuse total, reference previous value
done
echo "$animal" "$total"
A programmer fluent in shell will most probably jump to AWK for such problems. Remember to check your scripts with shellcheck.
With what you were trying to do, you could do all files at once:
total=$(
{
echo 0 # so we get at least a nice 0 if the animal is not found
grep "$animal" * |
cut -d " " -f 2-
} |
paste -sd+ |
bc
)

With just bash:
declare -A animals=()
for f in *; do
while read -r animal value; do
(( animals[$animal] = ${animals[$animal]:-0} + value ))
done < "$f"
done
declare -p animals
outputs
declare -A animals=([Monkey]="23" [Bear]="4" )
With this approach, you have all the totals for all the animals while processing each file exactly once.
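A quick check of this approach, assuming the two files from the question in a scratch directory:

```shell
# Sum every animal across all files in one pass, using a bash associative array
cd "$(mktemp -d)"
printf 'Monkey 11\nBear 4\n' > File1
printf 'Monkey 12\n'         > File2
declare -A animals=()
for f in *; do
  while read -r animal value; do
    (( animals[$animal] = ${animals[$animal]:-0} + value ))
  done < "$f"
done
echo "${animals[Monkey]} ${animals[Bear]}"
# prints: 23 4
```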

$ head File*
==> File1 <==
Monkey 11
Bear 4
==> File2 <==
Monkey 12
==> File3 <==
Bear
Monkey
Using awk and bash array
#!/bin/bash
sumAnimals(){
awk '
{ a[$1] += (NF == 1 ? 1 : $2) }  # a bare animal name counts as 1
END{
for (i in a ) printf "[%s]=%d\n",i, a[i]
}
' File*
}
# storing all animals in bash array
declare -A animalsArr="( $(sumAnimals) )"
# show array content
declare -p animalsArr
# getting total from array
echo "Monkey: ${animalsArr[Monkey]}"
echo "Bear: ${animalsArr[Bear]}"
Output
declare -A animalsArr=([Bear]="5" [Monkey]="24" )
Monkey: 24
Bear: 5

Related

AWK output to array

I am learning about AWK. I need to capture the output of an awk command in a variable so I can parse it.
The file has 130000 lines. I need to put a column into an array with AWK so I can use the variable in another part of the script.
well the code:
awk '/file:/{ name=$3 ; print name }' ejemplo.txt
I tried:
list=$(awk '/file:/{ name=$3 ; print name }' ejemplo.txt)
but when I try to show the content of the variable $list, it only shows me 1 line.
I tried declaring an array, but it still only shows 1 result.
Does anyone understand what is happening? How can I build an array with all of the output?
I tried the following code to build an array with AWK, but I can't see how to solve my problem:
#!/bin/bash
#filename: script2.sh
conta=$(cat ejemplo_hdfs.txt | wc -l)
for i in `seq 0 $conta`;do
objeto=" "
owner=" "
group=" "
awk '{
if($1=="#" && $2=="file:") objeto=$3;
else if($1=="#" && $2=="owner:") owner=$3;
else if($1=="#" && $2=="group:") group=$3;
else
print $3
;}' ejemplo_hdfs.txt
echo $objeto+","+$owner+","+$group
done
To assign an array to a variable in bash, the whole expression that generates the elements needs to be in parentheses. Each word produced by the evaluated expression becomes an element of the resulting array.
Example:
#!/bin/bash
foo=($(awk -F, '{print $2}' x.txt))
# Size of array
echo "There are ${#foo[@]} elements"
# Iterate over each element
for f in "${foo[@]}"; do
echo "$f"
done
# Use a specific element.
echo "The second element is ${foo[1]}"
$ cat x.txt
1,a dog
2,b
3,c
$ ./array_example.sh
There are 4 elements
a
dog
b
c
The second element is dog

Bash: transform key-value lines to CSV format [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Editor's note: I've clarified the problem definition, because I think the problem is an interesting one, and this question deserves to be reopened.
I've got a text file containing key-value lines in the following format - note that the # lines below are only there to show repeating blocks and are NOT part of the input:
Country:United Kingdom
Language:English
Capital city:London
#
Country:France
Language:French
Capital city:Paris
#
Country:Germany
Language:German
Capital city:Berlin
#
Country:Italy
Language:Italian
Capital city:Rome
#
Country:Russia
Language:Russian
Capital city:Moscow
Using shell commands and utilities, how can I transform such a file to CSV format, so it will look like this?
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
In other words:
Make the key names the column names of the CSV header row.
Make the values from each block a data row each.
[OP's original] Edit: My idea would be to separate the entries e.g. Country:France would become Country France, and then grep/sed the heading. However I have no idea how to move the headings from a single column to several separate ones.
A simple solution with cut, paste, and head (assumes input file file, outputs to file out.csv):
#!/usr/bin/env bash
{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
cut -d':' -f2- file | paste -d, - - -; } >out.csv
cut -d':' -f1 file | head -n 3 creates the header line:
cut -d':' -f1 file extracts the first :-based field from each input line, and head -n 3 stops after 3 lines, given that the headers repeat every 3 lines.
paste -d, - - - takes 3 input lines from stdin (one for each -) and combines them to a single, comma-separated output line (-d,)
cut -d':' -f2- file | paste -d, - - - creates the data lines:
cut -d':' -f2- file extracts everything after the : from each input line.
As above, paste then combines 3 values to a single, comma-separated output line.
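To see the whole thing end to end, here is a self-contained run with a two-record sample of the input (written to a scratch directory):

```shell
# Header line + data lines via cut/head/paste, on a small sample of the input
cd "$(mktemp -d)"
printf '%s\n' 'Country:United Kingdom' 'Language:English' 'Capital city:London' \
              'Country:France' 'Language:French' 'Capital city:Paris' > file
{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
  cut -d':' -f2- file | paste -d, - - -; }
# Country,Language,Capital city
# United Kingdom,English,London
# France,French,Paris
```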
agc points out in a comment that the column count (3) and the paste operands (- - -) are hard-coded above.
The following solution parameterizes the column count (set it via n=...):
{ n=3; pasteOperands=$(printf '%.s- ' $(seq $n))
cut -d':' -f1 file | head -n $n | paste -d, $pasteOperands;
cut -d':' -f2- file | paste -d, $pasteOperands; } >out.csv
printf '%.s- ' $(seq $n) is a trick that produces as many space-separated - operands as there are columns ($n).
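The trick in isolation (the %.s conversion consumes each argument from seq while printing nothing, so only the literal "- " is emitted per argument):

```shell
n=4
pasteOperands=$(printf '%.s- ' $(seq $n))
echo "[$pasteOperands]"   # brackets added to make the trailing space visible
# [- - - - ]
```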
While the previous solution is now parameterized, it still assumes that the column count is known in advance; the following solution dynamically determines the column count (requires Bash 4+ due to use of readarray, but could be made to work with Bash 3.x):
# Determine the unique list of column headers and
# read them into a Bash array.
readarray -t columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' file)
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}") >out.csv
# Append the data lines.
cut -d':' -f2- file | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]})) >>out.csv
awk -F: 'seen[$1]++ { exit } { print $1 }' outputs each input line's column name (the 1st :-separated field), remembers the column names in associative array seen, and stops at the first column name that is seen for the second time.
readarray -t columnHeaders reads awk's output line by line into array columnHeaders
(IFS=','; echo "${columnHeaders[*]}") >out.csv prints the array elements using a comma as the separator (specified via $IFS); note the use of a subshell ((...)) so as to localize the effect of modifying $IFS, which would otherwise have global effects.
The cut ... pipeline uses the same approach as before, with the operands for paste being created based on the count of the elements of array columnHeaders (${#columnHeaders[@]}).
To wrap the above up in a function that outputs to stdout and also works with Bash 3.x:
toCsv() {
local file=$1 columnHeaders
# Determine the unique list of column headers and
# read them into a Bash array.
IFS=$'\n' read -d '' -ra columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' "$file")
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}")
# Append the data lines.
cut -d':' -f2- "$file" | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]}))
}
# Sample invocation
toCsv file > out.csv
My bash script for this would be :
#!/bin/bash
count=0
echo "Country,Language,Capital city"
while read -r line
do
(( count++ ))
(( count < 3 )) && printf "%s," "${line##*:}"
(( count == 3 )) && printf "%s\n" "${line##*:}" && (( count = 0 ))
done<file
Output
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Edit
Replaced [ stuff ] with (( stuff )) ie test with double parenthesis which is used for arithmetic expansion.
You can also write a slightly more generalized bash script that takes the number of repeating rows holding the data and produces output on that basis, to avoid hardcoding the header values and to handle additional fields. (You could also just scan the field names for the first repeat and set the repeat count that way.)
#!/bin/bash
declare -i rc=0 ## record count
declare -i hc=0 ## header count
record=""
header=""
fn="${1:-/dev/stdin}" ## filename as 1st arg (default: stdin)
repeat="${2:-3}" ## number of repeating rows (default: 3)
while read -r line; do
record="$record,${line##*:}"
((hc == 0)) && header="$header,${line%%:*}"
if ((rc < (repeat - 1))); then
((rc++))
else
((hc == 0)) && { printf "%s\n" "${header:1}"; hc=1; }
printf "%s\n" "${record:1}"
record=""
rc=0
fi
done <"$fn"
There are any number of ways to approach the problem. You will have to experiment to find the most efficient for your data file size, etc. Whether you use a script, or a combination of shell tools, cut, paste, etc.. is to a large extent left to you.
Output
$ bash readcountry.sh country.txt
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Output with 4 Fields
Example input file adding a Population field:
$ cat country2.txt
Country:United Kingdom
Language:English
Capital city:London
Population:20000000
<snip>
Output
$ bash readcountry.sh country2.txt 4
Country,Language,Capital city,Population
United Kingdom,English,London,20000000
France,French,Paris,10000000
Germany,German,Berlin,150000000
Italy,Italian,Rome,9830000
Russia,Russian,Moscow,622000000
Using datamash, tr, and join:
datamash -t ':' -s -g 1 collapse 2 < country.txt | tr ',' ':' |
datamash -t ':' transpose |
join -t ':' -a1 -o 1.2,1.3,1.1 - /dev/null | tr ':' ','
Output:
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow

Testing whether a list contains only one specific string, or multiple copies of that same string

I have a list file as
mike
jack
jack
mike
and sometimes it is (no mike):
jack
jack
I would like to test whether this file contains only one mike or multiple mikes, like the following:
if [list **only** contains one `mike` or multiple `mike`'s]
then
do something
else
echo jack(other's name) is using it
fi
[ "$(sort inputfile | uniq)" = mike ]
sort the input, then remove all identical lines. You need to sort the input for uniq because it works only for consecutive identical lines.
Short form:
[ "$(sort --unique inputfile)" = mike ]
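A sketch of both cases, using scratch files with the names from the question (note that --unique is the GNU long form of -u):

```shell
f=$(mktemp)
printf 'mike\njack\njack\nmike\n' > "$f"
if [ "$(sort -u "$f")" = mike ]; then echo "do something"; else echo "jack is using it"; fi
# jack is using it   (sort -u leaves both "jack" and "mike")
printf 'mike\nmike\n' > "$f"
if [ "$(sort -u "$f")" = mike ]; then echo "do something"; else echo "jack is using it"; fi
# do something       (the deduplicated file is exactly "mike")
```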
Solution in bash
You can use a while loop with read to read through the lines as
while read -r line
do
[ "$line" = "mike" ] && (( count++ ))
done < inputFile
The $count will contain the count of mike in the file.
$ echo $count
2
Solution in awk
$ awk '/^mike$/{count++}; END{print count+0}' input
2
find_mike () {
mike_count=$(grep -c 'mike');
if (( mike_count == 1 )); then
printf 'I found only one mike.\n'
elif (( mike_count > 1 )); then
printf 'I found %d mikes.\n' "$mike_count"
else
printf '%s\n' "I have no idea where mike is"
fi
}
Usage example:
$ find_mike < input_file.txt
I found 2 mikes.
grep -xvq mike inputfile
-x: match the whole line
-v: invert the match
-q: do not print anything; exit at first match
This command exits with 0 as soon as it finds something that is not mike. If the file is empty or contains only (any number of) mike lines, it exits with 1.
grep is very, very fast, and it stops parsing the input file as soon as possible.
You might want to invert the exit value by prefixing the command with an exclamation mark.
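For example, with the negation applied and a hypothetical input file:

```shell
f=$(mktemp)
printf 'mike\nmike\nmike\n' > "$f"
# ! grep -xvq mike succeeds when no line other than "mike" exists
if ! grep -xvq mike "$f"; then
  echo "only mike"
else
  echo "someone else is using it"
fi
# only mike
```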

Get the index of a specific string from a dynamically generated output

I tried a lot of things, but now I am at my wit's end.
My problem is I need the index of a specific string from my dynamically generated output.
For example, I want the index of the string 'cookie' in this output:
1337 cat dog table cookie 42
So in this example I would need this result:
5
One problem is that I need that number for a later executed awk command. Another problem is that the generated output has a flexible length, and there is no fixed pattern (a `.`, a `-`, or anything else) that you could target with sed.
Cheers
Just create an array mapping each string value to its index and then print the entry:
$ cat file
1337 cat dog table cookie 42
$ awk -v v="cookie" '{v2i[v]=0; for (i=1;i<=NF;i++) v2i[$i]=i; print v2i[v]}' file
5
The above will print 0 if the string doesn't exist as a field on the given line.
By the way, you say you need that number output from the above "for a later executed awk command". Wild idea - why not do both steps in one awk command?
Ugly, but possible:
echo '1337 cat dog table cookie 42' \
| tr ' ' '\n' \
| grep -Fn cookie \
| cut -f1 -d:
Here is a way to find position of word in a string using gnu awk (due to RS), and store it to a variable.
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{print NF+1;exit}' RS="$pat")
echo "$pos"
5
If you do not have gnu awk
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{for (i=1;i<=NF;i++) if ($i~p) print i}' p="$pat")
echo "$pos"
5
Here is pure bash way of doing it with arrays, no sed or awk or GNUs required ;-)
# Load up array, you would use your own command in place of echo
array=($(echo 1337 cat dog table cookie 42))
# Show what we have
echo ${array[*]}
1337 cat dog table cookie 42
# Find which element contains our pattern
for ((i=0;i<${#array[@]};i++)); do [ "${array[$i]}" == "cookie" ] && echo $(($i+1)); done
5
Of course, you could set a variable to use later instead of echoing $i+1. You may also want some error checking in case pattern isn't found, but you get the idea!
Here is another answer, not using arrays, or "sed" or "awk" or "tr", just based on the bash IFS separating the values for you:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0 # f will be your answer
i=0 # i counts the fields
for x in $output; do \
((i++)); [[ "$x" = "cookie" ]] && f=$i; \
done
echo $f
Result:
4
Or you can put it all on one line, if you remove the backslashes, like this:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0;i=0;for x in $output; do ((i++)); [[ "$x" = "cookie" ]] && f=$i; done
echo $f
Explanation:
The `[[ a = b ]] && c` part is just shorthand for
if [[ a = b ]]; then
c
fi
It relies on shortcut evaluation of logicals. Basically, we are asking shell to determine if the two statements "a equals b" AND the statement "c" are both true. If a is not equal to b, it already knows it doesn't need to evaluate c because they already can't both be true - so f doesn't get the value of i. If, on the other hand, a is equal to b, the shell must still evaluate statement "c" to see if it is also true - and when it does so, f will get the value of i.
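A tiny illustration of that short-circuit (values are made up for the demo):

```shell
x=cookie
f=0; i=5
[[ "$x" = "cookie" ]] && f=$i              # left side true, so f=$i is evaluated
echo "$f"                                  # 5
if [[ "$x" = "pretzel" ]]; then f=99; fi   # equivalent long form; false here, f untouched
echo "$f"                                  # 5
```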
Pat="cookie"
YourInput | sed -n "/${Pat}/ {s/.*/ & /;s/ ${Pat} .*/I/;s/[[:blank:]]\{1,\}[^[:blank:]]\{1,\}/I/g
s/I\{9\}/9/;s/I\{8\}/8/;s/I\{7\}/7/;s/IIIIII/6/;s/IIIII/5/;s/IIII/4/;s/III/3/;s/II/2/;s/I/1/
p;q;}
$ s/.*/0/p"
If there are more than 9 columns, a more complex sed could be written, or the output could be passed through wc -c instead.

bash print first to nth column in a line iteratively

I am trying to get the column names of a file and print them iteratively. I guess the problem is with the print $i but I don't know how to correct it. The code I tried is:
#! /bin/bash
for i in {2..5}
do
set snp = head -n 1 smaller.txt | awk '{print $i}'
echo $snp
done
Example input file:
ID Name Age Sex State Ext
1 A 12 M UT 811
2 B 12 F UT 818
Desired output:
Name
Age
Sex
State
Ext
But the output I get is blank screen.
You'd better just read the first line of your file and store the result as an array:
read -a header < smaller.txt
and then printf the relevant fields:
printf "%s\n" "${header[#]:1}"
Moreover, this uses bash only, and involves no unnecessary loops.
Edit. To also answer your comment, you'll be able to loop through the header fields thus:
read -a header < smaller.txt
for snp in "${header[#]:1}"; do
echo "$snp"
done
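A self-contained run of the same approach (the sample file from the question is written to a temp path for the demo):

```shell
tmp=$(mktemp)
printf 'ID Name Age Sex State Ext\n1 A 12 M UT 811\n' > "$tmp"
read -a header < "$tmp"        # header=(ID Name Age Sex State Ext)
for snp in "${header[@]:1}"; do   # skip the first field (ID)
  echo "$snp"
done
# Name, Age, Sex, State, Ext — one per line
```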
Edit 2. Your original method had many many mistakes. Here's a corrected version of it (although what I wrote before is a much preferable way of solving your problem):
for i in {2..5}; do
snp=$(head -n 1 smaller.txt | awk "{print \$$i}")
echo "$snp"
done
set probably doesn't do what you think it does.
Because of the single quotes in awk '{print $i}', the $i never gets expanded by bash.
This algorithm is not good since you're calling head and awk 4 times, whereas you don't need a single external process.
Hope this helps!
You can print it using awk itself:
awk 'NR==1{for (i=2; i<=NF; i++) print $i}' smaller.txt
The main problem with your code is that your assignment syntax is wrong. Change this:
set snp = head -n 1 smaller.txt | awk '{print $i}'
to this:
snp=$(head -n 1 smaller.txt | awk '{print $i}')
That is:
Do not use set. set is for setting shell options, numbered parameters, and so on, not for assigning arbitrary variables.
Remove the spaces around =.
To run a command and capture its output as a string, use $(...) (or `...`, but $(...) is less error-prone).
That said, I agree with gniourf_gniourf's approach.
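For instance, here is one safe variant of that assignment, passing the index to awk via -v instead of shell-escaping the program (the sample file is written to a temp path for the demo):

```shell
tmp=$(mktemp)
printf 'ID Name Age Sex State Ext\n1 A 12 M UT 811\n' > "$tmp"
i=2
snp=$(head -n 1 "$tmp" | awk -v i="$i" '{print $i}')   # $i inside awk means "field number i"
echo "$snp"
# Name
```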
Here's another alternative; not necessarily better or worse than any of the others:
for n in $(head -n 1 smaller.txt)
do
echo "${n}"
done
Something like:
for x1 in $(head -n1 smaller.txt); do
echo "$x1"
done
