How to multiply two tables in bash on Linux

I have two data files like this:
file1:
a1 a2 a3 ... aN
b1 b2 b3 ... bN
.
.
.
file2:
A1 A2 A3 ... AN
B1 B2 B3 ... BN
.
.
.
I want to multiply the two tables, i.e.,
a1*A1 a2*A2 a3*A3 ... aN*AN
b1*B1 b2*B2 b3*B3 ... bN*BN
.
.
.
Can I do it with AWK or something else in BASH? Thanks a lot!

Here's one way using GNU awk, assuming you have the same number of fields and rows in each file. Run like:
awk -f script.awk file1 file2
Contents of script.awk:
# First file: store every field, indexed by row and column
FNR==NR {
    for (i=1; i<=NF; i++) {
        a[NR][i] = $i
    }
    next
}
# Second file: multiply each field by the stored value;
# the trailing 1 prints the modified record
{
    for (j=1; j<=NF; j++) {
        $j = $j * a[FNR][j]
    }
}1
Alternatively, here's the one-liner:
awk 'FNR==NR { for(i=1;i<=NF;i++) a[NR][i]=$i; next } { for(j=1;j<=NF;j++) $j = $j * a[FNR][j] }1' file1 file2
Testing:
Contents of file1:
1 2 3
2 4 6
Contents of file2:
3 4 5
6 7 8
Results:
3 8 15
12 28 48
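Note that the a[NR][i] syntax is a true multidimensional array, which requires GNU awk 4.0 or later. As a sketch for other awks, the same logic can be written with classic SUBSEP indexing:
awk 'FNR==NR { for (i=1; i<=NF; i++) a[FNR,i] = $i; next }
     { for (j=1; j<=NF; j++) $j = $j * a[FNR,j] } 1' file1 file2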
EDIT:
If, and I mean if, there could be extra fields that one file has that the other doesn't, change:
$j = $j * a[FNR][j]
to:
$j = (a[FNR][j] ? $j * a[FNR][j] : $j)
This will print the existing value and not zero. HTH.

Related

Combine all the columns of two files using bash

I have two files
A B C D E F
B D F A C E
D E F A B C
and
1 2 3 4 5 6
2 4 6 1 3 5
4 5 6 1 2 3
I want to have something like this:
A1 B2 C3 D4 E5 F6
B2 D4 F6 A1 C3 E5
D4 E5 F6 A1 B2 C3
I mean, combine both files by pasting the content of corresponding columns.
Thank you very much!
Here's a bash solution:
paste -d' ' file1 file2 \
| while read -a fields ; do
    (( width = ${#fields[@]} / 2 ))
    for ((i=0; i<width; ++i)) ; do
        printf '%s%s ' "${fields[i]}" "${fields[i + width]}"
    done
    printf '\n'
done
paste outputs the files side by side.
read -a reads the columns into an array.
In the for loop, we iterate over the first half of the array and print each element paired with its counterpart from the second half.
Could you please try the following; it has some fun with a combination of xargs + paste:
xargs -n6 < <(paste -d'\0' <(xargs -n1 < Input_file1) <(xargs -n1 < Input_file2))
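To see what each stage produces on the sample data (note that -n6 hardcodes the column count, so adjust it for a different table width):
$ xargs -n1 < Input_file1 | head -3    # one field per line
A
B
C
$ paste -d'\0' <(xargs -n1 < Input_file1) <(xargs -n1 < Input_file2) | head -3
A1
B2
C3
The final xargs -n6 then regroups the pasted fields, six to a line.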

In Linux bash, reverse the order of lines in a file, but in blocks of 3 lines each

I would like to reverse a file; however, in this file the records are 3 lines each:
a1
a2
a3
...
x1
x2
x3
and I would like to get such file
x1
x2
x3
...
a1
a2
a3
I use Linux so tail -r doesn't work for me.
You can do this all in awk, using an associative array:
BEGIN { j=1 }
++i>3 { i=1; ++j }
{ a[j,i]=$0 }
END{ for(m=j;m>0;--m)
for(n=1;n<=3;++n) print a[m,n]
}
Run it like this:
awk -f script.awk file.txt
or of course, if you prefer a one-liner, you can use this:
awk 'BEGIN{j=1}++i>3{i=1;++j}{a[j,i]=$0}END{for(m=j;m>0;--m)for(n=1;n<=3;++n)print a[m,n]}' file.txt
Explanation
This uses two counters: i which runs from 1 to 3 and j, which counts the number of groups of 3 lines. All lines are stored in the associative array a and printed in reverse in the END block.
Testing it out
$ cat file
a1
a2
a3
b1
b2
b3
x1
x2
x3
$ awk 'BEGIN{j=1}++i>3{i=1;++j}{a[j,i]=$0}END{for(m=j;m>0;--m)for(n=1;n<=3;++n)print a[m,n]}' file
x1
x2
x3
b1
b2
b3
a1
a2
a3
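As an aside, if the line count is a multiple of 3 and the lines themselves contain no spaces (as in the sample), here is a sketch of the same block reversal without storing the whole file in awk: join each group of 3 lines into one with paste, reverse the joined lines with tac, then split them back with tr:
paste -d' ' - - - < file.txt | tac | tr ' ' '\n'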
This is so ugly that I'm kinda ashamed to even post it... so I guess I'll delete it as soon as a more decent answer pops up.
tac /path/to/file | awk '{ a[(NR-1)%3]=$0; if (NR%3==0) { print a[2] "\n" a[1] "\n" a[0] }}'
tac reverses all the lines (which also reverses the order within each 3-line block), and the awk buffers each block of 3 and prints it back in its original internal order.
With the file:
~$ cat f
1
2
3
4
5
6
7
8
9
With awk: build each block in reverse by prepending every line to a, print a on each third line, and reset a at the start of the next block:
~$ awk '{a=$0"\n"a}NR%3==0{print a}NR%3==1{a=$0}' f
3
2
1
6
5
4
9
8
7
then use tac to reverse the line order again, which puts the blocks last-to-first while restoring the order within each block:
~$ awk '{a=$0"\n"a}NR%3==0{print a}NR%3==1{a=$0}' f | tac
7
8
9
4
5
6
1
2
3
Another way in awk
awk '{a[i]=a[i+=(NR%3==1)]?a[i]"\n"$0:$0}END{for(i=NR/3;i>0;i--)print a[i]}' file
Here i is incremented on the first line of each 3-line block, every line is appended to a[i], and the END loop prints the NR/3 blocks in reverse order.
Input
a1
a2
a3
x1
x2
x3
b1
b2
b3
Output
b1
b2
b3
x1
x2
x3
a1
a2
a3
Here's a pure Bash (Bash≥4) possibility that should be okay for files that are not too large.
We also assume that the number of lines in your file is a multiple of 3.
mapfile -t ary < /path/to/file
for (( i = 3*(${#ary[@]}/3 - 1); i >= 0; i -= 3 )); do
    printf '%s\n' "${ary[@]:i:3}"
done
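For instance, with the 9-line test file used earlier (a1..a3, b1..b3, x1..x3):
$ mapfile -t ary < file
$ for (( i = 3*(${#ary[@]}/3 - 1); i >= 0; i -= 3 )); do printf '%s\n' "${ary[@]:i:3}"; done
x1
x2
x3
b1
b2
b3
a1
a2
a3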

count using awk commands

I have fileA.txt and a few lines of it are shown below:
AA
BB
CC
DD
EE
And I have fileB.txt, which has text like that shown below:
Group col2 col3 col4
1 pp 4567 AA,BC,AB
1 qp 3428 AA
2 pp 3892 AA
3 ee 28399 AA
4 dd 3829 BB,CC
1 dd 27819 BB
5 ak 29938 CC
For every line in fileA.txt, it should count the number of times that value is present in fileB.txt, counting it at most once per group (column 1 of fileB.txt).
Sample output should look like:
AA 3
BB 2
CC 2
AA is present 4 times, but it is present twice in group "1". If a value is present more than once in the same group (column 1), it should be counted only once; therefore in the above output the AA count is 3.
Any help using awk or any other one-liners?
Here is an awk one-liner that should work:
awk '
NR==FNR && !seen[$4,$1]++{count[$4]++;next}
($1 in count){print $1,count[$1]}' fileB.txt fileA.txt
Explanation:
The NR==FNR && !seen[$4,$1]++ pattern is true only the first time a given (column 4, column 1) pair is seen while reading fileB.txt, i.e. the first occurrence of a value within a group; duplicate occurrences in the same group don't increment the counter.
$1 in count looks up column 1 of fileA.txt in the array; if it is present, we print it along with its count.
Output:
$ awk 'NR==FNR && !seen[$4,$1]++{count[$4]++;next}($1 in count){print $1,count[$1]}' fileB.txt fileA.txt
AA 3
BB 2
CC 1
Update based on the modified question (split the fourth column on commas, then apply the same per-group dedup):
awk '
NR==FNR {
n = split($4,tmp,/,/);
for(x = 1; x <= n; x++) {
if(!seen[$1,tmp[x]]++) {
count[tmp[x]]++
}
}
next
}
($1 in count) {
print $1, count[$1]
}' fileB.txt fileA.txt
Outputs:
AA 3
BB 2
CC 2
Pure bash (4.0 or newer):
#!/bin/bash
declare -A items=()
# read in the list of items to track
while read -r; do items[$REPLY]=0; done <fileA.txt
# read fourth column from fileB and increment for each match
while read -r _ _ _ item _; do
[[ ${items[$item]} ]] || continue # skip unrecognized values
items[$item]=$(( items[$item] + 1 )) # otherwise, increment
done <fileB.txt
# print output
for key in "${!items[@]}"; do # iterate over keys
value="${items[$key]}" # look up values
printf '%s\t%s\n' "$key" "$value" # print them together
done
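If you also need the comma-splitting from the updated question, here is a minimal sketch extending the same pure-bash approach; it splits the fourth column on commas and uses a second associative array to count each value at most once per group:
#!/bin/bash
declare -A items=() seen=()
# read in the list of items to track
while read -r; do items[$REPLY]=0; done <fileA.txt
# split the fourth column on commas; count once per (group, item) pair
while read -r group _ _ col4 _; do
    IFS=, read -ra vals <<< "$col4"
    for item in "${vals[@]}"; do
        [[ ${items[$item]} ]] || continue       # skip unrecognized values
        [[ ${seen[$group,$item]} ]] && continue # already counted in this group
        seen[$group,$item]=1
        (( items[$item]++ ))                    # otherwise, increment
    done
done <fileB.txt
# print output
for key in "${!items[@]}"; do
    printf '%s\t%s\n' "$key" "${items[$key]}"
done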
A simple awk one-liner.
awk 'NR>FNR{if($0 in a)print $0,a[$0];next}!a[$4,$1]++{a[$4]++}' fileB.txt fileA.txt
Note the order of the files: fileB.txt must be read first, so that the counts already exist when fileA.txt is scanned.

awk: sum up multiple files, and show lines which do not appear in both sets of files

I have been using awk to sum up multiple files; this is used to sum the summary values from server-log parsing. It really does speed up the final overall count, but I have hit a minor problem, and the typical examples I have found on the web have not helped.
Here is the example:
cat file1
aa 1
bb 2
cc 3
ee 4
cat file2
aa 1
bb 2
cc 3
dd 4
cat file3
aa 1
bb 2
cc 3
ff 4
And the script:
cat test.sh
#!/bin/bash
files="file1 file2 file3"
i=0;
oldname="";
for names in $(echo $files); do
((i++));
if [ $i == 1 ]; then
oldname=$names
#echo "-- $i $names"
shift;
else
oldname1=$names.$$
awk 'NR==FNR { _[$1]=$2 } NR!=FNR { if(_[$1] != "") nn=0; nn=($2+_[$1]); print $1" "nn }' $names $oldname> $oldname1
if [ $i -gt 2 ]; then
rm $oldname;
fi
oldname=$oldname1
fi
done
echo "------------------------------ $i"
cat $oldname
When I run this, the matching keys are added up, but those that appear in only one of the files do not show up:
./test.sh
------------------------------ 3
aa 3
bb 6
cc 9
ee 4
ff and dd do not appear in the list; from what I have seen, the problem is within the NR==FNR block.
I have come across this:
http://dbaspot.com/shell/246751-awk-comparing-two-files-problem.html
If you want all the lines in file1 that are not in file2:
awk 'NR == FNR { a[$0]; next } !($0 in a)' file2 file1
If you want only the unique lines in file1 that are not in file2:
awk 'NR == FNR { a[$0]; next } !($0 in a) { print; a[$0] }' file2 file1
but this only complicates the current issue further when attempted, since lots of other fields get duplicated.
Updates after posting the question, with more tests:
I wanted to stick with awk, since it does appear to be a much shorter way of achieving the result, but there is still a problem:
awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3
aa 3
bb 6
cc 9
ee 4
ff 4
gg 4
RESULT_SET_4 0
RESULT_SET_3 0
RESULT_SET_2 0
RESULT_SET_1 0
$ cat file1
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4
$ cat file2
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4
The file content is not left as it was originally, i.e., the results are no longer under their headings; my original method did keep it all intact.
Updated expected output, with the headings in the correct context:
cat file1
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4
cat file2
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4
cat file3
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
gg 4
The awk line in test.sh that produces the above is:
awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if ($2 ~ /[0-9]/) { nn=($2+_[$1]); print $1" "nn; } else { print;} }else { print; } }' $names $oldname> $oldname1
./test.sh
------------------------------ 3
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ff 4
The following works for the sums, but destroys the required formatting:
awk '($2 != "") {a[$1]+=$2}; ($2 == "") { a[$1]=$2 } END {for (k in a) print k,a[k]} ' file1 file2 file3
aa 3
bb 6
cc 9
ee 4
ff 4
gg 4
RESULT_SET_4
RESULT_SET_3
RESULT_SET_2
RESULT_SET_1
$ awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort
aa 3
bb 6
cc 9
dd 4
ee 4
ff 4
Edit:
It's a bit of a hack but it does the job:
$ awk 'FNR==NR&&!/RESULT/{a[$1]=$2;next}($1 in a){a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort | awk '$1="RESULTS_SET_"NR"\n"$1'
RESULTS_SET_1
aa 3
RESULTS_SET_2
bb 6
RESULTS_SET_3
cc 9
RESULTS_SET_4
ff 4
You can do this in awk, as sudo_O suggested, but you can also do it in pure bash.
#!/bin/bash
# We'll use an associative array, where the indexes are strings.
declare -A a
# Our list of files, in an array (not associative)
files=(file1 file2 file3)
# Walk through array of files...
for file in "${files[@]}"; do
# And for each file, increment the array index with the value.
while read index value; do
((a[$index]+=$value))
done < "$file"
done
# Walk through array. ${!...} returns a list of indexes.
for i in "${!a[@]}"; do
echo "$i ${a[$i]}"
done
And the result:
$ ./doit
dd 4
aa 3
ee 4
bb 6
ff 4
cc 9
And if you want the output sorted ... you can pipe it through sort. :)
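For example:
$ ./doit | sort
aa 3
bb 6
cc 9
dd 4
ee 4
ff 4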
Here's one way using GNU awk. Run like:
awk -f script.awk File1 File2 File3
Contents of script.awk:
# On a heading line, strip the prefix and remember the set number
sub(/RESULT_SET_/,"") {
    i = $1
    next
}
# Otherwise, accumulate the value under the current set and key
{
    a[i][$1] += $2
}
END {
    for (j=1; j<=length(a); j++) {
        print "RESULT_SET_" j
        for (k in a[j]) {
            print k, a[j][k]
        }
    }
}
Results:
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ee 4
ff 4
gg 4
Alternatively, here's the one-liner:
awk 'sub(/RESULT_SET_/,"") { i = $1; next } { a[i][$1]+=$2 } END { for (j=1;j<=length(a);j++) { print "RESULT_SET_" j; for (k in a[j]) print k, a[j][k] } }' File1 File2 File3
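As with the earlier answer, the a[i][$1] arrays need GNU awk 4+. Here is a sketch of the same idea for a POSIX awk, tracking each set's keys by hand so the grouping survives (it assumes the sets are numbered 1..max):
awk 'sub(/RESULT_SET_/,"") { i = $1; if (i > max) max = i; next }
     !((i,$1) in a) { keys[i] = keys[i] " " $1 }
     { a[i,$1] += $2 }
     END {
         for (j = 1; j <= max; j++) {
             print "RESULT_SET_" j
             n = split(keys[j], k, " ")
             for (m = 1; m <= n; m++) print k[m], a[j, k[m]]
         }
     }' File1 File2 File3
One difference: the keys within each set are printed in first-seen order rather than the unspecified order of for (k in a[j]).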
Fixed using the following.
Basically, it goes through each file; if an entry exists in the new file but not in the running total, it inserts that entry (with a 0 value) at the approximate line number so the subsequent sum keeps it. I have been testing this on my current output and it seems to be working really well.
#!/bin/bash
files="file1 file2 file3 file4 file5 file6 file7 file8"
RAND="$$"
i=0;
oldname="";
for names in $(echo $files); do
((i++));
if [ $i == 1 ]; then
oldname=$names
shift;
else
oldname1=$names.$RAND
for entries in $(awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] == "") { if ($2 ~ /[0-9]/) { nn=0; nn=(_[$1]+=$2); print FNR"-"$1"%0"} else { } } else { } }' $oldname $names); do
line=$(echo ${entries%%-*})
content=$(echo ${entries#*-})
content=$(echo $content|tr "%" " ")
edit=$(ed -s $oldname << EOF
$line
a
$content
.
w
q
EOF
)
$edit >/dev/null 2>&1
done
awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if ($2 ~ /[0-9]/) { nn=0; nn=($2+_[$1]); print $1" "nn; } else { print $1;} }else { print; } }' $names $oldname> $oldname1
oldname=$oldname1
fi
done
cat $oldname
#rm file?.*

Linux: Comma Separated Cells to Rows, Preserve/Aggregate Column

There was a similar question here, but for Excel/VBA: Excel Macro - Comma Separated Cells to Rows Preserve/Aggregate Column.
Because I have a big file (>300 MB) that is not an option, so I am struggling to get it to work in bash.
Based on this data
1 Cat1 a,b,c
2 Cat2 d
3 Cat3 e
4 Cat4 f,g
I would like to convert it to:
1 Cat1 a
1 Cat1 b
1 Cat1 c
2 Cat2 d
3 Cat3 e
4 Cat4 f
4 Cat4 g
cat > data << EOF
1 Cat1 a,b,c
2 Cat2 d
3 Cat3 e
4 Cat4 f,g
EOF
set -f                                # turn off globbing
IFS=,                                 # prepare for comma-separated data
while IFS=$'\t' read C1 C2 C3; do     # split columns at tabs
    for X in $C3; do                  # split C3 at commas (due to IFS)
        printf '%s\t%s\t%s\n' "$C1" "$C2" "$X"
    done
done < data
Note that the columns are assumed to be tab-separated; afterwards you may want to restore the defaults with set +f and unset IFS.
This looks like a job for awk or perl.
awk 'BEGIN { FS = OFS = "\t" }
{ split($3, a, ",");
for (i in a) {$3 = a[i]; print} }'
perl -F'\t' -alne 'foreach (split ",", $F[2]) {
  $F[2] = $_; print join("\t", @F)
}'
Both programs are based on the same algorithm: split the third column at commas, and iterate over the components, printing the original line with each component in the third column in turn.
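One caveat for the awk version: for (i in a) iterates in an unspecified order, so the components of the third column may not come out in their original order on every awk. A sketch that preserves the order by using the count returned by split:
awk 'BEGIN { FS = OFS = "\t" }
     { n = split($3, a, ",")
       for (i = 1; i <= n; i++) { $3 = a[i]; print } }'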
