I have a .txt file with the following content:
a 3
a 4
a 5
a 6
b 1
b 3
b 5
c 9
c 10
I am wondering if there is any command (no awk if possible) that can read the .txt file and give the following output (Sorted by the second column):
c 19
a 18
b 9
You can use awk piped to sort:
awk '{sums[$1] += $2} END {for (i in sums) print i, sums[i]}' file | sort -rnk2
c 19
a 18
b 9
sums[$1] += $2 is adding value of $2 in an array sums that is indexed by field #1 ($1).
sort -rnk2 is reverse sorting numerically output of awk on field 2
Use can use this code:
cat 1.txt | awk '{arr[$1]+=$2}END{for (var in arr) print var," ",arr[var]}' | sort -rnk 2
Explanation:
cat 1.txt - read 1.txt file with content
awk - is a language very useful for data manipulation
{arr[$1]+=$2} for each line in content file increase array item with key first field with value of second field. Field separator by default is space.
END{for (var in arr) print var," ",arr[var]}' - after all line is proceeded, print array content
sort -rnk 2 - reverse numeric sort on field 2
Non-awk solutions.
perl
perl -lane '
$sum{$F[0]} += $F[1]
} END {
$, = " ";
print $_, $sum{$_} for reverse sort {$sum{$a} <=> $sum{$b}} keys %sum
' file.txt
bash version 4
declare -A sum
while read key val; do (( sum[$key] += $val )); done < file.txt
for key in "${!sum[#]}"; do echo "$key ${sum[$key]}"; done | sort -rn -k2
non-awk challenge accepted
vars=$(cut -d" " -f1 nums | uniq); paste <(echo "$vars") <(cat <(sed -e 's/ /+=/' nums) <(echo "$vars" | sed 's/$/;/') | bc) | sort -k2,2nr
c 19
a 18
b 9
Related
Im trying to get the columsums (exept for the first one) of a tab delimited containing numbers.
To find out the number of columns an store it in a variable I use:
cols=$(awk '{print NF}' file.txt | sort -nu | tail -n 1
next I want to calculate the sum of all numbers in that column and store this in a variable again in a for loop:
for c in 2:$col
do
num=$(cat file.txt | awk '{sum+$2 ; print $0} END{print sum}'| tail -n 1
done
this
num=$(cat file.txt | awk '{sum+$($c) ; print $0} END{print sum}'| tail -n 1
on itself with a fixed numer and without variable input works find but i cannot get it to accept the for-loop variable.
Thanks for the support
p.s. It would also be fine if i could sum all columns (expept the first one) at once without the loop-trouble.
Assuming you want the sums of the individual columns,
$ cat file
1 2 3 4
5 6 7 8
9 10 11 12
$ awk '
{for (i=2; i<=NF; i++) sum[i] += $i}
END {for (i=2; i<=NF; i++) printf "%d%s", sum[i], OFS; print ""}
' file
18 21 24
In case you're not bound to awk, there's a nice tool for "command-line statistical operations" on textual files called GNU datamash.
With datamash, summing (probably the simplest operation of all) a 2nd column is as easy as:
$ datamash sum 2 < table
9
Assuming the table file holds tab-separated data like:
$ cat table
1 2 3 4
2 3 4 5
3 4 5 6
To sum all columns from 2 to n use column ranges (available in datamash 1.2):
$ n=4
$ datamash sum 2-$n < table
9 12 15
To include headers, see the --headers-out option
I want to use awk/sed to deal with two files(a.txt and b.txt) below and get the result
cat a.txt
a UK
b Japan
c China
d Korea
e US
And cat b.txt results
c Russia
e Canada
The result that I want is as below:
a UK
b Japan
c Russia
d Korea
e Canada
With awk:
First fill aray/hash a with complete row ($0) and use first column ($1) from this row as index. Finally, print all elements of array/hash a with a loop.
awk '{a[$1]=$0} END{for(i in a) print a[i]}' file1 file2
Output:
a UK
b Japan
c Russia
d Korea
e Canada
try:
awk 'FNR==NR{A[$1]=$NF;next} {printf("%s %s\n",$1,$1 in A?A[$1]:$NF)}' b.txt a.txt
Checking here condition FNR==NR which will be TRUE only when first file(b.txt) is being read. Then creating an array named A whose index is $1 and have the value last column. Then using printf for printing 2 strings where first string is $1 and another is if $1 of a.txt is present in array A then print array A's value whose index is $1 else print last column of a.tzt itself.
EDIT: as OP had carriage characters into Input_files so please remove them by following too.
tr -d '\r' < b.txt > temp_b.txt && mv temp_b.txt b.txt
You can use the below one-liner:
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt ) | sed 's/ --$//' | awk -F ' -- ' '{print $NF}'
We use awk to prefix each line in b.txt with a key and -- to give us a split point later:
<( awk '{print $1, "--", $0, "--"}' < b.txt )
Use the join command to join the files on common keys. The -a 1 option tells the command to
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt )
Use sed to remove the -- parts that are on some end of lines:
sed 's/ --$//'
Use awk to print the last item on each line:
awk -F ' -- ' '{print $NF}'
$ awk 'NR==FNR{b[$1]=$2;next} {print $1, ($1 in b ? b[$1] : $2)}' b.txt a.txt
a UK
b Japan
c Russia
d Korea
e Canada
I want to sum all the numbers in a file (columns and lines) given by the first parameter, but my program shows sum=sum+$i instead of the numeric sum:
sum=0;
file=$1
for i in $file
do
sum=sum+$i;
done;
echo "The sum is: " $sum
Input file:
$cat file.txt
10 20 10
40
50
Expected output :
The sum is: 21
Maybe if there is an awk method to solve this?
Try this -
$cat file1.txt
10 20 10
40
50
$awk '{for(i=1;i<=NF;i++) {sum+=$i}} END {print sum}' file1.txt
130
OR
$xargs < file1.txt| tr ' ' + | bc
130
cat file.txt | xargs | sed -e 's/\ /+/g' | bc
You can also use a simple read and an array to sum the value relying on word splitting to separate the values into an array via the default IFS (Internal Field Separator), e.g.
#!/bin/bash
declare -i sum=0
fn="${1:-/dev/stdin}" ## read from file as 1st argument (default stdin)
while read -r line; do ## read each line
a=( $line ) ## separate values into array
for i in ${a[#]}; do ## for each value in array
((sum += i)) ## add to sum
done
done <"$fn"
echo "sum: $sum"
Example Input File
$ cat dat/numfile.txt
10 20 10
40
50
Example Use/Output
$ bash sumnumfile.sh dat/numfile.txt
sum: 130
Another for some awks (at least mawk and gawk):
$ awk -v RS="[^0-9]" '{s+=$1}END{print s}' file
130
Given a file with data like this (ie stores.dat file)
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
What is the command that would return the number of occurrences of the 't' character per line?
eg. would return:
count lineNum
4 1
3 2
6 3
Also, to do it by count of occurrences by field what is the command to return the following results?
eg. input of column 2 and character 't'
count lineNum
1 1
0 2
1 3
eg. input of column 3 and character 't'
count lineNum
2 1
1 2
4 3
To count occurrence of a character per line you can do:
awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4 1
3 2
6 3
To count occurrence of a character per field/column you can do:
column 2:
awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1 1
0 2
1 3
column 3:
awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2 1
1 2
4 3
gsub() function's return value is number of substitution made. So we use that to print the number.
NR holds the line number so we use it to print the line number.
For printing occurrences of particular field, we create a variable fld and put the field number we wish to extract counts from.
grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1
gives almost exactly the output you want:
4 1
3 2
6 3
Thanks to #raghav-bhushan for the grep -o hint, what a useful flag. The -n flag includes the line number as well.
To count occurences of a character per line:
$ awk -F 't' '{print NF-1, NR}' input.txt
4 1
3 2
6 3
this sets field separator to the character that needs to be counted, then uses the fact that number of fields is one greater than number of separators.
To count occurences in a particular column cut out that column first:
$ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
1 1
0 2
1 3
$ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
2 1
1 2
4 3
One possible solution using perl:
Content of script.pl:
use warnings;
use strict;
## Check arguments:
## 1.- Input file
## 2.- Char to search.
## 3.- (Optional) field to search. If blank, zero or bigger than number
## of columns, default to search char in all the line.
(#ARGV == 2 || #ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);
my ($char,$column);
## Get values or arguments.
if ( #ARGV == 3 ) {
($char, $column) = splice #ARGV, -2;
} else {
$char = pop #ARGV;
$column = 0;
}
## Check that $char must be a non-white space character and $column
## only accept numbers.
die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/;
print qq[count\tlineNum\n];
while ( <> ) {
## Remove last '\n'
chomp;
## Get fields.
my #f = split /\|/;
## If column is a valid one, select it to the search.
if ( $column > 0 and $column <= scalar #f ) {
$_ = $f[ $column - 1];
}
## Count.
my $count = eval qq[tr/$char/$char/];
## Print result.
printf qq[%d\t%d\n], $count, $.;
}
The script accepts three parameters:
Input file
Char to search
Column to search: If column is a bad digit, it searchs all the line.
Running the script without arguments:
perl script.pl
Usage: perl script.pl input-file char [column]
With arguments and its output:
Here 0 is a bad column, it searches all the line.
perl script.pl stores.dat 't' 0
count lineNum
4 1
3 2
6 3
Here it searches in column 1.
perl script.pl stores.dat 't' 1
count lineNum
0 1
2 2
0 3
Here it searches in column 3.
perl script.pl stores.dat 't' 3
count lineNum
2 1
1 2
4 3
th is not a char.
perl script.pl stores.dat 'th' 3
Bad input
No need for awk or perl, only with bash and standard Unix utilities:
cat file | tr -c -d "t\n" | cat -n |
{ echo "count lineNum"
while read num data; do
test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
done; }
And for a particular column:
cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n |
{ echo -e "count lineNum"
while read num data; do
test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
done; }
And we can even avoid tr and the cats:
echo "count lineNum"
num=1
while read data; do
new_data=${data//t/}
count=$((${#data}-${#new_data}))
test $count -gt 0 && printf "%4d %5d\n" $count $num
num=$(($num+1))
done < file
and event the cut:
echo "count lineNum"
num=1; OLF_IFS=$IFS; IFS="|"
while read -a array_data; do
data=${array_data[1]}
new_data=${data//t/}
count=$((${#data}-${#new_data}))
test $count -gt 0 && printf "%4d %5d\n" $count $num
num=$(($num+1))
done < file
IFS=$OLF_IFS
awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat
The call to gsub() deletes everything in the line that is not a t, then just print the length of what remains, and the current line number.
Want to do it just for column 2?
awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat
You could also split the line or field with "t" and check the length of the resulting array - 1. Set the col variable to 0 for the line or 1 through 3 for columns:
awk -F'|' -v col=0 -v OFS=$'\t' 'BEGIN {
print "count", "lineNum"
}{
split($col, a, "t"); print length(a) - 1, NR
}
' stores.dat
$ cat -n test.txt
1 test 1
2 you want
3 void
4 you don't want
5 ttttttttttt
6 t t t t t t
$ awk '{n=split($0,c,"t")-1;if (n!=0) print n,NR}' test.txt
2 1
1 2
2 4
11 5
6 6
cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' | awk 'BEGIN {FS = "\t"}; {print NF}'
Where $1 would be a column number you want to count.
perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat
Another perl answer yay! The tr/t// function returns the count of the number of times the translation occurred on that line, in other words the number of times tr found the character 't'. ++$x maintains the line number count.
i had made a small bash script in order to get the frequency of items in a certain column of a file.
The output would be sth like this
A 30
B 25
C 20
D 15
E 10
the command i used inside the script is like this
cut -f $1 $2| sort | uniq -c |
sort -r -k1,1 -n | awk '{printf "%-20s %-15d\n", $2,$1}'
how can i modify it to show the relative percentages for each case as well. so it would be like
A 30 30%
B 25 25%
C 20 20%
D 15 15%
E 10 10%
Try this (with the sort moved to the end:
cut -f $1 $2| sort | uniq -c | awk '{array[$2]=$1; sum+=$1} END { for (i in array) printf "%-20s %-15d %6.2f%%\n", i, array[i], array[i]/sum*100}' | sort -r -k2,2 -n
Change your awk command to something like this:
awk '{ a[++n,1] = $2; a[n,2] = $1; t += $1 }
END {
for (i = 1; i <= n; i++)
printf "%-20s %-15d%d%%\n", a[i,1], a[i,2], 100 * a[i,2] / t
}'