How to count all numbers in a file with awk? - linux

I want to count all numbers that are in a file.
Example:
input -> Hi, this is 25 ...
input -> Lalala 21 or 29 what is ... 79?
The output should be the sum of all numbers: 154 (that is, 25+21+29+79).

From this beautiful answer by hek2mgl on how to extract the biggest number in a file, let's catch all the numbers in the file and sum them:
$ awk '{for(i=1;i<=NF;i++){sum+=$i}}END{print sum}' RS='$' FPAT='-{0,1}[0-9]+' file
154
This sets the record separator in a way that the whole block of text is a unique record. Then, it sets FPAT so that every single number (positive or negative) is a different field:
FPAT #
A regular expression (as a string) that tells gawk to create the
fields based on text that matches the regular expression. Assigning a
value to FPAT overrides the use of FS and FIELDWIDTHS for field
splitting.

$ cat data
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep -oP '\b\d+\b' data | paste -s -d '+' | bc
154

With grep and awk :
$ cat test.txt
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep '[0-9]\+' -o test.txt | awk '{ sum+=$1} END {print sum}'
154

Related

How to use awk '{print $1*Number}' from the second line or telling him to ignore NaN values?

I have a file called 'waterproofposters.jsonl' with this type of output:
Regular price
100
200
300
400
500
And I need to take out 2% of each value. I have used the following code:
awk '{print $1*0.98}' waterproofposters.jsonl
And then I have the following output:
0
98
196
294
392
490
And then I'm stuck because I need to have 'Regular price' in the first line instead '0'
I thought to replace '0' with 'Regular price using
find . -name "waterproof.jsonl" | xargs sed -i -e 's/0/Regular price/g'
But it will replace all the '0' by 'Regular price'
To print the first line as-is:
awk '{print (NR>1 ? $0*0.98 : $0)}'
To print lines that are not a number as-is:
awk '{print ($0+0 == $0 ? $0*0.98 : $0)}'
I'm using $0 instead of $1 in the multiplication because:
They're the same thing in your numerical input, and
I aesthetically prefer using the same value across the whole script rather than different values for the numeric vs non-numeric lines, and
When you use a specific field it causes awk to do field-splitting so it's a bit more efficient to not reference a field when the whole record will do.
Here's both of the above working with the posted sample input:
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
and here's the difference between the two given input that has a non-numeric value mid input file:
$ cat file
Regular price
100
200
foobar
400
500
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
0
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
foobar
392
490
You can certainly achieve what you need with a single awk call, but an answer to why your sed -i -e 's/0/Regular price/g' command did not work as expected is that you used 0 as the regex pattern. 0 matches any zero char inside the string.
You want to replace 0s that are the only char on a line.
Hence, you need to use ^ and $ anchors to match the start and end of the line respectively:
sed -i 's/^0$/Regular price/'
If you need to replace on the first line only add the 1 address before the substitution command:
sed -i '1 s/^0$/Regular price/'
Note you do not need g, since you only expect one replacement per line and g is only needed when performing multiple replacements on a line. By default, all lines will get processed.
How to use awk '{print $1Number}' from the second line or telling him to ignore NaN values?*
I would do it following way using GNU AWK, let file.txt content be
Regular price
100
200
300
400
500
then
awk 'NR==1{print}NR>=2{print $1*0.98}' file.txt
output
Regular price
98
196
294
392
490
Explanation: if it 1st line just print it, if it 2nd or later line print 0.98 of 1st column value
(tested in GNU Awk 5.0.1)

Print out only last 4 digits of mac addresses from 2nd column using awk in linux

I have made a shell script for getting the list of mac address using awk and arp-scan command. I want to strip the mac address to only last 4 digits i.e (i want to print only the letters yy)
ac:1e:04:0e:yy:yy
ax:8d:5c:27:yy:yy
ax:ee:fb:55:yy:yy
dx:37:42:c9:yy:yy
cx:bf:9c:a4:yy:yy
Try cut -d: -f5-
(Options meaning: delimiter : and fields 5 and up.)
EDIT: Or in awk, as you requested:
awk -F: '{ print $5 ":" $6 }'
here are a few
line=cx:bf:9c:a4:yy:yy
echo ${line:(-5)}
line=cx:bf:9c:a4:yy:yy
echo $line | cut -d":" -f5-
I imagine you want to strip the trailing spaces, but it isn't clear whether you want yy:yy or yyyy.
Anyhow, there are multiple ways to it but you already are running AWK and have the MAC in $2.
In the first case it would be:
awk '{match($2,/([^:]{2}:[^:]{2}) *$/,m); print m[0]}'
yy:yy
In the second (no colon :):
awk 'match($2,/([^:]{2}):([^:]{2}) *$/,m); print m[1]m[2]}'
yyyy
In case you don't have match available in your AWK, you'd need to resort to gensub.
awk '{print gensub(/.*([^:]{2}:[^:]{2}) *$/,"\\1","g",$2)}'
yy:yy
or:
awk '{print gensub(/.*([^:]{2}):([^:]{2}) *$/,"\\1\\2","g",$0)}'
yyyy
Edit:
I now realized the trailing spaces were added by anubhava in his edit; they were not present in the original question! You can then simply keep the last n characters:
awk '{print substr($2,13,5)}'
yy:yy
or:
awk '{print substr($2,13,2)substr($2,16,2)}'
yyyy
Taking into account that the mac address always is 6 octets, you probably could just do something like this to get the last 2 octets:
awk '{print substr($0,13)}' input.txt
While testing on the fly by using arp -an I notice that the output was not always printing the mac addresses in some cases it was returning something like:
(169.254.113.54) at (incomplete) on en4 [ethernet]
Therefore probably is better to filter the input to guarantee a mac address, this can be done by applying this regex:
^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
Applying the regex in awk and only printing the 2 last octecs:
arp -an | awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) print substr($4,13)}'
This will filter the column $4 and verify that is a valid MAC address, then it uses substr to just return the last "letters"
You could also split by : and print the output in multiple ways, for example:
awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) split($4,a,":"); print a[5] ":" a[6]}
Notice the exp ~ /regexp/
This is true if the expression exp (taken as a string) is matched by regexp.
The following example matches, or selects, all input records with the upper-case letter `J' somewhere in the first field:
$ awk '$1 ~ /J/' inventory-shipped
-| Jan 13 25 15 115
-| Jun 31 42 75 492
-| Jul 24 34 67 436
-| Jan 21 36 64 620
So does this:
awk '{ if ($1 ~ /J/) print }' inventory-shipped

Sum all the numbers in a file given by positional parameter

I want to sum all the numbers in a file (columns and lines) given by the first parameter, but my program shows sum=sum+$i instead of the numeric sum:
sum=0;
file=$1
for i in $file
do
sum=sum+$i;
done;
echo "The sum is: " $sum
Input file:
$cat file.txt
10 20 10
40
50
Expected output :
The sum is: 21
Maybe if there is an awk method to solve this?
Try this -
$cat file1.txt
10 20 10
40
50
$awk '{for(i=1;i<=NF;i++) {sum+=$i}} END {print sum}' file1.txt
130
OR
$xargs < file1.txt| tr ' ' + | bc
130
cat file.txt | xargs | sed -e 's/\ /+/g' | bc
You can also use a simple read and an array to sum the value relying on word splitting to separate the values into an array via the default IFS (Internal Field Separator), e.g.
#!/bin/bash
declare -i sum=0
fn="${1:-/dev/stdin}" ## read from file as 1st argument (default stdin)
while read -r line; do ## read each line
a=( $line ) ## separate values into array
for i in ${a[#]}; do ## for each value in array
((sum += i)) ## add to sum
done
done <"$fn"
echo "sum: $sum"
Example Input File
$ cat dat/numfile.txt
10 20 10
40
50
Example Use/Output
$ bash sumnumfile.sh dat/numfile.txt
sum: 130
Another for some awks (at least mawk and gawk):
$ awk -v RS="[^0-9]" '{s+=$1}END{print s}' file
130

how to use shell to split string into correct format?

I have this file with time duration. Some have days but mostly in hh:mm form. The entire form is dd+hh:mm
I was trying to "tr -s '+:' ':'" them into dd:hh:mm form and then split($1,tm,":")calculate them into seconds.
However, the problem I am facing is that after this operation, the form with hh:mm would have hh in tm[1] but if its dd:hh:mm then the tm[1] would be dd.
Is there a way to put the hh in form of hh:mm into tm[2] and put tm[1] to be 0 Please?
4+11:26
10+06:54
20:27
is the input
the output I wanted would be(in form of tm[1], tm[2], tm[3]):
4 11 26
10 06 54
0 20 27
I would first preprocess it with sed (to add missing 0+ in lines that don't have a plus sign) and then tr +: to spaces:
cat a.txt | sed 's/^\([^+]\+\)$/0+\1/g' | tr '+:' ' '
Or as suggested by Lars, shorter sed version:
cat a.txt | sed '/+/! s/^/0+/;' | tr '+:' ' '
awk to the rescue!
You can do the conversion and computation in awk, using your input file the values are converted to minutes
$ awk -F: '{if($1~/+/){split($1,f,"+");h=f[1]*24+f[2]}
else h=$1; m=h*60+$2; print $0 " --> " m}' file
4+11:26 --> 6446
10+06:54 --> 14814
20:27 --> 1227

Match specific column with grep command

I am having trouble matching specific column with grep command. I have a test file (test.txt) like this..
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 841 A 6 ,$,.,,. BJJJJE
Bra001325 866 C 2 ,. HJ
And i want to extract all those lines which has a number 866 in the second column. When i use grep command i am getting all the lines that contains the number that number
grep "866" test.txt
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 866 C 2 ,. HJ
How can i match specific column with grep command?
Try doing this :
$ awk '$2 == 866' test.txt
No need to add {print}, the default behaviour of awk is to print on a true condition.
with grep :
$ grep -P '^\S+\s+866\b' *
But awk can print filenames too & is quite more robust than grep here :
$ awk '$2 == 866{print FILENAME":"$0; nextfile}' *
In my case, the field separator is not space but comma. So I would have to add this, otherwise it won't work for me (On ubuntu 18.04.1).
awk -F ', ' '$2 == 866' test.txt

Resources