How to use awk '{print $1*Number}' from the second line onward, or tell it to ignore NaN values? - linux

I have a file called 'waterproofposters.jsonl' with this type of output:
Regular price
100
200
300
400
500
And I need to take out 2% of each value. I have used the following code:
awk '{print $1*0.98}' waterproofposters.jsonl
And then I have the following output:
0
98
196
294
392
490
And then I'm stuck, because I need 'Regular price' on the first line instead of '0'.
I thought of replacing '0' with 'Regular price' using:
find . -name "waterproof.jsonl" | xargs sed -i -e 's/0/Regular price/g'
But that replaces every '0' with 'Regular price'.

To print the first line as-is:
awk '{print (NR>1 ? $0*0.98 : $0)}'
To print lines that are not a number as-is:
awk '{print ($0+0 == $0 ? $0*0.98 : $0)}'
I'm using $0 instead of $1 in the multiplication because:
They're the same thing in your numerical input, and
I aesthetically prefer using the same value across the whole script rather than different values for the numeric vs non-numeric lines, and
When you use a specific field it causes awk to do field-splitting so it's a bit more efficient to not reference a field when the whole record will do.
Here's both of the above working with the posted sample input:
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
And here's the difference between the two, given input that has a non-numeric value in the middle of the file:
$ cat file
Regular price
100
200
foobar
400
500
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
0
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
foobar
392
490
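If the multiplier changes from run to run, it can be passed in with -v instead of being hard-coded. A minimal sketch, assuming a hypothetical shell function name (scale_numeric) and awk variable name (factor), neither of which appears in the answer above:

```shell
# scale_numeric FACTOR [FILE...]
# Hypothetical helper: multiplies numeric lines by FACTOR and
# prints every other line (headers, stray text) unchanged.
scale_numeric() {
    factor=$1
    shift
    awk -v factor="$factor" '{ print ($0+0 == $0 ? $0*factor : $0) }' "$@"
}

# Example: the header and "foobar" pass through, the numbers are scaled.
printf 'Regular price\n100\nfoobar\n500\n' | scale_numeric 0.98
```

With no file arguments the function reads standard input; with file arguments it behaves like the one-liners above.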

You can certainly achieve what you need with a single awk call, but an answer to why your sed -i -e 's/0/Regular price/g' command did not work as expected is that you used 0 as the regex pattern. 0 matches any zero char inside the string.
You want to replace 0s that are the only char on a line.
Hence, you need to use ^ and $ anchors to match the start and end of the line respectively:
sed -i 's/^0$/Regular price/'
If you need to replace on the first line only, add the 1 address before the substitution command:
sed -i '1 s/^0$/Regular price/'
Note you do not need g, since you only expect one replacement per line and g is only needed when performing multiple replacements on a line. By default, all lines will get processed.
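To see the effect of the anchors without touching any file, the same commands can be run on standard input (no -i). This is only a sketch; the function name fix_header is made up for the example:

```shell
# Hypothetical helper: restore the header on line 1 only, reading stdin.
fix_header() {
    sed '1 s/^0$/Regular price/'
}

# Unanchored pattern with g: every 0 is replaced, mangling the numbers.
printf '0\n98\n490\n' | sed 's/0/Regular price/g'

# Anchored pattern limited to line 1: only the lone 0 is replaced.
printf '0\n98\n490\n' | fix_header
```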

How to use awk '{print $1*Number}' from the second line onward, or tell it to ignore NaN values?
I would do it the following way using GNU AWK. Let the content of file.txt be
Regular price
100
200
300
400
500
then
awk 'NR==1{print}NR>=2{print $1*0.98}' file.txt
output
Regular price
98
196
294
392
490
Explanation: if it is the 1st line, just print it; if it is the 2nd or a later line, print 0.98 times the value in the 1st column.
(tested in GNU Awk 5.0.1)

Related

how to write awk code with specific condition

I want to write code that operates on a column of numbers: each negative number should be made positive by multiplying it by itself.
example
data
10
11
-12
-13
-14
expected output
10
11
144
169
196
This is what I've tried:
awk 'int($0)<0 {$4 = int($0) + 360}
END {print $4}' data.txt
But I don't get any output. Can anyone help me?
awk '$0 < 0 { $0 = $0 * $0 } 1' data.txt
The first condition multiplies the value by itself when it's negative. The condition 1 is always true, so the line is printed unconditionally.
Also:
awk '{print ($0<0 ? $0*$0 : $0)}' input
$ awk '{print $0 ^ (/-/ ? 2 : 1)}' file
10
11
144
169
196
You could also match only lines that consist of digits preceded by -, and in that case multiply the value by itself:
awk '{print (/^-[0-9]+$/ ? $0 * $0 : $0)}' data.txt
Output
10
11
144
169
196
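One difference between the variants above is worth checking on your own data: the /-/ test fires on any line containing a hyphen, whereas $0 < 0 compares the value. A small sketch, with a hypothetical function name (square_negatives) and a made-up date line added to the sample:

```shell
# Hypothetical helper: square negative numbers, print everything else as-is.
square_negatives() {
    awk '$0 < 0 { $0 = $0 * $0 } 1' "$@"
}

# The posted sample data.
printf '10\n11\n-12\n-13\n-14\n' | square_negatives

# A line that merely contains hyphens is left alone here, because it is
# not a number and compares as a string greater than "0".
printf '2020-01-01\n' | square_negatives
```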

Print out only last 4 digits of mac addresses from 2nd column using awk in linux

I have written a shell script that gets a list of MAC addresses using awk and the arp-scan command. I want to strip each MAC address down to its last 4 digits (i.e. I want to print only the yy parts):
ac:1e:04:0e:yy:yy
ax:8d:5c:27:yy:yy
ax:ee:fb:55:yy:yy
dx:37:42:c9:yy:yy
cx:bf:9c:a4:yy:yy
Try cut -d: -f5-
(Options meaning: delimiter : and fields 5 and up.)
EDIT: Or in awk, as you requested:
awk -F: '{ print $5 ":" $6 }'
here are a few
line=cx:bf:9c:a4:yy:yy
echo ${line:(-5)}
line=cx:bf:9c:a4:yy:yy
echo $line | cut -d":" -f5-
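Note that the ${line:(-5)} substring form is a bash/ksh/zsh feature and is not available in plain POSIX sh. A sketch that stays within POSIX parameter expansion, assuming the address always has six colon-separated groups (the function name mac_tail is hypothetical):

```shell
# Hypothetical helper: print the last two groups of a MAC address.
mac_tail() {
    # Strip the shortest prefix made of four "group:" pieces,
    # which leaves the fifth and sixth groups.
    printf '%s\n' "${1#*:*:*:*:}"
}

mac_tail cx:bf:9c:a4:yy:yy
```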
I imagine you want to strip the trailing spaces, but it isn't clear whether you want yy:yy or yyyy.
Anyhow, there are multiple ways to do it, but you are already running AWK and have the MAC in $2.
In the first case it would be:
awk '{match($2,/([^:]{2}:[^:]{2}) *$/,m); print m[0]}'
yy:yy
In the second (no colon :):
awk '{match($2,/([^:]{2}):([^:]{2}) *$/,m); print m[1]m[2]}'
yyyy
If you prefer not to use the three-argument form of match, you can use gensub instead (both are GNU awk extensions):
awk '{print gensub(/.*([^:]{2}:[^:]{2}) *$/,"\\1","g",$2)}'
yy:yy
or:
awk '{print gensub(/.*([^:]{2}):([^:]{2}) *$/,"\\1\\2","g",$0)}'
yyyy
Edit:
I now realized the trailing spaces were added by anubhava in his edit; they were not present in the original question! You can then simply keep the last n characters:
awk '{print substr($2,13,5)}'
yy:yy
or:
awk '{print substr($2,13,2)substr($2,16,2)}'
yyyy
Taking into account that the mac address always is 6 octets, you probably could just do something like this to get the last 2 octets:
awk '{print substr($0,13)}' input.txt
While testing on the fly with arp -an, I noticed that the output does not always contain a MAC address; in some cases it returned something like:
(169.254.113.54) at (incomplete) on en4 [ethernet]
It is therefore probably better to filter the input so that only valid MAC addresses are processed, which can be done with this regex:
^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
Applying the regex in awk and printing only the last 2 octets:
arp -an | awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) print substr($4,13)}'
This filters on column $4, verifies that it is a valid MAC address, and then uses substr to return only the last "letters".
You could also split by : and print the output in multiple ways, for example:
awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) {split($4,a,":"); print a[5] ":" a[6]}}'
Notice the exp ~ /regexp/
This is true if the expression exp (taken as a string) is matched by regexp.
The following example matches, or selects, all input records with the upper-case letter `J' somewhere in the first field:
$ awk '$1 ~ /J/' inventory-shipped
-| Jan 13 25 15 115
-| Jun 31 42 75 492
-| Jul 24 34 67 436
-| Jan 21 36 64 620
So does this:
awk '{ if ($1 ~ /J/) print }' inventory-shipped
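Some older awk builds do not support interval expressions such as {2} in regexes. The same filter-then-split idea can be sketched without them by splitting first and checking each group separately. The function name last_two_octets and the sample lines are made up for illustration; the MAC is assumed to be in $4 and colon-separated, as in the arp -an output above:

```shell
# Hypothetical helper: print the last two octets of the MAC in column 4,
# skipping lines whose column 4 is not six two-digit hex groups.
last_two_octets() {
    awk '{
        n = split($4, part, ":")
        if (n != 6) next
        for (i = 1; i <= n; i++)
            if (part[i] !~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) next
        print part[5] ":" part[6]
    }' "$@"
}

# One valid entry and one "(incomplete)" entry: only the first is printed.
printf '%s\n' \
    '? (192.168.1.1) at ac:1e:04:0e:12:ab on en0 [ethernet]' \
    '? (169.254.113.54) at (incomplete) on en4 [ethernet]' | last_two_octets
```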

How to count all numbers in a file with awk?

I want to count all numbers that are in a file.
Example:
input -> Hi, this is 25 ...
input -> Lalala 21 or 29 what is ... 79?
The output should be the sum of all numbers: 154 (that is, 25+21+29+79).
From this beautiful answer by hek2mgl on how to extract the biggest number in a file, let's catch all the numbers in the file and sum them:
$ awk '{for(i=1;i<=NF;i++){sum+=$i}}END{print sum}' RS='$' FPAT='-{0,1}[0-9]+' file
154
This sets the record separator to a character ($) that is assumed not to occur in the file, so the whole block of text is read as a single record. Then it sets FPAT (a gawk extension) so that every number (positive or negative) becomes a separate field:
FPAT
A regular expression (as a string) that tells gawk to create the
fields based on text that matches the regular expression. Assigning a
value to FPAT overrides the use of FS and FIELDWIDTHS for field
splitting.
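Since FPAT is only available in gawk, here is a sketch of the same sum for other awk implementations, using a match() loop that walks through each line. The function name sum_numbers is hypothetical; like the FPAT pattern above, it treats a leading - as a sign:

```shell
# Hypothetical helper: sum every integer found anywhere in the input.
sum_numbers() {
    awk '{
        line = $0
        # Repeatedly find the next number, add it, and drop the text
        # up to and including that number.
        while (match(line, /-?[0-9]+/)) {
            sum += substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END { print sum + 0 }' "$@"
}

printf 'Hi, this is 25 ...\nLalala 21 or 29 what is ... 79?\n' | sum_numbers
```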
$ cat data
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep -oP '\b\d+\b' data | paste -s -d '+' | bc
154
With grep and awk :
$ cat test.txt
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep '[0-9]\+' -o test.txt | awk '{ sum+=$1} END {print sum}'
154

Match specific column with grep command

I am having trouble matching a specific column with the grep command. I have a test file (test.txt) like this:
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 841 A 6 ,$,.,,. BJJJJE
Bra001325 866 C 2 ,. HJ
I want to extract all the lines that have the number 866 in the second column. When I use grep, I get every line that contains that number anywhere:
grep "866" test.txt
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 866 C 2 ,. HJ
How can I match a specific column with the grep command?
Try doing this :
$ awk '$2 == 866' test.txt
No need to add {print}, the default behaviour of awk is to print on a true condition.
with grep :
$ grep -P '^\S+\s+866\b' *
But awk can print filenames too and is more robust than grep here:
$ awk '$2 == 866{print FILENAME":"$0; nextfile}' *
In my case, the field separator is not space but comma. So I would have to add this, otherwise it won't work for me (On ubuntu 18.04.1).
awk -F ', ' '$2 == 866' test.txt
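If the column and the value vary, both can be passed in with -v. A minimal sketch assuming whitespace-separated fields; the function name match_column and the variable names col and val are hypothetical:

```shell
# match_column COLUMN VALUE [FILE...]
# Hypothetical helper: print lines whose COLUMN-th field equals VALUE.
match_column() {
    col=$1
    val=$2
    shift 2
    awk -v col="$col" -v val="$val" '$col == val' "$@"
}

# Only the line with 866 in column 2 is printed, even though other
# lines contain 866 elsewhere.
printf '%s\n' \
    'Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676' \
    'Bra001325 836 C 8 ,,,,,.,, 68886676' \
    'Bra001325 866 C 2 ,. HJ' | match_column 2 866
```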

Slice 3TB log file with sed, awk & xargs?

I need to slice several TB of log data, and would prefer the speed of the command line.
I'll split the file up into chunks before processing, but need to remove some sections.
Here's an example of the format:
uuJ oPz eeOO 109 66 8
uuJ oPz eeOO 48 0 221
uuJ oPz eeOO 9 674 3
kf iiiTti oP 88 909 19
mxmx lo uUui 2 9 771
mxmx lo uUui 577 765 27878456
The gaps between the first 3 alphanumeric strings are spaces. Everything after that is tabs. Lines are separated with \n.
I want to keep only the last line in each group.
If there's only 1 line in a group, it should be kept.
Here's the expected output:
uuJ oPz eeOO 9 674 3
kf iiiTti oP 88 909 19
mxmx lo uUui 577 765 27878456
How can I do this with sed, awk, xargs and friends, or should I just use something higher level like Python?
awk -F '\t' '
NR==1 {key=$1}
$1!=key {print line; key=$1}
{line=$0}
END {print line}
' file_in > file_out
Try this:
awk 'BEGIN{FS="\t"}
{if($1!=prevKey) {if (NR > 1) {print lastLine}; prevKey=$1} lastLine=$0}
END{print lastLine}'
It saves the last line and prints it only when it notices that the key has changed.
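Both awk answers print one empty line when the input is empty, which may matter if the file is split into chunks first. A sketch of the same keep-the-last-line idea with that case guarded; the function name last_of_group is hypothetical, and the key is assumed to be everything before the first tab, as described in the question:

```shell
# Hypothetical helper: keep only the last line of each consecutive group,
# where the group key is the first tab-separated field.
last_of_group() {
    awk -F '\t' '
        NR > 1 && $1 != key { print line }   # key changed: emit the saved line
        { key = $1; line = $0 }              # remember the newest line
        END { if (NR) print line }           # flush last group, skip empty input
    ' "$@"
}

printf 'uuJ oPz eeOO\t109\t66\t8\nuuJ oPz eeOO\t9\t674\t3\nkf iiiTti oP\t88\t909\t19\n' | last_of_group
```

Only one line is held in memory at a time, so the approach does not depend on the size of the file.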
This might work for you:
sed ':a;$!N;/^\(\S*\s\S*\s\S*\)[^\n]*\n\1/s//\1/;ta;P;D' file
