Match specific column with grep command - linux

I am having trouble matching specific column with grep command. I have a test file (test.txt) like this..
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 841 A 6 ,$,.,,. BJJJJE
Bra001325 866 C 2 ,. HJ
And i want to extract all those lines which has a number 866 in the second column. When i use grep command i am getting all the lines that contains the number that number
grep "866" test.txt
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 866 C 2 ,. HJ
How can i match specific column with grep command?

Try doing this :
$ awk '$2 == 866' test.txt
No need to add {print}, the default behaviour of awk is to print on a true condition.
with grep :
$ grep -P '^\S+\s+866\b' *
But awk can print filenames too & is quite more robust than grep here :
$ awk '$2 == 866{print FILENAME":"$0; nextfile}' *

In my case, the field separator is not space but comma. So I would have to add this, otherwise it won't work for me (On ubuntu 18.04.1).
awk -F ', ' '$2 == 866' test.txt

Related

How to use awk '{print $1*Number}' from the second line or telling him to ignore NaN values?

I have a file called 'waterproofposters.jsonl' with this type of output:
Regular price
100
200
300
400
500
And I need to take out 2% of each value. I have used the following code:
awk '{print $1*0.98}' waterproofposters.jsonl
And then I have the following output:
0
98
196
294
392
490
And then I'm stuck because I need to have 'Regular price' in the first line instead '0'
I thought to replace '0' with 'Regular price using
find . -name "waterproof.jsonl" | xargs sed -i -e 's/0/Regular price/g'
But it will replace all the '0' by 'Regular price'
To print the first line as-is:
awk '{print (NR>1 ? $0*0.98 : $0)}'
To print lines that are not a number as-is:
awk '{print ($0+0 == $0 ? $0*0.98 : $0)}'
I'm using $0 instead of $1 in the multiplication because:
They're the same thing in your numerical input, and
I aesthetically prefer using the same value across the whole script rather than different values for the numeric vs non-numeric lines, and
When you use a specific field it causes awk to do field-splitting so it's a bit more efficient to not reference a field when the whole record will do.
Here's both of the above working with the posted sample input:
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
and here's the difference between the two given input that has a non-numeric value mid input file:
$ cat file
Regular price
100
200
foobar
400
500
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
0
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
foobar
392
490
You can certainly achieve what you need with a single awk call, but an answer to why your sed -i -e 's/0/Regular price/g' command did not work as expected is that you used 0 as the regex pattern. 0 matches any zero char inside the string.
You want to replace 0s that are the only char on a line.
Hence, you need to use ^ and $ anchors to match the start and end of the line respectively:
sed -i 's/^0$/Regular price/'
If you need to replace on the first line only add the 1 address before the substitution command:
sed -i '1 s/^0$/Regular price/'
Note you do not need g, since you only expect one replacement per line and g is only needed when performing multiple replacements on a line. By default, all lines will get processed.
How to use awk '{print $1Number}' from the second line or telling him to ignore NaN values?*
I would do it following way using GNU AWK, let file.txt content be
Regular price
100
200
300
400
500
then
awk 'NR==1{print}NR>=2{print $1*0.98}' file.txt
output
Regular price
98
196
294
392
490
Explanation: if it 1st line just print it, if it 2nd or later line print 0.98 of 1st column value
(tested in GNU Awk 5.0.1)

Print lines not containg a period linux

I have a file with thousands of rows. I want to print the rows which do not contain a period.
awk '{print$2}' file.txt | head
I have used this to print the column I am interested in, column 2 (The file only has two columns).
I have removed the head and then did
awk '{print$2}' file.txt | grep -v "." | head
But I only get blank lines not any actual values which is expected, I think it has included the spaces between the rows but I am not sure.
Is there an alternative command?
As suggested by Jim, I did-
awk '{print$2}' file.txt | grep -v "\." | head
However the number of lines is greater than before, is this expected? Also, my output is a list of numbers but with spaces in between them (Vertical), is this normal?
file.txt example below-
120.4 3
270.3 7.9
400.8 3.9
200.2 4
100.2 8.7
300.2 3.4
102.3 6
49.0 2.3
38.0 1.2
So the expected (and correct) output would be 3 lines, as there is 3 values in column 2 without the period:
$ awk '{print$2}' file.txt | grep -v "\." | head
3
4
6
However, when running the code as above, I instead get 5, which is also counting the spaces between the rows I think:
$ awk '{print$2}' file.txt | grep -v "\." | head
3
4
6
You seldom need to use grep if you're already using awk
This would print the second column on each line where that second column doesn't contain a dot:
awk '$2 !~ /\./ {print $2}'
But you also wanted to skip empty lines, or perhaps ones where the second column is not empty. So just test for that, too:
awk '$2 != "" && $2 !~ /\./ {print $2}'
(A more amusing version would be awk '$2 ~ /./ && $2 !~ /\./ {print $2}' )
As you said, grep -v "." gives you only blank lines. That's because the dot means "any character", and with -v, the only lines printed are those that don't contain, well, any characters.
grep is interpreting the dot as a regex metacharacter (the dot will match any single character). Try escaping it with a backslash:
awk '{print$2}' file.txt | grep -v "\." | head
If I understand well, you can try this sed
sed ':A;N;${s/.*/&\n/};/\n$/!bA;s/\n/ /g;s/\([^ ]*\.[^ ]* \)//g' file.txt
output
3
4
6

How to count all numbers in a file with awk?

I want to count all numbers that are in a file.
Example:
input -> Hi, this is 25 ...
input -> Lalala 21 or 29 what is ... 79?
The output should be the sum of all numbers: 154 (that is, 25+21+29+79).
From this beautiful answer by hek2mgl on how to extract the biggest number in a file, let's catch all the numbers in the file and sum them:
$ awk '{for(i=1;i<=NF;i++){sum+=$i}}END{print sum}' RS='$' FPAT='-{0,1}[0-9]+' file
154
This sets the record separator in a way that the whole block of text is a unique record. Then, it sets FPAT so that every single number (positive or negative) is a different field:
FPAT #
A regular expression (as a string) that tells gawk to create the
fields based on text that matches the regular expression. Assigning a
value to FPAT overrides the use of FS and FIELDWIDTHS for field
splitting.
$ cat data
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep -oP '\b\d+\b' data | paste -s -d '+' | bc
154
With grep and awk :
$ cat test.txt
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep '[0-9]\+' -o test.txt | awk '{ sum+=$1} END {print sum}'
154

linux command to get the last appearance of a string in a text file

I want to find the last appearance of a string in a text file with linux commands. For example
1 a 1
2 a 2
3 a 3
1 b 1
2 b 2
3 b 3
1 c 1
2 c 2
3 c 3
In such a text file, i want to find the line number of the last appearance of b which is 6.
I can find the first appearance with
awk '/ b / {print NR;exit}' textFile.txt
but I have no idea how to do it for the last occurrence.
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
cat -n prints the file to STDOUT prepending line numbers.
grep greps out all lines containing "b" (you can use egrep for more advanced patterns or fgrep for faster grep of fixed strings)
tail -1 prints last line of those lines containing "b"
cut -f 1 prints first column, which is line # from cat -n
Or you can use Perl if you wish (It's very similar to what you'd do in awk, but frankly, I personally don't ever use awk if I have Perl handy - Perl supports 100% of what awk can do, by design, as 1-liners - YMMV):
perl -ne '{$n=$. if / b /} END {print "$n\n"}' textfile.txt
This can work:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
We check every second file being "b" and we record the number of line. It is appended, so by the time we finish reading the file, it will be the last one.
Test:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
6
Update based on sudo_O advise:
$ awk '{if ($2=="b") a=NR} END{print a}' your_file
to avoid having some abc in 2nd field.
It is also valid this one (shorter, I keep the one above because it is the one I thought :D):
$ awk '$2=="b" {a=NR} END{print a}' your_file
Another approach if $2 is always grouped (may be more efficient then waiting until the end):
awk 'NR==1||$2=="b",$2=="b"{next} {print NR-1; exit}' file
or
awk '$2=="b"{f=1} f==1 && $2!="b" {print NR-1; exit}' file

deleting lines from a text file with bash

I have two sets of text files. First set is in AA folder. Second set is in BB folder. The content of ff.txt file from first set(AA folder) is shown below.
Name number marks
john 1 60
maria 2 54
samuel 3 62
ben 4 63
I would like to print the second column(number) from this file if marks>60. The output will be 3,4. Next, read the ff.txt file in BB folder and delete the lines containing numbers 3,4. How can I do this with bash?
files in BB folder looks like this. second column is the number.
marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
awk 'FNR == NR && $3 > 60 {array[$2] = 1; next} {if ($2 in array) next; print}' AA/ff.txt BB/filename
This works, but is not efficient (is that matter?)
gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read number; do gawk -v number=$number '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done
Of course, the second awk can be replaced with sed -i
For multi files:
ls -1 AA/*.txt | while read file
do
bn=`basename $file`
gawk 'BEGIN {getline} $3>60{print $2}' AA/$bn | while read number
do
gawk -v number=$number '$2 != number' BB/$bn > /tmp/$bn
mv /tmp/$bn BB/$bn
done
done
I didn't test it, so if there is a problem, please comment.

Resources