Linux awk with condition

I have a very large file (2.5M records) with 2 columns separated by |.
I would like to keep all records that do not contain the value "-1" in the second column and write them into a new file.
I tried to use:
grep -v "-1" norm_cats_21_07_assignments.psv > norm_cats_21_07_assignments.psv
but no luck.

For a quick and dirty solution, you can simply add the | to your grep pattern:
grep -v "|-1" input.psv > output.psv
This assumes that the rows to be ignored look like
something|-1
Note that if you ever need to use grep -v "-1", you have to add -- after the options; otherwise grep treats -1 as an option (in GNU grep, -NUM means NUM lines of context). Something like this:
grep -v -- "-1" input.psv > output.psv

You could do this with awk:
awk -F"|" '$2~/^-1$/{next}1' file > newfile
Example:
$ cat r
foo|-1
foo|bar
$ awk -F"|" '$2~/^-1$/{next}1' r
foo|bar
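In that one-liner, $2~/^-1$/{next} skips the matching records and the lone 1 is an always-true pattern whose default action prints the record. Here is a sketch of the same idea with the value passed in via -v instead of hard-coded. The function name drop_value is hypothetical, and the "" concatenation is there to force a string comparison, so that for example -1.0 is not treated as equal to -1.

```shell
# Hypothetical helper: drop_value VALUE [FILE...]
# Prints every |-separated record whose 2nd column is not exactly VALUE.
drop_value() {
  val=$1
  shift
  awk -F'|' -v v="$val" '
    ($2 "") == (v "") { next }   # exact string match on column 2: skip the record
    1                            # always-true pattern: print the record
  ' "$@"
}

# Demo: only the exact -1 row is dropped
printf 'foo|-1\nfoo|bar\nbaz|-10\n' | drop_value -1
```

For the original question this would be drop_value -1 norm_cats_21_07_assignments.psv > new_file.psv.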

You can use:
awk -F'|' '$2 != "-1"' file.psv > new_file.psv
Or
awk -F'|' '$2 !~ /-1/' file.psv > new_file.psv
!= compares the whole column, while !~ only needs part of the column to match the regex.
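To make that difference concrete, here is a sketch on a few made-up rows. $2 != "-1" drops only the row whose second column is exactly -1, while $2 !~ /-1/ also drops -10 and 5-1 because they contain -1. Anchoring the regex as /^-1$/ makes it behave like the != version.

```shell
sample='a|-1
b|-10
c|5-1
d|7'

# Whole-field string comparison: only "a|-1" is removed
printf '%s\n' "$sample" | awk -F'|' '$2 != "-1"'

# Substring regex match: the a, b and c rows are all removed
printf '%s\n' "$sample" | awk -F'|' '$2 !~ /-1/'

# Anchored regex: equivalent to the != version
printf '%s\n' "$sample" | awk -F'|' '$2 !~ /^-1$/'
```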
Edit: I just noticed that your input and output files are the same. You can't do that: the shell truncates the output file, which is the same file, before grep or awk even starts reading it.
After awk has written the filtered data to a new file (e.g. new_file.psv), you can save it back with cat new_file.psv > file.psv or mv new_file.psv file.psv.
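Here is a sketch of the "write to a temporary file, then replace" pattern, assuming the same two-column layout. filter_in_place is a hypothetical name. The temporary file is created next to the original so that mv is a plain rename, and the original is left untouched if awk fails. One caveat: mktemp creates the file with mode 0600, so after mv the file has those permissions. The cat new_file.psv > file.psv variant keeps the original permissions.

```shell
# Hypothetical helper: filter_in_place FILE
# Removes rows whose 2nd |-separated column is exactly -1, rewriting FILE.
filter_in_place() {
  file=$1
  # Temporary file in the same directory as FILE
  tmp=$(mktemp "$file.XXXXXX") || return 1
  if awk -F'|' '$2 != "-1"' "$file" > "$tmp"; then
    mv -- "$tmp" "$file"      # replace the original only on success
  else
    rm -f -- "$tmp"           # clean up and keep the original on failure
    return 1
  fi
}
```

Usage: filter_in_place file.psv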
But if you have exactly two columns separated by |, with no spaces around it, no quotes, etc., you can just edit the file in place with sed:
sed -i '/|-1/d' file.psv
Or perhaps something equivalent to awk -F'|' '$2 !~ /-1/':
sed -i '/|.*-1/d' file.psv
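sed -i as written is GNU syntax; BSD/macOS sed requires an argument to -i. Here is a sketch that should work with both, assuming a temporary backup file is acceptable: attach the backup suffix directly to -i, then delete the backup. The pattern is anchored with $ so that only a second column of exactly -1 is removed. The function name drop_neg1_sed is hypothetical.

```shell
# Hypothetical helper: drop_neg1_sed FILE
# -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed.
drop_neg1_sed() {
  sed -i.bak '/|-1$/d' "$1" && rm -f -- "$1.bak"
}
```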

Related

How to get 1st field of a file only when 2nd field matches a string?

How to get 1st field of a file only when 2nd field matches a given string?
#cat temp.txt
Ankit pass
amit pass
aman fail
abhay pass
asha fail
ashu fail
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'*
gives no output

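Another syntax with awk:
awk '$2 ~ /^fail$/{print $1}' input_file
There is no need for cat: awk reads the file directly.
^ anchors the match to the start of the field
$ anchors it to the end of the field
so only a second field that is exactly fail matches. Without -F, awk splits fields on runs of spaces and tabs, so this works whichever separator the file uses.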

Either:
your fields are not tab-separated, or
you have blanks at the end of the relevant lines, or
you have DOS line endings, so there is a CR at the end of every line and therefore also at the end of every $2 (see Why does my tool output overwrite itself and how do I fix it?).
With GNU cat you can run cat -Tev temp.txt to see tabs (^I), CRs (^M) and line endings ($).
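If cat -Tev shows ^M before each $, here is a sketch that strips the trailing CR inside awk before testing the field (failed_names is a hypothetical name). Modifying $0 with sub() makes awk re-split the fields, so $2 no longer carries the CR. Converting the file once with a tool such as dos2unix is an alternative.

```shell
# Hypothetical helper: prints column 1 of tab-separated rows whose column 2 is
# "fail", tolerating DOS (CRLF) line endings.
failed_names() {
  awk -F'\t' '
    { sub(/\r$/, "") }          # drop a trailing carriage return, if any
    $2 == "fail" { print $1 }
  ' "$@"
}

# Demo with CRLF input
printf 'Ankit\tpass\r\naman\tfail\r\nasha\tfail\r\n' | failed_names
```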

Your code seems to work fine when I remove the * at the end:
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'
The other thing to check is whether your file uses tabs or spaces. My copy/paste of your data file copied spaces, so I needed this line:
cat temp.txt | awk '$2 == "fail" { print $1 }'
Another way of doing this is with grep:
grep 'fail$' temp.txt | awk '{ print $1 }'

Exact Match of Word using grep

I have data in file.txt as follows
BRAD CHICAGO|NORTH SAMSONCHESTER|
CORA|NEW ERICA|
CAMP LOGAN|KINGBERG|
NCHICAGOS|ESTING|
CHICAGO|MANKING|
OCREAN|CHICAGO|
CHICAGO PIT|BULL|
CHICAGO |NEWYORK|
Question 1:
I want to search for an exact match of the word "CHICAGO" in the first column and print the second column.
Output should look like:
MANKING
NEWYORK
Question 2:
If multiple matches are found, can we limit the output to only one, so that the output is only MANKING or NEWYORK?
I tried the following:
grep -E -i "^CHICAGO" file.txt | awk -F '|' '{print $2}'
but I am getting this output:
MANKING
BULL
NEWYORK
Expected output for Question 1:
MANKING
NEWYORK
Expected output for Question 2:
MANKING

Here are some more ways:
Using grep and cut (the " *" accepts trailing spaces, so CHICAGO |NEWYORK| matches while CHICAGO PIT|BULL does not):
grep "^CHICAGO *|" file.txt | cut -d'|' -f2
Using awk:
awk -F"|" '/^CHICAGO *\|/{print $2}' file.txt
For question 2, simply pipe it to head, i.e.:
grep "^CHICAGO *|" file.txt | cut -d'|' -f2 | head -n1
Similarly for the awk command.

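How about an awk solution?
awk -F'|' '$1 == "CHICAGO"{print $2}' file
To print only one result, exit once you have a match:
awk -F'|' '$1 == "CHICAGO"{print $2; exit}' file
Note that $1 == "CHICAGO" does not match the CHICAGO |NEWYORK| row, because of its trailing space. Making this more generic, you can pass in the target as a variable and allow trailing spaces, for example: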
awk -v trgt="CHICAGO" -F'|' '{targ="^" trgt " *$"; if ( $1 ~ targ ) {print $2}}' file
The " *$" regex limits the match to zero or more trailing spaces without any extra chars at the end of the target string. So this will meet your criteria to match skip matching CHICAGO PIT|BULL.
AND this can be further reduced to
awk -v trgt="CHICAGO" -F'|' '{ if ( $1 ~ "^" trgt " *$" ) {print $2}}' file
constructing the regex "in-place" in with the comparison.
So you could use more verbose variable names to "describe" how the regex is being constructed from the input and the regex "wrappers" (as in the 3rd example) OR, you can just combine the input variable with the regex syntax in place. That is just a matter of taste or documentation conventions.
You might want to include a comment to explain you are constructing a regex test that would look like the $1 ~ /^CHICAGO *$/.
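Here is a sketch of the same lookup that avoids building a regex from the input, so characters such as . or * in the target are not treated as regex metacharacters. It trims trailing spaces from column 1 and compares it as a string. The function name lookup is hypothetical. Note that -v interprets backslash escapes in the value it assigns.

```shell
# Hypothetical helper: lookup TARGET [FILE...]
# Prints column 2 of every |-separated row whose column 1 equals TARGET,
# ignoring trailing spaces in column 1 (so "CHICAGO |NEWYORK|" matches).
lookup() {
  target=$1
  shift
  awk -F'|' -v trgt="$target" '
    { key = $1; sub(/ +$/, "", key) }    # strip trailing spaces from column 1
    (key "") == (trgt "") { print $2 }   # exact string comparison
  ' "$@"
}
```

Use lookup CHICAGO file.txt for Question 1 and lookup CHICAGO file.txt | head -n 1 for Question 2.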
IHTH

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract the values of tool2v1 and tool2v2.
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk commands, but they are not giving the correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1

grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'

If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option prints just the match instead of the whole matching line; \K means "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
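One caveat with /^tool2v[12]/ is that it is a prefix match, so a key such as tool2v10 would also be selected. Here is a sketch that compares the whole key and removes only the text up to the first colon, so values that themselves contain a colon are kept intact. The sample line tool2v10:9.9 and the function name extract_values are made up for illustration.

```shell
# Hypothetical helper: print the values for keys tool2v1 and tool2v2 only
extract_values() {
  awk -F: '
    $1 ~ /^tool2v[12]$/ {   # whole-key match, not a prefix match
      sub(/^[^:]*:/, "")    # strip "key:" and keep the rest of the line
      print
    }
  ' "$@"
}

printf 'tool1v1:1.4.4\ntool2v1:1.5.2.c8.5.2.r1981122221118\ntool2v2:32.5.0.abc.r20123433554\ntool2v10:9.9\n' | extract_values
```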

You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from the beginning of the string
.* - any characters (greedy, so as many as possible)
: - up to and including the last :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
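Because .* is greedy, s/^.*:// removes everything up to the last colon on the line. That is fine for this data, but if a value could itself contain a colon, matching non-colon characters instead stops at the first one. A small comparison on a made-up line:

```shell
line='tool2v1:1.5:extra'
# Greedy: removes up to the LAST colon
printf '%s\n' "$line" | sed 's/^.*://'      # prints: extra
# Non-colon characters: removes up to the FIRST colon
printf '%s\n' "$line" | sed 's/^[^:]*://'   # prints: 1.5:extra
```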

The question has already been answered, but you can also use pure bash to achieve the desired result:
#!/usr/bin/env bash
while IFS= read -r line; do
    if [[ "$line" =~ ^tool2v[12]: ]]; then
        echo "${line#*:}"
    fi
done < ./file1
The while loop reads every line of file1; =~ does a regex match to check whether $line starts with tool2v1: or tool2v2:, and if so ${line#*:} removes everything up to and including the first :.

Matching third field in a CSV with pattern file in GNU Linux (AWK/SED/GREP)

I need to print all the lines in a CSV file whose 3rd field matches a pattern in a pattern file.
I have tried grep with no luck, because it matches any field, not only the third.
grep -f FILE2 FILE1 > OUTPUT
FILE1
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
FILE2
00567
98745
45486
54543
48349
96349
56485
19615
56496
39493
RIGHT OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
WRONG OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567 <---- I don't want this to appear
sadsad,0,98745,1,gfdkjh,45346
I have already searched everywhere and tried different formulas.
EDIT: thanks to Wintermute, I managed to write something like this:
csvquote file1.csv > file1_quoted.csv
awk -F '"' 'FNR == NR { patterns[$0] = 1; next } patterns[$6]' file2.csv file1_quoted.csv | csvquote -u > result.csv
csvquote helps with parsing CSV files in awk.
Thank you very much everybody, great community!

With awk:
awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' file2 file1
This works as follows:
FNR == NR { # when processing the first file (the pattern file)
patterns[$0] = 1 # remember the patterns
next # and do nothing else
}
patterns[$3] # after that, select lines whose third field
# has been seen in the patterns.
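Here is a self-contained sketch of that idiom, wrapped in a hypothetical match_third function. It uses ($3 in patterns) rather than patterns[$3]. Both select the same lines, but the in test does not create an empty array entry for every unmatched value. One caveat: if the pattern file is empty, FNR == NR stays true while the data file is read, so nothing is printed.

```shell
# Hypothetical helper: match_third PATTERN_FILE DATA_FILE
match_third() {
  awk -F, '
    FNR == NR { patterns[$0] = 1; next }   # first file: remember each value
    ($3 in patterns)                       # second file: print if field 3 was seen
  ' "$1" "$2"
}
```

For the question's files this would be match_third FILE2 FILE1 > OUTPUT.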

Using grep and sed:
grep -f <( sed -e 's/^\|$/,/g' file2) file1
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
Explanation:
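We add a comma at the beginning and end of every line of file2 (without changing the file itself), then we just grep as you were already doing. Note that a pattern such as ,00567, would also match that value in the 2nd, 4th or 5th field, so this works for the sample data but is not strictly a third-field match.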

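This can be a start:
for i in $(cat FILE2);do cat FILE1| cut -d',' -f3|grep $i ;done
Note that it prints only the matching third-field values rather than whole lines, and grep $i also matches values that merely contain $i.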

sed 's#.*#/^[^,]*,[^,]*,&,/p#' FILE2 >/tmp/File2.sed && sed -n -f /tmp/File2.sed FILE1; rm /tmp/File2.sed
This is harder in plain sed than in awk, but it should work if awk is not available: each value becomes a /regex/p command, and sed -n prints a line when its third field matches one of them.
The same idea with egrep (grep -E), useful on huge files:
sed 's#.*#^[^,]*,[^,]*,&,#' FILE2 >/tmp/File2.egrep && grep -E -f /tmp/File2.egrep FILE1; rm /tmp/File2.egrep
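The temporary file is not strictly needed: with GNU grep, -f - reads the patterns from standard input, so the generated third-field regexes can be piped straight in. Here is a sketch, assuming GNU grep. The function name third_field_grep is hypothetical.

```shell
# Hypothetical helper: third_field_grep PATTERN_FILE DATA_FILE (GNU grep assumed)
third_field_grep() {
  # Turn each value V into the regex ^[^,]*,[^,]*,V, (V as the 3rd field)
  sed 's/.*/^[^,]*,[^,]*,&,/' "$1" | grep -f - "$2"
}
```

With the question's names: third_field_grep FILE2 FILE1 > OUTPUT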

Grep entire line after word

What would be the grep command to get everything in the line starting from a match?
For example, on a file path:
/home/usr/we/This/is/the/file/path
I want the output to be
/we/This/is/the/file/path
using /we as the regex.

grep -o does what you want.
grep -o '/we.*'

The OP wants to use we as the trigger. Using awk:
awk -F/ '{f=0; for (i=1;i<=NF;i++) {if ($i~/we/) f=1; if (f) printf "/%s",$i} print ""}' file
/we/This/is/the/file/path
Using GNU awk (gensub() is a gawk extension):
awk '{print gensub(/.*(\/we)/,"\\1","g")}' file
/we/This/is/the/file/path

YourInput | sed 's|/home/usr\(/we.*\)|\1|'
This assumes the line always (and only) starts with /home/usr.
Otherwise:
YourInput | sed -n 's|^.*\(/we.*\)|\1|p'
This prints only the line(s) containing /we, with the text before /we removed.
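Both ^.* in these sed commands and .* in the gensub version are greedy. On a line where /we occurs more than once, they keep only the text from the last /we, while grep -o '/we.*' keeps everything from the first one. If you want the first occurrence without GNU extensions, here is a sketch using awk's index() and substr(). The function name from_we and the second sample path are made up.

```shell
# Hypothetical helper: print each line from the first "/we" onwards,
# skipping lines that do not contain "/we"
from_we() {
  awk '(i = index($0, "/we")) > 0 { print substr($0, i) }' "$@"
}

printf '%s\n' /home/usr/we/This/is/the/file/path /home/we/a/we/b /no/match | from_we
```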
