How to run grep inside awk? - linux

Suppose I have a file input.txt with a few columns and a few rows, where the first column is the key, and a directory dir with files that contain some of these keys. I want to find all lines in the files in dir which contain these keywords. At first I tried to run the command
cat input.txt | awk '{print $1}' | xargs grep dir
This doesn't work because the keys end up at the end of the command line, where grep treats them as paths on my file system. Next I tried something like
cat input.txt | awk '{system("grep -rn dir $1")}'
But this didn't work either. Eventually I had to admit that even this doesn't work:
cat input.txt | awk '{system("echo $1")}'
After trying to use \ to escape the whitespace and the $ sign, I came here to ask for your advice. Any ideas?
Of course I can do something like
for x in `cat input.txt` ; do grep -rn $x dir ; done
This is not good enough because it takes two commands, while I want only one. It also shows why xargs doesn't work here: the search pattern is not the last argument.

You don't need grep with awk, and you don't need cat to open files:
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
Nor do you need xargs, shell loops, or anything else - just one simple awk command does it all.
If input.txt is not a file, then tweak the above to:
real_input_generating_command |
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' - dir/*
All it's doing is creating an array of keys from the first file (or input stream) and then looking for each key from that array in every file in the dir directory.
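For example, with a hypothetical input.txt and a single file dir/a.txt (sample data invented purely for illustration), it would behave like this:
$ cat input.txt
foo 1
bar 2
$ cat dir/a.txt
this line has foo in it
nothing here
$ awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
dir/a.txt this line has foo in it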

Try the following:
awk '{print $1}' input.txt | xargs -n 1 -I pattern grep -rn pattern dir

First thing you should do is research this.
Next ... you don't need to grep inside awk. That's completely redundant. It's like ... stuffing your turkey with ... a turkey.
Awk can process input and do "grep" like things itself, without the need to launch the grep command. But you don't even need to do this. Adapting your first example:
awk '{print $1}' input.txt | xargs -n 1 -I % grep % dir
This uses xargs' -I option to put xargs' input into a different place on the command line it runs. On FreeBSD or OS X, you can use the -J option instead.
But I prefer your for loop idea, converted into a while loop (read key junk puts the first whitespace-separated field in $key and discards the rest):
while read key junk; do grep -rn "$key" dir ; done < input.txt

Use process substitution to create a keyword "file" that you can pass to grep via the -f option:
grep -f <(awk '{print $1}' input.txt) dir/*
This will search each file in dir for lines containing keywords printed by the awk command. It's equivalent to
awk '{print $1}' input.txt > tmp.txt
grep -f tmp.txt dir/*
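If the keys are literal strings rather than regular expressions, adding -F (and -w for whole-word matches) makes this safer; a recursive variant might look like the sketch below:
grep -rnFwf <(awk '{print $1}' input.txt) dir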

grep expects its parameters in order: [what to search for] [where to search]. You need to merge the keys received from awk and pass them to grep using the \| regex alternation operator.
For example:
arturcz@szczaw:/tmp/s$ cat words.txt
foo
bar
fubar
foobaz
arturcz@szczaw:/tmp/s$ grep 'foo\|baz' words.txt
foo
foobaz
Finally, you will finish with:
grep `commands|to|prepare|a|keywords|list` directory
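For instance, one way to prepare that keyword list from input.txt (a sketch, assuming the keys contain no regex metacharacters) is:
grep -rnE "$(awk '{print $1}' input.txt | paste -sd'|' -)" dir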

In case you still want to use grep inside awk, make sure $1, $2, etc. are outside the quotes.
E.g. this works perfectly:
cat file_having_query | awk '{system("grep " $1 " file_to_be_greped")}'
# notice the space after grep and before the file name
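Quoting the key inside the constructed command makes this slightly more robust when keys contain characters special to the shell (still fragile, a sketch only):
awk '{system("grep \"" $1 "\" file_to_be_greped")}' file_having_query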

Related

How do you change column names to lowercase with linux and store the file as it is?

I am trying to change the column names to lowercase in a CSV file. I found code online to do that, but I don't know how to replace the old column names (uppercase) with the new column names (lowercase) in the original file. I did something like this:
$ head -n1 xxx.csv | tr "[A-Z]" "[a-z]"
But it simply prints out the column names in lowercase, which is not enough for me.
I tried to add sed -i but it did not do any good. Thanks!!
Using awk (readability winner):
Concise way:
awk 'NR==1{print tolower($0);next}1' file.csv
or using ternary operator:
awk '{print (NR==1) ? tolower($0): $0}' file.csv
or using if/else statements:
awk '{if (NR==1) {print tolower($0)} else {print $0}}' file.csv
To change the file for real:
awk 'NR==1{print tolower($0);next}1' file.csv | tee /tmp/temp
mv /tmp/temp file.csv
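If GNU awk 4.1 or newer is available, its inplace extension avoids the temporary file dance (this assumes gawk; a sketch):
gawk -i inplace 'NR==1{print tolower($0); next} 1' file.csv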
For your information, sed's in-place edit switch -i does the same: it uses a temporary file under the hood.
You can check this by using:
strace -f -s 800 sed -i'' '...' file
Using perl:
perl -i -pe '$_=lc() if $.==1' file.csv
It replaces the file on the fly with the -i switch.
You can use sed to tell it to replace the first line with all lower-case and then print the rest as-is:
sed '1s/.*/\L&/' ./xxx.csv
Redirect the output or use -i to do an in-place edit.
Proof of Concept
$ echo -e "COL1,COL2,COL3\nFoO,bAr,baZ" | sed '1s/.*/\L&/'
col1,col2,col3
FoO,bAr,baZ

bash: awk print within print

I need to grep for a pattern and then print some output from within the matching line. Currently I am using the below command, which works fine, but I would like to eliminate the multiple pipes and use a single awk command to achieve the same output. Is there a way to do it using awk?
root@Server1 # cat file
Jenny:Mon,Tue,Wed:Morning
David:Thu,Fri,Sat:Evening
root@Server1 # awk '/Jenny/ {print $0}' file | awk -F ":" '{ print $2 }' | awk -F "," '{ print $1 }'
Mon
I want to get this output using a single awk command. Any help?
You can try something like:
awk -F: '/Jenny/ {split($2,a,","); print a[1]}' file
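If the name should be a parameter rather than hard-coded, a sketch like this works (the name variable is an illustration, not part of the original answer); comparing $1 exactly also avoids accidental substring matches:
awk -F: -v name="Jenny" '$1 == name {split($2,a,","); print a[1]}' file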
Try this
awk -F'[:,]+' '/Jenny/{print $2}' file.txt
It uses multiple delimiter characters inside the [ ].
The + means one or more, since the field separator is treated as a regex.
For this particular job, I find grep to be slightly more robust.
Unless your company has a policy not to hire people named Eve.
(Try it out if you don't understand.)
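For instance, with the sample file above, searching for a hypothetical employee named Eve using the awk approach would wrongly match David's line, because "Evening" contains "Eve":
$ awk -F'[:,]+' '/Eve/{print $2}' file
Thu
The patterns below avoid this by requiring the name to appear before the first colon: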
grep -oP '^[^:]*Jenny[^:]*:\K[^,:]+' file
Or to do a whole-word match:
grep -oP '^[^:]*\bJenny\b[^:]*:\K[^,:]+' file
Or when you are confident that "Jenny" is the full name:
grep -oP '^Jenny:\K[^,:]+' file
Output:
Mon
Explanation:
The stuff up until \K speaks for itself: it selects the line(s) with the desired name.
[^,:]+ captures the day of week (in this case Mon).
\K cuts off everything preceding Mon.
-o cuts off anything following Mon.

Search A and replace B in A|B in shell scripting/SED/AWK

I have a text file with layout as:
tableName1|counterVariable1
tableName2|counterVariable2
I want to replace counterVariable1 with some other variable, say counterVariableNew.
How can I accomplish this?
I have tried various sed/awk approaches; the closest one is mentioned below:
cat $fileName | grep -w $tableName | sed -i 's/$tableName\|counterVariable/$tableName\|counterVariableNew'
But the three commands are not combining properly. Please help!
Your script is an example of a useless use of cat. But the key point here is to escape the pipe delimiter, which has a special meaning (it stands for OR) when used as the awk FS. So the below script should do:
# cat 42000479
tableName1|counterVariable1
tableName2|counterVariable2
tableName3|counterVariable2
# awk -F\| '$1=="tableName2"{$2="counterVariableNew"}1' 42000479
tableName1|counterVariable1
tableName2 counterVariableNew
tableName3|counterVariable2
Note that assigning to $2 rebuilt the record with awk's default output field separator, a space, which is why the pipe vanished from the changed line above; add OFS='|' to keep it. An alternate way of doing the same thing is below:
# awk -v FS='|' '$1=="tableName2"{$2="counterVariableNew"}1' 42000479
Stuff inside the single quotes will not be expanded by the shell.
awk -F'|' -v OFS='|' '/tableName1/ {$2="counterVariableNew"}1' file
tableName1|counterVariableNew
tableName2|counterVariable2
This will search for A (tableName1) and replace B (counterVariable1) with counterVariableNew.
Or by using sed :
sed -r '/tableName1/ s/(^.*\|)(.*)/\1counterVariableNew/g' file
tableName1|counterVariableNew
tableName2|counterVariable2
For a word-bounded search, enclose the pattern in \< and \>.
sed -r '/\<tableName1\>/ s/(^.*\|)(.*)/\1counterVariableNew/g' file
awk -F'|' -v OFS='|' '/\<tableName1\>/ {$2="counterVariableNew"}1' file
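To plug in the asker's original shell variables, pass them to awk with -v (a sketch, assuming $tableName and $fileName are set in the shell):
awk -F'|' -v OFS='|' -v t="$tableName" '$1 == t {$2="counterVariableNew"} 1' "$fileName"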

Sum out of grep -c

I am trying to find the number of times an event occurred in my log file.
Command:
grep -Eo "2016-08-30" applciationLog.log* -c
Output:
applciationLog.log.1:0
applciationLog.log.2:0
applciationLog.log.3:0
applciationLog.log.4:0
applciationLog.log.5:7684
applciationLog.log.6:9142
applciationLog.log.7:8699
applciationLog.log.8:0
What I actually need is the sum of all these values: 7684 + 9142 + 8699 = 25525. Is there any way I can do it? Anything I can append to the grep command to achieve this?
Any help or pointers are welcome and appreciated.
If you want to keep your grep command, pipe its output to awk; the quick and dirty way is down here:
grep -c "aaa" aaa.txt bbb.txt | awk 'BEGIN {cnt=0;FS=":"}; {cnt+=$2;}; END {print cnt;}'
Or use awk regex matching directly:
awk 'BEGIN {cnt=0}; {if(/aaa/) {cnt+=1;}}; END {print cnt;}' aaa.txt bbb.txt
As an addition to the answer already given by ghoti:
You can avoid awk -F: by using grep -h:
grep -c -h -F "2016-08-30" applicationLog.log* | awk '{n+=$0} END {print n}'
This means no filenames are printed, only the counts, and we can use the whole line for the addition in awk.
See if this works for you:
grep -Eo "2016-08-30" applciationLog.log* -c | awk -F':' 'BEGIN {sum = 0;} {sum += $2;} END {print sum;}'
We use awk to split each line with a delimiter of :, sum up the numbers for each line, and print the result at the end.
The grep command doesn't do arithmetic, it just finds lines that match regular expressions.
To count the output you already have, I'd use awk.
grep -c -F "2016-08-30" applciationLog.log* | awk -F: '{n+=$2} END {print n}'
Note that your grep options didn't make sense -- -E tells the command to use Extended regular expressions, but you're just looking for a fixed string (the date). So I swapped in the -F option instead. And -o tells grep to print the matched text, which you've overridden with -c, so I dropped it.
An alternative using for-loop and arithmetic expansion could be:
x=0
for i in $(grep -hc "2016-08-30" applciationLog.log*); do
  x=$((x+i))
done
echo "$x"
An easy alternative is to merge all the files before grep sees them:
cat applciationLog.log* | grep -Eo "2016-08-30" -c
In my directory I have hundreds of files; each file contains a lot of text along with lines similar to this:
Job_1-Run.log:[08/27/20 01:28:40] Total Jobs Cancelled for Job_1_set0 = 10
I do
grep '^Total Jobs Cancelled' ./*
to get the lines above.
Then I pipe it to
| awk 'BEGIN {cnt=0;FS="="}; {cnt+=$2;}; END {print cnt;}'
so my final command is:
grep '^Total Jobs Cancelled' ./* | awk 'BEGIN {cnt=0;FS="="}; {cnt+=$2;};END {print cnt;}'
and the result is the sum, e.g.:
900
I am using Cmder @ https://cmder.net/
Thanks to the answers by @alagner and @john above.

Print name of the file in front of every line of file

I have a lot of text files and I want to make a bash script in Linux that prints the name of the file at the beginning of each of its lines. For example, if I have the file lenovo.txt, I want every line in the file to start with lenovo.txt.
I tried to write a for loop for this, but it didn't work:
for i in *.txt
do
awk '{print '$i' $0}' /var/SambaShare/$i > /var/SambaShare/new_$i
done
Thanks!
It doesn't work because you need to pass $i to awk with the -v option. But you can also use the FILENAME built-in variable in awk:
ls *txt
file.txt file2.txt
cat *txt
A
B
C
A2
B2
C2
for i in *txt; do
awk '{print FILENAME,$0}' $i;
done
file.txt A
file.txt B
file.txt C
file2.txt A2
file2.txt B2
file2.txt C2
And to redirect into a new file:
for i in *txt; do
awk '{print FILENAME,$0}' $i > ${i%.txt}_new.txt;
done
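If you'd rather avoid the shell loop entirely, a single awk pass can write all the new files at once (a sketch; the close() call keeps the number of simultaneously open files down):
awk 'FNR==1 {if (out) close(out); out = "new_" FILENAME} {print FILENAME, $0 > out}' *.txt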
As for your corrected version:
for i in *.txt
do
awk -v i=$i '{print i,$0}' $i > new_$i
done
Hope this helps.
Using grep, you can make use of the --with-filename (alias -H) option and use an empty pattern that always matches (note that grep separates the filename with a colon rather than the space awk printed):
for i in *.txt
do
grep -H "" $i > new_$i
done
Awk and Bash don't share the same variables as they are different languages with separate interpreters. You should pass Bash variables to Awk with the -v option.
You should also quote your file name variables to ensure they don't get expanded as separate arguments if they contain whitespace.
for i in *.txt
do
awk -v i="$i" '{print i,$0}' "$i" > "new_$i"
done
