How to print file names from a file where awk is selecting values? - linux

I have a .txt file containing file names:
z1.cap
z2.cap
z3.cap
z4.cap
Sample data in these files looks like this:
OTR 25896 PAT210 $TREMD DEST
OFR 21475 NAT102 #TREMD DEST
Then I'm using the code below to print the desired values from the files:
while read file_name
do
echo "progressing with file :${file_name}"
cat ${file_name} | grep "PAT210" | awk -F' ' '$5 == "DEST" { print $file_name, $1}' | uniq >> OUTPUT_FILE
done < files.txt
Now I want output consisting of 2 fields, like:
z1.cap OTR
z2.cap OFR
and so on...
But I'm getting output like:
- OTR
- OFR
Any help is appreciated, thanks.

To access the filename that awk is currently processing, use the builtin variable FILENAME.
To bind shell variables to awk variables, use (quoting the expansions so they survive word splitting):
awk -v var1="$shvar1" -v var2="$shvar2" 'your awk code using var1 and var2'
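For example, applied to the question's loop body (a sketch; fname is just an illustrative awk variable name):
file_name="z1.cap"
awk -v fname="$file_name" '($5 == "DEST") && ($3 == "PAT210") { print fname, $1 }' "$file_name"
For the sample data this prints z1.cap OTR.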
Assuming files.txt contains your list of files, and with zero understanding of what exactly you are trying to achieve:
for file_name in $(cat files.txt)
do
echo "progressing with file :${file_name}"
awk -F' ' '($5 == "DEST") && ($3=="PAT210") { print FILENAME, $1}' "$file_name" | uniq >> OUTPUT_FILE
done
I removed the cat and incorporated the grep into your awk. The cat was unnecessary since awk can read the file itself.
You can remove the for loop entirely by saying
awk -F' ' '($5 == "DEST") && ($3=="PAT210") { print FILENAME, $1}' $(<files.txt) | uniq >> OUTPUT_FILE
The $(<files.txt) expands to the contents of files.txt, so each filename becomes a separate argument to awk.
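Note that $(<files.txt) relies on word splitting, so it breaks on filenames containing whitespace. A sketch of a safer variant, assuming GNU xargs and one filename per line:
xargs -d '\n' awk -F' ' '($5 == "DEST") && ($3 == "PAT210") { print FILENAME, $1 }' < files.txt | uniq >> OUTPUT_FILE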

Related

print second column value after exact match of variables in file

I have a file list.txt containing data like this
hvar-mp-log.hvams europe#gmail.com
mvar-mp-log.mvams japan#gmail.com
mst-mp-log.mst korea#gmail.com
pif-mp-log-pif atlas#gmail.com
I need to match a string against the first column of list.txt and print the second column of the matching line. If string=mst-mp-log.mst, it should print korea#gmail.com.
I can match the string like this:
grep -q "$string" list.txt
How do I print the matched string's mail id? The expected output should be korea#gmail.com.
With awk:
string="mst-mp-log.mst"
awk -v var="$string" '$1 == var {print $2}' list.txt
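With the sample list.txt above, this prints:
korea#gmail.com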
Or, if your grep is already finding the correct lines, drop the -q option (which suppresses all output) and pipe to awk:
grep "$string" list.txt | awk '{print $2}'
Here is a solution using individual commands in a pipe:
$ grep '^mst-mp-log.mst ' list.txt | tr -s ' ' | cut -d ' ' -f 2
korea#gmail.com

Using awk to split an output on tab and "/" separators into a delimited format

I'd appreciate help converting this output to a pipe-delimited format.
I have the following output:
abcde1234 /path/A/file1
test23455 /path/B/file2345
But I would like it as:
abcde1234|file1
test23455|file2345
In awk, if you set FS to [[:blank:]]+/|/ you can print the first and last fields:
awk -v FS='[[:blank:]]+/|/' -v OFS='|' '{print $1, $NF}' file
abcde1234|file1
test23455|file2345
Here is a one-liner awk solution:
awk -v FS='[ \t].*/' -v OFS='|' '{$1=$1}1' file
and, a sed one-liner:
sed 's%[[:blank:]].*/%|%' file
and a pure bash one:
while read -r; do echo "${REPLY%%[[:blank:]]*}|${REPLY##*/}"; done < file
Here ${REPLY%%[[:blank:]]*} strips everything from the first blank onward, and ${REPLY##*/} keeps only what follows the last slash.
Try using cut 🤷🏻‍♀️. Given a file list containing:
abcde1234 /path/A/file1
test23455 /path/B/file2345
while IFS= read -r line; do
value1=$(echo "$line" | cut -d ' ' -f1)
value2=$(echo "$line" | cut -d '/' -f4)   # assumes paths of the form /a/b/c
printf '%s|%s\n' "$value1" "$value2"
done < list

Set an external variable in awk

I have written a script to count the number of columns in each line of data.txt. My problem is that I am unable to set x from inside the awk script.
Any help would be highly appreciated.
while read p; do
x=1;
echo $p | awk -F' ' '{x=NF}'
echo $x;
file="$x"".txt";
echo $file;
done <$1
data.txt file:
4495125 94307025 giovy115p#live.it 94307025.094307025 12443
stazla deva1a23#gmail.com 1992/.:\1
1447585 gioao_87#hotmail.it h1st#1
saknit tomboro#seznam.cz 1233 1990
Expected output:
5.txt
3.txt
3.txt
4.txt
My output:
1.txt
1.txt
1.txt
1.txt
You simply cannot export a variable set inside Awk back to the shell: the x assigned inside Awk (holding NF) will not be reflected outside. One option is command substitution ($(..)) to capture the value of NF and use it later:
x=$(echo "$p" | awk '{print NF}')
Now x will contain the column count of each line. Note that you don't need -F' ', which is the default delimiter in awk.
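Putting that back into your loop, a sketch of the corrected script:
while read -r p; do
x=$(echo "$p" | awk '{print NF}')
file="$x.txt"
echo "$file"
done < "$1"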
The other option: your requirement can be met fully within Awk itself.
awk 'NF{print NF".txt"}' file
Here the NF{..} pattern ensures that the action inside {..} is applied only to non-empty rows. Then, for each such row, we print the field count with the .txt extension appended.
Awk processes a line at a time -- processing each line in a separate Awk script inside a shell while read loop is horrendously inefficient. See also https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
Maybe something like this:
awk '{ print >(NF ".txt") }' data.txt
to create a file with the five-column rows in 5.txt, the four-column ones in 4.txt, the three-column ones in 3.txt, etc., one file per unique column count.
The Awk variable NF contains the number of fields (by default, Awk splits fields on runs of whitespace; use -F to change to some other separator). The expression (NF ".txt") simply concatenates the number of fields with the suffix .txt, and we pass that string as a file name to the print redirection.
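One caveat, not from the original answer: with many distinct column counts, a non-GNU awk can run out of open file descriptors. A variant that appends and closes each file as it goes avoids that:
awk '{ f = NF ".txt"; print >> f; close(f) }' data.txt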
With bash:
while read -r p; do p=($p); echo "${#p[@]}.txt"; done < file
or shorter:
while read -r -a p; do echo "${#p[@]}.txt"; done < file
Here the line is split into the array p, and ${#p[@]} expands to the number of its elements.
Output:
5.txt
3.txt
3.txt
4.txt

Print the results of a grep to a csv file

I am trying to print the results of grep to a CSV file. When I run this command grep FILENAME * in my directory I get this result: B140509184-#-02Jun2015-#-11:00-#-12:00-#-LT4-UGW-MAJAZ-#-I-#-CMAP-#-P-45088966.lif: FILENAME A20150602.0835+0400-0840+0400_DXB-GGSN-V9-PGP-16.xml.
What I want is to print the FILENAME part to a CSV. Below is what I have tried so far; I am a bit lost on how to go about printing the result.
BASE_DIR=/appl/virtuo/var/loader/spool/umtsericssonggsn_2013a/2013A/good
cd ${BASE_DIR}
grep FILENAME *
#grep END_TIME *
#grep START_TIME *
#my $Filename = grep FILENAME *;
print $Filename;
#$ awk '{print;}' employee.txt
#echo "$Filename :"
File sample
measInfoId 140509184
GP 3600
START_DATE 2015-06-02
The output should be 140509184 3600 2015-06-02, as columns of a CSV file.
Updated Answer
If you only want lines that start with GP, or START_DATE or measInfoId, you would modify the awk commands below to look like this:
awk '/^GP/ || /^START_DATE/ || /^measInfoId/ {print $2}' file
Original Answer
I am not sure what you are trying to do actually, but this may help...
If your input file is called file, and contains this:
measInfoId 140509184
GP 3600
START_DATE 2015-06-02
This command will print the second field on each line:
awk '{print $2}' file
140509184
3600
2015-06-02
Then, building on that, this command will put all those onto a single line:
awk '{print $2}' file | xargs
140509184 3600 2015-06-02
And then this one will translate the spaces into commas:
awk '{print $2}' file | xargs | tr " " ","
140509184,3600,2015-06-02
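If you'd rather skip the xargs and tr stages, a single awk can join the second fields with commas directly (a sketch):
awk '{ printf "%s%s", (NR > 1 ? "," : ""), $2 } END { print "" }' file
140509184,3600,2015-06-02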
And if you want to do that for all the files in a directory, you can do this:
for f in *; do
awk '{print $2}' "$f" | xargs | tr " " ","
done > result.csv

Matching third field in a CSV with pattern file in GNU Linux (AWK/SED/GREP)

I need to print all the lines in a CSV file whose 3rd field matches a pattern from a pattern file.
I have tried grep with no luck, because it matches any field, not only the third.
grep -f FILE2 FILE1 > OUTPUT
FILE1
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
FILE2
00567
98745
45486
54543
48349
96349
56485
19615
56496
39493
RIGHT OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
WRONG OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567 <---- I don't want this to appear
sadsad,0,98745,1,gfdkjh,45346
I have already searched everywhere and tried different formulas.
EDIT: thanks to Wintermute, I managed to write something like this (csvquote's output goes to a second file, because writing back to file1.csv would truncate it before it is read):
csvquote file1.csv > file1_quoted.csv
awk -F '"' 'FNR == NR { patterns[$0] = 1; next } patterns[$6]' file2.csv file1_quoted.csv | csvquote -u > result.csv
csvquote makes CSV files with quoted, delimiter-containing fields safe for awk to parse.
Thank you very much everybody, great community!
With awk:
awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' file2 file1
This works as follows:
FNR == NR { # when processing the first file (the pattern file)
patterns[$0] = 1 # remember the patterns
next # and do nothing else
}
patterns[$3] # after that, select lines whose third field
# has been seen in the patterns.
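Run against the sample files, this prints exactly the lines wanted:
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346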
Using grep and sed:
grep -f <( sed -e 's/^\|$/,/g' file2) file1
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
Explanation:
Via process substitution (without changing the file itself), we insert a comma at the beginning and at the end of each pattern in file2, then grep as you were already doing. Note that this matches the pattern in any comma-delimited middle field, not only the third; it is good enough here because, of the middle fields, only the third holds five-digit values.
This can be a start (note it prints only the matching third field, not the whole line):
for i in $(cat FILE2); do cut -d',' -f3 FILE1 | grep "$i"; done
sed 's#.*#/^[^,]*,[^,]*,&,/b#' FILE2 >/tmp/File2.sed && echo d >>/tmp/File2.sed && sed -f /tmp/File2.sed FILE1; rm /tmp/File2.sed
Harder to express in a simple sed than in awk, but it works if awk is not available: each pattern becomes a /.../b command that keeps a line on a third-field match, and the final d deletes everything else. (A plain /.../!d per pattern would keep only lines matching every pattern at once.)
The same idea with egrep (useful on huge files):
sed 's#.*#^[^,]*,[^,]*,&,#' FILE2 >/tmp/File2.egrep && egrep -f /tmp/File2.egrep FILE1; rm /tmp/File2.egrep
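For the sample FILE2 the generated /tmp/File2.egrep contains anchored patterns like:
^[^,]*,[^,]*,00567,
^[^,]*,[^,]*,98745,
so a pattern can only ever match as the third comma-delimited field.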
