Adding double quotes around non-numeric columns with awk - linux

I have a file like this;
2018-01-02;1.5;abcd;111
2018-01-04;2.75;efgh;222
2018-01-07;5.25;lmno;333
2018-01-09;1.25;prs;444
I'd like to add double quotes around the non-numeric columns, so the new file should look like this:
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
This is what I've tried so far; I know it's not the correct way:
head myfile.csv -n 4 | awk 'BEGIN{FS=OFS=";"} {gsub($1,echo $1 ,$1)} 1' | awk 'BEGIN{FS=OFS=";"} {gsub($3,echo "\"" $3 "\"",$3)} 1'
Thanks in advance.

You may use this awk command, which sets ; as the input/output field separator and then wraps each field in double quotes if it is non-numeric:
awk '
BEGIN {
FS = OFS = ";"
}
{
for (i=1; i<=NF; ++i)
$i = ($i+0 == $i ? $i : "\"" $i "\"")
} 1' file
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
Alternative gnu-awk solution:
awk -v RS='[;\n]' '$0+0 != $0 {$0 = "\"" $0 "\""} {ORS=RT} 1' file
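Here RS='[;\n]' makes every field its own record, and ORS=RT reinserts whichever delimiter originally followed it (RT, the text matched by RS, is a GNU awk feature). A minimal sketch to visualise the splitting, assuming the same input file:
gawk -v RS='[;\n]' '{ printf "record=[%s] terminator=[%s]\n", $0, RT }' file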

Using GNU awk and typeof(): fields that are numeric strings have the strnum attribute; otherwise, they have the string attribute.
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
$i=sprintf("\"%s\"",$i)
}1' file
Output:
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
Edit:
If some of the fields are already quoted:
$ gawk 'BEGIN {
FS=OFS=","
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
gsub(/^"?|"?$/,"\"",$i)
}1' <<< 'string,123,"quoted string"'
Output:
"string",123,"quoted string"

Further enhancing anubhava's solution (including handling of fields that are already double-quoted):
gawk -e 'sub(".+",$-_==+$-_?"&":(_)"&"_\
)^gsub((_)_, _)^(ORS = RT)' RS='[;\n]' \_='\42'
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
"2018-01-09";1.25;"prs";111111111111111111112222222222
222222223333333333333333333333
333344444444444444444499999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999991111111111111111111
122222222222222222233333333333
333333333333333444444444444444
444999999999999991111111111111
111111122222222222222222233333
333333333333333333333444444444
444444444999999999999991111111
111111111111122222222222222222
233333333333333333333333333444
444444444444444999999999999991
111111111111111111122222222222
222222233333333333333333333333
333444444444444444444999999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999999
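De-obfuscated, the golfed one-liner above amounts to roughly the following (my reading of the code, so treat it as an approximate sketch rather than a literal translation):
gawk -v RS='[;\n]' '{
    if ($0 + 0 != $0)        # each "field" is now a record; quote non-numeric ones
        $0 = "\"" $0 "\""
    gsub(/""/, "\"")         # collapse doubled quotes if it was already quoted
    ORS = RT                 # re-emit the original ; or newline delimiter
} 1' file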

Related

Regex issue for match a column value

I wrote a script to extract a column value from a file when it doesn't match the pattern defined in a column-metadata file.
But it is not returning the right output. Can anyone point out the issue? I was trying to match a string together with its double quotes; the quotes also need to be matched.
Code:
awk -F'|' -v n="$col_pos" -v m="$col_patt" 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' $input_file
Run output (shell trace):
++ awk '-F|' -v n=4 -v 'm="[a-z]+#gmail.com"' 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' /test/data/infa_shared/dev/SrcFiles/datawarehouse/poc/BNX.csv
10,22,"00AF","abc#gmail.com",197,10,1/1/2020 12:06:10.260 PM,"BNX","Hard b","50","Us",1,"25"
This line is not expected in the output, since it matches the email pattern "[a-z]+#gmail.com". The pattern is extracted from the file below.
Input file for pattern extraction (file_col_metadata):
FILE_ID~col_POS~COL_START_POS~COL_END_POS~datatype~delimited_ind~col_format~columnlength
5~4~~~char~Y~"[a-z]+#gmail.com"~100
If you replace awk -F'|' ... with awk -F',' ... it will work: the data file is comma-separated, so with | as the separator the line never gets split, field $n is empty, and the negated pattern match fires for every row.
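A sketch of the corrected invocation (same col_pos, col_patt, and input_file as in the question; only the field separator changes):
awk -F',' -v n="$col_pos" -v m="$col_patt" 'NR!=1 && $n !~ "^" m "$" {
printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
count++
}
END {print count}' "$input_file"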

grep or awk or sed to concatenate or merge two files linux

I have a file1.csv file which has four columns.
udp,4080,10.11.76.172,10.121.147.99
tcp,22,10.21.146.131,10.131.149.91
tcp,8080,10.56.10.91,10.151.150.90
Another file, file2.yml, is shown below:
ssh_port: "22"
Jenkins_port: 8080
sqlstr_port: "5162-5164"
I need to compare the two files and merge them into one based on the port number.
I have tried something like this.
for port in $(cat file1.csv | cut -d',' -f2); do if [[ $port =~ ..
Is there any simple method to merge the two files based on the port number? I need to get output similar to this:
tcp,22,10.21.146.131,10.131.149.91,ssh_port
tcp,8080,10.56.10.91,10.151.150.90,jenkins_port
Could you please try the following awk and let me know if this helps you (it does not handle the port ranges in file2):
awk 'FNR==NR{sub(/:/,"",$1);gsub(/\"/,"",$NF);a[$NF]=$1;next} ($2 in a){print $0,a[$2]}' file2.yml FS="," OFS="," file1.csv
EDIT: If you have ranges in your file2 separated by - then the following may help too:
awk 'FNR==NR{sub(/:/,"",$1);gsub(/\"/,"",$NF);if($NF~/-/){num=split($NF,array,"-");for(i=array[1];i<=array[num];i++){a[i]=$1}} else {a[$NF]=$1};next} ($2 in a){print $0,a[$2]}' file2.yml FS="," OFS="," file1.csv
Adding a non-one-liner form of the above solution too:
awk '
FNR==NR{
sub(/:/,"",$1);
gsub(/\"/,"",$NF);
if($NF~/-/) { num=split($NF,array,"-");
for(i=array[1];i<=array[num];i++) { a[i]=$1 }}
else {a[$NF]=$1}; next }
($2 in a) { print $0,a[$2] }
' file2.yml FS="," OFS="," file1.csv
Extended awk solution considering multiple ports like 5162-5164:
awk 'NR == FNR{
gsub(/[:"]/, "");
len = split($2, a, "-");
for (i=1; i<=len; i++) ports[a[i]] = $1;
next
}
$2 in ports{ print $0, ports[$2] }' file2.yml FS=',' OFS=',' file1.csv
The output:
tcp,22,10.21.146.131,10.131.149.91,ssh_port
tcp,8080,10.56.10.91,10.151.150.90,Jenkins_port
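A detail worth noting in these invocations: assignments placed between file names, such as FS="," OFS=",", take effect only when the next file is opened, so file2.yml is split on whitespace while file1.csv is split on commas. A tiny illustration of the mechanism, using two hypothetical files a and b:
awk '{ print FILENAME ": FS=[" FS "]" }' a FS=',' b
While reading a, FS is still the default single space; while reading b, it is the comma.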

How to mark line in target file in case line matched

My bash script reads each line from the file /tmp/file.CSV until EOF
and checks whether that line matches a line in another file, /tmp/target.CSV (in case of a full match, the script needs to add "+" at the beginning of the matched line).
For example:
line="/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11" ( from /tmp/file.CSV )
we see that $line have full match with line:
1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK ( from /tmp/target.CSV )
then we need to add "+" on the line in /tmp/target.CSV as
+1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK
Please advise how to do that with sed, awk, or maybe a Perl one-liner in my bash script.
more /tmp/target.CSV
1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK
2,Ama,LINUX,"/VPNfig/EME/EM8/Franlecom Eana SA/Amen",comrse,temporal,OK
3,ArnTel,LINUX,"/VPConfig/EME/EM3/ArmenTem Armenia)/ArmenTe",Coers,FAIL
4,Ahh,LINUX,"/VPConfig/EMA/EM/llk/AAe",Coers,FAIL
142,ucell,LINUX,/VPNAAonfig/EMEA/EM3/Ucell/ede3fc34,Glo,G/rvrev443,OK
more file.CSV
/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11
/VPNfig/EME/EM8/Franlecom Eana SA/Amen
/VPConfig/EME/EM3/ArmenTem Armenia)/ArmenTe
/VPConfig/EME/EM0/TTR/Ar
/VPNAAonfig/EMEA/EM3/Ucell/ede3fc34
My bash code:
while read -r line
do
grep -iq "$line" /tmp/target.CSV
if [[ $? -ne 0 ]]
then
echo "$line" NOT MATCH target.CSV
else
sed .................
fi
done < /tmp/file.CSV
Example of expected results (for the files /tmp/target.CSV and file.CSV):
more /tmp/target.CSV
+1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK
+2,Ama,LINUX,"/VPNfig/EME/EM8/Franlecom Eana SA/Amen",comrse,temporal,OK
+3,ArnTel,LINUX,"/VPConfig/EME/EM3/ArmenTem Armenia)/ArmenTe",Coers,FAIL
4,Ahh,LINUX,"/VPConfig/EMA/EM/llk/AAe",Coers,FAIL
more file.CSV
+/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11
+/VPNfig/EME/EM8/Franlecom Eana SA/Amen
+/VPConfig/EME/EM3/ArmenTem Armenia)/ArmenTe
/VPConfig/EME/EM0/TTR/Ar
+/VPNAAonfig/EMEA/EM3/Ucell/ede3fc34
awk -F\" -v OFS=\" 'FNR==NR{ a[$0]++; next} $2 in a { $0 = "+" $0 } 1' file.csv target.csv
Output:
+1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK
+2,Ama,LINUX,"/VPNfig/EME/EM8/Franlecom Eana SA/Amen",comrse,temporal,OK
+3,ArnTel,LINUX,"/VPConfig/EME/EM3/ArmenTem Armenia)/ArmenTe",Coers,FAIL
4,Ahh,LINUX,"/VPConfig/EMA/EM/llk/AAe",Coers,FAIL
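This works because the field separator is a literal double quote, so in each target line $2 is exactly the quoted path, which is then looked up among the stored lines of file.csv. A quick check of the splitting (my own demonstration):
$ echo '1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK' | awk -F\" '{ print "field 2 = [" $2 "]" }'
field 2 = [/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11]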
Or
awk -F\" -v OFS=\" 'FNR==NR{ a[$0]++; next} { print ($2 in a ? "+" : " ") $0 }' file.csv target.csv
awk -F\" -v OFS=\" 'FNR==NR{ a[$0]++; next} { $0 = ($2 in a ? "+" : " ") $0 } 1' file.csv target.csv
Output:
+1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK
+2,Ama,LINUX,"/VPNfig/EME/EM8/Franlecom Eana SA/Amen",comrse,temporal,OK
+3,ArnTel,LINUX,"/VPConfig/EME/EM3/ArmenTem Armenia)/ArmenTe",Coers,FAIL
4,Ahh,LINUX,"/VPConfig/EMA/EM/llk/AAe",Coers,FAIL
And this one is valid whether each line starts with a single space or not:
awk -F\" -v OFS=\" 'FNR==NR{ a[$0]++; next} { sub(/^ ?/, $2 in a ? "+" : " ") } 1' file.csv target.csv
Update (1)
awk -F, -v OFS=, 'FNR==NR{ sub(/[ \t\r]*$/, ""); a[$0]++; next} { t = $4; gsub(/(^"|"$)/, "", t); sub(/^[ \t]*/, t in a ? "+" : " "); } 1' file.csv target.csv
Output:
+1,ull,LINUX,"/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11",fnt,rfdr,OK
+2,Ama,LINUX,"/VPNfig/EME/EM8/Franlecom Eana SA/Amen",comrse,temporal,OK
+3,ArnTel,LINUX,"/VPConfig/EME/EM3/ArmenTem Armenia)/ArmenTe",Coers,FAIL
4,Ahh,LINUX,"/VPConfig/EMA/EM/llk/AAe",Coers,FAIL
+142,ucell,LINUX,/VPNAAonfig/EMEA/EM3/Ucell/ede3fc34,Glo,G/rvrev443,OK
Update (2)
awk -F, -v OFS=, 'FNR==NR{ sub(/[ \t\r]*$/, ""); a[$0]++; b[FNR]=$0; next} { t = $4; gsub(/(^"|"$)/, "", t); r = " "; if (t in a) { c[t]++; r = "+" }; sub(/^[ \t]*/, r); } 1; END { for (i = 1; i in b; ++i) { t = b[i]; sub(/^[ \t]*/, t in c ? "+" : " ", t); print t > "/dev/stderr" } }' file.csv target.csv > new_target.csv 2> new_file.csv
Try this Perl one-liner:
perl -pi -e '$_="+".$_ if($_=~m{/VPNfig/EME/EM3/Ucll/ucelobeconn/6EKoHH11}is);' /tmp/target.CSV

Merging Multiple records into a Unique records with all the non-null values

Suppose I have 3 records:
P1||1234|
P1|56001||
P1|||NJ
I want to merge these 3 records into one with all the attributes. Final record:
P1|56001|1234|NJ
Is there any way to achieve this in Unix/Linux?
I assume you want a solution using bash, awk, sed, etc.
You could try something like
$ cat test.txt
P1||1234|
P1|56001||
P1|||NJ
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) print $i }' | egrep '.+' | sort | uniq | awk 'BEGIN{ c = "" } { printf c $0; c = "|" } END{ printf "\n" }'
1234|56001|NJ|P1
Briefly, awk splits the lines on the '|' separator and prints each field on its own line. egrep removes the empty lines. After that, sort and uniq remove duplicates. Finally, awk merges the lines back together with the '|' separator.
Update:
If I understand correctly, here's what you're looking for:
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) if($i) col[i]=$i } END{ for (i = 1; i <= length(col); i++) printf "%s%s", col[i], (i == length(col) ? "\n" : "|")}'
P1|56001|1234|NJ
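Note that calling length() on an array is a GNU awk extension; a portable sketch of the same idea tracks the highest field count itself:
awk -F'|' '{ for (i = 1; i <= NF; i++) if ($i) col[i] = $i
             if (NF > max) max = NF }
END { for (i = 1; i <= max; i++) printf "%s%s", col[i], (i == max ? "\n" : "|") }' test.txt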
In your example, the 1st row has 1234 and the 2nd row has 56001.
I don't get why 56001 comes before 1234 in your final result; I assume it is a typo/mistake.
An awk one-liner could do the job:
awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
with your data:
kent$ echo "P1||1234|
P1|56001||
P1||NJ"|awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
P1|1234|56001|NJ

Filling a shell script array with data

I want to extract some data from a file and save it in an array, but I don't know how to do it.
In the following, I'm extracting some data from /etc/group, saving it in another file, and then printing every single item:
awk -F: '/^'$GROUP'/ { gsub(/,/,"\n",$4) ; print $4 }' /etc/group > $FILE
for i in `awk '{ print $0 }' $FILE`
do
echo "member: "$i" "
done
However, I don't want to extract the data into a file, but into an array.
members=( $(awk -F: '/^'$GROUP':/ { gsub(/,/,"\n",$4) ; print $4 }' /etc/group) )
The assignment with the parentheses indicates that $members is an array. The original awk command has been enclosed in $(...), and the colon added so that if you have group and group1 in the file, and you look for group, you don't get the data for group1 too. Of course, if you wanted both entries, then you drop the colon I added.
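You can then loop over the array much as the question looped over the file; quoting the expansion keeps any entries containing spaces intact:
for i in "${members[@]}"
do
echo "member: $i"
done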
j=0
for i in `awk '{ print $0 }' $FILE`
do
arr[$j]=$i
j=`expr $j + 1`
done
arr=($(awk -F: -v g=$GROUP '$1 == g { gsub(/,/,"\n",$4) ; print $4 }' /etc/group))
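If your bash is 4.0 or newer, mapfile (also known as readarray) is another option worth considering, since it avoids the word splitting and globbing that an unquoted $(...) inside an array assignment is subject to:
mapfile -t arr < <(awk -F: -v g="$GROUP" '$1 == g { gsub(/,/, "\n", $4); print $4 }' /etc/group)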
