Using AWK to substitute


This is my first attempt at using AWK. I want to take input from a file like the following:
data.csv
James,Jones,30,Mr,Main St
Melissa,Greene,200,Mrs,Wall St
Robert,Krupp,410,Mr,Random St
and process it into a LaTeX Template
data.tex
\newcommand{\customertitle}{XYZ} %ggf. \
\newcommand{\customerName}{Max Sample} % Name \
\newcommand{\customerStreet}{Str} % Street \
\newcommand{\customerZIP}{12345} % ZIP
First I tried to replace the customer name this way
awk 'BEGIN{FS=","}{print $1 " " $2 }' data.csv | xargs -I{} sed "s/Max Sample/{}/" data.tex > names
which gave me a single merged file. I then tried to split it into individual .tex files by appending the keyword "#TEST" to the end of the original file, so I could use it as a record separator with the following command:
awk 'BEGIN {FS=RS="#TEST"} {i=1}{ while (i <= NF) {print $i >>"final"NR".tex"; i++}}' names
That worked for this one field, but it doesn't seem like a proper solution for multiple fields (title, street, zip code).
That's why I'm now trying to get it working with gsub in AWK.
I tried a few approaches. Based on what I could find so far, this is what I came up with:
awk 'BEGIN {FS=","}NR==FNR{a[FNR]=$4;next}{gsub ("XYZ",a[FNR]);print}' data.csv data.tex
which replaces XYZ with nothing
awk 'BEGIN {FS=","}NR==FNR{a[FNR]=$4;next}RS="#TEST"{for (i in a) {gsub("XYZ",i);print}}' data.csv data.tex
which counts four times to 7
I also tried these with the merged file, i.e. the "names" output of the first command, and couldn't get it to work.
What am I missing? Can gsub not replace a string with values from an array? Is a loop required?
I'm stuck and hope someone can help out here.

I hope this fits your case.
Create a file csv_to_latex.awk and put this code:
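BEGIN {
    FS = ","
    # Read the template once, keeping its lines in their original order.
    # The parentheses make sure "> 0" tests getline's return value.
    while ((getline line < latex) > 0) {
        lax_array[++n] = line
    }
    close(latex)
}
{
    name = $1 " " $2
    zip = $3
    status = $4
    street = $5
    # Print one filled-in copy of the template per CSV record.
    for (i = 1; i <= n; i++) {
        lax_line = lax_array[i]
        if (lax_line ~ /XYZ/) {
            sub(/[{]XYZ[}]/, "{" status "}", lax_line)
        } else if (lax_line ~ /Max Sample/) {
            sub(/[{]Max Sample[}]/, "{" name "}", lax_line)
        } else if (lax_line ~ /Str/) {
            sub(/[{]Str[}]/, "{" street "}", lax_line)
        } else if (lax_line ~ /12345/) {
            sub(/[{]12345[}]/, "{" zip "}", lax_line)
        }
        # Template lines without a placeholder pass through unchanged.
        print lax_line
    }
}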
To execute this code, use in terminal:
awk -v latex="data.tex" -f csv_to_latex.awk data.csv
The output:
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{James Jones} % Name \
\newcommand{\customerStreet}{Main St} % Street \
\newcommand{\customerZIP}{30} % ZIP
\newcommand{\customertitle}{Mrs} %ggf. \
\newcommand{\customerName}{Melissa Greene} % Name \
\newcommand{\customerStreet}{Wall St} % Street \
\newcommand{\customerZIP}{200} % ZIP
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{Robert Krupp} % Name \
\newcommand{\customerStreet}{Random St} % Street \
\newcommand{\customerZIP}{410} % ZIP

I feel like doing replaces is unnecessary and would probably approach it like this:
awk -F, -v po='\\newcommand{\\%s}{%s} %s \n' '{
printf po, "customertitle", $4, "%ggf. \\"
printf po, "customerName", $1" "$2, "% Name \\"
printf po, "customerStreet", $NF, "% Street \\"
printf po, "customerZip", $3, "% Zip"
}' data.csv
output:
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{James Jones} % Name \
\newcommand{\customerStreet}{Main St} % Street \
\newcommand{\customerZip}{30} % Zip
\newcommand{\customertitle}{Mrs} %ggf. \
\newcommand{\customerName}{Melissa Greene} % Name \
\newcommand{\customerStreet}{Wall St} % Street \
\newcommand{\customerZip}{200} % Zip
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{Robert Krupp} % Name \
\newcommand{\customerStreet}{Random St} % Street \
\newcommand{\customerZip}{410} % Zip
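The question also asked for one .tex file per customer. The following is a minimal sketch of that, building on the printf approach above. The file name pattern final<NR>.tex comes from the question; the function name csv_to_tex_files is hypothetical, and the trailing LaTeX comments are left out for brevity.

```shell
# Hypothetical helper: write one final<N>.tex per CSV record.
csv_to_tex_files() {
    awk -F, '{
        out = "final" NR ".tex"      # file name pattern borrowed from the question
        printf "\\newcommand{\\customertitle}{%s}\n", $4 > out
        printf "\\newcommand{\\customerName}{%s %s}\n", $1, $2 > out
        printf "\\newcommand{\\customerStreet}{%s}\n", $5 > out
        printf "\\newcommand{\\customerZIP}{%s}\n", $3 > out
        close(out)                   # release the file before moving to the next record
    }' "$1"
}
```

Calling csv_to_tex_files data.csv in an empty directory should leave final1.tex, final2.tex and final3.tex there, one per CSV line.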

Related

Alternate way to print and sort output

I have a list of names and scores (First,Last,Score)
I'm trying to print out ONLY the last name that occurs most often in DESCENDING NUMERICAL order.
Here is an example list.
inisha__Ohler__1
Loralee__Hippe__5
Boyd__Leslie__8
Donnette__Cosentino__5
Viva__Bedsole__4
Jann__Banfield__3
Alan__Dionne__2
Sandee__Verdun__2
Raeann__Sweetman__3
Judson__Goers__2
Mandie__Salcedo__8
Yesenia__Bibeau__1
Doug__Petteway__9
Alejandra__Winter__9
Marquitta__Sang__7
Rusty__Rodrigue__2
Rickie__Devin__1
Marie__Elem__3
Faustina__Haltom__4
Dorthea__Ervin__4
Yesenia__Bibeau__5
Doug__Petteway__8
Alejandra__Winter__1
Marquitta__Sang__9
Rusty__Rodrigue__4
Yesenia__Bibeau__2
Doug__Petteway__4
Alejandra__Winter__3
Marquitta__Sang__6
Rusty__Rodrigue__6
Rickie__Devin__7
Marie__Elem__1
Faustina__Haltom__2
Dorthea__Ervin__4
I want to spit the output out using a single "|" or less.
cut -d "_" -f 3 scores | sort -r | uniq -c | sort -nr
Already works but I am looking for something less expensive.

The least expensive way I know is to count in awk and sort only the much shorter list of totals. Note the -n, so that sort compares the counts numerically:
awk -F"__" '{ count[$2]++ } END {for (word in count) print count[word], word}' < scores | sort -rn
If you also want leading spaces like the ones uniq -c prints:
awk -F"__" '{ count[$2]++ } END {for (word in count) print " ", count[word], word}' < scores | sort -rn
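If only the most frequent last name is needed, as the question's wording suggests, the sort can be dropped entirely. This is a minimal sketch under that reading; the function name most_common_last_name is hypothetical, and when several names tie for first place they are all printed, in no particular order.

```shell
# Hypothetical helper: print the last name(s) with the highest count, one per line.
most_common_last_name() {
    awk -F'__' '
        # Count each last name and keep track of the highest count seen so far.
        { count[$2]++; if (count[$2] > max) max = count[$2] }
        # Print every name that reached that count.
        END { for (name in count) if (count[name] == max) print name }
    ' "$1"
}
```

Usage would be most_common_last_name scores, with no pipe at all.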

GNU awk-specific:
$ gawk -F__ '{ names[$2]++ }
END { PROCINFO["sorted_in"] = "@val_num_desc";
for (n in names) { print n }
}' input.txt
Sang
etc.

Using this perl one-liner
perl -aF/__/ -ne '$h{$F[1]}++; END{ print"$$_[0]\t$$_[1]\n" for sort {$$b[0]<=>$$a[0]} map {[$h{$_},$_]} keys %h }' <scores
or to show only the name that occurs most often
perl -MList::Util=max -aF/__/ -ne '$h{$F[1]}++; END{ $max=max(values%h); print "$h{$_}\t$_\n" for grep {$h{$_}==$max} keys%h }' <scores

Splitting data using awk but column name missing

I got this command to split the file the way I want, but why are my column names excluded?
This is my command:
awk \
-v DATE="$(date +"%d%m%Y")" \
-F"," \
'BEGIN{OFS=","} NR>1{ gsub(/"/,"",$1); print > "Assignment_"$1"_"DATE".csv"}' \
Test_01012020.CSV
My original file, test_01012020.csv, has the columns name, class, age, etc. After splitting into the files Assignment_"$1"_"DATE".csv I only get the values, for example FARAH, CLASS A, 24, and the column names are missing. I need each split file to start with the same column names as the original file. Can anyone help me?

@FARAH: Try:
awk \
-v DATE="$(date +"%d%m%Y")" \
-F"," \
'BEGIN{OFS=","} NR==1{hdr=$0; next} NR>1{ gsub(/"/,"",$1); f="Assignment_"$1"_"DATE".csv"; if (!(f in seen)) {print hdr > f; seen[f]=1} print > f}' \
Test_01012020.CSV
Your original command cannot print the headings, because NR>1 skips the very first line. Try the above; you can also change the headings to suit your needs.
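For reference, here is a self-contained sketch of the same idea: remember the header row, then write it once at the top of every output file. The function name split_with_header and its two arguments (input file, date string) are hypothetical. Unlike the question's command, this variant strips the quotes only from the file name and writes the data rows unchanged.

```shell
# Hypothetical helper: split a CSV by its first column, repeating the header in every output file.
split_with_header() {
    awk -F, -v DATE="$2" '
        NR == 1 { header = $0; next }        # remember the column names
        {
            key = $1
            gsub(/"/, "", key)                # strip quotes for the file name only
            out = "Assignment_" key "_" DATE ".csv"
            if (!(out in started)) {          # first row for this file: header goes first
                print header > out
                started[out] = 1
            }
            print > out
        }
    ' "$1"
}
```

A call such as split_with_header Test_01012020.CSV "$(date +%d%m%Y)" would mirror the original command.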

AWK script automatically removing leading 0s from String

I have a file BLACK.FUL.eg2:
10>BLACK.FUL>272/GSMA/000000>151006>01
15>004401074905590>004401074905590>B>I>0011>Insert>240/PLMN/000100>>5000-K525122-15
15>004402145955010>004402145955010>B>I>0011>Insert>240/PLMN/000100>>1200-K108534-14
15>004402146016260>004402146016360>B>I>0011>Insert>240/PLMN/000100>>1200-K-94878-14
15>004402452698630>004402452698630>B>I>0011>Insert>240/PLMN/000100>>5000-K538947-14
90>BLACK.FUL>272/GSMA/000000>151006>01>4
I've written this AWK script:
awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" \
'{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; \
else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; \
else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); \
while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} \
else {d = $1;while(d <= $2) print d++","$3","$4;}}}'
which gives me an output of:
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
440214601626,0011,240/PLMN/000100
440214601627,0011,240/PLMN/000100
440214601628,0011,240/PLMN/000100
440214601629,0011,240/PLMN/000100
440214601630,0011,240/PLMN/000100
440214601631,0011,240/PLMN/000100
440214601632,0011,240/PLMN/000100
440214601633,0011,240/PLMN/000100
440214601634,0011,240/PLMN/000100
440214601635,0011,240/PLMN/000100
440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
with one problem: the leading 0s of the strings in field 1 are automatically removed because of the numeric operation on them. So my actual expected output is:
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
For that I'm trying the below updated AWK script:
awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" \
'{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; \
else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; \
else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); \
while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} \
else {d = $1; for ( i=1;i<length($1);i++ ) if (substr($1,i++,1) == "0") \
{m=m"0"; else exit 1;}; while(d <= $2) print md++","$3","$4;}}}'
But getting an error:
awk: cmd. line:4: {m=m"0"; else exit 1;}; while(d <= $2) print md++","$3","$4;}}}
awk: cmd. line:4: ^ syntax error
Can you please point out what I'm doing wrong? A modification of my existing AWK script would be most helpful. Thanks.
NOTE: There can be any number of leading 0s, not only two as in the example outputs above.

Since your field sizes are fixed in the given example, just change the last print statement to:
$ awk ... printf "%014d,%s,%s\n",d++,$3,$4}}}'
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
UPDATE
If your field size is not fixed, you can capture the length (or the desired length) and use the same pattern. Since your code is quite complicated, I'm going to write a proof of concept that you can embed into your script.
This is essentially your problem: incrementing a zero-padded number drops the leading zeros.
$ echo 0001 | awk '{$1++; print $1}'
2
This is the proposed solution: capture the length before the increment, then zero-pad the number back to that length.
$ echo 0001 | awk '{n=length($1); $1++; printf "%0" n "d\n", $1}'
0002
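The same proof of concept can be stretched to a whole range, which is closer to what the original script does in its while loop. This is only a sketch: the function name padded_range is hypothetical, it assumes both bounds are non-negative integers, and some awk implementations may limit how large a value %d can format, so it is worth testing with your real numbers.

```shell
# Hypothetical helper: print every value from $1 to $2, zero-padded to the width of $1.
padded_range() {
    awk -v first="$1" -v last="$2" 'BEGIN {
        width = length(first)            # measure the text before arithmetic drops the zeros
        fmt = "%0" width "d\n"           # e.g. "%04d\n" for a 4-character start value
        for (d = first + 0; d <= last + 0; d++)
            printf fmt, d
    }'
}
```

For example, padded_range 0098 0102 prints 0098, 0099, 0100, 0101 and 0102 on separate lines.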

How to use AWK to print line with highest number?

I have a question. Assume I dump a file, grep for foo, and get a result like this:
Foo-bar-120:'foo name 1'
Foo-bar-130:'foo name 2'
Foo-bar-1222:'foo name 3'
Etc.
All I want is to extract the foo name with the largest number. In this case the largest number is 1222, so the result I expect is foo name 3.
Is there an easy way to achieve this using awk and sed, rather than pulling the number out line by line and looping to find the largest?

Code for awk:
awk -F[-:] '$3>a {a=$3; b=$4} END {print b}' file
$ cat file
Foo-bar-120:'foo name 1'
Foo-bar-130:'foo name 2'
Foo-bar-1222:'foo name 3'
$ awk -F[-:] '$3>a {a=$3; b=$4} END {print b}' file
'foo name 3'

Here's how I would do it. I just tested this in Cygwin. Hopefully it works under Linux as well. Put this into a file, such as mycommand:
#!/usr/bin/awk -f
BEGIN {
FS="-";
max = 0;
maxString = "";
}
{
num = $3 + 0; # convert string to int
if (num > max) {
max = num;
split($3, arr, "'");
maxString = arr[2];
}
}
END {
print maxString;
}
Then make the file executable (chmod 755 mycommand). Now you can pipe whatever you want through it by typing, for example, cat somefile | ./mycommand.

Assuming the line format is as shown with 2 hyphens before "the number":
cut -d- -f3- | sort -rn | sed '1{s/^[0-9]\+://; q}'

Is this OK for you?
awk -F'[:-]' '{n=$(NF-1);if(n>m){v=$NF;m=n}}END{print v}'
with your data:
kent$ echo "Foo-bar-120:’foo name 1’
Foo-bar-130:’foo name 2’
Foo-bar-1222:’foo name 3’"|awk -F'[:-]' '{n=$(NF-1);if(n>m){v=$NF;m=n}}END{print v}'
’foo name 3’
P.S. I like the Field separator [:-]

$ awk -F'[-:]' '{n=$3+0} (NR==1)||(n>max){max=n; val=$NF} END{gsub(/^.|.$/,"",val); print val}' file
foo name 3

You don't need grep; you can use awk directly on your file:
awk -F"[-:]" '/Foo/ && $3>prev{val=$NF;prev=$3}END{print val}' file
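The answers that split on both - and : rely on the name itself containing neither character. A sketch that avoids this assumption splits only at the first colon and takes the number from the end of the key. The function name name_with_max_number is hypothetical, and the code assumes every line looks like KEY-NUMBER:'name' with single-character quotes around the name, as in the question.

```shell
# Hypothetical helper: print the name (without quotes) from the line with the largest number.
name_with_max_number() {
    awk '
        {
            colon = index($0, ":")               # the first colon separates key from name
            num = substr($0, 1, colon - 1)
            sub(/.*-/, "", num)                  # keep what follows the last hyphen of the key
            num += 0                             # force a numeric comparison
            if (NR == 1 || num > max) {
                max = num
                # Skip the colon and opening quote, and drop the closing quote.
                val = substr($0, colon + 2, length($0) - colon - 2)
            }
        }
        END { print val }
    ' "$@"
}
```

It reads the named files, or standard input when called without arguments, so grep foo dump | name_with_max_number would also work.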

Cut and export command in shell script

I am working on a shell script that contains the following piece of code.
I don't understand these lines, mostly the cut and export commands. Can anyone help me?
Also, please point me to a better Linux command reference.
Thanks in advance!
# determine sum of 60 records
awk '{
if (substr($0,12,2) == "60" || substr($0,12,2) == "78") \
print $0
}'< /tmp/checks$$.1 > /tmp/checks$$.2
rec_sum=`cut -c 151-160 /tmp/checks$$.2 | /u/fourgen/cashnet/bin/sumit`
export rec_sum
Inside my sumit script following is the code
awk '{ total += $1}
END {print total}' $1
Let me show my main script prep_chk
awk 'BEGIN{OFS=""} {if (substr($0,12,2) == "60" && substr($0,151,1) == "-") \
{ print substr($0,1,11), "78", substr($0,14) } \
else \
{ print $0 } \
}' > /tmp/checks$$.1
# determine count of non-header record
rec_cnt=`wc -l /tmp/checks$$.1`
rec_cnt=`expr "$rec_cnt - 1"`
export rec_cnt
# determine sum of 60 records
awk '{ if (substr($0,12,2) == "60" || substr($0,12,2) == "78") \
print $0 }'< /tmp/checks$$.1 > /tmp/checks$$.2
rec_sum=`cut -c 151-160 /tmp/checks$$.2 | /u/fourgen/cashnet/bin/sumit`
export rec_sum
# make a new header record and output it
head -1 /tmp/checks$$.1 | awk '{ printf("%s%011.11d%05.5d%s\n", \
substr($0,1,45), rec_sum, rec_cnt, substr($0,62)) }' \
rec_sum="$rec_sum" rec_cnt="$rec_cnt"
# output everything else sorted by tran code
grep -v "%%%%%%%%%%%" /tmp/checks$$.1 | cut -c 1-150 | sort -k 1.12,13

cut -c extracts characters by position from every line of its input, in this case characters 151 to 160 of each line in /tmp/checks$$.2. These strings are piped to a script called sumit, which produces some output.
That output is then assigned to the shell variable rec_sum. The export command places the variable in the environment, so that child processes started from this shell, for example another shell script, can read it.
Edit:
If that's all you have inside your sumit script, it adds the first field of every line it reads to total and prints the final total once all input has been consumed. Because the script is called here without a file argument, awk reads the lines that cut writes to the pipe.
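The three pieces can be tried out in isolation. The sketch below uses made-up fixed-width records in which columns 5 to 8 hold an amount; the record layout and values are assumptions for the demo, and only cut -c, the summing awk program and export mirror the script in the question.

```shell
# Three fixed-width sample records; columns 5-8 hold an amount (layout made up for this demo).
records='AAA 0010
BBB 0025
CCC 0007'

# cut -c keeps only characters 5-8 of every line; awk then adds them up,
# which is what the sumit script in the question does.
rec_sum=$(printf '%s\n' "$records" | cut -c 5-8 | awk '{ total += $1 } END { print total }')

# Before export the variable exists only in this shell; after export,
# child processes started from here inherit it.
export rec_sum
sh -c 'echo "child sees rec_sum=$rec_sum"'   # prints: child sees rec_sum=42
```

In the original script the awk call that builds the header receives the values as rec_sum="$rec_sum" arguments instead, so the export there mainly matters for any other programs the script starts.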
