I managed to get this command working the way I want, but why is the header row excluded?
This is my command:
awk \
-v DATE="$(date +"%d%m%Y")" \
-F"," \
'BEGIN{OFS=","} NR>1{ gsub(/"/,"",$1); print > "Assignment_"$1"_"DATE".csv"}' \
Test_01012020.CSV
My original file Test_01012020.csv contains the columns name, class, age, etc., but after splitting into the Assignment_"$1"_"DATE".csv files I only get the values, for example: FARAH, CLASS A, 24, and so on; the new files do not include the column names. I need each split file to carry the same header as the original file. Can anyone help me?
@FARAH: Try:
awk \
-v DATE="$(date +"%d%m%Y")" \
-F"," \
'BEGIN{OFS=","} NR==1{hdr=$0; next} {gsub(/"/,"",$1); out="Assignment_"$1"_"DATE".csv"; if (!(out in seen)) {seen[out]; print hdr > out}; print > out}' \
Test_01012020.CSV
Your original command never printed the headings because NR>1 skips the very first line. The version above saves the header from line 1 and writes it once at the top of each split file before that file's data rows; try it, and you can change the headings to suit your needs too.
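For illustration, with a hypothetical Test_01012020.CSV such as this (data invented):
name,class,age
"FARAH",CLASS A,24
"ALI",CLASS B,25
"FARAH",CLASS A,23
the command creates Assignment_FARAH_<DATE>.csv and Assignment_ALI_<DATE>.csv (DATE being today's date in ddmmyyyy form), each starting with the name,class,age header followed by that student's rows.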
This is my first time tinkering with AWK; I'm trying to use it to take input from a file like the following:
data.csv
James,Jones,30,Mr,Main St
Melissa,Greene,200,Mrs,Wall St
Robert,Krupp,410,Mr,Random St
and process it into a LaTeX Template
data.tex
\newcommand{\customertitle}{XYZ} %ggf. \
\newcommand{\customerName}{Max Sample} % Name \
\newcommand{\customerStreet}{Str} % Street \
\newcommand{\customerZIP}{12345} % ZIP
First I tried to replace the customer name this way:
awk 'BEGIN{FS=","}{print $1 " " $2 }' data.csv | xargs -I{} sed "s/Max Sample/{}/" data.tex > names
which gave me one merged file. I therefore attempted to split the result back into single .tex files by inserting a keyword "#TEST" at the end of the original template, so I could use it as a record separator to get back to single files with the following command:
awk 'BEGIN {FS=RS="#TEST"} {i=1}{ while (i <= NF) {print $i >>"final"NR".tex"; i++}}' names
Even though that worked for this one field, it doesn't seem to be a proper solution for multiple fields (title, street, zip code).
That's why I'm now attempting to get it working with the gsub action in AWK.
I tried a few different approaches; based on what I could find so far, this is what I came up with:
awk 'BEGIN {FS=","}NR==FNR{a[FNR]=$4;next}{gsub ("XYZ",a[FNR]);print}' data.csv data.tex
which replaces XYZ with nothing
awk 'BEGIN {FS=","}NR==FNR{a[FNR]=$4;next}RS="#TEST"{for (i in a) {gsub("XYZ",i);print}}' data.csv data.tex
which counts four times to 7
I tried those with the merged file too, i.e. the "names" output from the first command, and didn't get it to work.
What am I missing? Can the gsub command not replace a string with an array value? Is a loop required?
I'm stuck and hope someone can help out here.
I hope this fits your case.
Create a file csv_to_latex.awk and put this code in it:
BEGIN{
    FS=","
    # read the LaTeX template into an array, preserving line order;
    # iterating with "for (key in array)" would visit lines in an
    # unspecified order, so index by line number instead
    while ((getline line < latex) > 0) {
        lax_array[++lax_count] = line
    }
    close(latex)
}
{
    name = $1" "$2
    zip = $3
    status = $4
    street = $5
    for (i = 1; i <= lax_count; i++)
    {
        lax_line = lax_array[i]   # work on a copy; the template is reused for each record
        if (lax_line ~ /XYZ/)
        {
            gsub("{XYZ}", "{"status"}", lax_line)
            print lax_line
        }
        else if (lax_line ~ /Max Sample/)
        {
            gsub("{Max Sample}", "{"name"}", lax_line)
            print lax_line
        }
        else if (lax_line ~ /Str/)
        {
            gsub("{Str}", "{"street"}", lax_line)
            print lax_line
        }
        else if (lax_line ~ /12345/)
        {
            gsub("{12345}", "{"zip"}", lax_line)
            print lax_line
        }
    }
}
To execute this code, run in a terminal:
awk -v latex="data.tex" -f csv_to_latex.awk data.csv
The output:
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{James Jones} % Name \
\newcommand{\customerStreet}{Main St} % Street \
\newcommand{\customerZIP}{30} % ZIP
\newcommand{\customertitle}{Mrs} %ggf. \
\newcommand{\customerName}{Melissa Greene} % Name \
\newcommand{\customerStreet}{Wall St} % Street \
\newcommand{\customerZIP}{200} % ZIP
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{Robert Krupp} % Name \
\newcommand{\customerStreet}{Random St} % Street \
\newcommand{\customerZIP}{410} % ZIP
I feel like doing replaces is unnecessary and would probably approach it like this:
awk -F, -v po='\\newcommand{\\%s}{%s} %s \n' '{
printf po, "customertitle", $4, "%ggf. \\"
printf po, "customerName", $1" "$2, "% Name \\"
printf po, "customerStreet", $NF, "% Street \\"
printf po, "customerZip", $3, "% Zip"
}' data.csv
output:
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{James Jones} % Name \
\newcommand{\customerStreet}{Main St} % Street \
\newcommand{\customerZip}{30} % Zip
\newcommand{\customertitle}{Mrs} %ggf. \
\newcommand{\customerName}{Melissa Greene} % Name \
\newcommand{\customerStreet}{Wall St} % Street \
\newcommand{\customerZip}{200} % Zip
\newcommand{\customertitle}{Mr} %ggf. \
\newcommand{\customerName}{Robert Krupp} % Name \
\newcommand{\customerStreet}{Random St} % Street \
\newcommand{\customerZip}{410} % Zip
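If, as in the original attempt, you wanted each customer's block in its own file rather than one merged stream, a minimal variation of the printf approach redirects each record to final1.tex, final2.tex, and so on (the final"NR".tex naming is borrowed from the question; this is a sketch, not tested against your full template):
awk -F, -v po='\\newcommand{\\%s}{%s} %s \n' '{
  out = "final" NR ".tex"                      # one output file per CSV record
  printf po, "customertitle", $4, "%ggf. \\" > out
  printf po, "customerName", $1" "$2, "% Name \\" > out
  printf po, "customerStreet", $NF, "% Street \\" > out
  printf po, "customerZip", $3, "% Zip" > out
  close(out)                                   # avoid running out of file descriptors
}' data.csv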
I tried various approaches; the one below got nearest to working:
Replace multiple spaces with a single one
Replace commas (,) in the INTERNAL_IP column with a pipe (|)
Remove the 4th column (PREEMPTIBLE), as it was causing the IPs in the INTERNAL_IP column to shift under it
Replace spaces with commas (,) to prepare a CSV file
But it did not work; it gets messed up at the PREEMPTIBLE column.
gcloud compute instances list > file1
tr -s " " < file1 > file2      # replace multiple spaces with a single one
sed 's/,/|/g' file2 > file3    # replace , with a pipe
awk '{$4=""; print $0}' file3  # remove the 4th column
sed -e 's/\s\+/,/g' file3 > final.csv
Output of gcloud compute instances list command:
Expected format:
Any help or suggestion is appreciated. Thank you in advance.
Edit:
Attached sample input and expected output files:
sample_input.txt
expected_output.xlsx
CSV format is supported by the gcloud CLI, so everything you are doing can be done without sed/awk (add | tail -n +2 if you want to skip the column header):
gcloud compute instances list --format="csv(NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS)" > final.csv
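For example, combining that with the tail mentioned above to write the file without its header row:
gcloud compute instances list --format="csv(NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS)" | tail -n +2 > final.csv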
Or if you wanted to do something with the data in your bash script:
while IFS="," read -r NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
do
echo "NAME=$NAME ZONE=$ZONE MACHINE_TYPE=$MACHINE_TYPE PREEMPTIBLE=$PREEMPTIBLE INTERNAL_IP=$INTERNAL_IP EXTERNAL_IP=$EXTERNAL_IP STATUS=$STATUS"
done < <(gcloud compute instances list --format="csv(NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS)" | tail -n +2)
Based on the attached sample input & expected output files I have made the following changes:
Some of the instances have multiple internal IPs, and they are separated by ",". I have replaced that "," with "-" using sed 's/,/-/g' to avoid conflicts with other fields, as we are generating a CSV.
$4 and $5 are printed as the 5th and 7th output columns (leaving the 4th and 6th empty) so that they line up with the column headers INTERNAL_IP and STATUS.
grep -v 'NAME' command_output.txt | sed 's/,/-/g' | awk 'BEGIN {print "NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS"} {print $1 "," $2 "," $3 ", ," $4 ", ," $5}'
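The grep/sed/awk chain can also be collapsed into a single awk call (a sketch, assuming the same whitespace-separated command_output.txt layout):
awk 'BEGIN {print "NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS"}  # CSV header
     !/NAME/ {gsub(/,/,"-")                            # skip the gcloud header; "," -> "-"
              print $1 "," $2 "," $3 ", ," $4 ", ," $5 # empty 4th and 6th columns
     }' command_output.txt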
I need to grep the values of ErrCode, ErrAttKey and ErrDesc from the input file below
and display them as shown here in another file.
How can I do this using a shell script?
Required output
ErrCode|ErrAttKey|ErrDesc
003010|A3|The Unique Record ID already Exists
008024|A8|Prepaid / Postpaid not specified
Input File
<TariffRecords><Tariff><UniqueID>TT07PMST0088</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMST0086</UniqueID><SubStat>Success</SubStat><ErrCode>000000</ErrCode><ErrAttKey></ErrAttKey><ErrDesc>SUCCESS</ErrDesc></Tariff><Tariff><UniqueID>TT07PMCM0048</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMCM0049</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMPV0188</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMTP0060</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMVS0072</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMPO0073</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMPO0073</UniqueID><SubStat>Failure</SubStat><ErrCode>008024</ErrCode><ErrAttKey>A8</ErrAttKey><ErrDesc>Prepaid' / Postpaid not 'specified</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>005020</ErrCode><ErrAttKey>A5</ErrAttKey><ErrDesc>Invalid' LSA 'Name</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>008024</ErrCode><ErrAttKey>A8</ErrAttKey><ErrDesc>Prepaid' / Postpaid not 'specified</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>015038</ErrCode><ErrAttKey>A15</ErrAttKey><ErrDesc>Regular' / Promotional is 'compulsory</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>018048</ErrCode><ErrAttKey>A18</ErrAttKey><ErrDesc>Special' Eligibility Conditions cannot be left blank. If no conditions, please enter '`NIL`</ErrDesc></Tariff><Tariff><UniqueID>TT07PMTP0080</UniqueID><SubStat>Success</SubStat><ErrCode>000000</ErrCode><ErrAttKey></ErrAttKey><ErrDesc>SUCCESS</ErrDesc></Tariff></TariffRecords>
EDIT: Per the OP, all results should be shown even when they occur multiple times in Input_file, in which case the following may help.
awk '{gsub(/></,">"RS"<")} 1' Input_file |
awk -F"[><]" -v time="$(date +%r)" -v date="$(date +%d/%m/%Y)" '
/ErrCode/||/ErrAttKey/||/ErrDesc/{
val=val?val OFS $3:$3
}
/<\/Tariff>/{
print val,date,time,FILENAME;
val=""
}' OFS="|"
I am surprised that you are saying all of this is actually on a single line.
In case you want to break it into multiple lines (which really should be the case), run the following single awk command first:
awk '{gsub(/></,">"RS"<")} 1' Input_file > temp_file && mv temp_file Input_file
awk -F"[><]" '/ErrCode/{value=$3;a[value]++} a[value]==1 && NF>3 &&(/ErrCode/||/ErrAttKey/||/ErrDesc/){val=val?val OFS $3:$3} /<\/Tariff>/{if(val && val ~ /^[0-9]/){print val};val=""}' Input_file
In case you don't want to rewrite your Input_file into the multiple-line form, run these 2 commands in a pipeline as follows.
awk '{gsub(/></,">"RS"<")} 1' Input_file |
awk -F"[><]" '
/ErrCode/{
value=$3;
a[value]++
}
a[value]==1 && NF>3 && (/ErrCode/||/ErrAttKey/||/ErrDesc/){
val=val?val OFS $3:$3
}
/<\/Tariff>/{
if(val && val ~ /^[0-9]/){
print val};
val=""
}'
NOTE: 2 points to be noted here. 1st: if any tag's ErrCode value is null or does not start with digits, that tag's values will not be printed. 2nd: it will not print duplicate values of the ErrCode tag.
Assuming the content of your XML is in a file file.txt, the following will work:
echo "ErrCode|ErrAttKey|ErrDesc" && sed 's/<Tariff>/\n/g' file.txt | sed 's/.*<ErrCode>//g;s/<.*<ErrAttKey>/|/g;s/<.*<ErrDesc>/|/g;s/<.*//g' | grep -v '^$'
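Another option, sketched here assuming gawk or mawk (a multi-character RS is not POSIX awk): treat </Tariff> as the record separator and carve the three tags out of each record. Unlike the deduplicating answer above, this prints every failure occurrence:
awk 'BEGIN { RS="</Tariff>"; print "ErrCode|ErrAttKey|ErrDesc" }
{
  code=$0; sub(/.*<ErrCode>/, "", code); sub(/<.*/, "", code)   # isolate the <ErrCode> value
  key=$0;  sub(/.*<ErrAttKey>/, "", key); sub(/<.*/, "", key)   # isolate the <ErrAttKey> value
  desc=$0; sub(/.*<ErrDesc>/, "", desc); sub(/<.*/, "", desc)   # isolate the <ErrDesc> value
  if (code != "" && code != "000000")                           # drop the trailer and SUCCESS rows
    print code "|" key "|" desc
}' file.txt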
Sample Logs
Location    Number    Status        Comment
Delhi       919xxx    Processed     Test File 1
Mumbai      918xxx    Got Stucked   Test File 123
I'm trying to add one tab space after the Status column using AWK, but it's not working as expected.
Sample Query
awk '{$3 = $3 "\t"; print}' z
Getting Output As
Location Number Status Comment
Delhi 919xxx Processed Test File 1
Mumbai 918xxx **Got** **Stucked** Test File 123
Since awk is taking 'Got Stucked' as multiple fields, please suggest a fix.
If you only want one tab after the header text Status to make it look better, use sub on the first record only:
$ awk 'NR==1 {sub(/Status/,"Status\t")} 1' file
Location    Number    Status	        Comment
Delhi       919xxx    Processed     Test File 1
Mumbai      918xxx    Got Stucked   Test File 123
This way awk won't rebuild the record and replace FS with OFS etc.
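For contrast, a field assignment on the same data forces awk to rebuild the record, joining all fields with OFS (a single space by default) and collapsing the alignment (a quick sketch):
$ awk 'NR==1 {$3=$3"\t"} 1' file
Location Number Status	 Comment
Delhi       919xxx    Processed     Test File 1
Mumbai      918xxx    Got Stucked   Test File 123
Only the first record was assigned to, so only it gets squeezed to single-space separators.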
@JamesBrown's answer sounds like what you asked for, but also consider:
$ awk -F'  +' -v OFS='\t' '{$1=$1}1' file | column -s$'\t' -t
Location  Number  Status       Comment
Delhi     919xxx  Processed    Test File 1
Mumbai    918xxx  Got Stucked  Test File 123
The awk converts every sequence of 2+ spaces to a tab so the result is a tab-separated stream which column can then convert to a visually aligned table if that's your ultimate goal. Or you could generate a CSV to read into Excel or similar:
$ awk -F'  +' -v OFS=',' '{$1=$1}1' file
Location,Number,Status,Comment
Delhi,919xxx,Processed,Test File 1
Mumbai,918xxx,Got Stucked,Test File 123
$ awk -F'  +' -v OFS=',' '{for(i=1;i<=NF;i++) $i="\""$i"\""}1' file
"Location","Number","Status","Comment"
"Delhi","919xxx","Processed","Test File 1"
"Mumbai","918xxx","Got Stucked","Test File 123"
or more robustly:
$ awk -F'  +' -v OFS=',' '{$1=$1; for(i=1;i<=NF;i++) { gsub(/"/,"\"\"",$i); if ($i~/[[:space:],"]/) $i="\""$i"\"" } }1' file
Location,Number,Status,Comment
Delhi,919xxx,Processed,"Test File 1"
Mumbai,918xxx,"Got Stucked","Test File 123"
If your input fields aren't always separated by at least 2 blank chars then tell us how they are separated.
Try using
awk -F'  ' '{$3 = $3 "\t"; print}' z
The problem is that awk by default treats any run of whitespace as the separator between two columns. This means that Got Stucked is parsed as two different columns.
With -F'  ' you tell awk to use a double space as the separator, so the single space inside Got Stucked no longer splits it.
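A quick way to see the difference on the sample file z (a sketch; this assumes the columns are separated by two or more spaces):
$ awk 'NR==3 {print $3 "|" $4}' z    # default FS: any whitespace splits, so the field breaks
Got|Stucked
$ awk -F'  +' 'NR==3 {print $3}' z   # 2+ spaces as FS: the field stays intact
Got Stucked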
I am working on a shell script which contains the following piece of code.
I don't understand these lines, mostly the cut command and the export command. Can anyone help me?
Also, please point me to a good Linux command reference.
Thanks in advance!
# determine sum of 60 records
awk '{
if (substr($0,12,2) == "60" || substr($0,12,2) == "78") \
print $0
}'< /tmp/checks$$.1 > /tmp/checks$$.2
rec_sum=`cut -c 151-160 /tmp/checks$$.2 | /u/fourgen/cashnet/bin/sumit`
export rec_sum
Inside my sumit script, the following is the code:
awk '{ total += $1}
END {print total}' $1
Let me show my main script, prep_chk:
awk 'BEGIN{OFS=""} {if (substr($0,12,2) == "60" && substr($0,151,1) == "-") \
{ print substr($0,1,11), "78", substr($0,14) } \
else \
{ print $0 } \
}' > /tmp/checks$$.1
# determine count of non-header record
rec_cnt=`wc -l < /tmp/checks$$.1`
rec_cnt=`expr "$rec_cnt" - 1`
export rec_cnt
# determine sum of 60 records
awk '{ if (substr($0,12,2) == "60" || substr($0,12,2) == "78") \
print $0 }'< /tmp/checks$$.1 > /tmp/checks$$.2
rec_sum=`cut -c 151-160 /tmp/checks$$.2 | /u/fourgen/cashnet/bin/sumit`
export rec_sum
# make a new header record and output it
head -1 /tmp/checks$$.1 | awk '{ printf("%s%011.11d%05.5d%s\n", \
substr($0,1,45), rec_sum, rec_cnt, substr($0,62)) }' \
rec_sum="$rec_sum" rec_cnt="$rec_cnt"
# output everything else sorted by tran code
grep -v "%%%%%%%%%%%" /tmp/checks$$.1 | cut -c 1-150 | sort -k 1.12,1.13
cut -c extracts characters by position from each line, in this case characters 151 to 160 of the file /tmp/checks$$.2. That string is piped to the sumit script, which produces some output.
That output is then assigned to the variable rec_sum. The export command makes this variable available to child processes, for example another shell script launched from this one.
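Two tiny illustrations of those pieces (hypothetical values):
$ echo 'abcdefghij' | cut -c 3-5    # characters 3 through 5 of each line
cde
$ rec_sum=42; export rec_sum
$ bash -c 'echo "$rec_sum"'         # exported, so visible in the child process
42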
Edit:
If that's all you have inside your sumit script, it simply sums the first column of whatever it is given (each value must be a number) into total and prints the total at the end. Note that when sumit is invoked with no filename argument, as in the pipeline above, its "$1" expands to nothing and the awk reads standard input instead, which is why the pipe from cut works. It seems like there must be some more code inside that script, otherwise it would be a bit of an overcomplicated way to do it.
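A quick check of both calling styles (a sketch; it assumes sumit is executable and on your PATH):
$ printf '10\n20\n12\n' | sumit    # no argument: "$1" is empty, awk reads stdin
42
$ printf '1\n2\n' > nums
$ sumit nums                       # with an argument: awk reads the named file
3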