AWK script automatically removing leading 0s from String

I have a file BLACK.FUL.eg2:
10>BLACK.FUL>272/GSMA/000000>151006>01
15>004401074905590>004401074905590>B>I>0011>Insert>240/PLMN/000100>>5000-K525122-15
15>004402145955010>004402145955010>B>I>0011>Insert>240/PLMN/000100>>1200-K108534-14
15>004402146016260>004402146016360>B>I>0011>Insert>240/PLMN/000100>>1200-K-94878-14
15>004402452698630>004402452698630>B>I>0011>Insert>240/PLMN/000100>>5000-K538947-14
90>BLACK.FUL>272/GSMA/000000>151006>01>4
I've written this AWK script:
awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" \
'{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; \
else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; \
else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); \
while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} \
else {d = $1;while(d <= $2) print d++","$3","$4;}}}'
which gives me an output of:
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
440214601626,0011,240/PLMN/000100
440214601627,0011,240/PLMN/000100
440214601628,0011,240/PLMN/000100
440214601629,0011,240/PLMN/000100
440214601630,0011,240/PLMN/000100
440214601631,0011,240/PLMN/000100
440214601632,0011,240/PLMN/000100
440214601633,0011,240/PLMN/000100
440214601634,0011,240/PLMN/000100
440214601635,0011,240/PLMN/000100
440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
with one problem: the leading 0s in field 1 are removed automatically because of the numeric operation performed on them. So my actual expected output is:
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
For that I'm trying the below updated AWK script:
awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" \
'{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; \
else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; \
else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); \
while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} \
else {d = $1; for ( i=1;i<length($1);i++ ) if (substr($1,i++,1) == "0") \
{m=m"0"; else exit 1;}; while(d <= $2) print md++","$3","$4;}}}'
But I'm getting an error:
awk: cmd. line:4: {m=m"0"; else exit 1;}; while(d <= $2) print md++","$3","$4;}}}
awk: cmd. line:4: ^ syntax error
Can you please point out what I'm doing wrong? A modification of my existing AWK script that achieves the expected output would help most. Thanks.
NOTE: There can be any number of leading 0s, not only two as in the example output above.

Since your field sizes are fixed, for the given example just change the last print statement to
$ awk ... printf "%014d,%s,%s\n",d++,$3,$4}}}'
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
UPDATE
If your field size is not fixed, you can capture the length (or the desired length) and use the same pattern. Since your code is rather complicated, I'm going to write a proof of concept that you can embed into your script.
This is essentially your problem: incrementing a zero-padded number drops the leading zeros.
$ echo 0001 | awk '{$1++; print $1}'
2
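This is the proposed solution: capture the length before incrementing and use it as the zero-padding width. It uses %d because zero-padding a %s conversion is not portable across awk implementations.
$ echo 0001 | awk '{n=length($1); $1++; printf "%0" n "d\n", $1}'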
0002
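The same idea can be applied to a whole range, which is what the loop in the question does. The following is only a minimal sketch, not the asker's full pipeline: the function name expand_range and the two-field "start,end" input are assumptions made for illustration. It records the width of the start value before any arithmetic and reuses it as the padding width. It formats with %.0f instead of %d as a precaution, since some awk builds may clip %d to the 32-bit integer range and these values exceed it.

```shell
# expand_range (hypothetical helper): reads "start,end" lines and prints every
# value in the range, zero-padded to the width of the start value.
expand_range() {
  awk -F',' '{
    width = length($1)              # width taken before the value is used as a number
    fmt = "%0" width ".0f\n"        # e.g. "%014.0f\n"
    for (d = $1 + 0; d <= $2 + 0; d++)
      printf fmt, d
  }'
}

printf '00440214601626,00440214601628\n' | expand_range
```

Because the width is read from the input, this should work for any number of leading zeros, as long as incrementing never needs more digits than the start value has.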

Related

bash convert rows into columns in table format

I'm trying to convert rows into columns in table format.
Server Name : dev1-151vd
Status : DONE
Begin time : 2021-12-20 04:30:05.458719-05:00
End time : 2021-12-20 04:33:15.549731-05:00
Server Name : dev2-152vd
Status : DONE
Begin time : 2021-12-20 04:30:05.405746-05:00
End time : 2021-12-20 04:30:45.212935-05:00
I used the following awk script to transpose rows into columns
awk -F":" -v n=4 \
'BEGIN { x=1; c=0;}
++c <= n && x == 1 {print $1; buf = buf $2 "\n";
if(c == n) {x = 2; printf buf} next;}
!/./{c=0;next}
c <=n {printf "%4s\n", $2}' temp1.txt | \
paste - - - - | \
column -t -s "$(printf "\t")"
Server Name Status Begin time End time
dev1-151vd DONE 2021-12-20 04 2021-12-20 04
dev2-152vd DONE 2021-12-20 04 2021-12-20 04
The above output doesn't show the full begin and end times. Please let me know how to get the formatting right so that the times are printed completely.

$ cat tst.awk
BEGIN { OFS="\t" }
NF {
    if ( ++fldNr == 1 ) {
        recNr++
        rec = ""
    }
    tag = val = $0
    sub(/[[:space:]]*:.*/,"",tag)
    sub(/[^:]+:[[:space:]]*/,"",val)
    hdr = hdr (fldNr>1 ? OFS : "") tag
    rec = rec (fldNr>1 ? OFS : "") val
    next
}
{
    if ( recNr == 1 ) {
        print hdr
    }
    print rec
    fldNr = 0
}
END { if (fldNr) print rec }
$ awk -f tst.awk file | column -s$'\t' -t
Server Name Status Begin time End time
dev1-151vd DONE 2021-12-20 04:30:05.458719-05:00 2021-12-20 04:33:15.549731-05:00
dev2-152vd DONE 2021-12-20 04:30:05.405746-05:00 2021-12-20 04:30:45.212935-05:00
The above will work no matter how many lines per record you have in your input, and regardless of whether the values contain other colons, % characters, or anything else.

See this script:
awk -F": " -v n=4 \
'BEGIN { x=1; c=0;}
++c <= n && x == 1 {print $1; buf = buf $2 "\n";
if(c == n) {x = 2; printf buf} next;}
!/./{c=0;next}
c <=n {printf "%4s\n", $2}' 20211222.txt | \
paste - - - - | \
column -t -s "$(printf "\t")"
Output:
Server Name Status Begin time End time
dev1-151vd DONE 2021-12-20 04:30:05.458719-05:00 2021-12-20 04:33:15.549731-05:00
dev2-152vd DONE 2021-12-20 04:30:05.405746-05:00 2021-12-20 04:30:45.212935-05:00
Explanation:
In awk, the -F option sets the field separator. In your code you used a colon to separate the columns. However, some lines in your input contain more than one colon (each timestamp alone has three), so awk splits those lines into five fields instead of two.
The solution is to add a space to the field separator (": "), since your input has a space after the first colon, before the value.
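The effect of the separator can be checked directly by printing NF. This is a small sketch for illustration only; the helper name count_fields is an assumption and is not part of the original script.

```shell
# count_fields SEP (hypothetical helper): print how many fields awk sees per line
count_fields() {
  awk -F"$1" '{ print NF }'
}

line='Begin time : 2021-12-20 04:30:05.458719-05:00'
printf '%s\n' "$line" | count_fields ':'     # bare colon: prints 5
printf '%s\n' "$line" | count_fields ': '    # colon plus space: prints 2
```

With the two-character separator, only the first colon is followed by a space, so the whole timestamp stays in $2.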

How to separate number and unit from variable when using awk

In a 10-line awk script I need to split the content of a variable into a number variable and a unit variable. Here is a simplified example:
~$ echo 139506MB | awk '{
ex = index("KMGTPEZY", substr($1, length($1)));
val = substr($1, 0, length($1) - 2);
print ex " " val
}'
0 139506
I know the unit part is always 2 characters, but for some reason ex is always 0 instead of MB as I was hoping.
Question
Why doesn't ex contain the unit?

The logic in your index() call is wrong: the character you extracted is not part of the string you defined, hence the return value of 0.
For a regex approach, GNU Awk's match() function can store captured groups in an array. Here the groups are stored in the array ary, from which you can access elements 1 and 2.
echo 139506MB | gawk 'match($0, /([[:digit:]]+)([[:alpha:]]+)/, ary) {print ary[1], ary[2]}'

Your substr() call is substr($1, length($1)) which will return only the last character of $1 (B). This character is not part of the string KMGTPEZY.
$ echo '139506MB' | awk '{ n=$1+0; sub(n,"",$1); print $1,n }'
MB 139506
This uses the fact that converting a string to a number discards everything from the first non-digit onwards. This allows us to store the number in n using $1+0 (which forces the first field to be interpreted as a number). We then remove the number from the original field using sub(). The remaining text and the number are then printed.

Using GNU awk, you can pass split() a fourth argument (seps) and use the unit pattern .B as the separator, so that the unit ends up in seps and the number in a:
$ echo 139506MB | awk '{split($1,a,/.B/,seps);print seps[1],a[1]}'
MB 139506
Also, regarding your code: you are (trying to) get the index of M in the string KMGTPEZY, so I assume you are looking for ex==2. Fix the substr call as shown below (the val line also starts at 1 rather than 0, since awk strings are 1-indexed and implementations differ on how they treat a start of 0):
$ echo 139506MB | awk '{
    ex = index("KMGTPEZY", substr($1, length($1)-1,1));  # was substr($1, length($1))
    # ex = substr($1, length($1)-1,1);                   # uncomment to get the unit letter instead
    val = substr($1, 1, length($1) - 2);
    print ex " " val
}'
2 139506
Maybe you should update the OP with the expected output.

The following awk may also help:
str="139506MB"
echo "$str" | awk '
match($0,/[0-9]+/){
val=substr($0,RSTART+RLENGTH);
if(val ~ /[a-zA-Z]+/){
print substr($0,RSTART,RLENGTH),val}
}'

The first issue is here:
substr($1, length($1))
You are getting the last character of the string, which is "B". There is no "B" in "KMGTPEZY", so index returns 0.
I don't think you need to use index at all. To use substr (starting at 1, since awk strings are 1-indexed):
ex = substr($1, length($1) - 1);
val = substr($1, 1, length($1) - 2);
Testing:
$ awk '{ print substr($1, length($1) - 1), substr($1, 1, length($1) - 2) }' <<< '139506MB'
MB 139506
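If the unit is not guaranteed to be exactly two characters, the split point can be found with the standard match() function instead of fixed offsets. This is a hedged sketch using only POSIX awk features; the function name split_unit and the three-column output are assumptions for illustration. The third column shows what index() returns once it is given the first letter of the unit rather than the last.

```shell
# split_unit (hypothetical helper): print the number, the unit, and the
# position of the unit's first letter in "KMGTPEZY".
split_unit() {
  awk '{
    if (match($1, /^[0-9]+/)) {          # RLENGTH = length of the leading digits
      num  = substr($1, 1, RLENGTH)
      unit = substr($1, RLENGTH + 1)     # everything after the digits
      print num, unit, index("KMGTPEZY", substr(unit, 1, 1))
    }
  }'
}

printf '139506MB\n' | split_unit    # prints: 139506 MB 2
```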

parse text vertical to horizontal

I'm looking to parse the following data:
T
E
S
T
_
7
TTTTTTT
EEEEEEE
SSSSSSS
TTTTTTT
_______
5679111
012
into something like:
TEST_7
TEST_5, TEST_6, TEST_7, TEST_9, TEST_10, TEST_11, TEST_12
Any suggestions would help. Thanks.

awk to the rescue!
This is basically a transpose operation
awk 'BEGIN {FS=""}
{for(i=1;i<=NF;i++) a[NR,i]=$i;
if(max<NF)max=NF}
END {for(i=1;i<=max;i++)
{for(j=1;j<=NR;j++) printf "%s",a[j,i];
print ""}}' file
TEST_7TEST_5
TEST_6
TEST_7
TEST_9
TEST_10
TEST_11
TEST_12
You need to explain the rules for transforming this into your desired layout.
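An empty FS is not specified by POSIX, so the script above may not split into characters in every awk. As a rough, portable sketch of the same transpose (the function name transpose_chars is an assumption for illustration), the characters can be picked out with substr() instead:

```shell
# transpose_chars (hypothetical helper): turn columns of characters into rows
transpose_chars() {
  awk '{
    line[NR] = $0
    if (length($0) > max) max = length($0)   # widest input line
  }
  END {
    for (i = 1; i <= max; i++) {             # one output row per input column
      out = ""
      for (j = 1; j <= NR; j++)
        out = out substr(line[j], i, 1)      # empty when line j is too short
      print out
    }
  }'
}

printf 'TT\nEE\nSS\nTT\n__\n56\n' | transpose_chars
```

Like the original answer, this only transposes; grouping the blocks and joining them with commas still depends on rules the question does not state.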

Python:
#!/usr/bin/python
txt='''\
T
E
S
T
_
7
TTTTTTT
EEEEEEE
SSSSSSS
TTTTTTT
_______
5679111
012 '''
row_len=max(len(line.rstrip()) for line in txt.splitlines())
arr=[list('{:{w}}'.format(line.rstrip(), w=row_len)) for line in txt.splitlines()]
print('\n'.join([''.join(t) for t in zip(*arr)]))
Or, awk:
awk 'BEGIN{RS="[ ]*\n"}
{lines[NR]=$0
max=length($0)>max ? length($0) : max }
END{ for (i=1; i in lines; i++)
lines[i]=sprintf("%-*s", max, lines[i])
for (i=1;i<=max; i++){
for (j=1; j in lines; j++)
printf "%s", substr(lines[j], i, 1)
print ""
}
}' file
Prints:
TEST_7TEST_5
TEST_6
TEST_7
TEST_9
TEST_10
TEST_11
TEST_12

In awk (well GNU awk for -F ''):
$ awk -F '' '
NR!=1 && NF!=p {
for(i=1;i<=p;i++)
printf "%s%s",a[i],(i==p?ORS:"")
delete a
p=NF }
NR==1 || NF==p {
for(i=1;i<=NF;i++)
a[i]=a[i] $i
p=NF
j++ }
END {
for(i=1;i<=p;i++)
printf "%s%s",a[i],(i==p?ORS:", ") }
' file
TEST_7
TEST_5 , TEST_6 , TEST_7 , TEST_9 , TEST_10, TEST_11, TEST_12
It detects a change in record length (NF, actually) and prints the buffered output when that happens.

Search and replace null and dot in a column of file

I want to search and replace null and dot in a column of file using awk or sed.
The file's content is:
02-01-12 28.46
02-02-12 27.15
02-03-12
02-04-12 27.36
02-05-12 47.57
02-06-12 27.01
02-07-12 27.41
02-08-12 27.27
02-09-12 27.39
02-10-12 .
02-11-12 27.44
02-12-12 49.93
02-13-12 26.99
02-14-12 27.47
02-15-12 27.21
02-16-12 27.48
02-17-12 27.66
02-18-12 27.15
02-19-12 51.74
02-20-12 27.37
The dots and null values can appear in any row of the file. I want to replace them with the value from the row above, like this:
02-01-12 28.46
02-02-12 27.15
02-03-12 27.15 ****** replace with the above value
02-04-12 27.36
02-05-12 47.57
02-06-12 27.01
02-07-12 27.41
02-08-12 27.27
02-09-12 27.39
02-10-12 27.39 ****** replace with the above value
02-11-12 27.44
02-12-12 49.93
02-13-12 26.99
02-14-12 27.47
02-15-12 27.21
02-16-12 27.48
02-17-12 27.66
02-18-12 27.15
02-19-12 51.74
02-20-12 27.37

This might work for you (GNU sed):
sed -i '$!N;s/^\(.\{9\}\(.*\)\n.\{9\}\)\.\?$/\1\2/;P;D' file

awk 'BEGIN {prev = "00.00"} NF < 2 || $2 == "." {$2 = prev} {prev = $2; print}' filename
If you have multiple columns which might have missing data:
awk 'BEGIN {p = "00.00"} {for (i = 1; i <= NF; i++) {if (! $i || $i == ".") {if (prev[i]) {$i = prev[i]} else {$i = p}}; prev[i] = $i}; print}' filename

The following awk script should work:
BEGIN {
last="00.00"
}
{
if ($2 != "" && $2 != ".") {
last=$2
}
print $1 " " last
}

$ awk -v prev=00.00 'NF<2 || $2=="." { print $1, prev; next }{prev=$2}1' input-file

unfolding a file on linux

I have a huge text file on Linux, approximately 400,000 lines, each 80 characters wide.
I need to "unfold" the file by merging every four lines into one,
ending up with 1/4 of the lines, each 80*4 characters long.
Any suggestions?

perl -pe 'chomp if (++$i % 4);'

An easier way to do it with awk would be:
awk '{ printf "%s", $0 } NR % 4 == 0 { print "" }' filename
If you want to make sure the output still ends with a newline when the line count is not a multiple of four, it gets a little more complicated:
awk '{ printf "%s", $0 } NR % 4 == 0 { print "" } END { if (NR % 4 != 0) print "" }' filename

I hope I understood your question correctly. You have input like this (except your lines are longer):
abcdef
ghijkl
mnopqr
stuvwx
yz0123
456789
ABCDEF
You want output like this:
abcdefghijklmnopqrstuvwx
yz0123456789ABCDEF
The following awk program should do it:
{ line = line $0 }
(NR % 4) == 0 { print line; line = "" }
END { if (line != "") print line }
Run it like this:
awk -f merge.awk data.txt
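The group size does not have to be hard-coded. As a small sketch under that assumption, the same program can take the number of lines to merge as a variable; the wrapper name join_lines and the variable n are hypothetical and not part of the answer above.

```shell
# join_lines N (hypothetical helper): merge every N input lines into one
join_lines() {
  awk -v n="$1" '
    { buf = buf $0 }                       # accumulate the current group
    NR % n == 0 { print buf; buf = "" }    # group complete: flush it
    END { if (buf != "") print buf }       # flush a trailing partial group
  '
}

printf '%s\n' a b c d e f g | join_lines 4
```

For the seven sample lines this prints abcd and then efg, so a final incomplete group is still written out with its own newline.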
