I want to search for and replace null (missing) and dot values in a column of a file, using awk or sed.
The file's content is:
02-01-12 28.46
02-02-12 27.15
02-03-12
02-04-12 27.36
02-05-12 47.57
02-06-12 27.01
02-07-12 27.41
02-08-12 27.27
02-09-12 27.39
02-10-12 .
02-11-12 27.44
02-12-12 49.93
02-13-12 26.99
02-14-12 27.47
02-15-12 27.21
02-16-12 27.48
02-17-12 27.66
02-18-12 27.15
02-19-12 51.74
02-20-12 27.37
The dots and null values can appear on any row of the file. I want to replace each null or dot with the value from the row above, like this:
02-01-12 28.46
02-02-12 27.15
02-03-12 27.15 ****** replaced with the value above
02-04-12 27.36
02-05-12 47.57
02-06-12 27.01
02-07-12 27.41
02-08-12 27.27
02-09-12 27.39
02-10-12 27.39 ****** replaced with the value above
02-11-12 27.44
02-12-12 49.93
02-13-12 26.99
02-14-12 27.47
02-15-12 27.21
02-16-12 27.48
02-17-12 27.66
02-18-12 27.15
02-19-12 51.74
02-20-12 27.37
This might work for you (GNU sed):
sed -i '$!N;s/^\(.\{9\}\(.*\)\n.\{9\}\)\.\?$/\1\2/;P;D' file
awk 'BEGIN {prev = "00.00"} NF < 2 || $2 == "." {$2 = prev} {prev = $2; print}' filename
If you have multiple columns which might have missing data:
awk 'BEGIN {p = "00.00"} {for (i = 1; i <= NF; i++) {if (! $i || $i == ".") {if (prev[i]) {$i = prev[i]} else {$i = p}}; prev[i] = $i}; print}' filename
The following awk script should work:
BEGIN {
last="00.00"
}
{
if ($2 != "" && $2 != ".") {
last=$2
}
print $1 " " last
}
$ awk -v prev=00.00 'NF<2 || $2=="." { print $1, prev; next }{prev=$2}1' input-file
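As a quick sanity check, the last one-liner can be run against a shortened copy of the sample (the file name `data.txt` is just an illustration):

```shell
# Create a small sample with a missing value and a lone dot.
cat > data.txt <<'EOF'
02-01-12 28.46
02-02-12 27.15
02-03-12
02-04-12 27.36
02-05-12 .
EOF

# Carry the previous value forward for short or "." rows.
awk -v prev=00.00 'NF<2 || $2=="." { print $1, prev; next } {prev=$2} 1' data.txt
```

Both the bare `02-03-12` row and the `02-05-12 .` row pick up the value from the line above.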
I want to change all duplicate names in a .csv to unique ones, but after finding a duplicate I cannot reach the previous line, because it has already been printed. I've tried saving all lines in an array and printing them in the END section, but it doesn't work, and I don't understand how to access a specific field in that array (aren't two-dimensional arrays supported in awk?).
sample input
...,9,phone,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone,...
desired output
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
My attempt ($2 is the id field, $3 is the name field):
BEGIN{
FS=","
OFS=","
marker=777
}
{
if (names[$3] == marker) {
$3 = $3 $2
#Attempt to change previous duplicate
results[nameLines[$3]]=$3 id[$3]
}
names[$3] = marker
id[$3] = $2
nameLines[$3] = NR
results[NR] = $0
}
END{
#it prints some numbers, not saved lines
for(result in results)
print result
}
Here is a single-pass awk that stores all records in a buffer:
awk -F, '
{
rec[NR] = $0
++fq[$3]
}
END {
for (i=1; i<=NR; ++i) {
n = split(rec[i], a, /,/)
if (fq[a[3]] > 1)
a[3] = a[3] a[2]
for (k=1; k<=n; ++k)
printf "%s", a[k] (k < n ? FS : ORS)
}
}' file
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
This could easily be done by reading Input_file twice in awk, with no need for two-dimensional arrays. Written and tested in GNU awk with your shown samples.
awk '
BEGIN{FS=OFS=","}
FNR==NR{
arr1[$3]++
next
}
{
$3=(arr1[$3]>1?$3 $2:$3)
}
1
' Input_file Input_file
Output will be as follows:
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
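As a sanity check (treating the literal `...` placeholders as ordinary fields, and with `names.csv` as an illustrative file name), the two-pass command can be exercised like this:

```shell
cat > names.csv <<'EOF'
...,9,phone,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone,...
EOF

# Pass 1 counts each name; pass 2 appends the id to duplicated names.
awk 'BEGIN{FS=OFS=","} FNR==NR{arr1[$3]++; next} {$3=(arr1[$3]>1?$3 $2:$3)} 1' names.csv names.csv
```

Only `phone`, which occurs twice, gets its id appended; the unique names pass through unchanged.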
I have a .txt file with these records:
field_1 value01a value01b value01c
field_2 value02
field_3 value03a value03b value03c
field_1 value11
field_2 value12a value12b
field_3 value13
field_1 value21
field_2 value22
field_3 value23
...
field_1 valuen1
field_2 valuen2
field_3 valuen3
I would like to convert them like this:
field1 field2 field3
value01a value01b value01c value02 value03a value03b value03c
value11 value12a value12b value13
value21 value22 value23
...
valuen1 valuen2 valuen3
I have tried something like:
awk '{for (i = 1; i <NR; i ++) FNR == i {print i, $ (i + 1)}}' filename
or like
awk '
{
for (i=1; i<=NF; i++) {
a[NR,i] = $i
}
}
NF>p { p = NF }
END {
for(j=1; j<=p; j++) {
str=a[1,j]
for(i=2; i<=NR; i++){
str=str" "a[i,j]
}
print str
}
}'
but I can't get it to work.
I would like the values to be transposed so that each tuple of values associated with a specific field is aligned with the others.
Any suggestions?
Thank you in advance.
I have downloaded your bigger sample file, and here is what I have come up with:
awk -v OFS='\t' -v RS= '
((n = split($0, a, / {2,}| *\n/)) % 2) == 0 {
# print header
if (NR==1)
for (i=1; i<=n; i+=2)
printf "%s", a[i] (i < n-1 ? OFS : ORS)
# print all records
for (i=2; i<=n; i+=2)
printf "%s", a[i] (i < n ? OFS : ORS)
}' reclamiTestFile.txt | column -t -s $'\t'
Could you please try the following, written and tested in GNU awk with the shown samples:
awk '
{
first=$1
$1=""
sub(/^ +/,"")
if(!arr[first]++){
++indArr
counter[indArr]=first
}
++count[first]
arr[first OFS count[first]]=$0
}
END{
for(j=1;j<=indArr;j++){
printf("%s\t%s",counter[j],j==indArr?ORS:"\t")
}
for(i=1;i<=FNR;i++){
for(j=1;j<=indArr;j++){
if(arr[counter[j] OFS i]){
printf("%s\t%s",arr[counter[j] OFS i],j==indArr?ORS:"\t")
}
}
}
}' Input_file | column -t -s $'\t'
The column command is taken from anubhava's answer above.
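If the records are guaranteed to arrive in repeating field_1/field_2/field_3 order, as in the sample shown, a simpler sketch is also possible (the file name `records.txt` is illustrative, and the assumption about strict ordering is mine):

```shell
cat > records.txt <<'EOF'
field_1 value01a value01b value01c
field_2 value02
field_3 value03a value03b value03c
field_1 value11
field_2 value12a value12b
field_3 value13
EOF

# Buffer each field's values; emit one line per group when field_3 arrives.
awk '{f=$1; sub(/^[^ ]+ /,""); row[f]=$0}
     f=="field_3" {print row["field_1"], row["field_2"], row["field_3"]}' records.txt
```

This prints one line per field_1/field_2/field_3 group, with each field's values joined by spaces.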
I'm looking to parse the following data:
T
E
S
T
_
7
TTTTTTT
EEEEEEE
SSSSSSS
TTTTTTT
_______
5679111
012
into something like:
TEST_7
TEST_5, TEST_6, TEST_7, TEST_9, TEST_10, TEST_11, TEST_12
Any suggestions would help. Thank you.
awk to the rescue!
This is basically a transpose operation:
awk 'BEGIN {FS=""}
{for(i=1;i<=NF;i++) a[NR,i]=$i;
if(max<NF)max=NF}
END {for(i=1;i<=max;i++)
{for(j=1;j<=NR;j++) printf "%s",a[j,i];
print ""}}' file
TEST_7TEST_5
TEST_6
TEST_7
TEST_9
TEST_10
TEST_11
TEST_12
You need to explain the rules for transforming this into your desired layout.
Python:
#!/usr/bin/python
txt='''\
T
E
S
T
_
7
TTTTTTT
EEEEEEE
SSSSSSS
TTTTTTT
_______
5679111
012 '''
row_len=max(len(line.rstrip()) for line in txt.splitlines())
arr=[list('{:{w}}'.format(line.rstrip(), w=row_len)) for line in txt.splitlines()]
print('\n'.join(''.join(t) for t in zip(*arr)))
Or, awk:
awk 'BEGIN{RS="[ ]*\n"}
{lines[NR]=$0
max=length($0)>max ? length($0) : max }
END{ for (i=1; i in lines; i++)
lines[i]=sprintf("%-*s", max, lines[i])
for (i=1;i<=max; i++){
for (j=1; j in lines; j++)
printf "%s", substr(lines[j], i, 1)
print ""
}
}' file
Prints:
TEST_7TEST_5
TEST_6
TEST_7
TEST_9
TEST_10
TEST_11
TEST_12
In awk (well GNU awk for -F ''):
$ awk -F '' '
NR!=1 && NF!=p {
for(i=1;i<=p;i++)
printf "%s%s",a[i],(i==p?ORS:"")
delete a
p=NF }
NR==1 || NF==p {
for(i=1;i<=NF;i++)
a[i]=a[i] $i
p=NF
j++ }
END {
for(i=1;i<=p;i++)
printf "%s%s",a[i],(i==p?ORS:", ") }
' file
TEST_7
TEST_5 , TEST_6 , TEST_7 , TEST_9 , TEST_10, TEST_11, TEST_12
It detects a change (and prints the buffered columns) when the record length (actually NF) changes.
I'm struggling to reformat a comma-separated file using awk. The file contains minute data for a day, for multiple servers and multiple metrics,
e.g. 2 records per minute, per server, for 24 hrs.
Example input file:
server01,00:01:00,AckDelayAverage,9999
server01,00:01:00,AckDelayMax,8888
server01,00:02:00,AckDelayAverage,666
server01,00:02:00,AckDelayMax,5555
.....
server01,23:58:00,AckDelayAverage,4545
server01,23:58:00,AckDelayMax,8777
server01,23:59:00,AckDelayAverage,4686
server01,23:59:00,AckDelayMax,7820
server02,00:01:00,AckDelayAverage,1231
server02,00:01:00,AckDelayMax,4185
server02,00:02:00,AckDelayAverage,1843
server02,00:02:00,AckDelayMax,9982
.....
server02,23:58:00,AckDelayAverage,1022
server02,23:58:00,AckDelayMax,1772
server02,23:59:00,AckDelayAverage,1813
server02,23:59:00,AckDelayMax,9891
I'm trying to reformat the file to have a single row for each minute, with a unique concatenation of fields 1 & 3 as the column headers,
e.g. the expected output file would look like:
Minute, server01-AckDelayAverage,server01-AckDelayMax, server02-AckDelayAverage,server02-AckDelayMax
00:01:00,9999,8888,1231,4185
00:02:00,666,5555,1843,9982
...
...
23:58:00,4545,8777,1022,1772
23:59:00,4686,7820,1813,9891
A solution using GNU awk. Call this as awk -F, -f script input_file:
/Average/ { average[$2, $1] = $4; }
/Max/ { maximum[$2, $1] = $4; }
{
if (!($2 in minutes)) {
minutes[$2] = 1;
}
if (!($1 in servers)) {
servers[$1] = 1;
}
}
END {
mcount = asorti(minutes, smin);
scount = asorti(servers, sserv);
printf "minutes";
for (col = 1; col <= scount; col++) {
printf "," sserv[col] "-average," sserv[col] "-maximum";
}
print "";
for (row = 1; row <= mcount; row++) {
key = smin[row];
printf key;
for (col = 1; col <= scount; col++) {
printf "," average[key, sserv[col]] "," maximum[key, sserv[col]];
}
print "";
}
}
Run the awk script as: ./script.awk file
#! /bin/awk -f
BEGIN{
FS=",";
OFS=","
}
$1 ~ /server01/ && $3 ~ /Average/{
a[$2]["Avg01"] = $4;
}
$1 ~ /server01/ && $3 ~ /Max/{
a[$2]["Max01"] = $4;
}
$1 ~ /server02/ && $3 ~ /Average/{
a[$2]["Avg02"] = $4;
}
$1 ~ /server02/ && $3 ~ /Max/{
a[$2]["Max02"] = $4;
}
END{
print "Minute","server01-AckDelayAverage","server01-AckDelayMax","server02-AckDelayAverage","server02-AckDelayMax"
for(i in a){
print i,a[i]["Avg01"],a[i]["Max01"],a[i]["Avg02"],a[i]["Max02"] | "sort"
}
}
With awk and sort:
awk -F, -v OFS=, '{
a[$2]=(a[$2]?a[$2]","$4:$4)
}
END{
for ( i in a ) print i,a[i]
}' File | sort
If $4 has 0 values:
awk -F, -v OFS=, '!a[$2]{a[$2]=$2} {a[$2]=a[$2]","$4} END{for ( i in a ) print a[i]}' File | sort
!a[$2]{a[$2]=$2}: If array a doesn't yet have an entry with index $2 (the minute), create one whose value is $2 itself. This fires the first time a given minute appears on a line.
{a[$2]=a[$2]","$4}: Concatenate the value $4 onto that array element.
END: Print all the values in array a.
Finally, pipe the awk output to sort.
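As a quick check of the first command above on a two-minute, two-server slice of the sample (the file name `metrics.csv` is illustrative):

```shell
cat > metrics.csv <<'EOF'
server01,00:01:00,AckDelayAverage,9999
server01,00:01:00,AckDelayMax,8888
server02,00:01:00,AckDelayAverage,1231
server02,00:01:00,AckDelayMax,4185
server01,00:02:00,AckDelayAverage,666
server01,00:02:00,AckDelayMax,5555
server02,00:02:00,AckDelayAverage,1843
server02,00:02:00,AckDelayMax,9982
EOF

# Group values by minute in input order, then sort the rows.
awk -F, -v OFS=, '{a[$2]=(a[$2]?a[$2]","$4:$4)} END{for (i in a) print i,a[i]}' metrics.csv | sort
```

Note that the column order within each row depends on the order in which the input lines appear, since values are simply appended per minute.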
Hi, I am looking for an awk that can find two patterns and print the data between them to a file, but only if a third pattern appears between them.
for example:
Start
1
2
middle
3
End
Start
1
2
End
And the output will be:
Start
1
2
middle
3
End
I found awk '/pattern1/,/pattern2/' path > text.txt on the web, but I only want output for the blocks that have the third pattern in the middle.
And here is a solution without flags:
$ awk 'BEGIN{RS="End"}/middle/{printf "%s", $0; print RT}' file
Start
1
2
middle
3
End
Explanation: RS is the record separator, so we set it to "End" so that each record ends at "End".
Then we select the records that contain "middle" with the /middle/ pattern, and for each matched record we print the record with $0 and the separator with print RT.
This awk should work:
awk '$1=="Start"{ok++} ok>0{a[b++]=$0} $1=="middle"{ok++} $1=="End"{if(ok>1) for(i=0; i<length(a); i++) print a[i]; ok=0;b=0;delete a}' file
Start
1
2
middle
3
End
Expanded:
awk '$1 == "Start" {
ok++
}
ok > 0 {
a[b++] = $0
}
$1 == "middle" {
ok++
}
$1 == "End" {
if (ok > 1)
for (i=0; i<length(a); i++)
print a[i];
ok=0;
b=0;
delete a
}' file
Just use some flags with awk:
/Start/ {
start_flag=1
}
/middle/ {
mid_flag=1
}
start_flag {
n=NR;
lines[NR]=$0
}
/End/ {
if (start_flag && mid_flag)
for(i=n;i<NR;i++)
print lines[i]
start_flag=mid_flag=0
delete lines
}
A modified version of user000001's awk:
awk '/middle/{printf "%s%s\n",$0,RT}' RS="End" file
EDIT:
Added a test for the Start tag:
awk '/Start/ && /middle/{printf "%s%s\n",$0,RT}' RS="End" file
This will work with any modern awk:
awk '/Start/{f=1;rec=""} f{rec=rec $0 ORS} /End/{if (rec~/middle/) printf "%s",rec}' file
The solutions that set RS to "End" are gawk-specific, which may be fine but it's definitely worth mentioning.
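As a quick check, the portable solution can be run against the question's sample (the file name `blocks.txt` is illustrative):

```shell
cat > blocks.txt <<'EOF'
Start
1
2
middle
3
End
Start
1
2
End
EOF

# Buffer from Start to End; print the block only if it contains "middle".
awk '/Start/{f=1;rec=""} f{rec=rec $0 ORS} /End/{if (rec~/middle/) printf "%s",rec}' blocks.txt
```

The second Start..End block, which lacks "middle", is buffered but never printed.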