Replace null/blank columns with last known column (sed / awk / script)

Can someone help me figure out how to replace the empty columns with the last known value? Here is a line in which I would like the number "0.7588044" to replace the null values:
0.7723808|0.767398|0.7645381|0.7605125|0.759718|0.7588044|0.7588044|0.7588044|0.7588044|0.7588044|0.7588044||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In other words, I would like "0.7588044" to appear between the empty/null "|" delimiters at the end of the line.
I can't figure out how to do this with something like sed. Any help would be greatly appreciated.
Here are the first 3 lines of my file:
66943|0.9939215|0.9873032|0.9791299|0.9708792|0.9623731|0.9535987|0.945847|0.9379317|0.9286675|0.9203091|0.9127985|0.9041528|0.8966769|0.8902251|0.8832675|0.8778407|0.8734665|0.8679647|0.8616999|0.8560756|0.8518617|0.8463235|0.8410841|0.8342401|0.8311638|0.8261909|0.8252836|0.8218218|0.8177906|0.815474|0.8122096|0.8115648|0.8108233|0.8108233|0.8108233|0.8108233|0.8108233|0.8108233||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
69550|0.9946427|0.9888051|0.9815896|0.9742986|0.966774|0.9590039|0.9521323|0.9451087|0.9368793|0.9294462|0.9227601|0.9150554|0.9083862|0.9026252|0.896407|0.8915528|0.8876377|0.8827099|0.8770942|0.8720485|0.8682655|0.8632902|0.8585799|0.8524216|0.8496516|0.8451712|0.8443534|0.8412323|0.8375956|0.8355048|0.8325575|0.8319751|0.8313053|0.8313053|0.8313053|0.8313053|0.8313053|0.8313053||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
380713|0.9942899|0.9880703|0.9803859|0.9726248|0.9646193|0.9563567|0.9490533|0.941592|0.9328543|0.9249665|0.917875|0.9097072|0.9026409|0.8965395|0.8899569|0.8848204|0.8806788|0.8754678|0.8695317|0.8642001|0.8602043|0.8549507|0.8499787|0.8434811|0.8405594|0.8358352|0.834973|0.8316831|0.8278509|0.8256481|0.8225436|0.8219303|0.8212249|0.8212249|0.8212249|0.8212249|0.8212249|0.8212249||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The awk code works, but only on the first line:

You can use the following awk script:
awk -F'|' 'BEGIN{OFS="|"}{for(i=1;i<NF;i++){if($i==""){$i=l}else{l=$i}}print}'
It is more readable in this form:
BEGIN {
    OFS="|"                  # set output field separator to |
}
{
    for(i=1;i<NF;i++) {      # iterate through all columns except the last
        if($i=="") {         # if current column is empty
            $i=l             # use the last value
        } else {
            l=$i             # else store the value
        }
    }
    print                    # print the line
}
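To see the effect on a small input, here is a minimal sketch that wraps the same loop in a shell function. The function name fill_forward and the two sample rows are made up for illustration. The one deliberate difference from the script above is that the remembered value is reset at the start of every row; that is an assumption about what is wanted, so that a leading empty column never inherits a value from the previous row.

```shell
# fill_forward: hypothetical helper name, not part of the original answer.
# Reads "|"-separated rows on stdin and fills empty columns with the
# last non-empty value seen earlier on the same row.
fill_forward() {
    awk -F'|' 'BEGIN { OFS = "|" }
    {
        last = ""                      # assumption: do not carry values across rows
        for (i = 1; i < NF; i++) {     # the field after the final "|" is left alone
            if ($i == "") $i = last
            else          last = $i
        }
        print
    }'
}

printf '%s\n' '1|0.5|0.4|||' '2|0.9||0.8||' | fill_forward
# prints:
# 1|0.5|0.4|0.4|0.4|
# 2|0.9|0.9|0.8|0.8|
```

Because the loop stops at NF-1, the trailing "|" is kept with nothing after it, which matches the shape of the rows in the question.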

This might work for you (GNU sed):
sed -r ':a;s/^(.*\|([^|]+)\|)\|/\1\2|/;ta' file

A shorter version of hek2mgl's solution:
awk '{for(i=1;i<NF;i++) $i=($i=="")?l:(l=$i)}1' FS=\| OFS=\| file

Related

How to edit output rows from awk with defined position?

Is there a way to solve this?
I have a bash script which creates .dat and .log files from source files.
I'm using awk with print and the positions of the fields I need to print. The problem is with the last position, ID2 (lower). It should be just \*[0-9]{3}\*#, but in some cases it is preceded by a string matching [0-9]{12}\[00\]>.
A row then looks like this, for example:
2020-01-11 01:01:01;test;test123;123456789123[00]>*123*#
What I need is to remove that leading string, so that the row in the file becomes:
2020-01-11 01:01:01;test;test123;*123*#
File structure:
YYYY-DD-MM HH:MM:SS;string;ID1;ID2
I would be grateful for any advice, thanks.

awk 'BEGIN{FS=OFS=";"} {$NF=substr($NF,length($NF)-5)}1' file
Here the semicolon is the field separator, and only the last 6 characters of the last field are kept. If nothing precedes the *ID*# part, the whole field is kept.
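A quick way to check both cases is to run the command on one row with the extra prefix and one without. This is only a sketch: the function name keep_id_tail and the second sample row are invented here, and it assumes the ID is always exactly three digits, so that *NNN*# is 6 characters wide.

```shell
# keep_id_tail: hypothetical name; keeps the last 6 characters of the
# last ";"-separated field, which is exactly the width of "*123*#".
keep_id_tail() {
    awk 'BEGIN { FS = OFS = ";" } { $NF = substr($NF, length($NF) - 5) } 1'
}

printf '%s\n' \
    '2020-01-11 01:01:01;test;test123;123456789123[00]>*123*#' \
    '2020-01-11 01:01:02;test;test123;*456*#' | keep_id_tail
# prints:
# 2020-01-11 01:01:01;test;test123;*123*#
# 2020-01-11 01:01:02;test;test123;*456*#
```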

Delete everything before the first *:
$ awk 'BEGIN{FS=OFS=";"}{sub(/^[^*]*/,"",$NF)}1' file
Output:
2020-01-11 01:01:01;test;test123;*123*#

Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
match($0,/[0-9]{12}\[[0-9]+\]>/) && /\*[0-9]{3}\*#/{
print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
}
' Input_file
Explanation: a commented version of the above.
awk '                                                   ##Start the awk program.
match($0,/[0-9]{12}\[[0-9]+\]>/) && /\*[0-9]{3}\*#/{    ##match() looks for 12 digits, then [, one or more digits, ] and >. The second condition checks that the line also contains * followed by 3 digits, * and #.
  print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH) ##If both conditions are true, print the part of the line before the match followed by the part after it.
}
' Input_file                                            ##The input file name.

using sed to extract certain strings from config file

I am trying to extract certain strings from the output below; however, I have no experience with sed/awk and I need some advice on how to proceed.
Input:
name Cleartext-Password := "password", Service-Type := Framed-User
Framed-IP-Address := 127.0.0.1,
MS-Primary-DNS-Server := 8.8.8.8,
Fall-Through = Yes,
Mikrotik-Rate-Limit = 20M/30M
The output should be:
name;password;127.0.0.1;20M;30M;
I am not sure if this is the correct way to do it, but I have tried to remove everything between my required strings, for example:
sed 's/ Cleartext-Password := "/;/'
However, I think this is a dirty way and not a clever one.
Could you please let me know what I need to look for in order to create a working sed/awk solution for this?

Could you please try the following, based on your shown samples. Written and tested at
https://ideone.com/eWXv3w
Since the OP's Input_file contains Control-M (carriage return) characters, gsub(/\r/,"") was added to the code.
awk '
BEGIN{ OFS=";" }
{ gsub(/\r/,"") }
match($0,/Cleartext-Password[^,]*/){
  val=substr($0,RSTART,RLENGTH)
  gsub(/Cleartext-Password[^"]*|"/,"",val)
  val=$1 OFS val
  next
}
/Framed-IP-Address/{
  sub(/,$/,"")
  val=val OFS $NF
  next
}
/Mikrotik-Rate-Limit/{
  sub(/\//,OFS,$NF)
  print val, $NF OFS
  val=""
}' Input_file
Explanation: The BEGIN section sets OFS to a semicolon, as the question requires. match() then looks for the regex Cleartext-Password[^,]*, that is, everything from Cleartext-Password up to the first comma, and the matched substring is stored in the variable val. gsub() strips the Cleartext-Password := prefix and the double quotes from val, leaving only the password, and the first field (the name) is prepended.
For a line containing Framed-IP-Address, the trailing comma is removed and the last field of the line is appended to val.
For a line containing Mikrotik-Rate-Limit, the / in the last field is replaced by OFS so that 20M/30M becomes 20M;30M, then val and that field are printed followed by a closing semicolon, and val is reset.

There are a number of ways to approach this with awk. The key is to match part of each record with a regular expression to identify the record you are operating on, then isolate the wanted text and output it in the desired format.
One approach would be:
awk '
/Cleartext-Password/ { printf "%s;%s;", $1, substr($4,2,length($4)-3) }
/Framed-IP-Address/ { printf "%s;", substr($NF,1,length($NF)-1) }
/Mikrotik-Rate-Limit/{ sub(/\//,";",$NF); printf "%s;\n", $NF }
' config
Example Use/Output
With your sample input in the file named config, you would receive:
name;password;127.0.0.1;20M;30M;
Look things over and let me know if I misunderstood anywhere.

This might work for you (GNU sed):
sed -nE -e '/Cleartext-Password/{s/ .*:=\s"(.*)",.*/;\1/;h}' \
-e '/Framed-IP-Address/{s/.*:= (.*),/\1/;H}' \
-e '/Mikrotik-Rate-Limit/{s#.*= (.*)/(.*)#\1;\2#;H;g;y/\n/;/;p}' file
Turn off implicit printing with the -n option.
Reduce the number of backslashes with the -E option.
Stash the fields of the record in the hold space and when all fields have been collected, copy the hold space to the pattern space, replace newlines by the field separators and print the result.
You may prefer:
sed -nE '/Cleartext-Password/{s/ .*:=\s"(.*)",.*/;\1/;h};
/Framed-IP-Address/{s/.*:= (.*),/\1/;H};
/Mikrotik-Rate-Limit/{s#.*= (.*)/(.*)#\1;\2#;H;g;y/\n/;/;p}' file
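The hold-space steps described above can be tried in isolation on a toy record. In this sketch the begin/mid/end markers and the function name join_record are invented for illustration, and it assumes GNU sed, like the answer.

```shell
# join_record: hypothetical helper showing the h / H / g / y sequence.
#   h  copies the pattern space into the hold space (starts a record)
#   H  appends a newline plus the pattern space to the hold space
#   g  copies the hold space back into the pattern space
#   y  turns the embedded newlines into ";" before printing
join_record() {
    sed -n -e '/^begin/h' -e '/^mid/H' -e '/^end/{H;g;y/\n/;/;p;}'
}

printf '%s\n' 'begin a' 'mid b' 'end c' | join_record
# prints: begin a;mid b;end c
```

Since h overwrites the hold space, each new "begin" line starts a fresh record and nothing leaks from the previous one.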

How to extract data with field value greater than particular number

I am trying to extract the values/methods that take more than a particular number of milliseconds, but I am unable to provide the correct field separator.
awk -F'=' '$3>960' file
awk -F'=||' '$3>960' file
This is a sample line:
logAlias=Overall|logDurationMillis=34|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8
Either I do not see any result, or it gives me all the transactions.

Here is a generic, robust and easily extendible way:
awk -F'|' '{
    for(i=1;i<=NF;++i) {
        split($i,kv,"=")
        f[kv[1]]=kv[2]
    }
}
f["logDurationMillis"]>960' file
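As a sketch of how this key/value approach can be extended, the variant below takes the threshold as a parameter and clears the map for every row. The function name slow_rows, the limit variable and the shortened sample rows are assumptions made for this example, not part of the answer.

```shell
# slow_rows: hypothetical helper; prints rows whose logDurationMillis
# exceeds the threshold passed as the first argument.
slow_rows() {
    awk -F'|' -v limit="$1" '{
        split("", f)                       # clear the map so keys never leak between rows
        for (i = 1; i <= NF; ++i) {
            eq = index($i, "=")            # split at the first "=" only
            if (eq) f[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
    }
    (f["logDurationMillis"] + 0) > (limit + 0)'
}

printf '%s\n' \
    'logAlias=Overall|logDurationMillis=34|getMethod1=26' \
    'logAlias=Overall|logDurationMillis=980|getMethod1=26' | slow_rows 960
# prints: logAlias=Overall|logDurationMillis=980|getMethod1=26
```

Adding 0 to both sides forces a numeric comparison, and any other key (for example getMethod1) can be tested the same way by changing the name inside f[...].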

You may use
awk -F'[=|]' '$4>960' file
Note that [=|] is a regex matching either = or | and the value you want to compare against appears in the fourth field.
See online demo:
s="logAlias=Overall|logDurationMillis=34|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8
logAlias=Overall|logDurationMillis=980|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8"
awk -F'[=|]' '$4>960' <<< "$s"
Output:
logAlias=Overall|logDurationMillis=980|logTimeStart=2019-09-12_05:22:02.602|logTimeStop=2019-09-12_05:22:02.636|logTraceUID=43adbcaf55de|getMethod1=26|getMethod2=0|getMethod3=0|getMethod4=1|getMethod5=8
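To see why the duration ends up in the fourth field, it can help to list the first few fields awk produces with this separator. The helper name show_fields and the shortened sample line are made up for this sketch.

```shell
# show_fields: hypothetical helper that lists how awk numbers the fields
# when both "=" and "|" act as separators.
show_fields() {
    awk -F'[=|]' '{ for (i = 1; i <= 4; i++) printf "$%d=%s\n", i, $i }'
}

printf '%s\n' 'logAlias=Overall|logDurationMillis=34|logTraceUID=43adbcaf55de' | show_fields
# prints:
# $1=logAlias
# $2=Overall
# $3=logDurationMillis
# $4=34
```

Keys and values alternate, so this only works while logDurationMillis stays the second key on every line.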

Could you please try the following; it may help in case your string does not always appear in a fixed position.
awk '
match($0,/logDurationMillis=[0-9]+/){
  if(substr($0,RSTART+18,RLENGTH-18)+0>960){
    print
  }
}
' Input_file
2nd Solution:
awk '
match($0,/logDurationMillis=[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  sub(/.*=/,"",val)
  if(val+0>960){
    print
  }
}
' Input_file

Here is how I do it:
awk -F'logDurationMillis=' '{split($2,a,"[|]")} a[1]>960' file
If it is the log duration logDurationMillis you are looking for, I set it as the field separator. This way I know for sure that the next field starts with the value I want. I then split that field on | to isolate the number, so a[1] holds the value and can be tested against whatever threshold you need. There is no loop, so it should be fast.

Linux cut, paste

I have to write a script that cuts the first column of each row and pastes it at the end of the same row in a new .arff file. I guess the file type doesn't matter.
Current file:
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1'
The output should be:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
How can I do this using a Linux script file?

sed -r 's/^([^,]*),(.*)$/\2,\1/' Input_file
Brief explanation:
^([^,]*) matches the first comma-separated field, and \1 in the replacement refers to it.
(.*)$ matches the rest of the line after the first comma, and \2 in the replacement refers to it.
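The two capture groups can be checked on shortened rows. This is a sketch only: the function name first_to_end and the trimmed sample rows are invented here, and it uses -E, which GNU sed accepts as a synonym for -r.

```shell
# first_to_end: hypothetical name; moves the first comma-separated
# field of each line to the end of that line.
first_to_end() {
    sed -E 's/^([^,]*),(.*)$/\2,\1/'
}

printf '%s\n' "63,male,typ_angina,'<50'" "67,male,asympt,'>50_1'" | first_to_end
# prints:
# male,typ_angina,'<50',63
# male,asympt,'>50_1',67
```

A line without any comma does not match the pattern and is passed through unchanged.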

Shorter awk solution:
$ awk -F, '{$(NF+1)=$1;sub($1",","")}1' OFS=, input.txt
gives:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Explanation:
{$(NF+1)=$1 # add extra field with value of field $1
sub($1",","") # search for string "$1," in $0, replace it with ""
}1 # print $0
EDIT: Reading the comments following your question, it looks like you're swapping more columns than just moving the first one to the end of the line. You might consider using a swap function that you call multiple times:
function swap(i,j){s=$i; $i=$j; $j=s}
However, this won't work when you want to move a column to the end of the line. So let's change that function:
function swap(i,j){
    s=$i
    if (j>NF){
        for (k=i;k<NF;k++) $k=$(k+1)
        $NF=s
    } else {
        $i=$j
        $j=s
    }
}
So now you can do this:
$ cat tst.awk
BEGIN{FS=OFS=","}
{swap(1,NF+1); swap(2,5)}1
function swap(i,j){
    s=$i
    if (j>NF){
        for (k=i;k<NF;k++) $k=$(k+1)
        $NF=s
    } else {
        $i=$j
        $j=s
    }
}
and:
$ awk -f tst.awk input.txt
male,t,145,233,typ_angina,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,f,160,286,asympt,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67

Why use sed or awk? The shell can handle this easily:
while IFS= read -r l; do printf '%s\n' "${l#*,},${l%%,*}"; done <infile
If it's a Windows file with \r line endings:
while IFS= read -r l; do f=${l%[[:cntrl:]]}; printf '%s\n' "${f#*,},${l%%,*}"; done <infile
If you want to modify the file in place:
printf '%s\n' "$(while IFS= read -r l; do f=${l%[[:cntrl:]]}; printf '%s\n' "${f#*,},${l%%,*}"; done <infile)" >infile
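The in-place one-liner above holds the whole result in a command substitution. A variant that goes through a temporary file may be easier to reason about; the sketch below is one way to write it, with the function name rotate_in_place chosen here for illustration and mktemp assumed to be available.

```shell
# rotate_in_place: hypothetical helper; rewrites FILE so that the first
# comma-separated field of every line becomes the last one.
rotate_in_place() {
    file=$1
    cr=$(printf '\r')                      # a literal carriage return
    tmp=$(mktemp) || return 1
    while IFS= read -r line || [ -n "$line" ]; do   # also accept a last line without newline
        line=${line%"$cr"}                 # drop one trailing CR, if present
        printf '%s\n' "${line#*,},${line%%,*}"
    done <"$file" >"$tmp" &&
        cat "$tmp" >"$file"                # overwrite only after the loop succeeded
    rm -f "$tmp"
}
```

It would be called as rotate_in_place infile. Writing back with cat rather than mv keeps the original file's permissions and ownership.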

grep only for certain word on line

I need to grep only the word between the 3rd-to-last and 2nd-to-last /, as shown in the extract below.
Note that its position in the path is not always the same when counting from the front. Any ideas would be helpful.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt

Here is a Perl script that does the job:
my $str = q!/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt!;
my $res = (split('/',$str))[-3];
print $res;
output:
usernameneedtogrep

I'd use awk:
awk -F/ '{print $(NF-2)}'
-F/ splits each line on /.
NF is the index of the last column, $NF the last column itself and $(NF-2) the 3rd-to-last column.
You might of course first need to filter out lines in your input that are not paths (e.g. using grep and then piping to awk)
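The filtering step mentioned above might look like the following sketch. The function name third_from_last and the "not a path" test line are invented here, and treating any line that starts with / as a path is an assumption.

```shell
# third_from_last: hypothetical helper; keeps only lines that look like
# absolute paths, then prints the third-to-last "/"-separated component.
third_from_last() {
    grep '^/' | awk -F/ 'NF > 3 { print $(NF-2) }'
}

printf '%s\n' 'not a path' \
    '/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt' |
    third_from_last
# prints: usernameneedtogrep
```

The NF > 3 guard skips paths with fewer than three components, where $(NF-2) would be the empty field before the leading slash.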

A regular expression like this should do the trick:
/.*\/(.+?)\/.*?\/.*$/
(Note that I'm using lazy quantifiers (+? and *?) so that the matches don't include slashes where we don't want them.)
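sed has no lazy quantifiers, so a sed version of the same idea has to swap them for negated character classes ([^/]). The sketch below does that; the function name third_from_last_sed is made up, and # is used as the s-command delimiter only to avoid escaping the slashes.

```shell
# third_from_last_sed: hypothetical helper; prints the path component that
# sits between the 3rd-to-last and 2nd-to-last "/" of each input line.
third_from_last_sed() {
    sed -nE 's#^.*/([^/]+)/[^/]*/[^/]*$#\1#p'
}

printf '%s\n' '/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt' |
    third_from_last_sed
# prints: usernameneedtogrep
```

Lines that do not contain at least three slashes do not match and, because of -n, produce no output.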
