How to replace newlines between brackets - linux

I have a log file in a format similar to this:
test {
seq-cont {
0,
67,
266
},
grp-id 505
}
}
test{
test1{
val
}
}
Here is the echo command that produces this sample:
$ echo -e "test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n\t\tgrp-id 505\n\t}\n}\ntest{\n\ttest1{\n\t\tval\n\t}\n}\n"
The question is how to remove all whitespace between seq-cont { and the next }. There may be several such blocks in the file.
I want the output to look like this, preferably produced with sed.
test{seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}
Here is an attempt that somewhat worked, but is not exactly what I wanted:
sed ':a;N;/{/s/[[:space:]]\+//;/}/s/}/}/;ta;P;D' logfile

It can be done using GNU awk with a custom RS regex that matches from a { to the next closing }:
awk -v RS='{[^}]+}' 'NR==1 {gsub(/[[:space:]]+/, "", RT)} {ORS=RT} 1' file
test {seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}
Here:
NR==1 {gsub(/[[:space:]]+/, "", RT)}: For the first record, remove all whitespace (including line breaks) from RT, the text that matched RS.
{ORS=RT}: Set ORS to RT, so that each record is printed followed by the (possibly modified) text that RS matched.
PS: Remove NR==1 if you want to do this for the entire file.

With the samples you have shown, please try the following awk program. It was written and tested in GNU awk.
awk -v RS= '
match($0,/{\nseq-cont {\n[^}]*/){
val=substr($0,RSTART,RLENGTH)
gsub(/[[:space:]]+/,"",val)
print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
}
' Input_file
Explanation: RS is set to the empty string, which puts awk in paragraph mode; since the sample contains no blank lines, the whole input is read as a single record. The match function then matches everything from the { before seq-cont up to (but not including) the next }. All spaces and newlines are removed from the matched text. Finally, the text before the match, the edited match, and the text after it are printed to get the expected output.
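The same match-and-strip idea can be extended to every seq-cont block without GNU-specific features. The following is only a sketch under stated assumptions: the function name collapse_seq_cont is made up for illustration, the file is assumed to be small enough to hold in memory, and no } is assumed to appear inside a seq-cont block.

```shell
# Hypothetical helper (not from the answers above): collapse the
# whitespace in every "seq-cont { ... }" block using plain awk.
collapse_seq_cont() {
  awk '
    { buf = buf $0 "\n" }                 # slurp the whole input
    END {
      out = ""
      # Match optional leading whitespace, seq-cont, and everything
      # up to and including the next closing brace.
      while (match(buf, /[ \t\n]*seq-cont[ \t\n]*[{][^}]*[}]/)) {
        block = substr(buf, RSTART, RLENGTH)
        gsub(/[ \t\n]+/, "", block)       # strip all whitespace in the block
        out = out substr(buf, 1, RSTART - 1) block
        buf = substr(buf, RSTART + RLENGTH)
      }
      printf "%s", out buf                # emit the untouched remainder
    }
  ' "$@"
}
```

Usage would be collapse_seq_cont logfile. Like the perl answer, this keeps the space in "test {" and joins the seq-cont block onto that line.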

This is much easier with perl:
perl -0777 -i -pe 's/\s+(seq-cont\s*\{[^}]*\})/$1=~s|\s+||gr/ge' logfilepath
The -0777 option tells perl to slurp the file into a single string, and -i edits the file in place. The regex \s+(seq-cont\s*\{[^}]*\}) matches one or more whitespace characters, then captures into Group 1 ($1) the text seq-cont, zero or more whitespace characters, and the substring from the leftmost { to the next } ([^}]* matches zero or more characters other than }). Because of the e flag, the replacement is evaluated as code: $1=~s|\s+||gr removes every run of whitespace from the Group 1 value and returns the result. All occurrences are handled due to the g flag (next to e).
Demo:
#!/bin/bash
s=$(echo -e "test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n\t\tgrp-id 505\n\t}\n}\ntest{\n\ttest1{\n\t\tval\n\t}\n}\n")
perl -0777 -pe 's/\s+(seq-cont\s*\{[^}]*\})/$1=~s|\s+||gr/ge' <<< "$s"
Output:
test {seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}

Related

How do I remove the white space between overridable="true" and default using a shell script?

<Property name="wt.99999" overridable="true"
default="ext.listner.services.ListnerService/ext.listner.services.ListnerService"/>
I want to remove the space using a shell script.
I wrote this code:
filename2=CECWT.xconf
c=wt.99999
c1=ext.listner.services.ListnerService
grep -w "$c\|$c1" $filename2 > output.txt
cat output.txt | trim
I want output :
<Property name="wt.99999" overridable="true" default="ext.listner.services.ListnerService/ext.listner.services.ListnerService"/>
You could use this GNU sed command:
sed '/overridable=/{N;N;s/\n\s*/ /g}' your_file
It finds any line containing the string 'overridable=' and then reads in
the next 2 lines (N;N), since in your example there is a blank line before
the one you want. It then removes the newline characters and any whitespace
that follows them (\n\s*), replacing them with a single space, which results
in one joined line. The braces group the commands so that they run only on
the matching line; without them, the second N and the substitution would be
applied to every line of the file. (Note this command will only work for the
case of a single blank line between the two lines you want joined!)
If you want to do the replacement in place (i.e. overwrite the contents of
your_file), you can add the -i flag:
sed -i '/overridable=/{N;N;s/\n\s*/ /g}' your_file
This page https://www.grymoire.com/Unix/Sed.html explains sed very well if you
want to learn it.
Trying the command
When I test the command on a file called 'your_file' which has the contents:
<Property name="wt.99999" overridable="true"
default="ext.listner.services.ListnerService/ext.listner.services.ListnerService"/>
I get:
<Property name="wt.99999" overridable="true" default="ext.listner.services.ListnerService/ext.listner.services.ListnerService"/>
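The sed answer above depends on there being exactly one blank line between the two lines. As a hedged sketch of one way to relax that, the loop below keeps reading lines for as long as the last one read is blank, then joins everything. The function name join_property is made up for illustration, and the script has only been reasoned through for GNU sed.

```shell
# Hypothetical helper: join an "overridable=" line with the next
# non-blank line, however many blank lines sit in between.
join_property() {
  sed '/overridable=/{
    :a
    N
    /\n[[:space:]]*$/ba
    s/[[:space:]]*\n[[:space:]]*/ /g
  }' "$@"
}
```

Here /\n[[:space:]]*$/ba jumps back to label a while the most recently appended line is empty or whitespace-only, and the final substitution replaces the whole run of newlines and surrounding whitespace with one space.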
grep alone does not seem like a very good tool for this. If your input is proper XML, using a real XML tool to reformat it is probably the way to go. But if you just need to clean something up quickly, try this simple Awk script:
awk '/<Property name="wt.99999"/ { printf "%s", $0; isprop=1; next }
/^[ \t]*$/ && isprop { next; }
{ isprop=0 } 1' CECWT.xconf >output.txt
The 1 at the end prints the input line with newline and all. We special-case the line which matches the first regex to print that without the newline. We then add a state variable to also skip any lines with only whitespace on them (or nothing at all) until we find a line which doesn't match either regex.
This will still leave whitespace from the end of the property line and whitespace from the beginning of the following line; trimming that complicates the script slightly, but not by much.
awk '/<Property name="wt.99999"/ { printf "%s", $0; isprop=1; next }
/^[ \t]*$/ && isprop { next; }
isprop { sub(/^[ \t]*/, " "); isprop=0 } 1' CECWT.xconf >output.txt

Split large line in to smaller lines and insert special character at the beginning of newline

I have a file containing long lines. Each line should be split into separate lines after every two characters, and each new line should have // prepended at the beginning.
What I have :
MEANDER_XY
MEANDER_WS
What is required :
ME
//AN
//DE
//R_X
//Y
ME
//AN
//DE
//R_W
//S
I have used sed -e 's/.\{2\}/&\n/g'. The line gets split, but I also need // to be prepended as shown.
This might work for you (GNU sed):
sed 's/[A-Z]_\?[A-Z]/&\n\/\//g' file
Match an uppercase A through Z, followed by an optional underscore, followed by another uppercase A through Z, and replace it with itself, followed by a newline, followed by two forward slashes, globally.
Could you please try the following, tested with your given examples.
awk '
{
count=""
while($0){
if(match($0,/[a-zA-Z][^a-zA-Z]*[a-zA-Z]/)){
value=substr($0,RSTART,RLENGTH)
if(++count>1){
print "//"value
}
else{
print value
}
$0=substr($0,RSTART+RLENGTH)
}
else{
if(length($0)){
print "//" $0
next
}
}
}
}' Input_file
Using GNU awk, you can define fields by their pattern using FPAT. You state:
The lines should be divided into separate lines after two characters, and the new line should have // appended at the beginning.
Your output however assumes two alphabetic characters. The pattern you are after is:
FPAT="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]?"
It matches one or two alphabetic characters, which may be preceded or separated by non-alphabetic characters (but not followed by them):
MEANDER_XY -> ME\n//AN\n//DE\n//R_X\n//Y
MEANDERS_XY -> ME\n//AN\n//DE\n//RS\n//_XY
So, the following awk will do the trick:
awk 'BEGIN{FPAT="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]?"; OFS="\n//"}{$1=$1}1'
Using POSIX awk, you can do the more extended version that will always work:
awk 'BEGIN{regex="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]"; OFS="\n//"}
{ s=$0
while(match(s,regex)) {
printf "%s", substr(s,RSTART,RLENGTH) (length(s)==RLENGTH ? "": OFS)
s=substr(s,RLENGTH+1)
}
print s
}'
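For contrast, a literal reading of "after two characters" (counting every character, the underscore included) can be sketched in plain awk. This is an illustration only: the function name split_pairs is made up, and its result differs from the output requested in the question, which does not count the underscore.

```shell
# Hypothetical helper: split each line into chunks of exactly two
# characters, prefixing every chunk after the first with "//".
split_pairs() {
  awk '{
    out = substr($0, 1, 2)                  # first chunk, no prefix
    for (i = 3; i <= length($0); i += 2)
      out = out "\n//" substr($0, i, 2)     # later chunks get the prefix
    print out
  }' "$@"
}
```

With this version MEANDER_XY is split as ME, AN, DE, R_, XY, which is why the answers above match letters rather than raw characters.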

How can I make multiple lines into one line using bash?

So I have code that looks like this:
else if(between(pay,1260,1280))
{
return 159;
}
else if(between(pay,1280,1300))
{
return 162;
}
else if(between(pay,1300,1320))
{
return 165;
}
But I want it to look like this:
else if(between(pay,1260,1280)){return 159;}
else if(between(pay,1280,1300)){return 162;}
else if(between(pay,1300,1320)){return 165;}
Can I do this in bash? If not, which language can I use?
The full code is over 30,000 lines and I could manually do it, but I know there's a better way. I want to say the 'sed' command can help me with a mixture of regex, but that's as far as my knowledge can take me.
P.S Please overlook how un-optimized it is just this once.
The following awk may also help with this.
awk -v RS="" '{
$1=$1;
gsub(/ { /,"{");
gsub(/ }/,"}");
gsub(/}/,"&\n");
gsub(/ else/,"else");
sub(/\n$/,"")
}
1
' Input_file
Output will be as follows.
else if(between(pay,1260,1280)){return 159;}
else if(between(pay,1280,1300)){return 162;}
else if(between(pay,1300,1320)){return 165;}
EDIT: Adding an explanation of the solution.
awk -v RS="" '{ ##Set RS (record separator) to the empty string, i.e. paragraph mode.
$1=$1; ##Reassign the first field so awk rebuilds the record, joining all fields with single spaces.
gsub(/ { /,"{"); ##Globally replace space-{-space with a bare {.
gsub(/ }/,"}"); ##Globally replace space-} with a bare }.
gsub(/}/,"&\n"); ##Globally replace } with } followed by a newline.
gsub(/ else/,"else");##Globally replace space-else with else.
sub(/\n$/,"") ##Remove the newline at the very end of the record.
}
1 ##awk works on condition/action pairs; 1 is an always-true condition
##with no action, so the default action (print the current record) runs.
' Input_file
This might work for you (GNU sed):
sed '/^else/{:a;N;/^}/M!ba;s/\n\s*//g}' file
Gather up the required lines in the pattern space and remove all newlines and following spaces on encountering the end marker i.e. a line beginning }.
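The same gather-and-join idea can be sketched in plain awk for systems without GNU sed. This is only an illustration under assumptions: the function name join_else_blocks is made up, and each block is assumed to start on a line beginning with else and to end on a line whose first non-blank character is }.

```shell
# Hypothetical helper: collect each "else ... }" block into one line,
# dropping the newlines and the leading indentation of joined lines.
join_else_blocks() {
  awk '
    !inblock && /^[ \t]*else/ { buf = $0; inblock = 1; next }
    inblock {
      line = $0
      sub(/^[ \t]+/, "", line)            # drop leading indentation
      buf = buf line                      # append without a newline
      if (line ~ /^[}]/) { print buf; inblock = 0 }
      next
    }
    { print }                             # lines outside a block pass through
    END { if (inblock) print buf }        # flush an unterminated block
  ' "$@"
}
```

Lines outside an else block are printed unchanged, so the function can be run over the whole source file.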

How To Sed Search Replace Entire Word With String Match In File

I have modified the code found here: sed whole word search and replace
I have been trying to use the proper syntax \< and \> for the sed to match multiple terms in a file.
echo "Here Is My Example Testing Code" | sed -e "$(sed 's:\<.*\>:s/&//ig:' file.txt)"
However, I think, because it's looking into the file, it doesn't match the full word (only exact match) leaving some split words and single characters.
Does anyone know the proper syntax?
Example:
Input:
Here Is My Example Testing Code
File.txt:
example
test
Desired output:
Here Is My Code
Modifying your sed command as follows should give you what you want:
sed -e "$(sed 's:\<.*\>:s/&\\w*\\s//ig:' file.txt)"
Brief explanation:
A word boundary (\b, or \< and \> in GNU sed) matches the position between a word character and a non-word character, so a pattern such as 'test' from file.txt, when anchored at both ends, would not match 'Testing'.
Appending \w* to each search pattern makes it match the rest of the word as well. \w matches [a-zA-Z0-9_].
And don't forget to remove the space after each matched word, which is why \s is added.
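To see what the inner sed actually hands to the outer one, the generator can be pulled into its own function. This is a sketch with assumptions: the name build_removal_script is made up, a leading \< is added so that a word only matches at the start of a word (this is not in the answer above), empty lines in the word list are skipped, and the words are assumed to contain no / or regex metacharacters.

```shell
# Hypothetical helper: turn a list of words (one per line) into sed
# substitute commands that delete each word, the rest of the word it
# starts, and one following whitespace character (GNU sed syntax).
build_removal_script() {
  # /./ skips empty lines; -n with the p flag prints only generated commands.
  sed -n '/./s:.*:s/\\<&\\w*\\s//Ig:p' "$@"
}
```

For a file.txt containing example and test, this prints s/\<example\w*\s//Ig and s/\<test\w*\s//Ig. It would then be used as echo "Here Is My Example Testing Code" | sed -e "$(build_removal_script file.txt)". As with the original command, a matching word at the very end of a line is left alone, because no whitespace follows it.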
The following awk could also help with this.
awk 'FNR==NR{a[$0]=$0;next} {for(i=1;i<=NF;i++){for(j in a){if(tolower($i)~ a[j]){$i=""}}}} 1' file.txt input
***OR***
awk '
FNR==NR{
a[$0]=$0;
next
}
{
for(i=1;i<=NF;i++){
for(j in a){
if(tolower($i)~ a[j]){
$i=""}
}}}
1
' file.txt input
Output will be as follows.
Here Is My   Code
Also, if your Input_file is always delimited by single spaces and you don't want the unnecessary spaces shown in the above output, then you could use the following.
awk 'FNR==NR{a[$0]=$0;next} {for(i=1;i<=NF;i++){for(j in a){if(tolower($i)~ a[j]){$i=""}}};gsub(/ +/," ")} 1' file.txt input
***OR***
awk '
FNR==NR{
a[$0]=$0;
next
}
{
for(i=1;i<=NF;i++){
for(j in a){
if(tolower($i)~ a[j]){
$i=""}
}};
gsub(/ +/," ")
}
1
' file.txt input
Output will be as follows.
Here Is My Code

AWK remove blank lines

The /./ is removing blank lines for the first condition { print "a"$0 } only, how would I ensure the script removes blank lines for every condition ?
awk -F, '/./ { print "a"$0 } NR!=1 { print "b"$0 } { print "c"$0 } END { print "d"$0 }' MyFile
A shorter form of the already proposed answer could be the following:
awk NF file
Every rule in an awk script has the form condition {statement}. If the statement block is omitted, awk prints the whole record (line) whenever the condition is true (non-zero).
The NF variable in awk holds the number of fields in the line. When the line is non-empty (more precisely, when it contains at least one field), NF holds a positive value, which triggers the default awk action (print the whole line). For an empty line, NF is zero and the condition is not met, so awk does nothing.
Note that you don't even need quotes, because this two-letter awk script doesn't contain any space or character that could be interpreted by the shell.
or
awk '!/^$/' file
^$ is the regex for an empty line. The two slashes are needed so that awk treats the text as a regex. ! is the standard negation.
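The two forms are not quite equivalent: NF is also zero for a line that contains only spaces or tabs, while ^$ matches only a truly empty line. A small sketch of the difference (the function names drop_blank and drop_empty are made up for illustration):

```shell
# Hypothetical wrappers around the two one-liners above.
drop_blank() { awk 'NF' "$@"; }        # drops empty AND whitespace-only lines
drop_empty() { awk '!/^$/' "$@"; }     # drops only truly empty lines

# Sample input: "a", an empty line, a line of three spaces, "b".
printf 'a\n\n   \nb\n' | drop_blank    # keeps a and b
printf 'a\n\n   \nb\n' | drop_empty    # keeps a, the spaces-only line, and b
```

Which one is appropriate depends on whether whitespace-only lines should count as blank.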
Awk command to remove blank lines from a file:
awk 'NF > 0' filename
If you want to ignore all blank lines, put this at the beginning of the script:
/^$/ {next}
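Applied to the script from the question, that could look like the sketch below. The function name print_nonblank_rules is made up, and the END rule from the question is left out to keep the example short.

```shell
# Hypothetical wrapper: the question's rules, with empty lines
# skipped up front so that no later rule ever sees them.
print_nonblank_rules() {
  awk -F, '
    /^$/ { next }              # skip empty lines before any other rule
    { print "a" $0 }
    NR != 1 { print "b" $0 }
    { print "c" $0 }
  ' "$@"
}
```

Because next abandons the current record immediately, the three remaining rules keep their original conditions and need no extra checks.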
Put the following conditions inside the first one, and check them with if statements, like this:
awk -F, '
/./ {
print "a"$0;
if (NR!=1) { print "b"$0 }
print "c"$0
}
END { print "d"$0 }
' MyFile
