Extract text between braces - linux

I have a string like this:
MESSAGES { "Instance":[{"InstanceID":"i-098098"}] } ff23710b29c0220849d4d4eded562770 45c391f7-ea54-47ee-9970-34957336e0b8
I need to extract the part { "Instance":[{"InstanceID":"i-098098"}] }, i.e. from the first occurrence of '{' to the last occurrence of '}', and keep it in a separate file.

If you have this in a file,
sed 's/^[^{]*//;s/[^}]*$//' file
(This will print to standard output. Redirect to a file or capture into a variable or do whatever it is that you want to do with it.)
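When the file has many lines and some of them contain no braces, the command above prints an empty line for each of those, because the first substitution deletes the whole line. A minimal sketch, assuming you only want lines that contain a '{' followed later by a '}', limits the two substitutions to such lines. The function name extract_from_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each line containing '{' ... '}', print the span
# from the first '{' to the last '}'. Other lines are skipped (sed -n).
extract_from_file() {
    sed -n '/{.*}/{
s/^[^{]*//
s/[^}]*$//
p
}' "$1"
}
```

Usage, with example file names: extract_from_file input.log > extracted.txt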
If you have this in a variable called MESSAGES,
EXTRACTED=${MESSAGES#*\{}
EXTRACTED="{${EXTRACTED%\}*}}"
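If the string contains no '{' ... '}' pair, the two expansions above leave it unchanged and the result is the whole string wrapped in braces. A minimal sketch wraps the same two expansions in a function with a guard for that case. The name extract_braces is hypothetical. The braces in the patterns are escaped so there is no ambiguity about which '}' closes each expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text from the first '{' to the last '}' of $1.
# Returns 1 (and prints nothing) when there is no '{' followed later by '}'.
extract_braces() {
    local s=$1 rest
    case $s in
        *'{'*'}'*) ;;          # has a '{' ... '}' pair: continue
        *) return 1 ;;
    esac
    rest=${s#*\{}              # strip the shortest prefix ending at the first '{'
    rest=${rest%\}*}           # strip the shortest suffix starting at the last '}'
    printf '{%s}\n' "$rest"    # put the two delimiters back
}

MESSAGES='MESSAGES { "Instance":[{"InstanceID":"i-098098"}] } ff23710b29c0220849d4d4eded562770 45c391f7-ea54-47ee-9970-34957336e0b8'
extract_braces "$MESSAGES"
```

To keep the result in a separate file, as the question asks, redirect it, e.g. extract_braces "$MESSAGES" > extracted.txt (the file name is only an example).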

I would suggest either sed or awk from this article. But initial testing shows it's a little more complicated, and you will probably have to combine commands in a pipe:
echo 'MESSAGES { "Instance":[{"InstanceID":"i-098098"}] } ff23710b29c0220849d4d4eded562770 45c391f7-ea54-47ee-9970-34957336e0b8' | sed 's/^\(.*\)}.*$/\1}/' | sed 's/^[^{]*{/{/'
So the first sed deletes everything after the last } (the greedy .* runs up to the last one) and puts the } back so it still shows; the second sed deletes everything up to the first { and puts the { back so it still shows.
The string is in single quotes so that the shell keeps its inner double quotes. This is the output:
{ "Instance":[{"InstanceID":"i-098098"}] }

Related

How to replace newlines between brackets

I have a log file in a format similar to this:
test {
seq-cont {
0,
67,
266
},
grp-id 505
}
}
test{
test1{
val
}
}
Here is the echo command to produce that output
$ echo -e "test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n\t\tgrp-id 505\n\t}\n}\ntest{\n\ttest1{\n\t\tval\n\t}\n}\n"
The question is how to remove all whitespace between seq-cont { and the next }; there may be several such blocks in the file.
I want the output to look like this, preferably produced with sed:
test{seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}
Efforts by OP: here is one attempt that somewhat worked, but not exactly as I wanted:
sed ':a;N;/{/s/[[:space:]]\+//;/}/s/}/}/;ta;P;D' logfile
It can be done using GNU awk with a custom RS regex that matches { and the closing }:
awk -v RS='{[^}]+}' 'NR==1 {gsub(/[[:space:]]+/, "", RT)} {ORS=RT} 1' file
test {seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}
Here:
NR==1 {gsub(/[[:space:]]+/, "", RT)}: for the first record only, remove all whitespace (including line breaks) from RT, the text that matched RS.
{ORS=RT}: set ORS to RT, so the text matched by RS (edited or not) is printed back after each record.
PS: Remove NR==1 if you want to do this for every block matched by RS in the entire file.
With your shown samples, please try the following awk program. Written and tested in GNU awk.
awk -v RS= '
match($0,/{\nseq-cont {\n[^}]*/){
val=substr($0,RSTART,RLENGTH)
gsub(/[[:space:]]+/,"",val)
print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
}
' Input_file
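Explanation: setting RS to null puts awk in paragraph mode, so each block of text separated by blank lines is read as one record (here, the whole sample). Then awk's match function finds the text from the { before seq-cont { up to, but not including, the next }. All spaces and newlines are removed from the matched value. Finally, the text before the match, the edited value, and the text after the match are printed, giving the output the OP expected. (Records in which match fails are not printed at all.)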
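The match() approach above edits only the first seq-cont block in each record. A minimal sketch in plain POSIX awk (no gawk-only RT) loops over every occurrence instead and, like the perl answer below, also removes the whitespace just before seq-cont. It assumes a block never contains a nested }; collapse_seq_cont is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: collapse all whitespace inside every "seq-cont { ... }"
# block, and the whitespace just before "seq-cont". Reads files or stdin.
collapse_seq_cont() {
    awk '
    { buf = buf $0 "\n" }                       # slurp the input, keeping newlines
    END {
        out = ""
        # [{] and [}] match literal braces portably; [^}]* may span newlines.
        while (match(buf, /seq-cont[ \t\r\n]*[{][^}]*[}]/)) {
            pre = substr(buf, 1, RSTART - 1)
            blk = substr(buf, RSTART, RLENGTH)
            sub(/[ \t\r\n]+$/, "", pre)         # drop whitespace before "seq-cont"
            gsub(/[ \t\r\n]+/, "", blk)         # drop whitespace inside the block
            out = out pre blk
            buf = substr(buf, RSTART + RLENGTH) # continue after this block
        }
        printf "%s", out buf
    }' "$@"
}

printf 'test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n\t\tgrp-id 505\n\t}\n}\n' | collapse_seq_cont
```

Whitespace is spelled out as [ \t\r\n] rather than [[:space:]] so the sketch also runs on older awks without POSIX character classes.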
You can do that much more easily with perl:
perl -0777 -i -pe 's/\s+(seq-cont\s*\{[^}]*\})/$1=~s|\s+||gr/ge' logfilepath
The -0777 option tells perl to slurp the whole file into a single string, and -i saves the changes in place. The regex \s+(seq-cont\s*\{[^}]*\}) matches one or more whitespace characters, then captures into Group 1 ($1): seq-cont, zero or more whitespace characters, and the substring from the leftmost { up to the next } ([^}]* matches zero or more characters other than }). The e flag makes the replacement part evaluated as Perl code: $1=~s|\s+||gr removes every run of whitespace (\s+) from the Group 1 value, and its r flag returns the modified copy instead of changing $1 (the r modifier needs Perl 5.14 or later). The outer g flag (next to e) makes the substitution handle all occurrences.
See the online demo:
#!/bin/bash
s=$(echo -e "test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n\t\tgrp-id 505\n\t}\n}\ntest{\n\ttest1{\n\t\tval\n\t}\n}\n")
perl -0777 -pe 's/\s+(seq-cont\s*\{[^}]*\})/$1=~s|\s+||gr/ge' <<< "$s"
Output:
test {seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}

Split large lines into smaller lines and insert a special character at the beginning of each new line

I have a file containing large lines. The lines should be divided into separate lines after two characters, and the new line should have // appended at the beginning.
What I have:
MEANDER_XY
MEANDER_WS
What is required:
ME
//AN
//DE
//R_X
//Y
ME
//AN
//DE
//R_W
//S
I have used sed -e 's/.\{2\}/&\n/g'. The line gets divided, but I also need // to be added at the beginning of each new line, as shown.
This might work for you (GNU sed):
sed 's/[A-Z]_\?[A-Z]/&\n\/\//g' file
Match an uppercase A through Z, followed by an optional underscore, followed by another uppercase A through Z, and replace it with itself followed by a newline and two forward slashes, globally.
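One edge case: when a line has an even number of letters (e.g. MEANDE), the command above leaves a dangling // line at the end. A minimal sketch, still GNU sed, removes such a trailing // with a second substitution; | is used as the s delimiter so the slashes need no escaping. split_pairs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper; needs GNU sed (\? and \n in the replacement are GNU extensions).
split_pairs() {
    # 1st s: after every letter pair (optionally joined by "_"), add newline + "//".
    # 2nd s: drop a "//" that was added after the very last pair of the line.
    sed 's|[A-Z]_\?[A-Z]|&\n//|g; s|\n//$||' "$@"
}

printf 'MEANDER_XY\nMEANDER_WS\nMEANDE\n' | split_pairs
```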
Could you please try the following? Tested with your given examples.
awk '
{
  count=""
  while($0){
    if(match($0,/[a-zA-Z][^a-zA-Z]*[a-zA-Z]/)){
      value=substr($0,RSTART,RLENGTH)
      if(++count>1){
        print "//"value
      }
      else{
        print value
      }
      $0=substr($0,RSTART+RLENGTH)
    }
    else{
      if(length($0)){
        print "//" $0
        next
      }
    }
  }
}' Input_file
Using GNU awk, you can define fields by their pattern using FPAT (See here). You state:
The lines should be divided into separate lines after two characters, and the new line should have // appended at the beginning.
Your output, however, assumes two alphabetic characters. The pattern you are after is:
FPAT="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]?"
It matches one or two alphabetic characters, which may be preceded or separated (but not followed) by non-alphabetic characters:
MEANDER_XY -> ME\n//AN\n//DE\n//R_X\n//Y
MEANDERS_XY -> ME\n//AN\n//DE\n//RS\n//_XY
So, the following awk will do the trick:
awk 'BEGIN{FPAT="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]?"; OFS="\n//"}{$1=$1}1'
In any POSIX awk, you can use this longer version instead:
awk 'BEGIN{regex="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]"; OFS="\n//"}
{
  s=$0
  # RSTART is always 1 here: any leading non-letters are part of the match
  while(match(s,regex)) {
    # print the chunk, plus the separator unless the chunk is the rest of the line
    printf "%s%s", substr(s,RSTART,RLENGTH), (length(s)==RLENGTH ? "" : OFS)
    s=substr(s,RLENGTH+1)
  }
  print s
}'
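If you would rather not start awk for this, a minimal bash-only sketch of the same idea uses [[ =~ ]] and BASH_REMATCH with the two-letter pattern discussed above. split_line is a hypothetical name, and a bash loop like this is much slower than awk on large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one line into chunks holding two letters each,
# printing "//" before every chunk except the first.
split_line() {
    local s=$1 re='^[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]' prefix=''
    while [[ $s =~ $re ]]; do
        printf '%s%s\n' "$prefix" "${BASH_REMATCH[0]}"
        s=${s:${#BASH_REMATCH[0]}}      # drop the chunk just printed
        prefix='//'
    done
    # Whatever is left (fewer than two letters) goes on its own line.
    [[ -n $s ]] && printf '%s%s\n' "$prefix" "$s"
    return 0
}

while IFS= read -r line; do
    split_line "$line"
done <<'EOF'
MEANDER_XY
MEANDER_WS
EOF
```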

Find and return an if code block from a file

I am writing a bash script to find out whether a code block starting with
if (isset($conf['memcache_servers'])) { exists in a file.
If it does, I need to return the whole if block.
How can I do that?
Code block return example:
if (isset($conf['memcache_servers'])) {
$conf['cache_backends'][] = '.memcache.inc';
$conf['cache_default_class'] = 'MemCache';
$conf['cache_class_cache_form'] = 'DatabaseCache';
}
You can use sed to do this. From a bash command line, run this:
sed -n "/if (isset(\$conf\['memcache_servers'\]))/,/}/p" inputFile
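This uses the address range /pattern1/,/pattern2/ in sed, and p to print everything between and including the if...{ and } lines.
Here, I have used double quotes around the sed script because the first pattern contains single quotes. The square brackets need to be escaped as well (\[ and \]), otherwise ['memcache_servers'] would be read as a bracket expression, and $ is written as \$ so that the shell does not expand $conf inside the double quotes.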
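The range /pattern/,/}/ stops at the first line containing }, so an if block that itself contains braces (a nested if, for example) would be cut short. A minimal sketch, assuming braces are balanced and never appear inside strings or comments, counts { and } with awk and prints from the if line until the count returns to zero. It uses index() for a fixed-string match, so nothing in the pattern needs regex escaping. get_if_block is a hypothetical name; its exit status tells the calling script whether the block was found.

```shell
#!/bin/sh
# Hypothetical helper: print the whole if block that starts with the given line.
# Assumes balanced braces that never occur inside strings or comments.
# Exit status: 0 if the block was found, 1 otherwise.
get_if_block() {
    awk -v start="if (isset(\$conf['memcache_servers'])) {" '
    !inblock && index($0, start) { inblock = 1; depth = 0; found = 1 }
    inblock {
        print
        line = $0
        depth += gsub(/[{]/, "", line)   # gsub returns the number of replacements
        line = $0
        depth -= gsub(/[}]/, "", line)
        if (depth <= 0) inblock = 0      # the block is closed
    }
    END { exit !found }
    ' "$1"
}
```

In the bash script, with an example file name: if block=$(get_if_block settings.php); then printf '%s\n' "$block"; fi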

Get text between, but not including, header and footer using awk or sed

Suppose I have a file myfile.txt, with the following contents:
1234
5678
start
stuff
stop
9871
I would like to get the data between the header 'start' and the footer 'stop' but not including these borders (so in this case, my result would just be the line 'stuff'). Using awk and sed, I tried the following:
awk '/start/ { show=1 } show; /stop/ { show=0 }' myfile.txt
sed -n '/start/,/stop/p' myfile.txt
But these include the header and footer in the output. How can I exclude the header and footer and keep only the lines in between?
Just reverse the order of the tests:
$ awk '/stop/{show=0} show; /start/ { show=1 }' myfile.txt
stuff
How it works
/stop/{show=0}
Any time we encounter a line that matches the regex stop, we set the variable show to 0 (false).
show;
If show is true, print the line.
In more detail, show is a condition: it is evaluated and, if true, an action is performed. Since we don't explicitly specify an action, the default action, print $0, is performed.
Because no action is given, show must be followed by ; to separate it from the next pattern-action pair.
/start/ { show=1 }
Any time we encounter a line that matches the regex start, we set the variable show to 1 (true).
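/start/ and /stop/ match those words anywhere in a line, so a line such as "stopped early" inside the block would end it too soon. A minimal sketch, assuming the header and footer are lines consisting exactly of the marker text, compares whole lines instead while keeping the same order of the three rules. The function between and the awk variables hdr and ftr are hypothetical names.

```shell
#!/bin/sh
# Print the lines strictly between a header line and a footer line.
# Markers must equal the whole line ($0 == ...), not just a substring.
between() {
    awk -v hdr="$1" -v ftr="$2" '
    $0 == ftr { show = 0 }   # footer: turn printing off before it is printed
    show                     # print while inside the block
    $0 == hdr { show = 1 }   # header: turn printing on from the next line
    ' "$3"
}
```

Usage: between start stop myfile.txt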
With GNU sed:
sed '/start/,/stop/!d;//d' myfile.txt
Another sed approach, again with GNU sed:
echo "1234
5678
start
stuff
stop
9871" | sed -n '/start/,/stop/p' | sed '1d;$d'
stuff
There is no problem in programming, which couldn't be solved with another layer of sed. :)

Capitalize the first letter matching a pattern

I have dozens of files that contain the following text:
request {
Request {
input("testing")
}
}
I would like to use sed to capitalize the first letter of any text within input(...). For example, I want testing to become Testing. I tried the following command, which capitalizes the first letter of every word, but how can I apply it only to input?
sed -e "s/\b\(.\)/\u\1/g"
How about
sed 's/input("\(.\)/input("\u\1/'
Test
$ echo -e 'request {
Request {
input("testing")
}
}' | sed 's/input("\(.\)/input("\u\1/'
# Outputs
# request {
# Request {
# input("Testing")
# }
# }
What does it do?
input("\(.\) matches input(" followed by the first character (t in the example). The character matched by . is captured in \1.
input("\u\1 is the replacement string: input(" is replaced with itself, and \u\1 converts the character captured in \1 to uppercase (\u is a GNU sed extension).
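Since \u is specific to GNU sed, a minimal sketch for systems without it uses awk: match() finds input(" plus the next character, and toupper() uppercases that character. Like the sed command, it assumes the text starts directly after input(" and handles only the first occurrence on each line. capitalize_input is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: uppercase the first character after input(" on each line.
# Portable alternative to GNU sed's \u; only the first occurrence per line.
capitalize_input() {
    awk '
    match($0, /input\("./) {
        pos = RSTART + RLENGTH - 1            # position of the character after input("
        $0 = substr($0, 1, pos - 1) toupper(substr($0, pos, 1)) substr($0, pos + 1)
    }
    { print }
    ' "$@"
}
```

Usage: capitalize_input file1 > file1.new (awk has no in-place option, so write to a new file).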
