Capitalize the first letter matching a pattern - linux

I have dozens of files that contain the following text:
request {
Request {
input("testing")
}
}
I would like to use sed to capitalize the first letter of any text within input. For example, I want testing to be Testing. I tried the following command, which capitalizes the first letter of every word, but how can I apply it only to input?
sed -e "s/\b\(.\)/\u\1/g"

How about
sed 's/input("\(.\)/input("\u\1/'
Test
$ echo -e 'request {
Request {
input("testing")
}
}' | sed 's/input("\(.\)/input("\u\1/'
# Outputs
# request {
# Request {
# input("Testing")
# }
# }
What does it do?
input("\(.\) matches the literal input(" followed by any one character (t in the example). The character matched by . is captured as \1.
input("\u\1 is the replacement string. input(" is replaced with itself, and \u\1 converts the character captured in \1 to uppercase (\u is a GNU sed extension).
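For readers who want to check the capture-and-uppercase logic outside sed, here is a minimal Python sketch of the same idea. The function name capitalize_input_args is made up for illustration. Unlike the sed command above (which has no g flag and so changes only the first match per line), re.sub replaces every match.

```python
import re

def capitalize_input_args(text: str) -> str:
    # Group 1 is the literal prefix input(" and group 2 is the one
    # character that follows it; the prefix is re-emitted unchanged and
    # the captured character is uppercased.
    return re.sub(r'(input\(")(.)',
                  lambda m: m.group(1) + m.group(2).upper(),
                  text)

sample = 'request {\nRequest {\ninput("testing")\n}\n}'
print(capitalize_input_args(sample))
```

Lines that do not contain input(" pass through unchanged, so request { stays lowercase.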

Related

How to replace newlines between brackets

I have log file similar to this format
test {
seq-cont {
0,
67,
266
},
grp-id 505
}
}
test{
test1{
val
}
}
Here is the echo command to produce that output
$ echo -e "test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n\t\tgrp-id 505\n\t}\n}\ntest{\n\ttest1{\n\t\tval\n\t}\n}\n"
The question is how to remove all whitespace between seq-cont { and the next }, a pattern that may occur multiple times in the file.
I want the output to be like this. Preferably use sed to produce the output.
test{seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}
Efforts by OP: Here is the one somewhat worked but not exactly what I wanted:
sed ':a;N;/{/s/[[:space:]]\+//;/}/s/}/}/;ta;P;D' logfile
It can be done using gnu-awk with a custom RS regex that matches { and closing }:
awk -v RS='{[^}]+}' 'NR==1 {gsub(/[[:space:]]+/, "", RT)} {ORS=RT} 1' file
test {seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}
Here:
NR==1 {gsub(/[[:space:]]+/, "", RT)}: For the first record, remove all whitespace (including line breaks) from RT.
{ORS=RT}: Set ORS to RT, the text that the RS regex actually matched.
PS: Remove NR==1 if you want to do this for entire file.
With your shown samples, please try the following awk program, written and tested in GNU awk.
awk -v RS= '
match($0,/{\nseq-cont {\n[^}]*/){
val=substr($0,RSTART,RLENGTH)
gsub(/[[:space:]]+/,"",val)
print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
}
' Input_file
Explanation: Setting RS to the empty string puts awk in paragraph mode, so the whole sample (which contains no blank lines) is read as a single record. The match function then finds everything from the { before seq-cont up to (but not including) the next }. All whitespace, including newlines, is removed from the matched text, and the record is printed with the edited text spliced back in, giving the output the OP asked for.
You can do that much more easily with perl:
perl -0777 -i -pe 's/\s+(seq-cont\s*\{[^}]*\})/$1=~s|\s+||gr/ge' logfilepath
The -0777 option tells perl to slurp the file into a single string, and -i saves the changes in place. The regex \s+(seq-cont\s*\{[^}]*\}) matches one or more whitespace characters, then captures into Group 1 ($1) seq-cont, zero or more whitespace characters, and the substring from the leftmost { to the next } ([^}]* matches zero or more characters other than }). Because of the e flag, the replacement is evaluated as code: $1=~s|\s+||gr returns a copy of Group 1 with every run of whitespace removed. The g flag (next to e) makes this apply to all occurrences.
See the online demo:
#!/bin/bash
s=$(echo -e "test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n\t\tgrp-id 505\n\t}\n}\ntest{\n\ttest1{\n\t\tval\n\t}\n}\n")
perl -0777 -pe 's/\s+(seq-cont\s*\{[^}]*\})/$1=~s|\s+||gr/ge' <<< "$s"
Output:
test {seq-cont{0,67,266},
grp-id 505
}
}
test{
test1{
val
}
}
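The nested substitution used in the perl answer (an outer match that selects the seq-cont block, an inner replacement that strips whitespace from it) can also be sketched in Python. This is only an illustration under the same assumptions as the perl regex: the block contains no nested }. The name squeeze_seq_cont is hypothetical.

```python
import re

# Same pattern as the perl one-liner: leading whitespace, then the
# seq-cont block up to the first closing brace (captured as group 1).
SEQ_CONT = re.compile(r'\s+(seq-cont\s*\{[^}]*\})')

def squeeze_seq_cont(text: str) -> str:
    # The callback plays the role of perl's /e flag: it returns group 1
    # with every run of whitespace removed, and the leading whitespace
    # that preceded the block is dropped.
    return SEQ_CONT.sub(lambda m: re.sub(r'\s+', '', m.group(1)), text)

log = ("test {\nseq-cont {\n\t\t\t0,\n\t\t\t67,\n\t\t\t266\n\t\t\t},\n"
       "\t\tgrp-id 505\n\t}\n}\ntest{\n\ttest1{\n\t\tval\n\t}\n}\n")
print(squeeze_seq_cont(log))
```

Only seq-cont blocks are touched; the test1 block keeps its line breaks and indentation.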

Split large line into smaller lines and insert special character at the beginning of newline

I have a file containing large lines. The lines should be divided into separate lines after two characters, and the new line should have // appended at the beginning.
What I have :
MEANDER_XY
MEANDER_WS
What is required :
ME
//AN
//DE
//R_X
//Y
ME
//AN
//DE
//R_W
//S
I have used sed -e 's/.\{2\}/&\n/g'. The line is getting divided but I also need // to be appended as shown.
This might work for you (GNU sed):
sed 's/[A-Z]_\?[A-Z]/&\n\/\//g' file
Match an uppercase A thru Z, followed by an optional underscore, followed by another uppercase A thru Z and replace it by itself, followed by a newline, followed by two forward slashes, globally.
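The same substitution can be tried in Python to see what each match contributes. This is a sketch only, and split_pairs is a made-up name. It shares one caveat with the sed command: when a line ends exactly on a matched pair, a trailing // line is produced.

```python
import re

def split_pairs(line: str) -> str:
    # Each match is two uppercase letters with an optional underscore
    # between them; it is replaced by itself plus a newline and "//".
    return re.sub(r'[A-Z]_?[A-Z]', lambda m: m.group(0) + '\n//', line)

for line in ("MEANDER_XY", "MEANDER_WS"):
    print(split_pairs(line))
```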
Could you please try the following; it was tested with your given examples.
awk '
{
count=""
while($0){
if(match($0,/[a-zA-Z][^a-zA-Z]*[a-zA-Z]/)){
value=substr($0,RSTART,RLENGTH)
if(++count>1){
print "//"value
}
else{
print value
}
$0=substr($0,RSTART+RLENGTH)
}
else{
if(length($0)){
print "//" $0
next
}
}
}
}' Input_file
Using GNU awk, you can define fields by their pattern using FPAT. You state:
The lines should be divided into separate lines after two characters, and the new line should have // appended at the beginning.
Your output however assumes two alphabetic characters. The pattern you are after is:
FPAT="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]?"
It matches one or two alphabetic characters, which can be preceded or interleaved with non-alphabetic characters (but not followed by them):
MEANDER_XY -> ME\n//AN\n//DE\n//R_X\n//Y
MEANDERS_XY -> ME\n//AN\n//DE\n//RS\n//_XY
So, the following awk will do the trick:
awk 'BEGIN{FPAT="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]?"; OFS="\n//"}{$1=$1}1'
Using POSIX awk, you can do the more extended version that will always work:
awk 'BEGIN{regex="[^[:alpha:]]*[[:alpha:]][^[:alpha:]]*[[:alpha:]]"; OFS="\n//"}
{ s=$0
while(match(s,regex)) {
# use an explicit format so a % in the data is not read as a format specifier
printf "%s%s", substr(s,RSTART,RLENGTH), (length(s)==RLENGTH ? "" : OFS)
s=substr(s,RLENGTH+1)
}
print s
}'
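The field-by-pattern idea behind FPAT can be mimicked in Python with re.findall followed by a join on the output separator. This is a rough sketch, assuming ASCII letters stand in for [[:alpha:]] (as in the C locale); the names CHUNK and split_alpha_pairs are hypothetical.

```python
import re

# One or two letters, optionally preceded or interleaved with
# non-letters; A-Za-z is assumed to be equivalent to [[:alpha:]].
CHUNK = re.compile(r'[^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z]?')

def split_alpha_pairs(line: str) -> str:
    # findall returns the successive "fields"; joining with "\n//"
    # corresponds to setting OFS and rebuilding the record in awk.
    return '\n//'.join(CHUNK.findall(line))

print(split_alpha_pairs("MEANDER_XY"))
print(split_alpha_pairs("MEANDERS_XY"))
```

Because the separator is only placed between fields, a line with an even number of letters does not end in a dangling // line.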

How to remove a specific character before and after a pattern match in shell script?

I want to remove "#" after a specific pattern match.
Ex:
In the code below I will grep for "SOME DESC". I want all the commented lines above and below "SOME DESC" to be uncommented.
existing code
#define service {
# hostgroup_name hmaster_hosts
# use local-service
# servicegroups SOME_GROUP
# service_description SOME DESC : service Service
# check_command check_nrpe!check_something
#}
After removal of "#"
define service {
hostgroup_name hmaster_hosts
use local-service
servicegroups SOME_GROUP
service_description SOME DESC : service Service
check_command check_nrpe!check_something
}
I tried the code below to make the changes.
sed -i 's/#define service {/define service {/g' services.cfg
sed -i 's/# hostgroup_name / hostgroup_name /' services.cfg
sed -i 's/# use / use /' services.cfg
sed -i 's/# servicegroups/ servicegroups/' services.cfg
sed -i 's/# service_description / service_description /' services.cfg
sed -i 's/# check_command / check_command /' services.cfg
sed -i 's/#}/}/g' services.cfg
But the position of # is uncertain in the code, i.e. it can be #hostgroup_name or # hostgroup_name with a varying number of spaces after the #, so my approach did not work for some parts of the code. Is there a better way to do this irrespective of the position of #?
If you have GNU awk, you can try the following:
awk -v search='SOME DESC' -v RS='(^|\n)#define service \\{[^}]*\\#\\}\n' '
index(RT, search) { RT = gensub("(^|\n)#", "\\1", "g", RT) }
{ printf "%s%s", $0, RT }
' file
The above assumes that the commented lines start directly with # and that the code follows directly after, as in the sample input in the question. For a variant solution that works with variable amounts of whitespace, see bottom.
This assumes that commented lines of interest are blocks of commented define service { ... } lines that should be uncommented as a whole if the search string is found inside the block.
-v search='SOME DESC' pass the literal string to search for as Awk variable search.
-v RS='(^|\n)#define service \\{[^}]*\\#\\}\n' defines RS, the input-record separator, as a regular expression that starts with a commented-out define service { line and ends with a commented-out } line, followed by a newline.
This means that the data reported in $0 for the current record comprises the line(s) before each block of interest.
However, GNU awk exposes the actual record terminator (separator) that the regex in RS matched via the (nonstandard) variable RT. Thus, it is the value of RT that contains a block of interest in each iteration.
index(RT, search) returns the 1-based index of the search string's position inside the block at hand, or 0, if the block doesn't contain the search string. When used as a pattern (a Boolean condition), the associated action ({...}) is therefore only executed if the block contains the search string.
RT = gensub("(^|\n)#", "\\1", "g", RT) removes the comment char. (#) from the very beginning of all lines in the block.
Note that gensub() is a GNU-specific function that notably allows the use of references to capture groups (\1 refers to what capture group (^|\n) matched; the extra \ is needed, because awk's string parsing processes \-prefixed escape sequences too).
{ printf "%s%s", $0, RT } prints the current record ($0) followed by the - potentially uncommented - block.
Variant that works with variable amounts of whitespace:
awk -v search='SOME DESC' -v RS='(^|\n)[[:blank:]]*#[[:blank:]]*define[[:blank:]]+service[[:blank:]]+\\{[^}]*\\#[[:blank:]]*\\}\n' '
index(RT, search) { RT = gensub("(^|\n)[[:blank:]]*#", "\\1", "g", RT) }
{ printf "%s%s", $0, RT }
' file
This is essentially the same solution as above, except that (potentially empty) runs of spaces/tabs before and after the # are matched ([[:blank:]]*), and nonempty runs of variable length between the tokens of the define line ([[:blank:]]+).
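As a cross-check of the block-wise logic (find each commented-out define service block, and uncomment it only if it contains the search string), here is a Python sketch. It is an approximation of the whitespace-tolerant variant, not a port of the awk code: [ \t] stands in for [[:blank:]], and the names BLOCK and uncomment_matching_blocks are made up.

```python
import re

# A commented-out "define service {" line, anything up to the first
# closing brace, where that brace is itself preceded by a '#'.
BLOCK = re.compile(
    r'^[ \t]*#[ \t]*define[ \t]+service[ \t]+\{[^}]*#[ \t]*\}[ \t]*$',
    re.MULTILINE,
)

def uncomment_matching_blocks(text: str, search: str) -> str:
    def fix(match):
        block = match.group(0)
        if search not in block:
            return block  # leave unrelated blocks commented out
        # Drop leading blanks and the first '#' on every line of the block.
        return re.sub(r'^[ \t]*#', '', block, flags=re.MULTILINE)
    return BLOCK.sub(fix, text)

cfg = (
    "# keep me\n"
    "#define service {\n"
    "#   hostgroup_name hmaster_hosts\n"
    "# service_description SOME DESC : service Service\n"
    "#}\n"
    "#define service {\n"
    "# service_description OTHER\n"
    "#}\n"
)
print(uncomment_matching_blocks(cfg, "SOME DESC"))
```

Whitespace after the # is preserved, so the original indentation of the uncommented lines survives.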

Perl String parsing multiple patterns

I'm trying to parse a text file which has multiple patterns.
The goal is to keep everything between * * and only the integers between ^ ^; any special characters or other strings found there should be removed.
data.txt
*ABC-13077* ^817266, 55555^
*BCD-13092* ^CL: 816933^
*CDE-13127* ^ ===> Change 767666 submitted^
output.txt
ABC-13077 817266 55555
BCD-13092 816933
CDE-13127 767666
my script
#!/usr/bin/perl
use strict;
use Cwd;
my $var;
open(FH,"changelists.txt")or die("can't open file:$!");
while($var=<FH>)
{
my @vareach=split(/[* \s\^]+/,$var);
for my $each(@vareach)
{
print "$each\n";
}
}
Replace the while loop with the following:
while (<FH>) {
s/\*(.*)\*/$1/;
s/\^(.*)\^/ join ' ', $1 =~ m([0-9]+)g /e;
print;
}
The first substitution removes the asterisks.
The second substitution takes the ^...^ part, and replaces it with the result of the code in the replacement part because of the /e modifier. The code matches all the integers, and as join forces list context on the match, it returns all the matches.
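The two steps (take the text between the asterisks, then keep only the digit runs from the text between the carets) can be sketched in Python as follows. Note the difference from the Perl answer: this version rebuilds the line from the extracted pieces rather than editing it in place, and parse_line is a hypothetical name.

```python
import re

def parse_line(line: str) -> str:
    key = re.search(r'\*(.*)\*', line)    # text between the asterisks
    nums = re.search(r'\^(.*)\^', line)   # text between the carets
    parts = [key.group(1)] if key else []
    if nums:
        # Like the list-context match in Perl, findall returns every
        # run of digits inside the ^...^ section.
        parts += re.findall(r'[0-9]+', nums.group(1))
    return ' '.join(parts)

for line in ("*ABC-13077* ^817266, 55555^",
             "*BCD-13092* ^CL: 816933^",
             "*CDE-13127* ^ ===> Change 767666 submitted^"):
    print(parse_line(line))
```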

Extract text between braces

I have a string as:
MESSAGES { "Instance":[{"InstanceID":"i-098098"}] } ff23710b29c0220849d4d4eded562770 45c391f7-ea54-47ee-9970-34957336e0b8
I need to extract the part { "Instance":[{"InstanceID":"i-098098"}] } i.e. from the first occurrence of '{' to the last occurrence of '}' and keep it in a separate file.
If you have this in a file,
sed 's/^[^{]*//;s/[^}]*$//' file
(This will print to standard output. Redirect to a file or capture into a variable or do whatever it is that you want to do with it.)
If you have this in a variable called MESSAGES,
EXTRACTED=${MESSAGES#*{}
EXTRACTED="{${EXTRACTED%\}*}}"
I would suggest either sed or awk from this article. But initial testing shows it's a little more complicated, and you will probably have to use a combination or pipe:
echo "MESSAGES { "Instance":[{"InstanceID":"i-098098"}] } ff23710b29c0220849d4d4eded562770 45c391f7-ea54-47ee-9970-34957336e0b8" | sed 's/^\(.*\)}.*$/\1}/' | sed 's/^[^{]*{/{/'
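So the first sed deletes everything after the last } and replaces it with a } so it still shows; the second sed deletes everything up to the first { and replaces it with a { so it still shows.
This is the output I got:
{ Instance:[{InstanceID:i-098098}] }
(The double quotes are missing because the echo argument is itself wrapped in double quotes, so the shell strips the inner ones; wrapping the string in single quotes preserves them.)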
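The "first { to last }" rule does not need a regex at all. A minimal Python sketch using str.find and str.rfind is shown below; extract_braced is a made-up name, and the sample string is the one from the question.

```python
def extract_braced(s: str) -> str:
    start = s.find('{')   # index of the first opening brace
    end = s.rfind('}')    # index of the last closing brace
    if start == -1 or end < start:
        raise ValueError("no brace-delimited span found")
    return s[start:end + 1]

msg = ('MESSAGES { "Instance":[{"InstanceID":"i-098098"}] } '
       'ff23710b29c0220849d4d4eded562770 '
       '45c391f7-ea54-47ee-9970-34957336e0b8')
print(extract_braced(msg))
```

Because the string is handled as data rather than passed through the shell, the inner double quotes are preserved.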

Resources