Two pattern match on same sed command - linux

I have the following sed command:
sed -n '/^out(/{n;p}' ${filename} | sed -n '/format/ s/.*format=//g; s/),$//gp; s/))$//gp'
I tried to do it as one line as in:
sed -n '/^out(/{n;}; /format/ s/.*format=//g; s/),$//gp; s/))$//gp' ${filename}
But that also displays the lines I don't want (those that do not match).
What I have is a file with some strings as in:
entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),
...
I just want the format lines that come right after an out entry, with the trailing )) or ), removed.

You can use this sed command:
sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}' your_file
Once an out line is found, sed advances to the next line (n) and uses the s command with the p flag to extract only what is inside the parentheses.
Explanation:
I used [(] instead of \(. With -r (extended regular expressions), a bare ( outside brackets starts a group; to match a literal ( you either escape it as \( or put it inside brackets. Most RE special characters don't need escaping when put inside brackets.
([^)]+) is a group (the outer parentheses here are RE metacharacters, not literal parentheses) that matches one or more (+) characters that are not a literal closing parenthesis; the leading ^ inverts the character class [ ... ].
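To see the command at work, here is a minimal sketch that runs it on the sample lines from the question. The variable names are only for illustration, -E is used in place of -r (GNU sed accepts both), and a ; is added before the closing brace, which some non-GNU seds require.

```shell
#!/bin/sh
# Sample input copied from the question (the trailing "..." line is left out).
input='entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),'

# -n: print only when asked to; -E: extended regular expressions.
# On a line starting with "out(", load the next line (n), then keep only
# the text between the parentheses and print it (p).
result=$(printf '%s\n' "$input" |
  sed -nE '/^out[(]/{n;s/.*[(]([^)]+)[)].*/\1/p;}')

printf '%s\n' "$result"
```

Only the format line that follows out( is printed; the format line after the first entry line is ignored.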

Related

Linux sed regular expression

I have a string:
2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB
I want to get the number 9135, the first value that occurs between '=' and '/'. My current command, shown below, works, but I don't think it's ideal:
sed -r 's/.* = ([0-9]+) .* = .*/\1 /'
I need a neater one. Please advise.
You can use
sed -En 's~.*= ([0-9]+) /.*=.*~\1~p'
An awk solution:
awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}'
Details:
-En - E (or r as in your example) enables the POSIX ERE syntax and n suppresses the default line output
.*= ([0-9]+) /.*=.* - matches any text, = + space, captures one or more digits into Group 1, then matches a space, /, then any text, = and again any text
\1 - replaces with Group 1 value
p - prints the result of the substitution.
Here, ~ are used as regex delimiters in order not to escape / in the pattern.
awk:
-F= - sets the input field separator to =
gsub(/\/.*|[^0-9]/,"",$2) - in Field 2, removes everything from the first / to the end of the field, and any remaining non-digit characters
print $2 - prints the modified Field 2 value.
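As a quick check, the sketch below feeds the log line from the question to both the sed and the awk command. The variable names are illustrative only.

```shell
#!/bin/sh
# The log line from the question.
line='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'

# sed: capture the digits that follow "= " and precede " /",
# provided another "=" appears later on the line.
sed_result=$(printf '%s\n' "$line" | sed -En 's~.*= ([0-9]+) /.*=.*~\1~p')

# awk: split on "=", then strip Field 2 down to its leading number.
awk_result=$(printf '%s\n' "$line" |
  awk -F= '{gsub(/\/.*|[^0-9]/,"",$2); print $2}')

printf 'sed: %s\nawk: %s\n' "$sed_result" "$awk_result"
```

Both commands print 9135 for this line.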
You could also get the first match with grep using -P for Perl-compatible regular expressions.
grep -oP "^.*? = \K\d+(?= /)"
^ Start of string
.*? Match as few chars as possible
' = ' Match a space, = and a space
\K Forget what has been matched so far
\d+ Match one or more digits
(?= /) Assert a space and / to the right
Output
9135
Since you want the material between the first = and the first /, ignoring the spaces, you could use:
sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%'
This uses Extended Regular Expressions (ERE) (-E; -r also works with GNU sed), and searches from the start of the line for a sequence of 'not =' characters, the = character, a space, anything that's not a slash (which is remembered), another space, a slash, and anything that follows, replacing it all with what was remembered. The ^ and $ anchors aren't crucial; it will work the same without them. The % symbols are used instead of / because the searched-for pattern includes a /. If you're sure there'll never be any spaces other than the first and last ones between the = and /, you can use [^ /]* in place of [^/]* and there should be some small (probably immeasurable) performance benefit.
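One practical difference from the earlier .*= ... =.* pattern is that this version does not need a second = later on the line. The sketch below illustrates that with the full log line and with a shortened, hypothetical line that has a single =; the variable names are made up for the example.

```shell
#!/bin/sh
full='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'
# Hypothetical shortened line with a single "=".
short='mem[cur/hi/max] = 9135 / 96586 / 96576 MB'

# Anchored at the first "=": works on both lines.
full_result=$(printf '%s\n' "$full" | sed -E 's%^[^=]*= ([^/]*) /.*$%\1%')
short_result=$(printf '%s\n' "$short" | sed -E 's%^[^=]*= ([^/]*) /.*$%\1%')

# The pattern that requires a later "=" finds nothing on the short line,
# so with -n and p it prints nothing.
short_other=$(printf '%s\n' "$short" | sed -En 's~.*= ([0-9]+) /.*=.*~\1~p')

printf 'full: %s\nshort: %s\nother on short: [%s]\n' \
  "$full_result" "$short_result" "$short_other"
```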

Get Text after word at specific position

I have a file like this
TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word
I want output like this
TT;12-11-18;text;abc;def;word
TA;12-11-18;teet abc;def;word
I want to get the lines where word occurs at position 5, counting from the date 12-11-18. I do not want a line if word is found after this position, that is, at the 6th or 7th position.
I tried this command:
cat file.txt|grep "word" -n1
This prints every line in which the pattern word is matched. How should I solve my problem?
Try this (GNU awk):
awk -F"[; ]" '/12-11-18/ && $6=="word"' file
Or a sed one:
sed -n '/12-11-18;\([^; ]*[; ]\)\{3\}word/p' file
Or grep with basically the same regex (different escaping):
grep -E "12-11-18;([^; ]*[; ]){3}word" file
[^; ] means any character that's not ; or a space.
* means match any number of repetitions (including zero) of the preceding character/group.
-- [^; ]* means a string of any length that doesn't contain ; or a space; the ^ in [^; ] negates the set.
[; ] means a single ; or a single space.
() groups the above together.
{3} matches three repetitions of the preceding character/group.
As a whole, ([^; ]*[; ]){3} matches three fields separated by ; or space, including their delimiters.
As @kvantour points out, these commands can give wrong results if several spaces can occur in a row.
To treat multiple spaces as one separator:
awk -F"(;| +)" '/12-11-18/ && $6=="word"'
and
grep -E "12-11-18;([^; ]*(;| +)){3}word"
or GNU sed (POSIX basic regular expressions have no | alternation; with BSD/macOS sed, use -E to get the extended syntax):
sed -rn '/12-11-18;([^; ]*(;| +)){3}word/p'
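The following sketch runs the awk and grep variants on the four sample lines from the question, to check that both keep only the lines where word is the fifth field counting from the date. The variable names are illustrative.

```shell
#!/bin/sh
# Sample file contents from the question.
input='TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word'

# awk: split on ";" or space; the date is field 2, so "word" must be field 6.
awk_result=$(printf '%s\n' "$input" |
  awk -F'[; ]' '/12-11-18/ && $6 == "word"')

# grep: after the date, exactly three delimited fields, then "word".
grep_result=$(printf '%s\n' "$input" |
  grep -E '12-11-18;([^; ]*[; ]){3}word')

printf '%s\n' "$awk_result"
```

Both commands print the first and third lines only. Note that the regex is not anchored after word, so a field such as wordy would also match; that may or may not matter for the real data.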

How to replace a string between two commas in linux using sed

execute PKG_SP_MAINTENANCE.MoveAccount(91, 129031, 958408630); Lowes
From the above statement I am trying to get the content between the first comma and the second comma, i.e. 129031, and replace it with a new string that is passed as a parameter to the script. For now, let's replace it with N. I tried the following sed command but ended up getting an error. Could someone please help?
04:24:01 Tue Sep 19 [serviceb@LQASRDSBMST002V:~/isops/tmp] cat Navya | sed 's/,^.\{*\},/N/'
sed: -e expression #1, char 14: Invalid content of \{\}
$: echo "start,middle,end" | sed 's/,[^,]*,/,NEW,/g'
start,NEW,end
Is this what you mean? This matches from one comma to the next one and replaces the text between them.
Depending on how you want to handle strings with more than two commas, you could do something like this to match from the first comma to the last one instead:
$: echo "start,middle,end" | sed 's/,.*,/,NEW,/g'
start,NEW,end
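Applied to the statement from the question, the first form could look like the sketch below. The variable new stands in for the script parameter (for example "$1"); this assumes the value contains no /, & or \ characters, which sed would treat specially in the replacement. The a,b,c,d input is a made-up string that shows how the two patterns differ when there are more than two commas.

```shell
#!/bin/sh
new='N'   # hypothetical replacement value; in a script this could be "$1"
stmt='execute PKG_SP_MAINTENANCE.MoveAccount(91, 129031, 958408630); Lowes'

# Double quotes so the shell expands ${new}; only the first match is replaced.
replaced=$(printf '%s\n' "$stmt" | sed "s/,[^,]*,/, ${new},/")

# With more than two commas the two patterns differ:
first_pair=$(printf 'a,b,c,d\n' | sed 's/,[^,]*,/,NEW,/')   # first comma to next comma
outermost=$(printf 'a,b,c,d\n' | sed 's/,.*,/,NEW,/')       # first comma to last comma

printf '%s\n%s\n%s\n' "$replaced" "$first_pair" "$outermost"
```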

Delete non-AGTC characters in a text file

I have a text file that should contain only the characters A, G, C and T. However, it sometimes has a few unknown characters, which I want to delete; if the character is N, I want to replace it with A instead. I also want to escape the lines that start with a > symbol.
So far I know only how to replace N with A, which I do like this :
sed "s/N/A/g" file1.fa >file2.fasta
But I don't know how to do the first task.
Example :
Initial file
first line
AGCCCMCCCN
Target file should be like this
first line
AGCCCCCCA
Any help will be appreciated. Thanks in advance!
You can add further substitutions to your sed command:
sed -e 's/N/A/g' -e 's/[^AGCT>]//g' -e 's/^>/\\>/' -e 's/[^\]>//g' file1.fa > file2.fasta
Pattern 1
-e 's/N/A/g'
Your pattern, replaces all instances of N with A first of all.
Pattern 2
-e 's/[^AGCT>]//g'
Secondly replace all characters that aren't A, G, C, T or > with nothing.
Pattern 3
-e 's/^>/\\>/'
Then replace a > at the start of a line with \>
Pattern 4
-e 's/[^\]>//g'
Finally remove every > that is not preceded by a \ (the pattern [^\]> also consumes the character in front of that >)
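If "escape" in the question means that lines starting with > should simply be left untouched (an assumption, since those are normally FASTA header lines), a shorter alternative is to restrict the substitutions to the other lines. This is a sketch of that reading, not the answer's method; the sample input adds a leading > to the question's first line for illustration.

```shell
#!/bin/sh
# Assumed input: a header line starting with ">" followed by a sequence line.
input='>first line
AGCCCMCCCN'

# /^>/! applies the block only to lines that do NOT start with ">":
# first turn N into A, then delete anything that is not A, G, C or T.
result=$(printf '%s\n' "$input" | sed '/^>/!{s/N/A/g;s/[^AGCT]//g;}')

printf '%s\n' "$result"
```

The header line passes through unchanged and the sequence line becomes AGCCCCCCA, as in the target file from the question.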

Using sed to match anything and \s

I've got the following:
sed -i "s/SYNFLOOD_RATE = \"100/s\"/SYNFLOOD_RATE = \"10\s\"/g"
Question is how do I avoid this error?
/bin/sed: -e expression #1, char 28: unknown option to `s'
And is there a way to do a wild card match and replace with sed?
You have too many slashes, 4 when there should be 3. Use a different delimiter; comma (,), bang (!), hash (#), and at (@) are common alternatives.
sed -i "s,SYNFLOOD_RATE = \"100/s\",SYNFLOOD_RATE = \"10\s\",g"
Note that you have "100/s" in the original and "10s" (no slash) in the replacement. To actually insert a backslash, you'd need to enter 4 of them: 10\\\\s. Each pair will get reduced to a single by the shell and then the remaining double will be interpreted as a literal backslash by sed.
If you want to select the matching lines first and then substitute:
sed -i '/SYNFLOOD_RATE = \"100/s/"\/SYNFLOOD_RATE = \"10\s\"/replacement/g'
The delimiter can also be something other than /:
sed -i '/SYNFLOOD_RATE = "100/s#"/SYNFLOOD_RATE = "10\s"#replacement#g'
(the delimiter here is #)
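The sketch below tries the alternative-delimiter idea on a sample line without -i, so no file is modified. It assumes the intended new value is "10/s" (the question's replacement text is ambiguous), and it uses single quotes so the inner double quotes need no escaping. The second command is one possible answer to the wildcard part of the question: [^"]* matches whatever value is currently between the quotes.

```shell
#!/bin/sh
# Sample configuration line, assumed from the pattern in the question.
line='SYNFLOOD_RATE = "100/s"'

# Exact match, with "," as the delimiter so "/" needs no escaping.
exact=$(printf '%s\n' "$line" |
  sed 's,SYNFLOOD_RATE = "100/s",SYNFLOOD_RATE = "10/s",')

# Wildcard match: keep the key (group 1), replace any quoted value.
wildcard=$(printf '%s\n' "$line" |
  sed -E 's,^(SYNFLOOD_RATE = )"[^"]*",\1"10/s",')

printf '%s\n%s\n' "$exact" "$wildcard"
```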
