Replace with SED multiple occurrences at the same line

I want to replace every slash "/" between alphanumeric characters with backslash+slash "\/", except the last one in each string, e.g.
nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/mno\/pq/r
should become:
nocareNocare abc\/def\/ghi\/mno\/pq/r abc\/def\/ghi\/mno\/pq/r
I use:
sed 's/\(.*\)\([[:alnum:]]\)\/\([[:alnum:]]\)\(\S*\)\(\\\|\/\)/\1\2\\\/\3\4\//g'
Short explanation: match
any string + alnum + / + any non-white + / or \
But it only replaces one occurrence per run, so I need to run it 3 times to replace all 3 occurrences. It looks like the first time it matches all the way to:
>nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/
instead of
>nocareNocare abc\/def/

sed -e :a -e 's|\([a-z0-9]\)/\([a-z0-9][^ ]*[a-z0-9]/[a-z0-9]\)|\1\\/\2|;ta' filename
Loosely translated, this says "replace a lone slash followed by some other stuff in the string, followed by another lone slash, with backslash-slash and that same stuff (and the second slash). And after making such a replacement, start over again."
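
A quick way to check the loop is to feed it the sample line from the question. This is only a sketch, assuming GNU sed and a POSIX shell; the variable names input and result are illustrative and not part of the original answer.

```shell
# Sample line from the question (single quotes keep the backslashes literal).
input='nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/mno\/pq/r'

# :a defines a label; ta jumps back to it whenever the s command made a
# substitution, so one unescaped slash is fixed per pass until none qualify.
result=$(printf '%s\n' "$input" |
  sed -e :a -e 's|\([a-z0-9]\)/\([a-z0-9][^ ]*[a-z0-9]/[a-z0-9]\)|\1\\/\2|;ta')

printf '%s\n' "$result"
```

Note that the bracket expressions only cover lowercase letters and digits, so input with uppercase characters next to a slash would need [[:alnum:]] instead.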

You can use a Perl command-line solution based on the following regex assertions:
(?<!\\)
not preceded by a backslash
(?!\w+\s)
not followed by word characters terminating in whitespace
perl -pe 's;(?<!\\)/(?!\w+\s);\\/;g' file
nocareNocare abc\/def\/ghi\/mno\/pq/r abc\/def\/ghi\/mno\/pq/r

With GNU sed:
sed -E 's:([^\])/:\1\\/:g;s:\\/([^\]*( |$)):/\1:g' file
There are two s commands here:
s:([^\])/:\1\\/:g replace all / not preceded by a \ with \/
s:\\/([^\]*( |$)):/\1:g replace last \/ before space or end of line with /
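
The two passes can be tried on the question's sample line as follows. This is a sketch that assumes GNU sed (as the answer does); the variable names are illustrative.

```shell
# Sample line from the question.
input='nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/mno\/pq/r'

# Pass 1 escapes every slash that is not already preceded by a backslash.
# Pass 2 turns the last \/ before a space or end of line back into /.
result=$(printf '%s\n' "$input" |
  sed -E 's:([^\])/:\1\\/:g;s:\\/([^\]*( |$)):/\1:g')

printf '%s\n' "$result"
```

Using : as the delimiter keeps the slashes in both patterns unescaped, which makes the two expressions easier to read.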

Related

Linux sed regular expression

I have a string:
2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB
I want to get the number 9135, the first value that occurs between '=' and '/'. My current command, shown below, works, but I don't think it's ideal:
sed -r 's/.* = ([0-9]+) .* = .*/\1 /'
I need a neater one; please advise.

You can use
sed -En 's~.*= ([0-9]+) /.*=.*~\1~p'
An awk solution:
awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}'
Details:
-En - E (or r as in your example) enables the POSIX ERE syntax and n suppresses the default line output
.*= ([0-9]+) /.*=.* - matches any text, =, and a space, captures one or more digits into Group 1, then matches a space, /, any text, =, and again any text
\1 - replaces with Group 1 value
p - prints the result of the substitution.
Here, ~ are used as regex delimiters in order not to escape / in the pattern.
awk:
-F= - sets the input field separator to =
gsub(/\/.*|[^0-9]/,"",$2) - removes from Field 2 every non-digit character before the first /, and then that / together with the rest of the field
print $2 - prints the modified Field 2 value.
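
Both commands can be checked against the log line from the question. This is a minimal sketch, assuming a sed that accepts -E and any POSIX awk; the variable names line, from_sed, and from_awk are illustrative.

```shell
# Log line copied from the question.
line='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'

# sed: capture the digits between the first "= " and " /", print only on match.
from_sed=$(printf '%s\n' "$line" | sed -En 's~.*= ([0-9]+) /.*=.*~\1~p')

# awk: split on =, then strip everything but the leading number from field 2.
from_awk=$(printf '%s\n' "$line" | awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}')

printf 'sed: %s\nawk: %s\n' "$from_sed" "$from_awk"
```

Both variables should end up holding 9135 for this input.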

You could also get the first match with grep, using -P for Perl-compatible regular expressions.
grep -oP "^.*? = \K\d+(?= /)"
^ Start of string
.*? Match as few characters as possible
" = " Match a space, an = sign, and a space
\K Forget what has been matched so far
\d+ Match one or more digits
(?= /) Assert a space and / to the right
Output
9135

Since you want the material between the first = and the first /, ignoring the spaces, you could use:
sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%'
This uses Extended Regular Expressions (ERE) (-E; -r also works with GNU sed), and searches from the start of the line for a sequence of 'not =' characters, the = character, a space, anything that's not a slash (which is remembered), another space, a slash, and anything that follows, replacing it all with what was remembered. The ^ and $ anchors aren't crucial; it will work the same without them. The % symbols are used instead of / because the searched-for pattern includes a /. If you're sure there'll never be any spaces other than the first and last ones between the = and /, you can use [^ /]* in place of [^/]* and there should be some small (probably immeasurable) performance benefit.
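
As a rough check, the command and its [^ /]* variant can be run against the line from the question. This is a sketch that assumes a sed supporting -E; the variable names are illustrative.

```shell
# Log line copied from the question.
line='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'

# Original form: the capture may contain anything except a slash.
loose=$(printf '%s\n' "$line" | sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%')

# Stricter form: the capture may contain neither spaces nor slashes.
strict=$(printf '%s\n' "$line" | sed -E -e 's%^[^=]*= ([^ /]*) /.*$%\1%')

printf '%s\n%s\n' "$loose" "$strict"
```

Because [^=]* cannot cross an = sign, the match is pinned to the first = on the line, which is why the second group of numbers is never picked up.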

Sed - remove all semicolons between a pair of double-quotes

I have a dirty csv-file containing rows with quoted semicolons. I am trying to clear these semicolons with commands like:
sed -rin 's/(^.*\;.*\;\".*)(\;)(.*\"\;.*$)/\1\3/' file
But somehow this doesn't remove all of the semicolons. Some of the problematic rows look like this:
;0;"One ▒;)";123; ... ; nth-1column;
;0;"Two ▒;)";456; ... ; nthcolumn;
When they should be cleaned to:
;0;"One ▒)";123; ... ; nth-1column;
;0;"Two ▒)";456; ... ; nthcolumn;
There might be some encoding issues, but this should be ignored by the regex. I am only interested in removing the semicolons, the encoding is handled afterwards.
Any ideas on how to aggressively clean all semicolons contained within double-quotes?

This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^;"]*"[^"]*)*"[^";]*);/\1/;ta' file
The back reference captures, from the start of the line, any characters outside double quotes, any complete quoted strings that contain no semicolons, and then an opening double quote followed by characters that are neither double quotes nor semicolons. If the next character is a semicolon, it is removed, and the substitution is repeated until it fails; the result is then printed.
An alternative:
sed -E '/^([^"]*("[^";]*"[^"]*)*"[^";]*);/{s//\n\1/;D}' file
or:
sed -E 's/^([^"]*("[^";]*"[^"]*)*"[^";]*);/\n\1/;T;D' file
EDIT:
sed -nE '/^([^"]*("[^";]*"[^"]*)*"[^";]*);/{:a;s//\1/;ta;p}' file
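
The loop-based command at the top of this answer can be tried on a couple of rows. This is a sketch assuming GNU sed; the rows are hypothetical ASCII-only stand-ins modelled on the question's data, and the variable names are illustrative.

```shell
# Hypothetical rows: the first has one quoted semicolon, the second has two.
rows=';0;"One x;)";123;
;0;"a;b;c";456;'

# Each pass removes the first semicolon found inside an open quoted field;
# ta repeats the substitution until no such semicolon remains.
cleaned=$(printf '%s\n' "$rows" |
  sed -E ':a;s/^([^"]*("[^;"]*"[^"]*)*"[^";]*);/\1/;ta')

printf '%s\n' "$cleaned"
```

The semicolons that separate fields are left alone because they never sit between an opening quote and its closing quote.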

You can use
sed ':a;s/^\(\([^"]*;\?\|"[^";]*";\?\)*"[^";]*\);/\1/;ta' file
It works like this:
:a - sets a label
^\(\([^"]*;\?\|"[^";]*";\?\)*"[^";]*\); - find:
^ - start of string
\(\([^"]*;\?\|"[^";]*";\?\)*"[^";]*\) - Group 1:
\([^"]*;\?\|"[^";]*";\?\)* - zero or more occurrences of
[^"]*;\? - zero or more chars other than " and then an optional ;
\| - or
"[^";]*";\? - ", then zero or more chars other than " and ; and then a " and then an optional ;
" - a " char
[^";]* - zero or more chars other than a ; and "
; - a semi-colon
\1 - replace with Group 1 value
ta - if there was a substitution, jump back to the a label.

How to remove the first three characters from the fasta file header

I have a fasta file like this:
>rna-XM_00001.1
actact
>rna-XM_00002.1
atcatc
How do I remove the 'rna-' so it becomes
>XM_00001.1
actact
>XM_00002.1
atcatc

What you're showing is the file contents? Then sed should be able to do this:
sed 's/^>rna-/>/' < inputfile > outputfile
Explanation:
The first character of the command-line to sed is s, which tells sed to do substitution
The / are delimiters
The ^ tells sed to look only at the start of a line
The next >rna- is the pattern to match at the start of a line
The next > is the replacement substituted for the pattern
If, instead, you want to always remove the first four characters after a > as long as they end in -, you could use:
sed 's/^>...-/>/' < inputfile > outputfile
Explanation:
This is similar to above, except the pattern to match at the start of a line is >...-. The pattern is a regexp, where a . matches any single character. So this pattern matches any line starting with >, followed by any three characters, followed by -.
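
Both variants can be checked on the records from the question without touching any files. This is a minimal sketch; the variable names fasta, fixed, and generic are illustrative.

```shell
# FASTA records copied from the question.
fasta='>rna-XM_00001.1
actact
>rna-XM_00002.1
atcatc'

# Literal prefix: only headers starting with >rna- are changed.
fixed=$(printf '%s\n' "$fasta" | sed 's/^>rna-/>/')

# Generic prefix: any three characters followed by - after the >.
generic=$(printf '%s\n' "$fasta" | sed 's/^>...-/>/')

printf '%s\n' "$fixed"
```

Sequence lines are untouched in both cases because they do not start with >.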

Two pattern match on same sed command

I have the following sed command:
sed -n '/^out(/{n;p}' ${filename} | sed -n '/format/ s/.*format=//g; s/),$//gp; s/))$//gp'
I tried to do it as one line as in:
sed -n '/^out(/{n;}; /format/ s/.*format=//g; s/),$//gp; s/))$//gp' ${filename}
But that also displays the lines I don't want (those that do not match).
What I have is a file with some strings as in:
entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),
...
I just want the format lines that come right after an out entry, with the trailing )) or ), removed.

You can use this sed command:
sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}' your_file
Once an out line is found, it advances to the next line (n) and uses the s command with the p flag to extract only what is inside the parentheses.
Explanation:
I used [(] instead of \(. Outside brackets a ( usually means grouping; if you want a literal (, you need to escape it as \( or put it inside brackets. Most RE special characters don't need escaping when put inside brackets.
([^)]+) is a group (the parentheses here are RE metacharacters, not literal parentheses) that consists of one or more (+) characters that are not a literal closing parenthesis; the ^ inverts the character class [ ... ].
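
Running the command on the sample lines from the question shows what it extracts. This is a sketch assuming GNU sed (for -r); the variable names are illustrative.

```shell
# Sample lines copied from the question.
input='entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),'

# On a line starting with out(, read the next line (n) and print only
# the text between its parentheses.
result=$(printf '%s\n' "$input" |
  sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}')

printf '%s\n' "$result"
```

Only the format line that follows out( contributes output; the earlier format line is skipped because no out( line precedes it. The command does not itself check that the line after out( is a format line.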

Using sed to match anything and \s

I've got the following:
sed -i "s/SYNFLOOD_RATE = \"100/s\"/SYNFLOOD_RATE = \"10\s\"/g"
Question is how do I avoid this error?
/bin/sed: -e expression #1, char 28: unknown option to `s'
And is there a way to do a wild card match and replace with sed?

You have too many slashes, 4 when there should be 3. Use a different delimiter; comma (,), bang (!), hash (#), and at (@) are common alternatives.
sed -i "s,SYNFLOOD_RATE = \"100/s\",SYNFLOOD_RATE = \"10\s\",g"
Note that you have "100/s" in the original and "10s" (no slash) in the replacement. To actually insert a backslash, you'd need to enter 4 of them: 10\\\\s. Each pair will get reduced to a single by the shell and then the remaining double will be interpreted as a literal backslash by sed.
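
The delimiter change, plus one possible take on the wildcard part of the question, can be sketched as follows. The input line is a hypothetical stand-in for the config file, and the target value "10/s" is an assumption about what was intended; the variable names are illustrative.

```shell
# Hypothetical config line modelled on the question.
line='SYNFLOOD_RATE = "100/s"'

# With , as the delimiter, the / inside the value needs no escaping.
exact=$(printf '%s\n' "$line" |
  sed 's,SYNFLOOD_RATE = "100/s",SYNFLOOD_RATE = "10/s",')

# Wildcard variant: [^"]* matches whatever the old quoted value was.
wild=$(printf '%s\n' "$line" |
  sed 's,\(SYNFLOOD_RATE = \)"[^"]*",\1"10/s",')

printf '%s\n%s\n' "$exact" "$wild"
```

Single quotes around the sed script avoid the extra layer of shell escaping that the double-quoted version needs for the embedded double quotes.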

If you want to select matching lines first and then substitute:
sed -i '/SYNFLOOD_RATE = \"100/s/"\/SYNFLOOD_RATE = \"10\s\"/replacement/g'
But the delimiter can be something other than /, for example:
sed -i '/SYNFLOOD_RATE = "100/s#"/SYNFLOOD_RATE = "10\s"#replacement#g'
(the delimiter here is #)
