Linux sed regular expression


I have a string:
2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB
I want to get the number 9135, which is the first value between '=' and '/'. My current command, shown below, works, but I don't think it's ideal:
sed -r 's/.* = ([0-9]+) .* = .*/\1 /'
I'd like a neater one; please advise.

You can use
sed -En 's~.*= ([0-9]+) /.*=.*~\1~p'
An awk solution:
awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}'
Details:
-En - E (or r as in your example) enables the POSIX ERE syntax and n suppresses the default line output
.*= ([0-9]+) /.*=.* - matches any text, = + space, captures one or more digits into Group 1, then matches a space, /, then any text, = and again any text
\1 - replaces with Group 1 value
p - prints the result of the substitution.
Here, ~ are used as regex delimiters in order not to escape / in the pattern.
awk:
-F= - sets the input field separator to =
gsub(/\/.*|[^0-9]/,"",$2) - in Field 2, removes the first / together with the rest of the field, and any non-digit chars before it
print $2 - prints the modified Field 2 value.
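As a quick check of the two commands explained above, here is a minimal sketch that runs both against the sample line from the question. The variable names (line, sed_result, awk_result) are only for illustration, and it assumes a sed that accepts -E (GNU sed does).

```shell
# Sample log line from the question, held in a variable for the demo.
line='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'

# sed: the leading greedy .* has to back off to the first "= digits /",
# because the pattern still requires another = later in the line.
sed_result=$(printf '%s\n' "$line" | sed -En 's~.*= ([0-9]+) /.*=.*~\1~p')

# awk: with = as the separator, field 2 is the text between the first and second =.
awk_result=$(printf '%s\n' "$line" | awk -F= '{gsub(/\/.*|[^0-9]/,"",$2); print $2}')

printf 'sed: %s\nawk: %s\n' "$sed_result" "$awk_result"
```

Both commands should print 9135 for this line. Note that the sed version relies on a second = appearing after the wanted number, so it would print nothing for a line that contains only one = sign.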

You could also get the first match with grep using -P for Perl-compatible regular expressions.
grep -oP "^.*? = \K\d+(?= /)"
^ Start of string
.*? Match as few chars as possible
= Match space = and space
\K\d+ Forget what is matched so far, then match 1+ digits
(?= /) Assert a space and / to the right
Output
9135

Since you want the material between the first = and the first /, ignoring the spaces, you could use:
sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%'
This uses Extended Regular Expressions (ERE) (-E; -r also works with GNU sed), and searches from the start of the line for a sequence of 'not =' characters, the = character, a space, anything that's not a slash (which is remembered), another space, a slash, and anything that follows, replacing it all with what was remembered. The ^ and $ anchors aren't crucial; it will work the same without them. The % symbols are used instead of / because the searched-for pattern includes a /. If you're sure there'll never be any spaces other than the first and last ones between the = and /, you can use [^ /]* in place of [^/]* and there should be some small (probably immeasurable) performance benefit.
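To see that the [^/]* and [^ /]* variants behave the same on the sample data, here is a small sketch. The shortened line and the variable names are made up for the demo; only the two sed expressions come from the answer.

```shell
# Shortened version of the log line: only the part around the = signs matters here.
line='MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'

# Variant from the answer: capture everything that is not a slash.
loose=$(printf '%s\n' "$line" | sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%')

# Stricter variant: the captured text may contain neither spaces nor slashes.
strict=$(printf '%s\n' "$line" | sed -E -e 's%^[^=]*= ([^ /]*) /.*$%\1%')

printf 'loose: %s\nstrict: %s\n' "$loose" "$strict"
```

Because [^=]* cannot cross an = sign, the match is pinned to the first = on the line, so this form does not depend on a second = being present.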

Related

Sed - remove all semicolons between a pair of double-quotes

I have a dirty csv-file containing rows with quoted semicolons. I am trying to clear these semicolons with commands like:
sed -rin 's/(^.*\;.*\;\".*)(\;)(.*\"\;.*$)/\1\3/' file
But somehow this doesn't remove all of the semicolons. Some of the problematic rows look like this:
;0;"One ▒;)";123; ... ; nth-1column;
;0;"Two ▒;)";456; ... ; nthcolumn;
When they should be cleaned to:
;0;"One ▒)";123; ... ; nth-1column;
;0;"Two ▒)";456; ... ; nthcolumn;
There might be some encoding issues, but this should be ignored by the regex. I am only interested in removing the semicolons, the encoding is handled afterwards.
Any ideas on how to aggressively clean all semicolons contained within double-quotes?

This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^;"]*"[^"]*)*"[^";]*);/\1/;ta' file
The pattern anchors at the start of the line and captures, as group 1: any unquoted text, then zero or more complete quoted strings that contain no semicolons (each followed by unquoted text), then an opening double quote and any characters that are neither a double quote nor a semicolon. If the next character is a semicolon, it is removed. The ta loop repeats the substitution until it fails, and then the line is printed.
An alternative:
sed -E '/^([^"]*("[^";]*"[^"]*)*"[^";]*);/{s//\n\1/;D}' file
or:
sed -E 's/^([^"]*("[^";]*"[^"]*)*"[^";]*);/\n\1/;T;D' file
EDIT:
sed -nE '/^([^"]*("[^";]*"[^"]*)*"[^";]*);/{:a;s//\1/;ta;p}' file
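Here is a minimal sketch of the first command applied to rows shaped like the ones in the question. The rows are simplified ASCII stand-ins (the non-ASCII byte is replaced and one row gets an extra quoted semicolon), and the variable names are hypothetical; it assumes GNU sed, as the answer does.

```shell
# Simplified stand-ins for the problem rows; the second row has two quoted semicolons.
rows=';0;"One x;)";123;
;0;"Two; y;)";456;'

# Loop: remove one semicolon that sits inside an open quoted string, then retry.
cleaned=$(printf '%s\n' "$rows" | sed -E ':a;s/^([^"]*("[^;"]*"[^"]*)*"[^";]*);/\1/;ta')

printf '%s\n' "$cleaned"
```

Only the semicolons between the double quotes should disappear; the field separators outside the quotes stay untouched.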

You can use
sed ':a;s/^\(\([^"]*;\?\|"[^";]*";\?\)*"[^";]*\);/\1/;ta' file
It works like this:
:a - sets a label
^\(\([^"]*;\?\|"[^";]*";\?\)*"[^";]*\); - find:
^ - start of string
\(\([^"]*;\?\|"[^";]*";\?\)*"[^";]*\) - Group 1:
\([^"]*;\?\|"[^";]*";\?\)* - zero or more occurrences of
[^"]*;\? - zero or more chars other than " and then an optional ;
\| - or
"[^";]*";\? - ", then zero or more chars other than " and ; and then a " and then an optional ;
" - a " char
[^";]* - zero or more chars other than a ; and "
; - a semi-colon
\1 - replace with Group 1 value
ta - if there was a substitution, go back to a label position.
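The role of the :a ... ta loop described above can be seen by running the same substitution once and then in a loop. This is only a sketch: the row and the variable names are invented for the demo, and the \? and \| operators in a basic regular expression are GNU sed extensions.

```shell
# A row whose quoted field contains two semicolons.
row=';0;"a;b;c";1;'

# The substitution from the answer, kept in a variable so it can be reused.
subst='s/^\(\([^"]*;\?\|"[^";]*";\?\)*"[^";]*\);/\1/'

# Without the loop, only the first quoted semicolon is removed.
once=$(printf '%s\n' "$row" | sed "$subst")

# With the label and ta, the substitution repeats until nothing matches.
all=$(printf '%s\n' "$row" | sed ":a;$subst;ta")

printf 'once: %s\nall:  %s\n' "$once" "$all"
```

Each pass removes exactly one semicolon, which is why the loop is needed when a quoted field contains more than one.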

Get Text after word at specific position

I have file like this
TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word
I want output like this
TT;12-11-18;text;abc;def;word
TA;12-11-18;teet abc;def;word
I want to keep a line only if word occurs at position 5, counting from the date 12-11-18. I do not want lines where it occurs later, at the 6th or 7th position.
I tried this command
cat file.txt|grep "word" -n1
This print all occurrence in which this pattern word is matched. How should I solve my problem?

Try this (GNU awk):
awk -F"[; ]" '/12-11-18/ && $6=="word"' file
Or sed one:
sed -n '/12-11-18;\([^; ]*[; ]\)\{3\}word/p' file
Or grep with basically the same regex(different escape):
grep -E "12-11-18;([^; ]*[; ]){3}word" file
[^; ] means any character that's not ; or (space).
* means match any number of repetitions of the preceding character/group.
-- [^; ]* means a string of any length that doesn't contain ; or space; the ^ in [^; ] negates the set.
[; ] means ; or space, exactly one occurrence.
() groups the above together.
{3} matches three repetitions of the preceding character/group.
As a whole, ([^; ]*[; ]){3} means three fields separated by ; or space, including the delimiters.
As @kvantour points out, these commands can give wrong results if several consecutive spaces can appear in one place.
To consider multiple spaces as one separator, then:
awk -F"(;| +)" '/12-11-18/ && $6=="word"'
and
grep -E "12-11-18;([^; ]*(;| +)){3}word"
or sed with extended regular expressions (-r in GNU sed; POSIX basic regular expressions have no | alternation):
sed -rn '/12-11-18;([^; ]*(;| +)){3}word/p'
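A small sketch that feeds the sample file from the question to the first awk and grep commands, so the field counting can be checked. The data is inlined in a variable (named data here purely for the demo) instead of being read from file.

```shell
# The four sample lines from the question.
data='TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word'

# awk: split on ; or space, keep lines where the 6th field is exactly "word".
awk_out=$(printf '%s\n' "$data" | awk -F'[; ]' '/12-11-18/ && $6=="word"')

# grep: after the date, require exactly three delimited fields before "word".
grep_out=$(printf '%s\n' "$data" | grep -E '12-11-18;([^; ]*[; ]){3}word')

printf 'awk:\n%s\ngrep:\n%s\n' "$awk_out" "$grep_out"
```

Both should keep only the first and third lines. One difference worth knowing: the awk test compares the whole field, while the grep pattern would also accept a field that merely starts with word.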

Replace with SED multiple occurences at the same line

I want to replace all slashes "/" between alphanumeric with backslash+slash "\/" apart from the last one on each string, e.g.
nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/mno\/pq/r
should become:
nocareNocare abc\/def\/ghi\/mno\/pq/r abc\/def\/ghi\/mno\/pq/r
I use:
sed 's/\(.*\)\([[:alnum:]]\)\/\([[:alnum:]]\)\(\S*\)\(\\\|\/\)/\1\2\\\/\3\4\//g'
Short explanation: match
any string + alnum + / + any non-white + / or \
But it only replaces one occurrence per run, so I need to run it 3 times to replace all 3 occurrences. It looks like the first time it matches all the way to:
>nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/
instead of
>nocareNocare abc\/def/

sed -e :a -e 's|\([a-z0-9]\)/\([a-z0-9][^ ]*[a-z0-9]/[a-z0-9]\)|\1\\/\2|;ta' filename
Loosely translated, this says "replace a lone slash followed by some other stuff in the string, followed by another lone slash, with backslash-slash and that same stuff (and the second slash). And after making such a replacement, start over again."
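To check the loop against the example from the question, here is a minimal sketch. The variable names are hypothetical, and note that the character class [a-z0-9] from the answer only covers lowercase letters and digits.

```shell
# Input line from the question.
input='nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/mno\/pq/r'

# Escape a lone slash only when another lone slash follows in the same word,
# then jump back to label a and try again until no such slash is left.
result=$(printf '%s\n' "$input" |
  sed -e :a -e 's|\([a-z0-9]\)/\([a-z0-9][^ ]*[a-z0-9]/[a-z0-9]\)|\1\\/\2|;ta')

printf '%s\n' "$result"
```

The last slash of each word never qualifies, because nothing after it in the same word matches the "another lone slash" part of the pattern.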

You can use a perl command line solution based on the following regEx's
(?<!\\)
not preceded by a backslash
(?!\w+\s)
not followed by word characters terminating in whitespace
perl -pe 's;(?<!\\)/(?!\w+\s);\\/;g' file
nocareNocare abc\/def\/ghi\/mno\/pq/r abc\/def\/ghi\/mno\/pq/r

With GNU sed:
sed -E 's:([^\])/:\1\\/:g;s:\\/([^\]*( |$)):/\1:g' file
Two s command here:
s:([^\])/:\1\\/:g replace all / not preceded by a \ with \/
s:\\/([^\]*( |$)):/\1:g replace last \/ before space or end of line with /
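The two-step idea (escape everything, then undo the last one per word) can be tried on the same example. A sketch assuming GNU sed, with made-up variable names; the intermediate step is shown separately to make the second substitution easier to follow.

```shell
# Input line from the question.
input='nocareNocare abc\/def/ghi/mno\/pq/r abc\/def\/ghi/mno\/pq/r'

# Step 1 only: every / not preceded by a backslash becomes \/.
step1=$(printf '%s\n' "$input" | sed -E 's:([^\])/:\1\\/:g')

# Both steps: the last \/ before a space or end of line is turned back into /.
result=$(printf '%s\n' "$input" | sed -E 's:([^\])/:\1\\/:g;s:\\/([^\]*( |$)):/\1:g')

printf 'step 1: %s\nresult: %s\n' "$step1" "$result"
```

Unlike the loop-based answer, this one needs no label, but it does assume words are separated by single spaces.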

Two pattern match on same sed command

I have the following sed command:
sed -n '/^out(/{n;p}' ${filename} | sed -n '/format/ s/.*format=//g; s/),$//gp; s/))$//gp'
I tried to do it as one line as in:
sed -n '/^out(/{n;}; /format/ s/.*format=//g; s/),$//gp; s/))$//gp' ${filename}
But that also display the lines I don't want (those that do not match).
What I have is a file with some strings as in:
entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),
...
I just want the format lines that came right after the out entry. and remove those trailing )) or ),

You can use this sed command:
sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}' your_file
Once a line starting with out( is found, sed advances to the next line (n) and uses the s command with the p flag to extract only what is inside the parentheses.
Explanation:
I used [(] instead of \(. In an extended regular expression (the -r flag), an unescaped ( starts a group, so a literal ( must be escaped as \( or placed inside brackets. Most regex special characters don't need escaping when put inside brackets.
([^)]+) means a group (the "(" here are RE metacharacters not literal parenthesis) that consists of one or more (+) characters that are not (^) ) (literal closing parenthesis), the ^ inverts the character class [ ... ]
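Here is a small sketch of the command run against the sample lines from the question (without the "..." line). The data is inlined in a variable instead of your_file, and the names are only for illustration; -r assumes GNU sed.

```shell
# Sample lines from the question.
sample='entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),'

# On a line starting with out(, load the next line and print only the text
# between its parentheses; -n keeps every other line quiet.
result=$(printf '%s\n' "$sample" | sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}')

printf '%s\n' "$result"
```

Only the format line that directly follows out( produces output; the earlier format line is skipped because no out( line precedes it.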

Using sed to match anything and \s

I've got the following:
sed -i "s/SYNFLOOD_RATE = \"100/s\"/SYNFLOOD_RATE = \"10\s\"/g"
Question is how do I avoid this error?
/bin/sed: -e expression #1, char 28: unknown option to `s'
And is there a way to do a wild card match and replace with sed?

You have too many slashes, 4 when there should be 3. Use a different delimiter; comma (,), bang (!), hash (#), and at (@) are common alternatives.
sed -i "s,SYNFLOOD_RATE = \"100/s\",SYNFLOOD_RATE = \"10\s\",g"
Note that you have "100/s" in the original and "10s" (no slash) in the replacement. To actually insert a backslash, you'd need to enter 4 of them: 10\\\\s. Each pair will get reduced to a single by the shell and then the remaining double will be interpreted as a literal backslash by sed.
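The delimiter and backslash points above can be checked without touching a real file. This sketch pipes a single made-up config line through sed instead of using -i; the variable names are hypothetical.

```shell
# One line shaped like the setting in the question.
conf_line='SYNFLOOD_RATE = "100/s"'

# Comma as delimiter: the / in the pattern and replacement needs no escaping.
slash_result=$(printf '%s\n' "$conf_line" |
  sed "s,SYNFLOOD_RATE = \"100/s\",SYNFLOOD_RATE = \"10/s\",g")

# Four backslashes inside double quotes: the shell passes two to sed,
# and sed writes one literal backslash into the output.
backslash_result=$(printf '%s\n' "$conf_line" |
  sed "s,SYNFLOOD_RATE = \"100/s\",SYNFLOOD_RATE = \"10\\\\s\",g")

printf '%s\n%s\n' "$slash_result" "$backslash_result"
```

Wrapping the sed script in single quotes instead would remove one layer of escaping, since the shell then passes backslashes and double quotes through unchanged.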

If you want to first select the matching lines with an address and then substitute:
sed -i '/SYNFLOOD_RATE = \"100/s/"\/SYNFLOOD_RATE = \"10\s\"/replacement/g'
But the delimiter of the s command can be any character other than /, for example:
sed -i '/SYNFLOOD_RATE = "100/s#"/SYNFLOOD_RATE = "10\s"#replacement#g'
(the delimiter here is #)
