Wrong matching regex - python-3.x

So I'm using re module to compile my regex, and my regex looks like this:
"(^~\w+?[ & ~\w+?]*?$)"
I compile it with pattern = re.compile(regex) and then call re.findall(pattern, string) to check whether the given string matches and, if it does, to get the group.
The string that I'm matching is "v1 V ~v2_ V ~~v3".
I expected no match, but the string matches the regular expression. I suspect that \w+ matches whitespace, so that it matches the whole string, but I could not confirm that in the documentation. What am I missing?
Here is a minimal reproducible example:
import re
test_string = "v1 V ~v2_ V ~~v3"
regex = r"(^~*\w+?[ & ~*\w+?]*?$)"
pattern = re.compile(regex)
for elem in pattern.findall(test_string):
    print(elem)

If you expected no match, I think your problem is the [ & ~*\w+?]* part.
Square brackets define a character class, which matches exactly one character from the set inside: here a space, &, ~, *, +, ?, or any word character. Inside a class, *, + and ? are literal characters, not quantifiers. The asterisk (*) after the closing bracket then repeats the class zero or more times, so it can consume the rest of your string one character at a time.
If what you wanted is to match the sub-regex " & ~*\w+?" zero or more times, use parentheses.
So I would say that you wanted this regex: (^~*\w+?( & ~*\w+?)*?$) (just change the brackets to parentheses).
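To make the difference concrete, here is a minimal sketch comparing the two patterns on the string from the question. The variable names are illustrative, the second test string is made up for the comparison, and the group is written as non-capturing, (?:...), only to keep the result simple.

```python
import re

test_string = "v1 V ~v2_ V ~~v3"

# Square brackets: a character class, so every repetition consumes ONE
# character out of {space, &, ~, *, +, ?, word characters}.
char_class = re.compile(r"(^~*\w+?[ & ~*\w+?]*?$)")

# Parentheses: a group, so every repetition must be the whole
# sequence " & ~*\w+?".
grouped = re.compile(r"(^~*\w+?(?: & ~*\w+?)*?$)")

class_hit = char_class.search(test_string)       # matches the whole string
group_hit = grouped.search(test_string)          # no match: " V " is not " & "
group_ok = grouped.search("v1 & ~v2_ & ~~v3")    # made-up string that should match

print(class_hit is not None, group_hit is not None, group_ok is not None)
```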

Related

Python - how to find string and remove string plus next x characters

I have the following string:
mystr = '(string_to_delete_20221012_11-36) keep this (string_to_delete_20221016_22-22) keep this (string_to_delete_20221017_20-55) keep this'
I wish to delete all the entries (string_to_deletexxxxxxxxxxxxxxx), including the trailing space.
I sort of need pseudo code as follows:
If you find the string "(string_to_delete", replace it, together with the timestamp, closing parenthesis and trailing space, with an empty string; e.g. delete the string "(string_to_delete_20221012_11-36) ".
I would use a list comprehension, but given that not all strings are contained inside parentheses I cannot see what I could use to create the list via string.split().
Is this something that needs regular expressions?
It seems like a good place to use a regex:
import re
pattern = r'\(string_to_delete_.*?\)\s*'
mystr = '(string_to_delete_20221012_11-36) keep this (string_to_delete_20221016_22-22) keep this (string_to_delete_20221017_20-55) keep this'
for match in re.findall(pattern, mystr):
    # replace the first occurrence of the matched text with an empty string
    mystr = mystr.replace(match, '', 1)
print(mystr)
results with:
>> keep this keep this keep this
Brief regex breakdown: \(string_to_delete_.*?\)\s*
\( match a left parenthesis (the escape is needed)
string_to_delete_ match this literal text
.*? match as few characters as possible (lazy), up to the first closing parenthesis
\) match the closing parenthesis
\s* include zero or more whitespace characters after it
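The find-then-replace loop can probably be collapsed into a single re.sub call with the same pattern. A minimal sketch, where cleaned is just an illustrative name:

```python
import re

mystr = ('(string_to_delete_20221012_11-36) keep this '
         '(string_to_delete_20221016_22-22) keep this '
         '(string_to_delete_20221017_20-55) keep this')

# Same pattern as above; re.sub removes every match in one pass.
cleaned = re.sub(r'\(string_to_delete_.*?\)\s*', '', mystr)
print(cleaned)
```

This avoids running str.replace once per match, which matters mostly for long inputs.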

Regex: Match between delimiters (a letter and a special character) in a string to form new sub-strings

I am working on a problem where I have to form new sub-strings from a main string.
For example:
in_string=ste5ts01,s02,s03
The expected output strings are ste5ts01, ste5ts02, ste5ts03
The separator could be a comma (,) or a forward slash (/); in this case the delimiters are the letter s and the comma.
The pattern I have created so far:
pattern = r"([^\s,/]+)(?<num>\d+)([,/])(?<num>\d+)(?:\2(?<num>\d+))*(?!\S)"
The issue is, I am not able to figure out how to give the letter 's' as one of the delimiters.
Any help will be much appreciated!
You might use the PyPI regex module, where a named capture group that matches repeatedly keeps every value it captured; these are available through captures():
=(?<prefix>s\w+)(?<num>s\d+)(?:,(?<num>s\d+))+
Explanation
= Match literally
(?<prefix>s\w+) Match s and 1+ word chars in group prefix
(?<num>s\d+) Capture group num match s and 1+ digits
(?:,(?<num>s\d+))+ Repeat 1+ times matching , and capture s followed by 1+ digits in group num
Example
import regex as re
pattern = r"=(?<prefix>s\w+)(?<num>s\d+)(?:,(?<num>s\d+))+"
s="in_string=ste5ts01,s02,s03"
matches = re.finditer(pattern, s)
for m in matches:
    # prepend the shared prefix to every captured "num" value
    print(','.join(m.group("prefix") + c for c in m.captures("num")))
Output
ste5ts01,ste5ts02,ste5ts03
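If installing the third-party regex module is not an option, a similar result can be sketched with the standard re module, which has no captures(): capture the whole list once and split it afterwards. The helper name expand and the group name items are made up for this sketch, and allowing / as a separator is an assumption taken from the question.

```python
import re

def expand(text):
    """Return prefix + item for every item in a list such as 's01,s02,s03'."""
    # Lazy prefix, then one 's<digits>' item followed by at least one more
    # item separated by ',' or '/'.
    m = re.search(r"=(?P<prefix>s\w+?)(?P<items>s\d+(?:[,/]s\d+)+)", text)
    if m is None:
        return []
    return [m.group("prefix") + item
            for item in re.split(r"[,/]", m.group("items"))]

result = expand("in_string=ste5ts01,s02,s03")
print(",".join(result))
```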

Regex to find compensations in text

I need to find mentions of compensations in emails. I am new to regex. Please see below the approach I am using.
sample_text = "Rate – $115k/yr. or $55/hr. - $60/hr"
My Python code to find this:
import re
PATTERN = r'((\$|\£) [0-9]*)|((\$|\£)[0-9]*)'
print(re.findall(PATTERN,sample_text))
The matches I am getting are
[('', '', '$115', '$'), ('', '', '$55', '$'), ('', '', '$60', '$')]
Expected match
["$115k/yr","$55/hr","$60/hr"]
Also, the $ sign can be written as USD. How do I handle this in the same regex?
You can use
[$£]\d+[^.\s]*
[$£] Match either $ or £
\d+ Match 1+ digits
[^.\s]* Repeat 0+ times matching any char except . or a whitespace
import re
sample_text = "Rate – $115k/yr. or $55/hr. - $60/hr"
PATTERN = r'[$£]\d+[^.\s]*'
print(re.findall(PATTERN,sample_text))
Output
['$115k/yr', '$55/hr', '$60/hr']
If there must be a / present, you might also use
[$£]\d+[^\s/.]*/\w+
You can have something like:
[$£]\d+[^.]+
>>> PATTERN = r'[$£]\d+[^.]+'
>>> print(re.findall(PATTERN,sample_text))
['$115k/yr', '$55/hr', '$60/hr']
[$£] matches "$" or a "£"
\d+ matches one or more digits
[^.]+ matches one or more characters other than ".", including whitespace
The parentheses in your regex cause the engine to report the contents of each parenthesized subexpression. You can use non-grouping parentheses (?:...) to avoid this, but of course, your expression can be rephrased to not have any parentheses at all:
PATTERN = r'[$£]\s*\d+'
Notice also how I changed the last quantifier to a + -- your attempt would also find isolated currency symbols with no numbers after them.
To point out the hopefully obvious, \s matches whitespace and \s* matches an arbitrary run of whitespace, including none at all; and \d matches a digit.
If you want to allow some text after the extracted match, add something like (?:/\w+)? to allow for a slash and one single word token as an optional expression after the main match. (Maybe adorn that with \s* on both sides of the slash, too.)
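The USD part of the question could be handled by adding an alternative in a non-capturing group. A minimal sketch follows; the sample text is made up to show all three spellings, and allowing a single optional space after USD is an assumption.

```python
import re

# Hypothetical sample that mixes $, USD and £.
sample_text = "Rate – $115k/yr. or USD 55/hr. - £60/hr"

# (?:...) groups the alternatives without capturing, so findall returns
# whole matches instead of tuples of groups.
PATTERN = re.compile(r"(?:[$£]|USD\s?)\d+[^.\s]*")

matches = PATTERN.findall(sample_text)
print(matches)
```

Because the pattern contains no capturing group, re.findall returns plain strings, which is the shape the question expected.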

Regular expression to capture n lines of text between two regex patterns

Need help with a regular expression to grab exactly n lines of text between two regex matches. For example, I need 17 lines of text and I used the example below, which does not work.
Please see sample code below:
import re
match_string = re.search(r'^.*MDC_IDC_RAW_MARKER((.*?\r?\n){17})Stored_EGM_Trigger.*\n', t, re.DOTALL).group()
value1 = re.search(r'value="(\d+)"', match_string).group(1)
value2 = re.search(r'value="(\d+\.\d+)"', match_string).group(1)
print(match_string)
print(value1)
print(value2)
I added a sample string to here, because SO does not allow long code string:
https://hastebin.com/aqowusijuc.xml
You are getting false positives because you are using the re.DOTALL flag, which allows the . character to match newline characters. That is, when you are matching ((.*?\r?\n){17}), a single . could eat up many extra newline characters just to satisfy your required count of 17. The \r? is also superfluous, because without re.DOTALL the . already matches a carriage return. Starting your regex with ^.* is superfluous too: you force the search to start at the beginning and then tell the engine to skip as many characters as necessary to find MDC_IDC_RAW_MARKER, which re.search does on its own. So, a simplified and correct regex would be:
match_string = re.search(r'MDC_IDC_RAW_MARKER.*\n((.*\n){17})Stored_EGM_Trigger.*\n', t)
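The effect of re.DOTALL on the line count can be checked on a small synthetic input. In the sketch below, START and END are hypothetical stand-ins for the real markers, the count is 3 instead of 17, and between_markers is a made-up helper name.

```python
import re

def between_markers(text, n):
    """Return the n lines between the marker lines, or None if the count differs."""
    # Without re.DOTALL, '.' cannot cross a newline, so each (?:.*\n)
    # repetition is exactly one line.
    pattern = rf"START.*\n((?:.*\n){{{n}}})END.*\n"
    m = re.search(pattern, text)
    return m.group(1) if m else None

three_lines = "START x\na\nb\nc\nEND y\n"
four_lines = "START x\na\nb\nc\nd\nEND y\n"

strict_three = between_markers(three_lines, 3)   # the three lines
strict_four = between_markers(four_lines, 3)     # None: four lines, not three

# With re.DOTALL a lazy '.*?' may swallow a newline, so four lines
# still satisfy a count of three (a false positive).
loose_four = re.search(r"START.*\n((?:.*?\n){3})END.*\n", four_lines, re.DOTALL)

print(repr(strict_three), strict_four, loose_four is not None)
```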

Is it possible to use find and replace on a wildcard string in VIM?

For example, I have a bunch of values with a common prefix and postfix, such as:
fooVal1Bar;
fooVal2Bar;
fooVal3Bar;
In this case, all variable names begin with foo and end with Bar. I want to do a find and replace that reuses the variable name found between foo and Bar. Say I already have variables Val1, Val2, Val3 and Val1Old, Val2Old, Val3Old defined. I would do a find and replace, something along the lines of:
:%s/foo<AnyString>Bar/foo<AnyString>Bar = <AnyString> + <AnyString>Old
This would result in:
fooVal1Bar = Val1 + Val1Old;
fooVal2Bar = Val2 + Val2Old;
fooVal3Bar = Val3 + Val3Old;
I hope it's clear what I want to do, I couldn't find anything in vim help or online about replacing with wildcard strings. The most I could find was about searching for wildcard strings.
I believe you want
:%s/foo\(\w\+\)Bar/& = \1 + \1Old/
Explanation:
\w\+ finds one or more word characters. The preceding foo and following Bar ensure that these matched characters are just between a foo and a Bar.
\(...\) stores these characters so that they can be used in the replacement part of the substitution.
& inserts the whole matched text.
\1 is the string captured in the \(...\) part.
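For comparison with the Python threads above, the same capture-and-reuse idea can be sketched with re.sub, where \g<0> plays the role of Vim's & and \1 is the first group. This is only an analogue for illustration, not Vim syntax; lines and rewritten are illustrative names.

```python
import re

lines = ["fooVal1Bar;", "fooVal2Bar;", "fooVal3Bar;"]

# \g<0> is the whole match (like & in Vim); \1 is the captured name.
rewritten = [re.sub(r"foo(\w+)Bar", r"\g<0> = \1 + \1Old", line)
             for line in lines]

for line in rewritten:
    print(line)
```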
You need to capture what you want to save. Try something like this:
%s/\(foo\(\w\+\)Bar\);/\1 = \2 + \2Old;/
Or you can clean it up a little bit with very magic mode (\v):
%s/\v(foo(\w+)Bar);/\1 = \2 + \2Old;/
Replace string with wildcard
:%s/foo.*Bar/hello_world/gc
Here, .* acts as the wildcard and follows regex syntax:
. - Any character except line break
* - Zero or more times
