Regular Expression with if Condition and if then else condition - regular-language

How can I express the following conditions in a regular expression?
if (cond) then s1 else s2.
if (cond1) then if (cond) then s1 else s2.
Please help me.

Basically, first you give a group a name, for example (?<grp1>match).
Now we have a group named 'grp1' that matched the word 'match'.
We can then test whether this group has matched with (?(<grp1>)yes|no), which means: if grp1 has matched, continue by trying to match 'yes'; otherwise try to match 'no'. The exact spelling of the named group and of the conditional depends on the regex flavor.
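As a minimal sketch of the same idea in Python: the built-in re module spells the named group as (?P<grp1>...) and the conditional as (?(grp1)yes|no). The pattern and the helper name accepts are made up for illustration, and the group is made optional so that both branches can be reached.

```python
import re

# Illustrative pattern (not from the question): grp1 is optional, so the
# conditional can take either branch.
#   if grp1 matched -> the rest of the string must be "yes"
#   otherwise       -> the rest of the string must be "no"
pattern = re.compile(r"^(?P<grp1>match)?(?(grp1)yes|no)$")

def accepts(text):
    """Return True when the whole string satisfies the conditional pattern."""
    return pattern.match(text) is not None

for text in ["matchyes", "no", "matchno", "yes"]:
    print(text, accepts(text))
```

Here "matchyes" and "no" are accepted, while "matchno" and "yes" are rejected, because the branch that is tried depends on whether grp1 took part in the match.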

Related

Why doesn't this RegEx match anything?

I've been trying for about two hours now to write a regular expression which matches a single character that's not preceded or followed by the same character.
This is what I've got: (\d)(?<!\1)\1(?!\1); but it doesn't seem to work! (testing at https://regex101.com/r/whnj5M/6)
For example:
In 1111223 I would expect to match the 3 at the end, since it's not preceded or followed by another 3.
In 1151223 I would expect to match the 5 in the middle, and the 3 at the end for the same reasons as above.
The end goal for this is to be able to find pairs (and only pairs) of characters in strings (e.g. to find 11 in 112223 or 44 in 123544) and I was going to try and match single isolated characters, and then add a {2} to it to find pairs, but I can't even seem to get isolated characters to match!
Any help would be much appreciated, I thought I knew RegEx pretty well!
P.S. I'm testing in JS on regex101.com because it wouldn't let me use variable length lookbacks in Python on there, and I'm using the regex library to allow for this in my actual implementation.
Your regex is close, but by using simply (\d) you are consuming characters, which prevents the other match from occurring. Instead, you can use a positive lookahead to set the capture group and then test for any occurrences of the captured digit not being surrounded by copies of itself:
(?=.*?(.))(?<!\1)\1(?!\1)
By using a lookahead you avoid consuming any characters and so the regex can match anywhere in the string.
Note that in 1151223 this returns 5, 1 and 3 because the third 1 is not adjacent to any other 1s.
Demo on regex101 (requires JS that supports variable width lookbehinds)
The pattern you tried does not match because this part (\d)(?<!\1) can not match.
It reads as:
Capture a digit in group 1. Then, on the position after that captured
digit, assert what is captured should not be on the left.
You could make the pattern work by adding, for example, a dot after the backreference, (?<!\1.), to assert that the value before what you have just matched is not the same as group 1.
Pattern
(\d)(?<!\1.)\1(?!\1)
Note that you have selected ECMAScript on regex101.
Python's re does not support variable-width lookbehind.
To make this work in Python, you need the PyPI regex module.
Example code
import regex
pattern = r"(\d)(?<!\1.)\1(?!\1)"
test_str = ("1111223\n"
            "1151223\n\n"
            "112223\n"
            "123544")
matches = regex.finditer(pattern, test_str)
for matchNum, match in enumerate(matches, start=1):
    print(match.group())
Output
22
11
22
11
44
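As a cross-check that does not depend on lookbehind support, the same "pairs and only pairs" result can be sketched with itertools.groupby from the standard library. This swaps the regex for run-length grouping; the helper name exact_pairs is made up for illustration.

```python
from itertools import groupby

def exact_pairs(text):
    """Return every run of exactly two identical digits, in order of appearance."""
    pairs = []
    for char, run in groupby(text):
        length = len(list(run))  # number of consecutive copies of char
        if char.isdigit() and length == 2:
            pairs.append(char * 2)
    return pairs

for line in ["1111223", "1151223", "112223", "123544"]:
    print(line, exact_pairs(line))
```

For the four test strings this yields 22, then 11 and 22, then 11, then 44, which agrees with the regex output listed above.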
@Theforthbird has provided a good explanation for why your regular expression does not match the characters of interest.
Each character matched by the following regular expression is neither preceded nor followed by the same character (including characters at the beginning and end of the string).
r'^.$|^(.)(?!\1)|(?<=(.))(?!\2)(.)(?!\3)'
Python's re regex engine performs the following operations.
^.$         match the first char if it is the only char in the line
|           or
^           match beginning of line
(.)         match a char in capture group 1...
(?!\1)      ...that is not followed by the same character
|           or
(?<=(.))    save the previous char in capture group 2...
(?!\2)      ...that is not equal to the next char
(.)         match a character and save to capture group 3...
(?!\3)      ...that is not equal to the following char
Suppose the string were "cat".
The internal string pointer is initially at the beginning of the line.
"c" is not at the end of the line so the first part of the alternation fails and the second part is considered.
"c" is matched and saved to capture group 1.
The negative lookahead asserting that "c" is not followed by the content of capture group 1 succeeds, so "c" is matched and the internal string pointer is advanced to a position between "c" and "a".
"a" fails the first two parts of the alternation so the third part is considered.
The positive lookbehind (?<=(.)) saves the preceding character ("c") in capture group 2.
The negative lookahead (?!\2), which asserts that the next character ("a") is not equal to the content of capture group 2, succeeds. The string pointer remains just before "a".
The next character ("a") is matched and saved in capture group 3.
The negative lookahead (?!\3), which asserts that the following character ("t") does not equal the content of capture group 3, succeeds, so "a" is matched and the string pointer advances to just before "t".
The same steps are performed when evaluating "t" as were performed when evaluating "a". Here the last token ((?!\3)) succeeds, however, because no characters follow "t".
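The walk-through can be checked with a short script. This is a sketch using the pattern exactly as given above; the helper name isolated_chars is hypothetical.

```python
import re

# The pattern from the answer above, unchanged.
pattern = re.compile(r'^.$|^(.)(?!\1)|(?<=(.))(?!\2)(.)(?!\3)')

def isolated_chars(text):
    """Return each character that is neither preceded nor followed by itself."""
    # finditer is used instead of findall because findall would return the
    # capture-group tuples rather than the matched text.
    return [m.group() for m in pattern.finditer(text)]

for text in ["cat", "1111223", "1151223"]:
    print(text, isolated_chars(text))
```

For "cat" every character is returned, for "1111223" only the final 3, and for "1151223" the 5, the third 1 and the 3.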

If statement not triggering when trying to match strings

I'm attempting to pick out strings containing a specific character (*) in an if/else statement using Python's in command. It works in the terminal, but the if statement isn't picking up on it for some reason.
In the terminal:
match = '*moustache'
'*' in match
Out[27]: True
But when I try to use it in an if statement,
if '*' in match == True:
    print(match)
does absolutely nothing. Why? Is there a different/better way to do this?
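It will work if you remove the == True. Comparison operators chain in Python, so '*' in match == True is evaluated as ('*' in match) and (match == True), and match == True is False for a string.
if '*' in match:
    print(match)
The condition now evaluates to True, so the print line executes.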
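To see the difference concretely, the sketch below evaluates the chained form, its explicit expansion, a parenthesised version and the bare membership test side by side. The variable names are only for illustration.

```python
match = '*moustache'

# Comparison operators chain, so this is ('*' in match) and (match == True).
chained = '*' in match == True
expanded = ('*' in match) and (match == True)

# Parenthesising the membership test restores the intended meaning,
# but the bare test is the idiomatic form.
grouped = ('*' in match) == True
plain = '*' in match

print(chained, expanded, grouped, plain)  # False False True True
```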

Python3: pexpect issue about expect list

Here is the template. In the while loop, I assumed the variable "index" is a list, so I can't understand what "if index == 0" means. Does it mean index[0] = "suc" and index[1] = "fail"? Please make it as clear as possible.
import pexpect
while True:
    index = child.expect(["suc", "fail", pexpect.TIMEOUT])
    if index == 0:
        break
    elif index == 1:
        return False
    elif index == 2:
        pass  # continue to wait
The expect() method returns the index of the pattern that is matched. index is not a list.
According to the manual:
expect(pattern, timeout=-1, searchwindowsize=-1, async=False)
This seeks through the stream until a pattern is matched. The pattern is overloaded and may take several
types. The pattern can be a StringType, EOF, a compiled re, or a list of any of those types. Strings will be
compiled to re types. This returns the index into the pattern list. If the pattern was not a list this returns
index 0 on a successful match. This may raise exceptions for EOF or TIMEOUT. To avoid the EOF or
TIMEOUT exceptions add EOF or TIMEOUT to the pattern list. That will cause expect to match an EOF
or TIMEOUT condition instead of raising an exception.
If you pass a list of patterns and more than one matches, the first match in the stream is chosen. If more
than one pattern matches at that point, the leftmost in the pattern list is chosen.
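The selection rule quoted above can be modelled in a few lines of plain Python. This is only a simplified stand-in that runs on a complete string with the standard re module; it is not pexpect's implementation, it ignores EOF and TIMEOUT, and the function name first_match_index is hypothetical.

```python
import re

def first_match_index(patterns, text):
    """Return the list index of the pattern that matches earliest in text.

    Hypothetical model of the rule in the manual: the earliest match wins,
    and on a tie the leftmost pattern in the list wins.
    """
    best = None  # (start position, pattern index)
    for i, pat in enumerate(patterns):
        m = re.search(pat, text)
        # Strict < keeps the earlier list entry when two matches start together.
        if m and (best is None or m.start() < best[0]):
            best = (m.start(), i)
    return None if best is None else best[1]

patterns = ["suc", "fail"]
print(first_match_index(patterns, "login suc"))      # 0
print(first_match_index(patterns, "login fail"))     # 1
print(first_match_index(patterns, "fail then suc"))  # 1
```

The result is a single integer, which is why the template compares index with 0, 1 and 2 instead of indexing into it.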

Use of "("?) in Lua with string.find and the value that it returns

a, i, c = string.find(s, '"("?)', i + 1)
What is role of ? here? I believe it was checking for double quotes but I really do not understand exactly the use of "("?).
I read that string.find returns the starting and ending index of the matched pattern. But as per above line of code, a, i and c, 3 values are being returned. What is the third value being returned here?
? matches an optional character, i.e., zero or one occurrence of a character. So the pattern "("?) matches a ", followed by an optional ", i.e., it matches either " or "". Note that the match for "? (zero or one ") is captured.
As for the return value, from the documentation of string.find():
If the pattern has captures, then in a successful match the captured values are also returned, after the two indices.
The capture is the third return value, when there is a successful match.
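A rough Python analogue of the same pattern, offered as an illustration and not as a translation of Lua's semantics: re.search with one capture group plays the role of string.find, and the positions are 0-based with an exclusive end, unlike Lua's 1-based inclusive indices. The helper name find_quote is hypothetical.

```python
import re

QUOTE = re.compile(r'"("?)')  # a quote, then an optional second quote (captured)

def find_quote(s, start=0):
    """Return (start, end, capture) for the first '"' at or after start, or None."""
    m = QUOTE.search(s, start)
    if m is None:
        return None
    return m.start(), m.end(), m.group(1)

print(find_quote('say "hi"'))  # (4, 5, '')  -> single quote, empty capture
print(find_quote('say ""hi'))  # (4, 6, '"') -> doubled quote, capture is '"'
```

In both cases the third value is the capture: empty when a single quote was found, and a quote character when the quote was doubled.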

How to remove redundant matches between two strings?

Given two strings str1 and str2 I have a list of matches describing shared substrings as intervals in the form of [str1_beg, str1_end, str2_beg, str2_end]. I want to remove redundant matches where str1_beg, str1_end and str2_beg, str2_end from a match are embedded in some other match.
For each [beg_index, end_index] find [beg_index_new, end_index_new] and remove the ones that satisfy end_index < end_index_new and beg_index >= beg_index_new.
And that's O(n^2)
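A minimal sketch of that quadratic check, assuming each match is a list [str1_beg, str1_end, str2_beg, str2_end] and treating a match as redundant when both of its intervals lie inside those of a different match. The function name remove_embedded and the non-strict comparisons on both ends are assumptions, not part of the question.

```python
def remove_embedded(matches):
    """Drop every match whose str1 and str2 intervals both lie inside another match."""
    def inside(inner, outer):
        # Both intervals of inner are contained in the corresponding
        # intervals of outer (shared endpoints count as contained).
        return (outer[0] <= inner[0] and inner[1] <= outer[1]
                and outer[2] <= inner[2] and inner[3] <= outer[3])

    # O(n^2): every match is compared with every other match.
    # Exact duplicates are left alone because o != m is False for them.
    return [m for m in matches
            if not any(o != m and inside(m, o) for o in matches)]

matches = [[0, 10, 5, 15], [2, 6, 7, 11], [20, 25, 30, 35]]
print(remove_embedded(matches))  # [[0, 10, 5, 15], [20, 25, 30, 35]]
```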
First of all, you can store your matches more efficiently:
[str1_beg, str2_beg, match_len]
This also makes it very easy to check for redundancy, for example:
kept = []
for match in matches:
    # redundant: another match starts at the same positions but is longer
    if not any(other[:2] == match[:2] and match[2] < other[2] for other in matches):
        kept.append(match)
matches = kept
I'm assuming your list of matches is assigned to a variable called matches and has the structure I proposed above. I'm using the < operator and not the <= operator because, if the lengths are equal, they're the exact same match, and I'm assuming you won't have the same match twice.
Where I compare both matches' [:2] slices, I'm checking the first 2 elements of their lists, which are the starting positions.
