Regex to extract variable names, operators, and quoted strings with symbols - python-3.x

I'm trying to break a string (from a command-line argument) into 4 components: (C++ variable name):(Python variable)(operator)(value or quoted string)
Example:
CVariable_1:PythonVariable.attribute<=2343.23
result=('CVariable_1','PythonVariable.attribute','<=','2343.23')
CVariable_2:PythonVariable2.attribute2.value<="Any string including SYMBOLS~!##$%^&*\"\'<> and spaces"
result=('CVariable_2','PythonVariable2.attribute2.value','<=','Any string including SYMBOLS~!##$%^&*\"\'<> and spaces')
The closest regex I've come up with is:
[^:'"<>=]+|[\.\w]+|[<>!=]+
But the string could have any symbols in it. Quotes would be escaped though.

I think (?:([^:<>=!]+):)?([^<>=!]+)([<>=!]+)(.*)$ will work. The optional first part consumes the colon, so it does not leak into the Python name, and > is excluded from both name groups so that >= is captured as a whole operator.
I tried the following code:
import re
import typing
# Optional "cname:" prefix, then the Python name, the operator, and the value.
pattern = re.compile(r"(?:([^:<>=!]+):)?([^<>=!]+)([<>=!]+)(.*)$")
def split_name(name: str) -> typing.Tuple[typing.Optional[str], str, str, str]:
    match = pattern.match(name)
    if match is None:
        raise ValueError("The name is invalid")
    cname = match.group(1)  # None when there is no "cname:" prefix
    pyname = match.group(2)
    operator = match.group(3)
    value = match.group(4)  # a quoted value keeps its quotes and escapes
    return cname, pyname, operator, value
test_names = ["PythonVariable1.attribute.value==False",
              "CVariable_2:PythonVariable2.attribute<=2343.23",
              "CVariable_3:PythonVariable3.attribute3.value<=\"Any string including SYMBOLS~!##$%^&*\\\"\\'<> and spaces\""]
print(list(map(split_name, test_names)))
If you want to write the allowed operators explicitly, you can change the third group. Note that the order is important in this case (<= needs to come before <):
(?:([^:<>=!]+):)?([^<>=!]+)(<=|>=|!=|==|<|>)(.*)$
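The expected results in the question drop the surrounding quotes and the backslash escapes, whereas the pattern above returns the value exactly as typed. A minimal sketch of one way to post-process it follows. PATTERN and parse_condition are hypothetical names, and the sketch assumes that names consist only of word characters and dots and that only double-quoted values need unquoting.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for illustration: optional "cname:", a dotted Python
# name, one explicit operator, then the remainder as the raw value.
PATTERN = re.compile(r"(?:(\w+):)?([\w.]+)(<=|>=|!=|==|<|>)(.*)$")


def parse_condition(text: str) -> Tuple[Optional[str], str, str, str]:
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"invalid condition: {text!r}")
    cname, pyname, operator, value = match.groups()
    # Strip one pair of surrounding double quotes, then undo backslash escapes
    # such as \" and \' by keeping only the escaped character.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = re.sub(r"\\(.)", r"\1", value[1:-1])
    return cname, pyname, operator, value
```

Unquoted values such as 2343.23 pass through unchanged, and cname is None when the input has no "cname:" prefix.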

Related

How do I find/count the number of variables in a string using Python

Here is an example of the string:
Hi {{1}},
The status of your leave application has changed,
Leaves: {{2}}
Status: {{3}}
See you soon back at office by Management.
Expected Result:
Variables Count = 3
I tried Python's count() with if/else, but I'm looking for a more sustainable solution.
You can use regular expressions:
import re
PATTERN = re.compile(r'\{\{\d+\}\}')
def count_vars(text: str) -> int:
    # Count every non-overlapping {{digits}} placeholder in the text.
    return sum(1 for _ in PATTERN.finditer(text))
PATTERN defines the regular expression. It matches every substring made of one or more digits (\d+) inside a pair of double curly brackets (\{\{ and \}\}). Curly brackets are special characters in regular expressions, so we escape them with \. No flag such as re.DOTALL is needed: that flag only changes what . matches, and this pattern contains no dot, so placeholders are found on every line of a multi-line text anyway. The finditer method iterates over all matches in the text and we simply count them.
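If a template may reuse the same placeholder more than once, counting matches and counting distinct variables give different answers. A small sketch of both, assuming placeholders always have the form {{digits}}. PLACEHOLDER and count_distinct_vars are names made up for this illustration.

```python
import re

# Capture the number inside each {{n}} placeholder.
PLACEHOLDER = re.compile(r"\{\{(\d+)\}\}")


def count_vars(text: str) -> int:
    """Count every {{n}} occurrence, repeats included."""
    return len(PLACEHOLDER.findall(text))


def count_distinct_vars(text: str) -> int:
    """Count each placeholder number only once."""
    return len(set(PLACEHOLDER.findall(text)))


template = "Hi {{1}},\nLeaves: {{2}}\nStatus: {{3}}\nBye {{1}}"
print(count_vars(template), count_distinct_vars(template))
```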

pass regex group to function for substituting [duplicate]

I have a string S = '02143' and a list A = ['a','b','c','d','e']. I want to replace all those digits in 'S' with their corresponding element in list A.
For example, replace 0 with A[0], 2 with A[2] and so on. Final output should be S = 'acbed'.
I tried:
S = re.sub(r'([0-9])', A[int(r'\g<1>')], S)
However, this gives the error ValueError: invalid literal for int() with base 10: '\\g<1>'. I guess it is treating the backreference '\g<1>' as a plain string. How can I solve this, preferably with re.sub and capture groups, or otherwise with some alternative?
The reason re.sub(r'([0-9])', A[int(r'\g<1>')], S) does not work is that \g<1> (an unambiguous way of writing the backreference \1) is only interpreted when it appears in the replacement string itself. Python evaluates the argument A[int(r'\g<1>')] before re.sub is even called, so int() receives the literal string \g<1> and raises the ValueError. The re engine only resolves backreferences while performing a substitution, after a match has been found.
That is why re.sub accepts a callable as the replacement argument: it receives each match object, so you can pass the matched group values to any function for further processing.
See the re documentation:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
Use
import re
S = '02143'
A = ['a','b','c','d','e']
print(re.sub(r'[0-9]',lambda x: A[int(x.group())],S))
Note you do not need to capture the whole pattern with parentheses, you can access the whole match with x.group().
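The same callback can be written as a named function, which leaves room for handling digits that have no matching list element. A sketch under that assumption follows. digit_to_letter is a hypothetical helper, and leaving out-of-range digits unchanged is just one possible policy.

```python
import re

A = ['a', 'b', 'c', 'd', 'e']


def digit_to_letter(match: re.Match) -> str:
    """Replacement callback: map a matched digit to the element of A at that index."""
    index = int(match.group())
    # Assumption: digits beyond the end of A are left as they are.
    return A[index] if index < len(A) else match.group()


print(re.sub(r'[0-9]', digit_to_letter, '02143'))  # acbed
```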

SyntaxError: unexpected character after line continuation character re

I want to get the first two digits of the string "21N":
import re
str = "21N"
number = re.find(r\'d{1,2}', str)
But I get this error. How do I get the first two digits from the string?
Thank you very much.
Move the backslash between the apostrophe and d:
number = re.findall(r'\d{1,2}', str)
You have a couple of errors. re.find doesn't exist, you can use re.search instead. And your backslash needs to be inside rather than outside your opening quote.
So the following would work:
number = re.search(r'\d{1,2}', str)
But the {1,2} is actually unnecessary if you know you're looking for exactly 2 digits. Just use:
number = re.search(r'\d{2}', str)
And as an aside, don't use the variable name str, as it is a built-in type in Python.
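Note that re.search returns a match object (or None), not the digits themselves, while re.findall returns a list of strings. A short sketch of both, with the variable renamed to text, a name chosen here only to avoid shadowing the built-in str:

```python
import re

text = "21N"  # `text` instead of `str`, so the built-in type stays usable

# re.search gives a match object or None; .group() is the matched substring.
match = re.search(r'\d{1,2}', text)
first_digits = match.group() if match else None

# re.findall returns every non-overlapping match as a list of strings.
all_runs = re.findall(r'\d{1,2}', text)

print(first_digits, all_runs)
```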

Get words from a file using groovy

Using Groovy, how can I get the words/text enclosed in parentheses from a file?
Example:
George (a programmer) used to think much.
words to get: a programmer
Here you have an example program solving the issue:
String inp = 'George (a programmer) used to think much.'
def matcher = inp =~ /\(([^\)]+)\)/   // Try to find a match
if (matcher) {                         // Something found
    String str = matcher[0][1]         // Get the 1st capture group
    printf("Found: %s.\n", str)
    def words = str.tokenize()         // Create a list of words
    words.eachWithIndex { it, i -> printf("%d: %s.\n", i, it) }
} else {
    print("Not found")
}
Note the meaning of the parentheses in the regular expression:
The outer (backslash-escaped) parentheses are literal parentheses (we are looking for these chars).
The unescaped parentheses between them are the delimiters of the capture group.
The remaining (escaped) closing parenthesis, inside the character class [^\)], is the char that must not appear within the capture group.
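Python's re module uses the same syntax for literal parentheses, capture groups, and negated character classes, so the pattern can be cross-checked there. This is only a comparison sketch in Python, not a replacement for the Groovy program. The variable names are illustrative.

```python
import re

inp = 'George (a programmer) used to think much.'

# \( and \) are literal parentheses; ( ... ) is the capture group;
# [^)]+ means "one or more characters that are not a closing parenthesis".
match = re.search(r'\(([^)]+)\)', inp)
if match:
    inner = match.group(1)   # text between the parentheses
    words = inner.split()    # list of words, split on whitespace
    print(inner, words)
else:
    print("Not found")
```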

How to extract substring in Groovy?

I have a Groovy method that currently works but is real ugly/hacky looking:
def parseId(String str) {
    System.out.println("str: " + str)
    int index = str.indexOf("repositoryId")
    System.out.println("index: " + index)
    int repoIndex = index + 13   // skip past "repositoryId="
    System.out.println("repoIndex: " + repoIndex)
    String repoId = str.substring(repoIndex)
    System.out.println("repoId is: " + repoId)
}
When this runs, you might get output like:
str: wsodk3oke30d30kdl4kof94j93jr94f3kd03k043k?planKey=si23j383&repositoryId=31850514
index: 59
repoIndex: 72
repoId is: 31850514
As you can see, I'm simply interested in obtaining the repositoryId value (everything after the = operator) out of the String. Is there a more efficient/Groovier way of doing this, or is this the only way?
There are a lot of ways to achieve what you want. I'll suggest a simple one using split:
sub = { it.split("repositoryId=")[1] }
str='wsodk3oke30d30kdl4kof94j93jr94f3kd03k043k?planKey=si23j383&repositoryId=31850514'
assert sub(str) == '31850514'
Using a regular expression you could do
def repositoryId = (str =~ "repositoryId=(.*)")[0][1]
The =~ operator creates a regex Matcher; [0][1] picks the first capture group of the first match.
Or, as a regex shortcut if you are only looking for a single match:
String repoId = str.replaceFirst( /.*&repositoryId=(\w+).*/, '$1' )
All the answers here use regular expressions; however, there are also a number of plain string methods available in Groovy.
String Function | Sample | Description
--- | --- | ---
contains | myStringVar.contains(substring) | Returns true if and only if this string contains the specified sequence of char values
equals | myStringVar.equals(substring) | Similar to contains, but the whole string has to be an exact match for the check to return true
endsWith | myStringVar.endsWith(suffix) | Checks whether this string ends with the given suffix
startsWith | myStringVar.startsWith(prefix) | Checks whether this string starts with the given prefix
equalsIgnoreCase | myStringVar.equalsIgnoreCase(substring) | The same as equals but without case sensitivity
isEmpty | myStringVar.isEmpty() | Returns true if myStringVar has length 0
matches | myStringVar.matches(regex) | Unlike equals, takes a regular expression and returns true only if the entire string matches it
replace | myStringVar.replace(old,new) | Returns a string resulting from replacing all literal occurrences of old in this string with new
replaceAll | myStringVar.replaceAll(old_regex,new) | Replaces each substring of this string that matches the given regular expression with the given replacement
split | myStringVar.split(regex) | Splits this string around matches of the given regular expression
