How to get the content after a string using regex in python - python-3.x

I have a string as follows:
A5697[2:10] = {ravi, rageev, raghav, smith};
I want the content after "A5697[2:10] =", so my output should be:
{ravi, rageev, raghav, smith};
This is my code:
print(re.search(r'(?<=A\d+\[.*\] =\s).*', line).group())
But this gives an error:
sre_constants.error: look-behind requires fixed-width pattern
Can anyone help to solve this issue? I would prefer to use regex.

You can try re.sub, like below. Since you have given only one data point, I am assuming all the other data points follow a similar pattern.
import re
text = "A5697[2:10] = {ravi, rageev, raghav, smith}"
re.sub(r'(A\d+\[\d+:\d+\]\s+=\s+)(.+)', r'\2', text)
returns:
'{ravi, rageev, raghav, smith}'
re.sub replaces the entire match with the second capturing group (\2). The second capturing group captures everything after '= '.

Simply replace the bits you don't want:
print(re.sub(r'A\d[^=]*= *', '', line))
See demo here: https://rextester.com/NSG17655
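Python's re module only accepts lookbehind assertions of fixed width, which is why \d+ and .* inside (?<=...) raise the error in the question. A minimal sketch of two workarounds, assuming every line looks like the one in the question (the fixed-width version additionally assumes exactly one space on each side of =):

```python
import re

line = "A5697[2:10] = {ravi, rageev, raghav, smith};"

# Fixed-width lookbehind is accepted: "] = " is always exactly 4 characters.
after_fixed = re.search(r'(?<=\] = ).*', line).group()

# Variable-width prefix: match it normally and capture only what follows it.
m = re.search(r'A\d+\[[^\]]*\]\s*=\s*(.*)', line)
after_group = m.group(1) if m else None

print(after_fixed)  # {ravi, rageev, raghav, smith};
print(after_group)  # {ravi, rageev, raghav, smith};
```

The capture-group version is the more general one, since it tolerates any amount of whitespace around the =.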

Related

Get number from string in Python

I have a string, and I need to get only the digits from it.
url = "www.mylocalurl.com/edit/1987"
Now from that string, I need to get 1987 only.
I have been trying this approach,
id = [int(i) for i in url.split() if i.isdigit()]
But I only get an empty list [].
You can use a regex to get the digits alone in a list:
import re
url = "www.mylocalurl.com/edit/1987"
digit = re.findall(r'\d+', url)
output:
['1987']
Replace all non-digits with blank (effectively "deleting" them):
import re
num = re.sub(r'\D', '', url)
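re.findall and the re.sub(r'\D', ...) approach only agree when the string contains a single number. A small sketch using a hypothetical URL with two numbers shows the difference, and also converts the findall results to int:

```python
import re

# Hypothetical URL with two separate numbers, to show how the approaches differ.
url = "www.mylocalurl.com/edit/1987/page/2"

all_numbers = [int(n) for n in re.findall(r'\d+', url)]  # each run of digits separately
joined_digits = re.sub(r'\D', '', url)                   # every digit, concatenated

print(all_numbers)    # [1987, 2]
print(joined_digits)  # 19872
```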
You aren't getting anything because .split() with no arguments splits on whitespace. The URL has no spaces, so split() returns the whole URL as a single element, and that element fails isdigit(). What you can do instead is capture the digits with a regex. For example:
import re
url = "www.mylocalurl.com/edit/1987"
regex = r'(\d+)'
numbers = re.search(regex, url)
captured = numbers.groups()[0]
If you are not familiar with regular expressions, here is what the code does: the pattern r'(\d+)' means "capture one or more consecutive digits", and re.search scans the URL for the first place it matches. captured then holds the first captured group, which is the string '1987'.
If you don't want to use a regex, you can still use your .split() method, but this time pass / as the separator. For example, `url.split('/')` returns ['www.mylocalurl.com', 'edit', '1987'].
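The list comprehension from the question works once it splits on / instead of whitespace. A minimal sketch, reusing the URL from the question:

```python
url = "www.mylocalurl.com/edit/1987"

# Split on "/" instead of whitespace, then keep only the pieces made entirely of digits.
ids = [int(part) for part in url.split('/') if part.isdigit()]

print(ids)  # [1987]
```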

How to get demangled function name using regex

I have a list of mangled function names like _Z6__comp7StudentS_ and _Z4SortiSt6vectorI7StudentSaIS0_EE. I read the wiki and found that they follow a defined structure: _Z marks a mangled symbol, followed by a number and then a function name of that length.
So I want to retrieve the function name using a regex. The closest I have come is _Z(?:\d)(?<function_name>[a-z_A-Z]){\1}. But referring to \1 won't work because it's a string, right? Is there a single regex pattern that solves this?
You can use two capture groups, one for the length and one for the name, and then take the part of capture group 2 whose length is given by capture group 1:
import re
pattern = r"_Z(\d+)([a-z_A-Z]+)"
s = "_Z4SortiSt6vectorI7StudentSaIS0_EE"
m = re.search(pattern, s)
if m:
    print(m.group(2)[0: int(m.group(1))])
Output:
Sort
Using _Z6__comp7StudentS_ returns __comp.
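One caveat with [a-z_A-Z]+: a function name may contain digits after its first character, and then group 2 stops too early. A sketch that avoids this by slicing the original string from the end of the length digits; mangled_name is a hypothetical helper, and it only handles simple unqualified names like the two above, not nested or templated ones. _Z5func2v is an assumed example symbol for a function func2 taking no arguments:

```python
import re

def mangled_name(symbol):
    """Hypothetical helper: return the name encoded after _Z<length>, or None.

    Only handles simple, unqualified names (not nested N...E names).
    """
    m = re.match(r"_Z(\d+)", symbol)
    if not m:
        return None
    length = int(m.group(1))
    start = m.end()  # index just after the length digits
    return symbol[start:start + length]

print(mangled_name("_Z4SortiSt6vectorI7StudentSaIS0_EE"))  # Sort
print(mangled_name("_Z6__comp7StudentS_"))                 # __comp
print(mangled_name("_Z5func2v"))                           # func2
```

With the two-group pattern, _Z5func2v would give group 2 = 'func', so the slice returns 'func' instead of 'func2'.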

How can I search a pattern and extract the value behind it

I am a newbie in Python. I am trying to extract data (XXXX) from a text using the pattern PDB:XXXX. The XXXX varies, but it is exactly what I want.
Since all the data contains PDB:, I use re.findall() to search for this pattern. But this only gives me a list of PDB: strings. How can I get it to include the XXXX?
This is my code:
text = 'blah...........
PDB:AAAA
blah...........
blah...........
PDB:BBBB'
etc.
r = re.findall("PDB:",text)
and the output is:
['PDB:', 'PDB:']
My desired output is something like:
['AAAA', 'BBBB']
You need to use triple quotes (""") to write multi-line strings in Python. Also, to get only a specific part of the matched pattern, you need a capture group (the parentheses in the regular expression below).
import re
text = """blah...........
PDB:AAAA
blah...........
blah...........
PDB:BBBB"""
results = re.findall(r"PDB:(.*)", text)
print(results)  # ['AAAA', 'BBBB']
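Because . does not match a newline, (.*) stops at the end of each line, which is why the answer above works. If an ID can be followed by other text on the same line, a narrower group such as (\w+) is safer. A sketch with hypothetical input, assuming the IDs consist only of letters, digits, or underscores:

```python
import re

# Hypothetical input where two IDs share a line.
text = """blah...........
PDB:AAAA, PDB:BBBB
blah...........
PDB:CCCC"""

greedy = re.findall(r"PDB:(.*)", text)    # runs to the end of each line
narrow = re.findall(r"PDB:(\w+)", text)   # stops at the first non-word character

print(greedy)  # ['AAAA, PDB:BBBB', 'CCCC']
print(narrow)  # ['AAAA', 'BBBB', 'CCCC']
```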

Multiple variables in a Python regex

I have looked through several related posts and forums, but nothing I found does what I need.
I am trying to use variables instead of hard-coded values in a regex that searches a line for any one of several words.
I get the desired result when I hard-code the values. Given lines like these:
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub2/a2#dell>
<http://www.somesite.com/software/sub3/a3#Notepad>
re.search(r"\#Msoffice|#vlc|#Notepad", line)
This regex matches any line that contains #Msoffice, #vlc, or #Notepad.
Defining a single variable with re.escape worked fine. However, I have tried many combinations using | and , (pipe and comma) with no success.
Is there a way to specify #Msoffice, #vlc, and #Notepad in separate variables so that I can change them later?
Thanks in advance!!
If I understand you correctly, you'd like to insert variables into your regex.
You are currently using a raw string (r'...') to make the regex more readable. An f-string (f'...') instead lets you insert any variable with {your_var}, so you can build your regex however you like:
var1 = '#Msoffice'
var2 = '#vlc'
var3 = '#Notepad'
re.search(f'{var1}|{var2}|{var3}', line)
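The most annoying issue is escaping: unlike r'...', an f-string does not keep backslashes literal, so each one must be doubled (for example, the regex \b has to be written '\\b'). Combining both prefixes as rf'...' avoids this.
Hope it helps.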
import re
lines = ["<http://www.somesite.com/software/sub/a1#Msoffice>",
"<http://www.somesite.com/software/sub1/a1#vlc>",
"<http://www.somesite.com/software/sub2/a2#dell>",
"<http://www.somesite.com/software/sub3/a3#Notepad>"]
for line in lines:
    if re.search(r'\b(?:\#{}|\#{}|\#{})\b'.format('Msoffice', 'vlc', 'Notepad'), line):
        print(line)
Output:
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub3/a3#Notepad>
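Instead of one variable per term, the terms can also be kept in a list and joined with | at runtime. This is a sketch of that variant, not taken from either answer; re.escape makes any regex metacharacters in the terms match literally:

```python
import re

lines = [
    "<http://www.somesite.com/software/sub/a1#Msoffice>",
    "<http://www.somesite.com/software/sub1/a1#vlc>",
    "<http://www.somesite.com/software/sub2/a2#dell>",
    "<http://www.somesite.com/software/sub3/a3#Notepad>",
]

# Keep the search terms in one list so they are easy to change later.
tags = ["#Msoffice", "#vlc", "#Notepad"]

# Escape each term, then join them into a single alternation.
pattern = re.compile("|".join(re.escape(tag) for tag in tags))

matches = [line for line in lines if pattern.search(line)]
for line in matches:
    print(line)
```

Adding or removing a term then only means editing the tags list.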

Python: how to split a string at the first numeric character

This is my string:
"Somestring8/9/0"
I need to get something like this:
['Somestring','8/9/0']
As soon as I find a numeric character, I need to split the string to get:
'8/9/0'
This is my code:
stringSample = "GigabitEthernet8/9/0"
print(re.findall(r'(\w+?)(\d+)', stringSample)[0])
But I'm getting this result:
('GigabitEthernet', '8')
What am I doing wrong?
I appreciate your help!!
Your second regex group accepts only digits. Allow it to include forward slashes too.
stringSample = "GigabitEthernet8/9/0"
print(re.findall(r'(\w+?)([\d/]+)', stringSample)[0])
# ('GigabitEthernet', '8/9/0')
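Try using the re.split method to split your string in two, passing the maxsplit parameter. Because this pattern has capture groups, the captured parts appear in the result between empty strings:
re.split(r'(\w+?)([\d/]+)', stringSample, maxsplit=1)  # ['', 'GigabitEthernet', '8/9/0', '']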
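re.split can also return exactly the two parts asked for, if the pattern is a zero-width lookahead before the first digit instead of capture groups. A minimal sketch; splitting on empty matches requires Python 3.7 or later:

```python
import re

stringSample = "GigabitEthernet8/9/0"

# Split at the empty position just before the first digit;
# maxsplit=1 keeps the rest of the string, slashes included, in one piece.
parts = re.split(r'(?=\d)', stringSample, maxsplit=1)

print(parts)  # ['GigabitEthernet', '8/9/0']
```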
