Optional group in a Python regex - python-3.x

import re
string = 'Alabama[edit]'
a = re.search(r'(\w+)(?:\(([\w+\s*]+)\))(\[.*\])',string).group(2)
I have put the () inside the optional group, but the result is still None.
What I want to achieve: there are two different types of string:
1. Alabama[edit]
2. Alabama (some text)[edit]
I want to extract either None, if there are no parentheses, or the string inside the parentheses.
I am also not sure why this doesn't work for the optional group. I mean, if there are no parentheses, this expression should be ignored and the remaining groups should still be captured, right?
(?:\(([\w+\s*]+)\))
thanks!
Erik

This seems to work:
a = re.search(r'(\w+)(\([\w+\s*]+\))?(\[.*\])',string).groups()
print(a) #('Alabama', None, '[edit]')
In your original expression you didn't use the optional quantifier. To make a group optional, put ? after its closing ). The ?: notation you used makes the group non-capturing: it is left out of the result, but it is still required for the match. It basically says: "Match this group, but I don't want to know anything about it in the result."
I think what you wanted after all is this:
a = re.search(r'(\w+)(?:\([\w+\s*]+\))?(\[.*\])',string).groups()
so:
import re
s1 = 'Alabama[edit]'
s2 = 'Alabama(test)[edit]'
print(re.search(r'(\w+)(?:\(([\w+\s*]+)\))?(\[.*\])',s1).groups())
#('Alabama', None, '[edit]')
print(re.search(r'(\w+)(?:\(([\w+\s*]+)\))?(\[.*\])',s2).groups())
#('Alabama', 'test', '[edit]')
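The second form in the question has a space before the parenthesis ('Alabama (some text)[edit]'), which the pattern above does not allow for. A minimal sketch that tolerates it, assuming optional whitespace (\s*) before the group and any text except ) inside it; the parenthetical() helper name is made up for illustration:

```python
import re

# Assumption: optional whitespace may precede the parenthesized part,
# and the text inside the parentheses never contains ')'.
PATTERN = re.compile(r'(\w+)\s*(?:\(([^)]*)\))?(\[.*\])')

def parenthetical(text):
    """Return the text inside the parentheses, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(2) if match else None

print(parenthetical('Alabama[edit]'))              # None
print(parenthetical('Alabama (some text)[edit]'))  # some text
```

Checking the match object before calling .group() also avoids an AttributeError when the string does not match at all.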

Related

Is it possible to do lazy formatting of a python string? [duplicate]

I want to use f-string with my string variable, not with string defined with a string literal, "...".
Here is my code:
name=["deep","mahesh","nirbhay"]
user_input = r"certi_{element}" # this string I ask from user
for element in name:
    print(f"{user_input}")
This code gives output:
certi_{element}
certi_{element}
certi_{element}
But I want:
certi_{deep}
certi_{mahesh}
certi_{nirbhay}
How can I do this?
f"..." strings are great when interpolating expression results into a literal, but you don't have a literal, you have a template string in a separate variable.
You can use str.format() to apply values to that template:
name=["deep","mahesh","nirbhay"]
user_input = "certi_{element}" # this string i ask from user
for value in name:
    print(user_input.format(element=value))
String formatting placeholders that use names (such as {element}) are not variables. You assign a value for each name in the keyword arguments of the str.format() call instead. In the above example, element=value passes in the value of the value variable to fill in the placeholder with the element.
Unlike f-strings, the {...} placeholders are not expressions and you can't use arbitrary Python expressions in the template. This is a good thing: you wouldn't want end-users to be able to execute arbitrary Python code in your program. See the Format String Syntax documentation for details.
You can pass in any number of names; the string template doesn't have to use any of them. If you combine str.format() with the **mapping call convention, you can use any dictionary as the source of values:
template_values = {
'name': 'Ford Prefect',
'number': 42,
'company': 'Sirius Cybernetics Corporation',
'element': 'Improbability Drive',
}
print(user_input.format(**template_values))
The above would let a user use any of the names in template_values in their template, any number of times they like.
While you can use locals() and globals() to produce dictionaries mapping variable names to values, I'd not recommend that approach. Use a dedicated namespace like the above to limit what names are available, and document those names for your end-users.
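As a rough sketch of the dedicated-namespace idea, the helper below renders a template against a fixed dictionary and reports unknown placeholder names. The names ALLOWED_VALUES and render() are hypothetical, not part of any library. Keep in mind that str.format() still permits attribute and index lookups on the values you pass in (for example {element.__class__}), so only expose values you are happy for users to inspect.

```python
# Hypothetical whitelist of names a user template may refer to.
ALLOWED_VALUES = {"element": "deep", "number": 42}

def render(template, values):
    """Fill a str.format() template, rejecting unknown placeholder names."""
    try:
        return template.format(**values)
    except KeyError as exc:
        # str.format() raises KeyError for a name missing from the mapping
        raise ValueError(f"unknown placeholder: {exc.args[0]}") from None

print(render("certi_{element}", ALLOWED_VALUES))  # certi_deep
```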
If you define:
def fstr(template):
    return eval(f"f'{template}'")
Then you can do:
name=["deep","mahesh","nirbhay"]
user_input = r"certi_{element}" # this string i ask from user
for element in name:
    print(fstr(user_input))
Which gives as output:
certi_deep
certi_mahesh
certi_nirbhay
But be aware that users can use expressions in the template, like e.g.:
import os # assume you have used os somewhere
user_input = r"certi_{os.environ}"
for element in name:
    print(fstr(user_input))
You definitely don't want this!
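Therefore, a more restrictive option is to define:
def fstr(template, **kwargs):
    return eval(f"f'{template}'", kwargs)
The template can no longer see your module's variables, but this is not a sandbox: eval() still adds __builtins__ to the namespace it is given, so functions such as __import__ remain reachable. Users can also still use string expressions like: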
user_input = r"certi_{element.upper()*2}"
for element in name:
    print(fstr(user_input, element=element))
Gives as output:
certi_DEEPDEEP
certi_MAHESHMAHESH
certi_NIRBHAYNIRBHAY
Which may be desired in some cases.
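One caveat about the keyword-argument version of fstr(): passing kwargs as the globals for eval() hides your own variables, but Python inserts __builtins__ into that dictionary when it is missing, so a template can still import modules. The template string below is a made-up example of such input and only reads os.name:

```python
def fstr(template, **kwargs):
    # Same definition as above: kwargs becomes the globals dict for eval()
    return eval(f"f'{template}'", kwargs)

# Hypothetical hostile template: no names are passed in, yet
# __import__ is still available through the implicit __builtins__.
leaked = fstr('{__import__("os").name}')
print(leaked)  # the platform name reported by the os module
```

For untrusted input, str.format() with an explicit set of values is the more defensible choice.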
If you want the user to have access to your namespace, you can do that, but the consequences are entirely on you. Instead of using f-strings, you can use the format method to interpolate dynamically, with a very similar syntax.
If you want the user to have access to only a small number of specific variables, you can do something like
name=["deep", "mahesh", "nirbhay"]
user_input = "certi_{element}" # this string i ask from user
for element in name:
    my_str = user_input.format(element=element)
    print(f"{my_str}")
You can of course rename the key that the user inputs vs the variable name that you use:
my_str = user_input.format(element=some_other_variable)
And you can just go and let the user have access to your whole namespace (or at least most of it). Please don't do this, but be aware that you can:
my_str = user_input.format(**locals(), **globals())
I went with print(f'{my_str}') instead of print(my_str) to make it explicit that literal braces left in the result are not treated as further, erroneous expansions. For example, user_input = 'certi_{{{element}}}' produces certi_{deep} for the first name.
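To illustrate the brace handling, here is a small sketch using the triple-brace template: in str.format(), {{ and }} are escapes for literal braces, so the result keeps one pair of braces around the value, which is the output the question asked for.

```python
name = ["deep", "mahesh", "nirbhay"]
user_input = "certi_{{{element}}}"  # {{ and }} are literal braces

# Each result keeps one literal pair of braces around the substituted value
results = [user_input.format(element=element) for element in name]
for line in results:
    print(line)
# certi_{deep}
# certi_{mahesh}
# certi_{nirbhay}
```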
I was looking for a solution to a problem similar to yours.
I came across this other question's answer: https://stackoverflow.com/a/54780825/7381826
Using that idea, I tweaked your code:
user_input = r"certi_"
for element in name:
    print(f"{user_input}{element}")
And I got this result:
certi_deep
certi_mahesh
certi_nirbhay
If you would rather stick to the layout in the question, then this final edit did the trick:
for element in name:
    print(f"{user_input}" "{" f"{element}" "}")
Reading the security concerns raised in the other answers, I don't think this alternative has serious security risks, because it does not define a new function with eval().
I am no security expert so please do correct me if I am wrong.
This is what you’re looking for. Just change the last line of your original code:
name=["deep","mahesh","nirbhay"]
user_input = "certi_{element}" # this string I ask from user
for element in name:
    print(eval("f'" + f"{user_input}" + "'"))

Filter out ASGs that contain string in AutoScalingGroupName using boto3

I am attempting to filter out auto scaling groups that contain the string 'base' in AutoScalingGroupName. I'm trying to use the JMESpath query language but cannot find any examples on filtering by the value, only for key.
import pprint
import boto3
session = boto3.Session(profile_name='prod')
asg_client = session.client(
    'autoscaling',
    region_name='us-west-1'
)
paginator = asg_client.get_paginator('describe_auto_scaling_groups')
page_iterator = paginator.paginate(
    PaginationConfig={'PageSize': 100}
)
filtered_asgs = page_iterator.search(
    'AutoScalingGroups[] | AutoScalingGroupName[?!contains(#, `{}`)]'.format('base')
)
for asg in filtered_asgs:
    pprint.pprint(asg)
This returns
None
None
I've also tried
filtered_asgs = page_iterator.search('AutoScalingGroups[] | [?contains(AutoScalingGroupName[].Value, `{}`)]'.format('base'))
jmespath.exceptions.JMESPathTypeError: In function contains(), invalid type for value: None, expected one of: ['array', 'string'], received: "null"
This is the correct syntax:
substring = 'base'
filtered_args = page_iterator.search(f"AutoScalingGroups[?!contains(AutoScalingGroupName,`{substring}`)][]")
If you prefer the "format-syntax" over f-strings you can of course also write:
filtered_args = page_iterator.search("AutoScalingGroups[?!contains(AutoScalingGroupName,`{}`)][]".format(substring))
And if the substring 'base' is constant you can also write it directly into the expression:
filtered_args = page_iterator.search("AutoScalingGroups[?!contains(AutoScalingGroupName,`base`)][]")
Most of the time you are not interested in the whole content of each returned entry. If you just care about the group name, you can write:
filtered_args = (page['AutoScalingGroupName'] for page in page_iterator.search("AutoScalingGroups[?!contains(AutoScalingGroupName,`base`)][]"))
If you prefer a list over a generator, you can simply replace the surrounding parentheses with square brackets:
filtered_args = [page['AutoScalingGroupName'] for page in page_iterator.search("AutoScalingGroups[?!contains(AutoScalingGroupName,`base`)][]")]
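For comparison, the same filter can be written in plain Python over the pages. The sketch below runs on made-up sample data shaped like the describe_auto_scaling_groups response (only the AutoScalingGroups and AutoScalingGroupName keys are assumed), so it needs neither AWS credentials nor boto3; names_without() is a hypothetical helper.

```python
# Made-up pages mimicking the shape of the paginated response.
sample_pages = [
    {"AutoScalingGroups": [
        {"AutoScalingGroupName": "web-base-1"},
        {"AutoScalingGroupName": "api-prod"},
    ]},
    {"AutoScalingGroups": [
        {"AutoScalingGroupName": "worker"},
    ]},
]

def names_without(pages, substring):
    """Return group names that do NOT contain the given substring."""
    return [
        group["AutoScalingGroupName"]
        for page in pages
        for group in page["AutoScalingGroups"]
        if substring not in group["AutoScalingGroupName"]
    ]

print(names_without(sample_pages, "base"))  # ['api-prod', 'worker']
```

With a real paginator you would pass page_iterator in place of sample_pages.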

Why does re.findall behave differently from re.search?

Scenario 1: Works as expected
>>> output = 'addr:10.0.2.15'
>>> regnew = re.search(r'addr:(([0-9]+\.){3}[0-9]+)',output)
>>> print(regnew)
<re.Match object; span=(0, 14), match='addr:10.0.2.15'>
>>> print(regnew.group(1))
10.0.2.15
Scenario 2: Works as expected
>>> regnew = re.findall(r'addr:([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)',output)
>>> print(regnew)
['10.0.2.15']
Scenario 3: Does not work as expected. Why is the output not ['10.0.2.15']?
>>> regnew = re.findall(r'addr:([0-9]+\.){3}[0-9]+',output)
>>> print(regnew)
['2.']
Your regex is not correct for what you want:
import re
output = 'addr:10.0.2.15'
regnew = re.findall(r'addr:((?:[0-9]+\.){3}[0-9]+)', output)
print(regnew)
Notice what changed: I wrapped the full IP address in parentheses and added '?:' to the first part of the address. '?:' marks a non-capturing group. As stated in the docs, findall() returns a list of the captured groups when the pattern contains any, which is why you want '(?:[0-9]+\.)' to be non-capturing and the whole address to be in a group.
The difference here between findall and everything else is that findall returns capture groups by default (if any are present) instead of the entire matched expression.
A quick fix would be to simply change your repeated group to a noncapturing group, so findall will return the full match rather than the last result in your capture group.
addr:(?:[0-9]+\.){3}[0-9]+
That will of course include addr: in your match. To get just the IP address, wrap both the pattern and quantifier in a capture group.
addr:((?:[0-9]+\.){3}[0-9]+)
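The three behaviours can be compared side by side. This is a small sketch using the string from the question; the variable names are just for illustration.

```python
import re

output = 'addr:10.0.2.15'

# Capturing repeated group: findall returns its last repetition only
repeated = re.findall(r'addr:([0-9]+\.){3}[0-9]+', output)
# No capture groups: findall returns the whole match
whole = re.findall(r'addr:(?:[0-9]+\.){3}[0-9]+', output)
# One capture group around the address: findall returns just the address
address = re.findall(r'addr:((?:[0-9]+\.){3}[0-9]+)', output)
# Two capture groups: findall returns one tuple per match
both = re.findall(r'addr:(([0-9]+\.){3}[0-9]+)', output)

print(repeated)  # ['2.']
print(whole)     # ['addr:10.0.2.15']
print(address)   # ['10.0.2.15']
print(both)      # [('10.0.2.15', '2.')]
```

If you want match objects instead, so that group(0) and group(1) are both available as with re.search, re.finditer yields one per match.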

Using regular expressions to isolate the words with ei or ie in them

How do I use regular expressions to isolate the words with ei or ie in them?
import re
value = ("How can one receive one who over achieves while believing that he/she cannot be deceived.")
list = re.findall("[ei,ie]\w+", value)
print(list)
It should print ['receive', 'achieves', 'believing', 'deceived'], but I get ['eceive', 'er', 'ieves', 'ile', 'elieving', 'eceived'] instead.
The set syntax [] is for individual characters, so use (?:) instead, with words separated by |. This is like using a group, but it doesn't capture a match group like () would. You also want the \w on either side to be captured to get the whole word.
import re
value = ("How can one receive one who over achieves while believing that he/she cannot be deceived.")
words = re.findall(r"(\w*(?:ei|ie)\w*)", value)
print(words)
['receive', 'achieves', 'believing', 'deceived']
(I'm assuming you meant "achieves", not "achieve" since that's the word that actually appears here.)
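A slightly extended sketch, assuming you also want to catch capitalised words such as "Either": the re.IGNORECASE flag and the \b word boundaries are additions to the pattern above, and the sample sentence is made up.

```python
import re

# Made-up sentence with a capitalised match ("Either")
value = "Either friend may receive the prize; neither believes it."

# \b anchors the match to whole words; IGNORECASE lets "Ei" match "ei"
words = re.findall(r'\b\w*(?:ei|ie)\w*\b', value, flags=re.IGNORECASE)
print(words)  # ['Either', 'friend', 'receive', 'neither', 'believes']
```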

A more "pythonic" approach to "check for None and deal with it"

I have a list of dicts with keys ['name','content','summary',...]. All the values are strings, but some values are None. I need to remove all the newlines in content, summary and some other keys. So, I do this:
...
...
for item in item_list:
    name = item['name']
    content = item['content']
    if content is not None: content = content.replace('\n','')
    summary = item['summary']
    if summary is not None: summary = summary.replace('\n','')
...
...
...
...
I somewhat feel that the if x is not None: x = x.replace('\n','') idiom is not so intelligent or clean. Is there a more "pythonic" or better way to do it?
Thanks.
The code feels unwieldy to you, but part of the reason is because you are repeating yourself. This is better:
def remove_newlines(text):
    if text is not None:
        return text.replace('\n', '')
for item in item_list:
    name = item['name']
    content = remove_newlines(item['content'])
    summary = remove_newlines(item['summary'])
If you are going to use sentinel values (None) then you will be burdened with checking for them.
There are a lot of different answers to your question, but they seem to be missing this point: don't use sentinel values in a dictionary when the absence of an entry encodes the same information.
For example:
bibliography = [
{ 'name': 'bdhar', 'summary': 'questioner' },
{ 'name': 'msw', 'content': 'an answer' },
]
then you can
for article in bibliography:
    for key in article:
        ...
and then your loop is nicely ignorant of what keys, if any, are contained in a given article.
In reading your comments, you say that you are getting the dict from somewhere else. So clean it of junk values first. It is much clearer to have a cleaning step than it is to carry their misunderstanding through your code.
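Such a cleaning step might look like the sketch below. It assumes that None simply means "no value" and that only some keys need their newlines stripped; KEYS_TO_STRIP, clean() and the sample item are made up for illustration.

```python
# Hypothetical set of keys whose values should have newlines removed.
KEYS_TO_STRIP = {'content', 'summary'}

def clean(item):
    """Drop None values and strip newlines from the selected keys."""
    cleaned = {}
    for key, value in item.items():
        if value is None:
            continue  # absence of the key now encodes "no value"
        cleaned[key] = value.replace('\n', '') if key in KEYS_TO_STRIP else value
    return cleaned

item_list = [{'name': 'bdhar', 'content': None, 'summary': 'a\nquestion'}]
cleaned_items = [clean(item) for item in item_list]
print(cleaned_items)  # [{'name': 'bdhar', 'summary': 'aquestion'}]
```

Code that runs after this step can use item.get('content') or iterate over the keys without any None checks.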
Python has a ternary operator, so one option is to do this in a more natural word order:
content = content.replace('\n', '') if content is not None else None
Note that if "" and None are equivalent in your case (which appears to be so), you can shorten it to just if content, as non-empty strings evaluate to True.
content = content.replace('\n', '') if content else None
This also follows the Python idiom of explicit is better than implicit. This shows someone following the code that the value can be None very clearly.
It's worth noting that if you repeat this operation a lot, it might be worth encapsulating it as a function.
Another idiom in Python is ask for forgiveness, not permission. So you could simply use try and catch the AttributeError that follows. However, this becomes a lot more verbose in this case, so it's probably not worth it, especially as the cost of the check is so small.
try:
    content = content.replace('\n', '')
except AttributeError:
    content = None
    #pass #Also an option, but as mentioned above, explicit is generally clearer than implicit.
One possibility is to use the empty string instead of None. This is not a fully general solution, but in many cases if your data is all of a single type, there will be a sensible "null" value other than None (empty string, empty list, zero, etc.). In this case it looks like you could use the empty string.
The empty string evaluates to False in Python, so the Pythonic way is if content:.
In [2]: bool("")
Out[2]: False
In [3]: bool("hello")
Out[3]: True
Side note but you can make your code a little clearer:
name, content = item["name"], item["content"]
And:
content = content.replace('\n','') if content else None
You might also consider abstracting some of your if clauses into a separate function:
def remove_newlines(mystr):
    if mystr:
        mystr = mystr.replace('\n', '')
    return mystr
(edited to remove the over-complicated solution with dictionaries, etc)
Try:
if content: content = content.replace('\n','')
--
if content will (almost [1]) always be True as long as content contains anything except for 0, False, or None.
[1] As Lattyware correctly points out in the comments, this is not strictly true. There are other things that evaluate to False in an if statement, for example an empty list. See the link provided in the comment below.
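The practical difference between the truthiness test and the explicit None test shows up with the empty string. A small sketch with two hypothetical helper names:

```python
def strip_if_truthy(content):
    # Treats every falsy value ('' as well as None) as "nothing to do"
    return content.replace('\n', '') if content else None

def strip_if_not_none(content):
    # Only None is special; an empty string stays an empty string
    return content.replace('\n', '') if content is not None else None

for value in (None, '', 'a\nb'):
    print(repr(value), repr(strip_if_truthy(value)), repr(strip_if_not_none(value)))
# None None None
# '' None ''
# 'a\nb' 'ab' 'ab'
```

If an empty string and None mean the same thing in your data, the shorter truthiness test is fine; otherwise prefer is not None.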
I think that the "pythonic" thing is to use the fact that None will evaluate to False in an if statement.
So you can just say:
if content: content = content.replace('\n','')

Resources