Why does re.findall behave differently from re.search? - python-3.x

Scenario 1: Works as expected
>>> output = 'addr:10.0.2.15'
>>> regnew = re.search(r'addr:(([0-9]+\.){3}[0-9]+)',output)
>>> print(regnew)
<re.Match object; span=(0, 14), match='addr:10.0.2.15'>
>>> print(regnew.group(1))
10.0.2.15
Scenario 2: Works as expected
>>> regnew = re.findall(r'addr:([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)',output)
>>> print(regnew)
['10.0.2.15']
Scenario 3: Does not work as expected. Why is the output not ['10.0.2.15']?
>>> regnew = re.findall(r'addr:([0-9]+\.){3}[0-9]+',output)
>>> print(regnew)
['2.']

Your regex is not correct for what you want:
import re
output = 'addr:10.0.2.15'
regnew = re.findall(r'addr:((?:[0-9]+\.){3}[0-9]+)', output)
print(regnew)  # ['10.0.2.15']
Notice what I changed: I wrapped the full IP address in parentheses, and added ?: to the group for the first part of the address. (?:...) is a non-capturing group. As stated in the docs, findall() returns a list of the captured groups, which is why you want (?:[0-9]+\.) as a non-capturing group and the whole address inside a single capturing group. I also escaped the dot as \., because an unescaped . matches any character.
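A small sketch of the rule the docs describe: what findall returns depends on how many capturing groups the pattern has. The sample string is the one from the question; the third pattern, with an extra (addr) group, is only there for illustration.

```python
import re

text = "addr:10.0.2.15"

# No capturing groups: findall returns the whole match for each hit
no_groups = re.findall(r"addr:(?:[0-9]+\.){3}[0-9]+", text)

# Exactly one capturing group: findall returns only that group's text
one_group = re.findall(r"addr:((?:[0-9]+\.){3}[0-9]+)", text)

# Two or more capturing groups: findall returns a tuple of all groups per hit
two_groups = re.findall(r"(addr):((?:[0-9]+\.){3}[0-9]+)", text)

print(no_groups)   # ['addr:10.0.2.15']
print(one_group)   # ['10.0.2.15']
print(two_groups)  # [('addr', '10.0.2.15')]
```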

The difference here between findall and everything else is that findall returns capture groups by default (if any are present) instead of the entire matched expression.
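To see where the '2.' in Scenario 3 comes from, you can run the same pattern through re.search and look at each group. A repeated capturing group such as ([0-9]+\.){3} is overwritten on every repetition ('10.', then '0.', then '2.'), so only the last one survives, and that is what findall reports. A minimal sketch using the question's input:

```python
import re

text = "addr:10.0.2.15"
pattern = r"addr:([0-9]+\.){3}[0-9]+"  # the Scenario 3 pattern

m = re.search(pattern, text)
full_match = m.group(0)   # everything the pattern matched
last_repeat = m.group(1)  # a repeated group keeps only its LAST repetition
found = re.findall(pattern, text)  # findall reports group 1, not the full match

print(full_match)   # addr:10.0.2.15
print(last_repeat)  # 2.
print(found)        # ['2.']
```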
A quick fix would be to simply change your repeated group to a non-capturing group, so findall will return the full match rather than the last repetition captured by your group.
addr:(?:[0-9]+\.){3}[0-9]+
That will of course include addr: in your match. To get just the IP address, wrap both the pattern and quantifier in a capture group.
addr:((?:[0-9]+\.){3}[0-9]+)
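If you want both the full match and the captured part without rearranging groups, re.finditer is an alternative: it yields Match objects, just like re.search, for every hit. A minimal sketch; the second address in the sample string is made up for illustration.

```python
import re

# Hypothetical input with two addresses, to show several hits
text = "eth0 addr:10.0.2.15 eth1 addr:192.168.1.7"
pattern = re.compile(r"addr:((?:[0-9]+\.){3}[0-9]+)")

# finditer yields Match objects, so you can pick any group per hit
full_matches = [m.group(0) for m in pattern.finditer(text)]
ips = [m.group(1) for m in pattern.finditer(text)]

print(full_matches)  # ['addr:10.0.2.15', 'addr:192.168.1.7']
print(ips)           # ['10.0.2.15', '192.168.1.7']
```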

Related

Defining a function to find the unique palindromes in a given string

I'm fairly new to Python. I'm trying to define a function that, when called, outputs only the unique words in a string that are palindromes.
I used casefold() to make it case-insensitive and set() to print only uniques.
Here's my code:
def uniquePalindromes(string):
    x=string.split()
    for i in x:
        k=[]
        rev= ''.join(reversed(i))
        if i.casefold() == rev.casefold():
            k.append(i.casefold())
            print(set(k))
        else:
            return
I've tried to run this line
print( uniquePalindromes('Hanah asked Sarah but Sarah refused') )
The expected output should be ['hanah','sarah'], but it's returning only {'hanah'}. Please help.
Your logic is sound, and your function is mostly doing what you want it to. Part of the issue is how you're returning things - all you're doing is printing the set of each individual word. For example, when I take your existing code and do this:
>>> print(uniquePalindromes('Hannah Hannah Alomomola Girafarig Yes Nah, Chansey Goldeen Need log'))
{'hannah'}
{'alomomola'}
{'girafarig'}
None
hannah, alomomola, and girafarig are the palindromes I would expect to see, but they're not given in the format I expect. For one, they're being printed instead of returned, and for two, that's happening one by one.
And because of the else: return, the function exits at the first word that is not a palindrome ("Yes") and returns None, which you then print. This is not what we want.
Here's a fixed version of your function:
def uniquePalindromes(string):
    x = string.split()
    k = []  # note how we put it *outside* the loop, so it persists across iterations without being reset
    for i in x:
        rev = ''.join(reversed(i))
        if i.casefold() == rev.casefold():
            k.append(i.casefold())
        # the print statement isn't what we want
        # no else/return needed: without the early return, the loop simply moves on to the next word
    # now, once all elements have been visited, return the set of unique elements from k
    return set(k)
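Now it returns roughly what you'd expect: a single set with multiple words, instead of printing multiple sets with one word each. Then we can print that set. (Note that "Sarah" is not a palindrome, since reversed it reads "haraS", so "hannah" is the only palindrome in your example sentence.)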
>>> print(uniquePalindromes("Hannah asked Sarah but Sarah refused"))
{'hannah'}
>>> print(uniquePalindromes("Hannah and her friend Anna caught a Girafarig and named it hannaH"))
{'anna', 'hannah', 'girafarig', 'a'}
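The same fixed logic can be written more compactly with a set comprehension and slicing; w[::-1] is a reversed copy of the string, equivalent to ''.join(reversed(w)). A sketch, not a change to the answer's approach; the snake_case name unique_palindromes is just a suggestion. Like the original, it splits on whitespace only, so punctuation stays attached to words ("Nah," is compared with its comma).

```python
def unique_palindromes(text):
    """Return the set of distinct words in text that read the same reversed (case-insensitive)."""
    words = (w.casefold() for w in text.split())
    # w[::-1] reverses the string
    return {w for w in words if w == w[::-1]}

print(unique_palindromes("Hannah asked Sarah but Sarah refused"))
print(unique_palindromes("Hannah and her friend Anna caught a Girafarig and named it hannaH"))
```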
I won't give you a full solution, just some tips. Count the characters in the word (ignoring whitespace) and split them into two halves. If the count is odd, there is a single middle character; if it is even, the middle falls between two characters. From there you can traverse the word downwards and upwards from the middle at the same time, comparing each pair of letters and using the middle point as a "jump off" point. Hope this helps.
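The middle-out comparison described above could look something like this for a single word. A minimal sketch; the function name is hypothetical. For odd lengths both indices start on the middle character, for even lengths they start on the two central characters, and the check stops as soon as a pair differs.

```python
def is_palindrome_from_middle(word):
    """Check a word by comparing letters outward from its middle (case-insensitive)."""
    w = word.casefold()
    n = len(w)
    left = (n - 1) // 2   # odd n: middle char; even n: left of the two central chars
    right = n // 2        # odd n: same middle char; even n: right of the two central chars
    while left >= 0:      # left + right == n - 1, so right stays in range
        if w[left] != w[right]:
            return False
        left -= 1
        right += 1
    return True

print(is_palindrome_from_middle("Hanah"))  # True
print(is_palindrome_from_middle("Sarah"))  # False
```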

Python: Print entire line of string match and not cut off after the period

See bottom for the solution I came up with.
Hopefully this is an easy question for you. I'm trying to match a string against a list and print just the matched string. I was successful using re, but it cuts off the rest of the string after the period: the span reported by re is (0, 10), while the full string spans (0, 14), so the match stops at the period. I would like to learn either how to make it print the entire string, or a new way to match a variable string against a list and print that exact string. My original attempts printed anything containing TESTPR, three items in total; the ones I don't want printed have a 1 at the front, or an additional R at the end. Here is my current match code:
#OLD See below
for element in catalog:
    z = re.match(r"((TESTPRR )\w+)", element)
    if z:
        print(z.group())
Output: TESTPR 105
It should show:
Wanted output: TESTPT 105.465
It will have at most 3 decimal places after the period. I am currently taking a Python class and love it so far, but this one has me stumped: I am just now learning about re and matching by reading, since we have not covered that in class yet.
I am open to learning a different way to search for and match a string and print just that string. My first attempt, which prints 3 results, was this:
catalog = [...]  # long list pulled from an API, then code here to make it a nice column
prod = 'TESTPR'
print([s for s in catalog if prod in s])
When I add a space at the end of prod, I can get rid of the match with the extra character at the end, but I cannot add a space to do the same for the match with the extra character at the front. This applies to the code above, not to the re.match code. Thanks!
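Without re, the problem at the front can be handled by anchoring the comparison to the start of the string. A minimal sketch; the catalog entries below are hypothetical stand-ins for the API data, assuming each entry looks like "CODE number".

```python
# Hypothetical sample data: the real catalog is pulled from an API
catalog = ["TESTPR 105.465", "1TESTPR 220.1", "TESTPRR 330.25", "OTHER 12.5"]
prod = "TESTPR"

# startswith() pins the match to the beginning of the string, which rules out "1TESTPR ...",
# and adding the space after prod rules out "TESTPRR ..."
hits = [s for s in catalog if s.startswith(prod + " ")]
print(hits)  # ['TESTPR 105.465']

# Alternative: compare the first whitespace-separated token exactly
# (split()[:1] is safe even for an empty string, unlike split()[0])
hits_by_token = [s for s in catalog if s.split()[:1] == [prod]]
print(hits_by_token)  # ['TESTPR 105.465']
```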
Answer below!
Since you are interested in learning about ways to match strings and solve your problem: try fuzzywuzzy.
In your case you could try:
from fuzzywuzzy import process
catalog = [...]  # long list pulled from API, then code here to make it a nice column
prod = "TESTPR"
hit = process.extractOne(prod, catalog, score_cutoff=75)  # adjust the cutoff to control how close the match must be
if hit:  # extractOne returns None when nothing reaches score_cutoff
    print(hit[0])  # hit will be something like ("TESTPT 105.465", 75)
Output: TESTPT 105.465
For information on different ways of using fuzzywuzzy, check out its documentation.
You can use different ways of matching, such as:
fuzz.partial_ratio
fuzz.ratio
fuzz.token_sort_ratio
fuzz.token_set_ratio
For these, import them with from fuzzywuzzy import fuzz.
I kept at it with re.match and got the correct regex, so the entire match prints and the numbers after the period are no longer cut off.
My original match, as you can see above, was re.match("((TESTPRR )\w+)", element). \w does not match a period, so \w+ stopped right before it. Some of the parentheses were unneeded, and I needed to add a few more expressions; now it prints the correct match. See above for the old code and below for the new code that works.
# New code: replaced \w+ with \w*\d*[.,]?\d*$
for element in catalog:
    z = re.match(r"STRING\w*\d*[.,]?\d*$", element)
    if z:
        print(z.group())
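A sketch of the same idea with the number format spelled out, assuming (as the question says) entries look like "TESTPR 105.465" with at most 3 decimal places. The catalog values are hypothetical, and re.escape is used so the product code is treated literally. It also shows why the old pattern stopped at "105".

```python
import re

# Hypothetical sample data: the real catalog is pulled from an API
catalog = ["TESTPR 105.465", "1TESTPR 220.1", "TESTPRR 330.25", "OTHER 12.5"]
prod = "TESTPR"

# \w matches letters, digits and underscore but not '.', so \w+ stops at the period
cut_off = re.match(r"TESTPR \w+", "TESTPR 105.465").group()
print(cut_off)  # TESTPR 105

# re.match anchors at the start; \d+(?:\.\d{1,3})? allows up to 3 decimals; $ requires the end of the string
pattern = re.compile(re.escape(prod) + r" \d+(?:\.\d{1,3})?$")
hits = [s for s in catalog if pattern.match(s)]
print(hits)  # ['TESTPR 105.465']
```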

Using regular expressions to isolate the words with ei or ie in them

How do I use regular expressions to isolate the words with ei or ie in them?
import re
value = ("How can one receive one who over achieves while believing that he/she cannot be deceived.")
list = re.findall("[ei,ie]\w+", value)
print(list)
It should print ['receive', 'achieves', 'believing', 'deceived'], but I get ['eceive', 'er', 'ieves', 'ile', 'elieving', 'eceived'] instead.
The set syntax [] is for individual characters, so use (?:) instead, with the alternatives separated by |. This works like a group, but it doesn't capture a match group the way () would. You also want \w* on either side so that the whole word is matched.
import re
value = ("How can one receive one who over achieves while believing that he/she cannot be deceived.")
words = re.findall(r"(\w*(?:ei|ie)\w*)", value)  # avoid naming a variable list, which shadows the built-in
print(words)
['receive', 'achieves', 'believing', 'deceived']
(I'm assuming you meant "achieves", not "achieve" since that's the word that actually appears here.)
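A short sketch contrasting the two constructs: a character class matches exactly one character, so [ei,ie] is the same as [ei,] and matches "e", "i" or a comma. The test strings are made up for illustration, and re.IGNORECASE is an optional extra for capitalised words, not something the question asked for.

```python
import re

# A character class matches ONE character from the set: [ei,ie] means "e, i or ,"
single_chars = re.findall(r"[ei,ie]", "tie, bee")
print(single_chars)  # ['i', 'e', ',', 'e', 'e']

# Alternation matches the two-letter sequences; re.IGNORECASE also catches "Ei..."
text = "Neither Eileen nor Leo"
case_sensitive = re.findall(r"\w*(?:ei|ie)\w*", text)
case_insensitive = re.findall(r"\w*(?:ei|ie)\w*", text, flags=re.IGNORECASE)
print(case_sensitive)    # ['Neither']
print(case_insensitive)  # ['Neither', 'Eileen']
```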

Optional group in a Python regex

import re
string = 'Alabama[edit]'
a = re.search(r'(\w+)(?:\(([\w+\s*]+)\))(\[.*\])',string).group(2)
I have put the () in an optional group, but the result still returned None.
What I want to achieve: there are two different types of string:
1. Alabama[edit]
2. Alabama (some text)[edit]
I want to extract either None, if there are no parentheses, or the string inside the parentheses.
I am also not sure why this doesn't work as an optional group. If there are no parentheses, this expression should be skipped and the remaining groups should still be captured, right?
(?:\(([\w+\s*]+)\))
thanks!
Erik
This seems to work:
a = re.search(r'(\w+)(\([\w+\s*]+\))?(\[.*\])', string).groups()
print(a) #('Alabama', None, '[edit]')
In your original expression you didn't use the optional quantifier. To make a group optional, put ? after its closing ). The (?:...) notation you used makes the group non-capturing: it is still always required for the match, but it does not appear in the result. It basically says: "Match this group, but I don't want to know anything about it in the result." Note also that inside a character class, + and * are literal characters, so [\w+\s*] matches a word character, whitespace, + or *.
I think what you wanted after all is this:
a = re.search(r'(\w+)(?:\(([\w+\s*]+)\))?(\[.*\])', string).groups()
so:
import re
s1 = 'Alabama[edit]'
s2 = 'Alabama(test)[edit]'
print(re.search(r'(\w+)(?:\(([\w+\s*]+)\))?(\[.*\])',s1).groups())
#('Alabama', None, '[edit]')
print(re.search(r'(\w+)(?:\(([\w+\s*]+)\))?(\[.*\])',s2).groups())
#('Alabama', 'test', '[edit]')
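The second format in the question has a space before the parenthesis ("Alabama (some text)[edit]"), which the pattern above does not allow, since nothing in it can consume that space; the test string s2 avoids the issue by having no space. A sketch that accepts both forms, assuming the space is optional; it also swaps [\w+\s*]+ for [^)]*, meaning "anything up to the closing parenthesis". The helper name parse is hypothetical.

```python
import re

# \s* allows an optional space before "("; [^)]* captures everything up to the closing ")"
PATTERN = re.compile(r"(\w+)\s*(?:\(([^)]*)\))?(\[.*\])")

def parse(s):
    """Return (name, text_in_parentheses_or_None, bracket_part), or None if there is no match."""
    m = PATTERN.search(s)
    return m.groups() if m else None

print(parse("Alabama[edit]"))              # ('Alabama', None, '[edit]')
print(parse("Alabama (some text)[edit]"))  # ('Alabama', 'some text', '[edit]')
```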

Need help working with lists within lists

I'm taking a programming class and have our first assignment. I understand how it's supposed to work, but apparently I haven't hit upon the correct terms to search to get help (and the book is less than useless).
The assignment is to take a provided data set (names and numbers) and perform some manipulation and computation with it.
I'm able to get the names into a list, and know the general format of what commands I'm giving, but the specifics are evading me. I know that you refer to the numbers as names[0][1], names[1][1], etc, but not how to refer to just that record that is being changed. For example, we have to have the program check if a name begins with a letter that is Q or later; if it does, we double the number associated with that name.
This is what I have so far, with ??? indicating where I know something goes, but not sure what it's called to search for it.
It's homework, so I'm not really looking for answers, but guidance to figure out the right terms to search for my answers. I already found some stuff on the site (like the statistics functions), but just can't find everything the book doesn't even mention.
names = [("Jack",456),("Kayden",355),("Randy",765),("Lisa",635),("Devin",358),("LaWanda",452),("William",308),("Patrcia",256)]
length = len(names)
count = 0
while count < length:
    if ??? > "Q": # checks if first letter of name is greater than Q
        ??? # doubles number associated with name
    count += 1
print(names) # self-check
numberNames = names # creates new list
import statistics
mean = statistics.mean(???)
median = statistics.median(???)
print("Mean value: {0:.2f}".format(mean))
alphaNames = sorted(numberNames) # sorts names list by name and creates new list
print(alphaNames)
First of all, you need to iterate over your names list. To do so, use a for loop:
for person in names:
    print(person)
But names is a list of tuples, so you will need to get the person's name by accessing the first item of the tuple. You do this just like you do with lists:
name = person[0]
score = person[1]
Finally, to get the ASCII code of a character, use the ord() function. That is going to be helpful to know whether a name starts with Q or a later letter.
print(ord('A'))  # 65
print(ord('Q'))  # 81
print(ord('R'))  # 82
This should be enough information to get you started.
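Putting those pieces together, a sketch of the first-letter check using the question's data. For single characters, name[0] >= "Q" gives the same result as comparing ord() values. "Q or later" means >= rather than >, and this assumes names are capitalised: lowercase letters have higher codes (ord('a') is 97), so a name starting with a lowercase letter would count as "after Q".

```python
names = [("Jack", 456), ("Kayden", 355), ("Randy", 765), ("Lisa", 635),
         ("Devin", 358), ("LaWanda", 452), ("William", 308), ("Patrcia", 256)]

by_string = []
by_ord = []
for person in names:
    name = person[0]
    if name[0] >= "Q":              # compare the first letter directly
        by_string.append(name)
    if ord(name[0]) >= ord("Q"):    # same check via character codes
        by_ord.append(name)

print(by_string)  # ['Randy', 'William']
print(by_ord)     # ['Randy', 'William']
```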
I see a few parts to your question, so I'll try to separate them out in my response.
check if first letter of name is greater than Q
Hopefully this will help you with the syntax here. Like list, str also supports element access by index with the [] syntax.
$ names = [("Jack",456),("Kayden",355)]
$ names[0]
('Jack', 456)
$ names[0][0]
'Jack'
$ names[0][0][0]
'J'
$ names[0][0][0] < 'Q'
True
$ names[0][0][0] > 'Q'
False
double number associated with name
$ names[0][1]
456
$ names[0][1] * 2
912
"how to refer to just that record that is being changed"
We are trying to update the value associated with the name.
In keeping with my previous code examples, we want to update the value at index 1 of the tuple stored at index 0 in the list called names.
However, tuples are immutable so we have to be a little tricky if we want to use the data structure you're using.
$ names = [("Jack",456), ("Kayden", 355)]
$ names[0]
('Jack', 456)
$ tpl = names[0]
$ tpl = (tpl[0], tpl[1] * 2)
$ tpl
('Jack', 912)
$ names[0] = tpl
$ names
[('Jack', 912), ('Kayden', 355)]
Do this for all tuples in the list
We need to do this for the whole list, it looks like you were onto that with your while loop. Your counter variable for indexing the list is named count so just use that to index a specific tuple, like: names[count][0] for the countth name or names[count][1] for the countth number.
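Combining the counter, the first-letter check, and the tuple rebuild, the loop could look like this sketch. It keeps the question's while loop and count variable; unpacking with name, number = ... is just a shorter way to write person[0] and person[1].

```python
names = [("Jack", 456), ("Kayden", 355), ("Randy", 765), ("Lisa", 635),
         ("Devin", 358), ("LaWanda", 452), ("William", 308), ("Patrcia", 256)]

count = 0
while count < len(names):
    name, number = names[count]   # unpack the tuple at position count
    if name[0] >= "Q":            # "Q or later"
        # tuples are immutable, so build a new tuple and store it back in the list
        names[count] = (name, number * 2)
    count += 1

print(names)
```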
using statistics for calculating mean and median
I recommend looking at the documentation for a module when you want to know how to use it. Here is an example for mean:
mean(data)
Return the sample arithmetic mean of data.
$ mean([1, 2, 3, 4, 4])
2.8
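statistics.mean and statistics.median both take a sequence of numbers, so the numbers first need to be pulled out of the tuples. A sketch using the question's original (undoubled) values; the list comprehension is one way to do it, and a for loop with append works just as well.

```python
import statistics

names = [("Jack", 456), ("Kayden", 355), ("Randy", 765), ("Lisa", 635),
         ("Devin", 358), ("LaWanda", 452), ("William", 308), ("Patrcia", 256)]

numbers = [person[1] for person in names]  # the second item of each tuple
mean = statistics.mean(numbers)
median = statistics.median(numbers)        # even count: average of the two middle values

print("Mean value: {0:.2f}".format(mean))
print("Median value: {0:.2f}".format(median))
```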
Hopefully these examples help you with the syntax for continuing your assignment, although this could turn into a long discussion.
The title of your post is "Need help working with lists within lists" ... well, your code example uses a list of tuples
$ names = [("Jack",456),("Kayden",355)]
$ type(names)
<class 'list'>
$ type(names[0])
<class 'tuple'>
$ names = [["Jack",456], ["Kayden", 355]]
$ type(names)
<class 'list'>
$ type(names[0])
<class 'list'>
notice the difference in the [] and ()
If you are free to structure the data however you like, then I would recommend using a dict (read: dictionary).
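With a dict, the name itself becomes the key, so updating a value needs no tuple rebuilding. A sketch, assuming names are unique (duplicate keys would overwrite each other); dict(names) works because each tuple is a (key, value) pair.

```python
names = [("Jack", 456), ("Kayden", 355), ("Randy", 765), ("Lisa", 635),
         ("Devin", 358), ("LaWanda", 452), ("William", 308), ("Patrcia", 256)]

scores = dict(names)  # {'Jack': 456, 'Kayden': 355, ...}
for name in scores:
    if name[0] >= "Q":
        scores[name] *= 2  # dict values can be updated in place, unlike tuple items

print(scores["Randy"], scores["William"])
```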
I know that you refer to the numbers as names[0][1], names[1][1], etc, but not how to refer to just that record that is being changed. For example, we have to have the program check if a name begins with a letter that is Q or later; if it does, we double the number associated with that name.
It's not entirely clear what else you have to do in this assignment, but regarding your concerns above, to reference the ith "record that is being changed" in your names list, simply use names[i]. So, if you want to access the first record in names, use names[0], since indexing in Python begins at zero.
Since each element in your list is a tuple (which can also be indexed), using constructs like names[0][0] and names[0][1] are ways to index the values within the tuple, as you pointed out.
I'm unsure why you're using while True if you're trying to iterate through each name and check whether it begins with "Q". It seems like a for loop would be better, unless your class hasn't gotten there yet.
As for checking whether the first letter is 'Q', str (string) objects are indexed similarly to lists and tuples. To access the first letter in a string, for example, see the following:
>>> my_string = 'Hello'
>>> my_string[0]
'H'
If you give more information, we can help guide you with the statistics piece, as well. But I would first suggest you get some background around mean and median (if you're unfamiliar).
