Python regular expressions: Better way to handle non-matches? - python-3.x

When I deal with regular expressions, my code is littered with conditionals so as not to raise exceptions when a pattern is not found:

m = some_compiled_pattern.match(s)
if m:
    x = m.groups()
    do_something_with(x)

m = some_other_compiled_pattern.search(s)
if m:
    y = m.groupdict()
else:
    y = {}
do_something_else_with(y)
Isn't there a better (less verbose) way to handle such exceptions?

You might find this class useful for reducing most of that if-no-match handling to a single line.
from types import MethodType

class Returns:
    """
    Makes an object that pretends to have all possible methods,
    but returns the same value (default None) no matter what that
    method, or its arguments, is.
    """
    def __init__(self, return_val=None):
        self.return_val = return_val

        def the_only_method_there_is(self, *args, **kwargs):
            return self.return_val

        self.the_only_method_there_is = MethodType(the_only_method_there_is, self)

    def __getattr__(self, item):
        # Only called for attributes not found through normal lookup, so real
        # attributes such as return_val are never intercepted.
        if not item.startswith('_') and item not in {'return_val', 'the_only_method_there_is'}:
            return self.the_only_method_there_is
        raise AttributeError(item)
Example use:
>>> import re
>>> p = re.compile(r'(\d+)\W+(\w+)')
>>>
>>> # when all goes well...
>>> m = p.search('The number 42 is mentioned often')
>>> num, next_word = m.groups()
>>> num, next_word
('42', 'is')
>>>
>>> # when the pattern is not found...
>>> m = p.search('No number here')
>>> assert m is None # m is None so...
>>> num, next_word = m.groups() # ... this is going to choke
Traceback (most recent call last):
...
AttributeError: 'NoneType' object has no attribute 'groups'
>>>
>>> # Returns to the rescue
>>> num, next_word = (p.search('No number here') or Returns((None, 'default_word'))).groups()
>>> assert num is None
>>> next_word
'default_word'
EDIT: See this gist for a longer discussion of this problem (and an alternate but similar solution).
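For comparison, a plainer alternative in the same spirit (a hypothetical helper, not from the gist) wraps the search-and-extract step in a small function that takes a default:

import re

def groups_or(pattern, s, default=None):
    """Return pattern.search(s).groups() if there is a match, else `default`."""
    m = pattern.search(s)
    return m.groups() if m else default

p = re.compile(r'(\d+)\W+(\w+)')
print(groups_or(p, 'The number 42 is mentioned often'))        # ('42', 'is')
print(groups_or(p, 'No number here', (None, 'default_word')))  # (None, 'default_word')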

Related

Unable to compare types of identical but redeclared namedtuples in Python

While working on a difference engine to identify differences in very large data structures, I noticed that a type comparison between identical-but-redeclared namedtuples misbehaves. Redeclaring the namedtuples is unavoidable*. Here is a minimal example:
def test_named_tuples_same_type():
    from collections import namedtuple
    X = namedtuple("X", "x")
    a = X(1)
    # We are unable to avoid redeclaring X
    X = namedtuple("X", "x")
    b = X(1)
    print(repr(a))
    print(repr(b))
    # X(x=1)
    # X(x=1)
    assert isinstance(type(a), type(b))  # fail
    assert type(a) == type(b)  # fail
The asserts fail with:
> assert isinstance(type(a), type(b)) # fail
E AssertionError: assert False
E + where False = isinstance(<class 'tests.test_deep_diff.X'>, <class 'tests.test_deep_diff.X'>)
E + where <class 'tests.test_deep_diff.X'> = type(X(x=1))
E + and <class 'tests.test_deep_diff.X'> = type(X(x=1))
and
> assert type(a) == type(b) # fail
E AssertionError: assert <class 'tests.test_deep_diff.X'> == <class 'tests.test_deep_diff.X'>
How can I assert that the two types are equal, or at least semantically equal (without comparing str(type()))?
*Redeclaring the namedtuple is unavoidable because it takes place in unmodifiable exec'd code to generate the data structures being diffed.
It isn't entirely clear what exactly you mean by semantically equivalent. But consider:
>>> from collections import namedtuple
>>> X1 = namedtuple("X", "x")
>>> X2 = namedtuple("X", "x")
Then you can use something like:
>>> def equivalent_namedtuple_types(t1, t2):
... return (t1.__name__, t1._fields) == (t2.__name__, t2._fields)
...
>>> equivalent_namedtuple_types(X1, X2)
True
>>>
From your comments, it seems like you may care about the .__module__ attribute as well.
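If the module matters, a sketch of an extended check (assuming that is what "semantically equal" should include; the helper name is made up) could be:

def equivalent_namedtuple_types_strict(t1, t2):
    """Like the check above, but also requires the defining module to match."""
    return ((t1.__module__, t1.__name__, t1._fields)
            == (t2.__module__, t2.__name__, t2._fields))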

Decorators or assertions in setters to check property type?

In a Python project, my class has several properties that need to be of a specific type. Users of the class must have the ability to set these properties.
What is the best way to do this? Two solutions come to my mind:
1. Have test routines in each setter function.
2. Use decorators for attributes
My current solution is 1 but I am not happy with it due to the code duplication. It looks like this:
class MyClass(object):
    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, val):
        if not isinstance(val, int):
            raise Exception("Value must be of type int")
        self._x = val

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, val):
        if not isinstance(val, (tuple, set, list)):
            raise Exception("Value must be of type tuple or set or list")
        self._y = val
From what I know of decorators, it should be possible to have a decorator before def x(self) handle this job. Alas I fail miserably at this, as all examples I found (like this or this) are not targeted at what I want.
The first question is thus: is it better to use a decorator to check property types? If yes, the next question is: what is wrong with the decorator below (I want to be able to write @accepts(int))?
def accepts(types):
    """Decorator to check types of property."""
    def outer_wrapper(func):
        def check_accepts(prop):
            getter = prop.fget
            if not isinstance(self[0], types):
                msg = "Wrong type."
                raise ValueError(msg)
            return self
        return check_accepts
    return outer_wrapper
Appetizer
Callables
This is likely beyond your needs, since it sounds like you're dealing with end-user input, but I figured it may be helpful for others.
Callables include functions defined with def, built-in functions/methods such as open(), lambda expressions, callable classes, and many more. Obviously, if you only want to allow certain types of callables, you can still use isinstance() with types.FunctionType, types.BuiltinFunctionType, types.LambdaType, etc. But if that is not the case, the best solution I am aware of is demonstrated by the MyDecoratedClass.z property below, which uses isinstance() with collections.abc.Callable. It's not perfect, and will return false positives in extraordinary cases (for example, if a class defines a __call__ function that doesn't actually make the class callable). The callable(obj) built-in is the only foolproof check function to my knowledge. The MyClass.z property demonstrates the use of that function, but you'd have to write another decorator (or modify the existing one in MyDecoratedClass) in order to support check functions other than isinstance().
Iterables (and Sequences and Sets)
The y property in the code you provided is supposed to be restricted to tuples, sets, and lists, so the following may be of some use to you.
Instead of checking whether arguments are of individual types, you might want to consider using Iterable, Sequence, and Set from the collections.abc module. Please use caution though, as these types are far less restrictive than simply passing (tuple, set, list) as you have. abc.Iterable (as well as the others) works near-perfectly with isinstance(), although it does sometimes return false positives as well (e.g. a class defines an __iter__ function but doesn't actually return an iterator -- who hurt you?). The only foolproof method of determining whether an argument is iterable is to call the iter(obj) built-in and let it raise a TypeError if the argument is not iterable, which could work in your case. I don't know of any built-in alternatives to abc.Sequence and abc.Set, but almost every sequence/set object is also iterable as of Python 3, if that helps. The MyClass.y2 property implements iter() as a demonstration; however, the decorator function in MyDecoratedClass does not (currently) support check functions other than isinstance(), so MyDecoratedClass.y2 uses abc.Iterable instead.
For completeness' sake, here is a quick comparison of their differences:
>>> from collections.abc import Iterable, Sequence, Set
>>> def test(x):
... print((isinstance(x, Iterable),
... isinstance(x, Sequence),
... isinstance(x, Set)))
...
>>> test(123) # int
(False, False, False)
>>> test("1, 2, 3") # str
(True, True, False)
>>> test([1, 2, 3]) # list
(True, True, False)
>>> test(range(3)) # range
(True, True, False)
>>> test((1, 2, 3)) # tuple
(True, True, False)
>>> test({1, 2, 3}) # set
(True, False, True)
>>> import numpy as np
>>> test(np.arange(3)) # numpy.ndarray
(True, False, False)
>>> test(zip([1, 2, 3],[4, 5, 6])) # zip
(True, False, False)
>>> test({1: 4, 2: 5, 3: 6}) # dict
(True, False, False)
>>> test({1: 4, 2: 5, 3: 6}.keys()) # dict_keys
(True, False, True)
>>> test({1: 4, 2: 5, 3: 6}.values()) # dict_values
(True, False, False)
>>> test({1: 4, 2: 5, 3: 6}.items()) # dict_items
(True, False, True)
Other Restrictions
Virtually all other argument type restrictions that I can think of must use hasattr(); I won't go deep into that here, but a minimal sketch follows below.
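As a rough illustration only (a hypothetical duck-typed "file-like" requirement, separate from the classes discussed in this answer), a hasattr()-based setter might look like:

class HasattrExample:
    @property
    def stream(self):
        return self._stream

    @stream.setter
    def stream(self, val):
        # Accept anything that quacks like a readable file.
        if not hasattr(val, 'read'):
            raise TypeError("stream must have a 'read' method")
        self._stream = val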
Main Course
This is the part that actually answers your question. assert is definitely the simplest solution, but it has its limits.
class MyClass:
    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, val):
        assert isinstance(val, int)  # raises AssertionError if val is not of type 'int'
        self._x = val

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, val):
        assert isinstance(val, (list, set, tuple))  # raises AssertionError if val is not of type 'list', 'set', or 'tuple'
        self._y = val

    @property
    def y2(self):
        return self._y2

    @y2.setter
    def y2(self, val):
        iter(val)  # raises TypeError if val is not iterable
        self._y2 = val

    @property
    def z(self):
        return self._z

    @z.setter
    def z(self, val):
        assert callable(val)  # raises AssertionError if val is not callable
        self._z = val

    def multi_arg_example_fn(self, a, b, c, d, e, f, g):
        assert isinstance(a, int)
        assert isinstance(b, int)
        # let's say 'c' is unrestricted
        assert isinstance(d, int)
        assert isinstance(e, int)
        assert isinstance(f, int)
        assert isinstance(g, int)
        self._a = a
        self._b = b
        self._c = c
        self._d = d
        self._e = e
        self._f = f
        self._g = g
        return a + b * d - e // f + g
Pretty clean overall, besides the multi-argument function I threw in there at the end, demonstrating that asserts can get tedious. However, I'd argue that the biggest drawback here is the lack of exception messages/variables. If the end user sees an AssertionError, it has no message and is therefore mostly useless. If you write intermediate code that could catch these errors, that code will have no variables/data with which to explain to the user what went wrong. Enter the decorator function...
from collections.abc import Callable, Iterable

class MyDecoratedClass:
    def isinstance_decorator(*classinfo_args, **classinfo_kwargs):
        '''
        Usage:
        Always remember that each classinfo can be a type OR a tuple of types.
        If the decorated function takes, for example, two positional arguments...
        * You only need to provide positional arguments up to the last positional argument that you want to restrict the type of. Take a look:
          1. Restrict the type of only the first argument with '@isinstance_decorator(<classinfo_of_arg_1>)'
             * Notice that a second positional argument is not required
             * Although if you'd like to be explicit for clarity (in exchange for a small amount of efficiency), use '@isinstance_decorator(<classinfo_of_arg_1>, object)'
             * Every object in Python must be of type 'object', so restricting the argument to type 'object' is equivalent to no restriction whatsoever
          2. Restrict the types of both arguments with '@isinstance_decorator(<classinfo_of_arg_1>, <classinfo_of_arg_2>)'
          3. Restrict the type of only the second argument with '@isinstance_decorator(object, <classinfo_of_arg_2>)'
             * Every object in Python must be of type 'object', so restricting the argument to type 'object' is equivalent to no restriction whatsoever
        Keyword arguments are simpler: @isinstance_decorator(<a_keyword> = <classinfo_of_the_kwarg>, <another_keyword> = <classinfo_of_the_other_kwarg>, ...etc)
        * Remember that you only need to include the kwargs that you actually want to restrict the type of (no using 'object' as a keyword argument!)
        * Using kwargs is probably more efficient than using example 3 above; I would avoid having to use 'object' as a positional argument as much as possible

        Programming-Related Errors:
        Raises IndexError if given more positional arguments than the decorated function takes
        Raises KeyError if given a keyword argument that the decorated function isn't expecting
        Raises TypeError if given an argument that is not of type 'type'
        * Raised by 'isinstance()' when fed an improper 2nd argument, like 'isinstance(foo, 123)'
        * Virtually all UN-instantiated objects are of type 'type'
          Examples:
            example_instance = ExampleClass(*args)
            # Neither 'example_instance' nor 'ExampleClass(*args)' is of type 'type', but 'ExampleClass' itself is
            example_int = 100
            # Neither 'example_int' nor '100' is of type 'type', but 'int' itself is
            def example_fn(): pass
            # 'example_fn' is not of type 'type'.
            print(type(example_fn).__name__)   # function
            print(type(isinstance).__name__)   # builtin_function_or_method
            # As you can see, there are also several types of callable objects
            # If needed, you can retrieve most function/method/etc. types from the built-in 'types' module

        Functional/Intended Errors:
        Raises TypeError if a decorated function argument is not an instance of the type(s) specified by the corresponding decorator argument
        '''
        def isinstance_decorator_wrapper(old_fn):
            def new_fn(self, *args, **kwargs):
                for i in range(len(classinfo_args)):
                    classinfo = classinfo_args[i]
                    arg = args[i]
                    if not isinstance(arg, classinfo):
                        raise TypeError("%s() argument %s takes argument of type%s' but argument of type '%s' was given" %
                                        (old_fn.__name__, i,
                                         "s '" + "', '".join([x.__name__ for x in classinfo]) if isinstance(classinfo, tuple) else " '" + classinfo.__name__,
                                         type(arg).__name__))
                for k, classinfo in classinfo_kwargs.items():
                    kwarg = kwargs[k]
                    if not isinstance(kwarg, classinfo):
                        raise TypeError("%s() keyword argument '%s' takes argument of type%s' but argument of type '%s' was given" %
                                        (old_fn.__name__, k,
                                         "s '" + "', '".join([x.__name__ for x in classinfo]) if isinstance(classinfo, tuple) else " '" + classinfo.__name__,
                                         type(kwarg).__name__))
                return old_fn(self, *args, **kwargs)
            return new_fn
        return isinstance_decorator_wrapper

    @property
    def x(self):
        return self._x

    @x.setter
    @isinstance_decorator(int)
    def x(self, val):
        self._x = val

    @property
    def y(self):
        return self._y

    @y.setter
    @isinstance_decorator((list, set, tuple))
    def y(self, val):
        self._y = val

    @property
    def y2(self):
        return self._y2

    @y2.setter
    @isinstance_decorator(Iterable)
    def y2(self, val):
        self._y2 = val

    @property
    def z(self):
        return self._z

    @z.setter
    @isinstance_decorator(Callable)
    def z(self, val):
        self._z = val

    @isinstance_decorator(int, int, e=int, f=int, g=int, d=(int, float, str))
    def multi_arg_example_fn(self, a, b, c, d, e, f, g):
        # Compare with the assertions in MyClass.multi_arg_example_fn
        self._a = a
        self._b = b
        self._c = c
        self._d = d
        return a + b * e - f // g
Clearly, multi_arg_example_fn is one place where this decorator really shines: the clutter created by the assertions has been reduced to a single line. Let's take a look at some example error messages:
>>> test = MyClass()
>>> dtest = MyDecoratedClass()
>>> test.x = 10
>>> dtest.x = 10
>>> print(test.x == dtest.x)
True
>>> test.x = 'Hello'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 7, in x
AssertionError
>>> dtest.x = 'Hello'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 100, in new_fn
TypeError: x() argument 0 takes argument of type 'int' but argument of type 'str' was given
>>> test.y = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 15, in y
AssertionError
>>> test.y2 = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 23, in y2
TypeError: 'int' object is not iterable
>>> dtest.y = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 100, in new_fn
TypeError: y() argument 0 takes argument of types 'list', 'set', 'tuple' but argument of type 'int' was given
>>> dtest.y2 = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 100, in new_fn
TypeError: y2() argument 0 takes argument of type 'Iterable' but argument of type 'int' was given
>>> test.z = open
>>> dtest.z = open
>>> test.z = None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 31, in z
AssertionError
>>> dtest.z = None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 100, in new_fn
TypeError: z() argument 0 takes argument of type 'Callable' but argument of type 'NoneType' was given
Far superior in my opinion. Everything looks good except...
>>> test.multi_arg_example_fn(9,4,[1,2],'hi', g=2,e=1,f=4)
11
>>> dtest.multi_arg_example_fn(9,4,[1,2],'hi', g=2,e=1,f=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 102, in new_fn
KeyError: 'd'
>>> print('I forgot that you have to merge args and kwargs in order for the decorator to work properly with both but I dont have time to fix it right now. Absolutely safe for properties for the time being though!')
I forgot that you have to merge args and kwargs in order for the decorator to work properly with both but I dont have time to fix it right now. Absolutely safe for properties for the time being though!
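If you do need the decorator to handle arguments that may arrive either positionally or by keyword, one possible fix (a sketch only, not part of the tested code above) is to bind the call against the decorated function's signature with inspect and check everything by parameter name:

import inspect

def isinstance_decorator_v2(*classinfo_args, **classinfo_kwargs):
    def wrapper(old_fn):
        sig = inspect.signature(old_fn)
        # Map positional classinfo entries onto parameter names (skipping 'self').
        param_names = [name for name in sig.parameters if name != 'self']
        restrictions = dict(zip(param_names, classinfo_args))
        restrictions.update(classinfo_kwargs)

        def new_fn(self, *args, **kwargs):
            bound = sig.bind(self, *args, **kwargs)
            for name, classinfo in restrictions.items():
                if name in bound.arguments and not isinstance(bound.arguments[name], classinfo):
                    raise TypeError("%s() argument '%s' must be an instance of %r, "
                                    "but an argument of type '%s' was given"
                                    % (old_fn.__name__, name, classinfo,
                                       type(bound.arguments[name]).__name__))
            return old_fn(self, *args, **kwargs)
        return new_fn
    return wrapper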
Edit Notice: My previous answer was completely incorrect. I was suggesting the use of type hints, forgetting that they aren't actually enforced in any way. They are strictly a development/IDE tool. They are still insanely helpful though; I recommend looking into using them.

What's the underlying implementation for most_common method of Counter?

I found a pyi file which has the following def
def most_common(self, n: Optional[int] = ...) -> List[Tuple[_T, int]]: ...
How can this work? List is not defined, and there is no implementation?
Just to highlight some valuable suggestions here for future readers:
List is imported from the typing module; it's not the same thing as list. The .pyi file doesn't need to import it because stub files are never executed; they just have to be syntactically valid Python
If you use from __future__ import annotations, you won't have to import typing to use List et al. in function annotations in .py files either, since function annotations will then be treated as string literals. (PEP 563 describes this behavior and the plan to eventually make it the default; see it for details.)
You are looking at the .pyi file, which is used solely for type annotations and is never executed by the Python interpreter; a stripped-down illustration of such a stub follows below. You can learn more about .pyi files by reading PEP 484.
Using a debugger, put a breakpoint on the line where you call most_common and then step into the method.
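A heavily stripped-down, hypothetical stub in that style (not the real typeshed file) makes the point concrete: the names come from typing, and the ... bodies are the stub convention for "no implementation here":

from typing import Generic, List, Optional, Tuple, TypeVar

_T = TypeVar("_T")

class Counter(Generic[_T]):
    def most_common(self, n: Optional[int] = ...) -> List[Tuple[_T, int]]: ...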
Python 3.7 implementation.
...\Lib\collections\__init__.py:
def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.items(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
_heapq.nlargest (in ...\Lib\heapq.py) implementation:
def nlargest(n, iterable, key=None):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
    """

    # Short-cut for n==1 is to use max()
    if n == 1:
        it = iter(iterable)
        sentinel = object()
        if key is None:
            result = max(it, default=sentinel)
        else:
            result = max(it, default=sentinel, key=key)
        return [] if result is sentinel else [result]

    # When n>=size, it's faster to use sorted()
    try:
        size = len(iterable)
    except (TypeError, AttributeError):
        pass
    else:
        if n >= size:
            return sorted(iterable, key=key, reverse=True)[:n]

    # When key is none, use simpler decoration
    if key is None:
        it = iter(iterable)
        result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
        if not result:
            return result
        heapify(result)
        top = result[0][0]
        order = -n
        _heapreplace = heapreplace
        for elem in it:
            if top < elem:
                _heapreplace(result, (elem, order))
                top, _order = result[0]
                order -= 1
        result.sort(reverse=True)
        return [elem for (elem, order) in result]

    # General case, slowest method
    it = iter(iterable)
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    if not result:
        return result
    heapify(result)
    top = result[0][0]
    order = -n
    _heapreplace = heapreplace
    for elem in it:
        k = key(elem)
        if top < k:
            _heapreplace(result, (k, order, elem))
            top, _order, _elem = result[0]
            order -= 1
    result.sort(reverse=True)
    return [elem for (k, order, elem) in result]
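To see the connection concretely, here is a quick check (a small demo based on the docstring example above; heapq.nlargest with a count key gives the same answer as most_common):

import heapq
from collections import Counter
from operator import itemgetter

c = Counter('abcdeabcdabcaba')
print(c.most_common(2))                                 # [('a', 5), ('b', 4)]
print(heapq.nlargest(2, c.items(), key=itemgetter(1)))  # [('a', 5), ('b', 4)]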

Python 3: Invalid Syntax Error when using * operator in __init__ between pre-defined values defined in-scope

I am getting a syntax error when trying to do the following MCVE in Python 3.
HEIGHT = 26
WIDTH = 26
OTHERVAR = 5

class Foo():
    def __init__(self, OTHERVAR, HEIGHT*WIDTH):
        print(str(OTHERVAR + HEIGHT*WIDTH))

foo_inst = Foo()
Below is the error
File "a.py", line 6
def __init__(self, OTHERVAR, HEIGHT*WIDTH):
^
SyntaxError: invalid syntax
I'm wondering why the multiplication * operator is invalid syntax in this scenario.
If someone could explain why this is bad syntax and offer a potential workaround, that would be great. Thank you.
A function parameter is supposed to be a name (a variable); your HEIGHT*WIDTH is an expression that produces a value, not a parameter name.
Are you perhaps looking for this (a default value)?
>>> a = 1
>>> b = 2
>>> def test(c=a*b):
... print(c)
...
>>> test()
2
>>> def test(c=a*b, d):
... print(c, d)
...
File "<stdin>", line 1
SyntaxError: non-default argument follows default argument
>>> def test(d, c=a*b):
... print(d, c)
...
>>> test(10)
10 2
And called with named (keyword) parameters:
>>> def test(d, c=a*b, e=20):
... print(d, c, e)
...
>>> test(10, e=30)
10 2 30
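Applied to the original MCVE, a workaround along those lines might look like this (a guess at the intent, treating the product as a default parameter value computed at class-definition time):

HEIGHT = 26
WIDTH = 26
OTHERVAR = 5

class Foo():
    def __init__(self, othervar=OTHERVAR, area=HEIGHT * WIDTH):
        print(str(othervar + area))

foo_inst = Foo()  # prints 681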

Why does type(mock.MagicMock()) == mock.MagicMock returns False?

In Python 3.4:
>>> from unittest import mock
>>> type(mock.MagicMock()) == mock.MagicMock
False # Huh, why is that?
>>> isinstance(mock.MagicMock(), mock.MagicMock)
True
When I simplify this to classes A and B, type(B()) == B returns True:
>>> class A: pass
>>> class B: pass
>>> class C(A, B): pass
>>> type(B()) == B
True # Of course I would say.
Why does type(mock.MagicMock()) == mock.MagicMock return False? I know about the difference between isinstance() and type() in Python: type() doesn't 'understand' subclassing, whereas isinstance() does. But I don't see how that difference is involved here.
source of mock.MagicMock.
More experiments suggest the answer.
>>> from unittest.mock import MagicMock as mm
>>> mm1 = mm()
>>> mm2 = mm()
>>> type(mm1)
<class 'unittest.mock.MagicMock'>
>>> type(mm2)
<class 'unittest.mock.MagicMock'>
>>> type(mm1) == type(mm2)
False
>>> id(type(mm1))
53511896
>>> id(type(mm2))
53510984
>>> type(mm1) is mm1.__class__
True
>>> mm
<class 'unittest.mock.MagicMock'>
>>> id(mm)
53502776
Conclusion: each instance of MagicMock has a 'class' that looks like MagicMock, but is not. What is the __new__ that creates such instances? MagicMock subclasses Mock, which subclasses NonCallableMock, which has this __new__ method:
def __new__(cls, *args, **kw):
    # every instance has its own class
    # so we can create magic methods on the
    # class without stomping on other mocks
    new = type(cls.__name__, (cls,), {'__doc__': cls.__doc__})
    instance = object.__new__(new)
    return instance
The new = ... statement creates a subclass of the cls argument with the same name and docstring. The next line creates a single instance of this subclass. So each mock instance gets its own throwaway class, which breaks the usual expectation that type(mm()) is mm.
>>> mm.__bases__
(<class 'unittest.mock.MagicMixin'>, <class 'unittest.mock.Mock'>)
>>> type(mm1).__bases__
(<class 'unittest.mock.MagicMock'>,)
>>> type(mm1).__bases__[0] is mm
True
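The practical consequence (as the question already hints): use isinstance() rather than exact type comparison when you need to test for mocks. A quick sketch:

from unittest import mock

m = mock.MagicMock()

# Each instance gets its own one-off subclass, so exact type checks fail...
assert type(m) is not mock.MagicMock
# ...but the subclass relationship still holds:
assert isinstance(m, mock.MagicMock)
assert mock.MagicMock in type(m).__mro__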
