Unable to compare types of identical but redeclared namedtuples in Python - python-3.x

While working on a difference engine to identify differences in very large data structures, I noticed that a type comparison between identical-but-redeclared namedtuples misbehaves. Redeclaring the namedtuples is unavoidable*. Here is a minimal example:
def test_named_tuples_same_type():
    from collections import namedtuple
    X = namedtuple("X", "x")
    a = X(1)
    # We are unable to avoid redeclaring X
    X = namedtuple("X", "x")
    b = X(1)
    print(repr(a))
    print(repr(b))
    # X(x=1)
    # X(x=1)
    assert isinstance(type(a), type(b))  # fail
    assert type(a) == type(b)  # fail
The asserts fail with:
> assert isinstance(type(a), type(b)) # fail
E AssertionError: assert False
E + where False = isinstance(<class 'tests.test_deep_diff.X'>, <class 'tests.test_deep_diff.X'>)
E + where <class 'tests.test_deep_diff.X'> = type(X(x=1))
E + and <class 'tests.test_deep_diff.X'> = type(X(x=1))
and
> assert type(a) == type(b) # fail
E AssertionError: assert <class 'tests.test_deep_diff.X'> == <class 'tests.test_deep_diff.X'>
How can I assert that the types of both are equal, or at least semantically equal (without resorting to str(type()))?
*Redeclaring the namedtuple is unavoidable because it takes place in unmodifiable exec'd code to generate the data structures being diffed.

It isn't entirely clear what precisely you mean by semantically equal. But consider:
>>> from collections import namedtuple
>>> X1 = namedtuple("X", "x")
>>> X2 = namedtuple("X", "x")
Then you can use something like:
>>> def equivalent_namedtuple_types(t1, t2):
...     return (t1.__name__, t1._fields) == (t2.__name__, t2._fields)
...
>>> equivalent_namedtuple_types(X1, X2)
True
>>>
From your comments, it seems like you may care about the .__module__ attribute as well.
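If so, a sketch of a variant that also compares __module__ (the extended check is my own addition, not something your test case requires):
>>> def equivalent_namedtuple_types(t1, t2):
...     # Also require the types to be defined in the same module.
...     return ((t1.__name__, t1._fields, t1.__module__) ==
...             (t2.__name__, t2._fields, t2.__module__))
...
>>> equivalent_namedtuple_types(X1, X2)
True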

Related

Python regular expressions: Better way to handle non-matches?

When I deal with regular expressions, my code is littered with conditionals so as to not create exceptions when a pattern is not found:
m = some_compiled_pattern.match(s)
if m:
    x = m.groups()
    do_something_with(x)

m = some_other_compiled_pattern.search(s)
if m:
    y = m.groupdict()
else:
    y = {}
do_something_else_with(y)
Isn't there a better (less verbose) way to handle such exceptions?
You might find this class useful for reducing most of that if-no-match handling to a single line.
from types import MethodType

class Returns:
    """
    Makes an object that pretends to have all possible methods,
    but returns the same value (default None) no matter what this method,
    or its arguments, is.
    """
    def __init__(self, return_val=None):
        self.return_val = return_val

        def the_only_method_there_is(*args, **kwargs):
            return return_val

        self.the_only_method_there_is = MethodType(the_only_method_there_is, self)

    def __getattr__(self, item):
        # Any "normal" attribute lookup falls back to the catch-all method;
        # anything else raises AttributeError as usual.
        if not item.startswith('_') and item not in {'return_val', 'the_only_method_there_is'}:
            return self.the_only_method_there_is
        else:
            raise AttributeError(item)
Example use:
>>> import re
>>> p = re.compile(r'(\d+)\W+(\w+)')
>>>
>>> # when all goes well...
>>> m = p.search('The number 42 is mentioned often')
>>> num, next_word = m.groups()
>>> num, next_word
('42', 'is')
>>>
>>> # when the pattern is not found...
>>> m = p.search('No number here')
>>> assert m is None # m is None so...
>>> num, next_word = m.groups() # ... this is going to choke
Traceback (most recent call last):
...
AttributeError: 'NoneType' object has no attribute 'groups'
>>>
>>> # Returns to the rescue
>>> num, next_word = (p.search('No number here') or Returns((None, 'default_word'))).groups()
>>> assert num is None
>>> next_word
'default_word'
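Applied to the groupdict example from your question, the handling collapses to one line (a sketch; some_other_compiled_pattern and do_something_else_with are the names from your snippet):
# One line instead of the if/else: fall back to an empty dict when there is no match.
y = (some_other_compiled_pattern.search(s) or Returns({})).groupdict()
do_something_else_with(y)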
EDIT: See this gist for a longer discussion (and alternate but similar solution) of this problem.

Performance Issue: Lookup sub-value of python dictionary if it is matching another value

I have a Python dictionary as follows. In the same way, the dictionary might have two comma-separated values for 'Var' (i.e. Dep1,Dep2) and then their respective SubVal values (ABC1||A1B1||B1C1, ABC2||A2B2||B2C2).
I'm trying to extract the value A1B1 (or A1B1 and B1C1 if there are two Var values) by matching MainValue 'ABC1' against the SubVal prefix 'ABC1'.
ld = {'id': 0,
      'Var': 'Dep1',
      'SubVal': 'ABC1||A1B1,ABC2||A2B2,ABC3||A3B3',
      'MainValue': 'ABC1'}
So far I have tried splitting SubVal into a list (splitting by comma), converting each pair (|| separated) into another dictionary, and then looking up the match.
Can anyone suggest a better approach in terms of performance to do this?
Let:
>>> ld = { 'id' : 0, 'Var': 'Dep1', 'SubVal': 'ABC1||A1B1,ABC2||A2B2,ABC3||A3B3', 'MainValue': 'ABC1'}
Your split + dict solution is roughly (note the maxsplit parameter to handle ABC1||A1B1||B1C1 cases):
>>> def parse(d):
...     sub_val = dict(t.split('||', maxsplit=1) for t in d['SubVal'].split(","))
...     return sub_val[d['MainValue']]
...
>>> parse(ld)
'A1B1'
A benchmark gives:
>>> import timeit
>>> timeit.timeit(lambda: parse(ld))
1.002971081999931
You build a dict for a one shot lookup: that's a bit overkill. You can perform a direct lookup for the MainValue:
>>> def parse_iter(d):
...     mv = d['MainValue']
...     g = (t.split('||', maxsplit=1) for t in d['SubVal'].split(","))
...     return next(v for k, v in g if k == mv)
...
>>> parse_iter(ld)
'A1B1'
It is a little faster:
>>> timeit.timeit(lambda: parse_iter(ld))
0.8656512869993094
A faster approach is to look for the MainValue in the ld['SubVal'] string and extract the right SubVal. (I assume the MainValue can't be a SubVal or a substring of a SubVal.)
With a regex:
>>> import re
>>> def parse_re(d):
...     pattern = d['MainValue'] + r"\|\|([^,]+)"
...     return re.search(pattern, d['SubVal']).group(1)
...
>>> parse_re(ld)
'A1B1'
This is around 25 % faster than the first version on the example:
>>> timeit.timeit(lambda: parse_re(ld))
0.7367669239997667
But why not perform the search manually?
>>> def parse_search(d):
...     s = d['SubVal']
...     mv = d['MainValue']
...     i = s.index(mv) + len(mv) + 2  # skip past the ||
...     j = s.index(",", i)
...     return s[i:j]
...
>>> parse_search(ld)
'A1B1'
This version is around 60% faster than the first version (on the given example):
>>> timeit.timeit(lambda: parse_search(ld))
0.3840863969999191
(If the MainValue can also appear as a SubVal, you can check that there is a comma just before the match, or that the SubVal starts with the MainValue.)
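A sketch of that extra check as a variant of parse_search (the name parse_search_checked is mine; it assumes the MainValue key either starts the string or directly follows a comma):
>>> def parse_search_checked(d):
...     s = d['SubVal']
...     mv = d['MainValue']
...     i = s.find(mv)
...     # Keep searching until the match starts the string or follows a comma,
...     # so a SubVal that merely contains MainValue is skipped.
...     while i > 0 and s[i - 1] != ",":
...         i = s.find(mv, i + 1)
...     i += len(mv) + 2  # skip past the ||
...     j = s.find(",", i)
...     return s[i:j] if j != -1 else s[i:]
...
>>> parse_search_checked(ld)
'A1B1'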

Python 3: Invalid Syntax Error when using * operator in __init__ between pre-defined values defined in-scope

I am getting a syntax error when trying to do the following MCVE in Python 3.
HEIGHT = 26
WIDTH = 26
OTHERVAR = 5
class Foo():
    def __init__(self, OTHERVAR, HEIGHT*WIDTH):
        print (str(OTHERVAR + HEIGHT*WIDTH))

foo_inst = Foo()
Below is the error
File "a.py", line 6
def __init__(self, OTHERVAR, HEIGHT*WIDTH):
^
SyntaxError: invalid syntax
I'm wondering why the multiplication * operator is invalid syntax in this scenario.
If someone could explain why this is bad syntax and offer a potential workaround, that would be great. Thank you.
A function parameter is supposed to be a name (a variable); your HEIGHT*WIDTH is an expression that produces a value, not a variable.
Are you perhaps looking for this (a default value)?
>>> a = 1
>>> b = 2
>>> def test(c=a*b):
...     print(c)
...
>>> test()
2
>>> def test(c=a*b, d):
...     print(c, d)
...
  File "<stdin>", line 1
SyntaxError: non-default argument follows default argument
>>> def test(d, c=a*b):
...     print(d, c)
...
>>> test(10)
10 2
And you can call it with named (keyword) parameters:
>>> def test(d, c=a*b, e=20):
...     print(d, c, e)
...
>>> test(10, e=30)
10 2 30
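Applied to the MCVE from the question, one possible workaround looks like this (a sketch; the parameter names othervar and area are my own choices):
HEIGHT = 26
WIDTH = 26
OTHERVAR = 5

class Foo():
    # A parameter name must be a plain identifier; the expression
    # HEIGHT * WIDTH goes in the default value instead.
    def __init__(self, othervar=OTHERVAR, area=HEIGHT * WIDTH):
        print(str(othervar + area))

foo_inst = Foo()  # prints 681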

python numba fingerprint error

I'm attempting to use numba to optimise some code. I've worked through the initial examples in section 1.3.1 of the 0.26.0 user guide (http://numba.pydata.org/numba-doc/0.26.0/user/jit.html) and get the expected results, so I don't think the problem is the installation.
Here's my code:
import numba
import numpy
import random
a = 8
b = 4
def my_function(a, b):
    all_values = numpy.fromiter(range(a), dtype=int)
    my_array = []
    for n in range(a):
        some_values = (all_values[all_values != n]).tolist()
        c = random.sample(some_values, b)
        my_array.append(sorted([n] + c))
    return my_array
print(my_function(a, b))
my_function_numba = numba.jit()(my_function)
print(my_function_numba(a, b))
After printing the expected results from the my_function call, this returns the following error message:
ValueError Traceback (most recent call last)
<ipython-input-8-b5d8983a58f6> in <module>()
19 my_function_numba = numba.jit()(my_function)
20
---> 21 print(my_function_numba(a, b))
ValueError: cannot compute fingerprint of empty list
Fingerprint of empty list?
I'm not sure about that error in particular, but in general, to be fast numba requires a particular subset of numpy/python (see here and here for more). So I might rewrite it like this.
import numpy as np
import numba

@numba.jit(nopython=True)
def fast_my_function(a, b):
    all_values = np.arange(a)
    my_array = np.empty((a, b + 1), dtype=np.int32)
    for n in range(a):
        some = all_values[all_values != n]
        c = np.empty(b + 1, dtype=np.int32)
        c[1:] = np.random.choice(some, b)
        c[0] = n
        c.sort()
        my_array[n, :] = c
    return my_array
Main things to note:
No lists: I'm pre-allocating everything.
No use of generators (in both Python 2 & 3, for n in range(a) will get converted to a fast native loop).
Adding nopython=True to the decorator makes numba complain if I use something that can't be efficiently JITed.
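A quick sanity check with the question's inputs (a sketch; the exact rows vary from run to run because the sampling is random):
a = 8
b = 4
result = fast_my_function(a, b)
print(result.shape)  # (8, 5): each row is n plus b other sampled values, sorted
print(result[0])     # e.g. [0 2 3 5 7] -- row 0 always contains 0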

Why does type(mock.MagicMock()) == mock.MagicMock return False?

In Python 3.4:
>>> from unittest import mock
>>> type(mock.MagicMock()) == mock.MagicMock
False # Huh, why is that?
>>> isinstance(mock.MagicMock(), mock.MagicMock)
True
When I simplify this with classes A and B, type(B()) == B returns True:
>>> class A: pass
>>> class B: pass
>>> class C(A, B): pass
>>> type(B()) == B
True # Of course I would say.
Why does type(mock.MagicMock()) == mock.MagicMock return False? I know about the difference between isinstance() and type() in Python: type() doesn't 'understand' subclassing where isinstance() does. But I don't see how that difference is involved here.
source of mock.MagicMock.
More experiments suggest the answer.
>>> from unittest.mock import MagicMock as mm
>>> mm1 = mm()
>>> mm2 = mm()
>>> type(mm1)
<class 'unittest.mock.MagicMock'>
>>> type(mm2)
<class 'unittest.mock.MagicMock'>
>>> type(mm1) == type(mm2)
False
>>> id(type(mm1))
53511896
>>> id(type(mm2))
53510984
>>> type(mm1) is mm1.__class__
True
>>> mm
<class 'unittest.mock.MagicMock'>
>>> id(mm)
53502776
Conclusion: each instance of MagicMock has a 'class' that looks like MagicMock, but is not. What is the __new__ that creates such instances? MagicMock subclasses Mock, which subclasses NonCallableMock, which has this __new__ method.
def __new__(cls, *args, **kw):
    # every instance has its own class
    # so we can create magic methods on the
    # class without stomping on other mocks
    new = type(cls.__name__, (cls,), {'__doc__': cls.__doc__})
    instance = object.__new__(new)
    return instance
The new = ... statement creates a subclass of the cls argument with the same name and docstring. The next line creates a single instance of this subclass. So each Mock instance is the only instance of its own freshly created subclass, which is why type(mm1) == type(mm2) is False and type(mm()) is mm does not hold.
>>> mm.__bases__
(<class 'unittest.mock.MagicMixin'>, <class 'unittest.mock.Mock'>)
>>> type(mm1).__bases__
(<class 'unittest.mock.MagicMock'>,)
>>> type(mm1).__bases__[0] is mm
True
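If you do need a comparison that treats two MagicMock instances as the same kind of thing, a sketch of checks that do hold, given the behaviour above (using the mm, mm1 and mm2 from earlier):
>>> isinstance(mm1, mm) and isinstance(mm2, mm)   # both are MagicMock instances
True
>>> issubclass(type(mm1), mm) and issubclass(type(mm2), mm)   # each per-instance class subclasses MagicMock
True
>>> type(mm1) == type(mm2)   # but the per-instance classes are never equal
False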
