I have some tuples of equal length, and some of them contain None or NaN elements.
I thought that those with NaN inside would never compare equal, but in fact some do.
I have read Value Comparisons from the Python Documentation, especially the part saying:
Sequences compare lexicographically using comparison of corresponding
elements, whereby reflexivity of the elements is enforced.
In enforcing reflexivity of elements, the comparison of collections
assumes that for a collection element x, x == x is always true. Based
on that assumption, element identity is compared first, and element
comparison is performed only for distinct elements. This approach
yields the same result as a strict element comparison would, if the
compared elements are reflexive. For non-reflexive elements, the
result is different than for strict element comparison, and may be
surprising: The non-reflexive not-a-number values for example result
in the following comparison behavior when used in a list:
>>> nan = float('NaN')
>>> nan is nan # True
>>> nan == nan # False <-- the defined non-reflexive behavior of NaN
>>> [nan] == [nan] # True <-- list enforces reflexivity and tests identity first
Here the MCVE which led me to ask the question:
import numpy as np
# [1]
x = (12, float('nan'), 'test')
x == (12, float('nan'), 'test') # False
# [2]
y = (12, np.nan, 'test')
y == (12, np.nan, 'test') # True
# [3]
u = float('nan')
q = (12, u, 'test')
q == (12, u, 'test') # True
# [4]
v = (12, None, 'test')
v == (12, None, 'test') # True
So, if I have understood properly (let's talk about containers of the same length), container comparison works as follows:
It compares element-wise;
First it checks whether x_i is y_i;
If not, it checks whether x_i == y_i;
If both identity and equality fail for at least one element, it returns False, otherwise True.
And this is somewhat surprising!
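The rule described in the steps above can be sketched in pure Python. This is only an illustration: identity_first_eq and strict_eq are hypothetical helper names, not part of the standard library. The second helper compares with == only, which is one possible way to get an "equality only" comparison.

```python
import operator


def identity_first_eq(xs, ys):
    """Mimic the documented rule: check identity first, then equality."""
    if len(xs) != len(ys):
        return False
    return all(x is y or x == y for x, y in zip(xs, ys))


def strict_eq(xs, ys):
    """Compare element values with == only, ignoring identity."""
    if len(xs) != len(ys):
        return False
    return all(map(operator.eq, xs, ys))


nan = float('nan')
q = (12, nan, 'test')

print(q == (12, nan, 'test'))                   # True
print(identity_first_eq(q, (12, nan, 'test')))  # True
print(strict_eq(q, (12, nan, 'test')))          # False
```

Note that strict_eq only looks one level deep; nested containers would still be compared with the built-in identity-first rule.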
I have also discovered that numpy.nan is a singleton:
np.nan is np.nan # True
And it is not something like:
class test:
    @property
    def nan(self):
        return float('nan')

inst = test()
inst.nan is inst.nan # False
My questions are:
Am I correct in my reasoning or have I missed something else?
Is there another way to compare in Python that will ensure equality only?
Related
Two variables in Python have the same id:
a = 10
b = 10
a is b
>>> True
If I take two lists:
a = [1, 2, 3]
b = [1, 2, 3]
a is b
>>> False
According to this link, Senderle answered that references to immutable objects have the same id, while mutable objects like lists have different ids.
So, according to his answer, tuples should have the same id, meaning:
a = (1, 2, 3)
b = (1, 2, 3)
a is b
>>> False
Ideally, as tuples are not mutable, it should return True, but it is returning False!
What is the explanation?
Immutable objects don't automatically share an id; in fact, that is not true for any objects that you define separately. Generally speaking, every time you create an object in Python, you get a new object with a new identity. However, mostly for the sake of optimization, there are some exceptions: small integers (between -5 and 256) and interned strings of limited length (usually less than 20 characters)* are cached, so equal values share the same id (there is actually one object with multiple references to it). You can check this as follows:
>>> 30 is (20 + 10)
True
>>> 300 is (200 + 100)
False
>>> 'aa' * 2 is 'a' * 4
True
>>> 'aa' * 20 is 'a' * 40
False
And for a custom object:
>>> class A:
...     pass
...
>>> A() is A() # Every time you create an instance you'll have a new instance with new identity
False
Also note that the is operator will check the object's identity, not the value. If you want to check the value you should use ==:
>>> 300 == 3*100
True
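As a small sketch of the difference between the two operators: the values below are built at runtime with int() so that the compiler cannot merge equal constants. Only the results that the language guarantees are annotated; the outcome of a is b depends on the implementation.

```python
# Build the values at runtime so equal literals are not merged at compile time.
a = int('300')
b = int('300')

print(a == b)  # True: the values are equal
print(a is b)  # implementation detail; do not rely on the result

c = a
print(c is a)  # True: c and a are two names for one object
```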
And since there is no such optimization or interning rule for tuples (or for any mutable type, for that matter), if you define two identical tuples of any size, they'll get their own identities and hence be different objects:
>>> a = (1,)
>>> b = (1,)
>>>
>>> a is b
False
It's also worth mentioning that the rules for "singleton integers" and "interned strings" hold even when the values are defined inside a container such as a tuple:
>>> a = (100, 700, 400)
>>>
>>> b = (100, 700, 400)
>>>
>>> a[0] is b[0]
True
>>> a[1] is b[1]
False
* A good and detailed article on this: http://guilload.com/python-string-interning/
Immutable != same object.*
An immutable object is simply an object whose state cannot be altered; and that is all. When a new object is created, a new address will be assigned to it. As such, checking if the addresses are equal with is will return False.
The fact that 1 is 1 or "a" is "a" returns True is due to integer caching and string interning performed by Python, so do not let it confuse you; it is unrelated to whether the objects in question are mutable or immutable.
*Empty immutable objects do refer to the same object, so is returns True for them; this is an implementation-specific special case, though.
Take a look at this code:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> c = a
>>> id(a)
178153080L
>>> id(b)
178098040L
>>> id(c)
178153080L
To figure out why a is c evaluates to True whereas a is b yields False, I strongly recommend that you run the snippet above step by step in the Online Python Tutor. The graphical representation of the objects in memory will give you deeper insight into this issue (I'm attaching a screenshot).
According to the documentation, immutables may have same id but it is not guaranteed that they do. Mutables always have different ids.
https://docs.python.org/3/reference/datamodel.html#objects-values-and-types
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.
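A minimal sketch of what that passage allows, assuming nothing beyond the built-in tuple() and list() constructors: copying a list must produce a new object, while "copying" a tuple is permitted (but not required) to hand back the original.

```python
original_tuple = (1, 2, 3)
original_list = [1, 2, 3]

tuple_copy = tuple(original_tuple)
list_copy = list(original_list)

print(tuple_copy == original_tuple)  # True
print(list_copy == original_list)    # True
print(list_copy is original_list)    # False: list() always builds a new list
print(tuple_copy is original_tuple)  # may be True; implementation detail
```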
In previous versions of Python (pre-3.7), tuples were assigned different ids.
As of Python 3.7+, two variables assigned the same tuple may have the same id:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> a is b
True
Integers above 256 also have different ids:
>>> a = 123
>>> b = 123
>>> a is b
True
>>>
>>> a = 257
>>> b = 257
>>> a is b
False
Check the code below.
Tuples a and b get their earlier ids back when we assign their earlier values to them again. (But this will not be the case with lists, as they are mutable.)
Initially a and b have the same value, (1,2), but different ids. After their values are changed, when we reassign the value (1,2) to a and b, each refers to its own earlier id again (88264264 and 88283400 respectively).
>>> a = (1,2)
>>> b = (1,2)
>>> a , b
((1, 2), (1, 2))
>>> id(a)
88264264
>>> id(b)
88283400
>>> a = (3,4)
>>> b = (3,4)
>>> id(a)
88280008
>>> id(b)
88264328
>>> a = (1,2)
>>> b = (1,2)
>>> id(a)
88264264
>>> id(b)
88283400
>>> a , b
((1, 2), (1, 2))
>>> id(a) , id(b)
(88264264, 88283400)
>>>
Also check the link "Why don't tuples get the same ID when assigned the same values?" after reading this. Another case has also been discussed here.
I'm trying to remove duplicate words from a string with the code below:
from functools import reduce
from collections import Counter
import re

if __name__ == '__main__':
    sentence = 'User Key Account Department Account Start Date'
    result = reduce(
        lambda sentence, word: re.sub(rf'{word}\s*', '', sentence, count=1),
        filter(lambda x: x[0] if x[1] > 1 else '',
               Counter(sentence.split()).items()),
        sentence
    )
    import pdb
    pdb.set_trace()
    print(result)
    # User Key Department Account Start Date
But it does not print the expected result. The strange part is in filter. If I list only the filtered results:
[el for el in filter(lambda x: x[0] if x[1] > 1 else '', Counter(sentence.split()).items())]
# [('Account', 2)]
This is despite x[0] being what is specified in the lambda.
If I pass a truthy value in the else clause:
[el for el in filter(lambda x: x[0] if x[1] > 1 else ['foo'], Counter(sentence.split()).items())]
# [('User', 1), ('Key', 1), ('Account', 2), ('Department', 1), ('Start', 1), ('Date', 1)]
What am I missing here?
I'd like to do the following:
[el for el in filter(lambda key,value: key if value > 1 else '', Counter(sentence.split()).items())]
And get Account. But it raises *** TypeError: <lambda>() missing 1 required positional argument: 'value'
It works fine using a list comprehension:
[key for key, value in Counter(sentence.split()).items() if value > 1]
# ['Account']
I'm not sure what you're trying to do here, but I will explain what's actually happening.
Consider the expression filter(lambda x: x[0] if x[1] > 1 else '', Counter(sentence.split()).items()).
The first argument to filter is a predicate. This is a function which takes one input (x) and returns a value which is interpreted as a Boolean.
In this case, let's consider the predicate lambda x : x[0] if x[1] > 1 else '' - we will write this as P for shorthand. We will assume we call this function on an ordered pair (a, b) such that a is a string and b is a number.
Then we see that P((a, b)) = a if b > 1 else ''.
So if b > 1, then P((a, b)) evaluates to a. This value is then interpreted as a Boolean (even though it's a string) because P serves as a predicate.
When we interpret some "container" data-type like a String as a Boolean, we interpret the container to be "true-like" if it is non-empty and "false-like" if it is empty. So in this case, a will be interpreted as True when a is non-empty and False when a is empty.
On the other hand, when b <= 1, P((a, b)) will evaluate to '' which is then interpreted as False (because it's the empty string).
So P((a, b)) is a string which, when interpreted as a Boolean, is equal to b > 1 and (a is non-empty).
So when we call filter(P, seq), where seq is a sequence of pairs (a, b), a a string and b a number, we see that we will keep exactly those pairs (a, b) where b > 1 and a is non-empty.
This is indeed what happens.
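A small sketch of that behaviour, using a made-up list of pairs rather than the Counter from the question: filter only uses the truthiness of what the predicate returns, and it always yields the original items, never the predicate's return value.

```python
# Illustrative data; the third pair has an empty string as its word.
pairs = [('User', 1), ('Account', 2), ('', 5)]


def predicate(pair):
    word, count = pair
    return word if count > 1 else ''


kept = list(filter(predicate, pairs))
print(kept)  # [('Account', 2)]

# What filter actually looks at: the Boolean interpretation of each result.
print([bool(predicate(p)) for p in pairs])  # [False, True, False]
```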
However, it seems that what you want to happen is to only keep the items which occur more than once while ignoring their count. To do this, you need a combination of map and filter. You would want
map(lambda x: x[0], filter(lambda x: x[1] > 1, Counter(sentence.split()).items()))
This first keeps only the pairs (a, b) where b > 1. It then takes each remaining pair (a, b) and keeps only the a.
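Put together with the reduce call from the question, this might look as follows. It is a sketch rather than a definitive fix; the use of re.escape is an addition not present in the original code, included in case a word contains regex metacharacters.

```python
from collections import Counter
from functools import reduce
import re

sentence = 'User Key Account Department Account Start Date'
counts = Counter(sentence.split())

# Keep the pairs with a count above 1, then keep only the word of each pair.
repeated = list(map(lambda x: x[0], filter(lambda x: x[1] > 1, counts.items())))
print(repeated)  # ['Account']

# Remove the first occurrence of each repeated word.
result = reduce(
    lambda text, word: re.sub(rf'{re.escape(word)}\s*', '', text, count=1),
    repeated,
    sentence,
)
print(result)  # User Key Department Account Start Date
```

Because count=1 removes a single occurrence per word, a word that appears three or more times would still be left with duplicates.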
Say I had two dictionaries:
d1 = {'a':1, 'b':2}
d2 = {'a':'b', 'b':'b', 'a':'a'}
How can I use dictionary d1 as the rules to decode d2, such as:
def decode(dict_rules, dict_script):
    # do something
    return dict_result
decode(d1,d2)
>> {1:2, 2:2, 1:1}
Of course it can be written much shorter, but here is a version that shows the principle:
result_list = list()
result_dict = dict()
for d2_key in d2.keys():
    d2_key_decoded = d1[d2_key]
    d2_value = d2[d2_key]
    d2_value_decoded = d1[d2_value]
    result_dict[d2_key_decoded] = d2_value_decoded
    # add a tuple to the result list
    result_list.append((d2_key_decoded, d2_value_decoded))
The result might be unexpected, because the resulting dict would have entries with the same key, which is not possible, so the key 1 is overwritten:
>>> # equals to :
>>> result_dict[1] = 2
>>> result_dict[2] = 2
>>> result_dict[1] = 1
>>> # Result : {1:1, 2:2}
>>> # therefore I added a list of Tuples as result :
>>> # [(1, 2), (2, 2), (1, 1)]
But as @Patrik Artner pointed out, that is not possible, because the input dictionary already cannot have duplicate keys!
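To make the duplicate-key point concrete, here is a short sketch. It assumes the script is supplied as a list of pairs instead of a dict; decode_pairs and script are hypothetical names introduced only for this illustration.

```python
d1 = {'a': 1, 'b': 2}

# A dict literal with a repeated key keeps only the last value for that key.
d2 = {'a': 'b', 'b': 'b', 'a': 'a'}
print(d2)  # {'a': 'a', 'b': 'b'}


def decode_pairs(rules, pairs):
    """Decode both members of each pair using the rules dict."""
    return [(rules[key], rules[value]) for key, value in pairs]


# A list of pairs can hold the repeated 'a' entries that a dict cannot.
script = [('a', 'b'), ('b', 'b'), ('a', 'a')]
print(decode_pairs(d1, script))  # [(1, 2), (2, 2), (1, 1)]
```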
I have two 2D tensors of different lengths; both are different subsets of the same original 2D tensor, and I would like to find all the matching "rows",
e.g.
A = [[1,2,3],[4,5,6],[7,8,9],[3,3,3]]
B = [[1,2,3],[7,8,9],[4,4,4]]
torch.2dintersect(A,B) -> [0,2] (the indices of A that B also has)
I've only seen numpy solutions that use dtype as dicts, and they do not work for pytorch.
Here is how I do it in numpy
arr1 = edge_index_dense.numpy().view(np.int32)
arr2 = edge_index2_dense.numpy().view(np.int32)
arr1_view = arr1.view([('', arr1.dtype)] * arr1.shape[1])
arr2_view = arr2.view([('', arr2.dtype)] * arr2.shape[1])
intersected = np.intersect1d(arr1_view, arr2_view, return_indices=True)
This answer was posted before the OP updated the question with other restrictions that changed the problem quite a bit.
TL;DR You can do something like this:
torch.where((A == B).all(dim=1))[0]
First, assuming you have:
import torch
A = torch.Tensor([[1,2,3],[4,5,6],[7,8,9]])
B = torch.Tensor([[1,2,3],[4,4,4],[7,8,9]])
We can check that A == B returns:
>>> A == B
tensor([[ True, True, True],
[ True, False, False],
[ True, True, True]])
So, what we want is: the rows in which they are all True. For that, we can use the .all() operation and specify the dimension of interest, in our case 1:
>>> (A == B).all(dim=1)
tensor([ True, False, True])
What you actually want to know is where the Trues are. For that, we can get the first output of the torch.where() function:
>>> torch.where((A == B).all(dim=1))[0]
tensor([0, 2])
If A and B are 2D tensors, the following code finds the indices such that A[indices] == B. If multiple indices satisfy this condition for a row of B, the first index found is returned. If a row of B is not present in A, the corresponding index is omitted.
values, indices = torch.topk(((A.t() == B.unsqueeze(-1)).all(dim=1)).int(), 1, 1)
indices = indices[values!=0]
# indices = tensor([0, 2])
I'm trying to figure out the best way to take the union of two dictionaries. Here is the code that I have. Counter is one of the options that I found.
def __add__(self, right):
    mergedbag = Bag()
    mergedbag.bag_value = copy.copy(self.bag_value)
    for item in right.bag_value.keys():
        mergedbag.bag_value[item] += right.bag_value[item]
    return mergedbag
To test if two dictionaries have the same contents, simply use an equality test:
self.bag_items == bag_equal.bag_items
Python does this comparison test efficiently; keys and values have to match exactly and difference in length means the dictionaries are not equal:
>>> a = {'a': 'b'}
>>> b = {'a': 'b'}
>>> a == b
True
>>> b['b'] = 'c'
>>> a == b
False
>>> del b['b']
>>> b['a'] = 'c'
>>> a == b
False
>>> b['a'] = 'b'
>>> a == b
True
Note that rather than raise a TypeError, __eq__ should return the NotImplemented sentinel object to signal that equality testing is not supported:
def __eq__(self, other):
    if not isinstance(other, Bag):
        return NotImplemented
    return self.bag_items == other.bag_items
As a side note, the in membership operator already returns either True or False, so there is no need to use a conditional expression in __contains__; the following is enough:
def __contains__(self, include):
    return include in self.bag_items
Your code never actually does anything with the items passed in however, nor are you ever counting the items. Your count() method should just look up the element in self.bag_items and return the count once you properly track counts.
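Pulling these points together, one possible shape for such a class is sketched below. It assumes the counts are stored in a collections.Counter under the bag_items attribute used above; the constructor signature and the __add__ implementation are assumptions made for illustration, not your original code.

```python
from collections import Counter


class Bag:
    def __init__(self, items=()):
        # Counter tracks how many times each item was passed in.
        self.bag_items = Counter(items)

    def __add__(self, right):
        if not isinstance(right, Bag):
            return NotImplemented
        merged = Bag()
        merged.bag_items = self.bag_items + right.bag_items
        return merged

    def __eq__(self, other):
        if not isinstance(other, Bag):
            return NotImplemented
        return self.bag_items == other.bag_items

    def __contains__(self, include):
        return include in self.bag_items

    def count(self, item):
        # A Counter returns 0 for items it has never seen.
        return self.bag_items[item]


merged = Bag('aab') + Bag('bc')
print(merged.count('a'), merged.count('b'), merged.count('c'))  # 2 2 1
print(merged == Bag('aabbc'))  # True
print(merged == 5)             # False: both sides return NotImplemented
```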