torch find indices of matching rows in 2 2D tensors - pytorch

I have two 2D tensors of different lengths. Both are different subsets of the same original 2D tensor, and I would like to find all the matching "rows", e.g.
A = [[1,2,3],[4,5,6],[7,8,9],[3,3,3]]
B = [[1,2,3],[7,8,9],[4,4,4]]
torch.2dintersect(A,B) -> [0,2] (the indices of the rows of A that B also has)
I've only seen numpy solutions, which use dtypes as dicts and do not work for pytorch.
Here is how I do it in numpy:
arr1 = edge_index_dense.numpy().view(np.int32)
arr2 = edge_index2_dense.numpy().view(np.int32)
arr1_view = arr1.view([('', arr1.dtype)] * arr1.shape[1])
arr2_view = arr2.view([('', arr2.dtype)] * arr2.shape[1])
intersected = np.intersect1d(arr1_view, arr2_view, return_indices=True)

This answer was posted before the OP updated the question with other restrictions that changed the problem quite a bit.
TL;DR You can do something like this:
torch.where((A == B).all(dim=1))[0]
First, assuming you have:
import torch
A = torch.Tensor([[1,2,3],[4,5,6],[7,8,9]])
B = torch.Tensor([[1,2,3],[4,4,4],[7,8,9]])
We can check that A == B returns:
>>> A == B
tensor([[ True,  True,  True],
        [ True, False, False],
        [ True,  True,  True]])
So, what we want is: the rows in which they are all True. For that, we can use the .all() operation and specify the dimension of interest, in our case 1:
>>> (A == B).all(dim=1)
tensor([ True, False, True])
What you actually want to know is where the Trues are. For that, we can get the first output of the torch.where() function:
>>> torch.where((A == B).all(dim=1))[0]
tensor([0, 2])

If A and B are 2D tensors, the following code finds the indices such that A[indices] == B. If multiple rows of A match the same row of B, only one index is returned for it. If a row of B is not present in A, that row is skipped.
values, indices = torch.topk(((A.t() == B.unsqueeze(-1)).all(dim=1)).int(), 1, 1)
indices = indices[values!=0]
# indices = tensor([0, 2])


I am learning python and I appreciate if someone explains why I got the following output regarding tuple. They seem kind of contradictory to me [duplicate]

Two variables in Python have the same id:
a = 10
b = 10
a is b
>>> True
If I take two lists:
a = [1, 2, 3]
b = [1, 2, 3]
a is b
>>> False
According to an answer by Senderle (this link), immutable object references have the same id, while mutable objects like lists have different ids.
So according to that answer, tuples should have the same id, meaning:
a = (1, 2, 3)
b = (1, 2, 3)
a is b
>>> False
Ideally, as tuples are not mutable, it should return True, but it is returning False!
What is the explanation?
Immutable objects don't automatically share an id; in fact, that is not true for any type of object that you define separately. Generally speaking, every time you define an object in Python, you create a new object with a new identity. However, mostly for the sake of optimization, there are some exceptions: small integers (between -5 and 256) and interned strings of a limited length (usually less than 20 characters)* are singletons and share the same id (there is actually one object with multiple references to it). You can check this as follows:
>>> 30 is (20 + 10)
True
>>> 300 is (200 + 100)
False
>>> 'aa' * 2 is 'a' * 4
True
>>> 'aa' * 20 is 'a' * 40
False
And for a custom object:
>>> class A:
...     pass
...
>>> A() is A() # Every time you create an instance you'll have a new instance with new identity
False
Also note that the is operator will check the object's identity, not the value. If you want to check the value you should use ==:
>>> 300 == 3*100
True
Since there is no such optimization or interning rule for tuples (or for any mutable type, for that matter), two equal tuples of any size that you define separately get their own identities and are therefore different objects:
>>> a = (1,)
>>> b = (1,)
>>>
>>> a is b
False
It's also worth mentioning that the "singleton integers" and "interned strings" rules hold even when the values are defined inside a container such as a tuple:
>>> a = (100, 700, 400)
>>>
>>> b = (100, 700, 400)
>>>
>>> a[0] is b[0]
True
>>> a[1] is b[1]
False
* A good and detailed article on this: http://guilload.com/python-string-interning/
Immutable != same object.*
An immutable object is simply an object whose state cannot be altered; and that is all. When a new object is created, a new address will be assigned to it. As such, checking if the addresses are equal with is will return False.
The fact that 1 is 1 or "a" is "a" returns True is due to integer caching and string interning performed by Python, so do not let it confuse you; it is not related to whether the objects in question are mutable or immutable.
*Empty immutable objects do refer to the same object, and is does return True for them; this is a special, implementation-specific case, though.
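The difference between value equality and identity can be sketched as follows. This is only an illustration: the tuples are built at runtime with tuple([...]) (a choice made here, not taken from the answers) so that compile-time constant sharing does not influence the result, and whether two equal immutable objects share an identity remains implementation-dependent.

```python
# Build the tuples at runtime so compile-time constant sharing does not apply.
a = tuple([1, 2, 3])
b = tuple([1, 2, 3])
c = a  # a second name bound to the same object as a

print(a == b)  # value comparison: True, the contents are equal
print(a is c)  # identity comparison: True, both names refer to one object
print(a is b)  # identity of separately built tuples: implementation-dependent
```

Use == whenever the question is "do these hold the same values?" and reserve is for checks such as x is None.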
Take a look at this code:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> c = a
>>> id(a)
178153080L
>>> id(b)
178098040L
>>> id(c)
178153080L
To figure out why a is c evaluates to True whereas a is b yields False, I strongly recommend running the snippet above step by step in the Online Python Tutor. The graphical representation of the objects in memory will give you deeper insight into this issue (I'm attaching a screenshot).
According to the documentation, immutables may have the same id, but it is not guaranteed that they do. Mutables always have different ids.
https://docs.python.org/3/reference/datamodel.html#objects-values-and-types
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.
In versions of Python before 3.7, tuples were assigned different ids.
As of Python 3.7, two variables assigned the same tuple may have the same id:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> a is b
True
Integers above 256 also have different ids:
>>> a = 123
>>> b = 123
>>> a is b
True
>>>
>>> a = 257
>>> b = 257
>>> a is b
False
Check the code below.
Tuples a and b get their earlier references (ids) back when their earlier values are assigned to them again. (This will not be the case with lists, as they are mutable.)
Initially a and b have the same value, (1,2), but different ids. After their values have been changed, reassigning (1,2) to a and b makes each of them refer to its own earlier id again (88264264 and 88283400 respectively).
>>> a = (1,2)
>>> b = (1,2)
>>> a , b
((1, 2), (1, 2))
>>> id(a)
88264264
>>> id(b)
88283400
>>> a = (3,4)
>>> b = (3,4)
>>> id(a)
88280008
>>> id(b)
88264328
>>> a = (1,2)
>>> b = (1,2)
>>> id(a)
88264264
>>> id(b)
88283400
>>> a , b
((1, 2), (1, 2))
>>> id(a) , id(b)
(88264264, 88283400)
>>>
See also the question "Why don't tuples get the same ID when assigned the same values?" after reading this; another case is discussed there as well.

Keep duplicate items in list of tuples if only the first index matches between the tuples

Input [(1,3), (3,1), (1,5), (2,3), (2,4), (44,33), (33,22), (44,22), (22,33)]
Expected Output [(1,3), (1,5), (2,3), (2,4), (44,33), (44,22)]
I am trying to figure out the above and have tried lots of stuff. So far my only success has been,
for x in range(len(list1)):
    if list1[0][0] == list1[x][0]:
        print(list1[x])
Output: (1, 3) \n (1, 5)
Any sort of advice or help would be appreciated.
Use a collections.defaultdict(list) keyed by the first value, and keep only the values that are ultimately duplicated:
from collections import defaultdict # At top of file, for collecting values by first element
from itertools import chain # At top of file, for flattening result
dct = defaultdict(list)
inp = [(1,3), (3,1), (1,5), (2,3), (2,4), (44,33), (33,22), (44,22), (22,33)]
# For each tuple
for tup in inp:
    first, _ = tup # Extract first element (and verify it's actually a pair)
    dct[first].append(tup) # Collect with other tuples sharing the same first element
# Extract all lists which have two or more elements (first element duplicated at least once)
# Would be list of lists, with each inner list sharing the same first element
onlydups = [lst for firstelem, lst in dct.items() if len(lst) > 1]
# Flattens to make single list of all results (if desired)
flattened_output = list(chain.from_iterable(onlydups))
Importantly, this doesn't require ordered input, and it scales well, doing O(n) work (scaling your solution naively would produce an O(n²) solution, considerably slower for larger inputs).
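For comparison, here is a minimal sketch of the same idea using collections.Counter instead of grouping into lists. It is an alternative, not the answer's own method: it counts how often each first element occurs and then filters the input in its original order. The names first_counts and result are illustrative.

```python
from collections import Counter

inp = [(1, 3), (3, 1), (1, 5), (2, 3), (2, 4), (44, 33), (33, 22), (44, 22), (22, 33)]

# Count how many tuples start with each first element (one O(n) pass).
first_counts = Counter(first for first, _ in inp)

# Keep only tuples whose first element occurs more than once, in input order.
result = [tup for tup in inp if first_counts[tup[0]] > 1]

print(result)
# [(1, 3), (1, 5), (2, 3), (2, 4), (44, 33), (44, 22)]
```

Unlike the grouped version, this keeps tuples in the order they appear in the input rather than grouped by first element; for the sample input both orders coincide.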
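Another approach is the following. Note that it drops tuples that contain the same pair of values in a different order, so its result differs from the expected output above:
def sort(L: list):
    K = []
    for i in L:
        if set(i) not in K:
            K.append(set(i))
    output = [tuple(m) for m in K]
    return output
Output:
[(1, 3), (1, 5), (2, 3), (2, 4), (33, 44), (33, 22), (44, 22)]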

Return a matrix by applying a boolean mask (a boolean matrix of same size) in python

I have generated a square matrix of size 4 and a boolean matrix of same size by:
import numpy as np
A = np.random.randn(4,4)
B = np.full((4,4), True, dtype = bool)
B[[0],:] = False
B[:,[0]] = False
The code above produces two 4x4 matrices: A holds the random numbers, and B holds boolean values in which the entire first row and first column are False:
B = [[False, False, False, False],
     [False,  True,  True,  True],
     [False,  True,  True,  True],
     [False,  True,  True,  True]]
What I want is to apply the boolean matrix B to A so that I get the 3 by 3 matrix of the elements of A where B is True.
Is there any logical operator in numpy that performs this operation, or do I have to go through each element of A and B, compare them, and assign the result to a new matrix?
In [214]: A = np.random.randn(4,4)
...: B = np.full((4,4), True, dtype = bool)
...: B[[0],:] = False
...: B[:,[0]] = False
In [215]: A
Out[215]:
array([[-0.80676817, -0.20810386,  1.28448594, -0.52667651],
       [ 0.6292733 , -0.05575997,  0.32466482, -0.23495175],
       [-0.70896794, -1.60571282, -1.43718839, -0.42032337],
       [ 0.01541418, -2.00072652, -1.54197002,  1.2626283 ]])
In [216]: B
Out[216]:
array([[False, False, False, False],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True]])
Boolean indexing (with matching size array) always produces a 1d array. In this case it did not select any values for A[0,:]:
In [217]: A[B]
Out[217]:
array([-0.05575997,  0.32466482, -0.23495175, -1.60571282, -1.43718839,
       -0.42032337, -2.00072652, -1.54197002,  1.2626283 ])
But because the other 3 rows all have 3 True, reshaping the result does produce a reasonable result:
In [218]: A[B].reshape(3,3)
Out[218]:
array([[-0.05575997,  0.32466482, -0.23495175],
       [-1.60571282, -1.43718839, -0.42032337],
       [-2.00072652, -1.54197002,  1.2626283 ]])
Whether the reshape makes sense depends on the total number of elements, and your own interpretation of the data.
If you are looking to remove any rows/cols that consist entirely of False elements, you can use np.any to find the rows and columns that contain at least one True and then use np.ix_ to build a 2D selection from the row/col indices (rows come from axis=1, columns from axis=0):
A = A[np.ix_(*np.where(np.any(B, axis=1)), *np.where(np.any(B, axis=0)))]
This works for any 2D numpy array and a boolean mask/condition of the same shape. It can be extended to higher-dimensional arrays by passing one index array per axis to np.ix_.
sample A:
[[-0.36027839 -1.54588632 0.1607951 1.68865218]
[ 0.20959185 0.13962857 1.97189081 -0.7686762 ]
[ 0.03868048 -0.36612182 0.77802273 0.23195807]
[-1.26148984 0.44672696 0.45970364 -1.58457129]]
Masked A with B:
[[ 0.13962857 1.97189081 -0.7686762 ]
[-0.36612182 0.77802273 0.23195807]
[ 0.44672696 0.45970364 -1.58457129]]
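The np.ix_ selection can be checked on a small deterministic example. The sketch below uses np.arange in place of the random matrix (an assumption made here so the result is reproducible) and the same mask as in the question; the names rows, cols and sub are illustrative.

```python
import numpy as np

# Deterministic stand-in for the random matrix in the question.
A = np.arange(16).reshape(4, 4)

# Same mask as in the question: first row and first column are False.
B = np.full((4, 4), True, dtype=bool)
B[0, :] = False
B[:, 0] = False

rows = np.where(B.any(axis=1))[0]  # indices of rows with at least one True
cols = np.where(B.any(axis=0))[0]  # indices of columns with at least one True

# np.ix_ builds an open mesh so that every (row, col) combination is selected.
sub = A[np.ix_(rows, cols)]
print(sub)
# [[ 5  6  7]
#  [ 9 10 11]
#  [13 14 15]]
```

For this mask the result equals A[B].reshape(3, 3), but the np.ix_ form does not require knowing the output shape in advance.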

Transform an integer into dummies vector in python

Hi there!
I'm working in Python with pandas' get_dummies function, and I am trying to transform an int into a vector, for example with a 5-category feature:
1 -> [1,0,0,0,0]
2 -> [0,1,0,0,0]
...
Does a function exist for that?
If not, I can build a function, but I wanted to ask before reinventing the wheel.
Thanks !
Just cast the relevant Series to a string and then use get_dummies as usual.
pd.get_dummies(df['col'].astype(str))
I think it's so easy you should just write a simple function to do that, instead of asking. Here is one of countless ways to do this.
import numpy as np
def get_dumm(lenn, num):
    arr = np.zeros(lenn, dtype='bool_') #replace type with 'int8' if needed
    arr[num - 1] = True #replace True with 1 if type of arr is 'int8'
    return arr
get_dumm(5,3)
Output:
array([False, False, True, False, False], dtype=bool)
Or if you use int8:
array([0, 0, 1, 0, 0], dtype=int8)
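If no library is needed at all, the same mapping can be sketched in plain Python. The helper name one_hot and its range check are assumptions added for illustration; categories are taken to be numbered from 1, as in the question.

```python
def one_hot(value, n_categories):
    """Return a list of n_categories zeros with a 1 at position value - 1."""
    if not 1 <= value <= n_categories:
        raise ValueError(f"value must be between 1 and {n_categories}")
    return [1 if i == value - 1 else 0 for i in range(n_categories)]

print(one_hot(1, 5))  # [1, 0, 0, 0, 0]
print(one_hot(2, 5))  # [0, 1, 0, 0, 0]
```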

Unexpected behaviour when comparing tuple with python

I have some tuples of equal length, and some of them contain None or NaN elements.
I thought that those with NaN inside would not compare equal, but in fact some do compare equal.
I have read Value Comparisons from the Python Documentation, especially the part saying:
Sequences compare lexicographically using comparison of corresponding
elements, whereby reflexivity of the elements is enforced.
In enforcing reflexivity of elements, the comparison of collections
assumes that for a collection element x, x == x is always true. Based
on that assumption, element identity is compared first, and element
comparison is performed only for distinct elements. This approach
yields the same result as a strict element comparison would, if the
compared elements are reflexive. For non-reflexive elements, the
result is different than for strict element comparison, and may be
surprising: The non-reflexive not-a-number values for example result
in the following comparison behavior when used in a list:
>>> nan = float('NaN')
>>> nan is nan # True
>>> nan == nan # False <-- the defined non-reflexive behavior of NaN
>>> [nan] == [nan] # True <-- list enforces reflexivity and tests identity first
Here is the MCVE which led me to ask the question:
import numpy as np
# [1]
x = (12, float('nan'), 'test')
x == (12, float('nan'), 'test') # False
# [2]
y = (12, np.nan, 'test')
y == (12, np.nan, 'test') # True
# [3]
u = float('nan')
q = (12, u, 'test')
q == (12, u, 'test') # True
# [4]
v = (12, None, 'test')
v == (12, None, 'test') # True
So, if I have understood properly (let's talk about containers of the same length), container comparison works as follows:
It compares element-wise;
First it checks if x_i is y_i;
If not, it checks if x_i == y_i;
If both identity and equality fail for at least one element, it returns False, otherwise True.
And this is somehow surprising!
I have also discovered that numpy.nan is a singleton:
np.nan is np.nan # True
And it is not something like:
class test:
    @property
    def nan(self):
        return float('nan')
inst = test()
inst.nan is inst.nan # False
My questions are:
Am I correct in my reasoning or have I missed something else?
Is there another way to compare in Python that will ensure equality only?
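One possible way to get an equality-only comparison is to compare the elements explicitly with == rather than comparing the containers. The sketch below is an assumption, not something built into Python; the helper name strict_equal is hypothetical, and it only handles flat sequences (nested containers would still apply the identity shortcut internally).

```python
def strict_equal(a, b):
    """Compare two flat sequences element by element with == only."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

u = float('nan')
q = (12, u, 'test')
v = (12, None, 'test')

print(q == (12, u, 'test'))               # True: identity of u is checked first
print(strict_equal(q, (12, u, 'test')))   # False: u == u is False for NaN
print(strict_equal(v, (12, None, 'test')))  # True: None == None
```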
