Getting indices using conditional list comprehension - python-3.x

I have the following np.array:
my_array=np.array([False, False, False, True, True, True, False, True, False, False, False, False, True])
How can I make a list using list comprehension of the indices corresponding to the True elements. In this case the output I'm looking for would be [3,4,5,7,12]
I've tried the following:
cols = [index if feature_condition==True for index, feature_condition in enumerate(my_array)]
But is not working

why specifically a list comprehension?
>>> np.where(my_array==True)
(array([ 3, 4, 5, 7, 12]),)
this does the job and is quicker. The list solution would be:
>>> [index for index, feature_condition in enumerate(my_array) if feature_condition == True]
[3, 4, 5, 7, 12]
Accepted answer of this explains the confusion of the ordering: if/else in a list comprehension
I was curious of the differences:
def np_time(array):
np.where(my_array==True)
def list_time(array):
[index for index, feature_condition in enumerate(my_array) if feature_condition == True]
timeit.timeit(lambda: list_time(my_array),number = 1000)
0.007574789000500459
timeit.timeit(lambda: np_time(my_array),number = 1000)
0.0010812399996211752

The order of if is not correct, should be in last -
$more numpy_1.py
import numpy as np
my_array=np.array([False, False, False, True, True, True, False, True, False, False, False, False, True])
print (my_array)
cols = [index for index, feature_condition in enumerate(my_array) if feature_condition]
print (cols)
$python numpy_1.py
[False False False True True True False True False False False False
True]
[3, 4, 5, 7, 12]

Related

How to locate 1-D array in a multi-dimensional array in all possible directions

I'm trying to solve an image search issue using NumPy and Pandas for weeks now. Would like to seek for some advice regarding same as I'm stuck and feeling back to square one with any attempt.
There're 2 image sets. The first set of images are full images and another set is smaller / cut version (chunks). The smaller chunks can be in any order (flipped, transposed, rotated, etc).
Converted both to corresponding NumPy matrices.
For simplicity, consider the below 2 matrices. I'm using a smaller size for illustration, but the actuals are 10000x12000 or more.
array([[ 2, 15, 9, 16, 4, 3, 12, 8],
[ 9, 9, 0, 16, 0, 1, 11, 12],
[ 9, 10, 6, 3, 2, 12, 19, 2],
[16, 2, 0, 6, 7, 5, 8, 8],
[18, 17, 3, 19, 5, 10, 1, 18],
[10, 7, 0, 0, 8, 17, 6, 4],
[ 2, 12, 8, 9, 6, 1, 11, 1],
[ 6, 7, 15, 15, 18, 15, 17, 15]])
and I'm trying to locate the following 1-D array in the earlier matrix.
array([6, 7, 5, 8])
It's in location (3,3) -> (3,4) -> (4,4) -> (5,4), which isn't in a straight line, rather in L-shape as in below:
array([[ False, False, False, False, False, False, False, False],
[ False, False, False, False, False, False, False, False],
[ False, False, False, False, False, False, False, False],
[ False, False, False, True, True, False, False, False],
[ False, False, False, False, True, False, False, False],
[ False, False, False, False, True, False, False, False],
[ False, False, False, False, False, False, False, False],
[ False, False, False, False, False, False, False, False]])
The elements from 1-D array aren't in a straight line always, rather can follow any order, like straight-line, L-shape, slanting line, etc. since the image chunks are transformed. This leads to a larger permutations & combinations, so need an efficient way.
So far, I tried to formalize few patterns and locating the position of the each element in the dataframe / matrix by indexing methods and checking for True per element:
np.any(a == 6, axis=0)
np.any(a == 7, axis=1)
It's taking forever to identify 1 pattern and forcing to consider some other solution, which I'm not aware of.
What would be the best way to locate the 1-D array in this multi-dimensional array in any order as mentioned earlier using NumPy and/or Pandas library? Any advice is appreciated. Thank you.
IIUC, one way is to convolve a 4 by 4 filter on the main 2-D array. The 4 by 4 squares which contains all of the 4 wanted values are the candidates for further inspection.
for i in range(len(main)-len(to_match)+1):
for j in range(len(main)-len(to_match)+1):
filter4 = main[i:i+4, j:j+4]
if np.sum(np.isin(to_match, filter4))==4:
print(np.isin(filter4, to_match))
print('------')

slice tensor of tensors using boolean tensor

Having two tensors :inputs_tokens is a batch of 20x300 of token ids
and seq_A is my model output with size of [20, 300, 512] (512 vector for each of the tokens in the batch)
seq_A.size()
Out[1]: torch.Size([20, 300, 512])
inputs_tokens.size()
torch.Size([20, 300])
I would like to get only the vectors of the token 101 (CLS) as follow:
cls_tokens = (inputs_tokens == 101)
cls_tokens
Out[4]:
tensor([[ True, False, False, ..., False, False, False],
[ True, False, False, ..., False, False, False],
[ True, False, False, ..., False, False, False], ...
How do I slice seq_A to get only the vectors which are true in cls_tokens for each batch?
when I do
seq_A[cls_tokens].size()
Out[7]: torch.Size([278, 512])
but I still need it to bee in the size of [20 x N x 512 ] (otherwise I don't know to which sample it belongs)
TLDR; You can't, all sequences must have the same size along a given axis.
Take this simplified example:
>>> inputs_tokens = torch.tensor([[ 1, 101, 18, 101, 9],
[ 1, 2, 101, 101, 101]])
>>> inputs_tokens.shape
torch.Size([2, 5])
>>> cls_tokens = inputs_tokens == 101
tensor([[False, True, False, True, False],
[False, False, True, True, True]])
Indexing inputs_tokens with the cls_tokens mask comes down to reducing inputs_tokens to cls_tokens's true values. In a general case where there is a different number of true values per batch, keeping the shape is impossible.
Following the above example, here is seq_A:
>>> seq_A = torch.rand(2, 5, 1)
tensor([[[0.4644],
[0.7656],
[0.3951],
[0.6384],
[0.1090]],
[[0.6754],
[0.0144],
[0.7154],
[0.5805],
[0.5274]]])
According to your example, you would expect to have an output shape of (2, N, 1). What would N be? 3? What about the first batch which only as 2 true values? The resulting tensor can't have different sizes (2 and 3 on axis=1). Hence: "all sequences on axis=1 must have the same size".
If however, you are expecting each batch to have the same number of tokens 101, then you could get away with a broadcast of your indexed tensor:
>>> inputs_tokens = torch.tensor([[ 1, 101, 101, 101, 9],
[ 1, 2, 101, 101, 101]])
>>> inputs_tokens.shape
>>> N = cls_tokens[0].sum()
3
Here remember, I'm assuming you have:
>>> assert all(cls_tokens.sum(axis=1) == N)
Therefore the desired output (with shape (2, 3, 1)) is:
>>> seq_A[cls_tokens].reshape(seq_A.size(0), N, -1)
tensor([[[0.7656],
[0.3951],
[0.6384]],
[[0.7154],
[0.5805],
[0.5274]]])
Edit - if you really want to do this though you would require the use of a list comprehension:
>>> [seq_A[i, cls_tokens[i]] for i in range(cls_tokens.size(0))]
[ tensor([[0.7656],
[0.6384]]),
tensor([[0.7154],
[0.5805],
[0.5274]]) ]

Python3 Boolean assignment to multidimension list is wrong?

Trying to do below
visited = [[False] * 3]* 3
print(visited)
visited[0][0] = True
print(visited)
why it prints like this:
[[False, False, False], [False, False, False], [False, False, False]]
[[True, False, False], [True, False, False], [True, False, False]]
shouldn't it be:
[[False, False, False], [False, False, False], [False, False, False]]
[[True, False, False], [False, False, False], [False, False, False]]
When you create a 2D array using
arr = [something * m]*n, all subarrays point to the same memory location. If you modify even one subarray, all other subarrays get modified.
The correct way to initialise the 2D matrix is
arr = [[something for i in range(m)] for j in range(n)]
to create a n x m matrix.
I am going to give an example how the array works. when you create a contiguous array(below is an example to check what is the size of the integer in python)
from sys import getsizeof
getsizeof(bool())
24
In my case boolean is 3 bytes (24 / 8 bits). when you define an array like above you are going to get the start location (which is 1000 as per below example) and when you access an array with index you will be given access to that particular location by the calculation start_location + size of boolean * index
1st Index of array will give 1000 + 3 * 1 -> 1003 for example
In your example visited = [[False] * 3]* 3 we are actually multiplying the reference point.[1000, 1000, 1000] in the example given above. so when you do visited[0][0] = True you are ideally changing the array with reference 1000 and modifying the 0th element to True. since the all array elements point to same array the value becomes like [[True, False, False], [True, False, False], [True, False, False]]
you should be using like below to initialize 2D matrix
array = [[False for i in range(3)] for j in range(3)]
array[0][0] = True
array
[[True, False, False], [False, False, False], [False, False, False]]
You would have noticed that by above example we are creating separate arrays on the go (with different references ofcourse).

Replace all Trues in a boolean dataframe with index valule

How can I replace all cells in a boolean dataframe (True/False) with the index name of that cell, when "True"? For example:
df = pd.DataFrame(
[
[False, True],
[True, False],
],
index=["abc", "def"],
columns=list("ab")
)
comes out as:
df = pd.DataFrame(
[
[False, abc],
[def, False],
],
index=["abc", "def"],
columns=list("ab")
)
Use df.mask:
Replace values where the condition is True.
df.mask(df,df.index)
a b
abc False abc
def def False

checking whether tuple in the list in python

Suppose I have a list of variables as follows:
v = [('d',0),('i',0),('g',0)]
What I want is to obtain a vector of values, that gives the truth value of the presence of the variable inside the list.
So, if have another list say
g = [('g',0)]
The output of that should be
op(v,g) = [False, False, True]
P.S.
I have tried using np.in1d but it gives the following:
array([False, True, False, True, True, True], dtype=bool)
In python you can use a list comprehension like following :
>>> v=[('d', 0), ('i', 0), ('g', 0)]
>>> g=[('t', 0), ('g', 0),('d',0)]
>>> [i in g for i in v]
[True, False, True]
You can convert those lists to numpy arrays and then use np.in1d like so -
import numpy as np
# Convert to numpy arrays
v_arr = np.array(v)
g_arr = np.array(g)
# Slice the first & second columns to get string & numeric parts.
# Use in1d to get matches between first columns of those two arrays;
# repeat for the second columns.
string_part = np.in1d(v_arr[:,0],g_arr[:,0])
numeric_part = np.in1d(v_arr[:,1],g_arr[:,1])
# Perform boolean AND to get the final boolean output
out = string_part & numeric_part
Sample run -
In [157]: v_arr
Out[157]:
array([['d', '0'],
['i', '0'],
['g', '0']],
dtype='<U1')
In [158]: g_arr
Out[158]:
array([['g', '1']],
dtype='<U1')
In [159]: string_part = np.in1d(v_arr[:,0],g_arr[:,0])
In [160]: string_part
Out[160]: array([False, False, True], dtype=bool)
In [161]: numeric_part = np.in1d(v_arr[:,1],g_arr[:,1])
In [162]: numeric_part
Out[162]: array([False, False, False], dtype=bool)
In [163]: string_part & numeric_part
Out[163]: array([False, False, False], dtype=bool)

Resources