numpy 1D array: identify pairs of elements that sum to 0

My code generates numpy 1D arrays of integers. Here's an example.
arr = np.array([-8, 7, -5, 2, -7, 8, -6, 3, 5])
There are two steps I need to take with this array, but I'm new enough at Python that I'm at a loss as to how to do this efficiently. The two steps are:
a) Identify the 1st element of pairs having sum == 0. For arr, we have (-8, 7, -5).
b) Now I need to find the difference in indices for each of the pairs identified in a).
The difference in indices for (-8,8) is 5, for (7,-7) is 3,
and for (-5,5) is 6.
Ideally, the output could be a 2D array, something like:
[[-8, 5],
[ 7, 3],
[-5, 6]]
Thank you for any assistance.

Here is my solution:
import numpy as np
arr = np.array([-8, 7, -5, 2, -7, 8, -6, 3, 5])
output = list()
for i in range(len(arr)):
    for j in range(len(arr)-i):
        if arr[i] + arr[i+j] == 0:
            output.append([arr[i],j])
print(output)
[[-8, 5], [7, 3], [-5, 6]]
I also have two comments:
1) You can convert the list to a numpy array with np.asarray(output).
2) Imagine you have the list [8, -8, -8]. If you want the distance for the first pair only, add a break right after the append call.
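For long arrays the double loop does O(n^2) work. As a rough alternative sketch (not part of the original answer), a single right-to-left pass with a dictionary finds the nearest later partner of each element. The names zero_sum_pairs and nearest are made up for illustration, and the input is assumed to be a plain list, e.g. arr.tolist():

```python
def zero_sum_pairs(values):
    """Return [value, index_gap] for each element whose negation occurs later."""
    nearest = {}  # value -> closest index seen so far, scanning from the right
    pairs = []
    for i in range(len(values) - 1, -1, -1):
        v = values[i]
        j = nearest.get(-v)  # nearest later position holding -v, if any
        if j is not None:
            pairs.append([v, j - i])
        nearest[v] = i  # recorded after the lookup, so an element never pairs with itself
    pairs.reverse()  # restore left-to-right order
    return pairs


result = zero_sum_pairs([-8, 7, -5, 2, -7, 8, -6, 3, 5])
print(result)  # [[-8, 5], [7, 3], [-5, 6]]
```

This reports only the nearest partner for each element, so for [8, -8, -8] it returns [[8, 1]], which corresponds to the loop with the break added. Unlike the loop, it never pairs a 0 with itself.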

Related

Is it possible to perform row-wise tensor operations if two PyTorch tensors do not have the same size without using list comprehensions?

This is a pretty specific usage case, but I'm hoping someone out there is more familiar with PyTorch tensors than I am and can help me speed this up.
I'm working on implementing a custom similarity metric for a neural network and have successfully gotten it to work, but it is incredibly slow to calculate. Each epoch takes about a minute to run, which simply isn't going to work with how I wanted to compare it with other metrics. So, I've been trying to utilize PyTorch tensors more effectively to speed things up, but haven't had much success.
Basically, I need to sum up the integers in the 'counts' tensor between the min and max indices specified in the 'min' and 'max' tensors for each sample and cluster combination.
As mentioned, my original implementation using loops took about a minute per epoch to run, but I did manage to reduce that to about 18-20 seconds using list comprehensions:
# counts has size (16, 100), max and min have size (2708, 7, 16)
data_mass = torch.sum(torch.tensor([[[torch.pow(torch.sum(counts[k][min[i][j][k]:max[i][j][k]+1]) / divisor, 2) for k in range(len(counts))] for j in range(len(min[i]))] for i in range(len(min))]), 2)
This feels super janky, and I've seen some clever things done with PyTorch functions, but I haven't been able to find anything yet that addresses quite what I want to do. Thanks in advance! I'm happy to clarify anything that may not be clear, I understand the use case is a bit convoluted.
EDIT: I'll try and break down the code snippet above and provide a minimal example. Examples of minimal inputs might look like the following:
'min' and 'max' are both 3-dimensional tensors of shape (num_samples, num_clusters, num_features), such as this one of size (2, 3, 4)
min = tensor([[[1, 2, 3, 1],
[2, 1, 1, 2],
[1, 2, 2, 1]],
[[2, 3, 2, 1],
[3, 3, 1, 2],
[1, 0, 2, 1]]])
max = tensor([[[3, 3, 4, 4],
[3, 2, 3, 4],
[2, 4, 3, 2]],
[[4, 4, 3, 3],
[4, 4, 2, 3],
[2, 1, 3, 2]]])
'counts' is a 2-dimensional tensor of size (num_features, num_bins),
so for this example we'll say size (4, 5)
counts = tensor([[1, 2, 3, 4, 5],
[2, 5, 3, 1, 1],
[1, 2, 3, 4, 5],
[2, 5, 3, 1, 1]])
The core part of the code snippet given above is the summation of the counts tensor between the values given by the min and max tensors for each pair of indices given at each index in max/min. For the first sample/cluster combo above:
mins = [1, 2, 3, 1]
maxes = [3, 3, 4, 4]
# Starting with feature #1 (leftmost element of min/max, top row of counts),
# we sum the values in counts between the indices specified by min and max:
min_value = mins[0] = 1
max_value = maxes[0] = 3
counts[0] = [1, 2, 3, 4, 5]
subset = counts[0][mins[0]:maxes[0]+1] = [2, 3, 4]
torch.sum(subset) = 9
#Second feature
min_value = mins[1] = 2
max_value = maxes[1] = 3
counts[1] = [2, 5, 3, 1, 1]
subset = counts[1][mins[1]:maxes[1]+1] = [3, 1]
torch.sum(subset) = 4
In my code snippet, I perform a few additional operations, but if we ignore those and just sum all the index pairs, the output will have the form
pre_sum_output = tensor([[[9, 4, 9, 10],
[7, 8, 9, 5],
[5, 5, 7, 8]],
[[12, 2, 7, 9],
[9, 2, 5, 4],
[5, 7, 7, 8]]])
Finally, I sum the output one final time along the third dimension:
data_mass = torch.sum(pre_sum_output, 2) = torch.tensor([[32, 29, 25],
[30, 20, 27]])
I then need to repeat this for every pair of mins and maxes in 'min' and 'max' (each [i][j][k]), hence the list comprehension above iterating through i and j to get each sample and cluster respectively.
By noticing that torch.sum(counts[0][mins[0]:maxes[0]+1]) is equal to cumsum[maxes[0]] - cumsum[mins[0]-1], where cumsum = torch.cumsum(counts[0], dim=0) (and the subtracted term is dropped when mins[0] is 0), you can get rid of the loops like so:
# Dim of sample, clusters, etc.
S, C, F, B = range(4)
# Copy min and max over bins
min = min.unsqueeze(B)
max = max.unsqueeze(B)
# Copy counts over samples and clusters
counts = counts.reshape(
    1, # S
    1, # C
    *counts.shape # F x B
)
# Number of samples, clusters, etc.
ns, nc, nf, nb = min.size(S), min.size(C), min.size(F), counts.size(B)
# Calculate cumulative sum and copy over samples and clusters
cum_counts = counts.cumsum(dim=B).expand(ns, nc, nf, nb)
# Prevent index error when min index is 0
is_zero = min == 0
lo = (min - 1).masked_fill(is_zero, 0)
# Compute the contiguous sum from min to max (inclusive)
lo_sum = cum_counts.gather(dim=B, index=lo)
hi_sum = cum_counts.gather(dim=B, index=max)
sum_counts = torch.where(is_zero, hi_sum, hi_sum - lo_sum)
pre_sum_output = sum_counts.squeeze(B)
You can then sum over dim 2 (the feature dimension, F) to get data_mass.
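To see why the cumulative-sum identity holds, here is a small check in plain Python on the minimal example from the question. It is only a sketch of the idea, not the PyTorch code: it uses nested lists and itertools.accumulate in place of tensors, and the helper name inclusive_sum is invented for illustration.

```python
from itertools import accumulate

counts = [[1, 2, 3, 4, 5],
          [2, 5, 3, 1, 1],
          [1, 2, 3, 4, 5],
          [2, 5, 3, 1, 1]]
mins = [[[1, 2, 3, 1], [2, 1, 1, 2], [1, 2, 2, 1]],
        [[2, 3, 2, 1], [3, 3, 1, 2], [1, 0, 2, 1]]]
maxes = [[[3, 3, 4, 4], [3, 2, 3, 4], [2, 4, 3, 2]],
         [[4, 4, 3, 3], [4, 4, 2, 3], [2, 1, 3, 2]]]

# One running total per feature row, e.g. [1, 3, 6, 10, 15] for counts[0].
prefix = [list(accumulate(row)) for row in counts]


def inclusive_sum(feature, lo, hi):
    """Sum of counts[feature][lo:hi+1] computed from the running totals."""
    upper = prefix[feature][hi]
    # Nothing lies to the left of index 0, so there is nothing to subtract.
    return upper if lo == 0 else upper - prefix[feature][lo - 1]


pre_sum_output = [
    [[inclusive_sum(k, lo, hi) for k, (lo, hi) in enumerate(zip(mn, mx))]
     for mn, mx in zip(sample_min, sample_max)]
    for sample_min, sample_max in zip(mins, maxes)
]
data_mass = [[sum(cluster) for cluster in sample] for sample in pre_sum_output]
print(data_mass)  # [[32, 29, 25], [30, 20, 27]]
```

The lo == 0 branch plays the same role as the is_zero mask in the tensor version.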

How do you merge multiple 2D lists that are not the same length?

Good morning. I'm trying to merge two or more 2D lists that don't have the same length.
For example, below I have two nested lists of different lengths.
A=[[1,2,3],[4,7,19]]
B=[[2,4], [3],[5,7,9]]
If this is possible, what code do I use to get the result below?
C=[[[1,2,3,2,4],[1,2,3,3],[1,2,3,5,7,9]],[[4,7,19,2,4],[4,7,19,3],[4,7,19,5,7,9]]]
Use a nested list comprehension:
>>> [[a + b for b in B] for a in A]
[[[1, 2, 3, 2, 4], [1, 2, 3, 3], [1, 2, 3, 5, 7, 9]], [[4, 7, 19, 2, 4], [4, 7, 19, 3], [4, 7, 19, 5, 7, 9]]]
a and b are sub-lists of A and B, respectively. The outer for a in A takes the first member of A, and the inner loop cycles through each sub-list of B, concatenating each one onto a in turn. Then the next a in A is selected, and the process repeats until there are no members of A left.
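For readers who find comprehensions hard to follow, the same steps can be written as explicit loops. This is just an equivalent sketch; the variable names row and C are chosen for illustration.

```python
A = [[1, 2, 3], [4, 7, 19]]
B = [[2, 4], [3], [5, 7, 9]]

C = []
for a in A:                # outer loop: one output row per sub-list of A
    row = []
    for b in B:            # inner loop: pair a with every sub-list of B
        row.append(a + b)  # list concatenation builds a new list
    C.append(row)

print(C)
```

Because a + b creates a new list each time, A and B themselves are left unchanged.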

How to use list comprehensions for this?

I want to take input of 2 numbers: the number of rows and the number of columns. I then want to use these to output a matrix numbered sequentially. I want to do this using a list comprehension. The following is a possible output.
>>> my_matrix = matrix_fill(3, 4)
>>> my_matrix
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
I am using the following code to output a sequentially numbered list:
def matrix_fill(num_rows, num_col):
    list=[i for i in range(num_col)]
    return (list)
I cannot, however, figure out how to make the sequential list of numbers break into the separate lists as shown in the output based on num_rows.
I don't think you need itertools for that. The range function can take a step as a parameter. Like this:
def matrix_fill(rows,cols):
    return [[x for x in range(1,rows*cols+1)][i:i+cols] for i in range(0,rows*cols,cols)]
And then it works as expected.
>>> matrix_fill(3,4)
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
Let's break this down a little bit and understand what's happening.
>>> [x for x in range(1,3*4+1)]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
So what we want to do is to get a new slice every four elements.
>>> [x for x in range(1,3*4+1)][0:4]
[1, 2, 3, 4]
>>> [x for x in range(1,3*4+1)][4:8]
[5, 6, 7, 8]
>>> [x for x in range(1,3*4+1)][8:12]
[9, 10, 11, 12]
So we want to iterate over the list [x for x in range(1,3*4+1)], which has length rows*cols (3 * 4), take a new slice every cols elements, and collect these slices in a single list. Therefore, [[x for x in range(1,rows*cols+1)][i:i+cols] for i in range(0,rows*cols,cols)] is a suitable expression.
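One thing to note is that the expression above rebuilds the full 1..rows*cols list once per row. A possible variation, sketched here under the hypothetical name matrix_fill_rows, builds each row directly from its own range instead:

```python
def matrix_fill_rows(rows, cols):
    """Build a rows x cols matrix numbered 1..rows*cols, one range per row."""
    # Row r starts at r*cols + 1 and holds the next cols integers.
    return [list(range(r * cols + 1, (r + 1) * cols + 1)) for r in range(rows)]


my_matrix = matrix_fill_rows(3, 4)
print(my_matrix)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

For small matrices the difference is negligible; the point is only that the slicing step can be folded into the range arguments.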
Nest a list comprehension inside another one, and use itertools.count() to generate the sequence:
import itertools
rows = 3
cols = 4
count_gen = itertools.count() # pass start=1 if you need the sequence to start at 1
my_matrix = [[next(count_gen) for c in range(cols)] for r in range(rows)]
print(my_matrix)
# prints: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
# As a function
def matrix_fill(rows, cols):
    count_gen = itertools.count()
    return [[next(count_gen) for c in range(cols)] for r in range(rows)]
If you use the numpy module, this is simple and needs no list comprehension.
import numpy as np
my_matrix = np.arange(1, 13).reshape(3,4)
Printing the variable my_matrix shows
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Returning the N largest values' indices in a multidimensional array (can find solutions for one dimension but not multi-dimension)

I have a numpy array X, and I'd like to return another array Y whose entries are the indices of the n largest values of X i.e. suppose I have:
a = np.array([[1, 3, 5], [4, 5, 6], [9, 1, 7]])
then say, if I want the indices of the first 5 "maxs" (here 9, 7, 6, 5, 5 are the maxs), their indices are:
b = np.array([[2, 0], [2, 2], [1, 2], [1, 1], [0, 2]])
I've been able to find some solutions and make this work for a one-dimensional array like
c = np.array([1, 2, 3, 4, 5, 6]):
def f(a,N):
    return np.argsort(a)[::-1][:N]
But have not been able to generate something that works in more than one dimension. Thanks!
Approach #1
Get the argsort indices on its flattened version and select the last N indices. Then, get the corresponding row and column indices -
N = 5
idx = np.argsort(a.ravel())[-N:][::-1] #single slicing: `[:-N-1:-1]`
topN_val = a.ravel()[idx]
row_col = np.c_[np.unravel_index(idx, a.shape)]
Sample run -
# Input array
In [39]: a = np.array([[1,3,5],[4,5,6],[9,1,7]])
In [40]: N = 5
    ...: idx = np.argsort(a.ravel())[-N:][::-1]
    ...: topN_val = a.ravel()[idx]
    ...: row_col = np.c_[np.unravel_index(idx, a.shape)]
    ...:
In [41]: topN_val
Out[41]: array([9, 7, 6, 5, 5])
In [42]: row_col
Out[42]:
array([[2, 0],
       [2, 2],
       [1, 2],
       [1, 1],
       [0, 2]])
Approach #2
For performance, we can use np.argpartition to get top N indices without keeping sorted order, like so -
idx0 = np.argpartition(a.ravel(), -N)[-N:]
To get the sorted order, we need one more round of argsort -
idx = idx0[a.ravel()[idx0].argsort()][::-1]
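To make the flatten / pick / unravel idea concrete without numpy, here is a plain-Python sketch for the 2D case. It swaps in heapq.nlargest for the argsort or argpartition step and divmod for np.unravel_index; the function name top_n_positions is invented for illustration, and the input is assumed to be a non-empty list of equal-length rows.

```python
import heapq


def top_n_positions(matrix, n):
    """Return (row, col) positions of the n largest entries, largest first."""
    n_cols = len(matrix[0])
    flat = [value for row in matrix for value in row]  # row-major flattening
    # Pick the n flat indices whose values are largest.
    flat_idx = heapq.nlargest(n, range(len(flat)), key=flat.__getitem__)
    # A flat index i in a row-major layout maps back to (i // n_cols, i % n_cols).
    return [divmod(i, n_cols) for i in flat_idx]


a = [[1, 3, 5], [4, 5, 6], [9, 1, 7]]
positions = top_n_positions(a, 5)
values = [a[r][c] for r, c in positions]
print(values)  # [9, 7, 6, 5, 5]
```

The order in which tied values (the two 5s here) are reported may differ from the numpy output shown above.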

multiplying elements in a 3x4 matrix individually in python 3 using loops (no numpy)

I'm trying to make a matrix that is 3 rows by 4 columns and contains the numbers 1-12. I would then like to multiply those numbers by a factor to make a new matrix.
def matrix(x):
    matrix=[[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
    new_matrix=[[x*1,x*2,x*3],[x*4,x*5,x*6],[x*7,x*8,x*9],[x*10,x*11,x*12]]
    print(new_matrix)
This approach works, but it does not use loops. I'm looking for an approach that uses loops, something like this:
def matrix(x):
    for i in range(3):
        matrix.append([])
        for j in range(4):
            matrix[i].append(0)
    return matrix
You do not need explicit loops for something like this (unless you really want them). List comprehensions are usually a more concise, and often somewhat faster, way to build lists, and their syntax is similar to a for loop.
Here is a comprehension for generating any MxN matrix containing the numbers up to M * N:
def matrix(m, n):
    return [[x+1 for x in range(row * n, (row + 1) * n)] for row in range(m)]
Here is a comprehension for multiplying the nested list returned by matrix by some factor:
def mult(mat, fact):
    return [[x * fact for x in row] for row in mat]
Here is the result for your specific 3x4 case:
>>> m = matrix(3, 4)
>>> print(m)
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>>> m2 = mult(m, 2)
>>> print(m2)
[[2, 4, 6, 8], [10, 12, 14, 16], [18, 20, 22, 24]]
If you want the indices to be swapped as in your original example, just swap the inputs m and n:
>>> matrix(4, 3)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
mult will work the same for any nested list you pass in.
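Since the question asked specifically for loops, here is how the same two steps might look with explicit for loops. This is a sketch only; the names matrix_loops and mult_loops are made up to keep them apart from the comprehension versions.

```python
def matrix_loops(m, n):
    """Build an m x n nested list holding 1..m*n, filled row by row."""
    rows = []
    value = 1
    for _ in range(m):
        row = []
        for _ in range(n):
            row.append(value)
            value += 1
        rows.append(row)
    return rows


def mult_loops(mat, fact):
    """Return a new nested list with every entry of mat multiplied by fact."""
    result = []
    for row in mat:
        new_row = []
        for x in row:
            new_row.append(x * fact)
        result.append(new_row)
    return result


m = matrix_loops(3, 4)
m2 = mult_loops(m, 2)
print(m)   # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(m2)  # [[2, 4, 6, 8], [10, 12, 14, 16], [18, 20, 22, 24]]
```

mult_loops builds a new nested list, so the input matrix is not modified.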
