Creating a new tensor according to a list of lengths - pytorch

I have a tensor t with dim b x 3 and a list of lengths len = [l_0, l_1, ..., l_n]. All entries in len sum to b. I want to create a new tensor with dim n x 3 that stores the averages of the entries in t. E.g. the first l_0 entries in t are averaged and form the first element of the new tensor, the following l_1 entries are averaged and form the second element, and so on.
Thanks for your help.

You can do so using a combination of a cumulative list of indices as a helper and a list comprehension to construct the new tensor:
>>> b, lens = 10, [2, 3, 1, 3, 1]
>>> t = torch.rand(b, 3)
>>> t
tensor([[0.3567, 0.3998, 0.9396],
        [0.4061, 0.6465, 0.6955],
        [0.3500, 0.4135, 0.5288],
        [0.0726, 0.9575, 0.3785],
        [0.6216, 0.2975, 0.3293],
        [0.3878, 0.0735, 0.8181],
        [0.1694, 0.5446, 0.1179],
        [0.7793, 0.6613, 0.1748],
        [0.0964, 0.9825, 0.1651],
        [0.1421, 0.0994, 0.8086]])
Build the cumulative list of indices:
>>> c = torch.cumsum(torch.tensor([0] + lens), 0)
>>> c
tensor([ 0, 2, 5, 6, 9, 10])
Loop over c pairwise with an overlapping window, e.g. zip(c[:-1], c[1:]). Each slice from i to j is then reduced along dim=0. The snippets below use .sum(0) so the intermediate values are easy to check; replace it with .mean(0) to get the averages the question asks for.
>>> [t[i:j].sum(0) for i, j in zip(c[:-1], c[1:])]
[tensor([0.7628, 1.0463, 1.6351]),
 tensor([1.0442, 1.6685, 1.2367]),
 tensor([0.3878, 0.0735, 0.8181]),
 tensor([1.0451, 2.1885, 0.4578]),
 tensor([0.1421, 0.0994, 0.8086])]
Then you can stack the list:
>>> torch.stack([t[i:j].sum(0) for i, j in zip(c[:-1], c[1:])])
tensor([[0.7628, 1.0463, 1.6351],
        [1.0442, 1.6685, 1.2367],
        [0.3878, 0.0735, 0.8181],
        [1.0451, 2.1885, 0.4578],
        [0.1421, 0.0994, 0.8086]])
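Alternatively, torch.split accepts a list of section lengths, which avoids managing the cumulative indices by hand. A minimal sketch of that variant, using .mean(0) to produce the averages directly:

import torch

b, lens = 10, [2, 3, 1, 3, 1]
t = torch.rand(b, 3)

# torch.split returns one chunk per entry of lens along dim 0;
# averaging each chunk gives the (n, 3) result in one pass.
out = torch.stack([chunk.mean(0) for chunk in torch.split(t, lens)])
print(out.shape)  # torch.Size([5, 3])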

Related

Python optimization of time-series data re-indexing based on multiple-parameter multi-variable input and single-value output

I am trying to optimize a funciton that is trying to maximize the correlation between two (pandas) time series arrays (X and Y). This is done by using three parameters (a, b, c) and a third time series array (Z). The Z array is used to reindex the values in the X array (based on the parameters a, b, c) in such a way as to maximize the correlation of the reindexed X array (Xnew) with the Y array.
Below is some pseudo-code to demonstrate what I am trying to do. I have attempted this using LMfit and scipy.optimize, but I am not sure how to make this task work in those packages. For example, in LMfit, if I try to minimize the MyOpt function (which passes back a single value of the correlation metric), it complains that I have more parameters than outputs. However, if I pass back the time series of the correlation metric (diff), the parameter values remain fixed at their input values.
I know the reindexing function I am using works, because rather crude methods similar to the code below give significant changes in the mean (diff) metric passed back.
My knowledge of these optimization packages is not up to scratch for this job, so if anyone has a suggestion on how to tackle this, I would be grateful.
import numpy as np
import pandas as pd

def GetNewIndex(Z, a, b, c):
    old_index = np.arange(0, len(Z))
    index_adj = some_func(a, b, c)  # placeholder from the original post
    new_index = old_index + index_adj
    # clamp the new indices to the valid range
    max_old = np.max(old_index)
    new_index[new_index > max_old] = max_old
    new_index[new_index < 0] = 0
    return new_index

def MyOpt(params, X, Y, Z):
    a = params['A']
    b = params['B']
    c = params['C']
    # estimate lag (in samples) based on ambient RH
    new_index = GetNewIndex(Z, a, b, c)
    # assign old values to new locations and convert back to a pandas Series
    Xnew = np.take(X.values, new_index)
    Xnew = pd.Series(Xnew, index=X.index)
    cc = Y.rolling(1201, center=True).corr(Xnew)
    cc = cc.interpolate(limit_direction='both', limit_area=None)
    diff = 1 - np.abs(cc)
    return np.mean(diff)
#==================================================
X = some long pandas time series data
Y = some long pandas time series data
Z = some long pandas time series data

As = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
Bs = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
Cs = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6]
outs = []
for A, B, C in zip(As, Bs, Cs):
    params = {'A': A, 'B': B, 'C': C}
    out = MyOpt(params, X, Y, Z)
    outs.append(out)
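One possible route, sketched under the assumption that MyOpt stays as defined above: since it already returns a single scalar, it fits scipy.optimize.minimize directly, and a derivative-free method such as Nelder-Mead avoids the "more parameters than outputs" complaint (which comes from least-squares solvers expecting a residual vector rather than a scalar):

from scipy.optimize import minimize

# Wrap MyOpt so it takes a flat parameter vector, as minimize expects.
def objective(p, X, Y, Z):
    return MyOpt({'A': p[0], 'B': p[1], 'C': p[2]}, X, Y, Z)

# Nelder-Mead is derivative-free, which suits a noisy correlation metric.
res = minimize(objective, x0=[1.0, 0.0, 5.0], args=(X, Y, Z),
               method='Nelder-Mead')
print(res.x, res.fun)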

Foobar Lucky Triple

I am trying to solve the following problem:
Write a function solution(l) that takes a list of positive integers l and counts the number of "lucky triples" (l[i], l[j], l[k]) where the list indices meet the requirement i < j < k. The length of l is between 2 and 2000 inclusive. A "lucky triple" is a tuple (x, y, z) where x divides y and y divides z, such as (1, 2, 4). The elements of l are between 1 and 999999 inclusive. The solution fits within a signed 32-bit integer. Some of the lists are purposely generated without any access codes to throw off spies, so if no triples are found, return 0.
For example, [1, 2, 3, 4, 5, 6] has the triples: [1, 2, 4], [1, 2, 6], [1, 3, 6], making the solution 3 total.
My solution only passes the first two tests; I am trying to understand what is wrong with my approach rather than just get the actual solution. Below is my function for reference:
def my_solution(l):
    from itertools import combinations
    if 2 < len(l) <= 2000:
        l = list(combinations(l, 3))
        l = [value for value in l if value[1] % value[0] == 0 and value[2] % value[1] == 0]
        # l = [value for value in l if (value[1]/value[0]).is_integer() and (value[2]/value[1]).is_integer()]
        if len(l) < 0xffffffff:
            l = len(l)
            return l
    else:
        return 0
If you do a nested iteration over the full list and the remaining list, comparing the two items to check whether one divides the other, each match counts once as a potential beginning-and-middle of a 'triple' and once as a potential middle-and-end. All you need to do is track, for each index, how many matching pairs end there and how many start there; the number of triples is then the sum over every index of the product of those two counts, with that index as the middle.
For Example
def my_solution(l):
    row1, row2 = [[0] * len(l) for i in range(2)]  # track which indices pass the modulus test
    for i1, first in enumerate(l):
        for i2 in range(i1 + 1, len(l)):  # iterate over the remaining portion of the list
            middle = l[i2]
            if not middle % first:  # check for matches
                row1[i2] += 1  # increment the index in the tracker lists..
                row2[i1] += 1  # for each matching pair
    # the final answer is the sum of the products for each index:
    # pairs ending at i (row1) times pairs starting at i (row2), with i as the middle
    result = sum([row1[i] * row2[i] for i in range(len(l))])
    return result
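For example, with the list from the question this counts the three triples listed there:

print(my_solution([1, 2, 3, 4, 5, 6]))  # 3 -> [1,2,4], [1,2,6], [1,3,6]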

Python: how can I create a subset from an integer array based on a range?

I am looking for a way to get subsets from an integer array based on a certain range
For example
Input
array1=[3,5,4,12,34,54]
#Now getting a subset for every 3 elements
Output
subset= [(3,5,4), (12,34,54)]
I know it could be simple, but I didn't find the right way to get this output.
Any help is appreciated. Thanks
Consider using a list comprehension:
>>> array1 = [3, 5, 4, 12, 34, 54]
>>> subset = [tuple(array1[i:i+3]) for i in range(0, len(array1), 3)]
>>> subset
[(3, 5, 4), (12, 34, 54)]
Links to other relevant documentation:
tuples
ranges
arr = [1,2,3,4,5,6]
sets = [tuple(arr[i:i+3]) for i in range(0, len(arr), 3)]
print(sets)
We take a range of values from the array and turn it into a tuple. The range is determined by the for loop, which iterates with a step of three, so that a tuple is only created for every 3 items.
You can use the following code:
from itertools import zip_longest

input_list = [3, 5, 4, 12, 34, 54]
iterables = [iter(input_list)] * 3
# zip_longest pulls three items at a time from the same iterator
slices = zip_longest(*iterables, fillvalue=None)
output_list = []
for chunk in slices:  # 'chunk' avoids shadowing the built-in 'slice'
    output_list.append(chunk)
print(output_list)
This uses the zip_longest function from itertools:
https://docs.python.org/3/library/itertools.html#itertools.zip_longest
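As a side note, the same zip_longest idea collapses into one expression (a minimal sketch, assuming the list length is a multiple of 3 so no fill values appear):

from itertools import zip_longest

input_list = [3, 5, 4, 12, 34, 54]
# Repeating the same iterator 3 times yields consecutive chunks of 3.
output_list = list(zip_longest(*[iter(input_list)] * 3))
print(output_list)  # [(3, 5, 4), (12, 34, 54)]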

Make a list of the non-decreasing runs of elements of a list in Python

I have a list a = [2,2,1,3,4,1] .
I want to make a new list c containing the non-decreasing runs of elements of list a.
That means my expected form is -
c = [[2,2],[1,3,4],[1]]
Here is my code:
>>> c = []
>>> for x in a:
...     xx = a[0]
...     if xx > x:
...         b = a[:x]
...         c.append(b)
...         a = a[x:]
but my output is:
>>> c
[[2], [2]]
How can I make a list with all the non-decreasing parts of list a?
You can initialise the first entry of c with [a[0]], and then either append the current value from a to the last list in c (if it is >= the previous value) or append a new list containing that value to c:
a = [2, 2, 1, 3, 4, 1]
c = [[a[0]]]
last = a[0]
for x in a[1:]:
    if x >= last:
        c[-1].append(x)
    else:
        c.append([x])
    last = x
print(c)
Output:
[[2, 2], [1, 3, 4], [1]]
If I understand what you are after correctly, then what you want is to split the list every time a number decreases. If so, this should do what you need:
c = []
previous_element = a[0]
sub_list = [previous_element]
for element in a[1:]:
    if previous_element > element:
        c.append(sub_list)
        sub_list = []
    previous_element = element
    sub_list.append(previous_element)
c.append(sub_list)
Output:
In [1]: c
Out[1]: [[2, 2], [1, 3, 4], [1]]
There is possibly a clearer way to write the above, but it's pre-coffee for me ;)
Also note that this code assumes that a contains at least one item; if that is not always the case, you will need to either guard it with an if statement or restructure the loop as a while loop.
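For comparison, a minimal sketch of the same split computed from cut indices (the positions where the sequence decreases), assuming a is non-empty:

a = [2, 2, 1, 3, 4, 1]
# Cut before every index whose value drops below its predecessor.
cuts = [0] + [i for i in range(1, len(a)) if a[i] < a[i - 1]] + [len(a)]
c = [a[lo:hi] for lo, hi in zip(cuts[:-1], cuts[1:])]
print(c)  # [[2, 2], [1, 3, 4], [1]]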

Roll by different amounts along a single axis in a 3D matrix [duplicate]

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print(np.array([np.roll(row, x) for row, x in zip(A, r)]))
[[0 0 4]
 [1 2 3]
 [0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure, you can do it using advanced indexing; whether it is the fastest way probably depends on your array size (if your rows are large, it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]

# Convert negative shifts to their positive equivalents so the
# shifted column_indices stay valid (a modulo operation would also work).
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]

result = A[rows, column_indices]
numpy.lib.stride_tricks.as_strided stricks (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous - np.lib.stride_tricks.as_strided. The idea/trick would be to get a sliced portion starting from the first column until the second last one and concatenate at the end. This ensures that we can stride in the forward direction as needed to leverage np.lib.stride_tricks.as_strided and thus avoid the need of actually rolling back. That's the whole idea!
Now, in terms of actual implementation we would use scikit-image's view_as_windows to elegantly use np.lib.stride_tricks.as_strided under the hoods. Thus, the final implementation would be -
from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r):
    # Concatenate with a sliced version to cover all rolls
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)

    # Get sliding windows; use advanced indexing to select the appropriate ones
    n = a.shape[1]
    return viewW(a_ext, (1, n))[np.arange(len(r)), (n - r) % n, 0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
     ...:               [1, 2, 3],
     ...:               [0, 0, 5]])

In [328]: r = np.array([2, 0, -1])

In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
Benchmarking
# @seberg's solution
def advindexing_roll(A, r):
    rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
    r[r < 0] += A.shape[1]
    column_indices = column_indices - r[:, np.newaxis]
    return A[rows, column_indices]
Let's do some benchmarking on an array with a large number of rows and columns -
In [324]: np.random.seed(0)
     ...: a = np.random.rand(10000, 1000)
     ...: r = np.random.randint(-1000, 1000, (10000))

# @seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop

# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
In case you want a more general solution (dealing with any shape and with any axis), I modified @seberg's solution:
def indep_roll(arr, shifts, axis=1):
    """Apply an independent roll for each dimension of a single axis.

    Parameters
    ----------
    arr : np.ndarray
        Array of any shape.
    shifts : np.ndarray
        How many positions to shift for each dimension. Shape: `(arr.shape[axis],)`.
    axis : int
        Axis along which elements are shifted.
    """
    arr = np.swapaxes(arr, axis, -1)
    all_idcs = np.ogrid[[slice(0, n) for n in arr.shape]]

    # Convert to a positive shift
    shifts[shifts < 0] += arr.shape[-1]
    all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]

    result = arr[tuple(all_idcs)]
    arr = np.swapaxes(result, -1, axis)
    return arr
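A quick check against the original example; note that indep_roll mutates shifts in place, so pass a copy if you want to keep r:

import numpy as np

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

print(indep_roll(A, r.copy(), axis=1))
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]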
I implement a pure numpy.lib.stride_tricks.as_strided solution as follows
from numpy.lib.stride_tricks import as_strided
def custom_roll(arr, r_tup):
    m = np.asarray(r_tup)
    arr_roll = arr[:, [*range(arr.shape[1]), *range(arr.shape[1] - 1)]].copy()  # need `copy`
    strd_0, strd_1 = arr_roll.strides
    n = arr.shape[1]
    result = as_strided(arr_roll, (*arr.shape, n), (strd_0, strd_1, strd_1))
    return result[np.arange(arr.shape[0]), (n - m) % n]
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

out = custom_roll(A, r)

Out[789]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
By using a fast Fourier transform we can apply the shift in the frequency domain and then use the inverse fast Fourier transform to obtain the row shift.
So this is a pure numpy solution that takes only one line:
import numpy as np
from numpy.fft import fft, ifft
# The row-shift function using the fast Fourier transform:
# rshift(A, r), where A is a 2D array and r is the row-shift vector
def rshift(A, r):
    return np.real(ifft(fft(A, axis=1) * np.exp(2 * 1j * np.pi / A.shape[1] * r[:, None] * np.r_[0:A.shape[1]][None, :]), axis=1).round())
This applies a left shift, but we can simply negate the exponent of the exponential to turn the function into a right-shift function:
ifft(fft(...) * np.exp(-2 * 1j ...))
It can be used like that:
# Example:
A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
r = np.array([1, -1, 3])
print(rshift(A, r))
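For reference, each row comes out shifted left by the corresponding entry of r (negative entries shift right), so this should print, up to float formatting:

[[2. 3. 4. 1.]
 [4. 1. 2. 3.]
 [4. 1. 2. 3.]]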
Building on Divakar's excellent answer, you can apply this logic to a 3D array easily (which was the problem that brought me here in the first place). Here's an example - basically flatten your data, roll it and reshape it after:
from skimage.util.shape import view_as_windows as viewW

def applyroll_30(cube, threshold=25, offset=500):
    flattened_cube = cube.copy().reshape(cube.shape[0] * cube.shape[1], cube.shape[2])
    roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
    # strided_indexing_roll below already reshapes the result back to cube.shape
    rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
    return rolled_cube

def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
    """Calculates the number of positions along the time axis we need to shift
    elements in order to trigger the data.
    We return a 1D numpy array of X*Y elements.
    """
    # argmax(...) finds the first position along time where we are above threshold
    roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
    # ensure we don't have an index out of bounds
    roll_matrix[roll_matrix > cube_flattened.shape[1]] = cube_flattened.shape[1]
    return roll_matrix

def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
    # Negate the shifts, otherwise we shift in the wrong direction for my application
    roll_matrix_flattened = -1 * roll_matrix_flattened
    # Concatenate with a sliced version to cover all rolls
    a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
    # Get sliding windows; use advanced indexing to select the appropriate ones
    n = cube_flattened.shape[1]
    result = viewW(a_ext, (1, n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
    result = result.reshape(cube_shape)
    return result
Divakar's answer doesn't do justice to how much more efficient this is on a large cube of data: I've timed it on 400x400x2000 data formatted as int8. An equivalent for-loop takes ~5.5 seconds, Seberg's answer ~3.0 seconds, and strided_indexing_roll ~0.5 seconds.