Python optimization of time-series data re-indexing based on multiple-parameter multi-variable input and singular value output - python-3.x

I am trying to optimize a function that maximizes the correlation between two pandas time series (X and Y). This is done using three parameters (a, b, c) and a third time series (Z). The Z array is used to reindex the values in the X array (based on the parameters a, b, c) in such a way as to maximize the correlation of the reindexed X array (Xnew) with the Y array.
Below is some pseudo-code to demonstrate what I am trying to do. I have attempted this with LMfit and scipy.optimize, but I am not sure how to make this task work in those packages. For example, in LMfit, if I try to minimize the MyOpt function (which returns a single value of the correlation metric), it complains that I have more parameters than outputs. However, if I return the time series of the correlation metric (diff), the parameter values remain fixed at their input values.
I know the reindexing function I am using works, because crude grid searches similar to the code below give significant changes in the mean (diff) metric.
My knowledge of these optimization packages is not up to scratch for this job, so if anyone has a suggestion on how to tackle this, I would be grateful.
import numpy as np
import pandas as pd

def GetNewIndex(Z, a, b, c):
    old_index = np.arange(0, len(Z))
    index_adj = some_func(a, b, c)  # placeholder for the actual adjustment function
    new_index = old_index + index_adj
    # clip the new index so it stays within the bounds of the array
    max_old = np.max(old_index)
    new_index[new_index > max_old] = max_old
    new_index[new_index < 0] = 0
    return new_index

def MyOpt(params, X, Y, Z):
    a = params['A']
    b = params['B']
    c = params['C']
    # estimate lag (in samples) based on ambient RH
    new_index = GetNewIndex(Z, a, b, c)
    # assign old values to new locations and convert back to a pandas Series
    Xnew = np.take(X.values, new_index)
    Xnew = pd.Series(Xnew, index=X.index)
    cc = Y.rolling(1201, center=True).corr(Xnew)
    cc = cc.interpolate(limit_direction='both', limit_area=None)
    diff = 1 - np.abs(cc)
    return np.mean(diff)
#==================================================
X = some long pandas time series data
Y = some long pandas time series data
Z = some long pandas time series data

As = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
Bs = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
Cs = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6]

outs = []
for A, B, C in zip(As, Bs, Cs):
    params = {'A': A, 'B': B, 'C': C}
    out = MyOpt(params, X, Y, Z)
    outs.append(out)
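For reference, my scipy attempt looked roughly like the sketch below; since MyOpt already returns a scalar, I believe scipy.optimize.minimize should accept it once the parameters are packed into a single array (a0, b0, c0 are just placeholder initial guesses, and Nelder-Mead is a guess at a suitable derivative-free method since the objective may not be smooth):

from scipy.optimize import minimize

def objective(p, X, Y, Z):
    # scipy passes the parameters as a single array; unpack into the dict MyOpt expects
    params = {'A': p[0], 'B': p[1], 'C': p[2]}
    return MyOpt(params, X, Y, Z)

# a0, b0, c0 are initial guesses for the three parameters
res = minimize(objective, x0=[a0, b0, c0], args=(X, Y, Z), method='Nelder-Mead')
print(res.x, res.fun)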

Related

Creating a new tensor according to a list of lengths

I have a tensor t with dim b x 3 and a list of lengths len = [l_0, l_1, ..., l_n]. All entries in len sum to b. I want to create a new tensor with dim n x 3 which stores the averages of the entries in t. E.g. the first l_0 entries in t are averaged and form the first element of the new tensor, the following l_1 entries are averaged and form the second element, and so on.
Thanks for your help.
You can do so using a cumulative list of indices as a helper and a list comprehension to construct the new array:
>>> b, lens = 10, [2, 3, 1, 3, 1]
>>> t = torch.rand(b, 3)
>>> t
tensor([[0.3567, 0.3998, 0.9396],
[0.4061, 0.6465, 0.6955],
[0.3500, 0.4135, 0.5288],
[0.0726, 0.9575, 0.3785],
[0.6216, 0.2975, 0.3293],
[0.3878, 0.0735, 0.8181],
[0.1694, 0.5446, 0.1179],
[0.7793, 0.6613, 0.1748],
[0.0964, 0.9825, 0.1651],
[0.1421, 0.0994, 0.8086]])
Build the cumulative list of indices:
>>> c = torch.cumsum(torch.tensor([0] + lens), 0)
tensor([ 0, 2, 5, 6, 9, 10])
Loop over c pairwise with an overlapping window; for example, zip(c[:-1], c[1:]) works well. Each slice from i to j gets reduced on dim=0 (sum(0) is shown here; substitute mean(0) to get the averages the question asks for):
>>> [t[i:j].sum(0) for i, j in zip(c[:-1], c[1:])]
[tensor([0.7628, 1.0463, 1.6351]),
tensor([1.0442, 1.6685, 1.2367]),
tensor([0.3878, 0.0735, 0.8181]),
tensor([1.0451, 2.1885, 0.4578]),
tensor([0.1421, 0.0994, 0.8086])]
Then you can stack the list:
>>> torch.stack([t[i:j].sum(0) for i, j in zip(c[:-1], c[1:])])
tensor([[0.7628, 1.0463, 1.6351],
[1.0442, 1.6685, 1.2367],
[0.3878, 0.0735, 0.8181],
[1.0451, 2.1885, 0.4578],
[0.1421, 0.0994, 0.8086]])
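As an aside (not part of the answer above, so treat it as a sketch): torch.split accepts a list of section sizes directly, so the averaging version the question asks for can also be written as:

import torch

b, lens = 10, [2, 3, 1, 3, 1]
t = torch.rand(b, 3)

# split t into chunks of the given lengths, average each chunk, then stack
out = torch.stack([chunk.mean(0) for chunk in torch.split(t, lens)])
print(out.shape)  # torch.Size([5, 3])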

SymPy result Filtering

I was recently working on a Codeforces problem and was using SymPy to solve it. My code is:
from sympy import *
x, y = symbols("x, y", integer=True)
m, n = input().split(" ")
sol = solve([x**2 + y - int(n), y**2 + x - int(m)], [x, y])
print(sol)
What I wanted to do: filter only the positive integer values out of the SymPy result.
E.g. if I put 14 28 in the terminal it gives tons of results, but I just want it to show [(5, 3)].
I don't think that this is the intended way to solve the Codeforces problem (I think you're just supposed to loop over the possible values for one of the variables).
I'll show how to make use of SymPy here anyway, though. Your problem is a diophantine system of equations. Although SymPy has a diophantine solver, it only works for individual equations rather than systems.
Usually, though, the idea of using a CAS for something like this is to derive a general symbolic result that then helps you write faster concrete numerical code. Here are your equations with m and n as arbitrary symbols:
In [62]: x, y, m, n = symbols('x, y, m, n')
In [63]: eqs = [x**2 + y - n, y**2 + x - m]
Using the polynomial resultant we can eliminate either x or y from this system to obtain a quartic polynomial for the remaining variable:
In [31]: py = resultant(eqs[0], eqs[1], x)
In [32]: py
Out[32]: m**2 - 2*m*y**2 - n + y**4 + y
While there is a general quartic formula that SymPy can apply (if you use solve or roots here), it is too complicated to be useful for a problem like the one you are describing. Instead, the rational root theorem tells us that an integer root for y must be a divisor of the constant term:
In [33]: py.coeff(y, 0)
Out[33]: m**2 - n
Therefore the possible values for y are:
In [64]: yvals = divisors(py.coeff(y, 0).subs({m:14, n:28}))
In [65]: yvals
Out[65]: [1, 2, 3, 4, 6, 7, 8, 12, 14, 21, 24, 28, 42, 56, 84, 168]
Since x is m - y**2 the corresponding values for x are:
In [66]: solve(eqs[1], x)
Out[66]: [m - y**2]
In [67]: xvals = [14 - yv**2 for yv in yvals]
In [68]: xvals
Out[68]: [13, 10, 5, -2, -22, -35, -50, -130, -182, -427, -562, -770, -1750, -3122, -7042, -28210]
The candidate solutions are then given by:
In [69]: candidates = [(xv, yv) for xv, yv in zip(xvals, yvals) if xv > 0]
In [70]: candidates
Out[70]: [(13, 1), (10, 2), (5, 3)]
From there you can test which values are solutions:
In [74]: eqsmn = [eq.subs({m:14, n:28}) for eq in eqs]
In [75]: [c for c in candidates if all(eq.subs(zip([x,y],c))==0 for eq in eqsmn)]
Out[75]: [(5, 3)]
The algorithmically minded will probably see from the above example how to make a much more efficient way of implementing the solver.
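For instance, a plain-Python sketch of that idea (added here for illustration; positive_solutions is a made-up helper name, not part of the original answer):

from sympy import divisors

def positive_solutions(m, n):
    """Positive integer solutions of x**2 + y == n and y**2 + x == m.

    An integer root y of the quartic y**4 - 2*m*y**2 + y + m**2 - n
    must divide the constant term m**2 - n (rational root theorem).
    """
    const = m**2 - n
    if const == 0:
        return []  # degenerate case; handle separately if it can occur
    sols = []
    for yv in divisors(abs(const)):
        xv = m - yv**2                   # from the second equation
        if xv > 0 and xv**2 + yv == n:   # check against the first equation
            sols.append((xv, yv))
    return sols

print(positive_solutions(14, 28))  # [(5, 3)]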
I've figured out the answer to my question! At first I was trying to filter the result from solve(), but there is an easier way to do this.
Pseudo code:
solve() gives the intersection points of both parabolic equations as a list.
I just need to filter() out the other types of values, which in my case are <sympy.core.add.Add>.
def rem(_list):
    return list(filter(lambda v: type(v) != Add, _list))
(Note that solve() returns SymPy objects, so the surviving values are sympy.Integer rather than plain int; a check like type(v) == int would not match them.)
Final code:
from sympy import *

# The other values were of type <sympy.core.add.Add>, so this function
# filters those specific types out of the list.
def rem(_list):
    return list(filter(lambda v: type(v) != Add, _list))

x, y = symbols("x,y", integer=True, negative=False)
output = []
m, n = input().split(' ')

# I need to solve the two equations separately; otherwise my filter
# function will not work without a loop.
solX = rem(solve((x + (int(n) - x**2)**2 - int(m)), x))
solY = rem(solve((int(m) - y**2)**2 + y - int(n), y))

if len(solX) == 0 or len(solY) == 0:
    print(0)
else:
    output.extend(solX)  # using "extend" to add multiple values to the list
    output.extend(solY)
    # the results come in pairs, so divide the length of the list by 2
    print(int(len(output) / 2))
Why I did it this way:
I tried to solve it algorithmically, but the results still contained some floats, and I wanted to avoid another loop.
Since SymPy's solve() had already found the values, I skipped the other approach and focused on filtering.
Sadly, the Codeforces judge shows a runtime error; I guess it can't import sympy. It works fine in VSCode, however.

Adding two numbers from list and assigning the output to a matrix

I'm trying to add pairs of numbers from two lists and assign each sum to a matrix in Python. I tried the code below:
import numpy as np

# Create bin and cost lists
bin = [10, 20]
cost = [10, 20]

# Create a 2 x 2 result matrix
res_mat = [[0.0 for i in range(len(bin))] for j in range(len(cost))]

for b in bin:
    for c in cost:
        for i in range(len(bin)):
            for j in range(len(cost)):
                a = c + b
                res_mat[i][j] = a

print(np.array(res_mat))  # Print the final result matrix
When I print res_mat I get the matrix below:
[[40 40]
[40 40]]
While I'm expecting the correct matrix like below :
[[20 30]
[30 40]]
So what change should be made so that the matrix correctly displays the result?
Try:
for i, b in enumerate(bin):
    for j, c in enumerate(cost):
        a = c + b
        res_mat[i][j] = a
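Since numpy is already imported, a broadcast-based alternative (a suggestion added in editing, not part of the original answer) builds the whole matrix in one call:

import numpy as np

bin = [10, 20]
cost = [10, 20]

# outer sum: res_mat[i][j] == bin[i] + cost[j]
res_mat = np.add.outer(bin, cost)
print(res_mat)
# [[20 30]
#  [30 40]]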

Roll of different amount along a single axis in a 3D matrix [duplicate]

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print(np.array([np.roll(row, x) for row, x in zip(A, r)]))
[[0 0 4]
[1 2 3]
[0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure, you can do it using advanced indexing; whether it is the fastest way probably depends on your array size (if your rows are large it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
# Always use a negative shift, so that column_indices are valid.
# (could also use a modulo operation)
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]
result = A[rows, column_indices]
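As a quick sanity check with the arrays from the question (added for illustration; note the copy, since r[r < 0] += ... modifies r in place):

import numpy as np

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
shifts = r.copy()                # keep the original shift vector intact
shifts[shifts < 0] += A.shape[1]
column_indices = column_indices - shifts[:, np.newaxis]
print(A[rows, column_indices])
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]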
numpy.lib.stride_tricks.as_strided strikes (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous np.lib.stride_tricks.as_strided. The idea/trick would be to get a sliced portion starting from the first column until the second-to-last one and concatenate at the end. This ensures that we can stride in the forward direction as needed to leverage np.lib.stride_tricks.as_strided and thus avoid the need of actually rolling back. That's the whole idea!
Now, in terms of actual implementation we would use scikit-image's view_as_windows to elegantly use np.lib.stride_tricks.as_strided under the hood. Thus, the final implementation would be -
from skimage.util.shape import view_as_windows as viewW

def strided_indexing_roll(a, r):
    # Concatenate with sliced version to cover all rolls
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)
    # Get sliding windows; use advanced indexing to select the appropriate ones
    n = a.shape[1]
    return viewW(a_ext, (1, n))[np.arange(len(r)), (n - r) % n, 0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
...: [1, 2, 3],
...: [0, 0, 5]])
In [328]: r = np.array([2, 0, -1])
In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
[1, 2, 3],
[0, 5, 0]])
Benchmarking
# @seberg's solution
def advindexing_roll(A, r):
    rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
    r[r < 0] += A.shape[1]
    column_indices = column_indices - r[:, np.newaxis]
    return A[rows, column_indices]
Let's do some benchmarking on an array with a large number of rows and columns -
In [324]: np.random.seed(0)
...: a = np.random.rand(10000,1000)
...: r = np.random.randint(-1000,1000,(10000))
# @seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop
# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
In case you want a more general solution (dealing with any shape and any axis), I modified @seberg's solution:
def indep_roll(arr, shifts, axis=1):
    """Apply an independent roll for each dimension of a single axis.

    Parameters
    ----------
    arr : np.ndarray
        Array of any shape.
    shifts : np.ndarray
        How many shifts to apply to each dimension. Shape: `(arr.shape[axis],)`.
    axis : int
        Axis along which elements are shifted.
    """
    arr = np.swapaxes(arr, axis, -1)
    all_idcs = np.ogrid[[slice(0, n) for n in arr.shape]]
    # Convert to a positive shift
    shifts[shifts < 0] += arr.shape[-1]
    all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]
    result = arr[tuple(all_idcs)]
    arr = np.swapaxes(result, -1, axis)
    return arr
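One caveat worth noting (an editing addition, not part of the original answer): shifts[shifts < 0] += ... mutates the shift array in place, so pass a copy if you need it again. A usage sketch on the question's example:

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

out = indep_roll(A, r.copy(), axis=1)  # copy r so the original stays untouched
print(out)
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]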
I implement a pure numpy.lib.stride_tricks.as_strided solution as follows:
from numpy.lib.stride_tricks import as_strided

def custom_roll(arr, r_tup):
    m = np.asarray(r_tup)
    arr_roll = arr[:, [*range(arr.shape[1]), *range(arr.shape[1] - 1)]].copy()  # need `copy`
    strd_0, strd_1 = arr_roll.strides
    n = arr.shape[1]
    result = as_strided(arr_roll, (*arr.shape, n), (strd_0, strd_1, strd_1))
    return result[np.arange(arr.shape[0]), (n - m) % n]
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

out = custom_roll(A, r)
# array([[0, 0, 4],
#        [1, 2, 3],
#        [0, 5, 0]])
By using a fast Fourier transform we can apply a transformation in the frequency domain and then use the inverse fast Fourier transform to obtain the row shift.
So this is a pure numpy solution that takes only one line:
import numpy as np
from numpy.fft import fft, ifft

# The row-shift function using the fast Fourier transform:
# rshift(A, r) where A is a 2D array and r is the row-shift vector
def rshift(A, r):
    return np.real(ifft(fft(A, axis=1) * np.exp(2 * 1j * np.pi / A.shape[1] * r[:, None] * np.r_[0:A.shape[1]][None, :]), axis=1).round())
This will apply a left shift, but we can simply negate the exponent in the exponential to turn the function into a right-shift function:
ifft(fft(...)*np.exp(-2*1j...)
It can be used like this:
# Example:
A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
r = np.array([1, -1, 3])
print(rshift(A, r))
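Spelled out as a complete function, the right-shift variant would look like this (a sketch following the same formula, with only the sign of the exponent changed; rshift_right is a made-up name):

# Right-shift version: same formula with a negated exponent
def rshift_right(A, r):
    return np.real(ifft(fft(A, axis=1) * np.exp(-2 * 1j * np.pi / A.shape[1] * r[:, None] * np.r_[0:A.shape[1]][None, :]), axis=1).round())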
Building on Divakar's excellent answer, you can apply this logic to a 3D array easily (which was the problem that brought me here in the first place). Here's an example: basically, flatten your data, roll it, and reshape it afterwards:
def applyroll_30(cube, threshold=25, offset=500):
    flattened_cube = cube.copy().reshape(cube.shape[0] * cube.shape[1], cube.shape[2])
    roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
    rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
    return rolled_cube

def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
    """Calculates the number of positions along the time axis we need to shift
    elements in order to trigger the data.
    Returns a 1D numpy array of (X*Y,) elements.
    """
    # argmax(...) finds the position in the flattened cube where we are above threshold
    roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
    # ensure we don't get an index out of bounds
    roll_matrix[roll_matrix > cube_flattened.shape[1]] = cube_flattened.shape[1]
    return roll_matrix

def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
    # negate the shifts, otherwise we shift in the wrong direction for my application
    roll_matrix_flattened = -1 * roll_matrix_flattened
    # Concatenate with a sliced copy to cover all rolls
    a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
    # Get sliding windows; use advanced indexing to select the appropriate ones
    n = cube_flattened.shape[1]
    result = viewW(a_ext, (1, n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
    result = result.reshape(cube_shape)
    return result
Divakar's answer doesn't do justice to how much more efficient this is on a large cube of data. I've timed it on 400x400x2000 data formatted as int8: an equivalent for-loop takes ~5.5 seconds, Seberg's answer ~3.0 seconds, and strided_indexing_roll ~0.5 seconds.

Python3: set a range of data

I feel this must be very basic but I cannot find a simple way.
I am using python3
I have many data files with x,y data where x goes from 0 to 140 (floating).
Let's say
0,2.1
0.5,3.5
0.8,3.2
...
I want to import the values of x within the range 25.4 to 28.1 and their corresponding values in y. Every file might have a different length, so the value x > 25.4 might appear in a different row.
I am looking for something equivalent to the following command in gnuplot:
set xrange [25.4:28.1]
This time I cannot use gnuplot because the data processing requires more than gnuplot's capabilities.
I imported the data with Pandas but I cannot set a range.
Thank you.
r = range(start, stop, step) is the pattern for this in Python.
So, for example, to get:
r == [0, 1, 2]
You would write:
r = [x for x in range(3)]
And to get:
r == [0, 5, 10]
You would write:
r = [x for x in range(0, 11, 5)]
This doesn't get you very far because:
r = [0, .2, 4.3, 6.3]
r = [x for x in r if x in range(3, 10)]
# r == []
But you can do:
r = [0, .2, 4.3, 6.3]
r = [x for x in r if ((x > 3) & (x < 10))]
# r == [4.3, 6.3]
Pandas and Numpy give you a much more concise way of doing this. Consider the following demo of .between:
import pandas as pd
import io
text = io.StringIO("""Close Top_Barrier Bottom_Barrier
0 441.86 441.964112 426.369888
1 448.95 444.162225 425.227108
2 449.99 446.222271 424.285063
3 449.74 447.947051 423.678282
4 451.97 449.879254 423.029413""")
df = pd.read_csv(text, sep=r'\s+')
df = df[df["Close"].between(449, 452)] # between
df
So for your df you can do the same: df = df[df["x"].between(min, max)]
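Applied to the data files from the question, that looks like the sketch below (the file name and column names are assumptions, since the files are unnamed two-column CSVs):

import pandas as pd

# each file holds two unnamed float columns, as in the question's example
df = pd.read_csv("data.txt", names=["x", "y"])

# keep only rows with x in the gnuplot-style range [25.4:28.1]
subset = df[df["x"].between(25.4, 28.1)]
print(subset)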
