Defining a function to calculate mean-differences at specific array size - python-3.x

I have an array:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
I want to define a function that calculates the difference between the means of consecutive groups of elements of this array, where each group has a given length.
For example:
diff_avg(arr, size=2)
Expected Result:
[-2, -2]
because:
((1+2)/2) - ((3+4)/2) = -2 -> first 4 elements: size is 2, so 2 groups of 2 elements
((5+6)/2) - ((7+8)/2) = -2 -> last 4 elements
If size=3, then:
output: [-3]
because:
((1+2+3)/3) - ((4+5+6)/3) = -3 -> first 6 elements
What I did so far:
def diff_avg(first_group, second_group, size):
    results = []
    x = np.mean(first_group) - np.mean(second_group)
    results.append(x)
    return results
I don't know how to add the size parameter.
I can get the first size elements with arr[:size], but how do I get the next size elements?
Can anyone help me?

First, truncate the array to remove the extra items:
size = 3
sized_array = arr[:arr.size // (size * 2) * (size * 2)]
# array([1, 2, 3, 4, 5, 6])
Next, reshape the truncated array into pairs of groups and take the mean of each group:
means = sized_array.reshape(-1, 2, size).mean(axis=2)
# array([[2., 5.]])
Finally, take the difference within each pair:
means[:, 0] - means[:, 1]
# array([-3.])
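The steps above can be wrapped into the diff_avg(arr, size) function the question asks for. This is a minimal sketch, assuming arr is one-dimensional and that trailing elements which do not fill a complete pair of groups should be dropped. It reshapes to (pairs, 2, size) so that each row holds the two groups being compared, which reproduces both expected results from the question.

```python
import numpy as np


def diff_avg(arr, size):
    """Difference of means between consecutive groups of `size` elements."""
    arr = np.asarray(arr)
    block = 2 * size                       # one pair of groups
    usable = arr.size // block * block     # drop elements that do not fill a pair
    groups = arr[:usable].reshape(-1, 2, size)
    means = groups.mean(axis=2)            # shape: (pairs, 2)
    return means[:, 0] - means[:, 1]


arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(diff_avg(arr, size=2))  # [-2. -2.]
print(diff_avg(arr, size=3))  # [-3.]
```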


Python optimization of time-series data re-indexing based on multiple-parameter multi-variable input and singular value output

I am trying to optimize a function that maximizes the correlation between two (pandas) time series arrays (X and Y). This is done by using three parameters (a, b, c) and a third time series array (Z). The Z array is used to reindex the values in the X array (based on the parameters a, b, c) in such a way as to maximize the correlation of the reindexed X array (Xnew) with the Y array.
Below is some pseudo-code to demonstrate what I am trying to do. I have attempted this using LMfit and scipy optimize, but I am not sure how to make this task work in those packages. For example, in LMfit, if I try to minimize the MyOpt function (which passes back a single value of the correlation metric), it complains that I have more parameters than outputs. However, if I pass back the time series of the correlation metric (diff), the parameter values remain fixed at their input values.
I know the reindexing function I am using works, because crude methods similar to the code below give significant changes in the mean (diff) metric passed back.
My knowledge of these optimization packages is not up to scratch for this job, so if anyone has a suggestion on how to tackle this, I would be grateful.
import numpy as np
import pandas as pd

def GetNewIndex(Z, a, b, c):
    old_index = np.arange(0, len(Z))
    index_adj = some_func(a, b, c)
    new_index = old_index + index_adj
    # clamp the shifted indices to the valid range
    max_old = np.max(old_index)
    new_index[new_index > max_old] = max_old
    new_index[new_index < 0] = 0
    return new_index

def MyOpt(params, X, Y, Z):
    a = params['A']
    b = params['B']
    c = params['C']
    # estimate lag (in samples) based on ambient RH
    new_index = GetNewIndex(Z, a, b, c)
    # assign old values to new locations and convert back to pandas series
    Xnew = np.take(X.values, new_index)
    Xnew = pd.Series(Xnew, index=X.index)
    cc = Y.rolling(1201, center=True).corr(Xnew)
    cc = cc.interpolate(limit_direction='both', limit_area=None)
    diff = 1 - np.abs(cc)
    return np.mean(diff)

#==================================================
# X, Y, Z: some long pandas time series data (not shown)
As = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
Bs = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
Cs = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6]
outs = []
for A, B, C in zip(As, Bs, Cs):
    params = {'A': A, 'B': B, 'C': C}
    out = MyOpt(params, X, Y, Z)
    outs.append(out)
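One likely reason for both symptoms: least-squares fitters such as LMfit's default method expect a residual array with at least as many entries as varying parameters, and they estimate derivatives with very small parameter steps. Because np.take needs integer indices, the cost here is presumably a step function of the parameters, so a tiny step changes nothing and the parameters appear frozen. The sketch below illustrates this on synthetic data and shows a derivative-free grid search as one alternative. Everything in it is an assumption for illustration: objective, true_lag, and the single lag parameter stand in for MyOpt and some_func(a, b, c), which the question does not show.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
true_lag = 7                    # hypothetical lag the search should recover
y = rng.standard_normal(n)      # synthetic stand-in for Y
x = np.roll(y, true_lag)        # synthetic stand-in for X: x[i] == y[i - true_lag]


def objective(lag, x, y):
    """Scalar cost: 1 - |corr| after re-indexing x by an integer offset."""
    offset = int(round(lag))    # indices must be integers, so the cost is a step function
    idx = np.clip(np.arange(len(x)) + offset, 0, len(x) - 1)
    return 1.0 - abs(np.corrcoef(x[idx], y)[0, 1])


# A derivative-sized step leaves the cost exactly unchanged.
flat = objective(3.0, x, y) == objective(3.0 + 1e-8, x, y)

# Derivative-free alternative: evaluate the scalar cost on a coarse grid.
lags = np.arange(-20, 21)
costs = np.array([objective(lag, x, y) for lag in lags])
best_lag = int(lags[np.argmin(costs)])
print(flat, best_lag)  # True 7
```

With three parameters the same idea applies to a grid over (a, b, c), which is what the loop in the question already does on a small scale.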

Foobar Lucky Triple

I am trying to solve the following problem:
Write a function solution(l) that takes a list of positive integers l and counts the number of "lucky triples" of (li, lj, lk) where the list indices meet the requirement i < j < k. The length of l is between 2 and 2000 inclusive. A "lucky triple" is a tuple (x, y, z) where x divides y and y divides z, such as (1, 2, 4). The elements of l are between 1 and 999999 inclusive. The solution fits within a signed 32-bit integer. Some of the lists are purposely generated without any access codes to throw off spies, so if no triples are found, return 0.
For example, [1, 2, 3, 4, 5, 6] has the triples: [1, 2, 4], [1, 2, 6], [1, 3, 6], making the solution 3 total.
My solution only passes the first two tests; I am trying to understand what is wrong with my approach rather than get the actual solution. Below is my function for reference:
def my_solution(l):
    from itertools import combinations
    if 2<len(l)<=2000:
        l = list(combinations(l, 3))
        l= [value for value in l if value[1]%value[0]==0 and value[2]%value[1]==0]
        #l= [value for value in l if (value[1]/value[0]).is_integer() and (value[2]/value[1]).is_integer()]
        if len(l)<0xffffffff:
            l= len(l)
            return l
    else:
        return 0
Do a nested iteration over the full list and the remaining part of the list, and check for each pair whether the earlier item divides the later one. Every such pair can act as the first and middle numbers of a triple, or as the middle and last numbers.
All you need to do is track, for each index, how many earlier items divide it and how many later items it divides. The number of triples with that index in the middle is the product of the two counts.
For example:
def my_solution(l):
    row1, row2 = [[0] * len(l) for i in range(2)]  # Tracks which indices pass modulus
    for i1, first in enumerate(l):
        for i2 in range(i1+1, len(l)):  # iterate the remaining portion of the list
            middle = l[i2]
            if not middle % first:  # check for matches
                row1[i2] += 1  # increment the index in the tracker lists..
                row2[i1] += 1  # for each matching pair
    result = sum([row1[i] * row2[i] for i in range(len(l))])
    # the final answer will be the sum of the products for each pair of values.
    return result
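As for what goes wrong with the original approach: its divisibility test is fine, but list(combinations(l, 3)) materialises every triple, and for 2000 elements that is roughly 1.3 billion tuples, so it probably runs out of time or memory on the larger tests. The sketch below is one way to check the pair-counting idea against a brute-force count on small inputs. The function names are hypothetical and not part of the challenge.

```python
from itertools import combinations


def count_triples_brute(l):
    """Reference count: test every index triple i < j < k (fine for small lists only)."""
    return sum(1 for x, y, z in combinations(l, 3) if y % x == 0 and z % y == 0)


def count_triples_pairs(l):
    """O(n^2) count: for each middle index, (divisors before) * (multiples after)."""
    n = len(l)
    divisors_before = [0] * n
    multiples_after = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if l[j] % l[i] == 0:
                divisors_before[j] += 1
                multiples_after[i] += 1
    return sum(d * m for d, m in zip(divisors_before, multiples_after))


print(count_triples_pairs([1, 2, 3, 4, 5, 6]))  # 3
print(count_triples_pairs([1, 1, 1]))           # 1
```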

Roll of different amount along a single axis in a 3D matrix [duplicate]

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print(np.array([np.roll(row, x) for row, x in zip(A, r)]))
[[0 0 4]
 [1 2 3]
 [0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure, you can do it using advanced indexing. Whether it is the fastest way probably depends on your array size (if your rows are large, it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
# Always use a non-negative shift, so that column_indices are valid.
# (could also use a modulo operation)
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]
result = A[rows, column_indices]
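The modulo variant mentioned in the comment can be sketched as follows. It is not the answer's own code: roll_rows is a hypothetical name, and the sketch assumes a 2D array with one shift per row. Taking the column indices modulo the row length keeps them valid for any shift and leaves the caller's r unmodified, whereas the snippet above adjusts r in place.

```python
import numpy as np


def roll_rows(A, r):
    """Roll each row of a 2D array by its own shift, like np.roll applied per row."""
    A = np.asarray(A)
    r = np.asarray(r)
    n = A.shape[1]
    # Output column j of row i reads input column (j - r[i]) mod n.
    cols = (np.arange(n)[np.newaxis, :] - r[:, np.newaxis]) % n
    return np.take_along_axis(A, cols, axis=1)


A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])
print(roll_rows(A, r))
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]
```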
numpy.lib.stride_tricks.as_strided stricks (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous np.lib.stride_tricks.as_strided. The idea is to take a slice from the first column up to the second-to-last one and concatenate it at the end of the array. This ensures that we can always stride in the forward direction with np.lib.stride_tricks.as_strided and thus avoid the need to actually roll backwards. That's the whole idea!
Now, in terms of actual implementation, we would use scikit-image's view_as_windows, which elegantly uses np.lib.stride_tricks.as_strided under the hood. Thus, the final implementation would be -
from skimage.util.shape import view_as_windows as viewW

def strided_indexing_roll(a, r):
    # Concatenate with sliced to cover all rolls
    a_ext = np.concatenate((a,a[:,:-1]),axis=1)

    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext,(1,n))[np.arange(len(r)), (n-r)%n,0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
     ...:               [1, 2, 3],
     ...:               [0, 0, 5]])
In [328]: r = np.array([2, 0, -1])
In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
[1, 2, 3],
[0, 5, 0]])
Benchmarking
# @seberg's solution
def advindexing_roll(A, r):
    rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
    r[r < 0] += A.shape[1]
    column_indices = column_indices - r[:,np.newaxis]
    return A[rows, column_indices]
Let's do some benchmarking on an array with a large number of rows and columns -
In [324]: np.random.seed(0)
     ...: a = np.random.rand(10000,1000)
     ...: r = np.random.randint(-1000,1000,(10000))
# @seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop
# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
In case you want a more general solution (dealing with any shape and with any axis), I modified @seberg's solution:
def indep_roll(arr, shifts, axis=1):
    """Apply an independent roll to each 1-D slice along a single axis.

    Parameters
    ----------
    arr : np.ndarray
        Array of any shape.
    shifts : np.ndarray
        Shift to apply to each slice. For a 2-D array with `axis=1`, shape `(arr.shape[0],)`.
    axis : int
        Axis along which elements are shifted.
    """
    arr = np.swapaxes(arr, axis, -1)
    # list(...) so the last index grid can be replaced below
    all_idcs = list(np.ogrid[tuple(slice(0, n) for n in arr.shape)])

    # Convert to a positive shift (note: modifies `shifts` in place)
    shifts[shifts < 0] += arr.shape[-1]
    all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]

    result = arr[tuple(all_idcs)]
    arr = np.swapaxes(result, -1, axis)
    return arr
I implemented a pure numpy.lib.stride_tricks.as_strided solution as follows:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def custom_roll(arr, r_tup):
    m = np.asarray(r_tup)
    arr_roll = arr[:, [*range(arr.shape[1]),*range(arr.shape[1]-1)]].copy() #need `copy`
    strd_0, strd_1 = arr_roll.strides
    n = arr.shape[1]
    result = as_strided(arr_roll, (*arr.shape, n), (strd_0 ,strd_1, strd_1))

    return result[np.arange(arr.shape[0]), (n-m)%n]

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])
out = custom_roll(A, r)
Out[789]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
By using a fast Fourier transform, we can apply the shift as a phase factor in the frequency domain and then use the inverse fast Fourier transform to obtain the shifted rows.
So this is a pure numpy solution that takes only one line. Because the result is rounded, it is meant for integer-valued data:
import numpy as np
from numpy.fft import fft, ifft
# The row shift function using the fast Fourier transform
# rshift(A,r) where A is a 2D array, r the row shift vector
def rshift(A,r):
    return np.real(ifft(fft(A,axis=1)*np.exp(2*1j*np.pi/A.shape[1]*r[:,None]*np.r_[0:A.shape[1]][None,:]),axis=1).round())
This will apply a left shift, but we can simply negate the sign of the exponent to turn the function into a right shift function:
ifft(fft(...)*np.exp(-2*1j...)
It can be used like that:
# Example:
A = np.array([[1,2,3,4],
[1,2,3,4],
[1,2,3,4]])
r = np.array([1,-1,3])
print(rshift(A,r))
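As a sanity check on the sign convention, the right-shift variant (negative exponent) can be compared against np.roll applied row by row. This is a sketch, with fft_roll_rows as a hypothetical name; it skips the rounding step, so it returns floats and the comparison uses np.allclose.

```python
import numpy as np
from numpy.fft import fft, ifft


def fft_roll_rows(A, r):
    """Right-shift each row of A by r[i] via the DFT shift theorem."""
    n = A.shape[1]
    k = np.arange(n)
    # Multiplying bin k by exp(-2j*pi*k*r/n) delays the row by r samples.
    phase = np.exp(-2j * np.pi * r[:, None] * k[None, :] / n)
    return np.real(ifft(fft(A, axis=1) * phase, axis=1))


A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
r = np.array([1, -1, 3])
rolled = fft_roll_rows(A, r)
expected = np.array([np.roll(row, s) for row, s in zip(A, r)])
print(np.allclose(rolled, expected))  # True
```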
Building on divakar's excellent answer, you can easily apply this logic to a 3D array (which was the problem that brought me here in the first place). Here's an example: basically, flatten your data, roll it, and reshape it afterwards:
import numpy as np
from skimage.util.shape import view_as_windows as viewW

def applyroll_30(cube, threshold=25, offset=500):
    flattened_cube = cube.copy().reshape(cube.shape[0]*cube.shape[1], cube.shape[2])
    roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
    rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
    rolled_cube = rolled_cube.reshape(cube.shape[0], cube.shape[1], cube.shape[2])
    return rolled_cube

def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
    """ Calculates the number of positions along the time axis by which we need
    to shift elements in order to trigger the data.
    We return a 1D numpy array of X*Y elements
    """
    # argmax(...) finds the first position along the time axis where we are above threshold
    roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
    # ensure we don't have index out of bound
    roll_matrix[roll_matrix>cube_flattened.shape[1]] = cube_flattened.shape[1]
    return roll_matrix

def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
    # Negate the shifts, otherwise we shift in the wrong direction for my application
    roll_matrix_flattened = -1 * roll_matrix_flattened
    # Concatenate with sliced to cover all rolls
    a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = cube_flattened.shape[1]
    result = viewW(a_ext,(1,n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
    result = result.reshape(cube_shape)
    return result

Divakar's answer doesn't do justice to how much more efficient this is on a large cube of data. I timed it on 400x400x2000 data stored as int8: an equivalent for-loop takes ~5.5 seconds, Seberg's answer ~3.0 seconds, and strided_indexing_roll ~0.5 seconds.

Hi, I am new to Python and was wondering how do I find the max value in my search algorithm?

Hi, I'm currently taking a discrete structures and algorithms course and have to work with Python for the first time, so I'm having a little trouble getting my function to find the max value in the list. Can you take a look at my code? I'm also trying to convert it to pseudocode:
def max_search(numbers):
    numbers = [1, 5, 9, 3, 4, 6]
    max = numbers = [0]
    for i in range(1, len(numbers)):
        if numbers[i] > max:
            max = numbers[i]
    max_search(numbers)
    print(max)
Use the built-in max function:
max(numbers)
When you write the code for finding the maximum number in a list, start by thinking of the base cases:
The maximum can be a pre-defined constant, say -1, if the list is empty.
The maximum is the first element in the list, if the list only has one element.
After that, if the list is longer, you assign the first element of the list as the maximum, and then you iterate through the list, updating the maximum whenever you find a number that is greater than it.
def max_search(numbers):
    #Maximum of an empty list is undefined, I defined it as -1
    if len(numbers) == 0:
        return -1
    #Maximum of a list with one element is the element itself
    if len(numbers) == 1:
        return numbers[0]
    max = numbers[0]
    #Iterate through the list and update maximum on the fly
    for num in numbers:
        if num >= max:
            max = num
    return max
In your case, you overwrite the numbers argument with another list, [1, 5, 9, 3, 4, 6], inside the function, and you recursively call the same function with the same arguments, which leads to infinite recursion (a RecursionError in Python).
I have made some changes:
def max_search(numbers):
    max = -1  # works if numbers contains only positive numbers
    for i in range(len(numbers)):
        if numbers[i] > max:
            max = numbers[i]
    return max

max = max_search([1, 5, 9, 3, 4, 6])
print(max)
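The versions above rely on -1, either as the starting value or as the result for an empty list, which gives a misleading answer when the list contains only numbers below -1. A minimal sketch of an alternative, assuming it is acceptable to raise an error for an empty list (as the built-in max does), starts from the first element instead. The name largest is used so that the built-in max is not shadowed.

```python
def max_search(numbers):
    """Return the largest element of a non-empty list using a linear scan."""
    if not numbers:
        raise ValueError("max_search() needs a non-empty list")
    largest = numbers[0]            # start from a value that is actually in the list
    for value in numbers[1:]:
        if value > largest:
            largest = value
    return largest


print(max_search([1, 5, 9, 3, 4, 6]))  # 9
print(max_search([-7, -3, -12]))       # -3
```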

Filter array by last value tolerance

I'm using Python 3.7.
I have an array like this:
L1 = [1,2,3,-10,8,12,300,17]
Now I want to filter the values (the -10 and the 300 are not okay).
The values in the array may be different, but they are always counting up or counting down.
Does Python 3 have a built-in function for that?
The result should look like this:
L1 = [1,2,3,8,12,17]
Thank you!
Edit from comments:
I want to keep each element only if it is within a certain distance (a tolerance, e.g. 10) of the one before.
Your array is a list. You can use built in functions:
L1 = [1,2,3,-10,8,12,300,17]
min_val = min(L1) # -10
max_val = max(L1) # 300
p = list(filter(lambda x: min_val < x < max_val, L1)) # all x not -10 or 300
print(p) # [1, 2, 3, 8, 12, 17]
Documentation:
min()
max()
filter()
If you instead want an incremental filter, go through your list of data points and decide for each one whether to keep it:
delta = 10
result = [L1[0]]  # the first value is always kept
for elem in L1[1:]:
    # keep elem only if it lies within delta of the last kept value
    if result[-1] - delta < elem < result[-1] + delta:
        result.append(elem)
print(result) # [1, 2, 3, 8, 12, 17]
