I have a function that is supposed to chain-link a list of daily returns in a DataFrame, but when I pass it the column, the function returns a Series rather than a float:
def my_aggfunc(x):
    y = np.exp(np.log1p(x).cumsum())
    return y
If, however, I change the second line to
np.sum(x)
this returns a float.
Any ideas, please?
np.log1p(x) is an array.
np.log1p(x).cumsum() is another array of the same size.
np.exp(np.log1p(x).cumsum()) is yet another array.
I'm assuming you wanted sum rather than cumsum:
np.exp(np.log1p(x).sum())
From the np.exp docs:
Calculate the exponential of all elements in the input array.
Returns: out : ndarray Output array, element-wise exponential of x.
So y is an array.
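As a minimal sketch of the sum-based fix above (the function name chain_link and the sample returns are made up for illustration): summing the log1p values and exponentiating once yields a single number, the compounded growth factor. Subtracting 1 from it gives the chain-linked return, if that is what is wanted.

```python
import numpy as np

def chain_link(returns):
    """Compound simple periodic returns into one total growth factor."""
    # log1p(r) = log(1 + r); summing the logs and exponentiating once
    # is equivalent to multiplying all the (1 + r) terms together.
    growth = np.exp(np.log1p(returns).sum())
    return float(growth)

daily = np.array([0.01, -0.02, 0.03])  # made-up sample returns
growth = chain_link(daily)             # a single float, not an array
total_return = growth - 1.0            # compounded return over the period
```

Because sum reduces the array to one value, this can be used as an aggregation function, whereas cumsum keeps one value per input row.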
We have several "in_arrays" like
in_1=np.array([0.4,0.7,0.8,0.3])
in_2=np.array([0.9,0.8,0.6,0.4])
I need to create two outputs like
out_1=np.array([0,0,1,0])
out_2=np.array([1,1,0,0])
So, a given element of an output array is 1 if the value at that position in the corresponding input array is greater than 0.5 AND greater than the values of the other input arrays at that position. What is an efficient way to do this?
You can aggregate all the input arrays in a single matrix, where each row represents a particular input array. That way it is possible to calculate all the output arrays again as a single matrix.
The code could look something like this:
import numpy as np
# input matrix corresponding to the example input arrays given in the question
in_matrix = np.array([[0.4,0.7,0.8,0.3], [0.9,0.8,0.6,0.4]])
out_matrix = np.zeros(in_matrix.shape)
# each element is the maximal value of the corresponding column in in_matrix
max_values = np.max(in_matrix, axis=0)
# compute the values in the output matrix row by row
for n, row in enumerate(in_matrix):
    out_matrix[n] = np.logical_and(row > 0.5, row == max_values)
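The row-by-row loop above can probably be dropped, since broadcasting compares every row against the column maxima in one step. This is a sketch using the example inputs from the question, not a benchmarked solution:

```python
import numpy as np

# Each row is one input array from the question.
in_matrix = np.array([[0.4, 0.7, 0.8, 0.3],
                      [0.9, 0.8, 0.6, 0.4]])

# Column-wise maximum, shape (4,); it broadcasts against every row.
max_values = in_matrix.max(axis=0)

# 1 where the value exceeds 0.5 AND is the largest in its column.
out_matrix = ((in_matrix > 0.5) & (in_matrix == max_values)).astype(int)

out_1, out_2 = out_matrix
```

Note that if two input arrays tie for the maximum at some position, both get a 1 there, which is the same behavior as the loop version.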
I have two lists with repeating values, and I want to take the intersection of the repeated values along with the values that occur only once in either of the lists.
I am just a beginner and would love to hear simple suggestions!
Method 1
l1=[1,2,3,4]
l2=[1,2,5]
intersection=[value for value in l1 if value in l2]
for x in l1+l2:
    if x not in intersection:
        intersection.append(x)
print(intersection)
Method 2
print(list(set(l1+l2)))
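One caveat with Method 2 is that a set does not guarantee any particular order. If the order of first appearance matters, a possible variant (a sketch, relying on dicts preserving insertion order in Python 3.7+) is:

```python
l1 = [1, 2, 3, 4]
l2 = [1, 2, 5]

# dict keys are unique and keep first-seen order,
# so this removes duplicates without reordering.
combined = list(dict.fromkeys(l1 + l2))
print(combined)
```

This gives the same elements as both methods above, in the order they first appear in l1 followed by l2.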
The data from my files is stored in 4D arrays in Python of shape (64,128,64,3). The code I run uses a grid format, so the shape tells us that there are 64 cells in x, 128 in y, and 64 in z. The 3 is for the x, y, and z components of velocity. What I want to do is compute the average x velocity in each direction for every cell in y.
Let's start in the corner of my grid. I want the first element of my average array to be the average of the x velocity of all the x cells and all the z cells in position y[0]. The next element should be the same, but for y[1]. The end result should be an array of shape (128).
I'm fairly new to Python, so I could be missing something simple, but I don't see a way to do this with one np.mean statement, because you need to sum over two axes (in this case, 1 and 2, I think). I tried
velx_avg = np.mean(ds['u'][:,:,:,0],axis=1)
Here, ds is the data set I've loaded in, and the module I've used to load it stores the velocity data under 'u'. This gave me an array of shape (64,64).
What is the most efficient way to produce the result that I want?
You can use the flatten method to make your life much easier here; it takes an np.ndarray and flattens it into one dimension.
The challenge here is pinning down your definition of 'efficient', but you can play around with that yourself. To do what you want, I simply iterate over the y index, flatten the x and z components into one contiguous array, and then take the mean of that; see below:
velx_avg = np.mean([ds['u'][:, i, :, 0].flatten() for i in range(128)], axis=1)
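As an alternative to the list comprehension, np.mean accepts a tuple of axes, so the x and z axes (0 and 2) can be averaged in a single call. The sketch below uses a small random array in place of ds['u'], whose loader is not shown in the question:

```python
import numpy as np

# Stand-in for ds['u']: axes are (x, y, z, velocity component).
rng = np.random.default_rng(0)
u = rng.random((4, 6, 5, 3))

# Select the x component, then average over the x and z axes at once.
velx_avg = u[..., 0].mean(axis=(0, 2))

# Loop-based equivalent, kept only to cross-check the result.
loop_avg = np.array([u[:, i, :, 0].mean() for i in range(u.shape[1])])
```

With the real data of shape (64,128,64,3), the same call should give an array of shape (128,).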
I would like to create a NumPy array like the one below, except that I want to be able to vary its shape. The one below is for n=3. Is there a slick way to do this with NumPy, or do I need a for loop?
output data:
import numpy as np
np.array([[1,0,0],[0,1,0],[0,0,1],[1,1,1],[0,0,0]])
Say you want to create an array named d, with the number of rows given by row and the number of columns given by col.
The following also initializes all elements of the array to 0.
d = [[0 for x in range(col)] for y in range(row)]
You can access any element (i, j) with d[i][j].
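For the specific layout in the question (an n-by-n identity block, then a row of ones, then a row of zeros), one possible loop-free construction stacks three NumPy pieces. The function name below is made up for illustration:

```python
import numpy as np

def identity_ones_zeros(n):
    """Stack an n x n identity, a row of ones, and a row of zeros."""
    return np.vstack([
        np.eye(n, dtype=int),        # the [1,0,0], [0,1,0], ... rows
        np.ones((1, n), dtype=int),  # the all-ones row
        np.zeros((1, n), dtype=int), # the all-zeros row
    ])

arr = identity_ones_zeros(3)
print(arr)
```

The result has shape (n + 2, n), so changing n changes the whole array accordingly.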
I have this function:
def averageRating(a,b):
    avg = (float(a)+float(b))/2
    return round(avg/25)*25
Currently, I am looping over my NumPy array, which is just a 2D array of numerical values. What I want is for "a" to be the 1st array and "b" to be the 2nd array, to get the average per row, and to return just an array with the values. I have used mean but could not find a way to adapt it to include the round() part, i.e. round(avg/25)*25.
My goal is to get rid of the loop and replace it with vectorized operations, because looping is so slow.
Sorry for the question; I'm new to Python and NumPy.
def averageRating(a,b):
    avg = (np.average(a,axis=1) + np.average(b,axis=1))/2
    return np.round(avg,0)
This should do what you are looking for if I understand the question correctly. Specifying axis = 1 in np.average will give the average of the rows (axis = 0 would be the average of the columns). And the 0 in np.round will round to 0 decimal places, changing it will change the number of decimal places you round to. Hope that helps!
def averageRating(a, b):
    averages = []
    for i in range(len(a)):
        averages.append((a[i] + b[i]) / 2)
    return averages
Given that your arrays are of equal length, this should be a simple solution.
This doesn't eliminate the for loop; however, it should be computationally cheaper than the current approach.
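Neither answer above reproduces the round-to-the-nearest-25 step from the original function. A minimal vectorized sketch, assuming a and b are equal-length arrays to be averaged element by element (the name average_rating and the sample values are made up):

```python
import numpy as np

def average_rating(a, b):
    """Element-wise mean of a and b, rounded to the nearest multiple of 25."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    avg = (a + b) / 2
    # np.round rounds halves to the nearest even value, like Python 3's round()
    return np.round(avg / 25) * 25

a = np.array([10, 60, 100])  # made-up sample ratings
b = np.array([30, 80, 50])
rounded = average_rating(a, b)  # the raw averages are 20, 70 and 75
```

If a and b are instead 2D and a per-row average is wanted first, the row means from np.average(..., axis=1) could be passed through the same rounding expression.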