I would like to create a numpy array like the one below, except with a variable shape. For the one below, n=3. Is there a slick way to do this with numpy, or do I need a for loop?
Desired output:
import numpy as np
np.array([[1,0,0],[0,1,0],[0,0,1],[1,1,1],[0,0,0]])
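Say you want to create a list of lists named d with row rows and col columns.
It also initializes every element to 0.
row, col = 5, 3  # example dimensions
d = [[0 for x in range(col)] for y in range(row)]
You can access element (i, j) with d[i][j]. Note that this is a plain Python list, not a numpy array; np.zeros((row, col), dtype=int) creates the numpy equivalent.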
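For the original numpy question, a loop is not needed. A minimal sketch, assuming the pattern for a general n is an n x n identity block followed by one row of ones and one row of zeros (as in the n=3 example); build_pattern is a hypothetical helper name:

```python
import numpy as np

def build_pattern(n):
    """Hypothetical helper: n x n identity, then a row of ones, then a row of zeros."""
    return np.vstack([
        np.eye(n, dtype=int),         # rows with a single 1 on the diagonal
        np.ones((1, n), dtype=int),   # row of all ones
        np.zeros((1, n), dtype=int),  # row of all zeros
    ])

print(build_pattern(3))
```

np.vstack stacks the three blocks vertically, so the result has shape (n + 2, n) for any n.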
We have several "in_arrays" like
in_1=np.array([0.4,0.7,0.8,0.3])
in_2=np.array([0.9,0.8,0.6,0.4])
I need to create two outputs like
out_1=np.array([0,0,1,0])
out_2=np.array([1,1,0,0])
So a given element of an output array is 1 if the value in the corresponding input array is greater than 0.5 AND that value is greater than the values of the other arrays at the same position. What is an efficient way to do this?
You can aggregate all the input arrays in a single matrix, where each row represents a particular input array. That way it is possible to calculate all the output arrays again as a single matrix.
The code could look something like this:
import numpy as np
# input matrix corresponding to the example input arrays given in the question
in_matrix = np.array([[0.4,0.7,0.8,0.3], [0.9,0.8,0.6,0.4]])
out_matrix = np.zeros(in_matrix.shape, dtype=int)  # int, to match the expected 0/1 outputs
# each element in the array is the maximal value of the corresponding column in input_matrix
max_values = np.max(in_matrix, axis=0)
# compute the values in the output matrix row by row
for n, row in enumerate(in_matrix):
out_matrix[n] = np.logical_and(row > 0.5, row == max_values)
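The loop over rows can also be dropped: comparing in_matrix with max_values broadcasts the column maxima across every row. A sketch of that fully vectorized variant, using the same example inputs and casting the result to int to match the expected outputs:

```python
import numpy as np

in_matrix = np.array([[0.4, 0.7, 0.8, 0.3],
                      [0.9, 0.8, 0.6, 0.4]])

# max_values has shape (4,), so it broadcasts against each row of in_matrix
max_values = in_matrix.max(axis=0)
out_matrix = ((in_matrix > 0.5) & (in_matrix == max_values)).astype(int)

out_1, out_2 = out_matrix  # one output row per input array
print(out_1, out_2)  # [0 0 1 0] [1 1 0 0]
```

If two input arrays tie for the maximum in a column (and both exceed 0.5), both rows get a 1 in that column; the loop version behaves the same way.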
I have a pandas DataFrame mat. From this DataFrame, I want to count the cells whose value meets a condition (> 0.5 in this case). To do so, I used:
mat[mat[:] > 0.5].count().sum()
This seems like a lot of code for a simple application. Is this the most efficient way to get the count?
Call sum on the boolean mask to count the True values in each column:
(mat > 0.5).sum()
If you need the total over all columns:
np.sum(mat > 0.5).sum()
(mat > 0.5).sum().sum()
Or, if you convert the values to a 2D numpy array, np.sum returns a scalar directly:
np.sum(mat.to_numpy() > 0.5)
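A small self-contained check that these approaches agree with the one from the question, using a hypothetical 3 x 2 DataFrame in place of mat:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame `mat` from the question
mat = pd.DataFrame({"a": [0.1, 0.6, 0.9], "b": [0.7, 0.2, 0.4]})

mask = mat > 0.5                               # boolean DataFrame, True where value > 0.5
per_column = mask.sum()                        # Series: count of True per column
total = int(mask.sum().sum())                  # total across all columns
total_np = int(np.sum(mat.to_numpy() > 0.5))   # same total via a 2D ndarray
original = int(mat[mat > 0.5].count().sum())   # approach from the question

print(per_column.to_dict(), total, total_np, original)
```

Summing the mask directly avoids building the intermediate NaN-filled DataFrame that mat[mat > 0.5] creates.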
I have a sorted array of distances, e.g.:
d = np.linspace(0.5, 50, 200)
and I want to iteratively get the indexes of adjacent elements that are at a distance smaller than or equal to L for each element of d.
Is there any simple way of doing it?
You can use numpy.argwhere after checking that consecutive differences are less than or equal to L:
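import numpy as np
L = 1.0  # example threshold; d is the array defined above
# each row i of the result means d[i] and d[i+1] are at most L apart
np.argwhere((d[1:]-d[:-1])<=L)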
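The argwhere answer returns each index i for which the consecutive pair (i, i+1) is within L. If you instead want, for every element, all elements within L of it (not only its immediate neighbour), one option that relies on d being sorted is np.searchsorted. A sketch, assuming an illustrative threshold L = 1.0:

```python
import numpy as np

d = np.linspace(0.5, 50, 200)  # sorted distances, as in the question
L = 1.0                        # illustrative threshold (assumption)

# Consecutive pairs (i, i+1) with d[i+1] - d[i] <= L, as a 1D index array
pair_starts = np.flatnonzero(np.diff(d) <= L)

# Because d is sorted, the elements within L of d[i] form one contiguous
# block d[lo[i]:hi[i]]; searchsorted finds both ends for every i at once.
lo = np.searchsorted(d, d - L, side="left")
hi = np.searchsorted(d, d + L, side="right")

def neighbours(i):
    """Indices j != i with |d[j] - d[i]| <= L."""
    return [j for j in range(lo[i], hi[i]) if j != i]

print(neighbours(10))  # [6, 7, 8, 9, 11, 12, 13, 14]
```

Both searchsorted calls are vectorized, so the per-element ranges cost O(n log n) overall instead of comparing every pair.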
What do the lines below mean? I'm especially confused about how iloc[:,1:] works, and also data[:,:1].
data = np.asarray(train_df_mv_norm.iloc[:,1:])
X, Y = data[:,1:],data[:,:1]
Here train_df_mv_norm is a DataFrame.
Definition: pandas iloc
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
For example:
df.iloc[:3] # slice your object, i.e. first three rows of your dataframe
df.iloc[0:3] # same
df.iloc[0, 1] # index both axes. Select the element from the first row, second column.
df.iloc[:, 0:5] # first five columns of data frame with all rows
So train_df_mv_norm.iloc[:,1:] selects all rows and every column except the first.
Note that:
df.iloc[:,:1] selects all rows and only column 0 (columns from 0, included, to 1, excluded).
df.iloc[:,1:] selects all rows and every column from 1 onward, i.e. it excludes column 0 (the first column).
To complement the answer by KeyMaker00, data[:,:1] means:
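The first : takes all rows.
:1 (equivalent to 0:1) takes columns starting from column 0 up to, but excluding, column 1.
So the second expression reads only the first column of data, as a 2D array of shape (n, 1). Since data was itself built with iloc[:,1:], that column is the second column of train_df_mv_norm.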
As your expression has the form:
<variable_list> = <expression_list>
the value of each expression is assigned to the corresponding variable (X and Y).
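To make the two slices concrete, here is a sketch with a hypothetical four-column DataFrame standing in for train_df_mv_norm (the column names id, target, f1 and f2 are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df_mv_norm
train_df_mv_norm = pd.DataFrame({
    "id":     [1, 2, 3],
    "target": [0.1, 0.2, 0.3],
    "f1":     [10.0, 20.0, 30.0],
    "f2":     [100.0, 200.0, 300.0],
})

# Drop the first column ("id"): data holds target, f1, f2 as a 2D ndarray
data = np.asarray(train_df_mv_norm.iloc[:, 1:])

# X: all rows, columns 1 onward of data (f1, f2)
# Y: all rows, column 0 of data only (target), kept 2D with shape (n, 1)
X, Y = data[:, 1:], data[:, :1]

print(X.shape, Y.shape)  # (3, 2) (3, 1)
```

In this layout Y is the target column and X holds the remaining feature columns.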
This may complement the answers above. It shows what you get, its shape, and how to select a column by its name:
df.iloc[:,1:2] # get column 1 as a DATAFRAME of shape (n, 1)
df.iloc[:,1:2].values # get column 1 as an NDARRAY of shape (n, 1)
df.iloc[:,1].values # get column 1 as an NDARRAY of shape ( n,)
df.iloc[:,1] # get column 1 as a SERIES of shape (n,)
# iloc with the name of a column
df.iloc[:, df.columns.get_loc('my_col')] # simpler alternatives: df['my_col'] or df.loc[:, 'my_col']
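A quick way to confirm the shapes listed above, using a small hypothetical DataFrame (column names invented). It uses .to_numpy(), which current pandas documentation recommends over .values:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "my_col": [4, 5, 6], "c": [7, 8, 9]})

as_frame  = df.iloc[:, 1:2]                           # DataFrame, shape (3, 1)
as_2d     = df.iloc[:, 1:2].to_numpy()                # ndarray, shape (3, 1)
as_1d     = df.iloc[:, 1].to_numpy()                  # ndarray, shape (3,)
as_series = df.iloc[:, 1]                             # Series, shape (3,)
by_name   = df.iloc[:, df.columns.get_loc("my_col")]  # Series, same as df["my_col"]

for obj in (as_frame, as_2d, as_1d, as_series, by_name):
    print(type(obj).__name__, obj.shape)
```

The rule of thumb: a slice (1:2) keeps the column dimension, while a single integer (1) drops it.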
I have a function that is supposed to chain-link a list of daily returns in a DataFrame, but when I pass the column, the function returns a Series rather than a float.
def my_aggfunc(x):
y = np.exp(np.log1p(x).cumsum())
return y
If, however, I change the second line to
np.sum(x)
it returns a float.
Any ideas?
np.log1p(x) is an array.
np.log1p(x).cumsum() is another array of the same size.
np.exp(np.log1p(x).cumsum()) is yet another array.
I'm assuming you didn't want cumsum; you wanted sum:
np.exp(np.log1p(x).sum())
From the np.exp docs:
Calculate the exponential of all elements in the input array.
Returns: out : ndarray Output array, element-wise exponential of x.
So y is an array (a Series, when x is a pandas Series).
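Putting it together: a sketch of the corrected function applied to a hypothetical column of daily returns. Since exp(sum(log(1 + r))) equals the product of (1 + r), np.prod(1 + x) is an equivalent form. The result is the cumulative growth factor; subtract 1 if you want the chain-linked return itself:

```python
import numpy as np
import pandas as pd

def my_aggfunc(x):
    # sum (not cumsum) collapses the log-returns to one number, so exp gives a scalar
    return np.exp(np.log1p(x).sum())

# Hypothetical daily returns
df = pd.DataFrame({"ret": [0.01, -0.02, 0.03]})

growth = my_aggfunc(df["ret"])                 # scalar growth factor
same = np.prod(1 + df["ret"])                  # equivalent product form
path = np.exp(np.log1p(df["ret"]).cumsum())    # original cumsum version: a Series
print(growth, same, path.iloc[-1])
```

The cumsum version is still useful when you want the running growth path: its last element equals the scalar result.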