We have several "in_arrays" like
in_1=np.array([0.4,0.7,0.8,0.3])
in_2=np.array([0.9,0.8,0.6,0.4])
I need to create two outputs like
out_1=np.array([0,0,1,0])
out_2=np.array([1,1,0,0])
So, an element of an output array is 1 if the value at that position in the corresponding input array is greater than 0.5 AND greater than the values of the other input arrays at that position. What is an efficient way to do this?
You can aggregate all the input arrays in a single matrix, where each row represents a particular input array. That way it is possible to calculate all the output arrays again as a single matrix.
The code could look something like this:
import numpy as np
# input matrix corresponding to the example input arrays given in the question
in_matrix = np.array([[0.4,0.7,0.8,0.3], [0.9,0.8,0.6,0.4]])
out_matrix = np.zeros(in_matrix.shape)
# each element of max_values is the maximal value of the corresponding column in in_matrix
max_values = np.max(in_matrix, axis=0)
# compute the values in the output matrix row by row
for n, row in enumerate(in_matrix):
    out_matrix[n] = np.logical_and(row > 0.5, row == max_values)
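The row-by-row loop is not strictly needed, because the comparison against max_values broadcasts over every row. A minimal sketch of that variant, using the example arrays from the question (casting to int is an assumption about the wanted output type):

```python
import numpy as np

# input matrix built from the example arrays in the question
in_matrix = np.array([[0.4, 0.7, 0.8, 0.3],
                      [0.9, 0.8, 0.6, 0.4]])

# column-wise maximum; it broadcasts against every row of in_matrix
max_values = in_matrix.max(axis=0)

# 1 where the value exceeds 0.5 and is also the column maximum
out_matrix = ((in_matrix > 0.5) & (in_matrix == max_values)).astype(int)

out_1, out_2 = out_matrix
print(out_1)  # [0 0 1 0]
print(out_2)  # [1 1 0 0]
```

Note that if two arrays tie for the maximum at some position, both get a 1 there, which is the same behaviour as the loop version.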
I have a numpy matrix representing an rgb image. Its shape is (n,m,3) with n rows, m columns and 3 channels. I want to convert it to a list of rgb values along with the corresponding indices.
I can convert it to a list of rgb values, but I want to have the row and column indices alongside as well.
We can do something like this for rgb values only.
flat_image = np.reshape(image, [-1,3]) # shape = [mxn, 3]
After also adding the row and column numbers, the shape should be [mxn, 3+2],
so the first three columns in the flat image represent rgb, the fourth column represents the row number from the original image array and the fifth column represents the column number from the original image array.
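You can use numpy.indices to construct the row/column indices, reshape them to one (row, col) pair per pixel, and then concatenate that with your flat_image:
indices = np.indices(image.shape[:-1]).reshape(2, -1).T # shape = [mxn, 2]
result = np.concatenate([flat_image, indices], axis=-1) # shape = [mxn, 3+2]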
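To check the layout, here is a small self-contained sketch on a made-up 2x3 image (the pixel values are just np.arange, chosen for illustration). It uses np.column_stack, which is one possible way to append the flattened row and column indices:

```python
import numpy as np

# hypothetical 2x3 image with 3 channels; values are only for illustration
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)
n, m, _ = image.shape

flat_image = image.reshape(-1, 3)   # shape (n*m, 3)
rows, cols = np.indices((n, m))     # each has shape (n, m)

# append the row and column index of every pixel as two extra columns
result = np.column_stack([flat_image, rows.ravel(), cols.ravel()])

print(result.shape)  # (6, 5)
print(result[4])     # pixel at row 1, col 1 -> [12 13 14  1  1]
```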
I have a sorted array of distances, e.g.:
d = np.linspace(0.5, 50, 200)
and I want to iteratively get the indexes of adjacent elements that are at a distance smaller than or equal to L for each element of d.
Is there any simple way of doing it?
You can use numpy.argwhere after checking that consecutive differences are less than or equal to L:
import numpy as np
np.argwhere((d[1:]-d[:-1])<=L)
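As a small illustration of this idea, here is a sketch on a short made-up array with an assumed threshold L = 0.5 (neither comes from the question). It uses np.diff and np.flatnonzero, which give a flat index array, whereas np.argwhere returns a column of shape (k, 1):

```python
import numpy as np

# hypothetical sorted distances and threshold, chosen only for illustration
d = np.array([0.5, 1.0, 1.2, 3.0, 3.1, 3.3])
L = 0.5

gaps = np.diff(d)                # same as d[1:] - d[:-1]
idx = np.flatnonzero(gaps <= L)  # i such that d[i+1] - d[i] <= L

print(idx)  # [0 1 3 4]
```

Each index i here means that elements i and i+1 are within L of each other.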
What is the meaning of the lines below? I am especially confused about how iloc[:,1:] works, and also data[:,:1].
data = np.asarray(train_df_mv_norm.iloc[:,1:])
X, Y = data[:,1:],data[:,:1]
Here train_df_mv_norm is a dataframe.
Definition: pandas iloc
.iloc[] is primarily integer position based (from 0 to length-1 of the
axis), but may also be used with a boolean array.
For example:
df.iloc[:3] # slice your object, i.e. first three rows of your dataframe
df.iloc[0:3] # same
df.iloc[0, 1] # index both axes. Select the element from the first row, second column.
df.iloc[:, 0:5] # first five columns of data frame with all rows
So, train_df_mv_norm.iloc[:,1:] will select all rows and every column except the first one.
Note that:
df.iloc[:,:1] selects all rows and the columns from 0 (included) to 1 (excluded), i.e. only the first column.
df.iloc[:,1:] selects all rows and the columns from 1 onward, i.e. everything except the first column (column 0).
To complete the answer by KeyMaker00, I add that data[:,:1] means:
The first : - take all rows.
:1 - equal to 0:1 take columns starting from column 0,
up to (excluding) column 1.
So, to sum up, the second expression reads only the first column from data.
As your expression has the form:
<variable_list> = <expression_list>
each expression is substituted under the corresponding variable (X and Y).
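To see both slices at work, here is a minimal sketch with a small made-up dataframe standing in for train_df_mv_norm (the column names id, target, f1 and f2 are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for train_df_mv_norm
df = pd.DataFrame({
    "id":     [10, 11],
    "target": [0.0, 1.0],
    "f1":     [0.1, 0.2],
    "f2":     [0.3, 0.4],
})

data = np.asarray(df.iloc[:, 1:])  # drops the first column ("id"), shape (2, 3)
X, Y = data[:, 1:], data[:, :1]    # X: all but the first column of data, Y: only the first

print(X.shape)  # (2, 2)
print(Y.shape)  # (2, 1)
```

Because data[:, :1] is a slice rather than a single index, Y stays two-dimensional.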
Maybe this will complete the previous answers.
It shows:
what you get,
its shape,
how to use it with the column name.
df.iloc[:,1:2] # get column 1 as a DATAFRAME of shape (n, 1)
df.iloc[:,1:2].values # get column 1 as an NDARRAY of shape (n, 1)
df.iloc[:,1].values # get column 1 as an NDARRAY of shape ( n,)
df.iloc[:,1] # get column 1 as a SERIES of shape (n,)
# iloc with the name of a column
df.iloc[:, df.columns.get_loc('my_col')] # there may be more elegant methods
I would like to create a numpy array like the one below, except I want to be able to create a variable shape array. The one below would be n=3. Is there a slick way to do this with numpy, or do I need a for loop?
output data:
import numpy as np
np.array([[1,0,0],[0,1,0],[0,0,1],[1,1,1],[0,0,0]])
Say you want to create an array named d with row rows and col columns.
The following also initializes all elements of the array to 0.
d = [[0 for x in range(col)] for y in range(row)]
You can access any element i,j by d[i][j].
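For the specific pattern in the question (an identity block followed by a row of ones and a row of zeros), a loop-free numpy version is also possible. A minimal sketch, where the helper name build is made up for illustration:

```python
import numpy as np

def build(n):
    """Stack an n x n identity, a row of ones and a row of zeros (hypothetical helper)."""
    return np.vstack([
        np.eye(n, dtype=int),         # [1,0,0], [0,1,0], [0,0,1] for n=3
        np.ones((1, n), dtype=int),   # [1,1,1]
        np.zeros((1, n), dtype=int),  # [0,0,0]
    ])

print(build(3))
```

The result has shape (n+2, n), so build(3) reproduces the 5x3 array from the question.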
I have a function that is supposed to chain link a list of daily returns in a dataframe, but when I pass the column, the function returns a series rather than a float
def my_aggfunc(x):
    y = np.exp(np.log1p(x).cumsum())
    return y
If, however, I change the second line to be
np.sum(x)
this returns a float
Any ideas pls?
np.log1p(x) is an array.
np.log1p(x).cumsum() is another array of the same size.
np.exp(np.log1p(x).cumsum()) is yet another array.
I'm assuming you wanted sum rather than cumsum:
np.exp(np.log1p(x).sum())
From the np.exp docs:
Calculate the exponential of all elements in the input array.
Returns: out : ndarray Output array, element-wise exponential of x.
So y is an array.
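A quick sketch of the difference, on made-up daily returns (the numbers are for illustration only). Subtracting 1 at the end to get a return rather than a growth factor is an assumption about what "chain link" should produce:

```python
import numpy as np

# hypothetical daily returns
returns = np.array([0.01, -0.02, 0.03])

running = np.exp(np.log1p(returns).cumsum())  # array: growth factor after each day
growth = np.exp(np.log1p(returns).sum())      # scalar: growth factor over the whole period
total_return = growth - 1                     # assumed definition of the chained return

print(running.shape)  # (3,)
print(np.isclose(growth, np.prod(1 + returns)))  # True
```

The last element of the cumsum version equals the sum version, so the scalar is simply the final value of the running series.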