Iterating over a list of arrays to use as input in a function - python-3.x

In the code below, how can I iterate over y to get all 5 groups of 2 arrays each to use as input to func?
I know I could just do:
func(y[0], y[1])
func(y[2], y[3])
...etc.
But I can't write those calls out by hand, because y can contain hundreds of arrays.
import numpy as np
import itertools
# creating an array with 100 samples
array = np.random.rand(100)
# making the array an iterator
iter_array = iter(array)
# creating a list of lists to store 10 lists of 10 elements each
n = 10
result = [[] for _ in range(n)]
# filling the lists round-robin from the iterator
for _ in itertools.repeat(None, 10):
    for i in range(n):
        result[i].append(next(iter_array))
# casting the lists to arrays
y = np.array([np.array(xi) for xi in result], dtype=object)
# list to store the results of the calculation below
result_func = []
# applying a function that takes 2 arrays as input
# I have 10 arrays within y, so I need to perform the function below 5 times: [0,1],[2,3],[4,5],[6,7],[8,9]
a = func(y[0], y[1])
# saving the result
result_func.append(a)

You could use a list comprehension:
result_func = [func(y[i], y[i+1]) for i in range(0, 10, 2)]
or a general for loop:
for i in range(0, 10, 2):
    result_func.append(func(y[i], y[i+1]))

Because of numpy's fill order when reshaping, you could reshape the array to have:
- a variable depth (depending on the number of arrays),
- a height of two,
- the same width as the number of elements in each input row.
Thus, when filling, numpy will fill two rows before needing to increase the depth by one.
Iterating over this array yields a series of matrices (one for each depthwise layer). Each matrix has two rows, which come out to be y[0] and y[1], then y[2] and y[3], and so on.
For example's sake, say the inner arrays each have length 6 and there are 8 of them in total (so that there are 4 function calls):
import numpy as np
elems_in_row = 6
y = np.array([
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
    [25, 26, 27, 28, 29, 30],
    [31, 32, 33, 34, 35, 36],
    [37, 38, 39, 40, 41, 42],
    [43, 44, 45, 46, 47, 48],
])
# the `-1` makes the number of pairs (the depth) be inferred from the input array
y2 = y.reshape((-1, 2, elems_in_row))
for ar1, ar2 in y2:
    print("1st:", ar1)
    print("2nd:", ar2)
    print("")
output:
1st: [1 2 3 4 5 6]
2nd: [ 7 8 9 10 11 12]
1st: [13 14 15 16 17 18]
2nd: [19 20 21 22 23 24]
1st: [25 26 27 28 29 30]
2nd: [31 32 33 34 35 36]
1st: [37 38 39 40 41 42]
2nd: [43 44 45 46 47 48]
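Tying this back to the question, the reshaped pairs can then be fed straight to func (a sketch; func is the asker's function, assumed to accept two 1-D arrays):
result_func = [func(ar1, ar2) for ar1, ar2 in y2]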
As a sidenote, if your function outputs simple values (like integers or floats) and has no side effects like IO, it may be possible to use apply_along_axis to create the output array directly, without explicitly iterating over the pairs.
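A minimal sketch of that idea, continuing the example above and assuming func returns a scalar (the func below is a hypothetical stand-in): flatten each pair into a single row, then split the row again inside a wrapper:
def func(a, b):  # hypothetical stand-in for the real function
    return a.sum() + b.sum()
pairs_flat = y.reshape(-1, 2 * elems_in_row)  # one pair per row, length 12
def wrapper(row):
    return func(row[:elems_in_row], row[elems_in_row:])
result = np.apply_along_axis(wrapper, 1, pairs_flat)  # shape: (number of pairs,)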

Related

How to extract and remove few range of indices from a two dimensional numpy array in python

I am stuck on a problem and need to resolve it. I have created a two-dimensional matrix from a continuous range of values. Now I want to extract a few ranges of indices from that 2D matrix. Suppose I have a matrix like:
a = [[ 12   4  35   0  26  15 100]
     [ 17  37  29  87  46  95 120]]
Now I want to delete some parts based on indices, for example index numbers 2 to 5 and 8:10. After deleting, I want to return my array with the same two dimensions. Thank you in advance.
I have tried many ways, like numpy stacking and concatenating, but I cannot solve the problem.
Deleting columns of a numpy array is relatively straightforward.
Using a corrected example from the question, it looks like this:
import numpy as np
a = np.array([
    [12, 4, 35, 0, 26, 15, 100],
    [17, 37, 29, 87, 46, 95, 120]])
print('first array:')
print(a)
# deletes columns 2 and 3 (the third argument, 1, is the axis)
b = np.delete(a, [2, 3], 1)
print('second array:')
print(b)
which gives this:
first array:
[[ 12   4  35   0  26  15 100]
 [ 17  37  29  87  46  95 120]]
second array:
[[ 12   4  26  15 100]
 [ 17  37  46  95 120]]
So columns 2 and 3 have been removed in the above example.
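If several ranges of column indices need to go at once (the question mentions 2 to 5 and 8:10), np.r_ can build the combined index list. A sketch, assuming those are column indices of a hypothetical wider array:
import numpy as np
a = np.arange(24).reshape(2, 12)  # hypothetical example with 12 columns
# np.r_[2:5, 8:10] concatenates the ranges into [2, 3, 4, 8, 9]
b = np.delete(a, np.r_[2:5, 8:10], axis=1)
print(b.shape)  # (2, 7): five columns removed, still two-dimensional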

Generate conditional lists of lists in Pandas, "Pythonically"

I want to generate a conditional list of lists. The number of embedded lists is determined by the number of unique conditions, and each embedded list contains values from a given condition.
I can generate this list of lists using a for-loop; see the code below. However, I am looking for a faster and more Pythonic (i.e., no for-loop) approach.
import pandas as pd
from random import randint
example_conditions = ["A", "A", "B", "B", "B", "C", "D", "D", "D", "D"]
example_values = [randint(-100, 100) for _ in example_conditions]
df = pd.DataFrame({
    "conditions": example_conditions,
    "values": example_values
})
lol = []
for condition in df["conditions"].unique():
    sublist = df.loc[df["conditions"] == condition]["values"].values.tolist()
    lol.append(sublist)
Thanks!
Try:
x = df.groupby("conditions")["values"].agg(list).to_list()
print(x)
Prints:
[[-1, 78], [33, 74, -79], [59], [-32, -2, 52, -66]]
Input dataframe:
  conditions  values
0          A      -1
1          A      78
2          B      33
3          B      74
4          B     -79
5          C      59
6          D     -32
7          D      -2
8          D      52
9          D     -66
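One caveat: groupby sorts the group keys by default, whereas the loop in the question follows the order of first appearance from df["conditions"].unique(). The two orders coincide here only because the conditions happen to appear alphabetically; if they might not, pass sort=False to preserve appearance order:
x = df.groupby("conditions", sort=False)["values"].agg(list).to_list()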

How to convert a list of elements to an n*n space-separated arrangement, where n*n is the number of elements in the list

This is my list:
N = 9
Mylist = [9, 8, 7, 6, 5, 4, 3, 2, 1]
For this input, the output should be:
9 8 7
6 5 4
3 2 1
It sounds like you're wondering how to turn a list into a numpy array of a particular shape; the relevant documentation is numpy.reshape.
import numpy as np
my_list = [3, 9, 8, 7, 6, 5, 4, 3, 2, 1]
# dropping the first item of the list as it isn't used in the desired output
array = np.array(my_list[1:]).reshape((3, 3))
print(array)
Output
[[9 8 7]
[6 5 4]
[3 2 1]]
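If the side length should be derived from the list itself rather than hard-coded, math.isqrt can compute it (a sketch assuming the length is a perfect square); joining each row also gives the space-separated output the question shows:
import math
import numpy as np
my_list = [9, 8, 7, 6, 5, 4, 3, 2, 1]
n = math.isqrt(len(my_list))  # 3 for a 9-element list
for row in np.array(my_list).reshape(n, n):
    print(" ".join(map(str, row)))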

Performing pair-wise comparisons of some pandas dataframe rows as efficiently as possible

For a given pandas dataframe df, I would like to compare every sample (row) with every other one.
For bigger datasets this would lead to too many comparisons (n**2). Therefore, it is necessary to perform these comparisons only within smaller groups (i.e., for all rows which share the same id) and as efficiently as possible.
I would like to construct a dataframe (df_pairs) which contains one pair in every row. Additionally, I would like to get all pair indices (ideally as a Python set).
First, I construct an example dataframe:
import numpy as np
import pandas as pd
from functools import reduce
from itertools import product, combinations
n_samples = 10_000
suffixes = ["_1", "_2"] # for df_pairs
id_str = "id"
df = pd.DataFrame({id_str: np.random.randint(0, 10, n_samples),
                   "A": np.random.randint(0, 100, n_samples),
                   "B": np.random.randint(0, 100, n_samples),
                   "C": np.random.randint(0, 100, n_samples)}, index=range(0, n_samples))
columns_df_pairs = ([elem + suffixes[0] for elem in df.columns] +
                    [elem + suffixes[1] for elem in df.columns])
In the following, I am comparing 4 different options with the corresponding performance measures:
Option 1
groups = df.groupby(id_str).groups # get the groups
pairs_per_group = [set(product(elem.tolist(), repeat=2)) for _, elem in groups.items()] # determine pairs per group
set_of_pairs = reduce(set.union, pairs_per_group) # convert all groups into one set
idcs1, idcs2 = zip(*[(e1, e2) for e1, e2 in set_of_pairs])
df_pairs = pd.DataFrame(np.hstack([df.values[idcs1, :], df.values[idcs2, :]]),  # construct the dataframe of pairs
                        columns=columns_df_pairs,
                        index=pd.MultiIndex.from_tuples(set_of_pairs, names=('index 1', 'index 2')))
df_pairs.drop([id_str + suffixes[0], id_str + suffixes[1]], inplace=True, axis=1)
Option 1 takes 34.2 s ± 1.28 s.
Option 2
groups = df.groupby(id_str).groups # get the groups
pairs_per_group = [np.array(np.meshgrid(elem.values, elem.values)).T.reshape(-1, 2) for _, elem in groups.items()]
idcs = np.unique(np.vstack(pairs_per_group), axis=0)
df_pairs2 = pd.DataFrame(np.hstack([df.values[idcs[:, 0], :], df.values[idcs[:, 1], :]]),  # construct the dataframe of pairs
                         columns=columns_df_pairs,
                         index=pd.MultiIndex.from_arrays([idcs[:, 0], idcs[:, 1]], names=('index 1', 'index 2')))
df_pairs2.drop([id_str + suffixes[0], id_str + suffixes[1]], inplace=True, axis=1)
Option 2 takes 13 s ± 1.34 s.
Option 3
groups = df.groupby(id_str).groups # get the groups
pairs_per_group = [np.array([np.tile(elem.values, len(elem.values)), np.repeat(elem.values, len(elem.values))]).T.reshape(-1, 2) for _, elem in groups.items()]
idcs = np.unique(np.vstack(pairs_per_group), axis=0)
df_pairs3 = pd.DataFrame(np.hstack([df.values[idcs[:, 0], :], df.values[idcs[:, 1], :]]),  # construct the dataframe of pairs
                         columns=columns_df_pairs,
                         index=pd.MultiIndex.from_arrays([idcs[:, 0], idcs[:, 1]], names=('index 1', 'index 2')))
df_pairs3.drop([id_str + suffixes[0], id_str + suffixes[1]], inplace=True, axis=1)
Option 3 takes 12.1 s ± 347 ms.
Option 4
df_pairs4 = pd.merge(left=df, right=df, how="inner", on=id_str, suffixes=suffixes)
# here, I do not know how to get the MultiIndex in
df_pairs4.drop([id_str], inplace=True, axis=1)
Option 4 is computed the quickest with 1.41 s ± 239 ms. However, I do not have the paired indices in this case.
I could improve the performance a little by using itertools.combinations instead of product. I could also build the comparison matrix, use only its upper triangle, and construct my dataframe from there. However, this does not seem to be more efficient than performing the cartesian product and removing the self-references as well as the inverse comparisons (a, b) = (b, a).
Could you tell me a more efficient way to get the pairs for comparison (ideally as a set, to be able to use set operations)?
Could I use merge or another pandas function to construct my desired dataframe with the multi-indices?
An inner merge will discard the original index in favor of a new default index. If the index is important, bring it along as columns via reset_index, then set those columns back as the index.
df_pairs4 = (pd.merge(left=df.reset_index(), right=df.reset_index(),
                      how="inner", on=id_str, suffixes=suffixes)
             .set_index(['index_1', 'index_2']))
                 id  A_1  B_1  C_1  A_2  B_2  C_2
index_1 index_2
0       0         4   92   79   10   92   79   10
        13        4   92   79   10   83   68   69
        24        4   92   79   10   67   73   90
        25        4   92   79   10   22   31   35
        36        4   92   79   10   64   44   20
...              ..  ...  ...  ...  ...  ...  ...
9993    9971      7   20   65   92   47   65   21
        9977      7   20   65   92   50   35   27
        9980      7   20   65   92   43   36   62
        9992      7   20   65   92   99    2   17
        9993      7   20   65   92   20   65   92
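Since the question also asks for the pair indices as a Python set, one option (a sketch) is to read them straight off the resulting MultiIndex:
set_of_pairs = set(df_pairs4.index)  # set of (index_1, index_2) tuples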

Index a 2D matrix with a 1D matrix

I have a 2D matrix of values named matrix1 as shown below:
col1 col2 col3
1    1    0
2    1    2
I have a 1D matrix of values named arr1 as shown below:
col1
10
20
30
I would like to use values from this 2D matrix to index values from a 1D matrix, creating a new 2D matrix in the process.
new_col1 new_col2 new_col3
20       20       10
30       20       30
The actual array is shaped (512, 1) and the matrix is shaped (65672, 720). I have tried using arr1[matrix1], but I end up getting a memory error.
A Python 3 solution:
import numpy as np
x = np.array([[1, 1, 0], [2, 1, 2]])
y = np.array([10, 20, 30])
y[x]
Output:
array([[20, 20, 10],
       [30, 20, 30]])
As it turned out, I was using a 32-bit Python interpreter instead of a 64-bit one (I am using a virtual environment in PyCharm); switching the interpreter to 64-bit fixed the memory error.
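A further note on shapes: if arr1 really is shaped (512, 1) rather than (512,), indexing it with a (65672, 720) matrix yields a (65672, 720, 1) result; flattening first avoids the extra axis. The full-size float64 result is already about 65672 * 720 * 8 bytes, roughly 360 MiB, which a 32-bit process can struggle to allocate. A sketch with hypothetical random data:
import numpy as np
rng = np.random.default_rng(0)
arr1 = rng.random((512, 1))                        # values, shaped (512, 1) as in the question
matrix1 = rng.integers(0, 512, size=(65672, 720))  # index matrix
out = arr1.ravel()[matrix1]  # shape (65672, 720); without ravel it would be (65672, 720, 1)
print(out.shape, out.nbytes // 2**20, "MiB")  # .astype(np.float32) would halve this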
