Parallel iteration with for loop - python-3.x

I have 3 lists of the same size. I tried to loop through those lists with a for loop, but I always get the following error. My question is how to iterate over them in parallel within a for loop.
a=list[...]
b=list[...]
c=list[...]
arrayList=[a,b,c]
for x,y,z in a,b,c:
    do something
or
for x,y,z in arrayList:
    do something
Error:
ValueError: too many values to unpack (expected 3)

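Your loops iterate over the lists themselves (a, then b, then c), so each step tries to unpack a whole list into x, y, z, which fails whenever a list does not have exactly 3 elements. You should probably use zip(), which creates tuples of same-index elements from the given collections: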
>>> xs = [1,2,3,4]
>>> ys= [5,6,7,8]
>>> zs = [9,10,11,12]
>>> for x, y, z in zip(xs,ys,zs):
...     print(x,y,z)
the output here is:
1 5 9
2 6 10
3 7 11
4 8 12
>>>
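The question also stores the lists in a single container (arrayList). A minimal sketch of how zip() could handle that case, assuming all inner lists have the same length; the sample values and the names array_list and rows are only illustrative:

```python
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = [9, 10, 11, 12]
array_list = [a, b, c]

# The * operator unpacks the outer list, so this is equivalent to zip(a, b, c)
rows = []
for x, y, z in zip(*array_list):
    rows.append((x, y, z))

print(rows)
```

Note that zip() stops at the shortest input, so lists of unequal length would be silently truncated.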

Related

Compare value in a dataframe to multiple columns of another dataframe to get a list of lists where entries match in an efficient way

I have two pandas dataframes and I want to find all entries of the second dataframe where a specific value occurs.
As an example:
df1:
   NID
0    1
1    2
2    3
3    4
4    5
df2:
   EID  N1  N2  N3  N4
0    1   1   2  13  12
1    2   2   3  14  13
2    3   3   4  15  14
3    4   4   5  16  15
4    5   5   6  17  16
5    6   6   7  18  17
6    7   7   8  19  18
7    8   8   9  20  19
8    9   9  10  21  20
9   10  10  11  22  21
Now, what I basically want is a list of lists with the EID values (from df2) for which the NID values (from df1) occur in any of the columns N1, N2, N3, N4:
Solution would be:
sol = [[1], [1, 2], [2, 3], [3, 4], [4, 5]]
The desired solution explained:
The solution has 5 entries (len(sol) == 5) since I have 5 entries in df1.
The first entry in sol is [1] because the value NID = 1 only appears in the columns N1,N2,N3,N4 for EID=1 in df2.
The second entry in sol refers to the value NID=2 (of df1) and has the length 2 because NID=2 can be found in column N1 (for EID=2) and in column N2 (for EID=1). Therefore, the second entry in the solution is [1,2] and so on.
What I have tried so far is looping over each element in df1 and then over each element in df2 to see if NID is in any of the columns N1,N2,N3,N4. This works, but for large dataframes (each df can have up to several thousand entries) it becomes extremely time-consuming.
Therefore I am looking for a much more efficient solution.
My code as implemented:
Input data:
import pandas as pd
df1 = pd.DataFrame({'NID':[1,2,3,4,5]})
df2 = pd.DataFrame({'EID':[1,2,3,4,5,6,7,8,9,10],
                    'N1':[1,2,3,4,5,6,7,8,9,10],
                    'N2':[2,3,4,5,6,7,8,9,10,11],
                    'N3':[13,14,15,16,17,18,19,20,21,22],
                    'N4':[12,13,14,15,16,17,18,19,20,21]})
Solution acquired using looping:
sol= []
for idx,node in df1.iterrows():
    x = []
    for idx2,elem in df2.iterrows():
        if node['NID'] == elem['N1']:
            x.append(elem['EID'])
        if node['NID'] == elem['N2']:
            x.append(elem['EID'])
        if node['NID'] == elem['N3']:
            x.append(elem['EID'])
        if node['NID'] == elem['N4']:
            x.append(elem['EID'])
    sol.append(x)
print(sol)
If anyone has a solution where I do not have to loop, I would be very happy. Maybe using a numpy function or something like cKDTrees but unfortunately I have no idea on how to get this problem solved in a faster way.
Thank you in advance!
You can reshape with melt, filter with loc, and groupby.agg as list. Then reindex and convert tolist:
out = (df2
   .melt('EID') # reshape to long form
   # filter the values that are in df1['NID']
   .loc[lambda d: d['value'].isin(df1['NID'])]
   # aggregate as list
   .groupby('value')['EID'].agg(list)
   # ensure all original NID are present in order
   # and convert to list
   .reindex(df1['NID']).tolist()
)
Alternative with stack:
df3 = df2.set_index('EID')
out = (df3
   .where(df3.isin(df1['NID'].tolist())).stack()
   .reset_index(name='group')
   .groupby('group')['EID'].agg(list)
   .reindex(df1['NID']).tolist()
)
Output:
[[1], [2, 1], [3, 2], [4, 3], [5, 4]]
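As a self-contained check, the melt approach can be run on the sample data from the question. The sorted() step is an addition, not part of the answer above: it only reorders each sublist so the result matches the sol list in the question, and it assumes every NID occurs at least once in df2 (otherwise reindex leaves a NaN that cannot be sorted).

```python
import pandas as pd

df1 = pd.DataFrame({'NID': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'EID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'N1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'N2': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
                    'N3': [13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
                    'N4': [12, 13, 14, 15, 16, 17, 18, 19, 20, 21]})

out = (df2
       .melt('EID')                                   # long form: EID, variable, value
       .loc[lambda d: d['value'].isin(df1['NID'])]    # keep only values present in df1
       .groupby('value')['EID'].agg(list)             # one list of EIDs per value
       .reindex(df1['NID']).tolist()                  # order the lists like df1['NID']
       )

# Assumption: every NID was found, so each entry is a list and can be sorted
sol = [sorted(eids) for eids in out]
print(sol)
```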

Pandas DataFrame producing unexpected result with list comprehension

I am using list comprehensions on a DataFrame to find the elements common to its columns.
df
   A  B  C
0  1  2  0
1  3  4  6
2  5  6  7
3  7  3  3
4  9  1  9
l=[i for i in df.A if i in df.B ]
l
[1, 3]
list2=[i for i in l if i in df.C]
list2
[1, 3]
The first list comprehension produces the expected result, i.e. the common elements of A and B are [1, 3].
But the line [i for i in l if i in df.C] produces an unexpected result.
Convert the DataFrame column to a list.
list2 = [i for i in l if i in list(df.C)]
OR
list2=[i for i in l if i in df.C.tolist()]
output of list2:
>>> print(list2)
[3]
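This is because the in operator on a Series checks the index labels, not the values. df.C has the index labels 0 to 4, so both 1 and 3 are found there. The first comprehension only appeared to work because 1 and 3 happen to be both index labels and common values of A and B.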
You can also use df.C.values instead.
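The difference between the two membership tests can be seen in a small sketch built from the DataFrame in the question; the names index_based and value_based are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7, 9],
                   'B': [2, 4, 6, 3, 1],
                   'C': [0, 6, 7, 3, 9]})

# Compare against the values of B, not its index labels
l = [i for i in df.A if i in df.B.values]

# "in" on a Series looks at the index labels (0..4), so 1 and 3 both match
index_based = [i for i in l if i in df.C]

# "in" on the underlying array looks at the actual values of column C
value_based = [i for i in l if i in df.C.values]

print(index_based, value_based)
```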

How to convert a list of elements to n*n space separated arrangement where n is number of elements in the list

This is my list:
N= 9
Mylist=[9,8,7,6,5,4,3,2,1]
For this input, the output should be:
9 8 7
6 5 4
3 2 1
It sounds like you're wondering how to turn a list into a numpy array of a particular shape. Documentation is here.
import numpy as np
my_list=[3,9,8,7,6,5,4,3,2,1]
# Dropping the first item of your list as it isn't used in your output
array = np.array(my_list[1:]).reshape((3,3))
print(array)
Output:
[[9 8 7]
 [6 5 4]
 [3 2 1]]
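If the goal is the bracket-free, space-separated layout shown in the question, a plain-Python sketch without NumPy could look like the following. It assumes the list length is a perfect square; the names side and lines are only illustrative:

```python
import math

my_list = [9, 8, 7, 6, 5, 4, 3, 2, 1]

# Assumption: len(my_list) is a perfect square, so the grid is side x side
side = math.isqrt(len(my_list))

# Slice the list into consecutive chunks of length `side` and join each with spaces
lines = [" ".join(str(v) for v in my_list[i:i + side])
         for i in range(0, len(my_list), side)]

print("\n".join(lines))
```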

How can I delete useless strings by index from a Pandas DataFrame defining a function?

I have a DataFrame, namely 'traj', as follows:
       x       y     z
0      5       3     4
1      4       2     8
2      1       1     7
3   Some  string  here
4   This      is  spam
5      5       7     8
6      9       9     7
... #continues repeatedly a lot with the same strings here in index 3 and 4
79     4       3     3
80  Some  string  here
I'm defining a function to delete the useless strings located at certain index positions in the DataFrame. Here is what I'm trying:
def spam(names,df): #names is a list composed, for instance, of "Some" and "This" in 'traj'
    return df.drop(index = ([traj[(traj.iloc[:,0] == n)].index for n in names]))
But when I call it, it raises the error:
traj_clean = spam(my_list_of_names, traj)
...
KeyError: '[(3,4,...80)] not found in axis'
If I try this on its own:
traj.drop(index = ([traj[(traj.iloc[:,0] == 'Some')].index for n in names]))
it works.
I solved it in a different way:
df = traj[~traj[:].isin(names)].dropna()
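Here names is a list of the terms you wish to delete. The mask replaces every matching cell with NaN, and dropna() then removes the rows that contain them, so df will contain only the rows without these terms.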
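A variant that filters on the first column only, as the function in the question intended, might look like this. It is a sketch on a shortened, made-up version of traj, and the function name drop_rows_by_first_column is hypothetical:

```python
import pandas as pd

# Shortened stand-in for 'traj' with the same kind of mixed content
traj = pd.DataFrame({'x': [5, 4, 1, 'Some', 'This', 5],
                     'y': [3, 2, 1, 'string', 'is', 7],
                     'z': [4, 8, 7, 'here', 'spam', 8]})
names = ['Some', 'This']

def drop_rows_by_first_column(df, names):
    """Return df without the rows whose first column is one of `names`."""
    # isin() builds one boolean mask for all names at once, so no list of
    # separate Index objects is ever passed to drop()
    return df[~df.iloc[:, 0].isin(names)]

traj_clean = drop_rows_by_first_column(traj, names)
print(traj_clean)
```

Unlike the dropna() approach, this keeps rows that contain NaN for other reasons.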

Reshaping in julia

If I reshape in python I use this:
import numpy as np
y= np.asarray([1,2,3,4,5,6,7,8])
x=2
z=y.reshape(-1, x)
print(z)
and get this
>>>
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
How would I get the same thing in Julia? I tried:
z = [1,2,3,4,5,6,7,8]
x= 2
a=reshape(z,x,4)
println(a)
and it gave me:
[1 3 5 7
2 4 6 8]
If I use reshape(z,4,x) it would give
[1 5
2 6
3 7
4 8]
Also is there a way to do reshape without specifying the second dimension like reshape(z,x) or if the secondary dimension is more ambiguous?
I think what you have hit upon is that NumPy stores arrays in row-major order by default, while Julia stores arrays in column-major order, as covered here.
So Julia is doing what numpy would do if you used
z=y.reshape(-1,x,order='F')
What you want is the transpose of your first attempt, which is
z = [1,2,3,4,5,6,7,8]
x= 2
a=reshape(z,x,4)'
println(a)
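On the NumPy side, the two memory orders and the transpose trick can be compared directly. This is a small sketch using the array from the question; the variable names are only illustrative:

```python
import numpy as np

y = np.asarray([1, 2, 3, 4, 5, 6, 7, 8])
x = 2

# Default (row-major, order='C'): consecutive elements fill each row
row_major = y.reshape(-1, x)

# Column-major (order='F'): consecutive elements fill each column,
# which mirrors what Julia's reshape(z, 4, x) produces
col_major = y.reshape(-1, x, order='F')

# Column-major reshape to x rows followed by a transpose gives the
# row-major layout again, mirroring reshape(z, x, 4)' in Julia
transposed = y.reshape(x, -1, order='F').T

print(row_major)
print(col_major)
print(transposed)
```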
Do you want to know if there is something that will compute the 2nd dimension, assuming the array is 2-dimensional? Not that I know of. Possibly ArrayViews? Here's a simple function to start with:
julia> shape2d(x,shape...) = length(shape) != 1 ? reshape(x,shape...) : reshape(x,shape[1],Int64(length(x)/shape[1]))
shape2d (generic function with 1 method)
julia> shape2d(z,x)'
4x2 Array{Int64,2}:
1 2
3 4
5 6
7 8
How about
z = [1,2,3,4,5,6,7,8]
x = 2
a = reshape(z,x,4)'
which gives
julia> a = reshape(z,x,4)'
4x2 Array{Int64,2}:
1 2
3 4
5 6
7 8
As for your bonus question
"Also is there a way to do reshape without specifying the second
dimension like reshape(z,x) or if the secondary dimension is more
ambiguous?"
The answer is: not exactly, because it would be ambiguous. reshape can make 3D, 4D, ..., tensors, so it's not clear what is expected. You can, however, do something like
matrix_reshape(z,x) = reshape(z, x, div(length(z),x))
which does what I think you expect.
"Also is there a way to do reshape without specifying the second dimension like reshape(z,x) or if the secondary dimension is more ambiguous?"
Use : instead of -1.
I'm using Julia 1.1 (not sure if this feature existed when the question was originally answered).
julia> z = [1,2,3,4,5,6,7,8]; a = reshape(z,:,2)
4×2 Array{Int64,2}:
1 5
2 6
3 7
4 8
However, if you want the first row to be 1 2 and match Python, you'll need to follow the other answer mentioning row-major vs column-major ordering and do
julia> z = [1,2,3,4,5,6,7,8]; a = reshape(z,2,:)'
4×2 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
1 2
3 4
5 6
7 8
