Python 3 dictionary comprehension

This is my code:
def brujinGraph(k, strList):
    vertex = [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]]
    brujinGraph = {strList[i]: strList[j][:-1] for i in range(len(vertex)) for j in range(k) and vertex[i][j] == 1}
    return brujinGraph

strList = ['AAGA', 'AAGA', 'AGAT', 'ATTC', 'CTAA', 'CTCT', 'GATT', 'TAAG', 'TCTA', 'TCTC', 'TTCT']
brujinGraph(4, strList)
and it is throwing an UnboundLocalError: local variable 'j' referenced before assignment.
Any idea what it means and why I am getting this error?

Without knowing exactly what vertex and strList are:
Do you actually mean:
{strList[i]: strList[j][:-1] for i in range(len(vertex)) for j in range(len(vertex[i])) if vertex[i][j] == 1}
i.e. change that and into an if?

A couple of issues:
You need an if, not an and, at the end.
I think it is better expressed this way:
brujinGraph = {strList[i]: strList[j][:-1] for i, x in enumerate(vertex) for j, e in enumerate(x) if e == 1}
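Putting the fix together, a runnable sketch of the corrected function (keeping the original variable names) looks like this:

```python
def brujinGraph(k, strList):
    vertex = [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]]
    # `if` filters the (i, j) pairs; the original `and` made `j` part of a
    # boolean expression before `j` was ever bound, hence the UnboundLocalError
    return {strList[i]: strList[j][:-1]
            for i in range(len(vertex))
            for j in range(len(vertex[i])) if vertex[i][j] == 1}

strList = ['AAGA', 'AAGA', 'AGAT', 'ATTC', 'CTAA', 'CTCT', 'GATT', 'TAAG', 'TCTA', 'TCTC', 'TTCT']
graph = brujinGraph(4, strList)
```

Note that a dict keeps only one value per key, so rows with several 1s (and the duplicated 'AAGA') silently overwrite earlier entries; for a de Bruijn graph you may ultimately want a dict mapping each key to a list of successors.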

Related

Why do I get "unsupervised_wiener() got an unexpected keyword argument 'max_num_iter'" when using skimage.restoration.unsupervised_wiener?

I am playing around with the scikit-image restoration package and successfully ran the unsupervised_wiener algorithm on some made-up data. In this simple example it does what I expect, but on my more complicated dataset it returns a striped pattern with extreme values of -1 and 1.
I would like to fiddle with the parameters to better understand what is going on, but I get the error stated in the title. I tried scikit-image version 0.19.3 and downgraded to version 0.19.2, but the error remains.
The same goes for the "other parameters" listed at https://scikit-image.org/docs/0.19.x/api/skimage.restoration.html#skimage.restoration.unsupervised_wiener
Can someone explain why I can't pass these parameters?
The example below contains a "scan" and a "point-spread function". I convolve the scan with the point-spread function and then reverse the process using unsupervised Wiener deconvolution.
import numpy as np
import matplotlib.pyplot as plt
import pickle
from skimage import color, data, restoration
from scipy.signal import convolve2d as conv2
rng = np.random.default_rng()
scan = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
])
print(scan.shape)
psf = np.array([
[1, 1, 1, 1, 1],#1
[1, 0, 0, 0, 1],#2
[1, 0, 0, 0, 1],#3
[1, 0, 0, 0, 1],#4
[1, 1, 1, 1, 1]#5
])
psf = psf/(np.sum(psf))
print(psf)
scan_conv = conv2(scan, psf, 'same')
deconvolved1, _ = restoration.unsupervised_wiener(scan_conv, psf, max_num_iter=10)
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(8, 5),
sharex=True, sharey=True)
ax[0].imshow(scan, vmin=scan.min(), vmax=1)
ax[0].axis('off')
ax[0].set_title('Data')
ax[1].imshow(scan_conv)
ax[1].axis('off')
ax[1].set_title('Data_distorted')
ax[2].imshow(deconvolved1)
ax[2].axis('off')
ax[2].set_title('restoration1')
fig.tight_layout()
plt.show()
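One likely explanation, sketched below: `max_num_iter` is documented under "Other Parameters", and `unsupervised_wiener` does not accept those as direct keyword arguments; they are expected inside the `user_params` dict (in 0.19.x releases the key may be spelled `max_iter` instead of `max_num_iter`). A minimal sketch of that call, assuming the `user_params` route:

```python
import numpy as np
from scipy.signal import convolve2d as conv2
from skimage import restoration

# tiny made-up scan and psf, just to exercise the call
scan = np.zeros((16, 16))
scan[6:10, 6:10] = 1.0
psf = np.ones((5, 5)) / 25
scan_conv = conv2(scan, psf, 'same')

# iteration controls go through user_params, not as keyword arguments;
# the key is 'max_num_iter' in recent releases ('max_iter' in 0.19.x)
deconvolved, _ = restoration.unsupervised_wiener(
    scan_conv, psf, user_params={'max_num_iter': 50})
```

Worth checking against the docs of your installed version, since the accepted key names changed between releases.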

Difference between dp = [[0]*8]*8 and dp2 = [([0]*8) for i in range(8)] [duplicate]

This question already has answers here:
List of lists changes reflected across sublists unexpectedly
(17 answers)
Closed 8 months ago.
What is the difference between defining dp = [[0]*8]*8 and dp2 = [([0]*8) for i in range(8)]? They seem to be equal, but when I set one value in each case they behave differently. Why?
Thanks
>>> dp = [[0]*8]*8
>>> dp
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]
>>> dp2 = [([0]*8) for i in range(8)]
>>> dp2
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]
>>> dp[1][4] = 1
>>> dp2[1][4] = 1
>>> dp
[[0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0]]
>>> dp2
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]
>>>
They are not equal.
dp = [[0]*8]*8: here there is only one inner list object. It's like below:
a = [0]*8
dp = [a, a, a, a, a, a, a, a]
That's why when you change an item in dp[1], all the other rows change too: there is only one inner list, and dp holds multiple references to that same list object.
dp2 = [([0]*8) for i in range(8)]: here multiple inner list objects are created. It's like below:
dp2 = [[0]*8, [0]*8, [0]*8, [0]*8, [0]*8, [0]*8, [0]*8, [0]*8]
That's why you can change one element without affecting the others.
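A quick way to see the sharing is to compare row identities with `is`:

```python
dp = [[0] * 8] * 8                  # eight references to ONE list
dp2 = [[0] * 8 for _ in range(8)]   # eight separate lists

print(dp[0] is dp[1])    # True: the same object repeated
print(dp2[0] is dp2[1])  # False: distinct objects

dp[1][4] = 1
print(dp[0][4])   # 1: the "other" rows see the change
dp2[1][4] = 1
print(dp2[0][4])  # 0: only dp2[1] changed
```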

Improve time efficiency in double loops

I have working code that makes some calculations and creates a dataframe; however, it takes a considerable amount of time as the number of ids grows (the runtime increases much faster than linearly).
So, here is the situation: I have a dataframe consisting of vectors, one for each id:
id vector
0 A4070270297516241 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,...
1 A4060461064716279 [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2 A4050500015016271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, ...
3 A4050494283416274 [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
4 A4050500876316279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
5 A4050494111016270 [6, 10, 1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
6 A4050470673516272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
7 A4060461035616276 [0, 0, 0, 11, 0, 15, 13, 0, 5, 3, 0, 0, 0, 0, ...
8 A4050500809916271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
9 A4050500822216279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, ...
10 A4050494817416277 [0, 0, 0, 0, 0, 4, 9, 0, 5, 8, 0, 15, 0, 0, 8,...
11 A4060462005116279 [15, 12, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
12 A4050500802316278 [0, 0, 0, 0, 0, 1, 2, 0, 2, 2, 0, 15, 12, 0, 8...
13 A4050500841416272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 5, ...
14 A4050494856516271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, ...
15 A4060462230216270 [0, 0, 2, 2, 15, 15, 10, 0, 0, 0, 0, 0, 0, 0, ...
16 A4090150867216273 [0, 0, 0, 0, 0, 0, 0, 13, 6, 3, 0, 2, 0, 15, 4...
17 A4060464010916275 [0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
18 A4139311891213540 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
19 A4050500938416279 [0, 10, 11, 6, 6, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0...
20 A4620451871516274 [0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
21 A4060460331116279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 5, 15, 0, 2,...
I provide a dict at the end of the question to avoid clutter.
Now, what I do is determine, for each id, which other id is the closest by calculating a weighted distance between each pair of vectors, and I create a dataframe storing the information:
ids = list(set(df1.id))
Closest_identifier = pd.DataFrame(columns=['id', 'Identifier', 'Closest identifier', 'distance'])
and my code goes like this:
import time

t = time.process_time()
for idnr in ids:
    df_identifier = df1[df1['id'] == idnr]
    identifier = df_identifier['vector'].to_list()
    base_identifier = np.array(df_identifier['vector'].to_numpy().tolist())
    Number_of_devices = len(np.nonzero(identifier)[1])
    df_other_identifier = df1[df1['id'] != idnr]
    other_id = list(set(df_other_identifier.id))
    for id_other in other_id:
        gf_identifier = df_other_identifier[df_other_identifier['id'] == id_other]
        identifier_other = np.array(gf_identifier['vector'].to_numpy().tolist())
        dist = np.sqrt(np.sum((base_identifier - identifier_other)**2 / Number_of_devices))
        Closest_identifier = Closest_identifier.append({'id': id_other, 'Identifier': base_identifier, 'Closest identifier': identifier_other, 'distance': dist}, ignore_index=True)
elapsed_time = time.process_time() - t
print(elapsed_time)
6.0625
To explain what is happening: in the first part of the code, I choose an id and set up all the information I need. The number of devices is the number of non-zero values of the vector associated with that id (i.e., the number of devices that detected the object with that id). In the second part I compute the distance from that id to all the others.
So, for each id, I have n-1 rows, where n is the length of my id set. For 50 ids, that is 50*50-50 = 2450 rows.
The time given here is for 50 ids. For 200 ids the loops take 120 s to finish, and for 400 ids they take 871 s. As you can see, the runtime grows much faster than linearly. I have 1700 ids, and it would take days for this to complete.
My questions is thus: Is there a more efficient way to do this?
Grateful for insights.
Test data
{'id': {0: 'A4070270297516241',
1: 'A4060461064716279',
2: 'A4050500015016271',
3: 'A4050494283416274',
4: 'A4050500876316279',
5: 'A4050494111016270',
6: 'A4050470673516272',
7: 'A4060461035616276',
8: 'A4050500809916271',
9: 'A4050500822216279',
10: 'A4050494817416277',
11: 'A4060462005116279',
12: 'A4050500802316278',
13: 'A4050500841416272',
14: 'A4050494856516271',
15: 'A4060462230216270',
16: 'A4090150867216273',
17: 'A4060464010916275',
18: 'A4139311891213540',
19: 'A4050500938416279',
20: 'A4620451871516274',
21: 'A4060460331116279',
22: 'A4060454590916277',
23: 'A4060454778016276',
24: 'A4060462019716270',
25: 'A4050500945416277',
26: 'A4050494267716279',
27: 'A4090281644816244',
28: 'A4050500929516270',
29: 'N4010442537213363',
30: 'A4050500938216277',
31: 'A4060454598916275',
32: 'A4050494086216273',
33: 'A4060462859616271',
34: 'A4060454600116271',
35: 'A4050494551816276',
36: 'A4610490015816279',
37: 'A4060454605416279',
38: 'A4060454665916270',
39: 'A4060454579316278',
40: 'A4060464023516275',
41: 'A4050500588616272',
42: 'A4050500905516274',
43: 'A4070262442416243',
44: 'A4050500946716271',
45: 'A4070271195016244',
46: 'A4060454663216271',
47: 'A4060454590416272',
48: 'A4060461993616279',
49: 'N4010442139713366'},
'vector': {0: [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
1: [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 9, 14],
2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 5, 12, 15, 2, 0, 0, 0, 0, 0, 0],
3: [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5],
4: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 0, 0, 0, 0, 0, 0],
5: [6, 10, 1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 10, 13, 15],
6: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 15, 7, 2, 0, 0, 0],
7: [0, 0, 0, 11, 0, 15, 13, 0, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
8: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 10, 2, 0, 0],
9: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 15, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0],
10: [0, 0, 0, 0, 0, 4, 9, 0, 5, 8, 0, 15, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0],
11: [15, 12, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 2],
12: [0, 0, 0, 0, 0, 1, 2, 0, 2, 2, 0, 15, 12, 0, 8, 1, 9, 2, 0, 0, 0, 0, 0, 0, 0, 0],
13: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 5, 3, 11, 15, 11, 1, 0, 0, 0, 0, 0, 0],
14: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, 0, 7, 12, 14, 1, 0, 0, 0, 0, 0, 0],
15: [0, 0, 2, 2, 15, 15, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 2, 15],
16: [0, 0, 0, 0, 0, 0, 0, 13, 6, 3, 0, 2, 0, 15, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
17: [0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 8, 2],
18: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 12, 9, 2, 0, 0],
19: [0, 10, 11, 6, 6, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
20: [0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 14, 13, 11],
21: [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 5, 15, 0, 2, 3, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0],
22: [0, 0, 0, 2, 7, 15, 15, 0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 4],
23: [2, 15, 15, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 11],
24: [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 15, 14, 2],
25: [0, 9, 13, 15, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
26: [0, 0, 0, 1, 2, 8, 15, 0, 1, 4, 0, 15, 1, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0],
27: [0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
28: [0, 7, 9, 6, 6, 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
29: [8, 6, 2, 2, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 11],
30: [0, 10, 11, 15, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
31: [6, 15, 6, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 8],
32: [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 11, 8, 9, 2],
33: [0, 0, 0, 0, 0, 11, 15, 0, 2, 4, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
34: [4, 15, 15, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 8],
35: [2, 1, 1, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 15, 14, 12],
36: [0, 0, 0, 0, 0, 0, 5, 15, 4, 2, 0, 0, 0, 1, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
37: [0, 0, 0, 0, 0, 11, 15, 0, 15, 15, 0, 14, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
38: [0, 0, 0, 0, 0, 3, 14, 0, 10, 15, 0, 14, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
39: [0, 0, 2, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 15, 15, 5],
40: [0, 0, 0, 3, 0, 4, 10, 5, 15, 14, 0, 2, 2, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
41: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 10, 0, 0, 0, 0, 0, 0],
42: [0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 2, 0, 10, 15, 14, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0],
43: [0, 0, 0, 0, 0, 0, 3, 2, 7, 8, 0, 2, 0, 15, 8, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0],
44: [0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 11, 15, 0, 3, 0, 13, 12, 0, 0, 0, 0, 0, 0, 0, 0],
45: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 11, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
46: [0, 0, 0, 0, 0, 3, 11, 0, 15, 15, 0, 15, 2, 9, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
47: [0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 15, 15, 6],
48: [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 11, 7, 9, 3],
49: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 15, 9, 0, 0, 0]}}
Try:
# id1: [0, 0, 0, 1, 1, 1]
m = np.repeat(np.vstack(df1['vector']), df1.shape[0], axis=0)
# id2: [0, 1, 0, 1, 0, 1]
n = np.tile(np.vstack(df1['vector']), (df1.shape[0], 1))
# number of devices for each vector of m
d = np.count_nonzero(m, axis=1, keepdims=True)
# compute the distance
dist = np.sqrt(np.sum((m - n)**2/d, axis=-1))
# create the final dataframe
mi = pd.MultiIndex.from_product([df1['id']] * 2, names=['id1', 'id2'])
out = pd.DataFrame({'vector1': m.tolist(),
                    'vector2': n.tolist(),
                    'distance': dist}, index=mi).reset_index()
Output:
>>> out
id1 id2 vector1 vector2 distance
0 A4070270297516241 A4070270297516241 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... 0.000000
1 A4070270297516241 A4060461064716279 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 13.747727
2 A4070270297516241 A4050500015016271 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, ... 14.628739
3 A4070270297516241 A4050494283416274 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... 15.033296
4 A4070270297516241 A4050500876316279 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 15.099669
... ... ... ... ... ...
2495 N4010442139713366 A4070271195016244 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 11, 0, 0,... 13.916417
2496 N4010442139713366 A4060454663216271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 3, 11, 0, 15, 15, 0, 15, 2, 9,... 21.330729
2497 N4010442139713366 A4060454590416272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 19.304576
2498 N4010442139713366 A4060461993616279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 12.288206
2499 N4010442139713366 N4010442139713366 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0.000000
[2500 rows x 5 columns]
Performance
%timeit loop_op()
3.75 s ± 88.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit vect_corralien()
4.16 ms ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
You can calculate the squared distances with broadcasting. Once you have those, find num_devices for every row and use it to calculate your custom distance. After filling the diagonal with infinite values you can take the argmin of every row, which gives you the closest identifier.
arr = np.array(list(data['vector'].values()))
# sum of squared differences (not the square of the summed differences)
squared_distances = np.power(arr[:, None, :] - arr[None, :, :], 2).sum(axis=-1)
num_devices = (arr != 0).sum(axis=1)
# divide row i by the device count of vector i
distances = np.sqrt(squared_distances / num_devices[:, None])
np.fill_diagonal(distances, np.inf)
closest_identifiers = distances.argmin(axis=1)
You can format the output of the program as you desire
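Since the weighted distance used here is just the Euclidean distance scaled by the square root of the base vector's device count, another sketch (assuming SciPy is available) delegates the pairwise part to `scipy.spatial.distance.cdist`:

```python
import numpy as np
from scipy.spatial.distance import cdist

# small made-up stand-in for the 'vector' column
arr = np.array([[0, 7, 4, 0],
                [2, 0, 0, 6],
                [0, 0, 1, 3]])

num_devices = np.count_nonzero(arr, axis=1)  # per-row device count
# pairwise Euclidean distances, then scale row i by 1/sqrt(num_devices[i])
distances = cdist(arr, arr) / np.sqrt(num_devices)[:, None]

# closest other id per row: mask the diagonal first
np.fill_diagonal(distances, np.inf)
closest = distances.argmin(axis=1)
```

This keeps the whole computation at C speed and avoids building the two large repeated/tiled matrices.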

Convert connected components to adjacency matrix

I have an adjacency matrix of 16 by 16:
Adjacency = [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
             [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
From this adjacency matrix I applied the SciPy algorithm to determine the connected components, as follows:
from scipy.sparse.csgraph import connected_components
supernodes = connected_components(Adjacency)
which returns 4 components:
(4, array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 0], dtype=int32))
The algorithm returns 4 components (4 new nodes, or supernodes: 0, 1, 2, 3), and the associated adjacency matrix has dim=(4,4).
My question is as follows:
Given the initial 16-by-16 adjacency matrix and the connected components, how can I compute the new adjacency matrix efficiently?
In other words, we need to merge all the nodes that are assigned to the same connected component.
EDIT 1:
Here is a concrete example. Given the following adjacency matrix of 6 nodes, dim=(6,6):
Adjacency_matrix=[[0,1,1,0,0,1],
[1,0,0,1,0,0],
[1,0,0,0,1,1],
[0,1,0,0,1,0],
[0,0,1,1,0,0],
[1,0,1,0,0,0]]
Given three supernodes as follows:
supernodes[0] = [0, 2]  # supernode 0 merges nodes 0 and 2
supernodes[1] = [1, 4]  # supernode 1 merges nodes 1 and 4
supernodes[2] = [3, 5]  # supernode 2 merges nodes 3 and 5
The expected output is the adjacency matrix of the 3 supernodes, dim=(3,3):
reduced_adjacency_matrix=[[0,1,1],
[1,0,1],
[1,1,0]]
What does this mean?
For instance, consider the first supernode, supernodes[0] = [0, 2]. The idea is as follows:
A) if i and j belong to the same supernode, their edge disappears, so the diagonal of the reduced matrix is 0;
B) two supernodes a and b get entry 1 if any node of a has a connection with any node of b.
Thank you for your help.
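No answer is posted above, but one way to sketch the reduction, using the 6-node example and an indicator matrix Z of node-to-supernode assignments (so that Z.T @ A @ Z counts the edges between each pair of supernodes), is:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0, 1],
              [1, 0, 0, 1, 0, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0],
              [0, 0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0, 0]])
labels = np.array([0, 1, 0, 2, 1, 2])  # supernode of each node (from connected_components)

k = labels.max() + 1
Z = np.eye(k, dtype=int)[labels]       # n x k indicator matrix
B = Z.T @ A @ Z                        # inter-supernode edge counts
reduced = (B > 0).astype(int)
np.fill_diagonal(reduced, 0)           # drop edges internal to a supernode
```

Here `reduced` comes out as [[0, 1, 1], [1, 0, 1], [1, 1, 0]], matching the expected output; for large sparse matrices the same product can be done with scipy.sparse matrices instead of dense arrays.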

How to slice values from a group of lists?

How do I get the first value (i.e. index 0) of every list and store these in another list, the second value (i.e. index 1) of every list in another list, and so on?
[[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
, [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
, [0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
, [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
, [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
, [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
, [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
, [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
, [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
, [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]]
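A common sketch for this uses `zip(*rows)` to transpose, so that column k of the input becomes list k of the output:

```python
rows = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]]

# zip(*rows) yields one tuple (row0[k], row1[k], ...) per index k
columns = [list(col) for col in zip(*rows)]
print(columns[0])  # the first value of every list
print(columns[1])  # the second value of every list
```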
