Improve time efficiency in double loops - python-3.x
I have working code that does some calculations and builds a dataframe, but it takes a considerable amount of time as the number of ids grows (in fact, the runtime grows much faster than linearly).
So, here is the situation: I have a dataframe of vectors, one per id:
id vector
0 A4070270297516241 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,...
1 A4060461064716279 [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2 A4050500015016271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, ...
3 A4050494283416274 [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
4 A4050500876316279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
5 A4050494111016270 [6, 10, 1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
6 A4050470673516272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
7 A4060461035616276 [0, 0, 0, 11, 0, 15, 13, 0, 5, 3, 0, 0, 0, 0, ...
8 A4050500809916271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
9 A4050500822216279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, ...
10 A4050494817416277 [0, 0, 0, 0, 0, 4, 9, 0, 5, 8, 0, 15, 0, 0, 8,...
11 A4060462005116279 [15, 12, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
12 A4050500802316278 [0, 0, 0, 0, 0, 1, 2, 0, 2, 2, 0, 15, 12, 0, 8...
13 A4050500841416272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 5, ...
14 A4050494856516271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, ...
15 A4060462230216270 [0, 0, 2, 2, 15, 15, 10, 0, 0, 0, 0, 0, 0, 0, ...
16 A4090150867216273 [0, 0, 0, 0, 0, 0, 0, 13, 6, 3, 0, 2, 0, 15, 4...
17 A4060464010916275 [0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
18 A4139311891213540 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
19 A4050500938416279 [0, 10, 11, 6, 6, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0...
20 A4620451871516274 [0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
21 A4060460331116279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 5, 15, 0, 2,...
I provide a dict at the end of the question to avoid clutter.
Now, for each id I determine which other id is closest by computing a weighted distance between their vectors, and I store the information in a dataframe:
import numpy as np
import pandas as pd

ids = list(set(df1.id))
Closest_identifier = pd.DataFrame(columns=['id', 'Identifier', 'Closest identifier', 'distance'])
and my code goes like this:
import time

t = time.process_time()
for idnr in ids:
    df_identifier = df1[df1['id'] == idnr]
    identifier = df_identifier['vector'].to_list()
    base_identifier = np.array(df_identifier['vector'].to_numpy().tolist())
    Number_of_devices = len(np.nonzero(identifier)[1])
    df_other_identifier = df1[df1['id'] != idnr]
    other_id = list(set(df_other_identifier.id))
    for id_other in other_id:
        gf_identifier = df_other_identifier[df_other_identifier['id'] == id_other]
        identifier_other = np.array(gf_identifier['vector'].to_numpy().tolist())
        dist = np.sqrt(np.sum((base_identifier - identifier_other)**2 / Number_of_devices))
        Closest_identifier = Closest_identifier.append({'id': id_other, 'Identifier': base_identifier, 'Closest identifier': identifier_other, 'distance': dist}, ignore_index=True)
elapsed_time = time.process_time() - t
print(elapsed_time)
6.0625
To explain what is happening: in the first part of the code, I pick an id and set up all the information I need. The number of devices is the number of non-zero values in the vector associated with that id (i.e., the number of devices that detected the object with that id). In the second part, I compute the distance from that id to all the others.
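In formula form (read directly off the code above), with v the base vector, N its number of non-zero entries, and w any other vector:

dist(v, w) = sqrt( sum_k (v_k - w_k)^2 / N )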
So for each id I get n - 1 rows, where n is the size of my id set; for 50 ids that is 50*50 - 50 = 2450 rows.
The time shown above is for 50 ids. For 200 ids the loops take 120 s, and for 400 ids, 871 s. As you can see, the runtime grows much faster than linearly (roughly cubically at these sizes, not literally exponentially). I have 1700 ids, so at this rate it would take days to complete.
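(A side note: a good chunk of the time likely goes into DataFrame.append, which copies the entire frame on every call, and which has since been deprecated and removed from pandas. Even keeping the double loop, collecting plain dicts in a list and building the frame once is already much cheaper. A minimal sketch, mirroring the variable names above:)

rows = []
for idnr in ids:
    base = np.array(df1.loc[df1['id'] == idnr, 'vector'].tolist())
    n_dev = np.count_nonzero(base)
    for id_other in ids:
        if id_other == idnr:
            continue
        other = np.array(df1.loc[df1['id'] == id_other, 'vector'].tolist())
        rows.append({'id': id_other,
                     'distance': np.sqrt(np.sum((base - other) ** 2 / n_dev))})
Closest_identifier = pd.DataFrame(rows)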
My question is thus: is there a more efficient way to do this?
Grateful for insights.
Test data
{'id': {0: 'A4070270297516241',
1: 'A4060461064716279',
2: 'A4050500015016271',
3: 'A4050494283416274',
4: 'A4050500876316279',
5: 'A4050494111016270',
6: 'A4050470673516272',
7: 'A4060461035616276',
8: 'A4050500809916271',
9: 'A4050500822216279',
10: 'A4050494817416277',
11: 'A4060462005116279',
12: 'A4050500802316278',
13: 'A4050500841416272',
14: 'A4050494856516271',
15: 'A4060462230216270',
16: 'A4090150867216273',
17: 'A4060464010916275',
18: 'A4139311891213540',
19: 'A4050500938416279',
20: 'A4620451871516274',
21: 'A4060460331116279',
22: 'A4060454590916277',
23: 'A4060454778016276',
24: 'A4060462019716270',
25: 'A4050500945416277',
26: 'A4050494267716279',
27: 'A4090281644816244',
28: 'A4050500929516270',
29: 'N4010442537213363',
30: 'A4050500938216277',
31: 'A4060454598916275',
32: 'A4050494086216273',
33: 'A4060462859616271',
34: 'A4060454600116271',
35: 'A4050494551816276',
36: 'A4610490015816279',
37: 'A4060454605416279',
38: 'A4060454665916270',
39: 'A4060454579316278',
40: 'A4060464023516275',
41: 'A4050500588616272',
42: 'A4050500905516274',
43: 'A4070262442416243',
44: 'A4050500946716271',
45: 'A4070271195016244',
46: 'A4060454663216271',
47: 'A4060454590416272',
48: 'A4060461993616279',
49: 'N4010442139713366'},
'vector': {0: [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 1: [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 9, 14],
 2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 5, 12, 15, 2, 0, 0, 0, 0, 0, 0],
 3: [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5],
 4: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 0, 0, 0, 0, 0, 0],
 5: [6, 10, 1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 10, 13, 15],
 6: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 15, 7, 2, 0, 0, 0],
 7: [0, 0, 0, 11, 0, 15, 13, 0, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 8: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 10, 2, 0, 0],
 9: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 15, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 10: [0, 0, 0, 0, 0, 4, 9, 0, 5, 8, 0, 15, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0],
 11: [15, 12, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 2],
 12: [0, 0, 0, 0, 0, 1, 2, 0, 2, 2, 0, 15, 12, 0, 8, 1, 9, 2, 0, 0, 0, 0, 0, 0, 0, 0],
 13: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 5, 3, 11, 15, 11, 1, 0, 0, 0, 0, 0, 0],
 14: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, 0, 7, 12, 14, 1, 0, 0, 0, 0, 0, 0],
 15: [0, 0, 2, 2, 15, 15, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 2, 15],
 16: [0, 0, 0, 0, 0, 0, 0, 13, 6, 3, 0, 2, 0, 15, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 17: [0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 8, 2],
 18: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 12, 9, 2, 0, 0],
 19: [0, 10, 11, 6, 6, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
 20: [0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 14, 13, 11],
 21: [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 5, 15, 0, 2, 3, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0],
 22: [0, 0, 0, 2, 7, 15, 15, 0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 4],
 23: [2, 15, 15, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 11],
 24: [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 15, 14, 2],
 25: [0, 9, 13, 15, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
 26: [0, 0, 0, 1, 2, 8, 15, 0, 1, 4, 0, 15, 1, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0],
 27: [0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 28: [0, 7, 9, 6, 6, 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
 29: [8, 6, 2, 2, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 11],
 30: [0, 10, 11, 15, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
 31: [6, 15, 6, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 8],
 32: [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 11, 8, 9, 2],
 33: [0, 0, 0, 0, 0, 11, 15, 0, 2, 4, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 34: [4, 15, 15, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 8],
 35: [2, 1, 1, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 15, 14, 12],
 36: [0, 0, 0, 0, 0, 0, 5, 15, 4, 2, 0, 0, 0, 1, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 37: [0, 0, 0, 0, 0, 11, 15, 0, 15, 15, 0, 14, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
 38: [0, 0, 0, 0, 0, 3, 14, 0, 10, 15, 0, 14, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 39: [0, 0, 2, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 15, 15, 5],
 40: [0, 0, 0, 3, 0, 4, 10, 5, 15, 14, 0, 2, 2, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 41: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 10, 0, 0, 0, 0, 0, 0],
 42: [0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 2, 0, 10, 15, 14, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0],
 43: [0, 0, 0, 0, 0, 0, 3, 2, 7, 8, 0, 2, 0, 15, 8, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 44: [0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 11, 15, 0, 3, 0, 13, 12, 0, 0, 0, 0, 0, 0, 0, 0],
 45: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 11, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 46: [0, 0, 0, 0, 0, 3, 11, 0, 15, 15, 0, 15, 2, 9, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 47: [0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 15, 15, 6],
 48: [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 11, 7, 9, 3],
 49: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 15, 9, 0, 0, 0]}}
Try:
# left side of each pair: every vector repeated n times, e.g. for 3 ids: [0, 0, 0, 1, 1, 1, 2, 2, 2]
m = np.repeat(np.vstack(df1['vector']), df1.shape[0], axis=0)
# right side of each pair: the whole block tiled n times, e.g.: [0, 1, 2, 0, 1, 2, 0, 1, 2]
n = np.tile(np.vstack(df1['vector']), (df1.shape[0], 1))
# number of devices for each vector of m
d = np.count_nonzero(m, axis=1, keepdims=True)
# compute the distance
dist = np.sqrt(np.sum((m - n)**2/d, axis=-1))
# create the final dataframe
mi = pd.MultiIndex.from_product([df1['id']] * 2, names=['id1', 'id2'])
out = pd.DataFrame({'vector1': m.tolist(),
                    'vector2': n.tolist(),
                    'distance': dist}, index=mi).reset_index()
Output:
>>> out
id1 id2 vector1 vector2 distance
0 A4070270297516241 A4070270297516241 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... 0.000000
1 A4070270297516241 A4060461064716279 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 13.747727
2 A4070270297516241 A4050500015016271 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, ... 14.628739
3 A4070270297516241 A4050494283416274 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... 15.033296
4 A4070270297516241 A4050500876316279 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 15.099669
... ... ... ... ... ...
2495 N4010442139713366 A4070271195016244 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 11, 0, 0,... 13.916417
2496 N4010442139713366 A4060454663216271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 3, 11, 0, 15, 15, 0, 15, 2, 9,... 21.330729
2497 N4010442139713366 A4060454590416272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 19.304576
2498 N4010442139713366 A4060461993616279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 12.288206
2499 N4010442139713366 N4010442139713366 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0.000000
[2500 rows x 5 columns]
Performance
%timeit loop_op()
3.75 s ± 88.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit vect_corralien()
4.16 ms ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
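If you only need the single closest identifier per id rather than the full pairwise table, one possible reduction of the out frame above (a sketch; it drops the zero-distance self-pairs, then keeps the smallest distance per id1):

closest = (out[out['id1'] != out['id2']]
           .sort_values('distance')
           .groupby('id1', as_index=False)
           .first())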
You can calculate the squared distances with broadcasting. Once you have those, you just need num_devices for every vector to compute the custom weighted distance. After filling the diagonal with infinity, the argmin of each column gives you the closest identifier.
import numpy as np

arr = np.array(list(data['vector'].values()))
# sum of squared differences for every pair (i, j), via broadcasting
squared_distances = np.power(arr[:, None, :] - arr[None, :, :], 2).sum(axis=-1)
# number of non-zero entries (devices) per vector
num_devices = (arr != 0).sum(axis=1)
distances = np.sqrt(squared_distances / num_devices)
np.fill_diagonal(distances, np.inf)
closest_identifiers = distances.argmin(axis=0)
You can format the output of the program however you like.
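For example, a sketch that maps the argmin indices back to ids and collects everything in a dataframe (it assumes data is the dict given under "Test data"):

import pandas as pd

ids = list(data['id'].values())
closest = pd.DataFrame({'id': ids,
                        'closest identifier': [ids[j] for j in closest_identifiers],
                        'distance': distances.min(axis=0)})
print(closest.head())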