I have a CSV dataset as shown below:
index,s_key,identifier,edge_pairs
0,"[1683, 1684, 1685, 1686, 1688, 1689, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 12740]","[0, 0]","[[0, 793]]"
1,"[9774, 9800, 9807, 9818, 9831, 9834, 9836, 9837, 9839, 9843, 13723, 21455]","[0, 1]","[[1, 3], [1, 123], [1, 152], [1, 163], [1, 266], [1, 337], [1, 351], [1, 352], [1, 355], [1, 606], [1, 869], [1, 962], [1, 1125], [1, 1412], [1, 1413], [1, 1417], [1, 1435], [1, 1440], [1, 1454], [1, 1572], [1, 1588], [1, 1653], [1, 1726], [1, 1898], [1, 2075], [1, 2076], [1, 2166], [1, 2297], [1, 2299], [1, 2319], [1, 2327], [1, 2330], [1, 2335], [1, 2393], [1, 2395], [1, 2400], [1, 2405], [1, 2486]]"
3,"[2156, 2896, 3028, 4023, 4256, 6787, 7265, 8882, 8970, 9831, 10959, 11268, 11341, 12601, 13737, 17264, 18906, 20430, 21747, 22228, 22229, 22512, 22841, 24049, 25104, 25394, 25731, 26045, 26103, 31121, 31522, 31839, 31851, 31859, 31872, 35527, 35547, 36538, 37150, 37345, 37692, 37888, 37895, 38962, 45332]","[0, 3]","[[3, 8], [3, 11], [3, 12], [3, 13], [3, 27], [3, 34], [3, 99], [3, 123], [3, 125], [3, 130], [3, 132], [3, 133], [3, 134], [3, 144], [3, 147], [3, 152], [3, 154], [3, 180], [3, 181], [3, 207]]"
4,"[25203, 25204, 25215, 25219, 25227, 25232, 25235, 25248, 25251, 25252, 25259, 25270]","[0, 4]","[[4, 215], [4, 322], [4, 342], [4, 793], [4, 1043], [4, 1127], [4, 1176], [4, 1454], [4, 2154], [4, 2284], [4, 2331], [4, 2400], [4, 2759], [4, 2920], [4, 3335]]"
5,"[27099, 27101, 27104, 27107, 27108, 27111, 27117, 27120, 27123, 27131, 27143, 27153, 27156, 27158, 27162, 27167, 27172, 27175, 27176, 27178, 27184, 27185]","[0, 5]","[[5, 8], [5, 239], [5, 378], [5, 1163], [5, 1220], [5, 1378], [5, 1422], [5, 1440], [5, 1636], [5, 1681], [5, 2190], [5, 2303], [5, 2399]]"
The index column identifies each node, and the edge_pairs column lists that node's connections.
For example, at index 0 the edge_pairs value [[0, 793]] means that node 0 is connected to node 793, and so on.
I want to build a graph from this CSV in the format PyG expects: data = Data(x=x, edge_index=edge_index, y=y).
I am unsure what to use as node features and labels, and how to represent the edges between the nodes.
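For the edges, PyG's edge_index is a 2 x E structure: the first row holds source nodes and the second row the matching target nodes, so every [src, dst] pair in edge_pairs becomes one column. Below is a minimal sketch in plain Python, assuming the file is stored with the list columns quoted as shown above. The helper names (parse_edge_pairs, degree_features) are hypothetical, the sample is a trimmed two-row excerpt of the data, and using node degree as the feature is only a placeholder assumption for when no better features exist.

```python
import ast
import csv
import io


def parse_edge_pairs(csv_text):
    """Hypothetical helper: collect edge_pairs into [sources, targets] (PyG edge_index layout)."""
    sources, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each cell is a string such as "[[0, 793]]"; literal_eval turns it into a Python list.
        for src, dst in ast.literal_eval(row["edge_pairs"]):
            sources.append(src)
            targets.append(dst)
    return [sources, targets]


def degree_features(edge_index, num_nodes):
    """Placeholder features (an assumption): number of edges touching each node."""
    degree = [0] * num_nodes
    for src, dst in zip(edge_index[0], edge_index[1]):
        degree[src] += 1
        degree[dst] += 1
    return [[float(d)] for d in degree]  # one feature per node -> shape [num_nodes, 1]


# Trimmed excerpt of the CSV above (only the columns this sketch reads).
sample = (
    "index,edge_pairs\n"
    '0,"[[0, 793]]"\n'
    '1,"[[1, 3], [1, 123]]"\n'
)
edge_index = parse_edge_pairs(sample)
# Targets such as 793 are larger than the number of rows, so size the graph by the largest ID.
num_nodes = max(max(edge_index[0]), max(edge_index[1])) + 1
x = degree_features(edge_index, num_nodes)
print(edge_index)  # [[0, 1, 1], [793, 3, 123]]
```

To hand this to PyG, convert the lists to tensors, e.g. Data(x=torch.tensor(x, dtype=torch.float), edge_index=torch.tensor(edge_index, dtype=torch.long)). If the connections are meant to be undirected, torch_geometric.utils.to_undirected(edge_index) adds the reverse direction of each edge. The s_key lists could also be encoded as features (for instance as a multi-hot vector), but whether that is meaningful depends on what s_key represents; likewise y depends on the prediction task, and none of the columns shown is obviously a label.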
I have the following list:
lst = [2, 2, 2, 3, 5, 7]
I want to partition it into two or more sets.
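Here is my code:
import more_itertools as mit
import pprint as pp
from operator import itemgetter
from itertools import groupby
# every partition of lst into k blocks, for k = 1 .. len(lst)
x = [part for k in range(1, len(lst) + 1) for part in mit.set_partitions(lst, k)]
# sort so that identical partitions are adjacent, then keep one from each run of duplicates
x.sort()
x = list(map(itemgetter(0), groupby(x)))
pp.pprint(x)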
Here is my current output:
[[[2], [2], [2], [3], [5], [7]],
[[2], [2], [2], [3], [5, 7]],
[[2], [2], [2], [3, 5], [7]],
[[2], [2], [2], [3, 5, 7]],
[[2], [2], [2], [5], [3, 7]],
[[2], [2], [2, 3], [5], [7]],
[[2], [2], [2, 3], [5, 7]],
[[2], [2], [2, 3, 5], [7]],
[[2], [2], [2, 3, 5, 7]],
[[2], [2], [2, 5], [3, 7]],
[[2], [2], [3], [2, 5], [7]],
[[2], [2], [3], [2, 5, 7]],
[[2], [2], [3], [5], [2, 7]],
[[2], [2], [3, 5], [2, 7]],
[[2], [2], [5], [2, 3, 7]],
[[2], [2, 2], [3], [5], [7]],
[[2], [2, 2], [3], [5, 7]],
[[2], [2, 2], [3, 5], [7]],
[[2], [2, 2], [3, 5, 7]],
[[2], [2, 2], [5], [3, 7]],
[[2], [2, 2, 3], [5], [7]],
[[2], [2, 2, 3], [5, 7]],
[[2], [2, 2, 3, 5], [7]],
[[2], [2, 2, 3, 5, 7]],
[[2], [2, 2, 5], [3, 7]],
[[2], [2, 3], [2, 5], [7]],
[[2], [2, 3], [2, 5, 7]],
[[2], [2, 3], [5], [2, 7]],
[[2], [2, 3, 5], [2, 7]],
[[2], [2, 5], [2, 3, 7]],
[[2], [3], [2, 2, 5], [7]],
[[2], [3], [2, 2, 5, 7]],
[[2], [3], [2, 5], [2, 7]],
[[2], [3], [5], [2, 2, 7]],
[[2], [3, 5], [2, 2, 7]],
[[2], [5], [2, 2, 3, 7]],
[[2, 2], [2], [3], [5], [7]],
[[2, 2], [2], [3], [5, 7]],
[[2, 2], [2], [3, 5], [7]],
[[2, 2], [2], [3, 5, 7]],
[[2, 2], [2], [5], [3, 7]],
[[2, 2], [2, 3], [5], [7]],
[[2, 2], [2, 3], [5, 7]],
[[2, 2], [2, 3, 5], [7]],
[[2, 2], [2, 3, 5, 7]],
[[2, 2], [2, 5], [3, 7]],
[[2, 2], [3], [2, 5], [7]],
[[2, 2], [3], [2, 5, 7]],
[[2, 2], [3], [5], [2, 7]],
[[2, 2], [3, 5], [2, 7]],
[[2, 2], [5], [2, 3, 7]],
[[2, 2, 2], [3], [5], [7]],
[[2, 2, 2], [3], [5, 7]],
[[2, 2, 2], [3, 5], [7]],
[[2, 2, 2], [3, 5, 7]],
[[2, 2, 2], [5], [3, 7]],
[[2, 2, 2, 3], [5], [7]],
[[2, 2, 2, 3], [5, 7]],
[[2, 2, 2, 3, 5], [7]],
[[2, 2, 2, 3, 5, 7]],
[[2, 2, 2, 5], [3, 7]],
[[2, 2, 3], [2, 5], [7]],
[[2, 2, 3], [2, 5, 7]],
[[2, 2, 3], [5], [2, 7]],
[[2, 2, 3, 5], [2, 7]],
[[2, 2, 5], [2, 3, 7]],
[[2, 3], [2, 2, 5], [7]],
[[2, 3], [2, 2, 5, 7]],
[[2, 3], [2, 5], [2, 7]],
[[2, 3], [5], [2, 2, 7]],
[[2, 3, 5], [2, 2, 7]],
[[2, 5], [2, 2, 3, 7]],
[[3], [2, 2, 2, 5], [7]],
[[3], [2, 2, 2, 5, 7]],
[[3], [2, 2, 5], [2, 7]],
[[3], [2, 5], [2, 2, 7]],
[[3], [5], [2, 2, 2, 7]],
[[3, 5], [2, 2, 2, 7]],
[[5], [2, 2, 2, 3, 7]]]
Sorting followed by groupby(x) removed some of the redundancy, as you can see, but duplicates remain. For example,
[[2], [2, 2], [3], [5], [7]]
and
[[2, 2], [2], [3], [5], [7]]
are the same partition for me, because the order of the blocks does not matter.
Please don't close this question without making an effort. I framed it only after going through similar questions on Stack Overflow; those deal with ordered sets, whereas mine is about unordered sets (parts of my code were adapted from those answers).
You need to reduce every partition to a canonical form so that only unique combinations remain in the result (keeping one representative out of all orderings of the same blocks). One way to do this is to sort the blocks inside each partition and then deduplicate with a dictionary keyed on that sorted form.
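The canonical-form idea can be shown on the two partitions from the question. This sketch uses a hypothetical helper, canonical, that sorts inside each block and then sorts the blocks, returning nested tuples; tuples are hashable, so they can serve directly as set or dictionary keys (a swap for the string keys used in the code below).

```python
def canonical(partition):
    """Hypothetical helper: order-independent key for a partition (sorted tuple of sorted tuples)."""
    return tuple(sorted(tuple(sorted(block)) for block in partition))


a = [[2], [2, 2], [3], [5], [7]]
b = [[2, 2], [2], [3], [5], [7]]
print(canonical(a) == canonical(b))  # True
print(canonical(b))  # ((2,), (2, 2), (3,), (5,), (7,))
```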
import more_itertools as mit
import pprint as pp
from operator import itemgetter
from itertools import groupby
lst = [2, 2, 2, 3, 5, 7]
x = [part for k in range(1, len(lst) + 1) for part in mit.set_partitions(lst, k)]
x.sort()
x = list(map(itemgetter(0), groupby(x)))
# sort the blocks inside each partition; list.sort() works in place and returns None,
# so its result must not be assigned back (each block is already ascending here because lst is sorted)
for temp in x:
    temp.sort()
# create a dictionary (hash table); lists are not hashable, so the string form is used as the key
unique_combinations = {str(temp): temp for temp in x}
# partitions that differ only in block order now share a key, so one copy of each survives
unique_combinations = list(unique_combinations.values())
pp.pprint(unique_combinations)
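The same approach can be written without more_itertools and without string keys. In this sketch, set_partitions is a hand-written recursive stand-in for mit.set_partitions (called with no k, so all block counts are produced), and unique_partitions with its min_blocks parameter is hypothetical; it keeps the first partition seen for each canonical key in a set.

```python
def set_partitions(items):
    """Yield every partition of items into blocks (stand-in for more_itertools.set_partitions)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` in a block of its own ...
        yield [[first]] + partition
        # ... or add it to each existing block in turn
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def unique_partitions(items, min_blocks=1):
    """Hypothetical helper: partitions of items, ignoring block order, with at least min_blocks blocks."""
    seen = set()
    result = []
    for partition in set_partitions(list(items)):
        if len(partition) < min_blocks:
            continue
        # canonical form: sort inside each block, then sort the blocks
        key = tuple(sorted(tuple(sorted(block)) for block in partition))
        if key not in seen:
            seen.add(key)
            result.append([list(block) for block in key])
    return result


lst = [2, 2, 2, 3, 5, 7]
parts = unique_partitions(lst)
print(len(parts))  # 74
```

Passing min_blocks=2 drops the single-block partition, matching "two or more sets". This still enumerates every set partition and discards duplicates, so the work grows with the Bell number of len(lst); for larger inputs, SymPy's sympy.utilities.iterables.multiset_partitions generates each multiset partition only once.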
Output:
[[[2], [2], [2], [3], [5], [7]],
[[2], [2], [2], [3], [5, 7]],
[[2], [2], [2], [3, 5], [7]],
[[2], [2], [2], [3, 5, 7]],
[[2], [2], [2], [3, 7], [5]],
[[2], [2], [2, 3], [5], [7]],
[[2], [2], [2, 3], [5, 7]],
[[2], [2], [2, 3, 5], [7]],
[[2], [2], [2, 3, 5, 7]],
[[2], [2], [2, 5], [3, 7]],
[[2], [2], [2, 5], [3], [7]],
[[2], [2], [2, 5, 7], [3]],
[[2], [2], [2, 7], [3], [5]],
[[2], [2], [2, 7], [3, 5]],
[[2], [2], [2, 3, 7], [5]],
[[2], [2, 2], [3], [5], [7]],
[[2], [2, 2], [3], [5, 7]],
[[2], [2, 2], [3, 5], [7]],
[[2], [2, 2], [3, 5, 7]],
[[2], [2, 2], [3, 7], [5]],
[[2], [2, 2, 3], [5], [7]],
[[2], [2, 2, 3], [5, 7]],
[[2], [2, 2, 3, 5], [7]],
[[2], [2, 2, 3, 5, 7]],
[[2], [2, 2, 5], [3, 7]],
[[2], [2, 3], [2, 5], [7]],
[[2], [2, 3], [2, 5, 7]],
[[2], [2, 3], [2, 7], [5]],
[[2], [2, 3, 5], [2, 7]],
[[2], [2, 3, 7], [2, 5]],
[[2], [2, 2, 5], [3], [7]],
[[2], [2, 2, 5, 7], [3]],
[[2], [2, 5], [2, 7], [3]],
[[2], [2, 2, 7], [3], [5]],
[[2], [2, 2, 7], [3, 5]],
[[2], [2, 2, 3, 7], [5]],
[[2, 2], [2, 3], [5], [7]],
[[2, 2], [2, 3], [5, 7]],
[[2, 2], [2, 3, 5], [7]],
[[2, 2], [2, 3, 5, 7]],
[[2, 2], [2, 5], [3, 7]],
[[2, 2], [2, 5], [3], [7]],
[[2, 2], [2, 5, 7], [3]],
[[2, 2], [2, 7], [3], [5]],
[[2, 2], [2, 7], [3, 5]],
[[2, 2], [2, 3, 7], [5]],
[[2, 2, 2], [3], [5], [7]],
[[2, 2, 2], [3], [5, 7]],
[[2, 2, 2], [3, 5], [7]],
[[2, 2, 2], [3, 5, 7]],
[[2, 2, 2], [3, 7], [5]],
[[2, 2, 2, 3], [5], [7]],
[[2, 2, 2, 3], [5, 7]],
[[2, 2, 2, 3, 5], [7]],
[[2, 2, 2, 3, 5, 7]],
[[2, 2, 2, 5], [3, 7]],
[[2, 2, 3], [2, 5], [7]],
[[2, 2, 3], [2, 5, 7]],
[[2, 2, 3], [2, 7], [5]],
[[2, 2, 3, 5], [2, 7]],
[[2, 2, 5], [2, 3, 7]],
[[2, 2, 5], [2, 3], [7]],
[[2, 2, 5, 7], [2, 3]],
[[2, 3], [2, 5], [2, 7]],
[[2, 2, 7], [2, 3], [5]],
[[2, 2, 7], [2, 3, 5]],
[[2, 2, 3, 7], [2, 5]],
[[2, 2, 2, 5], [3], [7]],
[[2, 2, 2, 5, 7], [3]],
[[2, 2, 5], [2, 7], [3]],
[[2, 2, 7], [2, 5], [3]],
[[2, 2, 2, 7], [3], [5]],
[[2, 2, 2, 7], [3, 5]],
[[2, 2, 2, 3, 7], [5]]]