Converting CSV to PyG graph - python-3.x

I have a CSV dataset as shown below:
index | s_key | identifier | edge_pairs
0 | [1683, 1684, 1685, 1686, 1688, 1689, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 12740] | [0, 0] | [[0, 793]]
1 | [9774, 9800, 9807, 9818, 9831, 9834, 9836, 9837, 9839, 9843, 13723, 21455] | [0, 1] | [[1, 3], [1, 123], [1, 152], [1, 163], [1, 266], [1, 337], [1, 351], [1, 352], [1, 355], [1, 606], [1, 869], [1, 962], [1, 1125], [1, 1412], [1, 1413], [1, 1417], [1, 1435], [1, 1440], [1, 1454], [1, 1572], [1, 1588], [1, 1653], [1, 1726], [1, 1898], [1, 2075], [1, 2076], [1, 2166], [1, 2297], [1, 2299], [1, 2319], [1, 2327], [1, 2330], [1, 2335], [1, 2393], [1, 2395], [1, 2400], [1, 2405], [1, 2486]]
3 | [2156, 2896, 3028, 4023, 4256, 6787, 7265, 8882, 8970, 9831, 10959, 11268, 11341, 12601, 13737, 17264, 18906, 20430, 21747, 22228, 22229, 22512, 22841, 24049, 25104, 25394, 25731, 26045, 26103, 31121, 31522, 31839, 31851, 31859, 31872, 35527, 35547, 36538, 37150, 37345, 37692, 37888, 37895, 38962, 45332] | [0, 3] | [[3, 8], [3, 11], [3, 12], [3, 13], [3, 27], [3, 34], [3, 99], [3, 123], [3, 125], [3, 130], [3, 132], [3, 133], [3, 134], [3, 144], [3, 147], [3, 152], [3, 154], [3, 180], [3, 181], [3, 207]]
4 | [25203, 25204, 25215, 25219, 25227, 25232, 25235, 25248, 25251, 25252, 25259, 25270] | [0, 4] | [[4, 215], [4, 322], [4, 342], [4, 793], [4, 1043], [4, 1127], [4, 1176], [4, 1454], [4, 2154], [4, 2284], [4, 2331], [4, 2400], [4, 2759], [4, 2920], [4, 3335]]
5 | [27099, 27101, 27104, 27107, 27108, 27111, 27117, 27120, 27123, 27131, 27143, 27153, 27156, 27158, 27162, 27167, 27172, 27175, 27176, 27178, 27184, 27185] | [0, 5] | [[5, 8], [5, 239], [5, 378], [5, 1163], [5, 1220], [5, 1378], [5, 1422], [5, 1440], [5, 1636], [5, 1681], [5, 2190], [5, 2303], [5, 2399]]
The index column identifies each node.
The edge_pairs column lists the connections of each node.
For example, in index 0, the edge_pairs value [[0, 793]] means that node 0 is connected to node 793, and so on.
I want to build a graph from this CSV in a format that PyG accepts: data = Data(x=x, edge_index=edge_index, y=y).
I am unsure what to use as node features and labels, and how to represent the edges between nodes.
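For the edge part of the question, here is a minimal sketch rather than a definitive answer. It assumes the file is comma-separated and that each edge_pairs cell is stored as a Python-style list literal; CSV_TEXT is a hypothetical two-row excerpt and build_edge_index is a made-up helper name. PyG expects edge_index as a 2 x num_edges array of source and target node ids, so the sketch collects the pairs in that layout using only the standard library.

```python
import ast
import csv
import io

# Hypothetical two-row excerpt in the same layout as the CSV above.
CSV_TEXT = '''index,s_key,identifier,edge_pairs
0,"[1683, 1684]","[0, 0]","[[0, 793]]"
1,"[9774, 9800]","[0, 1]","[[1, 3], [1, 123]]"
'''

def build_edge_index(csv_text):
    """Collect every [source, target] pair into two parallel lists."""
    sources, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each cell is text such as "[[1, 3], [1, 123]]"; parse it safely.
        for src, dst in ast.literal_eval(row["edge_pairs"]):
            sources.append(src)
            targets.append(dst)
    return [sources, targets]

edge_index = build_edge_index(CSV_TEXT)
print(edge_index)  # [[0, 1, 1], [793, 3, 123]]
```

The nested list can then be turned into a tensor with torch.tensor(edge_index, dtype=torch.long) and passed as edge_index to Data. Which columns should become x and y depends on the task and cannot be decided from the sample alone.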

Related

list merging between list with condition in python [closed]

I have output in the form of three lists like this:
[[1, 74], [1, 224], [1, 247], [1, 5], [1, 225], [1, 207], [1, 79], [1, 131], [1, 180], [1, 20], [1, 104], [1, 93], [1, 213], [1, 93], [1, 151], [1, 223]]
[[2, 200], [2, 64], [2, 51], [2, 83], [2, 127], [2, 160], [2, 237], [2, 98], [2, 123], [2, 213], [2, 80], [2, 131], [2, 200], [2, 203], [2, 8], [2, 174]]
[[3, 148], [3, 72], [3, 37], [3, 40], [3, 237], [3, 24], [3, 177], [3, 205], [3, 52], [3, 53], [3, 155], [3, 208], [3, 184], [3, 44], [3, 202], [3, 171]]
But I want the output to become:
[[1,74],[2,200],[3,148]]
[[1,224],[2,64],[3,72]]
[[1,247],[2,51],[3,37]]
and so on...
Is this done by mapping, or by looping and appending?
Please help me with the code.
I'm assuming all three lists have the same length, since that is what you shared.
list1 = [[1, 74], [1, 224], [1, 247], [1, 5], [1, 225], [1, 207], [1, 79], [1, 131], [1, 180], [1, 20], [1, 104], [1, 93], [1, 213], [1, 93], [1, 151], [1, 223]]
list2 = [[2, 200], [2, 64], [2, 51], [2, 83], [2, 127], [2, 160], [2, 237], [2, 98], [2, 123], [2, 213], [2, 80], [2, 131], [2, 200], [2, 203], [2, 8], [2, 174]]
list3 = [[3, 148], [3, 72], [3, 37], [3, 40], [3, 237], [3, 24], [3, 177], [3, 205], [3, 52], [3, 53], [3, 155], [3, 208], [3, 184], [3, 44], [3, 202], [3, 171]]
for i in range(len(list1)):
    print([list1[i], list2[i], list3[i]])
At a basic level, the code above is enough.
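As an alternative sketch of the same idea, the built-in zip pairs up the items at each position, so no index is needed. The lists here are shortened to three items each for illustration, and merged is a hypothetical variable name.

```python
list1 = [[1, 74], [1, 224], [1, 247]]
list2 = [[2, 200], [2, 64], [2, 51]]
list3 = [[3, 148], [3, 72], [3, 37]]

# zip yields one tuple per position; list() turns each tuple into a list.
merged = [list(group) for group in zip(list1, list2, list3)]

for group in merged:
    print(group)
# [[1, 74], [2, 200], [3, 148]]
# [[1, 224], [2, 64], [3, 72]]
# [[1, 247], [2, 51], [3, 37]]
```

Note that zip stops at the shortest input, so unequal lengths would silently drop the extra items.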

Why removing a value from a 2D list works differently with a dynamic list vs. a fixed list (pre-defined)

If I generate a list and try to remove a value (e.g., 1) from a sub-list, it is removed from all sub-lists, but if I use a pre-defined list (identical to the one created), the result is different. Why?
The build function creates a matrix of x rows by x columns, where the first item of each row is the row number,
e.g., [0,[1,2,3],[1,2,3],[1,2,3]] [1,[1,2,3],[1,2,3],[1,2,3]] [2,[1,2,3],[1,2,3],[1,2,3]]
def build(size):
    values = []
    activetable = []
    for value in range(size):  # create the list of possible values
        values.append(value + 1)
    for row in range(size):
        # Create the "Active" table with all possible values
        activetable.append([row])
        for item in range(size):
            activetable[row].append(values)
    return activetable
This function is intended to remove a specific value from the list at the given row and column coordinates:
def remvalue(row, col, value, table):
    before = table[row][col]
    before.remove(value)
    table[row][col] = before
    return table
When I build a list and try to remove a value from one sub-list, it is removed from all sub-lists:
print("start")
table1 = build(3) # this function create a 2d table called table1
print(f" table 1: {table1}")
newtable = remvalue(row=0, col=1, value=1, table=table1)
print(f"from a dynamic table : {newtable}")
As you can see the value "1" has been removed from all sub-lists
start
table 1: [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
from a dynamic table : [[0, [2, 3], [2, 3], [2, 3]], [1, [2, 3], [2, 3], [2, 3]], [2, [2, 3], [2, 3], [2, 3]]]
But if I use a pre-defined list with exactly the same data, the result is different
table1 = [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
newtable = remvalue(row=0, col=1, value=1, table=table1)
print(f"from a predefined table : {newtable}")
As you can see it works as desired only when I use a pre-defined list. Why do we have this difference?
start
table 1: [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
from a dynamic table : [[0, [2, 3], [2, 3], [2, 3]], [1, [2, 3], [2, 3], [2, 3]], [2, [2, 3], [2, 3], [2, 3]]]
from a predefined table : [[0, [2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
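The difference comes from object identity. In build, activetable[row].append(values) appends the same values list object to every cell, so all cells are references to one list and a single remove shows up everywhere. The literal table contains separate [1, 2, 3] lists that only look identical. A minimal sketch of one possible fix, copying the list for each cell (the restructured build below is an illustration, not the only way to do it):

```python
def build(size):
    values = [value + 1 for value in range(size)]
    table = []
    for row in range(size):
        # list(values) makes a fresh copy per cell, so no two cells share an object.
        table.append([row] + [list(values) for _ in range(size)])
    return table

def remvalue(row, col, value, table):
    table[row][col].remove(value)
    return table

table1 = build(3)
remvalue(row=0, col=1, value=1, table=table1)
print(table1[0])  # [0, [2, 3], [1, 2, 3], [1, 2, 3]]
print(table1[1])  # [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]]
```

With the original build, table1[0][1] is table1[2][3] evaluates to True, which is a quick way to confirm the sharing.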

Creating Kfold cross validation set without sklearn

I am trying to split my data into K folds with train and test sets. I am stuck at the last step:
I have a data set example:
[1,2,3,4,5,6,7,8,9,10]
I have successfully created the partition for 5-fold cross-validation, and the output is
fold=[[2, 1], [6, 0], [7, 8], [9, 5], [4, 3]]
Now I want to create K such instances having K-1 training data and 1 validation set.
I am using this code:
```
cross_val={"train":[],"test":[]}
new_fold=folds.copy()
for i in range(4):
    val=folds.pop(i)
    cross_val["train"].append(folds)
    cross_val["test"].append(val)
    folds[i:i]=[val]
```
The output that I am getting is:
{'train': [[[6, 0], [7, 8], [9, 5], [4, 3]],
[[6, 0], [7, 8], [9, 5], [4, 3]],
[[6, 0], [7, 8], [9, 5], [4, 3]],
[[6, 0], [7, 8], [9, 5], [4, 3]]],
'test': [[6, 0], [7, 8], [9, 5], [4, 3]]}
This is the wrong output that I am getting.
But I want the output as
train test
[[6, 0], [7, 8], [9, 5], [4, 3]] [2,1]
[[2, 1], [7, 8], [9, 5], [4, 3]] [6,0]
[[6, 0], [2, 1], [9, 5], [4, 3]] [7,8]
[[6, 0], [7, 8], [9, 5], [2, 1]] [4,3]
[[6, 0], [7, 8], [2, 1], [4, 3]] [9,5]
On each iteration you modify the same list object and then append that same object again. Because every entry in cross_val["train"] refers to one list, any edit to it shows up in all entries.
You can create a cross-fold validation with:
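from pprint import pprint  # used only to display the result

train = []
test = []
cross_val = {'train': train, 'test': test}
for i, testi in enumerate(fold):
    # Slicing and + build a new list each time, so nothing is shared.
    train.append(fold[:i] + fold[i+1:])
    test.append(testi)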
For the given sample data, this gives us:
>>> pprint(cross_val)
{'test': [[2, 1], [6, 0], [7, 8], [9, 5], [4, 3]],
'train': [[[6, 0], [7, 8], [9, 5], [4, 3]],
[[2, 1], [7, 8], [9, 5], [4, 3]],
[[2, 1], [6, 0], [9, 5], [4, 3]],
[[2, 1], [6, 0], [7, 8], [4, 3]],
[[2, 1], [6, 0], [7, 8], [9, 5]]]}
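For completeness, here is a rough sketch of the whole process without sklearn, from raw data to train/test pairs. kfold_splits is a hypothetical helper; it assumes the data should be split in order without shuffling, and it flattens the training folds into one list, which differs from the nested layout shown above.

```python
def kfold_splits(data, k):
    """Yield (train, test) pairs; fold sizes differ by at most one item."""
    base, extra = divmod(len(data), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(data[start:start + size])
        start += size
    for i, test in enumerate(folds):
        # Rebuild the training list from every fold except the i-th.
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

splits = list(kfold_splits([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
print(splits[0])  # ([3, 4, 5, 6, 7, 8, 9, 10], [1, 2])
```

To randomize the folds, the data could be shuffled with random.shuffle before calling the helper.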

Merging multiples lists with same length when matching conditions

Given input, where each item is [<userID>, <number_of_views>]:
theList = [
[[3, 5], [1, 1], [2, 3]],
[[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
[[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
[[1, 2], [1, 1], [4, 2]]
]
expected output = [
[[3, 5], [2, 3], [1, 1]],
[[3, 5], [2, 3], [1, 2], [4, 2]],
[[3, 5], [2, 3], [1, 2], [4, 2]],
[[1, 3], [4, 2]]
]
For each sublist in theList, for example
theList[3] = [[1,2], [1,1], [4,2]],
how can I merge the items that share a userID (userID = 1 in this case) and sum their views (2 + 1 = 3 views) into a new item [1, 3]?
Expected theList[3] = [[1,3], [4,2]].
How could I make this process for all theList?
Thanks so much for spending time on this question!
This is one approach using collections.defaultdict.
Ex:
from collections import defaultdict
theList = [
[[3, 5], [1, 1], [2, 3]],
[[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
[[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
[[1, 2], [1, 1], [4, 2]]
]
result = []
for i in theList:
    r = defaultdict(int)
    for j, k in i:
        r[j] += k
    result.append(list(r.items()))
print(result)
Output:
[[(3, 5), (1, 1), (2, 3)],
[(1, 2), (3, 5), (2, 3), (4, 2)],
[(1, 2), (3, 5), (2, 3), (4, 2)],
[(1, 3), (4, 2)]]
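The output above contains tuples in first-seen order, while the expected output in the question uses lists and appears to be ordered by view count, highest first. That ordering is inferred from the sample, not stated by the asker. Under that assumption, a small variation could look like this (merge_views is a hypothetical name):

```python
from collections import defaultdict

theList = [
    [[3, 5], [1, 1], [2, 3]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [1, 1], [4, 2]],
]

def merge_views(pairs):
    totals = defaultdict(int)
    for user_id, views in pairs:
        totals[user_id] += views
    # sorted() is stable, so users with equal views keep first-seen order.
    ordered = sorted(totals.items(), key=lambda item: -item[1])
    return [[user_id, views] for user_id, views in ordered]

result = [merge_views(sublist) for sublist in theList]
print(result[3])  # [[1, 3], [4, 2]]
```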

Recursion - Euler 15

I am aware that there are published solutions to Euler 15. I technically have a working solution (it yields the correct number), but when it prints the routes, it does so incorrectly.
def legal_moves (row, column, grid):
    legal_moves = []
    if row != grid:
        legal_moves.append ([row+1,column])
    if column != grid:
        legal_moves.append ([row,column+1])
    if column == grid and row == grid:
        return False
    return legal_moves
def find_route (row,column,grid, route):
    l_moves = legal_moves (row,column,grid)
    if l_moves == False:
        route.append ([row,column])
        list_routes.append (route)
        return
    else:
        route.append ([row,column])
        if len(l_moves) == 1:
            row = l_moves[0][0]
            column = l_moves[0][1]
            find_route (row,column,grid,route)
        if len(l_moves) ==2:
            row_a, column_a = l_moves[0][0], l_moves[0][1]
            row_b, column_b = l_moves[1][0], l_moves[1][1]
            find_route (row_a,column_a,grid,route)
            find_route (row_b,column_b,grid,route)
grid = int(input("Enter A for grid size AxA: "))
list_routes = []
find_route(0,0,grid, route = [])
for item in list_routes:
    print (item)
    print ()
I ran it for board size 2 (so input: grid = 2) and the terminal printed
[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 1], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 2], [1, 2], [2, 2]]
[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 1], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 2], [1, 2], [2, 2]]
[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 1], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 2], [1, 2], [2, 2]]
[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 1], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 2], [1, 2], [2, 2]]
[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 1], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 2], [1, 2], [2, 2]]
[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 1], [1, 1], [2, 1], [2, 2], [1, 2], [2, 2], [0, 2], [1, 2], [2, 2]]
I cannot work out why line 1 does not just print:
[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]
and line 2:
[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]
etc...
Please can someone help me understand why?
Thanks
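The likely cause is that every recursive call appends to the same route list, and list_routes.append(route) stores a reference to that one list each time. All six entries are therefore the same object, which ends up holding every cell visited. One possible fix, sketched below, is to build a new list at each step so that each branch keeps its own path. find_routes and its routes parameter are hypothetical names, and the grid size is hard-coded to 2 instead of being read from input.

```python
def find_routes(row, column, grid, route, routes):
    # route + [...] creates a new list, leaving the caller's list untouched.
    route = route + [[row, column]]
    if row == grid and column == grid:
        routes.append(route)
        return
    if row != grid:
        find_routes(row + 1, column, grid, route, routes)
    if column != grid:
        find_routes(row, column + 1, grid, route, routes)

routes = []
find_routes(0, 0, 2, [], routes)
for item in routes:
    print(item)
# First two lines printed:
# [[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]]
# [[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]]
```

Copying on every step costs extra memory, which is acceptable for printing routes on small grids but not for counting routes on the 20x20 grid of the actual problem.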
