I have a CSV dataset as shown below:
index: 0
s_key: [1683, 1684, 1685, 1686, 1688, 1689, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 12740]
identifier: [0, 0]
edge_pairs: [[0, 793]]

index: 1
s_key: [9774, 9800, 9807, 9818, 9831, 9834, 9836, 9837, 9839, 9843, 13723, 21455]
identifier: [0, 1]
edge_pairs: [[1, 3], [1, 123], [1, 152], [1, 163], [1, 266], [1, 337], [1, 351], [1, 352], [1, 355], [1, 606], [1, 869], [1, 962], [1, 1125], [1, 1412], [1, 1413], [1, 1417], [1, 1435], [1, 1440], [1, 1454], [1, 1572], [1, 1588], [1, 1653], [1, 1726], [1, 1898], [1, 2075], [1, 2076], [1, 2166], [1, 2297], [1, 2299], [1, 2319], [1, 2327], [1, 2330], [1, 2335], [1, 2393], [1, 2395], [1, 2400], [1, 2405], [1, 2486]]

index: 3
s_key: [2156, 2896, 3028, 4023, 4256, 6787, 7265, 8882, 8970, 9831, 10959, 11268, 11341, 12601, 13737, 17264, 18906, 20430, 21747, 22228, 22229, 22512, 22841, 24049, 25104, 25394, 25731, 26045, 26103, 31121, 31522, 31839, 31851, 31859, 31872, 35527, 35547, 36538, 37150, 37345, 37692, 37888, 37895, 38962, 45332]
identifier: [0, 3]
edge_pairs: [[3, 8], [3, 11], [3, 12], [3, 13], [3, 27], [3, 34], [3, 99], [3, 123], [3, 125], [3, 130], [3, 132], [3, 133], [3, 134], [3, 144], [3, 147], [3, 152], [3, 154], [3, 180], [3, 181], [3, 207]]

index: 4
s_key: [25203, 25204, 25215, 25219, 25227, 25232, 25235, 25248, 25251, 25252, 25259, 25270]
identifier: [0, 4]
edge_pairs: [[4, 215], [4, 322], [4, 342], [4, 793], [4, 1043], [4, 1127], [4, 1176], [4, 1454], [4, 2154], [4, 2284], [4, 2331], [4, 2400], [4, 2759], [4, 2920], [4, 3335]]

index: 5
s_key: [27099, 27101, 27104, 27107, 27108, 27111, 27117, 27120, 27123, 27131, 27143, 27153, 27156, 27158, 27162, 27167, 27172, 27175, 27176, 27178, 27184, 27185]
identifier: [0, 5]
edge_pairs: [[5, 8], [5, 239], [5, 378], [5, 1163], [5, 1220], [5, 1378], [5, 1422], [5, 1440], [5, 1636], [5, 1681], [5, 2190], [5, 2303], [5, 2399]]
The index column represents each node.
The edge_pairs column represents the connections of each node.
For example, in index 0 the edge_pairs value [[0, 793]] represents the connection of node 0 with node 793, and so on.
I want to make a graph out of this CSV in the format that PyG accepts: data = Data(x=x, edge_index=edge_index, y=y).
I am unsure of what to take as node features and labels, and of how to represent the edge connections between the nodes.
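One common approach (a sketch, not the only way) is to parse each edge_pairs cell and concatenate all pairs into a 2 x num_edges edge_index; what to use for x and y depends on the task, e.g. the identifier column could serve as a label and a constant or degree-based vector as the feature. The sample rows below are hypothetical stand-ins for the CSV, and the torch conversion at the end assumes PyTorch/PyG are installed:

```python
import ast

# Hypothetical rows mirroring the CSV above: (index, edge_pairs as the raw string)
rows = [
    (0, "[[0, 793]]"),
    (1, "[[1, 3], [1, 123]]"),
]

sources, targets = [], []
for _, pairs in rows:
    for src, dst in ast.literal_eval(pairs):  # parse the stringified list of pairs
        sources.append(src)
        targets.append(dst)

edge_index = [sources, targets]  # shape [2, num_edges], the layout PyG expects
print(edge_index)  # [[0, 1, 1], [793, 3, 123]]

# With PyTorch installed, this would become:
# edge_index = torch.tensor([sources, targets], dtype=torch.long)
# data = Data(x=x, edge_index=edge_index, y=y)
```

Reading the real file would replace the hard-coded rows with pandas.read_csv plus the same ast.literal_eval step, since the lists are stored as strings in a CSV.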
If I generate a list with a build function and try to remove a value (e.g. 1) from a sub-list, it removes it from all sub-lists; but if I use a pre-defined list, identical to the one created, the result is different. Why?
The build function creates a table of size rows by size columns, where the first item of each row is the row number, e.g.:
[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, [1, 2, 3], [1, 2, 3], [1, 2, 3]]
[2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]
def build(size):
    values = []
    activetable = []
    for value in range(size):  # create the list of possible values
        values.append(value + 1)
    for row in range(size):
        # Create the "Active" table with all possible values
        activetable.append([row])
        for item in range(size):
            activetable[row].append(values)
    return activetable
This function is intended to remove a specific value from the list using the row and column coordinates:
def remvalue(row, col, value, table):
    before = table[row][col]
    before.remove(value)
    table[row][col] = before
    return table
When I build a list and try to remove a value in one sub-list, it is removed from all sub-lists:
print("start")
table1 = build(3) # this function create a 2d table called table1
print(f" table 1: {table1}")
newtable = remvalue(row=0, col=1, value=1, table=table1)
print(f"from a dynamic table : {newtable}")
As you can see the value "1" has been removed from all sub-lists
start
table 1: [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
from a dynamic table : [[0, [2, 3], [2, 3], [2, 3]], [1, [2, 3], [2, 3], [2, 3]], [2, [2, 3], [2, 3], [2, 3]]]
But if I use a pre-defined list with exactly the same data, the result is different
table1 = [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
newtable = remvalue(row=0, col=1, value=1, table=table1)
print(f"from a predefined table : {newtable}")
As you can see it works as desired only when I use a pre-defined list. Why do we have this difference?
start
table 1: [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
from a dynamic table : [[0, [2, 3], [2, 3], [2, 3]], [1, [2, 3], [2, 3], [2, 3]], [2, [2, 3], [2, 3], [2, 3]]]
from a predefined table : [[0, [2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
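The dynamic table behaves this way because build appends the same values list object into every cell, so all cells alias one list; the pre-defined literal creates a fresh list per cell. A small sketch of both behaviours (build_shared and build_copies are names introduced here, the latter showing one possible fix):

```python
def build_shared(size):
    # Mirrors build(): every cell gets the SAME list object
    values = [v + 1 for v in range(size)]
    table = []
    for row in range(size):
        table.append([row])
        for _ in range(size):
            table[row].append(values)        # same object appended each time
    return table

def build_copies(size):
    # Fix: create a fresh list for every cell
    table = []
    for row in range(size):
        table.append([row])
        for _ in range(size):
            table[row].append(list(range(1, size + 1)))  # new list per cell
    return table

shared = build_shared(3)
shared[0][1].remove(1)
print(shared[2][3])  # [2, 3] -- the removal shows up in every cell

fixed = build_copies(3)
fixed[0][1].remove(1)
print(fixed[0][1], fixed[0][2])  # [2, 3] [1, 2, 3] -- only one cell changed
```

Copying inside build (e.g. appending values.copy() or list(values)) would make the original remvalue behave like the pre-defined case.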
given input (each item is [<userID>, <number_of_views>]):
theList = [
    [[3, 5], [1, 1], [2, 3]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [1, 1], [4, 2]]
]
expected output = [
    [[3, 5], [2, 3], [1, 1]],
    [[3, 5], [2, 3], [1, 2], [4, 2]],
    [[3, 5], [2, 3], [1, 2], [4, 2]],
    [[1, 3], [4, 2]]
]
For each sublist in theList, e.g. theList[3] = [[1, 2], [1, 1], [4, 2]]:
how do I merge the items that have the same userID (userID = 1 in this case) and sum all the views corresponding to it ((2 + 1) = 3 views) into a new list --> [1, 3],
so that theList[3] becomes [[1, 3], [4, 2]]?
How could I apply this process to all of theList?
Thanks so much for spending time on this question!
This is one approach using collections.defaultdict.
Ex:
from collections import defaultdict

theList = [
    [[3, 5], [1, 1], [2, 3]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [1, 1], [4, 2]]
]
result = []
for i in theList:
    r = defaultdict(int)
    for j, k in i:
        r[j] += k
    result.append(list(r.items()))
print(result)
Output:
[[(3, 5), (1, 1), (2, 3)],
[(1, 2), (3, 5), (2, 3), (4, 2)],
[(1, 2), (3, 5), (2, 3), (4, 2)],
[(1, 3), (4, 2)]]
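The expected output in the question also orders each row by view count (highest first) and keeps lists rather than tuples; a variant of the same defaultdict approach that does both:

```python
from collections import defaultdict

theList = [
    [[3, 5], [1, 1], [2, 3]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [1, 1], [4, 2]]
]

result = []
for row in theList:
    totals = defaultdict(int)
    for user, views in row:
        totals[user] += views          # merge duplicate userIDs
    # emit lists instead of tuples, highest view count first
    # (sorted is stable, so ties keep their first-seen order)
    result.append(sorted(([u, v] for u, v in totals.items()),
                         key=lambda p: p[1], reverse=True))
print(result)
```

This reproduces the question's expected output exactly, including [[1, 3], [4, 2]] for the last row.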
Suppose I have an array, let's say [1, 2, 3, 4].
I want to find the sum as follows:
First I generate all the ways of splitting the array into consecutive groups, like:
(1 2 3 4)
(123)(4)
(1)(234)
(12)(34)
(12)(3)(4)
(1)(23)(4)
(1)(2)(34)
(1)(2)(3)(4)
The answer for one arrangement is the sum of the elements in each group multiplied by the length of that group, over all groups.
e.g. in the arrangement (123)(4), the value would be
(1+2+3)*3 + (4)*1
I just want the final sum, which is the total of all such values over all arrangements, not the actual groups. How can I do this?
I was able to do it by first generating all possible groups and then finding the sum
But since I only need the sum and not the actual groups, is there a better way?
The number of arrangements is 2**(len(L)-1): a list of 8 elements produces 128 different arrangements, so it is an exponential problem. You either generate all possible solutions and then calculate each answer, or you calculate each answer on the fly; either way it is still exponential.
def part1(L, start, lsum):
    if start == len(L):
        print(lsum)
    else:
        for i in range(start, len(L)):
            left = sum(L[start:i+1]) * (i - start + 1)
            part1(L, i + 1, lsum + left)

def part2(L, M, X, start):
    if start == len(L):
        M.append(X)
        print(sum(sum(x) * len(x) for x in X))
    else:
        for i in range(start, len(L)):
            part2(L, M, X + [L[start:i+1]], i + 1)
ex:
>>> L = [1, 2, 3, 4]
>>> part1(L, 0, 0)
10
17
15
28
13
20
22
40
>>> M = []
>>> part2(L, M, [], 0)
10
17
15
28
13
20
22
40
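Since the question only asks for the overall total of these printed values, the same recursion can accumulate instead of print; a sketch (total is a name introduced here):

```python
def total(L):
    # Same recursion as part1, but summing the per-arrangement values
    acc = 0

    def rec(start, lsum):
        nonlocal acc
        if start == len(L):
            acc += lsum          # one complete arrangement finished
        else:
            for i in range(start, len(L)):
                # group L[start:i+1]: its sum weighted by its length
                rec(i + 1, lsum + sum(L[start:i + 1]) * (i - start + 1))

    rec(0, 0)
    return acc

print(total([1, 2, 3, 4]))  # 165, i.e. 10+17+15+28+13+20+22+40
```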
edit: sum of all the sums in O(n**3)
for L = [1,2,3,4,5,6]
[[[1], [2], [3], [4], [5], [6]],
[[1], [2], [3], [4], [5, 6]],
[[1], [2], [3], [4, 5], [6]],
[[1], [2], [3], [4, 5, 6]],
[[1], [2], [3, 4], [5], [6]],
[[1], [2], [3, 4], [5, 6]],
[[1], [2], [3, 4, 5], [6]],
[[1], [2], [3, 4, 5, 6]],
[[1], [2, 3], [4], [5], [6]],
[[1], [2, 3], [4], [5, 6]],
[[1], [2, 3], [4, 5], [6]],
[[1], [2, 3], [4, 5, 6]],
[[1], [2, 3, 4], [5], [6]],
[[1], [2, 3, 4], [5, 6]],
[[1], [2, 3, 4, 5], [6]],
[[1], [2, 3, 4, 5, 6]],
[[1, 2], [3], [4], [5], [6]],
[[1, 2], [3], [4], [5, 6]],
[[1, 2], [3], [4, 5], [6]],
[[1, 2], [3], [4, 5, 6]],
[[1, 2], [3, 4], [5], [6]],
[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4, 5], [6]],
[[1, 2], [3, 4, 5, 6]],
[[1, 2, 3], [4], [5], [6]],
[[1, 2, 3], [4], [5, 6]],
[[1, 2, 3], [4, 5], [6]],
[[1, 2, 3], [4, 5, 6]],
[[1, 2, 3, 4], [5], [6]],
[[1, 2, 3, 4], [5, 6]],
[[1, 2, 3, 4, 5], [6]],
[[1, 2, 3, 4, 5, 6]]]
There seems to be a pattern: every one of the 32 partitions contains exactly one group starting at the first element of the list, while a group starts at any given later element in only 16 of them (half). So, for each element of the list, I add the contribution of all the groups which begin with that element.
def part3(L):
    ret = 0
    for i in range(len(L)):
        p = 0
        for k in range(len(L) - i - 1):
            p += sum(L[i:i+k+1]) * (k + 1) * 2**(len(L) - i - k - 2)
        p += sum(L[i:]) * (len(L) - i)
        ret += p * max(1, 2**(i - 1))
    return ret
edit2: to lower it to O(n^2) you need to use DP, building a table of prefix sums so each range sum costs O(1). Build an array S with S[0] = 0 and S[i] = S[i-1] + L[i-1]; then sum(L[a:b]) is S[b] - S[a].
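Applying that prefix-sum idea to part3 gives the O(n^2) version; a sketch (part3_fast is a name introduced here, using the convention S[0] = 0 so that sum(L[a:b]) == S[b] - S[a]):

```python
def part3_fast(L):
    n = len(L)
    # prefix sums: S[b] - S[a] == sum(L[a:b]), computed once in O(n)
    S = [0] * (n + 1)
    for i in range(n):
        S[i + 1] = S[i] + L[i]

    ret = 0
    for i in range(n):
        p = 0
        for k in range(n - i - 1):
            # groups starting at i with length k+1, weighted as in part3
            p += (S[i + k + 1] - S[i]) * (k + 1) * 2**(n - i - k - 2)
        p += (S[n] - S[i]) * (n - i)      # the group running to the end
        ret += p * max(1, 2**(i - 1))
    return ret

print(part3_fast([1, 2, 3, 4]))  # 165, matching the brute-force total
```

Every range sum is now O(1), so the two nested loops dominate and the whole function is O(n^2).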
Given the following data frame:
import pandas as pd
df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [[[1, 2], [3, 4], [5, 6]],
                         [[1, 2], [3, 4], [5, 6]],
                         [[1, 2], [3, 4], [5, 6]]]})
df
A B
0 a [[1, 2], [3, 4], [5, 6]]
1 b [[1, 2], [3, 4], [5, 6]]
2 c [[1, 2], [3, 4], [5, 6]]
I'd like to create a new column ('C') containing the first value of each sublist in column B, like this:
A B C
0 a [[1, 2], [3, 4], [5, 6]] [1,3,5]
1 b [[1, 2], [3, 4], [5, 6]] [1,3,5]
2 c [[1, 2], [3, 4], [5, 6]] [1,3,5]
So far, I've tried:
df['C']=df['B'][0]
...but that only returns the first tuple ([1, 2]).
Thanks in advance!
This works for me -
df['C'] = df['B'].apply(lambda x: [y[0] for y in x])
(Note that df['B'].str[0] would only return the first sublist of each row, e.g. [1, 2], not [1, 3, 5].)
try this:
df['C'] = df["B"].apply(lambda x : [y[0] for y in list(x)])