Creating Kfold cross validation set without sklearn - python-3.x

I am trying to split my data into K folds, each with a train and a test set, and I am stuck at the last step.
Here is an example data set:
[1,2,3,4,5,6,7,8,9,10]
I have successfully created the partition for 5-fold cross validation, and the output is:
fold=[[2, 1], [6, 0], [7, 8], [9, 5], [4, 3]]
Now I want to create K such instances, each with K-1 folds of training data and 1 validation fold.
I am using this code:
```
cross_val={"train":[],"test":[]}
new_fold=folds.copy()
for i in range(4):
    val=folds.pop(i)
    cross_val["train"].append(folds)
    cross_val["test"].append(val)
    folds[i:i]=[val]
```
The output that I am getting is:
{'train': [[[6, 0], [7, 8], [9, 5], [4, 3]],
           [[6, 0], [7, 8], [9, 5], [4, 3]],
           [[6, 0], [7, 8], [9, 5], [4, 3]],
           [[6, 0], [7, 8], [9, 5], [4, 3]]],
 'test': [[6, 0], [7, 8], [9, 5], [4, 3]]}
This output is wrong. The output I want is:
train                                test
[[6, 0], [7, 8], [9, 5], [4, 3]]     [2,1]
[[2, 1], [7, 8], [9, 5], [4, 3]]     [6,0]
[[6, 0], [2, 1], [9, 5], [4, 3]]     [7,8]
[[6, 0], [7, 8], [9, 5], [2, 1]]     [4,3]
[[6, 0], [7, 8], [2, 1], [4, 3]]     [9,5]

Each iteration edits the same list object and then appends that same object to cross_val["train"]. Every entry therefore refers to one list, so any later edit to it shows up in all of the entries at once.
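The aliasing described above can be reproduced in isolation. This is a minimal sketch with a shortened `folds` list; the names `aliased`, `snapshots`, and `shared` are made up for the illustration and do not appear in the question.

```python
folds = [[2, 1], [6, 0], [7, 8]]

# Appending the list itself stores a reference, not a snapshot.
aliased = [folds, folds]
removed = folds.pop(0)              # mutates the one shared list
shared = aliased[0] is aliased[1]   # both entries are the same object

# Appending a shallow copy keeps the contents as they were at that moment.
folds.insert(0, removed)            # restore the original order
snapshots = [folds[:]]
folds.pop(0)                        # this edit no longer affects the copy

print(aliased[0])     # the mutated list, seen through either entry
print(snapshots[0])   # the untouched copy
```

Building a new list for each split, as opposed to mutating one list in place, avoids the problem entirely.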
You can build the cross-validation splits with:
train = []
test = []
cross_val={'train': train, 'test': test}
for i, testi in enumerate(fold):
    # every fold except the i-th; concatenation creates a new list each time
    train.append(fold[:i] + fold[i+1:])
    test.append(testi)
For the given sample data, this gives us:
>>> pprint(cross_val)
{'test': [[2, 1], [6, 0], [7, 8], [9, 5], [4, 3]],
 'train': [[[6, 0], [7, 8], [9, 5], [4, 3]],
           [[2, 1], [7, 8], [9, 5], [4, 3]],
           [[2, 1], [6, 0], [9, 5], [4, 3]],
           [[2, 1], [6, 0], [7, 8], [4, 3]],
           [[2, 1], [6, 0], [7, 8], [9, 5]]]}
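The same idea can be wrapped into one helper that starts from the raw data. This is only a sketch under a few assumptions: the function name `k_fold_splits` and its `seed` parameter are hypothetical, the folds are formed by shuffling and then slicing with a stride, and the training folds are flattened into one list, which the answer above does not do.

```python
import random


def k_fold_splits(data, k, seed=0):
    """Return a list of (train, test) pairs, one per fold."""
    items = list(data)
    random.Random(seed).shuffle(items)          # reproducible shuffle
    folds = [items[i::k] for i in range(k)]     # k roughly equal folds
    splits = []
    for i, test in enumerate(folds):
        # Flatten every fold except the i-th into a fresh training list.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


splits = k_fold_splits([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], k=5)
for train, test in splits:
    print(train, test)
```

Each training list is newly built inside the loop, so no two splits share a list object.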

Related

Converting CSV to PyG graph

I have a CSV dataset as shown below:
index | s_key | identifier | edge_pairs
0 | [1683, 1684, 1685, 1686, 1688, 1689, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 12740] | [0, 0] | [[0, 793]]
1 | [9774, 9800, 9807, 9818, 9831, 9834, 9836, 9837, 9839, 9843, 13723, 21455] | [0, 1] | [[1, 3], [1, 123], [1, 152], [1, 163], [1, 266], [1, 337], [1, 351], [1, 352], [1, 355], [1, 606], [1, 869], [1, 962], [1, 1125], [1, 1412], [1, 1413], [1, 1417], [1, 1435], [1, 1440], [1, 1454], [1, 1572], [1, 1588], [1, 1653], [1, 1726], [1, 1898], [1, 2075], [1, 2076], [1, 2166], [1, 2297], [1, 2299], [1, 2319], [1, 2327], [1, 2330], [1, 2335], [1, 2393], [1, 2395], [1, 2400], [1, 2405], [1, 2486]]
3 | [2156, 2896, 3028, 4023, 4256, 6787, 7265, 8882, 8970, 9831, 10959, 11268, 11341, 12601, 13737, 17264, 18906, 20430, 21747, 22228, 22229, 22512, 22841, 24049, 25104, 25394, 25731, 26045, 26103, 31121, 31522, 31839, 31851, 31859, 31872, 35527, 35547, 36538, 37150, 37345, 37692, 37888, 37895, 38962, 45332] | [0, 3] | [[3, 8], [3, 11], [3, 12], [3, 13], [3, 27], [3, 34], [3, 99], [3, 123], [3, 125], [3, 130], [3, 132], [3, 133], [3, 134], [3, 144], [3, 147], [3, 152], [3, 154], [3, 180], [3, 181], [3, 207]]
4 | [25203, 25204, 25215, 25219, 25227, 25232, 25235, 25248, 25251, 25252, 25259, 25270] | [0, 4] | [[4, 215], [4, 322], [4, 342], [4, 793], [4, 1043], [4, 1127], [4, 1176], [4, 1454], [4, 2154], [4, 2284], [4, 2331], [4, 2400], [4, 2759], [4, 2920], [4, 3335]]
5 | [27099, 27101, 27104, 27107, 27108, 27111, 27117, 27120, 27123, 27131, 27143, 27153, 27156, 27158, 27162, 27167, 27172, 27175, 27176, 27178, 27184, 27185] | [0, 5] | [[5, 8], [5, 239], [5, 378], [5, 1163], [5, 1220], [5, 1378], [5, 1422], [5, 1440], [5, 1636], [5, 1681], [5, 2190], [5, 2303], [5, 2399]]
The index column identifies each node.
The edge_pairs column lists each node's connections.
For example, in the row with index 0, the edge_pairs value [[0, 793]] means that node 0 is connected to node 793, and so on.
I want to build a graph from this CSV in a format that PyG accepts: data = Data(x=x, edge_index=edge_index, y=y).
I am unsure what to use as node features and labels, and how to represent the edges between nodes.
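For the edge part only, one possible starting point is to flatten the edge_pairs column into two parallel lists, since PyG expects edge_index as an integer tensor of shape [2, num_edges]. The sketch below is pure Python and rests on assumptions: the `rows` list is a hypothetical stand-in for the parsed CSV, and edge_pairs is assumed to arrive as text, as it typically would after reading a CSV file. It does not address node features or labels.

```python
import ast

# Hypothetical rows mirroring the `index` and `edge_pairs` columns,
# with edge_pairs still stored as a string.
rows = [
    {"index": 0, "edge_pairs": "[[0, 793]]"},
    {"index": 1, "edge_pairs": "[[1, 3], [1, 123], [1, 152]]"},
]

sources, targets = [], []
for row in rows:
    # literal_eval safely turns the text "[[1, 3], ...]" into nested lists.
    for src, dst in ast.literal_eval(row["edge_pairs"]):
        sources.append(src)
        targets.append(dst)

# Two parallel rows: edge k runs from edge_index[0][k] to edge_index[1][k].
edge_index = [sources, targets]
print(edge_index)
```

The nested list could then be converted with `torch.tensor(edge_index, dtype=torch.long)` before being passed to `Data`. Whether each pair should also be added in the reverse direction depends on whether the graph is meant to be undirected.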

Partition a given list into two or more unordered sets

lst = [2, 2, 2, 3, 5, 7]
I want to partition it into two or more sets.
Here is my code
import more_itertools as mit
import pprint as pp
from operator import itemgetter
from itertools import groupby
x = [part for k in range(1, len(lst) + 1) for part in mit.set_partitions(lst, k)]
x.sort()
x = list(map(itemgetter(0), groupby(x)))
pp.pprint(x)
Here is my current output:
[[[2], [2], [2], [3], [5], [7]],
[[2], [2], [2], [3], [5, 7]],
[[2], [2], [2], [3, 5], [7]],
[[2], [2], [2], [3, 5, 7]],
[[2], [2], [2], [5], [3, 7]],
[[2], [2], [2, 3], [5], [7]],
[[2], [2], [2, 3], [5, 7]],
[[2], [2], [2, 3, 5], [7]],
[[2], [2], [2, 3, 5, 7]],
[[2], [2], [2, 5], [3, 7]],
[[2], [2], [3], [2, 5], [7]],
[[2], [2], [3], [2, 5, 7]],
[[2], [2], [3], [5], [2, 7]],
[[2], [2], [3, 5], [2, 7]],
[[2], [2], [5], [2, 3, 7]],
[[2], [2, 2], [3], [5], [7]],
[[2], [2, 2], [3], [5, 7]],
[[2], [2, 2], [3, 5], [7]],
[[2], [2, 2], [3, 5, 7]],
[[2], [2, 2], [5], [3, 7]],
[[2], [2, 2, 3], [5], [7]],
[[2], [2, 2, 3], [5, 7]],
[[2], [2, 2, 3, 5], [7]],
[[2], [2, 2, 3, 5, 7]],
[[2], [2, 2, 5], [3, 7]],
[[2], [2, 3], [2, 5], [7]],
[[2], [2, 3], [2, 5, 7]],
[[2], [2, 3], [5], [2, 7]],
[[2], [2, 3, 5], [2, 7]],
[[2], [2, 5], [2, 3, 7]],
[[2], [3], [2, 2, 5], [7]],
[[2], [3], [2, 2, 5, 7]],
[[2], [3], [2, 5], [2, 7]],
[[2], [3], [5], [2, 2, 7]],
[[2], [3, 5], [2, 2, 7]],
[[2], [5], [2, 2, 3, 7]],
[[2, 2], [2], [3], [5], [7]],
[[2, 2], [2], [3], [5, 7]],
[[2, 2], [2], [3, 5], [7]],
[[2, 2], [2], [3, 5, 7]],
[[2, 2], [2], [5], [3, 7]],
[[2, 2], [2, 3], [5], [7]],
[[2, 2], [2, 3], [5, 7]],
[[2, 2], [2, 3, 5], [7]],
[[2, 2], [2, 3, 5, 7]],
[[2, 2], [2, 5], [3, 7]],
[[2, 2], [3], [2, 5], [7]],
[[2, 2], [3], [2, 5, 7]],
[[2, 2], [3], [5], [2, 7]],
[[2, 2], [3, 5], [2, 7]],
[[2, 2], [5], [2, 3, 7]],
[[2, 2, 2], [3], [5], [7]],
[[2, 2, 2], [3], [5, 7]],
[[2, 2, 2], [3, 5], [7]],
[[2, 2, 2], [3, 5, 7]],
[[2, 2, 2], [5], [3, 7]],
[[2, 2, 2, 3], [5], [7]],
[[2, 2, 2, 3], [5, 7]],
[[2, 2, 2, 3, 5], [7]],
[[2, 2, 2, 3, 5, 7]],
[[2, 2, 2, 5], [3, 7]],
[[2, 2, 3], [2, 5], [7]],
[[2, 2, 3], [2, 5, 7]],
[[2, 2, 3], [5], [2, 7]],
[[2, 2, 3, 5], [2, 7]],
[[2, 2, 5], [2, 3, 7]],
[[2, 3], [2, 2, 5], [7]],
[[2, 3], [2, 2, 5, 7]],
[[2, 3], [2, 5], [2, 7]],
[[2, 3], [5], [2, 2, 7]],
[[2, 3, 5], [2, 2, 7]],
[[2, 5], [2, 2, 3, 7]],
[[3], [2, 2, 2, 5], [7]],
[[3], [2, 2, 2, 5, 7]],
[[3], [2, 2, 5], [2, 7]],
[[3], [2, 5], [2, 2, 7]],
[[3], [5], [2, 2, 2, 7]],
[[3, 5], [2, 2, 2, 7]],
[[5], [2, 2, 2, 3, 7]]]
As you can see, I have removed some of the redundancy by sorting and then applying groupby(x), but redundancies remain. For example,
[[2], [2, 2], [3], [5], [7]]
and
[[2, 2], [2], [3], [5], [7]]
are the same thing to me, since order is not important.
Please don't close the question without putting in some effort. I framed it only after going through similar questions on Stack Overflow; those are for ordered sets, while mine is for unordered sets. (Parts of my code were adapted from those answers.)
You need to bring each partition into a canonical form so that only one of its permutations survives in the result. This can be done by sorting the sublists within each partition and then using the partitions as dictionary keys.
import more_itertools as mit
import pprint as pp
from operator import itemgetter
from itertools import groupby
lst=[2, 2, 2, 3, 5, 7]
x = [part for k in range(1, len(lst) + 1) for part in mit.set_partitions(lst, k)]
x.sort()
x = list(map(itemgetter(0), groupby(x)))
# sort the blocks within each partition (in place; list.sort() returns None)
for temp in x:
    temp.sort()
# build a dictionary (hash table); keys must be hashable, so use the string form of each partition
unique_combinations = {str(temp): temp for temp in x}
# keys are unique, so the values are the unique partitions
unique_combinations = list(unique_combinations.values())
pp.pprint(unique_combinations)
Output
[[[2], [2], [2], [3], [5], [7]],
[[2], [2], [2], [3], [5, 7]],
[[2], [2], [2], [3, 5], [7]],
[[2], [2], [2], [3, 5, 7]],
[[2], [2], [2], [3, 7], [5]],
[[2], [2], [2, 3], [5], [7]],
[[2], [2], [2, 3], [5, 7]],
[[2], [2], [2, 3, 5], [7]],
[[2], [2], [2, 3, 5, 7]],
[[2], [2], [2, 5], [3, 7]],
[[2], [2], [2, 5], [3], [7]],
[[2], [2], [2, 5, 7], [3]],
[[2], [2], [2, 7], [3], [5]],
[[2], [2], [2, 7], [3, 5]],
[[2], [2], [2, 3, 7], [5]],
[[2], [2, 2], [3], [5], [7]],
[[2], [2, 2], [3], [5, 7]],
[[2], [2, 2], [3, 5], [7]],
[[2], [2, 2], [3, 5, 7]],
[[2], [2, 2], [3, 7], [5]],
[[2], [2, 2, 3], [5], [7]],
[[2], [2, 2, 3], [5, 7]],
[[2], [2, 2, 3, 5], [7]],
[[2], [2, 2, 3, 5, 7]],
[[2], [2, 2, 5], [3, 7]],
[[2], [2, 3], [2, 5], [7]],
[[2], [2, 3], [2, 5, 7]],
[[2], [2, 3], [2, 7], [5]],
[[2], [2, 3, 5], [2, 7]],
[[2], [2, 3, 7], [2, 5]],
[[2], [2, 2, 5], [3], [7]],
[[2], [2, 2, 5, 7], [3]],
[[2], [2, 5], [2, 7], [3]],
[[2], [2, 2, 7], [3], [5]],
[[2], [2, 2, 7], [3, 5]],
[[2], [2, 2, 3, 7], [5]],
[[2, 2], [2, 3], [5], [7]],
[[2, 2], [2, 3], [5, 7]],
[[2, 2], [2, 3, 5], [7]],
[[2, 2], [2, 3, 5, 7]],
[[2, 2], [2, 5], [3, 7]],
[[2, 2], [2, 5], [3], [7]],
[[2, 2], [2, 5, 7], [3]],
[[2, 2], [2, 7], [3], [5]],
[[2, 2], [2, 7], [3, 5]],
[[2, 2], [2, 3, 7], [5]],
[[2, 2, 2], [3], [5], [7]],
[[2, 2, 2], [3], [5, 7]],
[[2, 2, 2], [3, 5], [7]],
[[2, 2, 2], [3, 5, 7]],
[[2, 2, 2], [3, 7], [5]],
[[2, 2, 2, 3], [5], [7]],
[[2, 2, 2, 3], [5, 7]],
[[2, 2, 2, 3, 5], [7]],
[[2, 2, 2, 3, 5, 7]],
[[2, 2, 2, 5], [3, 7]],
[[2, 2, 3], [2, 5], [7]],
[[2, 2, 3], [2, 5, 7]],
[[2, 2, 3], [2, 7], [5]],
[[2, 2, 3, 5], [2, 7]],
[[2, 2, 5], [2, 3, 7]],
[[2, 2, 5], [2, 3], [7]],
[[2, 2, 5, 7], [2, 3]],
[[2, 3], [2, 5], [2, 7]],
[[2, 2, 7], [2, 3], [5]],
[[2, 2, 7], [2, 3, 5]],
[[2, 2, 3, 7], [2, 5]],
[[2, 2, 2, 5], [3], [7]],
[[2, 2, 2, 5, 7], [3]],
[[2, 2, 5], [2, 7], [3]],
[[2, 2, 7], [2, 5], [3]],
[[2, 2, 2, 7], [3], [5]],
[[2, 2, 2, 7], [3, 5]],
[[2, 2, 2, 3, 7], [5]]]
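The same canonicalization can be sketched without string keys by turning each partition into a sorted tuple of tuples and collecting those in a set. To keep the example self-contained, it swaps `mit.set_partitions` for a small hand-written recursive generator; the names `set_partitions` and `unique_partitions` below are local to this sketch and are not the more_itertools functions.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def unique_partitions(items):
    """Partitions of a multiset, ignoring the order of blocks and elements."""
    seen = set()
    result = []
    for partition in set_partitions(sorted(items)):
        # Canonical form: sorted blocks, each block sorted, all hashable.
        key = tuple(sorted(tuple(sorted(block)) for block in partition))
        if key not in seen:
            seen.add(key)
            result.append([list(block) for block in key])
    return result


lst = [2, 2, 2, 3, 5, 7]
parts = unique_partitions(lst)
print(len(parts))
```

For this input the count agrees with the 74 partitions listed in the output above.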

list merging between list with condition in python [closed]

I have output consisting of separate lists like this:
[[1, 74], [1, 224], [1, 247], [1, 5], [1, 225], [1, 207], [1, 79], [1, 131], [1, 180], [1, 20], [1, 104], [1, 93], [1, 213], [1, 93], [1, 151], [1, 223]]
[[2, 200], [2, 64], [2, 51], [2, 83], [2, 127], [2, 160], [2, 237], [2, 98], [2, 123], [2, 213], [2, 80], [2, 131], [2, 200], [2, 203], [2, 8], [2, 174]]
[[3, 148], [3, 72], [3, 37], [3, 40], [3, 237], [3, 24], [3, 177], [3, 205], [3, 52], [3, 53], [3, 155], [3, 208], [3, 184], [3, 44], [3, 202], [3, 171]]
but I want the output to become:
[[1,74],[2,200],[3,148]]
[[1,224],[2,64],[3,72]]
[[1,247],[2,51],[3,37]]
and so on...
Is this done by mapping, or by looping and then appending?
Please help me with the code.
I'm assuming all three lists have the same length, as in the data you shared.
list1 = [[1, 74], [1, 224], [1, 247], [1, 5], [1, 225], [1, 207], [1, 79], [1, 131], [1, 180], [1, 20], [1, 104], [1, 93], [1, 213], [1, 93], [1, 151], [1, 223]]
list2 = [[2, 200], [2, 64], [2, 51], [2, 83], [2, 127], [2, 160], [2, 237], [2, 98], [2, 123], [2, 213], [2, 80], [2, 131], [2, 200], [2, 203], [2, 8], [2, 174]]
list3 = [[3, 148], [3, 72], [3, 37], [3, 40], [3, 237], [3, 24], [3, 177], [3, 205], [3, 52], [3, 53], [3, 155], [3, 208], [3, 184], [3, 44], [3, 202], [3, 171]]
for i in range(len(list1)):
    print([list1[i],list2[i],list3[i]])
At a basic level, the code above is enough.
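The index loop above can also be written with the built-in `zip`, which pairs up the items at the same position in each list. This is a minimal sketch on shortened versions of the three lists; the name `merged` is made up for the example.

```python
list1 = [[1, 74], [1, 224], [1, 247]]
list2 = [[2, 200], [2, 64], [2, 51]]
list3 = [[3, 148], [3, 72], [3, 37]]

# zip yields one tuple per position; list() turns each tuple into a list.
merged = [list(group) for group in zip(list1, list2, list3)]
for row in merged:
    print(row)
```

Note that `zip` stops at the shortest input, so it silently drops trailing items if the lists differ in length.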

PyTorch unfold vs as_stride

It seems that PyTorch's unfold and as_strided do the same thing, except that with unfold you cannot control the output tensor size.
import torch
import torch.nn as nn
x = torch.arange(0, 10)
x1 = x.unfold(0, 3, 1)
x2 = x.as_strided((8,3), (1,1))
print(f'x1 = {x1}')
print(f'x2 = {x2}')
Output:
x1 = tensor([[0, 1, 2],
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [4, 5, 6],
        [5, 6, 7],
        [6, 7, 8],
        [7, 8, 9]])
x2 = tensor([[0, 1, 2],
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [4, 5, 6],
        [5, 6, 7],
        [6, 7, 8],
        [7, 8, 9]])
Is there any situation in which you should use unfold instead of as_strided, or vice versa?
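Why the two calls agree here can be seen from a pure-Python model of the indexing. This is only an illustration on plain lists, not PyTorch's implementation; the helper names `unfold_1d` and `as_strided_2d` are hypothetical. In this model, unfold derives the number of windows from the input length, while as_strided takes the shape and strides exactly as given by the caller.

```python
def unfold_1d(values, size, step):
    """Sliding windows of length `size`, starting every `step` elements."""
    count = (len(values) - size) // step + 1
    return [values[i * step : i * step + size] for i in range(count)]


def as_strided_2d(values, shape, strides, offset=0):
    """Entry [i][j] reads values[offset + i*strides[0] + j*strides[1]]."""
    rows, cols = shape
    return [
        [values[offset + i * strides[0] + j * strides[1]] for j in range(cols)]
        for i in range(rows)
    ]


x = list(range(10))
x1 = unfold_1d(x, 3, 1)                  # window count computed: 8
x2 = as_strided_2d(x, (8, 3), (1, 1))    # shape (8, 3) supplied by hand
print(x1 == x2)
```

With a step of 2, the matching strided call would need shape (4, 3) and strides (2, 1), which the caller has to work out for themselves.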

Merging multiples lists with same length when matching conditions

Given input, where each item is [<userID>, <number_of_views>]:
theList = [
    [[3, 5], [1, 1], [2, 3]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [1, 1], [4, 2]]
]
expected output = [
    [[3, 5], [2, 3], [1, 1]],
    [[3, 5], [2, 3], [1, 2], [4, 2]],
    [[3, 5], [2, 3], [1, 2], [4, 2]],
    [[1, 3], [4, 2]]
]
For each sublist in theList, for example
theList[3] = [[1,2], [1,1], [4,2]]
how can I merge the items that share a userID (userID = 1 in this case) and sum their views (2 + 1 = 3) into a new item [1, 3]?
Expected: theList[3] = [[1,3], [4,2]].
How can I apply this to every sublist in theList?
Thanks so much for spending time on this question!
This is one approach using collections.defaultdict.
Ex:
from collections import defaultdict
theList = [
    [[3, 5], [1, 1], [2, 3]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [1, 1], [4, 2]]
]
result = []
for i in theList:
    r = defaultdict(int)
    for j, k in i:
        r[j] += k
    result.append(list(r.items()))
print(result)
Output:
[[(3, 5), (1, 1), (2, 3)],
 [(1, 2), (3, 5), (2, 3), (4, 2)],
 [(1, 2), (3, 5), (2, 3), (4, 2)],
 [(1, 3), (4, 2)]]
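The output above has the right totals but differs from the expected output in two ways: the items are tuples, and they are in first-seen order. The sketch below converts them to lists and sorts by total views, highest first. The ordering rule is an assumption inferred from the expected output in the question, which never states it; the function name `merge_views` is hypothetical.

```python
from collections import defaultdict


def merge_views(sublist):
    """Sum views per userID and return [userID, total] lists."""
    totals = defaultdict(int)
    for user_id, views in sublist:
        totals[user_id] += views
    # Assumption: highest total first; ties keep first-seen order
    # because sorted() is stable, also with reverse=True.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [[user_id, views] for user_id, views in ranked]


theList = [
    [[3, 5], [1, 1], [2, 3]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [3, 5], [3, 0], [2, 3], [4, 2]],
    [[1, 2], [1, 1], [4, 2]],
]
merged = [merge_views(sublist) for sublist in theList]
print(merged)
```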
