From this thread I found out that I can use random.choices for my needs:
class Weights:
    ITEM = {
        'a': 0.5,
        'b': 0.4,
        'c': 0.3,
        'd': 0.2,
        'e': 0.1
    }
import random
slot_1 = random.choices(population=list(Weights.ITEM.keys()), weights=list(Weights.ITEM.values()), k=1)[0]
slot_2 = ...?
slot_3 = ...?
Is it possible to get an array with k=3 that contains only "unique" results (probably [a,b,c]), or to somehow exclude any previously selected value from the next call (with k=1)?
For example, let's say slot_1 got "b"; slot_2 should then get a random value from the list of everything else, without "b".
This step can be performance-sensitive, and I think that creating new arrays each time is not a good idea.
Maybe there is something other than random.choices that can be applied in this case.
You could take all the samples at once using NumPy's random.choice with the replace=False option (assuming the weights are simply renormalized between steps) and store them using multiple assignment, which keeps the draw to one line of code. Note that p must sum to 1, so the weights above (which sum to 1.5) have to be normalized first.
import numpy as np
weights = np.array(list(Weights.ITEM.values()))
slot_1, slot_2, slot_3 = np.random.choice(list(Weights.ITEM.keys()), size=3, replace=False, p=weights / weights.sum())
More generally, you could have a function that generates arbitrary-length subsamples (k is the length of each subsample, n is the number of subsamples):
def a(n, k, values, weights):
    # draw n*k distinct values at once, then split them into n sublists of length k
    chunks = np.split(np.random.choice(values, size=n * k, replace=False, p=weights), n)
    return [list(sublist) for sublist in chunks]
>>> a(3,5, range(100), [.01]*100)
[[39, 34, 27, 91, 88], [19, 98, 62, 55, 38], [37, 22, 54, 11, 84]]
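If pulling in NumPy is not an option, the same kind of draw can be sketched with the standard library only. The following is a minimal sketch, not the approach from the answer above: it uses the Efraimidis-Spirakis key trick (each item gets the key u ** (1 / w) with u uniform in [0, 1), and the k items with the largest keys are kept), which should behave like drawing one item at a time and renormalizing the remaining weights. The function name weighted_sample_without_replacement is made up for illustration, and all weights are assumed to be strictly positive.

```python
import heapq
import random

def weighted_sample_without_replacement(weights, k, rng=random):
    """Pick k distinct keys from an {item: weight} dict (hypothetical helper).

    Assumes every weight is strictly positive.
    """
    if k > len(weights):
        raise ValueError("k cannot exceed the number of items")
    # One random key per item; heavier items tend to get larger keys.
    keyed = ((rng.random() ** (1.0 / w), item) for item, w in weights.items())
    # Keep the k items with the largest keys, largest first.
    return [item for _, item in heapq.nlargest(k, keyed)]

ITEM = {'a': 0.5, 'b': 0.4, 'c': 0.3, 'd': 0.2, 'e': 0.1}
slot_1, slot_2, slot_3 = weighted_sample_without_replacement(ITEM, 3)
```

No intermediate copies of the population are built between draws, which may matter if this step is performance-sensitive.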
I am having issues creating a dictionary that assigns a list of multiple values to each key. Currently the data is a list of two-item lists (category and value), for example:
sample_data = [["January", 9],["Februrary", 10], ["June", 12], ["March", 15], ["January", 10],["June", 14], ["March", 16]]
It has to be transformed into a dictionary like this:
d = {"January" : [9,10], "February":[10], "June":[12,14], "March": [15,16]}
This is my current code:
from collections import defaultdict
d = defaultdict(list)
for category, value in sample_data:
    d[category].append(value)
This works for small samples, but with very large samples of data it raises a ValueError saying "too many values to unpack". Is there any way I could improve this code, or is there another way of doing this?
The setdefault method inserts an empty list as the value for a key the first time the key is seen and returns that list, so you can append to it directly.
d = defaultdict(list)
for category, value in sample_data:
    d.setdefault(category, []).append(value)
Output:
defaultdict(<class 'list'>, {'January': [9, 10], 'Februrary': [10], 'June': [12, 14], 'March': [15, 16]})
Note: I do not have a larger sample set to work with, but the setdefault() method could possibly help with that.
One way to address this is to change the code to accept more than one value per row. This is just a guess, but the problem could be in your data: for example, one row for a particular month may contain two or more values at once.
Note: *value means the name can take multiple values (it collects all remaining items of the row into a list).
Without the * before value, each row must contain exactly one value after the category. That is why you got the error "too many values to unpack".
Because the sample data is not complete enough to show the exact point of failure, there may be other issues with the data. But this could help you eliminate the earlier error, or narrow it down.
data = [["January", 9],["Februrary", 10], ["June", 12],
        ["March", 15], ["January", 10],["June", 14], ["March", 16],
        ['April', 20, 21, 22]] # <--- add April & 3 values (to handle the earlier error)
from collections import defaultdict
# d = {"January" : [9,10], "February":[10], "June":[12,14],
#      "March": [15,16]}
# This is my current code:
dc = defaultdict(list)
for category, *value in data: # *value to accept multiple values
    dc[category].append(value)
print(dc)
output:
defaultdict(<class 'list'>, {'January': [[9], [10]], 'Februrary': [[10]], 'June': [[12], [14]], 'March': [[15], [16]], 'April': [[20, 21, 22]]})
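If flat lists are wanted (as in the dictionary from the question) even when a row carries several values, one option is to extend the list instead of appending to it. This is a small sketch under the assumption that every row starts with the category and that any remaining items are values; the names rows and grouped are illustrative only.

```python
from collections import defaultdict

# Illustrative rows; the last one carries three values instead of one.
rows = [["January", 9], ["June", 12], ["January", 10], ["April", 20, 21, 22]]

grouped = defaultdict(list)
for category, *values in rows:
    # extend() adds each value individually, so the lists stay flat
    grouped[category].extend(values)

print(dict(grouped))
```

With append, each row would contribute a nested list; with extend, each key maps to a single flat list of numbers.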
ISSUE
I have a for loop that creates a list of lists where each entry consists of an input and its associated output. I can't figure out how to iterate over the outputs after the list is created and return the corresponding input. I was able to solve my problem by converting the list into a dataframe and using .loc[], but I'm stubborn and want to produce the same result without converting to a dataframe. I also do not want to convert this into a dictionary; I have already solved that case as well.
I have included the list that is produced as well as the converted dataframe that works. In this case best_tree_size should return 100, as its output was the minimum result.
CURRENT CODE THAT WORKS
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
#list placeholder for loop calculation
leaf_list = []
#Write loop to find the ideal tree size from candidate_max_leaf_nodes
for max_leaf_nodes in candidate_max_leaf_nodes:
    #each iteration outputs a 2 item list [leaf, MAE], which appends to leaf_list as an array
    leaf_list.append([max_leaf_nodes, get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)])
#convert array into dataframe
scores = pd.DataFrame(leaf_list, columns =['Leaf', 'MAE'])
#Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
#idxmin() is finding the min value of MAE and returning the dataframe index
#.loc is utilizing the index from idxmin() and returning the corresponding value from Leaf that caused it
best_tree_size = scores.loc[scores.MAE.idxmin(), 'Leaf']
#clear list placeholder (if needed)
leaf_list.clear()
PRODUCED leaf_list
[[5, 35044.51299744237],
[25, 29016.41319191076],
[50, 27405.930473214907],
[100, 27282.50803885739],
[250, 27893.822225701646],
[500, 29454.18598068598]]
CONVERTED scores DATAFRAME
So you have a list of [leaf, MAE] and you want to get the item from that list with the minimum MAE?
You can do it like this:
scores = [
[5, 35044.51299744237],
[25, 29016.41319191076],
[50, 27405.930473214907],
[100, 27282.50803885739],
[250, 27893.822225701646],
[500, 29454.18598068598]
]
from operator import itemgetter
best_leaf, best_mae = min(scores, key=itemgetter(1))
# best_leaf will be equal to 100, best_mae will be equal to 27282.50803885739
The key here is itemgetter(1), which returns a callable that, when passed a tuple or a list, returns the element at index 1 (here, the MAE).
We use that as the key argument to min(), so that elements are compared based on their MAE value.
Numpy style:
leaf_list = [
[5, 35044.51299744237],
[25, 29016.41319191076],
[50, 27405.930473214907],
[100, 27282.50803885739],
[250, 27893.822225701646],
[500, 29454.18598068598]
]
import numpy as np
# convert to a NumPy array
leaf_list = np.array(leaf_list)
# flatten to one dimension
flatten = leaf_list.flatten()
# look at every second item (the outputs), find the minimum, and map its flat position back to a row index
index = np.where(flatten == flatten[1::2].min())[0]//2
# select the matching rows
out_list = leaf_list[index]
Output:
array([[ 100. , 27282.50803886]])
This also handles multiple minimum values (ties):
leaf_list = [[14, 6],
             [25, 55],
             [5, 6]]
#... same code
Output:
array([[14, 6],
[ 5, 6]])
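A variation on the same idea, sketched here as an assumption rather than as part of the answer above, compares only the MAE column instead of the flattened array. That way a leaf count that happens to equal the minimum MAE cannot be matched by mistake. The names mae and best_rows are illustrative.

```python
import numpy as np

leaf_list = np.array([
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598],
])

# Column 1 holds the MAE values; column 0 holds the leaf counts.
mae = leaf_list[:, 1]
# Boolean mask keeps every row whose MAE equals the minimum (ties included).
best_rows = leaf_list[mae == mae.min()]
# Leaf count of the first best row, converted back from float to int.
best_tree_size = int(best_rows[0, 0])
print(best_tree_size)
```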
I would like to sort the list s and, in the same manner, the list s1. The code below works for integers (after changing 2.2 to 2 and 20.6 to 20). How can I adjust the code for floats, please?
s = [2.2, 3, 1, 4, 5, 3]
s1 = [20.6, 600, 10, 40, 5000, 300]
res = []
for i in range(len(s1)):
    res0 = s1[s[i]]
    res.append(res0)
print(res)
print('Sorted s:', sorted(s))
print('Ind:', sorted(range(len(s)), key=lambda k: s[k]))
print('s1 in the same manner as s:', res)
The line res0 = s1[s[i]] in your code actually raises an error:
list indices must be integers or slices, not float.
Suppose the index is 0: s1[s[0]] -> s[0] == 2.2 -> s1[2.2]
Your code is using the values of s as indices into s1. It would not be able to sort one list in the manner of another even if the list contained only integers.
Instead, you should add two new lists:
s_index_list = sorted(range(len(s)), key=lambda k: s[k])
s1_sorted = sorted(s1)
The first contains the indices of the values of s in sorted order (see this answer: https://stackoverflow.com/a/7851166/18052186), and the second is s1 sorted.
Then, you replace this bit of your code.
res0 = s1[s[i]]
by
res0 = s1_sorted[s_index_list[i]]
That way, you can sort the list s1 in the same manner as s by associating a value of s1 with an index from s. The result would be:
[40, 10, 20.6, 5000, 300, 600]
Sorted s: [1, 2.2, 3, 3, 4, 5]
Ind: [2, 0, 1, 5, 3, 4]
s1 in the same manner as s: [40, 10, 20.6, 5000, 300, 600]
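If the goal is instead to reorder s1 with the same permutation that sorts s (one possible reading of the question, and an assumption here), the index list can be applied to both lists directly. This is only a sketch; the names order, s_sorted and s1_reordered are illustrative. Note that it produces a different result from the output shown above.

```python
s = [2.2, 3, 1, 4, 5, 3]
s1 = [20.6, 600, 10, 40, 5000, 300]

# Indices of s in ascending order of value; works for ints and floats alike.
order = sorted(range(len(s)), key=lambda k: s[k])

# Apply the same permutation to both lists so the pairs stay together.
s_sorted = [s[i] for i in order]
s1_reordered = [s1[i] for i in order]

print(s_sorted)      # [1, 2.2, 3, 3, 4, 5]
print(s1_reordered)  # [10, 20.6, 600, 300, 40, 5000]
```

Because the values of s are only used as sort keys and never as indices, floats cause no TypeError here.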
I want to invert list order without changing the values.
The original list is the following:
[15, 15, 10, 8, 73, 1]
The expected resulting list is:
[10, 8, 15, 15, 1, 73]
The example is taken from a real data handling problem involving a more complex pandas data frame.
I posed it as a list problem only to simplify the issue, so a solution using a pandas function is also fine.
zlist = int(len(list)/2)
for i in range(0, zlist):
    a, b = list.index(sorted(list, reverse=True)[i]), list.index(sorted(list,reverse=False)[i])
    list[b], list[a] = list[a], list[b]
I can't seem to find a solution for this. Given two theano tensors a and b, I want to find the indices of elements in b within the tensor a. This example will help, say a = [1, 5, 10, 17, 23, 39] and b = [1, 10, 39], I want the result to be the indices of the b values in tensor a, i.e. [0, 2, 5].
After spending some time, I thought the best way would be to use scan; here is my shot at the minimal example.
def getIndices(b_i, b_v, ar):
    pI_subtensor = pI[b_i]
    return T.set_subtensor(pI_subtensor, np.where(ar == b_v)[0])
ar = T.ivector()
b = T.ivector()
pI = T.zeros_like(b)
result, updates = theano.scan(fn=getIndices,
                              outputs_info=None,
                              sequences=[T.arange(b.shape[0], dtype='int32'), b],
                              non_sequences=ar)
get_proposal_indices = theano.function([b, ar], outputs=result)
d = get_proposal_indices( np.asarray([1, 10, 39], dtype=np.int32), np.asarray([1, 5, 10, 17, 23, 39], dtype=np.int32) )
I am getting the error:
TypeError: Trying to increment a 0-dimensional subtensor with a 1-dimensional value.
in the return statement line. Further, the output needs to be a single tensor of shape b and I am not sure if this would get the desired result. Any suggestion would be helpful.
It all depends on how big your arrays will be. As long as the comparison matrix (of shape len(b) by len(a)) fits in memory, you can proceed as follows:
import numpy as np
import theano
import theano.tensor as T
aa = T.ivector()
bb = T.ivector()
equality = T.eq(aa, bb[:, np.newaxis])
indices = equality.nonzero()[1]
f = theano.function([aa, bb], indices)
a = np.array([1, 5, 10, 17, 23, 39], dtype=np.int32)
b = np.array([1, 10, 39], dtype=np.int32)
f(a, b)
# outputs [0, 2, 5]
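The broadcasting step can be checked without Theano. The following plain NumPy sketch mirrors the expression above under the assumption that every value of b occurs exactly once in a; a value missing from a would simply be dropped from the result, and a duplicated value would yield more than one index.

```python
import numpy as np

a = np.array([1, 5, 10, 17, 23, 39], dtype=np.int32)
b = np.array([1, 10, 39], dtype=np.int32)

# b[:, np.newaxis] has shape (3, 1); comparing with a (shape (6,))
# broadcasts to a (3, 6) boolean matrix: row i marks where a == b[i].
equality = a == b[:, np.newaxis]

# nonzero() returns (row_indices, column_indices); the column indices
# are the positions in a, listed in the order of the rows (i.e. of b).
indices = equality.nonzero()[1]
print(indices)  # [0 2 5]
```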