How to convert a list's values to a dict's values - python-3.x

I have a list. It is a list of word indices:
lst = [[1, 2, 3],
       [4, 5],
       [6]]
I also have a dictionary. Each value in the dictionary is a word vector (word2vec), and every vector has the same dimension (of course):
dic = {1: array([0.1, 0.2, 0.3]),
       2: array([0.4, 0.5, 0.6]),
       3: array([0.7, 0.8, 0.9]),
       4: array([1.0, 1.1, 1.2]),
       5: array([1.3, 1.4, 1.5]),
       6: array([1.6, 1.7, 1.8])}
I want to convert the list's values (word indices) into the dictionary's values (word vectors), pairing each index with its entry in the dictionary, like this:
lst = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
       [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
       [[1.6, 1.7, 1.8]]]
Can you help me?

One can use the helper function below:
def word2vec(list_param, dict_param):
    for i in range(len(list_param)):
        for j in range(len(list_param[i])):
            list_param[i][j] = dict_param[list_param[i][j]]
The list will then hold the updated values as required. I would strongly recommend not using built-in names like list, dict, etc. as variable names.
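A quick usage sketch with the lst and dic from the question (note that the replaced values are the NumPy arrays stored in the dict, not plain lists):
word2vec(lst, dic)
print(lst)
# [[array([0.1, 0.2, 0.3]), array([0.4, 0.5, 0.6]), array([0.7, 0.8, 0.9])],
#  [array([1. , 1.1, 1.2]), array([1.3, 1.4, 1.5])],
#  [array([1.6, 1.7, 1.8])]]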

Using map because I love it :)
from numpy import array

ll = [[1, 2, 3], [4, 5], [6]]
dd = {1: array([0.1, 0.2, 0.3]),
      2: array([0.4, 0.5, 0.6]),
      3: array([0.7, 0.8, 0.9]),
      4: array([1.0, 1.1, 1.2]),
      5: array([1.3, 1.4, 1.5]),
      6: array([1.6, 1.7, 1.8])}
res = []
for item in ll:
    res.append(list(map(lambda x: list(dd[x]), item)))
print(res)
Gives
[
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
    [[1.6, 1.7, 1.8]]
]
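The same result can also be written as a nested list comprehension, equivalent to the loop above and using the same ll and dd:
res = [[list(dd[x]) for x in item] for item in ll]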

Related

I want to add my simple lists into a list comprehension in Python

I have two lists of 24 values and I would like to create a list that could be seen as a 24x2 matrix, where the first column holds the values of my first list and the second column holds the values of my second list.
Here are my two lists:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
You can use the zip() function like this:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
L1 = list(zip(q, t))
res = []
for i, j in L1:
    res.append(i)
    res.append(j)
print(res)
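Note that this loop flattens the pairs into one interleaved list, roughly:
[6.0, 0.38, 5.75, 0.51, 5.5, 0.71, ...]
If you want the 24x2 structure itself, L1 = list(zip(q, t)) already is it: a list of 24 (q, t) pairs.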
It seems that you just need to zip your two lists:
myList = [0,1,2,3,4,5]
myOtherList = ["a","b","c","d","e","f"]
# Iterator of tuples
zip(myList, myOtherList)
# List of tuples
list(zip(myList, myOtherList))
You will get this result: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')].
If you need another structure, you could use comprehension:
length = min(len(myList), len(myOtherList))
# List of list
[[myList[i], myOtherList[i]] for i in range(length)]
# Dict
{myList[i]: myOtherList[i] for i in range(length)}
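If you are already using NumPy, a hedged alternative is to stack the two lists as columns, which gives an actual 24x2 array rather than a list of lists:
import numpy as np

q = [6.0, 5.75, 5.5]           # shortened here for illustration
t = [0.38, 0.51, 0.71]
m = np.column_stack((q, t))    # shape (len(q), 2): first column from q, second from t
print(m)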

gridsearch.predict_proba results in list rather than array

I followed an example and tried to use grid search with a random forest classifier to generate roc_auc_score; however, the y_prob = model.predict_proba(X_test) I generated was a list of two arrays rather than a single array. I was wondering what went wrong here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.metrics import roc_auc_score
X = np.random.rand(50,10)
y = np.random.permutation([1] * 25 + [0] * 25)
y= label_binarize(y, classes=[0, 1])
y= np.hstack((1-y, y))
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=7)
index_split = sss.split(X, y)
train_index = []
test_index = []
for train_ind, test_ind in index_split:
    train_index.extend(train_ind)
    test_index.extend(test_ind)
data_train = X[train_index]
out_train = y[train_index]
data_test = X[test_index]
out_test = y[test_index]
rf = RandomForestClassifier()
grids = {
    'n_estimators': [10, 50, 100, 200],
    'max_features': ['auto', 'sqrt', 'log2'],
    'criterion': ['gini', 'entropy']
}
rf_grids_searched = GridSearchCV(rf,
                                 grids,
                                 scoring="roc_auc",
                                 n_jobs=-1,
                                 refit=True,
                                 cv=5,
                                 verbose=10)
rf_grids_searched.fit(data_train, out_train)
rf_best = rf_grids_searched.best_estimator_
y_prob=rf_best.predict_proba(data_test)
print(roc_auc_score(out_test, y_prob))
my result:
[array([[0.5, 0.5],
        [0.5, 0.5],
        [0.7, 0.3],
        [0.3, 0.7],
        [0.7, 0.3],
        [0.5, 0.5],
        [0.1, 0.9],
        [0.6, 0.4],
        [0.6, 0.4],
        [0.4, 0.6]]), array([[0.5, 0.5],
        [0.5, 0.5],
        [0.3, 0.7],
        [0.7, 0.3],
        [0.3, 0.7],
        [0.5, 0.5],
        [0.9, 0.1],
        [0.4, 0.6],
        [0.4, 0.6],
        [0.6, 0.4]])]
expected results with probability of [0, 1]:
array([[0.5, 0.5],
       [0.5, 0.5],
       [0.7, 0.3],
       [0.3, 0.7],
       [0.7, 0.3],
       [0.5, 0.5],
       [0.1, 0.9],
       [0.6, 0.4],
       [0.6, 0.4],
       [0.4, 0.6]])
I also tried not binarizing y in the first place and then training the grid search, which gave the following array y_prob. Later, I binarized y_test to match the dimensions of y_prob and computed the score. I was wondering if this sequence is correct?
code:
out_test1= label_binarize(out_test, classes=[0, 1])
out_test1= np.hstack((1-out_test1, out_test1))
print(roc_auc_score(out_test1, y_prob))
array([[0.6, 0.4],
       [0.5, 0.5],
       [0.6, 0.4],
       [0.5, 0.5],
       [0.7, 0.3],
       [0.3, 0.7],
       [0.8, 0.2],
       [0.4, 0.6],
       [0.8, 0.2],
       [0.4, 0.6]])
The grid search's predict_proba method is just a dispatch to the best estimator's predict_proba. And from the docstring for RandomForestClassifier.predict_proba (emphasis added):
Returns
p : ndarray of shape (n_samples, n_classes), or a list of n_outputs
    such arrays if n_outputs > 1. ...
Since you've specified two outputs (two columns in y), you get predicted probabilities for each of the two classes for each of the two targets.
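A minimal sketch of the simpler route (my assumption, not part of the original answer): keep the labels as a single 1-D 0/1 vector instead of the two binarized columns, so predict_proba returns one (n_samples, 2) array whose positive-class column can go straight to roc_auc_score:
# Recover 1-D 0/1 labels from the binarized columns built in the question
out_train_1d = out_train[:, 1]
out_test_1d = out_test[:, 1]

rf_grids_searched.fit(data_train, out_train_1d)
y_prob = rf_grids_searched.best_estimator_.predict_proba(data_test)  # shape (n_samples, 2)
print(roc_auc_score(out_test_1d, y_prob[:, 1]))                      # column 1 = P(class 1)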

How to change float 5.0 to integer 5 in a list of lists

I have the following list
X=[[[0, 'rating'], [1, 4.0], [2, 5.0], [1, 5.0], [0, 4.0], [8, 5.0], [3, 2.0], [5, 5.0], [4, 3.0], [2, 5.0]]]
y=[1, 1, 1, 1, 1, 0, 1, 0, 1, 1]
I want to fit it with sklearn.linear_model in order to classify and compute the accuracy on the training data.
By using the following code
classifier = Perceptron(tol=1e-5, random_state=0)
classifier.fit(X,y)
I got this error: ValueError: could not convert string to float: 'rating'
I guess the problem is the float 5.0, but how can I simply change it? I tried with [[int(x) for x in x[1]]]
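As a hedged sketch (assuming the [0, 'rating'] pair is a header row that should be dropped and the remaining ratings are numeric):
X = [[[0, 'rating'], [1, 4.0], [2, 5.0], [1, 5.0], [0, 4.0],
      [8, 5.0], [3, 2.0], [5, 5.0], [4, 3.0], [2, 5.0]]]

rows = X[0][1:]                                 # drop the ['rating'] header pair
X_clean = [[idx, int(r)] for idx, r in rows]    # 4.0 -> 4, 5.0 -> 5, ...
print(X_clean)
# [[1, 4], [2, 5], [1, 5], [0, 4], [8, 5], [3, 2], [5, 5], [4, 3], [2, 5]]
Note that the ValueError actually comes from the string 'rating', not from the float 5.0; after dropping that pair, y would also need to be trimmed to the same length before calling classifier.fit(X_clean, y).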

Python: A constant subtract by elements of a list to return a list

I have a list decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8] and I want a list such that
new_position = 2 - decay_positions
Essentially I want a new list whose elements are equal to 2 minus the corresponding elements of decay_positions.
However when I do:
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
print(2 - decay_positions)
I get
TypeError: unsupported operand type(s) for -: 'int' and 'list'
So I thought that maybe the subtraction fails when the dimensions aren't the same, so I did:
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
print([2]*len(decay_positions) - decay_positions)
but it still gives TypeError: unsupported operand type(s) for -: 'int' and 'list'
despite the fact that [2]*len(decay_positions) and decay_positions have the same size. Any thoughts? Shouldn't element-wise subtraction be quite straightforward?
Use numpy ftw:
>>> import numpy as np
>>> decay_positions = np.array([0.2, 3, 0.5, 5, 1, 7, 1.5, 8])
>>> 2 - decay_positions
array([ 1.8, -1. , 1.5, -3. , 1. , -5. , 0.5, -6. ])
If you for some reason despise numpy, you could always use list comprehensions as a secondary option:
>>> [2-dp for dp in [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]]
[1.8, -1, 1.5, -3, 1, -5, 0.5, -6]
You can do this:
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
result = [2-t for t in decay_positions]
print(result)
Try
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
new_decay_positions = [2-pos for pos in decay_positions ]
I just want to add that if you want to modify the list in place you can do
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
decay_positions[:] = (2 - it for it in decay_positions)
print(decay_positions)

How to construct a numpy array where each element is the minimum of all its possible values?

I want to construct a 1-D numpy array a, and I know each a[i] has several possible values. Of course, different elements of a can have different numbers of possible values. For each a[i], I want to set it to the minimum of all its possible values.
For example, I have two arrays:
idx = np.array([0, 1, 0, 2, 3, 3, 3])
val = np.array([0.1, 0.5, 0.2, 0.6, 0.2, 0.1, 0.3])
The array I want to construct is the following:
a = np.array([0.1, 0.5, 0.6, 0.1])
Does there exist any function in numpy that can do this?
Here's one approach -
import numpy as np

def groupby_minimum(idx, val):
    sidx = idx.argsort()
    sorted_idx = idx[sidx]
    cut_idx = np.r_[0, np.flatnonzero(sorted_idx[1:] != sorted_idx[:-1]) + 1]
    return np.minimum.reduceat(val[sidx], cut_idx)
Sample run -
In [36]: idx = np.array([0, 1, 0, 2, 3, 3, 3])
...: val = np.array([0.1, 0.5, 0.2, 0.6, 0.2, 0.1, 0.3])
...:
In [37]: groupby_minimum(idx, val)
Out[37]: array([ 0.1, 0.5, 0.6, 0.1])
Here's another using pandas -
import pandas as pd
def pandas_groupby_minimum(idx, val):
    df = pd.DataFrame({'ID': idx, 'val': val})
    return df.groupby('ID')['val'].min().values
Sample run -
In [66]: pandas_groupby_minimum(idx, val)
Out[66]: array([ 0.1, 0.5, 0.6, 0.1])
You can also use binned_statistic:
from scipy.stats import binned_statistic
idx_list = np.append(np.unique(idx), np.max(idx) + 1)
stats = binned_statistic(idx, val, statistic='min', bins=idx_list)
a = stats.statistic
I think, in older scipy versions, statistic='min' was not implemented, but you can use statistic=np.min instead. Intervals are half open in binned_statistic, so this implementation is safe.
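A quick check with the idx and val arrays from the question (this should match the other approaches):
print(a)   # expected: the per-index minima 0.1, 0.5, 0.6, 0.1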
