I have a list of lists whose elements are word indices.
lst = [[1, 2, 3],
[4, 5],
[6]]
I also have a dictionary whose values are word vectors (word2vec); every vector has the same dimension.
dic={1:array([0.1, 0.2, 0.3]),
2:array([0.4, 0.5, 0.6]),
3:array([0.7, 0.8, 0.9]),
4:array([1.0, 1.1, 1.2]),
5:array([1.3, 1.4, 1.5]),
6:array([1.6, 1.7, 1.8])}
I want to replace each word index in the list with the corresponding word vector from the dictionary, as shown below.
lst = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
[[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
[[1.6, 1.7, 1.8]]]
Can you help me??
You can use the helper function below:
def word2vec(list_param, dict_param):
    # Replace each word index in place with its vector from the dictionary.
    for i in range(len(list_param)):
        for j in range(len(list_param[i])):
            list_param[i][j] = dict_param[list_param[i][j]]
The list will then hold the updated values. I would strongly recommend not using built-in names such as list or dict as variable names.
Using map because I love it :)
from numpy import array
ll = [[1, 2, 3], [4, 5], [6]]
dd = {1:array([0.1, 0.2, 0.3]),
2:array([0.4, 0.5, 0.6]),
3:array([0.7, 0.8, 0.9]),
4:array([1.0, 1.1, 1.2]),
5:array([1.3, 1.4, 1.5]),
6:array([1.6, 1.7, 1.8])}
res = []
for item in ll:
    res.append(list(map(lambda x: list(dd[x]), item)))
print(res)
Gives
[
[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
[[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
[[1.6, 1.7, 1.8]]
]
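The same lookup can also be written as a nested list comprehension that leaves the original list untouched. This is only a sketch: the names word_indices, word_vectors and vectors are made up for illustration, and it assumes every index in the list is a key of the dictionary.

```python
import numpy as np

# Sample data mirroring the question (names are illustrative).
word_indices = [[1, 2, 3], [4, 5], [6]]
word_vectors = {
    1: np.array([0.1, 0.2, 0.3]),
    2: np.array([0.4, 0.5, 0.6]),
    3: np.array([0.7, 0.8, 0.9]),
    4: np.array([1.0, 1.1, 1.2]),
    5: np.array([1.3, 1.4, 1.5]),
    6: np.array([1.6, 1.7, 1.8]),
}

# Look up each index and keep the nested shape; tolist() turns each
# NumPy vector into a plain Python list of floats.
vectors = [[word_vectors[i].tolist() for i in sentence]
           for sentence in word_indices]
print(vectors)
```

Drop the .tolist() call if you would rather keep the NumPy arrays inside the nested list.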
I have two lists of 24 values each, and I would like to create a list that can be seen as a 24x2 matrix, where the first column holds the values of my first list and the second column holds the values of my second list.
Here are my two lists:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
You can use the zip() function like this:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
L1 = list(zip(q, t))
res = []
for i, j in L1:
    res.append(i)
    res.append(j)
print(res)
It seems that you just need to zip your two lists:
myList = [0,1,2,3,4,5]
myOtherList = ["a","b","c","d","e","f"]
# Iterator of tuples
zip(myList, myOtherList)
# List of tuples
list(zip(myList, myOtherList))
You will get this result: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')].
If you need another structure, you could use a comprehension:
length = min(len(myList), len(myOtherList))
# List of list
[[myList[i], myOtherList[i]] for i in range(length)]
# Dict
{myList[i]: myOtherList[i] for i in range(length)}
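For the two-column matrix asked about above, here is a short sketch with shortened sample lists. The names rows and matrix are illustrative, and the NumPy variant assumes both lists have the same length and that a NumPy array is acceptable as the result.

```python
import numpy as np

# Shortened versions of the two lists from the question.
q = [6.0, 5.75, 5.5]
t = [0.38, 0.51, 0.71]

# Pure Python: one [q_i, t_i] row per pair.
rows = [list(pair) for pair in zip(q, t)]

# NumPy: stack the two lists as the columns of a 2-D array.
matrix = np.column_stack((q, t))

print(rows)
print(matrix.shape)
```

With the full 24-element lists, matrix.shape would be (24, 2).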
I followed an example and tried to use grid search with a random forest classifier to compute roc_auc_score. However, the y_prob = model.predict_proba(X_test)
I generated was a list of two arrays rather than a single array, so I was wondering what went wrong here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.metrics import roc_auc_score
X = np.random.rand(50,10)
y = np.random.permutation([1] * 25 + [0] * 25)
y= label_binarize(y, classes=[0, 1])
y= np.hstack((1-y, y))
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=7)
index_split = sss.split(X, y)
train_index = []
test_index = []
for train_ind, test_ind in index_split:
    train_index.extend(train_ind)
    test_index.extend(test_ind)
data_train = X[train_index]
out_train = y[train_index]
data_test = X[test_index]
out_test = y[test_index]
rf = RandomForestClassifier()
grids = {
    'n_estimators': [10, 50, 100, 200],
    'max_features': ['auto', 'sqrt', 'log2'],
    'criterion': ['gini', 'entropy']
}
rf_grids_searched = GridSearchCV(rf,
                                 grids,
                                 scoring="roc_auc",
                                 n_jobs=-1,
                                 refit=True,
                                 cv=5,
                                 verbose=10)
rf_grids_searched.fit(data_train, out_train)
rf_best = rf_grids_searched.best_estimator_
y_prob=rf_best.predict_proba(data_test)
print(roc_auc_score(out_test, y_prob))
my result:
[array([[0.5, 0.5],
[0.5, 0.5],
[0.7, 0.3],
[0.3, 0.7],
[0.7, 0.3],
[0.5, 0.5],
[0.1, 0.9],
[0.6, 0.4],
[0.6, 0.4],
[0.4, 0.6]]), array([[0.5, 0.5],
[0.5, 0.5],
[0.3, 0.7],
[0.7, 0.3],
[0.3, 0.7],
[0.5, 0.5],
[0.9, 0.1],
[0.4, 0.6],
[0.4, 0.6],
[0.6, 0.4]])]
expected results with probability of [0,1]:
array([[0.5, 0.5],
[0.5, 0.5],
[0.7, 0.3],
[0.3, 0.7],
[0.7, 0.3],
[0.5, 0.5],
[0.1, 0.9],
[0.6, 0.4],
[0.6, 0.4],
I also tried not binarizing y in the first place and then training the grid search, which gave the y_prob array shown below. Afterwards, I binarized y_test to match the dimensions of y_prob and computed the score. Is this sequence correct?
code:
out_test1= label_binarize(out_test, classes=[0, 1])
out_test1= np.hstack((1-out_test1, out_test1))
print(roc_auc_score(out_test1, y_prob))
array([[0.6, 0.4],
[0.5, 0.5],
[0.6, 0.4],
[0.5, 0.5],
[0.7, 0.3],
[0.3, 0.7],
[0.8, 0.2],
[0.4, 0.6],
[0.8, 0.2],
[0.4, 0.6]])
The grid search's predict_proba method is just a dispatch to the best estimator's predict_proba. And from the docstring for RandomForestClassifier.predict_proba (emphasis added):
Returns
p : ndarray of shape (n_samples, n_classes), or a list of n_outputs
such arrays if n_outputs > 1. ...
Since you've specified two outputs (two columns in y), you get predicted probabilities for each of the two classes for each of the two targets.
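To illustrate the difference, here is a minimal sketch that keeps y as a 1-D label vector, so predict_proba returns a single (n_samples, n_classes) array. The variable names, the fixed seeds and the small n_estimators value are illustrative choices rather than anything from the question, and the grid search is left out to keep the example short.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(50, 10)
y = rng.permutation([1] * 25 + [0] * 25)  # 1-D labels, not binarized

# Stratified 80/20 split so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7
)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)

# With a single 1-D target this is one array of shape (n_samples, n_classes).
y_prob = model.predict_proba(X_test)

# Columns follow model.classes_, so column 1 is the probability of class 1.
auc = roc_auc_score(y_test, y_prob[:, 1])
print(y_prob.shape, auc)
```

For a binary problem, roc_auc_score takes the true labels and the scores of the positive class, which is why only the second column is passed.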
I have the following list
X=[[[0, 'rating'], [1, 4.0], [2, 5.0], [1, 5.0], [0, 4.0], [8, 5.0], [3, 2.0], [5, 5.0], [4, 3.0], [2, 5.0]]]
y=[1, 1, 1, 1, 1, 0, 1, 0, 1, 1]
I want to fit a model from sklearn.linear_model in order to classify the data and compute the accuracy on the training data.
I am using the following code:
from sklearn.linear_model import Perceptron
classifier = Perceptron(tol=1e-5, random_state=0)
classifier.fit(X,y)
I got this error: ValueError: could not convert string to float: 'rating'
I guess the problem is the float 5.0, but how can I simply change it? I tried [[int(x) for x in x[1]]]
I have a list decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8] and I want a list such that
new_position = 2 - decay_positions
Essentially, I want a new list whose elements are 2 minus the corresponding elements of decay_positions.
However when I do:
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
print(2 - decay_positions)
I get
TypeError: unsupported operand type(s) for -: 'int' and 'list'
So I thought that maybe you can't subtract when the dimensions aren't the same. So I did
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
print([2]*len(decay_positions) - decay_positions)
but it still gives TypeError: unsupported operand type(s) for -: 'list' and 'list'
despite the fact that [2]*len(decay_positions) and decay_positions have the same size. Any thoughts? Shouldn't element-wise subtraction be quite straightforward?
Use numpy ftw:
>>> import numpy as np
>>> decay_positions = np.array([0.2, 3, 0.5, 5, 1, 7, 1.5, 8])
>>> 2 - decay_positions
array([ 1.8, -1. , 1.5, -3. , 1. , -5. , 0.5, -6. ])
If you for some reason despise numpy, you could always use list comprehensions as a secondary option:
>>> [2-dp for dp in [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]]
[1.8, -1, 1.5, -3, 1, -5, 0.5, -6]
You can do this:
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
result = [2-t for t in decay_positions]
print(result)
Try
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
new_decay_positions = [2-pos for pos in decay_positions ]
I just want to add that if you want to modify the list in place you can do
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
decay_positions[:] = (2 - it for it in decay_positions)
print(decay_positions)
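The second attempt in the question fails because plain Python lists do not define the - operator at all, whatever their lengths. If you really do have two lists of equal length, a sketch of element-wise subtraction with zip looks like this (the names twos and new_positions are illustrative):

```python
decay_positions = [0.2, 3, 0.5, 5, 1, 7, 1.5, 8]
twos = [2] * len(decay_positions)

# zip pairs up the elements; the subtraction happens per pair.
new_positions = [a - b for a, b in zip(twos, decay_positions)]
print(new_positions)
```

zip stops at the shorter input, so lists of different lengths would be truncated silently rather than raising an error.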
I want to construct a 1-D numpy array a, where each a[i] has several possible values. The number of possible values can differ from one element of a to another. For each a[i], I want to set it to the minimum of all its possible values.
For example, I have two arrays:
idx = np.array([0, 1, 0, 2, 3, 3, 3])
val = np.array([0.1, 0.5, 0.2, 0.6, 0.2, 0.1, 0.3])
The array I want to construct is the following:
a = np.array([0.1, 0.5, 0.6, 0.1])
Is there a function in numpy that can do this?
Here's one approach -
def groupby_minimum(idx, val):
    sidx = idx.argsort()
    sorted_idx = idx[sidx]
    cut_idx = np.r_[0,np.flatnonzero(sorted_idx[1:] != sorted_idx[:-1])+1]
    return np.minimum.reduceat(val[sidx], cut_idx)
Sample run -
In [36]: idx = np.array([0, 1, 0, 2, 3, 3, 3])
...: val = np.array([0.1, 0.5, 0.2, 0.6, 0.2, 0.1, 0.3])
...:
In [37]: groupby_minimum(idx, val)
Out[37]: array([ 0.1, 0.5, 0.6, 0.1])
Here's another using pandas -
import pandas as pd
def pandas_groupby_minimum(idx, val):
    df = pd.DataFrame({'ID' : idx, 'val' : val})
    return df.groupby('ID')['val'].min().values
Sample run -
In [66]: pandas_groupby_minimum(idx, val)
Out[66]: array([ 0.1, 0.5, 0.6, 0.1])
You can also use binned_statistic:
from scipy.stats import binned_statistic
idx_list=np.append(np.unique(idx),np.max(idx)+1)
stats=binned_statistic(idx,val,statistic='min', bins=idx_list)
a=stats.statistic
I think, in older scipy versions, statistic='min' was not implemented, but you can use statistic=np.min instead. Intervals are half open in binned_statistic, so this implementation is safe.
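Another option, sketched here as an alternative to the approaches above, is the unbuffered ufunc method np.minimum.at. It assumes idx holds non-negative integers; initialising the result with np.inf is my own choice for this sketch.

```python
import numpy as np

idx = np.array([0, 1, 0, 2, 3, 3, 3])
val = np.array([0.1, 0.5, 0.2, 0.6, 0.2, 0.1, 0.3])

# Start every slot at +inf so that any real value replaces it.
a = np.full(idx.max() + 1, np.inf)

# Unbuffered in-place minimum: repeated indices are all taken into account.
np.minimum.at(a, idx, val)
print(a)
```

Unlike the sort-based and pandas versions, an index that never occurs in idx keeps its np.inf placeholder here instead of being dropped from the result.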