Does the ParameterGrid function from scikit-learn 0.22 (Python 3.7.5) produce duplicates, or am I using it incorrectly? Have a look at the following example.
from sklearn.model_selection import ParameterGrid
import pandas as pd
hyper_params_dict = {
    "SQM_FOLDER_SUFFIX": ["_SQM_MM"],
    "HYPER_RATIO_SCORED_POSES": [0.8],
    "HYPER_OUTLIER_MAD_THRESHOLD": [2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0],
    "HYPER_KEEP_MAX_DeltaG_POSES": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0],
    "HYPER_KEEP_POSE_COLUMN": ["r_i_docking_score"],
    "HYPER_SELECT_BEST_BASEMOLNAME_SCORE_BY": ["Eint"],
    "HYPER_SELECT_BEST_BASEMOLNAME_POSE_BY": ["Eint"],
    "HYPER_SELECT_BEST_STRUCTVAR_POSE_BY": ["complexE"],
    "CROSSVAL_PROTEINS_STRING": ['MARK4', 'ACHE', 'JNK2', 'AR', 'EPHB4', 'PPARG', 'MDM2', 'PARP-1', 'TP', 'TPA',
                                 'SIRT2', 'SARS-HCoV', 'PPARG'],
    "XTEST_PROTEINS_STRING": [""],
    "HYPER_2D_DESCRIPTORS": [""],
    "HYPER_3D_DESCRIPTORS": [""],
    "HYPER_GLIDE_DESCRIPTORS": [""]
}
df = pd.concat([pd.DataFrame({k: [v] for k, v in p.items()}) for p in ParameterGrid(hyper_params_dict)], ignore_index=True)
df.duplicated().value_counts()
ParameterGrid creates every combination of the values it is given; it does not introduce duplicates on its own.
You get duplicated parameter combinations because CROSSVAL_PROTEINS_STRING contains the value PPARG twice.
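One way to avoid this is to remove repeated values from each list before building the grid. The sketch below is only an illustration: dedupe_values is a hypothetical helper, the dictionary is a shortened stand-in for the one in the question, and itertools.product is used in place of ParameterGrid to count the combinations without requiring scikit-learn.

```python
from itertools import product


def dedupe_values(param_dict):
    """Return a copy of param_dict with repeated values removed (order kept).

    Hypothetical helper, not part of scikit-learn.
    """
    return {name: list(dict.fromkeys(values)) for name, values in param_dict.items()}


# Shortened stand-in for the dictionary in the question; PPARG appears twice.
params = {
    "HYPER_RATIO_SCORED_POSES": [0.8],
    "CROSSVAL_PROTEINS_STRING": ["MARK4", "PPARG", "MDM2", "PPARG"],
}

clean = dedupe_values(params)

# itertools.product stands in for ParameterGrid here: one dict per combination.
combos = [dict(zip(clean, values)) for values in product(*clean.values())]
print(len(combos))
```

The cleaned dictionary can then be passed to ParameterGrid in the same way as the original one.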
I have two lists of 24 values and I would like to create a list that can be seen as a 24x2 matrix, where the first column holds the values of my first list and the second column holds the values of my second list.
Here are my two lists:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
You can use the zip() function like this:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
L1 = list(zip(q, t))
res = []
for i, j in L1:
    res.append(i)
    res.append(j)
print(res)
It seems that you just need to zip your two lists:
myList = [0,1,2,3,4,5]
myOtherList = ["a","b","c","d","e","f"]
# Iterator of tuples
zip(myList, myOtherList)
# List of tuples
list(zip(myList, myOtherList))
You will get this result: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')].
If you need another structure, you could use comprehension:
length = min(len(myList), len(myOtherList))
# List of list
[[myList[i], myOtherList[i]] for i in range(length)]
# Dict
{myList[i]: myOtherList[i] for i in range(length)}
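If an actual matrix is wanted rather than a list of tuples, NumPy can stack the two lists as columns. This is a minimal sketch assuming NumPy is available and both lists hold numbers; the lists are shortened versions of q and t from the question.

```python
import numpy as np

# Shortened versions of the lists from the question.
q = [6.0, 5.75, 5.5]
t = [0.38, 0.51, 0.71]

# column_stack turns each 1-D sequence into one column of a 2-D array.
matrix = np.column_stack((q, t))
print(matrix.shape)
print(matrix)
```

With the full 24-element lists the result would have shape (24, 2).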
Example of array content:
[
[4.9, 3.0, 1.4, 0.2, 0.0, 2.0],
[4.7, 3.2, 1.3, 0.2, 0.0, 2.0],
[4.6, 3.1, 1.5, 0.2, 0.0, 2.0],
...
]
model = TSNE(learning_rate=100)
transformed = model.fit_transform(data)
I'm trying to apply tSNE to a float array, but I get an error. What should I change?
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (149,) + inhomogeneous part.
Try this example:
from sklearn.manifold import TSNE
import numpy as np
X = np.array([[4.9, 3.0, 1.4, 0.2, 0.0, 2.0], [4.7, 3.2, 1.3, 0.2, 0.0, 2.0]])
# Recent scikit-learn versions require perplexity < number of samples (2 here)
model = TSNE(learning_rate=100, perplexity=1.0)
transformed = model.fit_transform(X)
print(transformed)
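The "inhomogeneous shape" message usually means that the rows do not all have the same length, so NumPy cannot build a rectangular array. The following sketch shows one way to locate such rows before calling fit_transform; the data list is hypothetical (its third row is deliberately one value short) and the choice to drop the bad rows is only one possible fix.

```python
from collections import Counter

import numpy as np

# Hypothetical data: the third row is missing its last value.
data = [
    [4.9, 3.0, 1.4, 0.2, 0.0, 2.0],
    [4.7, 3.2, 1.3, 0.2, 0.0, 2.0],
    [4.6, 3.1, 1.5, 0.2, 0.0],
]

# Treat the most common row length as the expected one.
lengths = Counter(len(row) for row in data)
expected = lengths.most_common(1)[0][0]

# Indices of the rows that break the rectangular shape.
bad_rows = [i for i, row in enumerate(data) if len(row) != expected]
print(bad_rows)

# Keep only well-formed rows so the result is a regular 2-D float array.
clean = np.array([row for row in data if len(row) == expected], dtype=float)
print(clean.shape)
```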
I am trying to select the columns of a dataframe whose correlation with another column exceeds a certain threshold, as shown below.
df.loc[:, (df.corr()['col'] <= -0.05) | (df.corr()['col'] >= 0.05)]
But I am getting the error below:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Also, if I try to select the columns with variance > 1, I get the same error:
df.loc[:, df.var() > 1]
Why am I getting an indexing error? I want to filter out the columns of the dataframe whose correlation with another column is between -0.05 and 0.05.
Can someone help me resolve this issue? I am not sure where I am going wrong.
I think I found your problem.
First I tried to build my own test set, but everything worked fine, so it did not reproduce the error:
df = pd.DataFrame({
    "col": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A": [1.1, 1.0, 1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 2.1, 3.0, 3.9, 5.0, 6.0]
})
df.loc[:, (df.corr()['col'] <= -0.05) | (df.corr()['col'] >= 0.05)]
I got:
   col    B
0  1.0  1.0
1  2.0  2.1
2  3.0  3.0
3  4.0  3.9
4  5.0  5.0
5  6.0  6.0
But then, after reading your error again, I thought that your data might contain columns that the corr() method simply ignores, such as columns with an object dtype. In that case the boolean Series has fewer entries than the dataframe has columns, so the two cannot be aligned.
If I build a new test set with a text column, I get the same error as you:
df = pd.DataFrame({
    "col": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A": [1.1, 1.0, 1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 2.1, 3.0, 3.9, 5.0, 6.0],
    "C": ["A", "B", "C", "D", "E", "F"]
})
df.corr()['col'] >= 0.05
df.loc[:, (df.corr()['col'] <= -0.05) | (df.corr()['col'] >= 0.05)]
Then I got:
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
One way of fixing this is by doing so:
df = df.drop(columns=df.corr().query("-0.05 < col < 0.05").index)
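Another option is to keep the original df.loc selection and align the boolean mask with the dataframe's columns first. This is a sketch rather than the only fix: it assumes the non-numeric columns should simply be left out of the selection, and it restricts the correlation to numeric columns explicitly with select_dtypes.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A": [1.1, 1.0, 1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 2.1, 3.0, 3.9, 5.0, 6.0],
    "C": ["A", "B", "C", "D", "E", "F"],
})

# Correlation of every numeric column with "col" (text columns are excluded).
corr = df.select_dtypes(include="number").corr()["col"]

# Reindex the mask on all columns so it aligns with df;
# columns without a correlation value (here "C") are marked False.
mask = (corr.abs() >= 0.05).reindex(df.columns, fill_value=False)

selected = df.loc[:, mask]
print(list(selected.columns))
```

Because the mask now has one entry per column of df, the boolean indexer and the indexed object share the same index, which is what the error message was complaining about.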
Note: Please remember that you'll get quicker and more relevant answers if you provide a full sample of the non-working code, so that your error can be reproduced easily ;)
I have two 2D arrays. I want to create arrays that are copies of the first one, with some columns replaced by the corresponding columns from the second one.
M1 = np.array([[1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0, 4.0, 5.0, 6.0]])
M2 = np.array([[1.1, 2.1, 3.1, 1.2, 2.2, 3.2],
               [4.1, 5.1, 6.1, 4.2, 5.2, 6.2]])
I want to do a loop that can give the following arrays:
M3 = np.array([[1.1, 2.0, 3.0, 1.2, 2.0, 3.0],
               [4.1, 5.0, 6.0, 4.2, 5.0, 6.0]])
M4 = np.array([[1.0, 2.1, 3.0, 1.0, 2.2, 3.0],
               [4.0, 5.1, 6.0, 4.0, 5.2, 6.0]])
M5 = np.array([[1.0, 2.0, 3.1, 1.0, 2.0, 3.2],
               [4.0, 5.0, 6.1, 4.0, 5.0, 6.2]])
You can use np.where:
selector = [1,0,0,1,0,0]
np.where(selector,M2,M1)
# array([[1.1, 2. , 3. , 1.2, 2. , 3. ],
# [4.1, 5. , 6. , 4.2, 5. , 6. ]])
selector = [0,1,0,0,1,0]
np.where(selector,M2,M1)
# array([[1. , 2.1, 3. , 1. , 2.2, 3. ],
# [4. , 5.1, 6. , 4. , 5.2, 6. ]])
etc.
Or in a loop:
M3,M4,M5 = (np.where(s,M2,M1) for s in np.tile(np.identity(3,bool), (1,2)))
M3
# array([[1.1, 2. , 3. , 1.2, 2. , 3. ],
# [4.1, 5. , 6. , 4.2, 5. , 6. ]])
M4
# array([[1. , 2.1, 3. , 1. , 2.2, 3. ],
# [4. , 5.1, 6. , 4. , 5.2, 6. ]])
M5
# array([[1. , 2. , 3.1, 1. , 2. , 3.2],
# [4. , 5. , 6.1, 4. , 5. , 6.2]])
Alternatively, you can copy M1 and then slice in M2. This is more verbose but should be faster:
n = 3
Mj = []
for j in range(n):
    Mp = M1.copy()
    # Overwrite every n-th column, starting at column j, with M2's values
    Mp[:,j::n] = M2[:,j::n]
    Mj.append(Mp)
M3,M4,M5 = Mj
I'm trying to write simple code that will take floats in two lists, find the mean between the two numbers in the same position in each list, and generate a new list with the updated means. For example, with list_1 and list_2,
list_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
list_2 = [6.0, 7.0, 8.0, 9.0, 10.0]
list_3 = []
for i in list_1:
    for x in list_2:
        list_3.append((x+i)/2)
print(list_3)
Find the mean between floats in two lists and create a new list such that:
list_3 = [3.5, 4.5, 5.5, 6.5, 7.5]
I tried creating a for loop but (for obvious reasons) it doesn't iterate the way that I want it to. The output is:
[3.5, 4.0, 4.5, 5.0, 5.5, 4.0, 4.5, 5.0, 5.5, 6.0, 4.5, 5.0, 5.5, 6.0, 6.5, 5.0, 5.5, 6.0, 6.5, 7.0, 5.5, 6.0, 6.5, 7.0, 7.5]
Any help would be greatly appreciated!
You can do that with a list comprehension like:
Code:
[sum(x)/len(x) for x in zip(list_1, list_2)]
How:
The function zip() allows easy iteration through multiple lists at the same time. From there these values can be fed into sum() and len() as shown.
Test Code:
list_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
list_2 = [6.0, 7.0, 8.0, 9.0, 10.0]
list_3 = [sum(x)/len(x) for x in zip(list_1, list_2)]
print(list_3)
Results:
[3.5, 4.5, 5.5, 6.5, 7.5]
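Note that zip() stops silently at the end of the shorter list, so lists of different lengths would give a shorter result without any warning. The sketch below wraps the same idea in a hypothetical helper, pairwise_means, that checks the lengths first; it assumes Python 3.8 or later for statistics.fmean.

```python
from statistics import fmean


def pairwise_means(a, b):
    """Return the mean of each pair of values at the same position in a and b.

    Hypothetical helper; raises instead of silently truncating like zip().
    """
    if len(a) != len(b):
        raise ValueError("both lists must have the same length")
    return [fmean(pair) for pair in zip(a, b)]


list_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
list_2 = [6.0, 7.0, 8.0, 9.0, 10.0]
list_3 = pairwise_means(list_1, list_2)
print(list_3)
```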