Generate a list of means from lists of floats in Python 3

I'm trying to write simple code that will take floats in two lists, find the mean of the two numbers at the same position in each list, and generate a new list of those means. For example, with list_1 and list_2,
list_1: [1.0, 2.0, 3.0, 4.0, 5.0]
list_2: [6.0, 7.0, 8.0, 9.0, 10.0]
list_3: []
for i in list_1:
    for x in list_2:
        list_3.append((x + i) / 2)
print(list_3)
Find the mean between floats in two lists and create a new list such that:
list_3 = [3.5, 4.5, 5.5, 6.5, 7.5]
I tried creating a for loop, but (for obvious reasons) it doesn't iterate the way that I want it to. The output is:
[3.5, 4.0, 4.5, 5.0, 5.5, 4.0, 4.5, 5.0, 5.5, 6.0, 4.5, 5.0, 5.5, 6.0, 6.5, 5.0, 5.5, 6.0, 6.5, 7.0, 5.5, 6.0, 6.5, 7.0, 7.5]
Any help would be greatly appreciated!

You can do that with a list comprehension like:
Code:
[sum(x)/len(x) for x in zip(list_1, list_2)]
How:
The zip() function allows easy iteration through multiple lists at the same time: it yields one tuple per position, pairing the elements of each list. Each tuple can then be fed into sum() and len() as shown.
Test Code:
list_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
list_2 = [6.0, 7.0, 8.0, 9.0, 10.0]
list_3 = [sum(x)/len(x) for x in zip(list_1, list_2)]
print(list_3)
Results:
[3.5, 4.5, 5.5, 6.5, 7.5]
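Equivalently, since each tuple from zip() holds exactly two values here, you can unpack the pair and average it directly, which fixes the original nested loop with minimal change:

# Walk both lists in parallel and average each aligned pair
list_3 = [(a + b) / 2 for a, b in zip(list_1, list_2)]
print(list_3)  # [3.5, 4.5, 5.5, 6.5, 7.5]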

Related

I want to combine my simple lists using a list comprehension in Python

I have two lists of 24 values and I would like to create a list which could be seen as a 24x2 matrix, where the first column holds the values of my first list and the second column the values of my second list.
Here are my two lists:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
You can use the zip() function like this:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
L1 = list(zip(q, t))  # list of (q_value, t_value) pairs: the 24x2 structure
res = []
for i, j in L1:
    # unpack each pair and flatten into a single list
    res.append(i)
    res.append(j)
print(res)
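If the goal is the 24x2 structure itself rather than a flat list, a comprehension over the same zip() builds one [q, t] row per index (a small variant sketch using the lists above):

# Each row pairs one value from q with the value of t at the same index
matrix = [[a, b] for a, b in zip(q, t)]
print(matrix[:3])  # [[6.0, 0.38], [5.75, 0.51], [5.5, 0.71]]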
It seems that you just need to zip your two lists:
myList = [0,1,2,3,4,5]
myOtherList = ["a","b","c","d","e","f"]
# Iterator of tuples
zip(myList, myOtherList)
# List of tuples
list(zip(myList, myOtherList))
You will get this result: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')].
If you need another structure, you could use comprehension:
length = min(len(myList), len(myOtherList))
# List of list
[[myList[i], myOtherList[i]] for i in range(length)]
# Dict
{myList[i]: myOtherList[i] for i in range(length)}
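Note that zip() already stops at the shorter of its two inputs, so the explicit length guard is optional; an equivalent sketch:

# zip() truncates to the shorter list, so no min(len, len) check is needed
pairs = [[a, b] for a, b in zip(myList, myOtherList)]
mapping = {a: b for a, b in zip(myList, myOtherList)}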

Does ParameterGrid produce duplicates?

Is it the ParameterGrid function from scikit-learn 0.22 in Python 3.7.5 that produces duplicates, or am I not using it correctly? Have a look at the following example.
from sklearn.model_selection import ParameterGrid
import pandas as pd
hyper_params_dict = {
    "SQM_FOLDER_SUFFIX": ["_SQM_MM"],
    "HYPER_RATIO_SCORED_POSES": [0.8],
    "HYPER_OUTLIER_MAD_THRESHOLD": [2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0],
    "HYPER_KEEP_MAX_DeltaG_POSES": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0],
    "HYPER_KEEP_POSE_COLUMN": ["r_i_docking_score"],
    "HYPER_SELECT_BEST_BASEMOLNAME_SCORE_BY": ["Eint"],
    "HYPER_SELECT_BEST_BASEMOLNAME_POSE_BY": ["Eint"],
    "HYPER_SELECT_BEST_STRUCTVAR_POSE_BY": ["complexE"],
    "CROSSVAL_PROTEINS_STRING": ['MARK4', 'ACHE', 'JNK2', 'AR', 'EPHB4', 'PPARG', 'MDM2', 'PARP-1', 'TP', 'TPA',
                                 'SIRT2', 'SARS-HCoV', 'PPARG'],
    "XTEST_PROTEINS_STRING": [""],
    "HYPER_2D_DESCRIPTORS": [""],
    "HYPER_3D_DESCRIPTORS": [""],
    "HYPER_GLIDE_DESCRIPTORS": [""]
}
df = pd.concat([pd.DataFrame({k: [v] for k, v in p.items()}) for p in ParameterGrid(hyper_params_dict)], ignore_index=True)
df.duplicated().value_counts()
ParameterGrid creates combinations of all values and does not introduce duplicates itself.
You get duplicated parameter combinations because CROSSVAL_PROTEINS_STRING contains the value PPARG twice.
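If you want to guard against such repeats, one option is to de-duplicate each value list before building the grid; dict.fromkeys() drops repeats while preserving order (a sketch, not part of the original answer):

from sklearn.model_selection import ParameterGrid

# Remove repeated values per parameter, keeping first-seen order
deduped = {k: list(dict.fromkeys(v)) for k, v in hyper_params_dict.items()}
print(len(ParameterGrid(deduped)))  # product of the unique value counts per key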

How to use fit_transform with an array?

Example of array content:
[
    [4.9, 3.0, 1.4, 0.2, 0.0, 2.0],
    [4.7, 3.2, 1.3, 0.2, 0.0, 2.0],
    [4.6, 3.1, 1.5, 0.2, 0.0, 2.0],
    ...
]
model = TSNE(learning_rate=100)
transformed = model.fit_transform(data)
I'm trying to apply t-SNE to a float array, but I get the error below. What should I change?
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (149,) + inhomogeneous part.
Try this example:
from sklearn.manifold import TSNE
import numpy as np
X = np.array([[4.9, 3.0, 1.4, 0.2, 0.0, 2.0], [4.7, 3.2, 1.3, 0.2, 0.0, 2.0]])
# perplexity must be less than n_samples; with only two rows, use 1
model = TSNE(learning_rate=100, perplexity=1)
transformed = model.fit_transform(X)
print(transformed)
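The ValueError itself indicates that the rows of data have unequal lengths, so NumPy cannot pack them into a homogeneous 2D array. A quick diagnostic, assuming data is your list of rows:

# More than one distinct length means at least one ragged row
print(set(map(len, data)))  # e.g. {5, 6} would expose the problem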

Replace columns in a 2D numpy array by columns from another 2D array

I have two 2D arrays. I want to create arrays that are copies of the first one, with some columns replaced by the corresponding columns from the second one.
M1 = np.array([[1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0, 4.0, 5.0, 6.0]])
M2 = np.array([[1.1, 2.1, 3.1, 1.2, 2.2, 3.2],
               [4.1, 5.1, 6.1, 4.2, 5.2, 6.2]])
I want a loop that gives the following arrays:
M3 = np.array([[1.1, 2.0, 3.0, 1.2, 2.0, 3.0],
               [4.1, 5.0, 6.0, 4.2, 5.0, 6.0]])
M4 = np.array([[1.0, 2.1, 3.0, 1.0, 2.2, 3.0],
               [4.0, 5.1, 6.0, 4.0, 5.2, 6.0]])
M5 = np.array([[1.0, 2.0, 3.1, 1.0, 2.0, 3.2],
               [4.0, 5.0, 6.1, 4.0, 5.0, 6.2]])
You can use np.where:
selector = [1,0,0,1,0,0]
np.where(selector,M2,M1)
# array([[1.1, 2. , 3. , 1.2, 2. , 3. ],
# [4.1, 5. , 6. , 4.2, 5. , 6. ]])
selector = [0,1,0,0,1,0]
np.where(selector,M2,M1)
# array([[1. , 2.1, 3. , 1. , 2.2, 3. ],
# [4. , 5.1, 6. , 4. , 5.2, 6. ]])
etc.
Or in a loop, generating all three selectors at once (np.tile repeats the 3x3 boolean identity across the six columns):
M3, M4, M5 = (np.where(s, M2, M1) for s in np.tile(np.identity(3, bool), (1, 2)))
M3
# array([[1.1, 2. , 3. , 1.2, 2. , 3. ],
# [4.1, 5. , 6. , 4.2, 5. , 6. ]])
M4
# array([[1. , 2.1, 3. , 1. , 2.2, 3. ],
# [4. , 5.1, 6. , 4. , 5.2, 6. ]])
M5
# array([[1. , 2. , 3.1, 1. , 2. , 3.2],
# [4. , 5. , 6.1, 4. , 5. , 6.2]])
Alternatively, you can copy M1 and then slice in M2. This is more verbose but should be faster:
n = 3
Mj = []
for j in range(n):
    Mp = M1.copy()
    Mp[:, j::n] = M2[:, j::n]
    Mj.append(Mp)
M3,M4,M5 = Mj
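If this comes up repeatedly, the copy-and-slice approach can be wrapped in a small helper (a hypothetical convenience function, not from the original answer):

import numpy as np

def replace_every_nth_column(A, B, j, n):
    """Return a copy of A with columns j, j+n, j+2n, ... taken from B."""
    out = A.copy()
    out[:, j::n] = B[:, j::n]
    return out

M3 = replace_every_nth_column(M1, M2, 0, 3)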

Save a scikit-learn Bunch object

How do I save a scikit-learn Bunch object to a single file? Currently, I save it into several numpy files, which is cumbersome:
from sklearn.datasets import fetch_lfw_people
# Save to files
faces = fetch_lfw_people(min_faces_per_person=60)
np.save('faces_data.npy', faces.data)
np.save('faces_images.npy', faces.images)
np.save('faces_target.npy', faces.target)
np.save('faces_target_names.npy', faces.target_names)
np.save('faces_descr.npy', faces.DESCR)
# Read the files
from sklearn.utils import Bunch  # sklearn.datasets.base has been removed in recent versions
faces = Bunch()
faces['data'] = np.load('faces_data.npy')
faces['images'] = np.load('faces_images.npy')
faces['target'] = np.load('faces_target.npy')
faces['target_names'] = np.load('faces_target_names.npy')
faces['DESCR'] = np.load('faces_descr.npy')
I'm not sure that this will work for all cases, but you should be able to save a Bunch object as a pickle file.
Example:
from sklearn import datasets
import pickle
iris = datasets.load_iris()
with open('iris.pkl', 'wb') as bunch:
    pickle.dump(iris, bunch, protocol=pickle.HIGHEST_PROTOCOL)
with open('iris.pkl', 'rb') as bunch:
    df = pickle.load(bunch)
print(df)
Result:
{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]]), 'target': array([0, 0, 0, ..., 2, 2, 2]), 'frame': None, 'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'), 'DESCR': '.. _iris_dataset:\n\nIris plants dataset\n...', 'feature_names': ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], 'filename': 'iris.csv', 'data_module': 'sklearn.datasets.data'}
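As an alternative to pickle, joblib (which scikit-learn itself recommends for persisting objects that hold large NumPy arrays) also writes a single file; a minimal sketch:

import joblib
from sklearn import datasets

iris = datasets.load_iris()
joblib.dump(iris, 'iris.joblib')       # one file, numpy-aware serialization
restored = joblib.load('iris.joblib')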
