How to insert z values onto x,y coordinates? - python-3.x

I have three lists, some of which contain repeated values. I am using the first two as x and y coordinates, and the third list holds the values that I want to plot as a heatmap.
For now, I can assign the values of z for only one row of the grid. How can I write a for loop that does the same for the rest of the grid?
x = [1,1,1,2,2,2,3,3,3]
y = [1,2,3,1,2,3,1,2,3]
z = [5.9617e-09,6.3562e-09,6.819e-09,7.3562e-09,7.989e-09,
     8.6735e-09,9.3898e-09,1.0139e-08,1.0912e-08,1.0912e-08]
xs = len(set(x))
ys = len(set(y))
grid = []
counter = 0
for row in range(ys):
    rows = [] # creating the rows on the grid
    if len(rows) < ys: # I want to loop over ys and assign the values of z to each coordinate
        grid.append(z[counter])
        counter = counter+1
print(grid)
Once I have a 2d array, then I can use the heatmap to plot it nicely.

The easiest way is to use numpy:
In [1]: z = [5.9617e-09,6.3562e-09,6.819e-09,7.3562e-09,
   ...:      7.989e-09,8.6735e-09,9.3898e-09,1.0139e-08,
   ...:      1.0912e-08,1.0912e-08]
In [2]: len(z)
Out[2]: 10
In [3]: import numpy as np
Ten numbers don't fit in a 3x3 grid, so skip the last one.
In [4]: nz = np.array(z[:-1]); nz
Out[4]:
array([5.9617e-09, 6.3562e-09, 6.8190e-09, 7.3562e-09, 7.9890e-09,
       8.6735e-09, 9.3898e-09, 1.0139e-08, 1.0912e-08])
In [5]: nz.reshape((3,3))
Out[5]:
array([[5.9617e-09, 6.3562e-09, 6.8190e-09],
       [7.3562e-09, 7.9890e-09, 8.6735e-09],
       [9.3898e-09, 1.0139e-08, 1.0912e-08]])
A plain Python solution using itertools and functools:
In [6]: import itertools as it
...: import functools as ft
In [7]: def chunked(iterable, n):
   ...:     def take(n, iterable):
   ...:         return list(it.islice(iterable, n))
   ...:     return iter(ft.partial(take, n, iter(iterable)), [])
   ...:
In [8]: list(chunked(z[:-1], 3))
Out[8]:
[[5.9617e-09, 6.3562e-09, 6.819e-09],
 [7.3562e-09, 7.989e-09, 8.6735e-09],
 [9.3898e-09, 1.0139e-08, 1.0912e-08]]
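Both approaches above rely on the order of z. If the values should instead be placed by their (x, y) coordinates, a minimal sketch could look like the following. It assumes each (x, y) pair occurs only once and that rows correspond to y and columns to x; that orientation is a choice made here, not something the question states. The names x_levels and y_levels are introduced only for illustration.

```python
import numpy as np

x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
z = [5.9617e-09, 6.3562e-09, 6.819e-09, 7.3562e-09, 7.989e-09,
     8.6735e-09, 9.3898e-09, 1.0139e-08, 1.0912e-08, 1.0912e-08]

# Distinct coordinate values, sorted so that index 0 is the smallest one.
x_levels = sorted(set(x))
y_levels = sorted(set(y))

# Start with NaN so that any coordinate without a value stays visible.
grid = np.full((len(y_levels), len(x_levels)), np.nan)

# zip stops at the shortest list, so the extra tenth z value is ignored.
for xi, yi, zi in zip(x, y, z):
    grid[y_levels.index(yi), x_levels.index(xi)] = zi

print(grid)
```

With this row/column convention the grid is the transpose of the reshape result shown above; swap the two indices in the assignment if rows should follow x instead.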

Related

list comprehension in matrix multiplication in python 3.x

I found some code for matrix multiplication in Python 3.x, but I cannot understand how the list comprehension works in the code below.
# Program to multiply two matrices using list comprehension
# 3x3 matrix
X = [[12,7,3],
     [4 ,5,6],
     [7 ,8,9]]
# 3x4 matrix
Y = [[5,8,1,2],
     [6,7,3,0],
     [4,5,9,1]]
# result is 3x4
result = [[sum(a*b for a,b in zip(X_row,Y_col)) for Y_col in zip(*Y)] for X_row in X]
for r in result:
    print(r)
@Santosh, it is probably easier to understand this list comprehension by first writing it as plain loops, like this:
# 3x3 matrix
X = [[12,7,3],
     [4,5,6],
     [7,8,9]]
# 3x4 matrix
Y = [[5,8,1,2],
     [6,7,3,0],
     [4,5,9,1]]
# result is 3x4
result = [[0,0,0,0],
          [0,0,0,0],
          [0,0,0,0]]
# iterate through rows of X
for r in range(len(X)):
    # iterate through columns of Y
    for c in range(len(Y[0])):
        # iterate through rows of Y
        for k in range(len(Y)):
            result[r][c] += X[r][k] * Y[k][c]
print(result)
Then you can probably see the similarity with the list comprehension version, after a little reformatting:
def matrix_mul(X, Y):
    zip_b = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row_a, col_b))
             for col_b in zip_b]
            for row_a in X]
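The step that tends to be least obvious is zip(*Y), which turns the rows of Y into its columns. A small sketch that takes the comprehension apart, using the matrices from the question (the names columns and first_cell are only for illustration):

```python
X = [[12, 7, 3],
     [4, 5, 6],
     [7, 8, 9]]
Y = [[5, 8, 1, 2],
     [6, 7, 3, 0],
     [4, 5, 9, 1]]

# zip(*Y) passes each row of Y as a separate argument to zip,
# so the i-th tuple collects the i-th element of every row: a column.
columns = list(zip(*Y))
print(columns[0])  # (5, 6, 4)

# One cell of the result is the dot product of a row of X and a column of Y.
first_cell = sum(a * b for a, b in zip(X[0], columns[0]))
print(first_cell)  # 12*5 + 7*6 + 3*4 = 114

# The full comprehension repeats that for every (row, column) pair.
result = [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in X]
print(result[0])  # [114, 160, 60, 27]
```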

Loop over items in dicts in list using Python 3 comprehensions

Suppose I have a list of dicts, where each dict contains exactly one key.
testlist = [{'x': 15}, {'y': 16}, {'z': 17}]
for x in testlist:
    for k, v in x.items():
        print(k,v)
# x 15
# y 16
# z 17
How can I use comprehensions to get the same result?
I tried this:
for k,v in [x.items() for x in testlist]:
    print(k,v)
This raises: ValueError: not enough values to unpack (expected 2, got 1)
You have to make a multiloop comprehension:
for k,v in [pair for x in testlist for pair in x.items()]:
    print(k,v)
or use itertools.chain to do the flattening for you (somewhat more efficiently):
from itertools import chain
from operator import methodcaller
for k, v in chain.from_iterable(x.items() for x in testlist):
    print(k, v)
# Or with operator.methodcaller to move the work to the C layer:
for k, v in chain.from_iterable(map(methodcaller('items'), testlist)):
    print(k, v)
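To see where the ValueError comes from, it may help to look at what the failing comprehension actually produces. In this sketch (the names views and pairs are only for illustration), each element of the list is a whole items view holding a single (key, value) tuple, so unpacking it into k, v finds one item where two are expected:

```python
testlist = [{'x': 15}, {'y': 16}, {'z': 17}]

# The failing version builds a list of items views, one per dict.
views = [d.items() for d in testlist]
print(list(views[0]))  # [('x', 15)]: one element, not two

# Adding a second "for" flattens the views into individual (key, value) pairs.
pairs = [pair for d in testlist for pair in d.items()]
print(pairs)  # [('x', 15), ('y', 16), ('z', 17)]

for k, v in pairs:
    print(k, v)
```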

How to check if values of a matrix and an array are equal in Python

Given a matrix mat and an array arr: for each row of the matrix, if the element in Column 1 is equal to the corresponding element of the array, print the corresponding value from Column 2 of the matrix.
mat = np.array([['abc','A'],['def','B'],['ghi','C'],['jkl','D']])
arr = np.array(['abc','dfe','ghi','kjl'])
This can be solved with numpy.where.
Extract the first column of the matrix using mat[:,0], and compare it to arr using np.where(mat[:,0] == arr) to get the indexes of the matching rows.
Then use those indexes to get the elements you want from mat:
In [1]: import numpy as np
...:
...: mat = np.array([['abc','A'],['def','B'],['ghi','C'],['jkl','D']])
...:
...: arr = np.array(['abc','dfe','ghi','kjl'])
In [2]: print(mat[np.where(mat[:,0] == arr)])
[['abc' 'A']
 ['ghi' 'C']]
The output should be ['A', 'C'], so the code above can be modified to select column 1 of the matching rows:
print(mat[np.where(mat[:,0] == arr)][:,1])
# output ['A' 'C']
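The same selection can also be written with a boolean mask and no call to np.where. This is only an alternative sketch; the names mask and matches are introduced here for illustration:

```python
import numpy as np

mat = np.array([['abc', 'A'], ['def', 'B'], ['ghi', 'C'], ['jkl', 'D']])
arr = np.array(['abc', 'dfe', 'ghi', 'kjl'])

# Element-wise comparison of column 0 with arr gives one boolean per row.
mask = mat[:, 0] == arr

# Index rows with the mask and pick column 1 in the same step.
matches = mat[mask, 1]
print(matches)  # ['A' 'C']
```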

How to iterate over dfs and append data with combined names

I have this problem to solve. It is a continuation of a previous question, "How to iterate over pandas df with a def function variable function", and the given answer worked perfectly, but now I have to append all the data into a two-column dataframe (Adduct_name and mass).
This is from the previous question:
My goal: I have to calculate the "adducts" for a given "Compound". Both represent numbers, but for each "Compound" there are 46 different "Adducts".
Each adduct is calculated as follows:
Adduct 1 = [Exact_mass*M/Charge + Adduct_mass]
where Exact_mass = number, M and Charge = number (1, 2, 3, etc.) according to each type of adduct, and Adduct_mass = number (positive or negative) according to each adduct.
My data: 2 data frames. One has the adduct names, M, Charge and Adduct_mass. The other one contains the Compound_name and Exact_mass of the compounds I want to iterate over (I just put a small data set here).
Adducts: df_al
import pandas as pd
data = [["M+3H", 3, 1, 1.007276], ["M+3Na", 3, 1, 22.989], ["M+H", 1, 1, 1.007276],
        ["2M+H", 1, 2, 1.007276], ["M-3H", 3, 1, -1.007276]]
df_al = pd.DataFrame(data, columns=["Ion_name", "Charge", "M", "Adduct_mass"])
Compounds: df
import pandas as pd
data1 = [[1, "C3H64O7", 596.465179], [2, "C30H42O7", 514.293038], [4, "C44H56O8", 712.397498],
         [4, "C24H32O6S", 448.191949], [5, "C20H28O3", 316.203834]]
df = pd.DataFrame(data1, columns=["CdId", "Formula", "exact_mass"])
The solution to this problem was:
df_name = df_al["Ion_name"]
df_mass = df_al["Adduct_mass"]
df_div = df_al["Charge"]
df_M = df_al["M"]
#Defining general function
def Adduct(x,i):
    return x*df_M[i]/df_div[i] + df_mass[i]
#Applying general function for i from 0 to 4 (range(5) covers the five adducts).
for i in range(5):
    df[df_name.loc[i]] = df['exact_mass'].map(lambda x: Adduct(x,i))
Output
Name exact_mass M+3H M+3Na M+H 2M+H M-3H
0 a 596.465179 199.829002 221.810726 597.472455 1193.937634 197.814450
1 b 514.293038 172.438289 194.420013 515.300314 1029.593352 170.423737
2 c 712.397498 238.473109 260.454833 713.404774 1425.802272 236.458557
3 d 448.191949 150.404592 172.386316 449.199225 897.391174 148.390040
4 e 316.203834 106.408554 128.390278 317.211110 633.414944 104.39400
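For reference, the same table can probably be produced without the explicit loop. The following is only a sketch of a vectorised variant, not the accepted solution; it assumes a compounds frame with Name and exact_mass columns as in the output above, and uses two compounds to keep it short.

```python
import numpy as np
import pandas as pd

data = [["M+3H", 3, 1, 1.007276], ["M+3Na", 3, 1, 22.989], ["M+H", 1, 1, 1.007276],
        ["2M+H", 1, 2, 1.007276], ["M-3H", 3, 1, -1.007276]]
df_al = pd.DataFrame(data, columns=["Ion_name", "Charge", "M", "Adduct_mass"])

# Assumed layout of the compounds frame (Name and exact_mass only).
df = pd.DataFrame({"Name": ["a", "b"], "exact_mass": [596.465179, 514.293038]})

# exact_mass * M / Charge for every (compound, adduct) pair, then add Adduct_mass
# to each column via broadcasting.
factor = (df_al["M"] / df_al["Charge"]).to_numpy()
masses = np.outer(df["exact_mass"].to_numpy(), factor) + df_al["Adduct_mass"].to_numpy()

adducts = pd.DataFrame(masses, columns=df_al["Ion_name"].tolist(), index=df.index)
wide = pd.concat([df, adducts], axis=1)
print(wide)
```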
Those are the right calculations, but now I need a file where:
- only 2 columns exist (Name and Mass)
- all the different adducts are appended one after another
Desired output:
Name Mass
a_M+3H 199.82902
a_M+3Na 221.810726
a_M+H 597.472455
a_2M+H 1193.937634
a_M-3H 197.814450
b_M+3H 514.293038
.
.
.
c_M+3H
and so on.
I also need to combine the name of the respective compound with the ion form (M+3H, M+H, etc.).
At this point I have no code for that.
I would appreciate any advice, or a better approach from the beginning.
This part is an update of the question above:
Is it possible to obtain an output like this one:
Name Mass RT
a_M+3H 199.82902 1
a_M+3Na 221.810726 1
a_M+H 597.472455 1
a_2M+H 1193.937634 1
a_M-3H 197.814450 1
b_M+3H 514.293038 3
.
.
.
c_M+3H 2
The RT is the same value for all forms of a compound; in this example the RT for a = 1, b = 3, c = 2, etc.
Is it possible to incorporate (keep) this column from the data set df (which I updated below)? As you can see, df has more columns, like "Formula" and "RT", which disappear after the calculations.
import pandas as pd
data1 = [["a", "C3H64O7", 596.465179, 1], ["b", "C30H42O7", 514.293038, 3], ["c", "C44H56O8", 712.397498, 2],
         ["d", "C24H32O6S", 448.191949, 4], ["e", "C20H28O3", 316.203834, 1.5]]
df = pd.DataFrame(data1, columns=["Name", "Formula", "exact_mass", "RT"])
Part three! (sorry and thank you)
This is a trial I did on a small data set (df) using the code below, with the same df_al as above.
df=
Code
#Defining variables for calculation
df_name = df_al["Ion_name"]
df_mass = df_al["Adduct_mass"]
df_div = df_al["Charge"]
df_M = df_al["M"]
df_ID= df["Name"]
#Defining the RT dictionary
RT = dict(zip(df["Name"], df["RT"]))
#Removing RT column
df=df.drop(columns=["RT"])
#Defining general function
def Adduct(x,i):
    return x*df_M[i]/df_div[i] + df_mass[i]
#Applying general function in a range from 0 to 46.
for i in range(47):
    df[df_name.loc[i]] = df['exact_mass'].map(lambda x: Adduct(x,i))
df
output
#Melting
df = pd.melt(df, id_vars=['Name'], var_name = "Adduct", value_name= "Exact_mass", value_vars=[x for x in df.columns if 'Name' not in x and 'exact' not in x])
df['name'] = df.apply(lambda x:x[0] + "_" + x[1], axis=1)
df['RT'] = df.Name.apply(lambda x: RT[x[0]] if x[0] in RT else np.nan)
del df['Name']
del df['Adduct']
df['RT'] = df.name.apply(lambda x: RT[x[0]] if x[0] in RT else np.nan)
df
output
Why NaN?
Here is how I would go about it; pandas.melt comes to the rescue:
import pandas as pd
import numpy as np
from io import StringIO
s = StringIO('''
Name exact_mass M+3H M+3Na M+H 2M+H M-3H
0 a 596.465179 199.829002 221.810726 597.472455 1193.937634 197.814450
1 b 514.293038 172.438289 194.420013 515.300314 1029.593352 170.423737
2 c 712.397498 238.473109 260.454833 713.404774 1425.802272 236.458557
3 d 448.191949 150.404592 172.386316 449.199225 897.391174 148.390040
4 e 316.203834 106.408554 128.390278 317.211110 633.414944 104.39400
''')
df = pd.read_csv(s, sep=r"\s+")
df = pd.melt(df, id_vars=['Name'], value_vars=[x for x in df.columns if 'Name' not in x and 'exact' not in x])
df['name'] = df['Name'] + "_" + df['variable']
del df['Name']
del df['variable']
RT = {'a':1, 'b':2, 'c':3, 'd':5, 'e':1.5}
df['RT'] = df.name.apply(lambda x: RT[x[0]] if x[0] in RT else np.nan)
df
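On the "Why NaN?" part of the question: one possible cause, assuming the real compound names are longer than one character, is that RT[x[0]] looks up only the first character of each name, which is then not a key of the RT dictionary. A sketch that avoids the lookup entirely keeps RT as an identifier column while melting. The small wide table below is hypothetical and reuses two compounds and two adducts from the question:

```python
import pandas as pd

# Hypothetical wide table in the shape produced by the adduct loop.
wide = pd.DataFrame({
    "Name": ["a", "b"],
    "RT": [1, 3],
    "exact_mass": [596.465179, 514.293038],
    "M+3H": [199.829002, 172.438289],
    "M+H": [597.472455, 515.300314],
})

# Keeping RT in id_vars repeats it on every melted row of the same compound.
tidy = wide.melt(
    id_vars=["Name", "RT"],
    value_vars=["M+3H", "M+H"],
    var_name="Adduct",
    value_name="Mass",
)

# Group the rows by compound (stable sort keeps the adduct order), then
# build the combined name and keep only the requested columns.
tidy = tidy.sort_values("Name", kind="stable").reset_index(drop=True)
tidy["Name"] = tidy["Name"] + "_" + tidy["Adduct"]
tidy = tidy[["Name", "Mass", "RT"]]
print(tidy)
```

Because RT travels with each row through melt, no dictionary and no string slicing are needed afterwards.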

Python3: set a range of data

I feel this must be very basic but I cannot find a simple way.
I am using python3
I have many data files with x,y data where x goes from 0 to 140 (floating).
Let's say
0, 2.1
0.5,3.5
0.8,3.2
...
I want to import the values of x within the range 25.4 to 28.1 and their corresponding values in y. Every file might have a different length, so the first value with x>25.4 might appear in a different row.
I am looking for something equivalent to the following command in gnuplot:
set xrange [25.4:28.1]
This time I cannot use gnuplot because the data processing requires more than the capabilities of gnuplot.
I imported the data with Pandas but I cannot set a range.
Thank you.
r = range(start, stop, step) is the pattern for this in Python.
So, for example, to get:
r == [0, 1, 2]
You would write:
r = [x for x in range(3)]
And to get:
r == [0, 5, 10]
You would write:
r = [x for x in range(0, 11, 5)]
This doesn't get you very far because:
r = [0, .2, 4.3, 6.3]
r = [x for x in r if x in range(3, 10)]
# r == []
But you can do:
r = [0, .2, 4.3, 6.3]
r = [x for x in r if 3 < x < 10]
# r == [4.3, 6.3]
Pandas and Numpy give you a much more concise way of doing this. Consider the following demo of .between
import pandas as pd
import io
text = io.StringIO("""Close Top_Barrier Bottom_Barrier
0 441.86 441.964112 426.369888
1 448.95 444.162225 425.227108
2 449.99 446.222271 424.285063
3 449.74 447.947051 423.678282
4 451.97 449.879254 423.029413""")
df = pd.read_csv(text, sep='\\s+')
df = df[df["Close"].between(449, 452)] # between
df
So for your df you can do the same with your own limits: df = df[df["x"].between(25.4, 28.1)]
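Applied to the x,y files from the question, a minimal sketch might look like this. The first three data rows come from the question; the remaining rows and the column names "x" and "y" are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Stand-in for one data file: comma-separated x,y pairs without a header.
raw = io.StringIO(
    "0, 2.1\n"
    "0.5,3.5\n"
    "0.8,3.2\n"
    "25.0,1.0\n"
    "25.9,4.4\n"
    "26.0,4.9\n"
    "27.5,0.7\n"
    "30.2,9.9\n"
)

# header=None because the file has no header row; names supplies the column labels.
data = pd.read_csv(raw, header=None, names=["x", "y"], skipinitialspace=True)

# Equivalent of gnuplot's "set xrange [25.4:28.1]": keep rows whose x is in range.
# between() includes both endpoints by default.
window = data[data["x"].between(25.4, 28.1)]
print(window)
```

The filter works on values rather than row positions, so it does not matter in which row the range starts in each file.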
