How to compute the MAE between each column of a pandas dataframe and its last column:
,CPFNN,EN,Blupred,Horvath2,EPM,vMLP,Age
202,4.266596,3.5684403102704,5.2752761330328,5.17705043941232,3.30077613485548,3.412883,4.0
203,5.039452,5.1258136685894,4.40019825995985,5.03563327742846,3.97465334472661,4.140719,4.0
204,5.0227585,5.37207428128756,1.56392554883583,4.41805439337257,4.43779809822224,4.347523,4.0
205,4.796998,5.61052306552109,4.20912233479662,3.57075401779518,3.24902718889411,3.887743,4.0
I have a pandas dataframe and I want to create a list with the MAE value of each column against "Age".
Is there a "pandas" way of doing this instead of a for loop over each column?
from sklearn.metrics import mean_absolute_error as mae
mae(blood_bestpred_df["CPFNN"], blood_bestpred_df['Age'])
I'd like to do this:
mae(blood_bestpred_df[["CPFNN,EN,Blupred,Horvath2,EPM,vMLP"]], blood_bestpred_df['Age'])
But I have a dimension issue.
Looks like sklearn's MAE requires both inputs to be the same shape and doesn't do any broadcasting (I'm not an sklearn expert, there might be another way around this). You can use raw pandas instead:
import pandas as pd
df = pd.read_clipboard(sep=",", index_col=0) # Your df here
out = df.drop(columns="Age").sub(df["Age"], axis=0).abs().mean()
out:
CPFNN 0.781451
EN 1.134993
Blupred 1.080168
Horvath2 0.764996
EPM 0.478335
vMLP 0.296904
dtype: float64
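If you'd rather stay within sklearn, one workaround (a sketch, assuming the column names from your question) is to tile the Age column so y_true matches the shape of the prediction matrix, then request one score per column with multioutput="raw_values":

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error as mae

pred_cols = ["CPFNN", "EN", "Blupred", "Horvath2", "EPM", "vMLP"]
# Repeat "Age" once per prediction column so both inputs share a shape
y_true = np.column_stack([df["Age"]] * len(pred_cols))
out = pd.Series(mae(y_true, df[pred_cols], multioutput="raw_values"),
                index=pred_cols)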
I have a dataset like the one below, and I want to one-hot encode the 'Item' column for logistic regression. There are 313 distinct items in the 'Item' column. I'm getting the error below. Can you please assist me in resolving it?
Here is the code:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))
array(<1126x316 sparse matrix of type '<class 'numpy.float64'>'
with 4493 stored elements in Compressed Sparse Row format>, dtype=object)
Use this code, where df is the name of your dataframe:
import pandas as pd
df = pd.get_dummies(data = df, columns = ['Item'])
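Alternatively, if you want to keep the ColumnTransformer: the odd output above comes from calling np.array() on a sparse matrix, which wraps it in a 0-d object array. Asking the encoder for dense output avoids that (a sketch; the keyword is sparse=False on scikit-learn versions before 1.2 and sparse_output=False from 1.2 onward):

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Dense output: no sparse matrix for np.array() to wrap in an object array
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(sparse_output=False), [0])],
    remainder='passthrough')
X = ct.fit_transform(X)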
I am looking for a single vector with the values [(0:400) (-400:-1)] (MATLAB-style ranges).
Can anyone help me write this in Python?
Use NumPy's .arange to generate each range and .concatenate to join them into a single flat vector:
import numpy as np

# 0..400 followed by -400..-1, matching MATLAB's [(0:400) (-400:-1)]
arr = np.concatenate([np.arange(401), np.arange(-400, 0)])
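Equivalently, np.r_ builds the same vector from slice notation (note the end bounds are exclusive, hence 401 and 0):

import numpy as np

# Same result as the concatenate version above
arr = np.r_[0:401, -400:0]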
Convert a 3D numpy array to a pandas dataframe with numpy arrays as elements.
Are there any other solutions? What about speed?
import numpy as np
import pandas as pd
ones = np.ones((2,3,5))
temp = [[np.array(column_elem, dtype=object) for column_elem in row] for row in ones]
df = pd.DataFrame(temp)
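As for speed, one alternative (a sketch) avoids the per-element np.array copies by storing views of the original array; each cell is still a length-5 ndarray:

rows, cols, _ = ones.shape
# Each cell holds the view ones[i, j] of shape (5,); no copies are made
df = pd.DataFrame([[ones[i, j] for j in range(cols)] for i in range(rows)])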
I am looking for the best way to compute many dask delayed objects stored in a dataframe. I am unsure whether the pandas dataframe should be converted to a dask dataframe with the delayed objects inside, or whether compute should be called on all values of the pandas dataframe.
I would appreciate any suggestions in general, as I am having trouble with the logic of passing delayed objects across nested for loops.
import numpy as np
import pandas as pd
from scipy.stats import hypergeom
from dask import delayed, compute

steps = 5
sample = [int(x) for x in np.linspace(5, 100, num=steps)]

enr_df = pd.DataFrame()
for N in sample:
    enr = []
    for i in range(20):
        k = np.random.randint(1, 200)
        # Build the task graph lazily; nothing is evaluated yet
        enr.append(delayed(hypergeom.sf)(k=k, M=10000, n=20, N=N, loc=0))
    enr_df[N] = enr
I cannot call compute on this dataframe without applying the function across all cells, e.g. enr_df.applymap(compute) (which, I believe, calls compute on each value individually).
However, if I convert to a dask dataframe, the delayed objects I want to compute are buried inside the dask dataframe structure:
import dask.dataframe as dd

enr_dd = dd.from_pandas(enr_df, npartitions=1)
enr_dd.compute()
and the computation does not produce the output I expect.
You can pass a list of delayed objects into dask.compute:
import dask

results = dask.compute(*list_of_delayed_objects)
So you need to get a list out of your pandas dataframe, which you can do with normal Python code.
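For example, a sketch using the enr_df built above: flatten the cells into one list, evaluate everything in a single graph, and reshape the results back into a frame of the same shape:

import dask
import numpy as np
import pandas as pd

# One compute call evaluates all tasks in a single shared graph
flat = list(enr_df.to_numpy().ravel())
results = dask.compute(*flat)
computed = pd.DataFrame(np.reshape(results, enr_df.shape),
                        index=enr_df.index, columns=enr_df.columns)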