Summing specific columns based on a mapping - python-3.x

I have a series which contains a mapping:
serm = pd.Series(
data={'ARD1': 53, 'BUL1': 37,
'BUL2': 37, 'BSR1': 49, 'BTR1': 53, 'CR1': 53,
'CRR1': 53, 'CRE3': 53,'TAB1': 52, 'NEP1': 42, 'HAL1': 42})
which maps the asset id (the index) to an area (the value).
I have the following dataframe, whose column names are the index of serm.
data=pd.DataFrame(data={'ARD1': {0: 4.0, 1: 2.0, 2: 2.0, 3: 3.0, 4: 2.0},
'BUL1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'BUL2': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'BSR1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'BTR1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'CR1': {0: 15.0, 1: 13.0, 2: 13.0, 3: 11.0, 4: 13.0},
'CRR1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'CRE3': {0: 8.0, 1: 10.0, 2: 9.0, 3: 10.0, 4: 11.0},
'TAB1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'NEP1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'HAL1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}})
I would like to sum the columns of data that fall in the same area, according to the mapping in serm. How can I achieve this (the more idiomatic the pandas, the better)?

Use Index.map with groupby on the columns and aggregate with sum:
df = data.groupby(data.columns.map(serm.get), axis=1).sum()
print (df)
37 42 49 52 53
0 0.0 0.0 0.0 0.0 27.0
1 0.0 0.0 0.0 0.0 25.0
2 0.0 0.0 0.0 0.0 24.0
3 0.0 0.0 0.0 0.0 24.0
4 0.0 0.0 0.0 0.0 26.0
Or assign the mapped columns back and aggregate with sum by level:
data.columns = data.columns.map(serm.get)
df = data.sum(level=0, axis=1)
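In newer pandas releases, groupby(..., axis=1) and DataFrame.sum(level=...) are deprecated; a minimal sketch of an equivalent grouping (assuming data still has the original asset-id columns and the same serm) works on the transposed frame:
# Group the transposed frame by the area mapping, then transpose back.
df = data.T.groupby(serm).sum().T
print(df)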

Related

Panda how to overwrite new value on previous value?

I have the following code :
# Append columns to an empty DataFrame.
self.df = pd.DataFrame(columns=["ID", "Features"], index=['index1'])
tracking_id = output[4]
print(tracking_id in set(self.df['ID']))
if tracking_id in self.df['ID']:
    df2 = pd.DataFrame([[tracking_id], features])
    self.df.update(df2)
    print(self.df)
else:
    self.df = self.df.append({'ID': tracking_id, 'Features': features}, ignore_index=True)
In this code, I first check whether an element with the same ID already exists. If it exists, the previous Features value should be updated with the new one. It works, but not correctly.
My output is :
ID Features
0 NaN NaN
1 1.0 [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 50.0...
2 4.0 [[0.0, 0.0, 0.0, 1.0, 89.0, 15.0, 0.0, 0.0, 10...
3 4.0 [[70.0, 41.0, 17.0, 41.0, 4.0, 0.0, 0.0, 0.0, ...
4 4.0 [[42.0, 18.0, 16.0, 14.0, 2.0, 0.0, 0.0, 0.0, ...
5 6.0 [[3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 8.0, 59.0...
6 6.0 [[0.0, 6.0, 7.0, 9.0, 12.0, 3.0, 0.0, 0.0, 51....
As you can see, rows with an existing ID are appended as if the ID were absent. Sometimes a row is not appended at all and instead overwrites a previous one. How can I solve this problem?
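A minimal sketch of the intended check-then-update logic (the upsert helper below is hypothetical, assuming features holds the new feature vector for tracking_id); it compares against the column's values rather than using in, which checks the index:
import pandas as pd

def upsert(df, tracking_id, features):
    mask = df['ID'] == tracking_id
    if mask.any():
        # ID already present: overwrite its stored feature vector
        df.at[df.index[mask][0], 'Features'] = features
        return df
    # ID not seen yet: append a new row (pd.concat, since DataFrame.append is deprecated)
    return pd.concat([df, pd.DataFrame([{'ID': tracking_id, 'Features': features}])], ignore_index=True)

df = pd.DataFrame(columns=['ID', 'Features'])
df = upsert(df, 4, [0.0, 1.0])
df = upsert(df, 4, [2.0, 3.0])  # overwrites the row for ID 4 instead of appending
print(df)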

ValueError: could not convert string to float: 'ane'

print(xtest.head())
print("predicted as",myModel.predict(xtest))
output:-
age bp sg al su rbc pc ... rbcc htn dm cad appet pe ane
235 45.0 70.0 1.01 2.0 0.0 1.0 1.0 ... 4.8 0.0 0.0 1.0 1.0 0.0 1.0
[1 rows x 24 columns]
predicted as [[0.99633694]]
The xtest dataframe has a column named ane and the model predicts fine. But when I give the same input in the form of a dictionary:
di={'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0 }
b=pd.DataFrame(di.items())
b=b.T
x['ane'] = x['ane'].astype(float)
tensor = tf.convert_to_tensor(b, dtype=tf.float64)
print(myModel.predict((tensor)))
It's showing the following error:-
ValueError: could not convert string to float: 'ane'
In the training model, I did the same conversion and it worked well.
My colab notebook:-
https://colab.research.google.com/drive/1DomDo3adwRBQUFD0g8JVpF5jxC9HoegW
You should try this code; I replaced the same code in the Colab notebook as well:
import pandas as pd
import tensorflow as tf

di={'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0 }
b = pd.DataFrame(list(di.items()), index=di)
b = b.drop(columns=0)
b = b.T
b['ane'] = b['ane'].astype(float)
tensor = tf.convert_to_tensor(b, dtype=tf.float32)
print(myModel.predict(tensor))
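As an alternative sketch (assuming the same di and trained myModel as above), building the frame from a one-element list keeps the keys as column names and the values numeric, so no key string ever reaches the float conversion:
import pandas as pd
import tensorflow as tf

b = pd.DataFrame([di])  # 1 row, 24 numeric columns
tensor = tf.convert_to_tensor(b.to_numpy(dtype='float32'))
print(myModel.predict(tensor))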

I want to convert a dictionary to pandas dataFrame

di={'ind': 1, 'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0}
I'm having this dictionary, and I want to convert this to a pandas dataframe with 'ind': 1 as the index, 24 columns and 1 row.
These are the names of each column that I want to have in my df:-
d=['age', 'bp', 'sg','al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo', 'pcv', 'wbcc', 'rbcc', 'htn', 'dm','cad', 'appet', 'pe', 'ane']
Please guide me with it. I tried pd.DataFrame(di.items(), columns=d), but it returned a df with 1 column and 24 rows; I want the transpose of that, i.e. 24 columns and 1 row.
Thank You
di={'ind': 1, 'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0}
print( pd.DataFrame([di]).set_index('ind') )
Prints:
age bp sg al su rbc pc pcc ba bgr bu sc sod pot hemo pcv wbcc rbcc htn dm cad appet pe ane
ind
1 59 70 1.01 1.0 3.0 0.0 0.0 0.0 0.0 424.0 55.0 1.7 138.0 4.5 12.0 37.0 10200.0 4.1 1.0 1.0 1.0 1.0 0.0 1.0
You can try
df = pd.Series(di).to_frame(0).T.set_index('ind')
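If all the values are scalars, another option that should work is building the frame directly from the dict with an explicit index:
df = pd.DataFrame(di, index=[0]).set_index('ind')
print(df.shape)  # (1, 24)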

can some one help to fit the array in kmeans clustering

When I try to fit it with KMeans clustering, it throws the error "ValueError: setting an array element with a sequence."
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5)
kmeans.fit(df)
Array description:
0      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
1      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
10     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
100    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
101    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: Vector, Length: 179, dtype: object
Each entry in your column is a list. It needs to be expanded into separate columns before passing it to KMeans.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_json('/Users/roshansk/Downloads/NewsArticles.json')
# Extract each list element of the Vector column into its own column
vectors = df.Vector.apply(pd.Series)
kmeans = KMeans(n_clusters=5)
kmeans.fit(vectors)
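If every list has the same length, a sketch of an equivalent way to build the 2-D input (assuming the same df) is to stack the lists with numpy:
import numpy as np
from sklearn.cluster import KMeans

X = np.vstack(df['Vector'].to_numpy())  # shape (179, n_features)
kmeans = KMeans(n_clusters=5, n_init=10).fit(X)
print(kmeans.labels_[:10])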

Gaussian elimination with partial pivoting (column)

I cannot find the mistake I made; could anyone help me? Thanks very much!
import math

def GASSEM():
    a0 = [12,-2,1,0,0,0,0,0,0,0,13.97]
    a1 = [-2,12,-2,1,0,0,0,0,0,0,5.93]
    a2 = [1,-2,12,-2,1,0,0,0,0,0,-6.02]
    a3 = [0,1,-2,12,-2,1,0,0,0,0,8.32]
    a4 = [0,0,1,-2,12,-2,1,0,0,0,-23.75]
    a5 = [0,0,0,1,-2,12,-2,1,0,0,28.45]
    a6 = [0,0,0,0,1,-2,12,-2,1,0,-8.9]
    a7 = [0,0,0,0,0,1,-2,12,-2,1,-10.5]
    a8 = [0,0,0,0,0,0,1,-2,12,-2,10.34]
    a9 = [0,0,0,0,0,0,0,1,-2,12,-38.74]
    A = [a0,a1,a2,a3,a4,a5,a6,a7,a8,a9]  # 10x11 augmented matrix
    interchange = [0,0,0,0,0,0,0,0,0,0,0]
    for i in range(1, 10):
        median = abs(A[i-1][i-1])
        for m in range(i, 10):  # pivoting
            if abs(A[m][i-1]) > median:
                median = abs(A[m][i-1])
                interchange = A[i-1]
                A[i-1] = A[m]
                A[m] = interchange
        for j in range(i, 10):  # creating upper triangle matrix
            A[j] = [A[j][k] - (A[j][i-1]/A[i-1][i-1])*A[i-1][k] for k in range(0, 11)]
    for t in range(0, 10):  # print the upper triangle matrix
        print(A[t])
The output is not an upper triangular matrix, and I'm getting lost in the for loops...
When I run this code, the output is
[12, -2, 1, 0, 0, 0, 0, 0, 0, 0, 13.97]
[0.0, 11.666666666666666, -1.8333333333333333, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.258333333333333]
[0.0, 0.0, 11.628571428571428, -1.842857142857143, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, -5.886428571428571]
[0.0, 0.0, -2.220446049250313e-16, 11.622235872235873, -1.8415233415233416, 1.0, 0.0, 0.0, 0.0, 0.0, 6.679281326781327]
[0.0, 0.0, -3.518258683818212e-17, 0.0, 11.622218698800275, -1.8415517150256329, 1.0, 0.0, 0.0, 0.0, -22.185475397706252]
[0.0, 0.0, 1.3530439218911067e-17, 0.0, 0.0, 11.62216239813737, -1.841549039580908, 1.0, 0.0, 0.0, 24.359991632712457]
[0.0, 0.0, 5.171101701700419e-18, 0.0, 0.0, 0.0, 11.622161705324444, -1.84154850220678, 1.0, 0.0, -3.131238144426707]
[0.0, 0.0, -3.448243038110395e-19, 0.0, 0.0, 0.0, 0.0, 11.62216144141611, -1.8415485389982904, 1.0, -13.0921440313208]
[0.0, 0.0, -4.995725026226573e-19, 0.0, 0.0, 0.0, 0.0, 0.0, 11.622161418001749, -1.8415485322346454, 8.534950160892514]
[0.0, 0.0, -4.9488445836100553e-20, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.622161417603511, -36.26114362292296]
This effectively is upper triangular. The absolute values of the 'non-zero' entries in the third column below the diagonal are all less than 1e-15. Given that the other values are 1 or greater, these small numbers look like floating-point subtraction error in A[j][k] - (A[j][i-1]/A[i-1][i-1])*A[i-1][k] and can be treated as 0. Without more investigation, I don't know why the non-zero values are limited to this column.
For this data, the condition abs(A[m][i-1]) > median is never true, so the if block code is not tested.
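As a small sketch (assuming A is the matrix inside GASSEM() after elimination), the final print loop could snap those residues to zero so the triangular structure is obvious:
tol = 1e-12  # anything smaller in magnitude is treated as an exact zero
for t in range(10):
    print([0.0 if abs(v) < tol else v for v in A[t]])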
