I want to convert a dictionary to a pandas DataFrame - python-3.x

di={'ind': 1, 'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0}
I have this dictionary, and I want to convert it to a pandas DataFrame with 'ind': 1 as the index, 24 columns and 1 row.
These are the names of the columns that I want to have in my df:
d=['age', 'bp', 'sg','al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo', 'pcv', 'wbcc', 'rbcc', 'htn', 'dm','cad', 'appet', 'pe', 'ane']
Please guide me with this. I tried pd.DataFrame(di.items(), columns=d), but it returned a df with 1 column and 24 rows. I want the transpose of that, i.e. 24 columns and 1 row.
Thank you

import pandas as pd
di={'ind': 1, 'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0}
# A list containing one dict becomes one row; 'ind' is then moved into the index.
print( pd.DataFrame([di]).set_index('ind') )
Prints:
age bp sg al su rbc pc pcc ba bgr bu sc sod pot hemo pcv wbcc rbcc htn dm cad appet pe ane
ind
1 59 70 1.01 1.0 3.0 0.0 0.0 0.0 0.0 424.0 55.0 1.7 138.0 4.5 12.0 37.0 10200.0 4.1 1.0 1.0 1.0 1.0 0.0 1.0

You can try
df = pd.Series(di).to_frame(0).T.set_index('ind')
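The two suggestions differ slightly in the dtypes they produce, which a small sketch can show. It uses a shortened, illustrative version of the dictionary and of the column list `d`, and it also selects the columns in the order given by `d`, which is an addition not present in either answer.

```python
import pandas as pd

# Shortened stand-ins for the question's dictionary and column list.
di = {'ind': 1, 'age': 59, 'bp': 70, 'sg': 1.01}
d = ['age', 'bp', 'sg']

# One dict inside a list -> one row; each column keeps its own dtype.
df = pd.DataFrame([di]).set_index('ind')[d]

# Going through a Series first puts all values in one float64 Series,
# so integer values such as age are upcast to float.
alt = pd.Series(di).to_frame(0).T.set_index('ind')[d]

print(df.dtypes)
print(alt.dtypes)
```

If integer columns should stay integers, the list-of-dicts form is probably the more convenient of the two.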

Related

Pandas: how to overwrite a previous value with a new value?

I have the following code :
# Append columns to an empty DataFrame.
self.df = pd.DataFrame(columns = ["ID", "Features"],index=['index1'])
tracking_id = output[4]
print(tracking_id in set(self.df['ID']))
if tracking_id in self.df['ID'] :
    df2 = pd.DataFrame([[tracking_id], features])
    self.df.update(df2)
    print(self.df)
else :
    self.df = self.df.append({'ID' : tracking_id, 'Features' : features}, ignore_index = True)
In this code, I first check whether an element with the same ID already exists. If it does, the previous feature value should be updated with the new one. It runs, but not correctly.
My output is :
ID Features
0 NaN NaN
1 1.0 [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 50.0...
2 4.0 [[0.0, 0.0, 0.0, 1.0, 89.0, 15.0, 0.0, 0.0, 10...
3 4.0 [[70.0, 41.0, 17.0, 41.0, 4.0, 0.0, 0.0, 0.0, ...
4 4.0 [[42.0, 18.0, 16.0, 14.0, 2.0, 0.0, 0.0, 0.0, ...
5 6.0 [[3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 8.0, 59.0...
6 6.0 [[0.0, 6.0, 7.0, 9.0, 12.0, 3.0, 0.0, 0.0, 51....
As you can see, a row with an existing ID is appended as if that ID were absent from the list. However, sometimes it is not appended and instead overwrites the previous one. How can I solve this problem?
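One likely contributor, offered as a sketch rather than a confirmed diagnosis of the code above: the `in` operator on a pandas Series tests the index labels, not the stored values, so `tracking_id in self.df['ID']` can be True or False for reasons unrelated to the IDs. The example below shows that behaviour and then one possible workaround, keeping the latest features in a plain dict keyed by ID and building the DataFrame from it. The names `ids`, `features_by_id` and `record` are hypothetical.

```python
import pandas as pd

ids = pd.Series([4.0, 6.0])      # index labels are 0 and 1

print(4.0 in ids)                # False: 4.0 is not an index label
print((ids == 4.0).any())        # True: 4.0 is one of the values

# Hypothetical workaround: a dict keeps exactly one entry per ID.
features_by_id = {}

def record(tracking_id, features):
    # Assigning to an existing key overwrites the earlier features.
    features_by_id[tracking_id] = features

record(4, [0.0, 1.0])
record(6, [3.0, 0.0])
record(4, [70.0, 41.0])          # replaces the first entry for ID 4

df = pd.DataFrame({
    "ID": list(features_by_id),
    "Features": list(features_by_id.values()),
})
print(df)
```

A membership test on values can also be written as `tracking_id in self.df['ID'].values` if the DataFrame itself has to stay the primary store.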

I want to combine my simple lists using a list comprehension in Python

I have two lists of 24 values and I would like to create a list which could be seen as a 24x2 matrix, where the first column holds the values of my first list and the second column holds the values of my second list.
Here are my two lists:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
You can use the zip() function like this:
q = [6.0, 5.75, 5.5, 5.25, 5.0, 4.75, 4.5, 4.25, 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, 2.0, 1.75, 1.5, 1.25, 1.0, 0.75, 0.5, 0.25]
t = [0.38, 0.51, 0.71, 1.09, 2.0, 5.68, 0.31, 0.32, 0.34, 0.35, 0.36, 0.38, 0.4, 0.42, 0.44, 0.48, 0.51, 0.56, 0.63, 0.74, 1.41, 2.17, 3.97, 11.36]
L1 = list(zip(q, t))
res = []
for i, j in L1:
    res.append(i)
    res.append(j)
print(res)
It seems that you just need to zip your two lists:
myList = [0,1,2,3,4,5]
myOtherList = ["a","b","c","d","e","f"]
# Iterator of tuples
zip(myList, myOtherList)
# List of tuples
list(zip(myList, myOtherList))
You will get this result: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')].
If you need another structure, you could use a comprehension:
length = min(len(myList), len(myOtherList))
# List of list
[[myList[i], myOtherList[i]] for i in range(length)]
# Dict
{myList[i]: myOtherList[i] for i in range(length)}
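Applied to the question's lists, the same idea can be written as a comprehension over `zip`, which avoids indexing altogether. This is a minimal sketch with `q` and `t` shortened to three values each for illustration; the name `matrix` is an assumption.

```python
# Shortened versions of the question's two lists.
q = [6.0, 5.75, 5.5]
t = [0.38, 0.51, 0.71]

# Each row pairs one value from q with the value at the same position in t.
matrix = [[a, b] for a, b in zip(q, t)]

print(matrix)
```

With the full 24-value lists this yields 24 rows of 2 values each; `zip` stops at the shorter list if the lengths differ.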

ValueError: could not convert string to float: 'ane'

print(xtest.head())
print("predicted as",myModel.predict(xtest))
output:-
age bp sg al su rbc pc ... rbcc htn dm cad appet pe ane
235 45.0 70.0 1.01 2.0 0.0 1.0 1.0 ... 4.8 0.0 0.0 1.0 1.0 0.0 1.0
[1 rows x 24 columns]
predicted as [[0.99633694]]
The xtest dataframe has a column named ane, and the model predicts well on it. But when I give the same input in the form of a dictionary:
di={'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0 }
b=pd.DataFrame(di.items())
b=b.T
x['ane'] = x['ane'].astype(float)
tensor = tf.convert_to_tensor(b, dtype=tf.float64)
print(myModel.predict((tensor)))
It's showing the following error:-
ValueError: could not convert string to float: 'ane'
In the training model, I did the same conversion and it worked well.
My colab notebook:-
https://colab.research.google.com/drive/1DomDo3adwRBQUFD0g8JVpF5jxC9HoegW
You should try this code; I replaced the same code in the Colab notebook as well.
import pandas as pd
import tensorflow as tf
di={'age': 59, 'bp': 70, 'sg': 1.01, 'al': 1.0, 'su': 3.0, 'rbc': 0.0, 'pc': 0.0, 'pcc': 0.0, 'ba': 0.0, 'bgr': 424.0, 'bu': 55.0, 'sc': 1.7, 'sod': 138.0, 'pot': 4.5, 'hemo': 12.0, 'pcv': 37.0, 'wbcc': 10200.0, 'rbcc': 4.1, 'htn': 1.0, 'dm': 1.0, 'cad': 1.0, 'appet': 1.0, 'pe': 0.0, 'ane': 1.0 }
b=pd.DataFrame(list(di.items()),index=di)
b= b.drop(columns=0)
b=b.T
b['ane'] = b['ane'].astype(float)
tensor = tf.convert_to_tensor(b, dtype=tf.float32)
print(myModel.predict((tensor)))
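To see where the string 'ane' comes from, it helps to look at what transposing the (key, value) pairs produces: the keys end up as a row of strings alongside the values. The sketch below uses a shortened, illustrative dictionary and stops at a NumPy array, so it does not depend on TensorFlow or on `myModel`; wrapping the dict in a list is an alternative to the code above, not the answerer's method.

```python
import pandas as pd

# Shortened version of the question's dictionary, for illustration only.
di = {'age': 59, 'bp': 70, 'ane': 1.0}

# Transposing the (key, value) pairs keeps the keys as a row of strings.
pairs = pd.DataFrame(list(di.items())).T
print(pairs.iloc[0].tolist())  # the key row: ['age', 'bp', 'ane']

# Wrapping the dict in a list gives one numeric row with the keys as column names.
row = pd.DataFrame([di]).astype('float32')
features = row.to_numpy()      # shape (1, 3), all float32

print(features.shape, features.dtype)
```

An array like `features` should then be convertible with `tf.convert_to_tensor`, assuming the column order matches the order used in training.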

Summing specific columns based on a mapping

I have a series which contains a mapping:
serm = pd.Series(
data={'ARD1': 53, 'BUL1': 37,
'BUL2': 37, 'BSR1': 49, 'BTR1': 53, 'CR1': 53,
'CRR1': 53, 'CRE3': 53,'TAB1': 52, 'NEP1': 42, 'HAL1': 42})
which maps the asset id (the index) to an area (the value).
I have the following dataframe, whose column names are the index of serm.
data=pd.DataFrame(data={'ARD1': {0: 4.0, 1: 2.0, 2: 2.0, 3: 3.0, 4: 2.0},
'BUL1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'BUL2': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'BSR1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'BTR1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'CR1': {0: 15.0, 1: 13.0, 2: 13.0, 3: 11.0, 4: 13.0},
'CRR1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'CRE3': {0: 8.0, 1: 10.0, 2: 9.0, 3: 10.0, 4: 11.0},
'TAB1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'NEP1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'HAL1': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}})
I would like to sum the columns of data that fall in the same area, according to the mapping in serm. How can I achieve this (the more idiomatic to pandas, the better)?
Use Index.map with groupby per columns and aggregate sum:
df = data.groupby(data.columns.map(serm.get), axis=1).sum()
print (df)
37 42 49 52 53
0 0.0 0.0 0.0 0.0 27.0
1 0.0 0.0 0.0 0.0 25.0
2 0.0 0.0 0.0 0.0 24.0
3 0.0 0.0 0.0 0.0 24.0
4 0.0 0.0 0.0 0.0 26.0
Or assign columns back and use sum:
data.columns = data.columns.map(serm.get)
df = data.sum(level=0, axis=1)
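Both snippets above rely on arguments that newer pandas releases have changed: `groupby(..., axis=1)` is deprecated in pandas 2.1 and later, and the `level` argument of `sum` was removed in pandas 2.0. A variant that avoids both, shown here as a sketch on a reduced version of the question's data, is to transpose, group the rows by the mapping, and transpose back. The name `by_area` is an assumption.

```python
import pandas as pd

# Reduced versions of the question's mapping and dataframe.
serm = pd.Series({'ARD1': 53, 'BUL1': 37, 'CR1': 53, 'NEP1': 42})
data = pd.DataFrame({
    'ARD1': [4.0, 2.0],
    'BUL1': [0.0, 0.0],
    'CR1': [15.0, 13.0],
    'NEP1': [0.0, 0.0],
})

# After transposing, asset ids are the row labels, so grouping by serm
# aligns each asset with its area; the final .T restores the layout.
by_area = data.T.groupby(serm).sum().T

print(by_area)
```

Here the columns of `by_area` are the area codes, and area 53 holds the row-wise sum of ARD1 and CR1.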

Finding frequency distribution of a list of numbers in python

I have a long list of numbers like the following. I would like to find the frequency distribution of each number, but I could not use the Counter function to get the frequency of each item: as they are integers, I get an error saying it is not iterable, and I could not convert the list to strings either. I checked similar questions, but they did not work for me.
data=[1.0, 2.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 15.0, 0.0, 0.0, 0.0, 0.0, 3.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 7.0, 1.0, 0.0, 0.0, 4.0, 3.0, 3.0, 1.0, 1.0, 2.0, 4.0, 0.0, 1.0, 7.0, 2.0, 1.0, 1.0, 4.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 2.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 10.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 3.0, 2.0, 11.0, 0.0, 5.0, 2.0, 0.0, 1.0, 2.0, 1.0, 8.0, 1.0, 0.0, 6.0, 2.0, 4.0, 0.0, 17.0, 0.0, 27.0, 2.0, 2.0, 1.0, 1.0, 3.0, 2.0, 0.0, 0.0, 6.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 10.0, 0.0, 0.0, 5.0, 7.0, 1.0, 0.0, 1.0, 2.0, 1.0, 5.0, 2.0, 1.0, 9.0, 1.0, 0.0, 2.0, 0.0, 1.0, 3.0, 1.0, 1.0, 0.0, 0.0, 3.0, 5.0, 2.0, 0.0, 1.0, 9.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 1.0, 1.0, 3.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 2.0, 3.0, 2.0, 8.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 0.0, 2.0, 1.0, 1.0, 19.0, 0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 5.0, 4.0, 2.0, 0.0, 1.0, 2.0, 0.0, 5.0, 0.0, 0.0, 3.0, 1.0, 0.0, 1.0, 1.0, 0.0, 3.0, 2.0, 4.0, 10.0, 2.0, 1.0, 3.0, 1.0, 0.0, 2.0, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 17.0, 0.0, 2.0, 3.0, 2.0, 1.0, 0.0, 2.0, 2.0, 1.0, 2.0, 5.0, 2.0, 1.0, 1.0, 1.0, 3.0, 0.0, 1.0, 1.0, 0.0, 4.0, 5.0, 2.0, 2.0, 1.0, 3.0, 0.0, 1.0, 3.0, 1.0, 1.0, 1.0, 0.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 3.0, 5.0, 0.0, 1.0, 4.0, 0.0, 0.0, 1.0, 6.09]
You could use something simple like:
def freq(lst):
    d = {}
    for i in lst:
        if d.get(i):
            d[i] += 1
        else:
            d[i] = 1
    return d
results:
>>> freq(data)
{0.0: 72, 1.0: 106, 2.0: 40, 3.0: 21, 4.0: 9, 5.0: 10, 6.0: 2, 7.0: 3, 8.0: 2, 9.0: 2, 10.0: 3, 11.0: 1, 15.0: 1, 17.0: 2, 19.0: 1, 6.09: 1, 27.0: 1}
Though Counter worked fine for me (I copy-pasted the data that you posted):
...
>>> from collections import Counter
>>> Counter(data)
Counter({1.0: 106, 0.0: 72, 2.0: 40, 3.0: 21, 5.0: 10, 4.0: 9, 7.0: 3, 10.0: 3, 6.0: 2, 8.0: 2, 9.0: 2, 17.0: 2, 11.0: 1, 15.0: 1, 19.0: 1, 6.09: 1, 27.0: 1})
distribution ={i:data.count(i)/len(data) for i in set(data)}
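The counts and the relative frequencies can also be obtained in a single pass with `Counter`, which avoids calling `data.count` once per distinct value. This is a sketch on a shortened, illustrative list; the names `counts`, `total` and `relative` are assumptions. A "not iterable" error from `Counter` typically appears when it is handed a single number instead of the whole list.

```python
from collections import Counter

# Shortened stand-in for the question's list.
data = [1.0, 2.0, 1.0, 0.0, 1.0, 6.09]

counts = Counter(data)             # pass the whole list, not one element
total = sum(counts.values())

# Relative frequency of each distinct value, ordered by value.
relative = {value: n / total for value, n in sorted(counts.items())}

print(counts)
print(relative)
```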
