Convert dataframe dict to table in pandas - python-3.x

I have the output from
df = pd.DataFrame.from_records(get_data)
display(df)
Output
'f_data':[{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]
The output I need looks like this:
f_data
fid     lfid
9.3     39.3
839.4   739.3

Try indexing the dict with the correct key:
import pandas as pd
d = {'f_data':[{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]}
out = pd.DataFrame(d['f_data'])
print(out)
     fid   lfid
0    9.3   39.3
1  839.4  739.3
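If the nested structure gets deeper, pandas can also pull the record list out for you. The sketch below uses pd.json_normalize (available at the top level since pandas 1.0), with record_path naming the key that holds the list of rows. Note that the values stay strings, as in the original dict, so convert with astype(float) if you need numbers.

```python
import pandas as pd

# The dict from the question: 'f_data' holds a list of row records
d = {'f_data': [{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]}

# record_path tells json_normalize which key contains the list of records
out = pd.json_normalize(d, record_path='f_data')
print(out)
```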

Related

How can I convert a MongoDB cursor result to a DataFrame in Python?

This is my code:
x = list(coll.find({"activities.flowCenterInfo": {'$exists': True}},
                   {'activities.activityId': 1, 'activities.flowCenterInfo': 1, '_id': 0}).limit(5))
for row in x:
    print(row)
This is the result of x for one sample:
{'activities': [{'activityId': 'B83F36898FE444309757FBEB6DF0685D', 'flowCenterInfo': {'processId': '178888', 'demandComplaintSubject': 'İkna Görüşmesi', 'demandComplaintDetailSubject': 'Hayat Sigortadan Ayrılma', 'demandComplaintId': '178888'}}]}
I want to convert it to a DataFrame so I can write it to an Oracle table. How can I convert it properly? I can't find any way.
(Image not available: the MongoDB structure of one sample.)
Assuming that the activities key contains a list with a single dict, flatten each document and prefix every field from the flowCenterInfo key with fcinfo_:
import pandas as pd
# sample list
l = [{'activities': [{'activityId': 'B83F36898FE444309757FBEB6DF0685D', 'flowCenterInfo': {'processId': '178888', 'demandComplaintSubject': 'İkna Görüşmesi', 'demandComplaintDetailSubject': 'Hayat Sigortadan Ayrılma', 'demandComplaintId': '178888'}}]},
{'activities': [{'activityId': 'B83F36898FE444309757FBEB6DF0685D', 'flowCenterInfo': {'processId': '178888', 'demandComplaintSubject': 'İkna Görüşmesi', 'demandComplaintDetailSubject': 'Hayat Sigortadan Ayrılma', 'demandComplaintId': '178888'}}]},
{'activities': [{'activityId': 'B83F36898FE444309757FBEB6DF0685D', 'flowCenterInfo': {'processId': '178888', 'demandComplaintSubject': 'İkna Görüşmesi', 'demandComplaintDetailSubject': 'Hayat Sigortadan Ayrılma', 'demandComplaintId': '178888'}}]}]
# take the first (only) activity of each document; prefix its flowCenterInfo fields with fcinfo_
df = pd.DataFrame.from_records([
    {'activityId': r['activities'][0]['activityId'],
     **{'fcinfo_' + k: v for k, v in r['activities'][0]['flowCenterInfo'].items()}}
    for r in l
])
print(df)
activityId fcinfo_processId ... fcinfo_demandComplaintDetailSubject fcinfo_demandComplaintId
0 B83F36898FE444309757FBEB6DF0685D 178888 ... Hayat Sigortadan Ayrılma 178888
1 B83F36898FE444309757FBEB6DF0685D 178888 ... Hayat Sigortadan Ayrılma 178888
2 B83F36898FE444309757FBEB6DF0685D 178888 ... Hayat Sigortadan Ayrılma 178888
[3 rows x 5 columns]
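If a document can contain more than one activity, indexing with [0] silently drops the rest. A possible alternative, sketched under the assumption that each element of activities has the same shape as the sample, is pd.json_normalize with record_path='activities': it produces one row per activity and flattens the nested flowCenterInfo dict into dotted column names, which are then renamed to the fcinfo_ prefix used above.

```python
import pandas as pd

# Same shape as list(coll.find(...)) in the question (one document shown)
l = [{'activities': [{'activityId': 'B83F36898FE444309757FBEB6DF0685D',
                      'flowCenterInfo': {'processId': '178888',
                                         'demandComplaintSubject': 'İkna Görüşmesi',
                                         'demandComplaintDetailSubject': 'Hayat Sigortadan Ayrılma',
                                         'demandComplaintId': '178888'}}]}]

# One row per element of 'activities'; nested dicts become 'flowCenterInfo.<field>' columns
df = pd.json_normalize(l, record_path='activities')

# Rename 'flowCenterInfo.processId' -> 'fcinfo_processId', etc.
df.columns = df.columns.str.replace('flowCenterInfo.', 'fcinfo_', regex=False)
print(df)
```

The resulting flat frame can then be passed to DataFrame.to_sql together with an SQLAlchemy engine for your Oracle database; the connection setup is not shown here.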

How to use groupby function by leaving out leap day [duplicate]

I have a data frame in pandas like this:
ID Date Element Data_Value
0 USW00094889 2014-11-12 TMAX 22
1 USC00208972 2009-04-29 TMIN 56
2 USC00200032 2008-05-26 TMAX 278
3 USC00205563 2005-11-11 TMAX 139
4 USC00200230 2014-02-27 TMAX -106
I want to remove all leap days, and my code is:
df = df[~((df.Date.month == 2) & (df.Date.day == 29))]
but it raises an AttributeError:
AttributeError: 'Series' object has no attribute 'month'
What's wrong with my code?
Use the dt accessor, because you are working with a Series, not a DatetimeIndex (the Date column must have a datetime64 dtype; if it still holds strings, convert it first with pd.to_datetime):
df = df[~((df.Date.dt.month == 2) & (df.Date.dt.day == 29))]
Or invert the condition, chaining with | for bitwise OR and != for not equal:
df = df[(df.Date.dt.month != 2) | (df.Date.dt.day != 29)]
Or use strftime to convert to MM-DD format:
df = df[df.Date.dt.strftime('%m-%d') != '02-29']
Another option, in case your Date column is a str rather than a proper datetime:
df[~df.Date.str.endswith('02-29')]
Or, if it is already in datetime format, convert it to str first:
df[~df.Date.astype(str).str.endswith('02-29')]
Or even use contains:
df[~df.Date.str.contains('02-29')]
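Putting the dt approach together end to end: the sketch below starts from string dates (as read from a file), converts them with pd.to_datetime, and then drops February 29. The last row, with ID HYPOTHETICAL and date 2012-02-29, is made up so that there is a leap day to remove; the other rows come from the question.

```python
import pandas as pd

# Rows from the question plus one hypothetical leap-day row
df = pd.DataFrame({
    'ID': ['USW00094889', 'USC00208972', 'USC00200032', 'USC00205563', 'USC00200230', 'HYPOTHETICAL'],
    'Date': ['2014-11-12', '2009-04-29', '2008-05-26', '2005-11-11', '2014-02-27', '2012-02-29'],
    'Element': ['TMAX', 'TMIN', 'TMAX', 'TMAX', 'TMAX', 'TMAX'],
    'Data_Value': [22, 56, 278, 139, -106, 0],
})

# .dt only works on datetime64 columns, so convert the strings first
df['Date'] = pd.to_datetime(df['Date'])

# Boolean mask for Feb 29, then keep everything else
is_leap_day = (df['Date'].dt.month == 2) & (df['Date'].dt.day == 29)
df = df[~is_leap_day]
print(df)
```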

Append strings to pandas dataframe column with conditional

I'm trying to achieve the following: check whether each key in the dictionary is contained in the string from the Layers column. If it is, append the corresponding dictionary value to a new column of the pandas DataFrame.
For example, if both BR and EWKS are contained within the layer, then the new column will contain BRIDGE-EARTHWORKS.
Dataframe
mapping = {'IDs': [1244, 35673, 37863, 76373, 234298],
'Layers': ['D-BR-PILECAPS-OUTLINE 2',
'D-BR-STEEL-OUTLINE 2D-TERR-BOUNDARY',
'D-SUBG-OTHER',
'D-COMP-PAVE-CONC2',
'D-EWKS-HINGE']}
df = pd.DataFrame(mapping)
Dictionary
d1 = {"BR": "Bridge", "EWKS": "Earthworks", "KERB": "Kerb", "TERR": "Terrain"}
My code thus far is:
for i in df.Layers:
    for x in d1.keys():
        first_key = list(d1)[0]
        first_val = list(d1.values())[0]
        print(first_key,first_val)
        if first_key in i:
            df1 = df1.append(first_val, ignore_index = True)
            # df.apply(first_val)
Note: I'm thinking it may be easier to do this with a comprehension at the mapping step, before creating the DataFrame. I'm still rather new to Python, so any tips are appreciated.
Thanks!
Use Series.str.extractall to get all matched keys, then map them to the dictionary values with Series.map, and finally aggregate each row with join:
pat = r'({})'.format('|'.join(d1.keys()))
df['new'] = df['Layers'].str.extractall(pat)[0].map(d1).groupby(level=0).agg('-'.join)
print (df)
IDs Layers new
0 1244 D-BR-PILECAPS-OUTLINE 2 Bridge
1 35673 D-BR-STEEL-OUTLINE 2D-TERR-BOUNDARY Bridge-Terrain
2 37863 D-SUBG-OTHER NaN
3 76373 D-COMP-PAVE-CONC2 NaN
4 234298 D-EWKS-HINGE Earthworks
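Following the idea from the question of doing the work before building the DataFrame, a plain list comprehension over the raw mapping also works. This is a sketch, not a drop-in replacement: it joins matches in dictionary order rather than in order of appearance in the string, it uses None where nothing matched, and, like the regex above, it is a plain substring test, so a key such as BR would also match inside a longer token like BRIDGE.

```python
import pandas as pd

mapping = {'IDs': [1244, 35673, 37863, 76373, 234298],
           'Layers': ['D-BR-PILECAPS-OUTLINE 2',
                      'D-BR-STEEL-OUTLINE 2D-TERR-BOUNDARY',
                      'D-SUBG-OTHER',
                      'D-COMP-PAVE-CONC2',
                      'D-EWKS-HINGE']}
d1 = {"BR": "Bridge", "EWKS": "Earthworks", "KERB": "Kerb", "TERR": "Terrain"}

# For each layer string: collect the value of every key found as a substring,
# join them with '-', and fall back to None when nothing matched ('' is falsy)
new = ['-'.join(v for k, v in d1.items() if k in layer) or None
       for layer in mapping['Layers']]

df = pd.DataFrame({**mapping, 'new': new})
print(df)
```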

How to apply a function with multiple arguments to a specific column in Pandas?

I'm trying to apply a function to a specific column in this dataframe
datetime PM2.5 PM10 SO2 NO2
0 2013-03-01 7.125000 10.750000 11.708333 22.583333
1 2013-03-02 30.750000 42.083333 36.625000 66.666667
2 2013-03-03 76.916667 120.541667 61.291667 81.000000
3 2013-03-04 22.708333 44.583333 22.854167 46.187500
4 2013-03-06 223.250000 265.166667 116.236700 142.059383
5 2013-03-07 263.375000 316.083333 97.541667 147.750000
6 2013-03-08 221.458333 297.958333 69.060400 120.092788
I'm trying to apply the function below to a specific column (PM10) of the above DataFrame:
range1 = [list(range(0,50)),list(range(51,100)),list(range(101,200)),list(range(201,300)),list(range(301,400)),list(range(401,2000))]
def c1_c2(x,y):
    for a in y:
        if x in a:
            min_val = min(a)
            max_val = max(a)+1
            return max_val - min_val
where "x" comes from whichever column the function is applied to and "y" = range1.
Options I have tried:
df.PM10.apply(c1_c2,args(df.PM10,range1),axis=1)
df.PM10.apply(c1_c2)
I've tried these couple of available options and none of them seems to be working. Any suggestions?
I'm not sure what output you expect from the function, but to get it called at all you can try the following:
from functools import partial
df.PM10.apply(partial(c1_c2, y=range1))
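For reference, Series.apply can forward the extra argument itself, so partial is optional: positional extras go in args as a tuple, and any other keyword arguments are passed through to the function. Series.apply has no axis parameter (that belongs to DataFrame.apply), which is one reason the args(...)/axis=1 attempt fails. The small integer Series below is hypothetical, chosen so that the values actually occur in range1.

```python
import pandas as pd

range1 = [list(range(0, 50)), list(range(51, 100)), list(range(101, 200)),
          list(range(201, 300)), list(range(301, 400)), list(range(401, 2000))]

def c1_c2(x, y):
    for a in y:
        if x in a:
            min_val = min(a)
            max_val = max(a) + 1
            return max_val - min_val
    # falls through and returns None when x is in none of the lists

s = pd.Series([10, 42, 120])  # hypothetical integer values

via_args = s.apply(c1_c2, args=(range1,))  # extra positional args come after x
via_kwargs = s.apply(c1_c2, y=range1)      # extra keyword args are forwarded to c1_c2
print(via_args.tolist(), via_kwargs.tolist())
```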
Update:
OK, I think I understand a little better. This should work, but range1 is a list of lists of integers. Your data doesn't contain integers, so the new column comes up as NaN. I created another list based on your initial data that does match. See below:
import pandas as pd
df = pd.read_csv('pm_data.txt', header=0)
range1= [[7.125000,10.750000,11.708333,22.583333],list(range(0,50)),list(range(51,100)),list(range(101,200)),
         list(range(201,300)),list(range(301,400)),list(range(401,2000))]
def c1_c2(x,y):
    for a in y:
        if x in a:
            min_val = min(a)
            max_val = max(a)+1
            return max_val - min_val
df['function']=df.PM10.apply(lambda x: c1_c2(x,range1))
print(df.head(10))
print(df.head(10))
datetime PM2.5 PM10 SO2 NO2 new_column function
0 2013-03-01 7.125000 10.750000 11.708333 22.583333 25.750000 16.458333
1 2013-03-02 30.750000 42.083333 36.625000 66.666667 2.104167 NaN
2 2013-03-03 76.916667 120.541667 61.291667 81.000000 6.027083 NaN
3 2013-03-04 22.708333 44.583333 22.854167 46.187500 2.229167 NaN
4 2013-03-06 223.250000 265.166667 116.236700 142.059383 13.258333 NaN
5 2013-03-07 263.375000 316.083333 97.541667 147.750000 15.804167 NaN
6 2013-03-08 221.458333 297.958333 69.060400 120.092788 14.897917 NaN
Only the first item in 'function' has a match, because 'if x in a' tests for exact membership and only the first list in range1 was built from your data.
Old Code:
I'm also not sure what you are trying to do, but you can use a lambda to modify columns or create new ones,
like this:
import pandas as pd
I created a data file to import from the data you posted above:
datetime,PM2.5,PM10,SO2,NO2
2013-03-01,7.125000,10.750000,11.708333,22.583333
2013-03-02,30.750000,42.083333,36.625000,66.666667
2013-03-03,76.916667,120.541667,61.291667,81.000000
2013-03-04,22.708333,44.583333,22.854167,46.187500
2013-03-06,223.250000,265.166667,116.236700,142.059383
2013-03-07,263.375000,316.083333,97.541667,147.750000
2013-03-08,221.458333,297.958333,69.060400,120.092788
Here is how I import it,
df = pd.read_csv('pm_data.txt', header=0)
Then create a new column by applying a function to the data in 'PM10':
df['new_column'] = df['PM10'].apply(lambda x: x+15 if x < 30 else x/20)
which yields,
datetime PM2.5 PM10 SO2 NO2 new_column
0 2013-03-01 7.125000 10.750000 11.708333 22.583333 25.750000
1 2013-03-02 30.750000 42.083333 36.625000 66.666667 2.104167
2 2013-03-03 76.916667 120.541667 61.291667 81.000000 6.027083
3 2013-03-04 22.708333 44.583333 22.854167 46.187500 2.229167
4 2013-03-06 223.250000 265.166667 116.236700 142.059383 13.258333
5 2013-03-07 263.375000 316.083333 97.541667 147.750000 15.804167
6 2013-03-08 221.458333 297.958333 69.060400 120.092788 14.897917
Let me know if this helps.
"I've tried these couple of available options and none of them seems to be working..."
What do you mean by this? What's your output, are you getting errors or what?
I see a couple of problems:
The range1 lists contain ints while your column values are floats, so a value like 10.75 is never found and c1_c2() returns None.
Even if the data types matched, c1_c2() would still return None for any value that is not in range1.
Below is how I would do it, assuming the data types match:
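def c1_c2(x):
    range1 = [list(range(0, 50)), list(range(51, 100)), list(range(101, 200)),
              list(range(201, 300)), list(range(301, 400)), list(range(401, 2000))]
    for a in range1:
        if x in a:
            min_val = min(a)
            max_val = max(a)+1
            return max_val - min_val
    return x # returns the original value if not in range1
df.PM10.apply(c1_c2)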
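Since the PM10 values are floats, one way to make the lookup work without matching types is to compare against bucket bounds instead of testing list membership. This is a sketch under the assumption that each list in range1 is meant as the half-open interval [lo, hi); hi - lo then equals max(a)+1 - min(a) from the original function. BOUNDS and bucket_width are hypothetical names. Note that range1 leaves gaps (50 to 51, 100 to 101, and so on), and values falling there get NaN here, just as they match nothing in the original.

```python
import numpy as np
import pandas as pd

# Hypothetical (lo, hi) pairs mirroring range1 = [range(0, 50), range(51, 100), ...]
BOUNDS = [(0, 50), (51, 100), (101, 200), (201, 300), (301, 400), (401, 2000)]

def bucket_width(x, bounds):
    """Return hi - lo for the half-open bucket [lo, hi) containing x, else NaN."""
    for lo, hi in bounds:
        if lo <= x < hi:
            return hi - lo  # same as max(range(lo, hi)) + 1 - min(range(lo, hi))
    return np.nan

# PM10 values from the question
df = pd.DataFrame({'PM10': [10.75, 42.083333, 120.541667, 44.583333,
                            265.166667, 316.083333, 297.958333]})
df['width'] = df['PM10'].apply(bucket_width, args=(BOUNDS,))
print(df)
```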

Convert pandas column from object type [] in python 3

I have read "Pandas: convert type of column" and "How to convert datatype:object to float64 in python?".
My current df (dtypes, then data) is:
Day object
Time object
Open float64
Close float64
High float64
Low float64
Day Time Open Close High Low
0 ['2019-03-25'] ['02:00:00'] 882.2 882.6 884.0 882.1
1 ['2019-03-25'] ['02:01:00'] 882.9 882.9 883.4 882.9
2 ['2019-03-25'] ['02:02:00'] 882.8 882.8 883.0 882.7
So I cannot use this:
day_=df.loc[df['Day'] == '2019-06-25']
My final goal is to filter df by the value of the "Day" column.
I think the df.loc above fails because the dtype of Day is object,
so I tried to convert df to something like this:
Day Time Open Close High Low
0 2019-03-25 ['02:00:00'] 882.2 882.6 884.0 882.1
1 2019-03-25 ['02:01:00'] 882.9 882.9 883.4 882.9
2 2019-03-25 ['02:02:00'] 882.8 882.8 883.0 882.7
I have tried:
df=pd.read_csv('output.csv')
df = df.convert_objects(convert_numeric=True)
#df['Day'] = df['CTR'].str.replace('[','').astype(np.float64)
df['Day'] = pd.to_numeric(df['Day'].str.replace(r'[,.%]',''))
But it fails with this error:
ValueError: Unable to parse string "['2019-03-25']" at position 0
I am a novice at pandas, and this may be a duplicate.
Please help me find a solution. Thanks a lot.
Try this; I hope it works.
First remove the list brackets from Day, then filter using .loc:
import pandas as pd
df = pd.DataFrame(data={'Day':[['2016-05-12']],
                        'day2':[['2016-01-01']]})
# join the one-element list into a plain string
df['Day'] = df['Day'].apply(''.join)
df['Day'] = pd.to_datetime(df['Day']).dt.date.astype(str)
days_df=df.loc[df['Day'] == '2016-05-12']
Second solution
If the list is stored as a string (as your error message suggests for the data read from output.csv):
import pandas as pd
from ast import literal_eval
df2 = pd.DataFrame(data={'Day':["['2016-05-12']"],
'day2':["['2016-01-01']"]})
df2['Day'] = df2['Day'].apply(literal_eval)
df2['Day'] = df2['Day'].apply(''.join)
df2['Day'] = pd.to_datetime(df2['Day']).dt.date.astype(str)
days_df=df2.loc[df2['Day'] == '2016-05-12']
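When the brackets are only text, as in the error message "['2019-03-25']", a lighter option than literal_eval is to strip the bracket and quote characters with Series.str.strip. The sketch below rebuilds a small hypothetical output.csv in memory from the rows shown in the question, so read_csv returns the same kind of strings. It filters on 2019-03-25 because the question's example filter (2019-06-25) does not occur in the sample rows.

```python
import io
import pandas as pd

# Hypothetical CSV mirroring the question's output.csv: the list brackets were
# written out as text, so read_csv yields strings like "['2019-03-25']"
csv_text = """Day,Time,Open,Close,High,Low
['2019-03-25'],['02:00:00'],882.2,882.6,884.0,882.1
['2019-03-25'],['02:01:00'],882.9,882.9,883.4,882.9
['2019-03-25'],['02:02:00'],882.8,882.8,883.0,882.7
"""
df = pd.read_csv(io.StringIO(csv_text))

# Strip any leading/trailing '[', ']' and "'" characters from both columns
for col in ['Day', 'Time']:
    df[col] = df[col].str.strip("[]'")

day_ = df.loc[df['Day'] == '2019-03-25']
print(day_)
```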
