Groupby single column and merge using Pandas - python-3.x

I have a dataframe df which I need to merge Department-wise.
(The input, my output, and the expected output were attached as images.)

Related

Python Pandas split to series groupBy

I have a dataframe on which I can run split on a specific column and get a series - but how do I then add my other columns back into this dataframe? Or do I somehow specify in the split that there is a column A which is the groupBy, and then split on column B?
input:
idx _id systemA systemB
0 abc123 1703.0|1144.0 2172.0|735.0
output:
pandas Series data (not expanded) for systemA and systemB, split on '|' and grouped by _id
It sounds like a regular .groupby will achieve what you are after:
for specific_value, subset_df in df.groupby(column_of_interest):
...
The subset_df will be a pandas dataframe containing only rows for which column_of_interest contains specific_value.
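As a minimal sketch of that idea applied to the question's data (the column names and the '|' separator are taken from the example row above):
import pandas as pd

df = pd.DataFrame({
    '_id': ['abc123'],
    'systemA': ['1703.0|1144.0'],
    'systemB': ['2172.0|735.0'],
})

for _id, subset_df in df.groupby('_id'):
    # Split each system column on '|' within this _id's subset;
    # str.split returns a Series of lists (not expanded).
    series_a = subset_df['systemA'].str.split('|')
    series_b = subset_df['systemB'].str.split('|')
    print(_id, series_a.tolist(), series_b.tolist())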

Write Array and Variable to Dataframe

I have an array in the format [27.214 27.566] - there can be several numbers. Additionally I have a Datetime variable.
from datetime import datetime
import time
import numpy as np

now = datetime.now()
timestamp = now.strftime('%Y-%m-%d %H:%M:%S')  # renamed so it doesn't shadow the datetime module
time.sleep(0.5)
agilent.write("MEAS:TEMP? (#101:102)")  # 'agilent' is the instrument connection
values = np.fromstring(agilent.read(), dtype=float, sep=',')
The output from the array is [27.214 27.566]
Now I would like to write this into a dataframe with the following structure:
Datetime, FirstValueArray, SecondValueArray, ....
How can I do this? A new array is added to the dataframe every minute.
I will assume you want to append a row to an existing dataframe df with appropriate columns: value1, value2, ..., lastvalue, datetime.
We can easily convert the array to a series:
s = pd.Series(values)
What you want to do next is append the timestamp to the series. Note that Series.append returns a new object rather than modifying s in place (and it is deprecated in recent pandas), so assign the result:
s = s.append(pd.Series([timestamp]), ignore_index=True) # cf. Series.append
Now you have a series whose length matches df.columns. You want to convert that series to a dataframe to be able to use pd.concat:
df_to_append = s.to_frame().T
We need the transpose because Series.to_frame() returns a dataframe with the series as a single column, and we want a single row with multiple columns.
Before you concatenate, however, you need to make sure both dataframes' column names match, or it will create additional columns:
df_to_append.columns = df.columns
Now we can concatenate our two dataframes (pd.concat also returns a new object, so assign it):
df = pd.concat([df, df_to_append], ignore_index=True) # cf. pandas.concat
For further details, see the documentation
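Putting the steps together, here is a minimal end-to-end sketch. The column names (value1, value2, datetime) and the sample values are stand-ins for the asker's setup, and it builds the one-row series directly instead of using the deprecated Series.append:
import numpy as np
import pandas as pd
from datetime import datetime

# Existing dataframe with one column per reading plus a timestamp column.
df = pd.DataFrame(columns=['value1', 'value2', 'datetime'])

values = np.array([27.214, 27.566])  # stand-in for the instrument read
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

# Build a one-row frame: values first, timestamp last, matching df.columns.
s = pd.Series(list(values) + [timestamp])
df_to_append = s.to_frame().T
df_to_append.columns = df.columns

df = pd.concat([df, df_to_append], ignore_index=True)
print(df)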

Python: DataFrame Index shifting

I have several dataframes that I have concatenated with pandas in the line:
xspc = pd.concat([df1,df2,df3], axis = 1, join_axes = [df3.index])
In df2 the index values read one day later than those of df1 and df3. So, for instance, when the most recent date is 7/1/19, the index values for df1 and df3 read '7/1/19' while df2 reads '7/2/19'. I would like to concatenate the dataframes so that each is joined on its most recent date; in other words, the df1 values at index '7/1/19' should be concatenated with the df2 values at index '7/2/19' and the df3 values at index '7/1/19'. What methods can I use to shift the data around to join on these non-matching index values?
You can reset the index of each dataframe and then concatenate them:
df1 = df1.reset_index()
df2 = df2.reset_index()
df3 = df3.reset_index()
df_final = pd.concat([df1, df2, df3], axis=1, join_axes=[df3.index])
(Note: join_axes was removed in pandas 1.0; on current versions simply drop the argument, since after reset_index the row positions already line up.)
This should work since you mentioned that the dates in df2 are always one day after those in df1 and df3.
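As a small demonstration with toy data, where df2's dates are deliberately one day later:
import pandas as pd

dates = pd.to_datetime(['2019-06-30', '2019-07-01'])
df1 = pd.DataFrame({'a': [1, 2]}, index=dates)
df3 = pd.DataFrame({'c': [5, 6]}, index=dates)
df2 = pd.DataFrame({'b': [3, 4]}, index=dates + pd.Timedelta(days=1))

# Resetting the index aligns the frames by row position instead of by date,
# so df2's later-dated rows line up with df1/df3's most recent rows.
df_final = pd.concat(
    [df1.reset_index(), df2.reset_index(), df3.reset_index()],
    axis=1,
)
print(df_final)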

Merge multiple dataframes using multiindex in python

I have 3 series which are generated by the code shown below (the code for one series is shown).
I would like to merge 3 such series/dataframes on the columns (subject_id, hadm_id, icustay_id), but unfortunately these headings don't appear as column names. How do I convert them into columns so I can use them for merging with another series/dataframe of a similar shape?
I am generating the series from another dataframe (df) with the groupby shown below. Although I already tried converting the series to a dataframe, it still doesn't display the ids as columns; instead it displays them as the index. I would like to see 'subject_id', 'hadm_id' and 'icustay_id' as column names in the dataframe, along with the other column 'val_bw_80_110', so that I can join with other dataframes on these 3 ids.
s1 = df.groupby(['subject_id','hadm_id','icustay_id'])['val_bw_80_110'].mean()
I expect an output where the ids (subject_id,hadm_id,icustay_id) are converted to column names and can be used for joining/merging with other dataframes.
You can add parameter as_index=False to DataFrame.groupby or use Series.reset_index:
df = df.groupby(['subject_id','hadm_id','icustay_id'], as_index=False)['val_bw_80_110'].mean()
Or:
df = df.groupby(['subject_id','hadm_id','icustay_id'])['val_bw_80_110'].mean().reset_index()
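Once the ids are regular columns, the three results can be merged on them. A minimal sketch with toy data (the min/max aggregations and the toy values are assumptions for illustration; only the mean comes from the question):
import pandas as pd

keys = ['subject_id', 'hadm_id', 'icustay_id']

# Toy data standing in for the asker's df.
df = pd.DataFrame({
    'subject_id': [1, 1, 2],
    'hadm_id': [10, 10, 20],
    'icustay_id': [100, 100, 200],
    'val_bw_80_110': [85.0, 95.0, 105.0],
})

# as_index=False keeps the group keys as regular columns.
s1 = df.groupby(keys, as_index=False)['val_bw_80_110'].mean()
s2 = df.groupby(keys, as_index=False)['val_bw_80_110'].min()
s3 = df.groupby(keys, as_index=False)['val_bw_80_110'].max()

merged = (s1.merge(s2, on=keys, suffixes=('_mean', '_min'))
            .merge(s3.rename(columns={'val_bw_80_110': 'val_bw_80_110_max'}),
                   on=keys))
print(merged)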

Read CSV file without scientific notation

I have two dataframes, each with a single column; the values are:
df1:
0000D3447E
0000D3447E39S
5052722161014
5052722161021
5052722161038
00000E2377
00000E2378
0000F2892E
0000F2892EDI1
8718934652999
8718934653095
8718934653002
8718934653118
8718934653019
8718934653125
8718934653132
00000E2387
df2:
0000D3447E
0000D3447E39S
5052722161014
5052722161021
5052722161038
00000E2377
00000E2378
0000F2892E
00000E2387
When I merge these two dataframes I don't get the merged values which start with "00000E".
I used:
df1['PartNumber'] = df1['PartNumber'].astype(str)
df2['PartNumber'] = df2['PartNumber'].astype(str)
to convert the column dtype to string, but I'm still not able to get the merged values.
My question is: how can I get the values which start with 00000E when I merge the two dataframes?
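The likely root cause is that values such as 00000E2377 look like scientific notation (0E2377 is 0 x 10^2377, i.e. 0.0), so pandas converts them to floats while reading the CSV, before any astype(str) can help. A hedged sketch, assuming the column is named PartNumber as in the question (the file names are placeholders):
import pandas as pd

# Force the column to be read as strings so values like 00000E2377
# are never interpreted as floats in scientific notation.
df1 = pd.read_csv('file1.csv', dtype={'PartNumber': str})  # placeholder file names
df2 = pd.read_csv('file2.csv', dtype={'PartNumber': str})

merged = df1.merge(df2, on='PartNumber')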
