Let us say we have a dataframe as under:
index = pd.MultiIndex.from_tuples(zip(['A','A','A','A','B','B','B','B'],[2012,2013,2014,2015,2012,2013,2014,2015]))
df = pd.DataFrame({
'col1':[10,20,10,30,50,20,60,80],
'col2':[40,20,40,30,50,20,60,80],
'col3':[10,20,80,30,80,20,80,10],
},index=index)
I then added the names for these multi-index as 'Product' and 'Year' respectively.
Now I need to plot this data in such a way that for each 'Product' there is a different line for a specific column.
I tried this but it doesn't work.
df.plot(kind='line',x='Year')
I tried unstacking the dataframe using unstack(), however, as there are multiple columns, I will have to create as many new dataframes as there are columns for this to work.
Is there any other way?
You can unstack:
df['col'].unstack(level=0).plot()
Output:
Related
This is how I am reading and creating the dataframe with pandas
def get_sheet_data(sheet_name='SomeName'):
df = pd.read_excel(f'{full_q_name}',
sheet_name=sheet_name,
header=[0,1],
index_col=0)#.fillna(method='ffill')
df = df.swapaxes(axis1="index", axis2="columns")
return df.set_index('Product Code')
printing this tabularized gives me(this potentially will have hundreds of columns):
I cant seem to add those first two rows into the header, I've tried:
python:pandas - How to combine first two rows of pandas dataframe to dataframe header?https://stackoverflow.com/questions/59837241/combine-first-row-and-header-with-pandas
and I'm failing at each point. I think its because of the multiindex, not necessarily the axis swap? But using: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html is kind of going over my head right now. Please help me add those two rows into the header?
The output of df.columns is massive so Ive cut it down alot:
Index(['Product Code','Product Narrative\nHigh-level service description','Product Name','Huawei Product ID','Type','Bill Cycle Alignment',nan,'Stackable',nan,
and ends with:
nan], dtype='object')
We Create new column names and set them to df.columns, the new column names are generated by joining the 3 Multindex headers and the 1st row of the DataFrame.
df.columns = ['_'.join(i) for i in zip(df.columns.get_level_values(0).tolist(), df.columns.get_level_values(1).tolist(), df.iloc[0,:].replace(np.nan,'').tolist())]
I have two dataframe
Dataframe 1
Dataframe 2
ID column is not unique in the two tables. I want to compare all the columns in both the tables except ID's and print the unique rows
Expected output
I tried 'isin' function, but not working. Each dataframe size is 150000 and I removed duplicates in both the tables. Please advise how to do that?
You can use df.append to combine the dataframe, then use df.duplicated which will flag the duplicates.
df3 = df1.append(df, ignore_index=True)
df4 = df3.duplicated(subset=['Team', 'name', 'Country', 'Token'], keep=False)
I have a dataframe df as below.
I want the final dataframe to be like this as follows. i.e, for each unique Name only last 2 rows must be present in the final output.
i tried the following snippet but its not working.
df = df[df['Name']].tail(2)
Use GroupBy.tail:
df1 = df.groupby('Name').tail(2)
Just one more way to solve this using GroupBy.nth:
df1 = df.groupby('Name').nth([-1,-2]) ## this will pick the last 2 rows
I have a DataFrame which contains 55000 rows and 3 columns
I want to return every row as DataFrame from this bigdataframe for using it as parameter of different function.
My idea was iterating over big DataFrame by iterrows(),iloc but I can't make it as DataFrame it is showing series type. How could I solve this
I think it is obviously not necessary, because index of Series is same like columns of DataFrame.
But it is possible by:
df1 = s.to_frame().T
Or:
df1 = pd.DataFrame([s.to_numpy()], columns=s.index)
Also you can try yo avoid iterrows, because obviously very slow.
I suspect you're doing something not optimal if you need what you describe. That said, if you need each row as a dataframe:
l = [pd.DataFrame(df.iloc[i]) for i in range(len(df))]
This makes a list of dataframes for each row in df
I have 3 series which is generated out of the code shown below. I have shown a the code for one series below
I would like to merge 3 such series/dataframes using columns (subject_id,hadm_id,icustay_id) but unfortunately these headings don't appear as column names. How do I convert them as columns and use them for merging with another series/dataframe of similar datatype
I am generating series from another dataframe (df) based on the condition given below. Though I already tried converting this series to dataframe, still it doesn't display the indices, instead it displays the column name as index. I have shown the output below. I would like to see the values 'Subject_id','hadm_id','icustay_id' as column names in dataframe along with other column 'val_bw_80_110' so that I can join with other dataframes using these 3 ids ('Subject_id','hadm_id','icustay_id')
s1 =
df.groupby(['subject_id','hadm_id','icustay_id'['val_bw_80_110'].mean()
I expect an output where the ids (subject_id,hadm_id,icustay_id) are converted to column names and can be used for joining/merging with other dataframes.
You can add parameter as_index=False to DataFrame.groupby or use Series.reset_index:
df = df.groupby(['subject_id','hadm_id','icustay_id'], as_index=False)['val_bw_80_110'].mean()
Or:
df = df.groupby(['subject_id','hadm_id','icustay_id'])['val_bw_80_110'].mean().reset_index()