Transpose pandas dataframe while replacing select column names - python-3.x

I have a pandas dataframe that I want to transpose as shown below, relabeling the first 2 columns "title" and "heading" while the remaining columns are labeled starting from 0:
   title                          heading                     0                     1                       2                      3
0  Residential Zones              Purpose                     0.021688304841518402  0.034876345             0.01611880026757717    -0.014965585432946682
1  Purposes of Residential Zones  RR (Residential Ranchette)  0.00977486465126276   -0.0021642891224473715  -0.008860375732183456  0.01690787263214588
I tried:
df.T
df.rename(columns = {0:'title', 1: 'heading'}, inplace=True)
But while this renames the first two columns, I still need the remaining column labels to start from 0 as above. There are loads of columns, so a for loop would be totally inefficient. How can I do this while transposing the pandas dataframe? This originated from a CSV file, by the way.

Your current dataset does not contain the appropriate column labels, so once it has been transposed you must relabel each column. However, you can simply pop the last label and insert a new label, 'Heading', without ever using a for loop.
from pandas import DataFrame
import numpy as np
data = np.array([['', 'RZ', 'POTRZ'],
['0', 'Purpose', 'RR'],
['1', 0.1, 0.2],
['2', 0.3, 0.4]])
df = DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])
# Transpose the data set
df = df.T
# Take the current index col ('RZ, POTRZ') then Shift & Replace
col_index = df.columns[:-1]
shifted_col = df.index
df.index = col_index
df.insert(0, '', shifted_col)
# Pop last label & inject new 'Heading' col. label, then rename
df.rename(columns=dict(zip(df.columns, ['Title', 'Heading'] + list(df.columns[1:-1].values))), inplace=True)
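A shorter route to the same layout may be to transpose, move the old column labels into a regular column with reset_index, and then assign the full list of new labels in one step. This is only a sketch: the small frame below is a made-up stand-in for the CSV data, and it assumes (as the answer above does) that the original column labels hold the titles and the first row holds the headings.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data: column labels are the titles,
# the first row holds the headings, and the remaining rows are numeric.
df = pd.DataFrame({
    "Residential Zones": ["Purpose", 0.02, 0.03],
    "Purposes of Residential Zones": ["RR (Residential Ranchette)", 0.01, -0.002],
})

# Transpose, then turn the old column labels (now the index) into a column.
out = df.T.reset_index()

# The first two labels are fixed; the remaining ones count up from 0.
out.columns = ["title", "heading"] + list(range(out.shape[1] - 2))
print(out)
```

Building the label list with range avoids any explicit loop, however many numeric columns there are.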

Related

How to transpose a Pandas DataFrame and name new columns?

I have a simple Pandas DataFrame with 3 columns. I am trying to transpose it and then rename the columns of the new dataframe, and I am having a bit of trouble.
df = pd.DataFrame({'TotalInvoicedPrice': [123],
'TotalProductCost': [18],
'ShippingCost': [5]})
I tried using
df =df.T
which transposes the DataFrame into:
TotalInvoicedPrice,123
TotalProductCost,18
ShippingCost,5
So now I have to add the column names "Metrics" and "Values" to this data frame.
I tried using
df.columns["Metrics","Values"]
but I'm getting errors.
What I need to get is DataFrame that looks like:
Metrics Values
0 TotalInvoicedPrice 123
1 TotalProductCost 18
2 ShippingCost 5
Let's reset the index then set the column labels
df.T.reset_index().set_axis(['Metrics', 'Values'], axis=1)
Metrics Values
0 TotalInvoicedPrice 123
1 TotalProductCost 18
2 ShippingCost 5
Maybe you can avoid the transpose operation, which adds a little performance overhead:
import pandas as pd
#YOUR DATAFRAME
df = pd.DataFrame({'TotalInvoicedPrice': [123],
'TotalProductCost': [18],
'ShippingCost': [5]})
#FORM THE LISTS FROM YOUR COLUMNS AND FIRST ROW VALUES
l1 = df.columns.values.tolist()
l2 = df.iloc[0].tolist()
#CREATE A DATA FRAME.
df2 = pd.DataFrame(list(zip(l1, l2)),columns = ['Metrics', 'Values'])
print(df2)
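Another option that skips the transpose is DataFrame.melt, which turns column labels into one column and their values into another. A minimal sketch using the same single-row frame from the question (the variable name long_df is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'TotalInvoicedPrice': [123],
                   'TotalProductCost': [18],
                   'ShippingCost': [5]})

# With no id_vars, every column is melted: the old column labels go into
# "Metrics" and the corresponding cell values go into "Values".
long_df = df.melt(var_name='Metrics', value_name='Values')
print(long_df)
```

This relies on the frame having a single row; with several rows, melt would emit one output row per original cell.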

How to plot this code of matplotlib efficiently

I am new to Python and doing a time series analysis of stocks. I created a data frame of the rolling average of 5 stocks based on the percentage change in their close price, so this df has 5 columns. I have another df with the index's rolling average of the percentage change in closing price. I want to plot each individual stock column of the df against the index df. I wrote this code:
fig.add_subplot(5,1,1)
plt.plot(pctchange_RA['HUL'])
plt.plot(N50_RA)
fig.add_subplot(5,1,2)
plt.plot(pctchange_RA['IRCON'])
plt.plot(N50_RA)
fig.add_subplot(5,1,3)
plt.plot(pctchange_RA['JUBLFOOD'])
plt.plot(N50_RA)
fig.add_subplot(5,1,4)
plt.plot(pctchange_RA['PVR'])
plt.plot(N50_RA)
fig.add_subplot(5,1,5)
plt.plot(pctchange_RA['VOLTAS'])
plt.plot(N50_RA)
NOTE: pctchange_RA is a pandas df of 5 stocks and N50_RA is an index df with one column.
You can put your column names in a list and then just loop over it and create the subplots dynamically. A sketch would look like the following:
cols = ['HUL', 'IRCON', 'JUBLFOOD', 'PVR', 'VOLTAS']
for i, col in enumerate(cols):
    # one subplot per stock, each overlaid with the index series
    ax = fig.add_subplot(5, 1, i+1)
    ax.plot(pctchange_RA[col])
    ax.plot(N50_RA)
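For a version that runs on its own, here is a sketch with random stand-in data. The data, the date range, and the use of a Series for N50_RA are all assumptions made for illustration. It uses plt.subplots to create all five axes at once instead of calling add_subplot repeatedly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

cols = ['HUL', 'IRCON', 'JUBLFOOD', 'PVR', 'VOLTAS']

# Made-up stand-ins for the rolling-average frames in the question.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=60, freq="D")
pctchange_RA = pd.DataFrame(rng.normal(size=(60, 5)), index=dates, columns=cols)
N50_RA = pd.Series(rng.normal(size=60), index=dates, name="N50")

# One row of axes per stock, all sharing the same x axis.
fig, axes = plt.subplots(len(cols), 1, sharex=True, figsize=(8, 10))
for ax, col in zip(axes, cols):
    ax.plot(pctchange_RA[col], label=col)
    ax.plot(N50_RA, label="N50")
    ax.legend(loc="upper right")
fig.tight_layout()
```

Sharing the x axis keeps the dates aligned across panels, which makes it easier to compare each stock against the index.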

How to select multiple rows and take the mean value based on the name of the row

From this data frame I would like to select rows with the same concentration and almost the same name. For example, the first three rows have the same concentration and the same name except for the suffix at the end of the name: Dig_I, Dig_II, Dig_III. I would like to select these three rows and take the mean value of each column. After that I want to create a new data frame.
Here is the whole data frame:
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new_df = df.groupby('concentration').mean(numeric_only=True)
Note: This only finds the averages for columns with dtype float or int, so the img_name column is dropped and the averages of all remaining columns are taken. Recent pandas versions raise a TypeError on text columns unless numeric_only=True is passed.
The same thing can be written more compactly as one chained expression:
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js").groupby('concentration').mean(numeric_only=True)
If you would like to preserve the img_name...
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new = df.groupby('concentration').mean(numeric_only=True)
pd.merge(df, new, left_on = 'concentration', right_on = 'concentration', how = 'inner')
Does that help?
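If the rows should be grouped by the shared part of the name as well as by concentration, one possibility is to strip the trailing Dig_I / Dig_II / Dig_III marker first and group on both keys. The sketch below uses a small made-up frame. The name pattern, the value_a column, and the helper column sample are assumptions, since the real file's layout is not shown here.

```python
import pandas as pd

# Made-up sample; the real data would come from the CSV in the question.
df = pd.DataFrame({
    "img_name": ["s1_Dig_I", "s1_Dig_II", "s1_Dig_III", "s2_Dig_I", "s2_Dig_II"],
    "concentration": [10, 10, 10, 20, 20],
    "value_a": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# Remove the replicate suffix (_Dig_I, _Dig_II, _Dig_III) to get a shared stem.
df["sample"] = df["img_name"].str.replace(r"_Dig_I+$", "", regex=True)

# Average the numeric columns within each (stem, concentration) group.
means = df.groupby(["sample", "concentration"], as_index=False).mean(numeric_only=True)
print(means)
```

Grouping on both keys keeps replicates of different samples apart even when they happen to share a concentration.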

How to replace outliers with NaN while keeping row intact using pandas in python?

I am working with a very large file and need to eliminate different outliers for each column.
I have been able to find outliers and replace them with NaN; however, it is turning the whole row into NaN. I'm sure that I'm missing something simple but I can't seem to find it.
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 100000)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 1000)
df = pd.read_excel('example sheet.xlsx')
df = df.replace(df.loc[df['column 2']<=0] ,np.nan)
print(df)
How can I convert only the one value into NaN and not the whole row?
Thanks
To change only certain cells to NaN, you should change the values of each Series (column).
Instead of calling replace on the whole dataframe, work on one series at a time.
The wrong way:
df = df.replace(df.loc[df['column 2']<=0] ,np.nan)
One right way:
for col in df.columns:
    s = df[col]
    outlier_s = s<=0
    df[col] = s.where(~outlier_s,np.nan)
The where function replaces values where the condition is False.
http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html?highlight=where#pandas.DataFrame.where
You can do something like the following:
df.mask(df <= 0, np.nan, axis=1)
No need to iterate over columns.
However, I would suggest using proper statistics to define the outliers, instead of <= 0.
You can use quantiles (note the element-wise | operator; Python's or raises a ValueError on DataFrames):
df.mask(((df < df.quantile(0.05)) | (df > df.quantile(0.95))), np.nan, axis=1)
Use np.where for replacing the value based on condition.
# if you have to perform only for single column
df['column 2'] = np.where(df['column 2']<=0, np.nan, df['column 2'])
# if you want to apply on all/multiple columns.
for col in df.columns:
    df[col] = np.where(df[col]<=0, np.nan, df[col])
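To see that masking works cell by cell rather than row by row, here is a small self-contained sketch of the quantile approach. The numbers are made up, and the 5%/95% cut-offs are just the ones suggested above, not a recommendation.

```python
import numpy as np
import pandas as pd

# Made-up data with one obviously low and one obviously high value per column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "b": [10.0, -50.0, 13.0, 12.0, 11.0],
})

# Per-column thresholds: each is a Series indexed by column name.
low = df.quantile(0.05)
high = df.quantile(0.95)

# Boolean frame marking outlier cells; | combines the conditions element-wise.
outliers = df.lt(low, axis=1) | df.gt(high, axis=1)

# mask() replaces only the flagged cells with NaN and leaves the rest intact.
cleaned = df.mask(outliers)
print(cleaned)
```

Because the thresholds are computed per column, each column gets its own outlier definition, which matches the requirement of eliminating different outliers for each column.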

Change index column text in pandas

I have the following spreadsheet that I am bringing in to pandas:
Excel Spreadsheet
I import it with:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
Jupyter shows it like this:
Panda Dataframe 1
I then transpose the dataframe with
df = df.T
Which results in this
Transposed DataFrame
At this stage, how can I change the text in the leftmost index column? I want to change the word Day to the word Service, but I am not sure how to address that cell/header. I can't refer to column 0 and change the header for that.
Likewise, how could I then go on to change the A, B, C, D text, which is now the index column?
You could first assign to the columns attribute, and then apply the transposition.
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df.columns = ['Service','AA', 'BB', 'CC', 'DD']
df = df.T
Renaming the columns before transposing would work. To do exactly what you want, you can use the rename function. Its documentation also has a helpful example of how to rename the index.
Your example in full:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service'}
df = df.rename(index = dict_rename)  # rename returns a new DataFrame, so assign the result
To extend this to more index values, you merely need to adjust the dict_rename argument before renaming.
Full sample:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service','A':'AA','B':'BB','C':'CC','D':'DD'}
df = df.rename(index = dict_rename)  # rename returns a new DataFrame, so assign the result
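Since the spreadsheet itself is not available here, the following sketch uses a tiny made-up frame in place of sessions.xlsx to show the index rename after transposing. The column names and values are assumptions.

```python
import pandas as pd

# Made-up stand-in for the spreadsheet contents.
df = pd.DataFrame({"Day": ["Mon", "Tue"], "A": [1, 2], "B": [3, 4]})

# After transposing, the old column labels become the index labels.
t = df.T

# rename returns a new DataFrame; labels missing from the mapping are left as they are.
t = t.rename(index={"Day": "Service", "A": "AA", "B": "BB"})
print(t)
```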
