Change index column text in pandas - excel

I have the following spreadsheet that I am bringing in to pandas:
Excel Spreadsheet
I import it with:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
Jupyter shows it like this:
Panda Dataframe 1
I then transpose the dataframe with
df = df.T
Which results in this
Transposed DataFrame
At this stage, how can I change the text in the leftmost index column? I want to change the word Day to the word Service, but I am not sure how to address that cell/header. I can't refer to column 0 and change its header.
Likewise, how could I then go on to change the A, B, C, D text, which is now in the index column?

You could first assign to the columns attribute, and then apply the transposition.
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df.columns = ['Service','AA', 'BB', 'CC', 'DD']
df = df.T

Renaming the columns before transposing would work. To do exactly what you want, you can use the rename function. The documentation also has a helpful example of how to rename the index.
Your example in full:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service'}
df = df.rename(index=dict_rename)  # rename returns a new DataFrame, so assign the result
To extend this to more index values, you merely need to adjust the dict_rename argument before renaming.
Full sample:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service','A':'AA','B':'BB','C':'CC','D':'DD'}
df = df.rename(index=dict_rename)  # rename returns a new DataFrame, so assign the result
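For anyone without the spreadsheet at hand, here is a minimal sketch of the same rename on a small hand-built frame. The data and the variable names transposed and renamed are assumptions made for illustration; only the Day/A/B labels come from the question.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("sessions.xlsx")
df = pd.DataFrame({
    "Day": ["Mon", "Tue"],
    "A": [1, 2],
    "B": [3, 4],
})

# After transposing, the old column labels become the index labels
transposed = df.T

# rename returns a new DataFrame; labels missing from the dict are left unchanged
renamed = transposed.rename(index={"Day": "Service", "A": "AA", "B": "BB"})

print(list(transposed.index))
print(list(renamed.index))
```

Because the mapping is applied label by label, the row that used to be called Day keeps its values and only its label changes.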

Related

I'm using pandas to delete first 9 rows but it is still keeping the first row

Below is my code -
import pandas as pd
import datetime
df1 = pd.read_excel(str(sys_folder) + "Italy_SS304.xlsx")
df1.drop(df1.index[0:9], axis=0, inplace=True)
df1.drop(df1.columns[1:3], axis=1, inplace=True)
df1
attached image is my database from excel
When reading the Excel file, pandas takes the first row as the header unless you explicitly tell it not to. Pass header=None so that the first row is treated as data rather than as the header.
If you write df1 back to Excel with something like df1.to_excel(), pass header=False there as well, plus index=False if you don't want the index written.
Change the read_excel call as shown below.
df1 = pd.read_excel(str(sys_folder) + "Italy_SS304.xlsx", header=None)
...and if you are writing back to Excel, use a line like this (after the drop commands)
df1.to_excel('NEWFILE.xlsx', header=False, index=False)
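The effect of header=None can be checked without the original workbook. The sketch below is only an illustration: it uses read_csv on an in-memory string as a stand-in for read_excel (both accept header=None), and the 12-line sample data is made up.

```python
import io
import pandas as pd

# Hypothetical 12-line input with no header row; read_csv stands in for read_excel here
raw = "\n".join(f"r{i},{i}" for i in range(12))

# Default behaviour: the first line is consumed as the header
with_header = pd.read_csv(io.StringIO(raw))

# header=None: the first line is kept as data and columns are numbered 0, 1, ...
without_header = pd.read_csv(io.StringIO(raw), header=None)

# Dropping the first nine rows now really removes the first nine lines of the input
trimmed = without_header.drop(without_header.index[0:9], axis=0)

print(len(with_header), len(without_header), len(trimmed))
print(trimmed[0].tolist())
```

With the default header handling, the first line never appears in df.index at all, which is why it seemed to survive the drop.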

how to change the method into .csv file

import pandas as pd
f=open("xyz.csv",'w')
df=pd.DataFrame(sq3row)
df.to_csv(f)
I am using the code above to write sqlite3 output rows to a .csv file based on conditions.
But instead of being written row-wise, the output is written across columns. For example, data that should go in row[0] to row[11] is written to col[0] to col[11], and the second output row is written to col[12] to col[24], and so on.
How can I write this row-wise, so that col[1] holds row[0] to row[11],
and the next column, col[2], holds row[0] to row[11]?
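I think you are trying to transpose your df:
df = df.T
You can use the transpose function to convert rows into columns and vice versa. Transpose before writing, because calling transpose() after to_csv() has no effect on the file.
import pandas as pd
df = pd.DataFrame(sq3row)
df = df.transpose()  # swap rows and columns before writing
df.to_csv("xyz.csv")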
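To see what transposing does to the written output, here is a minimal sketch. The rows list is a made-up stand-in for the sqlite3 result (sq3row in the question), and the CSV is returned as text instead of being written to xyz.csv so the two layouts can be compared directly.

```python
import pandas as pd

# Hypothetical stand-in for the sqlite3 result rows
rows = [(1, 2, 3), (4, 5, 6)]

df = pd.DataFrame(rows)

# With no path argument, to_csv returns the CSV text instead of writing a file
as_rows = df.to_csv(index=False, header=False).splitlines()
as_columns = df.T.to_csv(index=False, header=False).splitlines()

print(as_rows)     # each input tuple on its own line
print(as_columns)  # each input tuple running down a column
```

Passing a file name such as "xyz.csv" to to_csv instead writes the same text to disk.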

How to select multiple rows and take the mean value based on the name of the row

From this data frame I would like to select rows with the same concentration and almost the same name. For example, the first three rows have the same concentration and the same name except for the ending: Dig_I, Dig_II, Dig_III. I would like to select these three rows, take the mean value of each column, and then create a new data frame from the result.
here is the whole data frame:
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new_df = df.groupby('concentration').mean(numeric_only=True)
Note: numeric_only=True restricts the averages to columns with a numeric dtype (float or int), so the img_name column is dropped and every remaining column is averaged. In pandas 2.0 and later, calling mean() on text columns without it raises an error.
The same thing as a single chained expression...
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js").groupby('concentration').mean(numeric_only=True)
If you would like to preserve the img_name...
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new = df.groupby('concentration').mean(numeric_only=True)
pd.merge(df, new, left_on = 'concentration', right_on = 'concentration', how = 'inner')
Does that help?
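Grouping on concentration alone ignores the "almost the same name" part of the question. One possible way to take it into account, sketched here under assumptions, is to strip the trailing _I/_II/_III suffix and group on the remaining stem together with the concentration. The sample data, the value column, the sample helper column, and the regular expression are all hypothetical; only img_name, concentration, and the Dig_I style endings come from the question.

```python
import pandas as pd

# Hypothetical data shaped like the description in the question
df = pd.DataFrame({
    "img_name": ["S1_Dig_I", "S1_Dig_II", "S1_Dig_III", "S2_Dig_I", "S2_Dig_II"],
    "concentration": [10, 10, 10, 20, 20],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# Strip a trailing Roman-numeral suffix such as _I, _II, _III to get a shared stem
df["sample"] = df["img_name"].str.replace(r"_[IVX]+$", "", regex=True)

# Average the numeric columns for each (stem, concentration) pair
means = (
    df.groupby(["sample", "concentration"], as_index=False)
      .mean(numeric_only=True)
)

print(means)
```

Using as_index=False keeps sample and concentration as ordinary columns, so means can be used directly as the new data frame.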

How do I convert range of openpyxl cells to pandas dataframe without looping though all cells?

Openpyxl supports converting an entire worksheet of an excel 2010 workbook to a pandas dataframe. I want to select a subset of those cells, using Excel's native indices, and convert that block of cells to a dataframe. Openpyxl's documentation on working with pandas does not help: https://openpyxl.readthedocs.io/en/stable/pandas.html
I am trying to avoid 1) looping through all rows and columns in the data, since that's inefficient, 2) removing the unwanted cells from the dataframe after creation, and 3) pandas' read_excel function, since it does not seem to support specifying the range in Excel's native indices.
#This converts an entire worksheet to a pandas dataframe
import pandas as pd
import openpyxl as px
Work_Book = px.load_workbook(filename='MyBook.xlsx')
Work_Sheet = Work_Book['Sheet1']
df = pd.DataFrame(Work_Sheet.values)
#This produces a tuple of cells. Calling pd.DataFrame on it returns
#"ValueError: DataFrame constructor not properly called!"
Cell_Range = Work_Sheet['B2:D4']
#This is the only way I currently know to convert Cell_Range to a Pandas
# DataFrame. I'm trying to avoid these nested loops.
row_list = []
for row in Cell_Range:
    col_list = []
    for col in row:
        col_list.append(col.value)
    row_list.append(col_list)
df = pd.DataFrame(row_list)
I am trying to find the most efficient way to convert the Cell_Range object above into a pandas dataframe. Thanks!
Work_Sheet.values will give you a generator. Turn it into a list to get a list of tuples, with the first tuple holding the headers.
To convert it into a dataframe, do the following:
df = pd.DataFrame(list(Work_Sheet.values))
df.columns = df.iloc[0,:]
df = df.iloc[1:,].reset_index(drop=True)
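For a sub-range such as B2:D4, one possible approach (a sketch, not something taken from the answer above) is to translate the Excel-style range into row and column bounds with openpyxl's range_boundaries and pass them to iter_rows with values_only=True, which is available in openpyxl 2.6 and later. The in-memory workbook and its values are made up so the example runs without MyBook.xlsx. openpyxl still visits each cell internally, but the explicit nested loops disappear from your own code.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import range_boundaries

# Hypothetical in-memory sheet standing in for MyBook.xlsx: 5 rows x 5 columns,
# where the cell in row r, column c holds r * 10 + c
wb = Workbook()
ws = wb.active
for r in range(1, 6):
    ws.append([r * 10 + c for c in range(1, 6)])

# Convert the Excel-style range into 1-based numeric bounds
min_col, min_row, max_col, max_row = range_boundaries("B2:D4")

# values_only=True yields tuples of cell values instead of Cell objects
rows = ws.iter_rows(min_row=min_row, max_row=max_row,
                    min_col=min_col, max_col=max_col,
                    values_only=True)
df = pd.DataFrame(list(rows))

print(df)
```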

Python - Pandas Dataframe with Multiple Names per Column

Is there a way in pandas to give the same column of a pandas dataframe two names, so that I can index the column by only one of the two names? Here is a quick example illustrating my problem:
import pandas as pd
index=['a','b','c','d']
# The list of tuples here is really just to
# somehow visualize my problem below:
columns = [('A','B'), ('C','D'),('E','F')]
df = pd.DataFrame(index=index, columns=columns)
# I can index like that:
df[('A','B')]
# But I would like to be able to index like this:
df[('A',*)] #error
df[(*,'B')] #error
You can create a multi-index column:
df.columns = pd.MultiIndex.from_tuples(df.columns)
Then you can do:
df.loc[:, ("A", slice(None))]
Or:
df.loc[:, (slice(None), "B")]
Here slice(None) selects every label at that level, so (slice(None), "B") selects the columns whose second level is B regardless of the first-level name. It is semantically the same as :. Alternatively, use pandas' IndexSlice helper, which for the second case is:
df.loc[:, pd.IndexSlice[:, "B"]]
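A small self-contained sketch of these selections follows. The numeric data is made up, and the variable names by_first, by_second, and by_xs are only for illustration. It also shows DataFrame.xs, which was not mentioned above, as another way to pick one level value.

```python
import pandas as pd

# Two-level column labels matching the tuples in the question
columns = pd.MultiIndex.from_tuples([("A", "B"), ("C", "D"), ("E", "F")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["a", "b"], columns=columns)

# Select by the first level only
by_first = df.loc[:, ("A", slice(None))]

# Select by the second level only, using the IndexSlice helper
by_second = df.loc[:, pd.IndexSlice[:, "B"]]

# xs selects on one level and, by default, drops that level from the result
by_xs = df.xs("B", axis=1, level=1)

print(by_first)
print(by_second)
print(by_xs)
```

The loc forms keep both column levels in the result, while xs returns the matching columns labelled only by the remaining level.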
