Creating new sheet overwrites existing sheet created via openpyxl - python-3.x

I'm trying to create a bar chart directly in Excel from a pandas DataFrame. In the same output workbook, I'd like to save the original CSV data used for the bar chart in a separate sheet. My code:
wb = openpyxl.Workbook()
ws = wb.active
for row in dataframe_to_rows(new_df, index=False, header=False):
    ws.append(row)
chart = BarChart()
values = Reference(ws, min_col=1, min_row=1, max_col=2, max_row=ws.max_row)
labels = Reference(ws, min_col=1, min_row=1, max_col=1, max_row=ws.max_row)
chart.add_data(values)
chart.set_categories(labels)
ws.add_chart(chart, "E2")
wb.save("~/barChart.xlsx")
writer = pd.ExcelWriter("~/barChart.xlsx", engine='openpyxl')
df.to_excel(writer, sheet_name="Source_data")
writer.save()
The problem is that the last three lines overwrite the workbook containing the bar chart. How do I overcome this?

From the pandas documentation:
ExcelWriter can also be used to append to an existing Excel file:
with pd.ExcelWriter('output.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet_name_3')
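A minimal sketch of how the documented append mode could be applied to the workflow in the question, assuming a reasonably recent pandas with openpyxl installed. The sample data, the function name `write_chart_then_source`, and the temporary output path are illustrative and not from the original post. Python does not expand `~` in file paths on its own, so the sketch uses an explicit path. Append mode rewrites the file through openpyxl, so it is worth confirming that the chart is still present in the result.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.chart import BarChart, Reference
from openpyxl.utils.dataframe import dataframe_to_rows


def write_chart_then_source(path, chart_df, source_df):
    """Save a chart sheet with openpyxl, then append the source data with pandas."""
    wb = Workbook()
    ws = wb.active
    for row in dataframe_to_rows(chart_df, index=False, header=False):
        ws.append(row)

    chart = BarChart()
    # Column 1 holds the category labels, column 2 the numeric values.
    values = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row)
    labels = Reference(ws, min_col=1, min_row=1, max_row=ws.max_row)
    chart.add_data(values)
    chart.set_categories(labels)
    ws.add_chart(chart, "E2")
    wb.save(path)

    # mode="a" opens the existing file instead of replacing it.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        source_df.to_excel(writer, sheet_name="Source_data", index=False)


# Illustrative data and path (assumptions, not from the question).
demo_df = pd.DataFrame({"label": ["a", "b", "c"], "value": [3, 5, 2]})
demo_path = os.path.join(tempfile.mkdtemp(), "barChart.xlsx")
write_chart_then_source(demo_path, demo_df, demo_df)
sheet_names = load_workbook(demo_path).sheetnames
```

After the call, `sheet_names` should list both the chart sheet and `Source_data`, which shows that the second write added a sheet rather than replacing the file.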

Related

When writing values to Excel in Pandas, formulas based on those cells are not recalculated

I am using the code below to read data, split it, and write it back. All steps run correctly, but when I open the workbook in Excel, the formulas that depend on the cells written by df.to_excel() do not pick up the new values until I select the written cell, enter edit mode, and press Enter.
import pandas as pd
df1 = pd.read_excel(io=IP+'.xlsx',sheet_name='VLAN', header=None)
df2 = pd.read_excel(io=IP+'.xlsx',sheet_name='Profile', header=None)
# Code to separate data
df1 = df1[0].str.split(' |:', expand=True)
df2 = df2[0].str.split(' |:', expand=True)
# Save and close the workbook
with pd.ExcelWriter(path=IP+'.xlsx', engine='openpyxl', if_sheet_exists='replace', mode='a') as write:
    df1.to_excel(write, sheet_name='VLAN', header=None, index=False)
    df2.to_excel(write, sheet_name='Profile', header=None, index=False)

how to read xlsx as pandas dataframe with formulas as strings

I have an Excel file with some calculated columns.
For example, I have some data in column 'a', and column 'b' is calculated from the values in column 'a'.
I need to append new data to column 'a', calculate column 'b', and save the file.
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':["=a2","=a3","=a4"]})
df.to_excel('test.xlsx',index=False)
When I try to read the file with pandas read_excel, column 'b' is read as NaN.
df = pd.read_excel(r'test.xlsx')
How do I achieve this? Perhaps I could read the formulas as strings and append the new formulas as strings, so that Excel does the calculations when I open the file?
Use openpyxl to load the Excel worksheet instead of reading it directly with pandas:
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename='test.xlsx')
# wb.sheetnames replaces the deprecated get_sheet_names()
sheet_name = wb.sheetnames[0]
ws = wb[sheet_name]
# Formula cells come back as their formula text (e.g. '=A2')
df = pd.DataFrame(ws.values)
# Alternatively, write the formulas with XlsxWriter through pandas
import pandas as pd
import xlsxwriter
name = '123.xlsx'
writer = pd.ExcelWriter(name, engine='xlsxwriter')
pd.DataFrame({}).to_excel(writer, sheet_name='Sheet1')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.write('A1', 1)
worksheet.write('A2', '=A1')
# close() saves the file; writer.save() was removed in pandas 2.0
writer.close()
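The openpyxl approach can be sketched end to end as follows. This is only an illustration under stated assumptions: the helper name `read_sheet_with_formulas`, the temporary file, and the choice to treat the first row as the header are not from the original answers. It relies on `load_workbook` returning formula text when `data_only` is left at its default of `False`.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def read_sheet_with_formulas(path):
    """Load the first sheet into a DataFrame, keeping formulas as text."""
    # data_only=False (the default) returns the formula text, not a cached result.
    wb = load_workbook(filename=path, data_only=False)
    ws = wb[wb.sheetnames[0]]
    rows = list(ws.values)
    # Treat the first row as the header and the rest as data.
    return pd.DataFrame(rows[1:], columns=rows[0])


# Build a small test file with openpyxl (file name and contents are illustrative).
formula_path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["a", "b"])
for i, value in enumerate([1, 2, 3], start=2):
    ws.append([value, f"=A{i}"])
wb.save(formula_path)

formula_df = read_sheet_with_formulas(formula_path)
```

Column `b` of `formula_df` then holds the formula strings, so new rows with new formula strings can be appended and written back, leaving the calculation to Excel when the file is opened.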

Using xlsxwriter (or other packages) to create Excel tabs with specific naming, and write dataframe to the corresponding tab

I am trying to query based on different criteria, and then create individual tabs in Excel to store the query results.
For example, I want to query all the results that match criterion A and write them to an Excel tab named "A". The query result is stored as a pandas DataFrame.
My problem is that when I perform 4 different queries based on criteria "A", "B", "C", "D", the final Excel file contains only one tab, which corresponds to the last criterion in the list. It seems that all the previous tabs are overwritten.
Here is sample code where I replace the SQL query part with a pre-set dataframe and the tab name is set to 0, 1, 2, 3 ... instead of the default Sheet1, Sheet2... in Excel.
import pandas as pd
import xlsxwriter
import datetime
def GCF_Refresh(fileCreatePath, inputName):
    currentDT = str(datetime.datetime.now())
    currentDT = currentDT[0:10]
    loadExcelName = currentDT + '_' + inputName + '_Load_File'
    fileCreatePath = fileCreatePath + '\\' + loadExcelName + '.xlsx'
    wb = xlsxwriter.Workbook(fileCreatePath)
    data = [['tom'], ['nick'], ['juli']]
    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name'])
    writer = pd.ExcelWriter(fileCreatePath, engine='xlsxwriter')
    for iCount in range(5):
        #worksheet = writer.sheets[str(iCount)]
        #worksheet.write(0, 0, 'Name')
        df['Name'].to_excel(fileCreatePath, sheet_name=str(iCount), startcol=0, startrow=1, header=None, index=False)
        writer.save()
        writer.close()
# Change the file path here to store on your local computer
GCF_Refresh("H:\\", "Bulk_Load")
My goal for this sample code is to have 5 tabs named, 0, 1, 2, 3, 4 and each tab has 'tom', 'nick' and 'juli' printed to it. Right now, I just have one tab (named 4), which is the last tab among all the tabs I expected.
There are a number of errors in the code:
The xlsx file is created using XlsxWriter directly and then overwritten when it is created again through pandas.
The to_excel() method takes a reference to the writer object, not the file path.
save() and close() do the same thing and shouldn't be in the loop.
Here is a simplified version of your code with these issues fixed:
import pandas as pd
import xlsxwriter
fileCreatePath = 'test.xlsx'
data = [['tom'], ['nick'], ['juli']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name'])
writer = pd.ExcelWriter(fileCreatePath, engine='xlsxwriter')
for iCount in range(5):
    df['Name'].to_excel(writer,
                        sheet_name=str(iCount),
                        startcol=0,
                        startrow=1,
                        header=None,
                        index=False)
# Save once, after all sheets are written (writer.save() on pandas before 2.0)
writer.close()
See Working with Python Pandas and XlsxWriter in the XlsxWriter docs for some details about getting Pandas and XlsxWriter working together.
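The same idea can be sketched for the original goal of one tab per criterion. This is a hedged illustration, not the answerer's code: the helper name `write_tabs`, the placeholder DataFrames standing in for the query results, and the temporary output path are assumptions. It leaves the engine at the pandas default and uses the writer as a context manager so the file is saved exactly once.

```python
import os
import tempfile

import pandas as pd


def write_tabs(path, frames):
    """Write each DataFrame in `frames` to its own sheet, named after its key."""
    # One writer for the whole file; it is saved once when the block exits.
    with pd.ExcelWriter(path) as writer:
        for tab_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=str(tab_name), index=False)


# Stand-ins for the four query results (illustrative data).
results = {
    name: pd.DataFrame({"Name": ["tom", "nick", "juli"]})
    for name in ["A", "B", "C", "D"]
}
tabs_path = os.path.join(tempfile.mkdtemp(), "load_file.xlsx")
write_tabs(tabs_path, results)

with pd.ExcelFile(tabs_path) as xls:
    written_tabs = xls.sheet_names
```

Keying the results by tab name keeps the query loop and the writing loop separate, so every sheet goes through the same writer object.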

Delete null values on multiple worksheets and export to excel

I am trying to write code that deletes rows with null values in specific columns on multiple Excel sheets and exports the file. Any help is appreciated!
Code below:
import pandas as pd
fileName = 'data.xls'
df = pd.ExcelFile(fileName)
arrayOf_SheetNames = df.sheet_names
for sheetName in arrayOf_SheetNames:
    masterdf = pd.read_excel(fileName, sheet_name=sheetName, header=4)
    masterdf = masterdf.dropna(subset=['Column 1', 'Column 2'], inplace=True)
masterdf.to_excel('file_path.xls')
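One problem is that you are redefining masterdf for every sheet in the for loop, so only the last sheet is kept. Note also that dropna(..., inplace=True) returns None, so assigning its result back to masterdf discards the DataFrame. Another problem is that you aren't writing the sheets through an ExcelWriter and saving it at the end (writer.save(), or writer.close() on current pandas).
dfs = pd.read_excel('/tmp/Untitled spreadsheet-2.xlsx', sheet_name=None, header=4)
writer = pd.ExcelWriter('/tmp/out.xlsx')
for sheetname, df in dfs.items():
    df.dropna(subset=['Column 1', 'Column 2'], inplace=True)
    df.to_excel(writer, sheet_name=sheetname, index=False)
# close() saves the file (writer.save() on pandas before 2.0)
writer.close()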
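A self-contained sketch of the same pattern, useful for checking the behavior on a small file first. The helper name `drop_nulls_in_all_sheets`, the sample data, and the temporary paths are assumptions for illustration. Unlike the answer above, it reads with the default header row rather than header=4 and uses a non-inplace dropna.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def drop_nulls_in_all_sheets(src, dst, columns):
    """Copy every sheet from src to dst, dropping rows with nulls in `columns`."""
    # sheet_name=None returns a dict mapping sheet name -> DataFrame.
    sheets = pd.read_excel(src, sheet_name=None)
    with pd.ExcelWriter(dst) as writer:
        for name, frame in sheets.items():
            cleaned = frame.dropna(subset=columns)
            cleaned.to_excel(writer, sheet_name=name, index=False)


tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "in.xlsx")
dst_path = os.path.join(tmp_dir, "out.xlsx")

# Two sheets with the same illustrative data; only the first row is complete.
sample = pd.DataFrame({"Column 1": [1.0, np.nan, 3.0], "Column 2": ["x", "y", None]})
with pd.ExcelWriter(src_path) as writer:
    sample.to_excel(writer, sheet_name="first", index=False)
    sample.to_excel(writer, sheet_name="second", index=False)

drop_nulls_in_all_sheets(src_path, dst_path, ["Column 1", "Column 2"])
cleaned_sheets = pd.read_excel(dst_path, sheet_name=None)
```

Each sheet in `cleaned_sheets` keeps only the rows where both listed columns are populated, and the sheet names are carried over unchanged.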

Getting AttributeError 'Workbook' object has no attribute 'add_worksheet' - while writing data frame to excel sheet

I have the following code, and I am trying to write a data frame into an "existing" worksheet of an Excel file (referred here as test.xlsx). Sheet3 is the targeted sheet where I want to place the data, and I don't want to replace the entire sheet with a new one.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets) # *I am not sure what is happening in this line*
df.to_excel(writer,"Sheet3",startcol=0, startrow=20)
When I run the code line by line, I get this error on the last line:
AttributeError: 'Workbook' object has no attribute 'add_worksheet'. Why am I seeing this error when I am not trying to add a worksheet?
Note: I am aware of the similar question "Python How to use ExcelWriter to write into an existing worksheet", but its solution is not working for me and I can't comment on that post either.
You can use the append_df_to_excel() helper function, which is defined in this answer:
Usage:
append_df_to_excel('test.xlsx', df, sheet_name="Sheet3", startcol=0, startrow=20)
Some details:
**to_excel_kwargs is used to pass additional named parameters to df.to_excel(), as I did in the example above. The parameter startcol is unknown to append_df_to_excel(), so it is treated as part of the **to_excel_kwargs dictionary.
writer.sheets = {ws.title:ws for ws in writer.book.worksheets} is used to copy the existing sheets to the writer's openpyxl object. I can't explain why this isn't done automatically by writer = pd.ExcelWriter(filename, engine='openpyxl'); you would have to ask the authors of the module about that.
You can use openpyxl as the engine when you are creating an instance of pd.ExcelWriter.
import pandas as pd
import openpyxl
df1 = pd.DataFrame({'A':[1, 2, -3],'B':[1,2,6]})
book = openpyxl.load_workbook('examples/ex1.xlsx')  # Already existing workbook
writer = pd.ExcelWriter('examples/ex1.xlsx', engine='openpyxl')  # Using openpyxl
# Migrating the already existing worksheets to the writer
# (this assignment pattern targets older pandas releases; recent versions
# do not allow setting writer.book and removed writer.save())
writer.book = book
writer.sheets = {x.title: x for x in book.worksheets}
df1.to_excel(writer, sheet_name='sheet4')
writer.save()
Hope this works for you.
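On current pandas, writing into an existing sheet without replacing it can be sketched with append mode instead. This is an illustration under assumptions: it requires pandas 1.4 or later for if_sheet_exists="overlay", and the temporary file and its "existing" cell are made up for the demonstration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Create a workbook that already has data in "Sheet3" (illustrative content).
overlay_path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet3"
ws["A1"] = "existing"
wb.save(overlay_path)

df = pd.DataFrame({"Data": [10, 20, 30, 20, 15, 30, 45]})

# mode="a" loads the existing workbook; "overlay" writes into the existing
# sheet instead of replacing it or raising an error (requires pandas >= 1.4).
with pd.ExcelWriter(
    overlay_path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    df.to_excel(writer, sheet_name="Sheet3", startcol=0, startrow=20)

sheet3 = load_workbook(overlay_path)["Sheet3"]
```

The original cell in A1 stays in place, and the DataFrame header lands on row 21 because startrow is zero-based.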
openpyxl has support for Pandas dataframes so you're best off using it directly. See http://openpyxl.readthedocs.io/en/latest/pandas.html for more details.
Based on https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
This did work for me (pandas version 1.3.5)
import pandas as pd
df1 = pd.DataFrame({'a':[0,1,2], 'b':[1,2,3],'c':[2,3,4]})
df2 = pd.DataFrame({'aa':[10,11,12], 'bb':[11,12,13],'cc':[12,13,14]})
with pd.ExcelWriter('test.xlsx') as writer:
    for i, df in enumerate([df1, df2]):
        df.to_excel(writer, sheet_name=f'sheet_{i}', index=False)
