Using xlsxwriter (or other packages) to create Excel tabs with specific naming, and write dataframe to the corresponding tab - excel

I am trying to query based on different criteria, and then create individual tabs in Excel to store the query results.
For example, I want to query all the results that match criteria A, and write the result to an Excel tab named "A". The query result is stored in the panda data frame format.
My problem is, when I want to perform 4 different queries based on criteria "A", "B", "C", "D", the final Excel file only contains one tab, which corresponds to the last criteria in the list. It seems that all the previous tabs are over-written.
Here is sample code where I replace the SQL query part with a pre-set dataframe and the tab name is set to 0, 1, 2, 3 ... instead of the default Sheet1, Sheet2... in Excel.
import pandas as pd
import xlsxwriter
import datetime
def GCF_Refresh(fileCreatePath, inputName):
currentDT = str(datetime.datetime.now())
currentDT = currentDT[0:10]
loadExcelName = currentDT + '_' + inputName + '_Load_File'
fileCreatePath = fileCreatePath +'\\' + loadExcelName+'.xlsx'
wb = xlsxwriter.Workbook(fileCreatePath)
data = [['tom'], ['nick'], ['juli']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name'])
writer = pd.ExcelWriter(fileCreatePath, engine='xlsxwriter')
for iCount in range(5):
#worksheet = writer.sheets[str(iCount)]
#worksheet.write(0, 0, 'Name')
df['Name'].to_excel(fileCreatePath, sheet_name=str(iCount), startcol=0, startrow=1, header=None, index=False)
writer.save()
writer.close()
# Change the file path here to store on your local computer
GCF_Refresh("H:\\", "Bulk_Load")
My goal for this sample code is to have 5 tabs named, 0, 1, 2, 3, 4 and each tab has 'tom', 'nick' and 'juli' printed to it. Right now, I just have one tab (named 4), which is the last tab among all the tabs I expected.

There are a number of errors in the code:
The xlsx file is created using XlsxWriter directly and then overwritten by creating it Again in Pandas.
The to_excel() method takes a reference to the writer object not the file path.
The save() and close() are the same thing and shouldn't be in the
loop.
Here is a simplified version of your code with these issues fixes:
import pandas as pd
import xlsxwriter
fileCreatePath = 'test.xlsx'
data = [['tom'], ['nick'], ['juli']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name'])
writer = pd.ExcelWriter(fileCreatePath, engine='xlsxwriter')
for iCount in range(5):
df['Name'].to_excel(writer,
sheet_name=str(iCount),
startcol=0,
startrow=1,
header=None,
index=False)
writer.save()
Output:
See Working with Python Pandas and XlsxWriter in the XlsxWriter docs for some details about getting Pandas and XlsxWriter working together.

Related

Append values to Dataframe in loop and if conditions

Need help please.
I have a dataframe that reads rows from Excel and appends to Dataframe if certain columns exist.
I need to add an additional Dataframe if the columns don't exist in a sheet and append filename and sheetname and write all the file names and sheet names for those sheets to an excel file. Also I want the values to be unique.
I tried adding to dfErrorList but it only showed the last sheetname and filename and repeated itself many times in the output excel file
from xlsxwriter import Workbook
import pandas as pd
import openpyxl
import glob
import os
path = 'filestoimport/*.xlsx'
list_of_dfs = []
list_of_dferror = []
dfErrorList = pd.DataFrame() #create empty df
for filepath in glob.glob(path):
xl = pd.ExcelFile(filepath)
# Define an empty list to store individual DataFrames
for sheet_name in xl.sheet_names:
df = pd.read_excel(filepath, sheet_name=sheet_name)
df['sheetname'] = sheet_name
file_name = os.path.basename(filepath)
df['sourcefilename'] = file_name
if "Project ID" in df.columns and "Status" in df.columns:
print('')
*else:
dfErrorList['sheetname'] = df['sheetname'] # adds `sheet_name` into the column
dfErrorList['sourcefilename'] = df['sourcefilename']
continue
list_of_dferror.append((dfErrorList))
df['Status'].fillna('', inplace=True)
df['Added by'].fillna('', inplace=True)
list_of_dfs.append(df)
# # Combine all DataFrames into one
data = pd.concat(list_of_dfs, ignore_index=True)
dataErrors = pd.concat(list_of_dferror, ignore_index=True)
dataErrors.to_excel(r'error.xlsx', index=False)
# data.to_excel("total_countries.xlsx", index=None)

When writing values to Excel in Pandas, formulas based on those cells are not recalculated

I am using below code for reading data, spliting it then writing it again. All actions are done correctly, but when I check in excel workbook, the values that correspond by excel function on data wrote by df.to_excel(), I found that function didn't recognize it, until I go to cell and open it then press enter.
import pandas as pd
df1 = pd.read_excel(io=IP+'.xlsx',sheet_name='VLAN', header=None)
df2 = pd.read_excel(io=IP+'.xlsx',sheet_name='Profile', header=None)
# Code to separate data
df1 = df1[0].str.split(' |:', expand=True)
df2 = df2[0].str.split(' |:', expand=True)
# Save and close the workbook
with pd.ExcelWriter(path=IP+'.xlsx',engine='openpyxl',if_sheet_exists='replace',mode='a') as write:
df1.to_excel(write , sheet_name='VLAN', header=None, index=False)
df2.to_excel(write , sheet_name='Profile', header=None, index=False)

how to read xlsx as pandas dataframe with formulas as strings

I have a excel file with some calculated columns.
for example, I have some data in columns 'a' and column 'b' is calculated using values in column 'a'.
i need to append new data to column 'a' and calculate column 'b' and save the file.
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':["=a2","=a3","=a4"]})
df.to_excel('test.xlsx',index=False)
when i try to read the file using pandas read excel it reads the column 'b' as NaN.
df = pd.read_excel(r'test.xlsx')
how do i achieve this. may be if i can read the file as string and append the formulas as string. when i open the file in excel the excel will do the calculations?
Use OpenPyXL to load the excel worksheet instead of directly with pandas
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename = 'test.xlsx')
sheet_name = wb.get_sheet_names()[0]
ws = wb[sheet_name]
df = pd.DataFrame(ws.values)
import pandas as pd
import xlsxwriter
name = '123.xlsx'
writer = pd.ExcelWriter(name,engine='xlsxwriter')
pd.DataFrame({}).to_excel(writer,sheet_name='Sheet1')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.write('A1',1)
worksheet.write('A2','=A1')
writer.save()

Appending Columns from several worksheets Python

I am trying to import certain columns of data from several different sheets inside of a workbook. However, while appending it only seems to append 'q2 survey' to a new workbook. How do I get this to append properly?
import sys, os
import pandas as pd
import xlrd
import xlwt
b = ['q1 survey', 'q2 survey','q3 survey'] #Sheet Names
df_t = pd.DataFrame(columns=["Month","Date", "Year"]) #column Name
xls = "path_to_file/R.xls"
sheet=[]
df_b=pd.DataFrame()
pd.read_excel(xls,sheet)
for sheet in b:
df=pd.read_excel(xls,sheet)
df.rename(columns=lambda x: x.strip().upper(), inplace=True)
bill=df_b.append(df[df_t])
bill.to_excel('Survey.xlsx', index=False)
I think if you do:
b = ['q1 survey', 'q2 survey','q3 survey'] #Sheet Names
list_col = ["Month","Date", "Year"] #column Name
xls = "path_to_file/R.xls"
#create the empty df named bill to append after
bill= pd.DataFrame(columns = list_col)
for sheet in b:
# read the sheet
df=pd.read_excel(xls,sheet)
df.rename(columns=lambda x: x.strip().upper(), inplace=True)
# need to assign bill again
bill=bill.append(df[list_col])
# to excel
bill.to_excel('Survey.xlsx', index=False)
it should work and correct the errors in your code, but you can do a bit differently using pd.concat:
list_sheet = ['q1 survey', 'q2 survey','q3 survey'] #Sheet Names
list_col = ["Month","Date", "Year"] #column Name
# read once the xls file and then access the sheet in the loop, should be faster
xls_file = pd.ExcelFile("path_to_file/R.xls")
#create a list to append the df
list_df_to_concat = []
for sheet in list_sheet :
# read the sheet
df= pd.read_excel(xls_file, sheet)
df.rename(columns=lambda x: x.strip().upper(), inplace=True)
# append the df to the list
list_df_to_concat.append(df[list_col])
# to excel
pd.concat(list_df_to_concat).to_excel('Survey.xlsx', index=False)

Getting AttributeError 'Workbook' object has no attribute 'add_worksheet' - while writing data frame to excel sheet

I have the following code, and I am trying to write a data frame into an "existing" worksheet of an Excel file (referred here as test.xlsx). Sheet3 is the targeted sheet where I want to place the data, and I don't want to replace the entire sheet with a new one.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets) # *I am not sure what is happening in this line*
df.to_excel(writer,"Sheet3",startcol=0, startrow=20)
When I am running the code line by line, I am getting this error for the last line:
AttributeError: 'Workbook' object has no attribute 'add_worksheet'. Now why am I seeing this error when I am not trying to add worksheet ?
Note: I am aware of this similar issue Python How to use ExcelWriter to write into an existing worksheet but its not working for me and I can't comment on that post either.
You can use the append_df_to_excel() helper function, which is defined in this answer:
Usage:
append_df_to_excel('test.xlsx', df, sheet_name="Sheet3", startcol=0, startrow=20)
Some details:
**to_excel_kwargs - used in order to pass additional named parameters to df.to_excel() like i did in the example above - parameter startcol is unknown to append_df_to_excel() so it will be treated as a part of **to_excel_kwargs parameter (dictionary).
writer.sheets = {ws.title:ws for ws in writer.book.worksheets} is used in order to copy existing sheets to writer openpyxl object. I can't explain why it's not done automatically when reading writer = pd.ExcelWriter(filename, engine='openpyxl') - you should ask authors of openpyxl module about that...
You can use openpyxl as the engine when you are creating an instance of pd.ExcelWriter.
import pandas as pd
import openpyxl
df1 = pd.DataFrame({'A':[1, 2, -3],'B':[1,2,6]})
book = openpyxl.load_workbook('examples/ex1.xlsx') #Already existing workbook
writer = pd.ExcelWriter('examples/ex1.xlsx', engine='openpyxl') #Using openpyxl
#Migrating the already existing worksheets to writer
writer.book = book
writer.sheets = {x.title: x for x in book.worksheets}
df1.to_excel(writer, sheet_name='sheet4')
writer.save()
Hope this works for you.
openpyxl has support for Pandas dataframes so you're best off using it directly. See http://openpyxl.readthedocs.io/en/latest/pandas.html for more details.
Based on https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
This did work for me (pandas version 1.3.5)
import pandas as pd
df1 = pd.DataFrame({'a':[0,1,2], 'b':[1,2,3],'c':[2,3,4]})
df2 = pd.DataFrame({'aa':[10,11,12], 'bb':[11,12,13],'cc':[12,13,14]})
with pd.ExcelWriter('test.xlsx') as writer:
for i, df in enumerate([df1, df2]):
df.to_excel(writer,sheet_name=f'sheet_{i}', index=False)

Resources