Appending a DataFrame to an existing Excel worksheet using openpyxl

I'm trying to create a new spreadsheet and a worksheet containing the column headings from a DataFrame. I then want to append new data to the worksheet on every for-loop iteration. I am likely to have a large amount of data, so I thought it would be necessary to write it out to Excel after every iteration rather than writing the whole DataFrame at the end.
The "Append data to existing worksheet" code in the for loop works correctly on its own (i.e. gives me 3 rows of values) if I write to a spreadsheet that already contains column headings I created within Excel. But when I run the code as you see below, I only end up with the column headings and the values from the last for-loop iteration. I'm obviously missing something simple but can't seem to work it out. Any help would be much appreciated.
import openpyxl as xl
import pandas as pd
import numpy as np
import datetime as dt
fn = '00test101.xlsx'
# Create new workbook
wb = xl.Workbook()
wb.save(fn)
book = xl.load_workbook(fn)
writer = pd.ExcelWriter(fn,engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
# Write DF column names to new worksheet
DF = pd.DataFrame(columns=['A','B','C'])
DF.to_excel(writer, 'ABC', header=True, startrow=0)
writer.save()
for i in range(3):
    a = np.array([1,3,6]) * i
    # Overwrite existing DF and add data
    DF = pd.DataFrame(columns=['A','B','C'])
    DF.loc[dt.datetime.now()] = a
    # Append data to existing worksheet
    book = xl.load_workbook(fn)
    writer = pd.ExcelWriter(fn,engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    DF.to_excel(writer, 'ABC', header=None, startrow=book.active.max_row)
    writer.save()
# Remove unwanted default worksheet
wb = xl.load_workbook(fn)
def_sheet = wb.get_sheet_by_name('Sheet')
wb.remove_sheet(def_sheet)
wb.save(fn)
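For reference: `book.active` here is the leftover default 'Sheet', whose `max_row` stays 1, so every iteration writes to the same row. A minimal sketch that targets the 'ABC' sheet's own `max_row` instead, assuming a recent pandas (≥ 1.4) for `mode='a'` with `if_sheet_exists='overlay'`:

```python
import datetime as dt

import numpy as np
import pandas as pd

fn = '00test101.xlsx'

# Write the headers once; naming the sheet 'ABC' avoids a leftover default 'Sheet'.
pd.DataFrame(columns=['A', 'B', 'C']).to_excel(fn, sheet_name='ABC')

for i in range(3):
    DF = pd.DataFrame(columns=['A', 'B', 'C'])
    DF.loc[dt.datetime.now()] = np.array([1, 3, 6]) * i
    # Re-open in append mode and write below the *target* sheet's last used row.
    with pd.ExcelWriter(fn, engine='openpyxl', mode='a',
                        if_sheet_exists='overlay') as writer:
        start = writer.book['ABC'].max_row
        DF.to_excel(writer, sheet_name='ABC', header=False, startrow=start)
```

Re-opening the file on each iteration is still expensive; if the loop is long, collecting rows and writing once per batch would be cheaper, but the sketch keeps the original write-per-iteration shape.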

Related

Getting "'builtin_function_or_method' object has no attribute 'tolist'" error while reading all worksheets from Excel file

I'm trying to read all Excel files from the current directory, and all worksheets from each file.
When I try the code below with a single worksheet it works fine, but with multiple worksheets it gives the error below:
for row in df.values.tolist(): AttributeError: 'builtin_function_or_method' object has no attribute 'tolist'
Below is my code:
import glob
import os
import re

import pandas as pd

path = os.getcwd()
files = glob.glob(os.path.join(path, "*.xlsx"))
df_email = pd.DataFrame()
for f in files:
    df = pd.read_excel(f, sheet_name=None)
    for row in df.values.tolist():
        for col in row:
            matches = re.findall(regex, str(col))
            if matches:
                df_email = df_email.append([matches[0]], ignore_index=True)
That is because of the sheet_name parameter of the read_excel function.
If you do not specify a sheet_name value, read_excel returns a DataFrame (it reads the first sheet of the Excel file). However, if you do specify a sheet_name value (a list of sheet names, or None), it returns a dictionary of DataFrames (see https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html).
Thus, you have to iterate over df.values() to get each sheet in the Excel file, because the type of df is not DataFrame; it is dict.
Try to print the type of the df, as follows:
path = os.getcwd()
files = glob.glob(os.path.join(path, "*.xlsx"))
df_email = pd.DataFrame()
for f in files:
    df = pd.read_excel(f, sheet_name=None)
    print(type(df))  # You expected a DataFrame, but it is a dictionary.
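Building on that, a sketch of the full loop iterating over the dictionary's items. The workbook built here is a throwaway stand-in for the files on disk, and the regex is a simple made-up email pattern, since neither is given in the question:

```python
import re

import pandas as pd

# Throwaway two-sheet workbook standing in for the .xlsx files on disk.
fn = 'emails_demo.xlsx'
with pd.ExcelWriter(fn) as writer:
    pd.DataFrame({'c': ['contact: a@x.com', 'n/a']}).to_excel(writer, sheet_name='S1')
    pd.DataFrame({'c': ['b@y.org']}).to_excel(writer, sheet_name='S2')

regex = r'[\w.+-]+@[\w-]+\.[\w.-]+'   # made-up email pattern
found = []
sheets = pd.read_excel(fn, sheet_name=None)   # dict: sheet name -> DataFrame
for name, sheet_df in sheets.items():         # iterate the dict's items
    for row in sheet_df.values.tolist():      # .values is an ndarray again here
        for col in row:
            matches = re.findall(regex, str(col))
            if matches:
                found.append(matches[0])
```

Inside the inner loop, `sheet_df` really is a DataFrame, so `.values.tolist()` works as the original single-sheet code did.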

Write all pandas dataframe in workspace to excel

I'm trying to write all the pandas DataFrames currently in the workspace to Excel sheets, following the example from this SO thread, but I'm unable to make it work.
This is my non-working code:
alldfs = {var: eval(var) for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)}
for df in alldfs.values():
    print(df.name)
    fmane = df + ".xlsx"
    writer = pd.ExcelWriter(fmane)
    df.to_excel(writer)
    writer.save()
Any help on how to correct this, so that I can get each DataFrame's name into a variable and the Excel filename being written can match the DataFrame? I'm using Spyder 4, Python 3.8.
Just a small fix will do the job:
alldfs = {var: eval(var) for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)}
for df_name, df in alldfs.items():
    print(df_name)
    fmane = df_name + ".xlsx"
    writer = pd.ExcelWriter(fmane)
    df.to_excel(writer)
    writer.save()
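As an aside, a dictionary comprehension over `globals()` avoids `eval()` entirely. A sketch with made-up DataFrames `df_a` and `df_b` (at module level, as in a Spyder workspace):

```python
import pandas as pd

df_a = pd.DataFrame({'x': [1, 2]})
df_b = pd.DataFrame({'y': [3]})

# Collect every module-level DataFrame, name -> object, without eval().
alldfs = {name: obj for name, obj in globals().items()
          if isinstance(obj, pd.DataFrame)}

for df_name, df in alldfs.items():
    df.to_excel(df_name + '.xlsx')   # filename matches the variable name
```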

I want to copy one excel column data to another excel row data using python

import numpy as np
import pandas as pd
dfs = pd.read_excel('input.xlsx', sheet_name=None,header=None)
tester=dfs['Sheet1'].values.tolist()
keys = list(zip(*tester))[0]
seen = set()
seen_add = seen.add
keysu= [x for x in keys if not (x in seen or seen_add(x))]
values = list(zip(*tester))[1]
a = np.array(values).reshape(int(len(values)/len(keysu)),len(keysu))
list1=[keysu]
for i in a:
    list1.append(list(i))
df=pd.DataFrame(list1)
df.to_excel('output.xlsx',index=False,header=False)
I want to copy one Excel file's column data into another Excel file's rows using Python, and I want the code above to execute and run.
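If I read the code right, it turns a two-column (key, value) sheet whose keys repeat in a fixed cycle into one output row per cycle. A shorter sketch of the same reshape on made-up data:

```python
import pandas as pd

# Stand-in for dfs['Sheet1']: key/value pairs, keys repeating in a fixed order.
raw = pd.DataFrame({0: ['a', 'b', 'c', 'a', 'b', 'c'],
                    1: [1, 2, 3, 4, 5, 6]})

keys = raw[0].drop_duplicates().tolist()         # unique keys, order preserved
rows = raw[1].to_numpy().reshape(-1, len(keys))  # one row per key cycle
out = pd.DataFrame(rows, columns=keys)
out.to_excel('output_demo.xlsx', index=False)
```

This assumes every cycle repeats the keys in the same order, as the original `reshape` does; ragged input would need a groupby/pivot instead.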

Is there a Python module to read from Oracle and split into multiple Excel files, with less memory usage, based on a column?

I am trying to split an Oracle table based on the values in a column (hospital names). The data set is ~3 million rows across 66 columns. I'm trying to write the data for one hospital from 3 different tables into 1 Excel workbook as 3 different sheets.
I have running code which worked for ~700K rows, but the new set is too large and I run into memory problems. I tried to modify my code to hit the database once per hospital name using a for loop, but I get an xlsxwriter error about closing the workbook explicitly.
import cx_Oracle
import getpass
import xlsxwriter
import pandas as pd
path = "C:\HN\1"
p = getpass.getpass()
# Connecting to Oracle
myusername = 'CN138609'
dsn_tns = cx_Oracle.makedsn('oflc1exa03p-vip.centene.com', '1521', service_name='IGX_APP_P')
conn = cx_Oracle.connect(user=myusername, password=p, dsn=dsn_tns)
sql_4 = "select distinct hospital_name from HN_Hosp_Records"
df4 = pd.read_sql(sql_4,conn)
hospital_name = list(df4['HOSPITAL_NAME'])
for x in hospital_name:
    hosp_name = {"hosp": x}
    sql_1 = "select * from HN_Hosp_Records where hospital_name = :hosp"
    sql_2 = "select * from HN_CAP_Claims_Not_In_DHCS where hospital_name = :hosp"
    sql_3 = "select * from HN_Denied_Claims where hospital_name = :hosp"
    df1 = pd.read_sql(sql_1, conn, params=hosp_name)
    df2 = pd.read_sql(sql_2, conn, params=hosp_name)
    df3 = pd.read_sql(sql_3, conn, params=hosp_name)
    df_dhcs = df1.loc[df1['HOSPITAL_NAME'] == x]
    df_dw = df2.loc[df2['HOSPITAL_NAME'] == x]
    df_denied = df3.loc[df3['HOSPITAL_NAME'] == x]
    # Create a new excel workbook
    writer = pd.ExcelWriter(path + x + "_HNT_P2_REC_05062019.xlsx", engine='xlsxwriter')
    # Write each dataframe to a different worksheet.
    df_dhcs.to_excel(writer, sheet_name="DHCS")
    df_dw.to_excel(writer, sheet_name="Not In DHCS")
    df_denied.to_excel(writer, sheet_name="Denied")
    writer.close()
Here is the warning/error I'm getting. The code doesn't stop but no file is being output:
File "C:\ProgramData\Anaconda3\lib\site-packages\xlsxwriter\workbook.py", line 153, in __del__
raise Exception("Exception caught in workbook destructor. "
Exception: Exception caught in workbook destructor. Explicit close() may be required for workbook.
I solved it: interpolating the value with %s string formatting instead of using a bind variable was the trick.
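For the record, a sketch of what that change looks like (the hospital name is made up). Note that plain %s interpolation breaks on names containing quotes and is unsafe with untrusted input, which is why bind variables are normally preferred:

```python
x = "General Hospital"  # hypothetical hospital name from the distinct list

# Interpolate the value directly instead of passing a :hosp bind parameter.
# Only safe because the names come from the database itself, not user input.
sql_1 = "select * from HN_Hosp_Records where hospital_name = '%s'" % x
```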

Read from CSV and store in Excel tabs

I am reading multiple CSVs (via URL) into multiple Pandas DataFrames and want to store the results of each CSV into separate excel worksheets (tabs). When I keep writer.save() inside the for loop, I only get the last result in a single worksheet. And when I move writer.save() outside the for loop, I only get the first result in a single worksheet. Both are wrong.
import requests
import pandas as pd
from pandas import ExcelWriter
work_statements = {
'sheet1': 'URL1',
'sheet2': 'URL2',
'sheet3': 'URL3'
}
for sheet, statement in work_statements.items():
    writer = pd.ExcelWriter('B.xlsx', engine='xlsxwriter')
    r = requests.get(statement)   # go to URL
    df = pd.read_csv(statement)   # read from URL
    df.to_excel(writer, sheet_name=sheet)
    writer.save()
How can I get all three results in three separate worksheets?
You are re-initializing the writer object on each loop iteration. Initialize it once before the for loop and save the document once after the loop. Also, in the read_csv() line, you should be reading the request content, not the URL (i.e., statement) saved in the dictionary:
import io

writer = pd.ExcelWriter('B.xlsx', engine='xlsxwriter')
for sheet, statement in work_statements.items():
    r = requests.get(statement)            # go to URL
    df = pd.read_csv(io.StringIO(r.text))  # parse the response body, not the URL
    df.to_excel(writer, sheet_name=sheet)
writer.save()
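Note that `ExcelWriter.save()` was removed in pandas 2.0; on current pandas the same answer reads more naturally as a context manager, which saves and closes the file on exit. A sketch with the unknown URLs replaced by in-memory CSV text:

```python
import io

import pandas as pd

# Stand-ins for the CSV bodies fetched from URL1..URL3.
work_statements = {
    'sheet1': 'a,b\n1,2\n',
    'sheet2': 'a,b\n3,4\n',
    'sheet3': 'a,b\n5,6\n',
}

# One writer for the whole workbook; the `with` block saves on exit.
with pd.ExcelWriter('B_demo.xlsx', engine='openpyxl') as writer:
    for sheet, csv_text in work_statements.items():
        df = pd.read_csv(io.StringIO(csv_text))
        df.to_excel(writer, sheet_name=sheet, index=False)
```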
