pandas save and read excel in loop - excel

I have a loop in which I call a function that reads an Excel file, writes to it, and saves it. But at the end only the last result is stored.
As a simple example:
for i in range(3):
    callfunc(i)

callfunc(i):
    pandas open excel
    for j in range(10, 13, 1):
        write (i, j) to excel in new sheet
    save excel
As the final result I only get (3,10) (3,11) (3,12).
It seems that when the Excel file is re-opened in callfunc, the earlier save is lost and the original file is used instead, and I don't get why.
Thank you !

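Let's use a separate sheet_name for each iteration:
import pandas as pd
from openpyxl import load_workbook
path = r"~\Desktop\excelData\data.xlsx"
book = load_workbook(path)
for i in range(3):
    writer = pd.ExcelWriter(path, engine = 'openpyxl')
    # Reuse the loaded workbook so earlier sheets are kept.
    # Assigning writer.book only works on older pandas versions (before 2.0).
    writer.book = book
    df = pd.DataFrame(x3)  # x3 is the data to write; it is not defined in this snippet
    df.to_excel(writer, sheet_name = '{}__sheet'.format(i))
    writer.save()  # removed in pandas 2.0, where writer.close() also saves
    writer.close()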
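On recent pandas versions the same idea can be sketched with the writer's append mode. This is a minimal sketch, assuming pandas 1.3 or later with openpyxl installed; the helper name append_sheet, the temporary file, and the sample (i, j) rows are illustrative and not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd


def append_sheet(path, sheet_name, df):
    """Write df to sheet_name, keeping any sheets already in the workbook."""
    path = Path(path)
    if path.exists():
        # mode="a" loads the existing workbook instead of overwriting it.
        options = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        # Append mode needs an existing file, so the first call creates it.
        options = {"mode": "w"}
    # The context manager saves and closes the file on exit.
    with pd.ExcelWriter(path, engine="openpyxl", **options) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


with tempfile.TemporaryDirectory() as tmp:
    workbook_path = Path(tmp) / "data.xlsx"
    for i in range(3):
        rows = [(i, j) for j in range(10, 13)]
        frame = pd.DataFrame(rows, columns=["i", "j"])
        append_sheet(workbook_path, f"{i}__sheet", frame)

    # Read every sheet back to check that none was overwritten.
    sheets = pd.read_excel(workbook_path, sheet_name=None)
    sheet_names = list(sheets)
    last_rows = sheets["2__sheet"].values.tolist()
```

Each call reopens the file, but because it is opened in append mode the sheets written by earlier calls should still be there afterwards.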

Related

PysimpleGUI Combo source from Excel

Hi all,
I'm using an Excel file as my data source, and in my program I'm using sg.Combo, which contains a long list. I would like to keep this list in one of the sheets of my Excel file. How can I do that?
from pathlib import Path
import PySimpleGUI as sg
import pandas as pd
current_dir = Path(__file__).parent if '__file__' in locals() else Path.cwd()
EXCEL_FILE = current_dir / 'Example.xlsx'
df = pd.read_excel(EXCEL_FILE)
Thanks

How to reformat the resultant excel sheet after combining multiple excel sheets in Pandas Python

I tried combining multiple sheets from multiple Excel files into a single Excel file using pandas, but in the resulting sheet the row labels are the Excel file names and each sheet appears as a column name. The output is messy.
How do I get it in a proper format? Here is the code:
import pandas as pd
import os
from openpyxl.workbook import Workbook
os.chdir("C:/Users/w8/PycharmProjects/decorators_exaample/excel_files")
path = "C:/Users/w8/PycharmProjects/decorators_exaample/excel_files"
files = os.listdir(path)
AllFiles = pd.DataFrame()
for f in files:
    info = pd.read_excel(f, sheet_name=None)
    AllFiles=AllFiles.append(info, ignore_index=True)
writer = pd.ExcelWriter("Final.xlsx")
AllFiles.to_excel(writer)
writer.save()
The final Excel file looks like this: [image not included]
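You don't actually need the whole os and Workbook part. Removing it cleans up your code and makes errors easier to find. I assume that path is the path to the folder where all the Excel files are stored:
import pandas as pd
import glob
# Raw strings keep the backslashes in Windows paths from being read as escape sequences.
path = r"C:\Users\w8\PycharmProjects\decorators_exaample\excel_files"
# Match the .xlsx files inside the folder; glob.glob(path) alone returns only the folder itself.
file_list = glob.glob(path + r"\*.xlsx")
# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate them once.
frames = []
for f in file_list:
    info = pd.read_excel(f)
    frames.append(info)
df = pd.concat(frames)
df.to_excel(path + r"\new_filename.xlsx")
It should be as easy as that.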
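The messy layout in the question may come from sheet_name=None, which makes pd.read_excel return a dictionary of DataFrames (one per sheet) rather than a single DataFrame. A rough sketch of flattening such dictionaries is shown here; the in-memory workbooks dictionary is a stand-in for the real files, and the source_file and source_sheet column names are made up for illustration.

```python
import pandas as pd

# Stand-ins for what pd.read_excel(f, sheet_name=None) returns per file:
# a dict mapping each sheet name to its DataFrame.
workbooks = {
    "a.xlsx": {
        "Sheet1": pd.DataFrame({"x": [1, 2]}),
        "Sheet2": pd.DataFrame({"x": [3]}),
    },
    "b.xlsx": {
        "Sheet1": pd.DataFrame({"x": [4]}),
    },
}

frames = []
for file_name, sheets in workbooks.items():
    for sheet_name, sheet_df in sheets.items():
        # Record where each row came from as ordinary columns
        # instead of letting file or sheet names become labels.
        frames.append(
            sheet_df.assign(source_file=file_name, source_sheet=sheet_name)
        )

# One concat at the end stacks all sheets row-wise with a fresh index.
combined = pd.concat(frames, ignore_index=True)
```

With real files, the inner dictionary would come from pd.read_excel(f, sheet_name=None) for each path in the file list.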

Printing the value of a formula with openpyxl

I have been researching this for the past 2 days, and the most common answer I see is to use data_only=True. However, that does not seem to fix the issue of printing the value of a formula. Here is my script. Does anyone have an answer for this?
import os
from openpyxl import load_workbook
directoryPath = r'c:\users\username\documents\reporting\export\q3'
os.chdir(directoryPath)
folder_list = os.listdir(directoryPath)
for folders, sub_folders, file in os.walk(directoryPath):
    for name in file:
        if name.startswith("BEA"):
            filename = os.path.join(folders, name)
            print(filename)
            # data_only=True returns the value cached when Excel last saved the file, not the formula.
            wb = load_workbook(filename, data_only=True)
            sheet = wb["Sensor Status"]
            for row_cells in sheet.iter_rows(min_row=1, max_row=4, min_col=8, max_col=13):
                for cell in row_cells:
                    print(cell.value)

how to create table into SQLite3 from importing excel data in python?

In my code, I am importing data from an Excel file into an SQLite database using Python.
It doesn't give any error, but it converts every Excel column name into a table.
I have multiple Excel files with the same data structure, each containing 40K rows and 52 columns.
When I import the data from these files into the SQLite database, each column header name becomes a table.
import sqlite3
import pandas as pd
filename= gui_fname()
con=sqlite3.connect("cps.db")
wb = pd.read_excel(filename,sheet_name ='Sheet2')
for sheet in wb:
    wb[sheet].to_sql(sheet,con,index=False,if_exists = 'append')
con.commit()
con.close()
It should create one table with the name of the sheet which I am importing.
After some trial and error I found the solution:
I just put con.commit() within the for loop and it works as required, but I don't understand the logic.
I would appreciate it if anyone could explain this to me.
import sqlite3
import pandas as pd
filename= gui_fname()
con=sqlite3.connect("cps.db")
wb = pd.read_excel(filename,sheet_name = 'Sheet2')
for sheet in wb:
    wb[sheet].to_sql(sheet,con,index=False,if_exists = 'append')
    con.commit()
con.close()
import sqlite3
import pandas as pd
def import_excel_to_sqlite_db(excelFile):
    df = pd.read_excel(excelFile)
    con = sqlite3.connect("SQLite.db")
    # Append the rows to a single table; to_sql creates the table if it does not exist yet.
    df.to_sql("TableName", con, if_exists="append", index=False)
    con.commit()
    con.close()
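One possible explanation for the behaviour in the question, assuming pd.read_excel is called with a single sheet name as shown: it returns one DataFrame, and iterating over a DataFrame yields its column labels, so wb[sheet] is a single column and each column is written to its own table. The sketch below uses a small in-memory frame as a stand-in for the Excel data and an in-memory database; the column names are invented for illustration.

```python
import sqlite3

import pandas as pd

# Stand-in for pd.read_excel(filename, sheet_name="Sheet2"),
# which returns a single DataFrame when one sheet name is given.
df = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})

# Iterating over a DataFrame yields column labels, not sheets.
labels = [label for label in df]

con = sqlite3.connect(":memory:")
try:
    # Write the whole frame once, to one table named after the sheet.
    df.to_sql("Sheet2", con, index=False, if_exists="append")
    con.commit()
    tables = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        )
    ]
    row_count = con.execute('SELECT COUNT(*) FROM "Sheet2"').fetchone()[0]
finally:
    con.close()
```

Writing the frame with a single to_sql call, outside any loop over its columns, gives one table with all the columns.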

Variable assignment before function

I have created a package to quickly transform data using pandas and xlsxwriter.
This worked pretty well and I wrote a few functions successfully. But recently I've hit a wall:
For a few functions I need to define variables first, but they are not basic types (list, tuple, str etc.); they are, for instance, a dataframe. I've looked into global variables and saw they are not recommended (and I wouldn't know where to put them). I also looked into classes, but I don't know how to solve my problem using them. I've also tried creating an empty dataframe, but it was still empty after the function call.
What I'm trying to do is a read function with pandas for .csv or .xlsx and a function for saving with the Xlsxwriter engine.
The goal is to change as little as possible in the code to transform data frequently and rapidly (e.g. I have functions doing LEFT, RIGHT like in Excel, or even MIDDLE with column numbers) and to have easy and short code in main.py.
Here is the stripped-down version of my code, which uses 2 Python files (main.py and format_operations.py). I have added comments where I'm having issues.
Thanks in advance for your help!
"""
main.py
"""
import format_operations as tbfrm #import another python file in the same folder
import pandas as pd
import numpy as np
import xlsxwriter.utility
#file settings
file_full_path= "C:/Tests/big_data.xlsx"
file_save_to= "C:/Tests/Xlsxwriter.xlsx"
sheet_name_save_to= "Xlswriter"
dfname = ??? #I need to create the variable but I don't know how
tbfrm.FCT_universal_read(dfname,file_full_path) #CAN'T GET IT TO WORK
#column operations and formatting
columns_numeric = [3,6] # (with pandas) list of columns with number values by iloc number, starts at 0 which is column A in Excel
tbfrm.FCT_columns_numeric(dfname,columns_numeric) #example of a WORKING function (if dfname is defined)
#write with Xlsxwriter engine
XLWRITER_DF = ??? #same problem as before, how to create the variable?
workbookvarname = ??? #same here
worksheetvarname = ??? # same here
tbfrm.FCT_df_xlsxwriter(XLWRITER_DF,dfname,file_save_to,sheet_name_save_to,workbookvarname,worksheetvarname) #CAN'T GET IT TO WORK
#### WORKING piece of code I want to execute after saving with Xlsxwriter engine ####
worksheet.set_zoom(80)
# Conditional formatting
color_range_1 = "J1:J{}".format(number_rows+1)
FORMAT1 = workbook.add_format({'bg_color': '#FFC7CE','font_color': '#9C0006'})
FORMAT2 = workbook.add_format({'bg_color': '#C6EFCE','font_color': '#006100'})
worksheet.conditional_format(color_range_1, {'type': 'bottom','value': '5','format': FORMAT1})
worksheet.conditional_format(color_range_1, {'type': 'top','value': '5','format': FORMAT2})
Other file:
"""
format_operations.py
"""
import pandas as pd
import numpy as np
import xlsxwriter.utility
def FCT_universal_read(dfname,file_full_path):
    if ".xls" in file_full_path:
        dfname = pd.read_excel(file_full_path) #optional arguments:sheetname='Sheet1', header=0 , dtype=object to preserve values
    if ".csv" in file_full_path:
        dfname = pd.read_csv(file_full_path)
# save file with XLSXWriter engine for additional options to pandas
def FCT_df_xlsxwriter(XLWRITER_DF,dfname,file_save_to,sheet_name_save_to,workbookvarname,worksheetvarname):
    XLWRITER_DF = pd.ExcelWriter(file_save_to, engine='xlsxwriter')
    dfname.to_excel(XLWRITER_DF, sheet_name=sheet_name_save_to,encoding='utf-8')
    workbookvarname = XLWRITER_DF.book
    worksheetvarname = XLWRITER_DF.sheets[sheet_name_save_to]
#format as numbers
def FCT_columns_numeric(dfname,columns_numeric):
    for x in columns_numeric:
        dfname.iloc[:,x] = pd.to_numeric(dfname.iloc[:,x])
Your FCT_universal_read function should not modify a dataframe but instead return a new one:
def FCT_universal_read(file_full_path):
    extension = file_full_path.split('.')[-1]
    if extension in ("xls", "xlsx"): # comparing with "xls" alone would miss .xlsx files
        df = pd.read_excel(file_full_path) #optional arguments: sheet_name='Sheet1', header=0, dtype=object to preserve values
    if extension == "csv":
        df = pd.read_csv(file_full_path)
    return df
And in your main, do:
dfname = tbfrm.FCT_universal_read(file_full_path)
Same answer for FCT_df_xlsxwriter, you should rewrite it with a return so that you can do:
XLWRITER_DF, workbookvarname,worksheetvarname = tbfrm.FCT_df_xlsxwriter(dfname,file_save_to,sheet_name_save_to)
To grasp how python is dealing with the arguments you pass to a function, you should read these blog posts:
https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/
https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/
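The behaviour those posts describe can be sketched with plain lists; the same rules should apply to a DataFrame. The function names here are illustrative only and do not come from the original code.

```python
def rebind(frame):
    # Assigning to the parameter only rebinds the local name;
    # the object the caller passed in is left untouched.
    frame = ["new", "data"]


def mutate(frame):
    # Changing the object in place is visible to the caller,
    # because both names refer to the same object.
    frame.append("added")


def build():
    # Returning a new object and assigning it at the call site
    # is the pattern recommended in the answer above.
    return ["new", "data"]


data = []
rebind(data)
after_rebind = list(data)

mutate(data)
after_mutate = list(data)

data = build()
after_build = list(data)
```

This is why assigning the result of pd.read_excel to a parameter inside a function leaves the caller's variable unchanged, while returning the result works.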
You need to update FCT_universal_read so that it returns the dataframe you want. There is no need to define the dataframe outside the function; simply create and return it:
def FCT_universal_read(file_full_path):
    if ".xls" in file_full_path:
        df = pd.read_excel(file_full_path) #optional arguments: sheet_name='Sheet1', header=0, dtype=object to preserve values
        return df
    if ".csv" in file_full_path:
        df = pd.read_csv(file_full_path)
        return df
df = FCT_universal_read('/your/file/path')
Thanks so much to both of you! I get the logic now :) Thanks also for the documentation.
I successfully managed to write both functions. I had been struggling for several hours.
I like the .split function that you used, which ensures the script only looks at the extension.
I updated FCT_df_xlsxwriter and FCT_universal_read as you suggested. Here are both functions corrected:
'''
format_operations.py
'''
import pandas as pd
def FCT_universal_read(file_full_path):
    if "xls" in file_full_path.split('.')[-1]:
        dfname = pd.read_excel(file_full_path) #example: C:/Tests/Bigdata.xlsx
        return dfname
    if "csv" in file_full_path.split('.')[-1]:
        dfname = pd.read_csv(file_full_path)
        return dfname
def FCT_df_xlsxwriter(dfname,file_save_to,sheet_name_save_to):
    XLWRITER_DF = pd.ExcelWriter(file_save_to, engine='xlsxwriter')
    dfname.to_excel(XLWRITER_DF, sheet_name=sheet_name_save_to,encoding='utf-8')
    workbook = XLWRITER_DF.book
    worksheet = XLWRITER_DF.sheets[sheet_name_save_to]
    return XLWRITER_DF,workbook,worksheet
Here is how I call the two functions:
'''
main.py
'''
import format_operations as tbfrm
import pandas as pd
import xlsxwriter.utility
#settings
file_full_path= "C:/Tests/big_data.xlsx"
file_save_to= "C:/Tests/Xlsxwriter.xlsx"
sheet_name_save_to= "Xlswriter"
#functions
FILE_DF = tbfrm.FCT_universal_read(file_full_path)
XLWRITER_DF,workbook,worksheet = tbfrm.FCT_df_xlsxwriter(FILE_DF,file_save_to,sheet_name_save_to)
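As a variation on the extension check, the reader could look at pathlib's suffix and fail loudly on anything it does not recognise, instead of silently returning None. This is only a sketch: the name universal_read, the error message, and the temporary CSV file are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd


def universal_read(file_full_path):
    """Return a DataFrame read from an Excel or CSV file."""
    # Path.suffix looks only at the final extension, e.g. ".xlsx".
    suffix = Path(file_full_path).suffix.lower()
    if suffix in (".xls", ".xlsx"):
        return pd.read_excel(file_full_path)
    if suffix == ".csv":
        return pd.read_csv(file_full_path)
    # Raise instead of falling through and returning None.
    raise ValueError(f"Unsupported file extension: {suffix!r}")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "big_data.csv"
    csv_path.write_text("a,b\n1,2\n3,4\n")
    frame = universal_read(csv_path)

try:
    universal_read("notes.txt")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Raising an error keeps a typo in the file name from surfacing later as an AttributeError on None.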

Resources