I am trying to update an Excel sheet with Python. I read a specific cell and update it, but pandas overwrites the entire workbook, so I lose the other sheets as well as the formatting. Can anyone tell me how to avoid this?
Record = pd.read_excel("Myfile.xlsx", sheet_name='Sheet1', index_col=False)
Record.loc[1, 'WORDS'] = int(self.New_Word_box.get())
Record.loc[1, 'STATUS'] = self.Stat.get()
Record.to_excel("Myfile.xlsx", sheet_name='Student_Data', index =False)
My code is above. As you can see, I only want to update a few cells, but it overwrites the entire Excel file. I searched for an answer but couldn't find anything specific.
I'd appreciate your help.
Update: Added more clarifications
Steps:
1) Read the sheet which needs changes in a dataframe and make changes in that dataframe.
2) Now the changes are reflected in the dataframe but not in the sheet. Use the following function with the dataframe in step 1 and name of the sheet to be modified. You will use the truncate_sheet param to completely replace the sheet of concern.
The function call would be like so:
append_df_to_excel(filename, df, sheet_name, startrow=0, truncate_sheet=True)
from openpyxl import load_workbook
import pandas as pd
def append_df_to_excel(filename, df, sheet_name="Sheet1", startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: "/path/to/file.xlsx")
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: "Sheet1")
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None
    """
    # ignore [engine] parameter if it was passed
    if "engine" in to_excel_kwargs:
        to_excel_kwargs.pop("engine")

    writer = pd.ExcelWriter(filename, engine="openpyxl")

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError

    if "index" not in to_excel_kwargs:
        to_excel_kwargs["index"] = False

    try:
        # try to open an existing workbook
        if "header" not in to_excel_kwargs:
            to_excel_kwargs["header"] = True
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
            to_excel_kwargs["header"] = False

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        to_excel_kwargs["header"] = True

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()
The openpyxl engine can't be swapped for xlsxwriter here, as was asked in a comment, because xlsxwriter cannot open an existing workbook. See reference 2.
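On recent pandas releases the helper above may not run as written, since `ExcelWriter.save()` and assigning to `writer.book` were removed in pandas 2.0. Below is a minimal sketch of the same replace-one-sheet step, assuming pandas 1.3 or later (where `if_sheet_exists` was added) and openpyxl installed. The temporary file, sheet names, and sample values are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook with two sheets, standing in for "Myfile.xlsx".
path = os.path.join(tempfile.mkdtemp(), "Myfile.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"WORDS": [1, 2], "STATUS": ["a", "b"]}).to_excel(
        writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"other": [10, 20]}).to_excel(
        writer, sheet_name="Sheet2", index=False)

# Read one sheet and change a couple of cells in the DataFrame.
record = pd.read_excel(path, sheet_name="Sheet1")
record.loc[1, "WORDS"] = 99
record.loc[1, "STATUS"] = "done"

# mode="a" opens the existing file instead of overwriting it;
# if_sheet_exists="replace" swaps out only the named sheet.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    record.to_excel(writer, sheet_name="Sheet1", index=False)

# Read everything back to confirm both sheets are still present.
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))
```

Replacing a sheet this way keeps the other sheets in the workbook, but the replaced sheet is recreated from scratch, so any cell formatting it had is not carried over.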
References:
1) https://stackoverflow.com/a/38075046/6741053
2) xlsxwriter: is there a way to open an existing worksheet in my workbook?
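When only a few cells change, another option is to skip the DataFrame round trip and edit the cells with openpyxl directly, which leaves the other sheets and the existing cell styles of the loaded workbook alone. A rough sketch, with a made-up workbook and cell positions:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

path = os.path.join(tempfile.mkdtemp(), "Myfile.xlsx")

# Build a hypothetical workbook: a styled header plus a second sheet.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["WORDS", "STATUS"])
ws.append([1, "a"])
ws.append([2, "b"])
ws["A1"].font = Font(bold=True)
wb.create_sheet("Sheet2")["A1"] = "untouched"
wb.save(path)

# Reopen, change two cells, and save back to the same file.
wb = load_workbook(path)
ws = wb["Sheet1"]
ws.cell(row=3, column=1).value = 99       # WORDS, second data row
ws.cell(row=3, column=2).value = "done"   # STATUS, second data row
wb.save(path)

# Reload to inspect the result.
check = load_workbook(path)
print(check["Sheet1"]["A3"].value, check["Sheet1"]["A1"].font.bold)
```

Row and column numbers in openpyxl are 1-based and count the header row, so the second data row of the DataFrame (`loc[1, ...]`) is worksheet row 3 here.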
Related
I am trying to read an excel file from the following URL: http://www.ssf.gob.sv/html_docs/boletinesweb/bdiciembre2020/III_Bancos/Cuadro_17.xlsx
I used the code:
ruta_indicadores = 'http://www.ssf.gob.sv/html_docs/boletinesweb/bdiciembre2020/III_Bancos/Cuadro_17.xlsx'
indicadores = pd.read_excel(ruta_indicadores)
But when I run the code, the dataframe is empty even though the file is not, so I don't know why it isn't reading the Excel file.
Here is a screenshot of the Excel file:
The problem is that pd.read_excel() reads the first sheet by default, but the table you want is on a sheet with a specific name, "HOJA1".
Here is the code that worked:
ruta_indicadores = 'http://www.ssf.gob.sv/html_docs/boletinesweb/bdiciembre2020/III_Bancos/Cuadro_17.xlsx'
indicadores = pd.read_excel(ruta_indicadores, sheet_name='HOJA1')
Furthermore, a more robust solution:
ruta_indicadores = 'http://www.ssf.gob.sv/html_docs/boletinesweb/bdiciembre2020/III_Bancos/Cuadro_17.xlsx'
indicadores_dict = pd.read_excel(ruta_indicadores, sheet_name=None)
# remove the empty sheet
sheetname_list = list(filter(lambda x: not indicadores_dict[x].empty, indicadores_dict.keys()))
df_list = [indicadores_dict[s] for s in sheetname_list]
ref. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
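A self-contained sketch of the same idea on a synthetic workbook, assuming openpyxl is installed. The sheet names mirror the ones reported for `Cuadro_17.xlsx`, but the data is invented.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# First sheet is empty, second one holds the table (invented values).
wb = Workbook()
wb.active.title = "Cognos_Office_Connection_Cache"
hoja = wb.create_sheet("HOJA1")
hoja.append(["Conceptos", "Valor"])
hoja.append(["A", 1.5])
hoja.append(["B", 2.5])
wb.save(path)

# The default call reads the first sheet only, which is empty here.
first = pd.read_excel(path)

# sheet_name=None returns a dict of {sheet name: DataFrame}.
all_sheets = pd.read_excel(path, sheet_name=None)
non_empty = {name: df for name, df in all_sheets.items() if not df.empty}

print(first.empty, list(non_empty))
```

Filtering on `df.empty` avoids hard-coding the sheet name, which may help if the publisher renames the data sheet in a later file.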
First, let's look at why your code produces no output, and then at how to fix it. The issue is this:
You are fetching the table directly from the URL, and the workbook contains a cache sheet. Because of that, pd.read_excel() does not pick up your primary sheet by default.
Here is how I found that there is another sheet in your data:
# Import all important-libraries
from openpyxl import load_workbook
# Load 'Cuadro_17.xlsx' Excel Sheet to Workbook
indicadores = load_workbook(filename = "Cuadro_17.xlsx")
# Print Sheet Names of 'Cuadro_17.xlsx'
indicadores.sheetnames
# Output of above Cell:-
['Cognos_Office_Connection_Cache', 'HOJA1']
As you can see, the first sheet is Cognos_Office_Connection_Cache, which is not the one we want.
Appropriate solution in this scenario:
Now we know that the data is stored in the HOJA1 sheet, so we can fetch that specific sheet. Another important point is that your data has multi-level column headers, so we have to read it accordingly. The code for this is given below:
# Import all important-libraries
import pandas as pd
# Store 'URL' in 'ruta_indicadores' Variable
ruta_indicadores = 'http://www.ssf.gob.sv/html_docs/boletinesweb/bdiciembre2020/III_Bancos/Cuadro_17.xlsx'
# Read the Excel file from the URL using 'pd.read_excel()', specifying the 'sheet_name', the starting row of the table ('skiprows') and the 'header' rows for multi-level indexing
indicadores = pd.read_excel(ruta_indicadores, sheet_name = 'HOJA1', skiprows = 7, header = [0, 1])
# 'Drop' unnecessary 'Column'
indicadores.drop('Unnamed: 0_level_0', axis = 1, inplace = True)
# Rename Child level Column of 'Conceptos'
indicadores.rename(columns={'Unnamed: 1_level_1': ''}, inplace = True)
# Remove 'NaN' Entries from the 'indicadores' Data
indicadores = indicadores.fillna('')
# Print Few records of 'indicadores' Data
indicadores.head()
The full output is too big to print here, so a sample of the output of the code above is attached as an image:
As you can see, the table has been fetched successfully. I hope this solution helps.
Here is my current code below.
I have a specific range of cells (from a specific sheet) that I am pulling out of multiple (~30) excel files. I am trying to pull this information out of all these files to compile into a single new file appending to that file each time. I'm going to manually clean up the destination file for the time being as I will improve this script going forward.
What I currently have works fine for a single sheet but I overwrite my destination every time I add a new file to the read in list.
I've tried adding the mode = 'a' and a couple different ways to concat at the end of my function.
import pandas as pd
def excel_loader(fname, sheet_name, new_file):
    xls = pd.ExcelFile(fname)
    df1 = pd.read_excel(xls, sheet_name, nrows = 20)
    print(df1[1:15])
    writer = pd.ExcelWriter(new_file)
    df1.insert(51, 'Original File', fname)
    df1.to_excel(new_file)

names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
for name in names:
    excel_loader(name, 'specific_sheet_name', destination)
Thanks in advance for any help; I can't seem to find an answer to this exact situation on here. Cheers.
Ideally you want to loop through the files and read the data into a list, then concatenate the individual dataframes, then write the new dataframe. This assumes the data being pulled is the same size/shape and the sheet name is the same. If sheet name is changing, look into zip() function to send filename/sheetname tuple.
This should get you started:
import pandas as pd

names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
sheet_name = 'specific_sheet_name'  # sheet to read from each file (name taken from the question)

# read all files first
df_hold_list = []
for name in names:
    xls = pd.ExcelFile(name)
    df = pd.read_excel(xls, sheet_name, nrows = 20)
    df_hold_list.append(df)

# concatenate dfs
df1 = pd.concat(df_hold_list, axis=1)  # axis is 1 or 0 depending on how you want to concatenate (horizontal vs vertical)

# write the new file once, after all data has been collected
df1.to_excel(destination)
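If the goal is to stack the rows from each file and keep track of where they came from, a sketch along the following lines may be closer to the question. The file names, sheet name, and sample data are placeholders created on the fly so the example runs on its own; it assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()
sheet_name = "specific_sheet_name"

# Create two small placeholder source workbooks.
names = []
for i in range(2):
    name = os.path.join(workdir, f"sheet{i + 1}.xlsx")
    pd.DataFrame({"x": [i, i + 10], "y": [i * 2, i * 3]}).to_excel(
        name, sheet_name=sheet_name, index=False)
    names.append(name)

# Read every file first, tagging each frame with its source file.
frames = []
for name in names:
    df = pd.read_excel(name, sheet_name=sheet_name, nrows=20)
    df["Original File"] = os.path.basename(name)
    frames.append(df)

# axis=0 (the default) stacks rows; ignore_index renumbers them.
combined = pd.concat(frames, ignore_index=True)

# Write the destination once, after everything has been collected.
destination = os.path.join(workdir, "destination.xlsx")
combined.to_excel(destination, index=False)
print(combined)
```

Writing the destination a single time is what avoids the overwrite problem: each call to `to_excel` on a plain path replaces the file, so nothing is written until all inputs are in memory.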
I found part of the answer from this post and it was very useful
https://stackoverflow.com/a/42375263/13765378
However, every time I run this code with new data, a new sheet gets added to the end of the workbook.
After a while, it takes quite an effort to get to the sheet that was just added.
Is there a way to specify adding to the beginning of the workbook, so it will be the default sheet when we open the workbook?
This should help. Use the second line below; it relies on the openpyxl module.
Help link: https://openpyxl.readthedocs.io/en/stable/tutorial.html
ws1 = wb.create_sheet("Mysheet") # insert at the end (default)
ws2 = wb.create_sheet("Mysheet", 0) # insert at first position
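A quick in-memory check of that behaviour (the sheet titles are arbitrary):

```python
from openpyxl import Workbook

wb = Workbook()                  # a new workbook starts with one sheet, "Sheet"
wb.create_sheet("Appended")      # no index given: added at the end
wb.create_sheet("First", 0)      # index 0: inserted at the front
wb.active = 0                    # mark the first sheet as the active one

print(wb.sheetnames)
print(wb.active.title)
```

The same `create_sheet(name, 0)` call works on a workbook returned by `load_workbook()`, as long as that same workbook object is the one saved afterwards.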
Thanks to Vignesh's answer, and by modifying the code from
writing pandas data frame to existing workbook
I got the following code to work. Every time the program is run, a new sheet containing the (new) data is created at the beginning of the workbook. The rest of the code just tests the function append_df_to_excel().
The function append_df_to_excel() seems like overkill for what I need to do, but for now I could not find a better and cleaner way to do it.
I also do not understand why saving the workbook at the end does not save the data.
import os
from openpyxl import load_workbook
import xlsxwriter
import pandas as pd
from datetime import datetime
filename = r'C:\test\test.xlsx'
if not os.path.exists(filename):
    wb = xlsxwriter.Workbook(filename)
    wb.close()
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None
    """
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl')

    if not os.path.exists(filename):
        wb = xlsxwriter.Workbook(filename)
        wb.close()

    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
            # writer.book.create_sheet(sheet_name, 0) #not working

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

A = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
df = pd.DataFrame(A, columns=list('XYZ'))
newSheet = "New_" + datetime.now().strftime('%Y-%m-%d_%H%M%S')
wb = load_workbook(filename)
ws = wb.create_sheet(newSheet, 0)
wb.save(filename)
append_df_to_excel(filename, df, sheet_name="Old2", startrow=1, startcol=1)
append_df_to_excel(filename, df, sheet_name="Old3", index=False)
append_df_to_excel(filename, df, sheet_name="Old1", startcol=2, index=False)
append_df_to_excel(filename, df, sheet_name=newSheet, columns=df.columns.values, startrow=0, startcol=0, index=False)
# wb.save(filename) # Do not do this, will get nothing written to workbook
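On the open question of why the final `wb.save(filename)` loses the data: `wb` was loaded before `append_df_to_excel()` ran, so it still holds the older in-memory copy of the workbook, and saving it writes that copy over the file. One way to sidestep both that and the helper is to do everything on a single workbook object. A sketch, assuming openpyxl's `dataframe_to_rows` helper and a throwaway file path:

```python
import os
import tempfile
from datetime import datetime

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

filename = os.path.join(tempfile.mkdtemp(), "test.xlsx")
if not os.path.exists(filename):
    Workbook().save(filename)        # start from an empty workbook

df = pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=list("XYZ"))
new_sheet = "New_" + datetime.now().strftime("%Y-%m-%d_%H%M%S")

wb = load_workbook(filename)
ws = wb.create_sheet(new_sheet, 0)   # index 0 puts the sheet first
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)                   # header row, then one row per record
wb.save(filename)                    # one save, on the object that was edited

saved = load_workbook(filename)
print(saved.sheetnames)
```

Because the sheet is created and filled on the same `wb` that gets saved, there is no second copy of the workbook that could overwrite the new data.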
'''
I am using openpyxl to open one xlsx file, perform a few arithmetic operations, and then save the result in a new xlsx file. Now I want to import many files, apply the same operations, and store the results for all files in a single xlsx file with multiple sheets.
'''
from openpyxl import Workbook
import openpyxl
wb = openpyxl.load_workbook(filename=r"C:\Users\server\Desktop\Python\Data.xlsx", read_only=True)
# reading the source file
ws = wb['Sheet1'] # select Sheet1
# comprehension: collect every row as a list of cell values
row_data = [[cell.value for cell in row] for row in ws.rows]
header_data = row_data[0] # first row is the header
row_data = row_data[1:] # remaining rows are the data, as a 2D list
for dp in row_data: # multiply column 1 by column 2 in each row and append the product as a new column
    dp.append(dp[1]*dp[2])
wb.close() # close the source workbook
wb = openpyxl.Workbook() # create a new workbook
ws = wb.active # the first sheet is active
ws.append(header_data) # write the header row
for row in row_data: # write the 2D list data to the sheet
    ws.append(row)
wb.save(r"C:\Users\server\Desktop\Python\Result.xlsx")
'''I am able to store multiple xlsx file names in a list. Now I want to access each file's data and perform a few arithmetic operations; finally, the results need to be stored in a single xlsx file with multiple sheets.
'''
from openpyxl import Workbook
import openpyxl
import os
location = r"C:\Users\server\Desktop\Python" # folder where the xlsx files are present
counter = 0 # keep a count of all files found
xlsx_files = [] # list to store all xlsx files found at location
for file in os.listdir(location):
    if file.endswith(".xlsx"):
        print("xlsx file found:\t", file)
        xlsx_files.append(str(file))
        counter = counter + 1
if counter == 0:
    print("No files found here!")
print("Total files found:\t", counter)
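One possible way to connect the two snippets above is to loop over the files found, apply the same column arithmetic to each, and write each result to its own sheet of one output workbook. The folder, file names, data, and the `product` column header below are generated placeholders; multiplying column 1 by column 2 is carried over from the first snippet.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

folder = tempfile.mkdtemp()

# Create two placeholder input files with a header and numeric rows.
for n in (1, 2):
    src = Workbook()
    ws = src.active
    ws.title = "Sheet1"
    ws.append(["id", "qty", "price"])
    ws.append([1, 2 * n, 10])
    ws.append([2, 3 * n, 20])
    src.save(os.path.join(folder, f"Data{n}.xlsx"))

xlsx_files = sorted(f for f in os.listdir(folder) if f.endswith(".xlsx"))

result = Workbook()
result.remove(result.active)          # drop the default empty sheet

for file in xlsx_files:
    wb = load_workbook(os.path.join(folder, file), read_only=True)
    rows = [[cell.value for cell in row] for row in wb["Sheet1"].rows]
    wb.close()

    header, data = rows[0], rows[1:]
    # One output sheet per input file; sheet titles are limited to 31 characters.
    out = result.create_sheet(os.path.splitext(file)[0][:31])
    out.append(header + ["product"])
    for row in data:
        out.append(row + [row[1] * row[2]])   # column 1 * column 2

result_path = os.path.join(folder, "Result.xlsx")
result.save(result_path)
print(result.sheetnames)
```

Keeping a single `result` workbook open for the whole loop and saving it once at the end means each file only adds a sheet, rather than replacing the output file.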
I need to merge data from multiple sheets of an Excel workbook into a new summary sheet using Python. I am using pandas to read the sheets and create the summary sheet. After concatenation, the table formatting (header and borders) is lost.
Is there a way to read from the source sheet with its formatting and write it to the final sheet?
If that is not possible, how can I format the data after concatenation?
Python code to concatenate:
import pandas as pd
df = []
xlsFile = "some path excel"
sheetNames = ['Sheet1', 'Sheet2','Sheet3']
for nms in sheetNames:
    data = pd.read_excel(xlsFile, sheet_name = nms, header=None, skiprows=1)
    df.append(data)
final = "some other path excel "
df = pd.concat(df)
df.to_excel(final, index=False, header=False)
Sheet 1 Input Data
Sheet 2 Input Data
Sheet 3 Input Data
Summary Sheet output
You can try the following code:
df = pd.concat(pd.read_excel('some path excel.xlsx', sheet_name=None), ignore_index=True)
If you set sheet_name=None you can read all the sheets in the workbook at one time.
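pandas DataFrames hold values only, so cell styles are not carried through `read_excel`. For the second part of the question (formatting after concatenation), one option is to write the summary with pandas and then restyle it with openpyxl. A minimal sketch follows; the paths, data, and the particular bold-header and thin-border style are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Border, Font, Side

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.xlsx")
final = os.path.join(workdir, "summary.xlsx")

# Placeholder source workbook with three sheets of the same shape.
with pd.ExcelWriter(source, engine="openpyxl") as writer:
    for i, name in enumerate(["Sheet1", "Sheet2", "Sheet3"]):
        pd.DataFrame({"Item": [f"a{i}", f"b{i}"], "Value": [i, i + 1]}).to_excel(
            writer, sheet_name=name, index=False)

# Read all sheets at once and stack them.
summary = pd.concat(pd.read_excel(source, sheet_name=None), ignore_index=True)
summary.to_excel(final, index=False)

# Reopen the output and apply border and header styling.
wb = load_workbook(final)
ws = wb.active
thin = Side(style="thin")
border = Border(left=thin, right=thin, top=thin, bottom=thin)
for row in ws.iter_rows():
    for cell in row:
        cell.border = border
for cell in ws[1]:
    cell.font = Font(bold=True)
wb.save(final)
print(summary.shape)
```

The styling pass runs after the data is written, so it can be adjusted (fills, column widths, number formats) without touching the concatenation step.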
I suggest the xlrd library
(https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966
and https://github.com/python-excel/xlrd).
It is a good library for this.
from xlrd import open_workbook
path = '/Users/.../Desktop/Workbook1.xls'
wb = open_workbook(path, formatting_info=True)
sheet = wb.sheet_by_name("Sheet1")
cell = sheet.cell(0, 0) # The first cell
print("cell.xf_index is", cell.xf_index)
fmt = wb.xf_list[cell.xf_index]
print("type(fmt) is", type(fmt))
print("Dumped Info:")
fmt.dump()
See also:
Using XLRD module and Python to determine cell font style (italics or not)
and How to read excel cell and retain or detect its format in Python (the code above is taken from that question)
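Note that `formatting_info=True` in xlrd applies to `.xls` files only, and xlrd 2.0 and later no longer read `.xlsx` at all. For `.xlsx` workbooks, openpyxl exposes comparable style information on each cell. A small in-memory sketch with made-up cell contents:

```python
from openpyxl import Workbook
from openpyxl.styles import Font

# Build a tiny workbook in memory with two styled cells.
wb = Workbook()
ws = wb.active
ws["A1"] = "Header"
ws["A1"].font = Font(bold=True, italic=True)
ws["B1"] = 0.25
ws["B1"].number_format = "0.00%"

# Read the style attributes back from the cells.
cell = ws["A1"]
print(cell.font.bold, cell.font.italic)   # font flags
print(ws["B1"].number_format)             # number format string
```

The same attributes (`font`, `border`, `fill`, `number_format`) are available on cells of a workbook opened with `load_workbook()`, which is how they could be copied from a source sheet to a summary sheet.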