Python: Modify data of existing .xlsx file [closed]

I have some Python code here that creates a .xlsx file using openpyxl.
However, when I try to modify the file, the new data is written to the file but the previous data is gone. I've heard of using deepcopy (or copy.copy) to copy the data of the file, but how can I paste the copied data plus my current edits into the .xlsx file?
(*Some code is missing here, as it is a program with a GUI and the full code is just far too long.)
############## creating ######################
import os
from openpyxl import Workbook, load_workbook  # Workbook creates a new xlsx (Excel) file

try:
    wb_ID = load_workbook('list.xlsx')
    ws_ID = wb_ID.active
except EnvironmentError as e:  # OSError or IOError, as FileNotFoundError only exists in Python 3.x
    print(os.strerror(e.errno))  # use the operating system error text to describe the missing file
    wb_ID = Workbook()
    ws_ID = wb_ID.active
    ws_ID['A1'] = "IC"
    ws_ID.merge_cells('B1:E1')
    ws_ID['B1'] = "Name"
    ws_ID.merge_cells('F1:K1')
    ws_ID['F1'] = "Email"  # header columns aligned with the data rows written in the editing part
    ws_ID['L1'] = "Height"
    ws_ID['M1'] = "Gender"
    ws_ID['N1'] = "Bloodtype"
    ws_ID.merge_cells('O1:Q1')
    ws_ID['O1'] = "Default Consultation Day"
    ws_ID.merge_cells('R1:T1')
    ws_ID['R1'] = "Latest Appointment"
    wb_ID.save("list.xlsx")
############## editing #########################
wb = load_workbook(filename='list.xlsx')
ws = wb.active
last_row = 1
while True:
    last_row += 1
    cellchecker = ws['A' + str(last_row)].value  # get the value of the cell
    print(cellchecker)
    print(last_row)
    if cellchecker is None:  # empty cell, so this row number is the new row
        wb.save('list.xlsx')
        break
print(str(last_row))  # convert to a string before passing it to the worksheet lookups
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws['A' + str(last_row)] = str(entry_IC.get().strip(' '))
ws.merge_cells('B' + str(last_row) + ':E' + str(last_row))
ws['B' + str(last_row)] = str(entry_Name.get())
ws.merge_cells('F' + str(last_row) + ':K' + str(last_row))
ws['F' + str(last_row)] = str(entry_email.get().strip(' '))
ws['L' + str(last_row)] = str(entry_Height.get().strip(' '))
ws['M' + str(last_row)] = gender_selected
ws['N' + str(last_row)] = bloodtype_selected
ws.merge_cells('O' + str(last_row) + ':Q' + str(last_row))
ws['O' + str(last_row)] = str(default_selected_day)
ws.merge_cells('R' + str(last_row) + ':T' + str(last_row))
today = datetime.date.today()  # redeclared on purpose: this is a local variable for this function only
ws['R' + str(last_row)] = str(today)  # write to the top-left cell of the merged range R:T
wb.save('list.xlsx')
I noticed that the editing part overwrites the existing data, as the openpyxl documentation warns. I really couldn't find a way to modify an existing .xlsx file. Please help me out, I'm stuck here.

I've run into this problem several times and haven't been able to solve it using pure Python; however, you can use the following code to call a VBA macro from a Python script, and the macro can then modify your existing Excel file.
This has let me streamline my work so that I never have to open Excel files by hand to update them, even when the data processing or gathering requires Python.
# run excel macro from python
import win32com.client

fn = '//path/filename.xlsm'
xl = win32com.client.Dispatch("Excel.Application")
xl.Application.Visible = True
xl.Workbooks.Open(Filename=fn, ReadOnly=1)
xl.Application.Run("macroname")
xl = None  # drop the reference to the Excel COM object
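For comparison, here is a minimal pure-openpyxl sketch. It assumes the data loss in the question comes from creating a fresh Workbook() just before writing the new row, so the file saved afterwards contains only that row. The helper name append_record and the demo values are hypothetical, and merged cells are left out to keep the example short.

```python
import os
import tempfile
from openpyxl import Workbook, load_workbook


def append_record(path, values):
    """Append one row to the workbook at `path`, creating the file if needed."""
    if os.path.exists(path):
        wb = load_workbook(path)  # reuse the existing workbook so earlier rows survive
    else:
        wb = Workbook()           # start from scratch only when there is no file yet
    ws = wb.active
    ws.append(values)             # writes to the row after the last used row
    wb.save(path)


# Demo in a temporary folder (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), 'list.xlsx')
append_record(demo_path, ['IC', 'Name'])       # file does not exist yet: header row
append_record(demo_path, ['900101', 'Alice'])  # second call keeps the header row

for row in load_workbook(demo_path).active.iter_rows(values_only=True):
    print(row)
```

The key point is that the same workbook object returned by load_workbook() is both modified and saved; no second Workbook() is created in between.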

Related

using Python to retrieve formatted strings from excel cell

I'm trying to pull a string from an Excel cell that will retain its formatting when executed in Python. I'm only a week into learning this (and this is my first post on Stack Overflow), so please forgive any errors of convention in my code or post.
The variable 'name' is global and is defined through input earlier in the program. Everything works fine when the cell contents are defined in the program instead (for example, question = f"Hello {name}" returns exactly what I expect, with the variable's value swapped in for {name}).
I am pulling the correct workbook, sheet and cell (1,1), and the cell's contents are: Hello {name}
I've also tried: f"Hello {name}"
Input:
import openpyxl
from gtts import gTTS
import os

def speak(question):
    language = 'en'
    myobj = gTTS(text=question, lang=language, slow=False)  # speak the text passed in
    myobj.save("q.mp3")
    os.system("q.mp3")

path = "wb1.xlsx"
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
question = f"{sheet_obj.cell(row = 1, column = 2).value}"
speak(question)
Output:
Hello {name}
I've tried the above format of question = f"(...)" as well as without the formatting. I've also tried leaving the sheet_obj.cell(row = 1, column = 2).value as is without formatting the string. Nothing has worked for me yet, any insight would be greatly appreciated. This community has been an amazing resource so far! Thanks in advance!
To use a dynamically created format string, use the eval function.
This example may help. It creates an Excel file with a format string as a cell value, then retrieves the format string and creates a final output string using the eval function.
import openpyxl as px
# create workbook
wb = px.Workbook()
ws = wb.active
ws.cell(1,1).value='hello {xxx}' # format string
wb.save("extest.xlsx")
# retrieve workbook
wb = px.load_workbook('extest.xlsx')
ws = wb.worksheets[0]
v = ws.cell(1,1).value # hello {xxx}
xxx = 'python'
print(v) # hello {xxx}
print(eval('f"' + v + '"')) # hello python
print(eval('f"' + ws.cell(1,1).value + '"')) # hello python
print(eval(f'f"{ws.cell(1,1).value}"')) # hello python
Output
hello {xxx}
hello python
hello python
hello python
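Because eval executes whatever expression the cell contains, a narrower alternative may be worth considering when the cell only holds simple {placeholder} fields. The sketch below assumes that is the case; the template string stands in for the value read from the worksheet, and the values dictionary is a hypothetical list of the names the template may use.

```python
template = 'hello {xxx}'    # stands in for ws.cell(1, 1).value
values = {'xxx': 'python'}  # the only names the template is allowed to refer to

print(template.format(**values))    # hello python
print(template.format_map(values))  # hello python
```

Unlike an f-string built through eval, str.format only substitutes named fields and does not evaluate arbitrary Python expressions, so a placeholder such as {1 + 1} would raise an error instead of being computed.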

Saving format with Openpyxl. Need to open and save again the file

I have an application, "program.exe", that takes an Excel file as its argument:
program.exe file.xlsx
The "file.xlsx" needs to be modified first. In particular, I need to dump a dataframe with some calculations into it:
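from openpyxl import load_workbook

wb = load_workbook(filename="file.xlsx")
ws = wb.active
list2d = df.values.tolist()  # df is the dataframe holding the calculations
for r_idx, row in enumerate(list2d, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx + start_row, column=c_idx, value=value)  # start_row offsets the rows being written
wb.save("file.xlsx")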
However, my program does not accept "file.xlsx" unless I open and save the file manually. I saw that some other users experienced the same issue, but apparently it was not solved?
Expanding on comments:
Yes, program started like this:
import openpyxl as xl
wb = xl.Workbook()
ws = wb.active
[lots of code to create/format the worksheet]
and ended like this:
f = '/home/chris/myDir/outFile.xlsx'
wb.save(f)
# Open the file after saving, as there's no way to simply open the worksheet/book
# after creating it with openpyxl; `Popen` exists within the `subprocess` package.
Popen(['localc', f])
To be clear, outFile.xlsx is a file that already exists and is being saved to.

I need to check if a date already exists in Excel file using pyexcel or openpyxl

Right now this script does not recognize when dates in the Excel file match the date I am looking for (today's date). I am new to programming, so if you can, please explain the changes you make in a simple way; that would really help me.
This picture shows how the code currently edits the Excel file: https://prnt.sc/pse0xt As you can see, there are duplicate dates, and I want every date entered only ONCE.
Here is what I am trying:
import datetime
import openpyxl
import pyexcel

date = (datetime.date.today())
date_already_exists = False
# File to open PyExcel
file_name = "example.xlsx"
sheet = pyexcel.get_sheet(file_name = file_name)
# File to open OpenPyxl
wb = openpyxl.load_workbook(file_name, read_only=True)
ws = wb.active
# Decides whether to write a new entry
for row in ws.rows:
    if row[1].value == date:
        date_already_exists = True
if date_already_exists is False:
    sheet.row += [date, "X"]
# Saves the file
sheet.save_as(array = sheet.row, filename = file_name)
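One possible reason the comparison never matches, offered as an assumption rather than a confirmed diagnosis: if the date cells come back from openpyxl as datetime.datetime values, comparing them with a datetime.date is always False, even for the same calendar day. The sketch below uses only the standard library; the helper names as_date and date_already_listed and the sample column are hypothetical.

```python
import datetime


def as_date(value):
    """Reduce a cell value to a datetime.date, or return None if it is not a date."""
    if isinstance(value, datetime.datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, datetime.date):
        return value
    return None


def date_already_listed(values, target):
    """Return True if any value falls on the same calendar day as `target`."""
    return any(as_date(v) == target for v in values)


target = datetime.date(2019, 11, 5)
column = ['Date',
          datetime.datetime(2019, 11, 4, 0, 0),
          datetime.datetime(2019, 11, 5, 0, 0)]

print(datetime.datetime(2019, 11, 5) == target)  # False: datetime vs date
print(date_already_listed(column, target))       # True: compared as dates
```

In the script above, the values passed in would be the cells of the date column (for example row[1].value for each row), and the new entry would only be written when date_already_listed returns False.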

Python script that checks the same sheet in 300 xlsx files, compares it to a master sheet and updates it accordingly

I have a list of around 300 excel files, all named following this pattern [aA-zZ]{1}[0-9]{5}.xlsx and a master file. I'm trying to put together a python script that reads the same sheet/column in each file, compares it to the master's file sheet and updates it accordingly.
I've been trying openpyxl but I'm hopelessly stuck, any help is very much appreciated.
#!Python3
import openpyxl
import pandas as pd
import os
# Move to the correct location
path = "/usr/tmp/files"
os.chdir(path)
# First we open the master file
wb = load_workbook('master.xlsx')
# grab master worksheet in master.xlsx
ws = wb.active('Sheet1')
# Second we open the rest of the files that include changes and compare with the data in master.xlsx
def main():
    for f in files:
        wb2 = load_workbook(f)
        ws2 = wb2['Sheet1']
        # read first workbook to get data
        wb2 = load_workbook(filename = '.xlsx')
        ws2 = wb2.get_sheet_by_name(name = 'Sheet1')
        # Iterate through worksheet and compare with master sheet for changes
        for row in ws.iter_rows():
            for cell in row:
                cellContent = str(cell.value)
                if cellContent == 'yes':
                    wb = load_workbook('master.xlsx', optimized_write=True)
                    # Update cell contents
                    ws[cell] = cellContent
    # Save workbook
    wb.save('master.xlsx')
if __name__ == '__main__':
    main()
Thanks!!
EDITED CODE:
#!Python3
from openpyxl import *
import pandas as pd
import os
import re
# Move to the correct location
path = "/usr/tmp/files"
os.chdir(path)
# First we open the master file
wb = load_workbook('master.xlsx')
# grab master worksheet in master.xlsx
ws = wb.get_sheet_by_name('Sheet1')
# Open the rest of the files that include changes and compare with the data in master.xlsx
def main():
    files = [f for f in os.listdir('.') if re.match(r'[A-Za-z][0-9]{5}\.xlsx', f)]
    # read each workbook to get data
    for f in files:
        wb2 = load_workbook(f)
        ws2 = wb2.get_sheet_by_name('Sheet1')
        # Iterate through worksheet and compare with master sheet for changes
        for row in ws2.iter_rows():
            for cell in row:
                cellContent = str(cell.value)
                if cellContent == "yes":
                    wb = Workbook(write_only = True)
                    # Update cell content
                    ws[cell.coordinate] = str(cellContent)
                else:
                    continue
    # Save workbook
    wb.save('master.xlsx')
if __name__ == '__main__':
    main()
Depending on the size of the files I'd be tempted to do this in two steps: first read in the files in read-only mode and identify whether they need correcting or not.
Then go through the list of the files that need correcting and update only these. This is probably the fastest way to do this because it stops you loading all the worksheets of all the workbooks into memory.
NB. the syntax you're using is for older versions of openpyxl and no longer supported. I strongly advise you update to >= 2.4 and refer to the official documentation first and foremost.
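The two-step approach described above might be sketched roughly as follows, using the current openpyxl indexing syntax (wb['Sheet1']). The function names files_with_changes and update_master, the 'yes' marker check, and the rule of copying a flagged cell to the same coordinate in the master are assumptions made for illustration; the original post does not spell out how the master should be updated.

```python
import os
import re
from openpyxl import load_workbook

# One letter followed by five digits and the .xlsx extension (assumed naming rule).
FILE_PATTERN = re.compile(r'[A-Za-z][0-9]{5}\.xlsx')


def files_with_changes(folder, sheet_name='Sheet1', marker='yes'):
    """Step 1: scan matching workbooks in read-only mode and list those containing the marker."""
    flagged = []
    for name in sorted(os.listdir(folder)):
        if not FILE_PATTERN.fullmatch(name):
            continue
        wb = load_workbook(os.path.join(folder, name), read_only=True)
        try:
            ws = wb[sheet_name]
            if any(value == marker
                   for row in ws.iter_rows(values_only=True)
                   for value in row):
                flagged.append(name)
        finally:
            wb.close()  # read-only workbooks keep the file open until closed
    return flagged


def update_master(folder, flagged, master_name='master.xlsx',
                  sheet_name='Sheet1', marker='yes'):
    """Step 2: copy marker cells from the flagged files into the same cells of the master."""
    master_path = os.path.join(folder, master_name)
    master_wb = load_workbook(master_path)
    master_ws = master_wb[sheet_name]
    for name in flagged:
        ws = load_workbook(os.path.join(folder, name))[sheet_name]
        for row in ws.iter_rows():
            for cell in row:
                if cell.value == marker:
                    master_ws[cell.coordinate] = cell.value
    master_wb.save(master_path)  # save once, after all files are processed
```

A call such as update_master(folder, files_with_changes(folder)) would run both steps. Only the flagged files are loaded in full, and the master workbook is loaded and saved exactly once.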
To get the matching files, first:
import re
then in the first line of main() define files:
files = [f for f in os.listdir('.') if re.match(r'[A-Za-z][0-9]{5}\.xlsx', f)]
Note: I changed your regex pattern on the assumption that the filenames of interest are one letter (upper or lower case) followed by 5 digits followed by '.xlsx'.
Hope that helps

Iteration error writing file to excel with python

import string
import xlrd
import xlsxwriter
workbook = xlsxwriter.Workbook('C:\T\file.xlsx')
worksheet = workbook.add_worksheet()
book = open_workbook(r'C:\T\test.xls','r')
sheet = book.sheet_by_index(0)
for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        print sheet.cell(row_index,0).value
        x = sheet.cell(row_index,0).value
        worksheet.write_string(row_index,col_index,x)
workbook.close()
I'm new to Python. Here I'm trying to read data from the .xls file with xlrd and copy it to another .xlsx file through the xlsxwriter module, but the data doesn't get written to the created .xlsx sheet. Please guide me through this. Above is my exact code. Please correct me if anything is wrong.
A volley of Thanks in advance.
Your example program almost works. Mainly, the open_workbook() function needs to be prefixed with its module name (xlrd.open_workbook()), and it is better to use XlsxWriter's write() instead of write_string() unless you are sure that all the data you are reading is of a string type. Also, the program was only reading values from column 0.
Here is the same example with those changes in place. I've also prefixed the variable names with in_ and out_ to make it clearer which module is calling which method:
import xlrd
import xlsxwriter

out_workbook = xlsxwriter.Workbook('file.xlsx')
out_worksheet = out_workbook.add_worksheet()
in_workbook = xlrd.open_workbook(r'test.xls')  # the second positional argument is a logfile, not a file mode
in_worksheet = in_workbook.sheet_by_index(0)
for row_index in range(in_worksheet.nrows):
    for col_index in range(in_worksheet.ncols):
        cell_value = in_worksheet.cell(row_index, col_index).value
        out_worksheet.write(row_index, col_index, cell_value)
        print(cell_value)
out_workbook.close()
