How do I write to an existing excel file using xlwt and keep the formatting? - excel

I am trying to change a number in an Excel file (and eventually in multiple Excel files by putting it in a loop). I want to edit the file and save it as a new file, which I have done successfully. The problem is that the new file I save loses all of the formatting that the old Excel file had. Correct me if I'm wrong, but I can't use openpyxl because it only works for .xlsx files (all of the files I'm working with are .xls).
I've looked into pandas but was unsuccessful in finding a solution. I'm most familiar with xlrd and xlwt, but am willing to try any other libraries if it solves the problem.
import xlrd
from xlutils.copy import copy
from xlrd import *
# To open Workbook
loc = (r"X:\Projects\test.xls")
wb = xlrd.open_workbook(loc)
dose = wb.sheet_by_index(13)
manu = dose.cell_value(4,3)
#writing 675
w = copy(open_workbook(loc))
if manu == "Hologic":
    w.get_sheet(13).write(5,3,675)
w.save('book2.xls')
Again, the code works without any errors. But the new .xls file has no formatting. The formatting is crucial for this project, so I can't lose any of it.

You probably can't.
Microsoft created xlsx files for a reason: the classic xls format is a legacy binary file piling up hundreds, maybe thousands, of features, each represented in differing ways (and the file format was not even openly documented back then; I don't know if it is now). So there is one app that can open an xls file and guarantee to present what is there with all the features intended by the file creator: Excel. And the same Excel version that created the file, at that.
So, any open library that can write to xls will create the most basic files, with no formatting - and be lucky if it can parse out the content parts.
xlsx files on the other hand use conforming xml files internally, and even a program that does not care to know about the full specs can change information in the file and preserve formatting and other things simply by not touching anything it does not know about, and assembling a valid xml again.
That said, if you can't convert to xlsx, maybe the easier thing to do is use Python to drive Excel itself to make the changes for you, in an automated way.
The documentation for that is sparse, but it is possible by using pywin32 and the "COM" API - take a look here for a start: https://pbpython.com/windows-com.html
Another option is using LibreOffice - it can read and write xls files with formatting (though surely with losses), and is scriptable in Python. Unfortunately, the information on how to script LibreOffice using Python to do that is also hard to find, and the legacy option of using their "UNO" thing to enable interaction with Python makes its use complicated.
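Before going down either of those routes, it may be worth one more try with the libraries from the question. xlrd's open_workbook accepts a formatting_info flag, and when it is set, xlutils.copy carries most cell formatting over to the copy. A minimal sketch, assuming xlrd, xlwt and xlutils are installed; the function name and its parameters are made up for illustration, and how much of a given workbook survives is something to check by hand:

```python
def set_cell_keep_formatting(src_path, dst_path, sheet_index, row, col, value):
    """Copy an .xls workbook, change one cell, and save it under a new name."""
    # Third-party imports live inside the function so the sketch can be
    # loaded even on a machine where the libraries are not installed.
    import xlrd
    from xlutils.copy import copy

    # formatting_info=True makes xlrd load fonts, borders, fills and number
    # formats, so that xlutils.copy can carry them over to the xlwt workbook.
    rb = xlrd.open_workbook(src_path, formatting_info=True)
    wb = copy(rb)

    # Caveat: write() without an explicit style gives this one cell xlwt's
    # default style; the rest of the workbook keeps its formatting.
    wb.get_sheet(sheet_index).write(row, col, value)
    wb.save(dst_path)
```

With the values from the question, the call would be set_cell_keep_formatting(r"X:\Projects\test.xls", "book2.xls", 13, 5, 3, 675). The flag only applies to .xls files, which is what the question is about.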

Related

Excel behaves strangely with XLSX file created manually

Based on knowledge gained through working with the OpenXML SDK, I have implemented an Excel generator in JS (using TypeScript with ReactJS and a custom JSX factory generating plain XML). The files generated open fine in Excel and one can also edit and save them fine in Excel, no errors.
However, if one tries to copy cells (even a single one) from such a generated Excel file to another worksheet in the same Excel instance, it fails with the error "The command cannot be used on multiple selections.". Just saving and reopening the file is enough to fix the problem. Copying to other applications (e.g. Notepad) works fine.
It seems that this particular error is shown by Excel in several edge cases where the data does not exactly meet the expected format; for instance, I found reports of it happening when a sheet is hidden while being manipulated via VBA. However, in my case I'm not sure what could be causing the issue.
Just saving the file in Excel unfortunately significantly alters its parts, so that I couldn't get a meaningful diff out of it. I did not see what could be causing the problem. Maybe someone has some experience with the internals of Excel?
To get a sample file, copy the following into your browser address bar and save it as xlsx file:
data:application/vndopenxmlformats-officedocumentspreadsheetmlsheet;base64,UEsDBBQAAAgIAAAAAAA69A4d5wAAAGYBAAAPAAAAeGwvd29ya2Jvb2sueG1sjZA9T8MwEIZ3JP7DyTt1AAmhKEkXBOqCMgC7Y1+SU/0R3bktPx+3ocxM9/k+9+qa7XfwcEQWSrFV95tKAUabHMWpVZ8fr3fPCrbd7U1zSrwfUtpDEURp1ZzzUmstdsZgZJMWjGUyJg4ml5InLQujcTIj5uD1Q1U96WAoqpVQ838YaRzJ4kuyh4AxrxBGb3KxKzMtoro/Zz3rrrmck98Ikk3GVh1JaPCoIJpQyi/CE/SHwZNdQZBGeGOcEpOJoOAi3rnyDwVcU0l45x5Vwesr3+FIEd17AUrpW+Ntz3AOZ1112b0a634AUEsDBBQAAAgIAAAAAAD2SCbhNwEAAMYCAAANAAAAeGwvc3R5bGVzLnhtbJ1STWvDMAy9D/YfjO+rk8DGGEl6KAR22aUd7OokSmvwF7Zbkv36yXFK20EZ7GJJz++9KLLK9agkOYHzwuiK5quMEtCd6YXeV/Rz1zy9UrKuHx9KHyYJ2wNAICjRvqKHEOwbY747gOJ+ZSxovBmMUzxg6fbMWwe891GkJCuy7IUpLjStS31UjQqedOaoQ0UzyupyMPqCFDQBSOUKyInLim64FK0TkevF9wLmeaxZ4qazvQBz8HghpLz1RqAuLQ8BnG6wIEu+myxUVBsNyGJXhGg2hz+ke8envHi+p54DdtQa1+PYzz3l9Awh5VeC7A6k3MYX+BpuJONA0ijf+zhFEn/3nOJ3ljTZpCLO5toteV/ZFv+yJdxaOX0cVQuumVcgNcjG4b5dftduEbKlt7rsx+F2W9hlIesfUEsDBBQAAAgIAAAAAABh+IC4iAEAAGIDAAAYAAAAeGwvd29ya3NoZWV0cy9zaGVldDEueG1shZNNT8MwDIbvSPyHKCc4sGzdxsfUFsEQEhJCSOPjnKXeFtEkVWLY4NfjtKUaaBqXyLXzPn7jpOnlxpTsA3zQzmZ80OtzBla5Qttlxp+fbk/OObvMDw/StfNvYQWAjBQ2ZHyFWE2ECGoFRoaeq8BSZeG8kUiffilC5UEWtciUIun3T4WR2vI8rXMvGtZhK2Yo5zMoQSEUZIWz2HLu3Fss3lGqT8pKWmCbWVVqjAn22Ya0HV11DwucQllm/CrhTCrUH/BIiozPHaIzsc5ZQImUWnj3BZYLslB3pQmw6u9mqorO4XZMzpUrm5UZbWsLRm4a57rAFUXD3tlglJyPu5Uz9R6I/dpuiPwOkLSApAOMkz+A8V7AsAUMtwFb3Wnd72DUAkYd4CLZJRDN0eth3EiUeerdmvn67IFmSM9jMBnRdamYvIrZ5ooyrm2pLczQU1UTA3MCADt6fpoepwKJHbNCtdrr/dqpK2CHarpf9SDNLtXNPz4hKK+r+E5+iwUd/udpNNMQ3d+SfwNQSwMEFAAACAgAAAAAAI86L6y8AAAAmQEAABoAAAB4bC9fcmVscy93b3JrYm9vay54bWwucmVsc7WQSwrCMBBA94J3CLO3qQoiYupGBLdSDxDSaRvaJiETP729KYJacOHG1TC/N4/Z7u5dy67oSVsjYJ6kwNAoW2hTCTjnh9ka2C6bTrYnbGWIQ1RrRyxuGRJQh+A2nJOqsZOUWIcmdkrrOxli6ivupGpkhXyRpivuPxmQjZgs7x3+QrRlqRXurbp0aMIXMKfQt0jAcukrDAKeeRI5wI6FAH8sFsD/dv5mfUM1YngbvEpRbgjzkcxykOGjB2cPUEsDBBQAAAgIAAAAAABja/EoqQAAABkBAAALAAAAX3JlbHMvLnJlbHONz7EKwjAQBuBd8B3C7Tatg4g07SJCV6kPENNrGtrmQhK1vr0ZVRwcf+6/D/6yXuaJ3dEHQ1ZAkeXA0CrqjNUCLu1pswdWV+tVecZJxlQKg3GBp
S8bBAwxugPnQQ04y5CRQ5suPflZxhS95k6qUWrk2zzfcf9uQPVhsvbp8B+R+t4oPJK6zWjjD/irAayVXmMUsEz8QX68Eo1ZQoE1nQDfdAXwquQfA6sXUEsDBBQAAAgIAAAAAAAUVUFPBQEAAJkCAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbK2Sv07DMBDGdyTewfJaxU4ZEEJJOkA7AkN5AONcEiv+J59b0rfHcQsDKmXpdLLv+77fneVqNRlN9hBQOVvTJSspAStdq2xf0/ftpnigZNXc3lTbgwckSW2xpkOM/pFzlAMYgcx5sKnTuWBETMfQcy/kKHrgd2V5z6WzEWws4pxBm+oZOrHTkayndH0kB9BIydNROLNqKrzXSoqY+nxv21+U4kRgyZk1OCiPiySg/Cxh7vwNOPle01ME1QJ5EyG+CJNUfNL804Xxw7mRXQ45M6XrOiWhdXJnkoWhDyBaHACi0SxXZoSyi8t8jAcNeG16Dv2HPG+eDchzWV55iJ/87zl4/mjNF1BLAQIUABQAAAgIAAAAAAA69A4d5wAAAGYBAAAPAAAAAAAAAAAAAAAAAAAAAAB4bC93b3JrYm9vay54bWxQSwECFAAUAAAICAAAAAAA9kgm4TcBAADGAgAADQAAAAAAAAAAAAAAAAAUAQAAeGwvc3R5bGVzLnhtbFBLAQIUABQAAAgIAAAAAABh+IC4iAEAAGIDAAAYAAAAAAAAAAAAAAAAAHYCAAB4bC93b3Jrc2hlZXRzL3NoZWV0MS54bWxQSwECFAAUAAAICAAAAAAAjzovrLwAAACZAQAAGgAAAAAAAAAAAAAAAAA0BAAAeGwvX3JlbHMvd29ya2Jvb2sueG1sLnJlbHNQSwECFAAUAAAICAAAAAAAY2vxKKkAAAAZAQAACwAAAAAAAAAAAAAAAAAoBQAAX3JlbHMvLnJlbHNQSwECFAAUAAAICAAAAAAAFFVBTwUBAACZAgAAEwAAAAAAAAAAAAAAAAD6BQAAW0NvbnRlbnRfVHlwZXNdLnhtbFBLBQYAAAAABgAGAIABAAAwBwAAAAA=
Well, I don't know the particulars of how you are generating the xml file, but I can tell you how to edit the underlying xml files so that it will work, and then perhaps you can figure out how to use your implementation to change the property that's gunking things up.
First, an xlsx is a set of xml files. I'm sure you know that, but I'm just starting at the beginning. You can change the extension to zip and then extract the files, and then rezip them and change the extension back to xlsx.
So do this:
take the generated xlsx
change the extension to .zip
extract the files
find xl\worksheets\sheet1.xml
open it and find this property: worksheet>sheetViews>sheetView:tabSelected
set it to 0
save the file
go back to the unzipped folder
select all files and send to zip
change the extension on the new zip file to .xlsx
You should now be able to open the newly created xlsx, add a new sheet, and copy freely.
If this works for you, then you have diagnosed the problem, one property set to true when it shouldn't be, and it should be relatively simple for you to modify your export procedure.
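The manual steps above can also be scripted for a quick experiment. This is only a sketch using Python's standard zipfile module; the function name is made up, and the text substitution assumes the attribute is written as tabSelected="1" or tabSelected="true" in the sheet part:

```python
import io
import re
import zipfile


def clear_tab_selected(xlsx_bytes, sheet_part="xl/worksheets/sheet1.xml"):
    """Return a copy of the xlsx package with tabSelected set to 0 in one sheet."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == sheet_part:
                xml = data.decode("utf-8")
                # Crude text substitution; enough for a diagnostic experiment.
                xml = re.sub(r'tabSelected="(?:1|true)"', 'tabSelected="0"', xml)
                data = xml.encode("utf-8")
            # Every other part is copied through unchanged.
            dst.writestr(item.filename, data)
    return out.getvalue()
```

Read the generated file as bytes, pass it through clear_tab_selected, and write the result to a new .xlsx. Note that part names inside the package use forward slashes, unlike the Windows path shown in the steps.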
I've had this issue multiple times in the past.
The way I solved it was by filling out (populating) a template (file, previously created in Office) with the exported data rather than generating a file from scratch. Office unfortunately does not fully comply with OpenXML, and for more complex exports you might even be unable to open the file.
I would also recommend Beyond Compare (from Scooter Software) for comparing the two files instead of just doing a diff.

Update linked excel path in PowerPoint via Python

I want to automate creating a PowerPoint presentation by linking template charts to some Excel files. Updating the Excel file values changes the PowerPoint slides automatically. I have created my PowerPoint template and linked its charts to sample Excel file data.
I want to send the folder with the PowerPoint and Excel files to someone else, but this will break the links to the Excel files because the path changes (the paths are not relative). I can edit the paths manually via the "edit links to files" option under the File menu, but this is tedious because there are numerous charts linked to multiple files.
I want to update the links via Python code using the python-pptx package.
Please help!
There's no API support for this in the current version of python-pptx.
You would need to modify the underlying XML directly, perhaps using python-pptx internals as a starting point and using lxml calls on the appropriate element objects. If you search on "python-pptx workaround function" you will find some examples.
Another thing to consider is modifying the XML by cruder but still possibly effective means by accessing the XML files in the .pptx package directly (the .pptx file is a Zip archive of largely XML files) and using regular expressions or perhaps a command line tool like sed or awk to do simple text substitution.
Either way you're going to need to want it pretty badly, depending on your Python skill level. You'll also of course need to discover just which strings in which parts of the XML are the ones that need changing. opc-diag can be helpful for that, but it's a bit of detective work even with the best tools.
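To make the cruder route concrete, here is a minimal sketch of the text-substitution idea using only the standard library. It assumes, and this is exactly the detective work mentioned above, that the linked workbook paths are stored as external relationship targets in the package's .rels parts; the function name and both prefix arguments are hypothetical:

```python
import io
import zipfile


def retarget_links(pptx_bytes, old_prefix, new_prefix):
    """Return a copy of the .pptx package with old_prefix replaced in .rels parts."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Only relationship parts are touched; slides, charts and media
            # are copied through byte for byte.
            if item.filename.endswith(".rels"):
                text = data.decode("utf-8")
                data = text.replace(old_prefix, new_prefix).encode("utf-8")
            dst.writestr(item.filename, data)
    return out.getvalue()
```

Paths inside the XML may be URL-encoded or XML-escaped (for example %20 for a space), so the prefixes have to match what is actually in the file, not what the "edit links to files" dialog shows. Work on a copy of the presentation.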

File that normally opens our application, but will fall back to Excel

Our application exports snippets of databases in XLSX format. We wrote our own code on top of System.Packaging as it is many (many!) times faster than using the Excel objects.
Right now we save these files with a .xlsx extension, and that works OK. However, it would be much nicer if double-clicking one of these opened our app instead, but fell back to Excel on machines without it.
I know that SpreadsheetML has a feature to do this. If you insert this near the top of the file:
<?mso-application progid="Excel.Sheet"?>
some sort of magic occurs that causes Excel to open on Win machines. While this might work in SML files, it does not appear to work in "real" xlsx files - I tried adding this line to various parts of the workbook structure but it remained unrecognized.
So is there a similar mechanism we can use in "true" XLSX files generated by System.Packaging? Or some other Windows mechanism we should use in these situations?

End user difference between .xls and .xlsb?

I'm using the TransferSpreadsheet command to export Access queries to an Excel file in a folder. I realize I can specify a file name (with extension) for it to create, but I decided to experiment and left out a file name in the destination path. The result was an Excel file with the query name saved as a .xlsb file.
I'd never heard of this, but it opened fine and after research I found that it is more compact and quicker to open/save/close than traditional .xls. Great! These exported Excel files will be opened by potentially 20-25 users, each of whom has one of Excel 03, 07, or 10. For flexibility's sake, I would prefer to export the query without defining a file name.
Is .xlsb compatible with all of these? If so, is there any reason to not use this format? Can the end user format, modify, or otherwise tinker with a .xlsb file as though it was .xls?
.xlsb was introduced in Excel 2007 alongside .xlsx and .xlsm. All three formats use the OPC standard and are conceptually similar (whereas .xls, while also a binary format, is quite different: for example, it uses an OLE container format rather than zip).
.xlsb is not compatible with .xls, and as far as I can tell there are no open source tools that can write XLSB. The j tool (available on npm at https://npmjs.org/package/j) appears to parse XLSB. If you expect others to use their own tools (not Excel), then you are better off sticking with XLS. However, if Excel is part of the workflow, then XLSB is a compelling option.
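The container difference is easy to see from the first few bytes of a file, whatever its extension says. A small sketch in Python; the function and constant names are made up, while the two signatures are the standard ones for OLE compound files and Zip archives:

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file: .xls
ZIP_MAGIC = b"PK\x03\x04"                        # Zip/OPC package: .xlsx, .xlsm, .xlsb


def container_kind(header):
    """Classify a spreadsheet by the first bytes of the file."""
    if header.startswith(OLE2_MAGIC):
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

Reading the first eight bytes of a file opened in binary mode and passing them to container_kind is enough. It tells .xls apart from the three newer formats, but not the newer formats from each other, since they share the zip container.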

use OpenOffice Calc to open Excel files and convert to CSV or Tab-delimited

Is there any type of automation available where I can use OpenOffice Calc to open Excel files and convert them to CSV or tab-delimited files?
I'm currently using PHPExcel to open the files and iterate through them and import each row into a database but have begun to run into memory issues with large files and need another alternative.
These are xls and xlsx files so it has to work for all of them.
If there is, how would I go about programming this in PHP?
If you have other alternatives, please feel free to suggest them.
OpenOffice can be run in server mode and used to convert files between a number of supported formats.
I have used this mainly with Java through the JODConverter library, available at http://www.artofsolving.com/opensource/jodconverter
A quick web search brought up http://sourceforge.net/projects/phopo-org/ which claims to be a PHP implementation.
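If a full server setup is more than you need, a one-shot command-line conversion may be enough. The sketch below is in Python rather than PHP and assumes a LibreOffice install whose soffice binary is on the PATH and accepts --headless, --convert-to and --outdir; whether your OpenOffice build takes the same flags is something to check. The function names are made up:

```python
import subprocess


def build_convert_command(src_path, out_dir, target="csv", soffice="soffice"):
    """Build the command line for a headless spreadsheet conversion."""
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", out_dir, src_path]


def convert(src_path, out_dir, target="csv"):
    """Run the conversion; raises CalledProcessError if soffice reports failure."""
    subprocess.run(build_convert_command(src_path, out_dir, target), check=True)
```

The same command line could be launched from PHP with exec. As far as I know the CSV export only covers one sheet per workbook, so multi-sheet files may need extra handling.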

Resources