Update linked excel path in PowerPoint via Python - python-3.x

I want to automate creating of a powerpoint ppt via linking template charts to some Excel files. Updating the excel file values changes the powerpoint slides automatically. I have created my powerpoint template and linked charts to sample excel files data.
I want to send the folder with the powerpoint and excel files to someone else. But this will break the link to excel files due to change in the path. (As path is not relative). I can edit the paths manually by going under the "edit links to files" option under File Menu but this is tedious as charts are numerous with multiple files.
I want to update the same via Python code using the Python-Pptx package.
Please help!

There's no API support for this in the current version of python-pptx.
You would need to modify the underlying XML directly, perhaps using python-pptx internals as a starting point and using lxml calls on the appropriate element objects. If you search on "python-pptx workaround function" you will find some examples.
Another thing to consider is modifying the XML by cruder but still possibly effective means by accessing the XML files in the .pptx package directly (the .pptx file is a Zip archive of largely XML files) and using regular expressions or perhaps a command line tool like sed or awk to do simple text substitution.
Either way you're going to need to want it pretty badly, depending on your Python skill level. You'll also of course need to discover just which strings in which parts of the XML are the ones that need changing. opc-diag can be helpful for that, but it's a bit of detective work even with the best tools.

Related

Excel behaves strange with XSLX file created manually

Based on knowledge gained through working with the OpenXML SDK, I have implemented an Excel generator in JS (using TypeScript with ReactJS and a custom JSX factory generating plain XML). The files generated open fine in Excel and one can also edit and save them fine in Excel, no errors.
However, if one tries to copy cells (even a single one) from such a generated Excel file to another worksheet in the same Excel instance, it fails with the error "The command cannot be used on multiple selections.". Just saving and reopening the file is enough to fix the problem. Copying to other applications (e.g. Notepad) works fine.
It seems that this particular error is shown by Excel in several edge cases where the data is not exactly meet the expected format, for instance I found reports of that happening when a sheet is hidden when manipulating it via VBA. However, in my case I'm not sure what could be causing the issue.
Just saving the file in Excel unfortunately significantly alters its parts, so that I couldn't get a meaningful diff out of it. I did not see what could be causing the problem. Maybe someone has some experience with the internals of Excel?
To get a sample file, copy the following into your browser address bar and save it as xlsx file:
data:application/vndopenxmlformats-officedocumentspreadsheetmlsheet;base64,UEsDBBQAAAgIAAAAAAA69A4d5wAAAGYBAAAPAAAAeGwvd29ya2Jvb2sueG1sjZA9T8MwEIZ3JP7DyTt1AAmhKEkXBOqCMgC7Y1+SU/0R3bktPx+3ocxM9/k+9+qa7XfwcEQWSrFV95tKAUabHMWpVZ8fr3fPCrbd7U1zSrwfUtpDEURp1ZzzUmstdsZgZJMWjGUyJg4ml5InLQujcTIj5uD1Q1U96WAoqpVQ838YaRzJ4kuyh4AxrxBGb3KxKzMtoro/Zz3rrrmck98Ikk3GVh1JaPCoIJpQyi/CE/SHwZNdQZBGeGOcEpOJoOAi3rnyDwVcU0l45x5Vwesr3+FIEd17AUrpW+Ntz3AOZ1112b0a634AUEsDBBQAAAgIAAAAAAD2SCbhNwEAAMYCAAANAAAAeGwvc3R5bGVzLnhtbJ1STWvDMAy9D/YfjO+rk8DGGEl6KAR22aUd7OokSmvwF7Zbkv36yXFK20EZ7GJJz++9KLLK9agkOYHzwuiK5quMEtCd6YXeV/Rz1zy9UrKuHx9KHyYJ2wNAICjRvqKHEOwbY747gOJ+ZSxovBmMUzxg6fbMWwe891GkJCuy7IUpLjStS31UjQqedOaoQ0UzyupyMPqCFDQBSOUKyInLim64FK0TkevF9wLmeaxZ4qazvQBz8HghpLz1RqAuLQ8BnG6wIEu+myxUVBsNyGJXhGg2hz+ke8envHi+p54DdtQa1+PYzz3l9Awh5VeC7A6k3MYX+BpuJONA0ijf+zhFEn/3nOJ3ljTZpCLO5toteV/ZFv+yJdxaOX0cVQuumVcgNcjG4b5dftduEbKlt7rsx+F2W9hlIesfUEsDBBQAAAgIAAAAAABh+IC4iAEAAGIDAAAYAAAAeGwvd29ya3NoZWV0cy9zaGVldDEueG1shZNNT8MwDIbvSPyHKCc4sGzdxsfUFsEQEhJCSOPjnKXeFtEkVWLY4NfjtKUaaBqXyLXzPn7jpOnlxpTsA3zQzmZ80OtzBla5Qttlxp+fbk/OObvMDw/StfNvYQWAjBQ2ZHyFWE2ECGoFRoaeq8BSZeG8kUiffilC5UEWtciUIun3T4WR2vI8rXMvGtZhK2Yo5zMoQSEUZIWz2HLu3Fss3lGqT8pKWmCbWVVqjAn22Ya0HV11DwucQllm/CrhTCrUH/BIiozPHaIzsc5ZQImUWnj3BZYLslB3pQmw6u9mqorO4XZMzpUrm5UZbWsLRm4a57rAFUXD3tlglJyPu5Uz9R6I/dpuiPwOkLSApAOMkz+A8V7AsAUMtwFb3Wnd72DUAkYd4CLZJRDN0eth3EiUeerdmvn67IFmSM9jMBnRdamYvIrZ5ooyrm2pLczQU1UTA3MCADt6fpoepwKJHbNCtdrr/dqpK2CHarpf9SDNLtXNPz4hKK+r+E5+iwUd/udpNNMQ3d+SfwNQSwMEFAAACAgAAAAAAI86L6y8AAAAmQEAABoAAAB4bC9fcmVscy93b3JrYm9vay54bWwucmVsc7WQSwrCMBBA94J3CLO3qQoiYupGBLdSDxDSaRvaJiETP729KYJacOHG1TC/N4/Z7u5dy67oSVsjYJ6kwNAoW2hTCTjnh9ka2C6bTrYnbGWIQ1RrRyxuGRJQh+A2nJOqsZOUWIcmdkrrOxli6ivupGpkhXyRpivuPxmQjZgs7x3+QrRlqRXurbp0aMIXMKfQt0jAcukrDAKeeRI5wI6FAH8sFsD/dv5mfUM1YngbvEpRbgjzkcxykOGjB2cPUEsDBBQAAAgIAAAAAABja/EoqQAAABkBAAALAAAAX3JlbHMvLnJlbHONz7EKwjAQBuBd8B3C7Tatg4g07SJCV6kPENNrGtrmQhK1vr0ZVRwcf+6/D/6yXuaJ3dEHQ1ZAkeXA0CrqjNUCLu1pswdWV+tVecZJxlQKg3GBpS8bBAwxugPnQQ04y5CRQ5suPflZxhS95k6qUWrk2zzfcf9uQPVhsvbp8B+R+t4oPJK6zWjjD/irAayVXmMUsEz8QX68Eo1ZQoE1nQDfdAXwquQfA6sXUEsDBBQAAAgIAAAAAAAUVUFPBQEAAJkCAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbK2Sv07DMBDGdyTewfJaxU4ZEEJJOkA7AkN5AONcEiv+J59b0rfHcQsDKmXpdLLv+77fneVqNRlN9hBQOVvTJSspAStdq2xf0/ftpnigZNXc3lTbgwckSW2xpkOM/pFzlAMYgcx5sKnTuWBETMfQcy/kKHrgd2V5z6WzEWws4pxBm+oZOrHTkayndH0kB9BIydNROLNqKrzXSoqY+nxv21+U4kRgyZk1OCiPiySg/Cxh7vwNOPle01ME1QJ5EyG+CJNUfNL804Xxw7mRXQ45M6XrOiWhdXJnkoWhDyBaHACi0SxXZoSyi8t8jAcNeG16Dv2HPG+eDchzWV55iJ/87zl4/mjNF1BLAQIUABQAAAgIAAAAAAA69A4d5wAAAGYBAAAPAAAAAAAAAAAAAAAAAAAAAAB4bC93b3JrYm9vay54bWxQSwECFAAUAAAICAAAAAAA9kgm4TcBAADGAgAADQAAAAAAAAAAAAAAAAAUAQAAeGwvc3R5bGVzLnhtbFBLAQIUABQAAAgIAAAAAABh+IC4iAEAAGIDAAAYAAAAAAAAAAAAAAAAAHYCAAB4bC93b3Jrc2hlZXRzL3NoZWV0MS54bWxQSwECFAAUAAAICAAAAAAAjzovrLwAAACZAQAAGgAAAAAAAAAAAAAAAAA0BAAAeGwvX3JlbHMvd29ya2Jvb2sueG1sLnJlbHNQSwECFAAUAAAICAAAAAAAY2vxKKkAAAAZAQAACwAAAAAAAAAAAAAAAAAoBQAAX3JlbHMvLnJlbHNQSwECFAAUAAAICAAAAAAAFFVBTwUBAACZAgAAEwAAAAAAAAAAAAAAAAD6BQAAW0NvbnRlbnRfVHlwZXNdLnhtbFBLBQYAAAAABgAGAIABAAAwBwAAAAA=
Well, I don't know the particulars of how you are generating the xml file, but I can tell you how to edit the underlying xml files so that it will work, and then perhaps you can figure out how to use your implementation to change the property that's gunking things up.
First, an xlsx is a set of xml files. I'm sure you know that, but I'm just starting at the beginning. You can change the extension to zip and then extract the files, and then rezip them and change the extension back to xlsx.
So do this:
take the generated xlsx
change the extension to .zip
extract the files
find xl\worksheets\sheet1.xml
open it and find this property: worksheet>sheetViews>sheetView:tabSelected
set it to 0
save the file
go back to the unzipped folder
select all files and send to zip
change the extension on the new zip file to .xlsx
You should now be able to open the newly created xlsx, add a new sheet, and copy freely.
If this works for you, then you have diagnosed the problem, one property set to true when it shouldn't be, and it should be relatively simple for you to modify your export procedure.
I've had this issue multiple times in the past.
The way I solved it was by filling out (populating) a template (file, previously created in Office) with the exported data rather than generating a file from scratch. Office unfortunately does not fully comply with OpenXML, and for more complex exports you might even be unable to open the file.
I would also recommend Beyond Compare (now Scooter Software) for comparing the two files instead of just doing a diff.

How do I write to an existing excel file using xlwt and keep the formatting?

I am trying to change a number in an excel file (and eventually multiple excel files by putting it in a loop). I want to edit the file and save it as a new file, which I have done successfully. The problem is the new file that I save strips all of the formatting that the old excel file had. Correct me if I'm wrong: but I can't use Openpyxl because it only works for .xlsx files (all of the files I'm working with are .xls).
I've looked into pandas but was unsuccessful in finding a solution. I'm most familiar with xlrd and xlwt, but am willing to try any other libraries if it solves the problem.
import xlrd
from xlutils.copy import copy
from xlrd import *
# To open Workbook
loc = (r"X:\Projects\test.xls")
wb = xlrd.open_workbook(loc)
dose = wb.sheet_by_index(13)
manu = dose.cell_value(4,3)
#writing 675
w = copy(open_workbook(loc))
if manu == "Hologic":
w.get_sheet(13).write(5,3,675)
w.save('book2.xls')
Again, the code works without any errors. But the new .xls file has no formatting. The formatting is crucial for this project, so I can't lose any of it.
You probably can't.
Microsoft created xlsx files for a reason: the classic xls format is a legacy binary file piling up hundreds, maybe thousands, of features, each reprented in differing ways (and the file format was not even openly documented back then, I don't know if it is now). So there is one app that can open a xls file and guarrantee to present what is there with all the features intended by the file creator: Excel. And the same Excel version that created the file, in that.
So, any open library that can write to xls will create the most basic files, with no formatting - and be lucky if it can parse out the content parts.
xlsx files on the other hand use conforming xml files internally, and even a program that does not care to know about the full specs can change information in the file and preserve formatting and other things simply by not touching anything it does not know about, and assembling a valid xml again.
That said, if you can't convert to xlsx, maybe the easier thing to do is use Python to drive Excel itself to make the changes for you, in an automated way.
The documentation for that is few and far apart, but that is possible by using pywin32 and the "COM" api - take a look here for a start: https://pbpython.com/windows-com.html
Another option is using LibreOffice - it can read and write xls files with formatting (though surely with losses), and is scriptable in Python. Unfortunatelly, the information on how to script LibreOffice using Python to do that is also hard to find, and the legacy option of using their "UNO" thing to enable interaction with Python makes its use complicated.

File that normally opens our application, but will fall back to Excel

Our application exports snippets of databases in XLSX format. We wrote our own code on top of System.Packaging as it is many (many!) times faster than using the Excel objects.
Right now we save these files with a .xlsx format, and that works OK. However, it would be much nicer if double-clicking one of these opened our app instead, but failed back to Excel on machines without it.
I know that SpreadsheetML has a feature to do this. If you insert this near the top of the file:
<?mso-application progid=""Excel.Sheet""?>
some sort of magic occurs that causes Excel to open on Win machines. While this might work in SML files, it does not appear to work in "real" xlsx files - I tried adding this line to various parts of the workbook structure but it remained unrecognized.
So is there a similar mechanism we can use in "true" XLSX files generated by System.Packaging? Or some other Windows mechanism we should use in these situations?

CSV to Excel Doc?

I was curious what the community thinks is the easiest way to take a CSV file and 'save as' a Excel document with only a couple formulas pasted in?
I am trying to do this behind the scenes, and not physically navigating. e.g. opening, selecting save As, etc -- even though this is already VERY simple I **need to do this in code (Think automation)
Background: I have a c++ command line program generating the .csv, and a C# GUI starting this process. Either programs could hold the code, but I figure this is easiest in C# (InterOp?) The reasons I don't directly send code into the csv is because of the amount of comma characters that will mess up the csv and because other Excel documents need to reference the sheets so they need to be in .xls format.
=AVERAGE(C2:C999)
=COUNTIFS(C:C,">0",C:C,"<31")
=COUNTIFS(C:C,">31",C:C,"<55")
=COUNTIF(C:C,">55")
Have a look and see whether command-line scipting of openoffice will do the job. It can do quite a lot of conversions very easily. Otherwise there are a lot of Excel-producing libraries, for example PHPExcel, but you'd need to wrap some programming around them.

Is this possible in Excel: Open XLS via commandline, OnLoad import CSV data, Print as PDF, Close Doc?

Thinking that to solve a problem I've got this is the fastest solution:
Generate a custom CSV file on the file (this is already done via Perl).
Have a XLS document opened via commandline via a scripting language (clients already got a few Perl scripts running in this pipeline.)
Write VBA or record a macro that executes the following OnLoad:
Imports a the data from the CSV file into the report template,
Print the file via PDF driver to fixed location using data in the CSV to name the file.
Closes the XLS file.
So, is this possible via Excel macros, if not is it possible via VBA -- thanks!
NOTE: Appears I've got to have a copy of MS Office anyway, so this is much faster to get going than using Visual Studio Tools for Office (VSTO). The report template is going to be on a server, and this way the end user can build as many reports as they like, "test" by printing a PDF using a demo CSV file, and import/embed the marco or VBA when they're done. I'd looked in Jasper Reports, but the end user is putting ad-hoc static text and groupings all over the report and I figure this way they can build reports how ever they want and then automate them. Both of these questions by me and the resulting comments/feedback are related to this question:
In Excel, is it possible to automate reading of CSV data into a template and printing it to PDF from the commandline?
Is it possible to deploy a VB application made in Excel as a stand alone app?
FOCUS OF QUESTION: Again, focus of the question is if this is possible via Excel marcos, if not macros VBA, and if there's any huge issue with this approach; for example, I know this is going to be "slow" since Excel would be loaded per job, but there's 16GB of ram on the server and it's not used at all. Figure since I've got to have a copy of office on the server anyway, this is a much faster approach.
If you've got any questions, let me know via comments.
I suppose you could launch the report file from perl and then have a macro inside the report file automatically look for the newest csv file to import. Then you could process and output. So you just need to launch the proper excel file with the embedded macros from perl and then let excel and VBA take over.

Resources