My requirement is: I have been given an excel (the user uploads it to our server) and then my program should automatically add a macros code (defined in a text file maybe) to the excel file and then send it back to the user. I found a similar question but the solution only works in Windows but since our server is Linux based, I haven't found a way to do so.
Link to the similar question: Use Python to Inject Macros into Spreadsheets
Assuming you're being sent a file in xlsm format, you need to following capabilities:
Open the file as a zip file
Locate the .bin part path from the rels files - see Microsoft Open Packaging Conventions
Locate and open the VBA project's .bin stream
parse the .bin stream as a Compound Binary File Format file
Parse the binary streams that describe and list the module contents of the file, as documented in Office VBA File Format Structure
Add your module text as a new stream, and update the files from step 5 with the new contents.
It's not a small undertaking. The work has already been done in Python, and a lot of the libraries for working with zip files and compound binary format files are already in .NET for Windows. Otherwise, as far as I'm aware, there aren't any other pre-built tools, other than the tools from aspose
Related
Is there a loop I can use to create .xlsx files from .bracken files I currently have and channel them into an output folder?
All that I have now is to convert my .bracken files into .xlsx files using this code cat MG-ABCD12345-0.genus.bracken > MG-ABCD12345-0_genus_bracken.xlsx and files are going into my current working directory. I would like the output in a folder called bracken_excel_files which is located within my current working directory. I would prefer to use common commands such as for for the loop for easier understanding.
bracken appears to be generating/sharing the same output format as kraken, which I saw somewhere to be tab-delimited fields.
If that is true, then that is the essence of what CSV files are.
In that case, you don't need to use the "cat" command (CPU and I/O consuming). You simply need to rename the file with a ".csv" suffix (to make the file format explicitly visible for others), then import that into Excel or OpenOffice/LibreOffice Calc. Each of those tools offer different options for interpreting the input when you use the "Import" function to open the files.
Based on knowledge gained through working with the OpenXML SDK, I have implemented an Excel generator in JS (using TypeScript with ReactJS and a custom JSX factory generating plain XML). The files generated open fine in Excel and one can also edit and save them fine in Excel, no errors.
However, if one tries to copy cells (even a single one) from such a generated Excel file to another worksheet in the same Excel instance, it fails with the error "The command cannot be used on multiple selections.". Just saving and reopening the file is enough to fix the problem. Copying to other applications (e.g. Notepad) works fine.
It seems that this particular error is shown by Excel in several edge cases where the data is not exactly meet the expected format, for instance I found reports of that happening when a sheet is hidden when manipulating it via VBA. However, in my case I'm not sure what could be causing the issue.
Just saving the file in Excel unfortunately significantly alters its parts, so that I couldn't get a meaningful diff out of it. I did not see what could be causing the problem. Maybe someone has some experience with the internals of Excel?
To get a sample file, copy the following into your browser address bar and save it as xlsx file:
data:application/vndopenxmlformats-officedocumentspreadsheetmlsheet;base64,UEsDBBQAAAgIAAAAAAA69A4d5wAAAGYBAAAPAAAAeGwvd29ya2Jvb2sueG1sjZA9T8MwEIZ3JP7DyTt1AAmhKEkXBOqCMgC7Y1+SU/0R3bktPx+3ocxM9/k+9+qa7XfwcEQWSrFV95tKAUabHMWpVZ8fr3fPCrbd7U1zSrwfUtpDEURp1ZzzUmstdsZgZJMWjGUyJg4ml5InLQujcTIj5uD1Q1U96WAoqpVQ838YaRzJ4kuyh4AxrxBGb3KxKzMtoro/Zz3rrrmck98Ikk3GVh1JaPCoIJpQyi/CE/SHwZNdQZBGeGOcEpOJoOAi3rnyDwVcU0l45x5Vwesr3+FIEd17AUrpW+Ntz3AOZ1112b0a634AUEsDBBQAAAgIAAAAAAD2SCbhNwEAAMYCAAANAAAAeGwvc3R5bGVzLnhtbJ1STWvDMAy9D/YfjO+rk8DGGEl6KAR22aUd7OokSmvwF7Zbkv36yXFK20EZ7GJJz++9KLLK9agkOYHzwuiK5quMEtCd6YXeV/Rz1zy9UrKuHx9KHyYJ2wNAICjRvqKHEOwbY747gOJ+ZSxovBmMUzxg6fbMWwe891GkJCuy7IUpLjStS31UjQqedOaoQ0UzyupyMPqCFDQBSOUKyInLim64FK0TkevF9wLmeaxZ4qazvQBz8HghpLz1RqAuLQ8BnG6wIEu+myxUVBsNyGJXhGg2hz+ke8envHi+p54DdtQa1+PYzz3l9Awh5VeC7A6k3MYX+BpuJONA0ijf+zhFEn/3nOJ3ljTZpCLO5toteV/ZFv+yJdxaOX0cVQuumVcgNcjG4b5dftduEbKlt7rsx+F2W9hlIesfUEsDBBQAAAgIAAAAAABh+IC4iAEAAGIDAAAYAAAAeGwvd29ya3NoZWV0cy9zaGVldDEueG1shZNNT8MwDIbvSPyHKCc4sGzdxsfUFsEQEhJCSOPjnKXeFtEkVWLY4NfjtKUaaBqXyLXzPn7jpOnlxpTsA3zQzmZ80OtzBla5Qttlxp+fbk/OObvMDw/StfNvYQWAjBQ2ZHyFWE2ECGoFRoaeq8BSZeG8kUiffilC5UEWtciUIun3T4WR2vI8rXMvGtZhK2Yo5zMoQSEUZIWz2HLu3Fss3lGqT8pKWmCbWVVqjAn22Ya0HV11DwucQllm/CrhTCrUH/BIiozPHaIzsc5ZQImUWnj3BZYLslB3pQmw6u9mqorO4XZMzpUrm5UZbWsLRm4a57rAFUXD3tlglJyPu5Uz9R6I/dpuiPwOkLSApAOMkz+A8V7AsAUMtwFb3Wnd72DUAkYd4CLZJRDN0eth3EiUeerdmvn67IFmSM9jMBnRdamYvIrZ5ooyrm2pLczQU1UTA3MCADt6fpoepwKJHbNCtdrr/dqpK2CHarpf9SDNLtXNPz4hKK+r+E5+iwUd/udpNNMQ3d+SfwNQSwMEFAAACAgAAAAAAI86L6y8AAAAmQEAABoAAAB4bC9fcmVscy93b3JrYm9vay54bWwucmVsc7WQSwrCMBBA94J3CLO3qQoiYupGBLdSDxDSaRvaJiETP729KYJacOHG1TC/N4/Z7u5dy67oSVsjYJ6kwNAoW2hTCTjnh9ka2C6bTrYnbGWIQ1RrRyxuGRJQh+A2nJOqsZOUWIcmdkrrOxli6ivupGpkhXyRpivuPxmQjZgs7x3+QrRlqRXurbp0aMIXMKfQt0jAcukrDAKeeRI5wI6FAH8sFsD/dv5mfUM1YngbvEpRbgjzkcxykOGjB2cPUEsDBBQAAAgIAAAAAABja/EoqQAAABkBAAALAAAAX3JlbHMvLnJlbHONz7EKwjAQBuBd8B3C7Tatg4g07SJCV6kPENNrGtrmQhK1vr0ZVRwcf+6/D/6yXuaJ3dEHQ1ZAkeXA0CrqjNUCLu1pswdWV+tVecZJxlQKg3GBpS8bBAwxugPnQQ04y5CRQ5suPflZxhS95k6qUWrk2zzfcf9uQPVhsvbp8B+R+t4oPJK6zWjjD/irAayVXmMUsEz8QX68Eo1ZQoE1nQDfdAXwquQfA6sXUEsDBBQAAAgIAAAAAAAUVUFPBQEAAJkCAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbK2Sv07DMBDGdyTewfJaxU4ZEEJJOkA7AkN5AONcEiv+J59b0rfHcQsDKmXpdLLv+77fneVqNRlN9hBQOVvTJSspAStdq2xf0/ftpnigZNXc3lTbgwckSW2xpkOM/pFzlAMYgcx5sKnTuWBETMfQcy/kKHrgd2V5z6WzEWws4pxBm+oZOrHTkayndH0kB9BIydNROLNqKrzXSoqY+nxv21+U4kRgyZk1OCiPiySg/Cxh7vwNOPle01ME1QJ5EyG+CJNUfNL804Xxw7mRXQ45M6XrOiWhdXJnkoWhDyBaHACi0SxXZoSyi8t8jAcNeG16Dv2HPG+eDchzWV55iJ/87zl4/mjNF1BLAQIUABQAAAgIAAAAAAA69A4d5wAAAGYBAAAPAAAAAAAAAAAAAAAAAAAAAAB4bC93b3JrYm9vay54bWxQSwECFAAUAAAICAAAAAAA9kgm4TcBAADGAgAADQAAAAAAAAAAAAAAAAAUAQAAeGwvc3R5bGVzLnhtbFBLAQIUABQAAAgIAAAAAABh+IC4iAEAAGIDAAAYAAAAAAAAAAAAAAAAAHYCAAB4bC93b3Jrc2hlZXRzL3NoZWV0MS54bWxQSwECFAAUAAAICAAAAAAAjzovrLwAAACZAQAAGgAAAAAAAAAAAAAAAAA0BAAAeGwvX3JlbHMvd29ya2Jvb2sueG1sLnJlbHNQSwECFAAUAAAICAAAAAAAY2vxKKkAAAAZAQAACwAAAAAAAAAAAAAAAAAoBQAAX3JlbHMvLnJlbHNQSwECFAAUAAAICAAAAAAAFFVBTwUBAACZAgAAEwAAAAAAAAAAAAAAAAD6BQAAW0NvbnRlbnRfVHlwZXNdLnhtbFBLBQYAAAAABgAGAIABAAAwBwAAAAA=
Well, I don't know the particulars of how you are generating the xml file, but I can tell you how to edit the underlying xml files so that it will work, and then perhaps you can figure out how to use your implementation to change the property that's gunking things up.
First, an xlsx is a set of xml files. I'm sure you know that, but I'm just starting at the beginning. You can change the extension to zip and then extract the files, and then rezip them and change the extension back to xlsx.
So do this:
take the generated xlsx
change the extension to .zip
extract the files
find xl\worksheets\sheet1.xml
open it and find this property: worksheet>sheetViews>sheetView:tabSelected
set it to 0
save the file
go back to the unzipped folder
select all files and send to zip
change the extension on the new zip file to .xlsx
You should now be able to open the newly created xlsx, add a new sheet, and copy freely.
If this works for you, then you have diagnosed the problem, one property set to true when it shouldn't be, and it should be relatively simple for you to modify your export procedure.
I've had this issue multiple times in the past.
The way I solved it was by filling out (populating) a template (file, previously created in Office) with the exported data rather than generating a file from scratch. Office unfortunately does not fully comply with OpenXML, and for more complex exports you might even be unable to open the file.
I would also recommend Beyond Compare (now Scooter Software) for comparing the two files instead of just doing a diff.
I want to automate creating of a powerpoint ppt via linking template charts to some Excel files. Updating the excel file values changes the powerpoint slides automatically. I have created my powerpoint template and linked charts to sample excel files data.
I want to send the folder with the powerpoint and excel files to someone else. But this will break the link to excel files due to change in the path. (As path is not relative). I can edit the paths manually by going under the "edit links to files" option under File Menu but this is tedious as charts are numerous with multiple files.
I want to update the same via Python code using the Python-Pptx package.
Please help!
There's no API support for this in the current version of python-pptx.
You would need to modify the underlying XML directly, perhaps using python-pptx internals as a starting point and using lxml calls on the appropriate element objects. If you search on "python-pptx workaround function" you will find some examples.
Another thing to consider is modifying the XML by cruder but still possibly effective means by accessing the XML files in the .pptx package directly (the .pptx file is a Zip archive of largely XML files) and using regular expressions or perhaps a command line tool like sed or awk to do simple text substitution.
Either way you're going to need to want it pretty badly, depending on your Python skill level. You'll also of course need to discover just which strings in which parts of the XML are the ones that need changing. opc-diag can be helpful for that, but it's a bit of detective work even with the best tools.
I'm trying to convert a .xls file to .pdf using LibreOffice via command line on Ubuntu. I have a kind of report on the .xls file with some colors in the background of the cells and etc.
The problem is when I convert the .xls file, the .pdf loses the original format. Each page is broken almost in the half and the content of one page is displayed in two different pages.
Does anybody know how to convert the .xls file to .pdf via command line with keeping the original format?
Or some trick to set the size of the .pdf page to not break pages? (Also via command line)
The code I used to make the conversion was:
soffice --headless --convert-to pdf:"impress_pdf_Export" filename.xls
If you use LibreOffice to convert Microsoft Excel (XLS) files to PDF documents, this is a two-step process (even if your command does look like it is a one-step process):
Import the XLS into LibreOffice (even if started with --headless).
Export the PDF from LibreOffice.
If the result does not look like you expect (not similar enough to Excel's native PDF export), then start with debugging the first step from above:
Open the XLS file with LibreOffice in a GUI. Does it look like you expect it to look? Or are some formatting options looking weird?
Export the PDF from there (with the GUI). Are the page dimensions as you expect? Did you set them up how you prefer? The margins like you want them? etc.pp. ...
If you are working on Windows, you may also want to consider OfficeToPDF.exe. It is hosted on CodePlex, licensed with the Apache 2.0 License and available in binary and in source code.
It requires a working Office 2013, Office 2010 or Office 2007 installation. But then it can commandline- and batch-convert to PDF various MS Office-based file formats, including XLS(X), PPT(X), DOC(X), VSD(X) and PUB as well as Libre/OpenOffice-based ODT, ODS and ODC files.
Although this is a little bit off from the initial question (you don't _really need Office Libre if you have the Office suite and on a Windows machine)
I do appreciate the follow-up provided by Kurt. It prompted me to post the following Gist offering some clear instructions on how to go about using the .exe in a for loop.
https://gist.github.com/einsty/2189cae4175f619cff0f
Try copying appropriate font file (for me it's
a simsun.ttc file) to your libreoffice installing directory like '/opt/libreoffice4.2/share/fonts/truetype'.But if the width of a single excel sheet is too much for a print page(sth like 'A4'),it'll still collapse.
Is there any type of automation available where I can use OpenOffice Calc to open Excel files and convert them to CSV or tab-delimited files?
I'm currently using PHPExcel to open the files and iterate through them and import each row into a database but have begun to run into memory issues with large files and need another alternative.
These are xls and xlsx files so it has to work for all of them.
If there is, how would I go about programming this in PHP?
If you have other alternatives, please feel free to suggest them.
OpenOffice can be run in server mode and used to convert files between a number of supported formats.
I have used this mainly with Java thru the JODConverter library available at http://www.artofsolving.com/opensource/jodconverter
A quick websearch brought up http://sourceforge.net/projects/phopo-org/ which claims to be a PHP implementation