Agnostically convert spreadsheet to csv file - python-3.x

I have a script that takes a csv and does something with it.
I'd like this script to be agnostic to the spreadsheet file type (xlsx, xls, ods for instance), and always convert the file to csv for processing. Is there a way to do this without corrupting the data in any way?

You can use a headless version of the open-source office suite LibreOffice to convert any file type that LibreOffice can normally open in the GUI. This solution does require you to install the whole office suite, which may be overkill depending on your particular situation.
However, via the command line, you can call LibreOffice to do this conversion:
soffice --headless --convert-to csv <input_file> --outdir </path/to/dir>
This example assumes that you are using a Unix-like machine; there should be an equivalent for Windows as well (e.g. soffice.exe). Replace <input_file> with your file name and </path/to/dir> with the path to the directory where you want the output (the output directory option is optional). You can use the wildcard * as the input file, which converts all the files in the directory to csv.
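Since the script in the question is Python, the same command can be called from it with subprocess. The following is a minimal sketch, not a definitive implementation: the function names build_convert_command and convert_to_csv are made up for illustration, and it assumes soffice is on the PATH and that LibreOffice names the output after the input file with a .csv suffix.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(input_file, outdir, soffice="soffice"):
    """Return the argument list for a headless LibreOffice CSV conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "csv",
        "--outdir", str(outdir),
        str(input_file),
    ]


def convert_to_csv(input_file, outdir):
    """Convert one spreadsheet to CSV and return the expected output path."""
    # Assumption: the LibreOffice launcher is installed and named "soffice".
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice was not found on PATH")
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(build_convert_command(input_file, outdir), check=True)
    # Assumption: the output keeps the input's base name with a .csv suffix.
    return outdir / (Path(input_file).stem + ".csv")
```

Calling convert_to_csv("report.xlsx", "converted") would then give the script a path it can read with the csv module, whatever the original spreadsheet type was.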

Related

Is there a Linux terminal command for creating .xlsx files from .bracken files in a loop in a new folder

Is there a loop I can use to create .xlsx files from .bracken files I currently have and channel them into an output folder?
All that I have now is a command that converts a single .bracken file into an .xlsx file, cat MG-ABCD12345-0.genus.bracken > MG-ABCD12345-0_genus_bracken.xlsx, and the files end up in my current working directory. I would like the output in a folder called bracken_excel_files, which is located within my current working directory. I would prefer a loop built from common commands such as for, so that it is easier to understand.
bracken appears to be generating/sharing the same output format as kraken, which I saw somewhere to be tab-delimited fields.
If that is true, then the files are already delimited text, which is the essence of what CSV files are (here with tabs rather than commas as the separator).
In that case, you don't need to use the "cat" command (CPU and I/O consuming). You simply need to rename the file with a ".csv" suffix (to make the file format explicitly visible for others), then import that into Excel or OpenOffice/LibreOffice Calc. Each of those tools offers different options for interpreting the input when you use the "Import" function to open the files.
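The loop asked for in the question could look roughly like the following Python sketch. It copies rather than renames, so the original .bracken files stay in place, and it follows the naming pattern from the question (dots replaced by underscores). The function name copy_bracken_reports and the default ".csv" suffix are choices made for this illustration only.

```python
import shutil
from pathlib import Path


def copy_bracken_reports(src_dir=".", out_dirname="bracken_excel_files", suffix=".csv"):
    """Copy every *.bracken file in src_dir into out_dirname with a new suffix."""
    src = Path(src_dir)
    out = src / out_dirname
    out.mkdir(exist_ok=True)
    copied = []
    for report in sorted(src.glob("*.bracken")):
        # MG-ABCD12345-0.genus.bracken -> MG-ABCD12345-0_genus_bracken.csv
        target = out / (report.name.replace(".", "_") + suffix)
        shutil.copyfile(report, target)
        copied.append(target)
    return copied
```

The copied files are still tab-delimited text, so they need to be opened through the spreadsheet program's import dialog with tab selected as the separator.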

Using Octave to "Edit" notepad file instead of "Open" in Windows

I use Windows 10 and an .exe program (in-house code written by a colleague) that imports data from .txt files. Since 99% of my use of .txt files are for this program, I've changed the default Windows program so that this .exe file is run automatically when opening a .txt file. If I need to access the .txt file directly, or use it for another purpose, I right-click and choose "edit."
I'm now writing a program of my own (using Octave 4.4.1), which also uses .txt files that sometimes need to be opened/edited, but if I use "open(filename)" in my Octave script, of course it just opens the .exe file. I can open the .txt file from there, but I'd like to skip this middle step, since the aforementioned .exe program is not intended to be used in this process, and there are other users of my code that don't have the .exe program installed.
Is there a way to duplicate the right-click/edit feature in Windows within Octave code? "edit(filename)" opens the file in the native Octave editor, which is technically viable, but not exactly a desirable scenario. I've also tried changing the default Octave editor to Notepad, and I've tried Notepad++ as well, but I have had absolutely no luck, even with significant effort, of making Octave use an external default editor of any kind (even when I remove the .exe program as the default for .txt files). Thanks in advance for any advice you can offer.
You can send command-line commands from Octave using the system() function.
For example, to open the file in notepad, you could do
[status, output] = system("notepad <path_to_text_file>.txt");
If notepad isn't in your system path, you will have to add it to the path or use the full path to the notepad executable.
Or, if you want to use Notepad++, add it to your system path and then do
[status, output] = system("notepad++ <path_to_text_file>.txt");

Adding macros script to an excel file externally in Linux

My requirement is: I am given an Excel file (the user uploads it to our server), and my program should automatically add macro code (defined in a text file, maybe) to the Excel file and then send it back to the user. I found a similar question, but the solution only works on Windows, and since our server is Linux based, I haven't found a way to do so.
Link to the similar question: Use Python to Inject Macros into Spreadsheets
Assuming you're being sent a file in xlsm format, you need the following capabilities:
1. Open the file as a zip file
2. Locate the .bin part path from the rels files - see Microsoft Open Packaging Conventions
3. Locate and open the VBA project's .bin stream
4. Parse the .bin stream as a Compound Binary File Format file
5. Parse the binary streams that describe and list the module contents of the file, as documented in Office VBA File Format Structure
6. Add your module text as a new stream, and update the files from step 5 with the new contents.
It's not a small undertaking. The work has already been done in Python, and a lot of the libraries for working with zip files and compound binary format files are already in .NET for Windows. Otherwise, as far as I'm aware, there aren't any other pre-built tools, other than the tools from aspose
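The first three capabilities (opening the zip, reading the rels file, and getting at the .bin stream) need only the Python standard library and work on Linux. The sketch below covers those steps only; parsing and rewriting the compound binary file is the hard part and is not attempted here. It assumes the workbook part lives at xl/workbook.xml, which is the usual layout but should strictly be looked up through the package's root rels file, and it matches the relationship by a Type ending in "/vbaProject". The function names are hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Open Packaging Conventions relationship files.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Assumption: the workbook part is at the conventional xl/workbook.xml.
WORKBOOK_RELS = "xl/_rels/workbook.xml.rels"


def find_vba_part(xlsm_file):
    """Return the archive path of the VBA project part, or None if absent."""
    with zipfile.ZipFile(xlsm_file) as archive:
        root = ET.fromstring(archive.read(WORKBOOK_RELS))
    for rel in root.iter(RELS_NS + "Relationship"):
        if rel.get("Type", "").endswith("/vbaProject"):
            target = rel.get("Target", "")
            if target.startswith("/"):
                # Absolute targets are relative to the package root.
                return target.lstrip("/")
            # Relative targets are resolved against the workbook's folder (xl/).
            return posixpath.normpath(posixpath.join("xl", target))
    return None


def read_vba_project(xlsm_file):
    """Return the raw bytes of the VBA project stream, or None if absent."""
    part = find_vba_part(xlsm_file)
    if part is None:
        return None
    with zipfile.ZipFile(xlsm_file) as archive:
        return archive.read(part)
```

The bytes returned by read_vba_project are the compound binary file that the remaining steps would have to parse and modify.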

Convert .xls to .pdf using LibreOffice via Command Line

I'm trying to convert a .xls file to .pdf using LibreOffice via the command line on Ubuntu. The .xls file contains a kind of report, with colored cell backgrounds and similar formatting.
The problem is that when I convert the .xls file, the .pdf loses the original format. Each page is broken almost in half, and the content of one page is displayed on two different pages.
Does anybody know how to convert the .xls file to .pdf via the command line while keeping the original format?
Or some trick to set the size of the .pdf page so that pages are not broken? (Also via the command line)
The code I used to make the conversion was:
soffice --headless --convert-to pdf:"impress_pdf_Export" filename.xls
If you use LibreOffice to convert Microsoft Excel (XLS) files to PDF documents, this is a two-step process (even if your command does look like it is a one-step process):
Import the XLS into LibreOffice (even if started with --headless).
Export the PDF from LibreOffice.
If the result does not look like you expect (not similar enough to Excel's native PDF export), then start with debugging the first step from above:
Open the XLS file with LibreOffice in a GUI. Does it look like you expect it to look? Or are some formatting options looking weird?
Export the PDF from there (with the GUI). Are the page dimensions as you expect? Did you set them up how you prefer? Are the margins the way you want them? And so on.
If you are working on Windows, you may also want to consider OfficeToPDF.exe. It is hosted on CodePlex, licensed with the Apache 2.0 License and available in binary and in source code.
It requires a working Office 2013, Office 2010 or Office 2007 installation. But then it can commandline- and batch-convert to PDF various MS Office-based file formats, including XLS(X), PPT(X), DOC(X), VSD(X) and PUB as well as Libre/OpenOffice-based ODT, ODS and ODC files.
Although this is a little bit off from the initial question (you don't really need LibreOffice if you have the Office suite and are on a Windows machine), I do appreciate the follow-up provided by Kurt. It prompted me to post the following Gist offering some clear instructions on how to go about using the .exe in a for loop.
https://gist.github.com/einsty/2189cae4175f619cff0f
Try copying the appropriate font file (for me it's a simsun.ttc file) to your LibreOffice installation directory, for example '/opt/libreoffice4.2/share/fonts/truetype'. But if a single Excel sheet is too wide for a print page (something like 'A4'), it will still be split across pages.

use OpenOffice Calc to open Excel files and convert to CSV or Tab-delimited

Is there any type of automation available where I can use OpenOffice Calc to open Excel files and convert them to CSV or tab-delimited files?
I'm currently using PHPExcel to open the files and iterate through them and import each row into a database but have begun to run into memory issues with large files and need another alternative.
These are xls and xlsx files so it has to work for all of them.
If there is, how would I go about programming this in PHP?
If you have other alternatives, please feel free to suggest them.
OpenOffice can be run in server mode and used to convert files between a number of supported formats.
I have used this mainly with Java through the JODConverter library, available at http://www.artofsolving.com/opensource/jodconverter
A quick web search brought up http://sourceforge.net/projects/phopo-org/ which claims to be a PHP implementation.
