How to read an Open Office spreadsheet? - groovy

How can I read an Open Office 3.0 spreadsheet (.ods) from Groovy? I'd like to select specific columns from a named worksheet. Ideally, it would be useful to add a 'where' clause, or other criteria clause.

I've never used it, but Open Office has a Java API, which of course you could use from Groovy as well. It looks like the best places to start reading are the Developer's Guide, the Java UNO Reference, and the samples in Java and (hey!) Groovy. Hope that helps!

Might be something here at Spring Factories or here at Groovy and JMX. There is a forum for Groovy and Open Office.

Could you export the table / spreadsheet as SQL entries then use that. You could also look at this plugin for goovy -- http://www.ifcx.org/

OpenOffice documents are ZIP files which contains the document data as XML plus some other files (style sheets for word documents). Details can be found here.
The main problem with calc is formulas. If you just have tabular data, then you can simply read the cell values and use that. So you can open the ZIP archive, read the content.xml in it and parse that with any XML parser.
But when a cell contains a formula, then you need to execute it. In this case, you will have to open the document via the UNO API. Here is the Java version. There is a link where you can download example code that explains how to open ODF documents and how to examine their content. There are also snippets but none of them show how to examine a sheet.
The main disadvantage of UNO is the documentation. Each method is explained somewhere but you have to find the method which solves your problem, first.

Since the title does not mention Groovy (only question specifics does), I didn't want to make this a new question.
How to generally read an Open Office spreadsheet document? There are tools for creating one (ooo-python) but not for reading one. They are XML but just bluntly diving into that and trying to get the right logic of extracting the data I want seems so sub-optimal.
What I'd like is features similar to Excel COM support, but from a command line tool (or scripting language).

Related

Handling Excel Spreadsheets with Cucumber

I am planning to work on the Cucumber feature file with Groovy code (Katalon Studio) for step definitions. I wanted to use the excel file in Cucumber file or to see is there any other option to use it.
I have not yet tried as of now any other option. I am thinking just passing the cucumber step file without any parameter and then using the excel file with in the step definition and access excel file and get the corresponding value.
I see there is a post in this forum suggesting to use QMetry Automation Framework for this type of question. But it does not look like this will help on this or should I use the passing the row index from cucumber file and based on that retrieve the value. Please guide on this.
Handling excel spreadsheets with Cucumber Scenario Outline
You should know that this is not supported by Cucumber.
As specified in the FAQ:
"We advise you not to use Excel or csv files to define your test cases; using Excel or csv files is considered an anti-pattern.
One of the goals of Cucumber is to have executable specifications. This means your feature files should contain just the right level of information to document the expected behaviour of the system. If your test cases are kept in separate files, how would you be able to read the documentation?
This also means you shouldn’t have too many details in your feature file. If you do, you might consider moving them to your step definitions or helper methods. For instance, if you have a form where you need to populate lots of different fields, you might use the Builder pattern to do so."
If you are using cucumber java 5+ you can add qaf-cucumber dependency. It should work with groovy as well. It will enable to have examples from external source like CSV, XML, JSON, EXCEL, DB.

generate multiple pdfs from a database

I would like to generate multiple pdfs at once. Those pdfs should pull data from a database. It can be an excel table or a relational database, doesn't matter, I can create whatever.
Using excel and javascript in adobe acrobat pro I managed to pull data into a template pdf, but for every record (row) I have in excel table I have to manually generate one pdf, then another, and so on.. and there are a lot of records, so I would like to do that automatically if possible.
Is there a way to do that? Any suggestions?
I added an image to better explain it...
Look into the Acrobat SDK, Section: "Interapplication Communication" to learn how you can control Acrobat via VB/VBA and how you can work with the JavaScript Object (JSO).
Then have a look into the "Acrobat JavaScript Scripting reference" and look at
the Doc Object with commands like .. addField and at
the Field object to set the properties of the fields.
That should do what you want, Reinhard
PS: With Open Office you can save spreadsheets as PDF and with newer version of Excel too. Wouldn't that be already enough or perhaps a mix of above and this.
Full disclosure: I founded and run Epsillion Software.
mirta, one option is Epsillion Publisher. We built it for your exact use case.
You would need to specify what your template should look like. The Epsillion team will design it for you.
You then specify what your variables are in a Word document (e.g., name, last name, date of birth). The software will process the Word and Excel files and return PDFs for you.
Templates are flexible and flow as needed.
Hope that helps. Good luck!

Flexible customization - Generating word document using C#

Problem - Generate a word document from information retrieved from database.
My solution - Create a word document template add fields/tags in places where values need to be inserted. The template will require tables and charts as well. Using document reflector that comes with open office xml sdk reflect on the document template and extract the w:document section and port it to C#. The rest of the logic revolves simply around finding the fields/tags, replacing them, etc. Very simple approach but not very flexible!
Challenge - I want the user to have the ability to customize the template or the generated document output. But this will not be possible if I embed the template logic in code.
Any other possibilities - I looked around at Templating using T4 and RazorEngine but could not find any concrete examples of how to create word documents using these two technologies.
Now what is the best approach?
I would really appreciate your inputs on what is the best and most flexible way to generate word documents using C#.
I'm actually working a project where the business users are designing word template with mail merge fields and we are populating the values using a 3rd party software package Aspose Words. http://www.aspose.com/categories/.net-components/aspose.words-for-.net/default.aspx
The software includes a library for merging data from datatables into the mail merge fields in the word document.
I also wrote a customized word task pane add in that retrieves data views from the database and lists the fields in a drag/drop interface that mimics a crystal or sql report writing interface.
Probably would of been easier to just use crystal or sql reporting though...
It's certainly possible to generate the contents of an Office doc using T4 or Razor and then package it up. The TestScribe powertool for Visual Studio Test Manager does just that with T4. There is a thread by Sally Cavanagh in the Q&A on this page http://visualstudiogallery.msdn.microsoft.com/e79e4a0f-f670-47c2-9b8a-3b6f664bf4ae that suggests a way to look at the T4 templates that it uses, which might get you jump-started.
Here is sample to play word document template with C#
You could use a content control databinding approach.
XML Mapping Task Pane for Word 2007/2010 is an authoring tool.
To create an instance document, you just attach your XML data file.
If the resulting documents will be opened in Word, that is all that is required: Word will bind the data itself. If your consuming application is not Word, you might want to resolve the bindings yourself (eg via Open XML SDK).
Content control databinding isn't intended to support repeats and conditionals. For a way to do that, look at my OpenDoPE convention
Take a look at Templater. Disclamer: I'm the author.
Check out JODReports or Docmosis. They are Java based but some of the templating features and output options might be ideal. You can call the command line interfaces unless they also have something better to reach from C#.

Exporting Native Excel 2007 Files From .NET

Does anyone know of resources that can help me export simple contents of a GridView to a native Excel 2007 format (i.e. the OpenOfficeXML format).
I've already seen solutions like Matt Berseth's, and in fact I have been using that for a while, but it comes with an annoying warning produced by Excel 2007 as documented here stemming from the fact that a native Excel file is not generated; rather it is HTML.
My initial research shows that, at the core, xlsx files are zip files, but I have no idea how to produce these or what goes in them.
Any suggestions (or tutorials) would be greatly appreciated.
CarlosAg has an ExcelXML writer which works really well. It isn't a native excel 2007 formatted file, but it will be readable in excel 2007.
You will need to write a little method to do the exporting manually, the API is very straight forward though. You will create a sheet object, then a row object, then a cell object. You can just loop through your data and output it. The examples on the site are pretty decent.
I prefer using Microsoft's own Open XML Format SDK. It is free, it is released by Microsoft and it creates real .xlsx files.
You can find the reference documentation here, as you can see, it is pretty straightforward to use.
SpreadsheetGear for .NET can read and write native xls and xlsx files and is easier to use (takes less of your time) than other solutions because it has an Excel like API so you don't have to learn anything about Open XML.
You can see some live ASP.NET (C# and VB) Excel Reporting examples here and download an evaluation version here.
Disclaimer: I own SpreadsheetGear LLC

How best to export native data to Excel without introducing dependency on Office?

Our product has the requirement of exporting its native format (essentially an XML file) to Excel for viewing/editing. However, what this entails is having a dependency on Excel (or Office) itself for our product build - something that we do not want.
What we have done is export the data from our native format to a csv file which can be opened in Excel. If user selects an option to open the generated report as well, we (try to) launch Excel application to open it (ofcourse it requires Excel to be already present on the client system).
The data for most part is flat list of records.
Is there a better format (or even a better way) to handle this requirement? This is a common requirement for many products - how do you handle this?
Excel versions, both 2007 and several previous, have native XML formats. 2007, obviously, is XML by default, and earlier versions have the ability to save as XML. This SO question deals with the issue. I'd guess a little inspection would give an idea of what's required. I don't know if a XSD/DTD exists for older versions, but a little creative Googling might yield something.
As other people pointed out, it is reasonably easy to generate Excel XML files. You can do this in multiple ways. For example:
By creating a template Excel XML document, and then using XML DOM to stuff your data into the template, or
Converting the template Excel XML into an XSLT, and then simply passing your proprietary XML as input to XSLT.
I'm using ExcelPackage to create spreadsheets in one of my side projects. Works pretty good, but (at least the version I'm using) its a bit limited when it comes to styling and calculations.
ExcelPackage lets you create OOXML docs (.xslx files) that are natively compat with 2k7, but you can download a plugin for previous versions of Office from MS.
We export our data either using Excel objects (COM based code) on client side or CSV file (usually on server side, but can be used on client side too). And we allow copy data from grids in simple html format, what can be pasted into Excel without problems.
For one customer we even had to export data [from sql stored procedure] into csv-like tab-separated format, but named file like xxxxx.xls - this way excel opened that file in more correct way than csv file. Ugly hack, but worked well.
CSV is most compatible format (no dependencies on external applications or libraries), but customers don't like it. Maybe we need to incorporate some XLS export code, this way all users will be happy :)
If .csv isn't formatted enough, you could create a template in Excel, and use a little bit of VBA code to import the CSV and format it appropriately. This way your app is only concerned with generating the .CSV, and will use the same .XLS for each export.
If you're careful, you should be able to get this to work with most versions of Excel seamlessly.
With Perl there are several modules that can be used to produce .xlsx files without requiring an Office installation. Among those :
https://metacpan.org/pod/Excel::Writer::XLSX is the most well-known, with support for many Excel features like colors, formatting, etc.
https://metacpan.org/pod/Excel::ValueWriter::XLSX (I'm actually the author) has less features but is optimized for fast writing of large amounts of data
If you are working in Java, Checkout the POI project from APACHE.
http://poi.apache.org/
Simple, nice, complete, powerful.
We started with Office on the server, but that's not very nice. We had to kill processes that hung, and had quite a bit of a performance dip. We thought about putting it on a different machine, but didn't bother after trying and using Aspose (commercial). We don't have a very large number of simultaneous users, but complex documents. Simple ones can be handled easier with csv.
I've used FlexCel Studio for a couple of projects now. It's very functional and fast. 100% managed code, no dependencies. Sounds like you'd use the "Reports" feature which allows you to define an empty report template in Excel, then pass datatable and volia, it's populated with your data.
TMS Software
We use a combination of OleDB and Interop. We found that Interop was much faster and used less memory, but it's a pain for compatibility issues, especially when using different language installs of Office.
OleDb has the advantage that you don't require Excel to be installed on the client machine. Both Interop and OleDb support multiple sheets (tables) per workbook which you cannot do with csv.
If you're using C# or VB.Net, and your data is in a a DataSet, DataTable or List<>, then you can use my free "Export to Excel" class.
It uses the free Microsoft OpenXML libraries (so you don't need to have Excel on your server), and lets you export your data into a "real" .xlsx file with just one line of code, eg:
DataSet ds = CreateSampleData();
CreateExcelFile.CreateExcelDocument(ds, "C:\\Sample.xlsx");
All source code is provided on the following page along with a demo project, completely free of charge (and popups !)
http://mikesknowledgebase.com/pages/CSharp/ExportToExcel.htm
Hope this helps !

Resources