Convert docx to xlsx - excel

I am writing a .NET application that generates reports in docx. One of the latest requirements I've received is to also generate these reports in xlsx format. So, is there any simple way to convert docx to xlsx? I haven't found any solution or utility/library. One idea was to use the Microsoft.Office.Interop Copy/Paste methods, but I don't know if that helps :)

We have used a MS tool that allows you to work with Office documents as if they were xml:
Open XML SDK 2.0 for Microsoft Office
http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en
This had the following benefits compared with interop:
No need to install Office
No problems with memory due to Excel not closing
Better performance, in our case it went from 40 seconds to 2 (two)
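As a rough sketch of how the Open XML SDK could be applied to the docx-to-xlsx question: the code below copies the text of every top-level table in a .docx into a single worksheet. It assumes the DocumentFormat.OpenXml NuGet package; the class name DocxTablesToXlsx, the method name Convert and the sheet name "Report" are made up for illustration. Formatting, merged cells, nested tables and text outside tables are ignored, so treat it as a starting point rather than a full converter.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using S = DocumentFormat.OpenXml.Spreadsheet;
using W = DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: not part of the SDK.
public static class DocxTablesToXlsx
{
    // Copies the text of each top-level Word table into one worksheet,
    // one table after another, one spreadsheet row per table row.
    public static void Convert(string docxPath, string xlsxPath)
    {
        using (var word = WordprocessingDocument.Open(docxPath, false))
        using (var excel = SpreadsheetDocument.Create(xlsxPath, SpreadsheetDocumentType.Workbook))
        {
            var sheetData = new S.SheetData();

            foreach (var table in word.MainDocumentPart.Document.Body.Elements<W.Table>())
            foreach (var row in table.Elements<W.TableRow>())
            {
                var xlRow = new S.Row();
                foreach (var cell in row.Elements<W.TableCell>())
                {
                    // InnerText concatenates all paragraphs of the cell without separators.
                    xlRow.Append(new S.Cell
                    {
                        DataType = S.CellValues.InlineString,
                        InlineString = new S.InlineString(new S.Text(cell.InnerText))
                    });
                }
                sheetData.Append(xlRow);
            }

            // Minimal workbook structure: one workbook part, one worksheet part, one sheet entry.
            var workbookPart = excel.AddWorkbookPart();
            var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
            worksheetPart.Worksheet = new S.Worksheet(sheetData);
            workbookPart.Workbook = new S.Workbook(
                new S.Sheets(new S.Sheet
                {
                    Id = workbookPart.GetIdOfPart(worksheetPart),
                    SheetId = 1,
                    Name = "Report"
                }));
            workbookPart.Workbook.Save();
        }
    }
}
```

A call would look like DocxTablesToXlsx.Convert("report.docx", "report.xlsx"). Because everything is written as inline text, numeric cells would need to be detected and written as numbers separately if Excel should treat them as such.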

Related

How to get OleDb for reading excel in asp.net core project

Is there any way of reading Excel data in ASP.NET Core (built on .NET Core)? I am not able to reference OleDb in the project.json of my .NET Core project. Is there any other way of doing this?
Do you really need OleDB to read Excel today? In my opinion, OleDB is a bit outdated. There are open-source libraries for working with Excel files which are much easier to use and provide a lot of flexibility.
ClosedXML (https://closedxml.codeplex.com/) is one such library. It's well documented and lets you both read and write Excel files with custom cell style formatting.
I have used OleDB for reading very large Excel files too, it works but there are certain issues with it, here are a few of them off the top of my head:
You will need to install MS ACE OLEDB provider which sometimes is hard to configure.
You will have to read Excel files in passes while with ClosedXML you can access rows/cells randomly by id/address.
OleDB uses a Windows registry configurable setting to check the number of rows (default is 8) to determine the data type for the whole column and sometimes there are issues with it because the data type is determined incorrectly. ClosedXML allows you to set a specific data type for any cell.
I'm not a ClosedXML developer, so this is not an ad. As I mentioned, I have used (and continue to use) both OleDB and ClosedXML in my projects. With OleDB I was able to read very large Excel files (400-800K+ rows, for example) either row by row or using SQL Server "SELECT * FROM OPENROWSET(...)". SQL Server can also write directly to Excel files using the same ACE provider, and that worked for very large files too.
ClosedXML, on the other hand, I have used only for reading/writing relatively small files that needed a lot of custom formatting. So if you are starting a new project, I would recommend moving away from OleDB.
The only limitation of ClosedXML is that it supports only zipped XML Excel files, i.e. Excel version 2007+. You can find many examples on the ClosedXML site mentioned above which should help you get started.
Let me know if this was helpful. Thanks.
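To illustrate the points above about addressing cells directly and controlling how a value is stored, here is a small ClosedXML sketch. It assumes the ClosedXML NuGet package; the class name ClosedXmlSketch, the sheet name "Data" and the sample values are invented for the example. It writes a workbook to memory and reads two cells back by address.

```csharp
using System.IO;
using ClosedXML.Excel;

// Hypothetical helper: only the XLWorkbook/worksheet/cell calls come from ClosedXML.
public static class ClosedXmlSketch
{
    public static string RoundTrip()
    {
        using (var stream = new MemoryStream())
        {
            using (var workbook = new XLWorkbook())
            {
                var sheet = workbook.Worksheets.Add("Data");
                sheet.Cell("A1").Value = "Code";     // address-based access
                sheet.Cell("A2").SetValue("00123");  // kept as text, leading zeros preserved
                sheet.Cell(2, 2).Value = 42;         // row/column access (B2), stored as a number
                workbook.SaveAs(stream);
            }

            stream.Position = 0;
            using (var workbook = new XLWorkbook(stream))
            {
                var sheet = workbook.Worksheet("Data");
                // Cells can be read in any order; no sequential pass is needed.
                return sheet.Cell("A2").GetString() + "/" + sheet.Cell("B2").GetValue<int>();
            }
        }
    }
}
```

The same calls work with a file path instead of a stream (new XLWorkbook(path) and SaveAs(path)).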
As indicated by andrews, it is recommended to use a third-party library, especially when working with formatting. An alternative to ClosedXML is EPPlus. For most of my projects I have used ClosedXML, but I had to switch to EPPlus because it has support for charts.
For ASP.NET Core, you can use EPPlus.Core. I have performed a quick test within an ASP.NET Core 2.0 console app and it seems to work fine:
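using System.IO;
using OfficeOpenXml; // namespace provided by EPPlus / EPPlus.Core

var newFile = new FileInfo("some file.xlsx");
// The package wraps the workbook; the file is written when Save() is called
using (ExcelPackage xlPackage = new ExcelPackage(newFile))
{
    var ws = xlPackage.Workbook.Worksheets.Add("etc");
    ws.Cells[1, 1].Value = "test"; // cell A1 (row and column indexes are 1-based)
    // do work here
    xlPackage.Save();
}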

Can I Use Microsoft Graph to Convert MSWord Documents to Other Formats?

I want to take a .DOC or .DOCX file and convert it to another file format; e.g. PDF or HTML, etc. I don't have MS Word loaded on the local machine nor do I have an Office 365 account.
Will Microsoft Graph provide a way to do this programmatically, or am I barking up the wrong tree?
Thanks in advance for any insight or ideas!
You can do this in various ways in Office365/Graph:
Do an HTTP GET to the drives API in MS Graph. Example:
GET /drive/items/{item-id}/content?format={format}
This also works for files in SharePoint.
See:
https://learn.microsoft.com/en-us/onedrive/developer/rest-api/api/driveitem_get_content_format
Via the Convert File action in Microsoft Flow. See John Liu's blog on this.
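The GET request from the first option could be issued from .NET roughly as follows. This is a sketch under assumptions: the file already lives in the signed-in user's OneDrive (hence the /me/drive path), an access token has been obtained elsewhere, and the names GraphConvert, BuildContentUrl and DownloadAsAsync are hypothetical. Only the /content?format=... endpoint itself comes from the answer above.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper around the drive item "content" endpoint.
public static class GraphConvert
{
    // Builds the URL for: GET /drive/items/{item-id}/content?format={format}
    public static string BuildContentUrl(string itemId, string format)
    {
        return "https://graph.microsoft.com/v1.0/me/drive/items/"
            + Uri.EscapeDataString(itemId)
            + "/content?format=" + Uri.EscapeDataString(format);
    }

    // Downloads the converted file to outputPath. The token is assumed to be a valid
    // bearer token with permission to read the file.
    public static async Task DownloadAsAsync(
        HttpClient http, string accessToken, string itemId, string format, string outputPath)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, BuildContentUrl(itemId, format)))
        {
            request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
            using (var response = await http.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                using (var file = File.Create(outputPath))
                {
                    await response.Content.CopyToAsync(file);
                }
            }
        }
    }
}
```

For example, await GraphConvert.DownloadAsAsync(http, token, itemId, "pdf", "report.pdf") would request a PDF rendition. Which source and target formats are accepted is described on the documentation page linked above.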
Essential DocIO is an option to consider. The library can convert from Doc and Docx to PDF and doesn't rely on Microsoft Office.
The entire product is available for free with no limitations through the community license if you qualify (less than 1 million US$ in revenue per year).
Note: I work for Syncfusion
Today the Microsoft Graph doesn't provide this functionality, but it's a reasonable request. You might want to raise it over at uservoice here.
Microsoft Graph (https://graph.microsoft.io/) can't be used to convert your .DOC or .DOCX files into another format, especially if you don't have MS Word or Office365. Microsoft Graph is basically 'just' an interface to access your data/objects stored in your Office365 tenant.
Without MS Word your only option (besides third-party tooling) would be to use Office Open XML, but even that wouldn't suit your need, as .DOC files are not based on the Open XML standard. Rendering to PDF is also not part of the Office Open XML specifications, so you need to find another service for that if you don't want MS Word to do it for you.
So in short, to answer your question ... no you can't use Microsoft Graph to convert MSWord documents to other formats.

Office Invalid XML error, file still opens in Office

I have an .xlsx file that, when run through the Open XML SDK 2.5, generates an error that the document is invalid and contains multiple validation errors involving the slicerCache and invalid attribute values.
I can attach more information about the actual XML if needed from the xlsx file, however my question is actually this. Excel still opens the document without an error. Not even a request to "repair" the document.
I am curious why Microsoft's Open XML SDK generates validation errors, yet Office is still able to open these documents.
Does Office make a best guess? Or is the SDK provided by Microsoft not entirely accurate?
Thanks.
This is a formatting issue as far as I can tell. When you save it as xlsx it saves it as a workbook, not a spreadsheet. I would save it in a different file format or see if there are libraries that your SDK needs in order to process the xlsx. I've never worked with the Office SDK, but I get similar errors when I open xlsx files in other programs. 99% of the time I can just change the format. (If you live dangerously you can just manually change the file extension in your folder to something it'll read.)

Exporting Native Excel 2007 Files From .NET

Does anyone know of resources that can help me export simple contents of a GridView to a native Excel 2007 format (i.e. the Office Open XML format)?
I've already seen solutions like Matt Berseth's, and in fact I have been using that for a while, but it comes with an annoying warning produced by Excel 2007 as documented here stemming from the fact that a native Excel file is not generated; rather it is HTML.
My initial research shows that, at the core, xlsx files are zip files, but I have no idea how to produce these or what goes in them.
Any suggestions (or tutorials) would be greatly appreciated.
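To make the "xlsx files are zip files" observation concrete, the sketch below builds what I believe is close to the smallest package Excel will open: a content-types part, two relationship parts, a workbook part and one worksheet part, all written with the framework's own ZipArchive. The class name MinimalXlsx and the sheet name "Sheet1" are chosen for the example, and every value is written as inline text. Real files normally add styles, shared strings and typed cells, which is where a library starts to pay off.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

// Hypothetical helper: writes a bare-bones .xlsx without any Office library.
public static class MinimalXlsx
{
    const string Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    const string DocRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    const string PkgRel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string Ct = "application/vnd.openxmlformats-officedocument.spreadsheetml.";

    public static byte[] Build(string[][] rows)
    {
        // Worksheet part: one <row> per input row, one inline-string cell per value.
        var sheet = new StringBuilder("<worksheet xmlns='" + Main + "'><sheetData>");
        foreach (var row in rows)
        {
            sheet.Append("<row>");
            foreach (var value in row)
                sheet.Append("<c t='inlineStr'><is><t>" + SecurityElement.Escape(value) + "</t></is></c>");
            sheet.Append("</row>");
        }
        sheet.Append("</sheetData></worksheet>");

        using (var buffer = new MemoryStream())
        {
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                Add(zip, "[Content_Types].xml",
                    "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'>" +
                    "<Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/>" +
                    "<Default Extension='xml' ContentType='application/xml'/>" +
                    "<Override PartName='/xl/workbook.xml' ContentType='" + Ct + "sheet.main+xml'/>" +
                    "<Override PartName='/xl/worksheets/sheet1.xml' ContentType='" + Ct + "worksheet+xml'/>" +
                    "</Types>");
                Add(zip, "_rels/.rels",
                    "<Relationships xmlns='" + PkgRel + "'><Relationship Id='rId1' Type='" + DocRel +
                    "/officeDocument' Target='xl/workbook.xml'/></Relationships>");
                Add(zip, "xl/workbook.xml",
                    "<workbook xmlns='" + Main + "' xmlns:r='" + DocRel + "'><sheets>" +
                    "<sheet name='Sheet1' sheetId='1' r:id='rId1'/></sheets></workbook>");
                Add(zip, "xl/_rels/workbook.xml.rels",
                    "<Relationships xmlns='" + PkgRel + "'><Relationship Id='rId1' Type='" + DocRel +
                    "/worksheet' Target='worksheets/sheet1.xml'/></Relationships>");
                Add(zip, "xl/worksheets/sheet1.xml", sheet.ToString());
            }
            return buffer.ToArray(); // the archive must be disposed before the bytes are complete
        }
    }

    static void Add(ZipArchive zip, string name, string xml)
    {
        using (var writer = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false)))
            writer.Write(xml);
    }
}
```

From an ASP.NET page the resulting bytes could be written to the response with the content type application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, or saved with File.WriteAllBytes("grid.xlsx", bytes).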
CarlosAg has an ExcelXML writer which works really well. It isn't a native Excel 2007 formatted file, but it will be readable in Excel 2007.
You will need to write a little method to do the exporting manually, but the API is very straightforward. You create a sheet object, then a row object, then a cell object. You can just loop through your data and output it. The examples on the site are pretty decent.
I prefer using Microsoft's own Open XML Format SDK. It is free, it is released by Microsoft and it creates real .xlsx files.
You can find the reference documentation here, as you can see, it is pretty straightforward to use.
SpreadsheetGear for .NET can read and write native xls and xlsx files and is easier to use (takes less of your time) than other solutions because it has an Excel like API so you don't have to learn anything about Open XML.
You can see some live ASP.NET (C# and VB) Excel Reporting examples here and download an evaluation version here.
Disclaimer: I own SpreadsheetGear LLC

How best to export native data to Excel without introducing dependency on Office?

Our product has the requirement of exporting its native format (essentially an XML file) to Excel for viewing/editing. However, what this entails is having a dependency on Excel (or Office) itself for our product build - something that we do not want.
What we have done is export the data from our native format to a csv file which can be opened in Excel. If the user also selects an option to open the generated report, we (try to) launch Excel to open it (of course this requires Excel to be present on the client system).
The data for most part is flat list of records.
Is there a better format (or even a better way) to handle this requirement? This is a common requirement for many products - how do you handle this?
Excel versions, both 2007 and several earlier ones, have native XML formats. 2007, obviously, is XML by default, and earlier versions have the ability to save as XML. This SO question deals with the issue. I'd guess a little inspection would give an idea of what's required. I don't know if an XSD/DTD exists for older versions, but a little creative Googling might yield something.
As other people pointed out, it is reasonably easy to generate Excel XML files. You can do this in multiple ways. For example:
By creating a template Excel XML document, and then using XML DOM to stuff your data into the template, or
Converting the template Excel XML into an XSLT, and then simply passing your proprietary XML as input to XSLT.
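The XSLT route can be sketched with the framework's own XslCompiledTransform. The input shape (records/record/name/qty) is invented here to stand in for the proprietary XML, and the class name XsltExport and the sheet name "Report" are hypothetical. The output is the older single-file Excel XML format (SpreadsheetML 2003), not an .xlsx package.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: transforms an assumed <records> document into Excel 2003 XML.
public static class XsltExport
{
    const string Stylesheet = @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns='urn:schemas-microsoft-com:office:spreadsheet'
    xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>
  <xsl:output method='xml' indent='yes'/>
  <xsl:template match='/records'>
    <xsl:processing-instruction name='mso-application'>progid=""Excel.Sheet""</xsl:processing-instruction>
    <Workbook>
      <Worksheet ss:Name='Report'>
        <Table>
          <xsl:for-each select='record'>
            <Row>
              <Cell><Data ss:Type='String'><xsl:value-of select='name'/></Data></Cell>
              <Cell><Data ss:Type='Number'><xsl:value-of select='qty'/></Data></Cell>
            </Row>
          </xsl:for-each>
        </Table>
      </Worksheet>
    </Workbook>
  </xsl:template>
</xsl:stylesheet>";

    public static string ToSpreadsheetXml(string inputXml)
    {
        // Compile the stylesheet once; in real code this object can be cached and reused.
        var transform = new XslCompiledTransform();
        using (var xslt = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xslt);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

The mso-application processing instruction is what lets Windows associate the resulting .xml file with Excel. The stylesheet would normally live in its own file so that the report layout can change without recompiling.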
I'm using ExcelPackage to create spreadsheets in one of my side projects. It works pretty well, but (at least the version I'm using) it's a bit limited when it comes to styling and calculations.
ExcelPackage lets you create OOXML docs (.xlsx files) that are natively compatible with Excel 2007, and you can download a plugin for previous versions of Office from MS.
We export our data either using Excel objects (COM-based code) on the client side or as a CSV file (usually on the server side, but it can be used on the client side too). We also allow copying data from grids in simple HTML format, which can be pasted into Excel without problems.
For one customer we even had to export data [from sql stored procedure] into csv-like tab-separated format, but named file like xxxxx.xls - this way excel opened that file in more correct way than csv file. Ugly hack, but worked well.
CSV is the most compatible format (no dependencies on external applications or libraries), but customers don't like it. Maybe we need to incorporate some XLS export code; that way all users will be happy :)
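If CSV stays the export format, the detail that most often goes wrong is quoting. A minimal sketch of the usual rule (quote any field containing a comma, a double quote or a line break, and double the embedded quotes) could look like this; the class and method names are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper implementing the common CSV quoting rule.
public static class Csv
{
    // Quotes a field only when it contains a delimiter, a quote or a line break.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool mustQuote = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return mustQuote ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Joins already-stringified fields into one CSV record (without the line terminator).
    public static string Line(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(f => Escape(f)));
    }
}
```

Note that Excel picks the list separator from the user's regional settings, so a comma-separated file may open in a single column on machines that expect semicolons. That may be part of why the tab-separated trick mentioned above worked better for that customer.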
If .csv isn't formatted enough, you could create a template in Excel, and use a little bit of VBA code to import the CSV and format it appropriately. This way your app is only concerned with generating the .CSV, and will use the same .XLS for each export.
If you're careful, you should be able to get this to work with most versions of Excel seamlessly.
With Perl there are several modules that can be used to produce .xlsx files without requiring an Office installation. Among them:
https://metacpan.org/pod/Excel::Writer::XLSX is the most well-known, with support for many Excel features like colors, formatting, etc.
https://metacpan.org/pod/Excel::ValueWriter::XLSX (I'm actually the author) has fewer features but is optimized for fast writing of large amounts of data
If you are working in Java, check out the Apache POI project.
http://poi.apache.org/
Simple, nice, complete, powerful.
We started with Office on the server, but that's not very nice. We had to kill processes that hung, and had quite a bit of a performance dip. We thought about putting it on a different machine, but didn't bother after trying and using Aspose (commercial). We don't have a very large number of simultaneous users, but complex documents. Simple ones can be handled easier with csv.
I've used FlexCel Studio for a couple of projects now. It's very functional and fast. 100% managed code, no dependencies. Sounds like you'd use the "Reports" feature, which allows you to define an empty report template in Excel, then pass a DataTable and voilà, it's populated with your data.
TMS Software
We use a combination of OleDB and Interop. We found that Interop was much faster and used less memory, but it's a pain for compatibility issues, especially when using different language installs of Office.
OleDb has the advantage that you don't require Excel to be installed on the client machine. Both Interop and OleDb support multiple sheets (tables) per workbook which you cannot do with csv.
If you're using C# or VB.Net, and your data is in a DataSet, DataTable or List<>, then you can use my free "Export to Excel" class.
It uses the free Microsoft OpenXML libraries (so you don't need to have Excel on your server), and lets you export your data into a "real" .xlsx file with just one line of code, e.g.:
DataSet ds = CreateSampleData();
CreateExcelFile.CreateExcelDocument(ds, "C:\\Sample.xlsx");
All source code is provided on the following page along with a demo project, completely free of charge (and popups!)
http://mikesknowledgebase.com/pages/CSharp/ExportToExcel.htm
Hope this helps!

Resources