Programmatically converting from MS Word to Excel

Is there any way to programmatically take an MS Word file and convert it to Excel? (Obviously, the converter would have to guess where to put stuff.) Any language would be fine.

That's a pretty wide-open question. The content of the Word document will affect how easy/hard this is.
One method you could look at is using Word automation to open the Word document and then write out a new file using comma-separated format and just name the file with a .xls extension. Upon opening this file up in Excel it should "just work".
If you need rich formatting in your output Excel document, you could use Excel automation to build your output document. Using this you'd have both Word automation (read) and Excel automation (write) in your program.
Another option that I've used in a server environment (though it's a bit pricey) is the Aspose libraries. They have a pretty nice API (at least for Word, which is what I've used), and they eliminate the need for automation.
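
As a rough illustration of the comma-separated route, here is a minimal Python sketch. It swaps Word automation for reading the file directly, and it assumes the input is a .docx (Office Open XML) file rather than a binary .doc. The function name docx_tables_to_csv is made up for this example. It ignores everything outside tables, does not treat nested tables specially, and joins all text in a cell without separators.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables_to_csv(docx_file):
    """Return CSV text with one line per table row found in a .docx file.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in root.iter(W + "tr"):            # every table row
        cells = []
        for cell in row.findall(W + "tc"):     # direct cells of that row
            # Concatenate all text runs found inside the cell.
            cells.append("".join(t.text or "" for t in cell.iter(W + "t")))
        writer.writerow(cells)
    return out.getvalue()
```

Writing the returned text to a file gives something Excel can open as plain comma-separated data, with no formatting carried over.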

See http://blogs.msdn.com/brian_jones/archive/2009/04/01/importing-a-table-from-wordprocessingml-to-spreadsheetml.aspx

Here is a resource on automating office applications: Office Object Models

Related

Replace hyperlinks inside Word documents from outside Microsoft Word

Suppose I have a standard Word document called document.doc.
Inside this document, there are hyperlinks to some server which no longer exists. I wish to replace each link with the proper one, and since I must do this to many files, I hope there is a way to automate it.
How can I change the hyperlink to something else from outside Microsoft Office? Preferably in a Linux/Unix environment.
I noticed that all the hyperlinks in the document are stored in plaintext and can be viewed by
strings document.doc | grep -i "hyperlink"
I therefore tried a simple sed approach like this to edit in place:
sed -ir 's/www.badlink.com/www.goodlink.com/' document.doc
I then confirmed the hyperlink had changed by calling strings again. However, after using sed, the document cannot be opened by MS Word -- it states the file is corrupted.
So, is there any easy way to edit links in a Microsoft Word document with Linux/Unix tools? In the worst case, I imagine the task can be done with some Microsoft Office macro. And that is okay, too, if it is the only possibility.
DOC is a binary format, not plain text like RTF, so you can't edit it with a simple text editor or sed. Your replacement also changes the length of the string (www.goodlink.com is one character longer than www.badlink.com), and shifting bytes inside a binary file is likely what corrupted it.
You can easily do this simple search and replace with a VBA macro, or from another language using the Word Interop libraries. For more information, see
https://msdn.microsoft.com/en-us/library/f1f367bx.aspx
It depends on your needs. If you need to do this on the server side, you can use OpenOffice or, better yet, Aspose (commercially licensed third-party libraries; quite pricey but worth every penny).
If you need to do this on the client side (and assuming the client uses Word, which means they are running Windows), you can do it with a VBA macro or an Office add-in.
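
For what it's worth, here is a hedged Python sketch of a route that runs on Linux. It is not what the answer above describes: it assumes the files have first been converted from .doc to .docx, since a .docx is a zip archive whose external hyperlink targets live in its .rels parts. The function name replace_link_targets is made up for this example. Only the link target is changed; visible link text in the document body is left alone.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def replace_link_targets(docx_bytes, old, new):
    """Return a copy of a .docx with `old` replaced by `new` in hyperlink targets."""
    # Keep the relationships namespace as the default one when serializing.
    ET.register_namespace("", REL_NS)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith(".rels"):
                root = ET.fromstring(data)
                changed = False
                for rel in root.iter("{%s}Relationship" % REL_NS):
                    target = rel.get("Target", "")
                    if rel.get("Type") == HYPERLINK_TYPE and old in target:
                        rel.set("Target", target.replace(old, new))
                        changed = True
                if changed:
                    # Re-serialize only the parts that were actually modified.
                    data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            zout.writestr(item.filename, data)
    return out.getvalue()
```

Because the archive is rebuilt rather than patched in place, the replacement string is free to differ in length from the original.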

Using different program office extension

I have a program that can access a database with a whole bunch of articles.
Due to copyright, I can't access the database straight from my program, but I have a different program that can access it, and it's legitimate to copy small bits from the articles.
Because my friends and I quote a lot from these articles, I thought it would be useful if we could find an add-in for Word that will copy the requested part from an article.
Is there any add-in for Word that would let me use the program that I mentioned above so that I can access the database from within Word?
I would like to program this add-in myself, if possible.
Without further information about which operating system, and version of Word you are using, I can offer only a general outline.
1) It seems to me that you want to make a Word macro using Word Basic, or Visual Basic.
2) When you want to call your program which is external to Word, you need to use the shell command as outlined here from Microsoft's webpage.
I hope that helps you get started writing your macro!
CHEERS
Well, it's a workaround, but you can use a GUI automation tool such as WinRunner or TestQuest, which runs a sequence of actions on a given GUI, to simulate usage of the program. I assume these tools can take input from an XML or text file and log output to a text file.
If you have the output in a text file, you can parse it in any programming language, extract the information you need, and write it to Word or whatever format you like using OLE objects.

Is this possible in Excel: Open XLS via commandline, OnLoad import CSV data, Print as PDF, Close Doc?

I think this is the fastest solution to a problem I've got:
1) Generate a custom CSV file on the fly (this is already done via Perl).
2) Open an XLS document from the command line via a scripting language (the client already has a few Perl scripts running in this pipeline).
3) Write VBA or record a macro that executes the following on load:
a) Imports the data from the CSV file into the report template.
b) Prints the file via a PDF driver to a fixed location, using data in the CSV to name the file.
c) Closes the XLS file.
So, is this possible via Excel macros, and if not, is it possible via VBA? Thanks!
NOTE: It appears I've got to have a copy of MS Office anyway, so this is much faster to get going than using Visual Studio Tools for Office (VSTO). The report template is going to be on a server, and this way the end user can build as many reports as they like, "test" by printing a PDF using a demo CSV file, and import/embed the macro or VBA when they're done. I'd looked into Jasper Reports, but the end user is putting ad-hoc static text and groupings all over the report, and I figure this way they can build reports however they want and then automate them. Both of these questions by me, and the resulting comments/feedback, are related to this question:
In Excel, is it possible to automate reading of CSV data into a template and printing it to PDF from the commandline?
Is it possible to deploy a VB application made in Excel as a stand alone app?
FOCUS OF QUESTION: Again, the focus of the question is whether this is possible via Excel macros (and if not macros, then VBA), and whether there's any huge issue with this approach. For example, I know this is going to be "slow" since Excel would be loaded per job, but there's 16GB of RAM on the server and it's not used at all. I figure that since I've got to have a copy of Office on the server anyway, this is a much faster approach.
If you've got any questions, let me know via comments.
I suppose you could launch the report file from Perl and then have a macro inside the report file automatically look for the newest CSV file to import. Then you could process and output it. So you just need to launch the proper Excel file, with the embedded macros, from Perl and then let Excel and VBA take over.
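
The "look for the newest CSV file" step is simple enough to sketch. The real macro would be written in VBA inside the workbook, so the Python below only illustrates the selection logic. The function name newest_csv and the idea of a single drop directory are assumptions made for this example.

```python
from pathlib import Path


def newest_csv(directory):
    """Return the most recently modified .csv file in directory, or None."""
    candidates = list(Path(directory).glob("*.csv"))
    if not candidates:
        return None
    # The file with the largest modification time is the newest one.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

Picking by modification time can race with a CSV that is still being written, so passing the exact file name from the Perl script to the macro may be more robust.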

An HTML file is NOT an Excel file, right?

We use an application that has an "export to Excel" feature that doesn't work on PCs that don't have Outlook Express installed.
I know, you're thinking "WTF does Outlook Express have to do with Excel files?"
I asked the same thing, and here's what I found:
The file being generated is actually one of those Microsoft Single File Web Pages (.mht) and NOT an Excel file.
You need to have Outlook Express installed to actually view a .mht file.
I've explained to their support people that just because you can slap a .xls extension on a file and Excel will open it does not mean it's an Excel file, and does not mean that this is the right way to do it.
How would you explain that this is not proper?
Many people (especially managers) confuse Excel files with reporting files. In my opinion, a file is only qualified as an Excel file if it meets all of these conditions:
Is a spreadsheet formatted in one of the many Microsoft Excel formats.
Can be opened in the most recent version of Microsoft Excel.
Is editable in Microsoft Excel.
In your case, I'm guessing only condition #3 is met, so it's no Excel file. But your support people may still call it a reporting file.
If a clean Windows image with only Excel installed can't open it, then it isn't in Excel format. Period.
If a Windows machine with Outlook Express, but without Excel can open it (if you change the extension) then it can't be an Excel file. I'd combine that with Ignacio's suggestion for a slam-dunk.
Plus, surely if it's MHT, then you can't actually do spreadsheet operations on it? Or am I misunderstanding how it works?
I don't think your statements are correct. Excel (2007) has import and export filters for single-file HTML documents (.mht) even if there is no Outlook Express installed. However, this is not a native format and worksheet features such as formulas cannot be retained (see http://office.microsoft.com/en-us/excel/HP100141051033.aspx#7)
So what you should make clear to your customers is that there is a difference between an application's native file format and a format that isn't designed to contain spreadsheet functionality and is only supported via an import/export filter.
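
One way to show support people that the extension says nothing about the content is to look at the first bytes of the file. The Python sketch below is a heuristic, not a full format check. The function name sniff_format and its return labels are made up for this example. It only identifies the container: the OLE2 signature is shared by .xls and .doc, the zip signature by .xlsx and .docx, and the MHTML test just looks for a MIME header near the top of the file.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary Office container
ZIP_MAGIC = b"PK\x03\x04"                       # zip container (Office Open XML)


def sniff_format(data):
    """Guess the container format from the leading bytes of a file."""
    head = data[:1024]
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Single File Web Pages are MIME messages, so a MIME header shows up early.
    if b"mime-version:" in head.lower():
        return "mhtml"
    return "unknown"
```

Calling sniff_format(open(path, "rb").read(1024)) on one of the exported ".xls" files would be expected to report "mhtml" if the file really is a Single File Web Page.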

Using Office to programmatically convert documents?

I'm interested in using Office 2007 to convert between the pre-2007 binary formats (.doc, .xls, .ppt) and the new Office Open XML formats (.docx, .xlsx, .pptx).
How would I do this? I'd like to write a simple command line app that takes in two filenames (input and output) and perhaps the source and/or destination types, and performs the conversion.
Microsoft has a page which gives several examples of writing scripts to "drive" MS Word. One such example shows how to convert from a Word document to HTML. By changing the last parameter to any values listed here, you can get the output in different formats.
The easiest way would be to use Automation through the Microsoft.Office.Interop libraries. You can create an instance of a Word application, for example. There are methods attached to the Application object that will allow you to open and close documents, plus pretty much anything else you can accomplish in VBA by recording a macro.
You could also just write the VBA code in your Office application to do roughly the same thing. Both approaches are equally valid, depending on your comfort in programming in C#, VB.NET or VBA.
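
The same automation can be driven from a script. The Python sketch below assumes Windows with Office installed and the third-party pywin32 package, and covers only the old-to-new direction. The names TARGETS, plan_conversion and convert are made up for this example. The numeric codes are the Office enumeration values wdFormatXMLDocument (12), xlOpenXMLWorkbook (51) and ppSaveAsOpenXMLPresentation (24). The COM part is untested here, so treat it as a starting point.

```python
import os

# Target extension -> (COM ProgID, document collection, SaveAs format code).
TARGETS = {
    ".docx": ("Word.Application", "Documents", 12),
    ".xlsx": ("Excel.Application", "Workbooks", 51),
    ".pptx": ("PowerPoint.Application", "Presentations", 24),
}


def plan_conversion(output_path):
    """Pick the Office application and format code from the output extension."""
    ext = os.path.splitext(output_path)[1].lower()
    if ext not in TARGETS:
        raise ValueError("unsupported target type: %r" % ext)
    return TARGETS[ext]


def convert(input_path, output_path):
    """Open input_path in the matching Office application and save it anew."""
    import win32com.client  # pywin32; imported lazily, Windows with Office only

    prog_id, collection, file_format = plan_conversion(output_path)
    app = win32com.client.Dispatch(prog_id)
    try:
        # Office expects absolute paths when driven through COM.
        doc = getattr(app, collection).Open(os.path.abspath(input_path))
        doc.SaveAs(os.path.abspath(output_path), file_format)
        doc.Close()
    finally:
        app.Quit()
```

A command line wrapper would then only need to pass its two arguments to convert, for example convert("report.doc", "report.docx").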

Resources