Parsing excel data ( containing xml ) in VBA - excel

I have a simple Excel file containing rows and columns. One of the columns contains rows of string data like this (XML data):
<Employee Name="R1" Designation="Developer">
<SkillSet Language="C#"/>
</Employee>
<Employee Name="E2" Designation="Developer">
<SkillSet Language="Java"/>
</Employee>
I would like to read this information from the Excel file, filter it by a particular skill set, and put the results in an adjacent column of the same workbook. I may need to provide a button whose click triggers the action.
How do I approach this problem?
Should I write a macro or an Excel add-in? The Excel version can be 2003 or earlier, 2007, or 2010.
I can think of writing a user-defined function to read the cell data, but how do I read through the columns in VB, and how do I reuse my function across different Excel workbooks?

Although this may be reaching you late, I had the same problem you mention. I'm also fairly new to VBA. My client wanted a database application in Access 2003, but after some persuasion I was able to convince them to adopt Access 2007, which, by the way, is still not reliable for developing a multilingual application.
Anyway, to cut the story short, I was able to achieve the same after some Google searching, using the code I found here:
http://www.freevbcode.com/ShowCode.asp?ID=2922
and here
Parsing XML in VBA
Hope you find these links helpful. Cheers.

Add a reference (Tools-->References) to an XML parser, e.g., Microsoft XML, v6.0. My MS Access 2010 installation has seven XML parsers provided by Microsoft. Declare an instance of the parser and use its properties and methods as needed.
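As a rough illustration of the parsing step, here is a minimal sketch in Python rather than VBA, so it shows only the logic and not the MSXML calls. The function name `employees_with_skill` is hypothetical. It assumes each cell holds several sibling `<Employee>` elements with no single root, as in the question, which is why the text is wrapped in a dummy root before parsing; the same wrap-then-query approach should carry over to whichever XML parser is referenced from VBA.

```python
import xml.etree.ElementTree as ET

def employees_with_skill(cell_text, language):
    """Return the names of employees whose SkillSet lists `language`."""
    # The cell holds sibling <Employee> elements without a single root,
    # so wrap them in a dummy root to make the text well-formed XML.
    root = ET.fromstring("<root>" + cell_text + "</root>")
    names = []
    for employee in root.findall("Employee"):
        skills = [s.get("Language") for s in employee.findall("SkillSet")]
        if language in skills:
            names.append(employee.get("Name"))
    return names

# Sample cell content taken from the question.
cell = (
    '<Employee Name="R1" Designation="Developer">'
    '<SkillSet Language="C#"/>'
    '</Employee>'
    '<Employee Name="E2" Designation="Developer">'
    '<SkillSet Language="Java"/>'
    '</Employee>'
)
print(employees_with_skill(cell, "Java"))  # prints ['E2']
```

The returned names could then be joined and written to the adjacent column for each row.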

Related

Creating rtf template with multi spreadsheets(excel)

I have created a data template report (xdodtexe) whose output should be an Excel file with multiple sheets. My E-Business Suite version is R12.1.3 and I am using Office 2013.
I have created an RTF template with layouts on two separate pages, for example departments on one page and employees on another. I am using <?spreadsheet-sheet-name: department?> to name the sheets, but the sheet name comes out as "fndwrr" and both outputs land in the same Excel sheet instead of being split into two sheets.
I have also tried <?split-by-page-break:?> to split the output into two sheets, but this does not work either.
Hello there fellow "BI Publisher self-torturer" :-),
First of all, I would suggest you go over this document here, as the best method of manipulating XLS outputs is by using Excel templates.
Second of all - it would help if you supplied a sample of the XML data + a sample of the template code as you have it.
Btw, the split-by-page-break that you're trying is for other output formats, such as PDF, RTF etc. which follow a paging rule, unlike XLS.
This being said - I am pretty sure you can't do that using an RTF template.
Also, you should mention the version of the XDO Engine/BI Publisher, because Excel templates have been available since version 11.1.15 of BI Publisher.
Pay special attention to the section "Table 3-2 Column Entries", as you will need to set XDO_SHEET_? and XDO_SHEET_NAME_?; the first is the split/group criterion and the second is the actual name of the sheet.
Cheers

Vba-Excel How to generate XML file from a SpreadSheet of Excel?

I have been searching for a way to convert spreadsheet data in Excel into XML format.
I have thousands of rows of data like those shown below, and I want to convert them into XML format.
Can anyone please help? Any help would be greatly appreciated.
According to the screenshot, your Excel version can save the document in XML Spreadsheet 2003 format: Save As... > Other Formats, then locate it in the dropdown (this works as described in Excel 2007 at least).
The resulting XML will contain many of the native Excel workbook fields and nodes, but they can be easily removed with any reasonably advanced XML editor, e.g. Altova XMLSpy or similar. However, the cleanup depends entirely on your further needs.
For your convenience, you can see a sample Excel workbook and the XML generated from it as described above: https://www.dropbox.com/s/kxmxu2tq52y4m9b/ExcelToXML.zip
Good luck!
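If the Excel-flavoured XML from Save As is more than you need, a custom export is another option. The following is only a sketch in Python of that idea, not the method described above: the function `rows_to_xml`, the tag names `Rows` and `Row`, and the sample data are all made up for illustration, and it assumes every column header is a valid XML element name.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(headers, rows, root_tag="Rows", row_tag="Row"):
    """Build one element per row, with one child element per column."""
    root = ET.Element(root_tag)
    for row in rows:
        row_el = ET.SubElement(root, row_tag)
        for header, value in zip(headers, row):
            # Assumes each header is a valid XML tag name (no spaces etc.).
            ET.SubElement(row_el, header).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical data standing in for the spreadsheet contents.
headers = ["Name", "City"]
rows = [["Ann", "Oslo"], ["Bob", "Lima"]]
print(rows_to_xml(headers, rows))
```

Special characters such as `<` or `&` in cell values are escaped by the library when the tree is serialized.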

Working with Office "open" XML - just how hard is it?

I'm considering replacing a (very) large body of Office-automation code with something that works with the Office XML format directly. I'm just starting out, but already I'm worried that it's too big a task.
I'll be dealing with Word, Excel and PowerPoint. So far I've only looked at Word and Excel. It looks like Word documents should be reasonably easy to manipulate, but Excel workbooks look like a nightmare. For example...
In Word, it looks like you could delete a paragraph simply by deleting the corresponding "w:p" tag. However, the supplied code snippet for deleting a row in Excel takes about 150 lines of code(!).
The reason the Excel code is so big is that deleting a row means updating the row indexes of all the subsequent rows, fixing up the "shared strings" table, etc. According to a comment at the top, the code snippet is not even complete, in that it won't deal with a workbook that has tables in it (I can live with that).
What I'm not clear on is whether that's the only restriction that the sample code has. For example, would there also be a problem if the workbook contained a Pivot Table? Or a chart that references data from the same sheet? Or some named ranges? Wouldn't you also have to update the formulae for any cells (etc.) that referenced a row whose row index had changed?
[That's not to mention the "calc chain", which (thankfully) I think you can simply delete since it's only a cache that can be re-built.]
And that's my question, woolly though it is. Just how hard do you have to work to do something as simple as deleting a row properly? Is it an insurmountable task?
Also, if there are other, similar issues either with Excel or with Word or PowerPoint, I'd love to hear about them now, before I waste too much time going down a blind alley. Thanks.
Having worked with the Open XML SDK 2.0 for almost two years now I can say that doing seemingly trivial tasks can take many hours and sometimes days to figure out how to do it properly. For example, deleting an Excel row should be fairly straightforward and easy to do right? Nope because not only do you need code to delete your row, but then you have to update all the row indices, update any merged cell references, update hyperlink references, etc. Our internal delete method is close to 500 lines of code to just delete a row and I'm sure we don't have all the cases accounted for either.
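To make the renumbering step concrete, here is a deliberately incomplete sketch in Python using only the standard library (not the Open XML SDK). The function `delete_row` is hypothetical, and it assumes every `row` element carries an explicit `r` attribute. It only removes one row and shifts the row indices and cell references of the rows below it; formulas, merged cells, hyperlinks, shared strings, named ranges and everything else mentioned in this thread are left untouched, which is exactly why the real thing grows to hundreds of lines.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ET.register_namespace("", NS)  # keep the default namespace on output

def delete_row(sheet_xml, row_number):
    """Remove one row and renumber the rows and cell references below it."""
    root = ET.fromstring(sheet_xml)
    sheet_data = root.find(f"{{{NS}}}sheetData")
    for row in list(sheet_data.findall(f"{{{NS}}}row")):
        r = int(row.get("r"))  # assumes the optional r attribute is present
        if r == row_number:
            sheet_data.remove(row)
        elif r > row_number:
            row.set("r", str(r - 1))
            for cell in row.findall(f"{{{NS}}}c"):
                # Keep the column letters, replace the row part: B3 -> B2.
                column = re.match(r"[A-Z]+", cell.get("r")).group()
                cell.set("r", f"{column}{r - 1}")
    return ET.tostring(root, encoding="unicode")

sheet = (
    f'<worksheet xmlns="{NS}"><sheetData>'
    '<row r="1"><c r="A1"><v>1</v></c></row>'
    '<row r="2"><c r="A2"><v>2</v></c></row>'
    '<row r="3"><c r="A3"><v>3</v></c><c r="B3"><v>4</v></c></row>'
    '</sheetData></worksheet>'
)
print(delete_row(sheet, 2))
```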
The biggest complaint I have is the lack of documentation on how to do the most common tasks. The MSDN section on the Open XML SDK is very limited and whenever you need to do anything complicated you are really on your own. I've had to read the Open XML standard a lot to figure out what certain elements mean and how they should be implemented since I could find very little online.
The other challenging part is if you insert an element in a spot where it doesn't belong or put an invalid attribute on an element you will get a corrupt file when you try and open it. Most of the time you will not get any information on what caused the error and you will have to look at the Open XML standard spec to see what you did wrong.
If you need a fast turnaround time on converting that Office automation code into Open XML and what you are doing is not really basic, then I would say pass. If you have time and the patience to read up on the Word, Excel and PowerPoint XML structures and get familiar with how they relate then I say go for it. In my opinion it is really the only way to have very fine control over these office documents, but there will be a great learning curve when you start.
Oh and just for fun here is how much code is needed to add a comment to an Excel cell.
Just for completeness, here are some libraries I found for working with Excel XML:
www.extremexml.com - a layer on top of the Open XML SDK classes; focusses on injecting data into an existing spreadsheet; handles many of the cross-reference problems I identified in my question. Open source but GPL2 not LGPL. Code looks nice, and documentation is excellent. Does not appear terribly active on codeplex though.
Closed XML - another layer on top of the Open XML SDK - again open source, but with a less restrictive license (MIT). Looks nice, and looks more "active" than the above.
SpreadsheetLight - from what I can tell, a closed-source library sitting atop the Open XML SDK classes. Targeted more at those looking to create a spreadsheet from scratch rather than making changes to existing spreadsheets.
Here is another third party library dedicated to working with OpenXML:
http://www.officewriter.com
In the example cited by amurra above of deleting Excel spreadsheet rows, this is a single method call with this tool. It updates formulas and all the other references, which would otherwise seem to require 500 lines of code.
The OpenXML SDK itself is a great tool for very simple things, but you still have to concern yourself with a lot of the internals of the file format and packaging structure to get things really right.
Here are some additional libraries that can manipulate OOXML formats:
- GemBox.Spreadsheet (XLSX)
- GemBox.Document (DOCX)
GemBox also published some articles that demonstrate how to manipulate the OOXML file format with pure .NET (without using any library); I think you'll find them interesting:
www.codeproject.com/Articles/15593/Read-and-write-Open-XML-files-MS-Office
(Introduction to SpreadsheetML format and an explanation on how we can read and write worksheet's cell content)
www.codeproject.com/Articles/649064/Show-Word-File-in-WPF
(Introduction to WordprocessingML format and demonstration on how we can read document's text)

What is the best way to import data from sophisticated formula enriched Excel files into SalesForce.com?

My current employer (to remain nameless) has a collection of incredibly sophisticated Microsoft Excel 2003 worksheets (developed by contractors, also to remain nameless).
The employer is replacing the Excel-based solution with a SalesForce-based solution (developed by other contractors, likewise to remain unnamed). The SalesForce solution is also very complex using dozens of related objects and "Dynamic SOQL" to contain the data and formulas which previously was contained in the Excel-based solution.
The employer's problem, which has become my problem, is that the data from the Excel spreadsheets needs to be meticulously and tediously recreated in .CSV files so it can be imported into SalesForce.
While I've recently learned I can use CTRL-` to review formulas in Excel, this doesn't solve the problem that variables in Excel have cryptic names like $O$15. If I'm lucky, when I investigate $O$15, I'll find some metadata explaining it n cells up and/or some other data m cells to the left, and/or (in rare instances) there may be a comment on the cell.
Patterns within the Excel spreadsheets are very limited, rarely lasting more than 6 consecutive rows or columns, and no two sheets which need to be imported have much similarity.
Documentation of all systems is very limited.
Without my revealing any confidential data, does anyone have any good ideas how I might optimize my workflow?
It's not clear exactly what you need to do: here are 3 possible scenarios, requiring increasing knowledge of Excel.
1. If all you want is to convert the Excel spreadsheets into CSV format then just save the worksheets as CSVs.
2. If you just want the data and not the formulae, then it would be simple (using VBA) to output anything that isn't a formula (for those cells, cell.Formula won't start with =).
3. If you need to create a linkage excel-->csv-->existing Salesforce objects/SOQL then you will need to understand both the Excel Spreadsheets and the Salesforce objects/SOQL that have been created. This will be difficult unless you have good knowledge and experience of Excel and also understand what the salesforce App requires.
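Scenario 2 above comes down to a single test per cell. The sketch below shows that test in Python instead of VBA, purely as an illustration: the function `values_only_csv` is hypothetical, and it assumes the sheet has already been read into a list of rows where each entry is the cell's formula text (the equivalent of cell.Formula), so formulas start with `=` and constants do not.

```python
import csv
import io

def values_only_csv(grid):
    """Write a grid to CSV text, leaving formula cells blank."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in grid:
        # A cell whose text starts with "=" is a formula; keep constants only.
        writer.writerow(
            ["" if str(cell).startswith("=") else cell for cell in row]
        )
    return buffer.getvalue()

# Hypothetical sheet contents: two constants and one formula.
grid = [["Price", 10], ["Total", "=SUM(B1:B1)"]]
print(values_only_csv(grid))
```

Blanking formula cells rather than dropping them keeps the columns aligned, so the remaining data stays in its original position.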
Brian, if you're still working on this, here's one way to approach the problem. I use this kind of process often for updating data between SFDC and marketing automation apps.
1) Analyze the formulae that you're re-creating in Salesforce.com to determine what base data fields you need (stuff that doesn't have to be calculated from something else).
2) Find those columns/rows in your spreadsheets and use Paste Special -> Values in a new spreadsheet to create an upload file with values instead of formulae that you need for each data area (leads, prospects, accounts, etc.)
3) If you have to associate the info with leads or contacts or accounts and you have already uploaded or created those records in Salesforce.com, be sure to export them with their ID numbers. That makes it easy to use the vlookup formula in Excel to match up fields that you need to add and then re-upload the data into Salesforce.
Like data cleaning, this can be a tedious process. But if you take it step by step it shouldn't be too hard. Good luck.
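The ID matching in step 3 is a plain key lookup, which is what the vlookup formula does in Excel. A minimal sketch of the same idea in Python follows; the function `attach_ids` and the field names `Email` and `Id` are assumptions chosen for illustration, not anything taken from an actual Salesforce export.

```python
def attach_ids(new_rows, exported_rows, key="Email"):
    """Copy the Id of each exported record onto the row with the same key."""
    id_by_key = {row[key]: row["Id"] for row in exported_rows}
    # Rows with no matching exported record get Id=None so they stand out.
    return [dict(row, Id=id_by_key.get(row[key])) for row in new_rows]

# Hypothetical export (with IDs) and hypothetical rows to upload.
exported = [
    {"Id": "001A", "Email": "ann@example.com"},
    {"Id": "001B", "Email": "bob@example.com"},
]
to_upload = [
    {"Email": "bob@example.com", "Score": 7},
    {"Email": "eve@example.com", "Score": 3},
]
print(attach_ids(to_upload, exported))
```

Rows that come back with no ID are the ones that still need to be created before the re-upload.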

Convert richtext strings to excel

I have a form that has TinyMCE for richtext formatting. All of our data is available to export as an HTML report, PDF Report, and Excel Spreadsheet (report).
The fields, that we allow richtext in, show up as the formatted values in both the HTML and PDF reports, but in Excel we show them as strings. For instance:
<b>this part is bold</b><br />line 2 here.
I need a way to make that show up as bold/line-break in Excel rather than just showing that string, or at least a way to strip the HTML tags out and show plain text (though I would really like to at least keep the line breaks). Is there some type of macro I can include in the Excel download, or some C++ program that can convert it, or something?
Thanks for your time!
I've done something similar with PHPExcel
The trick is to take your formatted data and find a pattern. In your case, it would probably be table rows/table cells. Iterate through that structure setting the excel cell values as you go. For complex formatting you could fairly simply regex replace what is necessary to get formatted as you desire. The theory may sound a little complicated, but once you get down to it, it's only an hour or two's worth of work.
Certainly there are equivalent programs based on other server technologies. But this one has worked brilliantly for me over the years, and I trust it to work on sites for very big clients with crazy inbound traffic numbers...and it's never failed. It's the only reliable way I've found to write perfect, properly formatted Excel without requiring the user to jump through hoops to get a specific browser.
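For the fallback mentioned in the question (strip the tags but keep the line breaks), a small sketch in Python using only the standard library might look like the following. This is not PHPExcel and does not produce bold text; the names `_TextExtractor` and `html_to_text` are made up for illustration, and it assumes `<br>` is the only tag that should turn into a line break.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text content, turning <br> tags into newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br />, via handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(markup):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)

# Sample string taken from the question.
print(html_to_text("<b>this part is bold</b><br />line 2 here."))
```

The resulting string contains a newline character between the two lines, which a spreadsheet cell can display as a line break when text wrapping is enabled.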
