XML parse VBA excel (function trip, & MSXML2.DOMDocument) - excel

I need to parse hundreds of XML files having all the same structure as follows:
<?xml version="1.0" encoding="UTF-8"?>
<Concepts>
<ConceptModel name="food">
<Filters>
<Filter type="CC"/>
</Filters>
<Queries>
<Query lang="EN">(cheese, bread, wine)</Query>
<Query lang="DE">(Käse, Brot, Wein)</Query>
<Query lang="FR">(fromaige, pain, vin)</Query>
</Queries>
</ConceptModel>
</Concepts>
I have read several articles and posts in internet like below but I could not come up with a solution:
Excel vba Parse Complex XML
Parse XML File with VBA
So far I am doing:
Dim oXml As MSXML2.DOMDocument
Set oXml = New MSXML2.DOMDocument
oXml.LoadXML ("C:\folder\folder\name.xml")
Dim Queries As IXMLDOMNodeList
Dim Query As IXMLDOMNode
ThisWorkbook.Sheets(3).Cells(i, 1) = "before loop"
Set Queries = oXml.SelectNodes("/concepts/Queries")
MsgBox "how many Queries " & Queries.Length
For Each Query In Queries
ThisWorkbook.Sheets(3).Cells(i, 1) = "Works"
ThisWorkbook.Sheets(3).Cells(i, 2) = Query.SelectNodes("Query").iTem(0).Text
i = i + 1
Next
This code seems to be understood by VBA but it does not read the contents. The loop does not get read, meaning (I guess) that Queries is not looped at all. Which is confirmed by the fact that the Msgbox "how many queries" gives 0 as result. But actually there are three queries. Could someone give me a hand?
As second issue I would like to ask if
Dim oXml As MSXML2.DOMDocument
would be the same as
Dim oXml As MSXML2.DOMDocument60
Since I checked in tools/references "Microsof XML, v6.0"
I thought that the queries having a tag
might cause a problem. and I added the follwoing lines:
Dim childs As IXMLDOMNodeList
Set childs = oXml.SelectNodes("/concepts")
MsgBox "childs " & childs.Length
which also gives 0 as result. I would expect 3, since concepts has three children, namely ConceptModel, Filter and Queries. So, I am even more puzzled.

As close as possible to your OP
I 'd draw your attention to several errors or misunderstandings:
[1] Invalid .LoadXML Syntax
What is then the difference between .LoadXML ("C:\folder\folder\name.xml") and .Load ("C:\folder\folder\name.xml") ?
Load expects a file path and then loads the file content into the oXML object.
LoadXML doesn't expect a file parameter, but its actual XML text content that has to be a well formed string.
[2] XML distinguishes between lower and upper case, therefore nodes need to be addressed by their exact literal names:
the <Query> node wouldn't be identified by "query", "ConceptModel" isn't the same as "conceptmodel".
As second issue I would like to ask if
Dim oXml As MSXML2.DOMDocument would be the same as
Dim oXml As MSXML2.DOMDocument60,
since I checked in tools/references "Microsof XML, v6.0"?
No, it isn't. - Please note that the former declaration would load version 3.0 by default.
However it's absolutely preferrable to get the version 6.0 (any other versions are obsolete nowadays!)
As you are using so called early binding (referencing "Microsoft XML, v6.0"), I'll do the same but am referring to the current version 6.0:
Dim oXml As MSXML2.DOMDocument60 ' declare the xml doc object
Set oXml = New MSXML2.DOMDocument60 ' set an instance of it to memory
[3] misunderstanding some XPath expressions
A starting slash "/" in the XPath expression always refers to the DocumentElement (<Concepts> here),
you can add .DocumentElement to your document object instead. A starting double slash "//xyz" would find any "xyz" node if existant.
For instance
oXml.SelectNodes("//Query").Length
returns the same childNodes number (here: 3) as
oXml.DocumentElement.SelectNodes("//Query").Length ' or
oXml.SelectSingleNode("//Queries").ChildNodes.Length ' or even
oXml.SelectNodes("/*/*/*/Query").Length`.
Code example with reference to XML version 6.0
Of course you'd have to loop over several xml files, the example only uses one (starting in row 2).
Just for the case of not well formed xml files I added a detailled error Routine that enables you to identify the presumed error location. Load and LoadXML both return a boolean value (True if loaded correctly, False if not).
Sub xmlTest()
Dim ws As Worksheet: Set ws = ThisWorkbook.Sheets(3)
Dim oXml As MSXML2.DOMDocument60
Set oXml = New MSXML2.DOMDocument60
With oXml
.validateOnParse = True
.setProperty "SelectionLanguage", "XPath" ' necessary in version 3.0, possibly redundant here
.async = False
If Not .Load(ThisWorkbook.Path & "\xml\" & "name.xml") Then
Dim xPE As Object ' Set xPE = CreateObject("MSXML2.IXMLDOMParseError")
Dim strErrText As String
Set xPE = .parseError
With xPE
strErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
xPE.reason & _
"Source Text: " & .srcText & vbCrLf & vbCrLf & _
"Line No.: " & .Line & vbCrLf & _
"Line Pos.: " & .linepos & vbCrLf & _
"File Pos.: " & .filepos & vbCrLf & vbCrLf
End With
MsgBox strErrText, vbExclamation
Set xPE = Nothing
Exit Sub
End If
' Debug.Print "|" & oXml.XML & "|"
Dim Queries As IXMLDOMNodeList, Query As IXMLDOMNode
Dim Searched As String
Dim i&, ii&
i = 2 ' start row
' start XPath
Searched = "ConceptModel/Queries/Query" ' search string
Set Queries = oXml.DocumentElement.SelectNodes(Searched) ' XPath
'
ws.Cells(i, 1) = IIf(Queries.Length = 0, "No items", Queries.Length & " items")
ii = 1
For Each Query In Queries
ii = ii + 1
ws.Cells(i, ii) = Query.Text
Next
End With
End Sub
Additional hints
You also might be interested in an example how to list all child nodes via XMLDOM and to obtain attribute names from XML using VBA.
I include a further hint due to later comment (thanks to #barrowc )
"A further issue with using MSXML, v3.0 is that the default selection language is XSLPatterns instead of XPath.
Details on some of the differences between MSXML versions are here
and the differences between the two selection languages are discussed here."
In the current MSXML2 version 6.0 XPath 1.0 is fully supported. So it seems XSL Patterns have been implemented by Microsoft in earlier days, basically it can be regarded as a simplified subset of XPath expressions before W3C standardisation of XPath.
MSXML2 Version 3.0 allows the integration of XPath 1.0 at least by explicit selection language setting:
oXML.setProperty "SelectionLanguage", "XPath" ' oXML being the DOMDocument object as used in original post

It is the special characters (german alphabet) meaning you need to do something like a batch replace on the XML files so opening line is not this:
<?xml version="1.0" encoding="UTF-8"?>
but this:
<?xml version="1.0" encoding="iso-8859-1" ?>
Code to test with after:
Option Explicit
Public Sub test()
Dim xmlDoc As Object
Set xmlDoc = CreateObject("MSXML2.DOMDocument") 'New MSXML2.DOMDocument60
With xmlDoc
.validateOnParse = True
.setProperty "SelectionLanguage", "XPath"
.async = False
If Not .Load("C:\Users\User\Desktop\Test.xml") Then
Err.Raise .parseError.ErrorCode, , .parseError.reason
End If
End With
Debug.Print xmlDoc.SelectNodes("//Query").Length
End Sub
This is the XML I am using:
<?xml version="1.0" encoding="iso-8859-1" ?>
<Concepts>
<ConceptModel name="food">
<Filters>
<Filter type="CC"/>
</Filters>
<Queries>
<Query lang="EN">(cheese, bread, wine)</Query>
<Query lang="DE">(Käse, Brot, Wein)</Query>
<Query lang="FR">(fromaige, pain, vin)</Query>
</Queries>
</ConceptModel>
</Concepts>

Related

VBA generates Error on Mac but not Windows

This VBA code tracks the activity of PowerPoint slides and store the record in an Excel worksheet, saved on my local drive (same folder as the slides):
Dim slideShowRunning As Boolean
Dim counter As Integer
Dim st As Dat
Dim i As Integer
Dim sttime As Date
Dim oxlapp As Object
Dim oxlwb As Object
Dim oxlws As Object
Dim edtime As Date
Sub SlideShowBegin(ByVal Wn As SlideShowWindow)
st = Date
sttime = Time
counter = 0
Debug.Print " works;1 "
Set oxlapp = CreateObject("Excel.Application")
Debug.Print " works; 2"
oxlapp.Visible = False
Debug.Print " works; 3"
Set oxlwb = oxlapp.Workbooks.Open(ActivePresentation.Path & "\" & "record.xlsx")
Debug.Print " works; 4"
Set oxlws = oxlwb.Sheets("TimeRecord")
Debug.Print " works; 5"
i = oxlws.Range("A99919").End(-4162).Row
oxlws.Range("A1").Offset(i, 0).Value = st
oxlws.Range("A1").Offset(i, 1).Value = sttime
Debug.Print " works; 6"
End Sub
Public Sub OnSlideShowPageChange(ByVal Wn As SlideShowWindow)
If TypeName(slideShowRunning) = "Empty" Or slideShowRunning = False Then
slideShowRunning = True
SlideShowBegin Wn
End If
End Sub
Public Sub OnSlideShowTerminate(ByVal Wn As SlideShowWindow)
Name = Application.ActivePresentation.Name
slideShowRunning = False
edtime = Time
Debug.Print " works; 7"
ivalue = DateDiff("s", sttime, edtime)
Debug.Print ivalue
oxlws.Range("A1").Offset(i, 2).Value = edtime
oxlws.Range("A1").Offset(i, 3).Value = ivalue
oxlws.Range("A1").Offset(i, 4).Value = Name
Debug.Print " works; 9"
oxlapp.DisplayAlerts = False
Debug.Print " works; 10"
oxlwb.Save
Debug.Print " works; 11"
oxlapp.Visible = True
Debug.Print " works; 12"
oxlapp.DisplayAlerts = True
Debug.Print " works; 13"
End Sub
Note:
The code stores the PowerPoint slide Name along with slide opening
time and slide closing time.
The details are stores in Excel Sheet.
I have many slides all with same code. The code is working fine in Windows.
The code won't work when I run it on MAC.
I know there are few changes that need to be done to make it work on Mac but can't figure out what. Any help would be much appreciated.
(Hijacking AlexG's explanation)
From Wikipedia:
A path is a string of characters used to uniquely identify a location in a directory structure. It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory. The delimiting character is most commonly the slash (/), the backslash character (\), or colon (:), though some operating systems may use a different delimiter.
For example,
Classic Mac OS used : as a directory separator (eg., Macintosh HD:Documents:Letter)
Current macOS uses / as a directory separator (eg., /home/user/docs/Letter.txt)
Windows can use either \ or / as a directory separator (eg., C:\user\docs\Letter.txt)
Rather than trying to remember all the different symbols, there's a VBA property called Application.PathSeparator, which returns the path separator for the current operating system.
So, try changing your code from:
Set oxlwb = oxlapp.Workbooks.Open(ActivePresentation.Path & "\" & "record.xlsx")
...to:
Set oxlwb = oxlapp.Workbooks.Open(ActivePresentation.Path & Application.PathSeparator & "record.xlsx")
...and maybe that will solve your problem.
If not, you'll need to provide more specific information about what error you're getting and where.
I can't test it (and you'll find very little support for Excel on Mac) since not very many people use Excel on Mac — especially VBA. (Personally, the last time I touched a Macintosh was ~1986.)
There are several differences between Excel for Mac and Excel for Windows. You can read more about them here, perhaps starting with this explanation.

Having trouble with XML phase on VBA Excel

I need to parse a XML file having the structure as follows: (I can't show the data as it is confidential)
<?xml version="1.0" encoding="UTF-8"?>
<GACDWBulkLoadInterface xsi:schemaLocation="http://www.example.org/GACDWSchema GACDWSchema.xsd" xmlns="http://www.example.org/GACDWSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<BLConfiguration>
<BLProperties>
<BLProperty>
<key>isEmpty</key>
<value xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:boolean">true</value>
</BLProperty>
</BLProperties>
<version>api-ag-1.1.68-20210315-43</version>
</BLConfiguration>
<Customer>
<CustomerID></CustomerID>
<Customer_Name></Customer_Name>
<ISOCountryCode></ISOCountryCode>
<customerType></customerType>
<dataSourceProvider></dataSourceProvider>
<function></function>
<TopLevelCI>
<Contact Action="Create">
<Id>4</Id>
<artifactType></artifactType>
<contactType></contactType>
<email></email>
<firstName></firstName>
<lastName></lastName>
<userID></userID>
<ChildCI/>
</Contact>
<PhysicalComputerSystem Action="Create">
<Id>2</Id>
<artifactType></artifactType>
<assetLifeCycleState></assetLifeCycleState>
<category></category>
<ciLifeCycleState></ciLifeCycleState>
<custname></custname>
<description></description>
<discontinuedDate></discontinuedDate>
<machineType></machineType>
<managed></managed>
<manufacturer></manufacturer>
<model></model>
<serialNumber></serialNumber>
<virtualFlag></virtualFlag>
<ChildCI>
<PhysicalComputerSystemContact Action="Create">
<contactUID></contactUID>
</PhysicalComputerSystemContact>
<PhysicalComputerSystemLocation Action="Create">
<buildingID></buildingID>
<buildingName></buildingName>
<city></city>
<countryName></countryName>
<isoCountryCode></isoCountryCode>
<postalCode></postalCode>
<siteName></siteName>
<stateProvince></stateProvince>
<streetAddress></streetAddress>
</PhysicalComputerSystemLocation>
</ChildCI>
</PhysicalComputerSystem>
<OperatingSystem Action="Create">
<Id>1</Id>
<aliasHostname></aliasHostname>
<artifactType></artifactType>
<assetLifeCycleState></assetLifeCycleState>
<backupContr></backupContr>
<barSaId></barSaId>
<cacfCustomGroup></cacfCustomGroup>
<category></category>
<ciLifeCycleState></ciLifeCycleState>
<classification></classification>
<cpuCount></cpuCount>
<description></description>
<discontinuedDate></discontinuedDate>
<fqhn></fqhn>
<hostsystemID></hostsystemID>
<hostsystemType></hostsystemType>
<installationDate></installationDate>
<ipAddress></ipAddress>
<majorBusProc></majorBusProc>
<managed></managed>
<memory></memory>
<memoryUnit></memoryUnit>
<osModLevel></osModLevel>
<osName></osName>
<osBuildNumber></osBuildNumber>
<osProvider></osProvider>
<osRelease></osRelease>
<osVersion></osVersion>
<prodDate></prodDate>
<purgeDate></purgeDate>
<purpose></purpose>
<securityClass></securityClass>
<serverType></serverType>
<supportEnddate></supportEnddate>
<useAliasHostname></useAliasHostname>
<virtualFlag></virtualFlag>
<ChildCI>
<OperatingSystemContact Action="Create">
<contactUID></contactUID>
</OperatingSystemContact>
<OperatingSystemEnvironmentProfile Action="Create">
<authFlag></authFlag>
<authExceptDate></authExceptDate>
<authExceptReasonCode></authExceptReasonCode>
<authExceptText></authExceptText>
<disasterRecSys></disasterRecSys>
<domain></domain>
<extUsersMax></extUsersMax>
<hcAutoInterv></hcAutoInterv>
<hcExceptDate></hcExceptDate>
<hcExceptReasonCode></hcExceptReasonCode>
<hcExceptText></hcExceptText>
<hcUnit></hcUnit>
<healthchAutoFlag></healthchAutoFlag>
<highestClientDataClassification></highestClientDataClassification>
<highestDataClassification></highestDataClassification>
<highestIbmDataClassification></highestIbmDataClassification>
<ibmCbnInterval></ibmCbnInterval>
<ibmCbnRequired></ibmCbnRequired>
<ibmCbnType></ibmCbnType>
<ibmCbnUnit></ibmCbnUnit>
<ibmPrivIDUnit></ibmPrivIDUnit>
<ibmPrivType></ibmPrivType>
<ibmQevFlag></ibmQevFlag>
<ibmQevInterval></ibmQevInterval>
<ibmQevUnit></ibmQevUnit>
<ibmUsersMax></ibmUsersMax>
<internetAccFlag></internetAccFlag>
<personalData></personalData>
<privIDFlag></privIDFlag>
<privIDInterv></privIDInterv>
<regulatoryRequirements></regulatoryRequirements>
<remediationtimeHi></remediationtimeHi>
<remediationtimeLo></remediationtimeLo>
<remediationtimeMd></remediationtimeMd>
<rerFlag></rerFlag>
<rerInterv></rerInterv>
<rerType></rerType>
<rerUnit></rerUnit>
<scExceptDate></scExceptDate>
<scExceptReasonCode></scExceptReasonCode>
<scExceptText></scExceptText>
<secExemptionDesc></secExemptionDesc>
<secExemptionFlag></secExemptionFlag>
<securityPackage></securityPackage>
<securityPolicyCategory></securityPolicyCategory>
<securityPolicyOther></securityPolicyOther>
<securityPolicyType></securityPolicyType>
<serverPurpose></serverPurpose>
<sharedIDFlag></sharedIDFlag>
<shrdExceptDate></shrdExceptDate>
<shrdExceptReasonCode></shrdExceptReasonCode>
<shrdExceptText></shrdExceptText>
<vitalBusProcessYN></vitalBusProcessYN>
</OperatingSystemEnvironmentProfile>
<OperatingSystemIpAddress Action="Create">
<dotnotation></dotnotation>
<hostName></hostName>
<isPrimaryIpHost></isPrimaryIpHost>
<name></name>
<netMask></netMask>
<vsExceptDate></vsExceptDate>
<vsExceptReasonCode></vsExceptReasonCode>
<vsExceptText></vsExceptText>
<vulScanUnit></vulScanUnit>
<vulscanAutoFlag></vulscanAutoFlag>
<vulscanAutoInterval></vulscanAutoInterval>
</OperatingSystemIpAddress>
</ChildCI>
</OperatingSystem>
<Subsystem Action="Create">
<Id>3</Id>
<applSIDataType></applSIDataType>
<artifactType></artifactType>
<assetLifeCycleState></assetLifeCycleState>
<ciLifeCycleState></ciLifeCycleState>
<controlProductName></controlProductName>
<controlProductRelease></controlProductRelease>
<controlProductType></controlProductType>
<controlProductVendor></controlProductVendor>
<controlProductVersion></controlProductVersion>
<description></description>
<discontinuedDate></discontinuedDate>
<fqhn></fqhn>
<managed></managed>
<name></name>
<parentUID></parentUID>
<piFlag></piFlag>
<productModLevel></productModLevel>
<productName></productName>
<productPatchLevel></productPatchLevel>
<productRelease></productRelease>
<productVersion></productVersion>
<purgeDate></purgeDate>
<scanTime></scanTime>
<vendorName></vendorName>
<ChildCI>
<SubsystemContact Action="Create">
<contactUID></contactUID>
</SubsystemContact>
<SubsystemEnvironmentProfile Action="Create">
<authFlag></authFlag>
<authExceptDate></authExceptDate>
<authExceptReasonCode></authExceptReasonCode>
<authExceptText></authExceptText>
<disasterRecSys></disasterRecSys>
<hcAutoInterv></hcAutoInterv>
<hcExceptDate></hcExceptDate>
<hcExceptReasonCode></hcExceptReasonCode>
<hcExceptText></hcExceptText>
<hcUnit></hcUnit>
<healthchAutoFlag></healthchAutoFlag>
<ibmCbnInterval></ibmCbnInterval>
<ibmCbnRequired></ibmCbnRequired>
<ibmCbnType></ibmCbnType>
<ibmCbnUnit></ibmCbnUnit>
<ibmPrivIDUnit></ibmPrivIDUnit>
<ibmPrivType></ibmPrivType>
<ibmQevFlag></ibmQevFlag>
<ibmQevInterval></ibmQevInterval>
<ibmQevUnit></ibmQevUnit>
<privIDFlag></privIDFlag>
<privIDInterv></privIDInterv>
<rerFlag></rerFlag>
<rerInterv></rerInterv>
<rerType></rerType>
<rerUnit></rerUnit>
<scExceptDate></scExceptDate>
<scExceptReasonCode></scExceptReasonCode>
<scExceptText></scExceptText>
<sharedIDFlag></sharedIDFlag>
<shrdExceptDate></shrdExceptDate>
<shrdExceptReasonCode></shrdExceptReasonCode>
<shrdExceptText></shrdExceptText>
<vitalBusProcessYN></vitalBusProcessYN>
</SubsystemEnvironmentProfile>
</ChildCI>
</Subsystem>
</TopLevelCI>
<Relationship>
<RelationshipName>Hosted_by</RelationshipName>
<sourceCIType>OperatingSystem</sourceCIType>
<targetCIType>OperatingSystem</targetCIType>
</Relationship>
<Relationship>
<RelationshipName>Runs_on</RelationshipName>
<sourceCIType>Subsystem</sourceCIType>
<targetCIType>OperatingSystem</targetCIType>
</Relationship>
<Relationship>
<RelationshipName>Runs_on_hw</RelationshipName>
<sourceCIType>OperatingSystem</sourceCIType>
<targetCIType>PhysicalComputerSystem</targetCIType>
</Relationship>
</Customer>
</GACDWBulkLoadInterface>
I have researched and come up with my VBA code here
Sub ReadXML()
Dim oXML As MSXML2.DOMDocument60
Dim vaFile As Variant
Set oXML = New MSXML2.DOMDocument60
'Open Browse file dialog
vaFile = Application.GetOpenFilename("XML Files (*.xml), *.xml", _
Title:="Select XML files", MultiSelect:=False)
oXML.validateOnParse = True
oXML.setProperty "SelectionLanguage", "XPath" ' necessary in version 3.0, possibly redundant here
oXML.async = False
If Not oXML.Load(vaFile) Then 'Load XML has gone bad
Dim xPE As Object ' Set xPE = CreateObject("MSXML2.IXMLDOMParseError")
Dim strErrText As String
Set xPE = oXML.parseError
With xPE
strErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
xPE.reason & _
"Source Text: " & .srcText & vbCrLf & vbCrLf & _
"Line No.: " & .Line & vbCrLf & _
"Line Pos.: " & .linepos & vbCrLf & _
"File Pos.: " & .filepos & vbCrLf & vbCrLf
End With
MsgBox strErrText, vbExclamation
Set xPE = Nothing
Exit Sub
End If
Debug.Print "|" & oXML.XML & "|"
Dim nodeList As MSXML2.IXMLDOMNodeList, iNode As MSXML2.IXMLDOMNode
Dim Searched As String
Searched = "/*/*"
Set nodeList = oXML.SelectNodes(Searched)
'Set Queries = oXML.DocumentElement.SelectNodes(Searched)
For Each iNode In nodeList
'Debug.Print "<" & iNode.BaseName & ">"
Debug.Print "<" & iNode.nodeName & ">"
Next
However, My problem is no matter what I come up with Xpath queries. None of them work except \* or \.
I can't specific any path. Both .SelectSingleNode or .SelectNodes alway return length =0. Is there any problem with my code or my XML file or Xpath syntax?
As mentioned in comments your xml document has namespace definitions in its DocumentElement <GACDWBulkLoadInterface> (xmlns stands for xml name space). Furthermore "it contains a default namespace so any attempted parsing on named nodes must map to this namespace URI otherwise returns nothing."
To allow eventual analysis it's necessary to include a user defined prefix (e.g. :s) into explicit namespace settings, which can be used in later XPath expressions:
Dim oXML As MSXML2.DOMDocument60
Set oXML = New MSXML2.DOMDocument60
oXML.validateOnParse = True
Dim XMLNamespaces As String
XMLNamespaces = "xmlns:s='http://www.example.org/GACDWSchema'"
oXML.SetProperty "SelectionNamespaces", XMLNamespaces
Eventually you can define any XPath expression, e.g. the childnodes of Customer[1]:
Dim nodeList As MSXML2.IXMLDOMNodeList, iNode As MSXML2.IXMLDOMNode
Dim Searched As String
Searched = "//s:Customer/s:*"
Set nodeList = oXML.SelectNodes(Searched)
Related links
Displaying XML structures including attributes via recursive function calls
Display XML with hierarchy in cells
Apparently OP's source code: Parse DomDocument

Edit Word document embedded in a workbook and save as copy

I have made a Word template and inserted it to Excel as an object. I am opening it with the code and inputting data to bookmarks and main part. However after code is done doing processes my embedded template has all the data inside. So it is not a template anymore but a file I have created with the code.
Embedded Word template should be opened as a copy, as I do not want to make any changes to original embedded template or null it with the code all the time (or is it the only way it possible to do?). Is it anyhow possible with the code to open embedded Word document as a copy, make changes to it and save as a Word document? I can't find anything useful in the internet.
Sub opentemplateWord()
Dim sh As Shape
Dim objWord As Object ''Word.Document
Dim objOLE As OLEObject
Dim wSystem As Worksheet
Dim cell As Range
Set wSystem = Worksheets("Templates")
''The shape holding the object from 'Create from file'
''Object 2 is the name of the shape
Set sh = wSystem.Shapes("Object 2")
''Activate the contents of the object
sh.OLEFormat.Activate
''The OLE Object contained
Set objOLE = sh.OLEFormat.Object
''This is the bit that took time
Set objWord = objOLE.Object
'>------- This Part Inputs Bookmarks
objWord.Bookmarks.Item("ProjectName1").Range.Text = ThisWorkbook.Sheets("MAIN").Range("D15").Value
objWord.Bookmarks.Item("ProjectName2").Range.Text = ThisWorkbook.Sheets("MAIN").Range("D16").Value
'>------- This Part Inputs Text
'ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument '<--- This is for closing footer and header?
With objWord '<--| reference 'Selection' object
For Each cell In ThisWorkbook.Worksheets("Offer Letter").Range("C1", ThisWorkbook.Worksheets("Offer Letter").Range("C" & Rows.Count).End(xlUp))
Select Case LCase(cell.Value)
Case "title"
.TypeParagraph
.Style = objWord.ActiveDocument.Styles("Heading 1")
.TypeText Text:=cell.Offset(0, -1).Text
Case "main"
.TypeParagraph
.Style = objWord.ActiveDocument.Styles("Heading 2")
.TypeText Text:=cell.Offset(0, -1).Text
Case "sub"
.TypeParagraph
.Style = objWord.ActiveDocument.Styles("Heading 3")
.TypeText Text:=cell.Offset(0, -1).Text
Case "sub-sub"
.TypeParagraph
.Style = objWord.ActiveDocument.Styles("Heading 4")
.TypeText Text:=cell.Offset(0, -1).Text
End Select
Next cell
End With
objWord.Application.Visible = False
''Easy enough
objWord.SaveAs2 ActiveWorkbook.Path & "\" & Sheets("Other Data").Range("AN2").Value & ", " & Sheets("Other Data").Range("AN7").Value & "_" & Sheets("Other Data").Range("AN8").Value & "_" & Sheets("Other Data").Range("AX2").Value & ".docx"
End Sub
This is an interesting task which I haven't looked at in a few years... The trick is to open the document in the Word application interface, instead of in-place in Excel.
I've adapted the code in the question. In order to make it easier to follow (shorter) I've removed the editing in the Word document except for writing to a couple of bookmarks. That can, of course, be put back in.
I very much recommend using VBA to assign a name to the Shape. Office applications feel free to change a generic name they assign, so relying on "Object 2" could, sometime down the line, lead to problems.
Do NOT use the Activate method in this scenario (commented out). If the object is already activated in-place the document cannot be opened in the Word.Application.
Use the OLEFormat.Object.Verb method with the parameter xlOpen to open the document in Word.
Once it's open, the OLE object can be set to a Word document object.
From your comments: 'ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument '<--- This is for closing footer and header? No. Better to work with the corresponding Range objects. There are lots of examples "out there" for that. Ask a new question if you run into problems using them.
A Word document opened in the Word application can be saved as a file (a document opened in-place cannot). The question about not saving edits, however... there are two basic approaches:
SaveAs before editing, open that document, edit and save. The original should then be untouched
Do the editing in the object, save then undo the changes. This approach is shown in the code sample
Word's object model is able to group any number of actions into a single "undo record".
Set objUndo = objWord.Application.UndoRecord
objUndo.StartCustomRecord "Edit In Word"
After the editing has been done, to get back to an "empty" (unchanged) document:
objUndo.EndCustomRecord
Set objUndo = Nothing
objWord.Undo
Finally, to close the document quit the Word application without saving changes.
Sub opentemplateWord()
Dim sh As Shape
Dim objWord As Object, objNewDoc As Object ''Word.Document
Dim objOLE As OLEObject
Dim wSystem As Worksheet
Dim cell As Range
Set wSystem = Worksheets("Templates")
''The shape holding the object from 'Create from file'
''Object 2 is the name of the shape
Set sh = wSystem.Shapes("WordFile")
''The OLE Object contained
Set objOLE = sh.OLEFormat.Object
'Instead of activating in-place, open in Word
objOLE.Verb xlOpen
Set objWord = objOLE.Object 'The Word document
Dim objUndo As Object 'Word.UndoRecord
'Be able to undo all editing performed by the macro in one step
Set objUndo = objWord.Application.UndoRecord
objUndo.StartCustomRecord "Edit In Word"
With objWord
.Bookmarks.Item("ProjectName1").Range.Text = ThisWorkbook.Sheets("MAIN").Range("D15").Value
.Bookmarks.Item("ProjectName2").Range.Text = ThisWorkbook.Sheets("MAIN").Range("D16").Value
objWord.SaveAs2 ActiveWorkbook.Path & "\" & Sheets("Other Data").Range("AN2").Value & _
", " & Sheets("Other Data").Range("AN7").Value & "_" & _
Sheets("Other Data").Range("AN8").Value & "_" & _
Sheets("Other Data").Range("AX2").Value & ".docx"
objUndo.EndCustomRecord
Set objUndo = Nothing
objWord.Undo
.Application.Quit False
End With
Set objWord = Nothing
End Sub
I am doing the same thing and used this post for reference.
I got rid of objUndo object and CustomRecord methods.
Instead, I used the Duplicate method on the OLEobject to protect the original emmbedded doc from being edited. Seemed easier this way. The previous duplicates get removed at beginning so they don't pile up to infinity.
Sub opentemplateWord_v2()
Dim wSheet As Worksheet
Dim sh As Shape
Dim objOLE As OLEObject '<-- og emmbeded doc
Dim objOLE2 As OLEObject '<-- duplicate doc
Dim objWord As Object
Set wSheet = Worksheets("TemplateSheet") '<-- worksheet embedded doc is on
'--remove all duplicates from previous runs
'
' *the original embedded doc is named 'Object 1'
' (seen by clicking on doc --> the 'Name Box' is at the top left)
'
For Each sh In wSheet.Shapes
If sh.Name <> "Object 1" Then sh.Delete
Next
Set sh = wSheet.Shapes("Object 1") '<-- set the shape to the embedded doc Object
Set objOLE = sh.OLEFormat.Object '<-- get the embedded object in shape
Set objOLE2 = objOLE.Duplicate '<-- create duplicate of embedded object
objOLE2.Verb xlOpen '<-- open duplicate doc in the Word application
Set objWord = objOLE2.Object '<-- The Word document
'~~~~~~~ do the stuff here ~~~~~~~~~~~~~~~~~~~~~~~~~
'
' for mine, I am going to find/replace keyfeilds on the document
' (this example is replacing "Planet" with "earth"
'
With objWord.Content.Find
.text = "Planet"
.Forward = True
.MatchWholeWord = True
.MatchCase = False
.Wrap = 1 'wdFindContinue
.Execute Replace:=1 'wdReplaceOne
If .Found = True Then .Parent.text = "earth"
End With
'-- No Save Action
' I amsume the user will want view/edit the output after execution,
' and save it in a specific place
'
End Sub

Need dictionary/loop assistance

I have some work to complete where I have 9 tabs of data (some of which contain thousands of lines of data). Each tab contains (amongst others) a policy number, a credit and/or a debit number.
Every policy number will have a match somewhere in the tabs containing an equal credit or debit, e.g.
tab 1 will have Policy number 123 and a credit of £100 and
tab 5 will also have policy number 123 with a debit of £100.
What I'm looking to do is, look through each policy number on every tab and find where the opposite amount is located adding the location address to each policy number.
I'm certainly not looking for anyone to create the coding for me, but what I am looking for is advice. I've looked at using loops but feel this may take a very long time to process. I've also looked at Dictionaries but am relatively new to these so am not very confident.
Is what I'm looking for even possible? And if so any ideas where to start or pointers? Any advice is greatly appreciated. Thanks!
Usage Example
#Matt555, You can test the created XML file with the following code to get the sheet names of policy "123" and debit of 100. I tested the code assuming your titles in row A:A contain "policy" and "debit"
#Peh, You are right, xml dom methods aren't used too often within vba. The advantage of using XML in this connex is a great flexibility in searching via XPath as well as performance over huge files. I prefer it even to arrays or dictionaries when filtering unique values. It is possible to return the found item number in node lists without looping through the whole data set ...
Option Explicit
Sub testPolicy()
Dim policy
Dim debit As Double
policy = "123"
debit = "100"
MsgBox "Policy " & policy & " found in " & vbNewLine & _
findSheetName(policy, debit), vbInformation, "Policy " & policy & " in Tabs"
' You can easily split this to an array and analyze the results
End Sub
Function findSheetName(ByVal policy, Optional ByVal debit) As String
' Purpose: Finds Sheet Names where policy AND/OR debit is found
' Note: Assuming your titles in row A:A contain "policy" and "debit"
' You can declare xDoc also after Option Explicit to make it public
Dim xDoc As Object
Dim xNd As Object ' MSXML.IXMDOMNode
Dim xNdList As Object ' MSXML.IXMLDOMNodeList
Dim s As String
' XPath expression
Dim xPth As String
If IsMissing(debit) Then
xPth = "//row[policy=""" & policy & """]"
Else
xPth = "//row[policy=""" & policy & """][debit=""" & debit & """]"
End If
' XML to memory
Set xDoc = CreateObject("MSXML2.Domdocument.6.0")
' allow XPath
xDoc.setProperty "SelectionLanguage", "XPath"
xDoc.validateOnParse = False
' ========
' LOAD XML
' ========
xDoc.Load ThisWorkbook.Path & "\" & "output.xml"
' Loop thru NodeList
Set xNdList = xDoc.DocumentElement.SelectNodes(xPth)
Debug.Print xPth, xNdList.Length
For Each xNd In xNdList
s = s & xNd.ParentNode.NodeName & "|"
Next xNd
Set xDoc = Nothing
findSheetName = s
End Function
You could
a) create an XML file looping through all sheets,
b) open it via load method and
c) perform a simple XPath search (I can give some examples later)
I modified a recent answer (cf. excel-vba-xml-parsing-performance)
to do step "a)" using late binding thus
a) avoiding a reference to the latest MS XML Version Version 6 (msxml6.dll) and
b) getting data over all xheets. XML allows you structured search via XPath over nodes in a logical structure comparable to HTML. The root node in this example is called data, the following nodes are named with the sheets' names and the subsequent nodes get the names in row A:A of each sheet.
A XML file is a simple text file, which you can open by a text editor. Above all you can use VBA XMLDOM methods to analyze or search through the items (nodes). I will give you examples to relating to your question, but give me some time. => see answer "Usage Example", where I explain some Advantages of XML, too (#Peh).
Please pay Attention to the added notes, too.
Option Explicit
Sub xmlExportSheets()
' Zweck: XML Export over all sheets in workbook
' cf. Site: [excel-vba-xml-parsing-performance][1][https://stackoverflow.com/questions/40986395/excel-vba-xml-parsing-performance/40987237#40987237][1]
' Note: pretty printed raw output with line breaks and indentation using an embedded XSLT stylesheet
On Error GoTo ErrHandle
' A. Declarations
' 1 DECLARE XML DOC OBJECT '
' a) Early Binding: VBA REFERENCE MSXML, v6.0 necessary'
' Dim doc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60
' Dim root As IXMLDOMElement, dataNode As IXMLDOMElement, datesNode As IXMLDOMElement, namesNode As IXMLDOMElement
' b) Late Binding XML Files:
Dim doc As Object
Dim xslDoc As Object
Dim newDoc As Object
' c) Late Binding XML Nodes:
Dim root As Object
Dim sh As Object ' xml node containing Sheet Name
Dim dataNode As Object
Dim datesNode As Object
Dim namesnode As Object
' 2 DECLARE other variables
Dim i As Long
Dim j As Long
Dim tmpValue As Variant
Dim tit As String
Dim ws As Worksheet
' B. XML Docs to Memory
Set doc = CreateObject("MSXML2.Domdocument.6.0")
Set xslDoc = CreateObject("MSXML2.Domdocument.6.0")
Set newDoc = CreateObject("MSXML2.Domdocument.6.0")
' C. Set DocumentElement (= root node)'
Set root = doc.createElement("data")
' D. Create Root Node
doc.appendChild root
' ===========================
' ITERATE THROUGH Sheets
' ===========================
For Each ws In ThisWorkbook.Sheets
Set sh = doc.createElement(ws.Name) '
root.appendChild sh
' ===========================
' ITERATE THROUGH ROWS ' A2:NNn
' ===========================
For i = 2 To ws.UsedRange.Rows.Count ' Sheets(1)
' DATA ROW NODE '
Set dataNode = doc.createElement("row") '
sh.appendChild dataNode
' TABLES NODE (orig.: DATES NODE) '
Set datesNode = doc.createElement(ws.Cells(1, 1)) ' Dates
datesNode.Text = ws.Range("A" & i)
dataNode.appendChild datesNode
' NAMES NODE '
For j = 1 To ws.UsedRange.Columns.Count - 1 ' = 12
tit = ws.Cells(1, j + 1)
tmpValue = ws.Cells(i, j + 1)
Set namesnode = doc.createElement(tit)
namesnode.Text = tmpValue
dataNode.appendChild namesnode
Next j
Next i
Next ws
' =============================
' PRETTY PRINT RAW OUTPUT (XSL)
' =============================
xslDoc.LoadXML "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" _
& "<xsl:stylesheet version=" & Chr(34) & "1.0" & Chr(34) _
& " xmlns:xsl=" & Chr(34) & "http://www.w3.org/1999/XSL/Transform" & Chr(34) & ">" _
& "<xsl:strip-space elements=" & Chr(34) & "*" & Chr(34) & " />" _
& "<xsl:output method=" & Chr(34) & "xml" & Chr(34) & " indent=" & Chr(34) & "yes" & Chr(34) & "" _
& " encoding=" & Chr(34) & "UTF-8" & Chr(34) & "/>" _
& " <xsl:template match=" & Chr(34) & "node() | #*" & Chr(34) & ">" _
& " <xsl:copy>" _
& " <xsl:apply-templates select=" & Chr(34) & "node() | #*" & Chr(34) & " />" _
& " </xsl:copy>" _
& " </xsl:template>" _
& "</xsl:stylesheet>"
' XSLT (Transformation)
xslDoc.async = False
doc.transformNodeToObject xslDoc, newDoc
' =================
' Save the XML File
' =================
newDoc.Save ThisWorkbook.Path & "\Output.xml"
MsgBox "Successfully exported Excel data to " & ThisWorkbook.Path & "\Output.XML!", vbInformation
' Regular End of procedure
Exit Sub
ErrHandle:
MsgBox Err.Number & " - " & Err.Description, vbCritical
Exit Sub
End Sub
Note
Sheet names have to be without spaces
Added Note (important hint):
XML Nodes use titles in first row of every sheet. As the modified procedure gets title names via UsedRange it's important not to have any empty cells in row A:A for this example.
Additional remark
I don't know the reason why my prompt answer (marked as "a") was downgraded by someone. I would find it helpful to argue this :-)

Vba to import a sub-portion of a hugh csv file into excel 2010

I have a csv file that has approx 600 fields and approx 100k of rows, i would like to import only select fields and only certian rows where a select set of fields match a certain set of criteria into an existing excel worksheet tab
I attempted to use ms query within excel but it stops at 255 columns, i can import the whole file in excel 2010 (250m) but it is a memory hog and by the time i remove the unneeded fields and rows it locks up my computer.
I would like to kick the import process off with an excel vba macro. I have all the front end code of file selection, etc.... But need some assistance in the text read query convert to excel area of vba
Any assitance would be greatly appreciated
Thanks
Tom
For that many records you would be better off importing the .csv into Microsoft Access, indexing some fields, writing a query that contains only what you want, and then exporting to Excel from the query.
If you really need an Excel-only solution, do the following:
Open up the VBA editor. Navigate to Tools -> References. Select the most recent ActiveX Data Objects Library. (ADO for short). On my XP machine running Excel 2003, it's version 2.8.
Create a module if you don't have one already. Or create one anyway to contain the code at the bottom of this post.
In any blank worksheet paste the following values starting at cell A1:
SELECT Field1, Field2
FROM C:\Path\To\file.csv
WHERE Field1 = 'foo'
ORDER BY Field2
(Formatting issues here. select from, etc should each be in their own row in col A for reference. The other stuff are the important bits and should go in column B.)
Amend the input fields as appropriate for your filename and query requirements, then run thegetCsv() subroutine. It will put the results in a QueryTable object starting at cell C6.
I personally hate QueryTables but the .CopyFromRecordset method I prefer to use with ADO doesn't give you field names. I left the code for that method in, commented out, so you can investigate that way. If you use it, you can get rid of the call to deleteQueryTables() because it's a really ugly hack, it deletes whole columns which you may not like, etc.
Happy coding.
Option Explicit
Function ExtractFileName(filespec) As String
' Returns a filename from a filespec
Dim x As Variant
x = Split(filespec, Application.PathSeparator)
ExtractFileName = x(UBound(x))
End Function
Function ExtractPathName(filespec) As String
' Returns the path from a filespec
Dim x As Variant
x = Split(filespec, Application.PathSeparator)
ReDim Preserve x(0 To UBound(x) - 1)
ExtractPathName = Join(x, Application.PathSeparator) & Application.PathSeparator
End Function
Sub getCsv()
Dim cnCsv As New ADODB.Connection
Dim rsCsv As New ADODB.Recordset
Dim strFileName As String
Dim strSelect As String
Dim strWhere As String
Dim strOrderBy As String
Dim strSql As String
Dim qtData As QueryTable
strSelect = ActiveSheet.Range("B1").Value
strFileName = ActiveSheet.Range("B2").Value
strWhere = ActiveSheet.Range("B3").Value
strOrderBy = ActiveSheet.Range("B4").Value
strSql = "SELECT " & strSelect
strSql = strSql & vbCrLf & "FROM " & ExtractFileName(strFileName)
If strWhere <> "" Then strSql = strSql & vbCrLf & "WHERE " & strWhere
If strOrderBy <> "" Then strSql = strSql & vbCrLf & "ORDER BY " & strOrderBy
With cnCsv
.Provider = "Microsoft.Jet.OLEDB.4.0"
.ConnectionString = "Data Source=" & ExtractPathName(strFileName) & ";" & _
"Extended Properties=""text;HDR=yes;FMT=Delimited(,)"";Persist Security Info=False"
.Open
End With
rsCsv.Open strSql, cnCsv, adOpenForwardOnly, adLockReadOnly, adCmdText
'ActiveSheet.Range("C6").CopyFromRecordset rsCsv
Call deleteQueryTables
Set qtData = ActiveSheet.QueryTables.Add(rsCsv, ActiveSheet.Range("C6"))
qtData.Refresh
rsCsv.Close
Set rsCsv = Nothing
cnCsv.Close
Set cnCsv = Nothing
End Sub
Sub deleteQueryTables()
On Error Resume Next
With Application
.ScreenUpdating = False
.Calculation = xlCalculationManual
End With
Dim qt As QueryTable
Dim qtName As String
Dim nName As Name
For Each qt In ActiveSheet.QueryTables
qtName = qt.Name
qt.Delete
For Each nName In Names
If InStr(1, nName.Name, qtName) > 0 Then
Range(nName.Name).EntireColumn.Delete
nName.Delete
End If
Next nName
Next qt
With Application
.ScreenUpdating = True
.Calculation = xlCalculationAutomatic
End With
End Sub
You can parse your input file extracting the lines that conform to your criteria. The following code uses the split function on each line of the CSV file to separate the fields and then checks to see if it matches the required criteria. If all the criteria match then selected fields are saved in a new CSV file then you can just open the smaller file. You will need to set the microsoft scripting runtime reference in the VBA editor for this to work.
This method should use little memory as it processes 1 line at a time, I tested it on data of 600 fields and 100000 lines and it took about 45 seconds to process the file with no noticable increase in RAM usage in windows task manager. It is CPU intensive and the time taken would increase as the complexity data, conditions and the number of fields copied increases.
If you prefer to write directly to an existing sheet this can be easily acheived, but you would have to rememove any old data there first.
Sub Extract()
Dim fileHandleInput As Scripting.TextStream
Dim fileHandleExtract As Scripting.TextStream
Dim fsoObject As Scripting.FileSystemObject
Dim sPath As String
Dim sFilenameExtract As String
Dim sFilenameInput As String
Dim myVariant As Variant
Dim bParse As Boolean 'To check if the line should be written
sFilenameExtract = "Exctract1.CSV"
sFilenameInput = "Input.CSV"
Set fsoObject = New FileSystemObject
sPath = ThisWorkbook.Path & "\"
'Check if this works ie overwrites existing file
If fsoObject.FileExists(sPath & sFilenameExtract) Then
Set fileHandleExtract = fsoObject.OpenTextFile(sPath & sFilenameExtract, ForWriting)
Else
Set fileHandleExtract = fsoObject.CreateTextFile((sPath & sFilenameExtract), True)
End If
Set fileHandleInput = fsoObject.OpenTextFile(sPath & sFilenameInput, ForReading)
'extracting headers for selected fields in this case the 1st, 2nd and 124th fields
myVariant = Split(fileHandleInput.ReadLine, ",")
fileHandleExtract.WriteLine (myVariant(0) & "," & _
myVariant(1) & "," & _
myVariant(123))
'Parse each line (row) of the inputfile
Do While Not fileHandleInput.AtEndOfStream
myVariant = Split(fileHandleInput.ReadLine, ",")
'Set bParse initially to true
bParse = True
'Check if the first element is greater than 123
If Not myVariant(0) > 123 Then bParse = False
'Check if second element is one of allowed values
'Trim used to remove pesky leading or lagging values when checking
Select Case Trim(myVariant(1))
Case "Red", "Yellow", "Green", "Blue", "Black"
'Do nothing as value found
Case Else
bParse = False 'As wasn't a value in the condition
End Select
'If the conditions were met by the line then write specific fields to extract file
If bParse Then
fileHandleExtract.WriteLine (myVariant(0) & "," & _
myVariant(1) & "," & _
myVariant(123))
End If
Loop
'close files and cleanup
fileHandleExtract.Close
fileHandleInput.Close
Set fileHandleExtract = Nothing
Set fileHandleInput = Nothing
Set fsoObject = Nothing
End Sub

Resources