Remove (child) node from XML DOM object using VBA (Excel) - excel

I am creating quite complex XML files using a template, replacing special search strings with values which can be entered in an Excel sheet, and then storing the xml-file.
Dim strInpPath As String
Dim strOutpPath As String
Dim fso
Dim f
Dim oDomRd As Object, oNode As Object, i As Long, oAtt As Object, oGroup As Object, oDomWr As Object
Dim oTest As Object
strInpPath = ActiveWorkbook.ActiveSheet.Cells(3, 4).Value
strOutputPath = ActiveWorkbook.ActiveSheet.Cells(4, 4).Value
Set oDomRd = CreateObject("MSXML2.DOMDocument")
oDomRd.Load strInpPath
Set oDomWr = CreateObject("MSXML2.DOMDocument")
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile(strOutputPath, 2, True)
Set oGroup = oDomRd.SelectNodes("/")
Set oNode = oGroup.NextNode
If Not (oNode Is Nothing) Then
strout = oNode.XML
strout = ScanTable("_S_AND_R_TABLE_1", strout)
oDomRd.LoadXML (strout)
Set oGroup = oDomRd.SelectNodes("/")
Set oNode = oGroup.NextNode
If oNode.HasChildNodes() Then
Set oLists = oNode.DocumentElement
Run RemoveOptionalEmptyTags(oLists)
End If
strout = oNode.XML
f.write (strout)
Else
strout = "001 error reading file"
End If
MsgBox strout
End Function
Some of the field values are not mandatory so they can be left empty. In this case, the first procedure (scantable) enters "##REMOVE##" as value. In the second step, I want to step through the entire DOMObject and remove the nodes having the value "##REMOVE##"
for this second step I created a procedure:
Public Function RemoveOptionalEmptyTags(ByRef oLists)
For Each listnode In oLists.ChildNodes
If listnode.HasChildNodes() Then
Run RemoveOptionalEmptyTags(listnode)
Else
lcBasename = listnode.ParentNode.BaseName
lcText = listnode.Text
If lcText = "##REMOVE##" Then
listnode.ParentNode.RemoveChild listnode
Exit For
End If
End If
Next listnode
End Function
This works pretty fine, the only problem is, that the node is not removed, it only is empty ():
<Cdtr>
<Nm>Name Creditor</Nm>
<PstlAdr>
<Ctry>DE</Ctry>
<AdrLine>Street</AdrLine>
<AdrLine/>
</PstlAdr>
</Cdtr>
now the question:
How can I completely REMOVE the node, so it would look like this (the second is gone):
<Cdtr>
<Nm>Name Creditor</Nm>
<PstlAdr>
<Ctry>DE</Ctry>
<AdrLine>Street</AdrLine>
</PstlAdr>
</Cdtr>

Basically the RemoveChild syntax is correct:
{NodeToDelete}.ParentNode.RemoveChild {NodeToDelete}
But let's repeat the xml structure and note that each text node (if existant) is regarded as a ChildNode of its parent (i.e. one hierarchy level deeper).
<Cdtr> <!-- 0 documentElement -->
<Nm>Name Creditor</Nm> <!-- 1 ChildNode of Nm = 'Name Creditor' -->
<PstlAdr> <!-- 1 listNode.ParentNode.ParentNode -->
<Ctry>DE</Ctry> <!-- 2 ChildNode of Ctry = 'DE' -->
<AdrLine>Street</AdrLine> <!-- 2 ChildNode of AdrLine[1] = 'Street' -->
<AdrLine> <!-- 2 listNode.ParentNode to be removed -->
<!-- NODETEXT ##REMOVE## --> <!-- 3 ChildNode of AdrLine[2] -->
</AdrLine>
</PstlAdr>
</Cdtr>
Diving down to bottom in xml hierarchy (assuming text values) via
listnode.ParentNode.RemoveChild listnode
you are deleting the textual ChildNode of AdrLine[2] (level 3) which is the string "##REMOVE##",
but not it container node AdrLine[2] (level 2). Therefore you are deleting only the dummy text.
Following your logic in function RemoveOptionalEmptyTags() as close as possible you'd have to code instead:
listNode.ParentNode.ParentNode.RemoveChild listNode.ParentNode
addressing PstlAdr (=level 1) executing a deletion of its ChildNode AdrLine[2] (i.e. at level 2) which
automatically includes deletion of the dummy string "##REMOVE" at level 3.
Related links:
XML Parse via VBA
Obtain atrribute names from xml using VBA

Related

How to loop through XML-nodes and validate if values exists?

I have through an API fetched my data as an XML, and I wish to cycle through nodes (there are several of the same type) and add them to certain fields/a table.
Example from the XML-file:
<HistRating
xmlns="">
<EndrAr>2020</EndrAr>
<EndrMnd>7</EndrMnd>
<Rating>A</Rating>
</HistRating>
<HistRating
xmlns="">
<EndrAr>2019</EndrAr>
<EndrMnd>6</EndrMnd>
<Rating>A</Rating>
</HistRating>
I have tried the following format (at this point the XML I need is in a string in xmlDoc xmlDoc = CreateObject("MSXML2.DOMDocument.6.0"). Fully aware that this is not a really "sexy" way to write it, but I'm new at this game:
Set nodeXML = xmlDoc.getElementsByTagName("EndrAr")
Range("G1").Value = nodeXML(1).Text
Range("H1").Value = nodeXML(2).Text
Range("I1").Value = nodeXML(3).Text
Set nodeXML = xmlDoc.getElementsByTagName("EndrMnd")
Range("G2").Value = nodeXML(1).Text
Range("H2").Value = nodeXML(2).Text
Range("I2").Value = nodeXML(3).Text
Set nodeXML = xmlDoc.getElementsByTagName("Rating")
Range("G3").Value = nodeXML(1).Text
Range("H3").Value = nodeXML(2).Text
Range("I3").Value = nodeXML(3).Text
This works great as long as all three items are there. Unfortunately that is not given. If it is a new company i.e. (3) wont exist (there is one line per year above), and I would like to either set the cell to Blank or No value or something.
The result from when I run the above code:
But if I try to add a line 4 to test what happens if value does not exists I get the following (for obvious reasons)
What I would love some help with is:
Can I by some "magic" add a ifmissing (tried it, but could not get it to work)?
Other ways to add a if variable is not found, input following into cell
Or are there a complete different way I should have solved this?
This is to add accounting data from last X available years (where X is ie 4, or less if not 4 is available) from 30 nodes.
You could use an Error trapping Function. Note in the code below we choose not to use the returned boolean.
Dim myTest as String
.
.
TryReadingXmlNode nodeXML,1, myText
Range("G1").Value = myText
.
.
Public Function TryReadingXmlNode(ByVal ipNode as object, ByVal ipIndex as Long, ByRef opText as string) as boolean
On Error Resume Next
opText=ipNode.Item(ipIndex).Text
TryReadingXmlNode=Len(opText)>0
If err.number>0 then opText="NoValue"
on Error Goto 0
End Function
Start by querying all of the HistRating elements, then loop over that collection:
Const MAX_YEARS As Long = 4
Dim ratings, rating, c As Range, i as Long
Set c= Range("A1")
Set ratings = xmlDoc.getElementsByTagName("HistRating")
For Each rating in ratings
c.offset(0, i) = rating.getElementsByTagName("EndrAr")(0).Text
c.offset(1, i) = rating.getElementsByTagName("EndrMnd")(0).Text
c.offset(2, i) = rating.getElementsByTagName("Rating")(0).Text
i = i + 1
If i >= MAX_YEARS Then Exit For 'exit if processed enough nodes
Next rating

Why Object.Selectnodes(XPath) gets 1st node value if former node is empty

Why Object.SelectNodes(XPath) gets as 1st node value second node value if former node (real 1st node value) is empty.
Example below:
XML:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<person>
</person>
<person>
<name>Peter</name>
</person>
</Document>
VBA code:
Dim j as Integer
Dim FileToOpen as Variant
FileToOpen = Application.GetOpenFilename(Filefilter:="XML Files (*.xml), *.xml", _
Title:="Choose XML document ", MultiSelect:=False)
Set XDoc = CreateObject("MSXML2.DOMDocument")
XDoc.async = False: XDoc.validateOnParse = False
XDoc.Load FileToOpen
For j = 1 To 2
Set tofields = XDoc.SelectNodes("//Document/person/name")
If Not (tofields.Item(j)) Is Nothing Then
Debug.Print tofields.Item(j).Text
Else
Debug.Print "Nothing"
End If
Next j
Result:
Peter
Nothing
Why is not "Nothing" on the first place in the result ? How to reach that ? If parent node does not include 1st child node, 1st iteration is omitted.
Thank you.
Enumeration in XMLDOM differs from XPath
XMLDOM syntax enumerates nodes as zero based items in a NodeList, i.e. starting from 0, whereas XPath expressions identifying subnodes start from 1 (e.g. calling a first name item "//name[1]"). This mistakes in considering 2 as the last item index instead of looping from .Item(0) to .Item(1) in your example code. You get the number of found items results via the node list's .Length method (2 minus 1 results in 1 as the last index number, thus giving a name node series of 0 to 1).
Furthermore it's recommended to reference MSXML2 version 6.0 (MSXML2.DOMDocument refers to the last stable version 3.0 used only for compatibility reasons).
Further hints assuming that you want loop through all persons in your XML document (the nodelist in the OP delivers one item only, as the name node exists only once):
The expressions xDoc.SelectNodes("//Document/person") or //person would search the defined node set at any hierarchy Level within a given node structure. So it's less time consuming to use unambigously Set toFields = xDoc.DocumentElement.SelectNodes("person") in your case.
The following code example wouldn't show the Nothing case, as the node list displays two name nodes only (For i = 0 To toFields.Length - 1). Just in order to check your original attempt you could enumerate up to three items by changing intently to For i = 0 To toFields.Length (i.e. 0 to 2).
Additional link
Analyze your XML structure via recursive calls; a working function can be found at Parse XML using XMLDOM.
Code example
Dim xDoc As Object, toFields As Object
Dim myName As String, i As Long
Set xDoc = CreateObject("MSXML2.DOMDocument.6.0") ' recommended version 6.0 (if late bound MSXML2)
xDoc.async = False: xDoc.validateOnParse = False
' ...
Set toFields = xDoc.DocumentElement.SelectNodes("person")
For i = 0 To toFields.Length - 1
If Not toFields.Item(i) Is Nothing Then
If toFields.Item(i).HasChildNodes Then
myName = toFields.Item(i).SelectSingleNode("name").Text
Debug.Print i, IIf(Len(Trim(myName)) = 0, "**Empty name", myName)
Else
Debug.Print i, "**No name node**"
End If
Else
Debug.Print i, "**Nothing**" ' shouldn't be needed from 0 to .Length-1 items :-)
End If
Next i

Cannot get the text of nth span element using vba

I have the following html part
<div class="description">
<span>Brand:</span>
Nikon<br/>
<span>Product Code:</span> 130342 <br/>
<span>Barcode</span> 18208948581 <br/>
<span>Availability:</span>Available</div>
I am trying to get the last span and the word Available using the following
Set availability = ie.Document.getElementsByClassName(".description").getElementsByTagName("span")(2)
wks.Cells(i, "D").Value = availability.innerText
But it shows all span texts
What I am doing wrong here?
Use last-child css pseudo class in descendant combination with parent element class selector.
.description span:last-child
The :last-child CSS pseudo-class represents the last element among a
group of sibling elements.
Applying:
single match
Set availability = ie.document.querySelector(".description span:last-child")
Cells(1,1) = availability.innerText
all matches
Set availability = ie.document.querySelectorAll(".description span:last-child")
Cells(1,1) = availability.item(0).innerText
Otherwise, you can return the span collection from that parent class and index into it
Set availability = ie.document.querySelectorAll(".description span")
Cells(1,1) = availability.item(2).innerText '<==choose your index here
Or even chain:
Set availability = ie.document.querySelector(".description span + span + span") '<==expand as required. This uses [adjacent sibling combinator][4].
Sadly, pseudo classes nth-of-type / nth-child are not supported in VBA implementation though you can in many other languages e.g. python.
—-
If after just the Available you should be able to use .description as your selector to return all the text in the div. Then use Split on the .innerText using Chr$(32) to split by and extract the UBound (I.e. the last element of the generated array)
Set availability = ie.document.querySelector(".description")
Dim arr() As String
arr = split( availability.innerText, ":")
Cells(1,1) = arr(UBound(arr))
As Zac pointed out in the comments, you shouldn't use a period . with the getElementsByClassName method.
ie.Document.getElementsByClassName is returning a DispHTMLElementCollection of elements. You need to specify which element you want to reference
Set availability = ie.Document.getElementsByClassName(".description")(0).getElementsByTagName("span")(2)
A better way to write the write the code would be to reference the Microsoft HTML Object Library and create a variable to test each element returned. Unfortunately, there is a bug in the DispHTMLElementCollection implementation, so you will need to use Object instead of DispHTMLElementCollection.
Dim doc As HTMLDocument
Dim availability As Object
Set doc = ie.Document
Set availability = doc.getElementsByClassName("description")
Dim div As HTMLDivElement
Dim span As HTMLSpanElement
Dim spans As Object
For Each div In availability
Set spans = div.getElementsByTagName("span")
For Each span In spans
Debug.Print span.innerText
Next
Next
Output

Can't load node from .jdf file in Vbscript

I'm trying to get a value from a node in a .jdf file.
It gives us an error
object required: 'curNode'
in line no. 13 - inputFolder = curNode.getAttribute("Amount")
We don't really know what to do... any help please?
Thank you
'creates the msxml object
Set xmlDoc = CreateObject("Msxml2.DOMDocument.6.0")
Dim xmlDataPath,retVal
xmlDataPath = "C:\Users\liatte\Desktop\Aviv Omer Neta\JDFs to Txt\a.jdf"
'load the xml data of the script
retVal=xmlDoc.load(xmlDataPath)
'get input folder
Set curNode = xmlDoc.selectSingleNode("//JDF/ResourceLinkPool/ComponentLink")
Dim inputFolder
inputFolder = curNode.getAttribute("Amount")
To deal with the error, check
If curNode Is Nothing Then
...
Else
Dim inputFolder
...
End If
Obviously your assumptions (XPath expression) about the source file are wrong, when selectSingleNode() fails.
If an XPath expression like //JDF/ResourceLinkPool/ComponentLink does not select elements in your input document then it is likely that you are processing a document which uses namespaces, see http://en.wikipedia.org/wiki/XML_namespaces.
With XPath 1.0 a path like /foo/bar selects bar child elements of foo elements in no namespace while with an XML document of the form
<foo xmlns="http://example.com/ns1">
<bar>baz</bar>
</foo>
the elements are in the namespace http://example.com/ns1.
With your sample there is probably a default namespace declaration (e.g. xmlns="http://www.CIP4.org/JDFSchema_1_1") which requires you to change your XPath expressions by defining a prefix for the namespace e.g.
xmlDoc.setProperty "SelectionNamespaces", "xmlns:jdf='http://www.CIP4.org/JDFSchema_1_1'"
and using it:
Set curNode = xmlDoc.selectSingleNode("//jdf:JDF/jdf:ResourceLinkPool/jdf:ComponentLink")
Documentation for MSXML is at http://msdn.microsoft.com/en-us/library/windows/desktop/ms756048%28v=vs.85%29.aspx.

Change the font of form and usercontrol( .frm and .ctl ) at runtime

I have a VB 6 Add-in that adds all the projects to a project group, iterates through each of the component of those projects, and if a form or usercontrol is found then changes its properties.
The properties are defined by the user. If user wants to change the height of all the forms or usercontrol then the code snippet is as follows
Private Sub Update_ButtonClick()
'..declaring all the variables here
' VBInstance is initialized to VBIDE.VBE when the add-in is loaded
For Index = 1 To projCount
compoCount = VBInstance.VBProjects(Index).VBComponents.Count
For jIndex = 1 To compoCount
csFileName = VBInstance.VBProjects(Index).VBComponents(jIndex).name
componentType = VBInstance.VBProjects(Index).VBComponents(jIndex).Type
If componentType = VBIDE.vbext_ct_VBForm Or componentType = VBIDE.vbext_ct_UserControl Then '.frm or .ctl
VBInstance.VBProjects(Index).VBComponents(jIndex).Properties(propChange).Value = propvalue 'changing the property
VBInstance.VBProjects(Index).VBComponents(jIndex).SaveAs csFileName 'Saving the file
End If
Next jIndex
Next Index
End Sub
Whenever I give the Properties name as Font, I get the error
Runtime error '425' Invalid Object use
I have tried PropertyBag.WriteProperty from http://visualbasic.freetutes.com/learn-vb6-advanced/lesson13/p20.html but it does not serve my purpose.
Is there any way out to set the Font property of a control or form?
When I open the ctl or form in notepad, I cannot find the Font property in it so I cannot use text replacement here.
Can anyone help?
Updated Code :
Private Sub Update_ButtonClick()
Dim fobject As New StdFont
fobject.Name = "Arial"
Set propvalue = fobject
For Index = 1 To projCount
compoCount = VBInstance.VBProjects(Index).VBComponents.Count
For jIndex = 1 To compoCount
csFileName = VBInstance.VBProjects(Index).VBComponents(jIndex).Name
componentType = VBInstance.VBProjects(Index).VBComponents(jIndex).Type
If componentType = 5 Or componentType = 8 Then
VBInstance.VBProjects(Index).VBComponents(jIndex).Properties("Font").Value= propvalue
VBInstance.VBProjects(Index).VBComponents(jIndex).SaveAs csFileName
End If
Next jIndex
Next Index
End Sub
And the error that i got is
Run-time error '425':
Invalid object use
The Font property is an object, not an simple intrinsic value. You'll need to use Set with an appropriate StdFont object assigned to propvalue.
Alternatively, you can special case the font and just set the property's .Name property to the required font name.

Resources