Find specific string in a text file with VBS script - text

I need to find the string "Test Case " & index in a txt file.
I give you an example of the lines you can find in this file:
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Form1</td></tr>
As you can see in the second line I have an occurrence of the string "Test Case".
What I want to do is to add another particular string in the line which preceeds the one where "Test Case 5" appears. For example:
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Beginning_of_DD_TC5</td></tr>
<tr><td>Form1</td></tr>
It's also important that the line I add has an index i which depends on the Test Case number, and i need to add it before the first occurrence of "Test Case" & i, i dont care about the following occurrences.
I tested if InStr function worked with an example:
Dim objFSO, filepath, objInputFile, tmpStr, substrToFind
Set objFSO = CreateObject("Scripting.FileSystemObject")
filepath = "C:\VBS\filediprova.txt"
substrToFind = "<tr><td><a href=" & chr(34) & "../Test case 5"
Set objInputFile = objFSO.OpenTextFile(filepath)
tmpStr = objInputFile.ReadLine
If InStr(tmpStr, substrToFind) <= 0 Then
WScript.Echo "No matches"
Else
WScript.Echo "Found match"
End If
And it works, it recognizes my substring. In this small example the txt file only contans the followingline:
<tr><td>Form1</td></tr>
Now, when I try to loop over a file with much more lines I have some problem, I use the same InStr function.
I wrote the following loop:
Do until objInputFile.AtEndOfStream
strToAdd = "<tr><td>Beginning_of_DD_TC" & CStr(index) & "</td></tr>"
substrToFind = "<tr><td><a href=" & chr(34) & "../Test case " & index
firstStr = "<?xml version" 'my file always starts like this
tmpStr = objInputFile.ReadLine
If InStr(tmpStr, substrToFind) <= 0 Then
If Instr(tmpStr, firstStr) > 0 Then
text = tmpStr 'to avoid the first empty line
Else
text = text & vbCrLf & tmpStr
End If
Else
text = text & vbCrLf & strToAdd & vbCrLf & tmpStr
index = index + 1
End If
Loop
What's wrong?

I'd recommend using a regular expressions instead of string operations for this:
Set fso = CreateObject("Scripting.FileSystemObject")
filename = "C:\VBS\filediprova.txt"
newtext = vbLf & "<tr><td>Beginning_of_DD_TC5</td></tr>"
Set re = New RegExp
re.Pattern = "(\n.*?Test Case \d)"
re.Global = False
re.IgnoreCase = True
text = f.OpenTextFile(filename).ReadAll
f.OpenTextFile(filename, 2).Write re.Replace(text, newText & "$1")
The regular expression will match a line feed (\n) followed by a line containing the string Test Case followed by a number (\d), and the replacement will prepend that with the text you want to insert (variable newtext). Setting re.Global = False makes the replacement stop after the first match.
If the line breaks in your text file are encoded as CR-LF (carriage return + line feed) you'll have to change \n into \r\n and vbLf into vbCrLf.
If you have to modify several text files, you could do it in a loop like this:
For Each f In fso.GetFolder("C:\VBS").Files
If LCase(fso.GetExtensionName(f.Name)) = "txt" Then
text = f.OpenAsTextStream.ReadAll
f.OpenAsTextStream(2).Write re.Replace(text, newText & "$1")
End If
Next

Wow, after few attempts I finally figured out how to deal with my text edits in vbs. The code works perfectly, it gives me the result I was expecting. Maybe it's not the best way to do this, but it does its job.
Here's the code:
Option Explicit
Dim StdIn: Set StdIn = WScript.StdIn
Dim StdOut: Set StdOut = WScript
Main()
Sub Main()
Dim objFSO, filepath, objInputFile, tmpStr, ForWriting, ForReading, count, text, objOutputFile, index, TSGlobalPath, foundFirstMatch
Set objFSO = CreateObject("Scripting.FileSystemObject")
TSGlobalPath = "C:\VBS\TestSuiteGlobal\Test suite Dispatch Decimal - Global.txt"
ForReading = 1
ForWriting = 2
Set objInputFile = objFSO.OpenTextFile(TSGlobalPath, ForReading, False)
count = 7
text=""
foundFirstMatch = false
Do until objInputFile.AtEndOfStream
tmpStr = objInputFile.ReadLine
If foundStrMatch(tmpStr)=true Then
If foundFirstMatch = false Then
index = getIndex(tmpStr)
foundFirstMatch = true
text = text & vbCrLf & textSubstitution(tmpStr,index,"true")
End If
If index = getIndex(tmpStr) Then
text = text & vbCrLf & textSubstitution(tmpStr,index,"false")
ElseIf index < getIndex(tmpStr) Then
index = getIndex(tmpStr)
text = text & vbCrLf & textSubstitution(tmpStr,index,"true")
End If
Else
text = text & vbCrLf & textSubstitution(tmpStr,index,"false")
End If
Loop
Set objOutputFile = objFSO.CreateTextFile("C:\VBS\NuovaProva.txt", ForWriting, true)
objOutputFile.Write(text)
End Sub
Function textSubstitution(tmpStr,index,foundMatch)
Dim strToAdd
strToAdd = "<tr><td>Beginning_of_CF5.0_Features_TC" & CStr(index) & "</td></tr>"
If foundMatch = "false" Then
textSubstitution = tmpStr
ElseIf foundMatch = "true" Then
textSubstitution = strToAdd & vbCrLf & tmpStr
End If
End Function
Function getIndex(tmpStr)
Dim substrToFind, charAtPos, char1, char2
substrToFind = "<tr><td><a href=" & chr(34) & "../Test case "
charAtPos = len(substrToFind) + 1
char1 = Mid(tmpStr, charAtPos, 1)
char2 = Mid(tmpStr, charAtPos+1, 1)
If IsNumeric(char2) Then
getIndex = CInt(char1 & char2)
Else
getIndex = CInt(char1)
End If
End Function
Function foundStrMatch(tmpStr)
Dim substrToFind
substrToFind = "<tr><td><a href=" & chr(34) & "../Test case "
If InStr(tmpStr, substrToFind) > 0 Then
foundStrMatch = true
Else
foundStrMatch = false
End If
End Function
This is the original txt file
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
<title>Test Suite</title>
</head>
<body>
<table id="suiteTable" cellpadding="1" cellspacing="1" border="1" class="selenium"><tbody>
<tr><td><b>Test Suite</b></td></tr>
<tr><td>TC_Environment_setting</td></tr>
<tr><td>TC_Set_variables</td></tr>
<tr><td>TC_Set_ID</td></tr>
<tr><td>Log_in_Admin</td></tr>
<tr><td>Set_Roles_Dispatch_Decimal</td></tr>
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Form1</td></tr>
<tr><td>contrD1</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1B1</td></tr>
<tr><td>Search&OpenApp</td></tr>
<tr><td>FormEND</td></tr>
<tr><td>Controllo END</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Form1</td></tr>
<tr><td>contrD1</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1B1</td></tr>
<tr><td>Search&OpenApp</td></tr>
<tr><td>FormEND</td></tr>
<tr><td>Controllo END</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Form1</td></tr>
<tr><td>Controllo DeadLetter</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Set_Roles_Dispatch_Decimal</td></tr>
<tr><td>Logout_BAC</td></tr>
</tbody></table>
</body>
</html>
And this is the result I'm expecting
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
<title>Test Suite</title>
</head>
<body>
<table id="suiteTable" cellpadding="1" cellspacing="1" border="1" class="selenium"><tbody>
<tr><td><b>Test Suite</b></td></tr>
<tr><td>TC_Environment_setting</td></tr>
<tr><td>TC_Set_variables</td></tr>
<tr><td>TC_Set_ID</td></tr>
<tr><td>Log_in_Admin</td></tr>
<tr><td>Set_Roles_Dispatch_Decimal</td></tr>
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Beginning_of_CF5.0_Features_TC5</td></tr>
<tr><td>Form1</td></tr>
<tr><td>Form1</td></tr>
<tr><td>contrD1</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1B1</td></tr>
<tr><td>Search&OpenApp</td></tr>
<tr><td>FormEND</td></tr>
<tr><td>Controllo END</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Beginning_of_CF5.0_Features_TC6</td></tr>
<tr><td>Form1</td></tr>
<tr><td>contrD1</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1B1</td></tr>
<tr><td>Search&OpenApp</td></tr>
<tr><td>Controllo END</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Log_ in_U1A1</td></tr>
<tr><td>Beginning_of_CF5.0_Features_TC7</td></tr>
<tr><td>Form1</td></tr>
<tr><td>Controllo DeadLetter</td></tr>
<tr><td>Logout</td></tr>
<tr><td>Set_Roles_Dispatch_Decimal</td></tr>
<tr><td>Logout_BAC</td></tr>
</tbody></table>
</body>
</html>

Try to change like this ..
firstStr = "<?xml version" 'my file always starts like this
Do until objInputFile.AtEndOfStream
strToAdd = "<tr><td>Beginning_of_DD_TC" & CStr(index) & "</td></tr>"
substrToFind = "<tr><td><a href=" & chr(34) & "../Test case " & trim(cstr((index)))
tmpStr = objInputFile.ReadLine
If InStr(tmpStr, substrToFind) <= 0 Then
If Instr(tmpStr, firstStr) > 0 Then
text = tmpStr 'to avoid the first empty line
Else
text = text & vbCrLf & tmpStr
End If
Else
text = text & vbCrLf & strToAdd & vbCrLf & tmpStr
End If
index = index + 1
Loop

Related

Having trouble with XML phase on VBA Excel

I need to parse a XML file having the structure as follows: (I can't show the data as it is confidential)
<?xml version="1.0" encoding="UTF-8"?>
<GACDWBulkLoadInterface xsi:schemaLocation="http://www.example.org/GACDWSchema GACDWSchema.xsd" xmlns="http://www.example.org/GACDWSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<BLConfiguration>
<BLProperties>
<BLProperty>
<key>isEmpty</key>
<value xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:boolean">true</value>
</BLProperty>
</BLProperties>
<version>api-ag-1.1.68-20210315-43</version>
</BLConfiguration>
<Customer>
<CustomerID></CustomerID>
<Customer_Name></Customer_Name>
<ISOCountryCode></ISOCountryCode>
<customerType></customerType>
<dataSourceProvider></dataSourceProvider>
<function></function>
<TopLevelCI>
<Contact Action="Create">
<Id>4</Id>
<artifactType></artifactType>
<contactType></contactType>
<email></email>
<firstName></firstName>
<lastName></lastName>
<userID></userID>
<ChildCI/>
</Contact>
<PhysicalComputerSystem Action="Create">
<Id>2</Id>
<artifactType></artifactType>
<assetLifeCycleState></assetLifeCycleState>
<category></category>
<ciLifeCycleState></ciLifeCycleState>
<custname></custname>
<description></description>
<discontinuedDate></discontinuedDate>
<machineType></machineType>
<managed></managed>
<manufacturer></manufacturer>
<model></model>
<serialNumber></serialNumber>
<virtualFlag></virtualFlag>
<ChildCI>
<PhysicalComputerSystemContact Action="Create">
<contactUID></contactUID>
</PhysicalComputerSystemContact>
<PhysicalComputerSystemLocation Action="Create">
<buildingID></buildingID>
<buildingName></buildingName>
<city></city>
<countryName></countryName>
<isoCountryCode></isoCountryCode>
<postalCode></postalCode>
<siteName></siteName>
<stateProvince></stateProvince>
<streetAddress></streetAddress>
</PhysicalComputerSystemLocation>
</ChildCI>
</PhysicalComputerSystem>
<OperatingSystem Action="Create">
<Id>1</Id>
<aliasHostname></aliasHostname>
<artifactType></artifactType>
<assetLifeCycleState></assetLifeCycleState>
<backupContr></backupContr>
<barSaId></barSaId>
<cacfCustomGroup></cacfCustomGroup>
<category></category>
<ciLifeCycleState></ciLifeCycleState>
<classification></classification>
<cpuCount></cpuCount>
<description></description>
<discontinuedDate></discontinuedDate>
<fqhn></fqhn>
<hostsystemID></hostsystemID>
<hostsystemType></hostsystemType>
<installationDate></installationDate>
<ipAddress></ipAddress>
<majorBusProc></majorBusProc>
<managed></managed>
<memory></memory>
<memoryUnit></memoryUnit>
<osModLevel></osModLevel>
<osName></osName>
<osBuildNumber></osBuildNumber>
<osProvider></osProvider>
<osRelease></osRelease>
<osVersion></osVersion>
<prodDate></prodDate>
<purgeDate></purgeDate>
<purpose></purpose>
<securityClass></securityClass>
<serverType></serverType>
<supportEnddate></supportEnddate>
<useAliasHostname></useAliasHostname>
<virtualFlag></virtualFlag>
<ChildCI>
<OperatingSystemContact Action="Create">
<contactUID></contactUID>
</OperatingSystemContact>
<OperatingSystemEnvironmentProfile Action="Create">
<authFlag></authFlag>
<authExceptDate></authExceptDate>
<authExceptReasonCode></authExceptReasonCode>
<authExceptText></authExceptText>
<disasterRecSys></disasterRecSys>
<domain></domain>
<extUsersMax></extUsersMax>
<hcAutoInterv></hcAutoInterv>
<hcExceptDate></hcExceptDate>
<hcExceptReasonCode></hcExceptReasonCode>
<hcExceptText></hcExceptText>
<hcUnit></hcUnit>
<healthchAutoFlag></healthchAutoFlag>
<highestClientDataClassification></highestClientDataClassification>
<highestDataClassification></highestDataClassification>
<highestIbmDataClassification></highestIbmDataClassification>
<ibmCbnInterval></ibmCbnInterval>
<ibmCbnRequired></ibmCbnRequired>
<ibmCbnType></ibmCbnType>
<ibmCbnUnit></ibmCbnUnit>
<ibmPrivIDUnit></ibmPrivIDUnit>
<ibmPrivType></ibmPrivType>
<ibmQevFlag></ibmQevFlag>
<ibmQevInterval></ibmQevInterval>
<ibmQevUnit></ibmQevUnit>
<ibmUsersMax></ibmUsersMax>
<internetAccFlag></internetAccFlag>
<personalData></personalData>
<privIDFlag></privIDFlag>
<privIDInterv></privIDInterv>
<regulatoryRequirements></regulatoryRequirements>
<remediationtimeHi></remediationtimeHi>
<remediationtimeLo></remediationtimeLo>
<remediationtimeMd></remediationtimeMd>
<rerFlag></rerFlag>
<rerInterv></rerInterv>
<rerType></rerType>
<rerUnit></rerUnit>
<scExceptDate></scExceptDate>
<scExceptReasonCode></scExceptReasonCode>
<scExceptText></scExceptText>
<secExemptionDesc></secExemptionDesc>
<secExemptionFlag></secExemptionFlag>
<securityPackage></securityPackage>
<securityPolicyCategory></securityPolicyCategory>
<securityPolicyOther></securityPolicyOther>
<securityPolicyType></securityPolicyType>
<serverPurpose></serverPurpose>
<sharedIDFlag></sharedIDFlag>
<shrdExceptDate></shrdExceptDate>
<shrdExceptReasonCode></shrdExceptReasonCode>
<shrdExceptText></shrdExceptText>
<vitalBusProcessYN></vitalBusProcessYN>
</OperatingSystemEnvironmentProfile>
<OperatingSystemIpAddress Action="Create">
<dotnotation></dotnotation>
<hostName></hostName>
<isPrimaryIpHost></isPrimaryIpHost>
<name></name>
<netMask></netMask>
<vsExceptDate></vsExceptDate>
<vsExceptReasonCode></vsExceptReasonCode>
<vsExceptText></vsExceptText>
<vulScanUnit></vulScanUnit>
<vulscanAutoFlag></vulscanAutoFlag>
<vulscanAutoInterval></vulscanAutoInterval>
</OperatingSystemIpAddress>
</ChildCI>
</OperatingSystem>
<Subsystem Action="Create">
<Id>3</Id>
<applSIDataType></applSIDataType>
<artifactType></artifactType>
<assetLifeCycleState></assetLifeCycleState>
<ciLifeCycleState></ciLifeCycleState>
<controlProductName></controlProductName>
<controlProductRelease></controlProductRelease>
<controlProductType></controlProductType>
<controlProductVendor></controlProductVendor>
<controlProductVersion></controlProductVersion>
<description></description>
<discontinuedDate></discontinuedDate>
<fqhn></fqhn>
<managed></managed>
<name></name>
<parentUID></parentUID>
<piFlag></piFlag>
<productModLevel></productModLevel>
<productName></productName>
<productPatchLevel></productPatchLevel>
<productRelease></productRelease>
<productVersion></productVersion>
<purgeDate></purgeDate>
<scanTime></scanTime>
<vendorName></vendorName>
<ChildCI>
<SubsystemContact Action="Create">
<contactUID></contactUID>
</SubsystemContact>
<SubsystemEnvironmentProfile Action="Create">
<authFlag></authFlag>
<authExceptDate></authExceptDate>
<authExceptReasonCode></authExceptReasonCode>
<authExceptText></authExceptText>
<disasterRecSys></disasterRecSys>
<hcAutoInterv></hcAutoInterv>
<hcExceptDate></hcExceptDate>
<hcExceptReasonCode></hcExceptReasonCode>
<hcExceptText></hcExceptText>
<hcUnit></hcUnit>
<healthchAutoFlag></healthchAutoFlag>
<ibmCbnInterval></ibmCbnInterval>
<ibmCbnRequired></ibmCbnRequired>
<ibmCbnType></ibmCbnType>
<ibmCbnUnit></ibmCbnUnit>
<ibmPrivIDUnit></ibmPrivIDUnit>
<ibmPrivType></ibmPrivType>
<ibmQevFlag></ibmQevFlag>
<ibmQevInterval></ibmQevInterval>
<ibmQevUnit></ibmQevUnit>
<privIDFlag></privIDFlag>
<privIDInterv></privIDInterv>
<rerFlag></rerFlag>
<rerInterv></rerInterv>
<rerType></rerType>
<rerUnit></rerUnit>
<scExceptDate></scExceptDate>
<scExceptReasonCode></scExceptReasonCode>
<scExceptText></scExceptText>
<sharedIDFlag></sharedIDFlag>
<shrdExceptDate></shrdExceptDate>
<shrdExceptReasonCode></shrdExceptReasonCode>
<shrdExceptText></shrdExceptText>
<vitalBusProcessYN></vitalBusProcessYN>
</SubsystemEnvironmentProfile>
</ChildCI>
</Subsystem>
</TopLevelCI>
<Relationship>
<RelationshipName>Hosted_by</RelationshipName>
<sourceCIType>OperatingSystem</sourceCIType>
<targetCIType>OperatingSystem</targetCIType>
</Relationship>
<Relationship>
<RelationshipName>Runs_on</RelationshipName>
<sourceCIType>Subsystem</sourceCIType>
<targetCIType>OperatingSystem</targetCIType>
</Relationship>
<Relationship>
<RelationshipName>Runs_on_hw</RelationshipName>
<sourceCIType>OperatingSystem</sourceCIType>
<targetCIType>PhysicalComputerSystem</targetCIType>
</Relationship>
</Customer>
</GACDWBulkLoadInterface>
I have researched and come up with my VBA code here
Sub ReadXML()
Dim oXML As MSXML2.DOMDocument60
Dim vaFile As Variant
Set oXML = New MSXML2.DOMDocument60
'Open Browse file dialog
vaFile = Application.GetOpenFilename("XML Files (*.xml), *.xml", _
Title:="Select XML files", MultiSelect:=False)
oXML.validateOnParse = True
oXML.setProperty "SelectionLanguage", "XPath" ' necessary in version 3.0, possibly redundant here
oXML.async = False
If Not oXML.Load(vaFile) Then 'Load XML has gone bad
Dim xPE As Object ' Set xPE = CreateObject("MSXML2.IXMLDOMParseError")
Dim strErrText As String
Set xPE = oXML.parseError
With xPE
strErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
xPE.reason & _
"Source Text: " & .srcText & vbCrLf & vbCrLf & _
"Line No.: " & .Line & vbCrLf & _
"Line Pos.: " & .linepos & vbCrLf & _
"File Pos.: " & .filepos & vbCrLf & vbCrLf
End With
MsgBox strErrText, vbExclamation
Set xPE = Nothing
Exit Sub
End If
Debug.Print "|" & oXML.XML & "|"
Dim nodeList As MSXML2.IXMLDOMNodeList, iNode As MSXML2.IXMLDOMNode
Dim Searched As String
Searched = "/*/*"
Set nodeList = oXML.SelectNodes(Searched)
'Set Queries = oXML.DocumentElement.SelectNodes(Searched)
For Each iNode In nodeList
'Debug.Print "<" & iNode.BaseName & ">"
Debug.Print "<" & iNode.nodeName & ">"
Next
However, My problem is no matter what I come up with Xpath queries. None of them work except \* or \.
I can't specific any path. Both .SelectSingleNode or .SelectNodes alway return length =0. Is there any problem with my code or my XML file or Xpath syntax?
As mentioned in comments your xml document has namespace definitions in its DocumentElement <GACDWBulkLoadInterface> (xmlns stands for xml name space). Furthermore "it contains a default namespace so any attempted parsing on named nodes must map to this namespace URI otherwise returns nothing."
To allow eventual analysis it's necessary to include a user defined prefix (e.g. :s) into explicit namespace settings, which can be used in later XPath expressions:
Dim oXML As MSXML2.DOMDocument60
Set oXML = New MSXML2.DOMDocument60
oXML.validateOnParse = True
Dim XMLNamespaces As String
XMLNamespaces = "xmlns:s='http://www.example.org/GACDWSchema'"
oXML.SetProperty "SelectionNamespaces", XMLNamespaces
Eventually you can define any XPath expression, e.g. the childnodes of Customer[1]:
Dim nodeList As MSXML2.IXMLDOMNodeList, iNode As MSXML2.IXMLDOMNode
Dim Searched As String
Searched = "//s:Customer/s:*"
Set nodeList = oXML.SelectNodes(Searched)
Related links
Displaying XML structures including attributes via recursive function calls
Display XML with hierarchy in cells
Apparently OP's source code: Parse DomDocument

Parsing Through XML File Excel VBA objDOM not Working

Trying to parse thorugh an XML file with VBA and print the results to an Excel spreadsheet. But whenever I get to creating a list with the following code it doesn't create one and I can't figure out why.
Set Songs = objDOM.SelectNodes("/channel/item")
Full code and XML is below
Function fnReadXMLByTags()
'Code From: https://excel-macro.tutorialhorizon.com/vba-excel-read-xml-by-looping-through-nodes/
'And Here: https://desmondoshiwambo.wordpress.com/2012/07/03/how-to-load-xml-from-a-local-file-with-msxml2-domdocument-6-0-and-loadxml-using-vba/
Dim mainWorkBook As Workbook
Set mainWorkBook = ActiveWorkbook
mainWorkBook.Sheets("XML_Parser").Range("A:C").Clear
Dim intFile As Integer
Dim strTemp As String
Dim strXML As String
Dim strOrderText As String
Dim objDOM As Object
Dim Songs As Object
XMLFileName = ThisWorkbook.Path & "\PlaylistFeed.xml"
'Open file
intFile = FreeFile
Open XMLFileName For Input As intFile
strXML = ""
'Load XML into string strXML
While Not EOF(intFile)
Line Input #intFile, strTemp
strXML = strXML & strTemp
Wend
Close intFile
'Load the XML into DOMDocument object
Set objDOM = CreateObject("MSXML2.DOMDocument.6.0")
objDOM.LoadXML strXML
mainWorkBook.Sheets("XML_Parser").Range("A1,B1,C1,D1").Interior.ColorIndex = 40
mainWorkBook.Sheets("XML_Parser").Range("A1,B1,C1,D1").Borders.Value = 1
mainWorkBook.Sheets("XML_Parser").Range("A" & 1).Value = "Song Number"
mainWorkBook.Sheets("XML_Parser").Range("B" & 1).Value = "Tag Number"
mainWorkBook.Sheets("XML_Parser").Range("C" & 1).Value = "Item Node"
mainWorkBook.Sheets("XML_Parser").Range("D" & 1).Value = "Value"
Set Songs = objDOM.SelectNodes("/channel/item")
mainWorkBook.Sheets("XML_Parser").Range("A" & 2).Value = Songs.Length 'always says 0
intCounter = 2
For i = 0 To Songs.Length - 1
For j = 0 To Songs(i).ChildNodes.Length - 1
mainWorkBook.Sheets("XML_Parser").Range("A" & intCounter).Value = i + 1
mainWorkBook.Sheets("XML_Parser").Range("B" & intCounter).Value = j + 1
mainWorkBook.Sheets("XML_Parser").Range("C" & intCounter).Value = Songs(i).ChildNodes(j).tagname
mainWorkBook.Sheets("XML_Parser").Range("D" & intCounter).Value = Songs(i).ChildNodes(j).Text
mainWorkBook.Sheets("XML_Parser").Range("A" & intCounter).Borders.Value = 1
mainWorkBook.Sheets("XML_Parser").Range("B" & intCounter).Borders.Value = 1
mainWorkBook.Sheets("XML_Parser").Range("C" & intCounter).Borders.Value = 1
mainWorkBook.Sheets("XML_Parser").Range("D" & intCounter).Borders.Value = 1
intCounter = intCounter + 1
mainWorkBook.Sheets("XML_Parser").Range("A" & intCounter, "B" & intCounter, "C" & intCounter, "D" & intCounter).Interior.ColorIndex = 40
mainWorkBook.Sheets("XML_Parser").Range("A" & intCounter, "B" & intCounter, "C" & intCounter, "D" & intCounter).Borders.Value = 1
mainWorkBook.Sheets("XML_Parser").Range("A" & intCounter).Value = "Song Number"
mainWorkBook.Sheets("XML_Parser").Range("B" & intCounter).Value = "Tag Number"
mainWorkBook.Sheets("XML_Parser").Range("C" & intCounter).Value = "Item Node"
mainWorkBook.Sheets("XML_Parser").Range("D" & intCounter).Value = "Value"
Next
intCounter = intCounter + 1
Next
End Function
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>WTMD Playlist</title>
<link>http://www.wtmd.org</link>
<description>Recently Played Songs at WTMD</description>
<item>
<title>Son Volt - Drown</title>
<link>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Son+Volt%2BDrown
</link>
<description>Album: Trace</description>
<pubDate>Mon, 03 Dec 2012 20:09:44 GMT</pubDate>
<guid>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Son+Volt%2BDrown
</guid>
<dc:date>2012-12-03T20:09:44Z</dc:date>
</item>
<item>
<title>Allen Stone - Sleep</title>
<link>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Allen+Stone%2BSleep
</link>
<description>Album: Allen Stone</description>
<pubDate>Mon, 03 Dec 2012 20:07:19 GMT</pubDate>
<guid>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Allen+Stone%2BSleep
</guid>
<dc:date>2012-12-03T20:07:19Z</dc:date>
</item>
When testing the length the Songs list always comes at zero and I can't figure out why. Everything I've tried either has the same problem or gives an error.
As QHarr pointed out, in parts of the XML document, you have text containing the & symbol (ampersand), for example:
tag=wt897fmrafomu-20&index=digital-music-track
The & symbol on its own is not allowed as part of well formed XML and needs to be replaced by the entity reference &:
tag=wt897fmrafomu-20&index=digital-music-track
The XPath query in the call to selectNodes is also incorrect. You have:
Set Songs = objDOM.SelectNodes("/channel/item")
That would only be correct if "channel" was the root node of the XML document. In fact, "rss" is the root node so you would need to use this instead:
Set Songs = objDOM.SelectNodes("/rss/channel/item")
The calls to set the color index and borders are incorrect. You should have this instead:
mainWorkBook.Sheets("XML_Parser").Range("A" & intCounter & ":D" & intCounter).Interior.ColorIndex = 40
mainWorkBook.Sheets("XML_Parser").Range("A" & intCounter & ":D" & intCounter).Borders.Value = 1
The XML document as shown in the question is incomplete so I'm assuming that the remaining tags were added to close the "channel" and "rss" tags. I'm also assuming that an XML declaration was added at the start:
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>WTMD Playlist</title>
<link>http://www.wtmd.org</link>
<description>Recently Played Songs at WTMD</description>
<item>
<title>Son Volt - Drown</title>
<link>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Son+Volt%2BDrown
</link>
<description>Album: Trace</description>
<pubDate>Mon, 03 Dec 2012 20:09:44 GMT</pubDate>
<guid>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Son+Volt%2BDrown
</guid>
<dc:date>2012-12-03T20:09:44Z</dc:date>
</item>
<item>
<title>Allen Stone - Sleep</title>
<link>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Allen+Stone%2BSleep
</link>
<description>Album: Allen Stone</description>
<pubDate>Mon, 03 Dec 2012 20:07:19 GMT</pubDate>
<guid>
http://www.amazon.com/exec/obidos/external-search?tag=wt897fmrafomu-20&index=digital-music-track&keyword=Allen+Stone%2BSleep
</guid>
<dc:date>2012-12-03T20:07:19Z</dc:date>
</item>
</channel>
</rss>

Convert Rich Text to HTML formatting tags

I'm working with an Excel list and want to turn:
Quercus agrifolia var. oxyadenia (Torr.) J.T. Howell
into:
<i>Quercus agrifolia</i> var. <i>oxyadenia</i> (Torr.) J.T. Howell
I have the Rich Text formatted list with formatting applied but I want to send it to Access with the formatting tags explicitly included around the related text.
I was looking to do the same thing, and found an answer on MSDN at:
Convert contents of a formatted excel cell to HTML format
I hope this helps you as well, it uses an excel macro.
Edit:
When using this I needed to modify the code for nested tags, please find my updates to the macro below:
Function fnConvert2HTML(myCell As Range) As String
Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn As Boolean
Dim i, chrCount As Integer
Dim chrCol, chrLastCol, htmlTxt, htmlEnd As String
bldTagOn = False
itlTagOn = False
ulnTagOn = False
colTagOn = False
chrCol = "NONE"
'htmlTxt = "<html>"
htmlTxt = ""
chrCount = myCell.Characters.Count
For i = 1 To chrCount
htmlEnd = ""
With myCell.Characters(i, 1)
If (.Font.Color) Then
chrCol = fnGetCol(.Font.Color)
If Not colTagOn Then
htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
colTagOn = True
Else
If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
End If
Else
chrCol = "NONE"
If colTagOn Then
htmlEnd = "</font>" & htmlEnd
'htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
End If
chrLastCol = chrCol
If .Font.Bold = True Then
If Not bldTagOn Then
htmlTxt = htmlTxt & "<b>"
bldTagOn = True
End If
Else
If bldTagOn Then
'htmlTxt = htmlTxt & "</b>"
htmlEnd = "</b>" & htmlEnd
bldTagOn = False
End If
End If
If .Font.Italic = True Then
If Not itlTagOn Then
htmlTxt = htmlTxt & "<i>"
itlTagOn = True
End If
Else
If itlTagOn Then
'htmlTxt = htmlTxt & "</i>"
htmlEnd = "</i>" & htmlEnd
itlTagOn = False
End If
End If
If .Font.Underline > 0 Then
If Not ulnTagOn Then
htmlTxt = htmlTxt & "<u>"
ulnTagOn = True
End If
Else
If ulnTagOn Then
'htmlTxt = htmlTxt & "</u>"
htmlEnd = "</u>" & htmlEnd
ulnTagOn = False
End If
End If
If (Asc(.Text) = 10) Then
htmlTxt = htmlTxt & htmlEnd & "<br>"
Else
htmlTxt = htmlTxt & htmlEnd & .Text
End If
End With
Next
If colTagOn Then
htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
If bldTagOn Then
htmlTxt = htmlTxt & "</b>"
bldTagOn = False
End If
If itlTagOn Then
htmlTxt = htmlTxt & "</i>"
itlTagOn = False
End If
If ulnTagOn Then
htmlTxt = htmlTxt & "</u>"
ulnTagOn = False
End If
'htmlTxt = htmlTxt & "</html>"
fnConvert2HTML = htmlTxt
End Function
Function fnGetCol(strCol As String) As String
Dim rVal, gVal, bVal As String
strCol = Right("000000" & Hex(strCol), 6)
bVal = Left(strCol, 2)
gVal = Mid(strCol, 3, 2)
rVal = Right(strCol, 2)
fnGetCol = rVal & gVal & bVal
End Function
Here's an alternative solution which is faster, but produces messier output (because it uses Word's HTML engine). You need to add the following references to your VBA project:
Microsoft HTML Object Library
Microsoft Scripting Runtime
Microsoft Word 16.0 Object Library
Then, call the following code by running eg. convertToHtml(Range("A1:A100")) in the immediate window:
' Given a temporary file path, load the HTML in that file
' and return the first paragraph's inner HTML.
Function extractFirstParagraph(filePath As String) As String
Dim fs As New FileSystemObject, _
html As New MSHTML.HTMLDocument, _
par As MSHTML.HTMLGenericElement
html.body.innerHTML = fs.OpenTextFile(filePath).ReadAll
Set par = html.getElementsByTagName("P")(0)
extractFirstParagraph = par.innerHTML
End Function
Sub convertToHtml(rng As Range)
' Open a single Word instance.
Dim w As New Word.Application, d As Word.Document
Set d = w.Documents.Add
Dim cell As Range
Const tempFile As String = "c:\temp\msword.html"
' For each cell in the range ...
For Each cell In rng
If cell.Value <> "" Then
' ... copy it into the Word document ...
cell.Copy
d.Range.PasteSpecial xlPasteFormats
' ... save the Word document as HTML
' in a temporary file ...
d.SaveAs2 tempFile, wdFormatHTML
' ... and extract the first paragraph.
cell.Value = extractFirstParagraph(tempFile)
Debug.Print "Cell " & cell.Address & " done."
End If
Next cell
' Close Word once you're done. Note that if a bug
' is encountered, this cleanup won't occur and the
' Word process will need to be killed to release
' file locks, otherwise you get an unhelpful error.
w.Quit False
End Sub
You can clean up the output using regular expressions by adding a reference to Microsoft VBScript Regular Expressions 5.5, and running a function like this:
' Used to avoid duplication in cleanWordHtml.
Function eraseInPlace(ByRef r As RegExp, _
ByRef s As String, p As String) As String
r.Pattern = p
s = r.Replace(s, "")
End Function
' Eliminate junk tags from HTML generated by Word.
Function cleanWordHtml(inputString As String)
Dim r As New RegExp
r.Global = True
eraseInPlace r, inputString, "mso-[^;""]*(; )?"
eraseInPlace r, inputString, " style="""""
eraseInPlace r, inputString, "<\?xml[^>]*>"
eraseInPlace r, inputString, "<\/?o:[^>]*>"
eraseInPlace r, inputString, "<SPAN><\/SPAN>"
cleanWordHtml = inputString
End Function
If you need to convert <span> tags to <font> tags (I also needed to do this because I was importing into an Access rich text field, which doesn't support CSS), try calling this function and passing in the MSHTML objects constructed in the extractFirstParagraph function:
' Given a <p> DOM node, replace any children of the
' form <span style="color: foo"> with <font color="foo">.
Function convertSpanToFont(ByRef par As MSHTML.HTMLGenericElement, _
ByRef doc As MSHTML.HTMLDocument)
Dim span As MSHTML.HTMLSpanElement, _
font As MSHTML.HTMLFontElement
For Each span In par.getElementsByTagName("span")
Set font = doc.createElement("font")
If IsNull(span.Style.Color) _
Or span.Style.Color <> "" Then
font.Color = span.Style.Color
font.innerHTML = span.innerHTML
span.insertAdjacentElement "afterEnd", font
span.removeNode True
End If
Next span
End Function
I also considered just saving the whole spreadsheet as HTML from Excel and then using another tool to get that into a format that Access can deal with, but Excel's HTML export generates CSS classes rather than inline styles. This method is also helpful if you only need to convert part of your spreadsheet to HTML.
There is actually a much cleaner and faster solution using Excel's internal XML-representation and transforming it with an XSL-Stylesheet.
You can find VBA-Code and the required XSLT on https://github.com/HeimMatthias/Excel-HTML-Tools-Public
Disclaimer: I've written this tool myself after using the unforgivably slow VBA-script for years. Sample output can be seen in this fiddle.
For technical details:
The Range.Value-Property can take a parameter xlRangeValueXMLSpreadsheet, with which it returns a fully formatted xml-String that includes an html-object of its contents.
Running ActiveSheet.Range("A1").Value(xlRangeValueXMLSpreadsheet) on a cell containing
Quercus agrifolia var. oxyadenia (Torr.) J.T. Howell
returns the following String:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s62">
<Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"
ss:Italic="1"/>
</Style>
</Styles>
<Worksheet ss:Name="Tabelle1">
<Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1"
ss:DefaultColumnWidth="61.714285714285708"
ss:DefaultRowHeight="14.571428571428571">
<Row>
<Cell ss:StyleID="s62"><ss:Data ss:Type="String"
xmlns="http://www.w3.org/TR/REC-html40"><I><Font html:Color="#000000">Quercus agrifolia</Font></I><Font
html:Color="#000000"> var. </Font><I><Font html:Color="#000000">oxyadenia</Font></I><Font
html:Color="#000000"> (Torr.) J.T. Howell</Font></ss:Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
The Cell-Tag surrounds an ss:Data-Object containing -- more or less -- clean html-Code. In any case, it is much simpler and much faster to sanitize this data to obtain clean html than to parse over each letter and access its styles. (By much faster, I mean easily a factor of 100x faster).
Things to watch out for: Cell-styles (both from the template and the individual cell) are not represented as html. This isn't always particularly obvious. In the example above, the cell has automatically received an italic styling, because the first word has been italicized. Since there are also roman (upright) words in the string, the code for the italic passages is represented in the html. But if the entire cell were italicized the <i>-tags would be missing. This is particularly confusing, because you cannot just surround the entire html with an <i>-Tag if the corresponding style has ss:Italic="1" as a property, you will also need to check whether it includes -Tags inside the html-part. This, of course, applies to all cell styles.

in Classic ASP: Convert comma separated string

How do I convert, in Classic ASP, this (example) comma separated Querystring
Illinois, Iowa, Wisconsin, Indiana, Kentucky,Missouri to
'Illinois', 'Iowa', 'Wisconsin', 'Indiana', 'Kentuck','Missouri'
I am using Dreamweaver and need this format to select records from a table which contain one of the states in the string Using IN clause.
If you need just the conversion here it is:
origstring = "Illinois, Iowa, Wisconsin, Indiana, Kentucky, Missouri"
convertedstring = "'" &replace(origstring, ", ","', '") &"'"
But let's say you (or anyone else) have a querystring with states:
strinclause = ""
arrstates = Split(Request.QueryString("states"), ",")
if(Ubound(arrstates) > 0) Then
for i=0 to Ubound(arrstates)
'VALIDATE/CLEAN YOUR ARRAY ITEMS AND BUILD YOUR STRING IN THIS WAY:
strinclause = strinclause & "'" &arrstates(i)& "',"
next
'NOW REMOVE LAST COMA
strinclause = Left(strinclause,Len(strinclause)-1)
end if
'HERE YOU CAN USE YOUR STRING THAT WILL BE IN THE FORMAT YOU ASKED
Jut spitballing, but you could do something like this:
<%
Dim qs : qs = Trim("" & Request.QueryString("states") ' e.g. "Illinois, Iowa, Wisconsin, Indiana, Kentucky, Missouri"
Dim cmd : Set cmd = Server.CreateObject("ADODB.Command")
cmd.ActiveConnection = db.connection ' defined elsewhere
cmd.ActiveConnection.CursorLocation = adUseClient ' 3
cmd.CommandType = adCmdText ' 1
cmd.CommandTimeout = 90
dim inp : inp = Split(qs, ",") ' states
dim sql, i : ReDim sql (ubound(inp)) ' for sql
for i = 0 to ubound(inp)
sql(i) = "select ?" ' parameter in string
cmd.Parameters.Append cmd.CreateParameter("#p" & i, adVarChar, adParamInput, 100, inp(i)) ' actual typed parameter object
next
cmd.CommandText = "select id, stock from warehouse where state in (" & Join(sql, " union ") & ")"
Set rs = cmd.Execute
Do While Not rs.eof
response.write "<p>" & rs("id") & ": " & rs("stock") & "</p>"
rs.MoveNext
Loop
Set rs = Nothing
Set cmd = Nothing
%>

Replace a number in CSV file with VBscript without replacing all text

I'm working on this code
Dim strFirm,soNumber,strValues,arrStr,strCitrix,NewText,text
strFirm = "Gray"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile("cloud.csv",1,True)
Do while not objTextFile.AtEndOfStream
arrStr = Split(objTextFile.ReadLine, ",")
If arrStr(0) = strFirm Then
soNumber = arrStr(1)
Exit Do
End If
Loop
objTextFile.Close
strCitrix = soNumber + 1
MsgBox "Cloud Client " & strFirm & " is now using " & strCitrix & " Citrix licenses."
NewText = Replace(soNumber, soNumber, strCitrix)
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile("cloud.csv",2,True)
objTextFile.Writeline NewText
objTextFile.Close
However when I run the code the replacement wipes out all the text on my file with the exception of the number I'm writing.
What I want it to do is to leave all the other text in place and only change the one specified variable.
Example
Client1,5
Client2,7
Client3,12
Gray,6
Client4,9
Client5,17
Client6,8
And after running the script
Client1,5
Client2,7
Client3,12
Gray,7
Client4,9
Client5,17
Client6,8
Can anyone point out what I'm doing wrong?
Thank you in advance for your help.
Your output file contains only the number you're changing, because you extract just that number from the text you read from the file:
soNumber = arrStr(1)
increment it by one:
strCitrix = soNumber + 1
replace the number in soNumber (which contains only the number anyway) with the incremented number:
NewText = Replace(soNumber, soNumber, strCitrix)
and then write only that new number back to the file:
objTextFile.Writeline NewText
To preserve those parts of the original content that you want to keep you need to write them back to the file as well, not just the modified content.
If you read the source file line-by-line (which is a good idea when processing large files, as it avoids memory exhaustion), you should write the output to a temporary file as you go:
Set inFile = objFSO.OpenTextFile("cloud.csv")
Set outFile = objFSO.OpenTextFile("cloud.csv.tmp", 2, True)
Do while not objTextFile.AtEndOfStream
line = inFile.ReadLine
arrStr = Split(line, ",")
If arrStr(0) = strFirm Then
soNumber = CInt(arrStr(1))
outFile.WriteLine arrStr(0) & "," & (soNumber + 1)
Else
outFile.WriteLine line
End If
Loop
inFile.Close
outFile.Close
and then replace the original file with the modified one:
objFSO.DeleteFile "cloud.csv", True
objFSO.MoveFile "cloud.csv.tmp", "cloud.csv"
However, if your input file is small, it's easier to just read the entire file, process it, and overwrite the file with the modified content:
text = Split(objFSO.OpenTextFile("cloud.csv").ReadAll, vbNewLine)
For i = 0 To UBound(text)
If Len(text(i)) > 0 Then
arrStr = Split(text(i), ",")
If arrStr(0) = strFirm Then
soNumber = CInt(arrStr(1))
text(i) = arrStr(0) & "," & (soNumber + 1)
End If
End If
Next
objFSO.OpenTextFile("cloud.csv", 2, True).Write Join(text, vbNewLine)
The Len(text(i)) > 0 check is for skipping over empty lines (including a trailing newline at the end of the file), because empty strings are split into empty arrays, which would in turn make the check arrStr(0) = strFirm fail with an index out of bounds error.
For short file, I'd prefer a .ReadAll()/RegExp strategy:
Dim oFS : Set oFS = CreateObject("Scripting.FileSystemObject")
Dim sFirma : sFirma = "Gray"
Dim sFSpec : sFSpec = "..\data\cloud.csv"
Dim sAll : sAll = oFS.OpenTextFile(sFSpec).ReadAll()
Dim reCut : Set reCut = New RegExp
reCut.Global = True
reCut.Multiline = True
reCut.Pattern = "^(" & sFirma & ",)(\d+)"
Dim oMTS : Set oMTS = reCut.Execute(sAll)
If 1 = oMTS.Count Then
oFS.CreateTextFile(sFSpec).Write reCut.Replace(sAll, "$1" & (CLng(oMTS(0).SubMatches(1)) + 1))
Else
' handle error
End If
WScript.Echo oFS.OpenTextFile(sFSpec).ReadAll()
output:
Client1,5
Client2,7
Client3,12
Gray,7
Client4,9
Client5,17
Client6,8

Resources