Parse line from HTML response in Excel VBA - excel

I am looking to parse a 'WinHttpRequest' response in Excel VBA to pull a specific line from the HTML response. Here is the code for the HTTP request.
Function getSetName(ByVal setNUMBER As String) As String
Dim oRequest As Object
Dim xmlResponse As String
Dim setDescription As String
Dim setName As String
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
oRequest.Open "GET", "https://brickset.com/sets/75192" '& setNUMBER
oRequest.Send
xmlResponse = oRequest.ResponseText
'parse xml response here
getSetName = setName
End Function
I am looking to parse only the line with the HTML tag 'meta name=""description""' to a string which I will later pull information from.
Any help or direction on how to parse this single line would be appreciated.

Try
Sub Test_GetSetName()
Debug.Print GetSetName("75192")
End Sub
Function GetSetName(ByVal SetNUMBER As String) As String
Dim html As New MSHTML.HTMLDocument
With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", "https://brickset.com/sets/" & SetNUMBER, False
.send
html.body.innerHTML = .responseText
End With
GetSetName = html.querySelector("[name='description']").Content
End Function

Related

Concatenate referenced URL into XML HTTP Request

The following snippet of code sends a XML request to the following site
Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.028.723", False
XMLReq.send
End Sub
I have another Sub routine GetURL() which prints out the desired URL in this case: https://echa.europa.eu/brief-profile/-/briefprofile/100.028.723
How can I essentially concatenate the output of GetURL() into the BstrUrl? i.e.
XMLReq.Open "Get", "x", False where x is the output of GetURL()
Despite various attempts the syntax is not accepted as a URL.
Assuming you are combining from your earlier question then you need to ensure you write a function which returns the url (as Tim Williams has pointed out). I would expand upon this, in that I think you would need to consider adding a test to ensure both the request succeeded, there were results, and to pass the searchKeyWord as an argument to make your function more reusable. Along the same lines, you could pass the xmlhttp object into the function, so as to avoid continually creating and destroying them.
Avoid auto-instantiation, as you can get unexpected results, and Hungarian style notation. Personally, I also avoid those type characters, as they are harder to read.
vbNullString will offer faster assignment than = "".
I would also use a shorter, faster, and more reliable css pattern to retrieve the url, based on classes and a parent child relationship of two elements.
Public Sub GetContents()
Dim searchKeyWord As String, xmlReq As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument, url As String
searchKeyWord = "Acetone"
Set xmlReq = New MSXML2.XMLHTTP60
url = GetUrl(searchKeyWord, xmlReq)
Set html = New MSHTML.HTMLDocument
If url <> "N/A" Then
With xmlReq
.Open "GET", url, False
.send
If .Status = 200 Then
html.body.innerHTML = .responseText
Debug.Print html.querySelector("title").innerText
End If
End With
End If
End Sub
Public Function GetUrl(ByVal searchKeyWord As String, ByVal http As MSXML2.XMLHTTP60) As String
Const url = "https://echa.europa.eu/search-for-chemicals?p_auth=5ayUnMyz&p_p_id=disssimplesearch_WAR_disssearchportlet&p_p_lifecycle=1&p_p_state=normal&p_p_col_id=_118_INSTANCE_UFgbrDo05Elj__column-1&p_p_col_count=1&_disssimplesearch_WAR_disssearchportlet_javax.portlet.action=doSearchAction&_disssimplesearch_WAR_disssearchportlet_backURL=https%3A%2F%2Fecha.europa.eu%2Finformation-on-chemicals%3Fp_p_id%3Ddisssimplesearchhomepage_WAR_disssearchportlet%26p_p_lifecycle%3D0%26p_p_state%3Dnormal%26p_p_mode%3Dview%26p_p_col_id%3D_118_INSTANCE_UFgbrDo05Elj__column-1%26p_p_col_count%3D1%26_disssimplesearchhomepage_WAR_disssearchportlet_sessionCriteriaId%3D"
Dim html As MSHTML.HTMLDocument, dict As Object, i As Long, r As Long
Dim dictKey As Variant, payload$, ws As Worksheet
Set html = New MSHTML.HTMLDocument
Set dict = CreateObject("Scripting.Dictionary")
Set ws = ThisWorkbook.Worksheets("Sheet1")
dict("_disssimplesearchhomepage_WAR_disssearchportlet_formDate") = "1621017052777" 'timestamp
dict("_disssimplesearch_WAR_disssearchportlet_searchOccurred") = "true"
dict("_disssimplesearch_WAR_disssearchportlet_sskeywordKey") = searchKeyWord
dict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer") = "true"
dict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox") = "on"
payload = vbNullString
For Each dictKey In dict
payload = IIf(Len(dictKey) = 0, WorksheetFunction.EncodeURL(dictKey) & "=" & WorksheetFunction.EncodeURL(dict(dictKey)), _
payload & "&" & WorksheetFunction.EncodeURL(dictKey) & "=" & WorksheetFunction.EncodeURL(dict(dictKey)))
Next dictKey
With http
.Open "POST", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send (payload)
If .Status = 200 Then
html.body.innerHTML = .responseText
Else
GetUrl = "N/A"
Exit Function
End If
End With
Dim result As Boolean
result = html.querySelectorAll(".lfr-search-container .substanceNameLink").Length > 0
GetUrl = IIf(result, html.querySelector(".lfr-search-container .substanceNameLink").href, "N/A")
End Function
If GetURL is a function returning a string then this should work:
Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim url
url = GetURL()
XMLReq.Open "Get", url, False
XMLReq.send
End Sub

How can I parse values from a dynamic webpage using Excel VBA when XML/XPath doesn't seem to work?

I am attempting to scrape values from a collection of webpages using an XPath to parse what I want out of the XML. I grab the full XPath from the element using Chrome but then when I use in the code it doesnt seem to select the node I am looking for. Also when I execute the XPath statement in the console it also does not return the node. Some other element XPaths work in console but not in VBA. Am I missing something? My simple test XML works ok. My attempts to use namespace in the XPath were also not successful. Code below with an example of one of the webpages and one of the elements of interest:
Sub test()
testXML = "<test example='hello'>hello</test>"
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
Dim XmlResponse As String
Dim strXML As String
Dim xNode As MSXML2.IXMLDOMNode
Dim xmlElement As MSXML2.IXMLDOMElement
Dim XDoc As MSXML2.DOMDocument60
sURL = "https://www.bestplaces.net/crime/zip-code/alaska/anchorage/99510"
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.SetOption(2) = 13056 'Disable CA messages
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
XmlResponse = oXMLHTTP.responseText
'strXML = testXML
strXML = XmlResponse
Set XDoc = New MSXML2.DOMDocument60
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/1999/xhtml'"
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/2000/svg'"
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/1999/xlink'"
XDoc.LoadXML (strXML)
'Set xNode = XDoc.SelectSingleNode("/test")
Set xNode = XDoc.SelectSingleNode("/html/body/form/div[7]/div[2]/div[2]/div[3]/div/div/div/div/div/svg/g[6]/g[1]/text/tspan[2]")
If xNode Is Nothing Then
MsgBox "Nothing"
Else: MsgBox xNode.text
End If
End Sub
You are getting html back. A quick look at the page source shows that value is populated dynamically, but should be available by regex out of responseText; so your xpath wouldn't work even if converted to equivalent path for html parser.
Option Explicit
Public Sub GetValue()
Dim http As Object, s As String, re As Object
Set http = CreateObject("MSXML2.XMLHTTP")
Set re = CreateObject("VBScript.RegExp")
With http
.Open "GET", "https://www.bestplaces.net/crime/zip-code/alaska/anchorage/99510", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
s = .responseText
End With
With re
.Pattern = "data:\s?\[(.*?),"
Debug.Print .Execute(s)(0).SubMatches(0)
End With
End Sub
Regex explanation:

Excel show image from http response

The goal is to have a sheet in excel that contains fields and a button. The click on button will create url from the fields and create a HTTP GET request that returns an image in raw data. This image then should be shown in the current sheet.
This is what i have already done:
Sub GetQrCode()
Dim hReq As Object
Dim ws As Worksheet
Set ws = Sheet1
Dim strUrl As String
strUrl = "https://www.profit365.eu/services/API/QR/PayBySquare.png?IBAN=SK2311000000001234567890&BIC=TATRSKBX&Currency=EUR&Amount=123.40&DueDate=41763&VS=1234568&SS=&CS=&size=256"
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", strUrl, False
.Send
End With
Dim resp As String
resp = hReq.responseBody
End Sub
Now should come the code that shows the resp (raw image).
Is this possible? Has anyone done something similar?
Thank you for the responses

how to get inner text of html under id?

I am trying to pull data pull inner text under id in excel cell.
This is for XML code.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET","https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
hdoc.body.innerHTML = XMLpage.responseText
ha = hdoc.getelementbyid("open").innerText
Range("K11").Value = ha
Debug.Print ha
End Sub
I expect output value, but it shows --.
Examine the response text. There is a difference in the way the page is rendered in the browser versus what is returned in the ResponseText.
I put the URL into a browser went into dev tools (F12), found the element, and noted the numeric value inside the HTML element.
Then I dumped the response text we're getting in VBA into a cell and copied the entire cell value into Notepad++. If you do that you'll see the initial value inside the #open element is indeed "--".
The real value appears to be getting written into the HTML via JavaScript, which is common practice. There is a JSON object at the top of the page, presumably injected into the document from the back-end of the website upon your request.
So you have to parse the JSON, not the HTML. I've provided code doing just that. Now, there may be a better way to do it, I feel this code is kind of "hacky" but it's getting the job done for your example URL.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
'// sample: ,"open":"681.05",
Dim token As String
token = """open"":"""
Dim startPosition As Integer
startPosition = InStr(1, XMLpage.responseText, token)
Dim endPosition As Integer
endPosition = InStr(startPosition, XMLpage.responseText, ",")
Dim prop As String
prop = Mid(XMLpage.responseText, startPosition, endPosition - startPosition)
prop = Replace(prop, """", vbNullString)
prop = Replace(prop, "open:", vbNullString)
Dim val As Double
val = CDbl(prop)
ha = val
Range("K11").Value = ha
Debug.Print ha
End Sub
Here are two methods. 1) Using regex on the return text. Usually frowned upon but perfectly serviceable here. 2) Traditional extract json string and use json parser to parse out value.
The data you want is stored in a json string found both on the webpage and the xmlhtttp response, under the same element:
This means you can treat the html as a string and target just the pattern for the open price using regex as shown below, or parse the xmlhttp request into an html parser, grab the required element, extract its innerText and trim off the whitespace, then pass to a json parser to extract the open price.
In both methods you want to avoid being served cached results so the following header is an important addition to attempt to mitigate for this:
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
There is no need for addtional cell formatting. Full value comes out for both your tickers.
Regex:
It is present in a json string in the response. You can regex it out easily from return text.
Regex explanation:
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, p As String, r As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
p = """open"":""(.*?)"""
Set re = CreateObject("VBScript.RegExp")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
r = GetValue(re, .responseText, p)
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub
Public Function GetValue(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
With re
.Global = True
.pattern = pattern
If .test(inputString) Then ' returns True if the regex pattern can be matched agaist the provided string
GetValue = .Execute(inputString)(0).submatches(0)
Else
GetValue = "Not found"
End If
End With
End Function
HTML and json parser:
This requires installing code for jsonparser from jsonconverter.bas in a standard module called JsonConverter and then going VBE>Tools>References>Add a reference to Microsoft Scripting Runtime and Microsoft HTML Object Library.
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, r As String, json As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=MRF", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
Dim html As HTMLDocument
Set html = New HTMLDocument
html.body.innerHTML = .responseText
Set json = JsonConverter.ParseJson(Trim$(html.querySelector("#responseDiv").innerText))
r = json("data")(1)("open")
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub

How to get specific data from a tag from XML page?

I have the following code to get the response text from an XML page, but it returns everything in the page:
Private Sub Testing()
Dim xmlhttp As New MSXML2.XMLHTTP60, myurl As String
myurl = "http://schemas.xmlsoap.org/soap/envelope/"
xmlhttp.Open "GET", myurl, False
xmlhttp.send
newstring = xmlhttp.responseText
Sheet1.Range("B2") = newstring
End Sub
The URL in myurl is for example only. The URL i'm trying to get is in an intranet site but this has a similar structure.
The following FIELD tag is not available in the URL above. I used the url as an example to show the type of document.
Let's say there's a tag like below in the middle of the page:
<FIELD NAME="str2">097cf4a8-2755-4c62-939c-9402e0a4e3e2</FIELD>
How do I get only this "097cf4a8-2755-4c62-939c-9402e0a4e3e2"? The str2 is unique.
Here is an example using a publicly available document. The principle of selecting by combination of tag and attribute value is shown.
Option Explicit
Public Sub Testing()
Dim xmlhttp As New MSXML2.XMLHTTP60, myurl As String
myurl = "http://www.chilkatsoft.com/xml-samples/bookstore.xml"
xmlhttp.Open "GET", myurl, False
xmlhttp.send
Dim xmlDoc As New MSXML2.DOMDocument60
Set xmlDoc = New MSXML2.DOMDocument60
xmlDoc.LoadXML xmlhttp.responseText
Debug.Print xmlDoc.SelectSingleNode("//userComment[#rating=""3""]").Text
End Sub
Source:
Output:

Resources