how to get inner text of html under id? - excel

I am trying to pull data pull inner text under id in excel cell.
This is for XML code.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET","https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
hdoc.body.innerHTML = XMLpage.responseText
ha = hdoc.getelementbyid("open").innerText
Range("K11").Value = ha
Debug.Print ha
End Sub
I expect output value, but it shows --.

Examine the response text. There is a difference in the way the page is rendered in the browser versus what is returned in the ResponseText.
I put the URL into a browser went into dev tools (F12), found the element, and noted the numeric value inside the HTML element.
Then I dumped the response text we're getting in VBA into a cell and copied the entire cell value into Notepad++. If you do that you'll see the initial value inside the #open element is indeed "--".
The real value appears to be getting written into the HTML via JavaScript, which is common practice. There is a JSON object at the top of the page, presumably injected into the document from the back-end of the website upon your request.
So you have to parse the JSON, not the HTML. I've provided code doing just that. Now, there may be a better way to do it, I feel this code is kind of "hacky" but it's getting the job done for your example URL.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
'// sample: ,"open":"681.05",
Dim token As String
token = """open"":"""
Dim startPosition As Integer
startPosition = InStr(1, XMLpage.responseText, token)
Dim endPosition As Integer
endPosition = InStr(startPosition, XMLpage.responseText, ",")
Dim prop As String
prop = Mid(XMLpage.responseText, startPosition, endPosition - startPosition)
prop = Replace(prop, """", vbNullString)
prop = Replace(prop, "open:", vbNullString)
Dim val As Double
val = CDbl(prop)
ha = val
Range("K11").Value = ha
Debug.Print ha
End Sub

Here are two methods. 1) Using regex on the return text. Usually frowned upon but perfectly serviceable here. 2) Traditional extract json string and use json parser to parse out value.
The data you want is stored in a json string found both on the webpage and the xmlhtttp response, under the same element:
This means you can treat the html as a string and target just the pattern for the open price using regex as shown below, or parse the xmlhttp request into an html parser, grab the required element, extract its innerText and trim off the whitespace, then pass to a json parser to extract the open price.
In both methods you want to avoid being served cached results so the following header is an important addition to attempt to mitigate for this:
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
There is no need for addtional cell formatting. Full value comes out for both your tickers.
Regex:
It is present in a json string in the response. You can regex it out easily from return text.
Regex explanation:
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, p As String, r As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
p = """open"":""(.*?)"""
Set re = CreateObject("VBScript.RegExp")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
r = GetValue(re, .responseText, p)
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub
Public Function GetValue(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
With re
.Global = True
.pattern = pattern
If .test(inputString) Then ' returns True if the regex pattern can be matched agaist the provided string
GetValue = .Execute(inputString)(0).submatches(0)
Else
GetValue = "Not found"
End If
End With
End Function
HTML and json parser:
This requires installing code for jsonparser from jsonconverter.bas in a standard module called JsonConverter and then going VBE>Tools>References>Add a reference to Microsoft Scripting Runtime and Microsoft HTML Object Library.
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, r As String, json As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=MRF", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
Dim html As HTMLDocument
Set html = New HTMLDocument
html.body.innerHTML = .responseText
Set json = JsonConverter.ParseJson(Trim$(html.querySelector("#responseDiv").innerText))
r = json("data")(1)("open")
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub

Related

Print each item in a JSON Object as separate row in excel file using VBA

I'm trying to read the json from an URL, and paste the json object items as a separate row in an excel sheet.
But till now, I'm able to fetch the entire json object into the excel in single column A1.
I tried to iterate through the json objects and print in separate rows.
But getting 424 Error : Object Qualifier
Need your help on the same.
Sample Data : {"data":{"id":3,"name":"true red","year":2002,"color":"#BF1932","pantone_value":"19-1664"},"support":{"url":"https://reqres.in/#support-heading","text":"To keep ReqRes free, contributions towards server costs are appreciated!"}}
Private Sub HTML_VBA_Extract_Data_From_Website_To_Excel()
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
'Change the URL before executing the code. URL to Extract data from.
sURL = "https://reqres.in/api/products/3"
'Extract data from website to Excel using VBA
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
sPageHTML = oXMLHTTP.responseText
Dim jsonObject As Object
Set jsonObject = JsonConverter.ParseJson(sPageHTML)
Dim i As Integer
Dim Item As Variant
i = 2
For Each Item In jsonObject.Keys
ThisWorkbook.Sheets(1).Cells(i, 1).Value = Item
i = i + 1
Next
MsgBox "XMLHTML Fetch Completed"
End Sub
This outputs:
data
support
--------
id 3
name true red
year 2002
color #BF1932
pantone_value 19-1664
to the Immediate window, as expected.
Private Sub JsonTester()
Dim oXMLHTTP As Object
Dim sURL As String
Dim jsonObject As Object
Dim Item As Variant, data As Object
sURL = "https://reqres.in/api/products/3"
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
Set jsonObject = JsonConverter.ParseJson(oXMLHTTP.responseText)
For Each Item In jsonObject.keys
Debug.Print Item
Next
Debug.Print "--------"
Set data = jsonObject("data")
For Each Item In data.keys
Debug.Print Item, data(Item)
Next
End Sub

Scrape economic data from investing website

I am working on a code to get data from : https://www.investing.com/economic-calendar/core-durable-goods-orders-59
I have got the code for getting this via httprequest: but looking to change this to work for the economic data (link above) is there any way I can get the same for the economic indicators??
code below:
Option Explicit
Sub Export_Table()
'Html Objects---------------------------------------'
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Dim ieTable As MSHTML.HTMLTable
Dim Element As MSHTML.HTMLElementCollection
'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
Dim wb As Workbook
Dim Table As Worksheet
Dim i As Long
Set wb = ThisWorkbook
Set Table = wb.Worksheets("Sheet1")
'-------------------------------------------'
Dim xmlHttpRequest As New MSXML2.XMLHTTP60 '
'-------------------------------------------'
i = 2
'Web Request --------------------------------------------------------------------------'
With xmlHttpRequest
.Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
If .Status = 200 Then
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
htmlBody.innerHTML = xmlHttpRequest.responseText
Set ieTable = htmlDoc.getElementById("curr_table")
For Each Element In ieTable.getElementsByTagName("tr")
Table.Cells(i, 1) = Element.Children(0).innerText
Table.Cells(i, 2) = Element.Children(1).innerText
Table.Cells(i, 3) = Element.Children(2).innerText
Table.Cells(i, 4) = Element.Children(3).innerText
Table.Cells(i, 5) = Element.Children(4).innerText
Table.Cells(i, 6) = Element.Children(5).innerText
Table.Cells(i, 7) = Element.Children(6).innerText
i = i + 1
DoEvents: Next Element
End If
End With
Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing
End Sub
Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
I have used the excel tool Power Query for this very thing. It is also called Get & Transform Data. I don't like using vba for doing this type of thing.
To make it work:
In Excel Go to Data>Get Data>From Other Sources>From Web.
Enter the URL
Wait for the webpage to load and then pick the table that you want.
This website took awhile to load, but it did work for me.
Choose "Load" which goes directly to the sheet, or "Transform Data" to manipulate the data in Power Query. There are many options in power query such as split columns, filter data, Calculate Columns and ...
I would avoid the overhead of setting up a permanent connection and simply continue using XHR. With the data > from web, you cannot grab more rows than are present on the initial landing. If however you go with XHR, you can issue POST requests to get more data. The code below utilizes a loop to retrieve additional results beyond the immediate visible on the page.
When you press the Show more link there is a POST request for an additional 6 rows which uses the latest date from the current set of results as part of the POST body. The response returned is JSON. Rather than bringing in a JSON parser, given the standard nature of the JSON, and that I am already using regex to clean the date format in column 1 to put in the POST body, I use two simple regexes to extract the html for the next results table from the response, and to check whether there are more results.
The format of the JSON is:
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": "1"
}
Or
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": false
}
So, I do some cleaning of the extracted html in order to not confuse the html parser within MSHTML. Furthermore, I add in an id to identify the table I have constructed, so I can continue to use an id css selector (#) list within my UpdateDateResults function.
I initially oversize an array to store each retrieved table which I update ByRef. I loop requesting more results until either there are no more results, there is an error parsing the maximum date from the last retrieved table column 1, or until my specified earliest date for data retrieval falls within the date range of the latest returned table.
Finally, I write the results array out to the sheet in one go.
N.B. You can target the table by its id. It looks like the number at the end of the id could be the same as for the goods url, lending itself to generalizing the code below to work for other goods.
VBA:
Option Explicit
Public Sub GetInvestingInfo()
'tools > references > Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, xhr As Object
Const STARTDATE As Date = "2019-11-25" 'Adjust as required. DateAdd("yyyy", -2, Date) 2 years back. This means may have some earlier months in _
batch that spans the start date but won't issue an additional request after this
Set xhr = CreateObject("MSXML2.XMLHTTP")
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.investing.com/economic-calendar/core-durable-goods-orders-59", False
.setRequestHeader "User-Agent", "Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim firstTable As Boolean, r As Long, results() As Variant
ReDim results(1 To 100000, 1 To 5)
'process initial table and update results, get cleaned date needed for request for more results
firstTable = True
Dim latestDate As String
UpdateDateResults latestDate, results, firstTable, r, html
Dim re As Object, maxDate As String, hasMoreHistory As Boolean, s As String
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = False
End With
maxDate = cleanedDate(latestDate, re)
hasMoreHistory = True
Dim errorDate As Date
errorDate = DateAdd("d", 1, Date)
Do While maxDate >= STARTDATE And maxDate < errorDate 'break loop using pre-defined earliest date, error with date conversion, or when no more rows found
Application.Wait (Now + TimeSerial(0, 0, 1)) 'Pause
s = GetMoreRows(xhr, Format$(maxDate, "YYYY-MM-DD")) 'max a POST request for more data
re.Pattern = "hasMoreHistory"":(""?.*?""?)}" 'Check if there are more rows still available. "1" for yes, false for no
hasMoreHistory = (re.Execute(s)(0).submatches(0) <> False)
If Not hasMoreHistory Then Exit Do
re.Pattern = "historyRows"":""(.*)"","
html.body.innerHTML = "<table id=""me"">" & Replace$(re.Execute(s)(0).submatches(0), "\/", "/") & "</table>" 'fix html and feed into html variable
UpdateDateResults latestDate, results, firstTable, r, html
maxDate = cleanedDate(latestDate, re) 'convert value retrieved from last row in date column of table to an actual date
Loop
With ActiveSheet
.Cells.ClearContents
.Cells(1, 1).Resize(r, 5) = results 'Don't bother to resize results as clear all cells before write ou
End With
End Sub
Public Sub UpdateDateResults(ByRef latestDate As String, ByRef results() As Variant, ByRef firstTable As Boolean, ByRef r As Long, ByVal html As MSHTML.HTMLDocument)
Dim table As MSHTML.HTMLTable 'return latest date from function
Set table = html.querySelector("#eventHistoryTable59, #me")
latestDate = table.Rows(table.Rows.Length - 1).Children(0).innerText
Dim i As Long, n As Long, j As Long
n = IIf(firstTable, 0, 1)
For i = n To table.Rows.Length - 1
r = r + 1
For j = 0 To table.Rows(i).Children.Length - 2
results(r, j + 1) = table.Rows(i).Children(j).innerText
Next
Next
firstTable = False
End Sub
Public Function cleanedDate(ByVal dirtyString As String, ByVal re As Object) As Date
re.Pattern = "(^[A-Z][a-z]{2}).*(\d{2}),.(\d{4})(.*)"
On Error GoTo errhand:
If re.test(dirtyString) Then
cleanedDate = CDate(re.Replace(dirtyString, "$2" & Chr$(32) & "$1" & Chr$(32) & "$3"))
Exit Function
End If
errhand:
cleanedDate = DateAdd("d", 1, Date)
End Function
Public Function GetMoreRows(ByVal xhr As Object, ByVal dateStamp As String) As String
With xhr
.Open "POST", "https://www.investing.com/economic-calendar/more-history", False
.setRequestHeader "User-Agent", "Safari/537.36"
.setRequestHeader "x-requested-with", "XMLHttpRequest"
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.send "eventID=430865&event_attr_ID=59&event_timestamp=" & dateStamp & "+" & Application.WorksheetFunction.EncodeURL("12:30:00") & "&is_speech=0"
GetMoreRows = .responseText
End With
End Function
Regexes (without the double " escaping for VBA):
hasMoreHistory":("?.*?"?)}
historyRows":"(.*)",

Parse line from HTML response in Excel VBA

I am looking to parse a 'WinHttpRequest' response in Excel VBA to pull a specific line from the HTML response. Here is the code for the HTTP request.
Function getSetName(ByVal setNUMBER As String) As String
Dim oRequest As Object
Dim xmlResponse As String
Dim setDescription As String
Dim setName As String
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
oRequest.Open "GET", "https://brickset.com/sets/75192" '& setNUMBER
oRequest.Send
xmlResponse = oRequest.ResponseText
'parse xml response here
getSetName = setName
End Function
I am looking to parse only the line with the HTML tag 'meta name=""description""' to a string which I will later pull information from.
Any help or direction on how to parse this single line would be appreciated.
Try
Sub Test_GetSetName()
Debug.Print GetSetName("75192")
End Sub
Function GetSetName(ByVal SetNUMBER As String) As String
Dim html As New MSHTML.HTMLDocument
With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", "https://brickset.com/sets/" & SetNUMBER, False
.send
html.body.innerHTML = .responseText
End With
GetSetName = html.querySelector("[name='description']").Content
End Function

How can I parse values from a dynamic webpage using Excel VBA when XML/XPath doesn't seem to work?

I am attempting to scrape values from a collection of webpages using an XPath to parse what I want out of the XML. I grab the full XPath from the element using Chrome but then when I use in the code it doesnt seem to select the node I am looking for. Also when I execute the XPath statement in the console it also does not return the node. Some other element XPaths work in console but not in VBA. Am I missing something? My simple test XML works ok. My attempts to use namespace in the XPath were also not successful. Code below with an example of one of the webpages and one of the elements of interest:
Sub test()
testXML = "<test example='hello'>hello</test>"
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
Dim XmlResponse As String
Dim strXML As String
Dim xNode As MSXML2.IXMLDOMNode
Dim xmlElement As MSXML2.IXMLDOMElement
Dim XDoc As MSXML2.DOMDocument60
sURL = "https://www.bestplaces.net/crime/zip-code/alaska/anchorage/99510"
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.SetOption(2) = 13056 'Disable CA messages
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
XmlResponse = oXMLHTTP.responseText
'strXML = testXML
strXML = XmlResponse
Set XDoc = New MSXML2.DOMDocument60
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/1999/xhtml'"
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/2000/svg'"
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/1999/xlink'"
XDoc.LoadXML (strXML)
'Set xNode = XDoc.SelectSingleNode("/test")
Set xNode = XDoc.SelectSingleNode("/html/body/form/div[7]/div[2]/div[2]/div[3]/div/div/div/div/div/svg/g[6]/g[1]/text/tspan[2]")
If xNode Is Nothing Then
MsgBox "Nothing"
Else: MsgBox xNode.text
End If
End Sub
You are getting html back. A quick look at the page source shows that value is populated dynamically, but should be available by regex out of responseText; so your xpath wouldn't work even if converted to equivalent path for html parser.
Option Explicit
Public Sub GetValue()
Dim http As Object, s As String, re As Object
Set http = CreateObject("MSXML2.XMLHTTP")
Set re = CreateObject("VBScript.RegExp")
With http
.Open "GET", "https://www.bestplaces.net/crime/zip-code/alaska/anchorage/99510", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
s = .responseText
End With
With re
.Pattern = "data:\s?\[(.*?),"
Debug.Print .Execute(s)(0).SubMatches(0)
End With
End Sub
Regex explanation:

Scraping current date from website using Excel VBA

Error
Librarys
I need the date of the current day. I do not want to place it inside a variable to be able to have it work, instead I would like that variable to be Date or in its default String.
Sub WEB()
Dim IE As Object
Dim allelements As Object
Application.ScreenUpdating = False
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "http://www.fechadehoy.com/venezuela"
Do Until IE.ReadyState = 4
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:01"))
IE.document.getElementById ("date")
IE.Visible = True
Set IE = Nothing
Application.ScreenUpdating = True
End Sub
The website is http://www.fechadehoy.com/venezuela
I only need the date of this page. I am not interested in any other element of the macro.
I just need to extract the current date and get it in a variable.
if you need Lunes, 19 de agosto de 2019 then use getElementById for fecha
Debug.Print IE.document.getElementById("fecha").Innerhtml
Why go for IE when xhr can do the trick? You can get the date with the blink of an eye if you opt for XMLHttpRequest.
Sub GetCurrentDate()
Dim S$
With New XMLHTTP
.Open "GET", "http://www.fechadehoy.com/venezuela", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
MsgBox .getElementById("fecha").innerText
End With
End Sub
Reference to add:
Microsoft XML, v6.0
Microsoft HTML Object Library
To get rid of that reference altogether:
Sub GetCurrentDate()
Dim S$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://www.fechadehoy.com/venezuela", False
.send
S = .responseText
End With
With CreateObject("htmlfile")
.body.innerHTML = S
MsgBox .getElementById("fecha").innerText
End With
End Sub
Although the answer given by #Siddharth Rout is perfectly fine, it would require quite a bit of string manipulation to get the date in a usable form.
For the above reason I'm providing an alternative solution which gets the date in a directly usable format, ready to be manipulated and used in further calculations if necessary.
As a bonus I am demonstrating how to get the date using an HTTP request instead of using the Internet Explorer. This makes the code more efficient.
Option Explicit
Sub getDate()
Dim req As New WinHttpRequest
Dim doc As New HTMLDocument
Dim el As HTMLParaElement
Dim key As String
Dim url As String
Dim retrievedDate As Date
url = "http://www.fechadehoy.com/venezuela"
key = "Fecha actual: "
''''''''''Bonus: Use an HTTP request to get the date instead of opening IE'''''''''''
With req '
.Open "GET", url, False '
.send '
doc.body.innerHTML = .responseText '
'Debug.Print .responseText '
End With '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
For Each el In doc.getElementsByTagName("p")
If el.innerText Like "Fecha actual*" Then
retrievedDate = Mid(el.innerText, InStr(el.innerText, key) + Len(key), Len(el.innerText))
End If
Next el
End Sub
You will need to add a reference to Microsoft HTML Object Libraryand Microsoft WinHTTP Services version 5.1. To do that, go to VB editor>Tools>References.
Having the date in this format, means it can easily be manipulated. An example would be the use of functions like day(retrievedDate) , month(retrievedDate), year(retrievedDate) etc.

Resources