VBA scrape HTML from URL with JavaScript elements - excel

Using the following code
Sub Test()
'Must have the Microsoft HTML Object Library reference enabled
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim link As String
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://www.afklcargo.com/WW/en/local/app/index.jsp#/tntdetails/074-26126063", False
.Send
oHtml.Body.innerHTML = .responseText
End With
End Sub
I am unable to get the actual HTML, I believe it's because the website is using Javascript? How can I circumvent this so I can get my value?

It is dynamically added via another xhr call which you can find in the network tab of browser. So change your url to that and then use a json parser to parse the response.
Use a json parser, such as jsonconverter.bas to handle the response. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References > Add a reference to Microsoft Scripting Runtime.
You extract your value from the json which is returned as unix timestamp.
1561791600000 > 2019-06-29T07:00:00.000Z
The json response actually has all the info regarding the tracking. You can explore it here.
Option Explicit
Public Sub Test()
Dim json As Object
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://www.afklcargo.com/resources/tnt/singleAwbDetails?awbId=074-26126063", False
.send
Set json = JsonConverter.ParseJson(.responseText)
Debug.Print json("booking")(1)("arrivalDateLong")("local")
End With
End Sub
Two functions, by Schmidt and Navion for doing the conversion to datetime from stamp copied directly from here.
Function Epoch2Date(ByVal E As Currency, Optional msFrac) As Date
Const Estart As Double = #1/1/1970#
msFrac = 0
If E > 10000000000# Then E = E * 0.001: msFrac = E - Int(E)
Epoch2Date = Estart + (E - msFrac) / 86400
End Function
Function Epoch2DateS(ByVal epochstamp$) As Date
Epoch2DateS = CVDate(CDbl(DateSerial(1970, 1, 1)) + Int(Val(epochstamp$) / 1000#) / 86400)
End Function
N.B.
Using json parser is the reliable and recommended way, but you can also regex or split function to get the value.

Related

Data values of uploaded images

There are a large number of classes ("df-table") on that link homepage. The first class was able to be entered into the Excel sheet, but not the second class ("df-table"), the third class ("df-table"), etc.
Various internet resources say it can be solved by using nth-of-type or using xpath, but the error keeps occurring.
I want the data values of the uploaded images.
Public Sub Jaemu()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim d As WebDriver, ws As Worksheet, URL As String
Set d = New ChromeDriver
Set ws = ThisWorkbook.Worksheets("gemstone2")
Dim http As New WinHttpRequest
With d
'.AddArgument "--headless"
.Start "Chrome"
Dim html As HTMLDocument
Dim JsonObject As Object
Set html = New HTMLDocument
URL = "https://globalmonitor.einfomax.co.kr/infomax_ds.html#/USA/1/1"
.get URL, Raise:=False ' set raise to false to avoid a timeout error
d.FindElementByCss("[ng-click='openStockSearchPopup();']").Click
d.FindElementByCss("[ng-enter='searchStockSearchPopup(true);']").SendKeys "GOOGL"
d.FindElementByCss("[ng-click='searchStockSearchPopup(true);']").Click
d.FindElementByCss("[class='slick-cell l1 r1 text-center clickable']").Click
Cells(2, 1).Value = d.FindElementByCss("[class='df-table']").Text
Cells(3, 1).Value = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
End With
End Sub
Original OP error:
runtime error 32.
Coding line.
Cells(3, 1).Value = d.FindElementByCss("[class='df-table' class:nth-of-type(2)]").Text
New error following initial suggestion to use different CSS selector:
Runtime error 7
Coding line.
Cells(3, 1).Value = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
Initial error (RTE32):
The :nth-of-type() pseudo class selector would go outside of the attribute selector closing ] i.e. "[class='df-table']:nth-of-type(2)", however this does not provide a match.
You can use:
.table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table
This returns a single match and is specific enough, and not reliant on a long selector list, that it is likely fairly stable. Based on experience.
If you had wanted to use :nth-of-type() you could have used it on a parent element and taken first match, but that is less stable and prone to breaking if html changes:
.contents-area:nth-of-type(5) .df-table
Follow-up error (RTE7):
The element needs to be scrolled into view.
A not ideal way is (your mileage may vary):
d.ExecuteScript "window.scrollTo(0, document.body.scrollHeight/3);"
Application.Wait Now + TimeSerial(0, 0, 2)
Activesheet.Cells(1,3) = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
There are better ways with inbuilt functions and custom JS which you can likely pull from the examples.xlsm by the author on GitHub. As I can only test with Python this was a quick and dirty test that worked.
You could avoid overhead of browser and use XHR to get a JSON response to then use a JSON parser with, as follows:
Option Explicit
Public Sub GetData()
Dim s As String
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", "https://globalmonitor.einfomax.co.kr/facset/getKeyData", False
.SetRequestHeader "User-Agent", "Mozilla/5.0"
.SetRequestHeader "content-type", "application/json;charset=UTF-8"
.send "{""param"":""NAS:AAPL""}"
s = .responsetext
End With
Debug.Print s
End Sub

Scraping current date from website using Excel VBA

Error
Librarys
I need the date of the current day. I do not want to place it inside a variable to be able to have it work, instead I would like that variable to be Date or in its default String.
Sub WEB()
Dim IE As Object
Dim allelements As Object
Application.ScreenUpdating = False
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "http://www.fechadehoy.com/venezuela"
Do Until IE.ReadyState = 4
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:01"))
IE.document.getElementById ("date")
IE.Visible = True
Set IE = Nothing
Application.ScreenUpdating = True
End Sub
The website is http://www.fechadehoy.com/venezuela
I only need the date of this page. I am not interested in any other element of the macro.
I just need to extract the current date and get it in a variable.
if you need Lunes, 19 de agosto de 2019 then use getElementById for fecha
Debug.Print IE.document.getElementById("fecha").Innerhtml
Why go for IE when xhr can do the trick? You can get the date with the blink of an eye if you opt for XMLHttpRequest.
Sub GetCurrentDate()
Dim S$
With New XMLHTTP
.Open "GET", "http://www.fechadehoy.com/venezuela", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
MsgBox .getElementById("fecha").innerText
End With
End Sub
Reference to add:
Microsoft XML, v6.0
Microsoft HTML Object Library
To get rid of that reference altogether:
Sub GetCurrentDate()
Dim S$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://www.fechadehoy.com/venezuela", False
.send
S = .responseText
End With
With CreateObject("htmlfile")
.body.innerHTML = S
MsgBox .getElementById("fecha").innerText
End With
End Sub
Although the answer given by #Siddharth Rout is perfectly fine, it would require quite a bit of string manipulation to get the date in a usable form.
For the above reason I'm providing an alternative solution which gets the date in a directly usable format, ready to be manipulated and used in further calculations if necessary.
As a bonus I am demonstrating how to get the date using an HTTP request instead of using the Internet Explorer. This makes the code more efficient.
Option Explicit
Sub getDate()
Dim req As New WinHttpRequest
Dim doc As New HTMLDocument
Dim el As HTMLParaElement
Dim key As String
Dim url As String
Dim retrievedDate As Date
url = "http://www.fechadehoy.com/venezuela"
key = "Fecha actual: "
''''''''''Bonus: Use an HTTP request to get the date instead of opening IE'''''''''''
With req '
.Open "GET", url, False '
.send '
doc.body.innerHTML = .responseText '
'Debug.Print .responseText '
End With '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
For Each el In doc.getElementsByTagName("p")
If el.innerText Like "Fecha actual*" Then
retrievedDate = Mid(el.innerText, InStr(el.innerText, key) + Len(key), Len(el.innerText))
End If
Next el
End Sub
You will need to add a reference to Microsoft HTML Object Libraryand Microsoft WinHTTP Services version 5.1. To do that, go to VB editor>Tools>References.
Having the date in this format, means it can easily be manipulated. An example would be the use of functions like day(retrievedDate) , month(retrievedDate), year(retrievedDate) etc.

how to get inner text of html under id?

I am trying to pull data pull inner text under id in excel cell.
This is for XML code.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET","https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
hdoc.body.innerHTML = XMLpage.responseText
ha = hdoc.getelementbyid("open").innerText
Range("K11").Value = ha
Debug.Print ha
End Sub
I expect output value, but it shows --.
Examine the response text. There is a difference in the way the page is rendered in the browser versus what is returned in the ResponseText.
I put the URL into a browser went into dev tools (F12), found the element, and noted the numeric value inside the HTML element.
Then I dumped the response text we're getting in VBA into a cell and copied the entire cell value into Notepad++. If you do that you'll see the initial value inside the #open element is indeed "--".
The real value appears to be getting written into the HTML via JavaScript, which is common practice. There is a JSON object at the top of the page, presumably injected into the document from the back-end of the website upon your request.
So you have to parse the JSON, not the HTML. I've provided code doing just that. Now, there may be a better way to do it, I feel this code is kind of "hacky" but it's getting the job done for your example URL.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
'// sample: ,"open":"681.05",
Dim token As String
token = """open"":"""
Dim startPosition As Integer
startPosition = InStr(1, XMLpage.responseText, token)
Dim endPosition As Integer
endPosition = InStr(startPosition, XMLpage.responseText, ",")
Dim prop As String
prop = Mid(XMLpage.responseText, startPosition, endPosition - startPosition)
prop = Replace(prop, """", vbNullString)
prop = Replace(prop, "open:", vbNullString)
Dim val As Double
val = CDbl(prop)
ha = val
Range("K11").Value = ha
Debug.Print ha
End Sub
Here are two methods. 1) Using regex on the return text. Usually frowned upon but perfectly serviceable here. 2) Traditional extract json string and use json parser to parse out value.
The data you want is stored in a json string found both on the webpage and the xmlhtttp response, under the same element:
This means you can treat the html as a string and target just the pattern for the open price using regex as shown below, or parse the xmlhttp request into an html parser, grab the required element, extract its innerText and trim off the whitespace, then pass to a json parser to extract the open price.
In both methods you want to avoid being served cached results so the following header is an important addition to attempt to mitigate for this:
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
There is no need for addtional cell formatting. Full value comes out for both your tickers.
Regex:
It is present in a json string in the response. You can regex it out easily from return text.
Regex explanation:
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, p As String, r As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
p = """open"":""(.*?)"""
Set re = CreateObject("VBScript.RegExp")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
r = GetValue(re, .responseText, p)
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub
Public Function GetValue(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
With re
.Global = True
.pattern = pattern
If .test(inputString) Then ' returns True if the regex pattern can be matched agaist the provided string
GetValue = .Execute(inputString)(0).submatches(0)
Else
GetValue = "Not found"
End If
End With
End Function
HTML and json parser:
This requires installing code for jsonparser from jsonconverter.bas in a standard module called JsonConverter and then going VBE>Tools>References>Add a reference to Microsoft Scripting Runtime and Microsoft HTML Object Library.
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, r As String, json As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=MRF", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
Dim html As HTMLDocument
Set html = New HTMLDocument
html.body.innerHTML = .responseText
Set json = JsonConverter.ParseJson(Trim$(html.querySelector("#responseDiv").innerText))
r = json("data")(1)("open")
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub

Excel VBA API call returning specific data only [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I would like to return data to Excel from a website (CoinMarketCap.com) similar to the example below:
VBA - API call displayed in Excel
...except I need to limit the amount of returned data to only specific currencies (not the entire Ticker).
The website requires an API Key (which I have) for these types of calls, but I don't understand how to format the url.
The website provides this info:
Using Your API Key
You may use any server side programming language that can make HTTP requests to target the Professional API. All requests should target domain https://pro-api.coinmarketcap.com.
You can supply your API Key in REST API calls in one of two ways:
Preferred method: Via a custom header named X-CMC_PRO_API_KEY
Convenience method: Via a query string parameter named CMC_PRO_API_KEY
But I can't make heads or tails from that.
Ultimately I would like to have the url return (for instance) 3 currencies, such as BTC, ADA, DASH along with the API Key (which for example purposes is "abc-xyz".
Once I have the structure of the url I can work from there (in VBA), but I'm at a total loss as to how to format the url so that it will return only that specific data.
Public API
You might find it easier to start with the public API; That has nice easy syntax.
You will need to use JSONConverter to parse the JSON response. After downloading and adding to your project you will need to go VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
The following then shows you how to implement a very bare bones class, clsHTTP, to hold the XMLHTTPRequest object and expose a method GetJSON for retrieving the JSON string.
I then give some examples of parsing the JSON response for the specified cryptocurrency and USD.
It is very basic, and you would want to develop this, but I know the documentation for some of these APIs can be difficult to get a handle on.
Note: this is client side. I think the documentation you were referencing is possibly for web application development based on the server side.
Class clsHTTP
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetJSON(ByVal URL As String) As String
With http
.Open "GET", URL, False
.send
GetJSON = StrConv(.responseBody, vbUnicode)
'Debug.Print GetJSON
End With
End Function
Standard module:
Option Explicit
'Public API: https://coinmarketcap.com/api/
Public Sub GetTickerInfo()
Const BASE_URL As String = "https://api.coinmarketcap.com/v2/ticker/1/?convert="
Dim tickers(), i As Long, http As clsHTTP, jsonString As String, json As Object
tickers = Array("BTC", "ADA", "DASH")
Set http = New clsHTTP
For i = LBound(tickers) To UBound(tickers)
jsonString = http.GetJSON(BASE_URL & tickers(i))
Set json = JsonConverter.ParseJson(jsonString)("data")("quotes")
Debug.Print tickers(i) & " : " & json(tickers(i))("price")
Debug.Print "USD" & " : " & json("USD")("price")
Set json = Nothing
Next
End Sub
Pro-API
①Specify multi-conversion in string:
The "starter plan" only allows me to specify one conversion option (so can't easily see how to bundle currencies). You might be able to pass the following, if you have an upgraded account, instead of the loop, in my code below.
jsonString = http.GetJSON("https://pro-api.coinmarketcap.com/v1/global-metrics/quotes/latest?convert=BTC,ADA,DASH")
② Get all latest and parse out required symbols:
WARNING: This is expensive in terms of your credits. You want to try and bundle a call if possible. starter account has 200 credits per day.
You can parse out of the JSON, using JSONConverter, what you need as follows:
Class clsHTPP:
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetJSON(ByVal URL As String) As String
With http
.Open "GET", URL, False
.setRequestHeader "X-CMC_PRO_API_KEY", "yourAPIkey"
.setRequestHeader "Accept-Encoding", "gzip"
.send
GetJSON = StrConv(.responseBody, vbUnicode)
'Debug.Print GetJSON
End With
End Function
Standard module 1:
Option Explicit
Public Sub GetTickerInfo()
Dim http As clsHTTP, jsonString As String, json As Object, crypto As Object
Set http = New clsHTTP
jsonString = http.GetJSON("https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest?start=1&limit=5000&convert=USD")
Set json = JsonConverter.ParseJson(jsonString)("data") 'collection
For Each crypto In json 'dictionaries within collection
On Error Resume Next
Select Case crypto("symbol")
Case "BTC", "ADA", "DASH"
EmptyDictionary crypto
End Select
On Error GoTo 0
Next
End Sub
Public Sub EmptyDictionary(ByVal dict As Object)
Dim key As Variant
For Each key In dict.keys
Debug.Print key & " : " & dict(key)
Next
Debug.Print vbNewLine
End Sub
Using the class with a different API:
The following API has the kind of syntax I think you are after:
https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=ADA,USD,DASH,BTC
You can thus use the class clsHTTP as follows, note that I have dropped the use of JSONConverter, using Split to get the info you want instead. You can easily still use JSONConverter if you wish.
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetJSON(ByVal URL As String) As String
With http
.Open "GET", URL, False
.send
GetJSON = StrConv(.responseBody, vbUnicode)
End With
End Function
And the standard module as follows:
Option Explicit
Public Sub GetTickerInfo()
Const BASE_URL As String = "https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms="
Dim tickers(), http As clsHTTP, jsonString As String, i As Long
tickers = Array("BTC", "ADA", "DASH", "USD")
Set http = New clsHTTP
jsonString = http.GetJSON(BASE_URL & Join$(tickers, ","))
For i = LBound(tickers) To UBound(tickers)
Debug.Print tickers(i) & ":" & Replace$(Split(Split(jsonString, Chr$(34) & tickers(i) & Chr$(34) & ":")(1), ",")(0), "}", vbNullString)
Next
End Sub
Output in immediate window (Ctrl+G):

Excel VBA source code for extracting data from a URL

I want to extract the title of every news item displayed on "http://pib.nic.in/newsite/erelease.aspx?relid=58313" website using Excel VBA. I have written a code using getelementsbyclassname("contentdiv"). But the debugger is showing a error pertaining to that the object doesn't support...I want to extract the information items of every relid..which is there in the URL as well...
Cold scrapes like this are generally handled more efficiently with a XMLHTTP pull. This requires the addition of a few libraries to the VBE's Tools ► References. The code below needs Microsoft XML, v6.0, Microsoft HTML Object library and Microsoft Internet Controls. Might not need the last one but you probably will if you expand the code beyond what is supplied.
Public Const csURL As String = "http://pib.nic.in/newsite/erelease.aspx?relid=×ID×"
Sub scrape_PIBNIC()
Dim htmlBDY As HTMLDocument, xmlHTTP As MSXML2.ServerXMLHTTP60
Dim i As Long, u As String, iDIV As Long
On Error GoTo CleanUp
Set xmlHTTP = New MSXML2.ServerXMLHTTP60
Set htmlBDY = New HTMLDocument
For i = 58313 To 58313
htmlBDY.body.innerHTML = vbNullString
With xmlHTTP
u = Replace(csURL, "×ID×", i)
'Debug.Print u
.Open "GET", u, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.send
If .Status <> 200 Then GoTo CleanUp
htmlBDY.body.innerHTML = .responseText
For iDIV = 0 To (htmlBDY.getElementsByClassName("contentdiv").Length - 1)
If CBool(htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span").Length) Then
Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0) = _
htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span")(0).innerText
End If
Next iDIV
End With
Next i
CleanUp:
Set htmlBDY = Nothing
Set xmlHTTP = Nothing
End Sub
That should be enough to get you started. The site you are targeting requires that charset=UTF-8 be added to the request. I had no success without it. I strongly suspect that this may have been the source of your object doesn't support error.

Resources