Print each item in a JSON object as a separate row in an Excel file using VBA

I'm trying to read JSON from a URL and write each item of the JSON object to a separate row in an Excel sheet.
So far, I can only fetch the entire JSON object into a single cell, A1.
I tried to iterate through the JSON object and print each item in a separate row,
but I get run-time error 424 (Object required).
I need your help with this.
Sample Data : {"data":{"id":3,"name":"true red","year":2002,"color":"#BF1932","pantone_value":"19-1664"},"support":{"url":"https://reqres.in/#support-heading","text":"To keep ReqRes free, contributions towards server costs are appreciated!"}}
Private Sub HTML_VBA_Extract_Data_From_Website_To_Excel()
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
'Change the URL before executing the code. URL to Extract data from.
sURL = "https://reqres.in/api/products/3"
'Extract data from website to Excel using VBA
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
sPageHTML = oXMLHTTP.responseText
Dim jsonObject As Object
Set jsonObject = JsonConverter.ParseJson(sPageHTML)
Dim i As Integer
Dim Item As Variant
i = 2
For Each Item In jsonObject.Keys
ThisWorkbook.Sheets(1).Cells(i, 1).Value = Item
i = i + 1
Next
MsgBox "XMLHTML Fetch Completed"
End Sub

The code below outputs:
data
support
--------
id 3
name true red
year 2002
color #BF1932
pantone_value 19-1664
to the Immediate window, as expected.
Private Sub JsonTester()
Dim oXMLHTTP As Object
Dim sURL As String
Dim jsonObject As Object
Dim Item As Variant, data As Object
sURL = "https://reqres.in/api/products/3"
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
Set jsonObject = JsonConverter.ParseJson(oXMLHTTP.responseText)
For Each Item In jsonObject.keys
Debug.Print Item
Next
Debug.Print "--------"
Set data = jsonObject("data")
For Each Item In data.keys
Debug.Print Item, data(Item)
Next
End Sub
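To write the pairs to the sheet instead of the Immediate window, the same loop can target cells. The following is a minimal sketch, not a definitive implementation. It assumes JsonConverter.bas (VBA-JSON) is imported and a reference to Microsoft Scripting Runtime is set. The procedure name WriteJsonDataToRows is made up for illustration, and the sample string is an abbreviated copy of the sample data from the question rather than a live response.

```vba
Option Explicit

' Sketch only: assumes JsonConverter.bas (VBA-JSON) is imported into the project
' and that Microsoft Scripting Runtime is referenced.
Public Sub WriteJsonDataToRows()
    ' Abbreviated copy of the sample data from the question (assumption: same shape as the live response)
    Const SAMPLE As String = "{""data"":{""id"":3,""name"":""true red"",""year"":2002}}"
    Dim json As Object, data As Object
    Dim key As Variant, r As Long

    Set json = JsonConverter.ParseJson(SAMPLE)
    Set data = json("data")              ' nested object holding the item fields

    r = 2                                ' first output row
    For Each key In data.Keys
        With ThisWorkbook.Worksheets(1)
            .Cells(r, 1).Value = key         ' field name in column A
            .Cells(r, 2).Value = data(key)   ' field value in column B
        End With
        r = r + 1
    Next key
End Sub
```

Replacing SAMPLE with oXMLHTTP.responseText from the request above should give one row per field of the "data" object.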

Related

Scrape economic data from investing website

I am working on code to get data from: https://www.investing.com/economic-calendar/core-durable-goods-orders-59
I have code that gets historical price data via an HTTP request, but I want to change it to work for the economic data (link above). Is there any way to get the same for the economic indicators?
Code below:
Option Explicit
Sub Export_Table()
'Html Objects---------------------------------------'
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Dim ieTable As MSHTML.HTMLTable
Dim Element As MSHTML.HTMLElementCollection
'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
Dim wb As Workbook
Dim Table As Worksheet
Dim i As Long
Set wb = ThisWorkbook
Set Table = wb.Worksheets("Sheet1")
'-------------------------------------------'
Dim xmlHttpRequest As New MSXML2.XMLHTTP60 '
'-------------------------------------------'
i = 2
'Web Request --------------------------------------------------------------------------'
With xmlHttpRequest
.Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
If .Status = 200 Then
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
htmlBody.innerHTML = xmlHttpRequest.responseText
Set ieTable = htmlDoc.getElementById("curr_table")
For Each Element In ieTable.getElementsByTagName("tr")
Table.Cells(i, 1) = Element.Children(0).innerText
Table.Cells(i, 2) = Element.Children(1).innerText
Table.Cells(i, 3) = Element.Children(2).innerText
Table.Cells(i, 4) = Element.Children(3).innerText
Table.Cells(i, 5) = Element.Children(4).innerText
Table.Cells(i, 6) = Element.Children(5).innerText
Table.Cells(i, 7) = Element.Children(6).innerText
i = i + 1
DoEvents: Next Element
End If
End With
Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing
End Sub
Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
I have used the Excel tool Power Query (also called Get & Transform Data) for this very thing. I don't like using VBA for this type of task.
To make it work:
In Excel, go to Data > Get Data > From Other Sources > From Web.
Enter the URL.
Wait for the webpage to load and then pick the table that you want.
This website took a while to load, but it did work for me.
Choose "Load", which goes directly to the sheet, or "Transform Data" to manipulate the data in Power Query. Power Query offers many options, such as splitting columns, filtering data, and adding calculated columns.
I would avoid the overhead of setting up a permanent connection and simply continue using XHR. With Data > From Web, you cannot grab more rows than are present on the initial landing page. With XHR, however, you can issue POST requests to get more data. The code below uses a loop to retrieve additional results beyond those immediately visible on the page.
When you press the Show more link there is a POST request for an additional 6 rows which uses the latest date from the current set of results as part of the POST body. The response returned is JSON. Rather than bringing in a JSON parser, given the standard nature of the JSON, and that I am already using regex to clean the date format in column 1 to put in the POST body, I use two simple regexes to extract the html for the next results table from the response, and to check whether there are more results.
The format of the JSON is:
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": "1"
}
Or
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": false
}
So, I do some cleaning of the extracted html in order to not confuse the html parser within MSHTML. Furthermore, I add in an id to identify the table I have constructed, so I can continue to use an id css selector (#) list within my UpdateDateResults function.
I initially oversize an array to store each retrieved table which I update ByRef. I loop requesting more results until either there are no more results, there is an error parsing the maximum date from the last retrieved table column 1, or until my specified earliest date for data retrieval falls within the date range of the latest returned table.
Finally, I write the results array out to the sheet in one go.
N.B. You can target the table by its id. It looks like the number at the end of the id could be the same as for the goods url, lending itself to generalizing the code below to work for other goods.
VBA:
Option Explicit
Public Sub GetInvestingInfo()
'tools > references > Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, xhr As Object
Const STARTDATE As Date = "2019-11-25" 'Adjust as required. DateAdd("yyyy", -2, Date) 2 years back. This means may have some earlier months in _
batch that spans the start date but won't issue an additional request after this
Set xhr = CreateObject("MSXML2.XMLHTTP")
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.investing.com/economic-calendar/core-durable-goods-orders-59", False
.setRequestHeader "User-Agent", "Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim firstTable As Boolean, r As Long, results() As Variant
ReDim results(1 To 100000, 1 To 5)
'process initial table and update results, get cleaned date needed for request for more results
firstTable = True
Dim latestDate As String
UpdateDateResults latestDate, results, firstTable, r, html
Dim re As Object, maxDate As String, hasMoreHistory As Boolean, s As String
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = False
End With
maxDate = cleanedDate(latestDate, re)
hasMoreHistory = True
Dim errorDate As Date
errorDate = DateAdd("d", 1, Date)
Do While maxDate >= STARTDATE And maxDate < errorDate 'break loop using pre-defined earliest date, error with date conversion, or when no more rows found
Application.Wait (Now + TimeSerial(0, 0, 1)) 'Pause
s = GetMoreRows(xhr, Format$(maxDate, "YYYY-MM-DD")) 'make a POST request for more data
re.Pattern = "hasMoreHistory"":(""?.*?""?)}" 'Check if there are more rows still available. "1" for yes, false for no
hasMoreHistory = (re.Execute(s)(0).submatches(0) <> False)
If Not hasMoreHistory Then Exit Do
re.Pattern = "historyRows"":""(.*)"","
html.body.innerHTML = "<table id=""me"">" & Replace$(re.Execute(s)(0).submatches(0), "\/", "/") & "</table>" 'fix html and feed into html variable
UpdateDateResults latestDate, results, firstTable, r, html
maxDate = cleanedDate(latestDate, re) 'convert value retrieved from last row in date column of table to an actual date
Loop
With ActiveSheet
.Cells.ClearContents
.Cells(1, 1).Resize(r, 5) = results 'Don't bother to resize results as all cells are cleared before the write out
End With
End Sub
Public Sub UpdateDateResults(ByRef latestDate As String, ByRef results() As Variant, ByRef firstTable As Boolean, ByRef r As Long, ByVal html As MSHTML.HTMLDocument)
Dim table As MSHTML.HTMLTable 'return latest date from function
Set table = html.querySelector("#eventHistoryTable59, #me")
latestDate = table.Rows(table.Rows.Length - 1).Children(0).innerText
Dim i As Long, n As Long, j As Long
n = IIf(firstTable, 0, 1)
For i = n To table.Rows.Length - 1
r = r + 1
For j = 0 To table.Rows(i).Children.Length - 2
results(r, j + 1) = table.Rows(i).Children(j).innerText
Next
Next
firstTable = False
End Sub
Public Function cleanedDate(ByVal dirtyString As String, ByVal re As Object) As Date
re.Pattern = "(^[A-Z][a-z]{2}).*(\d{2}),.(\d{4})(.*)"
On Error GoTo errhand:
If re.test(dirtyString) Then
cleanedDate = CDate(re.Replace(dirtyString, "$2" & Chr$(32) & "$1" & Chr$(32) & "$3"))
Exit Function
End If
errhand:
cleanedDate = DateAdd("d", 1, Date)
End Function
Public Function GetMoreRows(ByVal xhr As Object, ByVal dateStamp As String) As String
With xhr
.Open "POST", "https://www.investing.com/economic-calendar/more-history", False
.setRequestHeader "User-Agent", "Safari/537.36"
.setRequestHeader "x-requested-with", "XMLHttpRequest"
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.send "eventID=430865&event_attr_ID=59&event_timestamp=" & dateStamp & "+" & Application.WorksheetFunction.EncodeURL("12:30:00") & "&is_speech=0"
GetMoreRows = .responseText
End With
End Function
Regexes (without the double " escaping for VBA):
hasMoreHistory":("?.*?"?)}
historyRows":"(.*)",
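As a rough illustration of how these two patterns behave, the sketch below runs them against a hand-written string shaped like the JSON format described above. The procedure name DemoHistoryRegexes and the sample row content are made up for this example; a real response comes from the more-history POST request and will be much longer.

```vba
Option Explicit

' Sketch only: the sample string is hypothetical and merely mimics the response shape.
Public Sub DemoHistoryRegexes()
    Const SAMPLE As String = "{""historyRows"":""<tr><td>Nov 27, 2019<\/td><\/tr>"",""hasMoreHistory"":""1""}"
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' Capture whatever follows hasMoreHistory": up to the closing brace
    re.Pattern = "hasMoreHistory"":(""?.*?""?)}"
    Debug.Print re.Execute(SAMPLE)(0).SubMatches(0)

    ' Capture the escaped row HTML, then undo the \/ escaping before parsing
    re.Pattern = "historyRows"":""(.*)"","
    Debug.Print Replace$(re.Execute(SAMPLE)(0).SubMatches(0), "\/", "/")
End Sub
```

Note that the first capture keeps any surrounding quotes, which is why the main procedure only compares it against False rather than reading it as a number.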

Image url extraction using excel vba

I am working with Excel 2016. I need to extract the link of an image from a website using VBA in Excel.
For example, I have a website that shows a product at this link: https://www.hikvision.com/en/products/Turbo-HD-Products/Turbo-HD-Cameras/Value-Series/ds-2ce56d0t-vpir3f/
My image is inside a div, like this:
<div class="slide-image" style="background-image: url('/content/dam/hikvision/products/HIKVISION/Turbo_HD_Products/Turbo_HD_Cameras/Value_Series/D0T_Series/DS-2CE56D0T-VPIR3F/images/2CE56D0T-半球11-正视图.png.thumb.1280.1280.png');"></div>
I tried this :
Private Sub btnExtractURL_Click()
Dim sourceString As String
Dim rowIdx As Integer, rowMax As Integer
Dim posFirst As Integer, posLast As Integer, chrLength As Integer
rowMax = Range("A3").End(xlDown).Row
' ---
For rowIdx = 3 To rowMax
If Cells(rowIdx, 1).Value <> "" Then
Cells(rowIdx, 2).Value = ""
sourceString = Cells(rowIdx, 1).Value
posFirst = InStr(sourceString, "http")
posLast = InStr(posFirst, sourceString, """")
chrLength = (posLast - 1) - (posFirst - 1)
Cells(rowIdx, 2).Value = Mid(sourceString, posFirst, chrLength)
End If
Next
' ---
MsgBox "finished"
End Sub
But I get an error with this solution. I tried another method that extracts text, and it works, but when I use the class of that image, it doesn't work!
Sub Get_Web_Data()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
' Website to go to.
website = "https://www.hikvision.com/en/products/Turbo-HD-Products/Turbo-HD-Cameras/Value-Series/ds-2ce56d0t-vpir3f/"
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there - probably don't need to change this.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object to make data references easier.
html.body.innerHTML = response
' Get the price from the specified element on the page.
Name = html.getElementsByClassName("prod_name").Item(0).innerText
' Output the price into a message box.
MsgBox Name
End Sub
Can you give me an idea of how to extract this image and copy the link into my Excel sheet?
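One possible direction, offered as a sketch rather than a tested answer: once the style attribute text of the div is available as a string, the path inside url('...') can be pulled out with a regex. The style value below is a shortened, hypothetical stand-in for the one shown in the question, the procedure name is made up, and whether the div is present in the static response for this page is not verified here.

```vba
Option Explicit

' Sketch only: STYLE_TEXT is a hypothetical, shortened style value.
Public Sub DemoExtractBackgroundImageUrl()
    Const STYLE_TEXT As String = "background-image: url('/content/dam/example/images/camera.png');"
    Const HOST As String = "https://www.hikvision.com"   ' prefix for the relative path
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    ' url( followed by an optional quote, the path, an optional quote, and )
    re.Pattern = "url\(['""]?([^'"")]+)['""]?\)"

    Set matches = re.Execute(STYLE_TEXT)
    If matches.Count > 0 Then
        Debug.Print HOST & matches(0).SubMatches(0)
    Else
        Debug.Print "No url(...) found"
    End If
End Sub
```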

how to get inner text of html under id?

I am trying to pull the inner text of an element, selected by its id, into an Excel cell.
This is my XMLHTTP code.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET","https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
hdoc.body.innerHTML = XMLpage.responseText
ha = hdoc.getelementbyid("open").innerText
Range("K11").Value = ha
Debug.Print ha
End Sub
I expect a numeric value as output, but it shows --.
Examine the response text. There is a difference in the way the page is rendered in the browser versus what is returned in the ResponseText.
I put the URL into a browser went into dev tools (F12), found the element, and noted the numeric value inside the HTML element.
Then I dumped the response text we're getting in VBA into a cell and copied the entire cell value into Notepad++. If you do that you'll see the initial value inside the #open element is indeed "--".
The real value appears to be getting written into the HTML via JavaScript, which is common practice. There is a JSON object at the top of the page, presumably injected into the document from the back-end of the website upon your request.
So you have to parse the JSON, not the HTML. I've provided code doing just that. There may be a better way to do it, and I feel this code is kind of "hacky", but it gets the job done for your example URL.
Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
'// sample: ,"open":"681.05",
Dim token As String
token = """open"":"""
Dim startPosition As Long 'Long, because positions in a large response can exceed the Integer limit of 32767
startPosition = InStr(1, XMLpage.responseText, token)
Dim endPosition As Long
endPosition = InStr(startPosition, XMLpage.responseText, ",")
Dim prop As String
prop = Mid(XMLpage.responseText, startPosition, endPosition - startPosition)
prop = Replace(prop, """", vbNullString)
prop = Replace(prop, "open:", vbNullString)
Dim val As Double
val = CDbl(prop)
ha = val
Range("K11").Value = ha
Debug.Print ha
End Sub
Here are two methods: 1) using a regex on the returned text, which is usually frowned upon but perfectly serviceable here; 2) the traditional approach of extracting the JSON string and using a JSON parser to parse out the value.
The data you want is stored in a JSON string found both on the webpage and in the XMLHTTP response, under the same element (the one with id responseDiv in the parser code below).
This means you can treat the HTML as a string and target just the pattern for the open price using a regex as shown below, or load the XMLHTTP response into an HTML parser, grab the required element, extract its innerText, trim off the whitespace, and pass the result to a JSON parser to extract the open price.
In both methods you want to avoid being served cached results, so the following header is an important addition to try to mitigate this:
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
There is no need for additional cell formatting. The full value comes out for both your tickers.
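Regex:
The open price is present in a JSON string in the response, so you can easily regex it out of the returned text.
Regex explanation:
The pattern "open":"(.*?)" matches the literal text "open":" and then lazily captures everything up to the next double quote; the first submatch is the open price.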
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, p As String, r As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
p = """open"":""(.*?)"""
Set re = CreateObject("VBScript.RegExp")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
r = GetValue(re, .responseText, p)
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub
Public Function GetValue(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
With re
.Global = True
.pattern = pattern
If .test(inputString) Then ' returns True if the regex pattern can be matched against the provided string
GetValue = .Execute(inputString)(0).submatches(0)
Else
GetValue = "Not found"
End If
End With
End Function
HTML and json parser:
This requires adding the code from JsonConverter.bas to a standard module called JsonConverter, then going to VBE > Tools > References and adding references to Microsoft Scripting Runtime and Microsoft HTML Object Library.
VBA:
Option Explicit
Public Sub GetClosePrice()
Dim ws As Worksheet, re As Object, r As String, json As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=MRF", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
If .Status = 200 Then
Dim html As HTMLDocument
Set html = New HTMLDocument
html.body.innerHTML = .responseText
Set json = JsonConverter.ParseJson(Trim$(html.querySelector("#responseDiv").innerText))
r = json("data")(1)("open")
Else
r = "Failed connection"
End If
End With
ws.Range("K11").Value = r
End Sub

Web Scraping ETFs Daily Data VBA

I'm trying to web scrape some daily info for different ETFs. I found that https://www.marketwatch.com/ has accurate info.
The most relevant info is the open price, outstanding shares, NAV, and total assets of the ETF.
Here is the link for IVV US Equity: https://www.marketwatch.com/investing/fund/ivv
I have web scraped with VBA before, but the HTML of the pages I used was different. I don't know if this is because some values of the ETFs (such as price and traded volume) change constantly.
The idea is to write code that extracts the relevant info and builds a database to analyze macroeconomic factors, using the ETFs as market indicators of flows between countries, regions, etc.
My first approach would be with VBA, but once I get further into the data I would like to try Python (after I get more confident with it) to automate the web scraping process on a daily basis.
I am open to any suggestion or any other website that could be useful (I have tried Yahoo Finance and Morningstar and I get the same problem with the HTML code).
This is my poor code:
Sub Get_Data()
Dim ticker As String, enlace As String
ticker = ThisWorkbook.Worksheets("ETFs").Cells(2, 2).Value 'IVV
'link = "https://www.morningstar.com/etfs/arcx/" & ticker & "/quote.html"
'link = "https://finance.yahoo.com/quote/" & ticker & "?p=" & ticker
link = "https://www.marketwatch.com/investing/fund/" & ticker
Application.ScreenUpdating = False
Dim x As Integer
x = ThisWorkbook.Worksheets("ETFs").Cells(Rows.Count, 1).End(xlUp).Row
'Dim i As Integer
'For i = 2 To x
Dim total_net_assets As Variant, open_price As Variant, NAV As Variant, shares_out
Set ie = CreateObject("InternetExplorer.application")
With ie
.Visible = False
.navigate link
While .Busy Or .readyState < 4: DoEvents: Wend
Do
DoEvents
On Error Resume Next
' Here is where I get the problem of not knowing how to reference the values I need because the class name appears repeatedly
total_net_assets = .document.getElementsByClassName("").Value
open_price = .document.getElementByClassName("price").Value
NAV = .document.getElementByClassName("").Value
shares_out = .document.getElementByClassName("kv__value kv__primary ").Value
On Error GoTo 0
Loop
End With
ThisWorkbook.Worksheets("ETFs").Cells(2, 13).Value = total_net_assets
ThisWorkbook.Worksheets("ETFs").Cells(2, 14).Value = NAV
ThisWorkbook.Worksheets("ETFs").Cells(2, 15).Value = open_price
ThisWorkbook.Worksheets("ETFs").Cells(2, 16).Value = shares_out
ie.Quit
'Next i
Application.ScreenUpdating = True
End Sub
Access method:
I use XMLHTTP requests, as they are much faster than opening IE.
Code notes:
The following reads in fund short codes from Sheet1 column A, starting in A2, into an array. You can easily extend this adding more funds into column A.
This array is looped issuing XMLHTTP requests by concatenating the fund code into the BASE_URL variable.
I use a class, clsHTTP, to hold the XMLHTTP object to be efficient - no need to keep creating and destroying the object.
I provide this class with two methods. One to retrieve the target page innerHTML (GetString), and the other to extract the required info if available (GetInfo). I use a dictionary to test if the searched for labels are present. If present I grab the associated value. If not, I have a placeholder vbNullString in the dictionary.
I add each scraped result into a collection called results. At the end I loop this writing out to the sheet. By keeping most of the work in memory this provides for much faster scraping.
Retrieving info from HTML:
The labels e.g. Open, and values come in pairs.
You can generate a nodeList (think collection as with getElementsByClassName) by using querySelectorAll method to apply a class CSS selector to gather the label elements by their class name kv__label. The "." is the class selector.
Set labels = .querySelectorAll(".kv__label") '<== nodeList of labels
You do the same to get the associated values:
Set values = .querySelectorAll(".kv__value.kv__primary") '<== nodeList of associated values. Same length as labels nodeList so can use same index to retrieve associated label/value pairs from each nodeList.
The GetInfo method of clsHTTP loops over the labels and uses the dictionary to check whether each searched-for label is present. If it is, the associated value is retrieved from values using the same index at which the label was found in the labels nodeList, and the dictionary's vbNullString placeholder for that label is replaced with the retrieved value; otherwise it is left as vbNullString.
VBA:
Class module clsHTTP:
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetString(ByVal url As String) As String
Dim sResponse As String
With http
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
GetString = sResponse
End With
End Function
Public Function GetInfo(ByVal html As HTMLDocument) As Object
Dim dict As Object, i As Long
Set dict = CreateObject("Scripting.Dictionary")
dict.Add "Open", vbNullString
dict.Add "Shares Outstanding", vbNullString
dict.Add "Total Net Assets", vbNullString
dict.Add "NAV", vbNullString
Dim values As Object, labels As Object
With html
Set values = .querySelectorAll(".kv__value.kv__primary")
Set labels = .querySelectorAll(".kv__label")
For i = 0 To labels.Length - 1
If dict.Exists(labels.item(i).innerText) Then dict(labels.item(i).innerText) = values.item(i).innerText
Next
End With
Set GetInfo = dict
End Function
Standard module 1:
Option Explicit
Public Sub GetFundInfo()
Dim sResponse As String, html As HTMLDocument, http As clsHTTP, i As Long
Dim headers(), funds(), url As String, results As Collection, ws As Worksheet
Const BASE_URL As String = "https://www.marketwatch.com/investing/fund/"
Application.ScreenUpdating = False
headers = Array("Open", "Shares Outstanding", "Total Net Assets", "NAV")
Set results = New Collection
Set http = New clsHTTP
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
funds = Application.Transpose(ws.Range("A2:A3").Value) '<== Change the range here to the single column range containing your fund codes.
For i = LBound(funds) To UBound(funds)
If Not IsEmpty(funds(i)) Then
url = BASE_URL & funds(i)
html.body.innerHTML = http.GetString(url)
results.Add http.GetInfo(html).Items
End If
Next
If results.Count > 0 Then
Dim item As Variant, r As Long, c As Long
r = 2: c = 2
With ws
.Cells(1, c).Resize(1, UBound(headers) + 1) = headers
For Each item In results
.Cells(r, c).Resize(1, UBound(item) + 1) = item
r = r + 1
Next
End With
End If
Application.ScreenUpdating = True
End Sub
Without using a class:
Option Explicit
Public Sub GetFundInfo()
Dim sResponse As String, html As HTMLDocument, i As Long
Dim headers(), funds(), url As String, results As Collection, ws As Worksheet
Const BASE_URL As String = "https://www.marketwatch.com/investing/fund/"
Application.ScreenUpdating = False
headers = Array("Open", "Shares Outstanding", "Total Net Assets", "NAV")
Set results = New Collection
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
funds = Application.Transpose(ws.Range("A2:A3").Value) '<== Change the range here to the single column range containing your fund codes.
For i = LBound(funds) To UBound(funds)
If Not IsEmpty(funds(i)) Then
url = BASE_URL & funds(i)
html.body.innerHTML = GetString(url)
results.Add GetInfo(html).Items
End If
Next
If results.Count > 0 Then
Dim item As Variant, r As Long, c As Long
r = 2: c = 2
With ws
.Cells(1, c).Resize(1, UBound(headers) + 1) = headers
For Each item In results
.Cells(r, c).Resize(1, UBound(item) + 1) = item
r = r + 1
Next
End With
End If
Application.ScreenUpdating = True
End Sub
Public Function GetString(ByVal url As String) As String
Dim http As Object
Set http = CreateObject("MSXML2.XMLHTTP")
Dim sResponse As String
With http
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
GetString = sResponse
End With
End Function
Public Function GetInfo(ByVal html As HTMLDocument) As Object
Dim dict As Object, i As Long
Set dict = CreateObject("Scripting.Dictionary")
dict.Add "Open", vbNullString
dict.Add "Shares Outstanding", vbNullString
dict.Add "Total Net Assets", vbNullString
dict.Add "NAV", vbNullString
Dim values As Object, labels As Object
With html
Set values = .querySelectorAll(".kv__value.kv__primary")
Set labels = .querySelectorAll(".kv__label")
For i = 0 To labels.Length - 1
If dict.Exists(labels.item(i).innerText) Then dict(labels.item(i).innerText) = values.item(i).innerText
Next
End With
Set GetInfo = dict
End Function
Okay, so you will need to create two loops. You can just keep reusing the elem0, elem1, and elemColl(1) variables for each price point you need - just make sure to reset bFoundIt to False for each new iteration so you do not exit the For Loops early.
For your total_net_assets variable, first loop over the elements with class kv__item. Then loop over each kv__label element within the current kv__item element and stop when its innerText matches Total Net Assets. Once you have a match, use the outer element elem0 to get the value from its kv__value kv__primary child.
Dim IE As Object, elem0 As Object, elem1 As Object, i As Long, bFoundIt As Boolean
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = False
.navigate link
While .Busy Or .readyState < 4: DoEvents: Wend
DoEvents
bFoundIt = False
For Each elem0 In .document.getElementsByClassName("kv__item")
For Each elem1 In elem0.getElementsByClassName("kv__label")
If elem1.innerText = "Total Net Assets" Then
bFoundIt = True
total_net_assets = elem0.getElementsByClassName("kv__value kv__primary ")(0).innerText
Exit For
End If
Next elem1
If bFoundIt Then Exit For
Next elem0
End With

Get data from website [duplicate]

<span itemprop="streetAddress">
**94 Grand St**
</span>
How can I get this data using a getElement... method in Excel VBA?
I have tried getElementById, getElementsByName, etc., but nothing is working.
Option Explicit
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/biz/if-boutique-new-york#query:boutique"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("span") ' ## returns the list
MsgBox (Listings(0))
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
Next
Set html = Nothing
Set ie = Nothing
End Sub
I use the program above to get the innerText value inside the span tag, but it's not working.
For the single result you are looking for, use these two lines in your code (there is only one listing at the detail level).
Adapt your IE code:
Set Listings = html.getElementbyid("bizInfoBody") ' ## returns the list
Range("A1").Offset(r, 0).Value = Listings.innerText
with XMLHTTP
Sub GetTxt()
Dim objXmlHTTP As Object
Dim objHtmlDoc As Object
Dim objHtmlBody As Object
Dim objTbl As Object
Dim strResponse As String
Dim strSite As String
Set objHtmlDoc = CreateObject("htmlfile")
Set objHtmlBody = objHtmlDoc.body
Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
strSite = "http://www.yelp.com/biz/if-boutique-new-york"
With objXmlHTTP
.Open "GET", strSite, False
.Send
If .Status = 200 Then
strResponse = .responseText
objHtmlBody.innerHTML = objXmlHTTP.responseText
Set objTbl = objHtmlBody.Document.getElementbyid("bizInfoBody")
MsgBox objTbl.innerText
End If
End With
End Sub
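The span in the question has no id, so another option is to loop over the span elements and check the itemprop attribute. The sketch below illustrates this on a hand-written HTML string. The sample markup and the procedure name DemoSpanByItemprop are assumptions for illustration, and the live page may be structured differently.

```vba
Option Explicit

' Sketch only: SAMPLE_HTML is a hypothetical stand-in for the page markup; no request is made.
Public Sub DemoSpanByItemprop()
    Const SAMPLE_HTML As String = "<div><span itemprop=""streetAddress"">94 Grand St</span></div>"
    Dim doc As Object, spans As Object, el As Object, attr As Variant

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = SAMPLE_HTML

    Set spans = doc.getElementsByTagName("span")
    For Each el In spans
        attr = el.getAttribute("itemprop")   ' Null when the attribute is absent
        If Not IsNull(attr) Then
            If attr = "streetAddress" Then
                Debug.Print Trim$(el.innerText)
                Exit For
            End If
        End If
    Next el
End Sub
```

In the XMLHTTP version above, the same loop could be run on objHtmlDoc after the response text has been assigned to the body.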

Resources