Web scraping with excel VBA returns no values for certain sites - excel

I'm trying to scrape data from Oslo Børs related to 3months NIBOR using excel VBA. However my code do not return any values back to excel. The same code is able to retrieve table data from other similar websites, but could it be related to the ui-view set-up of the html language? I'm not very skilled on html language, so I do come to short here. Any help would be much appreciated. Thanks
Sub NIBOR3M_oslobors()
Dim ie As InternetExplorer
Dim InternetExplorer As Object
Dim strURL As String
Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim nextrow As Long
Dim I As Long
strURL = "https://www.oslobors.no/markedsaktivitet/#/details/NIBOR3M.NIBOR/overview"
Set ie = CreateObject("InternetExplorer.Application")
With ie
.navigate strURL
Do Until .READYSTATE = 4: DoEvents: Loop
Do While .Busy: DoEvents: Loop
Set doc = ie.Document
Set ws = Worksheets("Interest rates")
For Each tbl In doc.getElementsByTagName("table")
nextrow = nextrow + 1
Set rng = ws.Range("M8:M8" & nextrow)
For Each rw In tbl.Rows
For Each cl In rw.Cells
rng.Value = rw.innerText
Set rng = rng.Offset(, 1)
I = I + 1
Next cl
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next rw
Next tbl
.Quit
End With
ActiveSheet.Range("M8").Select
End Sub

The tables you are trying to scrape are generated by scripts, so you will not find them in the HTML document. To get your hands on the data you will need to send an XHR request.
To find out how this request should look like, you have to inspect the network traffic when the page is loaded. You can do that through your browser's developer tools (Ctrl+Shift+E if you're using Firefox).
There you will see several requests of different types being sent. In your case, you should look for XHR requests of type json. There's a few of them. If you go through the responses of these requests, you will eventually find the one you need, as it will contain the data you want.
Here's how it looks like:
If you go through the Headers and the Params of the request you will see how the url, the body and the headers should look like. In this particular case, all the parameters are encoded into the url and the headers are not essential to the success of the request, so all you need is the url.
The response's payload is in json format. You can inspect its structure using a tool like this. Here's how it looks like:
To parse a response like that you need a parser (look at the end of this post).
TL;DR :
Here's how to get the data from the "Nibor 3 month" table:
Option Explicit
Sub oslobors()
Dim req As New WinHttpRequest
Dim respJSON As Object
Dim key As Variant
Dim url As String
Dim results()
Dim i As Integer
Dim sht As Worksheet
Set sht = ThisWorkbook.Worksheets("Name of your worksheet")
url = "https://www.oslobors.no/ob/servlets/components?type=nibor&source=feed.index.nibor.INDICES_FIXING&view=DELAYED&columns=ITEM%2C+CLOSENZ%2C+TRADE_TIME%2C+CLOSE_DATE%2C+CHANGENZ%2C+CHANGENZ_PCT&filter=ITEM_SECTOR%3D%3DsNIBOR3M.NIBOR&channel=e6da3e88c3464b7ad9620824b8d39c95"
With req
.Open "GET", url, False
.send
Set respJSON = JsonConverter.ParseJson(.responseText)
End With
ReDim results(1 To respJSON("rows")(1)("values").Count, 1 To 2)
i = 1
For Each key In respJSON("rows")(1)("values").Keys
results(i, 1) = key
results(i, 2) = respJSON("rows")(1)("values")(key) 'write the results in an array
i = i + 1
Next key
sht.Cells(1, "A").Resize(UBound(results, 1), UBound(results, 2)) = results 'Print the results array in one go. The results will be printed to an area/range starting with cell A1 and expanding as much as it is needed
End Sub
For demonstration purposes, the data is printed in your immediate window:
You will need to add the following references to your project (VBE>Tools>References):
Microsoft WinHTTP Services version 5.1
Microsoft HTML Objects Library
Microsoft Scripting Runtime
You will also need to add this JSON parser to your project. Follow the installation instructions in the link and you should be set to go.
To get the data from the "Historical Prices" table you just have to change the url to this:
url = "https://www.oslobors.no/ob/servlets/components?type=table&source=feed.index.nibor.INDICES_FIXING&view=DELAYED&columns=CHANGE_1WEEK_PCT%2C+HIGH_1WEEK%2C+LOW_1WEEK%2C+CHANGE_1MONTH_PCT%2C+HIGH_1MONTH%2C+LOW_1MONTH%2C+CHANGE_YEAR_PCT%2C+HIGH_YEAR%2C+LOW_YEAR%2C+CHANGE_1YEAR_PCT%2C+HIGH_1YEAR%2C+LOW_1YEAR&filter=ITEM_SECTOR%3D%3DsNIBOR3M.NIBOR&channel=283044a7d182ca196a16337ba79f089c"
EDIT
Modified the code to print the results in a worksheet named "Name of your worksheet".

Related

How to scrape location names from a website using VBA

I am trying to scrape the pickup branch locations from a car rental website home page. The idea is to see exactly where pickup branches exist for a given company.
I have successfully created this before but this company revamped their website recently and now my code doesn't work. The branch locations seem to be hidden within a form of some sort, the locations only become visible in the html once you click on the pickup location space.
My current code looks as below:
Option Explicit
Private Sub pickuplocations()
Dim html As Object
Dim ws As Worksheet
Dim headers()
Dim i As Long
Dim r As Long
Dim c As Long
Dim numrows As Long
Set ws = ThisWorkbook.Worksheets("Europcar Branches(2)")
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.europcar.co.za", False
.send
html.body.innerHTML = .responseText 'fetches all html from the website
Dim pickupbranches As Object
Dim pickupbranchresults()
Set pickupbranches = html.getElementById("_location-search-widget_15").getElementsByTagName("span")
headers = Array("Pickup Location", "Option value") 'for the ws
numrows = pickupbranches.Length - 1 'sets the row length
ReDim pickupbranchresults(1 To numrows, 1 To 2) 'sets array size for the results
For i = 1 To numrows
pickupbranchresults(i, 1) = pickupbranches.Item(i).innerText
pickupbranchresults(i, 2) = pickupbranches.Item(i).Value
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers 'sets the column headers
.Cells(2, 1).Resize(UBound(pickupbranchresults, 1), UBound(pickupbranchresults, 2)) = pickupbranchresults
End With
End With
End Sub
Your current code requests the source HTML code and attempts to scrape it.
However, as explained in the comments, the list of locations is dynamically loaded when you click on the search bar and it is not a part of the page's source HTML. For this reason your code will yield no results.
It makes more sense to scrape the page dedicated to the locations:
https://www.europcar.co.za/rental-locations/
Now, if you navigate to this page and inspect the network traffic in your browser's developer tools (F12) when the page is loaded, you will see that an XHR request is being sent. It looks like so:
If you go through the Headers and the Params of the request you will see how the url, the body and the headers should look like. In this particular case, there are no parameters and the headers are not essential to the success of the request, so all you need is the url.
The response's payload is in json format. You can inspect its structure using a tool like this. Here's how it looks like:
Basically, the JSON consists of the different countries, each country consists of provinces and each province consists of the corresponding branches. Each branch consists of all the corresponding info.
To parse a response like that you need a JSON parser (look at the end of this post).
TL;DR
Here's how the code should look like:
Option Explicit
Sub getLocations()
Dim req As New WinHttpRequest
Dim url As String, results() As String
Dim sht As Worksheet
Dim responseJSON As Object, country As Object, province As Object, branch As Object
Dim i As Long
Dim rng As Range
Set sht = ThisWorkbook.Worksheets("Name of your Worksheet")
url = "https://www.europcar.co.za/api/rentalLocations/impressLocations"
With req
.Open "GET", url, False
.send
Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
For Each country In responseJSON
For Each province In country("provinces")
i = 0
ReDim results(1 To province("branches").Count, 1 To 5)
For Each branch In province("branches")
i = i + 1
results(i, 1) = country("name")
results(i, 2) = province("name")
results(i, 3) = branch("name")
results(i, 4) = branch("emailAddress")
results(i, 5) = branch("contactNumber")
Next branch
With sht
Set rng = .Range("A" & .Rows.Count).End(xlUp).Offset(1, 0)
End With
rng.Resize(UBound(results, 1), UBound(results, 2)) = results
Next province
Next country
End Sub
For demonstration purposes the code above prints out the results in the following way:
Having in mind the JSON structure and the sample code I provided, you can easily modify it to fit your needs.
For the code to work you will need to add the following references to your project (VBE>Tools>References):
1. Microsoft WinHTTP Services version 5.1
2. Microsoft Scripting Runtime
You will also need to add this JSON parser to your project. Follow the installation instructions in the link and you should be set to go.

I have questions about how to click and search on web using vba

I have questions about how to click and search on web using vba.
I have wrote the code, but cannot find how to click the button in this web
Sub LEISearch()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim LEI As HTMLLinkElement 'special object variable for an <a> (link) element
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link'
Dim result2 As String
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://www.gmeiutility.org/search.jsp?keyWord"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("searchInput").Value = _
Sheets("Macro1").Range("A1").Value
'click the 'go' button
Set LEIButton = objIE.document.getElementsByClassName("hiddenSubmitButton")
LEIButton.Focus
LEIButton.Click
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
End Sub
This case is a really nice and clean example of web scraping so i will take this opportunity to present an educational post.
I highly recommend avoiding using IE to scrape websites whenever it's possible. It's highly inefficient. Especially in a case like this where there can be multiple pages of results. Instead, you can use HTTP requests.
An HTTP request is a structured way to request something from a server. In this case we want to send a keyword to the server and get the corresponding search results.
To find out how this request should look like, you have to inspect the network traffic when the button with the magnifying glass is clicked. You can do that through your browser's developer tools (Ctrl+Shift+E if you're using Firefox):
If you go through the Headers and the Params of the request you will see how the url, the body and the headers should look like. In this particular case, all the parameters are encoded into the url and the headers are not essential to the success of the request, so all you need is the url.
Some of the parameters of the url are the keyword, the number of results per page and the number of page.
The response's payload is in json format. You can inspect its structure using a tool like this. Here's how it looks like:
Basically the JSON response consists of as many results as you have specified that should be displayed per page (or less). To get the next page you need to send a new request with the same keyword but specifying a new page number and so on.
In fact, as you can see the website offers a lot more data than what's displayed on your browser, which could prove to be useful.
The code below searches for the keyword test, while requesting 25 results per page. One first request is sent to find out how many pages of results are there and then the code loops through all pages and prints the results in a worksheet.
TL;DR
Option Explicit
Sub main()
Dim sht As Worksheet
Dim totalNumberOfPages As Long
Dim searchResults As Object
Dim pageNumber As Long
Dim results() As String
Dim entity As Object
Dim i As Long, j As Long
Dim rng As Range
Set sht = ThisWorkbook.Worksheets("Name of your Worksheet")
''''''First request to find out the number of pages''''''
Set searchResults = XHRrequest("test", 25, 1) '
totalNumberOfPages = searchResults("totalPages") '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''Loop through all the pages''''''''''''''''''''''''''''''''
For pageNumber = 1 To totalNumberOfPages Step 1 '
Set searchResults = XHRrequest("test", 25, pageNumber) '
ReDim results(1 To searchResults("entitySearchResult").Count, 1 To 7) '
i = 0 '
'''''''''''write the results in an array'''''''''''''''''''''''''''' '
For Each entity In searchResults("entitySearchResult") ' '
i = i + 1 ' '
results(i, 1) = entity("LEINumber") ' '
results(i, 2) = entity("legalName") ' '
results(i, 3) = entity("city") ' '
results(i, 4) = entity("headquartersCountry") ' '
results(i, 5) = entity("recordStatus") ' '
results(i, 6) = entity("renewalStatus") ' '
results(i, 7) = entity("entityStatus") ' '
Next entity ' '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' '
'''''''''''''''write all the results in the worksheet in one go''''' '
With sht ' '
Set rng = .Range("A" & .Rows.Count).End(xlUp).Offset(1, 0) ' '
End With ' '
rng.Resize(UBound(results, 1), UBound(results, 2)) = results ' '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' '
Next pageNumber '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
End Sub
Public Function XHRrequest(ByVal key As String, ByVal resultsPerPage As Long, ByVal pageNumber As Long) As Object
Dim req As New WinHttpRequest
Dim url As String
url = "https://www.gmeiutility.org/actions/Search/?isPendingValidationChecked=true&isSearchAllLOUChecked=true&keyWord=" & key & "&page=" & pageNumber & "&resultsPerPage=" & resultsPerPage & "&searchType=baseSearch" 'build the URL according to the parameters
'''''''''Send the HTTP request'''''''''''''''''''''''''''''''
With req '
.Open "POST", url, False '
.send '
Set XHRrequest = JsonConverter.ParseJson(.responseText) '
End With '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
End Function
For demonstration purposes the code above prints all the data in worksheet named Name of your Worksheet.
If you need to perform multiple searches you can easily modify the code to best fit your needs. More specifically you can loop through multiple keywords and call the XHRrequest function using those keywords instead of "test".
Here's a sample of the output:
You will need to add the following references to your project (VBE>Tools>References):
Microsoft WinHTTP Services version 5.1
Microsoft HTML Objects Library
Microsoft Scripting Runtime
You will also need to add this JSON parser to your project. Follow the installation instructions in the link and you should be set to go.

Unable to "click" Span element in VBA/IE

I'm trying to automate a web scrape of the following URL: https://www.forebet.com/en/football-predictions-from-yesterday
I have code that pulls all of the data, however there is a More+ "button" which expands the list, and I can't seem to get a handle on how to send the click via VBA, as it's not a button and so I can't send use .Click
The HTML snippet for the section:
<tr id="mrows" style="height:57px;">
<td colspan="12"><span onclick="ltodrows("1x2","-1")">More [+]</span>
<div class="loader"></div>
</td>
</tr>
I've tried numerous ways of submitting the Click - but one have worked!
Start of my code:
Sub Button_More_Test()
Dim objIE As InternetExplorer
Dim itemEle As Object
'Dim e As Object
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.Navigate "https://www.forebet.com/en/football-predictions-from-yesterday"
Do While objIE.Busy = True Or objIE.ReadyState <> 4: DoEvents: Loop
I see this has been answered but I also see that the total number of games is 586.
Now, don't get me wrong, it's perfectly fine to scrape the data using IE and the HTML document but I can only imagine what a nightmare it must be in terms of efficiency...
That's why I will post a method to circumvent IE and the More + button and all that.
More specifically the webpage provides a very convenient way to get the data in JSON format via an HTTP request.
To find out how this request should look like, you have to inspect the network traffic when the More+ button is clicked. You can do that through your browser's developer tools (Ctrl+Shift+E if you're using Firefox):
If you go through the Headers and the Params of the request you will see how the url, the body and the headers should look like. In this particular case, all the parameters are encoded into the url and the headers are not essential to the success of the request, so all you need is the url.
The response's payload is in json format. You can inspect its structure using a tool like this. Here's how it looks like:
Basically the JSON consists of the 586 games, each one of which consists of 41 parameters with their corresponding values.
TL;DR
Here's how the code should look like:
Option Explicit
Sub forebet()
'''''''Declarations''''''''''''''''''''''''''''''''''''''''''
Dim sht As Worksheet '
Dim req As New WinHttpRequest '
Dim jsonResponse As Object, game As Object '
Dim key As Variant '
Dim url As String '
Dim headers() As String, results() As String '
Dim i As Long, j As Long, lastRow As Long, lastCol As Long '
Dim rng As Range '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Set sht = ThisWorkbook.Worksheets("Name of your Worksheet") 'store your worksheet in a variable
url = "https://www.forebet.com/scripts/getrs.php?ln=en&tp=1x2&in=-1"
'''''''''''HTTP request''''''''''''''''''''''''''''''''''''''''''
With req '
.Open "GET", url, False '
.send '
Set jsonResponse = JsonConverter.ParseJson(.responseText) '
End With '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
ReDim headers(1 To 1, 1 To jsonResponse(1).Count)
ReDim results(1 To jsonResponse.Count, 1 To jsonResponse(1).Count)
''''''''Write the headers to an array''''''''
i = 1 '
For Each key In jsonResponse(1).Keys '
headers(1, i) = key '
i = i + 1 '
Next key '
'''''''''''''''''''''''''''''''''''''''''''''
''''''''''''Write the data to an array'''''''''''''''''''''''''''''''''''''''
j = 1 '
For Each game In jsonResponse '
i = 1 '
For Each key In game.Keys '
If game(key) <> "null" Then '
results(j, i) = game(key) '
i = i + 1 '
Else '
results(j, i) = "NULL" '
i = i + 1 '
End If '
Next key '
j = j + 1 '
Next game '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''Write the headers and results arrays to the worksheet in one go''''''''''''''''''''''''''
sht.Range("A1").Resize(1, UBound(headers, 2)) = headers '
sht.Range("A2").Resize(UBound(results, 1), UBound(results, 2)) = results '
'sht.Cells.Value = sht.Cells.Value this line causes an out of memory error on 32 bit office ' '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''Prevent numbers form being stored as text''''''''''''
With sht '
lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row '
lastCol = .Cells(1, .Columns.Count).End(xlToLeft).Column '
Set rng = Range(.Range("A1"), .Cells(lastRow, lastCol)) '
End With '
Debug.Print rng.Address '
rng.Value = rng.Value '
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
End Sub
In fact, as you can see the website offers a lot more data, which could prove to be useful. For demonstration purposes the code above prints all the data in worksheet named Name of your Worksheet.
Here's a sample of the output:
You will need to add the following references to your project (VBE>Tools>References):
Microsoft WinHTTP Services version 5.1
Microsoft HTML Objects Library
Microsoft Scripting Runtime
You will also need to add this JSON parser to your project. Follow the installation instructions in the link and you should be set to go.
You can loop through elements of the website like so:
For Each ele In objIE.Document.getElementsByTagName("span")
If ele.innerText = "More [+]" Then
ele.Click
Exit For
End If
Next
You'll have to wait until it loads, IE.state won't change but you get body before click, and loop until it changes, for example.

Is there a way to slow down a Web Scraper so it will pick up the code?

I wrote a macro to go to WU to get historical data and for the most part, it works. However, I believe that the macro is running too fast for it to pick up the data from the website.
https://www.wunderground.com/history/daily/us/tx/el-paso/KELP/date/2017-1-3
Is the website and the table I want to get is tablesaw-sortable.
I have tried the following: DoEvents and Application.Wait (Now + TimeValue("00:00:01")) to try to slow down the process.
Sub BrowseToWU()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim RowAddress As Integer
Dim WebAddress As String
Dim DateSheet As Date
Dim WkDay As Integer
Dim DateSheetName As String
'Application.ScreenUpdating = False
'Application.StatusBar = True
RowAddress = 2
IE.Visible = True
Do Until RowAddress = 60
WebAddress = Range("A" & RowAddress)
DateSheet = Right(WebAddress, 8)
DateSheetName = Right(WebAddress, 8)
WkDay = Weekday(DateSheet, vbSunday)
If WkDay < 3 Then
RowAddress = RowAddress + 1
ElseIf WkDay > 6 Then
RowAddress = RowAddress + 1
Else
IE.Navigate WebAddress
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
DoEvents
Application.Wait (Now + TimeValue("00:00:05"))
DoEvents
ProcessHTMLPage HTMLDoc
DateSheet = Right(WebAddress, 8)
DoEvents
Application.Wait (Now + TimeValue("00:00:01"))
ActiveSheet.Name = DateSheetName
DoEvents
RowAddress = RowAddress + 1
'IE.Quit
Worksheets("Sheet1").Activate
End If
Loop
End Sub
Option Explicit
Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)
Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLTables As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLCell As MSHTML.IHTMLElement
Dim RowNum As Long, ColNum As Integer
'Dim IE As New SHDocVw.InternetExplorer
'Dim Ws As Worksheet
Set HTMLTables = HTMLPage.getElementsByClassName("tablesaw-sortable")
'DoEvents
For Each HTMLTable In HTMLTables
Worksheets.Add
DoEvents
Range("A1").Value = HTMLTable.className
Range("B1").Value = Now
RowNum = 2
For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
'Debug.Print vbTab & HTMLRow.innerText
ColNum = 1
For Each HTMLCell In HTMLRow.Children
Cells(RowNum, ColNum) = HTMLCell.innerText
ColNum = ColNum + 1
Next HTMLCell
RowNum = RowNum + 1
Next HTMLRow
Next HTMLTable
DoEvents
'IE.Quit
End Sub
The macro is supposed to run through sheet1 picking up the web address to the historical data if it satisfies the criteria of being a certain day of the week.
IE will open and then it will kick over to the next module that will take in the data.
A new worksheet is created and the data pasted into the new worksheet.
The worksheet is renamed to the date of the data.
The web address sheet is activated again and the process starts over again.
The error I get is that the data isn't taken from the website, so the For statement ends and the web address sheet is renamed and an error occurs.
One way around this is to call the API that the page is using to get that info.
The API returns json which you can parse with a json parser. I use jsonconverter.bas. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References > Add a reference to Microsoft Scripting Runtime.
Finding the API:
If you press F12 to open developer tools and go to the Network tab and then press F5 to refresh any url of interest you will see the recorded web traffic. You can find the API call there.
See my answer here on how to search the network traffic using a specific observation value you expect to see in the response - this will filter the list of network traffic to those items containing the value of interest. Be sensible in selecting the value - you want something unlikely to occur elsewhere. You can also filter the network traffic to XHR only.
The API response:
The API returns json. More specifically, it returns a dictionary containing 2 keys. The second key, "observations", can be used to return a collection (denoted by []) of dictionaries (denoted by {}).
Each dictionary represents a row of the table (daily observations). You can loop this collection, and then loop the inner dictionaries, to access the table row values and reconstruct the table by populating an array. Explore example json response here.
Explanation of json structure:
click here to enlarge
Explanation of code:
The code is broken down into a number of helper subs and functions, allocating certains tasks to each, to
make code easier to debug and follow, as well as better align with Object Oriented Programming Principles.
Overall the process is:
Gather urls for Worksheet("Sheet1"). Helper function GetAllUrls.
Process those urls and only retain the dates which correspond with Tue-Thur. These are kept as strings formatted as "yyyymmdd" so can be passed to API later. This is handled by helper functions GetOnlyQualifyingUrlsDates and IncludeThisDate. IncludeThisDate performs the check for whether to include; GetOnlyQualifyingUrlsDates handles the looping and formatting of results.
Issue xmlhttp requests by looping over qualifying url dates and concatenating those into the url for the API call, then issuing the request. This is performed by the main sub GetTables.
Sheet creation, for output, is handled by helper function CreateWorksheet. This function calls another helper function, SheetExists, to ensure sheets are only created if they don't already exist, otherwise, the existing sheet by that name is used.
The resultant json response, from step 3, is passed to a helper sub WriteOutResults which accepts the json variable and the output sheet object as arguments. It extracts all the info from the json response; essentially reconstructing the table. It adds the table and headers to the appropriate sheet.
It calls helper function Epoch2Date, which handles the unix timestamp to datetime conversion for the two unix fields in the json object.
TODO:
The API key may be time limited. Add a helper function which returns the current valid key.
The API accepts start date and end date parameters in the url construct. It would be far better to issue one request for the entire range if possible, or chunked ranges e.g. months, to reduce the number of requests made. This would also reduce the likelihood of being blocked. This would mean some additional code would need to be written, before writing out results, to ensure only dates of interest are being written to sheets. Though you could write out all then simply loop all sheets and delete those that aren't wanted (perfectly doable if we are talking about 365 dates total). Personally, I would handle the include date part in the construction of the table from a single request (if possible) that has the min and max dates for entire urls listed passed as start and end date parameters. I would then write a single flat table out to one sheet as this will be much easier for later data analysis.
VBA:
Option Explicit
Public Sub GetTables()
'VBE > Tools > References > Microsoft Scripting Runtime
Dim json As Object, qualifyingUrlsDates(), urls(), url As String
Dim ws As Worksheet, wsOutput As Worksheet, i As Long, startDate As String, endDate As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
urls = GetAllUrls(2, ws, "A")
qualifyingUrlsDates = GetOnlyQualifyingUrlsDates(urls)
'API key may be not be valid over time so look at obtaining by prior request
With CreateObject("MSXML2.XMLHTTP") 'issue xmlhttp request for each valid date (this would be better done using start and enddate to specify entire range _
of batches e.g. months within total range to cut down on requests
For i = LBound(qualifyingUrlsDates) To UBound(qualifyingUrlsDates)
startDate = qualifyingUrlsDates(i)
endDate = startDate ' a little verbose but useful for explaining
url = "https://api.weather.com/v1/geocode/31.76/-106.49/observations/historical.json?apiKey=6532d6454b8aa370768e63d6ba5a832e&startDate=" & startDate & "&endDate=" & endDate & "&units=e"
.Open "GET", url, False
.send
Set json = JsonConverter.ParseJson(.responseText)("observations")
Set wsOutput = CreateWorksheet(qualifyingUrlsDates(i))
WriteOutResults wsOutput, json
Next
End With
End Sub
Public Sub WriteOutResults(ByVal wsOutput As Worksheet, ByVal json As Object)
'json is a collection of dictionaries. Each dictionary is a time period reading from the day i.e. one row in output
Dim results(), item As Object, headers(), r As Long, c As Long, key As Variant
headers = json.item(1).keys 'get the headers which are the keys of each dictionary
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each item In json
r = r + 1: c = 0 'increase row in results array to store results for table row
For Each key In item.keys
c = c + 1 'increase column number in results array for writing out results
Select Case key
Case "valid_time_gmt", "expire_time_gmt" 'convert unix timestamp fields to datetime
results(r, c) = Epoch2Date(item(key))
Case Else
results(r, c) = item(key)
End Select
Next
Next
With wsOutput
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetOnlyQualifyingUrlsDates(ByVal urls As Variant) As Variant
Dim i As Long, output(), counter As Long
ReDim output(1 To UBound(urls))
For i = LBound(urls) To UBound(urls)
If IncludeThisDate(urls(i)) Then 'check if weekday is to be included
counter = counter + 1
output(counter) = Format$(Right$(urls(i), 8), "yyyymmdd") 'if to include then add to output array of urls of interest
End If
Next
ReDim Preserve output(1 To counter)
GetOnlyQualifyingUrlsDates = output
End Function
Public Function IncludeThisDate(ByVal url As String) As Boolean
'tue, wed, thurs are valid
IncludeThisDate = Not IsError(Application.Match(Weekday(Right$(url, 8), vbSunday), Array(3, 4, 5)))
End Function
Public Function SheetExists(ByVal sheetName As String) As Boolean '<== function by #Rory
SheetExists = Evaluate("ISREF('" & sheetName & "'!A1)")
End Function
Public Function GetAllUrls(ByVal startRow As Long, ByVal ws As Worksheet, ByVal columnName As String) As Variant
'transpose used based on premise no more than a couple of years of dates
'startRow is start row for urls, ws is sheet where urls found, columnName is string representation of column for urls e.g. "A"
With ws
GetAllUrls = Application.Transpose(ws.Range("A" & startRow & ":A" & .Cells(.rows.Count, columnName).End(xlUp).Row).Value)
End With
End Function
Public Function CreateWorksheet(ByVal sheetName As String) As Worksheet
Dim ws As Worksheet
If SheetExists(sheetName) Then
Set ws = ThisWorkbook.Worksheets(sheetName)
'do something.... clear it? Then add new data to it?
Else
Set ws = ThisWorkbook.Worksheets.Add
ws.Name = sheetName
End If
Set CreateWorksheet = ws
End Function
Public Function Epoch2Date(ByVal E As Currency, Optional msFrac) As Date '# Schmidt http://www.vbforums.com/showthread.php?805245-EPOCH-to-Date-and-vice-versa
Const Estart As Double = #1/1/1970#
msFrac = 0
If E > 10000000000# Then E = E * 0.001: msFrac = E - Int(E)
Epoch2Date = Estart + (E - msFrac) / 86400
End Function

Scraping Web With VBA to return Data while matching conditions

I hope this question is not in other post since I have searched and not found an answer. I'm also quite new to programming but specially to scraping the web. If you guys know of any good, complete tutorial, I'll appreciate if you can direct me to it. I work with VBA and Python.
I begun working after reading this: Scraping data from website using vba
Very helpful, by the way. I understood the old method better so I chose that one.
The site I want to search in is: http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp
The code I've written so far:
Sub Test()
Dim ie As Object
Dim form As Variant, button As Variant
Set ie = CreateObject("InternetExplorer.Application")
Dim TR_col As Object, TR As Object
Dim TD_col As Object, TD As Object
Dim xx As Object, x As Object
With ie
.Visible = True '< Show browser window
.navigate ("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp") '> Travel to homepage
Do While ie.Busy
DoEvents
Loop '< Wait for page to have loaded
End With
Set TR_col = ie.Document.getElementsByTagName("TR")
For Each TR In TR_col
Set xx = ie.Document.getElementsByTagName("a")
If xx = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
Cells(1, 1) = "Ok"
End If
Next TR
End Sub
Lastly, this is the Inspector looks like:
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/YoG4H.png
I also highlighted the piece of information I'm using for testing purposes.
So, my approach is to search for all the "tr" tags and then validate whether the first column of the table (I guess this would be the first "td" tag) is equal to a text I'll have in a cell (in this case I just wrote in text for testing purposes). The result should be copying the number next to the date to a cell in the worksheet. In this case I wrote "Ok" just to see whether the if statement was working. But it isn't.
I guess I'm not sure how tell VBA to search for all "tr" tags, search for all the "td" tags within each "tr", find the one that matches some text, and return the 3rd "td" tag within that "tr". Makes sense?
Hope I've been specific enough and that someone can guide me.
It's not necessary to load whole browser to get HTML - you can do without it.
Sub Test()
'// References required:
'// 1) Microsoft HTML Object Library
'// 2) Microsoft XML, v6.0
Dim req As MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
Dim tbl As MSHTML.HTMLTable
Dim tblRow As MSHTML.HTMLTableRow
Dim tblCell As MSHTML.HTMLTableCell
Dim anch As MSHTML.HTMLAnchorElement
Dim html$, url$, sText$, fecha$, valor$, j%
Set req = New MSXML2.XMLHTTP60
url = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp"
Set req = New MSXML2.XMLHTTP60
req.Open "GET", url, False
req.send
html = req.responseText
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = html
Set tbl = doc.getElementsByClassName("table-BCRA")(, 0)
For j = 1 To tbl.Rows.Length - 1
With tbl.Rows(j)
'// Skip cells without data.
'// Assume correct data has three cells.
If .Cells.Length = 3 Then
Set anch = .Cells(0)
sText = anch.textContent
If sText = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
fecha = .Cells(1).innerText
valor = .Cells(2).innerText
End If
End If
End With
Next
End Sub

Resources