Scraping Web With VBA to return Data while matching conditions - excel

I hope this question is not in other post since I have searched and not found an answer. I'm also quite new to programming but specially to scraping the web. If you guys know of any good, complete tutorial, I'll appreciate if you can direct me to it. I work with VBA and Python.
I begun working after reading this: Scraping data from website using vba
Very helpful, by the way. I understood the old method better so I chose that one.
The site I want to search in is: http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp
The code I've written so far:
Sub Test()
Dim ie As Object
Dim form As Variant, button As Variant
Set ie = CreateObject("InternetExplorer.Application")
Dim TR_col As Object, TR As Object
Dim TD_col As Object, TD As Object
Dim xx As Object, x As Object
With ie
.Visible = True '< Show browser window
.navigate ("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp") '> Travel to homepage
Do While ie.Busy
DoEvents
Loop '< Wait for page to have loaded
End With
Set TR_col = ie.Document.getElementsByTagName("TR")
For Each TR In TR_col
Set xx = ie.Document.getElementsByTagName("a")
If xx = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
Cells(1, 1) = "Ok"
End If
Next TR
End Sub
Lastly, this is the Inspector looks like:
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/YoG4H.png
I also highlighted the piece of information I'm using for testing purposes.
So, my approach is to search for all the "tr" tags and then validate whether the first column of the table (I guess this would be the first "td" tag) is equal to a text I'll have in a cell (in this case I just wrote in text for testing purposes). The result should be copying the number next to the date to a cell in the worksheet. In this case I wrote "Ok" just to see whether the if statement was working. But it isn't.
I guess I'm not sure how tell VBA to search for all "tr" tags, search for all the "td" tags within each "tr", find the one that matches some text, and return the 3rd "td" tag within that "tr". Makes sense?
Hope I've been specific enough and that someone can guide me.

It's not necessary to load whole browser to get HTML - you can do without it.
Sub Test()
'// References required:
'// 1) Microsoft HTML Object Library
'// 2) Microsoft XML, v6.0
Dim req As MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
Dim tbl As MSHTML.HTMLTable
Dim tblRow As MSHTML.HTMLTableRow
Dim tblCell As MSHTML.HTMLTableCell
Dim anch As MSHTML.HTMLAnchorElement
Dim html$, url$, sText$, fecha$, valor$, j%
Set req = New MSXML2.XMLHTTP60
url = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp"
Set req = New MSXML2.XMLHTTP60
req.Open "GET", url, False
req.send
html = req.responseText
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = html
Set tbl = doc.getElementsByClassName("table-BCRA")(, 0)
For j = 1 To tbl.Rows.Length - 1
With tbl.Rows(j)
'// Skip cells without data.
'// Assume correct data has three cells.
If .Cells.Length = 3 Then
Set anch = .Cells(0)
sText = anch.textContent
If sText = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
fecha = .Cells(1).innerText
valor = .Cells(2).innerText
End If
End If
End With
Next
End Sub

Related

I have questions about how to click and search on web using vba

I have questions about how to click and search on web using vba.
I have wrote the code, but cannot find how to click the button in this web
Sub LEISearch()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim LEI As HTMLLinkElement 'special object variable for an <a> (link) element
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link'
Dim result2 As String
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://www.gmeiutility.org/search.jsp?keyWord"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("searchInput").Value = _
Sheets("Macro1").Range("A1").Value
'click the 'go' button
Set LEIButton = objIE.document.getElementsByClassName("hiddenSubmitButton")
LEIButton.Focus
LEIButton.Click
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
End Sub
This case is a really nice and clean example of web scraping so i will take this opportunity to present an educational post.
I highly recommend avoiding using IE to scrape websites whenever it's possible. It's highly inefficient. Especially in a case like this where there can be multiple pages of results. Instead, you can use HTTP requests.
An HTTP request is a structured way to request something from a server. In this case we want to send a keyword to the server and get the corresponding search results.
To find out how this request should look like, you have to inspect the network traffic when the button with the magnifying glass is clicked. You can do that through your browser's developer tools (Ctrl+Shift+E if you're using Firefox):
If you go through the Headers and the Params of the request you will see how the url, the body and the headers should look like. In this particular case, all the parameters are encoded into the url and the headers are not essential to the success of the request, so all you need is the url.
Some of the parameters of the url are the keyword, the number of results per page and the number of page.
The response's payload is in json format. You can inspect its structure using a tool like this. Here's how it looks like:
Basically the JSON response consists of as many results as you have specified that should be displayed per page (or less). To get the next page you need to send a new request with the same keyword but specifying a new page number and so on.
In fact, as you can see the website offers a lot more data than what's displayed on your browser, which could prove to be useful.
The code below searches for the keyword test, while requesting 25 results per page. One first request is sent to find out how many pages of results are there and then the code loops through all pages and prints the results in a worksheet.
TL;DR
Option Explicit
Sub main()
Dim sht As Worksheet
Dim totalNumberOfPages As Long
Dim searchResults As Object
Dim pageNumber As Long
Dim results() As String
Dim entity As Object
Dim i As Long, j As Long
Dim rng As Range
Set sht = ThisWorkbook.Worksheets("Name of your Worksheet")
''''''First request to find out the number of pages''''''
Set searchResults = XHRrequest("test", 25, 1) '
totalNumberOfPages = searchResults("totalPages") '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''Loop through all the pages''''''''''''''''''''''''''''''''
For pageNumber = 1 To totalNumberOfPages Step 1 '
Set searchResults = XHRrequest("test", 25, pageNumber) '
ReDim results(1 To searchResults("entitySearchResult").Count, 1 To 7) '
i = 0 '
'''''''''''write the results in an array'''''''''''''''''''''''''''' '
For Each entity In searchResults("entitySearchResult") ' '
i = i + 1 ' '
results(i, 1) = entity("LEINumber") ' '
results(i, 2) = entity("legalName") ' '
results(i, 3) = entity("city") ' '
results(i, 4) = entity("headquartersCountry") ' '
results(i, 5) = entity("recordStatus") ' '
results(i, 6) = entity("renewalStatus") ' '
results(i, 7) = entity("entityStatus") ' '
Next entity ' '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' '
'''''''''''''''write all the results in the worksheet in one go''''' '
With sht ' '
Set rng = .Range("A" & .Rows.Count).End(xlUp).Offset(1, 0) ' '
End With ' '
rng.Resize(UBound(results, 1), UBound(results, 2)) = results ' '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' '
Next pageNumber '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
End Sub
Public Function XHRrequest(ByVal key As String, ByVal resultsPerPage As Long, ByVal pageNumber As Long) As Object
Dim req As New WinHttpRequest
Dim url As String
url = "https://www.gmeiutility.org/actions/Search/?isPendingValidationChecked=true&isSearchAllLOUChecked=true&keyWord=" & key & "&page=" & pageNumber & "&resultsPerPage=" & resultsPerPage & "&searchType=baseSearch" 'build the URL according to the parameters
'''''''''Send the HTTP request'''''''''''''''''''''''''''''''
With req '
.Open "POST", url, False '
.send '
Set XHRrequest = JsonConverter.ParseJson(.responseText) '
End With '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
End Function
For demonstration purposes the code above prints all the data in worksheet named Name of your Worksheet.
If you need to perform multiple searches you can easily modify the code to best fit your needs. More specifically you can loop through multiple keywords and call the XHRrequest function using those keywords instead of "test".
Here's a sample of the output:
You will need to add the following references to your project (VBE>Tools>References):
Microsoft WinHTTP Services version 5.1
Microsoft HTML Objects Library
Microsoft Scripting Runtime
You will also need to add this JSON parser to your project. Follow the installation instructions in the link and you should be set to go.

Web scraping with excel VBA returns no values for certain sites

I'm trying to scrape data from Oslo Børs related to 3months NIBOR using excel VBA. However my code do not return any values back to excel. The same code is able to retrieve table data from other similar websites, but could it be related to the ui-view set-up of the html language? I'm not very skilled on html language, so I do come to short here. Any help would be much appreciated. Thanks
Sub NIBOR3M_oslobors()
Dim ie As InternetExplorer
Dim InternetExplorer As Object
Dim strURL As String
Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim nextrow As Long
Dim I As Long
strURL = "https://www.oslobors.no/markedsaktivitet/#/details/NIBOR3M.NIBOR/overview"
Set ie = CreateObject("InternetExplorer.Application")
With ie
.navigate strURL
Do Until .READYSTATE = 4: DoEvents: Loop
Do While .Busy: DoEvents: Loop
Set doc = ie.Document
Set ws = Worksheets("Interest rates")
For Each tbl In doc.getElementsByTagName("table")
nextrow = nextrow + 1
Set rng = ws.Range("M8:M8" & nextrow)
For Each rw In tbl.Rows
For Each cl In rw.Cells
rng.Value = rw.innerText
Set rng = rng.Offset(, 1)
I = I + 1
Next cl
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next rw
Next tbl
.Quit
End With
ActiveSheet.Range("M8").Select
End Sub
The tables you are trying to scrape are generated by scripts, so you will not find them in the HTML document. To get your hands on the data you will need to send an XHR request.
To find out how this request should look like, you have to inspect the network traffic when the page is loaded. You can do that through your browser's developer tools (Ctrl+Shift+E if you're using Firefox).
There you will see several requests of different types being sent. In your case, you should look for XHR requests of type json. There's a few of them. If you go through the responses of these requests, you will eventually find the one you need, as it will contain the data you want.
Here's how it looks like:
If you go through the Headers and the Params of the request you will see how the url, the body and the headers should look like. In this particular case, all the parameters are encoded into the url and the headers are not essential to the success of the request, so all you need is the url.
The response's payload is in json format. You can inspect its structure using a tool like this. Here's how it looks like:
To parse a response like that you need a parser (look at the end of this post).
TL;DR :
Here's how to get the data from the "Nibor 3 month" table:
Option Explicit
Sub oslobors()
Dim req As New WinHttpRequest
Dim respJSON As Object
Dim key As Variant
Dim url As String
Dim results()
Dim i As Integer
Dim sht As Worksheet
Set sht = ThisWorkbook.Worksheets("Name of your worksheet")
url = "https://www.oslobors.no/ob/servlets/components?type=nibor&source=feed.index.nibor.INDICES_FIXING&view=DELAYED&columns=ITEM%2C+CLOSENZ%2C+TRADE_TIME%2C+CLOSE_DATE%2C+CHANGENZ%2C+CHANGENZ_PCT&filter=ITEM_SECTOR%3D%3DsNIBOR3M.NIBOR&channel=e6da3e88c3464b7ad9620824b8d39c95"
With req
.Open "GET", url, False
.send
Set respJSON = JsonConverter.ParseJson(.responseText)
End With
ReDim results(1 To respJSON("rows")(1)("values").Count, 1 To 2)
i = 1
For Each key In respJSON("rows")(1)("values").Keys
results(i, 1) = key
results(i, 2) = respJSON("rows")(1)("values")(key) 'write the results in an array
i = i + 1
Next key
sht.Cells(1, "A").Resize(UBound(results, 1), UBound(results, 2)) = results 'Print the results array in one go. The results will be printed to an area/range starting with cell A1 and expanding as much as it is needed
End Sub
For demonstration purposes, the data is printed in your immediate window:
You will need to add the following references to your project (VBE>Tools>References):
Microsoft WinHTTP Services version 5.1
Microsoft HTML Objects Library
Microsoft Scripting Runtime
You will also need to add this JSON parser to your project. Follow the installation instructions in the link and you should be set to go.
To get the data from the "Historical Prices" table you just have to change the url to this:
url = "https://www.oslobors.no/ob/servlets/components?type=table&source=feed.index.nibor.INDICES_FIXING&view=DELAYED&columns=CHANGE_1WEEK_PCT%2C+HIGH_1WEEK%2C+LOW_1WEEK%2C+CHANGE_1MONTH_PCT%2C+HIGH_1MONTH%2C+LOW_1MONTH%2C+CHANGE_YEAR_PCT%2C+HIGH_YEAR%2C+LOW_YEAR%2C+CHANGE_1YEAR_PCT%2C+HIGH_1YEAR%2C+LOW_1YEAR&filter=ITEM_SECTOR%3D%3DsNIBOR3M.NIBOR&channel=283044a7d182ca196a16337ba79f089c"
EDIT
Modified the code to print the results in a worksheet named "Name of your worksheet".

When the search button is clicked using vba the text entered in search box is not seen by web page

I have written vba code for entering manufacturer part number in search box of below website and clicking on search icon. It is able enter manufacturer part number in search box and click on search icon, but when "search icon is clicked the text entered in the text box is not picked up". It searches empty data.
'HTML Part for search icon
<em class="fa fa-search" aria-hidden="true" style="color: gray;"></em>
It being almost a month I have tried various different way which was also mentioned on stack overflow, like using "createEvent("keyboardevent")" but nothing worked.
' VBA code
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://ecat.aptiv.com"
Do While IE.readyState < READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
idoc.getElementById("searchUserInput").Value = "33188785"
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
Else
End If
Next doc_ele
End Sub
The page does an xhr request to retrieve the search results. You can find it in the network tab after clicking submit. This means you can avoid, in this case, the expense of a browser and issue an xhr request. The response is json so you do need a json parser to handle the results.
I would use jsonconverter.bas to parse the json. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References > Add a reference to Microsoft Scripting Runtime
I dimension an array to hold the results. I determine rows from the number of items in the json collection returned and the number of columns from the size of the first item dictionary. I loop the json object, and inner loop the dictionary keys of each dictionary in collection, and populate the array. I write the array out in one go at end which is less i/o expensive.
Option Explicit
Public Sub GetInfo()
Dim json As Object, ws As Worksheet, headers()
Dim item As Object, key As Variant, results(), r As Long, c As Long
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://ecat.aptiv.com/json/eCatalogSearch/SearchProducts?filter=All&options=&pageSize=10&search=33188785", False
.send
Set json = JsonConverter.ParseJson(.responseText)("Products")
End With
headers = json.item(1).keys
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each item In json
r = r + 1: c = 1
For Each key In item.keys
results(r, c) = item(key)
c = c + 1
Next
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
You can do this instead:
txt = "33188785"
IE.navigate "https://ecat.aptiv.com/feature?search=" & txt
This will take you straight to the Search Result.
Code:
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Dim txt As String
Set IE = New InternetExplorer
txt = "33188785"
IE.Visible = True
IE.navigate "https://ecat.aptiv.com/feature?search=" & txt
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
End Sub
This will be faster as You will only have to load one page.
Why that's happening, i am not sure, but seems like the TextBox that is used to input text is not being Activated when adding text automatically to it. It is being activated when we click inside it.
I got the solution for above problem from Mrxel.com below is the link for that post.
https://www.mrexcel.com/forum/excel-questions/1105434-vba-ie-automation-issue-angularjs-input-text-post5317832.html#post5317832
In this case I need to enter the search string character by character and sendKeys and input events inside the loop. Below is the working vba code.
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://ecat.aptiv.com"
Do While IE.readyState < READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
IE.document.getElementById("searchUserInput").Focus = True
IE.document.getElementById("searchUserInput").Select
sFieldInput = "33188785"
For s = 1 To Len(sFieldInput)
Application.SendKeys Mid(sFieldInput, s, 1)
While IE.readyState < 4 Or IE.Busy
Application.Wait DateAdd("s", LoopSeconds, Now)
Wend
Next s
IE.document.getElementById("searchUserInput").Focus = False
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
Else
End If
Next doc_ele
End Sub

VBA Scraping div elements

So, I've trying to scrape data from a website but I simply can't reach my goal...
I'm new with VBA and i've tried to search the basics of vba in order to understand some code.
So far I got this code but it's only scraping the data from the 1st div and it scrap all the data to one cell, and I need the macro to run trought all the page and scrap all the data that has the className I input on the code on diferent cells (eg: 1st div to cell A:1, 2nd div to cell A2... and so on)
Could you help me or give me some "lights" of what I'm doing wrong pls?
Thank you!
Code:
Sub BoschRoupa()
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
i = 1
Set ieObj = New InternetExplorer
ieObj.Visible = False
ieObj.navigate "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100"
Application.Wait Now + TimeValue("00:00:05")
For Each htmlEle In ieObj.document.getElementsByClassName("w-product__content")(0).getElementsByTagName("div")
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
End With
i = i + 1
Next htmlEle
End Sub
You can use xmlhttp, rather than a browser, then the following loop to write out all the div info. I would probably be more selective in how I grab only data of interest but the following, I hope, is in the spirit of what you have asked for.
Option Explicit
Public Sub GetInfo()
Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100", False
.send
html.body.innerHTML = .responseText
End With
Set data = html.getElementsByClassName("w-product__content")
For Each item In data
r = r + 1: c = 1
For Each div In item.getElementsByTagName("div")
With ThisWorkbook.Worksheets("Sheet1")
.Cells(r, c) = div.innerText
End With
c = c + 1
Next
Next
End Sub

Scraping data from a website with a dynamic array function - Excel VBA

I want to eventually create a function where I can specify a web page element and URL and populate all instances of that element down a column. But am currently only experiencing limited success with this function:
Sub GrabAnchorTags() '(URL As String) As Variant'
Dim objIE As InternetExplorer
Dim elem As Object
Set objIE = New InternetExplorer
objIE.Visible = False
objIE.navigate "http://example.com/"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim aRange As Range
Debug.Print objIE.document.getElementsByTagName("a").Length
For Each elem In objIE.document.getElementsByTagName("a")
Debug.Print elem
ActiveCell.Offset(x, y).Value = elem
ActiveCell.Offset(x, y + 1).Value = elem.textContent
x = x + 1
Next
objIE.Quit
Set objIE = Nothing
End Sub
I would like to be able to turn this successfully from a macro to a function.
Currently, it uses a for loop to populate the cells and I wonder if it's possible to accomplish the same thing using evaluate or something similar because the for loop is inefficient.
This function would need to live in a cell, reference a URL in another cell, and populate the cells bellow it with all elements of a type found on the page. I am currently working on the anchor tag.
Many other solutions I referenced used macros:
Scraping data from website using excel vba
Getting links url from a webpage excel vba
VBA – Web scraping with getElementsByTagName()
Generally speaking, whenever you have many cells to write to, you should enter the data into an internal array, and then write the entire array to the worksheet in one hit. However you seem to not want a macro/sub in your case.
If you wish it to take the worksheet formula approach for usability reasons, then the best way is to use a very powerful, but underused technique in Excel development.
A NAMED RANGE
Named ranges are Excels closest thing to getting an in-memory block of data, and then other simpler formulas can use the named range to get info from the Named Range.
A Named Range doesn't have to actually be a simple block of cells on a sheet. You can write your VBA formula as a Public formula, and then reference it in the Named Range.
Function getElems(url As String, tagName As String) As String()
Dim browser As New MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
With browser
.Open "GET", url, False
.send
If .readyState = 4 And .Status = 200 Then
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = .responseText
Else
MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
vbNewLine & "HTTP request status: " & .Status
End If
End With
Dim tag As MSHTML.IHTMLElement
Dim tags As MSHTML.IHTMLElementCollection
Set tags = doc.getElementsByTagName(tagName)
Dim arr() As String
Dim arrCounter As Long: arrCounter = 1
ReDim arr(1 To tags.Length, 1 To 2)
For Each tag In tags
arr(arrCounter, 1) = tag.innerText
'Change the below if block to suit
If tagName = "a" Then
arr(arrCounter, 2) = tag.href
Else
arr(arrCounter, 2) = tag.innerText
End If
arrCounter = arrCounter + 1
Next tag
Set doc = Nothing
Set browser = Nothing
getElems = arr
End Function
Now set a Named Range in Excel such as:
elementData
=getElems(Sheet1!$A$1, Sheet1!$B$1)
In A1, put the URL, and in B1 put the tag Name such as "a"
Then in your cells you can say
=INDEX(elementData, ROW(1:1), 1) and in adjacent cell put =INDEX(elementData, ROW(1:1), 2) (or use ROWS formula technique)
and drag down.

Resources