Web Scraping with VBA - Almost There - excel

After studying many posts here, I finally patched together code that works (below)! It extracts the index value from Yahoo Finance, where the index is basically the only data point on the page (https://finance.yahoo.com/quote/%5EHSI?p=%5EHSI&.tsrc=fin-srch).
However, when I want the same index from a website containing several index values (https://www.hkex.com.hk/?sc_lang=en), the page source is too overwhelming for me to work out how to reference the right element in my code. Can anyone please help with this final hurdle? Thank you.
Sub HSI_Scrape()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.Navigate "https://finance.yahoo.com/quote/%5EHSI?p=%5EHSI&.tsrc=fin-srch"
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Dim hsi As String
hsi = IE.Document.getElementsByClassName("Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)")(0).innerText
Worksheets("Tickers").Select
Range("z11").value = hsi
End Sub

Each of the tickers on HKEX has class "listitem". So to get each ticker, search for that class using getElementsByClassName("listitem"). Loop through each element, and the first child (class="col_name") of the element will be the ticker name, the second child (class="col_last") will be the price.
Sub HSI_Scrape()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.Navigate "https://www.hkex.com.hk/?sc_lang=en"
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Dim listitems As Object 'list of tickers
Dim listitem As Object 'each ticker
Set listitems = IE.document.getElementsByClassName("listitem")
Dim row As Integer
row = 1
For Each listitem In listitems
Worksheets("Tickers").Range("A" & row) = listitem.Children(0).innerText 'Ticker Name
Worksheets("Tickers").Range("B" & row) = listitem.Children(1).innerText 'Ticker Price
row = row + 1
Next listitem
End Sub

Related

How to scrape a class and if not found scrape another

I am using VBA to scrape a website. The scraper I wrote works, but I want to add two more features and don't really know how to do it. This is the code:
Sub pronutrition()
Set ie = CreateObject("InternetExplorer.Application")
my_url = "https://www.myprotein.ro/"
ie.Visible = True
i = 20
LastRow = ActiveSheet.Range("A" & ActiveSheet.Rows.Count).End(xlUp).Row
Set Rng = ActiveSheet.Range("A20:A" & LastRow)
For Each cell In Rng
ie.navigate my_url
Do While ie.Busy
DoEvents
Loop
Wait 1
ie.Document.getElementsByName("search")(0).Value = cell
ie.Document.getElementsByClassName("headerSearch_button")(0).Click
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("B" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(0).innerText + ie.Document.getElementsByClassName("athenaProductBlock_fromValue")(0).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("C" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(1).innerText + ie.Document.getElementsByClassName("athenaProductBlock_fromValue")(1).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("D" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(2).innerText '+ ie.Document.getElementsByClassName("athenaProductBlock_priceValue")(2).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("E" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(3).innerText '+ ie.Document.getElementsByClassName("athenaProductBlock_priceValue")(3).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
i = i + 1
Next cell
ie.Quit
MsgBox "Done"
End Sub
First, I want to search for the "athenaProductBlock_fromValue" class and, if it isn't found, search for "athenaProductBlock_priceValue" instead. Second, if fewer products are found than expected (the range is set to 4), I want the search to stop; right now it returns an error if it doesn't find a 2nd or 3rd product and won't go on to search the next keyword.
Any advice would be appreciated.
Thank you!
Use a helper method to extract the HTMLCollection returned by the getElementsByClassName method. You can then check if the method returned any results.
Once you get the filled collection back, it's up to you how to handle it. You can loop and fill individual cells, or join the results to fill a single cell. You can also ignore the results if the Count is less than 2, and so on.
Private Function TryExtractElementsByClassName(ByVal ie As Object, _
ByVal className As String, _
ByRef objCollection As VBA.Collection) As Boolean
'if ie is null, return false
If ie Is Nothing Then Exit Function
'if elements (HTMLCollection) is null, return false
Dim elements As Object
Set elements = ie.Document.getElementsByClassName(className)
If elements Is Nothing Then Exit Function
'fill collection
Dim element As Object, idx As Long
For idx = 0 To elements.Length - 1 'the collection is zero-based
Set element = elements(idx)
If Not element Is Nothing Then objCollection.Add element
Next idx
'return
TryExtractElementsByClassName = objCollection.Count > 0
End Function
To call the helper method:
Sub Test()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
Dim objColl As New VBA.Collection
'search single class name
If TryExtractElementsByClassName(ie, "athenaProductBlock_priceValue", objColl) Then
'handle results stored in objColl
End If
'search multiple class names separated by a space (matches only elements that carry all of the listed classes)
If TryExtractElementsByClassName(ie, "athenaProductBlock_priceValue athenaProductBlock_fromValue", objColl) Then
'handle results stored in objColl
End If
End Sub
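The "try one class, fall back to another" behaviour asked for in the question can be sketched as follows. This is only a sketch under stated assumptions: FirstMatch and WriteProducts are hypothetical names that do not appear in the original posts, and the code assumes that the price elements appear in the same order as the product names, which may not hold if some products use one class and some the other.

```vba
Option Explicit

' Hypothetical helper: tries each class name in order and returns the first
' non-empty element collection, or Nothing when none of them match.
Private Function FirstMatch(ByVal doc As Object, ParamArray classNames() As Variant) As Object
    Dim i As Long
    Dim found As Object
    For i = LBound(classNames) To UBound(classNames)
        Set found = doc.getElementsByClassName(CStr(classNames(i)))
        If Not found Is Nothing Then
            If found.Length > 0 Then
                Set FirstMatch = found
                Exit Function
            End If
        End If
    Next i
End Function

' Hypothetical helper: writes up to maxProducts results into columns B onwards
' of the given row, stopping quietly when fewer products were found.
Private Sub WriteProducts(ByVal doc As Object, ByVal ws As Worksheet, _
                          ByVal rowIdx As Long, Optional ByVal maxProducts As Long = 4)
    Dim nameEls As Object, priceEls As Object
    Dim k As Long, cellText As String

    Set nameEls = doc.getElementsByClassName("athenaProductBlock_productName")
    ' Prefer the "from" price; fall back to the plain price class.
    Set priceEls = FirstMatch(doc, "athenaProductBlock_fromValue", _
                              "athenaProductBlock_priceValue")

    For k = 0 To maxProducts - 1
        If k >= nameEls.Length Then Exit For   ' no more products: stop without an error
        cellText = nameEls.Item(k).innerText
        If Not priceEls Is Nothing Then
            If k < priceEls.Length Then cellText = cellText & " " & priceEls.Item(k).innerText
        End If
        ws.Cells(rowIdx, 2 + k).Value = cellText
    Next k
End Sub
```

Inside the question's For Each loop, the four hard-coded assignments could then be replaced by a single call such as WriteProducts ie.Document, ActiveSheet, i.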

Autosearching CAGE codes

I need to retrieve the attributes for roughly 500 CAGE codes from the DLA and record them in my spreadsheet. I'm starting by trying to retrieve it for one item, then eventually looping it. I keep on getting an error for the below and I can't figure out why.
Note that the code doesn't work unless you've already opened the website and haven't closed the browser (you need to accept the terms and conditions).
The cell B2 = https://cage.dla.mil/Search/Results?q=07187&page=1
Sub NSCM()
Dim ie As Object
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
'Retrieve CAGE Code
Dim CAGE As String
CAGE = Range("B2").Value
'Navigate to Cage Code general Page
ie.navigate CAGE
ie.Visible = True
While ie.Busy
DoEvents
Wend
'Follow link to details page
For Each ele In ie.document.getElementsByTagName("a")
If InStr(ele.innerText, "Details") > 0 Then ele.Click
Next
Do While ie.Busy Or _
ie.readyState <> 4
DoEvents
Loop
'Pull Data
Dim count As Long
Dim erow As Long
Set HTML = ie.document
Set elements = HTML.getElementsByID("detail_topsection")
count = 0
For Each element In elements
If element.className = "result" Then
erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1) = HTML.getElementsByTagName("span")(10).innerText
Cells(erow, 1) = HTML.getElementsByTagName("span")(14).innerText
count = count + 1
End If
Next element
End Sub
Set elements = HTML.getElementsByID("detail_topsection")
should be
Set element = HTML.getElementByID("detail_topsection") 'no "s"
Element IDs on a page should be unique, so getElementById returns a single element (or Nothing), not a collection of elements like getElementsByTagName does. Because the result is a single element, the For Each loop over elements has to be adjusted accordingly.
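As a rough illustration of working with a single element, the sketch below looks up the section by ID, checks for Nothing, and then collects the text of the span elements inside it. ReadDetailSpans is a hypothetical name, and the assumption that the wanted values sit in span elements inside "detail_topsection" comes only from the question's code, not from inspecting the page.

```vba
Option Explicit

' Hypothetical helper: returns the innerText of every <span> inside the element
' with ID "detail_topsection", or an empty collection if that element is missing.
' doc is assumed to be ie.document after the details page has finished loading.
Private Function ReadDetailSpans(ByVal doc As Object) As Collection
    Dim result As New Collection
    Dim section As Object
    Dim span As Object

    Set section = doc.getElementById("detail_topsection")   ' single element or Nothing
    If Not section Is Nothing Then
        ' Searching from the section limits the results to spans inside it.
        For Each span In section.getElementsByTagName("span")
            result.Add span.innerText
        Next span
    End If

    Set ReadDetailSpans = result
End Function
```

Checking result.Count before reading fixed positions such as item 10 or 14 avoids a runtime error when the page layout differs from what is expected.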

When the search button is clicked using vba the text entered in search box is not seen by web page

I have written VBA code to enter a manufacturer part number in the search box of the website below and click the search icon. It is able to enter the part number and click the icon, but when the search icon is clicked, the text entered in the text box is not picked up; it searches with empty data.
'HTML Part for search icon
<em class="fa fa-search" aria-hidden="true" style="color: gray;"></em>
It has been almost a month, and I have tried various approaches, including ones mentioned on Stack Overflow such as using createEvent("keyboardevent"), but nothing worked.
' VBA code
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://ecat.aptiv.com"
Do While IE.readyState < READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
idoc.getElementById("searchUserInput").Value = "33188785"
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
Else
End If
Next doc_ele
End Sub
The page issues an XHR request to retrieve the search results. You can find it in the network tab after clicking submit. This means that, in this case, you can avoid the expense of a browser and issue the XHR request yourself. The response is JSON, so you do need a JSON parser to handle the results.
I would use jsonconverter.bas to parse the JSON. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
I dimension an array to hold the results. I determine the number of rows from the number of items in the JSON collection returned, and the number of columns from the size of the first item's dictionary. I loop over the JSON collection, with an inner loop over the keys of each dictionary, and populate the array. I write the array out in one go at the end, which is less I/O-expensive.
Option Explicit
Public Sub GetInfo()
Dim json As Object, ws As Worksheet, headers()
Dim item As Object, key As Variant, results(), r As Long, c As Long
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://ecat.aptiv.com/json/eCatalogSearch/SearchProducts?filter=All&options=&pageSize=10&search=33188785", False
.send
Set json = JsonConverter.ParseJson(.responseText)("Products")
End With
headers = json.item(1).keys
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each item In json
r = r + 1: c = 1
For Each key In item.keys
results(r, c) = item(key)
c = c + 1
Next
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
You can do this instead:
txt = "33188785"
IE.navigate "https://ecat.aptiv.com/feature?search=" & txt
This will take you straight to the Search Result.
Code:
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Dim txt As String
Set IE = New InternetExplorer
txt = "33188785"
IE.Visible = True
IE.navigate "https://ecat.aptiv.com/feature?search=" & txt
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
End Sub
This will be faster as you will only have to load one page.
Why that's happening, I am not sure, but it seems that the text box used for input is not activated when text is added to it programmatically. It is activated when we click inside it.
I got the solution to the above problem from MrExcel.com; the link to that post is below.
https://www.mrexcel.com/forum/excel-questions/1105434-vba-ie-automation-issue-angularjs-input-text-post5317832.html#post5317832
In this case I need to enter the search string character by character, sending each key with SendKeys inside the loop so that the page registers the input. Below is the working VBA code.
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "https://ecat.aptiv.com"
Do While IE.readyState < READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
' Focus and Select are methods, so they are called rather than assigned a value.
IE.document.getElementById("searchUserInput").Focus
IE.document.getElementById("searchUserInput").Select
sFieldInput = "33188785"
For s = 1 To Len(sFieldInput)
Application.SendKeys Mid(sFieldInput, s, 1)
While IE.readyState < 4 Or IE.Busy
Application.Wait DateAdd("s", 1, Now)
Wend
Next s
' Blur removes focus from the input again.
IE.document.getElementById("searchUserInput").Blur
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
Else
End If
Next doc_ele
End Sub

loop through rows of data to submit web form

Sub AutoLoadAccounts()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate "https://www.urlhere.com/admin/accounts/create"
IE.Visible = True
While IE.busy
DoEvents
Wend
IE.Document.All("title").Value = ThisWorkbook.Sheets("sheet1").Range("a1")
IE.Document.All("names").Value = ThisWorkbook.Sheets("sheet1").Range("b1")
IE.Document.All("floor").Value = 30
IE.Document.getElementById("status").selectedindex = 1
IE.Document.getElementById("email_state").selectedindex = 1
IE.Document.All("id").Value = ThisWorkbook.Sheets("sheet1").Range("c1")
IE.Document.All("years").Value = ThisWorkbook.Sheets("sheet1").Range("d1")
IE.Document.All("submit").Click
End Sub
I use the code above to populate a web form and submit it. I have around 150 rows of data in columns A to D, starting at A1:D1. I am trying to find a way to loop through the rows one by one, submitting the form for each, until the last row is reached.
So essentially it will start on the first row and populate the fields from A1:D1, then once complete go down to the next row and do the same for A2:D2, and so on.
The trick here is to organise your source data. Using two columns you can record the field name and the required value:
    A       B
1   Title   Sample Title
2   Names   Sample Names
3   Floor   Sample Floor
To loop:
Sub AutoLoadAccounts()
Dim IE As Object
Dim cRow As Range ' Current row, used to extract values from Excel.
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate "https://www.urlhere.com/admin/accounts/create"
IE.Visible = True
While IE.busy
DoEvents
Wend
' Executes once for each row in the source range.
For Each cRow In ThisWorkbook.Sheets("sheet1").Range("A1:A3")
' Read field name and value from current row.
IE.Document.All(cRow.Value).Value = cRow.Offset(0, 1)
Next
IE.Document.All("submit").Click
End Sub
This code could be improved. At the moment the source range is hard-coded (Range("A1:A3")). You could improve this so that the code automatically identifies all completed rows in Excel. If you are interested, research the worksheet's UsedRange property.
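For the layout described in the question (one account per row, columns A to D), a loop over all filled rows might look like the sketch below. It is a sketch only: AutoLoadAllRows is a hypothetical name, the last row is found with End(xlUp) on column A rather than UsedRange, and it assumes the create page can simply be reloaded before each submission. The field names are the ones used in the question's code.

```vba
Option Explicit

Sub AutoLoadAllRows()
    Dim IE As Object
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ThisWorkbook.Sheets("sheet1")
    ' Last filled row in column A (assumes column A has no gaps that matter).
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    For r = 1 To lastRow
        ' Load a fresh form for every row.
        IE.Navigate "https://www.urlhere.com/admin/accounts/create"
        Do While IE.Busy Or IE.readyState <> 4
            DoEvents
        Loop

        With IE.Document
            .All("title").Value = ws.Cells(r, "A").Value
            .All("names").Value = ws.Cells(r, "B").Value
            .All("floor").Value = 30
            .getElementById("status").selectedIndex = 1
            .getElementById("email_state").selectedIndex = 1
            .All("id").Value = ws.Cells(r, "C").Value
            .All("years").Value = ws.Cells(r, "D").Value
            .All("submit").Click
        End With

        ' Wait for the submission to finish before moving to the next row.
        Do While IE.Busy Or IE.readyState <> 4
            DoEvents
        Loop
    Next r

    IE.Quit
End Sub
```

If the site needs extra time to process each submission, a short delay after the second wait loop may be necessary.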
EDIT
Added example that reads source data from columns, not rows.
Sub AutoLoadAccounts_Columns()
Dim IE As Object
Dim cRow As Range ' Current cell in the header row, used to extract values from Excel.
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate "https://www.urlhere.com/admin/accounts/create"
IE.Visible = True
While IE.busy
DoEvents
Wend
' Executes once for each column in the source range.
For Each cRow In ThisWorkbook.Sheets("sheet1").Range("A1:C1")
' Read the field name from row 1 and its value from the row below.
IE.Document.All(cRow.Value).Value = cRow.Offset(1, 0).Value
Next
IE.Document.All("submit").Click
End Sub

Fetch Data from HTML Website using VBA - FREEMAPTOOLS.COM

I am trying to input a post code into this website and pull the results into Excel using VBA
http://www.freemaptools.com/find-uk-postcodes-inside-radius.htm
In short you input a post code and set a radius either in miles or KM and it gives you all the post codes within that area. As you can imagine this tool would be very useful!
This is what I have so far:
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
url = "http://www.freemaptools.com/find-uk-postcodes-inside-radius.htm"
ie.Navigate url
state = 0
Do Until state = 4
DoEvents
state = ie.readyState
Loop
It would be good if, say, cell A1 held the post code and cell A2 held the distance in km, and the script read both as its inputs.
I am not 100% sure, but I think I then need to parse the results to put each one into its own cell.
Any help with this would be incredible!
Here you go
Sub postcode()
Dim URL As String, str_output As String, arr_output() As String, row As Long
Dim obj_Radius As Object, obj_Miles As Object, post_code As Object
Dim btn As Object, btn_Radius As Object, tb_output As Object
URL = "http://www.freemaptools.com/find-uk-postcodes-inside-radius.htm"
Dim IE As Object
Set IE = CreateObject("internetexplorer.application")
IE.Visible = True
IE.navigate URL
Do While IE.readystate <> 4
DoEvents
Loop
delay 5
Set obj_Radius = IE.document.getelementbyid("tb_radius")
obj_Radius.Value = ThisWorkbook.Sheets(1).Range("B1")
Set obj_Miles = IE.document.getelementbyid("tb_radius_miles")
obj_Miles.Value = ThisWorkbook.Sheets(1).Range("B2")
Set post_code = IE.document.getelementbyid("goto")
post_code.Value = ThisWorkbook.Sheets(1).Range("B3")
Set btn_Radius = IE.document.getelementsbytagname("Input")
For Each btn In btn_Radius
If btn.Value = "Draw Radius" Then
btn.Click
End If
Next
Do While IE.readystate <> 4
DoEvents
Loop
delay 10
Set tb_output = IE.document.getelementbyid("tb_output")
str_output = tb_output.innerText
arr_output = Split(str_output, ",")
row = 1
For i = LBound(arr_output) To UBound(arr_output)
ThisWorkbook.Sheets(1).Range("C" & row) = arr_output(i)
row = row + 1
Next
End Sub
Private Sub delay(seconds As Long)
Dim endTime As Date
endTime = DateAdd("s", seconds, Now())
Do While Now() < endTime
DoEvents
Loop
End Sub

Resources