Scraping using VBA - excel

i am trying to extract one figure from a gov website, I have done a lot of googling and I am kinda lost for ideas, my code below returns a figure but it isnt the figure I want to get and I am not entirely sure why.
I want to subtract the figure from the 'Cases by Area (Whole Pandemic)' table 'Upper tier LA' section and 'Southend on Sea' Case number.
https://coronavirus.data.gov.uk/details/cases
I stole this code from online somewhere and tried to replicate with my class number I found within F12 section on the site.
Sub ExtractLastValue()
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
objIE.Navigate ("https://coronavirus.data.gov.uk/details/cases")
Do
DoEvents
Loop Until objIE.readystate = 4
MsgBox objIE.document.getElementsByClassName("sc-bYEvPH khGBIg govuk-table__cell govuk-table__cell--numeric ")(0).innerText
Set objIE = Nothing
End Sub

Data comes from the official API and returns a json response dynamically on that page when you click the Upper Tier panel.
Have a look and play with the API guidance
here:
https://coronavirus.data.gov.uk/details/developers-guide
You can make a direct xhr request by following the guidance in the API documentation and then using a json parser to handle the response. For your request it would be something like the following:
https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure=
{"date":"date", "areaName":"areaName","cumCasesByPublishDate":"cumCasesByPublishDate",
"cumCasesByPublishDateRate":"cumCasesByPublishDateRate"}
XHR:
A worked example using jsonconverter.bas as the json parser
Option Explicit
Public Sub GetCovidNumbers()
Dim http As Object, json As Object
Set http = CreateObject("MSXML2.XMLHTTP")
With http
.Open "GET", "https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure={""date"":""date"",""areaName"":""areaName"",""cumCasesByPublishDate"":""cumCasesByPublishDate"",""cumCasesByPublishDateRate"":""cumCasesByPublishDateRate""}", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
Set json = JsonConverter.ParseJson(.responseText)("data")(1)
End With
With ActiveSheet
Dim arr()
arr = json.Keys
.Cells(1, 1).Resize(1, UBound(arr) + 1) = arr
arr = json.Items
.Cells(2, 1).Resize(1, UBound(arr) + 1) = arr
End With
End Sub
Json library (Used in above solution):
I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Internet Explorer:
You could do a slower, more complicated, internet explorer solution where you need to select the utla option when present, then select from the table the desired value:
Option Explicit
Public Sub GetCovidNumbers()
'Tools references Microsoft Internet Controls and Microsoft HTML Object Library
Dim ie As SHDocVw.InternetExplorer, t As Date, ele As Object
Const MAX_WAIT_SEC As Long = 10
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://coronavirus.data.gov.uk/details/cases"
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
t = Timer 'timed loop for element to be present to click on (to get utla)
Do
On Error Resume Next
Set ele = .Document.querySelector("#card-cases_by_area_whole_pandemic [aria-label='Upper tier LA']")
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While ele Is Nothing
If ele Is Nothing Then Exit Sub
ele.Click
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Dim table As MSHTML.HTMLTable, datetime As String, result()
Set table = .Document.querySelector("table[download='cumCasesByPublishDate,cumCasesByPublishDateRate']")
datetime = .Document.querySelector("time").getAttribute("datetime")
result = GetDataForUtla("Southend-on-Sea", datetime, table)
With ActiveSheet
.Cells(1, 1).Resize(1, 4) = Array("Datetime", "Area", "Cases", "Rate per 100,000 population")
.Cells(2, 1).Resize(1, UBound(result) + 1) = result
End With
.Quit
End With
End Sub
Public Function GetDataForUtla(ByVal utla As String, ByVal datetime As String, ByVal table As MSHTML.HTMLTable) As Variant
Dim row As MSHTML.HTMLTableRow, i As Long
For Each row In table.Rows
If InStr(row.outerHTML, utla) > 0 Then
Dim arr(4)
arr(0) = datetime
For i = 0 To 2
arr(i + 1) = row.Children(i).innerText
Next
GetDataForUtla = arr
Exit Function
End If
Next
GetDataForUtla = Array("Not found")
End Function
References:
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector

Related

Web Scraping: Button clicking and help navigating through paths

I am trying to scrape some doctor names and addresses from the website: https://albertafindadoctor.ca/find-a-doc/directory
I am trying to solve the following issue:
Once on the doctor's toggle, I want to pull 4 pieces of data from the entire page, not just the first 25 displayed.
While the code works for the initial webpage, it only pulls the first 25 pieces of data. There are a significant number of other pages that I still need to pull (3822 different doctors).
Unfortunately, I'm at a loss on how to navigate and pull from these different pages. When I inspect elements to see how to navigate between pages a see matrix changing so I'm not sure if that has something to do with it?
Option Explicit
Sub GetAlbertaDoctors()
Dim objIE As InternetExplorer
Dim clinicEle As Object
Dim clinicName As String
Dim clinicAddress As String
Dim clinicCategory As String
Dim doctorName As String
Dim y As Integer
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.Navigate "https://albertafindadoctor.ca/find-a-doc/directory"
While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Wend
Application.Wait (Now + TimeValue("0:00:5"))
objIE.Document.getElementsByClassName("physician-toggle")(0).Click
Application.Wait (Now + TimeValue("0:00:5"))
y = 2
For Each clinicEle In objIE.Document.getElementsByClassName("clinic")
clinicCategory = clinicEle.getElementsByClassName("pcn")(0).innerText
clinicName = clinicEle.getElementsByClassName("clinic-name")(0).innerText
doctorName = clinicEle.getElementsByTagName("h3")(0).innerText
clinicAddress = clinicEle.getElementsByClassName("address")(0).innerText
Sheets("Sheet2").Range("A" & y).Value = clinicCategory
Sheets("Sheet2").Range("B" & y).Value = clinicName
Sheets("Sheet2").Range("C" & y).Value = doctorName
Sheets("Sheet2").Range("D" & y).Value = clinicAddress
y = y + 1
Next
objIE.Quit
End Sub
When I run this, I get the error 91 "Object variable or With block variable not set" on the clicking line:
objIE.Document.getElementsByClassName("physician-toggle active")(0).Click
You don't need to loop all pages. You can use the browser to get to that page and click on Doctors if required. After that, grab the number of results and then mimic the xhr request the page makes for listings - which is returned as json. Alter the query string the page makes i.e. the parameter for limit to get all listings. Use a json parser (I use jsonconverter - instructions in the code for installation) to parse out your info.
There is a proper page load wait and a couple of loops to ensure elements are present. These should really be timed loops. See loop format here.
I add an additional test to ensure you do not attempt to click Doctors when it is not required to do so.
Not all listings has all info hence the On Error Resume Next paired with On Error GoTo 0. Looks like you may be able to build a dictionary to fill in some of the blank values based on existing paired values (or using ids present in json object).
I store all results in an array and write out in one go.
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
' Microsoft Scripting Runtime
'Download and add to standard module called jsonconverter from https://github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas
Public Sub GetListings()
Dim ie As InternetExplorer, s As String, json As Object, newUrl As String
Set ie = New InternetExplorer
With ie
.Visible = True
.Navigate2 "https://albertafindadoctor.ca/find-a-doc/directory"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document.querySelector("[data-cp-option=physician]")
If Not .className = "physician-toggle active" Then .Click
End With
Dim resultsInfo() As String, numResults As Long, ele As Object
Do
On Error Resume Next
Set ele = .document.querySelector(".paginator")
On Error GoTo 0
Loop While ele Is Nothing
Do
Loop While .document.querySelector(".paginator").innerText = vbNullString
resultsInfo = Split(Trim$(.document.querySelector(".paginator").innerText), "of ")
.Quit
End With
numResults = resultsInfo(UBound(resultsInfo))
newUrl = "https://albertafindadoctor.ca/search/directory/physicians?page=1&limit=" & numResults & "&with[]=pcn&with[]=clinics&with[]=languages&with[]=specialties"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", newUrl, False
.send
Set json = JsonConverter.ParseJson(.responseText)("items")
End With
Dim row As Object, results(), r As Long, headers(), ws As Worksheet, key As Variant
headers = Array("clinicCategory", "clinicName", "doctorName", "clinicAddress")
Set ws = ThisWorkbook.Worksheets("Sheet1")
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each row In json
r = r + 1
On Error Resume Next
For Each key In row.keys
Select Case key
Case "clinical_name"
results(r, 3) = row(key)
Case "pcn"
results(r, 1) = row(key)("name")
Case "clinics"
results(r, 2) = row(key)(1)("name")
results(r, 4) = Join$(Array(row(key)(1)("street_address"), row(key)(1)("city"), row(key)(1)("province"), row(key)(1)("postal_code")), ", ")
End Select
Next
On Error GoTo 0
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Sample output:
Reading:
querySelector
json
css selectors
arrays and arrays2

How to scrape data from Bloomberg's website with VBA

Background
Disclaimer: I am a beginner, please bare with my - most plausibly wrong - code.
I want to update currency pairs' value (PREV CLOSE) with a button-enabled-VBA macro. My Excel worksheet contains FX pairs (e.g. USDGBP) on column G:G which are then used to run a FOR loop for every pair in the column.
The value would then be stored in column I:I
Right now, the problem according to the Debugger lies in one line of code that I will highlight below
Sources
I got some inspiration from https://www.youtube.com/watch?v=JxmRjh-S2Ms&t=1050s - notably 17:34 onwards - but I want my code to work for multiple websites at the press of a button.
I have tried the following code
Public Sub Auto_FX_update_BMG()
Application.ScreenUpdating = False 'My computer is not very fast, thus I use this line of
'code to save some computing power and time
Dim internet_object As InternetExplorer
Dim i As Integer
For i = 3 To Sheets(1).Cells(3, 7).End(xlDown).Row
FX_Pair = Sheets(1).Cells(i, 7)
Set internet_object = New InternetExplorer
internet_object.Visible = True
internet_object.navigate "https://www.bloomberg.com/quote/" & FX_Pair & ":CUR"
Application.Wait Now + TimeValue("00:00:05")
internet_object.document.getElementsByClassName("class")(0).getElementsByTagName ("value__b93f12ea") '--> DEBUGGER PROBLEM
'My goal here is to "grab" the PREV CLOSE
'value from the website
With ActiveSheet
.Range(Cells(i, 9)).Value = HTML_element.Children(0).textContent
End With
Sheets(1).Range(Cells(i, 9)).Copy 'Not sure if these 2 lines are unnecesary
ActiveSheet.Paste
Next i
Application.ScreenUpdating = True
End Sub
Expected Result
WHEN I enter "USDGBP" on a cell on column G:G, the macro would go to https://www.bloomberg.com/quote/EURGBP:CUR and "grab" the PREV CLOSE value of 0.8732 (using today's value) and insert it in the respective row of column I:I
As of now, I am just facing the debugger without much idea on how to solve the problem.
You can use class selectors in a loop. The pattern
.previousclosingpriceonetradingdayago .value__b93f12ea
specifies to get child elements with class value__b93f12ea having parent with class previousclosingpriceonetradingdayago. The "." in front is a css class selector and is a faster way of selecting as modern browsers are optimized for css. The space between the two classes is a descendant combinator. querySelector returns the first match for this pattern from the webpage html document.
This matches on the page:
You can see the parent child relationship and classes again here:
<section class="dataBox previousclosingpriceonetradingdayago numeric">
<header class="title__49417cb9"><span>Prev Close</span></header>
<div class="value__b93f12ea">0.8732</div>
</section>
N.B. If you are a Bloomberg customer look into their APIs. Additionally, it is very likely you can get this same info from other dedicated APIs which will allow for much faster and more reliable xhr requests.
VBA (Internet Explorer):
Option Explicit
Public Sub test()
Dim pairs(), ws As Worksheet, i As Long, ie As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set ie = CreateObject("InternetExplorer.Application")
With ws
pairs = Application.Transpose(.Range("G2:G" & .Cells(.rows.Count, "G").End(xlUp).Row).Value) ' assumes pairs start in row 2
End With
Dim results()
ReDim results(1 To UBound(pairs))
With ie
.Visible = True
For i = LBound(pairs) To UBound(pairs)
.Navigate2 "https://www.bloomberg.com/quote/" & pairs(i) & ":CUR", False
While .Busy Or .readyState < 4: DoEvents: Wend
results(i) = .document.querySelector(".previousclosingpriceonetradingdayago .value__b93f12ea").innerText
Next
.Quit
End With
ws.Cells(2, "I").Resize(UBound(results), 1) = Application.Transpose(results)
End Sub
For very limited numbers of requests (as leads to blocking) you could use xhr request and regex out the value. I assume pairs are in sheet one and start from G2. I also assume there are no empty cells or invalid pairs in column G up to an including last pair to search for. Otherwise, you will need to develop the code to handle this.
Try regex here
Option Explicit
Public Sub test()
Dim re As Object, pairs(), ws As Worksheet, i As Long, s As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set re = CreateObject("VBScript.RegExp")
With ws
pairs = Application.Transpose(.Range("G2:G" & .Cells(.rows.Count, "G").End(xlUp).Row).Value) ' assumes pairs start in row 2
End With
Dim results()
ReDim results(1 To UBound(pairs))
With CreateObject("MSXML2.XMLHTTP")
For i = LBound(pairs) To UBound(pairs)
.Open "GET", "https://www.bloomberg.com/quote/" & pairs(i) & ":CUR", False
.send
s = .responseText
results(i) = GetCloseValue(re, s, "previousClosingPriceOneTradingDayAgo%22%3A(.*?)%2")
Next
End With
ws.Cells(2, "I").Resize(UBound(results), 1) = Application.Transpose(results)
End Sub
Public Function GetCloseValue(ByVal re As Object, inputString As String, ByVal pattern As String) As String 'https://regex101.com/r/OAyq30/1
With re
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
If .test(inputString) Then
GetCloseValue = .Execute(inputString)(0).SubMatches(0)
Else
GetCloseValue = "Not found"
End If
End With
End Function
Try below code:
But before make sure to add 2 reference by going to Tools> References > then look for Microsoft HTML Object Library and Microsoft Internet Controls
This code works upon using your example.
Sub getPrevCloseValue()
Dim ie As Object
Dim mySh As Worksheet
Set mySh = ThisWorkbook.Sheets("Sheet1")
Dim colG_Value As String
Dim prev_value As String
For a = 3 To mySh.Range("G" & Rows.Count).End(xlUp).Row
colG_Value = mySh.Range("G" & a).Value
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://www.bloomberg.com/quote/" & colG_Value & ":CUR"
Do While ie.Busy: DoEvents: Loop
Do Until ie.readyState = 4: DoEvents: Loop
'Application.Wait (Now + TimeValue("00:00:03")) 'activate if having problem with delay
For Each sect In ie.document.getElementsByTagName("section")
If sect.className = "dataBox previousclosingpriceonetradingdayago numeric" Then
prev_value = sect.getElementsByTagName("div")(0).innerText
mySh.Range("I" & a).Value = prev_value
Exit For
End If
Next sect
Next a
I have a video tutorial for basic web automation using vba which include web data scraping and other commands, please check the link below:
https://www.youtube.com/watch?v=jejwXID4OH4&t=700s

Cannot get the text inside a <p> tag using VBA

I have the following URL
https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount
I am trying to get the availability of the product by using the following
For i = 2 To lastrow
mylink = wks.Cells(i, 2).Value
ie.Navigate mylink
While ie.Busy Or ie.ReadyState < 4: DoEvents: Wend
t = Timer
Do
DoEvents
On Error Resume Next
Set instock = ie.Document.querySelector(".stock.in-stock").innerText
If instock Is Nothing Then
Set availability = ie.Document.querySelector(".stock.out-of-stock").innerText
Else
Set availability = instock
End If
wks.Cells(i, "D") = availability
If Timer - t > MAX_WAIT_SEC Then Exit Do
On Error GoTo 0
Loop
Next i
But I get allways nothing on
Set instock = ie.Document.querySelector(".stock.in-stock").innerText
I checked the query on
https://try.jsoup.org/
It is working
What I am doing wrong here? There is not any id to target only class name
<p class="stock in-stock">Διαθέσιμο</p>
So, what's happening here is that you're trying to Set string datatype innerText to object variable instock. The reason it's returning Nothing is because your On Error Resume Next statement is suppressing the error message. If you took that out and ran it, you would get a Type Mismatch. What you'd need to do is split it into a line that assigns the object to the object variable and then a line that reads the innerText of the assigned object.
Set instock = ie.Document.querySelector(".stock.in-stock")
If instock Is Nothing Then
Set availability = ie.Document.querySelector(".stock.out-of-stock")
Else
Set availability = instock
End If
wks.Cells(i, "D") = availability.innerText
There is a better, faster way. Use xmlhttp and parse that info out of the json stored in one of the script tags. If issuing large numbers of requests you may need to add a wait every x number of requests in case of throttling/blocking. Note: You can use the same approach with InternetExplorer and thus remove many of your lines of code, though you have another library (.bas) dependancy.
You need to install jsonconverter.bas from here and go vbe > tools > references > and add a reference to Microsoft Scripting Runtime
Option Explicit
Public Sub GetStocking()
Dim json As Object, html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount", False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
End With
Set json = JsonConverter.ParseJson(html.querySelector("script[type='application/ld+json']").innerHTML)
Debug.Print json("offers")("availability")
End Sub
This is what the entire json contains:
Internet Explorer version:
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, i As Long, s As String, scripts As Object, json As Object
With ie
.Visible = False
.Navigate2 "https://www.wavemotion.gr/shop/smartphone-accessories/easy-one-touch-wireless-fast-charging-mount"
While .Busy Or .readyState < 4: DoEvents: Wend
Set scripts = .document.querySelectorAll("script[type='application/ld+json']")
For i = 0 To scripts.Length - 1
s = scripts.item(i).innerHTML
If InStr(s, "availability") > 0 Then
Set json = JsonConverter.ParseJson(s)
Exit For
End If
Next
.Quit
If Not json Is Nothing Then Debug.Print json("offers")("availability")
End With
End Sub

Object does not support this property of method, while parsing html document

I want to scrap every post heading form this blog. I am using the code bellow but it's giving me an error "Run time error 438 object does not support this property or method" in line
Cells(i, 1).Value = ele.getElementsByClassName("entry-title")(0).getElementsByTagName("a")(0).innerText
The code is:
Private Sub CommandButton1_Click()
Dim bot As Object
Dim doc As New HTMLDocument
Dim ele As HTMLElementCollection
Dim i As Long
Set bot = CreateObject("MSXML2.XMLHTTP")
bot.Open "GET", "http://themakeupblogger.com/makeup/", False
bot.send
doc.body.innerHTML = bot.responseText
For Each ele In doc.getElementsByTagName("article")
i = Cells(Rows.Count, 1).End(xlUp).Row + 1
Cells(i, 1).Value = ele.getElementsByClassName("entry-title")(0).getElementsByTagName("a")(0).innerText
Next ele
End Sub
Give this a shot and get all the titles you are after.
Sub demo()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim r As Long, ele As Object
With http
.Open "GET", "http://themakeupblogger.com/makeup/", False
.send
html.body.innerHTML = .responseText
End With
For Each elem In html.getElementsByClassName("entry-title")
With elem.getElementsByTagName("a")
If .Length Then r = r + 1: Cells(r, 1) = .Item(0).innerText
End With
Next elem
End Sub
Reference to add to the library:
1. Microsoft XML, v6.0
2. Microsoft HTML Object Library
Partial results:
4 High-Coverage Foundations That Might As Well Be Skincare
10 Memorial Day Beauty Essentials That Belong In Your Beach Bag
Don’t Get Married Without These Wedding Day Makeup Tips (Courtesy of a Makeup Artist)
To get the articles from that page you can do something like:
Sub demo()
Dim http As New InternetExplorer, html As New HTMLDocument
Dim r As Long, elem As Object
With http
.Visible = False
.navigate "http://themakeupblogger.com/makeup/"
Do Until .readyState = READYSTATE_COMPLETE: Loop
Set html = .document
End With
For Each elem In html.getElementsByTagName("article")
With elem.getElementsByTagName("h1")
If .Length Then r = r + 1: Cells(r, 1) = .Item(0).getElementsByTagName("a")(0).innerText
End With
With elem.getElementsByTagName("div")(3).getElementsByTagName("p")
If .Length Then Cells(r, 2) = .Item(0).innerText
End With
Next elem
End Sub
This time the reference you should add to the library:
1. Microsoft Internet Controls
2. Microsoft HTML Object Library

web scraping using excel and VBA

i wrote my VBA code in excel sheet as below but it is not scrape data for me and also i don't know why please any one help me. it gave me reullt as "click her to read more" onlyi want to scrape enitre data such as first name last name state zip code and so on
Sub extractTablesData()
Dim IE As Object, obj As Object
Dim myState As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
myState = InputBox("Enter the city where you wish to work")
With IE
.Visible = True
.navigate ("http://www.funeralhomes.com/go/listing/Search? name=&city=&state=&country=USA&zip=&radius=")
While IE.readyState <> 4
DoEvents
Wend
For Each obj In IE.document.all.item("state").Options
If obj.innerText = myState Then
obj.Selected = True
End If
Next obj
IE.document.getElementsByValue("Search").item.Click
Do While IE.Busy: DoEvents: Loop
ThisWorkbook.Sheets("Sheet1").Range("A1:K1500").ClearContents
Set elemCollection = IE.document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Using the same URL as the answer already given you could alternatively select with CSS selectors to get the elements of interest, and use split to get just the names and address parts from the text. We can also do away with the browser altogether to get faster results from first results page.
Business name:
You can get the name with the following selector (using paid listing example):
div.paid-listing .listing-title
This selects (sample view)
Try
Address info:
The associated descriptive information can be retrieved with the selector:
div.paid-listing .address-summary
And then using split we can parse this into just the address information.
Code:
Option Explicit
Public Sub GetTitleAndAddress()
Dim oHtml As HTMLDocument, nodeList1 As Object, nodeList2 As Object, i As Long
Const URL As String = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", URL, False
.send
oHtml.body.innerHTML = .responseText
End With
Set nodeList1 = oHtml.querySelectorAll("div.paid-listing .listing-title")
Set nodeList2 = oHtml.querySelectorAll("div.paid-listing .address-summary")
With Worksheets("Sheet3")
.UsedRange.ClearContents
For i = 0 To nodeList1.Length - 1
.Range("A" & i + 1) = nodeList1.Item(i).innerText
.Range("B" & i + 1) = Split(nodeList2.Item(i).innerText, Chr$(10))(0)
Next i
End With
End Sub
Example output:
Yeah, without an API, this can be very tricky at best, and very inconsistent at worst. For now, you can try the script below.
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
DoEvents
Loop
RowCount = 1
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
If itm.classname Like "*free-listing*" Or itm.classname Like "*paid-listing*" Then
.Range("A" & RowCount) = itm.classname
.Range("B" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
End If
Next itm
End With
End Sub
You probably want some kind of input box to capture the city and state and radius from the user, or capture those variable in cells in your worksheet.
Notice, the '%20' is a space character.
I got this idea from a friend of mine, Joel, a long time ago. That guy is great!

Resources