How to create a HTTP GET in Excel VBA - excel

Good afternoon all,
I'm working on a project for work where I need to get vehicle values based on registration number and mileage and feed these into an Excel spreadsheet.
The registration number and mileage are stored in the spreadsheet, but I'm stuck on where to start.
I created a rough VBA application last weekend which looked as follows:
Sub GetHTMLDocument()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim Email As MSHTML.IHTMLElement
Dim Password As MSHTML.IHTMLElement
Dim LoginButton As MSHTML.IHTMLElement
Dim REG As MSHTML.IHTMLElement
Dim Mileage As MSHTML.IHTMLElement
Dim CAPGo As MSHTML.IHTMLElement
Dim objEvent
Dim GetValue As MSHTML.IHTMLElement
'Show IE for testing purposes
IE.Visible = True
'Navigate to web page
IE.Navigate "https://valuationanywhere.cap.co.uk/LoginPage?ReturnUrl=%2f%3f__hstc%3d208265677.8bb2d3e6c872f15cd37070c17648ee29.1549763639794.1549763639794.1549763639794.1%26__hssc%3d208265677.1.1549763639794%26__hsfp%3d959865525&__hstc=208265677.8bb2d3e6c872f15cd37070c17648ee29.1549763639794.1549763639794.1549763639794.1&__hssc=208265677.1.1549763639794&__hsfp=959865525"
'Loop an empty loop until done
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
'inputs email address
Set Email = HTMLDoc.getElementById("inputLoginEmail")
Email.Value = "email"
'inputs password
Set Password = HTMLDoc.getElementById("inputLoginPassword")
Password.Value = "password"
'Clicks login button
Set LoginButton = HTMLDoc.getElementById("btnLogin")
LoginButton.Click
'Wait 3 seconds for page to load
Application.Wait (Now + TimeValue("0:00:03"))
Set objEvent = IE.Document.createEvent("HTMLEvents")
'Input REG into text box
Set REG = HTMLDoc.getElementById("vrm")
REG.Value = "reg"
'Input mileage into text box
Set Mileage = HTMLDoc.getElementById("mileage")
Mileage.Value = "181000"
'Fakes data entry as no focus is given to the text box
objEvent.initEvent "change", False, True
REG.dispatchEvent objEvent
Mileage.dispatchEvent objEvent
'Clicks Go button
Set tags = IE.Document.getElementsByTagName("button")
For Each tagx In tags
If tagx.innerText = "Go" Then
tagx.Click
Exit For
End If
Next
'Wait 3 seconds for popup to load
Application.Wait (Now + TimeValue("0:00:03"))
Set tags = IE.Document.getElementsByTagName("button")
For Each tagx In tags
If tagx.innerText = "Create NEW Valuation" Then
tagx.Click
Exit For
End If
Next
End Sub
This would navigate to the page, log me in and search for a valuation. However, we will eventually have a database of hundreds of cars we want valuations for, and our CAP service has some plugins here - https://soap.cap.co.uk/vrm/capvrm.asmx?op=VRMValuation
Is there any way I can have VBA pick a reg and mileage from a sheet, and pull back the value?
I'm not expecting anyone to write the entire thing; I would love to learn from this. But can anyone point me in the right direction?
Kindest regards,
Craig

In essence, you can read a two-column range from Excel, with reg in column A and mileage in column B, into a 2D array, then loop dimension 1 of the array from LBound to UBound (i.e. the rows) and access the reg and mileage by indexing into the array. You can then concatenate those values into the body of the POST request. The code below is, understandably, very high level. You would read the response into an XML document so you can parse out the info you want.
To show how to retrieve the values, we would need to see the relevant XML.
Option Explicit
Public Sub Test()
'VBE > Tools > References > Add a reference to Microsoft HTML Object Library
'other code
Dim regAndMileage(), xmlDoc As Object
Dim ws As Worksheet, r As Long, placeholderMileage As String, placeholderVR As String, body As String, response As String, html As HTMLDocument
Const SUBSCRIBER_ID As Long = 123
Const PASSWORD As String = "ABC"
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
Set xmlDoc = CreateObject("MSXML2.DOMDocument")
regAndMileage = ws.Range("A2:B4").Value 'Create the array. Reg is in col A and mileage in col B. Check datatypes when passed are as expected (int - though Long should work; and string)
body = "<?xml version=""1.0"" encoding=""utf-8""?>"
body = body & Chr$(10) & "<soap:Envelope xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">"
body = body & Chr$(10) & "<soap:Body>"
body = body & Chr$(10) & "<VRMValuation xmlns=""https://soap.cap.co.uk/vrm"">"
body = body & Chr$(10) & "<SubscriberID>" & SUBSCRIBER_ID & "</SubscriberID>" 'int
body = body & Chr$(10) & "<Password>" & PASSWORD & "</Password>" 'string
body = body & Chr$(10) & "<VRM>placeholderVRM</VRM>" 'string
body = body & Chr$(10) & "<Mileage>placeholderMileage</Mileage>" 'Mileage
body = body & Chr$(10) & "<StandardEquipmentRequired>boolean</StandardEquipmentRequired>"
body = body & Chr$(10) & "</VRMValuation>"
body = body & Chr$(10) & "</soap:Body>"
body = body & Chr$(10) & "</soap:Envelope>"
Dim reg As String, mileage As String
With CreateObject("MSXML2.XMLHTTP")
For r = LBound(regAndMileage, 1) To UBound(regAndMileage, 1)
reg = regAndMileage(r, 1) 'column A
mileage = regAndMileage(r, 2) 'column B
.Open "POST", "protocol&domain/vrm/capvrm.asmx/VRMValuation", False
.setRequestHeader "SOAPAction", "https://soap.cap.co.uk/vrm/VRMValuation"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
'swap the placeholder text in the body template for this row's reg and mileage
.send Replace$(Replace$(body, "placeholderVRM", reg), "placeholderMileage", mileage)
response = .responseText
With xmlDoc
.validateOnParse = True
.setProperty "SelectionLanguage", "XPath"
.async = False
If Not .LoadXML(response) Then
Err.Raise .parseError.ErrorCode, , .parseError.reason
End If
End With
'Do something to extract values
Next
End With
End Sub
See this for more info on Ranges and arrays.
Potentially you need to add Content Length and other information in the request.
SOAP requests
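As a minimal sketch of the "concatenate reg and mileage into the body" step described above, the helper below fills the two placeholder strings for one row. The names FillVrmBody, XmlEscape and DemoFillVrmBody are hypothetical and not part of the CAP service or the answer's code; the escaping step is an added precaution in case a cell contains characters such as & or <.

```vba
Option Explicit

' Hypothetical helper: escape the characters that are special in XML text.
Private Function XmlEscape(ByVal s As String) As String
    s = Replace$(s, "&", "&amp;")   ' must run first so later entities are not re-escaped
    s = Replace$(s, "<", "&lt;")
    s = Replace$(s, ">", "&gt;")
    s = Replace$(s, """", "&quot;")
    s = Replace$(s, "'", "&apos;")
    XmlEscape = s
End Function

' Hypothetical helper: fill the two placeholders in a body template for one row.
Public Function FillVrmBody(ByVal template As String, ByVal reg As String, ByVal mileage As Long) As String
    Dim body As String
    body = Replace$(template, "placeholderVRM", XmlEscape(Trim$(reg)))
    body = Replace$(body, "placeholderMileage", CStr(mileage))
    FillVrmBody = body
End Function

Public Sub DemoFillVrmBody()
    Dim template As String
    ' Shortened template for illustration only; the real one is the full SOAP envelope.
    template = "<VRM>placeholderVRM</VRM><Mileage>placeholderMileage</Mileage>"
    Debug.Print FillVrmBody(template, " AB12 CDE ", 181000)
End Sub
```

Inside the loop, the result of FillVrmBody could be passed to .send in place of the nested Replace$ calls.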

Related

vba web scrapping

I have been able to launch the website and enter a value, but I'm not able to press Enter to get the output value. It's an online calculator that shows the temperature of a refrigerant for a given pressure. The output (temperature) is supposed to change when you change the input (pressure), but when I do it the output stays the same even though the input changes. Could you tell me what might be the reason for this? Is the website blocking my input? I get this error: "Run-time error '91': Object variable or With block variable not set"
when I run this line: ht.getElementById("pressure").Value = "20"
Here is my code:
Sub test()
Dim ie As InternetExplorer
Dim ht As HTMLDocument
Dim temp As Object
Dim press As Object
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate ("https://reftools.danfoss.com/spa/tools/ref-slider#/")
Do Until ie.readyState = READYSTATE_COMPLETE
DoEvents
Loop
Set ht = ie.document
'input
ht.getElementById("pressure").Value = "20"
'Set temp = ht.getElementsByClassName("temperature")
'Set press = ht.getElementsByClassName("pressure")
'i = 1
'For Each te In temp
'Sheet1.Cells(i, 1).Value = te.innerText
'i = i + 1
'Next
End Sub
There is an API you can call to perform the same conversion. Below is an example implementation of calling that API using a custom function which accepts the pressure as input and returns the temperature in Celsius.
Option Explicit
Public Function GetTempFromPressure(ByVal pressure As Double) As Double
Dim body As String
body = "{""pressure"":""" & pressure & """,""refId"":""r404a"",""temperatureUnit"":""celsius"",""pressureUnit"":""bar"","
body = body & """pressureReferencePoint"":""absolute"","
body = body & """pressureCalculationPoint"":""dew"",""gaugeType"":""dry"",""altitudeInMeter"":0}"
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", "https://reftools.danfoss.com/api/ref-slider/temperature?refId=r404a", False
.setRequestHeader "content-type", "application/json; charset=utf-8"
.send body
GetTempFromPressure = CDbl(.responseText)
End With
End Function
Public Sub test()
MsgBox GetTempFromPressure(20)
End Sub
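The function above converts the response with CDbl directly, which raises a type mismatch if the server returns anything other than a number. A minimal sketch of a more defensive conversion follows. It assumes, as the original code does, that a successful response is plain numeric text with a period as the decimal separator; TryParseTemperature is a hypothetical name.

```vba
Option Explicit

' Hypothetical helper: returns True and sets temperature only when the
' response looks like a plain number such as -5.5 or "12.3".
Public Function TryParseTemperature(ByVal responseText As String, ByRef temperature As Double) As Boolean
    Dim cleaned As String
    cleaned = Trim$(Replace$(responseText, """", ""))              ' drop surrounding quotes, if any
    If Len(cleaned) = 0 Or cleaned Like "*[!0-9.-]*" Then Exit Function  ' reject anything but digits, "." and "-"
    If Not cleaned Like "*#*" Then Exit Function                   ' need at least one digit
    temperature = Val(cleaned)                                     ' Val always treats "." as the decimal separator
    TryParseTemperature = True
End Function

Public Sub DemoTryParseTemperature()
    Dim t As Double
    If TryParseTemperature("-5.5", t) Then Debug.Print t
    If Not TryParseTemperature("<html>error</html>", t) Then Debug.Print "not a number"
End Sub
```

Checking .Status = 200 before parsing would be a sensible companion to this.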

Scrape economic data from investing website

I am working on code to get data from: https://www.investing.com/economic-calendar/core-durable-goods-orders-59
I already have code that gets historical data via an HTTP request, and I am looking to change it to work for the economic data (link above). Is there any way I can do the same for the economic indicators?
Code below:
Option Explicit
Sub Export_Table()
'Html Objects---------------------------------------'
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Dim ieTable As MSHTML.HTMLTable
Dim Element As MSHTML.HTMLElementCollection
'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
Dim wb As Workbook
Dim Table As Worksheet
Dim i As Long
Set wb = ThisWorkbook
Set Table = wb.Worksheets("Sheet1")
'-------------------------------------------'
Dim xmlHttpRequest As New MSXML2.XMLHTTP60 '
'-------------------------------------------'
i = 2
'Web Request --------------------------------------------------------------------------'
With xmlHttpRequest
.Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
If .Status = 200 Then
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
htmlBody.innerHTML = xmlHttpRequest.responseText
Set ieTable = htmlDoc.getElementById("curr_table")
For Each Element In ieTable.getElementsByTagName("tr")
Table.Cells(i, 1) = Element.Children(0).innerText
Table.Cells(i, 2) = Element.Children(1).innerText
Table.Cells(i, 3) = Element.Children(2).innerText
Table.Cells(i, 4) = Element.Children(3).innerText
Table.Cells(i, 5) = Element.Children(4).innerText
Table.Cells(i, 6) = Element.Children(5).innerText
Table.Cells(i, 7) = Element.Children(6).innerText
i = i + 1
DoEvents: Next Element
End If
End With
Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing
End Sub
Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
I have used the Excel tool Power Query for this very thing. It is also called Get & Transform Data. I don't like using VBA for this type of thing.
To make it work:
In Excel, go to Data > Get Data > From Other Sources > From Web.
Enter the URL.
Wait for the webpage to load and then pick the table that you want.
This website took a while to load, but it did work for me.
Choose "Load", which goes directly to the sheet, or "Transform Data" to manipulate the data in Power Query. There are many options in Power Query, such as splitting columns, filtering data, calculating columns and so on.
I would avoid the overhead of setting up a permanent connection and simply continue using XHR. With Data > From Web, you cannot grab more rows than are present on the initial landing page. If, however, you go with XHR, you can issue POST requests to get more data. The code below uses a loop to retrieve additional results beyond those immediately visible on the page.
When you press the Show more link there is a POST request for an additional 6 rows which uses the latest date from the current set of results as part of the POST body. The response returned is JSON. Rather than bringing in a JSON parser, given the standard nature of the JSON, and that I am already using regex to clean the date format in column 1 to put in the POST body, I use two simple regexes to extract the html for the next results table from the response, and to check whether there are more results.
The format of the JSON is:
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": "1"
}
Or
{
"historyRows": "<tr>…..</tr>",
"hasMoreHistory": false
}
So, I do some cleaning of the extracted html in order to not confuse the html parser within MSHTML. Furthermore, I add in an id to identify the table I have constructed, so I can continue to use an id css selector (#) list within my UpdateDateResults function.
I initially oversize an array to store each retrieved table which I update ByRef. I loop requesting more results until either there are no more results, there is an error parsing the maximum date from the last retrieved table column 1, or until my specified earliest date for data retrieval falls within the date range of the latest returned table.
Finally, I write the results array out to the sheet in one go.
N.B. You can target the table by its id. It looks like the number at the end of the id could be the same as for the goods url, lending itself to generalizing the code below to work for other goods.
VBA:
Option Explicit
Public Sub GetInvestingInfo()
'tools > references > Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, xhr As Object
Const STARTDATE As Date = "2019-11-25" 'Adjust as required. DateAdd("yyyy", -2, Date) 2 years back. This means may have some earlier months in _
batch that spans the start date but won't issue an additional request after this
Set xhr = CreateObject("MSXML2.XMLHTTP")
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.investing.com/economic-calendar/core-durable-goods-orders-59", False
.setRequestHeader "User-Agent", "Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim firstTable As Boolean, r As Long, results() As Variant
ReDim results(1 To 100000, 1 To 5)
'process initial table and update results, get cleaned date needed for request for more results
firstTable = True
Dim latestDate As String
UpdateDateResults latestDate, results, firstTable, r, html
Dim re As Object, maxDate As String, hasMoreHistory As Boolean, s As String
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = False
End With
maxDate = cleanedDate(latestDate, re)
hasMoreHistory = True
Dim errorDate As Date
errorDate = DateAdd("d", 1, Date)
Do While maxDate >= STARTDATE And maxDate < errorDate 'break loop using pre-defined earliest date, error with date conversion, or when no more rows found
Application.Wait (Now + TimeSerial(0, 0, 1)) 'Pause
s = GetMoreRows(xhr, Format$(maxDate, "YYYY-MM-DD")) 'make a POST request for more data
re.Pattern = "hasMoreHistory"":(""?.*?""?)}" 'Check if there are more rows still available. "1" for yes, false for no
hasMoreHistory = (re.Execute(s)(0).submatches(0) <> False)
If Not hasMoreHistory Then Exit Do
re.Pattern = "historyRows"":""(.*)"","
html.body.innerHTML = "<table id=""me"">" & Replace$(re.Execute(s)(0).submatches(0), "\/", "/") & "</table>" 'fix html and feed into html variable
UpdateDateResults latestDate, results, firstTable, r, html
maxDate = cleanedDate(latestDate, re) 'convert value retrieved from last row in date column of table to an actual date
Loop
With ActiveSheet
.Cells.ClearContents
.Cells(1, 1).Resize(r, 5) = results 'Don't bother to resize results as all cells are cleared before the write out
End With
End Sub
Public Sub UpdateDateResults(ByRef latestDate As String, ByRef results() As Variant, ByRef firstTable As Boolean, ByRef r As Long, ByVal html As MSHTML.HTMLDocument)
Dim table As MSHTML.HTMLTable 'return latest date from function
Set table = html.querySelector("#eventHistoryTable59, #me")
latestDate = table.Rows(table.Rows.Length - 1).Children(0).innerText
Dim i As Long, n As Long, j As Long
n = IIf(firstTable, 0, 1)
For i = n To table.Rows.Length - 1
r = r + 1
For j = 0 To table.Rows(i).Children.Length - 2
results(r, j + 1) = table.Rows(i).Children(j).innerText
Next
Next
firstTable = False
End Sub
Public Function cleanedDate(ByVal dirtyString As String, ByVal re As Object) As Date
re.Pattern = "(^[A-Z][a-z]{2}).*(\d{2}),.(\d{4})(.*)"
On Error GoTo errhand:
If re.test(dirtyString) Then
cleanedDate = CDate(re.Replace(dirtyString, "$2" & Chr$(32) & "$1" & Chr$(32) & "$3"))
Exit Function
End If
errhand:
cleanedDate = DateAdd("d", 1, Date)
End Function
Public Function GetMoreRows(ByVal xhr As Object, ByVal dateStamp As String) As String
With xhr
.Open "POST", "https://www.investing.com/economic-calendar/more-history", False
.setRequestHeader "User-Agent", "Safari/537.36"
.setRequestHeader "x-requested-with", "XMLHttpRequest"
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.send "eventID=430865&event_attr_ID=59&event_timestamp=" & dateStamp & "+" & Application.WorksheetFunction.EncodeURL("12:30:00") & "&is_speech=0"
GetMoreRows = .responseText
End With
End Function
Regexes (without the double " escaping for VBA):
hasMoreHistory":("?.*?"?)}
historyRows":"(.*)",
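To see the two regexes at work without making a request, here is a minimal sketch that applies them to a made-up response string in the JSON format shown earlier. ParseMoreHistory and DemoParseMoreHistory are hypothetical names, and the sample JSON is invented for illustration. The flag is compared as text after stripping quotes, a small departure from the comparison used in the main code.

```vba
Option Explicit

' Hypothetical helper: pull the rows HTML and the "more rows?" flag out of one response.
' Returns False if either pattern fails to match.
Public Function ParseMoreHistory(ByVal json As String, ByRef rowsHtml As String, ByRef hasMore As Boolean) As Boolean
    Dim re As Object, flag As String
    Set re = CreateObject("VBScript.RegExp")

    re.Pattern = "historyRows"":""(.*)"","
    If Not re.Test(json) Then Exit Function
    rowsHtml = Replace$(re.Execute(json)(0).SubMatches(0), "\/", "/")   ' undo the JSON escaping of "/"

    re.Pattern = "hasMoreHistory"":(""?.*?""?)}"
    If Not re.Test(json) Then Exit Function
    flag = Replace$(re.Execute(json)(0).SubMatches(0), """", "")          ' strip quotes around the value
    hasMore = (flag = "1")

    ParseMoreHistory = True
End Function

Public Sub DemoParseMoreHistory()
    Dim json As String, rowsHtml As String, hasMore As Boolean
    ' Invented sample in the documented shape
    json = "{""historyRows"":""<tr><td>Nov 27, 2019<\/td><\/tr>"",""hasMoreHistory"":""1""}"
    If ParseMoreHistory(json, rowsHtml, hasMore) Then
        Debug.Print rowsHtml
        Debug.Print hasMore
    End If
End Sub
```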

Parse certain values from Google Maps XML to Excel cells

I have a link "https://maps.googleapis.com/maps/api/distancematrix/xml?origins=..." that I access to get XML file data.
XML file:
<DistanceMatrixResponse>
<status>OK</status>
<origin_address>London, UK</origin_address>
<destination_address>Manchester, UK</destination_address>
<row>
<element>
<status>OK</status>
<duration>
<value>14735</value>
<text>4 hours 6 mins</text>
</duration>
<distance>
<value>335534</value>
<text>336 km</text>
</distance>
</element>
</row>
</DistanceMatrixResponse>
The XML file structure is always the same. I need to get <text>4 hours 6 mins</text> into cell A1 in the form 4,6 and <text>336 km</text> into cell A2 as 336, on the sheet "Contact database", let's say. A further problem is that <text>4 hours 6 mins</text> can sometimes be <text>1 hour 3 min</text>. I can do it with a formula, but is it possible with VBA?
I have managed to make it work so that the whole XML file data is in cell A1. However, I am not able to separate out what I need and paste it into two different cells.
Sub GoogleAPI1()
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.serverXMLHTTP")
Dim myurl As String
myurl = "https://maps.googleapis.com/maps/api/distancematrix/xml?origins=" & ThisWorkbook.Worksheets("Contact database").Range("R86").Value _
& "&destinations=" & ThisWorkbook.Worksheets("Contact database").Range("R87").Value & "&mode=" & ThisWorkbook.Worksheets("Contact database").Range("R88").Value _
& "&key=" & ThisWorkbook.Worksheets("Contact database").Range("R82").Value
xmlhttp.Open "GET", myurl, False
xmlhttp.send
ThisWorkbook.Worksheets("Contact database").Range("R92") = xmlhttp.responseText
End Sub
Here's one way to use VBA to get the results you describe.
I extracted the node information, and then processed it using Regular Expressions to get it into the format you described.
Could probably do it more efficiently, and with more or different error-checking, but this might get you started.
Option Explicit
Sub getDurDist()
'Requires a reference to Microsoft XML, v6.0 (Tools > References)
Dim xmlDoc As DOMDocument60
Dim xmlNode As IXMLDOMNode
Dim sTemp As String
Dim RE As Object, MC As Object
Dim rDest As Range
Set xmlDoc = New DOMDocument60
'hard coded here. Change to suit
Set rDest = Range("B1:C1")
rDest.Clear
xmlDoc.LoadXML Range("a1")
Set xmlNode = xmlDoc.SelectSingleNode("//duration/text")
sTemp = xmlNode.Text
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\d+"
If .test(sTemp) = True Then
Set MC = .Execute(sTemp)
rDest(1, 1) = MC(0) & "," & MC(1)
End If
End With
Set xmlNode = xmlDoc.SelectSingleNode("//distance/text")
sTemp = xmlNode.Text
With RE
If .test(sTemp) = True Then
Set MC = .Execute(sTemp)
rDest(1, 2) = MC(0)
End If
End With
End Sub
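The code above assumes the duration text always contains two numbers, so MC(1) would fail for a value such as "45 mins". As one possible way to add the error-checking mentioned, here is a minimal sketch that looks for the hour and minute parts separately. DurationToHoursMinutes is a hypothetical name, and the function only handles hours and minutes, not days.

```vba
Option Explicit

' Hypothetical helper: "4 hours 6 mins" -> "4,6", "1 hour 3 min" -> "1,3", "45 mins" -> "0,45"
Public Function DurationToHoursMinutes(ByVal durationText As String) As String
    Dim re As Object
    Dim hours As String, mins As String

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    hours = "0"
    mins = "0"

    re.Pattern = "(\d+)\s*hour"     ' matches "hour" and "hours"
    If re.Test(durationText) Then hours = re.Execute(durationText)(0).SubMatches(0)

    re.Pattern = "(\d+)\s*min"      ' matches "min" and "mins"
    If re.Test(durationText) Then mins = re.Execute(durationText)(0).SubMatches(0)

    DurationToHoursMinutes = hours & "," & mins
End Function

Public Sub DemoDurationToHoursMinutes()
    Debug.Print DurationToHoursMinutes("4 hours 6 mins")
    Debug.Print DurationToHoursMinutes("45 mins")
End Sub
```

In getDurDist, the result could be written to rDest(1, 1) in place of MC(0) & "," & MC(1).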

How to get the first search result link of a google search using VBA?

In my day to day tasks I currently have to search a large number of products and gather information on these products. So my idea is to search the product on google and get the info from the first search result by extracting the data from the product title section and pretty much loop this for a number of products.
Here is my code below so far:
Sub SkuAutomation()
Dim ie As Object
'Navigates to google
Set ie = CreateObject("InternetExplorer.application")
ie.Visible = True
ie.Navigate "https://google.co.uk/search?q=" & Worksheets("sheet1").Cells(9, 4).Value & " " & Worksheets("sheet1").Cells(9, 2)
'Waits for page to load before next action
Do While ie.ReadyState <> READYSTATE_COMPLETE
Loop
End Sub
I just want to add a piece of code which either clicks on the first link that Google returns or returns the link for me. My idea would then be to scrape the data from the product title section of that page! Still very early stages though.
I am just a beginner so any type of help would be much appreciated! Many thanks in advance.
Your mileage will likely vary on this but for what you have provided you can use a CSS selector combination to target the first link by the page styling.
I use #search div.r [href*=http] but you could simplify to #search .r a. I am interested in knowing there is an http in the href though.
The # is an id selector, a space " " is a descendant combinator (it selects a descendant of the preceding element) and the [] is an attribute selector. A "." is a class selector, i.e. it selects an element by class name.
I am looking for the first element with an href attribute containing http in its value, inside a div element with class name r, which is itself inside an element with an id of search.
Option Explicit
Public Sub GetLink()
Dim ie As New InternetExplorer
With ie
.Visible = True
.navigate "https://google.co.uk/search?q=Currys+241825"
While .Busy Or .readyState < 4: DoEvents: Wend
Debug.Print .document.querySelector("#search div.r [href*=http]").href
.Quit
End With
End Sub
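The selector described above can also be tried against a string of HTML, without launching a browser. The sketch below is illustrative only: FirstHttpLink is a hypothetical name, the markup is invented, and it assumes a reference to the Microsoft HTML Object Library is set.

```vba
Option Explicit

' Requires: Tools > References > Microsoft HTML Object Library
' Hypothetical helper: returns the href of the first match, or "" if nothing matches.
Public Function FirstHttpLink(ByVal markup As String) As String
    Dim doc As MSHTML.HTMLDocument, node As Object
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = markup
    ' id "search" > descendant div with class "r" > descendant with "http" in its href
    Set node = doc.querySelector("#search div.r [href*=http]")
    If Not node Is Nothing Then FirstHttpLink = node.href
End Function

Public Sub DemoFirstHttpLink()
    Dim markup As String
    ' Invented markup: the first link has no "http" in its href, so it is skipped
    markup = "<div id=""search""><div class=""r"">" & _
             "<a href=""#top"">skip</a>" & _
             "<a href=""https://example.com/p"">hit</a>" & _
             "</div></div>"
    Debug.Print FirstHttpLink(markup)
End Sub
```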
This is how I would do it. Put some search criteria in cell A2, going down column A as far as you want to go. Then run the code below. The results will go into the adjacent cells in column B.
girafe
rhino
starbucks
Sub Gethits()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
Dim var As String
Dim var1 As Object
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set var1 = html.getelementbyid("resultStats")
Cells(i, 2).Value = var1.innerText
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub

web scraping using excel and VBA

I wrote my VBA code in an Excel sheet as below, but it does not scrape the data for me and I don't know why. Can anyone please help? It gives me the result "Click here to read more" only. I want to scrape the entire data, such as first name, last name, state, zip code and so on.
Sub extractTablesData()
Dim IE As Object, obj As Object
Dim myState As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
myState = InputBox("Enter the city where you wish to work")
With IE
.Visible = True
.navigate ("http://www.funeralhomes.com/go/listing/Search? name=&city=&state=&country=USA&zip=&radius=")
While IE.readyState <> 4
DoEvents
Wend
For Each obj In IE.document.all.item("state").Options
If obj.innerText = myState Then
obj.Selected = True
End If
Next obj
IE.document.getElementsByValue("Search").item.Click
Do While IE.Busy: DoEvents: Loop
ThisWorkbook.Sheets("Sheet1").Range("A1:K1500").ClearContents
Set elemCollection = IE.document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Using the same URL as the answer already given you could alternatively select with CSS selectors to get the elements of interest, and use split to get just the names and address parts from the text. We can also do away with the browser altogether to get faster results from first results page.
Business name:
You can get the name with the following selector (using paid listing example):
div.paid-listing .listing-title
Address info:
The associated descriptive information can be retrieved with the selector:
div.paid-listing .address-summary
And then using split we can parse this into just the address information.
Code:
Option Explicit
Public Sub GetTitleAndAddress()
Dim oHtml As HTMLDocument, nodeList1 As Object, nodeList2 As Object, i As Long
Const URL As String = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", URL, False
.send
oHtml.body.innerHTML = .responseText
End With
Set nodeList1 = oHtml.querySelectorAll("div.paid-listing .listing-title")
Set nodeList2 = oHtml.querySelectorAll("div.paid-listing .address-summary")
With Worksheets("Sheet3")
.UsedRange.ClearContents
For i = 0 To nodeList1.Length - 1
.Range("A" & i + 1) = nodeList1.Item(i).innerText
.Range("B" & i + 1) = Split(nodeList2.Item(i).innerText, Chr$(10))(0)
Next i
End With
End Sub
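The Split call in the loop above keeps only the text before the first line feed. A minimal sketch of that step as a standalone helper follows. FirstLine is a hypothetical name, and the handling of other line-ending styles and of empty input is an added precaution rather than something the site is known to need.

```vba
Option Explicit

' Hypothetical helper: return the first line of a block of text, trimmed.
Public Function FirstLine(ByVal rawText As String) As String
    Dim normalized As String
    normalized = Replace$(rawText, vbCrLf, vbLf)      ' treat CRLF, CR and LF alike
    normalized = Replace$(normalized, vbCr, vbLf)
    If Len(normalized) = 0 Then Exit Function         ' Split("") gives an empty array, so index 0 would fail
    FirstLine = Trim$(Split(normalized, vbLf)(0))
End Function

Public Sub DemoFirstLine()
    ' Invented sample text for illustration
    Debug.Print FirstLine("12 Main St, New York" & vbLf & "Click here to read more")
End Sub
```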
Yeah, without an API, this can be very tricky at best, and very inconsistent at worst. For now, you can try the script below.
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
DoEvents
Loop
RowCount = 1
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
If itm.classname Like "*free-listing*" Or itm.classname Like "*paid-listing*" Then
.Range("A" & RowCount) = itm.classname
.Range("B" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
End If
Next itm
End With
End Sub
You probably want some kind of input box to capture the city and state and radius from the user, or capture those variable in cells in your worksheet.
Notice, the '%20' is a space character.
I got this idea from a friend of mine, Joel, a long time ago. That guy is great!

Resources