VBA Code to Search for Element on Website and Repeat Search - excel

I am fairly new to VBA and web scraping, but have been reading a lot trying to learn.
I have a long list of addresses in column A of an Excel file. I would like to like to search each of these addresses on the Website http://www.parkopedia.com. This website finds the closest parking and cost based on a given address.
Based on the results, I would like to pull the inner text of two Span tags. This inner text represents the cost of the closest parking and the distance to the parking.
<div class="LocationListItem__searchDetails">
<div class="LocationDetailsSearchDetails">
<div class="LocationDetailsSearchDetails__detail">
<span class="LocationDetailsSearchDetails__detail__value">Free</span>
<span class="LocationDetailsSearchDetails__detail__label">2 hours</span>
</div>
<div class="LocationDetailsSearchDetails__detail">
<span class="LocationDetailsSearchDetails__detail__value">20 </span>
In this case I would like to pull "Free" and "20" into columns B and C of my spreadsheet. But it appears they both have the same Class name of "LocationDetailsSearchDetails__detail"
I have about 1500 results to search, so if possible using xmlHTTP would be preferred.
The first address that I am searching is "10 herrmann place new york 10710"
Below is the code that I have tried far. For right now I have the URL entered directly into the code rather than pulling from the Excel file. I will figure out how to get the correct URL to pull once I can get the correct data to be scraped from the site. The data in B2 is currently blank so that it only pulls the base URL with nothing else on the end.
The issue that I seem to be having is finding a way to get the below line to work correctly.
GetParkingData = html.getFirstSpanWithClass('LocationDetailsSearchDetails__detail__value').innerText
Module
Option Explicit
Public Sub GetInfo()
Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults(), i As Long, html As HTMLDocument
Set html = New HTMLDocument
Set http = New clsHTTP
Const BASE_URL As String = "https://en.parkopedia.com/parking/locations/10_herrmann_place_new_york_10710_united_states_1jegdr72zr6xf2yc83/?arriving=202107131800&leaving=2021071320000"
Application.ScreenUpdating = False
With ThisWorkbook.Worksheets("Sheet1")
lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
Select Case lastRow
Case 1
Exit Sub
Case 2
ReDim arr(1, 1): arr(1, 1) = .Range("B2").Value
Case Else
arr = .Range("B2:B" & lastRow).Value
End Select
ReDim groupResults(1 To lastRow - 1)
With http
For i = LBound(arr, 1) To UBound(arr, 1)
If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
sResponse = .GetHTML(BASE_URL & arr(i, 1))
html.body.innerHTML = sResponse
groupResults(i) = .GetParkingData(html)
sResponse = vbNullString: html.body.innerHTML = vbNullString
End If
Next
End With
For i = LBound(groupResults) To UBound(groupResults)
.Cells(i + 1, "C") = groupResults(i)
Next
End With
Application.ScreenUpdating = True
End Sub
Class ClsHTTP
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With http
.Open "GET", URL, False
.send
GetHTML = StrConv(.responseBody, vbUnicode)
End With
End Function
Public Function GetParkingData(ByVal html As HTMLDocument) As String
On Error GoTo errhand:
GetParkingData = html.getFirstSpanWithClass('LocationDetailsSearchDetails__detail__value').innerText
Exit Function
errhand:
GetParkingData = "Not Found"
End Function
Thank you all very much! Please forgive anything that seems novice, I am trying to teach myself VBA and web scraping.

Related

How to extract company address from website

I have been browsing, trying and reading different codes from here but i can't get mine to work. What i am trying to do:
Browse a web which contains all information about the country's companies.
Look up the company in the website (taking as reference the name from a excel cell).
When the company web loads, extract "Domicilio" (address) from it and past it on another excel cell.
Information to take into account:
Website is: https://www.infocif.es/
Form to look up is: https://www.infocif.es/general/empresas-informacion-listado-empresas.asp?Buscar= , so if you want to search "014 media SL" company, it would be https://www.infocif.es/general/empresas-informacion-listado-empresas.asp?Buscar=014 media SL
If you load the previous link, which leads to the final link https://www.infocif.es/ficha-empresa/014-media-sl, you will see in the "informacion de la compaƱia" div, the needed "domicilio" information. In this case, i would like to extract: Calle Alcala 106 Primero 28009 - Madrid
what i have tried so far. The following codes from some great people on this web:
clsHTTP (MODULE CLASS)
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With http
.Open "GET", URL, False
.send
GetHTML = StrConv(.responseBody, vbUnicode)
End With
End Function
Public Function GetDomicilio(ByVal html As HTMLDocument) As String
On Error GoTo errhand:
GetDomicilio = html.querySelector("div.row p:nth-child(6)").innerText
Exit Function
errhand:
GetDomicilio = "Not Found"
End Function
Standard module
Option Explicit
Public Sub GetInfo()
Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults(), i As Long, html As HTMLDocument
Set html = New HTMLDocument
Set http = New clsHTTP
Const BASE_URL As String = "https://www.infocif.es/general/empresas-informacion-listado-empresas.asp?Buscar="
Application.ScreenUpdating = False
With ThisWorkbook.Worksheets("Hoja1")
lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
Select Case lastRow
Case 1
Exit Sub
Case 2
ReDim arr(1, 1): arr(1, 1) = .Range("A2").Value
Case Else
arr = .Range("A2:A" & lastRow).Value
End Select
ReDim groupResults(1 To lastRow - 1)
With http
For i = LBound(arr, 1) To UBound(arr, 1)
If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
sResponse = .GetHTML(BASE_URL & arr(i, 1))
html.body.innerHTML = sResponse
groupResults(i) = .GetDomicilio(html)
sResponse = vbNullString: html.body.innerHTML = vbNullString
End If
Next
End With
For i = LBound(groupResults) To UBound(groupResults)
.Cells(i + 1, "B") = groupResults(i)
Next
End With
Application.ScreenUpdating = True
End Sub
I have been trying to figure it out, and i believe that it is failing by 2 reasons:
the Base URL is wrong, or
I cant properly define the queryselector for "domicilio".
What am i getting when i run the code above?
It will run, no issues, but it will return a "not found" reply on b column. Please, assume that all the company names are listed on A column (there are like 1800 rows). Company names start by A2.
You can try the following companies:
014 MEDIA SL
100M MONTADITOS INTERNACIONAL SL
649298 CASTELLON SL
ABACO ASESORES PERICIALES SL
ABADA SA
ABELLCON 3 SL
ABEX-EXCELENCIA ROBOTICA SL
ABF IBERIA HOLDING SL
ABRAFACT SL
ACCENTURE HOLDINGS IBERIA SL
ACCESORIOS DE TUBERIA SA
ACCESORIOS FRIGORIFICOS SA
ACCIONA MOBILITY SA

VBA Scraping div elements

So, I've trying to scrape data from a website but I simply can't reach my goal...
I'm new with VBA and i've tried to search the basics of vba in order to understand some code.
So far I got this code but it's only scraping the data from the 1st div and it scrap all the data to one cell, and I need the macro to run trought all the page and scrap all the data that has the className I input on the code on diferent cells (eg: 1st div to cell A:1, 2nd div to cell A2... and so on)
Could you help me or give me some "lights" of what I'm doing wrong pls?
Thank you!
Code:
Sub BoschRoupa()
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
i = 1
Set ieObj = New InternetExplorer
ieObj.Visible = False
ieObj.navigate "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100"
Application.Wait Now + TimeValue("00:00:05")
For Each htmlEle In ieObj.document.getElementsByClassName("w-product__content")(0).getElementsByTagName("div")
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
End With
i = i + 1
Next htmlEle
End Sub
You can use xmlhttp, rather than a browser, then the following loop to write out all the div info. I would probably be more selective in how I grab only data of interest but the following, I hope, is in the spirit of what you have asked for.
Option Explicit
Public Sub GetInfo()
Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100", False
.send
html.body.innerHTML = .responseText
End With
Set data = html.getElementsByClassName("w-product__content")
For Each item In data
r = r + 1: c = 1
For Each div In item.getElementsByTagName("div")
With ThisWorkbook.Worksheets("Sheet1")
.Cells(r, c) = div.innerText
End With
c = c + 1
Next
Next
End Sub

Search a website with Excel data to extract results and then loop

I have 8000 values in an Excel spreadsheet.
I need to search a website and then record a specific line of data from the website to in the Excel spreadsheet.
I found code which searches for data excel macro to search a website and extract results
Sub URL_Get_ABN_Query()
strSearch = Range("a1")
With ActiveSheet.QueryTables.Add( _
Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & _
strSearch & "&safe=active", _
Destination:=Range("a5"))
.BackgroundQuery = True
.TablesOnlyFromHTML = True
.Refresh BackgroundQuery:=False
.SaveData = True
End With
'enter code here
End Sub
It collects the data from the website like this.
I only want the 'entity type' data line.
I can not find how to extend the code to only grab this line and input to the corresponding cell. i.e. ABN(b2)search, find input 'entity type' and paste into Company Type(c2).
Alternatively, I tried to find how to fill the information vertically instead of horizontally. I could delete the columns that are not needed. I thought this may be simpler.
I tried to record the macro with developer.
I also need to loop to the next ABN and populate the corresponding field and so on (B3>C3, B4>C4, etc.).
This is absolutely possible. You've got what I often find the hardest part, sourcing the information from another platform. To make this work I would separate it out a little bit and for simplicity use 2 sheets (Sheet1 with your known data and Sheet2 for the web data).
Loop through your table of ~8000 businesses. We can identify this from the UsedRange number of Rows. We know that the ABN is in column 2 (also known as B) so we copy that into the variable to pass to the function. The function will return the "Entity type:" to column 3 (C) of the same row.
Sub LoopThroughBusinesses()
Dim i As Integer
Dim ABN As String
For i = 2 To Sheet1.UsedRange.Rows.Count
ABN = Sheet1.Cells(i, 2)
Sheet1.Cells(i, 3) = URL_Get_ABN_Query(ABN)
Next i
End Sub
Change the subroutine you created to a Function so it returns the entity type you are after. The function will save the data into Sheet2 and then return just the Entity data that we are after.
Function URL_Get_ABN_Query(strSearch As String) As String ' Change it from a Sub to a Function that returns the desired string
' strSearch = Range("a1") ' This is now passed as a parameter into the Function
Dim entityRange As Range
With Sheet2.QueryTables.Add( _
Connection:="URL;http://www.abr.business.gov.au/SearchByABN.aspx?SearchText=" & strSearch & "&safe=active", _
Destination:=Sheet2.Range("A1")) ' Change this destination to Sheet2
.BackgroundQuery = True
.TablesOnlyFromHTML = True
.Refresh BackgroundQuery:=False
.SaveData = True
End With
' Find the Range that has "Entity Type:"
Set entityRange = Sheet2.UsedRange.Find("Entity type:")
' Then return the value of the cell to its' right
URL_Get_ABN_Query = entityRange.Offset(0, 1).Value2
' Clear Sheet2 for the next run
Sheet2.UsedRange.Delete
End Function
You do not want a load of connections (queryTables) set up in this way. It will be so slow if even possible. At 8000 requests, provided xmlhttp is not blocked or throttled, the below method will be significantly faster. If there does appear to be slowing/blocking then add in a small wait every x requests.
If possible use xmlhttp to gather data. Use css selectors to specifically target the entity type. Store values in an array and write out with loop at end. Use a class to hold the xmlhttp object for greater efficiency. Provide your class with methods including how to handle not found (example given). Add some further optimizations e.g. given is switching off screen-updating. This assumes your search numbers are in column B from B2. The code below also does some basic checks that there is something present in column B and handles the case of there being 1 or more numbers.
Good code is modular and you want a function to return something and a sub to perform actions. A single sub/function shouldn't complete lots of tasks. You want to easily debug with code that follows the principle of single responsibility (or close to it).
class clsHTTP
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With http
.Open "GET", URL, False
.send
GetHTML = StrConv(.responseBody, vbUnicode)
End With
End Function
Public Function GetEntityType(ByVal html As HTMLDocument) As String
On Error GoTo errhand:
GetEntityType = html.querySelector("a[href*='EntityTypeDescription']").innerText
Exit Function
errhand:
GetEntityType = "Not Found"
End Function
Standard module:
Option Explicit
Public Sub GetInfo()
Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults(), i As Long, html As HTMLDocument
Set html = New HTMLDocument
Set http = New clsHTTP
Const BASE_URL As String = "http://www.abr.business.gov.au/ABN/View/"
Application.ScreenUpdating = False
With ThisWorkbook.Worksheets("Sheet1")
lastRow = .Cells(.rows.Count, "B").End(xlUp).Row
Select Case lastRow
Case 1
Exit Sub
Case 2
ReDim arr(1, 1): arr(1, 1) = .Range("B2").Value
Case Else
arr = .Range("B2:B" & lastRow).Value
End Select
ReDim groupResults(1 To lastRow - 1)
With http
For i = LBound(arr, 1) To UBound(arr, 1)
If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
sResponse = .GetHTML(BASE_URL & arr(i, 1))
html.body.innerHTML = sResponse
groupResults(i) = .GetEntityType(html)
sResponse = vbNullString: html.body.innerHTML = vbNullString
End If
Next
End With
For i = LBound(groupResults) To UBound(groupResults)
.Cells(i + 1, "C") = groupResults(i)
Next
End With
Application.ScreenUpdating = True
End Sub
References (VBE> Tools > References):
Microsoft HTML Object Library
CSS selectors:
I use the fact the entity description is a hyperlink (a tag) and that its value contains the string EntityTypeDescription to use a css attribute = value with contains (*) operator to target.

Web Scraping ETFs Daily Data VBA

I'm trying to web scrape some daily info of differents ETFs. I found that https://www.marketwatch.com/ have a accurate info.
The most relevant info is the open Price, outstanding shares, NAV, total assets of the ETF.
Here is the link for IVV US Equity: https://www.marketwatch.com/investing/fund/ivv
I have web scraped with VBA before but the HTML of the pages I had used are different, I don't know if this is because some values of the ETFs (such as Price and Taded Volume) change constantly.
The idea is to create a code to extract relevant info and create a data base to analyze Macroeconomics factor using the ETFs as market indicators of flows between countries, regions, etc...
Mi first approach would be with VBA but after I get more into the data I would like to try with Python (after I get more conffident with it) to automate the webscraping process on a daily basis.
I am open to any suggestion or any other website that could be useful (I have tried with Yahoo Finance and Morningstar and I get the same problema with the HTML code).
This is my poor code:
Sub Get_Data()
Dim ticker As String, enlace As String
ticker = ThisWorkbook.Worksheets("ETFs").Cells(2, 2).Value 'IVV
'link = "https://www.morningstar.com/etfs/arcx/" & ticker & "/quote.html"
'link = "https://finance.yahoo.com/quote/" & ticker & "?p=" & ticker
link = "https://www.marketwatch.com/investing/fund/" & ticker
Application.ScreenUpdating = False
Dim x As Integer
x = ThisWorkbook.Worksheets("ETFs").Cells(Rows.Count, 1).End(xlUp).Row
'Dim i As Integer
'For i = 2 To x
Dim total_net_assets As Variant, open_price As Variant, NAV As Variant, shares_out
Set ie = CreateObject("InternetExplorer.application")
With ie
.Visible = False
.navigate link
While .Busy Or .readyState < 4: DoEvents: Wend
Do
DoEvents
On Error Resume Next
' Here is where I get the problem of not knowing how to reference the values I need because the class name appears repeatedly
total_net_assets = .document.getElementsByClassName("").Value
open_price = .document.getElementByClassName("price").Value
NAV = .document.getElementByClassName("").Value
shares_out = .document.getElementByClassName("kv__value kv__primary ").Value
On Error GoTo 0
Loop
End With
ThisWorkbook.Worksheets("ETFs").Cells(2, 13).Value = total_net_assets
ThisWorkbook.Worksheets("ETFs").Cells(2, 14).Value = NAV
ThisWorkbook.Worksheets("ETFs").Cells(2, 15).Value = open_price
ThisWorkbook.Worksheets("ETFs").Cells(2, 16).Value = shares_out
ie.Quit
'Next i
Application.ScreenUpdating = True
End Sub
Access method:
I use XMLHTTP requests as much faster than opening IE.
Code notes:
The following reads in fund short codes from Sheet1 column A, starting in A2, into an array. You can easily extend this adding more funds into column A.
This array is looped issuing XMLHTTP requests by concatenating the fund code into the BASE_URL variable.
I use a class, clsHTTP, to hold the XMLHTTP object to be efficient - no need to keep creating and destroying the object.
I provide this class with two methods. One to retrieve the target page innerHTML (GetString), and the other to extract the required info if available (GetInfo). I use a dictionary to test if the searched for labels are present. If present I grab the associated value. If not, I have a placeholder vbNullString in the dictionary.
I add each scraped result into a collection called results. At the end I loop this writing out to the sheet. By keeping most of the work in memory this provides for much faster scraping.
Retrieving info from HTML:
The labels e.g. Open, and values come in pairs.
You can generate a nodeList (think collection as with getElementsByClassName) by using querySelectorAll method to apply a class CSS selector to gather the label elements by their class name kv__label. The "." is the class selector.
Set labels = .querySelectorAll(".kv__label") '<== nodeList of labels
You do the same to get the associated values:
Set values = .querySelectorAll(".kv__value.kv__primary") '<== nodeList of associated values. Same length as labels nodeList so can use same index to retrieve associated label/value pairs from each nodeList.
You loop the labels using the dictionary in the clsHTTP method .GetInfo to see if you searched for labels are present, if they are, the associated value is retrieved from values by using the same index as where the label was found in the nodeList labels, and the dictionary vbNullString value for that label is updated with the actual retrieved value, else it is left as vbNullString.
Sample results:
VBA:
Class module clsHTTP:
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetString(ByVal url As String) As String
Dim sResponse As String
With http
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
GetString = sResponse
End With
End Function
Public Function GetInfo(ByVal html As HTMLDocument) As Object
Dim dict As Object, i As Long
Set dict = CreateObject("Scripting.Dictionary")
dict.Add "Open", vbNullString
dict.Add "Shares Outstanding", vbNullString
dict.Add "Total Net Assets", vbNullString
dict.Add "NAV", vbNullString
Dim values As Object, labels As Object
With html
Set values = .querySelectorAll(".kv__value.kv__primary")
Set labels = .querySelectorAll(".kv__label")
For i = 0 To labels.Length - 1
If dict.Exists(labels.item(i).innerText) Then dict(labels.item(i).innerText) = values.item(i).innerText
Next
End With
Set GetInfo = dict
End Function
Standard module 1:
Option Explicit
Public Sub GetFundInfo()
Dim sResponse As String, html As HTMLDocument, http As clsHTTP, i As Long
Dim headers(), funds(), url As String, results As Collection, ws As Worksheet
Const BASE_URL As String = "https://www.marketwatch.com/investing/fund/"
Application.ScreenUpdating = False
headers = Array("Open", "Shares Outstanding", "Total Net Assets", "NAV")
Set results = New Collection
Set http = New clsHTTP
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
funds = Application.Transpose(ws.Range("A2:A3").Value) '<== Change the range here to the single column range containing your dotNums.
For i = LBound(funds) To UBound(funds)
If Not IsEmpty(funds(i)) Then
url = BASE_URL & funds(i)
html.body.innerHTML = http.GetString(url)
results.Add http.GetInfo(html).Items
End If
Next
If results.Count > 0 Then
Dim item As Variant, r As Long, c As Long
r = 2: c = 2
With ws
.Cells(1, c).Resize(1, UBound(headers) + 1) = headers
For Each item In results
.Cells(r, c).Resize(1, UBound(item) + 1) = item
r = r + 1
Next
End With
End If
Application.ScreenUpdating = True
End Sub
Set-up:
Without using a class:
Option Explicit
Public Sub GetFundInfo()
Dim sResponse As String, html As HTMLDocument, i As Long
Dim headers(), funds(), url As String, results As Collection, ws As Worksheet
Const BASE_URL As String = "https://www.marketwatch.com/investing/fund/"
Application.ScreenUpdating = False
headers = Array("Open", "Shares Outstanding", "Total Net Assets", "NAV")
Set results = New Collection
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
funds = Application.Transpose(ws.Range("A2:A3").Value) '<== Change the range here to the single column range containing your dotNums.
For i = LBound(funds) To UBound(funds)
If Not IsEmpty(funds(i)) Then
url = BASE_URL & funds(i)
html.body.innerHTML = GetString(url)
results.Add GetInfo(html).Items
End If
Next
If results.Count > 0 Then
Dim item As Variant, r As Long, c As Long
r = 2: c = 2
With ws
.Cells(1, c).Resize(1, UBound(headers) + 1) = headers
For Each item In results
.Cells(r, c).Resize(1, UBound(item) + 1) = item
r = r + 1
Next
End With
End If
Application.ScreenUpdating = True
End Sub
Public Function GetString(ByVal url As String) As String
Dim http As Object
Set http = CreateObject("MSXML2.XMLHTTP")
Dim sResponse As String
With http
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
GetString = sResponse
End With
End Function
Public Function GetInfo(ByVal html As HTMLDocument) As Object
Dim dict As Object, i As Long
Set dict = CreateObject("Scripting.Dictionary")
dict.Add "Open", vbNullString
dict.Add "Shares Outstanding", vbNullString
dict.Add "Total Net Assets", vbNullString
dict.Add "NAV", vbNullString
Dim values As Object, labels As Object
With html
Set values = .querySelectorAll(".kv__value.kv__primary")
Set labels = .querySelectorAll(".kv__label")
For i = 0 To labels.Length - 1
If dict.Exists(labels.item(i).innerText) Then dict(labels.item(i).innerText) = values.item(i).innerText
Next
End With
Set GetInfo = dict
End Function
Okay, so you will need to create two loops. You can just keep reusing the elem0, elem1, and elemColl(1) variables for each price point you need - just make sure to reset bFoundIt to False for each new iteration so you do not exit the For Loops early.
For your total_net_assets var, you will first loop the class of kv__item. You will then need to loop each class collection of kv__label within the kv__item's elements and stop when you match the innerText: Total Net Assets. Once you match this, you will use the first coll obj elem0 to get the kv__value kv__primary class name for it.
Dim IE As Object, elem0 As Object, elem1 As Object, i As Long, bFoundIt As Boolean
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = False
.navigate link
While .Busy Or .readyState < 4: DoEvents: Wend
DoEvents
bFoundIt = False
For Each elem0 In .document.getElementsByClassName("kv__item")
For Each elem1 In elem0.getElementsByClassName("kv__label")
If elem1.innerText = "Total Net Assets" Then
bFoundIt = True
total_net_assets = elem0.getElementsByClassName("kv__value kv__primary ")(0).innerText
Exit For
End If
Next elem1
If bFoundIt Then Exit For
Next elem0

web scraping using excel and VBA

i wrote my VBA code in excel sheet as below but it is not scrape data for me and also i don't know why please any one help me. it gave me reullt as "click her to read more" onlyi want to scrape enitre data such as first name last name state zip code and so on
Sub extractTablesData()
Dim IE As Object, obj As Object
Dim myState As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
myState = InputBox("Enter the city where you wish to work")
With IE
.Visible = True
.navigate ("http://www.funeralhomes.com/go/listing/Search? name=&city=&state=&country=USA&zip=&radius=")
While IE.readyState <> 4
DoEvents
Wend
For Each obj In IE.document.all.item("state").Options
If obj.innerText = myState Then
obj.Selected = True
End If
Next obj
IE.document.getElementsByValue("Search").item.Click
Do While IE.Busy: DoEvents: Loop
ThisWorkbook.Sheets("Sheet1").Range("A1:K1500").ClearContents
Set elemCollection = IE.document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Using the same URL as the answer already given you could alternatively select with CSS selectors to get the elements of interest, and use split to get just the names and address parts from the text. We can also do away with the browser altogether to get faster results from first results page.
Business name:
You can get the name with the following selector (using paid listing example):
div.paid-listing .listing-title
This selects (sample view)
Try
Address info:
The associated descriptive information can be retrieved with the selector:
div.paid-listing .address-summary
And then using split we can parse this into just the address information.
Code:
Option Explicit
Public Sub GetTitleAndAddress()
Dim oHtml As HTMLDocument, nodeList1 As Object, nodeList2 As Object, i As Long
Const URL As String = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", URL, False
.send
oHtml.body.innerHTML = .responseText
End With
Set nodeList1 = oHtml.querySelectorAll("div.paid-listing .listing-title")
Set nodeList2 = oHtml.querySelectorAll("div.paid-listing .address-summary")
With Worksheets("Sheet3")
.UsedRange.ClearContents
For i = 0 To nodeList1.Length - 1
.Range("A" & i + 1) = nodeList1.Item(i).innerText
.Range("B" & i + 1) = Split(nodeList2.Item(i).innerText, Chr$(10))(0)
Next i
End With
End Sub
Example output:
Yeah, without an API, this can be very tricky at best, and very inconsistent at worst. For now, you can try the script below.
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
DoEvents
Loop
RowCount = 1
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
If itm.classname Like "*free-listing*" Or itm.classname Like "*paid-listing*" Then
.Range("A" & RowCount) = itm.classname
.Range("B" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
End If
Next itm
End With
End Sub
You probably want some kind of input box to capture the city and state and radius from the user, or capture those variable in cells in your worksheet.
Notice, the '%20' is a space character.
I got this idea from a friend of mine, Joel, a long time ago. That guy is great!

Resources