Excel VBA Pulling Web Data From a List of Hyperlinks

I have a list of hyperlinks in column C on Sheet1, and I want to pull data from each link and put the data for each link into separate worksheets which have already been created. All of the hyperlinks are to the same website (Pro Football Reference), but each link is for a different NFL player, and I want to pull the same data table for each player. I have been able to pull data from the first link and put it in Sheet2 as it should be, but I am very new to VBA and can't figure out how to create a loop to do this for each link in my list and put the results in the other sheets. Below is the code I currently have to get data from the first link:
Sub passingStats()
    Dim x As Long, y As Long
    Dim htm As Object
    Set htm = CreateObject("htmlFile")
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", Range("C2"), False
        .send
        htm.body.innerhtml = .responsetext
    End With
    With htm.getelementbyid("passing")
        For x = 0 To .Rows.Length - 1
            For y = 0 To .Rows(x).Cells.Length - 1
                Sheets(2).Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innertext
            Next y
        Next x
    End With
End Sub
Any help would be greatly appreciated.

The following shows using a loop.
N.B.
- There is a logic flaw in your table write, which I have written a patch for.
- Some strings were being converted incorrectly in your script. I have prefixed them with ' to stop this.
Code:
Option Explicit

Public Sub GetInfo()
    Dim html As New HTMLDocument, links(), link As Long, wsSourceSheet As Worksheet
    Dim hTable As HTMLTable, ws As Worksheet, playerName As String
    Set wsSourceSheet = ThisWorkbook.Worksheets("Sheet1") '<change to sheet containing links
    Application.ScreenUpdating = False
    With wsSourceSheet
        links = .Range("C2:C" & .Cells(.Rows.Count, "C").End(xlUp).Row).Value
    End With
    For link = LBound(links, 1) To UBound(links, 1)
        If InStr(links(link, 1), "https://") > 0 Then
            Set html = GetHTMLDoc(links(link, 1))
            Set hTable = html.getElementById("passing")
            If Not hTable Is Nothing Then
                playerName = GetNameAbbr(links(link, 1))
                Set ws = AddPlayerSheet(playerName)
                WriteTableToSheet hTable, ws
                FixTable ws
            End If
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Public Function GetHTMLDoc(ByVal url As String) As HTMLDocument
    Dim sResponse As String, html As New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    html.body.innerHTML = sResponse
    Set GetHTMLDoc = html
End Function

Public Sub WriteTableToSheet(ByVal hTable As HTMLTable, ByVal ws As Worksheet)
    Dim x As Long, y As Long
    With hTable
        For x = 0 To .Rows.Length - 1
            For y = 0 To .Rows(x).Cells.Length - 1
                If y = 6 Or y = 7 Then
                    ws.Cells(x + 4, y + 1).Value = Chr$(39) & .Rows(x).Cells(y).innerText
                Else
                    ws.Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innerText
                End If
            Next y
        Next x
    End With
End Sub

Public Function GetNameAbbr(ByVal url As String) As String
    Dim tempArr() As String
    tempArr = Split(url, "/")
    GetNameAbbr = Left$(tempArr(UBound(tempArr)), 6)
End Function

Public Function AddPlayerSheet(ByVal playerName As String) As Worksheet
    Dim ws As Worksheet
    If SheetExists(playerName) Then
        Application.DisplayAlerts = False
        ThisWorkbook.Worksheets(playerName).Delete
        Application.DisplayAlerts = True
    End If
    Set ws = ThisWorkbook.Worksheets.Add
    ws.Name = playerName
    Set AddPlayerSheet = ws
End Function

Public Function SheetExists(ByVal playerName As String) As Boolean
    SheetExists = Evaluate("ISREF('" & playerName & "'!A1)")
End Function

Public Sub FixTable(ByVal ws As Worksheet)
    Dim found As Range, numSummaryRows As Long
    With ws
        Set found = .Columns("A").Find("Career")
        If found Is Nothing Then Exit Sub
        numSummaryRows = .Cells(.Rows.Count, "A").End(xlUp).Row - found.Row
        numSummaryRows = IIf(numSummaryRows = 0, 1, numSummaryRows + 1)
        Debug.Print found.Offset(, 1).Resize(numSummaryRows, 30).Address, ws.Name
        found.Offset(, 1).Resize(numSummaryRows, 30).Copy found.Offset(, 2)
        found.Offset(, 1).Resize(numSummaryRows, 1).ClearContents
    End With
End Sub

Related

Parse data from an attribute

I want to get the value of the "data-defaultasin" attribute into columns B, C, D and so on from these elements (screenshot given).
product url- https://www.amazon.in/dp/B06XTB2N7P
Sub praseasin()
    Dim ASIN
    Dim doc As HTMLDocument
    Dim htmTable As HTMLTable
    Dim wb As Workbook: Set wb = ThisWorkbook
    Dim ws As Worksheet: Set ws = wb.Worksheets("Sheet1")
    Dim img As HTMLDocument
    Dim i, lastRow As Long
    lastRow = ws.Range("A" & ws.Rows.Count).End(xlUp).Row
    For i = 3 To lastRow
        Set doc = New HTMLDocument
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", ws.Cells(i, 1), True
            .send
            Do: DoEvents: Loop Until .readyState = 4
            Dim resp
            resp = .responseText
            .abort
        End With
        ' On Error Resume Next
        doc.body.innerHTML = resp
        Set ASIN = doc.getelementsbyclassname("swatchAvailable")
        ' On Error Resume Next
        r.Offset(0, 1).Value = li.getAttribute("data-defaultasin")
    Next
End Sub
Try the following to get the ASINs from the different colors and put them in different columns. It's a demo script in which I've used one link multiple times. Modify it to suit your needs. Thanks.
Sub FetchAsin()
    Dim Http As New XMLHTTP60
    Dim Html As New HTMLDocument, I&, R&, C&
    Dim linkList As Variant, link As Variant
    linkList = Array( _
        "https://www.amazon.in/dp/B06XTB2N7P", _
        "https://www.amazon.in/dp/B06XTB2N7P", _
        "https://www.amazon.in/dp/B06XTB2N7P" _
    )
    For Each link In linkList
        With Http
            .Open "GET", link, False
            .send
            Html.body.innerHTML = .responseText
        End With
        With Html.querySelectorAll("[class^='swatch'][data-defaultAsin]")
            For I = 0 To .Length - 1
                C = C + 1: Cells(R + 1, C) = .Item(I).getAttribute("data-defaultAsin")
            Next I
            C = 0: R = R + 1
        End With
    Next link
End Sub
References to add to the library (the code uses both XMLHTTP60 and HTMLDocument):
Microsoft XML, v6.0
Microsoft HTML Object Library

Scraping from VBA with querySelector

I have used code from this website to pull data from site:
Option Explicit

Public Sub GetInfo()
    Dim sResponse As String, i As Long, Html As New HTMLDocument
    Dim prices As Object, info As Object
    Application.ScreenUpdating = False
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://bazashifer.ru/proflist-profnastil", False
        .send
        sResponse = .responseText
    End With
    With Html
        .body.innerHTML = sResponse
        Set info = .querySelectorAll("div.views-field.views-field-title")
        Set prices = .querySelectorAll("div.views-field.views-field-field-cena-tovara")
    End With
    With Worksheets(2)
        For i = 0 To info.Length - 1
            .Cells(i + 1, 1) = info(i).innerText
            .Cells(i + 1, 2) = prices(i).innerText
        Next i
    End With
    Application.ScreenUpdating = True
End Sub
The code above works just as intended. I implemented code to take multiple links (link 1, link 2, link 3):
Option Explicit

Public Sub GetInfoAll()
    Dim wsSheet As Worksheet, Rows As Long, http As New XMLHTTP60, Html As New HTMLDocument, links As Variant, link As Variant
    Dim prices As Object, info As Object, i As Long, sResponse As String
    Set wsSheet = Sheets(1)
    Rows = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
    links = wsSheet.Range("A1:A" & Rows)
    With http
        For Each link In links
            .Open "GET", link, False
            .send
            sResponse = .responseText
            With Html
                .body.innerHTML = sResponse
                Set info = .querySelectorAll("div.views-field.views-field-title")
                Set prices = .querySelectorAll("div.views-field.views-field-field-cena-tovara")
            End With
            With Worksheets(2)
                For i = 0 To info.Length - 1
                    .Cells(i + 1, 1) = info(i).innerText
                    .Cells(i + 1, 2) = prices(i).innerText
                Next i
            End With
        Next link
    End With
End Sub
The above code works and should pull data into columns, but for each subsequent link the code overwrites the previous data.
Any help would be great. Thanks.
Rows = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).row
You need to have something like this during your output loop on Sheet 2, because you can't hard-code the number of results.
Edit: here's what I really meant about the output:
Public Sub GetInfoAll()
    Dim wsSheet As Worksheet, Rows As Long, http As New XMLHTTP60, Html As New HTMLDocument, links As Variant, link As Variant
    Dim prices As Object, info As Object, i As Long, sResponse As String, offsetRows As Long
    Dim wb As Workbook
    Set wb = Application.Workbooks("Book1")
    Set wsSheet = wb.Sheets(1)
    Rows = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
    links = wsSheet.Range("A1:A" & Rows)
    With http
        For Each link In links
            .Open "GET", link, False
            .send
            sResponse = .responseText
            With Html
                .body.innerHTML = sResponse
                Set info = .querySelectorAll("div.views-field.views-field-title")
                Set prices = .querySelectorAll("div.views-field.views-field-field-cena-tovara")
            End With
            With wb.Worksheets(2)
                For i = 0 To info.Length - 1
                    offsetRows = wb.Worksheets(2).Cells(wb.Worksheets(2).Rows.Count, "A").End(xlUp).Row + 1
                    .Cells(offsetRows, 1) = info(i).innerText
                    .Cells(offsetRows, 2) = prices(i).innerText
                Next i
            End With
        Next link
    End With
End Sub
I think it is ideal to make use of a container and then loop through it to parse the desired content. Consider the following example. You can always append the rest to suit your needs.
Public Sub GetInfo()
    Dim Html As New HTMLDocument, Htmldoc As New HTMLDocument
    Dim Wb As Workbook, ws As Worksheet, R&, I&
    Dim link As Variant, linklist As Variant
    Set Wb = ThisWorkbook
    Set ws = Wb.Worksheets("output")
    linklist = Array( _
        "https://bazashifer.ru/armatura-stekloplastikovaya", _
        "https://bazashifer.ru/truby-0", _
        "https://bazashifer.ru/setka-stekloplastikovaya" _
    )
    For Each link In linklist
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", link, False
            .send
            Html.body.innerHTML = .responseText
        End With
        With Html.querySelectorAll(".view-content > .views-row")
            For I = 0 To .Length - 1
                Htmldoc.body.innerHTML = .item(I).outerHTML
                R = R + 1: ws.Cells(R, 1) = Htmldoc.querySelector(".views-field-title a").innerText
                ws.Cells(R, 2) = Htmldoc.querySelector("[class*='cena-tovara'] > .field-content").innerText
            Next I
        End With
    Next link
End Sub
I think the problem is that your columns aren't updated for each link.
For i = 0 To info.Length - 1
    .Cells(i + 1, 1) = info(i).innerText
    .Cells(i + 1, 2) = prices(i).innerText
Next i
In this part you write everything to the first and second column. This should be updated every time you move to a new link.
So maybe add a 'colcount' variable which updates just before you move to the next link?
Something like this:
Infocol = 1
Pricecol = 2
For Each link In links
    ....
    .Cells(i + 1, Infocol) = info(i).innerText
    .Cells(i + 1, Pricecol) = prices(i).innerText
    ....
    Infocol = Infocol + 2
    Pricecol = Pricecol + 2
Next link
You go +2 so you don't overwrite your price column with your new info.
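Filling in the skeleton above, the idea might look like this (a sketch only, reusing the `links`, `info`, and `prices` names from the question's code; the fetch step is elided, and the variable names are illustrative):

```vba
Dim infoCol As Long, priceCol As Long, i As Long
infoCol = 1
priceCol = 2
For Each link In links
    ' ... fetch the page and set info/prices here, as in the question ...
    With Worksheets(2)
        For i = 0 To info.Length - 1
            .Cells(i + 1, infoCol) = info(i).innerText
            .Cells(i + 1, priceCol) = prices(i).innerText
        Next i
    End With
    infoCol = infoCol + 2   ' step both columns right by 2 so the
    priceCol = priceCol + 2 ' next link does not overwrite this one
Next link
```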

VBA HTML Listing Info Pull

I am looking to follow a series of URL's that are found in column A (example: https://www.ebay.com/itm/Apple-iPhone-7-GSM-Unlocked-Verizon-AT-T-TMobile-Sprint-32GB-128GB-256GB/352381131997?epid=225303158&hash=item520b8d5cdd:m:mWgYDe4a79NeLuAlV-RmAQA:rk:7:pf:0) and pull the following information from them:
- Title
- Price
- Description
I think there are multiple issues with my code... For one, I can't get the program to follow specific URL's listed in the Excel (only if I specify one within the code). Also, pulling multiple fields has given me issues.
Option Explicit

Public Sub ListingInfo()
    Dim ie As New InternetExplorer, ws As Worksheet, t As Date
    Dim i As Integer
    i = 0
    Do While Worksheets("Sheet1").Cells(i, 1).Value <> ""
        Const MAX_WAIT_SEC As Long = 10
        Set ws = ThisWorkbook.Worksheets("Sheet1")
        With ie
            .Visible = True
            .Navigate2 Worksheets("Sheet1").Cells(i, 1).Value
            While .Busy Or .readyState < 4: DoEvents: Wend
            Dim Links As Object, i As Long, count As Long
            t = Timer
            Do
                On Error Resume Next
                Set Title = .document.querySelectorAll("it-ttl")
                Set price = .document.querySelectorAll("notranslate")
                Set Description = .document.querySelectorAll("ds_div")
                count = Links.Length
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While count = 0
            For i = 0 To Title.Length - 1
                ws.Cells(i + 1, 1) = Title.item(i)
                ws.Cells(i + 1, 2) = price.item(i)
                ws.Cells(i + 1, 3) = Description.item(i)
            Next
            .Quit
            i = i + 1
    Loop
    End With
End Sub
I would use late binding for MSXML2.XMLHTTP and set a reference to the Microsoft HTML Object Library for the HTMLDocument.
Note: querySelector() references the first item it finds that matches its search string.
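To illustrate the difference (a minimal sketch, assuming an `MSHTML.HTMLDocument` named `Document` already holds the page HTML; the `.it-ttl` selector is just an example):

```vba
' querySelector returns only the FIRST matching element (or Nothing)
Dim firstTitle As MSHTML.IHTMLElement
Set firstTitle = Document.querySelector(".it-ttl")
If Not firstTitle Is Nothing Then Debug.Print firstTitle.innerText

' querySelectorAll returns every match as an indexable collection
Dim allTitles As MSHTML.IHTMLDOMChildrenCollection
Set allTitles = Document.querySelectorAll(".it-ttl")
Debug.Print allTitles.Length        ' number of matches on the page
If allTitles.Length > 0 Then Debug.Print allTitles.item(0).innerText
```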
Here is the short version:
Public Sub ListingInfo()
    Dim cell As Range
    With ThisWorkbook.Worksheets("Sheet1")
        For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
            Dim Document As MSHTML.HTMLDocument
            With CreateObject("MSXML2.XMLHTTP")
                .Open "GET", cell.Value, False
                .send
                Set Document = New MSHTML.HTMLDocument
                Document.body.innerHTML = .responseText
            End With
            cell.Offset(0, 1).Value = Document.getElementByID("itemTitle").innerText
            cell.Offset(0, 2).Value = Document.getElementByID("prcIsum").innerText
            If Not Document.querySelector(".viSNotesCnt") Is Nothing Then
                cell.Offset(0, 3).Value = Document.querySelector(".viSNotesCnt").innerText
            Else
                'Try Something Else
            End If
        Next
    End With
End Sub
A more elaborate solution would be to break the code up into smaller routines and load the data into an Array. The main advantage of this is that you can test each subroutine separately.
Option Explicit

Public Type tListingInfo
    Description As String
    Price As Currency
    Title As String
End Type

Public Sub ListingInfo()
    Dim source As Range
    Dim data As Variant
    With ThisWorkbook.Worksheets("Sheet1")
        Set source = .Range("A1:D1", .Cells(.Rows.Count, 1).End(xlUp))
        data = source.Value
    End With
    Dim r As Long
    Dim record As tListingInfo
    Dim url As String
    For r = 1 To UBound(data)
        url = data(r, 1)
        record = getListingInfo(url)
        With record
            data(r, 2) = .Description
            data(r, 3) = .Price
            data(r, 4) = .Title
        End With
    Next
    source.Value = data
End Sub

Public Function getListingInfo(url As String) As tListingInfo
    Dim ListingInfo As tListingInfo
    Dim Document As MSHTML.HTMLDocument
    Set Document = getHTMLDocument(url)
    With ListingInfo
        .Description = Document.getElementByID("itemTitle").innerText
        .Price = Split(Document.getElementByID("prcIsum").innerText)(1)
        .Title = Document.querySelectorAll(".viSNotesCnt")(0).innerText
        Debug.Print .Description, .Price, .Title
    End With
    getListingInfo = ListingInfo
End Function

Public Function getHTMLDocument(url As String) As MSHTML.HTMLDocument
    Const READYSTATE_COMPLETE As Long = 4
    Dim Document As MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
            Set Document = New MSHTML.HTMLDocument
            Document.body.innerHTML = .responseText
            Set getHTMLDocument = Document
        Else
            MsgBox "URL: " & vbCrLf & "Ready state: " & .readyState & vbCrLf & _
                   "HTTP request status: " & .Status, vbInformation, "URL Not Responding"
        End If
    End With
End Function
There are a lot of things to fix in your code. It is late here, so I will just give pointers (and update fully later) and working code below:
- Declare all variables and use the appropriate types
- Review For loops and how Transpose can be used to create a 1D array of URLs pulled from the sheet to loop over
- Review the difference between the querySelector and querySelectorAll methods
- Review CSS selectors (you are specifying everything as a type selector when in fact you are not selecting by tag for the elements of interest, nor by your stated text)
- Think about the placement of your IE object creation and of your .Navigate2, to make use of the existing object
- Make sure to use distinct loop counters
- Be sure not to overwrite values in the sheet
Code:
Option Explicit

Public Sub ListingInfo()
    Dim ie As New InternetExplorer, ws As Worksheet
    Dim i As Long, urls(), rowCounter As Long
    Dim title As Object, price As Object, description As Object
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    urls = Application.Transpose(ws.Range("A1:A2").Value) '<= Adjust
    With ie
        .Visible = True
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                rowCounter = rowCounter + 1
                .Navigate2 urls(i)
                While .Busy Or .readyState < 4: DoEvents: Wend
                Set title = .document.querySelector(".it-ttl")
                Set price = .document.querySelector("#prcIsum")
                Set description = .document.querySelector("#viTabs_0_is")
                ws.Cells(rowCounter, 3) = title.innerText
                ws.Cells(rowCounter, 4) = price.innerText
                ws.Cells(rowCounter, 5) = description.innerText
                Set title = Nothing: Set price = Nothing: Set description = Nothing
            End If
        Next
        .Quit
    End With
End Sub
Here's an approach using Web Requests, using MSXML. It should be significantly faster than using IE, and I'd encourage you to strongly consider using this approach wherever possible.
You'll need references to Microsoft HTML Object Library and Microsoft XML v6.0 to get this working.
Option Explicit

Public Sub SubmitRequest()
    Dim URLs As Excel.Range
    Dim URL As Excel.Range
    Dim LastRow As Long
    Dim wb As Excel.Workbook: Set wb = ThisWorkbook
    Dim ws As Excel.Worksheet: Set ws = wb.Worksheets(1)
    Dim ListingDetail As Variant
    Dim i As Long
    Dim j As Long
    Dim html As HTMLDocument
    ReDim ListingDetail(0 To 2, 0 To 10000)
    'Get URLs
    With ws
        LastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        Set URLs = .Range(.Cells(1, 1), .Cells(LastRow, 1))
    End With
    'Update the ListingDetail
    For Each URL In URLs
        Set html = getHTML(URL.Value2)
        ListingDetail(0, i) = html.getElementByID("itemTitle").innertext 'Title
        ListingDetail(1, i) = html.getElementByID("prcIsum").innertext 'Price
        ListingDetail(2, i) = html.getElementsByClassName("viSNotesCnt")(0).innertext 'Seller Notes
        i = i + 1
    Next
    'Resize array
    ReDim Preserve ListingDetail(0 To 2, 0 To i - 1)
    'Dump in Column T,U,V of existing sheet
    ws.Range("T1:V" & i).Value = WorksheetFunction.Transpose(ListingDetail)
End Sub

Private Function getHTML(ByVal URL As String) As HTMLDocument
    'Add a reference to Microsoft HTML Object Library
    Set getHTML = New HTMLDocument
    With New MSXML2.XMLHTTP60
        .Open "GET", URL, False 'synchronous, so responseText is ready after .send
        .send
        getHTML.body.innerHTML = .responseText
    End With
End Function

Scrape using XMLHTTP throws error at specific class name

I am trying to scrape a site with this code to extract names and contacts ...
Sub Test()
    Dim htmlDoc As Object
    Dim htmlDoc2 As Object
    Dim elem As Variant
    Dim tag As Variant
    Dim dns As String
    Dim pageSource As String
    Dim pageSource2 As String
    Dim url As String
    Dim row As Long
    row = 2
    dns = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", dns, True
        .send
        While .readyState <> 4: DoEvents: Wend
        If .statusText <> "OK" Then
            MsgBox "ERROR" & .Status & " - " & .statusText, vbExclamation
            Exit Sub
        End If
        pageSource = .responseText
    End With
    Set htmlDoc = CreateObject("htmlfile")
    htmlDoc.body.innerHTML = pageSource
    Dim xx
    'Got error here
    Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")
    Set htmlDoc = Nothing
    Set htmlDoc2 = Nothing
End Sub
When trying to use this line
Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")
I got an error 'Object doesn't support that property or method' (438)
Can you help me please as I am not so good at scraping issues?
To get the names and their corresponding phone numbers, you can try the below snippet:
Sub GetProfileInfo()
    Const URL$ = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page="
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLDivElement, R&, P&
    For P = 1 To 3 'put here the highest page number you want to traverse
        With Http
            .Open "GET", URL & P, False
            .send
            Html.body.innerHTML = .responseText
        End With
        For Each post In Html.getElementsByClassName("ldb-contact-summary")
            With post.querySelectorAll(".ldb-contact-name a")
                If .Length Then R = R + 1: Cells(R, 1) = .item(0).innerText
            End With
            With post.getElementsByClassName("ldb-phone-number")
                If .Length Then Cells(R, 2) = .item(0).innerText
            End With
        Next post
    Next P
End Sub
References to add to the library to execute the above script:
Microsoft XML, v6.0
Microsoft HTML Object Library
As you mention all the pages in the comment above I will use a class to hold the XMLHTTP object and provide it with methods to extract the data, whilst incorporating a method to find the number of results pages and loop them. Testing this gave me 251 rows of results.
Note: Discovered through debugging that keeping the SetRequestHeader was causing, for you, requests for human verification. Removing this meant the XMLHTTP method worked. It worked with and without for me.
Class clsHTTP

Option Explicit

Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    Dim sResponse As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
        GetString = sResponse
    End With
End Function

Public Function GetInfo(ByVal html As HTMLDocument) As Variant
    Dim names As Object, telNums As Object, i As Long, namesArray(), telsArray()
    Set names = html.querySelectorAll("[class*='ldb-contact-name']")
    Set telNums = html.querySelectorAll(".ldb-phone-number")
    ReDim namesArray(0 To names.Length - 1)
    ReDim telsArray(0 To telNums.Length - 1)
    For i = 0 To names.Length - 1
        namesArray(i) = names.item(i).innerText
        telsArray(i) = telNums.item(i).innerText
    Next
    GetInfo = Array(namesArray, telsArray)
End Function
Standard module 1
Option Explicit

Public Sub GetReviewData()
    Dim sResponse As String, html As HTMLDocument, http As clsHTTP
    Dim numPages As Long, pageNum As Long, url As String
    Dim results As Collection, item As Variant, ws As Worksheet
    url = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/"
    Set http = New clsHTTP
    Set html = New HTMLDocument
    Set results = New Collection
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With html
        .body.innerHTML = http.GetString(url)
        numPages = .querySelectorAll("[data-idx]").item(html.querySelectorAll("[data-idx]").Length - 2).innerText
        results.Add http.GetInfo(html)
        If numPages > 1 Then
            For pageNum = 2 To numPages
                url = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page=" & pageNum & "&showAdvancedItems=false&regionID=17762&locationText=Detroit%20MI"
                .body.innerHTML = http.GetString(url)
                results.Add http.GetInfo(html)
            Next
        End If
        Dim numResults As Long
        If results.Count > 0 Then
            Application.ScreenUpdating = False
            For Each item In results
                numResults = UBound(item(0)) + 1
                With ws
                    .Cells(GetLastRow(ws, 1) + 1, 1).Resize(numResults, 1) = Application.Transpose(item(0))
                    .Cells(GetLastRow(ws, 2) + 1, 2).Resize(numResults, 1) = Application.Transpose(item(1))
                End With
            Next
            Application.ScreenUpdating = True
        End If
    End With
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function
Selenium:
Option Explicit

Public Sub GetReviewData()
    Dim html As HTMLDocument
    Dim numPages As Long, pageNum As Long, url As String
    Dim results As Collection, item As Variant, ws As Worksheet
    Dim d As WebDriver, elements As WebElements
    url = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page=1&showAdvancedItems=false&regionID=17762&locationText=Detroit%20MI"
    Set html = New HTMLDocument
    Set results = New Collection
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set d = New ChromeDriver
    With d
        .Start "Chrome"
        .get url
        Set elements = .FindElementsByCss("[data-idx]")
        numPages = elements(elements.Count - 1).Text
        html.body.innerHTML = .PageSource
        results.Add GetInfo(html)
        If numPages > 1 Then
            For pageNum = 2 To numPages
                Application.Wait Now + TimeSerial(0, 0, 2)
                url = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page=" & pageNum & "&showAdvancedItems=false&regionID=17762&locationText=Detroit%20MI"
                .get url
                html.body.innerHTML = .PageSource
                results.Add GetInfo(html)
            Next
        End If
        Dim numResults As Long
        If results.Count > 0 Then
            Application.ScreenUpdating = False
            For Each item In results
                numResults = UBound(item(0)) + 1
                With ws
                    .Cells(GetLastRow(ws, 1) + 1, 1).Resize(numResults, 1) = Application.Transpose(item(0))
                    .Cells(GetLastRow(ws, 2) + 1, 2).Resize(numResults, 1) = Application.Transpose(item(1))
                End With
            Next
            Application.ScreenUpdating = True
        End If
    End With
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

Public Function GetInfo(ByVal html As HTMLDocument) As Variant
    Dim names As Object, telNums As Object, i As Long, namesArray(), telsArray()
    Set names = html.querySelectorAll("[class*='ldb-contact-name']")
    Set telNums = html.querySelectorAll(".ldb-phone-number")
    ReDim namesArray(0 To names.Length - 1)
    ReDim telsArray(0 To telNums.Length - 1)
    For i = 0 To names.Length - 1
        namesArray(i) = names.item(i).innerText
        telsArray(i) = telNums.item(i).innerText
    Next
    GetInfo = Array(namesArray, telsArray)
End Function

Scrape text from a website using Excel VBA

I found this article explaining how to scrape certain tags from a website using Excel VBA.
The code below gets the content from the first <p> tag that it finds:
Sub get_title_header()
    Dim wb As Object
    Dim doc As Object
    Dim sURL As String
    Dim lastrow As Long
    lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
    For i = 2 To lastrow
        Set wb = CreateObject("internetExplorer.Application")
        sURL = Cells(i, 1)
        wb.navigate sURL
        wb.Visible = True
        While wb.Busy
            DoEvents
        Wend
        'HTML document
        Set doc = wb.document
        Cells(i, 2) = doc.title
        On Error GoTo err_clear
        Cells(i, 3) = doc.GetElementsByTagName("p")(0).innerText
err_clear:
        If Err <> 0 Then
            Err.Clear
            Resume Next
        End If
        wb.Quit
        Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit
    Next i
End Sub
I'd like to make the scraper get all the content that is within a <p> tag on a webpage. So I guess a foreach functionality of some kind is missing.
How can the content from multiple <p> tags be collected?
UPDATE
The working code!
Sub get_title_header()
    Dim wb As Object
    Dim doc As Object
    Dim sURL As String
    Dim lastrow As Long
    Dim i As Integer
    lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
    For i = 2 To lastrow
        Set wb = CreateObject("internetExplorer.Application")
        sURL = Cells(i, 1)
        wb.navigate sURL
        wb.Visible = True
        While wb.Busy
            DoEvents
        Wend
        'HTML document
        Set doc = wb.document
        Cells(i, 2) = doc.Title
        On Error GoTo err_clear
        Dim el As Object
        For Each el In doc.GetElementsByTagName("p")
            counter = counter + 1
            Cells(i, counter + 2).Value = Cells(counter + 1).Value & el.innerText
        Next el
        counter = 0
err_clear:
        If Err <> 0 Then
            Err.Clear
            Resume Next
        End If
        wb.Quit
        Range(Cells(i, 1), Cells(i, 10)).Columns.AutoFit
    Next i
End Sub
You're almost there! doc.GetElementsByTagName("p") returns a collection of HTMLParagraphElement objects of which you accessed the first entry using doc.GetElementsByTagName("p")(0). As you allude to, a For Each loop would let you access each in turn:
Sub get_title_header()
    Dim wb As Object
    Dim doc As Object
    Dim sURL As String
    Dim lastrow As Long
    Dim i As Integer
    lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
    For i = 2 To lastrow
        Set wb = CreateObject("internetExplorer.Application")
        sURL = Cells(i, 1)
        wb.navigate sURL
        wb.Visible = True
        While wb.Busy
            DoEvents
        Wend
        'HTML document
        Set doc = wb.document
        Cells(i, 2) = doc.Title
        On Error GoTo err_clear
        Dim el As Object
        For Each el In doc.GetElementsByTagName("p")
            Cells(i, 3).Value = Cells(i, 3).Value & ", " & el.innerText
        Next el
err_clear:
        If Err <> 0 Then
            Err.Clear
            Resume Next
        End If
        wb.Quit
        Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit
    Next i
End Sub
If you just need to get the content of the webpage in plain text, this code is more concise:
Function WEBSITE_TEXT(Destination As String) As String
    ' Requires references to Microsoft XML, v6.0 and Microsoft HTML Object Library
    ' Draws on the stackoverflow answer at bit.ly/parseXML
    Dim myRequest As XMLHTTP60
    Dim myDomDoc As DOMDocument60
    ' Check and clean inputs
    On Error GoTo exitRoute
    If Destination = "" Then
        WEBSITE_TEXT = ""
        Exit Function
    End If
    ' Request the page
    Set myRequest = New XMLHTTP60
    myRequest.Open "GET", Destination, False
    myRequest.send
    ' Parse HTML content
    Dim html As New HTMLDocument
    Dim text As String
    html.body.innerHTML = myRequest.responseText
    ' Return the website content
    text = html.body.innerText
    If Not html Is Nothing Then WEBSITE_TEXT = text
exitRoute:
    ' Tidy up
    text = ""
    Set myRequest = Nothing
End Function
