Loop through NextSibling DIV tagname Selenium VBA - excel

My HTML page is available at https://pastebin.com/nu0dLvch
It contains elements with the id "DetailSection1", and I am trying to loop through the div tags that follow each of them (there are five such tags).
This is my attempt, but it didn't work for me:
Dim v, post As Object, sibling As Object, i As Long
Set post = .FindElementsByCss("#DetailSection1")
For i = 1 To post.Count
'Debug.Print post.Item(i).Attribute("outerHTML")
Set sibling = post.Item(i).NextSibling
Select Case sibling.NodeType
Case 3
v = sibling.NodeValue
Case 1
v = sibling.innerText
End Select
Debug.Print v
Next i
How can I loop through the five div tags that follow each element with the id "DetailSection1"?
This is what I got for the Arabic characters (green is what I got, while the yellow part shows the correct characters)

1) If you want to keep your logic, you need to loop over successive nextElementSibling nodes until one returns an empty string (rather than hardcoding a loop to 5). N.B. I am using nextElementSibling because I want the next sibling that is an element node (nodeType 1).
W3C Element Traversal Specification, 2.4. nextElementSibling:
Accessing this attribute of an element must return a reference to the sibling node of that element which most immediately follows that element in document order, and which is of nodeType 1, as an Element object. If the element on which this attribute is accessed does not have any following sibling nodes, or if none of those following sibling nodes are element nodes, then this attribute must return null.
This is an IE version, as that is what I can test, but it should be easy to translate. Selenium Basic does not expose a nextElementSibling method, so you can either transfer the webdriver's pageSource into an MSHTML.HTMLDocument and use querySelector there (assuming MSHTML.HTMLDocument does not alter the HTML in any non-trivial way), or switch to nextSibling traversal and add a nodeType = 1 test inside the loop.
Option Explicit
Public Sub TEST()
Dim ie As New InternetExplorer
ie.Visible = True
'ie.Document.Charset = "utf-8" ''< may be required for handling Arabic chars. Not required with my settings.
ie.Navigate2 "file:///C:/Users/<User>/Desktop/Test.html"
While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Dim startNodes As Object, node As Object, firstNode As Boolean
Dim i As Long, c As Long, r As Long
Set startNodes = ie.document.querySelectorAll("#DetailSection1")
r = 1
For i = 0 To startNodes.Length - 1 'you could determine number of child divs to get num columns then use step 5 loop or mod to write out in rows and cols
Set node = Nothing
firstNode = True
c = 1
Do
If firstNode Then
Set node = startNodes.item(i).nextElementSibling
Else
Set node = node.nextElementSibling
End If
If node Is Nothing Then Exit Do 'no more element siblings, so stop before touching innerText
If node.innerText <> vbNullString Then
c = c + 1 'you may need backwards loop to reverse output columns
ActiveSheet.Cells(r, c) = Trim$(node.innerText)
End If
firstNode = False
Loop Until node.innerText = vbNullString
r = r + 1
Next
ie.Quit
End Sub
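The nextSibling alternative mentioned above (walk the siblings and test for nodeType = 1) can be sketched as a small helper. This is an untested sketch under assumptions: NextElement and SiblingDemo are hypothetical names, and the HTML string is made up for illustration. With Selenium Basic you would, in principle, assign the driver's page source to the HTMLDocument instead.

```vba
Option Explicit
'Requires reference: Microsoft HTML Object Library

'Hypothetical helper: walks nextSibling until an element node (nodeType = 1)
'is found; returns Nothing when no element sibling follows.
Public Function NextElement(ByVal node As Object) As Object
    Dim sib As Object
    Set sib = node.nextSibling
    Do While Not sib Is Nothing
        If sib.nodeType = 1 Then Exit Do
        Set sib = sib.nextSibling
    Loop
    Set NextElement = sib
End Function

Public Sub SiblingDemo()
    Dim html As New MSHTML.HTMLDocument, node As Object
    'Made-up markup; in practice assign the real page HTML here
    html.body.innerHTML = "<div id='DetailSection1'>start</div> text <div>one</div><div>two</div>"
    Set node = NextElement(html.getElementById("DetailSection1"))
    Do While Not node Is Nothing
        Debug.Print node.innerText   'prints each following div in turn
        Set node = NextElement(node)
    Loop
End Sub
```

Because the helper returns Nothing at the end of the sibling chain, the calling loop needs no hardcoded count and no empty-string test.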
2) You could dynamically pick up class and use a combination of css attribute selectors to hopefully correctly target nodes of interest. I have read in your html from file via IE. It is shaky due to reliance on attributes and relationships. It is all nested tables with few distinguishing features and likely dynamic attribute values. If not dynamic, then hardcode value for targetClass as ad66b5fc2d-4b59-45e6-b104-e14dfb5b1dac-0.
Option Explicit
Public Sub Test()
Dim ie As New InternetExplorer
ie.Visible = True
ie.Navigate2 "file:///C:/Users/User/Desktop/Test.html"
While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Dim startNodes As Object, i As Long, targetClass As String, cssSelector As String
targetClass = "." & ie.document.querySelector("#DetailSection1").nextElementSibling.className ' "." & ie.document.querySelector("div[style*='center']").className
cssSelector = targetClass & "[style*='center'], " & targetClass & "[style*='center'] ~ div[style*='text-align']"
Set startNodes = ie.document.querySelectorAll(cssSelector)
For i = 0 To startNodes.Length - 2 'you could determine number of child divs to get num columns then use step 5 loop or mod to write out in rows and cols
ActiveSheet.Cells(i + 1, 1) = startNodes.item(i).innerText
Next
ie.Quit
End Sub
Untested Selenium translation (I can't test it, so it is written from memory):
Dim startNodes As Object, i As Long, targetClass As String, cssSelector As String
targetClass = "." & .FindElementByCss("div[style*='center']").Attribute("class")
cssSelector = targetClass & "[style*='center'], " & targetClass & "[style*='center'] ~ div[style*='text-align']"
Set startNodes = .FindElementsByCss(cssSelector)
For i = 1 To startNodes.Count - 1 'Selenium Basic collections are 1-based; you could determine number of child divs to get num columns then use step 5 loop or mod to write out in rows and cols
ActiveSheet.Cells(i, 1) = startNodes.Item(i).Text
Next
You can explore whether you can get targetClass from (nodeType = 1):
.FindElementByCss("#DetailSection1").nextSibling.className
.FindElementByCss("#DetailSection1").nextSibling.Attribute("class") '<== I think this
I'm sorry I can't test. I'm not sure those versions will work but would be good to know.
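The code comments above suggest using Mod to write the flat list of matched nodes out in rows and columns. A minimal sketch of that arithmetic follows, assuming a fixed number of columns per record (5, as in the question). RowFromIndex, ColFromIndex and WriteInRows are hypothetical names, not part of any library.

```vba
Option Explicit

'Hypothetical helpers: map a 0-based flat index to 1-based sheet coordinates,
'assuming every record occupies exactly numCols consecutive nodes.
Public Function RowFromIndex(ByVal i As Long, ByVal numCols As Long) As Long
    RowFromIndex = i \ numCols + 1      'integer division gives the record number
End Function

Public Function ColFromIndex(ByVal i As Long, ByVal numCols As Long) As Long
    ColFromIndex = i Mod numCols + 1    'remainder gives the position within the record
End Function

'nodes is expected to be a 0-based nodeList such as the one returned by querySelectorAll
Public Sub WriteInRows(ByVal nodes As Object, ByVal ws As Worksheet, _
                       Optional ByVal numCols As Long = 5)
    Dim i As Long
    For i = 0 To nodes.Length - 1
        ws.Cells(RowFromIndex(i, numCols), ColFromIndex(i, numCols)) = Trim$(nodes.item(i).innerText)
    Next i
End Sub
```

This only works if every record really has the same number of divs; if that is not guaranteed, sibling traversal per start node is the safer route.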
Ref:
https://stackoverflow.com/a/62366101/6241235 # أبو عائشة ورقية ومحمد

In fact, I am very satisfied with the solutions presented by QHarr. But I am eager to discover and learn new skills, so, after studying what QHarr presented, I made my own attempt with Selenium:
Sub Test()
Dim bot As New ChromeDriver, a(1 To 1000, 1 To 5), post As Object, i As Long, j As Long
With bot
.AddArgument "--headless"
.Get "file:///C:\Sample.html"
Set post = .FindElementsByCss("#DetailSection1")
If post.Count > 0 Then
For i = 1 To post.Count
For j = 1 To 5
a(i, j) = Application.WorksheetFunction.Clean(Trim$(post.Item(i).FindElementsByXPath("following-sibling::div")(j).Text))
Next j
Next i
ActiveSheet.Range("A1").Resize(post.Count, UBound(a, 2)).Value = a
End If
End With
End Sub
** Note: I welcome any new and other ideas as I am eager to learn about different approaches.

Related

How to scrape a class and if not found scrape another

I am using VBA to scrape a website. The scraper made by me works but I want to implement 2 more functions and don't really know how to do it. This is the code:
Sub pronutrition()
Set ie = CreateObject("InternetExplorer.Application")
my_url = "https://www.myprotein.ro/"
ie.Visible = True
i = 20
LastRow = ActiveSheet.Range("A" & ActiveSheet.Rows.Count).End(xlUp).Row
Set Rng = ActiveSheet.Range("A20:A" & LastRow)
For Each cell In Rng
ie.navigate my_url
Do While ie.Busy
DoEvents
Loop
Wait 1
ie.Document.getElementsByName("search")(0).Value = cell
ie.Document.getElementsByClassName("headerSearch_button")(0).Click
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("B" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(0).innerText + ie.Document.getElementsByClassName("athenaProductBlock_fromValue")(0).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("C" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(1).innerText + ie.Document.getElementsByClassName("athenaProductBlock_fromValue")(1).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("D" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(2).innerText '+ ie.Document.getElementsByClassName("athenaProductBlock_priceValue")(2).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
ActiveSheet.Range("E" & i) = ie.Document.getElementsByClassName("athenaProductBlock_productName")(3).innerText '+ ie.Document.getElementsByClassName("athenaProductBlock_priceValue")(3).innerText
Do While ie.Busy
DoEvents
Loop
Wait 2
i = i + 1
Next cell
ie.Quit
MsgBox "Done"
End Sub
First, I want to search for the "athenaProductBlock_fromValue" class and, if it is not found, search for "athenaProductBlock_priceValue". Second, if fewer products are found than expected (the range is set to 4), I want the search to stop; right now it returns an error if it doesn't find a 2nd or a 3rd product and won't go on to search the next keyword.
Any advice would be appreciated.
Thank you!
Use a helper method to extract the HTMLCollection returned by the getElementsByClassName method. You can then check if the method returned any results.
Once you get the filled collection back, it's up to you how to handle it. You can loop and fill individual cells, or join the results to fill a single cell. You can also ignore it if the Count is less than 2, etc.
Private Function TryExtractElementsByClassName(ByVal ie As Object, _
ByVal className As String, _
ByRef objCollection As VBA.Collection) As Boolean
'if ie is null, return false
If ie Is Nothing Then Exit Function
'if elements (HTMLCollection) is null, return false
Dim elements As Object
Set elements = ie.Document.getElementsByClassName(className)
If elements Is Nothing Then Exit Function
'fill collection
Dim element As Object, idx As Long
For idx = 0 To elements.Length - 1 'HTMLCollection is 0-based
Set element = elements(idx)
If Not element Is Nothing Then objCollection.Add element
Next idx
'return
TryExtractElementsByClassName = objCollection.Count > 0
End Function
To call the helper method:
Sub Test()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
Dim objColl As New VBA.Collection
'search single class name
If TryExtractElementsByClassName(ie, "athenaProductBlock_priceValue", objColl) Then
'handle results stored in objColl
End If
'search multiple class names separated by a space (this matches elements that carry every listed class)
If TryExtractElementsByClassName(ie, "athenaProductBlock_priceValue athenaProductBlock_fromValue", objColl) Then
'handle results stored in objColl
End If
End Sub
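The "try one class, then fall back to another" part of the question can be sketched as below. This is a hedged sketch, not the answerer's code: TextByClassOrFallback and WriteProducts are hypothetical names, and it assumes the n-th price element belongs to the n-th product name, which may not hold if a page mixes both price classes.

```vba
Option Explicit

'Hypothetical helper: text of the idx-th (0-based) element with class primaryClass;
'falls back to fallbackClass, and returns vbNullString when neither has enough matches.
Public Function TextByClassOrFallback(ByVal doc As Object, ByVal primaryClass As String, _
                                      ByVal fallbackClass As String, ByVal idx As Long) As String
    Dim nodes As Object
    Set nodes = doc.getElementsByClassName(primaryClass)
    If nodes.Length <= idx Then Set nodes = doc.getElementsByClassName(fallbackClass)
    If nodes.Length > idx Then TextByClassOrFallback = nodes.Item(idx).innerText
End Function

'Writes up to maxProducts results into columns B onwards of rowNum,
'stopping quietly when the page lists fewer products.
Public Sub WriteProducts(ByVal doc As Object, ByVal ws As Worksheet, ByVal rowNum As Long, _
                         Optional ByVal maxProducts As Long = 4)
    Dim names As Object, k As Long
    Set names = doc.getElementsByClassName("athenaProductBlock_productName")
    For k = 0 To maxProducts - 1
        If k >= names.Length Then Exit For
        ws.Cells(rowNum, k + 2) = names.Item(k).innerText & " " & _
            TextByClassOrFallback(doc, "athenaProductBlock_fromValue", "athenaProductBlock_priceValue", k)
    Next k
End Sub
```

Inside the question's loop this would be called as WriteProducts ie.Document, ActiveSheet, i, replacing the four hardcoded assignments.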

Web Scraping: Button clicking and help navigating through paths

I am trying to scrape some doctor names and addresses from the website: https://albertafindadoctor.ca/find-a-doc/directory
I am trying to solve the following issue:
Once on the doctor's toggle, I want to pull 4 pieces of data from the entire page, not just the first 25 displayed.
While the code works for the initial webpage, it only pulls the first 25 pieces of data. There are a significant number of other pages that I still need to pull (3822 different doctors).
Unfortunately, I'm at a loss on how to navigate to and pull from these different pages. When I inspect the elements to see how navigation between pages works, I see a matrix changing, so I'm not sure whether that has something to do with it.
Option Explicit
Sub GetAlbertaDoctors()
Dim objIE As InternetExplorer
Dim clinicEle As Object
Dim clinicName As String
Dim clinicAddress As String
Dim clinicCategory As String
Dim doctorName As String
Dim y As Integer
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.Navigate "https://albertafindadoctor.ca/find-a-doc/directory"
While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Wend
Application.Wait (Now + TimeValue("0:00:5"))
objIE.Document.getElementsByClassName("physician-toggle")(0).Click
Application.Wait (Now + TimeValue("0:00:5"))
y = 2
For Each clinicEle In objIE.Document.getElementsByClassName("clinic")
clinicCategory = clinicEle.getElementsByClassName("pcn")(0).innerText
clinicName = clinicEle.getElementsByClassName("clinic-name")(0).innerText
doctorName = clinicEle.getElementsByTagName("h3")(0).innerText
clinicAddress = clinicEle.getElementsByClassName("address")(0).innerText
Sheets("Sheet2").Range("A" & y).Value = clinicCategory
Sheets("Sheet2").Range("B" & y).Value = clinicName
Sheets("Sheet2").Range("C" & y).Value = doctorName
Sheets("Sheet2").Range("D" & y).Value = clinicAddress
y = y + 1
Next
objIE.Quit
End Sub
When I run this, I get the error 91 "Object variable or With block variable not set" on the clicking line:
objIE.Document.getElementsByClassName("physician-toggle active")(0).Click
You don't need to loop all pages. You can use the browser to get to that page and click on Doctors if required. After that, grab the number of results and then mimic the xhr request the page makes for listings - which is returned as json. Alter the query string the page makes i.e. the parameter for limit to get all listings. Use a json parser (I use jsonconverter - instructions in the code for installation) to parse out your info.
There is a proper page load wait and a couple of loops to ensure elements are present. These should really be timed loops. See loop format here.
I add an additional test to ensure you do not attempt to click Doctors when it is not required to do so.
Not all listings have all the info, hence the On Error Resume Next paired with On Error GoTo 0. It looks like you may be able to build a dictionary to fill in some of the blank values based on existing paired values (or using the ids present in the json object).
I store all results in an array and write out in one go.
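The timed loop mentioned above could look roughly like the following. This is a sketch under assumptions: WaitForElement is a hypothetical helper name, the 10 second default is arbitrary, and Timer resets at midnight, which is not handled here.

```vba
Option Explicit

'Hypothetical helper: polls doc for cssSelector until a match is found
'or maxWaitSec seconds have elapsed. Returns Nothing on timeout.
Public Function WaitForElement(ByVal doc As Object, ByVal cssSelector As String, _
                               Optional ByVal maxWaitSec As Single = 10) As Object
    Dim startTime As Single, ele As Object
    startTime = Timer
    Do
        DoEvents
        On Error Resume Next        'the document may not be ready yet
        Set ele = doc.querySelector(cssSelector)
        On Error GoTo 0
        If Not ele Is Nothing Then Exit Do
    Loop While Timer - startTime < maxWaitSec
    Set WaitForElement = ele
End Function
```

A caller would then test the result, for example Set ele = WaitForElement(ie.document, ".paginator") followed by If ele Is Nothing Then Exit Sub, rather than looping forever when the element never appears.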
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
' Microsoft Scripting Runtime
'Download and add to standard module called jsonconverter from https://github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas
Public Sub GetListings()
Dim ie As InternetExplorer, s As String, json As Object, newUrl As String
Set ie = New InternetExplorer
With ie
.Visible = True
.Navigate2 "https://albertafindadoctor.ca/find-a-doc/directory"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document.querySelector("[data-cp-option=physician]")
If Not .className = "physician-toggle active" Then .Click
End With
Dim resultsInfo() As String, numResults As Long, ele As Object
Do
On Error Resume Next
Set ele = .document.querySelector(".paginator")
On Error GoTo 0
Loop While ele Is Nothing
Do
Loop While .document.querySelector(".paginator").innerText = vbNullString
resultsInfo = Split(Trim$(.document.querySelector(".paginator").innerText), "of ")
.Quit
End With
numResults = resultsInfo(UBound(resultsInfo))
newUrl = "https://albertafindadoctor.ca/search/directory/physicians?page=1&limit=" & numResults & "&with[]=pcn&with[]=clinics&with[]=languages&with[]=specialties"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", newUrl, False
.send
Set json = JsonConverter.ParseJson(.responseText)("items")
End With
Dim row As Object, results(), r As Long, headers(), ws As Worksheet, key As Variant
headers = Array("clinicCategory", "clinicName", "doctorName", "clinicAddress")
Set ws = ThisWorkbook.Worksheets("Sheet1")
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each row In json
r = r + 1
On Error Resume Next
For Each key In row.keys
Select Case key
Case "clinical_name"
results(r, 3) = row(key)
Case "pcn"
results(r, 1) = row(key)("name")
Case "clinics"
results(r, 2) = row(key)(1)("name")
results(r, 4) = Join$(Array(row(key)(1)("street_address"), row(key)(1)("city"), row(key)(1)("province"), row(key)(1)("postal_code")), ", ")
End Select
Next
On Error GoTo 0
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Sample output:
Reading:
querySelector
json
css selectors
arrays and arrays2

Pulling specific table cells from Morningstar, then looping to next Morningstar page

I am currently trying to scrape certain pieces of data from a table on Morningstar, then have it loop to the next ticker and repeat until there are no more tickers.
Currently, it will pull the entire "rank in category" row on the Trailing Total Returns table. I'm simply trying to pull the 3 month, 6 month, YTD, 1 year, 3 year, and 5 year. When it's done pulling those, it will loop to the next ticker as determined by the "Cells(p, 14)" in the navigate line.
That is, it detects that "LINKX" is in cell 1, 14, so it navigates to http://performance.morningstar.com/fund/performance-return.action?t=LINKX&region=usa&culture=en_US and pulls all of the "Rank in Category" values from the "Trailing Total Returns" table. I only want the specified ones put into specified cell locations, and then to loop to the next ticker.
I've browsed through many of these threads. Using Excel VBA, I am trying to pull specific key info from a given ticker's page, then loop to the next ticker and repeat.
Declare PtrSafe Function apiShowWindow Lib "user32" Alias "ShowWindow" _
(ByVal hwnd As LongPtr, ByVal nCmdShow As LongPtr) As LongPtr
Global Const SW_MAXIMIZE = 3
Global Const SW_SHOWNORMAL = 1
Global Const SW_SHOWMINIMIZED = 2
Sub LinkedInWebScrapeScript()
Dim objIE As InternetExplorer
Dim html As HTMLDocument
Set objIE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
objIE.Visible = 1
Dim p As Integer
p = 3
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "&region=usa&culture=en_US")
Application.Wait Now + #12:00:02 AM#
While objIE.Busy
DoEvents
Wend
apiShowWindow objIE.hwnd, SW_MAXIMIZE
For i = 1 To 2
objIE.document.parentWindow.scrollBy 0, 100000 & i
Application.Wait Now + #12:00:01 AM#
Next i
Dim TDelements As IHTMLElementCollection
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleColtd1 As MSHTML.IHTMLElementCollection
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Set htmldoc = objIE.document 'Document webpage
Set eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags
Set TDelements = htmldoc.getElementsByTagName("table")
'This section populates Excel
i = 0 'start with first value in tr collection
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
i = i + 1
p = p + 1
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "&region=usa&culture=en_US")
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
z = z + 1
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
End Sub
The following shows a loop and how to select the appropriate table, tbody then table cells using css selectors. Tickers are read into an array from column N starting at row 1. It assumes there are not blank cells within the range (though you could add a test to be sure).
There is a loop over the array, which contains each ticker, and the TICKER placeholder in the url is replaced with the current ticker value.
There is a line to click on the monthly display tab.
The appropriate row is identified via
Set rankings = .querySelectorAll("#tab-month-end-content .last td")
#tab-month-end-content is an id selector which gets the right tab, then .last is the class selector for the class name of the last tbody (which is last), then td is used to specify the child td cells within that tbody.
CSS selectors:
Modern browsers are optimized for css. Css selectors are a fast way to match on elements in an html document. Css selectors are applied via querySelector or querySelectorAll methods; in this case, of HTMLDocument (ie.document). querySelector returns a single node: the first match for the css selector; querySelectorAll returns a nodeList of all matched items - you then index into that nodeList to get specific items e.g. the second td cell is at index 1.
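The difference between querySelector and querySelectorAll can be tried on a small in-memory document. This is a sketch with made-up markup; SecondCellText and SelectorDemo are hypothetical names, and it assumes a reference to the Microsoft HTML Object Library.

```vba
Option Explicit
'Requires reference: Microsoft HTML Object Library

'Hypothetical helper: text of the second match (index 1) for a css selector,
'or vbNullString when there are fewer than two matches.
Public Function SecondCellText(ByVal html As MSHTML.HTMLDocument, ByVal css As String) As String
    Dim cells As Object
    Set cells = html.querySelectorAll(css)      'nodeList of all matches
    If cells.Length > 1 Then SecondCellText = cells.item(1).innerText
End Function

Public Sub SelectorDemo()
    Dim html As New MSHTML.HTMLDocument
    'Made-up markup for illustration only
    html.body.innerHTML = "<table><tbody class='last'><tr><td>first</td><td>second</td></tr></tbody></table>"
    Debug.Print html.querySelector(".last td").innerText   'single node: the first match
    Debug.Print SecondCellText(html, ".last td")           'indexing into the nodeList
End Sub
```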
Looking at the pattern we specified:
#tab-month-end-content .last td
The first part is an id selector, #, which selects an element by id
#tab-month-end-content
When applied to the page this returns two matches and we want the second
The next part
.last
is a class selector, ., for class name last. This selects the tbody tag child element shown in the image above. As only the second id matched element has this child we are now working with the right parent element to go on and select the td type elements using type selector
td
The whitespace between each part described above is known as a descendant combinator. It specifies that elements matched by the second selector are selected if they have an ancestor element matching the first selector, i.e. the selector to the left must match an ancestor of the elements matched by the adjacent css selector to the right.
We can see this with the next image:
VBA:
Option Explicit
Public Sub GetData()
Dim ie As Object, tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "5y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
With ie
.Visible = True
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/fund/performance-return.action?t=TICKER&region=usa&culture=en_US", "TICKER", tickers(i))
.Navigate2 url
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("[tabname='#tabmonth']").Click
Dim rankings As Object
Do
Loop While .document.querySelectorAll("#tab-month-end-content .last td").Length = 0 'could add timed loop here
With .document
Set rankings = .querySelectorAll("#tab-month-end-content .last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
.Quit
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function
As mentioned by @SIM, you could use xmlhttp and avoid the browser, though I am not sure whether, with your security settings, you would need to whitelist sites. You will need to explore whether the placeholder is valid in the url here: XNAS:TICKER. The XNAS prefix may vary across your tickers, in which case you would need the appropriate string, including the prefix, in column N and then replace the extended placeholder with that e.g. .....=PLACEHOLDER&region.......
Option Explicit
Public Sub GetData()
Dim tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String, html As HTMLDocument
Set html = New HTMLDocument 'vbe > tools > references > Microsoft HTML Object Library
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "5y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
With CreateObject("MSXML2.XMLHTTP")
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:TICKER&region=usa&culture=en-US&cur=&ops=clear&s=0P0000J533&ndec=2&ep=true&align=m&annlz=true&comparisonRemove=false&loccat=&taxadj=&benchmarkSecId=&benchmarktype=", "TICKER", tickers(i))
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "DNT", "1"
.send
html.body.innerHTML = .responseText
Dim rankings As Object
With html
Set rankings = .querySelectorAll(".last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function

Excel Macro To Pull Google Image Links

The goal is to get images from Google Images that match the part numbers in my database. My code runs, and it pulls up the correct Google pages but refuses to put the links into the spreadsheet. I have tried everything I can think of, but as of now, I keep on getting Error 1004 (Application-defined or Object-defined error).
Sub SearchBotGoogleImgLink()
Dim objIE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim HTMLdoc As HTMLDocument
Dim imgElements As IHTMLElementCollection
Dim imgElement As HTMLImg
Dim aElement As HTMLAnchorElement
Dim n As Integer
Dim i As Integer
Dim url As String
Dim url2 As String
Dim m As Long
Dim lastRow As Long
Dim url3 As String
Dim SearchRow As Long
Dim aEle As HTMLLinkElement
Worksheets("Sheet1").Select
SearchRow = 1
Do Until IsEmpty(ActiveSheet.Cells(SearchRow, 1))
Sheets("Sheet1").Select
Application.StatusBar = SearchRow - 1 & " of " & "4368" & " Items Done"
Item = Trim(ActiveSheet.Cells(SearchRow, 1))
url = "https://www.google.com/search?hl=en&biw=1600&bih=796&tbm=isch&sa=1&ei=CTOpW_2jO6nAjwT67rqACw&q=A2N0015C3KUU&oq=" & Cells(SearchRow, 1) & "&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate url
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
For Each aEle In objIE.document.getElementsByTagName("IMG")
result = aEle
Sheets("Sheet1").Range(SearchRow & "C").Value = result
Sheets("Sheet1").Range(SearchRow & "D") = aEle.innerHTML
Sheets("Sheet1").Range(SearchRow & "F").Value = aEle.innerText
Debug.Print aEle.innerText
Next
Loop
'For i = 1 To lastRow
'url = "https://www.google.com/search?hl=en&biw=1600&bih=796&tbm=isch&sa=1&ei=CTOpW_2jO6nAjwT67rqACw&q=A2N0015C3KUU&oq=" & Cells(SearchRow, 1) & "&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
Set HTMLdoc = objIE.document
Set imgElements = HTMLdoc.getElementsByTagName("IMG")
n = 1
For Each imgElement In imgElements
If InStr(ingElement.src, sImageSearchString) Then
If imgElement.ParentNode.nodeName = "A" Then
Set aElement = imgElement.ParentNode
If n = 2 Then
url2 = aElement.href 'imgElement.src
url3 = imgElement.src 'aElement.href
n = n + 1
End If
End If
End If
Next
Cells(SearchRow, 5) = url2
IE.Quit
Set IE = Nothing
End Sub
Notes on your code:
You need Option Explicit at the top of your code to check variable declarations and typos, amongst other advantages. There are a number of missing declarations, e.g. result, and variables used without being declared, e.g. IE in Set IE = CreateObject("InternetExplorer.Application"). You have two different variables (one late bound and one early bound), both creating IE instances. In fact, you only use one.
Your current error may be down to you trying to work with an object here:
result = aEle which won't work without the Set keyword to provide the required reference.
Without example URLs and expected output it is difficult to advise on the later loops in your code. You appear to have a duplicate loop over IMG elements but this time with some restrictions. It is likely these loops can be merged.
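Two further details in the question's first loop look likely to cause trouble, as far as can be told from the posted code: Range(SearchRow & "C") builds the address "1C" rather than "C1", which would raise error 1004, and SearchRow is never incremented. A hedged sketch of the write step, with hypothetical helper names CellAddress and WriteImageLinks:

```vba
Option Explicit

'Hypothetical helper: builds an A1-style address such as "C1" (column letter first)
Public Function CellAddress(ByVal columnLetter As String, ByVal rowNumber As Long) As String
    CellAddress = columnLetter & rowNumber
End Function

'Writes the src of every img element in doc to column C, starting at startRow.
'Returns the next free row so the caller can continue from there.
Public Function WriteImageLinks(ByVal doc As Object, ByVal ws As Worksheet, _
                                ByVal startRow As Long) As Long
    Dim img As Object, r As Long
    r = startRow
    For Each img In doc.getElementsByTagName("img")
        ws.Range(CellAddress("C", r)).Value = img.src   'a string property, so no Set is needed
        r = r + 1
    Next img
    WriteImageLinks = r
End Function
```

The loop variable is declared As Object here because the question's HTMLLinkElement type does not describe img elements.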
An example:
The following uses an arbitrary search URL to pull the img src links from the search results for A2N0015C3KUU.
It uses a CSS selector combination of #ires img[src] to target img elements that have a src attribute within the parent element with id ires (search results).
It demonstrates the principle of gathering a nodeList of matching elements and writing it out to a sheet. The querySelectorAll method applies the CSS selector combination to the HTMLDocument and returns the nodeList. The nodeList is looped along its .Length, with items accessed by index starting at 0.
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.google.com/search?hl=en&biw=1600&bih=796&tbm=isch&sa=1&ei=CTOpW_2jO6nAjwT67rqACw&q=A2N0015C3KUU&oq=1&%20%22&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim aNodeList As Object, i As Long
Set aNodeList = IE.document.querySelectorAll("#ires img[src]")
For i = 0 To aNodeList.Length - 1
ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i).src
Next
'Quit '<== Remember to quit application
End With
End Sub

web scraping using excel and VBA

I wrote my VBA code in an Excel sheet, as below, but it does not scrape the data for me and I don't know why. Can anyone please help me? It gives me only "click here to read more" as the result. I want to scrape the entire data, such as first name, last name, state, zip code and so on.
Sub extractTablesData()
Dim IE As Object, obj As Object
Dim myState As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
myState = InputBox("Enter the city where you wish to work")
With IE
.Visible = True
.navigate ("http://www.funeralhomes.com/go/listing/Search? name=&city=&state=&country=USA&zip=&radius=")
While IE.readyState <> 4
DoEvents
Wend
For Each obj In IE.document.all.item("state").Options
If obj.innerText = myState Then
obj.Selected = True
End If
Next obj
IE.document.getElementsByValue("Search").item.Click
Do While IE.Busy: DoEvents: Loop
ThisWorkbook.Sheets("Sheet1").Range("A1:K1500").ClearContents
Set elemCollection = IE.document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Using the same URL as the answer already given you could alternatively select with CSS selectors to get the elements of interest, and use split to get just the names and address parts from the text. We can also do away with the browser altogether to get faster results from first results page.
Business name:
You can get the name with the following selector (using paid listing example):
div.paid-listing .listing-title
This selects (sample view)
Address info:
The associated descriptive information can be retrieved with the selector:
div.paid-listing .address-summary
And then using split we can parse this into just the address information.
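The split step can be sketched in isolation. FirstLine is a hypothetical helper name; unlike the one-liner in the answer's code, it also guards against empty input and strips a trailing carriage return, which is an added assumption about the text rather than something the answer states.

```vba
Option Explicit

'Hypothetical helper: returns the first line of a multi-line string,
'or vbNullString when the input is empty.
Public Function FirstLine(ByVal text As String) As String
    Dim parts() As String
    parts = Split(text, Chr$(10))               'Chr$(10) is the line feed character
    If UBound(parts) >= 0 Then
        FirstLine = Trim$(Replace$(parts(0), vbCr, vbNullString))
    End If
End Function
```

For example, FirstLine applied to an address-summary whose first line holds the street address would return just that line.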
Code:
Option Explicit
Public Sub GetTitleAndAddress()
Dim oHtml As HTMLDocument, nodeList1 As Object, nodeList2 As Object, i As Long
Const URL As String = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", URL, False
.send
oHtml.body.innerHTML = .responseText
End With
Set nodeList1 = oHtml.querySelectorAll("div.paid-listing .listing-title")
Set nodeList2 = oHtml.querySelectorAll("div.paid-listing .address-summary")
With Worksheets("Sheet3")
.UsedRange.ClearContents
For i = 0 To nodeList1.Length - 1
.Range("A" & i + 1) = nodeList1.Item(i).innerText
.Range("B" & i + 1) = Split(nodeList2.Item(i).innerText, Chr$(10))(0)
Next i
End With
End Sub
Example output:
Yeah, without an API, this can be very tricky at best, and very inconsistent at worst. For now, you can try the script below.
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
DoEvents
Loop
RowCount = 1
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
If itm.classname Like "*free-listing*" Or itm.classname Like "*paid-listing*" Then
.Range("A" & RowCount) = itm.classname
.Range("B" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
End If
Next itm
End With
End Sub
You probably want some kind of input box to capture the city, state and radius from the user, or capture those variables in cells in your worksheet.
Note that '%20' is the URL-encoded space character.
I got this idea from a friend of mine, Joel, a long time ago. That guy is great!

Resources