How to extract text from div element under numerous elements?
Hello, I am trying to scrape data from a website. I have attached the code via Pastebin. The text I need, for example "NET SALES", sits under a tr, then a td, then a div, and I am having a hard time extracting it. If the text is NET SALES, I would like to get the value of the div inside the 4th td. The report is generated dynamically, so I am also having a hard time identifying a stable id for it. The table id "ctl00_cpMain_rptMain_fixedTable" can be used, but I can't get the code to navigate down through all those elements to reach the tbody that contains the tr and the elements I need. Thank you.
Sub GetData()
Dim IE As Object
Dim ListOfRows As Object
Dim CellsInsideRow As Object
Dim DivsInsideCell As Object
Dim StrInsideDiv As Object
Dim tRows As Object
Dim tCells As Object
Dim tDivs As Object
Dim Content As Object
Dim ContentArea As Object
Dim ThingInDiv As Object
Dim tThings
Dim t As Integer
Dim s As String
Dim Tbody As Object
Dim ttDivs As HTMLDivElement
Dim DivsInsideOfDivs As Object
Set IE = CreateObject("InternetExplorer.Application")
'Dim IE As SHDocVw.InternetExplorer'
Dim HtmlDoc As MSHTML.HTMLDocument
Dim htmlinput As MSHTML.IHTMLElement
Set IE = New SHDocVw.InternetExplorer
' Open Home Page'
With IE
.Visible = True
.navigate ""
End With
' Identify Source Code'
Set HtmlDoc = IE.document
'Mention this bug'
' Input Username and Password'
'On Error Resume Next
'IE.document.forms("aspnetForm").elements("ctl00$cpMain$logMain$UserName").Value = "**********"
'IE.document.forms("aspnetForm").elements("ctl00$cpMain$logMain$Password").Value = "******"
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
IE.document.forms("aspnetForm").elements("ctl00$cpMain$StartDate").Value = Range("A1").Text
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
On Error Resume Next
Set ListOfRows = IE.document.getElementsByTagName("tr")
Debug.Print ListOfRows.Length
For Each tRows In ListOfRows
Set CellsInsideRow = tRows.getElementsByTagName("td")
For Each tCells In CellsInsideRow
Set DivsInsideCell = tCells.getElementsByTagName("div")
For Each tDivs In DivsInsideCell
Set ttDivs = tDivs
'If Trim(ttDivs.innerText) = "AVERAGE NET SALE" Then
'Debug.Print Trim(ttDivs.innerText)
'End If
Next tDivs
Next tCells
Next tRows
'Set ListOfRows = Content.getElementsByTagName("tr")
'MsgBox (Len(ListOfRows))
'For Each tRows In ListOfRows
' Set CellsInsideRow = tRows.getElementsByTagName("td")
' For Each tCells In CellsInsideRow
' Set DivsInsideCell = tCells.getElementsByTagName("div")
'Next tCells
'Next tRows
End Sub

You have nested tables, but in your HTML there is a table with id content that holds all the info. You can gather the rows (tr) in that table and loop over them looking for the string of interest. If you use .innerText on the row, you get the financial amount along with the text description. Otherwise, you can extract the individual values from the td elements within the row. Something like the following:
Dim html As HTMLDocument, tableRows As Object, i As Long, rowOfInterest As Object
Set html = ie.document
Set tableRows = html.querySelectorAll("#content tr")
' Find the first row whose text contains the label of interest
For i = 0 To tableRows.Length - 1
If InStr(tableRows.Item(i).innerText, "NET SALES") > 0 Then
Set rowOfInterest = tableRows.Item(i)
Exit For
End If
Next i
If Not rowOfInterest Is Nothing Then
Debug.Print rowOfInterest.innerText
Debug.Print rowOfInterest.getElementsByTagName("td")(1).innerText
End If
The values are populated through javascript so you need to allow enough time for population.
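Because the values arrive via JavaScript after the page itself has loaded, checking IE.Busy and readyState alone may not be enough. A minimal sketch of a polling wait with a timeout follows. WaitForText, its parameters and the 10-second default are hypothetical names and values chosen for illustration, and it assumes the document supports querySelectorAll.

```vba
' Hypothetical helper (not part of the original answer): polls the document
' until some element matching cssSelector contains searchText, or gives up
' after timeoutSeconds. Returns True if the text appeared in time.
Public Function WaitForText(ByVal doc As Object, ByVal cssSelector As String, _
                            ByVal searchText As String, _
                            Optional ByVal timeoutSeconds As Double = 10) As Boolean
    Dim started As Double, elapsed As Double
    Dim nodes As Object, i As Long

    started = Timer
    Do
        DoEvents ' give the browser time to run its scripts
        Set nodes = doc.querySelectorAll(cssSelector)
        For i = 0 To nodes.Length - 1
            If InStr(1, nodes.Item(i).innerText, searchText, vbTextCompare) > 0 Then
                WaitForText = True
                Exit Function
            End If
        Next i
        elapsed = Timer - started
        If elapsed < 0 Then elapsed = elapsed + 86400 ' Timer resets at midnight
    Loop While elapsed < timeoutSeconds
End Function
```

It could be called as If WaitForText(ie.document, "#content tr", "NET SALES") Then ... before looping over the rows, so the scrape only starts once the label is actually present.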

The HTML is a bit of a mess. Inside the target HTMLTableCell is another table, with many divs inside that table's cells. I would just split the target cell's innerText by line breaks and process the split data. It'll be a lot easier than trying to decipher this HTML.
Sub ProcessData(IE As InternetExplorerMedium)
Const AverageNetSalesText As String = "AVERAGE NET SALE"
Dim doc As HTMLDocument, tbl As HTMLTable, ListRows As Object, tr As HTMLTableRow, td As HTMLTableCell
Dim data As Variant
Set doc = IE.Document
Set tbl = doc.getElementById("ctl00_cpMain_rptMain_fixedTable")
Set ListRows = tbl.getElementsByTagName("TR")
For Each tr In ListRows
If tr.Cells.Length >= 3 Then ' Cells is zero-based, so Cells(2) needs at least 3 cells
Set td = tr.Cells(2)
If Not td Is Nothing Then
If InStr(td.innerText, "AVERAGE") > 0 Then
data = Split(td.innerText, vbNewLine)
Dim AverageNetSales As Single
If UBound(data) >= 15 Then
If InStr(data(15), AverageNetSalesText) > 0 Then
data(15) = Trim(data(15))
AverageNetSales = Right(data(15), Len(data(15)) - Len(AverageNetSalesText))
'Do something
Range("A17") = AverageNetSales
Exit Sub
End If
End If
End If
End If
End If
Next tr
End Sub
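The split-by-line approach above depends on the value sitting at a fixed position (index 15). A minimal sketch that searches the lines by label instead is shown below. ValueAfterLabel is a hypothetical helper, and it assumes the label and its amount share one line and that numbers use the machine's regional format.

```vba
' Hypothetical helper: returns the value that follows label on the first line
' of cellText that starts with label. Returns Empty if no such line exists.
Public Function ValueAfterLabel(ByVal cellText As String, ByVal label As String) As Variant
    Dim lines() As String, i As Long
    Dim lineText As String, rest As String

    ' Normalise line endings so Split works for both vbCrLf and vbLf
    lines = Split(Replace(cellText, vbCrLf, vbLf), vbLf)

    For i = LBound(lines) To UBound(lines)
        lineText = Trim$(lines(i))
        If StrComp(Left$(lineText, Len(label)), label, vbTextCompare) = 0 Then
            rest = Trim$(Mid$(lineText, Len(label) + 1))
            ' Strip currency symbol and thousands separators before converting
            rest = Replace(Replace(rest, "$", ""), ",", "")
            If IsNumeric(rest) Then
                ValueAfterLabel = CDbl(rest)
            Else
                ValueAfterLabel = rest
            End If
            Exit Function
        End If
    Next i
End Function
```

For example, Range("A17") = ValueAfterLabel(td.innerText, "AVERAGE NET SALE") would replace the index-based lookup, and keeps working if the report gains or loses a line.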


I am new to scraping web data and to using For...Next. I am trying to get data (all pages) from a website, but the code seems to be wrong, since I get error 91. This is the code:
Dim ie As Object
Sub connect()
ie.Visible = True
End Sub
Sub id_tr_td_for()
For i = 0 To 10
For j = 0 To 5
Cells(i + 1, j + 1) = ie.document.getElementById("toplists").getElementsByTagName("tr")(i).getElementsByTagName("td")(j).innerText
Next j
Next i
End Sub
Can somebody help me with it, and also let me know how I can list all pages?
Thank you.
I'm not sure where the error comes from, I got it too.
The following code should be helpful, it will print the contents of the table for the specified page(s) to the debug window.
The following code should copy all the data for selected pages to sheet1
You will need to add a couple of references in the VBA editor to be able to use it (Tools menu > References, then find and select them): Microsoft HTML Object Library and Microsoft Internet Controls.
Const MaxPage = 2 ' set to 26 (or however many there are) - at 2 for testing purposes
Dim Browser As InternetExplorer
Sub Start()
Dim Page As Integer: Page = 1 ' start at page 1
Dim PageDocument As IHTMLDocument
Dim RecordRow As IHTMLElementCollection
Dim RecordItem As IHTMLElement
Dim Sheet As Worksheet: Set Sheet = ThisWorkbook.Worksheets("Sheet1") ' output sheet
If Browser Is Nothing Then
Set Browser = New InternetExplorer
End If
Dim oRow As Integer: oRow = 2 ' begin output at row 2 (account for header)
Dim Record As Integer
For Page = 1 To MaxPage
LoadPage Page
For Record = 0 To 99 ' zero-based index: 100 items (0-99)
Set PageDocument = Browser.Document
Set RecordRow = PageDocument.getElementById("toplists").getElementsByTagName("table")(0).getElementsByTagName("tbody")(0).getElementsByTagName("tr")(Record).getElementsByTagName("td")
Sheet.Cells(oRow, 1).Value = Trim(RecordRow(0).innerText)
Sheet.Cells(oRow, 2).Value = Trim(RecordRow(1).innerText)
Sheet.Cells(oRow, 3).Value = Trim(RecordRow(2).innerText)
Sheet.Cells(oRow, 4).Value = Trim(RecordRow(3).innerText)
Sheet.Cells(oRow, 5).Value = Trim(RecordRow(4).innerText)
Sheet.Cells(oRow, 6).Value = Trim(RecordRow(5).innerText)
oRow = oRow + 1
Next Record
Next Page
End Sub
Sub LoadPage(ByVal PageNumber As Integer)
Debug.Print "Navigating to Page #" & CStr(PageNumber)
Browser.navigate "" & CStr(PageNumber)
While Browser.readyState <> 4 Or Browser.Busy: DoEvents: Wend
Debug.Print "Navigation Complete"
End Sub
Updated Code
The index out-of-bounds error likely occurred because of the hard-coded indexes: if a page does not have 100 records it will fail, and if a record doesn't have 6 fields it will fail. The following code does away with indexes and just scrapes every row and cell it finds. You shouldn't get index errors, but the output could be jagged.
Further Update
The 462 error was caused by the Browser.Quit. This closes the browser but does not set the reference to Nothing so when you run the code again it is trying to use a non-existent browser. Explicitly setting it to nothing at the end fixes this.
There is no link in the competitor column, the whole row has a data-url which is handled by something else. That URL can easily be accessed though.
Sub NewStart()
Dim PageDocument As IHTMLDocument
Dim Records As IHTMLElementCollection
Dim Record As IHTMLElement
Dim RecordItems As IHTMLElementCollection
Dim RecordItem As IHTMLElement
Dim OutputRow As Integer: OutputRow = 2
Dim OutputColumn As Integer
Dim Page As Integer
Dim Sheet As Worksheet: Set Sheet = ThisWorkbook.Worksheets("Sheet1")
If Browser Is Nothing Then
Set Browser = New InternetExplorer
Browser.Visible = True
End If
For Page = 1 To MaxPage
LoadPage Page
Set PageDocument = Browser.Document
Set Records = PageDocument.getElementById("toplists").getElementsByTagName("table")(0).getElementsByTagName("tbody")(0).getElementsByTagName("tr")
For Each Record In Records
Set RecordItems = Record.getElementsByTagName("td")
OutputColumn = 1
For Each RecordItem In RecordItems
Sheet.Cells(OutputRow, OutputColumn).Value = Trim(RecordItem.innerText)
OutputColumn = OutputColumn + 1
Next RecordItem
Sheet.Cells(OutputRow, OutputColumn).Value = "" & Record.getAttribute("data-athlete-url") ' This will add the link after the last column
OutputRow = OutputRow + 1
Next Record
Next Page
Set Browser = Nothing ' This will fix the 462 error
End Sub

I have written VBA code that enters a manufacturer part number in the search box of the website below and clicks the search icon. It does enter the part number and click the icon, but when the search icon is clicked, the text entered in the text box is not picked up, so it searches with empty data.
'HTML Part for search icon
<em class="fa fa-search" aria-hidden="true" style="color: gray;"></em>
For almost a month I have tried various approaches, including ones mentioned on Stack Overflow such as using createEvent("keyboardevent"), but nothing worked.
' VBA code
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate ""
Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop ' wait for the page to load
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
idoc.getElementById("searchUserInput").Value = "33188785"
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
End If
Next doc_ele
End Sub
The page makes an XHR request to retrieve the search results. You can find it in the browser's network tab after clicking submit. This means that, in this case, you can avoid the expense of a browser and issue the XHR request yourself. The response is JSON, so you do need a JSON parser to handle the results.
I would use JsonConverter.bas to parse the JSON. After importing that file as a standard module called JsonConverter, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
I dimension an array to hold the results. I determine the number of rows from the number of items in the returned JSON collection, and the number of columns from the size of the first item's dictionary. I loop over the JSON collection, with an inner loop over the keys of each dictionary, and populate the array. I write the array out in one go at the end, which is less I/O expensive.
Option Explicit
Public Sub GetInfo()
Dim json As Object, ws As Worksheet, headers()
Dim item As Object, key As Variant, results(), r As Long, c As Long
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "", False
.send
Set json = JsonConverter.ParseJson(.responseText)("Products")
End With
headers = json.item(1).keys
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each item In json
r = r + 1: c = 1
For Each key In item.keys
results(r, c) = item(key)
c = c + 1
Next key
Next item
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
You can do this instead:
txt = "33188785"
IE.navigate "" & txt
This will take you straight to the Search Result.
Sub AptivScrapping()
Dim IE As SHDocVw.InternetExplorer
Dim txt As String
Set IE = New InternetExplorer
txt = "33188785"
IE.Visible = True
IE.navigate "" & txt
Do While IE.Busy Or IE.readyState <> 4
Application.Wait DateAdd("s", 1, Now)
Loop
End Sub
This will be faster, as you only have to load one page.
Why that's happening, I am not sure, but it seems that the text box used for input is not activated when text is added to it programmatically. It is only activated when we click inside it.
I found the solution to the above problem in another post.
In this case I need to enter the search string character by character, using SendKeys and input events inside the loop. Below is the working VBA code.
Sub AptivScrapping()
Const LoopSeconds As Long = 1 ' pause between polls while the page is busy
Dim IE As SHDocVw.InternetExplorer
Dim sFieldInput As String, s As Long
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate ""
Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
IE.document.getElementById("searchUserInput").Focus ' focus is a method, not a property
sFieldInput = "33188785"
For s = 1 To Len(sFieldInput)
Application.SendKeys Mid(sFieldInput, s, 1) ' type one character at a time
While IE.readyState < 4 Or IE.Busy
Application.Wait DateAdd("s", LoopSeconds, Now)
Wend
Next s
IE.document.getElementById("searchUserInput").blur ' leave the field before searching
Dim doc_ele As MSHTML.IHTMLElement
Dim doc_eles As MSHTML.IHTMLElementCollection
Set doc_eles = idoc.getElementsByTagName("a")
For Each doc_ele In doc_eles
If doc_ele.getAttribute("ng-click") = "SearchButtonClick(1)" Then
doc_ele.Click
Exit Sub
End If
Next doc_ele
End Sub
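The post mentions raising input events as well as SendKeys. A minimal sketch of that idea is shown below: it sets the value and then dispatches an "input" event so that script listeners on the page can notice the change. SetValueAndNotify is a hypothetical helper, not code from the linked post, and it assumes the page renders in a document mode that supports createEvent and dispatchEvent (IE9 standards mode or later). Whether this particular site reacts to it is untested.

```vba
' Hypothetical helper: writes newValue into the element with the given id and
' raises an "input" event on it so page scripts are told the value changed.
Public Sub SetValueAndNotify(ByVal doc As Object, ByVal elementId As String, _
                             ByVal newValue As String)
    Dim el As Object, evt As Object

    Set el = doc.getElementById(elementId)
    If el Is Nothing Then Exit Sub ' element not found: nothing to do

    el.Focus
    el.Value = newValue

    ' Create a bubbling, non-cancelable "input" event and fire it on the element
    Set evt = doc.createEvent("HTMLEvents")
    evt.initEvent "input", True, False
    el.dispatchEvent evt
End Sub
```

Usage would be SetValueAndNotify IE.document, "searchUserInput", "33188785", followed by the same click on the search anchor. The advantage over SendKeys, if it works for the page, is that it does not depend on which window currently has keyboard focus.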

The goal is to get images from Google Images that match the part numbers in my database. My code runs and pulls up the correct Google pages, but it refuses to put the links into the spreadsheet. I have tried everything I can think of, but as of now I keep getting Error 1004 (Application-defined or object-defined error).
Sub SearchBotGoogleImgLink()
Dim objIE As Object
Set IE = CreateObject("InternetExplorer.Application")
Dim HTMLdoc As HTMLDocument
Dim imgElements As IHTMLElementCollection
Dim imgElement As HTMLImg
Dim aElement As HTMLAnchorElement
Dim n As Integer
Dim i As Integer
Dim url As String
Dim url2 As String
Dim m As Long
Dim lastRow As Long
Dim url3 As String
Dim SearchRow As Long
Dim aEle As HTMLLinkElement
SearchRow = 1
Do Until IsEmpty(ActiveSheet.Cells(SearchRow, 1))
Application.StatusBar = SearchRow - 1 & " of " & "4368" & " Items Done"
Item = Trim(ActiveSheet.Cells(SearchRow, 1))
url = "" & Cells(SearchRow, 1) & "&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate url
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
For Each aEle In objIE.document.getElementsByTagName("IMG")
result = aEle
Sheets("Sheet1").Range(SearchRow & "C").Value = result
Sheets("Sheet1").Range(SearchRow & "D") = aEle.innerHTML
Sheets("Sheet1").Range(SearchRow & "F").Value = aEle.innerText
Debug.Print aEle.innerText
Next aEle
'For i = 1 To lastRow
'url = "" & Cells(SearchRow, 1) & "&oq=A2N0015C3KUU&gs_l=img.12...0.0..1704...0.0..0.0.0.......1......gws-wiz-img.9wB6WwQJhwA"
Set HTMLdoc = objIE.document
Set imgElements = HTMLdoc.getElementsByTagName("IMG")
n = 1
For Each imgElement In imgElements
If InStr(ingElement.src, sImageSearchString) Then
If imgElement.ParentNode.nodeName = "A" Then
Set aElement = imgElement.ParentNode
If n = 2 Then
url2 = aElement.href 'imgElement.src
url3 = imgElement.src 'aElement.href
n = n + 1
End If
End If
End If
Next imgElement
Cells(SearchRow, 5) = url2
SearchRow = SearchRow + 1
Loop
Set IE = Nothing
End Sub
Notes on your code:
You need Option Explicit at the top of your code to check variable declarations and catch typos, amongst other advantages. There are a number of variables that are never declared, e.g. result, as well as IE in Set IE = CreateObject("InternetExplorer.Application"). You have two different variables (one late bound and one early bound) both creating IE instances, and you in fact only use one.
Your current error may be down to you trying to work with an object here: result = aEle, which won't work without the Set keyword to provide the required reference.
Without example URLs and expected output it is difficult to advise on the later loops in your code. You appear to have a duplicate loop over IMG elements but this time with some restrictions. It is likely these loops can be merged.
An example:
The following uses an arbitrary search URL to pull in the img src links from the search results for A2N0015C3KUU.
It uses a CSS selector combination of #ires img[src] to target img elements that have a src attribute within the parent element with id ires (search results).
It is intended to demonstrate the principle of gathering a nodeList of matching elements and writing them out to a sheet. The querySelectorAll method applies the CSS selector combination to the HTMLDocument and returns the nodeList. The nodeList is looped along its .Length, with items accessed by index starting at 0.
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate ""
While .Busy Or .readyState < 4: DoEvents: Wend
Dim aNodeList As Object, i As Long
Set aNodeList = IE.document.querySelectorAll("#ires img[src]")
For i = 0 To aNodeList.Length - 1
ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i).src
Next i
.Quit '<== Remember to quit application
End With
End Sub

What I have to do is use Excel VBA to:
login to Amazon Seller
open a workbook
loop through a column to get an order number
put it in the search box
hit the search button
go to the order page and extract the data
then have the extracted data go back into a specified column in another Excel workbook
The loop and order number parts are what I'm currently stumped on. I've figured out this much code as of this moment:
Sub MyAmazonSeller()
Dim MyHTML_Element As IHTMLElement
Dim MyURL As String
Dim oSignInLink As HTMLLinkElement
Dim oInputEmail As HTMLInputElement
Dim oInputPassword As HTMLInputElement
Dim oInputSigninButton As HTMLInputButtonElement
'InputSearchOrder will be the destination for order numbers taken from the workbook
Dim InputSearchOrder As HTMLInputElement
Dim InputSearchButton As HTMLInputButtonElement
Dim IE As InternetExplorer
Dim AAOrder As Workbook
Dim AAws As Worksheet
MyURL = ""
Set IE = New InternetExplorer
' Open the browser and navigate.
With IE
.Silent = True
.Navigate MyURL
.Visible = True
Do
DoEvents
Loop Until .ReadyState = READYSTATE_COMPLETE
End With
' Get the html document.
Set HTMLDoc = IE.Document
' If the sign-in link is present, you are on the main page
Set oSignInLink = HTMLDoc.getElementById("signin-button-container")
If Not oSignInLink Is Nothing Then
End If
' Get the email field and the next button
Set oInputEmail = HTMLDoc.getElementById("username")
Set oInputPassword = HTMLDoc.getElementById("password")
' Click the button and wait
oInputEmail.Value = ""
' Get the password field and the sign in button
Set oInputPassword = HTMLDoc.getElementById("password")
Set oInputSigninButton = HTMLDoc.getElementById("sign-in-button")
' Click the button and wait
oInputPassword.Value = "xxxxxxxx"
Application.Wait (Now + TimeValue("0:00:05"))
Set AAOrder = Application.Workbooks.Open("Z:\Employee Folders\Employee\trackingnumber_sample_spreadsheet.xls")
Set AAws = AAOrder.Worksheets("PrimeOrdersWithNoFulfillmentRe")
Set InputSearchOrder = HTMLDoc.getElementById("sc-search-field")
'What I'm currently stuck on
InputSearchOrder.Value = "001-7163923-7572632"
Set InputSearchButton = HTMLDoc.getElementsByClassName("sc-search-button")(0)
'Was able to add this snippet, but I'm getting an error 13, most likely with
'my e variable. I'm basically trying to do a loop within a loop, extracting 5
'pieces of data and sticking them back into their respective columns in the
'original Excel sheet. The problem comes when scraping the HTML. I'm basically
'trying to get text in the tables which have a few levels and it's frustrating
'me to no end.
With HTMLDoc
Set elems = HTMLDoc.getElementsByTagName("td")
For Each e In elems
If e.innerText Like "*1Z*" Then
Range("D2").Value = e.innerText
End If
Next e
End With
If Err <> 0 Then
Resume Next
End If
End Sub
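For the loop over order numbers that the question is stuck on, a minimal sketch of the worksheet side is shown below. The column letters (A for order numbers, D for results), the header in row 1 and the ScrapeTracking function are all assumptions made for illustration. The browser steps from the code above (fill the search box, click the button, scan the td elements) would go inside ScrapeTracking.

```vba
' Sketch only: column letters, the start row and ScrapeTracking are assumptions.
Public Sub ProcessOrders(ByVal ws As Worksheet)
    Dim lastRow As Long, r As Long
    Dim orderNo As String, tracking As String

    ' Last used row in column A (assumed to hold the order numbers)
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow ' row 1 is assumed to be a header
        orderNo = Trim$(CStr(ws.Cells(r, "A").Value))
        If Len(orderNo) > 0 Then
            tracking = ScrapeTracking(orderNo)
            ' Only write when something was found, so earlier results survive
            If Len(tracking) > 0 Then ws.Cells(r, "D").Value = tracking
        End If
    Next r
End Sub

' Hypothetical placeholder: search for orderNo in the browser and return the
' scraped text. It returns an empty string until the scraping code is added.
Private Function ScrapeTracking(ByVal orderNo As String) As String
    ScrapeTracking = vbNullString
End Function
```

It could be called as ProcessOrders AAOrder.Worksheets("PrimeOrdersWithNoFulfillmentRe"). Writing to the row being read (Cells(r, "D")) rather than a fixed Range("D2") keeps each result next to its order number.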

<span itemprop="streetAddress">
**94 Grand St**
How can I get this data with a getElementBy... method in Excel VBA?
I have tried getElementById, getElementsByName, etc., but nothing is working.
Option Explicit
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False ' Don't show window
.Navigate ""
'Wait until IE is done loading page
Do While .Busy Or .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("span") ' ## returns the list
MsgBox (Listings(0))
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
Next l
Set html = Nothing
Set ie = Nothing
End Sub
I use the above program to get the innerText value inside the span tag, but it's not working.
Adapt your IE code: for the single result you are looking for at the detail level, use these two lines (there is only one listing at that level):
Set Listings = html.getElementById("bizInfoBody") ' ## returns a single element
Range("A1").Offset(r, 0).Value = Listings.innerText
Sub GetTxt()
Dim objXmlHTTP As Object
Dim objHtmlDoc As Object
Dim objHtmlBody As Object
Dim objTbl As Object
Dim strResponse As String
Dim strSite As String
Set objHtmlDoc = CreateObject("htmlfile")
Set objHtmlBody = objHtmlDoc.body
Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
strSite = ""
With objXmlHTTP
.Open "GET", strSite, False
.send
If .Status = 200 Then
strResponse = .responseText
objHtmlBody.innerHTML = objXmlHTTP.responseText
Set objTbl = objHtmlBody.Document.getElementbyid("bizInfoBody")
MsgBox objTbl.innerText
End If
End With
End Sub
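If only the street address is wanted rather than the whole bizInfoBody block, one option is to walk the span elements and test their itemprop attribute. The sketch below runs against a literal HTML snippet so it needs no network access. The wrapper div and the Sub name are made up for illustration, and only the span itself comes from the question.

```vba
' Sketch: find the span whose itemprop is "streetAddress" in an in-memory
' document. Late bound, so no reference to the HTML Object Library is needed.
Public Sub DemoItempropLookup()
    Dim doc As Object, spans As Object, sp As Object

    ' Build a document from a literal snippet (illustrative markup)
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<div id=""bizInfoBody"">" & _
                         "<span itemprop=""streetAddress"">94 Grand St</span>" & _
                         "</div>"

    Set spans = doc.getElementsByTagName("span")
    For Each sp In spans
        ' getAttribute returns Null when the attribute is absent, so test first
        If Not IsNull(sp.getAttribute("itemprop")) Then
            If sp.getAttribute("itemprop") = "streetAddress" Then
                Debug.Print Trim$(sp.innerText)
            End If
        End If
    Next sp
End Sub
```

The same loop can be pointed at the real page by replacing doc with the document obtained from IE or from the XMLHTTP response above.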
