VBA Web Scraping Code Loop Fails After 3rd Iteration - excel

I'm trying to pull data from multiple webpages (different stock pages from the same site). The data is pulled correctly for the first 3 iterations of the loop, but on the 4th iteration I get error 91: Object variable or With block variable not set.
I tried moving the command that opens Internet Explorer so that a new browser is opened at the beginning of each iteration and closed at the end of it, to make sure the IE object wasn't somehow failing. That didn't work, same issue.
Sub GetStock()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim cellnum As Range: Set cellnum = Range(ActiveCell.Address)
Dim i As Integer
Dim IE As Object
Dim text As String
i = 1
Do Until i > 10
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
cellnum = Range(ActiveCell.Offset(i, 7).Address)
With IE
.navigate cellnum.Value
Do While .Busy And .readyState <> 4: DoEvents: Loop
Sleep 1000
text = .Document.getElementsByClassName("classname")(1).outerText
End With
ws.Cells(i, 12).Value = text
i = i + 1
IE.Quit
Loop
End Sub
The links to the webpages are held in cells, hence the cellnum code: it finds the correct cell, retrieves the webpage stored in it, then moves on to the cell below. The code works perfectly for the first 3 iterations but for some reason fails on the 4th. The debugger highlights the "text = .Document.getElementsByClassName..." line as the source of the error.

I think your issue is probably due to the element not existing on the webpage. If it does exist, are you sure you are pulling the right element from the collection?
Try running it with
.document.getElementsByClassName("classname")(0).outerText
If that works, then I would suggest looking at how many elements with the class "classname" are on the webpage. While the other pages may have 2 or more such elements, it could be that the failing page has only one.
Can you post the webpages you are scraping?
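To make the point about the collection size concrete, here is a minimal sketch that checks the length before indexing. SafeOuterText is a hypothetical helper name, not something from the question; it assumes the collection exposes Length and index access the way the IE DOM collections in the question do.

```vba
' Hypothetical helper: returns the outerText of the element at a zero-based
' index, or an empty string when the collection is missing or too short.
Function SafeOuterText(ByVal nodes As Object, ByVal index As Long) As String
    If nodes Is Nothing Then Exit Function
    If index < 0 Or index >= nodes.Length Then Exit Function
    SafeOuterText = nodes(index).outerText
End Function
```

Inside the With IE block it could be called as text = SafeOuterText(.Document.getElementsByClassName("classname"), 1). An empty result then tells you the page had fewer than 2 matching elements at that moment.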

Found the solution! The Sleep 1000 command wasn't providing enough time in all cases, and I guess the code was trying to pull data before the page was available. I thought the wait loop in there would solve that, but I guess not (very new to this). Anyway, I changed it to Sleep 3000 to give my slow internet enough time to catch up, and it's working like a dream.
Thanks for all the help everyone.
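A side note on the wait loop from the question, since the asker expected it to cover this case: Do While .Busy And .readyState <> 4 keeps waiting only while both conditions hold, so it stops as soon as either one changes. Joining the conditions with Or is stricter. A minimal sketch follows; StillLoading is a hypothetical name, and even this check may not cover content the page loads by script afterwards, which could be why a longer Sleep was still needed.

```vba
' Hypothetical helper: True while the page should still be waited for.
' 4 is the numeric value of READYSTATE_COMPLETE.
Function StillLoading(ByVal isBusy As Boolean, ByVal readyState As Long) As Boolean
    StillLoading = isBusy Or (readyState <> 4)
End Function

' Possible use inside the With IE block from the question:
'     Do While StillLoading(.Busy, .readyState): DoEvents: Loop
```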

Related

Scraping table behind login wall

I am struggling to find the right piece of code to scrape a table from a password-protected website into an Excel workbook. I have been able to get all of the code to work up to the table-scraping part. When I run the code, it opens IE and logs in, but then errors out (91: Object variable or With block variable not set). The code is below:
Private Sub CommandButton3_Click()
'Declare variables
Dim IE As Object
Dim Doc As HTMLDocument
Dim HTMLTable As Object
Dim TableRow As Object
Dim TableCell As Object
Dim myRow As Long
'Create a new instance of Internet Explorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
'Navigate to the website
IE.Navigate "https://www.myfueltanksolutions.com/validate.asp"
'Wait for the page to finish loading
Do While IE.ReadyState <> 4
DoEvents
Loop
'Set the document object
Set Doc = IE.Document
'Fill in the security boxes
Doc.all("CompanyID").Value = "ID"
Doc.all("UserId").Value = "Username"
Doc.all("Password").Value = "Password"
'Click the submit button
Doc.all("btnSubmit").Click
'Wait for the page to finish loading
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Set the HTMLTable object
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
'Loop through each row in the table
For Each TableRow In HTMLTable.getElementsByTagName("tr")
'Loop through each cell in the row
For Each TableCell In TableRow.getElementsByTagName("td")
'Write the table cell value to the worksheet
Worksheets("Sheet1").Range("A5").Offset(myRow, 0).Value = TableCell.innerText
myRow = myRow + 1
Next TableCell
Next TableRow
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Log out and close website
IE.Navigate ("https://www.myfueltanksolutions.com/signout.asp?action=rememberlogin")
IE.Quit
End Sub
I have included the HTML code of the table I am trying to scrape on the re-directed page after login.
I won't get tired of saying it again and again and again and ... ;-)
Don't work with IE anymore. MS is actively phasing it out!
But as an explanation:
I'm sure this is the code fragment that doesn't do what you expect:
...
...
'Wait for the page to finish loading
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Set the HTMLTable object
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
...
...
Waiting for READYSTATE_COMPLETE doesn't work here (for whatever reason). So the code continues without pausing, before the new content has loaded. The call to getElementById() then ends in the named error because there is no element with that id yet.
Excursus on some get methods of the DOM (Document Object Model):
The methods getElementsByTagName() and getElementsByClassName() build a node collection that contains all elements matching the given criterion. For example, getElementsByTagName("a") gives you a collection of all anchor tags. Every element of the collection can be addressed by its index, as in an array. To find out how many elements such a collection holds, read its length property. If there are no matching elements, in our example a tags, the length will be 0. But the collection was still built, so you have an object.
The get methods that build a collection have an s for plural in ...Elements... But getElementById() has no s because an id can occur only once in an HTML document. No collection is needed here. getElementById() always tries to return a single object for the requested id. If there is no such element, you get the error that there is no object.
How to solve the issue:
We must change the termination criterion and the body of the loop. We must ask again and again whether the element with the wanted id is present. To do that we use the given line:
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
As I said before, an error will be raised if it is not present. That's right. But with On Error Resume Next we can ignore any error in the code.
Attention!
Only use this in specific situations, and switch back to normal error handling with On Error GoTo 0 after the critical part of the code.
Replace the code I posted above in this answer with the following:
(To avoid endless loops, a timeout mechanism is recommended too. But I will keep it simple here.)
Do
On Error Resume Next
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
On Error GoTo 0
Loop While HTMLTable Is Nothing
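The timeout mechanism mentioned above could look roughly like this. It is only a sketch under assumptions: WaitForElementById and its parameters are hypothetical names, and re-reading ie.Document on every pass (instead of reusing the Doc variable set before the login) is my own choice, not part of the answer.

```vba
' Hypothetical helper: polls for an element id until it appears or the
' timeout expires. Returns Nothing when the element never showed up.
Function WaitForElementById(ByVal ie As Object, ByVal elementId As String, _
                            ByVal timeoutSeconds As Double) As Object
    Dim startedAt As Date
    Dim found As Object
    startedAt = Now
    Do
        DoEvents
        On Error Resume Next        ' the document may not be available yet
        Set found = ie.Document.getElementById(elementId)
        On Error GoTo 0
        If Not found Is Nothing Then Exit Do
        ' Now has a resolution of one second, which is enough for a timeout
    Loop While (Now - startedAt) * 86400# < timeoutSeconds
    Set WaitForElementById = found
End Function
```

A call such as Set HTMLTable = WaitForElementById(IE, "RecentInventorylistform", 15) followed by an If HTMLTable Is Nothing check lets the macro report a failed load instead of hanging.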

Excel VBA - Web Scraping - Get value in HTML Table cell

I am trying to create a macro that scrapes a cargo tracking website.
But I have to create 4 such macros as each airline has a different website.
I am new to VBA and web scraping.
I have put together code that works for 1 website. But when I tried to replicate it for another one, I got stuck in the loop. I think it may be how I am referring to the element, but like I said, I am new to VBA and have no clue about HTML.
I am trying to get the "notified" value in the highlighted line from the image.
[Image: "notified" text to be extracted]
Below is the code I have written so far that gets stuck in the loop.
Any help with this would be appreciated.
Sub FlightStat_AF()
Dim url As String
Dim ie As Object
Dim nodeTable As Object
'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate url
Do Until ie.readyState = 4: DoEvents: Loop
'Wait to load dynamic content after IE reports it's ready
'We can do that in a loop to match the point the information is available
Do
On Error Resume Next
Set nodeTable = ie.document.getElementByClassName("block-whisper")
On Error GoTo 0
Loop Until Not nodeTable Is Nothing
'Get the status from the table
MsgBox Trim(nodeTable.getElementsByClassName("fs-12 body-font-bold").innerText)
'Clean up
ie.Quit
Set ie = Nothing
Set nodeTable = Nothing
End Sub
Some basics:
For simple access, as in this case, you can use the get methods of the DOM (Document Object Model). But there is an important difference between getElementByID() and getElementsByClassName() / getElementsByTagName().
getElementByID() searches for the unique ID of an HTML tag. It is written as the ID attribute of the tag. If the page follows the HTML standard, there is only one element with this unique ID. That's the reason why the method begins with getElement.
If the ID is not found when using the method, VBA throws a runtime error. That is why, in the loop from my other answer, the call is wrapped between switching error handling off and on again. But in the page from this question there is no ID for the HTML area in question.
Instead, the required element can be accessed directly. You tried the access with getElementsByClassName(). That's right. But here comes the difference from getElementByID().
getElementsByClassName() and getElementsByTagName() begin with getElements. That's plural because there can be as many elements with the same class or tag name as you want. Both methods create an HTML node collection. All HTML elements with the requested class or tag name are listed in that collection.
All elements have an index, just like in an array. The indexes start at 0. To access a particular element, the desired index must be specified. The two class names fs-12 body-font-bold (class names are separated by spaces; you can also build a node collection by using only one class name) deliver 2 HTML elements to the node collection. You want the second one, so you must use the index 1.
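If you are unsure which index you need, a small sketch like the following lists every match with its index in the Immediate window. ListMatches is a hypothetical helper name; it assumes doc is a loaded document such as ie.document.

```vba
' Hypothetical helper: prints index and text of every element that carries
' the given class name(s) and returns how many were found.
Function ListMatches(ByVal doc As Object, ByVal classNames As String) As Long
    Dim nodes As Object
    Dim i As Long
    If doc Is Nothing Then Exit Function
    Set nodes = doc.getElementsByClassName(classNames)
    For i = 0 To nodes.Length - 1
        Debug.Print i, Trim$(nodes(i).innerText)
    Next i
    ListMatches = nodes.Length
End Function
```

Called as ListMatches ie.document, "fs-12 body-font-bold" after the page has loaded, it shows which index holds the status text.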
This is the VBA code for the asked page by using the IE:
Sub FlightStat_AF()
Dim url As String
Dim ie As Object
'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate url
Do Until ie.readyState = 4: DoEvents: Loop
'Wait to load dynamic content after IE reports it's ready
'We do that with a fixed manual pause of a few seconds
'because the whole page will be reloaded
'The last three values are hours, minutes, seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get the status from the table
MsgBox Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
'Clean up
ie.Quit
Set ie = Nothing
End Sub
Edit: the Sub as a function
This Sub tests the function:
Sub testFunction()
Dim flightStatAfResult As String
flightStatAfResult = FlightStat_AF("057-92366691")
MsgBox flightStatAfResult
End Sub
This is the Sub rewritten as a function:
Function FlightStat_AF(cargoNo As String) As String
Dim url As String
Dim ie As Object
Dim result As String
'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
url = "https://www.afklcargo.com/mycargo/shipment/detail/" & cargoNo
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate url
Do Until ie.readyState = 4: DoEvents: Loop
'Wait to load dynamic content after IE reports it's ready
'We do that with a fixed manual pause of a few seconds
'because the whole page will be reloaded
'The last three values are hours, minutes, seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get the status from the table
result = Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
'Clean up
ie.Quit
Set ie = Nothing
'Return value of the function
FlightStat_AF = result
End Function

Using Excel VBA to automate form filling in Internet Explorer

I want to take values from an excel sheet and store them in an array. I then want to take the values from the array and use them to fill the web form.
I have managed to store the values in the array and I have managed to get VBA to open Internet Explorer (IE)
The code runs and no errors appear, but the text fields are not being populated, nor is the button being clicked
(The debugger points to [While .Busy] as the error source, located in the WITH block)
How do I go about filling the form (that has a total of 3 text boxes to fill)?
There is also a drop down menu that I need to choose a value from, but I need to fill the text boxes prior to moving on to that part of the task.
Sub CONNECT_TO_IE()
the_start:
Dim ie As Object
Dim objElement As Object
Dim objCollection As Object
acct = GET_CLIENT_NAME()
name = GET_CODE()
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate ("<<my_website>>")
ie.FullScreen = False
On Error Resume Next
Do
DoEvents
If Err.Number <> 0 Then
ie.Quit
Set ie = Nothing
GoTo the_start:
End If
Loop Until ie.readystate = 4
Application.Wait Now + TimeValue("00:00:10")
ie.Document.getElementbyid("<<field_1>>").Value = "PPP"
ie.Document.getElementbyid("<<field_2>>").Value = "PPP"
ie.Document.getElementbyid("<<field_3>>").Click
Set ie = Nothing
End Sub
UPDATE: It turns out this wasn't working because some settings in the HTML of the site do not allow the automation to occur, so every version of the code I had was correct but doomed to fail. So you were correct in that regard, @TimWilliams.
I know this because the website I was trying to access is on a secure server/virtual machine. I edited the code to fill in the Google search bar and it did not work on the virtual machine; however, when I ran the same code locally, it worked fine.

excel vba copying fedex tracking information

I want to copy into Excel the 3 tracking information tables that the website generates when I track a parcel. I want to do it through Excel VBA. I can write a loop and generate this webpage for various tracking numbers. But I am having a hard time copying the tables: the top table, the travel history table and the shipment track table. Any solution? In my VBA code below, the last 3 lines give an error :( - run-time error '438': Object doesn't support this property or method.
Sub final()
Application.ScreenUpdating = False
Set ie = CreateObject("InternetExplorer.Application")
my_url = "https://www.fedex.com/fedextrack/index.html?tracknumbers=713418602663&cntry_code=us"
With ie
.Visible = True
.navigate my_url
Do Until Not ie.Busy And ie.readyState = 4
DoEvents
Loop
End With
ie.document.getElementById("detailsBody").Value
ie.document.getElementById("trackLayout").Value
ie.document.getElementById("detail").Value
End Sub
.Value is not a method available in that context. Also, you will want to assign the return value of the method call to a variable. And you should declare your variables :)
I made some modifications and included one possible way of getting data from one of the tables. You may need to reformat the output using TextToColumns or similar, since it prints each row in a single cell.
I also noticed that when I execute this, the tables have sometimes not finished loading, and the result will be an error unless you put in a suitable wait or use some other method to determine when the data has fully loaded on the webpage. I use a simple Application.Wait.
Option Explicit
Sub final()
Dim ie As Object
Dim my_url As String
Dim travelHistory As Object
Dim history As Variant
Dim h As Variant
Dim i As Long
Application.ScreenUpdating = False
Set ie = CreateObject("InternetExplorer.Application")
my_url = "https://www.fedex.com/fedextrack/index.html?tracknumbers=713418602663&cntry_code=us"
With ie
.Visible = True
.navigate my_url
'## I modified this logic a little bit:
'## keep waiting while IE is busy OR the page is not yet complete
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
'## Here is a simple method wait for IE to finish, you may need a more robust solution
' For assistance with that, please ask a NEW question.
Application.Wait Now() + TimeValue("0:00:10")
'## Get one of the tables
Set travelHistory = ie.Document.GetElementByID("travel-history")
'## Split the table to an array
history = Split(travelHistory.innerText, vbLf)
'## Iterate the array and write each row to the worksheet
For Each h In history
Range("A1").Offset(i).Value = h
i = i + 1
Next
ie.Quit
Set ie = Nothing
End Sub
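One small refinement to the Split step, as a sketch: depending on the page, innerText may separate rows with vbCrLf rather than a bare vbLf, and splitting on vbLf alone would then leave a trailing carriage return in each cell. That the FedEx page behaves this way is an assumption on my part; SplitLines is a hypothetical helper name.

```vba
' Hypothetical helper: splits text into lines regardless of whether the
' line breaks are vbCrLf, vbCr or vbLf.
Function SplitLines(ByVal rawText As String) As Variant
    Dim normalized As String
    normalized = Replace(rawText, vbCrLf, vbLf)   ' Windows line breaks first
    normalized = Replace(normalized, vbCr, vbLf)  ' then any stray carriage returns
    SplitLines = Split(normalized, vbLf)
End Function
```

In the macro above, history = SplitLines(travelHistory.innerText) would be a drop-in replacement for the Split call.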

How does one wait for an Internet Explorer 9 frame to load using VBA Excel?

There are many online resources that illustrate using Microsoft Internet Explorer Controls within VBA Excel to perform basic IE automation tasks. These work when the webpage has a basic construct. However, when webpages contain multiple frames they can be difficult to work with.
I need to determine if an individual frame within a webpage has completely loaded. For example, this VBA Excel code opens IE, loads a webpage, loops thru an Excel sheet placing data into the webpage fields, executes search, and then returns the IE results data to Excel (my apologies for omitting the site address).
The target webpage contains two frames:
1) The searchbar.asp frame for search value input and executing search
2) The searchresults.asp frame for displaying search results
In this construct the search bar is static, while the search results change according to input criteria. Because the webpage is built in this manner, the IEApp.ReadyState and IEApp.Busy cannot be used to determine IEfr1 frame load completion, as these properties do not change after the initial search.asp load. Therefore, I use a large static wait time to avoid runtime errors as internet traffic fluctuates. This code does work, but is slow. Note the 10 second wait after the cmdGO statement. I would like to improve the performance by adding solid logic to determine the frame load progress.
How do I determine if an autonomous frame has finished loading?
' NOTE: you must add a VBA project reference to "Internet Explorer Controls"
' in order for this code to work
Dim IEapp As Object
Dim IEfr0 As Object
Dim IEfr1 As Object
' Set new IE instance
Set IEapp = New InternetExplorer
' With IE object
With IEapp
' Make visible on desktop
.Visible = True
' Load target webpage
.Navigate "http://www.MyTargetWebpage.com/search.asp"
' Loop until IE finishes loading
While .ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend
End With
' Set the searchbar.asp frame0
Set IEfr0 = IEapp.Document.frames(0).Document
' For each row in my worksheet
For i = 1 To 9999
' Input search values into IEfr0 (frame0)
IEfr0.getElementById("SearchVal1").Value = Cells(i, 5)
IEfr0.getElementById("SearchVal2").Value = Cells(i, 6)
' Execute search
IEfr0.all("cmdGo").Click
' Wait a fixed 10sec
Application.Wait (Now() + TimeValue("00:00:10"))
' Set the searchresults.asp frame1
Set IEfr1 = IEapp.Document.frames(1).Document
' Retrieve webpage results data
Cells(i, 7) = Trim(IEfr1.all.Item(26).innerText)
Cells(i, 8) = Trim(IEfr1.all.Item(35).innerText)
Next
As @JimmyPena said, it's a lot easier to help if we can see the URL.
If we can't, hopefully this overview can point you in the right direction:
1) Wait for the page to load (IEApp.ReadyState and IEApp.Busy).
2) Get the document object from the IE object. (done)
3) Loop until the document object is not Nothing.
4) Get the frame object from the document object.
5) Loop until the frame object is not Nothing.
Hope this helps!
I used a loop that keeps setting the field value and checking it until it is populated, like this:
Do While IE.Document.getElementById("USERID").Value <> "test3"
IE.Document.getElementById("USERID").Value = "test3"
Loop
This is a really old thread, but I figured I would post my findings, because I came here looking for an answer...
Looking in the Locals window, I could see that the "readystate" value was "READYSTATE_COMPLETE" only for the IE application itself, but for the iframe it was lowercase "complete".
So I explored this by using a Debug.Print loop on the .readyState of the frame I was working with.
Dim IE As Object
Dim doc As MSHTML.HTMLDocument
Set doc = IE.Document
Dim iframeDoc As MSHTML.HTMLDocument
Set iframeDoc = doc.Frames("TheFrameIwasWaitingFor").Document
' then, after I had filled in the form and fired the submit event,
Debug.Print iframeDoc.readyState
Do Until iframeDoc.readyState = "complete"
Debug.Print iframeDoc.readyState
DoEvents
Loop
So this will show you line after line of "loading" in the Immediate window, eventually showing "complete" and ending the loop. It can be abridged by removing the Debug.Print lines, of course.
Another thing:
debug.print iframeDoc.readystate ' is the same as...
debug.print doc.frames("TheFrameIwasWaitingFor").Document.readystate
' however, you can't use...
IE.Document.frames("TheFrameIwasWaitingFor").Document.readystate ' for some reason...
Forgive me if all of this is common knowledge. I really only picked up VBA scripting a couple of days ago...
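With the Debug.Print lines removed and a timeout added, the loop above might be wrapped up like this. It is a sketch only: WaitUntilComplete and timeoutSeconds are hypothetical names, and it assumes you already hold a reference to the frame's document, as iframeDoc does above.

```vba
' Hypothetical helper: waits until the frame document reports the lowercase
' string "complete", or gives up after timeoutSeconds. Returns True on success.
Function WaitUntilComplete(ByVal frameDoc As Object, ByVal timeoutSeconds As Double) As Boolean
    Dim startedAt As Date
    Dim state As String
    startedAt = Now
    Do
        DoEvents
        state = vbNullString
        On Error Resume Next        ' the frame may be mid-navigation
        state = frameDoc.readyState
        On Error GoTo 0
        If state = "complete" Then
            WaitUntilComplete = True
            Exit Function
        End If
    Loop While (Now - startedAt) * 86400# < timeoutSeconds
End Function
```

For example, If Not WaitUntilComplete(iframeDoc, 10) Then MsgBox "Frame did not finish loading" replaces a fixed 10 second wait with one that returns as soon as the frame is ready.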
