How to pull data inside "span" for VBA? - excel

I need to pull the span value "0.062540" from a website using VBA.
My code is below:
Dim ie As New InternetExplorer
Dim doc As HTMLDocument
ie.Visible = False
ie.navigate "https://www.tefas.gov.tr/FonAnaliz.aspx?FonKod=MAC"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
On Error Resume Next
output = doc.getElementsByClassName("top-list").getElementByTagName("span")(0).innerText
Sheet1.Range("B19").Value = output
ie.Quit
End Sub
However, I could not fetch the related value. Could you help with my problem?

First you need to remove On Error Resume Next and check which errors you get! This line hides all your error messages, but the errors still occur; you just cannot see them. If you don't see them, you cannot fix them, and therefore your code cannot work.
Never use this line as you did. Either remove it or implement proper error handling according to VBA Error Handling – A Complete Guide.
Then if you do all these actions in one line like
output = doc.getElementsByClassName("top-list").getElementByTagName("span")(0).innerText
the error can be in multiple positions in that line and it is almost impossible to debug it and find out where in that line it is. Therefore we need to split that line up into multiple single actions to see in which part the error exactly occurs.
So we split it up like below which is exactly the same as the one line above:
Dim Divs As Object 'collection of div elements
Set Divs = doc.getElementsByClassName("top-list")
Dim Spans As Object 'collection of span elements
Set Spans = Divs.getElementByTagName("span")
Dim Output As String
Output = Spans(0).innerText
Now we can see that finding the div with the class top-list works, and that we get an error when finding the span elements. If we inspect the Divs variable, we see that it is a collection of multiple items, so we need to access the first item in that collection with Divs(0). Furthermore, the method is not getElementByTagName but getElementsByTagName (with an s). Correcting both gives the following:
Dim Divs As Object 'collection of div elements
Set Divs = doc.getElementsByClassName("top-list")
Dim Spans As Object 'collection of span elements
Set Spans = Divs(0).getElementsByTagName("span")
Dim Output As String
Output = Spans(0).innerText
and we see it works.
Finally, it is a good idea to implement some error handling, so that if something goes wrong you don't end up with hidden Internet Explorer windows that never get closed:
Public Sub FetchNumber()
Dim ie As New InternetExplorer
On Error Goto SAFE_QUIT 'make sure in case of error ie.quit is called.
ie.Visible = False
ie.navigate "https://www.tefas.gov.tr/FonAnaliz.aspx?FonKod=MAC"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
Dim Divs As Object
Set Divs = doc.getElementsByClassName("top-list")
Dim Spans As Object
Set Spans = Divs(0).getElementsByTagName("span")
Dim Output As String
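Output = Spans(0).innerText
Sheet1.Range("B19").Value = Output 'write the value to the cell used in the question
SAFE_QUIT:
'remember the error (if any) before cleaning up
Dim ErrNumber As Long, ErrSource As String, ErrDescription As String
ErrNumber = Err.Number
ErrSource = Err.Source
ErrDescription = Err.Description
ie.Quit 'always close the hidden Internet Explorer window, error or not
Set ie = Nothing
If ErrNumber <> 0 Then
Err.Raise ErrNumber, ErrSource, ErrDescription 're-raise so the error is still reported
End If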
End Sub

Related

Scraping table behind login wall

I am struggling to get the right piece of code to scrape a table from a password-protected website into an Excel workbook. I have been able to get all of the code to work up to the point of scraping the table. When I run the code, it opens IE and logs in, but then errors out (91: Object variable or With block variable not set). The code is below:
Private Sub CommandButton3_Click()
'Declare variables
Dim IE As Object
Dim Doc As HTMLDocument
Dim HTMLTable As Object
Dim TableRow As Object
Dim TableCell As Object
Dim myRow As Long
'Create a new instance of Internet Explorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
'Navigate to the website
IE.Navigate "https://www.myfueltanksolutions.com/validate.asp"
'Wait for the page to finish loading
Do While IE.ReadyState <> 4
DoEvents
Loop
'Set the document object
Set Doc = IE.Document
'Fill in the security boxes
Doc.all("CompanyID").Value = "ID"
Doc.all("UserId").Value = "Username"
Doc.all("Password").Value = "Password"
'Click the submit button
Doc.all("btnSubmit").Click
'Wait for the page to finish loading
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Set the HTMLTable object
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
'Loop through each row in the table
For Each TableRow In HTMLTable.getElementsByTagName("tr")
'Loop through each cell in the row
For Each TableCell In TableRow.getElementsByTagName("td")
'Write the table cell value to the worksheet
Worksheets("Sheet1").Range("A5").Offset(myRow, 0).Value = TableCell.innerText
myRow = myRow + 1
Next TableCell
Next TableRow
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Log out and close website
IE.Navigate ("https://www.myfueltanksolutions.com/signout.asp?action=rememberlogin")
IE.Quit
End Sub
I have included the HTML code of the table I am trying to scrape on the re-directed page after login.
I won't get tired of saying it again and again and again ... ;-)
Don't work with IE anymore. MS is actively phasing it out!
But for explanation:
I'm sure this is the code fragment that doesn't do what you expect:
...
...
'Wait for the page to finish loading
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Set the HTMLTable object
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
...
...
Waiting for READYSTATE_COMPLETE doesn't work here (for whatever reason). The loop exits immediately, so the code continues before the new content has loaded. Using the result of getElementByID() then ends in the error you named, because at that moment there is no element with that id.
Excursus for some get-methods of the DOM (Document Object Model):
The methods getElementsByTagName() and getElementsByClassName() build a node collection that contains all elements matching the given criterion. If you build such a collection with getElementsByTagName("a"), you get a collection of all anchor tags. Every element of the collection can be accessed by its index, like in an array. If you want to know how many elements are in the collection, you can read its length property. If the element you ask for does not exist, in our example a-tags, the length will be 0. But the collection was still built, so you have an object.
The get-methods that build a collection have an s for plural in ...Elements... But getElementByID() has no s, because an id can only occur once in an HTML document. No collection is needed here. The method getElementByID() always tries to return a single object for the given id. If there is no such element, there is no object, and you get the "object not set" error as soon as you use the result.
How to solve the issue:
We must change the termination criterion and the body of the loop. We must ask again and again whether the element with the wanted id is present. To do that, we use the line already given:
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
As I said before, an error can occur while the element is not yet present. That's right. But with On Error Resume Next we can ignore any error in the code.
Attention!
Only use this in specific situations, and switch back to normal error handling with On Error GoTo 0 after the critical part of the code.
Replace the code I quoted above in this answer with the following:
(To avoid endless loops, a timeout mechanism is recommended too. But I will keep it simple here.)
Do
On Error Resume Next
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
On Error GoTo 0
Loop While HTMLTable Is Nothing
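The timeout mentioned above could look roughly like the following sketch. The function name WaitForElementById and the default of 10 seconds are assumptions made for illustration and are not part of the original answer. It reads IE.Document on every pass, so it also picks up the document that is loaded after the login.

```vba
' Sketch only: polls for an element by id until it appears or the timeout expires.
' WaitForElementById and timeoutSeconds are hypothetical names chosen for this example.
Private Function WaitForElementById(ByVal IE As Object, _
                                    ByVal elementId As String, _
                                    Optional ByVal timeoutSeconds As Long = 10) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = DateAdd("s", timeoutSeconds, Now)

    Do
        DoEvents
        ' The document may not be available while the page is still changing,
        ' so errors are ignored only for this single lookup.
        On Error Resume Next
        Set found = IE.Document.getElementById(elementId)
        On Error GoTo 0
        If Not found Is Nothing Then Exit Do
    Loop While Now < deadline

    ' Returns Nothing if the element did not appear in time.
    Set WaitForElementById = found
End Function
```

In the question's code this would be called as Set HTMLTable = WaitForElementById(IE, "RecentInventorylistform"), followed by a check for HTMLTable Is Nothing before looping over the rows.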

Excel VBA IE Object and using dropdown list

I am experimenting with web automation and struggling a bit trying to utilize a drop down list.
My code works up to the point of searching for a company name and hitting "go". On the new page I can't seem to find the right code that selects the group of elements that represents the drop down list. I then want to select "100" entries, but I can't even grab the nodes that represent this list.
I have been browsing multiple different pages on stackoverflow that talk about CSS selectors and looked at tutorials but that doesn't seem to help either. I either end up grabbing nothing, or whatever I grab can't use the getElementsByTagName method, which ultimately I am trying to drill down into the td and select nodes . Not sure what to do with those yet, but I can't even grab them. Thoughts?
(note stopline is just a line that I use a breakpoint on to stop my code)
CSS helper website: https://www.w3schools.com/cssref/trysel.asp
Code:
Option Explicit
Sub test()
On Error GoTo ErrHandle
Dim ie As New InternetExplorer
Dim doc As New HTMLDocument
Dim ws As Worksheet
Dim stopLine As Integer
Dim oSearch As Object, oSearchButton As Object
Dim oForm As Object
Dim oSelect As Object
Dim list As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
ie.Visible = True
ie.navigate "https://www.sec.gov/edgar/searchedgar/companysearch.html"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.Document
Set oSearch = doc.getElementById("companysearchform")
Set oSearchButton = oSearch.getElementsByTagName("input")(1)
Set oSearch = oSearch.getElementsByTagName("input")(0)
oSearch.Value = "Summit Midstream Partners, LP"
oSearchButton.Click
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.Document
Set list = doc.querySelectorAll("td select")
stopLine = 1
Exit Sub
ErrHandle:
MsgBox Err.Number & " - " & Err.Description, vbCritical
Exit Sub
End Sub
td select will return a single node so you only need querySelector. The node has an id so you might as well use the quicker querySelector("#count") to target the parent select. To change the option you can then use SelectedIndex on the parent select, or, target the child option by its value attribute querySelector("[value='100']").Selected = True. You may then need to attach and trigger change/onchange htmlevent to the parent select to register the change.
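A minimal sketch of the select-and-trigger approach described above, assuming the page is rendered in a document mode that supports createEvent (IE9 or later in standards mode). The procedure name SelectHundredEntries is hypothetical; the selectors #count and [value='100'] are the ones named above.

```vba
' Sketch only: assumes ie is the InternetExplorer instance from the question
' and that the results page has finished loading.
Private Sub SelectHundredEntries(ByVal ie As Object)
    Dim sel As Object
    Dim opt As Object
    Dim evt As Object

    Set sel = ie.document.querySelector("#count")
    If sel Is Nothing Then Exit Sub            ' select element not present (yet)

    Set opt = sel.querySelector("[value='100']")
    If opt Is Nothing Then Exit Sub            ' no option with value 100

    opt.Selected = True

    ' Fire a change event so that any script on the page reacts to the new selection.
    Set evt = ie.document.createEvent("HTMLEvents")
    evt.initEvent "change", True, False
    sel.dispatchEvent evt
End Sub
```

Whether the page actually reloads with 100 entries depends on how its change handler is wired up, which is not confirmed here.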
However, I would simply extract the company CIK from current page then concatenate the count=100 param into the url and .Navigate2 that using following format:
https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001549922&type=&dateb=&owner=include&count=100&search_text=
You can extract CIK, after initial search company click and wait for page load, with:
Dim cik As String
cik = ie.document.querySelector("[name=CIK]").value
ie.Navigate2 "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & cik & "&type=&dateb=&owner=include&count=100&search_text="
Given several params are left blank you can likely shorten to:
"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & cik & "&owner=include&count=100"
If you are unable to get the initial parent select you probably need a timed loop waiting for that element to be present after clicking the search button. An example is shown here in a StackOverflow answer.

Runtime error 91 “Object variable or with block variable not set”

Sub FindUser()
Dim ie as shdocvw.internetexplorer
Dim ht as htmldocument
Set ie = new internetexplorermedium
Ie.visible = True
Ie.navigate ("url")
Do while ie.busy or ie.readystate <> 4
Doevents
Loop
Set ht = ie.document
Activesheet.range("b30").value = ht.getelementbyid("infobasic").getelementsbytagname("span") (0).innertext
End sub
I get the error at the bolded text (the ActiveSheet... line). But if I continue running the code manually, I get the desired value in the cell. The error only occurs in the middle of the run. Please help. I want to get the web data into an Excel cell. In the code I have mentioned only one tag, but I will be using more tags to get more results from the web.
The error means that you are trying to access a Property of an object that is null.
You need to expand your method a bit, and check each object before you try to access its properties to avoid such errors.
For example:
Dim objInfo As Object
Set objInfo = ht.getElementById("infobasic")
If objInfo Is Nothing Then
'no need to go any further here as the object is null and
'will throw an error if you try to access its properties.
Exit Sub
End If
Dim elements As Object
Set elements = objInfo.getElementsByTagName("span")
If elements Is Nothing Then
'same as above
Exit Sub
End If
Dim element As Object
Set element = elements(0)
If Not element Is Nothing Then
'here you can safely access the .innerText property
ActiveSheet.Range("b30").Value = element.innerText
End If
After debugging the above, you can see which object has not been set.
Hope this helps.

VBA IE.Document empty error

I've been running a query for a while now getting data from a webpage. After numerous runs it has decided to stop working, and I've traced the issue back to the ie.document object - it never returns anything.
When compiling my project I see that the "Document" element of ie returns an error of "Application-defined or object-defined error", even before I navigate to a webpage. Some other elements return this error as well, namely "Status Text" and "Type".
The link contains a screenshot of my error:
https://www.dropbox.com/s/wcxxep8my10nu8h/vba%20ie%20document.jpg?dl=0
In case that doesn't work, here is a scaled-back version of the code I'm running:
Sub getCard()
Dim ie As InternetExplorer
Dim url1 As String
url1 = "google.com"
Set ie = New InternetExplorer
ie.Visible = True
ie.Navigate url1
WaitBrowserQuiet ie
End Sub
Sub WaitBrowserQuiet(objIE As InternetExplorer)
Do While objIE.Busy Or objIE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
End Sub
As soon as I get to the "Set ie = New InternetExplorer" part of the code is when the ie object is created and I see the errors. If I do happen to navigate to webpage, then the ie.document object is empty.
I've searched around and tried a few things to stop this happening - restarted my computer, run "ie.quit" and "Set ie = Nothing", reset my Internet Explorer, etc... Nothing seems to work.
It seems like it may be a deeper issue given I'm getting an error message even before navigating to a webpage. Hope someone knows how to stop the error.
Your URL is URL1, try changing that, or just putting the URL in there.
In your code the object "ie" is defined locally in the sub getCard, and when this sub finishes, the binding goes with it. Changing from private to public internet zones can also remove the binding to that object. What I prefer to do is use a global object appIE, and when it runs into such an error I catch it (If TypeName(appIE) = "Object" Then FindIE) and find the object again with this sub:
Sub FindIE() 'Needs reference to "Microsoft Shell Controls And Automation" from VBA->Tools->References
Dim sh
Dim eachIE
Dim SearchUntil
SearchUntil = Now() + 20 / 24 / 60 / 60 'Allow to search for 20 seconds otherwise it interrupts search
Do
Set sh = New Shell32.Shell
For Each eachIE In sh.Windows
If InStr(1, eachIE.LocationURL, ServerAddress) Then
Set appIE = eachIE
'IE.Visible = False 'This is here because in some environments, the new process defaults to Visible.
'Exit Do
End If
Next eachIE
If TypeName(appIE) <> "Object" Then
If InStr(1, appIE.LocationURL, ServerAddress) > 0 Or SearchUntil < Now() Then Exit Do
End If
Loop
Set eachIE = Nothing
Set sh = Nothing
End Sub
This code contains parts of other people here from stackoverflow, but I forgot who to credit for some essential parts of the code. Sorry.

getelementsbyclassname Excel vba - error on repeated calls

Morning,
I am having trouble with a webscrape from Excel, whereby getelementsbyclassname is failing to act on some objects, throwing up the "Object doesn't support this property or method" error.
The problem appears when the object I am feeding into getelementsbyclassname is itself the result of a getelementsbyclassname method. I am not sure why, particularly as I can get the class name when acting on a larger object...
Here is a code extract
''''Boring Variables Declaration I've cut out''''
'Initialise IE
Dim IEApp As New InternetExplorer
Set IEApp = New InternetExplorer
IEApp.Visible = True 'JB
'Open page and wait for page to load
IEApp.navigate ("http://www.anicewebsite.com")
Do Until IEApp.readyState = READYSTATE_COMPLETE And IEApp.Busy = False
DoEvents
Loop
Set HTMLdoc = IEApp.document
Set RefLocation = Sheets("INFO_DUMP").Range("LocationRefCell")
Set trElements = HTMLdoc.getElementsByClassName("basic-details")
For Each trElement In trElements
'Select the LHS box and extract info
Set tdElement = trElement.getElementsByClassName("tieredToggle")
'write start/end locations
'''''THIS NEXT LINE THROWS AN ERROR'''''
Data_str = tdElement.getElementsByClassName("title").innerText
'''''AS DOES'''''
MyObject=tdElement.getElementsByClassName("title")
RefLocation.Offset(1, 2).Value = Data_str
Next 'close tr Loop
However, I can get the 'title' object via
For Each trElement In trElements
Set MyObject=trElement.getElementsByClassName("title")
Next 'close tr Loop
so the error is, presumably, something about tdElement (a DispHTMLElement Collection), which I tried to attach an image of but I lack the reputation (see link at end of post)...
Many thanks for any help.
PS. the webpage is structured, roughly, with a 2-column table whose rows I isolate with "basic-details". The first column is the "tiered toggle" and then the items I want are inner text in eg. "title". I need to use tieredtoggle as objects in each column have repeated class names
http://i.stack.imgur.com/1tyb6.png
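getElementsByClassName always returns a collection, so you have to pick an element by index before you read innerText. Because tdElement is itself a collection, it needs an index as well:
Data_str = tdElement(0).getElementsByClassName("title")(0).innerText
Instead of (0) you can enter the index at which the element you want is located.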
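Since getElementsByClassName returns a collection even when nothing matches, a more defensive variant could check each collection's length before indexing it. The following is a sketch under that assumption. The function name FirstTitleText is hypothetical; the class names are the ones from the question.

```vba
' Sketch only: returns the text of the first "title" element inside the first
' "tieredToggle" element of a row, or an empty string if either one is missing.
Private Function FirstTitleText(ByVal trElement As Object) As String
    Dim toggles As Object
    Dim titles As Object

    Set toggles = trElement.getElementsByClassName("tieredToggle")
    If toggles.Length = 0 Then Exit Function   ' no tieredToggle in this row

    Set titles = toggles(0).getElementsByClassName("title")
    If titles.Length = 0 Then Exit Function    ' no title inside the toggle

    FirstTitleText = titles(0).innerText
End Function
```

Inside the question's loop this would be used as Data_str = FirstTitleText(trElement).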

Resources