Excel VBA IE Object and using dropdown list - excel

I am experimenting with web automation and struggling a bit trying to utilize a drop down list.
My code works up to the point of searching for a company name and hitting "go". On the new page I can't seem to find the right code that selects the group of elements that represents the drop down list. I then want to select "100" entries, but I can't even grab the nodes that represent this list.
I have browsed several Stack Overflow pages about CSS selectors and looked at tutorials, but that hasn't helped. I either end up grabbing nothing, or whatever I grab doesn't support the getElementsByTagName method, which I need in order to drill down into the td and select nodes. I'm not sure what to do with those yet, but I can't even grab them. Thoughts?
(note stopline is just a line that I use a breakpoint on to stop my code)
CSS helper website: https://www.w3schools.com/cssref/trysel.asp
Code:
Option Explicit
Sub test()
On Error GoTo ErrHandle
Dim ie As New InternetExplorer
Dim doc As New HTMLDocument
Dim ws As Worksheet
Dim stopLine As Integer
Dim oSearch As Object, oSearchButton As Object
Dim oForm As Object
Dim oSelect As Object
Dim list As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
ie.Visible = True
ie.navigate "https://www.sec.gov/edgar/searchedgar/companysearch.html"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.Document
Set oSearch = doc.getElementById("companysearchform")
Set oSearchButton = oSearch.getElementsByTagName("input")(1)
Set oSearch = oSearch.getElementsByTagName("input")(0)
oSearch.Value = "Summit Midstream Partners, LP"
oSearchButton.Click
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.Document
Set list = doc.querySelectorAll("td select")
stopLine = 1
Exit Sub
ErrHandle:
MsgBox Err.Number & " - " & Err.Description, vbCritical
Exit Sub
End Sub

td select will return a single node so you only need querySelector. The node has an id so you might as well use the quicker querySelector("#count") to target the parent select. To change the option you can then use SelectedIndex on the parent select, or, target the child option by its value attribute querySelector("[value='100']").Selected = True. You may then need to attach and trigger change/onchange htmlevent to the parent select to register the change.
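As a rough illustration of the select-then-trigger idea described above, here is a minimal sketch. The helper name SelectOptionByValue and its parameters are hypothetical, and it assumes the page renders in an IE9+ document mode, where createEvent and dispatchEvent are available:

```vba
'Hypothetical helper, not part of the original answer.
'Selects the option with the given value inside a <select> and fires a
'"change" event so that scripts on the page notice the new selection.
Public Sub SelectOptionByValue(ByVal doc As Object, ByVal selectCss As String, _
                               ByVal optionValue As String)
    Dim sel As Object, opt As Object, evt As Object

    Set sel = doc.querySelector(selectCss)
    If sel Is Nothing Then Err.Raise vbObjectError + 1, , "No element matches " & selectCss

    Set opt = sel.querySelector("option[value='" & optionValue & "']")
    If opt Is Nothing Then Err.Raise vbObjectError + 2, , "No option with value " & optionValue

    opt.Selected = True

    'Assumption: IE9+ document mode. Older modes would need sel.fireEvent "onchange" instead.
    Set evt = doc.createEvent("HTMLEvents")
    evt.initEvent "change", True, False
    sel.dispatchEvent evt
End Sub
```

With the page from the question, a call could look like SelectOptionByValue ie.document, "#count", "100". Whether the page actually reacts to the event depends on how its scripts are attached.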
However, I would simply extract the company CIK from current page then concatenate the count=100 param into the url and .Navigate2 that using following format:
https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001549922&type=&dateb=&owner=include&count=100&search_text=
You can extract CIK, after initial search company click and wait for page load, with:
Dim cik As String
cik = ie.document.querySelector("[name=CIK]").value
ie.Navigate2 "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & cik & "&type=&dateb=&owner=include&count=100&search_text="
Given several params are left blank you can likely shorten to:
"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & cik & "&owner=include&count=100"
If you are unable to get the initial parent select you probably need a timed loop waiting for that element to be present after clicking the search button. An example is shown here in a StackOverflow answer.
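The shortened URL format above can also be wrapped in a small function so that the entry count is not hard-coded. This is only a sketch; the function name BuildEdgarUrl and the parameter entryCount are made up for illustration, and it assumes the blank parameters really can be dropped, as suggested above:

```vba
'Hypothetical helper: builds the shortened company-browse URL for a given CIK.
Public Function BuildEdgarUrl(ByVal cik As String, _
                              Optional ByVal entryCount As Long = 100) As String
    BuildEdgarUrl = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & cik & _
                    "&owner=include&count=" & CStr(entryCount)
End Function
```

It would then be used as ie.Navigate2 BuildEdgarUrl(cik).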

Related

Scraping table behind login wall

I am struggling to find the right code to scrape a table from a password-protected website into an Excel workbook. I have been able to get all of the code to work up to the table-scraping part. When I run the code, it opens IE and logs in, but then errors out (91: Object variable or With block variable not set). The code is below:
Private Sub CommandButton3_Click()
'Declare variables
Dim IE As Object
Dim Doc As HTMLDocument
Dim HTMLTable As Object
Dim TableRow As Object
Dim TableCell As Object
Dim myRow As Long
'Create a new instance of Internet Explorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
'Navigate to the website
IE.Navigate "https://www.myfueltanksolutions.com/validate.asp"
'Wait for the page to finish loading
Do While IE.ReadyState <> 4
DoEvents
Loop
'Set the document object
Set Doc = IE.Document
'Fill in the security boxes
Doc.all("CompanyID").Value = "ID"
Doc.all("UserId").Value = "Username"
Doc.all("Password").Value = "Password"
'Click the submit button
Doc.all("btnSubmit").Click
'Wait for the page to finish loading
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Set the HTMLTable object
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
'Loop through each row in the table
For Each TableRow In HTMLTable.getElementsByTagName("tr")
'Loop through each cell in the row
For Each TableCell In TableRow.getElementsByTagName("td")
'Write the table cell value to the worksheet
Worksheets("Sheet1").Range("A5").Offset(myRow, 0).Value = TableCell.innerText
myRow = myRow + 1
Next TableCell
Next TableRow
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Log out and close website
IE.Navigate ("https://www.myfueltanksolutions.com/signout.asp?action=rememberlogin")
IE.Quit
End Sub
I have included the HTML code of the table I am trying to scrape on the re-directed page after login.
I won't get tired of saying it again and again and again and ... ;-)
Don't work with IE anymore. MS is actively phasing it out!
But for explanation:
I'm sure this is the code fragment that doesn't do what you expect:
...
...
'Wait for the page to finish loading
Do Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
'Set the HTMLTable object
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
...
...
Waiting for READYSTATE_COMPLETE doesn't work here (for whatever reason). The code carries on without pausing, before the new content has loaded. getElementById() then finds no element with that id and returns Nothing, so the first use of HTMLTable (the For Each loop) raises the named error 91.
Excursus on some get-methods of the DOM (Document Object Model):
The methods getElementsByTagName() and getElementsByClassName() build a node collection that contains all elements matching the given criterion. For example, getElementsByTagName("a") gives you a collection of all anchor tags. Every element of the collection can be addressed by its index, like in an array. To find out how many elements a collection contains, read its length property. If there is no matching element, in our example no a-tags, the length is 0. But the collection was still built, so you have an object.
The get-methods that build a collection have an s for plural in ...Elements... But getElementById() has no s, because an id may occur only once in an HTML document. No collection is needed here. getElementById() returns the single matching element. If there is no such element, it returns Nothing, and using that result raises the error that there is no object.
How to solve the issue:
We must change the termination criterion and the body of the loop. We must ask again and again whether the element with the wanted id is present. To do that we use the given line:
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
As described above, the result is Nothing until the element is present, so we repeat the call until we get an object. The document may also raise errors while the new page is still loading. With On Error Resume Next we can ignore any error in that line.
Attention!
Only use this in specific situations and switch back to normal error handling with On Error GoTo 0 after the critical part of the code.
Replace the code I posted above in this answer with the following:
(To avoid endless loops, a timeout mechanism is recommended too. But I will keep it simple here.)
Do
On Error Resume Next
Set HTMLTable = Doc.getElementById("RecentInventorylistform")
On Error GoTo 0
Loop While HTMLTable Is Nothing
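For completeness, the timeout mechanism mentioned above could look roughly like the following. This is a sketch only: the function name WaitForElementById and the 15-second default are assumptions, and it re-reads the browser's current document on every pass rather than reusing the Doc variable that was set before the login click.

```vba
'Hypothetical helper, not part of the original answer.
'Polls the browser's current document until an element with the given id
'exists or the timeout expires. Returns Nothing on timeout.
Public Function WaitForElementById(ByVal ie As Object, ByVal elementId As String, _
                                   Optional ByVal timeoutSeconds As Long = 15) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        DoEvents
        On Error Resume Next   'ignore errors only for this one lookup
        Set found = ie.Document.getElementById(elementId)
        On Error GoTo 0
        If Not found Is Nothing Then Exit Do
    Loop While Now < deadline

    Set WaitForElementById = found
End Function
```

In the click handler from the question, the call would be Set HTMLTable = WaitForElementById(IE, "RecentInventorylistform"), followed by a check for HTMLTable Is Nothing before looping over the rows.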

How to pull data inside "span" for VBA?

I need the span value "0.062540" to pull from website through VBA.
My code is below:
Dim ie As New InternetExplorer
Dim doc As HTMLDocument
ie.Visible = False
ie.navigate "https://www.tefas.gov.tr/FonAnaliz.aspx?FonKod=MAC"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
On Error Resume Next
output = doc.getElementsByClassName("top-list").getElementByTagName("span")(0).innerText
Sheet1.Range("B19").Value = output
ie.Quit
End Sub
However, I could not fetch the related value. Could you help with my problem?
First you need to remove On Error Resume Next and check which errors you get! This line hides all your error messages, but the errors still occur, you just cannot see them. If you don't see them you cannot fix them, therefore your code cannot work.
Never use this line as you did. Either remove it or implement proper error handling according to VBA Error Handling – A Complete Guide.
Then if you do all these actions in one line like
output = doc.getElementsByClassName("top-list").getElementByTagName("span")(0).innerText
the error can be in multiple positions in that line and it is almost impossible to debug it and find out where in that line it is. Therefore we need to split that line up into multiple single actions to see in which part the error exactly occurs.
So we split it up like below which is exactly the same as the one line above:
Dim Divs As Object 'collection of div elements
Set Divs = doc.getElementsByClassName("top-list")
Dim Spans As Object 'collection of span elements
Set Spans = Divs.getElementByTagName("span")
Dim Output As String
Output = Spans(0).innerText
Now we will see that finding the div with the class top-list works, and that we get an error when finding the span elements. If we have a look at the Divs variable, we see that it is a collection of multiple items. Therefore we need to access the first item in that collection with Divs(0). Furthermore, the method is not getElementByTagName but getElementsByTagName (with an s). Correcting both gives the following:
Dim Divs As Object 'collection of div elements
Set Divs = doc.getElementsByClassName("top-list")
Dim Spans As Object 'collection of span elements
Set Spans = Divs(0).getElementsByTagName("span")
Dim Output As String
Output = Spans(0).innerText
and we see it works.
Finally, it is a good idea to implement some error handling, so that if something goes wrong you don't end up with hidden Internet Explorer windows that never get closed:
Public Sub FetchNumber()
Dim ie As New InternetExplorer
On Error GoTo SAFE_QUIT 'make sure ie.Quit is called even if an error occurs
ie.Visible = False
ie.navigate "https://www.tefas.gov.tr/FonAnaliz.aspx?FonKod=MAC"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.document
Dim Divs As Object
Set Divs = doc.getElementsByClassName("top-list")
Dim Spans As Object
Set Spans = Divs(0).getElementsByTagName("span")
Dim Output As String
Output = Spans(0).innerText
Sheet1.Range("B19").Value = Output 'write the value to the cell used in the question
SAFE_QUIT:
'remember the error (if any) so it can be re-raised after the browser is closed
Dim errNumber As Long, errSource As String, errDescription As String
errNumber = Err.Number
errSource = Err.Source
errDescription = Err.Description
'close the browser first; re-raising before this point would skip ie.Quit
ie.Quit
Set ie = Nothing
If errNumber <> 0 Then Err.Raise errNumber, errSource, errDescription
End Sub

Excel VBA - Web Scraping - Get value in HTML Table cell

I am trying to create a macro that scrapes a cargo tracking website.
But I have to create 4 such macros as each airline has a different website.
I am new to VBA and web scraping.
I have put together code that works for one website. But when I tried to replicate it for another one, I got stuck in the loop. I think it may be how I am referring to the element, but like I said, I am new to VBA and have no clue about HTML.
I am trying to get the "notified" value in the highlighted line from the image.
IMAGE:"notified" text to be extracted
Below is the code I have written so far that gets stuck in the loop.
Any help with this would be appreciated.
Sub FlightStat_AF()
Dim url As String
Dim ie As Object
Dim nodeTable As Object
'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate url
Do Until ie.readyState = 4: DoEvents: Loop
'Wait to load dynamic content after IE reports it's ready
'We can do that in a loop to match the point the information is available
Do
On Error Resume Next
Set nodeTable = ie.document.getElementByClassName("block-whisper")
On Error GoTo 0
Loop Until Not nodeTable Is Nothing
'Get the status from the table
MsgBox Trim(nodeTable.getElementsByClassName("fs-12 body-font-bold").innerText)
'Clean up
ie.Quit
Set ie = Nothing
Set nodeTable = Nothing
End Sub
Some basics:
For simple accesses, like the present ones, you can use the get methods of the DOM (Document Object Model). But there is an important difference between getElementByID() and getElementsByClassName() / getElementsByTagName().
getElementByID() searches for the unique ID of a html tag. This is written as the ID attribute to html tags. If the html standard is kept by the page, there is only one element with this unique ID. That's the reason why the method begins with getElement.
If the ID is not found, the method returns Nothing, and VBA throws a runtime error as soon as that result is used. That is why, in my other answer, the call is wrapped in a loop that switches error handling off and on again. But in the page from this question there is no ID for the HTML area in question.
Instead, the required element can be accessed directly. You tried the access with getElementsByClassName(). That's right. But here comes the difference to getElementByID().
getElementsByClassName() and getElementsByTagName() begin with getElements. That's plural, because any number of elements can share the same class or tag name. Both methods create an HTML node collection. All HTML elements with the requested class or tag name are listed in that collection.
All elements have an index, just like in an array. The indexes start at 0. To access a particular element, the desired index must be specified. The two class names fs-12 body-font-bold (class names are separated by spaces; you can also build a node collection using only one class name) deliver 2 HTML elements to the node collection. You want the second one, so you must use the index 1.
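The difference between the collection-building methods and getElementById() can be tried out without a browser. The following sketch parses a small HTML string with a late-bound "htmlfile" object; the function names CountByTag and HasElementWithId are invented for this illustration. It uses getElementsByTagName() rather than getElementsByClassName(), on the assumption that the latter may not be available in the older document mode that "htmlfile" runs in.

```vba
'Hypothetical helpers for experimenting with the DOM get-methods offline.

'Returns how many elements with the given tag name the HTML fragment contains.
'A collection is always returned, so Length is simply 0 when nothing matches.
Public Function CountByTag(ByVal html As String, ByVal tagName As String) As Long
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html
    CountByTag = doc.getElementsByTagName(tagName).Length
End Function

'Returns True if an element with the given id exists.
'getElementById yields Nothing when there is no match, so test before using it.
Public Function HasElementWithId(ByVal html As String, ByVal elementId As String) As Boolean
    Dim doc As Object, element As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html
    Set element = doc.getElementById(elementId)
    HasElementWithId = Not (element Is Nothing)
End Function
```

Checking Length before indexing into a collection, and checking Is Nothing before using a single element, avoids the runtime error in both cases.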
This is the VBA code for the asked page by using the IE:
Sub FlightStat_AF()
Dim url As String
Dim ie As Object
'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate url
Do Until ie.readyState = 4: DoEvents: Loop
'Wait to load dynamic content after IE reports it's ready
'We do that with a fixed manual pause of a few seconds
'because the whole page will be reloaded
'The last three values are hours, minutes, seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get the status from the table
MsgBox Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
'Clean up
ie.Quit
Set ie = Nothing
End Sub
Edit: Sub as function
This sub to test the function:
Sub testFunction()
Dim flightStatAfResult As String
flightStatAfResult = FlightStat_AF("057-92366691")
MsgBox flightStatAfResult
End Sub
This is the sub as function:
Function FlightStat_AF(cargoNo As String) As String
Dim url As String
Dim ie As Object
Dim result As String
'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
url = "https://www.afklcargo.com/mycargo/shipment/detail/" & cargoNo
'Initialize Internet Explorer, set visibility,
'call URL and wait until page is fully loaded
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate url
Do Until ie.readyState = 4: DoEvents: Loop
'Wait to load dynamic content after IE reports it's ready
'We do that with a fixed manual pause of a few seconds
'because the whole page will be reloaded
'The last three values are hours, minutes, seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
'Get the status from the table
result = Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
'Clean up
ie.Quit
Set ie = Nothing
'Return value of the function
FlightStat_AF = result
End Function

How to input values into dropdown box of web page using Excel VBA

I'm trying to operate a website to display desired option chain data with an Excel VBA macro. The website -- CBOE.com -- has an input field for the ticker symbol of the desired option chains. My code has been able to drive that part of the webpage and a default option chain is displayed. It defaults to the most current month that options expire (May 2018 as of this note). From there the user can input other expiration dates for which to have other option chains (for the same symbol) to be retrieved and displayed. This is where my code seems to be breaking down.
Just above the default option chain display is a dropdown input box labeled "Expiration:" where a list of other expiration months can be selected. Once selected, a green Submit button must be clicked to get the specified option chain for the selected expiration month. Alternatively, below the default option chain are explicit filter buttons for expiration months also.
As said, my code gets to the point of specifying the symbol and getting default option chains displayed, but I can't seem to get the dropdown input field for other expiration months to work.
If anyone can see where and how my code is deficient, I'd really appreciate that insight.
Many thanks.
--Mark.
Here is my core code in question:
Sub getmarketdata_V3()
Dim mybrowser As Object, myhtml As String
Dim htmltables As Object, htmltable As Object
Dim htmlrows As Object, htmlrow As Object
Dim htmlcells As Object, htmlcell As Object
Dim xlrow As Long, xlcol As Integer
Dim exitat As Date, symbol As String
Dim flag As Integer
On Error GoTo errhdl
Const myurl = "http://www.cboe.com/delayedquote/quote-table"
symbol = UCase(Trim(Range("ticker").Text))
With Range("ticker").Worksheet
Range(Range("ticker").Offset(1, 0), Cells(Rows.Count, Range("ticker").Column + 13)).ClearContents
End With
Set mybrowser = CreateObject("internetexplorer.application")
mybrowser.Visible = True
mybrowser.navigate myurl
While mybrowser.busy Or mybrowser.readyState <> 4
DoEvents
Wend
With mybrowser.document.all
exitat = Now + TimeValue("00:00:05")
Do
.Item("ctl00$ContentTop$C002$txtSymbol").Value = symbol
.Item("ctl00$ContentTop$C002$btnSubmit").Value = "Submit"
.Item("ctl00$ContentTop$C002$btnSubmit").Click
If Err.Number = 0 Then Exit Do
Err.Clear
DoEvents
If Now > exitat Then Exit Do
Loop
End With
'This With statement is to refresh the mybrowser.document since the prior With statement pulls up a partially new webpage
With mybrowser.document.all
On Error Resume Next
exitat = Now + TimeValue("00:00:05")
'Tried using "ID" label to select desired month--in this case 2018 July is a dropdown option:
'Using this label seems to blank out the value displayed in the dropdown input box, but does not cause
'any of the options to display nor implant "2018 July" in it either. It just remains blank and no new option
'chain is retrieved.
.Item("ContentTop_C002_ddlMonth").Select
.Item("ContentTop_C002_ddlMonth").Value = "2018 July"
.Item("ContentTop_C002_ddlMonth").Click
'Then tried using "Name" label to select desired month--in this case 2018 July is an option:
' .Item("ctl00$ContentTop$C002$ddlMonth").Value = "2018 July"
' .Item("ctl00$ContentTop$C002$ddlMonth").Click
' .Item("ctl00$ContentTop$C002$btnFilter").Value = "View Chain"
' .Item("ctl00$ContentTop$C002$btnFilter").Click
End With
While mybrowser.busy Or mybrowser.readyState <> 4
DoEvents
Wend
'Remaining logic, except for this error trap logic deals with the option chain results once it has been successfully retrieved.
'For purposes of focus on the issue of not being able to successfully have such a table displayed, that remaining process logic is not
'included here.
errhdl:
If Err.Number Then MsgBox Err.Description, vbCritical, "Get data"
On Error Resume Next
mybrowser.Quit
Set mybrowser = Nothing
Set htmltables = Nothing
End Sub
For your code:
These 2 lines change the month and click the view chain (I tested with symbol FLWS). Make sure you have sufficient delays for page to actually have loaded.
mybrowser.document.querySelector("#ContentTop_C002_ddlMonth").Value = "201809"
mybrowser.document.querySelector("#ContentTop_C002_btnFilter").Click
I found the above sketchy for timings when added into your code so I had a quick play with Selenium basic as well. Here is an example with selenium:
Option Explicit
'Tools > references > selenium type library
Public Sub GetMarketData()
Const URL As String = "http://www.cboe.com/delayedquote/quote-table"
Dim d As ChromeDriver, symbol As String
symbol = "FLWS"
Set d = New ChromeDriver
With d
.Start
.Get URL
Dim b As Object, c As Object, keys As New keys
Set b = .FindElementById("ContentTop_C002_txtSymbol")
b.SendKeys symbol
.FindElementById("ContentTop_C002_btnSubmit").Click
Set c = .FindElementById("ContentTop_C002_ddlMonth")
c.Click
c.SendKeys keys.Down 'move one month down
.FindElementById("ContentTop_C002_btnFilter").Click
Stop '<<delete me later
.Quit
End With
End Sub
Try the below approach, in case you wanna stick to IE. I tried to kick out hardcoded delay from the script. It should get you there. Make sure to fill in the text field with the appropriate ticker from the below script before execution.
There you go:
Sub HandleDropDown()
Const url As String = "http://www.cboe.com/delayedquote/quote-table"
Dim IE As New InternetExplorer, Html As HTMLDocument, post As Object, elem As Object
With IE
.Visible = True
.navigate url
While .Busy Or .readyState <> 4: DoEvents: Wend
Set Html = .document
End With
Do: Set post = Html.getElementById("ContentTop_C002_txtSymbol"): DoEvents: Loop While post Is Nothing
post.Value = "tickername" ''make sure to fill in this box with appropriate symbol
Html.getElementById("ContentTop_C002_btnSubmit").Click
Do: Set elem = Html.getElementById("ContentTop_C002_ddlMonth"): DoEvents: Loop While elem Is Nothing
elem.selectedIndex = 2 ''just select the month using its dropdown order
Html.getElementById("ContentTop_C002_btnFilter").Click
End Sub
Reference to add to the library:
Microsoft Internet Controls
Microsoft HTML Object Library

Error "Object variable or with block variable not set" when using getElementsByClassName

I want to scrape some fields from Amazon.
Atm I am using a link and my vba script returns me name and price.
For example:
I put the link into column A and get the other fields in the respective columns, f.ex.: http://www.amazon.com/GMC-Denali-Black-22-5-Inch-Medium/dp/B00FNVBS5C/ref=sr_1_1?s=outdoor-recreation&ie=UTF8&qid=1436768082&sr=1-1&keywords=bicycle
However, I would also like to have the product description.
Here is my current code:
Sub ScrapeAmz()
Dim Ie As New InternetExplorer
Dim WebURL
Dim Docx As HTMLDocument
Dim productDesc
Dim productTitle
Dim price
Dim RcdNum
Ie.Visible = False
For RcdNum = 2 To ThisWorkbook.Worksheets(1).Range("A65536").End(xlUp).Row
WebURL = ThisWorkbook.Worksheets(1).Range("A" & RcdNum)
Ie.Navigate2 WebURL
Do Until Ie.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
Set Docx = Ie.Document
productTitle = Docx.getElementById("productTitle").innerText
'productDesc = Docx.getElementsByClassName("productDescriptionWrapper")(0).innerText
price = Docx.getElementById("priceblock_ourprice").innerText
ThisWorkbook.Worksheets(1).Range("B" & RcdNum) = productTitle
'ThisWorkbook.Worksheets(1).Range("C" & RcdNum) = productDesc
ThisWorkbook.Worksheets(1).Range("D" & RcdNum) = price
Next
End Sub
I am trying to get the product description by using productDesc = Docx.getElementsByClassName("productDescriptionWrapper")(0).innerText.
However, I get an error.
Object variable or with block variable not set.
Any suggestion why my statement does not work?
I appreciate your replies!
I'm pretty sure your problem is being caused by attempting to access the document before it's completely loaded. You're just checking ie.ReadyState.
This is my understanding of the timeline for loading a page with an IE control.
Browser connects to page: ie.ReadyState = READYSTATE_COMPLETE. At this point, you can access ie.document without causing an error, but the document has only started loading.
Document fully loaded: ie.document.readyState = "complete"
(note that frames may still be loading and AJAX processing may still be occurring.)
So you really need to check for two events.
Do
If ie.ReadyState = READYSTATE_COMPLETE Then
If ie.document.readyState = "complete" Then Exit Do
End If
Application.Wait DateAdd("s", 1, Now)
Loop
edit: after actually looking at the page you're trying to scrape, it looks like the reason it's failing is because the content you're trying to get at is in an iframe. You need to go through the iframe before you can get to the content.
ie.document.getElementById("product-description-iframe").contentWindow.document.getElementsByClassName("productDescriptionWrapper")(0).innerText
