I have been scraping web content using VBA and MSXML, so I know the basics. But now I would like to get data from a web page whose content is generated by JavaScript.
I can't give you the exact link because it's private, but I can describe it: there is a div container with headings and some images, and below it are tables that load dynamically (with spinning loading indicators) but do not update afterwards (they load only once). If you view the page source in the browser, you can't find these tables, only the container, the headings and the image src attributes. But if you right-click a table and choose "Inspect Element", you can see the usual <th>, <tr>, <td> structure.
Methods I know:
1) Save the page and then scrape it (probably not the best solution).
Is there a fast way to save all the pages if I have a list of their URLs?
2) Use Internet Explorer via VBA, wait until the page has loaded and then get the elements as usual. But this seems slow to me: about 25 s per page, even though the page itself appears to load in about 0.5 s.
Maybe I should turn off something that slows down loading?
Can you check what is wrong?
Here is the code I found:
Sub FuturesScrap3(ByVal URL As String)
Dim HTMLDoc As New HTMLDocument
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim oElement As Object
Dim oIE As InternetExplorer
Set oIE = New InternetExplorer
oIE.navigate URL
oIE.Visible = True
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run
Application.Wait (Now + TimeValue("0:01:00"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 1).Value = tdElement.innerText
lRow = lRow + 1
Next
'Clicking the Month tab
For Each oElement In oIE.document.all
If Trim(oElement.innerText) = "Month" Then
oElement.Focus
oElement.Click
End If
Next oElement
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run
Application.Wait (Now + TimeValue("0:01:00"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 2).Value = tdElement.innerText
lRow = lRow + 1
Next tdElement
End Sub
3) Use a web driver such as Selenium. I couldn't find good examples; a from-scratch example, such as getting data from an element by class name, would be great.
4) Unknown to me, but possibly the fastest: getting the data directly from the JavaScript variables/arrays that are used to build these tables. I heard you can connect VBA with JavaScript, but I haven't found any good examples of how to get data that way.
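For option 3, here is a minimal sketch of reading a table by class name with SeleniumBasic, a third-party COM wrapper around Selenium (not part of Office). It assumes SeleniumBasic is installed with a chromedriver.exe that matches your Chrome version, and that VBE > Tools > References > Selenium Type Library is ticked; the URL and the class name data-table are placeholders.

```vba
' Requires SeleniumBasic + VBE > Tools > References > Selenium Type Library.
' The URL and the class name "data-table" are placeholders for your page.
Public Sub SeleniumByClassName()
    Dim driver As New Selenium.ChromeDriver
    Dim table As Selenium.WebElement
    Dim row As Selenium.WebElement
    Dim r As Long

    driver.Get "https://example.com/page-with-js-tables"

    ' SeleniumBasic retries the lookup until its implicit-wait timeout elapses,
    ' which gives the page's JavaScript time to build the table
    Set table = driver.FindElementByClass("data-table")

    ' Write the text of each table row to column A
    For Each row In table.FindElementsByTag("tr")
        r = r + 1
        ThisWorkbook.Worksheets("Sheet1").Cells(r, 1).Value = row.Text
    Next row

    driver.Quit
End Sub
```

Because the driver controls a real browser, JavaScript runs normally; the trade-off is that this is usually slower than a plain HTTP request.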
All solutions should be in VBA. I would like to know which approach is fastest.
Thank you for your comments.
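For option 4, one approach that sometimes works is to look for the data in the raw page source: if the page embeds it as a JavaScript literal (e.g. var tableData = [...]), you can download the source with MSXML and cut the literal out with a regular expression. This is a sketch under that assumption; tableData and the URL are hypothetical, so search the page source for the actual variable name. If the tables are instead filled by a separate request, the browser's developer tools (Network tab) will show that request, and you can usually call it directly with MSXML.

```vba
' Pulls a JavaScript array literal out of the raw page source.
' varName (e.g. the hypothetical "tableData") must be a plain identifier.
Public Function ExtractJsArray(ByVal url As String, ByVal varName As String) As String
    Dim source As String
    Dim re As Object, matches As Object

    ' Download the unrendered HTML, scripts included
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        source = .responseText
    End With

    Set re = CreateObject("VBScript.RegExp")
    ' Matches e.g.  tableData = [ ... ];  and captures the [ ... ] part,
    ' lazily, up to the first "];"
    re.Pattern = "\b" & varName & "\s*=\s*(\[[\s\S]*?\]);"
    Set matches = re.Execute(source)

    ' Returns "" when the variable is not found in the source
    If matches.Count > 0 Then ExtractJsArray = matches(0).SubMatches(0)
End Function
```

Usage: `Debug.Print ExtractJsArray("https://example.com/page", "tableData")`. The result is JSON-like text that still needs parsing, for example with a third-party parser such as VBA-JSON.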
@Marc: no, it was not possible to get the data using Web Query / Power Query's "From Web" import; only the headings came through.
I edited the code a bit: it contained a 1-minute(!) delay (perhaps the original author made a mistake when adding the wait for the page's scripts to load).
Sub FuturesScrap3(ByVal URL As String)
Dim HTMLDoc As New HTMLDocument
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim oElement As Object
Dim oIE As InternetExplorer
Set oIE = New InternetExplorer
oIE.navigate URL
oIE.Visible = True
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run - 1 second is enough in my case
Application.Wait (Now + TimeValue("0:00:01"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 1).Value = tdElement.innerText
lRow = lRow + 1
Next
'Clicking the Month tab
For Each oElement In oIE.document.all
If Trim(oElement.innerText) = "Month" Then
oElement.Focus
oElement.Click
End If
Next oElement
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 2).Value = tdElement.innerText
lRow = lRow + 1
Next tdElement
End sub
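A fixed Application.Wait either wastes time or turns out too short. A sketch of an alternative, assuming the data sits in plain <td> cells: poll the live document until the element you need appears, with an upper time limit, so the macro continues as soon as the JavaScript has finished. The function name and timeout are illustrative.

```vba
' Polls the live IE document until at least one element with the given tag exists,
' or until timeoutSeconds elapse. Returns True if the element appeared in time.
Public Function WaitForTag(ByVal ie As Object, ByVal tagName As String, _
                           ByVal timeoutSeconds As Double) As Boolean
    Dim started As Double
    started = Timer
    Do
        DoEvents
        If ie.document.getElementsByTagName(tagName).Length > 0 Then
            WaitForTag = True
            Exit Function
        End If
    Loop While Timer - started < timeoutSeconds   ' note: Timer resets at midnight
End Function
```

Usage, in place of the first Application.Wait: `If Not WaitForTag(oIE, "td", 10) Then Debug.Print "Table did not load"`. After clicking the Month tab the old <td> cells already exist, so for that second wait you would need to poll for something that changes, such as the text of a specific cell.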
Related
I am attempting to pull financial data from the Australian Securities Exchange (ASX), specifically the share registry for a particular company.
The page URL has the form https://www.asx.com.au/asx/share-price-research/company/BFG/details, where the ticker (BFG here) changes from firm to firm.
The HTML snippet I am looking at is:
<td class="ng-binding">COMPUTERSHARE INVESTOR SERVICES PTY LIMITED
<br>Yarra Falls, 452 Johnston Street, ABBOTSFORD, VIC, AUSTRALIA, 3067</td>
My overarching goal is a spreadsheet with a list of ticker symbols, where a macro takes each symbol, places it in the URL and pulls the adjacent Share Registry value into Excel (in this case, COMPUTERSHARE INVESTOR SERVICES PTY LIMITED).
I have been trying to work out how to use a variable URL (I believe this involves string concatenation) and pull financial tables from websites.
I've settled on the tutorial code below, which doesn't write the result into Excel but just shows it in a message box. If I can get the result to appear in the box, that will be a great first step.
Many tutorials use the IE object, but from what I can gather XMLHttpRequest is far more efficient and nimble, and since I have a large number of ticker symbols it would probably be the best choice for speed.
Sub Get_Web_Data()
' TeachExcel.com
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
' Website to go to.
website = "https://www.asx.com.au/asx/share-price-research/company/BFG/details"
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there - probably don't need to change this.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object to make data references easier.
html.body.innerHTML = response
' Get the price from the specified element on the page.
price = html.getElementsByClassName("disclaimer disclaimer-company-info").Item(0).innerText
' Output the price into a message box.
MsgBox price
End Sub
I edited the code to show the only HTML class element that seems to work for me, which is the disclaimer text at the bottom of the page. I gather that html.getElementsByClassName is the wrong approach in my scenario.
I have read that CSS selectors might be applicable here, along with the use of 'children' (I think), because the class I am after (ng-binding) occurs many times throughout the table, so I would need to direct the macro to select the nth instance.
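The class name ng-binding is the marker AngularJS adds to elements it fills from data bindings, which suggests the share registry text is inserted by JavaScript after the page loads. If so, the XMLHTTP response contains only the static page (such as the disclaimer), which would explain why only that element is found. A quick way to check this, as a sketch with an illustrative function name: search the raw response for a value you can see in the browser.

```vba
' Checks whether a piece of text is present in the raw HTML that XMLHTTP receives.
' If it is not, the page builds it with JavaScript and XMLHTTP alone cannot see it.
Public Function RawHtmlContains(ByVal url As String, ByVal needle As String) As Boolean
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        ' Case-insensitive search of the unrendered response
        RawHtmlContains = InStr(1, .responseText, needle, vbTextCompare) > 0
    End With
End Function
```

Usage: `Debug.Print RawHtmlContains("https://www.asx.com.au/asx/share-price-research/company/BFG/details", "COMPUTERSHARE")`. If this prints False, requesting that URL with XMLHTTP will not give you the registry, and you need a browser (as in the IE answer) or the data request the page itself makes.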
I do not expect a completely written extract. Any hints and tips to point me in the right direction?
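On the CSS selector idea: querySelectorAll returns all matching elements, and you can index into the result to get the nth one. A minimal sketch, assuming doc is a rendered document (for example objIE.document once the page's scripts have run); NthText is an illustrative name, and the index 3 in the usage line is hypothetical, so count the matches in the browser's inspector first.

```vba
' Returns the text of the nth (0-based) element matching a CSS selector, or "" if absent.
Public Function NthText(ByVal doc As Object, ByVal selector As String, _
                        ByVal n As Long) As String
    Dim nodes As Object
    Set nodes = doc.querySelectorAll(selector)
    If n >= 0 And n < nodes.Length Then NthText = nodes.Item(n).innerText
End Function
```

Usage: `Debug.Print NthText(objIE.document, "td.ng-binding", 3)`. The innerText will include the address that follows the <br>, so take the first line if you only want the company name.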
How about this?
Sub Web_Table_Option_Two()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
objIE.Navigate "https://www.asx.com.au/asx/share-price-research/company/BFG/details"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
objIE.Quit
End Sub
If you only need one specific row, you can do that too: pick out the row index you want instead of looping over all rows.
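Rather than hard-coding a row index, a sketch of looking the row up by its label: scan the table rows and return the cell next to the one containing the label text. It assumes the label (for example "Share Registry") is in the first cell of its row and the value in the second; ValueForLabel is an illustrative name, and the layout assumption should be checked against the page.

```vba
' Scans every table row in a document and returns the text of the second cell of the
' first row whose first cell contains labelText. Returns "" if no such row exists.
Public Function ValueForLabel(ByVal doc As Object, ByVal labelText As String) As String
    Dim trs As Object, i As Long
    Set trs = doc.getElementsByTagName("tr")
    For i = 0 To trs.Length - 1
        If trs(i).Cells.Length >= 2 Then
            If InStr(1, trs(i).Cells(0).innerText, labelText, vbTextCompare) > 0 Then
                ValueForLabel = trs(i).Cells(1).innerText
                Exit Function
            End If
        End If
    Next i
End Function
```

Usage, inside Web_Table_Option_Two before objIE.Quit: `Debug.Print ValueForLabel(HTMLDoc, "Share Registry")`.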
Finally, if you want to loop through an array of stock tickers, use the code below.
Sub Web_Table_Option_Two()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
Dim c As Range
Dim sht As Worksheet
Dim LastRow As Long
Dim wb As Workbook: Set wb = ThisWorkbook
Set sht = wb.Sheets("Stocks")
'find last used row in ColumnA
LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
Dim mystock As String
For Each c In sht.Range("A2:A" & LastRow)
mystock = c.Value
objIE.Navigate "https://www.asx.com.au/asx/share-price-research/company/" & mystock & "/details"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Sheets.Add After:=ActiveSheet
ActiveSheet.Name = mystock
ActRw = 1
Application.Wait (Now + TimeValue("0:00:01")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.ActiveSheet.Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
Next c
objIE.Quit
End Sub
I am trying to select a value from a dropdown on a web page. The rest of my code works fine, but I am unable to select the value from the dropdown.
Sub pulldata2()
Dim tod As String, UnderLay As String
Dim IE As Object
Dim doc As HTMLDocument
'Html table
Dim Tbl As HTMLTable, Cel As HTMLTableCell, Rw As HTMLTableRow, Col As HTMLTableCol
Dim TrgRw As Long, TrgCol As Long
'Create new sheet
tod = ThisWorkbook.Sheets("URLList").Range("C2").Value
have = False
For Each sht In ThisWorkbook.Sheets
If sht.Name = tod Then
have = True
Exit For
End If
Next sht
If have = False Then
Sheets.Add After:=Sheets(Sheets.Count)
ActiveSheet.Name = tod
Else
If MsgBox("Sheet " & tod & " already exists Overwrite Data?", vbYesNo) = vbNo Then Exit Sub
End If
'Start Internetexplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=25APR2019"
Do While IE.Busy Or IE.readyState <> 4
Application.Wait DateAdd("s", 1, Now)
Loop
Set doc = IE.document
Dim ColOff As Long
'Put data to sheet and loop to next URL
For Nurl = 2 To 191
ColOff = (Nurl - 2) * 23
TrgRw = 1
UnderLay = ThisWorkbook.Sheets("URLList").Range("A" & Nurl).Value
doc.getElementById("underlyStock").Value = UnderLay
doc.parentWindow.execScript "goBtnClick('stock');", "javascript"
'now i want to select data from dropdown id=date, value= 27JUN2019
doc.querySelector("Select[name=date] option[value=27JUN2019]").Selected = True
Do While IE.Busy Or IE.readyState <> 4
Application.Wait DateAdd("s", 1, Now)
Loop
Set Tbl = doc.getElementById("octable")
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Value = UnderLay
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Font.Size = 20
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Font.Bold = True
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Select
TrgRw = TrgRw + 1
For Each Rw In Tbl.Rows
TrgCol = 1
For Each Cel In Rw.Cells
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + TrgCol).Value = Cel.innerText
TrgCol = TrgCol + Cel.colSpan ' if Column span is > 1 multiple
Next Cel
TrgRw = TrgRw + 1
Next Rw
TrgRw = TrgRw + 1
Next Nurl
'exit the internetexplorer
IE.Quit
Set IE = Nothing
End Sub
Why isn't my code working? I am new to VBA; please help me find the error.
Simply alter the URL rather than using the dropdown:
https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=27JUN2019
You can also use xhr to get the content rather than a slow browser. I use the clipboard to write out the table.
Option Explicit
Public Sub GetInfo()
Dim html As Object, hTable As Object, ws As Worksheet, clipboard As Object
Set html = New HTMLDocument
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=27JUN2019", False
.send
html.body.innerHTML = .responseText
Set hTable = html.getElementById("octable")
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ws.Range("A1").PasteSpecial
End With
End Sub
Alternative:
1) You could loop the tr and td within hTable above to write out the table
2) You could also use Power Query's "From Web" (via the Data tab in Excel 2016+, or with the free Power Query add-in for Excel 2013). Paste the URL into the top of the pop-up browser, press Go, then select the table to import.
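For alternative 1, a minimal sketch of writing the table cell by cell instead of through the clipboard. It takes the element returned by html.getElementById("octable") in GetInfo and a top-left target cell; WriteTable is an illustrative name. Unlike the question's code, it ignores colSpan, so merged header cells will shift later columns to the left.

```vba
' Writes an HTML table to a worksheet cell by cell, without using the clipboard.
Public Sub WriteTable(ByVal hTable As Object, ByVal topLeft As Range)
    Dim r As Long, c As Long
    For r = 0 To hTable.Rows.Length - 1
        For c = 0 To hTable.Rows(r).Cells.Length - 1
            topLeft.Offset(r, c).Value = hTable.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

Usage: `WriteTable hTable, ws.Range("A1")` in place of the four clipboard lines in GetInfo.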
Changing stocks:
The stock is part of the URL query string, e.g. symbol=NIFTY, so you can concatenate the new symbol into the URL during a loop:
"https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=" & yourSymbolGoesHere & "&date=27JUN2019"
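If you really want to use IE, be sure to enclose the attribute value in single quotes, e.g. '27JUN2019'. Unquoted, a value that starts with a digit is not a valid CSS identifier, so the selector fails.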
Option Explicit
'VBE > Tools > References: Microsoft Internet Controls
Public Sub ClickButton()
Dim ie As InternetExplorer
Const URL As String = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=25APR2019"
Set ie = New InternetExplorer
With ie
.Visible = True
.Navigate2 URL
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.querySelector("[value='27JUN2019']").Selected = True
Stop
End With
End With
End Sub
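To loop over several stocks as described under "Changing stocks", a sketch that builds one URL per symbol by concatenation. It assumes the symbols are in column A of the URLList sheet from row 2 down (as in the question) and keeps the expiry date fixed for illustration; it only prints the URLs, so plug each one into the XMLHTTP request from GetInfo.

```vba
' Builds one option-chain URL per symbol by concatenating it into the query string.
Public Sub LoopSymbols()
    Const BASE_URL As String = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol="
    Dim ws As Worksheet, lastRow As Long, r As Long, url As String

    Set ws = ThisWorkbook.Worksheets("URLList")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        url = BASE_URL & ws.Cells(r, "A").Value & "&date=27JUN2019"
        Debug.Print url   ' replace with the XMLHTTP request from GetInfo
    Next r
End Sub
```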
I have no knowledge of VBA; I only use the macro recorder.
I need to download data from a web page into an Excel spreadsheet, and with my knowledge of VBA I can't do it.
Specifically, I want a macro that downloads a data table from this page into Excel: https://www.investing.com/equities/cellnex-telecom-historical-data
The download has to be configurable by interval, date range and sort order.
The steps would be the following:
1.- The objective is to copy the data from the "CLNX historical data" table to an Excel spreadsheet.
2.- Before downloading, select "Monthly" in the drop-down menu called "Term".
3.- Also before downloading, select a date range covering the last 2 years.
4.- Finally, sort the table in descending order by the "Maximum" column.
5.- Once the term, the date range and the sort order are set, copy the data from the "CLNX historical data" table to an Excel spreadsheet.
I have tried with the macro recorder but I am not able to configure the term, the date range or the ordering.
Could someone help me?
Thanks for your help.
The code:
Sub DataInvesting()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://www.investing.com/equities/cellnex-telecom-historical-data"
Do Until IE.readyState = 4
DoEvents
Loop
IE.Document.getElementsByClassName("newInput selectBox float_lang_base_1")(0).Value = "Monthly"
IE.Visible = True
Set IE = Nothing
Set appIE = Nothing
End Sub
I have just tested the following code and it works. Instead of creating an instance of Internet Explorer every time the macro runs, it uses XMLHTTP requests. Copy the entire code and paste it into a module in the VBA editor. Don't forget to add references (Tools > References) to Microsoft HTML Object Library and Microsoft XML, v6.0.
Option Explicit
Sub Export_Table()
'Html Objects---------------------------------------'
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Dim ieTable As MSHTML.HTMLTable
Dim Element As Object ' each <tr> row; a row is not an HTMLElementCollection
'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
Dim wb As Workbook
Dim Table As Worksheet
Dim i As Long
Set wb = ThisWorkbook
Set Table = wb.Worksheets("Sheet1")
'-------------------------------------------'
Dim xmlHttpRequest As New MSXML2.XMLHTTP60 '
'-------------------------------------------'
i = 2
'Web Request --------------------------------------------------------------------------'
With xmlHttpRequest
.Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
If .Status = 200 Then
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
htmlBody.innerHTML = xmlHttpRequest.responseText
Set ieTable = htmlDoc.getElementById("curr_table")
For Each Element In ieTable.getElementsByTagName("tr")
Table.Cells(i, 1) = Element.Children(0).innerText
Table.Cells(i, 2) = Element.Children(1).innerText
Table.Cells(i, 3) = Element.Children(2).innerText
Table.Cells(i, 4) = Element.Children(3).innerText
Table.Cells(i, 5) = Element.Children(4).innerText
Table.Cells(i, 6) = Element.Children(5).innerText
Table.Cells(i, 7) = Element.Children(6).innerText
i = i + 1
DoEvents: Next Element
End If
End With
Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing
End Sub
Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
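The request above hard-codes the date range. A sketch that computes the "last N years" range the question asks for; the field names and the curr_id/smlID values are copied from the answer's POST body, and BuildHistoricalBody/EncodeDate are illustrative helper names.

```vba
' Builds the form body for the HistoricalDataAjax POST with a date range ending today
' and starting yearsBack years earlier.
Public Function BuildHistoricalBody(ByVal yearsBack As Long) As String
    Dim startDate As Date, endDate As Date
    endDate = Date
    startDate = DateAdd("yyyy", -yearsBack, endDate)
    BuildHistoricalBody = "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data" & _
        "&st_date=" & EncodeDate(startDate) & "&end_date=" & EncodeDate(endDate) & _
        "&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
End Function

' Formats a date as mm%2Fdd%2Fyyyy ("%2F" is a URL-encoded "/"). Building it from parts
' avoids Format's locale-dependent date separator.
Private Function EncodeDate(ByVal d As Date) As String
    EncodeDate = Format$(d, "mm") & "%2F" & Format$(d, "dd") & "%2F" & Format$(d, "yyyy")
End Function
```

Usage: `.send BuildHistoricalBody(2)` in place of the literal body. For the descending sort by "Maximum" (High), one option is to sort the written rows afterwards with Range.Sort; which column holds High depends on the table's column order, so check it on the sheet first.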
I can't test this: despite setting up a free account, the site keeps saying my password is wrong. After five password resets with the same problem, I suspect it wants my social media details.
The following broadly outlines the steps I would consider, though some timed waits are most likely needed.
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub Info()
Dim ie As New InternetExplorer
Const URL As String = "https://www.investing.com/equities/cellnex-telecom-historical-data"
With ie
.Visible = True
.Navigate2 URL
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector(".login").Click
While .Busy Or .readyState < 4: DoEvents: Wend
.Navigate2 URL
While .Busy Or .readyState < 4: DoEvents: Wend
With .document.querySelector("#loginFormUser_email")
.Focus
.Value = "Bob@gmail.com"
End With
With .document.querySelector("#loginForm_password")
.Focus
.Value = "systemSucksDoesn'tAcceptMyPassword"
End With
Application.Wait Now + TimeSerial(0, 0, 2)
.document.querySelector("[onclick*=submitLogin]").Click
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("#data_interval").Click
.document.querySelector("[value=Monthly]").Click
With .document.querySelector("#picker")
.Focus
.Value = "03/08/2017 - 03/08/2019"
.FireEvent "onchange"
End With
'TODO Sorting column when clarified which column
.document.querySelector("[title='Download Data']").Click
Application.Wait Now + TimeSerial(0, 0, 10)
Stop
.Quit
End With
End Sub
Try this.
Sub Web_Table_Option()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
objIE.Navigate "https://www.investing.com/equities/cellnex-telecom-historical-data"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table") ' "curr_table" is the table's id, not a tag name
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
objIE.Quit
End Sub
I am looking to follow a series of URLs found in column A (example: https://www.ebay.com/itm/Apple-iPhone-7-GSM-Unlocked-Verizon-AT-T-TMobile-Sprint-32GB-128GB-256GB/352381131997?epid=225303158&hash=item520b8d5cdd:m:mWgYDe4a79NeLuAlV-RmAQA:rk:7:pf:0) and pull the following information from each:
- Title
- Price
- Description
I think there are multiple issues with my code. For one, I can't get the program to follow the URLs listed in the worksheet (it only works if I specify one within the code). Also, pulling multiple fields has given me issues.
Option Explicit
Public Sub ListingInfo()
Dim ie As New InternetExplorer, ws As Worksheet, t As Date
Dim i As Integer
i = 0
Do While Worksheets("Sheet1").Cells(i, 1).Value <> ""
Const MAX_WAIT_SEC As Long = 10
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ie
.Visible = True
.Navigate2 Worksheets("Sheet1").Cells(i, 1).Value
While .Busy Or .readyState < 4: DoEvents: Wend
Dim Links As Object, i As Long, count As Long
t = Timer
Do
On Error Resume Next
Set Title = .document.querySelectorAll("it-ttl")
Set price = .document.querySelectorAll("notranslate")
Set Description = .document.querySelectorAll("ds_div")
count = Links.Length
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While count = 0
For i = 0 To Title.Length - 1
ws.Cells(i + 1, 1) = Title.item(i)
ws.Cells(i + 1, 2) = price.item(i)
ws.Cells(i + 1, 3) = Description.item(i)
Next
.Quit
i = i + 1
Loop
End With
End Sub
I would use late binding for MSXML2.XMLHTTP and set a reference to the Microsoft HTML Object Library for the HTMLDocument.
Note: querySelector() references the first item it finds that matches its search string.
Here is the short version:
Public Sub ListingInfo()
Dim cell As Range
With ThisWorkbook.Worksheets("Sheet1")
For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
Dim Document As MSHTML.HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", cell.Value, False
.send
Set Document = New MSHTML.HTMLDocument
Document.body.innerHTML = .responseText
End With
cell.Offset(0, 1).Value = Document.getElementByID("itemTitle").innerText
cell.Offset(0, 2).Value = Document.getElementByID("prcIsum").innerText
If Not Document.querySelector(".viSNotesCnt") Is Nothing Then
cell.Offset(0, 3).Value = Document.querySelector(".viSNotesCnt").innerText
Else
'Try Something Else
End If
Next
End With
End Sub
A more elaborate solution would be to break the code up into smaller routines and load the data into an Array. The main advantage of this is that you can test each subroutine separately.
Option Explicit
Public Type tListingInfo
Description As String
Price As Currency
Title As String
End Type
Public Sub ListingInfo()
Dim source As Range
Dim data As Variant
With ThisWorkbook.Worksheets("Sheet1")
Set source = .Range("A1:D1", .Cells(.Rows.count, 1).End(xlUp))
data = source.Value
End With
Dim r As Long
Dim record As tListingInfo
Dim url As String
For r = 1 To UBound(data)
url = data(r, 1)
record = getListingInfo(url)
With record
data(r, 2) = .Description
data(r, 3) = .Price
data(r, 4) = .Title
End With
Next
source.Value = data
End Sub
Public Function getListingInfo(url As String) As tListingInfo
Dim ListingInfo As tListingInfo
Dim Document As MSHTML.HTMLDocument
Set Document = getHTMLDocument(url)
With ListingInfo
.Description = Document.querySelectorAll(".viSNotesCnt")(0).innerText
.Price = Split(Document.getElementByID("prcIsum").innerText)(1)
.Title = Document.getElementByID("itemTitle").innerText
Debug.Print .Description, .Price, .Title
End With
getListingInfo = ListingInfo
End Function
Public Function getHTMLDocument(url As String) As MSHTML.HTMLDocument
Const READYSTATE_COMPLETE As Long = 4
Dim Document As MSHTML.HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.send
If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
Set Document = New MSHTML.HTMLDocument
Document.body.innerHTML = .responseText
Set getHTMLDocument = Document
Else
MsgBox "URL: " & vbCrLf & "Ready state: " & .readyState & vbCrLf & "HTTP request status: " & .Status, vbInformation, "URL Not Responding"
End If
End With
End Function
There are a lot of things to fix in your code. It is late here, so for now I will just give pointers (and update fully later), plus working code below:
Declare all variables and use appropriate types
Review For loops, and how Transpose can be used to create a 1D array of URLs pulled from the sheet to loop over
Review the difference between the querySelector and querySelectorAll methods
Review CSS selectors (you are writing everything as a type selector, but you are not selecting the elements of interest by tag name, nor by your stated text)
Think about where you create your IE object and where you call .Navigate2, so that you reuse the existing object
Make sure to use distinct loop counters
Be sure not to overwrite values in sheet
Code:
Option Explicit
Public Sub ListingInfo()
Dim ie As New InternetExplorer, ws As Worksheet
Dim i As Long, urls(), rowCounter As Long
Dim title As Object, price As Object, description As Object
Set ws = ThisWorkbook.Worksheets("Sheet1")
urls = Application.Transpose(ws.Range("A1:A2").Value) '<= Adjust
With ie
.Visible = True
For i = LBound(urls) To UBound(urls)
If InStr(urls(i), "http") > 0 Then
rowCounter = rowCounter + 1
.Navigate2 urls(i)
While .Busy Or .readyState < 4: DoEvents: Wend
Set title = .document.querySelector(".it-ttl")
Set price = .document.querySelector("#prcIsum")
Set description = .document.querySelector("#viTabs_0_is")
ws.Cells(rowCounter, 3) = title.innerText
ws.Cells(rowCounter, 4) = price.innerText
ws.Cells(rowCounter, 5) = description.innerText
Set title = Nothing: Set price = Nothing: Set description = Nothing
End If
Next
.Quit
End With
End Sub
Here's an approach using Web Requests, using MSXML. It should be significantly faster than using IE, and I'd encourage you to strongly consider using this approach wherever possible.
You'll need references to Microsoft HTML Object Library and Microsoft XML v6.0 to get this working.
Option Explicit
Public Sub SubmitRequest()
Dim URLs As Excel.Range
Dim URL As Excel.Range
Dim LastRow As Long
Dim wb As Excel.Workbook: Set wb = ThisWorkbook
Dim ws As Excel.Worksheet: Set ws = wb.Worksheets(1)
Dim ListingDetail As Variant
Dim i As Long
Dim j As Long
Dim html As HTMLDocument
ReDim ListingDetail(0 To 2, 0 To 10000)
'Get URLs
With ws
LastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
Set URLs = .Range(.Cells(1, 1), .Cells(LastRow, 1))
End With
'Update the ListingDetail
For Each URL In URLs
Set html = getHTML(URL.Value2)
ListingDetail(0, i) = html.getElementByID("itemTitle").innertext 'Title
ListingDetail(1, i) = html.getElementByID("prcIsum").innertext 'Price
ListingDetail(2, i) = html.getElementsByClassName("viSNotesCnt")(0).innertext 'Seller Notes
i = i + 1
Next
'Resize array
ReDim Preserve ListingDetail(0 To 2, 0 To i - 1)
'Dump in Column T,U,V of existing sheet
ws.Range("T1:V" & i).Value = WorksheetFunction.Transpose(ListingDetail)
End Sub
Private Function getHTML(ByVal URL As String) As HTMLDocument
'Add a reference to Microsoft HTML Object Library
Set getHTML = New HTMLDocument
With New MSXML2.XMLHTTP60
.Open "GET", URL, False
.send
getHTML.body.innerHTML = .responseText
End With
End Function
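Reading .innerText directly stops with run-time error 91 whenever a listing lacks that element (for example, a listing without seller notes). A small helper, sketched here with an illustrative name, returns a fallback instead; it relies on getElementById and querySelector returning Nothing when no element matches.

```vba
' Returns an element's innerText, or a fallback when the element is missing.
Public Function TextOrDefault(ByVal el As Object, Optional ByVal fallback As String = "") As String
    If el Is Nothing Then
        TextOrDefault = fallback
    Else
        TextOrDefault = el.innerText
    End If
End Function
```

Usage: `ListingDetail(0, i) = TextOrDefault(html.getElementByID("itemTitle"), "N/A")`.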
I have VBA code that selects options from drop-down menus on a government website and then submits the query. The requested data then opens in another IE window. I am trying to copy this data into Excel, but I am unable to do so.
My code currently copies the text of the first IE page, the one that contains the drop-down menus. The government website is: http://www.osfi-bsif.gc.ca/Eng/wt-ow/Pages/FINDAT.aspx
I have looked all over the internet for a solution, but nothing seems to work...
Here is my code:
Sub GetOsfiFinancialData()
Dim UrlAddress As String
UrlAddress = "http://ws1.osfi-bsif.gc.ca/WebApps/FINDAT/DTIBanks.aspx?T=0&LANG=E"
Dim ie As Object
Set ie = CreateObject("internetexplorer.application")
With ie
.Silent = True
.Visible = False
.navigate UrlAddress
End With
Do Until Not ie.Busy And ie.readyState = 4
DoEvents
Loop
Application.Wait (Now() + TimeValue("00:00:05"))
'Select Bank
ie.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_institutionTypeCriteria_institutionsDropDownList").Value = "Z005"
'open window with financial data
Dim objButton
Set objButton = ie.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_submitButton")
objButton.Focus
objButton.Click
'select new pop-up window
marker = 0
Set objshell = CreateObject("Shell.Application")
IE_count = objshell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
my_title = objshell.Windows(x).document.Title
If my_title Like "Consolidated Monthly Balance Sheet" & "*" Then 'compare to find if the desired web page is already open
Set ie = objshell.Windows(x)
marker = 1
Exit For
Else
End If
Next
Do Until Not ie.Busy And ie.readyState = 4
DoEvents
Loop
Application.Wait (Now() + TimeValue("00:00:05"))
Dim doc As MSHTML.HTMLDocument
Dim tables As MSHTML.IHTMLElementCollection
Dim table As MSHTML.HTMLTable
Dim clipboard As MSForms.DataObject
Set doc = ie.document
Set tables = doc.getElementsByTagName("body")
Set table = tables(0)
Set clipboard = New MSForms.DataObject
'paste in sheets
Dim test
Set test = ActiveWorkbook.Sheets("Test")
clipboard.SetText table.outerHTML
clipboard.PutInClipboard
test.Range("A1").PasteSpecial xlPasteAll
clipboard.Clear
MsgBox ("Task Completed")
End Sub
Your help is greatly appreciated!
Your code was testing document.Title with a Like pattern while looping over the windows by index. I found that a For Each over all shell windows, matching the full title, worked in combination with copy-pasting the pop-up window's outerHTML. No additional wait time was required.
Inside the For Each loop, after you point the IE variable at the new window, you can get its URL with ie.document.URL. But since the data is already loaded, in my opinion you might as well copy and paste it straight away.
Code:
Option Explicit
Public Sub GetOsfiFinancialData()
Dim UrlAddress As String, objButton, ie As Object
UrlAddress = "http://ws1.osfi-bsif.gc.ca/WebApps/FINDAT/DTIBanks.aspx?T=0&LANG=E"
Set ie = CreateObject("internetexplorer.application")
With ie
.Silent = True
.Visible = False
.navigate UrlAddress
While .Busy Or .readyState < 4: DoEvents: Wend
.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_institutionTypeCriteria_institutionsDropDownList").Value = "Z005"
Set objButton = .document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_submitButton")
objButton.Focus
objButton.Click
Dim objShellWindows As New SHDocVw.ShellWindows, currentWindow As IWebBrowser2
For Each currentWindow In objShellWindows
If currentWindow.document.Title = "Consolidated Monthly Balance Sheet - Banks, Trust and Loan" Then
Set ie = currentWindow
Exit For
End If
Next
Dim clipboard As Object
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText ie.document.body.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
.Quit
End With
End Sub
References (VBE > Tools > References):
Microsoft Internet Controls
I don't have time to get into all the stuff about controlling one browser from another, but I think you can figure that part out, especially since you made some great progress on this already. Get URL#2 from URL#1, like you are doing, but with some better data controls around it, and then do this...
Option Explicit
Sub Web_Table_Option_One()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
With xml
.Open "GET", "http://ws1.osfi-bsif.gc.ca/WebApps/Temp/2f40b7ef-d024-4eca-a8a3-fb82153efafaFinancialData.aspx", False
.send
End With
result = xml.responseText
Set html = CreateObject("htmlfile")
html.body.innerHTML = result
Set objTable = html.getElementsByTagName("Table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End Sub
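The address in the .Open call contains what looks like a per-request GUID under /Temp/, so it will probably differ on every run. One way to get URL#2 from the pop-up is to read the window's address rather than its document. A sketch, assuming the pop-up's title starts with "Consolidated Monthly Balance Sheet" as in the question; FindWindowUrl is an illustrative name.

```vba
' Returns the address of the first open window whose title starts with titlePrefix,
' or "" if none matches. Requires VBE > Tools > References > Microsoft Internet Controls.
Public Function FindWindowUrl(ByVal titlePrefix As String) As String
    Dim shellWins As New SHDocVw.ShellWindows
    Dim win As Object
    For Each win In shellWins
        ' LocationName holds the page title for web pages
        ' (and the folder name for File Explorer windows)
        If win.LocationName Like titlePrefix & "*" Then
            FindWindowUrl = win.LocationURL
            Exit Function
        End If
    Next win
End Function
```

Usage, after the submit button's click: `url = FindWindowUrl("Consolidated Monthly Balance Sheet")`, then pass url to .Open in place of the hard-coded address. The pop-up may take a moment to appear, so you may need to retry until the result is non-empty.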