VBA: copy data from website into Excel

I have a VBA code that selects info from drop-down menus on a government website and then submits the query. The requested data then opens up in another IE page. I am trying to copy this data into excel; however, I am unable to do so.
My code currently copies the text on the first IE page that contains the drop-down menus. The government website is: http://www.osfi-bsif.gc.ca/Eng/wt-ow/Pages/FINDAT.aspx
I have looked all over the internet for a solution, but nothing seems to work.
Here is my code:
Sub GetOsfiFinancialData()
Dim UrlAddress As String
UrlAddress = "http://ws1.osfi-bsif.gc.ca/WebApps/FINDAT/DTIBanks.aspx?T=0&LANG=E"
Dim ie As Object
Set ie = CreateObject("internetexplorer.application")
With ie
.Silent = True
.Visible = False
.navigate UrlAddress
End With
Do Until Not ie.Busy And ie.readyState = 4
DoEvents
Loop
Application.Wait (Now() + TimeValue("00:00:05"))
'Select Bank
ie.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_institutionTypeCriteria_institutionsDropDownList").Value = Z005
'open window with financial data
Dim objButton
Set objButton = ie.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_submitButton")
objButton.Focus
objButton.Click
'select new pop-up window
marker = 0
Set objshell = CreateObject("Shell.Application")
IE_count = objshell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
my_title = objshell.Windows(x).document.Title
If my_title Like "Consolidated Monthly Balance Sheet" & "*" Then 'compare to find if the desired web page is already open
Set ie = objshell.Windows(x)
marker = 1
Exit For
Else
End If
Next
Do Until Not ie.Busy And ie.readyState = 4
DoEvents
Loop
Application.Wait (Now() + TimeValue("00:00:05"))
Dim doc As MSHTML.HTMLDocument
Dim tables As MSHTML.IHTMLElementCollection
Dim table As MSHTML.HTMLTable
Dim clipboard As MSForms.DataObject
Set doc = ie.document
Set tables = doc.getElementsByTagName("body")
Set table = tables(0)
Set clipboard = New MSForms.DataObject
'paste in sheets
Dim test
Set test = ActiveWorkbook.Sheets("Test")
clipboard.SetText table.outerHTML
clipboard.PutInClipboard
test.Range("A1").PasteSpecial xlPasteAll
clipboard.Clear
MsgBox ("Task Completed")
End Sub
Your help is greatly appreciated!

Your test against document.Title was on the right track. I found that a For Each loop over all open windows, matching on the full title, worked in combination with copying and pasting the pop-up window's outerHTML. No additional wait time was required.
Inside the For Each loop, after you point the ie variable at the new window, you can obtain the new URL with ie.document.url. As the data is already loaded at that point, in my opinion you might as well copy and paste it straight away.
Code:
Option Explicit
Public Sub GetOsfiFinancialData()
Dim UrlAddress As String, objButton, ie As Object
UrlAddress = "http://ws1.osfi-bsif.gc.ca/WebApps/FINDAT/DTIBanks.aspx?T=0&LANG=E"
Set ie = CreateObject("internetexplorer.application")
With ie
.Silent = True
.Visible = False
.navigate UrlAddress
While .Busy Or .readyState < 4: DoEvents: Wend
.document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_institutionTypeCriteria_institutionsDropDownList").Value = "Z005"
Set objButton = .document.getElementById("DTIWebPartManager_gwpDTIBankControl1_DTIBankControl1_submitButton")
objButton.Focus
objButton.Click
Dim objShellWindows As New SHDocVw.ShellWindows, currentWindow As IWebBrowser2
For Each currentWindow In objShellWindows
If currentWindow.document.Title = "Consolidated Monthly Balance Sheet - Banks, Trust and Loan" Then
Set ie = currentWindow
Exit For
End If
Next
Dim clipboard As Object
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText ie.document.body.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
.Quit
End With
End Sub
References (VBE > Tools > References):
Microsoft Internet Controls
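
If you want the window lookup to be a bit more defensive, here is a minimal late-bound sketch of the same idea. The names FindIeWindowByTitle and PrintPopupUrl and the prefix match are my own assumptions, not part of the code above. It skips shell windows that expose no HTML document (File Explorer windows show up in the same collection) and then prints the pop-up's address via document.URL.

```vba
Option Explicit

' Hypothetical helper: returns the first open shell window whose document
' title starts with titlePrefix, or Nothing when no window matches.
Public Function FindIeWindowByTitle(ByVal titlePrefix As String) As Object
    Dim shellApp As Object, candidate As Object
    Dim currentTitle As String

    If Len(titlePrefix) = 0 Then Exit Function

    Set shellApp = CreateObject("Shell.Application")
    For Each candidate In shellApp.Windows
        currentTitle = vbNullString
        ' Some entries have no HTML document, so ignore errors
        ' for this single property read only.
        On Error Resume Next
        currentTitle = candidate.document.Title
        On Error GoTo 0
        If Left$(currentTitle, Len(titlePrefix)) = titlePrefix Then
            Set FindIeWindowByTitle = candidate
            Exit Function
        End If
    Next candidate
End Function

' Usage: print the pop-up's URL if the window was found.
Public Sub PrintPopupUrl()
    Dim popup As Object
    Set popup = FindIeWindowByTitle("Consolidated Monthly Balance Sheet")
    If popup Is Nothing Then
        Debug.Print "Pop-up window not found"
    Else
        Debug.Print popup.document.URL
    End If
End Sub
```

Returning Nothing instead of raising an error lets the caller decide whether to retry, for example when the pop-up needs a moment to open.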

I don't have time to get into all the stuff about controlling one browser from another, but I think you can figure that part out, especially since you made some great progress on this already. Get URL#2 from URL#1, like you are doing, but with some better data controls around it, and then do this...
Option Explicit
Sub Web_Table_Option_One()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
With xml
.Open "GET", "http://ws1.osfi-bsif.gc.ca/WebApps/Temp/2f40b7ef-d024-4eca-a8a3-fb82153efafaFinancialData.aspx", False
.send
End With
result = xml.responseText
Set html = CreateObject("htmlfile")
html.body.innerHTML = result
Set objTable = html.getElementsByTagName("Table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End Sub

Related

Scrape CDC vaccination data using VBA

I am trying to scrape the vaccination data from the below CDC website:
https://covid.cdc.gov/covid-data-tracker/#vaccinations
I have tried querySelectorAll but no luck. Can anyone help take a look? Much appreciated!
Sub useClassnames()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "https://covid.cdc.gov/covid-data-tracker/#vaccinations"
.Visible = False
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowOfData = appIE.document.getElementById("maincontent")
Debug.Print allRowOfData.innerHTML
'Set element = appIE.document.querySelectorAll(".container mt-5")
'For Each elements In element
' Debug.Print elements
'Next elements
'For Each element In allRowOfData
'Debug.Print element
'Next element
End Sub
Here you go. Just change the worksheet name or number to match your workbook :)
Option Explicit
Const updatedCol = 1
Const dosesDistributedColVal = 2
Const peopleInicVaccColVal = 3
Sub useClassnames()
'declare worksheet variable and set headers
Dim targetWsh As Worksheet: Set targetWsh = ThisWorkbook.Sheets(1)
targetWsh.Cells(1, 1).Value = "Last Update"
targetWsh.Cells(1, 2).Value = "Doses Distributed"
targetWsh.Cells(1, 3).Value = "People Initiating Vaccination"
Dim lstRegisterRow As Long: lstRegisterRow = targetWsh.Range("A" & targetWsh.Rows.Count).End(xlUp).Row + 1
'open IE and navigate to site
Dim appIE As InternetExplorer: Set appIE = New InternetExplorer
appIE.navigate "https://covid.cdc.gov/covid-data-tracker/#vaccinations"
appIE.Visible = False
While appIE.Busy = True Or appIE.readyState < 4: DoEvents: Wend
Dim oHtmlDoc As HTMLDocument: Set oHtmlDoc = appIE.document
Dim oHtmlElementColl As IHTMLElementCollection
'Get and write last update date
Application.Wait (Now + TimeValue("0:00:02")) 'wait 2 secs to avoid an error; if you still receive an error, add seconds as needed
Set oHtmlElementColl = oHtmlDoc.getElementsByTagName("small")
targetWsh.Cells(lstRegisterRow, updatedCol) = oHtmlElementColl(0).innerHTML
'Get and write Doses Distributed and People Initiating Vaccination
Set oHtmlElementColl = oHtmlDoc.GetElementsByClassName("card-number")
targetWsh.Cells(lstRegisterRow, dosesDistributedColVal) = oHtmlElementColl(0).innerText
targetWsh.Cells(lstRegisterRow, peopleInicVaccColVal) = oHtmlElementColl(1).innerText
appIE.Quit
End Sub

How to upload values to a website and select a button using Excel VBA

I am trying to automate the sending of SMSes from a company website but I do not know how to upload the message, the cellphone number and select the button to send the message.
Sub smssend()
Dim appIE As Object
Dim e As Object
Dim wb As Workbook
Dim ws As Worksheet
Dim cellno As String
Dim mess As String
Dim strPattern As String: strPattern = "^((?:\+27|0[0-9]{9}"
Dim regEx As New RegExp
Dim linecount As Long
linecount = 2
Set wb = Application.Workbooks("SMSmacro")
Set ws = wb.Worksheets("Sheet1")
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "http://cadde.abgza.co.za/SMS/CreateSMS/CreateSms"
Do While appIE.busy
DoEvents
Application.Wait (Now + TimeValue("0:00:03"))
Loop
.Visible = True
End With
Do While appIE.busy
DoEvents
Application.Wait (Now + TimeValue("0:00:03"))
Loop
Do While ws.Cells(linecount, 1) <> ""
cellno = ws.Cells(linecount, 1)
mess = ws.Cells(linecount, 2)
a = Len(mess)
If Len(mess) > 160 Then
ws.Cells(linecount, 4).Value = "Message Too Long"
GoTo nxt
End If
With regEx
If regEx.Test(cellno) Then
With appIE
Set e = appIE.document.getElementById("cellNumber")
e = cellno
End With
Else
ws.Cells(linecount, 3).Value = "Incorrect Cell Number"
End If
End With
nxt:
linecount = linecount + 1
Loop
End Sub
I've rewritten your code. Never use GoTo, except for error handling.
The three variables nodeCellNo, nodeMess and nodeSubBut are not mandatory. I have introduced them to make the code more comprehensible for you. Without variables you can access a node directly this way:
appIE.document.getElementById("cellNumber").Value = cellNo
I can't tell from your HTML code whether the change event of the textarea tag needs to be triggered to recognize the text of the message.
Please read the comments in the code carefully:
Sub SendSMS()
'If you have constant values use constants in your code
Const url As String = "http://cadde.abgza.co.za/SMS/CreateSMS/CreateSms"
'Assumed intent: "+27" or "0" followed by nine digits. The pattern from
'the question had unbalanced parentheses, which RegExp rejects at run time
Const strPattern As String = "^(?:\+27|0)[0-9]{9}$"
'If you use late binding, no Excel reference is required
'Advantage: The code runs immediately on every computer
'Disadvantage: No IntelliSense is available during programming
Dim appIE As Object
Dim nodeCellNo As Object
Dim nodeMess As Object
Dim nodeSubBut As Object
Dim wb As Workbook
Dim ws As Worksheet
Dim cellNo As String
Dim mess As String
Dim regEx As Object 'Changed to late binding
Dim currentRow As Long
'Initialize variables
currentRow = 2
Set wb = Application.Workbooks("SMSmacro.xlsm") 'Full name needed
Set ws = wb.Worksheets("Sheet1")
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = strPattern 'Without this, Test() uses an empty pattern and matches everything
Set appIE = CreateObject("internetexplorer.application")
'Use the following line if you are in an intranet and
'the IE lost connection to the remote server
'Set appIE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
'Load page to IE
appIE.Visible = True
appIE.navigate url
Do While appIE.busy: DoEvents: Loop
'The following line is only needed if the web page loads dynamic
'content after IE has reported that it is no longer busy
'The length of the break can be adjusted to your needs
Application.Wait (Now + TimeValue("0:00:03"))
'If the page looks like the SMS sending page after sending,
'the loop can start here. If another page is loaded, one
'could navigate again within the loop to the SMS URL. For
'many SMS I would set the loop further up and restart IE
'for each SMS. Unfortunately, if you navigate a lot in one
'instance of IE, it becomes unstable.
Do While ws.Cells(currentRow, 1) <> ""
cellNo = ws.Cells(currentRow, 1).Value
mess = ws.Cells(currentRow, 2).Value
If Len(mess) > 160 Then
ws.Cells(currentRow, 4).Value = "Message Too Long"
Else
If regEx.Test(cellNo) Then
'Insert cellphone number to html form
Set nodeCellNo = appIE.document.getElementById("cellNumber")
nodeCellNo.Value = cellNo
'Insert message to html form
'The textarea tag has no value attribute
'Set its text with innerText, which places it between
'the opening and the closing textarea tag
Set nodeMess = appIE.document.getElementById("typedMessage")
nodeMess.innertext = mess
'Click submit button
Set nodeSubBut = appIE.document.getElementById("btnSend")
nodeSubBut.Click
'Wait to send the SMS
Application.Wait (Now + TimeValue("0:00:03"))
Else
ws.Cells(currentRow, 3).Value = "Incorrect Cell Number"
End If
End If
'Next SMS
currentRow = currentRow + 1
Loop
End Sub
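
If the page does turn out to need the change event mentioned above, something like the following might work. FireChangeEvent is a hypothetical helper name of mine, and createEvent/dispatchEvent are only available when the page renders in IE9 or later document mode. I could not test this against the SMS page, so treat it as a sketch.

```vba
' Hypothetical helper: raises a "change" event on the element with the
' given id so that page scripts listening for it can react.
' Returns False when the element does not exist.
Public Function FireChangeEvent(ByVal doc As Object, ByVal elementId As String) As Boolean
    Dim node As Object, evt As Object

    Set node = doc.getElementById(elementId)
    If node Is Nothing Then Exit Function

    ' Needs IE9+ document mode; older modes only offer node.FireEvent "onchange"
    Set evt = doc.createEvent("HTMLEvents")
    evt.initEvent "change", True, False   ' bubbles = True, cancelable = False
    node.dispatchEvent evt

    FireChangeEvent = True
End Function
```

In SendSMS it would be called right after the message text is set, for example FireChangeEvent appIE.document, "typedMessage".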

How to select value from drop down on web url?

I am trying to select data from a dropdown on a web page. All of my code works fine except that I am unable to select the value from the dropdown.
Sub pulldata2()
Dim tod As String, UnderLay As String
Dim IE As Object
Dim doc As HTMLDocument
'Html table
Dim Tbl As HTMLTable, Cel As HTMLTableCell, Rw As HTMLTableRow, Col As HTMLTableCol
Dim TrgRw As Long, TrgCol As Long
'Create new sheet
tod = ThisWorkbook.Sheets("URLList").Range("C2").Value
have = False
For Each sht In ThisWorkbook.Sheets
If sht.Name = tod Then
have = True
Exit For
End If
Next sht
If have = False Then
Sheets.Add After:=Sheets(Sheets.Count)
ActiveSheet.Name = tod
Else
If MsgBox("Sheet " & tod & " already exists Overwrite Data?", vbYesNo) = vbNo Then Exit Sub
End If
'Start Internetexplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=25APR2019"
Do While IE.Busy Or IE.readyState <> 4
Application.Wait DateAdd("s", 1, Now)
Loop
Set doc = IE.document
Dim ColOff As Long
'Put data to sheet and loop to next URL
For Nurl = 2 To 191
ColOff = (Nurl - 2) * 23
TrgRw = 1
UnderLay = ThisWorkbook.Sheets("URLList").Range("A" & Nurl).Value
doc.getElementById("underlyStock").Value = UnderLay
doc.parentWindow.execScript "goBtnClick('stock');", "javascript"
'now i want to select data from dropdown id=date, value= 27JUN2019
doc.querySelector("Select[name=date] option[value=27JUN2019]").Selected = True
Do While IE.Busy Or IE.readyState <> 4
Application.Wait DateAdd("s", 1, Now)
Loop
Set Tbl = doc.getElementById("octable")
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Value = UnderLay
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Font.Size = 20
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Font.Bold = True
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + 1).Select
TrgRw = TrgRw + 1
For Each Rw In Tbl.Rows
TrgCol = 1
For Each Cel In Rw.Cells
ThisWorkbook.Sheets(tod).Cells(TrgRw, ColOff + TrgCol).Value = Cel.innerText
TrgCol = TrgCol + Cel.colSpan ' if Column span is > 1 multiple
Next Cel
TrgRw = TrgRw + 1
Next Rw
TrgRw = TrgRw + 1
Next Nurl
'exit the internetexplorer
IE.Quit
Set IE = Nothing
End Sub
Why is my code not working? I am new to VBA; please help me find the error in my code.
Simply alter the URL rather than using the dropdown:
https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=27JUN2019
You can also use XHR to get the content rather than driving a slow browser. I use the clipboard to write out the table.
Option Explicit
Public Sub GetInfo()
Dim html As Object, hTable As Object, ws As Worksheet, clipboard As Object
Set html = New HTMLDocument
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=27JUN2019", False
.send
html.body.innerHTML = .responseText
Set hTable = html.getElementById("octable")
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ws.Range("A1").PasteSpecial
End With
End Sub
Alternative:
1) You could loop over the tr and td elements within hTable above to write out the table.
2) You could also use Power Query's From Web import (via the Data tab in Excel 2016+, or the free Power Query add-in for Excel 2013). Paste the URL into the top of the pop-up browser, press Go, then select the table to import.
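
Alternative 1 could look roughly like this. WriteTableToSheet is a hypothetical helper name of mine; it assumes hTable is the table element obtained in GetInfo and writes each cell's innerText starting at a top-left cell you pass in. It ignores colSpan, so merged header cells will not line up perfectly.

```vba
' Hypothetical helper: writes every row and cell of an HTML table to a
' worksheet, starting at the given top-left cell.
Public Sub WriteTableToSheet(ByVal hTable As Object, ByVal topLeft As Range)
    Dim r As Long, c As Long

    For r = 0 To hTable.Rows.Length - 1
        For c = 0 To hTable.Rows(r).Cells.Length - 1
            ' innerText strips the markup and keeps only the visible text
            topLeft.Offset(r, c).Value = hTable.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

Inside GetInfo you would call it in place of the three clipboard lines, for example WriteTableToSheet hTable, ws.Range("A1").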
Changing stocks:
Stocks are part of the url query string e.g. symbol=NIFTY , so you can concatenate the new symbol into the url during a loop
"https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=" & yourSymbolGoesHere & "&date=27JUN2019"
If you really want to use IE be sure to encase the value of the attribute within '' e.g. '27JUN2019'
Option Explicit
'VBE > Tools > References: Microsoft Internet Controls
Public Sub ClickButton()
Dim ie As InternetExplorer
Const URL As String = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?segmentLink=17&instrument=OPTIDX&symbol=NIFTY&date=25APR2019"
Set ie = New InternetExplorer
With ie
.Visible = True
.Navigate2 URL
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.querySelector("[value='27JUN2019']").Selected = True
Stop
End With
End With
End Sub

Web scraping in Investing.com with Excel vba

I have no knowledge of VBA; I have only used the macro recorder.
I need to download data from a web page into an Excel spreadsheet, which is beyond my current VBA knowledge.
Specifically, I want a macro that downloads a data table from this page into Excel: https://www.investing.com/equities/cellnex-telecom-historical-data
The download needs to be configurable in terms of time frame, date range and ordering.
The steps would be the following:
1.- The objective is to copy the data from the "CLNX historical data" table to an Excel spreadsheet.
2.- Before the download, "Monthly" should be selected in the drop-down menu called "Term".
3.- Before the download, the date range should be set to the last 2 years.
4.- Finally, the table should be sorted in descending order by the "Maximum" column.
5.- Once the term, the date range and the order are selected, copy the data from the "CLNX historical data" table to an Excel spreadsheet.
I have tried with the macro recorder but I am not able to configure the term, the date range or the ordering.
Could someone help me?
Thanks for your help.
The code:
Sub DataInvesting()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://www.investing.com/equities/cellnex-telecom-historical-data"
Do Until IE.readyState = 4
DoEvents
Loop
IE.Document.getElementsByClassName("newInput selectBox float_lang_base_1")(0).Value = "Monthly"
IE.Visible = True
Set IE = Nothing
Set appIE = Nothing
End Sub
I have just tested the following code and it works. Instead of creating an instance of Internet Explorer every time the macro runs, it uses XMLHTTP requests. Copy the entire code and paste it into a module in VBA. Don't forget to add references (Tools > References) to Microsoft HTML Object Library and Microsoft XML, v6.0.
Option Explicit
Sub Export_Table()
'Html Objects---------------------------------------'
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Dim ieTable As MSHTML.HTMLTable
Dim Element As MSHTML.IHTMLElement 'each item in the For Each loop is a single row element, not a collection
'Workbooks, Worksheets, Ranges, LastRow, Incrementers ----------------'
Dim wb As Workbook
Dim Table As Worksheet
Dim i As Long
Set wb = ThisWorkbook
Set Table = wb.Worksheets("Sheet1")
'-------------------------------------------'
Dim xmlHttpRequest As New MSXML2.XMLHTTP60 '
'-------------------------------------------'
i = 2
'Web Request --------------------------------------------------------------------------'
With xmlHttpRequest
.Open "POST", "https://www.investing.com/instruments/HistoricalDataAjax", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.send "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data&st_date=01%2F01%2F2017&end_date=03%2F01%2F2019&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
If .Status = 200 Then
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
htmlBody.innerHTML = xmlHttpRequest.responseText
Set ieTable = htmlDoc.getElementById("curr_table")
For Each Element In ieTable.getElementsByTagName("tr")
Table.Cells(i, 1) = Element.Children(0).innerText
Table.Cells(i, 2) = Element.Children(1).innerText
Table.Cells(i, 3) = Element.Children(2).innerText
Table.Cells(i, 4) = Element.Children(3).innerText
Table.Cells(i, 5) = Element.Children(4).innerText
Table.Cells(i, 6) = Element.Children(5).innerText
Table.Cells(i, 7) = Element.Children(6).innerText
i = i + 1
DoEvents
Next Element
End If
End With
Set xmlHttpRequest = Nothing
Set htmlDoc = Nothing
Set htmlBody = Nothing
Set ieTable = Nothing
Set Element = Nothing
End Sub
Public Function CreateHTMLDoc() As MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
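
To cover the "last 2 years" requirement without hard-coding the dates, the request body could be built from two dates. This is only a sketch: BuildHistoricalDataBody and EncodeDate are hypothetical helper names, the IDs are copied from the .send line above, and I am assuming the site expects mm/dd/yyyy dates.

```vba
' Hypothetical helper: builds the form-encoded POST body used above
' for an arbitrary date range.
Public Function BuildHistoricalDataBody(ByVal startDate As Date, ByVal endDate As Date) As String
    BuildHistoricalDataBody = "curr_id=951681&smlID=1695217&header=CLNX+Historical+Data" & _
        "&st_date=" & EncodeDate(startDate) & _
        "&end_date=" & EncodeDate(endDate) & _
        "&interval_sec=Monthly&sort_col=date&sort_ord=DESC&action=historical_data"
End Function

' Formats a date as mm/dd/yyyy (assumed format) and URL-encodes the slashes.
Private Function EncodeDate(ByVal d As Date) As String
    ' "\/" forces a literal slash regardless of the regional date separator
    EncodeDate = Replace(Format$(d, "mm\/dd\/yyyy"), "/", "%2F")
End Function
```

In Export_Table the .send line would then become .send BuildHistoricalDataBody(DateAdd("yyyy", -2, Date), Date).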
I can't test this because, despite my setting up a free account, the site keeps saying the password is wrong. I gave up after 5 password resets with the same problem, and I suspect it wants my social media details.
The following broadly outlines steps I would consider though some timed waits are most likely needed.
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub Info()
Dim ie As New InternetExplorer
Const URL As String = "https://www.investing.com/equities/cellnex-telecom-historical-data"
With ie
.Visible = True
.Navigate2 URL
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector(".login").Click
While .Busy Or .readyState < 4: DoEvents: Wend
.Navigate2 URL
While .Busy Or .readyState < 4: DoEvents: Wend
With .document.querySelector("#loginFormUser_email")
.Focus
.Value = "Bob@gmail.com"
End With
With .document.querySelector("#loginForm_password")
.Focus
.Value = "systemSucksDoesn'tAcceptMyPassword"
End With
Application.Wait Now + TimeSerial(0, 0, 2)
.document.querySelector("[onclick*=submitLogin]").Click
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("#data_interval").Click
.document.querySelector("[value=Monthly]").Click
With .document.querySelector("#picker")
.Focus
.Value = "03/08/2017 - 03/08/2019"
.FireEvent "onchange"
End With
'TODO Sorting column when clarified which column
.document.querySelector("[title='Download Data']").Click
Application.Wait Now + TimeSerial(0, 0, 10)
Stop
.Quit
End With
End Sub
Try this.
Sub Web_Table_Option()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
objIE.Navigate "https://www.investing.com/equities/cellnex-telecom-historical-data"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
'"curr_table" is an element id, not a tag name, so collect the tables by tag instead
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
objIE.Quit
End Sub

Scrape web content that was generated by JavaScript to Excel

I was scraping some web content using VBA and MSXML, so I know the basics. But now I would like to get data from web page which was generated by JavaScript.
I can't give you the exact link because it's private, but I can describe it. Basically, there is a div container with headings and some images, and below it are tables that load dynamically (with spinning circles) but do not update afterwards, so they load only once. If you open the source code view in the browser, you can't find these tables, only the container and the headings/src of the images. But if you click on the tables and choose "inspect element", you can see the typical structure of <th>, <tr>, <td> etc.
Methods I know:
1) Save page and then scrape it - probably not the best solution.
Is there any fast way to save all pages if I have a list of their URLs?
2) Use Internet Explorer controls via VBA, wait till page is loaded and then get elements as usual - but seems slow for me(?) - like 25s on one page, even if it's loaded for 0.5s.
Maybe I should turn off something that slows down loading?
Can you check what is wrong?
Here is code I found:
Sub FuturesScrap3(ByVal URL As String)
Dim HTMLDoc As New HTMLDocument
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim oElement As Object
Dim oIE As InternetExplorer
Set oIE = New InternetExplorer
oIE.navigate URL
oIE.Visible = True
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run
Application.Wait (Now + TimeValue("0:01:00"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 1).Value = tdElement.innerText
lRow = lRow + 1
Next
'Clicking the Month tab
For Each oElement In oIE.document.all
If Trim(oElement.innerText) = "Month" Then
oElement.Focus
oElement.Click
End If
Next oElement
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run
Application.Wait (Now + TimeValue("0:01:00"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 2).Value = tdElement.innerText
lRow = lRow + 1
Next tdElement
End Sub
3) Use web drivers like Selenium - couldn't find proper examples. If you give me some from scratch, like getting data from element by classname, it will be great.
4) Unknown for me, but possibly the fastest - getting data directly from JS variables/arrays which are used to build these tables. I heard you can connect VBA with JavaScript but haven't found any proper examples how to get data.
All solutions should be in VBA range. I would like to know what is the fastest way.
Thank you for your comments.
@Marc, no, it was not possible to get the data using web query/Power Query's "import from web", only the headings.
I edited code a bit - there was 1 minute(!) delay (maybe author made mistake when he added delay to load scripts on page).
Sub FuturesScrap3(ByVal URL As String)
Dim HTMLDoc As New HTMLDocument
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim oElement As Object
Dim oIE As InternetExplorer
Set oIE = New InternetExplorer
oIE.navigate URL
oIE.Visible = True
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
'Wait for Javascript to run - 1 second is enough in my case
Application.Wait (Now + TimeValue("0:00:01"))
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 1).Value = tdElement.innerText
lRow = lRow + 1
Next
'Clicking the Month tab
For Each oElement In oIE.document.all
If Trim(oElement.innerText) = "Month" Then
oElement.Focus
oElement.Click
End If
Next oElement
Do Until (oIE.readyState = 4 And Not oIE.Busy)
DoEvents
Loop
HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
With HTMLDoc.body
Set AnchorLinks = .getElementsByTagName("a")
Set tdElements = .getElementsByTagName("td") '
For Each AnchorLink In AnchorLinks
Debug.Print AnchorLink.innerText
Next AnchorLink
End With
lRow = 1
For Each tdElement In tdElements
Debug.Print tdElement.innerText
Cells(lRow, 2).Value = tdElement.innerText
lRow = lRow + 1
Next tdElement
End sub
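
Instead of guessing a fixed delay, one option is to poll until the generated elements actually exist. The sketch below is untested against my private page; WaitForTag is a hypothetical helper name, and it only checks that at least one element with the given tag is present, which may be too weak a condition if the page already contains such elements before the script runs.

```vba
' Hypothetical helper: polls the IE document until it contains at least one
' element with the given tag name, or until the timeout elapses.
' Returns True when an element was found in time.
Public Function WaitForTag(ByVal ie As Object, ByVal tagName As String, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    Dim found As Long

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        DoEvents
        found = 0
        ' The document may not be available yet, so ignore errors here
        On Error Resume Next
        found = ie.document.getElementsByTagName(tagName).Length
        On Error GoTo 0
        If found > 0 Then
            WaitForTag = True
            Exit Function
        End If
    Loop While Now < deadline
End Function
```

In FuturesScrap3 the Application.Wait line could then be replaced with something like If Not WaitForTag(oIE, "td", 10) Then Exit Sub.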
