VBA Scraping Data From Multiple Websites - excel

I am trying to use VBA to scrape commodity/stock prices from multiple pages on investing.com and insert them into an Excel spreadsheet.
The following code works for a single price, in this example gold:
Sub Extractdatafromwebsite()
Dim ie As New InternetExplorer
Dim doc As HTMLDocument
ie.Visible = False
ie.navigate "http://uk.investing.com/commodities/gold"
Do
DoEvents
Loop Until ie.READYSTATE = READYSTATE_COMPLETE
Set doc = ie.document
output = doc.GetElementById("last_last").innerText
Range("A1").Value = output
ie.Quit
End Sub
However, I need prices from several pages in the same run.
I tried extending the working code to fetch both gold and silver, but the following attempt wrote the gold price to both A1 and A2:
Sub Extractdatafromwebsite()
Dim ie As New InternetExplorer
Dim doc As HTMLDocument
ie.Visible = False
ie.navigate "http://uk.investing.com/commodities/gold"
Do
DoEvents
Loop Until ie.READYSTATE = READYSTATE_COMPLETE
Set doc = ie.document
output = doc.GetElementById("last_last").innerText
Range("A1").Value = output
ie.Quit
ie.navigate "http://uk.investing.com/commodities/silver"
Set doc = ie.document
output = doc.GetElementById("last_last").innerText
Range("A2").Value = output
ie.Quit
End Sub
Could somebody please help me get this working for multiple pages? I have searched but have not found anything that suits my needs.
Also, is it possible to show a message such as "Waiting...." while the data is being collected?
Thanks

I've found that READYSTATE is not reliable: sometimes the document, or at least its object model, hasn't fully loaded even though the state reports complete.
So I typically add a Sleep call and DoEvents before trying to access the new doc object.
This should work for you (and as @Dave says, you don't need ie.Quit between pages; call it once at the end):
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub Extractdatafromwebsite()
Dim ie As New InternetExplorer
Dim doc As HTMLDocument
ie.Visible = False
ie.Navigate "http://uk.investing.com/commodities/gold"
Do
Sleep 500
DoEvents
Loop Until ie.ReadyState = 4 ' READYSTATE_COMPLETE
Sleep 500
Set doc = ie.Document
output = doc.GetElementById("last_last").innerText
Range("A1").Value = output
ie.Navigate "http://uk.investing.com/commodities/silver"
Do
Sleep 500
DoEvents
Loop Until ie.ReadyState = 4 ' READYSTATE_COMPLETE
Sleep 500
Set doc = ie.Document
output = doc.GetElementById("last_last").innerText
Range("A2").Value = output
ie.Quit
Set ie = Nothing
End Sub
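Since the same navigate, wait and read steps repeat for every page, they could also be moved into a loop over a list of URLs. The following is only a sketch under assumptions: the sub name ExtractPricesFromPages and the urls array are made up for illustration, it needs the same references as the code above (Microsoft Internet Controls and Microsoft HTML Object Library), and it assumes every page exposes its price under the id last_last. The status bar text is one simple way to show a "Waiting..." message while the data is collected.

```vba
Option Explicit

' Sleep comes from the Windows API. PtrSafe is needed on 64-bit Office
' and is accepted by any VBA7 host.
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub ExtractPricesFromPages()
    Dim ie As New InternetExplorer
    Dim doc As HTMLDocument
    Dim urls As Variant
    Dim i As Long
    Dim pageCount As Long

    ' Illustrative list: add one URL per price wanted
    urls = Array("http://uk.investing.com/commodities/gold", _
                 "http://uk.investing.com/commodities/silver")
    pageCount = UBound(urls) - LBound(urls) + 1

    ie.Visible = False
    For i = LBound(urls) To UBound(urls)
        ' Progress message in Excel's status bar
        Application.StatusBar = "Waiting... page " & _
            (i - LBound(urls) + 1) & " of " & pageCount

        ie.Navigate CStr(urls(i))
        Do
            Sleep 500
            DoEvents
        Loop Until ie.ReadyState = 4 ' READYSTATE_COMPLETE
        Sleep 500

        ' First price goes to A1, the next to A2, and so on
        Set doc = ie.Document
        Range("A1").Offset(i - LBound(urls), 0).Value = _
            doc.getElementById("last_last").innerText
    Next i

    ie.Quit
    Set ie = Nothing
    Application.StatusBar = False ' hand the status bar back to Excel
End Sub
```

Adding another commodity then only means adding its URL to the array. If this goes into the same module as the code above, keep just one Sleep declaration.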

Related

VBA DOM getElementsBy can't get childnodes

I'm trying to get the inner text of a label, but I'm getting an error. In the browser console, this script returns the inner text successfully:
document.getElementsByClassName("item alt")[0].childNodes[2].childNodes[0].innerText
The element I'm trying to get:
<tr class="item alt" data-id="1376936"><td class="toolbar left"><span class="ui-icon ui-icon-triangle-1-e"></span></td><td class="time">14:00</td><td class="status"><span class="status-1 rc">FT</span>
My VBA script :
Sub WebScraping()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "https://www.whoscored.com/Regions/74/Tournaments/22/Seasons/7814/Stages/17593/Fixtures/France-Ligue-1-2019-2020"
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to Whoscored ..."
DoEvents
Loop
Set doc = ie.document
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to Whoscored ..."
DoEvents
Loop
Set a = doc.getElementsByClassName("item alt")(0).ChildNodes(2).ChildNodes(0).innerText
MsgBox (a)
End Sub
The problem is in this line:
Set a = doc.getElementsByClassName("item alt")(0).ChildNodes(2).ChildNodes(0).innerText
Try using the getElementsByClassName method to find the child node instead. Please modify the code above as follows:
Dim a As String
a = doc.getElementsByClassName("item alt")(0).getElementsByClassName("status")(0).getElementsByClassName("status-1")(0).innerText
MsgBox (a)
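One difference between the original line and the corrected one is that the Set keyword is gone. In VBA, Set assigns object references, while a String such as innerText is assigned without it. Here is a minimal sketch of that rule, using a Collection as a stand-in object (the sub and variable names are illustrative only):

```vba
Sub SetVersusAssignment()
    Dim items As Collection   ' object variable: assigned with Set
    Dim label As String       ' plain value: assigned without Set

    Set items = New Collection
    items.Add "FT"

    ' items(1) is a String, so no Set here
    label = items(1)
    Debug.Print label
End Sub
```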
The first line in every module should be Option Explicit.
I'm not sure exactly what you want, but to show the wanted element use this:
Sub WebScraping()
Dim ie As InternetExplorer
Dim doc As HTMLDocument
Dim a As Object
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "https://www.whoscored.com/Regions/74/Tournaments/22/Seasons/7814/Stages/17593/Fixtures/France-Ligue-1-2019-2020"
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to Whoscored ..."
DoEvents
Loop
Application.StatusBar = False
Set doc = ie.document
Set a = doc.getElementsByClassName("item alt")(0).getElementsByClassName("status")(0)
MsgBox a.innerText
End Sub

VBA Internet Explorer - Object variables disappear

I am trying to learn web scraping with VBA and I am running into an issue at the most basic first step.
The problem (I think) is that after IEObject.Navigate is performed, all the variables of IEObject disappear. So I get the error
"Run-time error '462': The remote server machine does not exist or is unavailable"
on line 8. Any help is appreciated.
Code:
Sub VBAWeb()
Dim IEObject As InternetExplorer
Set IEObject = New InternetExplorer
IEObject.Visible = True
IEObject.Navigate URL:="https://google.com"
Do While IEObject.Busy = True Or IEObject.ReadyState <> READYSTATE_COMPLETE
Application.Wait Now + TimeValue("00:00:01")
Loop
Debug.Print IEObject.LocationURL
End Sub
I suggest you try running the code below, which may help fix the issue.
Sub demo()
Dim URL As String
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "https://www.microsoft.com"
IE.Navigate URL
Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
Debug.Print IE.LocationURL
Set IE = Nothing
End Sub
Note that the IE object is created differently here (late-bound with CreateObject), and the wait loop is also slightly different.

Extract html source code into excel using VBA

I am simply trying to paste the content or inner text into Excel using the getElementById function.
The content is actually the iframe link, which I am trying to extract and paste into a cell.
The photo shown is the html source code.
Sub GetData()
Dim ie As New SHDocVw.InternetExplorer
Dim htmldoc As MSHTML.HTMLDocument
Dim result As MSHTML.IHTMLElement
ie.Visible = True
ie.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5925865"
Do While ie.readyState <> READYSTATE_COMPLETE
Loop
Application.Wait (Now() + TimeValue("00:00:016")) ' For internal page refresh or loading
Set htmldoc = ie.document
Set Results = HTML.getElementById("bm_ann_detail_iframe")
Sheets("Sheet1").Range("a1").Value = Results.innerText
End Sub
You should use consistent variable naming in your code. If you put Option Explicit at the top of your code that will help.
You want to access the src attribute of the iframe to get the URL shown.
If you plan to use the new URL then you actually want the part before the "#". This means changing to:
ThisWorkbook.Worksheets("Sheet1").Range("A1").Value = Split(ie.document.getElementById("bm_ann_detail_iframe").src, "#")(0)
Code:
Option Explicit
Public Sub GetData()
Dim ie As New SHDocVw.InternetExplorer
ie.Visible = True
ie.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5925865"
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
ThisWorkbook.Worksheets("Sheet1").Range("A1").Value = ie.document.getElementById("bm_ann_detail_iframe").src
ie.Quit
End Sub
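To illustrate how the Split call mentioned above isolates the part before the "#", here is a small sketch. The URL and fragment are made up for the example and are not the real iframe src:

```vba
Sub SplitDemo()
    Dim src As String
    Dim baseUrl As String

    ' Hypothetical src value, for illustration only
    src = "http://www.example.com/doc.pdf#toolbar=0"

    ' Split returns a zero-based array; element 0 is everything
    ' before the first "#"
    baseUrl = Split(src, "#")(0)

    Debug.Print baseUrl ' prints http://www.example.com/doc.pdf
End Sub
```

If the string contains no "#", element 0 is simply the whole string, so the same line works either way.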

Excel VB Macro to scrape webpage. Can't code to click html button

I have a short excel macro that is designed to:
1) Open Internet Explorer and navigate to "http://www.puco.ohio.gov/pucogis/address/search.cfm"
2) Fill out a form on that site with data from the excel workbook
3) Click a button to submit the form
4) Scrape some innertext from the website and place it in a cell in the workbook
5) Close Internet Explorer
I cannot get step 3 to work. That is, I cannot get the click/submit function to work with this website. When the button is clicked, the website populates with information specific to the data entered in the form. Everything else in the code is working. I have searched for an answer and tried both the submit and the click approach with no luck.
Thanks for your help.
Code below:
Private Sub SiteData()
Dim ie As Object
Dim utility As Variant
Dim HTMLButton
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "http://www.puco.ohio.gov/pucogis/address/search.cfm"
ie.Visible = True
While ie.Busy
DoEvents
Wend
ie.Document.all("address").Value = ThisWorkbook.Sheets("Site Info").Range("D14")
While ie.Busy
DoEvents
Wend
Set HTMLButton = ie.Document.getElementsByTagName("input")(1)
HTMLButton.Click
While ie.Busy
DoEvents
Wend
Set utility = ie.Document.getElementById("supName")
ThisWorkbook.Sheets("Site Info").Range("D50") = utility.innerText
ie.Quit
Set ie = Nothing
End Sub
Try this solution, which I found from this answer to a similar question. That answer was not accepted, but I have tested this with your code and seems to be working.
Private Sub SiteData()
Dim ie As Object
Dim utility As Variant
Dim HTMLButton
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "http://www.puco.ohio.gov/pucogis/address/search.cfm"
ie.Visible = True
While ie.Busy
DoEvents
Wend
ie.Document.all("address").Value = ThisWorkbook.Sheets("Site Info").Range("D14")
While ie.Busy
DoEvents
Wend
Call ie.Document.parentWindow.execScript("codeAddress()")
While ie.Busy
DoEvents
Wend
Set utility = ie.Document.getElementById("supName")
ThisWorkbook.Sheets("Site Info").Range("D50") = utility.innerText
ie.Quit
Set ie = Nothing
End Sub
If you don't know or can't reasonably anticipate the function call codeAddress(), then you can try something like this to derive it from the button's onclick property:
Dim fn$
fn = HTMLButton.onclick
fn = Mid(fn, InStr(fn, "{"))
fn = Trim(Replace(Replace(Replace(fn, "{", vbNullString), "}", vbNullString), vbLf, vbNullString))
Call ie.Document.parentWindow.execScript(fn)
You can call the JavaScript directly. Try this; it should work.
Instead of:
Set HTMLButton = ie.Document.getElementsByTagName("input")(2)
HTMLButton.Click
use
ie.Document.parentWindow.execScript code:="codeAddress()"
Note that IE may prompt you to confirm on every run, so you may need to stop it from showing this message for smooth operation.
Private Sub CommandButton1_Click()
Dim ie As Object
Dim utility As Variant
Dim HTMLButton
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "http://www.puco.ohio.gov/pucogis/address/search.cfm"
ie.Visible = True
While ie.Busy
DoEvents
Wend
ie.Document.all("address").Value = ThisWorkbook.Sheets("Site Info").Range("D14")
While ie.Busy
DoEvents
Wend
ie.Document.parentWindow.execScript code:="codeAddress()"
'Set HTMLButton = ie.Document.getElementsByTagName("input")(2)
'HTMLButton.Click
While ie.Busy
DoEvents
Wend
Set utility = ie.Document.getElementById("supName")
ThisWorkbook.Sheets("Site Info").Range("D16") = utility.innerText
ie.Quit
Set ie = Nothing
End Sub
Thanks also to this article, which helped me solve your problem:
How to find and call javascript method from vba

Trying to extract ONE value from a webpage with VBA in Excel

I've been trying to find the information for a couple of days now, but all the examples I've found show only a small piece of the code, and I need it all =)
What I want to do is extract one value from a web page and put it into a cell in Excel
(and then take another value from another page on the same site and put it in the next cell, and so on).
The page is a Swedish stock-exchange page, and the page I've used as a test page is the stock for "Investor B" (https://www.avanza.se/aktier/om-aktien.html/5247/investor-b)
The value I'm interested in is the one called "Senast" (this is the page markup surrounding it):
<li>
<span class="XSText">Senast<br/></span>
<span class="lastPrice SText bold"><span class="pushBox roundCorners3" title="Senast uppdaterad: 17:29:59">248,60</span></span>
</li>
And it's the value 248,60 I'm after!
I have some coding experience, but not with VBA. After reading some forum posts (mostly here), I tried a few examples myself but couldn't get any of them to work.
Since I'm quite new to VBA, I might have got the structure wrong, so please be basic and patient with me. This was my test, but I got "Runtime error 429: ActiveX component can't create object".
I might be totally on the wrong track.
Private Sub CommandButton1_Click()
Dim ie As Variant
Set ie = CreateObject("InternetExplorer")
ie.navigate "https://www.avanza.se/aktier/om-aktien.html/5247/investor-b"
ie.Visible = True
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Application.Wait (Now() + TimeValue("00:00:016")) ' For internal page refresh or loading
Dim doc As Variant 'variable for document or data which need to be extracted out of webpage
Set doc = CreateObject("HTMLDocument")
Set doc = ie.document
Dim dd As Variant
dd = doc.getElementsByClassName("lastPrice SText bold")(0).innerText
MsgBox dd
End Sub
EDIT 2014-05-12 17:05: Current code being tested
under the button command
Private Sub CommandButton1_Click()
Dim IE As Object
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' You can uncomment the next line to see form results
IE.Visible = False
' Send the form data To URL As POST binary request
IE.Navigate "https://www.avanza.se/aktier/om-aktien.html/5247/investor-b"
' Statusbar
Application.StatusBar = "Loading, Please wait..."
' Wait while IE loading...
'Do While IE.Busy
' Application.Wait DateAdd("s", 1, Now)
'Loop
'this should go from ready-busy-ready
IEWait IE
Application.StatusBar = "Searching for value. Please wait..."
' Dim Document As HTMLDocument
' Set Document = IE.Document
Dim dd As Variant
dd = IE.Document.getElementsByClassName("lastPrice SText bold")(0).innerText
MsgBox dd
' Show IE
IE.Visible = True
' Clean up
Set IE = Nothing
Set objElement = Nothing
Set objCollection = Nothing
Application.StatusBar = ""
End Sub
And in module1
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Public Function IEWait(p_ieExp As InternetExplorer)
'this should go from ready-busy-ready
Dim initialReadyState As Integer
initialReadyState = p_ieExp.ReadyState
'wait 250 ms until it's done
Do While p_ieExp.Busy Or p_ieExp.ReadyState <> READYSTATE_COMPLETE
Sleep 250
Loop
End Function
As I said earlier, I do not know whether I got the structure right with this latest addition; I'm not too experienced in this kind of coding, I'm afraid.
Best Regards
Stop editing 2014-05-12 17:08
You are close but have a couple of small errors.
Here is how I would set it up (Tested):
Private Sub CommandButton1_Click()
Dim IE As Object
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' You can uncomment the next line to see form results
IE.Visible = False
' URL to get data from
IE.Navigate "https://www.avanza.se/aktier/om-aktien.html/5247/investor-b"
' Statusbar
Application.StatusBar = "Loading, Please wait..."
' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Application.StatusBar = "Searching for value. Please wait..."
Dim dd As String
dd = IE.Document.getElementsByClassName("lastPrice SText bold")(0).innerText
MsgBox dd
' Show IE
IE.Visible = True
' Clean up
Set IE = Nothing
Application.StatusBar = ""
End Sub
Results:
Tested in Excel 2010 with the following references:
Edit - Option B
To get rid of a possible "Run-Time Error '91'" try changing a few lines like this:
Dim dd As Variant
Set dd = IE.Document.getElementsByClassName("lastPrice SText bold")
MsgBox dd(0).textContent
Edit - Option C
Yet another way to get elements:
Dim tag
Dim tags As Object
Set tags = IE.Document.getElementsByTagName("*")
For Each tag In tags
If tag.className = "lastPrice SText bold" Then
MsgBox tag.innerText
Exit For
End If
Next tag
(All three methods have been tested on Excel 2010 and IE10)
I just wanted to add the code I'm currently running, which works fine at the moment, in case other people run into the same problem. It puts two values into dedicated cells.
Private Sub CommandButton10_Click()
Dim IE As Object
Dim dd As Variant
' Create InternetExplorer Object
Set IE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
IE.Visible = False
' URL to get data from
IE.Navigate "https://www.avanza.se/aktier/om-aktien.html/52476/alk-abell-b"
Application.StatusBar = "Loading, Please wait..."
IEWait IE
Application.StatusBar = "Searching for value. Please wait..."
dd = IE.Document.getElementsByClassName("lastPrice SText bold")(0).innerText
Range("Y2").Value = dd
IE.Navigate "https://www.avanza.se/aktier/om-aktien.html/52380/alm--brand"
Application.StatusBar = "Loading, Please wait..."
IEWait IE
Application.StatusBar = "Searching for value. Please wait..."
dd = IE.Document.getElementsByClassName("lastPrice SText bold")(0).innerText
Range("Y3").Value = dd
' Clean up
Set IE = Nothing
Set objElement = Nothing
Set objCollection = Nothing
Application.StatusBar = ""
End Sub
If you want more data, just copy the part starting with
IE.Navigate "https://www.pagewhereyourdatayouwanttoextractis.com"
and ending with
Range("Y2").Value = dd
Of course, this only works if the page you want to extract data from has a structure similar to the one above.
Hope this can help some people out there.
Best Regards

Resources