Get custom element via screenscraping with VBA - excel

I would like to screenscrape some prices from yahoo finance for some stocks in my excel sheet.
My approach is to use:
Function Scrape()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://de.finance.yahoo.com/quote/TSLA"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set data = appIE.document.getElementById("data-reactid") ' this is the point where I'm stuck
End Function
The question I have is how to get the custom elements such as:
<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="32">1.025,05</span>
The site seems to use a reactid for every element, which should make it easy to pinpoint elements. How would I go about doing that for the above example, data-reactid="32"?
Thanks

You can use an XMLHTTP request (XHR), because the content you are looking for is available in the page source. This is one efficient way to do it:
Public Sub GetPrice()
Const Url$ = "https://de.finance.yahoo.com/quote/TSLA"
Dim S$, itemPrice$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.send
S = .responseText
End With
With CreateObject("HTMLFile")
.write S
itemPrice = .getElementById("quote-market-notice").ParentNode.FirstChild.innerText
MsgBox itemPrice
End With
End Sub
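To target the data-reactid attribute from the question directly, a CSS attribute selector is one option. The following is a minimal sketch, not the answerer's method: it assumes a reference to the Microsoft HTML Object Library, the Sub name SelectByReactId is hypothetical, and the markup is copied from the question rather than fetched. Note that reactid values are generated by the page and may change between page versions, so the selector may need adjusting.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library".
' SelectByReactId is a hypothetical name used only for this sketch.
Public Sub SelectByReactId()
    Dim html As MSHTML.HTMLDocument
    Dim node As Object

    Set html = New MSHTML.HTMLDocument

    ' Sample markup copied from the question; a real run would assign
    ' the responseText of the XMLHTTP request here instead.
    html.body.innerHTML = _
        "<span class=""Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"" " & _
        "data-reactid=""32"">1.025,05</span>"

    ' Attribute selector: any span whose data-reactid equals 32.
    Set node = html.querySelector("span[data-reactid='32']")

    If node Is Nothing Then
        Debug.Print "No element with data-reactid=32 found"
    Else
        Debug.Print node.innerText
    End If
End Sub
```

The Nothing check matters here because the lookup fails silently when the attribute value has changed.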

Related

How can we use http.Open "GET" to list items from a table in HTML?

I'm testing an idea that I had. It seems like I should be able to scrape out various HTML elements from a table in a website, but my code can't seem to find the table, which definitely seems to be there.
Sub TryThis()
Dim oHtml As HTMLDocument
Dim oElement As Object
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", False
.send
oHtml.body.innerHTML = .responseText
End With
Set myitem = oHtml.getElementsByClassName("wikitable sortable jquery-tablesorter")
i = 0
For Each oElement In myitem
Sheets("Sheet1").Range("A" & i + 1) = myitem(i).innerText
i = i + 1
Next oElement
End Sub
Essentially, I would like to loop through the HTML items and print, in cells, the contents of the table with the class 'wikitable sortable jquery-tablesorter'.
You were really close. I think the issue is that the jquery-tablesorter class is added by jQuery (or a plugin) via JavaScript after the page has loaded. That class isn't present in the HTML returned by the web request, so removing it from the search criteria should fix the issue.
Here's what I came up with to address this, and also to move the table contents over a bit quicker. I only handled the first element with the wikitable sortable classes, but it should be possible to loop over each table too.
Sub TryThis()
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim htmlText As String
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", False
.send
oHtml.body.innerHTML = .responseText
End With
htmlText = oHtml.getElementsByClassName("wikitable sortable")(0).outerhtml
With CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'Clipboard
.SetText htmlText
.PutInClipboard
Sheets(1).Range("A1").Select
Sheets(1).PasteSpecial Format:="Unicode Text"
End With
End Sub
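As an alternative to the clipboard, the table can be walked row by row and cell by cell. This is a rough sketch under stated assumptions: it needs a reference to the Microsoft HTML Object Library, the Sub name DumpTableCells is hypothetical, and the small table below is stand-in markup rather than the real Wikipedia content.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library".
' DumpTableCells is a hypothetical name used only for this sketch.
Public Sub DumpTableCells()
    Dim html As MSHTML.HTMLDocument
    Dim tbl As Object
    Dim r As Long, c As Long

    Set html = New MSHTML.HTMLDocument

    ' Stand-in markup (made-up rows); in practice assign the
    ' responseText of the web request instead.
    html.body.innerHTML = _
        "<table class=""wikitable sortable"">" & _
        "<tr><th>Country</th><th>Population</th></tr>" & _
        "<tr><td>A</td><td>1</td></tr>" & _
        "</table>"

    ' Match only classes present in the served HTML,
    ' not ones added later by script.
    Set tbl = html.querySelector("table.wikitable.sortable")
    If tbl Is Nothing Then Exit Sub

    ' Walk every row and every cell in that row.
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            Debug.Print r, c, tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r
End Sub
```

Replacing the Debug.Print line with an assignment to Cells(r + 1, c + 1) would write the values to the active sheet instead.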

VBA HTML elements to Excel

I am working on a code that uses VBA-Excel to navigate to a website and copy some values to Excel.
I can open the website and navigate, but I can't save the "Precipitation" values to the Excel sheet.
Sub accuweather()
Dim ie As InternetExplorer
Dim pagePiece As Object
Dim webpage As HTMLDocument
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate ("http://www.accuweather.com/en/pt/abadia/869773/daily-weather-forecast/869773?day=2")
Do While ie.readyState = 4: DoEvents: Loop
Do Until ie.readyState = 4: DoEvents: Loop
While ie.Busy
DoEvents
Wend
Set webpage = ie.document
Set mtbl = webpage.getElementsByTagName("details-card card panel details allow-wrap")
Set table_data = mtbl.getElementsByTagName("div")(1)
For itemNum = 1 To 240
For childNum = 0 To 5
Cells(itemNum, childNum + 1) = table_data.Item(itemNum).Children(childNum).innerText
Next childNum
Next itemNum
ie.Quit
Set ie = Nothing
End Sub
The method you are using is getElementsByTagName, but the argument you pass is a multi-valued class name, so the correct method would be getElementsByClassName.
However, you don't need the browser: that content is static, so you can use a faster XMLHTTP request and target a single class, which is both more robust and quicker.
This
html.querySelectorAll(".list")
is retrieving the two parent nodes which have the various p tag children. The first child in both cases
.Item(i).FirstChild
is the precipitation info.
Option Explicit
Public Sub GetPrecipitationValues()
Dim html As MSHTML.HTMLDocument, i As Long
Set html = New MSHTML.HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.accuweather.com/en/pt/abadia/869773/daily-weather-forecast/869773?day=2", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
With html.querySelectorAll(".list")
For i = 0 To .Length - 1
Debug.Print .Item(i).FirstChild.innerText
Next
End With
End Sub

Scraping current date from website using Excel VBA

I need the current day's date. I do not want to hard-code it in a variable to make this work; instead, I would like the variable to hold it as a Date or as its default String.
Sub WEB()
Dim IE As Object
Dim allelements As Object
Application.ScreenUpdating = False
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "http://www.fechadehoy.com/venezuela"
Do Until IE.ReadyState = 4
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:01"))
IE.document.getElementById ("date")
IE.Visible = True
Set IE = Nothing
Application.ScreenUpdating = True
End Sub
The website is http://www.fechadehoy.com/venezuela
I only need the date of this page. I am not interested in any other element of the macro.
I just need to extract the current date and get it in a variable.
If you need "Lunes, 19 de agosto de 2019", then use getElementById with the id fecha:
Debug.Print IE.document.getElementById("fecha").innerHTML
Why use IE when XHR can do the trick? You can get the date in the blink of an eye if you opt for XMLHttpRequest.
Sub GetCurrentDate()
Dim S$
With New XMLHTTP
.Open "GET", "http://www.fechadehoy.com/venezuela", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
MsgBox .getElementById("fecha").innerText
End With
End Sub
References to add:
Microsoft XML, v6.0
Microsoft HTML Object Library
To avoid those references altogether, use late binding:
Sub GetCurrentDate()
Dim S$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://www.fechadehoy.com/venezuela", False
.send
S = .responseText
End With
With CreateObject("htmlfile")
.body.innerHTML = S
MsgBox .getElementById("fecha").innerText
End With
End Sub
Although the answer given by @Siddharth Rout is perfectly fine, it would require quite a bit of string manipulation to get the date into a usable form.
For the above reason I'm providing an alternative solution which gets the date in a directly usable format, ready to be manipulated and used in further calculations if necessary.
As a bonus I am demonstrating how to get the date using an HTTP request instead of using the Internet Explorer. This makes the code more efficient.
Option Explicit
Sub getDate()
Dim req As New WinHttpRequest
Dim doc As New HTMLDocument
Dim el As HTMLParaElement
Dim key As String
Dim url As String
Dim retrievedDate As Date
url = "http://www.fechadehoy.com/venezuela"
key = "Fecha actual: "
''''''''''Bonus: Use an HTTP request to get the date instead of opening IE'''''''''''
With req '
.Open "GET", url, False '
.send '
doc.body.innerHTML = .responseText '
'Debug.Print .responseText '
End With '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
For Each el In doc.getElementsByTagName("p")
If el.innerText Like "Fecha actual*" Then
retrievedDate = Mid(el.innerText, InStr(el.innerText, key) + Len(key), Len(el.innerText))
End If
Next el
End Sub
You will need to add references to Microsoft HTML Object Library and Microsoft WinHTTP Services, version 5.1. To do that, go to VB editor > Tools > References.
Having the date in this format means it can easily be manipulated, for example with functions like Day(retrievedDate), Month(retrievedDate) and Year(retrievedDate).
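Assigning the extracted text straight to a Date variable relies on VBA's implicit conversion, which may depend on the machine's regional settings. A more explicit route is sketched below; it is an illustration only. The function name ParseSpanishDate is hypothetical, and it assumes the text has the shape "lunes, 19 de agosto de 2019" shown on the page.

```vba
Option Explicit

' Hypothetical helper: converts text such as
' "lunes, 19 de agosto de 2019" to a Date without relying on locale.
Public Function ParseSpanishDate(ByVal rawText As String) As Date
    Dim months As Variant
    Dim parts() As String
    Dim datePart As String
    Dim i As Long, monthNumber As Long

    months = Array("enero", "febrero", "marzo", "abril", "mayo", "junio", _
                   "julio", "agosto", "septiembre", "octubre", _
                   "noviembre", "diciembre")

    ' Drop the weekday before the comma, if present.
    If InStr(rawText, ",") > 0 Then
        datePart = Trim$(Mid$(rawText, InStr(rawText, ",") + 1))
    Else
        datePart = Trim$(rawText)
    End If

    ' Assumed shape: "19 de agosto de 2019" -> 19, de, agosto, de, 2019
    parts = Split(datePart, " ")
    If UBound(parts) <> 4 Then
        Err.Raise vbObjectError + 1, , "Unexpected date format: " & rawText
    End If

    ' Look up the month name to get its number (1 to 12).
    For i = LBound(months) To UBound(months)
        If LCase$(parts(2)) = months(i) Then monthNumber = i + 1
    Next i
    If monthNumber = 0 Then
        Err.Raise vbObjectError + 2, , "Unknown month: " & parts(2)
    End If

    ParseSpanishDate = DateSerial(CLng(parts(4)), monthNumber, CLng(parts(0)))
End Function
```

For example, ParseSpanishDate("Lunes, 19 de agosto de 2019") returns a Date whose Year, Month and Day are 2019, 8 and 19.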

getElementsBy() extract text

I'm really new to VBA and I've been trying to get the value below the Column "Impuesto".
I'm getting error 438. I still don't quite understand how to refer to a certain part of the web page.
Sub extract()
Dim myIE As Object
Dim myIEDoc As Object
Dim element As IHTMLElement
Set myIE = CreateObject("InternetExplorer.Application")
myIE.Visible = False
myIE.navigate "https://zonasegura1.bn.com.pe/TipoCambio/"
While myIE.Busy
DoEvents
Wend
Set myIEDoc = myIE.document
Range("B1") = myIEDoc.getElementsByID("movimiento")(0).getElementsByTagName("span")
End Sub
You need getElementsByClassName(), not getElementsByID, since the word movimiento is a class name: <li class="movimiento bg"> Impuesto </li>
Range("B1") = myIEDoc.getElementsByClassName("movimiento")(0).getElementsByClassName("l2 valor")(0).innerText
Edit:
If you want to match on the tag name, as in <li>..</li>, use getElementsByTagName("li").
If the tag has an id, as in <li id="movimiento">..</li>, use getElementById("movimiento").
If the tag has a class, as in <li class="movimiento">..</li>, use getElementsByClassName("movimiento").
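The three lookups can be compared side by side on a small piece of markup. This is a minimal sketch under stated assumptions: it needs a reference to the Microsoft HTML Object Library, the Sub name CompareLookups is hypothetical, and the markup is invented for illustration, including the id "rate" and the placeholder value "0.00".

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library".
' CompareLookups is a hypothetical name used only for this sketch.
Public Sub CompareLookups()
    Dim html As MSHTML.HTMLDocument

    Set html = New MSHTML.HTMLDocument

    ' Invented markup: the id "rate" and the value "0.00" are placeholders.
    html.body.innerHTML = _
        "<ul><li id=""rate"" class=""movimiento bg"">Impuesto " & _
        "<span class=""l2 valor"">0.00</span></li></ul>"

    ' By tag name: returns a collection of every <li>.
    Debug.Print html.getElementsByTagName("li").Length

    ' By id: returns a single element (note the singular "Element").
    Debug.Print html.getElementById("rate").className

    ' By class: returns a collection, so index into it before going deeper.
    Debug.Print html.getElementsByClassName("movimiento")(0) _
                    .getElementsByClassName("l2 valor")(0).innerText
End Sub
```

The collection-returning methods need an index such as (0) before a property like innerText can be read; getElementById does not.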
Try the below script. It should fetch you the data you are after. When the execution is done, you should find the value in Range("A1") in your spreadsheet.
Sub Get_Quote()
Dim post As Object
With CreateObject("InternetExplorer.Application")
.Visible = True
.navigate "https://zonasegura1.bn.com.pe/TipoCambio/"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set post = .document.querySelector(".movimiento span.l2.valor")
[A1] = post.innerText
.Quit
End With
End Sub
It is faster to use XMLHTTP request as follows:
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, html As HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://zonasegura1.bn.com.pe/TipoCambio/", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
Set html = New HTMLDocument
With html
.body.innerHTML = sResponse
Debug.Print .querySelector(".movimiento .l2.valor").innerText
End With
End Sub

vba code to fetch data from website

I am new to this website and to VBA programming as well. I am stuck on a problem where I have to fetch data from this page. I need the hyperlink URL of the "Check Rates 10" button. Can anyone help me with this problem?
I have written the following code:
Sub GetData()
Dim IE As New InternetExplorer
IE.navigate "http://www.kieskeurig.nl/zoeken/index.html?q=4960999543345"
IE.Visible = False
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now() + TimeValue("00:00:016")) ' For internal page refresh or loading
Dim doc As HTMLDocument 'variable for document or data which need to be extracted out of webpage
Set doc = IE.document
Dim dd As Variant
dd = doc.getElementsByClassName("lgn")(0).outerHtml
'Range("a1").Value = dd
MsgBox dd
End Sub
With this I get the text of the button, but I want the URL of the link with that class. I think I am very close to the result but somehow can't reach the goal. Can anyone please help me?
Regards,
I think this is what you're looking for:
(Code modified slightly from Kyle's answer here)
Sub Test()
'Must have the Microsoft HTML Object Library reference enabled
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim link As String
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "http://www.kieskeurig.nl/zoeken/index.html?q=4960999543345", False
.Send
oHtml.Body.innerHTML = .responseText
End With
If InStr(1, oHtml.getElementsByClassName("lgn")(0).innerText, "Bekijk 10 prijzen") > 0 Then
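' The href of a document parsed this way presumably comes back prefixed with "about:"; Mid(..., 7) skips those 6 characters
link = Mid(oHtml.getElementsByClassName("lgn")(0).href, 7)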
Debug.Print "http://www.kieskeurig.nl" & link
End If
End Sub
This code prints the URL to the immediate window. Hope that helps!
This works for me...
Sub GetData()
Set IE = CreateObject("InternetExplorer.Application")
my_url = "http://www.kieskeurig.nl/zoeken/index.html?q=4960999543345"
With IE
.Visible = True
.navigate my_url
.Top = 50
.Left = 530
.Height = 400
.Width = 400
Do Until Not IE.Busy And IE.readyState = 4
DoEvents
Loop
End With
Application.Wait (Now() + TimeValue("00:00:016")) ' For internal page refresh or loading
Set Results = IE.document.getElementsByTagName("a")
For Each itm In Results
If itm.classname = "lgn" Then
dd = itm.getAttribute("href")
Exit For
End If
Next
' if you want to click the link
itm.Click
' otherwise
'Range("a1").Value = dd
MsgBox dd
End Sub
