Scraping current date from website using Excel VBA - excel

I need the current date from a website. I do not want to set it by hand in a variable to make this work; instead, I would like the scraped value stored in a variable, either as a Date or in its default String form.
Sub WEB()
    Dim IE As Object
    Dim allelements As Object
    Application.ScreenUpdating = False
    Set IE = CreateObject("InternetExplorer.Application")
    IE.navigate "http://www.fechadehoy.com/venezuela"
    Do Until IE.ReadyState = 4
        DoEvents
    Loop
    Application.Wait (Now + TimeValue("0:00:01"))
    IE.document.getElementById ("date")
    IE.Visible = True
    Set IE = Nothing
    Application.ScreenUpdating = True
End Sub
The website is http://www.fechadehoy.com/venezuela
I only need the date of this page. I am not interested in any other element of the macro.
I just need to extract the current date and get it in a variable.

If you need "Lunes, 19 de agosto de 2019", use getElementById with the id fecha:
Debug.Print IE.document.getElementById("fecha").innerHTML
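If the text is needed as a Date rather than a String, a small parser can do the conversion. The following is only a sketch: ParseSpanishLongDate is a hypothetical helper, and it assumes the text always has the layout "weekday, day de month de year" shown in the example above.

```vba
' Hypothetical helper: converts text such as "Lunes, 19 de agosto de 2019"
' into a Date. Assumes the layout "<weekday>, <day> de <month> de <year>".
Public Function ParseSpanishLongDate(ByVal dateText As String) As Date
    Dim monthNames As Variant
    Dim parts() As String
    Dim datePart As String
    Dim m As Long

    monthNames = Array("enero", "febrero", "marzo", "abril", "mayo", "junio", _
                       "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre")

    ' Drop the weekday name before the comma, if there is one.
    datePart = Trim$(dateText)
    If InStr(datePart, ",") > 0 Then
        datePart = Trim$(Mid$(datePart, InStr(datePart, ",") + 1))
    End If

    ' "19 de agosto de 2019" -> ("19", "agosto", "2019")
    parts = Split(LCase$(datePart), " de ")
    If UBound(parts) <> 2 Then
        Err.Raise vbObjectError + 1, , "Unexpected date format: " & dateText
    End If

    ' Look up the month name; the offset from LBound gives the month number.
    For m = LBound(monthNames) To UBound(monthNames)
        If monthNames(m) = Trim$(parts(1)) Then
            ParseSpanishLongDate = DateSerial(CLng(Val(parts(2))), _
                                              m - LBound(monthNames) + 1, _
                                              CLng(Val(parts(0))))
            Exit Function
        End If
    Next m

    Err.Raise vbObjectError + 2, , "Unknown month name: " & parts(1)
End Function
```

It could then be called as ParseSpanishLongDate(IE.document.getElementById("fecha").innerText). Looking the month up in a fixed list avoids depending on the regional settings of the machine running the macro.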

Why go for IE when XHR can do the trick? You can get the date in the blink of an eye if you opt for XMLHttpRequest.
Sub GetCurrentDate()
    Dim S$
    With New XMLHTTP
        .Open "GET", "http://www.fechadehoy.com/venezuela", False
        .send
        S = .responseText
    End With
    With New HTMLDocument
        .body.innerHTML = S
        MsgBox .getElementById("fecha").innerText
    End With
End Sub
References to add:
Microsoft XML, v6.0
Microsoft HTML Object Library
To get rid of those references altogether, use late binding:
Sub GetCurrentDate()
    Dim S$
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.fechadehoy.com/venezuela", False
        .send
        S = .responseText
    End With
    With CreateObject("htmlfile")
        .body.innerHTML = S
        MsgBox .getElementById("fecha").innerText
    End With
End Sub

Although the answer given by @Siddharth Rout is perfectly fine, it would require quite a bit of string manipulation to get the date into a usable form.
For that reason I'm providing an alternative solution that gets the date in a directly usable format, ready to be manipulated and used in further calculations if necessary.
As a bonus, I am demonstrating how to get the date using an HTTP request instead of Internet Explorer. This makes the code more efficient.
Option Explicit
Sub getDate()
    Dim req As New WinHttpRequest
    Dim doc As New HTMLDocument
    Dim el As HTMLParaElement
    Dim key As String
    Dim url As String
    Dim retrievedDate As Date
    url = "http://www.fechadehoy.com/venezuela"
    key = "Fecha actual: "
    ' Bonus: use an HTTP request to get the date instead of opening IE
    With req
        .Open "GET", url, False
        .send
        doc.body.innerHTML = .responseText
        'Debug.Print .responseText
    End With
    For Each el In doc.getElementsByTagName("p")
        If el.innerText Like "Fecha actual*" Then
            retrievedDate = Mid(el.innerText, InStr(el.innerText, key) + Len(key), Len(el.innerText))
        End If
    Next el
End Sub
You will need to add references to Microsoft HTML Object Library and Microsoft WinHTTP Services, version 5.1. To do that, go to VB editor > Tools > References.
Having the date in this format means it can easily be manipulated, for example with functions like Day(retrievedDate), Month(retrievedDate), Year(retrievedDate), etc.

Related

How can we use http.Open "GET" to list items from a table in HTML?

I'm testing an idea that I had. It seems like I should be able to scrape out various HTML elements from a table in a website, but my code can't seem to find the table, which definitely seems to be there.
Sub TryThis()
    Dim oHtml As HTMLDocument
    Dim oElement As Object
    Set oHtml = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", False
        .send
        oHtml.body.innerHTML = .responseText
    End With
    Set myitem = oHtml.getElementsByClassName("wikitable sortable jquery-tablesorter")
    i = 0
    For Each oElement In myitem
        Sheets("Sheet1").Range("A" & i + 1) = myitem(i).innerText
        i = i + 1
    Next oElement
End Sub
Essentially, I would like to loop through the HTML items and print out, in cells, what is in the table with the class 'wikitable sortable jquery-tablesorter'. Here is a screenshot that may help.
You were really close. I think the issue is that the jquery-tablesorter class is added by jQuery (or a plugin) via JavaScript after the page has loaded. That class isn't present in the HTML returned by the web request; it is only added afterwards. Removing it from the search criteria should fix the issue.
Here's what I came up with to address this and also move the table contents over a bit quicker. I only handled the first element with the wikitable sortable classes, but it should be possible to loop over each table too.
Sub TryThis()
    Dim oHtml As HTMLDocument
    Dim oElement As Object
    Dim htmlText As String
    Set oHtml = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", False
        .send
        oHtml.body.innerHTML = .responseText
    End With
    htmlText = oHtml.getElementsByClassName("wikitable sortable")(0).outerHTML
    With CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'Clipboard
        .SetText htmlText
        .PutInClipboard
        Sheets(1).Range("A1").Select
        Sheets(1).PasteSpecial Format:="Unicode Text"
    End With
End Sub

Select Item in a dropdown from website via Excel Macro

I would like to select the options "Addition, Bulk, Reduction" in a dropdown using Excel VBA.
This is what I have so far, but nothing is being selected.
    Dim ie As InternetExplorer
    Set ie = New InternetExplorer
    ie.Visible = True
    ie.navigate "my URL"
    Do While ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    'time_adjust_group_ident = Reduction
    Dim doc As HTMLDocument
    Set doc = ie.document
    doc.getElementById("time_adjust_group_ident").Value = "Reduction"
End Sub
You don't need the Internet Explorer object for this. Please take a look at the code below, where I use MSXML2.XMLHTTP to make an HTTP request, get the HTML response as a string, and then parse it using the HTMLFile object.
I'm using the CreateObject method instead of adding the references via Tools > References, so you can run this code anywhere without having to add references manually every time you open it on a different machine.
In this example, I'm retrieving the child elements of the language-selector dropdown on a given website and looping through them with a For Each to write each child element's content to a spreadsheet row.
Sub LoadHtml()
    Dim strUrl As String
    strUrl = "https://developer.mozilla.org/en-US/docs/Web/HTML/Element/select"
    Dim httpRequest As Object
    Set httpRequest = CreateObject("MSXML2.XMLHTTP")
    With httpRequest
        .Open "GET", strUrl, False
        .send
    End With
    Dim html As Object
    Set html = CreateObject("HTMLFile")
    html.body.innerHTML = httpRequest.ResponseText
    Dim child As Object
    Dim row As Integer
    row = 1
    For Each child In html.getElementById("language-selector").Children
        Range("A" & row) = child.innerText
        row = row + 1
    Next
End Sub
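If the goal is to actually pick an entry in a live page, as in the question, one possibility is to match on the visible option text rather than on Value, since assigning .Value only works when an option's value attribute is exactly that string. The following is a minimal sketch: SelectOptionByText is a hypothetical helper, and it assumes the element passed in is a select element exposing the usual Options collection.

```vba
' Hypothetical helper: selects the first <option> of a <select> element whose
' visible text matches optionText (case-insensitive).
' Returns True if a matching option was found and selected.
Public Function SelectOptionByText(ByVal selectElement As Object, _
                                   ByVal optionText As String) As Boolean
    Dim i As Long
    For i = 0 To selectElement.Options.Length - 1
        If StrComp(Trim$(selectElement.Options.Item(i).Text), optionText, vbTextCompare) = 0 Then
            selectElement.selectedIndex = i
            SelectOptionByText = True
            Exit Function
        End If
    Next i
End Function
```

With the question's code this might be called as SelectOptionByText(doc.getElementById("time_adjust_group_ident"), "Reduction"). Some pages only react once a change event has fired on the element, which this sketch does not attempt.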

Get custom element via screenscraping with VBA

I would like to screen-scrape prices from Yahoo Finance for some stocks in my Excel sheet.
My approach is to use:
Function Scrape()
    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")
    With appIE
        .Navigate "https://de.finance.yahoo.com/quote/TSLA"
        .Visible = True
    End With
    Do While appIE.Busy
        DoEvents
    Loop
    Set data = appIE.document.getElementById("data-reactid") 'this is the point where I'm stuck
End Function
The question I have is how to get the custom elements such as:
<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="32">1.025,05</span>
The site seems to use a reactid for every element, which should make it easy to pinpoint elements. How would I go about doing that for the example above with data-reactid="32"?
Thanks
You can try using XHR, because the content you are looking for is available in the page source. Here is one efficient way to go about it:
Public Sub GetPrice()
    Const Url$ = "https://de.finance.yahoo.com/quote/TSLA"
    Dim S$, itemPrice$
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .send
        S = .responseText
    End With
    With CreateObject("HTMLFile")
        .write S
        itemPrice = .getElementById("quote-market-notice").ParentNode.FirstChild.innerText
        MsgBox itemPrice
    End With
End Sub

Why does my code to Scrape Text using VBA work in Debug only

I have written some code to scrape specific dates from Google's patent website. After reviewing lots of examples I figured out the getElementsByClassName that gets the date I need. The code below works when I step through in debug mode and generates the desired MsgBox. But when I run it, it gives me "Run-time error '91': Object variable or With block variable not set."
I have added delays wherever I thought that might be an issue. I have also disassociated the code from any interaction with the Excel spreadsheet where I would ultimately put the date, just to make it as simple as possible. I've also copied the code from the original spreadsheet to a new blank one, but same issue.
Any help would be appreciated.
Sub Get_Date()
    Dim ie As InternetExplorer
    Dim sURL As String
    Dim strGrant As Variant
    Set ie = New InternetExplorer
    sURL = "https://patents.google.com/patent/US6816842B1/en?oq=6816842"
    ie.navigate sURL
    ie.Visible = False
    Do While ie.Busy Or ie.ReadyState < 4
        DoEvents
    Loop
    strGrant = ie.document.getElementsByClassName("granted style-scope application-timeline")(0).innerText
    Do While ie.Busy Or ie.ReadyState < 4
        DoEvents
    Loop
    MsgBox strGrant
    ie.Quit
End Sub
It's likely a timing issue, as per my comment. That's dealt with in other answers to similar questions. The main things to consider are:
Use proper page-load waits: While ie.Busy Or ie.readyState < 4: DoEvents: Wend
Possibly use a timed loop that repeatedly tries to set the element to a variable and then tests whether it was set.
Alternatively, and this is a bit of a punt, it seems that all granted dates are the same as the publication dates (patent publication date). If this is true, you can use XHR to get the publication date:
Option Explicit
Public Sub GetDates()
    Dim html As HTMLDocument, i As Long, patents()
    patents = Array("US7724240", "US6876312", "US8259073", "US7523862", "US6816842B1")
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(patents) To UBound(patents)
            .Open "GET", "https://patents.google.com/patent/" & patents(i) & "/en?oq=" & patents(i), False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .send
            html.body.innerHTML = .responseText
            If html.querySelectorAll("[itemprop=publicationDate]").length > 0 Then
                Debug.Print html.querySelector("[itemprop=publicationDate]").DateTime
            End If
        Next
    End With
End Sub
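The timed-loop idea mentioned earlier in this answer could look roughly like the sketch below. WaitForElementByClass is a hypothetical helper, not part of any library; it assumes the document passed in supports getElementsByClassName, and the 10-second default timeout is an arbitrary choice.

```vba
' Hypothetical helper: polls the document until an element with the given
' class name exists or the timeout expires. Returns Nothing on timeout.
Public Function WaitForElementByClass(ByVal doc As Object, _
                                      ByVal className As String, _
                                      Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim startTime As Double
    Dim elapsed As Double
    Dim matches As Object

    startTime = Timer
    Do
        Set matches = Nothing
        On Error Resume Next            ' the lookup may fail while the page is still loading
        Set matches = doc.getElementsByClassName(className)
        On Error GoTo 0

        If Not matches Is Nothing Then
            If matches.Length > 0 Then
                Set WaitForElementByClass = matches.Item(0)
                Exit Function
            End If
        End If

        DoEvents                        ' let the browser keep working
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
    Loop While elapsed < timeoutSeconds
End Function
```

In the question's code, the direct innerText read could then be replaced by Set el = WaitForElementByClass(ie.document, "granted style-scope application-timeline"), followed by a check such as If Not el Is Nothing Then MsgBox el.innerText. Testing for Nothing is what avoids run-time error 91 when the element has not been rendered yet.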

VBA post request with formdata (URL doesn't change)

I've been going through many similar questions, like this and this, but mine is much simpler. I want to change the date on a web form and get the data using a POST request.
I have this code, which makes a POST request:
Sub winpost()
    Dim WebClient As WinHttp.WinHttpRequest
    Set WebClient = New WinHttp.WinHttpRequest
    Dim searchResult As HTMLTextElement: Dim searchTxt As String
    Dim html As New HTMLDocument
    Dim Payload As String
    Payload = "ContentPlaceHolder1_ddlday=6"
    With WebClient
        .Open "POST", "http://pib.nic.in/AllRelease.aspx", False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send (Payload)
        .waitForResponse
    End With
    html.body.innerHTML = WebClient.responseText
    Set searchResult = html.querySelector(".search_box_result"): searchTxt = searchResult.innerText
    Debug.Print searchTxt
End Sub
The website is this. The page sends a POST request on change of any of the fields.
Looking at Chrome DevTools under Network > Form Data, I see this:
ctl00$ContentPlaceHolder1$ddlday: 8
I have tried various versions of this in the Payload string, but it always returns the same page (8th Jan).
Internet Explorer
With IE, the syntax differs slightly from Selenium Basic (shown at the bottom) because there is no SelectByText option. You can use indices or attribute=value CSS selectors, for example. Here the months are indices up to 12 instead of month names.
Option Explicit
Public Sub SetDates()
    Dim ie As New InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "http://pib.nic.in/AllRelease.aspx"
        While .Busy Or .readyState < 4: DoEvents: Wend
        With .Document
            .querySelector("#btnSave").Click
            .querySelector("#ContentPlaceHolder1_ddlMonth [value='2']").Selected = True
            .querySelector("#ContentPlaceHolder1_ddlYear [value='2018']").Selected = True
            .querySelector("#ContentPlaceHolder1_ddlday [value='2']").Selected = True
        End With
        Stop '<==delete me later
        .Quit
    End With
End Sub
Selenium Basic:
If you do go down the Selenium Basic VBA route, you can do something like the following. Note: after installing Selenium Basic, go to VBE > Tools > References and add a reference to the Selenium Type Library. You also need the latest Chrome and ChromeDriver; the ChromeDriver folder should be on the PATH environment variable, or chromedriver should be placed in the folder containing the Selenium executables.
Option Explicit
Public Sub SetDates()
    Dim d As WebDriver
    Set d = New ChromeDriver
    Const Url = "http://pib.nic.in/AllRelease.aspx"
    With d
        .Start "Chrome"
        .get Url
        .FindElementById("btnSave").Click
        'date values
        .FindElementById("ContentPlaceHolder1_ddlMonth").AsSelect.SelectByText "February"
        .FindElementById("ContentPlaceHolder1_ddlYear").AsSelect.SelectByText "2018"
        .FindElementById("ContentPlaceHolder1_ddlday").AsSelect.SelectByText "2"
        Stop 'delete me later
        .Quit
    End With
End Sub
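As for the POST attempt in the question: the field name DevTools shows is ctl00$ContentPlaceHolder1$ddlday, not the element id ContentPlaceHolder1_ddlday, and in a form-encoded body the $ characters need percent-encoding. The sketch below only covers building such name=value pairs. UrlEncodeAscii and FormField are hypothetical helpers that handle ASCII input only. It is an assumption, not something verified against this site, that the page, like ASP.NET WebForms pages in general, also expects hidden fields such as __VIEWSTATE and __EVENTVALIDATION from a prior GET to be posted back.

```vba
' Hypothetical helper: percent-encodes a string for an
' application/x-www-form-urlencoded body. Handles ASCII input only.
Public Function UrlEncodeAscii(ByVal s As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        Select Case AscW(ch)
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                result = result & ch                    ' 0-9 A-Z a-z - . _ ~ stay as-is
            Case 32
                result = result & "+"                   ' space becomes +
            Case Else
                result = result & "%" & Right$("0" & Hex$(AscW(ch)), 2)
        End Select
    Next i
    UrlEncodeAscii = result
End Function

' Hypothetical helper: builds one "name=value" pair with both sides encoded.
Public Function FormField(ByVal fieldName As String, ByVal fieldValue As String) As String
    FormField = UrlEncodeAscii(fieldName) & "=" & UrlEncodeAscii(fieldValue)
End Function
```

The payload could then be assembled as FormField("ctl00$ContentPlaceHolder1$ddlday", "6"), joined with "&" to any further fields the form requires. Whether the server accepts the request still depends on sending every field the browser sends, so comparing against the full Form Data list in DevTools is the safest check.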
