VBA post request with formdata (URL doesn't change) - excel

I've been going through many similar questions, like this and this, but mine is much simpler. I want to change the date on a webform and get the data using a POST request.
I have this code which makes a POST request:
Sub winpost()
    Dim WebClient As WinHttp.WinHttpRequest
    Set WebClient = New WinHttp.WinHttpRequest
    Dim searchResult As HTMLTextElement: Dim searchTxt As String
    Dim html As New HTMLDocument
    Dim Payload As String

    Payload = "ContentPlaceHolder1_ddlday=6"
    With WebClient
        .Open "POST", "http://pib.nic.in/AllRelease.aspx", False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send (Payload)
        .waitForResponse
    End With
    html.body.innerHTML = WebClient.responseText
    Set searchResult = html.querySelector(".search_box_result"): searchTxt = searchResult.innerText
    Debug.Print searchTxt
End Sub
The website is this. The page sends a POST request on change of any of the fields.
Looking at Chrome DevTools under Network > Form Data I see this:
ctl00$ContentPlaceHolder1$ddlday: 8
I have tried various versions of this in the Payload string, but it always returns the same page (8th Jan).
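Worth noting: the page is an ASP.NET WebForms app, so a bare field=value payload is normally not enough. The server expects the whole form back, including the hidden __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION inputs scraped from a prior GET of the same page, and the control names must be URL-encoded ($ becomes %24). A minimal sketch, assuming the three hidden values have already been scraped and URL-encoded:

```vba
' Sketch only: viewState, viewStateGen and eventValidation are assumed to have
' been scraped from an initial GET of the page and URL-encoded already.
Sub PostDay(viewState As String, viewStateGen As String, eventValidation As String)
    Dim Payload As String
    ' "$" in the control names must be URL-encoded as "%24"
    Payload = "__EVENTTARGET=ctl00%24ContentPlaceHolder1%24ddlday" & _
              "&__VIEWSTATE=" & viewState & _
              "&__VIEWSTATEGENERATOR=" & viewStateGen & _
              "&__EVENTVALIDATION=" & eventValidation & _
              "&ctl00%24ContentPlaceHolder1%24ddlday=6"
    With New WinHttp.WinHttpRequest
        .Open "POST", "http://pib.nic.in/AllRelease.aspx", False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send Payload
        Debug.Print Len(.responseText) ' inspect whether the response changed
    End With
End Sub
```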

Internet Explorer
With IE the syntax is slightly different from selenium basic (shown at the bottom), as there is no SelectByText option. You can use indices or attribute = value CSS selectors, for example. Here months are indices up to 12 instead of month names.
Option Explicit
Public Sub SetDates()
    Dim ie As New InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "http://pib.nic.in/AllRelease.aspx"
        While .Busy Or .readyState < 4: DoEvents: Wend
        With .Document
            .querySelector("#btnSave").Click
            .querySelector("#ContentPlaceHolder1_ddlMonth [value='2']").Selected = True
            .querySelector("#ContentPlaceHolder1_ddlYear [value='2018']").Selected = True
            .querySelector("#ContentPlaceHolder1_ddlday [value='2']").Selected = True
        End With
        Stop '<==delete me later
        .Quit
    End With
End Sub
Selenium basic:
If you do go down the selenium basic VBA route you can do something like the following. Note: after installing selenium you would need to go VBE > Tools > References and add a reference to the Selenium Type Library. You would also need the latest Chrome and ChromeDriver, and the ChromeDriver folder should be placed on the environment PATH, or chromedriver placed in the folder containing the selenium executables.
Option Explicit
Public Sub SetDates()
    Dim d As WebDriver
    Set d = New ChromeDriver
    Const Url = "http://pib.nic.in/AllRelease.aspx"
    With d
        .Start "Chrome"
        .get Url
        .FindElementById("btnSave").Click
        'date values
        .FindElementById("ContentPlaceHolder1_ddlMonth").AsSelect.SelectByText "February"
        .FindElementById("ContentPlaceHolder1_ddlYear").AsSelect.SelectByText "2018"
        .FindElementById("ContentPlaceHolder1_ddlday").AsSelect.SelectByText "2"
        Stop 'delete me later
        .Quit
    End With
End Sub

Related

Data values of uploaded images

There are a large number of elements with the class "df-table" on that linked homepage. The first one could be entered into the Excel sheet, but not the second ("df-table"), the third ("df-table"), etc.
Various internet resources say this can be solved by using nth-of-type or by using XPath, but the error keeps occurring.
I want the data values of the uploaded images.
Public Sub Jaemu()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Dim d As WebDriver, ws As Worksheet, URL As String
    Set d = New ChromeDriver
    Set ws = ThisWorkbook.Worksheets("gemstone2")
    Dim http As New WinHttpRequest
    With d
        '.AddArgument "--headless"
        .Start "Chrome"
        Dim html As HTMLDocument
        Dim JsonObject As Object
        Set html = New HTMLDocument
        URL = "https://globalmonitor.einfomax.co.kr/infomax_ds.html#/USA/1/1"
        .get URL, Raise:=False ' set raise to false to avoid a timeout error
        .FindElementByCss("[ng-click='openStockSearchPopup();']").Click
        .FindElementByCss("[ng-enter='searchStockSearchPopup(true);']").SendKeys "GOOGL"
        .FindElementByCss("[ng-click='searchStockSearchPopup(true);']").Click
        .FindElementByCss("[class='slick-cell l1 r1 text-center clickable']").Click
        Cells(2, 1).Value = .FindElementByCss("[class='df-table']").Text
        Cells(3, 1).Value = .FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
    End With
End Sub
Original OP error:
Runtime error 32
Offending line:
Cells(3, 1).Value = d.FindElementByCss("[class='df-table' class:nth-of-type(2)]").Text
New error following the initial suggestion to use a different CSS selector:
Runtime error 7
Offending line:
Cells(3, 1).Value = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
Initial error (RTE32):
The :nth-of-type() pseudo-class selector would go outside of the attribute selector's closing ], i.e. "[class='df-table']:nth-of-type(2)"; however, this does not provide a match.
You can use:
.table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table
This returns a single match and is specific enough, without relying on a long selector list, that it is likely fairly stable (based on experience).
If you had wanted to use :nth-of-type() you could have used it on a parent element and taken the first match, but that is less stable and prone to breaking if the html changes:
.contents-area:nth-of-type(5) .df-table
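As an alternative to a positional pseudo-class, SeleniumBasic's FindElementsByCss returns the whole match collection, so you can index to the nth ".df-table" yourself - a sketch, assuming the collection is 1-based (verify against your version of the library):

```vba
' Sketch: take the second ".df-table" by indexing the full match collection.
' Assumes SeleniumBasic's WebElements collection is 1-based.
Dim tables As WebElements
Set tables = d.FindElementsByCss(".df-table")
If tables.Count >= 2 Then
    ActiveSheet.Cells(3, 1).Value = tables(2).Text
End If
```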
Follow-up error (RTE7):
The element needs to be scrolled into view.
A not ideal way (your mileage may vary) is:
d.ExecuteScript "window.scrollTo(0, document.body.scrollHeight/3);"
Application.Wait Now + TimeSerial(0, 0, 2)
ActiveSheet.Cells(1, 3) = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table").Text
There are better ways with built-in functions and custom JS, which you can likely pull from the examples.xlsm by the author on GitHub. As I can only test with Python, this was a quick and dirty test that worked.
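One of those better ways - sketched here on the assumption that this build of SeleniumBasic lets ExecuteScript take a WebElement argument (exposed to the script as arguments[0]; check the author's examples if not) - is to scroll the exact element into view instead of guessing a page fraction:

```vba
' Sketch: scroll the target table itself into view before reading it.
' Assumes ExecuteScript accepts a WebElement as a script argument.
Dim tbl As WebElement
Set tbl = d.FindElementByCss(".table-contents[ng-if='IS_RT_STATE_SUCCESS(requeststate.prospectData)'] > .df-table")
d.ExecuteScript "arguments[0].scrollIntoView(true);", tbl
ActiveSheet.Cells(3, 1).Value = tbl.Text
```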
You could avoid overhead of browser and use XHR to get a JSON response to then use a JSON parser with, as follows:
Option Explicit
Public Sub GetData()
    Dim s As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "https://globalmonitor.einfomax.co.kr/facset/getKeyData", False
        .SetRequestHeader "User-Agent", "Mozilla/5.0"
        .SetRequestHeader "content-type", "application/json;charset=UTF-8"
        .send "{""param"":""NAS:AAPL""}"
        s = .responseText
    End With
    Debug.Print s
End Sub

Scraping current date from website using Excel VBA

I need the date of the current day, and I would like to have it in a variable, either as a Date or, by default, as a String.
Sub WEB()
    Dim IE As Object
    Dim allelements As Object
    Application.ScreenUpdating = False
    Set IE = CreateObject("InternetExplorer.Application")
    IE.navigate "http://www.fechadehoy.com/venezuela"
    Do Until IE.ReadyState = 4
        DoEvents
    Loop
    Application.Wait (Now + TimeValue("0:00:01"))
    IE.document.getElementById ("date")
    IE.Visible = True
    Set IE = Nothing
    Application.ScreenUpdating = True
End Sub
The website is http://www.fechadehoy.com/venezuela
I only need the date of this page. I am not interested in any other element of the macro.
I just need to extract the current date and get it in a variable.
If you need Lunes, 19 de agosto de 2019 then use getElementById with the id fecha:
Debug.Print IE.document.getElementById("fecha").Innerhtml
Why go for IE when XHR can do the trick? You can get the date in the blink of an eye if you opt for XMLHttpRequest.
Sub GetCurrentDate()
    Dim S$
    With New XMLHTTP
        .Open "GET", "http://www.fechadehoy.com/venezuela", False
        .send
        S = .responseText
    End With
    With New HTMLDocument
        .body.innerHTML = S
        MsgBox .getElementById("fecha").innerText
    End With
End Sub
References to add:
Microsoft XML, v6.0
Microsoft HTML Object Library
To get rid of that reference altogether:
Sub GetCurrentDate()
    Dim S$
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.fechadehoy.com/venezuela", False
        .send
        S = .responseText
    End With
    With CreateObject("htmlfile")
        .body.innerHTML = S
        MsgBox .getElementById("fecha").innerText
    End With
End Sub
Although the answer given by @Siddharth Rout is perfectly fine, it requires quite a bit of string manipulation to get the date into a usable form.
For the above reason I'm providing an alternative solution which gets the date in a directly usable format, ready to be manipulated and used in further calculations if necessary.
As a bonus, I am demonstrating how to get the date using an HTTP request instead of Internet Explorer, which makes the code more efficient.
Option Explicit
Sub getDate()
    Dim req As New WinHttpRequest
    Dim doc As New HTMLDocument
    Dim el As HTMLParaElement
    Dim key As String
    Dim url As String
    Dim retrievedDate As Date

    url = "http://www.fechadehoy.com/venezuela"
    key = "Fecha actual: "

    'Bonus: use an HTTP request to get the date instead of opening IE
    With req
        .Open "GET", url, False
        .send
        doc.body.innerHTML = .responseText
        'Debug.Print .responseText
    End With

    For Each el In doc.getElementsByTagName("p")
        If el.innerText Like "Fecha actual*" Then
            retrievedDate = Mid(el.innerText, InStr(el.innerText, key) + Len(key), Len(el.innerText))
        End If
    Next el
End Sub
You will need to add references to Microsoft HTML Object Library and Microsoft WinHTTP Services, version 5.1. To do that, go to VB editor > Tools > References.
Having the date in this format means it can easily be manipulated, for example with functions like Day(retrievedDate), Month(retrievedDate), Year(retrievedDate), etc.
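One caveat worth flagging: the implicit String-to-Date coercion in the loop above relies on the machine's regional settings recognising Spanish month names. If your locale does not, a small locale-independent parser is an option - a sketch, assuming the scraped string keeps the "19 de agosto de 2019" shape (any weekday prefix such as "Lunes, " stripped first):

```vba
' Sketch: parse "19 de agosto de 2019" without relying on regional settings.
Function ParseSpanishDate(ByVal s As String) As Date
    Dim parts() As String, months As Variant, m As Long
    months = Array("enero", "febrero", "marzo", "abril", "mayo", "junio", _
                   "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre")
    parts = Split(LCase$(s), " de ")   ' -> day, month name, year
    For m = 0 To 11
        If parts(1) = months(m) Then Exit For
    Next m
    ParseSpanishDate = DateSerial(CLng(parts(2)), m + 1, CLng(parts(0)))
End Function
```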

Open IE, search value in a web page search bar, wait for next page to load

So currently my code opens a web page, then enters a product code from my spreadsheet into the search bar and navigates to the next page. From here the macro takes a picture from the web page and places it in my spreadsheet.
The problem is that when the second web page opens too slowly, I get an image from the first web page.
I have tried running a do while loop like the ones below, but it doesn't seem to work for the second web page.
What can I do so that the macro waits for the second website to load before scraping the picture?
With IE
    .Visible = False
    .navigate "https://www.genericwebsitename.com/"
    Do While .Busy Or .readyState <> 4: DoEvents: Loop
    Set Doc = IE.document
    IE.document.getElementsByName("searchterm")(0).Value = Sheets("sheet1").Range("c4").Value
    Doc.forms(0).submit
    Do While .Busy Or .readyState <> 4: DoEvents: Loop
End With
There is an id on the product page (associated with the product image) which is not present on the search page. You can use a timed loop looking for that.
I've re-organised the code a little and mostly used querySelector to apply CSS selectors to match on the desired elements. This returns a single match and is quicker and more efficient than returning an entire collection and indexing into it.
Option Explicit
Public Sub GetImageLink()
    Dim ie As Object, imageLink As String, t As Date
    Const MAX_WAIT_SEC As Long = 10
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .Navigate2 "https://www.talacooking.com/"
        Do While .Busy Or .readyState <> 4: DoEvents: Loop
        .document.querySelector("[name=searchterm]").Value = "10B10631" 'Sheets("sheet1").Range("c4").Value
        .document.querySelector("form").submit
        Do While .Busy Or .readyState <> 4: DoEvents: Loop
        Dim image As Object
        t = Timer
        Do
            On Error Resume Next
            Set image = .document.querySelector("#product-image img")
            On Error GoTo 0
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While image Is Nothing
        If Not image Is Nothing Then
            imageLink = image.src
            'download image?
        Else
            imageLink = "Not found"
            'Message to user?
        End If
        .Quit
    End With
End Sub
XHR with split on response string
You could sidestep the issue and do an XHR request - which is what the browser does. It is much faster, and there is no browser to open or timed loop required.
You pass the productId in the query string and get a JSON response. The right way to handle this is to use a JSON parser to extract the image URL from the response; there are less optimal ways, such as using Split. E.g.:
Option Explicit
Public Sub test()
    Dim http As Object, productId As String
    Set http = CreateObject("MSXML2.XMLHTTP")
    productId = "10B10631"
    Debug.Print GetImageUrl(http, productId)
End Sub

Public Function GetImageUrl(ByVal http As Object, ByVal productId As String) As String
    Dim s As String
    On Error GoTo errHand:
    With http
        .Open "GET", "https://www.talacooking.com/quicksearch?format=json&searchterm=" & productId, False
        .send
        s = .responseText
        GetImageUrl = Replace$(Split(Split(s, "src=\""")(1), Chr$(34))(0), "\/", "/")
    End With
    Exit Function
errHand:
    GetImageUrl = "Not found"
End Function
XHR with json parser:
The function re-written to use a JSON parser. Note that the item of interest in the JSON, JsonConverter.ParseJson(.responseText)("results")(1)("html"), is itself HTML. That HTML has to be passed to an HTML parser to then extract the src.
I use jsonconverter.bas. After adding the code from that link to a standard module called JsonConverter, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
Public Function GetImageUrl(ByVal http As Object, ByVal productId As String) As String
    Dim s As String, json As Object, html As HTMLDocument
    On Error GoTo errHand:
    Set html = New HTMLDocument
    With http
        .Open "GET", "https://www.talacooking.com/quicksearch?format=json&searchterm=" & productId, False
        .send
        html.body.innerHTML = JsonConverter.ParseJson(.responseText)("results")(1)("html")
        GetImageUrl = html.querySelector(".product-image").src
    End With
    Exit Function
errHand:
    GetImageUrl = "Not found"
End Function
References (VBE > Tools > References):
Microsoft HTML Object Library

Why does my code to scrape text using VBA work in debug mode only?

I have written some code to scrape specific dates from Google's patent website. After reviewing lots of examples I figured out the getElementsByClassName call that gets the date I need. The code below works when I step through it in debug mode and generates the desired MsgBox, but when I run it, it gives me "Run-time error '91': Object variable or With block variable not set."
I have added delays wherever I thought that might be an issue. I have also disassociated the code from any interaction with the Excel spreadsheet where I would ultimately put the date, just to make it as simple as possible. I've also copied the code from the original spreadsheet to a new blank one, but same issue.
Any help would be appreciated.
Sub Get_Date()
    Dim ie As InternetExplorer
    Dim sURL As String
    Dim strGrant As Variant
    Set ie = New InternetExplorer
    sURL = "https://patents.google.com/patent/US6816842B1/en?oq=6816842"
    ie.navigate sURL
    ie.Visible = False
    Do While ie.Busy Or ie.ReadyState < 4
        DoEvents
    Loop
    strGrant = ie.document.getElementsByClassName("granted style-scope application-timeline")(0).innerText
    Do While ie.Busy Or ie.ReadyState < 4
        DoEvents
    Loop
    MsgBox strGrant
    ie.Quit
End Sub
It's likely a timing issue, as per my comment; that's dealt with in other answers to similar questions. The main things to consider are:
Use proper page load waits: While ie.Busy Or ie.readyState < 4: DoEvents: Wend
Possibly a timed loop attempting to set the element to a variable, then testing whether it is set.
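The timed loop mentioned above might look like the following sketch (same class name as in the question; MAX_WAIT_SEC is an arbitrary cut-off):

```vba
' Sketch: keep trying to set the element until found or until the timeout.
Dim el As Object, t As Single
Const MAX_WAIT_SEC As Long = 10
t = Timer
Do
    On Error Resume Next
    Set el = ie.document.getElementsByClassName("granted style-scope application-timeline")(0)
    On Error GoTo 0
    If Timer - t > MAX_WAIT_SEC Then Exit Do
    DoEvents
Loop While el Is Nothing
If Not el Is Nothing Then MsgBox el.innerText
```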
Alternatively - a bit of a punt - it seems that all granted dates are the same as publication dates (patent publication date). If this is true, then you can use XHR to get the publication date:
Option Explicit
Public Sub GetDates()
    Dim html As HTMLDocument, i As Long, patents()
    patents = Array("US7724240", "US6876312", "US8259073", "US7523862", "US6816842B1")
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(patents) To UBound(patents)
            .Open "GET", "https://patents.google.com/patent/" & patents(i) & "/en?oq=" & patents(i), False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .send
            html.body.innerHTML = .responseText
            If html.querySelectorAll("[itemprop=publicationDate]").length > 0 Then
                Debug.Print html.querySelector("[itemprop=publicationDate]").DateTime
            End If
        Next
    End With
End Sub

Unable to get rid of blank screen popping up while scraping content

I've created a macro to parse the names of different movies, traversing multiple pages of a torrent site. I used InternetExplorer in combination with the Html.body.innerHTML parsing approach (usually used with xmlhttp requests) to make the execution faster. Although the content of this site is not dynamic, I used IE to see how it behaves when scraping in combination with Html.body.innerHTML.
When I run my script using IE, it parses the content as expected and finishes the job. The only problem I'm facing is a blank screen popping up out of nowhere on every pagination, like the image below.
How can I get rid of this blank screen issues?
My script (no issues when I use Html.body.innerHTML in combination with ChromeDriver):
Sub FetchContent()
    Const link = "https://yts.am/browse-movies/0/all/action/0/latest?page="
    Dim driver As New ChromeDriver, Html As New HTMLDocument, genre$
    Dim post As HTMLDivElement, I&, R&
    For I = 1 To 3
        With driver
            .AddArgument "--headless"
            .get link & I
            Html.body.innerHTML = .ExecuteScript("return document.documentElement.outerHTML")
        End With
        For Each post In Html.getElementsByClassName("browse-movie-bottom")
            R = R + 1: Sheets(1).Cells(R, 1) = post.getElementsByClassName("browse-movie-title")(0).innerText
        Next post
    Next I
    driver.Quit
End Sub
I would like to get the same behavior when I use Html.body.innerHTML in combination with IE, but it produces blank screens along with the desired result:
Sub FetchContent()
    Const link = "https://yts.am/browse-movies/0/all/action/0/latest?page="
    Dim IE As New InternetExplorer, Html As New HTMLDocument, genre$
    Dim post As HTMLDivElement, I&, R&
    For I = 1 To 4
        With IE
            .Visible = False
            .navigate link & I
            While .Busy Or .readyState < 4: DoEvents: Wend
            Html.body.innerHTML = .document.DocumentElement.outerHTML
        End With
        For Each post In Html.getElementsByClassName("browse-movie-bottom")
            R = R + 1: Sheets(1).Cells(R, 1) = post.getElementsByClassName("browse-movie-title")(0).innerText
        Next post
    Next I
    IE.Quit
End Sub
Blank screen that appears on every pagination when IE is combined with Html.body.innerHTML:
I tried to test your code with IE and found that I am able to reproduce the issue.
The line below is causing it:
Html.body.innerHTML = .document.DocumentElement.outerHTML
Below is a modified version of your code in which the issue is resolved.
Sub FetchContent1()
    Const link = "https://yts.am/browse-movies/0/all/action/0/latest?page="
    Dim IE As New InternetExplorer, Html As New HTMLDocument, genre$
    Dim post As HTMLDivElement, I&, R&
    For I = 1 To 4
        With IE
            .Visible = False
            .navigate link & I
            While .Busy Or .readyState < 4: DoEvents: Wend
            For Each post In .document.getElementsByClassName("browse-movie-bottom")
                R = R + 1: Sheets(1).Cells(R, 1) = post.getElementsByClassName("browse-movie-title")(0).innerText
            Next post
        End With
    Next I
    IE.Quit
End Sub
Output in IE 11:
Now it will not open any blank page in IE.