Print inner text under "id" - excel

Any one of the three highlighted parts is the value I want to print. I am trying the code below:
Sub JJ()
Dim IE As New SHDocVw.InternetExplorer
Dim hdoc As MSHTML.HTMLDocument
Dim ha As String
IE.Visible = True
IE.navigate "https://www.nseindia.com/get-quotes/equity?symbol=DIVISLAB"
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set hdoc = IE.document
ha = hdoc.getElementById("preOpenFp").innerText
Debug.Print ha
End Sub
But the output is empty. Please help.

The website you're trying to scrape offers a very convenient way to do it: send an HTTP request and read the JSON response it returns.
If you look at the network traffic in your browser's developer tools, you'll see the requests sent to the server while the page loads. Among them is a GET request to https://www.nseindia.com/api/quote-equity?symbol=DIVISLAB.
To send this request and get the info you need, you can do the following:
Option Explicit
Sub nse()
Dim req As New MSXML2.XMLHTTP60
Dim url As String
Dim json As Object
url = "https://www.nseindia.com/api/quote-equity?symbol=DIVISLAB"
With req
.Open "GET", url, False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
Debug.Print json("preOpenMarket")("IEP")
End Sub
This will print the value of IEP to your immediate window (in this case 2390). You can modify the code to best fit your needs.
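To parse the JSON string you will need to add a JSON parser module to your project; the code above calls JsonConverter.ParseJson, which is the entry point of the VBA-JSON library. Follow that library's installation instructions and you should be set to go.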
You will also need to add the following references to your project (VBE>Tools>References):
Microsoft XML, v6.0
Microsoft Scripting Runtime

Related

Can MSXML2.XMLHTTP be used with Chrome

I have been using the following Excel VBA macro to bring back data from a website. It worked fine until a few days ago when the website stopped supporting IE. Of course the macro just fails now as there is no data on the webpage to bring back to Excel, just a message saying, "Your browser, Internet Explorer, is no longer supported." Is there a way to have the "Get method" (MSXML2.XMLHTTP) use Chrome instead of IE to interact with the website? BTW, my default browser is already set to "Chrome".
Dim html_doc As HTMLDocument ' note: reference to Microsoft HTML Object Library must be set
Sub KS()
' Define product url
KS_url = "https://www.kingsoopers.com/p/r-w-knudsen-just-blueberry-juice/0007468210784"
' Collect data
Set html_doc = New HTMLDocument
Set xml_obj = CreateObject("MSXML2.XMLHTTP")
xml_obj.Open "GET", KS_url, False
xml_obj.send
html_doc.body.innerHTML = xml_obj.responseText
Set xml_obj = Nothing
KS_product = html_doc.getElementsByClassName("ProductDetails-header")(0).innerText
KS_price = "$" & html_doc.getElementsByClassName("kds-Price kds-Price--alternate mb-8")(1).Value
' do stuff with KS_product and KS_price
End Sub
The check for this is a basic server-side check on the user agent. Tell the server what it wants to "hear" by passing a supported browser in the User-Agent header (or technically, in this case, just say the equivalent of: "Hi, I am not Internet Explorer").
It can be as simple as xml.setRequestHeader "User-Agent", "Chrome". I say basic because you could even pass xml.setRequestHeader "User-Agent", "I am a unicorn", so the server is likely using an exclusion list that targets Internet Explorer.
Option Explicit
Public Sub KS()
Dim url As String
url = "https://www.kingsoopers.com/p/r-w-knudsen-just-blueberry-juice/0007468210784"
Dim html As MSHTML.HTMLDocument, xml As Object
Set html = New MSHTML.HTMLDocument
Set xml = CreateObject("MSXML2.XMLHTTP")
xml.Open "GET", url, False
xml.setRequestHeader "User-Agent", "Mozilla/5.0"
xml.send
html.body.innerHTML = xml.responseText
Debug.Print html.getElementsByClassName("ProductDetails-header")(0).innerText
Debug.Print "$" & html.getElementsByClassName("kds-Price kds-Price--alternate mb-8")(1).Value
Stop
End Sub
Compare that with adding no UA or adding xml.setRequestHeader "User-Agent", "MSIE".
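The comparison suggested above can be sketched as a small loop. This is only an illustration under a few assumptions: the helper name HasUnsupportedNotice and the constant PAGE_URL are made up for the example, the text it searches for is the message quoted in the question, and the If-Modified-Since header is a commonly used way to discourage MSXML2.XMLHTTP from returning a cached reply, which may or may not matter for this site.

```vba
Option Explicit

' Hypothetical helper: True when the body contains the message quoted
' in the question ("... is no longer supported.").
Public Function HasUnsupportedNotice(ByVal body As String) As Boolean
    HasUnsupportedNotice = InStr(1, body, "no longer supported", vbTextCompare) > 0
End Function

' Sends the same GET request with several User-Agent values and prints
' the status code, the body length, and whether the notice was found.
Public Sub CompareUserAgents()
    Const PAGE_URL As String = "https://www.kingsoopers.com/p/r-w-knudsen-just-blueberry-juice/0007468210784"
    Dim agents As Variant, agent As Variant
    Dim xml As Object

    ' An empty string means "do not set the header at all".
    agents = Array("", "MSIE", "Mozilla/5.0")

    For Each agent In agents
        Set xml = CreateObject("MSXML2.XMLHTTP")
        xml.Open "GET", PAGE_URL, False
        ' Assumption: discourages a cached reply between iterations.
        xml.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        If Len(agent) > 0 Then xml.setRequestHeader "User-Agent", CStr(agent)
        xml.send

        Debug.Print "UA=[" & agent & "]", xml.Status, Len(xml.responseText), _
                    HasUnsupportedNotice(xml.responseText)
    Next agent
End Sub
```

Run CompareUserAgents and read the Immediate window; what each user agent actually returns depends on the server at the time you run it.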
Study the article by Daniel Pineault, in particular the paragraph on Feature Browser Emulation.
Also note my comment dated 2020-09-13.

Excel vba getElementsByClassName

I am trying to scrape IPO date from crunchbase.
Unfortunately I get Runtime Error 1004 “Application-defined or Object-defined error”.
My goal is to save IPO date in A1 cell.
Sub GetIE()
Dim IE As Object
Dim URL As String
Dim myValue As IHTMLElement
URL = "https://www.crunchbase.com/organization/verastem"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate URL
Do While IE.Busy Or IE.ReadyState <> 4
DoEvents
Loop
Set myValue = IE.Document.getElementsByClassName("post_glass post_micro_glass")(0)
Range("A1").Value = myValue
Set IE = Nothing
End Sub
I can't find that class name in the HTML for that URL. You can use the CSS selector I show below; the element it targets can be retrieved with an XMLHTTP request, which avoids opening a browser.
Option Explicit
Public Sub GetDate()
Dim html As HTMLDocument
Set html = New HTMLDocument '< VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.crunchbase.com/organization/verastem#section-overview", False
.send
html.body.innerHTML = .responseText
End With
ActiveSheet.Range("A1") = html.querySelectorAll(".field-type-date.ng-star-inserted").item(1).innerText
End Sub
If you don't want to use compound classes then you can also use:
ActiveSheet.Range("A1") = html.querySelectorAll("#section-ipo-stock-price .field-type-date").item(1).innerText
The relevant HTML is shown below. Note that the element has multiple (compound) classes:
<span class="component--field-formatter field-type-date ng-star-inserted" title="Jan 27, 2012">Jan 27, 2012</span>
There are 3 classes: component--field-formatter, field-type-date and ng-star-inserted. I use two of these in combination in the first solution I give. Multiple classes are popular nowadays because of the versatility they give in page styling, e.g. they allow styles to be overridden easily. You can read about CSS specificity* to understand this better.
More classes may mean the code is a little less robust, as the ordering of classes may be changed and one or more classes may be removed. This was raised by @SIM in a comment on an answer to another web-scraping question. Thus, I offer one solution with two of the classes used, and another solution with only one of the classes used.
Whilst you do get the same date for this page with simply:
ActiveSheet.Range("A1") = html.querySelector("#section-ipo-stock-price .field-type-date").innerText
I wouldn't want to assume that would always hold true as it grabs the date from the line where it says "Their stock opened".
* https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity
References:
querySelectorAll
css selectors
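The robustness point about compound classes can be tried offline. The sketch below is illustrative only: CountMatches is a made-up helper, and the markup strings are modelled on the span shown in the answer rather than fetched from the site. It needs a reference to Microsoft HTML Object Library.

```vba
Option Explicit

' Hypothetical helper: loads markup into an in-memory document and
' returns how many elements match the CSS selector.
Public Function CountMatches(ByVal markup As String, ByVal selector As String) As Long
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = markup
    CountMatches = html.querySelectorAll(selector).Length
End Function

Public Sub SelectorDemo()
    ' Illustrative markup: the second string drops the ng-star-inserted class.
    Const FULL_MARKUP As String = "<div id='section-ipo-stock-price'>" & _
        "<span class='component--field-formatter field-type-date ng-star-inserted'>Jan 27, 2012</span></div>"
    Const REDUCED_MARKUP As String = "<div id='section-ipo-stock-price'>" & _
        "<span class='component--field-formatter field-type-date'>Jan 27, 2012</span></div>"

    ' Two-class (compound) selector versus single-class selector.
    Debug.Print CountMatches(FULL_MARKUP, ".field-type-date.ng-star-inserted")
    Debug.Print CountMatches(REDUCED_MARKUP, ".field-type-date.ng-star-inserted")
    Debug.Print CountMatches(REDUCED_MARKUP, "#section-ipo-stock-price .field-type-date")
End Sub
```

The second call is the case to watch: once a class is removed from the page, the selector that depends on it stops matching, while the single-class selector still finds the element.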

VBA post request with formdata (URL doesnt change)

I've been going through many similar questions, like this and this, but mine is much simpler. I want to change the date on a web form and get the data using a POST request.
I have this code which makes a POST request:
Sub winpost()
Dim WebClient As WinHttp.WinHttpRequest
Set WebClient = New WinHttp.WinHttpRequest
Dim searchResult As HTMLTextElement: Dim searchTxt As String
Dim html As New HTMLDocument
Dim Payload As String
Payload = "ContentPlaceHolder1_ddlday=6"
With WebClient
.Open "POST", "http://pib.nic.in/AllRelease.aspx", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send (Payload)
.waitForResponse
End With
html.body.innerHTML = WebClient.responseText
Set searchResult = html.querySelector(".search_box_result"): searchTxt = searchResult.innerText
Debug.Print searchTxt
End Sub
The website is this. The page sends a POST request on change of any of the fields.
Looking at Chrome DevTools under Network > Form Data, I see this:
ctl00$ContentPlaceHolder1$ddlday: 8
I have tried various versions of this in the Payload string, but it always returns the same page (8th Jan).
Internet Explorer
With IE the syntax differs slightly from Selenium Basic (shown at the bottom), as there is no SelectByText option. You can use indices or attribute = value CSS selectors instead, for example. Here the months are selected by value (1 to 12) rather than by month name.
Option Explicit
Public Sub SetDates()
Dim ie As New InternetExplorer
With ie
.Visible = True
.Navigate2 "http://pib.nic.in/AllRelease.aspx"
While .Busy Or .readyState < 4: DoEvents: Wend
With .Document
.querySelector("#btnSave").Click
.querySelector("#ContentPlaceHolder1_ddlMonth [value='2']").Selected = True
.querySelector("#ContentPlaceHolder1_ddlYear [value='2018']").Selected = True
.querySelector("#ContentPlaceHolder1_ddlday [value='2']").Selected = True
End With
Stop '<==delete me later
.Quit
End With
End Sub
Selenium basic:
If you do go down the Selenium Basic VBA route, you can do something like the following. Note: after installing Selenium Basic you need to add a reference to the Selenium Type Library (VBE > Tools > References). You also need the latest Chrome and ChromeDriver, and the ChromeDriver folder should be on the PATH environment variable, or chromedriver should be placed in the folder containing the Selenium executables.
Option Explicit
Public Sub SetDates()
Dim d As WebDriver
Set d = New ChromeDriver
Const Url = "http://pib.nic.in/AllRelease.aspx"
With d
.Start "Chrome"
.get Url
.FindElementById("btnSave").Click
'date values
.FindElementById("ContentPlaceHolder1_ddlMonth").AsSelect.SelectByText "February"
.FindElementById("ContentPlaceHolder1_ddlYear").AsSelect.SelectByText "2018"
.FindElementById("ContentPlaceHolder1_ddlday").AsSelect.SelectByText "2"
Stop 'delete me later
.Quit
End With
End Sub

Export HTML to text file with different results

I have two procedures that are supposed to export the HTML of a page to a text file:
Sub Demo1()
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.google.com.eg/", False
.send
html.body.innerHTML = .responseText
WriteTxtFile html.body.innerHTML
End With
End Sub
Sub WriteTxtFile(ByVal aString As String, Optional ByVal filePath As String = "C:\Users\Future\Desktop\Output.txt")
Dim fso As Object
Dim fileout As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Set fileout = fso.CreateTextFile(filePath, True, True)
fileout.write aString
fileout.Close
End Sub
Sub Demo2()
Dim ie As Object
Dim f As Integer
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.navigate ("https://www.google.com.eg/")
Do: DoEvents: Loop Until .readyState = 4
f = FreeFile()
Open ThisWorkbook.Path & "\Sample.txt" For Output As #f
Print #f, .document.body.innerHTML
Close #f
.Quit
End With
End Sub
Demo1 and Demo2 are the two procedures; Demo1 produces "Output.txt" and Demo2 produces "Sample.txt".
However, the two HTML documents they produce are different.
Can you help me understand which one is right, and why they differ?
Thanks in advance for your help.
XMLHTTP does not provide all the rendered content of a webpage, in particular anything rendered via JavaScript, because scripts are not executed.
Internet Explorer, on the other hand, will render the page fully, provided the browser version supports the JavaScript syntax the page uses. For example, you will run into problems with ES6 (ECMAScript 2015) and later syntax, as much of it is not supported on legacy browsers. I believe it is supported on Edge for Windows 10. You can check compatibility tables to see what is and isn't supported.
If you familiarize yourself with the dev tools for your browser, you can explore how different parts of a webpage are rendered. You can learn to debug scripts and see what changes are made to the DOM and page styling. Often a page will issue XHR requests to update its content, for example. The network-tab documentation listed at the end of this answer is a good place to start experimenting.
So, I suspect that the first HTML document may have less content and a different overall DOM structure from the second on this basis.
To test for differences caused by the file-writing method itself, you need to compare apples with apples, i.e. use the same access method and syntax to retrieve the page content before writing it out.
Please provide the differences if you want a deeper explanation.
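One way to isolate the file-writing method is to write the same fixed string with both approaches from the question and compare the files. The sketch below is an illustration only: the sample string and the two file names in the TEMP folder are made up for the example. It relies on two documented behaviours: the third argument of CreateTextFile requests Unicode output, and Print # writes ANSI text followed by a line break.

```vba
Option Explicit

' Writes one fixed string using the Demo1 and Demo2 writing styles and
' prints the resulting file sizes so they can be compared.
Public Sub CompareWriters()
    Const SAMPLE As String = "<p>same content</p>"
    Dim fsoPath As String, printPath As String
    Dim fso As Object, f As Integer

    ' Hypothetical file names, placed in the user's TEMP folder.
    fsoPath = Environ$("TEMP") & "\fso_output.txt"
    printPath = Environ$("TEMP") & "\print_output.txt"

    ' Demo1 style: CreateTextFile(path, overwrite:=True, unicode:=True).
    Set fso = CreateObject("Scripting.FileSystemObject")
    With fso.CreateTextFile(fsoPath, True, True)
        .Write SAMPLE
        .Close
    End With

    ' Demo2 style: Print # writes ANSI text plus a trailing line break.
    f = FreeFile()
    Open printPath For Output As #f
    Print #f, SAMPLE
    Close #f

    Debug.Print "FileSystemObject:", FileLen(fsoPath) & " bytes"
    Debug.Print "Print #:", FileLen(printPath) & " bytes"
End Sub
```

If the two sizes differ for identical input, part of the difference between Output.txt and Sample.txt comes from encoding and line endings rather than from the page content.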
Exploring page updating:
Firefox Network Tab
Internet Explorer Network Inspector
Chrome Network Tab

Browsing to web page, typing in text to search bar, and searching... but the text disappears when I click the search button

First post here. Tried looking for similar posts but wasn't able to turn up anything.
I'm a little new to VBA. I'm trying to use Excel to navigate to a specific website, click a radio button, type in some text as a search string, and then search on that text. Everything seems fine when I walk through my code, but when I click the search button my search string gets blanked out and I get an error message telling me to enter search criteria. Code below:
Sub FranklinCountyWebsite()
'References: Microsoft Internet Controls, Microsoft HTML Object Library
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
IE.Visible = True
IE.navigate "https://sheriff.franklincountyohio.gov/real-estate/"
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.document
HTMLDoc.getElementById("ctl00_SheetContentPlaceHolder_c_search1_rblSrchOptions_3").Click
HTMLDoc.getElementById("ctl00_SheetContentPlaceHolder_c_search1_SrchSearchString").Value = "43215"
HTMLDoc.getElementById("ctl00_SheetContentPlaceHolder_c_search1_btnSearch").Click
End Sub
Interestingly, if I go to the Franklin County website and type in the text manually and then hit search, everything works fine. Is there something easy I'm overlooking?
You can do the same with a ServerXMLHTTP request, which is much faster than IE. The script below takes you to the target page you want to get data from.
Sub Fetch_Item()
Dim post As Object, qsp$, S$
qsp = "q=searchType%3dZipCode%26searchString%3d43215%26foreclosureType%3d%26sortType%3daddress%26saleDateFrom%3d4%2f30%2f2017+12%3a00%3a00+AM%26saleDateTo%3d10%2f30%2f2018+11%3a59%3a59+PM"
With New ServerXMLHTTP
.Open "GET", "https://sheriff.franklincountyohio.gov/real-estate/results.aspx?" & qsp, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
Set post = .getElementById("ctl00_SheetContentPlaceHolder_C_searchresults_reSaleSummary_ctl00_lblAddrHeader")
MsgBox post.innerText
End With
End Sub
Output:
155-157 CLEVELAND AVE COLUMBUS, OH 43215 010054688, 010055721
References to add to the project:
Microsoft XML, V6.0
Microsoft HTML Object Library
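The long qsp string in the script is a set of search parameters that has been URL-encoded and passed as the value of q. A minimal sketch of building it from readable parts follows. BuildQuery is a made-up helper, the parameter names are taken from decoding the string above, and it assumes Excel 2013 or later on Windows, where WorksheetFunction.EncodeURL is available. EncodeURL encodes spaces as %20 rather than +, so the result is equivalent to the original string rather than identical, and whether the server accepts both forms is an assumption.

```vba
Option Explicit

' Hypothetical helper: rebuilds the "q=..." query string from readable parts.
Public Function BuildQuery(ByVal zipCode As String, _
                           ByVal saleDateFrom As String, _
                           ByVal saleDateTo As String) As String
    Dim inner As String

    ' Parameter names as they appear when the original qsp string is decoded.
    inner = "searchType=ZipCode" & _
            "&searchString=" & zipCode & _
            "&foreclosureType=" & _
            "&sortType=address" & _
            "&saleDateFrom=" & saleDateFrom & _
            "&saleDateTo=" & saleDateTo

    ' Percent-encode the whole inner string so it travels as one value of q.
    BuildQuery = "q=" & Application.WorksheetFunction.EncodeURL(inner)
End Function

Public Sub BuildQueryDemo()
    Debug.Print BuildQuery("43215", "4/30/2017 12:00:00 AM", "10/30/2018 11:59:59 PM")
End Sub
```

The returned string can be appended to the results.aspx? URL in place of the hard-coded qsp, which makes it easier to change the zip code or the date range.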
