Can MSXML2.XMLHTTP be used with Chrome - excel

I have been using the following Excel VBA macro to bring back data from a website. It worked fine until a few days ago when the website stopped supporting IE. Of course the macro just fails now as there is no data on the webpage to bring back to Excel, just a message saying, "Your browser, Internet Explorer, is no longer supported." Is there a way to have the "Get method" (MSXML2.XMLHTTP) use Chrome instead of IE to interact with the website? BTW, my default browser is already set to "Chrome".
Dim html_doc As HTMLDocument ' note: reference to Microsoft HTML Object Library must be set
Sub KS()
' Define product url
KS_url = "https://www.kingsoopers.com/p/r-w-knudsen-just-blueberry-juice/0007468210784"
' Collect data
Set html_doc = New HTMLDocument
Set xml_obj = CreateObject("MSXML2.XMLHTTP")
xml_obj.Open "GET", KS_url, False
xml_obj.send
html_doc.body.innerHTML = xml_obj.responseText
Set xml_obj = Nothing
KS_product = html_doc.getElementsByClassName("ProductDetails-header")(0).innerText
KS_price = "$" & html_doc.getElementsByClassName("kds-Price kds-Price--alternate mb-8")(1).Value
do Stuff
End Sub

The site does a basic server-side check on the User-Agent header. Tell it what it wants to "hear" by passing a supported browser in the UA header (or technically, in this case, just saying the equivalent of "Hi, I am not Internet Explorer").
It can be as simple as xml.setRequestHeader "User-Agent", "Chrome". I say basic because you could even pass xml.setRequestHeader "User-Agent", "I am a unicorn", so the server is most likely using an exclusion list that rejects only Internet Explorer.
Option Explicit
Public Sub KS()
Dim url As String
url = "https://www.kingsoopers.com/p/r-w-knudsen-just-blueberry-juice/0007468210784"
Dim html As MSHTML.HTMLDocument, xml As Object
Set html = New MSHTML.HTMLDocument
Set xml = CreateObject("MSXML2.XMLHTTP")
xml.Open "GET", url, False
xml.setRequestHeader "User-Agent", "Mozilla/5.0"
xml.send
html.body.innerHTML = xml.responseText
Debug.Print html.getElementsByClassName("ProductDetails-header")(0).innerText
Debug.Print "$" & html.getElementsByClassName("kds-Price kds-Price--alternate mb-8")(1).Value
Stop
End Sub
Compare that with adding no UA or adding xml.setRequestHeader "User-Agent", "MSIE".
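The comparison can be scripted. The following is a minimal sketch, not a definitive test: IsBlockedFor and CompareUserAgents are hypothetical names, and the check assumes the block page still contains the phrase "no longer supported" quoted in the question. Note that MSXML2.XMLHTTP may serve a cached copy of a URL it has already fetched, so results for repeated requests should be read with that in mind.

```vba
Option Explicit

' Hypothetical helper: fetches pageUrl with the given User-Agent ("" = send none)
' and reports whether the "no longer supported" notice appears in the response.
Public Function IsBlockedFor(ByVal pageUrl As String, ByVal userAgent As String) As Boolean
    Dim xml As Object
    Set xml = CreateObject("MSXML2.XMLHTTP")
    xml.Open "GET", pageUrl, False
    If Len(userAgent) > 0 Then xml.setRequestHeader "User-Agent", userAgent
    xml.send
    ' Assumption: the block page contains this phrase (taken from the question).
    IsBlockedFor = InStr(1, xml.responseText, "no longer supported", vbTextCompare) > 0
End Function

Public Sub CompareUserAgents()
    Const PAGE_URL As String = "https://www.kingsoopers.com/p/r-w-knudsen-just-blueberry-juice/0007468210784"
    Dim ua As Variant
    ' Try no UA, an IE-like UA, and two non-IE values.
    For Each ua In Array("", "MSIE", "Chrome", "Mozilla/5.0")
        Debug.Print "UA=[" & ua & "] blocked: " & IsBlockedFor(PAGE_URL, CStr(ua))
    Next ua
End Sub
```

Run CompareUserAgents and read the results in the Immediate window.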

Study the article here by Daniel Pineault and this paragraph:
Feature Browser Emulation
Also note my comment dated 2020-09-13.

Related

Print inner text under "id"

Any one of the three highlighted parts is the value I want to print. I am trying the code below:
Sub JJ()
Dim IE As New SHDocVw.InternetExplorer
Dim hdoc As MSHTML.HTMLDocument
Dim ha As String
IE.Visible = True
IE.navigate "https://www.nseindia.com/get-quotes/equity?symbol=DIVISLAB"
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set hdoc = IE.document
ha = hdoc.getElementById("preOpenFp").innerText
Debug.Print ha
End Sub
But the output is empty. Please help.
The website you're trying to scrape offers a very convenient way to do it. All you need to do is send an HTTP request and read the JSON response.
If you take a look at the network traffic in your browser's developer tools, you'll see the requests that are sent to the server while the page loads. Among them is a GET request to https://www.nseindia.com/api/quote-equity?symbol=DIVISLAB.
To send this request and get the info you need, you have to do the following:
Option Explicit
Sub nse()
Dim req As New MSXML2.XMLHTTP60
Dim url As String
Dim json As Object
url = "https://www.nseindia.com/api/quote-equity?symbol=DIVISLAB"
With req
.Open "GET", url, False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
Debug.Print json("preOpenMarket")("IEP")
End Sub
This will print the value of IEP to your immediate window (in this case 2390). You can modify the code to best fit your needs.
To parse a JSON string you will need to add the JsonConverter module (from the VBA-JSON project) to your project. Follow its installation instructions and you should be set to go.
You will also need to add the following references to your project (VBE>Tools>References):
Microsoft XML version 6.0
Microsoft Scripting Runtime
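If the request fails or the JSON lacks the expected keys, indexing straight into the parsed result raises a run-time error. A more defensive variant might look like the sketch below. GetPreOpenIEP is a hypothetical name, the object is created late-bound so only the JsonConverter module is required, and the code assumes the top-level JSON value is an object, which JsonConverter returns as a Dictionary.

```vba
Option Explicit

' Hypothetical helper: returns the pre-open IEP for a symbol,
' or Empty when the request fails or the keys are missing.
Public Function GetPreOpenIEP(ByVal symbol As String) As Variant
    Dim req As Object, json As Object
    Set req = CreateObject("MSXML2.XMLHTTP.6.0")
    req.Open "GET", "https://www.nseindia.com/api/quote-equity?symbol=" & symbol, False
    req.send
    ' Anything other than 200 is treated as "no data".
    If req.Status <> 200 Then Exit Function
    Set json = JsonConverter.ParseJson(req.responseText)
    ' JSON objects are parsed into Dictionary objects, so Exists is available.
    If Not json.Exists("preOpenMarket") Then Exit Function
    If json("preOpenMarket").Exists("IEP") Then
        GetPreOpenIEP = json("preOpenMarket")("IEP")
    End If
End Function
```

A caller can then test the result with IsEmpty(GetPreOpenIEP("DIVISLAB")) before using it.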

Scrape data that is not in the source code, using VBA

I'm trying to scrape whole div from one website. The data is not visible in the source code, it changes based on the variable in the URL (link).
I was looking for any solution to copy to the excel sheet everything from
<div id="div_measures_for_2103909010" class="measures_detail">
Unfortunately, since the data is not in the page source itself, I have found a way to display only the data from the div above (Link).
However, to get this data I would first need to get the link to the direct data (the link is in the source code).
Do you have any idea how best to deal with this?
I've tried to download the source code, search for the link, open the link and copy all the data, but I have trouble downloading the source code (Excel stores only part of it due to cell size limits). Here is my current code:
Sub Open_Webpage()
Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
URL = "https://ec.europa.eu/taxation_customs/dds2/taric/measures.jsp?Lang=en&SimDate=20190329&Area=&MeasType=&StartPub=&EndPub=&MeasText=&GoodsText=&op=&Taric=2103909010&search_text=goods&textSearch=&LangDescr=pl&OrderNum=&Regulation=&measStartDat=&measEndDat="
objHTTP.Open "GET", URL, False
objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
objHTTP.send ("")
html = objHTTP.responseText
Range("A1").Value = html
End Sub
If I am able to have full code in one cell I can then look for the link in the source code and use it:
=MID(LEFT(A1,FIND("' width='100%'",A1)-1),FIND("' src='",A1)+7,LEN(A1))
I know that there must be some better solution, but I'm not so proficient in VBA to figure it out...
You can regex out the required url, do a little cleaning then pass on to xhr. For some reason I was unable to simply use getAttribute("onclick") so had to use outerHTML (innerHTML also fine) on the element
Option Explicit
Public Sub GetInfo()
Dim html As HTMLDocument, s As String, re As Object, url As String
Set re = CreateObject("vbscript.regexp")
Set html = New HTMLDocument '< VBE > Tools > References > Microsoft HTML Object Library
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://ec.europa.eu/taxation_customs/dds2/taric/measures.jsp?Lang=en&SimDate=20190329&Area=&MeasType=&StartPub=&EndPub=&MeasText=&GoodsText=&op=&Taric=2103909010&search_text=goods&textSearch=&LangDescr=pl&OrderNum=&Regulation=&measStartDat=&measEndDat=", False
.send
html.body.innerHTML = .responseText
s = html.querySelector("[id$='_end_goods']").outerHTML
With re
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "measures_details\.jsp(.*)'\);"
If .Test(s) Then
url = "https://ec.europa.eu/taxation_customs/dds2/taric/measures_details.jsp" & .Execute(s)(0).SubMatches(0)
url = Replace$(url, "&amp;", "&") 'decode the HTML-escaped ampersands
End If
End With
If Len(url) > 0 Then
.Open "GET", url, False
.send
html.body.innerHTML = .responseText
ActiveSheet.Cells(1, 1) = html.querySelector(".measures_detail").innerText
End If
End With
End Sub
Try the regex here
References:
VBE > Tools > References > Microsoft HTML Object Library
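To see what the regex step does in isolation, here is a small offline sketch. ExtractDetailsUrl and DemoExtractDetailsUrl are hypothetical names, and the sample markup is made up for illustration; the real onclick attribute on the site may look different.

```vba
Option Explicit

' Hypothetical helper: pulls the measures_details.jsp query string out of an
' element's HTML and returns the absolute URL ("" when nothing matches).
Public Function ExtractDetailsUrl(ByVal elementHtml As String) As String
    Const BASE_URL As String = "https://ec.europa.eu/taxation_customs/dds2/taric/measures_details.jsp"
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Pattern = "measures_details\.jsp(.*)'\);"
    If re.Test(elementHtml) Then
        ' SubMatches(0) is the captured query string; decode escaped ampersands.
        ExtractDetailsUrl = BASE_URL & _
            Replace$(re.Execute(elementHtml)(0).SubMatches(0), "&amp;", "&")
    End If
End Function

Public Sub DemoExtractDetailsUrl()
    ' Made-up markup, used only to exercise the pattern.
    Dim sample As String
    sample = "<a onclick=""showDetails('measures_details.jsp?Lang=en&amp;Taric=2103909010');"">"
    Debug.Print ExtractDetailsUrl(sample)
    ' prints https://ec.europa.eu/taxation_customs/dds2/taric/measures_details.jsp?Lang=en&Taric=2103909010
End Sub
```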

Browsing to web page, typing in text to search bar, and searching... but the text disappears when I click the search button

First post here. Tried looking for similar posts but wasn't able to turn up anything.
I'm a little new to VBA. I'm trying to use Excel to navigate to a specific website, click a radio button, type in some text as a search string, and then search on that text. Everything seems fine when I walk through my code, but when I click the search button my search string gets blanked out and I get an error message telling me to enter search criteria. Code below:
Sub FranklinCountyWebsite()
'References: Microsoft Internet Controls, Microsoft HTML Object Library
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
IE.Visible = True
IE.navigate "https://sheriff.franklincountyohio.gov/real-estate/"
Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.document
HTMLDoc.getElementById("ctl00_SheetContentPlaceHolder_c_search1_rblSrchOptions_3").Click
HTMLDoc.getElementById("ctl00_SheetContentPlaceHolder_c_search1_SrchSearchString").Value = "43215"
HTMLDoc.getElementById("ctl00_SheetContentPlaceHolder_c_search1_btnSearch").Click
End Sub
Interestingly, if I go to the Franklin County website and type in the text manually and then hit search, everything works fine. Is there something easy I'm overlooking?
You can do the same with a ServerXMLHTTP request, which is much faster than automating IE. The script below takes you to the results page you wanted to get data from.
Sub Fetch_Item()
Dim post As Object, qsp$, S$
qsp = "q=searchType%3dZipCode%26searchString%3d43215%26foreclosureType%3d%26sortType%3daddress%26saleDateFrom%3d4%2f30%2f2017+12%3a00%3a00+AM%26saleDateTo%3d10%2f30%2f2018+11%3a59%3a59+PM"
With New ServerXMLHTTP
.Open "GET", "https://sheriff.franklincountyohio.gov/real-estate/results.aspx?" & qsp, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
Set post = .getElementById("ctl00_SheetContentPlaceHolder_C_searchresults_reSaleSummary_ctl00_lblAddrHeader")
MsgBox post.innerText
End With
End Sub
Output:
155-157 CLEVELAND AVE COLUMBUS, OH 43215 010054688, 010055721
References to add to the project:
Microsoft XML, V6.0
Microsoft HTML Object Library

Excel VBA source code for extracting data from a URL

I want to extract the title of every news item displayed on the "http://pib.nic.in/newsite/erelease.aspx?relid=58313" website using Excel VBA. I have written code using getElementsByClassName("contentdiv"), but the debugger shows an error saying that the object doesn't support... I want to extract the news items for every relid, which is also part of the URL.
Cold scrapes like this are generally handled more efficiently with a XMLHTTP pull. This requires the addition of a few libraries to the VBE's Tools ► References. The code below needs Microsoft XML, v6.0, Microsoft HTML Object library and Microsoft Internet Controls. Might not need the last one but you probably will if you expand the code beyond what is supplied.
Public Const csURL As String = "http://pib.nic.in/newsite/erelease.aspx?relid=×ID×"
Sub scrape_PIBNIC()
Dim htmlBDY As HTMLDocument, xmlHTTP As MSXML2.ServerXMLHTTP60
Dim i As Long, u As String, iDIV As Long
On Error GoTo CleanUp
Set xmlHTTP = New MSXML2.ServerXMLHTTP60
Set htmlBDY = New HTMLDocument
For i = 58313 To 58313
htmlBDY.body.innerHTML = vbNullString
With xmlHTTP
u = Replace(csURL, "×ID×", i)
'Debug.Print u
.Open "GET", u, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.send
If .Status <> 200 Then GoTo CleanUp
htmlBDY.body.innerHTML = .responseText
For iDIV = 0 To (htmlBDY.getElementsByClassName("contentdiv").Length - 1)
If CBool(htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span").Length) Then
Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0) = _
htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span")(0).innerText
End If
Next iDIV
End With
Next i
CleanUp:
Set htmlBDY = Nothing
Set xmlHTTP = Nothing
End Sub
That should be enough to get you started. The site you are targeting requires that charset=UTF-8 be added to the request. I had no success without it. I strongly suspect that this may have been the source of your object doesn't support error.

Excel Web Query Object and Cookies: Is there a better way?

I have a HTML web page at work that I want to query data from tables into excel 2007. This web page requires I sign on with a password. I sign in with my normal IE7 browser, then I go to DATA -> connections -> my connections and edit the query. This reads the IE7 cookie cache and I re-POST the data to connect to the server's security by clicking "retry" when it says "the web query returned no data". After I do this, the data imports fine.
I can do this just fine and it only needs to be done once a day. Other users of my application find this difficult which leads to my question:
Is there a way to automatically POST this data back with VB? I'm thinking maybe I should use the cookie property of the IE.Document.cookie?
I call the following login script before I continue with the web query (set a reference to the XML library). Look around for instructions on how to find your POST parameters.
Sub XMLHttpLogin()
Dim i As Integer
Dim sExpr As String
Dim sPar As String, sURL as String
Dim sResp As String
Dim XMLHttp As MSXML2.XMLHTTP60
Set XMLHttp = New MSXML2.XMLHTTP60
sPar = "name=user1&pass=pass1&form_id=form1" 'The parameters to send.
sURL = "http://www.stackoverflow.com"
With XMLHttp
.Open "POST", sURL, True 'Needs asynchronous connection
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send (sPar)
Do While .readyState <> 4 'wait until the request has completed
DoEvents
Loop
If .Status <> 200 Then 'stop here instead of looping forever on a failed request
MsgBox "Request failed with HTTP status " & .Status
Exit Sub
End If
sResp = .responseText 'contains source code of website
sExpr = "not-logged-in" 'look for this string in source code
If InStr(1, sResp, sExpr, vbTextCompare) Then
MsgBox "Not logged in. Error in XMLHttpLogin"
End If
End With
End Sub
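The parameter string has to be URL-encoded when a value contains characters such as spaces, "&" or "@". The sketch below shows one way to build it. FormEncode and BuildFormBody are hypothetical helpers, not part of MSXML, and the encoder assumes ASCII input only (non-ASCII text would need UTF-8 handling that is not shown here).

```vba
Option Explicit

' Hypothetical helper: percent-encodes one form value (ASCII input assumed).
Private Function FormEncode(ByVal rawValue As String) As String
    Dim i As Long, ch As String, result As String
    For i = 1 To Len(rawValue)
        ch = Mid$(rawValue, i, 1)
        If ch Like "[A-Za-z0-9._~-]" Then
            result = result & ch                      ' unreserved: keep as-is
        ElseIf ch = " " Then
            result = result & "+"                     ' form encoding uses + for space
        Else
            result = result & "%" & Right$("0" & Hex$(Asc(ch)), 2)
        End If
    Next i
    FormEncode = result
End Function

' Hypothetical helper: takes name/value pairs and returns "a=1&b=2".
Public Function BuildFormBody(ParamArray pairs() As Variant) As String
    Dim i As Long, body As String
    For i = LBound(pairs) To UBound(pairs) - 1 Step 2
        If Len(body) > 0 Then body = body & "&"
        body = body & FormEncode(CStr(pairs(i))) & "=" & FormEncode(CStr(pairs(i + 1)))
    Next i
    BuildFormBody = body
End Function

Public Sub DemoBuildFormBody()
    Debug.Print BuildFormBody("name", "user1", "pass", "p@ss word", "form_id", "form1")
    ' prints name=user1&pass=p%40ss+word&form_id=form1
End Sub
```

The result can be passed to .send in place of the hand-written sPar string.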
