VBA: how to connect an MSXML2.XMLHTTP60 response to an IHTMLDocument (iframe) - excel

I am trying to use MSXML2 and IHTMLDocument to handle the iframe part of an HTML web page.
I want to fetch the page with MSXML2 and store the response so I can capture the data, because I expect that to be faster than driving InternetExplorer or Selenium from VBA.
(I want to avoid using IE or Selenium as much as possible.)
However, I couldn't work out how to store the document as XML (to take advantage of its speed) and, at the same time, click an element in the document without the help of a browser (IE or Selenium).
Even after clicking a tab (id="cns_Tab21") on this web page, I have difficulty retrieving the data.
So my questions are:
1> Is it possible to minimize the use of a browser for clicking?
2> Why does VBA throw an XPath-related error even after clicking (using Selenium)?
Thank you in advance for your answers. The URL used for this is
http://bitly.kr/finance
and the iframe inside that page is http://bitly.kr/LT0aCb
'I declared objects
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim iframeDoc As IHTMLDocument
'and loaded the response text into the HTML document
HTMLDoc.body.innerHTML = XMLReq.responseText
'then tried to get the iframe from this HTML...
Set iframeDoc = HTMLDoc.getElementById("coinfo_cp")
'I tried .contentDocument, but HTMLDoc may not have this property,
'and I don't know how to access the information I saved to iframeDoc above.
'After switching to Selenium, I can't figure out why this throws an error
For Each ele In selenium.FindElementsByTag("th")
If ele.Attribute("innerText") = "CAPEX" Then
Debug.Print ele.FindElementsByXPath("./../td").Attribute("innerText")
This post isn't a duplicate, since I am trying to use XML to handle the iframe element without the InternetExplorer reference (ie.document) in Excel VBA.

You can replicate the XHR request the page makes when that tab (not an iframe) is selected. I use the clipboard to copy the table to Excel. Note: the URL I am using is from our discussions. This info should be reflected in the question.
Option Explicit
Public Sub GetTable()
'VBE > Tools > References > Microsoft HTML Object Library
Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01")
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") ' New DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
End Sub
You can find the parameters of the AJAX URL for the tab content update in the page's scripts, along with the target for the update.
This needs tidying up:
Option Explicit
Public Sub GetTable()
'https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930
'VBE > Tools > References > Microsoft HTML Object Library
Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object, ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01") '2nd tab. CAPEX row
Dim html2 As HTMLDocument, i As Long
Set html2 = New HTMLDocument
html2.body.innerHTML = hTable.outerHTML
Dim tableBodyRows As Object, tableBodyRowLength As Long, tableHeaderRowLength As Long, tableHeaderRows As Object, targetRow As Long
Set tableBodyRows = html2.querySelectorAll("tbody tr .bg")
tableBodyRowLength = tableBodyRows.Length
tableHeaderRowLength = html2.querySelectorAll("thead tr").Length + 2
For i = 0 To tableBodyRowLength - 1
If Trim$(tableBodyRows.item(i).innerText) = "CAPEX" Then
targetRow = i + tableHeaderRowLength + 1
Exit For
End If
Next
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") ' New DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ws.Cells(1, 1).PasteSpecial
Dim unionRng As Range
For i = (tableHeaderRowLength + 1) To (tableBodyRowLength + tableHeaderRowLength)
If i <> targetRow Then
If Not unionRng Is Nothing Then
Set unionRng = Union(ws.rows(i), unionRng)
Else
Set unionRng = ws.rows(i)
End If
End If
Next
If Not unionRng Is Nothing Then unionRng.Delete
End Sub

Related

How can we use http.Open "GET" to list items from a table in HTML?

I'm testing an idea that I had. It seems like I should be able to scrape out various HTML elements from a table in a website, but my code can't seem to find the table, which definitely seems to be there.
Sub TryThis()
Dim oHtml As HTMLDocument
Dim oElement As Object
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", False
.send
oHtml.body.innerHTML = .responseText
End With
Set myitem = oHtml.getElementsByClassName("wikitable sortable jquery-tablesorter")
i = 0
For Each oElement In myitem
Sheets("Sheet1").Range("A" & i + 1) = myitem(i).innerText
i = i + 1
Next oElement
End Sub
Essentially, I would like to loop through the HTML items and print, in cells, the contents of the table with the class 'wikitable sortable jquery-tablesorter'.
You were really close. I think the issue is that the jquery-tablesorter class is added by jQuery (or a plugin) via JavaScript after the page has loaded, so that class isn't present in the HTML returned by the web request. Removing it from the search criteria should fix the issue.
Here's what I came up with to address this, and also to move the table contents over a bit quicker. I only handled the first element with the wikitable sortable classes, but it should be possible to loop over each table too.
Sub TryThis()
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim htmlText As String
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", False
.send
oHtml.body.innerHTML = .responseText
End With
htmlText = oHtml.getElementsByClassName("wikitable sortable")(0).outerHTML
With CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'Clipboard
.SetText htmlText
.PutInClipboard
Sheets(1).Range("A1").Select
Sheets(1).PasteSpecial Format:="Unicode Text"
End With
End Sub
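The answer above handles only the first table. As a rough sketch of looping over every matching table (this is not part of the original answer), the version below writes each cell straight to the sheet instead of going through the clipboard. The Sub name, the blank row between tables, and the use of Sheets(1) are assumptions, and cells that use rowspan or colspan will not line up exactly.

```vba
Sub ListAllWikiTables()
    'VBE > Tools > References > Microsoft HTML Object Library
    Dim oHtml As HTMLDocument
    Dim tables As Object, tbl As Object, tblRow As Object
    Dim t As Long, r As Long, c As Long, nextRow As Long

    Set oHtml = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population", False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    'jquery-tablesorter is left out because it is only added by script after load
    Set tables = oHtml.getElementsByClassName("wikitable sortable")

    nextRow = 1
    For t = 0 To tables.Length - 1
        Set tbl = tables.Item(t)
        For r = 0 To tbl.Rows.Length - 1
            Set tblRow = tbl.Rows.Item(r)
            For c = 0 To tblRow.Cells.Length - 1
                Sheets(1).Cells(nextRow, c + 1).Value = Trim$(tblRow.Cells.Item(c).innerText)
            Next c
            nextRow = nextRow + 1
        Next r
        nextRow = nextRow + 1 'assumed layout: one blank row between tables
    Next t
End Sub
```

Writing cell by cell is slower than the clipboard paste for large tables, but it avoids depending on the clipboard and on the active selection.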

Get image url in excel VBA

I need to get the URL of an image. I don't need the image itself, just its URL, but I can't get it to work.
Here's the site: https://www.bauhaus.com.tr/hirdavat-hirdavat-urunleri-menteseler-mobilya-menteseleri/adilon-karyola-demiri-61185746
I have no experience in coding, so if there's an easier way, please share it.
Here's the code:
Dim i, sonsat As Integer
Dim url As String
Dim XMLreq As New MSXML2.XMLHTTP60
Dim HTMLdoc As New MSHTML.HTMLDocument
sonsat = Sheets("Sayfa1").Range("A10000").End(xlUp).Row
For i = 2 To sonsat
On Error Resume Next
url = Sheets("Sayfa1").Range("A" & i)
XMLreq.Open "GET", url, False
XMLreq.send
If XMLreq.Status <> 200 Then
MsgBox "Sayfaya Ulaşılamadı" 'Turkish for "Could not reach the page"
Exit Sub
End If
HTMLdoc.body.innerHTML = XMLreq.responseText
Sheets("Sayfa1").Range("C" & i) = HTMLdoc.getElementsByClassName("title")(0).innerText
Sheets("Sayfa1").Range("B" & i) = HTMLdoc.getElementsByClassName("title sub")(0).innerText
Sheets("Sayfa1").Range("D" & i) = HTMLdoc.getElementsByClassName("proAttr sku")(0).innerText
Sheets("Sayfa1").Range("E" & i) = HTMLdoc.getElementsByclassName("item")(391).innerText
Next
End Sub
Your code is about right, but it turns out the image is inserted after page load, probably by a JavaScript component. So when you get the document, it does not contain the image lightbox.
However, the image is referenced in the page metadata as
<html>
<head>
...
<meta property="og:image" content="https://...">
...
</head>
...
</html>
So you can pull it from there.
Also, some advice about clean coding:
In a loop, it is better to create the HTML document and XML HTTP request objects on each iteration (put New MSXML2.XMLHTTP60 and New MSHTML.HTMLDocument inside the loop).
The HTMLDocument should be closed after use.
So here's a piece of code very close to your final requirements (I just changed the loop so I didn't have to recreate your Excel workbook on my side):
Sub testHtml()
Dim i As Integer, j As Integer
Dim url As String
Dim XMLreq As MSXML2.XMLHTTP60
Dim HTMLdoc As MSHTML.HTMLDocument
Dim els As Variant 'DispHTMLElementCollection
Dim meta As MSHTML.HTMLMetaElement
'Replace following loop with your own (looping over worksheet column)
For i = 1 To 3
'You should actually read this url from your worksheet
url = "https://www.bauhaus.com.tr/hirdavat-hirdavat-urunleri-menteseler-mobilya-menteseleri/adilon-karyola-demiri-61185746"
Set XMLreq = New MSXML2.XMLHTTP60
XMLreq.Open "GET", url, False
XMLreq.send
If XMLreq.Status <> 200 Then
MsgBox "Error!"
Exit Sub
End If
Set HTMLdoc = New MSHTML.HTMLDocument
HTMLdoc.body.innerHTML = XMLreq.responseText
Set els = HTMLdoc.getElementsByTagName("meta")
'Debug.Print els.Length
For j = 0 To els.Length - 1
Set meta = els(j)
'Debug.Print meta.outerHTML
If meta.getAttribute("property") = "og:image" Then
Debug.Print meta.Content 'Output to the Immediate window
'You should output to your WorkSheet instead
Exit For
End If
Next j
HTMLdoc.Close
Next i
End Sub
And a final note: to see the output of Debug.Print, open the Immediate window in the VBA editor (Ctrl+G).
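To connect this back to the worksheet from the question, here is a minimal sketch that combines the question's loop over column A of "Sayfa1" with the og:image lookup above. The Sub name and the output column (F) are assumptions; the question only uses columns B to E, so adjust as needed.

```vba
Sub WriteImageUrls()
    'VBE > Tools > References > Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim ws As Worksheet
    Dim i As Long, j As Long, lastRow As Long
    Dim XMLreq As MSXML2.XMLHTTP60
    Dim HTMLdoc As MSHTML.HTMLDocument
    Dim els As Object
    Dim meta As Object

    Set ws = ThisWorkbook.Worksheets("Sayfa1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For i = 2 To lastRow
        Set XMLreq = New MSXML2.XMLHTTP60
        XMLreq.Open "GET", ws.Range("A" & i).Value, False
        XMLreq.send
        If XMLreq.Status = 200 Then
            Set HTMLdoc = New MSHTML.HTMLDocument
            HTMLdoc.body.innerHTML = XMLreq.responseText
            Set els = HTMLdoc.getElementsByTagName("meta")
            For j = 0 To els.Length - 1
                Set meta = els(j)
                'getAttribute returns Null when the attribute is missing,
                'which the If treats as False
                If meta.getAttribute("property") = "og:image" Then
                    ws.Range("F" & i).Value = meta.Content 'column F is an assumption
                    Exit For
                End If
            Next j
            HTMLdoc.Close
        End If
    Next i
End Sub
```

Rows whose request does not return status 200 are simply skipped here rather than stopping the whole run, which is a design choice and not something the original answer prescribes.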

Select Item in a dropdown from website via Excel Macro

I would like to select one of the options "Addition, Bulk, Reduction" using Excel VBA.
This is what I have so far, but nothing is being selected.
Dim ie As InternetExplorer
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "my URL"
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
'time_adjust_group_ident = Reduction
Dim doc As HTMLDocument
Set doc = ie.document
doc.getElementById("time_adjust_group_ident").Value = "Reduction"
End Sub
You don't need the Internet Explorer object for this. Please take a look at the code below, where I use MSXML2.XMLHTTP to make an HTTP request and get the HTML response as a string, and then parse it using the HTMLFile object.
I'm using CreateObject instead of adding the references via Tools > References, so you can run this code anywhere without having to add references manually every time you open it on a different machine.
In this example, I retrieve the child elements of the language-selector dropdown on a given website and loop through them with For Each, writing each child element's content to a spreadsheet row.
Sub LoadHtml()
Dim strUrl As String
strUrl = "https://developer.mozilla.org/en-US/docs/Web/HTML/Element/select"
Dim httpRequest As Object
Set httpRequest = CreateObject("MSXML2.XMLHTTP")
With httpRequest
.Open "GET", strUrl, False
.send
End With
Dim html As Object
Set html = CreateObject("HTMLFile")
html.body.innerHTML = httpRequest.ResponseText
Dim child As Object
Dim row As Integer
row = 1
For Each child In html.getElementById("language-selector").Children
Range("A" & row) = child.innerText
row = row + 1
Next
End Sub

Using MSXML in a VBA script to pull website data

I have the following code from http://dailydoseofexcel.com/archives/2011/03/08/get-data-from-website-that-requires-a-login/#comment-60553
Sub GetTable()
Dim xml As Object ' MSXML2.XMLHTTP60
Dim htmlDoc As Object ' MSHTML.HTMLDocument
Dim htmlBody As Object ' MSHTML.HTMLBody
Dim ieTable As Object
Dim clip As DataObject
Set xml = GetMSXML
With xml
.Open "POST", "https://web.site", False
.send "username=myname&password=mypassword"""
End With
With xml
.Open "POST", "https://web.site/anotherpage", False
End With
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.Body
htmlBody.innerHTML = xml.responseText
Set ieTable = htmlBody.all.Item("report")
'copy the table's HTML to the clipboard and paste it to the sheet
If Not ieTable Is Nothing Then
Set clip = New DataObject
clip.SetText "<html>" & ieTable.outerHTML & "</html>"
clip.PutInClipboard
Range("A1").Select
ActiveSheet.PasteSpecial "Unicode Text"""
End If
End Sub
Function CreateHTMLDoc() As Object ' MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End Function
Function GetMSXML() As Object ' MSXML2.XMLHTTP
On Error Resume Next
Set GetMSXML = CreateObject("MSXML2.XMLHTTP")
End Function
Using this code, I am attempting to access the site web.site and pass it a username and password to log in, then proceed to another page on the site and copy the content of a table (results) into Sheet1 of the Excel workbook.
I have tried to debug this using F8, but without the visible browser that I would get by following this page http://dailydoseofexcel.com/archives/2011/03/08/get-data-from-website-that-requires-a-login/
it is a bit difficult to see exactly what is happening and where it is failing.
Try the following code to help you get onto the site. If you have any questions, just leave a comment on my channel: https://www.youtube.com/watch?v=hfAhmae4iqA
Dim IEe As InternetExplorer
Dim doc, element
Set IEe = New InternetExplorer
IEe.Visible = False 'set to True if you want to see the Internet Explorer window
IEe.Navigate "YOUR WEBSITE"
Do While IEe.ReadyState = 4: DoEvents: Loop
Do Until IEe.ReadyState = 4: DoEvents: Loop
'To find an element ID: right-click the page, choose Inspect Element, click the pointer icon,
'then click the text box where the username or password should be entered
Set element = IEe.Document.getElementByID("INSERT ELEMENT ID") 'username field
element.Value = "USERNAME"
Set element = IEe.Document.getElementByID("INSERT ELEMENT ID") 'password field
element.Value = "PASSWORD" 'remember storing a password in a macro is not safe
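For the XMLHTTP route in the question, note that its second request is opened but never sent before responseText is read, and the POST body is sent without a form Content-Type header. The sketch below addresses both and prints the status of each step to the Immediate window so you can see where it fails. It is only a sketch: the URLs and form field names are the placeholders from the question, using GET for the second page is an assumption, and it presumes the site keeps the login session in cookies.

```vba
Sub GetTableAfterLogin()
    Dim xml As Object      ' MSXML2.XMLHTTP
    Dim htmlDoc As Object  ' MSHTML.HTMLDocument
    Dim ieTable As Object

    Set xml = CreateObject("MSXML2.XMLHTTP")

    'Step 1: post the login form (URL and field names are placeholders from the question)
    xml.Open "POST", "https://web.site", False
    xml.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    xml.send "username=myname&password=mypassword"
    Debug.Print "Login status: " & xml.Status

    'Step 2: request the report page; it must be sent before responseText is read
    xml.Open "GET", "https://web.site/anotherpage", False 'GET is an assumption
    xml.send
    Debug.Print "Report status: " & xml.Status

    Set htmlDoc = CreateObject("htmlfile")
    htmlDoc.body.innerHTML = xml.responseText
    Set ieTable = htmlDoc.body.all.Item("report")
    If ieTable Is Nothing Then
        Debug.Print "No element named 'report' found in the response"
    Else
        Debug.Print ieTable.outerHTML
    End If
End Sub
```

If the login step reports a status other than 200, or the report page comes back as the login form again, the site probably expects extra fields (for example a hidden token), which this sketch does not handle.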

Get data from website [duplicate]

<span itemprop="streetAddress">
**94 Grand St**
</span>
How can I get this data with a getElementBy... method in Excel VBA?
I have tried getElementById, getElementsByName, etc., but nothing works.
Option Explicit
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/biz/if-boutique-new-york#query:boutique"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("span") ' ## returns the list
MsgBox (Listings(0))
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
Next
Set html = Nothing
Set ie = Nothing
End Sub
I am using the program above to get the innerText value inside the span tag, but it's not working.
For the single result you are looking for, use these two lines in your IE code (there is only one listing at the detail level):
Set Listings = html.getElementbyid("bizInfoBody") ' ## returns a single element
Range("A1").Offset(r, 0).Value = Listings.innerText
Or, with XMLHTTP:
Sub GetTxt()
Dim objXmlHTTP As Object
Dim objHtmlDoc As Object
Dim objHtmlBody As Object
Dim objTbl As Object
Dim strResponse As String
Dim strSite As String
Set objHtmlDoc = CreateObject("htmlfile")
Set objHtmlBody = objHtmlDoc.body
Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
strSite = "http://www.yelp.com/biz/if-boutique-new-york"
With objXmlHTTP
.Open "GET", strSite, False
.Send
If .Status = 200 Then
strResponse = .responseText
objHtmlBody.innerHTML = objXmlHTTP.responseText
Set objTbl = objHtmlBody.Document.getElementbyid("bizInfoBody")
MsgBox objTbl.innerText
End If
End With
End Sub
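If you want only the span shown in the question rather than the whole bizInfoBody block, one option is to loop over the span elements and check the itemprop attribute. This is a sketch under the assumption that the page still contains a span with itemprop="streetAddress" in the HTML returned by the request; the Sub name is made up for illustration.

```vba
Sub GetStreetAddress()
    Dim objXmlHTTP As Object
    Dim objHtmlDoc As Object
    Dim spans As Object
    Dim i As Long

    Set objHtmlDoc = CreateObject("htmlfile")
    Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
    With objXmlHTTP
        .Open "GET", "http://www.yelp.com/biz/if-boutique-new-york", False
        .Send
        If .Status <> 200 Then Exit Sub
        objHtmlDoc.body.innerHTML = .responseText
    End With

    Set spans = objHtmlDoc.getElementsByTagName("span")
    For i = 0 To spans.Length - 1
        'getAttribute returns Null when the attribute is missing,
        'which the If treats as False
        If spans.Item(i).getAttribute("itemprop") = "streetAddress" Then
            MsgBox Trim$(spans.Item(i).innerText)
            Exit For
        End If
    Next i
End Sub
```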

Resources