I need to get the URL of the image; I don't need the image itself, just the URL. But I can't get it to work.
Here's the site: https://www.bauhaus.com.tr/hirdavat-hirdavat-urunleri-menteseler-mobilya-menteseleri/adilon-karyola-demiri-61185746
I have no experience in coding, so if there's an easier way please share it.
Here's the code:
Sub GetProductData()
    Dim i As Integer, sonsat As Integer
    Dim url As String
    Dim XMLreq As New MSXML2.XMLHTTP60
    Dim HTMLdoc As New MSHTML.HTMLDocument

    sonsat = Sheets("Sayfa1").Range("A10000").End(xlUp).Row
    For i = 2 To sonsat
        On Error Resume Next
        url = Sheets("Sayfa1").Range("A" & i)
        XMLreq.Open "GET", url, False
        XMLreq.send
        If XMLreq.Status <> 200 Then
            MsgBox "Could not reach the page"
            Exit Sub
        End If
        HTMLdoc.body.innerHTML = XMLreq.responseText
        Sheets("Sayfa1").Range("C" & i) = HTMLdoc.getElementsByClassName("title")(0).innerText
        Sheets("Sayfa1").Range("B" & i) = HTMLdoc.getElementsByClassName("title sub")(0).innerText
        Sheets("Sayfa1").Range("D" & i) = HTMLdoc.getElementsByClassName("proAttr sku")(0).innerText
        Sheets("Sayfa1").Range("E" & i) = HTMLdoc.getElementsByClassName("item")(391).innerText
    Next
End Sub
Your code is about right, but it turns out the image is inserted after page load, probably by a JavaScript component. So when you get the document, it does not contain the image lightbox.
However, the image is referenced in the page metadata as
<html>
<head>
...
<meta property="og:image" content="https://...">
...
</head>
...
</html>
So you can pull it from there.
Also some advice about clean coding:
Since you're running in a loop, the HTML document and XMLHTTP request objects are better created fresh on each iteration (create them with New MSXML2.XMLHTTP60 and New MSHTML.HTMLDocument inside the loop)
The HTMLDocument should be closed after use (HTMLdoc.Close)
So here's a piece of code very close to your final requirements (I just changed the loop so I didn't have to re-create your Excel workbook on my side):
Sub testHtml()
    Dim i As Integer, j As Integer
    Dim url As String
    Dim XMLreq As MSXML2.XMLHTTP60
    Dim HTMLdoc As MSHTML.HTMLDocument
    Dim els As Variant 'DispHTMLElementCollection
    Dim meta As MSHTML.HTMLMetaElement

    'Replace the following loop with your own (looping over the worksheet column)
    For i = 1 To 3
        'You should actually read this url from your worksheet
        url = "https://www.bauhaus.com.tr/hirdavat-hirdavat-urunleri-menteseler-mobilya-menteseleri/adilon-karyola-demiri-61185746"

        Set XMLreq = New MSXML2.XMLHTTP60
        XMLreq.Open "GET", url, False
        XMLreq.send
        If XMLreq.Status <> 200 Then
            MsgBox "Error!"
            Exit Sub
        End If

        Set HTMLdoc = New MSHTML.HTMLDocument
        HTMLdoc.body.innerHTML = XMLreq.responseText

        Set els = HTMLdoc.getElementsByTagName("meta")
        'Debug.Print els.Length
        For j = 0 To els.Length - 1
            Set meta = els(j)
            'Debug.Print meta.outerHTML
            If meta.getAttribute("property") = "og:image" Then
                Debug.Print meta.Content 'Output to the Immediate Window
                'You should output to your worksheet instead
                Exit For
            End If
        Next j

        HTMLdoc.Close
    Next i
End Sub
And a final note: to see the output of Debug.Print, open the Immediate Window (Ctrl+G) in the VBA editor.
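If it helps, here is a minimal sketch of how the og:image lookup could be folded back into your original worksheet loop. It assumes the same sheet and column layout as your code, with the image URL going to column E; the Sub name and the column choice are just placeholders:

Sub GetImageUrls()
    'Minimal sketch, assuming URLs in column A of "Sayfa1" and image URLs written to column E
    Dim i As Long, j As Long, sonsat As Long
    Dim XMLreq As MSXML2.XMLHTTP60
    Dim HTMLdoc As MSHTML.HTMLDocument
    Dim metas As Variant

    sonsat = Sheets("Sayfa1").Range("A10000").End(xlUp).Row
    For i = 2 To sonsat
        Set XMLreq = New MSXML2.XMLHTTP60
        XMLreq.Open "GET", Sheets("Sayfa1").Range("A" & i).Value, False
        XMLreq.send
        If XMLreq.Status = 200 Then
            Set HTMLdoc = New MSHTML.HTMLDocument
            HTMLdoc.body.innerHTML = XMLreq.responseText
            Set metas = HTMLdoc.getElementsByTagName("meta")
            For j = 0 To metas.Length - 1
                'the & "" guards against a Null return when the attribute is missing
                If metas(j).getAttribute("property") & "" = "og:image" Then
                    Sheets("Sayfa1").Range("E" & i).Value = metas(j).getAttribute("content")
                    Exit For
                End If
            Next j
            HTMLdoc.Close
        End If
    Next i
End Sub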
There are 15 objects listed on this website, and each has a link under the photo. The 6th object has none. When extracting and transferring the content with my code, the missing href is not skipped: in Excel, 14 hrefs are listed directly below each other (the 6th cell should stay empty or say "no document"), so the last cell ends up empty instead (and an error occurs because 14 items are mapped onto 15 rows). Unfortunately I have to keep my code structure and just need a loop or condition to complete it. Does anyone have any ideas? Thanks.
My incomplete code:
Public Sub GetData()
    Dim html As New HTMLDocument
    Dim elmt01 As Object, elmt02 As Object
    Dim y As Long
    Dim xURL As String

    Set html = New MSHTML.HTMLDocument
    xURL = "https://immobilienpool.de/suche/immobilien?page=1"
    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", xURL, False
        .send
        html.body.innerHTML = .responseText
    End With

    Set elmt01 = html.querySelectorAll("li[class*='contentBox']")    '15 items
    Set elmt02 = html.querySelectorAll("li a[title*='zusätzliche']") '14 hrefs

    For y = 0 To elmt01.Length - 1
        If InStr(elmt02, "pdf") Then 'better: If elmt02 exists in elmt01 then...
            ActiveSheet.Cells(y + 1, 2) = elmt02.Item(y).href
        Else
            ActiveSheet.Cells(y + 1, 2) = "No document"
        End If
    Next
End Sub
The following script should solve the issue you are having. I had to modify your code so the listing without a link is handled instead of shifting the remaining hrefs up a row. I hope you can work with the current version:
Public Sub GetData()
    Dim Html As HTMLDocument, HTMLDoc As HTMLDocument
    Dim oPdfLink As Object, xURL As String, I As Long

    Set Html = New MSHTML.HTMLDocument
    Set HTMLDoc = New MSHTML.HTMLDocument
    xURL = "https://immobilienpool.de/suche/immobilien?page=1"

    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", xURL, False
        .send
        Html.body.innerHTML = .responseText
    End With

    With Html.querySelectorAll("li[class*='contentBox']")
        For I = 0 To .Length - 1
            HTMLDoc.body.innerHTML = .item(I).outerHTML
            Set oPdfLink = HTMLDoc.querySelector("a[title*='zusätzliche']")
            If Not oPdfLink Is Nothing Then
                ActiveSheet.Cells(I + 1, 2) = oPdfLink.href
            Else
                ActiveSheet.Cells(I + 1, 2) = "No document"
            End If
        Next I
    End With
End Sub
I am trying to use MSXML2 and IHTMLDocument to deal with the iframe part of an HTML web page.
I want to use MSXML2 and save the response so I can better "capture" the data, thinking it's faster than using InternetExplorer or the Selenium reference available from the VBA references menu.
(I want to avoid using IE or Selenium as much as possible.)
But I couldn't find out how to save the document in XML format (to take advantage of its speed) and at the same time click on an element in the document without the help of a browser (IE or Selenium).
And even after clicking a tab (id="cns_Tab21") on this web page, I have difficulty retrieving the data.
So my questions are:
1> Is it possible to minimize the use of a browser for clicking?
2> Even after clicking (using Selenium), it throws an XPath-related error in the VBA editor.
Thank you in advance. The URL used for this is
http://bitly.kr/finance
and the iframe inside the link is http://bitly.kr/LT0aCb
'I declared objects
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim iframeDoc As IHTMLDocument

'and saved the XML data to HTML format
HTMLDoc.body.innerHTML = XMLReq.responseText

'and am trying to save this HTML to the iframe...
Set iframeDoc = HTMLDoc.getElementById("coinfo_cp")
'I tried .contentDocument, but maybe HTMLDoc doesn't have this property.

And I don't know how to access the information I saved to iframeDoc above.

'And after I use Selenium I can't figure out why it throws an error
For Each ele In selenium.FindElementsByTag("th")
    If ele.Attribute("innerText") = "CAPEX" Then
        Debug.Print ele.FindElementsByXPath("./../td").Attribute("innerText")
    End If
Next
This post isn't a duplicate, since I am trying to use MSXML2 to handle the iframe element without the InternetExplorer reference (ie.document) in Excel VBA.
You can replicate the xhr request the page makes when that tab (not an iframe) is selected. I use the clipboard to copy the table to Excel. Note: the url I am using is from our discussions; this info should be reflected in the question.
Option Explicit

Public Sub GetTable()
    'VBE > Tools > References > Microsoft HTML Object Library
    Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object

    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'New DataObject
    clipboard.SetText hTable.outerHTML
    clipboard.PutInClipboard
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
End Sub
You can find the params of the ajax url for the tab content update, along with the target for the update, in the scripts of the page. The following needs tidying up:
Option Explicit

Public Sub GetTable()
    'https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930
    'VBE > Tools > References > Microsoft HTML Object Library
    Dim html As HTMLDocument, hTable As HTMLTable, clipboard As Object, ws As Worksheet

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://navercomp.wisereport.co.kr/v2/company/ajax/cF1001.aspx?cmp_cd=005930&fin_typ=0&freq_typ=Y&encparam=ZXR1cWFjeGJnS1lWOHhCYmNScmJXUT09&id=bG05RlB6cn", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Set hTable = html.querySelector(".hbG05RlB6cn + .gHead01") '2nd tab. CAPEX row

    Dim html2 As HTMLDocument, i As Long
    Set html2 = New HTMLDocument
    html2.body.innerHTML = hTable.outerHTML

    Dim tableBodyRows As Object, tableBodyRowLength As Long, tableHeaderRowLength As Long, targetRow As Long
    Set tableBodyRows = html2.querySelectorAll("tbody tr .bg")
    tableBodyRowLength = tableBodyRows.Length
    tableHeaderRowLength = html2.querySelectorAll("thead tr").Length + 2

    For i = 0 To tableBodyRowLength - 1
        If Trim$(tableBodyRows.item(i).innerText) = "CAPEX" Then
            targetRow = i + tableHeaderRowLength + 1
            Exit For
        End If
    Next

    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'New DataObject
    clipboard.SetText hTable.outerHTML
    clipboard.PutInClipboard
    ws.Cells(1, 1).PasteSpecial

    Dim unionRng As Range
    For i = (tableHeaderRowLength + 1) To (tableBodyRowLength + tableHeaderRowLength)
        If i <> targetRow Then
            If Not unionRng Is Nothing Then
                Set unionRng = Union(ws.Rows(i), unionRng)
            Else
                Set unionRng = ws.Rows(i)
            End If
        End If
    Next

    If Not unionRng Is Nothing Then unionRng.Delete
End Sub
I would like to scrape a product price from a single website page with an XML HTTP request. But before this script runs, the correct store needs to have been selected first (saved in a browser cookie, or included in the request in some other way if possible), since prices differ between shops.
I have created working code, but it takes a very long time to run, so I assume there must be a faster and cleaner :) way. I also needed to make the application wait for the website between the steps.
My current VBA code:
opens the website in IE and, in multiple clicks, selects the desired store and saves it in a cookie (like a site user would do)
next, the product page is requested with another IE request and the data is extracted. I found out I can't use the XML HTTP request, because it won't send the cookie value with the correct store, which is what makes the correct price display.
The price I'm after (in the example below) is €1,39 instead of €1,48 (the price shown when no cookie value is used and no store is selected).
The cookie value is saved in the cookie "www.jumbo.com/cookie/HomeStore"; its content holds the store tag, which is known upfront and could be hardcoded in a request if possible.
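To illustrate what I mean by hard-coding it, something along the lines of the sketch below is what I had in mind. This is untested and based on assumptions: that the site accepts a manually set Cookie header (ServerXMLHTTP is used because, as far as I know, the plain XMLHTTP object manages cookies on its own), and that "HomeStore=<store tag>" is really the name/value format the site expects:

Sub GetJumboPriceWithCookie()
    'Untested sketch: hard-code the store cookie on the request itself.
    'Assumptions: the cookie is literally named HomeStore and its value is the
    'store tag; the header format below is a guess, not verified against the site.
    Dim XMLReq As New MSXML2.ServerXMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim JumPrice As MSHTML.IHTMLElement
    Dim SKU_tag As String, SKU_url As String, StoreTag As String

    SKU_tag = "173140KST"
    SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"
    StoreTag = "YC8KYx4XB88AAAFIDcIYwKxJ" 'Oosterhout

    XMLReq.Open "GET", SKU_url, False
    XMLReq.setRequestHeader "Cookie", "HomeStore=" & StoreTag
    XMLReq.send
    If XMLReq.Status <> 200 Then Exit Sub

    HTMLDoc.body.innerHTML = XMLReq.responseText
    Set JumPrice = HTMLDoc.getElementById("PriceInCents_" & SKU_tag)
    If Not JumPrice Is Nothing Then Debug.Print JumPrice.getAttribute("value") / 100
End Sub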
Selecting the correct store (and saving it in a browser cookie)
Sub SetStore()
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLSearchbox As MSHTML.IHTMLElement
    Dim HTMLSearchboxes As MSHTML.IHTMLElementCollection
    Dim HTMLButton As MSHTML.IHTMLElement
    Dim HTMLButtons As MSHTML.IHTMLElementCollection
    Dim HTMLSearchButton As MSHTML.IHTMLElement
    Dim HTMLSearchButtons As MSHTML.IHTMLElementCollection
    Dim HTMLStoreID As MSHTML.IHTMLElement
    Dim HTMLStoreIDs As MSHTML.IHTMLElementCollection
    Dim HTMLSaveStore As MSHTML.IHTMLElement
    Dim HTMLSaveStores As MSHTML.IHTMLElementCollection

    'set on False to hide IE screen
    IE.Visible = True

    'navigate to url with limited content
    IE.navigate "https://www.jumbo.com/content/algemene-voorwaarden/"
    Do While IE.readyState <> READYSTATE_COMPLETE
    Loop
    Set HTMLDoc = IE.document

    Set HTMLButtons = HTMLDoc.getElementsByTagName("button")
    For Each HTMLButton In HTMLButtons
        If HTMLButton.getAttribute("data-jum-action") = "openHomeStoreFinder" Then
            HTMLButton.Click
            Exit For
        End If
    Next HTMLButton
    Application.Wait Now + #12:00:02 AM#

    Set HTMLSearchboxes = HTMLDoc.getElementsByTagName("input")
    For Each HTMLSearchbox In HTMLSearchboxes
        If HTMLSearchbox.getAttribute("id") = "searchTerm__DkKYx4XylsAAAFJktpb2Guy" Then
            'input field store name/location to show search results
            HTMLSearchbox.Value = "Oosterhout"
            Application.Wait Now + #12:00:03 AM#
            HTMLSearchbox.Click
            Exit For
        End If
    Next HTMLSearchbox

    Set HTMLSearchButtons = HTMLDoc.getElementsByTagName("button")
    For Each HTMLSearchButton In HTMLSearchButtons
        If HTMLSearchButton.getAttribute("data-jum-filter") = "search" Then
            HTMLSearchButton.Click
            Exit For
        End If
    Next HTMLSearchButton
    Application.Wait Now + #12:00:05 AM#

    Set HTMLStoreIDs = HTMLDoc.getElementsByTagName("li")
    For Each HTMLStoreID In HTMLStoreIDs
        'oosterhout = YC8KYx4XB88AAAFIDcIYwKxJ
        'nieuwegein = 84IKYx4XziUAAAFInSYYwKrH
        'vaassen = JYYKYx4XC1oAAAFItvcYwKxJ
        'brielle = OG8KYx4XP4wAAAFIlsEYwKxK
        If HTMLStoreID.getAttribute("data-jum-store-id") = "YC8KYx4XB88AAAFIDcIYwKxJ" Then
            HTMLStoreID.Click
            Application.Wait Now + #12:00:03 AM#
            Exit For
        End If
    Next HTMLStoreID

    Set HTMLSaveStores = HTMLDoc.getElementsByTagName("button")
    For Each HTMLSaveStore In HTMLSaveStores
        If HTMLSaveStore.getAttribute("data-jum-action") = "saveHomeStore" Then
            HTMLSaveStore.Click
            Exit For
        End If
    Next HTMLSaveStore

    'IE.Quit
End Sub
Extracting data from product page (IE HTTP request, working with cookie store value)
Sub GetJumboPriceIE()
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim JumInputs As MSHTML.IHTMLElementCollection
    Dim JumInput As MSHTML.IHTMLElement
    Dim JumPrice As MSHTML.IHTMLElement
    Dim JumboPrice As Double
    Dim Price_In_Cents_Tag As String
    Dim SKU_tag As String, SKU_url As String

    SKU_tag = "173140KST"
    SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"

    IE.Visible = False
    IE.navigate SKU_url
    Do While IE.readyState <> READYSTATE_COMPLETE
    Loop
    Set HTMLDoc = IE.document
    IE.Quit

    Set JumInputs = HTMLDoc.getElementsByTagName("input")
    Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
    Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)
    JumboPrice = JumPrice.getAttribute("value") / 100
    Debug.Print JumboPrice
End Sub
The code above works (the price of 1,39 is printed), but I would like to use XML HTTP request code like the version below, just with the correct store.
Extracting data from product page (using XML HTTP request), but cookie value is not used
Sub GetJumboPriceXML()
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim JumInputs As MSHTML.IHTMLElementCollection
    Dim JumInput As MSHTML.IHTMLElement
    Dim JumPrice As MSHTML.IHTMLElement
    Dim JumboPrice As Double
    Dim Price_In_Cents_Tag As String
    Dim SKU_tag As String, SKU_url As String

    SKU_tag = "173140KST"
    SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"

    XMLReq.Open "GET", SKU_url, False
    XMLReq.send
    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If

    HTMLDoc.body.innerHTML = XMLReq.responseText
    Set JumInputs = HTMLDoc.getElementsByTagName("input")
    Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
    Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)
    JumboPrice = JumPrice.getAttribute("value") / 100
    Debug.Print JumboPrice
End Sub
This code does not use the correct store and outputs a price I'm not after (1,48 is printed).
To summarize:
When no store is selected (no cookie set), the product URL above gives a price of €1,48.
I would like the VBA script to set the store to "Jumbo Oosterhout Nieuwe Bouwlingstraat" and then scrape a predefined list of product URLs and extract the prices (the URL above then gives €1,39).
Then set the store to a different local store, "Jumbo Brielle Thoelaverweg", and scrape the identical list of product URLs. The URL above then gives €1,48.
You can select a different store by clicking on the location pin icon at the top right of the page.
Thanks a lot for your help
<span class="export excel">Excel </span>|
How can I copy the href in this HTML via Excel VBA?
This is my code, but it doesn't work.
Set Export = ie.Document.all("export excel")
URL = Export.href
ie.Navigate URL
You should walk through this code in debug mode ... it's more for instructional use than for production, but it will do what you want. I used Google as a test site.
Sub Test()
    Dim Browser As SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim Link As String, Target As Object

    Link = "http://www.google.com"

    ' start browser
    Set Browser = New SHDocVw.InternetExplorer
    Browser.Visible = True
    ' wait a bit
    Browser.Navigate Link
    ' wait a bit
    Set HTMLDoc = Browser.Document
    ' wait a bit

    Set Target = GetElementByTagAndClassName(HTMLDoc, "SPAN", "gbtb2")
    If Not (Target Is Nothing) Then
        ' test here if the parent really is an <a>
        Debug.Print Target.parentElement.href
        ' ta-taaaa!!!
    End If
End Sub

' get element by tag and class name
Function GetElementByTagAndClassName(Doc As MSHTML.HTMLDocument, ByVal Tag As String, ByVal Match As String) As MSHTML.IHTMLElement
    Dim ECol As MSHTML.IHTMLElementCollection
    Dim IFld As MSHTML.IHTMLElement

    Set GetElementByTagAndClassName = Nothing
    Set ECol = Doc.getElementsByTagName(Tag)
    For Each IFld In ECol
        ' Debug.Print IFld.className
        If IFld.className = Match Then
            Set GetElementByTagAndClassName = IFld
            Exit Function
        End If
    Next
End Function
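Applied to your own page, the same helper might be used like the sketch below. This is untested: it assumes the span's className really is "export excel" and that the href sits on a parent <a> element, which I can't verify from your snippet:

Sub GetExportHref(ByVal ie As SHDocVw.InternetExplorer)
    'Hedged sketch: reuse the helper above on the page already loaded in ie (untested)
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim Target As Object
    Dim URL As String

    Set HTMLDoc = ie.Document
    Set Target = GetElementByTagAndClassName(HTMLDoc, "SPAN", "export excel")
    If Not Target Is Nothing Then
        URL = Target.parentElement.href 'assumes the span sits inside an <a> tag
        ie.Navigate URL
    End If
End Sub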
This question already has an answer here:
Get data from listings on a website to excel VBA
(1 answer)
Closed 9 years ago.
<span itemprop="streetAddress">
94 Grand St
</span>
How can I get this data through a getElementBy* method in Excel VBA?
I have tried getElementById, getElementsByName, etc., but nothing is working.
Option Explicit

Sub find()
    'Uses late binding, or add a reference to Microsoft HTML Object Library
    ' and change variable Types to use intellisense
    Dim ie As Object       'InternetExplorer.Application
    Dim html As Object     'HTMLDocument
    Dim Listings As Object 'IHTMLElementCollection
    Dim l As Object        'IHTMLElement
    Dim r As Long

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = False ' Don't show window
        .Navigate "http://www.yelp.com/biz/if-boutique-new-york#query:boutique"
        'Wait until IE is done loading page
        Do While .readyState <> 4
            Application.StatusBar = "Downloading information, Please wait..."
            DoEvents
        Loop
        Set html = .Document
    End With

    Set Listings = html.getElementsByTagName("span") ' ## returns the list
    MsgBox (Listings(0))
    For Each l In Listings
        '## make sure this list item looks like the listings Div Class:
        ' then, build the string to put in your cell
        Range("A1").Offset(r, 0).Value = l.innerText
        r = r + 1
    Next

    Set html = Nothing
    Set ie = Nothing
End Sub
I use the above program to get the innerText value inside the span tag, but it's not working.
For the single result you are looking at in detail, you want to use these two lines in your code (there is only one listing at the detailed level).
Adapt your IE code
Set Listings = html.getElementById("bizInfoBody") ' ## returns the list
Range("A1").Offset(r, 0).Value = Listings.innerText
with XMLHTTP
Sub GetTxt()
    Dim objXmlHTTP As Object
    Dim objHtmlDoc As Object
    Dim objHtmlBody As Object
    Dim objTbl As Object
    Dim strResponse As String
    Dim strSite As String

    Set objHtmlDoc = CreateObject("htmlfile")
    Set objHtmlBody = objHtmlDoc.body
    Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")

    strSite = "http://www.yelp.com/biz/if-boutique-new-york"
    With objXmlHTTP
        .Open "GET", strSite, False
        .Send
        If .Status = 200 Then
            strResponse = .responseText
            objHtmlBody.innerHTML = objXmlHTTP.responseText
            Set objTbl = objHtmlBody.Document.getElementById("bizInfoBody")
            MsgBox objTbl.innerText
        End If
    End With
End Sub
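If you only need the street address itself rather than the whole bizInfoBody block, a more targeted lookup may work. This is a hedged sketch, assuming the page still contains the span with itemprop="streetAddress" and that your MSHTML version supports querySelector (it needs a reference to Microsoft HTML Object Library):

Sub GetStreetAddress()
    'Hedged sketch: pull just the <span itemprop="streetAddress"> text (untested)
    'VBE > Tools > References > Microsoft HTML Object Library
    Dim html As MSHTML.HTMLDocument
    Dim addr As Object

    Set html = New MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.yelp.com/biz/if-boutique-new-york", False
        .send
        If .Status <> 200 Then Exit Sub
        html.body.innerHTML = .responseText
    End With

    Set addr = html.querySelector("span[itemprop='streetAddress']")
    If Not addr Is Nothing Then MsgBox Trim(addr.innerText)
End Sub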