VBA Webscrape URL from HTML (src="") - excel

I tried to combine pieces of code that I could get to work. It works with <span> and <meta>, but not with <img>.
Can anyone help me make it work?
I am trying to get:
https://www.lego.com/cdn/cs/set/assets/blt34360a0ffaff7811/11015_alt.png?fit=bounds&format=png&width=800&height=800&dpr=1
From this code:
<img src="https://www.lego.com/cdn/cs/set/assets/blt34360a0ffaff7811/11015_alt.png?fit=bounds&format=png&width=800&height=800&dpr=1" alt="" class="Imagestyles__Img-sc-1qqdbhr-0 cajeby">
This is the part of the code where I want to get the src URL:
Sub picgrab()
Dim Doc As Object
Dim nodeAllPic As Object
Dim nodeOnePic As Object
Dim pic As Object
Set Doc = CreateObject("htmlFile")
With CreateObject("MSXML2.XMLHTTP.6.0")
url = "https://www.lego.com/hu-hu/product/around-the-world-11015"
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
.send
' It is important that i can't use InternetExplorer.
'This should work i guess, but it skips after 'For Each' line.
Set nodeAllPic = Doc.getElementsByClassName("Imagestyles__Img-sc-1qqdbhr-0 cajeby")
For Each nodeOnePic In nodeAllPic
If nodeOnePic.getAttribute("class") = "Imagestyles__Img-sc-1qqdbhr-0 cajeby" Then
Set pic = nodeOneVip.getElementsByClassName("Imagestyles__Img-sc-1qqdbhr-0 cajeby")(0)
ActiveCell.Value = pic.getAttribute("src")
End If
Next nodeOnePic
End With
End Sub
I tried the code above and modified it in many ways, but couldn't get the content of src="".

Need to write the response
First of all, you never write the HTML response to your htmlfile object. So you won't be able to find anything when you call the method getElementsByClassName on it.
Make sure that you include the following line before trying to use the Doc object:
Doc.Write .responseText
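Put together, a minimal sketch of the corrected request could look like the following. The sub name is hypothetical, and walking the <img> tags with getElementsByTagName is my own substitution: depending on the document mode, a late-bound htmlfile object may not expose getElementsByClassName. Note that this only finds elements that are present in the static HTML, which is the subject of the next section.

```vba
Sub PicGrabStatic()
    Dim Doc As Object
    Dim img As Object
    Dim url As String

    url = "https://www.lego.com/hu-hu/product/around-the-world-11015"
    Set Doc = CreateObject("htmlfile")

    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
        .send
        ' Load the response into the document before querying it
        Doc.Write .responseText
    End With

    ' Print the class and src of every <img> found in the static HTML
    For Each img In Doc.getElementsByTagName("img")
        Debug.Print img.className, img.getAttribute("src")
    Next img
End Sub
```

If the image you are after does not show up in the Immediate window, it is most likely not part of the original response.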
Dynamic Content
Secondly, some of the content on that page is not in the original HTTP request that XMLHTTP receives. The page contains JavaScript code that loads content dynamically.
To test this in Chrome, you can open the Chrome DevTools window on that page, then disable JavaScript and refresh the page.
You'll then see the original HTML and a notification that says that JavaScript is disabled.
And now, if you search inside the Elements tab, you won't find the element you were looking for (at least I couldn't find anything with a class "cajeby").
Browser emulation
So, now what? Well, you'll need something that actually executes the page's JavaScript, which in practice means driving a real browser from VBA. For that you could use Selenium. It's the modern way of doing web scraping or any browser automation with VBA.
You can easily find tutorials on how to get started with Selenium for VBA, but I would recommend this video by WiseOwlTutorials.
Then your code could look like this:
Dim Browser As New Selenium.WebDriver
Browser.Start "chrome", "https://www.lego.com/hu-hu/product/around-the-world-11015"
Browser.Get "/"
Dim img As WebElement
Set img = Browser.FindElementByCss(".Imagestyles__Img-sc-1qqdbhr-0.cajeby", timeout:=5000)
Debug.Print img.Attribute("src")
Set Browser = Nothing
Some notes on the code
Make sure that you have included a reference to the Selenium Library
Notice the use of FindElementByCss. This is necessary because you are matching on 2 class names and no other method currently supports that. It also means you need to use CSS selector syntax, with each class name prefixed by a dot. (More about this here).
Notice the use of timeout:=5000 that lets Selenium know that you are willing to wait up to 5000 milliseconds for the JavaScript code to load the content you are looking for (More details here).

Related

Replacing IE Bits with Edge in VBA

To prepare for the eventual 'going away' of IE11, I've been trying to figure out how to replace a couple parts of my code. One involves launching IE and using that browser to scrape some pages. Is there an equivalent way to do the below in Edge? I don't see a way to add a reference to the Edge libraries like I did with 'Microsoft Internet Objects' and IE11.
Dim ie As InternetExplorerMedium: Set ie = New InternetExplorerMedium
Dim html As HTMLDocument
With ie
.Visible = False
.Navigate website 'string that's created above this code
End With
Do While ie.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Application.Wait Now + #12:00:10 AM#
Set html = ie.Document
Thanks everyone for your help.
Ok, a few explanations. I am writing these as a reply so as not to have to split them into several comments.
Does Edge work instead of IE to do web scraping with VBA?
It does not work directly. The reason is that IE has a COM interface (Wikipedia: Component Object Model). No other browser has this interface. Not even Edge.
However, there is a Selenium web driver for Edge, and it is even provided directly by Microsoft.
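As a rough sketch of what that could look like with the SeleniumBasic wrapper: the code below assumes SeleniumBasic is installed, that a reference to the Selenium Type Library is set, and that an Edge web driver matching your Edge version is placed where SeleniumBasic can find it. The sub name and the URL are placeholders, not part of the original question.

```vba
Sub ScrapeWithEdge()
    ' Assumes a reference to the Selenium Type Library (SeleniumBasic)
    ' and an Edge web driver that SeleniumBasic can find
    Dim driver As New Selenium.EdgeDriver
    Dim html As Object
    Dim website As String

    website = "https://example.com"   ' placeholder URL

    driver.Get website

    ' Copy the rendered page into an HTML document,
    ' similar to what "Set html = ie.Document" provided with IE
    Set html = CreateObject("htmlfile")
    html.Write driver.PageSource

    Debug.Print html.Title
    driver.Quit
End Sub
```

Once the page source is in an HTML document, most of the existing parsing code written against ie.Document can probably be reused.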
Another alternative - xhr
Since you can't use Selenium because you don't have admin rights, there might be the possibility to use xhr (XML HTTP Request). However, in order to make a statement on this, we would have to know the page that you want to scrape.
Xhr can be used directly from VBA because it does not use a browser. The big limitation is that only static content can be processed. No JavaScript is executed, so nothing is reloaded or generated dynamically in any other way. On the other hand, this option is much faster than browser solutions. Often, a static file provided by the web server is sufficient. This can be an HTML file, a JSON or another data exchange format.
There are many examples of using xhr with VBA here on SO. Take note of the possibility first as another approach. I can't explain the method exhaustively here, also because I don't know everything about it myself. But there are many ways to use it.
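To give an idea of the approach, here is a minimal sketch of an xhr request in VBA. The function name, the sub name and the URL are placeholders of mine. It only returns what the server sends for that one request, so anything generated by JavaScript will be missing.

```vba
Function FetchStatic(ByVal url As String) As String
    ' Returns the raw response body, or an empty string on a non-200 status
    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", url, False        ' False = synchronous request
        .send
        If .Status = 200 Then FetchStatic = .responseText
    End With
End Function

Sub DemoFetchStatic()
    Dim body As String
    body = FetchStatic("https://example.com")   ' placeholder URL
    Debug.Print Len(body) & " characters received"
    Debug.Print Left$(body, 200)
End Sub
```

No reference and no installation is needed for this, because the object is created with late binding.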
By the way
IE will finally be discontinued in June 2022 and will then also no longer be delivered with Windows. That's what I read on the German IT pages a few days ago. But there are already massive restrictions on the use of IE.

How to get the content of the entire webpage

I have been using the attached code in my VBA macro to get a webpage content for a couple of years.
Recently a new functionality was needed based on the information on the webpage.
I found out that I can see the information I need when I press Ctrl+Shift+I in Chrome and copy the top element, but it is not present in what I'm getting through the code.
What do I need to change in the code to get the whole page? The equivalent of Shift-Ctrl-I Copy Element.
Set Request = CreateObject("MSXML2.XMLHTTP")
Request.Open "GET", ZadanieRef, False
Request.setRequestHeader "If-Modified-Since", "Tue, 1 Jan 2019 00:00:00 GMT"
Request.send
response = StrConv(Request.responseBody, vbUnicode)
I have looked, and this is what I found: "vba Open URL in Chrome including login and password"
The answer was:
Chrome does not have a library in VBA, so there is no way to manipulate chrome like IE. You can only open a webpage direct via Shell, but not manipulate further.
If you know otherwise please provide a link.

Screenshotting Google Maps and Pasting into Excel Document VBA [duplicate]

I have a code that already searches for the latitude and longitude and pastes to my worksheet, which works perfectly. I'm looking for a way to take that latitude and longitude, load google maps, and either take a screenshot of the google maps page or embed the map into Excel.
In my code below I already load Google Maps for any input address, but I do not know how to either take a screenshot of the map (preferably without the input information on the side of the page) or embed the map into Excel. The extra code at the bottom is a request/response for a USGS website that pulls official seismic information for a location, but it should not affect the top part of the code.
Please note that I want this to just be a static screenshot of the map if possible. I do not want to install Google Earth on multiple desktops to be able to embed an interactive map into the worksheet if at all possible.
Option Explicit
Public Sub Seismicgrab()
Dim browser As New ChromeDriver
Dim URL As String
Dim ws As Object
Dim xmlhttp As New MSXML2.XMLHTTP60
browser.Get "http://www.google.com/maps?q=" & Range("H13").Value
browser.Wait 5000
Cells(19, 13).Value = browser.URL
browser.Close
URL = Range("M24").Value
xmlhttp.Open "GET", URL, False
xmlhttp.Send
Worksheets("Title").Range("M25").Value = xmlhttp.responseText
End Sub
You can use the TakeScreenshot method of the browser object:
browser.TakeScreenshot.SaveAs ".....jpg" '<== put your path and file name here
For more flexibility, e.g. cropping, consider switching languages and using any of the methods described here:
How to capture the screenshot of a specific element rather than entire page using Selenium Webdriver?
Additionally, I believe there are ways to take a screenshot and then crop the image using standard VBA and Windows API calls.
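As a sketch of how the screenshot could end up in the worksheet, the sub below saves the image to a temporary file and then inserts it as a picture. The sub name, the file name and the target cell (M26) are assumptions of mine. It expects the same Selenium reference as the code in the question.

```vba
Option Explicit
Public Sub MapScreenshotToSheet()
    Dim browser As New ChromeDriver
    Dim imgPath As String

    ' Hypothetical temporary file for the screenshot
    imgPath = Environ$("TEMP") & "\map_screenshot.jpg"

    browser.Get "http://www.google.com/maps?q=" & Range("H13").Value
    browser.Wait 5000
    browser.TakeScreenshot.SaveAs imgPath
    browser.Quit

    ' Embed the saved image at cell M26; -1 keeps the original width and height
    ActiveSheet.Shapes.AddPicture imgPath, msoFalse, msoTrue, _
        Range("M26").Left, Range("M26").Top, -1, -1
End Sub
```

Because the picture is saved with the document (SaveWithDocument is msoTrue), the workbook does not depend on the temporary file afterwards.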

Access web page body text using VBA & Selenium

I am trying to convert an Excel macro that currently uses Internet Explorer and uses the following line of code to extract the web page's <body> text:
x = .Document.DocumentElement.InnerText
Using the Selenium demo, I am able to produce a jpg of the page with Chrome & IE, but Firefox just loads a blank page and IE64 & Edge don’t work on Windows 10.
I have been unable to find the proper VBA command with Selenium to copy the body text to variable ”x”. I only want to read it.
I am trying to do this to make my macro browser independent.
The macro is for my use only.
Jim
You are not making it browser agnostic. You are simply widening the choice of browser to those supported via selenium basic. This brings some problems of its own which you are noticing.
Folders containing the drivers must be on the environmental path or the path passed to selenium webdriver as an argument.
You should use the latest Chrome browser and Chrome driver
You cannot use the latest Firefox browser and driver; they are not supported. I think you need Firefox v46.0.1.
If using IE then zoom must be to 100%.
I suggest browsing the issues pages on GitHub for further known issues.
Anecdotally, I have heard of problems with Windows 10 and Selenium Basic. I would be interested to know whether anyone has got this working, as I am not on that version.
Review the examples.xlsm provided on the Selenium Basic GitHub site to see which other browsers are supported (e.g. Opera, PhantomJS, FirefoxLight, CEF).
With Chrome you can get the body text with this:
Option Explicit
Public Sub GetInfo()
Dim d As WebDriver, s As String
Set d = New ChromeDriver
Const URL = "https://www.neutrinoapi.com/api/api-examples/python/"
With d
.Start "Chrome"
.get URL
s = .FindElementByTag("body").Text
Debug.Print s
.Quit
End With
End Sub
Other info: https://stackoverflow.com/a/52294259/6241235

VBA Excel Download webpage complete

I'm trying to download a complete webpage. In other words automate this process:
1- Open the webpage
2- Click on Save as
3- Select Complete
4- Close the webpage.
This is what I've got so far:
URL = "google.com" 'for TEST
Dim IE
Set IE = CreateObject("Internetexplorer.Application")
IE.Visible = False
IE.Navigate URL
Do
Loop While IE.Busy = True
Dim i
Dim Filename
i = 0
Filename = "C:\Test.htm"
IE.Document.ExecCommand "SaveAs", False, Filename
When I run the code in the last line a save file dialog appears. Is there any way to suppress this?
Any help would be most appreciated.
The Save As dialog cannot be suppressed:
The Save HTML Document dialog cannot be suppressed when calling this method from script.
It is also a modal dialog, and you cannot automate clicking the "Save" button. VBA execution pauses while waiting for manual user input when faced with a dialog of this sort.
Rather than using the IE.Document.ExecCommand method, you could try to read the page's HTML and print that to a file using standard I/O functions.
Option Explicit
Sub SaveHTML()
    Dim URL As String
    Dim IE As Object
    Dim FileName As String
    Dim FF As Integer
    URL = "http://google.com" 'for TEST
    FileName = "C:\Test.htm"
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate URL
    Do
        DoEvents 'keep Excel responsive while waiting
    Loop While IE.Busy
    'Open ... For Output creates the file as specified
    ' this will overwrite an existing file if already exists
    FF = FreeFile
    Open FileName For Output As #FF
    'outerHTML of the root element already contains the whole document,
    'so there is no need to append the inner HTML as well
    Print #FF, IE.Document.documentElement.outerHTML
    Close #FF
    IE.Quit
    Set IE = Nothing
End Sub
I am not sure whether this will give you exactly what you want, or not. There are other ways to get data from web and probably the best would be to get the raw HTML from an XMLHTTP request and print that to a file.
Of course, it is rarely the case that we actually need an entire web page in HTML format, so if you are looking to then scrape particular data from a web page, the XMLHTTP and DOM would be the best way to do this, and it's not necessary to save this to a file at all.
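For reference, a minimal sketch of that XMLHTTP route, reusing the test URL and file name from above. The sub name is hypothetical. Keep in mind that this saves only the raw HTML of the one request (no images, stylesheets or JavaScript-generated content), and that Print # writes ANSI text, so characters outside the system code page may not survive.

```vba
Option Explicit
Sub SaveRawHtml()
    Dim URL As String
    Dim FileName As String
    Dim FF As Integer
    Dim html As String

    URL = "http://google.com" 'for TEST
    FileName = "C:\Test.htm"

    'Request the page without opening a browser
    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", URL, False
        .send
        html = .responseText
    End With

    'Write the raw HTML to the file, overwriting any existing file
    FF = FreeFile
    Open FileName For Output As #FF
    Print #FF, html
    Close #FF
End Sub
```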
Or, you could use the Selenium wrapper to automate IE, which is much more robust than relying on the relatively few native methods of the InternetExplorer.Application class.
Note also that you are using a rather crude method of waiting for the web page to load (Loop While IE.Busy). While this may work sometimes, it may not be reliable. There are dozens of questions about how to do this properly here on SO, so I would refer you to the search feature here to tweak that code a little bit.

Resources