I'm trying to download a complete webpage. In other words automate this process:
1- Open the webpage
2- Click on Save as
3- Select Complete
4- Close the webpage.
This is what I've got so far:
URL = "google.com" 'for TEST
Dim IE
Set IE = CreateObject("Internetexplorer.Application")
IE.Visible = False
IE.Navigate URL
Do
Loop While IE.Busy = True
Dim i
Dim Filename
i = 0
Filename = "C:\Test.htm"
IE.Document.ExecCommand "SaveAs", False, Filename
When I run the code in the last line a save file dialog appears. Is there any way to suppress this?
Any help would be most appreciated.
The Save As dialog cannot be suppressed:
The Save HTML Document dialog cannot be suppressed when calling this method from script.
It is also a modal dialog, and you cannot automate the click on its "Save" button: VBA execution pauses while a dialog of this sort waits for manual user input.
Rather than using the IE.Document.ExecCommand method, you could try to read the page's HTML and print that to a file using standard I/O functions.
Option Explicit
Sub SaveHTML()
    Dim URL As String
    Dim IE As Object
    Dim FileName As String
    Dim FF As Integer
    URL = "http://google.com" 'for TEST
    FileName = "C:\Test.htm"
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate URL
    'Crude wait: loop until IE stops reporting that it is busy
    Do
        DoEvents
    Loop While IE.Busy
    'Open ... For Output creates the file as specified;
    'this will overwrite an existing file if it already exists
    FF = FreeFile
    Open FileName For Output As #FF
    'documentElement.outerHTML is the whole rendered page, including the <html> tag
    '(Body.OuterHtml & Body.InnerHtml would write the body content twice)
    Print #FF, IE.Document.documentElement.outerHTML
    Close #FF
    IE.Quit
    Set IE = Nothing
End Sub
I am not sure whether this will give you exactly what you want. There are other ways to get data from the web; probably the best is to get the raw HTML from an XMLHTTP request and print that to a file.
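A minimal sketch of that XMLHTTP route, assuming the test URL and file path from the question. The procedure name SaveRawHtml is made up for illustration. Note that this saves only the HTML the server returns, not images or stylesheets, and that writing responseText in text mode may not preserve the page's original encoding.

```vba
Option Explicit

' Hypothetical helper: fetch a page with XMLHTTP and print the raw HTML to a file.
' No browser is involved, so no Save As dialog can appear.
Sub SaveRawHtml()
    Const URL As String = "http://google.com"   ' test URL from the question
    Const FILE_NAME As String = "C:\Test.htm"   ' target path from the question
    Dim http As Object
    Dim FF As Integer

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", URL, False                 ' False = synchronous request
    http.send

    ' Stop if the server did not answer with 200 OK
    If http.Status <> 200 Then
        MsgBox "Request failed with HTTP status " & http.Status
        Exit Sub
    End If

    ' Open ... For Output creates the file or overwrites an existing one
    FF = FreeFile
    Open FILE_NAME For Output As #FF
    Print #FF, http.responseText
    Close #FF
End Sub
```

Writing to the root of C:\ may require elevated permissions on some machines, so a folder under your user profile may be a safer target.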
Of course, it is rarely the case that we actually need an entire web page in HTML format, so if you are looking to then scrape particular data from a web page, the XMLHTTP and DOM would be the best way to do this, and it's not necessary to save this to a file at all.
Or you could use the Selenium wrapper to automate IE, which is much more robust than relying on the relatively few native methods of the InternetExplorer.Application class.
Note also that you are using a rather crude method of waiting for the web page to load (Loop While IE.Busy). While this may work sometimes, it may not be reliable. There are dozens of questions about how to do this properly here on SO, so I would refer you to the search feature here to tweak that code a little bit.
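One common pattern for a more careful wait is to check both Busy and ReadyState and to give up after a timeout. The sketch below is only one way to do it; the function name WaitForPage and the 30-second default are assumptions, not something from the question.

```vba
Option Explicit

' Hypothetical helper: wait until IE reports that the page is complete,
' or until the timeout expires. Returns True if the page finished in time.
Function WaitForPage(ByVal IE As Object, _
                     Optional ByVal TimeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    deadline = Now + TimeoutSeconds / 86400#    ' 86400 seconds per day

    Do While IE.Busy Or IE.ReadyState <> 4      ' 4 = READYSTATE_COMPLETE
        DoEvents                                ' keep the host application responsive
        If Now > deadline Then Exit Function    ' timed out: result stays False
    Loop

    WaitForPage = True
End Function
```

It would be called right after IE.Navigate, for example: If Not WaitForPage(IE) Then MsgBox "Page did not load in time". Content that JavaScript loads after the page reports complete is not covered by this check.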
I tried to combine code parts that I could get to work. It works with <span> and <meta>, but not with <img>.
Can anyone help to make it work?
I am trying to get:
https://www.lego.com/cdn/cs/set/assets/blt34360a0ffaff7811/11015_alt.png?fit=bounds&format=png&width=800&height=800&dpr=1
From this code:
<img src="https://www.lego.com/cdn/cs/set/assets/blt34360a0ffaff7811/11015_alt.png?fit=bounds&format=png&width=800&height=800&dpr=1" alt="" class="Imagestyles__Img-sc-1qqdbhr-0 cajeby">
This is the part of the code where I want to get the src URL:
Sub picgrab()
Dim Doc As Object
Dim nodeAllPic As Object
Dim nodeOnePic As Object
Dim pic As Object
Set Doc = CreateObject("htmlFile")
With CreateObject("MSXML2.XMLHTTP.6.0")
url = "https://www.lego.com/hu-hu/product/around-the-world-11015"
.Open "GET", url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
.send
' It is important that i can't use InternetExplorer.
'This should work i guess, but it skips after 'For Each' line.
Set nodeAllPic = Doc.getElementsByClassName("Imagestyles__Img-sc-1qqdbhr-0 cajeby")
For Each nodeOnePic In nodeAllPic
If nodeOnePic.getAttribute("class") = "Imagestyles__Img-sc-1qqdbhr-0 cajeby" Then
Set pic = nodeOneVip.getElementsByClassName("Imagestyles__Img-sc-1qqdbhr-0 cajeby")(0)
ActiveCell.Value = pic.getAttribute("src")
End If
Next nodeOnePic
End With
End Sub
I tried the code above and modified it in many ways, but couldn't get the content of src="".
Need to write the response
First of all, you never write the HTML response to your htmlfile object. So you won't be able to find anything when you call the method getElementsByClassName on it.
Make sure that you include the following line before trying to use the Doc object:
Doc.Write .responseText
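To show where that line fits, here is a rough sketch that loads the response into the htmlfile document and prints the src of every <img> tag it finds. The procedure name ListImageSources is made up, and the sketch only sees what is in the static response, so the image you are after may not be in the output.

```vba
Option Explicit

' Hypothetical sketch: write the XMLHTTP response into an htmlfile document,
' then list the src attribute of every <img> element in it.
Sub ListImageSources()
    Const URL As String = "https://www.lego.com/hu-hu/product/around-the-world-11015"
    Dim Doc As Object
    Dim img As Object

    Set Doc = CreateObject("htmlfile")
    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", URL, False
        .send
        Doc.Write .responseText     ' without this line the document stays empty
    End With
    Doc.Close

    ' getElementsByTagName is used here because it does not depend on class names
    For Each img In Doc.getElementsByTagName("img")
        Debug.Print img.getAttribute("src")
    Next img
End Sub
```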
Dynamic Content
Secondly, some of the content on that page is not in the original HTTP response that XMLHTTP receives. The page contains JavaScript code that loads content dynamically.
To test this in Chrome, you can open the Chrome DevTools window on that page, then disable JavaScript and refresh the page.
You'll then see the original HTML and a notification that says that JavaScript is disabled.
And now, if you search inside the Elements tab, you won't find the element you were looking for (at least I couldn't find anything with a class "cajeby").
Browser emulation
So, now what? Well, you'll need something that takes the original response and actually executes the JavaScript code, in other words a real browser. For that you could use Selenium. It's the modern way of doing web scraping or any browser automation with VBA.
You can easily find tutorials on how to get started with Selenium for VBA, but I would recommend this video by WiseOwlTutorials.
Then your code could look like this:
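Dim Browser As New Selenium.WebDriver
Dim img As Selenium.WebElement
'Start Chrome, then navigate using the full URL
Browser.Start "chrome"
Browser.Get "https://www.lego.com/hu-hu/product/around-the-world-11015"
'Wait up to 5000 ms for the JavaScript-generated element to appear
Set img = Browser.FindElementByCss(".Imagestyles__Img-sc-1qqdbhr-0.cajeby", timeout:=5000)
Debug.Print img.Attribute("src")
'Close the browser and end the driver session
Browser.Quit
Set Browser = Nothing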
Some notes on the code
Make sure that you have included a reference to the Selenium Library
Notice the use of FindElementByCss. This is necessary because you are matching on 2 class names, which no other method currently supports, so you'll need to use the CSS selector syntax. (More about this here).
Notice the use of timeout:=5000 that lets Selenium know that you are willing to wait up to 5000 milliseconds for the JavaScript code to load the content you are looking for (More details here).
To prepare for the eventual 'going away' of IE11, I've been trying to figure out how to replace a couple parts of my code. One involves launching IE and using that browser to scrape some pages. Is there an equivalent way to do the below in Edge? I don't see a way to add a reference to the Edge libraries like I did with 'Microsoft Internet Objects' and IE11.
Dim ie As InternetExplorerMedium: Set ie = New InternetExplorerMedium
Dim html As HTMLDocument
With ie
.Visible = False
.Navigate website 'string that's created above this code
End With
Do While ie.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Application.Wait Now + #12:00:10 AM#
Set html = ie.Document
Thanks everyone for your help.
Ok, a few explanations. I am writing these as a reply so as not to have to split them into several comments.
Does Edge work instead of IE to do web scraping with VBA?
It does not work directly. The reason is that IE has a COM interface (Wikipedia: Component Object Model). No other browser has this interface. Not even Edge.
But there is a Selenium web driver for Edge as well, provided directly by MS.
Another alternative - xhr
Since you can't use Selenium because you don't have admin rights, there might be the possibility to use xhr (XML HTTP Request). However, in order to make a statement on this, we would have to know the page that you want to scrape.
Xhr can be used directly from VBA because it does not use a browser. The big limitation is that only static content can be processed. No JavaScript is executed, so nothing is reloaded or generated dynamically in any other way. On the other hand, this option is much faster than browser solutions. Often, a static file provided by the web server is sufficient. This can be an HTML file, a JSON or another data exchange format.
There are many examples of using xhr with VBA here on SO. For now, just take note of it as another possible approach. I can't explain the method exhaustively here, partly because I don't know everything about it myself. But there are many ways to use it.
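As a rough illustration of the xhr approach, the sketch below returns an HTMLDocument that can stand in for ie.Document when the page is static. The function name LoadStaticPage is made up, and it assumes the reference to Microsoft HTML Object Library that the question's code already relies on.

```vba
Option Explicit

' Hypothetical helper: fetch a page with xhr and return it as an HTMLDocument.
' Requires a reference to Microsoft HTML Object Library (MSHTML).
' Returns Nothing if the server does not answer with 200 OK.
Function LoadStaticPage(ByVal website As String) As MSHTML.HTMLDocument
    Dim http As Object
    Dim html As MSHTML.HTMLDocument

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", website, False     ' False = synchronous request
    http.send
    If http.Status <> 200 Then Exit Function

    ' Parse the static markup; no JavaScript from the page is executed
    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = http.responseText
    Set LoadStaticPage = html
End Function
```

With this in place, Set html = LoadStaticPage(website) would replace the IE block in the question, provided the data you need is already in the static response.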
By the way
IE will finally be discontinued in June 2022 and will then also no longer be delivered with Windows. That's what I read on the German IT pages a few days ago. But there are already massive restrictions on the use of IE.
My goal is to click the excel download button on this website. I keep getting 'Automation error. The interface is unknown' at my while loops.
Sub GetData()
Dim IE As InternetExplorerMedium
Dim HTMLDoc As HTMLDocument
Dim objElement As HTMLObjectElement
Set IE = New InternetExplorerMedium
With IE
.Visible = True
.Navigate "https://www.pimco.com/en-us/investments/mutual-funds"
Do While .readyState = 4: DoEvents: Loop
Do Until .readyState = 4: DoEvents: Loop
.document.getElementById("csvLink").Click
End With
Set IE = Nothing
End Sub
Here's a bunch of functions you can use to help clean up your code. https://stackoverflow.com/a/59721369/12685075
I wouldn't try to cram all that into a With block.
I would split each step into its own segment with functions.
Then check the ready state, and use error handling to make sure the element exists before you click it.
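A sketch of that "make sure the element exists first" step, polling for the element with a timeout instead of assuming it is already there. The function name WaitForElement and the 15-second default are assumptions for illustration.

```vba
Option Explicit

' Hypothetical helper: poll until an element with the given id exists,
' or until the timeout expires. Returns Nothing if it never appears.
Function WaitForElement(ByVal IE As Object, ByVal ElementId As String, _
                        Optional ByVal TimeoutSeconds As Double = 15) As Object
    Dim deadline As Date
    Dim el As Object
    deadline = Now + TimeoutSeconds / 86400#    ' 86400 seconds per day

    Do
        DoEvents
        ' The document may be unavailable while the page is still loading,
        ' so errors are ignored for this one lookup
        On Error Resume Next
        Set el = IE.document.getElementById(ElementId)
        On Error GoTo 0
        If Not el Is Nothing Then
            Set WaitForElement = el
            Exit Function
        End If
    Loop While Now < deadline
End Function
```

In the question's macro this could be used as: Set btn = WaitForElement(IE, "csvLink"), followed by If Not btn Is Nothing Then btn.Click, where btn is declared As Object.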
That being said, you can probably skip loading IE altogether and get the link directly using XMLHTTP requests. So open the page in Chrome, turn on DevTools, refresh the page, download the CSV, and start looking through the network requests.
You'll find one that represents the downloaded file, and it's likely a direct link that you can then use, with its parameters, to let XMLHTTP skip the page stuff and get the file every time without worrying about loading elements like CSS, formatting, and fonts.
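If such a direct link does exist, downloading it could look roughly like the sketch below. The URL in the example call is a placeholder, not the real request for this site, and the names DownloadFile and DownloadExample are made up. The response bytes are written unchanged through ADODB.Stream.

```vba
Option Explicit

' Hypothetical helper: download a file over HTTP and save the bytes unchanged.
Sub DownloadFile(ByVal SourceUrl As String, ByVal TargetPath As String)
    Dim http As Object
    Dim stream As Object

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", SourceUrl, False           ' False = synchronous request
    http.send
    If http.Status <> 200 Then
        MsgBox "Download failed with HTTP status " & http.Status
        Exit Sub
    End If

    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                             ' 1 = adTypeBinary
    stream.Open
    stream.Write http.responseBody
    stream.SaveToFile TargetPath, 2             ' 2 = adSaveCreateOverWrite
    stream.Close
End Sub

Sub DownloadExample()
    ' Placeholder URL: replace it with the request found in the DevTools Network tab
    DownloadFile "https://example.com/export.csv", Environ$("TEMP") & "\funds.csv"
End Sub
```

If the request in DevTools turns out to be a POST or needs specific headers, those would have to be reproduced as well.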
Some explanations of why you run into trouble:
Don't use InternetExplorerMedium.
The problem here is that IE is opened twice. After the first opening, it is immediately closed again and the URL is loaded in another instance. But this instance is no longer assigned to the IE variable and cannot be referenced by the macro. You can observe this when you execute your macro. IE seems to twitch once.
The lines Do While .readyState = 4: DoEvents: Loop and Do Until .readyState = 4: DoEvents: Loop wait for opposite conditions.
The 4 means the page is completely loaded. So you can use either Do While .readyState <> 4: DoEvents: Loop or Do Until .readyState = 4: DoEvents: Loop. One of the two loops is enough.
The page loads dynamic content after the IE reports complete.
For that reason you must pause until that content is loaded. The simplest way to do this is a fixed wait. Look at that part in the code below.
You must trigger the download.
To do this, you need Sendkeys(). This is not a good thing, but can hardly be avoided here. I don't think there is a direct download link, as Peyter assumes, because I assume that the file for the download is only generated upon request based on the displayed data of the page. At least this is my experience with such downloads.
Please read the comments in the macro I wrote above the Sendkeys() line to find the downloaded file on your computer afterwards.
Here is the code that works:
Sub GetData()
Dim IE As Object
Set IE = CreateObject("internetexplorer.application")
IE.Visible = True
IE.Navigate "https://www.pimco.com/en-us/investments/mutual-funds"
Do Until IE.readyState = 4: DoEvents: Loop
'Manual break to load dynamic content after
'the IE reports the ready state 'complete' (4)
'The last three values are hours, minutes, seconds
Application.Wait (Now + TimeSerial(0, 0, 10))
'Now we can click the button
IE.document.getElementById("csvLink").Click
'Here you need sendkeys to trigger the save button
'Don't touch anything while the code runs
'Sendkeys will send the key combination in the brackets
'to the application which has the focus
'The file will be saved to your standard download directory
'or to the download directory you placed in the IE settings
'if you did that
Application.SendKeys ("%{S}")
'Clean up
IE.Quit
Set IE = Nothing
End Sub
I ultimately want to use Excel VBA with InternetExplorer to scrape data from a webpage.
The scenario requires login to one website page that then redirects to another page where another data entry is required to trigger the response containing the data that I ultimately want.
My code (redacted) successfully performs the initial login and reaches the redirected page, but it stalls before making the data entry on that page. I do not receive any error notification.
I haven't been able to find an answer to the stall, but I assume it is somehow related to the website page redirection. As a point of interest, if I omit the Application.Wait, the execution tries to use the ("name").Value following it in first phase of the login. If anyone can explain that also, I'd be very interested.
Any help or guidance would be greatly appreciated.
Dim IE As InternetExplorerMedium
Set IE = New InternetExplorerMedium
IE.Visible = True
IE.navigate "http://something/Login.aspx"
Do Until IE.readyState = 4: DoEvents: Loop
Do While IE.Busy: DoEvents: Loop
IE.document.getElementById("name").Value = "signin"
IE.document.getElementById("password").Value = "pword"
IE.document.getElementById("btnValidate").Click
Do While IE.Busy: DoEvents: Loop
Do Until IE.readyState = 4: DoEvents: Loop
'Now at redirected page
Application.Wait (Now + TimeValue("0:00:08"))
'This is where execution stalls
IE.document.getElementById("name").Value = "identification"
IE.document.getElementById("btnValidate").Click
I have a website that is updated daily. I need to retrieve the information from this website daily. Instead of opening a new browser window (e.g. a new Internet Explorer) every day, is it possible to use an already open Internet Explorer to retrieve the information?
I don't have IE installed so I can't promise this will work, but give it a shot. Note that you'll need to set references to Microsoft Internet Controls and Microsoft HTML Object Library.
Function GetOpenIE() As SHDocVw.InternetExplorer
Dim ie As SHDocVw.InternetExplorer
Dim sw As SHDocVw.shellWindows
Set sw = New SHDocVw.shellWindows
For Each ie In sw
If TypeOf ie.Document Is HTMLDocument Then
Set GetOpenIE = ie
Exit Function
End If
Next ie
End Function
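A short usage sketch for the function above, with the same references set. The procedure name UseOpenIE is made up. Like the function itself, it is untested against a live IE window.

```vba
' Hypothetical caller for GetOpenIE: report which page the open IE window shows.
Sub UseOpenIE()
    Dim ie As SHDocVw.InternetExplorer

    Set ie = GetOpenIE()
    If ie Is Nothing Then
        ' The function returns Nothing when no window holds an HTML document
        MsgBox "No open Internet Explorer window was found."
        Exit Sub
    End If

    Debug.Print ie.LocationURL          ' address of the page in that window
    Debug.Print ie.Document.Title       ' title of the loaded document
End Sub
```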