Scraping source code of website does not work on VDI - excel

I have a problem extracting data from a website using VBA on a Citrix virtual desktop (VDI).
I wrote the code on my local desktop first, where it works fine: the HTML source is extracted into a cell in Excel.
On the VDI, IE opens the website without any problems.
Code:
Sub GetBody()
Dim Body As String
the_start:
Set ObjIE = CreateObject("InternetExplorer.Application")
ObjIE.Visible = False
ObjIE.navigate ("https://pl.wikipedia.org/wiki/Wikipedia:Strona_g%C5%82%C3%B3wna")
Do
DoEvents
If Err.Number <> 0 Then
ObjIE.Quit
Set ObjIE = Nothing
GoTo the_start:
End If
Loop Until ObjIE.readyState = 4
Body = ObjIE.document.Body.innerHTML
Cells(1, 1).Value = Body
End Sub
When I run this code on the VDI, I get the following error:
Run-time error '-2147467259(80004005)': Method 'Document' of object 'IWebBrowser2' failed.
Any ideas where this error comes from and what I should change to run it successfully on the VDI?

I made the changes suggested in the comments (such as fixing the endless loop) and then ran into another error ("Automation error: The object invoked has disconnected from its clients"). Previously I had created the IE object with this line:
Set ObjIE = CreateObject("InternetExplorer.Application")
Solution for all my problems:
Dim IE as SHDocVw.InternetExplorer
Set IE = New InternetExplorerMedium
Thank you all for participating in this thread and THANK YOU SO MUCH for your help!
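
For context, here is a minimal sketch of how the original GetBody routine might look with that fix applied. It assumes a reference to "Microsoft Internet Controls" is set under Tools > References; the 30-second limit and the error number are illustrative choices, not part of the original solution.

```vba
Option Explicit

' Requires a reference to "Microsoft Internet Controls" (Tools > References).
Sub GetBody()
    Const READY_COMPLETE As Long = 4        ' IE readyState value for "complete"
    Const TIMEOUT_SECONDS As Long = 30      ' assumed limit, adjust as needed

    Dim IE As SHDocVw.InternetExplorer
    Dim deadline As Date

    ' InternetExplorerMedium is the change reported to fix the VDI errors.
    Set IE = New SHDocVw.InternetExplorerMedium
    IE.Visible = False
    IE.navigate "https://pl.wikipedia.org/wiki/Wikipedia:Strona_g%C5%82%C3%B3wna"

    ' Wait for the page, but give up instead of looping forever.
    deadline = DateAdd("s", TIMEOUT_SECONDS, Now)
    Do While IE.Busy Or IE.readyState <> READY_COMPLETE
        DoEvents
        If Now > deadline Then
            IE.Quit
            Err.Raise vbObjectError + 513, "GetBody", _
                      "Page did not finish loading in time."
        End If
    Loop

    ActiveSheet.Cells(1, 1).Value = IE.document.body.innerHTML

    IE.Quit
    Set IE = Nothing
End Sub
```

The bounded wait replaces the GoTo retry from the question, so a page that never loads produces an error rather than an endless loop.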

Related

Filling web form fields but web page unable to detect text

I'm filling a web form using VBA, and I am able to fill text in the inputbox, but the webpage still is unable to detect the text and shows an error:
"Error: Required Field - Please provide an answer"
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
URL = "https://npc.collegeboard.org/app/dartmouth/start"
objIE.Navigate URL
objIE.Document.getElementById("student.firstName").Focus
objIE.Document.getElementById("student.firstName").Value = "Tom"
It looks like there's some AngularJS running in the background, and it can't detect the text fed in by my VBA. Any help would be highly appreciated.
First of all, after objIE.Navigate URL you should wait until the website is fully loaded and IE is ready. This is done with the following:
objIE.Navigate URL 'this needs some time, but VBA will continue executing the next statement immediately
Const READYSTATE_COMPLETE As Integer = 4
Do While objIE.Busy Or objIE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
'now IE is ready and the page is loaded.
But it might be that some JavaScript is not ready yet and this is not recognized by objIE.Busy Or objIE.ReadyState. So you can do a workaround:
Dim Obj As Object
Do While Obj Is Nothing
On Error Resume Next
Set Obj = objIE.Document.getElementById("student.firstName")
On Error GoTo 0
Loop
'now `student.firstName` is accessible, and probably all the other fields are too.
This tries to access the field student.firstName; if it is not there yet, the call raises an error. We suppress the error using On Error Resume Next and keep looping until the field is found.
Note that this has one disadvantage: if there is a problem loading the site, the code gets stuck in this loop. So I recommend adding a timed cancel criterion, for example: if this takes more than a minute, cancel it and raise an error message.
Something like the following should work:
Option Explicit
Sub test()
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
Dim URL As String
URL = "https://npc.collegeboard.org/app/dartmouth/start"
objIE.Navigate URL
Const READYSTATE_COMPLETE As Integer = 4
Do While objIE.Busy Or objIE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Dim Obj As Object
Do While Obj Is Nothing
On Error Resume Next
Set Obj = objIE.Document.getElementById("student.firstName")
On Error GoTo 0
Loop
objIE.Document.getElementById("student.firstName").Focus
objIE.Document.getElementById("student.firstName").Value = "Tom"
End Sub
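
The timed cancel criterion recommended above could be sketched roughly as follows. WaitForElement is a hypothetical helper name, and the 60-second default and the error number are assumptions chosen for illustration.

```vba
Option Explicit

' Hypothetical helper: returns the element with the given id once it exists,
' or raises an error after timeoutSeconds have passed.
Function WaitForElement(ByVal ie As Object, ByVal elementId As String, _
                        Optional ByVal timeoutSeconds As Long = 60) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        ' The lookup can fail while the page is still building its DOM.
        On Error Resume Next
        Set found = ie.Document.getElementById(elementId)
        On Error GoTo 0

        If Not found Is Nothing Then Exit Do

        If Now > deadline Then
            Err.Raise vbObjectError + 513, "WaitForElement", _
                      "Timed out waiting for element '" & elementId & "'."
        End If
        DoEvents
    Loop

    Set WaitForElement = found
End Function
```

In the test sub above, the inner Do While loop could then be replaced with Set Obj = WaitForElement(objIE, "student.firstName").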

VBA Internet explorer Automation error

I was trying to set up a public test environment to see if anyone would be able to help me with another question I asked this morning and I'm getting this error which I was not getting in my original code and after browsing a bit around I cannot fix: Automation error - The object invoked has disconnected from its clients.
Here is the full code:
Sub GetBranches()
Dim objIE As InternetExplorer
Set objIE = New InternetExplorerMedium ' create new browser
objIE.Visible = True
objIE.navigate "https://casadasereia.net/vbatests/viewtree241653.html"
' wait for browser
Do While objIE.Busy = True Or objIE.readyState <> 4
DoEvents
Loop
End Sub
Does anyone know how to fix this?
Use late binding to avoid the additional library reference; this also sidesteps any versioning issues.
Sub GetBranches()
Dim objIE as Object
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
objIE.navigate "https://casadasereia.net/vbatests/viewtree241653.html"
' wait for browser
Do While objIE.Busy = True Or objIE.readyState <> 4
DoEvents
Loop
End Sub

VBA extract text value from webpage?

I have a webpage with some text in a HTML Span like so:
<span id="ctl00_ContentPlaceHolder1_FormView1_GridView1_ctl02_lb_ExpiryDate">Expiry Date : 16/02/2018</span>
I am trying to get this value and display it as a message in Excel using the code below:
Sub PullExpiry()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://www.brcdirectory.com/InternalSite//Site.aspx?BrcSiteCode=" & Range("J6").Value
.Visible = True
End With
Do While appIE.Busy Or appIE.ReadyState <> 4
DoEvents
Loop
Set getPrice = appIE.Document.getElementById("ctl00_ContentPlaceHolder1_FormView1_GridView1_ctl02_lb_ExpiryDate")
Dim myValue As String
myValue = getPrice.innerText
appIE.Quit
Set appIE = Nothing
MsgBox myValue
End Sub
This works on my laptop (running Windows) but not on my desktop computer (also running Windows). Both machines have the same version of Windows and the same version of IE. I cannot explain it.
I have the Microsoft Office and Excel object libraries enabled in the references on both machines.
I get an error saying that an ActiveX component can't create an object.
Can someone please show me where I am going wrong?
You need to add Microsoft Internet Controls to your references to be able to create the object. You may have to register the shdocvw.dll library on your computer. It may be registered on your laptop already which is why it might be bombing on your computer.
How to register a .dll
MSDN Documentation: look at the last sentence before the C# example.
Similar Question
After playing around a little bit and ensuring everything was registered, this ran fine on my PC:
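Public Sub ie()
'Needs references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Dim ieApp As SHDocVw.InternetExplorerMedium
Dim html As HTMLDocument
Dim el As Object
Set ieApp = New SHDocVw.InternetExplorerMedium
ieApp.Visible = True
ieApp.Navigate "http://internalAddress/"
Do While ieApp.Busy Or ieApp.ReadyState <> 4
DoEvents
Loop
Set html = ieApp.Document 'the document itself
Set el = html.getElementById("myID") 'the element, not an HTMLDocument
End Sub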

VBA IE.Document empty error

I've been running a query for a while now getting data from a webpage. After numerous runs it has decided to stop working, and I've traced the issue back to the ie.document object - it never returns anything.
When compiling my project I see that the "Document" element of ie returns an "Application-defined or object-defined error", even before I navigate to a webpage. Some other elements return this error as well, namely "StatusText" and "Type".
The link contains a screenshot of my error:
https://www.dropbox.com/s/wcxxep8my10nu8h/vba%20ie%20document.jpg?dl=0
In case that doesn't work, here is a scaled-back version of the code I'm running:
Sub getCard()
Dim ie As InternetExplorer
Dim url1 As String
url1 = "google.com"
Set ie = New InternetExplorer
ie.Visible = True
ie.Navigate url1
WaitBrowserQuiet ie
End Sub
Sub WaitBrowserQuiet(objIE As InternetExplorer)
Do While objIE.Busy Or objIE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
End Sub
As soon as I get to the "Set ie = New InternetExplorer" part of the code is when the ie object is created and I see the errors. If I do happen to navigate to webpage, then the ie.document object is empty.
I've searched around and tried a few things to stop this happening - restarted my computer, run "ie.quit" and "Set ie = Nothing", reset my Internet Explorer, etc... Nothing seems to work.
It seems like it may be a deeper issue given I'm getting an error message even before navigating to a webpage. Hope someone knows how to stop the error.
Your URL is URL1, try changing that, or just putting the URL in there.
In your code the object "ie" is defined locally in the sub getCard, and when this sub finishes, so does the binding. Changing between private and public internet zones can also break the binding to that object. What I prefer to do is use a global object appIE; when it runs into such an error, I catch it (If TypeName(appIE) = "Object" Then FindIE) and find the object again with this sub:
Sub FindIE() 'Needs reference to "Microsoft Shell Controls And Automation" from VBA->Tools->References
Dim sh
Dim eachIE
Dim SearchUntil
SearchUntil = Now() + 20 / 24 / 60 / 60 'Allow to search for 20 seconds otherwise it interrupts search
Do
Set sh = New Shell32.Shell
For Each eachIE In sh.Windows
If InStr(1, eachIE.LocationURL, ServerAddress) Then
Set appIE = eachIE
'IE.Visible = False 'This is here because in some environments, the new process defaults to Visible.
'Exit Do
End If
Next eachIE
If TypeName(appIE) <> "Object" Then
If InStr(1, appIE.LocationURL, ServerAddress) > 0 Or SearchUntil < Now() Then Exit Do
End If
Loop
Set eachIE = Nothing
Set sh = Nothing
End Sub
This code contains parts of other people here from stackoverflow, but I forgot who to credit for some essential parts of the code. Sorry.
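
A rough sketch of the surrounding pieces FindIE relies on, assuming everything lives in one standard module. The names appIE and ServerAddress come from the answer; the value given to ServerAddress and the EnsureIE helper are hypothetical and only illustrate where the TypeName check would go.

```vba
Option Explicit

' Module-level state that FindIE reads and writes.
Public appIE As Object
' Hypothetical value: part of the URL that identifies the target window.
Public Const ServerAddress As String = "example.com"

' Hypothetical helper: call this before using appIE.
Sub EnsureIE()
    If appIE Is Nothing Then
        ' First use: start a new browser instance.
        Set appIE = CreateObject("InternetExplorer.Application")
        appIE.Visible = True
    ElseIf TypeName(appIE) = "Object" Then
        ' Per the answer above, a lost binding shows up as type "Object",
        ' so search the open shell windows for the browser again.
        FindIE
    End If
End Sub
```

Keeping appIE at module level means the reference survives after the calling sub ends, which is the point the answer makes about the locally declared "ie".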

Excel VBA force shutdown IE

I am currently using the following sub to close my IE after automating:
Public Sub CloseIE()
Dim Shell As Object
Dim IE As Object
Set Shell = CreateObject("Shell.Application")
For Each IE In Shell.Windows
If TypeName(IE.Document) = "HTMLDocument" Then
IE.Quit
End If
Next
End Sub
This works great but the problem occurs when I try to run the IE code again, I get the following:
Run-time error '-2147023706 (800704a6)':
Automation error
A system shutdown has already been scheduled.
After 20 secs, I can re-run the code. Is there any way of "force closing" IE so that I can run the code again directly after without the error?
EDIT:
Here's the code that starts IE:
Sub testSub()
Dim IE As Object, Doc As Object, strCode As String
Set IE = CreateObject("internetexplorer.application")
IE.Visible = True
IE.Navigate "website name"
Do While IE.ReadyState <> 4: DoEvents: Loop
Set Doc = CreateObject("htmlfile")
Set Doc = IE.Document
CODE HERE
CloseIE
End Sub
I used a simple procedure to create an InternetExplorer object, navigate, then close it using IE.Quit and Set IE = Nothing.
Apparently, after closing, Internet Explorer was still running in the background for about another minute (I noticed it in Task Manager under background processes). I went into the Internet Explorer options and unchecked "Delete browsing history on exit".
That fixed my issue. I am not sure why IE takes that long to clear the history and I’m not sure if there is a workaround but it is the only viable option for me thus far.
Modify your main sub to Quit the IE application, and then set the object variable to Nothing:
Sub testSub()
Dim IE As Object, Doc As Object, strCode As String
Set IE = CreateObject("internetexplorer.application")
IE.Visible = True
IE.Navigate "website name"
Do While IE.ReadyState <> 4: DoEvents: Loop
Set Doc = CreateObject("htmlfile")
Set Doc = IE.Document
'### CODE HERE
IE.Quit
Set IE = Nothing
End Sub
You will not need the CloseIE procedure anymore.
