Access web page body text using VBA & Selenium - excel

I am trying to convert an Excel macro that currently uses Internet Explorer. It uses the following line of code to extract the web page’s <body> text:
x = .Document.DocumentElement.InnerText
Using the Selenium demo, I am able to produce a jpg of the page with Chrome & IE, but Firefox just loads a blank page and IE64 & Edge don’t work on Windows 10.
I have been unable to find the proper VBA command with Selenium to copy the body text to variable "x". I only want to read it.
I am trying to do this to make my macro browser independent.
The macro is for my use only.
Jim

You are not making it browser agnostic. You are simply widening the choice of browser to those supported by SeleniumBasic. This brings some problems of its own, which you are noticing.
The folders containing the drivers must be on the PATH environment variable, or the path must be passed to the Selenium WebDriver as an argument.
You should use the latest Chrome browser and ChromeDriver.
You cannot use the latest Firefox browser and driver; they are not supported. I think you need FF v.46.0.1.
If using IE, zoom must be set to 100%.
I suggest browsing the issues pages on GitHub for further known issues.
Anecdotally, I have heard some banter about problems with Windows 10 and SeleniumBasic. I would be interested to know if anyone has got this working, as I am not on that version.
Review the examples.xlsm provided on the SeleniumBasic GitHub site to see which other browsers are supported (e.g. Opera, PhantomJS, FirefoxLight, CEF).
With Chrome you can get the body text with this:
Option Explicit
'Requires a reference to the Selenium Type Library (VBE > Tools > References)
Public Sub GetInfo()
    Dim d As WebDriver, s As String
    Set d = New ChromeDriver
    Const URL = "https://www.neutrinoapi.com/api/api-examples/python/"
    With d
        .Start "Chrome"
        .get URL
        'Read the visible text of the <body> element
        s = .FindElementByTag("body").Text
        Debug.Print s
        .Quit
    End With
End Sub
Other info: https://stackoverflow.com/a/52294259/6241235

Related

Send Keys from VBA to chrome by Chrome DevTools Protocol

I want to automate Chrome from Excel VBA. I am using the framework mentioned in Method 2 of the answer to
Automating Edge Browser using VBA without downloading Selenium
which is located on GitHub at
https://github.com/longvh211/Chromium-Automation-with-CDP-for-VBA
In the browser I want to automate, there is a search input. Looking at the sample, I can select the input like this:
chrome.jsEval "el = document.evaluate(""//input[contains(@placeholder,'Search')]"", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;"
And I can change its value like this:
chrome.jsEval "el.value = """ & WorkSheet.Cells(2, 3) & """"
However the input has autocomplete (typeahead) and when I change its value like above, it doesn't filter the table below it. I believe I should send the value by sendkey like in Selenium, but couldn't figure out how to do this with this framework.
I would appreciate any help on this issue.

Replacing IE Bits with Edge in VBA

To prepare for the eventual 'going away' of IE11, I've been trying to figure out how to replace a couple of parts of my code. One involves launching IE and using that browser to scrape some pages. Is there an equivalent way to do the below in Edge? I don't see a way to add a reference to the Edge libraries like I did with 'Microsoft Internet Controls' and IE11.
Dim ie As InternetExplorerMedium: Set ie = New InternetExplorerMedium
Dim html As HTMLDocument
With ie
    .Visible = False
    .Navigate website 'string that's created above this code
End With
Do While ie.ReadyState <> READYSTATE_COMPLETE
    DoEvents
Loop
Application.Wait Now + #12:00:10 AM#
Set html = ie.Document
Thanks everyone for your help.
Ok, a few explanations. I am writing these as a reply so as not to have to split them into several comments.
Does Edge work instead of IE to do web scraping with VBA?
It does not work directly. The reason is that IE has a COM interface (Wikipedia: Component Object Model). No other browser has this interface, not even Edge.
However, there is a Selenium web driver for Edge, provided directly by Microsoft.
Another alternative - xhr
Since you can't use Selenium because you don't have admin rights, there might be the possibility to use xhr (XML HTTP Request). However, in order to make a statement on this, we would have to know the page that you want to scrape.
Xhr can be used directly from VBA because it does not use a browser. The big limitation is that only static content can be processed. No JavaScript is executed, so nothing is reloaded or generated dynamically in any other way. On the other hand, this option is much faster than browser solutions. Often, a static file provided by the web server is sufficient. This can be an HTML file, a JSON or another data exchange format.
There are many examples of using xhr with VBA here on SO. For now, just take note of it as another possible approach. I can't explain the method exhaustively here, partly because I don't know everything about it myself, but there are many ways to use it.
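To make the browserless idea concrete, here is a minimal sketch in Python rather than VBA, using only the standard library. It fetches the raw HTML (no JavaScript runs) and collects the text found inside <body>. The names BodyTextParser, body_text and fetch_body_text are made up for this illustration. The result is only a rough approximation of what a browser reports as the body's inner text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class BodyTextParser(HTMLParser):
    """Collects text found inside <body>, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is inside <body> and outside script/style
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def body_text(html):
    """Return the body text of an HTML string, one text chunk per line."""
    parser = BodyTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_body_text(url):
    """Fetch a page over HTTP (static content only) and return its body text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return body_text(response.read().decode(charset, errors="replace"))
```

Calling body_text on an HTML string needs no network access, which makes it easy to check whether the static markup already contains the data you want before deciding that a real browser is required.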
By the way
IE will finally be discontinued in June 2022 and will then also no longer be delivered with Windows. That's what I read on the German IT pages a few days ago. But there are already massive restrictions on the use of IE.

Use VBA to open URL in Default-Browser and catch existing Session

I am trying to open a specific URL of a web application in the default browser (Chrome). I am already logged in to the application (if I am not, it tells me to log in). When I copy and paste this URL into the browser address bar, it works perfectly. It does not work when I open the URL from VBA with ThisWorkbook.FollowHyperlink: it then redirects, as a kind of fallback, to the homepage instead of the specific URL.
I found out that this is a session problem: VBA somehow doesn't recognize or pick up the existing session.
As an "ugly workaround" I'm currently redirecting via http://www.dereferer.org/ to the specific URL, which works perfectly but takes additional time.
This doesn't work:
ThisWorkbook.FollowHyperlink ("https://www.example.com/function/edit/2019-04-09")
This works:
ThisWorkbook.FollowHyperlink ("http://www.dereferer.org/?https://www.example.com/function/edit/2019-04-09")
(for my needs it's not required to encode the target URL)
As this redirect is slow and indirect, I'm searching for a way to open the target URL directly while using the existing session (if possible). If this isn't possible (for example for security reasons), what's the best/fastest way to redirect without setting up my own redirector (one that redirects via a GET parameter, like dereferer.org)?
A clunky and ill-advised workaround, but you could bypass FollowHyperlink, and instead use Shell to open the website in a new tab/window of your default web-browser:
Shell "explorer ""https://www.example.com/function/edit/2019-04-09"""
(As a note, if you typed the URL as a hyperlink in a cell and clicked on it manually, instead of using VBA FollowHyperlink, the same issue would still occur. This also happens in Word and PowerPoint. Just be thankful you're not trying to catch the FollowHyperlink event and "correct" that in the window.)
In response to comments: on a Mac you will need to use "open" instead of "explorer". This code should run on both Mac and PC:
Shell IIf(Left(Application.OperatingSystem, 3) = "Win", "explorer ", "open ") & _
    """https://www.example.com/function/edit/2019-04-09"""
If you are allowed to install SeleniumBasic, I would use that:
Option Explicit
'download selenium https://github.com/florentbr/SeleniumBasic/releases/tag/v2.0.9.0
'Ensure latest applicable driver e.g. ChromeDriver.exe in Selenium folder
'VBE > Tools > References > Add reference to selenium type library
Public Sub DownloadFile()
    Dim d As WebDriver
    Set d = New ChromeDriver
    Const URL = "url"
    With d
        .Start "Chrome"
        .get URL
        'login steps
        .get "otherUrl" 'placeholder for the page to open after logging in
        Stop '<delete me later
        .Quit
    End With
End Sub

How to detect Firefox user agent?

I am working on an application where I am required to make legacy code, which has been designed primarily for Internet Explorer, work with Firefox.
The problem I have hit is iframes nested within a table structure do not expand to the full height of the table cell. Due to the size of the web application the decision has been made to create a JavaScript shim to address this issue instead of making mark-up changes. This shim will only be included on the page if the browser is Firefox as the problem does not exist within other browsers I have tested.
So my question is:
Using a classic ASP VBScript function, how can I identify Firefox browsers, including any edge cases?
So far I have the following which checks the user agent for the string value "Firefox". Are there any cases where this would not work?
function IsFirefox()
    dim userAgent : userAgent = Request.ServerVariables("HTTP_USER_AGENT")
    dim locationOfFirefox : locationOfFirefox = InStr(1, userAgent, "Firefox", 1)
    IsFirefox = (locationOfFirefox > 0)
end function
According to a document from the Mozilla Foundation, a browser should be identified as Firefox when its user agent contains the string "Firefox/xyz" and does not contain the string "Seamonkey/xyz". More information:
https://developer.mozilla.org/en-US/docs/Browser_detection_using_the_user_agent
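The rule above can be sketched in a few lines. This is an illustration in Python rather than classic ASP, and the function name is_firefox is made up here. The comparison is case-insensitive, mirroring the text-compare mode the question already passes to InStr; that is a choice made in this sketch, not something the Mozilla document prescribes.

```python
def is_firefox(user_agent):
    """Return True if the user agent has a "Firefox/" token and no "Seamonkey/" token."""
    ua = (user_agent or "").lower()  # tolerate a missing header
    return "firefox/" in ua and "seamonkey/" not in ua
```

A missing or empty header counts as not Firefox. In the VBScript function from the question, the same idea would amount to one extra InStr check for "Seamonkey/" and searching for "Firefox/" rather than "Firefox".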

change Watir browser headers

How is it possible to change browser header with Watir?
I'd like to change browser headers (in Firefox or Chrome) when using Watir.
I know about watir-user-agent gem, but I'm interested in changing browser version.
Is that possible?
Thanks
Yes, this can be done.
Unfortunately, Watir does not seem to provide an easy way to do this.
However, here are two simple options that work:
A. Use a proxy server.
This is a well understood way to modify headers generally. However I have not personally used it during automation.
Steps :
1. Setup proxy server before your test code is executed
2. Ensure the proxy server will add the required headers to every request
3. Then when your test browser requests any page ----> the proxy server will automatically add the required headers.
B. Use browser extensions
Since Watir does not seem to be able to modify headers by itself, we just ask Watir to use a normal browser extension that can!
I have done this successfully using Chrome and Firefox.
Note: These steps work ONLY with the indicated extensions, but a similar approach should also work fine for many other extensions.
Firefox Steps:
1. Start firefox
2. Search for 'Modify Headers Firefox' using a very popular search engine .... the top result is https://addons.mozilla.org/En-us/firefox/addon/modify-headers/
3. Download the .xpi file for this extension ... currently you can do this by right clicking on the button and clicking "save link as"
4. Install the extension as normal, change the headers as you wish, close firefox, then locate and save the "modifyheaders.conf" file ... this file should be somewhere in your user folder
5. Make the following class (which extends Profile)
require 'fileutils'
require 'selenium-webdriver'
class FirefoxProfileWithAddedFiles < Selenium::WebDriver::Firefox::Profile
  # This method OVERRIDES the one in Profile
  # This method creates the firefox profile folder
  def layout_on_disk
    # Call the superclass layout method
    profile_directory = super
    # Add custom file
    unless @file_to_add_to_profile.nil?
      FileUtils.cp(@file_to_add_to_profile, profile_directory)
    end
    profile_directory
  end
  # Remember a file to copy into the profile folder when it is created
  def add_file_to_profile(filepath)
    @file_to_add_to_profile = filepath
  end
end
6. Set your test script up as follows
...
# Set up Firefox profile (the class from step 5 is defined at top level)
profile = FirefoxProfileWithAddedFiles.new
profile.add_extension("SOMEPATH/modifyheaders.xpi")
profile.add_file_to_profile("SOMEPATH/modifyheaders.conf")
profile["modifyheaders.config.active"] = true
# Start up Firefox
@browser = Watir::Browser.new :firefox, :profile => profile
...
Chrome Steps
1. Start chrome
2. Search for 'Modify Headers Firefox' using a very popular search engine .... the top result is https://chrome.google.com/webstore/detail/modify-headers-for-google/innpjfdalfhpcoinfnehdnbkglpmogdi
3. Install the extension as normal, change the headers as you wish, then close chrome
4. Locate the unpacked extension folder and copy it. On windows the folder will be something like...
C:\Users\MYNAME\AppData\Local\Google\Chrome\User Data\Default\Extensions\innpjfdalfhpcoinfnehdnbkglpmogdi\2.0.3_0
5. Locate the extension configuration file and copy it. On windows, the file will be something like...
C:\Users\MYNAME\AppData\Local\Google\Chrome\User Data\Default\Local Storage\chrome-extension_innpjfdalfhpcoinfnehdnbkglpmogdi_0.localstorage
6. Set your test script up as follows:
...
# Set up Chrome profile folder (needs require 'tmpdir' and require 'fileutils')
profile_directory = Dir.mktmpdir("webdriver-chrome-profile")
extension_configuration_folder = FileUtils.mkdir_p "#{profile_directory}/Default/Local Storage"
FileUtils.cp("PATH_TO_MY_EXTENSION_CONFIGURATION_FILE", extension_configuration_folder[0])
# Start WebDriver
@browser = Watir::Browser.new :chrome, :switches => ["--user-data-dir=#{profile_directory}", "--load-extension=#{PATH_TO_MY_UNPACKED_EXTENSION_FOLDER}"]
...
Watir automates the browser INSIDE the browser window, with very limited interaction at the OS level (such as responding to alerts). You would need to pre-configure the browser to what you want (presuming that is possible), or use a tool such as AutoIt to interact with the browser's OS-level controls. This presumes the browser even has a feature that lets you alter what it reports as its browser and version when it makes a request to a website.
If you are using Watir-Webdriver with Firefox, you may be able to do this via a profile that sets those parameters. In that case you create the profile, then create the browser object with that profile specified. It's pretty much a WebDriver function, but easy enough to access when creating the browser object.
See this webdriver bug for the parameters to use (down in the comments) when creating the profile. Refer to webdriver docs for more info on how to setup and use profiles for firefox.
Another option that might be useful would be to fork the code for the user-agent gem and add browser_version as one of the things that can be set. The gem uses profiles for FF, so that should be possible, at least for FF. For Chrome it uses the user-agent switch to override the user agent string, so it should be possible there too, although you would have to do a little work to modify the fixed strings the gem uses, replacing the portion that holds the version with the one you want.
Then, if you get it working, issue a pull request to add that enhancement to the gem.
Or, if you are not up to that sort of thing yourself, beg, plead, and offer to bribe the gem author with something appropriate to extend the gem so that the version is one of the things that can be set.

Resources