"Eager" Page Load Strategy workaround for Chromedriver Selenium in Python - python-3.x

I want to speed up page loading in Selenium because I don't need anything more than the HTML (I am trying to scrape all the links using BeautifulSoup). Using PageLoadStrategy.NONE doesn't work for scraping all the links, and Chrome no longer supports PageLoadStrategy.EAGER. Does anyone know of a workaround to get PageLoadStrategy.EAGER in Python?

ChromeDriver is the standalone server which implements WebDriver's wire protocol for Chromium. Chrome and Chromium are still in the process of implementing and moving to the W3C standard. Currently ChromeDriver is available for Chrome on Android and Chrome on Desktop (Mac, Linux, Windows and ChromeOS).
As per the current WebDriver W3C Editor's Draft, the following table links the pageLoadStrategy capability keyword to a page loading strategy state and shows the document readiness state that corresponds to it:
Keyword | Page load strategy state | Document readiness state
"none" | none | (none)
"eager" | eager | "interactive"
"normal" | normal | "complete"
However, if you observe the current implementation of ChromeDriver, Chrome DevTools does take into account the following document.readyState values:
document.readyState == 'complete'
document.readyState == 'interactive'
Here is a sample relevant log:
[1517231304.270][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=11) {
"expression": "var isLoaded = document.readyState == 'complete' || document.readyState == 'interactive';if (isLoaded) { var frame = document.createElement('iframe'); frame.name = 'chromedriver dummy frame'; ..."
}
As per WebDriver Status, you will find the list of all WebDriver commands and their current support in ChromeDriver, based on what is in the WebDriver Specification. Once the implementation is complete in all respects, PageLoadStrategy.EAGER is bound to be functionally present in ChromeDriver.

You can only use normal or none as the pageLoadStrategy in ChromeDriver. So either choose none and handle everything yourself, or wait for the page load as it normally happens.
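A minimal sketch of the "choose none and handle everything yourself" route: start Chrome with the none strategy, then poll document.readyState until it reaches 'interactive' (the state the ChromeDriver log above checks for) or 'complete'. The helper names (wait_for_ready_state, make_chrome_with_no_wait, scrape_links), the timeout values, and the previous_url check are illustrative assumptions, and the sketch assumes Selenium 4's options API. Newer ChromeDriver releases may accept "eager" directly, in which case the polling would not be needed.

```python
import time


def wait_for_ready_state(driver, previous_url=None,
                         states=("interactive", "complete"),
                         timeout=10.0, poll=0.1):
    """Poll document.readyState until it is one of `states` or time runs out.

    With the "none" strategy, get() returns before navigation has committed,
    so the old document may still report "complete". Passing the URL seen
    before get() lets us ignore that stale document (an assumption that the
    new page has a different URL).
    """
    deadline = time.monotonic() + timeout
    while True:
        navigated = previous_url is None or driver.current_url != previous_url
        if navigated and driver.execute_script("return document.readyState") in states:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def make_chrome_with_no_wait():
    """Hypothetical helper: a Chrome driver whose get() does not block."""
    from selenium import webdriver  # imported lazily; assumes Selenium 4

    options = webdriver.ChromeOptions()
    options.page_load_strategy = "none"
    return webdriver.Chrome(options=options)


def scrape_links(url):
    """Hypothetical helper: collect href values once the DOM is interactive."""
    from bs4 import BeautifulSoup

    driver = make_chrome_with_no_wait()
    try:
        previous_url = driver.current_url
        driver.get(url)
        if not wait_for_ready_state(driver, previous_url=previous_url):
            raise TimeoutError("page never became interactive: " + url)
        soup = BeautifulSoup(driver.page_source, "html.parser")
        return [a["href"] for a in soup.find_all("a", href=True)]
    finally:
        driver.quit()
```

Calling scrape_links("https://example.com/") would then return the links present once the HTML has been parsed, without waiting for images and other subresources.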

Related

How can I know if browser is Chrome vs Firefox from web extension popup JavaScript?

I am using the chrome namespace for both Chrome and Firefox, but would like to know which browser is running the web extension.
Links to extension resources have different schemes in Chrome and Firefox.
const isFirefox = chrome.runtime.getURL('').startsWith('moz-extension://');
const isChrome = chrome.runtime.getURL('').startsWith('chrome-extension://');
Check chrome.app which is absent in Firefox:
const isFirefox = !chrome.app;
Check for browser which is absent in Chrome:
const isFirefox = window.browser && browser.runtime;
(the additional check is to avoid false positives on pages that have an element with id="browser" that creates a named property on window object for this element)
Use the asynchronous browser.runtime.getBrowserInfo.
P.S. navigator.userAgent may be changed during debugging in devtools when switching to device mode or via about:config option in Firefox so it's an unreliable source.
This is what I do in my own extensions to check for Firefox (FF) vs Chrome:
const FF = typeof browser !== 'undefined';
Update: (1)
Here is an explanation .....
I am using the chrome namespace for both Chrome and Firefox, but would
like to know which browser is running the web extension.
As far as I understand, the question relates to extension code and not content code. I use the above code in the background script of "firefox-webextensions" or "google-chrome-extension" extensions.
From then on, the code would be:
if (FF) {...}
else { .... }
Once established, content script has no bearing on it.
In case a developer somehow decides to use id="browser", a further step could be added, which returns a boolean (true or false), e.g.
const FF = typeof browser !== 'undefined' && !!browser.runtime;
Worth noting that the following returns an object or undefined and not a boolean:
const isFirefox = window.browser && browser.runtime;
While it works fine in if() conditionals, it won't work in other situations where a boolean is required (e.g. switch).
(1) Note: Marking down answers, discourages people from spending time and effort in answering questions in future.

How can I ignore tests when under a particular browser?

My suite of cucumbers gets run on both Firefox and Chrome. Some of them require a browser resize, which is horrible to deal with in Chrome. Since the behaviors that need the resize don't require cross browser testing, I'd like some way to ignore them when the detected browser is Chrome. Is there a way to do this? Perhaps with hooks or in the steps? I'm currently doing the resizing in Before and After hooks.
I don't know which web-driver you are using, but for watir-webdriver you can do the following:
You can determine which browser it is in the steps that you want to skip using the code in the below URL.
http://watirwebdriver.com/determining-which-browser/
Once you determine that it is chrome you can just skip that particular step.
In your test helper, you can add these methods:
def use_chrome_driver
  Capybara.register_driver :selenium_chrome do |app|
    Capybara::Selenium::Driver.new(app, :browser => :chrome)
  end
  Capybara.current_driver = :selenium_chrome
end
def setup
  Capybara.current_driver = :selenium
end
All your tests will use the default Selenium webdriver; when you need Chrome, just call use_chrome_driver at the beginning of your test, like this:
def test_login_with_chrome
  use_chrome_driver
  ...
end
You may also add your Firefox driver to the helper with the browser size you need, and make it the default Selenium browser.

Download page with javascript executed

I want to download a page with its JavaScript executed, using Python. Qt is one of the solutions, and here is the code:
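import threading
# Assumption: PyQt4 with QtWebKit; adjust these imports for PySide or PyQt5.
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebView

class Downloader(QApplication):
    __event = threading.Event()

    def __init__(self):
        QApplication.__init__(self, [])
        self.webView = QWebView()
        self.webView.loadFinished.connect(self.loadFinished)

    def load(self, url):
        self.__event.clear()
        self.webView.load(QUrl(url))
        # Pump the Qt event loop until loadFinished sets the event.
        while not self.__event.wait(.05):
            self.processEvents()
        return self.webView.page().mainFrame().documentElement() if self.__ok else None

    def loadFinished(self, ok):
        self.__ok = ok
        self.__event.set()

downloader = Downloader()
page = downloader.load(url)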
The problem is that sometimes downloader.load() returns a page without the JavaScript executed. Downloader.loadStarted() and Downloader.loadFinished() are called only once.
What is the proper way to wait for a complete page download?
EDIT
If I add self.webView.page().networkAccessManager().finished.connect(request_ended) to __init__() and define
def request_ended(reply):
    print(reply.error(), reply.url().toString())
then it turns out that sometimes reply.error()==QNetworkReply.UnknownNetworkError. This happens when an unreliable proxy is used that fails to download some of the resources (some of which are JS files), so some of the JS is never executed. When no proxy is used (i.e. the connection is stable), every reply.error()==QNetworkReply.NoError.
So, the updated question is:
Is it possible to retry getting reply.request() and apply it to the self.webView?
JavaScript requires a runtime to execute in (Python alone won't do); a popular one these days is PhantomJS.
Unfortunately, PhantomJS no longer has Python support, so you could resort to e.g. Ghost.py to do this job for you, which lets you selectively execute the JS you want.
You should use Selenium.
It provides different WebDrivers, for example PhantomJS, or other common browsers like Firefox.
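As a rough sketch of the Selenium suggestion combined with the retry idea from the question: instead of retrying individual network replies (which Selenium does not expose), this reloads the whole page until a caller-supplied check says the JavaScript ran. The function name fetch_rendered_html, the is_complete predicate, and the retry and delay values are all illustrative assumptions, not part of Selenium.

```python
import time


def fetch_rendered_html(driver, url, is_complete, retries=3, delay=1.0):
    """Load `url` and return the rendered HTML once is_complete(driver) is true.

    The whole page is reloaded on each attempt; None is returned if no
    attempt passes the check.
    """
    for attempt in range(1, retries + 1):
        driver.get(url)
        if is_complete(driver):
            return driver.page_source
        if attempt < retries:
            time.sleep(delay)  # give a flaky proxy a moment before reloading
    return None


def example():
    """Hypothetical usage; needs Selenium and a Firefox driver installed."""
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        # Assumed check: the page's scripts define a global named `app`.
        def scripts_ran(d):
            return d.execute_script("return typeof window.app !== 'undefined'")

        return fetch_rendered_html(driver, "https://example.com/", scripts_ran)
    finally:
        driver.quit()
```

The predicate is left to the caller because only the caller knows what "JavaScript executed" means for a given page, for example a global variable or an element that the scripts create.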

Can js code in chrome extension detect that it's executed as content script?

I have a Google Chrome extension that shares some code between its content script and background process / popup. Is there some easy and straightforward way for this code to check whether it's executed as a content script or not? (Message passing behavior differs.)
I can include an additional "marker" JavaScript file in the manifest, or call some chrome function unavailable from content scripts and check for exceptions, but these methods look awkward to me. Is there an easy and clean way to make this check?
To check whether or not your script is running as a content script, check if it is not being executed on a chrome-extension scheme.
if (location.protocol == 'chrome-extension:') {
    // Running in the extension's process
    // Background-specific code (actually, it could also be a popup/options page)
} else {
    // Content script code
}
If you further want to know if you're running in a background page, use chrome.extension.getBackgroundPage() === window. If it's true, the code is running in the background. If not, you're running in the context of a popup / options page / ...
(If you want to detect if the code is running in the context of an extension, ie not in the context of a regular web page, check if chrome.extension exists.)
Explanation of revised answer
Previously, my answer suggested to check whether background-specific APIs such as chrome.tabs were defined. Since Chrome 27 / Opera 15, this approach comes with an unwanted side-effect: Even if you don't use the method, the following error is logged to the console (at most once per page load per API):
chrome.tabs is not available: You do not have permission to access this API. Ensure that the required permission or manifest property is included in your manifest.json.
This doesn't affect your code (!!chrome.tabs will still be false), but users (developers) may get annoyed, and uninstall your extension.
The function chrome.extension.getBackgroundPage is not defined at all in content scripts, so alone it can be used to detect whether the code is running in a content script:
if (chrome.extension.getBackgroundPage) {
    // background page, options page, popup, etc
} else {
    // content script
}
There are more robust ways to detect each context separately in a module I wrote
function runningScript() {
    // This function will return the currently running script of a Chrome extension
    if (location.protocol == 'chrome-extension:') {
        if (location.pathname == "/_generated_background_page.html")
            return "background";
        else
            return location.pathname; // Will return "/popup.html" if that is the name of your popup
    }
    else
        return "content";
}

Capybara use Internet Explorer as browser rather than Firefox

Hi, is it possible to tell Capybara to use IE instead of always defaulting to Firefox?
I have to write some automated tests but the business only supports Internet Explorer so I need the tests to be run on this browser.
Thanks.
As marc_s suggested in the comments, you could try making IE the default browser on your test machine.
I also see some google hits about using Capybara with Selenium (remote control).
If you're interested, check the Selenium docs for how to specify the browser.
Edit: It seems the tutorial I posted before was Rack-only. Not sure, but maybe this will work instead:
http://www.johng.co.uk/2010/10/13/run_capybara_and_cucumber_features_in_internet_explorer_on_remote_windows/
Capybara.app_host = "http://192.168.1.37:3000"
Capybara.default_driver = :selenium
Capybara.register_driver :selenium do |app|
  Capybara::Driver::Selenium.new(app,
    :browser => :remote,
    :url => "http://192.168.1.127:4444/wd/hub",
    :desired_capabilities => :internet_explorer)
end
It still requires Selenium.
Edit 2:
If you get this error:
Capybara::TimeoutError: failed to resynchronize, ajax request timed out
Then try adding this code to features/step_definitions/mydefiniation.rb:
Before do
  page.driver.options[:resynchronize] = false
end
See this question about that specific problem: Using Capybara for AJAX integration tests
Go to External Libraries > selenium-webdriver > lib > selenium > webdriver > ie > bridge.rb and find def initialize in module IE.
It contains:
ignore_mode = opts.delete(:introduce_flakiness_by_ignoring_security_domains)
Just add != false so that it becomes:
ignore_mode = opts.delete(:introduce_flakiness_by_ignoring_security_domains) != false

Resources