How would I go about muting a tab or the entire browser with the Node.js selenium-webdriver module in Firefox?
I can't find any docs or methods on how to do this from JavaScript.
Any ideas?
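One approach that may work: Firefox exposes a media.volume_scale preference that scales all media volume, and setting it to "0.0" effectively mutes the whole browser. A minimal sketch with selenium-webdriver's Firefox options (note this mutes the entire session; as far as I know WebDriver exposes no per-tab mute):

const { Builder } = require('selenium-webdriver');
const firefox = require('selenium-webdriver/firefox');

(async () => {
  // "0.0" scales every media element's volume down to silence.
  const options = new firefox.Options().setPreference('media.volume_scale', '0.0');
  const driver = await new Builder()
    .forBrowser('firefox')
    .setFirefoxOptions(options)
    .build();
  await driver.get('https://example.com'); // placeholder URL
})();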
I realized today that you can combine the Chrome DevTools Protocol with Selenium in order to automate some very specific parts of a process within a website.
For instance: after some initial conditions have been met, automating the process of uploading some files to an account, etc.
According to the official repository, you run a command like the following in cmd to create a new Chrome session with your user data:
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data"
In my case, the command above opens a new Chrome session with my existing profile.
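As a quick sanity check that the protocol endpoint is up (a minimal Node.js sketch; /json/version is part of the DevTools HTTP interface exposed by --remote-debugging-port):

const http = require('http');

// Ask the freshly started Chrome for its DevTools metadata.
http.get('http://127.0.0.1:9222/json/version', (res) => {
  let body = '';
  res.on('data', (chunk) => (body += chunk));
  res.on('end', () => console.log(JSON.parse(body))); // browser version, WebSocket debugger URL, etc.
});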
The thing is, in my original session I had some Chrome extensions added. I know that if I were working only with Selenium and its chromedriver.exe, I could easily add an extension (which must be packed as a .crx file) like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opt = Options()  # the variable that will store the Selenium options
opt.add_extension(extension_path)  # path to the packed .crx file
driver = webdriver.Chrome(options=opt)  # start Chrome with the extension loaded
But it seems that the Chrome DevTools Protocol launch can't take as many options as Selenium, so I would have to install all my extensions in this pseudo-session of mine again; no problem.
But after installing those extensions, will they stay installed and ready for use the next time I run chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data", and if so, where are they stored?
Or, if not, does that mean I would have to reinstall them every single time I need to run tests with the Chrome DevTools Protocol and my Chrome extensions? Thanks in advance.
I can confirm that a session opened with the Chrome DevTools Protocol permanently stores the extensions you reinstalled; they live inside the profile directory passed via --user-data-dir, so any later launch that reuses that directory keeps them. It also remembers credentials you used to log in to some sites.
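For completeness, a minimal Node.js sketch of driving such a session over the protocol (assuming the third-party chrome-remote-interface package and the default port 9222):

const CDP = require('chrome-remote-interface');

(async () => {
  // Attach to the Chrome instance started with --remote-debugging-port=9222.
  const client = await CDP({ port: 9222 });
  const { Page } = client;
  await Page.enable();
  await Page.navigate({ url: 'https://example.com' });
  await client.close(); // detach; the browser, profile, and extensions keep running
})();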
I have a URL to scrape and I'm wondering what the best method is.
With Selenium, for example:
from selenium import webdriver

executable_path = "....\\chromedriver"
browser = webdriver.Chrome(executable_path=executable_path)
url = "xxxxxxxxxx"
browser.get(url)
timeout = 20
# find_elements_by_css_selector returns a list of Selenium elements.
titles_element = browser.find_elements_by_css_selector('[data-test-id="xxxx"]')
This method launches the Chrome browser. On Windows I have to install both the Chrome browser and ChromeDriver in matching versions. But what happens on a Linux server? Installing ChromeDriver is no problem, but isn't it a problem to install the Chrome browser on a server without a graphical interface?
Would you suggest using the requests module rather than Selenium, given that my URL is already built?
Is the risk of being caught by the website higher with Selenium or with requests?
If you have just one URL to scrape, Selenium is better because it's easier to code with than requests.
For example: if you need to scroll down to make your data appear, that will be hard to do without a browser.
If you want to do intensive scraping, maybe you should try requests with BeautifulSoup; it will use far fewer resources on your server.
You can also use Scrapy; it's very easy to spoof the user agent with it, which makes your bot harder to detect.
If you scrape responsibly, with a delay between two requests, you should not be detected with either method. You can check the site's robots.txt file to be safe.
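For the lightweight route, a Node.js sketch of the requests-plus-BeautifulSoup idea (assuming Node 18+ for the global fetch and the cheerio package; the URL, selector, and User-Agent string are placeholders):

const cheerio = require('cheerio');

(async () => {
  // Fetch with a browser-like User-Agent, then parse the static HTML.
  const res = await fetch('https://example.com', {
    headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)' },
  });
  const $ = cheerio.load(await res.text());
  $('[data-test-id="xxxx"]').each((i, el) => console.log($(el).text()));
})();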
Is there a way to debug Nightwatch even within the command chains, or to debug the injected code in Chrome DevTools?
For instance, I want to debug the "window" object.
I use Chrome version 59.0.3071.115.
According to ChromeDriver - WebDriver for Chrome, ChromeDriver is always disconnected as soon as DevTools opens. Does that mean that if I inject the code within the execute command (screenshots omitted), DevTools will close and I have to reopen it again? And that I cannot even debug it on the front end?
Thanks!
Apparently, the only way to debug or set breakpoints within the command queue is by means of callbacks, as shown in the following example.
Setting a breakpoint to inspect a page in browser
Sometimes it is important to inspect a page in browser in the middle of a test case run, for example, to verify used selectors. To pause execution at the right moment, set a breakpoint inside a callback:
browser.perform(function () {
  console.log('dummy statement'); // install a breakpoint here
});
The example was taken from https://github.com/nightwatchjs/nightwatch/wiki/Understanding-the-Command-Queue.
The exception is the execute command: Nightwatch injects the specified script directly into the browser, where it is executed on the spot. Moreover, Chrome only allows one DevTools debugger per page, so ChromeDriver will close the DevTools window and reconnect each time a command must be executed.
DevTools window keeps closing
This is normal.
When you open the DevTools window, ChromeDriver is automatically disconnected. When ChromeDriver receives a command, if disconnected, it will attempt to close the DevTools window and reconnect.
Chrome's DevTools only allows one debugger per page. As of 2.x, ChromeDriver is now a DevTools debugging client. Previous versions of ChromeDriver used a different automation API that is no longer supported in Chrome 29.
If you need to inspect something in DevTools, the best you can do now is pause your test so that ChromeDriver won't close DevTools. When you are done inspecting things in Chrome, you can unpause your test and ChromeDriver will close the window and continue.
Source: https://sites.google.com/a/chromium.org/chromedriver/help/devtools-window-keeps-closing
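Putting that advice into Nightwatch terms, a minimal sketch (the URL is a placeholder; .pause() with a long duration simply keeps the session idle so ChromeDriver won't close DevTools while you inspect):

browser
  .url('https://example.com')
  .perform(function () {
    console.log('inspect the page now'); // set a breakpoint here when running under a debugger
  })
  .pause(60000); // one minute to poke around in DevTools before commands resume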
I'm coding a parser for Google Ads and I need to separate top and bottom sections.
When I fetch a page with the Node.js request module, I get all the ads with the '.ads-ad' selector, the same ones I get via a browser.
But in a browser I can also see a parent DIV element with id='taw' (for top ads) and a parent DIV element with id='bottomads' (for bottom ads). I don't see these DIV elements with request.
Is there an existing, simple way to get the Google page the way a browser does?
Thank you in advance!
Try jsdom or CasperJS. These are headless browser environments that render the page and run any JavaScript on it, which ensures the final content is the same as in your browser.
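A minimal jsdom sketch (assuming the jsdom package; runScripts: 'dangerously' executes the page's own scripts, which is what should build wrapper elements like #taw and #bottomads, and the query URL is a placeholder):

const { JSDOM } = require('jsdom');

// Load the page and let its scripts run so dynamically built containers appear.
JSDOM.fromURL('https://www.google.com/search?q=example', {
  runScripts: 'dangerously',
  resources: 'usable',
}).then((dom) => {
  // Give the page scripts a moment, then look for the ad containers.
  setTimeout(() => {
    console.log(dom.window.document.querySelector('#taw'));
    console.log(dom.window.document.querySelector('#bottomads'));
  }, 3000);
});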
There is a webpage with live text data in a span tag that updates without the page refreshing. Is it possible to use cheerio or maybe another Node.js module to get the page info and keep the page open so Node.js also sees the updates?
I would like to avoid re-requesting over and over. As a human with the webpage open in the browser I do not need to refresh, so logically the same should be doable in Node.js.
True?
You can use PhantomJS.
It's like a real browser, but without a window.
You can handle all browser events, so you can know when an element is added to the page.
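A minimal PhantomJS sketch (run with the phantomjs binary rather than node; the #live-data selector is a placeholder, and window.callPhantom / page.onCallback is PhantomJS's bridge from page context back to the script):

// watch.js -- run with: phantomjs watch.js
var page = require('webpage').create();

// Receives whatever the page-side observer sends via window.callPhantom.
page.onCallback = function (text) {
  console.log('span updated: ' + text);
};

page.open('https://example.com', function (status) {
  page.evaluate(function () {
    var span = document.querySelector('#live-data'); // placeholder selector
    new MutationObserver(function () {
      window.callPhantom(span.textContent); // push each update out of the page
    }).observe(span, { childList: true, characterData: true, subtree: true });
  });
});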