browser.windows.each doesn't work in watir-webdriver

I don't understand why this code doesn't work (Ruby 1.9.3p484):
require 'rubygems'
require 'watir-webdriver'
browser = Watir::Browser.new :chrome #open chrome
browser.window.resize_to(1280, 960)
browser.goto "http://stackoverflow.com/"
browser = Watir::Browser.new :chrome # starts a second, separate browser session (not just a new window)
browser.window.resize_to(1280, 960)
browser.goto "http://google.fr/"
browser.windows.each{|wb|puts wb.url}
The result:
http://google.fr/
Each Watir::Browser.new call starts a separate browser session, and browser.windows only enumerates the windows belonging to the session that browser currently references, so after the reassignment only the second browser's URL is printed.

OK, thanks a lot, man of snow! Is this right? (Opening the second window with window.open keeps it in the same browser session, so browser.windows can enumerate both.)
require 'rubygems'
require 'watir-webdriver'
browser = Watir::Browser.new :chrome #open chrome
browser.window.resize_to(1280, 960)
browser.goto "http://stackoverflow.com/"
browser.execute_script("window.open('http://yahoo.com')")
browser.windows.each{|wb|puts wb.url}


How to get the request headers for a particular URL using Selenium

https://www.sahibinden.com/en
If you open it in an incognito window and check the traffic in Fiddler, these are the two main requests you get (screenshot omitted). When I click the last one and check its request headers, this is what I see (screenshot omitted).
I want to get these headers in Python. Is there any way I can get them using Selenium? I'm a bit clueless here.
You can use Selenium Wire. It is a Selenium extension which has been developed for this exact purpose.
https://pypi.org/project/selenium-wire/
An example after pip install:
## Import webdriver from Selenium Wire instead of Selenium
from seleniumwire import webdriver

## Get the URL
driver = webdriver.Chrome("my/path/to/driver")
driver.get("https://my.test.url.com")

## Print request headers
for request in driver.requests:
    print(request.url)                   # <--- Request url
    print(request.headers)               # <--- Request headers
    if request.response:                 # response may be None if it never arrived
        print(request.response.headers)  # <--- Response headers
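If you only need one specific request, Selenium Wire can also block until it has been captured. A minimal sketch, assuming the URL fragment below matches the request you care about:
# Wait up to 10 seconds for a request whose URL matches the pattern
request = driver.wait_for_request('my.test.url.com', timeout=10)
print(request.headers)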
You can run a JS command like this:
var req = new XMLHttpRequest()
req.open('GET', document.location, false)
req.send(null)
return req.getAllResponseHeaders()
Note that this issues a fresh synchronous GET request to the current URL, so the headers come from that new response rather than from the original page load.
In Python:
driver.get("https://t.me/codeksiyon")
headers = driver.execute_script("var req = new XMLHttpRequest();req.open('GET', document.location, false);req.send(null);return req.getAllResponseHeaders()")
# type(headers) == str
headers = headers.splitlines()
The bottom line is: no, you can't retrieve the request headers using Selenium itself.
Details
Adding WebDriver methods to read the HTTP status code and headers from an HTTP response had long been demanded by Selenium users, and the idea was discussed at length in the issue WebDriver lacks HTTP response header and status code methods.
However, Jason Leyba (Selenium contributor) stated plainly in his comment:
We will not be adding this feature to the WebDriver API as it falls outside of our current scope (emulating user actions).
Ashley Leyba further added that attempting to make WebDriver the ideal web testing tool would hurt its overall quality, since driver.get(url) blocks until the browser has loaded the page and returns the response for the final loaded page. So in the case of a login redirect, you will always end up with a 200 status code instead of the 302 you're looking for.
Finally, Simon M. Stewart (WebDriver creator) concluded in his comment:
This feature isn't going to happen. The recommended approach is to either extend the HtmlUnitDriver to access the information you require or to make use of an external proxy that exposes this information, such as the BrowserMob Proxy.
It's not possible to get the headers using Selenium alone.
However, you might use another library such as requests to fetch the URL directly and read the headers from its response (BeautifulSoup only parses the HTML body and does not expose headers).
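For example, a minimal sketch with the requests library, using the URL from the question:
import requests

resp = requests.get("https://www.sahibinden.com/en")
print(resp.request.headers)  # headers that were sent with the request
print(resp.headers)          # headers the server sent back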
Maybe you can use BrowserMob Proxy for this. Here is an example:
import settings
from browsermobproxy import Server
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

config = settings.Config
server = Server(config.BROWSERMOB_PATH)
server.start()
proxy = server.create_proxy()

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % proxy.proxy)
chrome_options.add_argument('--headless')
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['acceptSslCerts'] = True
capabilities['acceptInsecureCerts'] = True
driver = webdriver.Chrome(options=chrome_options,
                          desired_capabilities=capabilities,
                          executable_path=config.CHROME_PATH)

proxy.new_har("sahibinden", options={'captureHeaders': True})
driver.get("https://www.sahibinden.com/en")

entries = proxy.har['log']["entries"]
for entry in entries:
    if 'request' in entry.keys():
        print(entry['request']['url'])
        print(entry['request']['headers'])
        print('\n')

proxy.close()
driver.quit()
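Because captureHeaders is enabled, the same HAR entries also carry the response headers; the HAR format stores them as a list of name/value pairs. A short sketch reading them:
for entry in entries:
    if 'response' in entry:
        for header in entry['response']['headers']:
            print(header['name'], '=', header['value'])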
A variation of the same XMLHttpRequest trick that parses the header string into a dict on the JavaScript side (again, the headers come from a fresh HEAD request to the current URL):
js_headers = '''
const _xhr = new XMLHttpRequest();
_xhr.open("HEAD", document.location, false);
_xhr.send(null);
const _headers = {};
_xhr.getAllResponseHeaders().trim().split(/[\\r\\n]+/).map((value) => value.split(/: /)).forEach((keyValue) => {
    _headers[keyValue[0].trim()] = keyValue[1].trim();
});
return _headers;
'''
page_headers = driver.execute_script(js_headers)
type(page_headers)  # -> dict
You can use selenium-wire (https://pypi.org/project/selenium-wire/), a drop-in replacement for webdriver that adds request/response inspection, even for HTTPS, by using its own local SSL certificate.
from seleniumwire import webdriver
d = webdriver.Chrome() # make sure chrome/chromedriver is in path
d.get('https://en.wikipedia.org')
vars(d.requests[-1].headers)
This lists the headers of the last entry in the requests list:
{'policy': Compat32(), '_headers': [('content-length', '1361'),
('content-type', 'application/json'), ('sec-fetch-site', 'none'),
('sec-fetch-mode', 'no-cors'), ('sec-fetch-dest', 'empty'),
('user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'),
('accept-encoding', 'gzip, deflate, br')],
'_unixfrom': None, '_payload': None, '_charset': None,
'preamble': None, 'epilogue': None, 'defects': [], '_default_type': 'text/plain'}
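Since the headers object is dict-like (with case-insensitive keys), individual headers can also be read directly, e.g.:
print(d.requests[-1].headers['user-agent'])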

Python Selenium cannot save to the default download path

Selenium 4.6.0
Python 3.10.9
If I'm using headless mode, this issue doesn't happen: files are downloaded to the Downloads/test folder.
But in non-headless mode it ignores the options prefs and downloads to the Downloads folder.
from selenium import webdriver
from fake_useragent import UserAgent  # assumed source of UserAgent (provides ua.random)

options = webdriver.ChromeOptions()
ua = UserAgent()
userAgent = ua.random
print(userAgent)
options.add_argument('--user-agent={}'.format(userAgent))
# options.add_argument('--headless')
options.add_argument("--user-data-dir=C:/Users/Myname/AppData/Local/Google/Chrome/User Data")
options.add_argument("--profile-directory=Profile 2")
prefs = {"download.default_directory": r"C:\Users\Myname\Downloads\test"}
options.add_experimental_option("prefs", prefs)
self.webdriver = webdriver.Chrome(executable_path="../chromedriver_win32/chromedriver.exe", options=options)
self.webdriver.maximize_window()
print("Headless Chrome Initialized")

Experimental Chrome Options in Selenium Node.js

I need to convert these Python lines:
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(executable_path='<path-to-chrome>', options=options)
From this issue: Python selenium: DevTools listening on ws://127.0.0.1
I don't know how to add experimental options in Node.js, I can't find any documentation.
I also couldn't find the experimental Chrome options in Selenium for Node.js. But if you want to exclude a Chrome switch, I think you can do it like this:
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

const chromeOptions = new chrome.Options();
chromeOptions.excludeSwitches('enable-logging');

// `await` needs an async context
(async () => {
    let driver = await new Builder().forBrowser('chrome').setChromeOptions(chromeOptions).build();
})();

Does Selenium Grid 4 for Firefox lack extension addon option?

I have Selenium working well locally, adding extensions with the following setup.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as firefox_options

def init_firefox(self, threadname, headless, images_off):
    if headless == True:
        options = firefox_options()
        options.add_argument("--headless")
        driver = webdriver.Firefox(options=options, executable_path=(r"C:\Users\charl\OneDrive\python\gecko\geckodriver.exe"))
    else:
        options = firefox_options()
        driver = webdriver.Firefox(executable_path=(r"C:\Users\charl\OneDrive\python\gecko\geckodriver.exe"))
    extension_dir = 'C:\\Users\\charl\\OneDrive\\python\\gecko\\extensions\\'
    extensions = [
        'firefox#vid.io.xpi',
        'noimages.xpi',
    ]
    for extension in extensions:
        driver.install_addon(extension_dir + extension, temporary=True)
    self.close_tab(driver)
    self.login(driver)
    return driver
But when I try the same on Selenium Grid 4 using this code:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as firefox_options

def init_firefox_remote(self, threadname, headless, images_off):
    if headless == True:
        # options = Options()
        options = firefox_options()
        options.add_argument("--headless")
        driver = webdriver.Remote(command_executor='http://xx.xxx.xx.xx:4444/wd/hub', options=options)
    else:
        options = firefox_options()
        driver = webdriver.Remote(command_executor='http://xx.xx.xx.xx:4444/wd/hub', options=options)
    extension_dir = '/dev/shm/extensons/'
    extensions = [
        'firefox#vid.io.xpi',
        'noimages.xpi',
    ]
    for extension in extensions:
        driver.install_addon(extension_dir + extension, temporary=True)
    self.close_tab(driver)
    self.login(driver)
    return driver
I get an error:
AttributeError: 'WebDriver' object has no attribute 'install_addon'
The version of Selenium Grid I am using was created like this:
$ docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-firefox:4.0.0-beta-3-prerelease-20210321
Any ideas? Does Selenium Grid for Firefox lack the install extension option?
Installing addons in a remote Firefox browser is done by creating a Firefox profile and adding the extension there:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as FirefoxOptions
profile = webdriver.FirefoxProfile()
options = FirefoxOptions()
profile.add_extension("/path/to/extension.xpi")
driver = webdriver.Remote(command_executor="http://xx.xx.xx.xx:4444/wd/hub",
options=options,
browser_profile=profile)
I don't think it is possible to load a temporary addon in a remote Firefox browser this way.
install_addon is only available for local webdrivers. A simple workaround is required when using remote webdrivers, as mentioned in this issue.
More specifically, change this line:
driver.install_addon(extension_dir + extension, temporary=True)
to
webdriver.Firefox.install_addon(driver, extension_dir + extension, temporary=True)
This works because install_addon simply sends a WebDriver command through driver.execute, so invoking it unbound against a Remote driver behaves the same as on a local Firefox driver.
The full code should look like the following:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as firefox_options

def init_firefox_remote(self, threadname, headless, images_off):
    if headless == True:
        # options = Options()
        options = firefox_options()
        options.add_argument("--headless")
        driver = webdriver.Remote(command_executor='http://xx.xxx.xx.xx:4444/wd/hub', options=options)
    else:
        options = firefox_options()
        driver = webdriver.Remote(command_executor='http://xx.xx.xx.xx:4444/wd/hub', options=options)
    extension_dir = '/dev/shm/extensons/'
    extensions = [
        'firefox#vid.io.xpi',
        'noimages.xpi',
    ]
    for extension in extensions:
        webdriver.Firefox.install_addon(driver, extension_dir + extension, temporary=True)
    self.close_tab(driver)
    self.login(driver)
    return driver
I have opened a pull request to the Selenium docs to clarify this usage.

How is my scraper being detected immediately by a search engine?

I am using Scrapy with Selenium in order to scrape URLs from a particular search engine (ekoru). With just ONE request, the response I get back shows the bot has already been detected (screenshot omitted).
Since I am using Selenium, I'd assume my user-agent should be fine, so what else could make the search engine detect the bot immediately?
Here is my code:
import scrapy
from scrapy.selector import Selector
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.keys import Keys

class CompanyUrlSpider(scrapy.Spider):
    name = 'company_url'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://ekoru.org',
            wait_time=3,
            screenshot=True,
            callback=self.parseEkoru
        )

    def parseEkoru(self, response):
        driver = response.meta['driver']
        search_input = driver.find_element_by_xpath("//input[@id='fld_q']")
        search_input.send_keys('Hello World')
        search_input.send_keys(Keys.ENTER)
        html = driver.page_source
        response_obj = Selector(text=html)
        links = response_obj.xpath("//div[@class='serp-result-web-title']/a")
        for link in links:
            yield {
                'ekoru_URL': link.xpath(".//@href").get()
            }
Sometimes you need to pass other parameters to avoid being detected by the webpage.
Let me share some code you can use:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# This code helps to simulate a "human being" visiting the website
chrome_options = Options()
chrome_options.add_argument('--start-maximized')
driver = webdriver.Chrome(options=chrome_options, executable_path=r"chromedriver")

# Mask the navigator.webdriver flag that sites commonly check for bot detection
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source":
    """Object.defineProperty(navigator,
       'webdriver', {get: () => undefined})"""})

url = 'https://ekoru.org'
driver.get(url)
Yields a normal results page (check out the "Chrome is being controlled..." message below the address bar; screenshot omitted).
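If that alone is not enough, two further Chrome options are commonly combined with the CDP override to reduce obvious automation fingerprints (a sketch, not part of the original answer):
# Drop the "enable-automation" switch and disable the automation extension
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option("useAutomationExtension", False)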
