I am a new developer trying to run Selenium code in an already open Tor Browser session.
I searched for this online but didn't find a satisfying answer.
My code looks like this:
binary = r'C:\Users\Admin\Desktop\Tor Browser\Browser\firefox.exe'
options = Options()
options.set_preference("network.proxy.type", 1)
options.set_preference("network.proxy.socks", "127.0.0.1")
options.set_preference("network.proxy.socks_port", 9050)
options.set_preference("network.proxy.socks_remote_dns", False)
options.set_capability( "debuggerAddress", f"127.0.0.1:1060")
# options.add_experimental_option("debuggerAddress", f"127.0.0.1:1060")
global browser
# only one instance of a browser opens, remove global for multiple instances
if not browser:
    browser = webdriver.Firefox(firefox_binary=binary, options=options, firefox_profile=profile)
return browser
In simple words, this code doesn't connect to the running Tor session; it starts another session instead.
I want to connect to a website and download some PDF files. The website shows its content only after login. It requires logging in with an OTP and does not allow being logged in on more than 3 devices simultaneously.
I want to download all the PDFs listed, so I first tried
python -m playwright open --save-storage websitename.json
to save the login, but it doesn't work for that specific website.
The websitename.json file was empty, whereas it worked for other websites.
Therefore the only solution I can think of now is to connect to the current browser, open that website, and then download those PDFs.
If you have a solution for this, or some other approach, please let me know.
I was also thinking about switching to Puppeteer for this.
But I don't know how to parse HTML in Node.js, and I am more comfortable using CSS selectors, so I can't switch.
Playwright is basically the same as Puppeteer, so switching between the two shouldn't be a problem.
You can use puppeteer-core or Playwright to control your existing browser installation, for example Chrome, and point it at the existing user data (profile) folder so that the website's login info (cookies, web storage, etc.) is loaded.
const puppeteer = require('puppeteer-core')

const launchOptions = {
  headless: false,
  executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome', // For macOS
  // executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe', // For Windows
  // executablePath: '/usr/bin/google-chrome', // For Linux
  args: [
    '--user-data-dir=/Users/username/Library/Application Support/Google/Chrome/', // For macOS
    // '--user-data-dir=%userprofile%\\AppData\\Local\\Google\\Chrome\\User Data', // For Windows
    // '--profile-directory=Profile 1' // Selects the default or a specified profile
  ]
}

// await is only valid inside an async function in a CommonJS script
async function main () {
  const browser = await puppeteer.launch(launchOptions)
}

main()
For more details about Playwright's method, you can check this workaround:
https://github.com/microsoft/playwright/issues/1985
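If you would rather stay in Python, a rough equivalent of the Puppeteer snippet might look like the sketch below. It assumes Playwright's launch_persistent_context fits your case. The names chrome_user_data_dir and launch_with_profile are hypothetical helpers, and the profile locations are only the usual defaults, so check them on your machine. Chrome generally has to be fully closed first, because a profile directory is locked by the process that is using it.

```python
from pathlib import PurePosixPath, PureWindowsPath


def chrome_user_data_dir(system: str, home: str) -> str:
    """Return the usual default Chrome profile folder (an assumption, verify locally)."""
    if system == "Darwin":
        return str(PurePosixPath(home) / "Library/Application Support/Google/Chrome")
    if system == "Windows":
        return str(PureWindowsPath(home) / "AppData/Local/Google/Chrome/User Data")
    return str(PurePosixPath(home) / ".config/google-chrome")


def launch_with_profile(user_data_dir: str, executable_path: str, start_url: str) -> str:
    """Open start_url in the installed Chrome using an existing profile; return the page title."""
    # Imported here so the path helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        context = playwright.chromium.launch_persistent_context(
            user_data_dir,
            executable_path=executable_path,
            headless=False,
        )
        # A persistent context usually starts with one blank page; reuse it if present.
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(start_url)
        title = page.title()
        context.close()
        return title
```

You could call it as launch_with_profile(chrome_user_data_dir(platform.system(), str(Path.home())), "<path to chrome>", "<site url>"), where both placeholder strings are yours to fill in.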
To connect to an already running browser (Chrome) session, you can use the connect_over_cdp method (added in Playwright v1.9).
For this, you need to start Chrome in debug mode. Create a desktop shortcut for Chrome and edit Target section of shortcut properties to start it with debug mode. Add --remote-debugging-port=9222 to the target box in shortcut properties so that the target path becomes:
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
Now start Chrome and check if it is in debug mode. For this open a new tab and paste this url in the address bar: http://localhost:9222/json/version. If you are in debug mode, you should see now a page with a json response, otherwise if you are in "normal" mode, it will say "Page not found" or something similar.
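If you prefer to do that check from a script rather than the address bar, something like the following sketch could work. It uses only the standard library and queries the same /json/version URL. The function names are made up for illustration, and the "Browser" and "webSocketDebuggerUrl" keys are the fields Chrome normally returns there, so treat them as an assumption for other browsers.

```python
import http.client
import json
import urllib.error
import urllib.request


def parse_version_info(payload: str) -> dict:
    """Pick the interesting fields out of a /json/version response body."""
    data = json.loads(payload)
    return {
        "browser": data.get("Browser"),
        "websocket_url": data.get("webSocketDebuggerUrl"),
    }


def debug_endpoint_info(base_url: str = "http://localhost:9222", timeout: float = 2.0):
    """Return the parsed version info, or None if no debug endpoint answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            return parse_version_info(resp.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not in debug mode".
        return None
```

A result of None suggests Chrome was not started with --remote-debugging-port=9222, or that it is listening on a different port.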
Now, in your Python script, use the following code to connect to the Chrome instance:
browser = playwright.chromium.connect_over_cdp("http://localhost:9222")
default_context = browser.contexts[0]
page = default_context.pages[0]
Here is the full script code:
# Import the sync_playwright function from the sync_api module of Playwright.
from playwright.sync_api import sync_playwright

# Start a new session with Playwright using the sync_playwright function.
with sync_playwright() as playwright:
    # Connect to an existing instance of Chrome using the connect_over_cdp method.
    browser = playwright.chromium.connect_over_cdp("http://localhost:9222")
    # Retrieve the first context of the browser.
    default_context = browser.contexts[0]
    # Retrieve the first page in the context.
    page = default_context.pages[0]
    # Print the title of the page (title is a method, so it must be called).
    print(page.title())
    # Print the URL of the page (url is a property).
    print(page.url)
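Once connected, the PDF download the question asks about could be sketched roughly as below. This is only an outline under some assumptions: the PDFs are plain links whose href ends in .pdf, the logged-in session's cookies are enough to fetch them, and your Playwright version is recent enough to provide context.request. The names pdf_filename and download_listed_pdfs, the listing_url parameter, and the pdfs output folder are all hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def pdf_filename(url: str) -> str:
    """Derive a local file name from a URL, ignoring any query string or fragment."""
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or "download.pdf"


def download_listed_pdfs(listing_url: str, out_dir: str = "pdfs",
                         cdp_url: str = "http://localhost:9222") -> list:
    """Save every PDF linked from listing_url, reusing the running browser's login."""
    # Imported here so pdf_filename can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with sync_playwright() as playwright:
        browser = playwright.chromium.connect_over_cdp(cdp_url)
        context = browser.contexts[0]
        page = context.new_page()
        page.goto(listing_url)
        # CSS selector: anchors whose href attribute ends in ".pdf" (assumed link shape).
        links = page.eval_on_selector_all(
            'a[href$=".pdf"]', "anchors => anchors.map(a => a.href)"
        )
        for link in links:
            # context.request shares cookies with the browser context.
            response = context.request.get(link)
            if response.ok:
                path = target / pdf_filename(link)
                path.write_bytes(response.body())
                saved.append(path)
        page.close()
    return saved
```

If the site serves its PDFs through buttons or scripts instead of plain links, the selector step would need to change, for instance to clicking each item and handling the resulting download.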
I am trying to close a portable browser via Selenium.
I passed --remote-debugging-port=9222 because if I do not pass it, the program gets stuck while creating the webdriver.Chrome() object. It opens the portable browser but does not load the URL.
Once the URL is open I want to close the browser, but driver.quit() is not working for me. I have tried some other methods to close the browser, but they do not work either.
I want to close only the specific browser instance opened by this program, not other open instances of the browser.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.binary_location = 'C:/Portable/GoogleChromePortable/GoogleChromePortable.exe'
chrome_options.add_argument("--remote-debugging-port=9222")
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--profile-directory=Person 1")
driver = webdriver.Chrome(options=chrome_options,executable_path='C:/Portabl/chromedriver_win32/chromedriver.exe')
url = "https://www.google.com/"
driver.get(url)
driver.quit()
I am using:
selenium 3.141.0, windows 10, python 3.8.0, portable chrome version 93.0.4577.63 (32-bit)
This statement of yours
I passed --remote-debugging-port=9222 because if I do not pass it then
the program is stuck in object creation of webdriver.Chrome()
does not look correct. --remote-debugging-port=9222 is a Chrome command-line switch that opens the browser's remote debugging (DevTools) interface on port 9222, and you are passing it to the browser through the Chrome options.
driver.quit()
This should normally have worked. What error do you get when it does not?
Also, to close only the current window you could use
driver.close()
See if that helps.
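For reference, driver.close() closes only the window that currently has focus, while driver.quit() ends the whole session and closes every window that this driver opened. One way to make sure quit() always runs, even when an exception is raised in between, is a small context manager like the sketch below. The names quitting and visit are made up for this example, and visit assumes a chromedriver that Selenium can find on its own.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call quit() afterwards, even if an error occurs."""
    try:
        yield driver
    finally:
        driver.quit()


def visit(url: str) -> str:
    """Open url in a fresh Chrome session, return its title, then shut the session down."""
    # Imported here so quitting() can be tried out without Selenium installed.
    from selenium import webdriver

    with quitting(webdriver.Chrome()) as driver:
        driver.get(url)
        return driver.title
```

Because quit() is called on the driver object itself, only the browser started by that driver is affected; other Chrome windows you opened by hand are left alone.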
I have been experimenting with automating some tasks with Selenium and got it to work successfully once. However, when I try to recreate what I did as a class, both headless and with the browser open, I run into an issue: it either hangs after logging in (with the browser open), or the login fails due to an incorrect password when headless (with the same info that logs in successfully when the browser is open).
When using default settings (Chromedriver), I can navigate to the login page, populate my info, hit the submit button, the webpage acts as it should, but the next line of code I run, no matter what (get_cookies, any kind of find call, get_screenshot_as_file), it will hang up, and timeout if I have the timeout setting applied, if not applied, I have let it run for 5 minutes with nothing happening.
When applying the headless option to the driver (desired functionality), through screenshots of the webpage, I am able to see that it is navigating to the page and populating all of the info properly, however it will always say login failed due to incorrect user/password when submitting, but it is the exact info used in the above when it logged in just fine (could this be cookie related? Not sure what all changes when headless is applied).
I have restarted my computer, updated all packages, restarted the text editor (Atom running Hydrogen), updated Chrome, uninstalled Chrome, did every combination of option I could think of, and I am still unable to recreate my first run where it worked just fine.
Versions:
* Mac OSX Catalina 10.15
* Python: 3.6.8
* Selenium: 3.141.0
* Chrome: 80.0.3987.163
-Current Code EDIT (for reference, my text editor is set up to run like a Jupyter notebook; I have listed all of the options I have attempted, but not all of them were included in each run)-
import os
from selenium import webdriver

# os.environ.get is a method, so it is called with parentheses
LOGIN = os.environ.get('BANKUSER')
PASSWORD = os.environ.get('BANKPASS')
chrme_drvr = '/usr/local/bin/chromedriver'
options = webdriver.ChromeOptions()
options.binary_location = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
options.add_argument("--window-size=300,150")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")
options.add_argument("--proxy-server='direct://'")
options.add_argument("--proxy-bypass-list=*")
options.add_argument("--start-maximized")
options.add_argument("--headless")
options.add_argument("--webdriver-logfile=webdrive.log")
options.add_argument("--DBUS_SESSION_BUS_ADDRESS=/dev/null ")
service_args = []
service_log_path = './chromedriver.log'
driver = webdriver.Chrome(chrme_drvr,
                          options=options,
                          service_args=service_args,
                          service_log_path=service_log_path)
login_url = 'https://www.schwab.com/public/schwab/nn/login/login.html&lang=en'
driver.get(login_url)
driver.switch_to.frame('loginIframe')
username = driver.find_element_by_id('LoginId')
password = driver.find_element_by_id('Password')
username.send_keys(LOGIN)
password.send_keys(PASSWORD)
driver.find_element_by_id('LoginSubmitBtn').click() #Code works through this line when not in headless mode, will be an invalid login when in headless mode
driver.get_screenshot_as_file("test.png") # This hangs
driver.get_cookies() # This hangs
element = driver.find_element_by_id("accounts_summary") # This hangs
I'd like to open a Chrome instance from Python through Selenium, using a proxy to hide my real IP so that I stop getting blocked when I scrape certain websites.
I have been going through several similar posts like this and this, but with no success.
I am using the following code:
from selenium import webdriver
PROXY = "165.22.62.179:8118" # IP:PORT or HOST:PORT
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % PROXY)
chrome = webdriver.Chrome(executable_path = path_to_chromedriver, options=chrome_options)
chrome.get(target_website)
The instance opens correctly, but when I run the command chrome.get(target_website) I get no response.
If I open the instance without a proxy it works fine. I am getting the proxies from this website. I want to create a function that takes a proxy IP as input and returns a running Chrome instance that uses that IP.
Can you please help me fix my code? Thanks for your help!
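To make the goal concrete, the function I have in mind would look roughly like this sketch. The names proxy_argument and chrome_with_proxy are just placeholders I chose. It assumes chromedriver can be found without passing executable_path, and the page-load timeout is there only so that a dead proxy raises an error instead of waiting forever.

```python
def proxy_argument(proxy: str) -> str:
    """Validate a HOST:PORT string and turn it into Chrome's --proxy-server switch."""
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected HOST:PORT, got {proxy!r}")
    return f"--proxy-server={proxy}"


def chrome_with_proxy(proxy: str, page_load_timeout: int = 30):
    """Return a running Chrome driver that sends its traffic through the given proxy."""
    # Imported here so proxy_argument() can be checked without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy))
    driver = webdriver.Chrome(options=options)
    # Fail with a timeout error rather than hanging if the proxy never answers.
    driver.set_page_load_timeout(page_load_timeout)
    return driver
```

With this, chrome_with_proxy("165.22.62.179:8118") would be the call matching the PROXY value above.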
I'm using Selenium 3.5 for Python to test some behaviors of my web app using both local pages and online ones.
Until now, I just used Chrome as main browser to test it and everything works fine. Then, I decided to use Firefox 55.0.3 using geckodriver v0.19.0.
I have the following issues:
Selenium opens local pages such as file:///PATH/index.html, but it doesn't close the browser even when I call quit();
I would like to use browser options such as incognito mode, but I was not able to find a list of all the options available for Firefox.
This is a snippet of my code to use selenium with Firefox.
from selenium import webdriver
url = 'http://www.yahoo.com'
#url = 'file:///PATH/index.html'
browser = webdriver.Firefox()
browser.get(url)
browser.quit()
# browser.close() -> I also tried close()
I also read this question about the first issue, but its solution doesn't work with the current version of the framework.
For the second issue I can only find partial solutions, such as this one for incognito mode, but it doesn't work as expected.
Any ideas? Is it some kind of bug in geckodriver?