I am trying to use Selenium to control Chrome at work. The methods I have found all seem to install the webdriver every time the code is run. However, because I work from home over a rather slow VPN, this can take up to a minute. So I am trying to check for the webdriver executable and skip the install step if it already exists.
Issue: When searching the directory, the files list is empty.
chromedriver.exe does exist in C:\Users\<user>\.wdm\drivers\chromedriver\win32\103.0.5060, as installed by a previous run.
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
# Set a variable for the web driver folder.
wddir = os.environ["USERPROFILE"] + r"\.wdm\drivers\chromedriver\win32"
# Check the existence of the folder.
# If it exists, find the driver and use it.
if os.path.exists(wddir):
    wdname = "chromedriver.exe"
    for root, dir, files in os.walk(wddir):
        if wdname in files:
            driver = webdriver.Chrome(os.path.join(root, wdname))
        else:  # install the driver
            driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
Resulting variable list:
- dir: ['103.0.5060']
- files: []
- root: C:\Users\<user>\.wdm\drivers\chromedriver\win32
- wddir: C:\Users\<user>\.wdm\drivers\chromedriver\win32
- wdname: chromedriver.exe
Since files = [], it goes on to install the driver anyway.
As an aside, is this a viable method for skipping the install? I am open to suggestions for better methods.
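One likely explanation for the empty files list: os.walk yields the top folder first, and chromedriver.exe sits one level down in the version subfolder, so the first iteration sees files == [] and takes the install branch. A minimal sketch of a lookup that decides only after the whole tree has been searched follows. The helper names find_cached_driver and start_chrome are hypothetical, and the sketch assumes the Selenium 4 Service API already used in the question.

```python
import os


def find_cached_driver(base_dir, name="chromedriver.exe"):
    """Return the path of the first `name` found under base_dir, or None."""
    for root, _dirs, files in os.walk(base_dir):
        if name in files:
            return os.path.join(root, name)
    return None


def start_chrome(base_dir):
    """Start Chrome with a cached driver, installing one only if none is found."""
    # Selenium and webdriver_manager are imported lazily so the lookup
    # helper above can be used on its own.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service as ChromeService

    driver_path = find_cached_driver(base_dir)
    if driver_path is None:
        # Nothing cached yet: fall back to the (slow) download.
        from webdriver_manager.chrome import ChromeDriverManager
        driver_path = ChromeDriverManager().install()
    return webdriver.Chrome(service=ChromeService(driver_path))
```

Note that a cached driver can stop matching the installed browser after a Chrome update, so skipping the install trades freshness for speed.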
I have noticed something very weird when trying to launch a Chrome driver with the user's own --user-data-dir and --profile-directory on Python 3.9.7, see below:
If you run the following code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
opt = Options() #the variable that will store the selenium options
opt.add_argument('--user-data-dir='+r'C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data') #Add the user data path as an argument in selenium Options
opt.add_argument('--profile-directory=Default') #Add the profile directory as an argument in selenium Options
s = Service('C:/Users/ResetStoreX/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe')
driver = webdriver.Chrome(service=s, options=opt)
driver.get('https://opensea.io/login?referrer=%2Faccount')
You successfully get a Chrome driver instance using the corresponding --user-data-dir and --profile-directory.
Now, after killing all chromedriver instances by running the following command in cmd:
taskkill /F /IM chromedriver.exe
And then running this other code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
opt = Options() #the variable that will store the selenium options
path = input('Introduce YOUR profile path:')
opt.add_argument('--user-data-dir='+fr'"{path}"') #Add the user data path as an argument in selenium Options
opt.add_argument('--profile-directory=Default') #Add the profile directory as an argument in selenium Options
s = Service('C:/Users/ResetStoreX/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe')
driver = webdriver.Chrome(service=s, options=opt)
driver.get('https://opensea.io/login?referrer=%2Faccount')
And finally entering C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data as input,
You get this error:
WebDriverException: unknown error: Could not remove old devtools port
file. Perhaps the given user-data-dir at
"C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data"
is still attached to a running Chrome or Chromium process
Why does that happen?
Isn't opt.add_argument('--user-data-dir='+fr'"{path}"') a valid way of passing this user data path:
path = C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data ?
I figured it out: the extra double quotes in opt.add_argument('--user-data-dir='+fr'"{path}"') were being passed along as part of the path itself, so I changed it to opt.add_argument('--user-data-dir='+fr'{path}'). The improved code is the following:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
opt = Options() #the variable that will store the selenium options
path = input('Introduce YOUR profile path:')
opt.add_argument('--user-data-dir='+fr'{path}') #Add the user data path as an argument in selenium Options
opt.add_argument('--profile-directory=Default') #Add the profile directory as an argument in selenium Options
s = Service('C:/Users/ResetStoreX/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe')
driver = webdriver.Chrome(service=s, options=opt)
driver.get('https://opensea.io/login?referrer=%2Faccount')
After running this code, the program runs without throwing any errors and gets the same result as the first code shown in this post.
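To make the difference between the two variants visible, the following small sketch builds both argument strings and prints them. The helper user_data_dir_argument is a hypothetical name; stripping surrounding quotes from the typed input is an added assumption, not something the original code does.

```python
def user_data_dir_argument(path):
    """Build the --user-data-dir switch from a path typed by the user."""
    # Strip whitespace and any surrounding double quotes the user may have typed.
    cleaned = path.strip().strip('"')
    return "--user-data-dir=" + cleaned


path = r"C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data"
quoted = "--user-data-dir=" + fr'"{path}"'  # the failing variant from the question
plain = "--user-data-dir=" + fr"{path}"     # the working variant

print(quoted)  # the double quotes are literal characters inside the argument
print(plain)
```

Because Selenium hands each argument to Chrome as a single value rather than through a shell, no extra quoting is needed even though the path contains a space.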
skillshare-downloader says:
grab your cookie by typing:
document.cookie
Then it says:
Copy-paste cookie from developer console (without " if present) into example script.
Example:
from downloader import Downloader
cookie = """
ADD YOUR COOKIE HERE
"""
It adds an extra step.
Is there any way to save the document.cookie output to a file, so that we can just read the cookie from the file instead of going to the console, typing document.cookie and copy-pasting the output?
I checked "How to write console.log to a file instead". I also checked "Python open browser and run javascript function", which suggests using Selenium or the webbrowser module. However, I am not sure how to approach this problem.
What can I do?
Assuming that you are using Chrome:
Install Selenium by running pip install selenium in a terminal.
Install a chromedriver manager (the chromedriver is what lets you control Chrome) by running pip install webdriver-manager in a terminal.
Create a file example.py and paste this inside:
# import the selenium webdriver, the Service and By helpers, and the chromedriver manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep
# try to stop skillshare from detecting that we are a bot
options = webdriver.ChromeOptions()
options.add_argument('--disable-blink-features=AutomationControlled')
# create the instance of chrome (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
# get is used to open a url
driver.get('https://www.skillshare.com/')
# login part
# press on sign in
driver.find_element(By.CSS_SELECTOR, "a.button.alt-white-ghost.transparent.initialized").click()
# change email here
driver.find_element(By.NAME, "email").send_keys("testmail#mail.com")
# change password here
driver.find_element(By.NAME, "password").send_keys("fakepassword")
# wait a second
sleep(1)
# click on sign in
driver.find_element(By.XPATH, "//span[text()='Sign In']/parent::button").click()
# wait 3 seconds for the login
sleep(3)
# execute_script runs the command in the browser, as the console would; return hands the value back to Python
cookie = driver.execute_script('return document.cookie')
# write the cookie to a file at the given path (raw string, so the backslash is kept as-is)
with open(r"D:\cookie.txt", "w") as f:
    f.write(cookie)
# close the chrome instance
driver.close()
Open a terminal and type python example.py to run the script.
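Once the cookie is in a file, the downloader script can read it back instead of having it pasted in. The following is a minimal sketch; the helper name load_cookie is hypothetical, and the file is read with the platform default encoding to match how the script above writes it.

```python
from pathlib import Path


def load_cookie(cookie_file):
    """Read the saved cookie string and drop surrounding whitespace and newlines."""
    return Path(cookie_file).read_text().strip()


# Usage (path taken from the script above):
# cookie = load_cookie(r"D:\cookie.txt")
```

The resulting string can then take the place of the pasted cookie value in the example script.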
Since JupyterLab 3.x jupyter-server is used instead of the classic notebook server, and the following code does not list servers served with jupyter_server:
from notebook import notebookapp
notebookapp.list_running_servers()
None
What still works for the file/notebook name is:
from time import sleep
from IPython.display import display, Javascript
import subprocess
import os
import uuid
def get_notebook_path_and_save():
    # write a unique marker into the notebook output, then search for it on disk
    magic = str(uuid.uuid1()).replace('-', '')
    print(magic)
    # saves it (ctrl+S)
    # display(Javascript('IPython.notebook.save_checkpoint();'))  # Javascript Error: IPython is not defined
    nb_name = None
    while nb_name is None:
        try:
            sleep(0.1)
            nb_name = subprocess.check_output(f'grep -l {magic} *.ipynb', shell=True).decode().strip()
        except subprocess.CalledProcessError:
            # grep exits non-zero until the notebook containing the marker has been saved
            pass
    return os.path.join(os.getcwd(), nb_name)
But it's neither pythonic nor fast.
How can I get the currently running server instances, and so e.g. the current notebook file?
Migration to jupyter_server should be as easy as changing notebook to jupyter_server, notebookapp to serverapp and changing the appropriate configuration files - the server-related codebase is largely unchanged. In the case of listing servers simply use:
from jupyter_server import serverapp
serverapp.list_running_servers()
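Building on that call, one possible way to get from the server list to the current notebook file is to ask each server for its sessions and match the kernel id. This is a sketch under several assumptions: that each server info dict carries url, token and root_dir keys, that the server accepts token authentication, and that each session returned by /api/sessions has kernel.id and path fields. All three function names are hypothetical.

```python
import json
import os
import re
import urllib.request


def kernel_id_from_connection_file(connection_file):
    """Extract the kernel id from a path such as .../kernel-<id>.json."""
    match = re.search(r"kernel-(.+)\.json$", os.path.basename(connection_file))
    return match.group(1) if match else None


def fetch_sessions(server):
    """Ask one running server for its sessions through the REST API."""
    url = f"{server['url']}api/sessions?token={server.get('token', '')}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_notebook_path(servers, kernel_id, fetch=fetch_sessions):
    """Return the path of the notebook attached to kernel_id, or None."""
    for server in servers:
        for session in fetch(server):
            if session["kernel"]["id"] == kernel_id:
                return os.path.join(server["root_dir"], session["path"])
    return None


# Usage inside a running notebook (needs jupyter_server and ipykernel):
# from jupyter_server import serverapp
# import ipykernel
# kernel_id = kernel_id_from_connection_file(ipykernel.get_connection_file())
# print(find_notebook_path(serverapp.list_running_servers(), kernel_id))
```

The fetch parameter is there so the lookup can be exercised with canned data; password-protected servers would need a different authentication step.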
I would like to have Selenium run a headless instance of Google Chrome to mine data from certain websites without the UI overhead. I downloaded the ChromeDriver executable from here and copied it to my current scripting directory.
The driver appears to work fine with Selenium and is able to browse automatically, however I cannot seem to find the headless option. Most online examples of using Selenium with headless Chrome go something along the lines of:
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), chrome_options=chrome_options)
driver.get("http://www.duo.com")
However when I inspect the possible arguments for the Selenium WebDriver using the command chromedriver -h this is what I get:
D:\Jobs\scripts>chromedriver -h
Usage: chromedriver [OPTIONS]
Options
--port=PORT port to listen on
--adb-port=PORT adb server port
--log-path=FILE write server log to file instead of stderr, increases log level to INFO
--log-level=LEVEL set log level: ALL, DEBUG, INFO, WARNING, SEVERE, OFF
--verbose log verbosely (equivalent to --log-level=ALL)
--silent log nothing (equivalent to --log-level=OFF)
--append-log append log file instead of rewriting
--replayable (experimental) log verbosely and don't truncate long strings so that the log can be replayed.
--version print the version number and exit
--url-base base URL path prefix for commands, e.g. wd/url
--whitelisted-ips comma-separated whitelist of remote IP addresses which are allowed to connect to ChromeDriver
No --headless option is available.
Does the ChromeDriver obtained from the link above allow for headless browsing?
--headless is not an argument for chromedriver but for Chrome. It runs Chrome in headless mode, i.e., without a UI or display server dependencies. ChromeDriver is a separate executable that WebDriver uses to control Chrome, and WebDriver is a collection of language-specific bindings to drive a browser.
I am able to run in headless mode with this set of options. I hope this will help:
from bs4 import BeautifulSoup, NavigableString
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
import requests
import re
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
browser = webdriver.Chrome(chrome_options=options) # see edit for recent code change.
browser.implicitly_wait(20)
Update 12 Aug 2019:
old : browser = webdriver.Chrome(chrome_options=options)
new : browser = webdriver.Chrome(options=options)
Try
options.headless=True
The following is how I set up my headless chrome
options = webdriver.ChromeOptions()
options.headless=True
options.add_argument('window-size=1920x1080')
prefs = {
"download.default_directory": r"C:\FilePath\Download",
"download.prompt_for_download": False,
"download.directory_upgrade": True}
options.add_experimental_option('prefs', prefs)
chromedriver = (r"C:\Filepath\chromedriver.exe")
--headless is not an argument for chromedriver but for Chrome; you can see more arguments (command line switches) for Chrome here
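Pulling the answers above together, a minimal sketch of a headless start with the options= keyword could look like this. The function names are hypothetical, and the sketch assumes a chromedriver that Selenium can locate (for example on PATH) and uses Chrome's comma-separated --window-size form.

```python
def headless_chrome_arguments(width=1920, height=1080):
    """Return the Chrome switches (not chromedriver options) for a headless run."""
    return ["--headless", "--disable-gpu", f"--window-size={width},{height}"]


def start_headless_chrome():
    """Start Chrome without a UI; the switches are passed through Options."""
    # Imported lazily so the argument list can be inspected without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


# Usage:
# browser = start_headless_chrome()
# browser.get("http://www.duo.com")
# browser.quit()
```

The switches travel inside the session request that Selenium sends to chromedriver, which is why chromedriver -h never lists them.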
I am trying to use PhantomJS with Selenium and Python.
My understanding is:
I will have to write a Python script using the Selenium package, which will drive the PhantomJS WebDriver to automate web application testing.
I have installed the following:
Python v3.5.1.
Selenium v3.7.0, installed using pip install selenium.
PhantomJS v2.1.1
In the meantime I tested using the Chrome WebDriver by placing it in PATH, and it executes without errors. The following is my script to open google.com using the Chrome webdriver.
from selenium import webdriver
driver = webdriver.Chrome() # or add to your PATH
driver.get('https://google.com/')
Using PhantomJS:
from selenium import webdriver
url = "http://www.google.com"
path_phantom = r'H:\phantomjs\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path_phantom)
driver.get(url)
driver.save_screenshot(r'H:\out.png')
driver.quit()
Errors:
Traceback (most recent call last):
  File "C:\Users\acer\Desktop\testing\openYoutube.py", line 5, in <module>
    driver = webdriver.PhantomJS()
  File "C:\Users\acer\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 51, in __init__
    log_path=service_log_path)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\phantomjs\service.py", line 50, in __init__
    service.Service.__init__(self, executable_path, port=port, log_file=open(log_path, 'w'))
PermissionError: [Errno 13] Permission denied: 'ghostdriver.log'
Am I misplacing the PhantomJS exe or missing a step?
You can place the PhantomJS v2.1.1 binary at any location on your system and use the following code block:
from selenium import webdriver
url = "http://www.url.com.br/contact.asp"
path_phantom = r'C:\your_path\phantomjs-2.1.1-windows\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path_phantom)
driver.set_window_size(1400,1000)
driver.get(url)
Update:
Please consider the following points and try the following code block with debug messages:
Run the CCleaner tool to wipe off all the OS chores from your system.
You can opt for a system reboot.
Try to keep the Python application, web browser binaries and the WebDriver binaries, i.e. phantomjs.exe, on the same drive.
from selenium import webdriver
url = "http://www.google.com"
path_phantom = r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path_phantom)
print("PhantomJS browser invoked")
driver.get(url)
print("Browser Initialized")
driver.save_screenshot("C://Utility//out.png")
driver.quit()
print("Browser Closed")
The problem seems to be with the log file.
Changing the path of the log file solved this problem.
path_phantom = r'H:\phantomjs\bin\phantomjs.exe'
log_path=r'H:\ghostdriver.log' #changed path to a temporary file.
# service_log_path is required to change path of log file.
driver = webdriver.PhantomJS(executable_path=path_phantom,service_log_path=log_path)
From your error:
PermissionError: [Errno 13] Permission denied: 'ghostdriver.log'
It seems that it tries to create the file ghostdriver.log but fails because of permissions.
As suggested in this answer, try adding the argument
service_log_path=os.path.devnull
to the call to webdriver.PhantomJS().
Or make sure it is able to create the file.
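One way to make sure the file can be created is to test the location first and fall back to the temp directory. The following is a minimal sketch; the helper name writable_log_path is hypothetical, and probing by opening the file in append mode (which leaves an empty file behind) is a design choice of this sketch.

```python
import os
import tempfile


def writable_log_path(preferred_dir, filename="ghostdriver.log"):
    """Return a log path in preferred_dir if it can be opened for writing,
    otherwise a path in the system temp directory."""
    candidate = os.path.join(preferred_dir, filename)
    try:
        # Append mode creates the file if needed without truncating an existing log.
        with open(candidate, "a"):
            pass
        return candidate
    except OSError:
        # Covers PermissionError as well as a missing directory.
        return os.path.join(tempfile.gettempdir(), filename)


# Usage with the Selenium 3.x call from the question:
# driver = webdriver.PhantomJS(executable_path=path_phantom,
#                              service_log_path=writable_log_path(os.getcwd()))
```

The traceback shows a relative 'ghostdriver.log', so the default location is the current working directory, which is why the sketch probes os.getcwd() first in the usage line.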