save document.cookie output in a file - python-3.x

skillshare-downloader says:
grab your cookie by typing:
document.cookie
Then it says:
Copy-paste cookie from developer console (without " if present) into example script.
Example:
from downloader import Downloader
cookie = """
ADD YOUR COOKIE HERE
"""
It adds an extra step.
Is there any way we can save document.cookie output to a file so that we can just read the cookie from the file instead of going to the console and type document.cookie and copy-paste the output?
I checked How to write console.log to a file instead. I also checked Python open browser and run javascript function. It suggests using Selenium or webbroser module. However, I am not sure how to approach this problem.
What can I do?

Assuming that you are using chrome:
Install selenium by running in a terminal pip install selenium
Install a chromedriver via a manager (It allows you to control chrome) by running in a terminal pip install webdriver-manager
Create a file example.py and paste this inside
#import the selenium webdriver and the chromedriver
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep
#trying to stop skillshare from detecting we a are a bot
options = webdriver.ChromeOptions()
options.add_argument('--disable-blink-features=AutomationControlled')
#create the instance of chrome
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)
#get command used to open an url
driver.get('https://www.skillshare.com/')
#login part
#press on sign in
driver.find_element_by_css_selector("a.button.alt-white-ghost.transparent.initialized").click()
#change email here
driver.find_element_by_name("email").send_keys("testmail#mail.com")
#change password here
driver.find_element_by_name("password").send_keys("fakepassword")
#wait a second
sleep(1)
#click on sign in
driver.find_element_by_xpath("//span[text()='Sign In']/parent::button").click()
#wait 3 seconds for the login
sleep(3)
#execute_script is used to execute the command in the browser console, using return here to store it in a variable
cookie = driver.execute_script('return document.cookie')
#python way of creating a file on the given path and write the cookie inside it
f = open("D:\cookie.txt", "w")
f.write(cookie)
f.close()
#closing the chrome instance
driver.close()
open a terminal and type python example.py and it will run the script

Related

Why does opt.add_argument('--user-data-dir='+r'path') work but opt.add_argument('--user-data-dir='+fr'"{path}"') doesn't as an option to Selenium?

I have realized of something very weird when trying to deploy a chrome driver using --user-data-dir and --profile-directory from the user on Python 3.9.7, see below:
If you compile the following code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
opt = Options() #the variable that will store the selenium options
opt.add_argument('--user-data-dir='+r'C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data') #Add the user data path as an argument in selenium Options
opt.add_argument('--profile-directory=Default') #Add the profile directory as an argument in selenium Options
s = Service('C:/Users/ResetStoreX/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe')
driver = webdriver.Chrome(service=s, options=opt)
driver.get('https://opensea.io/login?referrer=%2Faccount')
You get successfully a chrome driver instance using the corresponding --user-data-dir and --profile-directory:
Now, after killing all chrome driver instances using the following code on cmd:
taskkill /F /IM chromedriver.exe
And then compiling this other code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
opt = Options() #the variable that will store the selenium options
path = input('Introduce YOUR profile path:')
opt.add_argument('--user-data-dir='+fr'"{path}"') #Add the user data path as an argument in selenium Options
opt.add_argument('--profile-directory=Default') #Add the profile directory as an argument in selenium Options
s = Service('C:/Users/ResetStoreX/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe')
driver = webdriver.Chrome(service=s, options=opt)
driver.get('https://opensea.io/login?referrer=%2Faccount')
For finally typing: C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data as input
You get this error:
WebDriverException: unknown error: Could not remove old devtools port
file. Perhaps the given user-data-dir at
"C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data"
is still attached to a running Chrome or Chromium process
Why does that happen?
Isn't opt.add_argument('--user-data-dir='+fr'"{path}"') a valid way of passing this user data path:
path = C:\Users\ResetStoreX\AppData\Local\Google\Chrome\User Data ?
I figured it out, I was creating a syntax error with opt.add_argument('--user-data-dir='+fr'"{path}"'), so I changed it for opt.add_argument('--user-data-dir='+fr'{path}'), the improved code would be the following:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
opt = Options() #the variable that will store the selenium options
path = input('Introduce YOUR profile path:')
opt.add_argument('--user-data-dir='+fr'{path}') #Add the user data path as an argument in selenium Options
opt.add_argument('--profile-directory=Default') #Add the profile directory as an argument in selenium Options
s = Service('C:/Users/ResetStoreX/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe')
driver = webdriver.Chrome(service=s, options=opt)
driver.get('https://opensea.io/login?referrer=%2Faccount')
After compiling this code, the program will run without throwing any errors and get the same result as the first code shown in this post.

Chrome browser closes automatically when run chromedriver inside a method/function

I was trying to run some selenium test in chrome browser in Pycharm IDE.I wrote the chrome driver inside a function & when i tried to run the code it opened up the browser and closed automatically within a sec.But when I wrote the chrome driver outside the function it open the browser and didnot close. How do i manage to keep the browser open if I write the chromedriver code inside a method/function?
Code:
from selenium import webdriver
import os
class Chrome:
def Run(self):
driverLocation="F:\\Workspace py\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = driverLocation
driver = webdriver.Chrome(driverLocation)
driver.get("https://www.google.com")
Test=Chrome()
Test.Run()
This worked for me:
from selenium import webdriver
import os
class Chrome:
def Run(self):
self.driverLocation="F:\\Workspace py\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = self.driverLocation
self.driver = webdriver.Chrome(driverLocation)
self.driver.get("https://www.google.com")
Test=Chrome()
Test.Run()

--headless is not an option in Chrome WebDriver for Selenium

I would like to have Selenium run a headless instance of Google Chrome to mine data from certain websites without the UI overhead. I downloaded the ChromeDriver executable from here and copied it to my current scripting directory.
The driver appears to work fine with Selenium and is able to browse automatically, however I cannot seem to find the headless option. Most online examples of using Selenium with headless Chrome go something along the lines of:
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'`
driver = webdriver.Chrome(executable_path=os.path.abspath(“chromedriver"), chrome_options=chrome_options)
driver.get("http://www.duo.com")`
However when I inspect the possible arguments for the Selenium WebDriver using the command chromedriver -h this is what I get:
D:\Jobs\scripts>chromedriver -h
Usage: chromedriver [OPTIONS]
Options
--port=PORT port to listen on
--adb-port=PORT adb server port
--log-path=FILE write server log to file instead of stderr, increases log level to INFO
--log-level=LEVEL set log level: ALL, DEBUG, INFO, WARNING, SEVERE, OFF
--verbose log verbosely (equivalent to --log-level=ALL)
--silent log nothing (equivalent to --log-level=OFF)
--append-log append log file instead of rewriting
--replayable (experimental) log verbosely and don't truncate long strings so that the log can be replayed.
--version print the version number and exit
--url-base base URL path prefix for commands, e.g. wd/url
--whitelisted-ips comma-separated whitelist of remote IP addresses which are allowed to connect to ChromeDriver
No --headless option is available.
Does the ChromeDriver obtained from the link above allow for headless browsing?
--headless is not argument for chromedriver but for Chrome. --headless Run chrome in headless mode, i.e., without a UI or display server dependencies. ChromeDriver is a separate executable that WebDriver uses to control Chrome and Webdriver is a a collection of language specific bindings to drive a browser.
I am able to run in headless mode with this set of options. I hope this will help:
from bs4 import BeautifulSoup, NavigableString
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
import requests
import re
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
browser = webdriver.Chrome(chrome_options=options) # see edit for recent code change.
browser.implicitly_wait(20)
Update 12 Aug 2019:
old : browser = webdriver.Chrome(chrome_options=options)
new : browser = webdriver.Chrome(options=options)
Try
options.headless=True
The following is how I set up my headless chrome
options = webdriver.ChromeOptions()
options.headless=True
options.add_argument('window-size=1920x1080')
prefs = {
"download.default_directory": r"C:\FilePath\Download",
"download.prompt_for_download": False,
"download.directory_upgrade": True}
options.add_experimental_option('prefs', prefs)
chromedriver = (r"C:\Filepath\chromedriver.exe")
--headless is not argument for chromedriver but Chrome, you can see more arguments or Command Line Switches for chrome here

Where to place PhantomJS exe?

I am trying to use PhantomJS with Selenium and Python.
My understanding is:
I will have to write Python script utilizing Selenium package which will interact with Selenium to operate on PhantomJS WebDriver to automate web application testing.
I have installed following:
Python v3.5.1.
Selenium using pip install selenium v3.7.0.
PhantomJS v2.1.1
In meantime I tested using Chrome WebDriver by placing it in PATH, and it executes without errors. Following is my script to open google.com using chrome webdriver.
from selenium import webdriver
driver = webdriver.Chrome() # or add to your PATH
driver.get('https://google.com/')
Using PhantomJS:
from selenium import webdriver
url = "http://www.google.com"
path_phantom = r'H:\phantomjs\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path_phantom)
driver.get(url)
driver.save_screenshot(r'H:\out.png')
driver.quit()
Errors:
Traceback (most recent call last):
File "C:\Users\acer\Desktop\testing\openYoutube.py", line 5, in
driver = webdriver.PhantomJS()
File "C:\Users\acer\AppData\Local\Programs\Python\Python35-32\lib\site-package
s\selenium\webdriver\phantomjs\webdriver.py", line 51, in init
log_path=service_log_path)
File "C:\Users\acer\AppData\Local\Programs\Python\Python35-32\lib\site-package
s\selenium\webdriver\phantomjs\service.py", line 50, in init
service.Service.init(self, executable_path, port=port, log_file=open(log
_path, 'w'))
PermissionError: [Errno 13] Permission denied: 'ghostdriver.log'
Am I misplacing PhantomJS exe or missing any step ?
You can place the PhantomJS v2.1.1 binary at any location within your system and use the following code block :
from selenium import webdriver
url = "http://www.url.com.br/contact.asp"
path_phantom = r'C:\your_path\phantomjs-2.1.1-windows\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path_phantom)
driver.set_window_size(1400,1000)
driver.get(url)
Update :
Please consider the following points and try the following code block with debug messages:
Run CCleaner tool to wipe off all the OS chores from your system.
You can opt for a System Reboot.
Try to keep the Python Application, WebBrowser binaries and the WebDriver binaries i.e. phantomjs.exe on the same drive.
from selenium import webdriver
url = "http://www.google.com"
path_phantom = r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path_phantom)
print("PhantomJS browser invoked")
driver.get(url)
print("Browser Initialized")
driver.save_screenshot("C://Utility//out.png")
driver.quit()
print("Browser Closed")
Problem seems to be with the log file.
Changing path of log file solved this problem.
path_phantom = r'H:\phantomjs\bin\phantomjs.exe'
log_path=r'H:\ghostdriver.log' #changed path to a temporary file.
# service_log_path is required to change path of log file.
driver = webdriver.PhantomJS(executable_path=path_phantom,service_log_path=log_path)
From your error:
PermissionError: [Errno 13] Permission denied: 'ghostdriver.log
Seems that it try to create this file ghostdriver.log but fails because of the permissions.
As suggested in this answer, try to add the argument
service_log_path=os.path.devnull
to the function webdriver.PhantomJS().
Or make sure it is able to create the file.

Set Accepted-Lang for Chrome Headless with Selenium (Python)

Setting the Accepted-Lang header works fine with regular Chrome via ChromeOptions
options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
I'm trying to switch to new headless Chrome, but apparently this option has no effect when checking headers on validator.w3.org. Can I change them in another way? Anybody knows if support for this feature is coming?
Using Chrome 60, Chromedriver 2.30, Selenium 3.4.3, Python 3.6.1 on MacOS
Using this code:
from selenium import webdriver
print('Start')
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_experimental_option('prefs', {'intl.accept_languages':'en,en_US'})
driver = webdriver.Chrome(chrome_options=options)
driver.get('http://validator.w3.org/i18n-checker/check?uri=google.com#validate-by-uri+')
print('Loaded')
# Check headers in output.html file. Search for 'Request headers'
html_source = driver.page_source
file = open('output.html', 'w')
file.write(html_source)
file.close
driver.implicitly_wait(5)
# Or check headers with select
# WARNING: This fails with 'headless' chrome option!
element = driver.find_element_by_xpath("//code[#class='headers_accept_language_0']").get_attribute('textContent')
print('Element:', element)
driver.close()
print('Finish')
Thanks!
That should be possible using the Chrome-developer-protocoll (cdp).
You can execute cdp commands using driver.execute_cdp_cmd().
Implemented it to Selenium-Profiles

Resources