selenium driver not saving the webpage content

selenium driver not saving the webpage content - python-3.x

The code below creates an empty task_list.html file instead of saving the full HTML content of the page, which is what I want. How can I fix it?
from selenium import webdriver
import codecs
driver = webdriver.Firefox()
html = driver.page_source
driver.get("<our_internal_work_website>")
with open(r"C:\Users\task_list.html", "wb") as f:
    f.write(html.encode('utf-8'))
A few points:
I am brand new to Python; I started a few days ago and troubleshoot issues as they come up by searching on Stack Overflow.
I am running this code in an interactive shell in PowerShell on Windows 10.
I enter the login credentials manually when the browser opens. I do not yet know how to enter the login details automatically.
I do not have HTML/CSS knowledge, so I will google around in order to troubleshoot.
I want to use Beautiful Soup to parse the HTML, but the site requires login credentials and I do not know where to supply them, because Beautiful Soup does not open the Firefox browser itself. For now I am working with Selenium, because it can open the Firefox browser.
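
A likely cause, judging from the code in the question: driver.page_source is read before driver.get(...) has loaded anything, so the HTML that gets saved is that of a blank page. Below is a minimal sketch of the reordered flow. The helper name save_page_source and the wait_for_login callback are made up for illustration, and pausing on input() is only one assumed way to leave time for the manual login.

```python
from pathlib import Path


def save_page_source(driver, url, out_path, wait_for_login=None):
    """Load `url`, optionally wait for a manual login, then save the HTML."""
    driver.get(url)                # 1. navigate first
    if wait_for_login is not None:
        wait_for_login()           # 2. give the user time to log in by hand
    html = driver.page_source      # 3. only now read the rendered HTML
    Path(out_path).write_text(html, encoding="utf-8")
    return html


def main():
    # Imported here so the helper above can be tried without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        save_page_source(
            driver,
            "<our_internal_work_website>",   # placeholder from the question
            r"C:\Users\task_list.html",
            wait_for_login=lambda: input("Log in, then press Enter here... "),
        )
    finally:
        driver.quit()
```

Calling main() runs the whole flow. The saved file can then be parsed with Beautiful Soup without Beautiful Soup needing any credentials, since the HTML was fetched by the logged-in browser.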

Related

Is there a way to connect to my existing browser session using playwright

I wish to connect to a website and download some PDF files. The website lets us view the content only after logging in. It asks us to log in using an OTP, and an account cannot be logged in on more than 3 devices simultaneously.
I wish to download all the PDFs listed. So I previously tried
python -m playwright open --save-storage websitename.json
to save the login, but it doesn't work for that specific website.
The websitename.json file was empty, whereas it worked for other websites.
Therefore the only solution I can think of now is to connect to the current browser, open that website and then download those PDFs.
If you have a solution for this, or some other approach, please let me know.
I was also thinking about switching over to Puppeteer for this.
However, I don't know how to do HTML parsing in Node.js, and I am more comfortable using CSS selectors, so I can't switch.

Playwright is very similar to Puppeteer, so switching between the two should not be a problem.
You can use puppeteer-core or playwright to control your existing browser installation, for example Chrome, and point it at the existing user data (profile) folder so that the website's login information (cookies, web storage, etc.) is loaded.
const puppeteer = require('puppeteer-core')

const launchOptions = {
  headless: false,
  executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome', // For macOS
  // executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe', // For Windows
  // executablePath: '/usr/bin/google-chrome', // For Linux
  args: [
    '--user-data-dir=/Users/username/Library/Application Support/Google/Chrome/', // For macOS
    // '--user-data-dir=C:\\Users\\username\\AppData\\Local\\Google\\Chrome\\User Data', // For Windows
    // '--profile-directory=Profile 1', // Selects a specific profile instead of the default
  ]
}

// await is only valid inside an async function in a CommonJS script
;(async () => {
  const browser = await puppeteer.launch(launchOptions)
})()
For more details about Playwright's method, you can check this workaround:
https://github.com/microsoft/playwright/issues/1985
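
For readers staying with Python, the same profile-reuse idea can be sketched with Playwright's launch_persistent_context. This is an illustration under assumptions rather than a tested recipe for the site in the question: the helper name open_with_profile, the profile path and the example URL are placeholders, channel="chrome" assumes a regular Chrome installation, and Chrome generally needs to be closed first because a profile folder cannot be shared with another running instance.

```python
def open_with_profile(playwright, user_data_dir, url):
    """Launch Chrome on an existing profile folder and open `url` in it."""
    context = playwright.chromium.launch_persistent_context(
        user_data_dir,
        channel="chrome",   # use the installed Chrome rather than bundled Chromium
        headless=False,
    )
    # A persistent context usually starts with one blank page; reuse it if present.
    page = context.pages[0] if context.pages else context.new_page()
    page.goto(url)
    return context, page


def main():
    # Imported here so the helper above can be read and tried in isolation.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        context, page = open_with_profile(
            playwright,
            "/Users/username/Library/Application Support/Google/Chrome/",  # placeholder
            "https://example.com",                                         # placeholder
        )
        print(page.title())
        context.close()
```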

To connect to an already running browser (Chrome) session, you can use the connect_over_cdp method (added in v1.9 of Playwright).
For this, you need to start Chrome in debug mode. Create a desktop shortcut for Chrome and edit the Target field in the shortcut's properties so that Chrome starts in debug mode: append --remote-debugging-port=9222 so that the target becomes:
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
Now start Chrome and check that it is in debug mode. To do this, open a new tab and paste this URL into the address bar: http://localhost:9222/json/version. If Chrome is in debug mode, you should see a page with a JSON response; if it is in "normal" mode, you will get "Page not found" or something similar.
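
The same check can be done from Python before trying to connect. The sketch below uses only the standard library; the function name cdp_version is made up for illustration, and it simply treats any connection or parsing failure as "not in debug mode".

```python
import json
import urllib.error
import urllib.request


def cdp_version(port=9222, host="localhost", timeout=2.0):
    """Return the parsed /json/version response, or None if nothing answers."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON body.
        return None
```

For example, info = cdp_version() followed by print(info["Browser"] if info else "Chrome is not in debug mode") reports the browser version when the debugging port is open.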
Now, in your Python script, write the following code to connect to the Chrome instance:
browser = playwright.chromium.connect_over_cdp("http://localhost:9222")
default_context = browser.contexts[0]
page = default_context.pages[0]
Here is the full script:
# Import the sync_playwright function from the sync_api module of Playwright.
from playwright.sync_api import sync_playwright

# Start a new session with Playwright using the sync_playwright function.
with sync_playwright() as playwright:
    # Connect to an existing instance of Chrome using the connect_over_cdp method.
    browser = playwright.chromium.connect_over_cdp("http://localhost:9222")
    # Retrieve the first context of the browser.
    default_context = browser.contexts[0]
    # Retrieve the first page in the context.
    page = default_context.pages[0]
    # Print the title of the page (title() is a method, so it must be called).
    print(page.title())
    # Print the URL of the page (url is a property).
    print(page.url)

Trying to close portable browser via selenium

I am trying to close a portable browser via Selenium.
I pass --remote-debugging-port=9222 because without it the program gets stuck while creating the webdriver.Chrome() object: it opens the portable browser but does not load the URL.
After the URL has opened I want to close the browser, but driver.quit() is not working for me. I have tried some other methods of closing the browser, but they do not work either.
I want to close only the browser instance opened by this program, not other open instances of the browser.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.binary_location = 'C:/Portable/GoogleChromePortable/GoogleChromePortable.exe'
chrome_options.add_argument("--remote-debugging-port=9222")
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--profile-directory=Person 1")
driver = webdriver.Chrome(options=chrome_options,executable_path='C:/Portabl/chromedriver_win32/chromedriver.exe')
url = "https://www.google.com/"
driver.get(url)
driver.quit()
I am using:
selenium 3.141.0, windows 10, python 3.8.0, portable chrome version 93.0.4577.63 (32-bit)

Your statement
I passed --remote-debugging-port=9222 because if I do not pass it then
the program is stuck in object creation of webdriver.Chrome()
is surprising. --remote-debugging-port=9222 is a Chrome command-line switch that opens a DevTools debugging port on 9222; you are passing it to the browser through the Chrome options, and it is not normally required for webdriver.Chrome() to start.
driver.quit()
should normally have worked: it closes every window opened by that driver and ends the session. What error do you get when it does not work?
To close only the current window, you could also try
driver.close()
See if that helps.
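
To find out what actually goes wrong when quit() appears to do nothing, the call can be wrapped so that any exception is captured instead of being lost. This is only a diagnostic sketch; quit_and_report is a hypothetical helper name, not part of Selenium.

```python
def quit_and_report(driver):
    """Call driver.quit() and return the exception it raised, or None."""
    try:
        driver.quit()
    except Exception as exc:  # broad on purpose: any failure is of interest here
        return exc
    return None
```

Replacing the final driver.quit() in the question's script with error = quit_and_report(driver) and printing repr(error) would show whether quit() raised, or returned normally while the portable browser stayed open.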

WhatsApp without QR code scanning everytime through Selenium WebDriver

Is there any WhatsApp or WebDriver API for accessing WhatsApp Web without scanning the QR code every time, when using Selenium and the Chrome WebDriver in Python?

This is what you need. This code keeps the Chrome profile in a chrome-data folder (--user-data-dir), so the QR code only needs to be scanned on the first run; later runs reuse the stored session.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

jokes = ["You don't need a parachute to go skydiving. You need a parachute to go skydiving twice.",
         "This is Test Message."]

options = Options()
# Persist the browser profile so the WhatsApp Web login survives between runs
options.add_argument("--user-data-dir=chrome-data")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome('/usr/local/bin/chromedriver', options=options)
driver.maximize_window()
driver.get('https://web.whatsapp.com')  # Already authenticated
time.sleep(20)

##################### Provide Recipient Name Here ###############################
driver.find_element_by_xpath("//*[@title='MyJakartaNumber']").click()

for joke in jokes:
    # Type the message into the input box, then click the send button
    driver.find_element_by_xpath('//*[@id="main"]/footer/div[1]/div[2]/div/div[2]').send_keys(joke)
    driver.find_element_by_xpath('//*[@id="main"]/footer/div[1]/div[3]/button/span').click()
    time.sleep(10)

time.sleep(30)
driver.close()

The terms "WhatsApp" and "QR code" don't mean much to me. However, if you're testing an application that requires an extra manual action to sign in, I don't think you will be able to perform that action with Selenium, as it is a browser automation framework.
Web applications identify users via cookies, which are sent in HTTP headers and carry client-side information. When you start a web browser via the Selenium bindings, it kicks off a clean browser session that is not authenticated in WhatsApp.
The possible solutions are:
Authenticate in WhatsApp manually, store your browser profile somewhere, and start Selenium pointing at that stored profile folder.
Authenticate in WhatsApp manually, store your browser cookies, and use the WebDriver.add_cookie() function to load the stored cookies into the new clean session.
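
The second option can be sketched as follows. The helper names save_cookies and load_cookies and the JSON file format are assumptions made for illustration; get_cookies(), add_cookie() and refresh() are the WebDriver calls involved. Whether cookies alone are enough to restore a login depends on the site, since some sites keep session data in browser storage instead.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the current session's cookies to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()), encoding="utf-8")


def load_cookies(driver, url, path):
    """Open `url`, add the stored cookies, reload, and return how many were added."""
    driver.get(url)  # cookies can only be added for the domain currently loaded
    cookies = json.loads(Path(path).read_text(encoding="utf-8"))
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.refresh()  # reload so the page is requested with the cookies set
    return len(cookies)
```

Typical use would be to call save_cookies(driver, "cookies.json") once after logging in by hand, and load_cookies(driver, url, "cookies.json") at the start of later runs.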

You can use pywhatkit, a library for sending messages through WhatsApp Web. Run:
pip install pywhatkit
and you are good to go.

Is it possible to open a tab or something on chrome using python?

I already know how to open Chrome, but I do not know how to open a specific page from its URL. I'm using Python 3.6.

To open a webpage in your default browser:
import webbrowser
webbrowser.open("https://stackoverflow.com/questions/53796343")
To open a webpage in a new tab in Chrome:
import webbrowser
chrome_controller = webbrowser.get(using="chromium-browser")
chrome_controller.open_new_tab("https://stackoverflow.com/questions/53796343")
If this doesn't work for you, try another value for the using parameter, such as "google-chrome", "chrome" or "chromium". The webbrowser module's documentation lists the registered browser names.
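
Since the registered name differs between systems, one option is to try several names in turn and fall back to the default browser. A minimal sketch, in which the helper names pick_controller and open_in_new_tab and the order of the candidate names are assumptions:

```python
import webbrowser

CHROME_NAMES = ("google-chrome", "chrome", "chromium", "chromium-browser")


def pick_controller(names=CHROME_NAMES):
    """Return the first browser controller that can be located, or None."""
    for name in names:
        try:
            return webbrowser.get(using=name)
        except webbrowser.Error:
            continue  # this name is not available on this machine
    return None


def open_in_new_tab(url, names=CHROME_NAMES):
    """Open `url` in a new Chrome tab if possible, else in the default browser."""
    controller = pick_controller(names)
    if controller is None:
        return webbrowser.open_new_tab(url)
    return controller.open_new_tab(url)
```

Calling open_in_new_tab("https://stackoverflow.com/questions/53796343") then behaves like the snippet above, without failing when "chromium-browser" is not registered.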

Selenium 3.5 closing local files with Firefox 55.0.3

I'm using Selenium 3.5 for Python to test some behaviors of my web app, using both local pages and online ones.
Until now I used Chrome as the main browser for testing and everything worked fine. Then I decided to use Firefox 55.0.3 with geckodriver v0.19.0.
I have the following issues:
Selenium opens local pages such as file:///PATH/index.html, but it doesn't close the browser even when I call quit();
I would like to use browser options such as incognito mode, but I was not able to find a list of all the available options for Firefox.
This is a snippet of my code for using Selenium with Firefox.
from selenium import webdriver
url = 'http://www.yahoo.com'
#url = 'file:///PATH/index.html'
browser = webdriver.Firefox()
browser.get(url)
browser.quit()
# browser.close() -> I also tried close()
I also read this question about the first issue, but its answer doesn't work with the current versions of the tools I use.
For the second issue I can only find partial solutions, such as this one for incognito mode, but it doesn't work as expected.
Any ideas? Is it some kind of bug in geckodriver?
