Why can't I get the page source with a headless browser using Selenium? - python-3.x

I can get the page source when Chrome runs with its head on (non-headless mode).
vim get_with_head.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
chrome_options = Options()
browser = webdriver.Chrome(executable_path="/usr/bin/chromedriver",options=chrome_options)
browser.maximize_window()
wait = WebDriverWait(browser, 40)
url="https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index"
browser.get(url)
wait.until(lambda e: e.execute_script('return document.readyState') != "loading")
print(browser.page_source)
It works fine.
python3 get_with_head.py
Chrome opens the webpage and all of its content is shown. Now I add three lines to make the browser headless:
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("--headless")
The whole script:
vim get_without_head.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("--headless")
browser = webdriver.Chrome(executable_path="/usr/bin/chromedriver",options=chrome_options)
browser.maximize_window()
wait = WebDriverWait(browser, 40)
url="https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index"
browser.get(url)
wait.until(lambda e: e.execute_script('return document.readyState') != "loading")
print(browser.page_source)
In headless mode it can't get the content of the webpage:
python3 get_without_head.py
<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>
You don't have permission to access "http://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index" on this server.<p>
Reference #18.4660dc17.1631258672.2c70b7e3
</p></body></html>
Why can I get all the content with the browser's head on, but not in headless mode?

Why?
Headless mode sends its own default User-Agent unless a different one is passed as an argument. Some websites block the headless User-Agent to keep out unwanted automated traffic, which can result in an Access Denied error when you try to open the page.
An example of a default User-Agent in headless mode:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/60.0.3112.50 Safari/537.36
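As you can see, the HeadlessChrome token explicitly reveals that the browser is running in headless mode.
Solution:
Override the User-Agent through the Chrome options so that it no longer contains the HeadlessChrome token.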
windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("--headless")
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
chrome_options.add_argument(f'user-agent={user_agent}')
browser = webdriver.Chrome(options=chrome_options)
browser.maximize_window()
wait = WebDriverWait(browser, 40)
url="https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index"
browser.get(url)
wait.until(lambda e: e.execute_script('return document.readyState') != "loading")
print(browser.page_source)
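Instead of hard-coding the replacement string, the spoofed User-Agent could be derived from the headless one. The following is only a minimal sketch: `strip_headless_marker` is a hypothetical helper, not part of Selenium, and it assumes the only difference that matters is the `HeadlessChrome` token shown in the example above.

```python
def strip_headless_marker(user_agent: str) -> str:
    """Rewrite the 'HeadlessChrome/' token to 'Chrome/' and leave the rest untouched."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


# The example headless User-Agent quoted in the answer above.
headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/60.0.3112.50 Safari/537.36"
)

spoofed_ua = strip_headless_marker(headless_ua)
print(spoofed_ua)
```

The resulting string would then be passed as `chrome_options.add_argument(f'user-agent={spoofed_ua}')`, in the same way as the hard-coded value in the answer. A site may use other signals besides the User-Agent, so this alone is not guaranteed to lift the block.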

Related

TypeError: get_mobile_load_time() missing 1 required positional argument: 'url'

from selenium import webdriver
def get_mobile_load_time(url):
    mobile_user_agent = "Mozilla/5.0 (Linux; Android 8.0.0; SM-G960F Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.3"
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument(f'user-agent={mobile_user_agent}')
    driver = webdriver.Chrome(chrome_options=chrome_options)
    driver.get(url)
    load_time = driver.execute_script(
        "return (window.performance.timing.loadEventEnd - window.performance.timing.navigationStart) / 1000;")
    driver.quit()
    return load_time
I'm trying to build Python code to assess a URL's mobile responsiveness. I accessed the URL via beautifulsoup and then passed it to the function.
However, I get an 'Internal server error' and the console shows this error: "TypeError: get_mobile_load_time() missing 1 required positional argument: 'url'"
What should I do?
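The calling code is not shown in the question, so the following is only a sketch of what the message means: Python raises this TypeError when the function is called without its `url` argument. The function body here is a stand-in (an assumption for illustration) so the example runs without a browser, and `https://example.com` is a placeholder URL.

```python
def get_mobile_load_time(url):
    # Stand-in body: the real function drives Chrome; this one only echoes its input.
    return f"would measure {url}"


# Calling the function with no argument reproduces the error from the question.
try:
    get_mobile_load_time()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Passing the URL explicitly avoids the error.
result = get_mobile_load_time("https://example.com")
print(result)
```

If the function is called from a web framework's request handler, it is worth checking that the handler actually forwards the URL it extracted to the function.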

How to detect if an initialized Selenium driver is headless

Let's say I have this code
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument("window-size=1920,1080")
browser=webdriver.Chrome(options=options,executable_path=r"chromedriver.exe")
browser.execute_cdp_cmd('Network.setUserAgentOverride',
{"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
How can I check if the initialized browser is headless or not, programmatically? I mean, if I type
browser.get_window_size() I get {'width': 1920, 'height': 1080}, if I write browser.execute_script('return navigator.languages') it returns ['en-US', 'en']
What I'm looking for is something like browser.is_headless() where I can get if a given browser is headless or not.
options = webdriver.ChromeOptions()
options.headless
This returns True if the --headless argument has been set on ChromeOptions(); otherwise it returns False.
Based on the official Selenium documentation
options.headless
should return whether headless is set or not
If you're using Firefox (tested on Firefox 106):
if driver.caps.get("moz:headless", False):
    print("Firefox is headless")
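If `options.headless` is not available in your Selenium version, a fallback is to inspect the argument list that was given to the options object (exposed as `options.arguments`). The helper below is a hypothetical sketch, not a Selenium API; it only reports whether headless mode was requested through the arguments, not what the running browser is actually doing.

```python
def has_headless_flag(arguments):
    """Return True if a Chrome-style argument list requests headless mode."""
    for arg in arguments:
        # Accept '--headless', 'headless' and '--headless=new' style spellings.
        name = arg.lstrip("-").split("=", 1)[0]
        if name == "headless":
            return True
    return False


# Plain lists stand in for `options.arguments` so the sketch runs without a browser.
headless_args = ["--headless", "window-size=1920,1080"]
headed_args = ["window-size=1920,1080"]

print(has_headless_flag(headless_args))
print(has_headless_flag(headed_args))
```

With a real options object the call would be `has_headless_flag(options.arguments)`.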

selenium: bypass access denied

I'm trying to navigate a website with Selenium, but I'm getting an error: Access Denied. You do not have permission to access "http://tokopedia.com/" on this server.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
CHROMEDRIVER_PATH = r'C:/chromedriver.exe'
tokopedia = "https://tokopedia.com/"
options = Options()
options.add_argument("--headless")
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(executable_path=CHROMEDRIVER_PATH, chrome_options=options)
driver.get(tokopedia)
print(driver.page_source)
How can I solve it? Thank you for the help.
Try the code below; it works for me:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
tokopedia = "https://tokopedia.com/"
options = Options()
options.add_argument("--headless")
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
options.add_argument('user-agent={0}'.format(user_agent))
driver = webdriver.Chrome(options=options)
driver.get(tokopedia)
print(driver.page_source)
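To confirm whether the User-Agent change worked, the returned page source can be checked for the block page. This is a rough sketch: `looks_like_access_denied` is a hypothetical helper, and matching on the `<title>` text is an assumption based on the "Access Denied" pages quoted in these questions; other sites may signal a block differently.

```python
import re


def looks_like_access_denied(page_source: str) -> bool:
    """Return True if the page title contains 'Access Denied' (case-insensitive)."""
    match = re.search(r"<title>(.*?)</title>", page_source, re.IGNORECASE | re.DOTALL)
    return bool(match) and "access denied" in match.group(1).lower()


# A block page shaped like the one quoted earlier, and an ordinary page for contrast.
blocked_page = (
    "<html><head>\n<title>Access Denied</title>\n</head><body>\n"
    "<h1>Access Denied</h1>\n</body></html>"
)
normal_page = "<html><head><title>Home</title></head><body>Welcome</body></html>"

print(looks_like_access_denied(blocked_page))
print(looks_like_access_denied(normal_page))
```

In the script above, the check would be applied as `looks_like_access_denied(driver.page_source)` after `driver.get(...)`.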

Selenium not able to create login account on target.com

When I try to create an account on target.com using Selenium WebDriver, I get this error: "Sorry, something went wrong. Please try again." However, if I create an account manually in a different tab of the same browser, the account is created. How can I create an account using Selenium WebDriver?
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
options = webdriver.ChromeOptions()
options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome('C:/Users/priya/Desktop/project 14/chromedriver.exe',chrome_options=options)
driver.get('https://www.target.com/')
account = driver.find_element_by_id('account')
account.click()
time.sleep(1)
create = driver.find_element_by_id('accountNav-createAccount')
create.click()
time.sleep(4)
username = driver.find_element_by_id('username')
username.send_keys('abcdef#example.com')
fname = driver.find_element_by_id('firstname')
fname.send_keys('John')
lname = driver.find_element_by_id('lastname')
lname.send_keys('Kenny')
password = driver.find_element_by_id('password')
password.send_keys('Icecram12345')
submit = driver.find_element_by_id('createAccount')
submit.click()
driver.close()
Link to error screenshot: https://i.stack.imgur.com/7ksIv.png

web.Whatsapp headlessly using phantomjs

I am using PhantomJS to start a web session on web.whatsapp.com, with Chrome's user-agent because WhatsApp does not support the PhantomJS user-agent.
The code is as follows:
var page = require('webpage').create();
page.settings.userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36';
page.viewportSize = {
    width: 1200,
    height: 800
};
page.open('https://web.whatsapp.com/', function() {
    page.render('home.png');
    phantom.exit();
});
But the output is a blank white screen with a dot in the center.
Is there a bug in my code, or is there a compatibility issue?
PhantomJS is not waiting for the page to load completely; what you see in the screenshot is the loading icon.
Try this code, which uses an explicit wait before continuing.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Present a Chrome user-agent, since WhatsApp does not accept the PhantomJS one.
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
)
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(desired_capabilities=dcap, executable_path=r'/bin/phantomjs')
driver.get('http://web.whatsapp.com')
timeout = 30
try:
    # Wait until the QR code element is present instead of reading the page immediately.
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'qrcode'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
Note: WhatsApp needs a browser that supports cryptoSha256 and cryptoAesCbc for proper cryptography handling, and PhantomJS supports neither.
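The explicit wait used in the answer above boils down to polling a condition until it holds or a timeout expires. The sketch below illustrates that idea in plain Python; `wait_until` is a hypothetical stand-in and not Selenium's implementation, and the `page_ready` condition simulates a page that becomes ready on the third check.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated condition: reports "ready" on the third poll.
attempts = {"count": 0}


def page_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(page_ready, timeout=2.0, poll_interval=0.01)
print(ready, attempts["count"])
```

Compared with a fixed sleep, polling returns as soon as the condition holds and fails loudly when it never does.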
