PyCharm Selenium: how to open an actual web browser window (Linux) - python-3.x

Apologies for my first post, I am new to Stack Overflow. I greatly appreciate the help.
I can get it to run without opening the actual window, but I want to see the actual page it opens.
I have:
- Imported os
- Made sure the webdriver is up to date and matches the current browser version
- Made sure it is on the PATH (absolute and relative; I even put the driver in the same folder as the script)
- Tried Chrome and Firefox
- Checked that the driver file is owned by me (chown) and is executable
- Confirmed that it starts from a separate terminal instance when I type chromedriver
- Confirmed that 'which chromedriver' shows /usr/bin/chromedriver (and I used that as the path)
I have a very new Linux system running an Ubuntu-based distribution (Pop!_OS).
Everything is updated and upgraded.
I don't know what is wrong.
from selenium import webdriver
import os
import time
options = webdriver.ChromeOptions()
options.add_argument('--headless') # Remove this if you want a selenium controlled browser window
options.add_argument('--ignore-certificate-errors')
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
options.add_argument('user-agent={0}'.format(user_agent))
preferences = {
    "profile.default_content_settings.popups": 0,
    "download.default_directory": os.getcwd() + os.path.sep,
    "directory_upgrade": True
}  # My own set of preferences, use what you want
options.add_experimental_option('prefs', preferences)
driver = webdriver.Chrome("/home/wprice/PycharmProjects/sele/chromedriver-Linux64", options=options) # Since I am using Windows
driver.get("HTTPS://GOOGLE.COM")
time.sleep(20)
driver.save_screenshot("test.png")
ERRORS:
/home/wprice/PycharmProjects/sele/bin/python /home/wprice/PycharmProjects/sele/sele.py
Traceback (most recent call last):
File "/home/wprice/PycharmProjects/sele/sele.py", line 18, in <module>
driver = webdriver.Chrome("/home/wprice/PycharmProjects/sele/chromedriver-Linux64", options=options) # Since I am using Windows
File "/home/wprice/PycharmProjects/sele/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/wprice/PycharmProjects/sele/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 98, in start
self.assert_process_still_running()
File "/home/wprice/PycharmProjects/sele/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: Service /home/wprice/PycharmProjects/sele/chromedriver-Linux64 unexpectedly exited. Status code was: 127

We don't have your code or stack trace, so I am guessing it is probably one of the following issues:
- Your editor is unable to find the correct web driver path
- Your Chrome web driver version doesn't match the installed Chrome version
- An error occurs while initializing the web driver
Based on that, you can try this code. Just replace the variables with the right values and it should probably work:
from selenium import webdriver
import os
import time
options = webdriver.ChromeOptions()
options.add_argument('--headless') # Remove this if you want a selenium controlled browser window
options.add_argument('--ignore-certificate-errors')
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
options.add_argument('user-agent={0}'.format(user_agent))
preferences = {
    "profile.default_content_settings.popups": 0,
    "download.default_directory": os.getcwd() + os.path.sep,
    "directory_upgrade": True
}  # My own set of preferences, use what you want
options.add_experimental_option('prefs', preferences)
driver = webdriver.Chrome("CHROMEDRIVER.EXE_ABSOLUTE_PATH", options=options) # Since I am using Windows
driver.get("WEBSITE_TO_SCRAPE")
time.sleep(20)
driver.save_screenshot("test.png")
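The first cause listed above (a wrong driver path) can be checked before Selenium is started, and the '--headless' argument can be made optional so that a visible window opens. The following is only a minimal sketch using the standard library; the function names find_driver_problems and build_chrome_arguments are hypothetical and not part of Selenium.

```python
import os
import shutil


def find_driver_problems(driver_path):
    """Return a list of reasons why driver_path cannot be launched (hypothetical helper)."""
    problems = []
    if not os.path.isfile(driver_path):
        problems.append("no such file: " + driver_path)
    elif not os.access(driver_path, os.X_OK):
        problems.append("file is not executable: " + driver_path)
    return problems


def build_chrome_arguments(show_window):
    """Build the Chrome argument list; '--headless' is added only when no window is wanted."""
    arguments = ["--ignore-certificate-errors"]
    if not show_window:
        arguments.append("--headless")
    return arguments


if __name__ == "__main__":
    # Fall back to the path reported by 'which chromedriver' in the question.
    driver_path = shutil.which("chromedriver") or "/usr/bin/chromedriver"
    for problem in find_driver_problems(driver_path):
        print(problem)
    print(build_chrome_arguments(show_window=True))
```

Each returned argument can then be passed to options.add_argument() in the code above. With show_window=True the '--headless' flag is left out, which is what the comment in the answer's code suggests for getting a Selenium-controlled browser window.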

Related

How do I find the XPATH of a certain element from a headless browser?

I'm accessing Google Images with a headless Linux ChromeDriver browser. Here are my headers:
`
headers = ({'User-Agent':
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit\
/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'})
`
I'm downloading an image located by an XPath. My script works fine on my desktop. But when I run this script as a cron job on a Linux server with a headless Linux ChromeDriver browser, it downloads an image from a different XPath.
The problem is that I can't manually see which XPath the script is downloading from while it runs on the Linux server.
However, I'm fairly sure this is a "browser" problem because, as I understand it, XPaths differ between browsers. Are my headers (above) telling my script to use Firefox? Am I actually using Chrome?
On my desktop, I'm running Chrome 108 on macOS (Big Sur) and my script works fine.
Tried:
# Finding 2nd image
small_image = browser.find_element('xpath', "//*[@id='islrg']/div[1]/div[2]")
print("Clicking...")
small_image.click()
time.sleep(2)
# Expanding image to full resolution
big_image = browser.find_element('xpath', "//*[@id='Sva75c']/div[2]/div/div[2]/div[2]/div[2]/c-wiz/div[2]/div[1]/div[1]/div[2]/div/a/img")
time.sleep(5)
# Download image
imageURL = big_image.get_attribute('src')
Actual result:
I got get_attribute('src') of SMALL_IMAGE instead of BIG_IMAGE, from what I can see.
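In XPath, an attribute test is written with `@`, for example `[@id='islrg']`, and a positional step such as `div[2]` counts only among sibling elements of the same name. As a small sketch of how such a path is resolved, the standard library's ElementTree (which supports a limited XPath subset) can evaluate the first expression against a made-up document; the markup below is an assumption for illustration and is not Google's real page structure.

```python
import xml.etree.ElementTree as ET

# Made-up markup that only mimics the nesting used in the XPath above.
document = ET.fromstring(
    "<html><body>"
    "<div id='islrg'>"
    "<div><div>first thumbnail</div><div>second thumbnail</div></div>"
    "</div>"
    "</body></html>"
)

# Same shape as "//*[@id='islrg']/div[1]/div[2]", written relative to the root.
match = document.find(".//*[@id='islrg']/div[1]/div[2]")
print(match.text)
```

If the page served to the headless browser nests its elements differently, the same positional path lands on a different element, which would be consistent with the behaviour described in the question.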

I installed requests but still got import requests ImportError: No module named requests python

I'm following this tutorial for web scraping and tried to run it as an initial test, but I always get the error message from the title in VS Code, even though I've installed the latest Python version (3.8.1) along with the requests module.
This is the tutorial link; you can pause at 5:07 to see him running and testing the code normally without any errors.
https://www.youtube.com/watch?v=Bg9r_yLk7VY&t=241s
And this is my code so far, running on macOS:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'
headers ={"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
print('Hello')
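One possible cause of this error is that requests was installed for a different Python interpreter than the one the editor runs. The following minimal sketch prints which interpreter is in use and whether it can locate the module; the helper name module_is_available is hypothetical.

```python
import importlib.util
import sys


def module_is_available(name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


# The path printed here is the interpreter that actually runs the script.
print("Interpreter:", sys.executable)
print("requests available:", module_is_available("requests"))
```

If the second line prints False, installing with the same interpreter (for example by running that interpreter with `-m pip install requests`) puts the package where the script can find it.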

Set Accepted-Lang for Chrome Headless with Selenium (Python)

Setting the Accept-Language header works fine with regular Chrome via ChromeOptions:
options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
I'm trying to switch to the new headless Chrome, but apparently this option has no effect when I check the headers on validator.w3.org. Can I change them in another way? Does anybody know if support for this feature is coming?
I'm using Chrome 60, Chromedriver 2.30, Selenium 3.4.3 and Python 3.6.1 on macOS.
I'm using this code:
from selenium import webdriver
print('Start')
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_experimental_option('prefs', {'intl.accept_languages':'en,en_US'})
driver = webdriver.Chrome(chrome_options=options)
driver.get('http://validator.w3.org/i18n-checker/check?uri=google.com#validate-by-uri+')
print('Loaded')
# Check headers in output.html file. Search for 'Request headers'
html_source = driver.page_source
file = open('output.html', 'w')
file.write(html_source)
file.close()
driver.implicitly_wait(5)
# Or check headers with select
# WARNING: This fails with 'headless' chrome option!
element = driver.find_element_by_xpath("//code[@class='headers_accept_language_0']").get_attribute('textContent')
print('Element:', element)
driver.close()
print('Finish')
Thanks!
That should be possible using the Chrome DevTools Protocol (CDP).
You can execute CDP commands using driver.execute_cdp_cmd().
I implemented this in Selenium-Profiles.
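As a rough sketch of the driver.execute_cdp_cmd() approach, the header can be set through the CDP Network domain. This assumes a Selenium version whose Chrome driver provides execute_cdp_cmd(); the function name set_accept_language is hypothetical, and this is not necessarily how Selenium-Profiles does it.

```python
def set_accept_language(driver, languages):
    """Ask Chrome to send an Accept-Language header with the requests of this session.

    `driver` is assumed to be a Chromium-based Selenium driver that provides
    execute_cdp_cmd(); `languages` is a header value such as "en-US,en".
    """
    # Enabling the Network domain first is the usual pattern before changing headers.
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd(
        "Network.setExtraHTTPHeaders",
        {"headers": {"Accept-Language": languages}},
    )


# Usage (needs Chrome and a matching chromedriver installed):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# options.add_argument("--headless")
# driver = webdriver.Chrome(options=options)
# set_accept_language(driver, "en-US,en")
# driver.get("http://validator.w3.org/i18n-checker/check?uri=google.com")
```

The function only needs an object with an execute_cdp_cmd() method, so it can be called right after the driver is created and before the first driver.get().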

Trying to use Selenium web-driver / PhantomJS in Python3. Only working for http?

As the title says, I'm trying to use PhantomJS with Selenium WebDriver in Python 3. However, every attempt to get PhantomJS to even connect to SSL/TLS websites has returned nothing but empty HTML tags and a blank screenshot.
I have tried every solution I could find for this problem, but none of them has fixed the issue for me. Is PhantomJS broken, or am I missing something here?
(CODE EXAMPLE)
(GNU/Linux)
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
"(KHTML, like Gecko) Chrome/15.0.87"
)
try:
    driver = webdriver.PhantomJS(desired_capabilities=dcap, service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])
    driver.set_window_size(2000, 1500)
    driver.get('https://www.google.com/')
    driver.save_screenshot('/root/Desktop/screen.png')
except Exception as E:
    print(E)

How can I put a webkit PyQt window as wallpaper on my desktop programmatically?

I am trying to put Google Calendar in a WebKit window as my wallpaper.
The working Python script looks like this:
#!/usr/bin/python2 -u
# -*- coding: iso8859-15 -*-
display_UI = True
email = "xxxx@gmail.com"
passwd = "xxxxxxxx"
useragent = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
import spynner
browser = spynner.Browser(
debug_level=spynner.WARNING,
user_agent = useragent
)
browser.create_webview(display_UI)
browser.webview.setWindowTitle('Gcalendar')
browser.webview.showMaximized()
browser.load("https://accounts.google.com/ServiceLogin?service=cl&passive=1209600&continue=https://www.google.com/calendar/render&followup=http://www.google.com/calendar&scc=1")
browser.fill("input[name=Email]", email)
browser.fill("input[name=Passwd]", passwd)
browser.click("input[name=signIn]")
browser.wait_load()
browser.load("https://www.google.com/calendar/render?pli=1")
browser.wait_load()
# vim:ts=4:sw=4
What I would like to do now is to programmatically set this window as a wallpaper:
- skip taskbar
- skip pager
- full screen (DONE with the spynner module)
- if I hide all applications, the window should stay in place like any wallpaper
What I tried without success:
- KDE advanced settings on application name
- xwinwrap
You can take a screenshot of the page, save it somewhere and then launch a command that changes the background.
image = spynner.QImage(browser.webpage.viewportSize(), spynner.QImage.Format_ARGB32)
painter = spynner.QPainter(image)
browser.webpage.mainFrame().render(painter)
painter.end()
image.save("/path/to/img/gcscreen.png")
Then use the subprocess module to call a command that changes the wallpaper from the terminal. A Google search turned up such a command for GNOME on Ubuntu. I'm sure you can also find a similar one for KDE if you like.
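The subprocess step might look like the sketch below for a GNOME desktop. It assumes the gsettings command-line tool is installed and that the desktop reads the org.gnome.desktop.background picture-uri key. The function names are hypothetical, and the sketch is written for Python 3 rather than the Python 2 used in the question's script.

```python
import subprocess
from pathlib import Path


def gnome_wallpaper_command(image_path):
    """Build the gsettings command that points the GNOME background at image_path."""
    uri = Path(image_path).resolve().as_uri()  # e.g. file:///path/to/img/gcscreen.png
    return ["gsettings", "set", "org.gnome.desktop.background", "picture-uri", uri]


def set_gnome_wallpaper(image_path):
    """Run the command; raises CalledProcessError if gsettings reports a failure."""
    subprocess.run(gnome_wallpaper_command(image_path), check=True)


# Usage, after the screenshot has been saved:
# set_gnome_wallpaper("/path/to/img/gcscreen.png")
```

Building the command as a list avoids shell quoting problems with paths that contain spaces.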
