get 50 search results per search in selenium python - python-3.x

When I open a Chrome window from Selenium and make a GET request, Google displays only 10 results.
Is there a way to get 50 results per page?
Or is there a way to modify the query string so that it fetches 50 results?
Below is my code:
from selenium import webdriver
chrome_path = r"C:\Users\Desktop\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
url = 'https://www.google.com/search?q=some random string'
driver.get(url)
When I execute the above code, it opens Chrome and fetches 10 results.
Is there a way to get 50 results out of the search string?
If not with Selenium, is there any other way to achieve it?
Please help me.
Thanks in advance

Here it is; pass the num parameter in the search URL to control how many results are returned:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get(
"https://www.google.com/search?q=some random string&num=200&start=0&sourceid=chrome&ie=UTF-8")

Related

Selenium webdriver python element screenshot not working properly

I looked up the Selenium Python documentation, and it allows one to take screenshots of an element. I tried the following code, and it worked for small pages (around 3-4 A4 pages when printed):
from selenium import webdriver
from selenium.webdriver import FirefoxOptions

firefox_profile = webdriver.FirefoxProfile()
firefox_profile.set_preference("browser.privatebrowsing.autostart", True)
# Configure options for Firefox webdriver
options = FirefoxOptions()
options.add_argument('--headless')
# Initialise Firefox webdriver
driver = webdriver.Firefox(firefox_profile=firefox_profile, options=options)
driver.maximize_window()
driver.get(url)  # url is set to one of the pages mentioned below
driver.find_element_by_tag_name("body").screenshot("career.png")
driver.close()
When I try it with url="https://waitbutwhy.com/2020/03/my-morning.html", it gives the screenshot of the entire page, as expected. But when I try it with url="https://waitbutwhy.com/2018/04/picking-career.html", almost half of the page is not rendered in the screenshot (the image is too large to upload here), even though the "body" tag does extend all the way down in the original HTML.
I have tried using both implicit and explicit waits (set to 10s, which is more than enough for a browser to load all contents, comments and discussion section included), but that has not improved the screenshot capability. Just to be sure that selenium was in fact loading the web page properly, I tried loading without the headless flag, and once the webpage was completely loaded, I ran driver.find_element_by_tag_name("body").screenshot("career.png"). The screenshot was again half-blank.
It seems that there might be some memory constraints put on the screenshot method (although I couldn't find any), or the logic behind the screenshot method itself is flawed. I can't figure it out though. I simply want to take the screenshot of the entire "body" element (preferably in a headless environment).
You may try this code; just note that you first need to install a package from the command prompt with pip install Selenium-Screenshot
import time

from selenium import webdriver
from Screenshot import Screenshot_Clipping

driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://waitbutwhy.com/2020/03/my-morning.html")

# Stitch a full-page screenshot together and save it to the current directory
obj = Screenshot_Clipping.Screenshot()
img_loc = obj.full_Screenshot(driver, save_path=r'.', image_name='capture.png')
print(img_loc)

time.sleep(5)
driver.close()
The resulting screenshot captures the whole page; you just need to zoom into the saved image.
Hope this works for you!
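If you would rather avoid an extra package, another approach that sometimes helps (an untested sketch, not from the original answer) is to resize the browser window to the full document height before taking the element screenshot, so the whole body fits in one viewport:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://waitbutwhy.com/2018/04/picking-career.html")
# Grow the window to the full rendered height so the body fits in a single viewport
height = driver.execute_script("return document.body.scrollHeight")
driver.set_window_size(1920, height)
driver.find_element_by_tag_name("body").screenshot("career.png")
driver.close()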

How to scrape data off a page loaded via Javascript

I want to scrape the comments off this page using beautifulsoup - https://www.x....s.com/video_id/the-suburl
The comments are loaded on click via Javascript, and they are paginated; each page loads more comments on click as well. I wish to fetch all comments. For each comment, I want the poster's profile URL, the comment text, the number of likes, the number of dislikes, and the time posted (as stated on the page).
The comments can be a list of dictionaries.
How do I go about this?
This script will print all comments found on the page:
import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.x......com/video_id/gggjggjj/'
video_id = url.rsplit('/', maxsplit=2)[-2].replace('video', '')
# The comments endpoint returns JSON when asked to load everything at once
u = 'https://www.x......com/threads/video/ggggjggl/{video_id}/0/0'.format(video_id=video_id)
comments = requests.post(u, data={'load_all': 1}).json()

for id_ in comments['posts']['ids']:
    print(comments['posts']['posts'][id_]['date'])
    print(comments['posts']['posts'][id_]['name'])
    print(comments['posts']['posts'][id_]['url'])
    print(BeautifulSoup(comments['posts']['posts'][id_]['message'], 'html.parser').get_text())
    # ...etc.
    print('-' * 80)
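If you want the comments as a list of dictionaries, as asked, the same JSON response can be collected instead of printed; a small sketch using the same keys as above:
comment_list = [
    {
        'date': comments['posts']['posts'][id_]['date'],
        'name': comments['posts']['posts'][id_]['name'],
        'url': comments['posts']['posts'][id_]['url'],
        'message': BeautifulSoup(comments['posts']['posts'][id_]['message'],
                                 'html.parser').get_text(),
    }
    for id_ in comments['posts']['ids']
]
print(len(comment_list), 'comments collected')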
This can also be done with Selenium. Selenium emulates a browser. Depending on your preference you can use the Chrome driver or the Firefox driver, which is the geckodriver.
Here is a link on how to install the chrome webdriver:
http://jonathansoma.com/lede/foundations-2018/classes/selenium/selenium-windows-install/
Then in your code here is how you would set it up:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# this part may change depending on where you installed the webdriver.
# You may have to define the path to the driver.
# For me my driver is in C:/bin so I do not need to define the path
chrome_options = Options()
# or '-start maximized' if you want the browser window to open
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(options=chrome_options)
driver.get(your_url)
html = driver.page_source # downloads the html from the driver
Selenium has several functions you can use to perform actions such as clicking elements on the page. Once you find an element with Selenium, you can use its .click() method to interact with it.
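A minimal sketch of that flow, continuing from the setup above (the CSS selector for the comments button is hypothetical and would need to match the actual page):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(options=chrome_options)
driver.get(your_url)

# Hypothetical selector: click whatever element triggers loading of the comments
driver.find_element_by_css_selector('.load-comments').click()

html = driver.page_source  # grab the HTML after the comments have been rendered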
Let me know if this helps

How to click the Continue button using Selenium and Python

I'm trying to automate some tedious copy / paste I do monthly from my bank's online service via Selenium and Python 3. Unfortunately, I can't get Selenium to click the log-in link.
It's the blue continue button at https://www1.bmo.com/onlinebanking/cgi-bin/netbnx/NBmain?product=5.
Strangely, when I try to click that link manually in the browser launched by Selenium, it doesn't work either - whereas it does work in a browser I launch manually.
I suspect the issue is that the bank's website is smart enough to detect that I'm automating the browser activity. Is there any way to get around that?
If not, could it be something else?
I've tried using Chrome and Firefox - to no avail. I'm using a 64 bit Windows 10 machine with Chrome 73.0.3683.103 and Firefox 66.0.
Relevant code is below.
#websites and log in information
bmo_login_path = 'https://www1.bmo.com/onlinebanking/cgi-bin/netbnx/NBmain?product=5'
bmo_un = 'fake_user_name'
bmo_pw = 'fake_password'
#Selenium setup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
chrome_driver_path = 'C:\\Path\\To\\Driver\\chromedriver.exe'
gecko_driver_path = 'C:\\Path\\To\\Driver\\geckodriver.exe'
browswer_bmo = webdriver.Firefox(executable_path = gecko_driver_path)
#browswer_bmo = webdriver.Chrome(executable_path = chrome_driver_path)
#log into BMO
browswer_bmo.get(bmo_login_path)
time.sleep(5)
browswer_bmo.find_element_by_id('siBankCard').send_keys(bmo_un)
browswer_bmo.find_element_by_id('regSignInPassword').send_keys(bmo_pw)
browswer_bmo.find_element_by_id('btnBankCardContinueNoCache1').click()
Sending the keys works perfectly. I may actually have the wrong element ID (I was trying to test that in Chrome when I realized I couldn't click the link manually) - but I think the bigger issue is that I can't manually click the link in the browser launched by Selenium. Thank you for any ideas.
EDIT
This screenshot shows all I get when I try to click the Continue button.
Ultimately the error message I get in my IDE (Jupyter Notebook) is:
TimeoutException: Message: timeout
(Session info: chrome=74.0.3729.108)
(Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729#{#29}),platform=Windows NT 10.0.17134 x86_64)
To click the button with the text Continue, you can fill in the Card Number and Password fields, inducing WebDriverWait for element_to_be_clickable(), and use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www1.bmo.com/onlinebanking/cgi-bin/netbnx/NBmain?product=5')
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.dijitReset.dijitInputInner#siBankCard[name='FBC_Number']"))).send_keys("1234567890112233")
driver.find_element_by_css_selector("input.dijitReset.dijitInputInner#regSignInPassword[name='FBC_Password']").send_keys("fake_password")
driver.find_element_by_css_selector("span.dijitReset.dijitInline.dijitIcon.dijitNoIcon").click()
# driver.quit()
I was able to fix this issue and solve the problem by adding the following line below the options variables. This disables Chrome's automation check. I used the wholesale code above and then added the following line in the correct location, before starting the driver.
options.add_experimental_option("excludeSwitches", ['enable-automation'])
ref: https://help.applitools.com/hc/en-us/articles/360007189411--Chrome-is-being-controlled-by-automated-test-software-notification
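For context, a minimal sketch of where that line sits, otherwise using the same options and paths as the answer above:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
# Hide the "Chrome is being controlled by automated test software" notification
options.add_experimental_option("excludeSwitches", ['enable-automation'])

driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www1.bmo.com/onlinebanking/cgi-bin/netbnx/NBmain?product=5')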

Selenium NoSuchElement Exception with specific website

I've recently been trying to learn Selenium and found a website that just ignores my attempts to find a particular element by ID, name or XPath. The website is here:
https://www.creditview.pl/PL/Creditview.htm
I am trying to select the first text field, the one labelled Uzytkownik, and I have attempted to find it using several methods:
from selenium import webdriver
browser = webdriver.Chrome()
site = "https://www.creditview.pl/pl/creditview.htm"
browser.get(site)
login_txt = browser.find_element_by_xpath(r"/html//input[@id='ud_username']")
login_txt2 = browser.find_element_by_id("ud_username")
login_txt3 = browser.find_element_by_name("ud_username")
No matter what I try I keep getting:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element:
as if the element wasn't there at all.
I have suspected that the little frame containing the field might be an iframe and tried to switch to various elements, with no luck. I also tried to check whether the element is somehow hidden from my code (a hidden element). Nothing seems to work, or I am making some newbie mistake and the answer is right in front of me. Finally I was able to select another element on the site and used several TAB key presses to move the cursor to the desired position, but that feels like cheating.
Can someone please show me how to find the element? I literally can't sleep because of this issue :)
Given that your element is there, you still need to wait for your element to be loaded/visible/clickable etc. You can do that using selenium's expected conditions (EC).
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By

my_XPATH = r"/html//input[@id='ud_username']"
wait_time = 10  # maximum time to wait, in seconds

driver = webdriver.Chrome()
site = "https://www.creditview.pl/pl/creditview.htm"
driver.get(site)
try:
    # Wait until the element is present in the DOM; the locator is passed as a tuple
    my_element = WebDriverWait(driver, wait_time).until(
        EC.presence_of_element_located((By.XPATH, my_XPATH)))
except TimeoutException:
    print("element not found after %d seconds" % wait_time)

Unable to locate the referenced ID using selenium and python

I am trying to locate a search box with the id search2 on a website. I have been able to log in to the website successfully using the code below.
import requests
from tqdm import tqdm
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe', chrome_options=options)
driver.implicitly_wait(30)
tgt = "C:\\mypath"
profile = {"plugins.plugins_list": [{"enabled":False, "name":"Chrome PDF Viewer"}],
"download.default_directory" : tgt}
options.add_experimental_option("prefs",profile)
print(options)
driver.get("http://mylink.com/")
user=driver.find_element_by_id("username")
passw=driver.find_element_by_id("password")
user.send_keys("abc#xyz.com")
passw.send_keys("Pwd")
driver.find_element_by_xpath('/html/body/div[2]/div/div/div[2]/form/div[3]/button').click()
page=driver.find_element_by_id("search2")
print(page)
The code works perfectly up to this point, but the moment I add the line below, I get an error.
page.send_keys("abc")
The error that I get is as below.
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
What I am trying to do here is log in to the website, search for some items, and download the results. I have already tried using the implicit wait option as mentioned in the code. Any help would be highly appreciated.
Adding the piece of code below did the trick. I had to make the current thread sleep before the program continues with the next steps.
import time

time.sleep(5)
Thanks everyone.
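If a fixed sleep proves flaky, an explicit wait is a common alternative: wait for the search box to become clickable after the login click, then re-locate it. A minimal sketch reusing the search2 id and the driver from the question (this is not part of the original answer):
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Re-locate the search box once it is actually clickable after the login redirect
page = WebDriverWait(driver, 30).until(
    EC.element_to_be_clickable((By.ID, "search2")))
page.send_keys("abc")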
