What is making the web browser close before it finishes? - python-3.x

I have the code below which I know has worked before but for some reason seems to be broken now. The code is meant to open a search engine, search for a query, and return a list of results by their href attribute. The browser opens and navigates to http://www.startpage.com successfully, then types the term I entered into the search box, but after that it just closes. No error, no links. Nothing.
import selenium.webdriver as webdriver

def get_results(search_term):
    url = "https://www.startpage.com"
    browser = webdriver.Firefox()
    browser.get(url)
    search_box = browser.find_element_by_id("query")
    search_box.send_keys(search_term)
    search_box.submit()
    try:
        links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
    except:
        links = browser.find_elements_by_xpath("//h3//a")
    results = []
    for link in links:
        href = link.get_attribute("href")
        print(href)
        results.append(href)
    browser.close()
    return results

get_results("dog")
Does anyone know what is wrong with this? Basically it gets to search_box.submit() then skips everything until browser.close().

Unlike find_element_by_xpath (which returns a single WebElement and raises an exception when nothing matches), find_elements_by_xpath does not throw when it finds no results; it returns an empty list. Here links is empty, so the for loop never executes and no exception is raised. You can change the try/except to an if condition and check whether the first lookup returned anything:
links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
if not links:
    links = browser.find_elements_by_xpath("//h3//a")

It is not recommended to call the browser close function inside the function you are testing. Instead, call it after get_results("dog") and keep it out of the test logic:
get_results("dog")
browser.close()
This way Selenium completes the execution of the function first and only then closes the browser window.
The problem with your current code is that the browser window is closed inside the method right before the result set is returned, which is what causes the logical problem in your script.
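Putting the two suggestions together, a minimal sketch could look like this (the driver is created and closed by the caller and passed into the function, which differs slightly from the snippet above; the Startpage selectors are taken from the question and not re-verified):
import selenium.webdriver as webdriver

def get_results(browser, search_term):
    # The caller owns the driver; this function only searches and scrapes.
    browser.get("https://www.startpage.com")
    search_box = browser.find_element_by_id("query")
    search_box.send_keys(search_term)
    search_box.submit()
    # find_elements_* returns an empty list instead of raising, so test the list
    links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
    if not links:
        links = browser.find_elements_by_xpath("//h3//a")
    return [link.get_attribute("href") for link in links]

browser = webdriver.Firefox()
try:
    print(get_results(browser, "dog"))
finally:
    browser.close()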

Related

Selenium Web driver cannot find css selector even if its present (python)

I am trying to scrape data from seetickets.us. I am clicking on each org and then on all events by that org. The scraper correctly scrapes data from each event, but when I come back to the all-events page the web driver cannot find the CSS selector.
Here is the site structure:
https://ibb.co/WBjMDJf
Clicking on World Cafe Live gets me here:
https://ibb.co/cLbMP19
Clicking on any event takes me to further info about that event.
Now, when the driver comes back from extracting an event, it is not able to go into the next event. I have also tried an explicit wait and time.sleep().
Here is my code:
# this is the func that clicks on each event, extracts data, then comes back to the all-events page
def get_all_events_in_each_event(self):
    inner_events = self.get_all_inner_events()
    print(len(inner_events))
    for event in inner_events:
        self.click_inner_event(event)
        self.get_event_loc()
        self.get_talent()
        self.get_facebook()
        self.get_date()
        self.get_showtime_city()
        self.get_ticket_doors()
        self.back()
        try:
            WebDriverWait(self, 10).until(
                EC.element_to_be_clickable((By.CLASS_NAME, "event-images-box")))
        except Exception as e:
            print("Wait Timed out")
            print(e)

# this is the func that clicks on each event on the all-events page
def click_inner_event(self, inner_event):
    link = inner_event.find_element_by_css_selector('div[class="event-info"]')
    link.click()
Here is the HTML of the all-events page:
https://ibb.co/wcKWc68
Kindly help me with finding what's wrong here.
Thanks
As @Arundeep Chohan correctly pointed out, the web driver loses its references when moving back and forth, so I had to re-grab all the elements.
Correct code is:
def get_all_events_in_each_event(self):
    inner_events = self.get_all_inner_events()
    for i in range(len(inner_events)):
        self.click_inner_event(inner_events[i])
        self.get_event_loc()
        self.get_talent()
        self.get_facebook()
        self.get_date()
        self.get_showtime_city()
        self.get_ticket_doors()
        self.back()
        inner_events = self.get_all_inner_events()  # regrabbing the elements
Thanks arundeep for the answer.
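An alternative that sidesteps stale references altogether (just a sketch, not part of the accepted fix; it assumes the wrapper exposes the driver as self.driver and that each "event-info" block contains an <a> with the event's href, neither of which I have verified) is to collect the event URLs up front and visit them directly, so no element reference has to survive a back-navigation:
def get_all_events_in_each_event(self):
    inner_events = self.get_all_inner_events()
    # Grab the href of every event first; plain strings never go stale.
    urls = [e.find_element_by_css_selector('div[class="event-info"] a').get_attribute('href')
            for e in inner_events]
    for url in urls:
        self.driver.get(url)  # navigate directly instead of click + back
        self.get_event_loc()
        self.get_talent()
        self.get_facebook()
        self.get_date()
        self.get_showtime_city()
        self.get_ticket_doors()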

How to set window.alert when redirecting

I'm crawling some web pages for my research.
I want to inject the JavaScript code below when redirecting to another page:
window.alert = function() {};
I tried to inject the JavaScript code using WebDriverWait, so that Selenium would execute it as soon as the driver redirects to the new page, but it doesn't work.
while (some conditions):
    try:
        WebDriverWait(browser, 5).until(
            lambda driver: original_url != browser.current_url)
        browser.execute_script("window.alert = function() {};")
    except:
        # do sth
        pass
    original_url = browser.current_url
It seems that the driver executes the JavaScript code only after the page has loaded, because the alert created in the redirected page is still showing.
Chrome 14+ blocks alerts inside onunload (https://stackoverflow.com/a/7080331/3368011)
But, I think the following questions may help you:
JavaScript before leaving the page
How to call a function before leaving page with Javascript
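If you are driving Chrome, one way to make the override run before any of the page's own scripts (a rough sketch; it assumes a ChromeDriver-backed session where execute_cdp_cmd is available, which is not stated in the question) is to register it once through the DevTools Protocol:
from selenium import webdriver

driver = webdriver.Chrome()
# Page.addScriptToEvaluateOnNewDocument runs the snippet in every new document
# before the page's scripts execute, so alert() is already a no-op on load.
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": "window.alert = function() {};"},
)
driver.get("https://example.com")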
I solved my problem in another way.
I tried again and again with browser.switch_to_alert but it didn't work. Then I found that it is deprecated, so it doesn't work correctly. Instead I check for the alert and dismiss it every second with the following code:
from selenium.webdriver.common.alert import Alert
import time

while some_condition:
    time.sleep(1)  # poll once per second, as described above
    try:
        Alert(browser).dismiss()
    except:
        print("no alert")
        continue
This works fine on Windows 10, Python 3.7.4.

Selenium doesn't read all the content of a webpage

I'm a newbie with Selenium. I have defined a batch process that handles several pages. These pages all have the same structure but different data, so I process them all with the same code. When I start the process it works fine but, from time to time, it always fails at the same point: when I try to get the data from the page, there are two tables whose markup I cannot retrieve. And I don't understand it at all, because if I restart the process on the very page that has just failed, it works fine! So it seems that Selenium does not always load the content of the web page completely.
I use Python 3 and Selenium with this code:
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "normal"
#b = WebDriver(executable_path="./chromedriver")
b = webdriver.Chrome(desired_capabilities=caps, executable_path="./chromedriver")
b.set_window_size(300, 300)
b.get(url)
html = b.page_source
self.setSoup(bs4.BeautifulSoup(html,'html.parser'))
b.close()
How can I avoid this error, which only occurs from time to time?
Edit I:
I have checked that when the process works fine this statement returns two tables:
tables = self.get_soup().findAll("table", class_="competitor-table comparative responsive")
But when the process goes wrong, this code returns 0 tables. As I said before, if I process the web page that previously gave me the error again, it works fine and returns two tables instead of zero.
For this reason, I suppose that Selenium does not always give me the full source of the page: for the same page it returns zero tables when it goes wrong and two tables when it works fine.
Edit II:
For example, right now I've got an error on this page:
http://www.fiba.basketball/euroleaguewomen/18-19/game/1912/Olympiacos-Perfumerias-Avenida#|tab=boxscore
The tables that I try to retrieve, and don't get, both use this CSS class; as you can see, there are two of them. I don't post the content of the tables because they are so big.
This is the code where I try to get the content of the tables:
def set_boxscore(self):
    tables = self.get_soup().findAll("table", class_="competitor-table comparative responsive")
    local = False
    print("Total tablas: {}".format(len(tables)))
    for t in tables:
        local = not local
        if local:
            self.home_team.set_stats(t.tfoot.find("tr", class_="team-totals"))
        else:
            self.away_team.set_stats(t.tfoot.find("tr", class_="team-totals"))
        rows = t.tbody.findAll("tr")
        for row in rows:
            time = row.find("td", class_="min").string
            if time.find(MESSAGES.MESSAGE_PLAYER_NOT_PLAY) == -1:
                if local:
                    player = PlayerEuropa(row)
                    self.home_players.append(player)
                else:
                    player = PlayerEuropa(row)
                    self.away_players.append(player)
In this code I print the total number of tables found and, as you can see, right now I get zero tables.
And if I now restart the process, it will work correctly.
Edit III:
One more example of the process I have defined. These URLs have been processed correctly:
http://www.fiba.basketball/eurocupwomen/18-19/game/2510/VBW-CEKK-Ceglèd-Rutronik-Stars-Keltern#|tab=boxscore
http://www.fiba.basketball/eurocupwomen/18-19/game/2510/Elfic-Fribourg-Tarbes-GB#|tab=boxscore
http://www.fiba.basketball/eurocupwomen/18-19/game/2510/Basket-Landes-BBC-Sint-Katelijne-Waver#|tab=boxscore
But when I tried to process this other URL, I got the error explained previously:
http://www.fiba.basketball/eurocupwomen/18-19/game/0111/Gorzow-Sparta-k-M-R--Vidnoje#|tab=boxscore
To render the web page I use Selenium, and I always do it at the beginning of the process. I get the content of the web page with this code:
def __init__(self, url):
    """Constructor"""
    caps = DesiredCapabilities().CHROME
    caps["pageLoadStrategy"] = "normal"
    #b = WebDriver(executable_path="./chromedriver")
    b = webdriver.Chrome(desired_capabilities=caps, executable_path="./chromedriver")
    b.set_window_size(300, 300)
    b.get(url)
    html = b.page_source
    self.setSoup(bs4.BeautifulSoup(html, 'html.parser'))
    b.close()
It is after this code that I start to retrieve the information from the web page. For some reason, sometimes the page is not rendered completely, so when I try to get some piece of information it is not found and I get the error explained previously.
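Since the symptom is page_source being captured before the stats tables exist, one thing worth trying (a sketch only; it assumes the tables are filled in by JavaScript after the initial load, and it reuses the selector from the question) is to wait explicitly for them before reading the source:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import bs4

def fetch_soup(url):
    b = webdriver.Chrome(executable_path="./chromedriver")
    try:
        b.get(url)
        # Wait up to 30 s for at least one stats table to be in the DOM before
        # reading page_source; tune the timeout to the site's behaviour.
        WebDriverWait(b, 30).until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, "table.competitor-table.comparative.responsive")))
        return bs4.BeautifulSoup(b.page_source, "html.parser")
    finally:
        b.close()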

Like instagram photo with selenium python

I'm trying to code my own Python Selenium bot for Instagram to experiment a little. I succeeded in logging in and searching for a hashtag in the search bar, which leads me to the following web page:
But I cannot figure out how to like the photos in the feed. I tried a search with XPath using the following path:
"//[#id="reactroot"]/section/main/article/div1/div/div/div1/div1/a/div"
But it didn't work, does anybody have an idea?
First of all, in your case it is recommended to use the official Instagram API for Python (documentation here on GitHub).
It would make your bot much simpler, more readable and, most of all, lighter and faster. So this is my first piece of advice.
If you really need to use Selenium, I also recommend downloading the Selenium IDE add-on for Chrome, since it can save you a lot of time, believe me. You can find a nice tutorial on YouTube.
Now let's talk about possible solutions and their implementations. After some research I found that the XPath of the heart icon at the bottom left of a post behaves like this:
The xpath of the icon of the first post is:
xpath=//button/span
The xpath of the icon of the second post is:
xpath=//article[2]/div[2]/section/span/button/span
The xpath of the icon of the third post is:
xpath=//article[3]/div[2]/section/span/button/span
And so on. The first number near "article" corresponds to the number of the post.
So you can manage to get the number of the post you desire and then click it:
def get_heart_icon_xpath(post_num):
    """
    Return heart icon xpath corresponding to the n-th post.
    """
    # Note: find_element_by_xpath expects a bare XPath expression, so the
    # Selenium IDE style "xpath=" prefix shown above is stripped here.
    if post_num == 1:
        return '//button/span'
    else:
        return f'//article[{post_num}]/div[2]/section/span/button/span'

try:
    # Get xpath of heart icon of the 19th post.
    my_xpath = get_heart_icon_xpath(19)
    heart_icon = driver.find_element_by_xpath(my_xpath)
    heart_icon.click()
    print("Task executed successfully")
except Exception:
    print("An error occurred")
Hope it helps. Let me know if you find other issues.
I'm trying to do the same)
Here is a working method. First, we find the class of the posts (v1Nh3), then we grab the link attribute (href):
posts = bot.find_elements_by_class_name('v1Nh3')
links = [elem.find_element_by_css_selector('a').get_attribute('href') for elem in posts]
I implemented a function that likes all the pictures on an Instagram page. It can be used on the "Explore" page or simply on a user's profile page.
Here's how I did it.
First, to navigate to a profile page from Instagram's main page, I create an XPath for the search box and an XPath for the element in the dropdown results corresponding to the index.
def search(self, keyword, index):
    """ Method that searches for a username and navigates to nth profile in the results where n corresponds to the index"""
    search_input = "//input[@placeholder=\"Search\"]"
    navigate_to = "(//div[@class=\"fuqBx\"]//descendant::a)[" + str(index) + "]"
    try:
        self.driver.find_element_by_xpath(search_input).send_keys(keyword)
        self.driver.find_element_by_xpath(navigate_to).click()
        print("Successfully searched for: " + keyword)
    except NoSuchElementException:
        print("Search failed")
Then I open the first picture:
def open_first_picture(self):
    """ Method that opens the first picture on an Instagram profile page """
    try:
        self.driver.find_element_by_xpath("(//div[@class=\"eLAPa\"]//parent::a)[1]").click()
    except NoSuchElementException:
        print("Profile has no picture")
Like each one of them:
def like_all_pictures(self):
    """ Method that likes every picture on an Instagram page."""
    # Open the first picture
    self.open_first_picture()
    # Create has_picture variable to keep track
    has_picture = True
    while has_picture:
        self.like()
        # Updating value of has_picture
        has_picture = self.has_next_picture()
    # Closing the picture pop up after having liked the last picture
    try:
        self.driver.find_element_by_xpath("//button[@class=\"ckWGn\"]").click()
        print("Liked all pictures of " + self.driver.current_url)
    except:
        # If driver fails to find the close button, it will navigate back to the main page
        print("Couldn't close the picture, navigating back to Instagram's main page.")
        self.driver.get("https://www.instagram.com/")
def like(self):
    """Method that finds all the like buttons and clicks on each one of them, if they are not already clicked (liked)."""
    unliked = self.driver.find_elements_by_xpath("//span[@class=\"glyphsSpriteHeart__outline__24__grey_9 u-__7\" and @aria-label=\"Like\"]//parent::button")
    liked = self.driver.find_elements_by_xpath("//span[@class=\"glyphsSpriteHeart__filled__24__red_5 u-__7\" and @aria-label=\"Unlike\"]")
    # If there are like buttons
    if liked:
        print("Picture has already been liked")
    elif unliked:
        try:
            for button in unliked:
                button.click()
        except StaleElementReferenceException:
            # We face this stale element reference exception when the element we are
            # interacting with is destroyed and then recreated. When this happens the
            # reference of the element in the DOM becomes stale and we can no longer
            # get a reference to it.
            print("Failed to like picture: Element is no longer attached to the DOM")
This method checks whether the picture has a "Next" button leading to the next picture:
def has_next_picture(self):
    """ Helper method that finds if the picture has a \"Next\" button to navigate to the next picture. If it does, it will navigate to the next picture."""
    next_button = "//a[text()=\"Next\"]"
    try:
        self.driver.find_element_by_xpath(next_button).click()
        return True
    except NoSuchElementException:
        print("User has no more pictures")
        return False
If you'd like to learn more feel free to have a look at my Github repo: https://github.com/mlej8/InstagramBot

trouble getting the current url on selenium

I want to get the current url when I am running Selenium.
I looked at this stackoverflow page: How do I get current URL in Selenium Webdriver 2 Python?
and tried the things posted but it's not working. I am attaching my code below:
from selenium import webdriver
#launch firefox
driver = webdriver.Firefox()
url1='https://poshmark.com/search?'
# search in a window
driver.get(url1)
xpath='//input[@id="user-search-box"]'
searchBox=driver.find_element_by_xpath(xpath)
brand="freepeople"
style="top"
searchBox.send_keys(' '.join([brand,"sequin",style]))
from selenium.webdriver.common.keys import Keys
#EQUIValent of hitting enter key
searchBox.send_keys(Keys.ENTER)
print(driver.current_url)
My code prints https://poshmark.com/search? but it should print https://poshmark.com/search?query=freepeople+sequin+top&type=listings&department=Women because that is where Selenium actually navigates.
The issue is that there is no lag between your searchBox.send_keys(Keys.ENTER) and print(driver.current_url).
There should be some time lag so that the statement can pick up the URL change. If your code fires before the URL has actually changed, it gives you the old URL.
The workaround would be to add time.sleep(1) to wait for 1 second. A hard-coded sleep is not a good option though. You should do one of the following:
Keep polling the URL and wait for the change to happen
Wait for an object that you know will appear when the new page loads
Instead of using Keys.ENTER, simulate the operation with a .click() on the search button if one is available
Usually when you use the click method in Selenium it takes care of the page change, so you don't see such issues. Here you press a key using Selenium, which doesn't do any kind of waiting for the page load. That is why you see the issue in the first place.
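For the polling option, recent Selenium releases already ship an expected condition for exactly this (a small sketch; it assumes a Selenium version where expected_conditions.url_changes is available):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

old_url = driver.current_url
searchBox.send_keys(Keys.ENTER)
# Block (up to 10 s) until the browser URL differs from the one recorded above.
WebDriverWait(driver, 10).until(EC.url_changes(old_url))
print(driver.current_url)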
I had the same issue and I came up with a solution that uses a standard explicit wait (see how explicit waits work in the documentation).
Here is my solution
from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait

class UrlHasChanged:
    def __init__(self, old_url):
        self.old_url = old_url

    def __call__(self, driver):
        return driver.current_url != self.old_url

@contextmanager
def url_change(driver):
    current_url = driver.current_url
    yield
    WebDriverWait(driver, 10).until(UrlHasChanged(current_url))
Explanation:
At first, I created my own wait condition (see here) that takes old_url as a parameter (the URL from before the action was made) and checks whether the old URL is still equal to current_url after the action. It returns False when both URLs are the same and True otherwise.
Then, I created a context manager to wrap the action I wanted to make: it saves the URL before the action, and afterwards uses WebDriverWait with the wait condition created above.
Thanks to that solution I can now reuse this function with any action that changes the URL and wait for the change like this:
with url_change(driver):
    login_panel.login_user(normal_user['username'], new_password)

assert driver.current_url == dashboard.url
It is safe because WebDriverWait(driver, 10).until(UrlHasChanged(current_url)) waits until the current URL changes, and after 10 seconds it stops waiting by throwing an exception.
What do you think about this?
I fixed this problem by clicking on the button via its href: grab the href and then call driver.get(href). click() was not working for me!
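A minimal sketch of that idea (the CSS selector here is purely illustrative, not taken from the question):
# Read the link target instead of clicking it, then navigate directly.
link = driver.find_element_by_css_selector("a.search-result")  # hypothetical selector
href = link.get_attribute("href")
driver.get(href)
print(driver.current_url)  # now reflects the page behind the link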
